Get Deployment

import SGPClient from 'sgp';

const client = new SGPClient({
  apiKey: 'My API Key',
});

const modelDeployment = await client.models.deployments.retrieve('model_instance_id', 'deployment_id');

console.log(modelDeployment.id);

{
  "name": "<string>",
  "id": "<string>",
  "created_at": "2023-11-07T05:31:56Z",
  "account_id": "<string>",
  "created_by_user_id": "<string>",
  "created_by_identity_type": "user",
  "status": "<string>",
  "model_creation_parameters": {},
  "model_endpoint_id": "<string>",
  "model_instance_id": "<string>",
  "vendor_configuration": {
    "min_workers": 0,
    "max_workers": 1,
    "per_worker": 10,
    "vendor": "LAUNCH"
  },
  "deployment_metadata": {}
}

Authorizations

x-api-key

string

header

required

Path Parameters

model_instance_id

string

required

deployment_id

string

required

Response

Successful Response

name

string

required

id

string

required

The unique identifier of the entity.

created_at

string<date-time>

required

The date and time when the entity was created in ISO format.

account_id

string

required

The ID of the account that owns the given entity.

created_by_user_id

string

required

The user who originally created the entity.

created_by_identity_type

enum<string>

required

The type of identity that created the entity.

Available options: user, service_account

status

string

required

Status of the model's deployment.

model_creation_parameters

Model Creation Parameters · object

model_endpoint_id

string

model_instance_id

string

vendor_configuration

LaunchDeploymentVendorConfiguration · object

One of: LaunchDeploymentVendorConfiguration, LLMEngineDeploymentVendorConfiguration. The attributes below describe LaunchDeploymentVendorConfiguration.

vendor_configuration.min_workers

integer

default:0

vendor_configuration.max_workers

integer

default:1

vendor_configuration.per_worker

integer

default:10

The maximum number of concurrent requests that an individual worker can service. Launch automatically scales the number of workers for the endpoint so that each worker is processing per_worker requests, subject to the limits defined by min_workers and max_workers.

- If the average number of concurrent requests per worker is lower than per_worker, then the number of workers will be reduced.
- Otherwise, if the average number of concurrent requests per worker is higher than per_worker, then the number of workers will be increased to meet the elevated traffic.

Here is our recommendation for computing per_worker:

1. Compute min_workers and max_workers per your minimum and maximum throughput requirements.
2. Determine a value for the maximum number of concurrent requests in the workload. Divide this number by max_workers. Doing this ensures that the number of workers will "climb" to max_workers.
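The recommendation above can be sketched as a small calculation. This is only an illustration: the function names are hypothetical and not part of the SGP SDK, rounding down (with a floor of 1) is an assumption chosen so that peak traffic still reaches max_workers, and estimateWorkers is a simplified model of the scaling rule described here, not Launch's actual autoscaling algorithm.

```typescript
// Hypothetical helpers for illustration; none of these names exist in the SGP SDK.

interface WorkerLimits {
  minWorkers: number;
  maxWorkers: number;
  perWorker: number;
}

// Step 2 of the recommendation: divide the peak number of concurrent requests
// by max_workers. Rounding down (never below 1) is an assumption made here.
function suggestPerWorker(peakConcurrentRequests: number, maxWorkers: number): number {
  if (!Number.isInteger(maxWorkers) || maxWorkers < 1) {
    throw new Error("maxWorkers must be a positive integer");
  }
  if (peakConcurrentRequests < 0) {
    throw new Error("peakConcurrentRequests must not be negative");
  }
  return Math.max(1, Math.floor(peakConcurrentRequests / maxWorkers));
}

// Simplified model: enough workers that each handles at most perWorker
// concurrent requests, clamped to the range [minWorkers, maxWorkers].
function estimateWorkers(concurrentRequests: number, limits: WorkerLimits): number {
  if (limits.perWorker < 1) {
    throw new Error("perWorker must be at least 1");
  }
  const needed = Math.ceil(concurrentRequests / limits.perWorker);
  return Math.min(limits.maxWorkers, Math.max(limits.minWorkers, needed));
}

// Example: a workload peaking at 200 concurrent requests with max_workers = 8.
const perWorker = suggestPerWorker(200, 8);
const limits: WorkerLimits = { minWorkers: 1, maxWorkers: 8, perWorker };
console.log(perWorker, estimateWorkers(200, limits), estimateWorkers(30, limits));
```

With these example numbers, per_worker comes out to 25, so the estimate reaches max_workers only at peak load and falls back toward min_workers when traffic is light.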

vendor_configuration.vendor

string

default:LAUNCH

Allowed value: "LAUNCH"

deployment_metadata

Deployment Metadata · object
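For callers who want a typed view of the response, the fields listed above could be modelled roughly as follows. The type and function names are hypothetical and are not the SDK's own exports. Fields not marked required are treated as optional here, which is an assumption, and only the Launch variant of vendor_configuration is modelled because the LLMEngine variant's attributes are not listed on this page.

```typescript
// Hypothetical type names for illustration; the SGP SDK may export its own types.

type IdentityType = "user" | "service_account";

interface LaunchVendorConfiguration {
  min_workers?: number; // default 0
  max_workers?: number; // default 1
  per_worker?: number;  // default 10
  vendor?: "LAUNCH";    // default "LAUNCH"
}

interface ModelDeployment {
  name: string;
  id: string;
  created_at: string; // ISO date-time string
  account_id: string;
  created_by_user_id: string;
  created_by_identity_type: IdentityType;
  status: string;
  // Assumption: fields below are optional because they are not marked required.
  model_creation_parameters?: Record<string, unknown>;
  model_endpoint_id?: string;
  model_instance_id?: string;
  vendor_configuration?: LaunchVendorConfiguration;
  deployment_metadata?: Record<string, unknown>;
}

const REQUIRED_FIELDS = [
  "name",
  "id",
  "created_at",
  "account_id",
  "created_by_user_id",
  "created_by_identity_type",
  "status",
] as const;

// Returns the names of required fields that are absent from a parsed response.
function missingRequiredFields(body: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter((field) => body[field] === undefined);
}

// Fills in the documented defaults for any Launch setting that is not present.
function withLaunchDefaults(
  cfg: LaunchVendorConfiguration = {},
): Required<LaunchVendorConfiguration> {
  return {
    min_workers: cfg.min_workers ?? 0,
    max_workers: cfg.max_workers ?? 1,
    per_worker: cfg.per_worker ?? 10,
    vendor: cfg.vendor ?? "LAUNCH",
  };
}
```

A check like missingRequiredFields can be run on a parsed response before it is cast to ModelDeployment, so a malformed payload is caught early instead of surfacing later as an undefined field.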
