JavaScript

import SGPClient from 'sgp';

const client = new SGPClient({
  apiKey: process.env['SGP_API_KEY'], // This is the default and can be omitted
});

const completionsResponse = await client.completions.create({ model: 'gpt-oss-120b', prompt: 'prompt' });

console.log(completionsResponse.completion);

{
  "completion": {
    "text": "<string>",
    "finish_reason": "<string>",
    "response_metadata": {}
  },
  "token_usage": {
    "total": 123,
    "prompt": 123,
    "completion": 123
  }
}

Completions

Create Completion

Description

Given a user’s input, runs LLM inference to produce the model’s response.

Details

LLM completions have many use cases, such as content summarization, question-answering, and text generation.

The model parameter determines which LLM will be used to generate the completion. Keep in mind that different models have varying sizes, costs, and may perform differently across different tasks.

The user input, commonly referred to as the “prompt”, is a required field in the request body. The quality of the model’s response can vary greatly depending on the input prompt. Good prompt engineering can significantly enhance the response quality. If you encounter suboptimal results, consider writing more specific instructions or providing examples to the LLM before trying more expensive techniques such as swapping in other models or finetuning.

By default, the endpoint will return the entire response as one whole object. If you would prefer to stream the completion in real-time, you can achieve this by setting the stream flag to true.

POST

completions

JavaScript

import SGPClient from 'sgp';

const client = new SGPClient({
  apiKey: process.env['SGP_API_KEY'], // This is the default and can be omitted
});

const completionsResponse = await client.completions.create({ model: 'gpt-oss-120b', prompt: 'prompt' });

console.log(completionsResponse.completion);

{
  "completion": {
    "text": "<string>",
    "finish_reason": "<string>",
    "response_metadata": {}
  },
  "token_usage": {
    "total": 123,
    "prompt": 123,
    "completion": 123
  }
}

Authorizations

x-api-key

string

header

required

Body

application/json

model

required

The ID of the model to use for completions.

Users have two options:

Option 1: Use one of the supported models from the dropdown.
Option 2: Enter the ID of a custom model.

Note: For custom models we currently only support models finetuned using using the Scale-hosted LLM-Engine API.

Available options:

gpt-oss-120b,

gpt-oss-20b,

gpt-4,

gpt-4-0613,

gpt-4-32k,

gpt-4-32k-0613,

gpt-4-vision-preview,

gpt-4o,

gpt-4o-mini,

gpt-4o-2024-08-06,

gpt-3.5-turbo,

gpt-3.5-turbo-0613,

gpt-3.5-turbo-16k,

gpt-3.5-turbo-16k-0613,

gemini-pro,

gemini-1.5-pro-001,

gemini-1.5-pro-002,

gemini-1.5-pro-preview-0409,

gemini-1.5-pro-preview-0514,

text-davinci-003,

text-davinci-002,

text-curie-001,

text-babbage-001,

text-ada-001,

claude-instant-1,

claude-instant-1.1,

claude-2,

claude-2.0,

llama-7b,

llama-2-7b,

llama-2-7b-chat,

llama-2-13b,

llama-2-13b-chat,

llama-2-70b,

llama-2-70b-chat,

llama-3-8b,

llama-3-8b-instruct,

llama-3-1-8b-instruct,

llama-3-1-70b-instruct,

llama-3-70b-instruct,

llama-3-2-1b-instruct,

llama-3-2-3b-instruct,

llama-3-3-70b-instruct,

Meta-Llama-3-8B-Instruct-RMU,

Meta-Llama-3-8B-Instruct-RR,

Meta-Llama-3-8B-Instruct-DERTA,

Meta-Llama-3-8B-Instruct-LAT,

falcon-7b,

falcon-7b-instruct,

falcon-40b,

falcon-40b-instruct,

mpt-7b,

mpt-7b-instruct,

flan-t5-xxl,

mistral-7b,

mistral-7b-instruct,

mixtral-8x7b,

mixtral-8x7b-instruct,

mixtral-8x22b-instruct,

llm-jp-13b-instruct-full,

llm-jp-13b-instruct-full-dolly,

zephyr-7b-alpha,

zephyr-7b-beta,

zephyr-cat-merged,

codellama-7b,

codellama-7b-instruct,

codellama-13b,

codellama-13b-instruct,

codellama-34b,

codellama-34b-instruct,

phi-3-mini-4k-instruct,

phi-3-cat-merged,

dolphin-2.9-llama3-8b,

dolphin-2.9-llama3-70b,

defense-llama-3-8b-instruct,

donovan-combat-llama

prompt

string

required

Prompt for which to generate the completion.

Good prompt engineering is crucial to getting performant results from the model. If you are having trouble getting the model to perform well, try writing a more specific prompt here before trying more expensive techniques such as swapping in other models or finetuning the underlying LLM.

account_id

string

The account ID to use for usage tracking. This will be gradually enforced.

images

ImageCompletionRequests · object[]

List of image urls to be used for image based completions. Leave empty for text based completions.

Show child attributes

model_parameters

ModelParameters · object

Configuration parameters for the completion model, such as temperature, max_tokens, and stop_sequences.

If not specified, the default value are:

temperature: 0.2
max_tokens: None (limited by the model's max tokens)
stop_sequences: None

Show child attributes

stream

boolean

default:false

Whether or not to stream the response.

Setting this to True will stream the completion in real-time.

Response

Successful Response

completion

Completion · object

required

The actual completion text and the finish reason.

Show child attributes

token_usage

TokenUsage · object

Token usage numbers. If streaming, this field is null until the stream completes, at which point it will be populated (if supported).

Show child attributes

Execute Agent Create Chat Completion

⌘I

Knowledge Bases

Chunks

Agents

Completions

Chat Completions

Models

Users

Accounts

Organizations

Question Sets

Evaluations

Evaluation Configs

Evaluation Datasets

Studio Projects

Application Specs

Questions

Knowledge Base Data Sources

Model Templates V3 (Beta)

Model server

API Reference

Fine Tuning Jobs V3 (Beta)

Training Datasets V3 (Beta)

package deployments

Beta

Applications

ChatThreads

Interactions

MonitoringDashboard

Chat Themes

account groups

Create Completion

Description

Details

Authorizations

Body

Response