POST /v4/chat-completions

Authorizations

x-api-key
string
header
required
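
A minimal authenticated request might look like the following sketch. The base URL and API key are placeholders, not documented values.

```python
import requests

# Placeholder base URL for illustration; substitute your deployment's host.
BASE_URL = "https://api.example.com"

response = requests.post(
    f"{BASE_URL}/v4/chat-completions",
    headers={
        "x-api-key": "YOUR_API_KEY",  # required authorization header
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
response.raise_for_status()
print(response.json())
```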

Body

application/json
account_id
string

The account ID to use for usage tracking. This will be gradually enforced.

model
enum<string>
required

The ID of the model to use for chat completions. We only support the models listed here so far.

Available options:
gpt-4,
gpt-4-0613,
gpt-4-32k,
gpt-4-32k-0613,
gpt-4o,
gpt-4o-mini,
gpt-4o-2024-08-06,
gpt-3.5-turbo,
gpt-3.5-turbo-0613,
gpt-3.5-turbo-16k,
gpt-3.5-turbo-16k-0613,
gemini-pro,
gemini-1.5-pro-001,
gemini-1.5-pro-preview-0409,
gemini-1.5-pro-preview-0514,
llama-2-7b-chat,
llama-2-13b-chat,
llama-2-70b-chat,
llama-3-8b-instruct,
llama-3-70b-instruct,
llama-3-1-8b-instruct,
llama-3-1-70b-instruct,
Meta-Llama-3-8B-Instruct-RMU,
Meta-Llama-3-8B-Instruct-RR,
Meta-Llama-3-8B-Instruct-DERTA,
Meta-Llama-3-8B-Instruct-LAT,
mixtral-8x7b-instruct,
mixtral-8x22b-instruct,
claude-3-opus-20240229,
claude-3-sonnet-20240229,
claude-3-haiku-20240307,
claude-3-5-sonnet-20240620,
mistral-large-latest,
phi-3-mini-4k-instruct,
phi-3-cat-merged,
zephyr-cat-merged,
dolphin-2.9-llama3-8b,
dolphin-2.9-llama3-70b,
llama3-1-405b-instruct-v1
memory_strategy
object

The memory strategy to use for the agent. A memory strategy is a way to prevent the underlying LLM's context limit from being exceeded. Each memory strategy uses a different technique to condense the input message list into a smaller payload for the underlying LLM.

We only support the Last K memory strategy right now, but will be adding new strategies soon.
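
As an illustrative sketch only: the exact field names come from the memory_strategy object schema, not from this page, so treat the shape below as a hypothetical Last K configuration.

```python
# Hypothetical shape, for illustration only; consult the memory_strategy
# object schema for the actual field names.
memory_strategy = {
    "name": "last_k",     # the only strategy currently supported
    "params": {"k": 10},  # hypothetical parameter: keep the 10 most recent messages
}
```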

messages
object[]
required

The list of messages in the conversation.

Expand each message type to see how it works and when to use it. Most conversations should begin with a single user message.
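
For example, a short multi-turn conversation (message objects use role and content keys, as described under chat_template below):

```python
# A typical conversation: alternating user/assistant turns,
# opening with a single user message.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And what is its population?"},
]
```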

model_parameters
object

Configuration parameters for the chat completion model, such as temperature, max_tokens, and stop_sequences.

If not specified, the default values are:

  • temperature: 0.2
  • max_tokens: None (limited by the model's max tokens)
  • stop_sequences: None
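
For example, overriding all three documented defaults:

```python
# Overrides the defaults listed above (temperature 0.2, no token limit,
# no stop sequences).
model_parameters = {
    "temperature": 0.7,
    "max_tokens": 512,
    "stop_sequences": ["\nUser:"],
}
```
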
instructions
string
default: You are an AI assistant that helps users with their questions by chatting back and forth with them. When asked a question, you should answer it as best as you can with the information you have. If you need more information, you can ask the user for it.

The initial instructions to provide to the chat completion model.

Use this to guide the model to act in more specific ways. For example, if you have specific rules you want the model to follow, you can specify them here.

Good prompt engineering is crucial to getting performant results from the model. If you are having trouble getting the model to perform well, try writing more specific instructions here before trying more expensive techniques such as swapping in other models or finetuning the underlying LLM.
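
For example, a set of instructions scoping the assistant to a narrow domain (the company name is hypothetical):

```python
# Replaces the default general-purpose assistant prompt quoted above.
instructions = (
    "You are a support assistant for Acme Corp. Only answer questions "
    "about Acme products. If a question is out of scope, say so and "
    "decline to answer."
)
```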

chat_template
string

Currently only supported for LLM-Engine models. A Jinja template string that defines how the chat completion API formats the string prompt.

For Llama models, the template must take in at most a messages object, a bos_token string, and an eos_token string. The messages object is a list of dictionaries, each with keys role and content.

For Mixtral models, the template must take in at most a messages object and an eos_token string. The messages object looks identical to the Llama models' messages object, but the template can assume the role key takes on the values user or assistant, or system for the first message. The chat template either needs to handle this system message (which gets set via the instructions field or by the messages), or the instructions field must be set to null and the messages object must not contain any system messages.

See the default chat templates present in the Llama and Mixtral tokenizers for examples.
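
As a rough sketch, a minimal Llama-style template taking messages, bos_token, and eos_token might look like the following. It is illustrative, not the canonical template; see the templates shipped with the Llama tokenizers for the real thing.

```python
# Illustrative Llama-style Jinja template. It does not handle system
# messages, so per the rules above the instructions field would need to
# be set to null when using it.
chat_template = (
    "{{ bos_token }}"
    "{% for m in messages %}"
    "{% if m['role'] == 'user' %}"
    "[INST] {{ m['content'] }} [/INST]"
    "{% elif m['role'] == 'assistant' %}"
    " {{ m['content'] }}{{ eos_token }}"
    "{% endif %}"
    "{% endfor %}"
)
```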

stream
boolean
default: false

Whether or not to stream the response.

Setting this to true will stream the completion in real time.
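
Consuming a streamed response might look like the sketch below. The exact wire format (e.g. server-sent events vs. newline-delimited JSON) is not specified on this page, so this is an assumption.

```python
import requests

# Sketch only: prints each raw chunk as it arrives.
with requests.post(
    "https://api.example.com/v4/chat-completions",  # placeholder host
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Tell me a short story."}],
        "stream": True,
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```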

Response

200 - application/json
chat_completion
object
required
token_usage
object
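
Only the two top-level keys are documented here; the fields inside each object come from their respective schemas, so a non-streaming response can be unpacked roughly as follows.

```python
# Unpack the two documented top-level keys of a 200 response.
data = response.json()
completion = data["chat_completion"]  # the model's completion payload
usage = data["token_usage"]           # token accounting for the request
```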