POST /v4/models/{model_deployment_id}/chat-completions

Authorizations

x-api-key
string
header
required

Headers

x-selected-account-id
string

Path Parameters

model_deployment_id
string
required

Body

application/json
model_request_parameters
object
temperature
number
default: 0.2

What sampling temperature to use, between [0, 2]. Higher values like 1.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Setting temperature=0.0 enables fully deterministic (greedy) sampling. NOTE: For some models the temperature range is limited to [0, 1]; if the given value is above the available range, it defaults to the maximum value.

stop_sequences
string[]

List of up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

max_tokens
integer

The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length. If not specified, max_tokens is determined based on the model used:

| Model API family | Model API default | EGP applied default |
| --- | --- | --- |
| OpenAI Completions | 16 | context window - prompt size |
| OpenAI Chat Completions | context window - prompt size | context window - prompt size |
| LLM Engine | max_new_tokens parameter is required | 100 |
| Anthropic Claude 2 | max_tokens_to_sample parameter is required | 10000 |

top_p
number

The cumulative probability cutoff for token selection. Lower values mean sampling from a smaller, more top-weighted nucleus. Available for models provided by Google, LLM Engine, and OpenAI.

top_k
number

Sample from the k most likely next tokens at each step. Lower k focuses on higher probability tokens. Available for models provided by Google and LLM Engine.

frequency_penalty
number

Penalize tokens based on how much they have already appeared in the text. Positive values encourage the model to generate new tokens and negative values encourage the model to repeat tokens. Available for models provided by LLM Engine and OpenAI.

presence_penalty
number

Penalize tokens based on whether they have already appeared in the text. Positive values encourage the model to generate new tokens and negative values encourage the model to repeat tokens. Available for models provided by LLM Engine and OpenAI.

stream
boolean
default: false

Flag indicating whether to stream the completion response.

logprobs
boolean

Whether to return logprobs. Currently only supported for llmengine chat models.

top_logprobs
integer

Number of top logprobs to return. Currently only supported for llmengine chat models.

chat_template
string

The chat template to use for the completion. Currently only supported for llmengine chat models.

chat_template_kwargs
object

Additional keyword arguments for the chat template. Currently only supported for llmengine chat models.

chat_history
object[]
required

Chat history entries with roles and messages. If there's no history, pass an empty list.

prompt
string
required

New user prompt. This will be sent to the model with a user role.
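
Example request (a minimal sketch in Python; the base URL and all placeholder values are assumptions, not part of the specification above):

```python
"""Minimal request sketch. The base URL and placeholder IDs below are
assumptions; substitute the values for your own deployment."""
import requests

BASE_URL = "https://api.example.com"         # assumed host; use your own
MODEL_DEPLOYMENT_ID = "my-model-deployment"  # path parameter

url = f"{BASE_URL}/v4/models/{MODEL_DEPLOYMENT_ID}/chat-completions"

headers = {
    "x-api-key": "YOUR_API_KEY",                 # required auth header
    "x-selected-account-id": "YOUR_ACCOUNT_ID",  # optional header
    "Content-Type": "application/json",
}

body = {
    # Required fields
    "chat_history": [],  # pass an empty list if there is no history
    "prompt": "Summarize the plot of Hamlet in two sentences.",
    # Optional tuning parameters
    "model_request_parameters": {
        "temperature": 0.2,
        "max_tokens": 256,
        "stop_sequences": ["\n\n"],
        "stream": False,  # set True to stream the completion response
    },
}

response = requests.post(url, headers=headers, json=body, timeout=60)
response.raise_for_status()
print(response.json())
```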

Response

200 - application/json
finish_reason
string
prompt_tokens
integer
default: 0
response_tokens
integer
default: 0
completions
array
required
choices
object[]
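
Example of reading the response (continuing from the request sketch above; the exact nesting of the fields, in particular choices within each completion and the name of the text field, is an assumption):

```python
# Continuing from the request sketch above. The nesting shown here
# (top-level token counts, "choices" inside each completion, and a
# "text" field on each choice) is assumed, not documented above.
data = response.json()

print(data.get("finish_reason"))       # why generation stopped
print(data.get("prompt_tokens", 0))    # tokens in the prompt (defaults to 0)
print(data.get("response_tokens", 0))  # tokens in the generated output (defaults to 0)

for completion in data["completions"]:            # documented as required
    for choice in completion.get("choices", []):  # assumed to be nested here
        print(choice.get("text"))                 # field name is a guess
```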