POST /v4/chat-completions

Authorizations

x-api-key
string
header
required
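
A minimal authenticated request might look like the following sketch. The base URL and API key are placeholders, not documented values.

```python
import requests

# Placeholder base URL for illustration; substitute your deployment's host.
BASE_URL = "https://api.example.com"

response = requests.post(
    f"{BASE_URL}/v4/chat-completions",
    headers={
        "x-api-key": "YOUR_API_KEY",  # required authorization header
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
response.raise_for_status()
print(response.json())
```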

Body

application/json
account_id
string

The account ID to use for usage tracking. This will be gradually enforced.

model
enum<string>
required

The ID of the model to use for chat completions. We only support the models listed here so far.

Available options:
gpt-4,
gpt-4-0613,
gpt-4-32k,
gpt-4-32k-0613,
gpt-4o,
gpt-4o-mini,
gpt-4o-2024-08-06,
gpt-3.5-turbo,
gpt-3.5-turbo-0613,
gpt-3.5-turbo-16k,
gpt-3.5-turbo-16k-0613,
gemini-pro,
gemini-1.5-pro-001,
gemini-1.5-pro-preview-0409,
gemini-1.5-pro-preview-0514,
llama-2-7b-chat,
llama-2-13b-chat,
llama-2-70b-chat,
llama-3-8b-instruct,
llama-3-70b-instruct,
llama-3-1-8b-instruct,
llama-3-1-70b-instruct,
Meta-Llama-3-8B-Instruct-RMU,
Meta-Llama-3-8B-Instruct-RR,
Meta-Llama-3-8B-Instruct-DERTA,
Meta-Llama-3-8B-Instruct-LAT,
mixtral-8x7b-instruct,
mixtral-8x22b-instruct,
claude-3-opus-20240229,
claude-3-sonnet-20240229,
claude-3-haiku-20240307,
claude-3-5-sonnet-20240620,
mistral-large-latest,
phi-3-mini-4k-instruct,
phi-3-cat-merged,
zephyr-cat-merged,
dolphin-2.9-llama3-8b,
dolphin-2.9-llama3-70b,
llama3-1-405b-instruct-v1
memory_strategy
object

The memory strategy to use for the agent. A memory strategy is a way to prevent the underlying LLM's context limit from being exceeded. Each memory strategy uses a different technique to condense the input message list into a smaller payload for the underlying LLM.

We only support the Last K memory strategy right now, but will be adding new strategies soon.
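
As an illustrative sketch only: the exact field names come from the memory_strategy object schema, not from this page, so treat the shape below as a hypothetical Last K configuration.

```python
# Hypothetical shape, for illustration only; consult the memory_strategy
# object schema for the actual field names.
memory_strategy = {
    "name": "last_k",     # the only strategy currently supported
    "params": {"k": 10},  # hypothetical parameter: keep the 10 most recent messages
}
```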

messages
object[]
required

The list of messages in the conversation.

Expand each message type to see how it works and when to use it. Most conversations should begin with a single user message.
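
For example, a short multi-turn conversation (message objects use role and content keys, as described under chat_template below):

```python
# A typical conversation: alternating user/assistant turns,
# opening with a single user message.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And what is its population?"},
]
```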

model_parameters
object

Configuration parameters for the chat completion model, such as temperature, max_tokens, and stop_sequences.

If not specified, the default values are:

  • temperature: 0.2
  • max_tokens: None (limited by the model's max tokens)
  • stop_sequences: None
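
For example, overriding all three documented defaults:

```python
# Overrides the defaults listed above (temperature 0.2, no token limit,
# no stop sequences).
model_parameters = {
    "temperature": 0.7,
    "max_tokens": 512,
    "stop_sequences": ["\nUser:"],
}
```
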
instructions
string
default: You are an AI assistant that helps users with their questions by chatting back and forth with them. When asked a question, you should answer it as best as you can with the information you have. If you need more information, you can ask the user for it.

The initial instructions to provide to the chat completion model.

Use this to guide the model to act in more specific ways. For example, if you have specific rules you want the model to follow, you can specify them here.

Good prompt engineering is crucial to getting performant results from the model. If you are having trouble getting the model to perform well, try writing more specific instructions here before trying more expensive techniques such as swapping in other models or finetuning the underlying LLM.
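
For example, a set of instructions scoping the assistant to a narrow domain (the company name is hypothetical):

```python
# Replaces the default general-purpose assistant prompt quoted above.
instructions = (
    "You are a support assistant for Acme Corp. Only answer questions "
    "about Acme products. If a question is out of scope, say so and "
    "decline to answer."
)
```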

chat_template
string

Currently only supported for LLM-Engine models. A Jinja template string that defines how the chat completion API formats the string prompt.

For Llama models, the template must take in at most a messages object, a bos_token string, and an eos_token string. The messages object is a list of dictionaries, each with keys role and content.

For Mixtral models, the template must take in at most a messages object and an eos_token string. The messages object looks identical to the Llama models' messages object, but the template can assume the role key takes on the values user or assistant, or system for the first message. The chat template either needs to handle this system message (which gets set via the instructions field or by the messages), or the instructions field must be set to null and the messages object must not contain any system messages.

See the default chat templates present in the Llama and Mixtral tokenizers for examples.
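
As a rough sketch, a minimal Llama-style template taking messages, bos_token, and eos_token might look like the following. It is illustrative, not the canonical template; see the templates shipped with the Llama tokenizers for the real thing.

```python
# Illustrative Llama-style Jinja template. It does not handle system
# messages, so per the rules above the instructions field would need to
# be set to null when using it.
chat_template = (
    "{{ bos_token }}"
    "{% for m in messages %}"
    "{% if m['role'] == 'user' %}"
    "[INST] {{ m['content'] }} [/INST]"
    "{% elif m['role'] == 'assistant' %}"
    " {{ m['content'] }}{{ eos_token }}"
    "{% endif %}"
    "{% endfor %}"
)
```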

stream
boolean
default: false

Whether or not to stream the response.

Setting this to true will stream the completion in real time.
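
Consuming a streamed response might look like the sketch below. The exact wire format (e.g. server-sent events vs. newline-delimited JSON) is not specified on this page, so this is an assumption.

```python
import requests

# Sketch only: prints each raw chunk as it arrives.
with requests.post(
    "https://api.example.com/v4/chat-completions",  # placeholder host
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Tell me a short story."}],
        "stream": True,
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```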

Response

200 - application/json
chat_completion
object
required
token_usage
object
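
Only the two top-level keys are documented here; the fields inside each object come from their respective schemas, so a non-streaming response can be unpacked roughly as follows.

```python
# Unpack the two documented top-level keys of a 200 response.
data = response.json()
completion = data["chat_completion"]  # the model's completion payload
usage = data["token_usage"]           # token accounting for the request
```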