Inference Overview
An overview of Scale Generative Platform’s Inference Capabilities
This guide uses SGP’s V5 APIs, which are still in development.
SGP supports an OpenAI-compatible interface that allows you to use the same API across model providers and self-hosted models.
For instance, we can use OpenAI’s client to do inference with Anthropic’s claude-3.5-sonnet.
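As a sketch of what such a request looks like on the wire, here is a minimal chat-completions call using only the Python standard library. The base URL, endpoint path, and auth header are placeholder assumptions, not SGP’s documented values — substitute the ones from your SGP account. The same request can be made with OpenAI’s client by pointing its `base_url` at SGP.

```python
import json
import urllib.request

# Assumptions: the base URL and auth scheme below are placeholders for
# illustration; check your SGP account settings for the real values.
SGP_BASE_URL = "https://api.example.com/v5"  # hypothetical endpoint
API_KEY = "YOUR_SGP_API_KEY"

# An OpenAI-style chat-completions payload. Note the "provider/model"
# naming convention for the model field.
payload = {
    "model": "anthropic/claude-3-5-sonnet-20240620",
    "messages": [{"role": "user", "content": "Hello!"}],
}

def send(payload: dict) -> dict:
    """POST the payload to the (assumed) chat-completions endpoint."""
    req = urllib.request.Request(
        f"{SGP_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # auth scheme is an assumption
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Requires a valid API key and network access to actually run.
    print(send(payload)["choices"][0]["message"]["content"])
```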
You can also use our SDK, scale-gp.
Specifying which Model to do Inference On
As you might have noticed in the example, model names must be specified in a particular format: instead of just claude-3.5-sonnet, it’s anthropic/claude-3-5-sonnet-20240620.
We need to do this because we have many providers, some of which provide the same model. To distinguish them, each name has two parts:
- The first part is the provider: anthropic, openai, llmengine, etc.
- The second part is the model name. You’ll sometimes see a different model name than you would if you used the provider directly. This is because the versions we use are pinned to a specific point in time (meaning the model stays the same over time).
The two parts are joined together by a slash to get the overall name: openai/gpt-4o or anthropic/claude-3-5-sonnet-20240620.
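The joining rule above can be expressed as a one-line helper; `sgp_model_name` is a hypothetical function name for illustration, not part of any SDK.

```python
def sgp_model_name(provider: str, model: str) -> str:
    """Join a provider prefix and a model name with a slash."""
    return f"{provider}/{model}"

# The two examples from the text:
print(sgp_model_name("openai", "gpt-4o"))  # openai/gpt-4o
print(sgp_model_name("anthropic", "claude-3-5-sonnet-20240620"))  # anthropic/claude-3-5-sonnet-20240620
```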
For a complete list of model names for models not hosted on SGP (e.g., OpenAI and Anthropic models), see the documentation for the open-source project we use to route requests to the correct provider: https://docs.litellm.ai/docs/providers
For models hosted on SGP, you can visit https://llm-engine.scale.com/model_zoo/ to get the model names. Note that for these models, the provider is llmengine.