Overview
Evaluations have been overhauled in the upcoming iteration of SGP to decrease overhead while increasing capabilities for both UI and SDK users. The API is currently available in the latest version of SGP, and access to the new UI is available behind a feature flag, which can be enabled for an account and/or deployment on request. At the most basic level, an evaluation now consists of:- Data: Unstructured, user defined data, optionally, in the form of a reusable dataset.
- Tasks: A set of instructions to produce results based on said data. For example, a chat completion task based on an
input
field within the user defined data, or a human annotation task to produce the expected output. These tasks can be interdependent and composed to create as simple or complex of an evaluation that your use case requires.
Creation
Examples in this guide use the
scale-gp-beta
package which runs exclusively on the V5 API.data
is made up of two items, each with an input
field. A single task is specified to generate a chat completion from each data item.
Data
Each object within thedata
field can be thought of as a row in a table. We refer to these rows as items, and they can be retrieved like so:
Tasks
Tasks can be thought of as instructions to produce “results” for each item within the evaluation, and are run asynchronously after evaluation creation. They contain 3 parts:Used to specify
task_type
specific parameters, such as the messages
field for a chat_completion
task.Referencing item data
Referencing item data
A string following the format When the task is ultimately executed, a chat completion request will be made for each evaluation item with the
"item.<field>"
, here on referred to as an ItemLocator
, can be used within a configuration to indicate that the value should be pulled from the evaluation item at task execution time.For example, in the first code snippet we defined a chat_completion
task that used an ItemLocator
as the content
field for a user message:content
field populated based on the item’s input
field.Data can also be referenced within a string by wrapping an ItemLocator
in double curly braces:Aliases are an optional way to specify the field name that will contain the task’s result for each evaluation item. By default, this will be the
task_type
.