By default, the annotation UI that annotators see in SGP shows the test case input, expected output, and output. However, for complex evaluations you may want to:

  • display data from the trace
  • select which parts of test case inputs and test case outputs to display
  • modify the layout of the annotation UI

The Annotation Configuration allows you to do all three.

Here’s what an example annotation configuration looks like:

from scale_gp.lib.types.data_locator import data_locator # this is a helper for producing data_locs

annotation_configuration = dict(
  annotation_config_type="flexible", # this is the default, so we could have omitted annotation_config_type entirely
  direction="row", # or "col"
  components=[ # 2D array representing how things will be laid out in the UI
    [
      dict(data_loc=["test_case_output", "output", "string_output"], label="string output"),
      dict(data_loc=["test_case_data", "expected_output", "string_expected"]),
    ],
    [
      dict(data_loc=data_locator.test_case_output.output["messages_output"]), # the data_locator is an easier way of producing data_locs
    ],
    [
      dict(data_loc=data_locator.trace["tool_call"].input["string_input"]), # reference the "tool_call" node from the trace earlier
    ],
  ],
)

evaluation = sgp_client.evaluations.create(
    account_id=ACCOUNT_ID,
    name="example flexible evaluation",
    description="This is a test evaluation",
    type="builder",
    evaluation_config_id=evaluation_config.id, # you need to create an evaluation config, evaluation dataset, and application spec/variant first
    evaluation_dataset_id=flexible_evaluation_dataset.id,
    application_variant_id=application_variant.id,
    application_spec_id=application.id,
    annotation_config=annotation_configuration,
)

When a contributor annotates this evaluation in the UI, they will see an annotation UI that looks something like this:

Let’s break down how a custom annotation config is set up:

  • annotation_config_type: "flexible" by default. The other types are "summarization" and "multiturn", which make it easier to work with those specific use cases.
  • components: a 2D list of annotation items. Each annotation item points to a location in the test case data, test case output, or trace. When the annotator grades the test case output, they will see data pulled from each location.
    • Each annotation item has a "data_loc" field and an optional "label" field. The "data_loc" is an array that points to where annotation data should be pulled from. The "label" is a name displayed to the annotator for that "data_loc".

      ⚠️ if a "data_loc" points somewhere that doesn't exist for one or more test cases, you will not be able to create the evaluation. A pre-flight check like the sketch after this list can catch this early.

  • direction: "row" by default. Decides whether components are laid out as rows or as columns.
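
To catch bad data_locs before creating the evaluation, you can resolve each one against your test cases locally. The sketch below is hypothetical: validate_annotation_config is not part of the SDK, and it assumes each test case is a nested dict whose keys mirror the data_loc path segments (adjust to your actual data model).

def resolve_data_loc(test_case: dict, data_loc):
    """Walk a data_loc path through nested dicts; raises KeyError if a segment is missing."""
    node = test_case
    for segment in data_loc:
        node = node[segment]  # a KeyError here means this data_loc does not exist for this test case
    return node

def validate_annotation_config(annotation_config: dict, test_cases: list) -> list:
    """Return a list of error strings, one per unresolvable (test case, data_loc) pair."""
    errors = []
    for row in annotation_config["components"]:
        for item in row:
            path = list(item["data_loc"])  # assumes data_locator values iterate as path segments
            for i, test_case in enumerate(test_cases):
                try:
                    resolve_data_loc(test_case, path)
                except (KeyError, TypeError):
                    errors.append(f"test case {i}: cannot resolve {path}")
    return errors

# Usage: run this before sgp_client.evaluations.create and fix any reported paths.
# errors = validate_annotation_config(annotation_configuration, my_test_cases)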

Here are some examples of how different arrangements of components produce different UIs:
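
For a rough sense of what to expect, here are two sketches (the data_locs are placeholders): per the description above, direction controls whether each inner list of components is laid out as a row or as a column.

# One inner list containing two items: with direction="row", the inner list is a
# single row and its two items render side by side.
two_up = dict(
    annotation_config_type="flexible",
    direction="row",
    components=[
        [
            dict(data_loc=["test_case_data", "input"], label="input"),
            dict(data_loc=["test_case_output", "output"], label="output"),
        ],
    ],
)

# The same two items in separate inner lists with direction="col": each inner
# list becomes its own column, so the annotator sees two columns instead.
two_columns = dict(
    annotation_config_type="flexible",
    direction="col",
    components=[
        [dict(data_loc=["test_case_data", "input"], label="input")],
        [dict(data_loc=["test_case_output", "output"], label="output")],
    ],
)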

data_locs can take any of these shapes:

| data_locator Helper | data_loc array | Meaning |
| --- | --- | --- |
| data_locator.test_case_data.input | ["test_case_data", "input"] | Display the entire input from the test case |
| data_locator.test_case_data.input["<input key>"] | ["test_case_data", "input", "<input key>"] | Display a single key from the input |
| data_locator.test_case_data.expected | ["test_case_data", "expected_output"] | Display the entire expected output from the test case |
| data_locator.test_case_data.expected["<expected output key>"] | ["test_case_data", "expected_output", "<expected output key>"] | Display a single key from the expected output |
| data_locator.test_case_output | ["test_case_output", "output"] | Display the entire output from the test case output |
| data_locator.test_case_output["<output key>"] | ["test_case_output", "output", "<output key>"] | Display a single key from the output |
| data_locator.trace["<node id from the trace>"].input | ["trace", "<node id from the trace>", "input"] | Display the entire input from a single part of the trace |
| data_locator.trace["<node id from the trace>"].input["<input key>"] | ["trace", "<node id from the trace>", "input", "<input key>"] | Display a single key from the input of a part of the trace |
| data_locator.trace["<node id from the trace>"].output | ["trace", "<node id from the trace>", "output"] | Display the entire output from a single part of the trace |
| data_locator.trace["<node id from the trace>"].output["<output key>"] | ["trace", "<node id from the trace>", "output", "<output key>"] | Display a single key from the output of a part of the trace |
| data_locator.trace["<node id from the trace>"].expected | ["trace", "<node id from the trace>", "expected"] | Display the entire expected output from a single part of the trace |
| data_locator.trace["<node id from the trace>"].expected["<expected key>"] | ["trace", "<node id from the trace>", "expected", "<expected key>"] | Display a single key from the expected output of a part of the trace |

It is highly recommended that you use the data_locator helper instead of manually creating the data_loc array.
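
As a quick illustration, the two styles below point at the same location, assuming the helper serializes to the same path arrays shown in the table ("question" is a hypothetical input key):

from scale_gp.lib.types.data_locator import data_locator

# The same location expressed both ways. The helper catches typos in the fixed
# segments ("test_case_data", "input", ...) at authoring time, which is why it
# is preferred over hand-written arrays.
via_helper = data_locator.test_case_data.input["question"]  # "question" is a hypothetical input key
via_array = ["test_case_data", "input", "question"]

# Both are valid values for an annotation item's data_loc field:
item_a = dict(data_loc=via_helper, label="question")
item_b = dict(data_loc=via_array, label="question")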

Customizing the Annotation UI per question

Sometimes, certain questions in an evaluation rubric are relevant only to a specific part of the test case, test case output, or trace. For instance, you might ask a question specifically about the “completion” or “reranking” step in the trace.

In that case, you can create a question_id_to_annotation_config mapping that lets you override the annotation config for a specific question ID:

question_id_to_annotation_config = {
    questions[1].id: dict(
        components=[
            [
                dict(
                    data_loc=data_locator.trace["completions"].input,
                    label="string output",
                ),
                dict(
                    data_loc=data_locator.trace["completions"].output
                )
            ],
            [
                dict(
                    data_loc=data_locator.trace["completions"].expected
                ),
            ],
        ],
    )
}

evaluation = sgp_client.evaluations.create(
    ..., # specify all the usual evaluation fields
    annotation_config=annotation_configuration,
    question_id_to_annotation_configuration=question_id_to_annotation_config, # where specified, this overrides the annotation_config
)

In the annotation UI, the rendered information will now change for each evaluation question as mapped above:

Dev Mode

SGP also supports “Dev Mode”, which allows an annotator to view all the inputs, outputs, and the full trace at once. You can toggle Dev Mode by clicking in the top right of the annotation UI: