Customize Annotation Config for Evaluations
Customize the UI that annotators see while annotating evaluations
Following the previous recipe, create an evaluation dataset, question set, application variant, and so on. Stop after generating test case outputs for your application.
from typing import Dict

from scale_gp.types.annotation_config_param import AnnotationConfigParam

# `client`, `application`, `external_application_variant`, and
# `flexible_evaluation_dataset` are carried over from the previous recipe
external_application = ExternalApplication(client).initialize(
    application_variant_id=external_application_variant,
    application=application,
)

external_application.generate_outputs(
    evaluation_dataset_id=flexible_evaluation_dataset.id,
    evaluation_dataset_version=1,
)
An annotation configuration controls what the annotation UI looks like. The annotation UI is laid out as a two-dimensional grid, and the configuration determines how that grid is arranged and what appears in it.
First we will define the "annotation_config_dict", which indicates how each test case is laid out on the annotations page. This dictionary has three keys: "annotation_config_type", "direction", and "components". For all flexible evaluations, set "annotation_config_type" to "flexible". The "direction" key controls the orientation of the grid: with "row", each row holds at most two items and you can add as many rows as desired; with "col", each column holds at most two items and you can add as many columns as desired. Overflow in either direction is handled with horizontal or vertical scrolling. The "components" key indicates which components of the test case and application output to show in the annotation UI. An equivalent "col" layout is sketched after the example below.
annotation_config_dict: AnnotationConfigParam = {
    "annotation_config_type": "flexible",  # use the flexible layout described above
    "direction": "row",  # this is the default - try switching to "col"
    "components": [
        # first row: the test case input and expected output, side by side
        [
            {
                "data_loc": ["test_case_data", "input"],
            },
            {
                "data_loc": ["test_case_data", "expected_output"],
            },
        ],
        # second row: the generated output for the test case
        [
            {
                "data_loc": ["test_case_output", "output"],
            },
        ],
    ],
}
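For comparison, here is a minimal sketch of the same components arranged with the "col" direction, so the grid fills top to bottom instead of left to right (the variable name col_annotation_config_dict is just for illustration):

# same components, laid out as columns instead of rows
col_annotation_config_dict: AnnotationConfigParam = {
    "annotation_config_type": "flexible",
    "direction": "col",
    "components": [
        # first column: input and expected output, stacked
        [
            {"data_loc": ["test_case_data", "input"]},
            {"data_loc": ["test_case_data", "expected_output"]},
        ],
        # second column: the generated output
        [
            {"data_loc": ["test_case_output", "output"]},
        ],
    ],
}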
If desired, you can configure specific views per question. For each question that will be annotated, you can indicate which components of the test case are relevant to it. For example, if a question evaluates the retrieval step of the application, configure it to show only the trace spans relevant to retrieval in the annotation UI.
# try clicking on the second question to see the annotation UI change
# `question_ids` comes from the question set created in the previous recipe
question_id_to_annotation_config_dict: Dict[str, AnnotationConfigParam] = {
    question_ids[1]: {
        "components": [
            [
                {
                    "data_loc": ["test_case_data", "input"],
                },
                {
                    # show a trace span, in the format
                    # ["trace", <node_id>, "output", <key, optionally>]
                    "data_loc": ["trace", "completion", "output", "completion_output"],
                },
            ],
        ],
    },
}
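Before creating the evaluation, it can help to list every "data_loc" path you have configured so typos are easy to spot. The helper below is a minimal sketch in plain Python, not part of the SDK:

def list_data_locs(config: AnnotationConfigParam) -> None:
    # print the data_loc of every component, grouped by row/column index
    for group_index, group in enumerate(config.get("components", [])):
        for component in group:
            print(group_index, " -> ".join(component["data_loc"]))

list_data_locs(annotation_config_dict)
for question_id, config in question_id_to_annotation_config_dict.items():
    print(f"question {question_id}:")
    list_data_locs(config)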
You can now create an evaluation with your specified annotation configs. If you take a look at the annotation view, you will see that it follows the row-and-column layout you specified and shows any special information, such as traces.
If you click on the second question in the annotation view, you will see it change according to the question-specific config defined in question_id_to_annotation_config_dict.
# `account_id`, `external_application_spec`, and `evaluation_config`
# are carried over from the previous recipe
evaluation = client.evaluations.create(
    type="builder",
    account_id=account_id,
    application_spec_id=external_application_spec,
    application_variant_id=external_application_variant,
    description="description",
    evaluation_dataset_id=flexible_evaluation_dataset.id,
    annotation_config=annotation_config_dict,
    question_id_to_annotation_config=question_id_to_annotation_config_dict,
    name="Flexible eval",
    evaluation_config_id=evaluation_config.id,
)
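As a quick check that the annotation configs were attached, you can fetch the evaluation back. The snippet below assumes your SDK version exposes the standard retrieve method on the evaluations resource:

# assumes client.evaluations.retrieve is available in your SDK version
created = client.evaluations.retrieve(evaluation.id)
print(created.id, created.name)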