# Simple Guide: Evaluating a Math Bot

> 📘 Get code for this guide
>
> This guide is based on the [Flexible Evaluation Recipe](/recipes/flexible-evaluations).

With Flexible Evaluations, you can evaluate an application variant that has multiple inputs/outputs and multiple steps.

For example, imagine an application that uses stock data from the last few days to answer math questions like "What was the percent change in APPL over the last three days?".

An evaluation dataset for this application would have two inputs:

1. `query`: the question the user asked the application
2. `stock_prices`: A list that stores the stock prices for APPL over the last five days
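
To make the two inputs concrete, here is a minimal sketch of one test case input and the arithmetic behind its expected answer. The field names `query` and `stock_prices` come from this guide; the price values, the `percent_change` helper, and the reading of "last three days" as "first to last of the three most recent prices" are assumptions made for illustration.

```python
# Hypothetical test case input: field names follow the guide, values are made up.
test_case_input = {
    "query": "What was the percent change in APPL over the last three days?",
    "stock_prices": [100.0, 102.0, 104.0, 103.0, 110.0],  # oldest to newest
}


def percent_change(prices, days):
    """Percent change from the first to the last of the most recent `days` prices."""
    window = prices[-days:]
    return (window[-1] - window[0]) / window[0] * 100


# (110.0 - 104.0) / 104.0 * 100, rounded to two decimals
expected_output = round(percent_change(test_case_input["stock_prices"], 3), 2)
print(expected_output)  # prints 5.77
```

A value like this could serve as the test case's `expected_output`, which annotators compare against the application's answer.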

In the platform, such a dataset looks like the following:

<img src="https://mintcdn.com/scalegp/E4QRE88hrDqzkijh/images/flexible-evaluation/flexible-evaluation-dataset-example.png?fit=max&auto=format&n=E4QRE88hrDqzkijh&q=85&s=972a9662e68410f531aab3cbb0196ee9" alt="Evaluation dataset example" width="3456" height="2234" data-path="images/flexible-evaluation/flexible-evaluation-dataset-example.png" />

We can generate outputs for these test cases (see the [Flexible Evaluation Recipe](/recipes/flexible-evaluations) for an example!). We can view them in the **test case outputs** table under our application variant:

<img src="https://mintcdn.com/scalegp/E4QRE88hrDqzkijh/images/flexible-evaluation/flexible-application-outputs.png?fit=max&auto=format&n=E4QRE88hrDqzkijh&q=85&s=769c16d5269591f3c9a9b1461bc13ff7" alt="Application outputs" width="3456" height="2234" data-path="images/flexible-evaluation/flexible-application-outputs.png" />

Because the dataset has two different inputs, we need to configure what the annotators can see. Flexible evaluations provide an object for this, called an **Annotation Configuration**, which tells the platform how to render the annotation UI. The annotation configuration can pull data from different parts of the test case, test case output, and trace (using the `data_loc` field). For the full details of how annotation configurations work and how to construct them, see [here](/docs/flexible-evaluation/annotation-configuration).

```python theme={null}
annotation_config = {
    "direction": "row", # we could also change this to "col"
    "components": [
        [
            # Let's put the query and the output side by side since that's what we care about most
            { "data_loc": ["test_case_data", "input", "query"], "label": "Query" }, # Pull the "query" from the test case's input
            { "data_loc": ["test_case_output", "output"] } # Pull the entire output from running the application on the test case
        ],
        [
            # Pull the entire expected output from the test case
            { "data_loc": ["test_case_data", "expected_output"] }
        ],
        [
            # Pull the list of stock prices from the test case's input
            { "data_loc": ["test_case_data", "input", "stock_prices"] }
        ],
    ]
}
```

Because the direction is set to `row`, each inner array of the `components` key is rendered as a row, and the items inside it are displayed as columns of that row. This configuration tells the UI to display three rows:

* The first row has two boxes. The first box shows `Query` as the label and the contents of `query` from the test case input in the dataset above. The second box shows the `output` value of the `test_case_output` from the variant above.
* The second row has one box showing the contents of `expected_output` from the dataset above.
* The third row has one box showing the contents of `stock_prices` from the test case input in the dataset above.
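
The way `data_loc` selects data can be sketched in plain Python. This is only an illustration of the lookup, not the platform's implementation: `resolve_data_loc`, `render_layout`, and the sample `record` (including its values) are hypothetical, and the sketch assumes each `data_loc` is a path of keys into nested dictionaries.

```python
from typing import Any

annotation_config = {
    "direction": "row",
    "components": [
        [
            {"data_loc": ["test_case_data", "input", "query"], "label": "Query"},
            {"data_loc": ["test_case_output", "output"]},
        ],
        [{"data_loc": ["test_case_data", "expected_output"]}],
        [{"data_loc": ["test_case_data", "input", "stock_prices"]}],
    ],
}

# Hypothetical record combining one test case and its output; values are made up.
record = {
    "test_case_data": {
        "input": {
            "query": "What was the percent change in APPL over the last three days?",
            "stock_prices": [100.0, 102.0, 104.0, 103.0, 110.0],
        },
        "expected_output": "5.77%",
    },
    "test_case_output": {"output": "APPL rose about 5.77% over the last three days."},
}


def resolve_data_loc(data: dict, data_loc: list) -> Any:
    """Walk the nested dictionaries one key at a time along the data_loc path."""
    value: Any = data
    for key in data_loc:
        value = value[key]
    return value


def render_layout(config: dict, data: dict) -> list:
    """Return one list per inner array, holding (label, value) pairs for each box."""
    return [
        [(item.get("label"), resolve_data_loc(data, item["data_loc"])) for item in group]
        for group in config["components"]
    ]


layout = render_layout(annotation_config, record)
for index, boxes in enumerate(layout, start=1):
    print(f"{annotation_config['direction']} {index}: {boxes}")
```

Each printed line corresponds to one inner array of `components`; items without a `label` key show `None` as their label in this sketch.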

Here is an example of what the Annotation UI will look like based on this configuration:

<img src="https://mintcdn.com/scalegp/2sSrpizRElJqluR6/images/flexible-evaluation/recipe/mathbot_external_data.gif?s=138c1de850ee9039f458c6fc7f32c35c" alt="Annotation UI example" width="1920" height="1080" data-path="images/flexible-evaluation/recipe/mathbot_external_data.gif" />

Once the evaluation is complete, you can see the results of the annotation by visiting the evaluations page and clicking on "Table".

<img src="https://mintcdn.com/scalegp/E4QRE88hrDqzkijh/images/flexible-evaluation/flexible-evaluation-results-1.png?fit=max&auto=format&n=E4QRE88hrDqzkijh&q=85&s=60a4186ea6706ebe5a71f17de73ff48a" alt="Evaluation results 1" width="3456" height="2234" data-path="images/flexible-evaluation/flexible-evaluation-results-1.png" />

<img src="https://mintcdn.com/scalegp/E4QRE88hrDqzkijh/images/flexible-evaluation/flexible-evaluation-results-2.png?fit=max&auto=format&n=E4QRE88hrDqzkijh&q=85&s=a8adaa205b6408195158e719d832054f" alt="Evaluation results 2" width="3456" height="2234" data-path="images/flexible-evaluation/flexible-evaluation-results-2.png" />

Flexible evaluation runs have many more powerful features. For instance, you can attach **traces**, which are the intermediate steps of the application's execution, to the output and examine them while annotating. You can also use a different annotation configuration for different questions.

To learn more, continue to the full guide to flexible evaluation or read the [Flexible Evaluation Recipe](/recipes/flexible-evaluations).
