📘 Get code for this guide

This guide is based on the Flexible Evaluation Recipe.

With Flexible Evaluations, you can evaluate an application variant that has multiple inputs and outputs and multiple steps. For example, imagine you have an application that answers math questions using stock data from the last few days, such as “What was the percent change in APPL over the last three days?”. An evaluation dataset for this application would have two inputs:
- `query`: the question the user asked the application
- `stock_prices`: a list that stores the stock prices for APPL over the last five days

[...] `data_loc` field). You can see the full details of how annotation configurations work, and how to construct them, here.
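To make the dataset inputs described above concrete, here is a minimal sketch of what one test case could look like as a plain Python dictionary. Only the field names `query` and `stock_prices` come from this guide; the nesting under `input`, the price values, and the `percent_change` helper are hypothetical and exist only for illustration. The sketch also assumes that "the last three days" means the last three entries of the price list, ordered oldest to newest.

```python
# Hypothetical test case for the stock-question example.
# `query` and `stock_prices` are the two inputs named in the guide;
# the "input" wrapper and all values are illustrative assumptions.
test_case = {
    "input": {
        "query": "What was the percent change in APPL over the last three days?",
        # Five made-up daily prices, ordered oldest to newest.
        "stock_prices": [100.0, 102.0, 100.0, 104.0, 110.0],
    },
}


def percent_change(prices, days):
    """Percent change from the first to the last of the final `days` prices."""
    window = prices[-days:]
    start, end = window[0], window[-1]
    return (end - start) / start * 100.0


# A reference answer an evaluator could compare the application's output to.
expected_change = percent_change(test_case["input"]["stock_prices"], 3)
```

With the made-up prices above, the last three entries are 100.0, 104.0, and 110.0, so the reference answer is a 10% increase.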
[...] `row`, the outer array of the `components` key is rendered as rows and the inner array is displayed as columns of that row. This configuration tells the UI to display three rows:
- The first row has two boxes. The first box shows `query` as the label and the contents of `query` from the dataset in the example above. The second box shows the `output` value of the `test_case_output` from the variant above.
- The second row shows `current_date` as the label and the contents of `current_date` from the dataset above.
- The third row shows `expected_output` as the title and the contents of `expected_output` from the dataset above.
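The three-row layout described above can be sketched as a nested Python structure. This is an illustrative guess rather than the platform's confirmed schema: the `components` key, the `data_loc` field, and the `row` direction appear in this guide, but the `direction` and `label` key names and the path-style lists used for `data_loc` are assumptions. The `describe_layout` helper is hypothetical and not part of any SDK; it only prints the structure so the rows and columns are easy to check.

```python
# Sketch of a row-oriented annotation configuration (schema partly assumed).
# Outer list of "components" = rows; each inner list = columns of that row.
annotation_config = {
    "direction": "row",  # assumed key name for the layout direction
    "components": [
        [  # Row 1: two boxes side by side
            {"data_loc": ["test_case_data", "input", "query"], "label": "query"},
            {"data_loc": ["test_case_output", "output"]},
        ],
        [  # Row 2: one box
            {"data_loc": ["test_case_data", "input", "current_date"],
             "label": "current_date"},
        ],
        [  # Row 3: one box
            {"data_loc": ["test_case_data", "expected_output"],
             "label": "expected_output"},
        ],
    ],
}


def describe_layout(config):
    """Return one summary string per row, listing the data each box points to."""
    lines = []
    for row_index, row in enumerate(config["components"], start=1):
        boxes = [".".join(component["data_loc"]) for component in row]
        lines.append(f"row {row_index}: " + " | ".join(boxes))
    return lines


for line in describe_layout(annotation_config):
    print(line)
```

Keeping the configuration as plain nested lists makes the row and column mapping easy to verify before passing it to whatever client call the platform actually expects.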