The identifier of the model (in `<model_provider>/<model_name>` format) used to generate the auto-evaluation result. Note that this model must support some form of guided decoding (e.g. OpenAI's response formatting) in order for results to be computed.
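For context, "guided decoding" here means constraining the judge model's output to a fixed set of values. As a minimal sketch of one such mechanism, the snippet below uses OpenAI's Structured Outputs (the `response_format` parameter of the Chat Completions API) to force a Yes/No answer. The evaluation service's internal mechanism is not specified here, so treat this as an illustration only.

```python
# Illustrative sketch: constraining a judge model to a fixed set of
# choices via OpenAI Structured Outputs. The evaluation service may
# use a different mechanism internally (assumption).
from openai import OpenAI

openai_client = OpenAI()

response = openai_client.chat.completions.create(
    model="o3-mini",
    messages=[
        {
            "role": "user",
            "content": "Did the agent's response fully represent "
                       "the expected response? Answer Yes or No.",
        }
    ],
    # A strict JSON schema with an enum restricts the model's output
    # to exactly one of the allowed choices.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "verdict",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "answer": {"type": "string", "enum": ["Yes", "No"]}
                },
                "required": ["answer"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # e.g. {"answer": "Yes"}
```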
The following basic example defines a guided decoding task that evaluates the correctness of each generated output against its expected (ground-truth) output.
```python
client.evaluations.create(
    name="Example Correctness Evaluation",
    data=[
        {
            "input": "What color is the sky?",
            "expected_output": "Blue",
            "generated_output": "The sky appears blue during ..."
        },
        ...
    ],
    tasks=[
        {
            "task_type": "auto_evaluation.guided_decoding",
            "alias": "correctness",
            "configuration": {
                "model": "openai/o3-mini",
                "prompt": """
                Given the user's query: {{item.input}},
                The agent's response was: {{item.generated_output}}
                The expected response is: {{item.expected_output}}
                Did the agent's response fully represent the expected response?
                """,
                "choices": ["Yes", "No"]
            }
        }
    ]
)
```
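The `{{item.*}}` placeholders in the prompt are filled from each row of `data` before the prompt is sent to the judge model. As a rough sketch, assuming Jinja2/Mustache-style templating (which the `{{...}}` syntax suggests; the service's actual engine is not documented here):

```python
# Sketch of per-row prompt rendering, assuming Jinja2-style templating.
from jinja2 import Template

prompt_template = Template(
    "Given the user's query: {{ item.input }},\n"
    "The agent's response was: {{ item.generated_output }}\n"
    "The expected response is: {{ item.expected_output }}\n"
    "Did the agent's response fully represent the expected response?"
)

item = {
    "input": "What color is the sky?",
    "expected_output": "Blue",
    "generated_output": "The sky appears blue during ...",
}
print(prompt_template.render(item=item))
```

Under guided decoding, each rendered prompt can only be answered with one of the configured `choices` ("Yes" or "No" here), which becomes the task's result for that row.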