Guided Decoding

Use the auto_evaluation.guided_decoding task type to configure auto evaluation tasks where the set of possible results is well defined. Instead of producing free-form text, the judge model's answer is constrained to one of the configured choices.

Example Usage

The following basic example defines a guided decoding task that evaluates the correctness of a generated output against the ground truth.

client.evaluations.create(
  name="Example Correctness Evaluation",
  # Each data item supplies the fields referenced by the prompt template below.
  data=[
    {
      "input": "What color is the sky?",
      "expected_output": "Blue",
      "generated_output": "The sky appears blue during ..."
    },
    ...
  ],
  tasks=[
    {
      "task_type": "auto_evaluation.guided_decoding",
      "alias": "correctness",
      "configuration": {
        # The judge model used to score each item.
        "model": "openai/o3-mini",
        # {{item.*}} placeholders are filled in from each data item.
        "prompt": """
          Given the user's query: {{item.input}},
          The agent's response was: {{item.generated_output}}
          The expected response is: {{item.expected_output}}
          Did the agent's response fully represent the expected response?
        """,
        # The judge's answer is constrained to exactly one of these values.
        "choices": ["Yes", "No"]
      }
    }
  ]
)
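To make the constraint concrete, here is a minimal sketch (hypothetical, not the platform's actual implementation) of the idea behind guided decoding: the judge's output is restricted to the configured choice set, so the result for each item is always exactly one of the allowed strings. The `select_choice` helper and its score dictionary are illustrative stand-ins for the per-option likelihoods a judge model would assign.

```python
def select_choice(scores: dict[str, float], choices: list[str]) -> str:
    """Pick the allowed choice with the highest model-assigned score.

    `scores` stands in for the log-probabilities a judge model might
    assign to candidate answers; only entries listed in `choices` are
    eligible, so the verdict can never fall outside the configured set.
    """
    allowed = {c: scores.get(c, float("-inf")) for c in choices}
    return max(allowed, key=allowed.get)


# Even if the model also scored an out-of-set answer like "Maybe",
# the verdict is forced to one of the configured choices.
verdict = select_choice({"Yes": -0.1, "No": -2.3, "Maybe": -0.5}, ["Yes", "No"])
```

Here `verdict` is "Yes": "Maybe" scores higher than "No" but is not in the choice set, so it is never eligible. This is what makes guided decoding results easy to aggregate; every item maps cleanly onto a fixed label set.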