Agent-as-a-Judge

Use the auto_evaluation.agent task type for configuring auto evaluation tasks with specialized agents.

Example Usage

The following basic example defines an agent task that uses an Instruction Following Agent to evaluate how well a given response adheres to its prompt.
client.evaluations.create(
  name="Example Instruction Following Evaluation",
  data=[
    {
      "prompt": "You must either describe the process of photosynthesis in exactly 20 words or explain why it’s important in a single sentence under 10 words — choose whichever option is more precise.",
      "response": "Photosynthesis converts sunlight, water, and carbon dioxide into glucose and oxygen, sustaining plant life and fueling ecosystems.",
    },
    ...
  ],
  tasks=[
    {
      "task_type": "auto_evaluation.agent",
      "alias": "instruction-following-agent",
      "configuration": {
        "name": "instruction-following-agent",
        "definition": "Perform comprehensive analysis of instruction following",
        "output_rules": [
          "Score adherence on a 1-5 scale",
          "List specific violations and well-followed instructions"
        ],
        "output_type": "integer",
        "designated_to": {
          "agent_name": "IFAgent",
          "config": {
            "model": "openai/o3-mini"
          }
        }
      }
    }
  ]
)
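When configuring several judge agents, it can help to assemble each task entry as a plain dictionary before passing it to client.evaluations.create(tasks=[...]). The sketch below builds the same task configuration shown above; the helper function name is hypothetical, and the field values mirror the example.

```python
def make_agent_task(alias: str, model: str) -> dict:
    """Assemble one auto_evaluation.agent task entry (hypothetical helper).

    Mirrors the example configuration above: a 1-5 integer adherence score
    produced by an Instruction Following Agent backed by the given model.
    """
    return {
        "task_type": "auto_evaluation.agent",
        "alias": alias,
        "configuration": {
            "name": alias,
            "definition": "Perform comprehensive analysis of instruction following",
            "output_rules": [
                "Score adherence on a 1-5 scale",
                "List specific violations and well-followed instructions",
            ],
            "output_type": "integer",
            "designated_to": {
                "agent_name": "IFAgent",
                "config": {"model": model},
            },
        },
    }

task = make_agent_task("instruction-following-agent", "openai/o3-mini")
print(task["task_type"])  # auto_evaluation.agent
```

The resulting dictionary can be appended to the tasks list alongside other task types, keeping each agent's configuration self-contained.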