Agent-as-a-Judge

Use the auto_evaluation.agent task type for configuring auto evaluation tasks with specialized agents.

Example Usage

The following basic example defines an agent task that uses an Instruction Following Agent to evaluate how well a given response adheres to its prompt.
client.evaluations.create(
  name="Example Instruction Following Evaluation",
  data=[
    {
      "prompt": "You must either describe the process of photosynthesis in exactly 20 words or explain why it’s important in a single sentence under 10 words — choose whichever option is more precise.",
      "response": "Photosynthesis converts sunlight, water, and carbon dioxide into glucose and oxygen, sustaining plant life and fueling ecosystems.",
    },
    ...
  ],
  tasks=[
    {
      "task_type": "auto_evaluation.agent",
      "alias": "instruction-following-agent",
      "configuration": {
        "name": "instruction-following-agent",
        "definition": "Perform comprehensive analysis of instruction following",
        "output_rules": [
          "Score adherence on a 1-5 scale",
          "List specific violations and well-followed instructions"
        ],
        "output_type": "integer",
        "designated_to": {
          "agent_name": "IFAgent",
          "config": {
            "model": "openai/o3-mini"
          }
        }
      }
    }
  ]
)
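When configuring several judge agents, it can help to assemble each task entry as a plain dictionary before passing it to client.evaluations.create(tasks=[...]). The sketch below builds the same task configuration shown above; the helper function name is hypothetical, and the field values mirror the example.

```python
def make_agent_task(alias: str, model: str) -> dict:
    """Assemble one auto_evaluation.agent task entry (hypothetical helper).

    Mirrors the example configuration above: a 1-5 integer adherence score
    produced by an Instruction Following Agent backed by the given model.
    """
    return {
        "task_type": "auto_evaluation.agent",
        "alias": alias,
        "configuration": {
            "name": alias,
            "definition": "Perform comprehensive analysis of instruction following",
            "output_rules": [
                "Score adherence on a 1-5 scale",
                "List specific violations and well-followed instructions",
            ],
            "output_type": "integer",
            "designated_to": {
                "agent_name": "IFAgent",
                "config": {"model": model},
            },
        },
    }

task = make_agent_task("instruction-following-agent", "openai/o3-mini")
print(task["task_type"])  # auto_evaluation.agent
```

The resulting dictionary can be appended to the tasks list alongside other task types, keeping each agent's configuration self-contained.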