
Create Evaluation


Add Evaluation Details

Add the Evaluation name, description (optional), and tags (optional), then select a dataset.

Add an Agent Judge

An Agent Judge uses a systematic approach to evaluate your dataset. You can either select an existing agent judge or create a new one. The dropdown lists all existing judges, both LLM and Agent judges.

Agent Judge Types

We offer two built-in agent judges: the Default Agent and the IF Agent.
  • Default Agent - Used for general-purpose evaluations.
  • IF Agent - Excels at systematically extracting the instructions in a given prompt and assessing a response’s adherence to each instruction. The IF Agent can identify implicit and explicit instructions, structure constraints, content requests, and behavioral requirements (tone, audience, vocabulary); see the sketch below for an illustration.
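As a concrete illustration, here is a minimal sketch of the kind of instruction breakdown an IF-style judge works from. The prompt, the instruction categories, and the adherence_report helper are hypothetical examples for this page only; they are not output from, or an API of, the built-in IF Agent.

```python
# Hypothetical illustration of IF-style instruction extraction.
# The prompt, the extracted instructions, and the scoring helper are
# examples only -- they are not produced by the built-in IF Agent.

prompt = (
    "Summarize the attached report in under 150 words, "
    "use a neutral tone, and end with three bullet-point takeaways."
)

# An IF-style judge decomposes the prompt into individual instructions,
# then checks the response's adherence to each one.
extracted_instructions = [
    {"instruction": "Summarize the attached report", "type": "content request"},
    {"instruction": "Keep the summary under 150 words", "type": "structure constraint"},
    {"instruction": "Use a neutral tone", "type": "behavioral requirement"},
    {"instruction": "End with three bullet-point takeaways", "type": "structure constraint"},
]

def adherence_report(instructions, adhered_flags):
    """Pair each extracted instruction with a pass/fail flag and compute
    an overall adherence ratio (illustrative scoring only)."""
    per_instruction = [
        {**inst, "adhered": flag}
        for inst, flag in zip(instructions, adhered_flags)
    ]
    score = sum(adhered_flags) / len(adhered_flags)
    return {"instructions": per_instruction, "adherence_score": score}

# Example: the judged response followed 3 of the 4 extracted instructions.
print(adherence_report(extracted_instructions, [True, True, False, True]))
```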

Configure Agent Judge

Each Agent Judge has its own parameters that can be configured (a sketch of a complete configuration follows this list):
  • Model - The model that the Agent Judge uses to evaluate the dataset.
  • Output Column Name - The name of the column under which this judge's results will appear in your evaluation results.
  • Description - A description of what the agent judge is evaluating.
  • Output Rules - A space to provide additional instructions to the agent.
  • Output Type (optional) - Text, integer, float, or boolean. If not provided, output type is inferred from output rules.
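As a rough sketch, the parameters above can be pictured as the configuration below. The key names mirror the UI labels, but the dictionary form, the example model name, and the example values are assumptions rather than a documented API; in practice you set these fields in the UI.

```python
# Hypothetical Agent Judge configuration mirroring the UI fields above.
# The dictionary shape, key names, model identifier, and values are
# assumptions for illustration -- configure the judge through the UI.

agent_judge_config = {
    # Model the Agent Judge uses to evaluate the dataset (example name).
    "model": "gpt-4o",
    # Column the judge's results will appear under in the evaluation results.
    "output_column_name": "if_adherence",
    # What the agent judge is evaluating.
    "description": "Checks whether each response follows every instruction in the prompt.",
    # Additional instructions for the agent.
    "output_rules": (
        "Return a score between 0 and 1 equal to the fraction of "
        "extracted instructions the response adheres to."
    ),
    # Optional: text, integer, float, or boolean.
    # If omitted, the output type is inferred from the output rules
    # (the rule above implies a float).
    "output_type": "float",
}
```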

Create Evaluation

Select the rows of the dataset you want to run the evaluation on, then click Create Evaluation.

View Evaluation Results

Navigate back to the Evaluation tab to see the results of the evaluation.

Data

The Data page will have a column with the results of the Agent Judge.
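For example, assuming a judge configured with the Output Column Name if_adherence, the results table might look roughly like the sketch below; the row values and surrounding column names are placeholders, not real evaluation output.

```python
# Hypothetical view of the results table; all values are placeholders.
import pandas as pd

results = pd.DataFrame(
    {
        "input": ["Summarize the report...", "Draft a polite reply..."],
        "response": ["<model response 1>", "<model response 2>"],
        # Column name matches the Output Column Name configured earlier.
        "if_adherence": [0.75, 1.0],
    }
)

print(results)
```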

Overview

The Overview page will have a graph with a visual representation of the evaluation results.