
Create Evaluation


Add Evaluation Details

Add the Evaluation name, description (optional), and tags (optional), then select a dataset.

Add an Agent Judge

An Agent Judge uses a systematic approach to evaluate your dataset. You can either select an existing agent judge or create a new one. The dropdown lists all existing judges, both LLM and Agent judges.

Agent Judge Types

We offer two built-in agent judges: the Default Agent and the IF Agent.
  • Default Agent - Used for general-purpose evaluations.
  • IF Agent - Excels at systematically extracting the instructions in a given prompt and assessing a response’s adherence to each instruction. The IF Agent can identify implicit and explicit instructions, structure constraints, content requests, and behavioral requirements (tone, audience, vocabulary); see the sketch below for an illustration.
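As a concrete illustration, here is a minimal sketch of the kind of instruction breakdown an IF-style judge works from. The prompt, the instruction categories, and the adherence_report helper are hypothetical examples for this page only; they are not output from, or an API of, the built-in IF Agent.

```python
# Hypothetical illustration of IF-style instruction extraction.
# The prompt, the extracted instructions, and the scoring helper are
# examples only -- they are not produced by the built-in IF Agent.

prompt = (
    "Summarize the attached report in under 150 words, "
    "use a neutral tone, and end with three bullet-point takeaways."
)

# An IF-style judge decomposes the prompt into individual instructions,
# then checks the response's adherence to each one.
extracted_instructions = [
    {"instruction": "Summarize the attached report", "type": "content request"},
    {"instruction": "Keep the summary under 150 words", "type": "structure constraint"},
    {"instruction": "Use a neutral tone", "type": "behavioral requirement"},
    {"instruction": "End with three bullet-point takeaways", "type": "structure constraint"},
]

def adherence_report(instructions, adhered_flags):
    """Pair each extracted instruction with a pass/fail flag and compute
    an overall adherence ratio (illustrative scoring only)."""
    per_instruction = [
        {**inst, "adhered": flag}
        for inst, flag in zip(instructions, adhered_flags)
    ]
    score = sum(adhered_flags) / len(adhered_flags)
    return {"instructions": per_instruction, "adherence_score": score}

# Example: the judged response followed 3 of the 4 extracted instructions.
print(adherence_report(extracted_instructions, [True, True, False, True]))
```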

Configure Agent Judge

Each Agent Judge has its own parameters that can be configured (a sketch of a complete configuration follows this list):
  • Model - The model that the Agent Judge uses to evaluate the dataset.
  • Output Column Name - The name of the column under which this judge's results will appear in your evaluation results.
  • Description - A description of what the agent judge is evaluating.
  • Output Rules - A space to provide additional instructions to the agent.
  • Output Type (optional) - Text, integer, float, or boolean. If not provided, output type is inferred from output rules.
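As a rough sketch, the parameters above can be pictured as the configuration below. The key names mirror the UI labels, but the dictionary form, the example model name, and the example values are assumptions rather than a documented API; in practice you set these fields in the UI.

```python
# Hypothetical Agent Judge configuration mirroring the UI fields above.
# The dictionary shape, key names, model identifier, and values are
# assumptions for illustration -- configure the judge through the UI.

agent_judge_config = {
    # Model the Agent Judge uses to evaluate the dataset (example name).
    "model": "gpt-4o",
    # Column the judge's results will appear under in the evaluation results.
    "output_column_name": "if_adherence",
    # What the agent judge is evaluating.
    "description": "Checks whether each response follows every instruction in the prompt.",
    # Additional instructions for the agent.
    "output_rules": (
        "Return a score between 0 and 1 equal to the fraction of "
        "extracted instructions the response adheres to."
    ),
    # Optional: text, integer, float, or boolean.
    # If omitted, the output type is inferred from the output rules
    # (the rule above implies a float).
    "output_type": "float",
}
```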

Create Evaluation

Select the rows of the dataset you want to run the evaluation on, then click Create Evaluation.

View Evaluation Results

Navigate back to the Evaluation tab to see the results of the evaluation.

Data

The Data page will have a column with the results of the Agent Judge.
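For example, assuming a judge configured with the Output Column Name if_adherence, the results table might look roughly like the sketch below; the row values and surrounding column names are placeholders, not real evaluation output.

```python
# Hypothetical view of the results table; all values are placeholders.
import pandas as pd

results = pd.DataFrame(
    {
        "input": ["Summarize the report...", "Draft a polite reply..."],
        "response": ["<model response 1>", "<model response 2>"],
        # Column name matches the Output Column Name configured earlier.
        "if_adherence": [0.75, 1.0],
    }
)

print(results)
```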

Overview

The Overview page will have a graph with a visual representation of the evaluation results.