Auto Evaluations for Generation Datasets
For generation datasets, you can use an LLM to score the evaluation. This is called an auto evaluation: the application variant is run against the evaluation dataset, and an LLM then annotates the results based on the rubric.
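To make the two steps concrete, here is a minimal sketch of that flow in Python. The names (auto_evaluate, run_variant, judge_llm, RUBRIC) are purely illustrative and are not part of the platform's API; it only shows the general pattern of running the variant over each dataset item and then asking a judge LLM to annotate the output against the rubric.

```python
from typing import Callable

# Hypothetical rubric text; in practice this comes from the evaluation dataset.
RUBRIC = "Score the output from 1-5 for accuracy and completeness, with a brief justification."

def auto_evaluate(
    dataset: list[dict],                    # items with an "input" field (assumed shape)
    run_variant: Callable[[str], str],      # the application variant under test
    judge_llm: Callable[[str], str],        # any LLM call that returns annotation text
) -> list[dict]:
    results = []
    for item in dataset:
        # Step 1: run the application variant against the dataset item.
        output = run_variant(item["input"])

        # Step 2: have an LLM annotate the result based on the rubric.
        prompt = (
            f"Rubric:\n{RUBRIC}\n\n"
            f"Input:\n{item['input']}\n\n"
            f"Output:\n{output}\n\n"
            "Annotate the output according to the rubric."
        )
        results.append({
            "input": item["input"],
            "output": output,
            "annotation": judge_llm(prompt),
        })
    return results
```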
How to run an auto evaluation
For generation datasets, you can enable auto evaluation when configuring the evaluation run on the variant: select either Auto-Evaluation or Hybrid (a hybrid evaluation runs an auto evaluation and also allows humans to annotate).
The evaluation runs as an asynchronous background job. When it completes, you can view the updated status on the application page.