In addition to human annotators, the SGP platform supports evaluations performed by LLMs. This is called an auto evaluation: when you choose this option, the application variant is run against the evaluation dataset, and an LLM then annotates the results based on the rubric.
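Conceptually, this is an LLM-as-judge loop: each test case in the dataset is run through the application variant, and the output is sent to a judge LLM along with the rubric for annotation. The Python sketch below illustrates that flow only; `run_variant` and `ask_judge_llm` are hypothetical placeholders, not SGP SDK calls.

```python
# Minimal sketch of the auto-evaluation flow described above.
# `run_variant` and `ask_judge_llm` are hypothetical placeholders,
# not SGP SDK functions; substitute your application and judge model.
from dataclasses import dataclass

RUBRIC = "Score 1-5 for factual accuracy, and explain your reasoning."

@dataclass
class Annotation:
    test_case_id: str
    output: str
    score: int       # judge's rubric score
    rationale: str   # judge's explanation

def run_variant(prompt: str) -> str:
    """Hypothetical: invoke the application variant on one test case."""
    raise NotImplementedError

def ask_judge_llm(prompt: str) -> tuple[int, str]:
    """Hypothetical: call a judge LLM and parse (score, rationale)."""
    raise NotImplementedError

def auto_evaluate(dataset: list[dict]) -> list[Annotation]:
    annotations = []
    for case in dataset:
        output = run_variant(case["input"])      # 1. run the variant
        score, rationale = ask_judge_llm(        # 2. LLM annotates per rubric
            f"Rubric: {RUBRIC}\n"
            f"Input: {case['input']}\n"
            f"Output: {output}\n"
            "Return a score and a one-sentence rationale."
        )
        annotations.append(Annotation(case["id"], output, score, rationale))
    return annotations
```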

Evaluation types that support auto evaluations through the UI

Currently, three evaluation types support auto evaluations through the UI:

For other Flexible Dataset configurations, auto evaluations are not yet supported through the UI.