Auto Evaluation
Auto Evaluations Overview
In addition to human evaluations, the SGP platform currently supports evaluations performed by LLMs. This is called an auto evaluation: if you choose this option, the application variant is run against the evaluation dataset, and an LLM then annotates the results based on the rubric.
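For intuition, the sketch below illustrates this two-step flow in Python. It is a minimal, illustrative example only, not the SGP SDK: `run_variant`, `call_llm`, and the rubric text are hypothetical placeholders standing in for the application variant under test and the annotating LLM.

```python
# Illustrative sketch of the auto-evaluation flow (not the SGP SDK).
# run_variant and call_llm are hypothetical callables supplied by the caller.

from typing import Callable

RUBRIC = (
    "Score the answer from 1 (poor) to 5 (excellent) for factual accuracy "
    "and faithfulness to the reference. Reply with the score only."
)

def auto_evaluate(
    dataset: list[dict],                # evaluation dataset: {"input": ..., "expected": ...}
    run_variant: Callable[[str], str],  # the application variant under test
    call_llm: Callable[[str], str],     # the annotating LLM (any provider)
) -> list[dict]:
    """Run the variant over the dataset, then have an LLM grade each output."""
    results = []
    for item in dataset:
        # Step 1: the application variant generates an output for each dataset item.
        output = run_variant(item["input"])
        # Step 2: the annotating LLM scores that output against the rubric.
        prompt = (
            f"{RUBRIC}\n\n"
            f"Question: {item['input']}\n"
            f"Reference: {item.get('expected', 'N/A')}\n"
            f"Answer: {output}"
        )
        score = call_llm(prompt)
        results.append({"input": item["input"], "output": output, "score": score.strip()})
    return results
```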
Evaluation types that support auto evaluations through the UI
Currently, three evaluation types support auto evaluations through the UI:
- Generation Datasets: for RAG use cases
- Flexible Datasets: for Summarization use cases
- Flexible Datasets: for Translation use cases
For other Flexible Dataset configurations, auto evaluations are not yet supported through the UI.