Currently, the SGP platform supports evaluations done by LLMs in addition to humans. This is called an auto evaluation: if you choose this option, the application variant is run against the evaluation dataset, and then an LLM annotates the results based on the rubric.
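The flow above (run the variant over the dataset, then have an LLM judge score each output against the rubric) can be sketched as follows. This is a minimal illustration, not the SGP SDK: `run_variant` and `judge_with_llm` are hypothetical stand-ins for the application variant and the LLM annotator.

```python
# Hypothetical sketch of the auto-evaluation flow: the application variant
# produces an output per dataset item, then an "LLM judge" annotates that
# output against a rubric. run_variant / judge_with_llm are illustrative
# placeholders, not real SGP API calls.

RUBRIC = "Score 1 if the answer contains the expected keyword, else 0."

def run_variant(item):
    """Stand-in for running the application variant on one dataset item."""
    return f"Answer about {item['question']}"

def judge_with_llm(output, expected, rubric):
    """Stand-in for the LLM annotator; here, a trivial keyword check."""
    return 1 if expected in output else 0

def auto_evaluate(dataset):
    """Run the variant on every item, then annotate each output."""
    annotations = []
    for item in dataset:
        output = run_variant(item)
        score = judge_with_llm(output, item["expected"], RUBRIC)
        annotations.append({"question": item["question"], "score": score})
    return annotations

dataset = [
    {"question": "RAG", "expected": "RAG"},
    {"question": "summaries", "expected": "translation"},
]
print(auto_evaluate(dataset))
```

In the real platform, the judge step is an LLM call guided by your rubric rather than a keyword check; the loop structure is the same.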
Evaluation types that support auto evaluations through the UI

Currently, three types of evaluations support auto evaluations through the UI:

- Generation Datasets: for RAG use cases
- Flexible Datasets: for Summarization use cases
- Flexible Datasets: for Translation use cases

