For generation datasets, you have the option of using an LLM to score the evaluation. This is called an auto-evaluation: if you choose this option, the application variant is run against the evaluation dataset, and an LLM then annotates the results based on the rubric.
How to run an auto-evaluation
For generation datasets, you can enable auto-evaluation when configuring the evaluation run on the variant: select either Auto-Evaluation or Hybrid. A hybrid evaluation runs an auto-evaluation and also allows humans to annotate the results.
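Conceptually, an auto-evaluation loops over the variant's outputs and has an LLM judge score each one against the rubric. The sketch below is purely illustrative and is not the Scale GP SDK; the `judge` function is a deterministic stand-in for what would, in practice, be a prompt to an LLM containing the rubric and the output.

```python
# Illustrative sketch of an auto-evaluation loop (NOT the Scale GP API).
# A real auto-evaluation would call an LLM judge; here `judge` is a
# deterministic stand-in so the example is self-contained.

def judge(output: str, rubric: str) -> int:
    """Hypothetical stand-in for an LLM judge.

    Returns a 1-5 score. A real judge would send the rubric and the
    output to an LLM and parse its score from the response.
    """
    return 5 if "answer" in output.lower() else 1

def auto_evaluate(outputs: list[str], rubric: str) -> list[dict]:
    """Score every output from the application variant against the rubric."""
    return [{"output": o, "score": judge(o, rubric)} for o in outputs]

results = auto_evaluate(
    ["The answer is 42.", "I don't know."],
    rubric="The response must directly answer the question.",
)
print(results)
```

In a hybrid evaluation, these LLM-assigned scores would sit alongside human annotations on the same results, letting you compare the two.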

