Currently, the SGP platform supports evaluations performed by LLMs in addition to humans. This is called an auto evaluation: if you choose this option, the application variant is run against the evaluation dataset, and an LLM then annotates the results based on the rubric.
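
The auto-evaluation flow can be sketched as follows. This is a minimal, self-contained illustration, not SGP's actual API: `application_variant`, `llm_judge`, and the rubric are hypothetical stand-ins (a real judge would call an LLM with the rubric and the variant's output, then parse its verdict).

```python
# Hypothetical sketch of an auto evaluation: run the application variant
# over each item in the evaluation dataset, then have an LLM "judge"
# annotate each output against a rubric. All names here are stand-ins.

RUBRIC = "Score 1 if the answer names the capital city, else 0."

def application_variant(prompt: str) -> str:
    # Stand-in for the application variant under evaluation.
    if "France" in prompt:
        return "The capital of France is Paris."
    return "I don't know."

def llm_judge(output: str, rubric: str) -> int:
    # Stand-in for the LLM annotator; a real implementation would
    # prompt a model with the rubric and output and parse its score.
    return 1 if "capital" in output.lower() else 0

def auto_evaluate(dataset: list[str]) -> list[dict]:
    # Run the variant on every dataset item, then annotate each result.
    results = []
    for prompt in dataset:
        output = application_variant(prompt)
        results.append({
            "prompt": prompt,
            "output": output,
            "score": llm_judge(output, RUBRIC),
        })
    return results

annotations = auto_evaluate([
    "What is the capital of France?",
    "Tell me a joke.",
])
```

Each annotation pairs the variant's output with the judge's rubric-based score, which is the shape of result an auto evaluation produces.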