# Auto Evaluation

> Leverage LLM-based tasks to produce evaluation results.

## Guided Decoding

Use the `auto_evaluation.guided_decoding` task type to configure auto evaluation tasks where the set of possible results is well defined. The task configuration accepts the following:

- `model`: The identifier of the model (in `<provider>/<model>` format, e.g. `openai/o3-mini`) used to generate the auto evaluation result. This model must support some form of guided decoding (e.g. [OpenAI's response formatting](https://platform.openai.com/docs/guides/structured-outputs)) for results to be computed.
- An optional system prompt, sent as the first message of the chat completion request.
- `prompt`: The user prompt containing the evaluation question and any relevant data from the evaluation item (see [referencing item data](/docs/v5/next-gen-evaluation/overview#referencing-item-data)).
- A condition that determines whether the model is called for each row.
- `response_format`: The responses for the model to return, each with a name and a type configuration, expressed as a JSON Schema object.
- Any additional properties to include in the chat completion request to the model (e.g. `{ "temperature": 0 }`).

### Example Usage

The following basic example defines a guided decoding task that evaluates the correctness of a generated output against the ground truth.

```python
# Assumes `client` is an already-initialized client instance.
client.evaluations.create(
    name="Example Correctness Evaluation",
    data=[
        {
            "input": "What color is the sky?",
            "expected_output": "Blue",
            "generated_output": "The sky appears blue during ..."
        },
        ...
    ],
    tasks=[
        {
            "task_type": "auto_evaluation.guided_decoding",
            "alias": "correctness",
            "configuration": {
                "model": "openai/o3-mini",
                "prompt": """
                Given the user's query: {{item.input}},
                The agent's response was: {{item.generated_output}}
                The expected response is: {{item.expected_output}}
                Did the agent's response fully represent the expected response?
                """,
                # Constrain the judge to answer with exactly "yes" or "no".
                "response_format": {
                    "type": "object",
                    "properties": {
                        "question-response": {
                            "type": "string",
                            "enum": ["yes", "no"]
                        }
                    },
                    "required": ["question-response"]
                }
            }
        }
    ]
)
```

When defining a task, you can also customize the response format. For example, you can have the judge LLM explain the reasoning behind its final score, which is often helpful. The following example demonstrates several options: a boolean, a bounded integer, a bounded decimal, a string enum, and a free-text explanation.
```python
tasks = [
    {
        "task_type": "auto_evaluation.guided_decoding",
        "alias": "multi_response_option_judge",
        "configuration": {
            "model": "openai/gpt-4o",
            "prompt": "Evaluate this response...",
            "response_format": {
                "type": "object",
                "properties": {
                    "is_helpful": {
                        "type": "boolean",
                        "description": "Whether the response is helpful"
                    },
                    "quality_score": {
                        "type": "integer",
                        "minimum": 1,
                        "maximum": 5,
                        "description": "Quality score from 1 to 5"
                    },
                    "accuracy_score": {
                        "type": "number",
                        "minimum": 0.0,
                        "maximum": 1.0,
                        "description": "Accuracy as a decimal"
                    },
                    "category": {
                        "type": "string",
                        "enum": ["excellent", "good", "fair", "poor"],
                        "description": "Quality category"
                    },
                    "reasoning": {
                        "type": "string",
                        "description": "Explanation of the evaluation"
                    }
                },
                "required": ["is_helpful", "quality_score", "accuracy_score", "category", "reasoning"]
            }
        }
    }
]
```
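Guided decoding prompts reference item fields with `{{item.<field>}}` placeholders. Before you launch an evaluation, it can help to preview the prompt the judge will actually receive for one item. The sketch below is a local approximation only. It assumes simple substitution of top-level item fields. `render_prompt` is a hypothetical helper, not part of the client library. The platform's real templating rules (see [referencing item data](/docs/v5/next-gen-evaluation/overview#referencing-item-data)) may support more than this.

```python
import re

# Matches placeholders such as {{item.input}} or {{ item.expected_output }}.
# Assumption: only top-level item fields are referenced.
_PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def render_prompt(template: str, item: dict) -> str:
    """Hypothetical helper: substitute {{item.<field>}} placeholders locally.

    Raises KeyError if the template references a field missing from the item,
    which surfaces typos before any evaluation is launched.
    """

    def substitute(match: re.Match) -> str:
        field = match.group(1)
        if field not in item:
            raise KeyError(f"item has no field {field!r}")
        return str(item[field])

    return _PLACEHOLDER.sub(substitute, template)


item = {
    "input": "What color is the sky?",
    "expected_output": "Blue",
    "generated_output": "The sky is blue.",
}
template = (
    "Given the user's query: {{item.input}},\n"
    "The agent's response was: {{item.generated_output}}\n"
    "The expected response is: {{item.expected_output}}"
)
print(render_prompt(template, item))
```

Raising on a missing field, rather than leaving the placeholder in place, is a deliberate choice: a misspelled field name is much easier to spot locally than in a batch of confusing judge results.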
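Because the judge's answer is constrained by `response_format`, a quick local check of a sample result against the schema can catch mismatches before results are aggregated, such as a score outside its range or an unexpected category. The sketch below is a minimal validator. It assumes only the JSON Schema keywords used in the examples above: `type`, `enum`, `minimum`, `maximum` and `required`. `check_judge_output` is a hypothetical helper, not part of the client library. For full JSON Schema support, a dedicated validator such as the `jsonschema` package is a better choice.

```python
from typing import Any

# Maps the JSON Schema type names used above to Python checks.
# bool is a subclass of int in Python, so it is excluded explicitly
# from "integer" and "number".
_TYPE_CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def check_judge_output(output: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if output conforms).

    Supports only the keywords type, enum, minimum, maximum, and required.
    """
    problems = []
    for name in schema.get("required", []):
        if name not in output:
            problems.append(f"missing required property {name!r}")
    for name, spec in schema.get("properties", {}).items():
        if name not in output:
            continue
        value = output[name]
        check = _TYPE_CHECKS.get(spec.get("type"))
        if check is not None and not check(value):
            problems.append(f"{name!r} should be of type {spec['type']}")
            continue  # enum and range checks assume the value has the right type
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name!r} must be one of {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            problems.append(f"{name!r} is below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            problems.append(f"{name!r} is above maximum {spec['maximum']}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "quality_score": {"type": "integer", "minimum": 1, "maximum": 5},
        "category": {"type": "string", "enum": ["excellent", "good", "fair", "poor"]},
    },
    "required": ["quality_score", "category"],
}
print(check_judge_output({"quality_score": 4, "category": "good"}, schema))  # → []
# Reports the out-of-range score and the unknown category.
print(check_judge_output({"quality_score": 7, "category": "great"}, schema))
```

Returning a list of messages instead of raising on the first error makes it easy to see every problem in a sample result at once.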