The QueryAST (Query Abstract Syntax Tree) is a SQL-like query language for aggregating and filtering evaluation data in dashboard widgets. This guide covers the complete query syntax and advanced features.

Documentation Index
Fetch the complete documentation index at: https://docs.gp.scale.com/llms.txt
Use this file to discover all available pages before exploring further.
Query Basics
When Are Queries Required?
Queries are required for data-driven widgets:
- METRIC: Requires a MetricQuery (single aggregation)
- TABLE: Requires a SeriesQuery (multiple rows/columns)
- BAR: Requires a SeriesQuery (grouped data)
- HISTOGRAM: Requires a SeriesQuery (distribution data)
- DONUT: Requires a SeriesQuery (grouped data for slices)
- SCATTER: Requires a SeriesQuery (two or more numeric columns)
- TIMESERIES: Requires a SeriesQuery (aggregations over time)
- MARKDOWN: Static content only
- HEADING: Text display only
Data Sources
QueryAST can access two data sources for each evaluation item:

data.* - User-defined data
- Custom fields you provided when creating the evaluation
- Examples:
data.input, data.expected_output, data.category, data.score
task_result_cache.* - Task execution results
- Results from evaluation tasks (chat completions, agent runs, etc.)
- Examples:
task_result_cache.llm_score, task_result_cache.latency_ms, task_result_cache.metadata.confidence
Both data sources support nested fields using dot notation. For example, data.metadata.model_version or task_result_cache.output.reasoning.

Query Structure Overview
A query is a JSON object defining what data to select, how to aggregate it, and what filters to apply.

Query Types
MetricQuery
Returns a single scalar value. Used exclusively for METRIC widgets. Structure:
- Select clause must have exactly one item
- That item must be an AGGREGATION (not a raw COLUMN)
- No GROUP BY, ORDER BY, or LIMIT allowed
Example: Average Score
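A MetricQuery for an average score might look like the sketch below. Only the expression fields (type, function, column, source, alias) are documented above; the top-level select key is an assumption about the query envelope, not a confirmed schema.

```json
{
  "select": [
    {
      "type": "AGGREGATION",
      "function": "AVG",
      "column": "score",
      "source": "data",
      "alias": "avg_score"
    }
  ]
}
```

Note the single select item and the absence of GROUP BY, ORDER BY, and LIMIT, as the MetricQuery rules require.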
SeriesQuery
Returns multiple rows and columns. Used for TABLE, BAR, HISTOGRAM, DONUT, SCATTER, and TIMESERIES widgets.
Structure:
- Select clause must have at least one item
- Can mix COLUMN and AGGREGATION expressions
- When using GROUP BY, non-aggregated columns must be in the groupBy list
Example: Performance by Category
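A SeriesQuery for this might be sketched as follows. The groupBy key comes from the rules above; the select key and the representation of groupBy entries as plain column names are assumptions.

```json
{
  "select": [
    { "type": "COLUMN", "column": "category", "source": "data" },
    { "type": "AGGREGATION", "function": "AVG", "column": "score", "source": "data", "alias": "avg_score" },
    { "type": "AGGREGATION", "function": "COUNT", "column": "*", "alias": "item_count" }
  ],
  "groupBy": ["category"]
}
```

The non-aggregated category column appears in both select and groupBy, satisfying the alignment rule.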
evaluation_ids (Evaluation Group Dashboards)
When building queries for evaluation group dashboards, you can control which evaluations in the group to include using the evaluation_ids field.
Query-Level evaluation_ids
Add evaluation_ids at the top level of a query to scope it to specific evaluations in the group. If omitted, all evaluations in the group are included.
Per-Aggregation evaluation_ids
Individual aggregation expressions can specify their own evaluation_ids to narrow the scope further. These must be a subset of the query-level evaluation_ids.
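Combining both levels might look like this sketch, where the evaluation IDs are placeholders and the select key is an assumed envelope field. The per-aggregation list is a subset of the query-level list, as required.

```json
{
  "evaluation_ids": ["eval_a", "eval_b"],
  "select": [
    { "type": "AGGREGATION", "function": "AVG", "column": "score", "alias": "avg_all" },
    {
      "type": "AGGREGATION",
      "function": "AVG",
      "column": "score",
      "alias": "avg_eval_a",
      "evaluation_ids": ["eval_a"]
    }
  ]
}
```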
The special _evaluation_id column is automatically available in evaluation group dashboard queries. You can use it in SELECT and GROUP BY to break down data by evaluation source. See the Evaluation Group Dashboards guide for details.

SELECT Clause
The SELECT clause defines what columns or aggregations to return.
Column Expression
Reference a raw column from the data without aggregation.
- type: Always “COLUMN”
- column: Name of the column to select
- source: “data” or “task_result_cache” (optional, defaults to “data”)
- alias: Optional name for the result column (recommended for aggregation columns)
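Using the fields listed above, a column expression might look like:

```json
{
  "type": "COLUMN",
  "column": "category",
  "source": "data",
  "alias": "Category"
}
```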
Aggregation Expression
Apply an aggregation function to compute a derived value.
- type: Always “AGGREGATION”
- function: One of the supported aggregation functions (see below)
- column: Column to aggregate, or "*" for COUNT(*)
- source: “data” or “task_result_cache” (optional)
- params: Function-specific parameters (for PERCENTILE and PERCENTAGE)
- alias: Optional name for the result
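Using the fields listed above, a simple aggregation expression might look like:

```json
{
  "type": "AGGREGATION",
  "function": "COUNT",
  "column": "*",
  "alias": "total_items"
}
```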
Aggregation Functions
COUNT
Count the number of rows.
- Total number of items
- Items per category (with GROUP BY)
- Items matching filter criteria
SUM
Sum all values in a column.
- Total cost across all items
- Total tokens used
- Cumulative metrics
AVG
Calculate the mean value.
- Average score
- Mean latency
- Average confidence
MIN / MAX
Find the minimum or maximum value.
- Worst-case latency
- Best score achieved
- Range calculations
STDDEV / VARIANCE
Calculate standard deviation or variance.
- Measure score consistency
- Identify high-variance categories
- Quality control metrics
PERCENTILE
Calculate a specific percentile value.
percentile: Number between 0 and 100 (required)
- P95/P99 latency tracking
- Tail performance analysis
- SLA monitoring
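For instance, a P95 latency aggregation might be written as follows; the percentile param name is documented above, while the surrounding field values are illustrative.

```json
{
  "type": "AGGREGATION",
  "function": "PERCENTILE",
  "column": "latency_ms",
  "source": "task_result_cache",
  "params": { "percentile": 95 },
  "alias": "p95_latency_ms"
}
```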
COUNT_DISTINCT
Count unique values in a column.
- Number of unique users
- Distinct categories present
- Cardinality checks
PERCENTAGE
Calculate percentage of rows matching filter criteria.
percentage_filters: Filter object defining which rows to count (required)
- Pass/fail rates
- Percentage above threshold
- Compliance metrics
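A pass-rate aggregation might be sketched as below. The percentage_filters param name is documented above, but the filter object's internal shape (column/operator/value keys) is an assumption, not the confirmed schema.

```json
{
  "type": "AGGREGATION",
  "function": "PERCENTAGE",
  "column": "*",
  "params": {
    "percentage_filters": { "column": "score", "operator": ">=", "value": 0.7, "source": "data" }
  },
  "alias": "pass_rate"
}
```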
Filter Clause (WHERE)
Filter clauses narrow down which rows to include in computations.
Basic Filter
Single condition.

Multiple Conditions
Combine conditions with logical operators.

Comparison Operators
| Operator | Description | Example Value |
|---|---|---|
| = | Equals | "test" or 42 |
| != | Not equals | "draft" |
| > | Greater than | 0.5 |
| < | Less than | 100 |
| >= | Greater or equal | 0.7 |
| <= | Less or equal | 1.0 |
| IN | In list | ["cat1", "cat2"] |
| NOT IN | Not in list | ["archived", "deleted"] |
| LIKE | Substring match | "%error%" |
| NOT LIKE | Not substring match | "%deprecated%" |
Complex Filter Example
Combine multiple conditions with AND/OR logic: (score >= 0.7) AND (category IN ['quality', 'accuracy']) AND (status != 'archived')
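One plausible JSON encoding of that expression is sketched below; the key names (operator, conditions, column, value) are assumptions about the filter schema, which the text above does not spell out.

```json
{
  "operator": "AND",
  "conditions": [
    { "column": "score", "operator": ">=", "value": 0.7 },
    { "column": "category", "operator": "IN", "value": ["quality", "accuracy"] },
    { "column": "status", "operator": "!=", "value": "archived" }
  ]
}
```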
Logical operators are evaluated left-to-right without precedence. The expression A AND B OR C evaluates as ((A AND B) OR C).

GROUP BY Clause
Group rows by one or more columns before applying aggregations.
Single Column Grouping
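A single-column grouping might be sketched as follows; the groupBy key is documented above, while the select envelope key is an assumption.

```json
{
  "select": [
    { "type": "COLUMN", "column": "model" },
    { "type": "AGGREGATION", "function": "AVG", "column": "score", "alias": "avg_score" }
  ],
  "groupBy": ["model"]
}
```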
Multi-Column Grouping
Group by multiple dimensions.

ORDER BY Clause
Sort results by one or more columns.
Basic Sorting
- ASC: Ascending (smallest to largest)
- DESC: Descending (largest to smallest)
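An ORDER BY clause might be encoded like this sketch; the orderBy and direction key names are assumptions consistent with the groupBy naming documented above.

```json
{
  "select": [
    { "type": "COLUMN", "column": "category" },
    { "type": "AGGREGATION", "function": "AVG", "column": "score", "alias": "avg_score" }
  ],
  "groupBy": ["category"],
  "orderBy": [{ "column": "avg_score", "direction": "DESC" }]
}
```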
Multi-Column Sorting
Sort by multiple columns with different directions.

LIMIT Clause
Restrict the number of rows returned.
- Top-N queries (e.g., “Top 10 Items”)
- Prevent overwhelming table displays
- Performance optimization for large datasets
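A Top-N query combines ORDER BY with LIMIT, roughly as in this sketch; the limit and orderBy key names are assumptions about the query envelope.

```json
{
  "select": [
    { "type": "COLUMN", "column": "category" },
    { "type": "AGGREGATION", "function": "MAX", "column": "score", "alias": "best_score" }
  ],
  "groupBy": ["category"],
  "orderBy": [{ "column": "best_score", "direction": "DESC" }],
  "limit": 10
}
```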
Advanced Examples
Example 1: Multi-Dimensional Performance Table
Break down performance by model and category with multiple statistics.

Example 2: Filtered Percentile Calculation
Calculate P95 score for only production items.

Example 3: Complex Filter with IN and LIKE
Find items in specific categories containing certain text.

Example 4: Pass/Fail Rate by Model
Calculate success rates grouped by model.

Example 5: Nested Data Access
Query nested fields using dot notation.
- data.metadata.model_version - nested field in user data
- task_result_cache.output.confidence - nested field in task results
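A nested-field query might look like the sketch below; whether nested paths go in the column field with a separate source (as assumed here) matches the dot-notation convention described earlier, but the select envelope key remains an assumption.

```json
{
  "select": [
    { "type": "COLUMN", "column": "metadata.model_version", "source": "data" },
    {
      "type": "AGGREGATION",
      "function": "AVG",
      "column": "output.confidence",
      "source": "task_result_cache",
      "alias": "avg_confidence"
    }
  ],
  "groupBy": ["metadata.model_version"]
}
```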
Debugging Failed Queries
If your widget shows computation_status: "failed", check:
- Column names: Ensure columns exist in your evaluation data
- Source specification: Verify correct source (“data” vs “task_result_cache”)
- GROUP BY alignment: All non-aggregated SELECT columns must be in groupBy
- Operator/value types: Match operator to value type (IN requires list, LIKE requires string)
- Required params: PERCENTILE needs the percentile param, PERCENTAGE needs percentage_filters

Check the error_message field in the widget result for specific error details.
Next Steps
- Apply these query patterns in the Widget Types reference
- See Getting Started for end-to-end examples
- Explore API Reference for programmatic query creation

