The QueryAST (Query Abstract Syntax Tree) is a SQL-like query language for aggregating and filtering evaluation data in dashboard widgets. This guide covers the complete query syntax and advanced features.
Query Basics
When Are Queries Required?
Queries are required for data-driven widgets:
- METRIC: Requires a MetricQuery (single aggregation)
- TABLE: Requires a SeriesQuery (multiple rows/columns)
- BAR: Requires a SeriesQuery (grouped data)
- HISTOGRAM: Requires a SeriesQuery (distribution data)
Queries are NOT needed for:
- MARKDOWN: Static content only
- HEADING: Text display only
Data Sources
QueryAST can access two data sources for each evaluation item:
data.* - User-defined data
- Custom fields you provided when creating the evaluation
- Examples:
data.input, data.expected_output, data.category, data.score
task_result_cache.* - Task execution results
- Results from evaluation tasks (chat completions, agent runs, etc.)
- Examples:
task_result_cache.llm_score, task_result_cache.latency_ms, task_result_cache.metadata.confidence
Both data sources support nested fields using dot notation. For example, data.metadata.model_version or task_result_cache.output.reasoning.
Query Structure Overview
A query is a JSON object defining what data to select, how to aggregate it, and what filters to apply:
{
"select": [...], // Columns and aggregations to return
"filter": {...}, // Optional WHERE clause conditions
"groupBy": [...], // Optional GROUP BY columns
"orderBy": [...], // Optional sorting
"limit": 100 // Optional row limit
}
Query Types
MetricQuery
Returns a single scalar value. Used exclusively for METRIC widgets.
Structure:
{
"select": [
{
"expression": {
"type": "AGGREGATION",
"function": "AVG|SUM|COUNT|...",
"column": "column_name",
"source": "data|task_result_cache"
},
"alias": "optional_alias"
}
],
"filter": { /* optional */ }
}
Constraints:
- Select clause must have exactly one item
- That item must be an
AGGREGATION (not a raw COLUMN)
- No
GROUP BY, ORDER BY, or LIMIT allowed
Example: Average Score
Example:
{
"select": [
{
"expression": {
"type": "AGGREGATION",
"function": "AVG",
"column": "score",
"source": "data"
},
"alias": "avg_score"
}
],
"filter": {
"conditions": [
{
"column": "category",
"source": "data",
"operator": "=",
"value": "test"
}
]
}
}
SQL Equivalent:
SELECT AVG(data->>'$.score') AS avg_score
FROM evaluation_items
WHERE data->>'$.category' = 'test'
SeriesQuery
Returns multiple rows and columns. Used for TABLE, BAR, and HISTOGRAM widgets.
Structure:
{
"select": [
// Mix of columns and aggregations
],
"filter": { /* optional */ },
"groupBy": [ /* optional column names */ ],
"orderBy": [ /* optional sort specs */ ],
"limit": 100 // optional
}
Constraints:
- Select clause must have at least one item
- Can mix
COLUMN and AGGREGATION expressions
- When using
GROUP BY, non-aggregated columns must be in the groupBy list
Example:
{
"select": [
{
"expression": {
"type": "COLUMN",
"column": "category",
"source": "data"
}
},
{
"expression": {
"type": "AGGREGATION",
"function": "AVG",
"column": "score",
"source": "data"
},
"alias": "avg_score"
}
],
"groupBy": ["category"],
"orderBy": [
{
"column": "avg_score",
"direction": "DESC"
}
],
"limit": 10
}
SQL Equivalent:
SELECT
data->>'$.category' AS category,
AVG(data->>'$.score') AS avg_score
FROM evaluation_items
GROUP BY data->>'$.category'
ORDER BY avg_score DESC
LIMIT 10
SELECT Clause
The SELECT clause defines what columns or aggregations to return.
Column Expression
Reference a raw column from the data without aggregation.
{
"expression": {
"type": "COLUMN",
"column": "category",
"source": "data"
},
"alias": "category_name" // Optional alias
}
Fields:
type: Always “COLUMN”
column: Name of the column to select
source: “data” or “task_result_cache” (optional, defaults to “data”)
alias: Optional name for the result column
Aggregation Expression
Apply an aggregation function to compute a derived value.
{
"expression": {
"type": "AGGREGATION",
"function": "AVG",
"column": "score",
"source": "data",
"params": {} // Optional parameters for certain functions
},
"alias": "average_score"
}
Fields:
type: Always “AGGREGATION”
function: One of the supported aggregation functions (see below)
column: Column to aggregate, or "*" for COUNT(*)
source: “data” or “task_result_cache” (optional)
params: Function-specific parameters (for PERCENTILE and PERCENTAGE)
alias: Optional name for the result
Aggregation Functions
COUNT
Count the number of rows.
{
"type": "AGGREGATION",
"function": "COUNT",
"column": "*"
}
Use Cases:
- Total number of items
- Items per category (with
GROUP BY)
- Items matching filter criteria
SUM
Sum all values in a column.
{
"type": "AGGREGATION",
"function": "SUM",
"column": "cost",
"source": "task_result_cache"
}
Use Cases:
- Total cost across all items
- Total tokens used
- Cumulative metrics
AVG
Calculate the mean value.
{
"type": "AGGREGATION",
"function": "AVG",
"column": "score",
"source": "data"
}
Use Cases:
- Average score
- Mean latency
- Average confidence
MIN / MAX
Find the minimum or maximum value.
{
"type": "AGGREGATION",
"function": "MAX",
"column": "latency_ms",
"source": "task_result_cache"
}
Use Cases:
- Worst-case latency
- Best score achieved
- Range calculations
STDDEV / VARIANCE
Calculate standard deviation or variance.
{
"type": "AGGREGATION",
"function": "STDDEV",
"column": "score",
"source": "data"
}
Use Cases:
- Measure score consistency
- Identify high-variance categories
- Quality control metrics
PERCENTILE
Calculate a specific percentile value.
{
"type": "AGGREGATION",
"function": "PERCENTILE",
"column": "latency_ms",
"source": "task_result_cache",
"params": {
"percentile": 95
}
}
Parameters:
percentile: Number between 0 and 100 (required)
Use Cases:
- P95/P99 latency tracking
- Tail performance analysis
- SLA monitoring
COUNT_DISTINCT
Count unique values in a column.
{
"type": "AGGREGATION",
"function": "COUNT_DISTINCT",
"column": "user_id",
"source": "data"
}
Use Cases:
- Number of unique users
- Distinct categories present
- Cardinality checks
PERCENTAGE
Calculate percentage of rows matching filter criteria.
{
"type": "AGGREGATION",
"function": "PERCENTAGE",
"column": "*",
"params": {
"percentage_filters": {
"conditions": [
{
"column": "score",
"source": "data",
"operator": ">=",
"value": 0.7
}
]
}
}
}
Parameters:
percentage_filters: Filter object defining which rows to count (required)
Use Cases:
- Pass/fail rates
- Percentage above threshold
- Compliance metrics
Filter Clause (WHERE)
Filter clauses narrow down which rows to include in computations.
Basic Filter
Single condition:
{
"filter": {
"conditions": [
{
"column": "category",
"source": "data",
"operator": "=",
"value": "quality"
}
]
}
}
Multiple Conditions
Combine conditions with logical operators:
{
"filter": {
"conditions": [
{
"column": "score",
"source": "data",
"operator": ">=",
"value": 0.5
},
{
"column": "category",
"source": "data",
"operator": "=",
"value": "production"
}
],
"logicalOperators": ["AND"]
}
}
The number of logical operators must be exactly len(conditions) - 1. For 3 conditions, you need 2 operators.
Comparison Operators
| Operator | Description | Example Value |
|---|
= | Equals | "test" or 42 |
!= | Not equals | "draft" |
> | Greater than | 0.5 |
< | Less than | 100 |
>= | Greater or equal | 0.7 |
<= | Less or equal | 1.0 |
IN | In list | ["cat1", "cat2"] |
NOT IN | Not in list | ["archived", "deleted"] |
LIKE | Substring match | "%error%" |
NOT LIKE | Not substring match | "%deprecated%" |
Complex Filter Example
Combine multiple conditions with AND/OR logic:
{
"filter": {
"conditions": [
{
"column": "score",
"source": "data",
"operator": ">=",
"value": 0.7
},
{
"column": "category",
"source": "data",
"operator": "IN",
"value": ["quality", "accuracy"]
},
{
"column": "status",
"source": "data",
"operator": "!=",
"value": "archived"
}
],
"logicalOperators": ["AND", "AND"]
}
}
This evaluates as: (score >= 0.7) AND (category IN ['quality', 'accuracy']) AND (status != 'archived')
Logical operators are evaluated left-to-right without precedence. The expression A AND B OR C evaluates as ((A AND B) OR C).
GROUP BY Clause
Group rows by one or more columns before applying aggregations.
Single Column Grouping
{
"select": [
{
"expression": {"type": "COLUMN", "column": "category", "source": "data"}
},
{
"expression": {"type": "AGGREGATION", "function": "COUNT", "column": "*"},
"alias": "count"
}
],
"groupBy": ["category"]
}
SQL Equivalent:
SELECT
data->>'$.category' AS category,
COUNT(*) AS count
FROM evaluation_items
GROUP BY data->>'$.category'
Multi-Column Grouping
Group by multiple dimensions:
{
"select": [
{
"expression": {"type": "COLUMN", "column": "model", "source": "data"}
},
{
"expression": {"type": "COLUMN", "column": "category", "source": "data"}
},
{
"expression": {"type": "AGGREGATION", "function": "AVG", "column": "score", "source": "data"},
"alias": "avg_score"
}
],
"groupBy": ["model", "category"]
}
SQL Equivalent:
SELECT
data->>'$.model' AS model,
data->>'$.category' AS category,
AVG(data->>'$.score') AS avg_score
FROM evaluation_items
GROUP BY data->>'$.model', data->>'$.category'
When using GROUP BY, all non-aggregated columns in SELECT must appear in the groupBy array.
ORDER BY Clause
Sort results by one or more columns.
Basic Sorting
{
"orderBy": [
{
"column": "avg_score",
"direction": "DESC"
}
]
}
Directions:
ASC: Ascending (smallest to largest)
DESC: Descending (largest to smallest)
Multi-Column Sorting
Sort by multiple columns with different directions:
{
"orderBy": [
{
"column": "category",
"direction": "ASC"
},
{
"column": "avg_score",
"direction": "DESC"
}
]
}
This sorts by category alphabetically, then by avg_score descending within each category.
LIMIT Clause
Restrict the number of rows returned.
Use Cases:
- Top-N queries (e.g., “Top 10 Items”)
- Prevent overwhelming table displays
- Performance optimization for large datasets
Example: Top 5 Categories
{
"select": [
{"expression": {"type": "COLUMN", "column": "category", "source": "data"}},
{"expression": {"type": "AGGREGATION", "function": "AVG", "column": "score", "source": "data"}, "alias": "avg_score"}
],
"groupBy": ["category"],
"orderBy": [{"column": "avg_score", "direction": "DESC"}],
"limit": 5
}
Advanced Examples
Break down performance by model and category with multiple statistics.
{
"select": [
{
"expression": {
"type": "COLUMN",
"column": "model",
"source": "data"
}
},
{
"expression": {
"type": "COLUMN",
"column": "category",
"source": "data"
}
},
{
"expression": {
"type": "AGGREGATION",
"function": "COUNT",
"column": "*"
},
"alias": "count"
},
{
"expression": {
"type": "AGGREGATION",
"function": "AVG",
"column": "score",
"source": "data"
},
"alias": "avg_score"
},
{
"expression": {
"type": "AGGREGATION",
"function": "STDDEV",
"column": "score",
"source": "data"
},
"alias": "stddev_score"
},
{
"expression": {
"type": "AGGREGATION",
"function": "MIN",
"column": "score",
"source": "data"
},
"alias": "min_score"
},
{
"expression": {
"type": "AGGREGATION",
"function": "MAX",
"column": "score",
"source": "data"
},
"alias": "max_score"
}
],
"groupBy": ["model", "category"],
"orderBy": [
{
"column": "model",
"direction": "ASC"
},
{
"column": "avg_score",
"direction": "DESC"
}
]
}
Example 2: Filtered Percentile Calculation
Calculate P95 score for only production items.
{
"select": [
{
"expression": {
"type": "AGGREGATION",
"function": "PERCENTILE",
"column": "latency_ms",
"source": "task_result_cache",
"params": {
"percentile": 95
}
},
"alias": "p95_latency"
}
],
"filter": {
"conditions": [
{
"column": "environment",
"source": "data",
"operator": "=",
"value": "production"
},
{
"column": "latency_ms",
"source": "task_result_cache",
"operator": ">",
"value": 0
}
],
"logicalOperators": ["AND"]
}
}
Example 3: Complex Filter with IN and LIKE
Find items in specific categories containing certain text.
{
"select": [
{
"expression": {"type": "COLUMN", "column": "input", "source": "data"}
},
{
"expression": {"type": "COLUMN", "column": "category", "source": "data"}
},
{
"expression": {"type": "COLUMN", "column": "score", "source": "data"}
}
],
"filter": {
"conditions": [
{
"column": "category",
"source": "data",
"operator": "IN",
"value": ["quality", "accuracy", "helpfulness"]
},
{
"column": "input",
"source": "data",
"operator": "LIKE",
"value": "%customer support%"
}
],
"logicalOperators": ["AND"]
},
"orderBy": [
{
"column": "score",
"direction": "DESC"
}
],
"limit": 20
}
Example 4: Pass/Fail Rate by Model
Calculate success rates grouped by model.
{
"select": [
{
"expression": {"type": "COLUMN", "column": "model", "source": "data"}
},
{
"expression": {"type": "AGGREGATION", "function": "COUNT", "column": "*"},
"alias": "total"
},
{
"expression": {
"type": "AGGREGATION",
"function": "PERCENTAGE",
"column": "*",
"params": {
"percentage_filters": {
"conditions": [
{
"column": "score",
"source": "data",
"operator": ">=",
"value": 0.7
}
]
}
}
},
"alias": "pass_rate"
}
],
"groupBy": ["model"],
"orderBy": [
{
"column": "pass_rate",
"direction": "DESC"
}
]
}
Example 5: Nested Data Access
Query nested fields using dot notation.
{
"select": [
{
"expression": {
"type": "COLUMN",
"column": "metadata.model_version",
"source": "data"
}
},
{
"expression": {
"type": "AGGREGATION",
"function": "AVG",
"column": "output.confidence",
"source": "task_result_cache"
},
"alias": "avg_confidence"
}
],
"groupBy": ["metadata.model_version"]
}
This accesses:
data.metadata.model_version - nested field in user data
task_result_cache.output.confidence - nested field in task results
Debugging Failed Queries
If your widget shows computation_status: "failed", check:
- Column names: Ensure columns exist in your evaluation data
- Source specification: Verify correct source (“data” vs “task_result_cache”)
- GROUP BY alignment: All non-aggregated
SELECT columns must be in groupBy
- Operator/value types: Match operator to value type (IN requires list, LIKE requires string)
- Required params: PERCENTILE needs percentile param, PERCENTAGE needs percentage_filters
Check the error_message field in the widget result for specific error details.
Next Steps