Introduction to Evaluation Dashboards

What are Evaluation Dashboards?

Evaluation Dashboards are customizable visualizations that let you analyze and monitor evaluation performance with widgets. Each dashboard is a collection of metric cards, charts, and tables that provide insight into evaluation results.

Key Concepts

Dashboards

A dashboard is a collection of widgets arranged in a custom layout. Each dashboard is scoped to either:

Single Evaluation: Visualize metrics for one specific evaluation
Evaluation Group: Compare metrics across multiple related evaluations (Coming soon)

Dashboards use an XOR constraint: each dashboard belongs to either a single evaluation or an evaluation group, never both.
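The XOR constraint can be illustrated with a small validation sketch. This is not the platform's implementation; `DashboardScope` and the field name `evaluation_group_id` are assumptions made for illustration (only `evaluation_id` appears in this page's examples).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DashboardScope:
    """Hypothetical scope record; `evaluation_group_id` is an assumed field name."""

    evaluation_id: Optional[str] = None
    evaluation_group_id: Optional[str] = None

    def __post_init__(self) -> None:
        # XOR: exactly one of the two identifiers must be set.
        if (self.evaluation_id is None) == (self.evaluation_group_id is None):
            raise ValueError(
                "a dashboard must reference either an evaluation or an "
                "evaluation group, not both and not neither"
            )


scope = DashboardScope(evaluation_id="eval-xyz789")  # valid: single evaluation
```

Passing both identifiers, or neither, raises `ValueError` in this sketch.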

Widgets

Widgets are reusable visualization components that display computed metrics or data. Six widget types are available:

Metric: Display a single computed aggregation (e.g., “Average Score: 0.87”)
Table: Show aggregations in a table format, allowing groupings and sorting
Bar Chart: Visualize data across a categorical dimension
Histogram: Visualize data across a numerical dimension
Markdown: Add static content to the dashboard
Heading: Organize dashboard sections

Learn more in the Widget Types reference.

Queries

For data-driven widgets (Metric, Table, Bar Chart, Histogram), you define a query as a SQL-like JSON object. Queries allow you to:

Aggregate data using functions
- AVG, SUM, COUNT, COUNT_DISTINCT, MIN, MAX, STDDEV, VARIANCE, PERCENTILE, PERCENTAGE
Filter results based on conditions
Group by dimensions
Sort and limit results

See the Query Language guide for details.

Widget Results

When you create or update a widget, the system automatically computes its results by executing the query against your evaluation data. Results are stored in the database and read back when the widget is rendered.
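To make the query shape and the computation step concrete, here is a minimal local sketch that evaluates the `select` part of a query over in-memory rows. It is not how the service executes queries: `run_select`, `AGGREGATIONS`, and the sample rows are hypothetical, and only a subset of the listed functions is covered.

```python
from statistics import mean
from typing import Any, Callable, Dict, List

# Only a subset of the listed aggregation functions, for illustration.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "AVG": mean,
    "SUM": sum,
    "COUNT": len,
    "MIN": min,
    "MAX": max,
}


def run_select(query: Dict[str, Any], rows: List[Dict[str, Any]]) -> List[float]:
    """Evaluate each AGGREGATION expression in query["select"] over rows."""
    results = []
    for item in query["select"]:
        expr = item["expression"]
        if expr["type"] != "AGGREGATION":
            raise ValueError(f"unsupported expression type: {expr['type']}")
        values = [row[expr["column"]] for row in rows]
        results.append(AGGREGATIONS[expr["function"]](values))
    return results


# Illustrative rows; real data comes from the evaluation.
rows = [{"score": 0.5}, {"score": 1.0}, {"score": 0.75}]
query = {
    "select": [
        {"expression": {"type": "AGGREGATION", "function": "AVG", "column": "score"}},
        {"expression": {"type": "AGGREGATION", "function": "MAX", "column": "score"}},
    ]
}
print(run_select(query, rows))  # [0.75, 1.0]
```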

Architecture & Data Model

Understanding how dashboards are structured in the database helps clarify the relationship between dashboards, widgets, and their computed results.

Three Core Entities

1. Dashboard

A dashboard is a container that links to an evaluation (or evaluation group) and maintains an ordered list of widget IDs.

{
  "id": "dash-abc123",
  "name": "Model Performance Dashboard",
  "evaluation_id": "eval-xyz789",
  "widget_order": ["widget-1", "widget-2", "widget-3"]
}

The widget_order array defines both which widgets appear on the dashboard and their display order.

2. Widget (Template)

Widgets are reusable computation templates that define:

What data to query (the query definition)
How to display it (the configuration)
The visualization type (metric, table, bar, etc.)

{
  "id": "widget-1",
  "title": "Average Score",
  "type": "metric",
  "query": {
    "select": [{"expression": {"type": "AGGREGATION", "function": "AVG", "column": "score"}}]
  },
  "config": {}
}

Widgets are stored independently from dashboards and can be reused across multiple dashboards by including the same widget ID in different widget_order arrays. This enables the same dashboard template to be used for different evaluations.

3. Widget Result (Computed Output)

Widget results are the actual computed values produced by executing a widget’s query against evaluation data.

{
  "id": "result-def456",
  "widget_id": "widget-1",
  "evaluation_id": "eval-xyz789",
  "computed_result": {"type": "metric", "data": 0.873},
  "computation_status": "completed",
  "computed_at": "2026-01-15T10:35:02Z"
}

Results are specific to both a widget and an evaluation, meaning the same widget produces different results when used on different evaluations.
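One way to picture this is a lookup keyed by the pair (widget ID, evaluation ID). The sketch below is an in-memory stand-in, not the actual storage layer; `ResultStore`, `get_result`, and the second evaluation ID are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for stored widget results.
# Key: (widget_id, evaluation_id); value: the computed_result payload.
ResultStore = Dict[Tuple[str, str], dict]


def get_result(store: ResultStore, widget_id: str, evaluation_id: str) -> Optional[dict]:
    """Return the computed result for this widget on this evaluation, if any."""
    return store.get((widget_id, evaluation_id))


store: ResultStore = {
    ("widget-1", "eval-xyz789"): {"type": "metric", "data": 0.873},
    # Second evaluation ID and its value are illustrative only.
    ("widget-1", "eval-other"): {"type": "metric", "data": 0.791},
}

print(get_result(store, "widget-1", "eval-xyz789"))  # same widget, first evaluation
print(get_result(store, "widget-1", "eval-other"))   # same widget, different result
print(get_result(store, "widget-2", "eval-xyz789"))  # nothing computed yet: None
```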

How They Work Together

Dashboard Creation: When you create a dashboard, you link it to an evaluation (or evaluation group)
Adding Widgets: When you add a widget to a dashboard:
- A widget template is created (or reused if it already exists)
- The widget’s ID is appended to the dashboard’s widget_order
- The system automatically computes the widget result for the dashboard’s evaluation data
Display: When rendering a dashboard:
- The system fetches the dashboard and its ordered widget list
- For each widget ID, it loads the widget definition (template)
- It retrieves the corresponding widget result (computed output)
- The UI renders widgets in order using the template definition and computed results
Updates: When you update a widget:
- If the widget belongs to only one dashboard: The template is updated in place
- If the widget is shared across multiple dashboards: A copy of the widget is created for your dashboard
- Results are recomputed if the query or configuration changed
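The display steps above can be sketched as a single loop over widget_order. This is a simplified model rather than the real rendering code: `render_dashboard` is hypothetical, the dictionaries stand in for database lookups, and treating every status other than "completed" as pending is an assumption.

```python
from typing import Dict, List, Tuple


def render_dashboard(
    dashboard: dict,
    widgets: Dict[str, dict],
    results: Dict[Tuple[str, str], dict],
) -> List[str]:
    """Return one display line per widget, in widget_order."""
    lines = []
    for widget_id in dashboard["widget_order"]:  # ordered widget list
        widget = widgets[widget_id]  # widget definition (template)
        # Computed output for this widget on this dashboard's evaluation.
        result = results.get((widget_id, dashboard["evaluation_id"]))
        if result is None or result["computation_status"] != "completed":
            lines.append(f"{widget['title']}: (pending)")
        else:
            lines.append(f"{widget['title']}: {result['computed_result']['data']}")
    return lines


dashboard = {
    "id": "dash-abc123",
    "evaluation_id": "eval-xyz789",
    "widget_order": ["widget-1", "widget-2"],
}
widgets = {
    "widget-1": {"title": "Average Score", "type": "metric"},
    "widget-2": {"title": "Row Count", "type": "metric"},  # hypothetical widget
}
results = {
    ("widget-1", "eval-xyz789"): {
        "computed_result": {"type": "metric", "data": 0.873},
        "computation_status": "completed",
    },
}
print(render_dashboard(dashboard, widgets, results))
# ['Average Score: 0.873', 'Row Count: (pending)']
```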

Reusability Example

The same widget template can appear on multiple dashboards:

Widget "Average Score" (widget-1)
├── Dashboard A (evaluation-1)
│   └── Result: 0.873
├── Dashboard B (evaluation-2)
│   └── Result: 0.791
└── Dashboard C (evaluation-group-1)
    └── Result: 0.845

This architecture enables efficient widget reuse while maintaining separate computed results for each evaluation context.

Managing Widgets

Editing Widgets

You can edit a widget’s configuration, query, or title at any time.

Via the UI:

Navigate to your dashboard
Click the edit icon (pencil) on the widget you want to modify
Update the title, query, or configuration
Click “Save Changes”

Via the SDK:

# Update the title and query of an existing widget on this dashboard
updated_widget = client.evaluation_dashboards.widgets.update(
    dashboard_id=dashboard.id,
    widget_id=widget.id,
    title="Updated Widget Title",
    query={
        "select": [
            {
                "expression": {
                    "type": "AGGREGATION",
                    "function": "AVG",
                    "column": "overall_score",
                    "source": "data"
                }
            }
        ]
    }
)

Widget Cloning Behavior: If a widget is shared across multiple dashboards, editing it will automatically create a copy for your dashboard to prevent affecting other dashboards using the same widget.
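The cloning behavior is a copy-on-write pattern. The sketch below models it with plain dictionaries; it is an assumption about the logic, not the service's code, and `update_widget` and the generated ID format are hypothetical.

```python
import copy
import uuid
from typing import Dict


def update_widget(
    dashboards: Dict[str, dict],
    widgets: Dict[str, dict],
    dashboard_id: str,
    widget_id: str,
    **changes,
) -> str:
    """Apply changes and return the ID of the widget the dashboard now uses."""
    users = [d for d in dashboards.values() if widget_id in d["widget_order"]]
    if len(users) <= 1:
        widgets[widget_id].update(changes)  # sole user: edit the template in place
        return widget_id

    # Shared: clone, edit the clone, and swap it into this dashboard only.
    clone = copy.deepcopy(widgets[widget_id])
    clone.update(changes)
    clone["id"] = f"widget-{uuid.uuid4().hex[:8]}"  # hypothetical ID scheme
    widgets[clone["id"]] = clone
    order = dashboards[dashboard_id]["widget_order"]
    order[order.index(widget_id)] = clone["id"]  # keep the same display position
    return clone["id"]


dashboards = {
    "dash-a": {"widget_order": ["widget-1"]},
    "dash-b": {"widget_order": ["widget-1"]},
}
widgets = {"widget-1": {"id": "widget-1", "title": "Average Score"}}

new_id = update_widget(dashboards, widgets, "dash-a", "widget-1", title="Mean Score")
print(new_id != "widget-1")          # True: dash-a now points at a clone
print(widgets["widget-1"]["title"])  # Average Score: dash-b is unaffected
```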

Removing Widgets from a Dashboard

Remove widgets you no longer need from your dashboard.

Via the UI:

Navigate to your dashboard
Click the delete icon (trash) on the widget
Confirm the removal

Via the SDK:

# Detach the widget from this dashboard's widget_order (the template is not deleted)
client.evaluation_dashboards.widgets.remove(
    dashboard_id=dashboard.id,
    widget_id=widget.id
)

Removing a widget from a dashboard only removes it from that dashboard’s widget_order. The widget template itself remains in the system if it’s used by other dashboards. To permanently delete a widget template, use the delete endpoint.

Reordering Widgets

Organize your dashboard by rearranging widget order.

Via the UI:

Navigate to your dashboard
Click and drag widgets to reorder them
Changes are saved automatically

Via the SDK:

# Specify the desired order of widget IDs
client.evaluation_dashboards.update(
    dashboard_id=dashboard.id,
    widget_order=["widget-3", "widget-1", "widget-2"]  # New order
)

The first widget ID in the array appears at the top of the dashboard, and subsequent widgets follow in order.
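When you only want to move one widget, it can help to compute the new array locally before sending it. The helper below is a hypothetical convenience function, not part of the SDK; it builds a new list and leaves the original untouched.

```python
from typing import List


def move_widget(widget_order: List[str], widget_id: str, new_index: int) -> List[str]:
    """Return a new order with widget_id moved to new_index; input is untouched."""
    if widget_id not in widget_order:
        raise ValueError(f"{widget_id} is not on this dashboard")
    reordered = [w for w in widget_order if w != widget_id]
    reordered.insert(new_index, widget_id)
    return reordered


current = ["widget-1", "widget-2", "widget-3"]
new_order = move_widget(current, "widget-3", 0)
print(new_order)  # ['widget-3', 'widget-1', 'widget-2']
```

The resulting list can then be passed as widget_order in the update call shown above.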

Duplicating Dashboards

Create a copy of an existing dashboard to reuse its widget configuration with a different evaluation.

Use Cases for Duplication

Template Reuse: Apply a dashboard template to multiple evaluations
Comparison: Create identical dashboards for different evaluation runs
Iteration: Start with a proven dashboard structure and customize it

Creating from a Template

Via the UI:

Navigate to the Dashboards page
Go to the dashboard you want to duplicate
Click the three dots menu and select “Duplicate Dashboard”
Fill in the form and select the target evaluation
Click “Duplicate”

Via the SDK:

# Create a new dashboard from an existing template
new_dashboard = client.evaluation_dashboards.create(
    name="Q1 Model Performance",
    evaluation_id="eval-new-123",
    template_dashboard_id="dash-template-abc"  # Source dashboard to copy
)

When you duplicate a dashboard:

The new dashboard copies the widget_order from the template
All widgets are reused (same widget IDs)
New widget results are computed for the new evaluation’s data
The template dashboard remains unchanged
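The four points above can be modeled in a few lines. This is a sketch of the described behavior under stated assumptions, not the service's implementation: `duplicate_dashboard` is hypothetical, and the `compute` callback stands in for whatever executes a widget's query against an evaluation.

```python
from typing import Callable, Dict, Tuple


def duplicate_dashboard(
    template: dict,
    name: str,
    evaluation_id: str,
    new_id: str,
    results: Dict[Tuple[str, str], dict],
    compute: Callable[[str, str], dict],
) -> dict:
    """Copy widget_order from template and compute results for the new evaluation."""
    new_dashboard = {
        "id": new_id,
        "name": name,
        "evaluation_id": evaluation_id,
        # Same widget IDs, but a separate list so the template stays unchanged.
        "widget_order": list(template["widget_order"]),
    }
    for widget_id in new_dashboard["widget_order"]:
        # Results are keyed by (widget, evaluation), so the template's results are kept.
        results[(widget_id, evaluation_id)] = compute(widget_id, evaluation_id)
    return new_dashboard


template = {
    "id": "dash-template-abc",
    "name": "Template",
    "evaluation_id": "eval-old",  # hypothetical source evaluation
    "widget_order": ["widget-1", "widget-2"],
}
results: Dict[Tuple[str, str], dict] = {}
copy_of_template = duplicate_dashboard(
    template,
    name="Q1 Model Performance",
    evaluation_id="eval-new-123",
    new_id="dash-new",  # hypothetical ID
    results=results,
    compute=lambda w, e: {"widget_id": w, "evaluation_id": e, "computation_status": "pending"},
)
print(copy_of_template["widget_order"])  # ['widget-1', 'widget-2']
```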

Best Practice: Create a “template” dashboard with your standard metrics and visualizations, then duplicate it for each new evaluation run to maintain consistency.

Next Steps

Ready to create your first dashboard? Follow the Getting Started guide for a step-by-step walkthrough.
For detailed information on widget configuration, see the Widget Types reference.
To learn the query language for data-driven widgets, read the Query Language guide.
For programmatic query creation, read the API Reference.
