
What are Evaluation Dashboards?

Evaluation Dashboards are customizable visualizations for analyzing and monitoring evaluation performance. Each dashboard is a collection of widgets (metric cards, charts, and tables) that provide insight into evaluation results.

Key Concepts

Dashboards

A dashboard is a collection of widgets arranged in a custom layout. Each dashboard is scoped to either:
  • Single Evaluation: Visualize metrics for one specific evaluation
  • Evaluation Group: Compare metrics across multiple related evaluations (Coming soon)
Dashboards enforce an XOR constraint: each one belongs to either a single evaluation or an evaluation group, never both.
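The XOR constraint can be pictured as a simple validation check. The sketch below is illustrative only: `DashboardScope` and the field name `evaluation_group_id` are hypothetical and are not taken from the SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DashboardScope:
    """Hypothetical model of a dashboard's scope (not an SDK class)."""

    evaluation_id: Optional[str] = None
    evaluation_group_id: Optional[str] = None  # assumed field name

    def __post_init__(self) -> None:
        # XOR: exactly one of the two IDs must be set.
        if (self.evaluation_id is None) == (self.evaluation_group_id is None):
            raise ValueError(
                "A dashboard must belong to exactly one of: "
                "an evaluation or an evaluation group"
            )


# Valid: scoped to a single evaluation.
scope = DashboardScope(evaluation_id="eval-xyz789")
```

Setting both IDs, or neither, raises `ValueError` in this sketch.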

Widgets

Widgets are reusable visualization components that display computed metrics or data. Six widget types are available:
  • Metric: Display a single computed aggregation (e.g., “Average Score: 0.87”)
  • Table: Show aggregations in a table format, allowing groupings and sorting
  • Bar Chart: Visualize data across a categorical dimension
  • Histogram: Visualize data across a numerical dimension
  • Markdown: Add static content to the dashboard
  • Heading: Organize dashboard sections
Learn more in the Widget Types reference.

Queries

For data-driven widgets (Metric, Table, Bar, Histogram), you define a query using a SQL-like JSON object. Queries allow you to:
  • Aggregate data using functions
    • AVG, SUM, COUNT, COUNT_DISTINCT, MIN, MAX, STDDEV, VARIANCE, PERCENTILE, PERCENTAGE
  • Filter results based on conditions
  • Group by dimensions
  • Sort and limit results
See the Query Language guide for details.
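To make the idea of an aggregation query concrete, here is a minimal sketch that evaluates the `select` shape used in this guide's widget examples over in-memory rows. It is not how the service executes queries: `run_select`, the `AGGREGATIONS` table, and the sample rows are all hypothetical, and only a handful of the listed functions are covered. Filtering, grouping, and sorting are left out because their JSON shape is described in the Query Language guide, not here.

```python
from statistics import mean
from typing import Any, Callable, Dict, List

# A few of the documented functions, mapped to plain-Python equivalents.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "AVG": mean,
    "SUM": sum,
    "MIN": min,
    "MAX": max,
    "COUNT": len,
}


def run_select(query: Dict[str, Any], rows: List[Dict[str, float]]) -> List[float]:
    """Evaluate each AGGREGATION expression in query["select"] over rows."""
    results = []
    for item in query["select"]:
        expr = item["expression"]
        if expr["type"] != "AGGREGATION":
            raise ValueError(f"unsupported expression type: {expr['type']}")
        values = [row[expr["column"]] for row in rows]
        results.append(AGGREGATIONS[expr["function"]](values))
    return results


query = {
    "select": [
        {"expression": {"type": "AGGREGATION", "function": "AVG", "column": "score"}}
    ]
}
rows = [{"score": 0.5}, {"score": 1.0}]  # made-up sample data
averages = run_select(query, rows)  # one value per select expression
```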

Widget Results

When you create or update a widget, the system automatically computes its results by executing the query against your evaluation data. Results are stored in the database and loaded whenever the widget is rendered.

Architecture & Data Model

Understanding how dashboards are structured in the database helps clarify the relationship between dashboards, widgets, and their computed results.

Three Core Entities

1. Dashboard

A dashboard is a container that links to an evaluation (or evaluation group) and maintains an ordered list of widget IDs.
{
  "id": "dash-abc123",
  "name": "Model Performance Dashboard",
  "evaluation_id": "eval-xyz789",
  "widget_order": ["widget-1", "widget-2", "widget-3"]
}
The widget_order array defines both which widgets appear on the dashboard and their display order.

2. Widget (Template)

Widgets are reusable computation templates that define:
  • What data to query (the query definition)
  • How to display it (the configuration)
  • The visualization type (metric, table, bar, etc.)
{
  "id": "widget-1",
  "title": "Average Score",
  "type": "metric",
  "query": {
    "select": [{"expression": {"type": "AGGREGATION", "function": "AVG", "column": "score"}}]
  },
  "config": {}
}
Widgets are stored independently from dashboards and can be reused across multiple dashboards by including the same widget ID in different widget_order arrays. This enables the same dashboard template to be used for different evaluations.

3. Widget Result (Computed Output)

Widget results are the actual computed values produced by executing a widget’s query against evaluation data.
{
  "id": "result-def456",
  "widget_id": "widget-1",
  "evaluation_id": "eval-xyz789",
  "computed_result": {"type": "metric", "data": 0.873},
  "computation_status": "completed",
  "computed_at": "2026-01-15T10:35:02Z"
}
Results are specific to both a widget and an evaluation, meaning the same widget produces different results when used on different evaluations.
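One way to picture this is a store keyed by the pair (widget ID, evaluation ID). The sketch below uses a plain dictionary as a stand-in for the results table; `results` and `store_result` are hypothetical names, and the real storage layer is not described in this guide.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the widget result table.
ResultKey = Tuple[str, str]  # (widget_id, evaluation_id)
results: Dict[ResultKey, dict] = {}


def store_result(widget_id: str, evaluation_id: str, data: float) -> None:
    """Record a computed metric for one widget on one evaluation."""
    results[(widget_id, evaluation_id)] = {
        "widget_id": widget_id,
        "evaluation_id": evaluation_id,
        "computed_result": {"type": "metric", "data": data},
        "computation_status": "completed",
    }


# The same widget yields a separate result per evaluation.
store_result("widget-1", "evaluation-1", 0.873)
store_result("widget-1", "evaluation-2", 0.791)
```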

How They Work Together

  1. Dashboard Creation: When you create a dashboard, you link it to an evaluation (or evaluation group)
  2. Adding Widgets: When you add a widget to a dashboard:
    • A widget template is created (or reused if it already exists)
    • The widget’s ID is appended to the dashboard’s widget_order
    • The system automatically computes the widget result for the dashboard’s evaluation data
  3. Display: When rendering a dashboard:
    • The system fetches the dashboard and its ordered widget list
    • For each widget ID, it loads the widget definition (template)
    • It retrieves the corresponding widget result (computed output)
    • The UI renders widgets in order using the template definition and computed results
  4. Updates: When you update a widget:
    • If the widget belongs to only one dashboard: The template is updated in place
    • If the widget is shared across multiple dashboards: A copy of the widget is created for your dashboard
    • Results are recomputed if the query or configuration changed
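The update rule in step 4 resembles copy-on-write. The following is a rough sketch of that behavior using in-memory dictionaries; `update_widget`, the `widgets` and `dashboards` stores, and the generated ID format are assumptions made for illustration, not the service's actual implementation.

```python
import copy
import uuid
from typing import Dict

# Hypothetical in-memory stores; two dashboards share one widget.
widgets: Dict[str, dict] = {"widget-1": {"id": "widget-1", "title": "Average Score"}}
dashboards: Dict[str, dict] = {
    "dash-a": {"widget_order": ["widget-1"]},
    "dash-b": {"widget_order": ["widget-1"]},
}


def update_widget(dashboard_id: str, widget_id: str, **changes: object) -> str:
    """Apply changes and return the ID of the widget the dashboard now uses."""
    users = [d for d in dashboards.values() if widget_id in d["widget_order"]]
    if len(users) <= 1:
        # Not shared: update the template in place.
        widgets[widget_id].update(changes)
        return widget_id
    # Shared: clone the template and point only this dashboard at the clone.
    clone = copy.deepcopy(widgets[widget_id])
    clone.update(changes)
    clone["id"] = f"widget-{uuid.uuid4().hex[:8]}"  # assumed ID format
    widgets[clone["id"]] = clone
    order = dashboards[dashboard_id]["widget_order"]
    order[order.index(widget_id)] = clone["id"]
    return clone["id"]


new_id = update_widget("dash-a", "widget-1", title="Mean Score")
```

After this call, `dash-b` still references the untouched `widget-1`, while `dash-a` references the clone in the same position.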

Reusability Example

The same widget template can appear on multiple dashboards:
Widget "Average Score" (widget-1)
├── Dashboard A (evaluation-1)
│   └── Result: 0.873
├── Dashboard B (evaluation-2)
│   └── Result: 0.791
└── Dashboard C (evaluation-group-1)
    └── Result: 0.845
This architecture enables efficient widget reuse while maintaining separate computed results for each evaluation context.

Managing Widgets

Editing Widgets

You can edit a widget’s configuration, query, or title at any time.
Via the UI:
  1. Navigate to your dashboard
  2. Click the edit icon (pencil) on the widget you want to modify
  3. Update the title, query, or configuration
  4. Click “Save Changes”
Via the SDK:
# Assumes `client`, `dashboard`, and `widget` were created earlier
updated_widget = client.evaluation_dashboards.widgets.update(
    dashboard_id=dashboard.id,
    widget_id=widget.id,
    title="Updated Widget Title",
    query={
        "select": [
            {
                "expression": {
                    "type": "AGGREGATION",
                    "function": "AVG",
                    "column": "overall_score",
                    "source": "data"
                }
            }
        ]
    }
)
Widget Cloning Behavior: If a widget is shared across multiple dashboards, editing it will automatically create a copy for your dashboard to prevent affecting other dashboards using the same widget.

Removing Widgets from a Dashboard

Remove widgets you no longer need from your dashboard.
Via the UI:
  1. Navigate to your dashboard
  2. Click the delete icon (trash) on the widget
  3. Confirm the removal
Via the SDK:
# Assumes `client`, `dashboard`, and `widget` were created earlier
client.evaluation_dashboards.widgets.remove(
    dashboard_id=dashboard.id,
    widget_id=widget.id
)
Removing a widget from a dashboard only removes it from that dashboard’s widget_order. The widget template itself remains in the system if it’s used by other dashboards. To permanently delete a widget template, use the delete endpoint.

Reordering Widgets

Organize your dashboard by rearranging widget order.
Via the UI:
  1. Navigate to your dashboard
  2. Click and drag widgets to reorder them
  3. Changes are saved automatically
Via the SDK:
# Specify the desired order of widget IDs
client.evaluation_dashboards.update(
    dashboard_id=dashboard.id,
    widget_order=["widget-3", "widget-1", "widget-2"]  # New order
)
The first widget ID in the array appears at the top of the dashboard, and subsequent widgets follow in order.
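Because widget_order also determines which widgets appear on the dashboard, a reorder call that drops or adds an ID would change the dashboard's contents, not just its layout. A client-side guard like the one below can catch that before the update is sent. This is a sketch only: `validate_reorder` is a hypothetical helper, and this guide does not state whether the API performs a similar check.

```python
from typing import List


def validate_reorder(current: List[str], proposed: List[str]) -> List[str]:
    """Return proposed only if it is a pure reordering of current."""
    if len(set(proposed)) != len(proposed):
        raise ValueError("proposed order contains duplicate widget IDs")
    if set(proposed) != set(current):
        raise ValueError("proposed order adds or removes widgets")
    return proposed


current_order = ["widget-1", "widget-2", "widget-3"]
new_order = validate_reorder(current_order, ["widget-3", "widget-1", "widget-2"])
```

The validated list can then be passed as `widget_order` in the update call shown above.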

Duplicating Dashboards

Create a copy of an existing dashboard to reuse its widget configuration with a different evaluation.

Use Cases for Duplication

  • Template Reuse: Apply a dashboard template to multiple evaluations
  • Comparison: Create identical dashboards for different evaluation runs
  • Iteration: Start with a proven dashboard structure and customize it

Creating from a Template

Via the UI:
  1. Navigate to the Dashboards page
  2. Go to the dashboard you want to duplicate
  3. Click the three dots menu and select “Duplicate Dashboard”
  4. Fill in the form and select the target evaluation
  5. Click “Duplicate”
Via the SDK:
# Create a new dashboard from an existing template
new_dashboard = client.evaluation_dashboards.create(
    name="Q1 Model Performance",
    evaluation_id="eval-new-123",
    template_dashboard_id="dash-template-abc"  # Source dashboard to copy
)
When you duplicate a dashboard:
  • The new dashboard copies the widget_order from the template
  • All widgets are reused (same widget IDs)
  • New widget results are computed for the new evaluation’s data
  • The template dashboard remains unchanged
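The duplication behavior listed above can be sketched with plain dictionaries. `duplicate_dashboard` and the `compute` callback are hypothetical stand-ins: in the real system the results are computed server-side from the new evaluation's data, whereas the callback here returns a placeholder value.

```python
from typing import Callable, Dict, Tuple


def duplicate_dashboard(
    template: dict,
    name: str,
    evaluation_id: str,
    compute: Callable[[str, str], float],
) -> Tuple[dict, Dict[Tuple[str, str], float]]:
    """Copy the template's widget_order and compute results for the new evaluation."""
    new_dashboard = {
        "name": name,
        "evaluation_id": evaluation_id,
        # Same widget IDs, but a separate list so the template stays unchanged.
        "widget_order": list(template["widget_order"]),
    }
    new_results = {
        (widget_id, evaluation_id): compute(widget_id, evaluation_id)
        for widget_id in new_dashboard["widget_order"]
    }
    return new_dashboard, new_results


template = {
    "id": "dash-template-abc",
    "name": "Template",
    "evaluation_id": "eval-old",
    "widget_order": ["widget-1", "widget-2"],
}
new_dashboard, new_results = duplicate_dashboard(
    template,
    name="Q1 Model Performance",
    evaluation_id="eval-new-123",
    compute=lambda widget_id, evaluation_id: 0.0,  # placeholder computation
)
```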
Best Practice: Create a “template” dashboard with your standard metrics and visualizations, then duplicate it for each new evaluation run to maintain consistency.

Next Steps