Capabilities Changelog

This page tracks updates and additions to Scale’s Capabilities documentation.

Latest Updates

Document Parsing with Dex in Compass

Added: February 2026

Compass now supports document parsing using Dex directly within workflows. This integration lets you process unstructured documents, extract structured data, and use the results in downstream workflow operations.

What’s New

New Documentation Page: Guide to Document Parsing with Dex in Compass

Step-by-step tutorial for building document parsing workflows
Connect to cloud file storage (Amazon S3, Azure Blob, Google Cloud Storage)
Filter and select documents for processing
Parse documents with configurable Dex engines (Reducto)
Automatically create and manage Dex projects
Push parsed data to Knowledge Base for semantic search and RAG
Use parsed content in subsequent workflow cards

Key Features

Dex Parse Card:

Project Management: Automatically create Dex projects or reuse existing ones
Multiple Parse Engines: Support for Reducto and other parsing engines
Flexible Configuration: Configurable chunking methods and chunk sizes
Knowledge Base Integration: Push parsed results to vector stores for semantic search
Output Control: Choose between detailed parsed content or metadata-only outputs

Data Connector for Cloud File Storages:

Connect to Amazon S3, Azure Blob Storage, and Google Cloud Storage
Browse and select files from registered cloud storage connections

Workflow Integration:

Access parsed content in downstream cards using the {{parsedContent}} variable
Call agents to analyze, categorize, or transform parsed documents
Export parsed data to SGP Datasets, SGP Evaluations, or the SGP Knowledge Base
Monitor parsing status and track results

Some Use Cases

Resume Parsing: Extract structured data from resumes and categorize candidates
Document Analysis: Parse contracts, invoices, or reports and extract key information
Content Extraction: Convert unstructured documents into structured datasets for training or analysis
Semantic Search: Build Knowledge Base indexes from document collections

Evaluations in Compass Workflows

Added: January 2026

Compass is an AI-powered low-code workflow builder that enables enterprise operators to import data, transform it, call LLMs or Agents, analyze or evaluate the output, and automate the entire workflow, all in one interface.

What’s New

Two documentation pages, one updated and one new:

Updated: Introduction to Compass
- Key features and available card types
- Scheduling and monitoring workflows
- Roadmap for H1 2026
New: Guide to Automating Evals on Compass
- Step-by-step tutorial for building automated evaluation workflows
- Connecting to data sources (Snowflake, Postgres, MongoDB, MS SQL)
- Calling Agentex agents for completions
- Joining ground truth datasets from other workflows
- Configuring LLM as Judge evaluations
- Exporting results to Evaluations and Dashboards
- Scheduling automated evaluation runs

Updates to Key Features

Data Source Connections: Query data directly from sources like Snowflake, Postgres, MongoDB, or MS SQL

Workflow Building Cards:

Call Agent: Generate outputs from your Agentex agents with configurable timeout, retries, and response caching
Join Workflow: Combine datasets from multiple workflows with Left, Right, Inner, or Outer joins
LLM as Judge: Evaluate agent outputs with customizable rubrics and judge configurations
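
The four join types named for the Join Workflow card follow the usual relational meaning. The sketch below illustrates those semantics in plain Python; it is not how Compass implements the card, and `join_rows` and the sample rows are hypothetical. It assumes each key appears at most once per side.

```python
def join_rows(left, right, key, how="inner"):
    """Join two lists of dict rows on `key` (keys assumed unique per side)."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    if how == "inner":
        keys = [k for k in left_by_key if k in right_by_key]
    elif how == "left":
        keys = list(left_by_key)
    elif how == "right":
        keys = list(right_by_key)
    elif how == "outer":
        keys = list(left_by_key) + [k for k in right_by_key if k not in left_by_key]
    else:
        raise ValueError(f"unknown join type: {how}")
    # Merge the matching rows; a missing side simply contributes no columns.
    return [{**left_by_key.get(k, {}), **right_by_key.get(k, {})} for k in keys]

# Hypothetical agent outputs and ground truth produced by two workflows.
outputs = [{"id": 1, "answer": "Paris"}, {"id": 2, "answer": "Rome"}]
ground_truth = [{"id": 2, "expected": "Rome"}, {"id": 3, "expected": "Oslo"}]

inner = join_rows(outputs, ground_truth, "id", "inner")  # only id 2
left = join_rows(outputs, ground_truth, "id", "left")    # ids 1 and 2
right = join_rows(outputs, ground_truth, "id", "right")  # ids 2 and 3
outer = join_rows(outputs, ground_truth, "id", "outer")  # ids 1, 2, and 3
```

For evaluation workflows, a left join from agent outputs to ground truth keeps every output row, including those that have no expected answer yet.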

SGP Integrations:

Access sources configured in the SGP Data Sources tab from your workflows
Export datasets directly to the SGP Datasets tab
Export evaluation results directly to the SGP Evaluations tab

Notifications for automations: Receive Slack notifications for failed or passed runs

How to see the features

Compass is available as the Workflows tab in SGP. Navigate to SGP > Workflows > New Workflow. Contact the Scale team if you don’t see it on your SGP instance.

Compass instances deployed in customer VPCs may have a limited set of available cards. Contact the Compass team to activate or disable cards based on your requirements.

Dex SDK: Version 0.4.0

Released: February 2025

Major improvements to the Dex SDK introducing unified entity operations, pagination, filtering, sorting, and enhanced async job support with SGP tracing integration.

🔄 Unified Entity Operations

New Feature: Consistent get and list operations are now available for all entities within a project.

What’s New:

Unified API: All entities (files, jobs, parse_results, extractions, vector_stores) now support standardized get and list operations
Project-scoped operations: Access all entity types through your project instance
Consistent patterns: Same approach works across all entity types

Example:

# List operations available for all entities
files = await project.list_files()
jobs = await project.list_jobs()
parse_results = await project.list_parse_results()
extractions = await project.list_extractions()
vector_stores = await project.list_vector_stores()

📄 Pagination, Filtering, and Sorting

New Feature: Token-based pagination with filtering and sorting for improved performance and discoverability.

What’s New:

Token-based pagination: Set page size and use continuation tokens to navigate through results
Entity-specific filters: Available for all entities with dedicated filter types
Flexible sorting: Sort results by any field in ascending or descending order
Performance improvements: Process large datasets more efficiently

Pagination:

from dex_sdk.types import PaginationParams

pagination_params = PaginationParams(
    page_size=10,
    sort_by="created_at",
    sort_order="desc",
    continuation_token=None
)

files = await project.list_files(pagination_params=pagination_params)
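
To walk every page, a loop can feed each response's continuation token back into the next request. The sketch below is self-contained and does not call the Dex SDK: `fetch_page`, the `Page` dataclass, and its `items` and `continuation_token` fields are hypothetical stand-ins, because the shape of the SDK's paginated response is not shown on this page.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    """Hypothetical paginated response; the real SDK type may differ."""
    items: list
    continuation_token: Optional[str]

# In-memory data that plays the role of the server-side collection.
_FILES = [f"file_{i}" for i in range(25)]

async def fetch_page(page_size: int, continuation_token: Optional[str]) -> Page:
    """Hypothetical stand-in for a call such as project.list_files(...)."""
    start = int(continuation_token) if continuation_token else 0
    end = start + page_size
    next_token = str(end) if end < len(_FILES) else None
    return Page(items=_FILES[start:end], continuation_token=next_token)

async def list_all(page_size: int = 10) -> list:
    """Collect every item by following continuation tokens until none is returned."""
    results, token = [], None
    while True:
        page = await fetch_page(page_size, token)
        results.extend(page.items)
        token = page.continuation_token
        if token is None:
            break
    return results

all_files = asyncio.run(list_all(page_size=10))
```

With the real SDK, the body of `fetch_page` would be replaced by the list call, passing the token through `PaginationParams(continuation_token=...)`.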

Filtering:

from dex_sdk.types import FileListFilter, JobListFilter, ParseResultListFilter, ExtractionListFilter, VectorStoreListFilter
from datetime import datetime, timedelta

# Filter for entities created in the last day
file_filter = FileListFilter(created_at_start=datetime.now() - timedelta(days=1))
job_filter = JobListFilter(created_at_start=datetime.now() - timedelta(days=1))
parse_result_filter = ParseResultListFilter(created_at_start=datetime.now() - timedelta(days=1))
extraction_filter = ExtractionListFilter(created_at_start=datetime.now() - timedelta(days=1))
vector_store_filter = VectorStoreListFilter(created_at_start=datetime.now() - timedelta(days=1))

# Apply filters
files = await project.list_files(pagination_params=pagination_params, filter=file_filter)
jobs = await project.list_jobs(pagination_params=pagination_params, filter=job_filter)
parse_results = await project.list_parse_results(pagination_params=pagination_params, filter=parse_result_filter)
extractions = await project.list_extractions(pagination_params=pagination_params, filter=extraction_filter)
vector_stores = await project.list_vector_stores(pagination_params=pagination_params, filter=vector_store_filter)

🔧 Enhanced Type Support

Improved: Better type safety with conditional return types based on pagination parameters.

What’s New:

Type overrides: SDK provides different types for paginated vs non-paginated responses
Backward compatibility: Existing code continues to work without changes
Better IDE support: Enhanced autocomplete and type checking
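
Conditional return types of this kind are commonly expressed in Python with `typing.overload`. The sketch below shows the general technique only; `list_files`, `PaginatedResponse`, and the plain-dict `pagination_params` are hypothetical and are not the SDK's actual signatures.

```python
from dataclasses import dataclass
from typing import Optional, overload

@dataclass
class PaginatedResponse:
    """Hypothetical paginated wrapper: one page plus a token for the next."""
    items: list[str]
    continuation_token: Optional[str]

_FILES = ["a.pdf", "b.pdf", "c.pdf"]

@overload
def list_files(pagination_params: None = None) -> list[str]: ...
@overload
def list_files(pagination_params: dict) -> PaginatedResponse: ...

def list_files(pagination_params: Optional[dict] = None):
    """Return a plain list without pagination, a PaginatedResponse with it."""
    if pagination_params is None:
        return list(_FILES)
    size = pagination_params["page_size"]
    token = str(size) if size < len(_FILES) else None
    return PaginatedResponse(items=_FILES[:size], continuation_token=token)

everything = list_files()                   # type checkers infer list[str]
first_page = list_files({"page_size": 2})   # type checkers infer PaginatedResponse
```

Because the no-argument overload keeps returning a plain list, callers written before pagination existed continue to type-check unchanged.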

⚡ Async Job Improvements

New Feature: Enhanced async job support with SGP tracing integration for better monitoring and debugging.

What’s New:

Job monitoring: Easily track job status and progress
Throttling support: Control job throughput to match your needs
SGP trace integration: Jobs are now connected to SGP traces for end-to-end observability
Trace retrieval: Fetch complete trace data for any job

Example:

from dex_sdk.types import ReductoParseJobParams, ReductoParseEngineOptions, ReductoChunkingOptions, ReductoChunkingMethod, ParseEngine, JobStatus
import asyncio

# Start a parse job
parse_job = await project.start_parse_job(
    dex_file=dex_file,
    parameters=ReductoParseJobParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            )
        ),
    ),
)

# Monitor job progress
while parse_job.data.status not in (JobStatus.SUCCEEDED, JobStatus.FAILED):
    await asyncio.sleep(1)
    await parse_job.refresh()

# Get the result
parse_result = await parse_job.get_result()

# Retrieve SGP traces for the job
from scale_gp_beta import SGPClient

sgp_client = SGPClient(...)

# Find the job's parent spans by job ID (replace the example ID with your own)
spans = list(sgp_client.spans.search(
    sort_by="created_at",
    sort_order="desc",
    extra_metadata={"job_id": "job_parse_eb62b57e8c4940028b82c559903ed003"},
    parents_only=True,
))

# Fetch every span that belongs to the same trace
trace_id = spans[0].trace_id
spans = list(sgp_client.spans.search(trace_ids=[trace_id]))

Benefits:

Better observability: Track your jobs through SGP’s tracing infrastructure
Easier debugging: Access detailed execution traces for failed jobs
Performance monitoring: Analyze job performance and identify bottlenecks
Request correlation: Connect job execution to API requests and traces
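
A polling loop like the one in the example runs forever if a job never reaches a terminal status, so production code usually adds a timeout and a cap on concurrent waits. The sketch below shows one client-side way to do both with `asyncio`; it is not the SDK's built-in throttling. `FakeJob`, the local `JobStatus` enum with its `RUNNING` member, and both helper functions are hypothetical stand-ins so the block runs without the SDK.

```python
import asyncio
from enum import Enum

class JobStatus(Enum):
    """Local stand-in for the SDK's JobStatus; RUNNING is an assumed member."""
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

class FakeJob:
    """Hypothetical job that succeeds after a fixed number of refreshes."""
    def __init__(self, refreshes_needed: int):
        self.status = JobStatus.RUNNING
        self._remaining = refreshes_needed

    async def refresh(self) -> None:
        self._remaining -= 1
        if self._remaining <= 0:
            self.status = JobStatus.SUCCEEDED

TERMINAL = {JobStatus.SUCCEEDED, JobStatus.FAILED}

async def wait_for_job(job: FakeJob, poll_interval: float = 0.01,
                       timeout: float = 1.0) -> JobStatus:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    async def _poll() -> JobStatus:
        while job.status not in TERMINAL:
            await asyncio.sleep(poll_interval)
            await job.refresh()
        return job.status
    # Raises asyncio.TimeoutError if the job is still running after `timeout`.
    return await asyncio.wait_for(_poll(), timeout=timeout)

async def wait_for_all(jobs: list, max_concurrent: int = 2) -> list:
    """Wait on many jobs while polling at most max_concurrent of them at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)
    async def _guarded(job: FakeJob) -> JobStatus:
        async with semaphore:
            return await wait_for_job(job)
    return await asyncio.gather(*(_guarded(job) for job in jobs))

statuses = asyncio.run(wait_for_all([FakeJob(n) for n in (1, 2, 3)]))
```

The intervals here are short so the sketch finishes quickly; real parse jobs would typically use a poll interval of a second or more and a much longer timeout.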

How to Update

1. Install/Update SDK: Use your configured CodeArtifact credentials to upgrade the Dex SDK to version 0.4.0 or higher.

2. Explore New Features:

Add pagination to large list operations for better performance
Use filters to find specific entities by creation time or other criteria
Integrate SGP tracing to monitor your async jobs
Leverage enhanced type support for better IDE experience

3. Backward Compatibility: Version 0.4.0 is fully backward compatible. Your existing code will continue to work without modifications.

Dex Document Understanding: Version 0.3.2

Released: November 2025

Significant updates to the Dex document understanding service and SDK, including authentication changes, data retention policies, and comprehensive documentation improvements.

🔐 Authentication Changes

Breaking Change: The authentication method for Dex has been updated to improve security and simplify the API.

What Changed:

SGP credentials are now passed directly to DexClient instead of being stored in project configurations
Every API request is now authenticated using your SGP API key and account ID
The ProjectCredentials and SGPCredentials types are deprecated and will be removed in a future version

Migration Required:

Old way (deprecated):

import os
from dex_sdk import DexClient
from dex_sdk.types import ProjectCredentials, SGPCredentials

dex_client = DexClient(base_url="https://dex.sgp.scale.com")

project = await dex_client.create_project(
    name="My Project",
    credentials=ProjectCredentials(
        sgp=SGPCredentials(
            account_id=os.getenv("SGP_ACCOUNT_ID"),
            api_key=os.getenv("SGP_API_KEY"),
        ),
    ),
)

New way (version 0.3.2+):

import os
from dex_sdk import DexClient

# Pass credentials when creating the client
dex_client = DexClient(
    base_url="https://dex.sgp.scale.com",
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)

# Create project without credentials parameter
project = await dex_client.create_project(
    name="My Project",
)

Benefits:

Enhanced Security: SGP credentials are no longer stored in the Dex database
Simpler API: Credentials are set once at client initialization
Consistent Authentication: Every request is authenticated the same way
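
Because `os.getenv` returns `None` when a variable is unset, a missing credential would otherwise only surface later as a failed request. The sketch below checks both variables up front and builds the keyword arguments shown in the new-style example; `load_sgp_credentials` is a hypothetical helper, not part of the SDK, and the `DexClient` call is left as a comment so the block runs without the SDK installed.

```python
from typing import Mapping

def load_sgp_credentials(env: Mapping[str, str]) -> dict:
    """Return DexClient keyword arguments, raising if a variable is missing or empty."""
    required = {"api_key": "SGP_API_KEY", "account_id": "SGP_ACCOUNT_ID"}
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in required.items()}

# Example with an explicit mapping; in real use, pass os.environ instead.
client_kwargs = load_sgp_credentials(
    {"SGP_API_KEY": "example-key", "SGP_ACCOUNT_ID": "example-account"}
)
# dex_client = DexClient(base_url="https://dex.sgp.scale.com", **client_kwargs)
```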

🗄️ Data Retention Policies

New Feature: Dex now supports configurable data retention policies for automatic lifecycle management of files and processing artifacts.

What’s New:

Automatic cleanup: Set retention periods for files and result artifacts to automatically delete data after a specified time
Flexible configuration: Configure different retention periods for files vs. processing artifacts
Project-level control: Retention policies are configured per project and can be updated at any time

Usage: Configure retention when creating or updating a project:

from datetime import timedelta
from dex_sdk.types import ProjectConfiguration, RetentionPolicy

project = await dex_client.create_project(
    name="My Project",
    configuration=ProjectConfiguration(
        retention=RetentionPolicy(
            files=timedelta(days=30),           # Files expire after 30 days
            result_artifacts=timedelta(days=7),  # Parse/extract results expire after 7 days
        )
    )
)

Use Cases:

Compliance: Meet regulatory requirements (GDPR, HIPAA) by enforcing data retention limits
Cost Management: Reduce storage costs by automatically cleaning up old files and artifacts
Security: Minimize data exposure by limiting how long sensitive documents are stored

See the Getting Started with Dex guide for detailed examples.
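
To reason about when data disappears under a given policy, the retention periods can be applied to a creation timestamp. The sketch below assumes retention is counted from the item's creation time, which this page does not state explicitly; `expiry_time` and `is_expired` are hypothetical helpers, not SDK functions.

```python
from datetime import datetime, timedelta, timezone

# Retention periods matching the example above.
FILES_RETENTION = timedelta(days=30)
RESULT_ARTIFACTS_RETENTION = timedelta(days=7)

def expiry_time(created_at: datetime, retention: timedelta) -> datetime:
    """Earliest moment an item becomes eligible for cleanup (assumed rule)."""
    return created_at + retention

def is_expired(created_at: datetime, retention: timedelta, now: datetime) -> bool:
    """True once `now` has reached the item's expiry time."""
    return now >= expiry_time(created_at, retention)

uploaded = datetime(2025, 11, 1, tzinfo=timezone.utc)
now = datetime(2025, 11, 10, tzinfo=timezone.utc)

# Nine days after upload: inside the 30-day window, past the 7-day window.
file_expired = is_expired(uploaded, FILES_RETENTION, now)
artifact_expired = is_expired(uploaded, RESULT_ARTIFACTS_RETENTION, now)
```

Under this assumption, a parse result can expire while its source file is still stored, so downstream steps that need the result should run within the shorter window or parse the file again.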

📚 Documentation Improvements

Major Update: Comprehensive expansion of the Dex SDK documentation with detailed type information and practical examples.

API Reference Enhancements:

Common Types Section: Added comprehensive documentation for all core data models organized by category:
- Project Types: ProjectEntity, ProjectStatus, ProjectConfiguration, RetentionPolicy
- File Types: FileEntity, FileStatus, FileDownloadURL
- Job Types: JobEntity, JobOperationType, JobStatus
- Parse Types: ParseResultEntity, ParseResultMetadata, ParseChunk, ParseBlock, BoundingBox, ParseEngine, ReductoChunkingMethod
- Extraction Types: ExtractionEntity, ExtractionResult, ExtractedField, ExtractionCitation, ExtractionParameters, UsageInfo
- Vector Store Types: VectorStoreEntity, VectorStoreEngines, VectorStoreChunk, SearchConfig
Parse Job Parameters: New section documenting parsing configuration options

Enhanced Examples:

Working with Parse Results: How to access parse metadata, iterate through chunks and blocks, access bounding boxes and confidence scores
Working with Extraction Results: How to access extracted field values, citations, confidence scores, and token usage information

Updated Documentation:

Introduction: Updated File Management section to mention data retention policies
Getting Started Guide: Added “Configuring Data Retention” section with practical examples
API Reference: Enhanced type documentation and examples

🔧 Type System Updates

ParseResult Naming Changes: The ParseResult classes have been harmonized for consistency:

Request objects now end with *Request (e.g., CustomParseResultRequest)
Entity objects now end with *Entity (e.g., ParseResultEntity)

Migration:

# Old import (deprecated)
from dex_sdk.types import CustomParseResult

# New import
from dex_sdk.types import CustomParseResultRequest

Improved Type Safety:

Fixed typing inconsistencies for entities returned by the Dex API
Better type hints for all SDK methods
Enhanced autocomplete support in IDEs

How to Update

1. Install/Update SDK: Use your configured CodeArtifact credentials to upgrade the Dex SDK to version 0.3.2 or higher.

2. Update Your Code:

Add api_key and account_id parameters to DexClient() initialization
Remove credentials parameter from create_project() calls
Remove imports of ProjectCredentials and SGPCredentials

3. Test Your Integration: Run your scripts to ensure they work with the new authentication method.

MCP Integration Status: Authentication via Model Context Protocol (MCP) is still in progress. If you encounter issues, please reach out to the Dex team in #dex-help.

New Capability: IRIS OCR

Added: October 2024

IRIS is Scale’s OCR capability that transforms document images and PDFs into structured text through an intelligent multi-stage pipeline.

What’s New

Two new documentation pages:

Introduction to IRIS
- Overview of IRIS OCR capability
- Three-stage pipeline architecture (layout detection, OCR processing, assembly)
- 15+ supported OCR models including open-source and vision-language models
- Multi-language support with specialized Arabic models
- Common use cases and key advantages
Getting Started with IRIS
- Comprehensive guide to using IRIS through Dex SDK
- Prerequisites and setup instructions
- Parsing PDFs and images with complete examples
- Configuration options for parse engine
- Understanding parse results and chunk structure
- File management and error handling
- Batch processing examples
- Multi-language support details
- Best practices for production use
- Performance considerations and optimization tips

Key Features

Layout-Aware Processing: Automatically detects text, tables, and images before OCR
Multiple OCR Engines: Choose from Tesseract, EasyOCR, PaddleOCR, Surya, GPT-4o, Gemini, and more
Table-Specific Processing: Specialized models optimized for accurate table extraction
Multi-Language Support: Process documents in 35+ languages including Arabic
Dex Integration: Seamless integration with Dex’s document understanding platform
Async Processing: Non-blocking parse jobs with project-based organization

How to Access

IRIS is available through the Dex SDK as a parse engine option:

from dex_core.models.parse_job import IrisParseEngineOptions, IrisParseJobParams

parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(options=IrisParseEngineOptions())
)

Configuration Updates

Updated: October 2024

Navigation Structure Improvements

Added explicit V5 (beta) version tags to all Capabilities navigation groups
Ensures proper scoping of Capabilities documentation to V5
Improved navigation organization for better user experience

Affected Sections:

Getting Started
Document Understanding
OCR
Workflows

Support and Feedback

Compass-Specific Questions

For questions or issues related to Compass Workflows:

Slack: #compass
Documentation: Introduction to Compass
Tutorial: Guide to Automating Evals on Compass

Dex-Specific Questions

For questions or issues related to Dex:

Slack: #dex-help
Documentation: Getting Started with Dex
API Reference: Dex SDK API Reference

General Capabilities Documentation

Have suggestions for improving our Capabilities documentation? Please contact the Scale AI team or submit feedback through your account dashboard.
