This page tracks updates and additions to Scale’s Capabilities documentation.

Latest Updates

Major Upgrades to Compass Workflows

Added: January 2026
Compass is an AI-powered low-code workflow builder that enables enterprise operators to import data, transform it, call LLMs or Agents, analyze or evaluate the output, and automate the entire workflow, all in one interface.

What’s New

Two documentation pages, one updated and one new:
  1. Updated: Introduction to Compass
    • Key features and available card types
    • Scheduling and monitoring workflows
    • Roadmap for H1 2026
  2. New: Guide to Automating Evals on Compass
    • Step-by-step tutorial for building automated evaluation workflows
    • Connecting to data sources (Snowflake, Postgres, MongoDB, MS SQL)
    • Calling Agentex agents for completions
    • Joining ground truth datasets from other workflows
    • Configuring LLM as Judge evaluations
    • Exporting results to Evaluations and Dashboards
    • Scheduling automated evaluation runs

Updates to Key Features

Data Source Connections: Query data directly from sources like Snowflake, Postgres, MongoDB, or MS SQL
Workflow Building Cards:
  • Call Agent: Generate outputs from your Agentex agents with configurable timeout, retries, and response caching
  • Join Workflow: Combine datasets from multiple workflows with Left, Right, Inner, or Outer joins
  • LLM as Judge: Evaluate agent outputs with customizable rubrics and judge configurations
SGP Integrations:
  • Access sources configured in the SGP Data Sources tab from your workflows
  • Export datasets directly to the SGP Datasets tab
  • Export evaluation results directly to the SGP Evaluations tab
Notifications for automations: Receive Slack notifications for failed or passed runs
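The four join types offered by the Join Workflow card follow standard relational semantics. The sketch below illustrates which rows each type keeps, using plain Python dictionaries; the `join` helper and the sample data are hypothetical and say nothing about how Compass implements the card. It assumes each key appears at most once per side.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, Optional[str], Optional[str]]  # (key, left value, right value)

def join(left: Dict[str, str], right: Dict[str, str], how: str) -> List[Row]:
    """Joins two key -> value mappings; a missing side is filled with None."""
    if how == "inner":
        keys = [k for k in left if k in right]          # keys present on both sides
    elif how == "left":
        keys = list(left)                               # every left key
    elif how == "right":
        keys = list(right)                              # every right key
    elif how == "outer":
        keys = list(left) + [k for k in right if k not in left]  # union of keys
    else:
        raise ValueError(f"unknown join type: {how}")
    return [(k, left.get(k), right.get(k)) for k in keys]

# Hypothetical sample data: agent outputs joined against ground truth by question ID.
agent_outputs = {"q1": "Paris", "q2": "Berlin"}
ground_truth = {"q2": "Berlin", "q3": "Madrid"}

inner_rows = join(agent_outputs, ground_truth, "inner")
left_rows = join(agent_outputs, ground_truth, "left")
right_rows = join(agent_outputs, ground_truth, "right")
outer_rows = join(agent_outputs, ground_truth, "outer")
```

For evaluation workflows, a left join from agent outputs keeps every completion even when no ground truth exists, while an inner join keeps only the rows that can actually be scored.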

How to see the features

Compass is available as the Workflows tab in SGP. Navigate to SGP > Workflows > New Workflow. Contact the Scale team if you don’t see it on your SGP instance.
Compass instances deployed in customer VPCs may have a limited set of available cards. Contact the Compass team to activate or disable cards based on your requirements.

Dex SDK: Version 0.4.0

Released: February 2025
Major improvements to the Dex SDK introducing unified entity operations, pagination, filtering, sorting, and enhanced async job support with SGP tracing integration.

🔄 Unified Entity Operations

New Feature: Consistent get and list operations are now available for all entities within a project.
What’s New:
  • Unified API: All entities (files, jobs, parse_results, extractions, vector_stores) now support standardized get and list operations
  • Project-scoped operations: Access all entity types through your project instance
  • Consistent patterns: Same approach works across all entity types
Example:
# List operations available for all entities
files = await project.list_files()
jobs = await project.list_jobs()
parse_results = await project.list_parse_results()
extractions = await project.list_extractions()
vector_stores = await project.list_vector_stores()

📄 Pagination, Filtering, and Sorting

New Feature: Token-based pagination with filtering and sorting for improved performance and discoverability.
What’s New:
  • Token-based pagination: Set page size and use continuation tokens to navigate through results
  • Entity-specific filters: Available for all entities with dedicated filter types
  • Flexible sorting: Sort results by any field in ascending or descending order
  • Performance improvements: Process large datasets more efficiently
Pagination:
from dex_sdk.types import PaginationParams

pagination_params = PaginationParams(
    page_size=10,
    sort_by="created_at",
    sort_order="desc",
    continuation_token=None
)

files = await project.list_files(pagination_params=pagination_params)
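To walk every page, a caller would typically feed each response's continuation token into the next request until no token comes back. The sketch below shows that loop against an in-memory stand-in. `fake_list_files`, the `Page` type, and its `items` and `continuation_token` fields are hypothetical and are not taken from the Dex SDK, so check the actual paginated response type before adapting it.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for a paginated response; the real SDK
# response type and field names may differ.
@dataclass
class Page:
    items: List[str]
    continuation_token: Optional[str]

_ALL_FILES = [f"file_{i}" for i in range(25)]

async def fake_list_files(page_size: int, continuation_token: Optional[str]) -> Page:
    """Simulates token-based pagination over an in-memory list."""
    start = int(continuation_token) if continuation_token else 0
    end = start + page_size
    # A token is returned only while more items remain.
    next_token = str(end) if end < len(_ALL_FILES) else None
    return Page(items=_ALL_FILES[start:end], continuation_token=next_token)

async def collect_all(page_size: int = 10) -> List[str]:
    """Follows continuation tokens until the listing stops returning one."""
    results: List[str] = []
    token: Optional[str] = None
    while True:
        page = await fake_list_files(page_size, token)
        results.extend(page.items)
        token = page.continuation_token
        if token is None:
            break
    return results

all_files = asyncio.run(collect_all())
```

Treating the token as opaque (passing it back unchanged rather than parsing it) keeps the loop independent of how the server encodes its position.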
Filtering:
from dex_sdk.types import FileListFilter, JobListFilter, ParseResultListFilter, ExtractionListFilter, VectorStoreListFilter
from datetime import datetime, timedelta

# Filter for entities created in the last day
file_filter = FileListFilter(created_at_start=datetime.now() - timedelta(days=1))
job_filter = JobListFilter(created_at_start=datetime.now() - timedelta(days=1))
parse_result_filter = ParseResultListFilter(created_at_start=datetime.now() - timedelta(days=1))
extraction_filter = ExtractionListFilter(created_at_start=datetime.now() - timedelta(days=1))
vector_store_filter = VectorStoreListFilter(created_at_start=datetime.now() - timedelta(days=1))

# Apply filters
files = await project.list_files(pagination_params=pagination_params, filter=file_filter)
jobs = await project.list_jobs(pagination_params=pagination_params, filter=job_filter)
parse_results = await project.list_parse_results(pagination_params=pagination_params, filter=parse_result_filter)
extractions = await project.list_extractions(pagination_params=pagination_params, filter=extraction_filter)
vector_stores = await project.list_vector_stores(pagination_params=pagination_params, filter=vector_store_filter)

🔧 Enhanced Type Support

Improved: Better type safety with conditional return types based on pagination parameters.
What’s New:
  • Type overrides: SDK provides different types for paginated vs non-paginated responses
  • Backward compatibility: Existing code continues to work without changes
  • Better IDE support: Enhanced autocomplete and type checking
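Conditional return types of this kind are commonly expressed in Python with `typing.overload`. The following is a minimal sketch of the pattern, not the SDK's actual signatures: the `PaginationParams` and `PaginatedResponse` classes here are simplified hypothetical stand-ins, and `list_files` is a synchronous toy function over an in-memory list.

```python
from dataclasses import dataclass
from typing import List, Optional, Union, overload

@dataclass
class PaginationParams:  # hypothetical stand-in, not the SDK class
    page_size: int = 10

@dataclass
class PaginatedResponse:  # hypothetical stand-in
    items: List[str]
    continuation_token: Optional[str] = None

_FILES = ["a.pdf", "b.pdf", "c.pdf"]

@overload
def list_files(pagination_params: None = None) -> List[str]: ...
@overload
def list_files(pagination_params: PaginationParams) -> PaginatedResponse: ...

def list_files(
    pagination_params: Optional[PaginationParams] = None,
) -> Union[List[str], PaginatedResponse]:
    """Returns a plain list without pagination and a page object with it."""
    if pagination_params is None:
        return list(_FILES)
    size = pagination_params.page_size
    token = str(size) if size < len(_FILES) else None
    return PaginatedResponse(items=_FILES[:size], continuation_token=token)

plain = list_files()                               # type checkers infer List[str]
paged = list_files(PaginationParams(page_size=2))  # inferred as PaginatedResponse
```

Because the no-argument overload keeps returning a plain list, callers written before pagination existed type-check and behave as before, which is how a design like this can stay backward compatible.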

⚡ Async Job Improvements

New Feature: Enhanced async job support with SGP tracing integration for better monitoring and debugging.
What’s New:
  • Job monitoring: Easily track job status and progress
  • Throttling support: Control job throughput to match your needs
  • SGP trace integration: Jobs are now connected to SGP traces for end-to-end observability
  • Trace retrieval: Fetch complete trace data for any job
Example:
from dex_sdk.types import ReductoParseJobParams, ReductoParseEngineOptions, ReductoChunkingOptions, ReductoChunkingMethod, ParseEngine, JobStatus
import asyncio

# Start a parse job
parse_job = await project.start_parse_job(
    dex_file=dex_file,
    parameters=ReductoParseJobParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            )
        ),
    ),
)

# Monitor job progress
while parse_job.data.status not in (JobStatus.SUCCEEDED, JobStatus.FAILED):
    await asyncio.sleep(1)
    await parse_job.refresh()

# Get the result
parse_result = await parse_job.get_result()

# Retrieve SGP traces for the job
from scale_gp_beta import SGPClient

sgp_client = SGPClient(...)
spans = [span for span in sgp_client.spans.search(
    sort_by="created_at",
    sort_order="desc",
    extra_metadata={"job_id": "job_parse_eb62b57e8c4940028b82c559903ed003"},
    parents_only=True,
)]

trace_id = spans[0].trace_id
spans = [span for span in sgp_client.spans.search(trace_ids=[trace_id])]
Benefits:
  • Better observability: Track your jobs through SGP’s tracing infrastructure
  • Easier debugging: Access detailed execution traces for failed jobs
  • Performance monitoring: Analyze job performance and identify bottlenecks
  • Request correlation: Connect job execution to API requests and traces
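The monitoring loop in the example polls once per second with no upper bound on the wait. For production use, a bounded wait with a capped, growing delay may be preferable. The sketch below shows one way to do that; `wait_for_job`, `FakeJob`, and the local `JobStatus` enum are hypothetical stand-ins that only mirror the `refresh()` and `data.status` shape used above, and the timing values are arbitrary.

```python
import asyncio
import time
from enum import Enum
from types import SimpleNamespace

class JobStatus(Enum):  # hypothetical stand-in for dex_sdk.types.JobStatus
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

class FakeJob:
    """Stand-in job that succeeds after a fixed number of refreshes."""
    def __init__(self, refreshes_until_done: int) -> None:
        self._remaining = refreshes_until_done
        self.data = SimpleNamespace(status=JobStatus.RUNNING)

    async def refresh(self) -> None:
        self._remaining -= 1
        if self._remaining <= 0:
            self.data.status = JobStatus.SUCCEEDED

TERMINAL = {JobStatus.SUCCEEDED, JobStatus.FAILED}

async def wait_for_job(job, timeout_s: float = 5.0, interval_s: float = 0.01,
                       max_interval_s: float = 0.1) -> JobStatus:
    """Polls until the job is terminal, doubling the delay up to a cap."""
    deadline = time.monotonic() + timeout_s
    while job.data.status not in TERMINAL:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.data.status} after {timeout_s}s")
        await asyncio.sleep(interval_s)
        interval_s = min(interval_s * 2, max_interval_s)
        await job.refresh()
    return job.data.status

final_status = asyncio.run(wait_for_job(FakeJob(refreshes_until_done=3)))
```

The helper returns the terminal status rather than raising on `FAILED`, so the caller decides whether to fetch the result or inspect the job's traces.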

How to Update

1. Install/Update SDK: Use your configured CodeArtifact credentials to upgrade the Dex SDK to version 0.4.0 or higher.
2. Explore New Features:
  • Add pagination to large list operations for better performance
  • Use filters to find specific entities by creation time or other criteria
  • Integrate SGP tracing to monitor your async jobs
  • Leverage enhanced type support for better IDE experience
3. Backward Compatibility: Version 0.4.0 is fully backward compatible. Your existing code will continue to work without modifications.

Dex Document Understanding: Version 0.3.2

Released: November 2025
Significant updates to the Dex document understanding service and SDK, including authentication changes, data retention policies, and comprehensive documentation improvements.

🔐 Authentication Changes

Breaking Change: The authentication method for Dex has been updated to improve security and simplify the API.
What Changed:
  • SGP credentials are now passed directly to DexClient instead of being stored in project configurations
  • Every API request is now authenticated using your SGP API key and account ID
  • The ProjectCredentials and SGPCredentials types are deprecated and will be removed in a future version
Migration Required:
Old way (deprecated):
import os
from dex_sdk import DexClient
from dex_sdk.types import ProjectCredentials, SGPCredentials

dex_client = DexClient(base_url="https://dex.sgp.scale.com")

project = await dex_client.create_project(
    name="My Project",
    credentials=ProjectCredentials(
        sgp=SGPCredentials(
            account_id=os.getenv("SGP_ACCOUNT_ID"),
            api_key=os.getenv("SGP_API_KEY"),
        ),
    ),
)
New way (version 0.3.2+):
import os
from dex_sdk import DexClient

# Pass credentials when creating the client
dex_client = DexClient(
    base_url="https://dex.sgp.scale.com",
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)

# Create project without credentials parameter
project = await dex_client.create_project(
    name="My Project",
)
Benefits:
  • Enhanced Security: SGP credentials are no longer stored in the Dex database
  • Simpler API: Credentials are set once at client initialization
  • Consistent Authentication: Every request is authenticated the same way
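Because `os.getenv` returns `None` for an unset variable, a missing `SGP_API_KEY` or `SGP_ACCOUNT_ID` would otherwise only surface when a request fails. A minimal sketch of a fail-fast check follows; `load_sgp_credentials` is a hypothetical helper, not part of the Dex SDK, and it only uses the `api_key` and `account_id` parameter names shown in the example above.

```python
import os
from typing import Dict, Mapping

def load_sgp_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Reads SGP_API_KEY and SGP_ACCOUNT_ID, raising if either is unset or empty."""
    required = {"api_key": "SGP_API_KEY", "account_id": "SGP_ACCOUNT_ID"}
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Keys match the DexClient keyword arguments shown in the migration example.
    return {kwarg: env[var] for kwarg, var in required.items()}

# An explicit mapping is passed here so the sketch runs without real secrets.
creds = load_sgp_credentials(
    {"SGP_API_KEY": "example-key", "SGP_ACCOUNT_ID": "example-account"}
)
```

The returned dictionary could then be unpacked into the client constructor, for example `DexClient(base_url=..., **creds)`.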

🗄️ Data Retention Policies

New Feature: Dex now supports configurable data retention policies for automatic lifecycle management of files and processing artifacts.
What’s New:
  • Automatic cleanup: Set retention periods for files and result artifacts to automatically delete data after a specified time
  • Flexible configuration: Configure different retention periods for files vs. processing artifacts
  • Project-level control: Retention policies are configured per project and can be updated at any time
Usage: Configure retention when creating or updating a project:
from datetime import timedelta
from dex_sdk.types import ProjectConfiguration, RetentionPolicy

project = await dex_client.create_project(
    name="My Project",
    configuration=ProjectConfiguration(
        retention=RetentionPolicy(
            files=timedelta(days=30),           # Files expire after 30 days
            result_artifacts=timedelta(days=7),  # Parse/extract results expire after 7 days
        )
    )
)
Use Cases:
  • Compliance: Meet regulatory requirements (GDPR, HIPAA) by enforcing data retention limits
  • Cost Management: Reduce storage costs by automatically cleaning up old files and artifacts
  • Security: Minimize data exposure by limiting how long sensitive documents are stored
See the Getting Started with Dex guide for detailed examples.
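To reason about when data would disappear under the retention periods in the usage example, the arithmetic can be sketched with the standard library alone. This assumes the retention clock starts at creation time, which the release notes do not state; `expires_at` and `is_expired` are hypothetical helpers, not SDK functions.

```python
from datetime import datetime, timedelta, timezone

# Retention periods taken from the usage example.
FILE_RETENTION = timedelta(days=30)
ARTIFACT_RETENTION = timedelta(days=7)

def expires_at(created_at: datetime, retention: timedelta) -> datetime:
    """Assumed rule: an item expires `retention` after it was created."""
    return created_at + retention

def is_expired(created_at: datetime, retention: timedelta, now: datetime) -> bool:
    """True once `now` has reached the assumed expiry time."""
    return now >= expires_at(created_at, retention)

uploaded = datetime(2025, 11, 1, tzinfo=timezone.utc)
file_expiry = expires_at(uploaded, FILE_RETENTION)          # 30 days after upload
artifact_expiry = expires_at(uploaded, ARTIFACT_RETENTION)  # 7 days after upload

# Ten days after upload the artifacts are gone but the file is still kept.
check_time = uploaded + timedelta(days=10)
artifact_gone = is_expired(uploaded, ARTIFACT_RETENTION, check_time)
file_gone = is_expired(uploaded, FILE_RETENTION, check_time)
```

With these periods, parse and extraction results would need to be regenerated from the source file if they are requested after the first week.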

📚 Documentation Improvements

Major Update: Comprehensive expansion of the Dex SDK documentation with detailed type information and practical examples.
API Reference Enhancements:
  • Common Types Section: Added comprehensive documentation for all core data models organized by category:
    • Project Types: ProjectEntity, ProjectStatus, ProjectConfiguration, RetentionPolicy
    • File Types: FileEntity, FileStatus, FileDownloadURL
    • Job Types: JobEntity, JobOperationType, JobStatus
    • Parse Types: ParseResultEntity, ParseResultMetadata, ParseChunk, ParseBlock, BoundingBox, ParseEngine, ReductoChunkingMethod
    • Extraction Types: ExtractionEntity, ExtractionResult, ExtractedField, ExtractionCitation, ExtractionParameters, UsageInfo
    • Vector Store Types: VectorStoreEntity, VectorStoreEngines, VectorStoreChunk, SearchConfig
  • Parse Job Parameters: New section documenting parsing configuration options
Enhanced Examples:
  • Working with Parse Results: How to access parse metadata, iterate through chunks and blocks, and access bounding boxes and confidence scores
  • Working with Extraction Results: How to access extracted field values, citations, confidence scores, and token usage information
Updated Documentation:
  • Introduction: Updated File Management section to mention data retention policies
  • Getting Started Guide: Added “Configuring Data Retention” section with practical examples
  • API Reference: Enhanced type documentation and examples

🔧 Type System Updates

ParseResult Naming Changes: The ParseResult classes have been harmonized for consistency:
  • Request objects now end with *Request (e.g., CustomParseResultRequest)
  • Entity objects now end with *Entity (e.g., ParseResultEntity)
Migration:
# Old import (deprecated)
from dex_sdk.types import CustomParseResult

# New import
from dex_sdk.types import CustomParseResultRequest
Improved Type Safety:
  • Fixed typing inconsistencies for entities returned by the Dex API
  • Better type hints for all SDK methods
  • Enhanced autocomplete support in IDEs

How to Update

1. Install/Update SDK: Use your configured CodeArtifact credentials to upgrade the Dex SDK to version 0.3.2 or higher.
2. Update Your Code:
  1. Add api_key and account_id parameters to DexClient() initialization
  2. Remove credentials parameter from create_project() calls
  3. Remove imports of ProjectCredentials and SGPCredentials
3. Test Your Integration: Run your scripts to ensure they work with the new authentication method.
MCP Integration Status: Authentication via Model Context Protocol (MCP) is still in progress. If you encounter issues, please reach out to the Dex team at #dex-help.

New Capability: IRIS OCR

Added: October 2024
IRIS is Scale’s OCR capability that transforms document images and PDFs into structured text through an intelligent multi-stage pipeline.

What’s New

Two new documentation pages:
  1. Introduction to IRIS
    • Overview of IRIS OCR capability
    • Three-stage pipeline architecture (layout detection, OCR processing, assembly)
    • 15+ supported OCR models including open-source and vision-language models
    • Multi-language support with specialized Arabic models
    • Common use cases and key advantages
  2. Getting Started with IRIS
    • Comprehensive guide to using IRIS through Dex SDK
    • Prerequisites and setup instructions
    • Parsing PDFs and images with complete examples
    • Configuration options for parse engine
    • Understanding parse results and chunk structure
    • File management and error handling
    • Batch processing examples
    • Multi-language support details
    • Best practices for production use
    • Performance considerations and optimization tips

Key Features

  • Layout-Aware Processing: Automatically detects text, tables, and images before OCR
  • Multiple OCR Engines: Choose from Tesseract, EasyOCR, PaddleOCR, Surya, GPT-4o, Gemini, and more
  • Table-Specific Processing: Specialized models optimized for accurate table extraction
  • Multi-Language Support: Process documents in 35+ languages including Arabic
  • Dex Integration: Seamless integration with Dex’s document understanding platform
  • Async Processing: Non-blocking parse jobs with project-based organization

How to Access

IRIS is available through the Dex SDK as a parse engine option:
from dex_core.models.parse_job import IrisParseEngineOptions, IrisParseJobParams

parse_job = await dex_file.start_parse_job(
    IrisParseJobParams(options=IrisParseEngineOptions())
)

Configuration Updates

Updated: October 2024
  • Added explicit V5 (beta) version tags to all Capabilities navigation groups
  • Ensures proper scoping of Capabilities documentation to V5
  • Improved navigation organization for better user experience
Affected Sections:
  • Getting Started
  • Document Understanding
  • OCR
  • Workflows

Support and Feedback

Compass-Specific Questions

For questions or issues related to Compass Workflows:

Dex-Specific Questions

For questions or issues related to Dex:

General Capabilities Documentation

Have suggestions for improving our Capabilities documentation? Please contact the Scale AI team or submit feedback through your account dashboard.