This repository contains code, example notebooks, and documentation for Scale’s Dex service, a document understanding capability that helps extract accurate information from unstructured data.

Overview

Dex is Scale’s document understanding service that enables:
  • Document Parsing - Convert any document (PDFs, DOCX, images, etc.) into structured JSON format
  • Data Extraction - Extract specific information using custom schemas and prompts
  • Project Management - Organize and isolate data with proper credential management
  • File Management - Secure file upload and storage with cloud provider integration

Prerequisites

Before using this repository, ensure you have:
  • ✅ A valid Scale account with SGP (Scale General Platform) access
  • ✅ Your SGP account ID and API key set as environment variables:
export SGP_ACCOUNT_ID="your_account_id"
export SGP_API_KEY="your_api_key"
  • ✅ VPN connection to Scale’s internal network
  • ✅ Python 3.8+ installed
  • ✅ Required Python packages (see Installation section)
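Before creating a project, it can help to fail fast when the credentials above are missing. The sketch below is a hypothetical helper (`missing_sgp_env_vars` is not part of the Dex SDK). It only checks that the two variables are set and non-empty; it cannot tell whether the account ID or API key is actually valid.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_SGP_VARS = ("SGP_ACCOUNT_ID", "SGP_API_KEY")


def missing_sgp_env_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required SGP variables that are unset or blank.

    Hypothetical helper, not part of the Dex SDK. Presence check only.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_SGP_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_sgp_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.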

Installation

1. Install Dex SDK

The Dex SDK can be installed from Scale’s internal AWS CodeArtifact repository.

2. Alternative: Install from Local Wheels

If you have the wheel files locally:
pip install sdk/dex_core-xxx.whl sdk/dex_sdk-xxx.whl

3. Install Additional Dependencies

pip install requests pydantic

Quick Start

1. Initialize Dex Client

import os
from datetime import datetime
from dex_sdk import DexClient
from dex_sdk.types import ProjectCredentials, SGPCredentials

# Initialize the Dex client
dex_client = DexClient(
    base_url="https://dex.sgp.scale.com",
)
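Steps 2 to 5 use `await` at the top level, which works in Jupyter notebooks but not in a plain Python script. A minimal sketch for scripts, assuming the SDK methods are ordinary coroutines as the `await` calls suggest: wrap the steps in an `async def` function and start it with `asyncio.run()`. The SDK calls appear only as comments so the skeleton runs on its own, and the name `main` is illustrative.

```python
import asyncio


async def main() -> str:
    # Replace the placeholder below with the Quick Start steps, for example:
    #   dex_client = DexClient(base_url="https://dex.sgp.scale.com")
    #   project = await dex_client.create_project(...)
    #   dex_file = await project.upload_file("path/to/your/document.pdf")
    #   parse_result = await dex_file.parse(...)
    await asyncio.sleep(0)  # placeholder for the awaited SDK calls
    return "done"


if __name__ == "__main__":
    # asyncio.run() creates an event loop, runs main() to completion, and closes the loop.
    # It cannot be called while a loop is already running, e.g. inside a Jupyter cell.
    print(asyncio.run(main()))
```

In a notebook, keep using top-level `await` as shown in the steps.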

2. Create a Project

# Create a project with SGP credentials
project = await dex_client.create_project(
    name="My Dex Project",
    credentials=ProjectCredentials(
        sgp=SGPCredentials(
            account_id=os.getenv("SGP_ACCOUNT_ID"),
            api_key=os.getenv("SGP_API_KEY"),
        ),
    ),
)

3. Upload a Document

# Upload a file to the project
dex_file = await project.upload_file("path/to/your/document.pdf")
print(f"✅ File uploaded successfully! File ID: {dex_file.id}")
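Several of the file upload problems listed under Troubleshooting (wrong path, unsupported format, oversized file) can be caught locally before calling `upload_file()`. The sketch below is a hypothetical helper, not part of the Dex SDK. The extension set is an assumption based on the formats named in the Overview, and the size limit is left as a parameter because this README does not state one.

```python
from pathlib import Path
from typing import List, Optional

# Assumed set of accepted extensions, based on the Overview (PDFs, DOCX, images).
# Dex may accept other formats; adjust this to match the service.
ASSUMED_EXTENSIONS = {".pdf", ".docx", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def preflight_problems(path: str, max_bytes: Optional[int] = None) -> List[str]:
    """Return local problems found with `path`; an empty list means none were found."""
    p = Path(path)
    if not p.is_file():
        return [f"File not found: {p}"]
    problems = []
    if p.suffix.lower() not in ASSUMED_EXTENSIONS:
        problems.append(f"Unrecognized extension {p.suffix or '(none)'!r}")
    size = p.stat().st_size
    if max_bytes is not None and size > max_bytes:
        problems.append(f"File is {size} bytes, above the limit of {max_bytes}")
    return problems
```

An empty result does not guarantee that the upload will succeed; it only rules out the local causes.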

4. Parse the Document

from dex_sdk.types import (
    ParseEngine,
    ParseJobRequestParams,
    ReductoChunkingMethod,
    ReductoChunkingOptions,
    ReductoParseEngineOptions,
)

# Parse the document
parse_result = await dex_file.parse(
    ParseJobRequestParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            )
        ),
    )
)

5. Extract Structured Data

from pydantic import BaseModel, Field

# Define your extraction schema
class FinancialData(BaseModel):
    """Schema for financial data extraction"""
    taxable_this_period: float = Field(description="Taxable income for this period in dollars")
    tax_exempt_this_period: float = Field(description="Tax-exempt income for this period in dollars")
    # ... add more fields as needed

# Extract data using prompts and schema
system_prompt = "You are a helpful assistant that extracts financial data from documents with high accuracy."
user_prompt = """
From the provided text, extract the following:
1. **Income Summary**
   - Taxable income for this period
   - Tax-exempt income for this period
   - ... add more extraction instructions
"""

extract_result = await parse_result.extract(
    parameters={
        "prompt": system_prompt,
        "user_prompt": user_prompt,
        "extraction_schema": FinancialData,
        "model": "openai/gpt-4o",
    }
)
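Because the extraction schema is a Pydantic model, it can also act as the single source of truth for the prompt and for checking results locally. A minimal sketch with hypothetical helper names (`field_descriptions`, `build_user_prompt`, `validate_record`): it turns the field descriptions into numbered prompt instructions and validates a dictionary against the schema. It repeats the two-field schema so it runs on its own. The sample values are made up, and this README does not document the shape of `extract_result`, so how you obtain such a dictionary from it is an assumption. The field lookup handles both Pydantic v1 and v2.

```python
from typing import Any, Dict

from pydantic import BaseModel, Field, ValidationError


class FinancialData(BaseModel):
    """Schema for financial data extraction (same fields as above)."""
    taxable_this_period: float = Field(description="Taxable income for this period in dollars")
    tax_exempt_this_period: float = Field(description="Tax-exempt income for this period in dollars")


def field_descriptions(model: type) -> Dict[str, str]:
    # Pydantic v2 exposes `model_fields`; v1 exposes `__fields__` whose entries carry `field_info`.
    fields = getattr(model, "model_fields", None)
    if fields is not None:
        return {name: f.description for name, f in fields.items()}
    return {name: f.field_info.description for name, f in model.__fields__.items()}


def build_user_prompt(model: type) -> str:
    # One numbered instruction per schema field, so the prompt and schema stay in sync.
    lines = ["From the provided text, extract the following:"]
    for i, description in enumerate(field_descriptions(model).values(), start=1):
        lines.append(f"{i}. {description}")
    return "\n".join(lines)


def validate_record(raw: Dict[str, Any]) -> FinancialData:
    # Raises pydantic.ValidationError if a field is missing or cannot be read as a float.
    return FinancialData(**raw)


if __name__ == "__main__":
    print(build_user_prompt(FinancialData))
    record = validate_record({"taxable_this_period": 1250.0, "tax_exempt_this_period": 310.5})
    print(record.taxable_this_period)
    try:
        validate_record({"taxable_this_period": 1250.0})
    except ValidationError as err:
        print(f"Invalid record, problem at: {err.errors()[0]['loc']}")
```

Generating the instructions from the schema is one design option; hand-written prompts like the one above give more control over wording.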

Troubleshooting

Common Issues

  1. VPN Connection Problems
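import requests
# Test VPN connection: without VPN, the request usually raises a connection error or times out
try:
    reachable = requests.get("https://dex.sgp.scale.com", timeout=10).status_code == 200
except requests.exceptions.RequestException:
    reachable = False
if not reachable:
    print("Connect to VPN and try again")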
  2. Authentication Errors
    • Verify that the SGP_ACCOUNT_ID and SGP_API_KEY environment variables are set correctly
    • Check that your Scale account has SGP access
  3. File Upload Issues
    • Ensure the file format is supported
    • Check file size limits
    • Verify that the file path is correct
  4. Parsing Failures
    • Check document quality (scanned documents may need higher resolution)
    • Try different chunking methods
    • Verify OCR engine compatibility

API Reference

DexClient

  • create_project() - Create a new project
  • list_projects() - List all projects

Project

  • upload_file() - Upload a document
  • list_files() - List uploaded files

DexFile

  • parse() - Parse document to structured format

ParseResult

  • extract() - Extract structured data using schemas

Contributing

When adding new test cases or examples:
  1. Follow the existing notebook structure
  2. Include clear documentation and comments
  3. Test with various document types
  4. Update this README with new features or examples

Support

For issues or questions:
  • Check the troubleshooting section above
  • Review the notebook examples
  • Contact the Scale Dex team for technical support in the #sgp-document-understanding-capability channel

License

This repository is for internal Scale use only.