This guide covers common issues you might encounter when working with Dex and how to resolve them.

Installation Issues

Installing the Dex SDK

The Dex SDK is distributed via Scale’s private CodeArtifact repository and requires AWS authentication.
Prerequisites:
  • AWS CLI installed and configured
  • Access to Scale’s production AWS account
Installation steps:
  1. Authenticate with AWS SSO:
aws sso login
  2. Configure pip for CodeArtifact:
aws codeartifact login \
    --tool pip \
    --domain scale \
    --repository scale-pypi \
    --profile production-developer \
    --region us-west-2
  3. Install the SDK: The exact command can be found on GitHub, in the introduction notebooks.
  4. Verify installation:
import dex_sdk
print(f"Dex SDK version: {dex_sdk.__version__}")

Common Installation Errors

  • Token Expired (401 Unauthorized): Re-run the CodeArtifact login command to refresh your authentication token.
  • AWS CLI Not Configured: Ensure the AWS CLI is installed (aws --version) and configure AWS SSO for the production-developer profile.
  • Package Not Found: Verify that your AWS credentials are valid: aws sts get-caller-identity
  • Permission Denied: Use a virtual environment instead of sudo: python -m venv venv && source venv/bin/activate
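
Two of the checks above need no network access and can be scripted. The following is a minimal sketch, not part of the Dex SDK: the function names `in_virtualenv` and `preflight` are hypothetical, and the script only reports whether the `aws` executable is on the PATH and whether the interpreter is running inside a virtual environment.

```python
import shutil
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def preflight() -> dict:
    """Collect the local installation checks (hypothetical helper)."""
    return {
        "aws_cli_on_path": shutil.which("aws") is not None,
        "in_virtualenv": in_virtualenv(),
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

A failed `aws_cli_on_path` check corresponds to "AWS CLI Not Configured"; a failed `in_virtualenv` check is a likely cause of "Permission Denied" during pip install.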

Runtime Issues

Connection Problems

Symptoms: Cannot connect to the Dex service
Test connection:
import requests
# The timeout keeps this check from hanging if the service is unreachable
response = requests.get("https://dex.sgp.scale.com/health", timeout=10)
print(f"Status: {response.status_code}")
Solutions:
  • Check network connectivity
  • Verify credentials (see Authentication Errors below)
  • Confirm Dex service is available
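
To tell the three causes above apart, the health request can be wrapped so that each failure mode produces a different message. This is a sketch using only the standard library; `diagnose_connection` is a hypothetical helper, and how the health endpoint behaves for unauthenticated requests is an assumption.

```python
import socket
import urllib.error
import urllib.request

HEALTH_URL = "https://dex.sgp.scale.com/health"  # URL taken from this guide


def diagnose_connection(url: str = HEALTH_URL, timeout: float = 5.0,
                        opener=urllib.request.urlopen) -> str:
    """Return a short diagnosis for one health request (hypothetical helper)."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"reachable (HTTP {response.status})"
    except urllib.error.HTTPError as exc:
        # The server answered, so the network path itself works.
        if exc.code in (401, 403):
            return f"reachable, but credentials were rejected (HTTP {exc.code})"
        return f"reachable, but the service returned HTTP {exc.code}"
    except (socket.timeout, TimeoutError):
        return "timed out: the service may be unavailable or blocked"
    except urllib.error.URLError as exc:
        # Connect-phase timeouts arrive wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timed out: the service may be unavailable or blocked"
        # DNS failures, refused connections, TLS problems, and similar.
        return f"network error: {exc.reason}"
```

The `opener` parameter exists only so the function can be exercised without network access; in normal use, call `diagnose_connection()` with no arguments.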

Authentication Errors

Symptoms: Invalid credentials, permission denied
Verify credentials:
import os
sgp_account_id = os.getenv("SGP_ACCOUNT_ID")
sgp_api_key = os.getenv("SGP_API_KEY")

if not sgp_account_id or not sgp_api_key:
    print("❌ Missing credentials")
Solutions:
  • Set environment variables: SGP_ACCOUNT_ID and SGP_API_KEY
  • Verify your account has SGP access
  • Check API key hasn’t expired or been revoked
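
The first solution can be checked more precisely by reporting which variable is missing. A minimal sketch follows; `missing_credentials` is a hypothetical helper, and treating whitespace-only values as missing is an assumption.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("SGP_ACCOUNT_ID", "SGP_API_KEY")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print(f"Missing credentials: {', '.join(missing)}")
    else:
        print("Both credentials are set")
```

This only confirms that the variables are present; an expired or revoked key still has to be verified against the service.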

File Upload Errors

Symptoms: Upload fails, file too large, unsupported format
Supported formats:
  • Images: PNG, JPEG, TIFF, HEIC, and more
  • PDFs: PDF
  • Spreadsheets: CSV, XLSX, XLS
  • Documents: PPTX, DOCX, TXT, RTF
Solutions:
  • Verify file path is correct
  • Check file size is under 100MB
  • Ensure file format is supported
  • Verify file isn’t corrupted
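
Most of these checks can run locally before the upload is attempted. The sketch below is not part of the SDK: the helper names are hypothetical, the extension set contains only the formats named above (the image list ends with "and more", so a rejection is a hint rather than a verdict), and reading "100MB" as 100 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import List

MAX_BYTES = 100 * 1024 * 1024  # assumed reading of the 100MB limit
# Only the extensions named in this guide; the real list may be longer.
KNOWN_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".heic",
    ".pdf", ".csv", ".xlsx", ".xls", ".pptx", ".docx", ".txt", ".rtf",
}


def check_upload(name: str, size_bytes: int) -> List[str]:
    """Return a list of likely problems for a file name and size."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in KNOWN_EXTENSIONS:
        problems.append(f"extension {suffix or '(none)'} is not in the known list")
    if size_bytes == 0:
        problems.append("file is empty (possibly corrupted)")
    elif size_bytes >= MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; the limit is under {MAX_BYTES}")
    return problems


def check_upload_path(path: str) -> List[str]:
    """Run the same checks against a file on disk."""
    file = Path(path)
    if not file.is_file():
        return [f"not a file: {path}"]
    return check_upload(file.name, file.stat().st_size)
```

An empty list means none of the local checks failed; it does not guarantee that the service will accept the file.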

Parsing Errors

Symptoms: OCR fails, document can’t be processed
Common causes:
  • Low quality scans (use at least 300 DPI)
  • Password-protected documents
  • Extremely complex layouts
  • Very large documents timing out
Solutions:
  1. Check document quality and resolution
  2. Remove password protection before uploading
  3. Try a different OCR engine:
from dex_sdk.types import ParseEngine, ReductoParseJobParams

parse_result = await dex_file.parse(
    ReductoParseJobParams(engine=ParseEngine.SCALE_OCR)
)
  4. Split large documents into smaller chunks
  5. See complex layouts guide: Industry Document Types and Layout Challenges
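
For the step on splitting large documents, the page ranges for each chunk can be computed up front. This sketch covers only the arithmetic; `page_ranges` is a hypothetical helper, and writing the actual chunk files is left to whatever PDF tool you already use.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, pages_per_chunk: int) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (first, last) page ranges covering a document."""
    if total_pages < 0 or pages_per_chunk < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_chunk >= 1")
    return [
        (start, min(start + pages_per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_chunk)
    ]


if __name__ == "__main__":
    print(page_ranges(10, 4))  # [(1, 4), (5, 8), (9, 10)]
```

Each chunk can then be uploaded and parsed as its own file, which keeps any single parse job short enough to avoid a timeout.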

Extraction Errors

Symptoms: Extraction fails, schema errors, timeout
Common causes:
  • Invalid Pydantic schema
  • Model timeout or rate limiting
  • Insufficient context for extraction
  • Schema-data mismatch
Solutions:
  1. Validate your schema:
from pydantic import BaseModel, Field

class MySchema(BaseModel):
    field_name: str = Field(description="Clear description")

schema = MySchema.model_json_schema()
  2. Use clear prompts with detailed instructions
  3. Enable debugging:
extract_result = await parse_result.extract(
    extraction_schema=schema,
    user_prompt="Your detailed prompt",
    model="openai/gpt-4o",
    generate_citations=True,
    generate_confidence=True,
)
  4. Use vector stores for large documents:
vector_store = await project.create_vector_store(
    name="My Store",
    engine=VectorStoreEngines.SGP_KNOWLEDGE_BASE,
    embedding_model="openai/text-embedding-3-large",
)
await vector_store.add_parse_results([parse_result.id])

extract_result = await vector_store.extract(
    extraction_schema=schema,
    user_prompt=user_prompt,
    model="openai/gpt-4o",
)

Performance Issues

Slow Parsing

Causes: Large documents, complex layouts, high OCR load
Solutions:
  • Process documents asynchronously in batches
  • Cache parse results for frequently accessed documents
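
Both solutions can be combined in one helper: a semaphore bounds how many parses run at once, and a dictionary keeps results for documents that have already been parsed. This is a sketch under stated assumptions: `parse_all` and the `parse_one` callable are hypothetical (in practice `parse_one` would wrap your own upload and parse calls), and keying the cache by path assumes file contents do not change between runs.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional, Sequence


async def parse_all(
    paths: Sequence[str],
    parse_one: Callable[[str], Awaitable[Any]],
    max_concurrent: int = 4,
    cache: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Parse paths concurrently, reusing cached results where available."""
    cache = {} if cache is None else cache
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path: str) -> None:
        # At most max_concurrent parses are in flight at any time.
        async with semaphore:
            cache[path] = await parse_one(path)

    # Deduplicate while keeping order, and skip anything already cached.
    pending = [p for p in dict.fromkeys(paths) if p not in cache]
    await asyncio.gather(*(worker(p) for p in pending))
    return [cache[p] for p in paths]
```

Passing the same `cache` dictionary to later calls avoids re-parsing frequently accessed documents. A content hash would be a more robust cache key than the path if files are edited in place.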

Slow Extraction

Causes: Large context, complex schema, model performance
Solutions:
  • Use vector stores to reduce context size
  • Simplify extraction schema
  • Choose faster models for time-sensitive applications
  • Use batch processing

Slow Vector Store Search

Causes: Large document collections, complex queries
Solutions:
  • Use appropriate top_k values
  • Add filters to narrow search scope
  • Use vector_store.search_in_file() for file-specific searches
  • Create separate vector stores for different categories

Error Handling

Exception Types

Always wrap Dex operations in try-except blocks:
from dex_sdk.exceptions import DexException

try:
    project = await dex_client.create_project(...)
    dex_file = await project.upload_file(...)
    parse_result = await dex_file.parse(...)
    extract_result = await parse_result.extract(...)
except DexException as e:
    print(f"Dex error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Common Exceptions

Exception           | Cause                   | Solution
AuthenticationError | Invalid credentials     | Verify SGP_ACCOUNT_ID and SGP_API_KEY
FileUploadError     | File format/size issues | Check format and size limits
ParsingError        | OCR failure             | Try different OCR engine
ExtractionError     | Schema or model error   | Validate schema
ConnectionError     | Network issues          | Check connectivity
RateLimitError      | Too many requests       | Implement backoff/retry
PermissionError     | Insufficient access     | Check permissions
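
The backoff/retry suggestion for RateLimitError can be sketched as exponential backoff with full jitter. `with_backoff` is a hypothetical helper, not an SDK function; the exception class to retry on is passed in by the caller, because this guide does not state the import path for RateLimitError.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Tuple, Type, Union

ExceptionTypes = Union[Type[BaseException], Tuple[Type[BaseException], ...]]


async def with_backoff(
    operation: Callable[[], Awaitable[Any]],
    retry_on: ExceptionTypes,
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], Awaitable[Any]] = asyncio.sleep,
) -> Any:
    """Await operation(), retrying on retry_on with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time between 0 and the ceiling.
            await sleep(random.uniform(0, ceiling))
```

A call might look like `await with_backoff(lambda: parse_result.extract(...), retry_on=RateLimitError)`, assuming RateLimitError can be imported from the SDK's exceptions module. Only retry errors that are transient; retrying an AuthenticationError or a schema error just delays the failure.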

Debugging

Enable Logging

import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("dex_sdk")
logger.setLevel(logging.DEBUG)

Inspect Parse Results

parse_data = parse_result.data.model_dump()
print(f"Pages: {len(parse_data.get('pages', []))}")
print(f"Chunks: {len(parse_data.get('chunks', []))}")

Validate Extraction Results

extraction_data = extract_result.data.model_dump()

if 'citations' in extraction_data:
    print(f"Citations found: {len(extraction_data['citations'])}")

if 'confidence' in extraction_data:
    print(f"Confidence: {extraction_data['confidence']}")

Test Components Individually

try:
    project = await dex_client.create_project(...)
    print("✅ Project created")

    dex_file = await project.upload_file(...)
    print("✅ File uploaded")

    parse_result = await dex_file.parse(...)
    print("✅ Parsing completed")

    extract_result = await parse_result.extract(...)
    print("✅ Extraction completed")
except Exception as e:
    print(f"❌ Failed at step: {e}")
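
The single try block above reports the error but not which stage raised it. One way to name the failing stage is to run the stages through a small helper. This is a sketch only: `run_steps` is hypothetical, and each step is assumed to be a coroutine function that receives the previous step's result.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence, Tuple

Step = Tuple[str, Callable[[Any], Awaitable[Any]]]


async def run_steps(steps: Sequence[Step]) -> Any:
    """Run named steps in order, passing each result to the next step."""
    result = None
    for name, step in steps:
        try:
            result = await step(result)
        except Exception as exc:
            # Re-raise with the step name attached; the original
            # exception stays available as __cause__.
            raise RuntimeError(f"step '{name}' failed: {exc}") from exc
        print(f"✅ {name}")
    return result
```

With Dex, the steps would be thin wrappers around the calls shown above, for example `("parse", lambda dex_file: dex_file.parse(...))`, with the arguments filled in for your project.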

Getting Help

When Reporting Issues

Include:
  1. Error message - Full error text and stack trace
  2. Code snippet - Minimal reproducible example
  3. Document type - File format and characteristics
  4. SDK version - Output of pip show dex_sdk
  5. Environment - Python version, OS
  6. Expected vs actual behavior
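
Items 4 and 5 can be gathered with a few lines of standard-library code and pasted into the report. In this sketch `environment_report` is a hypothetical helper, and the distribution name `dex_sdk` is taken from the pip show command above.

```python
import platform
from importlib import metadata
from typing import Dict


def environment_report(package: str = "dex_sdk") -> Dict[str, str]:
    """Collect SDK version, Python version, and OS for an issue report."""
    try:
        sdk_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        sdk_version = "not installed"
    return {
        "sdk_version": sdk_version,
        "python": platform.python_version(),
        "os": f"{platform.system()} {platform.release()}",
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```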
