Troubleshooting Dex

This guide covers common issues you might encounter when working with Dex and how to resolve them.

Installation Issues

Installing the Dex SDK

The Dex SDK is distributed via Scale’s private CodeArtifact repository and requires AWS authentication.

Prerequisites:

AWS CLI installed and configured
Access to Scale’s production AWS account

Step 1: Authenticate with AWS SSO

Before installing packages from CodeArtifact, authenticate with AWS:

aws sso login

This will open your browser for authentication. Complete the login process.

Step 2: Configure pip for CodeArtifact

After SSO login, configure pip to use CodeArtifact:

aws codeartifact login \
    --tool pip \
    --domain scale \
    --repository scale-pypi \
    --profile production-developer \
    --region us-west-2

This command:

Authenticates pip with CodeArtifact
Configures your pip to use Scale’s private repository
Sets up a temporary token (typically valid for 12 hours)
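To confirm that the login command took effect, you can inspect the index URL pip is now using (for example, the output of pip config get global.index-url). The sketch below is a minimal check, not part of the Dex SDK: the helper names are hypothetical, and the account ID in the usage example is a placeholder. It relies only on the general shape of CodeArtifact endpoints and hides the embedded token before printing.

```python
from urllib.parse import urlparse


def looks_like_codeartifact_index(index_url: str) -> bool:
    """Return True if the URL host has the shape of a CodeArtifact endpoint."""
    host = urlparse(index_url).hostname or ""
    return ".d.codeartifact." in host and host.endswith(".amazonaws.com")


def redact_token(index_url: str) -> str:
    """Replace the token embedded in the URL so it is safe to print or share."""
    parsed = urlparse(index_url)
    if parsed.password is None:
        return index_url
    host = parsed.hostname or ""
    if parsed.port:
        host += f":{parsed.port}"
    return parsed._replace(netloc=f"{parsed.username}:***@{host}").geturl()


if __name__ == "__main__":
    # Placeholder URL: the account ID and token are made up for illustration.
    url = (
        "https://aws:SECRET@scale-123456789012.d.codeartifact."
        "us-west-2.amazonaws.com/pypi/scale-pypi/simple/"
    )
    print(looks_like_codeartifact_index(url))
    print(redact_token(url))
```

If the check returns False, pip is probably still pointing at the public index and the login command should be re-run.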

Step 3: Install the Dex SDK

Once authenticated, install the SDK. The exact command can be found on GitHub, in the introduction notebooks.

Common Installation Problems

1. Token Expired Error

If you see authentication errors like “401 Unauthorized” when installing packages, your CodeArtifact token has likely expired.

ERROR: Could not install packages due to an EnvironmentError: 401 Unauthorized

Solution: Refresh your authentication token:

aws codeartifact login \
    --tool pip \
    --domain scale \
    --repository scale-pypi \
    --profile production-developer \
    --region us-west-2

2. AWS CLI Not Configured

If you get errors about missing AWS credentials or profiles:

Error: The config profile (production-developer) could not be found

Solution:

Ensure AWS CLI is installed: aws --version
Configure AWS SSO for the production-developer profile
Contact IT support for AWS access if you don’t have it

3. Package Not Found

If pip cannot find the dex-sdk package:

ERROR: Could not find a version that satisfies the requirement dex-sdk

Solutions:

Re-run the CodeArtifact login command
Check that you have access to Scale’s CodeArtifact repository
Verify your AWS credentials are valid: aws sts get-caller-identity

4. Permission Denied

Permission errors during installation usually come from how pip is invoked or from missing CodeArtifact permissions.

Solutions:

Don’t use sudo with pip - this can cause issues with CodeArtifact authentication
Use a virtual environment: python -m venv venv && source venv/bin/activate
Check that your AWS user has the necessary CodeArtifact permissions

5. SSL Certificate Errors

SSL/TLS errors when connecting to CodeArtifact usually point to outdated CA certificates or a network proxy.

Solutions:

Update your CA certificates
Check your network proxy settings
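The installation problems above can be told apart by the error text alone. As a rough sketch, the hypothetical helper below maps a captured error message to the matching suggestion from this section. The first three patterns are the messages quoted in this guide; the permission and SSL patterns are assumptions about typical pip output and may need adjusting.

```python
# Hypothetical triage helper; not part of the Dex SDK or pip.
# Each entry is (substring to look for, suggested fix from this guide).
KNOWN_PATTERNS = [
    ("401 unauthorized",
     "Token expired: re-run the aws codeartifact login command."),
    ("config profile",
     "AWS profile missing: configure AWS SSO for the production-developer profile."),
    ("could not find a version that satisfies the requirement",
     "Package not found: re-run the CodeArtifact login and check repository access."),
    # Assumed patterns (not quoted in this guide):
    ("permission denied",
     "Permission error: avoid sudo and install inside a virtual environment."),
    ("certificate_verify_failed",
     "SSL error: update your CA certificates and check proxy settings."),
]


def suggest_fix(error_output: str) -> str:
    """Return the first matching suggestion for the given error text."""
    lowered = error_output.lower()
    for pattern, suggestion in KNOWN_PATTERNS:
        if pattern in lowered:
            return suggestion
    return "No known pattern matched: include the full error when asking for help."


if __name__ == "__main__":
    print(suggest_fix("ERROR: Could not find a version that satisfies the requirement dex-sdk"))
```

Substring matching is deliberately simple here; it is meant for quick triage of copied terminal output, not as a complete classifier.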

Verifying Installation

After installation, verify the SDK is installed correctly:

import dex_sdk
print(f"Dex SDK version: {dex_sdk.__version__}")

If this runs without errors, your installation is successful!
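If the import fails, it helps to know whether the package is missing or installed but broken. The sketch below uses the standard library's importlib.metadata to look up the installed distribution without importing it. The distribution name dex-sdk is taken from the pip error message earlier in this guide; the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("dex-sdk")
    if found is None:
        print("dex-sdk is not installed in this environment")
    else:
        print(f"dex-sdk {found} is installed")
```

A result of None in an environment where you believe you installed the SDK usually means pip installed into a different interpreter or virtual environment.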

Common Issues

1. Connection Problems

If you cannot connect to Dex, verify your network connectivity and authentication.

Test your connection:

import requests

try:
    # A timeout keeps the check from hanging if Dex is unreachable
    response = requests.get("https://dex.sgp.scale.com/health", timeout=10)
    if response.status_code == 200:
        print("✅ Connected to Dex")
    else:
        print(f"❌ Connection issue (HTTP {response.status_code}) - check network")
except requests.RequestException as e:
    print(f"❌ Cannot reach Dex: {e}")
    print("Check your network connectivity and try again")

Solutions:

Check your network connectivity
Verify your credentials are correct (see Authentication Errors below)
Ensure the Dex service is available and not under maintenance

2. Authentication Errors

Authentication issues are usually caused by missing or incorrect credentials.

Verify your credentials:

import os

sgp_account_id = os.getenv("SGP_ACCOUNT_ID")
sgp_api_key = os.getenv("SGP_API_KEY")

if not sgp_account_id or not sgp_api_key:
    print("❌ Missing credentials")
    print("Set SGP_ACCOUNT_ID and SGP_API_KEY environment variables")
else:
    print("✅ Credentials found")

Common causes:

Missing or incorrect SGP_ACCOUNT_ID and SGP_API_KEY environment variables
Insufficient permissions on your Scale account
Account doesn’t have SGP access
API key has expired or been revoked

Solutions:

Set your environment variables:

export SGP_ACCOUNT_ID="your_account_id"
export SGP_API_KEY="your_api_key"

Verify your account has SGP access
Ensure your API key is still valid
Check that you have the necessary permissions for the operations you’re trying to perform
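When sharing debug output, it is useful to report which variable is missing without printing the key itself. The following sketch extends the credential check above under that assumption. The helper names are hypothetical; only the variable names SGP_ACCOUNT_ID and SGP_API_KEY come from this guide.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARIABLES = ("SGP_ACCOUNT_ID", "SGP_API_KEY")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters of a secret for safe logging."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print(f"Missing credentials: {', '.join(missing)}")
    else:
        print(f"Using API key {mask(os.environ['SGP_API_KEY'])}")
```

This only confirms that the variables are set. It cannot tell whether the key is valid, expired, or revoked; that still requires a request to the service.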

3. File Upload Issues

File upload failures can occur due to format, size, or accessibility issues.

Check the file before upload:

import os

file_path = "document.pdf"

if not os.path.exists(file_path):
    print(f"❌ File not found: {file_path}")
elif os.path.getsize(file_path) > 100 * 1024 * 1024:  # 100MB
    print(f"❌ File too large: {os.path.getsize(file_path)} bytes")
else:
    print("✅ File ready for upload")

Supported file formats:

Images: PNG, JPEG/JPG, GIF, BMP, TIFF, PCX, PPM, APNG, PSD, CUR, DCX, FTEX, PIXAR, HEIC
PDFs: PDF (Portable Document Format)
Spreadsheets: CSV, XLSX, XLSM, XLS, XLTX, XLTM, QPW
Documents: PPTX, PPT, DOCX, DOC, DOTX, WPD, TXT, RTF

Common issues:

File doesn’t exist at the specified path
File exceeds size limit (100MB)
File format is not supported
File is corrupted or malformed
Insufficient storage quota in your project

Solutions:

Verify file path is correct
Check file size is under 100MB
Ensure file format is in the supported list
Try opening the file locally to verify it’s not corrupted
Contact support if you need a higher size limit
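The path, size, and format checks above can be combined into one pre-upload function. The sketch below is an illustration, not SDK behavior: the helper name is hypothetical, and it assumes each supported format is identified by a file extension matching its name in the list above (some formats commonly use other extensions, such as .tif for TIFF, which you may want to add).

```python
from pathlib import Path
from typing import List

MAX_BYTES = 100 * 1024 * 1024  # 100MB limit stated in this guide

# Assumption: extensions mirror the format names in the supported list.
SUPPORTED_EXTENSIONS = {
    "png", "jpeg", "jpg", "gif", "bmp", "tiff", "pcx", "ppm", "apng", "psd",
    "cur", "dcx", "ftex", "pixar", "heic", "pdf", "csv", "xlsx", "xlsm",
    "xls", "xltx", "xltm", "qpw", "pptx", "ppt", "docx", "doc", "dotx",
    "wpd", "txt", "rtf",
}


def upload_problems(file_path: str) -> List[str]:
    """Return a list of problems found; an empty list means the file looks ready."""
    path = Path(file_path)
    problems = []
    extension = path.suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported format: '{extension or 'no extension'}'")
    if not path.is_file():
        problems.append(f"File not found: {file_path}")
    elif path.stat().st_size > MAX_BYTES:
        problems.append(f"File too large: {path.stat().st_size} bytes")
    return problems


if __name__ == "__main__":
    for problem in upload_problems("document.pdf"):
        print(f"❌ {problem}")
```

Returning every problem at once, rather than stopping at the first, saves a round trip when a file has more than one issue.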

4. Parsing Errors

Parsing can fail due to unsupported content, OCR issues, or corrupted documents.

Common parsing issues:

Document contains unsupported elements
OCR engine cannot process the content (e.g., extremely low quality scans)
Document is password-protected or encrypted
Document structure is too complex
Parsing timeout for very large documents

Solutions:

Check document quality: Ensure scanned documents have sufficient resolution (at least 300 DPI recommended)
Remove password protection: Unlock encrypted documents before uploading
Try a different OCR engine: Switch between Reducto and Scale OCR engines:

from dex_sdk.types import ParseEngine, ReductoParseJobParams

# Try Scale OCR if Reducto fails
parse_result = await dex_file.parse(
    ReductoParseJobParams(
        engine=ParseEngine.SCALE_OCR,
    )
)

Split large documents: For very large documents, consider splitting them into smaller chunks
Check document validity: Open the document in a native viewer to ensure it’s not corrupted
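For the splitting suggestion, the first step is deciding which pages go into which chunk. The sketch below computes 1-based, inclusive page ranges for a given chunk size. It is only the planning step: actually writing the smaller files requires a PDF library of your choice, and the function name is hypothetical.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, pages_per_chunk: int) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into consecutive (first, last) ranges."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        (start, min(start + pages_per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_chunk)
    ]


if __name__ == "__main__":
    # A 10-page document in chunks of 4 pages.
    print(page_ranges(10, 4))  # [(1, 4), (5, 8), (9, 10)]
```

Where possible, choose chunk boundaries that fall on section breaks so that tables and paragraphs are not cut in half.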

Handling Complex Layout Challenges

If you’re encountering parsing issues with documents that have complex layouts, consider these common challenges:

Multi-column layouts with narrow text and complex footnotes
Large, variable-structure tables with merged or rotated cells
Embedded charts, graphs, and financial figures
Watermarks, signatures, and scanned pages

For guidance on addressing these layout challenges and understanding best practices for different document types, see Industry Document Types and Layout Challenges. This resource covers proven approaches for handling complex document structures across finance, healthcare, insurance, and legal use cases.

5. Extraction Errors

Extraction can fail due to schema issues, model errors, or invalid parameters.

Common extraction issues:

Invalid extraction schema
Model timeout or rate limiting
Insufficient context for extraction
Schema-data mismatch
Model doesn’t support requested features

Solutions:

Validate your schema:

from pydantic import BaseModel, Field

# Ensure your schema is valid Pydantic
class MySchema(BaseModel):
    field_name: str = Field(description="Clear description")

# Always use model_json_schema()
schema = MySchema.model_json_schema()

Use clear prompts: Provide detailed, specific instructions in your user_prompt
Enable debugging features:

extract_result = await parse_result.extract(
    extraction_schema=schema,
    user_prompt="Your detailed prompt here",
    model="openai/gpt-4o",
    generate_citations=True,  # Helps debug extraction
    generate_confidence=True,  # Shows confidence levels
)

Use vector stores for large documents: If extraction fails due to context length, use RAG-enhanced extraction:

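# Assumes VectorStoreEngines has already been imported from the SDK
# (see the API Reference for the import path)
# Create vector store and add parsed results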
vector_store = await project.create_vector_store(
    name="My Store",
    engine=VectorStoreEngines.SGP_KNOWLEDGE_BASE,
    embedding_model="openai/text-embedding-3-large",
)
await vector_store.add_parse_results([parse_result.id])

# Extract with RAG context
extract_result = await vector_store.extract(
    extraction_schema=schema,
    user_prompt=user_prompt,
    model="openai/gpt-4o",
)

Check model availability: Ensure the model you’re using is available and not deprecated
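Since clear field descriptions matter for extraction quality, it can help to check the generated JSON schema for fields that lack one. The sketch below works on the plain dictionary returned by model_json_schema() and needs no third-party imports. It is a hypothetical helper that only follows inline nested objects; nested Pydantic models referenced through $defs and $ref are not resolved.

```python
from typing import Any, Dict, List


def fields_missing_descriptions(schema: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of properties that have no non-empty description."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if not str(spec.get("description", "")).strip():
            missing.append(path)
        # Recurse into nested objects that are defined inline.
        if spec.get("type") == "object":
            missing.extend(fields_missing_descriptions(spec, prefix=f"{path}."))
    return missing


if __name__ == "__main__":
    example_schema = {
        "type": "object",
        "properties": {
            "total": {"type": "number", "description": "Invoice total in USD"},
            "vendor": {"type": "string"},
        },
    }
    print(fields_missing_descriptions(example_schema))  # ['vendor']
```

An empty result does not guarantee good extraction, but a non-empty one points at fields the model has to guess about.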

Error Handling

The Dex SDK raises exceptions for various error conditions. Always wrap your code in try-except blocks for production use:

from dex_sdk.exceptions import DexException

try:
    # Your Dex operations
    project = await dex_client.create_project(...)
    dex_file = await project.upload_file(...)
    parse_result = await dex_file.parse(...)
    extract_result = await parse_result.extract(...)
except DexException as e:
    print(f"Dex error occurred: {e}")
    # Handle the error appropriately
except Exception as e:
    print(f"Unexpected error: {e}")
    # Handle unexpected errors

Common exception types:

Exception Type	Cause	Solution
`AuthenticationError`	Invalid credentials	Verify `SGP_ACCOUNT_ID` and `SGP_API_KEY`
`FileUploadError`	File format/size issues	Check file format and size limits
`ParsingError`	OCR failure	Try different OCR engine or check document quality
`ExtractionError`	Schema or model error	Validate schema and check model availability
`ConnectionError`	Network issues	Verify network connectivity and credentials
`RateLimitError`	Too many requests	Implement backoff and retry logic
`PermissionError`	Insufficient access	Check account permissions
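The RateLimitError row recommends backoff and retry logic. The following is a minimal sketch of exponential backoff with jitter for any awaitable operation. It is a generic helper, not an SDK feature: the function name and parameters are hypothetical, and the exception types to retry on are passed in by the caller because the import path of the SDK's exceptions is not shown here.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Run operation, retrying on the given exceptions with growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Delay doubles each attempt, capped at max_delay, plus random jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"count": 0}

    async def flaky() -> str:
        # Stand-in for a Dex call that fails twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise TimeoutError("transient failure")
        return "ok"

    print(asyncio.run(retry_with_backoff(flaky, retry_on=(TimeoutError,), base_delay=0.01)))
```

In practice you would pass the SDK's rate-limit exception in retry_on and wrap a single call, such as one extraction, in a small async function. Retry only errors that are likely to be transient; authentication and schema errors will fail the same way every time.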

Performance Issues

Slow Parsing

Causes:

Large document size
Complex document structure
High OCR processing load

Solutions:

Process documents asynchronously in batches
Consider caching parse results for frequently accessed documents
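One way to process documents asynchronously in batches is to cap the number of parse jobs in flight with a semaphore. The sketch below shows the pattern with a generic worker coroutine standing in for the real parse call. The function name and the concurrency limit are assumptions to tune for your workload.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def process_in_batches(
    items: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
    max_concurrency: int = 4,
) -> List[Any]:
    """Run worker over items with at most max_concurrency running at once.

    Results come back in input order; a failed item yields its exception
    instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item: Any) -> Any:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(
        *(run_one(item) for item in items), return_exceptions=True
    )


if __name__ == "__main__":
    async def fake_parse(name: str) -> str:
        # Stand-in for an awaitable parse call on one uploaded file.
        await asyncio.sleep(0.01)
        return f"parsed {name}"

    print(asyncio.run(process_in_batches(["a.pdf", "b.pdf", "c.pdf"], fake_parse, 2)))
```

Because failures are returned rather than raised, check each result with isinstance(result, Exception) before using it.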

Slow Extraction

Causes:

Large context size
Complex extraction schema
Model performance

Solutions:

Use vector stores for large documents to reduce context size
Simplify extraction schema if possible
Choose faster models for time-sensitive applications
Use batch processing for multiple extractions

Vector Store Search Performance

Causes:

Large number of indexed documents
Complex search queries
Embedding model latency

Solutions:

Use appropriate top_k values (avoid retrieving too many results)
Add filters to narrow down search scope
Use file-specific search when possible: vector_store.search_in_file()
Consider creating separate vector stores for different document categories

Debugging Tips

Enable Verbose Logging

import logging

# Enable debug logging for Dex SDK
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("dex_sdk")
logger.setLevel(logging.DEBUG)
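Setting the root logger to DEBUG also turns on debug output from every other library in the process, which can bury the SDK's messages. As an alternative sketch, the helper below keeps the root logger at WARNING and raises only the dex_sdk logger to DEBUG. The logger name comes from the snippet above; the helper name and log format are assumptions.

```python
import logging


def enable_dex_debug_logging(logger_name: str = "dex_sdk") -> logging.Logger:
    """Show DEBUG output for one logger while other libraries stay at WARNING."""
    # basicConfig has no effect if the root logger is already configured.
    logging.basicConfig(
        level=logging.WARNING,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    return logger


if __name__ == "__main__":
    logger = enable_dex_debug_logging()
    logger.debug("Debug logging enabled for %s", logger.name)
```

Child loggers such as dex_sdk.some_module inherit the DEBUG level unless they set their own.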

Inspect Parse Results

# Check what was parsed
parse_data = parse_result.data.model_dump()
print(f"Pages: {len(parse_data.get('pages', []))}")
print(f"Chunks: {len(parse_data.get('chunks', []))}")
print(f"Blocks: {len(parse_data.get('blocks', []))}")

Validate Extraction Results

# Check extraction quality
extraction_data = extract_result.data.model_dump()

# Check if citations are present
if 'citations' in extraction_data:
    print(f"Citations found: {len(extraction_data['citations'])}")

# Check confidence scores
if 'confidence' in extraction_data:
    print(f"Confidence score: {extraction_data['confidence']}")

Test Components Individually

# Test each step separately, tracking which step is running
step = "project creation"
try:
    # 1. Test project creation
    project = await dex_client.create_project(...)
    print("✅ Project created")

    # 2. Test file upload
    step = "file upload"
    dex_file = await project.upload_file(...)
    print("✅ File uploaded")

    # 3. Test parsing
    step = "parsing"
    parse_result = await dex_file.parse(...)
    print("✅ Parsing completed")

    # 4. Test extraction
    step = "extraction"
    extract_result = await parse_result.extract(...)
    print("✅ Extraction completed")

except Exception as e:
    print(f"❌ Failed at step '{step}': {e}")

Getting Help

If you continue to experience issues after trying these troubleshooting steps:

Internal Support Channels

Slack: Contact the Dex team in #dex
Documentation: Review the full documentation at Dex Documentation

When Reporting Issues

Please include:

Error message: Full error text and stack trace
Code snippet: Minimal reproducible example
Document type: File format and characteristics (if relevant)
SDK version: Output of pip show dex_sdk
Environment: Python version, OS
Expected vs actual behavior: What you expected to happen vs what actually happened
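The SDK version and environment details in the list above can be collected in one step. This is a small sketch using only the standard library. The function name is hypothetical, and the distribution name dex-sdk is taken from the pip error message earlier in this guide.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(dist_name: str = "dex-sdk") -> str:
    """Return SDK version, Python version, and OS as text for an issue report."""
    try:
        sdk_version = version(dist_name)
    except PackageNotFoundError:
        sdk_version = "not installed"
    lines = [
        f"SDK version: {sdk_version}",
        f"Python: {platform.python_version()}",
        f"OS: {platform.platform()}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Paste the output alongside the error message and the minimal reproducible example.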

Useful Resources

Getting Started Guide: Quick start tutorial
Quick Reference: Common patterns and error fixes
Advanced Features: Vector stores, batch processing, optimization
API Reference: Complete SDK documentation
Introduction to Dex: Core concepts and architecture
SGP Platform Docs: Scale GenAI Platform documentation
