This guide covers common issues you might encounter when working with Dex and how to resolve them.

Installation Issues

Installing the Dex SDK

The Dex SDK is distributed via Scale’s private CodeArtifact repository and requires AWS authentication.
Prerequisites:
  • AWS CLI installed and configured
  • Access to Scale’s production AWS account
  • VPN connection to Scale’s internal network
Step 1: Authenticate with AWS SSO
Before installing packages from CodeArtifact, authenticate with AWS:
aws sso login
This opens your browser for authentication. Complete the login process.
Step 2: Configure pip for CodeArtifact
After SSO login, configure pip to use CodeArtifact:
aws codeartifact login \
    --tool pip \
    --domain scale \
    --repository scale-pypi \
    --profile production-developer \
    --region us-west-2
This command:
  • Authenticates pip with CodeArtifact
  • Configures your pip to use Scale’s private repository
  • Sets up a temporary token (typically valid for 12 hours)
Step 3: Install the Dex SDK
Once authenticated, install the SDK. The exact command can be found on GitHub, in the introduction notebooks.

Common Installation Problems

1. Token Expired Error
If you see authentication errors like “401 Unauthorized” when installing packages, your CodeArtifact token has likely expired.
ERROR: Could not install packages due to an EnvironmentError: 401 Unauthorized
Solution: Refresh your authentication token:
aws codeartifact login \
    --tool pip \
    --domain scale \
    --repository scale-pypi \
    --profile production-developer \
    --region us-west-2
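If token expiry is a recurring nuisance, the refresh can be scripted. The following is a minimal sketch, not part of the Dex SDK: the helper names build_login_command, looks_like_expired_token, and refresh_token are hypothetical, and the domain, repository, profile, and region are simply the values shown above.

```python
import subprocess


def build_login_command(profile="production-developer", region="us-west-2"):
    """Return the CodeArtifact login command from this guide as an argument list."""
    return [
        "aws", "codeartifact", "login",
        "--tool", "pip",
        "--domain", "scale",
        "--repository", "scale-pypi",
        "--profile", profile,
        "--region", region,
    ]


def looks_like_expired_token(pip_output):
    """Heuristic only: treat a '401 Unauthorized' in pip's output as an expired token."""
    return "401" in pip_output and "Unauthorized" in pip_output


def refresh_token():
    """Re-run the login command; raises CalledProcessError if the AWS CLI fails."""
    subprocess.run(build_login_command(), check=True)
```

A wrapper script could capture pip's output, call looks_like_expired_token on it, and run refresh_token before retrying the install once.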
2. AWS CLI Not Configured
If you get errors about missing AWS credentials or profiles:
Error: The config profile (production-developer) could not be found
Solution:
  • Ensure AWS CLI is installed: aws --version
  • Configure AWS SSO for the production-developer profile
  • Contact IT support for AWS access if you don’t have it
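The first two checks can also be run from Python. This sketch assumes the default AWS CLI config location (~/.aws/config, where named profiles appear as "[profile NAME]" sections); it does not honor the AWS_CONFIG_FILE override, and the function names are illustrative.

```python
import configparser
import shutil
from pathlib import Path


def profile_names(config_text):
    """Return the profile names defined in the text of an AWS CLI config file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    names = set()
    for section in parser.sections():
        if section == "default":
            names.add("default")
        elif section.startswith("profile "):
            names.add(section[len("profile "):].strip())
    return names


def check_aws_setup(profile="production-developer"):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    if shutil.which("aws") is None:
        problems.append("AWS CLI not found on PATH")
    config_path = Path.home() / ".aws" / "config"
    if not config_path.exists():
        problems.append(f"{config_path} does not exist")
    elif profile not in profile_names(config_path.read_text()):
        problems.append(f"profile '{profile}' is not defined in {config_path}")
    return problems
```

An empty result only shows that the profile is defined, not that its SSO session is still valid; aws sts get-caller-identity remains the authoritative check for that.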
3. Package Not Found
If pip cannot find the dex-sdk package:
ERROR: Could not find a version that satisfies the requirement dex-sdk
Solutions:
  • Verify you’re on the Scale VPN
  • Re-run the CodeArtifact login command
  • Check that you have access to Scale’s CodeArtifact repository
  • Verify your AWS credentials are valid: aws sts get-caller-identity
4. Permission Denied
If you encounter permission errors during installation:
Solutions:
  • Don’t use sudo with pip - this can cause issues with CodeArtifact authentication
  • Use a virtual environment: python -m venv venv && source venv/bin/activate
  • Check that your AWS user has the necessary CodeArtifact permissions
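To confirm that pip will install into a virtual environment rather than the system Python, compare the interpreter's prefixes. This is a small sketch using only the standard library; the function name is illustrative.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (its prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


if in_virtualenv():
    print(f"Virtual environment active: {sys.prefix}")
else:
    print("No virtual environment active; create and activate one before installing")
```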
5. SSL Certificate Errors
If you see SSL/TLS errors when connecting to CodeArtifact:
Solutions:
  • Ensure you’re connected to the Scale VPN
  • Update your CA certificates
  • Check your network proxy settings

Verifying Installation

After installation, verify the SDK is installed correctly:
import dex_sdk
print(f"Dex SDK version: {dex_sdk.__version__}")
If this runs without errors, your installation is successful!
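If the import itself fails, it can help to check whether the package is installed at all, separately from whether it imports cleanly. The sketch below reads the installed distribution metadata; it assumes the distribution is named dex-sdk, as in the pip error message earlier in this guide.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("dex-sdk")  # distribution name is an assumption
if version is None:
    print("dex-sdk is not installed in this environment")
else:
    print(f"dex-sdk {version} is installed")
```

A version reported here combined with a failing import usually points at a broken dependency or a mismatched interpreter rather than a missing install.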

Common Issues

1. VPN Connection Problems

If you cannot connect to Dex, ensure you’re connected to Scale’s internal network via the all-traffic VPN (not eng-split-prod).
Test your connection:
import requests

try:
    # A timeout keeps the check from hanging when the VPN is down
    response = requests.get("https://dex.sgp.scale.com/health", timeout=10)
    if response.status_code == 200:
        print("✅ Connected to Dex")
    else:
        print(f"❌ Connection issue (HTTP {response.status_code}) - check VPN")
except requests.RequestException as e:
    print(f"❌ Cannot reach Dex: {e}")
    print("Connect to Scale VPN and try again")
Solutions:
  • Verify you’re connected to the correct VPN (all-traffic)
  • Check your network connectivity
  • Ensure the Dex service is available and not under maintenance

2. Authentication Errors

Authentication issues are usually caused by missing or incorrect credentials.
Verify your credentials:
import os

sgp_account_id = os.getenv("SGP_ACCOUNT_ID")
sgp_api_key = os.getenv("SGP_API_KEY")

if not sgp_account_id or not sgp_api_key:
    print("❌ Missing credentials")
    print("Set SGP_ACCOUNT_ID and SGP_API_KEY environment variables")
else:
    print("✅ Credentials found")
Common causes:
  • Missing or incorrect SGP_ACCOUNT_ID and SGP_API_KEY environment variables
  • Insufficient permissions on your Scale account
  • Account doesn’t have SGP access
  • API key has expired or been revoked
Solutions:
  1. Set your environment variables:
export SGP_ACCOUNT_ID="your_account_id"
export SGP_API_KEY="your_api_key"
  2. Verify your account has SGP access
  3. Ensure your API key is still valid
  4. Check that you have the necessary permissions for the operations you’re trying to perform
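Credentials that are set but malformed are easy to miss, for example a value copied with a trailing space or with literal quote characters. The following sketch flags those cases without printing the secret itself. The checks are heuristics chosen for illustration, not rules enforced by the SDK.

```python
import os

REQUIRED_VARS = ("SGP_ACCOUNT_ID", "SGP_API_KEY")


def credential_problems(env=None):
    """Return a list of problems found in the credential variables (values are never echoed)."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif value != value.strip():
            problems.append(f"{name} has leading or trailing whitespace")
        elif value[0] in "\"'" and value[-1] == value[0]:
            problems.append(f"{name} is wrapped in literal quotes")
    return problems


for problem in credential_problems():
    print(f"❌ {problem}")
```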

3. File Upload Issues

File upload failures can occur due to format, size, or accessibility issues.
Check the file before upload:
import os

file_path = "document.pdf"

if not os.path.exists(file_path):
    print(f"❌ File not found: {file_path}")
elif os.path.getsize(file_path) > 100 * 1024 * 1024:  # 100MB
    print(f"❌ File too large: {os.path.getsize(file_path)} bytes")
else:
    print("✅ File ready for upload")
Supported file formats:
  • Images: PNG, JPEG/JPG, GIF, BMP, TIFF, PCX, PPM, APNG, PSD, CUR, DCX, FTEX, PIXAR, HEIC
  • PDFs: PDF (Portable Document Format)
  • Spreadsheets: CSV, XLSX, XLSM, XLS, XLTX, XLTM, QPW
  • Documents: PPTX, PPT, DOCX, DOC, DOTX, WPD, TXT, RTF
Common issues:
  • File doesn’t exist at the specified path
  • File exceeds size limit (100MB)
  • File format is not supported
  • File is corrupted or malformed
  • Insufficient storage quota in your project
Solutions:
  • Verify file path is correct
  • Check file size is under 100MB
  • Ensure file format is in the supported list
  • Try opening the file locally to verify it’s not corrupted
  • Contact support if you need a higher size limit
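The path, size, and format checks above can be combined into one pre-flight helper. This is a sketch under two assumptions: that each format in the supported list corresponds to the file extension of the same name, and that the 100MB limit is measured in bytes on disk. It inspects only the extension, so a corrupted or mislabeled file will still pass.

```python
from pathlib import Path

MAX_BYTES = 100 * 1024 * 1024  # 100MB limit from this guide

# Lowercased extensions taken from the supported-formats list above
SUPPORTED_EXTENSIONS = {
    "png", "jpeg", "jpg", "gif", "bmp", "tiff", "pcx", "ppm", "apng",
    "psd", "cur", "dcx", "ftex", "pixar", "heic",
    "pdf",
    "csv", "xlsx", "xlsm", "xls", "xltx", "xltm", "qpw",
    "pptx", "ppt", "docx", "doc", "dotx", "wpd", "txt", "rtf",
}


def upload_problems(path):
    """Return a list of reasons the file is likely to be rejected (empty if none found)."""
    p = Path(path)
    if not p.is_file():
        return [f"file not found: {p}"]
    problems = []
    ext = p.suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: .{ext or '(none)'}")
    size = p.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_BYTES:
        problems.append(f"file too large: {size} bytes")
    return problems
```

Calling upload_problems("document.pdf") before each upload gives one list to log or show to the user instead of several separate checks.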

4. Parsing Errors

Parsing can fail due to unsupported content, OCR issues, or corrupted documents.
Common parsing issues:
  • Document contains unsupported elements
  • OCR engine cannot process the content (e.g., extremely low quality scans)
  • Document is password-protected or encrypted
  • Document structure is too complex
  • Parsing timeout for very large documents
Solutions:
  1. Check document quality: Ensure scanned documents have sufficient resolution (at least 300 DPI recommended)
  2. Remove password protection: Unlock encrypted documents before uploading
  3. Try a different OCR engine: Switch between Reducto and Scale OCR engines:
from dex_sdk.types import ParseEngine, ParseJobRequestParams

# Try Scale OCR if Reducto fails
parse_result = await dex_file.parse(
    ParseJobRequestParams(
        engine=ParseEngine.SCALE_OCR,
    )
)
  4. Split large documents: For very large documents, consider splitting them into smaller chunks
  5. Check document validity: Open the document in a native viewer to ensure it’s not corrupted
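When splitting a large document, the first step is deciding which pages go into each chunk. The sketch below computes 1-based inclusive page ranges. The function name and chunk size are illustrative, and the actual splitting would need a PDF library of your choice, which is not shown here.

```python
def page_ranges(total_pages, pages_per_chunk):
    """Return 1-based inclusive (first_page, last_page) tuples covering the document."""
    if total_pages < 0 or pages_per_chunk < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_chunk >= 1")
    return [
        (start, min(start + pages_per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_chunk)
    ]


print(page_ranges(10, 4))  # [(1, 4), (5, 8), (9, 10)]
```

Each range can then be written to its own file and uploaded and parsed separately, so a timeout on one chunk does not discard the rest.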

5. Extraction Errors

Extraction can fail due to schema issues, model errors, or invalid parameters.
Common extraction issues:
  • Invalid extraction schema
  • Model timeout or rate limiting
  • Insufficient context for extraction
  • Schema-data mismatch
  • Model doesn’t support requested features
Solutions:
  1. Validate your schema:
from pydantic import BaseModel, Field

# Ensure your schema is valid Pydantic
class MySchema(BaseModel):
    field_name: str = Field(description="Clear description")

# Always use model_json_schema()
schema = MySchema.model_json_schema()
  2. Use clear prompts: Provide detailed, specific instructions in your user_prompt
  3. Enable debugging features:
from dex_sdk.types import ExtractionParameters

extract_result = await parse_result.extract(
    ExtractionParameters(
        user_prompt="Your detailed prompt here",
        extraction_schema=schema,
        model="openai/gpt-4o",
        generate_citations=True,  # Helps debug extraction
        generate_confidence=True,  # Shows confidence levels
    )
)
  4. Use vector stores for large documents: If extraction fails due to context length, use RAG-enhanced extraction:
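# Assumes VectorStoreEngines is importable from dex_sdk.types alongside
# ExtractionParameters; check the SDK reference for the exact import path.
# Create vector store and add parsed results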
vector_store = await project.create_vector_store(
    name="My Store",
    engine=VectorStoreEngines.SGP_KNOWLEDGE_BASE,
    embedding_model="openai/text-embedding-3-large",
)
await vector_store.add_parse_results([parse_result.id])

# Extract with RAG context
extract_result = await vector_store.extract(
    ExtractionParameters(
        user_prompt=user_prompt,
        extraction_schema=schema,
        model="openai/gpt-4o",
    )
)
  5. Check model availability: Ensure the model you’re using is available and not deprecated

Error Handling

The Dex SDK raises exceptions for various error conditions. Always wrap your code in try-except blocks for production use:
from dex_sdk.exceptions import DexException

try:
    # Your Dex operations
    project = await dex_client.create_project(...)
    dex_file = await project.upload_file(...)
    parse_result = await dex_file.parse(...)
    extract_result = await parse_result.extract(...)
except DexException as e:
    print(f"Dex error occurred: {e}")
    # Handle the error appropriately
except Exception as e:
    print(f"Unexpected error: {e}")
    # Handle unexpected errors
Common exception types:
Exception Type      | Cause                  | Solution
AuthenticationError | Invalid credentials    | Verify SGP_ACCOUNT_ID and SGP_API_KEY
FileUploadError     | File format/size issues | Check file format and size limits
ParsingError        | OCR failure            | Try different OCR engine or check document quality
ExtractionError     | Schema or model error  | Validate schema and check model availability
ConnectionError     | Network/VPN issues     | Verify VPN connection and network
RateLimitError      | Too many requests      | Implement backoff and retry logic
PermissionError     | Insufficient access    | Check account permissions
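For the RateLimitError entry above, "backoff and retry" could look like the sketch below: exponential backoff with full jitter around any awaitable call. The RateLimitError class defined here is a local stand-in so the example runs on its own; in real code, pass the SDK's exception class via retry_on (its import path is not shown in this guide). The name with_backoff and its defaults are illustrative.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Local stand-in for the SDK's rate-limit exception."""


async def with_backoff(make_call, retry_on=(RateLimitError,),
                       max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Await make_call(), retrying on retry_on with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random time in [0, cap] to avoid synchronized retries
            await asyncio.sleep(random.uniform(0, cap))
```

Usage would be along the lines of await with_backoff(lambda: parse_result.extract(params)), where the lambda creates a fresh coroutine for every attempt.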

Performance Issues

Slow Parsing

Causes:
  • Large document size
  • Complex document structure
  • High OCR processing load
Solutions:
  • Process documents asynchronously in batches
  • Consider caching parse results for frequently accessed documents

Slow Extraction

Causes:
  • Large context size
  • Complex extraction schema
  • Model performance
Solutions:
  • Use vector stores for large documents to reduce context size
  • Simplify extraction schema if possible
  • Choose faster models for time-sensitive applications
  • Use batch processing for multiple extractions

Vector Store Search Performance

Causes:
  • Large number of indexed documents
  • Complex search queries
  • Embedding model latency
Solutions:
  • Use appropriate top_k values (avoid retrieving too many results)
  • Add filters to narrow down search scope
  • Use file-specific search when possible: vector_store.search_in_file()
  • Consider creating separate vector stores for different document categories

Debugging Tips

Enable Verbose Logging

import logging

# Enable debug logging for Dex SDK
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("dex_sdk")
logger.setLevel(logging.DEBUG)

Inspect Parse Results

# Check what was parsed
parse_data = parse_result.data.model_dump()
print(f"Pages: {len(parse_data.get('pages', []))}")
print(f"Chunks: {len(parse_data.get('chunks', []))}")
print(f"Blocks: {len(parse_data.get('blocks', []))}")

Validate Extraction Results

# Check extraction quality
extraction_data = extract_result.data.model_dump()

# Check if citations are present
if 'citations' in extraction_data:
    print(f"Citations found: {len(extraction_data['citations'])}")

# Check confidence scores
if 'confidence' in extraction_data:
    print(f"Confidence score: {extraction_data['confidence']}")

Test Components Individually

# Test each step separately, tracking which one is running
step = "project creation"
try:
    # 1. Test project creation
    project = await dex_client.create_project(...)
    print("✅ Project created")

    # 2. Test file upload
    step = "file upload"
    dex_file = await project.upload_file(...)
    print("✅ File uploaded")

    # 3. Test parsing
    step = "parsing"
    parse_result = await dex_file.parse(...)
    print("✅ Parsing completed")

    # 4. Test extraction
    step = "extraction"
    extract_result = await parse_result.extract(...)
    print("✅ Extraction completed")

except Exception as e:
    print(f"❌ Failed at {step}: {e}")

Getting Help

If you continue to experience issues after trying these troubleshooting steps:

Internal Support Channels

  • Slack: Contact the Dex team at #sgp-document-understanding-capability
  • Documentation: Review the full documentation at Dex Documentation

When Reporting Issues

Please include:
  1. Error message: Full error text and stack trace
  2. Code snippet: Minimal reproducible example
  3. Document type: File format and characteristics (if relevant)
  4. SDK version: Output of pip show dex_sdk
  5. Environment: Python version, OS, VPN status
  6. Expected vs actual behavior: What you expected to happen vs what actually happened
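Items 4 and 5 can be gathered with a short script and pasted into the report. This sketch uses only the standard library and assumes the distribution name dex-sdk. VPN status is not detected and still needs to be stated by hand.

```python
import platform
import sys
from importlib import metadata


def environment_report(dist_name="dex-sdk"):
    """Collect SDK version, Python version, OS, and venv status for a bug report."""
    try:
        sdk_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        sdk_version = "not installed"
    return {
        "sdk_version": sdk_version,
        "python": platform.python_version(),
        "os": platform.platform(),
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```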

Useful Resources