Overview
Dex is Scale’s document understanding capability. It provides composable primitives for:
- File Management - Secure file upload, storage, and retrieval
- Document Parsing - Convert documents (PDFs, DOCX, images, and more) into structured JSON, with a choice of OCR engines
- Vector Stores - Embed, index, and search parsed document corpora
- Data Extraction - Extract specific information using custom schemas, prompts, and RAG-enhanced context
- Project Management - Organize and isolate data with proper credential management and authorization
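The internals of Dex vector stores are not described on this page. As a conceptual sketch of what "search" over an embedded corpus means, the snippet below ranks chunks by cosine similarity to a query embedding. It is not the Dex SDK API. The chunk IDs and the tiny 3-dimensional vectors are hypothetical toy data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Rank indexed chunks by similarity to the query embedding, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real embedding models produce much longer vectors.
index = {
    "invoice_p1_chunk0": [0.9, 0.1, 0.0],
    "invoice_p2_chunk3": [0.2, 0.8, 0.1],
    "contract_p1_chunk1": [0.0, 0.1, 0.9],
}
for chunk_id, score in top_k([1.0, 0.0, 0.0], index, k=2):
    print(f"{chunk_id}: {score:.3f}")
```

A production vector store replaces this linear scan with an approximate nearest-neighbor index, but the ranking idea is the same.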
Prerequisites
Before using Dex, ensure you have:
- ✅ A valid Scale account with SGP (Scale General Platform) access
- ✅ Your SGP account ID and API key set as environment variables
- ✅ Python 3.8+ installed
- ✅ Dex SDK installed (see Installation section)
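A quick pre-flight check can catch a missing credential or an old interpreter before the first SDK call fails. This page does not show the environment variable names, so `SGP_ACCOUNT_ID` and `SGP_API_KEY` below are hypothetical placeholders. Substitute the names your setup actually uses.

```python
import os
import sys
from typing import Iterable, List

# Hypothetical variable names: replace with the names your SGP setup expects.
REQUIRED_ENV_VARS = ("SGP_ACCOUNT_ID", "SGP_API_KEY")


def missing_prerequisites(env_vars: Iterable[str] = REQUIRED_ENV_VARS) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append(
            f"Python 3.8+ is required, found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for name in env_vars:
        if not os.environ.get(name):  # unset or empty both count as missing
            problems.append(f"environment variable {name} is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Missing prerequisite:", problem)
```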
Installation
Install Dex SDK from CodeArtifact
With access to Scale CodeArtifact, install the Dex SDK (version 0.4.0 or higher recommended) using your configured CodeArtifact credentials. This installs all required dependencies, including Pydantic.
Note: Version 0.4.0 introduces pagination, filtering, and enhanced async job support. Version 0.3.2 introduced a new authentication method. See the Changelog for details.
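To confirm that the installed SDK meets the recommended 0.4.0 minimum, you can read its version from package metadata. This is a sketch. The distribution name `dex-sdk` is a hypothetical placeholder, so use the package name from your CodeArtifact repository. The version parser only handles plain dotted releases.

```python
from importlib import metadata  # standard library since Python 3.8
from typing import Optional, Tuple

MINIMUM_VERSION = "0.4.0"  # recommended minimum from the installation notes
DIST_NAME = "dex-sdk"  # hypothetical distribution name; use your CodeArtifact package name


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '0.4.0' into (0, 4, 0).

    Pre-release or local suffixes like '0.4.0rc1' are not handled; the
    third-party `packaging` library covers full version semantics.
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version: str, minimum: str = MINIMUM_VERSION) -> bool:
    """True if `version` >= `minimum`, padding with zeros so '0.4' equals '0.4.0'."""
    have, need = parse_version(version), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def installed_version(dist_name: str = DIST_NAME) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print(f"{DIST_NAME} is not installed; install it from CodeArtifact first.")
elif not meets_minimum(version):
    print(f"{DIST_NAME} {version} is older than {MINIMUM_VERSION}; consider upgrading.")
```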
Quick Start
1. Initialize Dex Client
2. Create a Project
Projects isolate your data and credentials for tracing, billing, and SGP model calls. Every operation is tied to a project.
Tip: Keep one project per use case or group of related files for clean traceability.
3. Upload a Document
Upload your document to the project. Dex supports PDFs, images, spreadsheets, and more.
4. Parse the Document
Parse converts your document into a structured format with text, tables, and layout information.
Note: Parsing is asynchronous. The SDK automatically polls for completion.
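The SDK handles this polling for you. To illustrate what "polls for completion" typically involves, here is a generic exponential-backoff loop. It is not the SDK's implementation. The `get_status` callable and the state names `"completed"` and `"failed"` are hypothetical placeholders, not documented Dex job states.

```python
import time
from typing import Callable, Sequence


def wait_for_completion(
    get_status: Callable[[], str],
    *,
    done_states: Sequence[str] = ("completed",),  # hypothetical state names
    failed_states: Sequence[str] = ("failed",),  # hypothetical state names
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() with exponential backoff until a terminal state.

    Returns the final status, raises RuntimeError on a failed state, and
    raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status in done_states:
            return status
        if status in failed_states:
            raise RuntimeError(f"job ended in state {status!r}")
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay
```

Injecting `sleep` keeps the helper testable without real waiting. Capping the delay stops long jobs from being checked too rarely.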
5. Extract Structured Data
Define a schema and extract specific information from your document.
Complete Example
Here’s a complete working example you can copy and run:
Next Steps
Now that you’ve completed the basics, explore these topics:
Learn More
- File Management: Upload, pagination, and supported file types
- Parse: Parse engines and async job monitoring
- Chunking: Chunking strategies for documents
- Vector Stores: Semantic search and RAG-enhanced extraction
- Extract: Batch extraction and extraction patterns
- Best Practices: Quick start, retention policies, and optimization
Deep Dive into the API
- API Reference: Complete SDK documentation with all methods and types
- Troubleshooting Guide: Common issues and solutions
- Changelog: Latest updates and breaking changes
Additional Resources
- REST API: For non-Python integrations, see the API Reference
- Support: Questions? Ask in the #dex-help Slack channel
- Examples: More examples in the Introduction guide
Common Questions
Q: How do I process multiple documents?
A: Upload multiple files to the same project, parse each one, then optionally use vector stores for cross-document search. See Extract.
Q: Can I use a synchronous client?
A: Yes! Use DexSyncClient from dex_sdk for synchronous operations. See Best Practices.
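When you have many documents and work with async code, you can start their parse jobs concurrently with an upper limit, rather than one after another. The sketch below uses only `asyncio`. The `fake_parse` coroutine is a hypothetical stand-in for your own upload-and-parse calls, and nothing here is Dex SDK API. Check your project's rate limits before raising `max_concurrency`.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar, Union

T = TypeVar("T")
R = TypeVar("R")


async def process_all(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 4,
) -> List[Union[R, BaseException]]:
    """Run worker(item) for every item, at most max_concurrency at a time.

    Results come back in input order. With return_exceptions=True, a failure
    on one document is returned in its slot instead of cancelling the rest.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item: T) -> R:
        async with semaphore:  # waits here while max_concurrency jobs are in flight
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items), return_exceptions=True)


async def fake_parse(path: str) -> str:
    """Hypothetical stand-in for your own upload-and-parse calls."""
    await asyncio.sleep(0.01)
    if path.endswith(".bad"):
        raise ValueError(f"could not parse {path}")
    return f"parsed:{path}"


paths = ["a.pdf", "b.docx", "c.bad"]
results = asyncio.run(process_all(paths, fake_parse, max_concurrency=2))
for path, result in zip(paths, results):
    print(path, "->", result)
```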
Q: How do I configure data retention policies?
A: Set retention policies when creating a project. See Best Practices.
Q: What OCR engines are available?
A: Reducto (production-ready) and Iris (experimental, for custom needs). See When to choose Iris?
Q: How do I list files with pagination? (New in v0.4.0)
A: Use PaginationParams with list_files() to control page size and sorting. See API Reference.
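To collect every file across pages, you typically loop until a short page comes back. This sketch assumes offset/limit-style paging through a hypothetical `fetch_page(offset, limit)` function that stands in for one `list_files()` call. If `PaginationParams` uses cursors or page numbers instead, adapt the loop accordingly. The API Reference documents the actual fields.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iterate_pages(fetch_page: Callable[[int, int], List[T]], page_size: int = 50) -> Iterator[T]:
    """Yield every item from an offset/limit-paginated listing.

    Assumes the listing is stable while iterating and stops at the first
    page shorter than page_size (including an empty page).
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += len(page)


all_names = [f"report_{i}.pdf" for i in range(7)]


def fake_fetch(offset: int, limit: int) -> List[str]:
    """Hypothetical stand-in for one list_files() call."""
    return all_names[offset:offset + limit]


print(list(iterate_pages(fake_fetch, page_size=3)))
```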
Q: How do I monitor async jobs? (New in v0.4.0)
A: Use start_parse_job() for better control and access to SGP traces. See API Reference.
