Overview
Dex is Scale’s document understanding capability that provides composable primitives for:
- File Management - Secure file upload, storage, and retrieval with fine-grained access control
- Document Parsing - Convert documents (PDFs, DOCX, images, and more) into structured JSON, with a choice of OCR engines
- Vector Stores - Index and search parsed documents with semantic embeddings
- Data Extraction - Extract specific information using custom schemas, prompts, and RAG-enhanced context
- Project Management - Organize and isolate data with proper credential management and authorization
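As a rough illustration of what searching "with semantic embeddings" means: each chunk of a parsed document is stored as a vector, and a query vector is compared against those vectors, commonly by cosine similarity. The minimal sketch below shows only that ranking step. It uses hand-written toy vectors in place of a real embedding model, the function names are hypothetical, and it is not Dex's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k chunk IDs most similar to the query, best match first."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones come from an embedding model.
index = {
    "invoice-total": [0.9, 0.1, 0.0],
    "shipping-address": [0.1, 0.9, 0.0],
    "payment-terms": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], index))
```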
Prerequisites
Before using Dex, ensure you have:
- ✅ A valid Scale account with SGP (Scale General Platform) access
- ✅ Your SGP account ID and API key set as environment variables:
- ✅ Python 3.8+ installed
- ✅ Dex SDK installed (see Installation section)
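A quick preflight check can confirm the prerequisites above before you run anything. This is a minimal sketch: the environment variable names SGP_ACCOUNT_ID and SGP_API_KEY are assumptions, so substitute the names your SGP setup actually uses.

```python
import os
import sys
from typing import List, Mapping, Optional

# ASSUMPTION: placeholder variable names; replace them with the ones your SGP setup uses.
REQUIRED_ENV_VARS = ("SGP_ACCOUNT_ID", "SGP_API_KEY")


def missing_prerequisites(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required, found %d.%d" % sys.version_info[:2])
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):  # unset or empty both count as missing
            problems.append("environment variable %s is not set" % name)
    return problems


# Report any problems found in the current process environment.
for issue in missing_prerequisites():
    print("Prerequisite check failed:", issue)
```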
Installation
Install Dex SDK from CodeArtifact
With access to Scale CodeArtifact, install the Dex SDK (version 0.3.2 or higher) using your configured CodeArtifact credentials. This installs all required dependencies, including Pydantic.
Note: Version 0.3.2 introduces a new authentication method, so make sure you are on this version or higher. See the Changelog for details.
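Because 0.3.2 changes authentication, it can help to confirm the installed version programmatically. The sketch below uses the standard library's importlib.metadata. The distribution name dex-sdk is a hypothetical placeholder: the import name dex_sdk comes from this guide, but the package name in CodeArtifact may differ.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MIN_VERSION = "0.3.2"  # first release with the new authentication method
DIST_NAME = "dex-sdk"  # ASSUMPTION: hypothetical distribution name; check your CodeArtifact listing


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' version into integers (pre-release tags are not handled)."""
    return tuple(int(part) for part in text.split(".")[:3])


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Compare numerically, so '0.3.10' correctly counts as newer than '0.3.2'."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version(DIST_NAME)
    status = "OK" if meets_minimum(installed) else "upgrade required"
    print(f"{DIST_NAME} {installed}: {status}")
except PackageNotFoundError:
    print(f"{DIST_NAME} is not installed")
```

Comparing integer tuples rather than raw strings avoids the classic bug where "0.3.10" sorts before "0.3.2".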
Quick Start
1. Initialize Dex Client
2. Create a Project
Projects isolate your data and credentials for tracing, billing, and SGP model calls. Every operation is tied to a project.
Tip: Keep one project per use case or group of related files for clean traceability.
3. Upload a Document
Upload your document to the project. Dex supports PDFs, images, spreadsheets, and more.
4. Parse the Document
Parsing converts your document into a structured format with text, tables, and layout information.
Note: Parsing is asynchronous. The SDK automatically polls for completion.
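The SDK performs this polling for you, so you do not need to write it yourself. For intuition about timeouts and long-running parses, the sketch below shows the general pattern of polling with exponential backoff. get_status and the status strings are hypothetical stand-ins, not SDK names.

```python
import time
from typing import Callable

# ASSUMPTION: hypothetical status values; the real job states may differ.
TERMINAL_STATES = {"succeeded", "failed"}


def wait_for_job(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() with exponential backoff until a terminal state or timeout."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # Give up rather than sleep past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off, but cap the wait


# Usage with a fake status sequence and a no-op sleep:
statuses = iter(["pending", "running", "succeeded"])
print(wait_for_job(lambda: next(statuses), sleep=lambda s: None))  # prints "succeeded"
```

Capping the delay keeps latency reasonable for long jobs, while the doubling keeps the number of status requests low.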
5. Extract Structured Data
Define a schema and extract specific information from your document.
Complete Example
Here’s a complete working example you can copy and run:
Next Steps
Now that you’ve completed the basics, explore these topics:
Learn Advanced Features
- Advanced Features Guide: Vector stores, chunking strategies, batch processing, and more
- Quick Reference: Cheat sheet for common patterns and imports
Deep Dive into the API
- API Reference: Complete SDK documentation with all methods and types
- Troubleshooting Guide: Common issues and solutions
- Changelog: Latest updates and breaking changes
Additional Resources
- REST API: For non-Python integrations, see the REST API Reference
- Support: Questions? Ask in the #dex Slack channel
- Examples: More examples in the Introduction guide
Common Questions
Q: How do I process multiple documents?
A: Upload multiple files to the same project, parse each one, then optionally use vector stores for cross-document search. See Advanced Features.
Q: Can I use a synchronous client?
A: Yes! Use DexSyncClient from dex_sdk for synchronous operations. See Advanced Features.
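For the multiple-documents case, one common pattern (assuming the default client is asynchronous, as the DexSyncClient answer implies) is to start many parses at once while capping how many run concurrently. The sketch uses asyncio.Semaphore; parse_one is a hypothetical callable standing in for your upload-and-parse calls, not an SDK function. With DexSyncClient, a plain for loop over the files plays the same role.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def process_all(
    paths: List[str],
    parse_one: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> Dict[str, str]:
    """Run parse_one over every path with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(path: str) -> str:
        async with semaphore:  # waits here while max_concurrency calls are running
            return await parse_one(path)

    results = await asyncio.gather(*(bounded(p) for p in paths))
    return dict(zip(paths, results))  # gather returns results in input order


# HYPOTHETICAL stand-in for uploading and parsing one file with the SDK.
async def fake_parse(path: str) -> str:
    await asyncio.sleep(0.01)
    return f"parsed:{path}"


print(asyncio.run(process_all(["a.pdf", "b.docx", "c.png"], fake_parse, max_concurrency=2)))
```

The semaphore avoids sending every request at once; tune max_concurrency to your workload. Duplicate paths collapse into a single dictionary key.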
Q: How do I configure data retention policies?
A: Set retention policies when creating a project. See Advanced Features.
Q: What OCR engines are available?
A: Reducto (for English and other Latin-script languages) and Iris (for non-Latin scripts such as Arabic, Hebrew, CJK, and Indic languages). See API Reference for details.
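If you already have a text sample for a document (for example, from an embedded PDF text layer or accompanying metadata), a simple script heuristic can suggest which engine fits. This minimal sketch counts Latin-script letters using the standard unicodedata module. The engine identifier strings are placeholders, so check the API Reference for how the SDK actually selects an engine.

```python
import unicodedata

# ASSUMPTION: placeholder identifiers, not the SDK's actual engine values.
REDUCTO = "reducto"
IRIS = "iris"


def latin_share(text: str) -> float:
    """Fraction of alphabetic characters whose Unicode name marks them as Latin script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 1.0  # no letters (e.g. only digits): default to Latin
    latin = sum(1 for ch in letters if unicodedata.name(ch, "").startswith("LATIN"))
    return latin / len(letters)


def suggest_engine(sample_text: str, threshold: float = 0.5) -> str:
    """Suggest Reducto for mostly Latin-script text and Iris otherwise."""
    return REDUCTO if latin_share(sample_text) >= threshold else IRIS


print(suggest_engine("Invoice total: 1,250.00 EUR"))  # prints "reducto"
print(suggest_engine("فاتورة رقم 42"))  # prints "iris"
```

Note that this heuristic also routes other non-Latin scripts, such as Cyrillic or Greek, to Iris, which matches the "non-Latin scripts" wording above but is worth confirming against the API Reference.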
