Quick Decision Guide
Use Reducto (Recommended for Production)
Reducto is the recommended OCR engine for most production use cases:
- You need proven stability and reliability for production workloads
- You’re doing batch processing where somewhat high latency is acceptable
- You want automatic scaling with your workload
- You’re processing English or Germanic language documents
- You need a low-complexity, managed solution
Use Iris (Experimental - Custom Needs)
Iris is for teams with specialized requirements who need custom OCR control:
- You have custom OCR needs that require pipeline experimentation
- You want to maximize accuracy through testing 15+ different OCR models
- You need to minimize latency through specialized optimization
- You’re building specialized models for unique document types
- You need complete pipeline control for layout detection, OCR, and assembly
- You can accept experimental-level stability and support
Understanding the Relationship Between Iris and Dex
Iris and Dex are not mutually exclusive; they’re complementary. Dex is a document understanding platform that provides primitives for file management, parsing, vector stores, and data extraction. Iris is one of several OCR engines available within Dex. Think of it this way: Dex is the platform, Iris is one of the engines. When you use Dex, you choose which OCR engine to use for parsing:
- Reducto: Best for English and Germanic languages
- Iris: Better accuracy for non-Germanic languages
- Custom engines: Integrate your own OCR solution
What is Iris?
Iris is Scale’s experimental OCR capability that provides a flexible, modular pipeline for extracting text from documents. It’s designed for teams with custom OCR needs who want complete control over the processing pipeline. Iris offers:
- 15+ OCR models: Tesseract, EasyOCR, PaddleOCR, Surya, GPT-4o, Gemini, and specialized models
- Complete pipeline control: Configure layout detection, OCR processing, and assembly separately
- Inspection capabilities: Save and review intermediate results at each processing stage
- Extensibility: Add custom OCR models or layout detectors through simple adapters
- Optimization flexibility: Fine-tune for maximum accuracy or minimum latency based on your needs
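The stage separation described above (layout detection, then OCR, then assembly, with adapters for swapping components and saved intermediate results) can be sketched as follows. This is a minimal illustration, not Iris’s actual interface: every class and method name (`LayoutDetector`, `OcrModel`, `Pipeline`, `detect`, `recognize`) is hypothetical, and a "page" is modeled as a list of text lines so the toy adapters run without any real OCR dependency.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple

# Toy stand-in for a rendered page: one string per text line (assumption).
Page = List[str]


@dataclass
class Region:
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1)
    label: str                       # e.g. "text", "table"


class LayoutDetector(Protocol):
    """Hypothetical adapter interface for any layout detector."""
    def detect(self, page: Page) -> List[Region]: ...


class OcrModel(Protocol):
    """Hypothetical adapter interface for any OCR model."""
    def recognize(self, page: Page, region: Region) -> str: ...


class LineLayoutDetector:
    """Toy detector: one text region per non-empty line."""
    def detect(self, page: Page) -> List[Region]:
        return [Region((0, y, len(line), y + 1), "text")
                for y, line in enumerate(page) if line.strip()]


class EchoOcr:
    """Toy 'model': reads the line text back instead of recognizing pixels."""
    def recognize(self, page: Page, region: Region) -> str:
        return page[region.bbox[1]].strip()


@dataclass
class Pipeline:
    layout: LayoutDetector
    ocr: OcrModel
    trace: Dict[str, list] = field(default_factory=dict)  # per-stage results for inspection

    def run(self, page: Page) -> str:
        # Stage 1: layout detection, sorted into reading order (top, then left).
        regions = sorted(self.layout.detect(page), key=lambda r: (r.bbox[1], r.bbox[0]))
        self.trace["layout"] = regions
        # Stage 2: OCR each region independently.
        texts = [self.ocr.recognize(page, r) for r in regions]
        self.trace["ocr"] = texts
        # Stage 3: assembly into a single document string.
        document = "\n".join(texts)
        self.trace["assembly"] = [document]
        return document


pipeline = Pipeline(layout=LineLayoutDetector(), ocr=EchoOcr())
print(pipeline.run(["Invoice 42", "", "  Total: 10 EUR"]))
print(len(pipeline.trace["layout"]))
```

In this sketch, swapping in a different model only means passing another object with a `recognize` method, and the `trace` dictionary plays the role of the saved intermediate results.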
When Iris Makes Sense
Iris is appropriate when you need:
- Custom pipeline experimentation: Test different combinations of layout detectors and OCR models
- Specialized model development: Build custom OCR for unique document types or languages
- Performance optimization: Tune for specific accuracy or latency requirements
- Non-Germanic language optimization: Experiment with models to find the best accuracy for Arabic, CJK, or Indic languages
What is Dex?
Dex is Scale’s document understanding platform, a production-ready service that transforms unstructured documents into actionable, structured data. It provides:
- File Management: Secure upload, storage, and retrieval with access control
- Document Parsing: Convert documents (PDF, DOCX, images) into structured JSON using multiple OCR engines
- Vector Stores: Index and search parsed documents with semantic embeddings
- Data Extraction: Extract information using custom schemas, prompts, and RAG-enhanced context
- Project Management: Organize and isolate data with proper authorization
- Automatic scaling with your workload
- Multiple OCR engine options: Reducto (production-ready) and Iris (experimental)
Using Iris Within Dex
Dex supports multiple OCR engines. When creating a parse job in Dex, you specify which engine to use.
Engine Options
Reducto (Recommended)
- Production-ready with proven stability
- Good for batch processing with somewhat high but acceptable latency
- Auto-scaling capabilities
- Best accuracy for English and Germanic language documents
- Reliable and managed solution
- Use for: Standard production workloads, batch processing
Iris (Experimental)
- Experimental stability with specialized capabilities
- Complete pipeline control for custom optimization
- 15+ OCR models to experiment with
- Flexibility to maximize accuracy or minimize latency
- No auto-scaling, experimental-level support
- Use for: Custom OCR needs, specialized model development, performance optimization
Feature Comparison
| Feature | Reducto (Recommended) | Iris (Experimental) |
|---|---|---|
| Stability | ✅ Production-proven | ⚠️ Experimental |
| Best For | Standard production workloads | Custom OCR needs, experimentation |
| Support Level | Production support | Experimental support |
| Latency | Somewhat high (batch-friendly) | Configurable through optimization |
| Scalability | Full auto-scaling | No auto-scaling |
| OCR Models | Managed Reducto engine | 15+ models (fully configurable) |
| Pipeline Control | Managed | Complete control |
| Use Case | Batch processing, standard needs | Custom models, optimization experiments |
Decision Tree
Do you have custom OCR needs?
No - Standard production document processing
→ Use Reducto within Dex
- Proven stability for production
- Good for batch processing
- Auto-scaling
- Somewhat high but acceptable latency
Yes - Custom OCR requirements
What type of custom needs?
Need to maximize accuracy or minimize latency:
- Experiment with Iris’s 15+ OCR models
- Tune the pipeline for specific performance goals
- Accept experimental software risks
Need specialized models:
- Build custom OCR for unique document types
- Integrate specialized models through Iris adapters
- Accept experimental-level stability and support
Need non-Germanic language optimization:
- Test different OCR engines for Arabic, CJK, and Indic languages
- Experiment to find optimal accuracy
- Accept experimental-level stability and support
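The decision tree above can be encoded as a small helper, for example to record the choice in a project setup script. This is a sketch under stated assumptions: the `OcrRequirements` fields and the returned strings `"reducto"` and `"iris"` are illustrative labels, not Dex parameters, and falling back to Reducto when a team will not accept experimental-level support is an interpretation of this guide’s advice to consider Reducto first.

```python
from dataclasses import dataclass


@dataclass
class OcrRequirements:
    """Hypothetical flags mirroring the questions in the decision tree."""
    tune_accuracy_or_latency: bool = False
    specialized_models: bool = False
    non_germanic_languages: bool = False
    accepts_experimental_support: bool = False


def choose_engine(req: OcrRequirements) -> str:
    """Return "reducto" or "iris" (illustrative labels, not Dex parameter values)."""
    has_custom_needs = (req.tune_accuracy_or_latency
                        or req.specialized_models
                        or req.non_germanic_languages)
    # Every Iris branch requires accepting experimental-level stability and support;
    # otherwise fall back to the recommended default.
    if has_custom_needs and req.accepts_experimental_support:
        return "iris"
    return "reducto"


print(choose_engine(OcrRequirements()))
print(choose_engine(OcrRequirements(non_germanic_languages=True,
                                    accepts_experimental_support=True)))
```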
Common Misconceptions
“Reducto has high latency”
Reducto has somewhat high latency, which is acceptable and expected for batch processing. This is normal for document OCR workloads and should not be a concern for typical batch use cases.
“I need Iris for better accuracy”
Most teams should use Reducto. Choose Iris only if you have specific custom requirements (specialized models, pipeline optimization, or experimentation needs) that justify experimental-level stability.
Recommended Workflow
For Standard Production Applications
Use Reducto within Dex for reliable, production-ready document processing:
- Start with Reducto for proven stability and auto-scaling
- Use batch processing to handle somewhat high latency
- Rely on managed infrastructure for operational simplicity
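One way to absorb Reducto’s per-document latency in a batch is to keep several parse jobs in flight at once, so the waits overlap instead of adding up. The sketch below assumes the service accepts concurrent requests; `parse_document` is a hypothetical placeholder that sleeps instead of calling Dex, and `max_workers` should be set to respect whatever rate limits apply to your project.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def parse_document(path: str) -> str:
    """Hypothetical placeholder for submitting one parse job and awaiting its result."""
    time.sleep(0.2)  # simulate a slow, I/O-bound OCR round trip
    return f"parsed:{path}"


def parse_batch(paths: List[str], max_workers: int = 8) -> List[str]:
    """Run parse jobs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse_document, paths))


start = time.perf_counter()
outputs = parse_batch([f"doc{i}.pdf" for i in range(8)])
elapsed = time.perf_counter() - start
# Eight 0.2 s waits overlap, so elapsed is far below the 1.6 s a sequential loop would take.
print(len(outputs), round(elapsed, 1))
```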
For Custom OCR Development
Use Iris when you have specialized requirements:
- Identify specific custom needs: Accuracy optimization, latency minimization, or specialized models
- Experiment with Iris: Test 15+ OCR models and pipeline configurations
- Evaluate trade-offs: Ensure custom requirements justify experimental-level stability
- Consider Reducto first: If Reducto meets your needs, use it for better stability
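When experimenting with Iris models, the accuracy and latency trade-offs are easiest to judge on a small labeled sample set. A minimal harness, assuming each candidate model is wrapped as a function from a document to text (the model names and toy samples below are hypothetical), can compute character error rate (CER: edit distance divided by reference length) and mean latency per model, then pick the most accurate model within a latency budget:

```python
import time
from typing import Callable, Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate: edits needed divided by reference length."""
    return levenshtein(prediction, reference) / max(len(reference), 1)


def evaluate(models: Dict[str, Callable[[str], str]],
             samples: List[Tuple[str, str]]) -> Dict[str, Tuple[float, float]]:
    """Return {model_name: (mean CER, mean seconds per sample)}."""
    results = {}
    for name, model in models.items():
        errors, elapsed = [], 0.0
        for document, reference in samples:
            start = time.perf_counter()
            prediction = model(document)
            elapsed += time.perf_counter() - start
            errors.append(cer(prediction, reference))
        results[name] = (sum(errors) / len(errors), elapsed / len(samples))
    return results


def pick_most_accurate(results: Dict[str, Tuple[float, float]], max_seconds: float) -> str:
    """Lowest mean CER among models within the latency budget (ValueError if none qualify)."""
    eligible = {name: r for name, r in results.items() if r[1] <= max_seconds}
    return min(eligible, key=lambda name: eligible[name][0])


# Toy usage: "documents" are plain strings standing in for page images.
candidates = {
    "exact": lambda doc: doc,
    "confuses_zero": lambda doc: doc.replace("0", "O"),
}
samples = [("Total 100", "Total 100"), ("ID 2024", "ID 2024")]
results = evaluate(candidates, samples)
print(results)
print(pick_most_accurate(results, max_seconds=1.0))
```

To minimize latency instead, the same `results` can be filtered by a CER threshold and sorted by the second tuple element.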
Getting Started
Dex Documentation
Iris Documentation
Summary
- Reducto is recommended for standard production use: Proven stability, batch processing, auto-scaling, and managed infrastructure
- Iris has experimental stability: You can use it in production for custom OCR needs, but expect experimental-level support and stability
- Reducto has somewhat high latency: This is acceptable and expected for batch processing workloads
- Choose Iris for custom needs: When you need to maximize accuracy, minimize latency, or build specialized models
- Dex is the platform: Both Reducto and Iris are OCR engines within Dex’s document understanding platform
For questions, reach out in #dex-help on Slack.
