Quick Decision Guide
Use Iris (in Dex) when
- You’re processing non-Germanic language documents where Iris has proven better accuracy than Reducto
- Latency is acceptable for your use case (asynchronous batch processing)
- You need to experiment and evaluate which OCR models work best for your specific document types
- You want to test different OCR engines (EasyOCR, PaddleOCR, Gemini, etc.) to find the best fit
Use Dex with Reducto when
- You’re processing English or Germanic language documents
- You need better stability for production workloads
- You want the system to automatically scale with your workload
Understanding the Relationship Between Iris and Dex
Iris and Dex are not mutually exclusive; they’re complementary. Dex is a document understanding platform that provides primitives for file management, parsing, vector stores, and data extraction. Iris is one of several OCR engines available within Dex. Think of it this way: Dex is the platform, Iris is one of the engines. When you use Dex, you choose which OCR engine to use for parsing:
- Reducto: Best for English and Germanic languages
- Iris: Better accuracy for non-Germanic languages
- Custom engines: Integrate your own OCR solution
What is Iris?
Iris is Scale’s OCR capability that provides a flexible, modular pipeline for extracting text from documents. It offers:
- 15+ OCR models: Tesseract, EasyOCR, PaddleOCR, Surya, GPT-4o, Gemini, and more
- Specialized support for non-Germanic scripts: Arabic Nougat models, QAARI, MBZUAI-AIN, and others
- Complete pipeline control: Configure layout detection, OCR processing, and assembly separately
- Inspection capabilities: Save and review intermediate results at each processing stage
- Extensibility: Add custom OCR models or layout detectors
Current Limitations
Current issues include:
- Stability concerns: Less robust than Reducto for production use
- Significant latency: Higher processing time than desired
- No auto-scaling: Cannot scale up or down with system load
What is Dex?
Dex is Scale’s document understanding platform, a service that transforms unstructured documents into actionable, structured data. It provides:
- File Management: Secure upload, storage, and retrieval with access control
- Document Parsing: Convert documents (PDF, DOCX, images) into structured JSON using multiple OCR engines
- Vector Stores: Index and search parsed documents with semantic embeddings
- Data Extraction: Extract information using custom schemas, prompts, and RAG-enhanced context
- Project Management: Organize and isolate data with proper authorization
- Automatic scaling with your workload
- Multiple OCR engine options (Reducto and Iris)
Using Iris Within Dex
Dex supports multiple OCR engines. When creating a parse job in Dex, you specify which engine to use.
Engine Options
Reducto
- Best for English and Germanic language documents
- Better stability for production workloads
- Auto-scaling capabilities
- Use for: English/Germanic documents or when stability is critical
Iris
- Proven better accuracy for non-Germanic language documents
- Higher latency (asynchronous processing)
- Stability concerns for production
- Use for: Non-Germanic languages where accuracy is the priority and asynchronous processing is acceptable
While Iris provides better accuracy for non-Germanic languages, it’s not yet ready for production environments that require high stability or auto-scaling. We’re working on improving Iris to make it production-ready, but currently recommend considering the stability/latency trade-offs carefully.
Feature Comparison
| Feature | Iris (via Dex) | Dex with Reducto |
|---|---|---|
| Best For | Non-Germanic languages | English & Germanic languages |
| Accuracy | Better for non-Germanic scripts | Better for Germanic scripts |
| Latency | Higher (asynchronous) | Medium-high (asynchronous) |
| Scalability | No auto-scaling | Full auto-scaling |
| OCR Models | 15+ models (configurable) | Reducto engine |
| Production Ready | ⚠️ Not yet | ✅ Yes |
Decision Tree
What language are your documents?
- English or Germanic languages (German, Dutch, Swedish, etc.) → Use Dex with Reducto
- Non-Germanic languages (Arabic, Hebrew, CJK, Indic, etc.) → Do you need maximum stability?
  - Yes → Use Dex with Reducto (accept lower accuracy)
  - No → Use Dex with Iris (better accuracy, accept latency/stability trade-offs)
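The decision tree above can be sketched as a small function. This is only an illustration of the branching logic: the `Engine` enum, the `choose_engine` name, and the `GERMANIC_LANGUAGES` set are hypothetical helpers invented for this example and are not part of the Dex API. The language set is deliberately incomplete and would need to be extended for your own documents.

```python
from enum import Enum


class Engine(Enum):
    # Hypothetical labels for this sketch; not identifiers from the Dex API.
    REDUCTO = "reducto"
    IRIS = "iris"


# Illustrative, non-exhaustive set of languages treated as "English or Germanic".
GERMANIC_LANGUAGES = {"english", "german", "dutch", "swedish"}


def choose_engine(language: str, needs_max_stability: bool) -> Engine:
    """Mirror the decision tree: language first, then the stability requirement."""
    if language.strip().lower() in GERMANIC_LANGUAGES:
        # English or Germanic documents: Reducto regardless of stability needs.
        return Engine.REDUCTO
    if needs_max_stability:
        # Non-Germanic, stability first: Reducto, accepting lower accuracy.
        return Engine.REDUCTO
    # Non-Germanic, accuracy first: Iris, accepting latency/stability trade-offs.
    return Engine.IRIS
```

For example, `choose_engine("Arabic", needs_max_stability=False)` returns `Engine.IRIS`, while the same language with `needs_max_stability=True` returns `Engine.REDUCTO`.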
Language Support Guidance
For English & Germanic Languages
- Recommended: Dex with Reducto
- Reason: Best accuracy and production stability
For Non-Germanic Languages (Arabic, Hebrew, CJK, Indic Languages)
For better accuracy: Dex with Iris (proven better results)
- Trade-offs: Higher latency, stability concerns
- Use case: Asynchronous batch processing where accuracy is the priority
For better stability: Dex with Reducto
- Trade-offs: Lower accuracy for these languages
- Use case: Production workloads requiring high reliability
Common Misconceptions
“Dex is real-time/low latency”
Neither Reducto nor Iris in Dex provides real-time processing. Both have significant latency. The difference is that Reducto is more stable and has auto-scaling.
“Iris has better accuracy for non-Germanic languages”
This is correct. Iris (used via Dex) has been proven to work better than Reducto for non-Germanic language documents.
Recommended Workflow
For Production Applications
- English/Germanic languages → Use Dex with Reducto
- Non-Germanic languages → Choose based on priority:
  - Accuracy priority → Use Dex with Iris (accept latency/stability trade-offs)
  - Stability priority → Use Dex with Reducto (accept lower accuracy)
For Experimentation
- Test with Iris to evaluate different OCR models
- Compare results between Iris and Reducto
- Transition to Dex for production deployment with your chosen engine
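One possible way to compare results between engines is to score each engine's output against a hand-checked reference transcription using character error rate (CER). This guide does not prescribe a metric, so treat the sketch below as an assumption rather than an official evaluation method. It assumes you already have each engine's extracted text as a plain string; the function names and the `"iris"`/`"reducto"` dictionary keys are hypothetical labels for this example.

```python
from typing import Mapping


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length; lower is better."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def lowest_error_engine(reference: str, outputs: Mapping[str, str]) -> str:
    """Return the name of the engine whose output has the lowest CER."""
    return min(outputs, key=lambda name: character_error_rate(reference, outputs[name]))
```

A single document is rarely representative, so in practice you would average CER over a sample of your own document types before choosing an engine.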
Getting Started
Dex Documentation
Iris Documentation
Summary
- Dex is the platform for document understanding with file management, parsing, extraction, and vector stores
- Iris is an OCR engine available within Dex (and standalone) optimized for non-Germanic languages
- For English/Germanic languages: Use Dex with Reducto
- For non-Germanic languages: Use Dex with Iris for better accuracy (asynchronous processing), or Reducto for better stability
- Iris is not production-ready yet: We’re working on making it fast and lightweight, but stability issues currently exist
For questions, reach out in #dex-help on Slack.
