Choosing the right OCR engine is critical for successful document processing. This guide will help you understand when to use Iris versus Reducto within Dex.

Quick Decision Guide

Reducto is the recommended OCR engine for most production use cases. Choose it when:
  • You need proven stability and reliability for production workloads
  • You’re doing batch processing where somewhat high latency is acceptable
  • You want automatic scaling with your workload
  • You’re processing English or Germanic language documents
  • You need a low-complexity, managed solution

Use Iris (Experimental - Custom Needs)

Iris is for teams with specialized requirements who need custom OCR control:
  • You have custom OCR needs that require pipeline experimentation
  • You want to maximize accuracy through testing 15+ different OCR models
  • You need to minimize latency through specialized optimization
  • You’re building specialized models for unique document types
  • You need complete pipeline control for layout detection, OCR, and assembly
  • You can accept experimental-level stability and support
Note: Iris is experimental. You can run it in production, but expect experimental-level stability and support. For standard production workloads, we recommend Reducto within Dex.

Understanding the Relationship Between Iris and Dex

Iris and Dex are not mutually exclusive; they are complementary. Dex is a document understanding platform that provides primitives for file management, parsing, vector stores, and data extraction. Iris is one of several OCR engines available within Dex.

Think of it this way: Dex is the platform, Iris is one of the engines. When you use Dex, you choose which OCR engine to use for parsing:
  • Reducto: Best for English and Germanic languages
  • Iris: Better accuracy for non-Germanic languages
  • Custom engines: Integrate your own OCR solution

What is Iris?

Iris is Scale’s experimental OCR capability that provides a flexible, modular pipeline for extracting text from documents. It’s designed for teams with custom OCR needs who want complete control over the processing pipeline. Iris offers:
  • 15+ OCR models: Tesseract, EasyOCR, PaddleOCR, Surya, GPT-4o, Gemini, and specialized models
  • Complete pipeline control: Configure layout detection, OCR processing, and assembly separately
  • Inspection capabilities: Save and review intermediate results at each processing stage
  • Extensibility: Add custom OCR models or layout detectors through simple adapters
  • Optimization flexibility: Fine-tune for maximum accuracy or minimum latency based on your needs
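
The three-stage structure described above (layout detection, OCR, assembly, with inspectable intermediate results) can be pictured with a minimal sketch. Every name below (`Pipeline`, `detect_layout`, `run_ocr`, `assemble`, and the stub functions) is hypothetical and chosen for illustration only; this is not the Iris API, just one way a pipeline with swappable stages might look.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration; Iris's real interfaces may differ.
Region = Dict[str, object]                     # e.g. {"bbox": (...), "kind": "text"}
LayoutDetector = Callable[[bytes], List[Region]]
OcrModel = Callable[[bytes, Region], str]
Assembler = Callable[[List[Region], List[str]], str]


@dataclass
class Pipeline:
    """Three independently swappable stages, with intermediates kept for review."""
    detect_layout: LayoutDetector
    run_ocr: OcrModel
    assemble: Assembler
    intermediates: Dict[str, object] = field(default_factory=dict)

    def process(self, page: bytes) -> str:
        regions = self.detect_layout(page)
        self.intermediates["layout"] = regions      # stage 1 output
        texts = [self.run_ocr(page, region) for region in regions]
        self.intermediates["ocr"] = texts           # stage 2 output
        document = self.assemble(regions, texts)
        self.intermediates["assembled"] = document  # stage 3 output
        return document


# Stub stages standing in for real models (assumptions, not real adapters).
def single_region_layout(page: bytes) -> List[Region]:
    return [{"bbox": (0, 0, 100, 100), "kind": "text"}]


def passthrough_ocr(page: bytes, region: Region) -> str:
    return page.decode("utf-8")


def join_lines(regions: List[Region], texts: List[str]) -> str:
    return "\n".join(texts)


pipeline = Pipeline(single_region_layout, passthrough_ocr, join_lines)
text = pipeline.process(b"Invoice 42")
```

Because each stage is a plain callable, swapping in a different OCR model or layout detector means passing a different function, which is the kind of adapter-style extensibility the list above describes.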

When Iris Makes Sense

Iris is appropriate when you need:
  • Custom pipeline experimentation: Test different combinations of layout detectors and OCR models
  • Specialized model development: Build custom OCR for unique document types or languages
  • Performance optimization: Tune for specific accuracy or latency requirements
  • Non-Germanic language optimization: Experiment with models to find best accuracy for Arabic, CJK, or Indic languages
Note: Iris is experimental. It lacks auto-scaling and is less stable than Reducto. You can use it in production, but expect experimental-level support.

What is Dex?

Dex is Scale’s document understanding platform—a production-ready service that transforms unstructured documents into actionable, structured data. It provides:
  • File Management: Secure upload, storage, and retrieval with access control
  • Document Parsing: Convert documents (PDF, DOCX, images) into structured JSON using multiple OCR engines
  • Vector Stores: Index and search parsed documents with semantic embeddings
  • Data Extraction: Extract information using custom schemas, prompts, and RAG-enhanced context
  • Project Management: Organize and isolate data with proper authorization
  • Automatic Scaling: Scales with your workload
  • Multiple OCR engine options: Reducto (production-ready) and Iris (experimental)

Using Iris Within Dex

Dex supports multiple OCR engines. When you create a parse job in Dex, you specify which engine to use.

Engine Options

Reducto (Recommended)
  • Production-ready with proven stability
  • Well suited to batch processing, where its somewhat high latency is acceptable
  • Auto-scaling capabilities
  • Best accuracy for English and Germanic language documents
  • Reliable and managed solution
  • Use for: Standard production workloads, batch processing
Iris (Experimental)
  • Experimental stability with specialized capabilities
  • Complete pipeline control for custom optimization
  • 15+ OCR models to experiment with
  • Flexibility to maximize accuracy or minimize latency
  • No auto-scaling, experimental-level support
  • Use for: Custom OCR needs, specialized model development, performance optimization
Reducto is recommended for standard production use. Iris is experimental: you can use it in production, but expect experimental-level stability and support.
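
To make the engine choice concrete, here is a minimal sketch of building a parse request with an explicit engine. The function name `build_parse_request`, the field names `file_id` and `engine`, and the engine identifiers `"reducto"` and `"iris"` are all assumptions made for illustration; consult the Dex documentation for the actual request shape.

```python
from typing import Dict

# Hypothetical engine identifiers; the real values in Dex may differ.
SUPPORTED_ENGINES = {"reducto", "iris"}


def build_parse_request(file_id: str, engine: str = "reducto") -> Dict[str, str]:
    """Return a parse-job payload, defaulting to the recommended engine."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"Unknown engine {engine!r}; expected one of {sorted(SUPPORTED_ENGINES)}"
        )
    return {"file_id": file_id, "engine": engine}


default_request = build_parse_request("file_123")
iris_request = build_parse_request("file_123", engine="iris")
```

Defaulting to Reducto mirrors the recommendation above: a caller has to opt in to Iris explicitly.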

Feature Comparison

Feature | Reducto (Recommended) | Iris (Experimental)
Stability | ✅ Production-proven | ⚠️ Experimental
Best For | Standard production workloads | Custom OCR needs, experimentation
Support Level | Production support | Experimental support
Latency | Somewhat high (batch-friendly) | Configurable through optimization
Scalability | Full auto-scaling | No auto-scaling
OCR Models | Managed Reducto engine | 15+ models (fully configurable)
Pipeline Control | Managed | Complete control
Use Case | Batch processing, standard needs | Custom models, optimization experiments

Decision Tree

Do you have custom OCR needs?

No - Standard production document processing

Use Reducto within Dex
  • Proven stability for production
  • Good for batch processing
  • Auto-scaling
  • Somewhat high but acceptable latency

Yes - Custom OCR requirements

What type of custom needs?

Need to maximize accuracy or minimize latency:
  • Experiment with Iris’s 15+ OCR models
  • Tune pipeline for specific performance goals
  • Accept experimental software risks
Need specialized models:
  • Build custom OCR for unique document types
  • Integrate specialized models through Iris adapters
  • Accept experimental-level stability and support
Need non-Germanic language optimization:
  • Test different OCR engines for Arabic, CJK, Indic languages
  • Experiment to find optimal accuracy
  • Accept experimental-level stability and support
If you don’t have specialized requirements, use Reducto for production-proven stability and support.
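
The decision tree above can be sketched as a small function. The `Requirements` fields and the `choose_engine` name are hypothetical, and the fallback to Reducto when a team has custom needs but cannot accept experimental stability is one reading of this guide, not an official rule.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of a team's OCR needs."""
    needs_accuracy_or_latency_tuning: bool = False
    needs_specialized_models: bool = False
    needs_non_germanic_optimization: bool = False
    accepts_experimental_stability: bool = False


def choose_engine(req: Requirements) -> str:
    """Return "iris" only for custom needs plus accepted experimental risk."""
    has_custom_needs = (
        req.needs_accuracy_or_latency_tuning
        or req.needs_specialized_models
        or req.needs_non_germanic_optimization
    )
    if has_custom_needs and req.accepts_experimental_stability:
        return "iris"
    # No custom needs, or the stability trade-off is not acceptable.
    return "reducto"


standard_team = choose_engine(Requirements())
research_team = choose_engine(
    Requirements(needs_specialized_models=True, accepts_experimental_stability=True)
)
```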

Common Misconceptions

“Iris has production-level stability”
Iris has experimental stability. You can use it in production, but expect experimental-level support and stability. For production-proven reliability, use Reducto.

“Reducto has high latency”
Reducto has somewhat high latency, which is acceptable and expected for batch processing. This is normal for document OCR workloads and should not be a concern for typical batch use cases.

“I need Iris for better accuracy”
Most teams should use Reducto. Choose Iris only if you have specific custom requirements (specialized models, pipeline optimization, or experimentation needs) that justify experimental-level stability.

For Standard Production Applications

Use Reducto within Dex for reliable, production-ready document processing:
  1. Start with Reducto for proven stability and auto-scaling
  2. Use batch processing so that Reducto's somewhat high latency does not block your workflow
  3. Rely on managed infrastructure for operational simplicity
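
As a rough illustration of the batch-processing step, the helper below groups document identifiers into fixed-size batches before submission. The name `batched` and the batch size are illustrative assumptions; how batches are actually submitted to Dex is not shown here.

```python
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


file_ids = ["doc_a", "doc_b", "doc_c"]  # placeholder identifiers
batches = list(batched(file_ids, size=2))
```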

For Custom OCR Development

Use Iris when you have specialized requirements:
  1. Identify specific custom needs: Accuracy optimization, latency minimization, or specialized models
  2. Experiment with Iris: Test 15+ OCR models and pipeline configurations
  3. Evaluate trade-offs: Ensure custom requirements justify experimental-level stability
  4. Consider Reducto first: If Reducto meets your needs, use it for better stability
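
One way to approach the experimentation and trade-off steps above is a small harness that scores candidate models on labeled samples by character error rate (edit distance divided by reference length) and mean latency. This is a sketch under assumptions: the models are represented as plain callables taking image bytes and returning text, and none of the names come from Iris.

```python
import time
from typing import Callable, Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_error_rate(predicted: str, reference: str) -> float:
    if not reference:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, reference) / len(reference)


def benchmark(
    models: Dict[str, Callable[[bytes], str]],
    samples: List[Tuple[bytes, str]],
) -> Dict[str, Dict[str, float]]:
    """Return mean CER and mean wall-clock latency per model."""
    if not samples:
        raise ValueError("samples must not be empty")
    report: Dict[str, Dict[str, float]] = {}
    for name, model in models.items():
        total_cer = 0.0
        total_seconds = 0.0
        for image, truth in samples:
            start = time.perf_counter()
            predicted = model(image)  # only the model call is timed
            total_seconds += time.perf_counter() - start
            total_cer += character_error_rate(predicted, truth)
        report[name] = {
            "mean_cer": total_cer / len(samples),
            "mean_latency_s": total_seconds / len(samples),
        }
    return report


# Stub "model" that simply decodes the bytes, standing in for a real OCR engine.
report = benchmark({"echo": lambda image: image.decode("utf-8")}, [(b"abc", "abc")])
```

Running the same samples through Reducto's output as a baseline would make it easier to judge whether a custom Iris configuration justifies the stability trade-off.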

Getting Started

Dex Documentation

Iris Documentation

Summary

  • Reducto is recommended for standard production use: Proven stability, batch processing, auto-scaling, and managed infrastructure
  • Iris has experimental stability: You can use it in production for custom OCR needs, but expect experimental-level support and stability
  • Reducto has somewhat high latency: This is acceptable and expected for batch processing workloads
  • Choose Iris for custom needs: When you need to maximize accuracy, minimize latency, or build specialized models
  • Dex is the platform: Both Reducto and Iris are OCR engines within Dex’s document understanding platform
Need help deciding? Contact the Dex team at #dex-help on Slack.