IRIS is Scale’s experimental OCR capability that transforms document images and PDFs into structured text. It provides a comprehensive “layout-then-OCR” pipeline that combines layout detection, specialized OCR models, and intelligent assembly to extract meaningful information from complex documents.
IRIS is experimental. You can use it in production, but expect experimental-level stability and support. For standard production workloads, we recommend Reducto within Dex for proven reliability. IRIS is best suited for teams with custom OCR needs who want to minimize latency, maximize accuracy, or build specialized models.

When to Use IRIS?

IRIS is designed for teams that need:
  • Custom OCR pipeline control: Fine-tune layout detection, OCR models, and assembly for specialized document types
  • Maximum accuracy optimization: Experiment with 15+ OCR models to find the best fit for your specific documents
  • Minimum latency requirements: Optimize processing speed for your specific use case through custom configuration
  • Specialized model development: Build and integrate custom OCR models or layout detectors for unique document challenges
  • Multi-language experimentation: Test different OCR engines for non-Germanic languages to find optimal accuracy

For Standard Production Use

If you need stable batch document processing without custom requirements, use Reducto within Dex:
  • Production stability with proven reliability
  • Suited to batch processing, where somewhat higher latency is acceptable
  • Auto-scaling with your workload
  • Best accuracy for English and Germanic languages
  • Lower operational complexity
See the When to choose Iris? guide for a detailed comparison.

IRIS Capabilities

IRIS provides advanced capabilities for teams with specialized OCR needs:
  • Layout-Aware Processing: Automatically detects and classifies different regions (text, tables, images, etc.) before applying specialized OCR
  • 15+ OCR Models: Choose from Tesseract, EasyOCR, PaddleOCR, Surya, GPT-4o, Gemini, and specialized models for Arabic and other languages
  • Table-Specific Processing: Apply specialized OCR models optimized for tabular data extraction
  • Extensible Architecture: Easily add custom OCR models or layout detectors through simple adapter interfaces
  • Multi-Language Flexibility: Test and optimize for documents in multiple languages including Arabic with specialized support
  • Complete Pipeline Control: Inspect and configure each step—layout detection, content extraction, and assembly

How IRIS Works

IRIS follows a three-stage pipeline:

1. Layout Detection

Detects and classifies regions in document images with bounding boxes and content types:
  • Text regions for paragraphs and body content
  • Table regions for structured tabular data
  • Image regions for figures and diagrams
  • Other content types as needed
Supported layout detection models:
  • rt_detr_bce: RT-DETR model (best performance)
  • surya_bce: Surya layout detection
  • yolo_bce: YOLO-based detection
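The output of this stage can be pictured as a list of typed bounding boxes. The sketch below is illustrative only: the `Region` class, its field names, and the `reading_order` helper are hypothetical and are not the IRIS schema. It assumes pixel coordinates and a naive single-column reading order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    """One detected region (hypothetical shape, not the IRIS data model)."""
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    kind: str                        # e.g. "text", "table", "image"
    score: float = 1.0               # detector confidence


def reading_order(regions: List[Region]) -> List[Region]:
    """Sort top-to-bottom, then left-to-right (single-column assumption)."""
    return sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))


# Made-up detections for a single page
regions = [
    Region((50, 400, 550, 700), "table", 0.91),
    Region((50, 40, 550, 120), "text", 0.98),
    Region((50, 140, 300, 380), "image", 0.87),
]
ordered = reading_order(regions)
print([r.kind for r in ordered])
```

Multi-column pages would need a smarter ordering than this simple sort.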

2. OCR Processing

Applies specialized OCR models to each detected region based on content type:
  • Text OCR: Optimized for continuous text (paragraphs, headings, etc.)
  • Table OCR: Specialized models for accurate table extraction
  • Image Handling: Preserves image crops for inclusion in output
Available OCR engines include:
  • Open-source models: EasyOCR, Tesseract, PaddleOCR, Surya
  • Vision-language models: GPT-4o, Gemini (via LiteLLM), Qwen2-VL, Qwen2.5-VL
  • Specialized models: Arabic Nougat (small/base/large), QAARI, MBZUAI-AIN
  • Cloud services: Azure Computer Vision
  • Advanced models: SmolDocling
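Routing each region to an engine by content type can be sketched as a dictionary lookup. Everything here is an assumption made for illustration: the stub engines, the `run_ocr` function, and the use of raw bytes as image crops are hypothetical and do not reflect how IRIS is implemented.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Stand-in engines: a real engine would take an image crop and return text.
def stub_text_ocr(crop: bytes) -> str:
    return "text<" + crop.decode("utf-8") + ">"


def stub_table_ocr(crop: bytes) -> str:
    return "table<" + crop.decode("utf-8") + ">"


def run_ocr(
    regions: List[Tuple[str, bytes]],
    engines: Dict[str, Callable[[bytes], str]],
) -> List[Tuple[str, Optional[str]]]:
    """Apply the engine registered for each region's content type."""
    results = []
    for kind, crop in regions:
        engine = engines.get(kind)
        # Types without an engine (e.g. images) yield no text; the crop is kept elsewhere.
        text = engine(crop) if engine is not None else None
        results.append((kind, text))
    return results


engines = {"text": stub_text_ocr, "table": stub_table_ocr}
regions = [("text", b"intro"), ("table", b"q3-figures"), ("image", b"chart")]
results = run_ocr(regions, engines)
print(results)
```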

3. Assembly

Combines extracted content back into useful formats:
  • MarkdownAssembler: Converts to structured markdown (default)
  • JSONAssembler: Outputs detailed JSON for debugging and inspection
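To make the difference between the two output styles concrete, here is a minimal sketch of assembling the same per-region results as markdown and as JSON. The functions `assemble_markdown` and `assemble_json` and the item fields are hypothetical stand-ins, not the MarkdownAssembler or JSONAssembler classes themselves.

```python
import json
from typing import Dict, List


def assemble_markdown(items: List[Dict[str, str]]) -> str:
    """Join region contents in order; images become markdown image links."""
    parts = []
    for item in items:
        if item["kind"] == "image":
            parts.append("![figure](" + item["content"] + ")")
        else:
            parts.append(item["content"])
    return "\n\n".join(parts)


def assemble_json(items: List[Dict[str, str]]) -> str:
    """Keep every region with its type, which is easier to inspect."""
    return json.dumps({"regions": items}, indent=2)


# Made-up per-region results, already in reading order
items = [
    {"kind": "text", "content": "Quarterly report"},
    {"kind": "table", "content": "| a | b |\n|---|---|\n| 1 | 2 |"},
    {"kind": "image", "content": "crops/fig1.png"},
]
markdown = assemble_markdown(items)
debug = assemble_json(items)
print(markdown)
```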

Core Features

Modular Architecture

Each pipeline component (layout detector, OCR model, assembler) is independently configurable and extensible. Add custom models by implementing simple adapter interfaces.
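As a rough picture of what an adapter interface can look like, the sketch below defines an abstract base class and a name-based registry. The names `OCRAdapter`, `recognize`, `register`, and `REGISTRY` are hypothetical; the actual IRIS adapter interface may differ in every detail.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class OCRAdapter(ABC):
    """Hypothetical adapter contract: image bytes in, recognized text out."""

    @abstractmethod
    def recognize(self, image_bytes: bytes) -> str:
        ...


REGISTRY: Dict[str, Type[OCRAdapter]] = {}


def register(name: str) -> Callable[[Type[OCRAdapter]], Type[OCRAdapter]]:
    """Class decorator that makes an adapter selectable by name."""
    def decorator(cls: Type[OCRAdapter]) -> Type[OCRAdapter]:
        REGISTRY[name] = cls
        return cls
    return decorator


@register("upper_echo")
class UpperEchoOCR(OCRAdapter):
    """Toy adapter: treats the bytes as UTF-8 text and upper-cases it."""

    def recognize(self, image_bytes: bytes) -> str:
        return image_bytes.decode("utf-8").upper()


engine = REGISTRY["upper_echo"]()
print(engine.recognize(b"hello"))
```

With this pattern, a configuration value such as an engine name only needs to match a registry key.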

Inspection and Debugging

Save intermediate results at each pipeline stage to understand and optimize the processing:
  • Layout detection bounding boxes and classifications
  • Individual OCR outputs per region
  • Final assembled output
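One simple way to keep the three kinds of intermediate results listed above is to write one JSON file per stage. This is a sketch under assumptions: the `save_stage` helper, the file names, and the payload shapes are hypothetical and are not what IRIS writes to disk.

```python
import json
import tempfile
from pathlib import Path


def save_stage(output_dir: Path, stage: str, payload) -> Path:
    """Write one stage's result to <output_dir>/<stage>.json and return the path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / (stage + ".json")
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


# Made-up results for a one-region page
layout = [{"bbox": [50, 40, 550, 120], "kind": "text"}]
ocr = [{"region": 0, "text": "Quarterly report"}]
final = {"markdown": "Quarterly report"}

out_dir = Path(tempfile.mkdtemp()) / "intermediate"
paths = [
    save_stage(out_dir, "01_layout", layout),
    save_stage(out_dir, "02_ocr", ocr),
    save_stage(out_dir, "03_assembled", final),
]
print([p.name for p in paths])
```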

Configuration Management

Use command-line arguments or YAML configuration files for reproducible processing:
layout: rt_detr_bce
text_ocr: easyocr
table_ocr: gemini
output_dir: results
save_intermediate: true
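The following sketch shows one way a configuration file and command-line arguments could be combined. It makes several assumptions: the flag names mirror the configuration keys, command-line values take precedence over the file, and `tesseract` is a valid engine identifier. None of these are confirmed here. The tiny parser handles only flat `key: value` lines and is not a YAML parser; a real YAML library would be used in practice.

```python
import argparse
from typing import Dict, List, Union

CONFIG_TEXT = """\
layout: rt_detr_bce
text_ocr: easyocr
table_ocr: gemini
output_dir: results
save_intermediate: true
"""

Value = Union[str, bool]


def parse_flat_config(text: str) -> Dict[str, Value]:
    """Parse flat 'key: value' lines; 'true'/'false' become booleans."""
    config: Dict[str, Value] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        config[key.strip()] = {"true": True, "false": False}.get(value.lower(), value)
    return config


def merge_cli(config: Dict[str, Value], argv: List[str]) -> Dict[str, Value]:
    """Overlay hypothetical --layout/--text_ocr/--table_ocr/--output_dir flags."""
    parser = argparse.ArgumentParser()
    for key in ("layout", "text_ocr", "table_ocr", "output_dir"):
        parser.add_argument("--" + key, default=None)
    args = parser.parse_args(argv)
    overrides = {k: v for k, v in vars(args).items() if v is not None}
    return {**config, **overrides}


# "tesseract" is an assumed identifier, used only to show an override
config = merge_cli(parse_flat_config(CONFIG_TEXT), ["--table_ocr", "tesseract"])
print(config)
```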

Performance Optimization

The first time a model is used, its weights are downloaded. Subsequent runs load the cached weights, so they start faster.
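The download-once behavior can be illustrated with a small file cache. This is only a sketch of the general pattern: `fake_download`, `load_weights`, and the cache layout are hypothetical, and the location and format of the real cache are not described here.

```python
import tempfile
from pathlib import Path
from typing import List

downloads: List[str] = []  # records each simulated download


def fake_download(model_name: str) -> bytes:
    """Stand-in for a network fetch of model weights."""
    downloads.append(model_name)
    return ("weights-for-" + model_name).encode("utf-8")


def load_weights(model_name: str, cache_dir: Path) -> bytes:
    """Return cached weights, downloading only if the cache file is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (model_name + ".bin")
    if not path.exists():
        path.write_bytes(fake_download(model_name))
    return path.read_bytes()


cache_dir = Path(tempfile.mkdtemp())
first = load_weights("easyocr", cache_dir)   # triggers the simulated download
second = load_weights("easyocr", cache_dir)  # served from the cache file
print(len(downloads))
```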

Common Use Cases

  • Document Digitization: Convert scanned documents and PDFs to searchable text
  • Form Processing: Extract structured data from forms with mixed content types
  • Table Extraction: Accurately capture tabular data from complex documents
  • Multi-Language Documents: Process documents in various languages including Arabic
  • Research and Analysis: Extract text and data from academic papers, reports, and technical documents
  • Archive Processing: Batch process large document collections

Getting Started

IRIS is integrated with Dex, so you can run it as part of your existing Dex document processing.

Python Package

Use programmatically through the Dex SDK:
import asyncio

from dex_sdk.client import DexClient
from dex_core.models.parse_job import IrisParseEngineOptions, IrisParseJobParams


async def main():
    # Initialize client and project
    dex_client = DexClient(base_url="your-dex-url")
    project = await dex_client.create_project(name="ocr-project")

    # Upload the document and start an IRIS parse job with default options
    dex_file = await project.upload_file("document.pdf")
    parse_job = await dex_file.start_parse_job(
        IrisParseJobParams(options=IrisParseEngineOptions())
    )

    # Fetch the parse result
    result = await parse_job.get_parse_result()
    return result


# The SDK calls are awaited, so run them inside an event loop
asyncio.run(main())

Language Support

IRIS supports multi-language OCR with models optimized for different language families. Specialized support includes:
  • Arabic: Arabic Nougat models, QAARI, MBZUAI-AIN, Gemini
  • Multi-language: EasyOCR, Tesseract, PaddleOCR with broad language coverage
  • Universal: Vision-language models (GPT-4o, Gemini, Qwen) supporting many languages
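When experimenting, it can help to turn the groupings above into an ordered list of candidates to try for a given language. The sketch below uses the display names from the list rather than real engine identifiers, and both the `candidate_models` helper and the "specialized first" ordering are assumptions.

```python
from typing import Dict, List

# Display names taken from the groupings above, not configuration identifiers
SPECIALIZED: Dict[str, List[str]] = {
    "arabic": ["Arabic Nougat", "QAARI", "MBZUAI-AIN", "Gemini"],
}
MULTI_LANGUAGE = ["EasyOCR", "Tesseract", "PaddleOCR"]
UNIVERSAL = ["GPT-4o", "Gemini", "Qwen"]


def candidate_models(language: str) -> List[str]:
    """List models to try: specialized, then multi-language, then universal."""
    ordered: List[str] = []
    pool = SPECIALIZED.get(language.lower(), []) + MULTI_LANGUAGE + UNIVERSAL
    for name in pool:
        if name not in ordered:  # drop duplicates such as Gemini
            ordered.append(name)
    return ordered


print(candidate_models("Arabic"))
print(candidate_models("French"))
```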

Key Advantages

Extensibility

Designed for easy extension. Add support for new OCR libraries, layout detectors, or assembly formats through simple adapter classes.

Transparency

Inspect every pipeline stage to understand model behavior, debug issues, and optimize results for your specific documents.

Flexibility

Mix and match components—use different OCR models for text vs. tables, choose layout detectors based on document type, and customize assembly output.

API Integration

Vision-language models (Gemini, GPT-4o) are called through LiteLLM, which provides unified API access across providers.

Next Steps

To start using IRIS:
  1. Review the Getting Started with IRIS guide for step-by-step instructions
  2. Learn about IRIS’s multi-stage OCR pipeline and configuration options
  3. Integrate IRIS into your Dex-based document processing workflows
IRIS provides the building blocks for robust document OCR while giving you complete control over the processing pipeline.