Why Use IRIS?
Document processing often requires more than simple text extraction: you need to understand document structure, handle different content types, and preserve the relationships between elements. IRIS solves these challenges:
- Layout-Aware Processing: Automatically detects and classifies different regions (text, tables, images, etc.) before applying specialized OCR
- Multiple OCR Engines: Choose from 15+ OCR models including Tesseract, EasyOCR, PaddleOCR, Surya, and advanced vision-language models like GPT-4o and Gemini
- Table-Specific Processing: Apply specialized OCR models optimized for tabular data extraction
- Flexible Architecture: Easily extend with custom OCR models or layout detectors
- Multi-Language Support: Handle documents in multiple languages including Arabic with specialized support
- Complete Pipeline Control: Inspect and configure each step—layout detection, content extraction, and assembly
How IRIS Works
IRIS follows a three-stage pipeline:
1. Layout Detection
Detects and classifies regions in document images with bounding boxes and content types:
- Text regions for paragraphs and body content
- Table regions for structured tabular data
- Image regions for figures and diagrams
- Other content types as needed
Available layout detectors:
- rt_detr_bce: RT-DETR model (best performance)
- surya_bce: Surya layout detection
- yolo_bce: YOLO-based detection
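To make the output of this stage concrete, here is a minimal sketch of what a set of detected regions might look like. The `Region` class, its field names, and the `reading_order` helper are hypothetical illustrations, not the IRIS data model; the top-to-bottom ordering is a naive assumption that would not hold for multi-column pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical container: a content type plus a pixel bounding box."""
    kind: str      # e.g. "text", "table", "image"
    x0: int
    y0: int
    x1: int
    y1: int
    score: float = 1.0  # detector confidence


def reading_order(regions):
    """Sort regions top-to-bottom, then left-to-right (naive ordering)."""
    return sorted(regions, key=lambda r: (r.y0, r.x0))


# Example detections for a single page, in arbitrary detector order.
detected = [
    Region("table", 50, 400, 550, 700, 0.91),
    Region("text", 50, 40, 550, 380, 0.97),
    Region("image", 50, 720, 300, 900, 0.88),
]

ordered_kinds = [r.kind for r in reading_order(detected)]
```

Whatever the detector, the later stages only need each region's content type and its box, which is why the detectors listed above can be treated as interchangeable.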
2. OCR Processing
Applies specialized OCR models to each detected region based on content type:
- Text OCR: Optimized for continuous text (paragraphs, headings, etc.)
- Table OCR: Specialized models for accurate table extraction
- Image Handling: Preserves image crops for inclusion in output
Available OCR models:
- Open-source models: EasyOCR, Tesseract, PaddleOCR, Surya
- Vision-language models: GPT-4o, Gemini (via LiteLLM), Qwen2-VL, Qwen2.5-VL
- Specialized models: Arabic Nougat (small/base/large), QAARI, MBZUAI-AIN
- Cloud services: Azure Computer Vision
- Advanced models: SmolDocling
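The idea of routing each region to a handler chosen by content type can be sketched as a simple dispatch table. Everything here is an assumption for illustration: the handler functions are stand-ins that only format a string rather than calling a real OCR engine, and falling back to text OCR for unknown region types is a choice made for this sketch, not documented IRIS behavior.

```python
from typing import Callable, Dict


# Stand-in handlers: each takes a path to a region crop and returns content.
def fake_text_ocr(crop: str) -> str:
    return f"text({crop})"


def fake_table_ocr(crop: str) -> str:
    return f"table({crop})"


def keep_image(crop: str) -> str:
    # Images are not OCR'd; the crop is referenced in the output instead.
    return f"![figure]({crop})"


# Map each content type to the handler responsible for it.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "text": fake_text_ocr,
    "table": fake_table_ocr,
    "image": keep_image,
}


def process_region(kind: str, crop: str) -> str:
    """Route a region to its handler; unknown kinds fall back to text OCR."""
    handler = HANDLERS.get(kind, fake_text_ocr)
    return handler(crop)


regions = [("text", "r0.png"), ("table", "r1.png"),
           ("image", "r2.png"), ("caption", "r3.png")]
results = [process_region(kind, crop) for kind, crop in regions]
```

Swapping the table model for a different one then amounts to replacing a single entry in the table, which is the kind of per-content-type choice described above.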
3. Assembly
Combines extracted content back into useful formats:
- MarkdownAssembler: Converts to structured markdown (default)
- JSONAssembler: Outputs detailed JSON for debugging and inspection
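The difference between the two output styles can be illustrated with a small sketch. The functions `assemble_markdown` and `assemble_json` and the item fields below are hypothetical; they are not the MarkdownAssembler or JSONAssembler classes themselves, whose interfaces are not shown here.

```python
import json


def assemble_markdown(items):
    """Join per-region content in order, separated by blank lines."""
    return "\n\n".join(item["content"] for item in items)


def assemble_json(items):
    """Keep every field (type, box, content) for inspection."""
    return json.dumps({"regions": items}, indent=2)


# Per-region results, already in reading order (assumed structure).
items = [
    {"kind": "text", "bbox": [50, 40, 550, 380],
     "content": "# Quarterly Report"},
    {"kind": "table", "bbox": [50, 400, 550, 700],
     "content": "| Q | Revenue |\n|---|---|\n| Q1 | 10 |"},
]

markdown = assemble_markdown(items)
as_json = assemble_json(items)
```

The markdown form drops the bounding boxes and keeps only readable content, while the JSON form retains the region metadata, which is what makes it the more useful of the two for debugging.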
Core Features
Modular Architecture
Each pipeline component (layout detector, OCR model, assembler) is independently configurable and extensible. Add custom models by implementing simple adapter interfaces.
Inspection and Debugging
Save intermediate results at each pipeline stage to understand and optimize the processing:
- Layout detection bounding boxes and classifications
- Individual OCR outputs per region
- Final assembled output
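One way to picture this is a directory with one file per stage. The sketch below is an assumption about how such a dump could be organized; the `save_stage` helper, the file names, and the payload shapes are hypothetical and do not reflect the layout IRIS actually writes.

```python
import json
import tempfile
from pathlib import Path


def save_stage(out_dir: Path, stage: str, payload) -> Path:
    """Write one stage's output as pretty-printed JSON and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stage}.json"
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


# Illustrative payloads for the three stages of a single page.
layout = [{"kind": "text", "bbox": [50, 40, 550, 380]}]
ocr = [{"region": 0, "content": "Hello"}]
final = {"markdown": "Hello"}

debug_dir = Path(tempfile.mkdtemp()) / "page_001"
saved = [
    save_stage(debug_dir, name, data)
    for name, data in [("01_layout", layout), ("02_ocr", ocr), ("03_assembled", final)]
]
```

Numbering the files by stage keeps them sorted in pipeline order, so a bad final output can be traced back to the first stage where the data looks wrong.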
Configuration Management
Use command-line arguments or YAML configuration files for reproducible processing.
Performance Optimization
The first use of a model downloads its weights; subsequent runs load the cached weights and start faster.
Common Use Cases
- Document Digitization: Convert scanned documents and PDFs to searchable text
- Form Processing: Extract structured data from forms with mixed content types
- Table Extraction: Accurately capture tabular data from complex documents
- Multi-Language Documents: Process documents in various languages including Arabic
- Research and Analysis: Extract text and data from academic papers, reports, and technical documents
- Archive Processing: Batch process large document collections
Getting Started
IRIS is integrated with Dex for seamless document processing.
Python Package
Use IRIS programmatically through the Dex SDK.
Language Support
IRIS supports multi-language OCR with models optimized for different language families. Specialized support includes:
- Arabic: Arabic Nougat models, QAARI, MBZUAI-AIN, Gemini
- Multi-language: EasyOCR, Tesseract, PaddleOCR with broad language coverage
- Universal: Vision-language models (GPT-4o, Gemini, Qwen) supporting many languages
Key Advantages
Extensibility
Designed for easy extension. Add support for new OCR libraries, layout detectors, or assembly formats through simple adapter classes.
Transparency
Inspect every pipeline stage to understand model behavior, debug issues, and optimize results for your specific documents.
Flexibility
Mix and match components: use different OCR models for text vs. tables, choose layout detectors based on document type, and customize assembly output.
API Integration
Vision-language models (Gemini, GPT-4o) use LiteLLM for unified API access, making it easy to leverage advanced AI capabilities.
Next Steps
To start using IRIS:
- Review the Using IRIS with Dex guide for integration details
- Set up your Dex client and SGP credentials
- Upload documents and start parse jobs
- Configure OCR options for your specific document types
- Integrate IRIS into your document processing workflows

