# DEX model engine choices

> A guide to the DEX model engine options

DEX is the SDK that runs document parsing, extraction, retrieval and research flows. There are currently two options for the underlying parsing engine.

1. *IRIS* - Currently on V2, IRIS is Scale's proprietary OCR and document extraction model, with the best performance across both Arabic and English texts. It is highly customisable: different models are available for the layout and parsing steps, and later versions offer high parallelism across pages or sections.
2. *REDUCTO* - Currently being deprecated. Reducto is a third-party provider that offers end-to-end parsing of a wide range of documents, without the performance and customisation available with IRIS V2.

## Quick Decision Guide

### Use Iris (Default engine)

**Iris (V2) is the recommended default DEX engine for all production use cases:**

* You can **maximize accuracy** by testing any OCR model uploaded to SGP
* You can **minimize latency** through specialized optimization
* You need **greater transparency** of the processes within Temporal
* You can use **specialized models** that are either open source or have been built specifically for unique document types
* You can **customise and control pipelines** for layout detection, OCR, and assembly

### Use Reducto (Soon to be Deprecated)

**Reducto is the legacy DEX OCR engine. It may still fit existing production use cases where:**

* You can accept a **third-party service** for OCR and parsing integrated into your application
* You need **proven stability** and do not need future functionality or updates from the DEX team
* You're doing **batch processing** where somewhat high latency is acceptable
* You want **automatic scaling** with your workload
* You're processing **English or Germanic language documents**
* You need a **low-complexity, managed solution**
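The criteria above can be condensed into a small decision helper. This is only an illustrative sketch: `Workload`, `choose_engine`, and the `REDUCTO_LANGUAGES` set are hypothetical names invented for this example and are not part of the DEX SDK.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a parsing workload (not a DEX type)."""

    languages: frozenset[str]
    needs_pipeline_control: bool = False
    needs_custom_models: bool = False
    latency_sensitive: bool = False
    existing_reducto_integration: bool = False


# The guide scopes Reducto to "English or Germanic language documents".
# The exact language codes listed here are an assumption for illustration.
REDUCTO_LANGUAGES = frozenset({"en", "de", "nl", "sv", "da", "no"})


def choose_engine(workload: Workload) -> str:
    """Return "iris" or "reducto" following the criteria in this guide."""
    needs_iris = (
        workload.needs_pipeline_control
        or workload.needs_custom_models
        or workload.latency_sensitive
        or not workload.languages <= REDUCTO_LANGUAGES
    )
    if needs_iris:
        return "iris"
    # Reducto is being deprecated, so it is only kept for workloads
    # that already depend on it; everything else defaults to Iris.
    if workload.existing_reducto_integration:
        return "reducto"
    return "iris"


arabic_invoices = Workload(languages=frozenset({"ar", "en"}))
legacy_batch = Workload(
    languages=frozenset({"en"}), existing_reducto_integration=True
)
print(choose_engine(arabic_invoices))
print(choose_engine(legacy_batch))
```

The helper defaults to Iris and returns Reducto only when nothing in the workload calls for Iris and an existing integration is already in place, which mirrors the guide's "recommended default" versus "legacy" framing.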

## Understanding the Relationship Between Iris and Dex

**Iris and Dex are not mutually exclusive**—they're complementary. Dex is a document understanding platform that provides primitives for file management, parsing, vector stores, and data extraction. Iris is one of several OCR engines available within Dex.

Think of it this way: *Dex is the platform, Iris is one of the engines*. When you use Dex, you choose which OCR engine to use for parsing:

* **Iris**: Better accuracy & latency for all languages including Arabic & English
* **Custom engines**: Integrate your own OCR solution
* **Reducto**: Legacy engine (soon to be deprecated)

## What is Iris?

Iris is Scale's OCR capability that provides a flexible, modular pipeline for extracting text from documents. It's designed for teams with custom OCR needs who want complete control over the processing pipeline.

**Iris offers:**

* **Customisable OCR models**: Tesseract, EasyOCR, PaddleOCR, Surya, GPT-5.4, Gemini, and any other specialized models uploaded to SGP
* **Complete pipeline control**: Configure layout detection, OCR processing, and assembly separately
* **Inspection capabilities**: Save and review intermediate results at each processing stage
* **Extensibility**: Add custom OCR models through SGP or layout detectors through simple adapters
* **Optimization flexibility**: Fine-tune for maximum accuracy or minimum latency based on your needs
* **Custom pipeline experimentation**: Test different combinations of layout detectors and OCR models
* **Specialized model development**: Build custom OCR for unique document types or languages
* **Performance optimization**: Tune for specific accuracy or latency requirements
* **Non-Germanic language optimization**: Experiment with models to find the best accuracy for other languages, such as Arabic
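To make the idea of separately configurable stages concrete, the following is a minimal sketch of a three-stage pipeline (layout detection, OCR, assembly) with swappable components, saved intermediate results, and page-level parallelism. It does not use the Iris API: `Pipeline`, the stage signatures, and the toy stage functions are all assumptions made for illustration.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stage signatures; the real Iris interfaces may differ.
LayoutDetector = Callable[[str], "list[str]"]  # page -> regions
OcrModel = Callable[[str], str]                # region -> text
Assembler = Callable[["list[str]"], str]       # region texts -> page text


@dataclass
class Pipeline:
    detect_layout: LayoutDetector
    run_ocr: OcrModel
    assemble: Assembler
    # Intermediate results per page index, kept for later inspection.
    trace: dict = field(default_factory=dict)

    def parse_page(self, index: int, page: str) -> str:
        regions = self.detect_layout(page)
        texts = [self.run_ocr(region) for region in regions]
        self.trace[index] = {"regions": regions, "texts": texts}
        return self.assemble(texts)

    def parse(self, pages: list[str], workers: int = 4) -> list[str]:
        # Pages are independent of each other, so they can run in parallel.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self.parse_page, range(len(pages)), pages))


# Toy stand-ins for a layout detector, an OCR model, and an assembler.
pipeline = Pipeline(
    detect_layout=lambda page: page.split("|"),
    run_ocr=lambda region: region.strip().upper(),
    assemble=lambda texts: "\n".join(texts),
)
pages = pipeline.parse(["title | body", "footer"])
```

Because each stage is just a callable, swapping in a different OCR model or layout detector means replacing one argument, and `pipeline.trace` holds the per-page regions and raw texts for review after a run.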

## What is Dex?

Dex is Scale's document understanding platform—a production-ready service that transforms unstructured documents into actionable, structured data. It provides:

* **File Management**: Secure upload, storage, and retrieval with access control
* **Document Parsing**: Convert documents (PDF, DOCX, images) into structured JSON using multiple OCR engines
* **Vector Stores**: Index and search parsed documents with semantic embeddings
* **Data Extraction**: Extract information using custom schemas, prompts, and RAG-enhanced context
* **Project Management**: Organize and isolate data with proper authorization
* **Automatic scaling** with your workload
* **Multiple OCR engine options**: Iris (Recommended default, V2) & Reducto (soon to be deprecated)

## Feature Comparison

| Feature              | Iris (✅ Recommended)                                                                                                    | Reducto (⚠️ Soon to be Deprecated)                                 |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ |
| **Engine**           | Scale proprietary OCR (V2)                                                                                              | Third-party end-to-end parsing                                     |
| **Best For**         | Custom OCR control, specialized requirements                                                                            | Low-complexity managed solution, legacy production                 |
| **Performance**      | Best accuracy and latency across Arabic and English; highly customizable                                                | Wide document support without Iris V2 performance or customization |
| **OCR Models**       | Customisable OCR models (e.g. Tesseract, EasyOCR, PaddleOCR, Surya, GPT-5.4, Gemini, and more) through SGP model engine | Managed Reducto engine                                             |
| **Pipeline Control** | Configure layout detection and OCR as needed                                                                            | Managed end-to-end                                                 |
| **Languages**        | All languages, including Arabic                                                                                         | English or Germanic documents                                      |
| **Latency**          | Minimizable through high parallelism (V2)                                                                               | Somewhat high (acceptable for batch)                               |
| **Transparency**     | Greater visibility into processes within Temporal                                                                       | Third-party service integrated into your application               |
| **Future updates**   | Active development from the DEX team                                                                                    | No future functionality or updates from the DEX team               |
| **Typical use case** | Maximize accuracy, minimize latency, specialized or custom models                                                       | Batch processing where higher latency is acceptable                |

## Recommended Workflow

### For Standard Production Applications

**Use Iris as the DEX default model engine:**

1. **SOTA performance**: Accuracy optimization, latency minimization, or specialized models
2. **Customisability**: Test any OCR model hosted in SGP and pipeline configurations
3. **Pipeline control**: Configure layout detection and OCR as separate steps
4. **Language coverage**: Best accuracy and latency for Arabic, English, and other non-Germanic languages
5. **Transparency**: Inspect and debug processing stages within Temporal
6. **Extensibility**: Add custom OCR models or layout detectors through adapters
7. **Future-ready**: Benefit from ongoing development and Iris V2 parallelism improvements from the DEX team

**Use Reducto within Dex** for reliable, production-ready document processing:

1. **Managed infrastructure**: Rely on managed infrastructure for operational simplicity
2. **Language fit**: Process English or Germanic-language documents where Iris customization is not required
3. **Low complexity**: Use a third-party, end-to-end managed parsing service with minimal pipeline setup

## Getting Started

### Dex Documentation

* [Introduction to Dex](/docs/capabilities/document-understanding/introduction-to-dex)
* [Getting Started with Dex](/docs/capabilities/document-understanding/getting-started-with-dex)

### Iris Documentation

* [Introduction to Iris](/docs/capabilities/ocr/introduction-to-iris)
* [Getting Started with Iris](/docs/capabilities/ocr/getting-started-with-iris)

## Summary

* **Dex is the platform**: DEX runs parsing, extraction, retrieval, and research flows—you choose the underlying engine when you parse documents.
* **Iris (V2) is the recommended default**: Scale’s proprietary engine with the best Arabic and English performance, customisable OCR models, and full control over layout, OCR, and assembly.
* **Choose Iris when you need customization**: Optimize accuracy or latency, inspect pipelines in Temporal, use specialized or custom models, and support non-Germanic languages.
* **Reducto is legacy and being deprecated**: A third-party, managed end-to-end option with no future DEX updates—use only for existing low-complexity or English/Germanic batch workloads.
* **Reducto still fits narrow cases**: Proven stability, auto-scaling, and acceptable batch latency when you do not need Iris V2 performance or pipeline control.

Need help deciding? Contact the Dex team at `#dex-help` on Slack.
