IRIS v2 Chunk Structure
IRIS v2 always returns a chunks array in the parse result. chunk_mode: disabled is a Reducto-only concept (soon to be deprecated) that does not exist in IRIS v2.
Chunk count and size are driven by how IRIS segments each page, not by a separate chunking enum.
| Control | Values | Effect on chunks |
|---|---|---|
| layout | "rt_detr_bce" (default) | One chunk per detected layout region (text, table, image, etc.) |
| layout | "whole_page" | One region per page → one chunk per page |
| e2e_ocr + e2e_response_parser | e.g. DeepSeek OCR2 | Full-page e2e OCR; parser defines region/box structure |
| Layout filters | confidence_threshold, containment_threshold, strict_containment_filter, table/image thresholds | Fewer or more regions → fewer or more chunks |
| img_method | "description", "base64", "skip" | Whether image regions become chunks (skip omits image regions) |
Set these controls on IrisParseEngineOptions inside IrisParseJobParams. They are not the same as Dex’s four rechunk strategies below; they control IRIS’s native output only.
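As a rough illustration of how the controls in the table combine, the sketch below models them as a plain Python dictionary and checks the two enumerated fields against the values listed above. The dictionary shape, the helper name validate_iris_options, and the threshold value 0.8 are assumptions made for illustration. The real IrisParseEngineOptions schema may differ, so treat the Parse reference as the source of truth.

```python
# Illustrative only: a plain dict mirroring the controls in the table above.
# This is NOT the SDK's IrisParseEngineOptions class; the shape is assumed.
ALLOWED_LAYOUTS = {"rt_detr_bce", "whole_page"}
ALLOWED_IMG_METHODS = {"description", "base64", "skip"}


def validate_iris_options(options: dict) -> dict:
    """Return a copy of options with the default layout filled in, or raise ValueError."""
    merged = {"layout": "rt_detr_bce", **options}  # rt_detr_bce is the documented default
    if merged["layout"] not in ALLOWED_LAYOUTS:
        raise ValueError(f"unknown layout: {merged['layout']!r}")
    img_method = merged.get("img_method")
    if img_method is not None and img_method not in ALLOWED_IMG_METHODS:
        raise ValueError(f"unknown img_method: {img_method!r}")
    return merged


# One chunk per page, with image regions omitted from the output.
page_level = validate_iris_options({"layout": "whole_page", "img_method": "skip"})

# Default layout with a stricter (hypothetical) confidence threshold,
# which should yield fewer regions and therefore fewer chunks.
fewer_regions = validate_iris_options({"confidence_threshold": 0.8})
```

Validating locally like this only catches typos in the enumerated values; it says nothing about how the engine interprets the thresholds.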
Parse with IRIS v2
Dex Chunking Strategies
Dex offers four chunking strategies that work on any parse result, including IRIS v2 via rechunking. Use these when you need consistent, configurable chunk boundaries across documents or for embeddings/RAG.
| Strategy | Description | Best For | Embedding Suitability |
|---|---|---|---|
| token_size | Splits by token count using a tokenizer (e.g., tiktoken) | LLM APIs with token limits, cost optimization | Excellent |
| recursive | Recursively splits using separators (paragraphs → sentences → words) | Articles, documentation, RAG systems | Excellent |
| by_page | Splits by page boundaries, grouping complete pages | Legal documents, forms, reports | May be large |
| by_section | Splits by section headers (e.g., markdown #, ##, ###) | Technical manuals, wikis, academic papers | Good |
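To make the recursive row concrete, here is a minimal sketch of the general technique: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. It is a conceptual illustration, not Dex's implementation. The function name recursive_split, the character-based size limit, and the separator list are all assumptions.

```python
def recursive_split(text: str, max_chars: int, separators=("\n\n", ". ", " ")) -> list[str]:
    """Split text into pieces of at most max_chars, trying coarse separators first."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = f"{current}{sep}{part}" if current else part
        if len(candidate) <= max_chars:
            current = candidate  # keep packing parts into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(part) <= max_chars:
            current = part
        else:
            # The part alone is too big: recurse with the finer separators.
            chunks.extend(recursive_split(part, max_chars, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "Para one is short.\n\nPara two is a bit longer. It has two sentences."
pieces = recursive_split(sample, max_chars=30)
```

In this sketch a separator that falls on a chunk boundary is dropped, so the pieces are not guaranteed to concatenate back into the exact original text.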
Parse Once, Rechunk as Needed
With IRIS v2, you parse without Dex rechunking (omit chunking_options), then apply Dex strategies to the parse result. This is the recommended pattern: one parse, many chunking experiments.
Async Rechunking
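The asynchronous flow comes down to starting a job and checking its status until it reaches a terminal state. The sketch below shows a generic polling loop with a timeout. It does not call the Dex SDK: get_status is a hypothetical callable you would wire to the real status request, and the status names "completed" and "failed" are assumptions.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns a terminal state or the timeout elapses."""
    terminal = {"completed", "failed"}  # assumed status names
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Stand-in for a real status call: reports "running" twice, then "completed".
_fake_statuses = iter(["running", "running", "completed"])
final_status = poll_until_done(lambda: next(_fake_statuses), interval_s=0.0)
```

Using time.monotonic for the deadline keeps the timeout correct even if the system clock is adjusted while the job runs.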
For long-running documents, start the rechunk job and poll for completion.
Chunking Decision Tree
IRIS v2 native (IrisParseEngineOptions):
- Many small, layout-aware chunks → layout="rt_detr_bce" (default)
- One chunk per page → layout="whole_page"
- Full-page e2e model → e2e_ocr + e2e_response_parser
- Fewer regions (fewer chunks) → raise confidence thresholds or tighten containment filters
Use token_size when:
- Working with LLM APIs that have token limits
- Embedding models with specific token limits
- Cost optimization
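The idea behind token_size can be sketched as fixed-size windows over a token sequence. The example below uses whitespace splitting as a stand-in tokenizer so that it runs without extra dependencies. The function name chunk_by_tokens and its parameters are assumptions, and Dex's actual strategy may handle overlap or boundaries differently.

```python
from typing import Callable


def chunk_by_tokens(
    text: str,
    max_tokens: int,
    tokenize: Callable[[str], list] = str.split,       # stand-in: whitespace tokens
    detokenize: Callable[[list], str] = " ".join,      # stand-in: rejoin with spaces
) -> list[str]:
    """Group tokens into consecutive windows of at most max_tokens each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = tokenize(text)
    return [
        detokenize(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


token_chunks = chunk_by_tokens("one two three four five six seven", max_tokens=3)
```

To respect a model's real limit, pass that model's tokenizer encode and decode functions in place of the whitespace stand-ins, since whitespace words and model tokens rarely line up one to one.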
Use recursive when:
- General document chunking for RAG
- Preserving paragraphs and sentences
- Articles, blog posts, documentation
Use by_page when:
- Legal documents, forms, reports
- Page references matter
- Page structure should be preserved
Use by_section when:
- Documents have clear section headers
- Technical manuals, wikis, academic papers
- Semantic coherence within topics
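Section-based splitting can be illustrated with a few lines of standard-library Python that cut a markdown string at #, ##, and ### headers. This is a conceptual sketch rather than Dex's by_section implementation. The name split_by_section and the choice of three header levels are assumptions, and the regex would also match header-like lines inside fenced code.

```python
import re

# Matches a markdown header of level 1 to 3 at the start of a line.
HEADER_RE = re.compile(r"^#{1,3} ", re.MULTILINE)


def split_by_section(markdown: str) -> list[str]:
    """Return one chunk per header-delimited section, keeping any preamble."""
    starts = [m.start() for m in HEADER_RE.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # text before the first header becomes its own chunk
    bounds = starts + [len(markdown)]
    sections = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]


doc = "Intro text\n# A\nalpha\n## B\nbeta\n# C\ngamma"
sections = split_by_section(doc)
```

Each chunk keeps its own header line, which helps downstream retrieval show where a passage came from.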
Reducto Parse-Time Chunking (Legacy)
When using the Reducto parse engine (soon to be deprecated), you can chunk during the initial parse instead of rechunking. Reducto’s methods are layout-aware and use document structure.
| Method | Chunk Size | Best For | Embedding | Location Tracking |
|---|---|---|---|---|
| VARIABLE | Auto (optimal) | General use, embeddings | Excellent | Good |
| BLOCK | Small (~100-500 chars) | Precise locations, UI overlays | Too small | Excellent |
| SECTION | Medium (~1000-3000 chars) | Structured documents | Good | Good |
| PAGE | Large (full page) | Page-oriented docs | May be large | Excellent |
| PAGE_SECTIONS | Medium-Large | Hybrid needs | Good | Good |
| DISABLED | Very large (entire doc) | Special cases | Too large | Excellent |
Use VARIABLE for most cases, especially with embeddings.
Pattern: Retry with Different Chunking
Reducto-only decision tree (Legacy)
Use Reducto VARIABLE when:
- Still on Reducto and want layout-aware chunking at parse time
- General document processing with parser-optimal chunk sizes
Use BLOCK when:
- Need precise bounding box information
- Building UI overlays on documents
- Not using for embeddings
Use DISABLED when:
- You plan to rechunk with Dex strategies (same pattern as IRIS v2)
Next Steps
- Vector Stores: Add chunks to vector stores for semantic search
- Extract: Extract structured data from parse results
- Parse: Parse engine options and configuration (IRIS v2)

