Dex Chunking Strategies
Dex offers four chunking strategies that work with any parse result via rechunking. Use these when you need consistent, configurable chunk boundaries across different parse engines.

| Strategy | Description | Best For | Embedding Suitability |
|---|---|---|---|
| token_size | Splits by token count using a tokenizer (e.g., tiktoken) | LLM APIs with token limits, cost optimization | Excellent |
| recursive | Recursively splits using separators (paragraphs → sentences → words) | Articles, documentation, RAG systems | Excellent |
| by_page | Splits by page boundaries, grouping complete pages | Legal documents, forms, reports | May be large |
| by_section | Splits by section headers (e.g., markdown #, ##, ###) | Technical manuals, wikis, academic papers | Good |
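
To make the recursive row concrete, here is a minimal sketch of separator-based recursive splitting (paragraphs, then sentences, then words). It is an illustration of the general technique, not Dex's implementation: the function name `recursive_split`, the separator list, and the character-based size limit are all assumptions made for this example.

```python
# Illustrative only: separators and the character limit are assumptions,
# not Dex's actual configuration.
SEPARATORS = ["\n\n", ". ", " "]  # paragraphs, then sentences, then words


def recursive_split(text, max_chars, separators=SEPARATORS):
    """Split text into pieces of at most max_chars, preferring coarse separators."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separator left to try: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            current = candidate  # keep merging parts while they fit
            continue
        if current:
            chunks.append(current)
        if len(part) <= max_chars:
            current = part
        else:
            # This part alone is too large: retry it with finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "Alpha beta gamma. Delta epsilon.\n\nZeta eta theta iota kappa lambda."
pieces = recursive_split(sample, max_chars=20)
```

Note that this simple version drops the separator at each split point (for example, the period after "gamma"); a production splitter would usually keep it.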
Parse Once, Rechunk as Needed
Parse with disabled chunking (Reducto only) to get raw blocks, then apply Dex chunking strategies. This lets you experiment with different strategies without re-parsing.
Async Rechunking
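
The start-then-poll flow for a long-running rechunk job can be sketched generically. This is not the Dex client API: `wait_for_job`, the `get_status` callable, and the status strings `"completed"` and `"failed"` are hypothetical stand-ins for whatever your client exposes.

```python
import time


def wait_for_job(get_status, poll_interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports a terminal state or the timeout passes.

    get_status is a hypothetical zero-argument callable returning a status
    string; the terminal status names below are assumptions.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(poll_interval)


# Usage with a fake job that finishes on the third status check.
fake_states = iter(["pending", "running", "completed"])
naps = []
result = wait_for_job(lambda: next(fake_states), poll_interval=0.5, sleep=naps.append)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.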
For long-running documents, start the rechunk job and poll for completion.
Chunking with Iris
The Iris parse engine supports Dex chunking strategies via chunking_options in IrisParseJobParams, or by rechunking after the parse.
Option 1: Configure Chunking in Parse Job
Pass chunking_options when creating the parse job. Chunking is applied automatically after Iris parses the document.
Option 2: Rechunk After Parsing
Parse first, then rechunk to experiment with different strategies without re-parsing.
Reducto Chunking (Parse-Time)
When using the Reducto parse engine, you can chunk during the initial parse instead of rechunking. Reducto’s methods are layout-aware and use document structure.

| Method | Chunk Size | Best For | Embedding | Location Tracking |
|---|---|---|---|---|
| VARIABLE | Auto (optimal) | General use, embeddings | Excellent | Good |
| BLOCK | Small (~100-500 chars) | Precise locations, UI overlays | Too small | Excellent |
| SECTION | Medium (~1000-3000 chars) | Structured documents | Good | Good |
| PAGE | Large (full page) | Page-oriented docs | May be large | Excellent |
| PAGE_SECTIONS | Medium-Large | Hybrid needs | Good | Good |
| DISABLED | Very large (entire doc) | Special cases | Too large | Excellent |
Use VARIABLE for most cases, especially with embeddings.
Pattern: Retry with Different Chunking
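One way to read this pattern: try a preferred chunking method first, check the resulting chunks, and fall back to the next method if they are unsuitable. The sketch below shows that loop in isolation. The `parse` callable, the `is_acceptable` check, the method order, and the 3000-character threshold are all hypothetical; substitute your real parse call and quality criteria.

```python
def chunk_with_fallback(parse, methods, is_acceptable):
    """Try each chunking method in order; return the first acceptable result.

    parse is a hypothetical callable taking a method name and returning a
    list of chunk strings; is_acceptable decides whether to keep them.
    """
    tried = []
    for method in methods:
        chunks = parse(method)
        tried.append(method)
        if is_acceptable(chunks):
            return method, chunks
    raise ValueError(f"no acceptable chunking among: {tried}")


# Stand-in parser that fabricates chunks of fixed sizes per method,
# purely to exercise the loop.
def fake_parse(method):
    sizes = {"VARIABLE": [5000, 4000], "SECTION": [1800, 2200], "BLOCK": [200] * 20}
    return ["x" * n for n in sizes[method]]


chosen, chunks = chunk_with_fallback(
    fake_parse,
    ["VARIABLE", "SECTION", "BLOCK"],
    is_acceptable=lambda cs: all(len(c) <= 3000 for c in cs),
)
```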
Chunking Decision Tree
Use Dex token_size when:
- Working with LLM APIs that have token limits
- Embedding models with specific token limits
- Cost optimization
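
As a rough illustration of token-count chunking, the sketch below windows a token list into fixed-size groups. It uses whitespace splitting as a stand-in tokenizer so the example runs with no dependencies; a real setup would plug in a model tokenizer such as tiktoken. The function name and parameters are assumptions for this example, not Dex options.

```python
def chunk_by_token_count(text, max_tokens, tokenize=str.split, detokenize=" ".join):
    """Group tokens into consecutive windows of at most max_tokens.

    tokenize/detokenize default to whitespace handling as a placeholder
    for a real tokenizer's encode/decode functions.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = tokenize(text)
    return [
        detokenize(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


windows = chunk_by_token_count("a b c d e f g", max_tokens=3)
```

Because every chunk has a known upper bound in tokens, per-request cost and context-window usage are predictable.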
Use Dex recursive when:
- General document chunking for RAG
- Preserving paragraphs and sentences
- Articles, blog posts, documentation
Use Dex by_page when:
- Legal documents, forms, reports
- Page references matter
- Page structure should be preserved
Use Dex by_section when:
- Documents have clear section headers
- Technical manuals, wikis, academic papers
- Semantic coherence within topics
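
For intuition, header-based splitting of markdown can be sketched as below. This is a simplified illustration, not Dex's implementation: it recognizes only #, ##, and ### headers at the start of a line, does not skip headers inside fenced code blocks, and the returned dictionary shape is an assumption for this example.

```python
import re

# Matches markdown headers of level 1 to 3, e.g. "## Options".
HEADER = re.compile(r"^(#{1,3})\s+(.*)$")


def split_by_section(markdown):
    """Return one {"title", "text"} dict per header-delimited section."""
    sections = []
    title, lines = None, []

    def flush():
        # Emit the section collected so far, skipping an empty preamble.
        if title is not None or any(line.strip() for line in lines):
            sections.append({"title": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections


doc = "Intro text\n# Setup\nInstall it.\n## Options\nUse flags.\n"
sections = split_by_section(doc)
```

Text before the first header is kept as a section with `title` set to `None`, so no content is lost.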
Use Reducto VARIABLE when:
- You are parsing with Reducto and want layout-aware chunking
- General document processing
- You want optimal chunk sizes from the parser
Use Reducto BLOCK when:
- Need precise bounding box information
- Building UI overlays on documents
- Not using for embeddings
Use Reducto DISABLED when:
- You plan to rechunk with Dex strategies
- Parse once, experiment with multiple chunking approaches
Which Chunking Method?
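The criteria in the decision tree can be condensed into a small helper. The sketch below is one possible encoding: the function name, the flag names, and especially the order in which conditions are checked are assumptions made for illustration, since the guidance above does not rank the criteria against each other.

```python
def suggest_chunking(*, parse_time_reducto=False, will_rechunk=False,
                     needs_bounding_boxes=False, has_token_limit=False,
                     page_references_matter=False, has_section_headers=False):
    """Map a few yes/no answers to a chunking method name from this page.

    The precedence of the checks is an assumption, not documented behavior.
    """
    if parse_time_reducto:
        if will_rechunk:
            return "DISABLED"   # parse once, rechunk later with Dex strategies
        if needs_bounding_boxes:
            return "BLOCK"      # precise locations, UI overlays
        return "VARIABLE"       # layout-aware default
    if has_token_limit:
        return "token_size"
    if page_references_matter:
        return "by_page"
    if has_section_headers:
        return "by_section"
    return "recursive"          # general-purpose RAG chunking


choice = suggest_chunking(has_section_headers=True)
```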
Next Steps
- Vector Stores: Add chunks to vector stores for semantic search
- Extract: Extract structured data from parse results
- Parse: Parse engine options and configuration

