Documentation Index
Fetch the complete documentation index at: https://docs.gp.scale.com/llms.txt
Use this file to discover all available pages before exploring further.
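If it helps to script the discovery step, the index can be scanned programmatically. The sketch below assumes the file follows the usual llms.txt convention of markdown links, one per line; the function name and the sample snippet are illustrative, not part of any SDK.

```python
import re

def list_index_pages(llms_txt: str) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from markdown links in an llms.txt index."""
    return re.findall(r"\[([^\]]+)\]\((https?://[^)]+)\)", llms_txt)

# Hypothetical snippet in the usual llms.txt shape
sample = """# Dex Docs
- [Parsing](https://docs.gp.scale.com/parsing)
- [Chunking](https://docs.gp.scale.com/chunking)
"""
for title, url in list_index_pages(sample):
    print(f"{title}: {url}")
```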
Convert documents to a structured format for extraction. Dex supports multiple parse engines and async job monitoring.
Parse Document (Default)
```python
parse_result = await dex_file.parse(
    ReductoParseJobParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            )
        ),
    )
)
```
OCR Engine Comparison
| Engine | Best For | Status | Speed | Use When |
|---|---|---|---|---|
| Reducto | Production workloads, batch processing | Production-ready | Somewhat high (batch-friendly) | Standard production use |
| Iris | Custom OCR needs, experimentation | Experimental | Configurable | Custom models, accuracy/latency optimization |
| Azure Vision | Multilingual documents | Production-ready | Medium | Multilingual production use |
Recommendation: Use Reducto for production. See "When to choose Iris?" for custom OCR needs.
Which OCR Engine?
Production use → Reducto
Multilingual → Azure Vision or Reducto
Custom OCR needs → Iris (experimental) - see guide
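The decision list above can be mirrored as a tiny helper. This is purely illustrative — the function and its string return values are not part of the Dex SDK:

```python
def choose_engine(multilingual: bool = False, custom_ocr: bool = False) -> str:
    """Mirror the decision list above (names are illustrative strings)."""
    if custom_ocr:
        return "IRIS"          # experimental; see the Iris guide
    if multilingual:
        return "AZURE_VISION"  # Reducto also handles multilingual docs
    return "REDUCTO"           # production default

print(choose_engine())  # REDUCTO
```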
Async Job Monitoring
New in v0.4.0: Use start_parse_job for better control over async operations and access to SGP traces.
Monitor Parse Job
```python
import asyncio

# Start a parse job (returns immediately)
parse_job = await project.start_parse_job(
    dex_file=dex_file,
    parameters=ReductoParseJobParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            )
        ),
    ),
)

# Monitor job progress
while parse_job.data.status not in [JobStatus.SUCCEEDED, JobStatus.FAILED]:
    await asyncio.sleep(1)
    await parse_job.refresh()
    print(f"Job status: {parse_job.data.status}")

# Get result
if parse_job.data.status == JobStatus.SUCCEEDED:
    parse_result = await parse_job.get_result()
    print("Parse completed successfully")
else:
    print(f"Parse failed: {parse_job.data.error_message}")
```
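For long-running jobs, the fixed one-second polling loop can be hardened with a timeout and exponential backoff. The helper below is a generic asyncio sketch, not a Dex SDK API; it only assumes the job object exposes `refresh()` and `data.status` as shown above:

```python
import asyncio

async def wait_for_job(job, terminal_statuses, *, timeout=300.0,
                       base_delay=1.0, max_delay=30.0):
    """Poll job.refresh() with exponential backoff until a terminal status."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    delay = base_delay
    while job.data.status not in terminal_statuses:
        if loop.time() >= deadline:
            raise TimeoutError(f"job still {job.data.status!r} after {timeout}s")
        await asyncio.sleep(delay)
        delay = min(delay * 2, max_delay)  # back off: 1s, 2s, 4s, ... capped
        await job.refresh()
    return job.data.status

# e.g. status = await wait_for_job(parse_job, {JobStatus.SUCCEEDED, JobStatus.FAILED})
```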
Retrieving SGP Traces for Debugging
```python
import os

from scale_gp_beta import SGPClient

sgp_client = SGPClient(
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)

# Search for job traces
spans = list(sgp_client.spans.search(
    sort_by="created_at",
    sort_order="desc",
    extra_metadata={"job_id": parse_job.data.id},
    parents_only=True,
))

if spans:
    trace_id = spans[0].trace_id
    all_spans = list(sgp_client.spans.search(trace_ids=[trace_id]))
    for span in all_spans:
        print(f"Span: {span.name}, Duration: {span.duration_ms}ms")
```
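When a trace contains many spans, a per-name duration rollup can make hot spots easier to see. This is plain Python over the `name` and `duration_ms` attributes printed above, not an SGP API; the demo spans are hypothetical stand-ins for search results:

```python
from collections import defaultdict
from types import SimpleNamespace

def summarize_spans(spans):
    """Total duration_ms per span name, largest first."""
    totals = defaultdict(float)
    for span in spans:
        totals[span.name] += span.duration_ms or 0.0
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical spans standing in for sgp_client.spans.search(...) results
demo = [
    SimpleNamespace(name="ocr", duration_ms=120.0),
    SimpleNamespace(name="ocr", duration_ms=80.0),
    SimpleNamespace(name="chunking", duration_ms=40.0),
]
print(summarize_spans(demo))  # {'ocr': 200.0, 'chunking': 40.0}
```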
Process Multiple Files
```python
import asyncio

# Parse all files in parallel
parse_tasks = [
    dex_file.parse(
        ReductoParseJobParams(
            engine=ParseEngine.REDUCTO,
            options=ReductoParseEngineOptions(
                chunking=ReductoChunkingOptions(
                    chunk_mode=ReductoChunkingMethod.VARIABLE,
                )
            ),
        )
    )
    for dex_file in dex_files
]
parse_results = await asyncio.gather(*parse_tasks)
print(f"Parsed {len(parse_results)} documents")
```
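One caveat with `asyncio.gather`: by default, the first failed parse raises and the other results are discarded. Passing `return_exceptions=True` keeps every outcome so failures can be inspected per file — sketched here with stand-in coroutines in place of `dex_file.parse`:

```python
import asyncio

# Stand-ins for dex_file.parse(...) — one succeeds, one fails.
async def _parse_ok(name):
    return f"parsed:{name}"

async def _parse_bad(name):
    raise RuntimeError(f"failed:{name}")

async def parse_all():
    results = await asyncio.gather(
        _parse_ok("a.pdf"), _parse_bad("b.pdf"), _parse_ok("c.pdf"),
        return_exceptions=True,  # exceptions come back in the result list
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    print(f"{len(ok)} succeeded, {len(failed)} failed")
    return ok, failed

ok, failed = asyncio.run(parse_all())
```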
Multi-Language Support
Dex supports 35+ languages with automatic language detection.
Parsing Non-English Documents
```python
# Language is automatically detected
parse_result = await dex_file.parse(
    ReductoParseJobParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            ),
            extraction_mode="ocr",  # Use OCR mode for better language support
        ),
    )
)
```
Supported Languages
Germanic languages have excellent support. The 35+ supported languages include:
- European: Spanish, French, German, Italian, Portuguese, Dutch, Swedish, Norwegian, Danish, Polish, Russian, Ukrainian
- Asian: Chinese, Japanese, Korean, Thai, Vietnamese, Khmer, Lao
- Middle Eastern: Arabic, Hebrew, Persian, Turkish
- Indian: Hindi, Bengali, Tamil, Telugu, Malayalam, Kannada, Gujarati, Marathi, Punjabi
- And many more…
See the Introduction guide for the complete list.
Appendix: Essential Imports
```python
from dex_sdk.types import (
    ParseEngine,
    ReductoParseJobParams,
    ReductoChunkingMethod,
    ReductoChunkingOptions,
    ReductoParseEngineOptions,
    IrisParseJobParams,
    IrisParseEngineOptions,
    JobStatus,
)
```
Next Steps
- Chunking: Choose chunking strategies for your documents
- Extract: Extract structured data from parse results
- Vector Stores: Use vector stores for large documents