Research

Research runs an agentic job that answers complex questions over your documents. It decomposes the task, searches and cites your sources, aggregates findings, and validates the output. Inputs are scoped to parse results, vector stores, and/or indexes. Results come back as structured fields or as a markdown report, backed by citations and accompanied by usage metrics.

Research is a newly released Dex capability. The API may change in future SDK versions.

Prerequisites

A research job needs at least one source to search. Before running a job, prepare one or more of:

Parse results: parsed documents in your project
A vector store: for semantic/hybrid chunk search (at most one per job)
An index: for locating relevant files in a corpus, including files that have not been ingested into the vector store (at most one per job)
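The limits above (at least one source, at most one vector store, at most one index) can be checked before a request is built. The following is a minimal sketch, not part of the SDK: check_research_sources is a hypothetical helper, and it takes plain lists of ID strings rather than SDK objects.

```python
from typing import Sequence


def check_research_sources(
    parse_results: Sequence[str] = (),
    vector_stores: Sequence[str] = (),
    indexes: Sequence[str] = (),
) -> None:
    """Raise ValueError if the sources break the documented limits.

    Hypothetical helper (not an SDK function): it only encodes the rules
    listed above, using plain ID strings.
    """
    if not (parse_results or vector_stores or indexes):
        raise ValueError("a research job needs at least one source")
    if len(vector_stores) > 1:
        raise ValueError("at most one vector store per job")
    if len(indexes) > 1:
        raise ValueError("at most one index per job")


# One vector store plus one index is within the limits, so this passes silently.
check_research_sources(vector_stores=["vs_1"], indexes=["idx_1"])
```

Failing early on the client side avoids starting a job that the service would presumably reject.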

Running a Research Job

Build a ResearchJobRequest describing the inputs and the task, start the job, then poll for the result.

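from dex_sdk import ResearchInputs, ResearchJobRequest, ResearchTask

# Assumes project, vector_store, index, and MyOutputSchema (a model class
# exposing model_json_schema()) already exist, and that this code runs in an
# async context (for example a notebook or an async function).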

req = ResearchJobRequest(
    inputs=ResearchInputs(
        vector_stores=[vector_store.id],
        indexes=[index.id],
    ),
    task=ResearchTask(
        description="Between 'UK - DP' and 'DACH - DP', which had higher cash income in FY 2016?",
        output_schema=MyOutputSchema.model_json_schema(),
    ),
    model="openai/gpt-5.2",
)

# Start the job
job = await project.start_research_job(req)
print("Research job:", job.id)

# Poll to completion and get the typed result
res = await job.get_result(max_attempts=1000)
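The example above derives output_schema from a model class. Since output_schema is described as a JSON Schema, a schema for the same question might look roughly like the plain dict below. This is an illustrative sketch: the field names higher_segment and cash_income are hypothetical, and which JSON Schema features the service honors is not stated here.

```python
import json

# Hypothetical output schema for the example question. The field names are
# illustrative only and are not defined by the SDK.
output_schema = {
    "type": "object",
    "properties": {
        "higher_segment": {
            "type": "string",
            "description": "Segment with the higher FY 2016 cash income",
        },
        "cash_income": {
            "type": "number",
            "description": "Cash income of that segment in FY 2016",
        },
    },
    "required": ["higher_segment", "cash_income"],
}

# Pretty-print the schema to inspect what would be sent with the task.
print(json.dumps(output_schema, indent=2))
```

Assuming the parameter accepts a plain dict, a schema like this could be passed as output_schema in place of MyOutputSchema.model_json_schema().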

Reading Results

Each entry in the result's data maps an output field to a value, a confidence, and a list of citations. Citations are a discriminated union: use citation.kind to tell block citations (from PDFs) from spreadsheet citations.

for name, field in res.data.items():
    print(f"{name}: {field.value} (confidence={field.confidence})")
    for citation in field.citations:
        if citation.kind == "block":
            print(f"  [block] file={citation.file_id} page={citation.page} bbox={citation.bbox}")
        else:  # "spreadsheet"
            print(f"  [spreadsheet] file={citation.file_id} sheet={citation.sheet} cell={citation.location}")

# Other details
print("Total tokens used:", res.usage_info.total_tokens)
print("Research trace:", res.trace)
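To make the discriminated-union pattern concrete without a live job, the sketch below uses stand-in dataclasses that mirror only the attributes read in the loop above (file_id, page, bbox, sheet, location, kind). They are not the SDK's BlockCitation and SpreadsheetCitation classes, and the field types are assumptions. Unlike the loop above, this variant raises on an unrecognized kind instead of treating it as a spreadsheet citation.

```python
from dataclasses import dataclass
from typing import Literal, Union


# Stand-in types that mirror only the attributes used in the loop above.
# They are NOT the SDK's classes; the field types are assumptions.
@dataclass
class FakeBlockCitation:
    file_id: str
    page: int
    bbox: tuple
    kind: Literal["block"] = "block"


@dataclass
class FakeSpreadsheetCitation:
    file_id: str
    sheet: str
    location: str
    kind: Literal["spreadsheet"] = "spreadsheet"


Citation = Union[FakeBlockCitation, FakeSpreadsheetCitation]


def describe(citation: Citation) -> str:
    """Return a one-line label, branching on the kind discriminator."""
    if citation.kind == "block":
        return f"{citation.file_id} p.{citation.page}"
    if citation.kind == "spreadsheet":
        return f"{citation.file_id} {citation.sheet}!{citation.location}"
    # Fail loudly if a new kind appears instead of mislabeling it.
    raise ValueError(f"unknown citation kind: {citation.kind!r}")


citations = [
    FakeBlockCitation(file_id="f1", page=3, bbox=(0, 0, 10, 10)),
    FakeSpreadsheetCitation(file_id="f2", sheet="FY2016", location="B7"),
]
for c in citations:
    print(describe(c))
```

Checking each kind explicitly keeps the code from silently mishandling a citation type that a later SDK version might add.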

Structured vs. Report Output

Choose the output shape on the ResearchTask:

Structured — set output_schema to a JSON Schema. Results populate res.data with one entry per schema field.
Report — set report_schema to a markdown template. Access the rendered report with the res.report shortcut.

task = ResearchTask(
    description="Summarize FY2016 segment performance.",
    report_schema="## Summary\n{summary}\n\n## Key figures\n{figures}",
)
# ...after the job completes:
print(res.report)  # markdown string, or None for structured runs
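Code that handles both output shapes can branch on whether res.report is None, as the comment above suggests. The sketch below is a hypothetical helper exercised against stand-in objects built with SimpleNamespace. It assumes a report run still exposes a data mapping (possibly empty), which the text above does not confirm.

```python
from types import SimpleNamespace


def summarize(res) -> str:
    """Return the report if present, else one line per structured field."""
    if res.report is not None:
        return res.report
    return "\n".join(f"{name}: {field.value}" for name, field in res.data.items())


# Stand-in results with made-up values; real results come from the SDK.
report_run = SimpleNamespace(report="## Summary\nStable.", data={})
structured_run = SimpleNamespace(
    report=None,
    data={"answer": SimpleNamespace(value="example")},
)

print(summarize(report_run))
print(summarize(structured_run))
```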

Advanced Configuration

For finer control, configure tools and per-stage behavior.

Per-tool config lets you enable or disable tools and pass tool-specific options (e.g. top_k, filters, rerank config):

from dex_sdk.types import ResearchToolConfigEntry

req = ResearchJobRequest(
    inputs=ResearchInputs(vector_stores=[vector_store.id], indexes=[index.id]),
    task=ResearchTask(description="..."),
    model="openai/gpt-5.2",
    tools={
        # Vector store search tools: similarity_search, lexical_search, hybrid_search.
        # Index search tools: search_index, navigate_index.
        "similarity_search": ResearchToolConfigEntry(
            enabled=True,
            config={"top_k": 10},
        ),
        "lexical_search": ResearchToolConfigEntry(enabled=False, config={}),
    },
)
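When several tools need to be toggled at once, it can help to build the mapping programmatically. The sketch below is a hypothetical helper, not an SDK function: it uses the five tool names from the comments in the example above and plain dicts as stand-ins for ResearchToolConfigEntry. How the service treats tools that are omitted from the mapping is not specified here, so the helper lists every known tool explicitly.

```python
# Tool names taken from the comments in the example above.
VECTOR_STORE_TOOLS = ("similarity_search", "lexical_search", "hybrid_search")
INDEX_TOOLS = ("search_index", "navigate_index")


def tool_flags(enabled, options=None):
    """Map every known tool name to an {"enabled", "config"} dict.

    Plain dicts stand in for ResearchToolConfigEntry here; this helper is
    hypothetical and not part of the SDK.
    """
    options = options or {}
    known = VECTOR_STORE_TOOLS + INDEX_TOOLS
    unknown = set(enabled) - set(known)
    if unknown:
        raise ValueError(f"unknown tool names: {sorted(unknown)}")
    return {
        name: {"enabled": name in enabled, "config": options.get(name, {})}
        for name in known
    }


flags = tool_flags({"similarity_search"}, {"similarity_search": {"top_k": 10}})
print(flags["similarity_search"])
print(flags["lexical_search"])
```

Each dict could then be converted with ResearchToolConfigEntry(enabled=..., config=...), matching the keyword arguments used in the example above.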

Per-stage overrides let you change the prompt, model, temperature, or reasoning effort for any stage of the agentic pipeline (decomposition, subtasks, aggregation, output validation):

from dex_sdk.types import (
    ResearchAdvanced,
    ResearchAdvancedStep,
    ResearchReasoningBlock,
)

req.advanced = ResearchAdvanced(
    aggregation=ResearchAdvancedStep(
        prompt="<override prompt for the aggregation step>",
        model="openai/gpt-5.4",
        temperature=0.0,
        reasoning=ResearchReasoningBlock(effort="high"),
    ),
)

Additional flags on the request:

enable_subtable_citations — cite specific table cells/ranges rather than whole tables.

Listing Research Jobs

jobs = await project.list_research_jobs()
for job in jobs:
    print(job.id)

Appendix: Essential Imports

from dex_sdk import DexClient, ResearchInputs, ResearchJobRequest, ResearchTask

The advanced configuration types (ResearchToolConfigEntry, ResearchAdvanced, ResearchAdvancedStep, ResearchReasoningBlock) and the result types (ResearchResult, BlockCitation, SpreadsheetCitation) are re-exported from dex_sdk.types.

Next Steps

Indexing: Build a hierarchical index to use as a research input
Vector Stores: Semantic and hybrid search for RAG-enhanced extraction
Extract: Extract structured data from a single parse result or vector store
