This reference documents the Python SDK methods for Scale’s Dex document understanding capability.

DexClient

The main client for interacting with the Dex service.

Project Management

  • create_project(name, configuration) - Create a new project with optional configuration
  • list_projects() - List all accessible projects
  • get_project(project_id) - Retrieve a specific project
  • update_project(project_id, updates) - Update project name, configuration, or status
Example:
import os
from datetime import timedelta
from dex_sdk import DexClient
from dex_sdk.types import ProjectConfiguration, RetentionPolicy

# Initialize client with SGP credentials
dex_client = DexClient(
    base_url="https://dex.sgp.scale.com",
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)

# Create project (credentials are passed via client initialization)
project = await dex_client.create_project(
    name="My Project",
)

# Create project with data retention policy
project = await dex_client.create_project(
    name="My Compliant Project",
    configuration=ProjectConfiguration(
        retention=RetentionPolicy(
            files=timedelta(days=30),
            result_artifacts=timedelta(days=7),
        )
    )
)

# Update project configuration
await dex_client.update_project(
    project_id=project.id,
    updates={
        "configuration": ProjectConfiguration(
            retention=RetentionPolicy(
                files=timedelta(days=90),
                result_artifacts=timedelta(days=30),
            )
        )
    }
)

Project

Represents a Dex project with isolated data and credentials.

File Operations

  • upload_file(file_path) - Upload a document to the project
  • list_files(pagination_params, filter) - List all uploaded files with optional pagination and filtering
  • get_file(file_id) - Get file metadata
  • download_file(file_id) - Download file content
Example:
# Upload a file
dex_file = await project.upload_file("path/to/document.pdf")

# List all files (simple)
files = await project.list_files()

# List files with pagination
from dex_sdk.types import PaginationParams

pagination_params = PaginationParams(
    page_size=10,
    sort_by="created_at",
    sort_order="desc",
    continuation_token=None
)
files = await project.list_files(pagination_params=pagination_params)

# List files with filtering
from dex_sdk.types import FileListFilter
from datetime import datetime, timedelta

file_filter = FileListFilter(created_at_start=datetime.now() - timedelta(days=1))
files = await project.list_files(pagination_params=pagination_params, filter=file_filter)

Job Operations

  • list_jobs(pagination_params, filter) - List all jobs in the project with optional pagination and filtering
  • get_job(job_id) - Get job details and status
Example:
# List all jobs (simple)
jobs = await project.list_jobs()

# List jobs with pagination and filtering
from dex_sdk.types import PaginationParams, JobListFilter
from datetime import datetime, timedelta

pagination_params = PaginationParams(
    page_size=10,
    sort_by="created_at",
    sort_order="desc",
)
job_filter = JobListFilter(created_at_start=datetime.now() - timedelta(days=1))
jobs = await project.list_jobs(pagination_params=pagination_params, filter=job_filter)

Parse Result Operations

  • list_parse_results(pagination_params, filter) - List all parse results with optional pagination and filtering
  • get_parse_result(parse_result_id) - Get parse result details
Example:
# List parse results with filtering
from dex_sdk.types import ParseResultListFilter

parse_result_filter = ParseResultListFilter(created_at_start=datetime.now() - timedelta(days=7))
parse_results = await project.list_parse_results(filter=parse_result_filter)

Extraction Operations

  • list_extractions(pagination_params, filter) - List all extractions with optional pagination and filtering
  • get_extraction(extraction_id) - Get extraction details
Example:
# List extractions with filtering
from dex_sdk.types import ExtractionListFilter

extraction_filter = ExtractionListFilter(created_at_start=datetime.now() - timedelta(days=7))
extractions = await project.list_extractions(filter=extraction_filter)

Vector Store Operations

  • create_vector_store(name, engine, embedding_model) - Create a vector store with SGP Knowledge Base engine
  • list_vector_stores(pagination_params, filter) - List all vector stores with optional pagination and filtering
  • get_vector_store(vector_store_id) - Get vector store details
  • delete_vector_store(vector_store_id) - Delete a vector store
Example:
from dex_sdk.types import VectorStoreEngines

vector_store = await project.create_vector_store(
    name="My Vector Store",
    engine=VectorStoreEngines.SGP_KNOWLEDGE_BASE,
    embedding_model="openai/text-embedding-3-large",
)

# List vector stores with filtering
from dex_sdk.types import VectorStoreListFilter

vector_store_filter = VectorStoreListFilter(created_at_start=datetime.now() - timedelta(days=7))
vector_stores = await project.list_vector_stores(filter=vector_store_filter)

DexFile

Represents an uploaded file in Dex.

Parsing

  • parse(params) - Parse document to structured format (automatically polls for completion)
  • start_parse_job(params) - Start a parse job and return immediately (recommended for async workflows)
Example:
from dex_sdk.types import (
    ParseEngine,
    ReductoParseJobParams,
    ReductoChunkingMethod,
    ReductoChunkingOptions,
    ReductoParseEngineOptions,
)

# Standard parsing (auto-polls for completion)
parse_result = await dex_file.parse(
    ReductoParseJobParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            )
        ),
    )
)

# Async job workflow (recommended for better control)
parse_job = await dex_file.start_parse_job(
    ReductoParseJobParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            )
        ),
    )
)
# See Job Monitoring section for how to track job progress

Working with Parse Results

After parsing, you can access the structured content including chunks and blocks. Example:
# Parse the file
parse_result = await dex_file.parse(parse_params)

# Access metadata
metadata = parse_result.parse_metadata
print(f"Source: {metadata.filename} ({metadata.pages_processed} pages, engine: {parse_result.engine})")

# Access content chunks
for i, chunk in enumerate(parse_result.content.chunks):
    print(f"\nChunk {i}: {chunk.content[:100]}... ({len(chunk.blocks)} blocks)")

    for block in chunk.blocks:
        print(f"  [{block.type}] Page {block.page_number}, "
              f"confidence: {block.confidence:.2f}, "
              f"pos: ({block.bbox.left:.2f}, {block.bbox.top:.2f})")
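Where only the raw text is needed, the chunks above can be reassembled in document order. A minimal sketch (the helper name is ours, not part of the SDK):

```python
def chunks_to_text(chunks, separator="\n\n"):
    """Concatenate chunk contents in document order into one string."""
    return separator.join(chunk.content for chunk in chunks)

# full_text = chunks_to_text(parse_result.content.chunks)
```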

ParseResult

Represents the result of a document parsing operation.

Extraction

  • extract(extraction_schema, user_prompt, model, generate_citations, generate_confidence) - Extract structured data with user prompt, schema, model, and options
Parameters:
  • extraction_schema (BaseModel): Pydantic model class for extraction (pass the class directly, not model_json_schema())
  • user_prompt (str): Natural language instructions for extraction
  • model (str): LLM model to use (e.g., “openai/gpt-4o”)
  • generate_citations (bool): Include source citations in results
  • generate_confidence (bool): Include confidence scores in results
Example:
from pydantic import BaseModel, Field

class InvoiceData(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    total_amount: float = Field(description="Total amount in dollars")
    date: str = Field(description="Invoice date")

extract_result = await parse_result.extract(
    extraction_schema=InvoiceData,
    user_prompt="Extract invoice details from this document.",
    model="openai/gpt-4o",
    generate_citations=True,
    generate_confidence=True,
)

Working with Extraction Results

After extraction, you can access the structured data, citations, and confidence scores. Example:
# Extract data
extract_result = await parse_result.extract(
    extraction_schema=InvoiceData,
    user_prompt="Extract invoice details from this document.",
    model="openai/gpt-4o",
    generate_citations=True,
    generate_confidence=True,
)

# Access the extraction result
result = extract_result.result

# Access structured data
for field_name, field in result.data.items():
    print(f"{field_name}: {field.value} (confidence: {field.confidence:.2f})")

    if field.citations:
        for cite in field.citations:
            loc = f", pos: ({cite.bbox.left:.2f}, {cite.bbox.top:.2f})" if cite.bbox else ""
            print(f"  → Page {cite.page}: {cite.content[:50]}...{loc}")

# Access usage information
if result.usage_info:
    usage = result.usage_info
    print(f"\nTokens: {usage.total_tokens} total ({usage.prompt_tokens} prompt + {usage.completion_tokens} completion)")
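With generate_confidence enabled, a common pattern is to route low-confidence fields to human review. A sketch over the result.data shape shown above (the helper and threshold are illustrative, not an SDK feature):

```python
def low_confidence_fields(data, threshold=0.8):
    """Return names of fields whose confidence falls below `threshold`.

    `data` maps field names to objects exposing a `confidence` attribute,
    matching result.data in the example above.
    """
    return [
        name for name, field in data.items()
        if field.confidence is not None and field.confidence < threshold
    ]
```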

VectorStore

Represents a vector store for semantic search and RAG-enhanced extraction.

Indexing

  • add_parse_results(parse_result_ids) - Add parsed documents to vector store by parse result IDs
  • remove_files(file_ids) - Remove files from index
Example:
# Add parsed documents to vector store
await vector_store.add_parse_results([parse_result.id])

# Remove files from vector store
await vector_store.remove_files([file_id])
  • search(query, top_k, filters) - Semantic search across all documents in the vector store
Search

  • search_in_file(file_id, query, top_k, filters) - Search within a specific file with optional filters
Example:
# Search across all documents
results = await vector_store.search(
    query="What is the total revenue?",
    top_k=5,
)

# Search within a specific file
file_results = await vector_store.search_in_file(
    file_id=dex_file.id,
    query="What is the total revenue?",
    top_k=5,
    filters=None,
)

Extraction

  • extract(extraction_schema, user_prompt, model, generate_citations, generate_confidence) - Extract structured data from entire vector store with RAG context
Example:
# Extract from vector store with RAG context
extract_result = await vector_store.extract(
    extraction_schema=FinancialData,
    user_prompt="Extract financial summary from all documents.",
    model="openai/gpt-4o",
    generate_citations=True,
    generate_confidence=True,
)

Job Monitoring and SGP Tracing

New in v0.4.0: Enhanced async job support with SGP tracing integration for better observability and debugging.

Async Job Workflow

Use start_parse_job() for better control over async operations and access to SGP traces. Example:
from dex_sdk.types import (
    ReductoParseJobParams,
    ReductoParseEngineOptions,
    ReductoChunkingOptions,
    ReductoChunkingMethod,
    ParseEngine,
    JobStatus
)
import asyncio

# Start a parse job
parse_job = await project.start_parse_job(
    dex_file=dex_file,
    parameters=ReductoParseJobParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            )
        ),
    ),
)

# Monitor job progress
while parse_job.data.status not in (JobStatus.SUCCEEDED, JobStatus.FAILED):
    await asyncio.sleep(1)
    await parse_job.refresh()

# Get the result
parse_result = await parse_job.get_result()
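The polling loop above runs unbounded; in practice you may want a timeout. A generic sketch (the helper and its defaults are ours; it relies only on the job's .data.status, .data.id, and refresh() shown above):

```python
import asyncio

async def wait_for_job(job, terminal_statuses, poll_interval=1.0, timeout=300.0):
    """Poll job.refresh() until job.data.status reaches a terminal status.

    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while job.data.status not in terminal_statuses:
        if loop.time() > deadline:
            raise TimeoutError(f"job {job.data.id} still {job.data.status}")
        await asyncio.sleep(poll_interval)
        await job.refresh()
    return job

# await wait_for_job(parse_job, {JobStatus.SUCCEEDED, JobStatus.FAILED})
```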

Retrieving SGP Traces

Jobs are now connected to SGP traces for end-to-end observability. You can retrieve complete trace data for any job. Example:
from scale_gp_beta import SGPClient

# Initialize SGP client
sgp_client = SGPClient(
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)

# Search for spans by job ID
spans = [span for span in sgp_client.spans.search(
    sort_by="created_at",
    sort_order="desc",
    extra_metadata={"job_id": parse_job.data.id},
    parents_only=True,
)]

# Get full trace
if spans:
    trace_id = spans[0].trace_id
    all_spans = [span for span in sgp_client.spans.search(trace_ids=[trace_id])]

    # Analyze trace data
    for span in all_spans:
        print(f"Span: {span.name}, Duration: {span.duration_ms}ms")
Benefits:
  • Better observability: Track jobs through SGP’s tracing infrastructure
  • Easier debugging: Access detailed execution traces for failed jobs
  • Performance monitoring: Analyze job performance and identify bottlenecks
  • Request correlation: Connect job execution to API requests and traces

Parse Job Parameters

When parsing documents, you can specify different engines and options to customize the parsing behavior.

Reducto Parse Parameters

ReductoParseJobParams - Parameters for the Reducto OCR engine. Best for: English and Latin-script documents (Spanish, French, German, Italian, Portuguese, etc.) with tables, figures, and complex layouts. Fields:
  • engine (ParseEngine): Set to ParseEngine.REDUCTO
  • options (ReductoParseEngineOptions): Parsing options
  • advanced_options (dict): Advanced options for fine-tuning
  • experimental_options (dict): Experimental features
  • priority (bool): Whether to prioritize this job (default: False)
ReductoParseEngineOptions:
  • chunking (ReductoChunkingOptions | None): Chunking configuration
ReductoChunkingOptions:
  • chunk_mode (ReductoChunkingMethod): Chunking method (default: VARIABLE)
    • DISABLED: No chunking
    • BLOCK: Block-level chunks
    • PAGE: Page-level chunks
    • PAGE_SECTIONS: Page sections
    • SECTION: Section-level chunks
    • VARIABLE: Variable-size chunks based on content
  • chunk_size (int | None): Custom chunk size
Example:
from dex_sdk.types import (
    ParseEngine,
    ReductoParseJobParams,
    ReductoChunkingMethod,
    ReductoChunkingOptions,
    ReductoParseEngineOptions,
)

parse_params = ReductoParseJobParams(
    engine=ParseEngine.REDUCTO,
    options=ReductoParseEngineOptions(
        chunking=ReductoChunkingOptions(
            chunk_mode=ReductoChunkingMethod.VARIABLE,
            chunk_size=None,
        )
    ),
    priority=False,
)

parse_result = await dex_file.parse(parse_params)

Iris Parse Parameters

IrisParseJobParams - Parameters for the Iris OCR engine (experimental).
Iris is experimental and its stability is not yet production-grade. For standard production use, we recommend Reducto. See When to choose Iris?.
Fields:
  • engine (ParseEngine): Set to ParseEngine.IRIS
  • options (IrisParseEngineOptions): Parsing options
IrisParseEngineOptions:
  • layout (str | None): Layout detection model to use
  • text_ocr (str | None): Text OCR model to use
  • table_ocr (str | None): Table OCR model to use
  • text_prompt (str | None): Custom prompt for text extraction (VLMs only)
  • table_prompt (str | None): Custom prompt for table extraction (VLMs only)
  • left_to_right (bool | None): Sort regions left-to-right instead of right-to-left (default: False)
  • confidence_threshold (float | None): Minimum confidence threshold for layout detection
  • containment_threshold (float | None): Containment threshold for filtering overlapping boxes
Example:
from dex_sdk.types import (
    ParseEngine,
    IrisParseJobParams,
    IrisParseEngineOptions,
)

parse_params = IrisParseJobParams(
    engine=ParseEngine.IRIS,
    options=IrisParseEngineOptions(
        layout="layout_model_v1",
        text_ocr="text_ocr_v1",
        confidence_threshold=0.5,
    )
)

parse_result = await dex_file.parse(parse_params)

Common Types

This section documents the core data models and types used throughout the Dex SDK.

Type Categories

Importable Types - Types you can import from dex_sdk.types to configure your requests:
  • Configuration types (ProjectConfiguration, RetentionPolicy)
  • Pagination types (PaginationParams, FileListFilter, JobListFilter, ParseResultListFilter, ExtractionListFilter, VectorStoreListFilter) - New in v0.4.0
  • Parse parameter types (ReductoParseJobParams, IrisParseJobParams, etc.)
  • Enum types (ParseEngine, ReductoChunkingMethod, VectorStoreEngines, JobStatus) - JobStatus new in v0.4.0
Response Types - Types returned by the SDK, accessible via the .data attribute on wrapper objects:
  • When you call SDK methods, you get wrapper objects (DexProject, DexFile, DexParseResult, etc.)
  • Access the underlying data via .data: project.data.id, file.data.filename
  • These entities are automatically validated but don’t need to be imported

Configuration Types

ProjectConfiguration

Configuration options for a Dex project. Import: from dex_sdk.types import ProjectConfiguration Fields:
  • retention (RetentionPolicy | None): Data retention policy for the project
Example:
from datetime import timedelta
from dex_sdk.types import ProjectConfiguration, RetentionPolicy

config = ProjectConfiguration(
    retention=RetentionPolicy(
        files=timedelta(days=30),
        result_artifacts=timedelta(days=7),
    )
)

RetentionPolicy

Defines data retention periods for automatic cleanup of files and processing artifacts. Import: from dex_sdk.types import RetentionPolicy Fields:
  • files (timedelta | None): Retention period for uploaded files. Files older than this period are automatically deleted. If None, files are retained indefinitely.
  • result_artifacts (timedelta | None): Retention period for parse results, extraction results, and job artifacts. If None, artifacts are retained indefinitely.
Example:
from datetime import timedelta
from dex_sdk.types import RetentionPolicy

# 30-day file retention, 7-day artifact retention
policy = RetentionPolicy(
    files=timedelta(days=30),
    result_artifacts=timedelta(days=7),
)

# Keep files indefinitely, but clean up artifacts after 14 days
policy = RetentionPolicy(
    files=None,
    result_artifacts=timedelta(days=14),
)
Use Cases:
  • Compliance: Meet regulatory requirements (GDPR, HIPAA, etc.)
  • Cost Management: Automatically clean up old data to reduce storage costs
  • Security: Limit exposure of sensitive documents by enforcing retention limits
Note: The retention period is calculated from the creation time of the file or artifact. Retention policies can be updated at any time using update_project().
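The cleanup semantics in the note can be sketched as a pure function (ours, for illustration; the actual cleanup runs server-side):

```python
from datetime import datetime, timezone

def is_expired(created_at, retention):
    """True once an item's age exceeds its retention period.

    retention=None mirrors the policy fields above: retain indefinitely.
    """
    if retention is None:
        return False
    return datetime.now(timezone.utc) - created_at > retention
```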

PaginationParams

New in v0.4.0: Parameters for paginated list operations. Import: from dex_sdk.types import PaginationParams Fields:
  • page_size (int | None): Number of items to return per page (default: 50, max: 100)
  • sort_by (str | None): Field name to sort by (e.g., "created_at")
  • sort_order (str | None): Sort order, either "asc" or "desc" (default: "desc")
  • continuation_token (str | None): Token for fetching next/previous page
Example:
from dex_sdk.types import PaginationParams

pagination_params = PaginationParams(
    page_size=10,
    sort_by="created_at",
    sort_order="desc",
    continuation_token=None  # Set to next_token from previous response for pagination
)

files = await project.list_files(pagination_params=pagination_params)
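The continuation_token enables cursor-style paging. A generic accumulation loop can be sketched like this (collect_all_pages is our helper; fetch_page would wrap project.list_files and return the page's items plus the next token, or None on the last page; the exact response attribute carrying the token is an assumption to check against your SDK version):

```python
async def collect_all_pages(fetch_page):
    """Accumulate items across pages by following continuation tokens.

    fetch_page(token) -> (items, next_token); next_token is None when
    there are no further pages.
    """
    items, token = [], None
    while True:
        page_items, token = await fetch_page(token)
        items.extend(page_items)
        if token is None:
            return items
```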

List Filter Types

New in v0.4.0: Filter types for list operations on different entity types. Import: from dex_sdk.types import FileListFilter, JobListFilter, ParseResultListFilter, ExtractionListFilter, VectorStoreListFilter Common Fields:
  • created_at_start (datetime | None): Filter for entities created after this time
  • created_at_end (datetime | None): Filter for entities created before this time
Example:
from dex_sdk.types import FileListFilter, JobListFilter
from datetime import datetime, timedelta

# Filter files created in the last 24 hours
file_filter = FileListFilter(
    created_at_start=datetime.now() - timedelta(days=1)
)
files = await project.list_files(filter=file_filter)

# Filter jobs created in the last week
job_filter = JobListFilter(
    created_at_start=datetime.now() - timedelta(days=7)
)
jobs = await project.list_jobs(filter=job_filter)
Available Filter Types:
  • FileListFilter - Filter uploaded files
  • JobListFilter - Filter jobs
  • ParseResultListFilter - Filter parse results
  • ExtractionListFilter - Filter extractions
  • VectorStoreListFilter - Filter vector stores

ExtractionParameters

Parameters for extraction operations. Import: from dex_sdk.types import ExtractionParameters Fields:
  • model (str): LLM model to use (e.g., "openai/gpt-4o")
  • model_kwargs (dict | None): Additional kwargs for the LLM model
  • extraction_schema (dict): JSON schema defining the desired output structure
  • system_prompt (str | None): High-level instructions for the extraction model
  • user_prompt (str | None): Specific hints about the current document
  • generate_citations (bool): Whether to return bounding boxes for extracted values (default: True)
  • generate_confidence (bool): Whether to return confidence scores (default: True)
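Note that extraction_schema here is a raw JSON-schema dict, unlike ParseResult.extract(), which takes a Pydantic class directly. A minimal hand-written schema equivalent to the InvoiceData model used earlier might look like:

```python
# JSON-schema dict equivalent to the InvoiceData pydantic model above
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "The invoice number"},
        "total_amount": {"type": "number", "description": "Total amount in dollars"},
        "date": {"type": "string", "description": "Invoice date"},
    },
    "required": ["invoice_number", "total_amount", "date"],
}
```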

Parse Configuration Types

ParseEngine

Enum of available OCR engines. Import: from dex_sdk.types import ParseEngine Values:
  • REDUCTO = "reducto"
  • IRIS = "iris"
  • CUSTOM = "custom"

ReductoParseJobParams

Parameters for the Reducto OCR engine. Import: from dex_sdk.types import ReductoParseJobParams See the Parse Job Parameters section for detailed usage.

IrisParseJobParams

Parameters for the Iris OCR engine. Import: from dex_sdk.types import IrisParseJobParams See the Parse Job Parameters section for detailed usage.

ReductoChunkingMethod

Enum of chunking methods for Reducto parser. Import: from dex_sdk.types import ReductoChunkingMethod Values:
  • DISABLED = "disabled"
  • BLOCK = "block"
  • PAGE = "page"
  • PAGE_SECTIONS = "page_sections"
  • SECTION = "section"
  • VARIABLE = "variable"

ReductoChunkingOptions

Chunking configuration for Reducto parser. Import: from dex_sdk.types import ReductoChunkingOptions Fields:
  • chunk_mode (ReductoChunkingMethod): Chunking method
  • chunk_size (int | None): Custom chunk size

ReductoParseEngineOptions

Options for Reducto parser. Import: from dex_sdk.types import ReductoParseEngineOptions Fields:
  • chunking (ReductoChunkingOptions | None): Chunking configuration

IrisParseEngineOptions

Options for Iris parser. Import: from dex_sdk.types import IrisParseEngineOptions Fields:
  • layout (str | None): Layout detection model
  • text_ocr (str | None): Text OCR model
  • table_ocr (str | None): Table OCR model
  • text_prompt (str | None): Custom prompt for text extraction
  • table_prompt (str | None): Custom prompt for table extraction
  • left_to_right (bool | None): Sort regions left-to-right
  • confidence_threshold (float | None): Minimum confidence threshold
  • containment_threshold (float | None): Containment threshold for filtering

Vector Store Types

VectorStoreEngines

Enum of available vector store engines. Import: from dex_sdk.types import VectorStoreEngines Values:
  • SGP_KNOWLEDGE_BASE = "sgp_knowledge_base"

VectorStoreSearchResult

Result from vector store search operations containing matching chunks. Import: from dex_sdk.types import VectorStoreSearchResult Fields:
  • chunks (list[VectorStoreChunk]): List of matching chunks with relevance scores
Example:
# Perform search
search_results = await vector_store.search(
    query="What is the total revenue?",
    top_k=5,
)

# Access chunks
for chunk in search_results.chunks:
    print(f"Score: {chunk.score:.2f}")
    print(f"Content: {chunk.content[:100]}...")
    print(f"File ID: {chunk.file_id}")

VectorStoreChunk

Represents a single chunk returned from vector store search operations. Fields:
  • content (str): Text content of the chunk
  • score (float): Relevance score for the search query
  • file_id (str | None): ID of the file this chunk belongs to
  • parse_result_id (str | None): ID of the parse result this chunk belongs to
  • metadata (dict[str, Any] | None): Additional metadata from the chunk, which may include information like chunk indices, embeddings metadata, or other custom fields added during indexing
  • blocks (list): List of block objects with layout information
Example:
# Access chunk details
for chunk in search_results.chunks:
    print(f"Score: {chunk.score:.2f}")
    print(f"Content: {chunk.content[:100]}...")
    print(f"File: {chunk.file_id}, Parse Result: {chunk.parse_result_id}")

    # Access additional metadata
    if chunk.metadata:
        print(f"Chunk metadata: {chunk.metadata}")

    # Access block layout information
    for block in chunk.blocks:
        print(f"  Page {block.page_number}, type: {block.type}")
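Search hits from several documents often need grouping by source file. A small sketch over the fields above (the helper is ours):

```python
from collections import defaultdict

def group_chunks_by_file(chunks):
    """Group search-result chunks by file_id, preserving relevance order."""
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[chunk.file_id].append(chunk)
    return dict(grouped)
```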

Response Entity Types

These types are returned by SDK methods and accessed via the .data attribute on wrapper objects. You typically don’t need to import these directly.

Working with Response Data

When you call SDK methods, you receive wrapper objects with a .data attribute:
# Create a project
project = await dex_client.create_project(name="My Project")
print(f"Project: {project.data.name} ({project.data.id}) created at {project.data.created_at}")

# Upload a file
dex_file = await project.upload_file("document.pdf")
print(f"File: {dex_file.data.filename} ({dex_file.data.size_bytes} bytes) → {dex_file.data.id}")

# Parse a file
parse_result = await dex_file.parse(parse_params)
metadata = parse_result.data.parse_metadata
print(f"Parsed: {metadata.pages_processed} pages with {parse_result.data.engine} → {parse_result.data.id}")

Common Response Entity Fields

ProjectEntity (accessed via project.data):
  • id (str): Project ID with proj_ prefix
  • name (str): Project readable name
  • status (str): Project status ("active" or "archived")
  • configuration (ProjectConfiguration | None): Project configuration
  • created_at (datetime): When the project was created
  • archived_at (datetime | None): When the project was archived
FileEntity (accessed via dex_file.data):
  • id (str): File ID with file_ prefix
  • project_id (str): Project ID that the file belongs to
  • filename (str): Original filename
  • size_bytes (int): File size in bytes
  • mime_type (str): MIME type of the file
  • status (str): Current file status
  • created_at (datetime): When the file was uploaded
ParseResultEntity (accessed via parse_result.data):
  • id (str): Parse result ID with pres_ prefix
  • project_id (str): Project ID
  • source_document_id (str): Source document ID that was parsed
  • engine (str): Engine used for parsing
  • parse_metadata (object): Metadata including filename, pages_processed
  • content (object): Parsed content with chunks
  • created_at (datetime): When the parse result was created
ExtractionEntity (accessed via extract_result or in extraction results):
  • id (str): Extraction result ID
  • source_id (str): Source ID that was extracted from
  • result (object): The extraction result with data and usage_info
  • parameters (ExtractionParameters): Parameters used for extraction
  • created_at (datetime): When the extraction was completed
  • processing_time_ms (int | None): Processing time in milliseconds
VectorStoreEntity (accessed via vector_store.data):
  • id (str): Vector store ID with vs_ prefix
  • project_id (str): Project ID
  • name (str): Name of the vector store
  • engine (str): Engine used for vector store
  • created_at (datetime): When the vector store was created
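The ID prefixes above (proj_, file_, pres_, vs_) make entity types recognizable from an ID alone; a purely illustrative helper:

```python
# Prefixes documented in the entity fields above
ID_PREFIXES = {
    "proj_": "project",
    "file_": "file",
    "pres_": "parse_result",
    "vs_": "vector_store",
}

def entity_type(entity_id):
    """Infer an entity's type from its documented ID prefix."""
    for prefix, kind in ID_PREFIXES.items():
        if entity_id.startswith(prefix):
            return kind
    return "unknown"
```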

Deprecated Types

The following types are deprecated as of version 0.3.2 and should no longer be used:
  • ProjectCredentials - No longer used; credentials are passed to DexClient constructor
  • SGPCredentials - No longer used; credentials are passed to DexClient constructor
See the Changelog for migration instructions.

Error Handling

The SDK raises exceptions for various error conditions. For detailed troubleshooting guidance, see the Troubleshooting Guide.
from dex_sdk.exceptions import DexException

try:
    parse_result = await dex_file.parse(...)
except DexException as e:
    print(f"Error: {e}")
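Transient failures (timeouts, rate limits) are often worth retrying. A generic backoff wrapper can be sketched as follows (the helper and its defaults are ours; whether a given DexException is retryable depends on the underlying error, which this sketch does not inspect):

```python
import asyncio

async def with_retries(make_coro, retryable, attempts=3, base_delay=1.0):
    """Run an async operation with exponential backoff on failure.

    make_coro: zero-argument callable returning a fresh coroutine each try.
    retryable: exception type (or tuple) to retry, e.g. DexException.
    """
    for attempt in range(attempts):
        try:
            return await make_coro()
        except retryable:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)

# parse_result = await with_retries(lambda: dex_file.parse(parse_params), DexException)
```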

Async/Await Pattern

The Dex SDK is fully async. Use await with all SDK methods:
import asyncio
import os

async def main():
    # Initialize client with credentials
    dex_client = DexClient(
        base_url="https://dex.sgp.scale.com",
        api_key=os.getenv("SGP_API_KEY"),
        account_id=os.getenv("SGP_ACCOUNT_ID"),
    )

    project = await dex_client.create_project(name="My Project")
    dex_file = await project.upload_file("document.pdf")
    parse_result = await dex_file.parse(...)
    extract_result = await parse_result.extract(...)

# In Jupyter/IPython (top-level await is supported):
await main()

# In a regular Python script, use this instead:
asyncio.run(main())

See Also