This reference documents the Python SDK methods for Scale’s Dex document understanding capability.

DexClient

The main client for interacting with the Dex service.

Project Management

  • create_project(name, credentials) - Create a new project with SGP credentials
  • list_projects() - List all accessible projects
  • get_project(project_id) - Retrieve a specific project
  • update_project(project_id, updates) - Update project configuration
Example:
from dex_sdk import DexClient
from dex_sdk.types import ProjectCredentials, SGPCredentials

dex_client = DexClient(base_url="https://dex.sgp.scale.com")

project = await dex_client.create_project(
    name="My Project",
    credentials=ProjectCredentials(
        sgp=SGPCredentials(
            account_id="your_account_id",
            api_key="your_api_key",
        ),
    ),
)
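The example above hard-codes the account ID and API key. A minimal sketch of reading them from the environment instead is shown below. The variable names `SGP_ACCOUNT_ID` and `SGP_API_KEY` and the helper `load_sgp_credentials` are assumptions made for illustration; the SDK does not define them.

```python
import os


def load_sgp_credentials(env=None):
    """Return SGP credential fields read from environment variables.

    SGP_ACCOUNT_ID and SGP_API_KEY are hypothetical variable names chosen
    for this sketch; they are not defined by the Dex SDK.
    """
    env = os.environ if env is None else env
    names = ("SGP_ACCOUNT_ID", "SGP_API_KEY")
    # Fail early with a clear message instead of sending empty credentials.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"account_id": env["SGP_ACCOUNT_ID"], "api_key": env["SGP_API_KEY"]}
```

The returned dict uses the same keys as the SGPCredentials fields, so it could be unpacked as `SGPCredentials(**load_sgp_credentials())`.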

Project

Represents a Dex project with isolated data and credentials.

File Operations

  • upload_file(file_path) - Upload a document to the project
  • list_files() - List all uploaded files
  • get_file(file_id) - Get file metadata
  • download_file(file_id) - Download file content
Example:
# Upload a file
dex_file = await project.upload_file("path/to/document.pdf")

# List all files
files = await project.list_files()

Vector Store Operations

  • create_vector_store(name, engine, embedding_model) - Create a vector store with SGP Knowledge Base engine
  • list_vector_stores() - List all vector stores
  • get_vector_store(vector_store_id) - Get vector store details
  • delete_vector_store(vector_store_id) - Delete a vector store
Example:
from dex_sdk.types import VectorStoreEngines

vector_store = await project.create_vector_store(
    name="My Vector Store",
    engine=VectorStoreEngines.SGP_KNOWLEDGE_BASE,
    embedding_model="openai/text-embedding-3-large",
)

DexFile

Represents an uploaded file in Dex.

Parsing

  • parse(params) - Parse document to structured format
Example:
from dex_sdk.types import (
    ParseEngine,
    ParseJobRequestParams,
    ReductoChunkingMethod,
    ReductoChunkingOptions,
    ReductoParseEngineOptions,
)

parse_result = await dex_file.parse(
    ParseJobRequestParams(
        engine=ParseEngine.REDUCTO,
        options=ReductoParseEngineOptions(
            chunking=ReductoChunkingOptions(
                chunk_mode=ReductoChunkingMethod.VARIABLE,
            )
        ),
    )
)

ParseResult

Represents the result of a document parsing operation.

Extraction

  • extract(ExtractionParameters) - Extract structured data with user prompt, schema, model, and options
ExtractionParameters fields:
  • user_prompt (str): Natural language instructions for extraction
  • extraction_schema (dict): JSON schema from YourModel.model_json_schema()
  • model (str): LLM model to use (e.g., "openai/gpt-4o")
  • generate_citations (bool): Include source citations in results
  • generate_confidence (bool): Include confidence scores in results
Example:
from pydantic import BaseModel, Field
from dex_sdk.types import ExtractionParameters

class InvoiceData(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    total_amount: float = Field(description="Total amount in dollars")
    date: str = Field(description="Invoice date")

extract_result = await parse_result.extract(
    ExtractionParameters(
        user_prompt="Extract invoice details from this document.",
        extraction_schema=InvoiceData.model_json_schema(),
        model="openai/gpt-4o",
        generate_citations=True,
        generate_confidence=True,
    )
)
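Since extraction_schema is a plain dict, it can help to see roughly what InvoiceData.model_json_schema() produces. The dict below is a hand-written approximation of Pydantic v2 output for the model above; exact keys (such as per-field titles) may differ by Pydantic version. `missing_required_fields` is a hypothetical helper, and it assumes the extracted data is a flat dict keyed by the schema's property names, which this reference does not confirm.

```python
# Hand-written approximation of InvoiceData.model_json_schema() (Pydantic v2).
INVOICE_SCHEMA = {
    "title": "InvoiceData",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "The invoice number"},
        "total_amount": {"type": "number", "description": "Total amount in dollars"},
        "date": {"type": "string", "description": "Invoice date"},
    },
    "required": ["invoice_number", "total_amount", "date"],
}


def missing_required_fields(schema, data):
    """Return the schema's required field names that are absent or None in data."""
    return [name for name in schema.get("required", []) if data.get(name) is None]
```

A check like this could run on the extracted dict before it is passed downstream, so incomplete extractions are caught early.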

Data Access

  • data.model_dump() - Access the result content as a dictionary
Example:
# Access extraction results
extraction_data = extract_result.data.model_dump()
print(extraction_data)

VectorStore

Represents a vector store for semantic search and RAG-enhanced extraction.

Indexing

  • add_parse_results(parse_result_ids) - Add parsed documents to vector store by parse result IDs
  • list_files() - List indexed files
  • remove_files(file_ids) - Remove files from index
Example:
# Add parsed documents to vector store
await vector_store.add_parse_results([parse_result.id])

# List files in vector store
files = await vector_store.list_files()

# Remove files from vector store
await vector_store.remove_files([dex_file.id])

Search

  • search(query, top_k, filters) - Semantic search across all documents in the vector store
  • search_in_file(file_id, query, top_k, filters) - Search within a specific file with optional filters
Example:
# Search across all documents
results = await vector_store.search(
    query="What is the total revenue?",
    top_k=5,
)

# Search within a specific file
file_results = await vector_store.search_in_file(
    file_id=dex_file.id,
    query="What is the total revenue?",
    top_k=5,
    filters=None,
)

Extraction

  • extract(ExtractionParameters) - Extract structured data from entire vector store with RAG context
Example:
# Extract from vector store with RAG context
# FinancialData is a Pydantic model you define, analogous to InvoiceData above
extract_result = await vector_store.extract(
    ExtractionParameters(
        user_prompt="Extract financial summary from all documents.",
        extraction_schema=FinancialData.model_json_schema(),
        model="openai/gpt-4o",
        generate_citations=True,
        generate_confidence=True,
    )
)

Common Types

ProjectCredentials

Contains credentials for SGP and other services. Fields:
  • sgp: SGPCredentials object

SGPCredentials

Contains SGP account credentials. Fields:
  • account_id (str): Your SGP account ID
  • api_key (str): Your SGP API key

ParseEngine

Enum of available OCR engines. Values:
  • ParseEngine.REDUCTO - Reducto OCR engine (default)
  • ParseEngine.SCALE_OCR - Scale’s custom OCR engine

VectorStoreEngines

Enum of available vector store engines. Values:
  • VectorStoreEngines.SGP_KNOWLEDGE_BASE - SGP Knowledge Base vector store

ReductoChunkingMethod

Enum of chunking methods for Reducto parser. Values:
  • ReductoChunkingMethod.VARIABLE - Variable-size chunks based on content
  • ReductoChunkingMethod.BLOCK - Fixed block-level chunks

Error Handling

The SDK raises exceptions when an operation fails. Catch DexException to handle them:
from dex_sdk.exceptions import DexException

try:
    parse_result = await dex_file.parse(...)
except DexException as e:
    print(f"Error: {e}")
Common error conditions:
  • File upload errors (invalid format, size limit exceeded)
  • Parsing errors (unsupported content, OCR failure)
  • Extraction errors (invalid schema, model errors)
  • Authentication errors (invalid credentials)
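Some of the failures listed above may be transient, so callers sometimes wrap calls in a retry loop. The following is a generic sketch using only asyncio, not an SDK feature: `with_retries` and `TransientError` are hypothetical names, and this reference does not say which Dex errors are safe to retry. Authentication or invalid-schema errors, for instance, would not be fixed by retrying.

```python
import asyncio


class TransientError(Exception):
    """Stand-in for a retryable SDK error; hypothetical, not part of dex_sdk."""


async def with_retries(make_call, attempts=3, base_delay=0.5, retry_on=(TransientError,)):
    """Await make_call() up to `attempts` times, backing off exponentially."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            # make_call must return a fresh awaitable on every invocation.
            return await make_call()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

With the SDK, usage might look like `await with_retries(lambda: dex_file.parse(params), retry_on=(DexException,))`, where the lambda ensures each attempt creates a new coroutine.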

Async/Await Pattern

The Dex SDK is fully async. Use await with all SDK methods:
import asyncio

from dex_sdk import DexClient

async def main():
    dex_client = DexClient(base_url="https://dex.sgp.scale.com")

    project = await dex_client.create_project(...)
    dex_file = await project.upload_file(...)
    parse_result = await dex_file.parse(...)
    extract_result = await parse_result.extract(...)

# In a regular Python script, start the event loop explicitly:
asyncio.run(main())

# In Jupyter/IPython an event loop is already running, so use this instead:
# await main()
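Because every SDK method is a coroutine, independent calls (for example, uploading several files) can overlap instead of running one after another. The sketch below uses only asyncio; `process_all` and its `max_concurrency` default are assumptions for illustration, and this reference does not state whether the Dex service enforces rate limits.

```python
import asyncio


async def process_all(items, worker, max_concurrency=4):
    """Run worker(item) for every item, at most max_concurrency at a time.

    Results are returned in the same order as items.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item):
        # The semaphore caps how many workers are in flight at once.
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items))
```

Inside an async function, this could be used as `dex_files = await process_all(paths, project.upload_file)`. If any worker raises, asyncio.gather propagates the first exception to the caller.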
