Manage files, projects, and collections in Dex. The same pagination and filtering patterns apply to jobs, parse results, extractions, and vector stores.
Create Project
import os
from dex_sdk import DexClient
dex_client = DexClient(
    base_url="https://dex.sgp.scale.com",
    api_key=os.getenv("SGP_API_KEY"),
    account_id=os.getenv("SGP_ACCOUNT_ID"),
)
project = await dex_client.create_project(name="My Project")
Create Project with Retention
project = await dex_client.create_project(
    name="My Project",
    configuration=ProjectConfiguration(
        retention=RetentionPolicy(
            files=timedelta(days=30),
            result_artifacts=timedelta(days=7),
        )
    )
)
Upload File
dex_file = await project.upload_file("document.pdf")
Process Multiple Files
import asyncio
files_to_upload = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
dex_files = await asyncio.gather(*[project.upload_file(f) for f in files_to_upload])
print(f"Uploaded {len(dex_files)} files")
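When a batch is large, it can help to cap how many uploads run at once and to collect failures instead of stopping at the first one. The following is a minimal sketch of that pattern using only asyncio. The names upload_all and fake_upload are hypothetical and not part of the Dex SDK; fake_upload stands in for project.upload_file so the example runs on its own.

```python
import asyncio


async def upload_all(upload, paths, max_concurrency=4):
    """Run upload(path) for every path, at most max_concurrency at a time.

    Returns (successes, failures): successes maps path -> result,
    failures maps path -> the exception that upload raised.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path):
        # The semaphore limits how many uploads are in flight at once.
        async with semaphore:
            return await upload(path)

    # return_exceptions=True returns each failure in place of a result
    # instead of raising as soon as the first upload fails.
    outcomes = await asyncio.gather(
        *(guarded(p) for p in paths), return_exceptions=True
    )

    successes, failures = {}, {}
    for path, outcome in zip(paths, outcomes):
        if isinstance(outcome, Exception):
            failures[path] = outcome
        else:
            successes[path] = outcome
    return successes, failures


# Hypothetical stand-in for project.upload_file (assumption, not SDK code).
async def fake_upload(path):
    await asyncio.sleep(0)
    if path.endswith(".bad"):
        raise ValueError(f"cannot upload {path}")
    return f"uploaded:{path}"
```

With a real project, the call would presumably look like `ok, failed = await upload_all(project.upload_file, files_to_upload)`, after which the entries in `failed` can be retried or logged.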
Pagination and Filtering
New in v0.4.0: Efficiently manage large collections with pagination and filtering.
pagination_params = PaginationParams(
    page_size=50,
    sort_by="created_at",
    sort_order="desc",
    continuation_token=None
)
result = await project.list_files(pagination_params=pagination_params)
# Access items
for file in result.items:
    print(f"File: {file.filename} ({file.created_at})")
# Paginate through remaining results
while result.next_token:
    pagination_params.continuation_token = result.next_token
    result = await project.list_files(pagination_params=pagination_params)
    # Process result.items
Filtering by Creation Time
from datetime import datetime, timedelta
# Filter files created in the last 24 hours
file_filter = FileListFilter(
    created_at_start=datetime.now() - timedelta(days=1)
)
recent_files = await project.list_files(filter=file_filter)
# Combine pagination and filtering for files
pagination_params = PaginationParams(page_size=20, sort_by="created_at", sort_order="desc")
file_filter = FileListFilter(created_at_start=datetime.now() - timedelta(days=30))
recent_files = await project.list_files(
    pagination_params=pagination_params,
    filter=file_filter
)
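The snippets above use datetime.now(), which returns a naive local time. If the service compares timestamps in UTC (an assumption; this page does not say), a timezone-aware cutoff avoids ambiguity. The helper below is a hypothetical convenience, not part of the SDK, and uses only the standard library.

```python
from datetime import datetime, timedelta, timezone


def cutoff(days=0, hours=0, now=None):
    """Return a timezone-aware UTC datetime `days` and `hours` before `now`.

    `now` defaults to the current UTC time; passing it explicitly makes
    the result reproducible in tests.
    """
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days, hours=hours)
```

The result could then be passed as created_at_start, for example `FileListFilter(created_at_start=cutoff(days=30))`, assuming the filter accepts timezone-aware datetimes.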
Available List Operations
All entity types support unified list_* operations with pagination and filtering:
list_files(pagination_params, filter) - List uploaded files
list_jobs(pagination_params, filter) - List all jobs
list_parse_results(pagination_params, filter) - List parse results
list_extractions(pagination_params, filter) - List extractions
list_vector_stores(pagination_params, filter) - List vector stores
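Because these methods share one pagination contract (a result with items and next_token, and a continuation_token on the parameters), a single helper can walk all pages of any of them. The sketch below illustrates the idea with an async generator. StubParams, StubPage, and fake_list_files are hypothetical stand-ins that mimic only the fields shown on this page, so the example runs without the SDK; iter_items is likewise not an SDK function.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StubParams:
    """Stand-in for PaginationParams (only the fields used here)."""
    page_size: int = 50
    continuation_token: Optional[str] = None


@dataclass
class StubPage:
    """Stand-in for a list_* result: one page of items plus next_token."""
    items: list = field(default_factory=list)
    next_token: Optional[str] = None


async def iter_items(list_method, pagination_params, **kwargs):
    """Yield every item from a list_* style method, following next_token."""
    while True:
        page = await list_method(pagination_params=pagination_params, **kwargs)
        for item in page.items:
            yield item
        if not page.next_token:
            break
        # Feed the token back in to request the following page.
        pagination_params.continuation_token = page.next_token


async def fake_list_files(pagination_params, filter=None):
    """Hypothetical list method backed by five in-memory file names."""
    data = [f"file{i}.pdf" for i in range(5)]
    start = int(pagination_params.continuation_token or 0)
    end = start + pagination_params.page_size
    token = str(end) if end < len(data) else None
    return StubPage(items=data[start:end], next_token=token)


async def collect():
    params = StubParams(page_size=2)
    return [name async for name in iter_items(fake_list_files, params)]
```

Against a real project, the same helper would presumably be driven as `async for f in iter_items(project.list_files, pagination_params, filter=file_filter)`, and likewise for list_jobs or list_extractions.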
Supported File Types
Dex supports a wide variety of document formats:
Images
PNG, JPEG/JPG, GIF, BMP, TIFF, PCX, PPM, APNG, PSD, CUR, DCX, FTEX, PIXAR, HEIC
Documents
PDF, DOCX, DOC, DOTX, WPD, TXT, RTF, PPTX, PPT
Spreadsheets
CSV, XLSX, XLSM, XLS, XLTX, XLTM, QPW
For best results with spreadsheets, use XLSX format. CSV files are processed as-is without layout analysis.
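A quick local check before uploading can catch unsupported files early. The sketch below maps a file name to one of the three categories above. It assumes each listed format name corresponds to a file extension of the same spelling, which may not hold for every format; file_category and SUPPORTED are hypothetical names, not part of the SDK.

```python
from pathlib import Path
from typing import Optional

# Assumption: extensions mirror the format names listed above.
SUPPORTED = {
    "image": {
        "png", "jpeg", "jpg", "gif", "bmp", "tiff", "pcx", "ppm",
        "apng", "psd", "cur", "dcx", "ftex", "pixar", "heic",
    },
    "document": {
        "pdf", "docx", "doc", "dotx", "wpd", "txt", "rtf", "pptx", "ppt",
    },
    "spreadsheet": {
        "csv", "xlsx", "xlsm", "xls", "xltx", "xltm", "qpw",
    },
}


def file_category(path) -> Optional[str]:
    """Return "image", "document", or "spreadsheet", or None if unlisted."""
    # Compare case-insensitively and without the leading dot.
    ext = Path(path).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED.items():
        if ext in extensions:
            return category
    return None
```

Files for which file_category returns None can be skipped or reported before any upload is attempted.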
Access Response Data
SDK methods return wrapper objects; access data via .data:
# Correct
project_id = project.data.id
project_name = project.data.name
dex_file.data.id
dex_file.data.filename
dex_file.data.size_bytes
Appendix: Essential Imports
from dex_sdk.types import (
    PaginationParams,
    FileListFilter,
    ProjectConfiguration,
    RetentionPolicy,
)
from datetime import timedelta
Next Steps