Overview

Modern applications, especially those leveraging AI agents and complex tool use, are distributed and often opaque. Understanding the flow of data, performance bottlenecks, and error propagation across different services and functions is critical for debugging, optimization, and ensuring reliability.

Our Tracing capabilities provide deep observability into your workflows by allowing you to generate, collect, and visualize spans and traces. The goal is to reduce the effort needed to understand complex operations and to make them easier to debug and monitor.

At its core, Tracing now consists of:

  1. Spans: Individual units of work within your application.
  2. Traces: Collections of interconnected spans representing a complete workflow.
  3. Dedicated Traces Page: A new UI for visualizing, filtering, and drilling down into your tracing data.
  4. Flexible SDK & APIs: Tools to instrument your Python applications and integrate tracing directly into your code.

Traces & Spans Concepts

To effectively use tracing, it’s helpful to understand the fundamental concepts:

  • Span: A span represents a single unit of work within a trace, such as a function call, a database query, or an external API request. It captures crucial information about that operation, including:

    • name: A descriptive name for the operation.
    • input & output: Data flowing into and out of the operation.
    • trace_id: The identifier of the overall workflow this span belongs to.
    • parent_id: The identifier of the span that directly invoked this operation, allowing us to build a nested hierarchy.
    • start_time & end_time: Timestamps for the duration of the operation.
    • metadata: Arbitrary key-value pairs for additional context or custom annotations.
    • status: Indicates if the span completed successfully or with an error.
    • type: An optional span type, used for better filtering and visualization.
  • Trace: A trace is a collection of spans that together describe a complete, end-to-end workflow, such as a user’s web request, an AI agent’s complex decision-making process, or a batch processing job.

    • Every trace begins with a single “root span”, which represents the starting point of that entire workflow.
    • Server-side, we primarily record and visualize spans. The “trace” resource itself is implicitly derived from the relationships and common trace_id among spans, with the root span providing the top-level details for the workflow.
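
To make the relationship between spans and traces concrete, here is a minimal, self-contained sketch of how a trace can be derived from a flat list of spans by grouping on trace_id and following parent_id. SpanRecord, group_into_traces, and depth are hypothetical names used only for illustration, and the span_id field is an assumption (the identifier that parent_id points to); none of this is the SDK's or the server's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SpanRecord:
    """Hypothetical stand-in for a recorded span (not an SDK class)."""
    span_id: str                      # assumed identifier that parent_id refers to
    trace_id: str
    name: str
    parent_id: Optional[str] = None   # None marks the root span
    metadata: Dict[str, Any] = field(default_factory=dict)


def group_into_traces(spans):
    """Group spans by trace_id and pick out each trace's root span."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)
    traces = {}
    for trace_id, members in by_trace.items():
        root = next(s for s in members if s.parent_id is None)
        traces[trace_id] = {"root": root, "spans": members}
    return traces


def depth(span, spans_by_id):
    """Count the ancestors between a span and its root span."""
    levels = 0
    while span.parent_id is not None:
        span = spans_by_id[span.parent_id]
        levels += 1
    return levels


spans = [
    SpanRecord("s1", "t1", "handle_request"),
    SpanRecord("s2", "t1", "query_database", parent_id="s1"),
    SpanRecord("s3", "t1", "call_llm", parent_id="s2"),
    SpanRecord("s4", "t2", "batch_job"),
]
traces = group_into_traces(spans)
spans_by_id = {s.span_id: s for s in spans}
```

In this sketch no separate "trace" object is ever stored: the trace is whatever set of spans shares a trace_id, and its top-level details come from the one span whose parent_id is empty.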

For a deeper dive into the standardized tracing model, refer to the OpenTelemetry specification on Traces & Spans: https://opentelemetry.io/docs/specs/otel/overview/#tracing-signal.

Traces Page in the UI

The new dedicated Traces page in the SGP UI provides powerful capabilities to monitor, filter, and drill down into your trace data:

  • High-Level Monitoring: Get an immediate overview of your application’s workflows.
  • Filtering: Use basic filters now, with more in-depth filtering capabilities coming soon. You’ll be able to search on:
    • Static fields: Like a specific trace ID or a span type.
    • Dynamic fields: Custom key-value pairs found in your span’s metadata.
  • Trace Detail View: Clicking on any trace reveals a detailed breakdown, allowing you to:
    • View run-time statistics for each span.
    • Inspect input, output, and metadata associated with each span.
    • Trigger thread-forking (for SGP native traces) for advanced debugging.
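
The difference between static and dynamic fields can be illustrated with a small local sketch. It filters plain dictionaries in memory and is not the UI's or the API's filter implementation; the matches helper and the example span type values ("llm", "retrieval") are hypothetical.

```python
def matches(span, static=None, metadata=None):
    """Return True if a span matches every static field and metadata key given."""
    static = static or {}
    metadata = metadata or {}
    static_ok = all(span.get(key) == value for key, value in static.items())
    span_metadata = span.get("metadata", {})
    dynamic_ok = all(span_metadata.get(key) == value for key, value in metadata.items())
    return static_ok and dynamic_ok


spans = [
    {"trace_id": "t1", "type": "llm", "metadata": {"customer_tier": "premium"}},
    {"trace_id": "t1", "type": "retrieval", "metadata": {"customer_tier": "premium"}},
    {"trace_id": "t2", "type": "llm", "metadata": {"customer_tier": "free"}},
]

# Static field (span type) combined with a dynamic field (a metadata key).
premium_llm = [
    s for s in spans
    if matches(s, static={"type": "llm"}, metadata={"customer_tier": "premium"})
]
```

Static fields are looked up directly on the span, while dynamic fields are looked up inside its metadata, which is why consistent metadata keys at instrumentation time make later filtering easier.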

Getting Started with Tracing

Examples in this guide use the scale-gp-beta package, which runs exclusively on the V5 API.

The SGP Tracing SDK provides an ergonomic way to instrument your Python applications. The Python SDK is auto-generated using Stainless, with a custom code patch that adds the tracing helpers.

Initialization

Before creating any traces or spans, initialize the tracing SDK with your SGPClient. It’s best practice to do this once at your application’s entry point.

import scale_gp_beta.lib.tracing as tracing
from scale_gp_beta import SGPClient

# Initialize with your API Key and Account ID
client = SGPClient(api_key="YOUR_API_KEY", account_id="YOUR_ACCOUNT_ID")
tracing.init(client=client)

The tracing SDK uses the SGPClient for all data export requests. This means you can leverage existing client configurations, such as base_url or timeout settings, by passing a pre-configured client to tracing.init().

Creating Your First Trace & Spans

The most straightforward and recommended way to create traces and spans is by using them as context managers (with statements). This ensures that spans are started and ended automatically and that errors are captured, simplifying your instrumentation.
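
To see why the context-manager form is convenient, the following self-contained sketch shows the general pattern using only the standard library. It does not use the SDK: timed_span is a hypothetical stand-in, and the "SUCCESS" and "ERROR" status strings are assumptions chosen for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_span(name, sink):
    """Record start, end, and status of a block, even if the block raises."""
    record = {"name": name, "start_time": time.time(), "status": "SUCCESS"}
    try:
        yield record
    except Exception as exc:
        # Capture the failure on the record, then let the error propagate.
        record["status"] = "ERROR"
        record["error"] = repr(exc)
        raise
    finally:
        # Runs on both the success path and the error path.
        record["end_time"] = time.time()
        sink.append(record)


recorded = []

with timed_span("ok_step", recorded):
    pass

try:
    with timed_span("failing_step", recorded):
        raise ValueError("boom")
except ValueError:
    pass
```

Because the bookkeeping lives in the finally clause, the caller never has to remember to close a span or to record a failure by hand.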

Define a Workflow as a Trace

Use tracing.create_trace() as a context manager to define a new end-to-end workflow. This automatically creates the root span for your trace.

import scale_gp_beta.lib.tracing as tracing

def my_application_flow():
    with tracing.create_trace(name="my_customer_onboarding_process", metadata={"customer_tier": "premium"}):
        print("Starting customer onboarding...")
        # ... logic for onboarding
        print("Customer onboarding complete.")

Instrument Units of Work as Spans

Inside a create_trace block, use tracing.create_span() as a context manager. These spans will automatically inherit the trace_id from the current trace and will be nested under the currently active span (or the root span if no other span is active).

If you are working in a distributed setting, please see Distributed Tracing.

Quick Start Examples

For quick runnable examples, see the examples/tracing directory in the repository.

Traces API Endpoints

For direct interaction with span data via the API, beyond the SDK, you can use the following endpoints: