Distributed Tracing
Implement tracing across multiple processes and workers in your Python app.
Distributed applications, especially those involving multiple Python processes, threads, or asynchronous workers, introduce unique challenges for tracing. While the SGP Tracing SDK automatically manages span context within a single Python process (using `contextvars`), this automatic propagation does not extend across process boundaries or when work is explicitly dispatched to new, independent execution contexts.
The SGP backend expects well-formed trace data with clear parent-child relationships. Without careful management, there is a strong chance of race conditions or orphaned spans if child spans are reported before their parents, leading to incomplete or broken traces in the UI.
This guide outlines key strategies for effective tracing in multi-process and multi-worker environments.
Understanding Context Propagation
When using context managers (`with tracing.create_span(...)`), the SDK automatically sets the current span and trace in a context-local variable. However, the new execution context will not automatically inherit the `trace_id` or `parent_id` from the originating process when you:

- Spawn a new Python process (e.g., using `multiprocessing`).
- Enqueue a task to a background job queue (e.g., Celery, RQ).
- Dispatch work to a separate thread pool where contextvars might not propagate by default (though `threading` typically handles this better than `multiprocessing`).

In these cases, you must explicitly pass the context yourself.
Strategies for Distributed Tracing
One Trace Per Worker (Simplest)
For parallel work where strict hierarchical linking of every operation across workers isn’t necessary, the easiest approach is to create an independent trace for each worker or process.
You can then use a `group_id` to logically link these independent traces together, allowing you to see all related activity in the Traces page, even if they don’t form a single, continuous trace hierarchy. This is ideal for scenarios where workers process independent units of work concurrently.
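A minimal sketch of this pattern, using only the standard library: one shared `group_id` is generated up front and handed to every worker, while each worker produces its own unrelated `trace_id`. The `run_worker` helper and the SDK call in its comment are illustrative assumptions, not the SDK's actual API:

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

def run_worker(group_id: str, item: int) -> dict:
    # In a real worker you would start an independent trace tagged with the
    # shared group, e.g. tracing.create_span(..., group_id=group_id)
    # (call shape assumed).
    trace_id = str(uuid.uuid4())  # each worker gets its own, unrelated trace
    return {"group_id": group_id, "trace_id": trace_id, "result": item * 2}

group_id = str(uuid.uuid4())  # one shared ID links the traces in the UI
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda i: run_worker(group_id, i), range(3)))
```

Because each trace is self-contained, there is no parent-child ordering to worry about between workers; the `group_id` is pure metadata.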
Extending a Trace Across Workers
If you need to maintain a single, continuous trace hierarchy where operations performed in separate workers are direct children of a span in the main process (or another worker), you must manually propagate the tracing context.
This involves:

- Retrieving the `trace_id` and the `span_id` of the parent span in the originating process.
- Passing these IDs to the new worker/process (e.g., as function arguments or message queue payload fields).
- Using these explicit IDs when creating new spans in the worker, ensuring they are correctly linked as children.
Example: Passing Context via Function Arguments
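A minimal sketch of the pattern: the originating process reads the active span's IDs and passes them to the worker as ordinary arguments. The `trace_id`/`parent_id` keyword names in the comments, and the `span.trace_id`/`span.id` attribute names, are assumptions based on the IDs described above, not confirmed SDK signatures:

```python
import uuid

def process_item(trace_id: str, parent_span_id: str, payload: int) -> dict:
    """Runs in a separate worker; receives tracing context as plain arguments."""
    # Create the worker's span with explicit IDs so it is linked as a child
    # of the originating span (keyword names assumed):
    #   with tracing.create_span(name="process_item", trace_id=trace_id,
    #                            parent_id=parent_span_id):
    #       ...
    return {"trace_id": trace_id, "parent_id": parent_span_id, "result": payload * 2}

# In the originating process: read the active span's IDs (attribute names
# assumed) and hand them to the worker, e.g. as multiprocessing args or
# fields in a Celery/RQ task payload.
parent_trace_id = str(uuid.uuid4())  # stand-in for span.trace_id
parent_span_id = str(uuid.uuid4())   # stand-in for span.id
record = process_item(parent_trace_id, parent_span_id, 21)
```

The same dictionary of IDs works unchanged as a message-queue payload, since both values are plain strings.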
Important Considerations
- `tracing.flush_queue()`: Always call `tracing.flush_queue()` or `span.flush()` in the originating process before enqueuing a job or spawning a process that creates child spans. This helps ensure the parent span (and any preceding spans in that context) is sent to the backend before its children, reducing the chance of broken traces.
- Worker Initialization: Each independent Python process (e.g., a new `multiprocessing.Process`) will have its own tracing queue manager. Ensure `tracing.init()` is called within each worker’s entry point if you expect it to send tracing data. This typically means calling `tracing.init()` at the start of the function that the worker executes.
- Error Handling: In distributed systems, be diligent with error handling and ensure spans are ended correctly, even if an exception occurs. Context managers (`with tracing.create_span(...)`) handle this automatically within their scope.
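Putting the first two considerations together, a worker entry point and its dispatcher might look like the sketch below. The SDK calls appear only as comments because their exact signatures may differ; the runnable skeleton is standard `multiprocessing`:

```python
import multiprocessing

def worker_entry(trace_id: str, parent_span_id: str) -> dict:
    # A fresh process has its own tracing queue manager, so initialize the
    # SDK at the top of the entry point before creating any spans:
    #   tracing.init()
    #   with tracing.create_span(name="worker-step", trace_id=trace_id,
    #                            parent_id=parent_span_id):
    #       ...  # the context manager ends the span even if this raises
    return {"trace_id": trace_id, "parent_id": parent_span_id}

def dispatch_worker(trace_id: str, parent_span_id: str) -> multiprocessing.Process:
    # Flush pending spans first so the parent reaches the backend before
    # any children the worker reports:
    #   tracing.flush_queue()
    proc = multiprocessing.Process(
        target=worker_entry, args=(trace_id, parent_span_id)
    )
    proc.start()
    return proc
```

Note that `worker_entry` receives the context as plain arguments, exactly as in the function-argument example above, so the same function works as a Celery/RQ task body.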