Distributed Tracing: Making Sense of Microservice Latency

*By Zak Hassan — Staff SRE | May 2026*

Distributed tracing is the observability technique that makes microservice latency legible. In a monolith, a slow request is easy to profile — the call stack is right there. In a system where a single user-facing request fans out through 15 services, understanding which service is slow, why it's slow, and what upstream or downstream effects that slowness creates requires tracing.

But raw tracing data is easy to generate and hard to use. Most teams that instrument their services with OpenTelemetry end up with terabytes of trace data and limited ability to extract signal from it. This guide covers what makes tracing useful: structuring spans correctly, sampling intelligently, and building analysis workflows that surface the latency problems that matter.

What a Trace Is and Isn't

A trace is a record of a single request's journey through a distributed system. It's composed of spans — each span represents one unit of work (a service processing the request, a database call, an external API call). Spans have a parent-child relationship that forms the trace tree.
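To make the span tree concrete, here is a minimal sketch in plain Python. The `Span` fields, `build_trace_tree`, and `render` are hypothetical names chosen for illustration; real span records (OpenTelemetry, Jaeger) carry many more fields, such as start time, status, and attributes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Hypothetical, stripped-down span record
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    duration_ms: float
    children: list["Span"] = field(default_factory=list)


def build_trace_tree(spans: list[Span]) -> Span:
    """Link a flat list of spans from one trace into a tree and return the root."""
    by_id = {s.span_id: s for s in spans}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            by_id[s.parent_id].children.append(s)  # parent-child edge
    if root is None:
        raise ValueError("trace has no root span")
    return root


def render(span: Span, depth: int = 0) -> list[str]:
    """Indented text view of the tree, similar to a trace UI's waterfall."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms:g} ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


spans = [
    Span("t1", "a", None, "GET /api/orders/42", 240),
    Span("t1", "b", "a", "db.fetch_user", 30),
    Span("t1", "c", "a", "recommendation.score_computation", 180),
]
root = build_trace_tree(spans)
print("\n".join(render(root)))
```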

What traces are good at:

Identifying which service in a call chain is adding latency
Understanding call fan-out patterns (service A calls B, C, and D in parallel — which one is the bottleneck?)
Correlating an individual user complaint with specific system behavior
Debugging intermittent failures in specific code paths

What traces are not good at:

Statistical analysis of latency across large populations (use metrics for that)
Showing aggregate error rates (use metrics)
Understanding why a problem is happening (traces show what happened, not why the system is configured that way)

The most effective observability stacks use traces as the drill-down tool after metrics identify that something is wrong.

Span Design: What to Instrument

The choice of what to make a span is the highest-leverage decision in tracing. Too coarse and the trace doesn't isolate latency. Too fine and the trace is noise.

Instrument at these boundaries:

from flask import Flask
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

app = Flask(__name__)  # assumes a Flask app; other frameworks follow the same pattern
tracer = trace.get_tracer("my-service")

# 1. Every inbound HTTP/gRPC request (auto-instrumented by most frameworks)
# If not auto-instrumented:
@app.route("/api/orders/<order_id>")
def get_order(order_id: str):
    with tracer.start_as_current_span("get_order") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("user.id", get_current_user_id())
        return _get_order_internal(order_id)

# 2. Every outbound call — database, cache, external API, internal service
def fetch_user(user_id: str) -> User:
    with tracer.start_as_current_span("db.fetch_user") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.operation", "SELECT")
        span.set_attribute("db.table", "users")
        span.set_attribute("user.id", user_id)
        
        try:
            result = db.query("SELECT * FROM users WHERE id = %s", user_id)
            span.set_attribute("db.rows_returned", len(result))
            return result
        except Exception as e:
            span.set_status(Status(StatusCode.ERROR, str(e)))
            span.record_exception(e)
            raise

# 3. Significant internal computations (>10ms expected duration)
def compute_recommendation_scores(user_features: dict) -> list:
    with tracer.start_as_current_span("recommendation.score_computation") as span:
        span.set_attribute("recommendation.feature_count", len(user_features))
        scores = _run_scoring_model(user_features)
        span.set_attribute("recommendation.candidates_scored", len(scores))
        return scores

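# 4. Message queue publish/consume: PRODUCER span, with trace context carried in message headers
from opentelemetry import propagate

def publish_order_event(order: Order):
    with tracer.start_as_current_span("kafka.publish", kind=trace.SpanKind.PRODUCER) as span:
        span.set_attribute("messaging.system", "kafka")
        span.set_attribute("messaging.destination", "order-events")
        span.set_attribute("messaging.message_id", order.event_id)

        carrier = {}
        propagate.inject(carrier)  # lets the consumer continue this trace
        # Assumes a kafka-python KafkaProducer, whose headers are (str, bytes) pairs
        producer.send("order-events", order.to_bytes(),
                      headers=[(k, v.encode("utf-8")) for k, v in carrier.items()])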

Span attributes that make traces searchable:

The value of a span isn't just its duration: it's the attributes you attach that let you filter and correlate later. At minimum: service name, version, and environment (best set once per process as OpenTelemetry resource attributes, so every span inherits them), user or tenant ID where applicable, the specific entity being operated on (order ID, product ID), and error details when errors occur.

Trace Context Propagation: The Most Common Failure Mode

Distributed tracing breaks when context doesn't propagate across service boundaries. If service A calls service B but doesn't pass the trace ID in the request headers, B starts a new trace — and the traces are unlinked. You can't see the full call chain.

In production and production-like lab environments alike, most tracing failures are propagation failures rather than instrumentation failures.

# Correct: using auto-instrumented HTTP clients that handle propagation
import httpx
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

HTTPXClientInstrumentor().instrument()  # All httpx requests now propagate context

# Correct: explicit propagation for custom transports
from opentelemetry import propagate

def call_internal_service(endpoint: str, payload: dict) -> dict:
    headers = {}
    propagate.inject(headers)  # Inject W3C trace context headers
    # headers now contains traceparent (plus tracestate and baggage when they are set)
    
    response = custom_http_client.post(endpoint, json=payload, headers=headers)
    return response.json()

# Correct: async task queues — inject context into the job payload
from opentelemetry import trace

def enqueue_processing_job(data: dict):
    carrier = {}
    propagate.inject(carrier)  # {"traceparent": "00-abc123...-def456...-01"}
    
    job_payload = {
        "data": data,
        "trace_context": carrier  # Carry the trace context through the queue
    }
    queue.enqueue(job_payload)

def process_job(job_payload: dict):
    # On the consumer side, extract the producer's context and start a CONSUMER span under it
    ctx = propagate.extract(job_payload.get("trace_context", {}))
    tracer = trace.get_tracer("my-service")
    with tracer.start_as_current_span("process_job", context=ctx, kind=trace.SpanKind.CONSUMER):
        _process(job_payload["data"])

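Propagation format: use W3C TraceContext (the traceparent header) as the standard. If you have legacy services using B3 or Jaeger headers, configure the SDK with multiple propagators so services accept and emit both formats during the migration, for example OTEL_PROPAGATORS=tracecontext,baggage,b3multi (in Python, the B3 propagators ship in the separate opentelemetry-propagator-b3 package).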
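To make the header format and the propagation failure mode concrete, here is a stdlib-only sketch of what a server does with an incoming `traceparent` header (`version-traceid-parentid-flags`, all lowercase hex). It is a simplification for illustration, not a substitute for the SDK's propagator: it accepts only version `00`, ignores `tracestate`, and always samples new traces where a real SDK would consult its sampler. The function names are hypothetical.

```python
import re
import secrets
from typing import NamedTuple, Optional

# W3C Trace Context, version 00: 32-hex trace ID, 16-hex parent span ID, 2-hex flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: Optional[str]) -> Optional[TraceContext]:
    """Return the caller's context, or None if the header is missing or malformed."""
    m = _TRACEPARENT.match(header or "")
    if not m:
        return None
    trace_id, span_id, flags = m.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:  # all-zero IDs are invalid
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def start_server_span(headers: dict) -> TraceContext:
    """Continue the incoming trace if possible; otherwise start a brand-new one."""
    parent = parse_traceparent(headers.get("traceparent"))
    span_id = secrets.token_hex(8)  # fresh 64-bit ID for this service's span
    if parent is None:
        # Propagation failure: this span becomes the root of a new, unlinked trace
        return TraceContext(secrets.token_hex(16), span_id, sampled=True)
    return TraceContext(parent.trace_id, span_id, parent.sampled)


def to_traceparent(ctx: TraceContext) -> str:
    """Header value to send on outbound calls made from this span."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"
```

A request that arrives without the header gets a new trace ID, which is exactly the unlinked-trace symptom described earlier.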

Sampling: The Strategy That Determines What You Can Learn

You cannot store every trace in a high-traffic system — a service handling 10,000 requests per second generates 10,000 traces per second, and storing all of them is prohibitively expensive. Sampling is the decision of which traces to keep.
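The arithmetic behind that claim, as a small helper: at 10,000 requests per second, one trace per request is 864 million traces per day before sampling. The spans-per-trace and bytes-per-span figures in the usage lines are hypothetical assumptions, not measurements; substitute your own.

```python
SECONDS_PER_DAY = 86_400


def daily_trace_volume(requests_per_sec: float, sample_rate: float = 1.0,
                       spans_per_trace: int = 1, bytes_per_span: int = 0) -> tuple[float, float]:
    """Return (traces stored per day, bytes stored per day), assuming one trace per request."""
    traces_per_day = requests_per_sec * SECONDS_PER_DAY * sample_rate
    return traces_per_day, traces_per_day * spans_per_trace * bytes_per_span


# Hypothetical trace shape: 20 spans per trace, ~500 bytes per span before compression
for rate in (1.0, 0.01):
    traces, size = daily_trace_volume(10_000, rate, spans_per_trace=20, bytes_per_span=500)
    print(f"sample_rate={rate}: {traces:,.0f} traces/day, {size / 1e12:.2f} TB/day")
```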

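Head sampling makes the keep/drop decision when the root span starts, before the request's outcome is known, and passes it downstream in the trace context's sampled flag. It is simple and cheap, but because the decision is made blind, it drops rare errors and slow requests at the same rate as everything else.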
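One common way to implement head sampling is to make the decision a deterministic function of the trace ID, so any service that sees the same trace ID reaches the same verdict without coordination; OpenTelemetry's `TraceIdRatioBased` sampler works in a similar way. A minimal stdlib sketch, where `head_sample` is a hypothetical name:

```python
import secrets

_TWO_POW_64 = 1 << 64


def head_sample(trace_id: str, ratio: float) -> bool:
    """
    Keep/drop decision made when the root span starts, before the outcome is known.

    The lower 64 bits of the hex trace ID are compared against ratio * 2**64,
    so the same trace ID always gets the same answer, in any service.
    """
    low_64 = int(trace_id, 16) & (_TWO_POW_64 - 1)
    return low_64 < round(ratio * _TWO_POW_64)


# Simulate 100,000 root spans with random 128-bit trace IDs at a 1% ratio
trace_ids = [secrets.token_hex(16) for _ in range(100_000)]
kept = sum(head_sample(t, 0.01) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)}")  # roughly 1%; the exact count varies per run
```

In practice, downstream services usually honor the parent's sampled flag (OpenTelemetry's `ParentBased` sampler) rather than recomputing the decision, which keeps traces complete even if services are configured with different ratios.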

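Tail sampling makes the decision after the trace completes, letting you keep all error traces, all slow traces, and a random sample of everything else. It requires a central component (the OTel Collector) that buffers spans until the trace is complete. With more than one Collector instance, every span of a trace must reach the same instance, for example by routing through a load-balancing exporter keyed on trace ID in front of the sampling tier.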
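The core of tail sampling is a buffer keyed by trace ID plus a decision function that runs once the trace is done. Below is a toy, single-process sketch; the class, its thresholds, and the explicit `complete()` call are illustrative assumptions (the Collector decides on a timer, `decision_wait`, rather than on an explicit completion signal).

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FinishedSpan:
    trace_id: str
    start_ms: float
    end_ms: float
    is_error: bool


class TailSampler:
    """Toy tail sampler: buffer spans per trace ID, decide once the trace is complete."""

    def __init__(self, slow_ms: float, baseline_rate: float, rng: random.Random):
        self.slow_ms = slow_ms              # keep anything slower than this end-to-end
        self.baseline_rate = baseline_rate  # random share kept of everything else
        self.rng = rng
        self.buffer: dict[str, list[FinishedSpan]] = defaultdict(list)

    def add(self, span: FinishedSpan) -> None:
        self.buffer[span.trace_id].append(span)

    def complete(self, trace_id: str) -> bool:
        """Evict the trace's spans from the buffer and return True if the trace is kept."""
        spans = self.buffer.pop(trace_id, [])
        if not spans:
            return False
        if any(s.is_error for s in spans):
            return True  # keep every error trace
        duration_ms = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
        if duration_ms > self.slow_ms:
            return True  # keep every slow trace
        return self.rng.random() < self.baseline_rate  # random sample of the rest


sampler = TailSampler(slow_ms=2000, baseline_rate=0.01, rng=random.Random(42))
sampler.add(FinishedSpan("t-err", 0, 120, is_error=False))
sampler.add(FinishedSpan("t-err", 10, 90, is_error=True))  # a child span failed
print(sampler.complete("t-err"))  # True: error traces are always kept
```

The in-memory buffer is the cost of this approach, which is why the Collector config below bounds it with `num_traces` and `decision_wait`.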

# Tail sampling policy in the OTel Collector
processors:
  tail_sampling:
    decision_wait: 10s       # Wait 10s after a trace's first span before deciding
    num_traces: 50000        # Buffer up to 50k traces in memory
    expected_new_traces_per_sec: 1000

    # Top-level policies are OR'ed: a trace is kept if any of them says "sample".
    # To enforce a rate budget, the individual rules live inside one composite policy.
    policies:
      - name: composite
        type: composite
        composite:
          max_total_spans_per_second: 5000
          # Sub-policies are evaluated in order; the first one that matches and still
          # has budget left in its span-rate allocation keeps the trace
          policy_order: [keep-errors, keep-slow, keep-vip-users, probabilistic-sample]
          composite_sub_policy:
            # Keep error traces
            - name: keep-errors
              type: status_code
              status_code:
                status_codes: [ERROR]

            # Keep slow traces (>2 seconds end-to-end)
            - name: keep-slow
              type: latency
              latency:
                threshold_ms: 2000

            # Keep traces from specific high-value users
            - name: keep-vip-users
              type: string_attribute
              string_attribute:
                key: user.tier
                values: ["enterprise", "vip"]

            # Random sample of everything else
            - name: probabilistic-sample
              type: probabilistic
              probabilistic:
                sampling_percentage: 1    # Keep 1% of remaining traces
          rate_allocation:
            - policy: keep-errors
              percent: 30
            - policy: keep-slow
              percent: 30
            - policy: keep-vip-users
              percent: 10
            - policy: probabilistic-sample
              percent: 30

Trace Analysis: Finding the Slow Service

The trace UI (Jaeger, Tempo, Datadog APM) is good for investigating specific incidents. But systematic latency analysis requires querying across many traces.

The critical path problem: in a trace with parallel calls, the total latency is determined by the longest parallel branch — the critical path. A trace where service A calls B and C in parallel, B takes 50ms and C takes 200ms, has a critical path through C. Optimizing B doesn't help.

def find_critical_path(trace: dict) -> list[dict]:
    """
    Given a trace (as returned by the Jaeger API), find the critical path:
    the spans whose execution determined the total trace duration.

    Walk backwards from each span's end time: the child that finished last is the
    work the parent was still waiting on; then continue with children that finished
    before that child started. Parallel siblings that overlap it are off the path.
    """
    # Index children by parent span ID (Jaeger encodes the parent in 'references')
    children_by_parent: dict[str, list[dict]] = {}
    for span in trace['spans']:
        for ref in span.get('references', []):
            children_by_parent.setdefault(ref['spanID'], []).append(span)

    def get_end_time(span):
        return span['startTime'] + span['duration']

    def walk(span: dict) -> list[dict]:
        path = [span]
        cursor = get_end_time(span)
        # Latest-finishing children first
        children = sorted(children_by_parent.get(span['spanID'], []), key=get_end_time, reverse=True)
        for child in children:
            if get_end_time(child) <= cursor:
                path.extend(walk(child))
                cursor = child['startTime']  # earlier path work must end before this child began
        return path

    # Find root span (no parent reference)
    root_span = next(s for s in trace['spans'] if not s.get('references'))
    return walk(root_span)

# Aggregate: which services appear most often on the critical path?
from collections import Counter

def aggregate_critical_path_analysis(traces: list[dict]) -> dict:
    service_counts = Counter()

    for trace in traces:
        # Jaeger's JSON API keeps service names in trace['processes'], keyed by each span's processID
        processes = trace.get('processes', {})
        for span in find_critical_path(trace):
            service = processes.get(span.get('processID'), {}).get('serviceName', 'unknown')
            service_counts[service] += 1

    return dict(service_counts.most_common())

Trace-based SLO validation: for each user-facing operation, pull the traces that exceed the latency SLO end-to-end, so each breach can be tied to the service responsible:

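# Query Tempo's search API for traces over the SLO threshold
import time
import requests

TEMPO_URL = "http://tempo:3200"  # placeholder; point at your Tempo query endpoint

def get_traces_exceeding_slo(operation: str, slo_ms: int, lookback_minutes: int = 60) -> list:
    now = int(time.time())
    params = {
        # Assumes spans carry an "operation" attribute; adjust the tag to your schema
        "tags": f"operation={operation}",
        "minDuration": f"{slo_ms}ms",
        "start": now - lookback_minutes * 60,  # Tempo's search API takes Unix seconds
        "end": now,
        "limit": 100,
    }
    response = requests.get(f"{TEMPO_URL}/api/search", params=params, timeout=10)
    response.raise_for_status()
    return response.json().get("traces", [])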
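Search hits are easier to act on once grouped. A hedged sketch that ranks breaching traces by root service and operation, assuming Tempo-style search results with `rootServiceName`, `rootTraceName`, `durationMs`, and `traceID` fields (verify these against your Tempo version's response); `summarize_slow_traces` is a hypothetical helper.

```python
from collections import defaultdict


def summarize_slow_traces(traces: list[dict]) -> list[dict]:
    """Group slow-trace search hits by root service/operation, worst offenders first."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for t in traces:
        key = (t.get("rootServiceName", "unknown"), t.get("rootTraceName", "unknown"))
        groups[key].append(t)

    summary = []
    for (service, operation), hits in groups.items():
        worst = max(hits, key=lambda t: t.get("durationMs", 0))
        summary.append({
            "service": service,
            "operation": operation,
            "count": len(hits),
            "max_ms": worst.get("durationMs", 0),
            "example_trace_id": worst.get("traceID"),  # open this one in the trace UI first
        })
    # Most frequent breaches first; ties broken by the slowest example
    return sorted(summary, key=lambda row: (row["count"], row["max_ms"]), reverse=True)
```

The output of `get_traces_exceeding_slo` can be passed straight in.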

The Exemplar Bridge: Connecting Metrics to Traces

The most powerful observability workflow connects metrics to traces: a latency histogram shows a spike, you click on it, and it takes you to a sample trace from that time period. This requires exemplars — trace IDs embedded in metric data points.

import time

from opentelemetry import trace
from prometheus_client import Histogram

request_duration = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration',
    ['method', 'endpoint', 'status_code'],
    buckets=[.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]
)

def handle_request(method: str, endpoint: str):
    start = time.time()
    try:
        response = _process_request()
        duration = time.time() - start
        
        # Attach the current trace as an exemplar, but only when a span is actually active
        span_context = trace.get_current_span().get_span_context()
        exemplar = None
        if span_context.is_valid:
            exemplar = {
                "traceID": format(span_context.trace_id, '032x'),
                "spanID": format(span_context.span_id, '016x'),
            }

        # Exemplars are only exposed in the OpenMetrics format; Prometheus keeps them
        # only when started with --enable-feature=exemplar-storage
        request_duration.labels(
            method=method,
            endpoint=endpoint,
            status_code=str(response.status_code)
        ).observe(duration, exemplar=exemplar)
        
        return response
    except Exception as e:
        # ... error handling
        raise

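With exemplars in place, and the Grafana Prometheus data source configured to link the exemplar's traceID label to your tracing data source, Grafana can display exemplar points on histogram panels: clicking one during a latency spike jumps directly to a trace recorded in that exact time window.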

*Zak Hassan is a Staff SRE specializing in observability engineering, distributed systems tracing, and AI-powered operations. Find him at zakhassan.com or on LinkedIn.*
