When I first started working with microservices, debugging a single failed request felt like trying to find a needle in a haystack, except the haystack was spread across five different servers and three different programming languages. Standard logs weren’t enough because they lacked a common thread. This is why I shifted to distributed tracing. In this step-by-step guide to distributed tracing with Jaeger, I’ll show you how to implement a system that lets you visualize the entire lifecycle of a request as it travels through your architecture.
Distributed tracing isn’t just about finding errors; it’s about understanding latency. If a page load takes 2 seconds, is the bottleneck in the database query, the authentication middleware, or a slow third-party API? Jaeger gives you the answer visually.
Prerequisites
Before we dive into the implementation, ensure you have the following installed and configured on your machine:
- Docker & Docker Compose: We’ll use these to run Jaeger without manual installation.
- A basic Microservices Project: I’ll use a simple Node.js and Python setup, but the concepts apply to any language.
- OpenTelemetry SDKs: Since Jaeger is fully compatible with OpenTelemetry, we will use OTel for instrumentation. If you’re new to this, I highly recommend reading my introduction to OpenTelemetry for developers to understand the standard.
Step 1: Deploying the Jaeger Backend
The fastest way to get started is using the “all-in-one” Docker image. This combines the agent, collector, and UI into a single container, which is perfect for development and testing.
```bash
docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 14250:14250 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest
```
Once the container is running, you can access the Jaeger UI at http://localhost:16686. At this stage, the dashboard will be empty because we haven’t sent any traces yet.
Step 2: Instrumenting Your Application
To make your apps “traceable,” you need to instrument them. While you can use Jaeger-specific libraries, I always use OpenTelemetry (OTel) because it prevents vendor lock-in. Here is how I set up a Node.js service to send spans to Jaeger.
```js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

const sdk = new NodeSDK({
  // Identifies this service in the Jaeger UI; every service needs a unique name.
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'order-service',
  }),
  // Send spans over Thrift/HTTP to the collector endpoint on port 14268
  // (port 14250 is the collector's gRPC port, which this exporter does not use).
  traceExporter: new JaegerExporter({
    endpoint: 'http://localhost:14268/api/traces',
  }),
});

sdk.start();
```
In my experience, the most critical part of this setup is the SERVICE_NAME. If you give multiple services the same name, your traces will overlap in the UI, making it impossible to distinguish between them.
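One caveat: the snippet above registers the SDK but doesn’t tell it what to trace, so no spans will be created for incoming requests on their own. A common approach, sketched below assuming the @opentelemetry/auto-instrumentations-node package is installed, is to pass auto-instrumentations so HTTP, Express, and database calls produce spans without manual code:

```js
// A sketch assuming @opentelemetry/auto-instrumentations-node is installed.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  // ...resource and traceExporter exactly as above...
  // Automatically create spans for HTTP servers/clients, Express, databases, etc.
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```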
Step 3: Propagating Context Across Services
Tracing only works if the trace ID is passed from Service A to Service B. This is called Context Propagation. When Service A calls Service B via HTTP, it injects a header (usually `traceparent`). Service B then extracts this header and starts its own span as a child of the original trace.
This creates a parent-child relationship that Jaeger visualizes as a Gantt-style timeline, allowing you to see exactly where time is being spent.
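Here is a minimal sketch of doing this propagation by hand with the @opentelemetry/api package; the service and span names are illustrative, and with auto-instrumentation (as in Step 2) this header handling happens automatically:

```js
// A minimal sketch of manual context propagation with @opentelemetry/api.
// Assumes a registered SDK with the default W3C TraceContext propagator.
const { context, propagation, trace } = require('@opentelemetry/api');

// Service A: inject the active trace context into outgoing request headers.
function outgoingHeaders() {
  const headers = {};
  propagation.inject(context.active(), headers); // writes the traceparent header
  return headers;
}

// Service B: extract the incoming context and start a child span under it.
function handleRequest(req) {
  const parentContext = propagation.extract(context.active(), req.headers);
  const span = trace
    .getTracer('payment-service') // hypothetical service name
    .startSpan('process-payment', {}, parentContext);
  // ... do the actual work ...
  span.end();
}
```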
Step 4: Analyzing Traces in the Jaeger UI
Now, trigger a few requests in your application. Go back to http://localhost:16686, select your service from the dropdown, and click “Find Traces.”
When you click on a trace, you’ll see the timeline view. Look for “long bars”—these represent slow operations. You can click on a span to see the tags and logs associated with it, such as the HTTP status code or specific error messages. This integrates perfectly with best practices for centralized logging in microservices, as you can use the trace_id to jump from a log entry directly to the visual trace.
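As a rough sketch of that jump (assuming the SDK from Step 2 is running; logWithTrace is a hypothetical helper, not a library function), you can stamp every log line with the active trace ID:

```js
// A minimal sketch of log correlation, assuming the SDK from Step 2 is running.
const { trace } = require('@opentelemetry/api');

function logWithTrace(message) {
  const span = trace.getActiveSpan();
  // Stamp the log line with the active trace ID so you can paste it into the Jaeger UI.
  const traceId = span ? span.spanContext().traceId : 'no-active-trace';
  console.log(JSON.stringify({ message, trace_id: traceId }));
}

// Example output: {"message":"payment failed","trace_id":"4bf92f3577b34da6..."}
```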
Pro Tips for Jaeger Production Setups
- Sampling Rates: Don’t trace 100% of requests in production. It will overwhelm your storage and add latency. Start with a 1% or 5% sampling rate (see the sketch after this list).
- Use a Collector: In production, don’t send spans directly from the app to the Jaeger backend. Use the Jaeger Collector or the OpenTelemetry Collector as a middleman to buffer and batch data.
- Custom Tags: Add business-specific tags (e.g., `customer_id` or `order_type`) to your spans. This allows you to search for traces affecting a specific high-value customer.
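Here is a hedged sketch of the sampling and custom-tag tips, building on the Step 2 setup; the 5% ratio and the attribute values are purely illustrative:

```js
// A sketch of production-leaning tweaks, building on the NodeSDK setup from Step 2.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');
const { trace } = require('@opentelemetry/api');

const sdk = new NodeSDK({
  // Sample 5% of new traces at the root; child spans follow their parent's decision.
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.05),
  }),
  // ...resource and traceExporter as in Step 2...
});

// Inside a request handler: attach business-specific tags to the active span.
const span = trace.getActiveSpan();
if (span) {
  span.setAttribute('customer_id', 'cust-42');     // hypothetical values
  span.setAttribute('order_type', 'subscription');
}
```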
Troubleshooting Common Issues
| Issue | Likely Cause | Solution |
|---|---|---|
| No traces appearing in UI | Incorrect exporter endpoint or firewall blocking port 14268. | Verify connectivity with curl and check the app logs for export errors. |
| Traces are fragmented (multiple traces for one request) | Context propagation is missing between services. | Ensure the OTel propagation headers are being sent and received in HTTP calls. |
| High CPU usage on app | Tracing overhead due to 100% sampling. | Implement a ParentBased sampler to reduce data volume. |
What’s Next?
Now that you have basic distributed tracing with Jaeger implemented, you should look into integrating your traces with metrics. While Jaeger tells you why a request was slow, a tool like Prometheus tells you how many requests are slow. Combining the three pillars of logs, metrics, and traces is the gold standard of modern observability.
Ready to level up your infrastructure? Check out my other guides on OpenTelemetry and log aggregation to build a bulletproof monitoring stack.