When I first started working with microservices, debugging a single failed request felt like trying to find a needle in a haystack, except the haystack was spread across five different servers and three different programming languages. Standard logs weren't enough because they lacked a common thread, which is why I shifted to distributed tracing. In this step-by-step guide to distributed tracing with Jaeger, I'll show you how to implement a system that lets you visualize the entire lifecycle of a request as it travels through your architecture.

Distributed tracing isn’t just about finding errors; it’s about understanding latency. If a page load takes 2 seconds, is the bottleneck in the database query, the authentication middleware, or a slow third-party API? Jaeger gives you the answer visually.

Prerequisites

Before we dive into the implementation, ensure you have the following installed and configured on your machine:

- Docker (used to run the Jaeger backend in Step 1)
- Node.js and npm (used for the instrumentation example in Step 2)
- A service or two you can send test requests to

Step 1: Deploying the Jaeger Backend

The fastest way to get started is using the “all-in-one” Docker image. This combines the agent, collector, and UI into a single container, which is perfect for development and testing.

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 14250:14250 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest

Once the container is running, you can access the Jaeger UI at http://localhost:16686. At this stage, the dashboard will be empty because we haven’t sent any traces yet.

Step 2: Instrumenting Your Application

To make your apps “traceable,” you need to instrument them. While you can use Jaeger-specific libraries, I always use OpenTelemetry (OTel) because it prevents vendor lock-in. Here is how I set up a Node.js service to send spans to Jaeger.

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

const sdk = new NodeSDK({
  // The service name is how this app appears in the Jaeger UI dropdown.
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'order-service',
  }),
  // The Jaeger exporter sends spans over HTTP to the collector's
  // /api/traces endpoint (port 14268); port 14250 is the gRPC port
  // and will silently reject these HTTP payloads.
  traceExporter: new JaegerExporter({
    endpoint: 'http://localhost:14268/api/traces',
  }),
});

sdk.start();

In my experience, the most critical part of this setup is the SERVICE_NAME. If you give multiple services the same name, your traces will overlap in the UI, making it impossible to distinguish between them.

Step 3: Propagating Context Across Services

Tracing only works if the trace-id is passed from Service A to Service B. This is called Context Propagation. When Service A calls Service B via HTTP, it injects a header (usually traceparent). Service B then extracts this header and starts its own span as a child of the original trace.
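To make this concrete, here is a minimal, dependency-free sketch of the W3C traceparent header that OTel's propagator injects and extracts for you. The field layout is version-traceId-spanId-flags; the IDs below are illustrative values, not output from a real tracer:

```javascript
// Build and parse a W3C traceparent header: version-traceId-spanId-flags.
// In practice OTel's HTTP instrumentation does this automatically; this
// sketch only shows what travels on the wire between Service A and B.

function buildTraceparent(traceId, spanId, sampled = true) {
  const flags = sampled ? '01' : '00';
  return `00-${traceId}-${spanId}-${flags}`;
}

function parseTraceparent(header) {
  const [version, traceId, spanId, flags] = header.split('-');
  return { version, traceId, spanId, sampled: flags === '01' };
}

// Service A injects the header on its outgoing call...
const header = buildTraceparent(
  '4bf92f3577b34da6a3ce929d0e0e4736', // 128-bit trace id (hex)
  '00f067aa0ba902b7'                  // 64-bit parent span id (hex)
);

// ...and Service B extracts it and starts its child span under the
// same trace id, which is what stitches the two services together.
const ctx = parseTraceparent(header);
console.log(ctx.traceId);
```

Because every service in the chain reuses the same trace id, Jaeger can assemble one end-to-end timeline no matter how many hops the request makes.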

As shown in the image below, this creates a parent-child relationship that Jaeger visualizes as a Gantt chart, allowing you to see exactly where time is being spent.

Jaeger UI showing a distributed trace timeline with parent and child spans

Step 4: Analyzing Traces in the Jaeger UI

Now, trigger a few requests in your application. Go back to http://localhost:16686, select your service from the dropdown, and click “Find Traces.”

When you click on a trace, you’ll see the timeline view. Look for “long bars”—these represent slow operations. You can click on a span to see the tags and logs associated with it, such as the HTTP status code or specific error messages. This integrates perfectly with best practices for centralized logging in microservices, as you can use the trace_id to jump from a log entry directly to the visual trace.
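One handy trick for that log-to-trace jump: the Jaeger UI supports deep links of the form /trace/<traceId>, so if your log entries carry a trace_id field you can turn any log line into a direct link to its trace. A small sketch (the log entry shape and Jaeger host here are assumptions for illustration):

```javascript
// Given a structured log entry that includes the trace_id, build a direct
// link into the Jaeger UI's trace view (route: /trace/<traceId>).
// The log shape and the Jaeger base URL are illustrative assumptions.

function jaegerTraceUrl(logEntry, jaegerBase = 'http://localhost:16686') {
  if (!logEntry.trace_id) {
    throw new Error('log entry has no trace_id field');
  }
  return `${jaegerBase}/trace/${logEntry.trace_id}`;
}

const entry = {
  level: 'error',
  message: 'payment declined',
  trace_id: '4bf92f3577b34da6a3ce929d0e0e4736', // hypothetical id
};

console.log(jaegerTraceUrl(entry));
```

Wiring this into your log viewer means an on-call engineer can go from an error line to the full visual trace in one click.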

Pro Tips for Jaeger Production Setups

The all-in-one image stores traces in memory, so everything is lost on restart; for production, deploy the collector and query services separately and back them with a persistent store such as Elasticsearch or Cassandra. Also dial back sampling: recording 100% of traces is fine in development but adds real overhead under production load.

Troubleshooting Common Issues

| Issue | Likely Cause | Solution |
| --- | --- | --- |
| No traces appearing in UI | Incorrect exporter endpoint, or a firewall blocking the collector port. | Verify connectivity with curl and check the app logs for export errors. |
| Traces are fragmented (multiple traces for one request) | Context propagation is missing between services. | Ensure the OTel propagation headers are being sent and received in HTTP calls. |
| High CPU usage on app | Tracing overhead due to 100% sampling. | Implement a ParentBased sampler to reduce data volume. |
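The sampler fix in the last row is worth unpacking. A ParentBased sampler honors the caller's decision, so a trace is either recorded in full or dropped in full, and a probabilistic rule is only applied at the root span. Here is a dependency-free sketch of that decision logic; the real samplers live in @opentelemetry/sdk-trace-base as ParentBasedSampler and TraceIdRatioBasedSampler, and the hashing detail below is a simplification:

```javascript
// Simplified parent-based ratio sampling: honor the parent's decision if
// there is one; otherwise sample a deterministic fraction of root traces
// based on the trace id, so all services agree on the same decision.

function shouldSample(traceId, parentSampled, ratio) {
  if (parentSampled !== undefined) {
    // ParentBased: a child never contradicts its parent, which keeps
    // traces complete instead of recording random fragments.
    return parentSampled;
  }
  // Root span: map the first 8 hex chars of the trace id onto [0, 1)
  // and compare against the configured ratio.
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
  return bucket < ratio;
}

// A child span always follows its parent's decision...
console.log(shouldSample('ffffffff00000000', true, 0.1));
// ...while a root trace whose id hashes "high" is dropped at a 10% ratio.
console.log(shouldSample('ffffffff00000000', undefined, 0.1));
```

Because the decision is a pure function of the trace id, every service that sees the same id makes the same call, which is exactly why ratio sampling plays nicely with context propagation.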

What’s Next?

Now that you have basic distributed tracing with Jaeger in place, you should look into integrating your traces with metrics. While Jaeger tells you why a request was slow, a tool like Prometheus tells you how many requests are slow. Combining all three pillars (logs, metrics, and traces) is the gold standard of modern observability.

Ready to level up your infrastructure? Check out my other guides on OpenTelemetry and log aggregation to build a bulletproof monitoring stack.