Distributed Tracing with Jaeger: Getting Started

Disclaimer: This post was originally written for my company’s blog. I’ve translated it and added a personal touch for this space.

Running microservices in Kubernetes brings quite a few advantages: easy scaling, flexibility, and efficiency. But the more services you add, the trickier it gets to figure out what’s slowing your system down.

One of our clients at my company b’nerd ran into exactly this problem. Even with a nicely tuned cluster, some requests were lagging, and standard metrics and logs didn’t give enough clues to find the bottleneck.

That’s where distributed tracing came in. By setting up Jaeger, we could see the full path of requests across services, and pinpoint the bottleneck causing the latency.

In this post, I’ll walk you through Jaeger: why it’s so handy in Kubernetes, what it offers, and how we can use it.

Why Jaeger?

Jaeger is an open-source distributed tracing system originally developed by Uber. It helps us analyze and visualize requests that span multiple microservices.

Here are some of the benefits it brings to the table:

  • Kubernetes-friendly: Jaeger is made for containerized setups, so it slots nicely into a Kubernetes-native stack.
  • Clear traces: Unlike logs and metrics that only show pieces, Jaeger gives a structured, visual flow of requests.
  • High scalability: Big cluster? No problem. Jaeger can process tons of trace data without breaking a sweat.
  • OpenTelemetry support: Works smoothly with OpenTelemetry, so you’re set for the future and not locked into any vendor.

With Jaeger, we can finally answer questions like:

  • Why is this API endpoint slower than expected?
  • Which services are talking too much or inefficiently?
  • How can we add better observability to complement Prometheus or Loki?

Best Practices for Jaeger in Kubernetes

To get the most out of Jaeger, a few strategies and configurations can help:

Installation

The simplest way to deploy Jaeger is via the official Helm chart:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm install jaeger jaegertracing/jaeger

You could also use raw manifests or the Jaeger Operator if you prefer - whatever floats your boat.
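
Once the chart is up, a quick way to peek at the UI is a port-forward to the query service - the service name below is what the chart typically creates for a release called "jaeger", so adjust it if yours differs:

kubectl port-forward svc/jaeger-query 16686:16686

Then open http://localhost:16686 in the browser.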

Storage Strategy

Jaeger supports multiple storage backends. While in-memory storage is great for testing or small setups, you should definitely go with Elasticsearch or Cassandra for anything production-related. There’s even a gRPC plugin if you want a custom backend.
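
As a rough sketch, pointing the Helm chart at an existing Elasticsearch could look something like this - the exact value names are an assumption on my side, so double-check them against the chart’s values.yaml:

# values.yaml - sketch only, verify against the chart's documented values
storage:
  type: elasticsearch
  elasticsearch:
    host: elasticsearch-master   # assumed in-cluster Elasticsearch service
    port: 9200

Then apply it with helm upgrade jaeger jaegertracing/jaeger -f values.yaml.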

Sampling Strategy

Tracing every request can slow things down. Jaeger gives us a few options (a sample configuration follows the list):

  • Probabilistic: Only trace a percentage of requests.
  • Adaptive: Adjusts dynamically based on system load.
  • Constant: Trace everything (or nothing) - handy for dev environments.
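
The strategies can be spelled out per service in a small JSON file that the collector loads via its --sampling.strategies-file flag; the service names below are just placeholders:

{
  "default_strategy": { "type": "probabilistic", "param": 0.1 },
  "service_strategies": [
    { "service": "checkout-service", "type": "probabilistic", "param": 0.5 },
    { "service": "health-check", "type": "probabilistic", "param": 0.01 }
  ]
}

Here everything defaults to sampling 10% of requests, while the hypothetical checkout-service gets 50% and the noisy health-check almost nothing.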

Security Considerations

As always, we shouldn’t forget about security. Use RBAC to control who can see the Jaeger UI or API. Network policies can also help prevent accidental exposure of traces.
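
As a sketch, a NetworkPolicy that only lets traffic from a trusted namespace reach the Jaeger query pods could look like this - the namespace and labels are placeholders, so match them to your own install:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-jaeger-query
  namespace: observability                # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: query  # label depends on how Jaeger was installed
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              team: platform              # placeholder label on the trusted namespace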

Getting Traces Into the Services

Once Jaeger is installed and running, we need to actually collect data from our services.

Instrumentation

To collect trace data from our services, we need to add some instrumentation. Jaeger previously offered its own client libraries, but these have been deprecated; OpenTelemetry is now the recommended approach - and given that everyone is talking about it anyway, that’s what we’ll use.

Here’s a minimal example for Node.js:

// Register a global tracer provider so OpenTelemetry instrumentation can record spans
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const provider = new NodeTracerProvider();
provider.register();

And voilà - the SDK can record spans. On its own, though, this only registers a tracer provider; to actually get traces into Jaeger we still need an exporter pointing at the collector, plus instrumentation for the libraries we use.
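
Here’s a slightly fuller sketch using the OpenTelemetry Node SDK with auto-instrumentation and an OTLP exporter. The packages are standard OpenTelemetry ones, but the collector URL is an assumption - adjust it to your cluster and make sure OTLP ingestion is enabled on the collector:

// tracing.js - load before the app starts, e.g. node -r ./tracing.js server.js
const { NodeSDK } = require("@opentelemetry/sdk-node");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-http");
const { getNodeAutoInstrumentations } = require("@opentelemetry/auto-instrumentations-node");

const sdk = new NodeSDK({
  // Send spans to the Jaeger collector's OTLP/HTTP endpoint (placeholder address)
  traceExporter: new OTLPTraceExporter({
    url: "http://jaeger-collector.observability:4318/v1/traces",
  }),
  // Auto-instrument common libraries such as http and express
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

Started this way, incoming requests show up as traces in the Jaeger UI under the service name set via the OTEL_SERVICE_NAME environment variable.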

Automating with Alerts

We can also wire this into Prometheus and Alertmanager to get notified about slow requests - the metric name below depends on how your trace-derived metrics are exported, so adjust it to your setup:

- alert: HighRequestLatency
  expr: histogram_quantile(0.99, rate(jaeger_traces_latency_bucket[5m])) > 1
  for: 5m
  labels:
    severity: critical

This way, we don’t have to stare at dashboards constantly - the alerts come to us.

Wrapping up

Jaeger has been a game-changer for us when debugging distributed systems in Kubernetes. We can finally see the full journey of a request, find bottlenecks fast, and optimize our services with confidence.

If your cluster runs multiple microservices, I’d seriously recommend giving Jaeger a try.