I remember the first time my production Prometheus instance went into a crash loop. I had just added a new label to my request metrics—user_id—thinking it would be great for debugging. Within an hour, the memory usage spiked, the OOM killer stepped in, and my entire monitoring stack vanished. I had fallen into the classic trap of scaling Prometheus for high cardinality metrics without a plan.

High cardinality occurs when a metric has labels with a huge number of unique values. While Prometheus is incredibly efficient, every unique combination of label values creates a new time series. When you have millions of these, your RAM usage explodes and your queries slow to a crawl. If you’ve already learned how to set up Prometheus and Grafana on Kubernetes, you know the basics—but scaling for cardinality requires a different mental model.

The Challenge: The Cardinality Explosion

In Prometheus, a time series is identified by the metric name and its label set. For example, http_requests_total{method="GET", endpoint="/api/v1"} is one series. If you add user_id as a label, and you have 100,000 users, you suddenly have 100,000 time series for every single combination of method and endpoint.
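To make the multiplication concrete, here is the back-of-the-envelope math (the method and endpoint counts are illustrative):

# Series-count math (figures are illustrative)
5 methods × 40 endpoints                      =        200 series
5 methods × 40 endpoints × 100,000 user_ids   = 20,000,000 series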

This puts immense pressure on the TSDB (Time Series Database) in three ways:

  1. Memory: every active series keeps a head chunk and index entries in RAM, which is exactly what drives the OOM kills described above.
  2. Disk and compaction: more series means a larger index, a heavier WAL to replay on restart, and longer compaction cycles.
  3. Query performance: PromQL has to locate and merge far more series per query, so dashboards slow to a crawl.

Solution Overview: Strategies for Scale

When I’m tasked with scaling Prometheus for high cardinality metrics, I follow a hierarchy of optimization: first, eliminate the waste; second, optimize the queries; and third, distribute the load.

1. The “Clean House” Approach (Label Management)

The most effective way to scale is to stop collecting data you don’t need. I’ve found that 80% of cardinality issues are caused by labels like user_id, request_id, or timestamp. These belong in logs (like Loki or ELK), not in metrics.
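Here’s a minimal sketch of what that looks like in a scrape config, assuming a job called api and a debug_* metric family you want to drop entirely (both names are illustrative):

# prometheus.yml (snippet) – job name, target, and metric names are illustrative
scrape_configs:
  - job_name: "api"
    static_configs:
      - targets: ["api:8080"]
    metric_relabel_configs:
      # Strip the user_id label from every scraped series
      - action: labeldrop
        regex: user_id
      # Drop entire metric families you never query
      - source_labels: [__name__]
        regex: "debug_.*"
        action: drop

Because metric_relabel_configs runs after the scrape but before ingestion, the dropped labels never become series in the TSDB. Just make sure the remaining label set still uniquely identifies each series, or samples will collide.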

2. Recording Rules for Heavy Queries

If you have a high-cardinality metric that you must keep, but you only need the aggregated view in your Grafana dashboards (for example, a custom Grafana dashboard for a Node.js service), use Recording Rules. These pre-calculate expensive queries and save the result as a new, low-cardinality time series.

# prometheus.rules.yml
groups:
  - name: aggregation_rules
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

Now, instead of querying millions of series, Grafana queries one series per job. As shown in the performance comparison below, this can reduce dashboard load times from 10 seconds to 200ms.
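In the Grafana panel, the query then becomes the recorded series itself rather than the raw rate() expression; the job value here is illustrative:

# Grafana panel query (PromQL)
job:http_requests:rate5m{job="api"}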

Benchmark chart comparing raw Prometheus query time vs Recording Rule query time

Implementation: Scaling with Thanos and Cortex

Once you’ve optimized your labels and rules, and you’re still hitting limits, it’s time to move beyond a single Prometheus server. I typically recommend Thanos for its simplicity in adding long-term storage and a global query view.

Deploying the Thanos Sidecar

By adding a Thanos sidecar container to your Prometheus pods, you can ship completed blocks of data to object storage (S3/GCS). This relieves the local disk bottleneck and lets you scale horizontally by sharding your metrics across multiple Prometheus instances.
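Here is a minimal sketch of the sidecar container in the Prometheus pod spec; the image tag, volume names, and bucket config path are assumptions you’d adapt to your setup:

# Extra container in the Prometheus pod spec (names and version are illustrative)
- name: thanos-sidecar
  image: quay.io/thanos/thanos:v0.34.1
  args:
    - sidecar
    - --tsdb.path=/prometheus                          # same volume Prometheus writes its blocks to
    - --prometheus.url=http://localhost:9090           # reaches Prometheus over localhost inside the pod
    - --objstore.config-file=/etc/thanos/objstore.yml  # S3/GCS bucket configuration
  ports:
    - name: grpc
      containerPort: 10901                             # Thanos Query connects here for the global view
  volumeMounts:
    - name: prometheus-data
      mountPath: /prometheus
    - name: thanos-objstore
      mountPath: /etc/thanos

For uploads to work cleanly, Prometheus itself should run with --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration set to the same value (typically 2h) so blocks are never compacted locally before the sidecar ships them.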

Pro Tip: If you are managing a massive multi-tenant environment, look into Cortex or Mimir. They provide a fully distributed TSDB architecture that handles cardinality much more gracefully than a standalone Prometheus instance.

Case Study: From 1M to 50M Series

In a recent project, I helped a client scale their monitoring from 1 million to 50 million active series. We implemented a three-pronged strategy:

  1. Label Dropping: We used metric_relabel_configs to drop high-cardinality labels from non-critical services.
  2. Remote Write: We shifted the storage burden to a managed Mimir cluster (see the config sketch after this list).
  3. Query Optimization: We replaced count(up) with specific recording rules for service health.
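
For the remote-write piece, the Prometheus side is only a few lines; the endpoint URL and tenant header below are placeholders for whatever your Mimir cluster exposes:

# prometheus.yml (snippet) – endpoint URL and tenant ID are placeholders
remote_write:
  - url: https://mimir.example.com/api/v1/push
    headers:
      X-Scope-OrgID: team-platform       # Mimir tenant ID
    queue_config:
      max_samples_per_send: 5000         # batch size per request
      max_shards: 50                     # upper bound on parallel senders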

The result? RAM usage on the scraping pods dropped by 40%, and query reliability hit 99.9%.

Common Pitfalls to Avoid