If you’ve ever deployed a production app to Kubernetes and then wondered, “Why is my pod restarting every five minutes?” or “Why is the API lagging?”, you know the pain of lacking observability. In my experience, trying to rely on kubectl logs and kubectl top is like trying to drive a car while looking through a straw.
That’s why knowing how to set up Prometheus and Grafana on Kubernetes is a non-negotiable skill for any DevOps engineer. Prometheus handles the data collection (the “brain”), and Grafana handles the visualization (the “eyes”). While you can install these manually via YAML manifests, it’s a maintenance nightmare. In this guide, I’ll show you the industry-standard way using the kube-prometheus-stack Helm chart, which bundles everything you need into one manageable package.
Prerequisites
Before we dive in, make sure you have the following ready:
- A running Kubernetes cluster (Minikube, Kind, EKS, GKE, or AKS).
- kubectl installed and configured to point to your cluster.
- Helm 3 installed on your local machine.
- Basic familiarity with Kubernetes namespaces.
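If you want to sanity-check the tooling before proceeding, a couple of quick commands will confirm everything is wired up (any recent Helm 3 and kubectl versions are fine):

kubectl get nodes   # confirms kubectl can reach the cluster
helm version        # confirms Helm 3 is installed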
Step-by-Step Implementation
Step 1: Adding the Prometheus Community Helm Repo
The easiest way to manage the installation is through the Prometheus community charts. First, add the repository and update your local chart cache:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
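To confirm the chart is now visible in your local cache, you can search the repo you just added:

helm search repo prometheus-community/kube-prometheus-stack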
Step 2: Creating a Dedicated Namespace
I always recommend isolating your monitoring tools in their own namespace. This prevents configuration drift and makes cleanup easier if you need to start over.
kubectl create namespace monitoring
Step 3: Installing the kube-prometheus-stack
The kube-prometheus-stack is the gold standard because it doesn’t just install Prometheus; it includes the Prometheus Operator, Alertmanager, and Grafana, and it comes pre-configured to scrape Kubernetes nodes and pods.
helm install prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword=admin-password-123
Wait a few minutes for the pods to initialize. You can check the progress with kubectl get pods -n monitoring.
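If you prefer to block until everything is up rather than polling, kubectl can wait on the Ready condition for you (the five-minute timeout below is just an arbitrary choice):

# Block until every pod in the namespace reports Ready
kubectl wait --for=condition=Ready pods --all -n monitoring --timeout=300s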
Step 4: Accessing the Dashboards
By default, the services are created as ClusterIP, meaning they aren’t accessible from outside the cluster. For a quick test, I use port-forwarding. To access Grafana:
kubectl port-forward deployment/prometheus-stack-grafana 3000:3000 -n monitoring
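If you would rather target the Service than the Deployment, the chart exposes Grafana on service port 80 (the service name below assumes the prometheus-stack release name used earlier):

kubectl port-forward svc/prometheus-stack-grafana 3000:80 -n monitoring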
Now, open your browser and go to http://localhost:3000. Log in with the username admin and the password you set in the Helm command (if you didn’t override it, the chart’s default password is prom-operator).
Once you’re logged in, you’ll see that the stack comes with several pre-installed dashboards that immediately give you visibility into your CPU, Memory, and Network usage across the cluster.
Step 5: Configuring Custom Metrics
The real power comes when you monitor your own applications. To do this, you need to expose a /metrics endpoint in your app. If you’re using Node.js, I highly recommend checking out my custom Grafana dashboard tutorial for Node.js to see how to instrument your code properly.
Once your app is exposing metrics, you create a ServiceMonitor resource. This tells the Prometheus Operator to automatically start scraping your pods based on specific labels.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  labels:
    release: prometheus-stack
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics-port
      interval: 30s
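For that selector to match anything, the Service in front of your app must carry the app: my-app label and expose a named port called metrics-port. Here is a minimal sketch of such a Service; the name and port number are illustrative assumptions, not values taken from the chart:

apiVersion: v1
kind: Service
metadata:
  name: my-app                # hypothetical Service name
  labels:
    app: my-app               # must match the ServiceMonitor's matchLabels
spec:
  selector:
    app: my-app               # selects your application pods
  ports:
    - name: metrics-port      # the port name the ServiceMonitor references
      port: 9090              # assumed port where the app serves /metrics
      targetPort: 9090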
Pro Tips for Production
- Storage Classes: By default, the chart may fall back to emptyDir for storage, meaning your data vanishes when the pod restarts. Always configure a PersistentVolumeClaim (PVC) in your values.yaml for Prometheus (see the values.yaml sketch after this list).
- Resource Limits: Prometheus is memory-hungry. I’ve seen it crash entire nodes by consuming all available RAM. Always set resources.limits.memory.
- High Cardinality: Be careful with labels. Adding a unique ID (like a user ID) to a Prometheus label can lead to a “cardinality explosion,” crashing your TSDB. If you start seeing performance degradation, read more about scaling Prometheus for high cardinality metrics.
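Here is a minimal values.yaml sketch covering the first two points; the storage class, sizes, and limits are placeholder assumptions you should adapt to your cluster:

prometheus:
  prometheusSpec:
    # Persist the TSDB to a PVC instead of the default emptyDir
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard      # assumed storage class; use one that exists in your cluster
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi               # illustrative size
    # Keep Prometheus from consuming the whole node
    resources:
      requests:
        memory: 2Gi
      limits:
        memory: 4Gi                       # illustrative limit; size to your metric volume

Apply it with helm upgrade --install prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml.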
Troubleshooting Common Issues
Prometheus is not scraping my pods
The most common culprit is a label mismatch. Make sure the ServiceMonitor’s spec.selector.matchLabels exactly match the labels on your Service. Also check that the ServiceMonitor itself carries the release: prometheus-stack label, as the operator only watches monitors that match its own release label.
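Two quick commands make the comparison easy (the resource names assume the examples used earlier in this guide):

# Labels on the Service the ServiceMonitor should select
kubectl get svc my-app --show-labels

# Selector and labels of the ServiceMonitor itself (add -n <namespace> if needed)
kubectl get servicemonitor my-app-monitor -o yaml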
Grafana shows “No Data”
Verify that the Prometheus data source is correctly configured in Grafana. Navigate to Connections → Data Sources and ensure the URL points to the internal Prometheus service on port 9090 (e.g., http://prometheus-stack-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090).
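A quick sanity check is to port-forward Prometheus itself and confirm it is serving data (the service name matches the data source URL above; run kubectl get svc -n monitoring if it differs in your cluster):

kubectl port-forward svc/prometheus-stack-kube-prometheus-prometheus 9090:9090 -n monitoring

Open http://localhost:9090 and run a simple query such as up. If that returns results but Grafana still shows nothing, the problem is the data source URL, not Prometheus.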
What’s Next?
Now that you’ve mastered how to set up Prometheus and Grafana on Kubernetes, you should focus on Alerting. Configuring the Alertmanager to send notifications to Slack or PagerDuty ensures you find out about outages before your users do.
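As a starting point, here is a rough sketch of what a Slack receiver could look like in the chart’s values.yaml; the webhook URL, channel, and routing are placeholder assumptions, not a drop-in configuration:

alertmanager:
  config:
    route:
      receiver: slack-notifications      # send everything to Slack by default
    receivers:
      - name: slack-notifications
        slack_configs:
          - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook URL
            channel: '#alerts'                                     # assumed channel name
            send_resolved: true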
If you’re looking to further optimize your infrastructure, I suggest exploring automated scaling policies and advanced logging stacks like Loki to complement your Prometheus metrics.