If you’ve ever deployed a production app to Kubernetes and then wondered, “Why is my pod restarting every five minutes?” or “Why is the API lagging?”, you know the pain of lacking observability. In my experience, trying to rely on kubectl logs and kubectl top is like trying to drive a car while looking through a straw.
That’s why knowing how to set up Prometheus and Grafana on Kubernetes is a non-negotiable skill for any DevOps engineer. Prometheus handles the data collection (the “brain”), and Grafana handles the visualization (the “eyes”). While you can install these manually via YAML manifests, it’s a maintenance nightmare. In this guide, I’ll show you the industry-standard way using the kube-prometheus-stack Helm chart, which bundles everything you need into one manageable package.
Prerequisites
Before we dive in, make sure you have the following ready:
- A running Kubernetes cluster (Minikube, Kind, EKS, GKE, or AKS).
- kubectl installed and configured to point to your cluster.
- Helm 3 installed on your local machine.
- Basic familiarity with Kubernetes namespaces.
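If you want to sanity-check the tooling before proceeding, a couple of quick commands will confirm everything is wired up (any recent Helm 3 and kubectl versions are fine):

kubectl get nodes   # confirms kubectl can reach the cluster
helm version        # confirms Helm 3 is installed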
Step-by-Step Implementation
Step 1: Adding the Prometheus Community Helm Repo
The easiest way to manage the installation is through the Prometheus community charts. First, add the repository and update your local chart cache:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
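To confirm the chart is now visible in your local cache, you can search the repo you just added:

helm search repo prometheus-community/kube-prometheus-stack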
Step 2: Creating a Dedicated Namespace
I always recommend isolating your monitoring tools in their own namespace. This prevents configuration drift and makes cleanup easier if you need to start over.
kubectl create namespace monitoring
Step 3: Installing the kube-prometheus-stack
The kube-prometheus-stack is the gold standard because it doesn’t just install Prometheus; it includes the Prometheus Operator, Alertmanager, and Grafana, and it comes pre-configured to scrape Kubernetes nodes and pods.
helm install prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword=admin-password-123
Wait a few minutes for the pods to initialize. You can check the progress with kubectl get pods -n monitoring.
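If you prefer to block until everything is up rather than polling, kubectl can wait on the Ready condition for you (the five-minute timeout below is just an arbitrary choice):

# Block until every pod in the namespace reports Ready
kubectl wait --for=condition=Ready pods --all -n monitoring --timeout=300s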
Step 4: Accessing the Dashboards
By default, the services are created as ClusterIP, meaning they aren’t accessible from outside the cluster. For a quick test, I use port-forwarding. To access Grafana:
kubectl port-forward deployment/prometheus-stack-grafana 3000:3000 -n monitoring
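If you would rather target the Service than the Deployment, the chart exposes Grafana on service port 80 (the service name below assumes the prometheus-stack release name used earlier):

kubectl port-forward svc/prometheus-stack-grafana 3000:80 -n monitoring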
Now, open your browser and go to http://localhost:3000. Log in with the username admin and the password you set in the Helm command (if you didn’t override it, the chart’s default password is prom-operator).
Once you’re logged in, you’ll see that the stack comes with several pre-installed dashboards that immediately give you visibility into your CPU, Memory, and Network usage across the cluster.
Step 5: Configuring Custom Metrics
The real power comes when you monitor your own applications. To do this, you need to expose a /metrics endpoint in your app. If you’re using Node.js, I highly recommend checking out my custom Grafana dashboard tutorial for Node.js to see how to instrument your code properly.
Once your app is exposing metrics, you create a ServiceMonitor resource. This tells the Prometheus Operator to automatically start scraping your pods based on specific labels.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  labels:
    release: prometheus-stack
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics-port
      interval: 30s
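For that selector to match anything, the Service in front of your app must carry the app: my-app label and expose a named port called metrics-port. Here is a minimal sketch of such a Service; the name and port number are illustrative assumptions, not values taken from the chart:

apiVersion: v1
kind: Service
metadata:
  name: my-app                # hypothetical Service name
  labels:
    app: my-app               # must match the ServiceMonitor's matchLabels
spec:
  selector:
    app: my-app               # selects your application pods
  ports:
    - name: metrics-port      # the port name the ServiceMonitor references
      port: 9090              # assumed port where the app serves /metrics
      targetPort: 9090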
Pro Tips for Production
- Storage Classes: By default, the chart may fall back to emptyDir for storage, meaning your data vanishes when the pod restarts. Always configure a PersistentVolumeClaim (PVC) in your values.yaml for Prometheus (see the values.yaml sketch after this list).
- Resource Limits: Prometheus is memory-hungry. I’ve seen it crash entire nodes by consuming all available RAM. Always set resources.limits.memory.
- High Cardinality: Be careful with labels. Adding a unique ID (like a user ID) to a Prometheus label can lead to a “cardinality explosion,” crashing your TSDB. If you start seeing performance degradation, read more about scaling Prometheus for high cardinality metrics.
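Here is a minimal values.yaml sketch covering the first two points; the storage class, sizes, and limits are placeholder assumptions you should adapt to your cluster:

prometheus:
  prometheusSpec:
    # Persist the TSDB to a PVC instead of the default emptyDir
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard      # assumed storage class; use one that exists in your cluster
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi               # illustrative size
    # Keep Prometheus from consuming the whole node
    resources:
      requests:
        memory: 2Gi
      limits:
        memory: 4Gi                       # illustrative limit; size to your metric volume

Apply it with helm upgrade --install prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml.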
Troubleshooting Common Issues
Prometheus is not scraping my pods
The most common culprit is a label mismatch. Make sure the ServiceMonitor’s spec.selector.matchLabels exactly match the labels on your Service. Also check that the ServiceMonitor itself carries the release: prometheus-stack label, as the operator only watches monitors that match its own release label.
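Two quick commands make the comparison easy (the resource names assume the examples used earlier in this guide):

# Labels on the Service the ServiceMonitor should select
kubectl get svc my-app --show-labels

# Selector and labels of the ServiceMonitor itself (add -n <namespace> if needed)
kubectl get servicemonitor my-app-monitor -o yaml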
Grafana shows “No Data”
Verify that the Prometheus data source is correctly configured in Grafana. Navigate to Connections → Data Sources and ensure the URL points to the internal Prometheus service on port 9090 (e.g., http://prometheus-stack-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090).
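A quick sanity check is to port-forward Prometheus itself and confirm it is serving data (the service name matches the data source URL above; run kubectl get svc -n monitoring if it differs in your cluster):

kubectl port-forward svc/prometheus-stack-kube-prometheus-prometheus 9090:9090 -n monitoring

Open http://localhost:9090 and run a simple query such as up. If that returns results but Grafana still shows nothing, the problem is the data source URL, not Prometheus.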
What’s Next?
Now that you’ve mastered how to set up Prometheus and Grafana on Kubernetes, you should focus on Alerting. Configuring the Alertmanager to send notifications to Slack or PagerDuty ensures you find out about outages before your users do.
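As a starting point, here is a rough sketch of what a Slack receiver could look like in the chart’s values.yaml; the webhook URL, channel, and routing are placeholder assumptions, not a drop-in configuration:

alertmanager:
  config:
    route:
      receiver: slack-notifications      # send everything to Slack by default
    receivers:
      - name: slack-notifications
        slack_configs:
          - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook URL
            channel: '#alerts'                                     # assumed channel name
            send_resolved: true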
If you’re looking to further optimize your infrastructure, I suggest exploring automated scaling policies and advanced logging stacks like Loki to complement your Prometheus metrics.