do you know what your pods are actually using?
We set our k8s resource limits to 300m CPU and 512Mi memory. Reasonable defaults. Then I realized I had no idea whether our pods were using 10% or 90% of those limits.
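For reference, here's roughly how that looks in a container spec. Only the limits come from the numbers above; the requests values are illustrative guesses:

```yaml
# Sketch of the container resources block. The limits (300m / 512Mi) match
# our config; the requests shown here are hypothetical.
resources:
  requests:
    cpu: 150m
    memory: 256Mi
  limits:
    cpu: 300m
    memory: 512Mi
```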
Fastest way to check: k9s. Navigate to your pods, hit d for the detail view. You get actual CPU in millicores, actual memory in MiB, and the percentage of the limit used. 30 seconds to see if your limits are sane.
```
k9s
# Navigate to pod → press 'd' for details
# Shows: CPU 45m/300m (15%), MEM 280Mi/512Mi (55%)
```

If you're running HPA, you need to know this. HPA decisions are driven by resource utilization, and usage is measured against your requests, not your limits. If your pods consistently sit at 15% CPU, HPA will never scale up. Might be fine, might mean your limits are way too generous and you're wasting money.
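To make the HPA math concrete, here's a minimal autoscaling/v2 sketch. The workload name, replica counts, and the 70% target are assumptions for illustration, not our actual config:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: echo-api            # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: echo-api
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Average usage must exceed 70% of the *requested* CPU before a
          # scale-up happens. Pods idling at 15% never get there.
          averageUtilization: 70
```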
For proper observability beyond spot-checking: Prometheus + Grafana + kube-state-metrics. Prometheus collects metrics, kube-state-metrics exposes cluster-level state, Grafana visualizes it. Track utilization over time, set up alerts, see patterns instead of snapshots.
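Once that stack is running, queries like the sketch below are what you'd graph or alert on. Label names vary by setup; `container_cpu_usage_seconds_total` comes from cAdvisor and `kube_pod_container_resource_limits` from kube-state-metrics:

```promql
# Per-pod CPU usage as a fraction of the CPU limit (a sketch, not our dashboard).
sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
  /
sum by (namespace, pod) (kube_pod_container_resource_limits{resource="cpu"})
```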
But if you're in a situation like ours, with ECHO running on DigitalOcean's managed k8s, start with k9s and Metrics Server. Don't deploy a full Prometheus stack until you have a specific question that requires historical data. For a small team, maintaining Prometheus + Grafana + Fluentd + Elasticsearch is real overhead. k9s gives you 80% of the insight for 0% of the maintenance.
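If you'd rather not open k9s, Metrics Server also powers plain kubectl top. The namespace and pod name here are placeholders:

```bash
kubectl top pods -n production
# NAME            CPU(cores)   MEMORY(bytes)
# echo-api-xxxxx  45m          280Mi
```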
One investment worth making early: Kubernetes Event Exporter. Scaling events, pod restarts, and OOM kills get lost in the default event stream, which the API server only retains for about an hour by default. Exporting them to a searchable log means you can actually do post-incident analysis when something breaks at 2am.
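A quick way to see what's currently in that ephemeral stream, and what you'd lose without an exporter:

```bash
# Everything currently in the event stream, newest last.
kubectl get events -A --sort-by=.lastTimestamp
# Just the warnings (restarts, failed scheduling, OOM kills, etc.).
kubectl get events -A --field-selector type=Warning
```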
Our monitoring evolved incrementally: k9s for daily checks, Sentry for error tracking, ArgoCD for deployment visibility, Plausible for user analytics. Each added when we hit a specific pain point, not because a blog post said we needed it. Observability is only useful if someone’s actually looking at it.