
    Why Your Kubernetes Cluster Is Overprovisioned (And How to Fix It)

    Agentic SRE · Cloudmess Team · 8 min read · February 11, 2026

    The 20% Utilization Problem

    We audit a lot of Kubernetes clusters. CPU utilization across the EKS clusters we see typically averages 20 to 35%, which means 65 to 80% of the compute spend pays for reserved capacity that sits idle. This is not because teams are careless. It is because Kubernetes resource management is genuinely hard to get right, and the defaults encourage overprovisioning. When a developer writes a Deployment manifest for the first time, they guess at resource requests: "I will request 2 CPU cores and 4GB memory just to be safe." The Kubernetes scheduler dutifully reserves those resources on a node, the Cluster Autoscaler provisions a new node if existing ones lack capacity, and the cycle repeats across dozens of pods. The result is clusters running at a fraction of their capacity, with AWS bills that reflect the reserved resources, not the actual usage. We have measured the gap: on one 40-node cluster, actual CPU utilization averaged 22% and memory utilization 31%, representing roughly $8,200/month in wasted compute.

    Resource Requests: The Root Cause

    The single most impactful misconfiguration is resource requests that do not match actual usage. In Kubernetes, resource requests determine scheduling: the scheduler places a pod on a node only if the node has enough unreserved capacity to satisfy the request. Resource limits determine throttling and OOM kills. The problem is that requests are often set once during initial deployment and never revisited. A pod requests 2 CPU cores and 4GB memory because that is what someone guessed during a sprint. Actual usage, measured over 4 weeks via Prometheus with kube-state-metrics and cAdvisor, shows 0.3 CPU and 800MB memory at P95. The scheduler reserves the full 2 CPU and 4GB, so even though the pod only uses 15% of what it asked for, nothing else can be scheduled on the remaining 85%. Multiply this by 80 pods across a cluster and you are running 3x more nodes than you need. The fix is methodical: deploy Prometheus with kube-state-metrics to collect container_cpu_usage_seconds_total and container_memory_working_set_bytes over 2 to 4 weeks, then use Goldilocks (by Fairwinds) or Kubecost's right-sizing recommendations to generate new requests based on P95 utilization plus a 20% buffer. Apply changes incrementally, 5 to 10 deployments per week, and monitor for OOM kills or CPU throttling after each batch.
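    To make the right-sizing step concrete, here is a hedged sketch of what the adjusted manifest might look like for the hypothetical 2 CPU / 4GB pod above, rewritten against its measured P95 (0.3 CPU, 800MB) plus the 20% buffer. The Deployment name, labels, and image are placeholders:

```yaml
# Hypothetical example: requests lowered from the guessed 2 CPU / 4Gi
# to measured P95 (0.3 CPU, 800MB) plus a ~20% buffer.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api            # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels: {app: example-api}
  template:
    metadata:
      labels: {app: example-api}
    spec:
      containers:
        - name: app
          image: example/api:1.0     # placeholder image
          resources:
            requests:
              cpu: 400m        # 0.3 cores x 1.2 = 360m, rounded up
              memory: 1Gi      # 800MB x 1.2 ~= 960MB, rounded up
            limits:
              memory: 1Gi      # cap memory at the request to fail fast on leaks
              # no CPU limit: avoids CFS throttling; CPU is compressible
```

    The scheduler now reserves 400m / 1Gi instead of 2 CPU / 4Gi, so roughly five of these pods fit where one fit before.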

    Node Sizing and Instance Selection

    Another common issue is using the wrong EC2 instance types for your node groups. Teams default to m5.xlarge (4 vCPU, 16GB RAM, $0.192/hour) because it is a safe general-purpose choice. But if your workloads are memory-heavy (Java applications with large heaps, Redis sidecars, ML model loading), you end up wasting CPU. If they are CPU-heavy (Go services, data processing, compression), you waste memory. We use Karpenter, the open-source node provisioner AWS built for EKS, to automatically select the optimal instance type based on pending pod requirements. Karpenter's NodePool CRD lets you define a list of allowed instance families (for example: m5, m6i, m6a, c5, c6i, r5, r6i) and sizes (xlarge through 4xlarge), and Karpenter selects the cheapest instance that satisfies the pending pods' aggregate resource requests. It also supports consolidation: when the pods on underutilized nodes can be bin-packed onto fewer or cheaper nodes, Karpenter reschedules them and terminates the freed nodes. This is more aggressive and faster than the Cluster Autoscaler's scale-down logic, which defaults to a 10-minute cooldown. Switching from static managed node groups to Karpenter typically saves 25 to 40% on compute because Karpenter picks right-sized, cheapest-available instances and consolidates aggressively.
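    A sketch of the NodePool described above, assuming the karpenter.sh/v1beta1 API and an existing EC2NodeClass named default (both assumptions; field names differ slightly on the v1 API):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general-purpose        # placeholder name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m5", "m6i", "m6a", "c5", "c6i", "r5", "r6i"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge", "4xlarge"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        name: default          # assumes an EC2NodeClass named "default"
  disruption:
    consolidationPolicy: WhenUnderutilized   # bin-pack pods, remove freed nodes
```

    Given pending pods, Karpenter picks the cheapest instance from the allowed families that satisfies their aggregate requests, rather than the one fixed type of a static node group.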

    Spot Instances for Non-Critical Workloads

    Spot instances on EKS offer 60 to 70% savings over on-demand pricing. An m5.xlarge spot instance typically costs $0.06 to $0.08/hour compared to $0.192/hour on-demand. They work well for stateless workloads that can tolerate interruption: development and staging environments, batch processing jobs, CI/CD runners (we use GitHub Actions self-hosted runners on spot nodes), integration test suites, and non-user-facing background services like log processors and metric aggregators. The key is separating workloads into critical (on-demand or reserved) and non-critical (spot-eligible) using Karpenter NodePools with different requirements. Critical workloads get a NodePool restricted to on-demand capacity types, while spot-eligible workloads get a NodePool that prefers spot with on-demand as fallback. We configure PodDisruptionBudgets (PDBs) on every Deployment to ensure at least 1 replica remains available during spot interruptions. For spot diversification, we configure Karpenter to select from at least 10 different instance types across 3 AZs, which reduces the probability of simultaneous interruption. In practice, we run 40 to 60% of a cluster's workloads on spot instances, and interruption rates are under 5% with graceful handling via the 2-minute interruption notice and Kubernetes rescheduling.
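    The critical/non-critical split can be sketched as two pieces: a spot-preferring NodePool and a PDB that keeps one replica up during interruptions. Names, labels, and AZs are placeholders, and the v1beta1 NodePool API is assumed:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-eligible          # placeholder name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # Karpenter prefers spot when both are allowed
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]   # placeholder AZs
      nodeClassRef:
        name: default          # assumes an EC2NodeClass named "default"
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: log-processor-pdb      # placeholder
spec:
  minAvailable: 1              # keep at least 1 replica during voluntary disruptions
  selector:
    matchLabels:
      app: log-processor       # placeholder label
```

    Listing on-demand alongside spot gives the fallback behavior described above: if no spot capacity is available, the pods still schedule.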

    The Optimization Playbook

    Our standard EKS cost optimization engagement follows this sequence over 4 to 6 weeks.

    Week 1: Install Prometheus (via the kube-prometheus-stack Helm chart), Kubecost (the free tier is sufficient for most clusters), and Goldilocks. Configure Prometheus retention at 15 days and verify that kube-state-metrics and node-exporter are collecting.

    Week 2: Let metrics accumulate. Run Kubecost's cluster efficiency report and Goldilocks VPA recommendations. Identify the top 20 pods by resource waste (requested minus actual).

    Week 3: Right-size resource requests for the top 20 pods based on observed P95 usage plus a 20% buffer. Deploy changes incrementally, monitoring for OOM kills (container_oom_events_total) and CPU throttling (container_cpu_cfs_throttled_seconds_total).

    Week 4: Deploy Karpenter to replace static managed node groups. Configure NodePools with instance-family diversification and consolidation enabled (consolidationPolicy: WhenUnderutilized on the v1beta1 API; WhenEmptyOrUnderutilized with a short consolidateAfter, such as 30 seconds, on newer releases).

    Week 5: Implement spot instances for eligible workloads. Start with dev/staging environments, then expand to production non-critical workloads. Configure PDBs and multi-AZ spot diversification.

    Week 6: Set up ongoing Grafana dashboards showing cluster utilization, cost per namespace, and Goldilocks recommendations. Configure Slack alerts for utilization dropping below 50%.

    On a recent engagement, a 40-node EKS cluster running m5.xlarge instances dropped to 18 nodes (a mix of m5.large, m6i.xlarge, and c6i.large selected by Karpenter). Monthly compute cost went from $14,000 to $5,800, a 59% reduction with identical workload performance.
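    The Week 6 low-utilization alert can be sketched as a PrometheusRule, assuming the kube-prometheus-stack operator CRDs and standard cAdvisor/kube-state-metrics metric names; the rule name, label, and 6-hour window are illustrative choices:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-utilization    # placeholder name
  labels:
    release: kube-prometheus-stack   # must match the operator's ruleSelector
spec:
  groups:
    - name: cost-optimization
      rules:
        - alert: LowClusterCPUUtilization
          # used CPU cores / allocatable CPU cores across the cluster
          expr: |
            sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
              / sum(kube_node_status_allocatable{resource="cpu"}) < 0.5
          for: 6h              # sustained, not a transient dip
          labels:
            severity: info
          annotations:
            summary: Cluster CPU utilization has been below 50% for 6 hours
```

    Routing this alert to Slack is then a matter of the usual Alertmanager receiver configuration.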