EKS vs ECS vs Lambda: How to Pick the Right AWS Compute Layer
The Compute Decision Nobody Gets Right the First Time
We get this question on almost every engagement: should we use EKS, ECS, or Lambda? The honest answer is that it depends, but not in the hand-wavy way most consultants mean it. It depends on specific, measurable factors: your traffic patterns, cold start tolerance, team expertise, and operational budget. Most teams pick based on what they already know or what a blog post recommended, then spend months fighting the consequences.
When Lambda Is the Right Call
Lambda works brilliantly for event-driven workloads with unpredictable traffic: API endpoints that get 10 requests per minute most of the day but spike to 10,000 during a marketing push, data transformation jobs triggered by S3 uploads, webhook handlers, cron jobs that finish in under 15 minutes. If your workload is bursty, stateless, and completes quickly, Lambda saves you from paying for idle compute. The tradeoffs are cold starts (often a second or more for Java, typically well under a second for Python and Node), the hard 15-minute execution limit, and limited control over the runtime environment. If you need persistent connections, long-running processes, or GPU access, Lambda is the wrong choice.
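To make the shape of that workload concrete, here is a minimal sketch of the S3-upload-triggered transformation pattern in TypeScript. The bucket layout, the `processed/` output prefix, and the uppercase "transform" are all placeholders for illustration, not a prescribed design:

```ts
// Hypothetical S3-triggered transform handler — a minimal sketch.
import { S3Event } from "aws-lambda";
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({}); // created once, reused across warm invocations

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // Event keys are URL-encoded; spaces arrive as '+'.
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Fetch the uploaded object, transform it, write the result back.
    const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    const body = await obj.Body!.transformToString();

    const transformed = body.toUpperCase(); // stand-in for real transform logic

    await s3.send(new PutObjectCommand({
      Bucket: bucket,
      Key: `processed/${key}`, // illustrative output prefix
      Body: transformed,
    }));
  }
};
```

Note the SDK client is initialized outside the handler so warm invocations reuse it; moving work out of the handler body like this is one of the few levers you have against cold start cost.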
When ECS Makes Sense
ECS on Fargate is the sweet spot for teams that want containers without Kubernetes complexity. You define tasks, set CPU and memory, and AWS handles the rest. It works well for straightforward web services, background workers, and microservices where you need more control than Lambda but don't need the full Kubernetes ecosystem. We recommend ECS Fargate for teams under 20 engineers running fewer than 15 services. The operational overhead is significantly lower than EKS. You lose some flexibility around service mesh, custom scheduling, and advanced networking, but for most workloads that doesn't matter.
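As a sketch of how little you actually define, here is a hypothetical CDK stack for a load-balanced Fargate web service. The stack name, image, CPU, and memory values are illustrative assumptions:

```ts
// Hypothetical CDK stack — "define a task, set CPU/memory, AWS handles the rest".
import * as cdk from "aws-cdk-lib";
import * as ecs from "aws-cdk-lib/aws-ecs";
import * as ecsPatterns from "aws-cdk-lib/aws-ecs-patterns";

const app = new cdk.App();
const stack = new cdk.Stack(app, "WebServiceStack");

const cluster = new ecs.Cluster(stack, "Cluster"); // creates a VPC by default

// One construct wires up the task definition, ALB, target group,
// and security groups for you.
new ecsPatterns.ApplicationLoadBalancedFargateService(stack, "WebService", {
  cluster,
  cpu: 512,             // 0.5 vCPU
  memoryLimitMiB: 1024, // 1 GiB
  desiredCount: 2,
  taskImageOptions: {
    image: ecs.ContainerImage.fromRegistry("public.ecr.aws/nginx/nginx:latest"), // placeholder image
    containerPort: 80,
  },
});
```

That is close to the whole operational surface area: no node pools, no control plane upgrades, no CNI configuration.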
When You Actually Need EKS
EKS is the right choice when you genuinely need Kubernetes features: custom operators, advanced scheduling (GPU affinity, spot instance management), service mesh with Istio or Linkerd, or portability across cloud providers. ML inference workloads that need GPU scheduling, multi-tenant platforms, and teams already running Kubernetes on-prem are all solid EKS use cases. The cost is real, though. EKS charges a control plane fee of $0.10 per hour per cluster (about $73/month), and you need at least one engineer who understands Kubernetes networking, RBAC, and upgrades. We've seen teams adopt EKS because it seemed like the 'serious' choice, then spend 40% of their engineering time on cluster operations instead of building product.
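For illustration, here is a hedged CDK sketch of the GPU-scheduling case: a cluster with a separate, tainted GPU node group so only pods that tolerate the taint (and request `nvidia.com/gpu`) land on the expensive instances. Instance types, sizes, and the Kubernetes version are assumptions, and a production cluster needs considerably more (VPC layout, a kubectl layer, the NVIDIA device plugin):

```ts
// Hypothetical CDK sketch: EKS with a dedicated GPU node group for ML inference.
import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as eks from "aws-cdk-lib/aws-eks";

const app = new cdk.App();
const stack = new cdk.Stack(app, "MlPlatformStack");

const cluster = new eks.Cluster(stack, "Cluster", {
  version: eks.KubernetesVersion.V1_29, // illustrative version
  defaultCapacity: 0,                   // define node groups explicitly below
});

// General-purpose nodes for everything that doesn't need a GPU.
cluster.addNodegroupCapacity("general", {
  instanceTypes: [new ec2.InstanceType("m6i.large")],
  minSize: 2,
  maxSize: 6,
});

// GPU nodes, tainted so the scheduler keeps ordinary pods off them.
cluster.addNodegroupCapacity("gpu", {
  instanceTypes: [new ec2.InstanceType("g5.xlarge")],
  minSize: 0, // scale to zero when no inference load
  maxSize: 4,
  taints: [{ effect: eks.TaintEffect.NO_SCHEDULE, key: "nvidia.com/gpu", value: "true" }],
});
```

Every line of that sketch is something ECS never asks you to think about, which is the overhead argument in miniature.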
The Hybrid Approach That Usually Wins
Most of our clients end up with a mix. Core services run on ECS Fargate or EKS depending on scale. Event-driven glue logic runs on Lambda. Batch processing uses Step Functions with Lambda or Fargate tasks. The key is matching each workload to the compute layer that minimizes both cost and operational burden. We've seen a 50-person startup run their entire platform on ECS Fargate with Lambda for async jobs, and a 200-person company that genuinely needed EKS for their ML platform but ran everything else on ECS. Neither was wrong. Both were right for their context.
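To show what that mix looks like in one place, here is a hypothetical CDK sketch of the batch pattern: a Step Functions workflow that runs a quick validation step on Lambda and hands the long-running work to a Fargate task. All names, images, and sizes are placeholders:

```ts
// Hypothetical CDK sketch of the hybrid pattern: Lambda for glue, Fargate for heavy lifting.
import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as ecs from "aws-cdk-lib/aws-ecs";
import * as sfn from "aws-cdk-lib/aws-stepfunctions";
import * as tasks from "aws-cdk-lib/aws-stepfunctions-tasks";

const app = new cdk.App();
const stack = new cdk.Stack(app, "BatchPipelineStack");

// Short-lived validation step: well inside Lambda's 15-minute limit.
const validateFn = new lambda.Function(stack, "Validate", {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: "index.handler",
  code: lambda.Code.fromInline("exports.handler = async (e) => e;"), // placeholder body
});

// Long-running batch step: a Fargate task with no execution time limit.
const cluster = new ecs.Cluster(stack, "Cluster");
const taskDef = new ecs.FargateTaskDefinition(stack, "BatchTask", {
  cpu: 1024,
  memoryLimitMiB: 2048,
});
taskDef.addContainer("worker", {
  image: ecs.ContainerImage.fromRegistry("public.ecr.aws/docker/library/busybox:latest"), // placeholder image
});

const definition = new tasks.LambdaInvoke(stack, "ValidateInput", { lambdaFunction: validateFn })
  .next(new tasks.EcsRunTask(stack, "ProcessBatch", {
    cluster,
    taskDefinition: taskDef,
    launchTarget: new tasks.EcsFargateLaunchTarget({
      platformVersion: ecs.FargatePlatformVersion.LATEST,
    }),
    integrationPattern: sfn.IntegrationPattern.RUN_JOB, // wait for the task to finish
  }));

new sfn.StateMachine(stack, "Pipeline", {
  definitionBody: sfn.DefinitionBody.fromChainable(definition),
});
```

The orchestration lives in Step Functions, so each step can run on whichever compute layer fits it, which is the whole point of the hybrid approach.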