EKS vs ECS vs Lambda: How to Pick the Right AWS Compute Layer
The Compute Decision Nobody Gets Right the First Time
We get this question on almost every engagement: should we use EKS, ECS, or Lambda? The honest answer is that it depends, but not in the hand-wavy way most consultants mean it. It depends on specific, measurable factors: your traffic patterns, cold start tolerance, team expertise, and operational budget. Most teams pick based on what they already know or what a blog post recommended, then spend months fighting the consequences. We have seen a team choose EKS for a 3-service application because 'Kubernetes is the industry standard,' then spend 6 months struggling with cluster upgrades, RBAC policies, and networking. They eventually migrated to ECS Fargate and cut their operational overhead by 70%.
When Lambda Is the Right Call
Lambda works brilliantly for event-driven workloads with unpredictable traffic. API endpoints that get 10 requests per minute most of the day but spike to 10,000 during a marketing push. Data transformation jobs triggered by S3 uploads via EventBridge. Webhook handlers for Stripe, GitHub, or Slack integrations. Scheduled jobs via EventBridge Scheduler that run for under 15 minutes. If your workload is bursty, stateless, and completes quickly, Lambda saves you from paying for idle compute. At the infrastructure level, Lambda supports up to 10GB of memory, 6 vCPUs (allocated proportionally to memory), and runs on Firecracker microVMs.

The tradeoffs are real: cold starts range from 100ms to 3 seconds depending on runtime and package size (Java and .NET are the worst offenders; Node.js and Python with minimal dependencies are the fastest). The 15-minute execution limit is a hard cap. You get limited control over the runtime environment, no persistent filesystem (only 512MB of ephemeral /tmp storage, expandable to 10GB), and no GPU access. If you need persistent connections, WebSockets, long-running processes, or GPU inference, Lambda is the wrong choice.
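The S3-triggered pattern above is worth seeing concretely. Here is a minimal sketch of a Lambda handler for S3 event notifications; the event shape follows the standard S3 notification format, but the transformation itself is omitted (a real function would fetch and process each object with boto3):

```python
import json

def handler(event, context):
    """Sketch of an S3-triggered Lambda handler (illustration only).

    Extracts the bucket and key from each event record; a real function
    would fetch the object with boto3 and run the actual transformation.
    """
    processed = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        processed.append({
            "bucket": s3["bucket"]["name"],
            "key": s3["object"]["key"],
        })
    # Conventional Lambda-style response: status code plus a JSON body
    return {"statusCode": 200, "body": json.dumps(processed)}
```

Because the handler is a plain function, you can unit-test it locally by passing a fake event dict, which is part of what makes Lambda pleasant for small glue logic.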
When ECS Makes Sense
ECS on Fargate is the sweet spot for teams that want containers without Kubernetes complexity. You define task definitions specifying CPU (0.25 up to 16 vCPU) and memory (0.5 up to 120 GB), configure an ECS service with desired count and deployment settings, and AWS handles host management, patching, and scaling. It works well for straightforward web services behind an ALB, background workers processing SQS queues, and microservices where you need more control than Lambda but do not need the full Kubernetes ecosystem.

We recommend ECS Fargate for teams under 20 engineers running fewer than 15 services. The operational overhead is significantly lower than EKS: no cluster upgrades, no node management, no CNI plugin debugging, and no etcd maintenance. You lose some flexibility around service mesh (App Mesh is AWS's offering but lags significantly behind Istio), custom scheduling, and advanced networking. But for most workloads, ECS Service Connect provides adequate service-to-service communication, and AWS Cloud Map handles service discovery. Fargate costs approximately 20% more than equivalent EC2 instances, but the operational savings in engineering hours more than compensate.
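To make the task definition concrete, here is a hypothetical one for a small web service, expressed as the Python dict you would pass to the ECS RegisterTaskDefinition API. The family name, image, and role ARN are placeholders, and the CPU/memory pairing must come from Fargate's supported combinations:

```python
# Hypothetical Fargate task definition; field names follow the ECS
# RegisterTaskDefinition API. Image, family, and ARN are placeholders.
task_definition = {
    "family": "web-api",                      # placeholder service name
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",                  # required for Fargate tasks
    "cpu": "512",                             # 0.5 vCPU, in CPU units
    "memory": "1024",                         # 1 GB; must pair with the CPU tier
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "containerDefinitions": [
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-api:latest",
            "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
            "essential": True,
        }
    ],
}

# Registering it with boto3 would look roughly like this (not executed here):
# boto3.client("ecs").register_task_definition(**task_definition)
```

Note that this is the entire compute configuration: no node sizing, no AMI selection, no capacity planning beyond picking a CPU/memory tier per task.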
When You Actually Need EKS
EKS is the right choice when you genuinely need Kubernetes-native features: custom operators (like the Spark Operator for data workloads or Kubeflow for ML pipelines), advanced scheduling with node affinity and GPU taints, service mesh with Istio or Linkerd for mTLS and traffic management, or multi-cloud portability where you run the same workloads on EKS and GKE. ML inference workloads that need NVIDIA GPU scheduling via the device plugin, multi-tenant platforms requiring namespace-level isolation with NetworkPolicies, and teams already running Kubernetes on-prem with established GitOps practices are all solid EKS use cases.

The cost is real, though. EKS charges $73/month per cluster for the control plane (about $0.10/hour), and clusters that fall into extended support cost $0.60/hour, roughly $438/month. You need at least one engineer who understands Kubernetes networking (VPC CNI, CoreDNS, kube-proxy in iptables vs IPVS mode), RBAC (ClusterRoles, ServiceAccounts, IRSA for IAM integration), and the upgrade lifecycle (EKS versions trail upstream by roughly 2 to 3 months, and you must upgrade at least annually). We have seen teams adopt EKS because it seemed like the 'serious' choice, then spend 40% of their engineering time on cluster operations instead of building product.
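The GPU scheduling mentioned above is a good example of something ECS cannot express. A sketch of the pattern, written as the pod spec dict you would hand to the Kubernetes API: the pod requests an NVIDIA GPU through the device plugin's resource name and tolerates a taint that keeps ordinary workloads off the expensive nodes. The pod name, image, and instance type are illustrative, not from a real cluster:

```python
# Sketch of GPU-aware scheduling on EKS. The device plugin exposes GPUs
# as the "nvidia.com/gpu" resource; the toleration matches a taint that
# an operator would apply to GPU node groups. Names are placeholders.
gpu_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "inference-worker"},
    "spec": {
        # Pin to a GPU instance type (illustrative label value)
        "nodeSelector": {"node.kubernetes.io/instance-type": "g5.xlarge"},
        "tolerations": [
            {
                "key": "nvidia.com/gpu",     # taint applied to GPU nodes
                "operator": "Exists",
                "effect": "NoSchedule",
            }
        ],
        "containers": [
            {
                "name": "inference",
                "image": "example.com/model-server:latest",  # placeholder
                "resources": {"limits": {"nvidia.com/gpu": 1}},
            }
        ],
    },
}
```

Taints, tolerations, and extended resources like this have no ECS equivalent, which is exactly the kind of concrete requirement that justifies the EKS operational tax.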
The Hybrid Approach That Usually Wins
Most of our clients end up with a hybrid architecture. Core services run on ECS Fargate or EKS depending on scale and team expertise. Event-driven glue logic (file processing, notification dispatch, webhook handling) runs on Lambda. Batch processing uses Step Functions orchestrating Lambda tasks for short jobs or Fargate tasks for long-running jobs over 15 minutes. We have seen a 50-person startup run their entire platform on ECS Fargate with Lambda for async jobs, spending $3,200/month in compute. And a 200-person company that genuinely needed EKS for their ML platform with 8 GPU node groups, Karpenter for autoscaling, and Istio for traffic management, but ran all their non-ML services on ECS to avoid over-complicating operations. Neither was wrong. Both were right for their context.

The decision framework we use: start with Lambda for anything event-driven and bursty, use ECS Fargate as the default for persistent services, and only adopt EKS when you have a concrete Kubernetes-native requirement that ECS cannot satisfy. This avoids premature complexity while leaving the door open for migration as your needs evolve.
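The decision framework above can be boiled down to a few lines. This is a deliberately toy encoding (the real conversation involves team expertise and budget, which do not fit in a function signature), but it captures the ordering:

```python
def pick_compute(event_driven: bool, bursty: bool, max_runtime_minutes: int,
                 needs_k8s_native: bool = False) -> str:
    """Toy encoding of the decision framework (illustrative only).

    needs_k8s_native stands for a concrete requirement ECS cannot satisfy:
    custom operators, GPU taint scheduling, Istio/Linkerd, multi-cloud.
    """
    if needs_k8s_native:
        return "EKS"
    # Lambda first: event-driven, bursty, and under the 15-minute hard cap.
    if event_driven and bursty and max_runtime_minutes <= 15:
        return "Lambda"
    # Default for everything persistent.
    return "ECS Fargate"
```

The key property is the order of the checks: EKS is the last resort gated on a concrete requirement, not the default, and Lambda is considered before you pay for always-on compute.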