    We Cut a $65K Cloud Bill by 60%. Here Is Exactly How

    FinOps · Cloudmess Team · 6 min read · January 20, 2026

    The $65K Question

    The CFO asked for a breakdown of the cloud bill. The CTO had no answer beyond 'that is what the models cost to run.' Sound familiar? This is one of the most common conversations we have with engineering leaders. Cloud costs creep up gradually: nobody owns the bill, and by the time someone asks questions, the waste is baked into the infrastructure.

    In this case, the company was running a mix of ML inference endpoints, application services, and data pipelines on AWS. Cost Explorer showed a steep upward climb over 8 months, from $28K to $65K, but cost allocation tags were missing or inconsistent on 80% of their resources. Step one was getting visibility: we deployed Kubecost on their EKS clusters and configured AWS Cost and Usage Reports with Athena for cross-service analysis.
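Once the Cost and Usage Report is queryable, the first useful view is simply spend grouped by service and by team tag, with untagged spend called out. The sketch below shows that aggregation on hypothetical CUR-shaped rows (the field names and figures are illustrative, not the client's data); in practice the rows would come from an Athena query against the CUR table.

```python
from collections import defaultdict

# Hypothetical rows in the shape of AWS Cost and Usage Report line items:
# a service code, an unblended cost, and a (possibly empty) team tag.
cur_rows = [
    {"product_code": "AmazonEC2", "unblended_cost": 812.40, "tag_team": "ml-platform"},
    {"product_code": "AmazonRDS", "unblended_cost": 1740.00, "tag_team": ""},
    {"product_code": "AmazonEC2", "unblended_cost": 310.25, "tag_team": ""},
    {"product_code": "AmazonS3",  "unblended_cost": 282.60, "tag_team": "data-eng"},
]

def cost_by_key(rows, key):
    """Sum unblended cost per value of `key`, mapping empty tags to 'untagged'."""
    totals = defaultdict(float)
    for row in rows:
        totals[row.get(key) or "untagged"] += row["unblended_cost"]
    return dict(totals)

by_service = cost_by_key(cur_rows, "product_code")
by_team = cost_by_key(cur_rows, "tag_team")

# The share of spend nobody can attribute -- the number that motivates
# a mandatory tagging policy.
untagged_share = by_team["untagged"] / sum(by_team.values())
```

The same grouping, run weekly, is what turns "the bill went up" into "team X's RDS spend went up."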

    Week 1: The Audit

    We start every FinOps engagement with a systematic audit using a combination of AWS Cost Explorer, Trusted Advisor, and custom scripts that query the AWS Cost and Usage Report via Athena. We map every resource to a team, a project, and a purpose using four mandatory tags: team, project, environment, and cost-center. In this case, we found the usual suspects:

    - 6 RDS db.r5.2xlarge instances sized for a peak that lasts 2 hours a day (average CPU utilization: 11%)
    - 14 development EC2 instances running 24/7, weekends included, at a combined cost of $4,200/month
    - 3 reserved instances purchased 18 months ago for workloads that had since been decommissioned
    - zero auto-scaling configured on 9 ECS services that could easily scale down during off-peak hours (10pm to 7am)
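The findings above all fall out of a few mechanical checks per resource. A minimal sketch of those heuristics, with illustrative thresholds (the exact cutoffs we use vary by engagement) and a hypothetical instance record:

```python
MANDATORY_TAGS = ("team", "project", "environment", "cost-center")

def audit_flags(inst):
    """Return audit flags for one instance record.

    A sketch of the audit heuristics: low average CPU suggests right-sizing,
    an always-on dev box suggests scheduling, and missing mandatory tags
    break cost attribution. Thresholds here are illustrative.
    """
    flags = []
    if inst["avg_cpu_pct"] < 15:                      # e.g. the RDS fleet at 11%
        flags.append("rightsizing-candidate")
    if inst["env"] == "dev" and inst["hours_per_week"] >= 168:
        flags.append("schedule-candidate")            # dev box running 24/7
    if not all(inst["tags"].get(t) for t in MANDATORY_TAGS):
        flags.append("missing-mandatory-tags")
    return flags

# Hypothetical record assembled from CloudWatch metrics and resource tags.
example = {"avg_cpu_pct": 11, "env": "dev", "hours_per_week": 168,
           "tags": {"team": "ml", "project": "api"}}
```

In practice the records come from CloudWatch `GetMetricStatistics` plus the resource tagging API; the point is that the audit is a loop over resources, not a manual spreadsheet exercise.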

    Week 2: The Quick Wins

    Some savings are immediate and require no architectural changes:

    - We right-sized the 6 RDS instances from db.r5.2xlarge ($1,740/month each) to db.r5.large ($435/month each) based on 30 days of CloudWatch metrics showing peak CPU at 34% and peak memory at 42%, well within the smaller instance's capacity.
    - We configured AWS Instance Scheduler on all development environments to run only Monday through Friday, 8am to 8pm, reducing their runtime by 65%. That alone saved $8,100/month.
    - We retired the 3 unused reserved instances and purchased Compute Savings Plans at a 1-year no-upfront commitment, covering 70% of their steady-state compute at a 28% discount.
    - We cleaned up $1,400/month of idle Elastic IPs, unused EBS volumes, and orphaned snapshots identified with AWS Trusted Advisor and a custom cleanup script.
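Two of the quick wins above are easy to sanity-check on the back of an envelope. The snippet below does the arithmetic; note these are instance-hour deltas only, so they will not reconcile to the penny with billed savings (storage, I/O, and partial-month proration all shift the final numbers).

```python
# Office-hours schedule: Mon-Fri, 8am-8pm, vs. always-on.
scheduled_hours = 5 * 12            # 60 hours/week
always_on_hours = 7 * 24            # 168 hours/week
runtime_reduction = 1 - scheduled_hours / always_on_hours   # ~0.643, i.e. ~65%

# RDS right-sizing: 6 instances, db.r5.2xlarge -> db.r5.large.
rds_delta = 6 * (1740 - 435)        # $7,830/month in instance-hours alone
```

The 60-of-168-hours calculation is why an office-hours schedule reliably cuts dev-environment runtime by roughly two-thirds.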

    Week 3: Structural Changes

    The bigger savings came from architectural changes that required careful planning and testing:

    - We implemented target-tracking auto-scaling on all ECS services with a CPU target of 60% and a scale-in cooldown of 300 seconds, allowing services to shrink from 4 tasks to 1 during off-peak hours.
    - We migrated 12TB of infrequently accessed data from S3 Standard ($0.023/GB-month) to S3 Intelligent-Tiering, which automatically moves objects between access tiers and saved roughly $180/month.
    - We consolidated 4 redundant ElastiCache clusters that had accumulated during two years of rapid growth into a single 3-shard Redis 7.0 cluster, reducing ElastiCache spend from $2,100/month to $720/month.
    - We set up a near-real-time cost dashboard in Grafana using the AWS Cost and Usage Report as a data source, broken down by team and project, so the question 'where is the money going?' always has an answer within 24 hours of spend.
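The target-tracking policy described above is a small piece of configuration. This sketch builds it as the JSON document that `aws application-autoscaling put-scaling-policy --policy-type TargetTrackingScaling` accepts; the 60-second scale-out cooldown is our assumption for illustration, since only the scale-in side is fixed above.

```python
def ecs_cpu_target_tracking(target_pct=60.0, scale_in_cooldown=300,
                            scale_out_cooldown=60):
    """Target-tracking policy configuration for an ECS service.

    Serialized to JSON, this is the value of
    --target-tracking-scaling-policy-configuration. The scale-out
    cooldown default is an illustrative assumption.
    """
    return {
        "TargetValue": target_pct,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleInCooldown": scale_in_cooldown,
        "ScaleOutCooldown": scale_out_cooldown,
    }
```

With a min capacity of 1 and a max of 4 registered on the scalable target, this is what lets a service drift down to a single task overnight and scale back out as morning traffic arrives.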

    The Result

    The monthly bill went from $65K to $26K, a 60% reduction, with zero degradation in latency, throughput, or availability. P99 latency on their primary API actually improved by 15% after the RDS right-sizing, because the new instances had faster EBS-optimized networking. The engineering team now has per-team, per-project cost visibility through Kubecost dashboards and weekly automated cost reports sent to Slack via a Lambda function. An AWS Config rule flags untagged resources within 24 hours of creation, and AWS Budgets alerts fire at 80% and 100% of each team's threshold, so cost overruns are caught before the end of the month. The CFO stopped asking hard questions, because the answers are always available.
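The weekly Slack report is mostly formatting work. A minimal sketch of the message-building half, assuming hypothetical team names and budgets (the actual Lambda would pull costs from the Cost and Usage Report and POST this payload to a Slack incoming webhook, which is omitted here):

```python
import json

def weekly_cost_message(costs_by_team, budget_by_team):
    """Format a weekly per-team cost summary as a Slack message payload.

    Illustrative sketch: team names, budgets, and the message layout are
    assumptions; the article only states that a Lambda sends weekly cost
    reports to Slack.
    """
    lines = ["*Weekly cloud cost report*"]
    for team, cost in sorted(costs_by_team.items(), key=lambda kv: -kv[1]):
        budget = budget_by_team.get(team)
        pct = f" ({cost / budget:.0%} of budget)" if budget else ""
        lines.append(f"- {team}: ${cost:,.0f}{pct}")
    return json.dumps({"text": "\n".join(lines)})
```

Pairing the formatted report with the 80%/100% AWS Budgets alerts means the same numbers reach engineers passively every week, not just when something is already on fire.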