Cutting Cloud Costs by 40%: FinOps Framework

The cloud bill nobody understands

We reviewed a client’s AWS bill last October. They were paying 14,200 euros a month. When we asked what percentage of that bill they could explain line by line, the answer was “maybe 60%.” The remaining 40% was a mystery, spread across instances someone had spun up for a test eight months earlier, EBS snapshots accumulating since 2022, and cross-region data transfer nobody had ever questioned.

That is the reality for most mid-market companies on cloud. They are not overspending on purpose. They lack visibility into what they spend, and without visibility there is no optimization.

According to the FinOps Foundation, FinOps is not a tool. It is an operational practice that turns cloud spend into a conscious business decision. For a company spending between 5,000 and 50,000 euros monthly on cloud, implementing basic FinOps can generate savings of 30-45% in the first 90 days.

The three-phase framework

Phase 1: Visibility (weeks 1-2)

You cannot optimize what you do not measure. Before touching a single instance, you need a complete map of your cloud spend.

Mandatory tagging. Every cloud resource must have at least three tags: team (who created it), project (what it serves), and environment (production, staging, development). Without tags, your bill is a wall of anonymous line items. AWS Cost Explorer and GCP billing reports can break spend down by tag (GCP calls them labels, and on AWS a tag only shows up in Cost Explorer after it has been activated as a cost allocation tag). Without tags, those tools are close to useless.

We implement a tag-or-terminate policy: any resource without tags after 72 hours receives an alert. After one week, it gets shut down. Sounds aggressive. It works. In three weeks, we went from 23% of resources tagged to 94%.
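The tag-or-terminate policy boils down to a small decision rule. A minimal sketch, assuming resource records (ID, creation time, tags) have already been pulled from the provider's inventory. The `Resource` type, the function names, and the action labels are hypothetical, and the actual alerting and shutdown calls are deliberately left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# The three mandatory tags from the policy above.
REQUIRED_TAGS = {"team", "project", "environment"}
ALERT_AFTER = timedelta(hours=72)
TERMINATE_AFTER = timedelta(weeks=1)


@dataclass
class Resource:
    """Hypothetical inventory record; in practice built from the provider's API."""
    resource_id: str
    created_at: datetime
    tags: dict[str, str] = field(default_factory=dict)


def missing_tags(resource: Resource) -> set[str]:
    # A tag whose value is empty or whitespace counts as missing.
    present = {key for key, value in resource.tags.items() if value.strip()}
    return REQUIRED_TAGS - present


def policy_action(resource: Resource, now: datetime) -> str:
    """Return 'ok', 'grace', 'alert', or 'terminate' for one resource."""
    if not missing_tags(resource):
        return "ok"
    age = now - resource.created_at
    if age >= TERMINATE_AFTER:
        return "terminate"
    if age >= ALERT_AFTER:
        return "alert"
    return "grace"  # still inside the 72-hour window


# Usage: a four-day-old instance carrying only a team tag.
now = datetime(2024, 10, 15, tzinfo=timezone.utc)
partly_tagged = Resource("i-0abc", created_at=now - timedelta(days=4), tags={"team": "data"})
print(policy_action(partly_tagged, now))  # alert
```

Keeping the decision separate from the side effects makes it easy to run the policy in report-only mode for the first week before anything is actually shut down.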

Real-time cost dashboard. Not a monthly report nobody reads. A dashboard showing accumulated spend for the day, week, and month, with comparisons against the prior period. We use Grafana Cloud with billing API data. Having this dashboard visible on a monitor in the engineering area is not optional.
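The period comparison behind the dashboard is simple arithmetic once daily costs are exported. A minimal sketch, assuming a hypothetical mapping of date to daily cost already pulled from the billing data. It compares month-to-date spend with the same number of days in the previous month.

```python
from datetime import date, timedelta


def month_to_date_vs_prior(daily_costs: dict[date, float], today: date) -> tuple[float, float, float]:
    """Compare month-to-date spend with the same span of the previous month.

    Returns (current_mtd, prior_mtd, percent_change). Days with no data count as zero.
    """
    first_of_month = today.replace(day=1)
    prior_month_end = first_of_month - timedelta(days=1)
    prior_first = prior_month_end.replace(day=1)
    days_elapsed = today.day
    # Clamp so that, for example, March 31 compares against the whole of February.
    prior_days = min(days_elapsed, prior_month_end.day)

    current = sum(daily_costs.get(first_of_month + timedelta(days=i), 0.0) for i in range(days_elapsed))
    prior = sum(daily_costs.get(prior_first + timedelta(days=i), 0.0) for i in range(prior_days))
    change = (current - prior) / prior * 100 if prior else float("nan")
    return current, prior, change
```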

Anomaly detection. AWS Cost Anomaly Detection (free) or an equivalent. If your daily spend deviates by more than 20% from its recent average, you want to know before a week passes. We have caught GPU instances left running on a Friday that would have cost 800 euros over the weekend.
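For teams that want their own check on exported daily costs, the 20% rule can be expressed as a comparison against a trailing average. This is a hypothetical stand-in, not how AWS Cost Anomaly Detection works internally (that service uses its own machine-learning models). The 14-day window is our assumption.

```python
from statistics import mean


def is_anomalous(history: list[float], today: float, threshold: float = 0.20, window: int = 14) -> bool:
    """Flag today's spend if it deviates more than `threshold` from the trailing average.

    `history` holds previous daily totals, oldest first; only the last `window` days are used.
    """
    recent = history[-window:]
    if not recent:
        return False  # no baseline yet
    baseline = mean(recent)
    if baseline == 0:
        return today > 0
    # Deviation in either direction counts: a sudden drop can mean something broke.
    return abs(today - baseline) / baseline > threshold
```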

Phase 2: Optimization (weeks 3-6)

With visibility established, we move to the three optimization levers that generate 80% of savings.

Right-sizing. Most instances are oversized. It is natural: nobody wants production to go down because it lacks resources, so they request a t3.xlarge “just in case” when a t3.medium would suffice.

AWS Compute Optimizer and GCP Recommender analyze the actual CPU, memory, and network usage of your instances and recommend the correct size (on AWS, memory utilization is only taken into account if the CloudWatch agent is installed). In our experience, between 40% and 60% of instances can be reduced by at least one size with no performance impact.
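The tools do the heavy lifting, but the logic behind a downsize is worth seeing. Sizes within a family do not shrink uniformly: t3.xlarge to t3.large halves both vCPUs and memory, while t3.large to t3.medium keeps 2 vCPUs and halves only memory. A sketch that projects p95 usage onto each smaller size; the 70% ceiling and the use of p95 metrics are our illustrative assumptions, not the rules Compute Optimizer or Recommender actually apply.

```python
# vCPUs and memory (GiB) for part of the t3 family, smallest first.
T3_SIZES = [
    ("t3.medium", 2, 4),
    ("t3.large", 2, 8),
    ("t3.xlarge", 4, 16),
    ("t3.2xlarge", 8, 32),
]

CEILING = 0.70  # hypothetical: projected p95 utilization must stay under 70%


def smallest_safe_size(current: str, p95_cpu: float, p95_mem: float) -> str:
    """Return the smallest t3 size whose projected p95 CPU and memory stay under CEILING.

    p95_cpu and p95_mem are fractions (0-1) measured on the current size.
    """
    names = [name for name, _, _ in T3_SIZES]
    idx = names.index(current)
    _, cur_cpu, cur_mem = T3_SIZES[idx]
    used_cpu = p95_cpu * cur_cpu  # vCPUs actually busy at p95
    used_mem = p95_mem * cur_mem  # GiB actually in use at p95
    best = current
    # Walk down one size at a time and stop at the first size that would be too tight.
    for name, cpu, mem in reversed(T3_SIZES[:idx]):
        if used_cpu / cpu < CEILING and used_mem / mem < CEILING:
            best = name
        else:
            break
    return best


# The "just in case" t3.xlarge from the example above, mostly idle.
print(smallest_safe_size("t3.xlarge", p95_cpu=0.20, p95_mem=0.15))  # t3.medium
```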

The trick is not doing it all at once. We start with development and staging instances (low risk). Then we move to production, one instance at a time, with performance monitoring for 48 hours before confirming. Typical right-sizing saves between 15% and 25% of compute spend.
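The rollout order can be written down as a schedule: development and staging go first in one batch, then production instances one at a time, each followed by a 48-hour observation window. A sketch with hypothetical names; it only plans the order and does not resize anything.

```python
from datetime import datetime, timedelta

OBSERVATION = timedelta(hours=48)
LOW_RISK = {"development", "staging"}


def rollout_plan(candidates: list[tuple[str, str]], start: datetime) -> list[tuple[datetime, list[str]]]:
    """Plan right-sizing batches from (instance_id, environment) pairs.

    Low-risk environments are resized together at `start` and observed for 48 hours.
    After that, each remaining instance gets its own slot, spaced by the observation
    window. Anything not explicitly low-risk is treated as production.
    """
    low_risk = [iid for iid, env in candidates if env in LOW_RISK]
    production = [iid for iid, env in candidates if env not in LOW_RISK]
    plan = []
    slot = start
    if low_risk:
        plan.append((slot, low_risk))
        slot += OBSERVATION
    for iid in production:
        plan.append((slot, [iid]))
        slot += OBSERVATION
    return plan
```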

Reserved Instances / Committed Use. If an instance will be running 24/7 for the next year (databases, core APIs, caches), paying on-demand is throwing money away. Discounts vary by instance family and region, but AWS Reserved Instances typically offer 30-40% off (1-year, no upfront) or 55-65% off (3-year, all upfront).
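Whether a reservation pays off depends on how many hours the instance actually runs, because a reservation is billed for every hour of its term, used or not. With a discount d, the reserved cost is (1 - d) × on-demand rate × all hours. Reserving therefore beats on-demand once the instance runs more than (1 - d) of the time. The hourly rate below is a placeholder, not a real AWS price.

```python
HOURS_PER_YEAR = 8760


def yearly_costs(on_demand_rate: float, discount: float, running_fraction: float) -> tuple[float, float]:
    """Return (on_demand_cost, reserved_cost) for one instance over a year.

    on_demand_rate: hypothetical on-demand price per hour.
    discount: reservation discount as a fraction (0.35 means 35%).
    running_fraction: share of the year the instance actually runs.
    """
    on_demand = on_demand_rate * HOURS_PER_YEAR * running_fraction
    # A reservation is paid for every hour of the term, whether the instance runs or not.
    reserved = on_demand_rate * (1 - discount) * HOURS_PER_YEAR
    return on_demand, reserved


def break_even_fraction(discount: float) -> float:
    """Running fraction above which the reservation is cheaper than on-demand."""
    return 1 - discount


# A 24/7 instance at a placeholder 0.20/hour with a 35% one-year discount.
on_demand, reserved = yearly_costs(0.20, 0.35, running_fraction=1.0)
print(round(on_demand, 2), round(reserved, 2))  # 1752.0 1138.8
```

With the 30-40% one-year discounts quoted above, the break-even lands between 60% and 70% of hours running.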

Our rule: any instance with utilization above 70% over the last 3 months (measured as the share of hours it was running, not CPU load) is a reservation candidate. We do not reserve 100% of the fleet. We reserve 60-70% (the base load) and leave the rest on-demand to absorb spikes.
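The selection rule and the partial coverage fit in a few lines. A sketch assuming each instance record carries the share of hours it ran over the last three months. The 65% coverage target is simply the midpoint of the 60-70% range, and the names are hypothetical.

```python
import math

RUNNING_THRESHOLD = 0.70  # ran more than 70% of hours over the last 3 months
COVERAGE_TARGET = 0.65    # reserve roughly 60-70% of the fleet (midpoint)


def pick_reservations(running_share: dict[str, float]) -> list[str]:
    """Choose instances to reserve from a {instance_id: share_of_hours_running} map.

    Candidates must exceed RUNNING_THRESHOLD. The number reserved is capped at
    COVERAGE_TARGET of the whole fleet so the remainder stays on-demand for spikes.
    The most consistently running instances are reserved first.
    """
    candidates = sorted(
        (iid for iid, share in running_share.items() if share > RUNNING_THRESHOLD),
        key=lambda iid: running_share[iid],
        reverse=True,
    )
    cap = math.floor(len(running_share) * COVERAGE_TARGET)
    return candidates[:cap]
```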

A common mistake: buying Savings Plans (AWS's commitment to a fixed hourly spend, a more flexible alternative to Reserved Instances) that are too large. If your actual usage drops (because you right-sized or because a project ended), the commitment still forces you to pay for the unused capacity. Better to start conservative and expand than to overbuy.
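The cost of over-committing is easy to quantify. A Savings Plan bills the full hourly commitment every hour, so whenever eligible usage (priced at Savings Plan rates) falls below it, the difference is wasted. The sketch below uses made-up hourly figures and ignores the discount itself to keep the arithmetic visible; the function names are hypothetical.

```python
def unused_commitment(commitment_per_hour: float, hourly_usage: list[float]) -> float:
    """Total commitment paid but not consumed.

    hourly_usage holds eligible usage per hour, already priced at Savings Plan rates.
    The full commitment is billed every hour; any shortfall is wasted.
    """
    return sum(max(commitment_per_hour - used, 0.0) for used in hourly_usage)


def utilization(commitment_per_hour: float, hourly_usage: list[float]) -> float:
    """Share of the commitment actually used (1.0 means fully used)."""
    paid = commitment_per_hour * len(hourly_usage)
    return 1 - unused_commitment(commitment_per_hour, hourly_usage) / paid


# Usage drops from 10 to 6 per hour halfway through the day, e.g. after right-sizing.
usage = [10.0] * 12 + [6.0] * 12
print(unused_commitment(10.0, usage))  # 48.0
```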

Spot Instances for fault-tolerant workloads. Data pipelines, integration tests, batch processing, ephemeral development environments: all of this can run on spot instances at discounts of 60-90% compared with on-demand. Yes, an instance can disappear with two minutes' notice. But if your pipeline is designed for checkpointing and automatic retry, the interruption is a minor inconvenience, not a disaster.
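In practice, "designed for checkpointing" means the job records progress after each unit of work, so a replacement instance resumes where the interrupted one stopped instead of starting over. A minimal sketch with a local JSON file standing in for durable storage. On a real spot instance the checkpoint would have to live off the instance (object storage, for example), since local storage goes away with it. The function name and file layout are hypothetical.

```python
import json
import os
from typing import Callable


def run_with_checkpoint(items: list[str], process: Callable[[str], None], checkpoint_path: str) -> int:
    """Process items in order, skipping those already recorded in the checkpoint.

    Returns how many items were processed in this run. If `process` raises
    (for example because the instance is being reclaimed), everything finished
    so far is already saved and the next run picks up from there.
    """
    done: set[str] = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))

    processed = 0
    for item in items:
        if item in done:
            continue  # finished before the interruption
        process(item)
        done.add(item)
        # Write to a temp file and rename, so a crash mid-write never corrupts the checkpoint.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(done), f)
        os.replace(tmp, checkpoint_path)
        processed += 1
    return processed
```

The retry half is then just rerunning the same job with the same checkpoint path, which any scheduler or Auto Scaling replacement can do.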

We use AWS Spot Fleet with instance diversification: instead of requesting 10 m5.xlarge instances, we request a fleet drawing on m5.xlarge, m5a.xlarge, m5d.xlarge, and m4.xlarge. This drastically reduces the risk of losing a large part of the fleet at once, because we do not depend on a single capacity pool.
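The point of diversification is that no single capacity pool (one instance type in one Availability Zone) holds a large share of the fleet, so when one pool is reclaimed, only a fraction of capacity goes with it. The sketch below spreads a target count round-robin across pools. It is a hypothetical illustration of the idea, not how Spot Fleet's allocation strategies actually decide placement, and the zone names are examples.

```python
from collections import Counter
from itertools import cycle, product


def spread_across_pools(target: int, instance_types: list[str], zones: list[str]) -> Counter:
    """Assign `target` instances round-robin over every (instance type, zone) pool."""
    pools = cycle(product(instance_types, zones))
    return Counter(next(pools) for _ in range(target))


def worst_case_loss(allocation: Counter) -> float:
    """Share of the fleet lost if the single largest pool is reclaimed."""
    return max(allocation.values()) / sum(allocation.values())


types = ["m5.xlarge", "m5a.xlarge", "m5d.xlarge", "m4.xlarge"]
print(worst_case_loss(spread_across_pools(10, ["m5.xlarge"], ["eu-west-1a"])))  # 1.0
print(worst_case_loss(spread_across_pools(10, types, ["eu-west-1a"])))  # 0.3
```

With all 10 instances in one pool, a single reclaim event takes everything. Spread over the four types, the worst case drops to 30%, and adding a second zone brings it down further.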

Phase 3: Governance (week 7 onward)

Optimization without governance is a sprint, not a marathon. Savings erode within weeks when someone spins up a “temporary” instance that stays forever.

Per-team budgets. Each team has a monthly cloud budget. It is not a hard limit (we do not cut services if exceeded), but it is a visible number. When a team exceeds its budget, they have to explain why. That alone changes behavior.
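Because every resource carries a `team` tag from Phase 1, the budget check is a group-by and a comparison. A sketch assuming cost line items have already been exported as (team, cost) pairs; the budgets and figures are hypothetical. Overruns are only reported, never enforced, mirroring the soft-limit approach.

```python
from collections import defaultdict


def budget_report(line_items: list[tuple[str, float]], budgets: dict[str, float]) -> dict[str, float]:
    """Return {team: overrun} for teams whose monthly spend exceeds their budget.

    Untagged spend is grouped under 'untagged' so it stays visible.
    A team with no budget is reported with its full spend as the overrun.
    """
    spend: dict[str, float] = defaultdict(float)
    for team, cost in line_items:
        spend[team or "untagged"] += cost
    return {
        team: total - budgets.get(team, 0.0)
        for team, total in spend.items()
        if total > budgets.get(team, 0.0)
    }
```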

Monthly cost review. 30 minutes, once a month. Review the three largest cost increases compared to the prior month, identify zombie resources (running but receiving no traffic), and update reservations. This meeting is the most important piece of the framework because it creates ongoing accountability.
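Two of the review's inputs can be prepared automatically before the meeting. A sketch assuming per-service monthly costs and per-resource request counts are already exported. The names and data shapes are hypothetical, and "no traffic" is taken here to mean zero requests over the period.

```python
def top_increases(prior: dict[str, float], current: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """The n largest month-over-month cost increases, largest first.

    Services that appear in only one month are treated as zero in the other.
    """
    deltas = {key: current.get(key, 0.0) - prior.get(key, 0.0) for key in prior.keys() | current.keys()}
    rising = [(key, delta) for key, delta in deltas.items() if delta > 0]
    return sorted(rising, key=lambda kv: kv[1], reverse=True)[:n]


def zombies(requests_in_period: dict[str, int]) -> list[str]:
    """Resources that are running but received no requests in the period."""
    return sorted(rid for rid, count in requests_in_period.items() if count == 0)
```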

Automated cleanup. Scripts that shut down development environments outside business hours (business hours being Monday to Friday, 8 AM to 8 PM). Scripts that delete EBS snapshots older than 30 days (except those tagged as critical). Scripts that detect load balancers without targets, unassociated Elastic IPs, and unattached EBS volumes.

This cleanup automation alone saves between 8% and 12% of total spend, and once implemented requires no human effort.
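The schedule and retention rules translate into two predicates that a scheduled job could evaluate. This sketch covers only the decisions and makes no provider API calls. The `critical` tag key and value, the use of the timestamp's own local time, and the function names are assumptions.

```python
from datetime import datetime, timedelta

SNAPSHOT_RETENTION = timedelta(days=30)


def in_business_hours(now: datetime) -> bool:
    """Monday-Friday, 08:00-20:00, in whatever local time `now` is expressed in.

    Development environments should be running only when this returns True.
    """
    return now.weekday() < 5 and 8 <= now.hour < 20


def snapshots_to_delete(snapshots: list[tuple[str, datetime, dict[str, str]]], now: datetime) -> list[str]:
    """IDs of snapshots older than 30 days, skipping any tagged critical=true.

    Each snapshot is a (snapshot_id, created_at, tags) tuple.
    """
    return [
        sid
        for sid, created_at, tags in snapshots
        if now - created_at > SNAPSHOT_RETENTION and tags.get("critical", "").lower() != "true"
    ]
```

The schedule alone explains much of the development savings: 12 hours × 5 days is 60 of the week's 168 hours, so those environments are off roughly 64% of the time.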

Real results

The client from the beginning of this article went from 14,200 to 8,400 euros monthly in 90 days. A 41% reduction.

Breakdown: right-sizing saved 2,100 euros/month (we downsized 12 instances). Reserved Instances saved 1,800 euros/month (6 database instances and 4 API instances). Zombie resource cleanup saved 1,300 euros/month (snapshots, test instances, a NAT Gateway nobody used). Development environment shutdown scheduling saved 600 euros/month.
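The figures above check out with a few lines of arithmetic. The four savings add up to the 5,800 euros/month gap between 14,200 and 8,400, and annualizing both bills gives the yearly numbers used at the end of this article.

```python
before, after = 14_200, 8_400  # euros per month
savings = {
    "right-sizing": 2_100,
    "reserved instances": 1_800,
    "zombie cleanup": 1_300,
    "dev shutdown scheduling": 600,
}

total = sum(savings.values())
print(total, before - after)                # 5800 5800
print(f"{total / before:.1%}")              # 40.8%
print(before * 12, after * 12, total * 12)  # 170400 100800 69600
```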

Total effort was 40 hours of a cloud engineer's time over 6 weeks. The investment paid for itself in the second month.

It is not magic. It is visibility, discipline, and a repeatable process. For mid-market companies, the difference between having FinOps and not having it is not a marginal refinement. It is the difference between burning roughly 170,000 euros per year and burning roughly 100,000. Those 70,000 euros are a senior engineer, or three innovation projects, or a financial cushion that could save you one day.

If your cloud bill is a mystery, the first step is a technology audit that identifies where the money is. Because it is somewhere. You just have to look. For companies with multi-cloud environments, FinOps optimization is even more critical as spend sources multiply. Our cloud and DevOps team implements complete FinOps frameworks.
