Cloud Cost Engineering for SREs: FinOps Practices That Actually Work

*By Zak Hassan — Staff SRE | May 2026*

Cloud costs are reliability's shadow metric. A team that over-provisions for reliability headroom wastes money; a team that under-provisions to save money creates reliability risk. The SRE who understands cost engineering can make reliability investments intelligently — buying the right amount of resilience for the actual risk — instead of either hoarding resources out of fear or cutting corners that will matter at 3am.

This is the operational side of cloud cost engineering: attribution, rightsizing, purchasing strategies, and the tools that make cost management a continuous discipline rather than a quarterly panic.

The Attribution Problem

The fundamental challenge in cloud cost management is attribution — knowing which team, service, or product is responsible for which cost. Without attribution, cost conversations are impossible: you can see the total bill going up, but you can't hold anyone accountable or make intelligent decisions about where to cut.

Tagging strategy comes first. Every resource in AWS, GCP, or Azure should carry a minimum set of tags:

import json

# Required tag set, audited via an AWS Config rule or GCP Organization Policy
REQUIRED_TAGS = {
    "team": "The engineering team responsible (e.g., 'backend', 'data')",
    "service": "The application or service (e.g., 'checkout-service', 'ml-pipeline')",
    "environment": "production | staging | development",
    "cost_center": "Finance cost center code for chargeback",
}

# AWS Config managed rule that flags resources missing any of these tags.
# It reports noncompliance; it does not block resource creation.
# Apply with boto3.client("config").put_config_rule(ConfigRule=aws_config_rule).
aws_config_rule = {
    "ConfigRuleName": "required-tags",
    "Source": {
        "Owner": "AWS",
        "SourceIdentifier": "REQUIRED_TAGS"  # identifier of the AWS managed rule
    },
    # InputParameters must be a JSON-encoded string, not a dict
    "InputParameters": json.dumps({
        "tag1Key": "team",
        "tag2Key": "service",
        "tag3Key": "environment",
        "tag4Key": "cost_center"
    })
}
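
The same tag set can also be checked before a resource reaches the account, for example in a CI step that inspects planned infrastructure. A minimal sketch, assuming tags arrive as a plain dict; `missing_tags`, `REQUIRED_TAG_KEYS`, and `ALLOWED_ENVIRONMENTS` are illustrative names, not part of any AWS API:

```python
REQUIRED_TAG_KEYS = ("team", "service", "environment", "cost_center")
ALLOWED_ENVIRONMENTS = {"production", "staging", "development"}


def missing_tags(resource_tags: dict[str, str]) -> list[str]:
    """Return the problems found in one resource's tags (empty list = compliant)."""
    problems = []
    for key in REQUIRED_TAG_KEYS:
        # Treat empty or whitespace-only values as missing.
        if not resource_tags.get(key, "").strip():
            problems.append(f"missing tag: {key}")
    env = resource_tags.get("environment", "").strip()
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid environment: {env}")
    return problems
```

Rejecting free-form environment values matters as much as requiring the key: "prod" and "production" end up as separate rows in every cost report.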

Kubernetes workload costs require a different approach — tagging cloud instances doesn't tell you which Kubernetes service consumed which resources on a shared node:

# OpenCost / Kubecost: namespace-level cost attribution
# Query the OpenCost API for cost by namespace
import requests

def get_namespace_costs(window: str = "7d") -> dict:
    response = requests.get(
        "http://opencost.monitoring:9003/allocation/compute",
        params={
            "window": window,
            "aggregate": "namespace",
            "accumulate": "true"  # one accumulated result set for the whole window
        },
        timeout=30
    )
    response.raise_for_status()  # fail loudly instead of parsing an error body
    
    allocations = response.json()["data"][0]
    
    costs_by_namespace = {}
    for namespace, data in allocations.items():
        costs_by_namespace[namespace] = {
            "total_cost": data["totalCost"],
            "cpu_cost": data["cpuCost"],
            "memory_cost": data["ramCost"],
            "storage_cost": data["pvCost"],
            "network_cost": data["networkCost"],
            "efficiency": data["totalEfficiency"]  # fraction (0-1) of requested CPU/RAM actually used
        }
    
    # Most expensive namespaces first
    return dict(sorted(costs_by_namespace.items(), key=lambda x: x[1]["total_cost"], reverse=True))
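
One way to use that output is to rank namespaces by the money spent on requested-but-idle resources. A sketch, assuming `efficiency` is a fraction between 0 and 1; `wasted_spend` and its `min_cost` cutoff are hypothetical. Because total cost includes storage and network while efficiency reflects CPU and memory, treat the result as a rough ranking rather than an exact figure:

```python
def wasted_spend(costs_by_namespace: dict, min_cost: float = 50.0) -> list[tuple[str, float]]:
    """Estimate spend on idle requested resources per namespace, largest first."""
    waste = []
    for namespace, cost in costs_by_namespace.items():
        if cost["total_cost"] < min_cost:
            continue  # too small to be worth a rightsizing conversation
        # Clamp efficiency to [0, 1] so bursting workloads do not yield negative waste.
        idle_fraction = 1.0 - min(max(cost["efficiency"], 0.0), 1.0)
        waste.append((namespace, round(cost["total_cost"] * idle_fraction, 2)))
    return sorted(waste, key=lambda item: item[1], reverse=True)
```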

Rightsizing: The Biggest Lever

Rightsizing — matching instance sizes to actual workload — is consistently the highest-ROI cloud cost reduction activity. Most production environments are significantly over-provisioned because developers over-specify resource requirements to avoid OOM kills, and the over-specification is never revisited.

AWS Compute Optimizer integration:

import boto3

def get_rightsizing_recommendations() -> list[dict]:
    optimizer = boto3.client('compute-optimizer', region_name='us-west-2')
    
    # Get EC2 instance recommendations (first page only; follow nextToken for large fleets)
    ec2_recs = optimizer.get_ec2_instance_recommendations(
        filters=[{
            "name": "Finding",
            "values": ["Overprovisioned"]  # Only get downsizing opportunities
        }]
    )
    
    recommendations = []
    for rec in ec2_recs['instanceRecommendations']:
        options = rec.get('recommendationOptions', [])
        if not options:
            continue  # nothing to recommend for this instance
        best = options[0]  # Best option
        
        # Savings are nested under savingsOpportunity and may be absent
        savings = (best.get('savingsOpportunity') or {}).get('estimatedMonthlySavings', {})
        
        recommendations.append({
            "instance_id": rec['instanceArn'].split('/')[-1],
            "current_type": rec['currentInstanceType'],
            "recommended_type": best['instanceType'],
            "finding": rec['finding'],
            "estimated_monthly_savings": savings.get('value', 0),
            # MAXIMUM is the observed peak over the lookback period, not a p99
            "utilization_max_cpu": next((m['value'] for m in rec['utilizationMetrics'] if m['name'] == 'CPU' and m['statistic'] == 'MAXIMUM'), None)
        })
    
    return sorted(recommendations, key=lambda x: x['estimated_monthly_savings'], reverse=True)

Kubernetes resource rightsizing (from VPA recommendations, covered in the Kubernetes post):

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running in a pod
k8s_client = client.CustomObjectsApi()

# AWS us-east-1 approximate rates
CPU_RATE_PER_VCPU_HOUR = 0.048   # $0.048/vCPU-hour
MEMORY_RATE_PER_GB_HOUR = 0.006  # $0.006/GB-hour
HOURS_PER_MONTH = 30 * 24

def generate_rightsizing_report(namespace: str) -> list[dict]:
    """Generate a rightsizing report comparing current requests vs VPA recommendations.

    get_current_requests, parse_cpu, and parse_memory_gb are helpers assumed to be
    defined elsewhere; they are not part of the kubernetes client.
    """
    
    vpas = k8s_client.list_namespaced_custom_object(
        group="autoscaling.k8s.io",
        version="v1",
        namespace=namespace,
        plural="verticalpodautoscalers"
    )
    
    report = []
    for vpa in vpas['items']:
        deployment = vpa['spec']['targetRef']['name']
        
        # A VPA that has not collected enough data yet has no recommendation
        if 'recommendation' not in vpa.get('status', {}):
            continue
        
        for container in vpa['status']['recommendation'].get('containerRecommendations', []):
            current = get_current_requests(namespace, deployment, container['containerName'])
            recommended = container['target']
            
            # Calculate cost delta (approximate); positive means requests can shrink
            cpu_delta_cores = (parse_cpu(current.get('cpu', '0')) - 
                              parse_cpu(recommended.get('cpu', '0')))
            mem_delta_gb = (parse_memory_gb(current.get('memory', '0')) - 
                           parse_memory_gb(recommended.get('memory', '0')))
            
            monthly_savings = (cpu_delta_cores * HOURS_PER_MONTH * CPU_RATE_PER_VCPU_HOUR +
                              mem_delta_gb * HOURS_PER_MONTH * MEMORY_RATE_PER_GB_HOUR)
            
            if monthly_savings > 10:  # Only flag significant savings
                report.append({
                    "deployment": deployment,
                    "container": container['containerName'],
                    "current_cpu": current.get('cpu'),
                    "recommended_cpu": recommended.get('cpu'),
                    "current_memory": current.get('memory'),
                    "recommended_memory": recommended.get('memory'),
                    "estimated_monthly_savings": monthly_savings
                })
    
    return sorted(report, key=lambda x: x['estimated_monthly_savings'], reverse=True)
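
The report relies on `parse_cpu` and `parse_memory_gb`, which the post does not show. One possible shape for them, as a sketch: it handles the common Kubernetes quantity suffixes, returns memory in GiB, and raises `ValueError` on anything else (exponent notation or rarer suffixes such as `Pi`).

```python
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity ('250m', '2', '1.5') to cores."""
    quantity = str(quantity).strip()
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000.0  # millicores
    return float(quantity)


def parse_memory_gb(quantity: str) -> float:
    """Convert a Kubernetes memory quantity ('512Mi', '2Gi', '1073741824') to GiB."""
    quantity = str(quantity).strip()
    # Check two-letter binary suffixes before one-letter decimal ones.
    for suffix in sorted(_MEMORY_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * _MEMORY_UNITS[suffix] / 2**30
    return float(quantity) / 2**30  # no suffix means plain bytes
```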

Spot and Preemptible Instances: The 70% Discount

Spot instances (AWS) and preemptible or Spot VMs (GCP) offer the same compute as on-demand instances at a 60-90% discount. The trade-off is that the provider can reclaim them on short notice: two minutes on AWS, about 30 seconds on GCP. Most workloads can be engineered to tolerate spot interruption.
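
To see what that discount means for a fleet that only partly moves to spot, the arithmetic can be sketched as below. The function name, rates, and node counts are hypothetical and chosen only for illustration.

```python
def blended_hourly_cost(on_demand_rate: float, node_count: int,
                        spot_fraction: float, spot_discount: float) -> float:
    """Hourly fleet cost when spot_fraction of nodes run on spot at spot_discount off."""
    spot_nodes = node_count * spot_fraction
    on_demand_nodes = node_count - spot_nodes
    spot_rate = on_demand_rate * (1.0 - spot_discount)
    return on_demand_nodes * on_demand_rate + spot_nodes * spot_rate
```

With 20 nodes at a hypothetical $0.10 per hour, moving half the fleet to spot at a 70% discount takes the hourly cost from about $2.00 to about $1.30, a 35% reduction overall.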

Workloads appropriate for spot:

Batch processing, ML training, data pipelines
Kubernetes worker nodes running stateless services (with correct PodDisruptionBudgets)
CI/CD runners
Development and staging environments

Workloads NOT appropriate for spot:

Primary database instances
Single-replica stateful services
Anything requiring guaranteed availability during business hours

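# Kubernetes: prefer spot nodes but fall back to on-demand
# Fragment only: selector, replicas, and containers are omitted.
# The taint and label keys below are placeholders. Substitute the keys your node
# provisioner actually sets, for example karpenter.sh/capacity-type (spot | on-demand)
# or eks.amazonaws.com/capacityType (SPOT | ON_DEMAND).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  template:
    spec:
      # Tolerate the taint applied to spot nodes
      tolerations:
        - key: "node.kubernetes.io/spot"
          operator: "Exists"
          effect: "NoSchedule"
      
      # Prefer spot, fall back to on-demand
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: "node.kubernetes.io/capacity-type"
                    operator: In
                    values: ["spot", "preemptible"]
          # Hard requirement: only nodes that carry a capacity-type label are eligible
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "node.kubernetes.io/capacity-type"
                    operator: In
                    values: ["spot", "preemptible", "on-demand"]  # Fall back to on-demand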

Handling spot interruption gracefully:

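# Spot interruption handler, run on every node as a DaemonSet
import os
import subprocess
import time

import requests

IMDS = "http://169.254.169.254/latest"

def imds_token() -> str:
    """Fetch an IMDSv2 session token (required when IMDSv1 is disabled)."""
    response = requests.put(
        f"{IMDS}/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
        timeout=1,
    )
    response.raise_for_status()
    return response.text

def watch_for_interruption():
    """
    AWS provides a 2-minute warning before spot interruption via instance metadata.
    Use it to drain the node gracefully.
    """
    # Assumption: NODE_NAME is injected through the downward API (spec.nodeName).
    # Pods also need hostNetwork or an IMDS hop limit of 2 to reach the metadata service.
    node_name = os.environ["NODE_NAME"]
    
    while True:
        try:
            # instance-action returns 404 until an interruption is scheduled
            response = requests.get(
                f"{IMDS}/meta-data/spot/instance-action",
                headers={"X-aws-ec2-metadata-token": imds_token()},
                timeout=1,
            )
            
            if response.status_code == 200:
                # Interruption is coming: drain this node
                subprocess.run([
                    "kubectl", "drain", node_name,
                    "--ignore-daemonsets",
                    "--delete-emptydir-data",
                    "--grace-period=90",   # 90 seconds to finish current work
                    "--timeout=100s"
                ], check=False)
                
                break  # Node will be terminated, no need to continue
                
        except requests.exceptions.RequestException:
            pass  # Metadata service unreachable; retry on the next poll
        
        time.sleep(5)  # Poll every 5 seconds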
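
The spot interruption notice that AWS publishes in instance metadata is a small JSON document with an `action` and a UTC `time`. A handler can read it to work out how much of the warning window is left before choosing a drain grace period. A sketch; `seconds_until_interruption` is a hypothetical helper:

```python
import json
from datetime import datetime, timezone
from typing import Optional


def seconds_until_interruption(notice_body: str, now: Optional[datetime] = None) -> float:
    """Seconds left before the action in a spot instance-action notice takes effect."""
    notice = json.loads(notice_body)
    # The time field is UTC, formatted like 2017-09-18T08:22:00Z.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max((deadline - now).total_seconds(), 0.0)
```

If the handler only notices the warning late, for example after a slow poll, the remaining time can be well under two minutes, and a fixed 90-second grace period may no longer fit.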

Reserved Capacity and Savings Plans

For predictable baseline workloads, committed-use discounts offer 40-60% savings over on-demand pricing. The decision framework:

Compute Savings Plans (AWS): commit to a dollar/hour spend on compute. The flexibility to change instance types, regions, and operating systems makes this lower risk than Reserved Instances.

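from dataclasses import dataclass
from datetime import datetime, timedelta

import boto3

@dataclass
class SavingsPlanRecommendation:
    hourly_commitment: float
    annual_savings: float
    commitment_years: int
    confidence: str

def analyze_savings_plan_opportunity(
    lookback_days: int = 90,
    commitment_years: int = 1
) -> SavingsPlanRecommendation:
    """
    Analyze historical compute spend to recommend Savings Plan commitment.
    """
    ce = boto3.client('ce')  # Cost Explorer's boto3 service name is 'ce'
    
    # Get historical on-demand spend. This filter covers every service; add a SERVICE
    # filter to restrict it to what Compute Savings Plans cover (EC2, Fargate, Lambda).
    # Confirm the exact value with ce.get_dimension_values(Dimension='PURCHASE_TYPE').
    historical_spend = ce.get_cost_and_usage(
        TimePeriod={
            'Start': (datetime.today() - timedelta(days=lookback_days)).strftime('%Y-%m-%d'),
            'End': datetime.today().strftime('%Y-%m-%d')
        },
        Granularity='DAILY',
        Filter={'Dimensions': {'Key': 'PURCHASE_TYPE', 'Values': ['On Demand Instances']}},
        Metrics=['UnblendedCost']
    )
    
    daily_costs = [float(day['Total']['UnblendedCost']['Amount']) 
                   for day in historical_spend['ResultsByTime']]
    if not daily_costs:
        raise ValueError("No on-demand spend found in the lookback window")
    
    # Conservative: size the commitment to p10 of daily spend (baseline, not peak)
    baseline_daily = sorted(daily_costs)[len(daily_costs) // 10]
    baseline_hourly_on_demand = baseline_daily / 24
    
    # Savings: ~40% for 1-year, ~60% for 3-year
    discount = 0.40 if commitment_years == 1 else 0.60
    # Commitments are expressed in discounted dollars, so covering $X/hour of
    # on-demand usage takes a commitment of roughly X * (1 - discount).
    hourly_commitment = baseline_hourly_on_demand * (1 - discount)
    annual_savings = baseline_hourly_on_demand * 8760 * discount
    
    return SavingsPlanRecommendation(
        hourly_commitment=hourly_commitment,
        annual_savings=annual_savings,
        commitment_years=commitment_years,
        confidence="conservative"  # p10 baseline minimizes risk of underutilization
    )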

The commitment level matters. Committing to baseline utilization, the level the system meets or exceeds about 90% of the time, is low risk. Committing to peak utilization wastes money during off-peak periods, when you are paying for committed capacity you are not using.
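
The baseline-versus-peak trade-off can be made concrete with a small calculation. A sketch with hypothetical numbers; `commitment_outcome` is an illustrative helper, and the commitment is expressed in on-demand-equivalent dollars per day to keep the comparison simple:

```python
def commitment_outcome(daily_on_demand: list[float], daily_commitment: float) -> dict:
    """Compare a daily commitment with actual daily usage (on-demand-equivalent dollars)."""
    covered = sum(min(day, daily_commitment) for day in daily_on_demand)
    unused = sum(max(daily_commitment - day, 0.0) for day in daily_on_demand)
    total = sum(daily_on_demand)
    return {
        "coverage": covered / total if total else 0.0,  # share of usage billed at the discounted rate
        "unused_commitment": unused,                     # paid for but never consumed
    }
```

For four days of usage at $100, $120, $150, and $200, committing at the $100 baseline covers about 70% of usage with nothing unused. Committing at the $200 peak covers all of it but leaves $230 of commitment unused over the same four days.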

Cost Anomaly Detection

Unexpected cost spikes (a forgotten load test left running, a misconfigured autoscaler, a data pipeline stuck in an infinite loop) can add thousands of dollars to the monthly bill before anyone notices. Automated anomaly detection catches these before they compound.

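# Cost anomaly detection using AWS Cost Anomaly Detection
import boto3

def setup_cost_anomaly_alerts():
    ce = boto3.client('ce')  # Cost Explorer's boto3 service name is 'ce'
    
    # Create a monitor that tracks spending per AWS service
    monitor = ce.create_anomaly_monitor(
        AnomalyMonitor={
            'MonitorName': 'AllServicesMonitor',
            'MonitorType': 'DIMENSIONAL',
            'MonitorDimension': 'SERVICE'
        }
    )
    
    # Alert when an anomaly's total impact reaches $500
    threshold = {
        'Dimensions': {
            'Key': 'ANOMALY_TOTAL_IMPACT_ABSOLUTE',
            'MatchOptions': ['GREATER_THAN_OR_EQUAL'],
            'Values': ['500']
        }
    }
    
    # Email subscribers receive DAILY or WEEKLY summaries; SNS subscribers require
    # IMMEDIATE, so the two channels need separate subscriptions.
    email_subscription = ce.create_anomaly_subscription(
        AnomalySubscription={
            'MonitorArnList': [monitor['MonitorArn']],
            'Subscribers': [{'Address': 'sre-team@example.com', 'Type': 'EMAIL'}],
            'ThresholdExpression': threshold,
            'Frequency': 'DAILY',
            'SubscriptionName': 'SRECostAlertsEmail'
        }
    )
    sns_subscription = ce.create_anomaly_subscription(
        AnomalySubscription={
            'MonitorArnList': [monitor['MonitorArn']],
            'Subscribers': [{
                'Address': 'arn:aws:sns:us-east-1:123456789012:cost-alerts',  # placeholder account ID
                'Type': 'SNS'
            }],
            'ThresholdExpression': threshold,
            'Frequency': 'IMMEDIATE',
            'SubscriptionName': 'SRECostAlertsSNS'
        }
    )
    
    return email_subscription['SubscriptionArn'], sns_subscription['SubscriptionArn']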

# For GCP: use Cloud Billing budget alerts
# For Azure: use Azure Cost Management budgets and alerts

Cost anomaly detection doesn't replace tagging — it's the safety net for costs that slip through. A well-tagged environment with per-service cost dashboards catches most surprises through regular review; anomaly detection catches the ones that reviewers miss.
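
For the regular review side, a per-service dashboard can apply a much simpler check than a managed detector. The sketch below is a stand-in, not how AWS Cost Anomaly Detection works: it compares today's spend with the trailing median and requires both a relative and an absolute jump. The function name and thresholds are hypothetical.

```python
from statistics import median


def is_cost_anomaly(history: list[float], today: float,
                    ratio: float = 1.5, min_increase: float = 100.0) -> bool:
    """Flag today's spend if it exceeds the trailing median by a ratio and a dollar floor."""
    if len(history) < 7:
        return False  # not enough history to form a baseline
    baseline = median(history)
    return today > baseline * ratio and (today - baseline) >= min_increase
```

The dollar floor keeps tiny services from paging anyone: a jump from $10 to $40 is a 4x increase but not worth an alert.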

*Zak Hassan is a Staff SRE specializing in FinOps, cloud infrastructure cost optimization, and reliability engineering. Find him at zakhassan.com or on LinkedIn.*
