*By Zak Hassan — Staff SRE | May 2026*


Cloud costs are reliability's shadow metric. A team that over-provisions for reliability headroom wastes money; a team that under-provisions to save money creates reliability risk. The SRE who understands cost engineering can make reliability investments intelligently — buying the right amount of resilience for the actual risk — instead of either hoarding resources out of fear or cutting corners that will matter at 3am.

This is the operational side of cloud cost engineering: attribution, rightsizing, purchasing strategies, and the tools that make cost management a continuous discipline rather than a quarterly panic.


The Attribution Problem

The fundamental challenge in cloud cost management is attribution — knowing which team, service, or product is responsible for which cost. Without attribution, cost conversations are impossible: you can see the total bill going up, but you can't hold anyone accountable or make intelligent decisions about where to cut.

Tagging strategy: every resource in AWS/GCP/Azure should carry a minimum set of tags:

python
import json

# Enforced tag set via AWS Config Rule or GCP Organization Policy
REQUIRED_TAGS = {
    "team": "The engineering team responsible (e.g., 'backend', 'data')",
    "service": "The application or service (e.g., 'checkout-service', 'ml-pipeline')",
    "environment": "production | staging | development",
    "cost_center": "Finance cost center code for chargeback",
}

# AWS Config rule to flag untagged resources
aws_config_rule = {
    "ConfigRuleName": "required-tags",
    "Source": {
        "Owner": "AWS",
        "SourceIdentifier": "REQUIRED_TAGS"
    },
    "InputParameters": json.dumps({
        "tag1Key": "team",
        "tag2Key": "service",
        "tag3Key": "environment",
        "tag4Key": "cost_center"
    })
}
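
The Config rule reports non-compliant resources after they exist. The same policy can also be checked earlier, for example in CI against the tags on planned resources. The following is a minimal sketch, assuming tags arrive as a plain dictionary; `find_tag_violations`, `REQUIRED_TAG_KEYS`, and `ALLOWED_ENVIRONMENTS` are illustrative names, not part of any AWS API.

```python
# Hypothetical pre-deploy check; names and allowed values are assumptions
REQUIRED_TAG_KEYS = ("team", "service", "environment", "cost_center")
ALLOWED_ENVIRONMENTS = {"production", "staging", "development"}


def find_tag_violations(tags: dict) -> list:
    """Return human-readable violations for one resource's tags."""
    # A tag that is absent or blank counts as missing
    violations = [
        f"missing tag: {key}"
        for key in REQUIRED_TAG_KEYS
        if not str(tags.get(key, "")).strip()
    ]
    # The environment tag must also use one of the agreed values
    env = str(tags.get("environment", "")).strip()
    if env and env not in ALLOWED_ENVIRONMENTS:
        violations.append(f"invalid environment: {env}")
    return violations
```

An empty list means the resource satisfies the policy; anything else can fail the pipeline before the untagged resource reaches the bill.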

Kubernetes workload costs require a different approach — tagging cloud instances doesn't tell you which Kubernetes service consumed which resources on a shared node:

python
# OpenCost / Kubecost: namespace-level cost attribution
import requests

# Query the OpenCost API for cost by namespace
def get_namespace_costs(window: str = "7d") -> dict:
    response = requests.get(
        "http://opencost.monitoring:9003/allocation/compute",
        params={
            "window": window,
            "aggregate": "namespace",
            "accumulate": "true"
        },
        timeout=30,
    )
    response.raise_for_status()  # Fail loudly instead of parsing an error page
    
    # With accumulate=true the API returns a single allocation set
    allocations = response.json()["data"][0]
    
    costs_by_namespace = {}
    for namespace, data in allocations.items():
        costs_by_namespace[namespace] = {
            "total_cost": data["totalCost"],
            "cpu_cost": data["cpuCost"],
            "memory_cost": data["ramCost"],
            "storage_cost": data["pvCost"],
            "network_cost": data["networkCost"],
            "efficiency": data["totalEfficiency"]  # ratio of used to requested resources (1.0 = 100%)
        }
    
    # Most expensive namespaces first
    return dict(sorted(costs_by_namespace.items(), key=lambda x: x[1]["total_cost"], reverse=True))
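
Once costs are grouped by namespace, the efficiency figure points at where requested capacity is going unused. The sketch below takes a dictionary shaped like the one returned above and estimates idle spend per namespace. It assumes `efficiency` is a ratio where 1.0 means fully used; `find_inefficient_namespaces` and its thresholds are illustrative choices, not OpenCost features.

```python
def find_inefficient_namespaces(
    costs_by_namespace: dict,
    min_cost: float = 100.0,       # ignore namespaces too small to matter
    max_efficiency: float = 0.5,   # flag anything using under half its requests
) -> list:
    """Return (namespace, estimated_idle_cost) pairs, largest first."""
    flagged = []
    for namespace, costs in costs_by_namespace.items():
        if costs["total_cost"] >= min_cost and costs["efficiency"] < max_efficiency:
            # Rough estimate: the unused fraction of the namespace's spend
            idle_cost = costs["total_cost"] * (1 - costs["efficiency"])
            flagged.append((namespace, round(idle_cost, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

The output is a starting list for rightsizing conversations, ordered by how much money is likely sitting idle.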

Rightsizing: The Biggest Lever

Rightsizing — matching instance sizes to actual workload — is consistently the highest-ROI cloud cost reduction activity. Most production environments are significantly over-provisioned because developers over-specify resource requirements to avoid OOM kills, and the over-specification is never revisited.

AWS Compute Optimizer integration:

python
import boto3

def get_rightsizing_recommendations() -> list[dict]:
    optimizer = boto3.client('compute-optimizer', region_name='us-west-2')
    
    # Get EC2 instance recommendations (first page only; follow nextToken for large fleets)
    ec2_recs = optimizer.get_ec2_instance_recommendations(
        filters=[{
            "name": "Finding",
            "values": ["Overprovisioned"]  # Only get downsizing opportunities
        }]
    )
    
    recommendations = []
    for rec in ec2_recs['instanceRecommendations']:
        if not rec.get('recommendationOptions'):
            continue
        
        # Each option carries a rank; rank 1 is the top recommendation
        best = min(rec['recommendationOptions'], key=lambda option: option['rank'])
        
        # Estimated savings are nested under savingsOpportunity
        savings = (best.get('savingsOpportunity') or {}).get('estimatedMonthlySavings', {}).get('value', 0)
        
        # Peak CPU over the lookback period; compare case-insensitively to be safe
        max_cpu = next(
            (m['value'] for m in rec['utilizationMetrics']
             if m['name'].upper() == 'CPU' and m['statistic'].upper() == 'MAXIMUM'),
            None
        )
        
        recommendations.append({
            "instance_id": rec['instanceArn'].split('/')[-1],
            "current_type": rec['currentInstanceType'],
            "recommended_type": best['instanceType'],
            "finding": rec['finding'],
            "estimated_monthly_savings": savings,
            "utilization_max_cpu": max_cpu
        })
    
    return sorted(recommendations, key=lambda x: x['estimated_monthly_savings'], reverse=True)

Kubernetes resource rightsizing (from VPA recommendations, covered in the Kubernetes post):

python
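from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside a pod
k8s_client = client.CustomObjectsApi()

# get_current_requests(), parse_cpu(), and parse_memory_gb() are helpers
# assumed to be defined elsewhere in the module.

def generate_rightsizing_report(namespace: str) -> list[dict]: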
    """Generate a rightsizing report comparing current requests vs VPA recommendations."""
    
    vpas = k8s_client.list_namespaced_custom_object(
        group="autoscaling.k8s.io",
        version="v1",
        namespace=namespace,
        plural="verticalpodautoscalers"
    )
    
    report = []
    for vpa in vpas['items']:
        deployment = vpa['spec']['targetRef']['name']
        
        if 'recommendation' not in vpa.get('status', {}):
            continue
        
        for container in vpa['status']['recommendation']['containerRecommendations']:
            current = get_current_requests(namespace, deployment, container['containerName'])
            recommended = container['target']
            
            # Calculate cost delta (approximate)
            cpu_delta_cores = (parse_cpu(current.get('cpu', '0')) - 
                              parse_cpu(recommended.get('cpu', '0')))
            mem_delta_gb = (parse_memory_gb(current.get('memory', '0')) - 
                           parse_memory_gb(recommended.get('memory', '0')))
            
            # AWS us-east-1 approximate rates
            monthly_savings = (cpu_delta_cores * 30 * 24 * 0.048 +   # $0.048/vCPU-hour
                              mem_delta_gb * 30 * 24 * 0.006)          # $0.006/GB-hour
            
            if monthly_savings > 10:  # Only flag significant savings
                report.append({
                    "deployment": deployment,
                    "container": container['containerName'],
                    "current_cpu": current.get('cpu'),
                    "recommended_cpu": recommended.get('cpu'),
                    "current_memory": current.get('memory'),
                    "recommended_memory": recommended.get('memory'),
                    "estimated_monthly_savings": monthly_savings
                })
    
    return sorted(report, key=lambda x: x['estimated_monthly_savings'], reverse=True)
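
The report depends on two quantity parsers, `parse_cpu` and `parse_memory_gb`, that the post does not show. The following is one possible sketch of them, not the author's implementation. It handles the common Kubernetes suffixes only (millicores for CPU, binary and decimal byte suffixes for memory) and assumes "GB" means 10^9 bytes; adjust that if your rate is priced per GiB.

```python
# Binary suffixes are listed first so "Mi" is matched before "M"
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)


def parse_memory_gb(quantity: str) -> float:
    """Convert a memory quantity such as '512Mi' or '1G' to GB (10**9 bytes)."""
    for suffix, multiplier in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * multiplier / 10**9
    return float(quantity) / 10**9  # no suffix means plain bytes
```

VPA targets often report memory as plain bytes, which is why the no-suffix case matters here.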

Spot and Preemptible Instances: The 70% Discount

Spot instances (AWS) and preemptible instances (GCP) offer the same compute as on-demand instances at a 60-90% discount. The trade-off is that the provider can reclaim them on short notice: AWS gives a two-minute warning, GCP about 30 seconds. Most workloads can be engineered to tolerate spot interruption.

Workloads appropriate for spot:

  • Batch processing, ML training, data pipelines
  • Kubernetes worker nodes running stateless services (with correct PodDisruptionBudgets)
  • CI/CD runners
  • Development and staging environments

Workloads NOT appropriate for spot:

  • Primary database instances
  • Single-replica stateful services
  • Anything requiring guaranteed availability during business hours

yaml
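# Kubernetes: prefer spot nodes but fall back to on-demand
# Fragment only: selector, pod labels, and containers are omitted.
# The taint and label keys below are placeholders; the real keys depend on how
# nodes are provisioned (Karpenter, for example, labels nodes with
# karpenter.sh/capacity-type).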
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  template:
    spec:
      # Tolerate spot node interruption taints
      tolerations:
        - key: "node.kubernetes.io/spot"
          operator: "Exists"
          effect: "NoSchedule"
      
      # Prefer spot, fall back to on-demand
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: "node.kubernetes.io/capacity-type"
                    operator: In
                    values: ["spot", "preemptible"]
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "node.kubernetes.io/capacity-type"
                    operator: In
                    values: ["spot", "preemptible", "on-demand"]  # Fall back to on-demand
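
Because a fleet like this usually ends up as a mix of spot and on-demand nodes, the effective saving is lower than the headline spot discount. The arithmetic can be sketched as follows; `blended_hourly_cost` is an illustrative helper, and the rate and discount in the usage note are made-up inputs, not quoted prices.

```python
def blended_hourly_cost(
    on_demand_rate: float,   # $/hour for one on-demand node
    spot_discount: float,    # e.g. 0.70 for a 70% discount
    spot_fraction: float,    # share of the fleet running on spot, 0..1
    node_count: int,
) -> float:
    """Hourly cost of a fleet that mixes spot and on-demand nodes."""
    if not 0 <= spot_discount <= 1 or not 0 <= spot_fraction <= 1:
        raise ValueError("spot_discount and spot_fraction must be between 0 and 1")
    spot_rate = on_demand_rate * (1 - spot_discount)
    spot_nodes = node_count * spot_fraction
    on_demand_nodes = node_count - spot_nodes
    return spot_nodes * spot_rate + on_demand_nodes * on_demand_rate
```

With a hypothetical $0.20/hour node, a 70% discount, and 80% of ten nodes on spot, the fleet costs about $0.88/hour instead of $2.00, a 56% saving rather than 70%.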

Handling spot interruption gracefully:

python
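# Spot interruption handler, run on every spot node as a DaemonSet
import os
import subprocess
import time

import requests

IMDS = "http://169.254.169.254/latest"

def get_current_node_name() -> str:
    # Assumes NODE_NAME is injected into the pod via the Downward API (spec.nodeName)
    return os.environ["NODE_NAME"]

def get_imds_token() -> str:
    # IMDSv2 requires a session token on every metadata request
    response = requests.put(
        f"{IMDS}/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
        timeout=1,
    )
    response.raise_for_status()
    return response.text

def watch_for_interruption():
    """
    AWS provides a 2-minute warning before spot interruption via instance metadata.
    Use it to drain the node gracefully.
    """
    while True:
        try:
            # instance-action returns 404 until an interruption is scheduled
            response = requests.get(
                f"{IMDS}/meta-data/spot/instance-action",
                headers={"X-aws-ec2-metadata-token": get_imds_token()},
                timeout=1,
            )
            
            if response.status_code == 200:
                # Interruption is coming: drain this node
                subprocess.run([
                    "kubectl", "drain", get_current_node_name(),
                    "--ignore-daemonsets",
                    "--delete-emptydir-data",
                    "--grace-period=90",   # 90 seconds to finish current work
                    "--timeout=100s"
                ], check=False)
                
                break  # Node will be terminated, no need to continue
                
        except requests.exceptions.RequestException:
            pass  # Metadata service unreachable; retry on the next poll
        
        time.sleep(5)  # Poll every 5 seconds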
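
When an interruption is scheduled, the EC2 spot `instance-action` metadata document contains a JSON body with an `action` and a UTC `time`. Parsing that timestamp lets a handler size its drain grace period to the time actually left rather than a fixed 90 seconds. A minimal sketch, where `seconds_until_interruption` is a hypothetical helper name:

```python
import json
from datetime import datetime, timezone
from typing import Optional


def seconds_until_interruption(notice_body: str, now: Optional[datetime] = None) -> float:
    """Seconds remaining before the action in a spot interruption notice."""
    notice = json.loads(notice_body)
    # The notice time is UTC in the form 2017-09-18T08:22:00Z
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    current = now or datetime.now(timezone.utc)
    # Never return a negative value if the deadline has already passed
    return max(0.0, (deadline - current).total_seconds())
```

A handler could pass something like `min(90, remaining - 20)` as the grace period, keeping a small buffer for the drain command itself; the buffer size is a judgment call.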

Reserved Capacity and Savings Plans

For predictable baseline workloads, committed-use discounts offer 40-60% savings over on-demand pricing. The decision framework:

Compute Savings Plans (AWS): commit to a dollar/hour spend on compute. The flexibility to change instance types, regions, and operating systems makes this lower risk than Reserved Instances.

python
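from dataclasses import dataclass
from datetime import datetime, timedelta

import boto3

@dataclass
class SavingsPlanRecommendation:
    hourly_commitment: float
    annual_savings: float
    commitment_years: int
    confidence: str

def analyze_savings_plan_opportunity(
    lookback_days: int = 90,
    commitment_years: int = 1
) -> SavingsPlanRecommendation:
    """
    Analyze historical compute spend to recommend Savings Plan commitment.
    """
    ce = boto3.client('ce')  # boto3's service name for Cost Explorer is 'ce'
    
    # Get historical on-demand spend. Value names can be confirmed with
    # ce.get_dimension_values(Dimension='PURCHASE_TYPE'); add a SERVICE filter
    # to narrow this to compute only.
    historical_spend = ce.get_cost_and_usage(
        TimePeriod={
            'Start': (datetime.today() - timedelta(days=lookback_days)).strftime('%Y-%m-%d'),
            'End': datetime.today().strftime('%Y-%m-%d')
        },
        Granularity='DAILY',
        Filter={'Dimensions': {'Key': 'PURCHASE_TYPE', 'Values': ['On Demand Instances']}},
        Metrics=['UnblendedCost']
    )
    
    daily_costs = [float(day['Total']['UnblendedCost']['Amount']) 
                   for day in historical_spend['ResultsByTime']]
    if not daily_costs:
        raise ValueError("No cost data returned for the lookback window")
    
    # Conservative: cover p10 of daily spend (baseline, not peak)
    baseline_daily = sorted(daily_costs)[len(daily_costs) // 10]
    baseline_hourly_on_demand = baseline_daily / 24
    
    # Savings: ~40% for 1-year, ~60% for 3-year
    discount = 0.40 if commitment_years == 1 else 0.60
    
    # Commitments are expressed in discounted dollars, so covering the
    # on-demand baseline needs a commitment of baseline * (1 - discount)
    hourly_commitment = baseline_hourly_on_demand * (1 - discount)
    annual_savings = baseline_hourly_on_demand * 8760 * discount
    
    return SavingsPlanRecommendation(
        hourly_commitment=hourly_commitment,
        annual_savings=annual_savings,
        commitment_years=commitment_years,
        confidence="conservative"  # p10 baseline minimizes risk of underutilization
    )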

The commitment level matters. Committing to baseline utilization (the p10 level, which daily spend meets or exceeds about 90% of the time) is low risk because the commitment is almost always fully used. Committing to peak utilization wastes money during off-peak periods, when you have already paid for capacity you're not using.
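
The difference between the two commitment levels can be made concrete by replaying historical spend against a candidate commitment. The sketch below is a simplification: it treats spend and commitment in the same on-demand-equivalent dollars and ignores the discount itself. `unused_commitment` is an illustrative helper, and the figures in the usage note are made up.

```python
def unused_commitment(daily_spend: list, daily_commitment: float) -> float:
    """Total committed dollars that went unused over the period."""
    # On days where spend falls below the commitment, the gap is paid for anyway
    return sum(max(0.0, daily_commitment - spend) for spend in daily_spend)
```

For daily spend of 100, 120, 150, 200, and 300 dollars, a commitment at the 100-dollar baseline wastes nothing, while a commitment at the 300-dollar peak leaves 630 dollars unused over the same five days.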


Cost Anomaly Detection

Unexpected cost spikes (a forgotten load test left running, a misconfigured autoscaler, a data pipeline stuck in an infinite loop) can add thousands of dollars to the monthly bill before anyone notices. Automated anomaly detection catches these before they compound.

python
# Cost anomaly detection using AWS Cost Anomaly Detection
import boto3

def setup_cost_anomaly_alerts():
    ce = boto3.client('ce')  # boto3's service name for Cost Explorer is 'ce'
    
    # Create a monitor that tracks spend per AWS service
    monitor = ce.create_anomaly_monitor(
        AnomalyMonitor={
            'MonitorName': 'AllServicesMonitor',
            'MonitorType': 'DIMENSIONAL',
            'MonitorDimension': 'SERVICE'
        }
    )
    monitor_arn = monitor['MonitorArn']
    
    # Email subscribers get DAILY or WEEKLY summaries; SNS subscribers require
    # IMMEDIATE frequency, so each channel needs its own subscription.
    channels = [
        ('SRECostAlertsEmail', 'DAILY', 'EMAIL', 'sre-team@example.com'),
        ('SRECostAlertsSNS', 'IMMEDIATE', 'SNS', 'arn:aws:sns:us-east-1:123456789:cost-alerts'),
    ]
    for name, frequency, subscriber_type, address in channels:
        ce.create_anomaly_subscription(
            AnomalySubscription={
                'MonitorArnList': [monitor_arn],
                'Subscribers': [{'Address': address, 'Type': subscriber_type}],
                # Alert when total impact exceeds $500
                # (newer API versions prefer ThresholdExpression)
                'Threshold': 500,
                'Frequency': frequency,
                'SubscriptionName': name
            }
        )
    
    return monitor_arn

# For GCP: use Cloud Billing budget alerts
# For Azure: use Azure Cost Management budgets and alerts

Cost anomaly detection doesn't replace tagging — it's the safety net for costs that slip through. A well-tagged environment with per-service cost dashboards catches most surprises through regular review; anomaly detection catches the ones that reviewers miss.
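
For per-service dashboards that already hold daily cost figures, a simple statistical check can act as a second safety net. The sketch below flags days whose cost jumps well above a trailing baseline. It is a deliberately basic stand-in, not how the managed cloud services detect anomalies, and `find_cost_anomalies` with its default thresholds is an illustrative assumption.

```python
from statistics import mean, stdev


def find_cost_anomalies(
    daily_costs: list,
    window: int = 7,           # trailing days used as the baseline (must be >= 2)
    z_threshold: float = 3.0,  # how many standard deviations count as unusual
    min_impact: float = 100.0, # ignore spikes smaller than this many dollars
) -> list:
    """Return (day_index, dollar_impact) for days that spike above the baseline."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        impact = daily_costs[i] - baseline
        # Require both a meaningful dollar impact and a statistical outlier
        if impact >= min_impact and impact > z_threshold * spread:
            anomalies.append((i, round(impact, 2)))
    return anomalies
```

One known limitation: a spike stays in the trailing window for the following days and inflates the baseline, so a sustained runaway cost is flagged on its first day but may not be flagged again.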


*Zak Hassan is a Staff SRE specializing in FinOps, cloud infrastructure cost optimization, and reliability engineering. Find him at zakhassan.com or on LinkedIn.*
