Capacity Forecasting for SREs: Time Series Models, Anomaly Detection, and Automated Scaling Triggers

*By Zak Hassan — Staff SRE | May 2026*

Most capacity planning conversations start the same way: someone pulls up a Grafana dashboard, draws a mental line through the last thirty days of CPU or memory data, and declares "capacity will hit the limit in about six weeks." That estimate gets entered into a spreadsheet, a ticket is filed, and everyone feels productive. The problem is that this approach — linear extrapolation — fails precisely when it matters most. Traffic growth compounds. Seasonal effects can double load overnight. A product launch can deliver a step-function jump that looks nothing like the gradual slope you were watching. By the time your linear model is visibly wrong, you are already in an incident, not a planning meeting.

The deeper issue is that most infrastructure teams reach for forecasting only reactively: they notice a threshold approaching and then scramble. A mature capacity practice inverts this. It runs forecasts continuously, decomposes signals into their structural components, flags anomalies before they escalate, and automatically surfaces scaling decisions with enough lead time to actually act on them.

Why Linear Extrapolation Fails

Traffic growth is not linear because businesses are not linear. A B2C product might see a 3x spike every December that crushes the projected trendline. A B2B SaaS platform may grow smoothly for months, then triple overnight when a large enterprise customer goes live. Microservice architectures compound this further: a single user request fans out across a dozen services, each with its own scaling characteristics, so the relationship between "user count" and "infrastructure demand" is nonlinear almost by definition.

Three patterns consistently break naive forecasting. First, seasonality: diurnal cycles, weekly patterns, and annual peaks. If you fit a trend on two weeks of data that happened to span a public holiday, your baseline is already wrong. Second, step-function growth: a product launch, a viral moment, or a partnership going live. These are discontinuities, not trends, and a line fit before the step will dramatically underestimate load after it. Third, compounding across services: when Services A, B, and C each grow at 10% per month, request fan-out multiplies every additional request, so the absolute load added to a shared data layer each month can far exceed what any single service's dashboard suggests. If the fan-out factor itself rises as features are added, the data layer's growth rate climbs above 10% as well.
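
To make the gap concrete, the sketch below compares days-to-limit under a straight-line extrapolation and under compounding growth. The function names and the numbers (500 cores in use, an 800-core limit, 10% monthly growth, a 30-day month) are illustrative assumptions, not measurements.

```python
import math

def days_to_limit_linear(current: float, daily_increase: float, limit: float) -> float:
    """Days until `limit` if load keeps rising by a fixed amount per day."""
    return (limit - current) / daily_increase

def days_to_limit_compound(current: float, monthly_growth: float, limit: float,
                           days_per_month: int = 30) -> float:
    """Days until `limit` if load grows by a fixed percentage per month."""
    months = math.log(limit / current) / math.log(1.0 + monthly_growth)
    return months * days_per_month

# Hypothetical inputs
current, limit, monthly_growth = 500.0, 800.0, 0.10

# A linear fit over the last 30 days only sees last month's absolute increase.
last_month_increase = current - current / (1.0 + monthly_growth)
linear_days = days_to_limit_linear(current, last_month_increase / 30, limit)
compound_days = days_to_limit_compound(current, monthly_growth, limit)

print(f"linear estimate:   {linear_days:.0f} days")
print(f"compound estimate: {compound_days:.0f} days")
```

With these inputs the straight line says roughly 198 days, while compounding reaches the limit in roughly 148. The line is optimistic by about seven weeks, which is longer than many procurement lead times.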

Time Series Decomposition

Before you can forecast accurately, you need to understand what you are forecasting. A series derived from a Prometheus metric, such as the per-second rate of the container_cpu_usage_seconds_total counter, is a mixture of several underlying signals: a long-term trend (the secular growth of the platform), seasonal components (weekly and daily cycles), and a residual (unexplained noise, which is where anomalies live).

The standard approach is STL decomposition (Seasonal and Trend decomposition using Loess). STL separates your time series into additive components: y(t) = trend(t) + seasonal(t) + residual(t). Decompose before forecasting, because a model that mistakes seasonal variation for trend will make wildly wrong predictions a week or a month out. The residual component, once trend and seasonality are stripped out, is also your most sensitive anomaly signal: a spike in the residual is a deviation that the known patterns cannot explain, not just the Monday morning traffic ramp.

import pandas as pd
from statsmodels.tsa.seasonal import STL
import matplotlib.pyplot as plt

def decompose_metric(series: pd.Series, period: int = 288) -> dict:
    """
    Decompose a time series into trend, seasonal, and residual components.
    period=288 assumes 5-minute intervals over a 24-hour day (288 samples/day).
    """
    stl = STL(series, period=period, robust=True)
    result = stl.fit()
    return {
        "trend": result.trend,
        "seasonal": result.seasonal,
        "residual": result.resid,
        "weights": result.weights,  # robust weights flag outlier observations
    }

# Example usage
# df is a DataFrame with a DatetimeIndex and a 'value' column
components = decompose_metric(df["value"], period=288)
print(f"Trend range: {components['trend'].min():.2f} – {components['trend'].max():.2f}")

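The robust=True flag tells STL to down-weight outliers when fitting the seasonal and trend components, which means a single bad data point, such as a brief outage or a collection glitch, will not distort the entire decomposition. STL does expect a regularly spaced series with no missing values, so fill or interpolate collection gaps before decomposing.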

Prophet for Capacity Forecasting

Facebook's Prophet library is well-suited for capacity forecasting because it was designed for exactly this domain: business time series with strong seasonal patterns, known future events, and irregular gaps. Unlike ARIMA, Prophet does not require stationarity and handles missing data gracefully. Its decomposable model explicitly represents a trend with changepoints, weekly seasonality, yearly seasonality, and user-defined holidays or events, which maps directly onto how capacity planners already reason about load: a baseline slope, recurring cycles, and dated events.

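The workflow for SRE capacity forecasting: pull historical metric data from Prometheus, fit a Prophet model, register known events (product launches, planned migrations, seasonal peaks) as holidays or additional regressors, then generate a 90-day forecast with uncertainty intervals. One caveat: Prophet estimates an event's effect from its past occurrences, so list the dates of comparable past events under the same name alongside the future date. An event that appears only in the future has no history to learn from and will barely move the forecast.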

import requests
import pandas as pd
from prophet import Prophet
from datetime import datetime, timedelta, timezone

PROMETHEUS_URL = "http://prometheus.internal:9090"

def fetch_prometheus_range(query: str, days_back: int = 90) -> pd.DataFrame:
    """Pull a range query from Prometheus and return a Prophet-compatible DataFrame."""
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC datetime
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days_back)
    params = {
        "query": query,
        "start": start.timestamp(),  # Prometheus accepts Unix timestamps
        "end": end.timestamp(),
        "step": "1h",  # hourly resolution for 90-day forecast
    }
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query_range", params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()["data"]["result"]
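    if not data:
        raise ValueError(f"No data returned for query: {query}")
    if len(data) > 1:
        # A grouped query (e.g. "by (cluster)") returns one series per group
        raise ValueError(f"Expected one time series, got {len(data)}: {query}")
    values = data[0]["values"]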
    df = pd.DataFrame(values, columns=["ds", "y"])
    df["ds"] = pd.to_datetime(df["ds"], unit="s", utc=True).dt.tz_localize(None)
    df["y"] = df["y"].astype(float)
    return df


def forecast_capacity(
    metric_query: str,
    provisioned_limit: float,
    forecast_days: int = 90,
    future_events: list[dict] | None = None,
) -> pd.DataFrame:
    """
    Fit a Prophet model on Prometheus history and generate a capacity forecast.

    future_events: list of dicts with keys 'holiday', 'ds', 'lower_window', 'upper_window'
    Example: [{"holiday": "big_launch", "ds": "2026-06-15", "lower_window": 0, "upper_window": 3}]
    """
    df = fetch_prometheus_range(metric_query, days_back=90)

    holidays = None
    if future_events:
        holidays = pd.DataFrame(future_events)
        holidays["ds"] = pd.to_datetime(holidays["ds"])

    model = Prophet(
        growth="linear",
        seasonality_mode="multiplicative",  # correct for traffic that scales with growth
        changepoint_prior_scale=0.05,       # regularise trend changepoints
        yearly_seasonality=False,           # not enough history for yearly
        weekly_seasonality=True,
        daily_seasonality=True,
        holidays=holidays,
        interval_width=0.90,                # 90% uncertainty interval (about P5 to P95)
    )

    model.fit(df)

    future = model.make_future_dataframe(periods=forecast_days * 24, freq="h")
    forecast = model.predict(future)

    # Tag rows where the upper bound of the forecast exceeds provisioned capacity
    forecast["provisioned_limit"] = provisioned_limit
    forecast["breach_risk"] = forecast["yhat_upper"] > provisioned_limit

    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper", "provisioned_limit", "breach_risk"]]


# Run it
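# The fetch helper reads a single series, so aggregate without a "by" clause
# (or add a label matcher to select one cluster)
query = 'sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m]))'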
future_events = [
    {"holiday": "q3_launch", "ds": "2026-07-01", "lower_window": 0, "upper_window": 7},
]
forecast_df = forecast_capacity(query, provisioned_limit=800.0, future_events=future_events)
print(forecast_df[forecast_df["breach_risk"]].head(10))

The multiplicative seasonality mode is important for growing systems: it models the seasonal swing as a fraction of the trend, so as the platform grows, the absolute amplitude of weekly traffic swings grows proportionally. An additive model assumes the swing is constant in absolute terms, which consistently underestimates weekend peak load six months from now.
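
A small arithmetic sketch shows the size of the gap. The numbers are hypothetical: a trend that doubles from 100 to 200 units, with a weekly peak that sits 20% above trend today.

```python
def peak_additive(trend: float, swing_abs: float) -> float:
    """Additive seasonality: the peak sits a fixed absolute amount above trend."""
    return trend + swing_abs

def peak_multiplicative(trend: float, swing_frac: float) -> float:
    """Multiplicative seasonality: the peak sits a fixed fraction above trend."""
    return trend * (1.0 + swing_frac)

trend_now, trend_later = 100.0, 200.0  # hypothetical trend today and after growth
swing_frac = 0.20                       # weekly peak is 20% above trend today
swing_abs = trend_now * swing_frac      # the same swing in absolute units today

additive_later = peak_additive(trend_later, swing_abs)
multiplicative_later = peak_multiplicative(trend_later, swing_frac)
shortfall = multiplicative_later - additive_later

print(f"additive peak:       {additive_later:.0f}")
print(f"multiplicative peak: {multiplicative_later:.0f}")
print(f"shortfall:           {shortfall:.0f}")
```

Both modes agree today (a peak of 120). After the trend doubles, the additive projection puts the peak at about 220 while the multiplicative one puts it at about 240, so the additive model is short by the full amount the swing has grown.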

Anomaly Detection

Anomaly detection on raw metrics is noisy and produces too many false positives to be operationally useful. The right signal is the residual component after decomposition — what is left once trend and seasonality are removed. A residual spike is genuinely anomalous because it cannot be explained by known patterns.

For univariate signals, z-score and IQR-based detection are simple and transparent. Z-score works well when residuals are approximately normally distributed; IQR is more robust when they are skewed or heavy-tailed.

import numpy as np
import pandas as pd

def zscore_anomaly_detector(
    residuals: pd.Series,
    window: int = 48,          # rolling window in samples (e.g., 48h at hourly resolution)
    threshold: float = 3.0,
) -> pd.Series:
    """
    Flag anomalies where the residual deviates more than `threshold` standard
    deviations from the rolling mean. Returns a boolean Series.
    """
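    # center=True uses samples on both sides of each point, which suits batch
    # analysis of history; switch to center=False for streaming detection
    rolling_mean = residuals.rolling(window=window, center=True, min_periods=1).mean()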
    rolling_std = residuals.rolling(window=window, center=True, min_periods=1).std()
    z_scores = (residuals - rolling_mean) / rolling_std.replace(0, np.nan)
    return z_scores.abs() > threshold


def iqr_anomaly_detector(residuals: pd.Series, multiplier: float = 1.5) -> pd.Series:
    """
    Flag anomalies outside [Q1 - multiplier*IQR, Q3 + multiplier*IQR].
    More robust than z-score when residuals have heavy tails.
    """
    q1 = residuals.quantile(0.25)
    q3 = residuals.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return (residuals < lower) | (residuals > upper)

For multi-dimensional anomaly detection — for example, flagging unusual combinations of CPU, memory, and request latency simultaneously — DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is more appropriate. DBSCAN does not require you to specify the number of clusters in advance and naturally labels sparse points as noise (label -1), which is exactly your anomaly set.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

def dbscan_anomaly_detector(
    feature_df: pd.DataFrame,
    eps: float = 0.5,
    min_samples: int = 10,
) -> pd.Series:
    """
    Multi-dimensional anomaly detection via DBSCAN.
    Points labelled -1 are anomalies (density outliers).

    feature_df: DataFrame where each column is a raw metric (e.g., cpu, mem, latency_p99);
    columns are standardised here. Rows with missing values are skipped and reported as False.
    """
    clean = feature_df.dropna()
    scaled = StandardScaler().fit_transform(clean)
    labels = DBSCAN(eps=eps, min_samples=min_samples, n_jobs=-1).fit_predict(scaled)
    # Write results back by index label so dropped rows do not shift the alignment
    result = pd.Series(False, index=feature_df.index)
    result.loc[clean.index] = labels == -1
    return result  # True where anomalous

Use z-score or IQR when you have a single well-understood metric and need a transparent, explainable signal. Use DBSCAN when you need to catch correlated failures — the kind that look borderline-normal on any single metric but are clearly aberrant when viewed together.

Connecting Forecasts to Scaling Decisions

The central tension in capacity planning is lead time. Cloud reserved instances require commitment weeks or months in advance. Even on-demand provisioning of persistent storage, database replicas, or egress bandwidth takes time. If your forecast only tells you "you will breach in five days," you have already lost the window for cost-effective action.

The right trigger is the upper uncertainty bound crossing a threshold, not the point forecast. This is what the yhat_upper column in the Prophet output represents: with interval_width=0.90, it is the upper edge of the 90% interval, roughly the 95th percentile of the simulated forecasts. Alerting on yhat_upper gives you an early warning that accounts for forecast uncertainty. For reserved capacity decisions, you want to alert when that upper bound is projected to breach your safety threshold within your procurement lead time.
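
If it helps to see where the bounds sit, the helper below converts an interval width into the percentiles at its edges. It is a hypothetical illustration (interval_percentiles is not a Prophet function) and assumes a central interval, which is how Prophet derives yhat_lower and yhat_upper from its simulated forecasts.

```python
def interval_percentiles(interval_width: float) -> tuple[float, float]:
    """Return the (lower, upper) percentiles at the edges of a central interval."""
    if not 0.0 < interval_width < 1.0:
        raise ValueError("interval_width must be between 0 and 1")
    tail = (1.0 - interval_width) / 2.0  # probability mass left in each tail
    return tail * 100.0, (1.0 - tail) * 100.0

lower_pct, upper_pct = interval_percentiles(0.90)
print(f"yhat_lower ~ P{lower_pct:.0f}, yhat_upper ~ P{upper_pct:.0f}")
```

A 90% interval leaves 5% in each tail, so an alert on yhat_upper fires on a level the model expects to be exceeded about one time in twenty. Widen interval_width if you want a more conservative trigger.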

from dataclasses import dataclass
from datetime import datetime

import pandas as pd

@dataclass
class CapacityAlert:
    metric: str
    current_provisioned: float
    forecasted_peak: float
    breach_date: datetime
    lead_time_days: int
    alert_level: str  # "warning" | "critical"

def check_capacity_forecast(
    forecast_df: pd.DataFrame,
    provisioned_limit: float,
    safety_margin: float = 0.80,    # alert when forecast hits 80% of limit
    lead_time_days: int = 21,       # procurement lead time
    metric_name: str = "cpu_cores",
) -> CapacityAlert | None:
    """
    Scan a Prophet forecast DataFrame and return a CapacityAlert if the
    upper-bound forecast is projected to breach the safety threshold
    within the procurement lead time window.
    """
    threshold = provisioned_limit * safety_margin
    # ds values are timezone-naive UTC, so compare against a naive UTC "now"
    now = pd.Timestamp.now(tz="UTC").tz_localize(None)
    horizon = now + pd.Timedelta(days=lead_time_days)

    # Only examine the part of the forecast inside the lead-time window
    window = forecast_df[(forecast_df["ds"] > now) & (forecast_df["ds"] <= horizon)]

    breaches = window[window["yhat_upper"] >= threshold]
    if breaches.empty:
        return None  # No action needed within lead time

    first_breach = breaches.iloc[0]
    peak_upper = float(breaches["yhat_upper"].max())
    # Critical if the upper bound reaches the hard limit anywhere in the window,
    # not only at the first threshold crossing
    alert_level = "critical" if peak_upper >= provisioned_limit else "warning"

    return CapacityAlert(
        metric=metric_name,
        current_provisioned=provisioned_limit,
        forecasted_peak=peak_upper,
        breach_date=first_breach["ds"].to_pydatetime(),
        lead_time_days=lead_time_days,
        alert_level=alert_level,
    )

Wire this function into a weekly batch job and route the output to your ticketing system. A warning alert triggers a planning conversation; a critical alert — where the upper bound already exceeds the hard limit within your lead time — triggers an immediate provisioning action.

Traffic Pattern Classification and Autoscaling Implications

Not all traffic behaves the same way, and your autoscaling configuration should reflect the pattern of the service it governs. Bursty traffic, characterised by short, sharp spikes that arrive with little regularity, demands aggressive scale-out with a short HPA scale-up stabilization window (zero to 60 seconds; the Kubernetes default for scale-up is 0 seconds, while scale-down defaults to 300). A long scale-up window holds back scale-out until the higher replica count has been recommended for the whole window, so the spike is largely over before new pods arrive, and you absorb the latency impact.

Smooth, diurnal traffic — steady ramps up in the morning, plateaus during business hours, ramps back down at night — is better served by pre-scaling: a scheduled CronJob that scales the deployment up fifteen minutes before the expected ramp, rather than waiting for the HPA to react. Pre-scaling eliminates the cold-start latency tail during the morning traffic ramp, which is often when on-call alerts are most sensitive.
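
As a rough sketch of the scheduling arithmetic, the helper below turns an expected ramp time and a lead into a five-field cron expression of the kind a Kubernetes CronJob schedule accepts. The function name and defaults are assumptions for illustration; it also assumes the ramp time and the schedule share a time zone and that the lead does not cross midnight.

```python
from datetime import datetime, timedelta

def prescale_cron(ramp_hour: int, ramp_minute: int = 0, lead_minutes: int = 15,
                  days_of_week: str = "1-5") -> str:
    """Build a cron expression that fires `lead_minutes` before a daily ramp."""
    # The date is arbitrary; only the time of day matters here.
    ramp = datetime(2000, 1, 3, ramp_hour, ramp_minute)
    fire = ramp - timedelta(minutes=lead_minutes)
    if fire.date() != ramp.date():
        raise ValueError("lead crosses midnight; adjust days_of_week by hand")
    return f"{fire.minute} {fire.hour} * * {days_of_week}"

# Ramp expected at 08:00 on weekdays, pre-scale fifteen minutes earlier
print(prescale_cron(ramp_hour=8))
```

For an 08:00 weekday ramp this yields a 07:45 weekday schedule. Take the ramp time from the seasonal component of your decomposition rather than from memory, since it drifts as the user base shifts across time zones.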

You can classify traffic patterns programmatically by examining the coefficient of variation of the residual component and the autocorrelation function of the raw series. High residual CV with low autocorrelation = bursty. Low residual CV with strong 24-hour autocorrelation = smooth diurnal.
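
A minimal, dependency-free sketch of that rule follows. The thresholds (0.25 for CV, 0.5 for autocorrelation) and the "mixed" fallback are placeholder assumptions to tune per service. Because residuals centre on zero, the sketch scales their standard deviation by the mean of the raw series, which is one reasonable reading of "residual CV", not the only one.

```python
import math
import statistics

def autocorrelation(values: list[float], lag: int) -> float:
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = statistics.fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0  # a constant series has no meaningful autocorrelation
    num = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return num / denom

def classify_traffic(raw: list[float], residual: list[float], samples_per_day: int = 24,
                     cv_threshold: float = 0.25, acf_threshold: float = 0.5) -> str:
    """Label a series 'bursty', 'smooth_diurnal', or 'mixed'."""
    # Residual spread relative to the typical level of the raw series
    residual_cv = statistics.pstdev(residual) / statistics.fmean(raw)
    daily_acf = autocorrelation(raw, samples_per_day)
    if residual_cv >= cv_threshold and daily_acf < acf_threshold:
        return "bursty"
    if residual_cv < cv_threshold and daily_acf >= acf_threshold:
        return "smooth_diurnal"
    return "mixed"

# One week of synthetic hourly data: a clean daily sine wave with tiny residuals
hours = range(24 * 7)
smooth_raw = [100 + 50 * math.sin(2 * math.pi * h / 24) for h in hours]
smooth_resid = [0.5 if h % 2 == 0 else -0.5 for h in hours]
print(classify_traffic(smooth_raw, smooth_resid))
```

Feed it the raw series and the residual from your decomposition at the same resolution, and set samples_per_day to match (24 for hourly, 288 for 5-minute data).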

Building a Capacity Planning Pipeline

A production-grade capacity planning pipeline has seven components: metric collection (Prometheus with a long-retention remote write backend like Thanos or Cortex), a dedicated forecasting job (a Python service running weekly via a Kubernetes CronJob), a result store (a simple PostgreSQL table with columns for metric, forecast date, yhat, yhat_lower, yhat_upper, and provisioned_limit), a capacity alert function (as above), a notification layer (PagerDuty for critical, Slack for warning), a dashboard (Grafana panel reading the forecast table), and a review cadence (monthly capacity review with engineering leads).

The most common failure mode is skipping the result store and re-running forecasts on demand. Without persisted forecasts, you cannot track forecast accuracy over time, which means you cannot tune your models or build organisational trust in the predictions. Store every forecast run with its timestamp and model parameters. After 90 days, compare yhat to actual observed values. Track mean absolute percentage error. Iterate.
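
Scoring a stored forecast against actuals takes only a few lines. This is a minimal sketch of mean absolute percentage error; the choice to skip points where the actual value is zero is an assumption (MAPE is undefined there), and the sample values are made up.

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error in percent, skipping zero actuals."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)

# Hypothetical observed values and the yhat stored for the same timestamps
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(f"MAPE: {mape(actual, predicted):.2f}%")
```

Track the score per metric and per forecast age (one week out, one month out, and so on), since errors at the far end of the horizon are what drive procurement mistakes.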

The second most common failure mode is forecasting at the wrong granularity. A 90-day forecast at 5-minute resolution produces roughly 26,000 data points (25,920) and is slow to fit without adding anything a procurement decision needs. A 90-day forecast at hourly resolution gives you 2,160 points: enough to capture diurnal patterns and weekly seasonality, and fast enough to run in a batch job without dedicated compute. Match your forecast resolution to your decision horizon: tactical autoscaling decisions need hourly or sub-hourly data; strategic capacity procurement decisions need daily or weekly aggregates.
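
The point counts are simple arithmetic, sketched here so you can check other window and step combinations. The helper name is hypothetical.

```python
def samples_in_window(days: int, step_minutes: int) -> int:
    """Number of samples in a window of `days` at one sample per `step_minutes`."""
    total_minutes = days * 24 * 60
    if step_minutes <= 0 or total_minutes % step_minutes:
        raise ValueError("step_minutes must be positive and divide the window evenly")
    return total_minutes // step_minutes

for label, step in [("5-minute", 5), ("hourly", 60), ("daily", 24 * 60)]:
    print(f"{label:>8}: {samples_in_window(90, step):>6} points")
```

For a 90-day window that is 25,920 points at 5-minute resolution, 2,160 at hourly resolution, and 90 at daily resolution.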

Start with your three most resource-constrained services. Run the pipeline for 90 days. Present the first month of forecast results alongside actuals in your next capacity review. Adjust model parameters. Expand coverage. The goal is not a perfect forecast — it is a systematic process that catches capacity shortfalls three to four weeks before they become incidents.

*Zak Hassan is a Staff SRE specializing in observability, capacity planning, and distributed systems reliability. Find him at zakhassan.com or on LinkedIn.*

