GitOps has won. The argument about whether declarative, git-driven infrastructure is a good idea is over — it is, and the industry has largely moved on to the harder question: how do you do it well at scale? The tooling ecosystem has matured significantly, and the patterns for progressive delivery, environment promotion, and multi-cluster management have crystallized.
This is the state of the GitOps stack in 2026: what's actually working in production, and where the interesting problems still are.
The Core GitOps Stack
The three-tier stack that most mature GitOps organizations have settled on:
Infrastructure layer: Terraform/OpenTofu + Atlantis or TFC. Cloud resources (VPCs, RDS instances, EKS clusters, IAM roles) are defined in Terraform, stored in git, and applied through automation triggered by pull requests. Atlantis or Terraform Cloud runs the plan/apply workflow and enforces approval gates. Terraform Cloud also stores state, while Atlantis works against a remote state backend such as S3. Changes to infrastructure require a PR, a plan review, and an approval before anything changes in the account.
Cluster configuration layer: ArgoCD or Flux. Kubernetes manifests, Helm charts, and Kustomize overlays live in git. ArgoCD or Flux continuously reconcile the cluster state to match what's in the repository. Drift — manual changes to the cluster that aren't reflected in git — is detected and either automatically corrected or flagged for review.
Application delivery layer: Argo Rollouts + Kargo. Application deployments move through environments (dev → staging → prod) via promotion workflows. Kargo (the newer piece of this stack) handles the environment promotion logic. When a new image passes staging validation, Kargo updates the production environment manifest in git, either by committing directly or by opening a PR for review. Progressive delivery patterns (canary, blue-green) are managed by Argo Rollouts within each environment.
ArgoCD at Scale: What Actually Gets Hard
ArgoCD's Application model works well for tens or hundreds of applications. At thousands of applications across multiple clusters, a few things break down.
ApplicationSet is the answer to scale. Rather than defining individual ArgoCD Applications for each service, ApplicationSet generates Applications dynamically from a template and a generator (git directory structure, cluster list, or external data source). One ApplicationSet definition manages all 200 services in your monorepo:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services
  namespace: argocd
spec:
  generators:
  - git:
      repoURL: https://github.com/your-org/services
      revision: HEAD
      directories:
      - path: services/*
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/services
        targetRevision: HEAD
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - CreateNamespace=true
```

New service? Add a directory to services/. ArgoCD picks it up automatically. No manual Application creation.
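The following Python model makes the generator's behavior concrete. It shows what the git directory generator does with the template above:

- each directory matching services/* becomes one parameter set;
- {{path}} and {{path.basename}} are then substituted into the Application fields.

This is a simplified sketch, not Argo CD's code. The function names and the flattened field keys (source.path, destination.namespace) are illustrative assumptions.

```python
import fnmatch
import re
from typing import Dict, Iterable, List

# Parameterised fields from the ApplicationSet template above, flattened for readability.
TEMPLATE = {
    "name": "{{path.basename}}",
    "source.path": "{{path}}",
    "destination.namespace": "{{path.basename}}",
}

PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def matches(directory: str, pattern: str) -> bool:
    """Glob-match one path segment at a time, so '*' never crosses a '/'."""
    parts, globs = directory.split("/"), pattern.split("/")
    return len(parts) == len(globs) and all(
        fnmatch.fnmatchcase(part, glob) for part, glob in zip(parts, globs)
    )


def render(template: Dict[str, str], params: Dict[str, str]) -> Dict[str, str]:
    """Replace each {{name}} placeholder with its parameter value."""
    return {
        field: PLACEHOLDER.sub(lambda m: params[m.group(1)], value)
        for field, value in template.items()
    }


def generate_applications(directories: Iterable[str],
                          pattern: str = "services/*") -> List[Dict[str, str]]:
    """One rendered Application per directory that matches the generator pattern."""
    apps = []
    for directory in sorted(directories):
        if matches(directory, pattern):
            params = {"path": directory, "path.basename": directory.rsplit("/", 1)[-1]}
            apps.append(render(TEMPLATE, params))
    return apps


if __name__ == "__main__":
    repo_dirs = ["services/billing", "services/checkout",
                 "services/checkout/migrations", "docs"]
    for app in generate_applications(repo_dirs):
        print(app)
```

The sketch matches one path segment per glob segment. As a result, a nested directory such as services/checkout/migrations does not produce its own Application. Treat that matching rule as an assumption to confirm against the Argo CD version you run.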
Multi-cluster ArgoCD. The standard multi-cluster pattern is hub and spoke: ArgoCD runs in a central management cluster and deploys to spoke clusters, which are registered with it as deployment targets. One control plane, N deployment targets.
The failure mode to watch: if the management cluster has an outage, ArgoCD cannot deploy to any spoke cluster during that window. Design the management cluster for high availability (multi-AZ, multiple replicas), and make sure your rollback capability doesn't depend on ArgoCD being available. Applying the manifests from the same git repo with kubectl should always work as a fallback. Helm-based applications need to be rendered with helm template first.
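One way to keep that fallback honest is to script it ahead of time. The Python sketch below builds the kubectl command for one application directory in the GitOps repo:

- kubectl apply -k when the directory is a Kustomize overlay;
- otherwise, a recursive kubectl apply -f.

Several details are illustrative assumptions: the function name, the kube_context argument, and the premise that every directory holds either plain manifests or a kustomization file. Helm-based apps are not covered.

```python
from typing import Iterable, List

# File names kustomize recognises as the root of an overlay.
KUSTOMIZATION_FILES = {"kustomization.yaml", "kustomization.yml", "Kustomization"}


def fallback_apply_argv(app_dir: str, filenames: Iterable[str], kube_context: str,
                        dry_run: bool = False) -> List[str]:
    """Build a kubectl command that applies app_dir straight from a git checkout.

    Hypothetical helper for an "ArgoCD is down" runbook. It only builds the argv,
    so the command can be reviewed before passing it to subprocess.run(argv, check=True).
    """
    argv = ["kubectl", "--context", kube_context, "apply"]
    if KUSTOMIZATION_FILES & set(filenames):
        argv += ["-k", app_dir]          # render the overlay with kubectl's built-in kustomize
    else:
        argv += ["-R", "-f", app_dir]    # plain manifests, including subdirectories
    if dry_run:
        argv.append("--dry-run=server")  # validate against the API server without persisting
    return argv
```

In a runbook you would call it with os.listdir(app_dir) for the filenames. Run it once with dry_run=True against the target spoke cluster before applying for real.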
Kargo: Environment Promotion as a First-Class Concept
Kargo is the piece of the GitOps stack that was missing for years. ArgoCD is excellent at keeping a cluster in sync with a git repository. It has no opinion, though, about how a new container image moves from the registry through dev → staging → production. Kargo fills that gap.
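The Kargo model, shown with the field names from earlier Kargo releases. Newer releases replace subscriptions and promotionMechanisms with requestedFreight and promotionTemplate steps, so check the schema for the version you run: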
```yaml
# A "Warehouse" watches for new artifacts (images, Helm charts)
apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: my-service
  namespace: kargo-demo
spec:
  subscriptions:
  - image:
      repoURL: ghcr.io/your-org/my-service
      allowTags: '^v\d+\.\d+\.\d+$'  # Semantic version tags only
---
# A "Stage" represents an environment
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: dev
  namespace: kargo-demo
spec:
  subscriptions:
    warehouse: my-service  # Pulls new artifacts from the Warehouse
  promotionMechanisms:
    gitRepoUpdates:
    - repoURL: https://github.com/your-org/gitops-repo
      writeBranch: main
      kustomize:
        images:
        - image: ghcr.io/your-org/my-service
          path: environments/dev
---
# Staging pulls from dev (not directly from the Warehouse)
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: staging
  namespace: kargo-demo
spec:
  subscriptions:
    upstreamStages:
    - name: dev  # Only promotes what has passed dev
  promotionMechanisms:
    gitRepoUpdates:
    - repoURL: https://github.com/your-org/gitops-repo
      writeBranch: main
      kustomize:
        images:
        - image: ghcr.io/your-org/my-service
          path: environments/staging
```

Kargo knows about your artifact supply chain: which version is in dev, which is in staging, which is in production. It enforces promotion ordering. Staging subscribes only to dev, so nothing reaches staging without passing dev first. Likewise, a production Stage subscribed to staging means you can't skip staging. Kargo can require human approval gates before production promotion. And it keeps a complete history of what was deployed to each environment and when.
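The ordering rule is easier to see in a toy model. This Python sketch is not Kargo's implementation. It makes three moves:

- a tag that passes the Warehouse regex counts as Freight;
- a Stage may promote only versions its upstream Stage has already received;
- a hypothetical requires_approval flag stands in for a manual gate.

The prod Stage is an assumption, since the YAML above stops at staging. "Already promoted upstream" is also a simplification of Kargo's notion of Freight that has been verified in the upstream Stage.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Same pattern as the Warehouse example: only vMAJOR.MINOR.PATCH tags become Freight.
SEMVER_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


@dataclass
class Stage:
    name: str
    upstream: Optional[str]          # None: subscribes directly to the Warehouse
    requires_approval: bool = False  # hypothetical stand-in for a manual gate
    current: Optional[str] = None
    history: List[str] = field(default_factory=list)


class Pipeline:
    """Toy model of Warehouse -> Stage promotion ordering (not Kargo's code)."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages: Dict[str, Stage] = {s.name: s for s in stages}
        self.freight: List[str] = []
        self.approvals: Dict[str, Set[str]] = {s.name: set() for s in stages}

    def discover(self, tags: List[str]) -> List[str]:
        """Warehouse step: turn matching, previously unseen tags into Freight."""
        new = []
        for tag in tags:
            if SEMVER_TAG.match(tag) and tag not in self.freight:
                self.freight.append(tag)
                new.append(tag)
        return new

    def approve(self, stage_name: str, version: str) -> None:
        self.approvals[stage_name].add(version)

    def promote(self, stage_name: str, version: str) -> None:
        stage = self.stages[stage_name]
        if stage.upstream is None:
            available = version in self.freight
        else:
            # Simplification: "already promoted upstream" stands in for "verified upstream".
            available = version in self.stages[stage.upstream].history
        if not available:
            source = stage.upstream or "the Warehouse"
            raise PermissionError(f"{version} has not passed {source}")
        if stage.requires_approval and version not in self.approvals[stage_name]:
            raise PermissionError(f"{version} is not approved for {stage_name}")
        stage.current = version
        stage.history.append(version)
```

With dev, staging, and an approval-gated prod, discover(["v1.4.0", "latest", "v1.5.0-rc.1"]) yields only v1.4.0. Promoting it to staging before dev raises PermissionError. After dev and staging, prod still refuses the version until approve("prod", "v1.4.0") is called.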
Progressive Delivery with Argo Rollouts
Argo Rollouts replaces the Kubernetes Deployment with a Rollout resource that adds canary and blue-green strategies. The canary pattern:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service
spec:
  replicas: 20
  strategy:
    canary:
      steps:
      - setWeight: 5  # 5% of traffic to canary
      - pause: {duration: 5m}
      - analysis:
          templates:
          - templateName: error-rate-check
      - setWeight: 20
      - pause: {duration: 10m}
      - analysis:
          templates:
          - templateName: error-rate-check
          - templateName: latency-check
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 100
  selector:
    matchLabels:
      app: my-service
  template:
    # ... pod template
```

The analysis steps are where this gets powerful. AnalysisTemplates query your metrics backend (Prometheus, Datadog, CloudWatch) and automatically promote or roll back based on the results:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  metrics:
  - name: error-rate
    interval: 1m
    count: 5                            # Bound the run; interval without count runs indefinitely
    successCondition: result[0] < 0.01  # Error rate < 1%
    failureLimit: 2
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc.cluster.local:9090
        query: |
          sum(rate(http_requests_total{
            app="my-service",
            status=~"5.."
          }[5m])) /
          sum(rate(http_requests_total{
            app="my-service"
          }[5m]))
```

If the error rate is at or above 1% on more than two measurements, Argo Rollouts aborts the rollout and shifts traffic back to the stable version. That is because failureLimit: 2 tolerates two failed measurements; the third fails the analysis. Zero human intervention for the happy path; automatic protection against bad deploys.
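The failureLimit arithmetic is easy to get wrong. Here is a simplified Python model of a single metric in an AnalysisRun:

- each measurement is one result of the error-rate query;
- a measurement succeeds when the value is below the threshold;
- the metric fails as soon as the number of failed measurements exceeds failureLimit.

The model ignores Argo Rollouts' inconclusive and error handling, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def evaluate_metric(measurements: Sequence[float], threshold: float = 0.01,
                    failure_limit: int = 2) -> Tuple[str, int]:
    """Simplified single-metric AnalysisRun.

    Each value is one result of the error-rate query. A measurement succeeds when
    value < threshold (successCondition: result[0] < 0.01). The metric fails as soon
    as the number of failed measurements exceeds failure_limit.
    Returns (phase, number_of_measurements_taken).
    """
    failed = 0
    for taken, value in enumerate(measurements, start=1):
        if not value < threshold:
            failed += 1
            if failed > failure_limit:
                return "Failed", taken
    return "Successful", len(measurements)


if __name__ == "__main__":
    # Hypothetical error rates from five one-minute measurements.
    print(evaluate_metric([0.004, 0.02, 0.003, 0.015, 0.002]))  # prints ('Successful', 5)
    print(evaluate_metric([0.02, 0.004, 0.03, 0.05, 0.001]))    # prints ('Failed', 4)
```

Two bad measurements out of five are tolerated. A third ends the analysis at that measurement, without waiting for the remaining ones. If failureLimit is left at its default of 0, a single measurement at or above 1% fails the metric.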
Drift Detection and the Honesty of GitOps
The most underappreciated value of GitOps is what it reveals about your actual operational practices. When you implement ArgoCD with auto-sync, you discover how often people are making manual changes to clusters that were never committed to git. The drift detection isn't just a security control — it's an audit of your operational culture.
Healthy GitOps organizations have drift rates near zero: everything in the cluster matches git, and manual changes are exceptional and immediately documented. Organizations struggling with GitOps adoption see drift regularly, usually because engineers are still in the habit of kubectl apply -f or helm upgrade directly against clusters.
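Drift rate is simple to compute once you can diff desired state against live state. The sketch below is a deliberately simplified Python model, not Argo CD's diffing, which normalizes resources and handles defaults and ignore rules. It compares only the fields git declares, reports the dotted paths that differ, and computes the fraction of applications with any drift. The manifests in the test data are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def find_drift(desired: Any, live: Any, path: str = "") -> List[str]:
    """Dotted paths where live state differs from the manifest in git.

    Only fields present in the desired manifest are compared, so server-populated
    fields such as status are ignored. Lists are compared as whole values.
    """
    if isinstance(desired, dict) and isinstance(live, dict):
        drift: List[str] = []
        for key, want in desired.items():
            child = f"{path}.{key}" if path else key
            if key not in live:
                drift.append(child)
            else:
                drift.extend(find_drift(want, live[key], child))
        return drift
    return [] if desired == live else [path]


def drift_rate(apps: Dict[str, Tuple[Any, Any]]) -> float:
    """Fraction of applications (name -> (desired, live)) with any drift."""
    if not apps:
        return 0.0
    drifted = sum(1 for desired, live in apps.values() if find_drift(desired, live))
    return drifted / len(apps)
```

Comparing only declared fields keeps status and server-side defaults from showing up as drift. The trade-off is that a field someone adds by hand, such as an extra annotation, goes unnoticed in this model.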
The organizational change is harder than the technical implementation. GitOps requires that git is the source of truth even when it's inconvenient — including at 2am during an incident when the fastest fix is a direct kubectl patch. Building the muscle of "fix forward via git" rather than "fix directly in cluster and backport later" is a culture question, not a tooling question.
*Zak Hassan is a Staff SRE specializing in GitOps, platform engineering, and AI-powered infrastructure automation. Find him at zakhassan.com or on LinkedIn.*