GitOps in 2026: ArgoCD, Kargo, and the Progressive Delivery Stack

GitOps has won. The argument about whether declarative, git-driven infrastructure is a good idea is over — it is, and the industry has largely moved on to the harder question: how do you do it well at scale? The tooling ecosystem has matured significantly, and the patterns for progressive delivery, environment promotion, and multi-cluster management have crystallized.

This is the state of the GitOps stack in 2026, what's actually working in production, and where the interesting problems still are.

The Core GitOps Stack

Most mature GitOps organizations have settled on a three-tier stack:

Infrastructure layer: Terraform/OpenTofu + Atlantis or Terraform Cloud (TFC). Cloud resources (VPCs, RDS instances, EKS clusters, IAM roles) are defined in Terraform, stored in git, and applied through automation triggered by pull requests. Atlantis or Terraform Cloud handles the plan/apply workflow, approval gates, and state management. Every infrastructure change requires a PR, a plan review, and an approval before anything changes in the account.

Cluster configuration layer: ArgoCD or Flux. Kubernetes manifests, Helm charts, and Kustomize overlays live in git. ArgoCD or Flux continuously reconciles the cluster state to match what's in the repository. Drift (manual changes to the cluster that aren't reflected in git) is detected and either corrected automatically or flagged for review.

Application delivery layer: Argo Rollouts + Kargo. Application deployments move through environments (dev → staging → prod) via promotion workflows. Kargo (the newer piece of this stack) handles the environment promotion logic: when a new image passes staging validation, Kargo creates the PR to update the production environment manifest. Progressive delivery patterns (canary, blue-green) are managed by Argo Rollouts within each environment.
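To make the cluster configuration layer concrete, here is a minimal sketch of a single ArgoCD Application. The application name, repository path, and namespace are assumptions for illustration. The syncPolicy block is where the two drift behaviors described above are chosen: with selfHeal enabled, manual changes are reverted; with the automated block removed, the Application only reports OutOfSync and waits for review.

```yaml
# Hypothetical Application: name, repo path, and namespace are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/gitops-repo
    targetRevision: HEAD
    path: environments/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from git
      selfHeal: true   # revert manual changes made directly in the cluster
```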

ArgoCD at Scale: What Actually Gets Hard

ArgoCD's Application model works well for tens or hundreds of applications. At thousands of applications across multiple clusters, a few things break down.

ApplicationSet is the answer to scale. Rather than defining individual ArgoCD Applications for each service, ApplicationSet generates Applications dynamically from a template and a generator (git directory structure, cluster list, or external data source). One ApplicationSet definition manages all 200 services in your monorepo:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services
  namespace: argocd
spec:
  generators:
  - git:
      repoURL: https://github.com/your-org/services
      revision: HEAD
      directories:
      - path: services/*
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/services
        targetRevision: HEAD
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - CreateNamespace=true

New service? Add a directory to services/. ArgoCD picks it up automatically. No manual Application creation.

Multi-cluster ArgoCD. Running ArgoCD at the management cluster level, deploying to spoke clusters, is the standard multi-cluster pattern. The management cluster hosts ArgoCD; the spoke clusters are registered as deployment targets. One control plane, N deployment targets.
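A hedged sketch of this hub-and-spoke pattern using the ApplicationSet cluster generator, which produces one Application per registered spoke cluster. The env: production label, the platform/addons path, and the resource names are assumptions; the label has to exist on the cluster Secrets that ArgoCD creates when a spoke is registered.

```yaml
# Sketch only: the "env: production" label and the platform/addons path are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-addons
  namespace: argocd
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          env: production       # label on the registered cluster Secret
  template:
    metadata:
      name: 'addons-{{name}}'   # {{name}} is the registered cluster name
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/gitops-repo
        targetRevision: HEAD
        path: platform/addons
      destination:
        server: '{{server}}'    # API server URL of the spoke cluster
        namespace: platform
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Registering a new spoke cluster with the matching label is then enough for it to receive the same set of add-ons.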

The failure mode to watch: if the management cluster has an outage, ArgoCD cannot sync changes to any spoke cluster during that window, although workloads already running on the spokes keep running. Design the management cluster for high availability (multi-AZ, multiple replicas), and make sure your rollback path doesn't depend on ArgoCD being available: rendering the manifests from the same git repo and applying them with kubectl should always work as a fallback.

Kargo: Environment Promotion as a First-Class Concept

Kargo is the piece of the GitOps stack that was missing for years. ArgoCD is excellent at keeping a cluster in sync with a git repository, but it has no opinion about how a new container image moves from the registry through dev → staging → production. Kargo fills that gap.

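The Kargo model, shown here in the pre-1.0 schema (Kargo 1.0 and later express the same flow with requestedFreight and promotionTemplate steps):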

# A "Warehouse" watches for new artifacts (images, Helm charts)
apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: my-service
  namespace: kargo-demo
spec:
  subscriptions:
  - image:
      repoURL: ghcr.io/your-org/my-service
      allowTags: '^v\d+\.\d+\.\d+$'  # Regex filter: semantic version tags only

---
# A "Stage" represents an environment
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: dev
  namespace: kargo-demo
spec:
  subscriptions:
    warehouse: my-service   # Pulls new artifacts from the Warehouse
  promotionMechanisms:
    gitRepoUpdates:
    - repoURL: https://github.com/your-org/gitops-repo
      writeBranch: main
      kustomize:
        images:
        - image: ghcr.io/your-org/my-service
          path: environments/dev

---
# Staging pulls from dev (not directly from Warehouse)
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: staging
  namespace: kargo-demo
spec:
  subscriptions:
    upstreamStages:
    - name: dev   # Only promotes what has passed dev
  promotionMechanisms:
    gitRepoUpdates:
    - repoURL: https://github.com/your-org/gitops-repo
      writeBranch: main
      kustomize:
        images:
        - image: ghcr.io/your-org/my-service
          path: environments/staging

Kargo knows about your artifact supply chain: which version is in dev, which is in staging, which is in production. It enforces promotion ordering — you can't skip staging. It can require human approval gates before production promotion. And it keeps a complete history of what was deployed to each environment and when.
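The approval gate before production can be sketched as follows, staying in the same pre-1.0 schema as the manifests above. This is an illustration under assumptions, not a definitive configuration: the production Stage and its environments/production path are hypothetical, and the field names should be checked against the CRDs of the Kargo version you run. The idea is that auto-promotion is switched on per Stage in the Project, and any Stage left out of that list only moves when someone promotes explicitly.

```yaml
# Sketch in the pre-1.0 Kargo schema; verify field names against your installed CRDs.
apiVersion: kargo.akuity.io/v1alpha1
kind: Project
metadata:
  name: kargo-demo
spec:
  promotionPolicies:
  - stage: dev
    autoPromotionEnabled: true   # new artifacts flow into dev automatically
  - stage: staging
    autoPromotionEnabled: true
  # production is not listed, so promoting to it requires an explicit, manual promotion

---
# Hypothetical production Stage: it can only receive what staging has passed
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: production
  namespace: kargo-demo
spec:
  subscriptions:
    upstreamStages:
    - name: staging
  promotionMechanisms:
    gitRepoUpdates:
    - repoURL: https://github.com/your-org/gitops-repo
      writeBranch: main
      kustomize:
        images:
        - image: ghcr.io/your-org/my-service
          path: environments/production
```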

Progressive Delivery with Argo Rollouts

Argo Rollouts extends Kubernetes Deployments with canary and blue-green strategies. The canary pattern:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service
spec:
  replicas: 20
  strategy:
    canary:
      steps:
      - setWeight: 5      # ~5% to canary (1 of 20 pods unless trafficRouting is configured)
      - pause: {duration: 5m}
      - analysis:
          templates:
          - templateName: error-rate-check
      - setWeight: 20
      - pause: {duration: 10m}
      - analysis:
          templates:
          - templateName: error-rate-check
          - templateName: latency-check
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 100
  selector:
    matchLabels:
      app: my-service
  template:
    # ... pod template

The analysis steps are where this gets powerful. AnalysisTemplates query your metrics backend (Prometheus, Datadog, CloudWatch) and automatically promote or roll back based on the results:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  metrics:
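  - name: error-rate
    interval: 1m
    count: 5                             # Five measurements; with interval but no count, the metric runs indefinitely
    successCondition: result[0] < 0.01   # Error rate < 1%
    failureLimit: 2                      # Tolerate 2 failed measurements; the 3rd fails the run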
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc.cluster.local:9090
        query: |
          sum(rate(http_requests_total{
            app="my-service",
            status=~"5.."
          }[5m])) /
          sum(rate(http_requests_total{
            app="my-service"
          }[5m]))

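If more than two measurements breach the 1% error-rate threshold (failureLimit: 2 tolerates two failures and fails the analysis on the third), Argo Rollouts aborts the rollout and shifts traffic back to the stable version. Zero human intervention for the happy path; automatic protection against bad deploys.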
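The canary steps also reference a latency-check template that is not shown. One possible shape for it, as a sketch: the histogram metric name http_request_duration_seconds_bucket, the p95 target, and the 0.5 second threshold are all assumptions and need to match what your services actually export.

```yaml
# Hypothetical latency-check: metric name and threshold are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-check
spec:
  metrics:
  - name: p95-latency
    interval: 1m
    count: 5                             # take five measurements, then finish
    successCondition: result[0] < 0.5    # p95 latency under 0.5 seconds
    failureLimit: 2
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc.cluster.local:9090
        query: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket{
              app="my-service"
            }[5m])) by (le)
          )
```

When a step lists several templates, every metric has to succeed for the rollout to continue, so the second analysis step in the canary above gates on both error rate and latency.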

Drift Detection and the Honesty of GitOps

The most underappreciated value of GitOps is what it reveals about your actual operational practices. When you implement ArgoCD with auto-sync, you discover how often people are making manual changes to clusters that were never committed to git. The drift detection isn't just a security control — it's an audit of your operational culture.

Healthy GitOps organizations have drift rates near zero: everything in the cluster matches git, and manual changes are exceptional and immediately documented. Organizations struggling with GitOps adoption see drift regularly, usually because engineers are still in the habit of kubectl apply -f or helm upgrade directly against clusters.
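Getting to a drift rate near zero also means separating human drift from fields that controllers legitimately change at runtime. A minimal sketch, assuming a Deployment whose replica count is managed by a HorizontalPodAutoscaler (the application name and paths are hypothetical): the Application tells ArgoCD to ignore that one field, so any drift that is still reported is much more likely to be a manual change.

```yaml
# Hypothetical Application; the HPA-managed Deployment is an assumption.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/gitops-repo
    targetRevision: HEAD
    path: environments/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
  ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
    - /spec/replicas                  # owned by the autoscaler, not by git
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
    - RespectIgnoreDifferences=true   # leave the ignored field untouched during sync
```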

The organizational change is harder than the technical implementation. GitOps requires that git is the source of truth even when it's inconvenient — including at 2am during an incident when the fastest fix is a direct kubectl patch. Building the muscle of "fix forward via git" rather than "fix directly in cluster and backport later" is a culture question, not a tooling question.

*Zak Hassan is a Staff SRE specializing in GitOps, platform engineering, and AI-powered infrastructure automation. Find him at zakhassan.com or on LinkedIn.*

