*By Zak Hassan — Staff SRE | May 2026*
Most teams treat container security as a checklist item: run a scanner, fix the CVEs flagged red, ship. That mindset produces a false sense of security. The container model introduces a layered attack surface that spans the build pipeline, the image registry, the Kubernetes admission path, and the running workload itself. Each layer is a potential breach point, and a gap in any one of them can undermine the hardening you've done everywhere else. This post walks through the full stack — from writing a secure Dockerfile to enforcing runtime behavioral controls — with enough practical tooling to go from "teams have a scanner" to "teams have a security posture."
The Container Attack Surface
Containers are not virtual machines. They share the host kernel, and the boundary between them is enforced by Linux namespaces (with cgroups constraining resource use), both of which have had meaningful escape paths. Unless user namespaces remap it, root (UID 0) inside a container is the same UID as root on the host. If a vulnerability in the container runtime, a misconfigured volume mount, or a kernel exploit allows a process to break out of the namespace, it lands as root on the node. From there, it can read secrets from other pods, tamper with the kubelet socket, or exfiltrate data from mounted host paths.
Privilege escalation within a container is also easier than most engineers expect. The --privileged flag gives a container full access to all Linux capabilities and all host devices. Even without --privileged, capabilities like CAP_NET_ADMIN, CAP_SYS_PTRACE, and CAP_DAC_OVERRIDE each create distinct escalation paths. allowPrivilegeEscalation: true (the Kubernetes default unless explicitly set otherwise) lets a process inside the container gain more privileges than its parent — the classic vector for setuid binary abuse.
The isolation is real but not absolute. Defense in depth means assuming the isolation will fail and making every other layer count.
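To make the escalation paths above concrete, here is a minimal sketch of a pod spec that closes most of them. The pod name is hypothetical, and the settings assume the application needs no Linux capabilities and never writes to its root filesystem; adjust if either assumption does not hold.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example              # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true                # kubelet refuses to start a UID 0 container
    runAsUser: 65532
    seccompProfile:
      type: RuntimeDefault            # runtime's default syscall filter
  containers:
    - name: app
      image: registry.mycompany.com/myapp:v1.2.3
      securityContext:
        allowPrivilegeEscalation: false   # blocks setuid-based privilege gain
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]               # add back only what the app provably needs
```

Dropping all capabilities and then adding back specific ones is usually easier to audit than starting from the runtime default set and removing a few.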
Image Scanning in CI
Catching known vulnerabilities before an image reaches a cluster is the lowest-cost mitigation available. Trivy and Grype are the two tools worth integrating; both scan the OS package layer and language-specific manifests (Go, Python, Node, Rust), and Trivy additionally checks Dockerfiles for misconfigurations.
The key decision is whether scanning is informational or blocking. Informational scanning generates noise that teams learn to ignore. Blocking on CRITICAL CVEs — with a clearly documented exception path — makes the policy actionable.
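One way to make the exception path concrete is an ignore file checked into the repository, so that every waiver is reviewed like code and expires on its own. The sketch below assumes Trivy's YAML ignore-file format, which Trivy's documentation marks as experimental; the CVE identifier, statement, and date are placeholders, not real findings.

```yaml
# .trivyignore.yaml (all entries are placeholders)
vulnerabilities:
  - id: CVE-0000-00000                # placeholder identifier
    statement: "Vulnerable code path is not reachable in this service"
    expired_at: 2026-09-01            # the waiver lapses and the build blocks again
```

Trivy reads a file like this when it is passed with the `--ignorefile` flag. Verify the field names against the Trivy version you run before relying on them.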
Here is a GitHub Actions workflow that blocks the pipeline on any critical-severity finding with an available fix and uploads a full SARIF report to the Security tab:
```yaml
# .github/workflows/image-scan.yml
name: Container Image Scan

on:
  push:
    branches: [main]
  pull_request:

jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Run Trivy vulnerability scan (blocking)
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          format: table
          exit-code: "1"          # fail the build
          severity: CRITICAL
          ignore-unfixed: true    # skip CVEs with no available fix

      - name: Run Trivy full scan (SARIF for Security tab)
        if: always()
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          format: sarif
          output: trivy-results.sarif
          severity: CRITICAL,HIGH,MEDIUM

      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif
```

The base image update problem is orthogonal to per-PR scanning: a FROM ubuntu:22.04 image that was clean on Monday will accumulate CVEs by Thursday without any code change. The fix is scheduled re-scanning of promoted images in your registry (ECR, GAR, Harbor all support this natively) combined with automated PRs that bump the base image digest when a new patched version is available. Renovate handles this well with the docker manager enabled.
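Where registry-native scanning is not available, a scheduled workflow can approximate it. This is a sketch only: the workflow file name and cron time are arbitrary, the image reference stands in for whatever tag you have promoted, and it assumes the runner is already authorized to pull from the registry.

```yaml
# .github/workflows/scheduled-rescan.yml (hypothetical file name)
name: Scheduled Image Re-scan

on:
  schedule:
    - cron: "0 6 * * *"             # daily at 06:00 UTC

jobs:
  rescan:
    runs-on: ubuntu-latest
    steps:
      # Registry login is omitted; add it if the image is private.
      - name: Re-scan promoted image
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: registry.mycompany.com/myapp:v1.2.3
          format: table
          exit-code: "1"            # a failed run signals newly disclosed CVEs
          severity: CRITICAL
          ignore-unfixed: true
```

A failing scheduled run changes no code, so route its notification to whoever owns the base image bump rather than to the last committer.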
Secure Dockerfile Patterns
A secure Dockerfile applies four principles: use a minimal base, build in multiple stages, never bake secrets into layers, and run as a non-root user.
```dockerfile
# Stage 1: build
FROM golang:1.22-bookworm AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -trimpath -ldflags="-s -w" -o /app/server ./cmd/server

# Stage 2: minimal runtime image (no shell, no package manager, no root)
FROM gcr.io/distroless/static-debian12:nonroot
# Copy binary only: no source, no build tools, no OS packages
COPY --from=builder /app/server /server
# nonroot UID defined in the distroless image (65532)
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/server"]
```

The distroless/static image contains little beyond the application binary, the CA certificates bundle, and a few support files such as timezone data and /etc/passwd. There is no shell (/bin/sh does not exist), which eliminates an entire class of post-exploitation techniques. Multi-stage builds mean the Go toolchain, intermediate build artifacts, and any dev dependencies never appear in the final image layer history. Secrets (API keys, database passwords) must come in at runtime, injected by the orchestrator as environment variables or mounted files, not through ARG or ENV instructions at build time: ENV values persist in the image configuration, and ARG values remain visible to anyone who runs docker history on the image.
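As an illustration of runtime injection, the following sketch wires a Kubernetes Secret into the container as an environment variable. The Secret name, key, and variable name are hypothetical; the Secret itself is assumed to be created out of band (for example by an external secrets operator) rather than committed to the repository.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: server
          image: registry.mycompany.com/myapp:v1.2.3
          env:
            - name: DATABASE_PASSWORD            # hypothetical variable name
              valueFrom:
                secretKeyRef:
                  name: myapp-db-credentials     # hypothetical Secret
                  key: password
```

The value never enters the image or its build history; it exists only in the cluster's Secret store and in the running process's environment.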
Admission Controllers: Enforcing Policy at the Gate
Image scanning and Dockerfile hygiene are preventive controls in the developer workflow. Admission controllers are the enforcement point in the cluster: every resource creation or update passes through them before it is persisted to etcd. OPA/Gatekeeper and Kyverno both work well; Kyverno's native Kubernetes YAML syntax has a gentler learning curve for teams that aren't already writing Rego.
The following Kyverno ClusterPolicy enforces four security standards: containers must not run as root, images must come from an approved registry, resource limits must be set, and pods must carry a required label set.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-security-standards
  annotations:
    policies.kyverno.io/description: >
      Enforces baseline security standards: no-root containers,
      approved registry, resource limits, required labels.
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-non-root
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Containers must not run as root (runAsNonRoot: true required)."
        pattern:
          spec:
            containers:
              - securityContext:
                  runAsNonRoot: true
                  allowPrivilegeEscalation: false

    - name: approved-registry
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Images must be pulled from registry.mycompany.com."
        pattern:
          spec:
            containers:
              - image: "registry.mycompany.com/*"

    - name: require-resource-limits
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "CPU and memory limits are required on all containers."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"

    - name: require-labels
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Pods must have 'app', 'team', and 'env' labels."
        pattern:
          metadata:
            labels:
              app: "?*"
              team: "?*"
              env: "?*"
```

Setting validationFailureAction: Enforce makes these hard blocks. During rollout, start with Audit to surface violations without disrupting existing workloads, then flip to Enforce once the backlog is cleared.
Runtime Security with Falco
Admission controllers police what is declared. Falco polices what actually happens at runtime by monitoring the Linux kernel via eBPF, evaluating a stream of system calls against a rule engine, and alerting when behavior deviates from the expected baseline. It catches things no static policy can: a process spawning a shell inside a container, an unexpected outbound connection, a binary writing to /etc, or a container attempting to read the Docker socket.
Falco ships with a useful default ruleset, but writing custom rules for your environment is where the real signal-to-noise improvement comes from.
```yaml
# /etc/falco/rules.d/myapp-rules.yaml

# Images permitted to spawn a shell. This list is not part of Falco's default
# ruleset, so it must be defined before the rule that uses it. The entry is a
# placeholder; replace it with your own debug or tooling images.
- list: allowed_shell_images
  items: [registry.mycompany.com/debug-tools]

# Alert when any process inside a container spawns an interactive shell
- rule: Shell Spawned in Container
  desc: >
    Detects any shell process spawned inside a running container.
    Legitimate applications should not require a shell at runtime.
  condition: >
    spawned_process
    and container
    and not container.image.repository in (allowed_shell_images)
    and proc.name in (shell_binaries)
  output: >
    Shell spawned in container
    (user=%user.name user_loginuid=%user.loginuid
    container_id=%container.id image=%container.image.repository
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
  priority: WARNING
  tags: [container, shell, mitre_execution]

# Alert on unexpected writes to sensitive directories
- rule: Write to Sensitive Directory
  desc: Detects writes to /etc, /usr, or /bin inside a container.
  condition: >
    open_write
    and container
    and (fd.name startswith /etc/ or fd.name startswith /usr/ or fd.name startswith /bin/)
    and not proc.name in (package_mgmt_binaries)
  output: >
    Sensitive directory write in container
    (user=%user.name file=%fd.name container=%container.id image=%container.image.repository)
  priority: ERROR
  tags: [container, filesystem, mitre_persistence]

# Alert if a container process tries to access the host Docker socket
- rule: Docker Socket Access
  desc: Container attempting to access /var/run/docker.sock.
  condition: >
    open_read
    and container
    and fd.name = /var/run/docker.sock
  output: >
    Docker socket accessed from container
    (user=%user.name container=%container.id image=%container.image.repository)
  priority: CRITICAL
  tags: [container, escape, mitre_privilege_escalation]
```

Route Falco alerts to your SIEM or Alertmanager via Falcosidekick, a companion service that supports Slack, PagerDuty, Datadog, Elasticsearch, and many other outputs through a single Helm values configuration.
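A Helm values fragment for that alert routing might look like the sketch below. It assumes the falcosecurity/falco chart with its bundled Falcosidekick subchart; the key names reflect that chart as commonly documented and should be checked against the chart version you deploy. The webhook URL is a placeholder.

```yaml
# values.yaml fragment for the falcosecurity/falco Helm chart (sketch)
falcosidekick:
  enabled: true
  config:
    slack:
      webhookurl: "https://hooks.slack.com/services/REPLACE/ME"   # placeholder
      minimumpriority: "warning"    # forward WARNING and above
```

Setting a minimum priority per output keeps chat channels readable while lower-priority events still flow to a log store for later investigation.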
Supply Chain Security: Signing Images with Cosign
Knowing that an image passed scanning at build time is not the same as knowing that the image running in production is the one that passed scanning. Supply chain attacks, where a legitimate image is replaced or tampered with between build and deploy, are addressed by cryptographic signing.
Cosign (part of the Sigstore project) signs images by attaching a signature to the image manifest in the registry. The signature is verifiable with the corresponding public key at any point downstream, including at admission time.
```shell
# Generate a key pair (for keyless signing, use OIDC instead)
cosign generate-key-pair

# Sign the image after pushing to the registry
cosign sign --key cosign.key registry.mycompany.com/myapp:v1.2.3

# Verify before deploying (can be scripted into CI or admission webhook)
cosign verify \
  --key cosign.pub \
  registry.mycompany.com/myapp:v1.2.3
```

Pair this with a Kyverno ClusterPolicy using the verifyImages rule type to enforce signature verification at admission; unsigned images from your registry are then rejected by the cluster itself. SLSA (Supply-chain Levels for Software Artifacts) provides the broader framework: at SLSA Level 2, you require a signed provenance attestation proving the image was built by your CI system from a specific source commit, not assembled manually.
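A verifyImages policy for the key-based signing shown above could be sketched as follows. The policy and rule names are hypothetical, and the public key body is a placeholder to be replaced with the contents of cosign.pub; keyless signing would use a different attestor entry.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures        # hypothetical name
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-cosign-signature
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        - imageReferences:
            - "registry.mycompany.com/*"
          attestors:
            - count: 1                 # at least one listed key must verify
              entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <contents of cosign.pub>
                      -----END PUBLIC KEY-----
```

As with the validation policy earlier, starting in Audit mode surfaces unsigned images already running before the cluster begins rejecting them.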
Network Policies: Microsegmentation in Kubernetes
By default, all pods in a Kubernetes cluster can communicate with all other pods across all namespaces. A compromised pod can therefore reach the database, your internal metrics endpoint, your secret store API, and every other pod in the cluster. Kubernetes NetworkPolicy resources allow you to define an explicit allow-list model: start with a deny-all baseline, then open only the traffic paths that are required.
```yaml
# 1. Default deny-all for a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}   # applies to all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# 2. Allow the frontend to reach only the API service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
---
# 3. Allow the API to reach only the database and DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:   # DNS resolution
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP   # DNS falls back to TCP for large responses
          port: 53
```

NetworkPolicy enforcement requires a CNI that supports it: Cilium, Calico, and Weave all do; the default kubenet does not. Cilium's extended policy model (via CiliumNetworkPolicy) allows you to write L7 rules that restrict based on HTTP method, path, or gRPC service, which is useful when zero-trust requirements exceed what L4 TCP rules can express.
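For a sense of what an L7 rule looks like, here is a sketch of a CiliumNetworkPolicy that lets the frontend issue only GET requests to the API. The policy name and the path pattern are hypothetical, and the example assumes Cilium is the cluster's CNI with its HTTP-aware proxy enabled.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-allow-frontend-get         # hypothetical name
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/v1/.*"     # hypothetical path pattern (regex)
```

Requests from the frontend that use another method or path are rejected at the proxy, so a compromised frontend cannot call write endpoints even though the TCP connection itself is allowed.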
Putting It Together
Container security is a defense-in-depth problem. No single control is sufficient: a signed image can still run as root; a non-root container with no resource limits can still exhaust node capacity; a container built from a perfectly written Dockerfile can still reach arbitrary endpoints without network policies. The stack described here covers the full lifecycle: build-time (Dockerfile patterns, image scanning, signing), deploy-time (admission controllers, network policy), and runtime (Falco). Instrument all three phases, alert on deviations at each one, and treat policy violations as incidents rather than warnings. The teams that get this right are the ones that make security a platform capability, not a pre-deploy checklist.
*Zak Hassan is a Staff SRE specializing in Kubernetes security, platform engineering, and reliability tooling. Find him at zakhassan.com or on LinkedIn.*