A consistent policy & compliance layer ensures platform guardrails are predictable, observable, progressive, and reversible. This document outlines how to use Kyverno (cluster runtime admission / mutation / validation) and Checkov (CI Infrastructure-as-Code scanning) under the same GitOps promotion model (App‑of‑Apps) to prevent last‑minute surprises.
- Enforce baseline security, reliability, and hygiene controls early (CI) and at runtime (admission).
- Provide a safe adoption path: ALL policies start non‑blocking (audit) before graduated enforcement.
- Avoid lockouts (exclude system / bootstrap namespaces, phased enablement per cluster).
- Offer transparent visibility (Kyverno policy reports + report-ui + CI scan outputs).
- Permit tightly scoped, traceable exceptions (labels/annotations, dedicated exception Policies) – not ad‑hoc cluster edits.
In scope: Kubernetes resource policies (Pods, Deployments, Ingress/Gateway, Secrets config) and Helm chart templates scanned by Checkov at pre-commit / PR time. Out of scope: runtime container behavior (handled by other controls) and cloud account CSPM (separate program).
Kyverno operates inside each cluster as an admission controller applying:
- Validate rules (block or audit) – e.g. require labels, disallow privileged, enforce host suffix patterns.
- Mutate rules – inject defaults (resource limits, securityContext).
- Generate rules – create derivative resources (ConfigMaps, NetworkPolicies) based on source objects.
- VerifyImages (optional) – signature / attestation enforcement.
Kyverno advantages: policies are native Kubernetes resources (no custom DSL needed), live in Git alongside other manifests, and integrate cleanly with App‑of‑Apps for multi‑cluster rollout.
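As an illustration of a validate rule written as a native Kubernetes resource, the following is a minimal sketch of a label requirement that starts in Audit. The policy name, rule name, and the `team` label are hypothetical, and the `validate.failureAction` field assumes a Kyverno release that supports the per-rule form used elsewhere in this document.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label          # hypothetical policy name
spec:
  rules:
    - name: check-team-label        # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        failureAction: Audit        # report only; nothing is blocked yet
        message: "The label `team` is required."
        pattern:
          metadata:
            labels:
              team: "?*"            # any non-empty value (label key is an assumption)
```

Because the policy is an ordinary manifest, it can be promoted through the same Argo CD applications as any other resource.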
All new policies follow this state machine:
Draft (local branch) → Audit (dev) → Audit (staging) → Audit (prod) → Enforce (dev) → Enforce (staging) → Enforce (prod)
Rules:
- Minimum soak time in audit (collect report data) before first enforce step.
- No direct Draft → Enforce jumps.
- Regressions (excess violations) revert to Audit via Git commit (fast rollback).
Key spec fields:

    validate:
      failureAction: Audit          # later switched to Enforce
      failureActionOverrides:       # fine-grained exceptions (optional)
        - action: Audit
          namespaces: ["job-runner"]

Exclude critical namespaces from initial enforcement to prevent cluster bootstrap or core service disruption:
- `kube-system`, `kube-public`, `kube-node-lease`
- Argo CD control plane namespace
- Kyverno’s own namespace
- CNI / CSI driver namespaces
- Any sealed-secrets, cert-manager, gateway controller namespaces (early phases)
Implementation patterns:
- Policy `match` selector restricts to application namespaces via label (e.g. `team/app`) instead of excluding system namespaces repeatedly.
- `exclude` block for explicit carve-outs until workloads are remediated:

      match:
        any:
          - resources:
              kinds: [Pod]
              namespaces: ["*"]
      exclude:
        any:
          - resources:
              namespaces: ["kube-system", "kyverno", "argocd"]

Prefer structured exception channels, not ad-hoc disabling:
- Dedicated label (e.g. `policy.exception/<rule>=approved`) added via pull request with reviewer sign‑off.
- Kyverno rule references label in a `precondition` or `pattern` negation.
- Time-bounded exceptions (tracked in Git with TODO / expiry annotation).
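Example (skip privileged check if approved). The label key contains dots and a slash, so it is quoted inside the JMESPath expression, and `|| ''` supplies an empty default when the label is absent:

    preconditions:
      all:
        - key: "{{ request.object.metadata.labels.\"policy.exception/allow-privileged\" || '' }}"
          operator: NotEquals
          value: "true"

If teams need conditional resources (jobs, migrations), provide a separate Generate NetworkPolicy or allowlist policy limited by annotation.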
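To show where the exception precondition from this section sits relative to `match` and `validate`, here is a sketch of a complete policy. The rule name and message are hypothetical, the pattern follows the commonly used Kyverno anchor style for optional fields, and the label key is the one from the example above; treat it as a starting point to test in Audit, not a vetted policy.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged
spec:
  rules:
    - name: deny-privileged-containers     # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      # The rule is skipped when the approved-exception label is present.
      preconditions:
        all:
          - key: "{{ request.object.metadata.labels.\"policy.exception/allow-privileged\" || '' }}"
            operator: NotEquals
            value: "true"
      validate:
        failureAction: Audit               # switch to Enforce after the soak period
        message: "Privileged containers are not allowed."
        pattern:
          spec:
            # =(...) anchors: check the field only if it is present
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
```

Keeping the exception in a precondition means the label itself is the audit trail: it only reaches the cluster through a reviewed pull request.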
Install Kyverno Policy Reports + report-ui to surface:
- Violations grouped by policy → cluster → namespace.
- Trend lines (decreasing violation rate before enforcing).
- Drilldown: object YAML diff vs expected pattern.
Argo CD sync ensures report-ui is versioned like policies. Grafana (optional) can chart and alert on the `kyverno_policy_results_total` metric once it is scraped from Kyverno's metrics endpoint (e.g. to catch a spike after a new chart release).
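One possible way to express the spike alert is an alerting rule on the failure count per policy. The sketch below assumes the Prometheus Operator `PrometheusRule` CRD is installed and that the metric carries `rule_result` and `policy_name` labels; the resource name, namespace, window, and threshold are all hypothetical and should be tuned against real report data.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kyverno-violation-spike        # hypothetical name
  namespace: monitoring                # hypothetical namespace
spec:
  groups:
    - name: kyverno.policy
      rules:
        - alert: KyvernoViolationSpike
          # Failed rule results per policy over the last 15 minutes.
          # Label names are assumptions; verify them against the exported metric.
          expr: |
            sum by (policy_name) (
              increase(kyverno_policy_results_total{rule_result="fail"}[15m])
            ) > 20
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Kyverno policy {{ $labels.policy_name }} is failing more often than usual"
```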
Operational checks:
| Check | Command | Outcome |
|---|---|---|
| Policy CR status | `kubectl get policyreports` | Shows counts (pass/warn/fail) |
| Specific rule | `kubectl get clusterpolicy disallow-privileged -o yaml` | Confirms `failureAction` |
| Report UI | Browser → `/kyverno/` path | Lists policies & violations |

| Phase | Focus | Actions |
|---|---|---|
| 0 | Visibility | Deploy Kyverno + policies (Audit) + report-ui |
| 1 | Hygiene | Enforce low-risk (labels, resource limits) |
| 2 | Security Baseline | Enforce no privileged / hostPath / forbidden capabilities |
| 3 | Image Integrity | Add VerifyImages (signatures), still audit first |
| 4 | Supply Chain | Enforce provenance attestations (later) |
Graduation criteria: sustained low (near-zero) audit violations for 1–2 release cycles in staging.
Checkov scans Terraform, Kubernetes manifests (Helm-rendered), and other IaC artifacts before merge:
- Prevents late-stage admission failures (e.g. disallowed capabilities) by failing PR early.
- Ensures cloud infra (buckets, networks, IAM) aligns with the baseline (encryption, versioning, least privilege). This complements runtime Kyverno, which only sees Kubernetes API objects.
Pipeline pattern:
- Render Helm templates (`helm template`) for changed charts.
- Run Checkov on rendered manifests + Terraform modules (scoped to diff paths).
- Output SARIF / JUnit for PR comment + artifact retention.
- Block merge on critical failed checks.
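The pipeline pattern above could look roughly like the following workflow. This is a sketch that assumes GitHub Actions as the CI system (the document does not name one) and uses hypothetical paths (`charts/my-app`, `terraform/`) and release name; it scans whole directories, not just diff paths, and fails on any failed check, so severity-based gating would need additional configuration.

```yaml
name: iac-scan                          # hypothetical workflow name
on:
  pull_request:
    paths:
      - "charts/**"
      - "terraform/**"
jobs:
  checkov:
    runs-on: ubuntu-latest              # assumes helm is available on the runner
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install Checkov
        run: pip install checkov
      - name: Render Helm chart
        run: |
          mkdir -p rendered
          helm template my-app charts/my-app --output-dir rendered
      - name: Scan rendered manifests
        # A non-zero exit code fails the job, which blocks the merge
        # when this job is a required status check.
        run: >
          checkov -d rendered --framework kubernetes
          -o cli -o sarif --output-file-path console,results.sarif
      - name: Scan Terraform modules
        run: checkov -d terraform --framework terraform
      - name: Keep scan results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: checkov-results
          path: results.sarif
```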
Alignment with Kyverno:
- Same naming conventions (`cluster`, `environment` labels) enforced both in CI (Checkov custom policies) and at runtime (Kyverno validate rules).
- Reduces noise: by the time manifests reach the cluster, most structural violations are already fixed.
- A developer submits a Deployment chart change that lacks resource limits.
- CI renders the chart → Checkov fails (missing limits) → the PR is updated with limits.
- The PR is merged; Argo CD syncs the dev cluster; Kyverno (Audit) records 0 violations.
- After audit stability, the platform team flips the corresponding Kyverno policy to `Enforce` in dev.
- The revision is promoted to staging (still Enforce) after a soak period, and finally to prod.
- Future similar PRs are blocked earlier (CI), and the runtime policy prevents drift.
| Task | Action | Tool |
|---|---|---|
| Add new policy | Author ClusterPolicy (Audit), commit to dev values / policies dir | GitOps + Argo |
| Review violations | Open report-ui, filter by namespace / policy | UI |
| Promote to staging | Merge branch updating `failureAction` (still Audit) | Git |
| Enforce policy | Change `failureAction: Enforce` after thresholds | Git |
| Add exception | PR adding label/annotation + (optional) policy precondition tweak | Git |
| Rollback enforcement | Revert commit (Enforce → Audit) | Git |
| Monitor spike | Alert on metric delta (violations/min) | Grafana |
- Kyverno supplies runtime guardrails; EVERY policy begins in Audit to gather signal safely.
- Namespace/resource exclusions + label-scoped matching prevent accidental platform disruption.
- Exceptions are structured, labeled, reviewable, not ad hoc edits.
- report-ui + metrics provide clear visibility to drive confident enforcement decisions.
- Checkov gives shift‑left assurance: infrastructure & manifest misconfigurations are surfaced pre‑merge, shrinking Kyverno violation noise.
- App‑of‑Apps enables gradual, cluster-by-cluster enablement and controlled promotion of enforcement states through the same branching/tag strategy used for traffic & observability.
Adopt iteratively: visibility first (Audit), then low-risk hygiene, then privilege/security, then supply chain controls.