Skip to content

fix: gate Grafana ingress forward-auth annotation on site opt-in#320

Draft
amdove wants to merge 1 commit into
mainfrom
fix-grafana-ingress-forward-auth-gate
Draft

fix: gate Grafana ingress forward-auth annotation on site opt-in#320
amdove wants to merge 1 commit into
mainfrom
fix-grafana-ingress-forward-auth-gate

Conversation

@amdove

@amdove amdove commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Description

Grafana's in-cluster ingress returned HTTP 404 on workloads that don't enable traefik-forward-auth. The Grafana Helm ingress always stamped a traefik.ingress.kubernetes.io/router.middlewares annotation referencing the kube-system-traefik-forward-auth-main and kube-system-traefik-forward-auth-add-forwarded-headers middlewares. Those middlewares are only created by the clusters step when a site sets use_traefik_forward_auth: true, so on workloads without forward-auth the ingress referenced a non-existent middleware and Traefik invalidated the router.

The annotation is now gated on the main site's opt-in. When forward-auth is not enabled, Grafana routes straight through to its local-account login.

Root cause

Regression from the eks/cluster Python→Go migration (commit ae26379), which folded the previously-unconditional global traefik-forward-auth middleware into per-site, flag-gated logic — but the Grafana ingress kept stamping the annotation unconditionally. It surfaced on academy01-production (the only workload reaching Grafana via the public grafana.* ingress; others port-forward) after its cluster-version bump re-ran the Go steps.

Code Flow

The ingress-map construction in awsHelmGrafana (lib/steps/helm_aws.go) is extracted into a pure helper grafanaIngressValues(domain, sites). It builds the base ingress (enabled, hosts, path) and only adds the annotations key with the two-middleware string when sites["main"] is present and Spec.UseTraefikForwardAuth is true. Production behavior is byte-for-byte identical to the prior unconditional path when the flag is set.

Test plan

  • New unit test TestGrafanaIngressValues in lib/steps/helm_aws_test.go covers: flag on → annotation present and equals the exact two-middleware string; flag off → no annotation; no main site → no annotation; nil sites → no annotation; and basic hosts/path/enabled values.
  • go test ./steps/... passes.

Category of change

  • Bug fix (non-breaking change which fixes an issue)
  • Version upgrade (upgrading the version of a service or product)
  • New feature (non-breaking change which adds functionality)
  • Build: a code change that affects the build system or external dependencies
  • Performance: a code change that improves performance
  • Refactor: a code change that neither fixes a bug nor adds a feature
  • Documentation: documentation changes
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

  • I have reviewed my own diff and added inline comments on lines I want reviewers to focus on or that I am uncertain about

🤖 Generated with Claude Code

The Grafana Helm ingress unconditionally stamped the
kube-system-traefik-forward-auth-main / -add-forwarded-headers middleware
annotation. Those middlewares are only created by the clusters step when a
site sets use_traefik_forward_auth: true, so on workloads that don't enable
forward-auth the ingress referenced a non-existent middleware and Traefik
invalidated the router (HTTP 404).

Only add the annotation when the main site opts into forward-auth; otherwise
Grafana routes straight through to its local-account login.

Regression from the eks/cluster Python->Go migration (ae26379), which folded
the previously-unconditional global traefik-forward-auth middleware into
per-site, flag-gated logic.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant