[kube-prometheus-stack] Alertmanager PVC stuck Pending on GKE since v82+ — CSI rejects operator-injected empty selector #6845

Description

@bmailhe

Describe the bug

After upgrading kube-prometheus-stack from 81.5.0 to 83.4.2, any Alertmanager with alertmanager.alertmanagerSpec.storage.volumeClaimTemplate configured fails to provision on GKE clusters. The PVC is rejected by the GCE PD CSI driver with:

ProvisioningFailed  pd.csi.storage.gke.io  claim Selector is not supported

The resulting Alertmanager pod is stuck Pending with FailedScheduling: running PreBind plugin "VolumeBinding": binding volumes: context deadline exceeded.

Downgrading back to 81.5.0 fixes it. The Prometheus volumeClaimTemplate appears unaffected in our setup; only Alertmanager hits this.

Workaround

Pin the chart to 81.5.0.

What's your helm version?

version.BuildInfo{Version:"v3.20.0", GitCommit:"b2e4314fa0f229a1de7b4c981273f61d69ee5a59", GitTreeState:"clean", GoVersion:"go1.25.6"}

What's your kubectl version?

Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.34.4-gke.1047000

Which chart?

kube-prometheus-stack

What's the chart version?

83.4.2

What happened?

The PersistentVolumeClaim materialized from the Alertmanager StatefulSet (STS) volumeClaimTemplates has spec.selector: {}, an empty but non-nil selector. GKE's PD CSI external-provisioner rejects any PVC that has a selector, including an empty one.
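To make the "empty but non-nil" distinction concrete, here is a minimal Python sketch that classifies spec.selector on a PVC object. The function name selector_state and the two sample fragments are hypothetical and were not captured from the affected cluster; they only illustrate that a missing field and {} are different states.

```python
import json


def selector_state(pvc: dict) -> str:
    """Classify spec.selector of a PVC object as 'absent', 'empty', or 'set'."""
    spec = pvc.get("spec", {})
    if "selector" not in spec or spec["selector"] is None:
        return "absent"
    # An empty mapping is still a non-nil selector as far as the API object goes.
    return "set" if spec["selector"] else "empty"


# Hypothetical PVC fragments for illustration only.
without_selector = json.loads('{"spec": {"storageClassName": "standard-rwo"}}')
with_empty_selector = json.loads(
    '{"spec": {"storageClassName": "standard-rwo", "selector": {}}}'
)

print(selector_state(without_selector))
print(selector_state(with_empty_selector))
```

The same function could be applied to the JSON printed by kubectl get pvc with -o json, assuming the output is loaded with json.load first.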

Evidence on a running cluster (chart 83.4.2, operator v0.90.1):

  • The Helm-rendered Alertmanager CR has no selector field
  • kubectl get --raw on the STS shows no selector field in spec.volumeClaimTemplates[0].spec
  • kubectl get --raw on the resulting PVC shows "selector": {}
  • managedFields show that kube-controller-manager owns f:selector: {} (the STS controller materializes it during PVC creation from the template)
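The managedFields check in the last bullet can be sketched in Python as follows. This is an illustrative helper, not the exact procedure used above: the name selector_owners and the trimmed sample object are hypothetical, and the sample only mirrors the shape of a managedFields entry (manager plus a fieldsV1 tree keyed by f:spec).

```python
from typing import List


def selector_owners(pvc: dict) -> List[str]:
    """Return the field managers whose managedFields entry claims spec.selector."""
    owners = []
    for entry in pvc.get("metadata", {}).get("managedFields", []):
        spec_fields = entry.get("fieldsV1", {}).get("f:spec", {})
        if "f:selector" in spec_fields:
            owners.append(entry.get("manager", "<unknown>"))
    return owners


# Hypothetical, heavily trimmed PVC object for illustration only.
sample_pvc = {
    "metadata": {
        "managedFields": [
            {
                "manager": "kube-controller-manager",
                "operation": "Update",
                "fieldsV1": {"f:spec": {"f:accessModes": {}, "f:selector": {}}},
            },
            {
                "manager": "kube-scheduler",
                "operation": "Update",
                "fieldsV1": {"f:metadata": {"f:annotations": {}}},
            },
        ]
    },
    "spec": {"selector": {}},
}

print(selector_owners(sample_pvc))
```

Note that kubectl get with -o json omits managedFields unless --show-managed-fields is passed, so the input for a real check would need that flag or the raw API response.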

What you expected to happen?

The PVC should be created without a selector field (matching the Alertmanager CR and STS template), so the CSI provisioner accepts it.

How to reproduce it?

  1. Deploy kube-prometheus-stack 83.4.2 on a GKE cluster (any PD-backed storage class: standard-rwo, premium-rwo, etc. — all funnel through pd.csi.storage.gke.io via CSI Migration).
  2. Configure alertmanager with persistence:
    alertmanager:
      alertmanagerSpec:
        storage:
          volumeClaimTemplate:
            spec:
              storageClassName: standard-rwo
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 5Gi
  3. PVC stays Pending with claim Selector is not supported; pod stays Pending.

Enter the changed values of values.yaml?

  alertmanager:
    alertmanagerSpec:
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: standard-rwo
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 5Gi

Enter the command that you execute and failing/misfunctioning.

helm secrets upgrade -i prometheus-operator . -n monitoring --create-namespace -f myvalues.yaml

Anything else we need to know?

No response

Labels: bug
