AKS: system node pool can't be K8s-version-bumped via Pulumi due to agentPoolProfiles ignoreChanges

## Problem

When running a K8s version upgrade on an AKS workload cluster via `ptd ensure --only-steps kubernetes --auto-apply`, **the system node pool (`agentpool`) is not upgraded**. Pulumi rolls the control plane and the user node pool(s), but leaves the system pool on the previous version. The operator has to run a manual `az aks nodepool upgrade --name agentpool --kubernetes-version <target>` after every hop.

This was discovered while running the `1.32.6 → 1.35.4` multi-hop upgrade on `internal-az-staging` on 2026-05-14/15 (see [posit-dev/ptd-workspace#15](https://github.com/posit-dev/ptd-workspace/pull/15) for the skill update that documents the workaround, and [rstudio/ptd-config#2899](https://github.com/rstudio/ptd-config/pull/2899) for the upgrade itself).

## Why it happens

The system pool is declared inline in `agentPoolProfiles` on the `ManagedCluster` resource ([`lib/steps/aks.go:129-166`](https://github.com/posit-dev/ptd/blob/main/lib/steps/aks.go)). User pools, by contrast, are managed as separate `azure-native:containerservice:AgentPool` Pulumi resources.

When user pools exist, [`lib/steps/aks.go:184-186`](https://github.com/posit-dev/ptd/blob/main/lib/steps/aks.go) adds `agentPoolProfiles` to Pulumi's `ignoreChanges` on the `ManagedCluster`:

```go
// Always ignore agentPoolProfiles when using separate AgentPool resources
// Azure reflects separate pools in this array, but we manage them as distinct resources
if len(userNodePools) > 0 {
    ignoreChanges = append(ignoreChanges, "agentPoolProfiles")
}
```

The comment explains why the ignore exists: Azure ARM returns all pools (system + user) in `agentPoolProfiles` even when user pools are managed separately. Without the blanket ignore, Pulumi would see the user pool entries as drift and try to delete them from the array.

The unintended side effect: the blanket ignore also covers the system pool's `orchestratorVersion`. Both `ManagedCluster.kubernetesVersion` (control plane) and `agentPoolProfiles[0].orchestratorVersion` (the system pool entry, set in `aks.go:148`) are driven from `clusterConfig.KubernetesVersion`, but when we bump that value only the control-plane change is applied. The system pool change is ignored.
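To make the masking effect concrete, here is a toy model of the behaviour described above. It is not Pulumi's diff engine: `appliedChanges` and `isIgnored` are hypothetical helpers, the prefix matching is a simplification of how `ignoreChanges` paths are resolved, and the `previous`/`target` values are placeholders for real versions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// isIgnored reports whether path equals, or is nested under, any ignored path.
// Simplified stand-in for Pulumi's property-path matching (assumption).
func isIgnored(path string, ignored []string) bool {
	for _, prefix := range ignored {
		if path == prefix ||
			strings.HasPrefix(path, prefix+".") ||
			strings.HasPrefix(path, prefix+"[") {
			return true
		}
	}
	return false
}

// appliedChanges returns the sorted property paths whose desired value differs
// from the current value and that are not covered by an ignored path.
func appliedChanges(current, desired map[string]string, ignored []string) []string {
	var out []string
	for path, want := range desired {
		if current[path] != want && !isIgnored(path, ignored) {
			out = append(out, path)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	current := map[string]string{
		"kubernetesVersion":                        "previous",
		"agentPoolProfiles[0].orchestratorVersion": "previous",
	}
	desired := map[string]string{
		"kubernetesVersion":                        "target",
		"agentPoolProfiles[0].orchestratorVersion": "target",
	}

	// Blanket ignore, as added when user pools exist: only the control plane changes.
	fmt.Println(appliedChanges(current, desired, []string{"agentPoolProfiles"}))

	// No ignore: the system pool version would be updated too.
	fmt.Println(appliedChanges(current, desired, nil))
}
```

With the blanket ignore in place, the only path left in the model's diff is `kubernetesVersion`, which matches what we observed on the cluster.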

## Proposed fixes (pick one)

1. **Per-field `ignoreChanges`**: replace the blanket `agentPoolProfiles` ignore with per-field paths such as `agentPoolProfiles[*].count`, `agentPoolProfiles[*].powerState`, etc., so Pulumi can still manage `orchestratorVersion`. This is the smallest change. We need to verify whether Pulumi's `ignoreChanges` syntax can target every element except the first (the system pool is index 0, user pools follow); if it cannot, rely on per-field paths rather than per-index ones.
2. **Model system pool as a separate `AgentPool` resource** — bigger refactor but uniform with how user pools are modeled. Eliminates the dual-source-of-truth issue entirely.
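A minimal sketch of what option 1 might look like, assuming the `[*]` wildcard path form is accepted by `ignoreChanges` (this still needs to be verified, as noted above). `perFieldIgnoreChanges` is a hypothetical helper, and the field list is only the two examples named in this issue, not a complete set.

```go
package main

import "fmt"

// perFieldIgnoreChanges builds one ignoreChanges path per pool field, applied
// to every entry of agentPoolProfiles. orchestratorVersion is deliberately
// left out so Pulumi keeps managing it.
func perFieldIgnoreChanges(fields []string) []string {
	paths := make([]string, 0, len(fields))
	for _, field := range fields {
		paths = append(paths, fmt.Sprintf("agentPoolProfiles[*].%s", field))
	}
	return paths
}

func main() {
	// Illustrative field list only; the real set has to be worked out
	// against what Azure actually mutates on the pool entries.
	ignoreChanges := perFieldIgnoreChanges([]string{"count", "powerState"})
	fmt.Println(ignoreChanges)
}
```

In `aks.go` the resulting slice would be appended to the existing `ignoreChanges` in place of the bare `"agentPoolProfiles"` entry. Whether this also stops Pulumi from treating the Azure-reflected user pool entries as drift is the part that needs testing.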

## Operator workaround (current state)

The `cluster-upgrade` skill ([posit-dev/ptd-workspace#15](https://github.com/posit-dev/ptd-workspace/pull/15)) now documents the manual step:

```bash
az aks nodepool upgrade \
  --cluster-name <aks-cluster-name> \
  --resource-group <rg> \
  --name agentpool \
  --kubernetes-version <target> \
  --yes --no-wait
```

This adds ~15–25 min of wall-clock time per hop and is easy to forget. It is worth fixing before the next AKS upgrade.

## Out of scope

`max_surge` is also hardcoded to `"10%"` in [`lib/steps/aks.go:160-162`](https://github.com/posit-dev/ptd/blob/main/lib/steps/aks.go) and [`:347-348`](https://github.com/posit-dev/ptd/blob/main/lib/steps/aks.go) — making it configurable would speed up rolls but is a nice-to-have, not a correctness issue.
