Migrate int and stg e2e jobs to slot-manager #80467
Skipping CI for Draft Pull Request.
Walkthrough
This PR introduces a new persistent E2E workflow execution pattern for ARO-HCP that replaces ephemeral resource leasing with slot-manager coordination. It adds workflow infrastructure, updates Azure authentication to support slot-based environments, configures new Boskos quota resources, and migrates CI jobs across multiple config files to use the new pattern.
Changes
Persistent E2E workflow infrastructure and CI job migration
🎯 3 (Moderate) | ⏱️ ~25 minutes
Important
Pre-merge checks failed
Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error)
✅ Passed checks (14 passed)
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: roivaz
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
[REHEARSALNOTIFIER]
Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.
Interacting with pj-rehearse
Comment:
Once you are satisfied with the results of the rehearsals, comment:
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main.yaml`:
- Around line 239-240: VAULT_SECRET_PROFILE was changed to an int-rh value
without updating the step-level credential profile contracts, so runtime will
not find the mounted credentials; update the step credential declarations used
by the aro-hcp jobs to include the new "-rh" profile names (or revert
VAULT_SECRET_PROFILE to the original profile) so the step
contract/path-derivation logic exposes/mounts the declared profile;
specifically, ensure the steps that reference VAULT_SECRET_PROFILE (the
aro-hcp-persistent-e2e workflow and its related e2e/periodic/periodic-cleanup
job step definitions) declare and mount the "int-rh" (and any other migrated
"*-rh") profiles in their credentials/profile contract blocks to match the env
value.
In
`@ci-operator/step-registry/aro-hcp/test/persistent/aro-hcp-test-persistent-commands.sh`:
- Around line 14-20: When sourcing env_file="${SHARED_DIR}/aro-hcp-slot.env",
preserve the existing fallback behavior for CUSTOMER_SUBSCRIPTION so the script
won't fail under set -u if the env file doesn't set that variable: after
sourcing (inside the branch where env_file exists) export CUSTOMER_SUBSCRIPTION
using the current CUSTOMER_SUBSCRIPTION if present, otherwise read the
subscription from the cluster profile file (the same file used in the else
branch, referenced by CLUSTER_PROFILE_DIR/subscription-name); update the logic
in aro-hcp-test-persistent-commands.sh to perform this conditional export so
both the env file and the cluster-profile fallback are honored.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 87e3db46-4ec4-4fe5-b556-1b24529ce291
📒 Files selected for processing (10)
- `ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main.yaml`
- `ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__e2e.yaml`
- `ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic-cleanup.yaml`
- `ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic.yaml`
- `ci-operator/step-registry/aro-hcp/persistent-e2e/OWNERS`
- `ci-operator/step-registry/aro-hcp/persistent-e2e/aro-hcp-persistent-e2e-workflow.metadata.json`
- `ci-operator/step-registry/aro-hcp/persistent-e2e/aro-hcp-persistent-e2e-workflow.yaml`
- `ci-operator/step-registry/aro-hcp/test/persistent/aro-hcp-test-persistent-commands.sh`
- `core-services/prow/02_config/_boskos.yaml`
- `core-services/prow/02_config/generate-boskos.py`
VAULT_SECRET_PROFILE: int-rh
workflow: aro-hcp-persistent-e2e
Root cause: VAULT_SECRET_PROFILE values were migrated to *-rh without matching step-level credential profile contract updates across all affected configs.
Affected files: ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main.yaml, ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__e2e.yaml, ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic.yaml, and ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic-cleanup.yaml.
All these jobs now rely on profile names that the referenced step contracts (and path derivation logic) do not currently show as mounted/declared, which can cause deterministic runtime auth/bootstrap failures.
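The failure mode described above can be made visible early with a preflight check. The sketch below is only an illustration: it assumes, hypothetically, that each declared credential profile is mounted as a directory under a common root. The names `cred_root` and `check_profile` are made up for this example and do not come from the PR or from the step registry.

```shell
set -eu

# Hypothetical layout: one directory per declared credential profile.
# Only the pre-migration profiles "int" and "stg" are declared here.
cred_root="$(mktemp -d)"
mkdir -p "${cred_root}/int" "${cred_root}/stg"

# check_profile ROOT PROFILE
# Returns 0 if ROOT/PROFILE exists as a directory, 1 otherwise.
check_profile() {
  if [ -d "${1}/${2}" ]; then
    echo "profile '${2}' is mounted"
  else
    echo "profile '${2}' is NOT mounted under ${1}" >&2
    return 1
  fi
}

# The old profile resolves; the migrated "-rh" profile does not,
# because nothing declared (and therefore mounted) it.
old_status=0
check_profile "${cred_root}" "int" || old_status=$?
new_status=0
check_profile "${cred_root}" "int-rh" || new_status=$?

echo "int=${old_status} int-rh=${new_status}"
rm -rf "${cred_root}"
```

Running a check of this kind at the start of a step would turn a late authentication failure into an immediate, clearly attributed error.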
env_file="${SHARED_DIR}/aro-hcp-slot.env"
if [[ -f "${env_file}" ]]; then
  # shellcheck disable=SC1090
  source "${env_file}"
  export LOCATION="${SELECTED_LOCATION:-${LOCATION:-}}"
else
  export CUSTOMER_SUBSCRIPTION; CUSTOMER_SUBSCRIPTION=$(cat "${CLUSTER_PROFILE_DIR}/subscription-name")
Preserve the cluster-profile fallback for CUSTOMER_SUBSCRIPTION.
If Line 15 finds ${SHARED_DIR}/aro-hcp-slot.env but that file does not export CUSTOMER_SUBSCRIPTION, Line 24 aborts under set -u. The new branch makes the env file optional for discovery, but mandatory for this value.
Proposed fix
 env_file="${SHARED_DIR}/aro-hcp-slot.env"
 if [[ -f "${env_file}" ]]; then
   # shellcheck disable=SC1090
   source "${env_file}"
+  export CUSTOMER_SUBSCRIPTION="${CUSTOMER_SUBSCRIPTION:-$(< "${CLUSTER_PROFILE_DIR}/subscription-name")}"
   export LOCATION="${SELECTED_LOCATION:-${LOCATION:-}}"
 else
-  export CUSTOMER_SUBSCRIPTION; CUSTOMER_SUBSCRIPTION=$(cat "${CLUSTER_PROFILE_DIR}/subscription-name")
+  export CUSTOMER_SUBSCRIPTION="$(< "${CLUSTER_PROFILE_DIR}/subscription-name")"
 fi
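To see why the fallback matters under `set -u`, the following self-contained sketch reproduces both cases in scratch directories. It is an illustration under stated assumptions, not the script from the PR: the directory contents, the sample values, and the helper `resolve_subscription` are hypothetical. It also uses the POSIX forms `.` and `$(cat ...)` in place of `source` and `$(< ...)` so that it runs in plain `sh` as well as bash.

```shell
set -eu

# Scratch directories standing in for SHARED_DIR and CLUSTER_PROFILE_DIR.
workdir="$(mktemp -d)"
SHARED_DIR="${workdir}/shared"
CLUSTER_PROFILE_DIR="${workdir}/profile"
mkdir -p "${SHARED_DIR}" "${CLUSTER_PROFILE_DIR}"
echo "fallback-subscription" > "${CLUSTER_PROFILE_DIR}/subscription-name"

# resolve_subscription prints the subscription a step would end up using.
# It runs in a subshell so sourced variables do not leak between cases.
resolve_subscription() {
  (
    unset CUSTOMER_SUBSCRIPTION
    env_file="${SHARED_DIR}/aro-hcp-slot.env"
    if [ -f "${env_file}" ]; then
      # shellcheck disable=SC1090
      . "${env_file}"
    fi
    # ${VAR:-default} does not trip `set -u` when VAR is unset.
    printf '%s\n' "${CUSTOMER_SUBSCRIPTION:-$(cat "${CLUSTER_PROFILE_DIR}/subscription-name")}"
  )
}

# Case 1: the env file exists but does not set CUSTOMER_SUBSCRIPTION.
echo 'SELECTED_LOCATION=westus3' > "${SHARED_DIR}/aro-hcp-slot.env"
case1="$(resolve_subscription)"

# Case 2: the env file sets CUSTOMER_SUBSCRIPTION, which takes precedence.
echo 'CUSTOMER_SUBSCRIPTION=slot-subscription' >> "${SHARED_DIR}/aro-hcp-slot.env"
case2="$(resolve_subscription)"

echo "case1=${case1} case2=${case2}"
rm -rf "${workdir}"
```

A bare `"${CUSTOMER_SUBSCRIPTION}"` reference in case 1 would abort the script with an unbound-variable error, which is the failure the review comment describes.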
Summary by CodeRabbit
This PR migrates the ARO-HCP integration (INT) and staging (STG) end-to-end test jobs from the shared `aro-hcp-e2e` workflow to the new `aro-hcp-persistent-e2e` workflow backed by slot-manager, which provides dedicated infrastructure resources with dedicated service principals.

Key changes:

- CI Job Configuration Updates: All parallel E2E test jobs for INT and STG environments across multiple CI configuration files (`Azure-ARO-HCP-main.yaml`, `Azure-ARO-HCP-main__e2e.yaml`, `Azure-ARO-HCP-main__periodic.yaml`) have been updated to:
  - Switch the vault secret profile from `int`/`stg` to dedicated profiles `int-rh`/`stg-rh`
  - Switch the workflow from `aro-hcp-e2e` to `aro-hcp-persistent-e2e`
  - Remove `leases` block configurations (now managed by slot-manager)
- Cleanup Jobs: Periodic cleanup jobs (`delete-expired-integration-resource-groups`, `delete-expired-stage-resource-groups`) have been updated in `Azure-ARO-HCP-main__periodic-cleanup.yaml` to specify dedicated `CUSTOMER_SUBSCRIPTION` values and use the new `int-rh`/`stg-rh` vault secret profiles.
- New Persistent E2E Workflow: Added a new workflow definition (`aro-hcp-persistent-e2e-workflow.yaml`) that orchestrates E2E tests against pre-deployed/persistent environments, coordinating lease acquisition and release with slot-manager for subscription and container identity selection.
- Test Script Updates: Modified the persistent test execution script to read slot-manager allocations from `aro-hcp-slot.env` (when available) to determine the `CUSTOMER_SUBSCRIPTION` and deployment location, enabling dynamic subscription assignment per test execution.
- Boskos Slot Resources: Extended the Boskos configuration generator to define new quota-based slot resource types for INT and STG E2E work (`aro-hcp-int-shard0-slot` and `aro-hcp-stg-shard0-slot`), enabling slot-manager to allocate dedicated subscriptions to concurrent test jobs.

The migration enables INT and STG e2e jobs to run against dedicated infrastructure resources with isolated service principals, improving test reliability and resource isolation compared to the previous shared infrastructure model.
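The per-job changes listed in the summary lend themselves to a mechanical check. The sketch below is a rough illustration only: the job fragment is a hypothetical, heavily trimmed stand-in for a real ci-operator config (the job name `example-int-e2e` is invented), and `is_migrated` is a made-up helper that greps for the three properties named in the summary.

```shell
set -eu

cfg="$(mktemp)"
# Hypothetical, trimmed job fragment; real configs contain many more fields.
cat > "${cfg}" <<'EOF'
tests:
- as: example-int-e2e
  steps:
    env:
      VAULT_SECRET_PROFILE: int-rh
    workflow: aro-hcp-persistent-e2e
EOF

# is_migrated FILE
# Succeeds only if FILE uses the new workflow, a "-rh" profile,
# and no longer carries a leases block.
is_migrated() {
  grep -q 'workflow: aro-hcp-persistent-e2e' "$1" &&
    grep -Eq 'VAULT_SECRET_PROFILE: (int|stg)-rh' "$1" &&
    ! grep -q 'leases:' "$1"
}

if is_migrated "${cfg}"; then clean="migrated"; else clean="not-migrated"; fi

# A leftover leases block should flip the result.
printf '    leases:\n' >> "${cfg}"
if is_migrated "${cfg}"; then stale="migrated"; else stale="not-migrated"; fi

echo "clean=${clean} stale=${stale}"
rm -f "${cfg}"
```

A text-level grep like this is deliberately crude; a YAML-aware check would be more robust, but the grep version is enough to catch a job that was only partially converted.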