
Migrate int and stg e2e jobs to slot-manager #80467

Draft
roivaz wants to merge 3 commits into openshift:main from roivaz:migrate-int-and-stg-to-slot-manager


roivaz commented Jun 12, 2026

  • INT now uses a dedicated INT-only service principal
  • STG now uses a dedicated STG-only service principal and a new subscription in the RG tenant
  • Cleanup jobs are also updated to use the new service principals and subscriptions

Summary by CodeRabbit

This PR migrates the ARO-HCP integration (INT) and staging (STG) end-to-end test jobs from the shared aro-hcp-e2e workflow to the new aro-hcp-persistent-e2e workflow. The new workflow is backed by slot-manager, which provides dedicated infrastructure resources and dedicated service principals.

Key changes:

  1. CI Job Configuration Updates — All parallel E2E test jobs for INT and STG environments across multiple CI configuration files (Azure-ARO-HCP-main.yaml, Azure-ARO-HCP-main__e2e.yaml, Azure-ARO-HCP-main__periodic.yaml) have been updated to:

    • Change vault secret profiles from generic int/stg to dedicated profiles int-rh/stg-rh
    • Switch the workflow from aro-hcp-e2e to aro-hcp-persistent-e2e
    • Remove the old leases block configurations (now managed by slot-manager)
  2. Cleanup Jobs — Periodic cleanup jobs (delete-expired-integration-resource-groups, delete-expired-stage-resource-groups) have been updated in Azure-ARO-HCP-main__periodic-cleanup.yaml to specify dedicated CUSTOMER_SUBSCRIPTION values and use the new int-rh/stg-rh vault secret profiles.

  3. New Persistent E2E Workflow — Added a new workflow definition (aro-hcp-persistent-e2e-workflow.yaml) that orchestrates E2E tests against pre-deployed/persistent environments, coordinating lease acquisition and release with slot-manager for subscription and container identity selection.

  4. Test Script Updates — Modified the persistent test execution script to read slot-manager allocations from aro-hcp-slot.env (when available) to determine the CUSTOMER_SUBSCRIPTION and deployment location, enabling dynamic subscription assignment per test execution.

  5. Boskos Slot Resources — Extended the Boskos configuration generator to define new quota-based slot resource types for INT and STG E2E work (aro-hcp-int-shard0-slot and aro-hcp-stg-shard0-slot), enabling slot-manager to allocate dedicated subscriptions to concurrent test jobs.

The migration enables INT and STG e2e jobs to run against dedicated infrastructure resources with isolated service principals, improving test reliability and resource isolation compared to the previous shared infrastructure model.
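
One way to double-check the profile migration described above is to scan the job configs for entries that still use the generic profiles. The sketch below is only an illustration: `list_unmigrated_profiles` is a hypothetical helper that is not part of this PR, and it assumes each profile is written on its own line as an unquoted `VAULT_SECRET_PROFILE: <name>` value.

```shell
#!/usr/bin/env bash
set -u

# list_unmigrated_profiles FILE...
# Hypothetical helper (not part of the PR): prints "file:line:text" for every
# VAULT_SECRET_PROFILE entry that is still set to the plain "int" or "stg" profile.
list_unmigrated_profiles() {
  # -n adds line numbers, -H forces the file name, -E enables the (int|stg) alternation.
  # The pattern is anchored at both ends, so "int-rh" and "stg-rh" do not match.
  # grep exits 1 when nothing matches; "|| true" treats that as a normal, empty result.
  grep -nHE '^[[:space:]]*VAULT_SECRET_PROFILE:[[:space:]]*(int|stg)[[:space:]]*$' "$@" || true
}
```

Running it as `list_unmigrated_profiles ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main*.yaml` and getting no output would suggest that no job in those files still references the generic `int` or `stg` profile. Quoted values or differently formatted YAML would need a looser pattern.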

openshift-ci Bot added the do-not-merge/work-in-progress label on Jun 12, 2026. Label description: Indicates that a PR should not merge because it is a work in progress.

openshift-ci Bot commented Jun 12, 2026


Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


coderabbitai Bot commented Jun 12, 2026


Walkthrough

This PR introduces a new persistent E2E workflow execution pattern for ARO-HCP that replaces ephemeral resource leasing with slot-manager coordination. It adds workflow infrastructure, updates Azure authentication to support slot-based environments, configures new Boskos quota resources, and migrates CI jobs across multiple config files to use the new pattern.

Changes

Persistent E2E workflow infrastructure and CI job migration

| Layer | File(s) | Summary |
|---|---|---|
| Persistent E2E workflow definition and ownership | ci-operator/step-registry/aro-hcp/persistent-e2e/aro-hcp-persistent-e2e-workflow.yaml, ci-operator/step-registry/aro-hcp/persistent-e2e/aro-hcp-persistent-e2e-workflow.metadata.json, ci-operator/step-registry/aro-hcp/persistent-e2e/OWNERS | Introduces the persistent E2E workflow YAML with pre/test/post step references and best-effort post handling, adds workflow metadata declaring ownership, and sets approver/reviewer groups in the OWNERS file. |
| Azure login script updates for slot-manager coordination | ci-operator/step-registry/aro-hcp/test/persistent/aro-hcp-test-persistent-commands.sh | Updates the Azure login script to conditionally source slot environment variables from aro-hcp-slot.env when present and use CUSTOMER_SUBSCRIPTION instead of the previous SUBSCRIPTION_ID approach for subscription selection. |
| Boskos quota configuration for slot types | core-services/prow/02_config/generate-boskos.py, core-services/prow/02_config/_boskos.yaml | Extends the Boskos resource generator to define aro-hcp-int-shard0-slot and aro-hcp-stg-shard0-slot quota types and generates the corresponding resource entries in the configuration file. |
| CI job migrations to persistent-e2e workflow | ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main.yaml, ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__e2e.yaml, ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic.yaml, ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic-cleanup.yaml | Migrates integration and stage E2E job definitions to the aro-hcp-persistent-e2e workflow, updates VAULT_SECRET_PROFILE to -rh variants across all config files, and removes ephemeral lease configurations from affected jobs. |

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

lgtm, rehearsals-ack


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| No-Sensitive-Data-In-Logs | ❌ Error | Persistent script uses set -o xtrace and runs az login ... -p "${AZURE_CLIENT_SECRET}", so the client secret (customer data) can be printed to logs. | Remove/disable xtrace when handling secrets (e.g., set +x before az login, set -x after) and avoid commands that echo/print secret or subscription values. |
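
The resolution for the failed check can be sketched as follows. This is a minimal illustration under stated assumptions, not the script from the PR: `without_xtrace` and `do_login` are hypothetical names, and `do_login` is a stand-in that only prints the secret's length so the sketch runs without the Azure CLI. The secret is read inside the callee rather than passed as an argument, because the shell traces an expanded call line before the called function gets a chance to turn tracing off.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical stand-in for the real login step so the sketch runs without the Azure CLI.
# In the actual script this body would hold the az login call that expands
# "${AZURE_CLIENT_SECRET}".
do_login() {
  printf 'logging in with a %s-character secret\n' "${#AZURE_CLIENT_SECRET}"
}

# without_xtrace FUNCTION [ARGS...]
# Runs FUNCTION with xtrace disabled, then restores the caller's xtrace setting.
# Pass only non-secret arguments: this call line is itself traced before the
# function body runs, so secrets must be read inside the callee.
without_xtrace() {
  local was_on=0 rc=0
  case "$-" in *x*) was_on=1 ;; esac   # remember whether xtrace was enabled
  set +x
  "$@" || rc=$?                        # keep the callee's exit status
  if [ "${was_on}" -eq 1 ]; then set -x; fi
  return "${rc}"
}
```

With `set -x` active, `without_xtrace do_login` traces the wrapper call and its bookkeeping lines but nothing from inside `do_login`, and tracing resumes afterwards only if it was on before.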
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Migrate int and stg e2e jobs to slot-manager' directly and accurately describes the primary change across all modified files: migrating integration and stage E2E jobs to use slot-manager with new service principals and vault profiles. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR changes only Azure CI YAML and bash scripts; no Ginkgo test titles (It/Describe/Context/When) exist in repo/modified files to flag for dynamic content. |
| Test Structure And Quality | ✅ Passed | PR #80467 changes CI YAML, workflow metadata, and scripts (no Go/Ginkgo test files). Therefore Ginkgo test structure/quality requirements are not applicable here. |
| Microshift Test Compatibility | ✅ Passed | The PR diff adds/modifies only CI YAML/OWNERS/workflow/scripts/Boskos generator (no .go or ginkgo/Describe content), so no new Ginkgo e2e tests were introduced to assess for MicroShift compatibility. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR #80467 changes only CI/workflow YAML plus shell/Python helpers (no *_test.go or ginkgo/Describe/It additions), so no SNO assumptions to evaluate. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Verified PR’s changed CI/operator YAMLs, step-registry workflow/scripts, and Boskos config for scheduling keywords (nodeSelector/affinity/anti-affinity/topologySpread/PDB/control-plane labels) and... |
| Ote Binary Stdout Contract | ✅ Passed | PR #80467 only modifies CI YAML/JSON/shell/python files (no OTE/openshift-tests Go main/init/TestMain code), so no stdout JSON contract violations introduced. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | Searched the PR commit’s changed files: no Go files under e2e/ginkgo paths were added/modified; only CI configs and tooling tests changed, so no IPv6/disconnected Ginkgo test assumptions to flag. |
| No-Weak-Crypto | ✅ Passed | No weak cryptography patterns (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB) or non-constant-time secret comparisons detected in the 10 modified files. Changes consist of CI/CD configuration, workflow... |
| Container-Privileges | ✅ Passed | Scanned the PR-mentioned Azure CI YAMLs and aro-hcp persistent e2e workflow/scripts for privileged:true, hostPID/hostNetwork/hostIPC, SYS_ADMIN, allowPrivilegeEscalation, securityContext; none found. |


openshift-ci Bot commented Jun 12, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: roivaz
Once this PR has been reviewed and has the lgtm label, please assign jmguzik for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot


[REHEARSALNOTIFIER]
@roivaz: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
|---|---|---|---|
| pull-ci-Azure-ARO-HCP-main-integration-e2e-parallel | Azure/ARO-HCP | presubmit | Ci-operator config changed |
| pull-ci-Azure-ARO-HCP-main-integration-e2e-parallel-ocp-fast | Azure/ARO-HCP | presubmit | Ci-operator config changed |
| pull-ci-Azure-ARO-HCP-main-integration-e2e-parallel-ocp-nightly | Azure/ARO-HCP | presubmit | Ci-operator config changed |
| pull-ci-Azure-ARO-HCP-main-integration-e2e-parallel-ocp-stable | Azure/ARO-HCP | presubmit | Ci-operator config changed |
| pull-ci-Azure-ARO-HCP-main-stage-e2e-parallel | Azure/ARO-HCP | presubmit | Ci-operator config changed |
| pull-ci-Azure-ARO-HCP-main-stage-e2e-parallel-ocp-fast | Azure/ARO-HCP | presubmit | Ci-operator config changed |
| pull-ci-Azure-ARO-HCP-main-stage-e2e-parallel-ocp-nightly | Azure/ARO-HCP | presubmit | Ci-operator config changed |
| pull-ci-Azure-ARO-HCP-main-stage-e2e-parallel-ocp-stable | Azure/ARO-HCP | presubmit | Ci-operator config changed |
| pull-ci-Azure-ARO-HCP-main-prod-e2e-parallel | Azure/ARO-HCP | presubmit | Registry content changed |
| pull-ci-Azure-ARO-HCP-main-prod-e2e-parallel-ocp-stable | Azure/ARO-HCP | presubmit | Registry content changed |
| pull-ci-Azure-ARO-HCP-main-prod-e2e-parallel-ocp-fast | Azure/ARO-HCP | presubmit | Registry content changed |
| pull-ci-Azure-ARO-HCP-main-prod-e2e-parallel-ocp-nightly | Azure/ARO-HCP | presubmit | Registry content changed |
| periodic-ci-Azure-ARO-HCP-main-periodic-cleanup-delete-expired-integration-resource-groups | N/A | periodic | Ci-operator config changed |
| periodic-ci-Azure-ARO-HCP-main-periodic-cleanup-delete-expired-stage-resource-groups | N/A | periodic | Ci-operator config changed |
| periodic-ci-Azure-ARO-HCP-main-periodic-integration-e2e-parallel | N/A | periodic | Ci-operator config changed |
| periodic-ci-Azure-ARO-HCP-main-periodic-stage-e2e-parallel | N/A | periodic | Ci-operator config changed |
| periodic-ci-Azure-ARO-HCP-main-periodic-prod-e2e-parallel-ocp-nightly | N/A | periodic | Registry content changed |
| periodic-ci-Azure-ARO-HCP-main-periodic-prod-e2e-parallel | N/A | periodic | Registry content changed |
| periodic-ci-Azure-ARO-HCP-main-periodic-stage-e2e-parallel-ocp-nightly | N/A | periodic | Ci-operator config changed |

Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main.yaml`:
- Around line 239-240: VAULT_SECRET_PROFILE was changed to an int-rh value
without updating the step-level credential profile contracts, so runtime will
not find the mounted credentials; update the step credential declarations used
by the aro-hcp jobs to include the new "-rh" profile names (or revert
VAULT_SECRET_PROFILE to the original profile) so the step
contract/path-derivation logic exposes/mounts the declared profile;
specifically, ensure the steps that reference VAULT_SECRET_PROFILE (the
aro-hcp-persistent-e2e workflow and its related e2e/periodic/periodic-cleanup
job step definitions) declare and mount the "int-rh" (and any other migrated
"*-rh") profiles in their credentials/profile contract blocks to match the env
value.

In
`@ci-operator/step-registry/aro-hcp/test/persistent/aro-hcp-test-persistent-commands.sh`:
- Around line 14-20: When sourcing env_file="${SHARED_DIR}/aro-hcp-slot.env",
preserve the existing fallback behavior for CUSTOMER_SUBSCRIPTION so the script
won't fail under set -u if the env file doesn't set that variable: after
sourcing (inside the branch where env_file exists) export CUSTOMER_SUBSCRIPTION
using the current CUSTOMER_SUBSCRIPTION if present, otherwise read the
subscription from the cluster profile file (the same file used in the else
branch, referenced by CLUSTER_PROFILE_DIR/subscription-name); update the logic
in aro-hcp-test-persistent-commands.sh to perform this conditional export so
both the env file and the cluster-profile fallback are honored.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 87e3db46-4ec4-4fe5-b556-1b24529ce291

📥 Commits

Reviewing files that changed from the base of the PR and between 684bae3 and 343dc44.

📒 Files selected for processing (10)
  • ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main.yaml
  • ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__e2e.yaml
  • ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic-cleanup.yaml
  • ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic.yaml
  • ci-operator/step-registry/aro-hcp/persistent-e2e/OWNERS
  • ci-operator/step-registry/aro-hcp/persistent-e2e/aro-hcp-persistent-e2e-workflow.metadata.json
  • ci-operator/step-registry/aro-hcp/persistent-e2e/aro-hcp-persistent-e2e-workflow.yaml
  • ci-operator/step-registry/aro-hcp/test/persistent/aro-hcp-test-persistent-commands.sh
  • core-services/prow/02_config/_boskos.yaml
  • core-services/prow/02_config/generate-boskos.py

Comment on lines +239 to +240
VAULT_SECRET_PROFILE: int-rh
workflow: aro-hcp-persistent-e2e


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Root cause: VAULT_SECRET_PROFILE values were migrated to *-rh without matching step-level credential profile contract updates across all affected configs.
Affected files: ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main.yaml, ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__e2e.yaml, ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic.yaml, and ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic-cleanup.yaml.
All these jobs now rely on profile names that the referenced step contracts (and path derivation logic) do not currently show as mounted/declared, which can cause deterministic runtime auth/bootstrap failures.


Comment on lines +14 to +20
env_file="${SHARED_DIR}/aro-hcp-slot.env"
if [[ -f "${env_file}" ]]; then
    # shellcheck disable=SC1090
    source "${env_file}"
    export LOCATION="${SELECTED_LOCATION:-${LOCATION:-}}"
else
    export CUSTOMER_SUBSCRIPTION; CUSTOMER_SUBSCRIPTION=$(cat "${CLUSTER_PROFILE_DIR}/subscription-name")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the cluster-profile fallback for CUSTOMER_SUBSCRIPTION.

If Line 15 finds ${SHARED_DIR}/aro-hcp-slot.env but that file does not export CUSTOMER_SUBSCRIPTION, Line 24 aborts under set -u. The new branch makes the env file optional for discovery, but mandatory for this value.

Proposed fix
 env_file="${SHARED_DIR}/aro-hcp-slot.env"
 if [[ -f "${env_file}" ]]; then
     # shellcheck disable=SC1090
     source "${env_file}"
+    export CUSTOMER_SUBSCRIPTION="${CUSTOMER_SUBSCRIPTION:-$(< "${CLUSTER_PROFILE_DIR}/subscription-name")}"
     export LOCATION="${SELECTED_LOCATION:-${LOCATION:-}}"
 else
-    export CUSTOMER_SUBSCRIPTION; CUSTOMER_SUBSCRIPTION=$(cat "${CLUSTER_PROFILE_DIR}/subscription-name")
+    export CUSTOMER_SUBSCRIPTION="$(< "${CLUSTER_PROFILE_DIR}/subscription-name")"
 fi
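
The fallback has three cases to cover: no slot env file, an env file that omits CUSTOMER_SUBSCRIPTION, and an env file that sets it. The sketch below isolates that resolution logic so the cases can be checked against temporary directories. `resolve_subscription` is a hypothetical helper introduced only for illustration; the real script exports the variable inline, and the file names are the ones quoted in the comment above.

```shell
#!/usr/bin/env bash
set -u

# resolve_subscription SHARED_DIR CLUSTER_PROFILE_DIR
# Hypothetical helper: prints the subscription the step would use.
# The body runs in a subshell (note the parentheses), so sourcing the env file
# does not leak variables into the caller.
resolve_subscription() (
  shared_dir="$1"
  profile_dir="$2"
  unset CUSTOMER_SUBSCRIPTION          # ignore any value inherited from the caller
  env_file="${shared_dir}/aro-hcp-slot.env"
  if [ -f "${env_file}" ]; then
    # shellcheck disable=SC1090
    . "${env_file}"
  fi
  # Prefer the slot-manager value; fall back to the cluster profile file.
  # The ":-" expansion is safe under "set -u" even when the variable is unset.
  printf '%s\n' "${CUSTOMER_SUBSCRIPTION:-$(cat "${profile_dir}/subscription-name")}"
)
```

Calling `resolve_subscription "${SHARED_DIR}" "${CLUSTER_PROFILE_DIR}"` prints the env file's value when one is set and the cluster profile's subscription-name otherwise, which mirrors what the proposed `${CUSTOMER_SUBSCRIPTION:-...}` export would do.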


Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
