OCPBUGS-87841: ci-operator/step-registry/ipi/install/install: Unpack log-bundle archives #80479
wking wants to merge 1 commit into
Conversation
…ives

LeakTK currently redacts the entire archive if it finds concerning content, like [1]:

    $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview/2063201609579499520/artifacts/e2e-aws-ovn-techpreview/ipi-install-install/artifacts/log-bundle-20260606110924.tar
    This file contained potentially sensitive information and has been removed.

If we unpack the archive first, LeakTK will scan and redact at the file level, and not at the whole-archive level. And folks who know which file they want will be able to access that file without pulling down the entire archive.

As a downside for me, I often don't know which file I want, and downloading the archive and unpacking locally was a convenient way to get all the bits for local grep. But I'll survive, and an entirely-redacted archive was helping nobody.

[1]: https://redhat.atlassian.net/browse/OCPBUGS-87841
@wking: This pull request explicitly references no jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Walkthrough
The PR modifies the IPI install step to extract log bundle archives into the shared artifact directory instead of copying the archive files themselves. This changes which artifacts become available downstream of the install step.
Changes
Log Bundle Extraction
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Pre-merge checks failed
Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 1 warning)
✅ Passed checks (13 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/ipi/install/install/ipi-install-install-commands.sh`:
- Line 111: The single tar invocation using the glob
"${dir}"/log-bundle-*.tar.gz fails when the glob matches multiple archives;
replace that single command with logic that iterates over each matching file and
extracts them individually into "${ARTIFACT_DIR}" (i.e., for each file matching
"${dir}"/log-bundle-*.tar.gz, skip if no match, and run tar -xzf on that file
with -C "${ARTIFACT_DIR}" redirecting stderr as before) so each archive is
unpacked separately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 19d08bcc-e4fe-4a40-a0ce-35bf4d061a06
📒 Files selected for processing (1)
ci-operator/step-registry/ipi/install/install/ipi-install-install-commands.sh
  current_time=$(date +%s)
- cp "${dir}"/log-bundle-*.tar.gz "${ARTIFACT_DIR}/" 2>/dev/null
+ tar -xzf "${dir}"/log-bundle-*.tar.gz -C "${ARTIFACT_DIR}/" 2>/dev/null
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="ci-operator/step-registry/ipi/install/install/ipi-install-install-commands.sh"
# Show shell header and around the target line, plus any log-bundle handling
echo "== File header =="
sed -n '1,80p' "$FILE" | nl -ba | sed 's/^/LINE /'
echo
echo "== log-bundle occurrences (with context) =="
rg -n "log-bundle" "$FILE" || true
echo
echo "== Around line 111 =="
sed -n '90,140p' "$FILE" | nl -ba | sed 's/^/LINE /'
Repository: openshift/release
Length of output: 140
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# tar behavior experiment: simulate wildcard expansion into multiple archives
set -euo pipefail
tmp=$(mktemp -d)
echo "tmp=$tmp"
src="$tmp/src"
mkdir -p "$src"
dest="$tmp/dest"
mkdir -p "$dest"
# Create two distinct tar.gz archives
mkdir -p "$src/a" "$src/b"
echo "from a" > "$src/a/file-a.txt"
echo "from b" > "$src/b/file-b.txt"
# Need GNU tar to create .tar.gz
tar -czf "$tmp/log-bundle-1.tar.gz" -C "$src/a" file-a.txt
tar -czf "$tmp/log-bundle-2.tar.gz" -C "$src/b" file-b.txt
echo "== Extract using wildcard-expanded multi-archive argument (matches original pattern) =="
rm -rf "$dest"/*
# This replicates: tar -xzf "${dir}"/log-bundle-*.tar.gz -C "${ARTIFACT_DIR}/"
# where the glob expands to multiple .tar.gz paths.
tar -xzf "$tmp"/log-bundle-*.tar.gz -C "$dest" 2>&1 | head -n 50 || true
echo "Extracted files:"
find "$dest" -maxdepth 2 -type f -print | sed 's/^/ - /'
echo "Contents:"
for f in $(find "$dest" -type f | sort); do
echo "---- $f"
cat "$f"
done
echo
echo "== Extract using loop over each archive (proposed fix) =="
rm -rf "$dest"/*
for bundle in "$tmp"/log-bundle-*.tar.gz; do
tar -xzf "$bundle" -C "$dest" 2>/dev/null
done
echo "Extracted files:"
find "$dest" -maxdepth 2 -type f -print | sed 's/^/ - /'
Repository: openshift/release
Length of output: 477
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "tar version:"
tar --version 2>/dev/null | head -n 5 || true
Repository: openshift/release
Length of output: 344
🌐 Web query:
GNU tar: behavior when extracting with -f and multiple archive filenames on command line treated as members
💡 Result:
In GNU tar, the -f (or --file) option specifies the archive file or device to operate on [1][2][3]. When multiple -f options are given on the command line, GNU tar interprets them as successive volumes of a single multi-volume archive, not as separate archives to extract simultaneously [1][4][5].
When extracting with -f, any non-option arguments on the command line are treated as the names of archive members to extract from that archive [6][7][8][9]. If no member names are given, GNU tar operates on the entire contents of the archive [6][7].
In summary:
- Multiple -f options: these define a sequence of volumes for a single multi-volume archive [1][4][5].
- Non-option arguments after -f: these are treated as the specific files (members) within that archive to act upon [6][7][9].
To extract from multiple distinct archive files, run a separate tar command for each archive; GNU tar does not support extracting from multiple independent archive files in a single invocation [1][10]. Passing multiple filenames after a single -f option (e.g., tar -x -f archive1.tar archive2.tar) makes tar treat 'archive2.tar' as an archive member name to look for inside 'archive1.tar', rather than as a separate archive file [6][7][3].
Citations:
- 1: https://www.gnu.org/software/tar/manual/html_chapter/Media.html
- 2: https://www.gnu.org/software/tar/manual/html_node/Device.html
- 3: http://www.gnu.org/s/tar/manual/html_section/file.html
- 4: https://www.gnu.org/software/tar/manual/html_node/Multi_002dVolume-Archives.html
- 5: https://www.gnu.org/software/tar/manual/html_section/Using-Multiple-Tapes.html
- 6: https://www.gnu.org/software/tar/manual/html_node/Synopsis.html
- 7: https://www.gnu.org/software/tar/manual/html_section/Synopsis.html
- 8: https://www.man7.org/linux/man-pages/man1/tar.1.html
- 9: https://man.archlinux.org/man/tar.1.en
- 10: https://www.gnu.org/software/tar/manual/tar.html
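The member-name behavior described in the query result above can be checked locally. The following is a minimal sketch, not part of the PR: the scratch directory and the file names `a.txt`, `b.txt`, and `other.tar.gz` are invented for illustration, and it assumes a tar that accepts `-z` (GNU tar or bsdtar).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch area; every path and file name below is illustrative only.
work=$(mktemp -d)
mkdir -p "${work}/src" "${work}/one" "${work}/all" "${work}/none"
echo "alpha" > "${work}/src/a.txt"
echo "beta" > "${work}/src/b.txt"

# Build a single archive holding two members.
tar -czf "${work}/bundle.tar.gz" -C "${work}/src" a.txt b.txt

# Arguments after the archive are member names: only a.txt is extracted.
tar -xzf "${work}/bundle.tar.gz" -C "${work}/one" a.txt

# With no member names, the whole archive is extracted.
tar -xzf "${work}/bundle.tar.gz" -C "${work}/all"

# A second archive path is just another member name. It is not a member
# of bundle.tar.gz, so tar exits non-zero and extracts nothing.
if tar -xzf "${work}/bundle.tar.gz" -C "${work}/none" other.tar.gz 2>/dev/null; then
  missing_status=0
else
  missing_status=$?
fi
echo "exit status for missing member: ${missing_status}"
```

The last case mirrors what happens when a glob expands to several archive paths in one `tar -xzf` call.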
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="ci-operator/step-registry/ipi/install/install/ipi-install-install-commands.sh"
echo "== Shebang/top 40 lines =="
sed -n '1,40p' "$FILE" | cat -n
echo
echo "== log-bundle related lines (near occurrences) =="
rg -n "log-bundle" "$FILE"
echo
echo "== Around line 100-130 (with line numbers) =="
awk 'NR>=90 && NR<=130 {printf "%4d: %s\n", NR, $0}' "$FILE"
Repository: openshift/release
Length of output: 4768
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="ci-operator/step-registry/ipi/install/install/ipi-install-install-commands.sh"
echo "== Around log-bundle workaround (240-320) =="
sed -n '240,320p' "$FILE" | nl -ba | sed 's/^/LINE /'
Repository: openshift/release
Length of output: 167
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="ci-operator/step-registry/ipi/install/install/ipi-install-install-commands.sh"
awk 'NR>=240 && NR<=320 {printf "%4d: %s\n", NR, $0}' "$FILE"
Repository: openshift/release
Length of output: 4408
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="ci-operator/step-registry/ipi/install/install/ipi-install-install-commands.sh"
echo "== log-bundle in this file =="
rg -n "log-bundle" "$FILE"
echo
echo "== Other occurrences repo-wide (limited) =="
rg -n "log-bundle-\\*\\.tar\\.gz" . -S || true
echo
echo "== Any creation of log bundles (tar -czf + log-bundle) repo-wide (limited) =="
rg -n "tar\\s+-c[^\\n]*log-bundle" . -S || true
echo
echo "== Any deletion/cleanup of log bundle archives repo-wide (limited) =="
rg -n "rm\\s+-f[^\\n]*log-bundle|rm\\s+-r[^\\n]*log-bundle" . -S || true
Repository: openshift/release
Length of output: 6274
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="ci-operator/step-registry/ipi/install/install/ipi-install-install-commands.sh"
rg -n "nullglob|shopt" "$FILE" || true
Repository: openshift/release
Length of output: 43
Fix log-bundle extraction when the glob matches multiple archives
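At line 111, tar -xzf "${dir}"/log-bundle-*.tar.gz ... does not work when the glob expands to more than one log-bundle-*.tar.gz: GNU tar opens only the first path as the archive and treats the remaining paths as member names to extract from it. The additional log bundles are therefore never unpacked into ${ARTIFACT_DIR}, the first bundle's own contents are skipped because extraction is limited to the named members, and the 2>/dev/null redirect hides the resulting errors.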
Suggested fix
- tar -xzf "${dir}"/log-bundle-*.tar.gz -C "${ARTIFACT_DIR}/" 2>/dev/null
+ shopt -s nullglob
+ for bundle in "${dir}"/log-bundle-*.tar.gz; do
+ tar -xzf "${bundle}" -C "${ARTIFACT_DIR}/" 2>/dev/null
+ done
+ shopt -u nullglob
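A variant of the per-archive loop that avoids toggling `nullglob` is sketched below. This is an illustration under stated assumptions, not the code in the PR: the helper name `unpack_log_bundles`, the warning message, and the demo directories are hypothetical, and continuing after a failed archive is an assumed best-effort policy.

```shell
#!/usr/bin/env bash
set -euo pipefail

# unpack_log_bundles SRC_DIR DEST_DIR
# Hypothetical helper: extracts every SRC_DIR/log-bundle-*.tar.gz into
# DEST_DIR, using one tar invocation per archive.
unpack_log_bundles() {
  local src="$1" dest="$2" bundle
  for bundle in "${src}"/log-bundle-*.tar.gz; do
    # With no match the pattern stays literal; skip it rather than
    # changing the shell's nullglob setting.
    [[ -e "${bundle}" ]] || continue
    # Assumed policy: warn and keep going if one archive cannot be read.
    tar -xzf "${bundle}" -C "${dest}" 2>/dev/null \
      || echo "warning: failed to unpack ${bundle}" >&2
  done
}

# Demo with two invented archives in a scratch directory.
demo=$(mktemp -d)
mkdir -p "${demo}/src" "${demo}/dest" "${demo}/empty"
echo "from a" > "${demo}/src/file-a.txt"
echo "from b" > "${demo}/src/file-b.txt"
tar -czf "${demo}/src/log-bundle-1.tar.gz" -C "${demo}/src" file-a.txt
tar -czf "${demo}/src/log-bundle-2.tar.gz" -C "${demo}/src" file-b.txt

unpack_log_bundles "${demo}/src" "${demo}/dest"
unpack_log_bundles "${demo}/empty" "${demo}/dest"  # no archives: a no-op
ls "${demo}/dest"
```

The existence check keeps the function free of side effects on shell options, which matters if the surrounding script relies on its current glob settings.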
[REHEARSALNOTIFIER]
A total of 30321 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here.
|
/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-serial-runc |
|
@wking: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse |
|
@wking: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/retitle OCPBUGS-87841: ci-operator/step-registry/ipi/install/install: Unpack log-bundle archives |
|
@wking: This pull request references Jira Issue OCPBUGS-87841, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
|
/jira refresh |
|
@patrickdillon: This pull request references Jira Issue OCPBUGS-87841, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: patrickdillon, wking
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@wking: The following tests failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
|
Some logs are better than no logs, but I hope we can eventually find a better solution that preserves the log bundle tar.
Summary by CodeRabbit
This PR modifies the OpenShift CI ipi/install step to unpack log-bundle archives after cluster installation, rather than preserving them as compressed tar archives.
What changed:
The `populate_artifact_dir()` function in the ipi-install-install-commands.sh step now extracts log-bundle tar.gz archives into the artifact directory using `tar -xzf` instead of copying the archive file as-is.
Why this matters:
LeakTK, the security scanning tool used in CI, would previously redact entire archives if it detected sensitive content within them, making the entire archive inaccessible. By unpacking archives first, LeakTK can now scan and redact at the individual file level, allowing users to access specific files without needing to download and unpack complete archives locally. This improves the usability of CI artifacts while maintaining security redaction of sensitive information.
Impact:
This affects all OpenShift installation CI jobs that use the ipi/install/install step. Installation logs and related artifacts will now be available as individual unpacked files in the shared artifacts directory rather than as compressed archives.
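For the local-grep workflow, a recursive search over a local copy of the unpacked files covers the same ground as unpacking the archive by hand. The sketch below assumes the artifact directory has already been mirrored locally by some means that is not shown; the helper name `find_in_artifacts`, the directory layout, and the log contents are all invented for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# find_in_artifacts PATTERN DIR
# Hypothetical helper: lists files under DIR whose contents match PATTERN.
find_in_artifacts() {
  local pattern="$1" dir="$2"
  grep -rl -- "${pattern}" "${dir}"
}

# Stand-in for a locally mirrored artifact directory; names are invented.
mirror=$(mktemp -d)
mkdir -p "${mirror}/bootstrap" "${mirror}/control-plane"
echo "kubelet started" > "${mirror}/bootstrap/journal.log"
echo "etcd leader elected" > "${mirror}/control-plane/etcd.log"

matches=$(find_in_artifacts "etcd" "${mirror}")
echo "${matches}"
```

Note that `grep -rl` exits non-zero when nothing matches, so callers running under `set -e` may want to handle that case explicitly.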