New Cluster Identified: 1 (log-clarity only, no runtime impact)
Today's audit shows the highest volume in the five-day baseline (~8× yesterday), driven by smoke suites and code-review workflows. Most notably, the review_path_unresolved_422 remediation from 2026-05-22 fired again today on a new PR and worked exactly as designed: this is the cluster's third occurrence and the second clean recovery through the body-only fallback. Meanwhile, the previously latent target_star_review_comment_no_pr_number_fallback cluster was positively exercised for the first time since 2026-05-22 and did NOT reproduce.
Critical Findings
No critical issues. Three consecutive days of zero hard failures across safe-output jobs.
Notable Events
1. review_path_unresolved_422 - Third Occurrence, Second Soft Recovery
PR: #34835 (replay command for rendering unified timeline logs)
What happened: submit_pull_request_review received Unprocessable Entity: "Line could not be resolved and Line could not be resolved" from the GitHub API
Recovery: Body-only fallback in safe_output_handler_manager.cjs triggered automatically; review #4360463391 was created instead. Final tally: Total 4, Successful 4, Failed 0
Significance: The body-only fallback (introduced after the 2026-05-21 hard failure) has now recovered cleanly on both occurrences since it shipped, out of three occurrences of this cluster overall. Remediation confidence: high
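The recovery described above can be sketched roughly as follows. This is a minimal illustration, not the code in safe_output_handler_manager.cjs: the function name, the injected `submitReview` callback, the shape of the `review` object, and the choice to fold the unanchored inline comments into the review body are all assumptions made for the example.

```javascript
// Hypothetical sketch of a body-only fallback for unresolved-line 422 errors.
// `submitReview` is an assumed callback that performs the actual API call and
// rejects with an error carrying `status` and `message`.
async function submitReviewWithBodyOnlyFallback(submitReview, review) {
  try {
    return await submitReview(review);
  } catch (err) {
    const unresolvedLine =
      err &&
      err.status === 422 &&
      /could not be resolved/i.test(String(err.message));
    // Anything other than the unresolved-line 422 is not ours to handle.
    if (!unresolvedLine) throw err;

    // Assumption: keep the inline feedback by appending it to the body,
    // since the comments can no longer be anchored to a diff line.
    const notes = (review.comments || [])
      .map((c) => `- ${c.path}:${c.line} ${c.body}`)
      .join("\n");
    const body = notes
      ? `${review.body}\n\nInline comments (could not be anchored):\n${notes}`
      : review.body;

    // Retry once with no inline comments at all.
    return submitReview({ ...review, body, comments: [] });
  }
}
```

Retrying exactly once keeps the behavior predictable: if the body-only submission also fails, the error surfaces as a hard failure instead of looping.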
2. target_star_review_comment_no_pr_number_fallback — First Positive Exercise Since Detection
What happened: Across the two runs, 4 create_pull_request_review_comment + 1 reply_to_pull_request_review_comment + 2 submit_pull_request_review messages were emitted - all succeeded
Significance: First positive exercise of this code path since the 2026-05-22 failure. Pattern did NOT reproduce. Status moves from "latent for 4 days" to "exercised, not reproduced; remediation gap status unclear without code inspection"
3. NEW Cluster: cancellation_counter_mislabeled_code_push_failed
What happened: set_issue_field was aborted by Threat-detection warn policy (Requirement WTD3) - this is correct behavior. However, the Processing Summary reports the abort as Cancelled (code push failed): 1 and emits ##[warning]1 message(s) were cancelled because a code push operation failed, even though no code push was attempted in this workflow.
Severity: Log-clarity only, no runtime impact
Why it matters: Future investigators looking for actual code-push regressions will encounter false positives in this counter. The per-message warning text already clearly labels WTD3 (🚫 Threat-detection warn policy aborted "set_issue_field" (Requirement WTD3)...) - only the rolled-up counter is misleading.
Root Cause Analysis
API Errors
Transient DNS error in checkout (§26431514043): ##[error]fatal: unable to access 'https://github.com/github/gh-aw/': Could not resolve host: github.com. GitHub Actions checkout retried automatically and succeeded; safe-output messages (4) all completed. Not a safe-output failure - infrastructure blip.
PR Sous Chef rebase failure: Failed to update pull request #33219 branch from base (non-fatal): ERR_API. Same PR rebase attempted twice ~90 minutes apart, both with non-fatal warnings. Worth flagging if persistent.
Approve own PR fallback (runs §26433108008, §26430960244): Cannot submit APPROVE review on own PR. Retrying with event=COMMENT. Clean fallback to comment-style review - working as designed.
Additional Non-Failing Warnings
Missing label (§26433367571): Could not find label IDs for: automated-testing - label may have been renamed/removed.
Permissions gap on resolve_pull_request_review_thread (§26431767267): integration token lacks permission to resolve review threads; handler skipped the message and surfaced a clear remediation hint (Use safe-outputs.resolve-pull-request-review-thread.github-token with a token that can resolve review threads). Working as designed.
Branch protection check permissions gap (§26431374230): jsweep could not query branch protection for copilot/add-replay-command-for-logs; push proceeded successfully.
Invalid temporary_id format (§26431767340): Smoke Copilot emitted aw_smoke_discussion (16 characters after the aw_ prefix, exceeding the 12-character cap). Handler auto-generated aw_FpLYMUhF and proceeded, but the #aw_smoke_discussion references in body text were not substituted. Agent-side fix: shorten the temporary_id in the Smoke Copilot workflow.
Draft config override (§26430590893): Agent requested draft: false, but configuration enforces draft: true. Configuration takes precedence for security. Working as designed.
Markdown fence count mismatch (7 runs): [renderMarkdownTemplate] Fence count mismatch: input had 4 fence marker(s), output has 2. Cosmetic template-processor warning; output is still written. Recurring across runs - worth investigating as a separate template-processing follow-up.
Recommendations
Critical Issues (Immediate Action Required)
None. Three consecutive clean days; no hard failures or safe-output regressions.
Bug Fixes Required
cancellation_counter_mislabeled_code_push_failed (newly identified)
File/Location: safe_output_handler_manager.cjs (or wherever the Processing Summary rollup lives)
Problem: The cancellation counter and warning text conflate WTD3 / threat-detection policy aborts with code-push-failure cascades, making log triage harder.
Fix Option A (minimal): Rename the catch-all counter and warning to Cancelled (downstream gate blocked): N so the text matches all the cancellation reasons that feed it.
Fix Option B (recommended): Split the counter into distinct buckets: Cancelled (code push failed): N, Cancelled (threat-detection policy aborted): N, Cancelled (other downstream gate): N. Emit per-bucket warning lines so an investigator can grep for the right cause.
Affected jobs: Any safe-output workflow that uses non-reviewable outputs combined with safe-outputs.threat-detection (currently mostly Smoke Codex, but the pattern applies to any workflow with set_issue_field, set_issue_type, etc.)
Configuration Changes
Smoke Copilot workflow - shorten temporary_id
Current: aw_smoke_discussion (16-character suffix after aw_, violating the 3-12 character cap)
Recommended: aw_smoke_disc (10-character suffix) or aw_smk_disc (8-character suffix)
PR Sous Chef - investigate PR #33219 (Bind Node toolcache into AWF chroot for Copilot-engine workflow startup reliability) rebase API failures
Problem: The ERR_API warning is non-fatal but suggests either a PR-specific issue (conflicts? archived branch?) or a GitHub API consistency issue.
Recommended: Add a check in the PR Sous Chef handler that, when update pull request branch from base fails repeatedly for the same PR, surfaces the underlying conflict state (mergeable_state) so a human can intervene.
Process Improvements
Verify target_star_review_comment_no_pr_number_fallback resolution
Current state: Cluster did not reproduce when re-exercised today, but it's unclear whether the gap is closed in code or just not hit by today's specific items.
Proposed: Code-inspect safe_output_handler_manager.cjs and the create_pull_request_review_comment handler to confirm whether the explicit-item-target:* branch now falls back to the triggering PR context (the gap identified on 2026-05-22). If confirmed closed, mark the cluster as resolved; if still open, prioritize the fix before the next Smoke Claude/Copilot run.
Add markdown_fence_count_mismatch as a recurring observability item
7 runs today emitted the same [renderMarkdownTemplate] Fence count mismatch warning. This is currently treated as cosmetic but suggests the template renderer may be silently dropping fenced code blocks. Worth a separate follow-up to confirm output correctness.
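One way to confirm whether fenced blocks are being dropped is to compare fence-marker counts before and after rendering. The sketch below is an independent illustration, not the logic inside renderMarkdownTemplate; both function names are hypothetical, and it assumes a fence marker is any line that opens with three or more backticks or tildes, indented by at most three spaces.

```javascript
// Hypothetical helper: counts lines that look like Markdown fence markers
// (three or more backticks or tildes, indented by at most three spaces).
function countFenceMarkers(markdown) {
  return markdown
    .split("\n")
    .filter((line) => /^ {0,3}(`{3,}|~{3,})/.test(line)).length;
}

// Compares template input and rendered output and reports the difference.
// A positive `dropped` value means the output has fewer fence markers.
function compareFenceCounts(input, output) {
  const before = countFenceMarkers(input);
  const after = countFenceMarkers(output);
  return { before, after, dropped: before - after, ok: before === after };
}
```

A mismatch such as "input had 4, output has 2" does not prove content was lost (a conditional template section may legitimately remove a whole block), so a result like this is a prompt to diff the two documents rather than a verdict.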
Work Item Plans
Work Item 1: Fix cancellation counter mislabeling
Type: Bug Fix
Priority: Low (log clarity only, no runtime impact)
Description: In safe_output_handler_manager.cjs, the rolled-up Processing Summary counter Cancelled (code push failed): N and the trailing warning ##[warning]N message(s) were cancelled because a code push operation failed are emitted for any downstream-gate-blocked cancellation, not just code-push failures. Threat-detection WTD3 policy aborts get lumped into this counter, making future investigations harder.
Acceptance Criteria:
WTD3 policy aborts are counted and warned separately from code-push-failure cancellations
The Processing Summary reflects the actual reason for each cancelled message
An audit on Smoke Codex re-runs the WTD3 scenario and sees the new, accurate label
Existing per-message warning text (🚫 Threat-detection warn policy aborted "set_issue_field" (Requirement WTD3)) is preserved
Technical Approach: Track cancellation reasons in a cancellationReasons array (one entry per cancelled message). Roll up by reason when emitting the Processing Summary. Update the trailing aggregate warning to enumerate distinct reasons.
Estimated Effort: Small
Dependencies: None
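The technical approach for Work Item 1 could look something like the sketch below. It assumes each cancelled message records a short reason key in a cancellationReasons array; the key names, the label table, and the exact wording of the warning lines are illustrative choices, not the existing output format.

```javascript
// Hypothetical reason keys mapped to the labels proposed in Fix Option B.
const REASON_LABELS = {
  code_push_failed: "code push failed",
  threat_detection_policy: "threat-detection policy aborted",
  other_gate: "other downstream gate",
};

// Rolls up one-entry-per-message cancellation reasons into per-bucket
// summary lines and warning lines. Unknown reasons fall into "other_gate".
function summarizeCancellations(cancellationReasons) {
  const counts = new Map();
  for (const reason of cancellationReasons) {
    const known = Object.prototype.hasOwnProperty.call(REASON_LABELS, reason);
    const key = known ? reason : "other_gate";
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  const summaryLines = [];
  const warnings = [];
  for (const [key, n] of counts) {
    const label = REASON_LABELS[key];
    summaryLines.push(`Cancelled (${label}): ${n}`);
    warnings.push(`##[warning]${n} message(s) were cancelled: ${label}`);
  }
  return { summaryLines, warnings };
}
```

With separate lines per bucket, an investigator can grep for "code push failed" and only see runs where a push was actually attempted and failed.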
Work Item 2: Smoke Copilot temporary_id length fix
Type: Configuration Fix
Priority: Low
Description: Smoke Copilot's discussion item uses temporary_id: aw_smoke_discussion, whose suffix after the aw_ prefix is 16 characters, exceeding the 12-character cap defined by the regex /^#?aw_[A-Za-z0-9_]{3,12}$/i. The validator auto-corrects to a generated ID, but the body text still references the original (invalid) ID, so those references are not substituted.
Acceptance Criteria:
Smoke Copilot workflow uses a temporary_id whose suffix after aw_ is 3-12 chars
Next Smoke Copilot run emits no Invalid temporary_id format warning
Body text references resolve correctly to the artifact URL
Estimated Effort: Small (one-line workflow edit)
Dependencies: None
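A quick check of candidate IDs against the pattern quoted in the description might look like this. The regex is the one stated above; the helper names and the reference-scanning function are hypothetical additions for illustration.

```javascript
// Pattern as quoted in the work item: optional "#", "aw_" prefix,
// then 3 to 12 characters from [A-Za-z0-9_].
const TEMPORARY_ID_PATTERN = /^#?aw_[A-Za-z0-9_]{3,12}$/i;

// Returns true when the ID satisfies the quoted pattern.
function isValidTemporaryId(id) {
  return TEMPORARY_ID_PATTERN.test(id);
}

// Hypothetical helper: lists "#aw_..." references in body text whose ID
// would fail validation, i.e. references that would be left unsubstituted.
function findInvalidReferences(body) {
  const refs = body.match(/#aw_[A-Za-z0-9_]+/gi) || [];
  return refs.filter((ref) => !isValidTemporaryId(ref));
}
```

Running the body text through a check like this before emitting the message would catch the mismatch on the agent side, before the handler has to generate a replacement ID.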
Work Item 3: Code-inspect target:"*" fallback closure (verification)
Type: Investigation
Priority: Medium
Description: The target_star_review_comment_no_pr_number_fallback cluster from 2026-05-22 did not reproduce when re-exercised today, but it's unclear whether the gap is closed in code or just not hit by today's items. A code inspection of create_pull_request_review_comment would confirm.
Acceptance Criteria:
Code path verified: when an item carries explicit target: "*" AND no pull_request_number, the handler falls back to the triggering PR context (refs/pull/{n}/merge)
If fallback is missing, implement it consistent with update_pull_request and submit_pull_request_review siblings
Update error-patterns.json to reflect status: resolved if the gap is confirmed closed
Estimated Effort: Small (inspection) + Small-Medium (fix if needed)
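For reference during the inspection, the fallback being verified can be sketched as below. This is a guess at the intended behavior based on the acceptance criteria, not the handler's actual code: the function name and the `item` and `context` shapes are assumptions, with `context.ref` standing in for the triggering event's ref.

```javascript
// Hypothetical sketch: pick the PR number for a review-comment item.
// Prefers an explicit pull_request_number on the item; otherwise falls back
// to the triggering PR parsed from a ref like "refs/pull/{n}/merge".
function resolvePullRequestNumber(item, context) {
  const explicit = Number(item.pull_request_number);
  if (Number.isInteger(explicit) && explicit > 0) return explicit;

  const match = /^refs\/pull\/(\d+)\/merge$/.exec((context && context.ref) || "");
  // null signals that no PR could be determined and the item must be skipped
  // or reported, rather than silently targeting the wrong PR.
  return match ? Number(match[1]) : null;
}
```

Returning null when neither source is available keeps the failure explicit, which matches how the cluster was originally detected.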
Historical Context
Trends
review_path_unresolved_422: hit 3×, soft-recovered 2× - remediation validated
target_star_review_comment_no_pr_number_fallback: hit 1×, exercised 4× post-detection without reproduction - status pending code verification
cancellation_counter_mislabeled_code_push_failed: newly identified, 1× occurrence, log-clarity only
Metrics
create_pull_request_review_comment (18 messages across 6 runs)
Next Steps
markdown_fence_count_mismatch warnings - if frequency grows, open a follow-up template-processing audit