[safe-output-health] Safe Output Health Report - 2026-05-26 #34878

2026-05-26T05:51:17Z

github-actions[bot]
Bot May 26, 2026

Executive Summary

Period: Last 24 hours (2026-05-25T05:30Z → 2026-05-26T05:33Z)
Runs Analyzed: 64 with safe_outputs slot (47 executed, 17 if-gated)
Workflows Active: 30+ unique workflows
Safe-Output Messages Processed: 122
Hard Failures: 0 ✅
Soft Recoveries: 1 (review_path_unresolved_422 → body-only fallback)
Overall Success Rate: 100% (third consecutive clean day)
New Cluster Identified: 1 (log-clarity only, no runtime impact)

Today's audit shows the highest volume in the five-day baseline (~8× the previous audited day, 2026-05-24), driven by smoke suites and code-review workflows. Most notably, the review_path_unresolved_422 remediation from 2026-05-22 fired again today on a new PR and worked exactly as designed - the second independent validation of the body-only fallback. Meanwhile, the previously latent target_star_review_comment_no_pr_number_fallback cluster was positively exercised for the first time since 2026-05-22 and did NOT reproduce.

Safe-Output Job Statistics

Handler Type	Messages	Successful	Notes
noop	17	17
create_pull_request_review_comment	18	18	422-recovery once
add_comment	12	12
submit_pull_request_review	9	9	1 body-only fallback; 2 own-PR fallbacks
create_issue	8	8
create_discussion	7	7
add_labels	6	6
upload_asset	6	0 (standalone)	skipped to standalone job, not a failure
create_pull_request	5	5
update_pull_request	5	5	2× non-fatal rebase warnings on PR #33219
create_check_run	5	5
push_to_pull_request_branch	2	2
comment_memory	2	2
set_issue_field	1	0	aborted by WTD3 policy (working as designed)
(15 other handler types)	25	24	1 perms-skip on resolve_pull_request_review_thread
Total	122	116	6 standalone-skips, 0 hard failures

Critical Findings

No critical issues. Three consecutive days of zero hard failures across safe-output jobs.

Notable Events

1. `review_path_unresolved_422` — Third Occurrence, Second Soft Recovery

Run: §26431374263 - PR Code Quality Reviewer on PR #34835 ("feat: add replay command for rendering unified timeline logs")
What happened: submit_pull_request_review received Unprocessable Entity: "Line could not be resolved and Line could not be resolved" from the GitHub API
Recovery: Body-only fallback in safe_output_handler_manager.cjs triggered automatically; review #4360463391 was created instead. Final tally: Total 4, Successful 4, Failed 0
Significance: The body-only fallback (introduced after the 2026-05-21 hard failure) has now recovered cleanly in both occurrences since its introduction (three occurrences of the cluster in total). Remediation confidence: high
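
The recovery path described above can be pictured roughly as follows. This is a minimal sketch, not the code in safe_output_handler_manager.cjs: `createReview` is a hypothetical async function standing in for the GitHub API call, and it is assumed to throw an error carrying `status: 422` when an inline comment's line cannot be resolved.

```javascript
// Hypothetical sketch of a body-only fallback; not the actual handler code.
// `createReview` is an assumed async function standing in for the API call.
async function submitReviewWithFallback(createReview, review) {
  try {
    const result = await createReview(review);
    return { ...result, fallback: null };
  } catch (err) {
    // Recover only from 422 (Unprocessable Entity) when inline comments were
    // attached; every other error is rethrown as a hard failure.
    const hasInlineComments =
      Array.isArray(review.comments) && review.comments.length > 0;
    if (err.status !== 422 || !hasInlineComments) throw err;

    // Retry without the inline comments so the review body still lands.
    const { comments, ...bodyOnly } = review;
    const result = await createReview(bodyOnly);
    return { ...result, fallback: "body-only" };
  }
}
```

Under these assumptions a 422 costs one extra API call and the inline comments are dropped, which matches the "soft recovery" accounting used in this report.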

2. `target_star_review_comment_no_pr_number_fallback` — First Positive Exercise Since Detection

Runs: §26431767267 Smoke Claude and §26431767340 Smoke Copilot on PR #34837 ("Move model alias/multiplier propagation from step env to activation artifact file")
What happened: Across the two runs, 4 create_pull_request_review_comment + 1 reply_to_pull_request_review_comment + 2 submit_pull_request_review messages were emitted - all succeeded
Significance: First positive exercise of this code path since the 2026-05-22 failure. Pattern did NOT reproduce. Status moves from "latent for 4 days" to "exercised, not reproduced; remediation gap status unclear without code inspection"

3. NEW Cluster: `cancellation_counter_mislabeled_code_push_failed`

Run: §26431767308 Smoke Codex on PR #34837 ("Move model alias/multiplier propagation from step env to activation artifact file")
What happened: set_issue_field was aborted by Threat-detection warn policy (Requirement WTD3) - this is correct behavior. However, the Processing Summary reports the abort as Cancelled (code push failed): 1 and emits ##[warning]1 message(s) were cancelled because a code push operation failed, even though no code push was attempted in this workflow.
Severity: Log-clarity only, no runtime impact
Why it matters: Future investigators looking for actual code-push regressions will encounter false positives in this counter. The per-message warning text already clearly labels WTD3 (🚫 Threat-detection warn policy aborted "set_issue_field" (Requirement WTD3)...) - only the rolled-up counter is misleading.

Root Cause Analysis

API Errors

Transient DNS error in checkout (§26431514043): ##[error]fatal: unable to access 'https://github.com/github/gh-aw/': Could not resolve host: github.com. GitHub Actions checkout retried automatically and succeeded; safe-output messages (4) all completed. Not a safe-output failure - infrastructure blip.
PR rebase warnings (PR #33219 "Bind Node toolcache into AWF chroot for Copilot-engine workflow startup reliability", runs §26429260951 and §26431884792): Failed to update pull request #33219 branch from base (non-fatal): ERR_API. The same PR rebase was attempted twice ~90 minutes apart, both times with non-fatal warnings. Worth flagging if persistent.
Approve own PR fallback (runs §26433108008, §26430960244): Cannot submit APPROVE review on own PR. Retrying with event=COMMENT. Clean fallback to comment-style review - working as designed.

Additional Non-Failing Warnings

Missing label (§26433367571): Could not find label IDs for: automated-testing - label may have been renamed/removed.
Permissions gap on resolve_pull_request_review_thread (§26431767267): integration token lacks permission to resolve review threads; handler skipped the message and surfaced a clear remediation hint (Use safe-outputs.resolve-pull-request-review-thread.github-token with a token that can resolve review threads). Working as designed.
Branch protection check permissions gap (§26431374230): jsweep could not query branch protection for copilot/add-replay-command-for-logs; push proceeded successfully.
Invalid temporary_id format (§26431767340): Smoke Copilot emitted aw_smoke_discussion (16 chars after the aw_ prefix, exceeding the 12-char cap). The handler auto-generated aw_FpLYMUhF and proceeded, but the #aw_smoke_discussion references in body text were not substituted. Agent-side fix: shorten the temporary_id in the Smoke Copilot workflow.
Draft config override (§26430590893): Agent requested draft: false, but configuration enforces draft: true. Configuration takes precedence for security. Working as designed.
Markdown fence count mismatch (7 runs): [renderMarkdownTemplate] Fence count mismatch: input had 4 fence marker(s), output has 2. Cosmetic template-processor warning; output is still written. Recurring across runs - worth investigating as a separate template-processing follow-up.

Recommendations

Critical Issues (Immediate Action Required)

None. Three consecutive clean days; no hard failures or safe-output regressions.

Bug Fixes Required

Fix cancellation_counter_mislabeled_code_push_failed (newly identified)
- File/Location: safe_output_handler_manager.cjs (or wherever the Processing Summary rollup lives)
- Problem: The cancellation counter and warning text conflate WTD3 / threat-detection policy aborts with code-push-failure cascades, making log triage harder.
- Fix Option A (minimal): Rename the catch-all counter and warning to Cancelled (downstream gate blocked): N so the text matches all the cancellation reasons that feed it.
- Fix Option B (recommended): Split the counter into distinct buckets: Cancelled (code push failed): N, Cancelled (threat-detection policy aborted): N, Cancelled (other downstream gate): N. Emit per-bucket warning lines so an investigator can grep for the right cause.
- Affected jobs: Any safe-output workflow that uses non-reviewable outputs combined with safe-outputs.threat-detection (currently mostly Smoke Codex, but the pattern applies to any workflow with set_issue_field, set_issue_type, etc.)
- Effort: Small

Configuration Changes

Smoke Copilot workflow - shorten temporary_id
- Current: aw_smoke_discussion (16 chars after the aw_ prefix - violates the 3-12 char cap)
- Recommended: aw_smoke_disc (10 chars after the prefix) or aw_smk_disc (8 chars after the prefix)
- Reason: Prevents the auto-generated ID from breaking body-text references and eliminates the two repeated warnings per Smoke Copilot run.
PR Sous Chef - investigate rebase API failures on PR #33219 ("Bind Node toolcache into AWF chroot for Copilot-engine workflow startup reliability")
- Observation: Two consecutive runs (~90 minutes apart) failed to update PR #33219's branch from base with ERR_API. The warning is non-fatal but suggests either a PR-specific issue (conflicts? archived branch?) or a GitHub API consistency issue.
- Recommended: Add a check in the PR Sous Chef handler that, when update pull request branch from base fails repeatedly for the same PR, surfaces the underlying conflict state (mergeable_state) so a human can intervene.
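
One possible shape for the recommended check is sketched below. It assumes `pr` is a pull request object as returned by the GitHub REST API (which includes a `mergeable_state` field). The helper name, the threshold, and the source of `failureCount` are hypothetical, and the count would have to be persisted across runs, since the two failures above happened in separate workflow runs.

```javascript
// Hypothetical helper; not part of the PR Sous Chef handler today.
// `pr` is assumed to be a GitHub REST API pull request object.
function describeRebaseFailure(pr, failureCount, threshold = 2) {
  const base = `Failed to update pull request #${pr.number} branch from base (non-fatal)`;
  if (failureCount < threshold) return base;
  // After repeated failures, append the conflict state so a human can triage.
  const state = pr.mergeable_state ?? "unknown";
  return `${base}; failed ${failureCount} times, mergeable_state=${state}`;
}
```

Keeping the first failure's message unchanged means existing log searches continue to match, and only repeated failures carry the extra detail.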

Process Improvements

Verify target_star_review_comment_no_pr_number_fallback resolution
- Current state: Cluster did not reproduce when re-exercised today, but it's unclear whether the gap is closed in code or just not hit by today's specific items.
- Proposed: Code-inspect the create_pull_request_review_comment handler in safe_output_handler_manager.cjs to confirm whether the branch for items with an explicit target: "*" now falls back to the triggering PR context (the gap identified on 2026-05-22). If confirmed closed, mark the cluster as resolved; if still open, prioritize the fix before the next Smoke Claude/Copilot run.
Add markdown_fence_count_mismatch as a recurring observability item
- 7 runs today emitted the same [renderMarkdownTemplate] Fence count mismatch warning. This is currently treated as cosmetic but suggests the template renderer may be silently dropping fenced code blocks. Worth a separate follow-up to confirm output correctness.
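
To make the follow-up concrete, a fence-count comparison of this kind can be reproduced outside the renderer. The sketch below is an assumption about how such a check might work, not the logic inside renderMarkdownTemplate: it counts lines that start with three or more backticks or tildes, before and after rendering.

```javascript
// Counts lines that open or close a fenced code block
// (three or more backticks or tildes, indented by at most three spaces).
function countFenceMarkers(markdown) {
  return markdown
    .split("\n")
    .filter((line) => /^ {0,3}(`{3,}|~{3,})/.test(line)).length;
}

// Compares fence counts between template input and rendered output.
function checkFenceCount(input, output) {
  const before = countFenceMarkers(input);
  const after = countFenceMarkers(output);
  return { before, after, mismatch: before !== after };
}
```

A drop from 4 markers to 2, as in today's warning, would be consistent with one whole fenced block going missing, which is why comparing the rendered output against the input for the seven affected runs seems worthwhile.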

Work Item Plans

Work Item 1: Fix cancellation counter mislabeling

Type: Bug Fix
Priority: Low (log clarity only, no runtime impact)
Description: In safe_output_handler_manager.cjs, the rolled-up Processing Summary counter Cancelled (code push failed): N and the trailing warning ##[warning]N message(s) were cancelled because a code push operation failed are emitted for any downstream-gate-blocked cancellation, not just code-push failures. Threat-detection WTD3 policy aborts get lumped into this counter, making future investigations harder.
Acceptance Criteria:
- WTD3 policy aborts are counted and warned separately from code-push-failure cancellations
- The Processing Summary reflects the actual reason for each cancelled message
- An audit on Smoke Codex re-runs the WTD3 scenario and sees the new, accurate label
- Existing per-message warning text (🚫 Threat-detection warn policy aborted "set_issue_field" (Requirement WTD3)) is preserved
Technical Approach: Track cancellation reasons in a cancellationReasons array (one entry per cancelled message). Roll up by reason when emitting the Processing Summary. Update the trailing aggregate warning to enumerate distinct reasons.
Estimated Effort: Small
Dependencies: None
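
The technical approach above could be sketched as follows. The reason codes, the `REASON_LABELS` map, and the function name are all hypothetical; the real handler may track cancellations differently.

```javascript
// Hypothetical reason codes; the real handler may use different identifiers.
const REASON_LABELS = {
  code_push_failed: "code push failed",
  threat_detection_policy: "threat-detection policy aborted",
  other_downstream_gate: "other downstream gate",
};

// Groups cancellation entries by reason and returns one summary line per
// bucket, in the order each reason was first seen.
function summarizeCancellations(cancellationReasons) {
  const counts = new Map();
  for (const { reason } of cancellationReasons) {
    // Unknown reasons fall into the catch-all bucket rather than being dropped.
    const key = Object.hasOwn(REASON_LABELS, reason)
      ? reason
      : "other_downstream_gate";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].map(
    ([key, n]) => `Cancelled (${REASON_LABELS[key]}): ${n}`
  );
}
```

With a single entry whose reason is `threat_detection_policy`, as in today's Smoke Codex run, this yields `Cancelled (threat-detection policy aborted): 1` instead of the misleading code-push label.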

Work Item 2: Smoke Copilot temporary_id length fix

Type: Configuration Fix
Priority: Low
Description: Smoke Copilot's discussion item uses temporary_id: aw_smoke_discussion (16 chars after the aw_ prefix), exceeding the 12-char cap defined by the regex /^#?aw_[A-Za-z0-9_]{3,12}$/i. The validator auto-corrects to a generated ID, but the body text still references the original (invalid) ID, which is not substituted.
Acceptance Criteria:
- Smoke Copilot workflow uses a temporary_id with 3-12 chars after the aw_ prefix
- Next Smoke Copilot run emits no Invalid temporary_id format warning
- Body text references resolve correctly to the artifact URL
Estimated Effort: Small (one-line workflow edit)
Dependencies: None
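
The length rule can be checked directly against the pattern quoted in the description. In the sketch below the regex is taken from this report, with the leading slash restored; the function name is hypothetical, and the real validator may apply additional rules.

```javascript
// Pattern quoted in this report: optional "#", the "aw_" prefix, then 3-12
// characters from [A-Za-z0-9_].
const TEMPORARY_ID_PATTERN = /^#?aw_[A-Za-z0-9_]{3,12}$/i;

// Hypothetical helper name; returns true when the ID matches the pattern.
function isValidTemporaryId(id) {
  return TEMPORARY_ID_PATTERN.test(id);
}

for (const id of ["aw_smoke_discussion", "aw_smoke_disc", "aw_smk_disc"]) {
  const suffixLength = id.replace(/^#?aw_/i, "").length;
  console.log(`${id}: suffix ${suffixLength} chars, valid=${isValidTemporaryId(id)}`);
}
// aw_smoke_discussion: suffix 16 chars, valid=false
// aw_smoke_disc: suffix 10 chars, valid=true
// aw_smk_disc: suffix 8 chars, valid=true
```

The cap applies to the part after `aw_`, so both shortened candidates pass with room to spare.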

Work Item 3: Code-inspect `target:"*"` fallback closure (verification)

Type: Investigation
Priority: Medium
Description: The target_star_review_comment_no_pr_number_fallback cluster from 2026-05-22 did not reproduce when re-exercised today, but it's unclear whether the gap is closed in code or just not hit by today's items. A code inspection of create_pull_request_review_comment would confirm.
Acceptance Criteria:
- Code path verified: when an item carries explicit target: "*" AND no pull_request_number, the handler falls back to the triggering PR context (refs/pull/{n}/merge)
- If fallback is missing, implement it consistent with update_pull_request and submit_pull_request_review siblings
- Update error-patterns.json to reflect status: resolved if the gap is confirmed closed
Estimated Effort: Small (inspection) + Small-Medium (fix if needed)
Dependencies: None
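
For the inspection, it may help to have the expected behavior written down as code. The sketch below illustrates the first acceptance criterion only; the function names and the shape of `context` are assumptions, not the handler's actual API, and items without `target: "*"` are deliberately left unresolved because the report does not describe that case.

```javascript
// Extracts the PR number from a ref such as "refs/pull/34837/merge".
function prNumberFromRef(ref) {
  const match = /^refs\/pull\/(\d+)\/merge$/.exec(ref ?? "");
  return match ? Number(match[1]) : null;
}

// Hypothetical resolution order: an explicit pull_request_number wins;
// otherwise an item with target "*" falls back to the triggering PR context.
function resolvePullRequestNumber(item, context) {
  if (Number.isInteger(item.pull_request_number)) {
    return item.pull_request_number;
  }
  if (item.target === "*") {
    return prNumberFromRef(context.ref);
  }
  return null; // other targets are out of scope for this sketch
}
```

If the handler already behaves this way for `target: "*"` items without a `pull_request_number`, the cluster can likely be marked resolved; a `null` result in that case would correspond to the gap seen on 2026-05-22.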

Historical Context

Date	Runs	Executed	Failed	Success %	Headline
2026-05-21	73	48	1	97.9%	review_path_unresolved_422 (first hit)
2026-05-22	65	24	1	95.8%	target_star_review_comment_no_pr_number_fallback (first hit); 422 soft-recovery validated
2026-05-23	21	8	0	100.0%	Clean (Saturday, low volume)
2026-05-24	13	6	0	100.0%	Clean (review paths not exercised)
2026-05-25	—	—	—	—	No audit recorded (Monday gap)
2026-05-26	64	47	0	100.0%	Third clean day; 422 cluster re-exercised + recovered; target:"*" cluster positively exercised

Trends

Error rate trend: Stable at 0 hard failures × 3 consecutive days
Volume trend: 8× rebound today vs. the previous audited day, 2026-05-24 (47 vs 6 executed)
Coverage: Broadest single-day coverage in baseline - 23+ distinct handler types exercised
Cluster status:
- review_path_unresolved_422: hit 3×, soft-recovered 2× - remediation validated
- target_star_review_comment_no_pr_number_fallback: hit 1×, exercised 4× post-detection without reproduction - status pending code verification
- cancellation_counter_mislabeled_code_push_failed: newly identified, 1× occurrence, log-clarity only

Metrics

Overall Safe-Output Success Rate (24h): 100% (47/47 jobs, 116/116 deliverable messages succeeded; 6 standalone-skips are not failures)
Most Reliable Handler Types: All 23+ types have a perfect record today
Most Exercised Handler Type: create_pull_request_review_comment (18 messages across 6 runs)
Engine Mix: Copilot 28, Claude 14, Codex 1 (Antigravity/Gemini/Pi runs did not reach safe_outputs)
PRs Created Today: 5 (Code Simplifier ×2, Go Logger Enhancement, jsweep, PR Description Updater)
Issues Created Today: 8
Discussions Created Today: 7

Next Steps

Triage Work Item 1 (cancellation counter mislabeling) - low priority but recommended
Apply Work Item 2 (Smoke Copilot temporary_id) - one-line fix, can ship immediately
Schedule Work Item 3 (target:"*" code inspection) before next Smoke Claude/Copilot release window
Monitor PR #33219 ("Bind Node toolcache into AWF chroot for Copilot-engine workflow startup reliability") - if rebase API warnings persist, surface mergeable_state to operator
Track markdown_fence_count_mismatch warnings - if frequency grows, open a follow-up template-processing audit
Resume daily audit cadence (note: 2026-05-25 audit was not recorded)

Generated by 🔒 Safe Output Health Monitor · opus47 52.1M

expires on May 27, 2026, 5:51 AM UTC
