Skip to content

Detect unknown_model_ai_credits failure in conclusion job#38610

Merged
pelikhan merged 2 commits into
mainfrom
copilot/detect-unknown-model-ai-credits-failure
Jun 11, 2026
Merged

Detect unknown_model_ai_credits failure in conclusion job#38610
pelikhan merged 2 commits into
mainfrom
copilot/detect-unknown-model-ai-credits-failure

Conversation

Copilot AI commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

When max-ai-credits is active and the requested model isn't in the built-in pricing table, the AWF API proxy returns HTTP 400 with type unknown_model_ai_credits. Previously this produced only a generic fallback message in the conclusion comment. Reference: run 27346208956

Detection pipeline

Follows the same pattern as ai_credits_rate_limit_error:

  • ai_credits_context.cjsparseUnknownModelAICreditsFromAuditLog walks firewall audit JSONL for "type": "unknown_model_ai_credits"
  • parse_mcp_gateway_log.cjs — detects across all log paths (gateway.jsonl, rpc-messages.jsonl, gateway.md, legacy logs), emits unknown_model_ai_credits step output
  • compiler_main_job.go — exposes it as an agent job output from parse-mcp-gateway
  • notify_comment.go — passes GH_AW_UNKNOWN_MODEL_AI_CREDITS env var to the conclusion job

Failure message

New template unknown_model_ai_credits.md explains the error is a non-transient config failure (not worth retrying) and offers three remediation paths:

# Option 1: alias to a known model via `models`
model: my-custom-model
max-ai-credits: 500
models:
  my-custom-model:
    model: gpt-4.1

# Option 2: use a model already in the pricing table

# Option 3: supply a fallback price
models:
  my-custom-model:
    defaultAiCreditsPricing: 3.0
  • handle_agent_failure.cjs — reads env var, adds unknown_model_ai_credits failure category, new buildUnknownModelAICreditsContext wired into both comment and issue template contexts
  • agent_failure_comment.md / agent_failure_issue.md — insert {unknown_model_ai_credits_context} after {ai_credits_rate_limit_error_context}

Copilot AI and others added 2 commits June 11, 2026 12:50
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI requested a review from pelikhan June 11, 2026 13:04
@pelikhan pelikhan marked this pull request as ready for review June 11, 2026 13:04
Copilot AI review requested due to automatic review settings June 11, 2026 13:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves failure reporting when max-ai-credits is enabled and the requested model is not present in the built-in AI credits pricing table by surfacing an unknown_model_ai_credits signal to the conclusion job and ensuring it can be rendered in the failure comment/issue templates.

Changes:

  • Expose unknown_model_ai_credits as an agent job output sourced from parse-mcp-gateway, and forward it to the conclusion job via GH_AW_UNKNOWN_MODEL_AI_CREDITS.
  • Update agent failure templates to include {unknown_model_ai_credits_context} alongside existing AI-credits failure contexts.
  • Add a focused compiler/lockfile assertion test and refresh golden/lockfile fixtures to include the new output wiring.
Show a summary per file
File Description
pkg/workflow/compiler_main_job.go Adds unknown_model_ai_credits to the agent job outputs map, sourced from the parse-mcp-gateway step.
pkg/workflow/notify_comment.go Passes GH_AW_UNKNOWN_MODEL_AI_CREDITS into the conclusion job environment from agent outputs.
pkg/workflow/unknown_model_ai_credits_test.go Adds a regression test asserting the output/env-var plumbing appears in compiled lock output.
actions/setup/md/agent_failure_comment.md Includes {unknown_model_ai_credits_context} in the failure comment template insertion point.
actions/setup/md/agent_failure_issue.md Includes {unknown_model_ai_credits_context} in the failure issue template insertion point.
pkg/workflow/testdata/TestWasmGolden_CompileFixtures/basic-copilot.golden Updates golden to include the new agent output.
pkg/workflow/testdata/TestWasmGolden_CompileFixtures/with-imports.golden Updates golden to include the new agent output.
pkg/workflow/testdata/TestWasmGolden_CompileFixtures/smoke-copilot.golden Updates golden to include the new agent output.
pkg/workflow/testdata/TestWasmGolden_CompileFixtures/playwright-cli-mode.golden Updates golden to include the new agent output.
pkg/workflow/testdata/TestWasmGolden_AllEngines/pi.golden Updates golden to include the new agent output.
pkg/workflow/testdata/TestWasmGolden_AllEngines/gemini.golden Updates golden to include the new agent output.
pkg/workflow/testdata/TestWasmGolden_AllEngines/copilot.golden Updates golden to include the new agent output.
pkg/workflow/testdata/TestWasmGolden_AllEngines/codex.golden Updates golden to include the new agent output.
pkg/workflow/testdata/TestWasmGolden_AllEngines/claude.golden Updates golden to include the new agent output.
.github/workflows/*.lock.yml Regenerates workflow lockfiles to propagate the new agent output and conclusion env var wiring across locked workflows.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 265/265 changed files
  • Comments generated: 0

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #38610 does not have the "implementation" label (has_implementation_label=false) and has only 40 new lines of code in business logic directories, below the 100-line threshold (requires_adr_by_default_volume=false).

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer was skipped during the skills-based review.

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions github-actions Bot mentioned this pull request Jun 11, 2026
@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

⚠️ Test Quality Score: 72/100 — Acceptable

Analyzed 11 test(s): 11 design, 0 implementation, 0 guideline violation(s). Marginal test inflation detected in 2 JS files (just over 2:1 ratio).

📊 Metrics & Test Classification (11 tests analyzed)
Metric Value
New/modified tests analyzed 11
✅ Design tests (behavioral contracts) 11 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 8 (73%)
Duplicate test clusters 0
Test inflation detected ⚠️ Yes — marginal (ai_credits_context: 2.03:1, handle_agent_failure: 2.09:1)
🚨 Coding-guideline violations 0

Test Classification Details

Test File Classification Issues Detected
detects unknown_model_ai_credits type in audit log ai_credits_context.test.cjs ✅ Design None
detects unknown_model_ai_credits using audit.jsonl filename ai_credits_context.test.cjs ✅ Design None
detects unknown_model_ai_credits in a multi-entry log ai_credits_context.test.cjs ✅ Design None
returns false when no unknown_model_ai_credits signal is present ai_credits_context.test.cjs ✅ Design None
returns false for missing audit log ai_credits_context.test.cjs ✅ Design None
returns false for empty audit log ai_credits_context.test.cjs ✅ Design None
does not detect other error types as unknown_model_ai_credits ai_credits_context.test.cjs ✅ Design None
returns empty string when no unknown_model_ai_credits error handle_agent_failure.test.cjs ✅ Design None
returns template content when error is true and template exists handle_agent_failure.test.cjs ✅ Design None
throws when template is missing handle_agent_failure.test.cjs ✅ Design None
TestUnknownModelAICreditsOutput pkg/workflow/unknown_model_ai_credits_test.go ✅ Design Happy-path only

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 1 test — unit (//go:build !integration) ✅
  • 🟨 JavaScript (*.test.cjs): 10 tests (vitest)
⚠️ Notes — Marginal Observations (1 item)

⚠️ Test Inflation — Marginal (ai_credits_context, handle_agent_failure)

Classification: Metric note (not a guideline violation)
Observation: Both JS test files added slightly more than twice the lines of their corresponding production files:

  • ai_credits_context.test.cjs added 71 lines vs 35 in ai_credits_context.cjs2.03:1
  • handle_agent_failure.test.cjs added 46 lines vs 22 in handle_agent_failure.cjs2.09:1

The ratios are only marginally above the 2:1 threshold and are attributable to the fact that thorough edge-case coverage (7 and 3 tests respectively) naturally requires more test scaffolding than production code. The tests themselves are high quality — the inflation ratio metric is flagged for completeness, not as a concern.

📝 TestUnknownModelAICreditsOutput — Happy-path only

Classification: Design test (no deduction)
Observation: The Go test verifies the compiled lock file contains both unknown_model_ai_credits output wiring strings, but does not include a negative case (e.g., verifying the string is absent when the feature is off). Consider adding an error-path or absent-feature row to the table-driven pattern used elsewhere in the package.

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All 11 tests enforce behavioral contracts. No coding-guideline violations detected.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §27348833330

🧪 Test quality analysis by Test Quality Sentinel · 300.9 AIC · ⌖ 20.8 AIC ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 72/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). All 11 tests enforce behavioral contracts with solid edge-case coverage. No coding-guideline violations detected.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUEST_CHANGES — One critical correctness bug must be fixed before merge.

Blocking issues

Critical: writeStepSummaryWithTokenUsage deleted from main path (parse_mcp_gateway_log.cjs ~line 994)

The new setUnknownModelAICreditsOutput line replaced writeStepSummaryWithTokenUsage instead of being inserted alongside it. On every normal workflow run (the main code path through main()), this means:

  • core.summary.write() is never called → the step summary is silently discarded
  • token-usage.jsonl is never read → GH_AW_AIC, aic, ambient_context outputs are never set
  • The unified event timeline is never appended

All three early-return paths (~lines 903, 934, 973) were updated correctly; only the terminal path at the bottom of main() was missed.

Medium: Redundant disk reads in hasUnknownModelAICreditsError (parse_mcp_gateway_log.cjs ~line 269)

The function unconditionally falls through to parseUnknownModelAICreditsFromAuditLog() when the in-memory pattern does not match, causing up to 5 sequential disk reads of the same audit log file (once per content source). This is inconsistent with hasAICreditsRateLimitError, which is content-only. See inline comment for the recommended refactor.

🔎 Code quality review by PR Code Quality Reviewer · ⌖ 21.9 AIC

}
setAICreditsRateLimitOutput(core, aiCreditsRateLimitError);
writeStepSummaryWithTokenUsage(core);
setUnknownModelAICreditsOutput(core, unknownModelAICredits);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped writeStepSummaryWithTokenUsage call breaks the main execution path: the step summary is never flushed, token usage (GH_AW_AIC, aic, ambient_context) is never exported, and the unified event timeline is never appended — silently, on every normal workflow run.

💡 Suggested fix

The new line was intended to be added alongside the existing call, not instead of it. All three early-return paths (lines ~903, ~934, ~973) already do this correctly. The final path was missed:

    setAICreditsRateLimitOutput(core, aiCreditsRateLimitError);
    setUnknownModelAICreditsOutput(core, unknownModelAICredits);
+   writeStepSummaryWithTokenUsage(core);   // <-- must be re-added
  } catch (error) {

Without this call, core.summary.write() is never invoked, so any content queued with core.summary.addRaw() in the main path is silently discarded. Token-usage.jsonl is never read, so downstream jobs that depend on aic / ambient_context outputs will receive empty values.

function hasUnknownModelAICreditsError(contents) {
const joined = contents.filter(Boolean).join("\n");
if (joined && UNKNOWN_MODEL_AI_CREDITS_PATTERNS.some(pattern => pattern.test(joined))) return true;
return parseUnknownModelAICreditsFromAuditLog();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasUnknownModelAICreditsError reads the audit log from disk on every call: it unconditionally falls through to parseUnknownModelAICreditsFromAuditLog() when the in-memory pattern check fails. Because the function is called for each content source (gateway.jsonl, rpc-messages.jsonl, gateway.md, gateway.log, stderr.log), this means up to 5 sequential disk reads of the audit log when no error is detected — inconsistent with hasAICreditsRateLimitError, which is content-only.

💡 Suggested fix

Read the audit log once outside the per-source loop, using ||= to latch the result:

// Before the content-source loop, read audit log once
unknownModelAICredits ||= parseUnknownModelAICreditsFromAuditLog();

Then simplify hasUnknownModelAICreditsError to only check in-memory content (matching the shape of hasAICreditsRateLimitError):

function hasUnknownModelAICreditsError(contents) {
  const joined = contents.filter(Boolean).join("\n");
  return joined ? UNKNOWN_MODEL_AI_CREDITS_PATTERNS.some(p => p.test(joined)) : false;
}

This mirrors the existing architecture where audit-log parsing is centralised and done once, not embedded inside a per-content-block helper.

@pelikhan pelikhan merged commit 1a95104 into main Jun 11, 2026
110 checks passed
@pelikhan pelikhan deleted the copilot/detect-unknown-model-ai-credits-failure branch June 11, 2026 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants