docs(validation): main-chain validation suite (YAML, 43 cases) #3853

Closed
lefarcen wants to merge 2 commits into main from feat/main-chain-validation-suite

Conversation

@lefarcen lefarcen commented Jun 8, 2026

Contributor

Adds validation/main-chain-suite.yaml — open-design's "must work every release" regression contract / acceptance criteria, living alongside the product code.

What it is

  • 43 cases (11 P0 / 26 P1 / 6 P2) across 14 chains, of which 10 are per_model cases run against every AMR model. The suite was seeded from the 0.10.0 human acceptance round and from mining the source, e2e, and daemon tests; each case cites its sources for traceability.
  • The nexu-xray agent reads this contract from the repo it validates and runs each case autonomously; its results are aligned with the test team's human acceptance by case id to compute a deviation rate (R&D dashboard /daily/?view=yang, the "Agent autonomous validation vs human acceptance" card).
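The case-id alignment described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual formula: it assumes the deviation rate is the fraction of case ids present in both result sets whose verdicts differ, and the function name, case ids, and verdict strings are all hypothetical.

```python
from typing import Dict


def deviation_rate(agent: Dict[str, str], human: Dict[str, str]) -> float:
    """Fraction of shared case ids where agent and human verdicts differ.

    Assumption: cases seen by only one side are ignored rather than
    counted as deviations.
    """
    shared = agent.keys() & human.keys()
    if not shared:
        raise ValueError("no case ids in common; nothing to compare")
    mismatches = sum(1 for case_id in shared if agent[case_id] != human[case_id])
    return mismatches / len(shared)


# Hypothetical case ids and verdicts, for illustration only.
agent_results = {"case-a": "pass", "case-b": "fail", "case-c": "pass", "case-d": "pass"}
human_results = {"case-a": "pass", "case-b": "pass", "case-c": "pass", "case-e": "fail"}

# Shared ids are case-a, case-b, case-c; only case-b disagrees.
rate = deviation_rate(agent_results, human_results)
```

Whether unmatched cases should count as deviations is a design choice the PR text does not settle; the sketch leaves them out so that the rate reflects only cases both sides actually judged.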

Why it lives in the product repo

Each product ships its own validation contract; the tool (nexu-xray) is a generic runner. The contract belongs with the product code: it is owned by the product/QA team, updated in the same PR when a feature changes, and only ever grows (every human acceptance round is folded back into it).
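The "only ever grows" rule could be enforced with a check along these lines. It is a sketch under assumptions: it treats the rule as "no case id present in the previous version may disappear", and the function name and the way the id lists are obtained are hypothetical.

```python
from typing import Iterable, Set


def removed_case_ids(previous: Iterable[str], current: Iterable[str]) -> Set[str]:
    """Return case ids that existed in the previous suite but are gone now.

    An empty result means the suite only grew (or stayed the same).
    """
    return set(previous) - set(current)


# Hypothetical ids: the suite gained one case and lost none.
lost = removed_case_ids(["case-a", "case-b"], ["case-a", "case-b", "case-c"])
```

A CI step could fail the PR whenever the returned set is non-empty, which keeps removals deliberate rather than accidental.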

Format

YAML: readable by humans and agents, diff-friendly, comment-friendly, and lighter than JSON (fewer tokens for an agent to read). Migrated from the nexu-xray tool repo's corpus/main-chain-suite.json; this is now the single source of truth.

See validation/README.md. Two new files only, no code changes.

🤖 Generated with Claude Code

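open-design's "must work every release" regression contract / acceptance criteria, living alongside the product code.

- validation/main-chain-suite.yaml: 43 cases (11 P0 / 26 P1 / 6 P2), 14 chains,
  10 cases × all AMR models; seeded from the 0.10.0 acceptance round plus mining of the source/e2e/daemon tests.
- The nexu-xray agent's autonomous validation reads it from this repo and runs each case; results are aligned with human acceptance by case id to compute a deviation rate
  (the "Agent autonomous validation vs human acceptance" card on the R&D dashboard).
- Design: each product ships its own validation contract and the tool is a generic runner, so the contract lives in the product repo.
- Format: YAML (human-readable, agent-readable, supports comments, saves tokens), migrated from the nexu-xray tool repo's corpus.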

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lefarcen lefarcen requested a review from Siri-Ray June 8, 2026 03:47
@lefarcen lefarcen added the size/L (PR changes 300-700 lines), risk/medium (Medium risk: regular code changes), and type/docs (Documentation changes only) labels Jun 8, 2026

@Siri-Ray Siri-Ray left a comment

Contributor

@lefarcen Thanks for moving this validation contract into the product repo. I checked the new YAML structure, case counts, duplicate IDs, declared chains, and the runtime registry references; I found one non-blocking consistency issue in the machine-readable agent list that is worth tightening before the runner depends on it.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

per_model_focus: 'For each model: does execution complete WITHOUT error (no ''Connection reset by server'',
no ''socket connection'', no ''opencode event stream'' error), does it PRODUCE an output artifact,
and does retry-after-mid-run-error preserve prior thinking/output.'
amr_agents_registry:

NON_BLOCKING: amr_agents_registry is now a second machine-readable source for the runtime set, but it does not match the registry that the preceding enumeration rule tells the runner to use. The current apps/daemon/src/runtimes/registry.ts BASE_AGENT_DEFS also includes amr, trae-cli, aider, antigravity, and reasonix, while this list stops at deepseek. If nexu-xray consumes this YAML field instead of dynamically reading BASE_AGENT_DEFS, those adapters will be skipped even though the contract says every release must regress the live catalog. Please either remove this hard-coded list and keep only the dynamic-enumeration rule, or update it to match BASE_AGENT_DEFS and add a note/test that keeps the YAML in sync with that source.
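For context, the per_model_focus rule quoted with this comment could be evaluated per adapter roughly as below. This is only a sketch: the three error strings come from the quoted YAML, but the function name, the idea of scanning a plain-text run log, and the artifact-path argument are assumptions about how a runner might apply the rule. The third criterion (retry after a mid-run error preserves prior thinking/output) is not covered here.

```python
from typing import Optional

# Error markers taken from the per_model_focus text in the YAML.
FORBIDDEN_ERRORS = (
    "Connection reset by server",
    "socket connection",
    "opencode event stream",
)


def model_run_ok(log_text: str, artifact_path: Optional[str]) -> bool:
    """True if the run log has none of the known errors and an artifact exists.

    Hypothetical helper: artifact_path is whatever the runner recorded as
    the output artifact, or None/empty when nothing was produced.
    """
    no_errors = not any(marker in log_text for marker in FORBIDDEN_ERRORS)
    return no_errors and bool(artifact_path)


# Hypothetical run: clean log and an output file was produced.
ok = model_run_ok("generation finished", "out/index.html")
```

Running this for every adapter only makes sense if the adapter list itself is complete, which is the concern raised in the comment above.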

@lefarcen lefarcen changed the title from "docs(validation): 主链路验证用例库 (YAML, 43 cases)" to "docs(validation): main-chain validation suite (YAML, 43 cases)" Jun 8, 2026
…hinese)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@Siri-Ray Siri-Ray left a comment

Contributor

@lefarcen Thanks for moving the validation contract into the product repo and for converting the README/header context to English on this head. I rechecked the YAML parse, case counts, duplicate IDs, declared chains, and the runtime registry references; I found one non-blocking consistency issue that is still worth tightening before nexu-xray depends on this field.
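The structural checks mentioned in this review (case counts, duplicate IDs, declared chains) can be sketched as a single pass over the parsed cases. This is an illustration under assumptions, not the reviewer's actual procedure: it assumes each case has already been parsed into a dict with `id`, `priority`, and `chain` keys, which is a hypothetical schema, and the function name is invented.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def check_suite(
    cases: Iterable[Dict[str, str]],
    declared_chains: Set[str],
    expected_counts: Dict[str, int],
) -> List[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    cases = list(cases)
    problems: List[str] = []

    # Duplicate case ids.
    id_counts = Counter(case["id"] for case in cases)
    for case_id, count in sorted(id_counts.items()):
        if count > 1:
            problems.append(f"duplicate id: {case_id}")

    # Per-priority counts against what the PR description claims.
    priority_counts = Counter(case["priority"] for case in cases)
    for priority, expected in sorted(expected_counts.items()):
        actual = priority_counts.get(priority, 0)
        if actual != expected:
            problems.append(f"{priority}: expected {expected}, found {actual}")

    # Every case must point at a declared chain.
    for case in cases:
        if case["chain"] not in declared_chains:
            problems.append(f"{case['id']}: undeclared chain {case['chain']}")

    return problems


# Toy data with hypothetical ids and chains.
toy_cases = [
    {"id": "x1", "priority": "P0", "chain": "a"},
    {"id": "x1", "priority": "P1", "chain": "b"},
]
toy_problems = check_suite(toy_cases, {"a"}, {"P0": 1, "P1": 1})
```

For the suite in this PR, `expected_counts` would be `{"P0": 11, "P1": 26, "P2": 6}` per the description, and the declared chain set would have 14 entries.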

per_model_focus: 'For each model: does execution complete WITHOUT error (no ''Connection reset by server'',
no ''socket connection'', no ''opencode event stream'' error), does it PRODUCE an output artifact,
and does retry-after-mid-run-error preserve prior thinking/output.'
amr_agents_registry:

NON_BLOCKING: This machine-readable amr_agents_registry still does not match the runtime source that lines 22-26 tell the runner to use. The current apps/daemon/src/runtimes/registry.ts BASE_AGENT_DEFS includes amr, trae-cli, aider, antigravity, and reasonix, but this YAML list starts at claude and stops at deepseek. If nexu-xray ever consumes this YAML field instead of dynamically enumerating BASE_AGENT_DEFS, those adapters will be skipped even though the contract says every release must regress the live AMR catalog. Please either remove this hard-coded registry field and keep only the dynamic-enumeration rule, or update it to match BASE_AGENT_DEFS and add a small sync check so the contract cannot drift again.
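A sync check of the kind requested here might look like the sketch below. It assumes both id lists have already been extracted (one from the YAML field, one from BASE_AGENT_DEFS); how that extraction works is not shown and the function name is hypothetical. The sample ids are partial and illustrative: only the names mentioned in this review are used, not the full contents of either list.

```python
from typing import Dict, Iterable, List


def registry_drift(
    yaml_ids: Iterable[str], registry_ids: Iterable[str]
) -> Dict[str, List[str]]:
    """Report ids present on one side but not the other, sorted for stable output."""
    yaml_set, registry_set = set(yaml_ids), set(registry_ids)
    return {
        "missing_from_yaml": sorted(registry_set - yaml_set),
        "unknown_in_yaml": sorted(yaml_set - registry_set),
    }


# Illustrative subset only: the YAML list is described as running from
# claude to deepseek; the five extra ids are the ones named in the review.
yaml_ids = ["claude", "deepseek"]
registry_ids = [
    "claude", "deepseek", "amr", "trae-cli", "aider", "antigravity", "reasonix",
]
drift = registry_drift(yaml_ids, registry_ids)
```

A test that fails whenever either list in the result is non-empty would keep the hard-coded field from drifting; dropping the field and relying only on dynamic enumeration avoids the need for the check altogether.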

@lefarcen lefarcen closed this Jun 11, 2026
Labels

risk/medium (Medium risk: regular code changes), size/L (PR changes 300-700 lines), type/docs (Documentation changes only)

2 participants