docs(validation): main-chain validation suite (YAML, 43 cases) by lefarcen · Pull Request #3853 · nexu-io/open-design

lefarcen · 2026-06-08T03:45:33Z

Adds validation/main-chain-suite.yaml — open-design's "must work every release" regression contract / acceptance criteria, living alongside the product code.

What it is

43 cases (11 P0 / 26 P1 / 6 P2) across 14 chains; 10 of them are per_model cases that run against every AMR model. The suite was seeded from the 0.10.0 human acceptance round and from mining the source, e2e, and daemon tests; each case cites its sources for traceability.
The nexu-xray agent reads this contract from the repo it validates and runs each case autonomously; its results are aligned with the test team's human acceptance by case id to compute a deviation rate (R&D dashboard /daily/?view=yang, the "Agent autonomous validation vs human acceptance" card).
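The alignment by case id can be illustrated with a minimal sketch. It assumes the deviation rate is the fraction of case ids present in both result sets whose verdicts differ; the PR does not define the formula, and the function name, the verdict strings, and the `MC-00x` ids below are hypothetical.

```python
from typing import Dict


def deviation_rate(agent: Dict[str, str], human: Dict[str, str]) -> float:
    """Fraction of shared case ids where the agent and human verdicts differ.

    Assumption: only case ids present in BOTH result sets are compared.
    """
    shared = agent.keys() & human.keys()
    if not shared:
        raise ValueError("no case ids in common")
    mismatches = sum(1 for case_id in shared if agent[case_id] != human[case_id])
    return mismatches / len(shared)


# Hypothetical results keyed by case id.
agent_results = {"MC-001": "pass", "MC-002": "fail", "MC-003": "pass", "MC-004": "pass"}
human_results = {"MC-001": "pass", "MC-002": "pass", "MC-003": "pass", "MC-005": "fail"}

rate = deviation_rate(agent_results, human_results)
print(f"deviation rate: {rate:.2%}")
```

Cases that only one side ran (`MC-004`, `MC-005` here) are ignored in this sketch; a real dashboard might instead count them as deviations.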

Why it lives in the product repo

Each product ships its own validation contract; the tool (nexu-xray) is a generic runner. The contract belongs with the product code: it is owned by the product/QA team, updated in the same PR when a feature changes, and only ever grows (the findings of every human acceptance round are folded back into it).

Format

YAML: readable by humans and agents, diff-friendly, supports comments, and lighter on tokens than JSON. Migrated from corpus/main-chain-suite.json in the nexu-xray tool repo; this file is now the single source of truth.

See validation/README.md. Two new files only, no code changes.

🤖 Generated with Claude Code

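open-design's "must work every release" regression contract / acceptance criteria, living alongside the product code.
- validation/main-chain-suite.yaml: 43 cases (11 P0 / 26 P1 / 6 P2), 14 chains, 10 cases × all AMR models; seeded from the 0.10.0 acceptance round plus mining of the source, e2e, and daemon tests.
- The nexu-xray agent's autonomous validation reads it from this repo and runs it case by case; results are aligned with human acceptance by case id to compute a deviation rate (R&D dashboard, "Agent autonomous validation vs human acceptance" card).
- Design: each product ships its own validation contract and the tool is a generic runner, so the contract lives in the product repo.
- Format: YAML (human-readable, agent-readable, supports comments, saves tokens), migrated from the corpus in the nexu-xray tool repo.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>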

Siri-Ray

@lefarcen Thanks for moving this validation contract into the product repo. I checked the new YAML structure, case counts, duplicate IDs, declared chains, and the runtime registry references; I found one non-blocking consistency issue in the machine-readable agent list that is worth tightening before the runner depends on it.

_{🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.}
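The structural checks mentioned in this review (case counts, duplicate IDs, declared chains) could be scripted along the following lines. This is a sketch only: it works on an already-parsed list of case dicts, and the field names `id`, `priority`, and `chain` are assumptions about the suite schema, not taken from the YAML itself.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_suite(cases: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Tally cases by priority, count distinct chains, and list duplicate ids."""
    id_counts = Counter(case["id"] for case in cases)
    return {
        "total": len(cases),
        "by_priority": dict(Counter(case["priority"] for case in cases)),
        "chains": len({case["chain"] for case in cases}),
        "duplicate_ids": sorted(cid for cid, n in id_counts.items() if n > 1),
    }


# Tiny hypothetical sample; the real suite declares 43 cases across 14 chains.
sample_cases = [
    {"id": "MC-001", "priority": "P0", "chain": "chat"},
    {"id": "MC-002", "priority": "P1", "chain": "chat"},
    {"id": "MC-002", "priority": "P1", "chain": "export"},  # deliberate duplicate id
]

summary = summarize_suite(sample_cases)
print(summary)
```

Against the real file, the summary would be compared with the numbers in the PR description: 43 total, 11 P0 / 26 P1 / 6 P2, 14 chains, and no duplicate ids.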

Siri-Ray · 2026-06-08T03:50:46Z

+  per_model_focus: 'For each model: does execution complete WITHOUT error (no ''Connection reset by server'',
+    no ''socket connection'', no ''opencode event stream'' error), does it PRODUCE an output artifact,
+    and does retry-after-mid-run-error preserve prior thinking/output.'
+  amr_agents_registry:


NON_BLOCKING: amr_agents_registry is now a second machine-readable source for the runtime set, but it does not match the registry that the preceding enumerate text tells the runner to use. The current apps/daemon/src/runtimes/registry.ts BASE_AGENT_DEFS also includes amr, trae-cli, aider, antigravity, and reasonix, while this list stops at deepseek. If nexu-xray consumes this YAML field instead of dynamically reading BASE_AGENT_DEFS, those adapters will be skipped even though the contract says every release must regress the live catalog. Please either remove this hard-coded list and keep only the dynamic-enumeration rule, or update it to match BASE_AGENT_DEFS and add a note/test that keeps the YAML in sync with that source.


Siri-Ray

@lefarcen Thanks for moving the validation contract into the product repo and for converting the README/header context to English on this head. I rechecked the YAML parse, case counts, duplicate IDs, declared chains, and the runtime registry references; I found one non-blocking consistency issue that is still worth tightening before nexu-xray depends on this field.


Siri-Ray · 2026-06-09T12:26:27Z

+  per_model_focus: 'For each model: does execution complete WITHOUT error (no ''Connection reset by server'',
+    no ''socket connection'', no ''opencode event stream'' error), does it PRODUCE an output artifact,
+    and does retry-after-mid-run-error preserve prior thinking/output.'
+  amr_agents_registry:


NON_BLOCKING: This machine-readable amr_agents_registry still does not match the runtime source that lines 22-26 tell the runner to use. The current apps/daemon/src/runtimes/registry.ts BASE_AGENT_DEFS includes amr, trae-cli, aider, antigravity, and reasonix, but this YAML list starts at claude and stops at deepseek. If nexu-xray ever consumes this YAML field instead of dynamically enumerating BASE_AGENT_DEFS, those adapters will be skipped even though the contract says every release must regress the live AMR catalog. Please either remove this hard-coded registry field and keep only the dynamic-enumeration rule, or update it to match BASE_AGENT_DEFS and add a small sync check so the contract cannot drift again.
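A small sync check of the kind requested here might look like the sketch below. It assumes the agent ids have already been extracted from both sources (the YAML field and `BASE_AGENT_DEFS`); the extraction step is not shown, and the function name is hypothetical. The sample lists are abbreviated to the ids named in this comment, so they are not the full registries.

```python
from typing import Dict, Iterable, List


def registry_drift(yaml_agents: Iterable[str], runtime_agents: Iterable[str]) -> Dict[str, List[str]]:
    """Report ids that appear in only one of the two agent lists."""
    yaml_set, runtime_set = set(yaml_agents), set(runtime_agents)
    return {
        "missing_from_yaml": sorted(runtime_set - yaml_set),
        "stale_in_yaml": sorted(yaml_set - runtime_set),
    }


# Abbreviated, illustrative inputs: only ids mentioned in the review comment.
yaml_agents = ["claude", "deepseek"]
runtime_agents = ["claude", "deepseek", "amr", "trae-cli", "aider", "antigravity", "reasonix"]

drift = registry_drift(yaml_agents, runtime_agents)
print(drift)
# A CI step could fail the build whenever either list is non-empty.
in_sync = not drift["missing_from_yaml"] and not drift["stale_in_yaml"]
```

Run in CI, a check like this would surface the drift as soon as a new adapter lands in the runtime registry without a matching YAML update.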

lefarcen requested a review from Siri-Ray June 8, 2026 03:47

lefarcen added labels size/L (PR changes 300-700 lines), risk/medium (Medium risk: regular code changes), type/docs (Documentation changes only) Jun 8, 2026

Siri-Ray reviewed Jun 8, 2026


lefarcen changed the title ~~docs(validation): 主链路验证用例库 (YAML, 43 cases)~~ docs(validation): main-chain validation suite (YAML, 43 cases) Jun 8, 2026

docs(validation): English README + suite header (case content stays C…

6102bf8

…hinese) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Siri-Ray reviewed Jun 9, 2026


lefarcen closed this Jun 11, 2026

lefarcen wants to merge 2 commits into main from feat/main-chain-validation-suite


2 participants
