docs(validation): main-chain validation suite (YAML, 43 cases) #3853
lefarcen wants to merge 2 commits into
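open-design's "must work every release" regression contract / acceptance criteria, living alongside the product code.
- validation/main-chain-suite.yaml: 43 cases (11 P0 / 26 P1 / 6 P2), 14 chains, 10 cases × all AMR models; seeded from the 0.10.0 acceptance round plus mining of the source / e2e / daemon tests.
- The nexu-xray agent's autonomous validation reads it from this repo and runs it case by case; results are aligned with human acceptance by case id to compute a deviation rate (the "Agent autonomous validation vs human acceptance" card on the R&D dashboard).
- Design: each product carries its own validation contract and the tool is a generic runner, so the contract lives in the product repo.
- Format: YAML (readable by humans and agents, supports comments, saves tokens), migrated from the corpus in the nexu-xray tool repo.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>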
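The description mentions aligning agent results with human acceptance by case id to compute a deviation rate, but does not spell out the formula. A minimal sketch, assuming the rate is "verdicts that disagree / cases judged by both sides"; the `CaseResult` shape, the `Verdict` values, and the function name are hypothetical and not taken from nexu-xray or the suite:

```typescript
// Hypothetical result shape: these names are illustrative, not the suite's schema.
type Verdict = "pass" | "fail";

interface CaseResult {
  id: string; // case id used to align the two result sets
  verdict: Verdict;
}

interface Deviation {
  compared: number; // ids that have both an agent and a human verdict
  mismatched: string[]; // ids where the two verdicts differ
  rate: number; // mismatched / compared, or 0 when nothing is comparable
}

// Assumed definition of the deviation rate: share of jointly judged cases
// on which the agent and the human reviewer disagree.
function deviationRate(agent: CaseResult[], human: CaseResult[]): Deviation {
  const humanById = new Map<string, Verdict>();
  for (const r of human) humanById.set(r.id, r.verdict);

  let compared = 0;
  const mismatched: string[] = [];
  for (const r of agent) {
    const expected = humanById.get(r.id);
    if (expected === undefined) continue; // no human verdict for this id, skip it
    compared += 1;
    if (expected !== r.verdict) mismatched.push(r.id);
  }

  const rate = compared === 0 ? 0 : mismatched.length / compared;
  return { compared, mismatched, rate };
}
```

Cases seen by only one side are left out of the denominator here; whether the dashboard counts them as deviations is a separate decision the contract would need to state.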
Siri-Ray
left a comment
@lefarcen Thanks for moving this validation contract into the product repo. I checked the new YAML structure, case counts, duplicate IDs, declared chains, and the runtime registry references; I found one non-blocking consistency issue in the machine-readable agent list that is worth tightening before the runner depends on it.
🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.
per_model_focus: 'For each model: does execution complete WITHOUT error (no ''Connection reset by server'',
  no ''socket connection'', no ''opencode event stream'' error), does it PRODUCE an output artifact,
  and does retry-after-mid-run-error preserve prior thinking/output.'
amr_agents_registry:
NON_BLOCKING: amr_agents_registry is now a second machine-readable source for the runtime set, but it does not match the registry that the preceding enumeration rule tells the runner to use. The current apps/daemon/src/runtimes/registry.ts BASE_AGENT_DEFS also includes amr, trae-cli, aider, antigravity, and reasonix, while this list stops at deepseek. If nexu-xray consumes this YAML field instead of dynamically reading BASE_AGENT_DEFS, those adapters will be skipped even though the contract says every release must regress the live catalog. Please either remove this hard-coded list and keep only the dynamic-enumeration rule, or update it to match BASE_AGENT_DEFS and add a note or test that keeps the YAML in sync with that source.
…hinese) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Siri-Ray
left a comment
@lefarcen Thanks for moving the validation contract into the product repo and for converting the README/header context to English on this head. I rechecked the YAML parse, case counts, duplicate IDs, declared chains, and the runtime registry references; I found one non-blocking consistency issue that is still worth tightening before nexu-xray depends on this field.
per_model_focus: 'For each model: does execution complete WITHOUT error (no ''Connection reset by server'',
  no ''socket connection'', no ''opencode event stream'' error), does it PRODUCE an output artifact,
  and does retry-after-mid-run-error preserve prior thinking/output.'
amr_agents_registry:
NON_BLOCKING: This machine-readable amr_agents_registry still does not match the runtime source that lines 22-26 tell the runner to use. The current apps/daemon/src/runtimes/registry.ts BASE_AGENT_DEFS includes amr, trae-cli, aider, antigravity, and reasonix, but this YAML list starts at claude and stops at deepseek. If nexu-xray ever consumes this YAML field instead of dynamically enumerating BASE_AGENT_DEFS, those adapters will be skipped even though the contract says every release must regress the live AMR catalog. Please either remove this hard-coded registry field and keep only the dynamic-enumeration rule, or update it to match BASE_AGENT_DEFS and add a small sync check so the contract cannot drift again.
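The "small sync check" suggested above could look roughly like the following. This is only a sketch under stated assumptions: it takes both sides as plain arrays of agent ids, since the thread shows neither the shape of the BASE_AGENT_DEFS entries nor how the YAML field is loaded, and the names `registryDrift` and `RegistryDrift` are hypothetical. The sample ids are limited to the ones named in this review, not the full lists.

```typescript
// Hypothetical helper: reports where the YAML list and the runtime registry disagree.
interface RegistryDrift {
  missingFromYaml: string[]; // in the runtime registry, absent from the YAML list
  unknownInYaml: string[]; // in the YAML list, absent from the runtime registry
}

function registryDrift(
  yamlIds: readonly string[],
  registryIds: readonly string[],
): RegistryDrift {
  const yamlSet = new Set(yamlIds);
  const registrySet = new Set(registryIds);
  return {
    missingFromYaml: registryIds.filter((id) => !yamlSet.has(id)),
    unknownInYaml: yamlIds.filter((id) => !registrySet.has(id)),
  };
}

// Abbreviated illustration using only ids mentioned in the review comment.
const yamlAgents = ["claude", "deepseek"];
const runtimeAgents = [
  "claude",
  "deepseek",
  "amr",
  "trae-cli",
  "aider",
  "antigravity",
  "reasonix",
];

const drift = registryDrift(yamlAgents, runtimeAgents);
if (drift.missingFromYaml.length > 0 || drift.unknownInYaml.length > 0) {
  // A real check would fail the test run here instead of logging.
  console.log("amr_agents_registry drift:", JSON.stringify(drift));
}
```

Reporting both directions matters: an id left in the YAML after an adapter is removed from the registry is drift too, and the runner would otherwise try to regress a runtime that no longer exists.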
Adds validation/main-chain-suite.yaml, open-design's "must work every release" regression contract / acceptance criteria, living alongside the product code.
What it is
- 43 cases (11 P0 / 26 P1 / 6 P2) across 14 chains (10 per_model × all AMR models), seeded from the 0.10.0 human acceptance + mining the source / e2e / daemon tests; each case cites its sources for traceability.
- The nexu-xray agent's autonomous validation reads the suite from this repo and runs it case by case; results are aligned with human acceptance by case id to compute a deviation rate (R&D dashboard /daily/?view=yang, the "Agent autonomous validation vs human acceptance" card).
Why it lives in the product repo
Each product ships its own validation contract; the tool (nexu-xray) is a generic runner. The contract belongs with the product code — owned by the product/QA team, updated in the same PR when a feature changes, and only ever grows (every human acceptance round is sedimented back into it).
Format
YAML: readable by humans and agents, diff-friendly, supports comments, and lighter than JSON. Migrated from the nexu-xray tool repo's corpus/main-chain-suite.json; this is now the single source of truth.
See validation/README.md. Two new files only, no code changes.
🤖 Generated with Claude Code