feat(research-ops): add enterprise Research Operations domain (v2.9.0) #752
New top-level research-ops/ domain: the managed, cross-functional counterpart to the academic research/ domain. Single domain plugin (commercial/ pattern): orchestrator (context: fork) + 4 sub-skills: clinical-research (study design), research-finance (R&D program finance), market-research (sizing/survey/segmentation methodology), and product-research (user-research methods + insight repository). Ships 12 stdlib-only deterministic Python tools (all pass --help/--sample/--output json), 12 reference docs (5-7 sources each), 4 asset templates, a cs-research-ops-orchestrator agent, and 6 commands (router + grill + 4 per-skill). Hard rules are enforced via persona + anti-patterns: clinical outputs are estimates with a named clinical owner, finance routes capitalize-vs-expense decisions to a named finance owner, market sizes show the method (both ways) + assumptions, and product insights require cross-participant recurrence. Explicit distinct_from framing keeps the domain clear of ra-qm-team, finance, research/grants, product-team, and marketing-skill. Registered in marketplace.json (61 -> 62 plugins) and the codex/gemini/hermes/vibe sync scripts + generate-docs; root CLAUDE.md bumped to v2.9.0.
https://claude.ai/code/session_01PUNmQVE4WYvcrzpq2anC3D
Review: feat(research-ops), enterprise Research Operations domain (v2.9.0)
Overall this is a well-designed PR following the …
Blocking Issues (fix before merge to dev)
1. File handle leak in all 12 scripts. Every script uses: …
Affects all 12 scripts: …
2. Dead variable in …
3. …
4. No JSON parse error handling on … Malformed JSON produces a raw …
Non-blocking Issues (can follow in v2.9.1)
5. …
6. Very long line in …
The Test Plan Gap
7. Plugin-audit unchecked: recommended before dev→main
Minor
8. Authorship inconsistency in orchestrator SKILL.md frontmatter
Positive Findings
Verdict: Approve, with items 1–4 addressed before merge. Items 5–8 can be a follow-up, though item 7 is strongly recommended before dev→main. The domain design and hard-rule implementation are solid.
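Items 1 and 4 touch the same few lines in every script, so one shared helper could fix both. A minimal sketch, assuming each script reads a single JSON path from argparse; `load_json` is a hypothetical name, not a function that exists in the PR:

```python
import json
import sys
from typing import Any


def load_json(path: str) -> Any:
    """Read a JSON input file, closing the handle deterministically.

    Exits with a one-line stderr message instead of a raw traceback when the
    file is missing, unreadable, or malformed.
    """
    try:
        # The with-block closes the file even if json.load raises.
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except OSError as exc:
        sys.exit(f"error: cannot read {path}: {exc.strerror}")
    except json.JSONDecodeError as exc:
        sys.exit(f"error: {path} is not valid JSON (line {exc.lineno}, column {exc.colno}): {exc.msg}")
```

A script would then call `data = load_json(args.input)`. Passing a string to `sys.exit` prints it to stderr and exits with status 1.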
…utoresearch bridge
Connects each research-ops sub-skill to engineering/autoresearch-agent and adds
per-skill onboarding with a customization approach that the tools actually consume.
Each sub-skill (clinical-research, research-finance, market-research,
product-research) now ships three integration scripts:
- onboard.py — its own onboarding questionnaire; interactive or
--defaults / --set key=value / --reset / --scope; writes
~/.config/research-ops/<skill>.json (global) or
./.research-ops/<skill>.json (project).
- config_loader.py — loads that config (project > global > defaults;
RESEARCH_OPS_NO_CONFIG=1 bypass). Every scoring tool now
reads it so saved answers change behavior: default profile,
thresholds (alpha/power/dropout, F&A rate, runway, confidence,
MoE, insight source-threshold), and named owners printed on
clinical/finance outputs. CLI flags always override.
- ar_evaluator.py — an isolated, OPT-IN ground-truth evaluator bridging to
autoresearch. The loop edits the skill's input file and the
evaluator (never edited) scores it: clinical
feasibility_composite (higher), finance runway_months (higher),
market tam_divergence (lower), product validated_insights
(higher). No cross-skill coupling; invoked only on explicit
user request.
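The precedence described for config_loader.py amounts to a layered dict merge. A sketch under stated assumptions: configs are flat JSON objects, `DEFAULTS`, `config_paths`, and `load_config` are hypothetical names, and only the exact value `1` triggers the bypass:

```python
import json
import os
from pathlib import Path
from typing import Any, Optional

# Hypothetical built-in defaults; the real keys are defined per skill.
DEFAULTS: dict[str, Any] = {"alpha": 0.05, "power": 0.8, "dropout": 0.15}


def config_paths(skill: str, cwd: Path, home: Path) -> list[Path]:
    """Candidate config files, lowest precedence first (global, then project)."""
    return [
        home / ".config" / "research-ops" / f"{skill}.json",
        cwd / ".research-ops" / f"{skill}.json",
    ]


def load_config(skill: str, cwd: Path, home: Path,
                cli: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Layer defaults < global < project < CLI flags."""
    merged = dict(DEFAULTS)
    if os.environ.get("RESEARCH_OPS_NO_CONFIG") != "1":
        for path in config_paths(skill, cwd, home):
            if path.is_file():
                with path.open(encoding="utf-8") as f:
                    # A later (project) file overrides an earlier (global) one.
                    merged.update(json.load(f))
    # CLI values of None mean "flag not given", so they never mask a saved answer.
    merged.update({k: v for k, v in (cli or {}).items() if v is not None})
    return merged
```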
SKILL.md (each), the orchestrator, the agent, per-skill commands, and the domain
CLAUDE.md document the onboarding + opt-in autoresearch handoff. plugin.json /
marketplace describe 24 tools (12 analysis + 12 integration).
Builds the cross-platform plugins: codex symlinks + gemini SKILL.md mirrors +
vibe symlinks materialized for all 5 research-ops skills; indices regenerated.
Root CLAUDE.md scope/highlights updated.
https://claude.ai/code/session_01PUNmQVE4WYvcrzpq2anC3D
Code Review of PR #752: feat(research-ops): enterprise Research Operations domain (v2.9.0)
Overall this is a well-structured, ambitious PR that follows the repo's established patterns cleanly. The domain design, plugin packaging, and SKILL.md quality are all strong. A few issues need attention before merge; one regression and one class of code bug are the main blockers.
✅ What's done well
🔴 Critical: File handle leaks (6 scripts)
Multiple analysis scripts open JSON files without a context manager:
```python
import json

# ❌ current pattern: the handle stays open until garbage collection closes it
data = json.load(open(args.input))

# ✅ correct: the with-block closes the file as soon as it exits
with open(args.input) as f:
    data = json.load(f)
```
Affected scripts:
This is a straightforward fix across all six files.
🔴 Possible regression: …

| Skill | Before | After |
|---|---|---|
| review | engineering-team/playwright-pro/skills/review | engineering-team/self-improving-agent/skills/review |
| run | engineering/agenthub/skills/run | engineering/autoresearch-agent/skills/run |
| status | engineering/agenthub/skills/status | engineering/autoresearch-agent/skills/status |
These are distinct skills with distinct descriptions — the swap doesn't look intentional. If the sync script regenerated these from a non-deterministic dict or glob, the churn should be reverted. Please confirm whether this was intended refactoring or a sync artifact, and revert if the latter.
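If the churn is a sync artifact, one way to make the index reproducible is to sort the discovered paths and report name collisions instead of letting whichever path is visited last silently win. A sketch only; `build_skill_index` is hypothetical and the real sync scripts' structure is not shown in this PR:

```python
from collections import defaultdict
from pathlib import Path


def build_skill_index(skill_dirs: list[str]) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Map skill name -> path deterministically and report name collisions.

    Sorting the input first means the same directory tree always yields the
    same index, regardless of filesystem or glob iteration order.
    """
    by_name: dict[str, list[str]] = defaultdict(list)
    for path in sorted(skill_dirs):
        by_name[Path(path).name].append(path)
    # Policy choice: the first path in lexical order wins.
    index = {name: paths[0] for name, paths in sorted(by_name.items())}
    collisions = {name: paths for name, paths in by_name.items() if len(paths) > 1}
    return index, collisions
```

Under this policy the first path in lexical order wins, which for the three rows above is the Before entry. A stricter script could refuse to write the index while `collisions` is non-empty.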
🟡 High — Input validation gaps
capex_vs_opex_router.py — items with an empty or missing criteria dict are routed without flagging the input as incomplete. At minimum, surface a warning rather than silently proceeding with default behavior.
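A minimal sketch of the suggested warning, assuming each routed item is a dict with an optional `name` and a `criteria` dict (the router's real schema is not shown here); `incomplete_items` and `warn_incomplete` are hypothetical names:

```python
import sys
from typing import Any


def incomplete_items(items: list[dict[str, Any]]) -> list[str]:
    """Warnings for items whose `criteria` is missing, empty, or not a dict."""
    warnings = []
    for i, item in enumerate(items):
        criteria = item.get("criteria")
        if not isinstance(criteria, dict) or not criteria:
            label = item.get("name", f"item[{i}]")
            warnings.append(f"{label}: criteria missing or empty; routing is based on defaults only")
    return warnings


def warn_incomplete(items: list[dict[str, Any]]) -> int:
    """Print each warning to stderr and return how many items were flagged."""
    warnings = incomplete_items(items)
    for w in warnings:
        print(f"warning: {w}", file=sys.stderr)
    return len(warnings)
```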
insight_synthesizer.py — observations missing the participant key silently default to "UNKNOWN" rather than raising an error or warning. A malformed observations list could produce misleading output (e.g., a single participant contributing to multiple "recurrences").
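Requiring the participant key and counting distinct participants per theme keeps one person from manufacturing a recurrence. A sketch assuming observations shaped like `{"participant": ..., "theme": ...}`; `recurring_themes` is hypothetical, and its `min_participants` argument stands in for the configurable insight source-threshold:

```python
from collections import defaultdict
from typing import Any


def recurring_themes(observations: list[dict[str, Any]],
                     min_participants: int = 3) -> dict[str, list[str]]:
    """Keep themes observed by at least `min_participants` distinct participants.

    Raises ValueError on an observation without a participant, instead of
    pooling it under a shared placeholder ID.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for i, obs in enumerate(observations):
        participant = obs.get("participant")
        if not participant:
            raise ValueError(f"observation {i} has no 'participant'")
        # A set, so repeated quotes from one participant count once.
        seen[obs["theme"]].add(participant)
    return {theme: sorted(ps) for theme, ps in seen.items() if len(ps) >= min_participants}
```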
🟡 Medium — Silent config failure in config_loader.py
_read_json() catches both OSError and JSONDecodeError and returns None silently. If a saved config is unreadable (e.g., permission denied), the tool falls back to defaults with no user-visible warning. This makes it hard to diagnose misconfiguration. A stderr warning on fallback would help.
🟡 Medium — Hardcoded constants without parametrization
- `phase_gate_scorer.py`: recruitment nominal capacity (25.0 enroll/site/yr) is hardcoded with no flag to override.
- `endpoint_selector.py`: the surrogate endpoint penalty weight (0.7) is hardcoded.

Both values are domain-specific and could reasonably vary by trial type; exposing them as `--profile`-settable config keys would be consistent with the customization pattern already in place.
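One way to expose these constants, assuming the CLI > config > built-in precedence the PR already uses. The flag names, key names, `resolve`, and `parse` are hypothetical; only the values 25.0 and 0.7 come from the review:

```python
import argparse
from typing import Any, Optional

# Current hardcoded values, kept as the built-in fallback.
BUILTIN = {"recruitment_per_site_year": 25.0, "surrogate_penalty": 0.7}


def resolve(key: str, cli_value: Optional[float], config: dict[str, Any]) -> float:
    """CLI flag > saved config > built-in constant."""
    if cli_value is not None:
        return cli_value
    return float(config.get(key, BUILTIN[key]))


def parse(argv: list[str]) -> argparse.Namespace:
    p = argparse.ArgumentParser()
    # default=None distinguishes "flag omitted" from an explicit value.
    p.add_argument("--recruitment-per-site-year", type=float, default=None)
    p.add_argument("--surrogate-penalty", type=float, default=None)
    return p.parse_args(argv)
```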
🟡 Medium — Edge cases in numerical tools
- `sample_size_estimator.py`: no guard for `dropout=1.0` (produces division by zero) or for `alpha`/`power` combinations that yield an `inf` sample size.
- `market_sizer.py`: the `tam_divergence` calculation has a zero-denominator risk if `top_down_tam=0`; the bottom-up path silently returns `implied_customers_at_SOM: None` when `price=0` rather than raising `ValueError`.
- `program_budget_planner.py`: silent `0.0` fallback when the periods count does not match the length of the amounts array; a warning would surface data entry errors.
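A sketch of the missing guards. The sample-size formula here is the standard normal-approximation formula for a two-sided, two-sample comparison of means with standardized effect size d; whether `sample_size_estimator.py` uses this design is an assumption, as are the `tam_divergence` formula (relative gap to the top-down estimate) and all function names. Note that `statistics.NormalDist.inv_cdf` already rejects probabilities of exactly 0 or 1, so validating up front mainly turns that into a clear message:

```python
import math
from statistics import NormalDist


def per_arm_sample_size(effect_size: float, alpha: float, power: float, dropout: float) -> int:
    """n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2, then inflated by 1 / (1 - dropout)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError(f"alpha must be in (0, 1), got {alpha}")
    if not 0.0 < power < 1.0:
        raise ValueError(f"power must be in (0, 1), got {power}")
    if not 0.0 <= dropout < 1.0:
        raise ValueError(f"dropout must be in [0, 1), got {dropout}")
    if effect_size <= 0.0:
        raise ValueError(f"effect_size must be > 0, got {effect_size}")
    z = NormalDist()
    n = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / effect_size) ** 2
    return math.ceil(math.ceil(n) / (1 - dropout))


def tam_divergence(top_down_tam: float, bottom_up_tam: float) -> float:
    """Relative gap between the two TAM estimates, as a fraction of top-down."""
    if top_down_tam <= 0:
        raise ValueError("top_down_tam must be > 0 to compute divergence")
    return abs(top_down_tam - bottom_up_tam) / top_down_tam


def implied_customers(som: float, price: float) -> float:
    """Customers needed to reach SOM at a given price; fails loudly on price <= 0."""
    if price <= 0:
        raise ValueError(f"price must be > 0, got {price}")
    return som / price
```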
🔵 Minor — ar_evaluator exit-code ambiguity
On JSON decode error, ar_evaluator.py prints both {METRIC}: N/A and an error line before returning exit code 1. The autoresearch loop treats non-zero exit as DISCARD, which is correct, but the N/A metric line on stdout before the error may confuse parsers that read only the first output line. Consider writing the error to stderr only.
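A sketch of the stdout/stderr split, assuming the loop parses a single `metric: value` line from stdout; `run_evaluator` and its `score` callback are hypothetical:

```python
import json
import sys
from typing import Any, Callable


def run_evaluator(path: str, metric: str, score: Callable[[Any], float]) -> int:
    """Print exactly one 'metric: value' line on success; on failure write only to stderr."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        print(f"error: cannot evaluate {path}: {exc}", file=sys.stderr)
        return 1  # non-zero exit: the loop discards this iteration
    print(f"{metric}: {score(data)}")
    return 0
```

An evaluator script could then end with `sys.exit(run_evaluator(sys.argv[1], "runway_months", score))`, so stdout carries either one metric line or nothing.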
Summary
| Severity | Count | Action |
|---|---|---|
| 🔴 Critical | 2 | File handle leaks (fix in 6 scripts) + codex index regression (confirm/revert) |
| 🟡 High / Medium | 4 | Input validation (High); silent config failure, hardcoded constants, numerical edge cases (Medium) |
| 🔵 Minor | 1 | ar_evaluator stderr/stdout split |
The domain design, plugin structure, SKILL.md quality, and hard-rule enforcement are all excellent. Once the file-handle leaks are fixed and the codex index regression is confirmed/reverted, this is ready to merge.
Summary
New top-level `research-ops/` domain: the managed, cross-functional counterpart to the academic `research/` domain. Single domain plugin (the `commercial/` v2.8.0 pattern): orchestrator (`context: fork`) + 4 managed sub-skills, each now wired with per-skill onboarding, a customization config the tools actually consume, and an isolated opt-in autoresearch bridge.

- `research-ops-skills`
- `clinical-research`
- `research-finance`
- `market-research`
- `product-research`

Per-skill integration (onboarding · customization · autoresearch)
Each sub-skill ships three integration scripts beside its 3 analysis tools:
- `onboard.py`: its own onboarding questionnaire (distinct question set per skill). Interactive, or `--defaults` / `--set key=value` / `--reset` / `--scope {global,project}`. Writes `~/.config/research-ops/<skill>.json` or `./.research-ops/<skill>.json`.
- `config_loader.py`: loads that config (precedence project > global > defaults; `RESEARCH_OPS_NO_CONFIG=1` bypass). Every scoring tool reads it, so saved answers change behavior: default profile, thresholds (alpha/power/dropout, F&A rate, runway, confidence, MoE, insight source-threshold), and the named owners printed on clinical/finance outputs. CLI flags always override.
- `ar_evaluator.py`: an isolated, opt-in ground-truth evaluator bridging to `engineering/autoresearch-agent`. Invoked only on explicit user request; the loop edits the skill's input file, and the evaluator is never edited. Metrics: clinical `feasibility_composite` ↑, finance `runway_months` ↑, market `tam_divergence` ↓, product `validated_insights` ↑. No cross-skill coupling.

Hard rules (unchanged, enforced via persona + anti-patterns)
clinical = estimate + named clinical owner (never fact) · finance surfaces assumptions, routes treatment to a named finance owner (never auto-decides) · market shows method (both ways) + assumptions (never a single number) · product requires recurrence across independent participants.
Plugin builds
`research-ops/` is a Claude Code plugin (`plugin.json` + marketplace entry, 62 plugins). Cross-platform artifacts materialized: Codex symlinks, Gemini `SKILL.md` mirrors, and Vibe symlinks for all 5 skills; indices regenerated. Root `CLAUDE.md` bumped to v2.9.0.

Test plan
- `--help`, `--sample`/`--show`, and `compileall` …
- `--profile` …
- `RESEARCH_OPS_NO_CONFIG=1` bypass works
- `scripts/check_plugin_json.py --all` passes; marketplace + all 3 sync indices parse
- `run`/`status`/`review` symlink churn reverted
- `/plugin-audit` 8-phase pipeline on the orchestrator + one sub-skill

https://claude.ai/code/session_01PUNmQVE4WYvcrzpq2anC3D