feat(research-ops): add enterprise Research Operations domain (v2.9.0) by alirezarezvani · Pull Request #752 · alirezarezvani/claude-skills

alirezarezvani · 2026-05-27T00:32:01Z

Summary

New top-level research-ops/ domain — the managed, cross-functional counterpart to the academic research/ domain. Single domain plugin (the commercial/ v2.8.0 pattern): orchestrator (context: fork) + 4 managed sub-skills, each now wired with per-skill onboarding, a customization config the tools actually consume, and an isolated opt-in autoresearch bridge.

Skill	Purpose
`research-ops-skills`	Orchestrator — two-signal routing + Matt Pocock grill discipline
`clinical-research`	Endpoint selection, sample-size/power (means/proportions/survival), phase-gate feasibility
`research-finance`	R&D program budgeting (F&A split), burn/runway, capitalize-vs-expense routing
`market-research`	TAM/SAM/SOM (both methods), survey sampling (FPC + per-segment), Kotler segmentation
`product-research`	Goal-matched study design, method-based saturation, insight synthesis

Per-skill integration (onboarding · customization · autoresearch)

Each sub-skill ships three integration scripts beside its 3 analysis tools:

onboard.py — its own onboarding questionnaire (distinct question set per skill). Interactive, or --defaults / --set key=value / --reset / --scope {global,project}. Writes ~/.config/research-ops/<skill>.json or ./.research-ops/<skill>.json.
config_loader.py — loads that config (precedence project > global > defaults; RESEARCH_OPS_NO_CONFIG=1 bypass). Every scoring tool reads it, so saved answers change behavior: default profile, thresholds (alpha/power/dropout, F&A rate, runway, confidence, MoE, insight source-threshold), and the named owners printed on clinical/finance outputs. CLI flags always override.
ar_evaluator.py — an isolated, opt-in ground-truth evaluator bridging to engineering/autoresearch-agent. Invoked only on explicit user request; the loop edits the skill's input file, the evaluator is never edited. Metrics: clinical feasibility_composite↑, finance runway_months↑, market tam_divergence↓, product validated_insights↑. No cross-skill coupling.
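The precedence described above (project > global > defaults, with the RESEARCH_OPS_NO_CONFIG=1 bypass) can be sketched as follows. This is a minimal illustration and not the PR's actual config_loader.py: the DEFAULTS values, the function names, and the `home`/`cwd` parameters are assumptions added so the example is testable.

```python
import copy
import json
import os
from pathlib import Path
from typing import Optional

# Illustrative defaults only; the real per-skill defaults are not shown in the PR text.
DEFAULTS = {"profile": "default", "thresholds": {"alpha": 0.05, "power": 0.8}}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which override wins, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def read_json(path: Path) -> dict:
    """Return the parsed file, or an empty dict when the file does not exist."""
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def load_config(skill: str, home: Optional[Path] = None, cwd: Optional[Path] = None) -> dict:
    """Merge defaults, then the global config, then the project config (highest precedence)."""
    if os.environ.get("RESEARCH_OPS_NO_CONFIG") == "1":
        return copy.deepcopy(DEFAULTS)  # bypass: ignore every saved config
    home = home or Path.home()
    cwd = cwd or Path.cwd()
    global_cfg = read_json(home / ".config" / "research-ops" / f"{skill}.json")
    project_cfg = read_json(cwd / ".research-ops" / f"{skill}.json")
    return deep_merge(deep_merge(DEFAULTS, global_cfg), project_cfg)
```

A tool would then apply any explicit CLI flag on top of the returned dict, which is how "CLI flags always override" fits into the same chain.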

Hard rules (unchanged, enforced via persona + anti-patterns)

clinical = estimate + named clinical owner (never fact) · finance surfaces assumptions, routes treatment to a named finance owner (never auto-decides) · market shows method (both ways) + assumptions (never a single number) · product requires recurrence across independent participants.

Plugin builds

research-ops/ is a Claude Code plugin (plugin.json + marketplace entry, 62 plugins). Cross-platform artifacts materialized: Codex symlinks, Gemini SKILL.md mirrors, Vibe symlinks for all 5 skills; indices regenerated. Root CLAUDE.md bumped to v2.9.0.

Test plan

All 24 scripts (12 analysis + 12 onboarding/customization/autoresearch) pass --help, --sample/--show, and compileall
12 scoring tools emit valid JSON; every scoring tool accepts --profile
Customization verified in use end-to-end: onboarding answers flow into profile, power, named owners, F&A rate, confidence/MoE, and insight threshold (e.g. threshold=2 promotes a second cluster to INSIGHT)
No network/LLM imports; RESEARCH_OPS_NO_CONFIG=1 bypass works
scripts/check_plugin_json.py --all passes; marketplace + all 3 sync indices parse
Cross-platform syncs discover all 5 research-ops skills; unrelated codex run/status/review symlink churn reverted
Optional: /plugin-audit 8-phase pipeline on the orchestrator + one sub-skill

https://claude.ai/code/session_01PUNmQVE4WYvcrzpq2anC3D

New top-level research-ops/ domain: the managed, cross-functional counterpart to the academic research/ domain. Single domain plugin (commercial/ pattern): orchestrator (context: fork) + 4 sub-skills: clinical-research (study design), research-finance (R&D program finance), market-research (sizing/survey/segmentation methodology), product-research (user-research method + insight repository).

12 stdlib-only deterministic Python tools (all pass --help/--sample/--output json), 12 reference docs (5-7 sources each), 4 asset templates, a cs-research-ops-orchestrator agent, and 6 commands (router + grill + 4 per-skill).

Hard rules enforced via persona + anti-patterns: clinical outputs are estimates with a named clinical owner, finance routes capitalize-vs-expense to a named finance owner, market sizes show method (both ways) + assumptions, product insights require cross-participant recurrence.

Explicit distinct_from framing keeps clear of ra-qm-team, finance, research/grants, product-team, and marketing-skill. Registered in marketplace.json (61 -> 62 plugins) and the codex/gemini/hermes/vibe sync scripts + generate-docs; root CLAUDE.md bumped to v2.9.0.

https://claude.ai/code/session_01PUNmQVE4WYvcrzpq2anC3D

claude · 2026-05-27T00:38:54Z

Review: feat(research-ops) — enterprise Research Operations domain (v2.9.0)

Overall this is a well-designed PR that follows the commercial/ + business-operations/ v2.8.0 pattern. Domain differentiation is clear, the hard rules are thoughtfully enforced, and the corpus is internally consistent. A handful of code-quality issues should be addressed before merge.

Blocking Issues (fix before merge to dev)

1. File handle leak in all 12 scripts

Every script uses:

data = SAMPLE if (args.sample or not args.input) else json.load(open(args.input))

open() without a with block leaks the file handle. Replace with:

if args.sample or not args.input:
    data = SAMPLE
else:
    with open(args.input) as f:
        data = json.load(f)

Affects all 12 scripts: endpoint_selector.py, market_sizer.py, capex_vs_opex_router.py, insight_synthesizer.py, and the remaining 8.

2. Dead variable in endpoint_selector.py:classify()

top = ordered[0]["composite"] is assigned but never referenced. Any linter flags this. Remove it or use it.

3. import math inside function body — saturation_planner.py:usability_plan()

import math is nested inside the function. Move it to module level with the other imports.

4. No JSON parse error handling on --input files

Malformed JSON produces a raw JSONDecodeError traceback. The main() already has try/except ValueError for calculation errors — extend it to also catch json.JSONDecodeError and emit a friendly error: invalid JSON in <path> to stderr.
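One way to combine the handle fix from item 1 with the friendly error requested here is sketched below. The names `load_input` and `main` and the SAMPLE payload are hypothetical stand-ins, not the scripts' real code. Note that json.JSONDecodeError is a subclass of ValueError, so an existing `except ValueError` handler already catches it as long as the load happens inside the `try` block.

```python
import json
import sys

SAMPLE = {"example": True}  # stand-in for a tool's built-in sample data


def load_input(path, use_sample=False):
    """Return SAMPLE, or the parsed JSON at path; raise ValueError on malformed JSON."""
    if use_sample or not path:
        return SAMPLE
    try:
        # The context manager closes the handle even when parsing fails.
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON in {path}: {exc.msg} (line {exc.lineno})") from exc


def main(argv):
    """Tiny driver: argv[0] is an optional input path; returns a process exit code."""
    path = argv[0] if argv else None
    try:
        data = load_input(path)
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)  # friendly message, no traceback
        return 1
    print(json.dumps(data))
    return 0
```

A missing file still raises FileNotFoundError in this sketch; whether that should also be reported as a friendly error is a separate decision.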

Non-blocking Issues (can follow in v2.9.1)

5. --sample silently overrides --standard in capex_vs_opex_router.py

python3 capex_vs_opex_router.py --sample --standard usgaap silently uses IFRS from the hardcoded sample. If the intent is that sample data wins, add a comment. If not, let the explicit CLI flag take precedence.
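If the explicit flag is meant to win, a common argparse approach is to default the flag to None so that "not given" can be told apart from a real value. The sketch below assumes the choices `ifrs` and `usgaap`, a `standard` key in the sample payload, and an `ifrs` fallback; none of these are confirmed by the PR text beyond the `--standard usgaap` invocation above.

```python
import argparse

SAMPLE = {"standard": "ifrs", "items": []}  # hypothetical sample payload


def resolve_standard(argv):
    """Pick the accounting standard: explicit flag > value in the data > fallback."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--sample", action="store_true")
    # default=None lets us detect that the user did not pass --standard at all
    parser.add_argument("--standard", choices=["ifrs", "usgaap"], default=None)
    args = parser.parse_args(argv)
    data = SAMPLE if args.sample else {}
    return args.standard or data.get("standard") or "ifrs"  # "ifrs" fallback is an assumption
```

With this ordering, `--sample --standard usgaap` resolves to `usgaap`, while `--sample` alone keeps the sample's own value.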

6. Very long line in market_sizer.py:bottoms_up()

The implied_customers_at_SOM expression is ~180 chars. Split it into a named intermediate.

Test Plan Gap

7. Plugin-audit unchecked — recommended before dev→main

commercial/ and business-operations/ both ran the 8-phase /plugin-audit and posted scores in their PRs. For a new top-level domain, consider running it on research-ops-skills (orchestrator) and clinical-research before the dev→main merge.

Minor

8. Authorship inconsistency in orchestrator SKILL.md frontmatter

research-ops-skills/SKILL.md has author: claude-code-skills; plugin.json, the agent, and all commands use Alireza Rezvani. Align them.

Positive Findings

context: fork correctly set in orchestrator SKILL.md — consistent with sibling orchestrators
plugin.json skills array uses "./skills/..." prefix per canonical CC 2.1.144+ form
marketplace.json correctly bumped 61 → 62; all 5 sync scripts updated consistently in one PR
CLAUDE.md counters (335 skills, 15 domains, ~463 tools, ~602 refs) are arithmetically correct
ESTIMATE banners in clinical tools with named owner requirements are exactly right for biostatistics tooling
Named owner chain in phase_gate_scorer.py (PI, Medical Monitor, Biostatistician, Regulatory Owner) and never-auto-decides in capex_vs_opex_router.py match the domain's hard rules
Forcing-question library in orchestrator SKILL.md implements one-per-turn Matt Pocock discipline with recommended answers and canon citations
distinct_from coverage (ra-qm-team / finance / research/grants / product-team / marketing-skill) is thorough and defensible
z-table lookup in sample_size_estimator.py avoids scipy entirely — clean stdlib-only design
Division guard in market_sizer.py:bottoms_up() correctly protects against price=0
source extension field in plugin.json uses the approved format with spec + build_pattern + distinct_from

Verdict: Approve with items 1–4 addressed before merge. Items 5–8 can be a follow-up. Item 7 strongly recommended before dev→main. The domain design and hard-rule implementation are solid.

…utoresearch bridge

Connects each research-ops sub-skill to engineering/autoresearch-agent and adds per-skill onboarding with a customization approach that the tools actually consume. Each sub-skill (clinical-research, research-finance, market-research, product-research) now ships three integration scripts:

- onboard.py: its own onboarding questionnaire; interactive or --defaults / --set key=value / --reset / --scope; writes ~/.config/research-ops/<skill>.json (global) or ./.research-ops/<skill>.json (project).
- config_loader.py: loads that config (project > global > defaults; RESEARCH_OPS_NO_CONFIG=1 bypass). Every scoring tool now reads it so saved answers change behavior: default profile, thresholds (alpha/power/dropout, F&A rate, runway, confidence, MoE, insight source-threshold), and named owners printed on clinical/finance outputs. CLI flags always override.
- ar_evaluator.py: an isolated, OPT-IN ground-truth evaluator bridging to autoresearch. The loop edits the skill's input file and the evaluator (never edited) scores it: clinical feasibility_composite (higher), finance runway_months (higher), market tam_divergence (lower), product validated_insights (higher). No cross-skill coupling; invoked only on explicit user request.

SKILL.md (each), the orchestrator, the agent, per-skill commands, and the domain CLAUDE.md document the onboarding + opt-in autoresearch handoff. plugin.json / marketplace describe 24 tools (12 analysis + 12 integration). Builds the cross-platform plugins: codex symlinks + gemini SKILL.md mirrors + vibe symlinks materialized for all 5 research-ops skills; indices regenerated. Root CLAUDE.md scope/highlights updated.

https://claude.ai/code/session_01PUNmQVE4WYvcrzpq2anC3D

claude · 2026-05-27T00:57:58Z

Code Review — PR #752: feat(research-ops): enterprise Research Operations domain (v2.9.0)

Overall this is a well-structured, ambitious PR that follows the repo's established patterns cleanly. The domain design, plugin packaging, and SKILL.md quality are all strong. A few issues need attention before merge — one regression and one class of code bug are the main blockers.

✅ What's done well

Plugin.json — all five skills use the required ./ prefix convention (./skills/research-ops-skills, etc.); all ClawHub required fields present; source.distinct_from scope boundaries are clear and accurate.
SKILL.md quality — all four sub-skills have YAML frontmatter with Use when trigger clauses, a Forcing-question library (5 questions each, with recommended answers + canon citations), and 3 references citing 5–7 authoritative sources. Fully compliant with the Matt Pocock grill pattern.
Hard rules enforced — clinical outputs carry the "ESTIMATE + named clinical owner" banner, finance surfaces assumption blocks and routes to a named finance owner, market outputs always show method + assumptions, product insights require recurrence across independent participants. The persona constraints are correctly wired.
ar_evaluator design — isolated, opt-in, locked-ground-truth. Loop edits the skill's input file; the evaluator is never edited. Metrics are machine-parseable (feasibility_composite: <float>). No cross-skill coupling detected.
config_loader.py design — project > global > defaults precedence is correct; RESEARCH_OPS_NO_CONFIG=1 bypass works as documented; deep-merge handles nested dicts properly.
Stdlib-only — confirmed across all 24 scripts; no network or LLM calls.

🔴 Critical — File handle leaks (6 scripts)

Multiple analysis scripts open JSON files without a context manager:

# ❌ current pattern (leaks handle until GC)
data = json.load(open(args.input))

# ✅ correct
with open(args.input) as f:
    data = json.load(f)

Affected scripts:

clinical-research/scripts/endpoint_selector.py
clinical-research/scripts/phase_gate_scorer.py
research-finance/scripts/program_budget_planner.py
research-finance/scripts/capex_vs_opex_router.py
market-research/scripts/market_sizer.py
product-research/scripts/insight_synthesizer.py

This is a straightforward fix across all six files.

🔴 Possible regression — `.codex/skills-index.json` source swaps

Three unrelated skills have their source paths swapped in the index:

Skill	Before	After
`review`	`engineering-team/playwright-pro/skills/review`	`engineering-team/self-improving-agent/skills/review`
`run`	`engineering/agenthub/skills/run`	`engineering/autoresearch-agent/skills/run`
`status`	`engineering/agenthub/skills/status`	`engineering/autoresearch-agent/skills/status`

These are distinct skills with distinct descriptions — the swap doesn't look intentional. If the sync script regenerated these from a non-deterministic dict or glob, the churn should be reverted. Please confirm whether this was intended refactoring or a sync artifact, and revert if the latter.
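If the churn does come from iteration order, one possible remedy is to resolve same-named skills deterministically and report the collision instead of letting the last-seen path win. The sketch below is an assumption about how such a sync step could work; `build_index` is a hypothetical helper and does not describe the repository's actual sync script.

```python
from collections import defaultdict


def build_index(skill_paths):
    """Map skill name -> source path deterministically and report name collisions."""
    by_name = defaultdict(list)
    for path in skill_paths:
        name = path.rsplit("/", 1)[-1]  # last path segment is the skill name
        by_name[name].append(path)

    index, collisions = {}, {}
    for name in sorted(by_name):
        candidates = sorted(by_name[name])  # stable order regardless of discovery order
        index[name] = candidates[0]
        if len(candidates) > 1:
            collisions[name] = candidates  # surface these instead of silently picking one
    return index, collisions
```

Sorting makes the result independent of glob or dict order, but it only hides the underlying problem: two distinct skills sharing one index key. Surfacing `collisions` keeps that visible to the maintainer.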

🟡 High — Input validation gaps

capex_vs_opex_router.py — items with an empty or missing criteria dict are routed without flagging the input as incomplete. At minimum, surface a warning rather than silently proceeding with default behavior.

insight_synthesizer.py — observations missing the participant key silently default to "UNKNOWN" rather than raising an error or warning. A malformed observations list could produce misleading output (e.g., a single participant contributing to multiple "recurrences").
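The participant concern can be illustrated with a small recurrence counter that refuses to count anonymous observations. This is a sketch under stated assumptions: the `theme` key, the `count_recurrence` name, and the choice to skip with a warning (rather than raise) are illustrative; only the `participant` key and the source threshold come from the PR text.

```python
def count_recurrence(observations, threshold=2):
    """Return (themes seen in >= threshold distinct participants, warnings)."""
    participants_by_theme = {}
    warnings = []
    for i, obs in enumerate(observations):
        who = obs.get("participant")
        if not who:
            # Do not fold anonymous rows into a shared "UNKNOWN" bucket:
            # that could mask one person contributing several observations.
            warnings.append(f"observation {i} has no participant; skipped")
            continue
        participants_by_theme.setdefault(obs["theme"], set()).add(who)

    insights = sorted(
        theme for theme, people in participants_by_theme.items()
        if len(people) >= threshold
    )
    return insights, warnings
```

Because participants are stored in a set, one participant repeating the same theme counts once, which matches the hard rule that insights require recurrence across independent participants.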

🟡 Medium — Silent config failure in `config_loader.py`

_read_json() catches both OSError and JSONDecodeError and returns None silently. If a saved config is unreadable (e.g., permission denied), the tool falls back to defaults with no user-visible warning. This makes it hard to diagnose misconfiguration. A stderr warning on fallback would help.
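A possible shape for the suggested warning is shown below. It is a sketch, not the PR's `_read_json()`: the function name `read_json_or_warn` is hypothetical, and treating a missing file as silent (because an absent config is the normal case) is an assumption about the desired behavior.

```python
import json
import sys


def read_json_or_warn(path):
    """Return parsed JSON; None if the file is absent; None plus a stderr warning if unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return None  # no saved config: fall back to defaults quietly
    except (OSError, json.JSONDecodeError) as exc:
        # Permission errors and corrupt files are worth telling the user about.
        print(f"warning: ignoring config {path}: {exc}", file=sys.stderr)
        return None
```

FileNotFoundError is itself an OSError subclass, so it has to be caught first for the quiet path to apply.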

🟡 Medium — Hardcoded constants without parametrization

phase_gate_scorer.py: recruitment nominal capacity (25.0 enroll/site/yr) is hardcoded with no flag to override.
endpoint_selector.py: surrogate endpoint penalty weight (0.7) is hardcoded.

Both values are domain-specific and could reasonably vary by trial type; exposing them as --profile-settable config keys would be consistent with the customization pattern already in place.

🟡 Medium — Edge cases in numerical tools

sample_size_estimator.py — no guard for dropout=1.0 (produces division by zero) or alpha/power combinations that yield inf sample size.
market_sizer.py — tam_divergence calculation has a zero-denominator risk if top_down_tam=0; bottoms-up path silently returns implied_customers_at_SOM: None when price=0 rather than raising ValueError.
program_budget_planner.py — silent 0.0 fallback when periods count mismatches amounts array length; a warning would surface data entry errors.
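The first two guards might look like the sketch below. It uses the textbook two-sample formula for comparing means, n per arm = 2((z_{1-α/2} + z_{power})·σ/δ)², and takes z from the stdlib `statistics.NormalDist` rather than the z-table lookup the PR describes. The function names, parameter names, and the relative-difference definition of divergence are assumptions, not the tools' actual code.

```python
import math
from statistics import NormalDist


def n_per_arm_two_means(delta, sigma, alpha=0.05, power=0.80, dropout=0.0):
    """Per-arm sample size for a two-sided comparison of means, inflated for dropout."""
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    if delta == 0 or sigma <= 0:
        raise ValueError("delta must be non-zero and sigma positive")
    if not 0 <= dropout < 1:
        # dropout == 1.0 would divide by zero below
        raise ValueError("dropout must be in [0, 1)")
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = 2 * (z * sigma / delta) ** 2
    return math.ceil(n / (1 - dropout))


def tam_divergence(top_down_tam, bottom_up_tam):
    """Relative gap between the two sizing methods (this definition is an assumption)."""
    if top_down_tam <= 0:
        raise ValueError("top_down_tam must be positive")
    return abs(top_down_tam - bottom_up_tam) / top_down_tam
```

For example, `n_per_arm_two_means(delta=0.5, sigma=1.0)` gives 63 per arm, and adding `dropout=0.1` raises it to 70. Rejecting alpha or power at exactly 0 or 1 also avoids the infinite quantiles mentioned above.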

🔵 Minor — ar_evaluator exit-code ambiguity

On JSON decode error, ar_evaluator.py prints both {METRIC}: N/A and an error line before returning exit code 1. The autoresearch loop treats non-zero exit as DISCARD, which is correct, but the N/A metric line on stdout before the error may confuse parsers that read only the first output line. Consider writing the error to stderr only.
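The stdout/stderr split could be sketched as follows, keeping stdout reserved for the single `metric: <float>` line. `run_evaluator`, its `score` callback, and the two-decimal format are hypothetical; the metric name is the finance one listed in the PR description.

```python
import json
import sys

METRIC = "runway_months"  # finance metric named in the PR description


def run_evaluator(raw_text, score):
    """Print 'METRIC: <float>' to stdout on success; errors go to stderr only."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # Nothing is written to stdout here, so a parser that reads only
        # the first stdout line never sees a placeholder metric.
        print(f"error: invalid JSON input: {exc}", file=sys.stderr)
        return 1
    print(f"{METRIC}: {score(data):.2f}")
    return 0
```

The non-zero return value still signals DISCARD to the calling loop; the only change from the behavior described above is that the failure path leaves stdout empty.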

Summary

Severity	Count	Action
🔴 Critical	2	File handle leaks (fix in 6 scripts) + codex index regression (confirm/revert)
🟡 High	1	Input validation gaps
🟡 Medium	3	Silent config failure, hardcoded constants, numerical edge cases
🔵 Minor	1	ar_evaluator stderr/stdout split

The domain design, plugin structure, SKILL.md quality, and hard-rule enforcement are all excellent. Once the file-handle leaks are fixed and the codex index regression is confirmed/reverted, this is ready to merge.

alirezarezvani and others added 2 commits May 26, 2026 16:07

Merge pull request #750 from alirezarezvani/dev

eace618

alirezarezvani marked this pull request as ready for review May 27, 2026 00:33

alirezarezvani merged commit e7d038c into dev May 27, 2026
5 checks passed

alirezarezvani deleted the claude/research-skills-category-qWMjm branch May 27, 2026 05:05

alirezarezvani merged 3 commits into dev from claude/research-skills-category-qWMjm
