[recipe] add routing-aware replay utilities for MoE RL by kaining-never-stop · Pull Request #114 · verl-project/verl-recipe

kaining-never-stop · 2026-06-21T11:57:24Z

This is a small draft PR for the RFC here:

The idea is to keep the first version deliberately lightweight: a self-contained routing_aware_replay recipe for comparing replay masks in MoE RL, without touching verl core.

What is included:

Fisher-weighted replay mask construction
budget-matched uniform/random controls
compact replay diagnostics
a CPU-only synthetic example
unit tests for the mask and diagnostics behavior
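
As a rough illustration of the first two items in the list above, the sketch below builds a top-k mask from Fisher-style scores and a random control mask with the same number of active positions. Everything here is an assumption made for illustration: the function names, the use of squared gradients as a diagonal Fisher proxy, and top-k selection are not taken from the recipe's actual code.

```python
import random
from typing import List, Sequence


def fisher_scores(grads: Sequence[float]) -> List[float]:
    """Assumed Fisher proxy: the squared gradient at each position."""
    return [g * g for g in grads]


def topk_mask(scores: Sequence[float], budget: int) -> List[int]:
    """Set the `budget` highest-scoring positions to 1 and the rest to 0."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:budget])
    return [1 if i in keep else 0 for i in range(len(scores))]


def random_control_mask(length: int, budget: int, seed: int = 0) -> List[int]:
    """Budget-matched control: `budget` positions chosen uniformly at random."""
    rng = random.Random(seed)
    keep = set(rng.sample(range(length), budget))
    return [1 if i in keep else 0 for i in range(length)]


# Hypothetical per-position gradients, used only to exercise the helpers.
grads = [0.1, -2.0, 0.5, 1.5, -0.2, 0.0]
fisher_mask = topk_mask(fisher_scores(grads), budget=2)
control_mask = random_control_mask(len(grads), budget=sum(fisher_mask))
```

Matching the control's budget to `sum(fisher_mask)` means any difference between the two masks comes from which positions are replayed, not from how many.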

This is not meant to be a full training recipe yet. I kept it CPU-testable so it is easier to review first; if the direction looks useful, I can add a small MoE training config or align the schema with an upstream router replay output format in a follow-up.

I ran:

python examples/synthetic_router_replay_demo.py
python -m unittest discover -s tests
pytest tests

and the ruff hooks on the new Python files:

pre-commit run ruff --files <new python files>
pre-commit run ruff-format --files <new python files>

gemini-code-assist

Code Review

This pull request introduces the routing_aware_replay utility, which provides tools for studying routing-aware replay policies in MoE RL post-training, including Fisher-weighted replay masks, budget-matched baselines, and diagnostics. The review feedback highlights three improvements: handling identical scores during min-max normalization so that uniformly high scores are not discarded, validating in the configuration schema that tau lies between theta_low and theta_high, and adding unit tests that cover the identical-score behavior.
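
The schema check mentioned in the review could look roughly like the sketch below. The class name `ReplayThresholds` is hypothetical, and the inclusive bounds and the extra ordering check on theta_low and theta_high are assumptions; only the three field names come from the review summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayThresholds:
    """Hypothetical config holder; only the field names come from the review."""

    theta_low: float
    theta_high: float
    tau: float

    def __post_init__(self) -> None:
        # Assumed extra check: the interval itself must be well ordered.
        if self.theta_low > self.theta_high:
            raise ValueError("theta_low must not exceed theta_high")
        # Assumed inclusive bounds: tau may equal either endpoint.
        if not (self.theta_low <= self.tau <= self.theta_high):
            raise ValueError("tau must lie between theta_low and theta_high")
```

Validating at construction time means a bad configuration fails immediately rather than partway through mask construction.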

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

kaining-never-stop · 2026-06-21T12:02:38Z

Thanks, these edge cases make sense. I pushed 7eb7c50 to handle them:

identical positive scores now keep the replay mask on instead of being normalized away;
identical zero scores still produce a zero mask;
tau now has to stay between theta_low and theta_high;
added tests for those cases.
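
A minimal sketch of the identical-score behavior described above, assuming scores are non-negative and normalized to [0, 1] before the mask is derived. The function name `normalize_scores` is hypothetical and this is not the code from 7eb7c50.

```python
from typing import List, Sequence


def normalize_scores(scores: Sequence[float]) -> List[float]:
    """Min-max normalize scores, with explicit handling of a zero range."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Plain min-max would divide by zero here. Identical positive scores
        # map to 1.0 so the mask stays on; identical zero scores map to 0.0.
        # Identical negative scores also map to 0.0 (an assumption, since
        # Fisher-style scores are expected to be non-negative).
        return [1.0 if hi > 0 else 0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```

For example, `normalize_scores([3.0, 3.0])` yields all ones, while `normalize_scores([0.0, 0.0])` yields all zeros.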

I re-ran the demo, unittest, pytest, ruff, and the ruff pre-commit hooks on the new Python files.

kaining-never-stop · 2026-06-23T02:11:52Z

Just checking in. The bot review comments have been addressed, and the PR is ready for maintainer review. Happy to rename the recipe or trim the scope if that would make the first version easier to review.

feat: add routing-aware replay recipe utilities

c7f1cd4

gemini-code-assist Bot reviewed Jun 21, 2026

Comment thread routing_aware_replay/routing_aware_replay/fisher_mask.py Outdated

Comment thread routing_aware_replay/routing_aware_replay/schema.py

Comment thread routing_aware_replay/tests/test_fisher_mask.py

fix: handle identical replay scores

7eb7c50

kaining-never-stop marked this pull request as ready for review June 21, 2026 12:04

kaining-never-stop wants to merge 2 commits into verl-project:main from kaining-never-stop:recipe/routing-aware-replay
