[recipe] add routing-aware replay utilities for MoE RL #114
kaining-never-stop wants to merge 2 commits
Code Review
This pull request introduces the routing_aware_replay utility, which provides tools for studying routing-aware replay policies in MoE RL post-training, including Fisher-weighted replay masks, budget-matched baselines, and diagnostics. The review feedback highlights a few improvement opportunities: handling identical scores during min-max normalization to avoid discarding high identical scores, validating that tau lies between theta_low and theta_high in the configuration schema, and adding corresponding unit tests to verify the behavior of identical scores.
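The two edge cases raised in the review can be sketched roughly as below. This is only an illustration, not the code in the PR: `min_max_normalize`, `ReplayThresholds`, and the `fill` default are hypothetical names, and both the constant returned for identical scores and the inclusive bounds on `tau` are assumptions.

```python
from dataclasses import dataclass


def min_max_normalize(scores, fill=1.0):
    """Scale scores to [0, 1]. Hypothetical helper, not the PR's actual code."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: every score is identical, so (s - lo) / (hi - lo)
        # would be 0 / 0. Returning a constant `fill` (assumed default 1.0)
        # keeps uniformly high scores from collapsing to zero.
        return [fill for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


@dataclass(frozen=True)
class ReplayThresholds:
    """Hypothetical config holding the three thresholds named in the review."""

    theta_low: float
    tau: float
    theta_high: float

    def __post_init__(self):
        # Assumed rule: tau must lie within [theta_low, theta_high], inclusive.
        if not (self.theta_low <= self.tau <= self.theta_high):
            raise ValueError(
                f"tau={self.tau} must lie between "
                f"theta_low={self.theta_low} and theta_high={self.theta_high}"
            )
```

A unit test for the identical-score case would then check that `min_max_normalize([7.0, 7.0, 7.0])` returns the fill value for every entry rather than zeros.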
Thanks, these edge cases make sense. I pushed 7eb7c50 to handle them:
I re-ran the demo, unittest, pytest, ruff, and the ruff pre-commit hooks on the new Python files.

Just checking in. The bot review comments have been addressed, and the PR is ready for maintainer review. Happy to rename the recipe or trim the scope if that would make the first version easier to review.
This is a small draft PR for the RFC here:
verl-project/verl#6805
The idea is to keep the first version deliberately lightweight: a self-contained routing_aware_replay recipe for comparing replay masks in MoE RL, without touching verl core.
What is included:
This is not meant to be a full training recipe yet. I kept it CPU-testable so it is easier to review first; if the direction looks useful, I can add a small MoE training config or align the schema with an upstream router replay output format in a follow-up.
I ran:
and the ruff hooks on the new Python files: