fix: mindspeed-mm backend adopt custom trainer's gradient clipping an… by Zhang1Sheng · Pull Request #84 · verl-project/verl-recipe

Zhang1Sheng · 2026-04-13T06:36:44Z

mindspeed-mm backend adopt custom trainer's gradient clipping and optimizer methods

gemini-code-assist

Code Review

This pull request updates the MMActorRolloutRefWorker to ensure model parameters are correctly moved to the GPU before rollout and introduces a registered init_model method for handling optimizer steps. The review feedback identifies a critical regression: removing init_context from trainer initialization could lead to out-of-memory errors. It also recommends using a dedicated actor_trainer attribute instead of a generic trainer attribute, so that the actor's state is not overwritten when the worker manages both actor and reference models.

gemini-code-assist · 2026-04-13T06:39:34Z

        )

-        with init_context(), warnings.catch_warnings():
+        with warnings.catch_warnings():


The init_context manager is defined on line 283 but is not used in the with statement. This is a regression that may lead to high memory usage or OOM during model initialization, as the meta-tensor context or other initialization optimizations won't be applied to the Trainer instantiation.

Suggested change

-        with warnings.catch_warnings():
+        with init_context(), warnings.catch_warnings():

gemini-code-assist · 2026-04-13T06:39:34Z

                self.mm_args.parallel.fsdp_plan.cpu_offload = True

            trainer = Trainer(args=self.mm_args, dataloader_provider=self._dataloader)
+            self.trainer = trainer


Using a single self.trainer attribute is problematic if the worker manages both an actor and a reference model (e.g., in Role.ActorRolloutRef mode). The reference model's trainer will overwrite the actor's trainer, causing the _optimizer_step (defined in init_model) to use the wrong model and a None optimizer. It is safer to explicitly store the actor's trainer.

Suggested change

-            self.trainer = trainer
+            if role == "actor":
+                self.actor_trainer = trainer
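The overwrite scenario can be reproduced with a small toy model. This is only a sketch: ToyTrainer, SharedAttrWorker, and ActorAttrWorker are hypothetical names, and it assumes that the reference model's trainer carries no optimizer, as the comment states.

```python
class ToyTrainer:
    """Hypothetical trainer: only the actor role owns an optimizer."""

    def __init__(self, role):
        self.role = role
        self.optimizer = "optimizer" if role == "actor" else None


class SharedAttrWorker:
    """Stores every trainer in one attribute, so the last build wins."""

    def __init__(self):
        self.trainer = None

    def build(self, role):
        self.trainer = ToyTrainer(role)
        return self.trainer


class ActorAttrWorker:
    """Stores only the actor's trainer, so a later ref build cannot replace it."""

    def __init__(self):
        self.actor_trainer = None

    def build(self, role):
        trainer = ToyTrainer(role)
        if role == "actor":
            self.actor_trainer = trainer
        return trainer


shared = SharedAttrWorker()
safe = ActorAttrWorker()
for role in ("actor", "ref"):
    shared.build(role)
    safe.build(role)
```

After building the actor and then the reference model, shared.trainer points at the reference trainer with a None optimizer, while safe.actor_trainer still points at the actor.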

gemini-code-assist · 2026-04-13T06:39:34Z

+            grad_norm = clip_grad_norm(self.trainer.model, max_norm=self.mm_args.training.clip_grad, foreach=self.mm_args.training.clip_grad_foreach)
+            # Update parameters
+            if self._is_actor:
+                self.trainer.optimizer.step()


Update this to use self.actor_trainer to ensure the correct model and optimizer are used, especially when both actor and reference models are present in the same worker. This prevents using the reference model's trainer (which has no optimizer) for the actor's update step.

Suggested change

-            grad_norm = clip_grad_norm(self.trainer.model, max_norm=self.mm_args.training.clip_grad, foreach=self.mm_args.training.clip_grad_foreach)
-            # Update parameters
-            if self._is_actor:
-                self.trainer.optimizer.step()
+            grad_norm = clip_grad_norm(self.actor_trainer.model, max_norm=self.mm_args.training.clip_grad, foreach=self.mm_args.training.clip_grad_foreach)
+            # Update parameters
+            if self._is_actor:
+                self.actor_trainer.optimizer.step()
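For readers unfamiliar with the clip-then-step order used here, the following is a plain-Python sketch of global-norm gradient clipping followed by a parameter update. It is not the clip_grad_norm implementation used by this backend; clip_by_global_norm, sgd_step, and the eps value are hypothetical and chosen only for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale grads in place so their global L2 norm is at most max_norm.

    Returns the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:
        for i in range(len(grads)):
            grads[i] *= scale
    return total_norm


def sgd_step(params, grads, lr):
    """Plain SGD update applied in place, run only after clipping."""
    for i in range(len(params)):
        params[i] -= lr * grads[i]


params = [1.0, 2.0]
grads = [3.0, 4.0]
grad_norm = clip_by_global_norm(grads, max_norm=1.0)
sgd_step(params, grads, lr=0.1)
```

The point of the sketch is the ordering: the norm is computed and the gradients are rescaled on the same model whose optimizer then steps, which is why the review asks for both calls to go through the same trainer object.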

gemini-code-assist · 2026-04-13T06:39:34Z

 class MMActorRolloutRefWorker(ActorRolloutRefWorker):
    def __init__(self, config: DictConfig, role: str, **kwargs):
        super().__init__(config, role, **kwargs)
+        self.trainer = None


Initialize self.actor_trainer instead of self.trainer to avoid ambiguity and potential overwriting when multiple models (actor and reference) are managed by the same worker.

Suggested change

-        self.trainer = None
+        self.actor_trainer = None

fix: mindspeed-mm backend adopt custom trainer's gradient clipping an…

67fe5ce

Zhang1Sheng wants to merge 1 commit into verl-project:main from Zhang1Sheng:main
