Skip to content

fix: mindspeed-mm backend adopt custom trainer's gradient clipping an…#84

Open
Zhang1Sheng wants to merge 1 commit into
verl-project:mainfrom
Zhang1Sheng:main
Open

fix: mindspeed-mm backend adopt custom trainer's gradient clipping an…#84
Zhang1Sheng wants to merge 1 commit into
verl-project:mainfrom
Zhang1Sheng:main

Conversation

@Zhang1Sheng

Copy link
Copy Markdown

mindspeed-mm backend adopt custom trainer's gradient clipping and optimizer methods

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the MMActorRolloutRefWorker to ensure model parameters are correctly moved to the GPU before rollout and introduces a registered init_model method for handling optimizer steps. The review feedback identifies a critical regression where the removal of init_context during trainer initialization could lead to out-of-memory errors. Additionally, it is recommended to use a specific actor_trainer attribute instead of a generic trainer to prevent state being overwritten when the worker manages both actor and reference models.

)

with init_context(), warnings.catch_warnings():
with warnings.catch_warnings():

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The init_context manager is defined on line 283 but is not used in the with statement. This is a regression that may lead to high memory usage or OOM during model initialization, as the meta-tensor context or other initialization optimizations won't be applied to the Trainer instantiation.

Suggested change
with warnings.catch_warnings():
with init_context, warnings.catch_warnings():

self.mm_args.parallel.fsdp_plan.cpu_offload = True

trainer = Trainer(args=self.mm_args, dataloader_provider=self._dataloader)
self.trainer = trainer

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using a single self.trainer attribute is problematic if the worker manages both an actor and a reference model (e.g., in Role.ActorRolloutRef mode). The reference model's trainer will overwrite the actor's trainer, causing the _optimizer_step (defined in init_model) to use the wrong model and a None optimizer. It is safer to explicitly store the actor's trainer.

Suggested change
self.trainer = trainer
if role == "actor":
self.actor_trainer = trainer

Comment on lines +370 to +373
grad_norm = clip_grad_norm(self.trainer.model, max_norm=self.mm_args.training.clip_grad, foreach=self.mm_args.training.clip_grad_foreach)
# Update parameters
if self._is_actor:
self.trainer.optimizer.step()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update this to use self.actor_trainer to ensure the correct model and optimizer are used, especially when both actor and reference models are present in the same worker. This prevents using the reference model's trainer (which has no optimizer) for the actor's update step.

Suggested change
grad_norm = clip_grad_norm(self.trainer.model, max_norm=self.mm_args.training.clip_grad, foreach=self.mm_args.training.clip_grad_foreach)
# Update parameters
if self._is_actor:
self.trainer.optimizer.step()
grad_norm = clip_grad_norm(self.actor_trainer.model, max_norm=self.mm_args.training.clip_grad, foreach=self.mm_args.training.clip_grad_foreach)
# Update parameters
if self._is_actor:
self.actor_trainer.optimizer.step()

class MMActorRolloutRefWorker(ActorRolloutRefWorker):
def __init__(self, config: DictConfig, role: str, **kwargs):
super().__init__(config, role, **kwargs)
self.trainer = None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Initialize self.actor_trainer instead of self.trainer to avoid ambiguity and potential overwriting when multiple models (actor and reference) are managed by the same worker.

Suggested change
self.trainer = None
self.actor_trainer = None

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant