【draft】add new dapo trainer with TransferQueue by ji-huazhong · Pull Request #82 · verl-project/verl-recipe

ji-huazhong · 2026-04-12T11:41:11Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces the DAPOSyncPPOTrainer, which integrates DAPO dynamic sampling with TransferQueue and ReplayBuffer for synchronized training. The implementation includes a new Hydra configuration and a trainer class that supports multi-batch generation and conditional KL computation. Review feedback highlights a potential dimension mismatch when processing reward scores, suggests the removal of unused variables, and recommends replacing standard print statements with proper logging. Additionally, there is a suggestion to handle excessive generation retries more gracefully to prevent the training process from crashing due to a ValueError.

gemini-code-assist · 2026-04-12T11:42:42Z

+        if metric_name == "seq_final_reward" and "token_level_rewards" in data:
+            metric_values = data["token_level_rewards"].to_padded_tensor().sum(dim=-1).numpy()
+        else:
+            metric_values = data["rm_scores"].to_padded_tensor().sum(dim=-1).numpy()


The sum(dim=-1) operation on rm_scores might be incorrect if the reward model provides sequence-level scores (which are already 1D). If rm_scores is 1D, sum(dim=-1) will reduce the entire batch to a single scalar, causing the subsequent zip at line 133 to fail with a TypeError. It is safer to check the dimensions before summing.

metric_values = data["rm_scores"].to_padded_tensor()
if metric_values.ndim > 1:
    metric_values = metric_values.sum(dim=-1)
metric_values = metric_values.numpy()
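The shape issue described in this comment can be reproduced in isolation. The following is a minimal sketch that uses NumPy arrays as a stand-in for the padded tensor; the helper name `per_sequence_scores` and the sample values are hypothetical and not part of the PR.

```python
import numpy as np


def per_sequence_scores(scores) -> np.ndarray:
    """Return one score per sequence.

    Sums over the last (token) axis only when such an axis exists, so
    sequence-level scores of shape (batch,) pass through unchanged.
    """
    scores = np.asarray(scores)
    if scores.ndim > 1:
        scores = scores.sum(axis=-1)
    return scores


# Hypothetical inputs: token-level scores are (batch, seq_len),
# sequence-level scores are already (batch,).
token_level = np.array([[0.0, 0.0, 1.0], [0.0, 0.5, 0.0]])
sequence_level = np.array([1.0, 0.5])

# Summing a 1D array over its last axis collapses the whole batch to a
# 0-d value, which cannot be iterated over (for example by zip).
collapsed = np.asarray(sequence_level.sum(axis=-1))
```

With the guard in place, both input layouts yield one value per sequence, so pairing the values with prompt uids downstream keeps working.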

gemini-code-assist · 2026-04-12T11:42:42Z

+            uids = list(uid_data)
+
+        prompt_uid2metric_vals = defaultdict(list)
+        prompt_uid2key_indices = defaultdict(list)


The variable prompt_uid2key_indices is initialized but never used in the function. It should be removed to keep the code clean.

gemini-code-assist · 2026-04-12T11:42:42Z

+
+                            prompt_bsz = self.config.data.train_batch_size
+                            if num_prompt_in_batch < prompt_bsz:
+                                print(f"{num_prompt_in_batch=} < {prompt_bsz=}")


Using print for logging inside the training loop can be problematic, especially if it spams the console during multiple generation retries. It is better to use the configured logger.

Suggested change

- print(f"{num_prompt_in_batch=} < {prompt_bsz=}")
+ logger.info(f"{num_prompt_in_batch=} < {prompt_bsz=}")
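For context, the suggestion assumes a `logger` object exists in the module. A minimal sketch of one common setup with the standard `logging` module follows; the helper `needs_more_prompts` is hypothetical and only illustrates where the message would be emitted.

```python
import logging

# Module-level logger; handlers and levels are configured by the application.
logger = logging.getLogger(__name__)


def needs_more_prompts(num_prompt_in_batch: int, prompt_bsz: int) -> bool:
    """Return True, and log at INFO level, when the batch is still short of prompts."""
    if num_prompt_in_batch < prompt_bsz:
        # Lazy %-style arguments are only formatted if INFO is enabled.
        logger.info(
            "num_prompt_in_batch=%d < prompt_bsz=%d",
            num_prompt_in_batch,
            prompt_bsz,
        )
        return True
    return False
```

Unlike `print`, the message can then be silenced or redirected through the logging configuration without touching the training loop.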

gemini-code-assist · 2026-04-12T11:42:42Z

+                                print(f"{num_prompt_in_batch=} < {prompt_bsz=}")
+                                max_num_gen_batches = self.config.algorithm.filter_groups.max_num_gen_batches
+                                if max_num_gen_batches <= 0 or num_gen_batches < max_num_gen_batches:
+                                    print(f"{num_gen_batches=}. Keep generating...")


Using print for logging inside the training loop can be problematic. It is better to use the configured logger.

Suggested change

- print(f"{num_gen_batches=}. Keep generating...")
+ logger.info(f"{num_gen_batches=}. Keep generating...")

gemini-code-assist · 2026-04-12T11:42:42Z

+                                    raise ValueError(
+                                        f"{num_gen_batches=} >= {max_num_gen_batches=}. "
+                                        "Generated too many. Please check if your data are too difficult. "
+                                        "You could also try set max_num_gen_batches=0 to enable endless trials."
+                                    )


Raising a ValueError here will crash the entire training process. While this is a safety check against infinite loops, in a production or long-running environment, it might be preferable to log a critical error and either skip the batch or stop the training gracefully (e.g., by returning from fit) to allow for final checkpointing or cleaner exits.
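One way to read this suggestion is sketched below, under stated assumptions: the function `collect_prompts`, its `generate_batch` callback, and the `None` return convention are hypothetical and do not come from the PR. The retry limit follows the semantics quoted above, where `max_num_gen_batches <= 0` means unlimited trials.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


def collect_prompts(
    generate_batch: Callable[[], List],
    prompt_bsz: int,
    max_num_gen_batches: int,
) -> Optional[List]:
    """Accumulate prompts until prompt_bsz is reached.

    Returns None instead of raising when the retry budget is exhausted,
    so the caller can checkpoint and exit cleanly.
    """
    kept: List = []
    num_gen_batches = 0
    while len(kept) < prompt_bsz:
        # A non-positive limit means "retry forever", as in the original check.
        if max_num_gen_batches > 0 and num_gen_batches >= max_num_gen_batches:
            logger.critical(
                "num_gen_batches=%d >= max_num_gen_batches=%d; giving up on this step",
                num_gen_batches,
                max_num_gen_batches,
            )
            return None
        kept.extend(generate_batch())
        num_gen_batches += 1
    return kept[:prompt_bsz]
```

A caller such as `fit` could treat a `None` result as a signal to save a final checkpoint and return, rather than letting an exception unwind the whole process.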

add new dapo trainer with TransferQueue

4ec4a7b

ji-huazhong marked this pull request as draft April 12, 2026 11:41

gemini-code-assist Bot reviewed Apr 12, 2026

ji-huazhong wants to merge 1 commit into verl-project:main from ji-huazhong:feat/dapo-tq
