feat(recipe): Add context-management agent loops (relocation of #5636)#107
feat(recipe): Add context-management agent loops (relocation of #5636)#107HarishKMurali wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a pluggable context-management recipe for verl agent loops, including a model-based summarizer loop and a rule-based sliding-window tool loop, along with their respective context managers, tests, and a GRPO example. The review feedback highlights several critical issues and robustness improvements: a syntax error in the bash script due to inline comments breaking line continuation, inverted truncation logic in the tool response truncation, Python 3.9 compatibility issues with zip(..., strict=False), potential false positives in multi-modal data checks, and fragile assumptions regarding system prompt lengths and regex group counts during token slicing and replacement.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| actor_rollout_ref.rollout.mode=async \ | ||
| actor_rollout_ref.rollout.n=8 \ | ||
| actor_rollout_ref.rollout.multi_turn.enable=True \ | ||
| `# --- context-management wiring (the point of this recipe) ---` \ |
There was a problem hiding this comment.
Using a comment inside backticks with a line continuation backslash (\) in a multi-line bash command is a syntax error. The # character comments out the rest of the line, including the closing backtick and the backslash, which breaks the line continuation and causes the script to fail. Please move this comment to the line before the python3 command.
| multi_modal_data = await self.process_vision_info(messages) | ||
| if multi_modal_data: | ||
| raise ValueError("ToolSlidingWindowAgentLoop only supports text prompts.") |
There was a problem hiding this comment.
Checking if multi_modal_data: is prone to false positives if process_vision_info returns a dictionary with empty lists or None values (e.g., {'images': [], 'videos': []}). It is safer to explicitly check for the presence of actual image or video data.
| multi_modal_data = await self.process_vision_info(messages) | |
| if multi_modal_data: | |
| raise ValueError("ToolSlidingWindowAgentLoop only supports text prompts.") | |
| multi_modal_data = await self.process_vision_info(messages) | |
| if multi_modal_data and (multi_modal_data.get("images") or multi_modal_data.get("videos")): | |
| raise ValueError("ToolSlidingWindowAgentLoop only supports text prompts.") |
| generated_response_ids = [ | ||
| token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1 | ||
| ] |
There was a problem hiding this comment.
The strict parameter in zip() was introduced in Python 3.10. Explicitly passing strict=False will raise a TypeError on Python 3.9. Since False is the default behavior of zip(), it is safer to omit this parameter to maintain backward compatibility with Python 3.9.
| generated_response_ids = [ | |
| token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1 | |
| ] | |
| generated_response_ids = [ | |
| token_id for token_id, token_mask in zip(response_ids, state.response_mask) if token_mask == 1 | |
| ] |
| generated_response_ids = [ | ||
| token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1 | ||
| ] |
There was a problem hiding this comment.
The strict parameter in zip() was introduced in Python 3.10. Explicitly passing strict=False will raise a TypeError on Python 3.9. Since False is the default behavior of zip(), it is safer to omit this parameter to maintain backward compatibility with Python 3.9.
| generated_response_ids = [ | |
| token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1 | |
| ] | |
| generated_response_ids = [ | |
| token_id for token_id, token_mask in zip(response_ids, state.response_mask) if token_mask == 1 | |
| ] |
| tokenize=True, | ||
| **self.apply_chat_template_kwargs, | ||
| ) | ||
| summary_ids = normalize_token_ids(tokenized_summary)[self.system_prompt_length :] |
There was a problem hiding this comment.
Slicing tokenized_summary by a pre-calculated self.system_prompt_length assumes that the tokenizer's chat template always prepends the system prompt. If the template does not prepend a system prompt when none is provided in the messages, this will incorrectly truncate the actual summary tokens. It is safer to dynamically check if the system prompt is present before slicing.
| summary_ids = normalize_token_ids(tokenized_summary)[self.system_prompt_length :] | |
| system_prompt_ids = initialize_system_prompt(self.tokenizer, **self.apply_chat_template_kwargs) | |
| summary_ids = normalize_token_ids(tokenized_summary) | |
| if summary_ids[:len(system_prompt_ids)] == system_prompt_ids: | |
| summary_ids = summary_ids[len(system_prompt_ids):] |
| for index, match in enumerate(matches): | ||
| compressed_parts.append(text[last_end : match.start()]) | ||
| if index < num_to_compress: | ||
| start_tag, _, end_tag = match.groups() | ||
| # Previously compressed wouldn't be counted as 'removed_num_obs' this time | ||
| if match.group(2).strip() != self.replacing_text: | ||
| removed_num_obs += 1 | ||
| compressed_parts.append(f"{start_tag}{self.replacing_text}{end_tag}") | ||
| else: | ||
| compressed_parts.append(match.group(0)) | ||
| last_end = match.end() |
There was a problem hiding this comment.
Unpacking match.groups() assumes the regex pattern has exactly 3 groups. If a user configures a custom tool_response_pattern with extra passive or nested groups, this will raise a ValueError. It is more robust to use absolute indices of the second group (match.start(2) and match.end(2)) to perform the replacement.
| for index, match in enumerate(matches): | |
| compressed_parts.append(text[last_end : match.start()]) | |
| if index < num_to_compress: | |
| start_tag, _, end_tag = match.groups() | |
| # Previously compressed wouldn't be counted as 'removed_num_obs' this time | |
| if match.group(2).strip() != self.replacing_text: | |
| removed_num_obs += 1 | |
| compressed_parts.append(f"{start_tag}{self.replacing_text}{end_tag}") | |
| else: | |
| compressed_parts.append(match.group(0)) | |
| last_end = match.end() | |
| for index, match in enumerate(matches): | |
| if index < num_to_compress: | |
| if match.group(2).strip() != self.replacing_text: | |
| removed_num_obs += 1 | |
| compressed_parts.append(text[last_end : match.start(2)]) | |
| compressed_parts.append(self.replacing_text) | |
| last_end = match.end(2) | |
| else: | |
| compressed_parts.append(text[last_end : match.end()]) | |
| last_end = match.end() |
Port the agentic context-management code from verl#5636 ([algo] feat: supporting agentic rl with context management; issue #5375) into a self-contained recipe, as the verl maintainers requested. Contents: the ContextManager abstraction with sliding-window and summarizer implementations, the naive_summarizer_agent and tool_sliding_window_agent loops, the design doc, a runnable GRPO example, and the original CPU unit tests (19 passing against verl main 9c38b8bb). Changes vs #5636: relocate intra-package imports to recipe.context_management.* and fix one drifted core import (verl.tools.utils.tool_registry -> verl.tools.tool_registry) for current verl. AI assistance was used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Zhentao Fan <zhentao.fan@mail.utoronto.ca>
f8e65e7 to
422ffe4
Compare
- Correct inverted tool_response_truncate_side in ToolSlidingWindowAgentLoop to
match verl-core ToolAgentLoop (left => keep tail, right => keep head).
Pre-existing in #5636.
- Guard the text-only multi-modal check against empty-list dicts
({'images': [], 'videos': []}).
- Remove a fragile backtick-comment idiom from the example run script.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@wuxibin89 @ArronHZG would you be able to review or help suggest a reviewer? Thanks |
What does this PR do?
Adds a self-contained
context_management/recipe providing plug-in context management for verlagent loops — keeping long / multi-turn rollouts within the model's context window by compressing the
trajectory on the fly instead of truncating or failing once the window is exceeded.
It contains two agent loops and the shared
ContextManagerabstraction:name)naive_summarizer_agentSummarizerAgentLoop<summary>…</summary>, reset history to(prompt + summary)and continue.tool_sliding_window_agentToolSlidingWindowAgentLoopWhy this is not a duplicate
This is the maintainer-requested relocation of the AgentLoop-layer code from #5636 ("[algo] feat: supporting agentic rl with context management", issue #5375) out of
verl/experimental/agent_loop/and into a recipe, so it can evolve independently of core. Continuing @ZhentaoFan's work at their request.Contents
Changes vs #5636 (minimal, mechanical)
recipe.context_management.*(loop file + both tests + doc).verl.tools.utils.tool_registry→verl.tools.tool_registry.agent_loop,verl.tools,verl.utils.*,verl.workers.rollout.replica). No verl core changes are required to use this recipe.-Also corrects an inverted tool_response_truncate_side in the ported ToolSlidingWindowAgentLoop to match verl-core ToolAgentLoop (caught in review).
Testing
Run inside a verl checkout with this recipe mounted at
recipe/(verl9c38b8bb, imageverlai/verl:vllm011.latest):pytest recipe/context_management/test_context_manager.py \ recipe/context_management/test_agent_loop_with_context_management.py # => 19 passedImport + registration verified end-to-end:
Wired into a run via
actor_rollout_ref.rollout.agent.agent_loop_config_path=recipe/context_management/example/agent.yaml.Verification
The relocated code is behavior-identical to #5636 — only import paths changed, plus one drifted core
import fixed — and #5636's own demo verification ("Agent with Summarizer") is linked there. This PR
adds fresh evidence that the relocation is correct against current verl:
19 passedon the recipe's CPU unit tests (command above).recipe.context_management.*(snippet above).Open question for reviewers
Is the above (relocation + passing unit tests on current verl + #5636's demo) sufficient verification
for this move, or do you expect a fresh comparative experiment? Happy to add one if needed.
Notes
REQUIRED_VERL.txtpins the core-library commit this was validated against.