Skip to content

feat(recipe): Add context-management agent loops (relocation of #5636)#107

Open
HarishKMurali wants to merge 2 commits into
verl-project:mainfrom
HarishKMurali:naive_summarizer
Open

feat(recipe): Add context-management agent loops (relocation of #5636)#107
HarishKMurali wants to merge 2 commits into
verl-project:mainfrom
HarishKMurali:naive_summarizer

Conversation

@HarishKMurali

@HarishKMurali HarishKMurali commented Jun 5, 2026

Copy link
Copy Markdown

What does this PR do?

Adds a self-contained context_management/ recipe providing plug-in context management for verl
agent loops — keeping long / multi-turn rollouts within the model's context window by compressing the
trajectory on the fly instead of truncating or failing once the window is exceeded.

It contains two agent loops and the shared ContextManager abstraction:

Agent loop (name) Class Strategy
naive_summarizer_agent SummarizerAgentLoop On a model-emitted <summary>…</summary>, reset history to (prompt + summary) and continue.
tool_sliding_window_agent ToolSlidingWindowAgentLoop Sliding window over tool-calling turns; drop oldest when exceeded.

Why this is not a duplicate

This is the maintainer-requested relocation of the AgentLoop-layer code from #5636 ("[algo] feat: supporting agentic rl with context management", issue #5375) out of verl/experimental/agent_loop/ and into a recipe, so it can evolve independently of core. Continuing @ZhentaoFan's work at their request.

Contents

context_management/
  context_manager.py                      # ContextManager + Sliding-window / Summarizer impls (from #5636)
  agent_loop_with_context_management.py   # AgentLoopWithContextManagement + the two loops (from #5636)
  context_manager_plugin.md               # design notes (from #5636)
  test_context_manager.py                 # CPU unit tests (from #5636)
  test_agent_loop_with_context_management.py
  example/                                # agent.yaml + run_summarizer.sh + README
  README.md, REQUIRED_VERL.txt, __init__.py

Changes vs #5636 (minimal, mechanical)

  • Relocated the two intra-package imports to recipe.context_management.* (loop file + both tests + doc).
  • Fixed one core-API drift for current verl: verl.tools.utils.tool_registryverl.tools.tool_registry.
  • All other imports remain on stable verl-core APIs (agent_loop, verl.tools, verl.utils.*,
    verl.workers.rollout.replica). No verl core changes are required to use this recipe.
    -Also corrects an inverted tool_response_truncate_side in the ported ToolSlidingWindowAgentLoop to match verl-core ToolAgentLoop (caught in review).

Testing

Run inside a verl checkout with this recipe mounted at recipe/ (verl 9c38b8bb, image
verlai/verl:vllm011.latest):

pytest recipe/context_management/test_context_manager.py \
       recipe/context_management/test_agent_loop_with_context_management.py
# => 19 passed

Import + registration verified end-to-end:

import recipe.context_management.agent_loop_with_context_management  # registers both loops
from verl.experimental.agent_loop.agent_loop import _agent_loop_registry
assert 'naive_summarizer_agent' in _agent_loop_registry
assert 'tool_sliding_window_agent' in _agent_loop_registry

Wired into a run via actor_rollout_ref.rollout.agent.agent_loop_config_path=recipe/context_management/example/agent.yaml.

Verification

The relocated code is behavior-identical to #5636 — only import paths changed, plus one drifted core
import fixed — and #5636's own demo verification ("Agent with Summarizer") is linked there. This PR
adds fresh evidence that the relocation is correct against current verl:

  • 19 passed on the recipe's CPU unit tests (command above).
  • Imports resolve and both loops register under recipe.context_management.* (snippet above).

Open question for reviewers

Is the above (relocation + passing unit tests on current verl + #5636's demo) sufficient verification
for this move, or do you expect a fresh comparative experiment? Happy to add one if needed.

Notes

  • AI assistance (Claude) was used; every changed line was reviewed and the tests were run by the submitter.
  • REQUIRED_VERL.txt pins the core-library commit this was validated against.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a pluggable context-management recipe for verl agent loops, including a model-based summarizer loop and a rule-based sliding-window tool loop, along with their respective context managers, tests, and a GRPO example. The review feedback highlights several critical issues and robustness improvements: a syntax error in the bash script due to inline comments breaking line continuation, inverted truncation logic in the tool response truncation, Python 3.9 compatibility issues with zip(..., strict=False), potential false positives in multi-modal data checks, and fragile assumptions regarding system prompt lengths and regex group counts during token slicing and replacement.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

actor_rollout_ref.rollout.mode=async \
actor_rollout_ref.rollout.n=8 \
actor_rollout_ref.rollout.multi_turn.enable=True \
`# --- context-management wiring (the point of this recipe) ---` \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using a comment inside backticks with a line continuation backslash (\) in a multi-line bash command is a syntax error. The # character comments out the rest of the line, including the closing backtick and the backslash, which breaks the line continuation and causes the script to fail. Please move this comment to the line before the python3 command.

Comment on lines +270 to +272
multi_modal_data = await self.process_vision_info(messages)
if multi_modal_data:
raise ValueError("ToolSlidingWindowAgentLoop only supports text prompts.")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Checking if multi_modal_data: is prone to false positives if process_vision_info returns a dictionary with empty lists or None values (e.g., {'images': [], 'videos': []}). It is safer to explicitly check for the presence of actual image or video data.

Suggested change
multi_modal_data = await self.process_vision_info(messages)
if multi_modal_data:
raise ValueError("ToolSlidingWindowAgentLoop only supports text prompts.")
multi_modal_data = await self.process_vision_info(messages)
if multi_modal_data and (multi_modal_data.get("images") or multi_modal_data.get("videos")):
raise ValueError("ToolSlidingWindowAgentLoop only supports text prompts.")

Comment thread context_management/agent_loop_with_context_management.py
Comment on lines +223 to +225
generated_response_ids = [
token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1
]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The strict parameter in zip() was introduced in Python 3.10. Explicitly passing strict=False will raise a TypeError on Python 3.9. Since False is the default behavior of zip(), it is safer to omit this parameter to maintain backward compatibility with Python 3.9.

Suggested change
generated_response_ids = [
token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1
]
generated_response_ids = [
token_id for token_id, token_mask in zip(response_ids, state.response_mask) if token_mask == 1
]

Comment on lines +238 to +240
generated_response_ids = [
token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1
]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The strict parameter in zip() was introduced in Python 3.10. Explicitly passing strict=False will raise a TypeError on Python 3.9. Since False is the default behavior of zip(), it is safer to omit this parameter to maintain backward compatibility with Python 3.9.

Suggested change
generated_response_ids = [
token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1
]
generated_response_ids = [
token_id for token_id, token_mask in zip(response_ids, state.response_mask) if token_mask == 1
]

tokenize=True,
**self.apply_chat_template_kwargs,
)
summary_ids = normalize_token_ids(tokenized_summary)[self.system_prompt_length :]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Slicing tokenized_summary by a pre-calculated self.system_prompt_length assumes that the tokenizer's chat template always prepends the system prompt. If the template does not prepend a system prompt when none is provided in the messages, this will incorrectly truncate the actual summary tokens. It is safer to dynamically check if the system prompt is present before slicing.

Suggested change
summary_ids = normalize_token_ids(tokenized_summary)[self.system_prompt_length :]
system_prompt_ids = initialize_system_prompt(self.tokenizer, **self.apply_chat_template_kwargs)
summary_ids = normalize_token_ids(tokenized_summary)
if summary_ids[:len(system_prompt_ids)] == system_prompt_ids:
summary_ids = summary_ids[len(system_prompt_ids):]

Comment on lines +171 to +181
for index, match in enumerate(matches):
compressed_parts.append(text[last_end : match.start()])
if index < num_to_compress:
start_tag, _, end_tag = match.groups()
# Previously compressed wouldn't be counted as 'removed_num_obs' this time
if match.group(2).strip() != self.replacing_text:
removed_num_obs += 1
compressed_parts.append(f"{start_tag}{self.replacing_text}{end_tag}")
else:
compressed_parts.append(match.group(0))
last_end = match.end()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Unpacking match.groups() assumes the regex pattern has exactly 3 groups. If a user configures a custom tool_response_pattern with extra passive or nested groups, this will raise a ValueError. It is more robust to use absolute indices of the second group (match.start(2) and match.end(2)) to perform the replacement.

Suggested change
for index, match in enumerate(matches):
compressed_parts.append(text[last_end : match.start()])
if index < num_to_compress:
start_tag, _, end_tag = match.groups()
# Previously compressed wouldn't be counted as 'removed_num_obs' this time
if match.group(2).strip() != self.replacing_text:
removed_num_obs += 1
compressed_parts.append(f"{start_tag}{self.replacing_text}{end_tag}")
else:
compressed_parts.append(match.group(0))
last_end = match.end()
for index, match in enumerate(matches):
if index < num_to_compress:
if match.group(2).strip() != self.replacing_text:
removed_num_obs += 1
compressed_parts.append(text[last_end : match.start(2)])
compressed_parts.append(self.replacing_text)
last_end = match.end(2)
else:
compressed_parts.append(text[last_end : match.end()])
last_end = match.end()

Port the agentic context-management code from verl#5636 ([algo] feat:
supporting agentic rl with context management; issue #5375) into a
self-contained recipe, as the verl maintainers requested.

Contents: the ContextManager abstraction with sliding-window and summarizer
implementations, the naive_summarizer_agent and tool_sliding_window_agent
loops, the design doc, a runnable GRPO example, and the original CPU unit
tests (19 passing against verl main 9c38b8bb).

Changes vs #5636: relocate intra-package imports to recipe.context_management.*
and fix one drifted core import (verl.tools.utils.tool_registry ->
verl.tools.tool_registry) for current verl. AI assistance was used.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Zhentao Fan <zhentao.fan@mail.utoronto.ca>
- Correct inverted tool_response_truncate_side in ToolSlidingWindowAgentLoop to
  match verl-core ToolAgentLoop (left => keep tail, right => keep head).
  Pre-existing in #5636.
- Guard the text-only multi-modal check against empty-list dicts
  ({'images': [], 'videos': []}).
- Remove a fragile backtick-comment idiom from the example run script.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@HarishKMurali

Copy link
Copy Markdown
Author

@wuxibin89 @ArronHZG would you be able to review or help suggest a reviewer? Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant