feat(recipe): Add context-management agent loops (relocation of #5636) by HarishKMurali · Pull Request #107 · verl-project/verl-recipe

HarishKMurali · 2026-06-05T15:30:48Z

What does this PR do?

Adds a self-contained context_management/ recipe providing plug-in context management for verl
agent loops — keeping long / multi-turn rollouts within the model's context window by compressing the
trajectory on the fly instead of truncating or failing once the window is exceeded.

It contains two agent loops and the shared ContextManager abstraction:

Agent loop (`name`)	Class	Strategy
`naive_summarizer_agent`	`SummarizerAgentLoop`	On a model-emitted `<summary>…</summary>`, reset history to `(prompt + summary)` and continue.
`tool_sliding_window_agent`	`ToolSlidingWindowAgentLoop`	Sliding window over tool-calling turns; drop oldest when exceeded.

Why this is not a duplicate

This is the maintainer-requested relocation of the AgentLoop-layer code from #5636 ("[algo] feat: supporting agentic rl with context management", issue #5375) out of verl/experimental/agent_loop/ and into a recipe, so it can evolve independently of core. Continuing @ZhentaoFan's work at their request.

context_management/
  context_manager.py                      # ContextManager + Sliding-window / Summarizer impls (from #5636)
  agent_loop_with_context_management.py   # AgentLoopWithContextManagement + the two loops (from #5636)
  context_manager_plugin.md               # design notes (from #5636)
  test_context_manager.py                 # CPU unit tests (from #5636)
  test_agent_loop_with_context_management.py
  example/                                # agent.yaml + run_summarizer.sh + README
  README.md, REQUIRED_VERL.txt, __init__.py

Changes vs #5636 (minimal, mechanical)

Relocated the two intra-package imports to recipe.context_management.* (loop file + both tests + doc).
Fixed one core-API drift for current verl: verl.tools.utils.tool_registry → verl.tools.tool_registry.
All other imports remain on stable verl-core APIs (agent_loop, verl.tools, verl.utils.*,
verl.workers.rollout.replica). No verl core changes are required to use this recipe.
-Also corrects an inverted tool_response_truncate_side in the ported ToolSlidingWindowAgentLoop to match verl-core ToolAgentLoop (caught in review).

Testing

Run inside a verl checkout with this recipe mounted at recipe/ (verl 9c38b8bb, image
verlai/verl:vllm011.latest):

pytest recipe/context_management/test_context_manager.py \
       recipe/context_management/test_agent_loop_with_context_management.py
# => 19 passed

Import + registration verified end-to-end:

import recipe.context_management.agent_loop_with_context_management  # registers both loops
from verl.experimental.agent_loop.agent_loop import _agent_loop_registry
assert 'naive_summarizer_agent' in _agent_loop_registry
assert 'tool_sliding_window_agent' in _agent_loop_registry

Wired into a run via actor_rollout_ref.rollout.agent.agent_loop_config_path=recipe/context_management/example/agent.yaml.

Verification

The relocated code is behavior-identical to #5636 — only import paths changed, plus one drifted core
import fixed — and #5636's own demo verification ("Agent with Summarizer") is linked there. This PR
adds fresh evidence that the relocation is correct against current verl:

19 passed on the recipe's CPU unit tests (command above).
Imports resolve and both loops register under recipe.context_management.* (snippet above).

Open question for reviewers

Is the above (relocation + passing unit tests on current verl + #5636's demo) sufficient verification
for this move, or do you expect a fresh comparative experiment? Happy to add one if needed.

Notes

AI assistance (Claude) was used; every changed line was reviewed and the tests were run by the submitter.
REQUIRED_VERL.txt pins the core-library commit this was validated against.

gemini-code-assist

Code Review

This pull request introduces a pluggable context-management recipe for verl agent loops, including a model-based summarizer loop and a rule-based sliding-window tool loop, along with their respective context managers, tests, and a GRPO example. The review feedback highlights several critical issues and robustness improvements: a syntax error in the bash script due to inline comments breaking line continuation, inverted truncation logic in the tool response truncation, Python 3.9 compatibility issues with zip(..., strict=False), potential false positives in multi-modal data checks, and fragile assumptions regarding system prompt lengths and regex group counts during token slicing and replacement.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-05T15:33:02Z

+    actor_rollout_ref.rollout.mode=async \
+    actor_rollout_ref.rollout.n=8 \
+    actor_rollout_ref.rollout.multi_turn.enable=True \
+    `# --- context-management wiring (the point of this recipe) ---` \


Using a comment inside backticks with a line continuation backslash (\) in a multi-line bash command is a syntax error. The # character comments out the rest of the line, including the closing backtick and the backslash, which breaks the line continuation and causes the script to fail. Please move this comment to the line before the python3 command.

gemini-code-assist · 2026-06-05T15:33:02Z

+        multi_modal_data = await self.process_vision_info(messages)
+        if multi_modal_data:
+            raise ValueError("ToolSlidingWindowAgentLoop only supports text prompts.")


Checking if multi_modal_data: is prone to false positives if process_vision_info returns a dictionary with empty lists or None values (e.g., {'images': [], 'videos': []}). It is safer to explicitly check for the presence of actual image or video data.

Suggested change

multi_modal_data = await self.process_vision_info(messages)

if multi_modal_data:

raise ValueError("ToolSlidingWindowAgentLoop only supports text prompts.")

multi_modal_data = await self.process_vision_info(messages)

if multi_modal_data and (multi_modal_data.get("images") or multi_modal_data.get("videos")):

raise ValueError("ToolSlidingWindowAgentLoop only supports text prompts.")

gemini-code-assist · 2026-06-05T15:33:02Z

+        generated_response_ids = [
+            token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1
+        ]


The strict parameter in zip() was introduced in Python 3.10. Explicitly passing strict=False will raise a TypeError on Python 3.9. Since False is the default behavior of zip(), it is safer to omit this parameter to maintain backward compatibility with Python 3.9.

Suggested change

generated_response_ids = [

token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1

]

generated_response_ids = [

token_id for token_id, token_mask in zip(response_ids, state.response_mask) if token_mask == 1

]

gemini-code-assist · 2026-06-05T15:33:02Z

+        generated_response_ids = [
+            token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1
+        ]


The strict parameter in zip() was introduced in Python 3.10. Explicitly passing strict=False will raise a TypeError on Python 3.9. Since False is the default behavior of zip(), it is safer to omit this parameter to maintain backward compatibility with Python 3.9.

Suggested change

generated_response_ids = [

token_id for token_id, token_mask in zip(response_ids, state.response_mask, strict=False) if token_mask == 1

]

generated_response_ids = [

token_id for token_id, token_mask in zip(response_ids, state.response_mask) if token_mask == 1

]

gemini-code-assist · 2026-06-05T15:33:03Z

+            tokenize=True,
+            **self.apply_chat_template_kwargs,
+        )
+        summary_ids = normalize_token_ids(tokenized_summary)[self.system_prompt_length :]


Slicing tokenized_summary by a pre-calculated self.system_prompt_length assumes that the tokenizer's chat template always prepends the system prompt. If the template does not prepend a system prompt when none is provided in the messages, this will incorrectly truncate the actual summary tokens. It is safer to dynamically check if the system prompt is present before slicing.

Suggested change

summary_ids = normalize_token_ids(tokenized_summary)[self.system_prompt_length :]

system_prompt_ids = initialize_system_prompt(self.tokenizer, **self.apply_chat_template_kwargs)

summary_ids = normalize_token_ids(tokenized_summary)

if summary_ids[:len(system_prompt_ids)] == system_prompt_ids:

summary_ids = summary_ids[len(system_prompt_ids):]

gemini-code-assist · 2026-06-05T15:33:03Z

+        for index, match in enumerate(matches):
+            compressed_parts.append(text[last_end : match.start()])
+            if index < num_to_compress:
+                start_tag, _, end_tag = match.groups()
+                # Previously compressed wouldn't be counted as 'removed_num_obs' this time
+                if match.group(2).strip() != self.replacing_text:
+                    removed_num_obs += 1
+                compressed_parts.append(f"{start_tag}{self.replacing_text}{end_tag}")
+            else:
+                compressed_parts.append(match.group(0))
+            last_end = match.end()


Unpacking match.groups() assumes the regex pattern has exactly 3 groups. If a user configures a custom tool_response_pattern with extra passive or nested groups, this will raise a ValueError. It is more robust to use absolute indices of the second group (match.start(2) and match.end(2)) to perform the replacement.

Suggested change

for index, match in enumerate(matches):

compressed_parts.append(text[last_end : match.start()])

if index < num_to_compress:

start_tag, _, end_tag = match.groups()

# Previously compressed wouldn't be counted as 'removed_num_obs' this time

if match.group(2).strip() != self.replacing_text:

removed_num_obs += 1

compressed_parts.append(f"{start_tag}{self.replacing_text}{end_tag}")

else:

compressed_parts.append(match.group(0))

last_end = match.end()

for index, match in enumerate(matches):

if index < num_to_compress:

if match.group(2).strip() != self.replacing_text:

removed_num_obs += 1

compressed_parts.append(text[last_end : match.start(2)])

compressed_parts.append(self.replacing_text)

last_end = match.end(2)

else:

compressed_parts.append(text[last_end : match.end()])

last_end = match.end()

Port the agentic context-management code from verl#5636 ([algo] feat: supporting agentic rl with context management; issue #5375) into a self-contained recipe, as the verl maintainers requested. Contents: the ContextManager abstraction with sliding-window and summarizer implementations, the naive_summarizer_agent and tool_sliding_window_agent loops, the design doc, a runnable GRPO example, and the original CPU unit tests (19 passing against verl main 9c38b8bb). Changes vs #5636: relocate intra-package imports to recipe.context_management.* and fix one drifted core import (verl.tools.utils.tool_registry -> verl.tools.tool_registry) for current verl. AI assistance was used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Zhentao Fan <zhentao.fan@mail.utoronto.ca>

- Correct inverted tool_response_truncate_side in ToolSlidingWindowAgentLoop to match verl-core ToolAgentLoop (left => keep tail, right => keep head). Pre-existing in #5636. - Guard the text-only multi-modal check against empty-list dicts ({'images': [], 'videos': []}). - Remove a fragile backtick-comment idiom from the example run script. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

HarishKMurali · 2026-06-12T15:44:16Z

@wuxibin89 @ArronHZG would you be able to review or help suggest a reviewer? Thanks

gemini-code-assist Bot reviewed Jun 5, 2026

View reviewed changes

HarishKMurali force-pushed the naive_summarizer branch from f8e65e7 to 422ffe4 Compare June 5, 2026 15:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(recipe): Add context-management agent loops (relocation of #5636)#107

feat(recipe): Add context-management agent loops (relocation of #5636)#107
HarishKMurali wants to merge 2 commits into
verl-project:mainfrom
HarishKMurali:naive_summarizer

HarishKMurali commented Jun 5, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Uh oh!

HarishKMurali commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-        summary_ids = normalize_token_ids(tokenized_summary)[self.system_prompt_length :]
+        system_prompt_ids = initialize_system_prompt(self.tokenizer, **self.apply_chat_template_kwargs)
+        summary_ids = normalize_token_ids(tokenized_summary)
+        if summary_ids[:len(system_prompt_ids)] == system_prompt_ids:
+            summary_ids = summary_ids[len(system_prompt_ids):]

Conversation

HarishKMurali commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Why this is not a duplicate

Contents

Changes vs #5636 (minimal, mechanical)

Testing

Verification

Open question for reviewers

Notes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

HarishKMurali commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

HarishKMurali commented Jun 5, 2026 •

edited

Loading