Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 62 additions & 0 deletions context_management/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
# Context-management agent loops

Plug-in **context management** for verl agent loops: keep multi-turn / long-horizon rollouts within
the model's context window by compressing the trajectory on the fly, instead of truncating or
failing once the window is exceeded.

This recipe provides two ready-to-use agent loops and the `ContextManager` abstraction they share:

| Agent loop (`name`) | Class | Strategy |
|---|---|---|
| `naive_summarizer_agent` | `SummarizerAgentLoop` | When the model emits a `<summary>...</summary>` block, replace the history with `(initial prompt + summary)` and continue. |
| `tool_sliding_window_agent` | `ToolSlidingWindowAgentLoop` | Keep a sliding window over tool-calling turns, dropping the oldest turns when the window is exceeded. |

Both subclass `AgentLoopWithContextManagement`, which drives a generic
`generate → check_and_compress → continue` loop around any `ContextManager`
(`SummarizerContextManager`, `SlidingWindowContextManager`, or your own).

## Background

This code was originally proposed for verl core in
[volcengine/verl#5636](https://github.com/verl-project/verl/pull/5636)
("[algo] feat: supporting agentic rl with context management", see issue
[#5375](https://github.com/verl-project/verl/issues/5375)). At the maintainers' request it now lives
here as a self-contained recipe rather than in `verl/experimental/agent_loop/`, so it can evolve
independently of the core library. The multi-trajectory / session-level GRPO training support that
complements it lands separately in core (see verl#5401, #5969).

## Layout

```
context_management/
context_manager.py # ContextManager + Sliding-window / Summarizer implementations
agent_loop_with_context_management.py # AgentLoopWithContextManagement + the two agent loops
context_manager_plugin.md # design notes / how to write a custom ContextManager
test_context_manager.py # CPU unit tests
test_agent_loop_with_context_management.py
example/ # runnable GRPO example wiring the summarizer loop
```

## Usage

The loops register themselves under the `name`s above. Point verl at this recipe's agent-loop config
and select a loop:

```bash
actor_rollout_ref.rollout.agent.agent_loop_config_path=recipe/context_management/example/agent.yaml
actor_rollout_ref.rollout.agent.default_agent_loop=naive_summarizer_agent
```

See [`example/`](example/) for a full run script, and
[`context_manager_plugin.md`](context_manager_plugin.md) for writing your own `ContextManager`.

## Required verl version

See [`REQUIRED_VERL.txt`](REQUIRED_VERL.txt) for the upstream repo and the pinned core-library commit.

## Tests

```bash
pytest recipe/context_management/test_context_manager.py
pytest recipe/context_management/test_agent_loop_with_context_management.py
```
11 changes: 11 additions & 0 deletions context_management/REQUIRED_VERL.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# context_management — rolling; refresh the commit against your verl checkout before publishing
UPSTREAM=https://github.com/verl-project/verl.git
MODE=rolling
BRANCH=main
# Core-library commit this recipe was developed/tested against. Refresh before opening the PR.
VERL_COMMIT=9c38b8bb1876a81273d76de3e79328b2dd2b7b32
PIP_INSTALL=pip install verl@git+https://github.com/verl-project/verl.git@9c38b8bb1876a81273d76de3e79328b2dd2b7b32
GIT_SETUP=git clone https://github.com/verl-project/verl.git && cd verl && git checkout 9c38b8bb1876a81273d76de3e79328b2dd2b7b32 && git submodule update --init --recursive recipe
RECIPE_FOLDER=context_management
NOTES=Depends only on stable verl core APIs: verl.experimental.agent_loop.agent_loop (AgentLoopBase, register, AgentLoopOutput, AgentLoopMetrics), verl.tools, verl.utils.chat_template, verl.utils.tokenizer, verl.workers.rollout.replica.TokenOutput. No core code changes are required to use this recipe.
REFRESH=Recompute VERL_COMMIT: (cd verl && git rev-parse HEAD). Re-run the tests under recipe/context_management/ after bumping.
13 changes: 13 additions & 0 deletions context_management/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# Copyright 2024 Bytedance Ltd. and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Loading
Loading