Feat/add agentic training example by KunWuLuan · Pull Request #83 · verl-project/verl-recipe

KunWuLuan · 2026-04-13T03:07:03Z

Summary

Implement RemoteAgentLoop for VeRL, enabling RL training of external agents (e.g., SWE-agent, Claude Code) that run in separate environments. An HTTP proxy server sits between the agent and the model, capturing all LLM traffic and recording token_ids and logprobs for PPO updates.
Closes #5737

Commits

| Commit | Description |
| --- | --- |
| `177581f` | Add initial agentic RL training example with `RemoteAgentLoop` and Ray-based proxy |
| `374c1d4` | Add standalone proxy mode support with WebSocket relay |
| `60a24c7` | Add Kubernetes deployment manifest for standalone proxy |

Architecture


┌─────────────────────────────────────────────────────────────────────────┐
│                         VeRL Training Cluster                            │
│                                                                          │
│  ┌──────────────────────┐                                               │
│  │  AgentLoop Worker    │                                               │
│  │  (Ray Actor)         │                                               │
│  │                      │                                               │
│  │  ┌────────────────┐  │   HTTP (OpenAI API)                          │
│  │  │ RemoteAgent    │◄─┼──────────────────────────────────────┐       │
│  │  │ Loop           │  │                                      │       │
│  │  └───────┬────────┘  │                                      │       │
│  │          │           │                                      │       │
│  │          ▼           │                                      │       │
│  │  ┌────────────────┐  │   ┌─────────────────────────────┐   │       │
│  │  │ LLMProxyServer │  │   │  Third-Party Agent Cluster  │   │       │
│  │  │ (Ray Singleton │  │   │  (Ray or Kubernetes)        │   │       │
│  │  │  or Standalone)│  │   │                             │   │       │
│  │  │                │  │   │  ┌──────────────────────┐  │   │       │
│  │  │ - FastAPI      │  │   │  │ Third-Party Agent    │  │   │       │
│  │  │ - Session Rec. │  │   │  │ (e.g., SWE-agent)    │  │   │       │
│  │  │ - Tokenizer    │  │   │  │                      │  │   │       │
│  │  └───────┬────────┘  │   │  │ OpenAI SDK           │  │   │       │
│  │          │           │   │  │ base_url =           │  │   │       │
│  │          │           │   │  │ http://proxy/{id}/v1 │  │   │       │
│  │          ▼           │   │  └──────────────────────┘  │   │       │
│  │  ┌───────────────┐    │   └─────────────────────────────┘   │       │
│  │  │ vLLM Rollout  │    │                                     │       │
│  │  │ Servers       │    │                                     │       │
│  │  └───────────────┘    │                                     │       │
│  └──────────────────────┘                                     │       │
└─────────────────────────────────────────────────────────────────────────┘

Key Components

RemoteAgentLoop

Extends AgentLoopBase to:

- Generate a unique trial_id for each rollout trajectory
- Discover the proxy URL via Ray actor or environment variable (PROXY_SERVER_URL)
- Submit tasks to the remote agent server with the proxy URL configuration
- Reconstruct AgentLoopOutput from recorded session data (token_ids, logprobs)
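
The first two steps can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `new_trial_id`, `discover_proxy_url`, and `agent_base_url` are hypothetical helper names, the Ray actor lookup is stood in for by a plain callable, and the rule that the environment variable wins over the actor is an assumption. Only `PROXY_SERVER_URL` and the `{proxy}/{trial_id}/v1` URL shape come from the description above.

```python
import os
import uuid
from typing import Callable, Optional


def new_trial_id() -> str:
    """Return a unique ID for one rollout trajectory (hypothetical helper)."""
    return uuid.uuid4().hex


def discover_proxy_url(actor_lookup: Optional[Callable[[], str]] = None) -> str:
    """Prefer PROXY_SERVER_URL; otherwise ask a fallback such as a Ray actor.

    Assumption: the environment variable wins when both are available.
    """
    env_url = os.environ.get("PROXY_SERVER_URL")
    if env_url:
        return env_url.rstrip("/")
    if actor_lookup is None:
        raise RuntimeError("PROXY_SERVER_URL is unset and no fallback lookup was given")
    return actor_lookup().rstrip("/")


def agent_base_url(proxy_url: str, trial_id: str) -> str:
    """Build the OpenAI-compatible base URL handed to the remote agent."""
    return f"{proxy_url}/{trial_id}/v1"
```

Because the trial_id is part of the URL path, the proxy can attribute every request to one trajectory without the agent sending any extra headers.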

LLMProxyServer

Supports two operating modes:

- Local mode (Ray actor): Direct LiteLLM + vLLM integration
- Relay mode (standalone): WebSocket relay to framework-side inference workers

InferenceWorkerClient

WebSocket client that:

- Connects to the standalone proxy's /ws/worker endpoint
- Bridges vLLM inference requests from the proxy to VeRL's rollout servers
- Supports automatic reconnection with exponential backoff
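
For readers unfamiliar with the reconnection pattern, here is a minimal sketch of exponential backoff. It does not reproduce the PR's `worker_client.py`: the function names, the delay parameters, and the use of the built-in `ConnectionError` are all assumptions, and a real client would also catch the exceptions raised by its WebSocket library.

```python
import asyncio
import random
from typing import Awaitable, Callable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


async def run_with_reconnect(
    connect_and_serve: Callable[[], Awaitable[None]],
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 30.0,
) -> None:
    """Call `connect_and_serve` until it returns cleanly or attempts run out."""
    for attempt in range(max_attempts):
        try:
            await connect_and_serve()
            return
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Jitter spreads out reconnects when many workers drop at once.
            delay = backoff_delay(attempt, base, cap) * random.uniform(0.5, 1.0)
            await asyncio.sleep(delay)
```

A long-lived worker would probably also reset the attempt counter after a connection has stayed healthy for a while; the sketch leaves that out for brevity.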

Files Changed

| File | Description |
| --- | --- |
| `agentic/__init__.py` | Package marker |
| `agentic/agent_loop/__init__.py` | Register `remote_agent` loop |
| `agentic/agent_loop/config.py` | `RemoteAgentConfig` with env parsing |
| `agentic/agent_loop/remote_agent_loop.py` | `RemoteAgentLoop` implementation |
| `agentic/agentic-qwen2.5-3b.sh` | Example training script |
| `agentic/agentic_main.py` | TaskRunner integration |
| `agentic/config/agentic_trainer.yaml` | Trainer config |
| `agentic/mcp-tools.sh` | MCP tools launcher |
| `agentic/swe-agent.yaml` | SWE-agent config |
| `agentic/proxyserver/__init__.py` | Package marker |
| `agentic/proxyserver/models.py` | Data models |
| `agentic/proxyserver/recorder.py` | Session recording |
| `agentic/proxyserver/server.py` | Core proxy server (dual-mode) |
| `agentic/proxyserver/vllm_provider.py` | LiteLLM custom provider for vLLM |
| `agentic/proxyserver/proxy_server.py` | Standalone entry point |
| `agentic/proxyserver/ray_actor.py` | Ray actor wrapper |
| `agentic/proxyserver/relay.py` | WebSocket inference relay |
| `agentic/proxyserver/worker_client.py` | Framework-side WebSocket client |
| `agentic/proxyserver/Dockerfile` | Standalone proxy image |
| `agentic/proxyserver/deploy.yaml` | Kubernetes deployment |
| `agentic/serversdk/__init__.py` | SDK package |
| `agentic/serversdk/client.py` | Harbor server client |
| `agentic/serversdk/models.py` | SDK models |

Usage

Ray actor mode (default):

./agentic/agentic-qwen2.5-3b.sh

Standalone proxy mode:

# Deploy proxy
kubectl apply -f agentic/proxyserver/deploy.yaml

# Set environment variable
export PROXY_SERVER_URL=http://llm-proxy-server:8080
./agentic/agentic-qwen2.5-3b.sh

Test Plan

- Run agentic/agentic-qwen2.5-3b.sh with Ray actor mode
- Deploy standalone proxy to Kubernetes
- Test PROXY_SERVER_URL mode with inference worker client
- Verify token_ids and logprobs capture in session data
- End-to-end SWE-bench training run

gemini-code-assist

Code Review

This pull request introduces a RemoteAgentLoop and an associated LLM proxy server to enable PPO training with remote third-party agents. The implementation includes a standalone proxy mode using WebSockets and a Ray-based actor mode, along with an SDK for submitting agent runs to a Harbor server. Feedback focuses on improving robustness and efficiency, specifically regarding error handling for environment variable parsing, optimizing HTTP session management, fixing potential crashes in the agent loop logic, and leveraging modern Ray features for asynchronous operations.

gemini-code-assist · 2026-04-13T03:09:37Z

+        if v := os.getenv("REMOTE_MODEL_NAME"):
+            kwargs["model_name"] = v
+        if v := os.getenv("REMOTE_AGENT_KWARGS"):
+            kwargs["agent_kwargs"] = json.loads(v)


json.loads() is called directly on environment variables without error handling. If REMOTE_AGENT_KWARGS (or other JSON-based env vars on lines 84 and 86) contains invalid JSON, the application will crash with a JSONDecodeError during configuration loading. It is safer to wrap these calls in a try-except block and provide a descriptive error message.
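
One way to follow this advice is a small helper that names the offending variable in the error. The sketch below is illustrative only: `load_json_env` is a hypothetical name, and raising `ValueError` is one choice among several reasonable ones.

```python
import json
import os
from typing import Any, Optional


def load_json_env(name: str) -> Optional[Any]:
    """Parse env var `name` as JSON; return None when it is unset or empty."""
    raw = os.getenv(name)
    if not raw:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Name the variable so a bad value is easy to trace at startup.
        raise ValueError(f"Environment variable {name} is not valid JSON: {exc}") from exc
```

The config loader could then write `if (v := load_json_env("REMOTE_AGENT_KWARGS")) is not None: kwargs["agent_kwargs"] = v`.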

gemini-code-assist · 2026-04-13T03:09:38Z

+    async def _register_session(self, proxy_url: str, trial_id: str) -> None:
+        """POST /sessions/{trial_id} to register a new session."""
+        async with aiohttp.ClientSession() as session:
+            async with session.post(f"{proxy_url}/sessions/{trial_id}") as resp:
+                resp.raise_for_status()
+
+    async def _get_session_data(self, proxy_url: str, trial_id: str) -> SessionRecord | None:
+        """GET /sessions/{trial_id} to retrieve recorded session data."""
+        async with aiohttp.ClientSession() as session:
+            async with session.get(f"{proxy_url}/sessions/{trial_id}") as resp:
+                if resp.status == 404:
+                    return None
+                resp.raise_for_status()
+                data = await resp.json()
+                return SessionRecord(**data)
+
+    async def _complete_session(self, proxy_url: str, trial_id: str) -> None:
+        """POST /sessions/{trial_id}/complete to mark session completed."""
+        async with aiohttp.ClientSession() as session:
+            async with session.post(f"{proxy_url}/sessions/{trial_id}/complete") as resp:
+                resp.raise_for_status()
+
+    async def _delete_session(self, proxy_url: str, trial_id: str) -> None:
+        """DELETE /sessions/{trial_id} to remove session data."""
+        async with aiohttp.ClientSession() as session:
+            async with session.delete(f"{proxy_url}/sessions/{trial_id}") as resp:
+                resp.raise_for_status()


Creating a new aiohttp.ClientSession for every helper method call (e.g., _register_session, _get_session_data, etc.) is inefficient. aiohttp sessions are designed to be reused to take advantage of connection pooling and reduce overhead. Consider creating a single session instance in the __init__ method and reusing it throughout the lifecycle of the RemoteAgentLoop instance.
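
A possible shape for the reuse is sketched below. It is a variation on the suggestion rather than the PR's code: `SessionApiClient` and its method names are hypothetical, and the session is created lazily on first use instead of in `__init__`, on the assumption that the object may be constructed before an event loop is running. The factory parameter exists only so the sketch can be exercised without a live server.

```python
from typing import Any, Callable, Optional


def _default_session_factory() -> Any:
    import aiohttp  # imported lazily so the sketch loads without aiohttp installed
    return aiohttp.ClientSession()


class SessionApiClient:
    """Hypothetical helper that shares one HTTP session across session calls."""

    def __init__(self, proxy_url: str, session_factory: Callable[[], Any] = _default_session_factory):
        self._proxy_url = proxy_url.rstrip("/")
        self._session_factory = session_factory
        self._session: Optional[Any] = None

    def _get_session(self) -> Any:
        # Create on first use (inside the running loop) and reuse afterwards.
        if self._session is None or self._session.closed:
            self._session = self._session_factory()
        return self._session

    async def register(self, trial_id: str) -> None:
        async with self._get_session().post(f"{self._proxy_url}/sessions/{trial_id}") as resp:
            resp.raise_for_status()

    async def delete(self, trial_id: str) -> None:
        async with self._get_session().delete(f"{self._proxy_url}/sessions/{trial_id}") as resp:
            resp.raise_for_status()

    async def aclose(self) -> None:
        # Call once when the owning loop object is torn down.
        if self._session is not None and not self._session.closed:
            await self._session.close()
```

The owner has to call `aclose()` explicitly, which is the cost of dropping the per-call `async with aiohttp.ClientSession()` blocks.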

gemini-code-assist · 2026-04-13T03:09:38Z

+            logprobs captured by the proxy.
+        """
+        import shortuuid
+        messages = list(kwargs["raw_prompt"])


The messages variable is initialized here but is never used in the rest of the run method. Additionally, accessing kwargs["raw_prompt"] directly will raise a KeyError if the key is missing. Since the reconstruction logic later uses session.turns[0].request_messages, this line appears to be redundant and potentially unsafe.

gemini-code-assist · 2026-04-13T03:09:38Z

+                    "LOCAL_IP is not set, falling back to 0.0.0.0. "
+                    "The remote agent may not be able to reach the proxy."
+                )
+            agent_base_url = f"http://{local_ip}:{proxy_port}/{trial_id}/v1"


urlparse(proxy_url).port returns None if the port is not explicitly specified in the URL (e.g., http://proxy-server). This results in an invalid agent_base_url like http://10.0.30.11:None/.... You should handle the case where the port is missing by defaulting to the standard port for the scheme (80 for http, 443 for https).

Suggested change

-            agent_base_url = f"http://{local_ip}:{proxy_port}/{trial_id}/v1"
+            proxy_port = parsed.port or (443 if parsed.scheme == "https" else 80)
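
To make the port fallback concrete, here is a self-contained sketch that combines it with the URL construction from the diff. The function names are hypothetical, and the hard-coded `http` scheme in the result simply mirrors the original line.

```python
from urllib.parse import urlparse


def resolve_proxy_port(proxy_url: str) -> int:
    """Return the explicit port, or the scheme default when none is given."""
    parsed = urlparse(proxy_url)
    return parsed.port or (443 if parsed.scheme == "https" else 80)


def build_agent_base_url(proxy_url: str, local_ip: str, trial_id: str) -> str:
    """Build the base URL given to the agent, never embedding a literal 'None'."""
    proxy_port = resolve_proxy_port(proxy_url)
    return f"http://{local_ip}:{proxy_port}/{trial_id}/v1"
```

With this, `build_agent_base_url("http://proxy-server", "10.0.30.11", "abc")` yields `http://10.0.30.11:80/abc/v1` instead of a URL containing `:None`.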

gemini-code-assist · 2026-04-13T03:09:38Z

+                )
+            agent_base_url = f"http://{local_ip}:{proxy_port}/{trial_id}/v1"
+
+            agent_kwargs = dict(self.remote_agent_kwargs)


dict(self.remote_agent_kwargs) creates a shallow copy. If the agent configuration contains nested dictionaries or lists, they will be shared across all rollout executions. Mutations to these nested structures (e.g., by the agent SDK) could lead to subtle bugs or race conditions. Use copy.deepcopy to ensure each run has a completely isolated set of arguments.
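
The difference is easy to demonstrate with a toy configuration. The keys and values below are made up for illustration and are not taken from the PR.

```python
import copy

remote_agent_kwargs = {"max_steps": 10, "tools": {"enabled": ["bash"]}}  # made-up values

shallow = dict(remote_agent_kwargs)
shallow["tools"]["enabled"].append("editor")
# The nested list is shared, so the original config changed too.
leaked = remote_agent_kwargs["tools"]["enabled"] == ["bash", "editor"]

isolated = copy.deepcopy(remote_agent_kwargs)
isolated["tools"]["enabled"].append("browser")
# The deep copy owns its nested structures; the original is unaffected.
untouched = "browser" not in remote_agent_kwargs["tools"]["enabled"]
```

Both `leaked` and `untouched` end up `True`: the shallow copy leaks the mutation back into the shared config, while the deep copy does not.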

gemini-code-assist · 2026-04-13T03:09:38Z

+
+        import ray
+
+        output = await asyncio.to_thread(ray.get, ref)


In modern Ray versions, ObjectRef is awaitable. Instead of using asyncio.to_thread(ray.get, ref), which blocks a thread from the pool, you can directly await ref. This is more efficient and follows the recommended pattern for asynchronous Ray applications.

Suggested change

-        output = await asyncio.to_thread(ray.get, ref)
+        output = await ref

gemini-code-assist · 2026-04-13T03:09:38Z

+        if not log_probs:
+            return None
+        content = []
+        for tid, lp in zip(token_ids, log_probs):


Calling self.tokenizer.decode([tid]) in a loop for every token is inefficient, especially for long sequences. Consider using self.tokenizer.convert_ids_to_tokens or decoding the entire sequence once and mapping tokens to their positions to improve performance when building the logprobs metadata.
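
A batched version might look like the sketch below. The function name and the `{"token", "logprob"}` entry shape are assumptions, since the full method is not shown in the diff. Note also that `convert_ids_to_tokens` on Hugging Face tokenizers returns raw vocabulary pieces, which can differ from the text produced by `decode` (for example, byte-level BPE markers for leading spaces), so the swap is only safe if downstream consumers accept that form.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_logprob_content(
    tokenizer: Any, token_ids: Sequence[int], log_probs: Sequence[float]
) -> Optional[List[Dict[str, Any]]]:
    """Pair each token with its logprob using one batched tokenizer call."""
    if not log_probs:
        return None
    # One call for the whole sequence instead of decode([tid]) per token.
    tokens = tokenizer.convert_ids_to_tokens(list(token_ids))
    return [{"token": tok, "logprob": lp} for tok, lp in zip(tokens, log_probs)]
```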

Signed-off-by: KunWuLuan <30817980+KunWuLuan@users.noreply.github.com>

KunWuLuan · 2026-05-14T08:22:25Z

@wuxibin89 Hi, would like to hear your thoughts, thank you!

- Promote REMOTE_AGENT_* / LOCAL_IP knobs into yaml; inject via runtime_env.env_vars with os.environ.setdefault to keep CLI > shell env > yaml precedence.
- Rename proxy_server.local_ip -> llm_proxy_ip (LOCAL_IP -> LLM_PROXY_IP) to match its real semantics (the LLM proxy's externally reachable IP).
- Add recipe/agentic/dataset/local_harbor.py: scan local Harbor task dirs (task.toml + instruction.md) into a cached verl parquet; the driver overrides data.train_files / data.val_files when data.train_harbor_dir / val_harbor_dir is set.
- Rows emit top-level instance_id / local_task_path; RemoteAgentLoop resolves task_path as local_task_path -> <harbor_root>/<instance_id> -> legacy task_path_template.
- Add README, rewrite agentic-qwen2.5-3b.sh to pure hydra overrides, drop obsolete mcp-tools.sh.

Assisted-by: Qoder
Signed-off-by: kunwuluan@gmail.com
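
The shell-env-over-yaml half of that precedence follows directly from how `setdefault` behaves, as the sketch below shows. `apply_yaml_env_defaults` is a hypothetical name and the values are invented; CLI overrides are not modelled here, on the assumption that they are resolved into the yaml-derived values before this step runs.

```python
import os
from typing import Dict, Mapping, MutableMapping


def apply_yaml_env_defaults(
    yaml_env: Mapping[str, str], environ: MutableMapping[str, str] = os.environ
) -> Dict[str, str]:
    """Fill in env vars from yaml values without overriding existing ones."""
    effective = {}
    for key, value in yaml_env.items():
        # setdefault writes only when the key is absent, so shell env wins over yaml.
        effective[key] = environ.setdefault(key, str(value))
    return effective
```

For example, with `LLM_PROXY_IP` already exported in the shell, a yaml value for the same key is ignored, while a key set only in yaml is applied.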

gemini-code-assist Bot reviewed Apr 13, 2026


KunWuLuan added 3 commits May 14, 2026 16:13

feat(agentic): add agentic RL training example

738250c

Signed-off-by: KunWuLuan <30817980+KunWuLuan@users.noreply.github.com>

support standalone proxy mode

6e269d8

Signed-off-by: KunWuLuan <30817980+KunWuLuan@users.noreply.github.com>

add deploy

13ccb14

Signed-off-by: KunWuLuan <30817980+KunWuLuan@users.noreply.github.com>

KunWuLuan force-pushed the feat/add-agentic-training-example branch from 60a24c7 to 13ccb14 on May 14, 2026 08:14

KunWuLuan mentioned this pull request May 14, 2026

Feature Request: RemoteAgentLoop - Support for External Distributed Agent Integration verl-project/verl#5737

Open

16 tasks

KunWuLuan force-pushed the feat/add-agentic-training-example branch from 4f15d7c to e510cca on May 15, 2026 02:02

KunWuLuan added 2 commits May 15, 2026 16:25

fix agentic for verl 0514

fb1213f

update k8s configuration example

3d29886

KunWuLuan wants to merge 6 commits into verl-project:main from alibaba:feat/add-agentic-training-example
