Skip to content

fix(verl): preserve multi-turn tool-call prefix extension for math tool agent for Qwen 3 models#516

Draft
JasonWei05 wants to merge 2 commits into
mainfrom
feat/multi-turn-extension
Draft

fix(verl): preserve multi-turn tool-call prefix extension for math tool agent for Qwen 3 models#516
JasonWei05 wants to merge 2 commits into
mainfrom
feat/multi-turn-extension

Conversation

@JasonWei05

@JasonWei05 JasonWei05 commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

Summary

This PR adds rllm.multi_turn_extension support for Qwen 3 multi-turn tool-calling rollouts through the verl gateway path. When enabled, the gateway renders chat prompts with rLLM's chat template parser and forwards raw text to vLLM's /v1/completions, avoiding vLLM chat-template canonicalization that breaks prefix-extension checks. Qwen 3 and Deepseek-Qwen parsers now preserve model-emitted assistant whitespace around tool calls so replayed turns remain byte-identical. This is not an issue for Qwen 3.5 and Qwen 3 coder models, which use qwen3_coder and qwen3_xml parsers. This is an issue with the hermes parser.

Type of change

  • Feature
  • Fix
  • Docs
  • Refactor
  • Example / Project
  • Infra / CI

What changed

  • Added parser-backed gateway transport for multi_turn_extension=true, converting chat-completions requests to raw-text completions while preserving trace token ids and logprobs.
  • Threaded multi_turn_extension, tokenizer name, thinking mode, and accumulated reasoning config through the gateway and verl rollout setup.
  • Updated Qwen and Deepseek-Qwen parser rendering/parsing so tool-call turns can preserve model-emitted whitespace.
  • Made cookbooks/math_tool_agent/train_verl.sh default to MULTI_TURN_EXTENSION=true with legacy vLLM tool parsing only when disabled.
  • Added focused parser and gateway tests for raw-text transport, logprob normalization, tool-call finish reasons, and Qwen/Deepseek-Qwen whitespace behavior.

Validation

  • pre-commit run --all-files
  • Targeted tests: pytest ...
  • Manual validation performed
  • Not run (reason below)

Validation details:

  • PYTHONPATH=. uv run --no-project --with pytest python -m pytest tests/parser/test_multi_turn_extension.py -q passed: 4 passed.
  • PYTHONPATH=src uv run --no-project --with pytest --with pytest-asyncio --with fastapi --with uvicorn --with httpx --with 'pydantic>=2' --with aiosqlite --with PyYAML python -m pytest tests/unit/test_parser_transport.py tests/unit/test_server.py -q passed: 60 passed, 2 warnings.
  • PYTHONPATH=src uv run --no-project --with pytest --with pytest-asyncio --with fastapi --with uvicorn --with httpx --with 'pydantic>=2' --with aiosqlite --with PyYAML python -m pytest tests/unit/ -q passed: 186 passed, 2 warnings.
  • uv run --no-project --with ruff ruff check ... passed on changed Python files.
  • python -m compileall -q ..., bash -n cookbooks/math_tool_agent/train_verl.sh, and git diff --check passed.

Breaking changes / migration notes

  • Adds rllm.multi_turn_extension, defaulting to False in base config.
  • For multi_turn_extension=true, gateway parser transport requires a tokenizer/model path and either rllm.disable_thinking=true or rllm.accumulate_reasoning=true.
  • Streaming, n>1, multimodal chat content, and required tool-choice enforcement are not supported by the v1 parser transport path.

Docs / examples

  • Not needed
  • Updated docs
  • Updated examples
  • Follow-up docs needed

Related issues / PRs

  • Fixes #
  • Related to #
  • Stacked on / depends on #

Screenshots / logs

N/A

@JasonWei05 JasonWei05 marked this pull request as draft April 30, 2026 08:31
@JasonWei05 JasonWei05 changed the title fix(verl): preserve multi-turn tool-call prefix extension for math tool agent fix(verl): preserve multi-turn tool-call prefix extension for math tool agent for Qwen 3 models Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant