feat(parser): renderers-based parser backend with selectable parser_backend by listar2000 · Pull Request #591 · rllm-org/rllm

listar2000 · 2026-05-21T21:39:23Z

Summary

Adds a token-native message parser built on the external renderers package as an alternative to the chat-template-string path, behind a shared BaseParser contract so rollout engines can switch backends transparently.

The chat-template approach (messages → string → tokens) silently breaks token identity in multi-turn RL — boolean round-trips, BPE retokenization drift, dropped reasoning blocks. renderers renders messages straight to token ids and extends a rollout with bridge_to_next_turn, reusing model-sampled tokens verbatim instead of re-rendering history.

This PR puts the new backend in place and makes it selectable; it is opt-in per engine via parser_backend.

Changes

New parser layer (rllm/parser/)

base.py — BaseParser contract (render, parse_completion, get_stop_token_ids, bridge_to_next_turn), ParsedCompletion result type, and ParserSession (per-rollout cache that uses bridge_to_next_turn to extend multi-turn rollouts without re-rendering sampled tokens).
renderer_parser.py — RendererParser, wrapping the renderers package. Pinned renderers>=0.1.7,<0.2.0 (added to the verl / tinker extras).
ChatTemplateParser now implements BaseParser too, so both backends are interchangeable. ParsedCompletion is dict-compatible (["content"] / .get(...)) so legacy callers of the old parse_completion dict keep working unchanged.

Engine integration

VerlEngine: single self.parser: BaseParser, selected by parser_backend — "renderer" (default) or "chat_template".
TinkerEngine: bypass_render_with_parser replaced by parser_backend — "tinker" (default, Tinker renderer) or "chat_template". The new renderer parser is not wired into Tinker (it has its own renderer).
Configs, docs, and the tinker example script updated for the parser_backend key.

Notes

RendererParser is text-only for now; VLM runs on the verl backend must set parser_backend=chat_template (enforced with a fail-fast error).
Non-experimental engines under rllm/engine/rollout/ are intentionally untouched.

Test plan

pytest tests/parser/ — 38 passing (incl. new test_renderer_parser.py: render, parse, tool-call round-trip, bridge_to_next_turn byte-identity, multi-turn ParserSession, DefaultRenderer no-bridge fallback; and test_chat_parser_satisfies_base_parser)
End-to-end verl run with rllm.parser_backend=renderer
End-to-end verl run with rllm.parser_backend=chat_template (regression check)

🤖 Generated with Claude Code

…er_backend Introduces a token-native message parser built on the external `renderers` package as an alternative to the chat-template-string path, and a shared `BaseParser` contract so rollout engines can switch backends transparently. - Add `BaseParser` contract (`base.py`) + `RendererParser` (`renderer_parser.py`) wrapping `renderers`; pinned to `renderers>=0.1.7,<0.2.0`. - Make `ChatTemplateParser` implement `BaseParser` (`render`, `get_stop_token_ids`, `parse_completion -> ParsedCompletion`); `ParsedCompletion` is dict-compatible for legacy callers. - `ParserSession` caches prompt/completion ids and uses `bridge_to_next_turn` to extend multi-turn rollouts without re-rendering sampled tokens. - VerlEngine: single `self.parser: BaseParser`, selectable via `parser_backend` ("renderer" default | "chat_template"). - TinkerEngine: replace `bypass_render_with_parser` with `parser_backend` ("tinker" default | "chat_template"). - Update configs, docs, and tests accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mintlify · 2026-05-21T21:40:48Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
rllm	🟢 Ready	View Preview	May 21, 2026, 9:41 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

…ng coalesce Two issues surfaced while testing the renderer parser backend end-to-end on the solver_judge cookbooks. countdown_transform only emitted {target, nums, data_source}, but the `countdown` dataset is registered with `instruction_field = "question"` (and an `instruction.md.tpl` of `{{question}}`). The promised `question` field never existed, so `task.instruction` rendered empty and any consumer reading `task["question"]` raised KeyError. Build the natural-language `question` (wording matched to examples/countdown/prepare_countdown_data.py so train/test stay consistent with stage2/stage3) and surface `ground_truth`. The agentflow solver-judge flow had a workaround for the missing field; it now prefers `task.instruction` and keeps the metadata reconstruction only as a fallback for datasets registered before this fix. The workflow solver-judge flow passed `thought=output.reasoning` straight into `Step`; `ModelOutput.reasoning` is `str | None` (None for non-thinking completions) while `Step.thought` is a plain `str`, raising a pydantic ValidationError. Coalesce with `or ""`, matching the `Step` factory in rllm/types.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mintlify Bot deployed to staging - docs May 21, 2026 21:41 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(parser): renderers-based parser backend with selectable parser_backend#591

feat(parser): renderers-based parser backend with selectable parser_backend#591
listar2000 wants to merge 2 commits into
mainfrom
feat/renderer-parser-backend

listar2000 commented May 21, 2026

Uh oh!

mintlify Bot commented May 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

listar2000 commented May 21, 2026

Summary

Changes

Notes

Test plan

Uh oh!

mintlify Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mintlify Bot commented May 21, 2026 •

edited

Loading