Skip to content

feat(parser): renderers-based parser backend with selectable parser_backend#591

Draft
listar2000 wants to merge 2 commits into
mainfrom
feat/renderer-parser-backend
Draft

feat(parser): renderers-based parser backend with selectable parser_backend#591
listar2000 wants to merge 2 commits into
mainfrom
feat/renderer-parser-backend

Conversation

@listar2000

Copy link
Copy Markdown
Collaborator

Summary

Adds a token-native message parser built on the external renderers package as an alternative to the chat-template-string path, behind a shared BaseParser contract so rollout engines can switch backends transparently.

The chat-template approach (messages → string → tokens) silently breaks token identity in multi-turn RL — boolean round-trips, BPE retokenization drift, dropped reasoning blocks. renderers renders messages straight to token ids and extends a rollout with bridge_to_next_turn, reusing model-sampled tokens verbatim instead of re-rendering history.

This PR puts the new backend in place and makes it selectable; it is opt-in per engine via parser_backend.

Changes

New parser layer (rllm/parser/)

  • base.pyBaseParser contract (render, parse_completion, get_stop_token_ids, bridge_to_next_turn), ParsedCompletion result type, and ParserSession (per-rollout cache that uses bridge_to_next_turn to extend multi-turn rollouts without re-rendering sampled tokens).
  • renderer_parser.pyRendererParser, wrapping the renderers package. Pinned renderers>=0.1.7,<0.2.0 (added to the verl / tinker extras).
  • ChatTemplateParser now implements BaseParser too, so both backends are interchangeable. ParsedCompletion is dict-compatible (["content"] / .get(...)) so legacy callers of the old parse_completion dict keep working unchanged.

Engine integration

  • VerlEngine: single self.parser: BaseParser, selected by parser_backend"renderer" (default) or "chat_template".
  • TinkerEngine: bypass_render_with_parser replaced by parser_backend"tinker" (default, Tinker renderer) or "chat_template". The new renderer parser is not wired into Tinker (it has its own renderer).
  • Configs, docs, and the tinker example script updated for the parser_backend key.

Notes

  • RendererParser is text-only for now; VLM runs on the verl backend must set parser_backend=chat_template (enforced with a fail-fast error).
  • Non-experimental engines under rllm/engine/rollout/ are intentionally untouched.

Test plan

  • pytest tests/parser/ — 38 passing (incl. new test_renderer_parser.py: render, parse, tool-call round-trip, bridge_to_next_turn byte-identity, multi-turn ParserSession, DefaultRenderer no-bridge fallback; and test_chat_parser_satisfies_base_parser)
  • End-to-end verl run with rllm.parser_backend=renderer
  • End-to-end verl run with rllm.parser_backend=chat_template (regression check)

🤖 Generated with Claude Code

…er_backend

Introduces a token-native message parser built on the external `renderers`
package as an alternative to the chat-template-string path, and a shared
`BaseParser` contract so rollout engines can switch backends transparently.

- Add `BaseParser` contract (`base.py`) + `RendererParser` (`renderer_parser.py`)
  wrapping `renderers`; pinned to `renderers>=0.1.7,<0.2.0`.
- Make `ChatTemplateParser` implement `BaseParser` (`render`,
  `get_stop_token_ids`, `parse_completion -> ParsedCompletion`); `ParsedCompletion`
  is dict-compatible for legacy callers.
- `ParserSession` caches prompt/completion ids and uses `bridge_to_next_turn`
  to extend multi-turn rollouts without re-rendering sampled tokens.
- VerlEngine: single `self.parser: BaseParser`, selectable via
  `parser_backend` ("renderer" default | "chat_template").
- TinkerEngine: replace `bypass_render_with_parser` with
  `parser_backend` ("tinker" default | "chat_template").
- Update configs, docs, and tests accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mintlify

mintlify Bot commented May 21, 2026

Copy link
Copy Markdown

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
rllm 🟢 Ready View Preview May 21, 2026, 9:41 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

…ng coalesce

Two issues surfaced while testing the renderer parser backend end-to-end
on the solver_judge cookbooks.

countdown_transform only emitted {target, nums, data_source}, but the
`countdown` dataset is registered with `instruction_field = "question"`
(and an `instruction.md.tpl` of `{{question}}`). The promised `question`
field never existed, so `task.instruction` rendered empty and any
consumer reading `task["question"]` raised KeyError. Build the
natural-language `question` (wording matched to
examples/countdown/prepare_countdown_data.py so train/test stay
consistent with stage2/stage3) and surface `ground_truth`.

The agentflow solver-judge flow had a workaround for the missing field;
it now prefers `task.instruction` and keeps the metadata reconstruction
only as a fallback for datasets registered before this fix.

The workflow solver-judge flow passed `thought=output.reasoning`
straight into `Step`; `ModelOutput.reasoning` is `str | None` (None for
non-thinking completions) while `Step.thought` is a plain `str`, raising
a pydantic ValidationError. Coalesce with `or ""`, matching the `Step`
factory in rllm/types.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant