feat(parser): renderers-based parser backend with selectable parser_backend #591
Draft
listar2000 wants to merge 2 commits into
…er_backend
Introduces a token-native message parser built on the external `renderers`
package as an alternative to the chat-template-string path, and a shared
`BaseParser` contract so rollout engines can switch backends transparently.
- Add `BaseParser` contract (`base.py`) + `RendererParser` (`renderer_parser.py`)
wrapping `renderers`; pinned to `renderers>=0.1.7,<0.2.0`.
- Make `ChatTemplateParser` implement `BaseParser` (`render`,
`get_stop_token_ids`, `parse_completion -> ParsedCompletion`); `ParsedCompletion`
is dict-compatible for legacy callers.
- `ParserSession` caches prompt/completion ids and uses `bridge_to_next_turn`
to extend multi-turn rollouts without re-rendering sampled tokens.
- VerlEngine: single `self.parser: BaseParser`, selectable via
`parser_backend` ("renderer" default | "chat_template").
- TinkerEngine: replace `bypass_render_with_parser` with
`parser_backend` ("tinker" default | "chat_template").
- Update configs, docs, and tests accordingly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
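
To make the contract described in this commit concrete, here is a minimal sketch of how a shared parser interface and a dict-compatible result type could look. It is not the PR's code: the method signatures, the `ParsedCompletion` field names, and the `ToyParser` backend (one token per character) are assumptions chosen for illustration, and `bridge_to_next_turn` is left out.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ParsedCompletion:
    """Hypothetical result type; the field names are assumptions."""

    content: str = ""
    reasoning: str | None = None
    tool_calls: list[dict[str, Any]] = field(default_factory=list)

    # Dict-style access, so callers written against a plain dict keep working.
    def __getitem__(self, key: str) -> Any:
        if key not in self.__dataclass_fields__:
            raise KeyError(key)
        return getattr(self, key)

    def get(self, key: str, default: Any = None) -> Any:
        return self[key] if key in self.__dataclass_fields__ else default


class BaseParser(ABC):
    """Assumed shape of the shared contract; signatures are illustrative."""

    @abstractmethod
    def render(self, messages: list[dict[str, Any]]) -> list[int]: ...

    @abstractmethod
    def parse_completion(self, completion_ids: list[int]) -> ParsedCompletion: ...

    @abstractmethod
    def get_stop_token_ids(self) -> list[int]: ...


class ToyParser(BaseParser):
    """Stand-in backend: one token per character (ord/chr)."""

    def render(self, messages: list[dict[str, Any]]) -> list[int]:
        text = "".join(f"{m['role']}: {m['content']}\n" for m in messages)
        return [ord(c) for c in text]

    def parse_completion(self, completion_ids: list[int]) -> ParsedCompletion:
        return ParsedCompletion(content="".join(chr(i) for i in completion_ids))

    def get_stop_token_ids(self) -> list[int]:
        return [ord("\n")]


parser: BaseParser = ToyParser()
parsed = parser.parse_completion([ord(c) for c in "42"])
```

Both `parsed.content` and `parsed["content"]` read the same field here, which is the property that lets attribute-style and legacy dict-style callers coexist.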
…ng coalesce
Two issues surfaced while testing the renderer parser backend end-to-end
on the solver_judge cookbooks.
countdown_transform only emitted {target, nums, data_source}, but the
`countdown` dataset is registered with `instruction_field = "question"`
(and an `instruction.md.tpl` of `{{question}}`). The promised `question`
field never existed, so `task.instruction` rendered empty and any
consumer reading `task["question"]` raised KeyError. Build the
natural-language `question` (wording matched to
examples/countdown/prepare_countdown_data.py so train/test stay
consistent with stage2/stage3) and surface `ground_truth`.
The agentflow solver-judge flow had a workaround for the missing field;
it now prefers `task.instruction` and keeps the metadata reconstruction
only as a fallback for datasets registered before this fix.
The workflow solver-judge flow passed `thought=output.reasoning`
straight into `Step`; `ModelOutput.reasoning` is `str | None` (None for
non-thinking completions) while `Step.thought` is a plain `str`, raising
a pydantic ValidationError. Coalesce with `or ""`, matching the `Step`
factory in rllm/types.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
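
The `or ""` coalesce from this commit can be sketched in isolation. The `Step` class below is a stand-in dataclass with a manual type check, not the real pydantic model in rllm/types.py, and `make_step` is a hypothetical helper name; the sketch only shows why an optional reasoning string needs coalescing before it reaches a field typed as plain `str`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """Stand-in for the real model: rejects non-str thoughts, as a strict `str` field would."""

    thought: str

    def __post_init__(self) -> None:
        if not isinstance(self.thought, str):
            raise TypeError(f"thought must be str, got {type(self.thought).__name__}")


def make_step(reasoning: Optional[str]) -> Step:
    # `or ""` maps None (a non-thinking completion) to the empty string.
    return Step(thought=reasoning or "")


thinking_step = make_step("try 3 * 4 first")
plain_step = make_step(None)
```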
Summary
Adds a token-native message parser built on the external `renderers` package as an alternative to the chat-template-string path, behind a shared `BaseParser` contract so rollout engines can switch backends transparently.

The chat-template approach (messages → string → tokens) silently breaks token identity in multi-turn RL: boolean round-trips, BPE retokenization drift, dropped reasoning blocks. `renderers` renders messages straight to token ids and extends a rollout with `bridge_to_next_turn`, reusing model-sampled tokens verbatim instead of re-rendering history.

This PR puts the new backend in place and makes it selectable; it is opt-in per engine via `parser_backend`.

Changes

New parser layer (`rllm/parser/`)
- `base.py`: `BaseParser` contract (`render`, `parse_completion`, `get_stop_token_ids`, `bridge_to_next_turn`), `ParsedCompletion` result type, and `ParserSession` (per-rollout cache that uses `bridge_to_next_turn` to extend multi-turn rollouts without re-rendering sampled tokens).
- `renderer_parser.py`: `RendererParser`, wrapping the `renderers` package. Pinned `renderers>=0.1.7,<0.2.0` (added to the `verl`/`tinker` extras).
- `ChatTemplateParser` now implements `BaseParser` too, so both backends are interchangeable. `ParsedCompletion` is dict-compatible (`["content"]` / `.get(...)`) so legacy callers of the old `parse_completion` dict keep working unchanged.

Engine integration
- `VerlEngine`: single `self.parser: BaseParser`, selected by `parser_backend`: `"renderer"` (default) or `"chat_template"`.
- `TinkerEngine`: `bypass_render_with_parser` replaced by `parser_backend`: `"tinker"` (default, Tinker renderer) or `"chat_template"`. The new renderer parser is not wired into Tinker (it has its own renderer).
- … `parser_backend` key.

Notes
- `RendererParser` is text-only for now; VLM runs on the verl backend must set `parser_backend=chat_template` (enforced with a fail-fast error).
- … `rllm/engine/rollout/` are intentionally untouched.

Test plan
- `pytest tests/parser/`: 38 passing (incl. new `test_renderer_parser.py`: render, parse, tool-call round-trip, `bridge_to_next_turn` byte-identity, multi-turn `ParserSession`, `DefaultRenderer` no-bridge fallback; and `test_chat_parser_satisfies_base_parser`)
- … `rllm.parser_backend=renderer`
- … `rllm.parser_backend=chat_template` (regression check)

🤖 Generated with Claude Code
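
The retokenization drift mentioned in the summary can be reproduced with a toy tokenizer. The sketch below is an illustration under stated assumptions, not the `renderers` API: `VOCAB`, `encode`, `decode`, and `ToySession` are hypothetical names, and the greedy longest-match tokenizer stands in for BPE. It contrasts appending sampled token ids verbatim (the behaviour attributed to `bridge_to_next_turn`) with decoding the history to text and re-encoding it.

```python
# Toy vocabulary in which "ab" can be spelled as one token or as two.
VOCAB = {"a": 1, "b": 2, "ab": 3, "|": 4}
ID_TO_TEXT = {token_id: text for text, token_id in VOCAB.items()}


def encode(text: str) -> list[int]:
    """Greedy longest-match tokenizer (a stand-in for BPE merges)."""
    ids: list[int] = []
    i = 0
    while i < len(text):
        for length in (2, 1):
            piece = text[i : i + length]
            if len(piece) == length and piece in VOCAB:
                ids.append(VOCAB[piece])
                i += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return ids


def decode(ids: list[int]) -> str:
    return "".join(ID_TO_TEXT[i] for i in ids)


class ToySession:
    """Hypothetical per-rollout cache: keeps token ids, never re-encodes history."""

    def __init__(self, prompt_ids: list[int]) -> None:
        self.ids = list(prompt_ids)

    def extend(self, sampled_ids: list[int], next_turn_ids: list[int]) -> list[int]:
        # Sampled tokens are appended verbatim; only the new turn is rendered.
        self.ids += list(sampled_ids) + list(next_turn_ids)
        return list(self.ids)


prompt_ids = encode("|")
sampled_ids = [1, 2]  # the model happened to sample "a" then "b"

# Token-native path: history is reused as ids.
bridged = ToySession(prompt_ids).extend(sampled_ids, encode("|"))

# String path: history is decoded to text and re-encoded from scratch.
rerendered = encode(decode(prompt_ids + sampled_ids) + "|")
```

Both sequences decode to the same string, yet `rerendered` merges the sampled `a`, `b` pair into the single `ab` token, so the ids used for training would no longer match what the model actually sampled.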
pytest tests/parser/— 38 passing (incl. newtest_renderer_parser.py: render, parse, tool-call round-trip,bridge_to_next_turnbyte-identity, multi-turnParserSession,DefaultRendererno-bridge fallback; andtest_chat_parser_satisfies_base_parser)rllm.parser_backend=rendererrllm.parser_backend=chat_template(regression check)🤖 Generated with Claude Code