Docs: align Unit 2 DPOConfig & Unit 3 VLM SFT format with current TRL (1.5)#298
Open
ColebyPearson wants to merge 1 commit into
… (1.5)
Two small but reproducible API-drift errors learners hit when following
units/en/ verbatim with TRL 1.5+:
1) units/en/unit2/2.md — DPOConfig example still lists max_prompt_length.
TRL 1.5 removed that argument:
TypeError: DPOConfig.__init__() got an unexpected keyword argument
'max_prompt_length'
max_length on its own caps the combined prompt+completion sequence, so
max_prompt_length is redundant. Removed it from the snippet and adjusted
the comment on max_length to make its new role explicit.
2) units/en/unit3/4.md — format_data() builds messages with typed-parts
content (`content: [{"type": "image", ...}, {"type": "text", ...}]`).
TRL's SFT collator (trl.data_utils.prepare_multimodal_messages) counts
`<image>` placeholders in *string* content and auto-injects them from
the `images` field. The typed-parts form is invisible to it, so
training raises:
ValueError: Number of images provided (1) does not match number of
image placeholders (0).
Switched format_data to plain-string content (TRL injects the
placeholder itself) and added a short comment explaining the why so
the change is greppable.
The earlier apply_chat_template usage in the same file (Exercise 1) is
left untouched — typed-parts work fine for the processor's chat
template, only the SFTTrainer collator path is brittle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
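For learners who hit the first error in their own scripts, one possible guard is to filter keyword arguments against the constructor signature before building the config. The sketch below is an illustration only: `StandInDPOConfig` is a hypothetical stand-in dataclass (not `trl.DPOConfig`), `split_supported_kwargs` is a made-up helper name, and the values `1024` and `512` are placeholders rather than values taken from the unit.

```python
import inspect
from dataclasses import dataclass


@dataclass
class StandInDPOConfig:
    """Hypothetical stand-in for illustration only; this is NOT trl.DPOConfig."""

    output_dir: str = "out"
    beta: float = 0.1
    max_length: int = 1024  # caps the combined prompt+completion sequence


def split_supported_kwargs(config_cls, kwargs):
    """Return (kept, dropped): kwargs the constructor accepts, and names it rejects."""
    params = inspect.signature(config_cls).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), []  # constructor takes **kwargs, nothing to filter
    kept = {k: v for k, v in kwargs.items() if k in params}
    dropped = sorted(k for k in kwargs if k not in params)
    return kept, dropped


# Arguments as an older tutorial snippet might list them (placeholder values).
old_tutorial_kwargs = {"beta": 0.1, "max_length": 1024, "max_prompt_length": 512}

# Passing the stale argument straight through fails with a TypeError,
# the same class of error the commit message quotes.
try:
    StandInDPOConfig(**old_tutorial_kwargs)
    stale_kwarg_rejected = False
except TypeError:
    stale_kwarg_rejected = True

# Filtering first lets the remaining arguments through and reports what was dropped.
kept, dropped = split_supported_kwargs(StandInDPOConfig, old_tutorial_kwargs)
config = StandInDPOConfig(**kept)
print("dropped:", dropped)
```

Silently dropping arguments can hide real mistakes, so a guard like this probably belongs in throwaway notebooks rather than in the course snippets themselves, where deleting the stale line (as this PR does) is the cleaner fix.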
Two small but reproducible API-drift errors learners hit when running `units/en/` verbatim against TRL 1.5+.

1. `units/en/unit2/2.md`: drop `max_prompt_length` from the `DPOConfig` example

TRL 1.5 removed the `max_prompt_length` argument. The current snippet (lines 86–94) raises immediately with `TypeError: DPOConfig.__init__() got an unexpected keyword argument 'max_prompt_length'`. `max_length` alone caps the combined prompt+completion sequence, so `max_prompt_length` is redundant. Removed the line and updated the comment on `max_length` to make its new role explicit.

2. `units/en/unit3/4.md`: `format_data()` uses typed-parts message content that TRL's SFT collator rejects

The current `format_data()` builds messages like `{"role": "user", "content": [{"type": "image", ...}, {"type": "text", ...}]}`. TRL's SFT collator (`trl.data_utils.prepare_multimodal_messages`) counts `<image>` placeholders in string content and auto-injects them from the `images` field. The typed-parts form is invisible to it, so training raises `ValueError: Number of images provided (1) does not match number of image placeholders (0).` Switching to plain-string content lets TRL inject the placeholder itself. I added a short comment in the snippet explaining the why, so the next learner who runs into a related collator error has something greppable.
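As a rough illustration of the second fix, the sketch below contrasts the two message shapes. It is not the snippet from `units/en/unit3/4.md`: the sample keys `image`, `question`, and `answer` and the helper `uses_string_content` are hypothetical names chosen for the example, and the code deliberately does not import or call TRL.

```python
def format_data(sample):
    """Build one SFT example with plain-string message content.

    The keys "image", "question" and "answer" are hypothetical; adapt them
    to the columns of your own dataset.
    """
    return {
        # Images travel in a separate field, as described in this PR.
        "images": [sample["image"]],
        "messages": [
            # Plain strings instead of [{"type": "image"}, {"type": "text", ...}] lists.
            {"role": "user", "content": sample["question"]},
            {"role": "assistant", "content": sample["answer"]},
        ],
    }


def uses_string_content(example):
    """Return True when every message's content is a plain string."""
    return all(isinstance(m["content"], str) for m in example["messages"])


# The typed-parts shape that the PR reports as failing on the collator path.
typed_parts_example = {
    "images": ["<image object>"],
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": "What is shown in the picture?"},
            ],
        },
        {"role": "assistant", "content": [{"type": "text", "text": "A cat."}]},
    ],
}

example = format_data(
    {
        "image": "<image object>",
        "question": "What is shown in the picture?",
        "answer": "A cat.",
    }
)
print(uses_string_content(example), uses_string_content(typed_parts_example))
```

A check like `uses_string_content` could be run over a few formatted rows before training starts, so a shape mismatch shows up before the collator raises.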
The earlier `apply_chat_template` usage in the same file (Exercise 1, lines ~148–152) is left untouched: typed-parts work fine for the processor's chat-template path; only the SFTTrainer collator is brittle.

Verification
🤖 Generated with Claude Code