Docs: align Unit 2 DPOConfig & Unit 3 VLM SFT format with current TRL (1.5)#298

Open
ColebyPearson wants to merge 1 commit into huggingface:main from ColebyPearson:fix-trl-1.5-api-drift

@ColebyPearson

Two small but reproducible API-drift errors learners hit when running units/en/ verbatim against TRL 1.5+.

1. units/en/unit2/2.md — drop max_prompt_length from the DPOConfig example

TRL 1.5 removed the max_prompt_length argument. The current snippet (lines 86–94) raises immediately:

TypeError: DPOConfig.__init__() got an unexpected keyword argument 'max_prompt_length'

max_length alone caps the combined prompt+completion sequence, so max_prompt_length is redundant. Removed the line and updated the comment on max_length to make its new role explicit.
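For reviewers, a minimal sketch of the failure mode and one defensive workaround. It uses only the standard library: `StandInDPOConfig` is a hypothetical stand-in dataclass, not TRL's `DPOConfig`, and `filter_supported_kwargs` is an illustrative helper that is not part of this PR or of TRL.

```python
import inspect
from dataclasses import dataclass


@dataclass
class StandInDPOConfig:
    """Hypothetical stand-in for a config class that no longer accepts max_prompt_length."""

    output_dir: str = "out"
    beta: float = 0.1
    max_length: int = 1024  # caps the combined prompt + completion sequence


def filter_supported_kwargs(config_cls, kwargs):
    """Split kwargs into those config_cls.__init__ accepts and those it does not."""
    accepted = set(inspect.signature(config_cls.__init__).parameters) - {"self"}
    kept = {name: value for name, value in kwargs.items() if name in accepted}
    dropped = sorted(set(kwargs) - accepted)
    return kept, dropped


# Keyword arguments as an older tutorial snippet might pass them.
legacy_kwargs = {
    "output_dir": "dpo-out",
    "beta": 0.1,
    "max_length": 1024,
    "max_prompt_length": 512,  # the argument the stand-in no longer accepts
}

# Passing the legacy kwargs straight through fails with a TypeError.
try:
    StandInDPOConfig(**legacy_kwargs)
except TypeError as exc:
    legacy_error = str(exc)

# Dropping the unsupported argument lets construction succeed.
kept, dropped = filter_supported_kwargs(StandInDPOConfig, legacy_kwargs)
config = StandInDPOConfig(**kept)
```

The PR itself takes the simpler route of deleting the argument from the docs snippet; a filter like this would only make sense for code that has to run across several library versions.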

2. units/en/unit3/4.md: format_data() uses typed-parts message content that TRL's SFT collator rejects

The current format_data() builds messages like:

{"role": "user", "content": [{"type": "image", ...}, {"type": "text", ...}]}

TRL's SFT collator (trl.data_utils.prepare_multimodal_messages) counts <image> placeholders in string content and auto-injects them from the images field. The typed-parts form is invisible to it, so training raises:

ValueError: Number of images provided (1) does not match number of image placeholders (0).

Switching to plain-string content lets TRL inject the placeholder itself. I added a short comment in the snippet explaining the why, so the next learner who runs into a related collator error has something greppable.

The earlier apply_chat_template usage in the same file (Exercise 1, lines ~148–152) is left untouched — typed-parts work fine for the processor's chat-template path; only the SFTTrainer collator is brittle.
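A minimal sketch of the two message shapes described above, for comparison. The sample field names (`image`, `question`, `answer`) and the `uses_only_string_content` check are assumptions made for illustration; they are not the exact names used in the unit, and the check is not TRL code.

```python
def format_data(sample):
    """Build one training example with plain-string message content.

    The image travels in a separate `images` field rather than inside
    the message content, so the collator can insert the placeholder itself.
    """
    return {
        "images": [sample["image"]],
        "messages": [
            {"role": "user", "content": sample["question"]},
            {"role": "assistant", "content": sample["answer"]},
        ],
    }


def uses_only_string_content(messages):
    """Illustrative check: True when every message's content is a plain string."""
    return all(isinstance(message["content"], str) for message in messages)


# "cat.png" stands in for an image object; a real dataset would carry a PIL image.
sample = {"image": "cat.png", "question": "What animal is shown?", "answer": "A cat."}
formatted = format_data(sample)

# The typed-parts shape that the previous format_data() produced.
typed_parts_messages = [
    {
        "role": "user",
        "content": [{"type": "image"}, {"type": "text", "text": sample["question"]}],
    },
]
```

Here `uses_only_string_content(formatted["messages"])` is true and `uses_only_string_content(typed_parts_messages)` is false, which is the distinction the collator error hinges on.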

Verification

  • Diff is intentionally surgical: 3 insertions / 35 deletions across both files, no whitespace/line-ending noise.
  • Hit both while running the unit verbatim from a clean Python 3.13 + TRL 1.5 + PEFT 0.18 install (notes in VoicesColeby/HFsmolcourse if useful).
  • Tested by re-running the Unit 2 + Unit 3 scripts against current TRL — both now train end-to-end and push to the Hub.

🤖 Generated with Claude Code

… (1.5)

Two small but reproducible API-drift errors learners hit when following
units/en/ verbatim with TRL 1.5+:

1) units/en/unit2/2.md — DPOConfig example still lists max_prompt_length.
   TRL 1.5 removed that argument:
     TypeError: DPOConfig.__init__() got an unexpected keyword argument
                'max_prompt_length'
   max_length on its own caps the combined prompt+completion sequence, so
   max_prompt_length is redundant. Removed from the snippet and the comment
   on max_length adjusted to make the new role explicit.

2) units/en/unit3/4.md — format_data() builds messages with typed-parts
   content (`content: [{"type": "image", ...}, {"type": "text", ...}]`).
   TRL's SFT collator (trl.data_utils.prepare_multimodal_messages) counts
   `<image>` placeholders in *string* content and auto-injects them from
   the `images` field. The typed-parts form is invisible to it, so
   training raises:
     ValueError: Number of images provided (1) does not match number of
                 image placeholders (0).
   Switched format_data to plain-string content (TRL injects the
   placeholder itself) and added a short comment explaining the why so
   the change is greppable.

   The earlier apply_chat_template usage in the same file (Exercise 1) is
   left untouched — typed-parts work fine for the processor's chat
   template, only the SFTTrainer collator path is brittle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>