Docs: align Unit 2 DPOConfig & Unit 3 VLM SFT format with current TRL (1.5) by ColebyPearson · Pull Request #298 · huggingface/smol-course

ColebyPearson · 2026-05-27T15:36:42Z

This PR fixes two small but reproducible API-drift errors that learners hit when running the `units/en/` snippets verbatim against TRL 1.5+.

1. `units/en/unit2/2.md` — drop `max_prompt_length` from the `DPOConfig` example

TRL 1.5 removed the `max_prompt_length` argument. The current snippet (lines 86–94) raises immediately:

TypeError: DPOConfig.__init__() got an unexpected keyword argument 'max_prompt_length'

`max_length` alone caps the combined prompt+completion sequence, so `max_prompt_length` is redundant. I removed the line and updated the comment on `max_length` to make its new role explicit.
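For anyone checking other snippets for the same kind of drift, the sketch below compares a keyword dictionary against a config class's `__init__` signature before constructing it. `StandInDPOConfig` and `split_kwargs` are hypothetical names invented for this illustration; the stand-in mirrors only the fields discussed here so the example runs without TRL installed, and it is not TRL's class.

```python
import inspect
from dataclasses import dataclass


@dataclass
class StandInDPOConfig:
    """Hypothetical stand-in with only the fields discussed here; not TRL's class."""
    output_dir: str
    beta: float = 0.1
    max_length: int = 1024  # caps prompt + completion combined


def split_kwargs(config_cls, kwargs):
    """Return (accepted, rejected) kwargs according to config_cls.__init__."""
    params = inspect.signature(config_cls.__init__).parameters
    # A **kwargs catch-all means every name is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rejected = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rejected


# Keyword arguments in the style of the old snippet (values are hypothetical).
old_snippet_kwargs = {
    "output_dir": "dpo-output",
    "max_prompt_length": 512,  # the argument this PR removes
    "max_length": 1024,
}

accepted, rejected = split_kwargs(StandInDPOConfig, old_snippet_kwargs)
config = StandInDPOConfig(**accepted)
print("rejected:", sorted(rejected))
```

Passing the real `DPOConfig` class in place of the stand-in should work the same way, assuming its `__init__` exposes explicit parameters rather than a `**kwargs` catch-all.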

2. `units/en/unit3/4.md` — `format_data()` uses typed-parts message content that TRL's SFT collator rejects

The current `format_data()` builds messages like:

{"role": "user", "content": [{"type": "image", ...}, {"type": "text", ...}]}

TRL's SFT collator (`trl.data_utils.prepare_multimodal_messages`) counts `<image>` placeholders in string content and auto-injects them from the `images` field. The typed-parts form is invisible to it, so training raises:

ValueError: Number of images provided (1) does not match number of image placeholders (0).

Switching to plain-string content lets TRL inject the placeholder itself. I added a short comment in the snippet explaining the why, so the next learner who runs into a related collator error has something greppable.
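A minimal sketch of the plain-string shape described above. The dataset field names (`image`, `query`, `label`) and the `typed_parts_to_text` helper are assumptions made for illustration; the actual snippet in `units/en/unit3/4.md` may use different names.

```python
def typed_parts_to_text(content):
    """Collapse typed-parts content into a plain string, dropping image parts."""
    if isinstance(content, str):
        return content
    return " ".join(part["text"] for part in content if part.get("type") == "text")


def format_data(sample):
    """Build one training example with plain-string message content.

    Images go in the separate `images` field; no image part or placeholder
    is written into the message text here.
    """
    return {
        "images": [sample["image"]],
        "messages": [
            {"role": "user", "content": typed_parts_to_text(sample["query"])},
            {"role": "assistant", "content": typed_parts_to_text(sample["label"])},
        ],
    }


# Hypothetical sample; in practice `image` would be a PIL image from the dataset.
sample = {
    "image": object(),
    "query": [{"type": "image"}, {"type": "text", "text": "What is shown?"}],
    "label": "A bar chart.",
}
example = format_data(sample)
print(example["messages"])
```

Keeping the images in their own field and the message content as plain strings matches the form this PR switches to, so the collator can insert the placeholder itself.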

The earlier `apply_chat_template` usage in the same file (Exercise 1, lines ~148–152) is left untouched: typed-parts content works fine for the processor's chat-template path; only the `SFTTrainer` collator is brittle.

Verification

- The diff is intentionally surgical: 3 insertions / 35 deletions across both files, with no whitespace or line-ending noise.
- I hit both errors while running the units verbatim from a clean Python 3.13 + TRL 1.5 + PEFT 0.18 install (notes in VoicesColeby/HFsmolcourse if useful).
- Tested by re-running the Unit 2 and Unit 3 scripts against current TRL: both now train end-to-end and push to the Hub.

🤖 Generated with Claude Code

… (1.5)

Two small but reproducible API-drift errors learners hit when following units/en/ verbatim with TRL 1.5+:

1) units/en/unit2/2.md: DPOConfig example still lists max_prompt_length. TRL 1.5 removed that argument:

TypeError: DPOConfig.__init__() got an unexpected keyword argument 'max_prompt_length'

max_length on its own caps the combined prompt+completion sequence, so max_prompt_length is redundant. Removed from the snippet and the comment on max_length adjusted to make the new role explicit.

2) units/en/unit3/4.md: format_data() builds messages with typed-parts content (`content: [{"type": "image", ...}, {"type": "text", ...}]`). TRL's SFT collator (trl.data_utils.prepare_multimodal_messages) counts `<image>` placeholders in *string* content and auto-injects them from the `images` field. The typed-parts form is invisible to it, so training raises:

ValueError: Number of images provided (1) does not match number of image placeholders (0).

Switched format_data to plain-string content (TRL injects the placeholder itself) and added a short comment explaining the why so the change is greppable. The earlier apply_chat_template usage in the same file (Exercise 1) is left untouched: typed-parts work fine for the processor's chat template, only the SFTTrainer collator path is brittle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ColebyPearson wants to merge 1 commit into huggingface:main from ColebyPearson:fix-trl-1.5-api-drift
