Reduce BPE merge update allocations and add file progress logs #2092

Open
voidful wants to merge 1 commit into huggingface:main from voidful:bpe-progress-merge-reduce

voidful commented Jun 8, 2026

Summary

  • Aggregate BPE pair-count deltas and update positions per worker before applying them to the global merge tables.
  • Avoid collecting every per-word merge change into a single intermediate vector.
  • Add optional stderr progress logging for file preprocessing, controlled by TOKENIZERS_PROGRESS_LOG_INTERVAL_SECONDS.
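
To illustrate the first two points, here is a minimal sketch of per-worker aggregation. It is not the code in this PR: `LocalUpdates`, `merge_word`, `merge_pair_parallel`, and `apply_updates` are hypothetical names, and `std::thread::scope` stands in for whatever parallelism the trainer actually uses. Each worker folds its changes into small local maps, so no flat vector of per-word changes is built, and the global tables are touched only once per worker.

```rust
use std::collections::{HashMap, HashSet};
use std::thread;

type Pair = (u32, u32);

/// Hypothetical per-worker accumulator: net count change per pair, plus the
/// indices of words in which a pair newly appeared.
#[derive(Default)]
struct LocalUpdates {
    count_deltas: HashMap<Pair, i64>,
    positions: HashMap<Pair, HashSet<usize>>,
}

impl LocalUpdates {
    fn record(&mut self, pair: Pair, delta: i64, word_idx: usize) {
        *self.count_deltas.entry(pair).or_insert(0) += delta;
        if delta > 0 {
            self.positions.entry(pair).or_default().insert(word_idx);
        }
    }
}

/// Replace every occurrence of `pair` in `word` with `new_id`, recording the
/// neighbouring pairs that disappear or appear, weighted by the word's count.
fn merge_word(word: &mut Vec<u32>, word_idx: usize, word_count: i64,
              pair: Pair, new_id: u32, out: &mut LocalUpdates) {
    let mut i = 0;
    while i + 1 < word.len() {
        if word[i] == pair.0 && word[i + 1] == pair.1 {
            if i > 0 {
                out.record((word[i - 1], pair.0), -word_count, word_idx);
                out.record((word[i - 1], new_id), word_count, word_idx);
            }
            if i + 2 < word.len() {
                out.record((pair.1, word[i + 2]), -word_count, word_idx);
                out.record((new_id, word[i + 2]), word_count, word_idx);
            }
            word[i] = new_id;
            word.remove(i + 1);
        }
        i += 1;
    }
}

/// Split the words across workers; each worker returns one aggregated result.
fn merge_pair_parallel(words: &mut [Vec<u32>], counts: &[i64], pair: Pair,
                       new_id: u32, n_workers: usize) -> Vec<LocalUpdates> {
    let n = n_workers.max(1);
    let chunk = ((words.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = words
            .chunks_mut(chunk)
            .enumerate()
            .map(|(c, slice)| {
                s.spawn(move || {
                    let mut local = LocalUpdates::default();
                    for (j, word) in slice.iter_mut().enumerate() {
                        let idx = c * chunk + j;
                        merge_word(word, idx, counts[idx], pair, new_id, &mut local);
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

/// Fold each worker's aggregate into the global tables.
fn apply_updates(locals: Vec<LocalUpdates>,
                 pair_counts: &mut HashMap<Pair, i64>,
                 where_to_update: &mut HashMap<Pair, HashSet<usize>>) {
    for local in locals {
        for (pair, delta) in local.count_deltas {
            *pair_counts.entry(pair).or_insert(0) += delta;
        }
        for (pair, idxs) in local.positions {
            where_to_update.entry(pair).or_default().extend(idxs);
        }
    }
}

fn main() {
    let mut words = vec![vec![1u32, 2, 3], vec![0, 1, 2]];
    let counts = [2i64, 3];
    let mut pair_counts = HashMap::from([((1, 2), 5), ((2, 3), 2), ((0, 1), 3)]);
    let mut where_to_update = HashMap::new();
    let locals = merge_pair_parallel(&mut words, &counts, (1, 2), 9, 2);
    apply_updates(locals, &mut pair_counts, &mut where_to_update);
    println!("{:?} {:?}", words, pair_counts.get(&(0, 9)));
}
```

In this sketch the global maps see one insertion per distinct pair per worker rather than one per change, which is where the allocation savings would come from under these assumptions.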

Testing

  • cargo check
  • cargo test models::bpe::trainer -- --nocapture

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
