
perf: --eager-load-params for fast steady-state streaming #1646

Closed

fszontagh wants to merge 2 commits into leejet:master from fszontagh:perf/eager-load-params



@fszontagh

Contributor

Summary

After #1644 centralized weight staging, params are loaded from disk into the params backend lazily, on the first prepare_params call. For multi-segment streaming on a large model, this means the first sampling step pays the entire disk-read cost (8-15 seconds per segment on Z-Image bf16), and later images in a batch pay it again whenever runner_done() releases the params storage.

This PR adds an sd_ctx_params_t::eager_load_params flag (CLI: --eager-load-params) that loads every registered tensor into the params backend right after metadata validation. The flag defaults to off, preserving the lazy behavior for users who want lower peak host RAM at model-load time.
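A minimal sketch of how a default-off boolean option like this is typically carried from the CLI into the context params and back out to a log dump. `CtxParams`, `ctx_params_init`, `ctx_params_to_str`, and `parse_args` are hypothetical names chosen for illustration; they are not the actual `sd_ctx_params_t` API or this PR's diff.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical stand-in for sd_ctx_params_t; only the flag from this PR is modeled.
struct CtxParams {
    bool eager_load_params;
};

// Hypothetical init: the flag defaults to off, so lazy loading stays the default.
void ctx_params_init(CtxParams* p) {
    assert(p != nullptr);
    p->eager_load_params = false;
}

// Hypothetical to_str-style dump used to log the effective settings.
std::string ctx_params_to_str(const CtxParams& p) {
    std::ostringstream oss;
    oss << "eager_load_params: " << (p.eager_load_params ? "true" : "false") << "\n";
    return oss.str();
}

// Hypothetical CLI handling: a bare switch that takes no value.
void parse_args(int argc, const char* const argv[], CtxParams* p) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--eager-load-params") == 0) {
            p->eager_load_params = true;
        }
    }
}
```

Keeping the default in the init function, rather than relying on every caller to set it, is what lets existing users keep the lazy behavior without touching their code.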

Numbers

Measured on an RTX 3060 (12 GB) with --offload-to-cpu --stream-layers --max-vram -1:

| Workload (generate_image time) | Default (lazy) | --eager-load-params |
|---|---|---|
| SDXL bf16, 1152x896, batch=2, 8 steps | 21 s | 17 s |
| Z-Image bf16, 1024x688, batch=2, 9 steps | 359 s | 58 s |

For long-lived processes (servers, batch generation), the eager path also reduces total wall-clock time, because images 2 through N reuse the warm pinned-host cache instead of re-reading the model from disk.

Implementation

  • ModelManager::load_all_params_eagerly() collects all registered states and calls the existing load_tensors_to_params_backend.
  • Plumbed through sd_ctx_params_t::eager_load_params, init, and to_str.
  • CLI flag added in examples/common.
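The lazy and eager paths described above can be sketched as two callers of one shared loader. This is a toy model under stated assumptions: `ModelManagerSketch`, `TensorState`, and the disk-read counter are hypothetical, and while `prepare_params`, `load_tensors_to_params_backend`, and `load_all_params_eagerly` borrow names from the PR, their signatures here are assumed rather than taken from the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-tensor state; real code tracks much more (buffers, residency, ...).
struct TensorState {
    std::string name;
    bool loaded;
};

// Hypothetical manager modeling lazy vs eager loading of registered tensors.
class ModelManagerSketch {
public:
    void register_tensor(const std::string& name) {
        states_[name] = TensorState{name, false};
    }

    // Shared loader (assumed signature): "reads" tensors into the params backend.
    // This sketch only counts reads and skips tensors that are already loaded.
    void load_tensors_to_params_backend(const std::vector<TensorState*>& batch) {
        for (TensorState* s : batch) {
            if (!s->loaded) {
                s->loaded = true;
                ++disk_reads_;
            }
        }
    }

    // Lazy path: load only what a runner asks for, on its first prepare call.
    void prepare_params(const std::vector<std::string>& names) {
        std::vector<TensorState*> missing;
        for (const auto& n : names) {
            auto it = states_.find(n);
            assert(it != states_.end() && "tensor must be registered");
            if (it == states_.end()) continue;  // ignored when asserts are off
            if (!it->second.loaded) missing.push_back(&it->second);
        }
        load_tensors_to_params_backend(missing);
    }

    // Eager path: collect every registered state and reuse the same loader.
    void load_all_params_eagerly() {
        std::vector<TensorState*> all;
        all.reserve(states_.size());
        for (auto& kv : states_) all.push_back(&kv.second);
        load_tensors_to_params_backend(all);
    }

    std::size_t disk_reads() const { return disk_reads_; }

private:
    std::unordered_map<std::string, TensorState> states_;
    std::size_t disk_reads_ = 0;
};
```

In this model, once the eager load has run, a later `prepare_params` call finds every tensor already loaded and performs no further reads, which mirrors the steady-state saving the PR reports.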

Checklist

@leejet

leejet commented Jun 14, 2026

Owner

Closing this as obsolete.

The original issue this PR tried to address no longer applies to the current default behavior. With the default params backend, params storage is kept after first use, so later runs reuse the loaded weights. Storage is released only for tensors with Disk residency, which corresponds to --params-backend disk, and the release-and-reload behavior is expected in that mode.
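The owner's point can be modeled as a residency check at release time. A hedged sketch, assuming a hypothetical `Residency` enum with a `Resident` case standing in for the default backend and a `Disk` case standing in for `--params-backend disk`; `ParamSlot`, `prepare`, and `runner_done` here are illustrative, not the project's actual types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical residency tags; only Disk models --params-backend disk.
enum class Residency { Resident, Disk };

// Hypothetical per-tensor slot tracking whether its storage is currently loaded.
struct ParamSlot {
    Residency residency;
    bool loaded;
};

// Load every slot that is not loaded yet; returns how many disk reads that took.
std::size_t prepare(std::vector<ParamSlot>& slots) {
    std::size_t reads = 0;
    for (auto& s : slots) {
        if (!s.loaded) {
            s.loaded = true;
            ++reads;
        }
    }
    return reads;
}

// After a run, release storage only for Disk-resident params; others stay warm.
void runner_done(std::vector<ParamSlot>& slots) {
    for (auto& s : slots) {
        if (s.residency == Residency::Disk) s.loaded = false;
    }
}
```

Under these assumptions, only Disk-resident slots are re-read on the next run, so an eager-load flag would add nothing for the default backend.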

So I don't think we need a separate --eager-load-params option anymore. Thanks for the contribution!

leejet closed this on Jun 14, 2026