perf: --eager-load-params for fast steady-state streaming by fszontagh · Pull Request #1646 · leejet/stable-diffusion.cpp

fszontagh · 2026-06-13T08:19:31Z

Summary

After #1644 centralized weight staging, params are loaded from disk into the params backend lazily, on the first prepare_params call. For multi-segment streaming on a large model, this means the first sampling step pays the entire disk-read cost (8-15 seconds per segment on Z-Image bf16), and batch images pay it again whenever runner_done() releases the params storage.

This PR adds a sd_ctx_params_t::eager_load_params flag (CLI: --eager-load-params) that loads every registered tensor into the params backend right after metadata validation. Default off, so the lazy behavior is preserved for users who want lower peak host RAM at model-load time.
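To make the lazy/eager trade-off concrete, here is a minimal sketch, assuming a toy weight store that only counts simulated disk reads. `ToyParamStore` and its methods are hypothetical names for illustration, not the project's `ModelManager` API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-in for a weight store (hypothetical; not the real ModelManager).
class ToyParamStore {
public:
    explicit ToyParamStore(std::vector<std::string> names,
                           bool eager_load_params = false)
        : names_(std::move(names)) {
        if (eager_load_params) {
            load_all();  // eager: pay the whole "disk read" cost up front
        }
    }

    // Called before sampling; loads only tensors that are not resident yet.
    void prepare_params() { load_all(); }

    // Models a release of the params storage after a run.
    void release() { loaded_.clear(); }

    std::size_t disk_reads() const { return disk_reads_; }

private:
    void load_all() {
        for (const auto& name : names_) {
            if (loaded_.insert(name).second) {
                ++disk_reads_;  // pretend this tensor was read from disk
            }
        }
        assert(loaded_.size() <= names_.size());
    }

    std::vector<std::string> names_;
    std::unordered_set<std::string> loaded_;
    std::size_t disk_reads_ = 0;
};
```

In this model a lazy store does no reads until the first `prepare_params()` call, an eager store does them all in the constructor, and any `release()` forces the next `prepare_params()` to read everything again.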

Numbers

RTX 3060 12 GB, --offload-to-cpu --stream-layers --max-vram -1:

| Workload | Default (lazy) | `--eager-load-params` |
| --- | --- | --- |
| SDXL bf16 1152x896 batch=2 8 steps `generate_image` | 21 s | 17 s |
| Z-Image bf16 1024x688 batch=2 9 steps `generate_image` | 359 s | 58 s |

For long-lived processes (servers, batch generation) the eager path also reduces total wall-clock time, because images 2..N reuse the warm pinned-host cache instead of re-reading the model from disk.

Implementation

- `ModelManager::load_all_params_eagerly()` collects all registered states and calls the existing `load_tensors_to_params_backend`.
- Plumbed through `sd_ctx_params_t::eager_load_params`, init, and to_str.
- CLI flag added in examples/common.
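The plumbing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's diff: `ToyCtxParams`, `toy_params_init`, `toy_params_to_str`, and `toy_parse_args` are hypothetical stand-ins for `sd_ctx_params_t`, its init and to_str helpers, and the argument handling in examples/common.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down stand-in for sd_ctx_params_t.
struct ToyCtxParams {
    bool offload_to_cpu = false;
    bool eager_load_params = false;  // default off: lazy loading is preserved
};

// Stand-in for an init helper: reset every field to its default.
void toy_params_init(ToyCtxParams& p) { p = ToyCtxParams{}; }

// Stand-in for a to_str helper: dump the fields for logging.
std::string toy_params_to_str(const ToyCtxParams& p) {
    std::string s;
    s += "offload_to_cpu: ";
    s += p.offload_to_cpu ? "true" : "false";
    s += "\neager_load_params: ";
    s += p.eager_load_params ? "true" : "false";
    s += "\n";
    return s;
}

// Minimal flag scan; returns false on an unrecognized argument.
bool toy_parse_args(int argc, const char* const* argv, ToyCtxParams& p) {
    assert(argv != nullptr);
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg == "--eager-load-params") {
            p.eager_load_params = true;
        } else if (arg == "--offload-to-cpu") {
            p.offload_to_cpu = true;
        } else {
            return false;
        }
    }
    return true;
}
```

Keeping the field default at `false` means existing callers that never set it see the lazy behavior unchanged.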

Checklist

I have read and confirmed this PR follows the contribution guidelines.


leejet · 2026-06-14T11:51:29Z

Closing this as obsolete.

The original issue this PR tried to address no longer applies to the current default behavior. Params storage is kept after first use in the default params backend, so later runs can reuse the loaded weights. Storage is only released for tensors with Disk residency, which corresponds to --params-backend disk, and that reload/release behavior is expected for that mode.
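The release policy described here can be illustrated with a small sketch, assuming each tensor carries a residency tag. `ToyResidency`, `ToyTensor`, `toy_prepare`, and `toy_runner_done` are hypothetical names; the real enum, its values, and the release path may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical residency tag; Disk stands for the --params-backend disk mode.
enum class ToyResidency { Host, Disk };

struct ToyTensor {
    ToyResidency residency;
    bool loaded;
};

// Load anything not yet resident; returns how many tensors were (re)read.
std::size_t toy_prepare(std::vector<ToyTensor>& tensors) {
    std::size_t reads = 0;
    for (auto& t : tensors) {
        if (!t.loaded) {
            t.loaded = true;  // pretend the tensor was read from disk
            ++reads;
        }
        assert(t.loaded);  // postcondition: every tensor is resident
    }
    return reads;
}

// After a run, drop storage only for Disk-resident tensors.
void toy_runner_done(std::vector<ToyTensor>& tensors) {
    for (auto& t : tensors) {
        if (t.residency == ToyResidency::Disk) {
            t.loaded = false;  // these are expected to reload on the next run
        }
        // Host-resident tensors keep their storage across runs.
    }
}
```

In this model a second run re-reads only the Disk-resident tensors, which matches the point that the default backend already reuses loaded weights without an eager-load option.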

So I don't think we need a separate --eager-load-params option anymore. Thanks for the contribution!

fszontagh added 2 commits June 13, 2026 09:41

perf: --eager-load-params for fast steady-state streaming

466698a

Merge remote-tracking branch 'upstream/master' into perf/eager-load-p…

81e16ca

…arams # Conflicts: # examples/common/common.cpp # src/stable-diffusion.cpp

leejet closed this Jun 14, 2026

fszontagh wants to merge 2 commits into leejet:master from fszontagh:perf/eager-load-params