30 changes: 26 additions & 4 deletions docs/backend.md
@@ -3,7 +3,7 @@
`stable-diffusion.cpp` has two backend assignments:

- `--backend` selects the runtime backend used to execute model graphs.
- `--params-backend` selects the backend used to allocate model parameters.
- `--params-backend` selects where model parameters are kept.

If `--params-backend` is not set, parameters use the same backend as their module runtime backend.

@@ -29,6 +29,12 @@ The same syntax is used for parameter placement:
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu
```

`--params-backend` also accepts the special value `disk`:

```shell
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
```

Module names are case-insensitive. Hyphens and underscores in module names are ignored, so `clip_vision`, `clip-vision`, and `clipvision` are equivalent.
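
As a rough illustration of that normalization rule, here is a minimal C++ sketch. The function name `normalize_module_name` is hypothetical and not taken from the project; the project's own `parse_backend_module` may differ in details such as whitespace trimming.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not the project's API): maps a user-supplied module
// name to a canonical form so that spelling variants compare equal.
std::string normalize_module_name(std::string name) {
    // Lowercase every character so matching is case-insensitive.
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    // Drop hyphens and underscores so they are ignored when comparing names.
    name.erase(std::remove_if(name.begin(), name.end(),
                              [](char c) { return c == '-' || c == '_'; }),
               name.end());
    return name;
}
```

Under this sketch, `clip_vision`, `clip-vision`, and `CLIPVision` all normalize to the same string.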

`all=`, `default=`, and `*=` can be used to set the default backend inside a mixed assignment:
@@ -64,9 +70,11 @@ The special values `auto`, `default`, and an empty backend name select the defau

The special value `gpu` selects the first GPU backend, falling back to the first integrated GPU backend.

The special value `disk` is accepted only by `--params-backend`. `--backend disk` is invalid because `disk` is a parameter residency mode, not a runtime compute backend.

## Runtime backend vs. parameter backend

The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated.
The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated or whether they are reloaded from disk on demand.

For example:

@@ -76,6 +84,16 @@ sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend cpu

This runs all modules on `cuda0`, but stores parameters in CPU RAM. During execution, parameters are moved to the runtime backend as needed.

With `disk` as the parameter backend:

```shell
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
```

This runs all modules on `cuda0`, reloads parameters from the model file as needed, and releases those parameter buffers after use.

`disk` is never selected implicitly. If `--params-backend` is not set, parameters use the runtime backend.

Per-module assignments can be mixed:

```shell
@@ -100,6 +118,8 @@ uses one shared CPU backend for both `te` and `vae` runtime execution.

Runtime and parameter assignments also share the same backend cache. If `--backend diffusion=cuda0` and `--params-backend diffusion=cuda0` resolve to the same device, both use the same backend instance.

`--params-backend disk` does not create a separate backend instance. Parameters are loaded lazily using the module runtime backend.
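
The resolution rule can be pictured with a small sketch that works on backend names rather than real `ggml_backend_t` handles. `ParamsPlacement` and `resolve_params_placement` are hypothetical names used only for illustration, and the sketch assumes the name has already been lowercased and trimmed.

```cpp
#include <string>

// Hypothetical illustration type, not part of stable-diffusion.cpp.
struct ParamsPlacement {
    std::string backend;    // backend that receives parameters when they are loaded
    bool reload_from_disk;  // true when parameters are re-read from the model file on demand
};

// Sketch of the rule described above: an unset parameter backend and the
// special value "disk" both fall back to the module's runtime backend;
// only "disk" marks the parameters as reloadable.
ParamsPlacement resolve_params_placement(const std::string& runtime_backend,
                                         const std::string& params_backend) {
    if (params_backend.empty()) {
        return {runtime_backend, false};
    }
    if (params_backend == "disk") {
        return {runtime_backend, true};
    }
    return {params_backend, false};
}
```

With `--backend cuda0 --params-backend disk`, this sketch yields the `cuda0` backend with `reload_from_disk` set, so no extra backend instance is involved.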

`SDBackendManager` owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them.

## Compatibility flags
@@ -113,10 +133,12 @@ The older CPU placement flags are still supported:

`--clip-on-cpu`, `--vae-on-cpu`, and `--control-net-cpu` affect runtime backend assignment only when `--backend` is not set. They map to `te=cpu`, `vae=cpu`, and `controlnet=cpu`.

`--offload-to-cpu` affects parameter backend assignment only when `--params-backend` is not set. It is equivalent to:
`--offload-to-cpu` prepends a CPU default to the parameter assignment before parsing:

```shell
--params-backend cpu
--params-backend '*=cpu'
```

Because this default is inserted first, later explicit `--params-backend` entries can still override it. For example, `--offload-to-cpu --params-backend te=disk` keeps non-TE parameters on CPU and reloads TE parameters from disk.
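
The override order can be sketched as follows. This is a simplified model, not the project's parser: `Assignment`, `parse_assignment`, and `with_offload_to_cpu` are hypothetical names, module-name normalization and whitespace trimming are omitted, and treating a bare value such as `cpu` as the default is an assumption based on the examples above.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical model of a parsed assignment: one default plus per-module overrides.
struct Assignment {
    std::string default_name;
    std::map<std::string, std::string> modules;

    // Returns the module-specific backend if one was given, else the default.
    std::string get(const std::string& module) const {
        auto it = modules.find(module);
        return it != modules.end() ? it->second : default_name;
    }
};

// Parses "key=value,key=value" left to right; later entries overwrite earlier ones.
Assignment parse_assignment(const std::string& spec) {
    Assignment result;
    std::istringstream in(spec);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (item.empty()) {
            continue;
        }
        const std::size_t eq = item.find('=');
        if (eq == std::string::npos) {
            result.default_name = item;  // assumption: a bare value sets the default
            continue;
        }
        const std::string key = item.substr(0, eq);
        const std::string value = item.substr(eq + 1);
        if (key == "*" || key == "all" || key == "default") {
            result.default_name = value;
        } else {
            result.modules[key] = value;
        }
    }
    return result;
}

// Models --offload-to-cpu: the CPU default goes first, so user entries win.
Assignment with_offload_to_cpu(const std::string& user_spec) {
    std::string spec = "*=cpu";
    if (!user_spec.empty()) {
        spec += "," + user_spec;
    }
    return parse_assignment(spec);
}
```

In this model, `with_offload_to_cpu("te=disk")` resolves `te` to `disk` and every other module to `cpu`, which matches the behavior described above.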

Explicit `--backend` and `--params-backend` assignments are preferred for new commands.
34 changes: 33 additions & 1 deletion docs/performance.md
@@ -21,6 +21,38 @@ and the compute buffer shrink in the debug log:

Using `--offload-to-cpu` allows you to offload weights to the CPU, saving VRAM without reducing generation speed.

## Use params backend to reduce VRAM or RAM usage.

`--params-backend` controls where model parameters are kept. If it is not set, parameters use the same backend as `--backend`, so a GPU runtime backend also keeps parameters in VRAM.

Use CPU params to reduce VRAM usage:

```shell
--backend cuda0 --params-backend cpu
```

This keeps model weights in system RAM and moves them to the runtime backend when needed. `--offload-to-cpu` is a compatibility shortcut that prepends `*=cpu` to `--params-backend`, so explicit module assignments can still override it:

```shell
--offload-to-cpu --params-backend te=disk
```

Use disk params to reduce both VRAM and RAM usage:

```shell
--backend cuda0 --params-backend disk
```

This reloads parameters from the model file on demand and releases them after use. It has the lowest memory residency, but can be slower because weights must be read again. `disk` is never selected implicitly; set it explicitly when RAM usage matters more than reload cost.

Per-module assignments can target only the largest modules:

```shell
--backend cuda0 --params-backend diffusion=disk,te=cpu,vae=cpu
```

See [backend selection](./backend.md) for full syntax.

## Use quantization to reduce memory usage.

[quantization](./quantization_and_gguf.md)
[quantization](./quantization_and_gguf.md)
2 changes: 1 addition & 1 deletion examples/cli/main.cpp
@@ -746,7 +746,7 @@ int main(int argc, const char* argv[]) {
vae_decode_only = false;
}

sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, true, cli_params.taesd_preview);
sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, cli_params.taesd_preview);

SDImageVec results;
int num_results = 0;
5 changes: 2 additions & 3 deletions examples/common/common.cpp
@@ -421,7 +421,7 @@ ArgOptions SDContextParams::get_options() {
&backend},
{"",
"--params-backend",
"parameter backend assignment, e.g. cpu or diffusion=cpu,clip=cpu",
"parameter backend assignment, e.g. disk, cpu, or diffusion=disk,clip=cpu",
&params_backend},
};

@@ -757,7 +757,7 @@ std::string SDContextParams::to_string() const {
return oss.str();
}

sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview) {
sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview) {
embedding_vec.clear();
embedding_vec.reserve(embedding_map.size());
for (const auto& kv : embedding_map) {
@@ -788,7 +788,6 @@ sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool f
photo_maker_path.c_str(),
tensor_type_rules.c_str(),
vae_decode_only,
free_params_immediately,
n_threads,
wtype,
rng_type,
2 changes: 1 addition & 1 deletion examples/common/common.h
@@ -179,7 +179,7 @@ struct SDContextParams {
bool validate(SDMode mode);
bool resolve_and_validate(SDMode mode);
std::string to_string() const;
sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview);
sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview);
};

struct SDGenerationParams {
2 changes: 1 addition & 1 deletion examples/server/main.cpp
@@ -85,7 +85,7 @@ int main(int argc, const char** argv) {
LOG_DEBUG("%s", ctx_params.to_string().c_str());
LOG_DEBUG("%s", default_gen_params.to_string().c_str());

sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false, false);
sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false);
SDCtxPtr sd_ctx(new_sd_ctx(&sd_ctx_params));

if (sd_ctx == nullptr) {
1 change: 0 additions & 1 deletion include/stable-diffusion.h
@@ -197,7 +197,6 @@ typedef struct {
const char* photo_maker_path;
const char* tensor_type_rules;
bool vae_decode_only;
bool free_params_immediately;
int n_threads;
enum sd_type_t wtype;
enum rng_type_t rng_type;
38 changes: 28 additions & 10 deletions src/core/ggml_extend_backend.cpp
@@ -45,6 +45,10 @@ static bool is_default_backend_token(const std::string& name) {
return lower.empty() || lower == "default" || lower == "auto";
}

static bool is_disk_backend_token(const std::string& name) {
return lower_copy(trim_copy(name)) == "disk";
}

static bool parse_backend_module(const std::string& raw_name, SDBackendModule* module) {
std::string name = lower_copy(trim_copy(raw_name));
name.erase(std::remove(name.begin(), name.end(), '-'), name.end());
@@ -504,6 +508,9 @@ ggml_backend_t SDBackendManager::params_backend(SDBackendModule module) {
if (name.empty()) {
return runtime_backend(module);
}
if (is_disk_backend_token(name)) {
return runtime_backend(module);
}
return init_cached_backend(name);
}

@@ -515,6 +522,10 @@ bool SDBackendManager::params_backend_is_cpu(SDBackendModule module) {
return sd_backend_is_cpu(params_backend(module));
}

bool SDBackendManager::params_backend_is_disk(SDBackendModule module) const {
return is_disk_backend_token(params_assignment_.get(module));
}

bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule module) {
ggml_backend_t backend = runtime_backend(module);
if (backend == nullptr) {
@@ -534,7 +545,6 @@ bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule modu

bool SDBackendManager::init(const char* backend_spec,
const char* params_backend_spec,
bool offload_params_to_cpu,
bool keep_clip_on_cpu,
bool keep_vae_on_cpu,
bool keep_control_net_on_cpu,
@@ -560,18 +570,20 @@ bool SDBackendManager::init(const char* backend_spec,
}
}

if (params_assignment_.empty() && offload_params_to_cpu) {
params_assignment_.set_default("cpu");
}

return validate(error);
}

bool SDBackendManager::validate(std::string* error) const {
auto validate_name = [&](const std::string& name) -> bool {
auto validate_runtime_name = [&](const std::string& name) -> bool {
if (is_default_backend_token(name)) {
return true;
}
if (is_disk_backend_token(name)) {
if (error != nullptr) {
*error = "backend 'disk' is only supported by params_backend";
}
return false;
}
if (!sd_resolve_backend_name(name).empty()) {
return true;
}
@@ -580,18 +592,24 @@ bool SDBackendManager::validate(std::string* error) const {
}
return false;
};
auto validate_params_name = [&](const std::string& name) -> bool {
if (is_disk_backend_token(name)) {
return true;
}
return validate_runtime_name(name);
};

if (!validate_name(runtime_assignment_.default_name) ||
!validate_name(params_assignment_.default_name)) {
if (!validate_runtime_name(runtime_assignment_.default_name) ||
!validate_params_name(params_assignment_.default_name)) {
return false;
}
for (const auto& kv : runtime_assignment_.module_names) {
if (!validate_name(kv.second)) {
if (!validate_runtime_name(kv.second)) {
return false;
}
}
for (const auto& kv : params_assignment_.module_names) {
if (!validate_name(kv.second)) {
if (!validate_params_name(kv.second)) {
return false;
}
}
2 changes: 1 addition & 1 deletion src/core/ggml_extend_backend.h
@@ -51,7 +51,6 @@ class SDBackendManager {

bool init(const char* backend_spec,
const char* params_backend_spec,
bool offload_params_to_cpu,
bool keep_clip_on_cpu,
bool keep_vae_on_cpu,
bool keep_control_net_on_cpu,
@@ -63,6 +62,7 @@ class SDBackendManager {

bool runtime_backend_is_cpu(SDBackendModule module);
bool params_backend_is_cpu(SDBackendModule module);
bool params_backend_is_disk(SDBackendModule module) const;
bool runtime_backend_supports_host_buffer(SDBackendModule module);

private:
2 changes: 1 addition & 1 deletion src/model/adapter/lora.hpp
@@ -101,7 +101,7 @@ struct LoraModel : public GGMLRunner {
if (model_manager == nullptr ||
!model_manager->register_param_tensors("LoRA",
std::move(tensors),
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
runtime_backend,
params_backend) ||
!model_manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/adapter/pmid.hpp
@@ -622,7 +622,7 @@ struct PhotoMakerIDEmbed : public GGMLRunner {
model_loader.load_tensors(on_new_tensor_cb);
if (!model_manager->register_param_tensors("PhotoMaker ID embeds",
tensors,
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
runtime_backend,
params_backend) ||
!model_manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/diffusion/control.hpp
@@ -482,7 +482,7 @@ struct ControlNet : public GGMLRunner {
manager->set_n_threads(n_threads);
if (!manager->register_param_tensors("ControlNet",
std::move(tensors),
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
runtime_backend,
params_backend) ||
!manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/diffusion/flux.hpp
@@ -1609,7 +1609,7 @@ namespace Flux {
if (!model_manager->register_runner_params("Flux test",
*flux,
"model.diffusion_model",
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
backend,
backend) ||
!model_manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/diffusion/ltxv.hpp
@@ -2048,7 +2048,7 @@ namespace LTXV {
if (!model_manager->register_runner_params("LTXAV test",
*ltxav,
"model.diffusion_model",
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
backend,
backend) ||
!model_manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/diffusion/mmdit.hpp
@@ -1015,7 +1015,7 @@ struct MMDiTRunner : public DiffusionModelRunner {
if (!model_manager->register_runner_params("MMDiT test",
*mmdit,
"model.diffusion_model",
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
backend,
backend) ||
!model_manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/diffusion/qwen_image.hpp
@@ -715,7 +715,7 @@ namespace Qwen {
if (!model_manager->register_runner_params("Qwen image test",
*qwen_image,
"model.diffusion_model",
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
backend,
backend) ||
!model_manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/diffusion/wan.hpp
@@ -1040,7 +1040,7 @@ namespace WAN {
if (!model_manager->register_runner_params("Wan test",
*wan,
"model.diffusion_model",
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
backend,
backend) ||
!model_manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/diffusion/z_image.hpp
@@ -723,7 +723,7 @@ namespace ZImage {
if (!model_manager->register_runner_params("ZImage test",
*z_image,
"model.diffusion_model",
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
backend,
backend) ||
!model_manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/te/llm.hpp
@@ -2084,7 +2084,7 @@ namespace LLM {
if (!model_manager->register_runner_params("LLM test",
*llm,
"text_encoders.llm",
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
backend,
backend) ||
!model_manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/te/t5.hpp
@@ -592,7 +592,7 @@ struct T5Embedder {
if (!model_manager->register_runner_params("T5 test",
*t5,
"",
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
backend,
backend) ||
!model_manager->validate_registered_tensors()) {
2 changes: 1 addition & 1 deletion src/model/vae/ltx_audio_vae.hpp
@@ -1082,7 +1082,7 @@ namespace LTXV {

if (!model_manager->register_runner_params("LTX audio VAE test",
*ltx_audio_vae,
ModelManager::ResidencyMode::Resident,
ModelManager::ResidencyMode::ParamBackend,
backend,
backend) ||
!model_manager->validate_registered_tensors()) {