huggingface · linoytsaban · Jun 20, 2026
diff --git a/skills/huggingface-lora-space-builder/references/base-models/ltx.md b/skills/huggingface-lora-space-builder/references/base-models/ltx.md
@@ -102,13 +102,16 @@ pipe = LTX2Pipeline.from_pretrained("Lightricks/LTX-2", torch_dtype=torch.bfloat
 
 LTX-2 supports a two-stage pipeline (base + latent upsample) for production quality. For a demo, the single-stage pipeline is usually sufficient and faster.
 
+**Don't default two-stage on for IC-LoRAs — check the card.** The common LTX-2.3 two-stage recipe runs stage 2 as an x2 latent-upsample + refine **with the IC-LoRA disabled** (`disable_lora()`), re-rendering on the bare base — which re-degrades exactly the LoRA-conditioned detail (text, identity). Whether two-stage is appropriate is per-LoRA, per the card: e.g. the *In/Out-painting* card explicitly says "both tasks use a two-stage pipeline," whereas the *Ingredients* card recommends a single-pass 30-step recipe and never mentions two stages. Single-stage is the safer demo default unless the card calls for two-stage.
+
 ## Inference defaults
 
 - **Resolution**: typically multiples of 32. Common sizes: 768×512, 1216×704, 704×480.
 - **Frame count**: typically `8k+1` (e.g. 121, 161, 257). Compute from `duration_seconds * fps` and round.
 - **fps**: 24 by default; some LoRAs are trained at different rates (16, 30) — check the model card.
 - **`num_inference_steps`**: 30–50 for non-distilled. Distilled checkpoints (look for "distilled" in the name) often run at 8–12.
 - **`negative_prompt`**: LTX is sensitive to negative prompts. A good default: `"worst quality, inconsistent motion, blurry, jittery, distorted"`.
+- **Distilled IC-LoRA — disable audio guidance too, not just `guidance_scale`.** `LTX2InContextPipeline` computes `do_classifier_free_guidance = guidance_scale > 1 OR audio_guidance_scale > 1`, and **`audio_guidance_scale` defaults to `7.0`**. So `guidance_scale=1.0` is not enough — CFG stays on via audio (and `stg_scale` defaults on), which doubles/mis-batches the forward against the in-context reference tokens (wrong recipe, sometimes a runtime error). For a distilled IC-LoRA pass all four: `pipe(..., guidance_scale=1.0, stg_scale=0.0, audio_guidance_scale=1.0, audio_stg_scale=0.0)`.
 
 ## Frame-count helper
 
@@ -120,7 +123,16 @@ def num_frames_for_duration(seconds, fps=24, base=8):
 
 ## Native pipeline path (LTX-2.3 fallback)
 
-The native repo is a fallback when the diffusers path doesn't work, or for specific conditionings that diffusers may not support yet. Try `LTX2InContextPipeline` on the diffusers path first (see the IC-LoRAs section above). The pattern looks like:
+The native repo is a fallback when the diffusers path doesn't work, or for specific conditionings that diffusers may not support yet. Try `LTX2InContextPipeline` on the diffusers path first (see the IC-LoRAs section above).
+
+**The native repo depends on the model generation** — pick by the LoRA's base model (and what its card's snippet imports):
+
+| Base model | Native repo | Package(s) / import | Pipeline classes |
+|---|---|---|---|
+| `LTX-Video` (0.9.x) | `github.com/Lightricks/LTX-Video` | `ltx_video` | `from ltx_video.pipelines import LTXPipeline` |
+| `LTX-2`, `LTX-2.3` | `github.com/Lightricks/LTX-2` | `ltx-core` + `ltx-pipelines` (editable) | `ltx_pipelines.ic_lora.ICLoraPipeline`, `TI2VidOneStagePipeline`, … |
+
+For the 0.9.x series the pattern looks like:
 
 ```bash
 # in requirements.txt
@@ -136,6 +148,19 @@ The native path doesn't use `load_lora_weights`. Instead, LoRAs are usually wire
 
 When the LoRA's model card has a Python snippet using the native repo, copy its construction pattern verbatim. The native API changes more often than diffusers' does, so don't paraphrase.
 
+### LTX-2.x native path on ZeroGPU — gotchas
+
+For `LTX-2` / `LTX-2.3` (the `ltx-core` + `ltx-pipelines` repo), clone + `pip install -e packages/ltx-core packages/ltx-pipelines` at runtime in `app.py` and pin a commit. Three things bite on ZeroGPU:
+
+- **Native loader bypasses ZeroGPU virtualization → "No CUDA GPUs available" at startup.** The native safetensors loader does `safe_open(path, device="cuda")` and copies host→device inside safetensors' own C++ (`cudaMemcpy`), **bypassing `torch.Tensor.to`** — the call ZeroGPU patches to virtualize + pack module-scope weights. Nothing packs and module-scope placement raises *"No CUDA GPUs are available."* Fix: monkeypatch the loader to open on CPU then move with `.to`:
+  ```python
+  with safetensors.safe_open(shard, framework="pt", device="cpu") as f:
+      value = f.get_tensor(name).to(device=device)   # torch path → ZeroGPU-virtualisable
+  ```
+  Also: patch attention to **SDPA** (FA3 crashes on Blackwell ZeroGPU), and **never call `torch.cuda.*` / `get_device_capability()` at module scope** (it poisons virtualization).
+- **Native two-stage pins two 22B transformers → offload-disk overflow.** `ICLoraPipeline` builds two independent `ModelLedger`s (stage-1 with the LoRA, stage-2 without), each loading its **own** full transformer. The CLI loads them sequentially, but ZeroGPU pins all weights at module scope, so enabling stage 2 pins **both** (~143G) and overflows the offload disk (`OSError: [Errno 28] No space left on device`, ~96G cap). For a demo: pin stage 1, then have stage 2 **reuse stage 1's pinned modules** (`for n in [...]: setattr(s2, n, getattr(s1, n))`) so only one transformer is resident (~71G).
+- **`ICLoraPipeline` is distilled-only.** Per the LTX-2 `ltx-pipelines` guide, IC-LoRA inference runs in distilled-only mode (fixed 8-step sigmas, no `num_inference_steps`/`guidance_scale`/`negative_prompt`); the docs do not describe non-distilled IC-LoRA or IC-LoRA + CFG/STG (guidance lives only on full-model pipelines like `TI2VidOneStagePipeline`, and in two-stage is stage-1-only). So if a card recommends a *non-distilled* recipe (e.g. 30 steps + guidance + STG on a `…-dev` base), there is no stock IC-LoRA pipeline for it — use the supported distilled `ICLoraPipeline` and note the recipe difference, or fall back to the diffusers `LTX2InContextPipeline` (which exposes steps/guidance).
+
 ## IC-LoRAs (in-context conditioning)
 
 LTX IC-LoRAs condition the model on a reference video alongside a first-frame image. The diffusers path uses `LTX2InContextPipeline` with standard `load_lora_weights` + `set_adapters`.
@@ -173,6 +198,7 @@ Set `@spaces.GPU(duration=...)` to comfortably exceed expected generation time.
 ## Things to watch
 
 - **Latest pipelines often need git diffusers.** Pin to `git+https://github.com/huggingface/diffusers` in `requirements.txt` for any LTX variant released in the last few weeks.
+- **`git+diffusers` is a moving target — verify LTX2 output quality.** Bare `@main` inherits whatever's there. The LTX2 text connector carried a token-reversal regression (PR #13564) that scrambled prompt tokens/registers — degrading prompt adherence and fine detail (e.g. garbled on-screen text), worst on short prompts — until **PR #13931 (merged 2026-06-19)** fixed it. If LTX2 output looks weak or garbled, confirm the installed diffusers commit is **at or past #13931**, and prefer pinning to a known-good commit (`git+https://github.com/huggingface/diffusers@<sha>`) over bare `@main` for reproducibility.
 - **Don't `torch.compile`.** LTX is fast enough on ZeroGPU without it; compile is incompatible anyway.
 - **`enable_vae_tiling()` for higher resolutions.** LTX VAE memory grows with resolution; enable tiling for outputs above 768px on either axis.
 - **Negative prompts matter more than for image models.** Don't ship with an empty negative_prompt unless the LoRA's model card says so.

diff --git a/skills/huggingface-lora-space-builder/references/zerogpu-and-publishing.md b/skills/huggingface-lora-space-builder/references/zerogpu-and-publishing.md
@@ -170,6 +170,8 @@ When sharing the URL with the user:
 - Offer to look at the logs if anything fails.
 - Don't add a long postamble — they want to click the link, not read more text.
 
+**Confirm a redeploy is actually live before testing it.** An `app.py`-only push does **not** change the Space's reported `runtime.stage` — the old replica keeps serving "RUNNING" while the new build swaps in, so a `gradio_client` test can silently hit **stale code**. To be sure: push → `api.restart_space(repo)` → poll until the stage leaves and returns to RUNNING → grep the boot logs (`/logs/run`) for a unique `[VERSION] …` marker you printed at module scope → then test. Also set `demo.launch(show_error=True)` so `gradio_client` surfaces the real traceback instead of a generic `AppError`.
+
 ## Publish-time failures (before the build starts)
 
 These happen during `create_repo` or `upload_file`, *before* the Space build pipeline runs. Diagnose by reading the exception, not the container logs (the container hasn't started yet).