mosaic-temporal-gpu

The high-speed sibling of mosaic-temporal. NVDEC/NVENC + torch-lap Hungarian + on-GPU torch kernels (Triton port queued for v0.2).

⚠️ Status: 0.1.0 release candidate. Public API (run_pipeline), kernels, solver, NVDEC/NVENC bridge, config schema, and CPU-host tests are in place. The remaining work toward 0.1.0 final is the parity-gate CI on a CUDA runner and the bench-spike sign-off on Kaggle T4 — see Roadmap. The Quickstart below is the supported API; the 3-stream CUDA-overlap optimization that motivated this repo lands in 0.2 without changing the signature.

Positioning

This is the high-speed build of the video mosaic pipeline. The portable sibling mosaic-temporal keeps a CPU fallback at every step for users without a GPU; this repo drops every fallback so the hot path can be NVDEC → GPU kernels → torch-lap → NVENC end-to-end (the kernels are plain torch ops in 0.1; the Triton port is queued for v0.2). The cost is a hard requirement: an NVIDIA GPU with CUDA ≥ 12.0. The benefit is higher throughput on long clips.

| Feature | mosaic-temporal | mosaic-temporal-gpu (high-speed) |
| --- | --- | --- |
| Hungarian assignment | scipy CPU (default) | torch-linear-assignment (only) |
| Cost matrix | numpy CPU loop | torch.cdist on CUDA (Triton in v0.2) |
| Oklab grid mean | numpy | torch view+reduce on CUDA (Triton in v0.2) |
| Video I/O | cv2 PNG round-trip | PyAV NVDEC → ndarray → NVENC |
| RAFT optical flow | CPU torch (slow) | not in v0.1.0; queued for v0.3 |
| Bit-exact CPU output | yes (`bit-exact-cpu`) | no; parity gated at SSIM ≥ 0.98 |
| Runtime requirement | none | NVIDIA GPU with CUDA ≥ 12.0 |

If you need the CPU fallback, the bit-exact reference, or Windows/macOS support, use mosaic-temporal. If you have a CUDA GPU and want speed, you're in the right place.

Install (once 0.1.0 ships to PyPI)

mosaic-temporal-gpu requires a CUDA build of PyTorch. Install torch first from the official CUDA wheel index, then install this package:

# 1. CUDA 12.1 wheels (adjust cu121 to your CUDA version)
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision

# 2. Pure compute kernels only (no video I/O — no PyAV)
pip install mosaic-temporal-gpu

# 2'. With NVDEC/NVENC video I/O (needs a cuvid-enabled FFmpeg + PyAV).
#     The PyPI `av` wheel is software-only — see benchmarks/README.md for
#     the FFmpeg+PyAV self-build recipe. The `[nvdec]` extra declares the
#     `av>=12` dependency; it does NOT build FFmpeg for you.
pip install "mosaic-temporal-gpu[nvdec]"

If you skip step 1, pip resolves torch from PyPI, which can give you a CPU-only build or one built for a different CUDA version than your driver supports. Every CUDA-only call then fails at runtime; by design there is no CPU fallback. NVIDIA driver ≥ R535 and CUDA ≥ 12.0 are prerequisites. Until 0.1.0 ships to PyPI, install from source:

git clone https://github.com/hinanohart/mosaic-temporal-gpu
cd mosaic-temporal-gpu
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision
pip install -e ".[dev]"
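
Before running anything, it can help to confirm that the installed torch really is a CUDA build that meets the CUDA ≥ 12.0 prerequisite. The following is a minimal sketch, not part of this package: `cuda_build_ok` and `preflight` are hypothetical helper names, and the only torch attributes used are `torch.version.cuda` (which is `None` on CPU-only builds) and `torch.cuda.is_available()`.

```python
from typing import Optional, Tuple


def cuda_build_ok(cuda_version: Optional[str], minimum: Tuple[int, int] = (12, 0)) -> bool:
    """Return True if a torch CUDA version string such as "12.1" meets the minimum.

    torch.version.cuda is None on CPU-only builds, which always fails the check.
    """
    if cuda_version is None:
        return False
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) >= minimum


def preflight() -> None:
    """Raise RuntimeError early if torch cannot drive a CUDA >= 12.0 device."""
    import torch  # imported lazily so cuda_build_ok stays usable without torch

    if not cuda_build_ok(torch.version.cuda):
        raise RuntimeError(f"torch was built for CUDA {torch.version.cuda!r}; need >= 12.0")
    if not torch.cuda.is_available():
        raise RuntimeError("torch is a CUDA build, but no usable GPU or driver was found")
```

Calling `preflight()` once at program start turns a late, confusing kernel failure into an immediate, readable error.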

Quickstart

from pathlib import Path
from mosaic_temporal_gpu import run_pipeline

stats = run_pipeline(
    input_video=Path("input.mp4"),
    output_video=Path("output.mp4"),
    tile_dir=Path("tiles/"),       # keyword-only
    fps=30,                        # NVENC output frame rate (input fps
                                   # auto-detection lands in 0.2)
    cq=19,                         # h264_nvenc constant-quality (lower = better)
)
print(stats)
# {"frames": 720, "width": 1920, "height": 1080,
#  "fps": 30, "active_codec": "h264_cuvid"}

Pass a D1Config via config= to set the configuration explicitly (vivid_b is the default preset):

from mosaic_temporal_gpu import D1Config, run_pipeline
run_pipeline(..., config=D1Config.from_preset("vivid_b"))

For 0.1.0 we ship the vivid_b preset only (saturation_boost=2.10, mkl_hybrid, neighbor_swap_rounds=5). Additional presets and a CLI front-end are deferred to 0.2 to keep the launch surface narrow.

The active_codec field in the return value confirms that NVDEC engaged on the decode side ("h264_cuvid" or "hevc_cuvid"). A software fallback is never silent: if hardware decoding is unavailable, the reader raises before any frame is processed (see the R8 assertion in io/nvdec.py).
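
If you want an explicit guard in your own scripts, a check along these lines is one option. It is a sketch under two assumptions: that the return value behaves like the dictionary shown in the Quickstart comment, and that the two codec names listed above are the only hardware decoders reported. `assert_nvdec_engaged` is a hypothetical helper, not part of the package.

```python
from typing import Any, Mapping

# Decoder names quoted in this README; assumed to be the complete set.
HARDWARE_DECODERS = {"h264_cuvid", "hevc_cuvid"}


def assert_nvdec_engaged(stats: Mapping[str, Any]) -> str:
    """Return the active codec, or raise if it is not a known NVDEC decoder."""
    codec = stats.get("active_codec")
    if codec not in HARDWARE_DECODERS:
        raise RuntimeError(f"expected an NVDEC decoder, got {codec!r}")
    return codec


# Same shape as the Quickstart output comment; in real use, pass the
# value returned by run_pipeline(...) instead.
example_stats = {"frames": 720, "width": 1920, "height": 1080,
                 "fps": 30, "active_codec": "h264_cuvid"}
print(assert_nvdec_engaged(example_stats))  # prints: h264_cuvid
```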

What works today (component-level)

import torch
from mosaic_temporal_gpu import D1Config
from mosaic_temporal_gpu.kernels.cost_matrix import compute_cost_matrix_gpu
from mosaic_temporal_gpu.solvers.torch_lap import TorchLapSolver

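# cells and tiles are placeholders: they are assumed to be feature tensors
# already on the CUDA device (their construction is not shown here).
cfg = D1Config.from_preset("vivid_b")          # ✅ schema + preset
cost = compute_cost_matrix_gpu(cells, tiles)   # ✅ GPU cost matrix (CUDA req'd)
assignment = TorchLapSolver().solve(cost)      # ✅ GPU Hungarian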

NvdecReader and NvencWriter are likewise importable, and their error paths are tested on a CPU-only host; a full round-trip needs CUDA.
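
For intuition about what the cost-matrix and assignment steps compute, here is a CPU-only toy version in pure Python. It is not how the package implements them: Euclidean distance between RGB triples stands in for the real features, a brute-force search over permutations stands in for the Hungarian solver, and `toy_cost_matrix` and `toy_assignment` are hypothetical names.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def toy_cost_matrix(cells: Sequence[Color], tiles: Sequence[Color]) -> List[List[float]]:
    """cost[i][j] is the Euclidean distance between cell i and tile j."""
    return [[math.dist(cell, tile) for tile in tiles] for cell in cells]


def toy_assignment(cost: List[List[float]]) -> Tuple[int, ...]:
    """Brute-force the one-to-one assignment with minimal total cost.

    Only sensible for tiny square matrices: it tries all n! permutations,
    which is exactly the blow-up a Hungarian-style solver avoids.
    """
    n = len(cost)
    return min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )


cells = [(0.9, 0.1, 0.1), (0.1, 0.9, 0.1), (0.1, 0.1, 0.9)]  # reddish, greenish, bluish
tiles = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # blue, red, green
assignment = toy_assignment(toy_cost_matrix(cells, tiles))
print(assignment)  # (1, 2, 0): cell 0 gets tile 1, cell 1 gets tile 2, cell 2 gets tile 0
```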

Parity guarantee (planned, not yet wired)

The release contract is: for each frame of a fixed 24-frame synthetic clip, SSIM(mosaic_temporal_gpu candidate, mosaicraft CPU reference) ≥ 0.98. The test exists (tests/test_parity_vs_mosaicraft.py, marked @pytest.mark.parity), but GitHub's free runners have no CUDA, so the parity job is not in CI today; run it locally on a CUDA host with pytest -m parity. A scheduled GPU runner (Modal / RunPod) is queued for 0.1.0 final. Output is not bit-exact, because floating-point addition is non-associative and GPU reductions do not fix the summation order; the SSIM gate is the operative contract.
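
The two ideas in this section can be illustrated in a few lines of plain Python. The first part shows why reordering a floating-point sum can change the result, which is the reason bit-exact output is not promised. The second part sketches the shape of a per-frame gate; `failing_frames` is a hypothetical helper, and the scores are made-up values, not measurements from this project.

```python
from typing import List, Sequence

# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False


def failing_frames(ssim_per_frame: Sequence[float], threshold: float = 0.98) -> List[int]:
    """Return the indices of frames whose SSIM falls below the threshold."""
    return [index for index, score in enumerate(ssim_per_frame) if score < threshold]


scores = [0.991, 0.987, 0.979, 0.995]  # illustrative values only
print(failing_frames(scores))  # [2]
```

A gate of this kind passes only when the returned list is empty, which matches the "for each frame" wording of the contract.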

Repository layout

src/mosaic_temporal_gpu/
  __init__.py            # version, public API (run_pipeline, D1Config, exceptions)
  _version.py            # single source of truth
  config.py              # D1Config schema (mirror of mosaic-temporal's GPU-valid subset)
  kernels/
    cost_matrix.py       # GPU cost matrix (torch.cdist on CUDA; Triton port = v0.2)
    oklab_grid.py        # GPU Oklab grid mean (torch view+reduce; Triton port = v0.2)
  solvers/
    torch_lap.py         # torch-linear-assignment wrapper
  io/
    nvdec.py             # PyAV NVDEC reader
    nvenc.py             # PyAV NVENC writer
  pipeline.py            # end-to-end run_pipeline (single CUDA stream;
                         # 3-stream overlap is v0.2)
tests/
  test_parity_vs_mosaicraft.py   # SSIM ≥ 0.98 gate (xfail until CUDA CI)
  test_pipeline_smoke.py         # run_pipeline public-API contract
  test_kernel_shapes.py
  test_solver_torch_lap.py
  test_io_bridges.py
  test_config_schema.py
  test_version_smoke.py

Roadmap

0.1.0: run_pipeline() shipped (single-stream NVDEC → mosaic → NVENC); parity gate green on a CUDA runner (Modal / RunPod queued); bench-spike sign-off on Kaggle T4.
0.2: 3-stream CUDA overlap (decode | compute | encode); DLPack zero-copy on both ends of the video bridge; Triton kernels for the cost matrix and Oklab grid (replacing torch.cdist and the view+mean reduction once benchmarks show a real win); CLI front-end; additional presets.
0.3: RAFT optical flow on GPU for temporal coherence; flow_warp module.
1.0: Stable parity gate across two driver/CUDA upgrades; one breaking-change cycle behind us.

Relation to siblings

mosaicraft (image mosaic, pure numpy/cv2/scipy) — used here as the CPU reference for the parity gate and for the Oklab / MKL OT / Laplacian primitives.
mosaic-temporal (video mosaic, CPU/GPU dual path) — the portable sibling. Same D1Config surface, so config files port between the two.

Verification (sigstore)

Releases published after 2026-05-16 include a sigstore keyless signature bundle (one .sigstore file per artifact) attached to the GitHub Release.

Verify a PyPI install

pip download mosaic-temporal-gpu==<version> --no-deps -d ./verify
python -m sigstore verify identity \
    --cert-identity 'https://github.com/hinanohart/mosaic-temporal-gpu/.github/workflows/release.yml@refs/tags/v<version>' \
    --cert-oidc-issuer 'https://token.actions.githubusercontent.com' \
    ./verify/*.whl ./verify/*.tar.gz

Download the corresponding .sigstore bundles from the GitHub Release page and place them next to the artifacts in ./verify before running the verify command.
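
The `<version>` placeholder in the certificate identity has to match the release tag exactly, including the leading `v`. If you script the verification, a small helper like the one below avoids hand-editing the URL. It is a sketch: `cert_identity` is a hypothetical name, and the repository and workflow path are copied from the command above.

```python
REPO = "https://github.com/hinanohart/mosaic-temporal-gpu"
WORKFLOW = ".github/workflows/release.yml"


def cert_identity(version: str) -> str:
    """Build the --cert-identity value for a release version such as "0.1.0"."""
    tag = version if version.startswith("v") else f"v{version}"
    return f"{REPO}/{WORKFLOW}@refs/tags/{tag}"


print(cert_identity("0.1.0"))
# https://github.com/hinanohart/mosaic-temporal-gpu/.github/workflows/release.yml@refs/tags/v0.1.0
```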

Historic releases (pre-2026-05-16)

Earlier releases were published without sigstore bundles. Re-installing those versions provides no cryptographic provenance — pin to a current release if assurance matters.

License

MIT. See LICENSE.
