mosaic-temporal-gpu

The high-speed sibling of mosaic-temporal. NVDEC/NVENC + torch-lap Hungarian + on-GPU torch kernels (Triton port queued for v0.2).

⚠️ Status: 0.1.0 release candidate. Public API (run_pipeline), kernels, solver, NVDEC/NVENC bridge, config schema, and CPU-host tests are in place. The remaining work toward 0.1.0 final is the parity-gate CI on a CUDA runner and the bench-spike sign-off on Kaggle T4 — see Roadmap. The Quickstart below is the supported API; the 3-stream CUDA-overlap optimization that motivated this repo lands in 0.2 without changing the signature.

Positioning

This is the high-speed build of the video mosaic pipeline. The portable sibling mosaic-temporal keeps a CPU fallback at every step for users without a GPU; this repo drops every fallback so the hot path can run NVDEC → on-GPU torch kernels → torch-lap → NVENC end-to-end (Triton kernels replace the torch ones in v0.2). The cost is a hard requirement: an NVIDIA GPU with CUDA ≥ 12.0. The benefit is higher throughput on long clips.

| Feature | mosaic-temporal | mosaic-temporal-gpu (high-speed) |
| --- | --- | --- |
| Hungarian assignment | scipy CPU (default) | torch-linear-assignment (only) |
| Cost matrix | numpy CPU loop | torch.cdist on CUDA (Triton in v0.2) |
| Oklab grid mean | numpy | torch view+reduce on CUDA (Triton in v0.2) |
| Video I/O | cv2 PNG round-trip | PyAV NVDEC → ndarray → NVENC |
| RAFT optical flow | CPU torch (slow) | not in v0.1.0; queued for v0.3 |
| Bit-exact CPU output | yes (bit-exact-cpu) | no; parity gated at SSIM ≥ 0.98 |
| Runtime requirement | none | NVIDIA GPU with CUDA ≥ 12.0 |

If you need the CPU fallback, the bit-exact reference, or Windows/macOS support, use mosaic-temporal. If you have a CUDA GPU and want speed, you're in the right place.

Install (once 0.1.0 ships to PyPI)

mosaic-temporal-gpu requires a CUDA build of PyTorch. Install torch first from the official CUDA wheel index, then install this package:

# 1. CUDA 12.1 wheels (adjust cu121 to your CUDA version)
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision

# 2. Pure compute kernels only (no video I/O — no PyAV)
pip install mosaic-temporal-gpu

# 2'. With NVDEC/NVENC video I/O (needs a cuvid-enabled FFmpeg + PyAV).
#     The PyPI `av` wheel is software-only — see benchmarks/README.md for
#     the FFmpeg+PyAV self-build recipe. The `[nvdec]` extra declares the
#     `av>=12` dependency; it does NOT build FFmpeg for you.
pip install "mosaic-temporal-gpu[nvdec]"

If you skip step 1, pip will resolve torch to the CPU build from PyPI and every CUDA-only call will fail at runtime — there is no CPU fallback on purpose. NVIDIA driver ≥ R535 and CUDA ≥ 12.0 are prerequisites. Until 0.1.0 ships to PyPI, install from source:

git clone https://github.com/hinanohart/mosaic-temporal-gpu
cd mosaic-temporal-gpu
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision
pip install -e ".[dev]"
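
After either install route, a quick pre-flight check can confirm that pip resolved a CUDA build of torch rather than the CPU wheel. The sketch below is not part of this package; the helper names `cuda_build_ok` and `preflight` are hypothetical. It relies only on `torch.version.cuda` (which is `None` on CPU-only wheels) and `torch.cuda.is_available()`.

```python
from typing import Optional, Tuple


def cuda_build_ok(cuda_version: Optional[str],
                  minimum: Tuple[int, int] = (12, 0)) -> bool:
    """Return True if a torch.version.cuda-style string meets the minimum.

    torch.version.cuda is None on CPU-only wheels and a string such as
    "12.1" on CUDA wheels.
    """
    if cuda_version is None:
        return False
    parts = cuda_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= minimum


def preflight() -> str:
    """Summarize whether the local torch install can run CUDA-only code."""
    try:
        import torch  # imported lazily so the helper above works without torch
    except ImportError:
        return "torch is not installed"
    if not cuda_build_ok(torch.version.cuda):
        return f"torch {torch.__version__} is not a CUDA >= 12.0 build"
    if not torch.cuda.is_available():
        return "CUDA build of torch found, but no usable GPU or driver"
    name = torch.cuda.get_device_name(0)
    return f"ok: torch {torch.__version__} (CUDA {torch.version.cuda}) on {name}"


if __name__ == "__main__":
    print(preflight())
```

Running this before the first `run_pipeline` call turns a late CUDA-only failure into an early, readable message.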

Quickstart

from pathlib import Path
from mosaic_temporal_gpu import run_pipeline

stats = run_pipeline(
    input_video=Path("input.mp4"),
    output_video=Path("output.mp4"),
    tile_dir=Path("tiles/"),       # keyword-only
    fps=30,                        # NVENC output frame rate (input fps
                                   # auto-detection lands in 0.2)
    cq=19,                         # h264_nvenc constant-quality (lower = better)
)
print(stats)
# {"frames": 720, "width": 1920, "height": 1080,
#  "fps": 30, "active_codec": "h264_cuvid"}

Pass a D1Config to override the default vivid_b preset:

from mosaic_temporal_gpu import D1Config, run_pipeline
run_pipeline(..., config=D1Config.from_preset("vivid_b"))

For 0.1.0 we ship the vivid_b preset only (saturation_boost=2.10, mkl_hybrid, neighbor_swap_rounds=5). Additional presets and a CLI front-end are deferred to 0.2 to keep the launch surface narrow.

The active_codec field in the return value confirms that NVDEC engaged on the decode side ("h264_cuvid" / "hevc_cuvid"). A software fallback cannot go unnoticed: if the decoder is not a cuvid one, the reader raises before any frame is processed (see the R8 assertion in io/nvdec.py).
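
A caller that wants to log or assert the decode path can check the returned dict directly. The helper below is a hypothetical convenience, not part of the package; it assumes only the `active_codec` key and the two codec names mentioned above.

```python
# The two hardware decoder names named in this README.
HARDWARE_DECODERS = {"h264_cuvid", "hevc_cuvid"}


def hardware_decode_engaged(stats: dict) -> bool:
    """Return True if the run_pipeline stats report an NVDEC (cuvid) decoder."""
    return stats.get("active_codec", "") in HARDWARE_DECODERS


# Example with the stats shape shown in the Quickstart.
example_stats = {"frames": 720, "width": 1920, "height": 1080,
                 "fps": 30, "active_codec": "h264_cuvid"}
print(hardware_decode_engaged(example_stats))
```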

What works today (component-level)

import torch
from mosaic_temporal_gpu import D1Config
from mosaic_temporal_gpu.kernels.cost_matrix import compute_cost_matrix_gpu
from mosaic_temporal_gpu.solvers.torch_lap import TorchLapSolver

cfg = D1Config.from_preset("vivid_b")          # ✅ schema + preset
# `cells` and `tiles` are not defined in this snippet; supply your own CUDA tensors.
cost = compute_cost_matrix_gpu(cells, tiles)   # ✅ GPU cost matrix (CUDA req'd)
assignment = TorchLapSolver().solve(cost)      # ✅ GPU Hungarian

NvdecReader / NvencWriter are likewise importable and tested on CPU host for their error paths; full round-trip needs CUDA.
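
To make the cost-matrix and assignment steps concrete, here is a CPU-only toy in pure Python. It is not the package's code and not the Hungarian algorithm: it builds a Euclidean cost matrix (the default metric of `torch.cdist`; whether `compute_cost_matrix_gpu` uses that default is an assumption) and finds the optimal one-to-one assignment by brute force, which is only feasible for a handful of cells. The function names and the sample colors are hypothetical.

```python
from itertools import permutations
from math import dist


def cost_matrix(cells, tiles):
    """cost[i][j] = Euclidean distance between cell i and tile j."""
    return [[dist(c, t) for t in tiles] for c in cells]


def brute_force_assignment(cost):
    """Return perm where cell i gets tile perm[i], minimizing total cost.

    Tries every permutation, so it is O(n!) and only a reference for tiny,
    square inputs. A Hungarian-style solver reaches the same optimum far faster.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)


# Three grid cells and three candidate tiles as 3-component color means.
cells = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.0, 0.0)]
tiles = [(0.9, 1.0, 1.0), (0.6, 0.0, 0.0), (0.1, 0.0, 0.0)]

assignment = brute_force_assignment(cost_matrix(cells, tiles))
print(assignment)  # → [2, 0, 1]
```

Each cell ends up with its nearest tile here because the toy data has no conflicts; the solver matters when two cells compete for the same tile.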

Parity guarantee (planned, not yet wired)

The release contract is: for each frame of a fixed 24-frame synthetic clip, SSIM(mosaic_temporal_gpu candidate, mosaicraft CPU reference) ≥ 0.98. The test exists (tests/test_parity_vs_mosaicraft.py, @pytest.mark.parity), but GitHub's free runners have no CUDA, so the parity job is not in CI today; it runs locally on a CUDA host with pytest -m parity. A scheduled GPU runner (Modal / RunPod) is queued for 0.1.0 final. Output is not bit-exact: floating-point addition is non-associative and GPU reductions do not sum in the same order as the CPU reference, so the SSIM gate is the operative contract.
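
The shape of such a per-frame gate can be sketched in a few lines. This is an illustration, not the code in tests/test_parity_vs_mosaicraft.py: it uses a simplified single-window ("global") SSIM over flattened grayscale frames, whereas standard SSIM averages over local windows. The function names and the flat-list frame representation are assumptions.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                data_range: float = 255.0) -> float:
    """Single-window SSIM over two equally sized, flattened frames."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("frames must be non-empty and the same size")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizers from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))


def parity_gate(candidate_frames, reference_frames,
                threshold: float = 0.98) -> bool:
    """True only if every frame pair meets the SSIM threshold."""
    if len(candidate_frames) != len(reference_frames):
        raise ValueError("clips must have the same number of frames")
    return all(global_ssim(c, r) >= threshold
               for c, r in zip(candidate_frames, reference_frames))


reference = [[0.0, 64.0, 128.0, 255.0]] * 3
inverted = [[255.0, 191.0, 127.0, 0.0]] * 3
print(parity_gate(reference, reference))  # identical clips pass
print(parity_gate(inverted, reference))   # an inverted clip fails
```

Gating on the minimum over frames rather than the mean means a single badly diverging frame fails the release.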

Repository layout

src/mosaic_temporal_gpu/
  __init__.py            # version, public API (run_pipeline, D1Config, exceptions)
  _version.py            # single source of truth
  config.py              # D1Config schema (mirror of mosaic-temporal's GPU-valid subset)
  kernels/
    cost_matrix.py       # GPU cost matrix (torch.cdist on CUDA; Triton port = v0.2)
    oklab_grid.py        # GPU Oklab grid mean (torch view+reduce; Triton port = v0.2)
  solvers/
    torch_lap.py         # torch-linear-assignment wrapper
  io/
    nvdec.py             # PyAV NVDEC reader
    nvenc.py             # PyAV NVENC writer
  pipeline.py            # end-to-end run_pipeline (single CUDA stream;
                         # 3-stream overlap is v0.2)
tests/
  test_parity_vs_mosaicraft.py   # SSIM ≥ 0.98 gate (xfail until CUDA CI)
  test_pipeline_smoke.py         # run_pipeline public-API contract
  test_kernel_shapes.py
  test_solver_torch_lap.py
  test_io_bridges.py
  test_config_schema.py
  test_version_smoke.py

Roadmap

  • 0.1.0 — run_pipeline() shipped (single-stream NVDEC → mosaic → NVENC); parity gate green on a CUDA runner (Modal / RunPod queued); bench-spike sign-off on Kaggle T4.
  • 0.2 — 3-stream CUDA overlap (decode | compute | encode); DLPack zero-copy on both ends of the video bridge; Triton kernels for cost matrix and Oklab grid (replace torch.cdist / torch.view+mean once we benchmark a real win); CLI front-end; additional presets.
  • 0.3 — RAFT optical flow on GPU for temporal coherence; flow_warp module.
  • 1.0 — Stable parity gate across two driver/CUDA upgrades; one breaking-change cycle behind us.
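
The idea behind the 0.2 three-stage overlap (decode | compute | encode) can be illustrated with a CPU analogy: three worker threads connected by bounded queues, so frame N+1 can be decoded while frame N is being computed. This is a sketch of the general producer/consumer pattern only; it uses Python threads rather than CUDA streams, and every name in it is hypothetical. Error handling is omitted: a stage function that raises would stall this toy.

```python
import queue
import threading

_DONE = object()  # sentinel that tells each stage to shut down


def _stage(fn, inbox, emit):
    """Pull items from inbox, apply fn, pass results on; forward the sentinel."""
    while True:
        item = inbox.get()
        if item is _DONE:
            emit(_DONE)
            return
        emit(fn(item))


def run_overlapped(items, decode, compute, encode, depth=2):
    """Run decode -> compute -> encode concurrently, preserving input order."""
    q_raw, q_dec, q_cmp = (queue.Queue(maxsize=depth) for _ in range(3))
    results = []

    def collect(value):
        if value is not _DONE:
            results.append(value)

    workers = [
        threading.Thread(target=_stage, args=(decode, q_raw, q_dec.put)),
        threading.Thread(target=_stage, args=(compute, q_dec, q_cmp.put)),
        threading.Thread(target=_stage, args=(encode, q_cmp, collect)),
    ]
    for w in workers:
        w.start()
    for item in items:       # blocks when the first queue is full (backpressure)
        q_raw.put(item)
    q_raw.put(_DONE)
    for w in workers:
        w.join()
    return results


packets = run_overlapped(range(5),
                         decode=lambda i: i + 1,
                         compute=lambda f: f * 2,
                         encode=lambda f: f"pkt{f}")
print(packets)  # → ['pkt2', 'pkt4', 'pkt6', 'pkt8', 'pkt10']
```

The bounded queue depth plays the role of a small frame buffer between stages: it caps memory use and lets a slow stage throttle the ones before it.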

Relation to siblings

  • mosaicraft (image mosaic, pure numpy/cv2/scipy) — used here as the CPU reference for the parity gate and for the Oklab / MKL OT / Laplacian primitives.
  • mosaic-temporal (video mosaic, CPU/GPU dual path) — the portable sibling. Same D1Config surface, so config files port between the two.

Verification (sigstore)

Releases from v_next_ (released after 2026-05-16) include a sigstore keyless signature bundle (.sigstore per artifact) attached to the GitHub Release.

Verify a PyPI install

pip download <pkg-name>==<version> --no-deps -d ./verify
python -m sigstore verify identity \
    --cert-identity 'https://github.com/hinanohart/mosaic-temporal-gpu/.github/workflows/release.yml@refs/tags/v<version>' \
    --cert-oidc-issuer 'https://token.actions.githubusercontent.com' \
    ./verify/*.whl ./verify/*.tar.gz

Download the corresponding .sigstore bundles from the GitHub Release page and place each one next to its artifact in ./verify before running the command.
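
When scripting the check, two small helpers avoid typos in the identity string and catch artifacts whose bundle was not downloaded. They are hypothetical and not shipped with the package; the `<artifact>.sigstore` file naming is an assumption based on the ".sigstore per artifact" wording above.

```python
from pathlib import Path
from typing import List

REPO = "hinanohart/mosaic-temporal-gpu"


def cert_identity(version: str) -> str:
    """Build the --cert-identity value for a given release version."""
    return (f"https://github.com/{REPO}/.github/workflows/"
            f"release.yml@refs/tags/v{version}")


def missing_bundles(verify_dir: Path) -> List[str]:
    """List wheels and sdists in verify_dir that lack a sibling .sigstore file."""
    artifacts = sorted(p for p in verify_dir.iterdir()
                       if p.name.endswith((".whl", ".tar.gz")))
    return [p.name for p in artifacts
            if not (p.parent / (p.name + ".sigstore")).exists()]


if __name__ == "__main__":
    print(cert_identity("0.1.0"))
    print(missing_bundles(Path("./verify")) if Path("./verify").is_dir() else [])
```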

Historic releases (pre-2026-05-16)

Earlier releases were published without sigstore bundles. Re-installing those versions provides no cryptographic provenance — pin to a current release if assurance matters.

License

MIT. See LICENSE.
