Releases: ludwig-ai/ludwig
v0.17.6
What's new
Preprocessing progress callback — implement in your subclass to receive live progress updates (0.0 to 1.0) during feature preprocessing. Works with pandas, Dask, and Ray backends out of the box.
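```python
from ludwig.callbacks import Callback


class MyCallback(Callback):
    def on_preprocess_progress(self, progress: float, **kwargs):
        # progress runs from 0.0 (start) to 1.0 (preprocessing finished)
        print(f"Preprocessing: {progress:.0%}")
```
Fixes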
- MLflow 3.x filesystem tracking store compatibility in CI
- GPU Docker images now correctly ship CUDA-enabled PyTorch wheels (cu126)
v0.17.5
What's changed
Bug fix: GPU Docker images now install the CUDA build of PyTorch
The GPU Docker images (ludwig-gpu, ludwig-ray-gpu) were incorrectly shipping the CPU build of PyTorch (torch==2.12.0 without a +cu* suffix) despite being GPU images. This meant GPU training silently fell back to CPU.
Root cause (two issues):
- The `--force-reinstall` step used `--extra-index-url` instead of `--index-url`. With `--extra-index-url`, pip checks PyPI first and finds the CPU wheel (`torch==2.12.0`) there, so it never looks at the PyTorch CUDA index.
- The CUDA index suffix was `cu124`, but `torch==2.12.0` is not published on the cu124 index (which only goes up to `2.6.0+cu124`). The fix switches to `cu126`, where `torch==2.12.0+cu126` is available.
Fix:
- Changed `--extra-index-url` → `--index-url` on the force-reinstall step in both GPU Dockerfiles so pip goes exclusively to the PyTorch wheel server.
- Changed `cu124` → `cu126` throughout both GPU Dockerfiles (including the Ray base image tag).
Verified locally: torch==2.12.0+cu126 with CUDA build version: 12.6 confirmed inside the rebuilt image.
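Both root causes come down to which index pip consults and which local version tag ends up installed. The following is a minimal sketch, assuming you want to script the force-reinstall and then inspect the result inside a container; `build_reinstall_cmd` and `cuda_tag` are hypothetical helpers, not part of Ludwig. The `https://download.pytorch.org/whl/<tag>` pattern is the PyTorch wheel server layout also used by the `.../whl/cpu` index mentioned in these notes.

```python
import sys
from typing import List, Optional

TORCH_INDEX = "https://download.pytorch.org/whl/{tag}"


def build_reinstall_cmd(torch_version: str, cuda: str) -> List[str]:
    """pip argv that resolves torch ONLY from the PyTorch wheel server.

    --index-url replaces PyPI entirely; --extra-index-url would still let
    pip resolve torch from PyPI instead of the CUDA index.
    """
    return [
        sys.executable, "-m", "pip", "install", "--force-reinstall",
        f"torch=={torch_version}",
        "--index-url", TORCH_INDEX.format(tag=cuda),
    ]


def cuda_tag(torch_version: str) -> Optional[str]:
    """Return the local CUDA tag of a torch version string (e.g. 'cu126'), or None."""
    _, sep, local = torch_version.partition("+")
    return local if sep and local.startswith("cu") else None


def main() -> None:
    print(" ".join(build_reinstall_cmd("2.12.0", "cu126")))
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return
    # torch.version.cuda is None for a CPU-only build and e.g. '12.6' for a CUDA build.
    print(torch.__version__, cuda_tag(torch.__version__), torch.version.cuda)


if __name__ == "__main__":
    main()
```

Checking `torch.version.cuda` is the more direct signal of a CUDA build; the `+cu126` suffix additionally confirms the wheel came from the CUDA index.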
Updated Docker images (0.17.5)
- `ludwigai/ludwig:0.17.5` — CPU
- `ludwigai/ludwig-gpu:0.17.5` — CUDA 12.6 (torch 2.12.0+cu126)
- `ludwigai/ludwig-ray:0.17.5` — CPU + Ray
- `ludwigai/ludwig-ray-gpu:0.17.5` — CUDA 12.6 + Ray (torch 2.12.0+cu126)
v0.17.3
What's changed
Bug fixes
- Fixed LLM fine-tuning crash with `torchao>=0.17` (#4170): `torchao` 0.17 requires `torch>=2.11` (it uses `torch.utils._pytree.register_constant`), but the previous Docker images shipped `torch==2.6.0`, causing an `AttributeError` on import. Ludwig itself now enforces `torch>=2.11`, so the combination can never resolve to an incompatible pair again.
- Fixed torchaudio / torchcodec audio loading — `torchaudio>=2.11` delegates all audio I/O to `torchcodec`, which requires FFmpeg. The CI images and Docker images now install FFmpeg explicitly.
- Fixed CI dependency resolver poisoning — passing `--extra-index-url https://download.pytorch.org/whl/cpu` to the Ludwig `[test]` install step caused `uv` to resolve all packages (including `datasets`, `ray`, `packaging`) through the PyTorch wheel server, returning ancient versions. The install order is now: pin all torch-family packages from the CPU extra index first, then install `.[test]` against plain PyPI.
Dependency changes
Updated lower bounds in pyproject.toml:
| Package | Old lower bound | New lower bound |
|---|---|---|
| `torch` | `>=1.13` | `>=2.11` |
| `torchaudio` | `>=0.13` | `>=2.11` |
| `torchvision` | `>=0.14` | `>=0.26` |
| `torchcodec` | (new) | `>=0.1` |
| `transformers` | `>=4.36` | `>=5.0` |
| `torchao` (llm extra) | `>=0.8.0` | `>=0.17.0` |
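To check an existing environment against these lower bounds before upgrading, here is a stdlib-only sketch. `release_tuple` and `check_installed` are hypothetical helpers; the comparison deliberately ignores local and pre-release suffixes (so `2.12.0+cu126` counts as `2.12.0`), which is looser than pip's PEP 440 rules. For strict checks, `packaging.specifiers.SpecifierSet` is the standard tool.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Mapping, Tuple

# Lower bounds from the table above.
MIN_VERSIONS: Dict[str, Tuple[int, ...]] = {
    "torch": (2, 11),
    "torchaudio": (2, 11),
    "torchvision": (0, 26),
    "torchcodec": (0, 1),
    "transformers": (5, 0),
}
# torchao is only needed when using the ludwig[llm] extra.
LLM_MIN_VERSIONS: Dict[str, Tuple[int, ...]] = {"torchao": (0, 17, 0)}


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Numeric release part of a version, e.g. '2.12.0+cu126' -> (2, 12, 0).

    Simplification: suffixes such as 'rc1' or 'dev...' are dropped, so
    '5.0.0rc1' compares like '5.0.0' (PEP 440 would order it below 5.0).
    """
    parts: List[int] = []
    for piece in ver.split("+", 1)[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # stop at the first segment carrying a suffix like '0rc1'
    return tuple(parts)


def check_installed(minimums: Mapping[str, Tuple[int, ...]]) -> List[str]:
    """Return one human-readable problem per package that is missing or too old."""
    problems: List[str] = []
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if release_tuple(installed) < minimum:
            required = ".".join(map(str, minimum))
            problems.append(f"{name}: {installed} is below >={required}")
    return problems


if __name__ == "__main__":
    for problem in check_installed(MIN_VERSIONS):
        print(problem)
```

For an LLM setup, pass `{**MIN_VERSIONS, **LLM_MIN_VERSIONS}` instead.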
Docker images
All four images (ludwig, ludwig-ray, ludwig-gpu, ludwig-ray-gpu) are rebuilt with:
- `torch==2.12.0`
- `torchvision==0.27.0`
- `torchaudio==2.11.0`
- `torchcodec` (CPU/CUDA variant as appropriate)
- FFmpeg installed in the image
AutoML improvement
- Changed the default tabular combiner in AutoML from `tabnet` to `ft_transformer`, which generally performs better on tabular datasets.
Ludwig 0.17.2
What's changed
Bug fixes
- Fix LLM fine-tuning crash (`AttributeError: module 'torch.utils._pytree' has no attribute 'register_constant'`) caused by `torchao>=0.11` being installed against `torch<2.11`; all torch-family lower bounds are tightened to prevent incompatible combinations (#4170)
- Update all Docker images from `torch==2.6.0` to `torch==2.12.0` / `torchvision==0.27.0` / `torchaudio==2.11.0`, eliminating the mismatch between the Ray base image's bundled torch and the `torchao` version pulled in by `ludwig[llm]`
New features
- AutoML default tabular combiner changed from `tabnet` to `ft_transformer` — better out-of-the-box performance on most tabular datasets
- Added `ft_transformer` and `tabtransformer` to AutoML combiner defaults with tuned hyperopt parameter spaces
Dependency updates
| Package | Before | After |
|---|---|---|
| `torch` | `>=2.0` | `>=2.11` |
| `torchaudio` | `>=2.0` | `>=2.11` |
| `torchvision` | `>=0.15` | `>=0.26` |
| `transformers` | `>=4.36` | `>=5.0` |
| `torchao` (llm extra) | `>=0.9.0` | `>=0.17.0` |
Docker images
- `ludwigai/ludwig:0.17.2` — CPU, torch 2.12.0
- `ludwigai/ludwig-gpu:0.17.2` — CUDA 12.4, torch 2.12.0+cu124
- `ludwigai/ludwig-ray:0.17.2` — CPU + Ray 2.54, torch 2.12.0
- `ludwigai/ludwig-ray-gpu:0.17.2` — CUDA 12.4 + Ray 2.54, torch 2.12.0+cu124
Ludwig 0.17.1
What's changed
Bug fixes
- Fix preprocessing pipeline crash for output features without a `preprocessing` config (e.g. the `anomaly` output type): `KeyError` in `build_preprocessing_parameters`, `build_dataset`, `build_metadata`, and `build_data`
- Fix v0.17.0 PyPI deployment: the `v0.17.0` tag was created before the version bump commit landed, causing the PyPI build to produce `ludwig-0.16.2` and fail with a duplicate-upload error
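The `KeyError` pattern here most likely comes from looking up a `preprocessing` key that some output features never define. A minimal sketch of the defensive lookup, assuming feature configs are plain dicts; `preprocessing_params` is a hypothetical helper, not Ludwig's `build_preprocessing_parameters`, and the default value shown is only an example.

```python
from typing import Any, Dict, Mapping


def preprocessing_params(feature: Mapping[str, Any], type_defaults: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge per-type defaults with a feature's optional `preprocessing` section."""
    params = dict(type_defaults)  # copy so the shared defaults are never mutated
    # feature["preprocessing"] raises KeyError when the section is absent
    # (as for an anomaly output feature); .get() treats absent/None as empty.
    params.update(feature.get("preprocessing") or {})
    return params


if __name__ == "__main__":
    anomaly_output = {"name": "is_anomaly", "type": "anomaly"}  # no preprocessing key
    print(preprocessing_params(anomaly_output, {"missing_value_strategy": "drop_row"}))
```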
New features
- `StudioCallback`: writes `metrics.jsonl` and `trials.jsonl` for Ludwig Studio integration
- Per-call callbacks on `train()` and `predict()`
- Hyperopt executor hardened for Ray-free environments (`OptunaExecutor` no longer requires Ray)
- SIGUSR1/SIGUSR2 pause/resume support in training
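These notes do not spell out the signal-to-action mapping or the mechanism, so the following is a minimal sketch of the general technique under stated assumptions: SIGUSR1 pauses, SIGUSR2 resumes, the host is POSIX, and `PauseController` / `train` are hypothetical names rather than Ludwig APIs. The handlers only flip a flag; the training loop checks it between steps.

```python
import os
import signal
import time


class PauseController:
    """Pause on SIGUSR1, resume on SIGUSR2 (assumed mapping; POSIX only)."""

    def __init__(self) -> None:
        self.paused = False

    def install(self) -> None:
        # signal.signal() may only be called from the main thread.
        signal.signal(signal.SIGUSR1, self._on_pause)
        signal.signal(signal.SIGUSR2, self._on_resume)

    def _on_pause(self, signum, frame) -> None:
        # Keep handlers trivial: flip a flag, no locks or I/O.
        self.paused = True

    def _on_resume(self, signum, frame) -> None:
        self.paused = False

    def wait_if_paused(self, poll_s: float = 0.2) -> None:
        # Handlers still run while the main thread sleeps, so the resume
        # flag is picked up within one poll interval.
        while self.paused:
            time.sleep(poll_s)


def train(num_steps: int, controller: PauseController) -> None:
    for step in range(num_steps):
        controller.wait_if_paused()  # pause point between training steps
        time.sleep(0.1)  # stand-in for one training step
        print(f"step {step} done")


if __name__ == "__main__":
    controller = PauseController()
    controller.install()
    pid = os.getpid()
    print(f"kill -USR1 {pid} pauses, kill -USR2 {pid} resumes")
    train(5, controller)
```

Checking the flag only between steps means a pause takes effect after the current step finishes, which keeps model and optimizer state consistent.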
Upgrade notes
- Replace `pip install ludwig==0.17.0` with `pip install ludwig==0.17.1` (0.17.0 was never successfully published to PyPI)
Ludwig 0.17.0
New Features
Lazy Preprocessing for Audio and Image
Ludwig 0.17 introduces lazy preprocessing — the most significant change to the training pipeline in several releases. Previously, audio and image features required a full preprocessing pass before training could begin: decode every file, resize/resample, write to disk. For large multimodal datasets, this meant waiting hours before a single training step ran.
Now you can start training immediately.
- `preprocessing_mode: lazy` — audio and image features are decoded on the fly during training, directly from raw file paths. There is no upfront pass, so training starts in seconds.
- `preprocessing_mode: lazy_cached` — decoded tensors are cached as memory-mapped files on the first pass through the data. Subsequent epochs read directly from the cache, so each file is decoded only once.
- `preprocessing_mode: eager` — the previous default, preserved for full backwards compatibility.
- `prefetch_size` — configurable prefetch queue depth for the background decoder thread, letting you tune the CPU/GPU overlap for your hardware.
- A background prefetch thread decodes ahead of the training loop, keeping the GPU fed without blocking the forward pass.
- Ray distributed training decodes lazy features inside the Ray data pipeline (not inside training actors), so decode work is properly parallelized across the cluster.
This matters most for audio and image datasets too large to preprocess in full — but it also makes iteration faster for any multimodal workload. (#4171, #4173)
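To make `lazy`, `lazy_cached`, and `prefetch_size` concrete, here is a minimal, framework-free sketch of a bounded background prefetcher. It is not Ludwig's implementation: `prefetch_decode` and `CachedDecoder` are hypothetical names, `decode` stands in for audio/image decoding, and the cache is an in-memory dict rather than the memory-mapped files Ludwig writes.

```python
import queue
import threading
from typing import Callable, Dict, Generic, Hashable, Iterable, Iterator, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")

_DONE = object()  # end-of-stream marker


class _Failed:
    """Wraps an exception raised on the decoder thread."""

    def __init__(self, exc: Exception) -> None:
        self.exc = exc


def prefetch_decode(paths: Iterable[K], decode: Callable[[K], V], prefetch_size: int = 4) -> Iterator[V]:
    """`lazy` analogue: decode on a background thread, at most `prefetch_size` items ahead."""
    buf: "queue.Queue[object]" = queue.Queue(maxsize=prefetch_size)

    def producer() -> None:
        try:
            for path in paths:
                buf.put(decode(path))  # blocks while the queue is full
        except Exception as exc:
            buf.put(_Failed(exc))  # surface decode errors in the consumer
        finally:
            buf.put(_DONE)

    # daemon=True: an abandoned iterator cannot keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, _Failed):
            raise item.exc
        yield item


class CachedDecoder(Generic[K, V]):
    """`lazy_cached` analogue: decode each key once, serve repeats from memory."""

    def __init__(self, decode: Callable[[K], V]) -> None:
        self._decode = decode
        self._cache: Dict[K, V] = {}

    def __call__(self, key: K) -> V:
        if key not in self._cache:
            self._cache[key] = self._decode(key)
        return self._cache[key]


if __name__ == "__main__":
    calls = []

    def fake_decode(path: str) -> str:
        calls.append(path)
        return path.upper()

    cached = CachedDecoder(fake_decode)
    for epoch in range(2):
        print(epoch, list(prefetch_decode(["a.wav", "b.wav"], cached, prefetch_size=2)))
    print("decode calls:", len(calls))  # 2: the second epoch is served from the cache
```

The bounded queue is what caps memory: the producer can run ahead of the consumer by at most `prefetch_size` decoded items.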
Mega-AutoML: Rebuilt from the Ground Up
The AutoML infrastructure in Ludwig has been completely overhauled. (#4168, #4169)
YAML search space — search spaces are now declared in YAML and loaded via `SearchSpace._from_specs()`. This makes it straightforward to define custom search spaces, share them across experiments, and version-control them alongside your configs.
Dataset quality analysis — before building a search space, Ludwig now profiles your dataset: size, class balance, modality distribution, output cardinality. The search space is then constructed with awareness of what the data actually looks like.
Dataset-size-aware hyperparameter caps — epoch counts and batch sizes are now automatically capped based on dataset size, preventing both under-training on large datasets and OOM on tiny ones. Transformer-based combiners (cross_attention, perceiver) get additional batch size caps to prevent GPU memory exhaustion.
Learning rate and stability fixes — transformer-based combiners now have a capped learning rate upper bound to prevent NaN loss during hyperopt on sensitive architectures.
`configs_from_dataframe` improvements — `default_epochs` is now correctly threaded through to `TrainerSpec`, and the default search space builder is properly imported and called.
HuggingFace Dataset Library — 500+ Datasets
Ludwig now ships with a built-in library of over 500 HuggingFace datasets, usable directly from `ludwig.datasets`. This release adds:
- Datasets spanning every modality — text classification, NER, question answering, summarization, audio classification, image classification, multimodal tasks, and more.
- Custom loaders for complex datasets — for HF datasets that require non-trivial loading logic (custom splits, column renaming, label normalization, multi-config handling).
- ESC-50 (environmental audio classification), WikiANN (multilingual NER), GoEmotions (fine-grained emotion classification), New Yorker Caption Contest (multimodal humor), and many more.
These aren't just dataset references — each ships with a Ludwig config that maps columns to features, sets appropriate output types, and is smoke-tested end-to-end.
Refactoring
Visualize Package
The `visualize.py` module had grown to 4,144 lines. It's now split into a domain-scoped `visualize/` package, with submodules organized by visualization type (learning curves, confusion matrices, calibration, hyperopt, etc.). The CLI entrypoint and all public APIs are unchanged. (#4154)
Text Encoders
Removed a stale duplicate `text/encoders.py` that had diverged from the canonical encoder implementations. Deep nesting in several encoder classes has been flattened for readability. (#4159)
Bug Fixes
- LLM extra now requires `torch>=2.7` — the LLM extra (`pip install ludwig[llm]`) now pins `torch>=2.7`, which is required for quantization and Flash Attention 2 support in the current transformers stack. Non-`OSError` exceptions during pretrained model loading no longer trigger the retry loop.
- TabPFN v2 guard — Ludwig now raises a clear error at config validation time when a `tabpfn_v2` combiner is configured but the `tabpfn` package is not installed, instead of failing at model construction.
- Ray lazy decode placement — lazy audio/image features are now decoded inside the Ray data pipeline, before data reaches training actors. This keeps decode work off the critical path and avoids serializing decoded tensors across the Ray object store. Missing `lazy_audio_params`/`lazy_image_params` now emit a warning rather than a silent no-op.
- Smoke test stability — diversity retry logic for sorted classification datasets, media-aware shuffle buffer sizing, and per-modality buffer tuning to eliminate label collapse in small evaluation splits.
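The retry change and the TabPFN guard follow two common defensive patterns. A minimal sketch, assuming plain callables and dict configs: `load_with_retry` and `validate_combiner` are hypothetical helpers, not Ludwig functions, and the backoff schedule is illustrative.

```python
import importlib.util
import time
from typing import Any, Callable, Dict, TypeVar

T = TypeVar("T")


def load_with_retry(load: Callable[[], T], attempts: int = 3, delay_s: float = 1.0) -> T:
    """Retry `load` only on OSError (network/file hiccups); re-raise anything else at once."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return load()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the last I/O error
            time.sleep(delay_s * attempt)  # linear backoff between attempts
    raise RuntimeError("unreachable")


def validate_combiner(config: Dict[str, Any]) -> None:
    """Fail during config validation, not model construction, when tabpfn is missing."""
    combiner_type = (config.get("combiner") or {}).get("type")
    if combiner_type == "tabpfn_v2" and importlib.util.find_spec("tabpfn") is None:
        raise ValueError("combiner type 'tabpfn_v2' requires the 'tabpfn' package to be installed")
```

A `ValueError` from a bad config is not an `OSError`, so it propagates on the first attempt instead of being retried and delayed.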
CI
Distributed integration tests now run in 6 parallel groups (up from 1), cutting distributed test wall time by ~5×. Integration test groups are renamed to sequential letters for clarity. (#4172)
Installation
```
pip install ludwig==0.17.0
```
GitHub: https://github.com/ludwig-ai/ludwig
Docs: https://ludwig.ai
v0.16.2
Bug Fixes & Test Improvements
Bug Fixes
- fix: call `init_dist_strategy("local")` in `tune_batch_size_fn` and `tune_learning_rate_fn` — fixes `RuntimeError: Distributed strategy not initialized` (#4149) when auto-tuning batch size or learning rate via Ray with non-`MeanMetric` output features (e.g. number output → `MSEMetric`).
- fix: remove redundant dtype check in `text_feature` — removes a strict integer-only dtype guard that broke tests passing float32 tensors (the function already casts to int32 internally).
- fix: add visualize `__main__.py` for `python -m ludwig.visualize` — restores `python -m ludwig.visualize` after the visualize module was split into a package.
- fix: update `test_serve_v2` to use `numpy_to_python` — updates the test import after `_numpy_safe` was renamed to `numpy_to_python` in `data_utils`.
Refactoring
- refactor: major `api.py` cleanup — guard clauses, extraction, type annotations, and docstring fixes across `api.py`, `serve.py`, `serve_v2.py`, `visualize/`, and several utils.
Tests
- test: regression tests for transformers import safety and Ray tune dist strategy — prevents recurrence of #4142 (broken `PreTrainedModel` import) and #4149 (missing `init_dist_strategy` in Ray tune fns).
- fix: rewrite `torch_utils` tests to not fail in no-CUDA CI environments — CUDA-specific GPU isolation tests now skip gracefully on CPU-only runners; the idempotency test was rewritten to verify behavior directly.
v0.16.1
Bug Fixes
- `from ludwig.api import LudwigModel` fails on Python 3.12 — When `torchao` and PyTorch are version-mismatched (`torchao` calls `torch.utils._pytree.register_constant`, added in PyTorch 2.5+), `transformers`'s lazy loader raises `ModuleNotFoundError` for any class defined in `modeling_utils.py`, including `PreTrainedModel`. `llm_utils.py` and `text_feature.py` both imported these classes at module level; they are now deferred to `TYPE_CHECKING`. Follows the same fix applied to `hf_utils.py` in v0.15.1. (#4142)
- Replace `assert` with explicit exceptions; fix mutable default arg — Internal `assert` statements replaced with `ValueError`/`RuntimeError` so they aren't silently stripped with `python -O`. Fixed a mutable default argument that could cause cross-call state leakage. (#4152)
- Unified ruff toolchain — Replaced `black` + `isort` + `flake8` with a single `ruff` invocation for linting and formatting. No behavior change for users.
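The first two fixes use standard Python patterns. A minimal sketch with hypothetical helpers `count_parameters` and `chunk` (not Ludwig code): the `TYPE_CHECKING` guard keeps `transformers` out of the runtime import path, and `chunk` shows an explicit exception in place of `assert` plus a `None` default in place of a mutable one.

```python
from typing import TYPE_CHECKING, List, Optional

if TYPE_CHECKING:
    # Seen only by type checkers; never executed at runtime, so a broken
    # transformers lazy loader cannot break importing this module.
    from transformers import PreTrainedModel


def count_parameters(model: "PreTrainedModel") -> int:
    # The quoted annotation is never evaluated at runtime.
    return sum(p.numel() for p in model.parameters())


def chunk(items: List[int], size: int, out: Optional[List[List[int]]] = None) -> List[List[int]]:
    # `assert size > 0` would be stripped under `python -O`; this check is not.
    if size <= 0:
        raise ValueError(f"size must be positive, got {size}")
    # A default of `out=[]` would be created once and shared by every call;
    # defaulting to None and building a fresh list per call avoids that.
    if out is None:
        out = []
    for start in range(0, len(items), size):
        out.append(items[start:start + size])
    return out
```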
v0.16.0
New Features
Timeseries Forecasting
- PatchTST & N-BEATS encoders — State-of-the-art patch-based and basis-expansion timeseries encoders. Both support multivariate and univariate forecasting with `TimeseriesOutputFeature`. (#4147)
- MASE & sMAPE metrics — Mean Absolute Scaled Error and symmetric Mean Absolute Percentage Error added for forecasting evaluation. (#4147)
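For reference, here is a plain-Python sketch of the two metrics using common textbook definitions. MASE divides the forecast MAE by the in-sample MAE of a seasonal naive forecast with period `m`, and sMAPE averages `2|F - A| / (|A| + |F|)` as a percentage. Ludwig's implementation may differ in details such as the scaling series, the percentage factor, or zero handling, so treat this as an illustration rather than a spec.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent (0 = perfect, 200 = worst)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and the same length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Convention assumed here: a 0/0 term (both zero) counts as a perfect match.
        total += 0.0 if denom == 0 else 2.0 * abs(f - a) / denom
    return 100.0 * total / len(actual)


def mase(actual: Sequence[float], forecast: Sequence[float], train: Sequence[float], m: int = 1) -> float:
    """Forecast MAE divided by the in-sample MAE of the seasonal naive forecast y[t - m]."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and the same length")
    if len(train) <= m:
        raise ValueError("train must be longer than the seasonal period m")
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    if scale == 0:
        raise ValueError("naive forecast is perfect on train; MASE is undefined")
    mae = sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


if __name__ == "__main__":
    history = [10, 12, 14, 16]  # one-step naive errors: 2, 2, 2 -> scale 2.0
    print(mase([18, 20], [17, 21], history))  # MAE 1.0 / 2.0 = 0.5
    print(smape([100], [110]))
```

A MASE below 1 means the model beats the naive forecast on average, which makes it comparable across series with different scales.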
Advanced PEFT Adapters
- New adapter types — TinyLoRA, C3A, OFT (Orthogonal Fine-Tuning), HRA (Householder Reflection Adaptation), WaveFT, LN-Tuning, VBLoRA. (#4146)
- New LoRA initializers — PiSSA (Principal Singular values and Singular vectors Adaptation), EVA (Explained Variance Adaptation), CorDA/LoftQ. (#4146)
Phase 6: Future Capabilities
- LLM config generation — `ludwig generate_config "describe your task"` uses an LLM to write the YAML config for you. (#4092)
- HyperNetwork combiner — Conditioning-based feature fusion where one feature generates weights for others. (#4092)
- Nash-MTL & Pareto-MTL — Game-theoretic and preference-based multi-task loss balancing strategies. (#4092)
New Examples
- VLM fine-tuning — LLaVA, Qwen2-VL, InternVL via `is_multimodal: true`. (#4140)
- Mamba-2 / Jamba encoders — State-space model encoders for sequence tasks. (#4140)
- Ray Serve & KServe deployment — Distributed and Kubernetes-native serving shims. (#4140)
- Multi-task & HyperNetwork examples (#4112)
Bug Fixes
- Python 3.12 import fix — Deferred the `PreTrainedModel` import to `TYPE_CHECKING` to fix `ImportError` on Python 3.12 when HuggingFace transformers is not installed in all code paths.
- Dask image bytes UnicodeDecodeError — `dask.config.set({"dataframe.convert-string": False})` is now applied at import time, preventing `UnicodeDecodeError` when image bytes pass through Dask string columns. (#4151)
- Dask shuffle partd race condition — Replaced file-based Dask shuffle (which hit a partd lock race under concurrent workers) with tasks-based shuffle. (#4150)
- Encoder `input_shape` contract — Fixed a contract violation where certain encoders did not correctly report or handle `input_shape`, causing shape mismatches during model construction. (#4148)
- Ray backend GPU underutilization — `RayDatasetBatcher` now runs `to_tensors` locally in the producer thread instead of spawning remote Ray tasks per block. Datasets are materialized before training to avoid Parquet re-reads on every epoch. (#4144)
v0.15.1
Bug fixes
- Ray training 3.7x slowdown eliminated — `LudwigProgressBar` was calling `rt.report()` on every training batch when running inside Ray workers (~1.9 s/call through the Ray GCS). With hundreds of batches this completely dominated wall-clock time. Per-batch progress reporting is now suppressed; training metrics continue to be reported correctly at eval/checkpoint time. Ray overhead is now ~1.7x vs local (fixed TorchTrainer setup cost), down from 3.7x. (#4144)
- GPU underutilization in Ray backend fixed — `RayDatasetBatcher` was running `to_tensors` via `map_batches` (spawning a Ray remote task per dataset block with scheduling overhead). It now runs locally in the producer thread. Also, datasets are now materialized before training to avoid re-reading Parquet from disk on every epoch. (#4144)
- Python 3.14 compatibility — `LudwigBaseConfig` subclasses crashed with `PydanticUserError: Field requires a type annotation` on Python 3.14 because annotations are now stored lazily via `__annotate_func__`. Fixed in `_LudwigModelMeta.__new__`. (#4144)
- ModernBERT tokenizer — Models containing "bert" in their name (e.g. `answerdotai/ModernBERT-base`) were incorrectly routed to `BertTokenizer` (WordPiece), causing `Missing [UNK] token` errors. ModernBERT now correctly uses `HFTokenizer` (AutoTokenizer / BPE). (#4144)
- Dask `meta=` parameter — Multiple feature types (`binary`, `category`, `sequence`, `timeseries`, `text`, `vector`) called `.map()` without a `meta=` argument, causing `ValueError: Metadata inference failed in map` when using the Dask engine (`backend: {type: ray, processor: {type: dask}}`). All bare `.map()` calls are now fixed. (#4144)
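The ModernBERT bug is a classic substring-matching pitfall: `"bert" in name` is also true for `ModernBERT` and `roberta`. Below is a hypothetical sketch of name-based routing (not Ludwig's actual code; tokenizer names are plain strings here) that checks the more specific family first and matches on the model's base name. Deferring to `transformers.AutoTokenizer`, as the fix does for ModernBERT, avoids guessing from names altogether.

```python
def naive_route(model_name: str) -> str:
    # The buggy shape: any name containing "bert" gets the WordPiece tokenizer.
    return "BertTokenizer" if "bert" in model_name.lower() else "HFTokenizer"


def route(model_name: str) -> str:
    base = model_name.rsplit("/", 1)[-1].lower()  # drop the org prefix
    # ModernBERT ships a BPE vocabulary, so send it to the AutoTokenizer path.
    if base.startswith("modernbert"):
        return "HFTokenizer"
    # Only classic BERT checkpoints (bert-base-uncased, ...) get WordPiece.
    if base.startswith("bert-"):
        return "BertTokenizer"
    return "HFTokenizer"
```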