galilai-group · haodongzhang0118 · Feb 28, 2026 · Apr 3, 2026
diff --git a/docs/source/datasets/dsprites.rst b/docs/source/datasets/dsprites.rst
@@ -12,17 +12,46 @@ dSprites
 Overview
 --------
 
-The dSprites dataset is a synthetic benchmark designed for **disentangled and unsupervised representation learning**. It consists of procedurally generated **binary black-and-white images** of simple 2D shapes, rendered under controlled and fully known generative factors.
+The dSprites dataset is a synthetic benchmark designed for **disentangled and unsupervised representation learning**. It consists of procedurally generated images of simple 2D shapes, rendered under controlled and fully known generative factors.
 
 The dataset contains **all possible combinations** of six latent factors of variation, with each combination appearing exactly once. This complete Cartesian product structure makes dSprites a standard benchmark for evaluating disentanglement, factor predictability, and interpretability of learned representations.
 
-- **Total images**: 737,280
-- **Image resolution**: 64×64 (binary)
+In stable-datasets, dSprites is exposed via the ``DSprites`` class, with **four variants** (``original``, ``color``, ``noise``, ``scream``) selectable via ``config_name``. Each variant has:
+
+- **Total images**: 737,280 per variant
+- **Image resolution**: 64x64 (original) or 64x64x3 (color, noise, scream)
+
+Variants
+--------
+
+.. list-table::
+   :header-rows: 1
+   :widths: 15 15 55
+
+   * - Variant
+     - Image Mode
+     - Description
+   * - ``original``
+     - Grayscale
+     - Binary black-and-white images (default)
+   * - ``color``
+     - RGB
+     - Object rendered with a random RGB color on a black background
+   * - ``noise``
+     - RGB
+     - White object on a random-noise background
+   * - ``scream``
+     - RGB
+     - Object drawn by inverting the pixels it covers on a random patch of the Scream painting
+
+.. image:: teasers/dsprites_teaser.gif
+   :align: center
+   :width: 90%
 
 Latent Factors of Variation
---------------------------
+---------------------------
 
-The dataset is generated from six independent latent factors:
+All variants share the same six independent latent factors:
 
 .. list-table::
    :header-rows: 1
@@ -42,20 +71,14 @@ The dataset is generated from six independent latent factors:
      - Linearly spaced in [0.5, 1.0]
    * - ``orientation``
      - {0, ..., 39}
-     - Uniform in [0, 2π] radians
+     - Evenly spaced in [0, 2pi] radians
    * - ``posX``
      - {0, ..., 31}
      - Normalized position in [0, 1]
    * - ``posY``
      - {0, ..., 31}
      - Normalized position in [0, 1]
 
-Each image corresponds to a **unique combination** of these factors.
-
-.. image:: teasers/dsprites_teaser.gif
-   :align: center
-   :width: 90%
-
 Data Structure
 --------------
 
@@ -70,50 +93,82 @@ When accessing an example using ``ds[i]``, you will receive a dictionary with th
      - Description
    * - ``image``
      - ``PIL.Image.Image``
-     - 64×64 binary image
+     - 64x64 (grayscale) or 64x64x3 (RGB) image
    * - ``label``
      - ``List[int]``
      - Discrete latent indices: ``[color, shape, scale, orientation, posX, posY]``
    * - ``label_values``
      - ``List[float]``
      - Continuous latent values corresponding to ``label``
-   * - ``color`` … ``posY``
+   * - ``color`` ... ``posY``
      - ``int``
      - Individual discrete latent factors
-   * - ``colorValue`` … ``posYValue``
+   * - ``colorValue`` ... ``posYValue``
      - ``float``
      - Individual continuous latent values
+   * - ``colorRGB``
+     - ``List[float]``
+     - Actual RGB color applied to the object (**color variant only**)
 
 Usage Example
 -------------
 
-**Basic Usage**
+**Basic Usage (original variant)**
 
 .. code-block:: python
 
-    from stable_datasets.images.dsprites import DSprites
-
-    # First run will download + prepare cache, then return the split as a HF Dataset
-    ds = DSprites(split="train")
+    from stable_datasets.images import DSprites
 
-    # If you omit the split (split=None), you get a DatasetDict with all available splits
-    ds_all = DSprites(split=None)
+    # Default variant is "original"
+    ds = DSprites(split="train", config_name="original")
 
     sample = ds[0]
     print(sample.keys())
 
-    image = sample["image"]
-    factors = sample["label"]
-    factor_values = sample["label_values"]
+    image = sample["image"]       # PIL.Image (64x64 grayscale)
+    factors = sample["label"]     # [color, shape, scale, orientation, posX, posY]
 
     # Optional: make it PyTorch-friendly
     ds_torch = ds.with_format("torch")
 
+**Color variant**
+
+.. code-block:: python
+
+    from stable_datasets.images import DSprites
+
+    ds = DSprites(split="train", config_name="color")
+
+    sample = ds[0]
+    image = sample["image"]         # PIL.Image (64x64x3 RGB)
+    color_rgb = sample["colorRGB"]  # [R, G, B] in [0.5, 1.0]
+
+**Noise variant**
+
+.. code-block:: python
+
+    from stable_datasets.images import DSprites
+
+    ds = DSprites(split="train", config_name="noise")
+
+    sample = ds[0]
+    image = sample["image"]  # PIL.Image (64x64x3, noisy background)
+
+**Scream variant**
+
+.. code-block:: python
+
+    from stable_datasets.images import DSprites
+
+    ds = DSprites(split="train", config_name="scream")
+
+    sample = ds[0]
+    image = sample["image"]  # PIL.Image (64x64x3, Scream painting background)
 
 Why No Train/Test Split?
------------------------
+------------------------
 
-The dSprites dataset does not define an official train/test split.  
+The dSprites dataset does not define an official train/test split.
 It is intended for **representation learning research**, where models are trained to capture underlying factors of variation rather than to generalize across semantic classes.
 
 Because the dataset is a complete Cartesian product of all factor combinations, common evaluation protocols rely on:
@@ -122,22 +177,15 @@ Because the dataset is a complete Cartesian product of all factor combinations,
 - Metric-based disentanglement scores
 - Controlled interventions on latent variables
 
-Related Datasets
-----------------
-
-- **dSprites-Color**: Colored variant of dSprites
-- **dSprites-Noisy**: Noisy background variant
-- **dSprites-Scream**: Backgrounds replaced with natural images
-
 References
 ----------
 
 - Dataset repository: https://github.com/google-deepmind/dsprites-dataset
-- License: zlib/libpng License
-- Paper: Higgins et al., *β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework*, ICLR 2017
+- Disentanglement library: https://github.com/google-research/disentanglement_lib/
+- License: zlib/libpng License (original), Apache License 2.0 (color/noise/scream)
 
-Citation
---------
+Citations
+---------
 
 .. code-block:: bibtex
 
@@ -148,3 +196,15 @@ Citation
       booktitle={International Conference on Learning Representations},
       year={2017}
     }
+
+.. code-block:: bibtex
+
+    @inproceedings{locatello2019challenging,
+      title={Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations},
+      author={Locatello, Francesco and Bauer, Stefan and Lucic, Mario and
+              Raetsch, Gunnar and Gelly, Sylvain and
+              Sch{\"o}lkopf, Bernhard and Bachem, Olivier},
+      booktitle={International Conference on Machine Learning},
+      pages={4114--4124},
+      year={2019}
+    }
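
Review note: the image count and factor tables documented above can be sanity-checked without downloading anything. The sketch below is plain Python and does not call stable-datasets. It rests on several assumptions that this diff does not confirm: the sizes of `color`, `shape` and `scale` (1, 3, 6) are taken from the upstream dSprites release, the continuous values are assumed to be endpoint-inclusive linear spacings, and the flat ordering is assumed to be row-major with `posY` varying fastest. All helper names (`total_images`, `index_to_label`, `label_to_index`, `label_to_values`) are hypothetical.

```python
import math

# Factor order matches the documented ``label`` field:
# [color, shape, scale, orientation, posX, posY].
FACTOR_NAMES = ["color", "shape", "scale", "orientation", "posX", "posY"]

# ASSUMPTION: 1, 3 and 6 come from the upstream dSprites release; the diff only
# shows the sizes of orientation (40), posX (32) and posY (32).
FACTOR_SIZES = [1, 3, 6, 40, 32, 32]

# Continuous ranges stated in the latent-factor table.
# ASSUMPTION: values are linearly spaced and include both endpoints.
VALUE_RANGES = {
    "scale": (0.5, 1.0),
    "orientation": (0.0, 2.0 * math.pi),
    "posX": (0.0, 1.0),
    "posY": (0.0, 1.0),
}


def total_images():
    """Size of the full Cartesian product of all factor values."""
    return math.prod(FACTOR_SIZES)


def index_to_label(index):
    """Decode a flat index into factor indices (ASSUMED row-major, posY fastest)."""
    if not 0 <= index < total_images():
        raise IndexError(f"index {index} out of range")
    label = []
    for size in reversed(FACTOR_SIZES):
        index, digit = divmod(index, size)
        label.append(digit)
    return label[::-1]


def label_to_index(label):
    """Inverse of index_to_label under the same ordering assumption."""
    index = 0
    for digit, size in zip(label, FACTOR_SIZES):
        if not 0 <= digit < size:
            raise ValueError(f"factor index {digit} out of range for size {size}")
        index = index * size + digit
    return index


def label_to_values(label):
    """Map discrete indices to continuous values for the factors with a stated range."""
    values = {}
    for name, size, i in zip(FACTOR_NAMES, FACTOR_SIZES, label):
        if name in VALUE_RANGES:
            low, high = VALUE_RANGES[name]
            values[name] = low + (high - low) * i / (size - 1)
    return values


if __name__ == "__main__":
    print(total_images())
    print(index_to_label(total_images() - 1))
    print(label_to_values([0, 1, 5, 39, 0, 31]))
```

If the ordering assumption holds for stable-datasets, `index_to_label(i)` should equal `ds[i]["label"]`. That is worth verifying against the actual dataset before relying on it.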
diff --git a/docs/source/datasets/dsprites_color.rst b/docs/source/datasets/dsprites_color.rst