Add size parameter to MedMNIST for MedMNIST+ multi-resolution support; Consolidate DSprites variants into a single config-based class; Added 4 Fine-grained datasets #52

Open
haodongzhang0118 wants to merge 11 commits into galilai-group:main from haodongzhang0118:main


@haodongzhang0118 commented Feb 28, 2026
Contributor

What does this PR do?

  • Added a size config parameter to MedMNIST, enabling users to load MedMNIST+ larger resolution variants (64, 128, 224 for 2D; 64 for 3D) in addition to the default 28x28 MNIST-like size.
  • Added integration tests for larger size variants (pathmnist/chestmnist at 224, organmnist3d at 64).
  • Updated documentation to reflect the new size parameter and usage.
  • Consolidated the DSprites variants into a single config-based class.
  • Updated the homepage link of shape3d.
  • Added four fine-grained datasets: Oxford-IIIT Pet, PlantVillage, EuroSAT, and Stanford Dogs.
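
To make the size rules above concrete, here is a minimal sketch of how the supported resolutions could be validated per variant. It is not the code in this PR: `resolve_size`, `npz_filename`, the "3d" suffix check, and the `_{size}` file suffix are all assumptions made for illustration (the suffix follows the naming the upstream medmnist package appears to use, with no suffix for the default 28).

```python
# Hypothetical sketch, not the stable_datasets implementation.
# Supported resolutions as described in this PR: 28 is the default,
# MedMNIST+ adds 64/128/224 for 2D variants and 64 for 3D variants.
ALLOWED_SIZES_2D = (28, 64, 128, 224)
ALLOWED_SIZES_3D = (28, 64)


def resolve_size(config_name: str, size: int = 28) -> int:
    """Return `size` if it is valid for the variant, otherwise raise."""
    # Assumption: 3D variants are recognizable by a "3d" suffix,
    # e.g. "organmnist3d".
    is_3d = config_name.lower().endswith("3d")
    allowed = ALLOWED_SIZES_3D if is_3d else ALLOWED_SIZES_2D
    if size not in allowed:
        raise ValueError(
            f"size={size} is not supported for {config_name!r}; "
            f"expected one of {allowed}"
        )
    return size


def npz_filename(config_name: str, size: int = 28) -> str:
    """Build the archive name for a variant at a given resolution."""
    size = resolve_size(config_name, size)
    # Assumption: the default 28x28 archive has no suffix and larger
    # variants append "_<size>".
    suffix = "" if size == 28 else f"_{size}"
    return f"{config_name}{suffix}.npz"
```

Rejecting an unsupported size up front, rather than building a URL that does not exist, keeps the failure close to the user's mistake.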

Who can review?

@Leon-Leyang

@haodongzhang0118 changed the title from "Add size parameter to MedMNIST for MedMNIST+ multi-resolution support" to "Add size parameter to MedMNIST for MedMNIST+ multi-resolution support; Consolidate DSprites variants into a single config-based class" on Apr 3, 2026
@haodongzhang0118 changed the title from "Add size parameter to MedMNIST for MedMNIST+ multi-resolution support; Consolidate DSprites variants into a single config-based class" to "Add size parameter to MedMNIST for MedMNIST+ multi-resolution support; Consolidate DSprites variants into a single config-based class; Added 4 Fine-grained datasets" on May 7, 2026
- Resolved dsprites.py and med_mnist.py conflicts
- Migrated EuroSAT/OxfordPet/PlantVillage/StanfordDogs to stable_datasets.schema
BaseDatasetBuilder dropped extra kwargs passed to the constructor, so MedMNIST(config_name="pathmnist", size=64) silently fell back to the default size=28: the URL pointed at the right NPZ, but the resulting dataset was still 28x28.

- BaseDatasetBuilder.__init__ now applies extra kwargs as overrides on a shallow copy of the matched BUILDER_CONFIGS template (and rejects unknown fields and unexpected kwargs on no-config builders).
- cache_fingerprint accepts an optional extra discriminator; passing the empty default preserves existing cache directory names.
- BaseDatasetBuilder.__new__ derives that extra from the diff between the instance config and its template, so distinct overrides produce distinct Arrow caches and don't collide on a stale shard.

Without the cache change, the kwargs override fix alone would still return the 28x28 shard already written by an earlier run.
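
The interaction between the override and the cache key can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `Config`, `TEMPLATES`, `build_config`, and `config_diff` are hypothetical stand-ins, and the SHA-256 based `cache_fingerprint` here only mimics the described behavior (an empty `extra` leaves the key unchanged).

```python
import hashlib
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for a builder config template."""
    name: str
    size: int = 28


# Hypothetical stand-in for BUILDER_CONFIGS, keyed by config name.
TEMPLATES = {"pathmnist": Config(name="pathmnist")}


def build_config(config_name: str, **overrides) -> Config:
    """Apply kwargs as overrides on a shallow copy of the template."""
    template = TEMPLATES[config_name]
    known = {f.name for f in fields(template)}
    unknown = set(overrides) - known
    if unknown:
        # Reject unknown fields instead of silently dropping them.
        raise TypeError(f"unknown config fields: {sorted(unknown)}")
    return replace(template, **overrides)  # the template is left untouched


def config_diff(config: Config, template: Config) -> str:
    """Describe the fields that differ from the template ('' if none)."""
    changed = sorted(
        (f.name, getattr(config, f.name))
        for f in fields(config)
        if getattr(config, f.name) != getattr(template, f.name)
    )
    return repr(changed) if changed else ""


def cache_fingerprint(base: str, extra: str = "") -> str:
    """Hash the base key; an empty `extra` keeps the previous value."""
    payload = base if not extra else f"{base}|{extra}"
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

With this shape, a `size=64` config yields a different fingerprint from the default one, so it cannot pick up a 28x28 shard cached by an earlier run, while builders constructed without overrides keep their existing cache directories.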