
write_zarr fails with LocalCUDACluster (dask-cuda) when adata.X has been persisted #2444

Description

@Intron7

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the main branch of anndata.

Report

Code:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask.array as da
import numpy as np
import zarr

cluster = LocalCUDACluster()
client = Client(cluster)

x = da.from_array(np.ones((10000, 200)), chunks=(1000, 200))
x = x.map_blocks(lambda b: b + 1).persist()

g = zarr.open("/tmp/test.zarr", mode="w", shape=x.shape, dtype=x.dtype, chunks=(1000, 200))
da.store(x, g, scheduler="threads")  # ValueError: Missing dependency ('lambda-<hash>', i, 0)

Traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 14
     10 x = da.from_array(np.ones((10000, 200)), chunks=(1000, 200))
     11 x = x.map_blocks(lambda b: b + 1).persist()
     12 
     13 g = zarr.open("/tmp/test.zarr", mode="w", shape=x.shape, dtype=x.dtype, chunks=(1000, 200))
---> 14 da.store(x, g, scheduler="threads")  # Missing dependency

File ~/micromamba/envs/rapids-26.04/lib/python3.14/site-packages/dask/array/core.py:1218, in store(***failed resolving arguments***)
   1215 if not return_stored:
   1216     import dask
-> 1218     dask.compute(arrays, **kwargs)
   1219     return None
   1220 else:

File ~/micromamba/envs/rapids-26.04/lib/python3.14/site-packages/dask/base.py:685, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    682     expr = expr.optimize()
    683     keys = list(flatten(expr.__dask_keys__()))
--> 685     results = schedule(expr, keys, **kwargs)
    687 return repack(results)

File ~/micromamba/envs/rapids-26.04/lib/python3.14/site-packages/dask/local.py:191, in start_state_from_dask(dsk, cache, sortkey, keys)
    189 if task is None:
    190     if dependents[key] and not cache.get(key):
--> 191         raise ValueError(
    192             f"Missing dependency {key} for dependents {dependents[key]}"
    193         )
    194     continue
    195 elif isinstance(task, DataNode):

ValueError: Missing dependency ('lambda-e88bbde22afa74e5c4a58733a1fb745d', 4, 0) for dependents {('store-map-680948d7b57b14eb187a955c49f9a516', 4, 0)}


Versions

adata.write_zarr(...) raises ValueError: Missing dependency ... when a dask_cuda.LocalCUDACluster client is active and adata.X is a persisted dask array. The trigger is the hardcoded scheduler="threads" in anndata's writers: the threaded scheduler can't resolve Futures held by the dask-cuda cluster's workers.
With a LocalCUDACluster active, .persist() materializes the array's tasks as Future objects on the dask-cuda workers. The local threaded scheduler has no visibility into those Futures and reports them as missing dependencies in dask/local.py::start_state_from_dask.
I'll also file a bug with dask-cuda. However, it might be worth checking whether our writing functions actually need to hardcode scheduler="threads".
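
One possible direction, sketched here under assumptions rather than proposed as the fix: only force the threaded scheduler when no distributed client is active, and otherwise let dask fall back to its default scheduler, which is the active client and should be able to see the persisted Futures. The helper name choose_scheduler is hypothetical and is not part of anndata or dask; I have not verified this against the LocalCUDACluster setup above.

```python
from typing import Optional


def choose_scheduler(fallback: str = "threads") -> Optional[str]:
    """Hypothetical helper: pick a value for dask's ``scheduler=`` keyword.

    Returns ``None`` (dask's default scheduler, i.e. the active distributed
    client) when a client exists, otherwise the local ``fallback`` scheduler.
    """
    try:
        # distributed is an optional dependency; without it no client can exist.
        from distributed import get_client
    except ImportError:
        return fallback
    try:
        # get_client() raises ValueError when no client is active.
        get_client()
    except ValueError:
        return fallback
    return None


# Intended use with the reproducer above (assumption, not tested on dask-cuda):
# da.store(x, g, scheduler=choose_scheduler())
```

With no client running this returns "threads", so the current behaviour would be unchanged for users who don't use distributed.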
