
concat_on_disk with join='outer' doesn't retain all .obsm fields #2394

Description

@alam-shahul

Not sure if this is a bug or a missing feature, but ad.concat and ad.experimental.concat_on_disk seem to behave differently with join='outer' for .obsm fields that are not shared by all AnnData objects. The in-memory result keeps such fields, while the on-disk result drops them.
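For reference, here is a minimal sketch of the behaviour I would expect from an outer join on .obsm keys: take the union of keys, and fill the rows of objects that lack a key with a fill value. This is a hypothetical NumPy-only helper (outer_join_obsm is my own name), not anndata's implementation. It handles dense arrays only, and the NaN default fill value is an assumption; anndata's actual fill behaviour may differ.

```python
import numpy as np

def outer_join_obsm(obsms, n_obs, fill_value=np.nan):
    """Hypothetical helper: concatenate .obsm-like dicts along axis 0,
    keeping the union of keys. Rows from objects lacking a key are
    filled with fill_value. Dense NumPy arrays only."""
    # Union of keys, preserving first-seen order
    keys = []
    for obsm in obsms:
        for k in obsm:
            if k not in keys:
                keys.append(k)

    result = {}
    for k in keys:
        # Trailing shape comes from the first object that has the key
        ref = next(obsm[k] for obsm in obsms if k in obsm)
        # Promote dtype so the fill value fits (e.g. int + NaN -> float64)
        dtype = np.result_type(ref.dtype, np.asarray(fill_value).dtype)
        blocks = []
        for obsm, n in zip(obsms, n_obs):
            if k in obsm:
                blocks.append(np.asarray(obsm[k], dtype=dtype))
            else:
                blocks.append(np.full((n, *ref.shape[1:]), fill_value, dtype=dtype))
        result[k] = np.concatenate(blocks, axis=0)
    return result

# Mirrors the reproducer: a has 'X_emb' (2 obs), b has no .obsm (4 obs)
a_obsm = {"X_emb": np.arange(4, dtype=np.float64).reshape(2, 2)}
merged = outer_join_obsm([a_obsm, {}], n_obs=[2, 4])
print(merged["X_emb"].shape)  # (6, 2)
```

The first two rows are a's embedding; the remaining four are NaN.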

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the main branch of anndata.

Report

Code:

# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "anndata@git+https://github.com/scverse/anndata.git",
# ]
# ///

import anndata as ad
import numpy as np

# a has an .obsm entry ('X_emb'); b has none
a = ad.AnnData(
    X=np.ones((2, 10)),
    obsm={'X_emb': np.arange(4).astype(np.float64).reshape((2, 2))},
)
b = ad.AnnData(
    X=np.ones((4, 12)),
)

# In-memory outer concatenation keeps 'X_emb'
c = ad.concat([a, b], join="outer")
assert 'X_emb' in c.obsm.keys()

a.write_h5ad('a.h5ad')
b.write_h5ad('b.h5ad')

# On-disk outer concatenation of the same two objects
ad.experimental.concat_on_disk(['a.h5ad', 'b.h5ad'], out_file='d.h5ad', join="outer")

d = ad.read_h5ad('d.h5ad')
assert 'X_emb' in d.obsm.keys()  # fails: 'X_emb' is dropped
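A possible workaround until this is resolved, sketched under an assumption I have not verified: that concat_on_disk keeps .obsm keys present in every input file. The idea is to pad each object's .obsm with NaN-filled arrays for the keys it lacks before writing, so the keys are shared by all files. pad_missing_obsm is a hypothetical helper of my own, not part of anndata; it handles dense arrays only.

```python
import numpy as np

def pad_missing_obsm(obsms, n_obs, fill_value=np.nan):
    """Hypothetical workaround helper: give every mapping in `obsms` every
    key found in any of them, filling missing entries with `fill_value`.
    Mutates the mappings in place. Dense NumPy arrays only."""
    all_keys = {k for obsm in obsms for k in obsm}
    for k in sorted(all_keys):
        # Trailing dimensions come from the first mapping that has the key
        ref = next(obsm[k] for obsm in obsms if k in obsm)
        # Promote dtype so the fill value fits (NaN needs a float dtype)
        dtype = np.result_type(ref.dtype, np.asarray(fill_value).dtype)
        for obsm, n in zip(obsms, n_obs):
            if k not in obsm:
                obsm[k] = np.full((n, *ref.shape[1:]), fill_value, dtype=dtype)
```

With the objects from the reproducer, this would be called as pad_missing_obsm([a.obsm, b.obsm], [a.n_obs, b.n_obs]) before the two write_h5ad calls. Whether concat_on_disk then retains X_emb is untested here, and the padded NaN rows end up in the output.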

Versions

0.12.10
