AnnData.write() fails with "object header message is too large"

Hi.

I’m trying to .write() an AnnData object that has a large DataFrame (8869 columns) in the obsm attribute (adata.obsm["my_key"] = large_df). The write call fails with:

RuntimeError: Unable to create attribute (object header message is too large)

Above error raised while writing key 'my_key' of <class 'h5py._hl.group.Group'> to /

Any idea whether and how this can be dealt with in anndata, or must we use mudata in this case?

Thanks!

I was just pointed at this. I think this could be a useful issue, though I’m not sure if we can do anything about it, or if it’s a fundamental limitation of HDF5.

To reproduce:

import anndata as ad, pandas as pd, numpy as np
N = 10_000

a = ad.AnnData(np.ones((5, 10)))
a.obsm["df"] = pd.DataFrame(
    np.ones((5, N)),
    index=a.obs_names,
    columns=[str(i) for i in range(N)]
)
a.write_h5ad("tmp.h5ad")
# RuntimeError: Unable to create attribute (object header message is too large)

I think it’s because we store the column order as an HDF5 attribute on the group, and HDF5 keeps attributes in the object header, which by default is limited to 64 KiB per message. With ~10,000 column names the attribute overflows that limit.
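If that’s right, the failure should be reproducible with h5py alone by writing a large string attribute, with no anndata involved (the file name and attribute key here are just for the demo; "column-order" mirrors what anndata writes):

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "attr_limit_demo.h5")

err = None
with h5py.File(path, "w") as f:
    g = f.create_group("df")
    try:
        # ~10,000 variable-length strings easily exceed the 64 KiB
        # object-header budget available for a compact attribute.
        g.attrs["column-order"] = [str(i) for i in range(10_000)]
    except RuntimeError as e:
        err = e

print(err)
```

If this prints the same "object header message is too large" error, the limit is coming from HDF5’s attribute storage, not from anything anndata does on top of it.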

For me it helps to just store the DataFrame as a NumPy array and keep the column names separately as a list. Not super elegant, but an easy fix.