TotalVI failing to write an h5mu object

Hello,
I have recently been encountering an error while running totalVI. The model trains fine, but the run fails when writing out the h5mu file.


Traceback (most recent call last):
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/ruffus/task.py", line 712, in run_pooled_job_without_exceptions
    return_value = job_wrapper(params, user_defined_work_func,
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/ruffus/task.py", line 608, in job_wrapper_output_files
    job_wrapper_io_files(params, user_defined_work_func, register_cleanup, touch_files_only,
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/ruffus/task.py", line 540, in job_wrapper_io_files
    ret_val = user_defined_work_func(*(params[1:]))
  File "/gpfs3/well/cartography/users/dms607/panpipes_public/panpipes/panpipes/panpipes/pipeline_integration.py", line 560, in run_totalvi
    P.run(cmd, **job_kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/cgatcore/pipeline/execution.py", line 1244, in run
    benchmark_data = r.run(statement_list)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/cgatcore/pipeline/execution.py", line 820, in run
    stdout, stderr, resource_usage = self.queue_manager.collect_single_job_from_cluster(
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/cgatcore/pipeline/cluster.py", line 145, in collect_single_job_from_cluster
    raise OSError(error_msg)
OSError: Job 15948774 has non-zero exitStatus 1: hasExited=True, wasAborted=False, hasSignal=False, terminatedSignal=''
statement = python /gpfs3/well/cartography/users/dms607/panpipes_public/panpipes/panpipes/python_scripts/batch_correct_totalvi.py --scaled_anndata /well/cartography/projects/analysis/202_cartt_oncology/renal/panpipes/5_sub_buckets_none/endo_fibro/2_pipe_pp/e_bucket.h5mu --output_csv batch_correction/umap_multimodal_totalvi.csv --figdir figures/ --integration_col_categorical cart_patient_id --neighbors_method scanpy --neighbors_metric euclidean --neighbors_n_pcs 35 --neighbors_k 30 > logs/multimodal_totalvi.log
stderr = /etc/bashrc: line 12: PS1: unbound variable
[rank: 0] Global seed set to 0
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [3]
SLURM auto-requeueing enabled. Setting signal handlers.
`Trainer.fit` stopped: `max_epochs=100` reached.
Traceback (most recent call last):
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/anndata/_io/utils.py", line 214, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 175, in write_elem
    _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 24, in wrapper
    result = func(g, k, *args, **kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/anndata/_io/specs/methods.py", line 500, in write_dataframe
    group.attrs["column-order"] = col_names
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/h5py/_hl/attrs.py", line 104, in __setitem__
    self.create(name, data=value)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/h5py/_hl/attrs.py", line 206, in create
    attr = h5a.create(self._id, self._e(tempname), htype, space)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5a.pyx", line 50, in h5py.h5a.create
RuntimeError: Unable to create attribute (object header message is too large)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/gpfs3/well/cartography/users/dms607/panpipes_public/panpipes/panpipes/python_scripts/batch_correct_totalvi.py", line 284, in <module>
    mdata.write("tmp/totalvi_scaled_adata.h5mu")
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/mudata/_core/mudata.py", line 1184, in write_h5mu
    write_h5mu(filename, self, **kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/mudata/_core/io.py", line 207, in write_h5mu
    _write_h5mu(f, mdata, **kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/mudata/_core/io.py", line 75, in _write_h5mu
    write_elem(group, "obsm", dict(adata.obsm), dataset_kwargs=kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/anndata/_io/utils.py", line 214, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 175, in write_elem
    _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 24, in wrapper
    result = func(g, k, *args, **kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/anndata/_io/specs/methods.py", line 281, in write_mapping
    write_elem(g, sub_k, sub_v, dataset_kwargs=dataset_kwargs)
  File "/well/cartography/users/dms607/panpipes_public/python3-venv-panpipes/lib/python3.9/site-packages/anndata/_io/utils.py", line 220, in func_wrapper
    raise type(e)(
RuntimeError: Unable to create attribute (object header message is too large)

Above error raised while writing key 'totalvi_denoised_rna' of <class 'h5py._hl.group.Group'> to /

Does anyone have any idea what might be causing this?

Can you share the code you're using to train the model and add objects to the MuData object?

Hi @adamgaysoo,
I'm using panpipes to run totalVI and write my object out as a muon object. I'm sure the script itself is not broken: I have used it before to run totalVI and save the outputs successfully. We have deduced that it has to do with the number of cells in this dataset (approx. 10,200). We are assuming it's not enough cells for the totalVI model to work?

The specific totalVI script that we run within panpipes for batch correction is here: panpipes/batch_correct_totalvi.py at main · DendrouLab/panpipes · GitHub

Best,
Devika

I would open an issue on the panpipes repository for this. It's unclear what could be happening without trying your data.

Hi Adam,
Thanks for your comments. I'm part of the panpipes team and have been discussing this with the others. We don't currently see it as a panpipes-specific issue, as the script runs fine on other datasets.
It even produces UMAP coordinates, but fails to write out a muon object after adding the denoised protein/RNA expression matrices to the obsm in muon. We are wondering whether these matrices are basically full of NAs or something similar. I will try running the same dataset with scvi-tools directly and see if I get a similar error; a rough sketch of that test is below.
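
Roughly, the direct test would look something like this (a sketch only: the input path, the obsm key for the protein counts, and the sampling settings are assumptions, not our exact pipeline configuration):

import anndata as ad
import scvi

# Assumed export of the RNA modality with protein counts attached;
# the path and obsm key below are hypothetical
adata = ad.read_h5ad("e_bucket_rna.h5ad")

scvi.model.TOTALVI.setup_anndata(
    adata,
    batch_key="cart_patient_id",
    protein_expression_obsm_key="protein_expression",  # assumed key
)
model = scvi.model.TOTALVI(adata)
model.train(max_epochs=100)  # matches max_epochs in the log above

# get_normalized_expression returns two pandas DataFrames
# (cells x genes and cells x proteins); the gene frame is the object
# that later ends up in .obsm as "totalvi_denoised_rna"
denoised_rna, denoised_protein = model.get_normalized_expression(n_samples=10)
print(denoised_rna.shape)
print(denoised_rna.isna().sum().sum())  # check the "full of NAs" hypothesis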

Thank you for replying.

Best,
Devika

Maybe @gtca can help?

From the log pasted above it seems it might be an HDF5-related issue, but I don't know the reason.

I think it might be something we could look into on the AnnData side (MuData reuses the native AnnData writers, as can also be seen in the log), but this would require a way to reproduce the error. Is it by any chance a pandas DataFrame with a lot of columns that is being written?
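
For context, a minimal sketch of the failure mode I would suspect (the file name and column count below are made up): HDF5 stores attributes in the object header, which is limited to 64 KiB by default, so an attribute holding thousands of column names, like the "column-order" attribute anndata writes for data frames, can exceed that limit.

import h5py

# 20,000 column names, roughly one per gene in a denoised RNA matrix
col_names = [f"gene_{i}" for i in range(20000)]

with h5py.File("repro.h5", "w") as f:
    g = f.create_group("obsm")
    # Writing this many variable-length strings as a single attribute
    # should raise:
    # RuntimeError: Unable to create attribute (object header message is too large)
    g.attrs["column-order"] = col_names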

Hi @gtca,
What would you need from my end: the original muon object without the totalVI outputs added, plus the totalVI denoised RNA output as a pandas DataFrame?
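
In the meantime, this is the kind of workaround we are considering on our side (a sketch only: the first obsm key is taken from the traceback, the protein key and everything else are assumptions). The idea is to store the denoised matrices as plain NumPy arrays and keep the column names in .uns, so anndata never has to write a huge column-name attribute:

import numpy as np

# "mdata" is the MuData object right before the failing write;
# "totalvi_denoised_protein" is a guessed second key
for key in ("totalvi_denoised_rna", "totalvi_denoised_protein"):
    if key in mdata.obsm and hasattr(mdata.obsm[key], "columns"):
        df = mdata.obsm[key]
        # keep the column names separately, then drop the DataFrame wrapper
        mdata.uns[key + "_columns"] = np.asarray(df.columns, dtype=str)
        mdata.obsm[key] = df.to_numpy()

mdata.write("tmp/totalvi_scaled_adata.h5mu")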

Best,
Devika