scvi.data.organize_multiome_anndatas with two big AnnData objects

Hi,

I am trying to use MultiVI with two different AnnData objects for paired scRNA and scATAC data. For the organize_multiome_anndatas step, I was following the tutorial from the single-cell best practices:

adata_paired = ad.concat([rna.copy().T, atac.copy().T]).T
adata_paired.obs = adata_paired.obs.join(rna.obs[["cell_type", "batch"]])
adata_paired.obs["modality"] = "paired"
adata_paired

adata_mvi = scvi.data.organize_multiome_anndatas(adata_paired)

However, the ad.concat([rna.copy().T, atac.copy().T]).T step requires a huge amount of memory, and my jobs are always killed by the system. Is there a way to “bypass” this step?

Thanks in advance!

Hi, the first thing to try would be:

adata = ad.concat([rna, atac], axis=1)

There is also a function that does the merging on disk: anndata.experimental.concat_on_disk. It also makes sense to subset to highly variable genes beforehand.
