Subsetting anndata is causing error problems

navi · February 22, 2023, 6:41pm

This is my AnnData object, with n_obs × n_vars = 227550 × 32302. I am trying to regress out ribosomal genes using two methods: one with the regress_out function, and the other by selecting the genes that start with “RPS” and subsetting: adata2 = adata1[:,keep]. As soon as I subset, the new AnnData consumes a huge amount of memory, and I am not even able to plot a simple UMAP (it keeps running and then crashes). Has anyone else faced a similar problem?

yotamcons · March 16, 2023, 10:33am

This sounds strange indeed.
The first thing I suggest is checking the dimensions of adata2.
A likely explanation is that the first AnnData is already using more than half of your computer’s RAM, and working with roughly twice (or ~1.8 times) the data pushes the machine into swapping, which makes everything very slow. In that case you might be able to run the analysis on a stronger machine, or you can try deleting the first object (del adata1) after you create the second.
A stricter way to avoid the memory issue is to write adata2 to disk, restart Python, and read only the second AnnData.

Regarding UMAP: are you running PCA before computing the neighbors graph?

malonzm1 · May 4, 2024, 5:52am

Hi @navi,

Were you able to resolve this problem? I’m having the same problem. I can subset adata, but when I use sc.tl.pca I get errors.

malonzm1 · May 5, 2024, 4:10am

It’s been resolved. Thanks.
