This is my AnnData object, with n_obs × n_vars = 227550 × 32302. I am trying to regress out ribosomal genes with two methods: one using the regress_out function, and the other by selecting the genes that start with “RPS” and subsetting, adata2 = adata1[:, keep]. As soon as I subset, the new AnnData object consumes a huge amount of memory, and I cannot even plot a simple UMAP (it keeps running and then crashes). Has anyone else faced a similar problem?
This sounds strange indeed.
The first thing I suggest is checking the dimensions of adata2.
A likely cause is that the first AnnData object is already using more than half of your computer's RAM, and holding a second, almost as large object (about 1.8 times the data in total) forces the system to swap, which causes page-fault latency. In that case you could run the analysis on a machine with more memory, or delete the first object (del adata1) after you create the second.
A stricter way to avoid the memory issue is to write adata2 to disk, restart Python, and read only the second object.
Regarding UMAP: are you running PCA before building the neighbors graph?
Hi @navi,
Were you able to resolve this? I’m having the same problem: I can subset adata, but I get errors when I run sc.tl.pca.
It’s been resolved. Thanks.