I am working on a large scRNA-seq atlas with 2 million cells. I want to get the normalized expression. However, since it is returned as a data frame or NumPy array, I run out of memory every time I try to retrieve it. I need it for performing DE. Is there any other way to get it, or to perform DE between clusters without it?
Yes, there is a direct way to run DE from a scVI-trained model without calling get_normalized_expression first: model.differential_expression(…). You can specify whether to compare group vs. group or group vs. all, which groups to use, and so on.
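A minimal sketch of that call, assuming `model` is an already trained scVI model and that the cluster labels sit in an `adata.obs` column. The wrapper name `run_scvi_de`, the column name "leiden", and the labels "0" and "1" are hypothetical; `groupby`, `group1`, `group2`, and `batch_size` are arguments of `differential_expression`.

```python
def run_scvi_de(model, groupby, group1, group2=None, batch_size=None):
    """Run scVI's built-in DE test.

    With group2=None, group1 is compared against all other cells.
    batch_size is the number of cells passed through the model at once;
    None keeps the library default.
    """
    return model.differential_expression(
        groupby=groupby,
        group1=group1,
        group2=group2,
        batch_size=batch_size,
    )


# Hypothetical usage (column name and labels are assumptions):
# de_df = run_scvi_de(model, groupby="leiden", group1="0", group2="1")
# de_vs_rest = run_scvi_de(model, groupby="leiden", group1="0")
```

The call returns a table of per-gene results, so the full normalized expression matrix never has to be held in memory by your own code.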
If you still need the full normalized expression matrix and do not have enough memory, you can extract it in smaller chunks of the adata, a subset of cells at a time.
Thanks for your response. I tried the direct way using model.differential_expression. It throws an out-of-memory error, probably due to the size of the adata. Is there a way to resolve this? Or maybe extracting the normalized data in smaller chunks would be useful; could you please guide me on how to do it?
You will want to add a filter that sets values below a certain threshold, such as 1e-5, to zero in order to increase sparsity. However, we usually do not recommend using normalized counts for downstream tests such as Wilcoxon or the t-test.
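A rough sketch of chunked extraction with such a filter, assuming `model` is a trained scVI model. The helper names, the chunk size of 50,000 cells, and the use of a SciPy CSR matrix for storage are assumptions for illustration; `indices` and `return_numpy` are arguments of `get_normalized_expression`.

```python
import numpy as np
from scipy import sparse


def chunk_bounds(n_cells, chunk_size):
    """Yield (start, stop) pairs that cover range(n_cells) in order."""
    for start in range(0, n_cells, chunk_size):
        yield start, min(start + chunk_size, n_cells)


def sparsify(values, threshold=1e-5):
    """Copy `values`, set entries below `threshold` to zero, return CSR."""
    values = np.array(values, dtype=np.float32)  # copy, so the input is untouched
    values[values < threshold] = 0.0
    return sparse.csr_matrix(values)


def normalized_expression_sparse(model, n_cells, chunk_size=50_000, threshold=1e-5):
    """Fetch normalized expression one block of cells at a time.

    Only one dense block is in memory at once; each block is
    thresholded and stored sparse before the next one is requested.
    """
    blocks = []
    for start, stop in chunk_bounds(n_cells, chunk_size):
        dense = model.get_normalized_expression(
            indices=np.arange(start, stop),  # cells in this chunk only
            return_numpy=True,               # array instead of a DataFrame
        )
        blocks.append(sparsify(dense, threshold))
    return sparse.vstack(blocks, format="csr")


# Hypothetical usage:
# X_norm = normalized_expression_sparse(model, n_cells=model.adata.n_obs)
```

How much memory this saves depends on how many entries fall below the threshold, so it is worth checking the density of the first chunk before processing the whole atlas.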
It might also be worth computing posterior predictive samples: these are counts sampled from the negative binomial distribution using the parameters learned by scVI.
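A small sketch of drawing such samples, assuming `model` is a trained scVI model. The wrapper name and the example gene names are hypothetical; `indices`, `gene_list`, `n_samples`, and `batch_size` are arguments of `posterior_predictive_sample`. The type of the returned object depends on the scvi-tools version, so inspect it before passing it on.

```python
def sample_posterior_counts(model, indices=None, gene_list=None,
                            n_samples=1, batch_size=None):
    """Draw counts from the fitted likelihood for a subset of cells and genes.

    Restricting `indices` (cells) and `gene_list` (genes) keeps the
    result small on a large atlas.
    """
    return model.posterior_predictive_sample(
        indices=indices,
        gene_list=gene_list,
        n_samples=n_samples,
        batch_size=batch_size,
    )


# Hypothetical usage (cell range and gene names are assumptions):
# counts = sample_posterior_counts(
#     model, indices=list(range(10_000)), gene_list=["CD3E", "MS4A1"]
# )
```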