Is there any way I can get batch-corrected counts from scvi.model.SCVI()? I am planning to do DE gene analysis but I am not sure how to handle the batch effect. I know scVI can handle batch effects very well in the low-dimensional latent space, but I am not sure how that helps my DE analysis. Thank you so much!
It depends on what you precisely mean by ‘handling batch effects’.
For example, the .differential_expression() method has a parameter batch_correction which, when set to True, will average over batches so that the fold change between your groups is adjusted for potential batch differences.
Or you can further specify the parameters batchid1 = "batch 1" and batchid2 = "batch 1" to transform all expression levels to what they would be in “batch 1” while performing differential expression.
So the DE framework is flexible, but what to do depends a bit on what you want to learn: whether you want to average over batches or treat one batch as a reference batch.
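To make the two options concrete, here is a minimal sketch. It assumes `model` is a trained scvi.model.SCVI whose AnnData was registered with a batch key. The helper name `run_batch_aware_de` and its `reference_batch` argument are made up for illustration and are not part of scvi-tools; only `differential_expression`, `batch_correction`, `batchid1` and `batchid2` come from the library. As far as I know, `batchid1`/`batchid2` expect an iterable of batch categories, so the sketch wraps the single batch name in a list.

```python
def run_batch_aware_de(model, groupby, group1, group2, reference_batch=None):
    """Run scVI differential expression with batch correction switched on.

    model           : assumed to be a trained scvi.model.SCVI
    reference_batch : hypothetical convenience argument of this helper;
                      None averages over all batches, a batch name maps
                      both groups onto that single batch.
    """
    kwargs = dict(
        groupby=groupby,
        group1=group1,
        group2=group2,
        batch_correction=True,  # adjust the comparison for batch differences
    )
    if reference_batch is not None:
        # Same batch for both groups, i.e. treat it as the reference batch.
        kwargs["batchid1"] = [reference_batch]
        kwargs["batchid2"] = [reference_batch]
    return model.differential_expression(**kwargs)


# Example calls (column and group names are placeholders):
# de_avg = run_batch_aware_de(model, "cell_type", "B cells", "T cells")
# de_ref = run_batch_aware_de(model, "cell_type", "B cells", "T cells",
#                             reference_batch="batch 1")
```

The first call averages over batches; the second expresses both groups as if they had been measured in "batch 1".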
P.S. - You can generate data (that is, UMI counts) as if it had come from one specific batch using the .posterior_predictive_sample() method, where you would make an adata in which the batch column only has one batch. In theory you could then take those counts and put them into some other DE framework. But I would be wary of this strategy compared to either using the built-in DE system or taking the original observed UMI counts and modeling the batch structure with (G)LMMs.
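A rough sketch of that P.S., for illustration only. It assumes `model` is a trained scvi.model.SCVI, `adata` is the AnnData it was set up with, and `batch_key` is the obs column registered as the batch. The helper name `sample_counts_as_batch` is hypothetical. The reference batch has to be one of the categories the model saw during setup, and the type of the returned array (dense or sparse) may differ between scvi-tools versions.

```python
def sample_counts_as_batch(model, adata, batch_key, reference_batch, n_samples=1):
    """Sample UMI counts as if every cell had come from `reference_batch`.

    Hypothetical helper; only posterior_predictive_sample() is scvi-tools API.
    """
    # Work on a copy so the original batch labels stay untouched.
    adata_ref = adata.copy()
    # Overwrite the batch column so it contains a single batch.
    adata_ref.obs[batch_key] = reference_batch
    # Draw counts from the model's posterior predictive distribution.
    return model.posterior_predictive_sample(adata_ref, n_samples=n_samples)


# Example call (names are placeholders):
# counts = sample_counts_as_batch(model, adata, "batch", "batch 1")
```

These are sampled counts, not observed ones, which is part of why feeding them to another DE tool deserves caution.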