Hello!
First things first, thank you for this awesome tool.
I know this question is similar to previous questions on this forum, but that’s part of the confusion I’m running into.
Some background. I have scRNA-seq data from several independent experiments (which I’ll refer to as batches) that I am hoping to pool together/integrate and use for eQTL mapping. I am hoping to use scvi to integrate these data and correct for the batch effects that come from pooling multiple independent experiments.
I used scvi.model.SCVI.setup_anndata with batch_key="experiment" (that’s the categorical variable that encodes the scRNA-seq experiment), and I’ve trained an scvi model using that anndata object. Now my question is this: is the output of model.get_normalized_expression() expected to be batch-corrected? In other words, can I use this data as input for eQTL mapping, or do I need to do additional batch-correction steps (e.g. calculating PEER factors)?
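For concreteness, the workflow I’m describing looks roughly like the sketch below. The function name and the adata variable are placeholders made up for this post, and it assumes adata.X holds raw counts and that adata.obs has an "experiment" column; training uses the default settings.

```python
def normalized_expression_from_scvi(adata, batch_key="experiment"):
    """Sketch of the workflow in this post (hypothetical helper, not part of scvi-tools).

    Assumes `adata` is an AnnData object with raw counts in `adata.X` and a
    categorical column `adata.obs[batch_key]` identifying the experiment.
    """
    # Imported here so that defining this helper does not itself require scvi-tools.
    import scvi

    # Register the AnnData object and tell scVI which obs column is the batch.
    scvi.model.SCVI.setup_anndata(adata, batch_key=batch_key)

    # Build and train the model with default settings.
    model = scvi.model.SCVI(adata)
    model.train()

    # Called with default arguments, exactly as in my question.
    # (This method also accepts a `transform_batch` argument; whether I need
    # to set it to get batch-corrected values is part of what I'm asking.)
    return model.get_normalized_expression()
```

The return value of this call is what I would like to feed into eQTL mapping.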
Some threads in this forum seem to suggest that the output of this function is batch-corrected (e.g. “Differential expression with scvi - batch correction?”). Other threads suggest that the output is not batch-corrected (e.g. “How to extract batch-corrected expression matrix from trained scVI vae model”).
I’m really just hoping to get a straight answer on this. The tool and documentation are phenomenal overall, but this nuance is tripping me up.
Thank you so much in advance!