Is the output of `get_normalized_expression` batch-corrected or not?

dtaylo95 · August 1, 2024, 4:10pm

Hello!
First things first, thank you for this awesome tool.

I know this question is similar to previous questions on this forum, but that’s part of the confusion I’m running into.

Some background: I have scRNA-seq data from several independent experiments (which I’ll refer to as batches) that I want to pool and use for eQTL mapping. I am hoping to use scVI to integrate these data and correct for the batch effects that come from pooling multiple independent experiments.

I used `scvi.model.SCVI.setup_anndata` with `batch_key="experiment"` (that’s the categorical variable that encodes the scRNA-seq experiment), and I’ve trained an scVI model using that AnnData object. Now my question is this: is the output of `model.get_normalized_expression()` expected to be batch-corrected? In other words, can I use this data as input for eQTL mapping, or do I need to do additional batch-correction steps (e.g. calculating PEER factors)?

Some threads in this forum seem to suggest that the output of this function is batch-corrected (e.g. “Differential expression with scvi - batch correction?”). Other threads suggest that the output is not batch-corrected (e.g. “How to extract batch-corrected expression matrix from trained scVI vae model”).

I’m really just hoping to get a straight answer on this. The tool and documentation are phenomenal overall, but this nuance is tripping me up.

Thank you so much in advance!

cane11 · August 4, 2024, 6:01pm

The output is not batch-corrected. See the other post. I strongly advise against using `get_normalized_expression` for eQTL mapping. scVI learns gene-gene dependencies, which will lead to trans-eQTLs that are not backed by the data. This is such a sensitive field that you should follow best practice for sc-eQTL mapping and not apply a less well-validated approach (like using scVI to normalize counts). You can still use the latent space to identify cell types or similar cells across samples.
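
For readers landing on this thread, here is a minimal sketch of the calls being discussed. It assumes scvi-tools is installed and that `adata.obs` has the `experiment` column from the original post; `fit_scvi` and `decoded_expression` are hypothetical helper names, not part of the library. As far as I understand the API, leaving `transform_batch` at its default of `None` decodes each cell conditioned on its own observed batch, which matches the answer above. Passing batch labels instead decodes every cell as if it came from those batches.

```python
from typing import Any, Optional, Sequence


def fit_scvi(adata: Any, batch_key: str = "experiment") -> Any:
    """Register the batch column and train an scVI model (requires scvi-tools)."""
    import scvi  # imported lazily so the helper below can be used without it

    scvi.model.SCVI.setup_anndata(adata, batch_key=batch_key)
    model = scvi.model.SCVI(adata)
    model.train()
    return model


def decoded_expression(
    model: Any,
    adata: Any,
    reference_batches: Optional[Sequence[str]] = None,
) -> Any:
    """Hypothetical wrapper around model.get_normalized_expression.

    reference_batches=None -> each cell is decoded with its own observed batch.
    reference_batches=[...] -> every cell is decoded as if it came from the
    listed batches (a counterfactual, model-based estimate).
    """
    return model.get_normalized_expression(adata, transform_batch=reference_batches)
```

Note that this only changes which batch the decoder is conditioned on. It does not address the concern above about learned gene-gene dependencies, so it is not a reason to use the decoded values for eQTL mapping.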

dtaylo95 · August 5, 2024, 2:42pm

Thanks so much for the response, this is very helpful. I’ll stick to using the scVI output for cell-type ID and rely on the raw data for downstream analysis.
