Compatibility between scVI and SCENIC

jesswhitts · October 6, 2023, 12:14pm

Hi,

I have a large scVI-integrated dataset and was wondering what the best way to run SCENIC on it is. This GRN tool takes a raw count matrix as input, ideally with all genes included. But from what I have read, it seems there is no way to retrieve a corrected (but not normalised) count matrix of all genes from scVI.

Should I use the full raw counts matrix, or the HVG-restricted scVI normalised matrix?

Is it appropriate to run GRN inference on uncorrected data?

Many thanks,
jess

martinkim0 · October 6, 2023, 5:48pm

Hi, thanks for your question. I’m not familiar with SCENIC, but if you’d like access to scVI’s corrected and unnormalized gene counts, you can run the following after training:

X = model.get_normalized_expression(library_size="latent")

Note that this will only return the genes that scVI was trained on.

zvittorio · October 9, 2023, 2:23pm

Just out of curiosity, does that mean that the values are the same as the input scVI received?

Also, to comment on @jesswhitts's question: running GRN inference on uncorrected data might be interesting anyway, because SCENIC may be robust enough to “correct” for batch effects.

martinkim0 · October 9, 2023, 9:55pm

No, these values will be different from the input that scVI receives as they are reconstructed/generated counts from the model.

dub2s · October 11, 2023, 4:32am

Hi @martinkim0

Can these corrected counts be used for DEG analysis (in a pseudobulk fashion)?

I could correct for batch effects in the UMAP and clustering with scVI, but when I run PyDESeq2 on the raw counts, I see very little overlap between the batches. I wonder if the scVI-corrected counts might be better for that purpose.

jesswhitts · October 12, 2023, 12:06pm

Thanks for your help @martinkim0 !

Is the ‘transform_batch’ parameter required in this instance, or is specifying the latent library size enough? I can’t quite figure out what the transform_batch parameter does from the docs.

martinkim0 · October 12, 2023, 6:27pm

@dub2s I’m not sure, actually. I have a feeling that it’s more appropriate to use uncorrected raw counts for DESeq2. Maybe someone else with more experience with DESeq2 can comment.

If you want to perform differential expression on scVI-corrected counts, I would recommend using the built-in function for it: scvi.model.SCVI — scvi-tools

martinkim0 · October 12, 2023, 6:30pm

@jesswhitts Using transform_batch shouldn’t be necessary as it produces counterfactual reconstructions. It just decodes the latent representation with a different batch index than the one actually observed in the data.

By default, model.get_normalized_expression(library_size="latent") uses the empirical library size of your data. In other words, it scales the normalized expression generated by the model by the total UMI counts in each cell in your data, so you don’t need to explicitly pass in a library size.
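
To make the scaling step concrete, here is a minimal NumPy sketch. It does not call scvi-tools. The `normalized` array is a hypothetical stand-in for the model's per-cell gene proportions (assumed here to sum to 1 per cell), and the toy counts are made up for illustration.

```python
import numpy as np

# Toy raw counts: 3 cells x 4 genes (hypothetical values).
raw_counts = np.array(
    [
        [10, 0, 5, 5],
        [2, 2, 4, 0],
        [0, 30, 10, 10],
    ],
    dtype=float,
)

# Empirical library size: total UMI count per cell, kept as a column
# so it broadcasts across genes.
library_size = raw_counts.sum(axis=1, keepdims=True)

# Hypothetical stand-in for the model's normalized expression:
# per-cell gene proportions, each row summing to 1.
normalized = np.array(
    [
        [0.4, 0.1, 0.3, 0.2],
        [0.25, 0.25, 0.4, 0.1],
        [0.1, 0.5, 0.2, 0.2],
    ]
)

# Scale each cell's proportions by that cell's library size.
scaled = normalized * library_size
```

Each cell in `scaled` has the same total as in `raw_counts`, but the individual gene values differ from the input, which matches the point above that these are reconstructed values rather than the original counts.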

dub2s · October 14, 2023, 9:52pm

Thanks for your input. I wasn’t aware of the differential expression within scVI. I will be trying it soon!

jesswhitts · October 16, 2023, 3:18pm

This makes sense. Thanks for your help!

cane11 · November 28, 2023, 6:22pm

To wrap this up, I would recommend using raw counts for DESeq2. All autoencoder or factor models learn gene-gene correlations, which might lead to false positives. DESeq2 expects unnormalized count data as input, and you won’t get this type of data out of scVI.
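
For the pseudobulk route mentioned earlier in the thread, a rough sketch of summing raw counts per sample before handing them to DESeq2 could look like this. The helper name `pseudobulk` and the toy data are hypothetical, and the sketch uses only NumPy rather than any scvi-tools or PyDESeq2 API.

```python
import numpy as np


def pseudobulk(counts, sample_labels):
    """Sum raw integer counts over all cells that share a sample label.

    counts: (n_cells, n_genes) array of raw counts.
    sample_labels: length n_cells sequence of sample identifiers.
    Returns (unique_labels, summed_counts), one row per sample.
    """
    counts = np.asarray(counts)
    labels, inverse = np.unique(np.asarray(sample_labels), return_inverse=True)
    summed = np.zeros((labels.size, counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add: each cell's row is accumulated into its sample's row.
    np.add.at(summed, inverse, counts)
    return labels, summed


# Toy example: 4 cells x 3 genes from two samples (hypothetical values).
toy_counts = np.array(
    [
        [1, 0, 3],
        [2, 1, 0],
        [0, 4, 1],
        [5, 0, 0],
    ],
    dtype=np.int64,
)
toy_samples = ["s1", "s2", "s1", "s2"]

samples, bulk = pseudobulk(toy_counts, toy_samples)
```

The result stays an integer count matrix (samples by genes), which is the kind of input DESeq2 expects. Batch could then be included as a covariate in the design rather than corrected in the counts.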
