DE analysis stochasticity

wikiselev · August 11, 2022, 11:13am

Hi, All!

When I run DE analysis by scVI multiple times I get lists of genes that don’t overlap much (after applying my own filters of significance, which are the same across DE runs). How do people deal with this stochasticity in scVI DE analysis? Does it mean my groups are not very different from each other?

Many thanks,
Vlad

Valentine_Svensson · August 14, 2022, 12:55am

Hi Vlad,

The DE analysis is based on sampling, so there will be variability between runs.

However, the resulting summary statistics should be (mostly) consistent between runs.

What orders of magnitude for fold changes and posterior probabilities are you getting for the top genes?

As you suspect, if the top genes have small fold changes and a lot of variability (leading to posterior probabilities close to 0.5), the results might change a lot between runs.

In my own analysis, I tend to consider results where posterior probabilities are on the order of < 0.1 or > 0.9, and fold changes of at least 2x in either direction as “successful”. (These thresholds are largely arbitrary).

Best,
/Valentine

wikiselev · August 22, 2022, 11:35am

Hi Valentine,

Many thanks for your reply, and sorry for my delayed response; I’ve been on holiday.

Thanks for your clarification and for sharing your own criteria for selecting DE genes. I am using similar values for selection. However, I suspect the signal in my dataset is not very strong. To get more robust results, I’ve decided to run model.differential_expression multiple times (e.g. 50), collect the results in a DataFrame, and then group by gene name and take the mean of each DE statistic. This gives a more or less stable list of DE genes.
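For anyone who wants to try the same thing, here is a minimal sketch of the averaging step. The helper `average_de_runs` and the stand-in `fake_run_de` are hypothetical names made up for illustration; the sketch assumes each DE run returns a pandas DataFrame indexed by gene name with numeric columns (the column names `proba_de` and `lfc_mean` below are just placeholders for whatever statistics your run reports).

```python
import random

import pandas as pd


def average_de_runs(run_de, n_runs=50):
    """Call run_de() n_runs times and average the numeric columns per gene.

    run_de is any zero-argument callable that returns a DataFrame
    indexed by gene name (hypothetical helper, not part of scvi-tools).
    """
    runs = [run_de() for _ in range(n_runs)]
    stacked = pd.concat(runs)
    # Group rows that share the same index label (gene name) and average them.
    return stacked.groupby(level=0).mean(numeric_only=True)


# Toy stand-in for a stochastic DE run, so the sketch runs without a model.
rng = random.Random(0)


def fake_run_de():
    genes = ["geneA", "geneB", "geneC"]
    return pd.DataFrame(
        {
            "proba_de": [rng.uniform(0.85, 0.99), rng.uniform(0.4, 0.6), rng.uniform(0.0, 0.1)],
            "lfc_mean": [rng.gauss(2.0, 0.3), rng.gauss(0.1, 0.3), rng.gauss(0.0, 0.3)],
        },
        index=genes,
    )


averaged = average_de_runs(fake_run_de, n_runs=50)
print(averaged)
```

With a trained model you would pass something like `lambda: model.differential_expression(groupby=..., group1=..., group2=...)` as `run_de`, filling in your own grouping arguments, and then apply your significance filters to the averaged table rather than to each run.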

Thanks again for your always timely help!
Cheers,
Vlad

Valentine_Svensson · August 30, 2022, 6:57pm

Sounds like a reasonable idea!

I think you can achieve the same result if you increase the number of samples. The default n_samples = 5000 can be changed to e.g. 100,000.

https://docs.scvi-tools.org/en/0.9.1/api/reference/scvi.utils.DifferentialComputation.get_bayes_factors.html#scvi.utils.DifferentialComputation.get_bayes_factors

Best,
/Valentine

wikiselev · August 31, 2022, 9:22am

Oh, great, thanks for the suggestion, Valentine!

adamgayoso · September 11, 2022, 11:07pm

Though you will have to quadruple the number of samples to reduce the standard deviation of the estimates by half.

sim · October 15, 2022, 4:48pm

Interesting discussion. Could you please elaborate on why to quadruple the samples and the connection to the standard error?
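A short note on the question above: for an estimate that is an average over n independent samples, the standard error scales as 1/sqrt(n), so going from n to 4n divides it by sqrt(4) = 2. The toy simulation below illustrates this with a plain Monte Carlo estimate of a probability. It is only a sketch of the general scaling, not scVI's actual sampling procedure; the function name `estimate_spread` and the values `p=0.7`, 500 and 2000 are arbitrary choices for the example.

```python
import random
import statistics


def estimate_spread(n_samples, n_repeats=300, p=0.7, seed=0):
    """Standard deviation of a Monte Carlo probability estimate.

    Each repeat draws n_samples Bernoulli(p) values and records their mean,
    mimicking a probability estimated from a finite number of samples.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        hits = sum(rng.random() < p for _ in range(n_samples))
        estimates.append(hits / n_samples)
    return statistics.stdev(estimates)


sd_small = estimate_spread(500)
sd_large = estimate_spread(2000)  # four times as many samples
ratio = sd_small / sd_large
print(f"sd with 500 samples:  {sd_small:.4f}")
print(f"sd with 2000 samples: {sd_large:.4f}")
print(f"ratio: {ratio:.2f}")  # expected to be close to 2, up to simulation noise
```

The same reasoning suggests that going from 5000 to 100,000 samples (20x) would shrink the run-to-run standard deviation by a factor of roughly sqrt(20), about 4.5, assuming the samples behave as independent draws.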
