Interpretation of fold-changes in differential expression analysis

My greetings to the community,

I have the following questions regarding differentially expressed genes (DEGs) with scvi:

  1. From my understanding, differential expression is an outcome of the generative process of scvi, and as such fold-changes have a distribution. Although we can look at lfc_mean or lfc_median to infer up- or down-regulation in the group of interest, my first question concerns the interpretation of the large ranges of fold-changes, with lfc_min being a negative and lfc_max a positive value. Would that be an indicator of uncertainty that could be fixed with a larger sample size? In relation to this, the bayes_factor for the vast majority of the DEGs is within the range 2-3. Would you set a threshold and consider as DEGs only those with bayes_factor >= 3, to suggest substantial evidence for the alternative hypothesis?

  2. I have observed DEGs where raw_normalized_mean1 and raw_normalized_mean2 are both 0.000000. Is this an indicator of an error, or is it due to rounding to six decimal places?

Thank you in advance,
Dimitrios

Hi Dimitrios,

Regarding point 1, requiring lfc_min and lfc_max to have the same sign would be very restrictive. And with larger sample sizes, you should expect even more genes to have lfc_min and lfc_max with different signs. Conventionally, you would say you have evidence of (specifically) a positive fold change if, e.g., 95% of the distribution of the LFC is positive. In scVI’s DE method, genes are called differentially expressed if enough of the distribution is far from an LFC of 0, which includes cases where there are both extreme positive and extreme negative LFCs.
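
To make this concrete, here is a minimal sketch with simulated numbers, not scVI's actual code. The array lfc_samples is a hypothetical stand-in for posterior LFC samples of one gene, and summarize_lfc is an illustrative helper. It assumes the "change"-style definition as I understand it: the DE probability is the fraction of samples with |LFC| >= delta (delta = 0.25 here), and the Bayes factor is the natural-log odds of that probability.

```python
import numpy as np


def summarize_lfc(lfc, delta=0.25, eps=1e-8):
    """Summarize posterior LFC samples for one gene (illustrative helper)."""
    lfc = np.asarray(lfc, dtype=float)
    # Fraction of posterior samples with a positive fold change.
    prob_positive = float(np.mean(lfc > 0))
    # Assumed DE probability: fraction of samples at least delta away from 0.
    proba_de = float(np.mean(np.abs(lfc) >= delta))
    # Assumed Bayes factor: natural-log odds of DE versus not DE.
    bayes_factor = float(np.log(proba_de + eps) - np.log(1.0 - proba_de + eps))
    return {
        "lfc_min": float(lfc.min()),
        "lfc_max": float(lfc.max()),
        "lfc_median": float(np.median(lfc)),
        "prob_positive": prob_positive,
        "proba_de": proba_de,
        "bayes_factor": bayes_factor,
    }


# Hypothetical posterior samples: clearly shifted upward, but with wide tails.
rng = np.random.default_rng(0)
lfc_samples = rng.normal(loc=1.0, scale=0.6, size=5000)

summary = summarize_lfc(lfc_samples)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With this many samples the minimum lands below zero and the maximum above it, even though the large majority of the distribution is positive. That is why lfc_min and lfc_max say little on their own. Under the log-odds assumption above, a Bayes factor of 3 corresponds to a DE probability of roughly 0.95, and a value of 2 to roughly 0.88.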

The other points I don’t know about, but hope this helps!
/Valentine

Thank you Valentine for your reply. Your explanation makes sense.

Regarding my second point, similar observations have also been reported by another user (link to post), so it would be great if another member of the scVI team could shed some light on this.
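
One generic check for whether a displayed 0.000000 is only a rounding effect is to compare the printed value with the stored one. The sketch below uses a made-up number (tiny_mean is hypothetical, not taken from any scVI output) and says nothing about what scVI actually returns for these columns.

```python
def shown_with_six_decimals(value: float) -> str:
    """Format a number the way a six-decimal table display would."""
    return f"{value:.6f}"


# Hypothetical stored value: very small, but not zero.
tiny_mean = 3.2e-8

print(shown_with_six_decimals(tiny_mean))  # what a six-decimal display shows
print(tiny_mean == 0.0)                    # whether the stored value is exactly zero
print(f"{tiny_mean:.3e}")                  # scientific notation keeps the magnitude visible
```

If the stored value is nonzero, printing the column in scientific notation should reveal it; if it is exactly zero, the display is not the cause.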

Dimitrios