Problems with Bayes factors and LFC changes using model.differential_expression()

joseph-siefert · April 12, 2025, 9:30pm

Has something changed in the differential expression in the most recent version of scvi-tools? Previously, plots of bayes_factor vs. lfc_median looked very similar to plots of proba_de vs. lfc_median. The proba_de plots still look OK, but the bayes_factor plots now look like this:

Additionally, comparing the LFC estimates to the ground truth from simulated data gives wildly inaccurate results, whereas pseudobulk differential expression on the same dataset gives a perfect correlation (Pearson r = 1.0).
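For readers who want to reproduce this kind of benchmark, a pseudobulk LFC and its correlation with a simulated ground truth can be sketched as below. This is a minimal NumPy illustration, not the poster's pipeline: it assumes raw (cells x genes) counts for two conditions within one cell type, sums counts per gene, scales to counts per million (CPM), and takes log2 ratios. The toy data are made up.

```python
import numpy as np

def pseudobulk_log2fc(counts_a, counts_b, pseudocount=1.0):
    """Pseudobulk log2 fold change per gene (condition A vs. condition B).

    counts_a, counts_b: (cells x genes) raw count matrices.
    Counts are summed per gene and scaled to CPM so that groups with
    different total depth are comparable.
    """
    bulk_a = counts_a.sum(axis=0).astype(float)
    bulk_b = counts_b.sum(axis=0).astype(float)
    cpm_a = bulk_a / bulk_a.sum() * 1e6
    cpm_b = bulk_b / bulk_b.sum() * 1e6
    return np.log2(cpm_a + pseudocount) - np.log2(cpm_b + pseudocount)

def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy simulation: 5 genes with known log2 fold changes, Poisson counts.
rng = np.random.default_rng(0)
base = np.array([50.0, 20.0, 80.0, 10.0, 40.0])
true_lfc = np.array([1.0, 0.0, -1.0, 2.0, 0.0])
counts_b = rng.poisson(base, size=(500, 5))
counts_a = rng.poisson(base * 2.0 ** true_lfc, size=(500, 5))

est = pseudobulk_log2fc(counts_a, counts_b)
print(pearson(est, true_lfc))
```

CPM scaling shifts every gene's LFC by the same constant when total expression differs between conditions, which does not affect the Pearson correlation.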

cane11 · April 14, 2025, 5:04am

Does the second plot look better in the old scvi-tools version? See the other post about the changes.
Could you try manually setting a very small pseudocount, such as 1e-10?
The Bayes factor plot likely looks fine for a two-way comparison (change vs. no change), as opposed to the new three-way comparison (up/unchanged/down). There might be something wrong in the computation there.
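To make the two-way vs. three-way distinction concrete, here is a hedged NumPy sketch that starts from posterior samples of one gene's LFC. It computes the down/unchanged/up posterior masses and a two-way log Bayes factor (change vs. no change) with an assumed threshold delta=0.25. It is not scvi-tools' internal code, and it does not attempt to reproduce how the three-way test builds its Bayes factor.

```python
import numpy as np

def lfc_posterior_masses(lfc_samples, delta=0.25):
    """Posterior masses (down, unchanged, up) for one gene.

    lfc_samples: 1-D array of posterior samples of the log fold change.
    delta: half-width of the "no change" region (assumed value).
    """
    lfc = np.asarray(lfc_samples, dtype=float)
    p_up = float(np.mean(lfc > delta))
    p_down = float(np.mean(lfc < -delta))
    p_null = float(np.mean(np.abs(lfc) <= delta))
    return p_down, p_null, p_up

def two_way_log_bayes_factor(lfc_samples, delta=0.25, eps=1e-8):
    """log[P(|lfc| > delta) / P(|lfc| <= delta)]: change vs. no change."""
    p_down, p_null, p_up = lfc_posterior_masses(lfc_samples, delta)
    return float(np.log(p_up + p_down + eps) - np.log(p_null + eps))

# Hypothetical posterior samples for two genes.
rng = np.random.default_rng(1)
strong = rng.normal(1.0, 0.2, size=1000)  # clearly up-regulated
null = rng.normal(0.0, 0.05, size=1000)   # essentially unchanged
print(two_way_log_bayes_factor(strong), two_way_log_bayes_factor(null))
```

In this two-way form, a negative value simply means the posterior puts more mass on "no change" than on "change".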

joseph-siefert · April 14, 2025, 2:01pm

Changing the pseudocounts gives the same results. Setting test_mode='two' does reproduce the Bayes factor plots from before, but the LFC estimates are still off. I restricted the LFC comparison to genes expressed in >10% of cells, which increased the correlation to ~0.75, but I still get better LFC estimates with pseudobulk. I don't have the bottom plot from a previous version (it was years ago that I first tested this), but the results were certainly more accurate. I'll keep an eye on the other post for the kwargs needed to reproduce previous versions.
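The >10% expression filter described above can be sketched like this. It is an illustration, not the poster's code: it assumes `counts` is a (cells x genes) raw-count matrix for the cell type being compared, and that `est_lfc` and `true_lfc` are per-gene arrays in the same gene order.

```python
import numpy as np

def expressed_fraction(counts):
    """Fraction of cells (rows) with a nonzero count for each gene (column)."""
    return (np.asarray(counts) > 0).mean(axis=0)

def correlation_on_expressed(counts, est_lfc, true_lfc, min_frac=0.10):
    """Pearson r between estimated and true LFCs on genes expressed in > min_frac of cells.

    Returns (r, number_of_genes_kept).
    """
    keep = expressed_fraction(counts) > min_frac
    est = np.asarray(est_lfc, dtype=float)[keep]
    true = np.asarray(true_lfc, dtype=float)[keep]
    return float(np.corrcoef(est, true)[0, 1]), int(keep.sum())
```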

joseph-siefert · April 26, 2025, 4:33pm

I found the issue: it has to do with the total counts per cell type. If a gene does not have at least 10 counts in a given cell type, then the LFC estimates are off:

joseph-siefert · April 29, 2025, 5:33pm

I forgot to mention that this is on the log1p scale, so the model seems to require a fairly high number of counts per cell type for accurate LFC estimates. There also seem to be a high number of false positives. Pseudobulk does not suffer from either of these issues. Any suggestions or solutions from the developers?
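A hedged sketch of how one could flag the affected genes before comparing LFCs. It reads the threshold from the two posts above as log1p(total counts of a gene within a cell type) >= 10, which is roughly 22,000 raw counts; that reading, the function name, and the default are assumptions.

```python
import numpy as np

def low_count_mask(counts, cell_types, threshold=10.0):
    """Flag (cell type, gene) pairs whose log1p(total counts) is below threshold.

    counts: (cells x genes) raw counts.
    cell_types: length-n_cells array of cell-type labels.
    Returns (labels, mask) where mask[i, j] is True if gene j is low-count in labels[i].
    """
    counts = np.asarray(counts)
    cell_types = np.asarray(cell_types)
    labels = np.unique(cell_types)
    # Sum raw counts per gene within each cell type.
    totals = np.stack([counts[cell_types == ct].sum(axis=0) for ct in labels])
    return labels, np.log1p(totals) < threshold
```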

joseph-siefert · April 30, 2025, 5:10am

Regarding the false positives, it seems that genes altered in one cell type are showing up as differentially expressed in other cell types in which they are not altered. This appears to be due to the normalization. Here are three example false-positive genes: the top panels are library-size normalized and log1p transformed, and the bottom panels are normalized values extracted from the scVI model. The raw counts and all associated metadata are identical between batches for these genes in this cell type. These genes are altered in other cell types, and that appears to affect the normalization in this cell type. I have many examples of this.
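For comparison with the top panels, per-cell library-size normalization followed by log1p can be sketched as below (NumPy; `target_sum=1e4` is an assumed default, similar to what common single-cell pipelines use). Each cell is scaled only by its own total counts, so the normalized value of a gene in a cell never depends on other cell types. scVI's normalized expression, by contrast, comes from a model trained jointly on all cells.

```python
import numpy as np

def lognorm(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=1, keepdims=True)
    lib = np.where(lib == 0, 1.0, lib)  # avoid 0/0 for empty cells (they stay all-zero)
    return np.log1p(counts / lib * target_sum)

# Two batches with identical raw counts give identical normalized values.
batch1 = np.array([[10, 0, 5], [3, 2, 1], [0, 0, 0]])
batch2 = batch1.copy()
print(np.allclose(lognorm(batch1), lognorm(batch2)))
```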

joseph-siefert · May 28, 2025, 5:36pm

One month later, and still no response or solution from the developers. 750 of the 783 false-positive genes have LFC changes in another cell type, but not in the cell type in which they are identified as differentially expressed. This is clearly due to the normalization, as I showed in the previous post. The normalization procedure appears to borrow information from other cell types, leading to skewed normalization and incorrect differential expression calls in cell types that have no changes. Can the developers please address this issue?
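The tally above (false positives whose gene is altered in a different cell type) can be computed from simulation ground truth with a few lines of plain Python. The dictionary layout here is an assumption, not the poster's code.

```python
def false_positives_altered_elsewhere(detected, truly_changed):
    """Count false positives whose gene is truly changed in some other cell type.

    detected: dict cell_type -> set of genes called DE in that cell type.
    truly_changed: dict cell_type -> set of genes with a simulated LFC in that cell type.
    Returns (n_false_positives, n_altered_elsewhere).
    """
    n_fp = 0
    n_elsewhere = 0
    for ct, genes in detected.items():
        true_here = truly_changed.get(ct, set())
        # Union of genes that are truly changed in any *other* cell type.
        true_elsewhere = set().union(
            *(g for other, g in truly_changed.items() if other != ct)
        )
        for gene in genes - true_here:
            n_fp += 1
            if gene in true_elsewhere:
                n_elsewhere += 1
    return n_fp, n_elsewhere
```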

sulfur · June 27, 2025, 6:34am

Relatively new user here, but thankful for these detailed and readily applicable toolkits and workflows!

Similar to Joseph's original post, I have been noticing many negative bayes_factor values in recent results from my group, whereas previous iterations gave almost exclusively positive bayes_factor values. We believe that what we are seeing is the log of the Bayes factor. The is_de_fdr_0.05 column shows True for results with highly negative bayes_factor values.
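One way is_de_fdr_0.05 can be True while bayes_factor is negative: if the DE calls are made by controlling the posterior expected FDR using proba_de alone (my understanding of the approach, stated here as an assumption), the bayes_factor column plays no role in the call. A NumPy sketch of that selection rule, not the library's code:

```python
import numpy as np

def posterior_fdr_calls(proba_de, fdr_target=0.05):
    """Call genes DE by controlling the posterior expected FDR.

    Genes are ranked by proba_de (descending); the largest top-k set whose
    mean (1 - proba_de) is <= fdr_target is called DE.
    """
    p = np.asarray(proba_de, dtype=float)
    order = np.argsort(-p)
    expected_fdr = np.cumsum(1.0 - p[order]) / np.arange(1, p.size + 1)
    passing = np.nonzero(expected_fdr <= fdr_target)[0]
    k = passing[-1] + 1 if passing.size else 0
    calls = np.zeros(p.size, dtype=bool)
    calls[order[:k]] = True
    return calls
```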

From the extended table above, which includes only genes with is_de_fdr_0.05 = True (the labeled green dots in the volcano plot), proba_de does seem to be much greater than proba_not_de.
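If bayes_factor were the log posterior odds of "DE" vs. "not DE" (equal to the log Bayes factor under equal prior odds), it would be positive whenever proba_de > proba_not_de. A quick NumPy check that could be run on the results table, assuming the three columns are passed as arrays:

```python
import numpy as np

def log_bayes_factor(proba_de, proba_not_de, eps=1e-8):
    """Natural-log odds of "DE" vs. "not DE" (the log Bayes factor under equal prior odds)."""
    return np.log(np.asarray(proba_de) + eps) - np.log(np.asarray(proba_not_de) + eps)

def sign_mismatches(proba_de, proba_not_de, bayes_factor):
    """Row indices where bayes_factor disagrees in sign with log(proba_de / proba_not_de)."""
    expected = log_bayes_factor(proba_de, proba_not_de)
    return np.nonzero(np.sign(expected) != np.sign(np.asarray(bayes_factor)))[0]
```

With a pandas results table `de`, this might be called as `sign_mismatches(de["proba_de"], de["proba_not_de"], de["bayes_factor"])`.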

With the recent updates to the DE test, is it possible that something has changed in the bayes_factor column or in its interpretation? Would appreciate any help!

ori-kron-wis · July 9, 2025, 12:36pm

Thank you for raising this, and sorry for the late response.
bayes_factor should be positive for expressed genes, but it can become negative for highly underexpressed genes (with pseudocounts > 0).
We are checking what is wrong with the current default DE settings.
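A small illustration of why a pseudocount > 0 can push the Bayes factor negative for lowly expressed genes. It assumes the pseudocount is added to both normalized-expression values before taking the log ratio; the posterior samples are made up. With expression near 5e-6 and a pseudocount of 1e-4, a true 4x change (LFC = 2) shrinks to an LFC of about 0.2, so most posterior mass falls inside the "no change" region.

```python
import numpy as np

def lfc_with_pseudocount(rho1, rho2, pseudocount):
    """log2 fold change of normalized expression, with the same pseudocount added to both."""
    return np.log2(np.asarray(rho1) + pseudocount) - np.log2(np.asarray(rho2) + pseudocount)

def summarize(rho1, rho2, pseudocount, delta=0.25):
    """Median LFC and the posterior fraction with |LFC| > delta (the "change" mass)."""
    lfc = lfc_with_pseudocount(rho1, rho2, pseudocount)
    return float(np.median(lfc)), float(np.mean(np.abs(lfc) > delta))

# Hypothetical posterior samples for a very lowly expressed gene with a true 4x change.
rng = np.random.default_rng(0)
rho2 = rng.gamma(shape=50.0, scale=1e-7, size=2000)  # mean normalized expression ~5e-6
rho1 = 4.0 * rho2
for pc in (0.0, 1e-4):
    print(pc, summarize(rho1, rho2, pc))
```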

In the meantime, I suggest running the DE function with mode="vanilla" and pseudocounts=0.
Another option is to run with mode="change" (now the default) and force pseudocounts=0 and test_mode="two".
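A minimal sketch of the two suggested settings as keyword arguments. `groupby`, `group1`, and `group2` are regular `differential_expression` arguments; `pseudocounts` and `test_mode` are passed through as keyword arguments, as reported in this thread. The helper and dictionary names are hypothetical.

```python
# Hypothetical helper: the kwargs mirror the two workarounds suggested above.
WORKAROUNDS = {
    "vanilla": {"mode": "vanilla", "pseudocounts": 0.0},
    "change_two_way": {"mode": "change", "pseudocounts": 0.0, "test_mode": "two"},
}

def run_de(model, groupby, group1, group2=None, workaround="change_two_way"):
    """Call model.differential_expression with one of the suggested settings."""
    kwargs = dict(WORKAROUNDS[workaround])
    return model.differential_expression(
        groupby=groupby, group1=group1, group2=group2, **kwargs
    )
```

With a trained model this might be called as `run_de(model, groupby="cell_type", group1="B cells")`, where the column and label names are placeholders.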
