Discrepancy between raw mean/non-zero proportion and LFC with model.differential_expression

easyeryiji · May 29, 2025, 2:54am

Dear totalVI community,

I’m encountering a puzzling issue while analyzing my spatial CITE-seq data (P_CD8) with totalVI, and I hope someone can help me interpret the results. Here’s the key data for the protein of interest:

Comparison: group1 (12) vs group2 (Rest)
raw_mean1: 1.038232 (group1)
raw_mean2: 0.351163 (group2)
non_zeros_proportion1: 0.567503 (group1)
non_zeros_proportion2: 0.266475 (group2)
lfc_mean: -0.094309 (close to zero)
bayes_factor: -1.45131 (supporting no difference)
is_de_fdr_0.05: False (not significant)
delta: 0.25 
sample_protein_mixing: True (background set to zero)

Key observations:

Raw mean expression in group1 is ~3x higher than in group2, and the non-zero proportions differ by ~30 percentage points (0.57 vs. 0.27).
However, the LFC mean is negligible (-0.09), and the Bayes factor is negative, suggesting no significant difference.
This contradicts my expectation that large differences in raw expression and non-zero rates would translate into a significant DE result.
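As a quick sanity check on the numbers above, the raw fold change can be put on the same scale as lfc_mean. This is a minimal sketch that assumes lfc_mean is reported on a log2 scale (which I believe is the scvi-tools convention); the function name is made up for illustration.

```python
import math

# Values copied from the DE output above.
raw_mean1, raw_mean2 = 1.038232, 0.351163
non_zeros_proportion1, non_zeros_proportion2 = 0.567503, 0.266475


def raw_log2_fold_change(mean1, mean2):
    """Naive log2 fold change of two raw means (no model, no normalization)."""
    return math.log2(mean1 / mean2)


raw_lfc = raw_log2_fold_change(raw_mean1, raw_mean2)
nonzero_gap = non_zeros_proportion1 - non_zeros_proportion2

print(f"raw fold change:      {raw_mean1 / raw_mean2:.2f}")
print(f"raw log2 fold change: {raw_lfc:.2f}")
print(f"non-zero gap:         {nonzero_gap:.2f} (absolute difference in proportion)")
```

The raw log2 fold change comes out at roughly 1.5, so the gap between it and the modeled lfc_mean of -0.09 is much larger than a log transformation alone could explain.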

Questions:

Why does a large raw mean difference not translate to a meaningful LFC? Is this due to the logarithmic transformation or the model’s handling of zero-inflation?
How does totalVI integrate non-zero proportion differences into the DE test? Should I adjust parameters like delta=0.25 or sample_protein_mixing=True for sparse protein data?
Could the negative Bayes factor indicate that the model favors the null hypothesis despite the raw differences? What assumptions might I be missing?
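On the last question, it may help to translate the Bayes factor into a probability. The sketch below assumes that, in "change" mode, bayes_factor is the natural log of p / (1 - p), where p is the posterior probability that |LFC| exceeds delta; that is my understanding of the scvi-tools convention, and the helper name is hypothetical.

```python
import math


def proba_de_from_bayes_factor(bayes_factor):
    """Invert bayes_factor = ln(p / (1 - p)) to recover p (assumed convention)."""
    return 1.0 / (1.0 + math.exp(-bayes_factor))


bayes_factor = -1.45131  # value reported in the DE output above
proba_de = proba_de_from_bayes_factor(bayes_factor)

print(f"implied P(|LFC| > delta): {proba_de:.2f}")
print(f"implied P(|LFC| <= delta): {1 - proba_de:.2f}")
```

Under that assumption, a negative Bayes factor means the model puts more than half of its posterior mass on the effect being smaller than delta, and a value of zero would correspond to an even split.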

Context:

The data are spatial CITE-seq data; we measure 100 proteins and 20,000 genes in one tissue slice.
I used empirical_protein_background_prior=True and batch_correction=False (no batch effect in the data).

Any insights into the model’s behavior or suggestions for troubleshooting would be greatly appreciated! Thank you.

Best regards,
Ji

ori-kron-wis · July 9, 2025, 3:54pm

Can you share more information on how you ran this? In particular, which scvi-tools version you used and exactly how you called the DE function, as there might be an issue in the most recent scvi-tools version.
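One way to collect that information is sketched below. It uses only the standard library to look up installed package versions; the package list is a guess at what is relevant. The commented call at the end is a hypothetical reconstruction from the parameters mentioned in the question (the `model` object, the `"cluster"` column, and the `"12"` label are assumptions), not the poster's actual code.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("scvi-tools", "anndata", "scanpy", "torch")):
    """Return {package: version string}, or 'not installed' when missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


versions = report_versions()
for name, ver in versions.items():
    print(f"{name}: {ver}")

# Hypothetical DE call to paste alongside the versions (names are assumptions):
# de = model.differential_expression(
#     groupby="cluster",
#     group1="12",
#     delta=0.25,
#     batch_correction=False,
#     sample_protein_mixing=True,
# )
```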

cane11 · July 10, 2025, 12:10pm

I would assume the model has learned that this signal is consistent with background. This can happen, for example, when comparing cells that carry Fc receptors and show high nonspecific antibody binding with cells that show low nonspecific antibody binding. Overall, if you expect a difference and a simpler approach such as t-test DE gives you a reasonable one, the fact that totalVI does not report it is only a weak argument against the difference being real. Experimental validation would be key in these instances.
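To illustrate the background explanation, here is a toy calculation. It is a deliberately simplified sketch: each group's expected protein count is treated as a two-component mixture of background and foreground, and the DE quantity is assumed to be the foreground contribution only (which, as far as I understand, is what totalVI compares when background is excluded). The real model uses negative binomial mixtures, and all parameter values below are invented so that the raw means land close to the ones in the question.

```python
import math


def expected_protein_signal(p_background, background_rate, foreground_rate):
    """Return (raw_mean, foreground_only) for a two-component mixture."""
    raw_mean = (
        p_background * background_rate + (1 - p_background) * foreground_rate
    )
    foreground_only = (1 - p_background) * foreground_rate
    return raw_mean, foreground_only


# Invented parameters: the groups differ only in their background rate.
raw1, fg1 = expected_protein_signal(0.9, background_rate=0.80, foreground_rate=3.2)
raw2, fg2 = expected_protein_signal(0.9, background_rate=0.03, foreground_rate=3.2)

raw_lfc = math.log2(raw1 / raw2)
fg_lfc = math.log2(fg1 / fg2)

print(f"raw means:           {raw1:.3f} vs {raw2:.3f}")
print(f"raw log2 FC:         {raw_lfc:.2f}")
print(f"foreground log2 FC:  {fg_lfc:.2f}")
```

In this toy setup the raw means differ about threefold while the foreground fold change is exactly zero, which is the pattern a background-aware model would report as no differential expression.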
