Discrepancy between raw mean/non-zero proportion and LFC with model.differential_expression

Dear totalVI community,

I’m encountering a puzzling issue while analyzing my spatial CITE-seq data (P_CD8) with totalVI, and I hope someone can help me interpret the results. Here is the key output for the gene/protein of interest:

Comparison: group1 (12) vs group2 (Rest)
raw_mean1: 1.038232 (group1)
raw_mean2: 0.351163 (group2)
non_zeros_proportion1: 0.567503 (group1)
non_zeros_proportion2: 0.266475 (group2)
lfc_mean: -0.094309 (close to zero)
bayes_factor: -1.45131 (supporting no difference)
is_de_fdr_0.05: False (not significant)
delta: 0.25 
sample_protein_mixing: True (background set to zero)

Key observations:

  1. Raw mean expression in group1 is ~3x higher than in group2, and the non-zero proportions differ by ~30 percentage points (0.57 vs. 0.27).
  2. However, lfc_mean is negligible (-0.09) and the Bayes factor is negative, which points to no differential expression.
  3. This contradicts my expectation that large differences in raw expression and non-zero rates would translate into a significant DE result.
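
To make the gap concrete, here is the naive fold change computed directly from the raw means reported above. This is only a back-of-the-envelope sketch: it ignores library size, protein background, and any normalization the model applies, and it assumes that lfc_mean is reported on a log2 scale.

```python
import math

# Values copied from the DE output above.
raw_mean1, raw_mean2 = 1.038232, 0.351163
nonzero1, nonzero2 = 0.567503, 0.266475

# Naive fold change on raw means (no normalization, no background removal).
fold_change = raw_mean1 / raw_mean2
naive_log2_fc = math.log2(fold_change)

# Absolute difference in detection rate, in proportion units.
nonzero_diff = nonzero1 - nonzero2

print(f"fold change:   {fold_change:.2f}")
print(f"naive log2 FC: {naive_log2_fc:.2f}")
print(f"non-zero diff: {nonzero_diff:.2f}")
```

The raw means alone give a fold change of roughly 3x (about 1.6 on a log2 scale), whereas the model reports an lfc_mean of -0.09.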

Questions:

  • Why does a large raw mean difference not translate to a meaningful LFC? Is this due to the logarithmic transformation or the model’s handling of zero-inflation?
  • How does totalVI integrate non-zero proportion differences into the DE test? Should I adjust parameters like delta=0.25 or sample_protein_mixing=True for sparse protein data?
  • Could the negative Bayes factor indicate that the model favors the null hypothesis despite the raw differences? What assumptions might I be missing?

Context:

  • The data is spatial CITE-seq, with 100 proteins and 20,000 genes measured in one tissue slice.
  • I used empirical_protein_background_prior=True and batch_correction=False (no batch effect in the data).
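
A minimal sketch of a DE call with the settings listed in this post, for anyone who wants to reproduce the setup. The helper name run_de and the groupby column name "cluster" are hypothetical placeholders, the cluster label is assumed to be stored as the string "12", and model is assumed to be an already trained scvi.model.TOTALVI instance.

```python
def run_de(model, groupby="cluster"):
    """Run totalVI DE for group "12" vs. the rest with the settings from this post.

    `model` is assumed to be a trained scvi.model.TOTALVI that was built with
    empirical_protein_background_prior=True. `groupby` is a placeholder for
    the adata.obs column holding the group labels.
    """
    return model.differential_expression(
        groupby=groupby,
        group1=["12"],
        group2=None,  # None compares group1 against all remaining cells
        delta=0.25,
        batch_correction=False,
        sample_protein_mixing=True,
    )
```

Calling run_de(model) should return a DataFrame with columns such as lfc_mean, bayes_factor, raw_mean1, and raw_mean2, as in the output shown earlier.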

Any insights into the model’s behavior or suggestions for troubleshooting would be greatly appreciated! Thank you.

Best regards,
Ji