Dear totalVI community,
I’m encountering a puzzling issue while analyzing my spatial CITE-seq data (P_CD8) with totalVI, and I hope someone can help me interpret the results. Here are the key DE statistics for the gene/protein of interest:
```plaintext
Comparison: group1 (12) vs group2 (Rest)
raw_mean1: 1.038232 (group1)
raw_mean2: 0.351163 (group2)
non_zeros_proportion1: 0.567503 (group1)
non_zeros_proportion2: 0.266475 (group2)
lfc_mean: -0.094309 (close to zero)
bayes_factor: -1.45131 (supporting no difference)
is_de_fdr_0.05: False (not significant)
delta: 0.25
sample_protein_mixing: True (background set to zero)
```
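For reference, here is a sketch of the call that I believe produced these statistics (the model is a fitted `scvi.model.TOTALVI` instance; `"group_key"` is a placeholder for my actual grouping column):

```python
# Sketch of the DE call behind the table above; "group_key" is a placeholder
# for the obs column defining the groups, and "12" is group1's label.
de = model.differential_expression(
    groupby="group_key",
    group1="12",                  # group1 vs. all remaining cells ("Rest")
    mode="change",                # tests P(|LFC| > delta)
    delta=0.25,
    fdr_target=0.05,              # yields the is_de_fdr_0.05 column
    batch_correction=False,       # no batch effect in the data
    sample_protein_mixing=True,   # sample the protein background mixing
)
print(de.loc["P_CD8"])            # row for the feature of interest
```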
Key observations:
- Raw mean expression in group1 is ~3x higher than in group2, and the non-zero proportions differ by ~30 percentage points (0.57 vs. 0.27).
- However, the LFC mean is negligible (-0.09) and the bayes factor is negative, suggesting no significant difference.
- This contradicts my expectation that such large differences in raw expression and non-zero rates would translate into a significant DE result.
Questions:
- Why does a large raw mean difference not translate to a meaningful LFC? Is this due to the logarithmic transformation or the model’s handling of zero-inflation?
- How does totalVI integrate non-zero proportion differences into the DE test? Should I adjust parameters like `delta=0.25` or `sample_protein_mixing=True` for sparse protein data?
- Could the negative bayes factor indicate that the model favors the null hypothesis despite the raw differences? What assumptions might I be missing? (My attempt at reading the bayes factor is sketched below.)
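On the last point, my back-of-the-envelope reading (assuming that in change mode the bayes factor is the log-odds of P(|LFC| > delta), as I understand the scvi-tools docs):

```python
import math

bf = -1.45131  # bayes_factor from the table above
# If bf = log(p / (1 - p)) with p = P(|LFC| > delta), then the logistic
# function recovers p:
p = 1.0 / (1.0 + math.exp(-bf))
print(f"P(|LFC| > 0.25) ~= {p:.2f}")  # ~0.19, far below any DE threshold
```

If that reading is right, the model assigns only ~19% posterior probability to a fold change larger than delta, which would explain the non-significant call despite the raw-mean gap.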
Context:
- The data are spatial CITE-seq: 100 proteins and 20,000 genes measured on a single tissue slice.
- I used `empirical_protein_background_prior=True` and `batch_correction=False` (there is no batch effect in the data); the setup is sketched below.
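In case it clarifies things, here is a sketch of how the model was configured (the obsm key is a placeholder for where my protein counts live):

```python
import scvi

# Sketch of the model setup described above; "protein_expression" is a
# placeholder for the obsm key holding the raw protein counts.
scvi.model.TOTALVI.setup_anndata(
    adata,
    protein_expression_obsm_key="protein_expression",
)
model = scvi.model.TOTALVI(
    adata,
    empirical_protein_background_prior=True,  # as noted above
)
model.train()
```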
Any insights into the model’s behavior or suggestions for troubleshooting would be greatly appreciated! Thank you.
Best regards,
Ji