Thank you very much for developing this amazing tool. I am confused about the output of the DE analysis and how DEGs are defined. According to the manuscript, DEGs were selected based on both p-value<0.05 and absolute log fold change>0.5 or 1. I understand that the model estimates each gene’s log fold change in each cell, but I do not understand why each cell has only one p-value per covariate (i.e., why there is no p-value for each gene in each cell). Are you keeping the cells whose adjusted p-values for the covariate of interest are <0.05, and then selecting the genes with absolute log fold change >0.5 among these cells as the DEGs for that trait? Thank you very much for your explanation.
Hi, pvalue and padj are p-values for the covariate-specific effect sizes in z-space, computed with a chi-squared statistic.
pde is similar to a p-value at the gene-by-cell level: it is the fraction of random samples that generated an LFC above 0.5. It is not uniformly distributed under the null hypothesis of no differential expression, so it is not a true p-value, but it does indicate how significant a result is. See lvm-DE for a deeper discussion of this aspect: https://www.pnas.org/doi/10.1073/pnas.2209124120.
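To make the idea concrete, here is a minimal sketch of how a quantity like pde could be computed from random LFC samples. This is not the MrVI implementation: the array `lfc_samples`, its shape, and the function name `de_probability` are hypothetical, the data are simulated, and the comparison is one-sided as described above (taking the absolute value first would give a two-sided variant).

```python
import numpy as np


def de_probability(lfc_samples, threshold=0.5):
    """Fraction of samples whose LFC exceeds `threshold`.

    lfc_samples: hypothetical array of shape (n_samples, n_cells, n_genes).
    Returns an array of shape (n_cells, n_genes) with values in [0, 1].
    """
    # Compare every sample to the threshold, then average over the sample axis.
    return (lfc_samples > threshold).mean(axis=0)


# Simulated data (an assumption for illustration): 200 samples, 4 cells, 3 genes.
rng = np.random.default_rng(0)
lfc_samples = rng.normal(loc=0.0, scale=0.3, size=(200, 4, 3))
lfc_samples[:, :, 0] += 1.0  # shift gene 0 so that it looks differentially expressed

pde = de_probability(lfc_samples)
print(pde.shape)  # one value per cell and gene
print(pde.round(2))
```

Because each entry is a fraction of samples and not a tail probability under a null distribution, it ranks gene-by-cell results by evidence but should not be read as a calibrated p-value.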
Thank you very much for your response. I am still confused about this. In the MrVI manuscript, where DEGs are defined by “p-value<0.05 and absolute logFC>1”, does the p-value refer to pde? If so, how does MrVI choose the pde threshold that controls FDR<0.05 when reporting significant DEGs, which is something lvm-DE takes care of? Should I run lvm-DE after running MrVI? Thank you very much. I truly appreciate your patience.