"Vanilla" vs "Change" Mode in Differential Expression

I’m struggling to understand the differences between “vanilla” and “change” mode for differential expression. Are there certain types of datasets where it would be better to use one over the other? I am looking for heterogeneity within the same cell type of glia.

Also, am I correct that the differential_expression function wants the raw layer of anndata? I’ve been setting adata.X = adata.layers['raw'].copy() prior to running, but is there a different way to indicate the correct layer to use for the calculation? Or is this step redundant because I chose the raw layer in the setup_anndata step?


Hi Dana,

The two modes reflect different definitions of the term ‘differential expression’. 'vanilla' estimates the probability that a gene’s expression is higher in population A than in population B. 'change' infers the distribution of the log fold change between population A and population B.

Practically, if you are looking for markers that distinguish a cell population, 'vanilla' is essentially quantifying how ‘separable’ a population is based on a gene.

When working with experimental setups, such as drug treatments, effect sizes as quantified by log fold change are often directly tied to the conditions of the experiment (e.g., dosage). Fold changes between treatments and conditions are generally how effects are quantified throughout many fields of biology. So working in the statistical framework of fold changes allows you to quantitatively relate results to the rest of experimental biology.
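
To make the two definitions concrete, here is a minimal sketch in plain Python. It does not call the library and is not how the model computes things internally. The sampled expression values, the helper names vanilla_probability and change_probability, and the threshold delta=0.25 are all assumptions made for illustration. The point is only that a gene can be almost always higher in A (high 'vanilla' probability) while its fold change stays small (low probability of a meaningful 'change').

```python
import math
import random


def vanilla_probability(samples_a, samples_b):
    """Fraction of paired samples in which expression in A exceeds B."""
    pairs = list(zip(samples_a, samples_b))
    return sum(a > b for a, b in pairs) / len(pairs)


def change_probability(samples_a, samples_b, delta=0.25):
    """Fraction of paired samples whose |log2 fold change| exceeds delta.

    delta is an illustrative threshold, not a recommendation.
    """
    lfcs = [math.log2(a / b) for a, b in zip(samples_a, samples_b)]
    return sum(abs(lfc) > delta for lfc in lfcs) / len(lfcs)


rng = random.Random(0)

# Hypothetical samples of normalized expression for a single gene:
# A is consistently, but only slightly, higher than B.
samples_a = [rng.lognormvariate(0.1, 0.05) for _ in range(5000)]
samples_b = [rng.lognormvariate(0.0, 0.05) for _ in range(5000)]

p_vanilla = vanilla_probability(samples_a, samples_b)
p_change = change_probability(samples_a, samples_b)

print(f"P(A > B)           = {p_vanilla:.2f}")
print(f"P(|LFC| > 0.25)    = {p_change:.2f}")
```

In this toy case the first probability comes out high and the second low, which is the situation where the two definitions disagree: the gene separates the populations well but the effect size is modest.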

Regarding your second question, I’m afraid I don’t know.

Hope this helps!

That does clear things up a lot!
No worries on the second question, I can just leave a possibly redundant line in.
Thank you!