Do I get it right that the main difference between the vanilla and change methods is that vanilla just compares the means (which one is larger), while the change method requires a certain distance between the means as well ( |lfc| > delta ) ?

How do the methods sample from posterior distribution? We provide the cell indices for the groups to compare, but what happens then? Looking here my guess is that n_samples random cells with replacement are drawn from each group, and their expression values are compared. Are those expression values corrected by the model, i.e. what I would get from get_normalized_expression()?

How batch is treated in this process? Looking at the code I can see that batch information is passed around, and initially taken from the model, if it is not provided by the caller, but I couldn’t figure out how it affects the output.

I apologize if I couldn’t find where you have this written already.
Thank you,
Best,
Nick

You are exactly right, the change option will filter out DE genes for which the means in both populations are close to one another.

As of now, we sample from the aggregate variational posteriors. In other words, to sample from a given population A, we iteratively (i) sample a cell x belonging to A (ii) obtain a posterior sample from q(z \mid x). The line you highlighted reflects this process.When n_samples is specified in scale_sampler, (i), (ii) will be performed n_samples times.

If you don’t specify batch indices for get_bayes_factors, two scenarios can happen.
If use_oserved_batches is set to False, then differential expression is done in two steps. First, we sample expression levels h_a, h_b conditioned on each batch for both idx_1, idx_2.
In a second time, to construct log-fold change samples, and make sure that the pair of samples came from the batch conditioning.
To work, this procedure assumes that the observed batch in the two conditions indeed intersect.
If use_observed_batches is set to True, then the procedure won’t try to harmonize batch effects and pairs of samples won’t be matched together.

Does this help you? Don’t hesitate if you have other questions!
Best,
Pierre

Thank you @PierreBoyeau. I just wanted to follow up on this point.

What Pierre is describing with get_bayes_factors is a lower-level function.

Basically if you pass batch_correction=True to differential_expression() then when it samples two cells (one from group A, one from group B), they are conditioned on the same batch category This process is done and averaged over all batch categories.

The effect of this is that the batch effect is “integrated out”, and hopefully shouldn’t affect the DE results.

This whole batch correction point is a bit confusing. I think the best reference for it might be the scANVI paper in Molecular Systems Bio.

Interesting discussion. Can I ask a question regarding the sampling process. I am fairly new to this tool, and use it now more and like it.

Q:
If we compare the differential gene expression between one clusters with two conditions - and we have 100 cells from condition_A and 1000 cells from condition_B . Condition_A has a normally distributed expression around 10 and treatment_B has also a normally distributed expression value around 8 (for the sake of the example, but this brings up another question how the expression is distributed in clusters in general after QC?). Then we (maybe) would a significant higher expression in condition_A than in condition_B - although the cell numbers are off right (this applies to all the method for DGE I guess)? Do we correct for that bias somehow, or is that something to observe and take care of?

Another question I would have is also regarding the sampling with replacement - how small (from the cell count) can clusters get to compare them - can we compare clusters with 10 cells each for DGE if we do sampling with replacement?

Thanks in advance for your feedback, really looking forward to understand the tool better,
best wishes,
Simon

Indeed, when we assess the LFC, it should be an average LFC per population so there are no issues with imbalance. That said, less cells (and less samples) usually means more noise, so of course it’s always better to use as many cells as possible.

Indeed, you can work with rare populations (in the paper we’re almost ready to put on arxiv, I think we’re looking at between 30 and 200 cells for benchmarking. Then, 10 may seem extreme but these number are highly impacted by how strong the signal is right? So it’s very hard to say.

If you have enough cells, the only drawback should be longer runtime.

Yes this is exact! Again, I’m excited that this is getting more traction because we’ve been looking into this for a while. I hope the new paper that’s coming up will clarify much better all the theoretical considerations to look at and all the details when performing DE with deep generative models.