Time-point-specific batch correction

Thank you for this amazing tool!

I’ve been working mostly with time-series data (scRNA / scATAC), and I was wondering whether it is possible to run batch correction on each time point separately. For example, if I have a dataset consisting of 4 batches A, B, C, and D, where A and B belong to time t0 and C and D to time t1, I would like to batch correct between A and B and between C and D. However, I don’t want batches from different time points to be corrected against each other, because then the temporal component of development is lost.
Is something like this already implemented?

Thank you very much.

Hi there,

Thank you for using scvi-tools! If those 4 batches are all independent of one another and share no technical effects, I’m afraid there is no way to batch correct in the way you requested. However, if, say, A and C were from the same experimental batch and B and D were from the same experimental batch, then you could batch correct using this metadata. Otherwise, there is no way for the model to distinguish variation due to technical differences from variation due to biological differences (the temporal component in this case).
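Concretely, using the shared-run metadata means giving the model a batch label that identifies the experimental run rather than the sample. A minimal sketch, assuming (as in the hypothetical above) that A and C came from one run and B and D from another; the mapping dicts, the `run1`/`run2` labels and the helper functions are illustrative, and only the commented `setup_anndata`/`SCVI` calls refer to scvi-tools:

```python
# Hypothetical metadata: A and C were processed in one experimental run,
# B and D in another. Time point is tracked separately and NOT corrected.
SAMPLE_TO_TECH_BATCH = {"A": "run1", "B": "run2", "C": "run1", "D": "run2"}
SAMPLE_TO_TIME = {"A": "t0", "B": "t0", "C": "t1", "D": "t1"}


def technical_batches(sample_labels):
    """Map per-cell sample labels to shared technical-run labels."""
    return [SAMPLE_TO_TECH_BATCH[s] for s in sample_labels]


def runs_by_time(sample_labels):
    """Collect which technical runs occur at each time point."""
    out = {}
    for s in sample_labels:
        out.setdefault(SAMPLE_TO_TIME[s], set()).add(SAMPLE_TO_TECH_BATCH[s])
    return out


cells = ["A", "A", "B", "C", "D", "D"]
tech = technical_batches(cells)
print(tech)  # ['run1', 'run1', 'run2', 'run1', 'run2', 'run2']

# Every time point should contain every run; otherwise run and time are confounded.
by_time = runs_by_time(cells)
print({t: sorted(r) for t, r in by_time.items()})
# {'t0': ['run1', 'run2'], 't1': ['run1', 'run2']}

# With scvi-tools, the run label would be registered as the batch covariate, e.g.:
#   adata.obs["tech_batch"] = tech
#   scvi.model.SCVI.setup_anndata(adata, batch_key="tech_batch")
#   model = scvi.model.SCVI(adata)
#   model.train()
```

Time is deliberately not given to the model as a covariate, so temporal variation stays in the latent space. The `runs_by_time` check mirrors the caveat above: if a run appeared at only one time point, run and time would be confounded and the model could not separate them.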

Hi @Justin_Hong, thanks for your quick reply! It makes perfect sense that the model can’t correct the batch effects globally, i.e., considering A, B, C, and D jointly. However, if we’re only interested in correcting within-time-point batch effects (which should be mostly technical), isn’t there a way to do this? I’m not sure, but we may be able to assume that the batch effect from A → B (at t0) is similar to the one from C → D (at t1).
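One way to probe that assumption is to compare the centroid shifts A → B and C → D in a shared latent space. The sketch below is illustrative only: synthetic Gaussian data stands in for a latent representation (such as the output of an scVI model's `get_latent_representation()`), and `centroid_shift` and `cosine` are hypothetical helpers, not scvi-tools functions. The last line shows scGen-style latent arithmetic, reusing the t0 shift to move D toward C:

```python
import numpy as np


def centroid_shift(Z, labels, src, dst):
    """Mean latent displacement from batch `src` to batch `dst`."""
    labels = np.asarray(labels)
    return Z[labels == dst].mean(axis=0) - Z[labels == src].mean(axis=0)


def cosine(u, v):
    """Cosine similarity between two shift vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
# Synthetic 10-D latent space: t1 is offset from t0 (biology), and B and D
# share the same technical offset relative to A and C (the assumption).
tech = np.array([2.0, -1.0] + [0.0] * 8)
time = np.array([0.0] * 8 + [3.0, 3.0])
Z = np.vstack([
    rng.normal(size=(200, 10)),                # A (t0)
    rng.normal(size=(200, 10)) + tech,         # B (t0)
    rng.normal(size=(200, 10)) + time,         # C (t1)
    rng.normal(size=(200, 10)) + time + tech,  # D (t1)
])
labels = np.array(["A"] * 200 + ["B"] * 200 + ["C"] * 200 + ["D"] * 200)

d_t0 = centroid_shift(Z, labels, "A", "B")
d_t1 = centroid_shift(Z, labels, "C", "D")
print(round(cosine(d_t0, d_t1), 2))  # close to 1 when the shifts agree

# If the assumption holds, the t0 shift can be subtracted from D's cells;
# the time offset between t0 and t1 is left untouched.
Z_D_corrected = Z[labels == "D"] - d_t0
```

Centroid shifts also absorb differences in cell-type composition between batches, so a high cosine similarity is suggestive rather than proof that the technical effects match.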

Hi Justin, thank you. I tried your approach of combining A+C and B+D and batch correcting between the two groups, and the result looks really good.

However, this method is not applicable if the number of batches per time point varies. It would be nice to have an option to perform batch correction on subsets of cells independently.

Hi @ManuelGander and @Marius1311,

Thanks for sharing your results. Since scVI does not make any strict assumptions about the nature of technical variation, there is no clear way to apply batch correction to subsets of cells ‘independently’. The data would need some structure that ties batches across time points (or across any other axis of variation of interest) to allow for such a correction. Depending on the dataset, experimenting with affine transformations in the latent space across time points is not a bad idea (scGen uses a similar idea; see “scGen predicts single-cell perturbation responses”, Nature Methods). For a more principled approach, you may want to consider batch correction methods that make stronger assumptions about what batch effects look like, for example RUV (“Normalization of RNA-seq data using factor analysis of control genes or samples”, Nature Biotechnology). Hope this is of some help!
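RUV’s core idea is to estimate unwanted factors from control genes (or samples) that are assumed to be untouched by the biology of interest, and then regress those factors out of every gene. A heavily simplified RUVg-style sketch in NumPy, assuming a hypothetical log-expression matrix and a known set of control genes; it is not the RUVSeq implementation and ignores raw counts, offsets and known covariates:

```python
import numpy as np


def ruvg(log_expr, control_idx, k=1):
    """Simplified RUVg-style correction (illustrative sketch).

    log_expr    : (n_samples, n_genes) log-scale expression.
    control_idx : column indices of control genes, assumed to carry only
                  unwanted (technical) variation.
    k           : number of unwanted factors to estimate.
    Returns the corrected matrix and the estimated factors W.
    """
    Y = np.asarray(log_expr, dtype=float)
    Yc = Y[:, control_idx]
    Yc = Yc - Yc.mean(axis=0)                  # center each control gene
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    W = U[:, :k]                               # unwanted factors per sample
    alpha, *_ = np.linalg.lstsq(W, Y, rcond=None)  # per-gene loadings on W
    return Y - W @ alpha, W


# Synthetic data: 2 time points x 2 batches, 10 samples per combination.
rng = np.random.default_rng(1)
n, n_genes, n_ctrl = 40, 50, 20
time = np.repeat([0.0, 1.0], n // 2)
batch = np.tile([0.0, 1.0], n // 2)
beta = rng.normal(1.0, 0.2, size=n_genes)      # batch effect on every gene
gamma = np.zeros(n_genes)
gamma[n_ctrl:] = 2.0                           # time effect spares controls
Y = np.outer(time, gamma) + np.outer(batch, beta) + 0.1 * rng.normal(size=(n, n_genes))

Y_corr, W = ruvg(Y, np.arange(n_ctrl), k=1)

batch_gap = Y_corr[batch == 1].mean(axis=0) - Y_corr[batch == 0].mean(axis=0)
time_gap = Y_corr[time == 1].mean(axis=0) - Y_corr[time == 0].mean(axis=0)
print(np.abs(batch_gap).max())   # small: batch shift removed
print(time_gap[n_ctrl:].mean())  # near 2: temporal signal kept
```

The temporal signal survives here only because the synthetic batches are balanced within each time point and the control genes carry no time effect, which is exactly the kind of structural assumption the reply says such methods rely on.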

Thanks @Justin_Hong for these pointers, I’ll discuss this with @ManuelGander!