Impact of batch on TotalVI results

JenniferF · January 4, 2025, 3:30am

Good evening, and thank you for maintaining TotalVI! We are applying a pre-existing model and classifier built with TotalVI to a new dataset. This new dataset includes samples across several cancer types, patients, techniques (multiome, single-cell, single-nucleus) and labs. Some cancers have samples from multiple techniques, while others do not. We would like to compare our cell types across cancers as well as possible. When we set the batch variable to different metadata columns, we obtain very different results. Since our conclusions will differ depending on what we ultimately choose as batch, we would like to better understand what the batch variable is doing.

Are you able to provide guidance or information on what the batch variable does in the model, to help us select the best batch option? Any input is very much appreciated, thank you!

ori-kron-wis · January 5, 2025, 10:32am

Hey Jennifer,

Selecting the best batch option also depends on what you are looking for. For example, if you are analysing the biological signal in your data, you should use only technical variables as batch (donor, site, etc.), because anything passed as batch is treated as unwanted variation and the model will try to remove it. You can also add a flag indicating which technique(s) exist for each sample, as this may be another source of bias.
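Before committing to a batch column, it can help to check whether the biology you want to compare (here, cancer type) is entangled with it. Below is a minimal pandas sketch, assuming per-cell metadata with hypothetical columns `cancer`, `technique` and `lab` (standing in for `adata.obs`). It flags a group as fully confounded when every batch it appears in contains no other group: in that case nothing in the data links that batch to the rest, so group-specific differences may be partly absorbed as batch effect.

```python
import pandas as pd

# Hypothetical per-cell metadata standing in for adata.obs; column names are assumptions.
obs = pd.DataFrame({
    "cancer":    ["A", "A", "A", "B", "B", "C", "B"],
    "technique": ["sc", "sn", "multiome", "sc", "sc", "sn", "sn"],
    "lab":       ["lab1", "lab1", "lab2", "lab2", "lab3", "lab3", "lab1"],
})

# Candidate composite batch: every lab/technique combination is its own batch.
obs["lab_technique"] = (obs["lab"] + "_" + obs["technique"]).astype("category")


def fully_confounded_groups(obs: pd.DataFrame, batch_col: str, bio_col: str) -> list:
    """Return groups observed only in batches that contain no other group."""
    # Batches holding cells from exactly one biological group.
    groups_per_batch = obs.groupby(batch_col, observed=True)[bio_col].nunique()
    exclusive = set(groups_per_batch[groups_per_batch == 1].index)

    flagged = []
    for group, batches in obs.groupby(bio_col, observed=True)[batch_col]:
        # Flag the group if all of its batches are exclusive to it.
        if set(batches.unique()) <= exclusive:
            flagged.append(group)
    return sorted(flagged)


for candidate in ["lab", "technique", "lab_technique"]:
    print(candidate, fully_confounded_groups(obs, candidate, "cancer"))
```

In this toy table, neither `lab` nor `technique` alone fully confounds any cancer, but the finer `lab_technique` key isolates cancer C, which illustrates how a more granular batch can remove more biology. Whichever column you settle on would then be passed as `batch_key` to `scvi.model.TOTALVI.setup_anndata`; a technique flag could alternatively go in `categorical_covariate_keys`, if your scvi-tools version supports that for TOTALVI.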

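As to what batch_key specifically does, the details depend on the model, but in all cases it tries to remove variation that originates from technical factors, i.e. unwanted variation. In totalVI, as in scVI, the batch label is given to the neural networks alongside the latent representation, so the decoder can account for batch-specific differences and the latent space does not need to encode them.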
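A toy NumPy sketch of this conditioning idea, assuming a single linear layer followed by a softmax (the real totalVI decoder is a learned multi-layer network, and all weights here are hypothetical random values). The batch label enters as a one-hot vector concatenated to the latent vector `z`. Decoding the same `z` under two batch labels gives different expression profiles, and in this linear toy the difference is a gene-wise log-scale offset that does not depend on `z`. Because the decoder can explain batch differences on its own, the model tends not to store them in `z`, which is why changing the batch column changes the latent space and anything built on it, such as a classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
n_latent, n_batch, n_genes = 3, 2, 5

# Hypothetical toy weights; totalVI learns nonlinear multi-layer decoders instead.
W_z = rng.normal(size=(n_latent, n_genes))     # effect of the latent (biological) state
W_batch = rng.normal(size=(n_batch, n_genes))  # batch-specific offsets


def one_hot(index: int, n: int) -> np.ndarray:
    """Encode a batch index as a one-hot vector of length n."""
    v = np.zeros(n)
    v[index] = 1.0
    return v


def decode(z: np.ndarray, batch_index: int) -> np.ndarray:
    """Map latent z plus a one-hot batch label to normalized gene proportions."""
    x = np.concatenate([z, one_hot(batch_index, n_batch)])
    logits = x @ np.vstack([W_z, W_batch])
    logits = logits - logits.max()  # numerical stability before the softmax
    p = np.exp(logits)
    return p / p.sum()


z = rng.normal(size=n_latent)  # one cell's latent representation
p_batch0 = decode(z, 0)        # the cell decoded as if measured in batch 0
p_batch1 = decode(z, 1)        # the same cell decoded as if measured in batch 1
print(np.round(p_batch0, 3))
print(np.round(p_batch1, 3))
```

The same "decode as if in another batch" idea surfaces in scvi-tools as the `transform_batch` argument of `get_normalized_expression`.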

See the following resources for more information:

- Comparing steps of Scanpy for scRNQ-seq and totalvi for CITE-seq (scvi-tools; tags: totalvi), October 8, 2021
- Running TOTALVI data in which subset of cells do not have citeseq data (scvi-tools; tags: integration, totalvi), March 25, 2021
- Training split conditioned on batch_key (scvi-tools; tags: scvi), May 22, 2024
- All genes or highly variable genes? (scvi-tools; tags: gene-selection, scvi, totalvi), March 31, 2022
- Failure to remove a batch_key/ effect of number of LVs (scvi-tools; tags: integration, scvi), February 9, 2024
