Good evening, and thank you for maintaining TotalVI! We are applying a pre-existing model and classifier built with TotalVI to a new dataset. The dataset includes samples spanning several cancer types, patients, techniques (multiome, single-cell, single-nuclei), and labs. Some cancers have samples from multiple techniques, while others do not. We would like to compare cell types across cancers as well as we can. When we run the model with batch set to different variables, we obtain very different results. Because our conclusions will depend on which variable we ultimately choose as batch, we would like to better understand what the batch variable is doing.
Could you provide guidance or point us to information on what the batch variable does in the model, so that we can select the best batch option? Any input is very much appreciated, thank you!