Unexpected Effect of Augmentations on scVI Batch Integration

I’m experimenting with augmentations in scVI and getting results I can’t quite explain. When I apply augmentations such as random gene masking, cell swaps, and Poisson noise (training only, resampled at every training step), biological conservation scores (e.g. Leiden ARI and NMI) increase but batch correction scores (e.g. kBET) decrease. I’m confident the augmentations are applied correctly, and the scores are computed with scIB metrics. The results are consistent across different datasets and random seeds. Any intuition for why this happens?
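For concreteness, here is a minimal NumPy sketch of the kind of per-step augmentation I mean. It is not my actual training code and not part of the scVI API: the function name `augment_counts`, its parameters, and the default probabilities are all made up for illustration, and "cell swap" is shown as just one possible interpretation (overwriting a cell's counts with those of another random cell in the same minibatch).

```python
import numpy as np


def augment_counts(x, rng, mask_prob=0.1, swap_prob=0.1, add_poisson_noise=True):
    """Augment one minibatch of raw counts (cells x genes).

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    # Work on a float copy so the caller's minibatch is left untouched.
    x = np.asarray(x, dtype=np.float64).copy()
    n_cells, n_genes = x.shape

    # Random gene masking: zero each entry independently with probability mask_prob.
    keep = rng.random((n_cells, n_genes)) >= mask_prob
    x = x * keep

    # Poisson noise: resample each entry from a Poisson whose mean is the current count.
    if add_poisson_noise:
        x = rng.poisson(x).astype(np.float64)

    # Cell swaps (assumed interpretation): with probability swap_prob, overwrite a
    # cell's counts with those of a randomly chosen cell from the same minibatch.
    swap = rng.random(n_cells) < swap_prob
    partners = rng.integers(0, n_cells, size=n_cells)
    x[swap] = x[partners[swap]]

    return x


# Example: augment a toy minibatch of 8 cells x 20 genes.
rng = np.random.default_rng(0)
toy_batch = rng.poisson(5, size=(8, 20))
augmented = augment_counts(toy_batch, rng)
```

Each call draws fresh randomness, so every training step sees a different corrupted view of the minibatch, while validation and the final embedding use the unmodified counts. Note that in this sketch the batch labels are not touched by any of the augmentations.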
