Label transfer with SCVI-SCANVI pipeline changes (predicts wrong) labels in ref data

Hi, I am following the tutorial "Integration and label transfer with Tabula Muris" on the scvi-tools docs page. Everything works fine, and I save the predicted labels in a new metadata column:

adata.obs["C_scANVI"] = lvae.predict(adata)  # save predicted labels in a new column

However, when I check the reference labels, some of them are predicted differently from what they were before I trained the SCVI model (that is, when I concatenated the reference with my query data).
I don’t understand why this happens.
My understanding is that I am training the SCVI model on the labelled reference data and then using SCANVI to transfer labels to the ‘unknown’ cells in the query dataset.
Why is it predicting some of the reference labels wrongly?
Any advice, please? Am I doing something wrong here?

As you can see in the screenshot, the label of the reference cell ‘Cell cycle_TCGGTCTGTGAGAGGG-1_32_1-1’ changed from Trm-c to CTL-c.

Can you describe how many of the training labels are wrong?

The predict function makes a prediction for every cell, including the reference cells, which by default are split into a 90% training set and a 10% validation set. Either scANVI is getting it wrong because something is systematically off, or there is noise in the training labels, or both.
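
To put a number on how many reference labels changed, something like the sketch below may help. It is only an illustration: the helper name summarize_label_changes and the toy label lists are made up for this example, and it assumes the unlabeled query cells carry the label "unknown". It counts the labeled cells whose prediction differs from the original label and tallies which (original, predicted) pairs occur.

```python
from collections import Counter


def summarize_label_changes(original, predicted, unlabeled="unknown"):
    """Compare original labels with predicted ones, skipping unlabeled (query) cells.

    Hypothetical helper for illustration; not part of scvi-tools.
    """
    # Keep only cells that had a real label before training.
    pairs = [(o, p) for o, p in zip(original, predicted) if o != unlabeled]
    # Tally every (original, predicted) pair where the two disagree.
    changed = Counter((o, p) for o, p in pairs if o != p)
    n_labeled = len(pairs)
    n_changed = sum(changed.values())
    fraction = n_changed / n_labeled if n_labeled else 0.0
    return n_labeled, n_changed, fraction, changed


# Toy stand-ins for the original and predicted label columns.
original = ["Trm-c", "Trm-c", "CTL-c", "CTL-c", "unknown", "unknown"]
predicted = ["CTL-c", "Trm-c", "CTL-c", "CTL-c", "CTL-c", "Trm-c"]

n_labeled, n_changed, fraction, changed = summarize_label_changes(original, predicted)
print(f"{n_changed} of {n_labeled} labeled cells changed ({fraction:.0%})")
for (old, new), count in changed.most_common():
    print(f"{old} -> {new}: {count}")
```

On real data, the two arguments would presumably be the original label column in adata.obs (its name depends on your setup) and adata.obs["C_scANVI"]. If the changes are concentrated in a few closely related label pairs, that points toward noisy or ambiguous training labels rather than a pipeline error.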
