Scvi-tools label transfer accuracy

AkilaRanjith · June 9, 2023, 1:30am

Hello All,

I have annotated my data containing 400k cells, and now I have query data consisting of 100k cells. I would like to perform label transfer for this data using scvi-tools.

During label transfer with scANVI, I noticed that some classical clusters were not predicted accurately. For example, the reference data has 51 clusters, but only 30 of them appear among the predicted labels. Is there a way to improve the classification of these clusters by adjusting other parameters in the model? Please suggest any potential solutions.

Thanks,
Akila

martinkim0 · June 10, 2023, 5:36am

Hi, thank you for your question. Some parameters you may consider changing:

- Passing in `n_samples_per_label=x` to `SCANVI.train` when training on the reference data. This will enable label subsampling such that the model will sample x observations from each cell type label at the start of each epoch. The consequence of this is that rare cell types will be sampled more frequently, which can significantly affect your model’s performance depending on the distribution of cell types in your dataset. In my experience testing this out, it leads to more stable classifier performance.
- Setting `linear_classifier=True` when initializing `SCANVI`. The default classifier includes multiple layers and could be overfitting on the training data, so a simpler linear classifier might help. I would try plotting the validation accuracy and/or classification loss during training to compare both options.

Both of these options are available in the latest version of scvi-tools (1.0.0).
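
A minimal sketch of how the two suggestions could be combined when training on the reference data, assuming scvi-tools 1.0.0 or later is installed. The helper name `train_reference_scanvi`, the `"Unknown"` placeholder for the unlabeled category, and the default values (100 samples per label, 50 epochs) are illustrative assumptions, not recommendations from this thread.

```python
def train_reference_scanvi(
    adata,
    labels_key,
    *,
    unlabeled_category="Unknown",  # assumption: a name no real cell type uses
    n_samples_per_label=100,       # assumption: illustrative value, tune per dataset
    linear_classifier=True,
    max_epochs=50,                 # assumption: illustrative value
):
    """Train SCANVI on an annotated reference AnnData (hypothetical helper)."""
    # Imported inside the function so that defining the helper
    # does not itself require scvi-tools to be installed.
    import scvi

    # Register the label column and the placeholder for unlabeled cells.
    scvi.model.SCANVI.setup_anndata(
        adata,
        labels_key=labels_key,
        unlabeled_category=unlabeled_category,
    )

    # linear_classifier=True swaps the multi-layer classifier for a linear one.
    model = scvi.model.SCANVI(adata, linear_classifier=linear_classifier)

    # n_samples_per_label enables per-label subsampling at each epoch;
    # check_val_every_n_epoch=1 logs validation metrics every epoch.
    model.train(
        max_epochs=max_epochs,
        n_samples_per_label=n_samples_per_label,
        check_val_every_n_epoch=1,
    )
    return model
```

Calling the helper twice, once with `linear_classifier=True` and once with `False`, and then inspecting each model's `history` attribute is one way to compare the logged validation metrics for the two classifiers.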

AkilaRanjith · June 15, 2023, 5:32am

Hello,

Thank you for your reply. I have used both the linear classifier and the n_samples_per_label parameter. These adjustments improved the accuracy: 90% of cell labels in the reference data are now predicted correctly.

I have a naive question regarding another dataset. This dataset consists of 100k cells, where 90% were used for training and 10% for testing the model. I already know the class labels for this test dataset. The model successfully predicted the test set and reference set with high confidence (soft=True).
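
To make the held-out check concrete: with soft=True the prediction is a set of per-label probabilities for each cell rather than a single label. Below is a minimal sketch of turning such probabilities into a label plus a confidence value and scoring them against known labels. It assumes the probabilities have already been converted to one dict per cell; the function names, the 0.5 threshold, the "uncertain" placeholder, and the toy values are all illustrative assumptions.

```python
def summarize_soft_predictions(soft_rows, min_confidence=0.5):
    """Return (label, confidence) per cell; low-confidence cells become 'uncertain'.

    soft_rows: list of dicts mapping label -> predicted probability for one cell.
    """
    summary = []
    for probs in soft_rows:
        best_label = max(probs, key=probs.get)  # label with the highest probability
        confidence = probs[best_label]
        if confidence < min_confidence:
            best_label = "uncertain"
        summary.append((best_label, confidence))
    return summary


def accuracy_on_confident_calls(summary, true_labels):
    """Fraction of confidently called cells whose label matches the known label."""
    confident = [
        (pred, truth)
        for (pred, _), truth in zip(summary, true_labels)
        if pred != "uncertain"
    ]
    if not confident:
        return 0.0
    return sum(pred == truth for pred, truth in confident) / len(confident)


# Toy values for three held-out cells (made up for illustration).
soft_rows = [
    {"T cell": 0.92, "B cell": 0.05, "NK": 0.03},
    {"T cell": 0.40, "B cell": 0.35, "NK": 0.25},
    {"T cell": 0.10, "B cell": 0.85, "NK": 0.05},
]
true_labels = ["T cell", "NK", "B cell"]

summary = summarize_soft_predictions(soft_rows)
confident_accuracy = accuracy_on_confident_calls(summary, true_labels)
```

Reporting the share of "uncertain" cells alongside the accuracy keeps a high score from hiding a large number of cells the model could not call.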

Now I would like to use this trained model to predict the class labels for another dataset of approximately 200k cells, about twice the size of the reference dataset. Is it feasible to apply the same method? How should I tune the parameters for this new dataset?

Thank you,
Akila
