SCANVI soft labeling

Anand · February 15, 2022, 11:04pm

I trained scVI and scANVI models on a dataset. To test the probabilities produced by soft labeling, I withheld one cluster from training. My expectation was that the model would classify cells in this cluster with low confidence (a lower max probability). I repeated this exercise 10 times, each time training on all clusters except the ‘Unseen cluster’. Here is what I get:

In the left plot, I show the fraction of cells with max probability below 0.95, with each of the 10 runs colored separately. As expected, this fraction is quite low (~2%) among training cells and higher among cells from the ‘Unseen cluster’. In the right plot, I show the median and the 25th to 75th percentile range of the max probability for the 10 runs, for training cells and ‘Unseen’ cells.
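For anyone who wants to compute the same two summaries, here is a minimal sketch of how they could be derived from a matrix of soft predictions (cells x labels), such as the values returned by a soft prediction call. The function name `summarize_max_prob` and the toy numbers are hypothetical and only for illustration; they are not the code or the results behind the plots.

```python
import numpy as np


def summarize_max_prob(soft_probs, threshold=0.95):
    """Summarize per-cell confidence from a (cells x labels) probability matrix.

    Hypothetical helper: returns the fraction of cells whose max probability
    is below `threshold`, plus the 25th, 50th and 75th percentiles of the
    max probability.
    """
    soft_probs = np.asarray(soft_probs, dtype=float)
    # Confidence of each cell = probability of its most likely label.
    max_prob = soft_probs.max(axis=1)
    q25, median, q75 = np.percentile(max_prob, [25, 50, 75])
    return {
        "frac_below": float((max_prob < threshold).mean()),
        "q25": float(q25),
        "median": float(median),
        "q75": float(q75),
    }


# Toy input: 4 cells, 3 labels (made-up numbers, not results from this thread).
toy = np.array([
    [0.98, 0.01, 0.01],
    [0.60, 0.30, 0.10],
    [0.02, 0.97, 0.01],
    [0.40, 0.35, 0.25],
])
summary = summarize_max_prob(toy)
print(summary)
```

Calling this once per run, separately for training cells and for the withheld cells, would give one point per run for each plot.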

What I find strange is how much the output of SCANVI.predict varies across 10 runs on the same training data. Is variation this large the result of expected stochasticity between model runs?

adamgayoso · August 8, 2022, 7:26am

Thanks for sharing this. Indeed, this is weird, and we are looking into improvements to scANVI’s classifier component. It also seems related to:

Topic		Replies	Views
Posterior probability of being assigned to a specific label scvi-tools scanvi	4	592	July 28, 2021
scANVI relables known cells with known types incorrectly scvi-tools scanvi	13	1861	April 18, 2023
SCANVI inferred cell types don't make sense scvi-tools scanvi	1	91	October 17, 2024
SCANVI: cannot reproduce predicted cell types as in the tutorial scvi-tools scanvi	1	462	October 5, 2022
Predicting of unassigned cells using scANVI scvi-tools scanvi , scvi	0	267	December 10, 2023
