Transferring labels from a reference dataset to a query dataset with a more diverse cell population using scANVI

pouria · July 25, 2024, 4:51am

Hello,

Firstly, thank you for providing such an excellent set of tools and tutorials.

I’m currently using scANVI to transfer cell labels from a reference dataset of NK cells (labeled according to NK cell subtypes) to a query dataset consisting of CD45+ cells from peripheral blood. In the initial results, all cells in the query dataset have been labeled as different NK cell subtypes.

I have a few questions regarding this process:

Is it problematic to use a reference dataset with a much narrower variety of cells than the query dataset? Ideally, I only want to label the NK cells in the query dataset.
Can I restrict the labeling to cells that meet a higher probability threshold?
When using scanvi.predict to return probabilities, what would be a reliable probability threshold for considering a label accurate?
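For reference, the mechanics of such a cutoff can be sketched as below. This is a minimal sketch, assuming `probs` is a cells × labels table of per-cell probabilities such as the DataFrame that scANVI's `predict(..., soft=True)` returns. The function name `threshold_soft_predictions`, the "Unassigned" label, the NK subtype names and the 0.9 cutoff are all hypothetical. The cutoff does not make the probabilities calibrated. Because every query cell must be assigned one of the reference labels, non-NK cells can still exceed any threshold.

```python
import pandas as pd

def threshold_soft_predictions(probs: pd.DataFrame, threshold: float = 0.9,
                               unassigned: str = "Unassigned") -> pd.Series:
    """Return each cell's most probable label, or `unassigned` when the
    top probability is below `threshold` (hypothetical helper)."""
    top_label = probs.idxmax(axis=1)   # column (label) with the highest probability per cell
    top_prob = probs.max(axis=1)       # that highest probability
    # Keep the argmax label only where it clears the cutoff.
    return top_label.where(top_prob >= threshold, unassigned)

# Toy soft predictions for three cells and two hypothetical NK subtypes.
probs = pd.DataFrame(
    {"NK_CD56bright": [0.95, 0.55, 0.07],
     "NK_CD56dim":    [0.05, 0.45, 0.93]},
    index=["cell1", "cell2", "cell3"],
)
print(threshold_soft_predictions(probs, threshold=0.9).tolist())
# ['NK_CD56bright', 'Unassigned', 'NK_CD56dim']
```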

Thank you for your assistance.

cane11 · July 25, 2024, 5:23am

Hi, scANVI was not designed for this use case. Specifically, its probabilities are not calibrated, and it cannot predict a cell type that is absent from the reference.
The number of tools that can detect query-specific cell types is quite limited. To address this need, we developed popV and have tested it, with good results, in similar settings beyond those described in the manuscript (GitHub: YosefLab/PopV). You will likely need to disable the cell ontology, because the NK cell subsets are not part of the Cell Ontology. We have tested it in these settings, and it was rather straightforward to find a good decision boundary: usually, agreement among more than 5 of the 7 algorithms indicates a confidently transferred label.
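The agreement-based decision boundary can be illustrated with a simple per-cell majority vote. This is a simplified sketch of the idea, not popV's implementation. It assumes each cell has one predicted label per method, and it reads "more than 5 of 7" as `min_agree=6`. The function name and the label strings are hypothetical.

```python
from collections import Counter

def consensus_label(predictions, min_agree=6, unassigned="Unassigned"):
    """Majority vote over the labels predicted by several methods for one cell.

    Returns (label, n_agree). The label becomes `unassigned` when fewer than
    `min_agree` methods agree on the most common label (hypothetical helper).
    """
    label, n_agree = Counter(predictions).most_common(1)[0]
    if n_agree < min_agree:
        return unassigned, n_agree
    return label, n_agree

# Seven methods, six of which agree: kept as a confident label.
print(consensus_label(["NK_dim"] * 6 + ["NK_bright"]))
# Only four of seven agree: flagged as unassigned.
print(consensus_label(["NK_dim"] * 4 + ["NK_bright"] * 3))
```

Returning the agreement count along with the label lets you inspect how results change as `min_agree` is moved, instead of fixing a single cutoff in advance.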

pouria · July 31, 2024, 7:12am

Thank you so much, will give this a try!

pouria · July 31, 2024, 7:26am

Do you think there would be any added value in first annotating NK cells with PopV using the built-in cell ontology, to narrow the query dataset down to NK cells only, and then, in a second run, skipping the cell ontology and using the reference dataset with the >5/7 algorithm boundary you suggest?

cane11 · August 1, 2024, 4:12pm

I think both approaches will, at best, give very similar results. I assume NK cells are defined more stringently in your dataset, since these are the cells you are interested in. I would go directly to your reference dataset, as I disagreed with some T cell labels in Tabula Sapiens.

pouria · September 27, 2024, 8:07am

I read your paper on bioRxiv; I appreciate the work and would love to use PopV. For some reason my installation doesn’t work: the proper modules can’t be found when I follow your tutorials. The Google Colab notebooks also don’t work. Do you have any suggestions? Thank you for all your help thus far.

pouria · September 27, 2024, 8:09am

I should emphasise that I’m an MD and not a bioinformatician, so this could be a very easy fix; I lack basic coding skills.

cane11 · September 27, 2024, 5:33pm

Thanks for your interest in using it. I’m likewise an MD, so no worries. There were some outdated Colab notebooks in circulation. Can you try this one: Google Colab. I just ran it on Colab and no issues showed up. Can you confirm that you are using the same link?

pouria · October 17, 2024, 7:53am

Sorry for the delay in responding; I have been occupied with experiments in the lab. Thank you for sending the latest Google Colab. I will try it out and get back to you when I find the time. Really appreciate all your help!
