Firstly, thank you for providing such an excellent set of tools and tutorials.
I’m currently using scANVI to transfer cell labels from a reference dataset of NK cells (labeled according to NK cell subtypes) to a query dataset consisting of CD45+ cells from peripheral blood. In the initial results, all cells in the query dataset have been labeled as different NK cell subtypes.
I have a few questions regarding this process:
Is it problematic to use a reference dataset with a much narrower variety of cells than the query dataset? Ideally, I only want to label the NK cells in the query dataset.
Can I restrict the labeling to cells that meet a higher probability threshold?
When using scanvi.predict to return probabilities, what would be a reliable probability threshold to consider the labeling as accurate?
Hi, scANVI was not developed for this use case. Specifically, its probabilities are not calibrated, and it cannot predict a cell type that was not observed in the reference.
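To make the threshold question concrete, here is a minimal sketch of what a probability cutoff would look like mechanically. It assumes you already have a cells-by-labels table of class probabilities (for example from scvi-tools' `predict(soft=True)`); the toy table, the subtype names, the helper `mask_low_confidence`, and the 0.9 cutoff are all hypothetical. Because the probabilities are not calibrated, a cutoff like this only hides low-scoring calls. It does not make the remaining labels trustworthy, and a non-NK cell can still receive a high NK probability.

```python
import pandas as pd


def mask_low_confidence(probs: pd.DataFrame, threshold: float = 0.9,
                        unknown: str = "Unknown") -> pd.Series:
    """Return the top label per cell, or `unknown` if its probability is below threshold."""
    best_label = probs.idxmax(axis=1)  # label with the highest probability per cell
    best_prob = probs.max(axis=1)      # that highest probability
    return best_label.where(best_prob >= threshold, unknown)


# Toy probabilities (hypothetical subtype names, not real model output).
probs = pd.DataFrame(
    {"NK_CD56bright": [0.95, 0.40, 0.10], "NK_CD56dim": [0.05, 0.60, 0.90]},
    index=["cell_1", "cell_2", "cell_3"],
)

labels = mask_low_confidence(probs, threshold=0.9)
print(labels)
```

Only `cell_2` falls below the cutoff here and is marked as unknown; the cutoff itself is arbitrary and would need to be checked against marker genes.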
The number of tools that can detect query-specific cell types is quite limited. To address this need, we developed popV and have tested it in similar cases beyond those in the manuscript, with good results: GitHub - YosefLab/PopV. You will likely need to disable the cell ontology, as the NK cell subsets are not part of the Cell Ontology. We have tested it in these settings, and it was rather straightforward to find a good decision boundary (usually, agreement among >5/7 algorithms highlights a confidently transferred label).
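The agreement rule mentioned above can be sketched in plain pandas. This is not popV's API, only an illustration of the idea: it assumes a hypothetical table with one column of predicted labels per method, takes the majority label for each cell, and keeps it only when more than 5 of the 7 methods agree (read here as at least 6). The column names, labels, and the helper `consensus_labels` are made up for the example.

```python
import pandas as pd


def consensus_labels(preds: pd.DataFrame, min_agree: int = 6,
                     unknown: str = "Unassigned") -> pd.DataFrame:
    """Majority vote across methods, kept only if at least `min_agree` methods agree."""
    majority = preds.mode(axis=1)[0]                    # most frequent label per cell
    agreement = preds.eq(majority, axis=0).sum(axis=1)  # number of methods voting for it
    label = majority.where(agreement >= min_agree, unknown)
    return pd.DataFrame({"majority": majority, "agreement": agreement, "label": label})


# Toy predictions from 7 hypothetical methods for 3 cells.
methods = [f"method_{i}" for i in range(1, 8)]
preds = pd.DataFrame(
    [
        ["NK_1"] * 7,                 # unanimous
        ["NK_2"] * 6 + ["NK_1"],      # 6 of 7 agree
        ["NK_1"] * 4 + ["NK_2"] * 3,  # weak majority, likely not a confident transfer
    ],
    index=["cell_a", "cell_b", "cell_c"],
    columns=methods,
)

result = consensus_labels(preds, min_agree=6)
print(result)
```

Cells that fail the agreement cutoff, like `cell_c` here, are the candidates for query-specific populations such as the non-NK CD45+ cells.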
Do you think there would be any added value in annotating NK cells first using PopV with the built-in cell ontology, to narrow the query dataset down to only NK cells, and then, in a second run, not using the cell ontology but using the reference dataset and the >5/7 algorithm boundary as you suggest?
I think both results will, in the best case, be very similar. I assume NK cells are defined more stringently in your dataset, as you are interested in these cells. I would go directly to your reference dataset, as I disagreed with some T cell labels in Tabula Sapiens.
I read your paper on bioRxiv. I appreciate the work and would love to use PopV. For some reason my installation doesn’t work: it can’t find the proper modules when I follow your tutorials. The Google Colab notebooks don’t work either. Do you have any suggestions? Thank you for all your help thus far.
Thanks for your interest in using it. I’m likewise an MD, so no worries. There were some outdated Colab notebooks in circulation. Can you try: Google Colab. I just ran it on Colab and no issues showed up. Can you confirm that you are using the same link?
Sorry for my lag in response; I have been occupied with experiments in the lab. Thank you for sending the latest Google Colab. I will try it out and get back to you when I find the time. Really appreciate all your help!