scANVI-inferred cell types don't make sense

Hi, I used scANVI to infer the labels of unknown cells from a trained scVI model. I then checked a small cluster to compare the original cell type (`celltype_manual`) with the cell type inferred by scANVI (`C_scANVI`):

```python
adata.obs[adata.obs['leiden_0.2'] == '19'][['celltype_manual', 'C_scANVI']].drop_duplicates()
```
|celltype_manual|C_scANVI|
|---|---|
|Neuron|Endothelial|
|Neuron|Ependymal|
|Neuron|SMC|
|Neuron|Neuron|
|Neuron|Fibroblast|
|Neuron|OPC|
|unknown|Endothelial|
|unknown|Ependymal|
|unknown|Immune_cells|
|unknown|SMC|
|unknown|Oligodendrocyte|
|unknown|Astrocyte|
|Oligodendrocyte|Endothelial|
|Oligodendrocyte|Oligodendrocyte|

The `celltype_manual` labels come from the original metadata of published datasets, so they should be treated as ground truth here. However, the cell types inferred by scANVI look random. Even if they were correct, such different cell types shouldn't end up together in one small cluster (#19).
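Because `.drop_duplicates()` only lists which (reference, predicted) pairs occur, it cannot show whether a mismatch such as Neuron → Endothelial affects one cell or most of the cluster. A count table is more informative. The sketch below assumes `adata.obs` is a pandas DataFrame holding both label columns; `label_agreement` is a hypothetical helper, not part of scvi-tools. It reports the fraction of labeled cells whose prediction matches `celltype_manual`, plus a reference-by-prediction count table. Since the labeled cells were used to train the scANVI classifier, low agreement on them would suggest a problem with the classifier itself rather than with the unknown cells.

```python
import pandas as pd


def label_agreement(obs: pd.DataFrame, true_key: str, pred_key: str, unlabeled: str = "unknown"):
    """Hypothetical helper: compare reference and predicted labels on labeled cells only.

    Returns (accuracy, table): the fraction of labeled cells whose prediction matches
    the reference label, and a count table (reference labels as rows, predictions as columns).
    """
    # Cast to plain strings: comparing two categorical columns with different
    # category sets raises a TypeError in pandas.
    true = obs[true_key].astype(str)
    pred = obs[pred_key].astype(str)
    labeled = true != unlabeled  # skip cells that have no reference label
    accuracy = float((true[labeled] == pred[labeled]).mean())
    table = pd.crosstab(true[labeled], pred[labeled])
    return accuracy, table
```

To restrict the check to cluster 19, pass `adata.obs[adata.obs["leiden_0.2"] == "19"]` as `obs`; passing all of `adata.obs` shows whether the disagreement is specific to this cluster.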

My code for setting up and training `scanvi_model` simply follows the label transfer tutorial:

```python
import scvi

scanvi_model = scvi.model.SCANVI.from_scvi_model(model, adata=adata, labels_key="celltype_manual", unlabeled_category="unknown")
scanvi_model.train(max_epochs=20, n_samples_per_label=100, accelerator="gpu")
```

Could you let me know what might be wrong? The only possible problem I can think of is that I didn't set `labels_key` to `celltype_manual` when setting up the scVI model. Is that the reason?

Please also try a k-nearest-neighbor (KNN) classifier in the latent space (see, e.g., the PopV code for a simple classifier). We and others have found that the scANVI classifier sometimes doesn't perform well, especially in scvi-tools releases before 1.1.
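A minimal sketch of a KNN classifier in latent space, assuming `latent` is the scVI latent representation (for example from `model.get_latent_representation()`) and `labels` holds `celltype_manual`, with `"unknown"` marking unlabeled cells. `knn_transfer` is a hypothetical NumPy helper written for illustration; it is not the PopV implementation. Each unlabeled cell receives the majority label among its `k` nearest labeled cells by Euclidean distance, and labeled cells keep their reference label.

```python
from collections import Counter

import numpy as np


def knn_transfer(latent, labels, unlabeled="unknown", k=15):
    """Hypothetical helper: majority-vote KNN label transfer in a latent space."""
    latent = np.asarray(latent, dtype=float)
    labels = np.asarray(labels, dtype=object)
    ref = labels != unlabeled  # cells with a reference label
    if not ref.any():
        raise ValueError("no labeled cells to transfer from")
    ref_latent, ref_labels = latent[ref], labels[ref]
    k = min(k, len(ref_labels))
    out = labels.copy()
    for i in np.flatnonzero(~ref):
        # Brute-force Euclidean distances from this unlabeled cell to all labeled cells
        dist = np.linalg.norm(ref_latent - latent[i], axis=1)
        nearest = np.argsort(dist)[:k]
        # Majority vote among the k nearest labeled neighbors
        out[i] = Counter(ref_labels[nearest]).most_common(1)[0][0]
    return out
```

For example, `adata.obs["C_knn"] = knn_transfer(model.get_latent_representation(), adata.obs["celltype_manual"])` (the `C_knn` column name is arbitrary) gives a second set of predictions to compare against `C_scANVI`. The brute-force loop keeps the sketch short; for atlas-scale data, a library nearest-neighbor implementation such as scikit-learn's `KNeighborsClassifier` would be more practical.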
