Different UMAPs for same dataset with scANVI

danamcc · July 20, 2024, 6:25pm

I am using scVI and scANVI to integrate datasets from two species. I am noticing a very big difference in species integration on the UMAP depending on whether scANVI is created as a new model or initialized from an existing scVI model (see code below). I have confirmed that the environment is the same and that the random seed is set to the same number in both scripts. The data is identical, and adata_subset is created with the same subset of highly variable genes.

Here is the code for when I train an scVI model first and then pass it to the scANVI model:
scvi.model.SCVI.setup_anndata(adata_subset, batch_key="10x_batch", layer="raw", labels_key="cell_type")
model = scvi.model.SCVI(adata_subset, dispersion="gene-batch")
model.train()
lvae = scvi.model.SCANVI.from_scvi_model(model, adata=adata_subset, labels_key="cell_type", unlabeled_category="Unknown")
lvae.train()
adata_subset.obsm["X_scANVI"] = lvae.get_latent_representation(adata_subset)
sc.pp.neighbors(adata_subset, use_rep="X_scANVI")
sc.tl.umap(adata_subset)

And here is the code for when I go directly to the scANVI model:
scvi.model.SCANVI.setup_anndata(adata_subset, labels_key="cell_type", unlabeled_category="Unknown", layer="raw", batch_key="10x_batch")
lvae = scvi.model.SCANVI(adata_subset, dispersion="gene-batch")
lvae.train()
adata_subset.obsm["X_scANVI"] = lvae.get_latent_representation(adata_subset)
sc.pp.neighbors(adata_subset, use_rep="X_scANVI")
sc.tl.umap(adata_subset)

Unfortunately I cannot share the UMAPs at this time, but hopefully this code is sufficient to hint at what could be happening. Thanks!
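
Both scripts above write to the same obsm["X_scANVI"] key, and sc.tl.umap overwrites obsm["X_umap"], so the two results cannot coexist in one AnnData object. A minimal sketch of one way to keep them apart for a side-by-side comparison follows. The helper name store_embedding and the key naming scheme are hypothetical, not part of scvi-tools or scanpy, and the model is assumed to be already trained.

```python
def store_embedding(adata, model, key, seed=0):
    """Store a trained model's latent space and its UMAP under run-specific keys.

    `key` is a hypothetical naming choice, e.g. "X_scANVI_pretrained".
    """
    # Imported lazily so the sketch can be read without scanpy installed.
    import scanpy as sc

    adata.obsm[key] = model.get_latent_representation(adata)

    # key_added keeps this run's neighbor graph separate from other runs.
    neighbors_key = f"{key}_neighbors"
    sc.pp.neighbors(adata, use_rep=key, key_added=neighbors_key)

    # Fix the UMAP seed too, so layout differences come from the embedding only.
    sc.tl.umap(adata, neighbors_key=neighbors_key, random_state=seed)

    # sc.tl.umap writes to obsm["X_umap"]; copy it before the next run overwrites it.
    adata.obsm[f"{key}_umap"] = adata.obsm["X_umap"].copy()
    return adata
```

Calling this once per trained model, for example with "X_scANVI_pretrained" and "X_scANVI_direct" as keys, leaves both latent spaces and both UMAP layouts in adata_subset, so they can be plotted next to each other from the same session.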

cane11 · July 22, 2024, 5:56am

It is expected that the two workflows yield different results. We highly recommend training scVI first and then training scANVI; training a classifier on a bad embedding can have unwanted side effects. If you find that training scANVI alone gives better embeddings, I would recommend looking into MrVI with a cell-type bias, which uses a slightly different strategy and is better tested in these cases.

danamcc · July 22, 2024, 4:21pm

Can you explain why they would yield different results? I thought that scANVI was an extension of the scVI model, and that the extra training time when you start scANVI without an scVI model simply included the scVI training.

Is there anything necessarily wrong with using scANVI without making the scVI model first? We had favorable results with going directly to scANVI.

I will look into MrVI too, thank you for the recommendation.

cane11 · July 22, 2024, 4:38pm

scANVI was designed around first pretraining an scVI model that does not take cell types into consideration. Instantiating scANVI directly does not train an scVI model first (see e.g. the "Seed labeling with scANVI" tutorial in the scvi-tools documentation for how to use scANVI correctly).
Training it directly with a classifier increases reliance on correct cell-type labels and might have negative side effects (I have observed this several times). In your case, relying on labels might be helpful, but I would then recommend the MrVI (or scPoli) approach to enforcing cell-type labels.
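
The pretrain-then-fine-tune workflow described above can be sketched roughly as follows. This is a minimal sketch, not the official tutorial code: the function name train_scanvi_pretrained is hypothetical, the scANVI epoch count is an illustrative value, and model hyperparameters such as dispersion are left at their defaults. The point it illustrates is that no labels are registered for the scVI step, so cell types only enter once scANVI is initialized from the trained scVI model.

```python
def train_scanvi_pretrained(adata, batch_key, labels_key, unlabeled_category,
                            layer=None, seed=0, scanvi_epochs=20):
    """Pretrain scVI without labels, then fine-tune scANVI from it.

    scanvi_epochs=20 is an illustrative assumption, not a recommendation.
    """
    # Imported lazily so the sketch can be read without scvi-tools installed.
    import scvi

    scvi.settings.seed = seed  # fix the seed before any model is built

    # Step 1: unsupervised scVI; no labels_key is registered here.
    scvi.model.SCVI.setup_anndata(adata, batch_key=batch_key, layer=layer)
    scvi_model = scvi.model.SCVI(adata)
    scvi_model.train()

    # Step 2: scANVI starts from the trained scVI weights; labels enter only now.
    scanvi_model = scvi.model.SCANVI.from_scvi_model(
        scvi_model,
        unlabeled_category=unlabeled_category,
        labels_key=labels_key,
    )
    scanvi_model.train(max_epochs=scanvi_epochs)
    return scvi_model, scanvi_model
```

With the names used earlier in this thread, the call would look like train_scanvi_pretrained(adata_subset, "10x_batch", "cell_type", "Unknown", layer="raw").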

danamcc · July 22, 2024, 5:06pm

Hmm, I see where I was confused. I was looking at the API here and its example showed going directly to scANVI. Is there a scenario where this usage is appropriate, or is the API outdated? Thank you!

cane11 · July 22, 2024, 5:26pm

That page is a demonstration of the API, not a tutorial. In particular, considering "Metric Mirages in Cell Embeddings", I am wary of increasing the effect that cell-type labels have on embeddings.

danamcc · July 22, 2024, 6:47pm

This is a lot to think about, thank you very much!
