Weird UMAP after running scVI

to-be-so-lonely · August 7, 2024, 8:01am

Hello, there! Thank you for developing such a great tool.

I’m analyzing my single-cell RNA sequencing dataset (about 2,000K cells) using scanpy and scVI. After running scVI and plotting the UMAP, the plot looks weird. The code I ran is below:

import anndata as ad
import scanpy as sc
import scvi

# merging the per-sample AnnData objects
adata = ad.concat([adata_dict[key] for key in adata_dict.keys()], merge = "same")

# subsampling to reduce computing time
sc.pp.subsample(adata, n_obs = 100000, random_state = 42)

# preprocessing
sc.pp.filter_cells(adata, min_counts = 1000)
sc.pp.filter_cells(adata, min_genes = 10)
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, exclude_highly_expressed = True, target_sum = 1e6)
sc.pp.log1p(adata)
adata.raw = adata.copy()

# selecting HVGs and running scVI
sc.pp.highly_variable_genes(adata, n_top_genes = 3000, layer = "counts", flavor = "seurat_v3", batch_key = "chemistry")
scvi.model.SCVI.setup_anndata(adata, layer = "counts", categorical_covariate_keys = ["sample", "chemistry"])
model_scvi = scvi.model.SCVI(adata, n_hidden = 128, n_latent = 20, n_layers = 2, gene_likelihood = "nb")
model_scvi.train()
adata.obsm["X_scVI"] = model_scvi.get_latent_representation()

# plotting UMAP
sc.pp.neighbors(adata, use_rep = "X_scVI", random_state = 42)
sc.tl.leiden(adata, random_state = 42, key_added = "leiden_scVI", resolution = 1.0)
sc.tl.umap(adata, random_state = 42)
sc.pl.umap(adata, color = "cell type")

and the UMAP looks like this:
[image: umap]

I tried

choosing different numbers of HVGs (3,000 to 6,000)
choosing different values of n_latent when creating the model
normalizing and log-transforming before merging
defining max_epochs as np.min([round((20000 / adata.n_obs) * 400), 400]), as written in the scVI tutorial and the single-cell best practices paper
running scVI on both the entire dataset and the subsampled dataset (both yielded the same result)

but none of them worked. Also, I made sure that the gene expression matrix scVI used was composed of raw counts. Running Harmony seems to work well, but I have no idea why my UMAP looks like this after running scVI. Any ideas or comments would be appreciated.
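For reference, the max_epochs heuristic quoted in the list above can be evaluated for a few dataset sizes. This is only a sketch of the arithmetic: the function name `heuristic_max_epochs` and the `cap` argument are made up for illustration, and the built-in `min` stands in for `np.min` on a two-element list.

```python
def heuristic_max_epochs(n_obs: int, cap: int = 400) -> int:
    """Epoch heuristic quoted in the post (hypothetical helper name).

    Scales 400 epochs by 20000 / n_obs and caps the result at `cap`.
    """
    return min(round((20000 / n_obs) * 400), cap)


# Evaluate the heuristic for a small, a subsampled, and a very large dataset.
for n_obs in (10_000, 100_000, 2_000_000):
    print(n_obs, heuristic_max_epochs(n_obs))
```

With this formula, the 100,000-cell subsample gets 80 epochs, while a dataset of roughly 2,000,000 cells gets only 4.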

Best,

cane11 · August 7, 2024, 8:27am

Hi, can you check how many counts the cells contain after highly variable gene selection? It might be a problem with UMAP (a small number of outlier cells that are hard to see). Can you add plots for the 20 latent dimensions or a 2D PCA plot of your latent dimensions?
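The two checks requested here could be sketched roughly as follows. This is a NumPy-only illustration on toy arrays, not code from the thread: the helper names `counts_within_hvgs` and `pca_2d` are hypothetical, and the count matrix is assumed to be dense.

```python
import numpy as np


def counts_within_hvgs(counts, hvg_mask):
    """Total raw counts per cell, restricted to the selected genes.

    `counts` is assumed to be a dense (cells x genes) array.
    """
    counts = np.asarray(counts)
    return counts[:, np.asarray(hvg_mask, dtype=bool)].sum(axis=1)


def pca_2d(latent):
    """Project a (cells x dims) latent matrix onto its first two principal components."""
    latent = np.asarray(latent, dtype=float)
    centered = latent - latent.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Toy data (hypothetical): 4 cells x 5 genes, with 3 genes flagged as highly variable.
toy_counts = np.array([[5, 0, 2, 1, 0],
                       [0, 0, 0, 9, 0],
                       [3, 1, 4, 0, 2],
                       [0, 7, 0, 0, 0]])
toy_mask = np.array([True, False, True, False, True])
print(counts_within_hvgs(toy_counts, toy_mask))  # cells with 0 here carry no signal in the HVGs
```

With an AnnData object, the inputs would presumably be `adata.layers["counts"]` (converted to dense if it is sparse, which can be memory-hungry), the boolean column `adata.var["highly_variable"]`, and `adata.obsm["X_scVI"]` for the PCA. Note that the posted `sc.pp.highly_variable_genes` call does not pass `subset = True`, so the mask would have to be applied explicitly.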

to-be-so-lonely · August 7, 2024, 9:08am

Thank you for your reply.

I’m not sure what you mean by the number of counts the cells contain after the “HVG selection”:

np.max(adata.obs["n_counts"])
192466.0
np.min(adata.obs["n_counts"])
1000.0

Although I didn’t perform QC on this dataset, when preprocessing the original dataset I removed cells whose n_counts or n_genes was above the 99th percentile or below the 1st percentile (this dataset was downsampled to 1000K cells).

I’m not sure I got your purpose…
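The percentile-based filtering described in this post might look roughly like the sketch below. It is an assumption about how such a filter could be written, not the poster's actual code; `within_percentiles` is a hypothetical helper, and the metric arrays are toy numbers.

```python
import numpy as np


def within_percentiles(values, low=1.0, high=99.0):
    """Boolean mask of entries between the `low` and `high` percentiles (inclusive)."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [low, high])
    return (values >= lo) & (values <= hi)


# Toy per-cell metrics (hypothetical numbers, not from the dataset in the thread).
n_counts = np.arange(1, 101) * 100.0
n_genes = np.arange(1, 101) * 10.0

# Keep a cell only if both metrics fall inside the 1st to 99th percentile range.
keep = within_percentiles(n_counts) & within_percentiles(n_genes)
print(keep.sum(), "of", keep.size, "cells kept")
```

A filter like this bounds total counts per cell, but it says nothing about how many of those counts fall within the highly variable genes, which is what the earlier question was about.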

cane11 · September 9, 2024, 8:18pm

The PCA looks fine, which was my expectation. The model has captured variation in your data. UMAPs can look wildly off if the neighborhood graph contains disconnected components. This is a problem with UMAP. You can increase the number of neighbors (n_neighbors in sc.pp.neighbors), which usually helps with those misleading UMAP displays.
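To illustrate the point about disconnected graphs, here is a small NumPy-only sketch that counts connected components of a k-nearest-neighbor graph for two values of k. It uses a brute-force kNN search and a union-find as stand-ins; scanpy builds its graph differently, and the helper names `knn_indices` and `count_components` as well as the toy "latent space" are assumptions for illustration.

```python
import numpy as np


def knn_indices(X, k):
    """Brute-force k nearest neighbors (excluding self) for each row of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def count_components(neighbors):
    """Count connected components of the kNN graph, treating edges as undirected."""
    parent = list(range(neighbors.shape[0]))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(neighbors.shape[0]):
        for j in neighbors[i]:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[ri] = rj
    return len({find(i) for i in range(len(parent))})


# Toy latent space (hypothetical): a chain of 30 cells plus 5 far-away outlier cells.
main = np.arange(30, dtype=float).reshape(-1, 1)
outliers = 1000.0 + np.arange(5, dtype=float).reshape(-1, 1)
latent = np.vstack([main, outliers])

for k in (3, 10):
    print("k =", k, "components =", count_components(knn_indices(latent, k)))
```

In this toy case the outlier group forms its own component at k = 3 and is joined to the main group at k = 10, because each outlier then needs more neighbors than its own group can supply. On real data, a comparable check could probably be run with `scipy.sparse.csgraph.connected_components` on `adata.obsp["connectivities"]` after `sc.pp.neighbors`.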

christophechu · September 11, 2024, 7:29am

Set the number of PCs to 20 using n_pcs=20 when you run sc.pp.neighbors.

cane11 · September 12, 2024, 6:21pm

I’m confused. You don’t need to set n_pcs when using scVI embeddings but should run it on all latent dimensions (see our tutorials).

christophechu · September 12, 2024, 6:37pm

You are right. In this case, n_latent = 20.
