Clustering on the scVI latent space generates only gray-colored cells

Hi,

I’d like to cluster my data on the scVI latent space using the following code:

import os

import scanpy as sc
import scvi

adata = sc.read_h5ad(filename='%s/%s.h5ad'%(infolder,gse))

sc.pp.highly_variable_genes(
    adata,
    n_top_genes=1200,
    subset=True,
    layer="counts",
    flavor="seurat_v3",
    batch_key="gse",
) 
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",
    categorical_covariate_keys=["gse"],
    continuous_covariate_keys=['pct_counts_mt', 'total_counts']
)
os.chdir(infolder)
model = scvi.model.SCVI(adata)
model.train()
latent = model.get_latent_representation()
adata.obsm["X_scVI"] = latent
adata.layers["scvi_normalized"] = model.get_normalized_expression(library_size=10e4)
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata, min_dist=0.3)
SCVI_CLUSTERS_KEY = "leiden_scVI"
sc.tl.leiden(adata, key_added=SCVI_CLUSTERS_KEY, resolution=0.5)
sc.pl.umap(
    adata,
    color=[SCVI_CLUSTERS_KEY],
    frameon=False,
    save='_10x_gse_leiden_bc.png'
)

However, in the resulting plot every cell in every cluster is gray (see attached figure).

Please advise how to get a different color for each cluster. Thanks.

Hi, I believe this happens because the data contain so many clusters that they cannot all be given clearly distinguishable colors. One way around it is to run Leiden clustering at a lower granularity (a smaller resolution value) so that you end up with fewer clusters.
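
To check whether this is what is happening, it may help to count the clusters first. The sketch below is only an illustration: summarize_clusters and MAX_DEFAULT_COLORS are hypothetical names, and the threshold of 102 is an assumption based on my understanding that scanpy falls back to a uniform grey once a categorical has more categories than its largest default palette (roughly 100 colors).

```python
from collections import Counter

# Assumption: scanpy's largest built-in categorical palette has about 100 colors.
MAX_DEFAULT_COLORS = 102


def summarize_clusters(labels, max_colors=MAX_DEFAULT_COLORS):
    """Count clusters and single-cell clusters in a sequence of cluster labels."""
    sizes = Counter(labels)
    return {
        "n_clusters": len(sizes),
        "n_singletons": sum(1 for n in sizes.values() if n == 1),
        "exceeds_palette": len(sizes) > max_colors,
    }


# Usage with the AnnData object from the post (not executed here):
# summary = summarize_clusters(adata.obs["leiden_scVI"].tolist())
# print(summary)
# If exceeds_palette is True, try a smaller resolution, for example:
# sc.tl.leiden(adata, key_added="leiden_scVI", resolution=0.2)
```

A large number of single-cell clusters would suggest that the problem is isolated cells rather than the resolution itself.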


I would assume you have very low-quality cells with only a few counts left after highly variable gene filtering. These create the dispersed points in the UMAP and lead to the large number of clusters. Please increase the number of highly variable genes and check your filtering of low-count cells (a minimum of around 500 or 1000 counts should be fine for 10x v3; other technologies need different numbers). If you want to keep your current settings, increasing n_neighbors in sc.pp.neighbors might help.
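
A rough sketch of both suggestions, assuming the raw counts are in adata.layers["counts"] as in the original post. keep_mask and recluster are hypothetical helper names, and n_neighbors=30 is just an example value above scanpy's default of 15.

```python
def keep_mask(total_counts, min_counts=500):
    """Return True for each cell whose total counts reach min_counts."""
    return [count >= min_counts for count in total_counts]


def recluster(adata, min_counts=500, n_neighbors=30, resolution=0.5):
    """Drop low-count cells, then rebuild the neighbor graph with more neighbors."""
    import numpy as np
    import scanpy as sc

    # Per-cell counts over the genes that remain after HVG selection.
    totals = np.asarray(adata.layers["counts"].sum(axis=1)).ravel()
    adata = adata[np.array(keep_mask(totals, min_counts))].copy()

    # Reuses the existing scVI latent space stored in adata.obsm["X_scVI"].
    sc.pp.neighbors(adata, use_rep="X_scVI", n_neighbors=n_neighbors)
    sc.tl.umap(adata, min_dist=0.3)
    sc.tl.leiden(adata, key_added="leiden_scVI", resolution=resolution)
    return adata
```

It would probably be cleaner to apply the count filter before setup_anndata and retrain the model, so that the latent space is learned without the low-count cells.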


Thank you. I tried this.

Thank you. I also tried this. The dispersed points in UMAP turned out to be duplicates.
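
For anyone who runs into the same thing, here is one possible way to look for duplicated cells. This is only a sketch: find_duplicate_rows is a hypothetical helper that groups cells with identical count vectors, and it assumes the count matrix is small enough to convert to a dense list of rows.

```python
from collections import defaultdict


def find_duplicate_rows(rows):
    """Return groups of row indices that share exactly the same values."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row)].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Usage with the AnnData object from the post (not executed here):
# counts = adata.layers["counts"]
# dense = counts.toarray() if hasattr(counts, "toarray") else counts
# duplicate_groups = find_duplicate_rows(dense.tolist())
# Duplicated barcodes can be checked separately with adata.obs_names.duplicated().
```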