Choosing a Clustering Resolution

danamcc · February 13, 2023, 11:43pm

Hi,
I have a large dataset with no ground truth for what the clusters should be, so I can’t use annotation-based validation. I’m having difficulty choosing an appropriate resolution for Leiden clustering. I know that Seurat can determine clusters with a power analysis; is there anything similar in scanpy or scvi to validate the probability that clusters are distinct from each other? Otherwise I’m making a more manual decision: calculating DEGs at different resolutions and checking whether enough of them exceed an LFC threshold.
Thanks!
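For reference, the manual check described above could be sketched roughly as follows. This is only an illustration: the function names, the LFC cutoff of 1.0, and the adjusted p-value cutoff of 0.05 are assumptions, not values from this thread. It counts, per cluster, how many genes pass both thresholds, so a cluster with very few passing genes can be flagged as a candidate for merging.

```python
import numpy as np

def count_degs_per_cluster(groups, logfoldchanges, pvals_adj,
                           min_lfc=1.0, max_padj=0.05):
    """Count genes per cluster that pass both thresholds.

    Each position in the three inputs describes one (cluster, gene) test.
    min_lfc and max_padj are illustrative defaults, not recommendations.
    """
    groups = np.asarray(groups).astype(str)
    lfc = np.asarray(logfoldchanges, dtype=float)
    padj = np.asarray(pvals_adj, dtype=float)
    # NaN values compare as False, so untested genes are never counted.
    passed = (lfc >= min_lfc) & (padj <= max_padj)
    return {str(g): int(passed[groups == g].sum()) for g in np.unique(groups)}

def weakest_cluster(counts):
    """Return the cluster label with the fewest passing genes."""
    return min(counts, key=counts.get)

# Toy input: two clusters, five (cluster, gene) tests in total.
counts = count_degs_per_cluster(
    groups=["0", "0", "1", "1", "1"],
    logfoldchanges=[2.0, 0.5, 1.5, 3.0, 2.0],
    pvals_adj=[0.01, 0.01, 0.2, 0.001, 0.04],
)
```

With scanpy, the three inputs would typically be the `group`, `logfoldchanges`, and `pvals_adj` columns of the table returned by `sc.get.rank_genes_groups_df(adata, group=None)` after running `sc.tl.rank_genes_groups` on one of the Leiden keys (assuming a reasonably recent scanpy version). Repeating this per resolution gives a comparable count for each clustering.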

yotamcons · February 17, 2023, 7:44am

I’d also love to learn what the standard tools are for choosing this parameter.

I’ve found that the dendrogram tool is quite helpful:

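import scanpy as sc

# Cluster at two resolutions (assumes sc.pp.neighbors(adata) has already been run)
sc.tl.leiden(adata, key_added="leiden_res0_25", resolution=0.25)
sc.tl.leiden(adata, key_added="leiden_res0_5", resolution=0.5)

# Hierarchically cluster the Leiden groups and plot the tree for each resolution
sc.tl.dendrogram(adata, groupby="leiden_res0_25")
sc.pl.dendrogram(adata, groupby="leiden_res0_25")

sc.tl.dendrogram(adata, groupby="leiden_res0_5")
sc.pl.dendrogram(adata, groupby="leiden_res0_5")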

Visualization can also help guide the choice:

sc.pl.umap(adata, color=["leiden_res0_25", "leiden_res0_5"], legend_loc="on data")

More generally, there are also elbow and silhouette scores.
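As a rough sketch of how a silhouette score could be used to compare resolutions: the helper names below are hypothetical, and the code assumes Euclidean distances on a low-dimensional embedding. It builds a full pairwise distance matrix, so it is only practical on a subsample of cells. Leiden clusters come from a neighbor graph rather than from Euclidean compactness, so treat the score as a guide rather than a verdict.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean silhouette width of one labelling, using Euclidean distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    # Full pairwise distance matrix (n x n): subsample large datasets first.
    sq = (X ** 2).sum(axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(dist, 0.0)
    n = len(labels)
    idx = np.arange(n)
    # sums[i, k] = total distance from cell i to every cell in cluster k
    sums = np.stack([dist[:, labels == c].sum(axis=1) for c in clusters], axis=1)
    sizes = np.array([(labels == c).sum() for c in clusters])
    own = np.searchsorted(clusters, labels)  # column of each cell's own cluster
    own_size = sizes[own]
    # a: mean distance to the other members of the cell's own cluster
    a = sums[idx, own] / np.maximum(own_size - 1, 1)
    # b: smallest mean distance to any other cluster
    means = sums / sizes
    means[idx, own] = np.inf
    b = means.min(axis=1)
    denom = np.maximum(a, b)
    denom[denom == 0] = 1.0  # coincident points: b - a is 0, so the score is 0
    s = (b - a) / denom
    s[own_size == 1] = 0.0  # usual convention for singleton clusters
    return float(s.mean())

def score_labelings(X, labelings):
    """Score several labellings of the same cells; higher means better separated."""
    return {name: mean_silhouette(X, lab) for name, lab in labelings.items()}

# Toy data: two tight pairs of points that are far apart from each other.
X_toy = [[0, 0], [0, 1], [10, 0], [10, 1]]
scores = score_labelings(X_toy, {"good": [0, 0, 1, 1], "bad": [0, 1, 0, 1]})
```

With the objects from this thread, `X` could be a random subsample of `adata.obsm["X_pca"]` (assuming `sc.pp.pca` has been run) and the labellings could be the matching rows of `adata.obs["leiden_res0_25"]` and `adata.obs["leiden_res0_5"]`, converted with `.to_numpy()`. scikit-learn's `sklearn.metrics.silhouette_score` computes the same metric if that package is available.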
