Hi,
I have a large dataset with no ground truth for what the clusters should be, so I can't use annotation-based validation. I'm having difficulty choosing an appropriate resolution for Leiden clustering. I know that Seurat can determine clusters with a power analysis; is there anything similar in scanpy or scvi to assess the probability that clusters are distinct from each other? Otherwise, I'm making the decision manually: calculating DEGs at different resolutions and checking whether enough genes pass an LFC threshold.
Thanks!
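The manual check described in the question (counting DEGs above an LFC threshold at each resolution) can be sketched with pandas. This is a minimal illustration, not an established method: `count_strong_degs` is a hypothetical helper, the default cutoffs are arbitrary, and it assumes the DE results are a long-format table with `group`, `logfoldchanges` and `pvals_adj` columns, which is the layout `sc.get.rank_genes_groups_df(adata, group=None)` is assumed to return.

```python
import pandas as pd


def count_strong_degs(de, lfc_min=1.0, padj_max=0.05):
    """Count, per cluster, the genes passing both a log-fold-change and an adjusted p-value cutoff.

    `de` is a long-format table with columns "group", "logfoldchanges" and
    "pvals_adj" (assumed to match sc.get.rank_genes_groups_df output).
    The default cutoffs are illustrative, not recommendations.
    """
    groups = de["group"].astype(str)
    passing = (de["logfoldchanges"] >= lfc_min) & (de["pvals_adj"] <= padj_max)
    # Summing the boolean mask per cluster counts passing genes; clusters with none get 0.
    return passing.groupby(groups, sort=False).sum()
```

For each resolution key, one could run `sc.tl.rank_genes_groups(adata, groupby=key, key_added=f"rank_{key}")` and pass `sc.get.rank_genes_groups_df(adata, group=None, key=f"rank_{key}")` to the function. Clusters with few or no passing genes relative to the rest are candidates for merging, which hints that the resolution may be too high.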
I'd also love to learn what the standard tools are for choosing this parameter.
I've found the dendrogram tool quite helpful:
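```python
import scanpy as sc

# Assumes a neighbor graph already exists (e.g. from sc.pp.neighbors).
# Cluster at two resolutions, storing each result under its own key in adata.obs.
sc.tl.leiden(adata, key_added="leiden_res0_25", resolution=0.25)
sc.tl.leiden(adata, key_added="leiden_res0_5", resolution=0.5)
# Compute and plot the hierarchical dendrogram of the clusters at each resolution.
sc.tl.dendrogram(adata, groupby="leiden_res0_25")
sc.pl.dendrogram(adata, groupby="leiden_res0_25")
sc.tl.dendrogram(adata, groupby="leiden_res0_5")
sc.pl.dendrogram(adata, groupby="leiden_res0_5")
```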
Visualization can also help guide the choice:
```python
# Side-by-side UMAPs of both clusterings (assumes sc.tl.umap has been run).
sc.pl.umap(adata, color=["leiden_res0_25", "leiden_res0_5"], legend_loc="on data")
```
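Alongside the dendrogram and UMAP, a cross-tabulation of the two label columns shows how clusters at the coarser resolution split at the finer one. The sketch below uses only pandas; `split_table` and `nesting_score` are hypothetical helper names. The working assumption is that a fine cluster whose cells come almost entirely from one coarse cluster is a clean subdivision, while one that mixes several coarse clusters suggests the finer partition is less stable.

```python
import pandas as pd


def split_table(coarse, fine):
    """Share of each fine-resolution cluster's cells that come from each coarse cluster.

    Rows are fine clusters, columns are coarse clusters, and each row sums to 1.
    """
    # list() drops any index/categorical metadata so the two labelings align by position.
    return pd.crosstab(
        pd.Series(list(fine), name="fine"),
        pd.Series(list(coarse), name="coarse"),
        normalize="index",
    )


def nesting_score(coarse, fine):
    """Largest share of each fine cluster's cells drawn from a single coarse cluster.

    1.0 means the fine cluster lies entirely inside one coarse cluster;
    lower values mean its cells are spread across several coarse clusters.
    """
    return split_table(coarse, fine).max(axis=1)
```

For example, `nesting_score(adata.obs["leiden_res0_25"], adata.obs["leiden_res0_5"])` returns one value per cluster at resolution 0.5, and `split_table` with the same arguments shows exactly where each cluster's cells came from.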
More generally, there are also the elbow method and silhouette scores.
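The silhouette idea can be sketched as follows. For each cell, let a be its mean distance to the other cells in its own cluster and b its mean distance to the cells of the nearest other cluster; its silhouette is s = (b - a) / max(a, b), and averaging over cells gives one number per resolution, with higher values meaning better-separated clusters. The plain-NumPy function below (`mean_silhouette` is a hypothetical name) only illustrates the definition; on real data, `sklearn.metrics.silhouette_score` computes the same quantity, and its `sample_size` argument keeps large datasets tractable. Computing it on a reduced embedding such as `adata.obsm["X_pca"]` is an assumption here. Because it uses Euclidean distances, it can disagree with graph-based Leiden clusters, so it is best treated as one signal among several.

```python
import numpy as np


def mean_silhouette(X, labels):
    """Mean silhouette coefficient of a clustering, using Euclidean distance.

    X: (n_cells, n_dims) embedding, e.g. PCA coordinates.
    labels: one cluster label per row of X.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if not 2 <= len(clusters) <= len(X) - 1:
        raise ValueError("need between 2 and n_cells - 1 distinct clusters")

    # Pairwise Euclidean distances via |x|^2 + |y|^2 - 2 x.y (n x n memory).
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.clip(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0, None))
    np.fill_diagonal(D, 0.0)

    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # a singleton cluster gets silhouette 0 by convention
        # a: mean distance to the other members of its own cluster (D[i, i] is 0)
        a = D[i, own].sum() / (n_own - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())
```

Comparing the value for `adata.obs["leiden_res0_25"]` and `adata.obs["leiden_res0_5"]` on the same embedding gives a rough indication of which resolution yields better-separated clusters.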