I have a large dataset with no ground truth for what the clusters should be, so I can’t use annotation-based validation, and I’m having difficulty choosing an appropriate resolution for Leiden clustering. I know that Seurat can determine clusters with a power analysis; is there anything similar in scanpy or scvi to assess the probability that clusters are distinct from each other? Otherwise, I’m making a more manual decision: calculating DEGs at different resolutions and checking whether enough genes pass an LFC threshold.
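The manual approach described above can be scripted. This is a minimal sketch, assuming `adata.X` holds log-normalized expression and `sc.pp.neighbors` has already been run. The helper names (`count_marker_genes`, `markers_per_resolution`) and the default cutoffs (LFC ≥ 1, adjusted p ≤ 0.05) are hypothetical choices, not a scanpy feature:

```python
import pandas as pd


def count_marker_genes(de_df: pd.DataFrame, lfc_min: float = 1.0,
                       padj_max: float = 0.05) -> pd.Series:
    """Count, per cluster, the genes passing both an LFC and an adjusted p-value cutoff.

    `de_df` is the long table from sc.get.rank_genes_groups_df(adata, group=None),
    which has 'group', 'logfoldchanges' and 'pvals_adj' columns.
    """
    passes = (de_df["logfoldchanges"] >= lfc_min) & (de_df["pvals_adj"] <= padj_max)
    # Summing a boolean Series per group counts passing genes; clusters with none get 0.
    return passes.groupby(de_df["group"].astype(str)).sum()


def markers_per_resolution(adata, resolutions, lfc_min=1.0, padj_max=0.05) -> pd.DataFrame:
    """Run Leiden plus one-vs-rest DE at each resolution and summarize marker counts."""
    import scanpy as sc  # imported here so count_marker_genes works without scanpy

    rows = []
    for res in resolutions:
        key = f"leiden_res{res}"
        sc.tl.leiden(adata, resolution=res, key_added=key)
        sc.tl.rank_genes_groups(adata, groupby=key, method="wilcoxon",
                                key_added=f"rank_{key}")
        de_df = sc.get.rank_genes_groups_df(adata, group=None, key=f"rank_{key}")
        counts = count_marker_genes(de_df, lfc_min, padj_max)
        rows.append({
            "resolution": res,
            "n_clusters": len(counts),
            # The weakest cluster is the one most likely to be an over-split.
            "min_markers": int(counts.min()),
        })
    return pd.DataFrame(rows).set_index("resolution")
```

For example, `markers_per_resolution(adata, [0.25, 0.5, 1.0])` returns one row per resolution; a resolution where `min_markers` drops sharply suggests at least one cluster is no longer backed by strong markers. One caveat: `rank_genes_groups` compares each cluster against all other cells by default, so two sibling clusters can each look distinct from the rest while differing little from each other; its `reference=` parameter allows a direct comparison against one specific cluster.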
I’d also like to learn which tools are standard for making this parameter choice.
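One commonly used annotation-free check is cluster stability: recluster random subsets of cells and measure how well the new labels agree with the full-data clustering, for example with the adjusted Rand index (ARI) from scikit-learn. The following is a minimal sketch; `subsample_stability`, `mean_ari` and the defaults (10 rounds, 80% of cells, a kNN graph rebuilt on `X_pca` with 15 neighbors) are hypothetical choices. It assumes the reference labels are already stored in `adata.obs` and that `n_neighbors`/`use_rep` match how the full graph was built:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score


def mean_ari(ref_labels, runs):
    """Mean and std of the ARI between reference labels and each (cell_indices, labels) run."""
    ref_labels = np.asarray(ref_labels)
    scores = [adjusted_rand_score(ref_labels[idx], labels) for idx, labels in runs]
    return float(np.mean(scores)), float(np.std(scores))


def subsample_stability(adata, ref_key, resolution, n_iter=10, frac=0.8,
                        use_rep="X_pca", n_neighbors=15, seed=0):
    """Recluster random subsets of cells and compare them with adata.obs[ref_key]."""
    import scanpy as sc  # imported here so mean_ari works without scanpy

    rng = np.random.default_rng(seed)
    n_sub = int(frac * adata.n_obs)
    runs = []
    for _ in range(n_iter):
        idx = np.sort(rng.choice(adata.n_obs, size=n_sub, replace=False))
        sub = adata[idx].copy()
        # Rebuild the kNN graph on the subset so the clustering is genuinely re-derived.
        sc.pp.neighbors(sub, n_neighbors=n_neighbors, use_rep=use_rep)
        sc.tl.leiden(sub, resolution=resolution, key_added="sub_leiden")
        runs.append((idx, sub.obs["sub_leiden"].to_numpy()))
    return mean_ari(adata.obs[ref_key].to_numpy(), runs)
```

A resolution whose mean ARI stays high with a small spread is easier to defend than one whose clusters reshuffle between subsets. Note that copying `adata` on every round can be slow for a large dataset.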
I’ve found the dendrogram tool quite helpful:
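```python
import scanpy as sc
# Leiden at two resolutions (assumes sc.pp.neighbors has already been run)
sc.tl.leiden(adata, key_added="leiden_res0_25", resolution=0.25)
sc.tl.leiden(adata, key_added="leiden_res0_5", resolution=0.5)
# Hierarchically cluster the Leiden clusters and plot the dendrogram for each resolution
sc.tl.dendrogram(adata, groupby="leiden_res0_25")
sc.pl.dendrogram(adata, groupby="leiden_res0_25")
sc.tl.dendrogram(adata, groupby="leiden_res0_5")
sc.pl.dendrogram(adata, groupby="leiden_res0_5")
```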
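Alongside the dendrograms, a contingency table shows how the clusters at one resolution split at the next. A minimal sketch with pandas; `split_table` is a hypothetical helper, and both labelings should be passed as columns of the same `adata.obs` so their indices align:

```python
import pandas as pd


def split_table(coarse, fine, normalize=True):
    """Cross-tabulate two labelings of the same cells, with coarse clusters as rows.

    With normalize=True each row sums to 1, i.e. the fraction of a coarse
    cluster's cells that lands in each fine cluster.
    """
    return pd.crosstab(pd.Series(coarse, name="coarse"),
                       pd.Series(fine, name="fine"),
                       normalize="index" if normalize else False)
```

For example, `split_table(adata.obs["leiden_res0_25"], adata.obs["leiden_res0_5"])`. A row with a single value near 1 is a cluster that persists unchanged; a row spread over several columns marks where the higher resolution subdivides; and a column drawing cells from several rows suggests the finer clustering is not simply nested in the coarser one.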
Visualization can also help guide the choice:
```python
sc.pl.umap(adata, color=["leiden_res0_25", "leiden_res0_5"], legend_loc="on data")
```
Generally, there are also the elbow method and silhouette scores:
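For the silhouette score, one sketch is to compute it on the PCA embedding for each stored Leiden labeling with scikit-learn’s `silhouette_score`. This assumes `adata.obsm["X_pca"]` exists; `silhouette_by_key` is a hypothetical helper, and because the exact score needs all pairwise distances it is computed on a random subsample (`sample_size`, here capped at 10,000 cells as an arbitrary default). Keep in mind that a Euclidean silhouette in PCA space does not measure the graph structure Leiden actually optimizes, so it is a rough guide rather than a test:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import silhouette_score


def silhouette_by_key(embedding, labelings, sample_size=10_000, seed=0):
    """Euclidean silhouette score of each labeling on the same embedding.

    `labelings` maps a name (e.g. an adata.obs column name) to per-cell labels.
    """
    embedding = np.asarray(embedding)
    n = embedding.shape[0]
    rows = []
    for name, labels in labelings.items():
        labels = np.asarray(labels)
        # Subsample cells to keep the pairwise-distance cost manageable.
        score = silhouette_score(embedding, labels,
                                 sample_size=min(sample_size, n), random_state=seed)
        rows.append({"key": name, "n_clusters": len(np.unique(labels)),
                     "silhouette": score})
    return pd.DataFrame(rows).set_index("key")
```

For example, `silhouette_by_key(adata.obsm["X_pca"], {k: adata.obs[k] for k in ["leiden_res0_25", "leiden_res0_5"]})`. Plotting `n_clusters` against resolution gives an elbow-style view; ranges of resolution over which the cluster count stays flat are often read as more robust choices.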