Hello All,

I am working with approximately five hundred thousand cells from brain tissue. I want to cluster the neuronal and non-neuronal subtypes.

For example, within the microglia cell type I would like to identify homeostatic, proliferating, and activated microglia. To do this, I subset the clusters that belong to microglia and re-clustered them using various resolution and n_neighbors values.

But I would like to know if there is an elegant way to find the optimal number of clusters for a subset of cells.

Here is the code:

```
import anndata
import matplotlib.pyplot as plt
import scanpy as sc
from matplotlib.pyplot import rc_context

# Read AnnData
adata = anndata.read_h5ad("/home/Akila/integration/harmony/subset/celltype/microglia.h5ad")

# Known microglia marker genes
marker = {
    'Microglia-Homeostatic': ['CX3CR1', 'CSF1R', 'APBB1IP'],
    'Microglia-Activated-1': ['CD163', 'CD83'],
    'Microglia-Inflammatory': ['HLA-A', 'HLA-B', 'C3'],
    'Microglia-Proliferative': ['FAM111B'],
}

out_dir = "/home/Akila/integration/harmony/subset/celltype/neighbor/"
resolutions = [0.05, 0.2, 0.4, 0.6]

# Vary the number of neighbors (3 to 30)
for n_neighbors in range(3, 31):
    sc.pp.neighbors(adata, use_rep="X_pca_harmony", n_pcs=18, n_neighbors=n_neighbors)

    # Cluster at each resolution on the current neighbor graph
    for res in resolutions:
        key = f"leiden_{res}"
        sc.tl.leiden(adata, resolution=res, key_added=key)

        # Save the cluster plot (show=False so the figure is still open for savefig)
        with rc_context({'figure.figsize': (7, 7)}):
            sc.pl.umap(adata, color=key, add_outline=True, legend_loc='on data',
                       legend_fontsize=10, legend_fontoutline=2, frameon=False,
                       title='clustering of cells', palette='Set1', show=False)
            plt.savefig(f"{out_dir}{n_neighbors}{key}cluster_plot.png")
            plt.close()

        # Save the marker dot plot
        sc.pl.dotplot(adata, marker, groupby=key, show=False)
        plt.savefig(f"{out_dir}{n_neighbors}{key}dotplot.png")
        plt.close()
```

But while doing this, I feel I am partitioning the data arbitrarily, depending on the resolution and number of neighbors I happen to pick. In this case, should I use k-means and inspect an elbow plot to choose the optimal number of clusters?
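To make the k-means idea concrete, this is roughly what I have in mind. It is only a minimal sketch under assumptions: the data are synthetic blobs standing in for my Harmony embedding, and `farthest_point_init` and `kmeans_inertia` are helper names I made up for illustration, not functions from any library.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centers."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    return np.array(centers)

def kmeans_inertia(X, k, n_iter=100):
    """Plain Lloyd's k-means; returns the within-cluster sum of squares (inertia)."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        # Squared distance of every cell to every center, shape (n_cells, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean of its cells (keep the old center if a cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# Synthetic stand-in for the embedding: three well-separated groups of 100 "cells" in 5 dimensions
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 5)) for c in (0.0, 5.0, 10.0)])

ks = list(range(1, 8))
inertias = [kmeans_inertia(X, k) for k in ks]
for k, w in zip(ks, inertias):
    print(f"k={k}  inertia={w:.1f}")
```

The "elbow" would be the k after which the inertia stops dropping sharply (k=3 for these toy blobs). On my real data I assume `X` would be `adata.obsm["X_pca_harmony"][:, :18]`, and `sklearn.cluster.KMeans` with its `inertia_` attribute would replace the hand-written loop.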

I have found the "clustree" method in R, which shows how clusters split as the resolution increases and so helps in choosing the number of clusters. But I am looking for Python-compatible methods. Can you please suggest some, and how I should proceed?
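As far as I understand it, the quantity behind a clustree plot is, for each pair of clusters at two neighboring resolutions, the share of the higher-resolution cluster's cells that comes from each lower-resolution cluster. Here is a small sketch of that calculation in plain Python, with made-up labels for six cells (the variable names and label values are my own illustration, not real output):

```python
from collections import Counter

# Hypothetical labels for the same six cells at two resolutions
low = ["0", "0", "0", "1", "1", "1"]    # would be e.g. adata.obs["leiden_0.2"]
high = ["0", "0", "2", "2", "1", "1"]   # would be e.g. adata.obs["leiden_0.4"]

# Count cells on each (low-resolution cluster, high-resolution cluster) edge
edges = Counter(zip(low, high))
high_sizes = Counter(high)

# Share of each high-resolution cluster that comes from each parent cluster
in_prop = {edge: n / high_sizes[edge[1]] for edge, n in edges.items()}

for (parent, child), p in sorted(in_prop.items()):
    print(f"{parent} -> {child}: {p:.2f}")
```

In this toy case the new cluster "2" takes half of its cells from each parent, which I would read as a sign that this resolution is splitting cells unstably rather than revealing a clean subtype.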

Thanks

Akila