How to decide the optimal number of clusters when clustering cell subtypes

AkilaRanjith · June 24, 2022, 7:18pm

Hello All,
I am working with approximately 500,000 cells from brain tissue, and I want to cluster the neuronal and non-neuronal subtypes.

For example, within the microglia I would like to identify homeostatic, proliferating, and activated microglia. To do this, I subsetted the clusters belonging to microglia and re-clustered them using various resolutions and n_neighbors values.

However, I would like to know whether there is an elegant way to find the optimal number of clusters for this subset of cells.

Here is my code:


    import anndata
    import matplotlib.pyplot as plt
    import scanpy as sc
    from matplotlib import rc_context

    outdir = "/home/Akila/integration/harmony/subset/celltype/neighbor/"

    # Read AnnData
    adata = anndata.read_h5ad("/home/Akila/integration/harmony/subset/celltype/microglia.h5ad")

    # Known marker genes for microglia sub-types
    Marker = {
        'Microglia-Homeostatic': ['CX3CR1', 'CSF1R', 'APBB1IP'],
        'Microglia-Activated-1': ['CD163', 'CD83'],
        'Microglia-Inflammatory': ['HLA-A', 'HLA-B', 'C3'],
        'Microglia-Proliferative': ['FAM111B'],
    }

    resolutions = [0.05, 0.2, 0.4, 0.6]

    # Vary the number of neighbors (3 to 30 inclusive)
    for n in range(3, 31):
        sc.pp.neighbors(adata, use_rep="X_pca_harmony", n_pcs=18, n_neighbors=n)
        # Recompute the UMAP so the layout reflects the current neighbor graph
        sc.tl.umap(adata)

        for res in resolutions:
            key = f"leiden_{res}"  # e.g. "leiden_0.05"
            # Cluster at this resolution
            sc.tl.leiden(adata, resolution=res, key_added=key)

            # Save the UMAP colored by cluster
            with rc_context({'figure.figsize': (7, 7)}):
                sc.pl.umap(adata, color=key, add_outline=True, legend_loc='on data',
                           legend_fontsize=10, legend_fontoutline=2, frameon=False,
                           title='clustering of cells', palette='Set1', show=False)
                plt.savefig(f"{outdir}{n}_{key}_cluster_plot.png")
                plt.close()

            # Save the marker dot plot for the same clustering
            sc.pl.dotplot(adata, Marker, groupby=key, show=False)
            plt.savefig(f"{outdir}{n}_{key}_dotplot.png")
            plt.close()

But while doing this, I feel I am splitting the data arbitrarily depending on the resolution and the number of neighbors. In this case, should I use k-means and check an elbow plot to obtain the optimal number of clusters?
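For reference, here is a minimal sketch of the k-means elbow check mentioned above, assuming scikit-learn is installed and that the embedding is a NumPy array (in practice it might be `adata.obsm["X_pca_harmony"][:, :18]`, to match `n_pcs=18`). `kmeans_scan` is a hypothetical helper name, and synthetic blobs stand in for real cells. Because the elbow is often ambiguous, it also records the silhouette score for each k.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def kmeans_scan(embedding, ks, random_state=0, sample_size=10000):
    """Fit k-means for each k; return inertias (for an elbow plot) and silhouettes."""
    inertias, silhouettes = [], []
    # Silhouette is O(n^2), so score a random subsample on large datasets
    n_sample = min(sample_size, embedding.shape[0])
    for k in ks:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(embedding)
        inertias.append(km.inertia_)
        silhouettes.append(
            silhouette_score(embedding, km.labels_,
                             sample_size=n_sample, random_state=random_state)
        )
    return np.array(inertias), np.array(silhouettes)


# Synthetic stand-in for a cell embedding: three well-separated groups
X, _ = make_blobs(n_samples=600, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=1.0, random_state=0)
ks = list(range(2, 8))
inertias, silhouettes = kmeans_scan(X, ks)
best_k = ks[int(np.argmax(silhouettes))]
print("best k by silhouette:", best_k)
```

Plotting `inertias` against `ks` gives the elbow plot. Note that k-means favors compact, similarly sized clusters, and the k it suggests does not map directly onto a Leiden resolution, so treat either score as a heuristic rather than proof of distinct sub-types.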

I have found the clustree package in R, which visualizes how clusters split and merge as the resolution increases to help choose the number of clusters, but I am looking for Python-compatible methods. Could you please suggest how to proceed?
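As a rough Python analogue of the information clustree displays, one can cross-tabulate the clusterings at successive resolutions and compute the adjusted Rand index (ARI) between them. This is a minimal sketch assuming pandas and scikit-learn; `resolution_transitions` is a hypothetical helper, not part of scanpy, and the toy labels below are made up for illustration.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score


def resolution_transitions(labelings):
    """Compare clusterings of the same cells at successive resolutions.

    labelings: dict mapping resolution -> sequence of cluster labels (same cell order).
    Returns a summary DataFrame and a dict of contingency tables, one per step.
    """
    res = sorted(labelings)
    rows, tables = [], {}
    for lo, hi in zip(res[:-1], res[1:]):
        # list() drops any index so the two labelings align by position
        a = pd.Series(list(labelings[lo]), name=f"res_{lo}")
        b = pd.Series(list(labelings[hi]), name=f"res_{hi}")
        # Rows: clusters at the lower resolution; columns: clusters they split into
        tables[(lo, hi)] = pd.crosstab(a, b)
        rows.append({
            "from": lo,
            "to": hi,
            "n_clusters_from": a.nunique(),
            "n_clusters_to": b.nunique(),
            # 1.0 means the two partitions are identical up to relabeling
            "ari": adjusted_rand_score(a, b),
        })
    return pd.DataFrame(rows), tables


# Toy labels for 8 cells at three resolutions
labelings = {
    0.05: ["0", "0", "0", "0", "1", "1", "1", "1"],
    0.2: ["0", "0", "0", "0", "1", "1", "1", "1"],
    0.4: ["0", "0", "2", "2", "1", "1", "1", "1"],
}
summary, tables = resolution_transitions(labelings)
print(summary)
print(tables[(0.2, 0.4)])
```

With real data, `labelings = {r: adata.obs[f"leiden_{r}"] for r in [0.05, 0.2, 0.4, 0.6]}` would reuse the keys from the code above. A range of resolutions where the ARI stays high and each new cluster comes mostly from a single parent cluster suggests relatively stable structure; this is a heuristic, not a definitive criterion.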

Thanks
Akila

Valentine_Svensson · July 8, 2022, 1:43am

Hi Akila,

Unfortunately, this is an extremely challenging problem. The more cells you have, the more clusters you can analyze. The question of when to stop clustering is partially philosophical and partially practical (how many microglia sub-types can you work with and describe?).

The strategies depend on the goals of the research. In some cases you know what kind of cell types you expect (for example, here you are expecting three microglia sub-types), and the goal is to estimate the proportions of these, or learn about their gene expression or responses to stimuli. In some cases the goal is to further subdivide known sub-classes to dissect major directions of variability.

Since you are expecting three classes of microglia, it seems to me that there are a couple of strategies to take: 1) use known markers for these sub-classes to divide your cells into those, then analyse them, or 2) do relatively high-resolution clustering, and merge the clusters which all appear to have the characteristics of these sub-classes.
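Both strategies can be sketched in a few lines of pandas. This is a simplified illustration under stated assumptions: cells are scored by the mean z-score of each marker set (scanpy's `sc.tl.score_genes` is a more careful option that also subtracts a control gene set), `assign_by_markers` and `merge_clusters` are hypothetical helper names, and the expression values are made-up toy numbers, not real data.

```python
import numpy as np
import pandas as pd


def assign_by_markers(expr, markers, min_score=0.0):
    """Strategy 1: label each cell by the marker set with the highest mean z-score.

    expr: DataFrame (cells x genes) of normalized, log-transformed expression.
    markers: dict mapping label -> list of marker genes.
    """
    # z-score each gene across cells; constant genes become 0 instead of NaN
    std = expr.std(ddof=0).replace(0, np.nan)
    z = ((expr - expr.mean()) / std).fillna(0.0)
    scores = {}
    for label, genes in markers.items():
        present = [g for g in genes if g in z.columns]
        if present:  # skip marker sets with no genes in the data
            scores[label] = z[present].mean(axis=1)
    scores = pd.DataFrame(scores, index=expr.index)
    best = scores.idxmax(axis=1)
    # Cells whose best score does not exceed the threshold stay unassigned
    best[scores.max(axis=1) <= min_score] = "unassigned"
    return best


def merge_clusters(clusters, cell_labels):
    """Strategy 2: map each high-resolution cluster to its most common cell label."""
    df = pd.DataFrame({"cluster": np.asarray(clusters), "label": np.asarray(cell_labels)})
    mapping = df.groupby("cluster")["label"].agg(lambda s: s.value_counts().idxmax())
    return df["cluster"].map(mapping).to_numpy(), mapping


# Toy expression values for 4 cells (APBB1IP is deliberately absent)
expr = pd.DataFrame(
    {"CX3CR1": [5, 5, 0, 0], "CSF1R": [4, 4, 0, 0], "CD163": [0, 0, 5, 5], "CD83": [0, 0, 4, 4]},
    index=["cell1", "cell2", "cell3", "cell4"],
)
markers = {
    "Microglia-Homeostatic": ["CX3CR1", "CSF1R", "APBB1IP"],
    "Microglia-Activated-1": ["CD163", "CD83"],
}
labels = assign_by_markers(expr, markers)
# Pretend each cell is its own high-resolution cluster, then merge
merged, mapping = merge_clusters(["0", "1", "2", "3"], labels)
print(labels.tolist())
print(mapping)
```

On real data, one option is `genes = [g for gs in Marker.values() for g in gs if g in adata.var_names]` and `expr = adata[:, genes].to_df()`, then passing a high-resolution Leiden column such as `adata.obs["leiden_0.6"]` to `merge_clusters`. Majority voting is only one way to merge; inspecting the dot plots per cluster before merging is still advisable.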

Hope this helps!
/Valentine
