Re-Clustering Clusters of Anndata

Hi Everyone!

I have a question about re-clustering some clusters from my AnnData object. Say I cluster my data and get 10 clusters. I then extract clusters 1, 2, and 3 and store them in a new AnnData object like this:

new_anndata = anndata[anndata.obs['leiden'].isin(['1', '2', '3']), :].copy()

If I want to re-cluster these clusters can I simply do:

sc.tl.leiden(new_anndata, resolution=1)
sc.tl.umap(new_anndata)

and then plot the UMAP? Or do I first have to run sc.pp.scale, sc.tl.pca, and sc.pp.neighbors before sc.tl.leiden and sc.tl.umap? Or should I do something entirely different?

Relatedly, if the original AnnData object contains two separate batches, do I need to correct for batch effects again after subsetting clusters 1, 2, and 3 and re-clustering them?

I am sorry if these questions are trivial. However, I have tried every combination of running and skipping the PCA and batch correction, and the resulting UMAP looks wildly different each time. I am not sure which one is correct, or whether I am re-clustering correctly at all.

Any insight is greatly appreciated. Thank you all so much!

Hi Kparakul,

When you subset (anndata[anndata.obs['leiden'].isin(['1', '2', '3']), :]), the neighbor graph generated by sc.pp.neighbors() comes along with the rest of the data, restricted to the cells you kept.

So if you only want to rerun Leiden on the subset, for example with a different resolution parameter while keeping everything else fixed, you should be all set! Any other calculations you did that affected the construction of the neighbors matrix are kept as well.

As you have noted, different ways of clustering the cells give very different results. There is no single ‘correct’ way to do it; it depends on which aspects of the data you want to investigate with your clusterings. For example, if you run PCA after restricting your data to the cells of one cell type, you’re likely to ‘miss’ principal components representing pathways active in the other cells. On the other hand, this lets you focus on more subtle pathways in the cells you are zooming in on.

How to interpret ‘correcting for batch effects’ before or after subsetting depends strongly on which correction strategy you are using.

Hope this helps!
/Valentine


Hey Valentine!

Thank you so much for the clarification! I can’t thank the folks on here enough for providing so much help and guidance to amateurs like me trying to make sense of scanpy and its various configurations. Appreciate the help! 🙂