Comparing Clusters of Different Anndata through the use of Dendrogram

kparakul · October 25, 2022, 9:32pm

Hi there,

I am currently trying to track a cluster across several timepoints of scRNA-seq AnnData. Say there are two AnnData objects, A1 and A2: for cluster 1 in A1, which cluster in A2 is it most similar to?

I was trying to do this with a dendrogram: first computing the mean gene expression per cluster over the genes shared between the two timepoints, then calculating pairwise distances to build a distance matrix, and finally performing a linkage and plotting a dendrogram with the scipy.cluster.hierarchy package.

However, I do not think this is yielding good results, and I was wondering whether there are any other built-in ways to do this in scanpy. I know scanpy can build a dendrogram of the clusters within a single AnnData, but how do you do this with multiple AnnData objects? And how do you do it so that you only compare clusters between the two AnnData objects, not within each one? I hope this makes sense. I appreciate the help!
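For reference, the between-dataset comparison asked about here can be sketched without a dendrogram: compute per-cluster mean profiles in each dataset, then build a rectangular distance matrix with A1 clusters as rows and A2 clusters as columns, so clusters are never compared within their own dataset. This is only a minimal sketch on toy dense arrays, not a scanpy built-in. The helper names `cluster_means` and `cross_correlation_distance` are made up for illustration, and correlation distance is one assumed choice of metric. With real AnnData objects, both matrices would first be subset to the shared genes (for example via `adata1.var_names.intersection(adata2.var_names)`).

```python
import numpy as np

def cluster_means(X, labels):
    """Return (sorted unique cluster names, mean expression per cluster)."""
    labels = np.asarray(labels)
    names = np.unique(labels)
    means = np.vstack([X[labels == name].mean(axis=0) for name in names])
    return names, means

def cross_correlation_distance(means_a, means_b):
    """Correlation distance (1 - Pearson r) between each row of means_a
    and each row of means_b. Rows are clusters, columns are shared genes."""
    a = means_a - means_a.mean(axis=1, keepdims=True)
    b = means_b - means_b.mean(axis=1, keepdims=True)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

# Toy data standing in for two timepoints with 20 shared genes (hypothetical).
rng = np.random.default_rng(0)
p0 = np.linspace(0, 5, 20)   # expression profile of one cell population
p1 = p0[::-1]                # a clearly different population

def simulate(profile, n_cells=30):
    return profile + rng.normal(scale=0.3, size=(n_cells, profile.size))

X1 = np.vstack([simulate(p0), simulate(p1)])   # A1: cluster "0" ~ p0, "1" ~ p1
labels1 = np.repeat(["0", "1"], 30)
X2 = np.vstack([simulate(p1), simulate(p0)])   # A2: cluster "a" ~ p1, "b" ~ p0
labels2 = np.repeat(["a", "b"], 30)

names1, means1 = cluster_means(X1, labels1)
names2, means2 = cluster_means(X2, labels2)

# dist[i, j] = distance between A1 cluster i and A2 cluster j only.
dist = cross_correlation_distance(means1, means2)
best = names2[dist.argmin(axis=1)]
for name, match in zip(names1, best):
    print(f"A1 cluster {name} is closest to A2 cluster {match}")
```

Reading off the row-wise minimum answers "which A2 cluster is A1 cluster 1 most similar to?" directly. Note that without batch correction, technical differences between the timepoints can still dominate these distances.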

PauBadiaM · October 26, 2022, 8:45am

One thing you could try is to integrate them using any of the methods available in scanpy, for example harmony:
https://scanpy.readthedocs.io/en/stable/generated/scanpy.external.pp.harmony_integrate.html

What I would do is first merge the two objects using AnnData’s concatenate method (with join="outer", otherwise you might lose genes):
https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.concatenate.html#anndata.AnnData.concatenate

Then you generate a label for each combination of cluster and AnnData, for example:

cluster 1 in AnnData 1: “C1-A1”
cluster 1 in AnnData 2: “C1-A2”
cluster 2 in AnnData 1: “C2-A1”
cluster 2 in AnnData 2: “C2-A2”

And you use these labels to fit harmony.

After running harmony, you can recompute the clustering on the new integrated space and check which original clusters the cells in each new cluster come from. Example:

New cluster 1: 50% of cells coming from C1-A1 and 50% from C1-A2, meaning that this new cluster is most likely the old C1.
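This composition check can be sketched with `pandas.crosstab`. The small table below is a toy stand-in for the `.obs` of the concatenated AnnData after integration and re-clustering. The column names `dataset`, `cluster` and `new_cluster` are hypothetical, so adapt them to whatever your object actually uses.

```python
import pandas as pd

# Toy stand-in for adata.obs of the merged object (all values hypothetical).
obs = pd.DataFrame({
    "dataset":     ["A1", "A1", "A1", "A1", "A2", "A2", "A2", "A2"],
    "cluster":     ["1", "1", "2", "2", "1", "1", "2", "2"],  # original clusters
    "new_cluster": ["0", "0", "1", "1", "0", "0", "1", "1"],  # after integration
})

# Combined label per original cluster and dataset, e.g. "C1-A1".
obs["origin"] = "C" + obs["cluster"] + "-" + obs["dataset"]

# Fraction of each new cluster's cells coming from each original label;
# normalize="index" makes every row sum to 1.
composition = pd.crosstab(obs["new_cluster"], obs["origin"], normalize="index")
print(composition)
```

In this toy table, new cluster 0 is made up half of C1-A1 cells and half of C1-A2 cells, which is the situation described in the example above.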

However, since you mention that your data has timepoints, maybe there is a better way to account for them. In any case, this integration would be the first thing I would try.

kparakul · November 2, 2022, 10:42am

This is an amazing idea, and thank you so much for providing the means for doing it. I will give it a shot!
