Outlier Detection PCA / Clustering

Hi Everyone,

I am trying to see whether my patient groups present as outliers before I finalize my differential expression analysis. Typically I would use PCA alone to describe the variation among groups, but the PCA of cells in scanpy doesn't give a clear separation between groups. I have an adata.obs variable that describes which patient each readout belongs to. Does anyone have experience with, or ideas about, how to get the most representative separation between patients?

Thank you,


Unfortunately, I think this is currently an unsolved problem. You have repeated measures from each patient (i.e. the cells). All current analyses work on the similarity of cells, but you are interested in the similarity between the patients the cells come from. This would require a hierarchical latent variable model (or something similar) in which patients are represented such that similar patients would produce similar scRNA-seq datasets, where each dataset is a heterogeneous collection of cells.

A quick check could be to create a pseudobulk profile for each patient from your single-cell data and treat it as you would regular bulk RNA-seq data. Then evaluate whether you want to remove any outliers, and return to the single-cell data for the single-cell analysis.
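A minimal sketch of that pseudobulk check, using only NumPy. It assumes `counts` holds raw (unnormalized) counts as a cells × genes matrix, dense or SciPy sparse, and that the patient labels come from an `adata.obs` column; the column name `"patient"` and layer name `"counts"` used below are hypothetical. The log-CPM normalization and the MAD-based "modified z-score" outlier rule (threshold 3.5) are swapped-in heuristics chosen for illustration, not something specified above; a plain PCA plot of the patient scores may be all you need.

```python
import numpy as np


def pseudobulk(counts, patient_labels):
    """Sum raw counts over all cells belonging to the same patient.

    counts: (n_cells, n_genes) raw counts, NumPy array or SciPy sparse matrix.
    patient_labels: length-n_cells labels, e.g. adata.obs["patient"] (hypothetical column).
    Returns (patients, matrix) with matrix of shape (n_patients, n_genes).
    """
    labels = np.asarray(patient_labels)
    patients = np.unique(labels)
    # np.asarray(...).ravel() turns a sparse-matrix row sum into a flat 1-D array
    rows = [np.asarray(counts[labels == p].sum(axis=0)).ravel() for p in patients]
    return patients, np.vstack(rows)


def log_cpm(matrix):
    """Scale each patient to counts per million, then apply log1p."""
    lib_size = matrix.sum(axis=1, keepdims=True)
    return np.log1p(matrix / lib_size * 1e6)


def pca(x, n_components=2):
    """PCA via SVD of the gene-centred matrix; returns one row of scores per patient."""
    centred = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def flag_outliers(scores, threshold=3.5):
    """Flag patients far from the median in PC space (heuristic, assumed rule).

    Uses the modified z-score 0.6745 * (d - median(d)) / MAD on each
    patient's distance d from the coordinate-wise median of the scores.
    """
    dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))
    if mad == 0:
        # All distances tie around the median: nothing can be flagged robustly
        return np.zeros(len(dist), dtype=bool)
    robust_z = 0.6745 * (dist - med) / mad
    return robust_z > threshold
```

With an AnnData object this might be used as `patients, pb = pseudobulk(adata.layers["counts"], adata.obs["patient"])` followed by `scores = pca(log_cpm(pb))`, then plotting or inspecting `scores` and `flag_outliers(scores)`. Summing counts (rather than averaging normalized values) mirrors how bulk libraries are built; recent scanpy releases also offer `sc.get.aggregate(adata, by="patient", func="sum")` for the aggregation step.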




Thanks so much for your insight on my query.
Your idea is really clever; I will go ahead and convert the matrices back and forth to determine whether there are outliers.

Thanks again,