I have always had a question: do I need to scale my adata before running sc.tl.score_genes?
In my opinion, no.
If you check the function here, the mean is calculated on X, which is the raw counts.
Although it appears that the mean is calculated on X, which would be the raw counts, scoring is typically performed after normalization and log-transformation (normalize_total and log1p). Therefore, I believe X at that point likely holds the processed matrix rather than the raw counts.
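For context, here is a minimal sketch of that typical workflow, assuming the pbmc3k example dataset and an arbitrary example gene set: by the time score_genes is called, X already contains log-normalized values rather than raw counts.

```python
import scanpy as sc

adata = sc.datasets.pbmc3k()  # example dataset with raw counts in X

# Typical preprocessing before scoring: normalize and log-transform,
# so adata.X holds log-normalized values rather than raw counts.
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Score a hypothetical gene set against the processed matrix in X.
gene_list = ["CD3D", "CD3E", "CD2"]
sc.tl.score_genes(adata, gene_list, score_name="tcell_score")
print(adata.obs["tcell_score"].head())
```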
In the help page for scanpy.tl.score_genes it says that it tries to reproduce the approach in Seurat. Looking at AddModuleScore, it seems that by default they use the data slot (or layer), and they do not appear to perform any scaling of the features within that function.
One problem I see with the current implementation in scanpy is that it is not possible to pass a layer, so if you do not have log-normalized data in X, it might not return the expected results, especially if you expect something similar to Seurat.
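If the log-normalized matrix lives in a layer rather than in X, one possible workaround is to score a copy of the object whose X is replaced by that layer. This is only a sketch, assuming the adata and gene_list from above and a hypothetical layer named "lognorm":

```python
import scanpy as sc

# Assumed setup: log-normalized values stored in adata.layers["lognorm"].
# Because score_genes (in the version discussed here) only reads adata.X,
# score a copy whose X holds the matrix you actually want to use.
adata_scored = adata.copy()
adata_scored.X = adata_scored.layers["lognorm"]  # "lognorm" is an assumed layer name
sc.tl.score_genes(adata_scored, gene_list, score_name="module_score")

# Copy the result back to the original object.
adata.obs["module_score"] = adata_scored.obs["module_score"]
```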
Regarding your specific question of whether to scale the data prior to scoring, I guess that would change the emphasis. In the default approach, features with higher expression levels will have more weight in the final score than those with lower expression levels, so it might depend on what you want to do. If you want to use the method as in the original implementation (Seurat, apparently), then do not scale. But I may be missing something here…
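To see how the emphasis changes, one could compare scores computed on the log-normalized matrix with scores computed after z-scaling. This is only a rough sketch, reusing the adata and gene_list from above; it is not part of the documented score_genes workflow.

```python
import scanpy as sc

# Score on log-normalized data (Seurat-like default behaviour):
# highly expressed genes carry more weight in the score.
sc.tl.score_genes(adata, gene_list, score_name="score_lognorm")

# Score on z-scaled data: each gene contributes on a comparable scale,
# so highly expressed genes no longer dominate the score.
adata_scaled = adata.copy()
sc.pp.scale(adata_scaled)
sc.tl.score_genes(adata_scaled, gene_list, score_name="score_scaled")

# Compare how similarly the two variants rank the cells.
print(adata.obs["score_lognorm"].corr(adata_scaled.obs["score_scaled"], method="spearman"))
```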