Rank_genes_groups expects log data but defaults to adata.raw, why?

luguna · November 7, 2025, 5:19pm

Hi all,

I am relatively new to single-cell (and spatial) data analysis. I was following this pipeline to try to replicate an analysis, and it had the following steps:

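import scanpy as sc

# adata is the AnnData object loaded earlier in the pipeline
sc.pp.filter_cells(adata, min_counts=40)
sc.pp.filter_cells(adata, min_genes=15)
adata.raw = adata  # stored before normalization
adata.layers['raw'] = adata.X.copy()  # copy, so later in-place steps cannot alter it
sc.pp.normalize_total(adata, target_sum=100)
sc.pp.log1p(adata)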

(...)

sc.tl.rank_genes_groups(adata, groupby='louvain_0.5', method='wilcoxon', key_added='louvain_0.5')

When I ran it, rank_genes_groups() issued a warning that the data appeared to be raw counts. After some reading and checking the function’s documentation, I learned that rank_genes_groups() does indeed expect logarithmized data. However, the function also has an argument “use_raw”, which defaults to None; in that case it uses the raw attribute of adata whenever it is present.
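To check my understanding of that default, here is a minimal plain-Python sketch of how I read the documented behaviour. It is not scanpy's actual code: `resolve_use_raw` and `has_raw` are hypothetical names I made up, and the error for `use_raw=True` without raw data is my assumption.

```python
from typing import Optional


def resolve_use_raw(use_raw: Optional[bool], has_raw: bool) -> bool:
    """Hypothetical helper: decide whether adata.raw would be used."""
    if use_raw is None:
        # Documented default: use adata.raw whenever it is present.
        return has_raw
    if use_raw and not has_raw:
        # Assumption: explicitly asking for raw without having it is an error.
        raise ValueError("use_raw=True was requested, but there is no raw data")
    return use_raw


# In my pipeline adata.raw was set before normalization, so it exists.
print(resolve_use_raw(None, has_raw=True))    # True: adata.raw is tested
print(resolve_use_raw(False, has_raw=True))   # False: adata.X is tested instead
print(resolve_use_raw(None, has_raw=False))   # False: nothing to fall back to
```

If this reading is right, passing use_raw=False to sc.tl.rank_genes_groups in the pipeline above would make it test the log-normalized adata.X rather than the counts stored in adata.raw.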

What is the logic here? Why does it expect log data yet default to raw when that is present?

Other scanpy functions (e.g. violin plots) also default to adata.raw if present. What am I missing?

To be fair, I haven’t seen many tutorials that set “adata.raw = adata”. Is that an old habit that is now obsolete, with people just using layers instead?
