rank_genes_groups expects log data but defaults to adata.raw, why?

Hi all,

I am relatively new to single-cell (and spatial) data analysis. I was following this pipeline to try to replicate an analysis, and it had the following steps:

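import scanpy as sc

# Drop cells with too few total counts or too few detected genes
sc.pp.filter_cells(adata, min_counts=40)
sc.pp.filter_cells(adata, min_genes=15)
# Store the filtered counts (not yet normalized or logged) in adata.raw and in a layer
adata.raw = adata
adata.layers['raw'] = adata.X
# Normalize each cell to a total of 100 counts, then log1p-transform adata.X
sc.pp.normalize_total(adata, target_sum=100)
sc.pp.log1p(adata)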

(...)

sc.tl.rank_genes_groups(adata, groupby='louvain_0.5', method='wilcoxon', key_added='louvain_0.5')

When I ran it, rank_genes_groups() issued a warning that the data appeared to be raw counts. After some reading and checking the function’s documentation page, I learned that rank_genes_groups() does indeed expect logarithmized data. However, the function also has an argument “use_raw”, which defaults to None. When it is None, the function uses adata.raw if that attribute is present, so here the test ran on the counts I had stored before normalization and log1p.
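To check that I am reading the documentation correctly, here is a minimal sketch of how I understand the default. It does not use scanpy at all: FakeAnnData, resolve_use_raw and select_matrix are hypothetical stand-ins I made up for illustration, and the log1p step skips normalization for brevity.

```python
import math
from typing import Optional


class FakeAnnData:
    """Hypothetical stand-in for AnnData: only X and an optional raw snapshot."""

    def __init__(self, X, raw=None):
        self.X = X
        self.raw = raw


def resolve_use_raw(adata: FakeAnnData, use_raw: Optional[bool] = None) -> bool:
    """Mimic the documented default: with use_raw=None, use raw if it is present."""
    if use_raw is None:
        return adata.raw is not None
    return use_raw


def select_matrix(adata: FakeAnnData, use_raw: Optional[bool] = None):
    """Return the matrix a test would run on under this assumed rule."""
    return adata.raw if resolve_use_raw(adata, use_raw) else adata.X


# Toy counts, and their log1p transform standing in for the processed adata.X
counts = [[4, 0], [1, 7]]
logged = [[math.log1p(v) for v in row] for row in counts]

with_raw = FakeAnnData(X=logged, raw=counts)   # like my pipeline: raw set before log1p
without_raw = FakeAnnData(X=logged)            # raw never set

default_choice = select_matrix(with_raw)                   # falls back to the counts
explicit_choice = select_matrix(with_raw, use_raw=False)   # forced onto the logged X
no_raw_choice = select_matrix(without_raw)                 # nothing in raw, so X
```

If this reading is right, my call above was testing the unnormalized counts in adata.raw, which would explain the warning, and passing use_raw=False to sc.tl.rank_genes_groups should make it use the log-transformed adata.X instead.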

What is the logic here? Why does the function expect log data but default to adata.raw when it is present?

Other scanpy functions (e.g. violin plots) also default to adata.raw if present. What am I missing?

To be fair, I haven’t seen many tutorials that set “adata.raw = adata”. Is that an old habit that is now obsolete, with people just using layers instead?