Question about Github Issue

maflot · March 7, 2023, 9:02am

I have questions about the scanpy foldchange computations.
I stumbled across these two issues, which point out two severe issues about the foldchange computation and the tl.rank_genes_groups function.

github.com/scverse/scanpy

rank_genes_groups logfoldchange different than seurat

opened 09:37PM - 07 Oct 19 UTC

nagaBoulevard

Hi! I've noticed that the function `rank_genes_groups` calculate logFC differen…tly than seurat. https://github.com/theislab/scanpy/blob/5800db45dde0c2060ad0a9bed8ea931bd41da936/scanpy/tools/_rank_genes_groups.py#L207-L208 https://github.com/theislab/scanpy/blob/5800db45dde0c2060ad0a9bed8ea931bd41da936/scanpy/tools/_rank_genes_groups.py#L223 Thus the equation is `log(exp(mean(values))` while in Seurat is https://github.com/satijalab/seurat/blob/96d07d80bc4b6513b93e9c10d8a9d57ae7016f9f/R/differential_expression.R#L175-L179 thus `log(mean(exp(values)))` I was thus wondering if this was intended, since it leads to different logFC values.

github.com/scverse/scanpy

KeyError: 'base' when running `tl.rank_genes_groups`

opened 05:29PM - 18 Apr 22 UTC

naity2

- [x] I have checked that this issue has not already been reported. - [x] I hav…e confirmed this bug exists on the latest version of scanpy. - [x] (optional) I have confirmed this bug exists on the master branch of scanpy. --- **Note**: Please read [this guide](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) detailing how to provide the necessary information for us to reproduce your bug. ### Minimal code sample (that we can copy&paste without having any data) ```python # Your code here sc.tl.rank_genes_groups(adata, "origin", method="wilcoxon") ``` ```pytb [Paste the error output produced by the above code here] --------------------------------------------------------------------------- KeyError Traceback (most recent call last) Input In [18], in <cell line: 1>() ----> 1 sc.tl.rank_genes_groups(adata, "origin", method="wilcoxon") 2 sc.pl.rank_genes_groups(adata, n_genes=25, sharey=False) File ~/app/miniconda3/envs/bio/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:590, in rank_genes_groups(adata, groupby, use_raw, groups, reference, n_genes, rankby_abs, pts, key_added, copy, method, corr_method, tie_correct, layer, **kwds) 580 adata.uns[key_added] = {} 581 adata.uns[key_added]['params'] = dict( 582 groupby=groupby, 583 reference=reference, (...) 587 corr_method=corr_method, 588 ) --> 590 test_obj = _RankGenes(adata, groups_order, groupby, reference, use_raw, layer, pts) 592 if check_nonnegative_integers(test_obj.X) and method != 'logreg': 593 logg.warning( 594 "It seems you use rank_genes_groups on the raw count data. " 595 "Please logarithmize your data before calling rank_genes_groups." 596 ) File ~/app/miniconda3/envs/bio/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:93, in _RankGenes.__init__(self, adata, groups, groupby, reference, use_raw, layer, comp_pts) 82 def __init__( 83 self, 84 adata, (...) 90 comp_pts=False, 91 ): ---> 93 if 'log1p' in adata.uns_keys() and adata.uns['log1p']['base'] is not None: 94 self.expm1_func = lambda x: np.expm1(x * np.log(adata.uns['log1p']['base'])) 95 else: KeyError: 'base' ``` #### Versions <details> [Paste the output of scanpy.logging.print_versions() leaving a blank line after the details tag] ----- anndata 0.8.0 scanpy 1.9.1 ----- Levenshtein NA OpenSSL 21.0.0 PIL 9.1.0 adjustText NA airr 1.3.1 appdirs 1.4.4 asttokens NA attr 21.4.0 backcall 0.2.0 beta_ufunc NA binom_ufunc NA bioservices 1.8.4 boto3 1.21.42 botocore 1.24.42 brotli NA bs4 4.11.1 cattr NA certifi 2021.10.08 cffi 1.15.0 charset_normalizer 2.0.4 colorama 0.4.4 colorlog NA cryptography 36.0.0 cycler 0.10.0 cython_runtime NA dateutil 2.8.2 debugpy 1.6.0 decorator 5.1.1 defusedxml 0.7.1 easydev 0.12.0 entrypoints 0.4 executing 0.8.3 gseapy 0.10.8 h5py 3.6.0 hypergeom_ufunc NA idna 3.3 igraph 0.9.10 ipykernel 6.13.0 ipython_genutils 0.2.0 ipywidgets 7.7.0 jedi 0.18.1 jmespath 1.0.0 joblib 1.1.0 jupyter_server 1.16.0 kiwisolver 1.4.2 leidenalg 0.8.9 llvmlite 0.38.0 lxml 4.8.0 matplotlib 3.5.1 matplotlib_inline NA mpl_toolkits NA natsort 8.1.0 nbinom_ufunc NA networkx 2.8 numba 0.55.1 numpy 1.21.6 packaging 21.3 pandas 1.4.2 parasail 1.2.4 parso 0.8.3 pexpect 4.8.0 pickleshare 0.7.5 pkg_resources NA prompt_toolkit 3.0.29 psutil 5.9.0 ptyprocess 0.7.0 pure_eval 0.2.2 pycparser 2.21 pydev_ipython NA pydevconsole NA pydevd 2.8.0 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pyexpat NA pygments 2.11.2 pylab NA pynndescent 0.5.6 pyparsing 3.0.8 pytoml NA pytz 2022.1 requests 2.27.1 requests_cache 0.9.2 scipy 1.8.0 scirpy 0.10.1 seaborn 0.11.2 session_info 1.0.0 setuptools_scm NA six 1.16.0 sklearn 1.0.2 socks 1.7.1 soupsieve 2.3.2.post1 stack_data 0.2.0 statsmodels 0.13.2 texttable 1.6.4 threadpoolctl 3.1.0 tornado 6.1 tqdm 4.62.3 tracerlib NA traitlets 5.1.1 typing_extensions NA umap 0.5.3 url_normalize 1.4.3 urllib3 1.26.7 wcwidt </details>

Also, I also experienced, that the foldchanges differ drastically compared to the ones calculated by Seurat or MAST.
Will these issue be addressed in future?

maflot · March 20, 2023, 9:12am

[Update]
As far as I understood the main difference between the scanpy and seurat or MAST is in the computation of the mean.

Seurat, MAST using the arithmetic mean.
Scanpy is using the geometric mean.

The scanpy documentation already points out that tools such as MAST are more reliable in this tutorial.

It would be great to point out somewhere in the documentation that it is not possible to directly compare the logFCs computed with scanpy and those from most R packages.

grst · March 22, 2023, 3:20pm

Please also take a look at the best practice chapter on differential expression analysis.

Topic		Replies	Views
Issue with logfoldchanges in scanpy.tl.rank_genes_groups scanpy	1	2537	March 11, 2023
Error with sc.tl.rank_genes_groups : invalid value encountered in log2 self.stats scanpy	0	660	April 22, 2023
Scanpy.rank_genes_groups after pp.regress_out Help	0	41	February 27, 2025
P-values for scanpy.tl.rank_genes_groups scanpy	1	149	July 30, 2024
Scanpy.tl.rank_genes_groups, layer= does not appear to be working scanpy	1	1175	December 31, 2022

Question about Github Issue

Related topics