Scanpy.rank_genes_groups after pp.regress_out

quincey · February 27, 2025, 6:47am

Hello,
I am using scanpy to integrate some datasets. This is my workflow:

import anndata as ad
import scanpy as sc
import scanpy.external as sce

# h5ad_list, hvg_method, nfeatures and pc_num are defined elsewhere
adata = ad.concat(h5ad_list, join="outer", uns_merge="unique")
adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)

sc.pp.filter_cells(adata, min_genes=100)
adata.layers["raw"] = adata.X.copy()  # keep the counts before normalization
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor=hvg_method, n_top_genes=nfeatures, layer="raw")

adata.obs.fillna({"pct_counts_mt": 0}, inplace=True)
sc.pp.regress_out(adata, ["pct_counts_mt"])  # overwrites adata.X

sc.pp.pca(adata)
sce.pp.harmony_integrate(adata, key="sample")
sc.pp.neighbors(adata, n_pcs=pc_num, use_rep="X_pca_harmony")
sc.tl.umap(adata)
sc.tl.leiden(
    adata, key_added="clusters", flavor="igraph", directed=False, n_iterations=2, resolution=0.5
)

sc.tl.rank_genes_groups(adata, "clusters", method="wilcoxon", pts=True)

At the last step I got a warning:
RuntimeWarning: invalid value encountered in log2

I checked adata.X and found some negative values:
array([[-1.38083658e-04, -7.98072433e-03, -3.01776285e-03, …,
-2.20001909e-01, -1.19811085e-01, -8.57294130e-01],
[-1.42802153e-04, -8.02760262e-03, -3.17046385e-03, …,
-1.70193650e-01, 1.48023778e-01, -3.58150496e-01],
[-9.71810418e-05, -7.57435644e-03, -1.69406323e-03, …,
-6.51768412e-01, 1.02808362e+00, -7.13421941e-02],
…,
[-1.43193189e-04, -8.03148757e-03, -3.18311863e-03, …,
-1.66065892e-01, -9.70656011e-02, 7.44920864e-01],
[-1.44542609e-04, -8.04489407e-03, -3.22678887e-03, …,
-1.51821463e-01, -9.10585499e-02, -6.53981357e-01],
[-1.38429397e-04, -7.98415926e-03, -3.02895176e-03, …,
-2.16352295e-01, -1.18271998e-01, -8.46411050e-01]])
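
To make the observation concrete, here is a minimal NumPy-only sketch of how negative values and the log2 warning could be connected. It rests on two assumptions that are not confirmed by the output above: that regressing out a covariate replaces each gene's values with the residuals of a linear fit, and that the log fold change is computed from back-transformed (expm1) group means. The toy data, variable names, and the two group means are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 cells, one gene, one covariate
# (a stand-in for pct_counts_mt).
n_cells = 200
pct_mt = rng.uniform(0.0, 20.0, size=n_cells)
# Log-normalized expression is non-negative by construction.
expr = np.log1p(rng.poisson(lam=2.0 + 0.2 * pct_mt))

# Ordinary least squares with an intercept: expr ~ 1 + pct_mt.
design = np.column_stack([np.ones(n_cells), pct_mt])
coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
residuals = expr - design @ coef

# With an intercept in the model the residuals average to zero,
# so some of them have to be negative.
has_negative = bool((residuals < 0).any())
mean_residual = float(residuals.mean())

# Assumed fold-change formula: ratio of back-transformed group means.
# If one mean is negative the ratio is negative and log2 is undefined.
mean_group, mean_rest = -0.2, 0.3  # hypothetical group means after regression
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning for the demo
    log2fc = np.log2(np.expm1(mean_group) / np.expm1(mean_rest))

print(has_negative, mean_residual, log2fc)
```

If these assumptions hold, the negative entries in adata.X and the invalid value in log2 would have the same cause: the test is reading residuals instead of log-normalized expression.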

My questions are:

Should I use rank_genes_groups on layer ‘raw’?
Were the negative values produced by ‘pp.regress_out’?
Was my workflow correct?
Is ‘pp.scale’ needed? I didn’t see it in the current scanpy tutorial.

Thanks a lot.
