How to use sc.pp.filter_genes

baicai · December 16, 2025, 6:41am

I would like to ask for clarification regarding the recommended order of
sc.pp.filter_genes and sc.pp.normalize_total, and the intended interpretation
of their interaction.

It is clear that filtering genes (e.g. min_cells=3) prior to normalization
changes the total counts per cell and therefore affects the scaling factors
used by normalize_total. From a technical perspective, this is expected.
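
To make the effect concrete, here is a minimal pure-Python sketch of the arithmetic on a made-up count matrix. The helpers `filter_genes` and `size_factors` are hypothetical stand-ins written for this post, not Scanpy's implementation; they only mimic what I understand sc.pp.filter_genes(adata, min_cells=3) and sc.pp.normalize_total(adata, target_sum=100) to do conceptually. The toy counts and the target sum of 100 are assumptions for illustration.

```python
# Toy count matrix (hypothetical data): rows are cells, columns are genes.
counts = [
    [10, 0, 5, 1, 0],
    [8, 2, 0, 0, 0],
    [12, 1, 7, 0, 0],
    [9, 0, 6, 0, 3],
]


def filter_genes(matrix, min_cells):
    """Keep genes with a nonzero count in at least `min_cells` cells.

    Returns the filtered matrix and the indices of the kept genes.
    """
    n_genes = len(matrix[0])
    keep = [
        g for g in range(n_genes)
        if sum(1 for row in matrix if row[g] > 0) >= min_cells
    ]
    return [[row[g] for g in keep] for row in matrix], keep


def size_factors(matrix, target_sum):
    """Per-cell multiplier that scales each cell's total to `target_sum`.

    Assumes every cell has a nonzero total, which holds for the toy data.
    """
    return [target_sum / sum(row) for row in matrix]


filtered, kept = filter_genes(counts, min_cells=3)

# Scaling factors when normalizing the full matrix vs. the filtered one.
before = size_factors(counts, target_sum=100)
after = size_factors(filtered, target_sum=100)

for cell, (b, a) in enumerate(zip(before, after)):
    print(f"cell {cell}: factor before filtering {b:.3f}, after {a:.3f}")
```

In this toy case only genes 0 and 2 survive the filter, every cell loses some counts, and so every cell's scaling factor increases, by a different amount per cell.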

My understanding is that this effect is not an unintended side effect, but a
conceptual choice: gene filtering defines which genes are considered part of
the meaningful expression space, and normalization is then performed with
respect to this reduced, more stable feature set.

In this view, gene filtering is not meant to improve normalization accuracy
per se, but to avoid letting extremely low-frequency, statistically unstable
genes influence library size estimation and downstream modeling (HVG selection,
PCA, DE, etc.).

Could you please confirm whether this interpretation aligns with the intended
design and recommended practice in Scanpy? Are there specific scenarios where
gene filtering should explicitly be avoided before normalize_total?
