I would like to ask for clarification regarding the recommended order of
sc.pp.filter_genes and sc.pp.normalize_total, and the intended interpretation
of their interaction.
It is clear that filtering genes (e.g. min_cells=3) prior to normalization
changes the total counts per cell and therefore affects the scaling factors
used by normalize_total. From a technical perspective, this is expected.
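
To make the effect concrete, here is a minimal NumPy-only sketch of what I mean. It does not call Scanpy. It assumes, as I understand the default behaviour of normalize_total, that each cell is scaled to the median of the per-cell total counts. The count matrix is made up, and the helper name size_factors is hypothetical.

```python
import numpy as np

# Toy raw counts: 4 cells (rows) x 5 genes (columns). Made-up values.
counts = np.array([
    [10, 4, 0, 3, 0],
    [ 6, 2, 0, 0, 0],
    [ 8, 0, 5, 0, 0],
    [12, 6, 0, 0, 1],
])


def size_factors(x):
    """Per-cell scaling factor: total counts divided by the median total.

    Assumption: this mirrors median-based total-count scaling; it is not
    Scanpy's actual implementation.
    """
    totals = x.sum(axis=1)
    return totals / np.median(totals)


# Keep genes detected (count > 0) in at least `min_cells` cells.
min_cells = 3
keep = (counts > 0).sum(axis=0) >= min_cells
filtered = counts[:, keep]

sf_all = size_factors(counts)
sf_filtered = size_factors(filtered)

print("totals, all genes:     ", counts.sum(axis=1))
print("totals, filtered genes:", filtered.sum(axis=1))
print("size factors, all genes:     ", np.round(sf_all, 3))
print("size factors, filtered genes:", np.round(sf_filtered, 3))

# After scaling, every cell sums to the median total of the filtered gene set.
normalized = filtered / sf_filtered[:, None]
```

In this toy case the third cell's factor changes from 13/15 to 8/11, because the counts it loses come from a gene detected in no other cell. The relative scaling between cells therefore shifts, not just the overall target.
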
My understanding is that this effect is not an unintended side effect, but a
conceptual choice: gene filtering defines which genes are considered part of
the meaningful expression space, and normalization is then performed with
respect to this reduced, more stable feature set.
In this view, gene filtering is not meant to improve normalization accuracy
per se, but to keep rarely detected, statistically unstable genes from
influencing library size estimation and downstream modeling (HVG selection,
PCA, DE, etc.).
Could you please confirm whether this interpretation aligns with the intended
design and recommended practice in Scanpy? Are there specific scenarios where
gene filtering should explicitly be avoided before normalize_total?