Implementation of PROGENy pathway analysis

zanebeckwith · April 30, 2026, 1:50pm

Sorry if this is a naive question, but I’m trying to check my understanding of the PROGENy approach to pathway analysis and of scverse/decoupleR’s implementation of it.

In the original PROGENy paper, I see that the PROGENy weights (the values in the model in scverse, right?) are regression coefficients they obtained by fitting “pathway activity” against “per-gene expression levels” (in their fit, the “pathway activities” were 0 or 1 and the expression values were z-scores).

That would align with how scverse/decoupleR gets PROGENy pathway activities from expression data (fitting an mlm and using the t-values as the activities), so I think that part makes sense to me.
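A minimal sketch of the per-sample multivariate linear model described above, written in plain NumPy rather than with decoupler itself: one sample's expression vector is regressed on all pathway weight columns at once, and each pathway's t-value is taken as its activity. The intercept term, the helper name `mlm_activities`, and the synthetic data are assumptions for illustration (real PROGENy weights are sparse, with zeros for genes outside a pathway's footprint).

```python
import numpy as np

def mlm_activities(expr, weights):
    """Hypothetical helper: t-values from one multivariate regression.

    expr:    (n_genes,) expression of one sample (e.g. z-scores)
    weights: (n_genes, n_pathways) PROGENy-style weight matrix
    """
    n_genes = weights.shape[0]
    # Design matrix: intercept column (an assumption) plus one column per pathway
    X = np.column_stack([np.ones(n_genes), weights])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ beta
    dof = n_genes - X.shape[1]
    sigma2 = resid @ resid / dof
    # Standard errors from the OLS covariance matrix sigma^2 (X^T X)^-1
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return (beta / se)[1:]  # drop the intercept's t-value

# Synthetic example: only pathway 0 drives this sample's expression
rng = np.random.default_rng(0)
W = rng.normal(size=(200, 3))
x = 2.0 * W[:, 0] + rng.normal(scale=0.5, size=200)
t = mlm_activities(x, W)
```

Because all pathways share one fit, each t-value measures a pathway's contribution after accounting for the others.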

But in the “PROGENy Scores” section of the paper, they seem to say that you should get the “pathway scores” by simple matrix multiplication (i.e. a weighted sum of gene expression, weighted by the PROGENy weights). This approach also seems to agree with how PROGENy is used in the literature (e.g. this paper from the Saez-Rodriguez group from 2020).
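The matrix-multiplication scoring can be written in a few lines. This sketch assumes a samples × genes expression matrix and a genes × pathways weight matrix with matching gene order; the numbers are made up, and it omits any normalization (such as permutation-based or per-pathway scaling across samples) that a particular implementation may apply afterwards.

```python
import numpy as np

def weighted_sum_scores(expr, weights):
    """Hypothetical helper: plain weighted-sum pathway scores.

    expr:    (n_samples, n_genes) expression matrix
    weights: (n_genes, n_pathways) weight matrix, same gene order
    returns  (n_samples, n_pathways) scores
    """
    # Each score is the sum over genes of expression * weight
    return expr @ weights

expr = np.array([[1.0, 2.0, 0.0],
                 [0.5, -1.0, 3.0]])
weights = np.array([[1.0, 0.0],
                    [0.5, 2.0],
                    [0.0, -1.0]])
scores = weighted_sum_scores(expr, weights)
print(scores)  # [[ 2.  4.] [ 0. -5.]]
```

Unlike a regression t-value, which is unchanged if the expression or the weights are rescaled by a positive constant, this raw score scales directly with both inputs.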

Are these two approaches actually equivalent, and I’m not seeing it? Or are they answering two different questions, and I need to be careful which approach I take?

(Thanks for such a great ecosystem of tools, BTW!)

PauBadiaM · May 1, 2026, 4:37pm

Hi @zanebeckwith ,

Thanks for the kind words! There are really two separate steps here.

The first step is building the “network”, i.e. the weighted gene sets. This is what the regression in the PROGENy paper does: they fit a multivariate linear model on perturbation experiments where the active pathway was known, with z-scores as the response and pathway indicators as predictors. The resulting beta coefficients per gene are the weights, and that’s what you see as the “model/net” in decoupler.
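The network-building step can be sketched as one least-squares fit per gene, with all genes solved at once: the response is each gene's z-score across perturbation experiments, the predictors are 0/1 indicators of which pathway was perturbed, and the fitted coefficients become the weights. This is a simplified, hypothetical illustration on synthetic data (the helper name and experiment layout are assumptions); the published model also restricts each pathway to its most responsive genes, which is left out here.

```python
import numpy as np

def fit_footprint_weights(z, indicators):
    """Hypothetical helper: per-gene regression of z-scores on pathway indicators.

    z:          (n_experiments, n_genes) z-scores
    indicators: (n_experiments, n_pathways) 1 if that pathway was perturbed
    returns     (n_genes, n_pathways) coefficient matrix (the "weights")
    """
    X = np.column_stack([np.ones(z.shape[0]), indicators])
    # lstsq with a 2-D right-hand side fits every gene (column of z) at once
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)  # (1 + n_pathways, n_genes)
    return beta[1:].T  # drop intercepts, orient as genes x pathways

rng = np.random.default_rng(42)
n_genes = 50
true_w = rng.normal(size=(n_genes, 2))
# 20 experiments per perturbed pathway plus 20 with neither pathway perturbed
indicators = np.zeros((60, 2))
indicators[:20, 0] = 1
indicators[20:40, 1] = 1
z = indicators @ true_w.T + rng.normal(scale=0.1, size=(60, n_genes))
W_hat = fit_footprint_weights(z, indicators)
```

The unperturbed experiments matter in this toy design: without them, the intercept column would equal the sum of the indicator columns and the design matrix would be rank-deficient.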

The second step is using that weighted gene set to infer pathway scores in new data. Originally, PROGENy did this by multiplying the expression matrix by the weights, which is essentially a weighted sum (wsum in decoupler). In the decoupler benchmark we found that ulm both performs better and runs faster than wsum, so that’s what we recommend. mlm also performs well in principle, but it can suffer when the weights of different pathways are collinear, which is why we suggest ulm as the default.
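To see why collinearity matters, here is a NumPy sketch contrasting a univariate model (one regression per pathway, in the spirit of ulm) with a multivariate one (all pathways at once, in the spirit of mlm), both taking t-values as scores and both assuming an intercept. Two hypothetical pathways are given nearly identical weights, and the sample is driven by the first one. It illustrates the collinearity issue only; it is not decoupler's code and does not reproduce its benchmark.

```python
import numpy as np

def t_values(expr, weights):
    """OLS t-values of the weight columns (intercept assumed, then dropped)."""
    X = np.column_stack([np.ones(len(expr)), weights])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ beta
    sigma2 = resid @ resid / (len(expr) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return (beta / se)[1:]

def ulm_scores(expr, weights):
    # One simple regression per pathway, each ignoring the others
    return np.array([t_values(expr, weights[:, [p]])[0]
                     for p in range(weights.shape[1])])

def mlm_scores(expr, weights):
    # One joint regression with every pathway as a predictor
    return t_values(expr, weights)

rng = np.random.default_rng(1)
n_genes = 200
w_a = rng.normal(size=n_genes)
w_b = w_a + rng.normal(scale=0.05, size=n_genes)  # nearly collinear with w_a
W = np.column_stack([w_a, w_b])
x = w_a + rng.normal(scale=1.0, size=n_genes)     # driven by pathway a only
ulm = ulm_scores(x, W)
mlm = mlm_scores(x, W)
```

With these synthetic inputs, the univariate fits report both pathways as strongly active (their weights are almost the same, so they cannot be told apart), while the joint fit splits the shared signal between the two correlated columns, inflating the standard errors so that neither t-value stands out.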

Hope this is helpful!

zanebeckwith · May 1, 2026, 8:19pm

This is excellent, thank you @PauBadiaM!

We’ll use ulm rather than mlm, then.
