Implementation of PROGENy pathway analysis

Sorry if this is a naive question, but I’m trying to check my understanding of the PROGENy approach to pathway analysis, and scverse/decoupleR’s implementation of it.

In the original PROGENy paper, I see that the PROGENy weights (the values in the model in scverse, right?) are regression coefficients that they found from fitting “pathway activity” vs “per-gene expression levels”.
(And when they did their fit, the “pathway activities” were 0 or 1, and the expressions were z-scores).

That would align with how scverse/decoupleR is getting PROGENy pathway activities from expression data (running an mlm, and using the t-values as the activities). So I think that makes sense to me.

But in the “PROGENy Scores” section of the paper, they seem to explain that you should get the “pathway scores” by simple matrix multiplication (i.e. a weighted sum of the gene expression, weighted by the PROGENy weights). And this approach seems to agree with the usage of PROGENy that I see in the literature (e.g. this 2020 paper from the Saez-Rodriguez group).

Are these two approaches actually equivalent, and I’m not seeing it? Or are they answering two different questions, and I need to be careful which approach I take?

(Thanks for such a great ecosystem of tools, BTW!)

Hi @zanebeckwith,

Thanks for the kind words! There are really two separate steps here.

The first step is building the “network”, i.e. the weighted gene sets. This is what the regression in the PROGENy paper does: they fit a multivariate linear model on perturbation experiments where the active pathway was known, with z-scores as the response and pathway indicators as predictors. The resulting beta coefficients per gene are the weights, and that’s what you see as the “model/net” in decoupler.
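To make that first step concrete, here is a toy sketch in plain NumPy. It is not the PROGENy training code: the data are simulated, the names (`P`, `Z`, `B_true`, `weights`) are made up for illustration, and the fit has no intercept or study covariates, which a real model may well include. It only shows the shape of the idea: regress per-gene z-scores on pathway indicators and keep the coefficients as weights.

```python
import numpy as np

rng = np.random.default_rng(1)
n_exp, n_genes, n_pathways = 60, 5, 2

# Which (hypothetical) pathway was perturbed in each simulated experiment.
active = rng.integers(0, n_pathways, size=n_exp)
P = np.eye(n_pathways)[active]          # experiments x pathways, entries 0 or 1

# Made-up "true" per-gene responses to each pathway (pathways x genes).
B_true = np.array([[2.0, 0.0, -1.0, 0.0, 0.5],
                   [0.0, 1.5, 0.0, -2.0, 0.0]])

# Simulated z-scores: pathway effect plus a little noise (experiments x genes).
Z = P @ B_true + 0.1 * rng.normal(size=(n_exp, n_genes))

# Least-squares fit of z-scores (response) on pathway indicators (predictors).
B_hat, *_ = np.linalg.lstsq(P, Z, rcond=None)

# One coefficient per gene and pathway: this plays the role of the "net".
weights = B_hat.T                       # genes x pathways
print(np.round(weights, 2))
```

The resulting genes x pathways matrix is the kind of object the second step takes as input.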

The second step is using that weighted gene set to infer pathway scores in new data. Originally PROGENy did this by matrix multiplication of expression against the weights, which is essentially a weighted sum (wsum in decoupler). In the decoupler benchmark we found that ulm performs better and is faster than wsum, so that’s what we recommend. mlm also performs well in principle, but it can suffer when the weights matrix has collinearity between pathways, which is why we suggest ulm as the default.
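If it helps to see the difference between the three methods, here is a minimal NumPy sketch on simulated data. The functions `wsum`, `ulm` and `mlm` below are toy re-implementations written for this post under my own assumptions (random sparse weights, a single sample, no normalisation or permutation steps); they are not decoupler's code, and the real functions may differ in details.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_pathways = 200, 3

# Hypothetical weight matrix (genes x pathways), standing in for the PROGENy net.
W = rng.normal(size=(n_genes, n_pathways))
W[rng.random(W.shape) < 0.7] = 0.0      # most genes carry no weight for a pathway

# One simulated sample whose expression is driven by pathway 0, plus noise.
x = 2.0 * W[:, 0] + rng.normal(size=n_genes)

def wsum(x, W):
    """Weighted sum: expression times weights, one score per pathway."""
    return x @ W

def ulm(x, W):
    """Slope t-value from regressing x on each pathway's weights separately."""
    n = x.shape[0]
    r = np.array([np.corrcoef(x, W[:, j])[0, 1] for j in range(W.shape[1])])
    return r * np.sqrt((n - 2) / (1.0 - r**2))

def mlm(x, W):
    """Slope t-values from regressing x on all pathways' weights jointly."""
    n, p = W.shape
    X = np.column_stack([np.ones(n), W])             # add an intercept column
    beta, *_ = np.linalg.lstsq(X, x, rcond=None)
    resid = x - X @ beta
    sigma2 = resid @ resid / (n - p - 1)              # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return (beta / se)[1:]                            # drop the intercept

print("wsum:", wsum(x, W))
print("ulm: ", ulm(x, W))
print("mlm: ", mlm(x, W))
```

So wsum gives an unscaled dot product, while ulm and mlm give t-values from a regression of the sample's expression on the weights. With a single pathway the ulm and mlm t-values coincide; they diverge once several pathways with correlated weights are fitted together, which is where the collinearity issue comes in.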

Hope this is helpful!

This is excellent, thank you @PauBadiaM!

We’ll use ulm rather than mlm, then.
