Is there a way in scverse (the Python way) to analyse a pathway (multiple genes) with scanpy and related tools? I would like to either get the pathways for a cell selection I choose, or highlight pathways for a cell cluster.
My feeling is that in R/Seurat there are plenty of options, but somehow I could not find a matching counterpart in Python/scanpy.
Pointers, help, and co-developers are all welcome.
I used to use the gProfiler package for this: gprofiler-official · PyPI
In practice I more often copy lists of genes and paste into web interfaces, gProfiler for quick checks and Enrichr for more exploratory endeavours.
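For readers who want to see what such a gene-list check does conceptually, here is a minimal sketch of an over-representation test using the hypergeometric distribution. It is an illustration only, not g:Profiler's or Enrichr's actual implementation (both apply their own statistics and multiple-testing corrections), and the gene names and helper function are hypothetical placeholders.

```python
from math import comb


def hypergeom_enrichment_p(query, gene_set, background):
    """Return (overlap, p) where p = P(overlap >= observed) under random sampling."""
    universe = set(background)
    query = set(query) & universe
    gene_set = set(gene_set) & universe
    N = len(universe)            # genes in the universe
    K = len(gene_set)            # universe genes annotated to the pathway
    n = len(query)               # genes in the query list
    k = len(query & gene_set)    # observed overlap
    # Sum the hypergeometric tail from k up to the largest possible overlap.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Toy universe of 20 placeholder genes (not real annotations).
background = [f"G{i}" for i in range(20)]
pathway = ["G0", "G1", "G2", "G3", "G4"]
query = ["G0", "G1", "G2", "G10"]

overlap, p_value = hypergeom_enrichment_p(query, pathway, background)
print(overlap, p_value)
```

In practice the query would be the marker genes of a cluster or selection, and the p-values would need correcting for the number of pathways tested.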
Perhaps decoupler can be useful for you
Thank you both! I will read a little into the decoupler.
Is there a way to depict a gene pathway on the single-cell map, instead of a single gene? That is, a more systems-biology-like approach to data visualization? Happy to get involved there as well in case this doesn’t exist in the sc universe.
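One simple way to think about this is to collapse the pathway into a single per-cell score and colour the embedding by that score. The sketch below computes a naive score (mean expression of the pathway genes minus mean expression of all other genes) on a toy matrix. The function name and data are hypothetical, and this is a simplification: scanpy's `sc.tl.score_genes` computes a related score against a sampled control gene set.

```python
from statistics import mean


def pathway_scores(expression, gene_names, pathway_genes):
    """Per-cell score: mean of pathway genes minus mean of the remaining genes."""
    wanted = set(pathway_genes)
    inside = [j for j, g in enumerate(gene_names) if g in wanted]
    outside = [j for j, g in enumerate(gene_names) if g not in wanted]
    if not inside or not outside:
        raise ValueError("need at least one pathway gene and one non-pathway gene")
    return [
        mean(cell[j] for j in inside) - mean(cell[j] for j in outside)
        for cell in expression
    ]


# Toy data: 3 cells x 4 genes (rows are cells), placeholder gene names.
gene_names = ["A", "B", "C", "D"]
expression = [
    [2.0, 4.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
]

scores = pathway_scores(expression, gene_names, ["A", "B"])
print(scores)
```

With a real AnnData object, such a score could be stored as a column in `adata.obs` and then passed to `sc.pl.umap(adata, color=...)` in the same way as a single gene.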
Ok, should have done the reading first and then the question ; )
decoupler is really good. I found this one:
Pathway activity inference — decoupler 1.2.0 documentation (decoupler-py.readthedocs.io)
and I need to check a little deeper what’s possible!
Didn’t know about this one, thanks!
In this vein, does anyone know how to load the mouse MSigDB resource into decoupler, rather than the human one? The vignette loads the human one, but I am looking to use mouse!
I actually need the same thing. Would you be able to open an issue in their GitHub repository?
I’m Pau, the main developer of decoupler, thanks for checking it out!
I saw that @LivR already opened an issue in the repository and that @deeenes (developer of OmniPath) already replied. Here’s the link of the reply in case anyone is interested:
For now it is a little bit “hacky” but I will wrap this into a utility function soon.
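The linked reply is not reproduced in this thread, so the following is only a generic sketch of one common workaround, not necessarily what was suggested there: translate the human gene symbols of each gene set to mouse orthologs using a mapping table. The mapping dictionary, set names, and helper function are hypothetical; a real table would come from an orthology resource. Note that simply changing the letter case of symbols is not enough (for example, human TP53 corresponds to mouse Trp53).

```python
def translate_gene_sets(net, orthologs):
    """Translate (gene_set, human_gene) pairs to (gene_set, mouse_gene) pairs.

    Genes without a known ortholog are dropped; one-to-many orthologs are expanded.
    """
    translated = set()
    for gene_set, human_gene in net:
        for mouse_gene in orthologs.get(human_gene, []):
            translated.add((gene_set, mouse_gene))
    return sorted(translated)


# Toy network in long format; set names are placeholders.
human_net = [
    ("TOY_SET_1", "TP53"),
    ("TOY_SET_1", "VEGFA"),
    ("TOY_SET_2", "CASP3"),
    ("TOY_SET_2", "NO_ORTHOLOG_GENE"),  # hypothetical gene with no mapping
]

# Hypothetical mapping table: human symbol -> list of mouse symbols.
orthologs = {"TP53": ["Trp53"], "VEGFA": ["Vegfa"], "CASP3": ["Casp3"]}

mouse_net = translate_gene_sets(human_net, orthologs)
print(mouse_net)
```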
I encourage you to try our new Python package, Spectra. GitHub - dpeerlab/spectra: Supervised Pathway DEConvolution of InTerpretable Gene ProgRAms (manuscript and pip coming soon)
The problem with pathway annotation is that our annotations are imperfect and scRNA-seq data is noisy. Spectra takes pathway annotations as priors to find factors that contain information from your annotations but also incorporate and explain the information in the gene expression data. So nonsensical genes from an annotation/gene set will be removed, new gene sets will be added, and you can also get entirely new annotations. The method has a hyperparameter called lambda that defines how much weight you put on your annotations versus the gene expression data. So if you feel very strongly about the annotations, you can set a high lambda.
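To give an intuition for what such a weighting parameter does, here is a toy one-dimensional example. It is explicitly not Spectra's objective function: it only shows how a weight `lam` (a hypothetical name) trades off agreement with the data against agreement with a prior when minimizing `(x - data)^2 + lam * (x - prior)^2`.

```python
def blend(data_estimate, prior_estimate, lam):
    """Closed-form minimizer of (x - data)^2 + lam * (x - prior)^2."""
    return (data_estimate + lam * prior_estimate) / (1.0 + lam)


# The data suggests 1.0, the prior suggests 0.0.
for lam in (0.0, 1.0, 9.0):
    # Larger lam pulls the solution toward the prior.
    print(lam, blend(1.0, 0.0, lam))
```

With `lam = 0` the estimate follows the data alone, and as `lam` grows the estimate approaches the prior.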
For immune cells the package also comes with a set of annotations but you can feed in any annotation you want.
Let me know if you have questions.
Hi @wallet-maker – spectra looks interesting. Would be fun to see it use scvi-tools under the hood!
Hi @wallet-maker, spectra looks very interesting, thanks for sharing! Looking forward to reading the manuscript. I agree that prior knowledge is not perfect and that contextualization methods such as spectra are a good expansion in this direction. Interestingly, when we were benchmarking the activity inference methods available in decoupler, we observed that they are quite robust to noise in the prior knowledge. When we added random edges to the prior networks, we didn’t see much change in the resulting activities, but when we deleted edges, the effect was noticeable.
In this figure from the original decoupleR manuscript, we see the correlation of the original activity (no noise) with the activity obtained by randomly adding or deleting a percentage of edges in the prior knowledge.
In any case, it would be cool to generate contextualized prior-knowledge networks using spectra (I would really like to play with the lambda parameter here) and then use them to infer activities using decoupler’s methods. I’d be happy to discuss more and explore synergies!
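The kind of robustness check described above (perturb the prior network, recompute activities, correlate with the unperturbed result) can be sketched on simulated data as follows. This is a toy under strong assumptions: the activity is just the mean expression of the target genes, not one of decoupler's methods, and the data are random, so the numbers are not expected to reproduce the benchmark.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def activity(expr, targets):
    """Toy activity: mean expression of the target genes in each cell."""
    return [mean(cell[g] for g in targets) for cell in expr]


rng = random.Random(0)
n_cells, n_genes = 40, 60
true_targets = list(range(20))

# Simulate cells whose target genes share a latent pathway signal plus noise.
expr = []
for _ in range(n_cells):
    signal = rng.gauss(0, 1)
    expr.append([(signal if g < 20 else 0.0) + rng.gauss(0, 1) for g in range(n_genes)])

reference = activity(expr, true_targets)
non_targets = [g for g in range(n_genes) if g not in true_targets]

# Add 50% random edges (10 non-target genes) or delete 50% of the true edges.
added = true_targets + rng.sample(non_targets, 10)
deleted = rng.sample(true_targets, 10)

r_added = pearson(reference, activity(expr, added))
r_deleted = pearson(reference, activity(expr, deleted))
print(f"added edges: r={r_added:.3f}, deleted edges: r={r_deleted:.3f}")
```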
I am using decoupler to compute the activity of some TFs. I saw the tutorial, Transcription factor activity inference, and I have a question about data normalization: should I z-scale the gene expression across cells?
PS: I also saw the example dataset, pbmc3k.h5ad, whose `adata.X` values are z-scaled across cells.
You can use either the log-normalized data or the z-transformed data; you should get similar results. Note that by default, `decoupler` methods look for the `adata.raw` attribute, where the log-normalized counts are usually stored. My personal preference is to use the log-normalized data stored in `.raw`, since it often contains more genes than the ones stored inside `.X` (because of the highly variable gene filtering).
Thank you, @PauBadiaM. Your reply is helpful.
But I did some simple tests and found the activities computed from the log-normalized and the z-transformed data are different.
Hi @Ann-Holmes ,
Thanks for sharing these tests! What I meant was that the general trend of activities will be similar (as you show with the obtained correlations).
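A small toy example may make the point concrete: the two inputs give different numbers but a similar trend. The sketch below assumes a simplified activity (mean expression of three target genes per cell) rather than any of decoupler's actual methods, and the values are made up for illustration.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def zscale_genes(genes_by_cells):
    """Z-scale each gene (row) across cells."""
    scaled = []
    for row in genes_by_cells:
        mu, sd = mean(row), pstdev(row)
        scaled.append([(v - mu) / sd for v in row])
    return scaled


def activity(genes_by_cells):
    """Toy activity: mean over the target genes for each cell (column)."""
    return [mean(col) for col in zip(*genes_by_cells)]


# Made-up "log-normalized" values: 3 target genes (rows) x 4 cells (columns).
lognorm = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 10.0, 30.0, 30.0],
    [0.0, 5.0, 0.0, 5.0],
]

act_lognorm = activity(lognorm)
act_zscaled = activity(zscale_genes(lognorm))
r = pearson(act_lognorm, act_zscaled)
print(act_lognorm, act_zscaled, r)
```

Z-scaling gives every gene equal weight, whereas on the log-normalized values the high-variance gene dominates, which is why the activities differ while remaining positively correlated.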
Hi @PauBadiaM ,
Thanks for your reply. After reading decoupler’s source code, I now understand how the activities are computed. Just as you say, normalizing the gene expression by log-transform or by z-scaling across cells indeed does not change the activities.