Differential Expression using Scanpy

PaulJonasJost · January 12, 2023, 1:56pm

Hello!

I am trying to run a differential expression analysis using tl.rank_genes_groups. I have more than three clusters, but I only want to compare cluster 1 (C1 in the following) with cluster 2 (C2) and with cluster 3 (C3), one pair at a time.

Now I have two questions regarding this:

What is the correct code?
Looking at the API, I thought of two ways. The first would be to call tl.rank_genes_groups twice, using groups to restrict the comparison to C1 and C2, then to C1 and C3, i.e.

sc.tl.rank_genes_groups(
    adata = adata,
    groupby = "clusters",
    groups=["C1", "C2"],
    method="wilcoxon",
    corr_method="benjamini-hochberg"
)

The problem here is that I do not know exactly how reference interacts with groups. The default is “rest”, but does that mean “all other groups in clusters” or “all other groups listed in the groups argument”? The first code would be right for the latter, while for the former I would probably have to set reference explicitly, i.e.

sc.tl.rank_genes_groups(
    adata = adata,
    groupby = "clusters",
    groups=["C2", "C3"],
    reference = "C1",
    method="wilcoxon",
    corr_method="benjamini-hochberg"
)

Which one would be the correct one here?

Visualization

When I visualise the result using pl.rank_genes_groups_dotplot, I get all clusters on the y-axis, but I only want C1, C2 and C3. How can I do that? The groups argument seems to influence only which genes are shown.

Thank you very much for this great tool, and thanks in advance for the help.

mxposed · January 18, 2023, 12:37am

Hi

According to the code, I think reference="rest" will only take the groups you provided. So the best way to find marker genes that distinguish C1, C2 and C3 from each other is to just pass groups=["C1", "C2", "C3"]. This will compare C1 with C2+C3, C2 with C1+C3, and so on.
Looking at the code again, unfortunately I don’t see an option for this kind of subsetted plotting. A clumsy workaround would be to compute the mean expression (dot colour) and the fraction of expressing cells (dot size) yourself and pass them to pl.rank_genes_groups_dotplot as the dot_color_df and dot_size_df arguments. See here
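The two data frames for that workaround can be built with plain pandas. The sketch below assumes a cells x genes expression table and one cluster label per cell. The helper name dotplot_frames, the toy values and the cutoff of 0 for "expressed" are assumptions for illustration, not part of scanpy.

```python
import pandas as pd


def dotplot_frames(expr, clusters, keep, cutoff=0.0):
    """Return (mean expression, fraction of expressing cells) per kept cluster.

    expr:     cells x genes DataFrame of expression values
    clusters: Series with one cluster label per cell (same index as expr)
    keep:     cluster labels to retain, in the desired row order
    Both returned frames are clusters x genes.
    """
    mask = clusters.isin(keep)
    expr_kept = expr.loc[mask]
    # Cast to str so unused categorical levels do not show up as empty groups
    labels = clusters.loc[mask].astype(str)
    color_df = expr_kept.groupby(labels).mean().loc[keep]
    size_df = (expr_kept > cutoff).groupby(labels).mean().loc[keep]
    return color_df, size_df


# Toy data (assumption): six cells, two genes, four clusters
expr = pd.DataFrame(
    {"geneA": [0.0, 2.0, 4.0, 0.0, 1.0, 3.0],
     "geneB": [1.0, 1.0, 0.0, 0.0, 5.0, 5.0]},
    index=[f"cell{i}" for i in range(6)],
)
clusters = pd.Series(["C1", "C1", "C2", "C2", "C3", "C4"], index=expr.index)

color_df, size_df = dotplot_frames(expr, clusters, ["C1", "C2", "C3"])
print(color_df)
print(size_df)
```

With a real AnnData, the expr table could come from sc.get.obs_df(adata, keys=genes) and the labels from adata.obs["clusters"]. Whether the plot then shows only these three rows may depend on the scanpy version, so treat this as a starting point.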
