Differential Expression using Scanpy

Hello!

I am trying to run a differential expression analysis on three clusters using tl.rank_genes_groups. I have more than three clusters, but I only want to compare cluster 1 (hereafter C1) with cluster 2 (C2) and with cluster 3 (C3), respectively.

Now I have two questions regarding this:

  1. What is the correct code?
    Looking at the API, I thought of two ways. The first would be to call tl.rank_genes_groups twice, using groups to restrict it to C1, C2 and then to C1, C3, i.e.
import scanpy as sc

sc.tl.rank_genes_groups(
    adata=adata,
    groupby="clusters",
    groups=["C1", "C2"],  # second call: groups=["C1", "C3"]
    method="wilcoxon",
    corr_method="benjamini-hochberg",
)

The problem is that I do not know exactly how reference works together with groups. The default is “rest”, but does that mean “all other groups in clusters” or “all other groups listed in the groups argument”? The first code would work for the latter, while for the former I would probably have to set the reference, i.e.

sc.tl.rank_genes_groups(
    adata=adata,
    groupby="clusters",
    groups=["C2", "C3"],
    reference="C1",  # C2 and C3 are each tested against C1 only
    method="wilcoxon",
    corr_method="benjamini-hochberg",
)

Which one would be the correct one here?

  2. Visualization

When I visualise the result using pl.rank_genes_groups_dotplot, I get all clusters on my y-axis, but I only want C1, C2 and C3. How can I do that? The groups argument seems to affect only which genes are shown.

Thank you very much for this great tool, and thanks in advance for the help!

Hi

  1. Judging from the code, I think reference="rest" only takes the groups you provided into account. So the best way to find marker genes for C1, C2 and C3 among themselves is to pass groups=["C1", "C2", "C3"]. This will compare C1 with C2+C3, C2 with C1+C3, and so on.
  2. Looking at the code again, unfortunately I don’t see an option for plotting only a subset of groups. A clumsy workaround would be to compute the expression values and dot sizes yourself and pass them to pl.rank_genes_groups_dotplot as the dot_color_df and dot_size_df arguments.