Hello!
I am trying to run a differential expression analysis on three clusters using tl.rank_genes_groups. I have more than three clusters, but I only want to compare cluster 1 (C1 below) with cluster 2 (C2) and with cluster 3 (C3), respectively.
Now I have two questions regarding this:
- What is the correct code?
Looking at the API, I thought of two ways. The first would be to call tl.rank_genes_groups twice, using groups to filter for C1, C2 and C1, C3, i.e.
sc.tl.rank_genes_groups(
    adata=adata,
    groupby="clusters",
    groups=["C1", "C2"],
    method="wilcoxon",
    corr_method="benjamini-hochberg",
)
The problem here is that I do not know exactly how reference interacts with groups. The default is "rest", but does that mean "all other groups in clusters" or "all other groups listed in the groups argument"? The first snippet would be right under the latter reading, while under the former I would probably have to set the reference explicitly, i.e.
sc.tl.rank_genes_groups(
    adata=adata,
    groupby="clusters",
    groups=["C2", "C3"],
    reference="C1",
    method="wilcoxon",
    corr_method="benjamini-hochberg",
)
Which one would be the correct one here?
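To make the two readings of "rest" concrete, here is a small plain-Python sketch. It does not call scanpy and does not claim which reading scanpy actually implements. The helper reference_cells, its scope parameter and the toy labels are all made up for illustration. It only lists which cells would end up in the reference set for C1 under each reading.

```python
# Plain-Python illustration only: this does not call scanpy and makes no
# claim about which of the two readings scanpy implements.

def reference_cells(labels, groups, group, scope):
    """Return the indices of the cells that "rest" would contain for `group`.

    scope="all":    every cell that is not in `group`
    scope="groups": only cells from the other clusters listed in `groups`
    """
    if scope == "all":
        return [i for i, lab in enumerate(labels) if lab != group]
    if scope == "groups":
        return [
            i for i, lab in enumerate(labels)
            if lab != group and lab in groups
        ]
    raise ValueError(f"unknown scope: {scope!r}")


# Hypothetical toy data: one cluster label per cell, five clusters in total.
labels = ["C1", "C1", "C2", "C2", "C3", "C4", "C5"]
groups = ["C1", "C2"]

# Reading 1: "rest" = all other cells in the clustering.
rest_all = reference_cells(labels, groups, "C1", scope="all")

# Reading 2: "rest" = only the other clusters named in `groups`.
rest_groups = reference_cells(labels, groups, "C1", scope="groups")

print(rest_all)     # cells from C2, C3, C4 and C5
print(rest_groups)  # cells from C2 only
```

Under the first reading, my first snippet would compare C1 against C2 to C5 pooled together, which is not what I want. As far as I can tell, either passing reference explicitly (as in my second snippet) or subsetting adata to the clusters of interest before the call should be unambiguous under both readings.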
- Visualization
When I visualise the result using pl.rank_genes_groups_dotplot, I get all clusters on the y-axis, but I only want C1, C2 and C3. How can I do that? The groups argument seems to influence only the genes shown.
Thank you very much for this great tool, and thanks in advance for the help!