Hello, thank you for providing such brilliant tools!
I am new to analyzing my scRNAseq datasets, and I would like some advice on how to perform DE (differential expression) analysis.
My dataset contains cells from two conditions, control and disease, with only one replicate per condition (we had a tight budget).
I did data integration and clustering, and now I want to determine how gene expression differs in a specific cluster between the conditions.
My understanding is that:
- With only one replicate per condition, we cannot properly run a replicate-based DE method such as DESeq2, because there is no way to distinguish biological from technical variability without multiple replicates. https://support.bioconductor.org/p/118745/
- We should use raw data, not batch-corrected data, for DE analysis (see "Proper way to calculate differential gene expression after batch alignment?", Issue #669, scverse/scanpy on GitHub).
So, I used sc.tl.rank_genes_groups() to identify genes with different expression levels across conditions.
I created an AnnData object containing only the cells in cluster A and ran sc.tl.rank_genes_groups on it to rank genes by condition.
I then filtered the results based on logFC, p-value, and the minimum fraction of cells in either population to identify genes with different expression levels.
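import scanpy as sc

# Keep only the cells assigned to cluster A
adata_clusterA = adata[adata.obs["cell_type"] == "clusterA"].copy()
# Compare conditions within the cluster; pts=True also stores the fraction of expressing cells
sc.tl.rank_genes_groups(adata_clusterA, "condition", use_raw=True, reference="rest", pts=True)
df = sc.get.rank_genes_groups_df(adata_clusterA, "Wild")
logfc_threshold = 0.8
pval_threshold = 0.05
min_pct = 0.1
# Note: this keeps only genes that are higher in "Wild" than in the other condition
df[
    (df["logfoldchanges"] > logfc_threshold)
    & (df["pvals_adj"] < pval_threshold)
    & ((df["pct_nz_group"] > min_pct) | (df["pct_nz_reference"] > min_pct))
].names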
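The filter above only returns genes with a positive log fold change, so I think genes that are lower in the tested group would be missed. Here is a minimal sketch of how I might collect both directions. The helper name `filter_de_genes` and the toy table are made up for illustration. It only assumes a DataFrame with the same column names that sc.get.rank_genes_groups_df returns when pts=True is set.

```python
import pandas as pd


def filter_de_genes(df, logfc_threshold=0.8, pval_threshold=0.05, min_pct=0.1):
    """Hypothetical helper: split a rank_genes_groups-style table into up/down gene lists."""
    # Adjusted p-value below the cutoff
    significant = df["pvals_adj"] < pval_threshold
    # Detected in more than min_pct of cells in at least one of the two populations
    expressed = (df["pct_nz_group"] > min_pct) | (df["pct_nz_reference"] > min_pct)
    keep = df[significant & expressed]
    # Split by the sign of the log fold change
    up = keep.loc[keep["logfoldchanges"] > logfc_threshold, "names"].tolist()
    down = keep.loc[keep["logfoldchanges"] < -logfc_threshold, "names"].tolist()
    return up, down


# Toy table with invented values, only to show the expected column layout
toy = pd.DataFrame(
    {
        "names": ["GeneA", "GeneB", "GeneC", "GeneD"],
        "logfoldchanges": [1.5, -2.0, 0.3, 1.2],
        "pvals_adj": [0.001, 0.01, 0.001, 0.2],
        "pct_nz_group": [0.40, 0.05, 0.60, 0.30],
        "pct_nz_reference": [0.10, 0.50, 0.55, 0.02],
    }
)

up, down = filter_de_genes(toy)
print("up:", up)
print("down:", down)
```

With my real data I would call it as `filter_de_genes(df)` on the table from sc.get.rank_genes_groups_df. Since there is only one replicate per condition, I assume the p-values treat individual cells as replicates, so I would read the resulting lists as candidates rather than confirmed DE genes.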
Am I on the right track to find genes with differential expression?
Are there any better ways to do it?
I have read several papers about DE, but they only discuss DE with multiple replicates.
It would be really helpful for me to get some advice on how experts conduct this kind of analysis.
Thank you!