Hello,
I am struggling to understand the filtering done in decoupler.pp.filter_by_expr.
I first used dc.pl.filter_by_expr to visualize how genes are expressed in my pseudobulk samples:
dc.pl.filter_by_expr(
    adata=pdata_muscle,
    group="anatomical_cluster",
    min_count=10,  # threshold: "minimum number of counts in a given number of samples"
    min_total_count=40,  # threshold: "minimum total number of reads across all samples", i.e. the x-axis value
    large_n=20,  # number of samples in a group for it to be considered large
    min_prop=0.6,  # proportion of samples in the smallest group that should express a gene
)
which gives the following plot:
Could someone explain the plot? As far as I understand, only the genes in the upper-right quadrant will be kept.
Isn't it strange for a gene to have a number of samples of zero while its log total sum of counts is non-zero?
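To make the question concrete, here is a minimal sketch of what I assume the two axes represent. This is not taken from the decoupler source: the counts are made up, and I am assuming the y-axis counts the samples in which a gene reaches min_count (the library may compare normalized values such as CPM instead of raw counts).

```python
# Toy pseudobulk counts (made-up numbers): rows are samples, columns are genes.
counts = [
    [3, 12, 0],
    [4, 15, 0],
    [2, 11, 1],
    [5, 14, 0],
]
min_count = 10        # same value as in my call above
min_total_count = 40  # same value as in my call above
n_genes = len(counts[0])

# Assumed x-axis quantity (before the log): total counts per gene across samples.
total_counts = [sum(row[g] for row in counts) for g in range(n_genes)]

# Assumed y-axis quantity: number of samples where the gene reaches min_count.
n_samples_passing = [
    sum(row[g] >= min_count for row in counts) for g in range(n_genes)
]

print(total_counts)       # [14, 52, 1]
print(n_samples_passing)  # [0, 4, 0]
```

If that reading is right, the first gene has a non-zero total (14) but zero samples reaching min_count, which would explain points sitting at y = 0. Is that what the plot shows?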
Also, I then ran the filtering with dc.pp.filter_by_expr, and even with low thresholds I am left with only 40 genes, so it filtered out a lot.
Thank you very much for your help!
