Differential expression and rarely expressed genes

Hi all,

When I run differential expression on a subset of my cells (specifically, cancer cells compared across various treatment and resistance conditions), many genes come up as differentially expressed. However, when I inspect them, many are expressed in fewer than 1% of the cells in one condition, and in the other condition they are either not expressed at all or expressed at similarly low levels. For now I filter on the proportion of nonzero counts, keeping only genes expressed in at least 30% of the cells. I’m worried, though, that this points to something wrong with how the model calls these genes DE, or that I should be passing some parameters to prevent it. I have already tried a pseudocount of 1e-6, automatic delta, importance weighting, and filtering outlier cells.

My main fear is that, as I understand it, these genes really should not be marked as DE, since calling them DE would inflate the FDR. Yet in my results they are marked as DE.

Hi, indeed, we usually run scVI DE after selecting highly variable genes and subsetting to only those genes. In most cases it makes sense to filter the results afterwards to keep only expressed genes. 30% is rather high; I usually compute max(percent expressing in group 1, percent expressing in group 2) and set the threshold at 5%. Filtering out all genes with an estimated scale below 1e-4 gives a similar result.
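A minimal pandas sketch of the two post-hoc filters described above, applied to a DE results table. It assumes the table has per-group columns named `non_zeros_proportion1`/`non_zeros_proportion2` (fraction of cells with nonzero counts) and `scale1`/`scale2` (estimated normalized expression), which is what I believe scvi-tools' `differential_expression` output contains; check your own table and pass other names if needed. Taking the max of `scale1`/`scale2` for the scale filter is my interpretation, and the toy gene names and values are made up purely for illustration.

```python
import pandas as pd


def expressed_by_pct(
    de_df: pd.DataFrame,
    min_pct: float = 0.05,
    cols: tuple = ("non_zeros_proportion1", "non_zeros_proportion2"),
) -> pd.Series:
    """Boolean mask: gene detected in >= min_pct of cells in at least one group.

    Column names are assumed; adjust them to match your DE table.
    """
    # max(percent expressing in group 1, percent expressing in group 2)
    return de_df[list(cols)].max(axis=1) >= min_pct


def expressed_by_scale(
    de_df: pd.DataFrame,
    min_scale: float = 1e-4,
    cols: tuple = ("scale1", "scale2"),
) -> pd.Series:
    """Boolean mask: largest estimated scale across the two groups >= min_scale."""
    return de_df[list(cols)].max(axis=1) >= min_scale


# Hypothetical toy DE table (three made-up genes) to show usage.
toy = pd.DataFrame(
    {
        "non_zeros_proportion1": [0.008, 0.40, 0.03],
        "non_zeros_proportion2": [0.000, 0.10, 0.06],
        "scale1": [2e-6, 3e-4, 5e-5],
        "scale2": [1e-7, 1e-4, 2e-4],
    },
    index=["GENE_A", "GENE_B", "GENE_C"],
)

kept_pct = toy[expressed_by_pct(toy)]
kept_scale = toy[expressed_by_scale(toy)]
print(kept_pct.index.tolist())    # genes passing the 5% detection filter
print(kept_scale.index.tolist())  # genes passing the 1e-4 scale filter
```

Here GENE_A, the kind of gene from the question that is detected in under 1% of cells in one group and 0% in the other, is dropped by both filters, while the other two genes pass. Using the max over the two groups (rather than requiring expression in both) keeps genes that are switched on in only one condition, which are often exactly the ones of interest.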