Feature selection and effects on DGE analysis?


Thanks for using scvi-tools!

So yes, this is how it works. When you filter genes before fitting the scVI model, you can only perform DE on those genes (the model never saw the other ones, so it can’t make inferences about them).

And indeed, if you’d like to perform DE on more genes, they need to be in the input data given to scVI upfront. In the manuscript we discuss that adding more genes may be problematic if your dataset is small (the rule of thumb is to never use more genes than cells). You can check your latent space: if cell types become blurry, then you’re probably not fitting well!
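
As a rough illustration of picking the gene set before fitting, here is a minimal sketch in plain Python. `select_genes_for_model` is a hypothetical helper, not part of scvi-tools, and the default of 2000 genes is just an assumed starting point. It keeps every gene you want to test later, fills the rest with your top-ranked variable genes, and caps the total at the number of cells, following the rule of thumb above.

```python
def select_genes_for_model(ranked_hvgs, genes_of_interest, n_cells, n_top=2000):
    """Hypothetical helper, not part of scvi-tools.

    Returns the genes to keep before fitting: every gene of interest first,
    then top-ranked variable genes, capped at min(n_top, n_cells).
    """
    # Rule of thumb from the post: never more genes than cells.
    budget = min(n_top, n_cells)
    if len(set(genes_of_interest)) > budget:
        raise ValueError("more genes of interest than the gene budget allows")

    selected = list(dict.fromkeys(genes_of_interest))  # dedupe, keep order
    seen = set(selected)
    for gene in ranked_hvgs:
        if len(selected) >= budget:
            break
        if gene not in seen:
            selected.append(gene)
            seen.add(gene)
    return selected


# Toy example: 5 cells, so at most 5 genes are kept.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6", "G7"]
genes = select_genes_for_model(ranked, ["CD3E", "G2"], n_cells=5)
print(genes)
```

With a real dataset you would then subset your AnnData to these genes (for example `adata[:, genes].copy()`) before setting up and training the model, so that all of them are available for DE afterwards.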

Hope that helps!
