Is the scVI model applicable to bulk RNA-seq data from cell lines?

Hi guys,
I know that scVI is primarily designed for modeling UMI-based single-cell RNA-seq counts, but I’m wondering whether it might also be applicable to bulk RNA-seq data from homogeneous cell lines. Since bulk RNA-seq typically uses full-length sequencing protocols, I was thinking it might be possible to preprocess it similarly to Smart-seq2 data.

After reviewing the scVI parameters in the scvi-tools documentation, I have a question: if we consider bulk RNA-seq data as essentially the sum of a large number of identical cells, dropout events should be rare. In that case, would I need to adjust any specific parameters related to dropout modeling (e.g. disable zero inflation or tweak dispersion priors) to make scVI suitable for this type of data?
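
To make the "sum of many cells" intuition concrete, here is a small toy sketch of how the probability of observing a zero shrinks as more cells are pooled. It assumes each cell's count for a gene is an independent negative binomial draw with the same mean and inverse dispersion, which is a simplification and not how scVI itself is parameterized internally. The function names and the numbers are made up for illustration.

```python
def nb_zero_prob(mu, theta):
    """P(count == 0) for a negative binomial with mean mu and inverse dispersion theta."""
    return (theta / (theta + mu)) ** theta


def zinb_zero_prob(mu, theta, pi):
    """P(count == 0) for a zero-inflated NB with extra zero (dropout) probability pi."""
    return pi + (1.0 - pi) * nb_zero_prob(mu, theta)


# Hypothetical per-cell parameters for one lowly expressed gene.
per_cell_mean = 0.5
per_cell_theta = 2.0

for n_cells in (1, 10, 100):
    # The sum of n i.i.d. NB(mu, theta) counts is NB(n * mu, n * theta).
    p_zero = nb_zero_prob(n_cells * per_cell_mean, n_cells * per_cell_theta)
    print(f"{n_cells:>4} pooled cells: P(zero) = {p_zero:.3g}")
```

With `pi = 0` the zero-inflated version reduces to the plain negative binomial, so under this toy assumption the zero-inflation component would simply have little left to explain in pooled data.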

Would appreciate any thoughts or suggestions on this.

Hi, you don’t need any adjustments. The division by gene length in some of our tutorials is only there to make Smart-seq2 and 10x data more comparable. I would use the default ZINB reconstruction loss. However, keep in mind that with a small number of samples (the counterpart of the number of cells in single-cell studies) the model will not generalize well; if you have more than 500 bulk samples, this limitation shouldn’t matter much.
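
For reference, a minimal sketch of what fitting scVI with its defaults on a bulk count matrix could look like. It assumes `adata` is an AnnData object with one row per bulk sample and raw integer counts, either in `adata.X` or in the layer named by `counts_layer`. The wrapper `fit_scvi_on_bulk` and its argument names are hypothetical; the scvi-tools calls inside it are the standard `SCVI` workflow.

```python
def fit_scvi_on_bulk(adata, counts_layer=None, gene_likelihood="zinb"):
    """Fit scVI on bulk samples and return the model and latent embedding.

    adata: AnnData with raw counts (samples x genes); an assumption of this sketch.
    counts_layer: layer holding raw counts, or None to use adata.X.
    gene_likelihood: "zinb" is the scVI default; "nb" drops zero inflation.
    """
    import scvi  # imported lazily; requires scvi-tools to be installed

    # Register the count matrix with scvi-tools.
    scvi.model.SCVI.setup_anndata(adata, layer=counts_layer)

    # Default architecture; only the likelihood is passed explicitly.
    model = scvi.model.SCVI(adata, gene_likelihood=gene_likelihood)
    model.train()

    # One latent vector per bulk sample.
    latent = model.get_latent_representation()
    return model, latent
```

Calling `fit_scvi_on_bulk(adata)` keeps the ZINB default. Passing `gene_likelihood="nb"` is an option if you want to compare against a model without zero inflation, but that comparison is a suggestion of this sketch and not something the defaults require.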