Is the scVI model applicable to bulk RNA-seq data from cell lines?

Yao · April 16, 2025, 2:13am

Hi guys,
I know that scVI is primarily designed for modeling UMI-based single-cell RNA-seq counts, but I’m wondering whether it might also be applicable to bulk RNA-seq data from homogeneous cell lines. Since bulk RNA-seq typically uses full-length sequencing protocols, I was thinking it might be possible to preprocess it similarly to Smart-seq2 data.

After reviewing the scVI parameters in the scvi-tools documentation, I’m curious: if we consider bulk RNA-seq data as essentially the sum of a large number of identical cells, dropout events should be rare. In that case, would I need to adjust any specific parameters related to dropout modeling (e.g. disable zero inflation or tweak dispersion priors) to make scVI suitable for this type of data?

Would appreciate any thoughts or suggestions on this.
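
The intuition that zeros become rare when many cells are summed can be checked with a quick back-of-the-envelope calculation. This is only a sketch under a simplifying assumption that is not stated in the post: counts per cell are taken to be independent Poisson draws with the same rate, so overdispersion and sequencing depth are ignored. The function name and the example rate are hypothetical.

```python
import math


def zero_probability(rate_per_cell: float, n_cells: int = 1) -> float:
    """P(count == 0) for the sum of n_cells independent Poisson(rate_per_cell) draws."""
    # A sum of independent Poisson variables is itself Poisson with the summed rate,
    # so the chance of observing zero is exp(-total_rate).
    return math.exp(-rate_per_cell * n_cells)


# Hypothetical expected count per cell for a lowly expressed gene.
rate = 0.05
for n in (1, 100, 10_000):
    print(f"{n:>6} cells: P(zero) = {zero_probability(rate, n):.3g}")
```

Under this toy model a gene that is zero in most single cells is almost never zero in the pooled sample, which is the reasoning behind asking whether zero inflation is still needed.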

cane11 · April 16, 2025, 5:27am

Hi, you don’t need any adjustment. The division by gene length in some of our tutorials is only there to make Smart-seq2 and 10x data more comparable. I would use the default ZINB reconstruction loss. However, keep in mind that with a low number of samples (the equivalent of cell number in single-cell studies) the model will not generalize well; if you have more than 500 bulk samples, this limitation shouldn’t matter much.
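
For anyone who wants to try this, a minimal sketch of the setup might look like the following. It assumes an AnnData object with one row per bulk sample and raw, unnormalized counts in `.X`; the helper name `train_scvi_on_bulk` is hypothetical and not part of scvi-tools. `gene_likelihood="zinb"` matches the default recommended above, and passing `"nb"` instead is the option that drops the zero-inflation component asked about in the question.

```python
def train_scvi_on_bulk(adata, gene_likelihood="zinb", batch_key=None, max_epochs=None):
    """Fit scVI on bulk samples stored as raw counts in adata.X (one row per sample).

    Hypothetical helper for illustration; returns the trained model and the
    per-sample latent representation.
    """
    # Imported inside the function so that defining it does not require scvi-tools.
    import scvi

    # Register the count matrix (and an optional batch column in adata.obs).
    scvi.model.SCVI.setup_anndata(adata, batch_key=batch_key)

    # "zinb" keeps zero inflation; "nb" would use a plain negative binomial.
    model = scvi.model.SCVI(adata, gene_likelihood=gene_likelihood)

    # max_epochs=None lets scvi-tools choose the number of epochs itself.
    model.train(max_epochs=max_epochs)

    return model, model.get_latent_representation()
```

With only a few dozen samples the caveat about generalization still applies, so it is worth comparing the latent space against a simpler baseline before relying on it.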
