Raw count input for scVI cell clustering

I noticed that the scVI model recommends raw counts as input. Is there an argument I can pass to use a preprocessed (CPM-normalized and log1p-transformed) count matrix instead?

Hi, thank you for your question. scVI only supports raw counts as input because the generative portion of the model uses either the negative binomial or Poisson distribution, both of which expect discrete values. Since a log1p transform maps counts to continuous values, transformed input is unsupported.
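As a rough illustration of the discreteness point, here is a small NumPy-only sketch. The helper `looks_like_raw_counts` is a hypothetical name, not part of scvi-tools, and the toy matrix is made up; it assumes a dense cells-by-genes array.

```python
import numpy as np

def looks_like_raw_counts(X, tol=1e-6):
    """Return True if every entry is a non-negative integer (within tol).

    Hypothetical helper for illustration only; assumes a dense array.
    """
    X = np.asarray(X, dtype=float)
    return bool(np.all(X >= 0) and np.all(np.abs(X - np.round(X)) <= tol))

# Toy cells-by-genes UMI count matrix (made-up numbers).
raw = np.array([[0, 3, 1],
                [2, 0, 5]])

# CPM normalization followed by log1p, as described in the question.
cpm = raw / raw.sum(axis=1, keepdims=True) * 1e6
log_cpm = np.log1p(cpm)

print(looks_like_raw_counts(raw))      # True
print(looks_like_raw_counts(log_cpm))  # False
```

If your AnnData object holds transformed values in `.X` but you kept the raw counts in a layer, one option is to pass that layer's name through the `layer` argument of `scvi.model.SCVI.setup_anndata`.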

We also don’t support any sort of normalized input because the scVI model will either learn a latent size factor per cell or use the empirical total UMI counts per cell, both of which are needed in the generative portion to reconstruct counts.
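To see why normalization is a problem for the size factor, here is a minimal sketch with made-up numbers (not scvi-tools code). It computes the empirical total UMI count per cell and shows that CPM normalization makes every cell sum to the same constant, so the per-cell depth can no longer be recovered from the normalized matrix.

```python
import numpy as np

# Toy cells-by-genes UMI counts (made-up numbers); the two cells
# differ a lot in sequencing depth.
raw = np.array([[0, 3, 1],
                [20, 0, 50]])

# Empirical library size: total UMI count per cell.
library_size = raw.sum(axis=1)

# After CPM normalization every cell sums to one million,
# so the original depth difference is no longer visible.
cpm = raw / library_size[:, None] * 1e6
cpm_totals = cpm.sum(axis=1)

print(library_size)                  # [ 4 70]
print(np.allclose(cpm_totals, 1e6))  # True
```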

Appreciate your explanation.