Hi scvi-tools team,
Great piece of software.
I’m looking to benchmark some models and datasets with respect to imputation performance.
In your documentation, it is not immediately clear how to properly impute gene expression values using scVI.
From the scVI paper:
"This mapping goes through intermediate values ρ^n_g, which provide a batch-corrected, normalized estimate of the percentage of transcripts in each cell n that originate from each gene g. We used these estimates for differential expression analysis and its scaled version (multiplying ρ^n_g by the estimated library size ℓ_n) for imputation."
I have surmised that ρ^n and ℓ_n can be obtained through the functions get_normalized_expression and get_latent_representation.
My question concerns the library_size argument of the former function. In your user guide, you use a common library size. Hence my question: to benchmark imputation performance, should the expression frequencies be scaled to the latent library sizes or to a common library size?
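To make the two options concrete, here is a minimal NumPy sketch of what I mean. The arrays rho and ell are toy stand-ins that I made up, not scVI output, and common_size is an arbitrary value. My understanding is that the two cases would correspond to passing library_size="latent" versus a fixed number to get_normalized_expression, but that mapping is an assumption on my part.

```python
import numpy as np

# Toy stand-ins (assumptions, not scVI output): 3 cells x 4 genes.
# rho[n, g] is the normalized frequency of gene g in cell n; each row sums to 1.
rho = np.array([
    [0.50, 0.25, 0.125, 0.125],
    [0.25, 0.25, 0.25, 0.25],
    [0.125, 0.125, 0.25, 0.50],
])

# ell[n] is a hypothetical estimated library size for cell n.
ell = np.array([1000.0, 2000.0, 4000.0])

# Option A: scale each cell by its own estimated library size.
# Broadcasting ell over the gene axis gives ell[n] * rho[n, g].
imputed_latent = rho * ell[:, None]

# Option B: scale every cell by the same, arbitrarily chosen library size.
common_size = 10_000.0
imputed_common = rho * common_size

# Row sums recover the scaling that was applied to each cell.
print(imputed_latent.sum(axis=1))
print(imputed_common.sum(axis=1))
```

With option A the imputed values sit on the same scale as each cell's raw counts, whereas option B removes depth differences between cells, so the two would give different errors when compared against held-out counts.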
Thanks in advance!