Assessing scVI fit by gene

mxposed · June 22, 2021, 7:12pm

Hi,

Again, thank you for the great package!
If I get it right, because the scVI model is a VAE, during fitting it reconstructs the expression data for each cell from the latent z, the cell's UMI count, and whatever batch variables are passed to the model.
How best to determine whether the fit turned out to be good? Are there recommended cutoffs on the absolute reconstruction error? Can the error be assessed at the per-gene level?

Thank you

saketkc · June 23, 2021, 12:55pm

(Not part of scVI dev team)

One way I have assessed this in the past is by looking at the mean-mean, variance-variance, and mean-variance relationships. For each gene, you can ask whether the reconstructed expression captures these relationships.
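
A minimal sketch of the mean-mean and variance-variance checks, assuming you already have an observed count matrix and a reconstructed one of the same shape (cells x genes). The function name `gene_moment_agreement` and the toy Poisson data are made up for illustration; with scvi-tools, the reconstructed counts could probably come from something like `model.posterior_predictive_sample()`, but check the docs for your version.

```python
import numpy as np

def gene_moment_agreement(observed, reconstructed):
    """Compare per-gene means and variances of two (cells x genes) matrices."""
    observed = np.asarray(observed, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)

    # Per-gene moments, computed across cells (axis 0)
    obs_mean, rec_mean = observed.mean(axis=0), reconstructed.mean(axis=0)
    obs_var, rec_var = observed.var(axis=0), reconstructed.var(axis=0)

    def log_corr(a, b):
        # Pearson correlation on the log1p scale so that a few highly
        # expressed genes do not dominate the comparison
        return float(np.corrcoef(np.log1p(a), np.log1p(b))[0, 1])

    return {
        "mean_mean": log_corr(obs_mean, rec_mean),
        "var_var": log_corr(obs_var, rec_var),
    }

# Toy data (assumption): both matrices are drawn from the same per-gene
# Poisson rates, standing in for a well-fitting reconstruction.
rng = np.random.default_rng(0)
gene_rates = rng.gamma(2.0, 2.0, size=50)
observed = rng.poisson(gene_rates, size=(500, 50))
reconstructed = rng.poisson(gene_rates, size=(500, 50))

scores = gene_moment_agreement(observed, reconstructed)
print(scores)
```

For the mean-variance relationship, the same per-gene means and variances can be plotted against each other separately for the observed and the reconstructed matrix, and the two curves compared by eye.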

You can also ask which genes are harder to reconstruct (have higher reconstruction error).
The reconstruction error itself carries a lot of information. For example, clustering on the per-gene, per-cell reconstruction error recovers the latent-space-based clustering (to an extent).
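
To rank genes by how hard they are to reconstruct, something along these lines may work. It is only a sketch: `per_gene_error` is a hypothetical helper, the error used is a plain mean absolute error (the model's own likelihood would be another choice), and the toy data is synthetic. Because raw error grows with expression level, a mean-scaled version is returned as well.

```python
import numpy as np

def per_gene_error(observed, expected):
    """Mean absolute reconstruction error per gene for (cells x genes) inputs."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)

    abs_err = np.abs(observed - expected)   # per-cell, per-gene error matrix
    mae = abs_err.mean(axis=0)              # average over cells -> one value per gene
    # Divide by the gene's mean expression so that high and low expressors
    # can be compared on a relative scale (small constant avoids 0/0)
    scaled = mae / (observed.mean(axis=0) + 1e-8)
    return mae, scaled

# Toy data (assumption): three genes with means 1, 10 and 100, and a
# "reconstruction" that predicts exactly the true mean in every cell.
rng = np.random.default_rng(1)
true_means = np.array([1.0, 10.0, 100.0])
observed = rng.poisson(true_means, size=(2000, 3))
expected = np.broadcast_to(true_means, observed.shape)

mae, scaled = per_gene_error(observed, expected)
hardest_first = np.argsort(mae)[::-1]       # gene indices, largest raw error first
print(mae, scaled, hardest_first)
```

The intermediate `abs_err` matrix (cells x genes) is the kind of per-gene, per-cell error that could be passed to a clustering method.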

As a first contributor I am only allowed one image for this post, but I summarized these observations here in case you are interested.

mxposed · June 23, 2021, 2:58pm

Haha, I have your link bookmarked since you posted it. Finally it’s time to carefully read it : )
Thank you

adamgayoso · June 23, 2021, 3:59pm

I do endorse the answer by @saketkc. One caveat, though: the shape of the NB distribution means that genes with higher means will be more difficult to reconstruct (by definition they have higher reconstruction error, or lower log likelihood). Another potential idea is to look at posterior dispersion indices ([1605.07604] Posterior Dispersion Indices), which might control for this mean dependence somewhat better. I can post code for how to get these soon.
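
The mean dependence of the NB log likelihood can be seen in a small numeric sketch. It assumes the mean / inverse-dispersion parameterization (mean `mu`, inverse dispersion `theta`, variance `mu + mu**2 / theta`); the value `theta = 10` and the helper name `nb_log_pmf` are arbitrary choices for illustration. Even in the best case, where the observed count equals the predicted mean exactly, the log likelihood drops as the mean grows.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log pmf of a negative binomial with mean mu and inverse dispersion theta."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )

theta = 10.0                      # assumed inverse dispersion, shared by all genes
means = [1, 10, 100, 1000]

# Best case for each gene: the observed count is exactly the predicted mean
best_case = [nb_log_pmf(mu, mu, theta) for mu in means]

for mu, ll in zip(means, best_case):
    print(f"mean={mu:5d}  log-likelihood at x=mean: {ll:.3f}")
```

So a raw per-gene log likelihood partly ranks genes by expression level, which is why some normalization across genes is useful before comparing them.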
