How to compare different parameter sets using the validation loss?


I am applying scVI to about 250k cells, using “sample” as the batch key to integrate the snRNA-seq data. I tried several different parameter sets and plotted the validation losses for each model, but even after reading the posts here and here, I am not sure how to interpret the results.
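For context, here is a minimal sketch of how such a comparison can be set up, assuming the scvi-tools API (`setup_anndata` / `SCVI` / `train`); the parameter names match the model configurations listed below, and `adata` stands in for the 250k-cell AnnData object:

```python
# Parameter sets for the first two model configurations (model 3 adds
# continuous covariates on top of these at setup_anndata time).
PARAM_SETS = {
    "model_1": dict(n_hidden=128, n_latent=30, n_layers=2,
                    dropout_rate=0.1, dispersion="gene", gene_likelihood="nb"),
    "model_2": dict(n_hidden=128, n_latent=30, n_layers=1,
                    dropout_rate=0.1, dispersion="gene", gene_likelihood="zinb"),
}


def train_scvi(adata, params):
    """Set up and train one scVI model with "sample" as the batch key."""
    import scvi  # imported lazily; assumes scvi-tools is installed

    scvi.model.SCVI.setup_anndata(adata, batch_key="sample")
    model = scvi.model.SCVI(adata, **params)
    # Log validation metrics every epoch so validation curves can be drawn.
    model.train(check_val_every_n_epoch=1)
    return model
```

Each trained model then exposes `model.history` for the loss curves and `model.get_reconstruction_error` for the numbers quoted below.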

model 1: n_hidden: 128, n_latent: 30, n_layers: 2, dropout_rate: 0.1, dispersion: gene, gene_likelihood: nb

The reconstruction error for this was: -433.33 (from model.get_reconstruction_error(indices=model.validation_indices))

model 2: n_hidden: 128, n_latent: 30, n_layers: 1, dropout_rate: 0.1, dispersion: gene, gene_likelihood: zinb

The reconstruction error for this was: -428.675

model 3: I found that some samples did not mix well after scVI, so I tried adding more covariates, including percent.ribo, UMI count, and feature count. Here are the training curves.

The reconstruction error for this was: -444.29

Here are the training curves for the three models.

My questions:

  1. In model 1, with the default number of epochs, the reconstruction losses look flat but the ELBO losses are still decreasing. Should I train for more epochs here?

  2. Compared with model 1, the gap between the test losses and train losses is larger in model 2. Does this imply anything?

  3. In model 3, both the ELBO losses and reconstruction losses converge. Does this mean the model is better for my data? Also, the gap between the test losses and train losses is much smaller here than in the other two. But the reconstruction error of model 3 is lower (more negative) than the others.

Thank you in advance for your help.

Hey there!

  1. Yeah, I would train the model for a couple more epochs to see if the ELBO flattens out more. In general, we’re more interested in the ELBO converging than in other metrics such as the reconstruction loss.

  2. It does imply that model 2 is overfitting slightly more, but with a difference that small, I wouldn’t worry about it too much. It’s only a concern when the validation loss is much larger than the training loss, and/or the validation loss goes up while the training loss keeps decreasing. As always, you should supplement inspecting loss curves with other evaluations, such as plotting UMAPs or computing integration metrics.

  3. I would plot model 3’s curves on the same y-scale as the other models’, and I think you’ll find that it looks similar to or worse than them. In particular, the validation reconstruction loss fluctuating a lot means that the model is not training as stably.
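To make that same-y-scale comparison concrete, here is a small helper for overlaying the validation-ELBO curves on one shared axis (a sketch: the `elbo_validation` key follows the scvi-tools `model.history` naming, and the numbers in the example are made up):

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this also runs in scripts
import matplotlib.pyplot as plt


def plot_validation_elbo(histories):
    """Overlay validation-ELBO curves on a single shared y-axis.

    `histories` maps a model name to a sequence of per-epoch validation
    ELBO values, e.g. model.history["elbo_validation"] from scvi-tools.
    """
    fig, ax = plt.subplots()
    for name, values in histories.items():
        ax.plot(range(len(values)), values, label=name)
    ax.set_xlabel("epoch")
    ax.set_ylabel("validation ELBO")
    ax.legend()
    return fig, ax


# Example with made-up numbers; the real curves come from model.history.
fig, ax = plot_validation_elbo({
    "model_1": [480, 460, 450, 445],
    "model_2": [478, 462, 452, 448],
    "model_3": [500, 470, 465, 472],
})
```

With all three curves on one axis, a model whose curve sits higher or jumps around is immediately visible.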

In general, I’d recommend training models 1 & 2 for a couple more epochs to see when the ELBO converges. Hope this helps!


Hey Martin,

Thank you very much for the explanations! It helps a lot.

For model 1, I increased the number of epochs to 100, and below is the training curve.

So, even at 100 epochs the ELBO didn’t converge. Should I increase the epochs further? I mean, the default max epochs is only 33. Also, the reconstruction loss goes up after 50 epochs.
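One way to tell a sustained rise (a common overfitting signal) from ordinary noise when reading these curves is to check for consecutive increases; a small illustrative helper (not part of scvi-tools):

```python
def first_rising_epoch(losses, patience=5):
    """Return the first epoch of a run of `patience` consecutive
    increases in `losses`, or None if no such run exists.

    A sustained rise in the validation reconstruction loss usually
    signals overfitting, while short-lived bumps are typically noise.
    """
    run = 0
    for epoch in range(1, len(losses)):
        run = run + 1 if losses[epoch] > losses[epoch - 1] else 0
        if run >= patience:
            return epoch - patience + 1
    return None
```

Fed with `model.history["reconstruction_loss_validation"]`, this pins down whether the post-epoch-50 rise is a real trend.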

The difference between model 3 and models 1/2 is that I added more covariates to correct for, and you mentioned that its training is not stable. I was wondering what I should do to improve it if I want to use this model.
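One common knob for damping unstable validation curves is a smaller learning rate. Below is a sketch assuming the scvi-tools `plan_kwargs` mechanism and the `continuous_covariate_keys` argument of `setup_anndata`; the covariate column names are placeholders and should be replaced with the actual columns in `adata.obs`:

```python
def train_model3_stabilized(adata, params, lr=1e-4):
    """Retrain the model-3 configuration with a reduced learning rate.

    Lowering the learning rate (scvi-tools' default is 1e-3) is a common
    way to damp fluctuating validation curves.
    """
    import scvi  # imported lazily; assumes scvi-tools is installed

    scvi.model.SCVI.setup_anndata(
        adata,
        batch_key="sample",
        # Placeholder column names standing in for percent.ribo,
        # UMI count, and feature count -- use the real adata.obs names.
        continuous_covariate_keys=["percent.ribo", "total_counts", "n_genes"],
    )
    model = scvi.model.SCVI(adata, **params)
    # plan_kwargs are forwarded to the training plan.
    model.train(check_val_every_n_epoch=1, plan_kwargs={"lr": lr})
    return model
```

Whether this actually helps should again be judged from the validation curves, not assumed.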



I tried early_stopping=True and checked the training curves. It seems that the ELBO loss became flat after 200 epochs, but the reconstruction loss went up after 50 epochs. @martinkim0 I was wondering if I should only focus on the ELBO convergence.
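For reference, scvi-tools’ early stopping can be pointed explicitly at the validation ELBO; here is a sketch (argument names from the scvi-tools `train()` API, patience value illustrative), plus a toy version of the patience logic to show what stopping on the ELBO means in practice:

```python
def train_monitoring_elbo(model, max_epochs=400):
    """Train with early stopping driven by the validation ELBO."""
    model.train(
        max_epochs=max_epochs,
        check_val_every_n_epoch=1,
        early_stopping=True,
        early_stopping_monitor="elbo_validation",
        early_stopping_patience=20,  # illustrative value
    )
    return model


def stopped_epoch(val_losses, patience=20, min_delta=0.0):
    """Epoch where patience-based early stopping would halt: stop once
    the monitored loss has not improved by more than `min_delta` for
    `patience` consecutive epochs."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1
```

Because the monitor is the validation ELBO, a late rise in the reconstruction loss alone would not trigger the stop.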