I would check the ELBO and reconstruction_loss curves. The per-step train loss is a noisy estimate (some minibatches simply have lower loss), so I would avoid interpreting it.

The ELBO is preferable to the loss because we use KL warmup during model training: the weight of the local KL term in the total loss increases over time up to a maximum value of 1. This is tracked in the kl_weight plot, it is the reason the kl_local loss first increases and then decreases again, and it means the reported loss changes across epochs simply because the KL weight changes, while the ELBO is computed with the full KL term. kl_global tracks KL losses of parameters that are not computed per cell (e.g. technical variation in the DestVI model).

Whether you prefer a low reconstruction loss (a good generative model) or a low local KL (likely better integration) depends on your downstream task. We use these curves mainly to check that training converged (the ELBO reaches a plateau) and that we don't overfit (validation losses increasing strongly while training losses keep decreasing).

I hope this gives some overview. Most general ML introductions cover this topic in more depth.
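As a minimal sketch of how you might plot these curves side by side, assuming `model` is an already-trained scvi-tools model (the exact keys in `model.history` can vary between versions, so the snippet skips any that are missing):

```python
import matplotlib.pyplot as plt

# Metrics discussed above; each is logged per epoch for train and validation.
metrics = ["elbo", "reconstruction_loss", "kl_local"]

fig, axes = plt.subplots(1, len(metrics), figsize=(12, 3))
for ax, metric in zip(axes, metrics):
    for split in ["train", "validation"]:
        key = f"{metric}_{split}"
        if key in model.history:  # keys may differ by scvi-tools version
            df = model.history[key]  # one-column DataFrame indexed by epoch
            ax.plot(df.index, df.iloc[:, 0], label=split)
    ax.set_title(metric)
    ax.set_xlabel("epoch")
    ax.legend()
plt.tight_layout()
plt.show()
```

Plotting kl_weight the same way (it has no validation split) lets you line up the kl_local bump with the warmup schedule.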