Is older advice for estimating optimal number of epochs for model training of scVI still recommended?

Hi Team,

Earlier scVI docs advised scaling the number of epochs down with the number of obs/cells, e.g.: “([For] > 100,000 cells) you may only need ~10 epochs.” I cannot find similar advice in the current docs. Is this advice still valid, and should even fewer epochs be used for even larger datasets?

Thank you in advance,

Tim

I don’t think it’s relevant anymore, and it really depends on what you are doing.

You will need to train until convergence, and you can use the early_stopping parameter for that. If you are starting from an already-trained large model and only fine-tuning it on your cells, then yes, a few epochs may be enough.
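Conceptually, early stopping watches a validation metric and halts when it stops improving for a set number of epochs (the "patience"). A minimal, generic sketch of that patience logic, in plain Python rather than scvi-tools internals, with illustrative numbers:

```python
def train_with_early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return the epoch at which training would stop, given a stream of
    per-epoch validation losses (precomputed here for illustration)."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch  # stop: no improvement for `patience` epochs
    return len(val_losses) - 1  # ran all requested epochs

# Validation loss improves for four epochs, then plateaus:
losses = [100, 90, 85, 85.1, 85.05, 85.2, 85.1, 85.0]
print(train_with_early_stopping(losses, patience=3))  # → stops at epoch 5
```

In scvi-tools itself this amounts to `model.train(early_stopping=True)`, optionally tuning `early_stopping_patience` and the monitored metric; check the `train()` signature of your installed version, as defaults have changed between releases.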

Thank you. If I am training a completely new model, what could be unintended consequences of not using the ‘early_stopping’ parameter to ensure convergence?

It will just train for as many epochs as you enter, possibly leading to overfitting.

You should check your loss curves while doing this.
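In scvi-tools the curves are stored in `model.history` (e.g. keys like `"elbo_train"` and `"elbo_validation"`, though exact names depend on your version), so you can inspect them programmatically. A generic heuristic for spotting overfitting in such curves, with invented numbers, is to look for the point where validation loss keeps rising while training loss keeps falling:

```python
def overfit_epoch(train_loss, val_loss, window=3):
    """Heuristic: first epoch where validation loss has risen for `window`
    consecutive epochs while training loss has kept falling."""
    for e in range(window, len(val_loss)):
        val_rising = all(val_loss[i] > val_loss[i - 1]
                         for i in range(e - window + 1, e + 1))
        train_falling = train_loss[e] < train_loss[e - window]
        if val_rising and train_falling:
            return e
    return None  # no clear divergence between the two curves

train = [100, 90, 80, 72, 65, 60, 56, 53]  # keeps improving
val   = [105, 95, 88, 85, 86, 88, 91, 95]  # bottoms out, then rises
print(overfit_epoch(train, val))  # → 6
```

The exact threshold is a judgment call; the point is simply that the train and validation curves should be compared, not read in isolation.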

Also, for models trained on the 30 million cells of the CELLxGENE census, going below 10 epochs hurt results. We saw little improvement beyond 10 epochs, but decided to train for 20.

Thank you. Is it normal for the model to keep improving for hundreds of epochs? I used early_stopping=True, but it still ran all 400 epochs. The loss curves look quite normal to me:

The curves you attached actually show overfitting.

However, the reconstruction loss doesn’t tell the whole story: it is only one part of the ELBO (the other being the KL divergence). The total loss (the negative ELBO) is probably still decreasing, which is why training ran the full 400 epochs despite early_stopping.
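The decomposition here is just negative ELBO = reconstruction loss + KL divergence, so the monitored metric can keep improving even while the reconstruction term starts to overfit. A toy numeric illustration (invented numbers, not real scVI output):

```python
# Per-epoch loss components (illustrative values only):
recon = [120, 110, 105, 104, 104.5, 105.2]  # reconstruction loss bottoms out, then rises
kl    = [ 40,  30,  22,  18,  14.0,  10.5]  # KL term keeps shrinking faster

# Negative ELBO = reconstruction loss + KL divergence
elbo = [r + k for r, k in zip(recon, kl)]
print(elbo)  # still strictly decreasing every epoch
```

This is why early stopping monitored on the validation ELBO can legitimately run on even after the reconstruction curve has turned upward; plotting both components separately makes the behavior obvious.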