Earlier docs for scVI mention that the number of epochs should scale down with the number of obs/cells, e.g.: “([For] > 100,000 cells) you may only need ~10 epochs.” I cannot find similar advice in the current docs. Is this advice still valid, and should even fewer epochs be used for even larger datasets?
I don’t think it’s relevant anymore, and it really depends on what you are doing.
You will need to train until convergence, and you can use the `early_stopping` parameter for that. But if you are starting from an already-trained large model and only fine-tuning it with your cells, then yes, a few epochs may well be enough.
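For intuition, here is a minimal sketch of the patience-based logic that early-stopping mechanisms like this generally rely on — this is not scvi-tools' internal implementation, just the idea: stop once the monitored validation metric has not improved for a fixed number of epochs.

```python
# Minimal sketch of patience-based early stopping. NOT scvi-tools'
# actual implementation -- just the general mechanism that an
# `early_stopping` option is built on.

def stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop.

    val_losses: per-epoch validation losses (precomputed here for clarity).
    patience:   epochs to wait without improvement before stopping.
    min_delta:  minimum decrease that counts as an improvement.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # stop here
    return len(val_losses) - 1  # never triggered: ran all epochs

# A loss that plateaus after epoch 3 triggers the stop at epoch 6
# (three consecutive epochs with no improvement):
losses = [1.0, 0.8, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5]
print(stopping_epoch(losses, patience=3))  # -> 6
```

The key consequence: if the monitored metric keeps decreasing, even slightly more than `min_delta` per epoch, the counter keeps resetting and training runs to the epoch limit.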
Thank you. If I am training a completely new model, what could be unintended consequences of not using the ‘early_stopping’ parameter to ensure convergence?
Also, for models trained on the 30 million cells of the CELLxGENE census, going below 10 epochs was not good for results. We saw little improvement beyond 10 epochs, but decided to train for 20 epochs anyway.
Thank you. Is it normal for it to keep improving for hundreds of epochs? I used `early_stopping=True`, but it still ran all 400 epochs. The loss curves look quite normal to me:
However, the reconstruction loss doesn’t tell the whole story, as it is only one part of the ELBO (the other part being the KL divergence). The overall objective, the negative ELBO, is probably still decreasing, and that is why, despite you using `early_stopping`, it ran the whole 400 epochs.
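A toy numerical example of the point above: the reconstruction loss can plateau while the KL term keeps shrinking, so the full objective still improves every epoch and an ELBO-based stopping criterion never fires. All values below are made up for illustration.

```python
# Illustrative (made-up) per-epoch values: reconstruction loss flattens
# out, but the KL divergence keeps decreasing, so the negative ELBO
# (reconstruction loss + KL) is still strictly improving.

recon = [120.0, 100.0, 95.0, 94.0, 94.0, 94.0]  # plateaus after a few epochs
kl    = [ 50.0,  40.0, 32.0, 26.0, 21.0, 17.0]  # still shrinking

neg_elbo = [r + k for r, k in zip(recon, kl)]
print(neg_elbo)  # -> [170.0, 140.0, 127.0, 120.0, 115.0, 111.0]

# Reconstruction loss alone suggests convergence...
assert recon[-1] == recon[-2]
# ...but the monitored objective is still strictly decreasing, so a
# stopping criterion watching the ELBO keeps training going.
assert all(a > b for a, b in zip(neg_elbo, neg_elbo[1:]))
```

This is why a flat-looking reconstruction curve alone is not evidence that early stopping should have triggered.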