So I have been training a couple of models recently and I ran into a problem. scvi has no problem detecting CUDA or the GPU, but training only uses about 25% of the GPU and 20% of the VRAM. Have any of y’all encountered a problem like this? Do you have any tricks to increase the utilization rate?
This is a screenshot of nvtop while I am autotuning hyperparameters.
Increase your batch_size during training and the GPU will be better utilized. You will also save runtime.
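Something like this (a minimal sketch assuming an SCVI model and an already-loaded AnnData `adata`; 1024 is just a starting point to try):

```python
import scvi

scvi.model.SCVI.setup_anndata(adata)  # `adata` is your AnnData object
model = scvi.model.SCVI(adata)

# train() accepts batch_size directly; the default (128) often leaves the GPU idle
model.train(batch_size=1024)
```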
Thanks for the tip! I was able to get twice the utilization with a batch size of 1024. One note: as of 1.4.0.post1 the autotuner does not respect scvi.settings.batch_size, so batch_size has to be passed in train_params.
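For anyone else hitting this, here is roughly what worked for me. This is only a sketch assuming the search-space layout from the autotune tutorial; the metric, tuned ranges, and num_samples are placeholders:

```python
from ray import tune
import scvi

results = scvi.autotune.run_autotune(
    scvi.model.SCVI,
    adata,  # already registered via setup_anndata
    metrics=["validation_loss"],
    mode="min",
    search_space={
        "model_params": {"n_hidden": tune.choice([128, 256])},
        # batch_size must go in train_params; scvi.settings.batch_size is ignored here
        "train_params": {"batch_size": 1024, "max_epochs": 100},
    },
    num_samples=5,
)
```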
That’s expected behavior: batch_size is a train parameter.
As for scvi.settings.batch_size, it is only used when a function’s batch_size argument is None, and it is mainly intended for the downstream analysis functions. The model’s train() expects a valid batch_size and already has a numeric default as input, so the setting is not consulted there, and you can tune it as you saw.
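Concretely, a small sketch (taking `get_latent_representation` as one such downstream function; if I read the docs right, its `batch_size` argument defaults to None):

```python
import scvi

scvi.settings.batch_size = 1024  # picked up when a method's batch_size is None

latent = model.get_latent_representation()                 # uses 1024 from settings
latent = model.get_latent_representation(batch_size=256)   # explicit value wins
```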
Note: use ray<2.5.1, as the most recent version currently has an issue with scvi-tools.
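e.g.:

```
pip install "ray<2.5.1"
```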