Running totalVI by batch

Hi! Thank you for the great program. I’ve been using totalVI very often on small datasets (~40 samples) with no problems. Recently we’ve acquired a CITE+RNA dataset of more than 100 samples with 10,000 cells each. The data was generated in pools of 4-5 samples each and batches of 4 pools (~20 samples). I believe totalVI has to be run on the full data at once, but the size of the dataset makes this difficult; we’re hitting our computing limits. If running totalVI batch by batch is not an option, what would you recommend to make it less challenging?
Any help would be much appreciated! Thank you again for the amazing program.

Hi, thanks for your question. How long does it currently take to train totalVI on the full dataset? Or are you running into memory issues when loading the whole dataset? If your main concern is total runtime, you could try the following to speed up training:

  • Since early_stopping is enabled by default in TOTALVI, you can tune its parameters, such as early_stopping_patience, which sets how many epochs without improvement in the validation loss are tolerated before training stops. Lowering this value will typically reduce the number of training epochs. See the documentation for details. This should be adjusted after inspecting the validation loss.

  • Increase the batch_size. The default is 256, which typically does not saturate GPU memory. Increasing this value while monitoring GPU utilization can improve runtime.

  • Increase the learning rate to speed up convergence. This should also be done carefully, inspecting the loss curves to make sure the model does not diverge.