I’ve been testing CytoVI ( GitHub - YosefLab/cytovi-reference-implementation ) on my data, which has a bit over 3.8 million cells. I couldn’t get training to finish within 24 hours (I can’t run jobs any longer than that on my cluster). Is there a way to speed it up? I’m already running it on a GPU.
I don’t think that makes much difference, at least on a smaller dataset of around 580,000 cells: it still takes 4.5 hours to finish training. I also switched to TF32 using this code, though I’m not sure it’s the correct way to do it?
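For reference, a common way to enable TF32 matmuls in recent PyTorch versions (the exact snippet used in the post above may differ) is:

```python
import torch

# "high" allows TF32 for float32 matmuls on Ampere+ GPUs;
# the default "highest" keeps full FP32 precision.
torch.set_float32_matmul_precision("high")

# Older-style per-backend flags that control the same behavior
# in cuBLAS and cuDNN:
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```

Note that TF32 mainly speeds up large matrix multiplications; if the bottleneck is dataloading or many small kernel launches, it may have little effect, which would be consistent with the lack of speedup reported here.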
Thanks for using CytoVI. The runtime you are seeing is far beyond what we expect. I have attached our runtime benchmarking results (using the default parameters on a 35-marker panel); for 1M cells we expect a runtime of around 30 minutes.
As @ori-kron-wis suggested, saturating the GPU memory by increasing the batch size can help, but I have the impression there is another issue. As a first step I would recommend using the scvi-tools implementation of CytoVI rather than the reference implementation (we made some performance fixes in between). If that does not yield a massive speedup in training time, it would help if you could share a quick description of your data, the hyperparameters you used, and the computing resources the model is trained on.
I used the scvi-tools implementation and increased the batch_size to 8,192, and training now finishes quite quickly! Thanks @ori-kron-wis, @florianingelfinger
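For anyone curious why the batch size matters so much here: each epoch runs one optimizer step per minibatch, so raising batch_size from the scvi-tools default of 128 to 8,192 cuts the number of steps per epoch by roughly 64x (a rough sketch; actual wall-clock gains also depend on how well the larger batches saturate the GPU):

```python
import math

def steps_per_epoch(n_cells: int, batch_size: int) -> int:
    """Number of minibatch optimizer steps needed to see every cell once."""
    return math.ceil(n_cells / batch_size)

# For the 3.8M-cell dataset in this thread:
print(steps_per_epoch(3_800_000, 128))    # 29688 steps/epoch at the default
print(steps_per_epoch(3_800_000, 8_192))  # 464 steps/epoch at batch_size=8192
```

Larger batches can change optimization dynamics, so it is worth checking that the ELBO converges to a similar value as with the default batch size.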