Increase scVI integration speed

Valentine_Svensson · September 25, 2023, 3:34pm

Hi ConDem94,

If you have a (very) large amount of RAM, you can convert the sparse count matrix (.X in the AnnData object) to a dense array. During training, a minibatch of cells is selected and the corresponding ‘slice’ of gene molecule counts is converted on the fly to a dense matrix. This conversion step, which happens on the CPU, is a bottleneck for the speed of training. If you convert the .X matrix to a dense array up front, you won’t need to do this conversion on the fly. Remember, however, that a dense matrix needs 32 bits (4 bytes) of RAM for each (cell, gene) pair. So for your data, if you have 30,000 genes, you will need 4 bytes * 800,000 * 30,000 = 96 GB of RAM to hold the dense UMI count data.
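As a rough sketch of the above (not an official scvi-tools recipe), the snippet below first estimates the dense size and then densifies .X in place. The helper names `dense_size_gb` and `densify_counts` are made up for illustration, and the code assumes .X is a scipy.sparse matrix and that 32-bit floats are what you want to store.

```python
def dense_size_gb(n_cells, n_genes, bytes_per_value=4):
    """Decimal gigabytes needed for a dense n_cells x n_genes array.

    bytes_per_value=4 corresponds to 32-bit values, as in the estimate above.
    """
    return n_cells * n_genes * bytes_per_value / 1e9


def densify_counts(adata, dtype="float32"):
    """Replace a sparse adata.X with a dense array, in place.

    Hypothetical helper: assumes `adata` is an AnnData object whose .X is a
    scipy.sparse matrix. Check dense_size_gb() against your free RAM first.
    """
    from scipy import sparse  # imported lazily; only needed for the conversion

    if sparse.issparse(adata.X):
        # Cast while still sparse (cheap), then allocate the dense array once.
        adata.X = adata.X.astype(dtype).toarray()
    return adata


# The numbers from this thread: 800,000 cells and 30,000 genes.
print(dense_size_gb(800_000, 30_000))  # 96.0
```

You would call `densify_counts(adata)` once, before setting up and training the model, so the cost of the conversion is paid a single time rather than per minibatch.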

Epochs will run faster with larger minibatch sizes than the default. However, my experience is that this causes training to need more epochs before reaching the same reconstruction error. I tried to optimize the minibatch size, and found that, across a few different datasets, the default reached a lower reconstruction error in less wall-clock time than the alternatives. I would recommend you also evaluate this, but make sure to measure the total time until training finishes as well as keeping track of the loss curves for the different minibatch sizes you’re evaluating.
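One way to set up that comparison is sketched below: a small harness that records wall-clock time and the loss curve for each minibatch size. The names `benchmark_batch_sizes` and `train_scvi` are made up for illustration. The scvi-tools calls in `train_scvi` reflect my understanding of the API (in particular, that `model.history` has a "reconstruction_loss_train" entry), so please check them against the version you have installed.

```python
import time


def benchmark_batch_sizes(train_fn, batch_sizes):
    """Run train_fn(batch_size) for each size; record time and loss curve.

    train_fn must return a list of per-epoch losses.
    """
    results = {}
    for batch_size in batch_sizes:
        start = time.perf_counter()
        losses = train_fn(batch_size)
        elapsed = time.perf_counter() - start
        results[batch_size] = {
            "seconds": elapsed,        # total wall-clock time until finished
            "final_loss": losses[-1],  # loss at the last epoch
            "losses": losses,          # full curve, for plotting
        }
    return results


def train_scvi(adata, batch_size, max_epochs=50):
    """Hypothetical wrapper around scVI training; max_epochs=50 is arbitrary."""
    import scvi  # imported lazily so the harness itself has no dependencies

    scvi.model.SCVI.setup_anndata(adata)
    model = scvi.model.SCVI(adata)
    model.train(max_epochs=max_epochs, batch_size=batch_size)
    # Assumption: history is a dict of DataFrames keyed by metric name.
    return model.history["reconstruction_loss_train"].iloc[:, 0].tolist()


# Example usage (not run here; needs scvi-tools and your AnnData object):
# results = benchmark_batch_sizes(lambda bs: train_scvi(adata, bs), [128, 512, 2048])
# for bs, r in results.items():
#     print(bs, round(r["seconds"]), r["final_loss"])
```

Since larger minibatches may need more epochs, compare the sizes at the point where each reaches the same reconstruction error, not at a fixed epoch count.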

Hope this helps!

/Valentine
