Increase scVI integration speed

Hi ConDem94,

If you have a (very) large amount of RAM, you can convert the sparse count matrix (.X in the AnnData object) to a dense array. During training, a minibatch of cells is selected and the corresponding ‘slice’ of gene molecule counts is converted on the fly to a dense matrix. This conversion step, which happens on the CPU, is a bottleneck for training speed. If you convert .X to a dense array up front, this on-the-fly conversion is no longer needed. Remember, however, that a dense matrix needs 32 bits (4 bytes) of RAM for each cell-gene pair. So for your 800,000 cells, if you have 30,000 genes, you will need 32 bits × 800,000 × 30,000 = 96 GB of RAM to hold the dense UMI count data.
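A minimal sketch of the conversion and the memory arithmetic above. It assumes `.X` is a scipy.sparse matrix (these expose `.astype()` and `.toarray()`) and that 32-bit floats are acceptable for the counts. The helper names `dense_matrix_bytes` and `densify_counts` are hypothetical, not part of scvi-tools or AnnData. Casting to float32 before calling `.toarray()` avoids briefly allocating a float64 dense copy if the sparse data is stored as float64.

```python
def dense_matrix_bytes(n_cells, n_genes, bits_per_value=32):
    """RAM needed for a dense cells x genes matrix (one value per cell-gene pair)."""
    return n_cells * n_genes * bits_per_value // 8


def densify_counts(adata):
    """Replace a sparse adata.X with a dense float32 array, in place.

    Hypothetical helper: relies on scipy.sparse matrices exposing
    .astype() and .toarray(); an already-dense .X is left untouched.
    """
    if hasattr(adata.X, "toarray"):
        # Cast the sparse data to float32 first, then densify, so the
        # dense array is allocated directly at 4 bytes per value.
        adata.X = adata.X.astype("float32").toarray()
    return adata


# Check the estimate before densifying: 800,000 cells x 30,000 genes.
needed = dense_matrix_bytes(800_000, 30_000)
print(f"{needed / 1e9:.0f} GB")  # 96 GB
```

If the printed estimate is comfortably below your available RAM, `densify_counts(adata)` can be called before setting up and training the model.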

Epochs will run faster with minibatch sizes larger than the default. In my experience, however, larger minibatches mean that training needs more epochs to reach the same reconstruction error. When I tried to optimize the minibatch size across a few different datasets, the default was the best choice: it reached a lower reconstruction error in less wall-clock time. I would recommend evaluating this on your data as well, but make sure to measure the total time until training finishes and to track the loss curves for each minibatch size you evaluate.
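One way to run that comparison is a small harness that times a full training run per minibatch size and keeps its loss curve, so both wall-clock time and convergence can be compared. This is a sketch: `train_fn` is a hypothetical callable you supply that trains a fresh model with the given batch size and returns per-epoch losses. With scvi-tools it could wrap `model.train(batch_size=bs)` and return the recorded reconstruction loss from `model.history` as a list (check the history key names in your installed version). The toy `toy_train` below only stands in for real training.

```python
import time
from typing import Callable, Dict, List, Optional


def benchmark_batch_sizes(
    train_fn: Callable[[int], List[float]],
    batch_sizes: List[int],
    target_loss: Optional[float] = None,
) -> Dict[int, dict]:
    """Time one training run per minibatch size and record its loss curve."""
    results = {}
    for bs in batch_sizes:
        start = time.perf_counter()
        losses = train_fn(bs)  # hypothetical: trains a fresh model, returns per-epoch losses
        elapsed = time.perf_counter() - start

        # First epoch (1-based) whose loss is at or below the target, if any.
        epochs_to_target = None
        if target_loss is not None:
            for epoch, loss in enumerate(losses, start=1):
                if loss <= target_loss:
                    epochs_to_target = epoch
                    break

        results[bs] = {
            "wall_time_s": elapsed,
            "final_loss": losses[-1],
            "epochs_to_target": epochs_to_target,
            "losses": losses,
        }
    return results


def toy_train(bs: int) -> List[float]:
    # Stand-in for real training: larger batches lower the loss more slowly per epoch.
    return [100.0 / (1 + epoch * 128 / bs) for epoch in range(10)]


results = benchmark_batch_sizes(toy_train, [128, 512], target_loss=35.0)
for bs, r in results.items():
    print(bs, r["epochs_to_target"], round(r["final_loss"], 1))
```

For a fair comparison, use the same number of epochs (or the same stopping rule) for every batch size and compare time to reach a given loss, not just time per epoch.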

Hope this helps!

/Valentine