Setting num_workers in DataLoader to improve training performance

Hi!

Thanks for the great ecosystem of scVI tools! It has become an essential part of my work.

I recently shifted my workload to a server with an A100 GPU, and when I run `model.train()`, I get the following messages:

```
/env/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.
/env/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.
```

I tried setting `scvi.settings.dl_num_workers = 9` (as I was using 10 CPU cores on this instance), but I still get the message above. The input data is 120,000 cells x 8,000 genes.
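For context on what the warning is about: with zero extra workers, CPU-side batch preparation (indexing into the data, densifying, collating) happens serially in the main process, and the GPU can sit idle waiting for the next batch. A toy standard-library sketch (not scvi-tools code; the `load_batch` helper and its sleep are stand-ins for real batch preparation) of why multiple loader workers can overlap that work:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def load_batch(i):
    # stand-in for CPU-side batch preparation (indexing, densifying, collating)
    time.sleep(0.01)
    return i

# serial "num_workers=0"-style loading: one batch at a time
t0 = time.perf_counter()
serial = [load_batch(i) for i in range(20)]
t_serial = time.perf_counter() - t0

# multi-worker-style loading: several batches prepared concurrently
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(load_batch, range(20)))
t_parallel = time.perf_counter() - t0

print(f"serial: {t_serial:.3f}s, 4 workers: {t_parallel:.3f}s")
```

Whether extra workers actually help depends on whether batch preparation, host-to-device transfer, or the model's forward/backward pass is the bottleneck.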

I understand that the message is only a suggestion, but it would be great if training time could be reduced by taking full advantage of the available resources.

I am using the `py3.11-cu11-devel-latest` Docker build of scvi-tools, downloaded ~4 days ago.

Thanks for any help!

Hi, thanks for your question! I haven’t had much success increasing the number of dataloader workers when training models; in my hands it hasn’t made much of a difference, which is surprising since most of our models are primarily bottlenecked by data transfer.

Have you had a chance to measure GPU utilization/power consumption when increasing the number of workers? I would be curious to know.
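In case it helps, one way to watch those numbers while training (assuming `nvidia-smi` is available inside the container) is to poll it in a second terminal:

```shell
# sample GPU utilization, power draw, and memory use once per second
nvidia-smi --query-gpu=utilization.gpu,power.draw,memory.used --format=csv -l 1
```

If utilization stays low while training, the GPU is likely starved for data and more workers could help; if it is already near 100%, extra workers won't change much.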

Hey, I looked into this, and it turns out we don’t apply `settings.dl_num_workers` correctly in our dataloader, which is probably why you’re still seeing the warning. I have a fix here, which will be released in scvi-tools 1.1 soon.