Setting num_workers in DataLoader to improve training performance

Hi!

Thanks for the great ecosystem of scVI tools! It has become an essential part of my work.

I recently shifted my workload to a server with an A100 GPU, and when I run `model.train()`, I get the following messages:

```
/env/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.
/env/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.
```

I tried setting `scvi.settings.dl_num_workers = 9` (as I was using 10 CPU cores on this instance), but I still get the message above. The input data is 120,000 cells x 8,000 genes.
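For context on what the warning is about: with zero extra workers, CPU-side batch preparation (indexing into the data, densifying, collating) happens serially in the main process, and the GPU can sit idle waiting for the next batch. A toy standard-library sketch (not scvi-tools code; the `load_batch` helper and its sleep are stand-ins for real batch preparation) of why multiple loader workers can overlap that work:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def load_batch(i):
    # stand-in for CPU-side batch preparation (indexing, densifying, collating)
    time.sleep(0.01)
    return i

# serial "num_workers=0"-style loading: one batch at a time
t0 = time.perf_counter()
serial = [load_batch(i) for i in range(20)]
t_serial = time.perf_counter() - t0

# multi-worker-style loading: several batches prepared concurrently
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(load_batch, range(20)))
t_parallel = time.perf_counter() - t0

print(f"serial: {t_serial:.3f}s, 4 workers: {t_parallel:.3f}s")
```

Whether extra workers actually help depends on whether batch preparation, host-to-device transfer, or the model's forward/backward pass is the bottleneck.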

I understand that the message is only a suggestion, but it would be great if training time could be reduced by taking full advantage of the available resources.

I am using the `py3.11-cu11-devel-latest` Docker build of scvi-tools, downloaded ~4 days ago.

Thanks for any help!

Hi, thanks for your question! I haven’t had much success increasing the number of dataloader workers when training models; in my hands it hasn’t made much of a difference, which is surprising since most of our models are primarily bottlenecked by data transfer.

Have you had a chance to measure GPU utilization/power consumption when increasing the number of workers? I would be curious to know.
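In case it helps, one way to watch those numbers while training (assuming `nvidia-smi` is available inside the container) is to poll it in a second terminal:

```shell
# sample GPU utilization, power draw, and memory use once per second
nvidia-smi --query-gpu=utilization.gpu,power.draw,memory.used --format=csv -l 1
```

If utilization stays low while training, the GPU is likely starved for data and more workers could help; if it is already near 100%, extra workers won't change much.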

Hey, I looked into this, and it turns out we don’t apply `settings.dl_num_workers` correctly in our dataloader, which is probably why you’re still seeing the warning. I have a fix here, which will be released in scvi-tools 1.1 soon.