Hello scverse!
First time asking a question, so let me know what I can improve.
I’ve been trying to train the scVI and SOLO models, but using my MPS GPU throws the same error mentioned in this post (Error when training model on M3 Max MPS), so I’ve been running on CPU only. It is very slow because of the dataset size (700,000 cells × 35,000 genes), so I was wondering whether you have any suggestions for making sure training runs as fast as possible.
I’ve been running this code:
import scvi

# dataloader / threading settings
scvi.settings.dl_num_workers = 11
scvi.settings.batch_size = 2048
scvi.settings.num_threads = 10

# adata is the 700,000 x 35,000 AnnData object loaded earlier
scvi.model.SCVI.setup_anndata(adata)
vae = scvi.model.SCVI(adata)
vae.train()
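(Side note: my understanding is that recent scvi-tools releases also accept these options directly as train() arguments; the keyword names below are my best guess and may differ in older versions, so treat this as a sketch rather than what I actually ran.)

# best-guess equivalent, passing the settings straight to train()
# (`accelerator` is the newer keyword; older releases used `use_gpu` instead)
vae.train(
    accelerator="cpu",  # stay on CPU until the MPS error is resolved
    batch_size=2048,
)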
If it helps, here is the startup output from running the first code block above:
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/opt/miniconda3/envs/scanpy_env/lib/python3.9/site-packages/lightning/pytorch/trainer/setup.py:187: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.
/opt/miniconda3/envs/scanpy_env/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:436: Consider setting `persistent_workers=True` in 'train_dataloader' to speed up the dataloader worker initialization.
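About that last persistent_workers warning: I think newer scvi-tools versions can forward data-loader arguments through a datasplitter_kwargs argument to train(), but I’m not sure whether my version supports it, so this is only a guess at what that would look like:

# assumed way to act on the Lightning warning above; `datasplitter_kwargs`
# only exists in recent scvi-tools releases and forwards keyword arguments
# to the training dataloader
vae.train(datasplitter_kwargs={"persistent_workers": True})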
Specs: M2 Max, 94 GB RAM
CPU usage during training: 50-65%
RAM usage during training: ~40 GB