GPU available but not used

Hi,

Thanks for the great tool! I am going through the tutorials and am trying to use the GPU for training. My system has an NVIDIA GPU, and I believe I have installed a version of PyTorch that supports it. However, when I ran vae.train(), it reported that a GPU is available in my environment but is not being used for the training step.

I am wondering how I can make use of the GPU detected in the environment. Is there an extra parameter that I should set?
Below are the warning messages and a screenshot of what I described. Thank you very much for the help from the community!

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
/wynton/home/fong/karenlawwc/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/pytorch_lightning/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set accelerator and devices using Trainer(accelerator='gpu', devices=2).
rank_zero_warn(

Hi, thank you for your question. Could you try passing use_gpu=True to train() and checking whether the problem still occurs?
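
For reference, the call would look something like this (a minimal sketch, assuming your model object is named vae as in the tutorial):

import torch

# Sanity check that PyTorch itself can see the CUDA device
print(torch.cuda.is_available())      # should print True given the log above
print(torch.cuda.get_device_name(0))

# Ask scvi-tools 0.20.x to run training on the GPU
vae.train(use_gpu=True)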

Hi, thanks for the reply Martin!

When I tried vae.train(use_gpu=True), the following error occurred.

MisconfigurationException: MPSAccelerator can not run on your system since the accelerator is not available. The following accelerator(s) is available and can be passed into accelerator argument of Trainer: ['cpu', 'cuda']

I am guessing this is a system issue, not something I can solve by adding or changing a parameter?

Thank you!

Hmm I see, I’ve actually come across a similar error before where Lightning tries to use MPS on Linux, so it could be on their side of things.

We have a pull request scheduled for our next minor release (0.20.2) that will allow you to pass accelerator and devices directly into the Trainer. We currently parse use_gpu into those arguments on our end, and something could have gone wrong there. This could solve your issue, but it won’t be out for another week or two.
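
Once that release is out, the call should look roughly like this (a sketch only; the argument names mirror Lightning’s Trainer and may still change before release):

# hypothetical post-0.20.2 API, passing Lightning's Trainer arguments through train()
vae.train(accelerator="gpu", devices=1)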

For now, I would recommend creating a new environment with a CUDA build of PyTorch as well as the latest scvi-tools release (0.20.1) and seeing if the issue persists. Creating a fresh environment usually solves GPU detection issues for me; let me know if it works for you!
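
A quick way to confirm the fresh environment actually has a CUDA build of PyTorch, before involving scvi-tools at all (plain torch calls, nothing scvi-specific):

import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # None here means a CPU-only build
print(torch.cuda.is_available())  # True means PyTorch can actually reach the GPU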

Thank you Martin, I will first create a fresh environment with the latest release (0.20.1) and see if that solves the issue. If not, I will try again once the next release that allows passing accelerator into Trainer is out!

Another question: I tried running the training and subsequent steps without the GPU and everything went great. However, I got stuck at the Differential gene expression step, and here is the error. I am hoping you could help me figure out how to modify my adata to make it work, since it ran fine with the tutorial PBMC dataset.

I already filtered the adata so that the protein and RNA data share the same cells. The full traceback is below. Somehow the index of the adata is messed up and is throwing the error?

DE…: 0%| | 0/18 [00:39<?, ?it/s]

KeyError Traceback (most recent call last)
Cell In [25], line 1
----> 1 de_df = vae.differential_expression(groupby = "cell_calls")
2 de_df.head(5)

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/scvi/model/_totalvi.py:763, in TOTALVI.differential_expression(self, adata, groupby, group1, group2, idx1, idx2, mode, delta, batch_size, all_stats, batch_correction, batchid1, batchid2, fdr_target, silent, protein_prior_count, scale_protein, sample_protein_mixing, include_protein_background, **kwargs)
749 model_fn = partial(
750 self._expression_for_de,
751 scale_protein=scale_protein,
(…)
755 batch_size=batch_size,
756 )
757 col_names = np.concatenate(
758 [
759 np.asarray(_get_var_names_from_manager(adata_manager)),
760 self.protein_state_registry.column_names,
761 ]
762 )
→ 763 result = _de_core(
764 adata_manager,
765 model_fn,
766 groupby,
767 group1,
768 group2,
769 idx1,
770 idx2,
771 all_stats,
772 cite_seq_raw_counts_properties,
773 col_names,
774 mode,
775 batchid1,
776 batchid2,
777 delta,
778 batch_correction,
779 fdr_target,
780 silent,
781 **kwargs,
782 )
784 return result

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/scvi/model/base/_utils.py:267, in _de_core(adata_manager, model_fn, groupby, group1, group2, idx1, idx2, all_stats, all_stats_fn, col_names, mode, batchid1, batchid2, delta, batch_correction, fdr, silent, **kwargs)
265 res = res.sort_values(by=sort_key, ascending=False)
266 if mode == "change":
→ 267 res[f"is_de_fdr_{fdr}"] = _fdr_de_prediction(res["proba_de"], fdr=fdr)
268 if idx1 is None:
269 g2 = "Rest" if group2 is None else group2

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/scvi/model/base/_utils.py:288, in _fdr_de_prediction(posterior_probas, fdr)
286 raise ValueError("posterior_probas should be 1-dimensional")
287 sorted_genes = np.argsort(-posterior_probas)
→ 288 sorted_pgs = posterior_probas[sorted_genes]
289 cumulative_fdr = (1.0 - sorted_pgs).cumsum() / (1.0 + np.arange(len(sorted_pgs)))
290 d = (cumulative_fdr <= fdr).sum()

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/pandas/core/series.py:984, in Series.__getitem__(self, key)
981 key = np.asarray(key, dtype=bool)
982 return self._get_values(key)
→ 984 return self._get_with(key)

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/pandas/core/series.py:1019, in Series._get_with(self, key)
1015 if key_type == "integer":
1016 # We need to decide whether to treat this as a positional indexer
1017 # (i.e. self.iloc) or label-based (i.e. self.loc)
1018 if not self.index._should_fallback_to_positional:
→ 1019 return self.loc[key]
1020 else:
1021 return self.iloc[key]

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/pandas/core/indexing.py:967, in _LocationIndexer.__getitem__(self, key)
964 axis = self.axis or 0
966 maybe_callable = com.apply_if_callable(key, self.obj)
→ 967 return self._getitem_axis(maybe_callable, axis=axis)

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/pandas/core/indexing.py:1194, in _LocIndexer._getitem_axis(self, key, axis)
1191 if hasattr(key, "ndim") and key.ndim > 1:
1192 raise ValueError("Cannot index with multidimensional key")
→ 1194 return self._getitem_iterable(key, axis=axis)
1196 # nested tuple slicing
1197 if is_nested_tuple(key, labels):

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/pandas/core/indexing.py:1132, in _LocIndexer._getitem_iterable(self, key, axis)
1129 self._validate_key(key, axis)
1131 # A collection of keys
→ 1132 keyarr, indexer = self._get_listlike_indexer(key, axis)
1133 return self.obj._reindex_with_indexers(
1134 {axis: [keyarr, indexer]}, copy=True, allow_dups=True
1135 )

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/pandas/core/indexing.py:1330, in _LocIndexer._get_listlike_indexer(self, key, axis)
1327 ax = self.obj._get_axis(axis)
1328 axis_name = self.obj._get_axis_name(axis)
→ 1330 keyarr, indexer = ax._get_indexer_strict(key, axis_name)
1332 return keyarr, indexer

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/pandas/core/indexes/base.py:5796, in Index._get_indexer_strict(self, key, axis_name)
5793 else:
5794 keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
→ 5796 self._raise_if_missing(keyarr, indexer, axis_name)
5798 keyarr = self.take(indexer)
5799 if isinstance(key, Index):
5800 # GH 42790 - Preserve name from an Index

File ~/miniconda3/envs/scanpy_env/lib/python3.10/site-packages/pandas/core/indexes/base.py:5859, in Index._raise_if_missing(self, key, indexer, axis_name)
5856 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
5858 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
→ 5859 raise KeyError(f"{not_found} not in index")

KeyError: ‘[239, 238, 237, 235, 234, 233, 236, 231, 230, 229, 232, 245, 244, 243, 241, 240, 242, 255, 260, 259, 258, 257, 256, 254, 248, 252, 251, 250, 249, 253, 247, 246, 274, 273, 272, 271, 270, 269, 268, 266, 265, 264, 263, 262, 261, 267, 286, 293, 292, 291, 290, 289, 288, 287, 285, 284, 283, 282, 281, 280, 279, 278, 277, 276, 275, 307, 306, 305, 304, 303, 302, 301, 299, 298, 297, 296, 295, 294, 300, 317, 322, 321, 320, 319, 318, 315, 316, 314, 313, 312, 311, 310, 309, 308, 337, 336, 335, 334, 333, 332, 330, 331, 328, 327, 326, 325, 324, 323, 329, 348, 349, 350, 351, 355, 353, 354, 356, 352, 347, 344, 345, 343, 342, 341, 340, 339, 338, 346, 370, 369, 368, 367, 366, 365, 364, 361, 362, 360, 359, 358, 357, 363, 384, 383, 389, 382, 386, 387, 388, 385, 381, 375, 379, 378, 377, 376, 374, 373, 372, 371, 380, 405, 406, 407, 408, 413, 410, 411, 412, 404, 409, 402, 403, 400, 401, 390, 391, 392, 394, 393, 396, 397, 398, 399, 395, 433, 430, 431, 432, 434, 440, 436, 437, 438, 441, 429, 435, 428, 439, 426, 427, 414, 416, 417, 418, 419, 415, 421, 422, 423, 424, 425, 420, 453, 461, 460, 459, 458, 457, 456, 455, 454, 452, 462, 450, 449, 448, 447, 446, 445, 444, 443, 442, 451, 474, 476, 477, 481, 479, 480, 473, 478, 472, 475, 470, 469, 468, 467, 466, 465, 464, 463, 471, 500, 501, 502, 503, 504, 505, 506, 511, 508, 509, 512, 513, 514, 499, 507, 498, 510, 496, 482, 483, 485, 497, 486, 487, 488, 484, 490, 491, 492, 493, 494, 495, 489, 529, 530, 531, 532, 536, 534, 535, 537, 528, 533, 527, 525, 517, 524, 523, 522, 521, 520, 519, 518, 516, 515, 526, 556, 555, 553, 557, 554, 558, 564, 560, 561, 562, 563, 565, 559, 552, 538, 550, 551, 539, 540, 541, 543, 542, 545, 546, 547, 548, 549, 544, 584, … 32071, 32072, 32073, 32074, 32075, 32076, 32077, 32078, 32079, 32080, 32081, 32082, 32083, 32084, 32085, 32086, 32087, 32088, 32089, 32090, 32091, 32092, 32093, 32094, 32095, 32096, 32097, 32098, 32099, 32100, 32101, 32102, 32103, 32104, 32105, 32106, 32107, 32108, 32109, 32110, 32111, 32112, 32113, 32114, 32115, 32116, 32117, 32118, 32119, 32120, 32121, 32122, 32123, 32124, 32125, 32126, 32127, 32128, 32129, 32130, 32131, 32132, 32133, 32134, 32135, 32136, 32137, 32138, 32139, 32140, 32141, 32142, 32143] not in index’
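
To illustrate what I suspect is happening (a minimal pandas sketch, not my actual data): indexing a Series that has a non-default integer index with an array of positions is treated as a label lookup rather than a positional one.

import numpy as np
import pandas as pd

s = pd.Series([0.9, 0.1, 0.5], index=[10, 20, 30])  # integer labels, like a shuffled index
order = np.argsort(-s.to_numpy())                   # positions [0, 2, 1]
# s[order] raises KeyError because 0, 2 and 1 are treated as labels, not positions;
# s.iloc[order] is the positional equivalent and works fine.
print(s.iloc[order])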

This issue will be fixed in the next release, but also check out my response here:

Thank you Martin and Adam!

Martin, I tried creating a new conda environment with CUDA PyTorch and the latest scvi-tools like you suggested, and the GPU is now being detected and used. Thank you very much for your help!

As for the differential expression step, there is still an error after modifying things according to what Adam suggested, so hopefully we can continue working on this together. Thank you very much!

I’m having a similar issue, but with the Mac M1 GPU. Should this be compatible? PyTorch should run on it, I believe.

I created a new conda environment as per the scvi-tools setup instructions, and also ran 'conda install pytorch torchvision torchaudio -c pytorch' and 'conda install jax jaxlib -c conda-forge'. It still isn’t using the GPU, and I’m getting an error.
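
In case it helps with debugging, this is what I can check from Python about the M1 GPU (a small sketch; I assume the standard torch.backends.mps API is available in this PyTorch build):

import torch

print(torch.backends.mps.is_built())      # was this PyTorch build compiled with MPS support?
print(torch.backends.mps.is_available())  # can PyTorch reach the Apple GPU right now?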

I’m following the introductory tutorial, code used:

(scvi-env) jwhittle@BSC00783 ~ % pip install --quiet scvi-colab
(scvi-env) jwhittle@BSC00783 ~ % python
Python 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:38:11)
[Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from scvi_colab import install
>>> install(run_outside_colab=True)
INFO scvi-colab: Installing scvi-tools.
INFO scvi-colab: Install successful. Testing import.
Global seed set to 0
/Users/jwhittle/opt/anaconda3/envs/scvi-env/lib/python3.9/site-packages/flax/struct.py:132: FutureWarning: jax.tree_util.register_keypaths is deprecated, and will be removed in a future release. Please use register_pytree_with_keys() instead.
jax.tree_util.register_keypaths(data_clz, keypaths)
/Users/jwhittle/opt/anaconda3/envs/scvi-env/lib/python3.9/site-packages/flax/struct.py:132: FutureWarning: jax.tree_util.register_keypaths is deprecated, and will be removed in a future release. Please use register_pytree_with_keys() instead.
jax.tree_util.register_keypaths(data_clz, keypaths)

>>> import scanpy as sc
>>> import scvi
>>> sc.set_figure_params(figsize=(4, 4))
>>> adata = scvi.data.heart_cell_atlas_subsampled()
INFO File data/hca_subsampled_20k.h5ad already downloaded
>>> sc.pp.filter_genes(adata, min_counts=3)
>>> adata.layers["counts"] = adata.X.copy()
>>> sc.pp.normalize_total(adata, target_sum=1e4)
>>> sc.pp.log1p(adata)
>>> adata.raw = adata
>>> scvi.model.SCVI.setup_anndata(
...     adata,
...     layer="counts",
...     categorical_covariate_keys=["cell_source", "donor"],
...     continuous_covariate_keys=["percent_mito", "percent_ribo"],
... )
>>> model = scvi.model.SCVI(adata)
>>> model
SCVI Model with the following params:
n_hidden: 128, n_latent: 10, n_layers: 1, dropout_rate: 0.1, dispersion: gene,
gene_likelihood: zinb, latent_distribution: normal
Training status: Not Trained
Model's adata is minified?: False

>>> model.train()
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/Users/jwhittle/opt/anaconda3/envs/scvi-env/lib/python3.9/site-packages/pytorch_lightning/trainer/setup.py:201: UserWarning: MPS available but not used. Set accelerator and devices using Trainer(accelerator='mps', devices=1).
rank_zero_warn(
Epoch 1/400: 0%| | 0/400 [00:00<?, ?it/s][E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:109] Exception in thread pool task: mutex lock failed: Invalid argument
zsh: bus error python
(scvi-env) jwhittle@BSC00783 ~ % /Users/jwhittle/opt/anaconda3/envs/scvi-env/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

Can anyone help with this please?