How to fix number of nodes used in HPC environment with scvi.train (via R & reticulate)?

Hi,

I’m using scvi via R using reticulate.

All CPU cores of the cluster get used; I would like to restrict this when I call the model$train function.
Also I wanted to know whether I should start the integration from counts or from log-normalized counts. It’s not clear to me.

Any ideas would be more than welcome,
thanks !

In fact I’m looking for an integration based on SingleCellExperiment objects, not the Seurat approach described here.

library(reticulate)
library(sceasy)

## subset to a predefined list of HVGs
sce.combined <- sce.combined[top_hvgs,]

sc   <- import("scanpy", convert = FALSE)
scvi <- import("scvi", convert = FALSE)

## transform sce to anndata
adata <- sceasy::convertFormat(
  sce.combined, from = "sce", to = "anndata",
  main_layer = "counts",
  transfer_layers = c("logcounts", "normcounts"),
  drop_single_values = FALSE
)

## run setup_anndata
scvi$model$SCVI$setup_anndata(adata, batch_key = 'sample_id')

## create the model
model <- scvi$model$SCVI(adata)

## train the model
model$train(accelerator = "cpu",
            max_epochs = 10L)

## get the latent representation and the expression normalized by this latent space
adata$obsm["X_scVI"] = model$get_latent_representation()
adata$obsm["X_normalized_scVI"] = model$get_normalized_expression()

## go back to an SCE to use familiar R plotting functions
sce <- SingleCellExperiment(
    assays      = list(X_normalized_scVI = t(reticulate::py_to_r(adata$obsm["X_normalized_scVI"] ))),
    colData     = reticulate::py_to_r(adata$obs),
    reducedDims = list(X_scVI = reticulate::py_to_r(adata$obsm["X_scVI"]))
)

## PCA using expression values normalized by the latent space
sce <- scater::runPCA(sce, ncomponents = 30, exprs_values = "X_normalized_scVI", name = "PCA_SCVI")

## plot
r4 <- scater::plotReducedDim(sce, dimred="PCA_SCVI", colour_by="sample_id") + ggtitle("PCA_SCVI - sample") 

The trainer complains:

lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=79` in the `DataLoader` to improve performance.
Epoch 2/2: 100%|█| 2/2 [00:58<00:00, 29.19s/it, v_num=1, train_loss_step=8.36e+3, train_loss...
`Trainer.fit` stopped: `max_epochs=2` reached.

Hey @ZheFrench

scvi-tools models require the raw counts for integration.
Other downstream analysis tasks might require the log-normalized data (as in Seurat or Scanpy).

The trainer's complaint is about the number of data-loading workers.
You can set it with something like:

## run setup_anndata and adjust backend settings
scvi$model$SCVI$setup_anndata(adata)
scvi$settings$dl_num_workers = 79L
scvi$settings$persistent_workers = TRUE  # try also with FALSE
scvi$settings$num_threads = 3L  # number of CPUs

## create the model
model <- scvi$model$SCVI(adata)

## train the model
model$train()

This will run on 3 CPU threads.
There is some overhead to using workers, and persistent workers might be left alive afterwards, in which case you'll need to kill them manually.
Not sure if it will bring you any added advantage, but try it.
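The overhead and cleanup caveats can be illustrated with a plain multiprocessing sketch (stdlib Python only, not scvi-tools itself; it assumes Linux's default "fork" start method): pool workers are separate long-lived processes that serve many tasks and only exit when the pool is shut down.

```python
import multiprocessing as mp
import os

def worker_pid(_):
    # Each task reports which worker process executed it.
    return os.getpid()

# Many small tasks are served by the same two long-lived worker
# processes, analogous to persistent DataLoader workers.
with mp.Pool(processes=2) as pool:
    pids = set(pool.map(worker_pid, range(8)))

# At most two distinct worker PIDs, and none of them is the parent.
print(len(pids) <= 2 and os.getpid() not in pids)  # -> True
# Leaving the `with` block terminates the workers; persistent workers
# that are never shut down linger and must be killed manually.
```

Starting each worker process costs time and memory, which is why a handful of workers rarely helps for small models on a single machine.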

Thanks.
It gives me the following message, and all the cores are still active even after setting dl_num_workers to a lower value:

Epoch 5/50: 8%| | 4/50 [01:01<12:41, 16.54s/it, v_num=1, train_loss_step=7.86e+3, train_loss...
/data2/USERS/anaconda3/envs/R-4.4.1/lib/python3.12/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.

Run with dl_num_workers=1 and num_threads=1. Do you see any change? How do you see that all cores are taken? What operating system are you on?

Nope, there is no impact from setting one everywhere; all cores run at ~100%.
I work on an Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-150-generic x86_64) system with 80 cores and no job scheduler.

Can you check CPU use with:

## in the shell, before starting Python
export OMP_NUM_THREADS=8
export MKL_NUM_THREADS=8

## then, in Python
import torch
import time

a = torch.randn(5000, 5000)
b = torch.randn(5000, 5000)

start = time.time()
for _ in range(20):
    torch.mm(a, b)
print("Done in", time.time() - start, "seconds")

Yep, this works on 8 CPUs as expected.

import os
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["MKL_NUM_THREADS"] = "8"

Still, scvi doesn't want to work on only the 8 cores.

os$environ["OMP_NUM_THREADS"] = "8"
os$environ["MKL_NUM_THREADS"] = "8"
scvi$settings$dl_num_workers = 8L  # (default 0)
scvi$settings$persistent_workers = TRUE  # try also with FALSE, which is the default
scvi$settings$num_threads = 8L  # number of CPUs
tc$set_num_interop_threads(8L)  # set_num_interop_threads is a function, not an attribute
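One OS-level fallback worth mentioning here (my own suggestion, not from the scvi-tools docs; Linux-only, stdlib Python): pin the process to a subset of cores with os.sched_setaffinity. Worker processes forked afterwards inherit the mask, so total CPU use is capped even if a library ignores the thread-count settings.

```python
import os

# Linux-only: restrict the current process to at most 8 cores
# (core ids 0..n-1). Any workers forked after this point inherit
# the affinity mask, capping total CPU use regardless of library
# thread settings.
n = min(8, os.cpu_count())
os.sched_setaffinity(0, range(n))

allowed = os.sched_getaffinity(0)
print(len(allowed))
```

In R, the same effect can be obtained by launching the whole session under `taskset -c 0-7 R`.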

To change a system environment variable in R, do it with

Sys.setenv(OMP_NUM_THREADS = "8")
Sys.setenv(MKL_NUM_THREADS = "8")

and not through the reticulate package using os (a Python module).
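A minimal sketch of why the place you set the variable matters (assuming the usual rule that OMP/MKL read these variables when the library initializes): Sys.setenv() changes the R process's own environment, which the Python it launches inherits before any threaded library loads. The subprocess call below just mimics that parent-to-child inheritance:

```python
import os
import subprocess
import sys

# Simulate what Sys.setenv() does in R: put the variables into the
# parent's environment *before* the interpreter that will load
# torch/MKL starts.
env = dict(os.environ, OMP_NUM_THREADS="8", MKL_NUM_THREADS="8")

# The child process sees the variables from its very first instruction,
# so any threaded library it imports can honor them at initialization.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['OMP_NUM_THREADS'])"],
    env=env, capture_output=True, text=True,
)
print(child.stdout.strip())  # -> 8
```

Setting os.environ from inside an already-running Python session, by contrast, comes too late for libraries whose thread pools were configured at import time.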