How to fix number of nodes used in HPC environment with scvi.train (via R & reticulate)?

Hi,

I’m using scvi via R using reticulate.

All CPU cores of the cluster get used; I would like to restrict this when I call the model$train function.
Also I wanted to know whether I should start the integration from counts or from log-normalized counts. It’s not clear to me.

Any ideas would be more than welcome,
thanks !

In fact I’m looking for an integration based on SingleCellExperiment objects, not the Seurat approach described here.

library(reticulate)
library(sceasy)

## subset to a predefined list of HVGs
sce.combined <- sce.combined[top_hvgs,]

sc   <- import("scanpy", convert = FALSE)
scvi <- import("scvi", convert = FALSE)

## transform sce to anndata
adata <- sceasy::convertFormat(
  sce.combined, from = "sce", to = "anndata",
  main_layer = "counts",
  transfer_layers = c("logcounts", "normcounts"),
  drop_single_values = FALSE
)

## run setup_anndata
scvi$model$SCVI$setup_anndata(adata, batch_key = 'sample_id')

## create the model
model <- scvi$model$SCVI(adata)

## train the model
model$train(accelerator = "cpu",
            max_epochs = 10L)

## get the latent representation and the expression normalized by this latent space
adata$obsm["X_scVI"] = model$get_latent_representation()
adata$obsm["X_normalized_scVI"] = model$get_normalized_expression()

## go back to an SCE to use familiar R plotting functions
sce <- SingleCellExperiment(
    assays      = list(X_normalized_scVI = t(reticulate::py_to_r(adata$obsm["X_normalized_scVI"] ))),
    colData     = reticulate::py_to_r(adata$obs),
    reducedDims = list(X_scVI = reticulate::py_to_r(adata$obsm["X_scVI"]))
)

## PCA using expression values normalized by the latent space
sce <- scater::runPCA(sce, ncomponents = 30, exprs_values = "X_normalized_scVI", name = "PCA_SCVI")

## plot
r4 <- scater::plotReducedDim(sce, dimred="PCA_SCVI", colour_by="sample_id") + ggtitle("PCA_SCVI - sample") 

The trainer complains:

lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=79` in the `DataLoader` to improve performance.
Epoch 2/2: 100%|█| 2/2 [00:58<00:00, 29.19s/it, v_num=1, train_loss_step=8.36e+3, train_loss...
`Trainer.fit` stopped: `max_epochs=2` reached.

Hey @ZheFrench

scvi-tools models require the raw counts for integration.
Other downstream analysis tasks might require the log-normalized data (as in Seurat or Scanpy).

The trainer's complaint is about the number of data-loading workers.
You can set it with something like:

## run setup_anndata and adjust backend settings
scvi$model$SCVI$setup_anndata(adata)
scvi$settings$dl_num_workers = 79L
scvi$settings$persistent_workers = TRUE  # try also with FALSE
scvi$settings$num_threads = 3L  # number of CPUs

## create the model
model <- scvi$model$SCVI(adata)

## train the model
model$train()

This will run on 3 CPU threads.
There is some overhead to using workers, and persistent workers might be left alive afterwards, in which case you'll need to kill them manually.
Not sure if it will bring you any added advantage, but try it.
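The overhead and cleanup caveats can be illustrated with a plain multiprocessing sketch (stdlib Python only, not scvi-tools itself; it assumes Linux's default "fork" start method): pool workers are separate long-lived processes that serve many tasks and only exit when the pool is shut down.

```python
import multiprocessing as mp
import os

def worker_pid(_):
    # Each task reports which worker process executed it.
    return os.getpid()

# Many small tasks are served by the same two long-lived worker
# processes, analogous to persistent DataLoader workers.
with mp.Pool(processes=2) as pool:
    pids = set(pool.map(worker_pid, range(8)))

# At most two distinct worker PIDs, and none of them is the parent.
print(len(pids) <= 2 and os.getpid() not in pids)  # -> True
# Leaving the `with` block terminates the workers; persistent workers
# that are never shut down linger and must be killed manually.
```

Starting each worker process costs time and memory, which is why a handful of workers rarely helps for small models on a single machine.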

Thanks.
It gives me the following message, and all the cores are still active even after setting dl_num_workers to a lower value:

Epoch 5/50: 8%| | 4/50 [01:01<12:41, 16.54s/it, v_num=1, train_loss_step=7.86e+3, train_loss...
/data2/USERS/anaconda3/envs/R-4.4.1/lib/python3.12/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.

Run with dl_num_workers=1 and num_threads=1. Do you see any change? How do you see that all cores are taken? What operating system are you on?

Nope, there is no impact from setting one everywhere; all cores run at ~100%.
I work on an Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-150-generic x86_64) system with 80 cores and no job scheduler.

Can you check CPU use with:

## in the shell, before starting Python
export OMP_NUM_THREADS=8
export MKL_NUM_THREADS=8

## then, in Python
import torch
import time

a = torch.randn(5000, 5000)
b = torch.randn(5000, 5000)

start = time.time()
for _ in range(20):
    torch.mm(a, b)
print("Done in", time.time() - start, "seconds")

Yep, this works on 8 CPUs as expected.

import os
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["MKL_NUM_THREADS"] = "8"

Still, scvi doesn't want to work on only the 8 cores.

os$environ["OMP_NUM_THREADS"] = "8"
os$environ["MKL_NUM_THREADS"] = "8"
scvi$settings$dl_num_workers = 8L  # (default 0)
scvi$settings$persistent_workers = TRUE  # try also with FALSE, which is the default
scvi$settings$num_threads = 8L  # number of CPUs
tc$set_num_interop_threads(8L)  # set_num_interop_threads is a function, not an attribute
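One OS-level fallback worth mentioning here (my own suggestion, not from the scvi-tools docs; Linux-only, stdlib Python): pin the process to a subset of cores with os.sched_setaffinity. Worker processes forked afterwards inherit the mask, so total CPU use is capped even if a library ignores the thread-count settings.

```python
import os

# Linux-only: restrict the current process to at most 8 cores
# (core ids 0..n-1). Any workers forked after this point inherit
# the affinity mask, capping total CPU use regardless of library
# thread settings.
n = min(8, os.cpu_count())
os.sched_setaffinity(0, range(n))

allowed = os.sched_getaffinity(0)
print(len(allowed))
```

In R, the same effect can be obtained by launching the whole session under `taskset -c 0-7 R`.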

To change a system environment variable in R, do it with

Sys.setenv(OMP_NUM_THREADS = "8")
Sys.setenv(MKL_NUM_THREADS = "8")

and not through the reticulate package using os (a Python module).
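A minimal sketch of why the place you set the variable matters (assuming the usual rule that OMP/MKL read these variables when the library initializes): Sys.setenv() changes the R process's own environment, which the Python it launches inherits before any threaded library loads. The subprocess call below just mimics that parent-to-child inheritance:

```python
import os
import subprocess
import sys

# Simulate what Sys.setenv() does in R: put the variables into the
# parent's environment *before* the interpreter that will load
# torch/MKL starts.
env = dict(os.environ, OMP_NUM_THREADS="8", MKL_NUM_THREADS="8")

# The child process sees the variables from its very first instruction,
# so any threaded library it imports can honor them at initialization.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['OMP_NUM_THREADS'])"],
    env=env, capture_output=True, text=True,
)
print(child.stdout.strip())  # -> 8
```

Setting os.environ from inside an already-running Python session, by contrast, comes too late for libraries whose thread pools were configured at import time.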