Iteration consumes memory

Dear all,

I have an AnnData object with data from six patients. When I loop over the patient ids, subset the data, and run the standard scVI workflow, the first patient takes about 3 minutes to complete. The second patient subset, however, takes hours. Yet if I run the second patient id on its own, it also finishes in about 3 minutes.
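
For reference, a minimal sketch of the loop I mean (the 'patient_id' column comes from my data; training details are elided):

import scvi

for patient_id in adata.obs['patient_id'].cat.categories:
    # subset to one patient; .copy() gives an independent AnnData
    adata_i = adata[adata.obs['patient_id'] == patient_id].copy()
    scvi.model.SCVI.setup_anndata(adata_i)
    model = scvi.model.SCVI(adata_i)
    model.train()  # ~3 minutes for the first patient, hours for the second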

I think the issue is similar to the circular-referencing problem with AnnData (Anndata not properly garbage collected · Issue #360 · scverse/anndata · GitHub).

I would highly appreciate your input on how to ensure that an scVI process is removed from memory, for example when looping through patient ids.

Many thanks for your help!

Best, Florian

Hi, your workflow isn't entirely clear to me. Why do you want to run it separately per patient? Are you seeing full GPU memory or full system RAM?
That could explain the slowdown; disabling memory pinning on the GPU would then help.
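
If pinning is indeed the issue, something along these lines might work (a sketch; I am assuming that datasplitter_kwargs in recent scvi-tools releases forwards pin_memory to the data loaders - check your installed version):

import scvi

scvi.model.SCVI.setup_anndata(adata)
model = scvi.model.SCVI(adata)
# pin_memory=False is an assumption about the DataSplitter plumbing
model.train(datasplitter_kwargs={"pin_memory": False})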


Hi @cane11,

thank you very much for your time. I run scVI per patient and use the model for SOLO doublet detection. Currently I use the CPU on our HPC (Python v3.9.19, scvi-tools v1.1.6.post2).
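
For context, the per-patient SOLO step looks roughly like this (a sketch; `model` is the trained per-patient scVI model):

from scvi.external import SOLO

solo = SOLO.from_scvi_model(model)  # build SOLO on top of the trained scVI model
solo.train()
doublet_calls = solo.predict()  # per-cell doublet/singlet probabilities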

I set scvi.settings.num_threads = 38 and, in the Slurm job, --nodes=1, --ntasks=1, --cpus-per-task=38.
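
One pattern I could also use (a sketch): read the allocation from Slurm's environment so the thread count always matches the job request:

import os
import scvi

# SLURM_CPUS_PER_TASK is set by Slurm inside the job
scvi.settings.num_threads = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))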

It feels like the CPU resources are still occupied after training the model for the first patient id. Do you have any idea what I could try to free the resources?

Many thanks,
Florian

I don't know if this is good coding practice, but importing scvi within the function and running each patient in its own Process seems to free up the CPU after training a model.

import gc
from multiprocessing import Process

def scvi_workflow(adata, patient_id):
    import scvi  # import inside the worker so module state lives in the child process
    scvi.settings.num_threads = 8  # limit scVI threads per worker
    adata_i = adata[adata.obs['patient_id'] == patient_id]

    [...]  # train model etc.

    del adata_i, model_i  # model_i is created in the elided training step
    gc.collect()

    return None

for patient_id in adata.obs['patient_id'].cat.categories:
    p = Process(target=scvi_workflow, args=(adata, patient_id))
    p.start()
    p.join()  # when the child exits, the OS reclaims all of its memory

My first intuition is: do you enable persistent workers? Without it, using multiple dataloader workers doesn't really speed things up; with it, the workers stay alive even after training, as we don't kill the dataloader. Deleting the model should be sufficient though - is the gc step really needed? See What are the (dis) advantages of persistent_workers - #8 by albanD - vision - PyTorch Forums for a longer discussion. It would be helpful to see a more complete version of the script.
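
To illustrate the flag in plain PyTorch (not scVI-specific; a minimal sketch):

import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(1000, 10))
# with persistent_workers=True the worker processes stay alive between
# epochs - and after training - until the DataLoader is garbage collected
loader = DataLoader(ds, batch_size=32, num_workers=4, persistent_workers=True)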

Hi @cane11, thank you very much for the pointers! I was unable to replicate the behavior, either with other data or with my own. However, I realized that server updates affecting the CPU distribution were running while the problem occurred. Maybe that caused the behavior, but I can't be sure. My apologies.