get_normalized_expression causes my kernel to disconnect

Hey everyone! Fairly new user of scvi-tools and scANVI here.

Every time I run get_normalized_expression on my scANVI query model, my kernel disconnects (which I believe is likely a memory issue?).

This is actually my second time running the model; the first time, get_normalized_expression ran with no issues. Between the two runs, I updated scvi-tools from 1.3.1.post1 to 1.3.3. The main changes between the old and new models are:

  • the new model has linear_classifier = True
  • the new model has max_epochs = 100 instead of 20

(I implemented both of these changes in the scANVI reference model, but I am running get_normalized_expression on the query model. I made these changes based on suggestions I had seen, because my prediction accuracy on the reference data was previously quite low.)

Everything else stayed the same, including the number of cells and genes. The anndata object I am working with has ~1.5 million cells and 5000 genes.
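For concreteness, the new reference model was set up roughly like this (adata_ref, "cell_type", "Unknown", and "batch" are placeholders for my actual object and field names, not the exact code):

```python
import scvi

# Register the reference AnnData; key names here are placeholders
scvi.model.SCANVI.setup_anndata(
    adata_ref,
    labels_key="cell_type",
    unlabeled_category="Unknown",
    batch_key="batch",
)

# The two changes mentioned above: linear classifier head and more epochs
model_ref = scvi.model.SCANVI(adata_ref, linear_classifier=True)
model_ref.train(max_epochs=100)
```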

For troubleshooting, I tried changing the batch size from 128 to 32 for get_normalized_expression, which didn’t help. I also tried running it with a limited gene_list; it worked for both 10 and 20 genes. I haven’t tried anything greater than 20, other than the full run on all 5000 genes.
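The calls look roughly like this (model_query and adata_query are placeholders for my actual objects):

```python
# Full run on all genes, which crashes the kernel
norm = model_query.get_normalized_expression(adata_query, batch_size=32)

# Limited gene list (10 or 20 genes), which completes fine
small = model_query.get_normalized_expression(
    adata_query,
    gene_list=adata_query.var_names[:20].tolist(),
    batch_size=32,
)
```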

Please give me suggestions and troubleshooting tips!

Hello,

I understand that the model was trained successfully (on the reference? the query? both?) and that the issue is with memory during get_normalized_expression on the query. Reducing the number of genes helps, which strengthens the case that it's a memory issue.

I wouldn’t reduce it that much, though; a common practice is to select the top ~2,000 highly variable genes.
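For example, a minimal sketch of HVG selection with scanpy (the parameter choices here are only illustrative, and the seurat_v3 flavor expects raw counts):

```python
import scanpy as sc

# Keep only the top ~2,000 highly variable genes before training
sc.pp.highly_variable_genes(
    adata,
    n_top_genes=2000,
    flavor="seurat_v3",
    subset=True,
)
```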

To save memory, first of all you can split your workflow into two (or even three) scripts, separating the training and query parts. Once your model is trained (it seems it is), save it and clear the memory of that script/notebook (and anything else redundant on the machine), then load it back and continue in a fresh script. If you prefer to do it all in the same script, you can delete redundant objects and call gc.collect() along the way; it helps.
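Something along these lines (the path and variable names are placeholders):

```python
import gc
import scvi

# --- end of the training script ---
model_ref.save("scanvi_ref_model", overwrite=True)
del model_ref
gc.collect()

# --- fresh script/notebook for the query part ---
model_ref = scvi.model.SCANVI.load("scanvi_ref_model", adata=adata_ref)
```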

Another thing you can do is remove the validation step, if there is one (make sure check_val_every_n_epoch is None and early_stopping is False). You also mentioned the AnnData size, but is it used for training or for the query? If it's the query, maybe you can process it in smaller chunks.
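For the training call, that would look roughly like this (a sketch only; argument names as in scvi-tools' train()):

```python
# Disable the validation loop and early stopping to avoid the extra pass
model_ref.train(
    max_epochs=100,
    check_val_every_n_epoch=None,
    early_stopping=False,
)
```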

As for get_normalized_expression itself, you can also run it in chunks (you can specify the indices to run on), though this may amount to the same thing as chunking the query data mentioned above.
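A rough sketch of chunking by indices (the chunk size here is arbitrary, and model_query/adata_query are placeholder names):

```python
import numpy as np
import pandas as pd

chunk_size = 100_000
results = []
for start in range(0, adata_query.n_obs, chunk_size):
    idx = np.arange(start, min(start + chunk_size, adata_query.n_obs))
    results.append(
        model_query.get_normalized_expression(
            adata_query, indices=idx, batch_size=128
        )
    )
normalized = pd.concat(results)
```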

All of these actions can help you save memory.

Running 1.5M cells with 5K genes requires a lot of memory, so monitor consumption while your code runs, with htop for example.

Finally, go over the reference-mapping tutorials for scANVI and check that your workflow matches.