Getting normalized expression

Kray · May 6, 2026, 3:20pm

Hi,

I am working on a large scRNA-seq atlas with 2 million cells. I want to get the normalized expression. However, since it is returned as a data frame or NumPy array, I run out of memory every time I try to retrieve it. I need it for performing DE. Is there any other way to get it, or to perform DE between clusters without it?

Thanks,
Kam

ori-kron-wis · May 7, 2026, 8:41am

Yes, of course, there’s a direct way to run DE from a scVI-trained model, without needing to call get_normalized_expression first: model.differential_expression(…). You can specify whether you want group vs. group or group vs. all, which groups to compare, and so on.

If you still need the whole normalized expression itself, and do not have enough memory, you can extract it in smaller chunks of adatas.

Kray · May 7, 2026, 8:55am

Hi,

Thanks for your response. I tried the direct way using model.differential_expression. It throws an out-of-memory error, probably due to the size of the adata. Is there a way to resolve this? Or maybe extracting the normalized data in smaller chunks would be useful; could you please guide me on how to do it?

ori-kron-wis · May 7, 2026, 9:11am

should be something like:

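import scipy.sparse as sp

chunk_size = 50000
all_chunks = []

# Process the cells in blocks so only one dense chunk is in memory at a time.
for start in range(0, adata.n_obs, chunk_size):
    end = min(start + chunk_size, adata.n_obs)

    x = model.get_normalized_expression(
        adata=adata[start:end],
        return_numpy=True,
    )

    # Keep each chunk as a sparse matrix.
    all_chunks.append(sp.csr_matrix(x))

# Stack the chunks row-wise into one (n_obs x n_genes) sparse matrix.
normalized = sp.vstack(all_chunks, format="csr")

which will store it in a sparse matrix.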

Do you really need to run DE on all cells? Usually we run group vs. all or group vs. group, e.g.:

de_df = model.differential_expression(
    groupby="cell_type",
    group1="B_cell",
    group2="T_cell",
)

Kray · May 7, 2026, 9:42am

Thanks a lot!
I am looking for cell-type-specific markers, so I have generated clusters and am performing DE between each cluster and all the other clusters.

Kray · May 7, 2026, 9:55am

Can I store the normalized expression as a sparse matrix in h5ad object for future use?

ori-kron-wis · May 7, 2026, 10:07am

Yes, that’s the idea.

For the clusters, just set groupby to your cluster column (e.g. "cluster_column") and group1 and group2 to the cluster IDs.
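As a rough sketch of the cluster-vs-rest case: the helper below loops over cluster IDs and calls model.differential_expression once per cluster, leaving group2 unset so each cluster is compared against all other cells. The helper name one_vs_rest_markers and its arguments are made up for illustration; only groupby, group1 and batch_size are parameters of differential_expression. Whether a smaller batch_size is enough to avoid the out-of-memory error on 2 million cells is an assumption, not something established in this thread.

```python
def one_vs_rest_markers(model, cluster_key, clusters, batch_size=None):
    """Run DE for each cluster against all remaining cells.

    `model` is assumed to be a trained scvi-tools model; `cluster_key` is the
    adata.obs column holding the cluster labels (hypothetical names).
    """
    results = {}
    for cluster in clusters:
        # group2 is left unset, so the comparison is cluster vs. the rest.
        results[cluster] = model.differential_expression(
            groupby=cluster_key,
            group1=[cluster],
            batch_size=batch_size,
        )
    return results
```

Running one cluster at a time also makes it possible to write each result to disk before moving on, instead of holding every table in memory.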

cane11 · May 7, 2026, 12:35pm

You will want to add a filter to set counts to zero below a certain threshold like 1e-5 to increase sparsity. However, we usually do not recommend using normalized counts for downstream tasks such as Wilcoxon or t-test.

Kray · May 7, 2026, 1:11pm

Is there an option to do that when getting the normalized expression?

ori-kron-wis · May 7, 2026, 1:25pm

You need to do that on the raw data, as preprocessing with scanpy/anndata, before training the model.

cane11 · May 12, 2026, 5:45am

I meant that at this step, on x, you can apply a filter that sets values below the threshold to zero.
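A minimal sketch of that filter, assuming each chunk x is a dense NumPy array as returned with return_numpy=True. The function name sparsify_chunk and the small demo arrays are made up for illustration; the 1e-5 threshold is the value suggested above.

```python
import numpy as np
import scipy.sparse as sp


def sparsify_chunk(x, threshold=1e-5):
    """Set values below `threshold` to zero and return a CSR matrix."""
    x = np.array(x)            # copy, so the caller's array is not modified
    x[x < threshold] = 0.0     # small values become exact zeros
    return sp.csr_matrix(x)    # only the remaining non-zeros are stored


# Stand-in chunks; in practice each one would come from
# model.get_normalized_expression(..., return_numpy=True).
demo_chunks = [
    np.array([[0.5, 1e-6, 0.0], [2e-5, 3e-7, 0.1]]),
    np.array([[1e-9, 0.2, 4e-6], [0.0, 0.0, 0.3]]),
]
normalized = sp.vstack([sparsify_chunk(c) for c in demo_chunks], format="csr")
```

Applying the filter per chunk, before converting to sparse, keeps the peak memory at one dense chunk.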

cane11 · May 12, 2026, 5:46am

It might also be worth computing posterior predictive samples. Those are counts sampled from the negative binomial distribution using the parameters learned by scVI.
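To illustrate what sampling counts from a negative binomial means, here is a small NumPy sketch. It is not the scvi-tools implementation (there, as far as I know, the model method is posterior_predictive_sample); the function name sample_nb_counts and the mean and inverse-dispersion values are invented for the example.

```python
import numpy as np


def sample_nb_counts(mean, inverse_dispersion, rng):
    """Draw negative binomial counts via a gamma-Poisson mixture.

    With rate ~ Gamma(shape=theta, scale=mean/theta) and
    count ~ Poisson(rate), the counts have mean `mean` and
    variance mean + mean**2 / theta.
    """
    mean = np.asarray(mean, dtype=float)
    theta = np.asarray(inverse_dispersion, dtype=float)
    rate = rng.gamma(shape=theta, scale=mean / theta)
    return rng.poisson(rate)


rng = np.random.default_rng(0)
mu = np.full((3, 4), 5.0)    # hypothetical means: 3 cells x 4 genes
theta = np.full(4, 2.0)      # hypothetical per-gene inverse dispersions
counts = sample_nb_counts(mu, theta, rng)
```

Unlike the normalized expression, the sampled values are integer counts, so they remain naturally sparse.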

Kray · May 12, 2026, 8:21am

Thanks a lot! Could you please tell me the use of computing the posterior predictive samples? I am new to this topic and learning about it.

cane11 · May 13, 2026, 9:54am

See scvi.model.SCVI in the scvi-tools documentation. We described the use of it in the scvi-hub manuscript: https://www.nature.com/articles/s41592-025-02799-9
