How to extract a batch-corrected expression matrix from a trained scVI VAE model


As the title says, I would like to extract a batch-corrected expression matrix from a trained scVI VAE model for downstream analysis.
Could someone tell me how to do it?

Thank you!

Hey Hsu-Che-Wei,

  1. If you just want an integrated space, you can use model.get_latent_representation(); the embedding it returns will be batch corrected (assuming you passed in a batch covariate when training scVI).

  2. If you do want a batch-corrected expression matrix (this hasn’t been extensively tested, so proceed carefully), you can pass the batch you’d like to project your data onto via the transform_batch argument of model.get_normalized_expression(). Note that you should probably only project onto batches that represent your data well (e.g. all your cell types are well represented in the batch); otherwise the model will be making out-of-sample predictions. Also, if you pass a list of batches to transform_batch, it will average the expression over all the batches you pass in.
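
A minimal sketch of the two options above, written as a helper that takes an already trained model. The helper name `integrated_outputs`, the argument name `reference_batches`, and the `library_size=1e4` default are illustrative assumptions, not part of the original answer; only `get_latent_representation()` and `get_normalized_expression(transform_batch=..., library_size=...)` are scvi-tools calls.

```python
def integrated_outputs(model, reference_batches, library_size=1e4):
    """Return (latent, corrected) from a trained scVI model.

    `model` is assumed to be a trained scvi.model.SCVI instance that was set up
    with a batch covariate. `reference_batches` is a hypothetical argument name:
    one batch category or a list of categories to project every cell onto.
    """
    # Option 1: low-dimensional embedding, batch corrected if a batch
    # covariate was used during training.
    latent = model.get_latent_representation()

    # Option 2: decode every cell as if it came from the reference batch(es).
    # With a list, the expression is averaged over the listed batches.
    # library_size=1e4 is an illustrative scaling choice, not a recommendation.
    corrected = model.get_normalized_expression(
        transform_batch=reference_batches,
        library_size=library_size,
    )
    return latent, corrected


# Hypothetical usage, assuming "batch_A" is one of the batch categories
# the model was trained with:
# latent, corrected = integrated_outputs(model, ["batch_A"])
```

Choosing reference batches that contain all cell types keeps the decoder from having to make out-of-sample predictions, as noted in point 2.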

Hope this helps!


Hi Galen,

Thanks for the reply! It helps.
I have another question: if all of my batches are well represented, could I get the batch-corrected matrix simply by calling “corrected_matrix = model.get_normalized_expression()”?


By default, this returns expression that is not batch corrected. The reason is that the decoder in scVI can be represented by a function f(z, s), where z is the low-dimensional representation and s is the batch indicator. Without transform_batch, each cell is decoded with its own observed batch s, so batch effects remain in the output. Therefore, you need to set the transform_batch argument to get batch-corrected output.
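
To make the f(z, s) argument concrete, here is a toy sketch in plain Python. It is not scVI's decoder: the weights, the per-batch offsets, and the function names are all made up for illustration. It only shows why decoding with the observed batch keeps the batch effect, while fixing s (or averaging over several s) removes it.

```python
import math

# Toy stand-in for a decoder f(z, s): a fixed linear map plus a per-batch
# offset, followed by a softmax over genes. All numbers are invented.
GENE_WEIGHTS = [[1.0, -0.5], [0.2, 0.8], [-0.7, 0.3]]  # 3 genes x 2 latent dims
BATCH_OFFSETS = {"A": [0.0, 0.0, 0.0], "B": [0.5, -0.5, 0.0]}


def decode(z, s):
    """f(z, s): normalized expression for latent z decoded under batch s."""
    logits = [
        sum(w * zi for w, zi in zip(row, z)) + BATCH_OFFSETS[s][g]
        for g, row in enumerate(GENE_WEIGHTS)
    ]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_expression(z, observed_batch, transform_batch=None):
    """Mimic the behaviour described above on the toy decoder."""
    if transform_batch is None:
        batches = [observed_batch]        # default: decode with the cell's own batch
    elif isinstance(transform_batch, str):
        batches = [transform_batch]       # project onto one chosen batch
    else:
        batches = list(transform_batch)   # average over several batches
    decoded = [decode(z, s) for s in batches]
    return [sum(col) / len(decoded) for col in zip(*decoded)]


z = [0.3, -1.2]  # the same latent state, observed once in batch A and once in B
default_a = normalized_expression(z, "A")
default_b = normalized_expression(z, "B")
fixed_a = normalized_expression(z, "A", transform_batch="A")
fixed_b = normalized_expression(z, "B", transform_batch="A")
averaged = normalized_expression(z, "B", transform_batch=["A", "B"])

print("default outputs differ by batch:", default_a != default_b)
print("fixed-batch outputs agree:", fixed_a == fixed_b)
```

In the toy, two cells with the same z get different default outputs purely because of s, and identical outputs once both are decoded under the same batch.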

I see, thanks a lot!

I am following the same steps to evaluate the normalized gene expression values. What values can the “transform_batch” argument take, and how can it be used to obtain batch-corrected gene expression?