Get_normalized_expression function arguments

andrewjkwok · July 8, 2021, 10:59pm

Hi,

I had a few questions about the get_normalized_expression function that I hoped to get some clarification on. In the totalVI tutorial (https://docs.scvi-tools.org/en/stable/user_guide/notebooks/totalVI.html), n_samples is set to 25 and transform_batch is given a list of both datasets.

First, does n_samples refer to the number of cells? (Surely it can't be the number of biological samples, as the example dataset only has 2 individuals.) If so, why is the default 1, why does the tutorial suggest 25, and what would be a recommended value?

Second, how exactly should the transform_batch argument be used? From the documentation and GitHub (Issue #786 on YosefLab/scvi-tools, "how to get corrected expression matrix after batch removal"), I understand that it determines which batches to condition on. Intuitively, conditioning on all the batches, as the tutorial does, seems to make the most sense, but is there any situation where that would not be recommended?

Many thanks in advance.

adamgayoso · July 9, 2021, 4:30am

n_samples refers to Monte Carlo sampling for each cell. The normalized expression is a random variable, and we return the average over 25 samples in this case. That average is an unbiased estimate of the expectation, but you need a LOT more samples to substantially reduce the variance of this estimate. Empirically, 25 just seemed to work well.
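
To make the averaging concrete, here is a minimal stdlib sketch of a Monte Carlo mean. The toy Gaussian draw stands in for one gene's normalized expression in one cell; it is an illustrative assumption, not totalVI's actual posterior, and the helper names are hypothetical. The standard error of the average shrinks like sigma / sqrt(n_samples), so halving it takes four times as many draws.

```python
import math
import random
import statistics
from typing import Callable


def mc_estimate(draw: Callable[[random.Random], float], n_samples: int, rng: random.Random) -> float:
    """Average n_samples independent draws of a random variable (a Monte Carlo mean)."""
    draws = [draw(rng) for _ in range(n_samples)]
    return statistics.fmean(draws)


def standard_error(sigma: float, n_samples: int) -> float:
    """Standard deviation of the mean of n_samples i.i.d. draws whose std is sigma."""
    return sigma / math.sqrt(n_samples)


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy stand-in: true expected expression 3.0, per-draw noise sigma = 1.0.
    def toy_draw(r: random.Random) -> float:
        return r.gauss(3.0, 1.0)

    for n in (1, 25, 100):
        est = mc_estimate(toy_draw, n, rng)
        print(f"n_samples={n:>3}  estimate={est:.3f}  standard error={standard_error(1.0, n):.2f}")
```

Going from 25 to 100 samples only halves the standard error (from 0.20 to 0.10 in units of sigma), which is why pushing the variance down further gets expensive.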

Generally, you would pass all the batches, but you could have the case where one cell type is only seen in one batch. In that case, you'd want to call the function separately for that cell type and not use the transform_batch param.
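
A minimal plain-Python sketch of the idea, under stated assumptions: `decode` is a hypothetical stand-in for the model's decoder, and averaging the decoded expression over the requested batches is how passing a list to transform_batch is understood here (the exact scvi-tools internals may differ). The second helper, also hypothetical, splits cells into those whose cell type is seen in more than one batch (use transform_batch with all batches) and those whose type is seen in only one batch (call the function separately, without transform_batch).

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

# Hypothetical decoder signature: (latent vector, batch label) -> per-gene expression.
Decoder = Callable[[Sequence[float], Hashable], List[float]]


def batch_averaged_expression(z: Sequence[float], decode: Decoder,
                              transform_batch: Sequence[Hashable]) -> List[float]:
    """Decode one cell's latent z as if it came from each batch, then average per gene."""
    per_batch = [decode(z, b) for b in transform_batch]
    n = len(per_batch)
    return [sum(gene_vals) / n for gene_vals in zip(*per_batch)]


def split_by_batch_coverage(cell_types: Sequence[Hashable],
                            batches: Sequence[Hashable]) -> Tuple[List[int], List[int]]:
    """Return (cells whose type occurs in >1 batch, cells whose type occurs in 1 batch)."""
    batches_per_type: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for cell_type, batch in zip(cell_types, batches):
        batches_per_type[cell_type].add(batch)
    shared = [i for i, ct in enumerate(cell_types) if len(batches_per_type[ct]) > 1]
    single = [i for i, ct in enumerate(cell_types) if len(batches_per_type[ct]) == 1]
    return shared, single


if __name__ == "__main__":
    # Toy decoder: each batch adds its own offset to every gene (a crude batch effect).
    offsets = {"batch_A": 0.0, "batch_B": 1.0}

    def toy_decode(z: Sequence[float], batch: Hashable) -> List[float]:
        return [zi + offsets[batch] for zi in z]

    print(batch_averaged_expression([1.0, 2.0], toy_decode, ["batch_A", "batch_B"]))  # [1.5, 2.5]
    print(split_by_batch_coverage(["T", "T", "B", "NK"],
                                  ["batch_A", "batch_B", "batch_A", "batch_B"]))  # ([0, 1], [2, 3])
```

Here the T cells (seen in both batches) would be decoded against all batches, while the B and NK cells (each seen in one batch only) would go through a separate call with transform_batch left unset.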

andrewjkwok · July 9, 2021, 9:27am

That’s super useful to know - thank you!

andrewjkwok · August 28, 2021, 12:24pm

Hello - just wanted to briefly follow up on this function. Is it possible to only get the denoised/normalised protein expression data matrix, and not the RNA one, or vice versa?

adamgayoso · August 30, 2021, 5:36pm

The method returns both; you can just ignore the RNA denoised expression. Reimplementing it to return only protein wouldn't really save any time, so feel free to ignore the RNA part for now.
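
For example, assuming a trained totalVI model whose get_normalized_expression returns an (RNA, protein) pair as described above, the RNA part can simply be discarded. The wrapper name is hypothetical, and n_samples=25 just follows the tutorial. Note that this does not reduce peak memory, since the RNA matrix is still computed.

```python
from typing import Any, Optional, Sequence


def protein_only(model: Any, adata: Any, transform_batch: Optional[Sequence[str]] = None,
                 n_samples: int = 25, **kwargs: Any) -> Any:
    """Return only the denoised protein output of a totalVI model.

    Assumes model.get_normalized_expression(...) returns (rna, protein);
    any extra keyword arguments are passed through unchanged.
    """
    _rna, protein = model.get_normalized_expression(
        adata=adata,
        transform_batch=transform_batch,
        n_samples=n_samples,
        **kwargs,
    )
    # The RNA result is dropped here, but it was still computed (and held in memory) above.
    return protein
```

Usage would look like `denoised_protein = protein_only(vae, adata, transform_batch=["batch_1", "batch_2"])`, where `vae` is your trained model and the batch names are placeholders.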

andrewjkwok · September 1, 2021, 2:23pm

I see. The problem I'm running into is actually that I keep running out of memory, and I was wondering whether returning only the protein expression might reduce memory usage. If not, I suppose my only options are to increase memory or forgo the denoised expression values?

adamgayoso · September 1, 2021, 2:58pm

You can do two things:

1. Reduce batch_size
2. Pass an argument to gene_list (e.g., a list of two genes, so only two genes are used)

Both of these steps will save you memory.
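
A back-of-the-envelope sketch of why both knobs help, assuming dense float32 arrays and that the n_samples draws for one minibatch are held together before averaging (an assumption about the implementation, not something stated in this thread). The final matrix scales with the number of genes kept, while the per-minibatch buffer scales with n_samples × batch_size × genes. The 92009 × 4000 shape comes from the error message later in this thread; batch sizes of 256 and 64 are purely illustrative.

```python
def dense_bytes(*dims: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for a dense array with the given dimensions (float32 by default)."""
    total = bytes_per_value
    for d in dims:
        total *= d
    return total


if __name__ == "__main__":
    n_cells, n_samples = 92009, 25
    # Final cells x genes output: full gene set vs. a 3-gene gene_list.
    for n_genes in (4000, 3):
        print(f"output, {n_genes} genes: {dense_bytes(n_cells, n_genes) / 1e9:.4f} GB")
    # Assumed per-minibatch buffer of n_samples draws over all 4000 genes.
    for batch_size in (256, 64):
        buf = dense_bytes(n_samples, batch_size, 4000)
        print(f"minibatch buffer, batch_size={batch_size}: {buf / 1e6:.1f} MB")
```

Under these assumptions, the full 4000-gene output is about 1.47 GB versus about 1.1 MB for 3 genes, and dropping batch_size from 256 to 64 shrinks the illustrative minibatch buffer from 102.4 MB to 25.6 MB.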

andrewjkwok · September 4, 2021, 10:23pm

Ah, number 2 makes a lot of sense - thank you!

andrewjkwok · September 9, 2021, 12:25pm

I tried the strategy of passing an argument to gene_list, but get an error:

ValueError: Value passed for key 'denoised_rna' is of incorrect shape. Values of layers must match dimensions (0, 1) of parent. Value had shape (92009, 3) while it should have had (92009, 4000).

Do you have any idea how this could be fixed?

adamgayoso · September 11, 2021, 2:46am

Probably a bug. Would you be able to open an issue on GitHub? Thanks!
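
A possible workaround while the issue is open, assuming the error comes from assigning the gene_list-restricted RNA output to adata.layers["denoised_rna"] (as the tutorial does with the full output): as the message says, a layer must match the parent's (n_obs, n_vars) shape, whereas obsm only requires the number of rows (cells) to match. The helper names below are hypothetical.

```python
from typing import Any, Tuple


def denoised_slot(value_shape: Tuple[int, int], adata_shape: Tuple[int, int]) -> str:
    """Pick where a denoised matrix can live on an AnnData object: 'layers' or 'obsm'."""
    n_obs, n_vars = adata_shape
    if value_shape[0] != n_obs:
        raise ValueError(f"expected {n_obs} rows (cells), got {value_shape[0]}")
    # Layers must match (n_obs, n_vars); a gene subset only matches on cells.
    return "layers" if tuple(value_shape) == (n_obs, n_vars) else "obsm"


def store_denoised(adata: Any, rna: Any, protein: Any) -> None:
    """Store totalVI outputs on adata without tripping the layer shape check."""
    slot = denoised_slot(rna.shape, adata.shape)
    # getattr(adata, slot) is adata.layers or adata.obsm, both dict-like.
    getattr(adata, slot)["denoised_rna"] = rna
    adata.obsm["denoised_protein"] = protein
```

With a 3-gene gene_list on a 92009 × 4000 object, the RNA result would land in adata.obsm["denoised_rna"] instead of raising the ValueError above.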
