Regressing out isotype controls

FelixTheStudent · March 10, 2022, 12:37pm

Hi scvi-tools team,

thanks for the great work so far!

I’d like a recommendation on whether or not to regress out isotype controls. Simply excluding them from the analysis does not seem sufficient, because more than 10 actual protein ADTs have such low expression that unspecific binding dominates, making them look exactly like isotypes. By regressing out isotype expression, I’d hope to remove this effect without having to manually exclude the affected ADTs. So here are my questions (happy to get answers to just a subset of them):
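For readers less familiar with the idea, here is a minimal sketch of what "regressing out" an isotype signal means in the simplest linear sense, similar in spirit to `scanpy.pp.regress_out`. This is a swapped-in illustration, not what totalVI does: as far as I know, totalVI takes covariates as model inputs rather than subtracting a fitted line. The function name `regress_out_linear` and the input layout (a log-normalized ADT matrix plus a per-cell isotype score) are assumptions.

```python
import numpy as np


def regress_out_linear(y, covariates):
    """Return residuals of each column of y after an OLS fit on the covariates.

    Hypothetical helper for illustration only.
    y: (n_cells, n_features) log-normalized ADT values
    covariates: (n_cells,) or (n_cells, n_covariates), e.g. a log isotype score
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(covariates, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Add an intercept column so the residuals of every feature have mean zero
    X = np.column_stack([np.ones(X.shape[0]), X])
    # Least-squares fit of all features at once: beta has shape (1 + n_covariates, n_features)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta
```

Because OLS residuals are orthogonal to the covariates, each residual column is uncorrelated with the isotype score by construction. The flip side is that any genuine biology correlated with isotype binding is removed as well.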

Do you think regressing out isotypes is a good idea?
When I write them to obsm, should I use raw UMIs, log-normalized UMIs, or perhaps a sum of all isotypes?
Is it possible to get denoised counts with n_samples > 1? As soon as I set n_samples to anything other than 1, I get the error “RuntimeError: Tensors must have same number of dimensions: got 3 and 2”.
As a toy example, I also tried regressing out gene expression instead of protein signal, as done in your scvi-tools paper. Instead of the highly expressed sex-specific markers, however, I tried CD4 and CD8A, but saw hardly any effect. Does regressing out gene expression only work for highly expressed genes, or am I doing something wrong?

Thanks and keep up the good work!
Best,

Felix

adamgayoso · March 10, 2022, 6:58pm

It seems to me that you might consider removing these 10 protein features.

What procedure would you use to regress them out? I think it’s more complicated by the fact that you’d want to regress on the correct conjugate isotype control for each protein feature? Usually the background is much lower in magnitude than the signal, so I don’t think it would have too strong an impact on the latent representation of totalVI, for example. Also, can you show that the isotype variation is cell-type specific?
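One way to answer the cell-type question is to compute a per-cell isotype score and measure how much of its variance is explained by cluster labels (eta squared, the between-group share of the total sum of squares). A minimal sketch; the function name is hypothetical, and the labels are assumed to come from an existing clustering such as Leiden.

```python
import numpy as np


def variance_explained_by_labels(score, labels):
    """Fraction of the variance of `score` explained by group membership (eta squared).

    Hypothetical helper. Values near 0 suggest the isotype signal is not
    cell-type specific; values near 1 suggest it tracks the clusters.
    """
    score = np.asarray(score, dtype=float)
    labels = np.asarray(labels)
    grand_mean = score.mean()
    total_ss = ((score - grand_mean) ** 2).sum()
    if total_ss == 0:
        return 0.0  # constant score: nothing to explain
    between_ss = 0.0
    for lab in np.unique(labels):
        group = score[labels == lab]
        between_ss += group.size * (group.mean() - grand_mean) ** 2
    return between_ss / total_ss
```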

Which function are you using? It should work with TOTALVI.get_normalized_expression.
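The error text is what PyTorch’s `torch.cat` raises when asked to join a 3-D and a 2-D tensor. One plausible cause (an assumption, not confirmed in this thread) is that with n_samples > 1 the sampled latent variables gain a leading sample axis, shape (n_samples, n_cells, d), while per-cell covariates stay (n_cells, c). The NumPy sketch below only illustrates the shape logic: covariates must be broadcast across the sample axis before concatenation, and samples are averaged afterwards to give one value per cell. The helper names are hypothetical and this is not scvi-tools code.

```python
import numpy as np


def concat_with_covariates(latent, covariates):
    """Concatenate per-cell covariates onto latent values along the last axis.

    latent: (n_cells, d) for a single sample, or (n_samples, n_cells, d)
    covariates: (n_cells, c)
    """
    if latent.ndim == 3 and covariates.ndim == 2:
        # Repeat the same covariates for every posterior sample
        covariates = np.broadcast_to(covariates, (latent.shape[0],) + covariates.shape)
    # Without the broadcast above, a 3-D/2-D mix would raise a dimension mismatch here
    return np.concatenate([latent, covariates], axis=-1)


def average_over_samples(samples):
    """Collapse (n_samples, n_cells, n_features) to (n_cells, n_features)."""
    return samples.mean(axis=0)
```

In scvi-tools itself, the averaging over samples is controlled by the `return_mean` argument of `get_normalized_expression` in the versions I have seen; check the signature in your installed version.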

These are typically very lowly expressed genes? Also it may not work well without including a panel of genes that are highly coregulated/correlated.
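A toy simulation of this point (all numbers are hypothetical): a single lowly expressed gene, observed through Poisson sampling, is a poor proxy for the program it reports on, while a summed panel of co-regulated genes tracks it much better. A covariate that barely correlates with the underlying biology cannot remove much of it when regressed out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_panel = 2000, 20

# Hypothetical latent program (e.g. "CD8-ness") that we would like to regress out
program = rng.normal(size=n_cells)
# Lowly expressed genes: about 0.2 expected UMIs per cell on average, scaled by the program
rate = 0.2 * np.exp(0.5 * program)
counts = rng.poisson(rate[:, None], size=(n_cells, n_panel))

# Covariate built from one gene vs. from the summed co-regulated panel
single_gene = np.log1p(counts[:, 0])
panel_score = np.log1p(counts.sum(axis=1))

r_single = np.corrcoef(single_gene, program)[0, 1]
r_panel = np.corrcoef(panel_score, program)[0, 1]
print(f"single gene r = {r_single:.2f}, panel r = {r_panel:.2f}")
```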

FelixTheStudent · March 11, 2022, 2:41pm

Thanks for the answers, I’m running a few experiments and will get back to you next week.

About the conjugate antibodies: I found that all isotype controls look identical, and conclude that the unspecific binding is the same for all antibodies irrespective of the heavy/light chains. So I could regress out all isotypes, or I could sum them up into an aggregate isotype representing non-specific binding.
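If all isotypes behave alike, the aggregate described above can be computed as below. Summing raw UMIs and then applying log1p is an assumption; the thread does not settle which transformation works best. The function and the ADT names in the example are hypothetical. In recent scvi-tools versions, extra continuous covariates are passed to `setup_anndata` via `continuous_covariate_keys`, which name columns of `adata.obs`, so a score like this would be stored there (check the documentation for your version).

```python
import numpy as np


def aggregate_isotype_score(adt_counts, adt_names, isotype_names):
    """Per-cell log1p of the summed raw isotype-control UMIs.

    Hypothetical helper.
    adt_counts: (n_cells, n_adts) raw UMI matrix
    adt_names: list of ADT names matching the columns
    isotype_names: names of the isotype-control ADTs to aggregate
    """
    adt_counts = np.asarray(adt_counts)
    idx = [adt_names.index(name) for name in isotype_names]
    total = adt_counts[:, idx].sum(axis=1)  # raw UMIs summed over all isotypes
    return np.log1p(total)
```

For example, with columns `["IgG1", "IgG2a", "IgG2b", "CD3"]` and the first three treated as isotypes, a cell with counts `[0, 1, 2, 10]` gets the score `log1p(3)`.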

And yes, I used TOTALVI.get_normalized_expression and got the error with n_samples=25, while it works fine with n_samples=1. If I find time, I’ll see whether I can reproduce the error with a minimal example; then we’ll know whether the code is off or my data set is strange.

Talk to you next week!

Best,

Felix

FelixTheStudent · March 24, 2022, 9:23am

Hi @adamgayoso,

As promised I’m coming back to you.

I tried regressing out all isotypes and indeed found that this nicely removed the grouping of isotype signal in the UMAP. I’m not sure, though, whether this really improves the biology much, and as you say, for other data sets it might be important to regress out the matching conjugate isotype for each protein specifically (which is currently not possible with totalVI). So I am glad to see it works on my data, because all isotypes look so similar (unspecific binding is the same for all antibody chains), but I would recommend that anyone reading this first analyze their data without regressing out isotypes, and only experiment further if it seems necessary.

If some cells bind a lot of isotype controls, they might also be dead or dying cells, or antibody aggregates, which we would want to exclude rather than pull toward the normal cells by regression.
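Following that caveat, cells with unusually high isotype binding can be flagged for inspection or exclusion instead of being regressed. A minimal sketch using a median absolute deviation (MAD) threshold on a per-cell isotype score; the 3-MAD cutoff is an arbitrary assumption and the function name is hypothetical.

```python
import numpy as np


def flag_high_isotype_cells(score, n_mads=3.0):
    """Boolean mask of cells whose isotype score exceeds median + n_mads * MAD.

    Hypothetical helper. If the MAD is 0 (e.g. most scores identical),
    every cell above the median is flagged, so inspect the result.
    """
    score = np.asarray(score, dtype=float)
    median = np.median(score)
    mad = np.median(np.abs(score - median))
    return score > median + n_mads * mad
```

Flagged cells can then be checked against other QC metrics (mitochondrial fraction, total counts) before deciding whether to drop them.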

For the error with TOTALVI.get_normalized_expression, I have not had time to cook up a minimal example, and for now I assume I am doing something wrong.

Thanks for your swift reply and all the insights!

easyeryiji · June 6, 2025, 2:44am

Hi Felix,

I am facing the same issue with my CITE-seq data, which has excessive background noise. I’m currently exploring how to denoise these data and wondering whether isotypes could be used for this purpose.

In your analysis, did you simply remove the 10 protein features? Could you share your specific procedure for handling this?

I would greatly appreciate it if you could elaborate on how to perform the regression with the 10 isotype controls.

Best regards,

Ji
