Regressing out isotype controls

Hi scvi-tools team,

thanks for the great work so far!

I’d like a recommendation on whether or not to regress out isotype controls. Simply excluding them from the analysis does not seem sufficient, because more than 10 actual protein ADTs have such low expression that nonspecific binding dominates, making them look exactly like isotypes. By regressing out isotype expression, I’d hope to remove this effect without having to manually exclude the affected ADTs. So here are my questions (I’m happy to get answers to just a subset of them):

  1. Do you think regressing out isotypes is a good idea?
  2. When I write them to obsm, should I use raw UMIs, log-normalized UMIs, or perhaps a sum of all isotypes?
  3. Is it possible to get denoised counts with n_samples>1? As soon as I set n_samples to anything except 1, I get the error “RuntimeError: Tensors must have same number of dimensions: got 3 and 2”.
  4. As a toy example, I also tried regressing out gene expression instead of protein signal, as done in your scvi-tools paper. Instead of highly expressed sex-specific markers, however, I tried CD4 and CD8A, and saw hardly any effect. Do you think regressing out gene expression only works for highly expressed genes, or am I doing something wrong?
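To make the options in question 2 concrete, here is a minimal sketch of the three candidate covariates, built with NumPy only. The function name `isotype_covariates`, the column layout, and the choice of the median total ADT count as the scaling factor are assumptions for illustration, not anything prescribed by scvi-tools. How the result is then handed to totalVI is left out, since that depends on the scvi-tools version.

```python
import numpy as np

def isotype_covariates(counts, isotype_idx):
    """Build three candidate covariates from isotype-control ADT counts.

    counts: (n_cells, n_proteins) raw ADT UMI counts.
    isotype_idx: column indices of the isotype controls (hypothetical layout).
    """
    counts = np.asarray(counts, dtype=float)
    iso = counts[:, isotype_idx]

    # Option A: raw UMI counts, one column per isotype control.
    raw = iso

    # Option B: log-normalized counts. Each cell is scaled by its total ADT
    # count, rescaled to the median total (an assumed choice), then log1p.
    totals = counts.sum(axis=1, keepdims=True)
    scale = np.median(totals)
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid division by zero
    log_normalized = np.log1p(iso / safe_totals * scale)

    # Option C: a single aggregate column, log1p of the summed isotype counts.
    aggregate = np.log1p(iso.sum(axis=1))

    return {"raw": raw, "log_normalized": log_normalized, "aggregate": aggregate}
```

Option C yields one covariate per cell instead of one per isotype, which is only sensible if the isotype controls behave alike.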

Thanks and keep up the good work!


It seems to me that you might consider removing these 10 protein features.

What procedure would you use to regress them out? I think it’s complicated by the fact that you’d want to regress on the correct conjugate isotype control for each protein feature. Usually the background is much lower in magnitude than the signal, so I don’t think it would have too strong an impact on the latent representation of totalVI, for example. Also, can you show that the isotype variation is cell-type specific?
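One simple way to look at cell-type specificity is to compare the average isotype signal across cluster or cell-type labels. The sketch below assumes you already have a per-cell total of isotype counts and a label per cell; the function name `mean_isotype_by_group` is hypothetical.

```python
import numpy as np

def mean_isotype_by_group(iso_total, labels):
    """Mean log1p isotype signal per group.

    iso_total: per-cell summed isotype-control counts.
    labels: per-cell cluster or cell-type labels (same length).
    """
    logged = np.log1p(np.asarray(iso_total, dtype=float))
    labels = np.asarray(labels)
    # One mean per distinct label; large differences between groups would
    # suggest that the isotype signal is cell-type specific.
    return {str(g): float(logged[labels == g].mean()) for g in np.unique(labels)}
```

Group means alone do not show whether a difference is meaningful, so it is worth looking at the per-group distributions as well.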

Which function are you using? It should work with TOTALVI.get_normalized_expression.

Aren’t these typically very lowly expressed genes? Also, it may not work well without including a panel of genes that are highly coregulated/correlated.

Thanks for the answers, I’m running a few experiments and will get back to you next week.

About the conjugate antibodies: I have found that all isotype controls look identical, and conclude that the nonspecific binding is the same for all antibodies irrespective of the heavy/light chains. So I could regress out all isotypes, or I could sum them into an aggregate isotype representing nonspecific binding.
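The claim that all isotype controls look alike can be checked numerically before summing them. A minimal sketch, assuming a cells-by-isotypes matrix of raw counts (the function name `isotype_correlations` is hypothetical):

```python
import numpy as np

def isotype_correlations(iso_counts):
    """Pairwise Pearson correlation between isotype controls across cells.

    iso_counts: (n_cells, n_isotypes) raw isotype-control counts.
    Returns an (n_isotypes, n_isotypes) correlation matrix on log1p counts.
    Note: a column with zero variance produces NaN entries.
    """
    logged = np.log1p(np.asarray(iso_counts, dtype=float))
    # rowvar=False treats each column (isotype) as one variable.
    return np.corrcoef(logged, rowvar=False)
```

If the off-diagonal entries are all close to 1, a single aggregate covariate loses little information; if they are not, summing would mix different background profiles.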

And yes, I used TOTALVI.get_normalized_expression and got errors with n_samples=25, while it works well with n_samples=1. If I find time, I’ll see if I can reproduce the error with a minimal example; then we’ll know whether the code is off or my dataset is strange.
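Until the n_samples > 1 error is tracked down, one possible stopgap is to call the function several times with n_samples=1 and average the results manually. This is only a sketch: it assumes each call draws a fresh posterior sample and returns an (rna, protein) pair of arrays, neither of which has been confirmed in this thread. The helper `mean_over_draws` is hypothetical and not part of scvi-tools.

```python
import numpy as np

def mean_over_draws(draw_fn, n_draws):
    """Average the (rna, protein) pair returned by repeated calls to draw_fn.

    draw_fn: zero-argument callable returning two array-likes of fixed shape.
    n_draws: number of calls to average over (must be >= 1).
    """
    if n_draws < 1:
        raise ValueError("n_draws must be at least 1")
    rna_sum, protein_sum = None, None
    for _ in range(n_draws):
        rna, protein = draw_fn()
        rna = np.asarray(rna, dtype=float)
        protein = np.asarray(protein, dtype=float)
        # Accumulate out of place so the caller's arrays are never modified.
        rna_sum = rna if rna_sum is None else rna_sum + rna
        protein_sum = protein if protein_sum is None else protein_sum + protein
    return rna_sum / n_draws, protein_sum / n_draws

# Hypothetical usage, with `model` standing in for a trained TOTALVI instance:
# rna, protein = mean_over_draws(
#     lambda: model.get_normalized_expression(n_samples=1, return_numpy=True),
#     n_draws=25,
# )
```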

Talk to you next week!



Hi @adamgayoso,

As promised I’m coming back to you.

I tried regressing out all isotypes and indeed found that this nicely removed the grouping by isotype signal in the UMAP. I’m not sure, though, whether this really improves the biology much, and as you say, for other datasets it might be important to regress out the conjugate isotype for each protein feature specifically (which is currently not possible with totalVI). So I am glad to see it works on my data because all isotypes look so similar (nonspecific binding is the same for all antibody chains), but I would recommend that everyone reading this first analyze their data without regressing out isotypes, and experiment with this only if it seems necessary. If some cells bind a lot of isotype controls, these might also be dead or dying cells, or antibody aggregates, which we would want to exclude rather than pull toward the normal cells by regression.
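For the last point, here is a minimal sketch of flagging cells with unusually high isotype binding so they can be inspected or excluded before any regression. The median-plus-MAD rule and the default of 3 MADs are assumptions chosen for illustration, and the function name `flag_high_isotype_cells` is hypothetical.

```python
import numpy as np

def flag_high_isotype_cells(iso_total, n_mads=3.0):
    """Flag cells whose log1p isotype signal is far above the median.

    iso_total: per-cell summed isotype-control counts.
    n_mads: how many median absolute deviations above the median counts
            as an outlier (an assumed, tunable threshold).
    Returns a boolean array, True for flagged cells.
    """
    x = np.log1p(np.asarray(iso_total, dtype=float))
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    # If mad is 0, every cell above the median is flagged; check for this
    # case on sparse data before trusting the result.
    return x > med + n_mads * mad
```

Flagged cells are candidates for a closer look (for example alongside viability or total-count QC metrics), not an automatic exclusion list.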

As for the error with TOTALVI.get_normalized_expression, I have not had the time to cook up a minimal example, and for now I assume I am doing something wrong.

Thanks for your swift reply and all the insights!