Scale by total protein counts before running totalVI (to eliminate a batch effect of having different sequencing depths)

Hello,
I am trying to scale by total protein counts before running totalVI, to eliminate a batch effect caused by different sequencing depths.

More specifically,
Prior to running totalVI on data integrated from two independent experiments (two batches), I would like to normalize or scale the protein counts for each cell. Is there an equivalent of the scanpy normalization function for genes (scanpy.pp.normalize_total) for proteins? Or, what is the best way to scale my protein count values prior to running totalVI?

Hi, we don’t recommend scaling or normalizing protein counts prior to running totalVI. The generative model of totalVI uses a negative binomial distribution to model counts, so normalized (non-integer) values are not suitable for this type of modeling. Hope this clarifies things.
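To make the point about counts concrete, here is a minimal sketch using only NumPy. The toy matrix and the helper `is_raw_counts` are made up for illustration and are not part of scvi-tools or scanpy; the normalization is only similar in spirit to a per-cell total-count normalization.

```python
import numpy as np

def is_raw_counts(x):
    """Return True if every entry is a non-negative whole number."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= 0) and np.all(np.mod(x, 1) == 0))

# Toy protein matrix: 3 cells x 2 proteins (made-up numbers).
protein = np.array([[10, 5],
                    [200, 100],
                    [3, 0]])

# Scale every cell to the same total (here 100).
totals = protein.sum(axis=1, keepdims=True)
normalized = protein / totals * 100.0

raw_ok = is_raw_counts(protein)      # integer counts, as a count model expects
norm_ok = is_raw_counts(normalized)  # fractional values after scaling
```

A check like this on the protein matrix before model setup is one way to catch data that was normalized by accident.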

That makes sense. But, even so, is there a way I can scale the protein count values and round up to the nearest integer prior to running totalVI?

This is not recommended. Downsampling would be fine, but it throws away part of your data, with potentially negative side effects. Let’s start a step earlier: why do you want to do this? Do you get poor integration without these steps when using totalVI?
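For reference, downsampling as mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses binomial thinning with NumPy, the function name `downsample_to_depth` and the target depth are hypothetical, and nothing here is a totalVI or scvi-tools API. Unlike scale-and-round, the output stays integer-valued and never exceeds the original counts, but the discarded counts are lost.

```python
import numpy as np

def downsample_to_depth(counts, target_total, rng):
    """Binomially thin each cell (row) so its expected total is at most target_total."""
    counts = np.asarray(counts, dtype=np.int64)
    totals = counts.sum(axis=1, keepdims=True)
    # Per-cell keep probability; cells at or below the target are left untouched.
    keep_prob = np.minimum(1.0, target_total / np.maximum(totals, 1))
    # Each observed count is kept independently with probability keep_prob.
    return rng.binomial(counts, keep_prob)

# Toy protein matrix: 3 cells x 2 proteins (made-up numbers).
protein = np.array([[10, 5],
                    [200, 100],
                    [3, 0]])

rng = np.random.default_rng(0)
thinned = downsample_to_depth(protein, target_total=50, rng=rng)
```

The resulting totals match the target only in expectation, not exactly, and the choice of target depth is left open here.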