I have a vague memory that in the original version of scvi, the counts were normalized somehow before being passed into the encoder, perhaps as log(1+CPM). My recollection is this was more numerically stable than passing raw counts directly. Is that still the case? I couldn’t find anything in the documentation or in the code itself.
Just a log(1+x) transform. It’s here:
And yes, more numerically stable!
Excellent, thanks a ton Adam!
Hi Adam, I can see that log(1+X) keeps part of the information in the raw data while improving numerical stability, as you mentioned. Is there a specific reason that scVI does not use the more common log2(1+CPM) or log2(1+TPM)?
Hi, we just use log(1 + x) for simplicity, as we generally only care about the numerical stability of the model.
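To make the difference concrete, here is a rough NumPy sketch of the two transforms discussed in this thread. It is only an illustration, not the actual scvi code: the function names `log1p_counts` and `log2_cpm` and the toy count matrix are made up for this example.

```python
import numpy as np


def log1p_counts(counts):
    """log(1 + x) applied element-wise to raw counts (no library-size scaling)."""
    return np.log1p(np.asarray(counts, dtype=np.float64))


def log2_cpm(counts):
    """log2(1 + CPM): scale each cell (row) to counts per million, then log2."""
    counts = np.asarray(counts, dtype=np.float64)
    lib_size = counts.sum(axis=1, keepdims=True)
    # Guard against empty cells so we never divide by zero.
    lib_size = np.where(lib_size == 0, 1.0, lib_size)
    cpm = counts / lib_size * 1e6
    return np.log2(1.0 + cpm)


# Hypothetical toy data: 2 cells x 3 genes with very different sequencing depth.
counts = np.array([[0, 3, 997],
                   [10, 0, 49990]])

encoder_input = log1p_counts(counts)  # compresses the dynamic range, keeps depth differences
cpm_input = log2_cpm(counts)          # additionally removes depth differences between cells
```

Both versions squash large counts into a small range, which is the numerical stability point above. The plain log(1 + x) version just skips the per-cell scaling step, so differences in sequencing depth are still visible in the transformed values.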