Hi, I have several datasets, half of which belong to one category and half to another. I would like to integrate them all, but I thought it might be better to first integrate the datasets that are biologically closer to each other (within each category), and then run a second round of integration on the two resulting integrated datasets.
In practice, however, I see two problems. First, as I understand it, scVI is run on highly variable genes (HVGs) only, so I'm not sure whether a gene that would be selected as an HVG when batches are taken into account across all datasets would still be selected when I process the two categories separately. Second, I checked the normalized counts that scVI outputs, and they look quite different from the count matrix we had before scVI processing. So I'm not sure whether I should treat them as scaled data, as log-normalized data, or as another kind of raw count. Or is a second round of integration impossible because of the way the normalized count matrix is generated?
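To make the first concern concrete, here is a minimal toy sketch. All gene names and values are made up, and it ranks genes by plain variance, which is only a stand-in for whatever HVG selection is actually used before scVI. It just shows how a gene that differs mainly between the two categories can be picked when all cells are pooled but missed when each category is processed on its own.

```python
from statistics import pvariance

# Hypothetical toy expression values (not real data).
# Each gene maps to its values across the cells of one category.
category_a = {
    "gene_within": [0.0, 2.0, 0.0, 2.0],   # varies inside each category
    "gene_between": [0.1, 0.0, 0.1, 0.0],  # low in A, high in B
    "gene_flat": [1.0, 1.0, 1.0, 1.0],
}
category_b = {
    "gene_within": [0.0, 2.0, 0.0, 2.0],
    "gene_between": [5.0, 5.1, 5.0, 5.1],
    "gene_flat": [1.0, 1.0, 1.0, 1.0],
}


def top_variable_genes(expr, n_top):
    """Return the n_top genes with the largest variance (simplified HVG stand-in)."""
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return set(ranked[:n_top])


# Pool the cells of both categories gene by gene.
pooled = {gene: category_a[gene] + category_b[gene] for gene in category_a}

hvg_a = top_variable_genes(category_a, 1)
hvg_b = top_variable_genes(category_b, 1)
hvg_pooled = top_variable_genes(pooled, 1)

print("Category A only:", hvg_a)
print("Category B only:", hvg_b)
print("All cells pooled:", hvg_pooled)
```

In this toy case the per-category selections and the pooled selection disagree, which is the situation I am worried about if I integrate in two rounds.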