Unbalanced cell types and transform_batch


I’m trying to obtain normalized expression values after integration with scVI. Overall, I understand the whole process, but the transform_batch argument in .get_normalized_expression() is quite a head-scratcher. In several other issues, including ones on GitHub, the advice has been to set this argument to the batch(es) that contain a representative set of cell populations.

I have 7 batches, of which 6 profile the same set of cell populations (approx. 27). The 7th batch, however, focuses on one specific population out of these 27 (at higher resolution). So far I’ve been setting the argument to a list of all batches. Do you think it would be sensible to exclude the 7th batch?



Hi Claudio,

Unfortunately, there’s no straightforward answer to your question. I would certainly try your proposed procedure, but the more general question is what you plan to use these values for. As an alternative, you can pass a single value to transform_batch, which is then treated as the reference batch, so all decoded values are expressed with respect to that one batch category.
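One way to make "batches with a representative set of cell populations" concrete is to keep only the batches in which every annotated population appears. Below is a minimal sketch in plain Python. It assumes per-cell batch and cell-type labels are available (for example from adata.obs columns, here hypothetically named "batch" and "cell_type"); the helper name and the toy labels are illustrative and not part of scvi-tools.

```python
from collections import defaultdict


def batches_covering_labels(batches, labels, min_fraction=1.0):
    """Return the batches that contain at least `min_fraction` of all observed labels.

    `batches` and `labels` are equal-length sequences with one entry per cell.
    """
    all_labels = set(labels)
    if not all_labels:
        return []
    labels_per_batch = defaultdict(set)
    for batch, label in zip(batches, labels):
        labels_per_batch[batch].add(label)
    return sorted(
        batch
        for batch, seen in labels_per_batch.items()
        if len(seen) / len(all_labels) >= min_fraction
    )


# Toy data: b1 and b2 profile both populations, b7 only one of them.
batches = ["b1", "b1", "b2", "b2", "b7", "b7"]
labels = ["T cell", "B cell", "T cell", "B cell", "T cell", "T cell"]

reference_batches = batches_covering_labels(batches, labels)
print(reference_batches)

# Assumption: `model` is a trained scvi.model.SCVI and the batch names above
# match the categories it was set up with. A list averages the decoded values
# over the listed batches; a single value uses that batch as the reference.
# expr_avg = model.get_normalized_expression(transform_batch=reference_batches)
# expr_ref = model.get_normalized_expression(transform_batch=reference_batches[0])
```

Lowering min_fraction (for example to 0.9) would tolerate batches that miss a few rare populations. Whether that is acceptable depends on the downstream use of the values.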

If your goal is something like differential expression, I would probably try a count-based linear model such as a negative binomial (NB) GLM.
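For reference, one common way to write such an NB GLM on the raw counts is sketched below. The specific design (a single condition covariate plus batch indicators and a library-size offset) is an assumption for illustration, not something specified in this thread.

$$
y_{gc} \sim \mathrm{NB}(\mu_{gc}, \theta_g), \qquad
\log \mu_{gc} = \log s_c + \beta_{g0} + \beta_{g1}\, x_c + \sum_{b=2}^{B} \gamma_{gb}\, \mathbb{1}[\mathrm{batch}_c = b]
$$

Here $y_{gc}$ is the raw count of gene $g$ in cell $c$, $s_c$ is the library size of cell $c$ (entering as an offset), $x_c$ is the condition or group indicator being tested, $\gamma_{gb}$ are batch effects relative to the first of the $B$ batches, and $\theta_g$ is the gene-specific dispersion. Differential expression for gene $g$ then corresponds to testing $\beta_{g1} = 0$.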

Hi Adam,

Thanks for the answer! I think it will come down to trying it and seeing. As for the goal, differential expression is definitely in the plans, so thanks for the suggestion too :)

Hi Claudio,

Since you’re experimenting anyway, I would put in all the data and keep all seven batches as batch categories. Then, if your seventh batch overlaps with only a single cluster while the other six batches span additional clusters, you gain confidence in your cell sorting, and you get to annotate that cluster for free!
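The overlap check described above can be sketched as a simple tally of how the cells of one batch distribute over clusters. This is a minimal plain-Python illustration. It assumes per-cell batch and cluster labels (for example from hypothetical adata.obs columns "batch" and "leiden" computed on the scVI latent space); the function name and toy labels are made up for the example.

```python
from collections import Counter


def cluster_fractions(batches, clusters, batch):
    """Fraction of cells from `batch` that fall into each cluster."""
    counts = Counter(c for b, c in zip(batches, clusters) if b == batch)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no cells found for batch {batch!r}")
    return {cluster: n / total for cluster, n in counts.items()}


# Toy data: most cells of the seventh batch land in cluster "1".
batches = ["b1", "b1", "b7", "b7", "b7", "b7"]
clusters = ["0", "1", "1", "1", "1", "0"]

fractions = cluster_fractions(batches, clusters, "b7")
dominant_cluster = max(fractions, key=fractions.get)
print(dominant_cluster, fractions[dominant_cluster])
```

If nearly all cells of the seventh batch end up in one cluster, that cluster is a natural candidate for the sorted population. A spread across many clusters would instead suggest residual batch effects or heterogeneity within the sorted cells.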