Hi,
Thanks for making this great package! I’m working with a multi-batch dataset with several donors, generated from multiple studies with their own batch effects (technology and varying sequencing depth). I am interested in generating a “batch-corrected” count matrix for downstream analysis. I see that scvi.model.SCVI.setup_anndata() has options for categorical_covariate_keys and continuous_covariate_keys. So I could, for example, register the various batch effects like “technology” and “donor” as categorical covariates.
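To make the question concrete, here is a rough sketch of one layout I am considering. The obs column names ("technology", "donor") are placeholders for my own metadata, and putting "technology" in batch_key rather than in categorical_covariate_keys is an assumption on my part, not something I know to be the recommended setup.

```python
# Sketch only: the obs column names below are placeholders for my metadata.
SETUP_KWARGS = {
    "batch_key": "technology",                # assumed: main technical batch (e.g. cells vs. nuclei)
    "categorical_covariate_keys": ["donor"],  # assumed: additional nuisance covariate
    "continuous_covariate_keys": None,        # could hold continuous nuisance columns if needed
}


def setup_and_build(adata):
    """Register the AnnData object with scvi-tools and build an untrained SCVI model."""
    import scvi  # imported lazily so the sketch can be read without scvi-tools installed

    scvi.model.SCVI.setup_anndata(adata, **SETUP_KWARGS)
    return scvi.model.SCVI(adata)
```

After this I would call model.train() as usual before asking for any normalized expression.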
I also see that once I build the model, there is a function model.get_normalized_expression, which can take a transform_batch argument, but how should I combine this with the categorical_covariate_keys above?
I saw a similar question here but didn’t see a specific recommendation: link.
Additionally, I see that transform_batch requires specifying a particular batch that each sample is treated as if it came from, as noted here: link. In that case, would it make more sense to average over all of the larger batch effects such as “technology”, but not over the individual-to-individual batches like “donor”? Would that be reasonable?
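In code, the averaging I have in mind would look roughly like the sketch below. It assumes a trained model whose batch_key is the technology column; the helper name and the library_size value of 1e4 are my own arbitrary choices. It relies on get_normalized_expression accepting a list for transform_batch and averaging over the listed batches.

```python
def technology_averaged_expression(model, technologies, library_size=1e4):
    """Decode every cell as if it came from each listed batch, then average."""
    # Passing a list to transform_batch makes scvi-tools average the decoded
    # expression over those batches. library_size rescales the output to a
    # common depth; 1e4 is an arbitrary choice for this sketch.
    return model.get_normalized_expression(
        transform_batch=list(technologies),
        library_size=library_size,
    )
```

I would call it as technology_averaged_expression(model, ["cells", "nuclei"]), where the category names are hypothetical. My understanding, which I would like confirmed, is that “donor” would stay at each cell's observed value in this call, since only the batch is swapped.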
Overall, I want to remove technical artifacts such as sequencing depth and cell vs. nuclei effects from my datasets, without removing biological states such as tissue location or sex. Thanks!