scVI integration using two batch keys

dub2s · September 13, 2023, 6:34pm

Hii all

I am working on a data where the data was acquired at an interval of 2-3 years, probably with some differences in the processing kits. Suppose this is represented by “original_batch” ( having two values : Batch1 and Batch2). Other than that, overall the data has 8 patients ( 4 belonging to one condition and 4 belonging to other condition).

I performed integration using scVI and using the “sample” as batch key ( representing overall 8 batches). While that performed pretty well, if I view “original_batch” on the UMAP, it seems there are still some batch effect, which originally i didn’t correct for.

Is it possible to correct this as well or supply two batch keys? While I also think that at a broader level, the 2 batches are roughly equally represented in the clusters. What might be the better way of doing this?

martinkim0 · September 14, 2023, 5:35am

Hi @dub2s, this seems like a good use case for MrVI, which learns one latent space that is sample-aware and batch corrected and another latent space that is just batch corrected. For your data, I’m guessing the batch would be the original batch and the sample would be the experimental condition.

cc @Justin_Hong, one of the main developers of this method

Justin_Hong · September 15, 2023, 4:23pm

Hi @dub2s, Thanks for the post. Solely for integration you can try using the extra_categorical_covariates feature in scVI to improve mixing. It does look like your UMAP has the issue where your batches are ordered in the data and as a result Batch2 looks much more apparent than Batch1 specifically because the plotting follows the data order. If you shuffle the rows, before plotting it’ll likely give you a better picture of how well it’s mixing. A metric-like silhouette can also give you a more objective view.

MrVI is also an option as Martin suggested. For solely integration the current version may not be better than scVI; you instead get more interesting options for sample-sample analysis. We are working on a new version which we have found improves integration and has many more meta-analysis features. Keep an eye out for this work!

dub2s · September 15, 2023, 5:32pm

@martinkim0 and @Justin_Hong Thank you for the heads up on MrVI. I will wait for the future releases.

@Justin_Hong I didn’t know about extra_categorical_covariates in scVI. I will use it and update if that improves the mixing. I also plotted proportion of the two batches in each cluster and that looked fairly equal except for one or two clusters.

proportions

Shuffled for plotting:
umap_shuffle

dub2s · September 15, 2023, 6:31pm

Hii

I did use categorical_covariate_keys="original batch" . However, it didn’t seem to mix them well.

dub2s · October 24, 2023, 3:24pm

@martinkim0

Hii, I just happened to try harmony as well for the integration, and for this particular dataset, it does look like that it is correcting for the batch effect better, even though I supplied the same “sample” as the batch key.

What could be the reason given that I am using same batch key?

Topic		Replies	Views
scVI integration set batch_key and poor Umap result scvi-tools integration , diff-exp , scvi	3	213	August 7, 2024
Insufficient batch correction for certain cell-types scvi-tools integration , scvi	8	425	May 15, 2024
How to specify batch correction for 7 samples from two bacthes? scvi-tools scvi	2	419	March 15, 2023
Failure to remove a batch_key/ effect of number of LVs scvi-tools integration , scvi	6	412	February 9, 2024
Integration with scVI scvi-tools scvi	2	946	November 30, 2022

scVI integration using two batch keys

Related topics