How to Correct for Intra-Organ Batch Effects Without Removing Inter-Organ Differences?

dacon06 · July 29, 2025, 9:39am

Dear Community,

I’m currently working on integrating a single-cell RNA-seq dataset of human mesenchymal stem cells (MSCs) using scvi-tools. The dataset includes 11 samples, each from a different donor, across four tissue types:

A: Adipose (A01–A03)
B: Bone marrow (B01–B03)
D: Dermis (D01–D03)
U: Umbilical cord (U01–U02)

Each sample corresponds to one patient, so I’ve been using the sample ID (e.g., A01, B02) as the batch_key in SCVI.setup_anndata.

My goal is to mitigate donor-specific batch effects within each tissue, but preserve the biological differences between tissues (since tissue-of-origin is an important axis of variation here).
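To make the design explicit, here is a minimal sketch that groups the sample IDs by tissue. It assumes the tissue is encoded by the first letter of each sample ID, as in the list above. The helper names `tissue_of` and `samples_per_tissue` are hypothetical and only for illustration.

```python
from collections import defaultdict

# Sample IDs as listed above; the tissue is assumed to be the first letter of the ID.
SAMPLES = ["A01", "A02", "A03", "B01", "B02", "B03",
           "D01", "D02", "D03", "U01", "U02"]
TISSUE_NAMES = {"A": "Adipose", "B": "Bone marrow", "D": "Dermis", "U": "Umbilical cord"}


def tissue_of(sample_id):
    """Map a sample ID such as 'B02' to its tissue name (hypothetical helper)."""
    return TISSUE_NAMES[sample_id[0]]


def samples_per_tissue(sample_ids):
    """Group sample IDs by tissue, preserving input order within each group."""
    groups = defaultdict(list)
    for sample_id in sample_ids:
        groups[tissue_of(sample_id)].append(sample_id)
    return dict(groups)


design = samples_per_tissue(SAMPLES)
for tissue, members in design.items():
    print(f"{tissue}: {members}")
```

Every sample falls into exactly one tissue, so donor and tissue are nested. With a design like this, donor effects and tissue effects are hard to tell apart from the sample label alone, which may be part of why correcting per sample also weakens the tissue structure.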

I’ve followed the scvi-tools tutorials, but after integration, the tissue-specific structure seems to be partially lost.

My Questions:

1. Is using batch_key='Sample' the right approach here?
2. Should I instead pass tissue type as a categorical covariate (categorical_covariate_keys), to help scVI retain inter-organ differences?
3. Has anyone dealt with a similar situation, where batch effects should be removed within groups but preserved between groups?
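For concreteness, the two setups asked about could be expressed roughly as follows. This is only a sketch of how they would be written, not a recommendation of either one. The column name 'Tissue' and the helpers `setup_kwargs` and `fit_scvi` are assumptions for illustration; 'Sample' is the column named above.

```python
def setup_kwargs(option):
    """Return keyword arguments for scvi.model.SCVI.setup_anndata.

    "sample": sample/donor as the batch (the setup described in this post).
    "sample_plus_tissue": same batch, with tissue added as a categorical covariate.
    The obs column name "Tissue" is an assumption.
    """
    if option == "sample":
        return {"batch_key": "Sample"}
    if option == "sample_plus_tissue":
        return {"batch_key": "Sample", "categorical_covariate_keys": ["Tissue"]}
    raise ValueError(f"unknown option: {option!r}")


def fit_scvi(adata, option, layer=None):
    """Train SCVI under one setup; return the model and its latent representation."""
    # Imported here so setup_kwargs can be inspected without scvi-tools installed.
    import scvi

    scvi.model.SCVI.setup_anndata(adata, layer=layer, **setup_kwargs(option))
    model = scvi.model.SCVI(adata)
    model.train()
    return model, model.get_latent_representation()


print(setup_kwargs("sample"))
print(setup_kwargs("sample_plus_tissue"))
```

Running `fit_scvi` once per option on a copy of the same AnnData object and plotting a UMAP of each latent space, colored by sample and by tissue, would give a side-by-side comparison of the two.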

Any advice or best practices for this type of integration would be greatly appreciated!

Thanks in advance!

ori-kron-wis · July 31, 2025, 1:08pm

You can try the 2nd option (tissue as a categorical covariate), although I think it will just integrate out the tissue differences as well.

Can you share UMAPs? And why do you say that “the tissue-specific structure seems to be partially lost”?

Can you share more info on your data, such as its size? Could it be a stratification issue? Is there any other information besides donor that could help with batch integration (e.g., are the donors from different studies, or do some overlap)? Perhaps sample is not the best choice of batch key.

dacon06 · August 1, 2025, 8:28am

Hi,

Thanks for your response!

I’d be happy to share my UMAP plots. My dataset consists of 132,706 cells and 33,694 genes. The only metadata I have for each cell is the patient ID and the organ from which the sample was taken.

Please let me know if you need any more details.

Best regards

dacon06 · August 1, 2025, 8:29am

ori-kron-wis · August 1, 2025, 12:09pm

I guess the 2nd image is with the tissue as a covariate.

Regarding the 1st figure, could it be that A02 & A03 are from the same study and A01 is from a different one?

Same goes for D01 & D03 vs. D02?

How do training/validation loss curves look?
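As a rough sketch of what to look at, the snippet below pulls logged metrics out of a trained model and summarizes them. It assumes validation metrics were logged (for example by training with `model.train(check_val_every_n_epoch=1)`) and that `model.history` holds one DataFrame per metric under keys such as "elbo_train" and "elbo_validation". The helper names and the toy numbers are made up for illustration.

```python
def history_series(model, key):
    """Return one logged metric from a trained scvi-tools model as a plain list."""
    # model.history is assumed to map metric names to DataFrames indexed by epoch.
    return model.history[key].iloc[:, 0].tolist()


def summarize_curves(train, validation):
    """Summarize two loss curves of equal length (lower is better)."""
    best = min(range(len(validation)), key=validation.__getitem__)
    return {
        "best_validation_epoch": best,                      # epoch with lowest validation loss
        "epochs_after_best": len(validation) - 1 - best,    # epochs trained past that point
        "final_gap": validation[-1] - train[-1],            # validation minus training at the end
    }


# Toy numbers for illustration only, not results from this dataset.
toy_train = [520.0, 480.0, 465.0, 458.0, 455.0]
toy_validation = [525.0, 487.0, 474.0, 476.0, 479.0]
print(summarize_curves(toy_train, toy_validation))
```

With a real model, the inputs would be `history_series(model, "elbo_train")` and `history_series(model, "elbo_validation")`. A validation curve that bottoms out early while the training curve keeps falling would point to overfitting.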

Have you tried models other than SCVI, such as SysVI or MrVI?
