How to handle batch effects within the query dataset when using scArches + SCVI?

ashenflower · July 28, 2025, 4:32pm

Hello everyone,

I’m new to scArches and currently exploring how it works in combination with SCVI, particularly for integrating new datasets into a reference atlas.

I’m following this tutorial, where a SCVI model is first trained on a reference dataset and then extended using treeArches to incorporate a new query dataset. From what I understand, the entire query dataset is treated as a single new batch during this “surgery” step when adapting the model. Also, it seems that SCVI expects raw count data for both training and mapping steps.
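SCVI models raw counts, so a quick check before calling `setup_anndata` can catch normalized or log-transformed input. The sketch below is a heuristic of my own and is not part of scvi-tools or scArches. `looks_like_raw_counts` is a hypothetical helper that only tests whether the stored values are non-negative integers. If your counts live in a layer rather than `.X`, scvi-tools' `setup_anndata` accepts a `layer=` argument that points at it.

```python
import numpy as np
import scipy.sparse as sp


def looks_like_raw_counts(X) -> bool:
    """Heuristic: True if every stored value is a non-negative integer.

    Normalized or log-transformed data usually contains fractional values,
    which SCVI's count likelihoods are not meant for.
    """
    # For sparse matrices only stored entries matter; implicit zeros are valid counts.
    values = X.data if sp.issparse(X) else np.asarray(X)
    if values.size == 0:
        return True
    # Integer-valued floats (e.g. 3.0) count as integers.
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))
```

For example, a small sparse matrix such as `[[0, 3, 1], [2, 0, 5]]` passes the check, while `np.log1p` of the same matrix does not.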

This leads me to my question:

What if my query dataset contains multiple batches itself (for example, samples from different sources or sequencing runs)?

Should I split the query dataset by batch and integrate each one individually into the reference? Or is there a better approach that allows the model to recognize and correct for the internal batch effects within the query dataset during the surgery step?
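As far as I understand scvi-tools' surgery step (`load_query_data`), the query's batch column (the same column name as the reference `batch_key`) can hold several values. Labels not seen in the reference are appended as new batch categories. A label that happens to match a reference batch name is mapped onto that existing, frozen reference batch instead. The minimal sketch below assumes pandas and uses a hypothetical helper, `prefix_query_batches`. It keeps one label per query sample or run and guards against such name collisions.

```python
import pandas as pd


def prefix_query_batches(query_batches: pd.Series, reference_batches: pd.Series,
                         prefix: str = "query_") -> pd.Series:
    """Hypothetical helper: per-cell query batch labels that cannot clash with the reference.

    Keeps one label per query sample/run (rather than one label for the whole
    query) and prefixes them so none equals a batch name used in the reference.
    """
    labels = prefix + query_batches.astype(str)
    overlap = set(labels) & set(reference_batches.astype(str))
    if overlap:
        raise ValueError(f"Query batch labels collide with reference: {sorted(overlap)}")
    return labels.astype("category")
```

With reference runs `["run1", "run2"]` and query runs `["run1", "run3", "run3"]`, the query column becomes `["query_run1", "query_run3", "query_run3"]`. The query's `run1` therefore becomes its own new batch instead of being merged into the reference's `run1`.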

Any guidance or best practices would be greatly appreciated!

Thanks in advance!

ori-kron-wis · July 31, 2025, 12:14pm

Hey,

I think the tutorial you linked to is a bit old and perhaps confusing: it uses the batch column as “batch_key”, but that column is identical to the “study” column, which is used to separate the query from the reference.
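One way to spot this situation is to count how many distinct batch labels each reference/query split contains. If the query split has only one label, the whole query is mapped as a single new batch. In this sketch the column names `study`, `batch`, and `tech` and their values are illustrative assumptions. `batches_per_split` is a hypothetical helper built on plain pandas.

```python
import pandas as pd


def batches_per_split(obs: pd.DataFrame, batch_key: str, split_key: str) -> pd.Series:
    """Hypothetical helper: number of distinct batch labels in each reference/query split."""
    return obs.groupby(split_key, observed=True)[batch_key].nunique()


obs = pd.DataFrame({
    "study": ["ref", "ref", "query", "query"],
    "batch": ["ref", "ref", "query", "query"],  # batch copied from study: one label per split
    "tech": ["10x", "smartseq2", "celseq", "indrop"],  # a real per-run batch column
})
```

Here `batches_per_split(obs, "batch", "study")` gives one batch per split, so the query would be treated as a single batch. Using `"tech"` as the batch key gives two batches in the query.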

Anyway, you can check this tutorial: Reference mapping with SCVI-Tools (scvi-tools documentation)

In that tutorial, the query dataset consists of several “tech” batches, and they are all mapped together as one query, with each tech keeping its own batch label. Of course, your query data should be close to your reference (in terms of species, tissue, cell types, and so on) for good reference mapping (“surgery”).

However, there are other models beyond SCVI, such as SysVI, that might help if you want to integrate a query that is very different from the reference. It might fit your workflow, although scArches is not yet fully supported for that model.
