Merging data from multiple cohorts and many donors with scVI

Phil_Bradley · September 18, 2021, 5:13pm

Hi there,
First of all, thanks so much for creating this amazing set of tools! I’m a new user, so my apologies if I’ve missed relevant docs that would answer this question.

I’m interested in creating a large “atlas” from multiple (~20) independent cohorts which collectively span ~1000 donors. I have two related questions:

Is there a recommended minimum number of cells per donor, i.e. per batch (I am using the donor as the “batch” identifier)? I understand that there are suggestions to keep the number of cells greater than the number of genes, but I’m not sure if this applies within individual batches as well.
It looks like there are systematic technical differences in expression between the cohorts. Is there a way to include the cohort as an additional “batch” covariate for the model fitting? Does that even make sense, given that each donor has a cohort membership so the model already has freedom to fit those differences on a per-donor basis?
Any other thoughts/suggestions you might have on integrating very many batches would be much appreciated! For example, would it be better to integrate a few big batches and then bring the rest into that latent space using something like scArches? Are there SCVI flags (use_layer_norm, use_batch_norm, etc.) that might be appropriate for handling many cells/batches?
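To make the setup above concrete, the per-donor cell counts and the donor-to-cohort nesting can be tabulated before any model fitting. The following is a minimal sketch in plain Python, not part of scvi-tools: the function name `summarize_donors`, the toy labels, and the `min_cells` threshold are all hypothetical, and the threshold is a placeholder rather than a recommended minimum.

```python
from collections import Counter, defaultdict


def summarize_donors(donors, cohorts, min_cells=0):
    """Tabulate cells per donor and check that each donor sits in one cohort.

    `donors` and `cohorts` are per-cell label sequences of equal length.
    `min_cells` is a hypothetical threshold used only to flag small donors.
    """
    if len(donors) != len(cohorts):
        raise ValueError("donors and cohorts must have one entry per cell")

    cells_per_donor = Counter(donors)

    # Collect every cohort label seen for each donor.
    cohorts_per_donor = defaultdict(set)
    for donor, cohort in zip(donors, cohorts):
        cohorts_per_donor[donor].add(cohort)

    return {
        "cells_per_donor": dict(cells_per_donor),
        # Donors appearing in more than one cohort break the nesting assumption.
        "donors_in_multiple_cohorts": sorted(
            d for d, c in cohorts_per_donor.items() if len(c) > 1
        ),
        "donors_below_min_cells": sorted(
            d for d, n in cells_per_donor.items() if n < min_cells
        ),
    }


# Toy labels for six cells (hypothetical data).
donors = ["d1", "d1", "d1", "d2", "d2", "d3"]
cohorts = ["A", "A", "A", "A", "A", "B"]
summary = summarize_donors(donors, cohorts, min_cells=2)
```

With an AnnData object, the two sequences would typically come from columns of `adata.obs`; the column names depend on how the metadata was stored.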
Thanks in advance for any advice you can offer!
Take care,
Phil

adamgayoso · September 18, 2021, 6:34pm

Hi Phil,

It’s hard to say. What you’re attempting to do is beyond what we have explicitly tested (and very cool!).
We are currently working on an extension related to this post:

Yes, you can pass both the donor and the cohort as keys using categorical_covariate_keys; see https://docs.scvi-tools.org/en/stable/api/reference/scvi.data.setup_anndata.html#scvi.data.setup_anndata
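A rough sketch of one way to read this suggestion: keep the donor as `batch_key`, as in the original setup, and register the cohort through `categorical_covariate_keys`. The helper names `build_setup_kwargs` and `register_and_build_model` and the column names `"donor"` and `"cohort"` are hypothetical. The sketch also assumes a release of scvi-tools where registration is done with `scvi.model.SCVI.setup_anndata`; the page linked above documents `scvi.data.setup_anndata`, which takes the same two arguments, so which call applies depends on the installed version.

```python
def build_setup_kwargs(batch_key, extra_categorical_keys):
    """Assemble keyword arguments for AnnData registration.

    The batch key is dropped from the extra covariates so the same
    column is not registered twice.
    """
    covariates = [k for k in extra_categorical_keys if k != batch_key]
    return {"batch_key": batch_key, "categorical_covariate_keys": covariates}


def register_and_build_model(adata, setup_kwargs):
    """Register `adata` and construct an SCVI model (requires scvi-tools)."""
    import scvi  # imported lazily so the helper above runs without scvi-tools

    # Assumes adata.obs contains the columns named in setup_kwargs.
    scvi.model.SCVI.setup_anndata(adata, **setup_kwargs)
    return scvi.model.SCVI(adata)


# Donor as the batch, cohort as an additional categorical covariate.
setup_kwargs = build_setup_kwargs("donor", ["cohort"])
```

Calling `register_and_build_model(adata, setup_kwargs)` followed by `model.train()` would then fit the model. Whether adding the cohort helps when every donor already belongs to exactly one cohort is the open question raised in this thread, so this is something to try rather than a tested recipe.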

I would be interested to learn a bit more about what you’re trying to do. What you stated are reasonable things to try though. Please feel free to email me (firstlast at berkeley dot edu) if you’d like to schedule a meeting!

Phil_Bradley · September 22, 2021, 5:36pm

Hi Adam,
Thanks for your kind reply and the pointer to that other relevant post. I will learn more about the categorical_covariate_keys argument.
I’m also very happy to chat more about this specific application. I’ll email you.
Take care,
Phil

PS. Sorry for the delay in replying; I closed the tab and missed the notification of your reply.
