scArches with multiple covariates

Hi,

I want to use scArches to query disease samples to a normal reference. My data comes from a range of studies, using different single cell technologies, each including different patient samples.

My reference model looks like this:

scvi.model.SCVI.setup_anndata(
    adata_ref,
    layer="counts",
    batch_key="sample",
    categorical_covariate_keys=["study"],
)

But when I add the query data I get an error:

vae_q = scvi.model.SCVI.load_query_data(
    adata_query,
    dir_path,
)
scArches currently does not support models with extra categorical covariates.

Does this mean I can’t have multiple covariates? Is there any workaround for this?

Many thanks,
Jess

Hi, thank you for your question. Yes, it looks like scArches currently does not support extra categorical covariates. A possible workaround would be to concatenate your "sample" and "study" columns into a single obs column and then use that as the batch key. I’m not sure if there are plans to support extra categorical covariates for scArches in the future.
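As a rough illustration of that workaround, here is a minimal sketch using a toy pandas DataFrame in place of `adata_ref.obs`. The column name `sample_study` and the toy values are assumptions made for the example, not anything scvi-tools requires.

```python
import pandas as pd

# Toy stand-in for adata_ref.obs; the values are made up for illustration.
obs = pd.DataFrame(
    {
        "sample": ["s1", "s1", "s2", "s1"],
        "study": ["studyA", "studyA", "studyA", "studyB"],
    }
)

# Join the two columns into a single categorical column.
# "sample_study" is a hypothetical name chosen for this sketch.
obs["sample_study"] = (
    obs["sample"].astype(str) + "_" + obs["study"].astype(str)
).astype("category")

print(obs["sample_study"].cat.categories.tolist())

# With a real AnnData object, the combined column would be the only batch key,
# with no categorical_covariate_keys:
# scvi.model.SCVI.setup_anndata(adata_ref, layer="counts", batch_key="sample_study")
```

The query AnnData would presumably need a column with the same name before calling `load_query_data`. Note also that if sample IDs are already unique across studies, the combined key carries the same information as "sample" alone.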

Thanks for your help

I think this feature would definitely be useful for future versions of scArches :slight_smile:

Thanks for the feedback! We’ll see if we can update scArches with this feature.


Any update on this?? Would really love a categorical covariate option for my work.

Hi, we have an open PR. We haven’t fully tested it yet. It might be part of the 1.2 release. However, please be aware that we don’t specifically decouple multiple covariates, so I wouldn’t say that the model learns the effect of, e.g., disease. The approach in MrVI is likely better founded.


Thanks for the quick response, and I agree MrVI will be better for my work. However, I still need a way to apply the trained model to data it wasn’t trained on. From my understanding, MrVI is not compatible with scArches and isn’t planned to be. (For my research question, I want to train the model on in vivo data and then benchmark multiple in vitro datasets against the in vivo MrVI model to see which they are most similar to.) Previously, my workflow was scVI → scANVI → scArches (to get label transfer). Any ideas about how to proceed while keeping batch-specific learned effects (cross-species)?

You are right, JAX scArches doesn’t exist currently. It won’t be super hard to implement and it is definitely planned. However, timeline is Q1/2 2025.

Thanks Cane11! I’ve got the GitHub version of scvi-tools installed (the PyPI installation doesn’t have external.mrvi, I found; maybe something else was going on). Besides that, I am loving MrVI (great preprint also!) and am super excited to polish up my metadata to use it to its full abilities on my data. Thanks team, this is an awesome upgrade to the scviverse.

I think what you’re looking for is done in this manuscript from the Treutlein/Theis labs, where they mapped multiple iPSC-derived brain organoid single-cell datasets to the developing human brain: neural_organoid_atlas/NOMS_mapping/03_NOMS_to_HNOCA_mapping.ipynb at main · theislab/neural_organoid_atlas · GitHub