Failure to remove a batch_key/ effect of number of LVs

Marwansha · January 1, 2024, 9:23pm

Hello.
I have a dataset of about 700k cells from two different conditions: unstimulated and stimulated with a virus. When I set the batch_key to “condition”, scVI was unable to remove the batch effect: in the UMAP plot, the cells cluster by stimulation condition rather than by cell type.

What can I do to address this issue?
Could changing the number of HVGs used affect this?

Does the number of latent variables and layers scale with the number of cells? Should I specify more LVs and layers if I have more cells?
I also saw that the results were a bit different when I increased n_epochs. Ideally, should I use many more epochs since I have a lot of cells?
Thanks

Marwansha · January 1, 2024, 9:26pm

Here is a UMAP colored by cell type and by condition, whose effect I wanted to remove (the normal PCA/kNN/UMAP workflow yields a similar UMAP).

Marwansha · January 9, 2024, 9:39am

Increasing the number of epochs until convergence (using early stopping) still did not remove the batch effect of the condition.

Valentine_Svensson · January 10, 2024, 5:42pm

Hi Marwansha,

Are all your 700k cells from just two samples / batches that are confounded with your virus treatment?

Integration of samples with scVI tends to work better if you include all samples / batches. For example, if you have 8 untreated samples and 8 treated samples, for a total of 16 samples but with 2 conditions, then integration typically works better if you give ‘sample’ as batch_key than if you give ‘condition’ as batch_key.
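A quick way to check how samples and conditions relate is to count distinct samples per condition in the cell metadata. The sketch below uses a toy pandas DataFrame standing in for adata.obs; the column names 'sample' and 'condition' and all values are assumptions for illustration, not taken from the actual dataset.

```python
import pandas as pd

# Toy stand-in for adata.obs (hypothetical column names and values).
obs = pd.DataFrame({
    "sample": ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"],
    "condition": ["untreated"] * 4 + ["treated"] * 4,
})

# Number of cells for every sample / condition combination.
counts = pd.crosstab(obs["sample"], obs["condition"])
print(counts)

# Number of distinct samples available within each condition.
# If this is 1 everywhere, 'sample' and 'condition' describe the same
# grouping and cannot be told apart by the model.
samples_per_condition = obs.groupby("condition")["sample"].nunique()
print(samples_per_condition)
```

If each condition contains several samples, passing the sample column as batch_key gives the model more groups to align than passing the two-level condition column.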

With only 700k cells, I think the default of 10 latent variables should be sufficient. When there are many more cells, increasing LVs can help, but not necessarily for batch integration, which tends to benefit from a small ‘bottleneck’.

Hope this helps!
/Valentine

Marwansha · January 10, 2024, 8:03pm

Thank you a lot for your response.

My library design was quite complex, with each library containing samples from different individuals under different conditions (within the same library, one individual was stimulated with a virus while another was not), followed by demultiplexing of the cell identities using genotyping data.

The visualization shows distinct clustering and separation of each cell type by condition, with no library batch effect. I was trying to remove this separation using scVI.

I tried a variety of model settings. For instance, I tried different numbers of latent variables (10, 20, 30, 50, 10), numbers of layers (1, 2, 4, 10), dispersion settings (gene, gene-batch), and gene likelihood models (zinb and nb). All models reached convergence (using the early stopping option).

*Do you have any suggestions for how to get this to work? Should I try adding the library together with the condition, perhaps as categorical keys (batches)?*

Valentine_Svensson · February 9, 2024, 5:44am

Hi Marwansha,

Since you want to account for (‘remove’) variation due to ‘condition’, the ‘condition’ indicator should be part of your ‘batch’ in the batch correction. Did you try this?

For example, if you have one column ‘individual’ and another column ‘condition’ in your adata.obs, you can make a new column adata.obs['individual_condition'] = adata.obs['individual'] + '-' + adata.obs['condition'], then use 'individual_condition' as batch_key when setting up the model.
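A minimal sketch of building such a combined column is shown below. It uses a toy DataFrame in place of adata.obs, and the helper name add_combined_batch is hypothetical. Casting to str first is a precaution, since adding pandas categorical columns with `+` raises an error. The scvi-tools calls are left as comments because they need a real AnnData object and an installed scvi-tools.

```python
import pandas as pd

def add_combined_batch(obs, keys, new_key, sep="-"):
    """Join several obs columns into one categorical column (hypothetical helper)."""
    combined = obs[keys[0]].astype(str)
    for key in keys[1:]:
        combined = combined + sep + obs[key].astype(str)
    obs[new_key] = combined.astype("category")
    return obs

# Toy stand-in for adata.obs; values are made up for illustration.
obs = pd.DataFrame({
    "individual": pd.Categorical(["ind1", "ind1", "ind2", "ind2"]),
    "condition": pd.Categorical(["unstim", "stim", "unstim", "stim"]),
})
obs = add_combined_batch(obs, ["individual", "condition"], "individual_condition")
print(obs["individual_condition"].tolist())

# With a real AnnData object, the new column would then be used roughly like this:
# import scvi
# scvi.model.SCVI.setup_anndata(adata, batch_key="individual_condition")
# model = scvi.model.SCVI(adata, n_latent=10)
# model.train()
# adata.obsm["X_scVI"] = model.get_latent_representation()
```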

The choice of how to group cells with the batch_key argument will have a much larger effect than latent dimensionality, number of layers, or other technical settings.

I would be interested in seeing how your UMAP changes with different choices for what you put in the batch_key!
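Besides looking at UMAPs, different batch_key choices can be compared with a simple number. The sketch below is one possible ad hoc score, not something from scvi-tools: for each cell it takes the k nearest neighbors in a latent space and reports the average fraction that carry a different label. The function name and the random toy data are assumptions, and the dense distance matrix only suits a subsample of a few thousand cells, not all 700k.

```python
import numpy as np

def neighbor_label_mixing(latent, labels, k=15):
    """Mean fraction of each cell's k nearest neighbors that have a different label."""
    latent = np.asarray(latent, dtype=float)
    labels = np.asarray(labels)
    # Dense pairwise squared Euclidean distances (small inputs only).
    sq_norms = np.sum(latent ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * latent @ latent.T
    np.fill_diagonal(dist2, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(dist2, axis=1)[:, :k]
    differs = labels[neighbors] != labels[:, None]
    return float(differs.mean())

# Toy example 1: two conditions that sit far apart in latent space.
group = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
separated = np.vstack([group, group + 100.0])
separated_labels = np.array(["unstim"] * 5 + ["stim"] * 5)
score_separated = neighbor_label_mixing(separated, separated_labels, k=3)

# Toy example 2: labels unrelated to position in latent space.
rng = np.random.default_rng(0)
mixed = rng.normal(size=(200, 10))
mixed_labels = np.tile(["unstim", "stim"], 100)
score_mixed = neighbor_label_mixing(mixed, mixed_labels, k=15)

print(score_separated, score_mixed)
```

With two equally sized conditions, a score near 0 means cells sit almost only next to cells of their own condition, while a score near 0.5 means the conditions are well mixed. On real data, latent would be something like a subsample of model.get_latent_representation() and labels the matching condition column.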

Hope this helps,
/Valentine

Marwansha · February 9, 2024, 4:30pm

Hi Valentine,

Just to let you know, I tried exactly what you suggested, and I think scVI / scANVI failed to remove this strong condition effect (which is mainly a biological effect: stimulated vs. unstimulated cells).

I believe this is specific to the tool, since after I switched to Harmony the integration worked as expected.

I would be happy to try again with any specific suggestions for the model parameters that you can offer.

Thanks
Marwan
