Nested batch effects with scvi

LinearParadox · November 22, 2023, 12:12am

Hi all,

I’m not exactly sure how to encode the batch effects here. I started by including all the levels, but I’m not sure this is ideal. This is a 10x FFPE experiment with 15 samples. The 10x FFPE protocol splits these into 4 pools, and the pools were sequenced in 2 batches. The setup is something like:

Batch 1 has pools 1 and 2
Batch 2 has pools 3 and 4. One sample in batch 2 was also repeated in batch 1.

I currently have it as:

import scvi

# adata: AnnData object with counts stored in adata.layers["counts"]
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",
    categorical_covariate_keys=["pool", "batch", "samples"],
    continuous_covariate_keys=["pct_counts_mt", "S_score", "G2M_score"],
)

I’m not sure this is ideal, however, since some of the categorical variables are strongly correlated with each other. Thank you for your help!
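To see how the categorical covariates overlap before deciding what to pass to the model, a cross-tabulation of the metadata can help. The following is a minimal sketch, assuming pandas is available; the `obs` table is a made-up stand-in for `adata.obs` (the sample names and their pool assignments are invented, since the thread does not list them), and `is_nested` is a hypothetical helper, not part of scvi-tools.

```python
import pandas as pd

# Hypothetical per-cell metadata standing in for adata.obs.
# Sample names and pool assignments are invented for illustration;
# "s1" plays the role of the sample repeated in both batches.
obs = pd.DataFrame(
    {
        "samples": ["s1", "s1", "s2", "s3", "s4", "s4", "s1", "s5"],
        "pool": ["pool1", "pool1", "pool1", "pool2",
                 "pool3", "pool3", "pool4", "pool4"],
        "batch": ["batch1", "batch1", "batch1", "batch1",
                  "batch2", "batch2", "batch2", "batch2"],
    }
)


def is_nested(obs, inner, outer):
    """Return True if every level of `inner` occurs in exactly one level of `outer`."""
    return bool((obs.groupby(inner)[outer].nunique() == 1).all())


# Pools sit entirely inside one batch each in this toy table.
pool_in_batch = is_nested(obs, "pool", "batch")
# The repeated sample appears in two pools, so samples are not nested in pools.
sample_in_pool = is_nested(obs, "samples", "pool")

# Cell counts per pool/batch combination; empty cells show the nesting.
print(pd.crosstab(obs["pool"], obs["batch"]))
```

If `pool_in_batch` is true for the real data, the pool label already determines the batch, so the `batch` column carries no information beyond `pool`.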

cane11 · November 28, 2023, 6:25pm

Correlated categorical covariates are not a problem, either in my experience or in the code. scVI corrects for those covariates; it doesn’t try to learn the effect of each covariate separately. However, in my experience, adding continuous covariates can make the latent space significantly worse, because more information ends up encoded through those continuous covariates than through the actual latent space.

LinearParadox · November 29, 2023, 10:11pm

Thank you for your advice. Do you usually not include any continuous covariates? Is there a way to quantify how much information I’m removing from the latent space between models? For example, is there a way to estimate how much I would lose if I included cell cycle scores as continuous covariates, or %mt?

cane11 · November 29, 2023, 10:58pm

I generally don’t include them. I run scVI and check the output; if there is a strong gradient along one of these variables, I add them and rerun training. The best way to check that structure is well preserved is prior knowledge (such as cell types or a developmental trajectory). Quantifying it is possible with scib-metrics, although those metrics are heavily focused on cell types.
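A rough numeric stand-in for "check for a strong gradient" is to correlate each latent dimension with the covariate in question. The sketch below is not the procedure described above (which reads as a visual check) and is not part of scvi-tools; `covariate_gradient` is a hypothetical helper, and the data are synthetic. With a trained model, the latent matrix would come from `model.get_latent_representation()` and the covariate from a column such as `adata.obs["S_score"]`.

```python
import numpy as np


def covariate_gradient(latent, covariate):
    """Absolute Pearson correlation of each latent dimension with a covariate.

    Assumes no latent dimension (and not the covariate) is constant.
    """
    latent = np.asarray(latent, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    z = latent - latent.mean(axis=0)
    c = covariate - covariate.mean()
    denom = np.sqrt((z ** 2).sum(axis=0) * (c ** 2).sum())
    return np.abs(z.T @ c) / denom


# Synthetic stand-in data (not from the experiment): 500 cells, 10 latent
# dimensions, with dimension 0 deliberately driven by the covariate.
rng = np.random.default_rng(0)
s_score = rng.normal(size=500)
latent = rng.normal(size=(500, 10))
latent[:, 0] = 2.0 * s_score + 0.1 * rng.normal(size=500)

corr = covariate_gradient(latent, s_score)
strongest = int(np.argmax(corr))
print(f"strongest dimension: {strongest}, |r| = {corr[strongest]:.2f}")
```

A dimension with a high correlation suggests the covariate is still shaping the latent space. This only picks up linear relationships, so it complements the check against known cell types or trajectories and does not replace it.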
