Differential expression with scvi - batch correction?

Almx · June 13, 2024, 2:53pm

Hello,
Thanks a lot for this great tool.

I’m running the following analysis:

control = sc.read_h5ad(control_dir) #When you want to load after SOLO you need to use h5ad load instead of h5
control.var_names_make_unique()

ove = sc.read_h5ad(ove_dir)
ove.var_names_make_unique()

healthy = sc.read_h5ad(healthy_dir)
healthy.var_names_make_unique()

adata = control.concatenate(ove, healthy, batch_key="Sample", batch_categories=["control", "ove", "healthy"])
print(adata.obs['Sample'])


# mitochondrial genes
adata.var["mt"] = adata.var_names.str.startswith("mt-")
# ribosomal genes
adata.var["ribo"] = adata.var_names.str.startswith(("rps", "rps"))
# hemoglobin genes.
adata.var["hb"] = adata.var_names.str.contains(("^hb[^(p)]"))

sc.pp.calculate_qc_metrics(
    adata, qc_vars=["mt", "ribo", "hb"], inplace=True, percent_top=[20], log1p=True
)
#normalize data that can be used for other scanpy functions, etc.
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
adata.raw = adata

sc.pp.highly_variable_genes(adata, n_top_genes=30454, subset = True, layer = 'counts', flavor = "seurat_v3", batch_key="Sample") 
#Here I use all genes, so I can further perform DEGs on all genes

scvi.model.SCVI.setup_anndata(adata, layer='counts',
                            categorical_covariate_keys=["Sample"],
                            continuous_covariate_keys=['pct_counts_mt', 'pct_counts_ribo'])

model = scvi.model.SCVI(adata)

model.train(check_val_every_n_epoch=5, max_epochs=300, early_stopping=True, early_stopping_patience=20, early_stopping_monitor='elbo_validation')

latent = model.get_latent_representation() #this is what you will use to cluster now instead of PCs like normal
latent.shape
adata.obsm['X_scVI'] = latent
adata.layers['scvi_normalized'] = model.get_normalized_expression(library_size = 1e4)

sc.pp.neighbors(adata, use_rep = 'X_scVI')
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution = 0.4)


genes = model.differential_expression(idx1 = [(adata.obs.Sample == 'control') & (adata.obs.leiden == '0')],
                             idx2 = [(adata.obs.Sample == 'healthy') & (adata.obs.leiden == '0')], **batch_correction=True**)

I’m not completely sure if I should be using batch_correction=True when performing differential_expression() or not? I would be really happy to hear your opinion on this?

Thanks a lot!

cane11 · June 19, 2024, 3:35am

Hi, in this case no you don’t want to use it as it seems that you want to compare healthy and diseased cells and this is the same key as provided to scVI, so by doing batch correction you will mask the differential expression between both samples. If you have multiple healthy and diseased samples you can provide healthy sample IDs to batchid1 and all diseased sample IDs to batchid2.

Topic		Replies	Views
Differential Expression and Batch Correction scvi-tools scvi	1	171	February 20, 2025
Differential expression analysis scvi-tools	4	772	January 5, 2025
Question about differential expression and the batch_correction usage scvi-tools diff-exp , scvi	0	174	June 7, 2024
scvi.model.SCVI for DE gene anlaysis scvi-tools	1	700	May 8, 2022
Insufficient batch correction for certain cell-types scvi-tools integration , scvi	8	427	May 15, 2024

Differential expression with scvi - batch correction?

Related topics