Hello,
Dataset
I have a scRNASeq dataset that was based on the following experimental design:
timepoint | Nsamples |
---|---|
t0 | 5 |
t1 | 3 |
t2 | 4 |
Aim
Correct for interindividual variability across timepoints and study gene expression for each cell-type between timepoints.
Batch correction
In order to remove interindividual variability, first I created two columns:
- One column named Replicate containing sample IDs.
- Second column denoting the timepoint t1, t2 and t3.
then I performed batch correction as following:
- Select 4000 highly variable genes
sc.pp.highly_variable_genes(adata, n_top_genes=4000, subset = True, layer = 'soupX_counts', flavor = "seurat_v3", batch_key="Replicate")
soupX_counts
were provided by the SoupX
package duing the QC steps to remove any possible mRNA contamination.
- Setup SCVI model
scvi.model.SCVI.setup_anndata(adata, layer = "soupX_counts",
categorical_covariate_keys=["Replicate"],
continuous_covariate_keys=['pct_counts_mt', 'total_counts'])
I tried tweaking the parameters (as suggested in other posts on batch correction) and the following provided the best results for me:
model = scvi.model.SCVI(
adata,
n_layers=2,
n_hidden=200,
n_latent=20,
gene_likelihood='zinb',
dispersion="gene"
)
- Define training size
if 0.1 * adata.n_obs < 20000:
train_size = 0.9
else:
train_size = 1-(20000/adata.n_obs)
- Train SCVI model
model.train(
accelerator="gpu",
devices=1,
early_stopping=True,
train_size=train_size,
early_stopping_patience=400,
max_epochs=10000,
batch_size=1024,
limit_train_batches=20
)
Results
Despite the batch correction step, still there seems to be some batch effect for certain cell-types especially classical monocytes.
In the figure above cells from timepoint t0 (PRE) were selected and UMAP was performed with scanpy
. Cell-type annotation was performed with CellTypist
.
Questions
- Do you think the observed batch effect appears strong enough to affect differential expression?
- Did the batch correction get affected negatively by the unbalanced dataset?
- Do you have any suggestion to improve the batch correction process?
Thank you in advance.