Integration with or without covariate

Hello,

I just did two runs of scVI, with and without covariates. Below is the code I used.

Without covariates:

import scanpy as sc
import scvi

scvi.model.SCVI.setup_anndata(sco, layer="raw_counts")
norm = scvi.model.SCVI(sco)
norm.train()

sco.obsm["norm_scVI"] = norm.get_latent_representation()
sco.layers["scvi_normalized"] = norm.get_normalized_expression()

sc.pp.scale(sco, zero_center=False, layer="scvi_normalized")

And with covariates:

scvi.model.SCVI.setup_anndata(sco, layer="raw_counts", categorical_covariate_keys=["infection", "timepoint"])
corr = scvi.model.SCVI(sco)
corr.train()

sco.obsm["integrated_scVI"] = corr.get_latent_representation()
sco.layers["scvi_integrated"] = corr.get_normalized_expression(library_size=1e4)

sc.pp.scale(sco, zero_center=False, layer="scvi_integrated")

Of course, the results are different, and so are the UMAPs I can generate from them.

I just want to know which one you think is the better approximation of reality, and thus which one I should use for further analysis.

Best regards

Lionel

Hi,

I'm not sure what you mean by "approximation of reality" or what your further analysis is, but the second option is the way to go if you want to correct the batch effect caused by the categorical covariates ["infection", "timepoint"]. We generally pass a batch_key parameter to setup_anndata, which is missing in both of your runs, so in the second case the categorical covariates play the same role a batch key would.

So the second option corrects the batch effect due to the covariates, and that is the main difference between the two runs as I see it.
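
In case it helps, here is a minimal sketch of how a batch_key could be set up. batch_key takes a single column of .obs, so this combines the two covariates into one label per cell. The helper make_batch_column, the column name "batch" and the toy values are only illustrative assumptions, not something from your data, and whether infection and timepoint should be treated as batch at all depends on your experimental design.

```python
from typing import Sequence

import pandas as pd


def make_batch_column(obs: pd.DataFrame, keys: Sequence[str], sep: str = "_") -> pd.Series:
    """Combine several categorical obs columns into one label per cell.

    Hypothetical helper for illustration; not part of scvi-tools.
    """
    combined = obs[keys[0]].astype(str)
    for key in keys[1:]:
        combined = combined + sep + obs[key].astype(str)
    return combined.astype("category")


# Toy stand-in for sco.obs; the values are made up.
obs = pd.DataFrame(
    {
        "infection": ["mock", "mock", "infected", "infected"],
        "timepoint": ["6h", "24h", "6h", "24h"],
    },
    index=["cell1", "cell2", "cell3", "cell4"],
)
obs["batch"] = make_batch_column(obs, ["infection", "timepoint"])
print(obs["batch"].tolist())

# With the real object this would become (not run here, needs scvi-tools):
#   sco.obs["batch"] = make_batch_column(sco.obs, ["infection", "timepoint"])
#   scvi.model.SCVI.setup_anndata(sco, layer="raw_counts", batch_key="batch")
```

If only one of the two columns is a technical batch, it may make more sense to pass that one as batch_key and leave the other out.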