Hello,
I am attempting to integrate datasets of the same tissue from different species. Some of the datasets are single-cell and some are single-nucleus. I thought that scVI would work well for this task. However, as seen in the UMAPs below, there is almost no integration of the different datasets, even those from the same species. Based on the consolidated cell-type labels from the original datasets, shared cell types occupy the same region of the UMAP, but cells of the same type from different datasets do not mix (bottom UMAP). Further below is the code I used to make the UMAPs, where I used Sample and Method (single-cell vs. single-nucleus) as categorical covariates. I have also tried other model setups, such as:
scvi.model.SCVI.setup_anndata(adata, layer="counts",
                              categorical_covariate_keys=["Dataset", "Method", "Organism", "Sample"],
                              continuous_covariate_keys=["percent.mito", "percent.ribo", "nCount_RNA"])
and got similar results. I am using raw counts as input and all datasets have been subjected to the same quality control.
I have also tried scANVI using the consolidated cell-type labels from the original datasets and still did not achieve any overlap. Lastly, I have tried integrating different combinations of the datasets and could not even get cells from the same species and same method to integrate.
Somewhat surprisingly, I was able to achieve pretty good integration by scaling the variable genes within each dataset and then running Harmony. However, I would prefer to use scVI and am confused as to why it is not working well here. If anyone has insight into how I might improve my integration, it would be much appreciated. Thank you for your consideration!
import scanpy as sc
import scvi

# Select 3,000 highly variable genes across datasets and subset adata to them
sc.pp.highly_variable_genes(adata, n_top_genes=3000, subset=True, layer="counts",
                            flavor="seurat_v3", batch_key="Dataset")
# Register Sample and Method as categorical covariates
scvi.model.SCVI.setup_anndata(adata, layer="counts",
                              categorical_covariate_keys=["Sample", "Method"],
                              continuous_covariate_keys=["percent.mito", "percent.ribo"])
model = scvi.model.SCVI(adata)
# Train the model
model.train(early_stopping=True)
# Save the model
model.save("./scvi_model", overwrite=True)
# Get the latent representation
latent = model.get_latent_representation()
adata.obsm["X_scVI"] = latent
# Build the neighbor graph on the scVI latent space, then compute UMAP and Leiden clusters
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=0.5)
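
For context on the Harmony comparison mentioned above, here is a minimal sketch of what "scaling variable genes within each dataset" could look like. It is an illustration under assumptions, not the exact code that was run: `scale_within_dataset` is a hypothetical helper, the input is assumed to be a dense, log-normalized cells-by-genes matrix restricted to the variable genes, and the PCA and Harmony steps that would follow are not shown.

```python
import numpy as np


def scale_within_dataset(X, dataset_labels, max_value=10.0):
    """Z-score each gene separately within each dataset (hypothetical helper).

    X: dense (cells x genes) array of log-normalized expression (assumed input).
    dataset_labels: one dataset label per cell, in the same row order as X.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(dataset_labels)
    scaled = np.zeros_like(X)
    for dataset in np.unique(labels):
        rows = labels == dataset
        block = X[rows]
        mean = block.mean(axis=0)
        std = block.std(axis=0)
        std[std == 0] = 1.0  # genes constant within a dataset end up as zeros
        # Center and scale per gene, then clip extreme values
        scaled[rows] = np.clip((block - mean) / std, -max_value, max_value)
    return scaled


# Toy data: two datasets measuring the same 3 genes on very different scales
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.5, size=(50, 3)),
               rng.normal(5.0, 2.0, size=(80, 3))])
labels = np.array(["A"] * 50 + ["B"] * 80)
X_scaled = scale_within_dataset(X, labels)
```

After this step every gene has zero mean and unit variance within each dataset, so dataset-level shifts in expression are removed before any dimensionality reduction. The scVI code above instead works on the raw counts directly.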