Hi everyone,
I am trying to apply the approach suggested in the Nature Methods paper "Benchmarking atlas-level data integration in single-cell genomics" and in the scverse/scvi-tools GitHub issue #698, "Semi-supervised integration with scANVI", using labels as metadata to guide the integration. I find the notebook from the paper a bit clunky, and it relies on custom wrappers or APIs that no longer seem to be present in the latest version.
I was wondering whether running the standard preprocessing followed by the code below would work (given that no cells actually have the "Unknown" label in adata.obs["myLabel"]).
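import scvi

# Register the data with scANVI; in recent scvi-tools releases the
# unlabeled category is declared here rather than in the constructor
scvi.model.SCANVI.setup_anndata(
    adata,
    layer="counts",
    categorical_covariate_keys=["batch", "individual"],
    labels_key="myLabel",
    unlabeled_category="Unknown",
)
lvae = scvi.model.SCANVI(adata, n_latent=30, n_layers=2)
lvae.train(max_epochs=100, n_samples_per_label=100)
# Store the latent representation for downstream neighbors/UMAP
latentSCANVI = lvae.get_latent_representation()
adata.obsm["X_scANVI"] = latentSCANVI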
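Since this setup relies on no cell actually being labelled "Unknown", it may be worth checking the label column before training. Below is a minimal sketch of such a check. The helper name `summarize_labels` and the toy labels are hypothetical, and it assumes missing values show up as None or NaN; with real data one would pass something like adata.obs["myLabel"].tolist().

```python
from collections import Counter


def summarize_labels(labels, unlabeled_category="Unknown"):
    """Count labelled, unlabelled, and missing entries in a label column.

    Hypothetical helper for illustration; not part of scvi-tools.
    """
    labels = list(labels)
    # Treat None and NaN as missing (NaN is the only value where x != x)
    n_missing = sum(1 for x in labels if x is None or x != x)
    present = [x for x in labels if not (x is None or x != x)]
    n_unlabeled = sum(1 for x in present if x == unlabeled_category)
    # Per-label counts, excluding the unlabeled category
    cells_per_label = dict(Counter(x for x in present if x != unlabeled_category))
    return {
        "n_cells": len(labels),
        "n_missing": n_missing,
        "n_unlabeled": n_unlabeled,
        "n_labeled": len(present) - n_unlabeled,
        "cells_per_label": cells_per_label,
    }


# Toy stand-in for adata.obs["myLabel"].tolist()
toy_labels = ["T cell", "B cell", "T cell", "Unknown", None, float("nan")]
summary = summarize_labels(toy_labels)
print(summary)
```

If n_unlabeled and n_missing are both zero, every cell carries a real label, which is the fully supervised case described above. The per-label counts also show whether n_samples_per_label=100 exceeds the size of the smallest label group.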
This kind of label-guided integration could be a good solution for many integration problems. It would be great to clarify and streamline it a bit for the community.
Thank you!
Davide