Semi-supervised integration update

Hi everyone,

I am trying to apply the approach suggested in Benchmarking atlas-level data integration in single-cell genomics | Nature Methods and Semi-supervised integration with scANVI · Issue #698 · scverse/scvi-tools · GitHub, using labels as metadata to guide the integration. The notebook from the paper is a bit clunky for me and uses either custom wrappers or APIs that no longer seem to be present in the latest version.

I was wondering whether running the standard preprocessing followed by the code below would work (given that no cells actually have the "Unknown" label in adata.obs["myLabel"]).

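import scvi

# Register the raw counts, the batch covariates, and the cell labels.
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",
    categorical_covariate_keys=["batch", "individual"],
    labels_key="myLabel",
)

# "Unknown" is the category that marks unlabeled cells.
lvae = scvi.model.SCANVI(adata, "Unknown", n_latent=30, n_layers=2)
lvae.train(max_epochs=100, n_samples_per_label=100)

# Store the scANVI latent representation for downstream use.
adata.obsm["X_scANVI"] = lvae.get_latent_representation()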
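
For comparison, here is a minimal sketch of the same workflow with the setup call made on SCANVI itself. It assumes a recent scvi-tools release in which SCANVI.setup_anndata accepts labels_key and unlabeled_category; older releases used a different call pattern, so check against the installed version. The helper names summarize_labels and run_scanvi are hypothetical and only for illustration. The first helper is a quick sanity check of how many cells are labeled versus "Unknown" before training.

```python
from collections import Counter


def summarize_labels(labels, unlabeled_category="Unknown"):
    """Count labeled and unlabeled cells in an iterable of label values."""
    counts = Counter(labels)
    n_unlabeled = counts.get(unlabeled_category, 0)
    n_labeled = sum(counts.values()) - n_unlabeled
    return {
        "n_labeled": n_labeled,
        "n_unlabeled": n_unlabeled,
        "per_label": dict(counts),
    }


def run_scanvi(adata, labels_key="myLabel", unlabeled_category="Unknown"):
    """Hypothetical wrapper: set up, train, and store the scANVI latent space.

    Assumption: the installed scvi-tools exposes SCANVI.setup_anndata with
    labels_key and unlabeled_category arguments.
    """
    import scvi  # imported lazily so summarize_labels works without scvi-tools

    # Report the label balance before training.
    print(summarize_labels(adata.obs[labels_key], unlabeled_category))

    scvi.model.SCANVI.setup_anndata(
        adata,
        layer="counts",
        labels_key=labels_key,
        unlabeled_category=unlabeled_category,
        categorical_covariate_keys=["batch", "individual"],
    )
    model = scvi.model.SCANVI(adata, n_latent=30, n_layers=2)
    model.train(max_epochs=100, n_samples_per_label=100)
    adata.obsm["X_scANVI"] = model.get_latent_representation()
    return model
```

Calling run_scanvi(adata) would then leave the embedding in adata.obsm["X_scANVI"], as in the snippet above.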

This kind of semi-supervised integration could be the way to go for many integration problems. It would be great to clarify and streamline it a bit for the community :slightly_smiling_face:

Thank you!
Davide

Hi Davide,

Perhaps this is what you’re looking for?

Yes, thanks a lot! Definitely my bad :slight_smile: