@ori-kron-wis
Thank you!
My goal was to improve the notoriously flawed Xenium annotations and “dusty” UMAP by using a single-cell RNA-seq reference: annotate the reference first, then map the Xenium cells onto it and annotate them via scANVI or a nearest-neighbor approach. The aim is cleaner cluster separation (inherited from scRNA-seq) and more valid, finer-grained cell type and cell state annotations. Here is a summary of the two approaches I have tried so far:
- For the first approach I used scANVI reference mapping, with a shared embedding space for Xenium and scRNA-seq (2000 HVGs, subset to the shared genes).
import scanpy as sc
import scvi

# Reference: initialize scANVI from the pretrained scVI model and train it
vae_ref = scvi.model.SCVI.load(X…"scvi_model_scseq", adata=adata_ref)
scanvi_ref = scvi.model.SCANVI.from_scvi_model(vae_ref, labels_key="celltype", unlabeled_category="Unknown")
scanvi_ref.train(accelerator="gpu", max_epochs=50, n_samples_per_label=100)
# Take the latent space from the scANVI model, not from the scVI model it was initialized from
adata_ref.obsm["X_scANVI"] = scanvi_ref.get_latent_representation()
sc.pp.neighbors(adata_ref, use_rep="X_scANVI")
sc.tl.leiden(adata_ref)
sc.tl.umap(adata_ref)
# Query: map the Xenium cells onto the trained scANVI reference
scvi.model.SCANVI.prepare_query_anndata(adata_query, X…"scanvi_model_scseq")
scanvi_query = scvi.model.SCANVI.load_query_data(adata_query, X…"scanvi_model_scseq")
scanvi_query.train(max_epochs=100, plan_kwargs={"weight_decay": 0.0})
adata_query.obsm["X_scANVI"] = scanvi_query.get_latent_representation()
adata_query.obs["celltype_scanvi"] = scanvi_query.predict()
# Joint embedding of query and reference for visualization
adata_full = adata_query.concatenate(adata_ref)
adata_full.obsm["X_scANVI"] = scanvi_query.get_latent_representation(adata_full)
sc.pp.neighbors(adata_full, use_rep="X_scANVI")
sc.tl.umap(adata_full)
sc.tl.leiden(adata_full)
- For the second approach, I used the integrated scRNA-seq data as a fixed embedding and mapped Xenium cells to their nearest reference neighbors. This worked quite well and the marker expression is sensible, also when I plot control_celltype_annotations (annotations derived solely from the integrated Xenium data and DGE) on the shared UMAP. However, around 60% of “close contact phenotypes”, such as dendritic cells or CD4 subtypes that sit very close to tumor cells, are in my opinion mistakenly labeled as tumor cells with this approach. I suspect Xenium transcript bleed made them map to the tumor cluster, even though I already used proseg to refine the Xenium segmentation.
# Select HVGs on the reference, then restrict both datasets to the shared genes
sc.pp.highly_variable_genes(adata_ref, flavor="seurat_v3", n_top_genes=2000, layer="counts", batch_key="biopsy_sc", subset=True)
shared_genes = adata_ref.var_names.intersection(adata_xenium.var_names)
adata_xenium = adata_xenium[:, shared_genes].copy()
adata_ref = adata_ref[:, shared_genes].copy()
# Train scVI on the scRNA-seq reference
scvi.model.SCVI.setup_anndata(adata_ref, layer="counts", batch_key="biopsy_sc")
vae = scvi.model.SCVI(adata_ref, gene_likelihood="nb", n_layers=2, n_latent=30)
vae.train(accelerator="gpu", max_epochs=50, early_stopping=True, early_stopping_patience=10)
adata_ref.obsm["scVI"] = vae.get_latent_representation()
adata_ref.layers["scvi_normalized"] = vae.get_normalized_expression(library_size=1e4)
sc.pp.neighbors(adata_ref, use_rep="scVI")
sc.tl.umap(adata_ref)
sc.tl.leiden(adata_ref, resolution=0.5)
# Map the Xenium cells onto the saved reference model
vae = scvi.model.SCVI.load(…scvi_model_scseq, adata=adata_ref)
scvi.model.SCVI.prepare_query_anndata(adata_xenium, vae)
vae_query = scvi.model.SCVI.load_query_data(adata_xenium, vae)
vae_query.train(accelerator="gpu", max_epochs=50, plan_kwargs={"weight_decay": 0.0})
adata_xenium.obsm["scVI"] = vae_query.get_latent_representation()
# Fit UMAP on the reference latent space only, then project the Xenium cells into it
import umap
umap_model = umap.UMAP(n_neighbors=15, min_dist=0.5, metric="euclidean")
umap_model.fit(adata_ref.obsm["scVI"])
adata_xenium.obsm["X_umap"] = umap_model.transform(adata_xenium.obsm["scVI"])
Then I ran the KNN annotation transfer from the single-cell annotations to the Xenium cells.
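The KNN transfer step could look roughly like the minimal sketch below (not necessarily the exact code used). It assumes the reference and Xenium cells share a latent space such as `obsm["scVI"]`, takes a majority vote among the nearest reference cells, and reports the winning vote fraction as a confidence score. The function name `knn_transfer_labels`, the parameters `min_confidence` and `chunk_size`, and the synthetic arrays are hypothetical and only there for illustration. It uses brute-force distances in plain NumPy; for a full Xenium dataset an approximate nearest-neighbor index would probably be a better fit.

```python
import numpy as np


def knn_transfer_labels(ref_latent, ref_labels, query_latent,
                        n_neighbors=15, min_confidence=0.5, chunk_size=1000):
    """Majority-vote label transfer from reference to query cells.

    Returns (labels, confidence). Confidence is the fraction of the
    n_neighbors nearest reference cells that carry the winning label;
    cells below min_confidence are labeled "Unknown".
    """
    ref_latent = np.asarray(ref_latent, dtype=float)
    query_latent = np.asarray(query_latent, dtype=float)
    classes, ref_codes = np.unique(np.asarray(ref_labels), return_inverse=True)
    ref_sq = (ref_latent ** 2).sum(axis=1)

    labels = np.empty(len(query_latent), dtype=object)
    confidence = np.empty(len(query_latent))

    # Process the query in chunks so the distance matrix stays small
    for start in range(0, len(query_latent), chunk_size):
        q = query_latent[start:start + chunk_size]
        # Squared Euclidean distances, shape (chunk, n_reference)
        d2 = (q ** 2).sum(axis=1)[:, None] - 2.0 * q @ ref_latent.T + ref_sq[None, :]
        # Indices of the n_neighbors closest reference cells (unordered)
        nn = np.argpartition(d2, n_neighbors - 1, axis=1)[:, :n_neighbors]
        # Count votes per class for every query cell
        votes = np.zeros((len(q), len(classes)))
        np.add.at(votes, (np.arange(len(q))[:, None], ref_codes[nn]), 1.0)
        frac = votes / n_neighbors
        best = frac.argmax(axis=1)
        conf = frac[np.arange(len(q)), best]
        chunk_labels = classes[best].astype(object)
        chunk_labels[conf < min_confidence] = "Unknown"
        labels[start:start + chunk_size] = chunk_labels
        confidence[start:start + chunk_size] = conf
    return labels, confidence


# Synthetic stand-ins for adata_ref.obsm["scVI"] and adata_xenium.obsm["scVI"]
rng = np.random.default_rng(0)
ref_latent = np.vstack([rng.normal(0, 0.5, (100, 10)), rng.normal(5, 0.5, (100, 10))])
ref_labels = np.array(["T cell"] * 100 + ["Tumor"] * 100)
query_latent = np.vstack([rng.normal(0, 0.5, (5, 10)), rng.normal(5, 0.5, (5, 10))])

labels, confidence = knn_transfer_labels(ref_latent, ref_labels, query_latent)
```

Keeping the confidence score makes it possible to inspect where the low-confidence or tumor-labeled cells sit spatially, which may help to check whether the mislabeled “close contact” cells are the ones with ambiguous neighborhoods.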
Thank you for any suggestions!