Weird ComBat integration output for two datasets

Hi,

I am trying to integrate two datasets and have tested several methods, including scVI, BBKNN, Scanorama, and ComBat, plus some others in R (RCCA and Harmony). In some cases I see weird integration output, such as the one below. Here is the code that I used for this specific integration with ComBat:

import matplotlib.pyplot as plt
import scanpy as sc

# create a new object with lognormalized counts
adata_combat = sc.AnnData(X=concatenated_anndata.raw.X, var=concatenated_anndata.raw.var, obs=concatenated_anndata.obs)

# first store the raw data 
adata_combat.raw = adata_combat

# run combat
sc.pp.combat(adata_combat, key='dataset')

sc.pp.highly_variable_genes(adata_combat)
print(f"Highly variable genes: {adata_combat.var.highly_variable.sum()}")
sc.pl.highly_variable_genes(adata_combat)

sc.pp.pca(adata_combat, n_comps=30, use_highly_variable=True, svd_solver='arpack')

sc.pp.neighbors(adata_combat)

sc.tl.umap(adata_combat)

fig, axs = plt.subplots(1, 1, figsize=(6,4),constrained_layout=True)
sc.pl.umap(adata_combat, color="dataset", title="Combat umap", ax=axs, show=False)


I am just curious whether anyone knows what these distortions are called and what causes them.

I'm not sure, but this is not explicitly scvi-tools related, right? Perhaps a general help, scanpy, or integration forum would be a better place to ask.

To go with Dmitry Kobak's term, this is generally called "spaghetti" (the name may originally come from somewhere else). Here's a tweet about it: x.com
It comes from a degenerate neighbor graph (many very similar cells), which makes the embedding optimization ill-behaved. In VAEs like scVI it's usually a sign of issues with model training.
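
If it helps, here is a rough way to check the "many similar cells" part. This is only a sketch, not something from scanpy or scvi-tools: the helper name `near_duplicate_fraction` and the tolerance `tol` are my own assumptions. It takes a cells-by-components matrix and reports the fraction of cells whose nearest other cell is (almost) at distance zero.

```python
import numpy as np


def near_duplicate_fraction(embedding, tol=1e-6):
    """Fraction of rows whose nearest other row is within `tol` (Euclidean).

    Hypothetical helper for illustration; brute force, so it needs
    O(n^2 * d) memory and is meant for a few thousand cells at most.
    """
    X = np.asarray(embedding, dtype=float)
    # All pairwise differences, then Euclidean distances (exactly 0 for duplicates).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each row's distance to itself
    nearest = dist.min(axis=1)
    return float(np.mean(nearest <= tol))


# Toy data: 50 random "cells" in 5 dimensions, 10 of them copies of cell 0.
rng = np.random.default_rng(0)
toy = rng.normal(size=(50, 5))
toy[10:20] = toy[0]
print(near_duplicate_fraction(toy))
```

On the object from the question you could pass a random subsample of `adata_combat.obsm["X_pca"]` (the key that `sc.pp.pca` writes to). A large fraction would point to the degenerate-graph situation; a value near zero suggests the cause is something else.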
