Losing anndata .var layer when using sc.concat?

Duplicate of this question.
I’ve contributed to that thread, but will also answer here as this was the first hit when I was searching for a solution.

I don’t think sc.concat() currently (v1.10.1) has an option for this, but it is possible to merge all the .var DataFrames and add them to the .var of the final concatenated AnnData.

First, some data to have a reproducible example:

import scanpy as sc
import pandas as pd

pbmc = sc.datasets.pbmc68k_reduced()
adatas = {
    "set1": pbmc[0:5, 0:5],
    "set2": pbmc[5:10, 0:10],
    "set3": pbmc[10:15, 0:6],
}

In this example, we have 3 sets of data stored in a dictionary, with some gene (vars) overlap between them.

We can concatenate our data:

adata = sc.concat(adatas, join="outer", label="set", index_unique="-")

After doing this, we lose the contents of .var: adata.var keeps the gene names as its index but has no columns.
To get them back, we can grab the .var DataFrame from each set and merge them, as in this answer.

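# grab all var DataFrames from our dictionary
all_var = [x.var for x in adatas.values()]
# concatenate them
all_var = pd.concat(all_var, join="outer")
# remove duplicates: keep one row per gene name (deduplicate on the index,
# not on row values, so distinct genes with identical values are not dropped)
all_var = all_var[~all_var.index.duplicated(keep="first")]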
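To see what the deduplication step does, here is a small pandas-only sketch on made-up tables (the gene names and values are hypothetical, not taken from pbmc68k_reduced). It compares deduplicating on row values with deduplicating on the index (gene name). Row-wise `duplicated()` ignores the index, so it also drops a distinct gene that happens to share all its annotation values with another one.

```python
import pandas as pd

# Hypothetical per-dataset .var tables; gene names and values are made up.
var1 = pd.DataFrame({"mean": [1.0, 2.0]}, index=["geneA", "geneB"])
var2 = pd.DataFrame({"mean": [2.0, 2.0]}, index=["geneB", "geneC"])

# Stack the tables; geneB now appears twice.
stacked = pd.concat([var1, var2], join="outer")

# Row-wise check: geneC has the same values as geneB, so it is dropped too.
by_row = stacked[~stacked.duplicated()]
# Index-wise check: keeps the first row seen for each gene name.
by_index = stacked[~stacked.index.duplicated(keep="first")]

print(list(by_row.index))    # ['geneA', 'geneB']
print(list(by_index.index))  # ['geneA', 'geneB', 'geneC']
```

With the row-wise version, the later `.loc[adata.var_names]` lookup would fail for geneC, so deduplicating on the index is probably the safer choice when annotation values can repeat.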

Now we add this to our concatenated AnnData, making sure the order of the features is the same:

adata.var = all_var.loc[adata.var_names]