Duplicate of this question.
I’ve contributed to that thread, but will also answer here as this was the first hit when I was searching for a solution.
I don’t think sc.concat() currently (v1.10.1) has an option for this, but it is possible to merge all the .var DataFrames and add them to the .var of the final concatenated AnnData.
First, some data to have a reproducible example:
import scanpy as sc
import pandas as pd
pbmc = sc.datasets.pbmc68k_reduced()
adatas = {"set1": pbmc[0:5, 0:5], "set2": pbmc[5:10, 0:10], "set3": pbmc[10:15, 0:6]}
In this example, we have 3 sets of data stored in a dictionary, with some gene (vars) overlap between them.
We can concatenate our data:
adata = sc.concat(adatas, join="outer", label="set", index_unique="-")
After doing this, we lose the contents of .var: the concatenated object keeps the feature names but none of the annotation columns.
To get it back, we can grab the .var DataFrames from each set and merge them, as in this answer.
# grab all var DataFrames from our dictionary
all_var = [x.var for x in adatas.values()]
# concatenate them
all_var = pd.concat(all_var, join="outer")
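# keep one row per feature name; DataFrame.duplicated() compares column values only,
# so it can drop distinct genes with identical annotations or keep a gene twice
all_var = all_var[~all_var.index.duplicated(keep="first")]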
Now we add this to our concatenated AnnData, making sure the order of the features is the same:
adata.var = all_var.loc[adata.var_names]
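If you need this in more than one place, the merge step can be wrapped in a small helper. The following is a pandas-only sketch, not something provided by scanpy or anndata: merge_var_frames and the toy gene tables are made-up names for illustration. It deduplicates on the feature name (the index) and fails loudly if a feature ends up without any annotation.

```python
import pandas as pd


def merge_var_frames(frames, var_names):
    """Stack per-dataset var tables and return one row per name in var_names.

    Hypothetical helper for illustration; not part of scanpy or anndata.
    """
    merged = pd.concat(list(frames), join="outer")
    # Deduplicate on the index (feature name), keeping the first occurrence.
    merged = merged[~merged.index.duplicated(keep="first")]
    # Make sure every requested feature has an annotation row.
    missing = pd.Index(var_names).difference(merged.index)
    if len(missing) > 0:
        raise KeyError(f"no annotation found for: {list(missing)}")
    # Return the rows in the requested order.
    return merged.loc[list(var_names)]


# Toy stand-ins for the .var tables of two datasets (made-up values).
var1 = pd.DataFrame({"n_cells": [10, 20]}, index=["geneA", "geneB"])
var2 = pd.DataFrame(
    {"n_cells": [20, 30], "highly_variable": [True, False]},
    index=["geneB", "geneC"],
)

merged = merge_var_frames([var1, var2], ["geneC", "geneA", "geneB"])
print(merged)
```

With the real data above, the call would look like adata.var = merge_var_frames((a.var for a in adatas.values()), adata.var_names). Keeping the first occurrence means that a gene annotated differently in two sets gets the values from whichever set comes first, so columns that exist only in a later set stay empty for that gene.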