Losing anndata .var layer when using sc.concat?

Hi, I am merging my samples after initial QC with scanpy.concat, and I am losing my .var layer.

This is my code:
adata = sc.concat(adatas, join='outer', index_unique='_', keys=sample_ids, label='sample_ID')

Can I provide any other parameter so that the concat function keeps the .var layer?
Thank you.

Duplicate of this question.
I’ve contributed to that thread, but will also answer here as this was the first hit when I was searching for a solution.

I don’t think sc.concat() currently (v1.10.1) has an option for this, but it is possible to merge all the .var DataFrames and add them to the .var of the final concatenated AnnData.

First, some data to have a reproducible example:

import scanpy as sc
import pandas as pd

pbmc = sc.datasets.pbmc68k_reduced()
adatas = {"set1": pbmc[0:5, 0:5], "set2": pbmc[5:10, 0:10], "set3": pbmc[10:15, 0:6]}

In this example, we have 3 sets of data stored in a dictionary, with some gene (vars) overlap between them.

We can concatenate our data:

adata = sc.concat(adatas, join="outer", label="set", index_unique="-")

After doing this, we lose the columns of the .var attribute.
To get it back, we can grab all the .var attributes from each set and merge them, as in this answer.

# grab all var DataFrames from our dictionary
all_var = [x.var for x in adatas.values()]
# concatenate them
all_var = pd.concat(all_var, join="outer")
# remove duplicates
all_var = all_var[~all_var.duplicated()]

Now we add this to our concatenated AnnData, making sure the order of the features is the same:

adata.var = all_var.loc[adata.var_names]
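
A quick sanity check before that assignment can catch problems early. The sketch below uses small made-up tables (the gene names and the `n_cells` column are placeholders, not taken from the pbmc data) and a hypothetical helper, `align_var`, that refuses to proceed if the merged table has repeated feature names or is missing a feature:

```python
import pandas as pd


def align_var(all_var: pd.DataFrame, var_names: pd.Index) -> pd.DataFrame:
    """Return all_var reordered to match var_names (hypothetical helper)."""
    # repeated feature names would make .loc return more rows than features
    if not all_var.index.is_unique:
        raise ValueError("merged .var table has repeated feature names")
    # every feature of the concatenated object must be present in the table
    missing = var_names.difference(all_var.index)
    if len(missing) > 0:
        raise KeyError(f"features missing from merged .var: {list(missing)}")
    return all_var.loc[var_names]


# toy stand-ins for the merged .var table and for adata.var_names
toy_var = pd.DataFrame({"n_cells": [10, 20, 30]}, index=["geneB", "geneA", "geneC"])
toy_names = pd.Index(["geneA", "geneB", "geneC"])

aligned = align_var(toy_var, toy_names)
```

With the real objects, this would presumably be used as `adata.var = align_var(all_var, adata.var_names)`.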

Thanks @tavareshugo, this saved me some time.

I had to make a small change, though: in the last line of the all_var generation, I think the de-duplication should be based on the index:

# remove duplicates
all_var = all_var[~all_var.index.duplicated()]

Otherwise, this only drops rows whose values are identical across all columns, which might not work if you already have per-gene metrics that depend on the dataset.
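
To illustrate the difference between the two de-duplication calls, here is a minimal sketch with made-up tables (the gene names and the `n_cells` column are placeholders). The gene `geneB` appears in both sets with a different per-dataset value:

```python
import pandas as pd

# toy .var tables: geneB is in both sets, with a dataset-dependent metric
var1 = pd.DataFrame({"n_cells": [5, 7]}, index=["geneA", "geneB"])
var2 = pd.DataFrame({"n_cells": [9, 4]}, index=["geneB", "geneC"])
all_var = pd.concat([var1, var2], join="outer")

# row-based: compares column values only, so both geneB rows are kept
by_row = all_var[~all_var.duplicated()]

# index-based: keeps the first occurrence of each gene name
by_index = all_var[~all_var.index.duplicated()]

print(by_row.index.tolist())
print(by_index.index.tolist())
```

In the row-based result `geneB` is still listed twice, so selecting rows by gene name would return more rows than there are features. Note that the index-based version keeps whichever values come first, so any dataset-dependent columns in the result reflect only the first set in which a gene appears.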

Yes, you’re right! I’d done the right thing in the other thread, but mistyped it here. Thanks for the correction!