Losing anndata .var layer when using sc.concat?

Avaptel18 · July 14, 2023, 5:39pm

Hi, I am merging my samples after initial QC with scanpy.concat, and I am losing my .var layer.

This is my code:
adata = sc.concat(adatas, join="outer", index_unique="_", keys=sample_ids, label="sample_ID")

Is there another parameter I can pass so that the concat function keeps the .var layer?
Thank you.

tavareshugo · May 1, 2024, 5:02pm

Duplicate of this question.
I’ve contributed to that thread, but I’ll also answer here, as this was the first hit when I was searching for a solution.

I don’t think sc.concat() currently (v1.10.1) has an option for this, but it is possible to merge all the .var DataFrames and add them to the .var of the final concatenated AnnData.

First, some data for a reproducible example:

import scanpy as sc
import pandas as pd

pbmc = sc.datasets.pbmc68k_reduced()
adatas = {"set1": pbmc[0:5, 0:5], "set2": pbmc[5:10, 0:10], "set3": pbmc[10:15, 0:6]}

In this example, we have 3 sets of data stored in a dictionary, with some overlap in genes (vars) between them.

We can concatenate our data:

adata = sc.concat(adatas, join="outer", label="set", index_unique="-")

After doing this, we lose the contents of the .var attribute: only the gene names (the index) are kept, and all the columns are dropped.
To get it back, we can grab all the .var attributes from each set and merge them, as in this answer.

# grab all var DataFrames from our dictionary
all_var = [x.var for x in adatas.values()]
# concatenate them
all_var = pd.concat(all_var, join="outer")
# remove duplicates
all_var = all_var[~all_var.duplicated()]
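To see why a de-duplication step is needed at all, here is a minimal pandas-only sketch, assuming two made-up .var tables (the gene names g1 to g3 and the columns gene_ids and highly_variable are hypothetical stand-ins). pd.concat(..., join="outer") simply stacks the rows, so a gene present in both sets appears twice, and a column missing from one set is filled with NaN.

```python
import pandas as pd

# Hypothetical .var tables from two samples; g2 is shared between them
var1 = pd.DataFrame({"gene_ids": ["ENSG1", "ENSG2"]}, index=["g1", "g2"])
var2 = pd.DataFrame(
    {"gene_ids": ["ENSG2", "ENSG3"], "highly_variable": [True, False]},
    index=["g2", "g3"],
)

# join="outer" keeps the union of columns; rows are stacked, not merged
all_var = pd.concat([var1, var2], join="outer")
print(all_var)

# g2 now appears twice in the index
print(all_var.index.tolist())
```

The rows coming from var1 have NaN in highly_variable, because that column only exists in var2.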

Now we add this to our concatenated AnnData, making sure the order of the features is the same:

adata.var = all_var.loc[adata.var_names]
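The steps above can be wrapped into a small reusable function. This is a sketch under a few assumptions: the per-sample .var tables are passed as plain DataFrames, the target gene order is given as a list, and the name merge_var_tables is hypothetical (it is not a scanpy or anndata function). It de-duplicates on the index, keeping the first row seen for each gene, and then reorders with .loc so the rows line up with the concatenated var_names.

```python
import pandas as pd


def merge_var_tables(var_frames, var_names):
    """Hypothetical helper: combine per-sample .var tables into one,
    keeping the first row seen for each gene and ordering rows by var_names."""
    all_var = pd.concat(var_frames, join="outer")
    # Keep one row per gene name (first occurrence wins)
    all_var = all_var[~all_var.index.duplicated(keep="first")]
    # Reorder (and subset) to match the concatenated object's var_names
    return all_var.loc[list(var_names)]


# Made-up tables standing in for two samples' .var
var1 = pd.DataFrame({"gene_ids": ["ENSG1", "ENSG2"]}, index=["g1", "g2"])
var2 = pd.DataFrame({"gene_ids": ["ENSG2", "ENSG3"]}, index=["g2", "g3"])

# An arbitrary target order, for illustration only
merged = merge_var_tables([var1, var2], ["g3", "g1", "g2"])
print(merged)
```

With the objects from the example, the call would be adata.var = merge_var_tables([x.var for x in adatas.values()], adata.var_names). Note that .loc raises a KeyError if any name in var_names is missing from all the input tables.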

pcm32 · September 9, 2024, 1:51pm

Thanks @tavareshugo , this saved me some time.

I had to make a small change, though. In the last line of the all_var generation, I think the de-duplication should be based on the index:

# remove duplicates
all_var = all_var[~all_var.index.duplicated()]

Otherwise, duplicates are detected on the column values rather than on the gene names. That doesn’t work if you already have per-gene metrics that depend on the dataset, because the same gene then keeps one row per dataset.
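A small pandas-only sketch of the difference, assuming made-up .var tables with a dataset-dependent means column (the gene names and values are hypothetical). DataFrame.duplicated() compares column values and ignores the index, so the shared gene g2 survives twice, and the unrelated gene g3 is dropped because its value happens to match one of g2’s rows. Index.duplicated() keeps exactly one row per gene.

```python
import pandas as pd

# Hypothetical per-dataset metric: g2 has a different mean in each set
var1 = pd.DataFrame({"means": [0.5, 1.0]}, index=["g1", "g2"])
var2 = pd.DataFrame({"means": [2.0, 1.0]}, index=["g2", "g3"])
all_var = pd.concat([var1, var2], join="outer")

# Row-based: looks only at the values, so g2 is kept twice
# and g3 (means == 1.0, same as the first g2 row) is dropped
by_row = all_var[~all_var.duplicated()]

# Index-based: one row per gene name, first occurrence wins
by_index = all_var[~all_var.index.duplicated()]

print(by_row.index.tolist())
print(by_index.index.tolist())
```

If you would rather fill gaps left by the outer join with values from later samples, all_var.groupby(level=0, sort=False).first() is one alternative: it takes the first non-missing value of each column per gene, which is a different rule from keeping the whole first row.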

tavareshugo · September 9, 2024, 2:37pm

Yes, you’re right! I’d done the right thing in the other thread, but mistyped it here. Thanks for the correction!
