Help with concat

The obs-wise concat (axis=0) is not merging var the way I would expect. I'm curious how to get the desired result.

Script:

#!/usr/bin/env python

import pandas as pd
import numpy as np
import anndata as ad

# ----------------------------------------------------------------
def show_adata(adata, name):
    print()
    print("----", name)
    print("X:")
    print(adata.X)
    print("obs:")
    print(adata.obs)
    print("var:")
    print(adata.var)

# ----------------------------------------------------------------
X1 = np.asarray([[1,2,3],[4,5,6]], dtype=np.float32)
obs1 = pd.DataFrame(
    index=np.asarray(['cell1', 'cell2']),
    data={
        "oa": np.asarray([10,20]),
        "ob": np.asarray(["a","b"]),
    },
)
var1 = pd.DataFrame(
    index=np.asarray(['gene1', 'gene2', 'gene3']),
    data={
        "va": np.asarray([30,40,50]),
        "vb": np.asarray([True, False, True]),
        "vc": np.asarray(["c","d","e"]),
    },
)
adata1 = ad.AnnData(X=X1, obs=obs1, var=var1)
show_adata(adata1, "ADATA1")

X2 = np.asarray([[7,8],[9,10]], dtype=np.float32)
obs2 = pd.DataFrame(
    index=np.asarray(['cell3', 'cell4']),
    data={
        "oa": np.asarray([60,70]),
        "ob": np.asarray(["f","g"]),
    },
)
var2 = pd.DataFrame(
    index=np.asarray(['gene1', 'gene4']),
    data={
        "va": np.asarray([80,90]),
        "vb": np.asarray([True, False]),
        "vc": np.asarray(["h","i"]),
    },
)
adata2 = ad.AnnData(X=X2, obs=obs2, var=var2)
show_adata(adata2, "ADATA2")

adatac = ad.concat([adata1, adata2], axis=0, join="outer", merge="first")
show_adata(adatac, "ADATAC")

Output:


---- ADATA1
X:
[[1. 2. 3.]
 [4. 5. 6.]]
obs:
       oa ob
cell1  10  a
cell2  20  b
var:
       va     vb vc
gene1  30   True  c
gene2  40  False  d
gene3  50   True  e

---- ADATA2
X:
[[ 7.  8.]
 [ 9. 10.]]
obs:
       oa ob
cell3  60  f
cell4  70  g
var:
       va     vb vc
gene1  80   True  h
gene4  90  False  i

---- ADATAC
X:
[[ 1.  2.  3. nan]
 [ 4.  5.  6. nan]
 [ 7. nan nan  8.]
 [ 9. nan nan 10.]]
obs:
       oa ob
cell1  10  a
cell2  20  b
cell3  60  f
cell4  70  g
var:
         va     vb   vc
gene1  30.0   True    c
gene2  40.0  False    d
gene3  50.0   True    e
gene4   NaN    NaN  NaN
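As far as I can tell, the all-NaN row for gene4 comes from how merge="first" works. It does not pick values gene by gene. Instead, it takes each var column from the first object that has that column, after reindexing to the union of gene names, so adata2's values for gene4 are never consulted. The pandas-only sketch below emulates that reading. It is my approximation of the behaviour, not anndata's actual code.

```python
import pandas as pd

# var tables from the question
var1 = pd.DataFrame(
    {"va": [30, 40, 50], "vb": [True, False, True], "vc": ["c", "d", "e"]},
    index=["gene1", "gene2", "gene3"],
)
var2 = pd.DataFrame(
    {"va": [80, 90], "vb": [True, False], "vc": ["h", "i"]},
    index=["gene1", "gene4"],
)

# Outer join: union of gene names (Index.union sorts the result here)
union = var1.index.union(var2.index)

# Emulated "first" strategy (assumption): for each column, take the whole
# column from the first table that has it, reindexed to the union.
# Later tables are never used to fill the gaps this creates.
frames = [var1.reindex(union), var2.reindex(union)]
columns = {}
for frame in frames:
    for col in frame.columns:
        columns.setdefault(col, frame[col])
emulated = pd.DataFrame(columns, index=union)
print(emulated)
```

The printed frame has the same shape as the ADATAC var above: adata1's values for gene1 to gene3 and NaN in every column for gene4.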

I want to have:

var:
         va     vb   vc
gene1  30.0   True    c
gene2  40.0  False    d
gene3  50.0   True    e
gene4  90.0  False    i
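The desired table keeps adata1's values wherever adata1 has them and fills only the gaps from adata2. That is exactly what pandas' DataFrame.combine_first does, so a minimal pandas-only sketch of the target var could look like this. Note that this is a pandas method, not an anndata merge option. Folding it over the list with functools.reduce is my own choice, made so that the same line works for more than two objects.

```python
from functools import reduce

import pandas as pd

# var tables from the question
var1 = pd.DataFrame(
    {"va": [30, 40, 50], "vb": [True, False, True], "vc": ["c", "d", "e"]},
    index=["gene1", "gene2", "gene3"],
)
var2 = pd.DataFrame(
    {"va": [80, 90], "vb": [True, False], "vc": ["h", "i"]},
    index=["gene1", "gene4"],
)

# combine_first keeps every non-missing value of the left frame and fills
# missing rows/cells from the right frame; the result's index is the union.
# reduce applies it left to right, so earlier objects win on conflicts.
desired_var = reduce(lambda left, right: left.combine_first(right), [var1, var2])
print(desired_var)
```

Because the fill happens cell by cell, gene1 keeps va=30 from adata1, while gene4 picks up 90, False and "i" from adata2.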

A colleague helped me out!

For the record, the solution was to change from:

adatac = ad.concat([adata1, adata2], axis=0, join="outer", merge="first")

to handling the var merge separately, as in:

# merge obs & X (and var, but we discard that)
adatac = ad.concat([adata1, adata2], axis=0, join="outer", merge="first")
# merge var
merged_var = pd.concat([adata1.var, adata2.var], join="outer")
adatac.var = merged_var[~merged_var.index.duplicated()]
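Here is why the gene order can matter with this approach. pd.concat followed by index.duplicated() keeps genes in first-seen order, while adatac.var_names comes from anndata's outer join. As far as I know, assigning a DataFrame to .var replaces var_names with that frame's index instead of reordering X. If the two orders differ, the annotations therefore end up attached to the wrong columns of X. The pandas-only sketch below uses hypothetical var tables whose genes are not sorted. It also uses a sorted Index.union as a stand-in for adatac.var_names, which is an assumption about how the outer join orders genes.

```python
import pandas as pd

# Hypothetical var tables whose genes are not listed in sorted order
var_a = pd.DataFrame({"va": [1, 2]}, index=["gene3", "gene1"])
var_b = pd.DataFrame({"va": [3, 4]}, index=["gene2", "gene3"])

# The approach above: stack, then keep the first row seen for each gene
merged_var = pd.concat([var_a, var_b], join="outer")
deduped = merged_var[~merged_var.index.duplicated()]

# Stand-in for adatac.var_names (assumed): Index.union sorts the labels
target_names = var_a.index.union(var_b.index)

print(list(deduped.index))   # ['gene3', 'gene1', 'gene2']  (first-seen order)
print(list(target_names))    # ['gene1', 'gene2', 'gene3']  (sorted order)

# Reordering by label keeps each annotation attached to its own gene
aligned = deduped.loc[target_names]
```

Assigning deduped positionally would label the first column of X with gene3's annotations. Reordering with .loc avoids that, because rows are matched by gene name rather than by position.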

I was looking for a solution to the same issue, and while your solution partially works, I don't think it guarantees that the rows of the merged var come out in the same gene order as adatac.var_names.

In case others find this thread, my modified suggestion is:

# merge obs & X
adatac = ad.concat([adata1, adata2], axis=0, join="outer", merge="first")
# merge var
merged_var = pd.concat([adata1.var, adata2.var], join="outer")
merged_var = merged_var[~merged_var.index.duplicated()]
adatac.var = merged_var.loc[adatac.var_names]

The last line reorders merged_var by label so that its rows follow adatac.var_names, i.e. the same gene order as the columns of the concatenated X.
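If you need this for more than two objects, the same three steps (stack, drop duplicate genes, reorder by label) can be wrapped in a small helper. merge_var_first is a hypothetical name and not part of anndata. The example passes a hard-coded Index as a stand-in for adatac.var_names so that it runs with pandas alone.

```python
import pandas as pd


def merge_var_first(var_frames, var_names):
    """Hypothetical helper (not part of anndata).

    Stack the var tables of several AnnData objects, keep the first row seen
    for each gene, then reorder the rows by label to follow ``var_names``.
    """
    merged = pd.concat(var_frames, join="outer")
    merged = merged[~merged.index.duplicated(keep="first")]
    # Label-based reordering; raises KeyError if a gene in var_names is
    # missing from every frame
    return merged.loc[var_names]


# var tables from the question
var1 = pd.DataFrame(
    {"va": [30, 40, 50], "vb": [True, False, True], "vc": ["c", "d", "e"]},
    index=["gene1", "gene2", "gene3"],
)
var2 = pd.DataFrame(
    {"va": [80, 90], "vb": [True, False], "vc": ["h", "i"]},
    index=["gene1", "gene4"],
)

# Stand-in for adatac.var_names (assumed gene order of the concatenated object)
var_names = pd.Index(["gene1", "gene2", "gene3", "gene4"])
merged_var = merge_var_first([var1, var2], var_names)
print(merged_var)
```

With real objects the call would be adatac.var = merge_var_first([a.var for a in adatas], adatac.var_names). Because the helper keeps whole rows, a column that exists only in a later object stays NaN for genes first seen in an earlier one.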