How to make the union set of 2 scRNA-seq matrices?

hyjforesight · May 13, 2022, 8:05pm

Hello Scanpy,
We have 2 scRNA-seq libraries that share some barcodes. We want to build the union of these 2 libraries with the duplicated barcodes removed, but we are not experienced with AnnData. It looks like we cannot do this with the simple |, &, - or ^ operators, as one would in pandas.
Could you please help us with this question?
Thanks!
Best,
YJ

We tried:

CKP1 = sc.read_10x_mtx(path='D:/ZGY/MST_matrix/KP9CKP11-5_11/CKP/', var_names='gene_symbols', cache=True) 
CKP1.var_names_make_unique()
CKP2 = sc.read_10x_mtx(path='D:/ZGY/MST_matrix/KP10CKP12-5_11/CKP/', var_names='gene_symbols', cache=True)
CKP2.var_names_make_unique()

dup_index = CKP1.obs_names.intersection(CKP2.obs_names)    # find the duplicated index between 2 libraries
dup_index
Index(['AAACCCACACCTTCGT-1', 'AAACCCACACTACAGT-1', 'AAACCCACAGATACCT-1',
       'AAACCCACAGCGTTTA-1', 'AAACCCACAGGCCTGT-1', 'AAACCCACATGAGATA-1',
       'AAACCCAGTAATCAAG-1', 'AAACCCAGTCGCCTAG-1', 'AAACCCAGTGTCATCA-1',
       'AAACCCAGTGTCCATA-1',
       ...
       'TTTGTTGCATAGAGGC-1', 'TTTGTTGCATGAGATA-1', 'TTTGTTGGTCGTACTA-1',
       'TTTGTTGGTGCGGCTT-1', 'TTTGTTGGTTGTGTTG-1', 'TTTGTTGTCAAAGCCT-1',
       'TTTGTTGTCACCCTTG-1', 'TTTGTTGTCCTCGCAT-1', 'TTTGTTGTCGCTTAAG-1',
       'TTTGTTGTCTCGCAGG-1'],
      dtype='object', length=9697)

CKP1_uni=CKP1-CKP1[dup_index,:]    # slice the unique part of CKP1
TypeError: unsupported operand type(s) for -: 'AnnData' and 'AnnData'
CKP2_uni=CKP2-CKP2[dup_index,:]    # slice the unique part of CKP2
TypeError: unsupported operand type(s) for -: 'AnnData' and 'AnnData'
CKP_intersection=CKP1[dup_index,:]    # slice the intersection part of CKP1 and CKP2
View of AnnData object with n_obs × n_vars = 9697 × 32285
    var: 'gene_ids', 'feature_types'

adata = CKP1_uni.concatenate(CKP2_uni, CKP_intersection, batch_categories=['CKP1_uni', 'CKP2_uni', 'CKP_intersection'])    # merge these 3 parts
NameError: name 'CKP1_uni' is not defined

Valentine_Svensson · May 15, 2022, 10:41pm

Hi YJ,

Index objects act like sets. Your first step already gets you most of the way to how I would solve this. Here’s what I would do:

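import anndata

# Barcode indices of the two libraries
idx1 = CKP1.obs.index
idx2 = CKP2.obs.index
dup_index = idx1.intersection(idx2)          # barcodes present in both libraries
unique_idx1 = idx1.difference(dup_index)     # barcodes only in CKP1
unique_idx2 = idx2.difference(dup_index)     # barcodes only in CKP2

# Slice each part and copy so the results are real AnnData objects, not views
CKP1_uni = CKP1[unique_idx1, :].copy()
CKP2_uni = CKP2[unique_idx2, :].copy()
CKP_intersection = CKP1[dup_index, :].copy()

# anndata.concat takes a list of AnnData objects
adata = anndata.concat([CKP1_uni, CKP2_uni, CKP_intersection])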

The error you are seeing (TypeError: unsupported operand type(s) for -: 'AnnData' and 'AnnData') occurs because subtraction is not defined for AnnData objects, so adata1 - adata2 has no meaning. Instead, I compute the unique indices and the intersection of the indices, then slice by those.
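To see the set behaviour of Index objects on its own, here is a minimal sketch using plain pandas. The three-letter barcodes are made up for illustration and are not taken from the CKP libraries.

```python
import pandas as pd

# Toy barcodes (hypothetical), standing in for CKP1.obs.index and CKP2.obs.index
idx1 = pd.Index(["AAA-1", "CCC-1", "GGG-1"])
idx2 = pd.Index(["CCC-1", "GGG-1", "TTT-1"])

dup_index = idx1.intersection(idx2)        # barcodes in both libraries
unique_idx1 = idx1.difference(dup_index)   # barcodes only in the first library
unique_idx2 = idx2.difference(dup_index)   # barcodes only in the second library
union_idx = idx1.union(idx2)               # every barcode, each listed once

print(list(dup_index))
print(list(unique_idx1), list(unique_idx2))
print(list(union_idx))
```

The three pieces (only in the first, only in the second, shared) partition the union, which is why slicing by them and concatenating gives each barcode exactly once.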

Now, I am not sure what this data is. But if you have two sequenced libraries of the same samples, I would probably add up the molecule counts from both CKP1[dup_index, :] and CKP2[dup_index, :]. Though it’s hard to know whether the UMIs are unique between them.

Hope this helps!
/Valentine

hyjforesight · May 16, 2022, 12:02am

Hello Valentine,
Thanks for the solution! Appreciate it! You saved our data!
We’ll revisit this post once we publish our data and acknowledge you!
Thanks!
Best,
YJ
