Hello Scanpy,
We have two scRNA-seq libraries that share some barcodes. We want to build the union of these two libraries with the duplicated barcodes removed, but we are not experienced with AnnData. It looks like the pandas set operators |, &, - and ^ cannot be applied to AnnData objects directly.
Could you please help us with this question?
Thanks!
Best,
YJ
We tried:
import scanpy as sc
CKP1 = sc.read_10x_mtx(path='D:/ZGY/MST_matrix/KP9CKP11-5_11/CKP/', var_names='gene_symbols', cache=True)
CKP1.var_names_make_unique()
CKP2 = sc.read_10x_mtx(path='D:/ZGY/MST_matrix/KP10CKP12-5_11/CKP/', var_names='gene_symbols', cache=True)
CKP2.var_names_make_unique()
dup_index = CKP1.obs_names.intersection(CKP2.obs_names) # find the duplicated index between 2 libraries
dup_index
Index(['AAACCCACACCTTCGT-1', 'AAACCCACACTACAGT-1', 'AAACCCACAGATACCT-1',
'AAACCCACAGCGTTTA-1', 'AAACCCACAGGCCTGT-1', 'AAACCCACATGAGATA-1',
'AAACCCAGTAATCAAG-1', 'AAACCCAGTCGCCTAG-1', 'AAACCCAGTGTCATCA-1',
'AAACCCAGTGTCCATA-1',
...
'TTTGTTGCATAGAGGC-1', 'TTTGTTGCATGAGATA-1', 'TTTGTTGGTCGTACTA-1',
'TTTGTTGGTGCGGCTT-1', 'TTTGTTGGTTGTGTTG-1', 'TTTGTTGTCAAAGCCT-1',
'TTTGTTGTCACCCTTG-1', 'TTTGTTGTCCTCGCAT-1', 'TTTGTTGTCGCTTAAG-1',
'TTTGTTGTCTCGCAGG-1'],
dtype='object', length=9697)
CKP1_uni=CKP1-CKP1[dup_index,:] # slice the unique part of CKP1
TypeError: unsupported operand type(s) for -: 'AnnData' and 'AnnData'
CKP2_uni=CKP2-CKP2[dup_index,:] # slice the unique part of CKP2
TypeError: unsupported operand type(s) for -: 'AnnData' and 'AnnData'
CKP_intersection=CKP1[dup_index,:] # slice the intersection part of CKP1 and CKP2
View of AnnData object with n_obs × n_vars = 9697 × 32285
var: 'gene_ids', 'feature_types'
adata = CKP1_uni.concatenate(CKP2_uni, CKP_intersection, batch_categories=['CKP1_uni', 'CKP2_uni', 'CKP_intersection']) # merge these 3 parts
NameError: name 'CKP1_uni' is not defined