Hi everyone,
In our lab, we use the BD Rhapsody scRNA seq pipeline. Results from our recent CD8+ T cells TCR sequencing analysis confused me regarding the no-IR definition.
Here is step-by-step code and sample output
# Load IR and GEX data
mudata.set_options(pull_on_update=False)
lib1 = mudata.read_h5mu('Bd_out.cellismo')['rna']
lib1_airr = ir.io.read_airr(path = 'Bd_out_VDJ_Dominant_Contigs_AIRR.tsv')
lib1_mu = mu.MuData({'rna': lib1, 'airr': lib1_airr})
# Collect TCR QC metrics from AIRR data
ir.pp.index_chains(lib1_mu)
ir.tl.chain_qc(lib1_mu)
Looking at the QC metric plots, I noticed an unusually high number of non-IR-tagged cells. I loaded the dominant contigs file and checked the entries for some of the cells tagged as no-IR. Code and output below:
>> lib1_mu.obs.loc['397582']
## Output---------------------
airr:receptor_type no IR
airr:receptor_subtype no IR
airr:chain_pairing no IR
Name: 397582, dtype: object
-------------------------------
>> ir.get.airr(lib1_mu[lib1_mu.obs['airr:chain_pairing'] == 'no IR', :],
chain='VJ_1',
airr_variable=['junction_aa', 'v_call', 'productive']).loc['397582']
## Output---------------------
VJ_1_junction_aa <NA>
VJ_1_v_call <NA>
VJ_1_productive <NA>
Name: 397582, dtype: object
-------------------------------
>> df = pd.read_csv('Bd_out_VDJ_Dominant_Contigs_AIRR.tsv', sep='\t')
>> df.loc[df['cell_id'] == 397582,
[
'cell_id', 'cell_type_experimental',
'high_quality_cell_tcr_bcr', 'umi_count',
'locus', 'sequence_id', 'productive', 'junction_aa',
'v_call', 'd_call', 'j_call'
]
]
## Output---------------------
| cell_id | cell_type_experimental | high_quality_cell_tcr_bcr | umi_count | locus | sequence_id | productive | junction_aa | v_call | d_call | j_call |
|---|---|---|---|---|---|---|---|---|---|---|
| 397582 | T | False | 1 | TRA | 397582_TRA_1 | False | RAMSLITTCSTS | TRAV6D-3*03 | NaN | TRAJ21*01 |
| 397582 | T | False | 4 | TRB | 397582_TRB_1 | False | HLQCR*TGAYEQYF | TRBV1*01 | TRBD2*01 | TRBJ2-7*01 |
| 397582 | T | False | 2 | TRG | 397582_TRG_1 | False | CASW*YSSGFHKVF | TRGV7*02 | NaN | TRGJ1*01 |
>> from scirpy.util import DataHandler
>> params = DataHandler(data=lib1_mu,
airr_mod='airr',
airr_key='airr',
chain_idx_key='chain_indices')
>> ir.get._has_ir(params)
ir_dat = ir.get.airr(lib1_mu,
chain=['VJ_1', 'VDJ_1', 'VJ_2', 'VDJ_2'],
airr_variable=['junction_aa', 'v_call', 'd_call', 'j_call', 'productive'])
ir_dat['_has_ir'] = ir.get._has_ir(params)
ir_dat.loc['397582']
# Output -----------------
VJ_1_junction_aa <NA>
VJ_1_v_call <NA>
VJ_1_d_call <NA>
VJ_1_j_call <NA>
VJ_1_productive <NA>
VDJ_1_junction_aa <NA>
VDJ_1_v_call <NA>
VDJ_1_d_call <NA>
VDJ_1_j_call <NA>
VDJ_1_productive <NA>
VJ_2_junction_aa <NA>
VJ_2_v_call <NA>
VJ_2_d_call <NA>
VJ_2_j_call <NA>
VJ_2_productive <NA>
VDJ_2_junction_aa <NA>
VDJ_2_v_call <NA>
VDJ_2_d_call <NA>
VDJ_2_j_call <NA>
VDJ_2_productive <NA>
_has_ir False
Name: 397582, dtype: object
---------------------------
# Package version info:
anndata 0.12.14
awkward 2.8.10
matplotlib 3.10.8
mudata 0.3.8
muon 0.1.7
numpy 2.3.5
pandas 2.3.3
scanpy 1.12.1
scirpy 0.24.0
seaborn 0.13.2
So even though there are entries in the dominant contigs file, why do I get “no IR”? I don’t understand. Is it expected for some reason, or am I doing something wrong?
I can happily provide further info.
Thank you for your help.
Kind regards.