Confusion on "no IR" definition in scirpy

Hi everyone,

In our lab, we use the BD Rhapsody scRNA seq pipeline. Results from our recent CD8+ T cells TCR sequencing analysis confused me regarding the no-IR definition.

Here is step-by-step code and sample output

# Load IR and GEX data
mudata.set_options(pull_on_update=False)
lib1 = mudata.read_h5mu('Bd_out.cellismo')['rna']
lib1_airr = ir.io.read_airr(path = 'Bd_out_VDJ_Dominant_Contigs_AIRR.tsv')
lib1_mu = mu.MuData({'rna': lib1, 'airr': lib1_airr})

# Collect TCR QC metrics from AIRR data
ir.pp.index_chains(lib1_mu)
ir.tl.chain_qc(lib1_mu)

Looking at the QC metric plots, I noticed an unusually high number of non-IR-tagged cells. I loaded the dominant contigs file and checked the entries for some of the cells tagged as no-IR. Code and output below:

>> lib1_mu.obs.loc['397582']

## Output---------------------
airr:receptor_type       no IR
airr:receptor_subtype    no IR
airr:chain_pairing       no IR
Name: 397582, dtype: object
-------------------------------

>> ir.get.airr(lib1_mu[lib1_mu.obs['airr:chain_pairing'] == 'no IR', :], 
            chain='VJ_1', 
            airr_variable=['junction_aa', 'v_call', 'productive']).loc['397582']

## Output---------------------
VJ_1_junction_aa    <NA>
VJ_1_v_call         <NA>
VJ_1_productive     <NA>
Name: 397582, dtype: object
-------------------------------

>> df = pd.read_csv('Bd_out_VDJ_Dominant_Contigs_AIRR.tsv', sep='\t')
>> df.loc[df['cell_id'] == 397582, 
       [
           'cell_id', 'cell_type_experimental', 
           'high_quality_cell_tcr_bcr', 'umi_count',
           'locus', 'sequence_id', 'productive', 'junction_aa',
           'v_call', 'd_call', 'j_call'
       ]
   ]

## Output---------------------
cell_id cell_type_experimental high_quality_cell_tcr_bcr umi_count locus sequence_id productive junction_aa v_call d_call j_call
397582 T False 1 TRA 397582_TRA_1 False RAMSLITTCSTS TRAV6D-3*03 NaN TRAJ21*01
397582 T False 4 TRB 397582_TRB_1 False HLQCR*TGAYEQYF TRBV1*01 TRBD2*01 TRBJ2-7*01
397582 T False 2 TRG 397582_TRG_1 False CASW*YSSGFHKVF TRGV7*02 NaN TRGJ1*01
>> from scirpy.util import DataHandler
>> params = DataHandler(data=lib1_mu, 
                     airr_mod='airr', 
                     airr_key='airr', 
                     chain_idx_key='chain_indices')
>> ir.get._has_ir(params)
ir_dat = ir.get.airr(lib1_mu, 
                     chain=['VJ_1', 'VDJ_1', 'VJ_2', 'VDJ_2'],
                     airr_variable=['junction_aa', 'v_call', 'd_call', 'j_call', 'productive'])
ir_dat['_has_ir'] = ir.get._has_ir(params)
ir_dat.loc['397582']

# Output -----------------
VJ_1_junction_aa      <NA>
VJ_1_v_call           <NA>
VJ_1_d_call           <NA>
VJ_1_j_call           <NA>
VJ_1_productive       <NA>
VDJ_1_junction_aa     <NA>
VDJ_1_v_call          <NA>
VDJ_1_d_call          <NA>
VDJ_1_j_call          <NA>
VDJ_1_productive      <NA>
VJ_2_junction_aa      <NA>
VJ_2_v_call           <NA>
VJ_2_d_call           <NA>
VJ_2_j_call           <NA>
VJ_2_productive       <NA>
VDJ_2_junction_aa     <NA>
VDJ_2_v_call          <NA>
VDJ_2_d_call          <NA>
VDJ_2_j_call          <NA>
VDJ_2_productive      <NA>
_has_ir              False
Name: 397582, dtype: object
---------------------------
# Package version info:
anndata             0.12.14
awkward             2.8.10
matplotlib          3.10.8
mudata              0.3.8
muon                0.1.7
numpy               2.3.5
pandas              2.3.3
scanpy              1.12.1
scirpy              0.24.0
seaborn             0.13.2

So even though there are entries in the dominant contigs file, why do I get “no IR”? I don’t understand. Is it expected for some reason, or am I doing something wrong?

I can happily provide further info.

Thank you for your help.

Kind regards.

Hi @osmanmerdan,

thanks for your question. I think for the example you are showing the reason is that the chains are marked as non-productive. By default, index_chains does not consider non-productive chains. See also the filter parameter in the index_chains docs.

Best,
Gregor