Not able to load TCR data

Hello!
I am trying to analyze my TCR data by using scirpy but I am running into an error. Quick note, I analyze my sc data using Seurat and then converted Seurat in AnnData. This worked to run scvelo but I think I am doing something wrong in loading the filtered_contig_annotations.

Here is my code:

adata = scv.read("sobjscv.h5ad")

filenames = ['p219_base', 'p229_base', 'p264_base', 'pV03_base', 'pV09_base', 'p219_post', 'p229_post', 'p264_post', 'pV03_post', 'pV09_post']
tcrs = [ir.io.read_10x_vdj(filename) for filename in filenames]
for tcr in range(len(tcrs)):
    tcrs[tcr].obs_names = filenames[tcr] + "_" + tcrs[tcr].obs_names
    print(tcrs[tcr].obs_names)
tcr = tcrs[0].concatenate(tcrs[1:])
adata.obs_names= [c.split("-")[0] for c in adata.obs_names]
tcr.obs_names = [d.split("-")[0] for d in tcr.obs_names]
ir.pp.merge_with_ir(adata, tcr)

p219_base, p229_base, p264_base, pV03_base, pV09_base, p219_post, p229_post, p264_post, pV03_post, pV09_post are the names of my csv contigs.

I tried to make adata.obs_names and tcr.obs_names the same:

adata.obs_names
Index(['p229_base_AAACCTGAGACGCACA', 'p229_base_AAACCTGCACCGAAAG',
       'p229_base_AAACCTGGTCTACCTC', 'p229_base_AAACCTGGTGGTAACG',
       'p229_base_AAACCTGTCAGCAACT', 'p229_base_AAACCTGTCCCTAATT',
       'p229_base_AAACCTGTCTTGCATT', 'p229_base_AAACGGGAGCAATATG',
       'p229_base_AAACGGGGTTATTCTC', 'p229_base_AAACGGGTCTAACCGA',
       ...
       'pV03_post_TTTGGTTTCAAGATCC', 'pV03_post_TTTGGTTTCATCGCTC',
       'pV03_post_TTTGTCAAGAAGAAGC', 'pV03_post_TTTGTCAAGCCCTAAT',
       'pV03_post_TTTGTCAAGTGTACCT', 'pV03_post_TTTGTCACAAAGAATC',
       'pV03_post_TTTGTCACAGTGAGTG', 'pV03_post_TTTGTCACATTTCACT',
       'pV03_post_TTTGTCATCCCTCAGT', 'pV03_post_TTTGTCATCGCTGATA'],
      dtype='object', length=50808)
tcr.obs_names
Index(['p219_base_AAACCTGAGAGCTGCA', 'p219_base_AAACCTGAGGGATCTG',
       'p219_base_AAACCTGAGTGGCACA', 'p219_base_AAACCTGCAAACGCGA',
       'p219_base_AAACCTGCAGCCTTTC', 'p219_base_AAACCTGCATTGCGGC',
       'p219_base_AAACCTGGTCCAAGTT', 'p219_base_AAACCTGGTGGTCTCG',
       'p219_base_AAACCTGTCACCCGAG', 'p219_base_AAACCTGTCACTGGGC',
       ...
       'pV09_post_TTTGTCACAAGAGGCT', 'pV09_post_TTTGTCACAGACAAAT',
       'pV09_post_TTTGTCACAGGTCTCG', 'pV09_post_TTTGTCAGTAGGACAC',
       'pV09_post_TTTGTCAGTCATATGC', 'pV09_post_TTTGTCAGTCGACTAT',
       'pV09_post_TTTGTCAGTCTTCTCG', 'pV09_post_TTTGTCAGTGCCTGTG',
       'pV09_post_TTTGTCATCCATTCTA', 'pV09_post_TTTGTCATCGGAATCT'],
      dtype='object', length=36086)

Then if I run

ax = ir.pl.group_abundance(adata, groupby="receptor_subtype", target_col="group_id")
Traceback (most recent call last):
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3629, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'receptor_subtype'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/scirpy/io/_util.py", line 71, in check_wrapper
    return f(*args, **kwargs)
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/scirpy/pl/_group_abundance.py", line 62, in group_abundance
    abundance = tl.group_abundance(
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/scirpy/io/_util.py", line 71, in check_wrapper
    return f(*args, **kwargs)
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/scirpy/tl/_group_abundance.py", line 110, in group_abundance
    return _group_abundance(
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/scirpy/tl/_group_abundance.py", line 20, in _group_abundance
    na_mask = _is_na(ir_obs[groupby]) | _is_na(ir_obs[target_col])
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3631, in get_loc
    raise KeyError(key) from err
KeyError: 'receptor_subtype'
/snap/pycharm-community/298/plugins/python-ce/helpers/pydev/_pydevd_bundle/pydevd_resolver.py:178: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
  attr = getattr(var, n)

Could you help me?

Francesco

KeyError: 'receptor_subtype'

This sounds as if you didn’t run ir.tl.chain_qc before plotting… any chance you didn’t do this?


The rest of your code snippet looks fine, however

for tcr in range(len(tcrs)):
    tcrs[tcr]

could be simplified to

for tcr in tcrs:
   tcr

Thanks for your reply. I did forget to paste ir.tl.chain_qc in the snippet.
Sorry for that. However, even running the missing line I get the following
group_abundance

ok, could you please take a look at

tcrs[0].obs.loc[:, ["IR_VJ_1_locus", "IR_VJ_1_junction_aa", "IR_VDJ_1_locus", "IR_VDJ_1_junction_aa"]]

and

adata.obs.loc[:, ["IR_VJ_1_locus", "IR_VJ_1_junction_aa", "IR_VDJ_1_locus", "IR_VDJ_1_junction_aa"]]

and post the results here?

It seems you found the issue. How can I fix it?

tcrs[0].obs.loc[:, ["IR_VJ_1_locus", "IR_VJ_1_junction_aa", "IR_VDJ_1_locus", "IR_VDJ_1_junction_aa"]]
                             IR_VJ_1_locus  ... IR_VDJ_1_junction_aa
cell_id                                     ...                     
p219_base_AAACCTGAGAGCTGCA-1           TRA  ...       CASSLQGGNTEAFF
p219_base_AAACCTGAGGGATCTG-1           TRA  ...      CASSLGQGAGQPQHF
p219_base_AAACCTGAGTGGCACA-1           TRA  ...       CASRDPGQRHGYTF
p219_base_AAACCTGCAAACGCGA-1           TRA  ...       CASSLRGGFYEQYF
p219_base_AAACCTGCAGCCTTTC-1           TRA  ...      CASSLGQGPGAEAFF
                                    ...  ...                  ...
p219_base_TTTGTCAGTAGCTCCG-1           TRA  ...      CASSLSDRDNSPLHF
p219_base_TTTGTCAGTGCAGGTA-1           TRA  ...       CSALTGEIVEKLFF
p219_base_TTTGTCAGTTCAGTAC-1           TRA  ...       CSARDYRAVSPLHF
p219_base_TTTGTCATCATGCATG-1           TRA  ...                  NaN
p219_base_TTTGTCATCGTCTGCT-1           TRA  ...        CASSEEGYYEQYF
[5016 rows x 4 columns]
adata.obs.loc[:, ["IR_VJ_1_locus", "IR_VJ_1_junction_aa", "IR_VDJ_1_locus", "IR_VDJ_1_junction_aa"]]
                           IR_VJ_1_locus  ... IR_VDJ_1_junction_aa
p229_base_AAACCTGAGACGCACA           NaN  ...                  NaN
p229_base_AAAGATGCAAAGTGCG           NaN  ...                  NaN
p229_base_AAAGCAAAGTGGACGT           NaN  ...                  NaN
p229_base_AAATGCCCATTCTCAT           NaN  ...                  NaN
p229_base_AACACGTCACAACTGT           NaN  ...                  NaN
                                  ...  ...                  ...
pV03_post_TTGGCAAAGTGTGAAT           NaN  ...                  NaN
pV03_post_TTTACTGGTAGAAGGA           NaN  ...                  NaN
pV03_post_TTTATGCCACTTCTGC           NaN  ...                  NaN
pV03_post_TTTGCGCCAGGGTATG           NaN  ...                  NaN
pV03_post_TTTGTCACATTTCACT           NaN  ...                  NaN
[2978 rows x 4 columns]

It’s still not entirely clear to me what’s causing the problem.

First, could you please also check

tcr.obs.loc[:, ["IR_VJ_1_locus", "IR_VJ_1_junction_aa", "IR_VDJ_1_locus", "IR_VDJ_1_junction_aa"]]

to rule out that the problem already arises during concatenation?

Second, do you happen to have a batch column in adata.obs, i.e. does one of the following exist?

tcr.obs["batch"]
adata.obs["batch"]

If it exists, the batch column is implicitly included in the join of merge_with_ir, so the batch column would need to be consistent between the two objects, too.

This:

tcr.obs["batch"]
adata.obs["batch"]

was the problem. I removed the batch variable from adata.obs and it is working.
Only one question. How should I interpret the no IR receptor_subtype?

Is this the portion of cells without an identified TCR?

Thanks!

Is this the portion of cells without an identified TCR?

exactly. Cells with only non-productive TCRs would also fall in this category, though.

1 Like

Hello grst,
Thank you again for your reply.
How can I remove then the no IR subtype?

One last question about the Query epitope databases section. How can one plot only for example the CMV antigen.specie? Also, what would be the way to plot the antigen.species onto the 2D UMAP as density instead of dots (if possible) and also as boxplots (showing the proportion of cells CMV specific in a specific subset)?

Thanks!

How can I remove then the no IR subtype?

You can simply subset the AnnData object to exclude a certain category, e.g.

adata_filtered = adata[adata.obs["receptor_subtype"] != "no IR", :]

How can one plot only for example the CMV antigen.specie?

Assuming you mean on a UMAP plot, you can specify the groups keyword (see umap docs).

Also, what would be the way to plot the antigen.species onto the 2D UMAP as density instead of dots (if possible)

I never tried this, but I think this would be possible with scanpy’s embedding_density function.

and also as boxplots (showing the proportion of cells CMV specific in a specific subset)?

This is beyond the scope of scverse functions, but you can easily use the dataframe in adata.obs and use it with a plotting library of your choice, e.g. seaborn or altair. Here’s an example with seaborn using the adata object with annotated epitopes from the end of the scirpy tutorial. It generates a boxplot of cell-fractions split by sample origin (Blood vs. normal adjacent vs. Tumor) colored by antigen species.

import matplotlib.pyplot as plt
import seaborn as sns

antigen_per_patient = (
    adata.obs.groupby(["source", "patient"])
    .apply(
        lambda x: x["antigen.species"]
        .value_counts(dropna=False, normalize=True)
        .rename_axis("antigen.species")
    )
    .reset_index(name="fraction")
)

sns.boxplot(data=antigen_per_patient, hue="antigen.species", x="source", y="fraction")
plt.legend(bbox_to_anchor=(1.04, 1))

You can also use the antigen_per_patient dataframe to perform statistics with e.g. statsmodels or pingouin, or simply export it to R if you are more familiar with it.