Not able to load TCR data

france · September 23, 2022, 12:54am

Hello!
I am trying to analyze my TCR data by using scirpy but I am running into an error. Quick note, I analyze my sc data using Seurat and then converted Seurat in AnnData. This worked to run scvelo but I think I am doing something wrong in loading the filtered_contig_annotations.

Here is my code:

adata = scv.read("sobjscv.h5ad")

filenames = ['p219_base', 'p229_base', 'p264_base', 'pV03_base', 'pV09_base', 'p219_post', 'p229_post', 'p264_post', 'pV03_post', 'pV09_post']
tcrs = [ir.io.read_10x_vdj(filename) for filename in filenames]
for tcr in range(len(tcrs)):
    tcrs[tcr].obs_names = filenames[tcr] + "_" + tcrs[tcr].obs_names
    print(tcrs[tcr].obs_names)
tcr = tcrs[0].concatenate(tcrs[1:])
adata.obs_names= [c.split("-")[0] for c in adata.obs_names]
tcr.obs_names = [d.split("-")[0] for d in tcr.obs_names]
ir.pp.merge_with_ir(adata, tcr)

p219_base, p229_base, p264_base, pV03_base, pV09_base, p219_post, p229_post, p264_post, pV03_post, pV09_post are the names of my csv contigs.

I tried to make adata.obs_names and tcr.obs_names the same:

adata.obs_names
Index(['p229_base_AAACCTGAGACGCACA', 'p229_base_AAACCTGCACCGAAAG',
       'p229_base_AAACCTGGTCTACCTC', 'p229_base_AAACCTGGTGGTAACG',
       'p229_base_AAACCTGTCAGCAACT', 'p229_base_AAACCTGTCCCTAATT',
       'p229_base_AAACCTGTCTTGCATT', 'p229_base_AAACGGGAGCAATATG',
       'p229_base_AAACGGGGTTATTCTC', 'p229_base_AAACGGGTCTAACCGA',
       ...
       'pV03_post_TTTGGTTTCAAGATCC', 'pV03_post_TTTGGTTTCATCGCTC',
       'pV03_post_TTTGTCAAGAAGAAGC', 'pV03_post_TTTGTCAAGCCCTAAT',
       'pV03_post_TTTGTCAAGTGTACCT', 'pV03_post_TTTGTCACAAAGAATC',
       'pV03_post_TTTGTCACAGTGAGTG', 'pV03_post_TTTGTCACATTTCACT',
       'pV03_post_TTTGTCATCCCTCAGT', 'pV03_post_TTTGTCATCGCTGATA'],
      dtype='object', length=50808)
tcr.obs_names
Index(['p219_base_AAACCTGAGAGCTGCA', 'p219_base_AAACCTGAGGGATCTG',
       'p219_base_AAACCTGAGTGGCACA', 'p219_base_AAACCTGCAAACGCGA',
       'p219_base_AAACCTGCAGCCTTTC', 'p219_base_AAACCTGCATTGCGGC',
       'p219_base_AAACCTGGTCCAAGTT', 'p219_base_AAACCTGGTGGTCTCG',
       'p219_base_AAACCTGTCACCCGAG', 'p219_base_AAACCTGTCACTGGGC',
       ...
       'pV09_post_TTTGTCACAAGAGGCT', 'pV09_post_TTTGTCACAGACAAAT',
       'pV09_post_TTTGTCACAGGTCTCG', 'pV09_post_TTTGTCAGTAGGACAC',
       'pV09_post_TTTGTCAGTCATATGC', 'pV09_post_TTTGTCAGTCGACTAT',
       'pV09_post_TTTGTCAGTCTTCTCG', 'pV09_post_TTTGTCAGTGCCTGTG',
       'pV09_post_TTTGTCATCCATTCTA', 'pV09_post_TTTGTCATCGGAATCT'],
      dtype='object', length=36086)

Then if I run

ax = ir.pl.group_abundance(adata, groupby="receptor_subtype", target_col="group_id")
Traceback (most recent call last):
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3629, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'receptor_subtype'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/scirpy/io/_util.py", line 71, in check_wrapper
    return f(*args, **kwargs)
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/scirpy/pl/_group_abundance.py", line 62, in group_abundance
    abundance = tl.group_abundance(
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/scirpy/io/_util.py", line 71, in check_wrapper
    return f(*args, **kwargs)
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/scirpy/tl/_group_abundance.py", line 110, in group_abundance
    return _group_abundance(
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/scirpy/tl/_group_abundance.py", line 20, in _group_abundance
    na_mask = _is_na(ir_obs[groupby]) | _is_na(ir_obs[target_col])
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/francesco/Documents/AML_project/scRNA_AMLproj/python_scRNAseq/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3631, in get_loc
    raise KeyError(key) from err
KeyError: 'receptor_subtype'
/snap/pycharm-community/298/plugins/python-ce/helpers/pydev/_pydevd_bundle/pydevd_resolver.py:178: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
  attr = getattr(var, n)

Could you help me?

Francesco

grst · September 23, 2022, 6:11am

KeyError: 'receptor_subtype'

This sounds as if you didn’t run ir.tl.chain_qc before plotting… any chance you didn’t do this?

The rest of your code snippet looks fine, however

for tcr in range(len(tcrs)):
    tcrs[tcr]

could be simplified to

for tcr in tcrs:
   tcr

france · September 23, 2022, 4:05pm

Thanks for your reply. I did forget to paste ir.tl.chain_qc in the snippet.
Sorry for that. However, even running the missing line I get the following
group_abundance

grst · September 23, 2022, 5:52pm

ok, could you please take a look at

tcrs[0].obs.loc[:, ["IR_VJ_1_locus", "IR_VJ_1_junction_aa", "IR_VDJ_1_locus", "IR_VDJ_1_junction_aa"]]

and

adata.obs.loc[:, ["IR_VJ_1_locus", "IR_VJ_1_junction_aa", "IR_VDJ_1_locus", "IR_VDJ_1_junction_aa"]]

and post the results here?

france · September 23, 2022, 9:09pm

It seems you found the issue. How can I fix it?

tcrs[0].obs.loc[:, ["IR_VJ_1_locus", "IR_VJ_1_junction_aa", "IR_VDJ_1_locus", "IR_VDJ_1_junction_aa"]]
                             IR_VJ_1_locus  ... IR_VDJ_1_junction_aa
cell_id                                     ...                     
p219_base_AAACCTGAGAGCTGCA-1           TRA  ...       CASSLQGGNTEAFF
p219_base_AAACCTGAGGGATCTG-1           TRA  ...      CASSLGQGAGQPQHF
p219_base_AAACCTGAGTGGCACA-1           TRA  ...       CASRDPGQRHGYTF
p219_base_AAACCTGCAAACGCGA-1           TRA  ...       CASSLRGGFYEQYF
p219_base_AAACCTGCAGCCTTTC-1           TRA  ...      CASSLGQGPGAEAFF
                                    ...  ...                  ...
p219_base_TTTGTCAGTAGCTCCG-1           TRA  ...      CASSLSDRDNSPLHF
p219_base_TTTGTCAGTGCAGGTA-1           TRA  ...       CSALTGEIVEKLFF
p219_base_TTTGTCAGTTCAGTAC-1           TRA  ...       CSARDYRAVSPLHF
p219_base_TTTGTCATCATGCATG-1           TRA  ...                  NaN
p219_base_TTTGTCATCGTCTGCT-1           TRA  ...        CASSEEGYYEQYF
[5016 rows x 4 columns]
adata.obs.loc[:, ["IR_VJ_1_locus", "IR_VJ_1_junction_aa", "IR_VDJ_1_locus", "IR_VDJ_1_junction_aa"]]
                           IR_VJ_1_locus  ... IR_VDJ_1_junction_aa
p229_base_AAACCTGAGACGCACA           NaN  ...                  NaN
p229_base_AAAGATGCAAAGTGCG           NaN  ...                  NaN
p229_base_AAAGCAAAGTGGACGT           NaN  ...                  NaN
p229_base_AAATGCCCATTCTCAT           NaN  ...                  NaN
p229_base_AACACGTCACAACTGT           NaN  ...                  NaN
                                  ...  ...                  ...
pV03_post_TTGGCAAAGTGTGAAT           NaN  ...                  NaN
pV03_post_TTTACTGGTAGAAGGA           NaN  ...                  NaN
pV03_post_TTTATGCCACTTCTGC           NaN  ...                  NaN
pV03_post_TTTGCGCCAGGGTATG           NaN  ...                  NaN
pV03_post_TTTGTCACATTTCACT           NaN  ...                  NaN
[2978 rows x 4 columns]

grst · September 24, 2022, 12:19pm

It’s still not entirely clear to me what’s causing the problem.

First, could you please also check

tcr.obs.loc[:, ["IR_VJ_1_locus", "IR_VJ_1_junction_aa", "IR_VDJ_1_locus", "IR_VDJ_1_junction_aa"]]

to rule out that the problem already arises during concatenation?

Second, do you happen to have a batch column in adata.obs, i.e. does one of the following exist?

tcr.obs["batch"]
adata.obs["batch"]

If it exists, the batch column is implicitly included in the join of merge_with_ir, so the batch column would need to be consistent between the two objects, too.

france · September 24, 2022, 2:56pm

This:

tcr.obs["batch"]
adata.obs["batch"]

was the problem. I removed the batch variable from adata.obs and it is working.
Only one question. How should I interpret the no IR receptor_subtype?

Is this the portion of cells without an identified TCR?

Thanks!

grst · September 24, 2022, 3:48pm

Is this the portion of cells without an identified TCR?

exactly. Cells with only non-productive TCRs would also fall in this category, though.

france · September 26, 2022, 7:10pm

Hello grst,
Thank you again for your reply.
How can I remove then the no IR subtype?

One last question about the Query epitope databases section. How can one plot only for example the CMV antigen.specie? Also, what would be the way to plot the antigen.species onto the 2D UMAP as density instead of dots (if possible) and also as boxplots (showing the proportion of cells CMV specific in a specific subset)?

Thanks!

grst · September 26, 2022, 8:03pm

How can I remove then the no IR subtype?

You can simply subset the AnnData object to exclude a certain category, e.g.

adata_filtered = adata[adata.obs["receptor_subtype"] != "no IR", :]

How can one plot only for example the CMV antigen.specie?

Assuming you mean on a UMAP plot, you can specify the groups keyword (see umap docs).

Also, what would be the way to plot the antigen.species onto the 2D UMAP as density instead of dots (if possible)

I never tried this, but I think this would be possible with scanpy’s embedding_density function.

and also as boxplots (showing the proportion of cells CMV specific in a specific subset)?

This is beyond the scope of scverse functions, but you can easily use the dataframe in adata.obs and use it with a plotting library of your choice, e.g. seaborn or altair. Here’s an example with seaborn using the adata object with annotated epitopes from the end of the scirpy tutorial. It generates a boxplot of cell-fractions split by sample origin (Blood vs. normal adjacent vs. Tumor) colored by antigen species.

import matplotlib.pyplot as plt
import seaborn as sns

antigen_per_patient = (
    adata.obs.groupby(["source", "patient"])
    .apply(
        lambda x: x["antigen.species"]
        .value_counts(dropna=False, normalize=True)
        .rename_axis("antigen.species")
    )
    .reset_index(name="fraction")
)

sns.boxplot(data=antigen_per_patient, hue="antigen.species", x="source", y="fraction")
plt.legend(bbox_to_anchor=(1.04, 1))

You can also use the antigen_per_patient dataframe to perform statistics with e.g. statsmodels or pingouin, or simply export it to R if you are more familiar with it.

Topic		Replies	Views
Problem with airr data loading scirpy	11	699	October 3, 2023
TypeError with ir.tl.define_clonotype_clusters scirpy	3	40	July 9, 2024
NotImplementedError: Cannot create a view of an awkward array with __array__ parameter scirpy anndata	9	72	November 6, 2024
Defines clonotype clusters using TCRdist distance matrix scirpy	3	265	May 7, 2024
Loading data into adata.obsm['airr'] of an existing scRNAseq object scirpy	2	343	July 24, 2023

Not able to load TCR data

Related topics