TCR Metrics for Pairwise Distance Matrix

grst · January 26, 2024, 7:06am

First, I want to make the distinction between “clonotypes” and “clonotype clusters” clear:

A clonotype refers to T cells with the same origin and exactly the same CDR3 nucleotide sequence.
A clonotype cluster is a group of similar clonotypes that likely recognize the same epitope, as defined by some distance metric.

This implies that for defining clonotypes, the only relevant metric is the identity metric.
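
To make the identity criterion concrete, here is a minimal sketch in plain Python. It is not scirpy's implementation: the function name `group_by_identity`, the cell IDs and the short nucleotide strings are made up for illustration, and only a single chain per cell is considered.

```python
from collections import defaultdict


def group_by_identity(cdr3_nt_by_cell):
    """Group cell IDs that share exactly the same CDR3 nucleotide sequence.

    Hypothetical helper for illustration only; a real analysis would also
    consider both receptor arms and dual receptors.
    """
    groups = defaultdict(list)
    for cell_id, cdr3_nt in cdr3_nt_by_cell.items():
        groups[cdr3_nt].append(cell_id)
    return dict(groups)


# Toy data (assumed, not from a real dataset). Both sequences translate to
# the amino acids CASSL, but they differ in one nucleotide (AGT vs. AGC).
cells = {
    "cell_1": "TGTGCCAGCAGTTTA",
    "cell_2": "TGTGCCAGCAGTTTA",
    "cell_3": "TGTGCCAGCAGCTTA",
}

clonotypes = group_by_identity(cells)
# cell_1 and cell_2 form one clonotype; cell_3 is a separate clonotype,
# even though its amino acid sequence is the same.
```

Under the identity metric there is no notion of "close": a single synonymous nucleotide change already puts cell_3 into a different group.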

As for defining clonotype clusters, clonotype networks, and database queries, I don’t think that any of the metrics available in scirpy behaves fundamentally differently. Ultimately, we don’t know which metric is best (that is, which one best captures whether receptors recognize the same antigen), because there is insufficient gold-standard data for benchmarking. It is very likely, though, that the “alignment” metric captures this better than the “levenshtein” distance, because it takes the properties of the individual amino acids into account (at a higher computational cost). The “hamming” distance is more useful for B cells than for T cells.
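
For intuition about how the simpler metrics differ, here is a rough sketch of Hamming and Levenshtein distances in plain Python. These are textbook versions written for illustration, not scirpy's code, and the example CDR3 sequences are made up. The alignment metric is left out because it additionally needs a substitution matrix and gap penalties.

```python
def hamming_distance(a, b):
    """Number of mismatching positions.

    Only defined for sequences of equal length; this sketch returns None
    otherwise (an assumption made here for illustration).
    """
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a, b):
    """Minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion from a
                    curr[j - 1] + 1,  # insertion into a
                    prev[j - 1] + (char_a != char_b),  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


reference = "CASSLGTDTQYF"
substituted = "CASSLGADTQYF"  # one substitution (T -> A)
shortened = "CASSLGDTQYF"  # one deletion (the first T after G)

sub_hamming = hamming_distance(reference, substituted)  # 1
sub_levenshtein = levenshtein_distance(reference, substituted)  # 1
del_hamming = hamming_distance(reference, shortened)  # None, lengths differ
del_levenshtein = levenshtein_distance(reference, shortened)  # 1
```

Both functions count every substitution as 1, regardless of which amino acids are exchanged. That is the limitation an alignment-based score addresses, since it can penalize a conservative exchange less than a drastic one.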

The overall network structure depends more on the distance threshold you set (a higher cutoff will lead to larger network components) and on whether you set receptor_arms and dual_ir to all or any. This is also demonstrated to some extent in this thread: Is it necessary to remove orphan-VJ/VDJ cells in practice?
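
The effect of the cutoff and of the all/any choice can be shown with a toy model. The sketch below is an assumption-laden simplification, not scirpy's algorithm: each cell has exactly one VJ and one VDJ CDR3 (so dual_ir is not modelled), two cells are linked when the Levenshtein distance of their arms is within the cutoff, and clusters are the connected components of the resulting graph. The names `linked` and `components` and all sequences are made up.

```python
from itertools import combinations


def levenshtein(a, b):
    """Textbook edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(
                min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (char_a != char_b))
            )
        prev = curr
    return prev[-1]


def linked(cell_a, cell_b, cutoff, receptor_arms):
    """Decide whether two (VJ, VDJ) pairs are connected in the toy network."""
    within = [levenshtein(x, y) <= cutoff for x, y in zip(cell_a, cell_b)]
    return all(within) if receptor_arms == "all" else any(within)


def components(cells, cutoff, receptor_arms="all"):
    """Connected components (as sorted lists of cell indices) via union-find."""
    parent = list(range(len(cells)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(cells)), 2):
        if linked(cells[i], cells[j], cutoff, receptor_arms):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(cells)):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(group) for group in groups.values())


# Toy (VJ CDR3, VDJ CDR3) pairs, assumed for illustration.
cells = [
    ("CAVRDSNYQLIW", "CASSLGTDTQYF"),
    ("CAVRDSNYQLIW", "CASSLGADTQYF"),  # VDJ: 1 substitution away from cell 0
    ("CAVRDSNYQLIW", "CASSPGADTQYF"),  # VDJ: 1 away from cell 1, 2 from cell 0
    ("CAASGGSYIPTF", "CASSLGTDTQYF"),  # same VDJ as cell 0, unrelated VJ
]

strict = components(cells, cutoff=0, receptor_arms="all")  # [[0], [1], [2], [3]]
relaxed = components(cells, cutoff=1, receptor_arms="all")  # [[0, 1, 2], [3]]
any_arm = components(cells, cutoff=0, receptor_arms="any")  # [[0, 1, 2, 3]]
```

In this toy data, raising the cutoff from 0 to 1 merges three singletons into one component (cells 0 and 2 end up together only through cell 1), and switching from "all" to "any" pulls in cell 3 even at a cutoff of 0. Both settings change the network far more than swapping one edit-distance-like metric for another would.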

Hope that helps!
