Hi,
I’m using scirpy
for computing the distances between CDR3 AA sequences.
For that I’ve called pp.ir_dist
specifying metric = 'alignment'
, sequence = 'aa'
, and cutoff = 15
.
Trying to understand the behavior, I looked at an example entry in the CDR33 AA distance matrix that was created in my mdata
object:
mdata["airr"].uns["ir_dist_aa_alignment"]["VDJ"]["distances"][6,8]
12.0
The corresponding CDR3 AA sequences are:
mdata["airr"].uns["ir_dist_aa_alignment"]["VDJ"]["seqs"][6]
'AARGTGGITAQLV'
mdata["airr"].uns["ir_dist_aa_alignment"]["VDJ"]["seqs"][8]
'AARTGGITAQLV'
If I run R
’s Biostrings::pairwiseAlignment
, using BLOSUM62
as the substitution matrix, with gapOpening = -11
and gapExtension = -11
(consistent with scirpy
’s ir_dist_aa_alignment
parameters) I get:
Biostrings::pairwiseAlignment('AARGTGGITAQLV','AARGTGGITAQLV',substitutionMatrix="BLOSUM62",gapOpening=-11,gapExtension=-11)
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: AARGTGGITAQLV
subject: AARGTGGITAQLV
score: 62
Biostrings::pairwiseAlignment('AARTGGITAQLV','AARTGGITAQLV',substitutionMatrix="BLOSUM62",gapOpening=-11,gapExtension=-11)
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: AARTGGITAQLV
subject: AARTGGITAQLV
score: 56
Biostrings::pairwiseAlignment('AARGTGGITAQLV','AARTGGITAQLV',substitutionMatrix="BLOSUM62",gapOpening=-11,gapExtension=-11)
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: AARGTGGITAQLV
subject: AAR-TGGITAQLV
score: 34
Following scirpy
’s AlignmentDistanceCalculator
function, the distance should be:
min(62,56) - 34 = 22, and with the offset of 1, the entry in mdata["airr"].uns["ir_dist_aa_alignment"]["VDJ"]["distances"][6,8]
should therefore be 23 (or actually 0 since 23 > cutoff = 15
I called the pp.ir_dist
function with.
Can anyone help pointing me in the right direction?