I’m working with AIRR data from BD Rhapsody and need some guidance. I’ve loaded data from a *_VDJ_Dominant_Contigs.csv file using the scirpy.io.read_bd_rhapsody function. However, after performing QC on my data using ir.tl.chain_qc, I’ve noticed a discrepancy. The BD pipeline identified 600-700 more cells with the TCR_Paired_Chains label in the *_VDJ_perCell.csv file than I found with my QC. The BD documentation states that TCR_Paired_Chains is True/False based on the presence of at least one error-corrected molecule of either TCR Alpha and TCR Beta or TCR Gamma and TCR Delta in the cell.
I’m guessing it has to do with the error correction they perform?
Does anyone have experience with this discrepancy or insights into why it might occur?
Thanks in advance for any help!
Is the difference already in the files produced by the pipeline, or do you think something goes wrong when loading the data into scirpy? If it’s the former, you may want to reach out to BD support. I was in contact with them once and they were quite helpful.
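One way to check this without scirpy is to count paired cells directly from the contig rows and compare them with the cells the pipeline flags. The following is only a minimal sketch on toy data: the column names `cell_id` and `chain`, the chain labels `TRA`/`TRB`/`TRG`/`TRD`, and the helper names are assumptions for illustration, not the actual BD column headers, so adapt them to your files (for example, after reading the rows with `csv.DictReader`).

```python
from collections import defaultdict

# Hypothetical column names; check the headers of your own files.
CELL_COL = "cell_id"
CHAIN_COL = "chain"


def paired_cells(contig_rows):
    """Return the cell IDs that have alpha+beta or gamma+delta chains."""
    chains_by_cell = defaultdict(set)
    for row in contig_rows:
        chains_by_cell[row[CELL_COL]].add(row[CHAIN_COL])
    return {
        cell
        for cell, chains in chains_by_cell.items()
        if {"TRA", "TRB"} <= chains or {"TRG", "TRD"} <= chains
    }


def compare_flags(contig_rows, bd_paired_cells):
    """Split cells into those flagged by only one source or by both."""
    own = paired_cells(contig_rows)
    bd = set(bd_paired_cells)
    return {"only_bd": bd - own, "only_contigs": own - bd, "both": own & bd}


# Toy data standing in for rows of the contig file (assumed layout).
contigs = [
    {"cell_id": "1", "chain": "TRA"},
    {"cell_id": "1", "chain": "TRB"},
    {"cell_id": "2", "chain": "TRB"},
    {"cell_id": "3", "chain": "TRG"},
    {"cell_id": "3", "chain": "TRD"},
]
# Toy stand-in for the cells marked TCR_Paired_Chains in the perCell file.
bd_flagged = {"1", "2", "3"}

result = compare_flags(contigs, bd_flagged)
print(sorted(result["only_bd"]))  # ['2']
```

If the cells in `only_bd` already lack a second chain in the contig file itself, the difference is in the pipeline output rather than in the loading step.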
More recent versions of the pipeline should also produce output in AIRR format, which would anyway be preferred over reading in the
Thank you for your prompt response.
It appears that the discrepancy is in the files produced by their pipeline itself. I’ve already reached out to them, and the bioinformatician assisting me has been incredibly helpful. I’m just awaiting his response since he’s currently on vacation, hence my decision to crowdsource in the meantime.
Regrettably, this older version doesn’t support the AIRR format for either the perCell or Dominant_Contigs files.
By the way, have you worked with BD Rhapsody data before? If so, I’d appreciate any advice or insights you might have.
If you find out something useful, please let me know and I’m happy to add it to the documentation.
Unfortunately, I don’t have any experience with BD TCR data except for writing the data loader.