Integrating TRUST4 outputs into anndata GEX

fonseca.dfm · January 15, 2025, 2:58pm

Hi, I love this community, but I am struggling to integrate the scRNA VDJ annotations coming out of TRUST4 with my GEX data.

I am mostly using Python, so I tried dandelion (the read_airr function), but it doesn't recognize TRUST4's barcode_airr.tsv output. It raises: ValueError: Unable to initialize metadata due to missing keys. Please ensure either 'umi_count' or 'duplicate_count' is in the input data.
Shouldn’t AIRR be a specific standardized output?

Can you help me with this? What is the best way to integrate TRUST4 output into an anndata object? I tried scirpy as well, but it seems quite difficult to extract the TRAV/TRBV CDR3aa of paired receptors into adata.obs.

grst · January 15, 2025, 4:23pm

Hi,

it seems the TRUST4 AIRR file indeed does not contain a duplicate_count or umi_count field. As per the AIRR standard, this is OK, as the field is optional. I'm still surprised it doesn't include the field, as TRUST4 does support UMIs. Did you use the --UMI flag when running it?

If dandelion can’t read that file, it’s an issue on their end because it is a compliant AIRR file. The maintainer doesn’t seem to be on this discourse, but you can open an issue in their repo.

I'd expect scirpy to be able to read that file (if not, let me know, I'm the maintainer). It shouldn't be hard to add any of the information to adata.obs using scirpy.get.airr_context, see Accessing AIRR data in the documentation.

Essentially, you can add any variable you need to .obs temporarily using ir.get.airr_context:

with ir.get.airr_context(adata, ["junction_aa", "v_call"], ["VJ_1", "VDJ_1"]):
    display(adata.obs)

P.S. Please indicate if you are cross-posting on different platforms: Integration of outputs of scRNA VDJ from TRUST4 with GEX · Issue #342 · liulab-dfci/TRUST4 · GitHub. Not doing so may lead to duplicated effort by community members, who often answer such posts and issues in their free time.

fonseca.dfm · January 22, 2025, 12:02pm

Hi,

thank you for your help, I really appreciate it. The TRUST4 output does have a column with the UMI count, I just needed to rename it (see Integration of outputs of scRNA VDJ from TRUST4 with GEX · Issue #342 · liulab-dfci/TRUST4 · GitHub).
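For anyone hitting the same error, the rename can be sketched roughly like this. It only rewrites the header of the TSV so that a reader expecting duplicate_count finds it. The source column name used here (consensus_count) is an assumption, as is the helper name rename_count_column, so check the header of your own barcode_airr.tsv first.

```python
import csv
import io

# Assumed name of the UMI-count column in the TRUST4 output (check your header).
TRUST4_COUNT_COLUMN = "consensus_count"


def rename_count_column(tsv_text, old=TRUST4_COUNT_COLUMN, new="duplicate_count"):
    """Return tsv_text with the header field `old` renamed to `new`."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    # Leave the text untouched if there is nothing to rename.
    if not rows or new in rows[0] or old not in rows[0]:
        return tsv_text
    rows[0] = [new if name == old else name for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Tiny made-up example, not real TRUST4 output.
example = "cell_id\tjunction_aa\tconsensus_count\nAAAC-1\tCASSL\t12\n"
fixed = rename_count_column(example)
```

On a real file you would read the text from disk, pass it through the helper, and write the result to a new file before loading it.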

I am new to scRNA data analysis and I am struggling to bring all my data together. I have GEX data from 10x that I processed with Cell Ranger, and I also have ab_vdj and gd_vdj libraries from the same dataset, which I annotated with TRUST4.
I processed and cleaned the GEX dataset using anndata and scanpy, but I am struggling to integrate the TRUST4 output with the adata object in order to do UMAP and clonotype analysis.

Since I have two VDJ datasets, do I need to concatenate them before using muon with VDJ + GEX in scirpy? Additionally, some barcodes overlap between ab_vdj and gd_vdj, and I don't know whether to exclude them.

I tried ir.get.airr_context(), but to be honest I don't really understand what this function is doing. I got a table out of this line of code [with ir.get.airr_context(adata, ["junction_aa", "v_call"], ["VJ_1", "VDJ_1"]): display(adata.obs)], but the adata object wasn't modified.

What do you advise me to do? Any help would be appreciated.

Thanks,

Daniel

grst · January 22, 2025, 6:27pm

fonseca.dfm: since I have two vdj datasets, do I need to concat them before using muon with vdj + gex in scirpy?

You don’t need muon, but you can build a mudata object from the gex and the vdj anndata objects. An example is shown here.

fonseca.dfm: Additionally I have a bit of an overlap of barcodes for the ab_vdj and gd_vdj and I dont know if to exclude them.

Typically I would integrate ab and gd into a single dataset and then run scirpy.tl.chain_qc to get labels you can use for filtering. If you loaded ab and gd into separate anndata objects, you can merge them using scirpy.pp.merge_airr:

import scirpy as ir
from mudata import MuData

adata_airr = ir.pp.merge_airr(adata_ab, adata_gd)
mdata = MuData({"gex": adata_gex, "airr": adata_airr})
ir.tl.chain_qc(mdata)
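To get a feel for how large the barcode overlap is before merging, a quick set comparison is enough. This is only a sketch with made-up barcodes, and barcode_overlap is a hypothetical helper, not part of scirpy.

```python
def barcode_overlap(ab_barcodes, gd_barcodes):
    """Summarize which cell barcodes are shared between two VDJ libraries."""
    ab, gd = set(ab_barcodes), set(gd_barcodes)
    return {
        "shared": sorted(ab & gd),   # barcodes present in both libraries
        "ab_only": len(ab - gd),     # barcodes only in the ab library
        "gd_only": len(gd - ab),     # barcodes only in the gd library
    }


# Made-up barcodes for illustration.
ab = ["AAAC-1", "AAAG-1", "AATT-1"]
gd = ["AAAG-1", "CCCG-1"]
report = barcode_overlap(ab, gd)
```

With real data you could pass adata_ab.obs_names and adata_gd.obs_names instead of the lists. Whether to drop the shared barcodes is then a decision to make from the chain_qc labels, not from the overlap alone.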

fonseca.dfm: I tried ir.get.airr_context(), but tbh I dont really understand what this function is doing… I got a table out this line of code [with ir.get.airr_context(adata, ["junction_aa", "v_call"], ["VJ_1", "VDJ_1"]): display(adata.obs)] but the adata object wasnt modified.

airr_context only adds the information to .obs temporarily, while you are inside the context manager. This avoids overloading your .obs with tons of columns. Within the context manager you can use the AIRR variables for whatever you'd like to do, e.g. plot gene usage. See an example in the scirpy tutorial.
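If the "temporary" part is confusing, here is a toy illustration of the pattern using a plain dict of columns. It is not scirpy's implementation, and temporary_columns is a made-up name. It only shows why the object looks unchanged once the with block ends.

```python
from contextlib import contextmanager


@contextmanager
def temporary_columns(table, extra):
    """Add the columns in `extra` to `table`, then remove them again on exit."""
    added = [name for name in extra if name not in table]
    table.update({name: extra[name] for name in added})
    try:
        yield table
    finally:
        # Clean up even if the body of the with block raised an error.
        for name in added:
            del table[name]


obs = {"leiden": ["0", "1"]}                     # stand-in for adata.obs
airr = {"VJ_1_junction_aa": ["CAVRD", "CAASG"]}  # stand-in AIRR columns

with temporary_columns(obs, airr):
    inside = sorted(obs)   # extra column is visible here
after = sorted(obs)        # and gone again here
```

So anything that needs the extra columns (plots, groupby, filtering) has to happen inside the with block.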

If you really want to add the information permanently, you can use ir.get.airr to get a data frame and join it with adata.obs.
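The join itself is ordinary pandas. The sketch below uses small stand-in data frames so it runs on its own. The column names of airr_df are an assumption about what a call such as ir.get.airr(adata, ["junction_aa"], ["VJ_1", "VDJ_1"]) returns, so inspect the real output before relying on them.

```python
import pandas as pd

# Stand-in for adata.obs (index = cell barcodes).
obs = pd.DataFrame(
    {"leiden": ["0", "1", "1"]},
    index=["AAAC-1", "AAAG-1", "AATT-1"],
)

# Stand-in for the data frame with AIRR variables; column names are assumed.
airr_df = pd.DataFrame(
    {
        "VJ_1_junction_aa": ["CAVRD", None],
        "VDJ_1_junction_aa": ["CASSL", "CASSP"],
    },
    index=["AAAC-1", "AATT-1"],
)

# Left join on the index keeps every cell; cells without a receptor get NaN.
obs_with_airr = obs.join(airr_df, how="left")
```

With a real object the last step would be something like adata.obs = adata.obs.join(airr_df), after which the columns stay in .obs permanently.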
