Integrating TRUST4 outputs into anndata GEX

Hi, I love this community, but I am struggling to integrate the scRNA-seq VDJ annotations coming out of TRUST4 with my GEX data.

I mostly use Python, so I tried dandelion (the read_airr function), but it doesn't recognize TRUST4's barcode_airr.tsv output file and raises a ValueError: Unable to initialize metadata due to missing keys. Please ensure either 'umi_count' or 'duplicate_count' is in the input data.
Shouldn't AIRR be a specific, standardized output?

Can you help me with this? What is the best way to integrate TRUST4 output into an anndata object? I tried scirpy as well, but it seems quite difficult to extract the TRAV/TRBV CDR3 amino acid sequences (cdr3aa) of paired receptors into adata.obs.

Hi,

it seems the TRUST4 AIRR files indeed do not contain a duplicate_count or umi_count field. Under the AIRR standard this is fine, as the field is optional. I'm still surprised the field is missing, since TRUST4 does support UMIs. Did you use the --UMI flag when running it?

If dandelion can’t read that file, it’s an issue on their end because it is a compliant AIRR file. The maintainer doesn’t seem to be on this discourse, but you can open an issue in their repo.

I'd expect scirpy to be able to read that file (if not, let me know, I'm the maintainer). It shouldn't be hard to add any of the information to adata.obs using scirpy.get.airr_context, see Accessing AIRR data in the documentation.

Essentially, you can add any variable you need to .obs temporarily using ir.get.airr_context:

import scirpy as ir

with ir.get.airr_context(adata, ["junction_aa", "v_call"], ["VJ_1", "VDJ_1"]):
    display(adata.obs)  # the extra AIRR columns exist only inside this block

P.S. Please indicate if you are cross-posting on different platforms: Integration of outputs of scRNA VDJ from TRUST4 with GEX · Issue #342 · liulab-dfci/TRUST4 · GitHub. Not doing so may lead to duplicated effort by community members, who often answer such posts and issues in their free time.

Hi,

thank you for your help, I really appreciate it. The TRUST4 output does have a column with the UMI counts when run with --UMI; I just needed to rename it (see Integration of outputs of scRNA VDJ from TRUST4 with GEX · Issue #342 · liulab-dfci/TRUST4 · GitHub).
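For anyone hitting the same error, the rename can be sketched roughly as follows. This is a minimal sketch that works on a small in-memory stand-in for barcode_airr.tsv. The source column name `consensus_count` and the helper `rename_column` are assumptions for illustration, so check the header of your own file (and the linked issue) for the actual column to rename.

```python
import csv
import io

# Stand-in for a TRUST4 barcode_airr.tsv file.
# ASSUMPTION: "consensus_count" is a placeholder for whichever column holds the UMI counts.
SAMPLE_TSV = (
    "cell_id\tjunction_aa\tconsensus_count\n"
    "AAACCTGAGA-1\tCASSLGQGNTEAFF\t12\n"
    "AAACCTGCAT-1\tCAVRDSNYQLIW\t7\n"
)


def rename_column(tsv_text, old, new):
    """Return the TSV text with the header column `old` renamed to `new`."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header = [new if name == old else name for name in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])  # data rows are copied unchanged
    return out.getvalue()


renamed = rename_column(SAMPLE_TSV, "consensus_count", "umi_count")
print(renamed.splitlines()[0])
```

The renamed text can then be written back to a .tsv file and passed to the AIRR reader of your choice.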

I am new to scRNA-seq data analysis and I am struggling to bring all my data together. I have GEX data from 10x that I processed with Cell Ranger, and I also have ab_vdj and gd_vdj libraries from the same dataset, which I annotated with TRUST4.
I processed and cleaned the GEX dataset using anndata and scanpy, but I am struggling to integrate the TRUST4 output with the adata object so that I can do UMAP and clonotype analysis.

Since I have two VDJ datasets, do I need to concatenate them before using muon with VDJ + GEX in scirpy? Additionally, some barcodes overlap between ab_vdj and gd_vdj, and I don't know whether to exclude them.

I tried ir.get.airr_context(), but to be honest I don't really understand what this function does. I got a table out of this line of code [with ir.get.airr_context(adata, ["junction_aa", "v_call"], ["VJ_1", "VDJ_1"]): display(adata.obs)], but the adata object wasn't modified.

What do you advise me to do? Any help would be appreciated.

Thanks,

Daniel

Since I have two VDJ datasets, do I need to concatenate them before using muon with VDJ + GEX in scirpy?

You don’t need muon, but you can build a mudata object from the gex and the vdj anndata objects. An example is shown here.

Additionally, some barcodes overlap between ab_vdj and gd_vdj, and I don't know whether to exclude them.

Typically I would integrate ab and gd into a single dataset and then run scirpy.tl.chain_qc to get labels you can use for filtering. If you loaded ab and gd into separate anndata objects, you can merge them using scirpy.pp.merge_airr:

import scirpy as ir
from mudata import MuData

adata_airr = ir.pp.merge_airr(adata_ab, adata_gd)
mdata = MuData({"gex": adata_gex, "airr": adata_airr})
ir.tl.chain_qc(mdata)

I tried ir.get.airr_context(), but to be honest I don't really understand what this function does. I got a table out of this line of code [with ir.get.airr_context(adata, ["junction_aa", "v_call"], ["VJ_1", "VDJ_1"]): display(adata.obs)], but the adata object wasn't modified.

airr_context adds the information to .obs only temporarily, while you are inside the context manager. This avoids overloading your .obs with tons of columns. Within the context manager you can use the AIRR variables for whatever you'd like to do, e.g. plot gene usage. See an example in the scirpy tutorial.
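If the "temporary" behaviour is unclear, the general pattern can be illustrated with a toy context manager. This is only a sketch of the idea and not scirpy's actual implementation: a plain dict stands in for .obs, and the helper name `temporary_columns` is made up for this example. It also assumes the added column names do not collide with existing ones.

```python
from contextlib import contextmanager


@contextmanager
def temporary_columns(table, extra):
    """Add the columns in `extra` to `table` (a dict) and remove them again on exit."""
    table.update(extra)
    try:
        yield table
    finally:
        # Clean up even if the body of the `with` block raised an error.
        for name in extra:
            table.pop(name, None)


obs = {"leiden": ["0", "1"]}  # toy stand-in for adata.obs

with temporary_columns(obs, {"VJ_1_junction_aa": ["CAVRDSNYQLIW", "CAASGGSYIPTF"]}):
    inside = sorted(obs)  # column names visible inside the block

after = sorted(obs)  # column names once the block has ended

print(inside)
print(after)
```

This is why the table looks complete inside the `with` block while the object appears unmodified afterwards.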

If you really want to add the information permanently, you can use ir.get.airr to get a data frame and join it with adata.obs.
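A rough sketch of that join, using small pandas data frames as stand-ins so it runs without any data loaded. The toy barcodes and sequences are made up, and the `VJ_1_junction_aa`-style column names are an assumption about what ir.get.airr returns for multiple chains. With real data you would replace the stand-ins with adata.obs and the result of the ir.get.airr call shown in the comment.

```python
import pandas as pd

# Stand-in for adata.obs (indexed by cell barcode).
obs = pd.DataFrame(
    {"leiden": ["0", "1", "1"]},
    index=["cell1", "cell2", "cell3"],
)

# Stand-in for:
#   airr_df = ir.get.airr(adata, ["junction_aa"], ["VJ_1", "VDJ_1"])
# ASSUMPTION: columns are named "<chain>_<variable>"; cell3 has no receptor data.
airr_df = pd.DataFrame(
    {
        "VJ_1_junction_aa": ["CAVRDSNYQLIW", "CAASGGSYIPTF"],
        "VDJ_1_junction_aa": ["CASSLGQGNTEAFF", "CASSPDRGYEQYF"],
    },
    index=["cell1", "cell2"],
)

# A left join keeps every cell in obs; cells without AIRR data get NaN.
obs_joined = obs.join(airr_df, how="left")
print(obs_joined)
```

With real objects, the last step would be assigning the joined frame back, e.g. `adata.obs = adata.obs.join(airr_df)`.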