Loading data into adata.obsm['airr'] of an existing scRNAseq object

I’m trying to load my TCR and BCR data into a pre-existing AnnData object containing scRNAseq gene expression data. Since the latest update, it appears that you can do this without using MuData. Is this correct, and how do you load the data in?

Thanks for your question!

First of all, are there any specific reasons not to use MuData? This is the recommended way of working with multimodal data in scverse, and I don’t think there are any downsides.

That said, if you want everything in a single AnnData object, you can do the following:

  1. Load each modality into a separate AnnData object, e.g.

    import scanpy as sc
    import scirpy as ir

    adata_gex = sc.read_10x_h5(...)  # gene expression
    adata_tcr = ir.io.read_10x_vdj(...)  # TCR contigs
    adata_bcr = ir.io.read_10x_vdj(...)  # BCR contigs
  2. Merge the TCR and BCR data into a single object:

    adata_airr = ir.pp.merge_airr(adata_tcr, adata_bcr)
  3. Add the AIRR data to the GEX object, as described in the docs. This discards all cells in the AIRR object that have no gene expression data:

    import awkward as ak

    # Map each cell barcode to its respective numeric index (assumes obs_names are unique)
    barcode2idx = {barcode: i for i, barcode in enumerate(adata_airr.obs_names)}
    # Build an index into the awkward array with one entry per barcode in `adata_gex`.
    # Barcodes that are not in `adata_airr` get `None`, which yields a missing row in
    # the result. A plain `-1` would not do this: like NumPy, awkward wraps negative
    # indices around, so `-1` would return the last row of `adata_airr` instead.
    idx = ak.Array([barcode2idx.get(barcode) for barcode in adata_gex.obs_names])
    adata_gex.obsm["airr"] = adata_airr.obsm["airr"][idx]
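To make the alignment logic in step 3 easier to follow without loading real data, here is a plain-Python sketch of the same idea. It is only an illustration under stated assumptions: `align_rows` is a hypothetical helper (not part of scirpy), and the barcodes and chain lists are made-up toy data standing in for `adata_airr` and `adata_gex`. It shows that GEX cells without AIRR data end up as `None`, that AIRR-only cells are dropped, and why a `-1` default silently picks the last row instead.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def align_rows(
    source_barcodes: Sequence[str],
    source_rows: Sequence[T],
    target_barcodes: Sequence[str],
) -> List[Optional[T]]:
    """Hypothetical helper: reorder `source_rows` to follow `target_barcodes`.

    Target barcodes missing from the source become None;
    source-only barcodes are dropped.
    """
    # Map each source barcode to its row index (assumes barcodes are unique)
    barcode2idx = {barcode: i for i, barcode in enumerate(source_barcodes)}
    aligned: List[Optional[T]] = []
    for barcode in target_barcodes:
        i = barcode2idx.get(barcode)  # None if this cell has no AIRR data
        aligned.append(None if i is None else source_rows[i])
    return aligned


# Toy stand-ins for adata_airr / adata_gex (hypothetical barcodes and chains)
airr_barcodes = ["AAAC-1", "CCCG-1", "TTTA-1"]
airr_rows = [["TRA", "TRB"], ["IGH", "IGK"], ["TRB"]]
gex_barcodes = ["CCCG-1", "GGGA-1", "AAAC-1"]

aligned = align_rows(airr_barcodes, airr_rows, gex_barcodes)
print(aligned)  # [['IGH', 'IGK'], None, ['TRA', 'TRB']]

# Pitfall: a -1 default wraps around to the last row, so "GGGA-1"
# wrongly receives the chains of "TTTA-1"
barcode2idx = {barcode: i for i, barcode in enumerate(airr_barcodes)}
naive = [airr_rows[barcode2idx.get(barcode, -1)] for barcode in gex_barcodes]
print(naive)  # [['IGH', 'IGK'], ['TRB'], ['TRA', 'TRB']]
```

Note that the AIRR-only cell "TTTA-1" does not appear in the aligned result, which matches the behaviour described in step 3.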

Thanks for your reply! I was having issues with MuData because it expected the VDJ data to be in a modality called “airr”, but your response on GitHub showing how to specify airr_mod='tcr' when using both BCR and TCR data resolved this. Thank you for pointing out that section of the docs, though; I had completely missed it.