How to edit and filter awkward array created by scirpy?

racng · August 3, 2023, 11:30pm

I used scirpy to read an AIRR table generated by MiXCR. The values in the v/d/j/c_call columns have “*00” appended to the gene name, and I want to remove that suffix.
scirpy has functions for retrieving values, but I can’t figure out how to edit the awkward array directly.
For example, mdata.mod['airr'].obsm['airr'][0][0]['c_call'] gives “TRBC1*00”. Assigning a value manually with mdata.mod['airr'].obsm['airr'][0][0]['c_call'] = "TRBC1" leaves the values unchanged.

Additionally, I want to remove some chains from the awkward array based on matching “junction_aa”, “v_call”, and “j_call”. These chains are contaminants that I want to exclude from the analysis. Is there a way to do that?

grst · August 7, 2023, 8:52pm

Hi @racng,

this is an excellent question!

You can slice the awkward array in .obsm["airr"] and manipulate its values. For instance, you can retrieve the c_call values of all chains using

>>> mdata["airr"].obsm["airr"]["c_call"]
[['TRBC2*00', 'TRBC2*00'],
 ['TRBC1*00', 'TRAC*00', 'TRBC1*00'],
 ['TRBC2*00', 'TRBC2*00', 'TRAC*00'],
 ...,
 ['TRBC2*00', 'TRAC*00'],
 ['TRBC2*00', 'TRAC*00']]
--------------------------------------------
type: 3000 * var * ?string

This is still an awkward array. There are ways of manipulating awkward arrays directly, and while they are computationally efficient, they are not always beginner-friendly. Let’s therefore convert it to a Python list of lists that you can easily modify:

import awkward as ak
c_calls = ak.to_list(mdata["airr"].obsm["airr"]["c_call"])

Now you can walk that list and build a new one, manipulating values one-by-one:

c_calls_new = []
for cell in c_calls:
    tmp_cell = []
    for c_gene in cell:
        if c_gene is not None:
            tmp_cell.append(c_gene.split("*")[0])
        else:
            tmp_cell.append(None)
    c_calls_new.append(tmp_cell)

You can now re-assign the list to the awkward array:

mdata["airr"].obsm["airr"]["c_call"] = c_calls_new

And appreciate that the *00 suffix has been removed:

ir.get.airr(mdata, airr_variable="c_call")

	VJ_1_c_call	VDJ_1_c_call	VDJ_2_c_call
LN1_GTAGGCCAGCGTAGTG-1		TRBC2	TRBC2
RN2_AGAGCGACAGATTGCT-1	TRAC	TRBC1
LN1_GTCATTTCAATGAAAC-1	TRAC	TRBC1
LN2_GACACGCAGGTAGCTG-2		TRBC2
LN2_GCACTCTCAGGGATTG-2	TRAC	TRBC1
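
Since the question mentions all four v/d/j/c_call columns, the loop above can be wrapped in a small helper and reused per column. This is only a sketch: the name strip_allele_suffix is made up for illustration, and it works on plain nested lists such as those returned by ak.to_list.

```python
def strip_allele_suffix(calls_per_cell):
    """Return a copy of a nested list of gene calls with everything after '*' removed.

    None entries (missing calls) are passed through unchanged.
    """
    return [
        [call.split("*")[0] if call is not None else None for call in cell]
        for cell in calls_per_cell
    ]


# Toy input shaped like ak.to_list(mdata["airr"].obsm["airr"]["c_call"])
example = [["TRBC2*00", "TRBC2*00"], ["TRBC1*00", None, "TRAC*00"], []]
cleaned = strip_allele_suffix(example)
print(cleaned)  # [['TRBC2', 'TRBC2'], ['TRBC1', None, 'TRAC'], []]
```

With the objects from this thread, the same pattern as above would apply to each column in turn: for every field in ("v_call", "d_call", "j_call", "c_call"), convert mdata["airr"].obsm["airr"][field] with ak.to_list, pass the result through the helper, and assign it back to mdata["airr"].obsm["airr"][field].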

The reason why

mdata.mod['airr'].obsm['airr'][0][0]['c_call'] = "TRBC1"

doesn’t affect your values is that awkward only allows assigning fields of “Record types” (the awkward equivalent of a dictionary) by name. Selecting an index [0] returns an immutable view of the array, so the edit never reaches the original data.

Regarding your second question:
My suggestion here would be not to remove those values at all, but to use the filtering capabilities of scirpy.pp.index_chains. That way, the chains will be ignored by all scirpy functions that use AIRR data.

For instance, you can define a list of custom filters, e.g.

filters = [
    # these are the default filters that you'll need to re-specify here if you want to keep them
    "productive",
    "require_junction_aa",
    # custom filters via callback functions - return True to keep the chain
    lambda x: x["c_call"] != "TRBC2",
    lambda x: "*" not in x["junction_aa"],
]

and pass it to index_chains like this:

scirpy.pp.index_chains(mdata, filter=filters)

(Note: From scirpy v0.14 on, these callback functions need to be numba-compilable, since index_chains will switch to a more efficient numba implementation that is >100x faster.)
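
For the contaminants from the original question, a callback along the following lines might work. This is a sketch under assumptions: the sequences and gene names in CONTAMINANTS are made-up placeholders, not_contaminant is a hypothetical name, and the chain passed to the callback is assumed to behave like a dictionary with AIRR field names, as in the lambdas above.

```python
# Hypothetical contaminant list: (junction_aa, v_call, j_call) triples to exclude.
# The values are placeholders; replace them with your own.
CONTAMINANTS = {
    ("CASSLGTDTQYF", "TRBV7-9", "TRBJ2-3"),
}


def not_contaminant(chain):
    """Return True to keep the chain, False if it matches a listed contaminant."""
    key = (chain["junction_aa"], chain["v_call"], chain["j_call"])
    return key not in CONTAMINANTS


# Default filters first, then the custom callback
filters = ["productive", "require_junction_aa", not_contaminant]

# Quick check with plain dicts standing in for chain records
kept = not_contaminant(
    {"junction_aa": "CASSFSTCSANYGYTF", "v_call": "TRBV5-1", "j_call": "TRBJ1-2"}
)
dropped = not_contaminant(
    {"junction_aa": "CASSLGTDTQYF", "v_call": "TRBV7-9", "j_call": "TRBJ2-3"}
)
print(kept, dropped)  # True False
```

The triples have to match the stored values exactly, so if the “*00” suffix is still present in v_call and j_call, it needs to be part of the contaminant entries as well. Whether a callback that looks up a Python set meets the numba requirement mentioned in the note above has not been verified here.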

Alternatively, if you prefer to remove those chains entirely, you can subset the awkward array directly.
Again, the easiest (but inefficient) way would be to convert the entire array to a list of dictionaries using ak.to_list, filter that list using a Python loop, and reassign mdata["airr"].obsm["airr"] = ak.Array(filtered_list_of_dicts). You can also create a boolean mask from the awkward array and use it for subsetting:

arr = mdata["airr"].obsm["airr"]
mask = arr["c_call"] != "TRBC2"
mdata["airr"].obsm["airr"] = arr[mask]
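
The list-of-dictionaries route mentioned above could look roughly like this. It is a sketch on toy data: drop_chains is a hypothetical helper, the chain dictionaries only carry the three fields from the question, and the sequences are made up.

```python
def drop_chains(cells, should_drop):
    """Return a new list of cells without the chains for which should_drop(chain) is True.

    Cells that lose all their chains stay in place as empty lists,
    so the number and order of cells is preserved.
    """
    return [[chain for chain in cell if not should_drop(chain)] for cell in cells]


# Toy data shaped like ak.to_list(mdata["airr"].obsm["airr"]): one list of chains per cell
cells = [
    [
        {"junction_aa": "CASSAAAF", "v_call": "TRBV1", "j_call": "TRBJ1-1"},
        {"junction_aa": "CAVBBBF", "v_call": "TRAV2", "j_call": "TRAJ3"},
    ],
    [{"junction_aa": "CASSAAAF", "v_call": "TRBV1", "j_call": "TRBJ1-1"}],
]

# Made-up contaminant triple (junction_aa, v_call, j_call)
contaminants = {("CASSAAAF", "TRBV1", "TRBJ1-1")}

filtered = drop_chains(
    cells,
    lambda c: (c["junction_aa"], c["v_call"], c["j_call"]) in contaminants,
)
print([len(cell) for cell in filtered])  # [1, 0]
```

The result would then be reassigned with ak.Array(filtered) as described above. Two caveats, both assumptions rather than tested facts: the type of the rebuilt array may differ slightly from the original one, and chain indices computed earlier with index_chains would presumably need to be recomputed afterwards.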

Hope that helps! If you have any specific questions regarding awkward arrays, consider asking in their forum. From my experience the authors are super helpful and responsive.
