Hi,
I encountered unexpected behavior while converting a list of AirrCell objects to AnnData: at least one cell seems to lose a chain when converting airr_cells_new to adata_new at the end.
import pandas as pd
import scirpy as ir
# Convert to AnnData
adata_new = ir.io.from_airr_cells(airr_cells_new)
# Get updated chain_pairing status, with no filtering
ir.pp.index_chains(adata_new, filter=(lambda x: True, lambda x: True))
ir.tl.chain_qc(adata_new)
## Counting chains in AirrCell objects:
pd.DataFrame([len(cell.chains) for cell in airr_cells_new]).value_counts()
> 2 3433
> 1 146
> Name: count, dtype: int64
## Chain pairing status of the AnnData object:
adata_new.obs['chain_pairing'].value_counts()
> chain_pairing
> single pair 3432
> orphan VJ 133
> orphan VDJ 14
> Name: count, dtype: int64
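Both tables add up to 3579 cells, so no cell is dropped. However, there are 3433 cells with 2 chains but only 3432 with “single pair”, which means that one cell with 2 chains ends up as either “orphan VJ” or “orphan VDJ”.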
Am I missing something?
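To pin down which cell is affected, the per-cell chain counts can be compared against the pairing labels. The snippet below is only a minimal sketch on toy data: `find_mismatches` and `EXPECTED_CHAINS` are hypothetical names, and it assumes that an "orphan" label corresponds to one chain and "single pair" to two.

```python
# Expected number of chains for the pairing labels seen in the output above.
# Assumption: an "orphan" label means one chain, "single pair" means two.
EXPECTED_CHAINS = {"single pair": 2, "orphan VJ": 1, "orphan VDJ": 1}


def find_mismatches(chain_counts, pairing):
    """Return (cell_id, n_chains, label) for cells whose label disagrees with n_chains.

    chain_counts: dict mapping cell_id -> number of chains in the AirrCell
    pairing:      dict mapping cell_id -> chain_pairing label
    """
    mismatches = []
    for cell_id, n_chains in chain_counts.items():
        label = pairing.get(cell_id)
        expected = EXPECTED_CHAINS.get(label)
        # Labels not listed in EXPECTED_CHAINS are skipped rather than guessed.
        if expected is not None and expected != n_chains:
            mismatches.append((cell_id, n_chains, label))
    return mismatches


# Toy stand-in data with hypothetical cell IDs, not taken from the real dataset.
chain_counts = {"cell_a": 2, "cell_b": 1, "cell_c": 2}
pairing = {"cell_a": "single pair", "cell_b": "orphan VJ", "cell_c": "orphan VDJ"}

print(find_mismatches(chain_counts, pairing))
# [('cell_c', 2, 'orphan VDJ')]
```

With the real objects, the two dicts could presumably be built as `{cell.cell_id: len(cell.chains) for cell in airr_cells_new}` and `adata_new.obs['chain_pairing'].to_dict()`, assuming the obs index holds the cell IDs.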
More context:
The AirrCells object is defined by hand. Here is the core code that defines airr_cells_new.
The script’s purpose is to manually remove secondary chains from an AnnData object.
airr_cells = ir.io.to_airr_cells(adata)
# Initialize an empty list to collect the new AirrCell objects
airr_cells_new = []
# Keep only the primary VJ and VDJ chains of cells with more than 2 chains
for ind, cell in enumerate(airr_cells):
    if len(cell.chains) <= 2:
        # Cells with at most 2 chains are passed through unchanged
        new_cell = cell
    else:
        ## Check that there are at most 4 chains, in concordance with the dual IR model
        assert len(cell.chains) <= 4
        chain_indices = adata.obsm['chain_indices'][ind].tolist()
        new_cell = ir.io.AirrCell(cell_id=f'{cell.cell_id}_prim')
        ## Select the primary VJ chain if it exists, and add it
        if chain_indices['VJ'][0] is not None:
            prim_vj_chain = cell.chains[chain_indices['VJ'][0]]
            new_cell.add_chain(prim_vj_chain)
        ## Select the primary VDJ chain if it exists, and add it
        if chain_indices['VDJ'][0] is not None:
            prim_vdj_chain = cell.chains[chain_indices['VDJ'][0]]
            new_cell.add_chain(prim_vdj_chain)
    # Add new cell to list
    airr_cells_new.append(new_cell)
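Since cells with at most 2 chains are passed through unchanged, one thing that might be worth checking is whether every 2-chain cell really holds one VJ and one VDJ chain. The following is a rough sketch on toy data, not a diagnosis: `arm_counts` and `is_one_vj_one_vdj` are hypothetical helpers, and it assumes each chain is a dict-like record with a `locus` field and that the loci group into arms as listed in the comment.

```python
from collections import Counter

# Assumed locus groupings: VJ arm = TRA, TRG, IGK, IGL; VDJ arm = TRB, TRD, IGH.
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}
VDJ_LOCI = {"TRB", "TRD", "IGH"}


def arm_counts(chains):
    """Count chains per arm; a missing or unrecognized locus is tallied as 'other'."""
    counts = Counter({"VJ": 0, "VDJ": 0, "other": 0})
    for chain in chains:
        locus = chain.get("locus")
        if locus in VJ_LOCI:
            counts["VJ"] += 1
        elif locus in VDJ_LOCI:
            counts["VDJ"] += 1
        else:
            counts["other"] += 1
    return dict(counts)


def is_one_vj_one_vdj(chains):
    """True only if the chains are exactly one VJ chain and one VDJ chain."""
    return arm_counts(chains) == {"VJ": 1, "VDJ": 1, "other": 0}


# Toy chains with hypothetical values, standing in for cell.chains.
paired = [{"locus": "TRA"}, {"locus": "TRB"}]
same_arm = [{"locus": "TRA"}, {"locus": "TRA"}]
no_locus = [{"locus": "TRB"}, {"locus": None}]

for name, chains in [("paired", paired), ("same_arm", same_arm), ("no_locus", no_locus)]:
    print(name, arm_counts(chains), is_one_vj_one_vdj(chains))
```

Running `arm_counts(cell.chains)` over the 2-chain cells in airr_cells_new would show whether any of them has two chains on the same arm or a chain without a recognized locus, either of which could plausibly keep it from being labelled “single pair”.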