Error in ir.tl.summarize_clonal_expansion

Hi,

I’m getting an error when running the summarize_clonal_expansion function. It works with existing metrics (e.g. days) but fails with new metrics that I create by concatenating columns (e.g. annotation_lvl2_marker_draining_days).

Thank you!

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /conda/envs/tcr/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3811 try:
-> 3812     return self._engine.get_loc(casted_key)
   3813 except KeyError as err:

File pandas/_libs/index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7096, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'airr:annotation_lvl2_marker_draining_days'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
File /conda/envs/tcr/lib/python3.10/site-packages/scirpy/util/__init__.py:164, in DataHandler._get_obs_col(self, column)
    163 try:
--> 164     return self.mdata.obs[column]
    165 except (KeyError, AttributeError):

File /conda/envs/tcr/lib/python3.10/site-packages/pandas/core/frame.py:4107, in DataFrame.__getitem__(self, key)
   4106     return self._getitem_multilevel(key)
-> 4107 indexer = self.columns.get_loc(key)
   4108 if is_integer(indexer):

File /conda/envs/tcr/lib/python3.10/site-packages/pandas/core/indexes/base.py:3819, in Index.get_loc(self, key)
   3818         raise InvalidIndexError(key)
-> 3819     raise KeyError(key) from err
   3820 except TypeError:
   3821     # If we have a listlike key, _check_indexing_error will raise
   3822     #  InvalidIndexError. Otherwise we fall through and re-raise
   3823     #  the TypeError.

KeyError: 'airr:annotation_lvl2_marker_draining_days'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File /conda/envs/tcr/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3811 try:
-> 3812     return self._engine.get_loc(casted_key)
   3813 except KeyError as err:

File pandas/_libs/index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7096, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'airr:annotation_lvl2_marker_draining_days'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[38], line 1
----> 1 plot_df = ir.tl.summarize_clonal_expansion(
      2         mdata_sub, target_col="airr:cc_aa_tcrdist_same_v", groupby="airr:annotation_lvl2_marker_draining_days", breakpoints=(1, 2, 5), 
      3         normalize=False)

File /conda/envs/tcr/lib/python3.10/site-packages/scirpy/tl/_clonal_expansion.py:169, in summarize_clonal_expansion(adata, groupby, target_col, summarize_by, normalize, airr_mod, **kwargs)
    166 tmp_col = target_col + "_clipped_count"
    167 tmp_col_weight = target_col + "_weight"
--> 169 obs = params.get_obs([groupby, target_col])
    170 obs[tmp_col] = expansion
    172 # filter NA values

File /conda/envs/tcr/lib/python3.10/site-packages/scirpy/util/__init__.py:155, in DataHandler.get_obs(self, columns)
    153 else:
    154     if len(columns):
--> 155         df = pd.concat({c: self._get_obs_col(c) for c in columns}, axis=1)
    156         assert df.index.is_unique, "Index not unique"
    157         return df.reindex(self.data.obs_names)

File /conda/envs/tcr/lib/python3.10/site-packages/scirpy/util/__init__.py:155, in <dictcomp>(.0)
    153 else:
    154     if len(columns):
--> 155         df = pd.concat({c: self._get_obs_col(c) for c in columns}, axis=1)
    156         assert df.index.is_unique, "Index not unique"
    157         return df.reindex(self.data.obs_names)

File /conda/envs/tcr/lib/python3.10/site-packages/scirpy/util/__init__.py:166, in DataHandler._get_obs_col(self, column)
    164     return self.mdata.obs[column]
    165 except (KeyError, AttributeError):
--> 166     return self.adata.obs[column]

File /conda/envs/tcr/lib/python3.10/site-packages/pandas/core/frame.py:4107, in DataFrame.__getitem__(self, key)
   4105 if self.columns.nlevels > 1:
   4106     return self._getitem_multilevel(key)
-> 4107 indexer = self.columns.get_loc(key)
   4108 if is_integer(indexer):
   4109     indexer = [indexer]

File /conda/envs/tcr/lib/python3.10/site-packages/pandas/core/indexes/base.py:3819, in Index.get_loc(self, key)
   3814     if isinstance(casted_key, slice) or (
   3815         isinstance(casted_key, abc.Iterable)
   3816         and any(isinstance(x, slice) for x in casted_key)
   3817     ):
   3818         raise InvalidIndexError(key)
-> 3819     raise KeyError(key) from err
   3820 except TypeError:
   3821     # If we have a listlike key, _check_indexing_error will raise
   3822     #  InvalidIndexError. Otherwise we fall through and re-raise
   3823     #  the TypeError.
   3824     self._check_indexing_error(key)

KeyError: 'airr:annotation_lvl2_marker_draining_days'

Hi @m21camby,

there’s likely something wrong with how the columns are (or are not) synchronized between mdata.obs and mdata["airr"].obs.

Could you please share:

>>> mdata.obs.columns
>>> mdata["airr"].obs.columns

and how you built the concatenated column?

You can probably solve your issue with MuData’s “pull/push” functions. However, it would still be great if you could share the info above, as this would be valuable feedback for improving scirpy and/or MuData.
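To make the check above concrete, here is a minimal pandas-only sketch of how one might list the modality columns that have no prefixed counterpart in the global table. It uses toy stand-ins for mdata.obs and mdata["airr"].obs; the helper name missing_in_global and the cell values are hypothetical, and the "airr:" prefix convention is assumed from the key in the traceback.

```python
import pandas as pd


def missing_in_global(global_obs: pd.DataFrame, mod_obs: pd.DataFrame, mod: str) -> list[str]:
    """Return modality columns with no '<mod>:<column>' counterpart in the global table.

    Hypothetical helper for illustration; it is not part of scirpy or MuData.
    """
    return [c for c in mod_obs.columns if f"{mod}:{c}" not in global_obs.columns]


cells = ["cell1", "cell2"]

# Toy stand-in for mdata.obs: columns carry a '<modality>:' prefix.
global_obs = pd.DataFrame(
    {"airr:chain_pairing": ["single pair", "orphan VJ"], "gex:days": ["d0", "d7"]},
    index=cells,
)

# Toy stand-in for mdata["airr"].obs: the new column exists only here, without a prefix.
airr_obs = pd.DataFrame(
    {
        "chain_pairing": ["single pair", "orphan VJ"],
        "annotation_lvl2_marker_draining_days": ["Tcm_draining_d0", "Tcm_draining_d7"],
    },
    index=cells,
)

missing = missing_in_global(global_obs, airr_obs, "airr")
print(missing)  # ['annotation_lvl2_marker_draining_days']
```

A non-empty list would explain the KeyError: the prefixed key is not in the global table, and the fallback lookup in the modality table uses the same prefixed key, which does not exist there either.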

Best,
Gregor

Hi Gregor,

Thanks for the kind support. The pull/push functions solved the issue. I simply concatenated the columns, e.g.:

mdata['gex'].obs['annotation_lvl2_marker_draining_days'] = mdata['gex'].obs['annotation_lvl2_marker_draining'] + '_' + mdata['gex'].obs['days']
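
For anyone repeating this, a small sketch of the same concatenation on a toy table standing in for mdata['gex'].obs. The values are made up, and the .astype(str) casts are an addition not in the line above; they are assumed here because obs columns are often stored as categoricals, which may not support "+" directly.

```python
import pandas as pd

# Toy stand-in for mdata['gex'].obs; the values are invented for illustration.
gex_obs = pd.DataFrame(
    {
        "annotation_lvl2_marker_draining": pd.Categorical(["CD8_Tem_draining", "CD4_Tcm_non-draining"]),
        "days": pd.Categorical(["d0", "d7"]),
    },
    index=["cell1", "cell2"],
)

# Cast both columns to str before joining (an assumption added in this sketch).
gex_obs["annotation_lvl2_marker_draining_days"] = (
    gex_obs["annotation_lvl2_marker_draining"].astype(str) + "_" + gex_obs["days"].astype(str)
)

combined = gex_obs["annotation_lvl2_marker_draining_days"].tolist()
print(combined)  # ['CD8_Tem_draining_d0', 'CD4_Tcm_non-draining_d7']
```

In this thread the new column existed only in a modality-level table, so it was not reachable under a prefixed key in mdata.obs until the pull/push step was run.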

Just to share:

mdata.obs.columns
Index(['airr:Sample_ID', 'airr:receptor_type', 'airr:receptor_subtype',
       'airr:chain_pairing', 'airr:cc_aa_tcrdist_same_v',
       'airr:cc_aa_tcrdist_same_v_size', 'airr:antigen.species',
       'airr:antigen.gene', 'airr:antigen.epitope', 'airr:Cell_ID',
       'airr:identical', 'airr:identical_size', 'gex:n_genes',
       'gex:percent_mito', 'gex:n_genes_by_counts',
       'gex:log1p_n_genes_by_counts', 'gex:total_counts',
       'gex:log1p_total_counts', 'gex:pct_counts_in_top_50_genes',
       'gex:pct_counts_in_top_100_genes', 'gex:pct_counts_in_top_200_genes',
       'gex:pct_counts_in_top_500_genes', 'gex:total_counts_ribo',
       'gex:log1p_total_counts_ribo', 'gex:pct_counts_ribo',
       'gex:total_counts_hb', 'gex:log1p_total_counts_hb', 'gex:pct_counts_hb',
       'gex:n_counts', 'gex:Classification_soc', 'gex:scrublet_score',
       'gex:scrublet_cluster_score', 'gex:zscore', 'gex:bh_pval',
       'gex:bonf_pval', 'gex:bh_pval_decision', 'gex:is_doublet',
       'gex:majority_voting_Healthy_COVID19_PBMC',
       'gex:majority_voting_LEGACY1_annotated', 'gex:combined_annotation',
       'gex:Broad_combined_annotation', 'gex:Sample_ID', 'gex:Experiment',
       'gex:leiden', 'gex:leiden_1.5', 'gex:leiden_1.2', 'gex:Cell_ID',
       'gex:annotation_lvl1', 'gex:broad_annotation_lvl1', 'gex:Lymph_node',
       'gex:vaccination', 'gex:days', 'gex:leiden_1', 'gex:annotation_lvl2',
       'gex:broad_annotation_lvl2', 'gex:annotation_lvl2_marker',
       'gex:Patient_ID', 'gex:Age', 'gex:Sex', 'gex:ethnicity',
       'gex:HANCESTRO_ontology', 'gex:flu_vac_last_three_years', 'gex:weight',
       'gex:height', 'gex:BMI', 'gex:Vaccine_site', 'gex:draining',
       'gex:Patient_ID_days', 'gex:receptor_type', 'gex:receptor_subtype',
       'gex:chain_pairing', 'gex:cc_aa_tcrdist_same_v',
       'gex:cc_aa_tcrdist_same_v_size', 'gex:antigen.species',
       'gex:antigen.gene', 'gex:antigen.epitope', 'gex:check_clonotype',
       'gex:draining_days', 'indexing'],
      dtype='object')
mdata["airr"].obs.columns
Index(['Sample_ID', 'receptor_type', 'receptor_subtype', 'chain_pairing',
       'cc_aa_tcrdist_same_v', 'cc_aa_tcrdist_same_v_size', 'antigen.species',
       'antigen.gene', 'antigen.epitope', 'Cell_ID', 'identical',
       'identical_size', 'annotation_lvl2_marker_draining_days'],
      dtype='object')

Thank you so much!!
