Hello everyone,
Thank you in advance for any help. I am running into a `TypeError` when running the following commands on my scRNA-seq/TCR-seq data:
```python
ir.pp.index_chains(mdata)
ir.tl.chain_qc(mdata)
ir.pp.ir_dist(mdata, metric='identity', sequence='aa')
ir.tl.define_clonotype_clusters(mdata, metric='identity', receptor_arms='all', dual_ir='any', sequence='aa',
                                key_added='clone_id')
```
The output with the TypeError is:
```
Filtering chains...
Indexing VJ chains...
Indexing VDJ chains...
build result array
Stored result in `mdata.obs["airr:receptor_type"]`.
Stored result in `mdata.obs["airr:receptor_subtype"]`.
Stored result in `mdata.obs["airr:chain_pairing"]`.
Computing sequence x sequence distance matrix for VJ sequences.
Computing sequence x sequence distance matrix for VDJ sequences.
Initializing lookup tables.
--> Done initializing lookup tables. (0:00:00)
Computing clonotype x clonotype distances.
NB: Computation happens in chunks. The progressbar only advances when a chunk has finished.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[33], line 4
      2 ir.tl.chain_qc(mdata)
      3 ir.pp.ir_dist(mdata, metric='identity', sequence='aa')
----> 4 ir.tl.define_clonotype_clusters(mdata, metric='identity', receptor_arms='all', dual_ir='any', sequence='aa', 
      5                                 key_added='clone_id')
File ~/miniconda3/envs/single-cell_env/lib/python3.12/site-packages/scirpy/tl/_clonotypes.py:298, in define_clonotype_clusters(adata, sequence, metric, receptor_arms, dual_ir, same_v_gene, within_group, key_added, partitions, resolution, n_iterations, distance_key, inplace, n_jobs, chunksize, airr_mod, airr_key, chain_idx_key)
    275 within_group, distance_key, key_added = _validate_parameters(
    276     params,
    277     None,
   (...)
    284     key_added,
    285 )
    287 ctn = ClonotypeNeighbors(
    288     params,
    289     receptor_arms=receptor_arms,  # type: ignore
   (...)
    296     chunksize=chunksize,
    297 )
--> 298 clonotype_dist = ctn.compute_distances()
    299 g = igraph_from_sparse_matrix(clonotype_dist, matrix_type="distance")
    301 if partitions == "leiden":
File ~/miniconda3/envs/single-cell_env/lib/python3.12/site-packages/scirpy/ir_dist/_clonotype_neighbors.py:224, in ClonotypeNeighbors.compute_distances(self)
    219 else:
    220     logging.info(
    221         "NB: Computation happens in chunks. The progressbar only advances " "when a chunk has finished. "
    222     )  # type: ignore
--> 224     dist_rows = process_map(
    225         self._dist_for_clonotype,
    226         range(n_clonotypes),
    227         max_workers=_get_usable_cpus(self.n_jobs),
    228         chunksize=2000,
    229         tqdm_class=tqdm,
    230     )
    232 dist = sp.vstack(list(dist_rows))
    233 dist.eliminate_zeros()
File ~/miniconda3/envs/single-cell_env/lib/python3.12/site-packages/tqdm/contrib/concurrent.py:105, in process_map(fn, *iterables, **tqdm_kwargs)
    103     tqdm_kwargs = tqdm_kwargs.copy()
    104     tqdm_kwargs["lock_name"] = "mp_lock"
--> 105 return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
File ~/miniconda3/envs/single-cell_env/lib/python3.12/site-packages/tqdm/contrib/concurrent.py:49, in _executor_map(PoolExecutor, fn, *iterables, **tqdm_kwargs)
     46 lock_name = kwargs.pop("lock_name", "")
     47 with ensure_lock(tqdm_class, lock_name=lock_name) as lk:
     48     # share lock in case workers are already using `tqdm`
---> 49     with PoolExecutor(max_workers=max_workers, initializer=tqdm_class.set_lock,
     50                       initargs=(lk,)) as ex:
     51         return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
File ~/miniconda3/envs/single-cell_env/lib/python3.12/concurrent/futures/process.py:684, in ProcessPoolExecutor.__init__(self, max_workers, mp_context, initializer, initargs, max_tasks_per_child)
    681         self._max_workers = min(_MAX_WINDOWS_WORKERS,
    682                                 self._max_workers)
    683 else:
--> 684     if max_workers <= 0:
    685         raise ValueError("max_workers must be greater than 0")
    686     elif (sys.platform == 'win32' and
    687         max_workers > _MAX_WINDOWS_WORKERS):
TypeError: '<=' not supported between instances of 'set' and 'int'
```
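Reading the last frame, `ProcessPoolExecutor` received a `set` as `max_workers` (it comes from `_get_usable_cpus(self.n_jobs)` in scirpy's `compute_distances`) instead of an integer. To check that this alone explains the message, here is a minimal sketch that reproduces the same comparison outside scirpy. It assumes, as my own guess and not something I verified in the scirpy 0.17.0 source, that the set is a CPU-affinity set like the one `os.sched_getaffinity(0)` returns on Linux; the fallback set for other platforms is made up for illustration.

```python
import os


def usable_cpus_as_set() -> set[int]:
    # Assumption: a CPU-affinity set like this is what ends up as max_workers.
    # os.sched_getaffinity is only available on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        return os.sched_getaffinity(0)
    # Made-up fallback so the sketch also runs elsewhere
    return set(range(os.cpu_count() or 1))


cpus = usable_cpus_as_set()

# Same comparison as line 684 of concurrent/futures/process.py
message = None
try:
    _ = cpus <= 0
except TypeError as err:
    message = str(err)

print(message)  # '<=' not supported between instances of 'set' and 'int'

# The number of CPUs (an int) is what the check expects
n_workers = len(cpus)
print(n_workers > 0)  # True
```

This only reproduces the type mismatch; it does not show where scirpy builds the set.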
My mdata object has the following structure:
```
MuData object with n_obs × n_vars = 11129 × 38606
  2 modalities
    gex:	11129 x 38606
      obs:	'sample1', 'sample2', 'sample3', 'sample4', 'sample5', 'sample6', 'sample', 'n_counts', 'log_counts', 'n_genes', 'log_genes', 'mt_frac', 'chain_pairing', 'most_likely_hypothesis', 'cluster_feature', 'negative_hypothesis_probability', 'singlet_hypothesis_probability', 'doublet_hypothesis_probability', 'pool', 'log_sample1', 'log_sample2', 'log_sample3', 'log_sample4', 'log_sample5', 'log_sample6', 'batch', 'donor', 'condition', 'hsct', 'leiden', 'markers_MonozyteClassical', 'markers_MonozytesNon-Classical', 'markers_NK', 'markers_NK_b1', 'markers_NK_b2', 'markers_TCR', 'markers_GammaDeltaTC'
      var:	'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
      uns:	'donor_colors', 'condition_colors', 'sample_colors', 'pool_colors', 'chain_pairing_colors'
      obsm:	'X_umap'
    airr:	11129 x 0
      obs:	'receptor_type', 'receptor_subtype', 'chain_pairing', 'batch'
      obsm:	'airr', 'chain_indices'
```
My environment uses scirpy 0.17.0 (Python 3.12). Here is the full list of installed packages:
Babel 2.14.0
Brotli 1.1.0
Levenshtein 0.25.1
MarkupSafe 2.1.5
PyQt5 5.15.9
PyQt5-sip 12.12.2
PySocks 1.7.1
PyWavelets 1.4.1
PyYAML 6.0.1
Send2Trash 1.8.3
adjustText 1.2.0
airr 1.5.1
anndata 0.10.7
annoy 1.17.3
anyio 4.4.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
array-api-compat 1.7.1
arrow 1.3.0
asttokens 2.4.1
async-lru 2.0.4
attrs 23.2.0
awkward 2.6.5
awkward-cpp 34
beautifulsoup4 4.12.3
bleach 6.1.0
cached-property 1.5.2
certifi 2024.6.2
cffi 1.16.0
charset-normalizer 3.3.2
colorama 0.4.6
comm 0.2.2
contourpy 1.2.1
cycler 0.12.1
debugpy 1.8.1
decorator 5.1.1
decoupler 1.5.0
defusedxml 0.7.1
docrep 0.3.2
entrypoints 0.4
exceptiongroup 1.2.0
executing 2.0.1
fastjsonschema 2.20.0
fonttools 4.53.0
fqdn 1.5.1
fsspec 2024.6.0
get-annotations 0.1.2
h11 0.14.0
h2 4.1.0
h5py 3.11.0
hpack 4.0.0
httpcore 1.0.5
httpx 0.27.0
hyperframe 6.0.1
idna 3.7
igraph 0.11.5
imagecodecs 2024.6.1
imageio 2.34.1
importlib-metadata 7.1.0
importlib-resources 6.4.0
inflect 7.2.1
ipykernel 6.29.4
ipython 8.25.0
isoduration 20.11.0
jedi 0.19.1
jinja2 3.1.4
joblib 1.4.2
json5 0.9.25
jsonpointer 3.0.0
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
jupyter-client 8.6.2
jupyter-core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter-server 2.14.1
jupyter-server-terminals 0.5.3
jupyterlab 4.2.2
jupyterlab-pygments 0.3.0
jupyterlab-server 2.27.2
kiwisolver 1.4.5
lazy-loader 0.4
legacy-api-wrap 1.4
leidenalg 0.10.2
llvmlite 0.42.0
matplotlib 3.8.4
matplotlib-inline 0.1.7
mistune 3.0.2
more-itertools 10.3.0
mudata 0.2.3
munkres 1.1.4
muon 0.1.6
natsort 8.4.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.3
notebook-shim 0.2.4
numba 0.59.1
numpy 1.26.4
omnipath 1.0.8
overrides 7.7.0
packaging 24.1
pandas 2.2.2
pandocfilters 1.5.0
parasail 1.3.4
parso 0.8.4
pathlib 1.0.1
patsy 0.5.6
pexpect 4.9.0
pickleshare 0.7.5
pillow 10.3.0
pip 24.0
pkgutil-resolve-name 1.3.10
platformdirs 4.2.2
plotly 5.22.0
ply 3.11
pooch 1.8.2
prometheus-client 0.20.0
prompt-toolkit 3.0.47
protobuf 4.25.3
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pycparser 2.22
pygments 2.18.0
pynndescent 0.5.13
pyparsing 3.1.2
python-Levenshtein 0.25.1
python-dateutil 2.9.0
python-json-logger 2.0.7
pytz 2024.1
pyzmq 26.0.3
rapidfuzz 3.9.3
referencing 0.35.1
requests 2.32.3
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rpds-py 0.18.1
scanpy 1.10.1
scikit-image 0.23.2
scikit-learn 1.5.0
scipy 1.13.1
scirpy 0.17.0
scrublet 0.2.3
seaborn 0.13.2
session-info 1.0.0
setuptools 70.0.0
sip 6.7.12
six 1.16.0
skranger 0.8.0
sniffio 1.3.1
soupsieve 2.5
squarify 0.4.3
stack-data 0.6.2
statsmodels 0.14.2
stdlib-list 0.10.0
tenacity 8.4.1
terminado 0.18.1
texttable 1.7.0
threadpoolctl 3.5.0
tifffile 2024.5.22
tinycss2 1.3.0
toml 0.10.2
tomli 2.0.1
tornado 6.4.1
tqdm 4.66.4
traitlets 5.14.3
typeguard 4.3.0
types-python-dateutil 2.9.0.20240316
typing-extensions 4.12.2
typing-utils 0.1.0
tzdata 2024.1
umap-learn 0.5.5
uri-template 1.3.0
urllib3 2.2.2
wcwidth 0.2.13
webcolors 24.6.0
webencodings 0.5.1
websocket-client 1.8.0
wheel 0.43.0
wrapt 1.16.0
xlrd 1.2.0
yamlordereddictloader 0.4.0
zipp 3.19.2
I suspected that the problem is caused by concatenating my two samples, because the commands run successfully on each sample individually. However, my colleague uses the same notebook with multiple samples and runs into no issues. This is how I concatenated my two MuData objects after preprocessing:
```python
import muon as mu  # muon, conventionally imported as mu

# Load the two preprocessed samples
adatas = []
adata_tmp = mu.read('/home/michael/Bioinfo/Output/new_cellranger_NC.h5mu')
adatas.append(adata_tmp)
adata_tmp = mu.read('/home/michael/Bioinfo/Output/new_cellranger_STIM.h5mu')
adatas.append(adata_tmp)

# Concatenate GEX
gex = adatas[0]["gex"].concatenate(adatas[1]["gex"])
# Concatenate AIRR
airrdata = adatas[0]["airr"].concatenate(adatas[1]["airr"])
# Fuse AIRR and GEX into one MuData object
mdata = mu.MuData({'gex': gex, 'airr': airrdata})
```
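Because GEX and AIRR are concatenated separately and only then combined into one MuData object, one thing worth ruling out is that the two modalities ended up with different, reordered, or duplicated cell barcodes. A minimal sketch of such a check, using a hypothetical helper `check_obs_alignment` that only compares plain lists of names:

```python
from collections.abc import Sequence


def check_obs_alignment(gex_names: Sequence[str], airr_names: Sequence[str]) -> list[str]:
    """Hypothetical helper: return a list of problems; an empty list means aligned."""
    problems = []
    # Duplicated barcodes within a modality
    if len(set(gex_names)) != len(gex_names):
        problems.append("duplicate obs_names in gex")
    if len(set(airr_names)) != len(airr_names):
        problems.append("duplicate obs_names in airr")
    # Same barcodes in the same order across modalities
    if list(gex_names) != list(airr_names):
        problems.append("gex and airr obs_names differ in content or order")
    return problems
```

With the objects from the concatenation snippet, it would be called as `check_obs_alignment(list(mdata["gex"].obs_names), list(mdata["airr"].obs_names))`.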
I would be incredibly grateful for any help, since I have tried many different things without success. Could this be a dependency issue between packages, or could something go wrong when concatenating the two samples?
Thank you all for your help,
Michael