Hello everyone,
thank you in advance for any help with this. I am running into a TypeError when running the following commands on my scRNA/TCR-seq data:
ir.pp.index_chains(mdata)
ir.tl.chain_qc(mdata)
ir.pp.ir_dist(mdata, metric='identity', sequence='aa')
ir.tl.define_clonotype_clusters(mdata, metric='identity', receptor_arms='all', dual_ir='any', sequence='aa',
key_added='clone_id')
The output, ending in the TypeError, is:
Filtering chains...
Indexing VJ chains...
Indexing VDJ chains...
build result array
Stored result in `mdata.obs["airr:receptor_type"]`.
Stored result in `mdata.obs["airr:receptor_subtype"]`.
Stored result in `mdata.obs["airr:chain_pairing"]`.
Computing sequence x sequence distance matrix for VJ sequences.
Computing sequence x sequence distance matrix for VDJ sequences.
Initializing lookup tables.
--> Done initializing lookup tables. (0:00:00)
Computing clonotype x clonotype distances.
NB: Computation happens in chunks. The progressbar only advances when a chunk has finished.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[33], line 4
2 ir.tl.chain_qc(mdata)
3 ir.pp.ir_dist(mdata, metric='identity', sequence='aa')
----> 4 ir.tl.define_clonotype_clusters(mdata, metric='identity', receptor_arms='all', dual_ir='any', sequence='aa',
5 key_added='clone_id')
File ~/miniconda3/envs/single-cell_env/lib/python3.12/site-packages/scirpy/tl/_clonotypes.py:298, in define_clonotype_clusters(adata, sequence, metric, receptor_arms, dual_ir, same_v_gene, within_group, key_added, partitions, resolution, n_iterations, distance_key, inplace, n_jobs, chunksize, airr_mod, airr_key, chain_idx_key)
275 within_group, distance_key, key_added = _validate_parameters(
276 params,
277 None,
(...)
284 key_added,
285 )
287 ctn = ClonotypeNeighbors(
288 params,
289 receptor_arms=receptor_arms, # type: ignore
(...)
296 chunksize=chunksize,
297 )
--> 298 clonotype_dist = ctn.compute_distances()
299 g = igraph_from_sparse_matrix(clonotype_dist, matrix_type="distance")
301 if partitions == "leiden":
File ~/miniconda3/envs/single-cell_env/lib/python3.12/site-packages/scirpy/ir_dist/_clonotype_neighbors.py:224, in ClonotypeNeighbors.compute_distances(self)
219 else:
220 logging.info(
221 "NB: Computation happens in chunks. The progressbar only advances " "when a chunk has finished. "
222 ) # type: ignore
--> 224 dist_rows = process_map(
225 self._dist_for_clonotype,
226 range(n_clonotypes),
227 max_workers=_get_usable_cpus(self.n_jobs),
228 chunksize=2000,
229 tqdm_class=tqdm,
230 )
232 dist = sp.vstack(list(dist_rows))
233 dist.eliminate_zeros()
File ~/miniconda3/envs/single-cell_env/lib/python3.12/site-packages/tqdm/contrib/concurrent.py:105, in process_map(fn, *iterables, **tqdm_kwargs)
103 tqdm_kwargs = tqdm_kwargs.copy()
104 tqdm_kwargs["lock_name"] = "mp_lock"
--> 105 return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
File ~/miniconda3/envs/single-cell_env/lib/python3.12/site-packages/tqdm/contrib/concurrent.py:49, in _executor_map(PoolExecutor, fn, *iterables, **tqdm_kwargs)
46 lock_name = kwargs.pop("lock_name", "")
47 with ensure_lock(tqdm_class, lock_name=lock_name) as lk:
48 # share lock in case workers are already using `tqdm`
---> 49 with PoolExecutor(max_workers=max_workers, initializer=tqdm_class.set_lock,
50 initargs=(lk,)) as ex:
51 return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
File ~/miniconda3/envs/single-cell_env/lib/python3.12/concurrent/futures/process.py:684, in ProcessPoolExecutor.__init__(self, max_workers, mp_context, initializer, initargs, max_tasks_per_child)
681 self._max_workers = min(_MAX_WINDOWS_WORKERS,
682 self._max_workers)
683 else:
--> 684 if max_workers <= 0:
685 raise ValueError("max_workers must be greater than 0")
686 elif (sys.platform == 'win32' and
687 max_workers > _MAX_WINDOWS_WORKERS):
TypeError: '<=' not supported between instances of 'set' and 'int'
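Looking at the last frame, the comparison `max_workers <= 0` seems to fail because `max_workers` is a set rather than an int (it is filled from the return value of scirpy's `_get_usable_cpus`). A minimal reproduction of just the failing comparison (my own sketch, not scirpy code):

```python
import os

# On Linux, os.sched_getaffinity(0) returns a *set* of usable CPU ids,
# not a count; fall back to a dummy set where the function is unavailable.
cpus = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else {0}

# Comparing that set to an int raises exactly the TypeError from the traceback.
try:
    cpus <= 0
    msg = None
except TypeError as e:
    msg = str(e)
print(msg)  # '<=' not supported between instances of 'set' and 'int'
```

So my guess is that a set of CPU ids ends up where a worker count is expected, though I don't understand why this only happens with my concatenated object.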
My mdata object has the following structure:
MuData object with n_obs × n_vars = 11129 × 38606
2 modalities
gex: 11129 x 38606
obs: 'sample1', 'sample2', 'sample3', 'sample4', 'sample5', 'sample6', 'sample', 'n_counts', 'log_counts', 'n_genes', 'log_genes', 'mt_frac', 'chain_pairing', 'most_likely_hypothesis', 'cluster_feature', 'negative_hypothesis_probability', 'singlet_hypothesis_probability', 'doublet_hypothesis_probability', 'pool', 'log_sample1', 'log_sample2', 'log_sample3', 'log_sample4', 'log_sample5', 'log_sample6', 'batch', 'donor', 'condition', 'hsct', 'leiden', 'markers_MonozyteClassical', 'markers_MonozytesNon-Classical', 'markers_NK', 'markers_NK_b1', 'markers_NK_b2', 'markers_TCR', 'markers_GammaDeltaTC'
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
uns: 'donor_colors', 'condition_colors', 'sample_colors', 'pool_colors', 'chain_pairing_colors'
obsm: 'X_umap'
airr: 11129 x 0
obs: 'receptor_type', 'receptor_subtype', 'chain_pairing', 'batch'
obsm: 'airr', 'chain_indices'
I am using scirpy 0.17.0; the other packages in my environment are:
Babel 2.14.0
Brotli 1.1.0
Levenshtein 0.25.1
MarkupSafe 2.1.5
PyQt5 5.15.9
PyQt5-sip 12.12.2
PySocks 1.7.1
PyWavelets 1.4.1
PyYAML 6.0.1
Send2Trash 1.8.3
adjustText 1.2.0
airr 1.5.1
anndata 0.10.7
annoy 1.17.3
anyio 4.4.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
array-api-compat 1.7.1
arrow 1.3.0
asttokens 2.4.1
async-lru 2.0.4
attrs 23.2.0
awkward 2.6.5
awkward-cpp 34
beautifulsoup4 4.12.3
bleach 6.1.0
cached-property 1.5.2
certifi 2024.6.2
cffi 1.16.0
charset-normalizer 3.3.2
colorama 0.4.6
comm 0.2.2
contourpy 1.2.1
cycler 0.12.1
debugpy 1.8.1
decorator 5.1.1
decoupler 1.5.0
defusedxml 0.7.1
docrep 0.3.2
entrypoints 0.4
exceptiongroup 1.2.0
executing 2.0.1
fastjsonschema 2.20.0
fonttools 4.53.0
fqdn 1.5.1
fsspec 2024.6.0
get-annotations 0.1.2
h11 0.14.0
h2 4.1.0
h5py 3.11.0
hpack 4.0.0
httpcore 1.0.5
httpx 0.27.0
hyperframe 6.0.1
idna 3.7
igraph 0.11.5
imagecodecs 2024.6.1
imageio 2.34.1
importlib-metadata 7.1.0
importlib-resources 6.4.0
inflect 7.2.1
ipykernel 6.29.4
ipython 8.25.0
isoduration 20.11.0
jedi 0.19.1
jinja2 3.1.4
joblib 1.4.2
json5 0.9.25
jsonpointer 3.0.0
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
jupyter-client 8.6.2
jupyter-core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter-server 2.14.1
jupyter-server-terminals 0.5.3
jupyterlab 4.2.2
jupyterlab-pygments 0.3.0
jupyterlab-server 2.27.2
kiwisolver 1.4.5
lazy-loader 0.4
legacy-api-wrap 1.4
leidenalg 0.10.2
llvmlite 0.42.0
matplotlib 3.8.4
matplotlib-inline 0.1.7
mistune 3.0.2
more-itertools 10.3.0
mudata 0.2.3
munkres 1.1.4
muon 0.1.6
natsort 8.4.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.3
notebook-shim 0.2.4
numba 0.59.1
numpy 1.26.4
omnipath 1.0.8
overrides 7.7.0
packaging 24.1
pandas 2.2.2
pandocfilters 1.5.0
parasail 1.3.4
parso 0.8.4
pathlib 1.0.1
patsy 0.5.6
pexpect 4.9.0
pickleshare 0.7.5
pillow 10.3.0
pip 24.0
pkgutil-resolve-name 1.3.10
platformdirs 4.2.2
plotly 5.22.0
ply 3.11
pooch 1.8.2
prometheus-client 0.20.0
prompt-toolkit 3.0.47
protobuf 4.25.3
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pycparser 2.22
pygments 2.18.0
pynndescent 0.5.13
pyparsing 3.1.2
python-Levenshtein 0.25.1
python-dateutil 2.9.0
python-json-logger 2.0.7
pytz 2024.1
pyzmq 26.0.3
rapidfuzz 3.9.3
referencing 0.35.1
requests 2.32.3
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rpds-py 0.18.1
scanpy 1.10.1
scikit-image 0.23.2
scikit-learn 1.5.0
scipy 1.13.1
scirpy 0.17.0
scrublet 0.2.3
seaborn 0.13.2
session-info 1.0.0
setuptools 70.0.0
sip 6.7.12
six 1.16.0
skranger 0.8.0
sniffio 1.3.1
soupsieve 2.5
squarify 0.4.3
stack-data 0.6.2
statsmodels 0.14.2
stdlib-list 0.10.0
tenacity 8.4.1
terminado 0.18.1
texttable 1.7.0
threadpoolctl 3.5.0
tifffile 2024.5.22
tinycss2 1.3.0
toml 0.10.2
tomli 2.0.1
tornado 6.4.1
tqdm 4.66.4
traitlets 5.14.3
typeguard 4.3.0
types-python-dateutil 2.9.0.20240316
typing-extensions 4.12.2
typing-utils 0.1.0
tzdata 2024.1
umap-learn 0.5.5
uri-template 1.3.0
urllib3 2.2.2
wcwidth 0.2.13
webcolors 24.6.0
webencodings 0.5.1
websocket-client 1.8.0
wheel 0.43.0
wrapt 1.16.0
xlrd 1.2.0
yamlordereddictloader 0.4.0
zipp 3.19.2
I suspect the problem is caused by concatenating my two samples, because I can run the command successfully on single samples. However, my colleague uses this notebook with multiple samples and runs into no issues. This is how I concatenated my two MuData objects after preprocessing:
adatas = []
adata_tmp = mu.read('/home/michael/Bioinfo/Output/new_cellranger_NC.h5mu')
adatas.append(adata_tmp)
adata_tmp = mu.read('/home/michael/Bioinfo/Output/new_cellranger_STIM.h5mu')
adatas.append(adata_tmp)

# concatenate GEX
mdata = adatas[0]["gex"].concatenate(adatas[1]["gex"])
# concatenate AIRR
airrdata = adatas[0]["airr"].concatenate(adatas[1]["airr"])
# fuse AIRR and GEX into one MuData object
mdata = mu.MuData({'gex': mdata, 'airr': airrdata})
I would be incredibly grateful for any help, since I have tried many different things without success. Could this be a dependency issue between packages, or is there something that could go wrong when concatenating the two samples?
Thank you all for your help,
Michael