Error in SCANVI.prepare_query_anndata

Hi,

I am trying to map a query on a reference using SCANVI.
I obtained a model for the reference (shape: 66585 x 18354)

sca.models.SCANVI.setup_anndata(adata_sct, batch_key=‘orig_ident’, labels_key=‘cell_types’, unlabeled_category=‘not labelled’)

vae = sca.models.SCANVI(adata_sct, n_hidden=128, n_latent=10, n_layers=1, dropout_rate=0.1, dispersion=‘gene’, gene_likelihood=‘zinb’)

vae.train()

Then I need to prepare the query (shape: 7268 x 24991) because it does not have the same number of features as the reference in the model does:

sca.models.SCANVI.prepare_query_anndata(query , vae)

I get this error:
INFO Found 79.79187098180233% reference vars in query data.
/Users/lansev00/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/scvi/model/base/_archesmixin.py:204: UserWarning: Query data contains less than 1% of reference var names. This may result in poor performance.
warnings.warn(
/Users/lansev00/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/scvi/model/base/_archesmixin.py:211: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass AnnData(X, dtype=X.dtype, ...) to get the future behavour.
adata_padding = AnnData(
IndexError: indices are out-of-bounds

I also tried to subset query to only the features that are common to both query and reference (shape: 7268 x 14645), but I get same error message.

1- it is telling that less than 1% of query var_names are represented in the reference var_names, which is not true and does not match the INFO message stating that 79% of the reference var_names are in the query var_names;
2- it is throwing me and index out-of-bound that I don’t understand.

Does anyone have an idea on how to debug this?

Hi,

Thanks for using scvi-tools. The print message is indeed incorrect and I just put out a change to fix that. The actual threshold for the warning is 80% which is why you got that message. As for the IndexError I unfortunately can’t figure it out without more context. Could you paste the whole stack trace here so I could hopefully help you out?

HI Justin,

Thank you for your quick response. Here is the full stack trace:

INFO File /Users/lansev00/Documents/HCA/HCA_reference/model.pt already downloaded
INFO Found 79.79187098180233% reference vars in query data.
/Users/lansev00/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/scvi/model/base/_archesmixin.py:204: UserWarning: Query data contains less than 1% of reference var names. This may result in poor performance.
warnings.warn(
/Users/lansev00/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/scvi/model/base/_archesmixin.py:211: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass AnnData(X, dtype=X.dtype, ...) to get the future behavour.
adata_padding = AnnData(

ValueError Traceback (most recent call last)
Input In [15], in <cell line: 1>()
----> 1 scvi.model.SCANVI.prepare_query_anndata(query,“/Users/lansev00/Documents/HCA/HCA_reference/”)

File ~/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/scvi/model/base/_archesmixin.py:217, in ArchesMixin.prepare_query_anndata(adata, reference_model, return_reference_var_names, inplace)
215 adata_padding.obs_names = adata.obs_names
216 # Concatenate object
→ 217 adata_out = anndata.concat(
218 [adata, adata_padding],
219 axis=1,
220 join=“outer”,
221 index_unique=None,
222 merge=“unique”,
223 )
224 else:
225 adata_out = adata

File ~/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/anndata/_core/merge.py:888, in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
886 concat_pairwise = {}
887 elif join == “outer”:
→ 888 layers = outer_concat_aligned_mapping(
889 [a.layers for a in adatas], reindexers, axis=axis, fill_value=fill_value
890 )
891 concat_mapping = outer_concat_aligned_mapping(
892 [getattr(a, f"{dim}m") for a in adatas],
893 index=concat_indices,
894 fill_value=fill_value,
895 )
896 if pairwise:

File ~/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/anndata/_core/merge.py:526, in outer_concat_aligned_mapping(mappings, reindexers, index, fill_value, axis)
522 cur_reindexers = reindexers
524 # Handling of missing values here is hacky for dataframes
525 # We should probably just handle missing elements for all types
→ 526 result[k] = concat_arrays(
527 [
528 el if not_missing(el) else np.zeros((n, 0), dtype=bool)
529 for el, n in zip(els, ns)
530 ],
531 cur_reindexers,
532 axis=axis,
533 index=index,
534 fill_value=fill_value,
535 )
536 return result

File ~/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/anndata/_core/merge.py:440, in concat_arrays(arrays, reindexers, axis, index, fill_value)
437 elif any(isinstance(a, sparse.spmatrix) for a in arrays):
438 sparse_stack = (sparse.vstack, sparse.hstack)[axis]
439 return sparse_stack(
→ 440 [
441 f(as_sparse(a), axis=1 - axis, fill_value=fill_value)
442 for f, a in zip(reindexers, arrays)
443 ],
444 format=“csr”,
445 )
446 else:
447 return np.concatenate(
448 [
449 f(x, fill_value=fill_value, axis=1 - axis)
(…)
452 axis=axis,
453 )

File ~/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/anndata/_core/merge.py:441, in (.0)
437 elif any(isinstance(a, sparse.spmatrix) for a in arrays):
438 sparse_stack = (sparse.vstack, sparse.hstack)[axis]
439 return sparse_stack(
440 [
→ 441 f(as_sparse(a), axis=1 - axis, fill_value=fill_value)
442 for f, a in zip(reindexers, arrays)
443 ],
444 format=“csr”,
445 )
446 else:
447 return np.concatenate(
448 [
449 f(x, fill_value=fill_value, axis=1 - axis)
(…)
452 axis=axis,
453 )

File ~/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/anndata/_core/merge.py:282, in Reindexer.call(self, el, axis, fill_value)
281 def call(self, el, *, axis=1, fill_value=None):
→ 282 return self.apply(el, axis=axis, fill_value=fill_value)

File ~/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/anndata/_core/merge.py:295, in Reindexer.apply(self, el, axis, fill_value)
293 return self._apply_to_df(el, axis=axis, fill_value=fill_value)
294 elif isinstance(el, sparse.spmatrix):
→ 295 return self._apply_to_sparse(el, axis=axis, fill_value=fill_value)
296 else:
297 return self._apply_to_array(el, axis=axis, fill_value=fill_value)

File ~/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/anndata/_core/merge.py:362, in Reindexer._apply_to_sparse(self, el, axis, fill_value)
356 elif axis == 0:
357 idxmtx = sparse.coo_matrix(
358 (np.ones(len(self.new_pos), dtype=bool), (self.new_pos, self.old_pos)),
359 shape=(len(self.new_idx), len(self.old_idx)),
360 dtype=idxmtx_dtype,
361 )
→ 362 out = idxmtx @ el
364 if len(to_fill) > 0:
365 out = out.tocsr()

File ~/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/scipy/sparse/_base.py:623, in spmatrix.matmul(self, other)
620 if isscalarlike(other):
621 raise ValueError("Scalar operands are not allowed, "
622 “use ‘*’ instead”)
→ 623 return self._mul_dispatch(other)

File ~/.pyenv/versions/3.10.5/envs/scanpy_general/lib/python3.10/site-packages/scipy/sparse/_base.py:536, in spmatrix._mul_dispatch(self, other)
534 if issparse(other):
535 if self.shape[1] != other.shape[0]:
→ 536 raise ValueError(‘dimension mismatch’)
537 return self._mul_sparse_matrix(other)
539 # If it’s a list or whatever, treat it like a matrix

ValueError: dimension mismatch

So I found a way around and I report what I observed here if of any help:

The error of dimension mismatch I reported occured if I added the raw counts in query.layers, even though the raw counts have the exact same dimensions as query.X. If I run scvi.model.SCANVI.prepare_query_anndata(query,“/Users/lansev00/Documents/HCA/HCA_reference/”)
on the query object without raw counts added to query.layers, then it runs fine without error.
But I still need my raw counts in .layers, so I did:
query.X = raw_counts
scvi.model.SCANVI.prepare_query_anndata(query,“/Users/lansev00/Documents/HCA/HCA_reference/”)
query.layers[‘counts’] = query.X
scvi.model.SCANVI.prepare_query_anndata(query,“/Users/lansev00/Documents/HCA/HCA_reference/”)

No more dimension mismatch error. Not elegant but solved it for me.
Looks like .prepare_query_anndata() may not update features in .layers[‘counts’] ??

Thanks for the detailed investigation. I realized this is actually an error reported here Layers not supported by `prepare_query_anndata` but required by `load_query_data` · Issue #1550 · scverse/scvi-tools · GitHub. We have a solution for it merged, and we will be making a new 0.17 release including this fix. It’s also included in a beta release of 0.17 which you can try with pip install scvi-tools==0.17.0-beta.0.

1 Like

v0.17 was just released! Please give it a try to see if it fixes the issue.