Unexpected exception formatting exception when running prepare_query_anndata

Hello, thanks for the tool!
I’m trying to transfer labels from a reference dataset to my query data.
I’m following this tutorial: Human Endometrial Cell Atlas (HECA): a step-by-step guide to mapping query endometrial datasets to the reference atlas.
Everything goes well until I run sca.models.SCANVI.prepare_query_anndata(query_adata, scanvae), which fails with:

INFO     Found 96.39999999999999% reference vars in query data.                                                    
Unexpected exception formatting exception. Falling back to standard exception
Traceback (most recent call last):
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_2491995/2532825497.py", line 2, in <cell line: 2>
    sca.models.SCANVI.prepare_query_anndata(query_adata, scanvae)
  File "/dssg/home/acct-aling1357/sherylshaw/.local/lib/python3.9/site-packages/scvi/model/base/_archesmixin.py", line 225, in prepare_query_anndata
    adata_out = anndata.concat(
  File "/dssg/home/acct-aling1357/sherylshaw/.local/lib/python3.9/site-packages/anndata/_core/merge.py", line 1367, in concat
  File "/dssg/home/acct-aling1357/sherylshaw/.local/lib/python3.9/site-packages/anndata/_core/anndata.py", line 271, in __init__
    Single dimensional annotations of the observation and variables are stored
  File "/dssg/home/acct-aling1357/sherylshaw/.local/lib/python3.9/site-packages/anndata/_core/anndata.py", line 473, in _init_as_actual
    if any((obs, var, uns, obsm, varm, obsp, varp)):
  File "/dssg/home/acct-aling1357/sherylshaw/.local/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py", line 288, in __init__
    def __init__(
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/_collections_abc.py", line 940, in update
    self[key] = other[key]
  File "/dssg/home/acct-aling1357/sherylshaw/.local/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py", line 199, in __setitem__
    return key in self._data
  File "/dssg/home/acct-aling1357/sherylshaw/.local/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py", line 268, in _validate_value
    return (self.parent.obs_names, self.parent.var_names)[self._axis]
  File "/dssg/home/acct-aling1357/sherylshaw/.local/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py", line 89, in _validate_value
    val = ensure_df_homogeneous(val, f"{name} {key!r}")
ValueError: Value passed for key 'PCs' is of incorrect shape. Values of varm must match dimensions ('var',) of parent. Value had shape (58903,) while it should have had (38678,).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 1993, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1118, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1012, in structured_traceback
    return VerboseTB.structured_traceback(
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/IPython/core/ultratb.py", line 865, in structured_traceback
    formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/IPython/core/ultratb.py", line 818, in format_exception_as_a_whole
    frames.append(self.format_record(r))
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/IPython/core/ultratb.py", line 736, in format_record
    result += ''.join(_format_traceback_lines(frame_info.lines, Colors, self.has_colors, lvals))
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/stack_data/core.py", line 698, in lines
    pieces = self.included_pieces
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/stack_data/core.py", line 649, in included_pieces
    pos = scope_pieces.index(self.executing_piece)
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/stack_data/core.py", line 628, in executing_piece
    return only(
  File "/dssg/opt/icelake/linux-centos8-icelake/gcc-11.2.0/miniconda3-4.10.3-f5dsmdmzng2ck6a4otduqwosi22kacfl/lib/python3.9/site-packages/executing/executing.py", line 164, in only
    raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

I compared the genes between the ref and query adata and found that only 1928 of the 2000 ref genes are present in the query.

# Get common genes
common_genes = np.intersect1d(query_adata.var_names, ref_adata.var_names)
print(f"Number of common genes: {len(common_genes)}")
Number of common genes: 1928
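
The 96.4% in the log line above is consistent with 1928 of the 2000 reference genes being found in the query. A small sketch of that bookkeeping, which also lists the genes that would need padding. `reference_coverage` is a hypothetical helper, not part of scarches or scvi-tools, and the gene names are toy values:

```python
def reference_coverage(query_genes, reference_genes):
    """Return (fraction of reference genes present in the query, missing genes in reference order)."""
    query = set(query_genes)
    missing = [gene for gene in reference_genes if gene not in query]
    found = len(reference_genes) - len(missing)
    return found / len(reference_genes), missing

# Toy example: 2 of the 4 reference genes are present in the query.
fraction, missing = reference_coverage(["A", "B", "X"], ["A", "B", "C", "D"])
print(f"{fraction:.1%} of reference genes found; missing: {missing}")
```

With real data, the same call should work on `query_adata.var_names` and `ref_adata.var_names`, since both are iterable and have a length.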

I tried subsetting to the 1928 common genes and running

model = sca.models.SCANVI.load_query_data(
    query_adata,
    ref_path,
    freeze_dropout = True,
)

but that raises another error:

ValueError: Number of vars in `adata_target` not the same as source. Expected: 2000 Received: 1928

I’d appreciate it a lot if anyone could give me some suggestions!!! :smiling_face_with_tear:

Hey @SherylShaw

It’s hard to say what is going on, because some preprocessing happens beforehand and the failure begins with this line:

I did notice you are using Python 3.9, which is not supported in recent scvi-tools versions.
My suggestion is to update to the most recent version in a new Python 3.12 env and try again.

Having fewer genes in the query than in the reference shouldn’t be the cause of the problem, as we reorder the genes and pad any missing genes with 0s when running prepare_query_anndata.
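
To illustrate the reorder-and-pad idea, here is a minimal sketch on plain Python lists. It is a conceptual illustration only, not the scvi-tools implementation (which, per the traceback, goes through anndata.concat). `pad_and_reorder` and the gene names are made up for the example:

```python
def pad_and_reorder(rows, query_genes, reference_genes):
    """Return rows with columns in reference order; genes absent from the query become 0."""
    column_of = {gene: j for j, gene in enumerate(query_genes)}
    return [
        [row[column_of[gene]] if gene in column_of else 0 for gene in reference_genes]
        for row in rows
    ]

# Toy data: two cells, three query genes (hypothetical names).
rows = [[1, 2, 3],
        [4, 5, 6]]
query_genes = ["geneB", "geneA", "geneX"]
reference_genes = ["geneA", "geneB", "geneC"]

padded = pad_and_reorder(rows, query_genes, reference_genes)
print(padded)  # [[2, 1, 0], [5, 4, 0]]
```

Here geneX is dropped because the reference does not contain it, and geneC is filled with zeros because the query does not contain it. The result always has as many columns as the reference has genes.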

Thanks! I updated my Python version and reran it, but it still doesn’t work.

For adata, I did the usual quality control, doublet detection, normalization, feature selection, dimensionality reduction, and nearest-neighbor graph construction, following the scanpy tutorial.
This is my adata:

AnnData object with n_obs × n_vars = 20297 × 38606
    obs: 'sample', 'Binary_Stage', 'Age', 'dataset', 'Hormonal_treatment', 'Biopsy_type', 'Tissue_sampled', 'Endometrial_pathology', '10x_kit', 'age_group', 'n_genes_by_counts', 'total_counts', 'pct_counts_mt', 'n_genes', 'doublet_score', 'predicted_doublet'
    var: 'n_cells_by_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'scrublet', 'log1p', 'hvg', 'pca', 'sample_colors', 'neighbors', 'umap', 'predicted_doublet_colors'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'distances', 'connectivities'

I tidied up my code; here’s the main part:

import os
import scanpy as sc
import torch
import anndata as ad
import scanpy.external as sce
import scarches as sca
from scarches.dataset.trvae.data_handling import remove_sparsity
import gdown
import scvi
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the single-cell RNA-seq dataset downloaded from Reproductive Cell Atlas portal.
atlas_path = "../data/endometriumAtlasV2_cells_with_counts.h5ad"
adata_atlas = sc.read_h5ad(atlas_path)

# prepare ref_data
ref_adata = adata_atlas
ref_adata.X = ref_adata.layers["counts"]
ref_adata.shape
# (313527, 17736)

ref_adata = ref_adata[:, ref_adata.var.highly_variable].copy()

# prepare query_data
adata = ad.read_h5ad('../data/self_basicprocessed_adata.h5ad')
query_adata = adata

# load scANVI model
ref_path = "../data/scanvi_model/"
scanvae = sca.models.SCANVI.load(ref_path, ref_adata)
# INFO     File ../data/scanvi_model/model.pt already downloaded      

# prepare the query data (pad and reorder genes to match the reference)
sca.models.SCANVI.prepare_query_anndata(query_adata, scanvae)

and this time the error is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_2331062/2532825497.py in <cell line: 0>()
      1 # Now try preparing the query data again
----> 2 sca.models.SCANVI.prepare_query_anndata(query_adata, scanvae)

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/scvi/model/base/_archesmixin.py in prepare_query_anndata(adata, reference_model, return_reference_var_names, inplace)
    264             return var_names
    265 
--> 266         return _pad_and_sort_query_anndata(adata, var_names, inplace)
    267 
    268     @staticmethod

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/scvi/model/base/_archesmixin.py in _pad_and_sort_query_anndata(adata, reference_var_names, inplace)
    456         adata_padding.obs_names = adata.obs_names
    457         # Concatenate object
--> 458         adata_out = anndata.concat(
    459             [adata, adata_padding],
    460             axis=1,

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
   1413             UserWarning,
   1414         )
-> 1415     return AnnData(
   1416         **{
   1417             "X": X,

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/anndata/_core/anndata.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
    250             self._init_as_view(X, oidx, vidx)
    251         else:
--> 252             self._init_as_actual(
    253                 X=X,
    254                 obs=obs,

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/anndata/_core/anndata.py in _init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
    441 
    442         self.obsm = obsm
--> 443         self.varm = varm
    444 
    445         self.obsp = obsp

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py in __set__(self, obj, value)
    433     ) -> None:
    434         value = convert_to_dict(value)
--> 435         _ = self.construct(obj, store=value)  # Validate
    436         if obj.is_view:
    437             obj._init_as_actual(obj.copy())

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py in construct(self, obj, store)
    406         if self.axis is None:
    407             return self.cls(obj, store=store)
--> 408         return self.cls(obj, axis=self.axis, store=store)
    409 
    410     @property

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py in __init__(self, parent, axis, store)
    295             raise ValueError()
    296         self._axis = axis
--> 297         super().__init__(parent, store=store)
    298 
    299 

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py in __init__(self, parent, store)
    208         self._data = store
    209         for k, v in self._data.items():
--> 210             self._data[k] = self._validate_value(v, k)
    211 
    212     def __getitem__(self, key: str) -> Value:

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py in _validate_value(self, val, key)
    277                     msg = "Index.equals and pd.testing.assert_index_equal disagree"
    278                     raise AssertionError(msg)
--> 279         return super()._validate_value(val, key)
    280 
    281     @property

~/miniconda3/envs/py3.12/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py in _validate_value(self, val, key)
     95                     f"Value had shape {actual_shape} while it should have had {right_shape}."
     96                 )
---> 97             raise ValueError(msg)
     98 
     99         name = f"{self.attrname.title().rstrip('s')} {key!r}"

ValueError: Value passed for key 'PCs' is of incorrect shape. Values of varm must match dimensions ('var',) of parent. Value had shape (58903,) while it should have had (38678,).

Looking at this error:

and

adata_out = anndata.concat(
    [adata, adata_padding],
    axis=1

what I infer is that the issue is with the adata concat, which results from the padding we do for reference genes that are missing from the query.

But then the problem is that the query data has a varm entry “PCs” whose shape does not match the “var” gene count of its parent adata (58903 vs 38678). Do we need to use the PCs here (coming from the nearest-neighbor graph)?
Can you verify that we indeed get:
adata.varm.parent.shape==adata.shape
I think the query might have selected a different number of PCs, and this is why it got so confused.
If you don’t need the PCs, don’t keep them in the adata.

Also, please verify that you are running a recent version of anndata.


Thanks!!! I had calculated the PCs while comparing different batch-effect correction methods. I deleted them with del query_adata.varm['PCs'] and the error is solved!
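
For anyone hitting the same error, the workaround can be sketched as a small helper that removes varm entries before calling prepare_query_anndata. `drop_varm` is a hypothetical name, and it only assumes that `.varm` behaves like a mutable mapping. The demo uses a stand-in object so that the snippet runs without any data:

```python
from types import SimpleNamespace

def drop_varm(adata, keys=None):
    """Delete entries from adata.varm in place and return the names removed.

    keys=None removes every entry; otherwise only the listed keys that exist.
    """
    present = list(adata.varm.keys())
    targets = present if keys is None else [key for key in keys if key in present]
    for key in targets:
        del adata.varm[key]
    return targets

# Stand-in for the query object; with real data, pass the AnnData query instead.
toy_query = SimpleNamespace(varm={"PCs": [[0.1, 0.2], [0.3, 0.4]]})
removed = drop_varm(toy_query, keys=["PCs"])
print(removed, list(toy_query.varm.keys()))  # ['PCs'] []
```

Asking for a key that is not present is a no-op, so the helper is safe to call on a query that never had PCA run on it.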

By the way, I ran adata.varm.parent.shape==adata.shape and it returns True.
I also checked my anndata version, which is ‘0.11.3’, a recent one.

Thank you again for your suggestion and kindness!!
