I am attempting to use scANVI to perform cell type annotation for my scRNA-seq data, using a published pretrained scVI model and the adata used to train that model as a reference. I am following the "Reference mapping with scvi-tools" tutorial.
Attempting to load a saved scVI model gives a warning that the `adata` I am using to load the model has different `var_names` than the `adata` used to train the model. While I am able to load the model, this warning makes me question whether I can trust the cell type predictions I will eventually get from scANVI.
Is there a way to extract the names and order of the variables used to train the scVI model either from the saved scVI model or from the loaded SCVI model object?
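One possible approach, sketched here under assumptions rather than confirmed for this particular model: recent scvi-tools releases appear to save a single `model.pt` file that `torch.load` reads back as a dictionary, with the training variable names stored under a `var_names` key. The file name, the key, and both helper names below are assumptions to check against the actual contents of the model directory, since older releases used a different on-disk layout.

```python
from pathlib import Path


def var_names_from_checkpoint(checkpoint):
    """Return the training var_names from a loaded checkpoint dict, in saved order."""
    if "var_names" not in checkpoint:
        # Assumption: a missing key means the model was saved in another format.
        raise KeyError(
            "no 'var_names' entry; the model may have been saved in a different format"
        )
    return [str(name) for name in checkpoint["var_names"]]


def load_checkpoint(model_dir, filename="model.pt"):
    """Read the checkpoint file from a saved model directory (requires torch)."""
    import torch  # imported lazily so the helper above works without torch

    # weights_only=False because the file is assumed to hold pickled
    # non-tensor objects; only do this for files from a trusted source.
    return torch.load(
        Path(model_dir) / filename, map_location="cpu", weights_only=False
    )
```

With these, `trained_var_names = var_names_from_checkpoint(load_checkpoint('../data/hypomap/model/'))` would give the names in training order, and `ref_adata[:, trained_var_names].copy()` would reorder the reference to match, provided every name is present in `ref_adata.var_names`.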
I have downloaded a published scVI model and the full `adata` used to train the model. The model was trained on a subset of highly variable genes. Even though I have the code that was used to select the highly variable genes, I do not seem to be able to recover them in the same order that was used for training.
This is my code for reading in the reference `adata` and the list of highly variable genes, then attempting to load the model:
```python
import json

import numpy as np
import scanpy as sc
import scvi

# read reference adata
ref_h5ad_path = '../data/hypomap/hypoMap.h5ad'
ref_adata = sc.read(ref_h5ad_path)

# read list of highly variable genes
with open('../data/hypomap/hypoMap_highly_variable_genes.json') as f:
    highly_variable_genes = np.array(json.load(f))

# subset adata and keep highly variable genes
ref_adata = ref_adata[:, highly_variable_genes].copy()

# load scVI model
scvi_model_path = '../data/hypomap/model/'
scvi_ref = scvi.model.SCVI.load(
    scvi_model_path,
    adata=ref_adata,
)
```
and this is the warning that I receive:
<project dir path>/renv/python/virtualenvs/renv-python-3.10/lib/python3.10/site-packages/scvi/model/base/_base_model.py:698: UserWarning: var_names for adata passed in does not match var_names of adata used to train the model. For valid results, the vars need to be the same and in the same order as the adata used to train the model.
Again, this does give me a successfully instantiated SCVI model object that I can use downstream: I can load my query `adata` into it, train the query model, and make predictions. I'm just not sure that I can trust the results at this stage, or when I get to the scANVI modeling step.
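To make the requirement in the warning concrete (same vars, in the same order), here is a minimal sketch of a check that separates "same genes, different order" from "different genes". It assumes the training-time names are already available as a list. The function name `compare_var_names` and the toy gene names in the usage line are hypothetical and not part of scvi-tools.

```python
def compare_var_names(trained, supplied):
    """Compare two sequences of variable names by content and by order."""
    trained = [str(name) for name in trained]
    supplied = [str(name) for name in supplied]
    trained_set, supplied_set = set(trained), set(supplied)
    return {
        # True only if both lists are identical element by element
        "same_order": trained == supplied,
        # True if both contain the same names, regardless of order
        "same_set": trained_set == supplied_set,
        # names the model was trained on that the supplied adata lacks
        "missing": [name for name in trained if name not in supplied_set],
        # names in the supplied adata that the model never saw
        "extra": [name for name in supplied if name not in trained_set],
    }


report = compare_var_names(["g1", "g2", "g3"], ["g2", "g1", "g3"])
```

If `same_set` is true but `same_order` is false, indexing the `adata` with the training-order list (for example `ref_adata[:, trained_names].copy()`) should be enough to line the variables up. A non-empty `missing` list points to a real mismatch in gene content rather than an ordering problem.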