Goal
I am attempting to use scANVI to perform cell type annotation for my scRNA-seq data, using a published pretrained scVI model (and the `adata` used to train that model) as the reference. I am following the "Reference mapping with scvi-tools" tutorial from the scvi-tools docs.
Problem
Attempting to load a saved scVI model gives a warning that the `adata` I am using to load the model has different `var_names` than the `adata` used to train the model. Although the model loads, this warning makes me question whether I can trust the cell type predictions I will eventually get from scANVI.
Question
Is there a way to extract the names and order of the variables (genes) used to train the scVI model, either from the saved model directory or from the loaded `SCVI` model object?
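One possible approach is to read the names straight from the files written by `save`. The sketch below assumes the layout of recent scvi-tools releases, where `{prefix}model.pt` is a torch checkpoint containing a `"var_names"` entry, and falls back to an assumed older layout that wrote one gene name per line to `var_names.csv`. `saved_var_names` is a hypothetical helper, not part of scvi-tools, and `weights_only=False` assumes PyTorch 1.13 or later; only load checkpoints you trust this way.

```python
import os


def saved_var_names(model_dir, prefix=""):
    """Return the training var_names stored next to a saved scvi-tools model.

    Assumption: recent scvi-tools writes `{prefix}model.pt` with a "var_names"
    entry; older releases are assumed to have written `var_names.csv` with one
    gene name per line.
    """
    model_pt = os.path.join(model_dir, f"{prefix}model.pt")
    if os.path.exists(model_pt):
        import torch  # imported lazily so the legacy path works without torch

        # The checkpoint holds non-tensor objects (an array of gene names,
        # attribute dict), so full unpickling is needed: trusted files only.
        saved = torch.load(model_pt, map_location="cpu", weights_only=False)
        return [str(name) for name in saved["var_names"]]

    legacy_csv = os.path.join(model_dir, f"{prefix}var_names.csv")
    if os.path.exists(legacy_csv):
        with open(legacy_csv) as f:
            # keep file order; skip blank lines
            return [line.strip() for line in f if line.strip()]

    raise FileNotFoundError(f"no model.pt or var_names.csv found in {model_dir!r}")
```

For the setup in this post this would be called as `saved_var_names('../data/hypomap/model/')`. The returned list appears to be what `SCVI.load` compares the passed `adata.var_names` against when it emits the warning, so its order is the order the model expects.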
Details
I have downloaded a published scVI model and the full `adata` used to train it. The model was trained on a subset of highly variable genes, and even though I have the code showing how those genes were selected, it appears that I am not able to recover the highly variable genes in the same order.
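To tell whether the mismatch is a different gene set or only a different order, the JSON gene list can be compared against the training names (for example, names read from the saved model files). A minimal sketch; `diagnose_var_names` is a hypothetical helper, not an scvi-tools function:

```python
def diagnose_var_names(candidate, reference):
    """Compare a candidate gene list with the model's training var_names.

    Reports whether the lists are identical, contain the same genes in a
    different order, or differ in content (missing or extra genes).
    """
    cand_set, ref_set = set(candidate), set(reference)
    # genes the model was trained on that the candidate list lacks
    missing = [g for g in reference if g not in cand_set]
    # genes in the candidate list that the model never saw
    extra = [g for g in candidate if g not in ref_set]
    return {
        "identical": list(candidate) == list(reference),
        "same_genes": not missing and not extra and len(candidate) == len(reference),
        "missing": missing,
        "extra": extra,
    }
```

If `same_genes` is true but `identical` is false, subsetting with the training list (`ref_adata[:, training_names].copy()`, where `training_names` is a placeholder for that list) returns the columns in that order. If genes are missing or extra, the reproduced highly variable gene selection differs from the one used for training, and reordering alone will not fix it.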
This is my code for reading in the reference `adata` and the list of highly variable genes, then attempting to load the model:
```python
import json

import numpy as np
import scanpy as sc
import scvi

# read reference adata
ref_h5ad_path = '../data/hypomap/hypoMap.h5ad'
ref_adata = sc.read(ref_h5ad_path)
# read list of highly variable genes
with open('../data/hypomap/hypoMap_highly_variable_genes.json') as f:
    highly_variable_genes = np.array(json.load(f))
# subset adata and keep highly variable genes (columns follow the JSON list order)
ref_adata = ref_adata[:, highly_variable_genes].copy()
# load scVI model
scvi_model_path = '../data/hypomap/model/'
scvi_ref = scvi.model.SCVI.load(
    scvi_model_path,
    adata=ref_adata,
)
```
and this is the warning that I receive:
```
<project dir path>/renv/python/virtualenvs/renv-python-3.10/lib/python3.10/site-packages/scvi/model/base/_base_model.py:698: UserWarning: var_names for adata passed in does not match var_names of adata used to train the model. For valid results, the vars need to be the same and in the same order as the adata used to train the model.
```
Again, this does give me a successfully instantiated `SCVI` model object that I can use downstream to load my query `adata` into, train the query model, and make predictions. I'm just not sure whether I can trust the results at this stage, or later when I get to the scANVI modeling step.
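The same ordering concern applies to the query. In the reference mapping workflow, scvi-tools provides `prepare_query_anndata`, which returns the query with features sorted to the reference order and zero-padded for reference genes the query lacks. The NumPy sketch below only illustrates that idea on a dense cells x genes matrix; `align_to_reference` is a hypothetical helper, not the library's implementation, and sparse matrices would need different handling.

```python
import numpy as np


def align_to_reference(X, query_genes, reference_genes):
    """Reorder the columns of a dense cells x genes matrix to reference_genes.

    Reference genes absent from the query are filled with zeros; query genes
    the reference does not use are dropped.
    """
    X = np.asarray(X)
    position = {g: i for i, g in enumerate(query_genes)}
    aligned = np.zeros((X.shape[0], len(reference_genes)), dtype=X.dtype)
    for j, gene in enumerate(reference_genes):
        i = position.get(gene)
        if i is not None:
            aligned[:, j] = X[:, i]  # copy the query column into reference slot j
    return aligned
```

For example, a query with genes `["B", "A", "Z"]` aligned to reference genes `["A", "B", "C"]` keeps A and B in reference order, drops Z, and adds an all-zero column for C.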