Get var_names used for training SCVI model

trev-f · September 11, 2023, 5:02pm

Goal

I am trying to use scANVI to annotate cell types in my scRNA-seq data, using a published pretrained scVI model and the adata used to train that model as a reference. I am following the "Reference mapping with scvi-tools" tutorial from the scvi-tools docs.

Problem

Loading the saved scVI model gives a warning that the adata I pass in has different var_names than the adata used to train the model. Although the model does load, the warning makes me question whether I can trust the cell type predictions I will eventually get from scANVI.

Question

Is there a way to extract the names and order of the variables used to train the scVI model either from the saved scVI model or from the loaded SCVI model object?
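
To make the question concrete, here is a minimal sketch of one thing that might work. It assumes the save directory contains a `model.pt` checkpoint that is a dictionary with a `var_names` entry, which is my understanding of how recent scvi-tools releases save models. Older releases may use a different layout, and I have not confirmed either for this particular model. Both helper names are hypothetical, not part of scvi-tools.

```python
import os


def var_names_from_checkpoint(checkpoint):
    """Return the training var_names held in an already-loaded checkpoint dict.

    Assumption: the checkpoint is a dict with a "var_names" entry.
    """
    if "var_names" not in checkpoint:
        raise KeyError(
            "checkpoint has no 'var_names' entry; "
            "the model may have been saved with a different layout"
        )
    # Convert to plain str so the result compares cleanly with adata.var_names.
    return [str(name) for name in checkpoint["var_names"]]


def read_trained_var_names(model_dir):
    """Load <model_dir>/model.pt and return its var_names in stored order."""
    import torch  # imported lazily so the helper above works without torch

    checkpoint = torch.load(
        os.path.join(model_dir, "model.pt"),
        map_location="cpu",
        # The checkpoint holds non-tensor objects, so full unpickling is
        # needed; only do this for files from a source you trust.
        weights_only=False,
    )
    return var_names_from_checkpoint(checkpoint)
```

If the layout matches, calling `read_trained_var_names` on the model directory would return the names in the order they were stored, which could then be used to subset the reference adata before loading the model.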

Details

I have downloaded a published SCVI model and the full adata used to train the model. The model was trained on a subset of highly variable genes, and even though I have the code for how the highly variable genes were selected, it appears that I am not able to extract the highly variable genes in the right order.

This is my code for reading in the reference adata and list of highly variable genes then attempting to load the model:

import json

import numpy as np
import scanpy as sc
import scvi

# read reference adata
ref_h5ad_path = '../data/hypomap/hypoMap.h5ad'
ref_adata = sc.read(ref_h5ad_path)

# read list of highly variable genes
with open('../data/hypomap/hypoMap_highly_variable_genes.json') as f:
    highly_variable_genes = np.array(json.load(f))

# subset adata to the highly variable genes, in the order given by the list
ref_adata = ref_adata[:, highly_variable_genes].copy()

# load scVI model; this is the call that emits the var_names warning
scvi_model_path = '../data/hypomap/model/'
scvi_ref = scvi.model.SCVI.load(
    scvi_model_path,
    adata=ref_adata
)

and this is the warning that I receive:

<project dir path>/renv/python/virtualenvs/renv-python-3.10/lib/python3.10/site-packages/scvi/model/base/_base_model.py:698: UserWarning: var_names for adata passed in does not match var_names of adata used to train the model. For valid results, the vars need to be the same and in the same order as the adata used to train the model.
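
Since the warning asks for the same vars in the same order, it would help to know which of the two conditions fails. The following is a small sketch of such a check, assuming the training var_names can be obtained somehow. The function name and the keys of the returned dictionary are made up for illustration.

```python
def compare_var_names(trained, candidate):
    """Describe how two sequences of variable names differ.

    trained:   names the model was trained on, in training order
    candidate: names of the adata being passed in (e.g. adata.var_names)
    """
    trained = [str(v) for v in trained]
    candidate = [str(v) for v in candidate]
    trained_set = set(trained)
    candidate_set = set(candidate)

    identical = trained == candidate
    # Same names (including any duplicates) but arranged differently.
    reordered = (not identical) and sorted(trained) == sorted(candidate)

    return {
        "identical": identical,
        "same_genes_different_order": reordered,
        "missing_from_candidate": [v for v in trained if v not in candidate_set],
        "extra_in_candidate": [v for v in candidate if v not in trained_set],
    }
```

If only `same_genes_different_order` is true, reindexing the adata to the training order should be enough. Non-empty `missing_from_candidate` or `extra_in_candidate` lists would instead point to a different gene set.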

Again, this does give me a successfully instantiated SCVI model object that I can use downstream: loading my query adata, training the query model, and making predictions. I’m just not sure that I can trust the results at this stage or when I get to the scANVI modeling step.
