Matching features in query data and reference model

emdann · November 17, 2022, 10:37am

Hi,
I have a question on matching features between a trained xVI model and a query dataset (apologies if this is a duplicate).

By default, load_query_data expects the query data to have the same features as the reference model (fair enough). Setting inplace_subset_query_vars=True subsets the query in place, but it throws a KeyError if any of the model's features are missing from the query data's var_names.

First off, the KeyError is not very helpful: even if only a few genes are missing, the message still lists all the variables as invalid:

KeyError: "Values [**all the genes**] are not valid obs/ var names or indices."
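To see which features are actually absent before calling load_query_data, a plain set comparison is enough. This is only a minimal sketch: missing_features is a hypothetical helper (not part of scvi-tools), and the gene names are made up. In practice the two lists would be the reference feature names and the query var_names.

```python
def missing_features(reference_names, query_names):
    """Return the reference features absent from the query, in reference order."""
    query_set = set(query_names)
    return [name for name in reference_names if name not in query_set]


# Hypothetical gene names, for illustration only.
reference_names = ["GeneA", "GeneB", "GeneC", "GeneD"]
query_names = ["GeneD", "GeneA", "GeneX"]

missing = missing_features(reference_names, query_names)
print(f"{len(missing)}/{len(reference_names)} reference features missing: {missing}")
```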

For missing features, the scArches paper recommends zero-filling the matrix (as long as fewer than 10% of features are missing). So is there an easy way to check which features were used for training from model.pt alone, if no adata object is saved with it? Or do I have to go back to the original reference anndata?

Ideally, one should be able to share just a trained model without the big reference adata attached to it, so being able to identify the missing genes would be important (in older scvi-tools versions I could just read the var_names.csv). Perhaps load_query_data could also have an optional parameter for zero-filling, with a message reporting the fraction of features that are missing.
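To make the zero-filling idea concrete, here is a rough sketch of what I mean, written against a dense NumPy matrix rather than an AnnData object. Everything here is an assumption for illustration: zero_fill_to_reference and max_missing_fraction are hypothetical names, not scvi-tools API, and a sparse matrix would need different handling.

```python
import warnings

import numpy as np


def zero_fill_to_reference(X, query_names, reference_names, max_missing_fraction=0.1):
    """Reorder the columns of X to match reference_names, zero-filling absent features.

    Returns the filled matrix, the list of missing features, and the
    fraction of reference features that were missing.
    """
    query_pos = {name: j for j, name in enumerate(query_names)}
    filled = np.zeros((X.shape[0], len(reference_names)), dtype=X.dtype)
    missing = []
    for j, name in enumerate(reference_names):
        if name in query_pos:
            filled[:, j] = X[:, query_pos[name]]
        else:
            missing.append(name)  # this column stays all zeros
    fraction = len(missing) / len(reference_names)
    if fraction >= max_missing_fraction:
        warnings.warn(
            f"{fraction:.0%} of reference features are missing from the query"
        )
    return filled, missing, fraction


# Hypothetical toy data: 2 cells, 3 query genes, 4 reference genes.
X = np.array([[1, 2, 3], [4, 5, 6]])
filled, missing, fraction = zero_fill_to_reference(
    X,
    query_names=["GeneD", "GeneA", "GeneX"],
    reference_names=["GeneA", "GeneB", "GeneC", "GeneD"],
)
```

Query-only features (GeneX here) are dropped, and the warning threshold follows the 10% rule of thumb mentioned above.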

Thanks in advance!

adamgayoso · November 17, 2022, 6:42pm

I think this method we added is what you’re looking for. Please let me know if it’s missing anything!

adamgayoso · November 17, 2022, 6:47pm

Usage is shown in this tutorial:

emdann · November 23, 2022, 2:04pm

That’s brilliant, thanks for the pointer!
