Hi,
I have a question on matching features between a trained xVI model and a query dataset (apologies if this is a duplicate).
By default load_query_data wants matching features (fair enough). If you set inplace_subset_query_vars=True it does the subsetting in place, but it throws a KeyError if some features are missing in the query data var_names.
First off, the KeyError here is not very helpful, because even if only a few genes are missing it still reports all of the variables as invalid:
KeyError: "Values [**all the genes**] are not valid obs/ var names or indices."
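A check along these lines would show which genes are actually missing. This is only a minimal sketch on plain Python lists: find_missing_features and the toy gene names are made up for illustration and are not part of scvi-tools.

```python
def find_missing_features(reference_genes, query_genes):
    """Return the reference genes that are absent from the query, in reference order."""
    query_set = set(query_genes)
    return [gene for gene in reference_genes if gene not in query_set]


# Toy gene lists (hypothetical); in practice these would be the training
# features and the query adata.var_names.
reference_genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
query_genes = ["GeneD", "GeneA", "GeneC", "GeneX"]

missing = find_missing_features(reference_genes, query_genes)
fraction_missing = len(missing) / len(reference_genes)
print(
    f"{len(missing)} of {len(reference_genes)} reference genes are missing "
    f"({fraction_missing:.0%}): {missing}"
)
```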
In case of missing features, the scArches paper recommends zero-filling the matrix (as long as fewer than 10% of features are missing). So is there an easy way to check the features used for training from model.pt, if there is no adata object saved with it? Or do I necessarily have to go back to the original reference anndata?
Ideally one should be able to share just a trained model w/o the big reference adata attached to it, so being able to access the missing genes would be important (like in older scvi-tools versions I could just read the var_names.csv). Perhaps load_query_data could also have an optional parameter for zero filling with a message on which fraction of features are missing.
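To make the suggestion concrete, here is a rough sketch of the behaviour I have in mind. It works on a dense NumPy matrix and plain gene-name lists rather than on AnnData, and zero_fill_to_reference, max_missing, and the toy gene names are all hypothetical; none of this is existing scvi-tools API, and a real implementation would also need to handle sparse matrices.

```python
import numpy as np


def zero_fill_to_reference(X, query_genes, reference_genes, max_missing=0.10):
    """Reorder the query columns to the reference gene order, zero-filling absent genes.

    Raises ValueError unless fewer than `max_missing` of the reference genes are missing.
    """
    X = np.asarray(X)
    column_of = {gene: j for j, gene in enumerate(query_genes)}
    missing = [gene for gene in reference_genes if gene not in column_of]
    fraction = len(missing) / len(reference_genes)
    print(f"{len(missing)}/{len(reference_genes)} reference features missing ({fraction:.1%})")
    if fraction >= max_missing:
        raise ValueError(f"Too many missing features to zero-fill: {fraction:.1%}")

    # Start from all zeros, then copy over the columns that the query does have.
    filled = np.zeros((X.shape[0], len(reference_genes)), dtype=X.dtype)
    for j, gene in enumerate(reference_genes):
        if gene in column_of:
            filled[:, j] = X[:, column_of[gene]]
    return filled, missing


# Toy data: 20 reference genes; the query lacks gene3, is in a different
# order, and carries one extra gene that the reference does not know about.
reference_genes = [f"gene{i}" for i in range(20)]
query_genes = [g for g in reversed(reference_genes) if g != "gene3"] + ["extra_gene"]
X = np.arange(1, 41).reshape(2, 20)  # 2 cells x 20 query genes, no zeros

filled, missing = zero_fill_to_reference(X, query_genes, reference_genes)
```

The threshold follows the "less than 10%" rule mentioned above, so exactly 10% missing is rejected in this sketch.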
Thanks in advance!