Modify gene identifiers stored in existing scVI model

pcm32 · April 11, 2025, 2:52pm

Hi! I trained a lot of scVI models with different parameters to assess their impact. However, I realised at the end of the exercise that the original AnnData file was indexed in var by a simple numeric index (so the index was neither gene identifiers nor gene symbols). It took quite a while to train all the models. Is it possible to modify the models, perhaps by loading them as torch objects, so that I change only the gene labels used in the model (so that I can use them for cell type labelling with other datasets)?

If I load the model through torch:

model = torch.load(f"{model_path}/model.pt", map_location=torch.device("cpu"))

I can see these two dictionary elements with arrays containing my gene indices:

>> model['var_names']
array(['67', '83', '125', ..., '60058', '60060', '60159'], dtype=object)
>> model['attr_dict']['registry_']['field_registries']['X']['state_registry']['column_names']
array(['67', '83', '125', ..., '60058', '60060', '60159'], dtype=object)

They contain the same values, but they are different elements in memory (if I modify one, the other remains as it was). Would it suffice to replace the values in these arrays? Or is there something else in the torch model that would require being changed? Or is there a method from scvi that would allow me to replace these indices more cleanly without the need to fix this through torch and then re-serialise?

Thanks!

ori-kron-wis · April 20, 2025, 1:59pm

If I understood you correctly, you can load the saved model simply with
model = SCVI.load(pretrained_scvi_path, adata=adata_tmp)
where adata_tmp is the other adata you use. Its var_names may differ from those used in training, e.g. other indices (although it must have the same number of genes, and order matters). The rest of the model's registry will initialise correctly. You should then be able to fine-tune training.
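A minimal sketch of this loading route, assuming `adata` is the AnnData the models were trained on (same genes, same order, and still carrying whatever batch or label columns were used at setup) and `new_names` lists the real gene identifiers in that same order. `check_names` and `load_with_new_names` are hypothetical helpers, not scvi-tools functions; the only library call is `SCVI.load`, which may warn that the var_names differ from the ones stored with the model.

```python
def check_names(old_names, new_names):
    """Hypothetical helper: sanity-check a positional relabelling of genes."""
    old_names, new_names = list(old_names), list(new_names)
    if len(old_names) != len(new_names):
        raise ValueError("number of genes must stay the same")
    if len(set(new_names)) != len(new_names):
        raise ValueError("new gene names must be unique")
    return new_names


def load_with_new_names(model_dir, adata, new_names):
    """Hypothetical wrapper: relabel adata.var_names, then load the saved model."""
    from scvi.model import SCVI  # imported lazily so check_names works without scvi

    adata = adata.copy()  # leave the caller's object untouched
    # Relabelling is purely positional: gene i keeps its column, only the name changes.
    adata.var_names = check_names(adata.var_names, new_names)
    return SCVI.load(model_dir, adata=adata)
```

Because the relabelling is positional, it is only safe when the order of `new_names` really matches the order of the numeric index used in training.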

I think your method will also work, though. You can compare the two.
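The manual patch from the question could be sketched roughly as follows. It touches only the two entries reported in the thread (`var_names` and the `column_names` of the `X` field registry); whether other parts of the file also carry gene labels is not confirmed here. `relabel_saved_names`, `patch_model_file`, `old_to_new` and `out_dir` are hypothetical names, and passing `weights_only=False` assumes a recent PyTorch whose default would otherwise refuse to unpickle the NumPy arrays.

```python
def relabel_saved_names(state, old_to_new):
    """Replace gene labels in the two places seen in the loaded model.pt dict."""
    registry_x = state["attr_dict"]["registry_"]["field_registries"]["X"]["state_registry"]
    for holder, key in ((state, "var_names"), (registry_x, "column_names")):
        names = holder[key]
        # Build the full replacement first; a KeyError here means an unmapped label.
        new = [old_to_new[str(n)] for n in names]
        if hasattr(names, "astype"):
            # NumPy array: keep it an object-dtype array so long names are not truncated.
            replacement = names.astype(object)
            replacement[:] = new
        else:
            replacement = new
        holder[key] = replacement
    return state


def patch_model_file(model_dir, old_to_new, out_dir):
    """Hypothetical helper: write a relabelled copy of model.pt into out_dir."""
    import os
    import torch  # imported lazily; only needed for the file round trip

    # Assumption: weights_only=False is needed on recent PyTorch for this pickle.
    state = torch.load(os.path.join(model_dir, "model.pt"),
                       map_location="cpu", weights_only=False)
    relabel_saved_names(state, old_to_new)
    os.makedirs(out_dir, exist_ok=True)
    torch.save(state, os.path.join(out_dir, "model.pt"))
```

Writing to a separate `out_dir` keeps the original file intact, so the patched model can be compared against one loaded through `SCVI.load` with a relabelled adata.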

Not sure it will help here, but another, more general option is to use prepare_query_anndata and then load_query_data with the new dataset. In that case it looks for the gene indices shared between the model and the new adata, orders them as the model expects, and pads the missing ones with 0. You might use that once you have fixed the saved models.
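To make the reorder-and-pad behaviour concrete, here is a small sketch. `align_to_model_genes` is a toy, pure-Python illustration of the idea on plain lists, not the scvi-tools implementation; `map_query` is a hypothetical wrapper around the two calls named above, and it assumes `prepare_query_anndata` modifies the AnnData in place by default.

```python
def align_to_model_genes(model_genes, query_genes, query_rows):
    """Toy illustration: put columns in model order, zero-fill genes the query lacks."""
    col = {gene: i for i, gene in enumerate(query_genes)}
    return [
        [row[col[gene]] if gene in col else 0 for gene in model_genes]
        for row in query_rows
    ]


def map_query(model_dir, query_adata):
    """Hypothetical wrapper around the two scvi-tools calls mentioned in the reply."""
    from scvi.model import SCVI  # imported lazily so the toy function runs without scvi

    # Assumption: with default arguments this reorders/pads query_adata in place.
    SCVI.prepare_query_anndata(query_adata, model_dir)
    return SCVI.load_query_data(query_adata, model_dir)


# One cell measured for genes C and A; the model expects A, B, C.
print(align_to_model_genes(["A", "B", "C"], ["C", "A"], [[3, 1]]))  # [[1, 0, 3]]
```

Matching is by gene name, which is why this route only becomes useful once the saved models carry real identifiers instead of the numeric index.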
