Modify gene identifiers stored in existing scVI model

Hi! I trained a lot of scVI models with different parameters to assess their impact. However, I realised at the end of the exercise that the var of the original AnnData file was indexed by a simple numeric index (so the index was neither gene identifiers nor gene symbols). It took quite a while to train all the models. Is it possible to modify the saved models, perhaps by loading them as torch objects, so that I change only the gene labels they store (so that I can use them for cell type labelling with other datasets)?

If I load the model through torch:

model = torch.load(f"{model_path}/model.pt", map_location=torch.device("cpu"))

I can see these two dictionary elements with arrays containing my gene indices:

>> model['var_names']
array(['67', '83', '125', ..., '60058', '60060', '60159'], dtype=object)
>> model['attr_dict']['registry_']['field_registries']['X']['state_registry']['column_names']
array(['67', '83', '125', ..., '60058', '60060', '60159'], dtype=object)

They contain the same values, but they are different elements in memory (if I modify one, the other remains as it was). Would it suffice to replace the values in these arrays? Or is there something else in the torch model that would require being changed? Or is there a method from scvi that would allow me to replace these indices more cleanly without the need to fix this through torch and then re-serialise?

Thanks!

If I understood you correctly, you can simply load the saved model with
model = SCVI.load(pretrained_scvi_path, adata=adata_tmp)
where adata_tmp is the other AnnData you want to use. Its var_names may differ from the ones used in training (for example, real gene identifiers instead of the numeric indices), but it must have the same number of genes, in the same order. The rest of the model registry will initialise correctly, and you should then be able to fine-tune the training.

I think your method will also work, though. You can compare the two.
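
A minimal sketch of the direct-edit route from the question, in case it helps. The dictionary keys are the ones shown in the question; translate_names, relabel_saved_model, the mapping argument and out_path are hypothetical names made up for illustration, and I am assuming the two arrays are the only places where the labels are stored, which is worth verifying by loading the edited file with SCVI.load afterwards.

```python
def translate_names(old_names, mapping):
    """Map each old var name to its new label, preserving the original order."""
    missing = [name for name in old_names if name not in mapping]
    if missing:
        raise KeyError(f"no new label for: {missing[:5]}")
    new_names = [mapping[name] for name in old_names]
    if len(set(new_names)) != len(new_names):
        raise ValueError("new labels are not unique")
    return new_names


def relabel_saved_model(model_path, mapping, out_path):
    """Rewrite the gene labels stored in model_path/model.pt into out_path/model.pt."""
    # Heavy imports are kept local so translate_names can be used on its own.
    import numpy as np
    import torch

    # The file holds pickled NumPy object arrays, which weights-only loading
    # may reject; only pass weights_only=False for files you created yourself.
    state = torch.load(
        f"{model_path}/model.pt", map_location="cpu", weights_only=False
    )

    old_names = [str(name) for name in state["var_names"]]
    new_names = np.array(translate_names(old_names, mapping), dtype=object)

    # Assumption: these two arrays are the only copies of the labels.
    state["var_names"] = new_names
    x_state = state["attr_dict"]["registry_"]["field_registries"]["X"]["state_registry"]
    x_state["column_names"] = new_names.copy()

    # out_path must already exist; keeping the original file untouched makes
    # it easy to compare both versions.
    torch.save(state, f"{out_path}/model.pt")
```

Here mapping would be a dict from the old numeric labels (as strings, e.g. "67") to the gene identifiers you want. Writing to a separate directory rather than overwriting keeps the trained model safe if something turns out to be missing.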

Not sure it will help here, but another, more general option is to use prepare_query_anndata and then load_query_data with the new dataset. In that case it will look for the gene identifiers shared between the model and the new adata, order them as the model expects, and pad the missing ones with 0. You might use that once you have fixed the saved models.
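
To make the reorder-and-pad behaviour concrete, here is a toy illustration in plain Python. It is not the scvi-tools implementation, just a sketch of the idea on nested lists; align_to_model_genes and the gene names are hypothetical.

```python
def align_to_model_genes(rows, query_genes, model_genes):
    """Reorder the columns of rows to model_genes, filling absent genes with 0."""
    column_of = {gene: i for i, gene in enumerate(query_genes)}
    aligned = [
        [row[column_of[gene]] if gene in column_of else 0 for gene in model_genes]
        for row in rows
    ]
    # Genes the model expects but the query lacks; these columns are all zeros.
    missing = [gene for gene in model_genes if gene not in column_of]
    return aligned, missing


# Two cells measured on genes B, A, D; the model was trained on A, B, C.
rows = [[5, 1, 7], [0, 2, 3]]
aligned, missing = align_to_model_genes(rows, ["B", "A", "D"], ["A", "B", "C"])
print(aligned)  # [[1, 5, 0], [2, 0, 0]]
print(missing)  # ['C']
```

Note that gene D is dropped and gene C becomes a column of zeros. This is why the labels stored in the model need to be real gene identifiers first: with the numeric indices, nothing would match the new dataset.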