Reloading saved model gives different results

Hello,

I’m working through the C. elegans tutorial for scVI. The first run through the tutorial worked great, and I was able to replicate all the results. However, when I saved the trained model and then reloaded it, I got substantially different results.

Specifically, I used the following code to save the model after training OR load the saved model, depending on the TRAIN_MODEL variable:

# Previous code: import data and set up adata, following the tutorial exactly

TRAIN_MODEL = False
model_dir = os.path.join(save_dir, "scvi_model")

if TRAIN_MODEL:
    model = scvi.model.SCVI(adata, gene_likelihood='nb')
    model.train(
        check_val_every_n_epoch=1,
        max_epochs=400,
        early_stopping=True,
        early_stopping_patience=20,
        early_stopping_monitor='elbo_validation',
        use_gpu=False,
    )
    model.save(model_dir, overwrite=True)
else:
    model = scvi.model.SCVI.load(model_dir, adata=adata)

# Following code: get scVI latent space, apply UMAP and visualize
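(For reference, the step that comment refers to is roughly the sketch below, assuming scanpy is imported as sc earlier in the script; the obsm key and the obs column used for coloring are placeholders on my part, not necessarily the tutorial’s exact names.)

latent = model.get_latent_representation()
adata.obsm["X_scVI"] = latent  # assumed key name for the latent space

sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
sc.pl.umap(adata, color="cell.type")  # placeholder obs column for cell types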

As an example of the differences I see: when I train the model fresh, I get a really nice replication of the UMAP from the tutorial. I don’t have my own result on hand and, as a new user, can only share one picture, so this is the tutorial image.

If I reload the model, I get much worse separation of my cell types:
[image: UMAP after reloading the model, showing much worse cell-type separation]

Am I saving/loading the model incorrectly somehow? I didn’t see anything that seemed noteworthy in the save/load_model docs.

Thank you for any help!


This happens in my case as well. I train the model and check the latent representation, which shows good separation of the cell-type clusters and good mixing of the batch effects. However, when I save the model, reload it, and run get_latent_representation, I also get a “scrambled” representation of my cells, similar to @pckinnunen_lbl.

Update:

  • I tested this a bit, and it only happens if I restart my kernel. If I run training, saving, and loading within the same Jupyter notebook session, the representations always look nice. However, if I restart the Jupyter kernel and then load the model, the cells get scrambled.
  • This happens even if I set torch.manual_seed(0) at the beginning of my script.
  • It happens in both scvi-tools 1.0.4 and 1.1.0.

@martinkim0 , any idea what we are doing wrong?

Not sure what might be going wrong right now - could you send a reproducible script with data? I can look into this further.

Hey @martinkim0, I figured the issue out…

I overlooked this warning: “var_names for adata passed in does not match var_names of adata used to train the model. For valid results, the vars need to be the same and in the same order as the adata used to train the model.”

I am working with gene subsets of my reference data, and at the beginning of the script I subset the reference based on a list of genes. The problem was the gene order: every time I ran the notebook, the order of the genes changed and no longer matched the gene order of the AnnData the model was trained on… Once I fixed the gene order, everything worked fine!

@pckinnunen_lbl: compare the genes, and their order, in your AnnData after loading it! If some genes are missing, or the gene order in var_names differs from the order in the AnnData the model was trained on, you can get exactly this result! A quick way to check and fix the order is sketched below.
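A minimal sketch of that check, assuming you kept the var_names of the training AnnData somewhere (the file name training_var_names.txt below is just a placeholder):

import numpy as np

# var_names the model was trained on; "training_var_names.txt" is a
# hypothetical file you would have saved yourself after training
reference_var_names = list(np.loadtxt("training_var_names.txt", dtype=str))

# Check that the same genes are present and in the same order
same_genes = set(adata.var_names) == set(reference_var_names)
same_order = list(adata.var_names) == reference_var_names
print(f"same genes: {same_genes}, same order: {same_order}")

# If the genes match but the order differs, reorder before loading the model
if same_genes and not same_order:
    adata = adata[:, reference_var_names].copy()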


Glad you were able to figure out the issue!

@gargerd Thank you! I hadn’t fully examined my outputs, since I was running the notebook on the same data, in the same way, each time. However, the gene selection the tutorial does with scvi.data.poisson_gene_selection(adata) is stochastic, so a slightly different set of genes was selected each time I ran the code.

The following code snippet shows what happens (it should work on any computer; it’ll just download the packer2019 C. elegans data).

import os
import scanpy as sc
import scvi

# Get the data
save_dir = 'c_elegans_tut_files'
adata_path = os.path.join(save_dir, "packer2019.h5ad")
adata = sc.read(
    adata_path,
    backup_url="https://github.com/Munfred/wormcells-site/releases/download/packer2019/packer2019.h5ad",
)

# Make two identical copies of the data
adata1 = adata.copy()
adata2 = adata.copy()

# Select highly variable genes in each one
scvi.data.poisson_gene_selection(adata1)
scvi.data.poisson_gene_selection(adata2)

# Calculate the fraction of matching highly_variable flags between the two adatas
print(
    sum(adata1.var.highly_variable == adata2.var.highly_variable)
    / len(adata1.var.highly_variable)
)

This showed that only ~99% of the genes match, so each run identifies a slightly different set of genes.

Hmm, I see - could you check whether adding scvi.settings.seed = 0 makes the results reproducible? If not, I’ll take a look at what’s causing this.

Sure!

Restarting the kernel and running the same code without a seed yields different percentages of matching indices between adata1 and adata2 each time:
run 1: 0.9931757491840569
run 2: 0.9932746513697953
run 3: 0.9934724557412719

If I add scvi.settings.seed = 0, I get identical results on each run. However, within a run, the genes selected for adata1 and adata2 still differ from each other:
run 1: 0.9930768469983187
run 2: 0.9930768469983187
run 3: 0.9930768469983187

So, using the seed would solve my problem. Thanks again!
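In case it helps: as far as I can tell, resetting the seed immediately before each call should also make the two selections identical, since both calls then start from the same RNG state (this is just my reading of the behavior above, not something from the docs):

import scvi

# Reset the seed right before each selection so both calls start from the
# same RNG state; otherwise the second call continues the random stream
# and can flag a slightly different gene set.
scvi.settings.seed = 0
scvi.data.poisson_gene_selection(adata1)

scvi.settings.seed = 0
scvi.data.poisson_gene_selection(adata2)

# The highly_variable flags should now match exactly
print(sum(adata1.var.highly_variable == adata2.var.highly_variable) / len(adata1.var.highly_variable))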
