Creating a scanvi model from an scvi model saved to file

Following the tutorial here,

an scvi model can be saved and reloaded afterwards:

scvi.model.SCVI.setup_anndata(adata_ref, batch_key="tech", layer="counts")
vae_ref = scvi.model.SCVI(
    adata_ref,
    **arches_params
)
vae_ref.train()
dir_path = "pancreas_model/"
vae_ref.save(dir_path, overwrite=True) 

But I can't create a scanvi model from the saved scvi file. Do the two trainings (scvi, scanvi) have to happen in the same session?
In this example the scvi_model is already loaded in the session, and from_scvi_model doesn't accept a path for its scvi_model argument:

scvi_model = SCVI(adata_ref, **arches_params)
scvi_model.train()
# we can't save and restart here
scanvi_model = SCANVI.from_scvi_model(scvi_model, unlabeled_category="Unknown")
scanvi_model.train()

And this wouldn't work if you don't have the reference anndata that was used to create the scvi model (which defeats the purpose of an already trained model):

vae = scvi.model.SCANVI.load("pancreas_model/")
ValueError: Save path contains no saved anndata and no adata was passed.

Can the scvi training be split from the scanvi training by allowing the second to start from the saved file instead of from a model instance in the active session?

thanks!

Are you saying this doesn’t work?

scvi.model.SCVI.setup_anndata(adata_ref, batch_key="tech", layer="counts")
vae_ref = scvi.model.SCVI(
    adata_ref,
    **arches_params
)
vae_ref.train()
dir_path = "pancreas_model/"
vae_ref.save(dir_path, overwrite=True) 
adata_ref.write_h5ad(...)

New session

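import anndata
import scvi
dir_path = "pancreas_model/"  # a new session has to define the path again
adata_ref = anndata.read_h5ad(...)
scvi_model = scvi.model.SCVI.load(dir_path, adata_ref)
scanvi_model = scvi.model.SCANVI.from_scvi_model(scvi_model, unlabeled_category="Unknown")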

Thanks for your reply!

scvi_model = scvi.model.SCVI.load("/pancreas_model/")
returns: ValueError: Save path contains no saved anndata and no adata was passed.

I saved the model with

dir_path = "pancreas_model/"
vae_ref.save(dir_path, overwrite=True)

For scvi.model.SCVI.load to work, either the model must have been saved together with an anndata (so the save call above needs save_anndata=True; the default is False), or an anndata must be provided. According to the documentation, it should be

organized in the same way as data used to train model. It is not necessary to run setup_anndata(), as AnnData is validated against the saved scvi setup dictionary. If None, will check for and load anndata saved with the model.

Does this mean I always need the reference anndata? What exactly does it mean to provide an anndata organized in the same way as the data used for scvi training? Would a dummy one do?

thanks!

Hi,
I think you may have a typo in this command. The save path, as you mentioned, is "pancreas_model/", not "/pancreas_model/" (there is an extra slash at the beginning). To avoid this kind of error, it's better to reuse the variable dir_path in the load command instead of retyping the path, e.g. scvi_model = scvi.model.SCVI.load(dir_path)
Hope it helps,
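
To make the difference between the two strings concrete, here is a small standard-library sketch, assuming a POSIX-style filesystem. It does not use scvi at all, and the variable names saved_to and loaded_from are only illustrative.

```python
import os

saved_to = "pancreas_model/"      # relative: resolved against the current working directory
loaded_from = "/pancreas_model/"  # absolute: resolved against the filesystem root

print(os.path.isabs(saved_to))     # False
print(os.path.isabs(loaded_from))  # True

# Show where each string actually points from the current session.
print(os.path.abspath(saved_to))
print(os.path.abspath(loaded_from))
```

The two only point at the same directory when the session's working directory happens to be the filesystem root, which is why reusing dir_path is the safer habit.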

Just updated my original reply to fix some typos – yes the load command needs an anndata unless you saved it with one.
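
For anyone landing here later, a minimal sketch of checking up front whether a save directory can be loaded without passing an anndata. The helper names has_saved_anndata and load_scvi_model are hypothetical, and the file name adata.h5ad is an assumption about what save(..., save_anndata=True) writes, so check it against your installed scvi-tools version.

```python
import os


def has_saved_anndata(dir_path, prefix=""):
    """Return True if the save directory appears to contain a saved AnnData.

    Assumption: save(..., save_anndata=True) writes '<prefix>adata.h5ad'
    into dir_path; this file name is not confirmed in the thread above.
    """
    return os.path.isfile(os.path.join(dir_path, f"{prefix}adata.h5ad"))


def load_scvi_model(dir_path, adata=None):
    """Load a saved SCVI model, failing early if no AnnData is available."""
    if adata is None and not has_saved_anndata(dir_path):
        raise ValueError(
            f"{dir_path!r} holds no saved AnnData; pass the reference AnnData "
            "or re-save the model with save_anndata=True."
        )
    # Imported lazily so the check above also works without scvi-tools installed.
    import scvi

    return scvi.model.SCVI.load(dir_path, adata=adata)
```

With this, the first session would save with vae_ref.save(dir_path, overwrite=True, save_anndata=True), and a later session could call load_scvi_model(dir_path) and pass the result to SCANVI.from_scvi_model as in the reply above. If the model was saved without an anndata, pass the reference anndata as the second argument instead.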

ok, thanks for clarifying!