Hi
I’m trying to do label transfer with scANVI using the following script:
import scanpy as sc
import scvi  # scvi==1.0.4
from scvi.model import SCVI
from scvi.model import SCANVI
import rapids_singlecell as rsc

pancreas_full = sc.read_h5ad('./data/t2d/pancreas_full_v2.h5ad')

# Split into reference and query cells
rmask = pancreas_full.obs['batch'] == 'reference'
rdata = pancreas_full[rmask].copy()
qdata = pancreas_full[~rmask].copy()
rdata.obs['cell_type_scanvi'] = rdata.obs["cell_type"].values

# Train scVI on the reference
scvi.model.SCVI.setup_anndata(rdata, layer="counts", batch_key="batch")
scvi_ref = scvi.model.SCVI(rdata, n_layers=2, n_latent=30, gene_likelihood="nb")
scvi_ref.train(
    max_epochs=1000,
    check_val_every_n_epoch=10,
)

# Initialise scANVI from the trained scVI model and train it on the reference
scanvi_ref = scvi.model.SCANVI.from_scvi_model(
    scvi_ref,
    labels_key='cell_type_scanvi',
    unlabeled_category="Unknown",
)
scanvi_ref.train(
    max_epochs=1000,
    check_val_every_n_epoch=10,
)

# Map the query onto the reference model
scvi.model.SCANVI.prepare_query_anndata(qdata, scanvi_ref)
scanvi_query = scvi.model.SCANVI.load_query_data(qdata, scanvi_ref)
scanvi_query.train(
    max_epochs=1000,
    plan_kwargs={"weight_decay": 0.0},
    check_val_every_n_epoch=10,
)
But during the training of scanvi_query, in the first epoch, the latent space becomes an all-NaN matrix and training stops. I should mention that a colleague at TheisLab has also reported this behaviour, and we suspect that we should avoid training scVI for too many epochs.
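For anyone trying to narrow this down, here is a minimal sketch of how one could find the first epoch at which a logged value turns non-finite. It assumes the per-epoch losses have already been copied into a plain Python list (for example from the model's training history); the helper name `first_nonfinite_epoch` and the example numbers are hypothetical and not part of scvi-tools.

```python
import math


def first_nonfinite_epoch(losses):
    """Return the index of the first NaN/inf entry, or None if all are finite."""
    for epoch, value in enumerate(losses):
        if not math.isfinite(value):
            return epoch
    return None


# Hypothetical per-epoch losses; real values would come from the training logs.
example_losses = [512.3, 498.7, float("nan"), float("nan")]
print(first_nonfinite_epoch(example_losses))  # prints 2
```

Running the same check on the reference models' logs as well would show whether the NaNs only appear once the query model starts training.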
TBH, the scANVI tutorial is a bit ambiguous to me. It’s not clear to me why we need scanvi_ref and scanvi_query as two distinct models. Based on the scANVI methodology, I would expect the script to look like this:
- we don’t need scanvi_ref
- get scanvi_query by running
scanvi_query = scvi.model.SCANVI.load_query_data(qdata, scvi_ref)
scanvi_query.train()
and then:
qdata.obs[SCANVI_PREDICTIONS_KEY] = scanvi_query.predict()
P.S. I tried adding inplace_subset_query_vars=True but ran out of memory.
Thanks in advance for your help.