SCVI and SCANVI for label transfer: how to assess accuracy?

Hello, I followed this tutorial: sanbomics_scripts/scvi_label_transfer.ipynb (main branch of mousepixels/sanbomics_scripts on GitHub).

The notebook first concatenates the sample(s) with unknown labels with the reference. My reference and my samples are in different batches. I then created vae = scvi.model.SCVI(adata) without any extra arguments and trained it, and training ran for only 6 epochs (the data has about 1.4 million cells). Is 6 epochs a good number, and how can I assess whether the model is accurate?
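A likely explanation for the 6 epochs: when max_epochs is not passed, recent scvi-tools versions choose it with a heuristic that shrinks the epoch count as the dataset grows, roughly min(round(20000 / n_cells * 400), 400). The sketch below re-implements that rule as an assumption (the function name default_max_epochs and its defaults are illustrative, not the library's API) to show why about 1.4 million cells lands on 6 epochs:

```python
def default_max_epochs(n_cells: int, epochs_cap: int = 400, decay_at_n_obs: int = 20_000) -> int:
    """Approximate scvi-tools' default epoch heuristic (re-implemented here as an assumption)."""
    # Scale the cap down in proportion to dataset size: 20k cells -> 400 epochs, 40k -> 200, ...
    epochs = round((decay_at_n_obs / n_cells) * epochs_cap)
    # Never exceed the cap, and always train for at least one epoch.
    return max(1, min(epochs, epochs_cap))


for n_cells in (10_000, 100_000, 1_400_000):
    print(n_cells, default_max_epochs(n_cells))
```

Passing an explicit value, e.g. vae.train(max_epochs=20), overrides the heuristic.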

Then, to predict the labels of my samples (whose label is "Unknown"), I ran:
lvae = scvi.model.SCANVI.from_scvi_model(vae, adata=adata, unlabeled_category="Unknown",
labels_key="cell_ontology_class")

lvae.train(max_epochs=20, n_samples_per_label=100)

What does n_samples_per_label mean? My understanding is that it takes representative cells for each label, in this case 100 cells. Are those representative cells taken from the unknown cells, or from somewhere else?

I would appreciate any help with this method.

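I tend to train for at least 20 epochs, though this is based on experience rather than a fixed rule. Afterwards, check elbo_validation and elbo_train: both curves should have flattened out, and a validation curve that diverges from the training curve suggests overfitting. You can also increase batch_size to 1024; compared with the default of 128, this cuts the number of training steps per epoch by a factor of 8 and speeds up training. Yes, with n_samples_per_label=100 it takes 100 representative cells for each cell type (or all cells of a type if it has fewer than 100). These are drawn from the labeled reference cells, not from the unknown cells. The classifier doesn't use class-balanced weights, so this subsampling helps balance the cell types.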
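To make the balancing idea concrete, here is a simplified sketch of per-label subsampling, assuming (as described above) that up to n_samples_per_label cells are drawn from each labeled cell type, that smaller types keep all their cells, and that "Unknown" cells are never drawn. The function subsample_per_label is hypothetical and is not how scvi-tools implements this internally:

```python
import numpy as np


def subsample_per_label(labels, n_samples_per_label, unlabeled_category="Unknown", seed=0):
    """Hypothetical sketch: pick up to n_samples_per_label labeled cells per cell type."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    chosen = []
    for label in np.unique(labels):
        if label == unlabeled_category:
            continue  # query cells have no label, so they never feed the classifier here
        idx = np.flatnonzero(labels == label)
        if len(idx) <= n_samples_per_label:
            chosen.append(idx)  # rare cell type: keep every cell
        else:
            chosen.append(rng.choice(idx, size=n_samples_per_label, replace=False))
    return np.sort(np.concatenate(chosen)) if chosen else np.array([], dtype=int)


labels = ["T cell"] * 500 + ["B cell"] * 40 + ["Unknown"] * 1000
idx = subsample_per_label(labels, n_samples_per_label=100)
print(len(idx))  # 140: 100 T cells plus all 40 B cells, no "Unknown" cells
```

Without this cap, a 500-cell type would contribute more than 12 times as many examples to the classification loss as a 40-cell type; with it, the ratio drops to 2.5.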


Thank you.
Strangely, I do not have elbo_validation. Maybe I need to add another argument?

vae.history.keys()
dict_keys(['kl_weight', 'train_loss_step', 'train_loss_epoch', 'elbo_train', 'reconstruction_loss_train', 'kl_local_train', 'kl_global_train'])

vae
SCVI model with the following parameters:
n_hidden: 128, n_latent: 10, n_layers: 1, dropout_rate: 0.1, dispersion: gene,
gene_likelihood: zinb, latent_distribution: normal.
Training status: Trained
Model’s adata is minified?: False

lvae.history.keys()
dict_keys(['train_loss_step', 'train_loss_epoch', 'elbo_train', 'reconstruction_loss_train', 'kl_local_train', 'kl_global_train', 'train_classification_loss', 'train_accuracy', 'train_f1_score', 'train_calibration_error'])

lvae
ScanVI Model with the following params:
unlabeled_category: Unknown, n_hidden: 128, n_latent: 10, n_layers: 1,
dropout_rate: 0.1, dispersion: gene, gene_likelihood: zinb
Training status: Trained
Model’s adata is minified?: False
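The train_accuracy and train_f1_score entries in lvae.history are computed on labeled training cells, so they may look optimistic for the query. One common way to estimate label-transfer accuracy, not part of the tutorial and shown here only as a sketch, is to hide a random fraction of the reference labels, retrain scANVI with those cells marked "Unknown", and compare its predictions with the hidden labels. The helpers mask_reference_labels and held_out_accuracy and the column name cell_ontology_class_masked are hypothetical; lvae.predict() is scANVI's prediction method:

```python
import numpy as np


def mask_reference_labels(labels, frac=0.1, unlabeled_category="Unknown", seed=0):
    """Hide a random fraction of the labeled (reference) cells so they can be scored later."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=object)
    labeled = np.flatnonzero(labels != unlabeled_category)  # query cells are excluded
    n_hide = int(round(frac * len(labeled)))
    hidden = rng.choice(labeled, size=n_hide, replace=False)
    masked = labels.copy()
    masked[hidden] = unlabeled_category
    return masked, hidden


def held_out_accuracy(true_labels, predicted_labels, hidden):
    """Fraction of hidden reference cells whose predicted label matches the original one."""
    true_labels = np.asarray(true_labels, dtype=object)
    predicted_labels = np.asarray(predicted_labels, dtype=object)
    return float(np.mean(true_labels[hidden] == predicted_labels[hidden]))


# Assumed workflow (hypothetical column name):
# masked, hidden = mask_reference_labels(adata.obs["cell_ontology_class"], frac=0.1)
# adata.obs["cell_ontology_class_masked"] = masked
# ... train SCANVI with labels_key="cell_ontology_class_masked" ...
# acc = held_out_accuracy(adata.obs["cell_ontology_class"], lvae.predict(), hidden)
```

Looking at which cell types the hidden cells are confused with, not just the overall fraction, usually tells you more about where the transfer is unreliable.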

Can you add check_val_every_n_epoch=1 to train(), i.e. vae.train(check_val_every_n_epoch=1)? This enables tracking of validation losses such as elbo_validation.
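Once elbo_validation appears in vae.history, a quick numeric summary can complement plotting the curves. The sketch below assumes each history entry is a pandas DataFrame indexed by epoch with a single column, which is how recent scvi-tools versions appear to store it. summarize_elbo is a hypothetical helper, not a library function, and the toy numbers are made up for illustration:

```python
import pandas as pd


def summarize_elbo(history, last_n=5):
    """Summarize convergence from a model history dict (assumed: key -> DataFrame indexed by epoch)."""
    train = history["elbo_train"].iloc[:, 0]
    val = history["elbo_validation"].iloc[:, 0]
    window = val.iloc[-last_n:]
    # Relative change of the validation ELBO over the last `last_n` logged epochs; near 0 means a plateau.
    val_rel_change = abs(window.iloc[-1] - window.iloc[0]) / abs(window.iloc[0])
    # Validation minus training value at the last epoch; a gap that keeps widening hints at overfitting.
    final_gap = val.iloc[-1] - train.iloc[-1]
    return {"val_rel_change": float(val_rel_change), "final_gap": float(final_gap)}


# Toy numbers for illustration only (not real scvi-tools output).
epochs = pd.Index(range(10), name="epoch")
toy_history = {
    "elbo_train": pd.DataFrame({"elbo_train": [1000 - 50 * e for e in range(10)]}, index=epochs),
    "elbo_validation": pd.DataFrame(
        {"elbo_validation": [1020, 970, 920, 870, 820, 800, 800, 800, 800, 800]}, index=epochs
    ),
}
print(summarize_elbo(toy_history))

# With a real model: vae.train(check_val_every_n_epoch=1), then summarize_elbo(vae.history)
```

In the toy history the validation curve has flattened while the training curve keeps dropping, which is the pattern that suggests further epochs would mostly fit the training cells more closely.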