Hello,
I have several data sets that I am running scvi on via reticulate in R. One large data set of over 350K cells runs for 23 epochs and takes about 2 hours to complete, while another data set of about 25K cells runs for 308 epochs and takes over 6 hours to complete.
My question is: why does training take longer on the smaller data set than on the much larger one? Are there additional parameters I could set to make training on smaller data sets faster?
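For example, would explicitly capping the number of training epochs be a reasonable approach? This is a sketch of what I have in mind, assuming train() accepts an n_epochs argument (as in the scvi-tools 0.8.x API) and that integers should be passed as R integer literals:
# Assumption: train() accepts n_epochs (scvi-tools 0.8.x API)
# 50L makes reticulate pass a Python int rather than a float
model$train(n_epochs = 50L)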
Below is the code I am running:
# Call integration
sc <- import('scanpy', convert = FALSE)
scvi <- import('scvi', convert = FALSE)
scvi$settings$progress_bar_style <- 'tqdm'
print("Converting seurat object to anndata...")
DefaultAssay(so) <- "RNA"
so <- FindVariableFeatures(so, selection.method = "vst", nfeatures = 2000)
# Get top genes and subset original matrix to include only top 2000 genes
top_genes <- head(VariableFeatures(so), 2000)
so_vargenes <- so[top_genes, ]
# Build the AnnData object (cells x genes, so the counts matrix is transposed)
adata <- sc$AnnData(
  X = t(as.matrix(GetAssayData(so_vargenes, slot = 'counts'))),
  obs = so_vargenes[[]],
  var = GetAssay(so_vargenes)[[]]
)
# run setup_anndata
scvi$data$setup_anndata(adata, batch_key = 'patient_id')
# create the model
model <- scvi$model$SCVI(adata, use_cuda = TRUE)
# train the model
model$train()
Output:
INFO Using batches from adata.obs["patient_id"]
INFO No label_key inputted, assuming all cells have same label
INFO Using data from adata.X
INFO Computing library size prior per batch
INFO Successfully registered anndata object containing 25934 cells, 2000 vars, 62 batches, 1 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates.
INFO Please do not further modify adata until model is trained.
INFO Training for 308 epochs
INFO KL warmup phase exceeds overall training phase. If your applications rely on the posterior quality, consider training for more epochs or reducing the kl warmup.
INFO KL warmup for 400 epochs
Training...: 0%|
The output says to reduce the KL warmup, which I tried to do (following Issue 735) by setting n_iter_kl_warmup = 0, but the KL warmup is still 400 epochs:
model$train(n_iter_kl_warmup=0)
INFO Training for 308 epochs
INFO KL warmup phase exceeds overall training phase. If your applications rely on the posterior quality, consider training for more epochs or reducing the kl warmup.
INFO KL warmup for 400 epochs
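I also wondered whether I need to target the epoch-based warmup argument instead, and whether reticulate needs explicit integers. This is a sketch of what I would try next, assuming train() also accepts n_epochs_kl_warmup (as in the scvi-tools 0.8.x API):
# Assumption: train() accepts n_epochs_kl_warmup alongside n_iter_kl_warmup (scvi-tools 0.8.x)
# 0L makes reticulate pass a Python int rather than a float
model$train(n_epochs_kl_warmup = 0L, n_iter_kl_warmup = 0L)
But I am not sure whether that is the intended way to turn off the warmup.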
Any help is greatly appreciated - thanks,
s2hui