When using the code below for denoising/imputation, how should library_size be set in get_normalized_expression() to obtain output on the count scale rather than normalized output?
scvi.model.SCVI.setup_anndata(adata, layer="counts")
vae = scvi.model.SCVI(adata)
vae.train()
vae.get_latent_representation()
vae.get_normalized_expression()
Also, are there any recommended settings for the parameters of vae.train(), e.g., max_epochs and train_size=0.9, as well as for n_samples in get_normalized_expression()?
If I have multiple samples in adata, can I input the whole dataset together for denoising, or should I denoise each sample separately and then combine the results?
Thank you!
You can use library_size="latent" in get_normalized_expression() to get count-like output; scaling by the model's inferred (latent) library size is the closest you can get to denoised real counts.
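To illustrate what this scaling means conceptually: scVI's normalized expression is a per-cell vector of expression proportions, and `library_size="latent"` rescales those proportions by each cell's inferred library size to land on the count scale. A minimal numpy sketch with made-up numbers (the proportions and library sizes here are hypothetical, not scVI output):

```python
import numpy as np

# Hypothetical denoised expression proportions for 2 cells x 3 genes.
# Each row sums to 1, like scVI's normalized expression with library_size=1.
rho = np.array([[0.5, 0.3, 0.2],
                [0.1, 0.6, 0.3]])

# Hypothetical per-cell latent library sizes inferred by the model.
latent_library = np.array([1000.0, 2000.0])

# library_size="latent" conceptually rescales proportions to count scale:
count_scale = rho * latent_library[:, None]

# Row sums recover each cell's library size.
print(count_scale)            # [[500. 300. 200.], [200. 1200. 600.]]
print(count_scale.sum(axis=1))  # [1000. 2000.]
```

This is only an illustration of the scaling; the actual latent library size is a model-inferred quantity, not the observed total counts.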
Recommended training parameters really depend on the problem. The defaults are max_epochs=400 and train_size=0.9, and batch size should be chosen to fit your GPU memory, but it doesn't have to stay at those values. What you care about is that the model converges without overfitting; you can use early stopping for this. See our different tutorials for reference. Increasing n_samples will give more accurate (lower-variance) outputs, but will take more time.
You should use sample_id as batch_key when setting up the SCVI model and denoise the samples jointly (unless the samples really differ in their genes, come from different tissues, or come from a completely different census; there are other models for that).
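Putting the answers together, a minimal sketch of the workflow. This assumes adata holds raw counts in a "counts" layer and has a "sample_id" column in adata.obs; n_samples=25 is just an illustrative value, and the snippet is not runnable without real data:

```python
import scvi

# Register raw counts and treat each sample as a batch,
# so samples are denoised jointly with batch correction.
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="sample_id")

vae = scvi.model.SCVI(adata)

# Early stopping monitors the validation loss so training halts once the
# model has converged, instead of always running the full max_epochs.
vae.train(max_epochs=400, train_size=0.9, early_stopping=True)

# library_size="latent" scales each cell's denoised expression by its
# inferred library size, giving count-scale output; more n_samples gives
# a lower-variance estimate at the cost of runtime.
denoised = vae.get_normalized_expression(library_size="latent", n_samples=25)
```

The returned matrix is cell-by-gene on the count scale and can be combined or compared across samples directly, since the batches were modeled jointly.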