Hi there. First, I am simply blown away by the suite of tools you have assembled. It is truly awe-inspiring and sets the bar for other Python / comp-bio developers.
My question (and please let me know if I have posted this in the wrong place on the forum): do you have a general protocol for model optimization? In this instance, I am particularly interested in using MultiVI to integrate cells from the same sample measured by scRNA-seq and scATAC-seq. After following the tutorial, I have been experimenting with model parameters such as the number of hidden units and latent variables.
Despite some initial attempts at tuning these parameters, the results so far show only a sub-optimal co-embedding:
MultiVI model with inputs: n_genes: 7213, n_regions: 28053; n_hidden: 187, n_latent: 13, n_layers_encoder: 2, n_layers_decoder: 2, dropout_rate: 0.1, latent_distribution: normal, deep injection: False, gene_likelihood: zinb
Unfortunately, since I am a new user, I can only upload one image at the moment, but I did try other model configurations and samples, listed below.
MultiVI model with inputs: n_genes: 8863, n_regions: 33682; n_hidden: 206, n_latent: 14, n_layers_encoder: 2, n_layers_decoder: 2, dropout_rate: 0.1, latent_distribution: normal, deep injection: False, gene_likelihood: zinb
MultiVI model with inputs: n_genes: 8863, n_regions: 33682; n_hidden: 5, n_latent: 2, n_layers_encoder: 2, n_layers_decoder: 2, dropout_rate: 0.1, latent_distribution: normal, deep injection: False, gene_likelihood: zinb
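For context on where these numbers come from: in the first two configurations above, n_hidden and n_latent were left at what I understand to be the defaults, while the third run sets them by hand (5 and 2). The values are consistent with a square-root rule (n_hidden as the floored square root of n_genes + n_regions, n_latent as the floored square root of n_hidden). I have not checked this against the scvi-tools source, so please treat the rule itself as my assumption; the helper name `sqrt_heuristic` is just for illustration.

```python
import math


def sqrt_heuristic(n_genes, n_regions):
    """Assumed rule: sizes derived from floored square roots of the input size."""
    n_hidden = math.isqrt(n_genes + n_regions)  # floor(sqrt(total input features))
    n_latent = math.isqrt(n_hidden)             # floor(sqrt(n_hidden))
    return n_hidden, n_latent


# Feature counts taken from the first two configurations listed above.
print(sqrt_heuristic(7213, 28053))  # (187, 13)
print(sqrt_heuristic(8863, 33682))  # (206, 14)
```

So the comparison so far is really "defaults vs. one very small model", not a systematic search.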
I feel there must be a better latent space to be found; however, I am not sure what procedure you would typically recommend for hyperparameter tuning and optimization within your framework. Happy to share code / more info, but I basically started from the MultiVI tutorial linked above. Any help is greatly appreciated.
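To make the question concrete, below is a minimal sketch of the kind of brute-force sweep I could run in the absence of a recommended procedure. It is not an scvi-tools API: `sweep`, `grid`, and `placeholder_score` are hypothetical names of my own, and the grid values are arbitrary. In practice the scoring callable would build and train a MultiVI model with the given parameters and return some measure of how well the two modalities co-embed; which measure to use is part of what I am asking.

```python
from itertools import product


def sweep(grid, fit_and_score):
    """Score every parameter combination in `grid`; return (score, params) best-first."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((fit_and_score(params), params))
    # Higher score is assumed to be better; ties keep grid order (stable sort).
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Arbitrary illustrative grid, not recommended values.
grid = {
    "n_hidden": [64, 128, 256],
    "n_latent": [10, 20, 30],
    "dropout_rate": [0.1, 0.2],
}


def placeholder_score(params):
    # Hypothetical stand-in so the sketch runs without training anything.
    # A real version would train a model with `params` and return a metric.
    return -abs(params["n_latent"] - 20) - params["dropout_rate"]


results = sweep(grid, placeholder_score)
best_score, best_params = results[0]
print(len(results), best_params)
```

This evaluates all 18 combinations, which is obviously expensive once each one means a full training run, hence my interest in whatever protocol you would recommend instead.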