Protocol for model optimization (currently focused on MultiVI)

mvinyard · February 28, 2022, 5:28am

Hi there - first, I am simply blown away by the suite of tools you have assembled. It is truly awe-inspiring and sets the bar for any other python / comp-bio developers.

My question (and please redirect me if I have posted this in the wrong part of the forum): Do you have a general protocol for model optimization? In this instance, I am particularly interested in using MultiVI to integrate cells from the same sample measured by scRNA-seq and scATAC-seq. After following the tutorial, I have been experimenting with model parameters such as the number of hidden units and latent variables.

Despite some initial attempts at tuning parameters, etc., I have only obtained results that show a sub-optimal co-embedding:

'MultiVI Model with INPUTS: n_genes:7213, n_regions:28053\nn_hidden: 187, n_latent: 13, n_layers_encoder: 2, n_layers_decoder: 2 , dropout_rate: 0.1, latent_distribution: normal, deep injection: False, gene_likelihood: zinb'

Unfortunately, since I am a new user, I can only upload one image at the moment but I did try other model implementations and samples below.

'MultiVI Model with INPUTS: n_genes:8863, n_regions:33682\nn_hidden: 206, n_latent: 14, n_layers_encoder: 2, n_layers_decoder: 2 , dropout_rate: 0.1, latent_distribution: normal, deep injection: False, gene_likelihood: zinb'

'MultiVI Model with INPUTS: n_genes:8863, n_regions:33682\nn_hidden: 5, n_latent: 2, n_layers_encoder: 2, n_layers_decoder: 2 , dropout_rate: 0.1, latent_distribution: normal, deep injection: False, gene_likelihood: zinb'

I feel there must be a better latent space to be found; however, I am not sure what procedure you would typically recommend for hyperparameter (HP) tuning and optimization within your framework. Happy to share code / more info, but I basically started from the MultiVI tutorial linked above. Any help is greatly appreciated.
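For readers who want a starting point for the kind of sweep described in this post, here is a minimal sketch of an exhaustive grid search over `n_hidden` and `n_latent`. It is not an official scvi-tools procedure. The helpers `make_grid`, `grid_search`, and `toy_score` are hypothetical names introduced for illustration, and `toy_score` is an arbitrary stand-in so the sketch runs without any data. In real use, the scoring callable would train a model with the given settings and return a held-out metric of your choosing (lower is better here).

```python
from itertools import product


def make_grid(search_space):
    """Expand {name: [values]} into a list of {name: value} dicts."""
    names = sorted(search_space)
    combos = product(*(search_space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def grid_search(search_space, fit_and_score):
    """Score every configuration and return (score, params), lowest score first."""
    scored = [(fit_and_score(params), params) for params in make_grid(search_space)]
    return sorted(scored, key=lambda pair: pair[0])


def toy_score(params):
    # Arbitrary stand-in so the sketch runs end to end. It says nothing about
    # which MultiVI settings are good. Replace it with a function that trains
    # a model using `params` and returns a validation metric.
    return (params["n_latent"] - 10) ** 2 + params["n_hidden"] / 1000


# Hypothetical search space; the values are placeholders, not recommendations.
search_space = {"n_hidden": [64, 128, 256], "n_latent": [5, 10, 20]}

ranking = grid_search(search_space, toy_score)
best_score, best_params = ranking[0]
print(best_params)
```

Keeping the scoring function separate from the grid logic means the same loop can be reused with any model or metric, and every configuration stays in `ranking` so runs can be compared afterwards rather than only keeping the best one.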

adamgayoso · February 28, 2022, 5:58pm

Thank you

MultiVI requires at least some data where ATAC + RNA are measured simultaneously in the same cells. It doesn’t look like you have that here?
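A quick sanity check along these lines is to count how many cells carry each modality label before training. The sketch below assumes each cell has a label such as "paired", "expression", or "accessibility"; those label values and the helper name `summarize_modalities` are assumptions for illustration, so adapt them to however your own object records modality.

```python
from collections import Counter


def summarize_modalities(labels, paired_label="paired"):
    """Count cells per modality label and flag whether any paired cells exist."""
    counts = Counter(labels)
    n_paired = counts.get(paired_label, 0)
    return {
        "counts": dict(counts),
        "n_paired": n_paired,
        "has_paired_cells": n_paired > 0,
    }


# Hypothetical labels for a dataset with separate RNA-only and ATAC-only cells.
labels = ["expression"] * 4 + ["accessibility"] * 3

summary = summarize_modalities(labels)
if not summary["has_paired_cells"]:
    print("No paired cells found; there is nothing to anchor the two modalities.")
```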

mvinyard · February 28, 2022, 10:07pm

Ah, I see - I misread the intention of the example notebook. You are correct: I do not have data where each modality is measured simultaneously in the same cell (just the same sample). That brings me to two questions:

1. If I had paired (same-cell) RNA/ATAC data from a model cell line of the same disease, do you think this could serve as an anchor point along which we might integrate other samples measured by the same assays, for which we have separate but matched modality measurements? Perhaps this question involves too many unknowns from your end to say yes or no.
2. Do you have any other recommendations for matched (but not paired) scRNA-seq and scATAC-seq samples using your framework? I could use PeakVI and scVI to analyze the modalities independently, but I would be curious to know if there is a preferred solution here…

Thanks again!

MP_Epana · May 14, 2025, 9:53pm

This is crazy old, but I’ll respond for the record:
I’ve seen pretty good integration for unpaired data when the peaks actually reflect the same amount of diversity as the scRNA-seq data. Frequently, the 10x peak matrix and other peak matrices based on bulk peak-calling will miss the diversity of peaks necessary for good integration.

Working with peak calls from a scATAC-seq-specific peak caller, like MOCHA, can help immensely. Paired RNA/ATAC anchor points can also help, though they can’t completely overcome a peak set that misses that diversity.
