Batch Integration Parameter Tuning

abbass2 · March 2, 2022, 12:21am

Hello,

Great tool, thanks for all your efforts. A couple of questions below.

I am integrating ~500K cells from >40 donors, and I am interested in parameter tuning for the model. I’m new to neural networks, but it seems like the output (clustering, markers, UMAP) would be primarily affected by the number of HVGs, the number of layers, and the number of final (latent) dimensions. What are your thoughts on varying these three inputs to refine the model? My understanding is that increasing layers should tease out more hidden interactions, while increasing dimensions essentially allocates more space to store variability/patterns. Is that right?

Also, does your model automatically account for variance from sequencing depth, or should users specify the number of UMIs as a continuous covariate?

Thanks for your help.

adamgayoso · March 2, 2022, 11:28pm

Yes, though it’s not always so simple. There are peculiarities when training variational autoencoders related to inactive dimensions of the “bottleneck” layer, so added dimensions may go unused. But your thought process is very reasonable.

What you could do is define some metrics that are relevant to you (like those in scIB) and then do hyperparameter optimization using Ray Tune or another package, or just a simple grid search.
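For the simple grid search option, here is a minimal sketch over the three knobs from the question. Everything named here is an assumption for illustration: `grid_search` and `placeholder_score` are hypothetical helpers, the grid values are arbitrary, and the toy score only exists so the snippet runs on its own. In practice the scoring function would select HVGs, train the model, and return whatever metric you settled on.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Score every combination in `grid`; return (score, params) pairs, best first."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(**params), params))
    # Higher score is treated as better.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


def placeholder_score(n_hvg, n_layers, n_latent):
    # Hypothetical stand-in so the sketch runs without any data.
    # A real version would, roughly:
    #   1. subset the AnnData to the top `n_hvg` highly variable genes,
    #   2. train scvi.model.SCVI(adata, n_layers=n_layers, n_latent=n_latent),
    #   3. compute your chosen integration metric on the latent space.
    return -abs(n_hvg - 2000) / 1000 - abs(n_layers - 2) - abs(n_latent - 30) / 10


# Arbitrary example values, not recommendations.
grid = {
    "n_hvg": [1000, 2000, 4000],
    "n_layers": [1, 2, 3],
    "n_latent": [10, 30, 50],
}

results = grid_search(placeholder_score, grid)
best_score, best_params = results[0]
print(best_params)
```

With ~500K cells each training run is expensive, so it may be worth scoring on a subsample of cells first and only retraining the top few configurations on the full data.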

The model automatically uses the observed library size of the gene expression data you supply (since the data are counts, it just takes the sum per cell). In the newest release you can also provide your own size_factor_key to setup_anndata (on the linear scale!).
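To make "observed library size" and "linear scale" concrete, here is a small sketch on a toy count matrix. The matrix, the `size_factor` column name, and the mean-normalization are assumptions for illustration, not necessarily what you would want for your data; the only point is that the values are not log-transformed.

```python
import numpy as np

# Toy cells-by-genes count matrix (3 cells, 3 genes); values are made up.
counts = np.array([
    [3, 0, 1],
    [10, 2, 8],
    [0, 5, 5],
])

# Observed library size: total counts per cell (sum over genes).
library_size = counts.sum(axis=1)

# One possible custom size factor, kept on the linear scale (no log):
# library size relative to the mean library size across cells.
size_factor = library_size / library_size.mean()

print(library_size)
print(size_factor)

# With a real AnnData object this would be stored in .obs and passed by key,
# e.g. (column name is hypothetical):
#   adata.obs["size_factor"] = size_factor
#   scvi.model.SCVI.setup_anndata(adata, size_factor_key="size_factor")
```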
