Tuning/setting scvi.model.SCVI parameters

Diogo_de_Moraes · June 9, 2021, 6:23pm

Hello

Is there any recommendation for tuning the parameters in the scvi.model.SCVI() function?

I have been modifying the parameters until the clusters make more biological sense (such as immune cells far from adipocytes and adipocyte subpopulations closer in the UMAP). Does this sound reasonable? Should I simply stick to the default?

adamgayoso · June 11, 2021, 3:14am

Are you seeing a lot of variation in the results across hyperparameters? What sorts of things are you currently changing? Even when we do change them, we don’t go too far from the defaults.

Diogo_de_Moraes · June 11, 2021, 2:04pm

I experimented with the number of hidden layers and the dimensionality of the latent space (1-4 and 10-40, respectively). I modified these values because I expected different adipocyte types to cluster closely together and other cell types to sit far apart. With higher values, brown adipocytes from different studies cluster together, but when I use values closer to the defaults, some brown adipocytes cluster with white adipocytes.

adamgayoso · June 13, 2021, 3:33pm

1-4 would be much too small. If I changed defaults I might try n_layers=2, n_latent=30, gene_likelihood="nb"
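
For reference, a minimal sketch of how those three settings would be passed. It assumes a recent scvi-tools release (where `setup_anndata` is a method on the model class) and an AnnData object holding raw counts. The helper name `build_scvi_model` and the `batch_key="batch"` column name are placeholders for illustration, not something from this thread.

```python
# The three non-default settings suggested above; everything else keeps its default.
SUGGESTED_KWARGS = {"n_layers": 2, "n_latent": 30, "gene_likelihood": "nb"}


def build_scvi_model(adata, batch_key="batch"):
    """Register `adata` with scvi-tools and return an untrained SCVI model.

    `batch_key` is an assumed column name in `adata.obs`; change it to match your data.
    """
    # Imported lazily so the settings above can be inspected without scvi-tools installed.
    import scvi

    scvi.model.SCVI.setup_anndata(adata, batch_key=batch_key)
    return scvi.model.SCVI(adata, **SUGGESTED_KWARGS)
```

After that, `model = build_scvi_model(adata)`, `model.train()` and `model.get_latent_representation()` would give the embedding to feed into neighbors/UMAP. For comparison, the defaults are, as far as I know, `n_layers=1`, `n_latent=10`, `gene_likelihood="zinb"`.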

Diogo_de_Moraes · June 16, 2021, 12:56pm

Thank you for the suggestion, Adam. Could you explain the reasoning for these choices? Especially the gene_likelihood. If I understood it correctly, nb should do better with overdispersed data, which should be my case, since I have very different cell types coming from different studies?

sandrav-CGEN · March 22, 2023, 3:06pm

I also was wondering about the impact of parameter changes.
I am working with two datasets that I have not managed to integrate with scvi, whereas with harmony it works well.
I will now try to change the default parameters and hope it may improve the integration.
As you suggested, I will start with n_layers=2, n_latent=30, gene_likelihood="nb".

For the integration I am using 4000 HVG (see below)

In your experience, what is the impact of changing the number of HVGs?

Any other advice on what I could check in the data that might be preventing integration?

adamgayoso · March 22, 2023, 7:24pm

I would ensure that the same exact genes are used to compare methods. I would also ensure the loss has converged.

What genes are being used by harmony?
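
On the convergence point, one rough way to check it beyond eyeballing the curve is to compare the average loss over the last few epochs with the average over the epochs just before. This is only a sketch: the helper name `has_converged`, the window of 20 epochs and the 0.1% tolerance are arbitrary choices for illustration, not anything scvi-tools prescribes.

```python
from statistics import fmean


def has_converged(losses, window=20, rel_tol=1e-3):
    """Return True if the mean loss of the last `window` epochs is within
    `rel_tol` (relative) of the mean loss of the `window` epochs before them."""
    losses = [float(x) for x in losses]
    if len(losses) < 2 * window:
        return False  # too few epochs to judge
    recent = fmean(losses[-window:])
    previous = fmean(losses[-2 * window:-window])
    return abs(previous - recent) <= rel_tol * abs(previous)
```

To my knowledge, after `model.train()` the per-epoch training ELBO is available in `model.history["elbo_train"]` as a pandas DataFrame, so something like `has_converged(model.history["elbo_train"].to_numpy().ravel())` should work. If it returns False, training for more epochs is the first thing to try.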

sandrav-CGEN · March 23, 2023, 8:00am

Thanks for your help Adam.

In both models 4000 HVG genes are used.
And the loss function seems to converge:

Any other advice on how to understand why the data does not integrate?
Would the choice of dispersion or gene_likelihood have a big impact on the model?

adamgayoso · March 23, 2023, 3:25pm

I would need to understand more about your data. How many cells? How many batches? Also how are you defining poor integration? Finally, there may be implementation differences between R and Python for HVG selection so to fairly compare methods it’s ideal to use the same exact 4000 genes.

Harmony typically does tend to achieve higher batch correction scores than scVI, but scVI tends to preserve biological variation better.
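
A quick way to confirm that both methods really see the same genes is to compare the gene lists directly. The helper name `compare_gene_sets` is made up for illustration; in practice you would pass the `var_names` (or rownames, on the R side) of the objects each method was run on.

```python
def compare_gene_sets(genes_a, genes_b):
    """Summarize the overlap between two gene lists."""
    a, b = set(genes_a), set(genes_b)
    return {
        "identical": a == b,        # True only if both methods use exactly the same genes
        "shared": sorted(a & b),    # genes seen by both methods
        "only_a": sorted(a - b),    # genes used only by the first method
        "only_b": sorted(b - a),    # genes used only by the second method
    }
```

If `identical` is False, subsetting both inputs to the `shared` list (or to one method's HVG list) before rerunning makes the comparison fair.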

sandrav-CGEN · March 25, 2023, 8:41am

It’s quite a big object: 390938 cells × 33538 genes, with only 2 batches.
I am using harmonypy and therefore the same 4000 HVG with both methods.

By poor integration I mean the following: in both datasets I can find Tregs, but after integration with scvi they form two clusters sitting next to each other in the UMAP, whereas harmony manages to put them into one cluster.

This is only the T cell compartment from the datasets.

Harmony:

scvi (zoomed in):

This umap is already better than the ones I got with default parameters.
The integration improved when changing the default parameters to n_layers=2, n_latent=30, gene_likelihood="nb".
I am planning now to run hyperparameter optimization with the tuner.
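
To put a number on "they cluster next to each other" rather than judging it from the UMAP, one simple option is to measure, for each cell, what fraction of its nearest neighbors in the embedding come from the other batch. This is a rough sketch, not an established scvi-tools metric: the function name `neighbor_batch_mixing` and `k=15` are assumptions, and the brute-force search is only practical on a subsample (for example a few thousand Tregs), not on all 390938 cells.

```python
import math


def neighbor_batch_mixing(latent, batches, k=15):
    """Average fraction of each cell's k nearest neighbors that belong to a different batch.

    `latent` is a sequence of coordinate vectors (one per cell) and `batches`
    the matching batch labels. Values near 0 mean the batches stay separate.
    """
    n = len(latent)
    k = min(k, n - 1)  # a cell cannot have more neighbors than there are other cells
    mismatches = 0
    for i in range(n):
        # Brute-force neighbor search: rank all other cells by Euclidean distance.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(latent[i], latent[j]),
        )[:k]
        mismatches += sum(1 for j in others if batches[j] != batches[i])
    return mismatches / (n * k)
```

Running it on the same subset of cells with the harmony embedding and with the scvi latent space gives two directly comparable numbers. With two batches of similar size, a well-mixed cell type should score somewhere around 0.5.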
