Hyperparameter optimization of SCVI model using the autotune package

Hello,

I’m using scvi’s autotune package to run hyperparameter optimization on an SCVI model. My object is very large (~470k cells, 36k genes), and I’m planning to use it as a reference object, so I’m taking the time to fine-tune the integration.

The new update with the autotune package is very helpful! I do have a couple of questions, because I’m having a hard time drawing clear conclusions about how to best configure the model. It appears that I cannot feed specific hyperparameter combinations to the tuner directly, but rather a set of options for each hyperparameter, from which the tuner selects num_samples combinations. I’m trying to vary one hyperparameter at a time, so I iteratively create a new tuner object for each hyperparameter I want to isolate.

For example, to vary the number of layers, I give the tuner tune.choice([1, 2, 3]) for n_layers and then tell it to take, say, 9 samples from that space. The sampling appears to be random, and I’m not sure how to interpret the output. The tuner goes by validation loss, but some jobs stop after one epoch. I’m confused about how the tuner runs each job and how it determines which hyperparameter combination is best. I’ll have two trials with exactly the same hyperparameters: one trains for many epochs and reaches a validation loss of, say, 450, while the other stops after a single epoch in which its loss matched that of the eventual best combination at the same point. Additionally, because the sampling appears to be random, the tuner might take 6 combinations with 2 layers, 2 with 1 layer, and 1 with 3 layers. I’m not sure this is a fair way to evaluate which hyperparameter value is best, or perhaps the tuner learns which hyperparameters are worth testing more.

In general, I’m not getting very consistent results, and I’m not sure whether that’s because tweaking hyperparameters has little effect on model performance (I know it’s advised not to deviate far from SCVI’s defaults, if at all), or because the tuner is not properly assessing my model configurations. For reference, I’m varying the following parameters (a rough sketch of how I set up each run follows the list):

  • n_latent: [6, 8, 10, 12]
  • n_layers: [1, 2, 3]
  • n_hidden: [128, 256] (256 has been consistently better across all runs, even when I vary the number of HVGs, which makes sense given my object is so large)
  • dropout_rate: [0.1, 0.2]
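
For concreteness, here is roughly how I’m setting up each run. This is just a minimal sketch assuming the ModelTuner interface in scvi.autotune; the metric/search_space argument names and the "batch" key are placeholders on my end and may differ across scvi-tools versions:

import scvi
from ray import tune

# Register the AnnData object ("batch" is a placeholder for my batch covariate)
scvi.model.SCVI.setup_anndata(adata, batch_key="batch")

# One tuner per hyperparameter I want to isolate; here, varying only n_layers
tuner = scvi.autotune.ModelTuner(scvi.model.SCVI)
results = tuner.fit(
    adata,
    metric="validation_loss",
    search_space={"n_layers": tune.choice([1, 2, 3])},
    num_samples=9,
)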

I would appreciate any input on this task, and thank you for putting out such a useful tool!

Best,
Sam

Hi, thank you for your question.

Regarding how the tuner samples from the hyperparameter search space: you can change which search algorithm is used by passing searcher to ModelTuner.fit. Currently, as you have observed, the default is a random search; we will be changing this in the future, as random search isn’t very consistent.

I would recommend trying out searcher="hyperopt" instead, as it uses Bayesian optimization to sample new sets of hyperparameters. It takes into account how good previous samples were (in terms of the loss) and then makes a more informed decision about how to draw new ones. This should be more consistent, and you can also give the algorithm a sensible set of parameters to start from (passed in through searcher_kwargs). For example, if you know n_hidden=256 is better, you would want to start with that value so that the tuner converges sooner.
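
For example, something along these lines (just a sketch; I’m assuming searcher_kwargs are forwarded to Ray Tune’s HyperOptSearch, whose points_to_evaluate argument seeds the search with initial configurations, so the exact keyword may differ):

from ray import tune

# Bayesian (TPE) search via hyperopt, seeded with a known-good starting point
results = tuner.fit(
    adata,
    metric="validation_loss",
    search_space={
        "n_hidden": tune.choice([128, 256]),
        "n_layers": tune.choice([1, 2, 3]),
    },
    num_samples=20,
    searcher="hyperopt",
    # assumed to be forwarded to HyperOptSearch; points_to_evaluate is its seeding argument
    searcher_kwargs={"points_to_evaluate": [{"n_hidden": 256, "n_layers": 1}]},
)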

As for how the tuner allocates resources to trials (i.e., the number of epochs a given sample gets trained for), this is controlled by the scheduler, which is set to asha by default. ASHA runs a group of parallel trials and terminates the ones that perform poorly, so some trials may train for only a couple of epochs. Of course, this can be a suboptimal strategy, as some model configurations might not converge quickly enough even though the hyperparameters are good. Because of this, I would recommend experimenting with the scheduler as well as with max_epochs.
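
For instance (again just a sketch; the available scheduler names and whether max_epochs is passed directly to fit may vary by version):

# Give every trial a larger epoch budget so slower-converging but good
# configurations are not cut off too early by the scheduler
results = tuner.fit(
    adata,
    metric="validation_loss",
    search_space=search_space,  # the same search space dict as before
    num_samples=20,
    scheduler="asha",  # or swap in one of the other available schedulers
    max_epochs=100,    # assumed: upper bound on epochs for any single trial
)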

Hope this helps! Let me know if there’s anything else I can clarify.

Martin,

Thanks for getting back to me, this is very helpful! I hadn’t realized that I wasn’t taking advantage of the hyperopt Bayesian optimization search method; this should help greatly. In terms of scheduling the allocation of resources to trials, it looks like the tuner accepts one of five scheduler options (I don’t think I can customize the parameters within an individual scheduler). ASHA does look to be the best one, but I can certainly try the others to see whether that changes performance.

I also wanted to ask whether you think my hyperparameter search space seems reasonable. Should I be searching a wider range of values, or varying any other parameters? For example, I could vary gene_likelihood, but in the past I’ve consistently seen zinb give lower validation loss than nb.

Thanks again!

Hi, you should be able to pass additional keyword arguments to the scheduler you choose using scheduler_kwargs. In terms of the search space, I think the choice of hyperparameters seems reasonable. The only things I would recommend are to cover a larger range of values for each hyperparameter (e.g., n_latent=[10, 25, 50, 75]) and to also vary training hyperparameters such as the learning rate. The latter is important because training parameters strongly affect how quickly your model converges, which matters since these trials train for only a handful of epochs.
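
To make that concrete, something like the following (a sketch; I’m assuming scheduler_kwargs are forwarded to Ray Tune’s ASHAScheduler, whose grace_period sets the minimum number of reported iterations before a trial can be stopped, and that the learning rate is exposed as a tunable "lr" parameter):

from ray import tune

results = tuner.fit(
    adata,
    metric="validation_loss",
    search_space={
        "n_latent": tune.choice([10, 25, 50, 75]),
        "n_hidden": tune.choice([128, 256]),
        "n_layers": tune.choice([1, 2, 3]),
        "lr": tune.loguniform(1e-4, 1e-2),  # assumed tunable training-plan learning rate
    },
    num_samples=30,
    scheduler="asha",
    scheduler_kwargs={"grace_period": 10},  # assumed: let each trial run at least ~10 iterations
)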

Thank you again for the feedback, Martin. Indeed, I was able to adjust my scheduler with scheduler_kwargs, which gave a more robust assessment of hyperparameter configurations. Varying the learning rate is interesting, because it’s hard for me to establish a biological justification for why a smaller or larger learning rate leads to better model training. The other parameters seem to follow the narrative of over/underfitting, or of needing to be increased/decreased based on the biological complexity of the object. Right now I just interpret the learning rate as a network characteristic, almost a trick to get proper model convergence rather than something with greater biological meaning. Nonetheless, I really appreciate your help!

I wanted to ask a quick follow-up question. I’m able to tune the learning rate through the tuner object to find an optimal value, but I’m not sure how to actually pass this learning rate to the model’s training parameters. I don’t see the learning rate among the model keyword arguments. Is there a way to do this that I’m missing?

Hi, are you referring to passing in the learning rate after you tune the model? If so, you can do this as follows:

model = scvi.model.SCVI(adata)  # adata already registered via SCVI.setup_anndata
model.train(plan_kwargs={"lr": your_learning_rate})  # plan_kwargs are forwarded to the training plan, which accepts "lr"

I am, but not within the same script. I ran the hyperparameter tuner in a separate script, in which I tune objects with different numbers of HVGs and vary hyperparameters independently. Could I manually enter the lr with model.train(plan_kwargs={"lr": your_learning_rate})?

Yes, you can manually pass in the learning rate like that.