scVI dropout, need for it?

If anyone on the team working on scVI could help with this query, that would be amazing.

I am just checking whether my interpretation of the base code is right: does dropout occur in both the encoder and decoder layers?

If that is the case, I wanted to check whether any internal validation work has been done on how performance depends on the dropout value?

My vague understanding of VAEs (which isn’t particularly good) is that dropout isn’t necessarily a good thing, as it may affect the latent representation.

From a small trial on my data, the UMAP embeddings look “better” when it is set to zero. I know this isn’t a good way of judging a model, but there aren’t really any consensus metrics, and I find that autotuned embeddings (while optimised for the lowest loss value) often result in overly smoothed UMAP plots where it is incredibly hard to distinguish cell types from one another.


I believe we only have dropout by default in the encoder (see the encoder code here and decoder here).

We include dropout in scVI primarily for regularization; in other words, it helps the model avoid overfitting the training dataset. This depends on the data, of course, but generally speaking removing dropout decreases the training loss while increasing the validation loss and/or producing worse embeddings on the validation set.
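To make the regularization mechanics concrete, here is a minimal, self-contained sketch of inverted dropout (the standard formulation used by deep learning frameworks, not code taken from scVI itself): during training each unit is zeroed with probability p and the survivors are rescaled by 1/(1-p) so the expected activation is unchanged, while at inference time the layer is the identity.

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p during
    training, rescale survivors by 1/(1-p) to preserve the expected
    activation, and act as the identity at inference time."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p  # keep each unit with prob 1 - p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(10_000)

train_out = dropout(x, p=0.1, rng=rng, training=True)
eval_out = dropout(x, p=0.1, rng=rng, training=False)

# Roughly 10% of units are zeroed in training mode, yet the mean
# activation stays near 1.0 because of the 1/(1-p) rescaling.
print((train_out == 0).mean(), train_out.mean())
# Inference is deterministic: the input passes through unchanged.
print(eval_out.mean())  # 1.0
```

The random masking is what injects noise into the encoder during training; setting `dropout_rate=0` removes that noise source entirely, which is why it can change both the fit and the look of the resulting embeddings.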

If you see that removing dropout leads to better visualization of the data, then go for it; I don’t see anything wrong with that. I would just be careful that the model still performs adequately on the validation set.
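One way to run that check is to train the model at both dropout settings and compare the validation ELBO. The sketch below assumes scvi-tools is installed and uses its built-in synthetic dataset so it is self-contained; `dropout_rate`, `check_val_every_n_epoch`, and the `"elbo_validation"` history key reflect recent scvi-tools versions, so adjust for your install and substitute your own AnnData object in practice.

```python
import scvi

# Built-in synthetic AnnData, just so this sketch runs end to end;
# replace with your own dataset in real use.
adata = scvi.data.synthetic_iid()
scvi.model.SCVI.setup_anndata(adata)

for rate in (0.1, 0.0):  # library default vs. no dropout
    model = scvi.model.SCVI(adata, dropout_rate=rate)
    # Record validation metrics every epoch so the history is populated.
    model.train(max_epochs=20, check_val_every_n_epoch=1)
    val_elbo = model.history["elbo_validation"].values[-1]
    print(f"dropout_rate={rate}: final validation ELBO {val_elbo}")
```

If the `dropout_rate=0.0` run ends with a noticeably worse (higher) validation ELBO, the prettier UMAP is likely coming at the cost of overfitting.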

Thanks Martin, you were right: the validation loss was worse. This was helpful.
