Understanding scvi.module.VAEC


I am currently working with the VAEC module. I chose this model so that I can reconstruct specific gene responses given the condition (e.g. cell type) that I pass to the decoder. I imagine this could work similarly to a CVAE trained on MNIST, where the label condition is the digit shown in the image, which can then be reconstructed. However, I am having some trouble understanding the module. I am not aware of a paper describing it, as there is for scVI or totalVI. If there is one, I would greatly appreciate a reference.

First of all, I am having difficulty grasping what exactly the difference between VAEC and SCVI is. From the source code I understand that the main difference is that you can pass additional label information to VAEC, which is not possible in SCVI. However, I do not understand exactly how the model treats these labels, e.g. are they concatenated with the gene expression x? More specifically, if I define the labels as the batch_index, how is VAEC then different from SCVI, given that SCVI also accounts for batch differences?

Besides that, I am struggling to understand whether I can pass multiple conditions as labels, for example cell types and drug responses. Is this only possible by creating a vector of combinatorial indices (i.e. passing a 1d vector as y that can take values from 1 to n_cell_types x n_batches), or can I pass a 2d vector in which each dimension indexes one condition?

Thanks already in advance!

Hi there,

Thank you for using scvi-tools! The overarching method that uses vaec.py is DestVI, which is covered in the DestVI user guide page of the scvi-tools documentation. That page also references the original bioRxiv preprint that it summarizes.

You are correct that VAEC is much like the VAE module that SCVI uses, except that you provide labels instead of batch. Since DestVI is intended for spatial transcriptomics data, it also does not provide the usual batch correction that SCVI does. Instead, it focuses on determining inter-cell type variation.

For the purposes of understanding drug responses, you are probably better off using the more fully featured VAE class with the extra_categorical_covariates field containing your drug treatment metadata and cell types.

Hi Justin,

thanks for your answer! I switched to working with the SCVI model and am adding the covariates as extra_categorical_covariates.

If I understand correctly, the features from extra_categorical_covariates are concatenated with the input x rather than processed in a separate prior network, as is the case for conditional VAEs. Is that correct?

They are incorporated in the same fashion as in a conditional VAE. Any categorical covariates are one-hot encoded and then concatenated to the input of the decoder. If encode_covariates is set to True, they are additionally concatenated to the input x of the encoder.
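
As a rough illustration of that concatenation, here is a minimal sketch in plain Python. It is not the scvi-tools implementation: the helper names (one_hot, with_covariates), the toy sizes, and the values of z and x are all made up for the example. It assumes two categorical covariates (cell type and drug treatment), each of which gets its own one-hot block.

```python
def one_hot(index, n_categories):
    """Return a one-hot list of length n_categories with 1.0 at position `index`."""
    if not 0 <= index < n_categories:
        raise ValueError(f"index {index} out of range for {n_categories} categories")
    return [1.0 if i == index else 0.0 for i in range(n_categories)]


def with_covariates(vector, cat_indices, n_cats_per_cov):
    """Append one one-hot block per categorical covariate to a copy of `vector`."""
    out = list(vector)
    for index, n_categories in zip(cat_indices, n_cats_per_cov):
        out.extend(one_hot(index, n_categories))
    return out


# Hypothetical toy setup: 3 cell types and 2 drug treatments.
n_cats_per_cov = [3, 2]
cat_indices = [1, 0]       # this cell: cell type 1, drug 0

z = [0.5, -1.2]            # latent sample for one cell (made-up values)
x = [4.0, 0.0, 7.0, 1.0]   # expression counts for four genes (made-up values)

# The decoder always sees the latent vector plus the one-hot covariates.
decoder_input = with_covariates(z, cat_indices, n_cats_per_cov)

# The encoder sees the covariates only in the encode_covariates=True case.
encoder_input = with_covariates(x, cat_indices, n_cats_per_cov)

print(decoder_input)  # [0.5, -1.2, 0.0, 1.0, 0.0, 1.0, 0.0]
print(encoder_input)
```

In this sketch each covariate contributes its own one-hot block, so two conditions add 3 + 2 = 5 entries instead of a single combined label with 3 x 2 = 6 levels.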