Identifying Population-Specific Loadings with LDVAE

aheydari · October 20, 2022, 6:49pm

Hi there!

Thank you all very much for your great work on all the packages! I have a question on using LDVAE to identify population/cluster-specific genes contributing to the variation in that group:

My understanding is that LDVAE (trained on all populations) gives us per-gene weights (genes × latent dimensions) that can be used for interpretability. However, I am interested in identifying the top genes for each cluster/population present in the data. To extract the top loadings for each population, I currently train LDVAE on each group separately and follow the standard pipeline. Is there a better way to extract cluster-specific loadings (given the drawbacks of training LDVAE on each population separately)?

Thanks so much for your time and help!

Valentine_Svensson · October 23, 2022, 2:11am

Hi,

The reason to use an LDVAE is to attempt to identify sets of co-expressed genes, where each latent dimension will correspond to one such set of genes. This way the activity of that set of genes is summarized along one axis. In this framing, I would make the assumption that a cluster/population is defined as an extreme on each axis (in particular if using the logistic latent space option).
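
To illustrate reading gene sets off the latent axes, the sketch below ranks genes by their weight on each dimension. It assumes the loadings are available as a plain mapping from gene name to a list of weights. The helper `top_genes_per_axis` and the toy values are hypothetical. With scvi-tools, the genes × dimensions table returned by `LinearSCVI.get_loadings()` could presumably be converted with `df.T.to_dict("list")`, but that should be checked against the installed version.

```python
def top_genes_per_axis(loadings, n_top=3):
    """Rank genes on every latent axis.

    loadings: dict mapping gene name -> sequence of weights,
              one weight per latent dimension (assumed layout).
    Returns {dim: {"positive": [...], "negative": [...]}} where each
    list holds the n_top genes with the largest / most negative weights.
    """
    n_dims = len(next(iter(loadings.values())))
    result = {}
    for dim in range(n_dims):
        # Sort genes from the largest to the smallest weight on this axis.
        ranked = sorted(loadings, key=lambda g: loadings[g][dim], reverse=True)
        result[dim] = {
            "positive": ranked[:n_top],
            "negative": ranked[::-1][:n_top],
        }
    return result


# Toy loadings for four genes and two latent dimensions (made-up numbers).
toy_loadings = {
    "g1": [0.9, -0.1],
    "g2": [0.5, 0.8],
    "g3": [-0.7, 0.2],
    "g4": [0.1, -0.6],
}
axes = top_genes_per_axis(toy_loadings, n_top=2)
```

Both ends of an axis are reported because a strongly negative weight marks a gene set just as much as a strongly positive one.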

If you are interested in individual genes that are enriched in specific populations or clusters, the best way to get those would be by using the .differential_expression() method. See this part of the intro tutorial: Introduction to scvi-tools - scvi-tools
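
A minimal sketch of a one-vs-rest comparison with `.differential_expression()`, assuming a trained scvi-tools model and a cluster column in `adata.obs`. The helper names `top_markers` and `one_vs_rest_markers` are hypothetical. The column names `lfc_mean`, `bayes_factor`, and `group1` are assumed from the result table of recent scvi-tools versions, and the thresholds are arbitrary choices.

```python
def top_markers(rows, n_top=5, min_lfc=0.0, min_bayes_factor=3.0):
    """Pick the top up-regulated genes per group from DE result rows.

    rows: list of dicts with keys "gene", "group1", "lfc_mean",
          "bayes_factor" (assumed column names).
    """
    by_group = {}
    for row in rows:
        if row["lfc_mean"] > min_lfc and row["bayes_factor"] > min_bayes_factor:
            by_group.setdefault(row["group1"], []).append(row)
    return {
        group: [
            r["gene"]
            for r in sorted(hits, key=lambda r: r["lfc_mean"], reverse=True)[:n_top]
        ]
        for group, hits in by_group.items()
    }


def one_vs_rest_markers(model, groupby, **filters):
    """Run one-vs-rest DE for every group in adata.obs[groupby].

    model: a trained scvi-tools model (SCVI, LinearSCVI, ...).
    """
    de = model.differential_expression(groupby=groupby)
    rows = de.to_dict("records")
    # Gene names live in the index of the result table.
    for gene, row in zip(de.index, rows):
        row["gene"] = gene
    return top_markers(rows, **filters)


# Toy rows standing in for a DE result table (made-up numbers).
toy_rows = [
    {"gene": "A", "group1": "c0", "lfc_mean": 2.0, "bayes_factor": 5.0},
    {"gene": "B", "group1": "c0", "lfc_mean": 3.0, "bayes_factor": 4.0},
    {"gene": "C", "group1": "c0", "lfc_mean": 4.0, "bayes_factor": 1.0},
    {"gene": "A", "group1": "c1", "lfc_mean": -1.0, "bayes_factor": 6.0},
    {"gene": "D", "group1": "c1", "lfc_mean": 1.5, "bayes_factor": 3.5},
]
toy_markers = top_markers(toy_rows)
```

With a real model this would be called as `one_vs_rest_markers(model, groupby="leiden")`, where `"leiden"` is an assumed name for the cluster column.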

The .differential_expression() method is agnostic to the structure of the model, so it works for standard VAE, cVAE (when there are batches), and LDVAE alike. The ‘interpretability’ aspect of LDVAE is that the axes of the latent representation vectors are directly tied to a collection of genes. It turns out that by using the statistical framework developed for the .differential_expression() method you can get interpretability (or at least explainability) for arbitrary areas of the representation space, even if the decoding function is non-linear.

Now, if you want to learn which latent dimensions are associated with groups, clusters, treatments, etc., that would call for a different solution, but it doesn’t sound like this is what you are looking for?
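
For completeness, one simple way to relate latent dimensions to groups is sketched below. This is an illustrative assumption, not a method described in this thread: it standardizes each cluster's mean position on every latent axis against the spread of all cells. The helper `cluster_axis_scores` is hypothetical. With scvi-tools the latent vectors would typically come from `model.get_latent_representation()` and the labels from a column of `adata.obs`.

```python
from statistics import mean, pstdev


def cluster_axis_scores(latent, labels):
    """Score how far each cluster sits from the centre of each latent axis.

    latent: list of per-cell latent vectors (equal length).
    labels: cluster label for each cell, same order as latent.
    Returns {cluster: [z-score of the cluster mean on each axis]}.
    """
    n_dims = len(latent[0])
    columns = [[cell[d] for cell in latent] for d in range(n_dims)]
    overall_mean = [mean(col) for col in columns]
    # Guard against zero spread on an axis to avoid division by zero.
    overall_sd = [pstdev(col) or 1.0 for col in columns]
    scores = {}
    for cluster in sorted(set(labels)):
        members = [cell for cell, lab in zip(latent, labels) if lab == cluster]
        scores[cluster] = [
            (mean(cell[d] for cell in members) - overall_mean[d]) / overall_sd[d]
            for d in range(n_dims)
        ]
    return scores


# Toy latent space: two clusters separated along the first axis only.
toy_latent = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0], [-2.0, 0.0]]
toy_labels = ["a", "a", "b", "b"]
toy_scores = cluster_axis_scores(toy_latent, toy_labels)
```

A large absolute score means the cluster sits at an extreme of that axis, so the genes loading on that axis are candidates for describing the cluster.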

Hope this helps!
/Valentine
