Ablating latent variables in LinearSCVI

Hello, and thank you for these amazing and very useful packages.

I have trained a series of LDVAE models and suspect that a particular latent variable is driving performance in a subset of evaluation cells.

To test this hypothesis, I would like to set that latent variable equal to zero and attempt reconstruction, to see how reconstruction error changes.

However, when I naively tried to write my own reconstruction script, aiming to match get_normalized_expression, with something like:

latent_rep = np.array(model.get_latent_representation(adata))
loadings = np.array(model.get_loadings())
reconstruction = latent_rep @ loadings.T # as in Svensson 2020

I found that the resulting reconstruction values, which in my understanding correspond to \rho, were often negative; these are not valid parameters for a negative binomial distribution, whose mean must be non-negative. This makes me suspect that there is some hidden transformation I am not appreciating, for example a bias added at the end.
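For concreteness, here is a toy sketch (with random stand-in values, not a trained model) of why a purely linear decode can leave the valid parameter range:

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(5, 3))     # stand-in latent representation
loadings = rng.normal(size=(10, 3))  # stand-in gene loadings

# A purely linear decode is unconstrained in sign,
# so some entries will generally be negative
recon = latent @ loadings.T
assert (recon < 0).any()  # negative values cannot be NB means
```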

What is the right way to implement getting a reconstruction of some data? My eventual goal is to intercept the latent variables in this process, set one of them to zero, and decode that modified latent representation.

Hi, in the LDVAE model itself, a softmax activation function is applied at the output. This makes the model not strictly linear, but keeps it consistent with scVI. You need to apply this activation as well. You might want to check out DrVI if you require a strictly linear model (it has a more complex decoder, though).

Thank you for the reply! That makes sense, and I have implemented the softmax on the decoded expression. The output is correlated with the output of get_normalized_expression, but the two are still not exactly the same.

Below is a minimal example with some simulated expression data that shows the issue:

import numpy as np
import scanpy as sc
import scvi
import anndata as ad
import matplotlib.pyplot as plt
import seaborn as sns
import os
import pandas as pd

# Set random seed for reproducibility
np.random.seed(42)
scvi.settings.seed = 42

# Generate synthetic data
n_cells = 1000
n_genes = 200
n_latent = 10

# Create random count data from negative binomial
raw_counts = np.random.negative_binomial(5, 0.3, size=(n_cells, n_genes))

# Create AnnData object with raw counts
adata = ad.AnnData(raw_counts)

# Setup and train the model with raw counts in X
scvi.model.LinearSCVI.setup_anndata(adata)  # Use raw counts from X
model = scvi.model.LinearSCVI(
    adata,
    n_latent=n_latent,
)

# Train the model
model.train(max_epochs=100, train_size=1.0)

loadings = model.get_loadings()
latents = model.get_latent_representation()

reconstruction = np.array(model.get_normalized_expression())

# A brief NumPy-based softmax, subtracting the row max for numerical stability
def stable_softmax(x):
    """Compute the softmax across the second dimension (axis=1) of matrix x."""
    x_max = np.max(x, axis=1, keepdims=True)  # Subtract max for numerical stability
    exp_x = np.exp(x - x_max)
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)


manual_reconstruction = stable_softmax(np.array(latents @ loadings.T))

# Correlation between the built-in and manual reconstructions for the first five cells
for i in range(5):
    print(np.corrcoef(reconstruction[i,:], manual_reconstruction[i,:])[0, 1])

Which gives these correlations between the built-in and “manual” reconstructions:
0.4817705853184663
0.8996616372856779
0.6568411356280006
0.7262337615103117
0.9316294369319855

Am I missing some other transformation or input argument needed to make these match up?

Thank you very much!

By default there is also a batch norm layer in the decoder, applied before the softmax; you can disable it by setting the batch norm option to "encoder" so that it is only used in the encoder.
Also see: scvi-tools/src/scvi/nn/_base_components.py at 2f1611c5d14392fc5b0e7e9706e2b9d80b20e1d5 · scverse/scvi-tools · GitHub
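To mimic that path outside the model, the inference-time batch norm formula can be inserted between the linear decode and the softmax. Below is a minimal NumPy sketch with stand-in values; in practice the running statistics and affine parameters would have to be read off the trained decoder's BatchNorm1d layer (the variable names here are illustrative, not actual scvi-tools attributes):

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_latent = 4, 6, 3

# Stand-ins for model.get_latent_representation() / model.get_loadings()
latents = rng.normal(size=(n_cells, n_latent))
loadings = rng.normal(size=(n_genes, n_latent))

# Stand-ins for the decoder's BatchNorm1d parameters
running_mean = rng.normal(size=n_genes)
running_var = rng.uniform(0.5, 1.5, size=n_genes)
gamma = np.ones(n_genes)  # scale (weight)
beta = np.zeros(n_genes)  # shift (bias)
eps = 1e-5

def stable_softmax(x):
    x_max = np.max(x, axis=1, keepdims=True)
    exp_x = np.exp(x - x_max)
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# Linear decode -> batch norm (eval mode) -> softmax
linear_out = latents @ loadings.T
bn_out = gamma * (linear_out - running_mean) / np.sqrt(running_var + eps) + beta
rho = stable_softmax(bn_out)

# Each row is now a valid proportion vector (non-negative, sums to 1)
assert np.allclose(rho.sum(axis=1), 1.0)
```

For the latent-ablation experiment, zeroing a column of `latents` before the decode step (e.g. `latents[:, k] = 0.0`) and comparing the resulting reconstructions should then be consistent with how the model itself decodes.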