Question about get_latent_representation function of scvi for scRNAseq data

Hongru-Hu · October 13, 2022, 7:42pm

Hi,

I was wondering how get_latent_representation in the SCVI RNA model is calculated. I tested the same function in PeakVI and confirmed that its result matches calling PEAKVI_model.module.z_encoder.encoder() followed by PEAKVI_model.module.z_encoder.mean_encoder(), but that is not the case for SCVI. In other words, does get_latent_representation of SCVI not return the mean of the VAE bottleneck distribution?

Valentine_Svensson · October 18, 2022, 3:08am

Hi Hongru-Hu,

By default, SCVI.get_latent_representation() returns the mean of the variational distribution, which approximates the posterior distribution of the latent representation z_i for cell i.

If you specify the option give_mean = False, SCVI.get_latent_representation() will instead return a sample from the variational distribution.
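To make the distinction concrete, here is a minimal NumPy sketch, assuming a Gaussian variational distribution described by a per-cell mean `qz_m` and variance `qz_v` (illustrative arrays, not scvi-tools objects; the helper `latent_from_gaussian` is hypothetical). `give_mean=True` corresponds to returning the mean, while `give_mean=False` corresponds to one reparameterized draw from the distribution.

```python
import numpy as np


def latent_from_gaussian(qz_m, qz_v, give_mean=True, rng=None):
    """Hypothetical helper: mean-vs-sample selection for a Gaussian q(z | x).

    qz_m, qz_v: arrays of shape (n_cells, n_latent) holding the mean and
    variance of the variational distribution.
    """
    qz_m = np.asarray(qz_m, dtype=float)
    qz_v = np.asarray(qz_v, dtype=float)
    if give_mean:
        # Mean summary: deterministic, identical on every call
        return qz_m
    rng = np.random.default_rng() if rng is None else rng
    # One reparameterized draw: z = mu + sigma * eps, with eps ~ N(0, I)
    eps = rng.standard_normal(qz_m.shape)
    return qz_m + np.sqrt(qz_v) * eps


qz_m = np.array([[0.5, -1.0], [2.0, 0.0]])
qz_v = np.array([[0.1, 0.2], [0.3, 0.4]])
z_mean = latent_from_gaussian(qz_m, qz_v, give_mean=True)
z_draw = latent_from_gaussian(qz_m, qz_v, give_mean=False, rng=np.random.default_rng(0))
```

The mean is the natural choice for downstream embeddings (neighbors, UMAP) because it is reproducible; a sample additionally reflects the posterior uncertainty.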

Hope this helps!
/Valentine

Hongru-Hu · October 18, 2022, 3:43am

Thanks, but I did use give_mean=True, which is the default. The latent mean (mu of the latent distribution) calculated with the VAE module is not equal to the result of SCVI.get_latent_representation(give_mean=True).

adamgayoso · October 18, 2022, 4:47am

Can you share exactly the code you tried?

As you can see here, it returns the mean of the qz distribution object:

github.com/scverse/scvi-tools/blob/00e03420c5b7b59bd9ae57238bbfe431d9ab8865/scvi/model/base/_vaemixin.py#L186

# excerpt (starts mid-loop over minibatches; the preceding `if` branch is not shown)
    else:
        qz_m, qz_v = outputs["qz_m"], outputs["qz_v"]
        qz = torch.distributions.Normal(qz_m, qz_v.sqrt())
    if give_mean:
        # does each model need to have this latent distribution param?
        if self.module.latent_distribution == "ln":
            samples = qz.sample([mc_samples])
            z = torch.nn.functional.softmax(samples, dim=-1)
            z = z.mean(dim=0)
        else:
            # default "normal" latent: the mean of q(z | x)
            z = qz.loc
    else:
        # give_mean=False: a sample from q(z | x)
        z = outputs["z"]

    latent += [z.cpu()]
    latent_qzm += [qz.loc.cpu()]
    latent_qzv += [qz.scale.square().cpu()]
return (
    (torch.cat(latent_qzm).numpy(), torch.cat(latent_qzv).numpy())
    if return_dist
    else torch.cat(latent).numpy()
)

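The "ln" (logistic-normal) branch in the excerpt does not return `qz.loc` directly: it draws `mc_samples` Monte Carlo samples, applies a softmax over the latent dimension, and averages. The NumPy sketch below mirrors that branch (the names `softmax` and `ln_latent_mean`, and the 5000-sample default, are illustrative choices, not scvi-tools API) and shows why the result differs from simply applying softmax to the mean.

```python
import numpy as np


def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def ln_latent_mean(loc, scale, mc_samples=5000, rng=None):
    """Hypothetical helper: Monte Carlo estimate of E[softmax(z)], z ~ N(loc, scale**2)."""
    rng = np.random.default_rng() if rng is None else rng
    loc = np.asarray(loc, dtype=float)
    scale = np.asarray(scale, dtype=float)
    # Shape (mc_samples, n_cells, n_latent), analogous to qz.sample([mc_samples])
    samples = loc + scale * rng.standard_normal((mc_samples,) + loc.shape)
    # Softmax over the latent dimension, then average over the Monte Carlo samples
    return softmax(samples, axis=-1).mean(axis=0)


loc = np.array([[1.0, 0.0, -1.0]])
scale = np.array([[1.0, 1.0, 1.0]])
z_ln = ln_latent_mean(loc, scale, rng=np.random.default_rng(0))
z_naive = softmax(loc)
```

For the default "normal" latent distribution, none of this applies and the returned value is just `qz.loc`.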
Hongru-Hu · October 20, 2022, 5:39pm

import scvi
import torch

rna_adata.layers["counts"] = rna_adata.X.copy()  # preserve counts

scvi.model.SCVI.setup_anndata(rna_adata, layer="counts")
rna_model = scvi.model.SCVI(rna_adata, n_hidden=256, n_latent=30)

rna_model.train(early_stopping=True, max_epochs=1000)

# rna_counts (raw counts with a .to_numpy() method) and device are defined elsewhere;
# note that the raw counts are fed straight into the encoder here
z_encoder = rna_model.module.z_encoder
x = torch.FloatTensor(rna_counts.to_numpy()).to(device)
rna_mean = z_encoder.mean_encoder(z_encoder.encoder(x))

rna_latent = rna_model.get_latent_representation()

where rna_mean should be equal to rna_latent, but it is not.

adamgayoso · October 20, 2022, 5:58pm

The issue is that scvi applies x = log(1 + x) to the data before encoding, for numerical stability.
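A minimal sketch of reproducing the latent mean by hand, assuming the default "normal" latent distribution, no categorical covariates fed to the encoder, and a module in evaluation mode (so dropout is inactive). It relies only on the `encoder` and `mean_encoder` submodules used in the question; the helper name `latent_mean_from_counts` is hypothetical. The key step is applying `log(1 + x)` to the raw counts before the encoder.

```python
import numpy as np
import torch


@torch.no_grad()
def latent_mean_from_counts(z_encoder, counts, device="cpu"):
    """Hypothetical helper: recompute the mean of q(z | x) from raw counts.

    z_encoder: object exposing `encoder` and `mean_encoder` callables, e.g.
        model.module.z_encoder of a trained SCVI model.
    counts: cells x genes raw counts (dense array, DataFrame, or scipy sparse).
    """
    if hasattr(counts, "toarray"):  # scipy sparse matrix -> dense array
        counts = counts.toarray()
    x = torch.as_tensor(np.asarray(counts), dtype=torch.float32, device=device)
    # The log(1 + x) transform applied before encoding
    x = torch.log1p(x)
    hidden = z_encoder.encoder(x)
    return z_encoder.mean_encoder(hidden).cpu().numpy()
```

With the objects from the question, one would call `latent_mean_from_counts(rna_model.module.z_encoder, rna_adata.layers["counts"])` and compare the result to `rna_model.get_latent_representation()` with `np.allclose` and a small tolerance, since batched GPU computation can introduce minor floating-point differences.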

Hongru-Hu · October 21, 2022, 4:33pm

Oh, I see, thanks. Is that log(1 + raw_counts) or log(1 + CPM)?

adamgayoso · October 21, 2022, 5:37pm

The first one, log(1 + raw_counts)! It's just a numerical stability choice.
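As a small worked illustration with toy numbers (not data from the thread): `log(1 + x)` on raw counts compresses values spanning several orders of magnitude into a narrow range, while per-cell library size is handled separately by scVI's generative model rather than by normalizing the encoder input to CPM. The CPM variant below is computed only for comparison.

```python
import numpy as np

# Toy raw counts: two cells (rows) over four genes (columns);
# cell 2 is cell 1 sequenced at twice the depth
counts = np.array([[0, 1, 10, 1000],
                   [0, 2, 20, 2000]], dtype=float)

# What the encoder sees: log(1 + raw counts)
log_raw = np.log1p(counts)

# For comparison only: log(1 + counts per million)
cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
log_cpm = np.log1p(cpm)
```

The largest raw value, 1000, becomes log(1001), roughly 6.9. The two toy cells differ only in depth, so their CPM-based rows coincide while their raw-count rows do not.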
