Question about get_latent_representation function of scvi for scRNAseq data


I was wondering how get_latent_representation in the SCVI RNA model is calculated. I tested the same function in PeakVI and could confirm that its result matches what I get from PEAKVI_model.module.z_encoder.encoder() followed by PEAKVI_model.module.z_encoder.mean_encoder(), but that’s not the case in SCVI. Does that mean get_latent_representation in SCVI does not return the mean of the VAE bottleneck distribution?

Hi Hongru-Hu,

By default, SCVI.get_latent_representation() returns the mean of the variational distribution which approximates the posterior distribution of the latent representation z_i for cell i.

If you pass give_mean=False, SCVI.get_latent_representation() will instead return a sample from the variational distribution.
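As a rough illustration of the difference between the two modes, here is a toy NumPy sketch. It is not scvi's implementation: `toy_latent`, `mu`, and `var` are hypothetical names standing in for the encoder outputs of a Gaussian variational distribution.

```python
import numpy as np

def toy_latent(mu, var, give_mean=True, seed=0):
    """Return the mean of N(mu, var), or one sample from it (toy stand-in, not scvi code)."""
    if give_mean:
        return mu
    rng = np.random.default_rng(seed)
    # Reparameterised sample: mu + sigma * eps, with eps ~ N(0, 1)
    return mu + np.sqrt(var) * rng.standard_normal(mu.shape)

mu = np.array([[0.5, -1.0], [2.0, 0.0]])   # hypothetical per-cell latent means
var = np.array([[0.1, 0.2], [0.05, 0.3]])  # hypothetical per-cell latent variances

z_mean = toy_latent(mu, var, give_mean=True)     # deterministic: equals mu
z_sample = toy_latent(mu, var, give_mean=False)  # noisy draw around mu
```

With give_mean=True the output is deterministic, so it is the one to compare against a manual pass through the encoder.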

Hope this helps!

Thanks, but I did use give_mean=True, the default. The latent mean (mu of the latent distribution) calculated using the VAE module is not equal to the result of SCVI.get_latent_representation(give_mean=True).

Can you share exactly the code you tried?

As you can see here, it’s returning the mean of the qz distribution object.

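import torch
import scvi

# rna_counts (raw counts, used below) and device come from the poster's session and are not shown here
rna_adata.layers["counts"] = rna_adata.X.copy()  # preserve counts

scvi.model.SCVI.setup_anndata(rna_adata, layer="counts")
rna_model = scvi.model.SCVI(rna_adata, n_hidden=256, n_latent=30)

rna_model.train(early_stopping=True, max_epochs=1000)

# manual latent mean: raw counts -> encoder -> mean_encoder
rna_mean = rna_model.module.z_encoder.mean_encoder(rna_model.module.z_encoder.encoder(torch.FloatTensor(rna_counts.to_numpy()).to(device)))

# latent representation returned by the model
rna_latent = rna_model.get_latent_representation()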

Here rna_mean should have been equal to rna_latent, but it is not.

The issue is that scvi applies x = log(1 + x) to the data before encoding, for numerical stability.
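To make the mismatch concrete, here is a toy NumPy sketch with a made-up two-layer encoder. Everything in it (`W_enc`, `W_mean`, `encoder`, `mean_encoder`, `model_latent`) is hypothetical and only mimics the behaviour described in this reply; it is not scvi's code.

```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(4, 6)).astype(float)  # toy raw counts: 4 cells x 6 genes

# Hypothetical weights standing in for a trained encoder and mean layer
W_enc = rng.normal(size=(6, 8))
W_mean = rng.normal(size=(8, 3))

def encoder(x):
    return np.maximum(x @ W_enc, 0.0)  # linear layer + ReLU

def mean_encoder(h):
    return h @ W_mean

def model_latent(x):
    # Mimics what the reply describes: log(1 + x) is applied before encoding
    return mean_encoder(encoder(np.log1p(x)))

manual_raw = mean_encoder(encoder(counts))            # what the snippet above does
manual_log = mean_encoder(encoder(np.log1p(counts)))  # with the transform added

print(np.allclose(manual_raw, model_latent(counts)))
print(np.allclose(manual_log, model_latent(counts)))
```

In the code from this thread, the analogous change would be to wrap the input tensor in torch.log1p(...) before passing it to z_encoder.encoder. This assumes the model uses the transform (in scvi it is controlled by the module's log_variational option) and that the module is in evaluation mode, so that dropout does not add noise to the comparison.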

Oh I see, thanks. Is that log(1+raw_counts) or log(1+CPM)?

log(1+raw_counts). It’s just a numerical stability choice.