Using categorical_covariate_keys when sampling or generating normalised expression

Hi,

I want to use scVI for conditional sampling with two categorical covariates. I have looked into get_normalized_expression(), but it only supports counterfactuals via transform_batch, which covers the batch covariate alone. I also looked at posterior_predictive_sample(), but there it's not possible to specify any covariate at all. How could I sample and/or transform conditioned on two categorical variables?

Thank you in advance,
Alice Driessen

Hi, thank you for your question. Both get_normalized_expression and posterior_predictive_sample take AnnDatas with the same structure as the AnnData used to initialize the model, so you can provide categorical and continuous covariates if the model was set up that way. We currently only support counterfactual predictions for the batch key (transform_batch), since the latent space explicitly seeks to remove this effect. However, you could achieve something similar by directly changing your covariate of interest in the input AnnData itself.
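For example, a minimal sketch of that second approach (the covariate name "cov1", the value "condition_A", and the batch category "batch_1" are hypothetical placeholders here):

```python
# Copy the AnnData and overwrite one categorical covariate with a
# counterfactual value before querying the model
adata_ct = adata.copy()
adata_ct.obs["cov1"] = "condition_A"

# transform_batch handles the batch counterfactual; the edited obs column
# handles the extra categorical covariate
expr = model.get_normalized_expression(adata_ct, transform_batch="batch_1")
```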

Hi,

thank you for your fast reply! I realised later that what I am looking for is conditional sampling from the prior (latent space) and then decoding with the covariates of interest. I've found a similar question here, but the answer is a bit outdated because scvi-tools has been updated since. I guess the general setup is still similar: we sample from the latent normal distribution and then use scvi.module.VAE.generative() to decode. Is this correct?

Could you help with this?

What I think should work to sample from the prior is this:

import numpy as np
import torch
from torch.distributions import Normal

n_cells = 1000
n_samples = 1

with torch.no_grad():
    if model.module.gene_likelihood not in ["zinb", "nb", "poisson"]:
        raise ValueError("Invalid gene_likelihood.")

    # Sample latent variables from the standard normal prior
    qz_m = torch.zeros(n_cells, model.module.n_latent)
    qz_v = torch.ones(n_cells, model.module.n_latent)
    z = Normal(qz_m, qz_v).sample()

    dec_batch_index = torch.zeros(n_cells, 1)
    y = torch.zeros(n_cells, 1)  # labels; unused unless the model was set up with a labels_key
    library = torch.zeros(n_cells, 1)  # log library size; gets exponentiated in generative()

    # Here put the covariates of my interest (integer category codes)
    cat_covs = torch.tensor(
        np.repeat([[3.0, 5.0]], n_cells, axis=0), dtype=torch.float32
    )

    generative_outputs = model.module.generative(z, library, dec_batch_index, cat_covs=cat_covs)

    dist = generative_outputs["px"]
    if model.module.gene_likelihood == "poisson":
        l_train = generative_outputs["px"].rate
        l_train = torch.clamp(l_train, max=1e8)
        dist = torch.distributions.Poisson(
            l_train
        )  # Shape: (n_samples, n_cells_batch, n_genes)
    if n_samples > 1:
        exprs = dist.sample().permute(
            [1, 2, 0]
        )  # Shape: (n_cells_batch, n_genes, n_samples)
    else:
        exprs = dist.sample()

    exprs = exprs.cpu()

Is this correct?

I only took a quick pass through this but it looks like it should work!
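One thing to double-check: the integers you pass as cat_covs must match the category codes scvi-tools assigned during setup_anndata. A quick sketch to inspect them (the column names "cov1" and "cov2" are placeholders, and this assumes the default pandas alphabetical category ordering):

```python
import pandas as pd

for col in ["cov1", "cov2"]:
    # Map each category to its integer code (alphabetical order by default)
    cats = pd.Categorical(adata.obs[col]).categories
    print(col, dict(zip(cats, range(len(cats)))))
```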

Thanks, I got it to work when I supply a fixed library size (for example 0 in my code block above). However, I now wanted to sample the library size from a LogNormal distribution, as scVI itself does, and then I run into problems. I've added my code below; what happens downstream is that theta and mu are exactly the same, leading the rate (theta/mu) of the Gamma distribution to be zero everywhere. Any idea why this happens or how I can prevent it?

What I’ve implemented is:


    with torch.no_grad():
        if model.module.gene_likelihood not in ["zinb", "nb", "poisson"]:
            raise ValueError("Invalid gene_likelihood.")

        # Sample latent variables from the standard normal prior
        qz_m = torch.zeros(n_cells, model.module.n_latent)
        qz_v = torch.ones(n_cells, model.module.n_latent)
        z = Normal(qz_m, qz_v).sample()

        # TODO: allow for different batch indices
        dec_batch_index = torch.zeros(n_cells, 1)

        # HERE I SAMPLE DIFFERENT LIBRARY SIZES
        # loc and scale of the log library sizes in a batch of training data
        ln = torch.distributions.log_normal.LogNormal(
            torch.tensor(6.7649703), torch.tensor(0.16759828)
        )
        library = ln.sample(sample_shape=torch.Size([n_cells, 1]))
        cat_covs = torch.tensor(
            np.repeat([[3.0, 5.0]], n_cells, axis=0), dtype=torch.float32
        )

        generative_outputs = model.module.generative(z, library, dec_batch_index, cat_covs=cat_covs)

        dist = generative_outputs["px"]
        if model.module.gene_likelihood == "poisson":
            l_train = generative_outputs["px"].rate
            l_train = torch.clamp(l_train, max=1e8)
            dist = torch.distributions.Poisson(
                l_train
            )  # Shape: (n_samples, n_cells_batch, n_genes)
        if n_samples > 1:
            exprs = dist.sample().permute(
                [1, 2, 0]
            )  # Shape: (n_cells_batch, n_genes, n_samples)
        else:
            exprs = dist.sample()  # HERE IS WHERE THE ERROR OCCURS

        exprs = exprs.cpu()
    return exprs.numpy()

Error message:

ValueError: Expected parameter rate (Tensor of shape (3582, 1200)) of distribution Gamma(concentration: torch.Size([3582, 1200]), rate: torch.Size([3582, 1200])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])

Ah sorry, I stated above that mu and theta are the same, which is not the case. mu is infinite, which is why theta/mu becomes 0. I also figured out that I should sample the library size from a normal distribution, since the library gets exponentiated inside model.module.generative(): a LogNormal(6.76, 0.17) draw is already around e^6.76 ≈ 866, and exponentiating that again overflows to infinity. By sampling in log space from a normal distribution instead, the exponentiation inside generative() yields a lognormal-distributed library size.

So I use:

ln = torch.distributions.normal.Normal(torch.tensor(6.7649703), torch.tensor(0.16759828))
library = ln.sample(sample_shape=torch.Size([n_cells, 1]))  # log-space library; exponentiated inside generative()
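For completeness, the loc and scale used above can be estimated from the training data. A minimal sketch, assuming raw counts live in adata.X (use the appropriate layer otherwise, and restrict to one batch's cells if you condition per batch):

```python
import numpy as np
import torch

n_cells = 1000

# Per-cell log library size from the raw counts
log_lib = np.log(np.asarray(adata.X.sum(axis=1)).ravel())

loc = torch.tensor(log_lib.mean(), dtype=torch.float32)
scale = torch.tensor(log_lib.std(), dtype=torch.float32)

# Sample in log space; generative() exponentiates internally
library = torch.distributions.Normal(loc, scale).sample(torch.Size([n_cells, 1]))
```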