How loss function is used in CellAssign

KANG-BIOINFO · July 29, 2021, 1:42pm

Hi,

I’m trying to understand how the loss function in CellAssignModule is used during CellAssign training/testing. I couldn’t find the loss function being called in either file. Do you know in which step the Q function of the EM algorithm is applied?

Thanks a lot!

KANG-BIOINFO · July 29, 2021, 2:00pm

I can find where the loss function is defined, but I couldn’t see where it is applied for parameter inference.

adamgayoso · July 29, 2021, 3:36pm

This is the E step:

YosefLab/scvi-tools/blob/d3ba83a909480a8a561004f3fb58e08add161f16/scvi/external/cellassign/_module.py#L186-L195

    nb_pdf = NegativeBinomial(mu=mu_ngc, theta=phi)
    x_ = x.unsqueeze(-1).expand(n_cells, self.n_genes, self.n_labels)
    x_log_prob_raw = nb_pdf.log_prob(x_)  # (n, g, c)
    theta_log = theta_log.expand(n_cells, self.n_labels)
    p_x_c = torch.sum(x_log_prob_raw, 1) + theta_log  # (n, c)
    normalizer_over_c = torch.logsumexp(p_x_c, 1)
    normalizer_over_c = normalizer_over_c.unsqueeze(-1).expand(
        n_cells, self.n_labels
    )
    gamma = torch.exp(p_x_c - normalizer_over_c)  # (n, c)
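
For readers less familiar with EM, here is a minimal NumPy re-implementation of the same E-step computation. It is a sketch, not the scvi-tools code, and it assumes the NB means `mu_ngc` (cells × genes × labels), the NB inverse dispersion `phi` and the log mixture weights `theta_log` have already been computed. The helper names `nb_log_prob`, `logsumexp` and `e_step` are hypothetical. `p_x_c[n, c]` is log p(x_n, c), and `gamma[n, c]` is the posterior probability that cell n has label c.

```python
import math

import numpy as np

# NumPy has no vectorized log-gamma, so wrap math.lgamma
_lgamma = np.vectorize(math.lgamma)


def nb_log_prob(x, mu, theta):
    """Negative binomial log pmf with mean `mu` and inverse dispersion `theta`."""
    log_theta_mu = np.log(theta + mu)
    return (
        _lgamma(x + theta)
        - _lgamma(theta)
        - _lgamma(x + 1.0)
        + theta * (np.log(theta) - log_theta_mu)
        + x * (np.log(mu) - log_theta_mu)
    )


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along `axis`."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def e_step(x, mu_ngc, phi, theta_log):
    """Return (p_x_c, gamma) for counts x (n, g), NB means mu_ngc (n, g, c),
    NB inverse dispersion phi and log mixture weights theta_log (c,)."""
    x_ = x[:, :, None]  # add a label axis so counts broadcast over c
    x_log_prob_raw = nb_log_prob(x_, mu_ngc, phi)  # (n, g, c)
    p_x_c = x_log_prob_raw.sum(axis=1) + theta_log  # (n, c): log p(x_n, c)
    normalizer_over_c = logsumexp(p_x_c, axis=1)[:, None]  # (n, 1): log p(x_n)
    gamma = np.exp(p_x_c - normalizer_over_c)  # (n, c): p(c | x_n)
    return p_x_c, gamma
```

Normalizing in log space with logsumexp, as the original code does with `torch.logsumexp`, avoids underflow when summing log likelihoods over many genes.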

This is where the expected log likelihood (the Q function) is computed:

YosefLab/scvi-tools/blob/d3ba83a909480a8a561004f3fb58e08add161f16/scvi/external/cellassign/_module.py#L219

        generative_outputs,
        n_obs: int = 1.0,
    ):
        # generative_outputs is a dict of the return value from `generative(...)`
        # assume that `n_obs` is the number of training data points
        p_x_c = generative_outputs["p_x_c"]
        gamma = generative_outputs["gamma"]

        # compute Q
        # take mean of number of cells and multiply by n_obs (instead of summing n)
        q_per_cell = torch.sum(gamma * -p_x_c, 1)

        # third term is log prob of prior terms in Q
        theta_log = F.log_softmax(self.theta_logit, dim=-1)
        theta_log_prior = Dirichlet(self.dirichlet_concentration)
        theta_log_prob = -theta_log_prior.log_prob(
            torch.exp(theta_log) + THETA_LOWER_BOUND
        )
        prior_log_prob = theta_log_prob
        delta_log_prior = Normal(
            self.delta_log_mean, self.delta_log_log_scale.exp().sqrt()
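
The Q term and the Dirichlet prior term from the excerpt above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the module's code. The delta prior term is cut off in the excerpt and is left out, and `THETA_LOWER_BOUND` is omitted. How the real loss combines and scales the terms is not visible above, so the `mean * n_obs + prior` combination is an assumption based only on the code comment. The function names are hypothetical.

```python
import math

import numpy as np


def dirichlet_log_prob(p, concentration):
    """log Dir(p | concentration) for a probability vector p."""
    alpha = np.asarray(concentration, dtype=float)
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    return log_norm + np.sum((alpha - 1.0) * np.log(p))


def q_loss(p_x_c, gamma, theta_log, dirichlet_concentration, n_obs):
    """Sketch of the loss: negative Q term plus negative log Dirichlet prior.

    p_x_c: (n, c) log p(x_n, c); gamma: (n, c) E-step responsibilities;
    theta_log: (c,) log mixture weights; n_obs: number of training cells.
    """
    # per-cell negative expected log likelihood: -sum_c gamma_nc * log p(x_n, c)
    q_per_cell = np.sum(gamma * -p_x_c, axis=1)  # (n,)
    # negative log prior on the mixture weights
    prior_term = -dirichlet_log_prob(np.exp(theta_log), dirichlet_concentration)
    # ASSUMPTION: minibatch mean scaled by n_obs stands in for the sum over all cells
    loss = np.mean(q_per_cell) * n_obs + prior_term
    return loss, q_per_cell
```

Taking the minibatch mean and multiplying by `n_obs` gives an estimate of the full-data sum that does not depend on the minibatch size, which is what the code comment describes.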

The M step has no closed form, so, as in the original CellAssign, we use a black-box optimizer:

YosefLab/scvi-tools/blob/d3ba83a909480a8a561004f3fb58e08add161f16/scvi/external/cellassign/_module.py#L238-L240

    return LossRecorder(
        loss, q_per_cell, torch.zeros_like(q_per_cell), prior_log_prob
    )

The loss above is backpropagated to update all the model parameters.
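
To illustrate what a gradient-based (non-closed-form) M step means, here is a toy generalized EM for a mixture of Poissons in NumPy. It is a hypothetical example, not CellAssign, and not how scvi-tools schedules its updates. The E step computes responsibilities. The M step for the log rates then takes a few plain gradient steps on -Q with the responsibilities held fixed, standing in for a black-box optimizer such as Adam. The Poisson rates actually do have a closed-form update; gradient descent is used here only to show the mechanism.

```python
import math

import numpy as np


def poisson_log_joint(x, log_rates, log_weights):
    """(n, c) matrix of log p(x_n, c) for a mixture of Poissons."""
    lgamma_x1 = np.array([math.lgamma(v + 1.0) for v in x])
    return (
        log_weights[None, :]
        + x[:, None] * log_rates[None, :]
        - np.exp(log_rates)[None, :]
        - lgamma_x1[:, None]
    )


def generalized_em(x, log_rates, log_weights, n_iters=200, m_steps=5, lr=0.01):
    """EM whose M step for the rates is a few gradient steps on -Q."""
    for _ in range(n_iters):
        # E step: responsibilities under the current parameters, then held fixed
        p_x_c = poisson_log_joint(x, log_rates, log_weights)
        gamma = np.exp(p_x_c - p_x_c.max(axis=1, keepdims=True))
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M step (rates): gradient descent on -Q / n with respect to the log rates
        for _ in range(m_steps):
            rates = np.exp(log_rates)
            grad = -np.mean(gamma * (x[:, None] - rates[None, :]), axis=0)
            log_rates = log_rates - lr * grad
        # M step (weights): closed form, the average responsibility per component
        log_weights = np.log(gamma.mean(axis=0))
    return log_rates, log_weights
```

On simulated counts drawn half from Poisson(2) and half from Poisson(20), this should recover rates close to the two group means and weights close to 0.5.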

KANG-BIOINFO · July 29, 2021, 4:40pm

Thanks @adamgayoso, that’s very helpful! I think I understand how the E step and the Q function are defined. However, I’m still a little confused about where the loss function is called (sorry, I’m not very good at statistics and deep learning). I saw the generative function being called at line 119 of CellAssign, but I’m not sure when the loss function defined in CellAssignModule is called.

I added a print statement inside the loss function to check whether it’s called, and I didn’t see any output. I’m not sure whether and when the loss function is called; I guess I still need to understand the code better.

adamgayoso · July 29, 2021, 5:12pm

The loss is called by this function:

YosefLab/scvi-tools/blob/d3ba83a909480a8a561004f3fb58e08add161f16/scvi/module/base/_base_module.py#L95-L104

    def forward(
        self,
        tensors,
        get_inference_input_kwargs: Optional[dict] = None,
        get_generative_input_kwargs: Optional[dict] = None,
        inference_kwargs: Optional[dict] = None,
        generative_kwargs: Optional[dict] = None,
        loss_kwargs: Optional[dict] = None,
        compute_loss=True,
    ) -> Union[
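
The control flow can be illustrated with a toy, hypothetical class that borrows the hook names visible in this thread (`_get_inference_input`, `inference`, `_get_generative_input`, `generative`, `loss`). It is a simplified sketch of the pattern, not the scvi-tools implementation. `forward` chains inference, then generative, then, when `compute_loss=True`, the loss. Assuming training goes through this `forward`, as the reply above indicates, a `print` inside `loss` would only fire during training. A `predict` that calls `generative` directly never reaches `loss`.

```python
from typing import Optional


class ToyModule:
    """Hypothetical stand-in for a module exposing inference/generative/loss hooks."""

    def __init__(self):
        self.calls = []  # records which hooks ran, in order

    def _get_inference_input(self, tensors):
        return {"x": tensors["X"]}

    def inference(self, x):
        self.calls.append("inference")
        return {}  # in this toy there is nothing to infer

    def _get_generative_input(self, tensors, inference_outputs):
        return {"x": tensors["X"]}

    def generative(self, x):
        self.calls.append("generative")
        return {"p_x_c": [-v for v in x]}  # toy stand-in for log-likelihood terms

    def loss(self, tensors, inference_outputs, generative_outputs, n_obs=1):
        self.calls.append("loss")
        return -sum(generative_outputs["p_x_c"]) * n_obs

    def forward(self, tensors, loss_kwargs: Optional[dict] = None, compute_loss=True):
        """Chain inference -> generative -> (optionally) loss."""
        loss_kwargs = loss_kwargs or {}
        inference_outputs = self.inference(**self._get_inference_input(tensors))
        generative_inputs = self._get_generative_input(tensors, inference_outputs)
        generative_outputs = self.generative(**generative_inputs)
        if not compute_loss:
            return inference_outputs, generative_outputs
        losses = self.loss(tensors, inference_outputs, generative_outputs, **loss_kwargs)
        return inference_outputs, generative_outputs, losses


module = ToyModule()
module.forward({"X": [1.0, 2.0]}, loss_kwargs={"n_obs": 10})  # training-style call
module.generative(**module._get_generative_input({"X": [3.0]}, None))  # predict-style call
print(module.calls)  # ['inference', 'generative', 'loss', 'generative']
```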

But there is a lot going on here with code abstraction and the finer details of using PyTorch. What is it that you’d like to do?

KANG-BIOINFO · July 29, 2021, 5:48pm

Thanks so much @adamgayoso! It really helps, and I totally understand the abstraction in the code, which makes it more flexible and compatible. Really appreciate the hard work!
Basically, I changed a bit of the CellAssignModule code to fit in other variables in the CellAssign algorithm. My concern is that variable inference in the new model might not work: the predict function seems to output the initial, randomly initialized delta variables instead of the optimized ones. I debugged it for a while, and it seems the loss function might not be called in the new code. I’m not sure why.

The train function is unchanged. The prediction function looks like this:

    from typing import Tuple

    import numpy as np
    import torch


    @torch.no_grad()
    def predict(self) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """Return delta_c, delta_p and delta_cp from the generative step, concatenated over minibatches."""
        adata = self._validate_anndata(None)
        scdl = self._make_data_loader(adata=adata)

        # collect per-minibatch outputs in lists and concatenate once at the end
        delta_c, delta_p, delta_cp = [], [], []
        for tensors in scdl:
            generative_inputs = self.module._get_generative_input(tensors, None)
            outputs = self.module.generative(**generative_inputs)
            # move to CPU so that .numpy() also works when the model is on a GPU
            delta_c.append(outputs["delta_c"].cpu())
            delta_p.append(outputs["delta_p"].cpu())
            delta_cp.append(outputs["delta_cp"].cpu())

        # to be better specified ??
        return (
            torch.cat(delta_c).numpy(),
            torch.cat(delta_p).numpy(),
            torch.cat(delta_cp).numpy(),
        )

adamgayoso · July 29, 2021, 5:50pm

Do you mean you want to add covariates? That is already implemented.

KANG-BIOINFO · July 29, 2021, 6:03pm

Yes, something similar. Cell type and other annotations are known, so I think I need to change the loss function definition and the variable inference code.

KANG-BIOINFO · July 29, 2021, 6:04pm

My own model has no latent variable like the cell type in the CellAssign model.

adamgayoso · July 29, 2021, 7:05pm

Maybe check whether the training step in the TrainingPlan is being called correctly.
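
A complementary sanity check is to snapshot parameter values before and after training and list the ones that never changed. The helper below is a hypothetical NumPy sketch; its name and signature are made up for illustration.

```python
import numpy as np


def unchanged_parameters(before, after, atol=0.0):
    """Compare two parameter snapshots (dicts of name -> array-like).

    Returns (unchanged, missing): names whose values did not change beyond
    `atol`, and names present in `before` but absent from `after`.
    """
    unchanged, missing = [], []
    for name, old in before.items():
        if name not in after:
            missing.append(name)
        elif np.allclose(np.asarray(old), np.asarray(after[name]), rtol=0.0, atol=atol):
            unchanged.append(name)
    return sorted(unchanged), sorted(missing)
```

With scvi-tools, the snapshots could be built as `{name: p.detach().cpu().numpy().copy() for name, p in model.module.named_parameters()}` before and after calling `model.train()`. A tensor stored as a plain attribute instead of being wrapped in `torch.nn.Parameter` does not show up in `named_parameters()` and is not updated by the optimizer. That would be one possible explanation for delta values that stay at their random initialization.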

KANG-BIOINFO · July 29, 2021, 7:14pm

Thanks @adamgayoso, I’ll check! Really appreciate your help!
