How the loss function is used in CellAssign


I’m trying to understand how the loss function in CellAssignModule is used during CellAssign training and testing. I couldn’t find the loss function being called in either file. Do you know in which step the Q function of the EM algorithm is applied?

Thanks a lot!

I can find where the loss function is defined, but I can’t see where it is applied for parameter inference.

This is the E step:

This is the calculation of the expected log likelihood

The M step has no closed form, so we use a black-box optimizer, as in the original cellassign.

The loss above gets backpropagated to update all the model params.
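To make the three pieces above concrete, here is a minimal sketch on a toy problem. It is not the CellAssign code: it assumes a two-component Poisson mixture with fixed mixing weights instead of CellAssign's model, uses plain Python with a hand-written gradient where the real module relies on PyTorch autograd, and every name in it (`e_step`, `q_function`, `m_step`, `fit`) is hypothetical.

```python
import math


def e_step(xs, log_rates, weights):
    """E step: responsibility gamma[n][k] of component k for count xs[n]."""
    gammas = []
    for x in xs:
        # log p(x, z=k) for a Poisson component; the log(x!) term is the same
        # for every k, so it cancels in the normalisation and is dropped.
        logs = [math.log(w) + x * log_r - math.exp(log_r)
                for w, log_r in zip(weights, log_rates)]
        top = max(logs)  # subtract the max for numerical stability
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        gammas.append([u / total for u in unnorm])
    return gammas


def q_function(xs, gammas, log_rates, weights):
    """Expected complete-data log likelihood, up to an additive constant."""
    return sum(
        g * (math.log(w) + x * log_r - math.exp(log_r))
        for x, gs in zip(xs, gammas)
        for g, w, log_r in zip(gs, weights, log_rates)
    )


def m_step(xs, gammas, log_rates, step_size=0.05, n_steps=50):
    """M step by gradient ascent on Q, with gamma held fixed."""
    log_rates = list(log_rates)
    n = len(xs)
    for _ in range(n_steps):
        for k in range(len(log_rates)):
            rate = math.exp(log_rates[k])
            # dQ / d(log_rate_k) = sum_n gamma[n][k] * (x_n - rate_k)
            grad = sum(gs[k] * (x - rate) for x, gs in zip(xs, gammas))
            log_rates[k] += step_size * grad / n
    return log_rates


def fit(xs, log_rates, weights, n_iters=30):
    """Alternate E steps and gradient-based M steps."""
    for _ in range(n_iters):
        gammas = e_step(xs, log_rates, weights)
        log_rates = m_step(xs, gammas, log_rates)
    return log_rates


xs = [1, 2, 3, 2, 1, 2, 19, 20, 21, 22, 20, 18]
weights = [0.5, 0.5]
fitted_log_rates = fit(xs, [math.log(1.0), math.log(10.0)], weights)
fitted_rates = [math.exp(v) for v in fitted_log_rates]
```

In an autograd setting, the inner loop of `m_step` would roughly correspond to computing the negative of `q_function` as the loss, calling backward on it, and letting the optimizer take a step, with the responsibilities treated as constants.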

Thanks @adamgayoso, that’s very helpful! I think I understand how the E step and the Q function are defined. However, I’m still a little confused about where the loss function is called (sorry, I’m not very good at statistics and deep learning). I saw the call to the generative function in line 119 of CellAssign, but I’m not sure when the loss function defined in CellAssignModule is called.

I added a print statement inside the loss function to see whether it is called, but I never saw any printed output. I’m not sure whether and when the loss function is called. I guess I still need to understand the code better.

Loss is called by this function here:

But there is a lot going on with code abstraction and the finer details of using PyTorch. What is it that you’d like to do?
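A simplified mock of the call chain may help here. It is a sketch under assumptions, not the scvi-tools source: `ToyModule` and `ToyTrainingPlan` are hypothetical stand-ins, and the only point is that the loss runs when the training step calls the module as a whole, while a loop that calls `generative` directly never reaches it.

```python
class ToyModule:
    """Hypothetical stand-in for a module with inference/generative/loss."""

    def __init__(self):
        self.calls = []  # records which methods ran, in order

    def inference(self, batch):
        self.calls.append("inference")
        return {}

    def generative(self, batch):
        self.calls.append("generative")
        return {"mean": sum(batch) / len(batch)}

    def loss(self, batch, generative_outputs):
        self.calls.append("loss")
        mean = generative_outputs["mean"]
        return sum((x - mean) ** 2 for x in batch)

    def __call__(self, batch):
        # The "forward" path: the only place in this mock where loss() runs.
        inference_outputs = self.inference(batch)
        generative_outputs = self.generative(batch)
        loss = self.loss(batch, generative_outputs)
        return inference_outputs, generative_outputs, loss


class ToyTrainingPlan:
    """Hypothetical stand-in for the object that drives training."""

    def __init__(self, module):
        self.module = module

    def training_step(self, batch):
        # Calling the module itself triggers inference, generative and loss.
        _, _, loss = self.module(batch)
        return loss


module = ToyModule()
plan = ToyTrainingPlan(module)

train_loss = plan.training_step([1.0, 2.0, 3.0])
calls_during_training = list(module.calls)

module.calls.clear()
module.generative([1.0, 2.0, 3.0])  # what a predict-style loop does
calls_during_predict = list(module.calls)
```

In this mock, a print placed inside `loss` would only appear during `training_step`, never during the predict-style call.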

Thanks so much @adamgayoso! That really helps, and I understand that the abstraction makes the code more flexible and compatible. I really appreciate the hard work!
Basically, I changed the CellAssignModule code a little to fit other variables into the CellAssign algorithm. My concern is that variable inference in the new model might not work: the predict function seems to output the randomly initialized delta variables rather than the optimized ones. I debugged it for a while, and it seems the loss function might not be called in the new code. I’m not sure why.

The train function is unchanged. The prediction function is below:

def predict(self) -> pd.DataFrame:
    """Predict soft cell type assignment probability for each cell."""
    adata = self._validate_anndata(None)
    scdl = self._make_data_loader(adata=adata)

    # predictions = []
    for idx, tensors in enumerate(scdl):
        generative_inputs = self.module._get_generative_input(tensors, None)
        outputs = self.module.generative(**generative_inputs)
        if idx == 0:
            delta_c = outputs["delta_c"]
            delta_p = outputs["delta_p"]
            delta_cp = outputs["delta_cp"]
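        else:
            # These lines were garbled in the post; torch.cat is assumed here.
            delta_c = torch.cat((delta_c, outputs["delta_c"]))
            delta_p = torch.cat((delta_p, outputs["delta_p"]))
            delta_cp = torch.cat((delta_cp, outputs["delta_cp"]))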

        # gamma = outputs["gamma"]
        # predictions += [gamma.cpu()]

    # to be better specified ??
    return delta_c.numpy(), delta_p.numpy(), delta_cp.numpy()

Do you mean you want to add covariates? That is already implemented.

Yes, something similar. Cell type and other annotations are known, so I think I need to change the loss function definition and the variable inference code.

Unlike the CellAssign model, where cell type is a latent variable, my own model has no latent variable.

Maybe you can see if the training step in the TrainingPlan is being called correctly.
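One generic way to run that check is to wrap the method with a call counter before training. The helper below is a hypothetical utility, not part of scvi-tools or PyTorch Lightning, and `ToyPlan` only stands in for a real training plan; whether a wrapper set on the instance is picked up by a particular framework is an assumption you would need to verify.

```python
import functools


def count_calls(obj, method_name):
    """Wrap obj.<method_name> so every call is counted; return the counter."""
    original = getattr(obj, method_name)  # bound method
    counter = {"n": 0}

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        counter["n"] += 1
        return original(*args, **kwargs)

    # Shadow the method on this instance only; the class is left untouched.
    setattr(obj, method_name, wrapper)
    return counter


class ToyPlan:
    """Hypothetical stand-in for a training plan."""

    def training_step(self, batch):
        return sum(batch)


plan = ToyPlan()
counter = count_calls(plan, "training_step")
for batch in ([1, 2], [3, 4], [5]):
    plan.training_step(batch)
```

If the counter is still zero after a training run, the method was never reached through that instance, which points at how training is wired up rather than at the loss itself.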

Thanks @adamgayoso, I’ll check that! I really appreciate your help!