Another idea I thought I’d throw out, in case anyone wants to help, is explicit modeling of the cell cycle. I get that it is implicitly modeled in the standard latent space, but there are many. Since the cell cycle is extremely well understood across the evolutionary tree, and it’s pretty easy to generate a “cell cycle score” with known phasic gene lists, it should be easy enough to formulate prior beliefs to do the same thing within this framework.
I’m imagining a circular latent dimension… this seems easy enough with a sine transform of some τ distributed uniformly on [0, 2π]. That seems to be sort of what they tried here, but I think they made some mistakes, and it performs poorly on real data.
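A minimal sketch of the circular latent idea, using only the Python standard library. One caveat worth noting: a single sine maps two different angles to the same value (sin τ = sin(π − τ)), so this sketch pairs it with a cosine to place each τ at a unique point on the unit circle. The function names `sample_tau` and `embed_on_circle` are hypothetical, not part of any library.

```python
import math
import random


def sample_tau(n, seed=0):
    """Draw n angles uniformly from [0, 2*pi]."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]


def embed_on_circle(tau):
    """Map an angle (radians) to a point (cos, sin) on the unit circle."""
    return (math.cos(tau), math.sin(tau))


taus = sample_tau(5)
points = [embed_on_circle(t) for t in taus]

# Sine alone is ambiguous: these two angles share the same sine value,
# but their (cos, sin) embeddings differ.
ambiguous_pair = (0.5, math.pi - 0.5)
```

In a real model τ would be inferred per cell rather than sampled once, but the embedding step would look the same.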
I have been trying something similar to CellAssign’s prior: a marker gene is denoted 1 for its own phase and -1 for the other phases, and all other genes are denoted 0 (initially agnostic). I was thinking that weights could be sampled from a Gaussian with these values as prior locs and used in a linear decoder from a (discrete?) classification. I think this could be very powerful alongside the standard SCVI latent space, and might help make clustering more biologically meaningful. I’m still trying to work it out and could use some ideas.
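To make the prior concrete, here is a rough sketch in plain Python. The gene lists are illustrative only (a curated marker set should be substituted), the Gaussian scale of 0.5 is an arbitrary assumption, and `prior_locs` and `sample_weights` are hypothetical helper names, not CellAssign or scvi-tools functions.

```python
import random

# Illustrative marker lists only; replace with a curated set.
PHASE_MARKERS = {
    "S": ["PCNA", "MCM2"],
    "G2M": ["TOP2A", "MKI67"],
}
GENES = ["PCNA", "MCM2", "TOP2A", "MKI67", "ACTB"]


def prior_locs(genes, phase_markers):
    """Per phase: +1 for its markers, -1 for other phases' markers, 0 otherwise."""
    all_markers = {g for markers in phase_markers.values() for g in markers}
    locs = {}
    for phase, markers in phase_markers.items():
        row = []
        for gene in genes:
            if gene in markers:
                row.append(1.0)
            elif gene in all_markers:
                row.append(-1.0)
            else:
                row.append(0.0)  # initially agnostic
        locs[phase] = row
    return locs


def sample_weights(locs, scale=0.5, seed=0):
    """Sample decoder weights from Normal(loc, scale) for every phase and gene."""
    rng = random.Random(seed)
    return {p: [rng.gauss(m, scale) for m in row] for p, row in locs.items()}


locs = prior_locs(GENES, PHASE_MARKERS)
weights = sample_weights(locs)
```

Each row of `weights` could then serve as the linear decoder weights for one phase, with the non-marker genes free to move away from zero during fitting.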
Thanks for any thoughts!
I did some related work a long time ago. What you’re describing is generally a good idea, I think. There is one important modification you’d want to make, though. If you make a latent variable \tau as in the drawing, you want to restrict the ‘decoding’ function f so that f(\tau)_g is periodic for each phasic gene g.
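One simple way to get a decoder that is periodic by construction is a truncated Fourier series in \tau. This is a swapped-in illustration of the constraint, not the method from the work mentioned above; `periodic_decoder` and the coefficient values are hypothetical.

```python
import math


def periodic_decoder(tau, a0, cos_coefs, sin_coefs):
    """Truncated Fourier series in tau; has period 2*pi by construction."""
    value = a0
    for k, (a_k, b_k) in enumerate(zip(cos_coefs, sin_coefs), start=1):
        value += a_k * math.cos(k * tau) + b_k * math.sin(k * tau)
    return value


# Hypothetical coefficients for one phasic gene; in a model these would be learned.
a0, cos_coefs, sin_coefs = 0.5, [1.0, 0.2], [-0.3, 0.1]

tau = 1.234
f_tau = periodic_decoder(tau, a0, cos_coefs, sin_coefs)
f_wrapped = periodic_decoder(tau + 2.0 * math.pi, a0, cos_coefs, sin_coefs)
# f_tau and f_wrapped agree up to floating-point error.
```

Non-phasic genes could keep an unconstrained decoder, so only the genes expected to cycle are forced to be periodic.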
In my outdated work on this I considered the latent function f to come from a Gaussian process GP(0, K(\tau_1, \tau_2)), where K is a periodic covariance function. There might also be some way to transform the ‘raw’ decoded f(\tau) values so that they are cyclically constrained.
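For reference, a common choice of periodic covariance is the exp-sine-squared kernel; whether that exact kernel was used in the work above is not stated, so treat this as an assumption. The sketch uses only the standard library, and the lengthscale and variance values are arbitrary.

```python
import math


def periodic_kernel(t1, t2, lengthscale=1.0, variance=1.0, period=2.0 * math.pi):
    """Exp-sine-squared kernel: variance * exp(-2 * sin^2(pi * (t1 - t2) / period) / lengthscale^2)."""
    s = math.sin(math.pi * (t1 - t2) / period)
    return variance * math.exp(-2.0 * s * s / lengthscale ** 2)


def gram_matrix(taus, **kernel_kwargs):
    """Covariance matrix K[i][j] = periodic_kernel(taus[i], taus[j])."""
    return [[periodic_kernel(a, b, **kernel_kwargs) for b in taus] for a in taus]


taus = [0.0, 1.0, math.pi, 2.0 * math.pi]
K = gram_matrix(taus)
# taus[0] and taus[3] are one full period apart, so K[0][3] matches K[0][0]
# up to floating-point error.
```

Functions drawn from a GP with this covariance repeat every `period`, which gives the periodicity of f(\tau) without any extra transformation.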
One other warning, though: we did this using a FUCCI cell line, and, well, the cell cycle apparently consists of discrete states more than you’d think, rather than a circle. Here’s a figure from a random paper I found on Google (Human Fucci Pancreatic Beta Cell Lines: New Tools to Study Beta Cell Cycle and Terminal Differentiation):
Maybe a better strategy would be to carefully select cell cycle stage markers and apply CellAssign as is for the distinct stages?
Thanks for your response and these ideas. I’ve kept working on it, and detecting the cell cycle in an unsupervised way is the one challenge I can’t overcome, but you were right that a supervised setup like CellAssign works okay. I just don’t understand all these papers that claim you can throw in some sin- and cos-transformed dimensions and magically detect the cell cycle. I imagine that doesn’t work in datasets where there are literally any axes of variation besides the cell cycle.