Help with cellrank2 macrostates (multiple conditions design)

Hello, I have a question about cellrank2.

First off, thank you very much for this very interesting tool.

I am currently exploring the differentiation trajectories of the respiratory epithelium. The design comprises 8 conditions (Healthy/Disease × Treatment/CTRL × D3/D14).

  1. I first ran Palantir pseudotime on all conditions together, as suggested in issue 957 of scverse/cellrank on GitHub (can’t add the link here), choosing cell_A as the cluster for the root cell. I ran Palantir with 2500 waypoints, which is only around 6% of my cells (my dataset is 40014 × 62703). The pseudotime results make sense.
  2. Then I calculated macrostates with 100 cells per macrostate (with only 30, one of the healthy conditions does not contribute to the macrostates at all; with more, other cell types contaminate the macrostates). You can see the UMAP with the macrostates.

My problem is not the contribution of the original cell type clusters to the macrostates, but the contribution of the conditions (as you can see in the heatmap): only Disease conditions contribute to the macrostates, except for macrostate cell_C. Is there a way to balance the contribution of the conditions to the macrostates? Do you think this comes from something in the gene expression? Because of this imbalance, it does not seem correct to calculate fate probabilities of my cells towards macrostates characterized only by Disease conditions.
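One way to check whether the imbalance simply reflects unequal condition sizes is to normalize the macrostate-by-condition counts by the number of cells in each condition. The following is a minimal sketch, assuming the macrostate and condition labels sit in two columns of a cell-level table such as `adata.obs`; the column names `macrostate` and `condition`, the helper `condition_contribution`, and the toy data are all hypothetical. In CellRank the labels should be available from the estimator (for example `g.macrostates`), with cells outside any macrostate left as missing values.

```python
import pandas as pd


def condition_contribution(obs, macrostate_key="macrostate", condition_key="condition"):
    """Return two macrostate x condition tables.

    per_condition:     fraction of each condition's cells that fall in a macrostate
    within_macrostate: fraction of each macrostate's cells that come from a condition
    """
    # Cells not assigned to any macrostate carry a missing label; drop them for counting.
    assigned = obs.dropna(subset=[macrostate_key])
    counts = pd.crosstab(assigned[macrostate_key], assigned[condition_key])

    # Keep conditions that contribute zero cells, so they show up as explicit zeros.
    condition_sizes = obs[condition_key].value_counts()
    counts = counts.reindex(columns=condition_sizes.index, fill_value=0)

    per_condition = counts.div(condition_sizes, axis="columns")
    within_macrostate = counts.div(counts.sum(axis=1), axis="index")
    return per_condition, within_macrostate


# Toy data (hypothetical): 6 Healthy and 4 Disease cells, some without a macrostate.
obs = pd.DataFrame(
    {
        "condition": ["Healthy"] * 6 + ["Disease"] * 4,
        "macrostate": ["cell_C", "cell_C", None, None, None, None,
                       "cell_B", "cell_B", "cell_C", None],
    }
)
per_condition, within_macrostate = condition_contribution(obs)
print(per_condition)
print(within_macrostate)
```

If `per_condition` is still close to zero for the healthy conditions, the imbalance is not explained by group size alone and is more likely to reflect where those cells sit in the expression landscape.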

I can share my notebooks with you to see if there is any parameter I can change along the pipeline.
Thank you very much,
Paola

Hi Paola! Thanks for your question. Would you expect all conditions to contribute to all macrostates? Could it make sense biologically that healthy only contributes to a single macrostate, or is the contribution of different conditions to the phenotypic landscape much more mixed, including the healthy condition? You could look at scanpy density plots in the UMAP as a first step of exploring this.

Hello Marius, thank you very much for your answer!

> Would you expect all conditions to contribute to all macrostates?

Maybe not to all. It would not be surprising for some macrostates to have a higher contribution from treatment or disease, but almost zero contribution from the healthy conditions across all macrostates is very strange.

> Could it make sense biologically that healthy only contributes to a single macrostate, or is the contribution of different conditions to the phenotypic landscape much more mixed, including the healthy condition?

The weird result is that the healthy conditions contribute to basically none of the macrostates (with the exception of one). That being said, there could be a biological reason for a cell type to have a second macrostate; that would not be weird.

> You could look at scanpy density plots in the UMAP as a first step of exploring this.

Do you mean to explore the contribution of each condition to that part of the cluster after integration, annotation, etc.?

Thanks very much for your help :grinning_face_with_smiling_eyes: