I have a large scRNAseq data set with multiple libraries for different experimental conditions (time, tissue, perturbation), which I have integrated using Scanorama.
Now I would like to do trajectory inference on this combined data set, possibly with CellRank, but I am wondering what the best approach is. I see two options:
1. Apply trajectory inference to each sample separately, then merge the results. The problem is that the pseudotime inferred for one sample might not be comparable to that of another sample.
2. Apply trajectory inference to the integrated data. This only makes sense to me if the data integration directly transforms the counts (e.g. with scVI). Otherwise, two biologically similar cell populations from different libraries might still be assigned very different pseudotimes if the data integration only acts at the dimensionality-reduction level (e.g. with Scanorama).
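To make my concern about the first approach (per-sample inference) concrete, here is a toy sketch in plain Python. It assumes that pseudotime is rescaled to [0, 1] within each run, which I believe many tools do but which may not hold for every method. The "true progress" values and the function name are made up for illustration and do not come from any real data or library.

```python
def rescale_to_unit_interval(values):
    """Min-max rescale a list of numbers to [0, 1], as assumed for per-run pseudotime."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical "true" progress of cells along one shared process (0 = start, 1 = end).
# The early sample only covers the first half of the process, the late sample all of it.
true_progress_early = [0.0, 0.25, 0.5]
true_progress_late = [0.0, 0.5, 1.0]

# Pseudotime inferred independently per sample, each rescaled to [0, 1].
pseudotime_early = rescale_to_unit_interval(true_progress_early)
pseudotime_late = rescale_to_unit_interval(true_progress_late)

# The last cell of the early sample and the middle cell of the late sample
# are in the same state (true progress 0.5) but get different pseudotimes.
print(pseudotime_early[2], pseudotime_late[1])  # prints "1.0 0.5"
```

So simply concatenating the per-sample pseudotimes would place biologically equivalent cells at different positions unless the samples are aligned in some way first.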
I found one paper discussing a method for dealing with exactly this problem, but I’m not sure how well it works: https://www.biorxiv.org/content/10.1101/2021.03.09.433671v1
If you have any thoughts or suggestions on how to approach this, let me know!