CellCharter niche calling

Hey,

Thank you for the CellCharter tool, @marcovarrone! We find it really useful for identifying niches. After reading the paper and running it on our internal Xenium data, we find that an extremely small group of cells can form its own cluster in the middle of a huge sheet of cells of the same cell type. For example, macrophages and T cells that are spatially very close (i.e., both in the middle of a tumor sheet) end up in different clusters. Our understanding is that this is due to the way neighbor information is aggregated in CellCharter. In particular, the first entry in the aggregated neighborhood space is just the latent representation of the source cell. So, if we use n_layers=3, we end up with a neighborhood space in which 25% of the space is taken up by the source cell’s own representation. Do you think this problem can be fixed by changing how the source cell’s representation is encoded?

Thanks.

Hi @abs51295, I am happy to see that you are finding CellCharter useful.
It’s not easy to answer without looking at the data.
Your intuition is correct: if your tissue is very dense, n_layers=3 may not span a large area (especially with single-cell-resolution technologies like Xenium).
In general, a low value of n_layers gives you finer, more scattered niches that are closer to cell types, while higher values give you broader niches with more mixing of different cell types, where the source cell is less important. So you can try increasing n_layers.

In cellcharter.gr.aggregate_neighbors, n_layers can also be a list containing the specific layers you want to take the embedding from. Passing the value 3 is equivalent to passing [0,1,2,3], where 0 means the source cell. So if you pass [1,2,3], it will use the first-, second-, and third-hop neighbors but not the source cell’s embedding.
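
To make the difference between n_layers=3 and n_layers=[1,2,3] concrete, here is a minimal NumPy sketch of per-hop neighbor aggregation. It is only an illustration of the idea, not CellCharter's actual implementation: the helper names (hop_masks, aggregate_by_hop), the mean aggregation, and the toy chain graph are all assumptions made for the example.

```python
import numpy as np

def hop_masks(adj, max_hop):
    """masks[k][i, j] is True when cell j is exactly k hops away from cell i."""
    n = adj.shape[0]
    adj_int = adj.astype(int)
    reached = np.eye(n, dtype=bool)    # cells already assigned to some hop
    frontier = np.eye(n, dtype=bool)   # hop 0 is the source cell itself
    masks = [frontier]
    for _ in range(max_hop):
        # Neighbors of the current frontier that were not reached at an earlier hop.
        nxt = ((frontier.astype(int) @ adj_int) > 0) & ~reached
        reached = reached | nxt
        masks.append(nxt)
        frontier = nxt
    return masks

def aggregate_by_hop(X, adj, n_layers):
    """Concatenate the mean embedding of each requested hop (hypothetical helper).

    An int n means hops [0, 1, ..., n]; a list selects specific hops.
    """
    hops = list(range(n_layers + 1)) if isinstance(n_layers, int) else list(n_layers)
    masks = hop_masks(adj, max(hops))
    blocks = []
    for k in hops:
        mask = masks[k].astype(float)
        counts = mask.sum(axis=1, keepdims=True)
        # Mean over the k-hop neighbors; rows with no k-hop neighbors stay zero.
        blocks.append(mask @ X / np.maximum(counts, 1))
    return np.hstack(blocks)

# Toy data: 5 cells connected in a chain (0-1-2-3-4), 2 latent features per cell.
n_cells = 5
adj = np.zeros((n_cells, n_cells), dtype=int)
for i in range(n_cells - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1
X = np.arange(10, dtype=float).reshape(n_cells, 2)

with_source = aggregate_by_hop(X, adj, 3)             # hops [0, 1, 2, 3]
without_source = aggregate_by_hop(X, adj, [1, 2, 3])  # source cell excluded

print(with_source.shape, without_source.shape)
```

With the source cell included, the first block of columns is the cell's own embedding, i.e. one of four blocks (25%) of the feature space for n_layers=3. Passing [1,2,3] drops that block and keeps only the neighborhood.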

Hey @marcovarrone thank you for your prompt response. I understand that increasing n_layers would be better. Do you know of any way to choose n_layers for a given dataset?

Thanks

Sorry @abs51295, for some reason I missed the notification.
I tried to make CellCharter depend on as few parameters as possible so that it would work well with default values in basically every situation (see dimensionality reduction, number of clusters, etc.), but n_layers inherently depends on the scope of your analysis.
In some projects I wanted to capture thin structures, or the signal from gene or protein expression was not very strong and so was already somewhat diluted, so I chose a low n_layers of 2-3. In other projects I wanted a more high-level overview of the main architecture, so I chose a higher n_layers of 3-5. You'll notice that I put 3 in both the low and the high range because, as I said, it depends a lot on the data :slight_smile: .
I have also noticed that in some cases, setting higher values doesn’t change the result much (unless you go to very different values like 10 or 20).