Passing cell neighbor indices to my encoder

Hello,

I’m trying to pass the indices of each cell’s neighbors to my encoder. The neighbor indices are stored in adata.obs['neighbors'] as plain lists of integers, for example [0, 4, 34, 56, 99]. I’m trying to pass them by modifying the setup_anndata function to register a new data field. Is this the right approach?

I tried to use NumericalObsField for the neighbor indices, but it always tries to convert my lists to floats and raises an error. Could you please give me some guidance on how to fix this (for example, by specifying a list type)? And how should I add more inputs in general?

Thanks in advance!

Jingtao

If I understand correctly what you’re trying to do, you might consider using an obsm field instead of .obs.
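To make the obsm suggestion concrete: obsm holds one fixed-width array row per cell, so ragged neighbor lists need to be padded first. Here is a minimal sketch of that step using NumPy only. The helper name pad_neighbors, the padding value -1, and the obsm key "neighbors" are assumptions for illustration, not part of scvi-tools.

```python
import numpy as np


def pad_neighbors(neighbor_lists, pad_value=-1):
    """Pack ragged neighbor lists into a fixed-width integer array.

    Rows shorter than the longest list are filled with pad_value.
    """
    max_len = max((len(n) for n in neighbor_lists), default=0)
    padded = np.full((len(neighbor_lists), max_len), pad_value, dtype=np.int64)
    for row, neighbors in enumerate(neighbor_lists):
        padded[row, : len(neighbors)] = neighbors
    return padded


# Toy input: four cells with 2, 1, 3 and 0 neighbors.
neighbor_lists = [[1, 2], [0], [0, 1, 3], []]
padded = pad_neighbors(neighbor_lists)

# With an AnnData object, the array could then be stored per cell, e.g.:
# adata.obsm["neighbors"] = padded  # "neighbors" is a hypothetical key
```

The padded array keeps an integer dtype here, so whichever field you register it with should be checked for whether it casts to float on loading.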

cc @cane11 @Justin_Hong


I’ve actually optimized some of the code from NCEM for doing this kind of thing recently, see: Speed up batch generation for `EstimatorNeighborHood` by ivirshup · Pull Request #96 · theislab/ncem · GitHub

In the NCEM case, adjacency information is converted to padded dense tensors before being passed to the GPU.
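As a rough illustration of how a padded index tensor can be consumed, here is a sketch that gathers neighbor features and averages them with a mask. This is not NCEM’s code; it uses NumPy rather than a GPU framework, and the function name mean_neighbor_features and the -1 padding convention are assumptions.

```python
import numpy as np


def mean_neighbor_features(features, padded_idx, pad_value=-1):
    """Average the feature vectors of each cell's neighbors.

    features:   (n_cells, n_features) array
    padded_idx: (n_cells, max_neighbors) integer array, padded with pad_value
    """
    mask = padded_idx != pad_value              # True where a real neighbor exists
    safe_idx = np.where(mask, padded_idx, 0)    # point padding at a valid row
    gathered = features[safe_idx]               # (n_cells, max_neighbors, n_features)
    gathered = gathered * mask[..., None]       # zero out the padded slots
    counts = np.maximum(mask.sum(axis=1, keepdims=True), 1)  # avoid division by zero
    return gathered.sum(axis=1) / counts


features = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]], dtype=np.float32)
padded_idx = np.array([[1, 2], [0, -1], [-1, -1]])
neighbor_mean = mean_neighbor_features(features, padded_idx)
```

Cells without any neighbors get a zero vector in this sketch; whether that is the right default depends on the model.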

As a side note, I would recommend storing adjacency info as an adjacency matrix inside obsp.
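For reference, a minimal sketch of turning per-cell neighbor lists into a sparse adjacency matrix, assuming SciPy is available. The helper name neighbors_to_adjacency and the obsp key "spatial_neighbors" are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix


def neighbors_to_adjacency(neighbor_lists):
    """Build an (n_cells, n_cells) CSR matrix with a 1 at [i, j]
    whenever j appears in the neighbor list of cell i."""
    n_cells = len(neighbor_lists)
    lengths = [len(n) for n in neighbor_lists]
    rows = np.repeat(np.arange(n_cells), lengths)
    cols = np.fromiter(
        (j for n in neighbor_lists for j in n), dtype=np.int64, count=sum(lengths)
    )
    data = np.ones(len(cols), dtype=np.float32)
    # Note: duplicate (i, j) entries would be summed by this constructor.
    return csr_matrix((data, (rows, cols)), shape=(n_cells, n_cells))


neighbor_lists = [[1, 2], [0], [0, 1, 3], []]
adjacency = neighbors_to_adjacency(neighbor_lists)

# With an AnnData object, this could be stored as:
# adata.obsp["spatial_neighbors"] = adjacency  # hypothetical key

# The neighbors of a cell can be read back from its row.
neighbors_of_2 = sorted(adjacency[2].indices.tolist())
```

Unlike a padded array, the sparse matrix handles cells with very different neighbor counts without wasted space.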

I’ll ping the authors who may be able to give more info.

If you are interested in stochastic access patterns into graphs for single-cell questions, you could also check out the generator functions we use for map-style data loaders in tf keras. Conceptually they generalise across frameworks, though I’m not sure what’s easiest for you to implement here. They are implemented in the estimator classes in NCEM: ncem/ncem/estimators at main · theislab/ncem · GitHub.

Thank you so much for all the help! obsm works fine! I’ve finished the implementation, but converting data types between float and double caused some problems, which shouldn’t be hard to fix.

My current implementation is slow, especially for large datasets. So I will also check out NCEM later. Really appreciate it!
