Is anndata limited to 2D data?


Noob question, but is anndata limited to 2D data? The docs say “The group MAY contain an entry X, which MUST be either a dense or sparse array and whose shape MUST be (n_obs, n_var)”, which is 2D. Although X itself is only a MAY, I assume the whole ecosystem is designed around 2D rather than ND.

The reason I’m asking is that I’m exploring options for a hyperspectral image format where the data is X by Y by ‘chemistry’. The X and Y dimensions are physical grid points on a solid sample (pixels), and the chemistry dimension is an entire spectrum, say mass spec or IR/Raman. In the IR case, X and Y are each around 2000 and the chemistry (IR) dimension is around 700. For the MS images, X and Y are each around 512 and the chemistry dimension can be over 300,000. As an additional curve ball, we can also do depth analysis, which gives 3 physical dimensions (X by Y by Z) plus a chemistry dimension, so 4D.

I’m chunking and compressing these files in HDF5, and that seems to work OK because the typical values compress easily. I’m just looking for a metadata ‘standard’ to drape over the top.
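For illustration, a minimal sketch of this kind of chunked, compressed storage using h5py. The dataset name, toy sizes, chunk shape, gzip level and the `dim_order` attribute are all assumptions made for the example, not the actual setup described above.

```python
import os
import tempfile

import h5py
import numpy as np

# Toy cube: (x, y, chemistry). Real data would be far larger.
nx, ny, n_chem = 64, 64, 100
cube = np.zeros((nx, ny, n_chem), dtype=np.float32)
cube[10:20, 10:20, 30:40] = 1.0  # mostly zeros, so it compresses well

path = os.path.join(tempfile.mkdtemp(), "cube.h5")

with h5py.File(path, "w") as f:
    # Each chunk holds a 16 x 16 pixel tile with full spectra (assumed layout).
    dset = f.create_dataset(
        "intensity",  # hypothetical dataset name
        data=cube,
        chunks=(16, 16, n_chem),
        compression="gzip",
        compression_opts=4,
    )
    # Ad hoc metadata; this is the part a shared standard would replace.
    dset.attrs["dim_order"] = "x,y,chemistry"

with h5py.File(path, "r") as f:
    # Reading one pixel's spectrum only touches the chunk that contains it.
    spectrum = f["intensity"][12, 12, :]
    stored_chunks = f["intensity"].chunks
```

The chunk shape decides which access pattern is cheap: tiles with full spectra favour per-pixel reads, whereas chunks that are thin along the chemistry axis would favour single-channel images.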



I’m relatively new to scverse myself, so hopefully this isn’t false info. I believe anndata is not limited to 2D data. The n_obs by n_var matrix is 2-dimensional, but each element can be multi-dimensional. That being said, my guess is that the best way to do this is via “layers” (AnnData.layers), based on the visualization of the anndata structure (AnnData schema). I haven’t tried using layers or higher-dimensional data, so just chalk it up as something to try, but since you’re already working with HDF5, I’m guessing that using AnnData layers is semi-straightforward.

Edit: on second thought, I’m going to walk back the claim that layers are the “best way”. It just seems like one way, and I haven’t looked at how it’s implemented. Depending on how you’re using the chemistry dimension, it may be enough to store it in a single layer. I guess a necessary follow-up question is: how are you using the chemistry dimension? Are you using specific functions whose performance you want to maximize (general anndata functions), or are you copying the data elsewhere before analyzing it (for external functions that expect a different layout)?
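If a 2D layout turns out to be acceptable for your use, one option is to treat every grid point as an observation and every spectral channel as a variable. Here is a minimal NumPy-only sketch of that round trip for the 4D case, with toy sizes and variable names that are purely illustrative:

```python
import numpy as np

# Toy 4D cube: (x, y, z, chemistry).
nx, ny, nz, n_chem = 4, 3, 2, 6
cube = np.arange(nx * ny * nz * n_chem, dtype=np.float32).reshape(
    nx, ny, nz, n_chem
)

# Flatten the physical axes: one row per grid point, one column per channel.
matrix = cube.reshape(-1, n_chem)  # shape (n_obs, n_var) = (24, 6)

# Grid coordinates of every row, in the same C order used by reshape.
coords = np.column_stack(
    np.unravel_index(np.arange(matrix.shape[0]), (nx, ny, nz))
)

# Any row can be traced back to its grid point.
row = 17
x, y, z = coords[row]
assert np.array_equal(matrix[row], cube[x, y, z])

# The original cube is recovered by reshaping back.
restored = matrix.reshape(nx, ny, nz, n_chem)
assert np.array_equal(restored, cube)
```

The resulting `matrix` and `coords` could then be handed to something that expects (n_obs, n_var) data plus per-observation coordinates. Whether that beats keeping the native ND layout depends on the access patterns asked about above.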