I am interested in analyzing some MERFISH data from scratch. In particular, I will have several stacks from the same tissue, which will hopefully be used to reconstruct a 3D-like structure of the tissue itself from the transcriptomic point of view (but that’s not the point of this post).
I would like to analyze my data with Squidpy, but on top of that, I need to set up a pipeline that accepts as an input raw images coming from the microscope.
Even before cell segmentation (for which we have several choices in terms of tools), I was wondering whether the SpatialData framework is also designed for:
- Registration of images of the same FOV in different rounds of hybridization;
- Decoding of the images by using the codebook.
My question comes because in the SpatialData Satellite project, there is this part:
P0. Raw data IO: we are implementing readers for raw data of common spatial omics technologies @ spatialdata-io.
This made me think that the analysis I mentioned above is also part of the framework. However, if I go into the spatialdata-sandbox and look for the script that converts the common MERFISH format into the SpatialData storage format (spatialdata-sandbox/merfish at main · giovp/spatialdata-sandbox · GitHub), it seems that processed data are also required, meaning that the preprocessing steps (registration/alignment and decoding) need to be performed somewhere else.
Could you please clarify at what stage my analysis can enter the SpatialData framework? I am sure there is something I am misunderstanding.
Thank you very much!
Hi Sergio, thanks for reaching out.
First, before answering: this may be relevant if you will be using the commercial Vizgen MERSCOPE platform to generate your MERFISH data in the future. Quentin Blampey wrote a reader that will soon be part of spatialdata-io. You can see the open PR, which still needs some adjustments, here.
Now back to your question. If you have an external tool that performs image registration, and this registration can be represented as an affine transformation, then with SpatialData you can:
- store each round of hybridization as a separate (1, y_i, x_i) image together with its affine transformation;
- call napari, without having to transform the data, to check that things are correctly aligned; otherwise you can still adjust each individual registration (for instance with a landmark-based approach within napari; we show how to do it in a notebook);
- for each image, call rasterize() to apply the transformation and obtain a new image within a target bounding box and with a desired resolution;
- stack the new images (which are now all aligned and have the same resolution) with dask.array APIs and store the result as a single (c, y, x) image. You can put this image in a new SpatialData object, and when you call write() it will be saved to Zarr.
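As a minimal sketch of the stacking step above (assuming the per-round images have already been registered and rasterized to a common grid; the shapes, chunking, and round count here are made up, and the random arrays stand in for the real (1, y, x) outputs of rasterize()):

```python
import numpy as np
import dask.array as da

# Stand-ins for four aligned, single-channel rounds of hybridization;
# in practice each would be the (1, y, x) output of rasterize()
aligned_rounds = [
    da.from_array(np.random.rand(1, 512, 512), chunks=(1, 256, 256))
    for _ in range(4)
]

# Concatenate along the channel axis to obtain a single (c, y, x) image
stacked = da.concatenate(aligned_rounds, axis=0)
print(stacked.shape)  # (4, 512, 512)
```

The resulting dask array can then be wrapped in a SpatialData image model and saved to Zarr via a new SpatialData object's write().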
What SpatialData can’t handle is the following:
- finding the registration itself (it needs to be computed with external tools or with a landmark-based approach in napari);
- non-linear/non-affine transformations are not supported;
- we currently can’t have a transformation that maps a (1, y_i, x_i) image to a specific channel (say, channel 3).
Regarding the last point, Wouter-Michiel Vierdag is also interested in this problem (in the context of ISS data), and we would like at some point to allow transformations to map to specific channels, so that the dask.array stack() logic can be implemented under the hood behind a single call to rasterize().
Anyway, for the moment the approach I outlined above can be used. Please feel free to reach out to me with any questions, either here or via Zulip. Also, if you have a small public dataset (or a subset of one) that could be used to demonstrate this approach, I am happy to help with the code.
Thank you very much for your reply and all the info.
I will try to set up a pipeline for these first steps of the image analysis, and I will come back to you and the community for feedback and/or advice.