SpatialData workflow

Hello,

I’m a biostatistician working on a project with Visium HD data, and I’d like to use SpatialData objects as the backbone of the segmentation process. I’m used to working in R, and my supervisor will probably prefer the statistical analysis to be done in Bioconductor. But I noticed that there are a lot of nice Python libraries to work with, and I really like that spatialdata can manage all the elements of the experiment more flexibly than the standard SummarizedExperiment of BioC.

What I don’t understand, though, is why the most basic things are so difficult. For example, I’m trying to filter the bins element and the associated table of transcripts to keep only the bins inside the tissue, and then annotate the bins so I know which tissue they belong to (we have 3 different tissue samples on one slide). So I have my sd object (read with spatialdata_io and rewritten as zarr):

SpatialData object, with associated Zarr store: /.../spe_blocco1_mod001.zarr
├── Images
│ ├── ‘blocco1_cytassist_image’: DataArray[cyx] (3, 3000, 3200)
│ ├── ‘blocco1_full_image’: DataTree[cyx] (3, 22718, 16166), (3, 11359, 8083), (3, 5679, 4041), (3, 2839, 2020), (3, 1419, 1010)
│ ├── ‘blocco1_hires_image’: DataArray[cyx] (3, 6000, 4270)
│ └── ‘blocco1_lowres_image’: DataArray[cyx] (3, 600, 427)
├── Shapes
│ ├── ‘blocco1_intissue’: GeoDataFrame shape: (3, 5) (2D shapes)
│ ├── ‘blocco1_square_002um’: GeoDataFrame shape: (9233739, 1) (2D shapes)
│ └── ‘intissue_002um’: GeoDataFrame shape: (3548786, 5) (2D shapes)
└── Tables
  ├── ‘intissue’: AnnData (3548786, 1)
  └── ‘square_002um’: AnnData (9233739, 32285)
with coordinate systems:
▸ ‘blocco1’, with elements:
blocco1_cytassist_image (Images), blocco1_full_image (Images), blocco1_hires_image (Images), blocco1_lowres_image (Images), blocco1_square_002um (Shapes)
▸ ‘blocco1_downscaled_hires’, with elements:
blocco1_hires_image (Images), blocco1_square_002um (Shapes)
▸ ‘blocco1_downscaled_lowres’, with elements:
blocco1_lowres_image (Images), blocco1_square_002um (Shapes)
▸ ‘global’, with elements:
blocco1_intissue (Shapes), intissue_002um (Shapes)

As you can see, I have already filtered blocco1_square_002um (the bins object) with blocco1_intissue (3 multipolygon objects for the 3 tissue samples on one slide), but when I try to filter the square_002um table with the shapes of intissue_002um, I get the intissue AnnData with 1 column.

I tried different things. I started by trying to filter the tables directly with the blocco1_intissue shapes, but I got nowhere because, as the error said, the shapes weren’t annotated by the table (square_002um). So I filtered the bins shapes with the intissue shapes and then used the filtered bins to filter the table. It didn’t work, as you can see from the resulting “intissue” table.

I tried to use polygon_query, but apparently I need to give it one polygon at a time, and I don’t know how to manage the resulting objects in a loop.

I looked for examples/tutorials/notebooks, but there don’t seem to be any, apart from some that are either too minimal or probably too advanced for me. I also thought that filtering spatial objects with GeoJSON and the like would be a pretty standard thing to do, so there must be functions for it. Maybe not?

If I don’t find a way to work with spatialdata, I’ll do all the algebra in R and then read the object into Python when it’s ready for the segmentation, but I don’t like this solution. Also, the scverse seems nice and I’d like to work with it more.

Thanks in advance and sorry for the wall of text,

Valerio

Hi Valerio, thanks for reaching out.

We will consider your feedback, and in the coming months we will try to improve the documentation and usability of workflows related to filtering and querying. For example, we are working on more ergonomic APIs like this one: Added `filter_table_by_query` by srivarra · Pull Request #894 · scverse/spatialdata · GitHub, and we can use it to refresh the notebooks.

Regarding concrete answers to specific points of your question, due to very limited time we prioritize questions containing code that we can copy-paste and run. Please consider splitting your questions into separate GitHub issues and attaching code to reproduce your problem + description of the expected behaviors, using the blobs dataset (see docs).

The code would look something like this:
import spatialdata as sd
from spatialdata.datasets import blobs

sdata = blobs()
query_polygons = ...
# ...

sd.polygon_query(...)  # error here

An example of such a copy-pasteable snippet is here: render_shapes() fails with datashader when the coordinate system is not `global` · Issue #447 · scverse/spatialdata-plot · GitHub.

Best,
Luca

@revalescente

Please also check out this notebook: better napari rois by LucaMarconato · Pull Request #148 · scverse/spatialdata-notebooks · GitHub
I have updated it by adding an example on how to query for multiple distinct polygons in a loop. You could also query directly for a single multipolygon (faster).

Thank you very much @LucaMarconato

I tried to delete the post because I figured out how to solve my issues. I filtered the shapes with geopandas and understood better how I should work with sd. Anyway I really appreciate your answers and I’ll try the new query function.
