Dealing with multiple samples

Hi spatialdata folks,

I was wondering what the intended way is to work with images from different samples (e.g. 10 tumor slides from responders and 10 from non-responders) when the goal is to compare features between groups.

Would I still use one SpatialData object per sample and aggregate features elsewhere, or is it possible/recommended to store all images in a single object?

Best,
Gregor

Hi Gregor,

From a performance point of view the two are equivalent, as each image is saved in a different Zarr group in both cases. From an ergonomics point of view, having one object allows for a more compact data representation, and I personally prefer it.

There are still some corners to polish when working with one vs. multiple SpatialData objects, but we are trying to address them as soon as possible.

For instance, adding a new image to an existing SpatialData object and saving it is still a bit rough; this will become more intuitive in a refactoring that we are planning to do very soon. Until then, if it feels too unintuitive, one can use multiple objects for that.

Finally, any feedback about this will be greatly appreciated.


Hi Luca,

Has there been any progress or change on this question since your last answer?

We are trying, for the Visium nf-core pipeline (nf-core/spatialvi issue #56 on GitHub, "Aggregate samples into single object"), to aggregate multiple SpatialData objects (from spatialdata_io.visium) into one object, and optionally also to run batch correction on the samples.
As of today, what would be the recommended way to:

  1. Merge samples into one sdata? I tried:
```python
import spatialdata

# sdatas: a list of SpatialData objects, one per sample (e.g. from spatialdata_io.visium)
output_sdata = spatialdata.concatenate(
    sdatas,
    region_key=None,
    instance_key=None,
    concatenate_tables=False,
    obs_names_make_unique=True,
    modify_tables_inplace=False,
)
```

but I am a bit confused about what should be merged and what should not.
  2. Do batch correction (using scanorama, harmonypy or similar) on the concatenated sdata?

Should we run step 1 without the concatenate_tables flag and then pass all tables to scanorama or harmonypy? But then how can we push the corrected data back into the sdata object?

I would greatly appreciate any input on this or material if it is available somewhere. Thanks!

Christophe

Hi Christophe,

I think there has been some progress on the concatenate function:

For batch correction, I would use a single table/AnnData object with a batch_key column. I'd typically pass this to the cell2location model for batch correction, but the same setup also works with scVI or scanpy.external.pp.scanorama_integrate.
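To make the single-table idea concrete, here is a toy sketch in plain Python (no anndata or scanpy), assuming each sample's table is a dict mapping observation names to feature vectors. It concatenates the tables, makes observation names unique by suffixing the sample id (similar in spirit to `obs_names_make_unique=True`), records a batch column, and then applies per-batch mean-centering as a deliberately naive stand-in for scanorama/harmony. The helpers `concat_tables` and `center_per_batch` are hypothetical. The point is that the corrected values stay row-aligned with the merged table, so they can be stored back on that one table without touching the spatial elements.

```python
def concat_tables(tables):
    """Merge per-sample tables into one, keeping a batch label per row.

    `tables` maps sample id -> {obs_name: [feature values]}. Observation
    names are suffixed with the sample id so that barcodes shared between
    slides (Visium reuses the same spot barcodes) do not collide.
    """
    obs_names, batch, X = [], [], []
    for sample_id, table in tables.items():
        for obs_name, values in table.items():
            obs_names.append(f"{obs_name}-{sample_id}")
            batch.append(sample_id)
            X.append(list(values))
    return {"obs_names": obs_names, "batch": batch, "X": X}


def center_per_batch(merged):
    """Naive placeholder for batch correction: subtract each batch's mean.

    Real methods (scanorama, harmony, scVI) are far more sophisticated; this
    only shows that the output is row-aligned with the merged table.
    """
    n_features = len(merged["X"][0])
    sums, counts = {}, {}
    for b, row in zip(merged["batch"], merged["X"]):
        acc = sums.setdefault(b, [0.0] * n_features)
        for j, v in enumerate(row):
            acc[j] += v
        counts[b] = counts.get(b, 0) + 1
    means = {b: [s / counts[b] for s in acc] for b, acc in sums.items()}
    return [
        [v - m for v, m in zip(row, means[b])]
        for b, row in zip(merged["batch"], merged["X"])
    ]


tables = {
    "sampleA": {"AAAC-1": [1.0, 2.0], "AAAG-1": [3.0, 4.0]},
    "sampleB": {"AAAC-1": [10.0, 20.0], "TTTC-1": [12.0, 22.0]},
}
merged = concat_tables(tables)
# Keep the corrected matrix next to the raw one, aligned row by row.
merged["X_corrected"] = center_per_batch(merged)
```

With real data, the analogous steps would be something like `anndata.concat(dict_of_tables, label="batch", index_unique="-")` followed by `scanpy.external.pp.scanorama_integrate(adata, key="batch")`, which by default reads `adata.obsm["X_pca"]` and writes `adata.obsm["X_scanorama"]`; scanorama expects cells of the same batch to be stored contiguously, which a concatenation like this gives you.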

Maybe Luca has more input.

Thanks @grst for the answer. I would indeed use a single table. Basically, you create a SpatialData object with all the elements "unmerged", but you merge the table. This is because it's handier to keep spatial locations separate by sample but to handle the table in one go.
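A toy model of that layout, using plain dicts rather than the spatialdata API: spatial elements stay separate per sample, while one merged table carries, for every row, the element it annotates (`region`) and the geometry within that element (`instance_id`). In spatialdata this link roughly corresponds to the table's `region`, `region_key` and `instance_key` annotation (set, for example, via `TableModel.parse`); the element names and helpers below are hypothetical.

```python
# Spatial elements kept separate per sample (only their size is modeled here).
elements = {
    "sampleA_spots": {"n_instances": 3},
    "sampleB_spots": {"n_instances": 2},
}

# One merged table: each row says which element and which instance it annotates.
table = [
    {"region": "sampleA_spots", "instance_id": i} for i in range(3)
] + [
    {"region": "sampleB_spots", "instance_id": i} for i in range(2)
]


def validate(table, elements):
    """Every row must point to an existing element and a valid instance."""
    for row in table:
        element = elements[row["region"]]  # KeyError if the region is missing
        if not 0 <= row["instance_id"] < element["n_instances"]:
            raise ValueError(f"invalid instance in {row}")


def rows_for_region(table, region):
    """Per-sample view of the merged table, e.g. for plotting one slide."""
    return [row for row in table if row["region"] == region]


validate(table, elements)
sample_b = rows_for_region(table, "sampleB_spots")
```

Analyses across samples (batch correction, group comparisons) operate on `table` as a whole, while anything spatial filters by `region` first.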

Regarding adding a new image to an existing SpatialData object and saving it: this has been addressed as well, by providing the SpatialData.write_element() method.
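Why incremental saving can be cheap follows from the earlier point that every element lives in its own Zarr group. The sketch below is a toy model of that idea using plain directories and JSON files, not Zarr and not the spatialdata implementation; `toy_write_element` is a hypothetical helper. In spatialdata itself the pattern is, roughly, to add the element to an object already backed by a Zarr store (`sdata["sampleB_image"] = image`) and then call `sdata.write_element("sampleB_image")`.

```python
import json
import tempfile
from pathlib import Path


def toy_write_element(store: Path, kind: str, name: str, payload: dict,
                      overwrite: bool = False) -> Path:
    """Write one element into its own sub-directory ("group") of a store.

    Only <store>/<kind>/<name>/ is created or replaced; sibling elements are
    left untouched, so adding one sample does not rewrite the others.
    """
    group = store / kind / name
    if group.exists() and not overwrite:
        raise FileExistsError(f"{kind}/{name} already exists in {store}")
    group.mkdir(parents=True, exist_ok=True)
    (group / "data.json").write_text(json.dumps(payload))
    return group


# A store that already holds one sample...
store = Path(tempfile.mkdtemp()) / "cohort.zarr"
first = toy_write_element(store, "images", "sampleA_image", {"shape": [3, 100, 100]})
mtime_before = (first / "data.json").stat().st_mtime_ns

# ...gets a second sample appended without rewriting the first.
second = toy_write_element(store, "images", "sampleB_image", {"shape": [3, 80, 80]})
mtime_after = (first / "data.json").stat().st_mtime_ns
```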

Finally, nested NGFF stores are still not supported. Once this is addressed, you will also be able to provide custom hierarchies (e.g. cohorts) within the same SpatialData object.

Hi @LucaMarconato,

another follow-up question: when concatenating multiple samples (Visium in my case), would you advise having separate coordinate systems for each sample or having them all in one?

The latter is currently the (implicit) default, because reading Visium samples always results in coordinate systems named downscaled_hires, global and downscaled_lowres, so they are identical across all samples and get merged.

Hi, I would rename the coordinate systems so that they are different. We have an API for that (spatialdata/src/spatialdata/_core/spatialdata.py at commit ae71ae134de2c189a3aa425ceabe82cf1937e701 in scverse/spatialdata on GitHub), but you are right, it is not called by concatenate(). Maybe we could add an extra parameter to concatenate() that renames the coordinate systems; would you like to try a PR for that?
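The proposed extra parameter can be sketched with a toy model in plain Python, where each sample is a dict `{element_name: {coordinate_system: transformation}}` and `toy_concatenate` is a hypothetical stand-in for `concatenate()`. Without renaming, every sample's `global` and `downscaled_hires` spaces collapse into one shared space; prefixing with the sample id keeps them apart. In spatialdata, the same effect should be achievable today by calling `SpatialData.rename_coordinate_systems()` (which, as far as we know, takes a dict of old to new names) on each object before concatenating.

```python
def prefix_coordinate_systems(sample_id, transformations):
    """Rename every coordinate system of one sample to '<sample_id>_<name>'."""
    return {
        element: {f"{sample_id}_{cs}": t for cs, t in per_cs.items()}
        for element, per_cs in transformations.items()
    }


def toy_concatenate(samples, rename_coordinate_systems=False):
    """Merge per-sample {element: {cs: transformation}} dicts into one.

    With rename_coordinate_systems=True each sample keeps its own spaces;
    otherwise identically named spaces (e.g. 'global') end up shared.
    """
    merged = {}
    for sample_id, transformations in samples.items():
        if rename_coordinate_systems:
            transformations = prefix_coordinate_systems(sample_id, transformations)
        for element, per_cs in transformations.items():
            if element in merged:
                raise KeyError(f"duplicate element name {element!r}")
            merged[element] = per_cs
    return merged


def coordinate_systems(merged):
    """All coordinate-system names present in a merged object."""
    return sorted({cs for per_cs in merged.values() for cs in per_cs})


samples = {
    "s1": {"s1_image": {"global": "identity", "downscaled_hires": "scale(0.1)"}},
    "s2": {"s2_image": {"global": "identity", "downscaled_hires": "scale(0.08)"}},
}
shared = coordinate_systems(toy_concatenate(samples))
separate = coordinate_systems(toy_concatenate(samples, rename_coordinate_systems=True))
```

Here `shared` contains only two spaces for both slides, whereas `separate` contains four, one pair per sample.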

That would be one option. Alternatively, I was wondering if it might be even better to adapt the IO functions so that the coordinate systems are already uniquely named after the input file?

Thanks, that is also a good option. Also, the global space should be named after the dataset id.
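The reader-side alternative amounts to a naming rule applied at read time. A minimal sketch, assuming a hypothetical `coordinate_system_names()` helper that a reader could call: it takes the dataset id when given, falls back to the input directory name otherwise, names the global space after the dataset id as suggested, and prefixes the downscaled spaces. `spatialdata_io.visium()` already accepts a `dataset_id` argument, so reusing it for coordinate-system names is the idea here, not existing behavior.

```python
from pathlib import Path

# Coordinate systems produced for every Visium sample, as described above.
VISIUM_SPACES = ("global", "downscaled_hires", "downscaled_lowres")


def coordinate_system_names(path, dataset_id=None):
    """Hypothetical naming rule making coordinate systems unique per dataset.

    'global' becomes the dataset id itself; the downscaled spaces get the
    dataset id as a prefix. Without a dataset_id, the input directory name
    is used instead.
    """
    dataset_id = dataset_id or Path(path).name
    return {
        cs: dataset_id if cs == "global" else f"{dataset_id}_{cs}"
        for cs in VISIUM_SPACES
    }


names_1 = coordinate_system_names("/data/visium/patient01")
names_2 = coordinate_system_names("/data/visium/patient02")
```

Because the names already differ per sample, a later concatenation would keep each slide in its own spaces without any renaming step.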