I was wondering what is the intended way to work with images from different samples (e.g. 10 tumor slides from responders and 10 non-responders) when the goal is to compare features from groups with each other.
Would I still use 1 SpatialData object per sample and aggregate features elsewhere, or is it possible/recommended to store all images in a single object?
From a performance point of view the two are equivalent, as each image is saved in a different Zarr group in both cases. From an ergonomics point of view, having one object allows for a more compact data representation, and I personally prefer it.
There are still some rough corners when working with one vs. multiple SpatialData objects, but we are trying to address them as soon as possible.
For instance, adding a new image and saving it to an existing SpatialData object is still a bit rough and will become more intuitive in a refactoring that we are planning to do very soon, so if it feels too unintuitive, one can use multiple objects for that.
Finally, any feedback about this will be greatly appreciated.
Is there any progress or changes on this question since your last answer?
We are trying, for the VISIUM nf-core pipeline (Aggregate samples into single object · Issue #56 · nf-core/spatialvi · GitHub) to aggregate multiple spatialdata objects (from spatialdata_io.visium) into one object, and optionally also run batch correction on the samples.
As of today, what would be the recommended way to:
but I am a bit confused about what to merge and what not.
2. How to do batch correction (using scanorama or harmonypy or similar) on the concatenated sdata?
Should we run 1. without the concatenate_tables flag, and then give all tables to scanorama or harmonypy? But then how can we push back the corrected data into the sdata object?
I would greatly appreciate any input on this or material if it is available somewhere. Thanks!
I think there has been some progress on the concatenate function:
For batch correction, I would use a single table/AnnData object with a batch_key column. I’d typically pass this on to the cell2location model for batch correction, but it can also work with scVI, or scanpy.external.pp.scanorama_integrate.
Thanks @grst for the answer. I would indeed use a single table. Basically, you create a SpatialData object with all the spatial elements “unmerged”, but you merge the tables into one. This is because it’s handier to keep spatial locations separate by sample but handle the table in one go.
Finally, nested NGFF stores are still not supported. Once this is addressed, you will also be able to provide custom hierarchies (e.g. cohorts) in the same SpatialData object.