Dealing with multiple samples

grst · May 26, 2023, 8:31am

Hi spatialdata folks,

I was wondering what is the intended way to work with images from different samples (e.g. 10 tumor slides from responders and 10 non-responders) when the goal is to compare features from groups with each other.

Would I still use one SpatialData object per sample and aggregate features elsewhere, or is it possible/recommended to store all images in a single object?

Best,
Gregor

LucaMarconato · May 31, 2023, 9:24pm

Hi Gregor,

From a performance point of view the two are equivalent, since each image is saved in a different Zarr group in both cases. From an ergonomics point of view, having one object allows for a more compact data representation, and I personally prefer it.

There are still some rough corners to polish when working with one vs. multiple SpatialData objects, but we are trying to address them as soon as possible.

For instance adding a new image and saving it to an existing SpatialData object is still a bit rough and will become more intuitive in a refactoring that we are planning to do very soon, so if it feels too unintuitive, one can use multiple objects for that.

Finally, any feedback about this will be greatly appreciated.

cavenel · November 15, 2024, 8:18am

Hi Luca,

Is there any progress or changes on this question since your last answer?

We are trying, for the VISIUM nf-core pipeline (Aggregate samples into single object · Issue #56 · nf-core/spatialvi · GitHub) to aggregate multiple spatialdata objects (from spatialdata_io.visium) into one object, and optionally also run batch correction on the samples.
As of today, what would be the recommended way to:

1. merge samples into one sdata? I tried:

output_sdata = spatialdata.concatenate(
    sdatas,
    region_key=None,
    instance_key=None,
    concatenate_tables=False,
    obs_names_make_unique=True,
    modify_tables_inplace=False,
)

but I am a bit confused about what to merge and what not.
2. How to do batch correction (using scanorama or harmonypy or similar) on the concatenated sdata?

Should we run 1. without the concatenate_tables flag, and then give all tables to scanorama or harmonypy? But then how can we push back the corrected data into the sdata object?

I would greatly appreciate any input on this or material if it is available somewhere. Thanks!

Christophe

grst · November 15, 2024, 8:36am

Hi Christophe,

I think there has been some progress on the concatenate function:

github.com/scverse/spatialdata: Feedback on concatenate()
opened 10:00AM - 08 Apr 24 UTC, closed 01:09PM - 04 Oct 24 UTC, by grst
Labels: enhancement ✨, method: concatenate

While I in the end was able to concatenate the data the way I like, the user experience wasn't as great as I had hoped, so wanted to drop some feedback. As I'm not that familiar with spatialdata yet, it might be that there are already better solutions -- please let me know if there are.

### Starting situation

I have ~20 Visium Cytassist samples from a clinical trial processed with nf-core/spatialtranscriptomics (using the https://github.com/nf-core/spatialtranscriptomics/pull/67 branch that already uses spatialdata). The pipeline generates a single `.zarr` folder for each sample.

### Desired outcome

I would like to have all samples in a single SpatialData object. The AnnData table should contain the gene expression from all samples.

### Pain points

* `sd.concatenate` enforces that the input is a list. Is there a reason this can't accept any `Sequence` type (e.g. `dict_values`)?
* Usually, I pass a dictionary `sample_id -> AnnData` to `anndata.concat`, which nicely makes unique obs_names in combination with `concat(..., index_unique="_")`. This doesn't work with spatialdata.concatenate, which leaves me with either manipulating the `obs_names` for each object before concatenation, or ugly obs names with numeric suffixes (e.g. `AACTCAACCTTGACCA-1_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0`). IMO it would be great to support a dict as input to spatialdata.concatenate, too.
* The per-sample SpatialData objects all have the same names for images, shapes and coordinate systems. I currently rename them like this:

```python
import spatialdata as sd
from tqdm import tqdm

# `samplesheet` (a DataFrame with a "sample" column) and `sample_path`
# (a pathlib.Path) are defined elsewhere.
sdatas_vis = {}
for _, row in tqdm(samplesheet.iterrows(), total=samplesheet.shape[0]):
    sample = row["sample"]
    tmp_sd = sd.read_zarr(sample_path / sample / "data" / "sdata_processed.zarr")
    # copy the samplesheet metadata into obs and point the table at the renamed shapes
    tmp_sd.tables["table"].obs = tmp_sd.tables["table"].obs.assign(**row)
    tmp_sd.tables["table"].obs["region"] = sample
    tmp_sd.tables["table"].uns["spatialdata_attrs"]["region"] = sample
    # rename images
    tmp_sd.images[f"{sample}_hires"] = tmp_sd.images["visium_hires_image"]
    tmp_sd.images[f"{sample}_lowres"] = tmp_sd.images["visium_lowres_image"]
    del tmp_sd.images["visium_hires_image"]
    del tmp_sd.images["visium_lowres_image"]
    # rename shapes
    tmp_sd.shapes[f"{sample}"] = tmp_sd.shapes["visium"]
    del tmp_sd.shapes["visium"]
    sdatas_vis[sample] = tmp_sd
```

which seems a bit cumbersome. I'm wondering if there's a better solution or what's the intended way of handling such cases. It could also be worth adding a process to the nf-core/spatialtranscriptomics pipeline that already does the concatenation step.
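The repeated assign-then-delete pattern in the issue can be factored into a small helper. This is only a sketch: `rename_keys` is a hypothetical function, not part of spatialdata, and it assumes the element containers (`images`, `shapes`) behave like mutable mappings, as they do in the snippet from the issue. A plain dict stands in for `tmp_sd.images` so the example runs without data.

```python
from collections.abc import MutableMapping


def rename_keys(container: MutableMapping, mapping: dict) -> None:
    """Rename keys of a dict-like container in place (hypothetical helper)."""
    for old, new in mapping.items():
        if old == new:
            continue  # nothing to do
        if new in container:
            raise KeyError(f"target name already exists: {new!r}")
        # same assign-then-delete pattern as in the issue
        container[new] = container[old]
        del container[old]


# Stand-in for tmp_sd.images; the values would be image elements in practice.
images = {"visium_hires_image": "hires", "visium_lowres_image": "lowres"}
sample = "S1"  # hypothetical sample id
rename_keys(
    images,
    {
        "visium_hires_image": f"{sample}_hires",
        "visium_lowres_image": f"{sample}_lowres",
    },
)
print(images)
```

The explicit check for an existing target name is a design choice to avoid silently overwriting an element of another sample.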

For batch correction, I would use a single table/AnnData object with a batch_key column. I’d typically pass this on to the cell2location model for batch correction, but it can also work with scVI, or scanpy.external.pp.scanorama_integrate.
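As an illustration of the single-table approach, here is a minimal sketch using Harmony through scanpy. It assumes that the merged table is stored under the name "table", that its `obs` has a batch column called "sample" (both names are assumptions), that the data are already normalized and log-transformed, and that scanpy and harmonypy are installed. `integrate_table` is a hypothetical helper, not part of spatialdata.

```python
def integrate_table(sdata, table_name="table", batch_key="sample"):
    """Run PCA + Harmony on one table of a SpatialData object, in place."""
    # Imported lazily so the heavy dependency is only needed when called.
    import scanpy as sc

    adata = sdata.tables[table_name]  # the AnnData holding all samples
    sc.pp.pca(adata)  # stores the embedding in adata.obsm["X_pca"]
    # Harmony adjusts the PCA embedding per batch; by default scanpy stores
    # the result in adata.obsm["X_pca_harmony"].
    sc.external.pp.harmony_integrate(adata, key=batch_key)
    return adata
```

Because the table is modified in place, the corrected embedding already lives inside the SpatialData object afterwards, so no separate step is needed to push it back.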

Maybe Luca has more input.

LucaMarconato · November 18, 2024, 2:35pm

Thanks @grst for the answer. I would indeed use a single table. Basically you create a SpatialData object with all the elements “unmerged” but you merge the table. This is because it’s handier to keep spatial locations separate by sample but handle the table in one go.
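In code, this layout could look like the sketch below. It assumes a spatialdata version in which `concatenate()` accepts a dict mapping sample ids to SpatialData objects (please check the documentation of your installed version), and that the per-sample tables are annotated consistently. `merge_samples` is a hypothetical wrapper, not part of the library.

```python
def merge_samples(sdatas_by_sample):
    """Concatenate per-sample SpatialData objects, merging only the tables.

    `sdatas_by_sample` maps a sample id to its SpatialData object.
    """
    # Imported lazily so the dependency is only needed when called.
    import spatialdata as sd

    return sd.concatenate(
        sdatas_by_sample,
        concatenate_tables=True,     # one table across all samples
        obs_names_make_unique=True,  # avoid clashing barcodes between samples
    )
```

Images and shapes stay separate per sample, while the returned object carries one table. It is worth inspecting the region annotations of the merged table afterwards to confirm that each row still points at the right element.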

"For instance adding a new image and saving it to an existing SpatialData object is still a bit rough and will become more intuitive in a refactoring that we are planning to do very soon, so if it feels too unintuitive, one can use multiple objects for that."

This has been addressed as well, by providing the SpatialData.write_element() method.
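A minimal sketch of that workflow, assuming `sdata` is backed by a Zarr store (it was read from or written to disk) and `image` is a valid image element, for example one produced by `Image2DModel.parse()`. `add_image_and_save` is a hypothetical helper name.

```python
def add_image_and_save(sdata, name, image):
    """Attach a new image to a Zarr-backed SpatialData object and persist it."""
    sdata.images[name] = image  # add the element in memory
    sdata.write_element(name)   # write only this element to the existing store
```

Only the named element is written, so the rest of the store is left as it is.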

Finally, nested NGFF stores are still not supported. Once this is addressed, you will also be able to provide custom hierarchies (e.g. cohorts) in the same SpatialData object.

grst · January 20, 2025, 2:52pm

Hi @LucaMarconato,

another follow-up question: When concatenating multiple samples (visium in my case), would you advise to have separate coordinate systems for each sample or have them all in one?

The latter is currently the (implicit) default, because reading Visium samples always results in coordinate systems named downscaled_hires, global and downscaled_lowres, so they are the same between all images and will be merged.

LucaMarconato · January 20, 2025, 3:19pm

Hi, I would rename the coordinate systems to be different. We have an API for that: spatialdata/src/spatialdata/_core/spatialdata.py at ae71ae134de2c189a3aa425ceabe82cf1937e701 · scverse/spatialdata · GitHub, but you are right, it is not called by concatenate(). Maybe we could have an extra parameter in concatenate() that renames the coordinate systems; would you like to try a PR for that?
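A sketch of how the renaming could be done per sample before concatenating, assuming the linked API is `SpatialData.rename_coordinate_systems()` taking an old-to-new name mapping. `prefix_coordinate_systems` is a hypothetical helper and the `{sample}_{name}` scheme is just one possible convention.

```python
def prefix_coordinate_systems(sdata, sample):
    """Prefix every coordinate system of `sdata` with the sample id, in place."""
    # e.g. {"global": "S1_global", "downscaled_hires": "S1_downscaled_hires"}
    rename = {name: f"{sample}_{name}" for name in sdata.coordinate_systems}
    sdata.rename_coordinate_systems(rename)
    return rename
```

Calling this on each per-sample object before `concatenate()` keeps the coordinate systems of different samples from being merged under the same name.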

grst · January 20, 2025, 3:20pm

That would be one option. Alternatively, I was wondering if it might be even better to adapt the IO functions so that the coordinate systems are already uniquely named after the input file?

LucaMarconato · January 20, 2025, 3:46pm

Thanks, that's also a good option. The global space should also take its name from the dataset id.

grst · February 13, 2025, 12:49pm

Here’s a PR implementing the second option for visium: Rename coordinate systems in visium by grst · Pull Request #266 · scverse/spatialdata-io · GitHub

If you think that’s a way forward, I could also try to adapt the other reader functions.
