Hi,
I am trying to understand the Zarr format and its interplay with anndata. Can you help me with these questions? Let's say I have an anndata object saved on disk as Zarr with certain chunks and shards.
- When a Zarr anndata is loaded into Python, is it loaded into memory or is it loaded lazily?
- Is there any way of loading it lazily? How would I do this?
- Once it is loaded lazily, is there any way to optimize loading chunks?
Best!
Hello Mariano,
At the moment, loading Zarr is not lazy. You can load numeric (sparse) array elements (like X and obsm/X_umap) using anndata.io.read_elem_as_dask. For more info see Using dask with Scanpy — scanpy and anndata.experimental.read_elem_as_dask — anndata 0.11.4 documentation.
In the next release, we will have a read_lazy (anndata.experimental.read_lazy — anndata 0.12.0.dev92+gccfb6e3 documentation) that allows reading the whole anndata lazily.
For optimizing chunks, you will need to use anndata.experimental.read_dispatched — anndata 0.12.0.dev92+gccfb6e3 documentation and anndata.experimental.write_dispatched — anndata 0.12.0.dev92+gccfb6e3 documentation as necessary to control how array elements are read/written. You can't just say "the whole anndata object has XYZ chunking" because there are different considerations: sparse data (which does not map well onto the chunking concept), different sparse densities (even if it did), different access patterns to optimize for, etc.
Hope this helps!
Hi Ilan,
I would like to help on this. When is the next release scheduled? How can we make this functionality work? Can you point me to any open issue/feature to work on?
What is the state of experimental.read_lazy? Does it work right now? Would read_lazy bring a slice of X into memory only when it is needed?
Best,
Mariano
Hi Mariano, we will begin a pre-release process soon. There is extensive documentation on the linked pages; I couldn't give you any more help than that, because I wrote it.
And yes, it only brings data into memory when you tell it to.
There's also a notebook: Lazily Accessing Remotely Stored Data — anndata 0.12.0.dev92+gccfb6e3 documentation
Hi Ilan,
can you tell me if what I am doing is right, and how much memory is allocated at each of these steps?
file_in_disk = "a.zarr"

- Here not much memory is allocated; basically, this opens the Zarr store:

adata = ad.experimental.read_lazy(file_in_disk)

- Here, am I just bringing the chunk into memory?

adata.X[:100, :]

- Here, am I just bringing the chunk into memory?

adata.obs["batch"][:100]
Thanks!
Hi Mariano, sorry for the delay. No memory should be allocated in either step, both should be lazy representations.