Lazy Loading AnnData Backed by Zarr Arrays from Disk

Hi,
I am trying to understand the Zarr format and its interplay with AnnData. Can you help me with these questions? Let's say I have an AnnData object saved to disk in Zarr format with certain chunks and shards.

  1. When a Zarr-backed AnnData is loaded into Python, is it loaded into memory or loaded lazily?
  2. Is there any way to load it lazily? How would I do this?
  3. Once it is loaded lazily, is there any way to optimize how chunks are loaded?

Best!

Hello Mariano,

At the moment, loading Zarr is not lazy by default. You can load numeric (sparse) array elements (like X and obsm/X_umap) lazily using anndata.experimental.read_elem_as_dask. For more info, see the "Using dask with Scanpy" tutorial and the anndata.experimental.read_elem_as_dask documentation (anndata 0.11.4).

In the next release, we will have read_lazy (see the anndata.experimental.read_lazy documentation) that allows reading the whole AnnData lazily.

For optimizing chunks, you will need to use anndata.experimental.read_dispatched and anndata.experimental.write_dispatched as necessary to control how array elements are read/written. You can't just say "the whole AnnData object has XYZ chunking" because there are different considerations: sparse data (which does not map onto the chunking concept well), different sparse densities (even if it did), different access patterns to optimize for, etc.

Hope this helps!

Hi Ilan,
I would like to help with this. When is the next release scheduled? How can we make this functionality work? Can you point me to any open issue/feature to work on?
What is the state of experimental.read_lazy? Does it work right now? Would read_lazy bring a slice of X into memory only when it is needed?

Best,
Mariano

Hi Mariano, we will begin a pre-release process soon. There is extensive documentation on the linked pages; I couldn't give you any more help than that because I wrote it :slight_smile: And yes, it only brings data into memory when you tell it to.

There's also a notebook: "Lazily Accessing Remotely Stored Data" in the anndata documentation.

Hi Ilan,
Can you tell me if what I am doing is right, and how much memory is allocated in these steps?

file_in_disk = "a.zarr"

  • Here, not much memory is allocated, right? Basically just opening the Zarr store:
    adata = ad.experimental.read_lazy(file_in_disk)

  • Here, am I just bringing the chunk into memory?
    adata.X[:100, :]

  • Here, am I just bringing the chunk into memory?
    adata.obs['batch'][:100]

Thanks!

Hi Mariano, sorry for the delay. No memory should be allocated in either step, both should be lazy representations.