Hi,
I am trying to understand the Zarr format and its interplay with anndata. Can you help me with these questions? Let's say I have an anndata object saved on disk as Zarr with certain chunks and shards.
- When a Zarr anndata is loaded into Python, is it loaded into memory or is it loaded lazily?
- Is there any way of loading it lazily? How would I do this?
- Once it is loaded lazily, is there any way to optimize loading chunks?
Best!
Hello Mariano,
At the moment, loading Zarr is not lazy. You can load numeric (sparse) array elements (like X and obsm/X_umap) using anndata.io.read_elem_as_dask. For more info see Using dask with Scanpy — scanpy and anndata.experimental.read_elem_as_dask — anndata 0.11.4 documentation.
In the next release, we will have a read_lazy (anndata.experimental.read_lazy — anndata 0.12.0.dev92+gccfb6e3 documentation) that allows reading the whole anndata lazily.
For optimizing chunks, you will need to use anndata.experimental.read_dispatched — anndata 0.12.0.dev92+gccfb6e3 documentation and anndata.experimental.write_dispatched — anndata 0.12.0.dev92+gccfb6e3 documentation as necessary to control how array elements are read/written. You can't just say "the whole anndata object has XYZ chunking" because there are different considerations: sparse data (which does not map well onto the chunking concept), different sparse densities (even if it did), different access patterns to optimize for, etc.
Hope this helps!
Hi Ilan,
I would like to help on this. When is the next release scheduled? How can we make this functionality work? Can you point me to any open issue/feature to work on?
What is the state of experimental.read_lazy? Does it work right now? Would read_lazy bring a slice of X into memory only when it is needed?
Best,
Mariano
Hi Mariano, we will begin a pre-release process soon. There is extensive documentation on the linked pages; I couldn't give you any more help than that, because I wrote it.
And yes, it only brings data into memory when you tell it to.
There's also a notebook: Lazily Accessing Remotely Stored Data — anndata 0.12.0.dev92+gccfb6e3 documentation
Hi Ilan,
can you tell me if what I am doing is right, and how much memory is allocated at each of these steps?
file_in_disk = "a.zarr"

- Here not much memory is allocated; basically, this opens the Zarr store:

adata = ad.experimental.read_lazy(file_in_disk)

- Here, am I just bringing the chunk into memory?

adata.X[:100, :]

- Here, am I just bringing the chunk into memory?

adata.obs["batch"][:100]
Thanks!
Hi Mariano, sorry for the delay. No memory should be allocated in either step, both should be lazy representations.