Hi! Is there any difference between compressing via scanpy.write and running lz4 on an uncompressed h5ad file on disk? And would scanpy be able to load the lz4-compressed file again?
I want to compress some h5ad files that are already on disk and will later be loaded with scanpy.read. Ideally I would compress them directly on disk, without a Python script that loads each file into memory and uses scanpy to write it back compressed.
I don’t think this will work. anndata.write_h5ad (which scanpy uses under the hood) relies on HDF5’s internal compression: individual datasets inside the file are compressed, but the file itself remains a valid HDF5 file. The compression you have in mind (lz4 over the whole file) instead turns the h5ad file into a single opaque binary blob, so I don’t think anndata will be able to read files that were compressed outside of its own write operation.
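To make the "single binary blob" point concrete, here is a small stdlib-only sketch (my own illustration, not anndata code). A valid HDF5 file carries the 8-byte signature `\x89HDF\r\n\x1a\n` at offset 0 or at a user-block offset (512, 1024, 2048, ...), and HDF5 relies on it to recognise the file. Compressing the whole file removes that signature. Assumptions: gzip stands in for lz4 because lz4 is not in the Python standard library, and the "h5ad" file is fake (just the signature plus padding). With h5py installed, `h5py.is_hdf5` performs the real check.

```python
import gzip
import os
import shutil
import tempfile

# The 8-byte signature that opens an HDF5 superblock (HDF5 file format spec).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path: str, max_offset: int = 1 << 20) -> bool:
    """Look for the HDF5 signature at offset 0, 512, 1024, 2048, ..."""
    with open(path, "rb") as fh:
        offset = 0
        while offset <= max_offset:
            fh.seek(offset)
            chunk = fh.read(len(HDF5_SIGNATURE))
            if chunk == HDF5_SIGNATURE:
                return True
            if len(chunk) < len(HDF5_SIGNATURE):
                return False  # ran past the end of the file
            offset = 512 if offset == 0 else offset * 2
    return False


def demo_external_compression() -> tuple:
    """Compress a fake HDF5-like file as one blob and re-check the signature."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = os.path.join(tmp, "fake.h5ad")
        with open(raw, "wb") as fh:
            # Not a real HDF5 file: only the signature followed by padding.
            fh.write(HDF5_SIGNATURE + b"\x00" * 4096)
        blob = raw + ".gz"
        # gzip is a stand-in for lz4 here (lz4 is not in the standard library).
        with open(raw, "rb") as src, gzip.open(blob, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return looks_like_hdf5(raw), looks_like_hdf5(blob)


if __name__ == "__main__":
    print(demo_external_compression())  # (True, False)
```

The uncompressed file is still recognisable as HDF5, while the externally compressed one is not, so it would have to be decompressed back to a plain file before scanpy.read could open it.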
However, I’d encourage you to try it out and report back! Maybe something like
```
h5repack -f GZIP=4 input.h5ad output.h5ad
```
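would work, and the repacked files might still be readable by anndata, since h5repack applies the compression to the datasets inside the HDF5 file rather than around the whole file.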
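If h5repack does turn out to work, a batch run over a directory could look like the sketch below. The helpers `repack_cmd` and `repack_dir` are hypothetical (not part of scanpy or anndata), and the directory layout is an assumption. They only build and run the same `h5repack -f GZIP=<level>` command for each file, so the data never pass through Python memory, which matches the original goal. Output goes to a separate directory because h5repack writes a new file instead of modifying the input.

```python
import shutil
import subprocess
from pathlib import Path


def repack_cmd(src: Path, dst: Path, gzip_level: int = 4) -> list[str]:
    """Build `h5repack -f GZIP=<level> src dst` as an argument list."""
    if not 1 <= gzip_level <= 9:
        raise ValueError("gzip level must be between 1 and 9")
    return ["h5repack", "-f", f"GZIP={gzip_level}", str(src), str(dst)]


def repack_dir(in_dir: str, out_dir: str, gzip_level: int = 4,
               dry_run: bool = False) -> list[list[str]]:
    """Repack every *.h5ad file in in_dir into out_dir (hypothetical helper)."""
    src_dir, out = Path(in_dir), Path(out_dir)
    if not src_dir.is_dir():
        raise FileNotFoundError(f"input directory not found: {src_dir}")
    if out.resolve() == src_dir.resolve():
        raise ValueError("write to a different directory than the input")
    if not dry_run and shutil.which("h5repack") is None:
        raise FileNotFoundError("h5repack not found on PATH")
    commands = []
    for src in sorted(src_dir.glob("*.h5ad")):
        cmd = repack_cmd(src, out / src.name, gzip_level)
        commands.append(cmd)
        if not dry_run:
            out.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises if h5repack fails
    return commands
```

Running it with `dry_run=True` first lists the commands without touching any files. Afterwards, opening one of the repacked files with `anndata.read_h5ad` and comparing it against the original would show whether anndata can still read them.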