Trouble Reading .h5 files


I am using
scanpy version 1.7.0 with
h5py version 3.6.0

I am trying to read a file in .h5 format (as I understand it, this is the legacy format). This is usually not a problem, because I can read the file with:

adata = sc.read_10x_h5('my_file.h5')
and then save the result.

However, this time I am unable to read the file and get the following error:
Exception: File is missing one or more required datasets.

Surprisingly (to me), I am able to open the file with my version of h5py:
f = h5py.File('../../utils/my_file.h5', 'r')
but I cannot get the data out of it (e.g. load it into anndata).

There is one top-level key called “shoji”, and I want to access the contents of its “Expression” dataset, which can be done with:

data = f['shoji']['Expression']

But then I have trouble loading this into anndata, perhaps because it is too big, or maybe because something else is wrong. My kernel dies, so there is no error I can share.

I’ve also tried sc.read_hdf('my_file.h5', key='shoji')
but then I get a counterintuitive error:
TypeError: Accessing a group is done with bytes or str, not <class 'tuple'>
This suggests that the key is being treated as a tuple instead of a string, even though I passed a string. I figured it was just a bug in a deprecated function.

Anyway, can anyone shed some light on what is happening here? Or better yet, any ideas as to how I can read this file?


Hey @jjeang,

Does this .h5 file follow the expected structure for HDF5 files from 10x Genomics?
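One way to check is to list every dataset in the file and compare the layout with what you expect. As far as I know, files written by Cell Ranger 3 or later keep everything under a top-level matrix group (barcodes, data, indices, indptr, shape, features), while older files use a group named after the genome, so a single “shoji” key would point to a different layout. Here is a minimal sketch using h5py; list_h5_contents is just a helper name I made up for illustration:

```python
import h5py


def list_h5_contents(path):
    """Return {dataset_path: (shape, dtype)} for every dataset in the file.

    Only metadata is read, so this is cheap even for very large files.
    """
    contents = {}

    def visit(name, obj):
        # Groups are skipped; only datasets carry a shape and dtype.
        if isinstance(obj, h5py.Dataset):
            contents[name] = (obj.shape, obj.dtype)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return contents
```

Calling list_h5_contents('my_file.h5') and printing the items should show whether “Expression” is a single dataset or a group with more datasets underneath.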

If it’s a custom file, you should be able to read its contents with h5py and then use the data to construct an AnnData object. Is it something that you expect to fit in memory? You can check the dimensions without loading the dataset.
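Here is a rough sketch of that approach. It assumes that 'shoji/Expression' is a plain 2-D numeric dataset (if it turns out to be a group, the shape lookup will fail), and the function names are hypothetical helpers, not part of scanpy or anndata. The size check reads only metadata, so it should not kill the kernel:

```python
import h5py
import numpy as np


def dataset_summary(path, dataset_path):
    """Return (shape, dtype, uncompressed size in bytes) without reading the data."""
    with h5py.File(path, "r") as f:
        dset = f[dataset_path]
        n_bytes = int(np.prod(dset.shape)) * dset.dtype.itemsize
        return dset.shape, dset.dtype, n_bytes


def read_rows(path, dataset_path, max_rows=None):
    """Read the first max_rows rows (all rows if None) into a NumPy array."""
    with h5py.File(path, "r") as f:
        dset = f[dataset_path]
        n_rows = dset.shape[0] if max_rows is None else min(max_rows, dset.shape[0])
        # Slicing an h5py dataset reads only the requested rows from disk.
        return dset[:n_rows]


def to_anndata(matrix):
    """Wrap an in-memory matrix in an AnnData object (no obs/var annotations)."""
    import anndata as ad  # imported here so the helpers above only need h5py

    return ad.AnnData(X=matrix)
```

For example, dataset_summary('my_file.h5', 'shoji/Expression') tells you how many bytes the full matrix needs, and to_anndata(read_rows('my_file.h5', 'shoji/Expression', max_rows=1000)) lets you try a small piece first. Cell and gene annotations would still have to be filled in from whatever other datasets the file contains.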