Trouble Reading .h5 files

jjeang · April 6, 2023, 7:27pm

Hello,

I am using
scanpy version 1.7.0 with
h5py version 3.6.0

I am trying to read a file in .h5 format (as I understand it, this is the legacy format). Usually this is not a problem, because I can read the file with:

adata = sc.read_10x_h5('my_file.h5')
and then save
adata.write_h5ad('my_file.h5ad')

However, this time I am unable to read the file and get the following error:
Exception: File is missing one or more required datasets.

Surprisingly (for me) I am actually able to load the file with my version of h5py
f = h5py.File('../../utils/my_file.h5', 'r')
but I just cannot access the data (e.g. load it into anndata)

There is 1 key called “shoji” and I want to access the contents of the “Expression” database which can be done through

data = f['shoji']['Expression']

But then I have trouble loading this into anndata (perhaps because it is too big, or maybe because something else is wrong; my kernel dies, so there is no error I can share).

I’ve also attempted sc.read_hdf('my_file.h5', key='shoji')
but then I get a counterintuitive error
TypeError: Accessing a group is done with bytes or str, not <class 'tuple'>
This suggests that the 'shoji' argument is being treated as a tuple instead of a string. I figured it was just a bug in a deprecated function.

Anyway, can anyone shed some light on what is happening here? Or better yet, any ideas as to how I can read this file?

Thanks

gtca · April 13, 2023, 5:42pm

Hey @jjeang,

Does this .h5 file follow the expected structure for HDF5 files from 10x Genomics?
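One way to answer that is to look for the groups and datasets a 10x file normally carries. The sketch below is only a rough check: the list of required names reflects my understanding of the 10x v3 layout (a top-level "matrix" group), not something copied from scanpy, and `V3_REQUIRED`, `looks_like_10x_v3` and `check_file` are hypothetical helper names.

```python
# Names I would expect under "matrix" in a 10x Genomics v3 file.
# This list is an assumption about the layout, not taken from scanpy's source.
V3_REQUIRED = ("barcodes", "data", "indices", "indptr", "shape", "features")


def looks_like_10x_v3(root):
    """Return (ok, missing) for a dict-like HDF5 root such as an open h5py.File."""
    if "matrix" not in root:
        return False, ["matrix"]
    matrix = root["matrix"]
    missing = [name for name in V3_REQUIRED if name not in matrix]
    return not missing, missing


def check_file(path):
    """Open the file read-only and run the structural check on it."""
    import h5py  # third-party; imported here so the check above has no dependencies

    with h5py.File(path, "r") as f:
        return looks_like_10x_v3(f)
```

A file whose only top-level key is "shoji" would fail this check at the first step, which would be consistent with the "missing one or more required datasets" error from sc.read_10x_h5.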

In case it’s a custom file, you should be able to read its contents with h5py and then use the data to construct an AnnData object. Is it something that you expect to fit in memory (you can see the dimensions without loading the dataset)?
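Here is a minimal sketch of that approach. It assumes that "shoji/Expression" is a single dense dataset with cells in rows and genes in columns, which I cannot confirm from the post (if it turns out to be a group, `.shape` will not exist). The helper names and the `max_gib` cap are made up for illustration.

```python
import math


def dense_size_gib(shape, itemsize):
    """Size in GiB of a dense array with this shape and per-element byte size."""
    return math.prod(shape) * itemsize / 2**30


def describe(path, key="shoji/Expression"):
    """Print shape, dtype and dense size of one dataset without reading its data."""
    import h5py  # third-party; imported lazily

    with h5py.File(path, "r") as f:
        dset = f[key]  # only metadata is touched here, not the values
        size = dense_size_gib(dset.shape, dset.dtype.itemsize)
        print(key, dset.shape, dset.dtype, f"{size:.1f} GiB")
        return size


def to_anndata(path, key="shoji/Expression", max_gib=8.0):
    """Load the dataset into AnnData, refusing if it exceeds max_gib (arbitrary cap)."""
    import anndata as ad
    import h5py

    with h5py.File(path, "r") as f:
        dset = f[key]
        size = dense_size_gib(dset.shape, dset.dtype.itemsize)
        if size > max_gib:
            raise MemoryError(f"{key} needs about {size:.1f} GiB as a dense array")
        X = dset[()]  # reads the whole dataset into a NumPy array
    # Assumes rows are cells and columns are genes; transpose X first if not.
    return ad.AnnData(X)
```

Calling describe('my_file.h5') first should show whether the matrix is simply too large for a dense in-memory array, which would explain a kernel that dies without an error.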
