read_10x_mtx error: UnicodeDecodeError

I was creating an AnnData object, but when I ran the code below it raised UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte.

I'm not sure why this error occurs, since my file formats are correct (tsv.gz, mtx.gz). The data I’m using is the raw cell matrix from 10x, from this link:
https://www.10xgenomics.com/resources/datasets/6-k-pbm-cs-from-a-healthy-donor-1-standard-1-1-0

import scanpy as sc

data_dir = '/Users/csb/mount/scRNA/'

# Create the AnnData object
adata_pbmc6k = sc.read_10x_mtx(data_dir, var_names='gene_ids', cache=True)

adata_pbmc6k.uns["name"] = "PBMC 6k"

Even when I add dtype='float64', I still get the same error message.

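It seems you are running into a pandas.read_csv error, so I suggest looking in that direction. The byte 0x8b at position 1 matches the second byte of the gzip signature (0x1f 0x8b), which suggests that a gzip-compressed file is being parsed as plain text.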

Alternatively, you can check whether the error also occurs with other raw cell matrix datasets from 10x, as there might be an actual problem with this file.
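One way to check the files themselves: the sketch below tests whether each file's first two bytes are the gzip signature (0x1f 0x8b) and whether that agrees with its .gz extension. The helper names `is_gzip` and `find_mismatches` are made up for this illustration and are not part of scanpy or pandas. Treat it as a rough diagnostic, not a confirmed explanation of the error.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file starts with the gzip signature."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def find_mismatches(data_dir):
    """List files whose content and '.gz' extension disagree.

    Hypothetical helper: returns (file name, has gzip content, has .gz suffix).
    """
    mismatches = []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        compressed = is_gzip(path)
        named_gz = path.suffix == ".gz"
        if compressed != named_gz:
            mismatches.append((path.name, compressed, named_gz))
    return mismatches
```

Calling `find_mismatches('/Users/csb/mount/scRNA/')` on the directory from the question lists any file that is gzip data without a .gz name, or the reverse. If one shows up, making the name and the content agree (renaming or decompressing) is a reasonable first thing to try. Whether that is the actual cause here is an assumption.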

And, as always, try updating the software to see whether that resolves the issue.