Cellxgene datasets raw data? scaled?

LysSanzMoreta · July 1, 2025, 9:35am

Hi!

First, I would like to know how the .X matrices in the .h5ad files that can be downloaded from Datasets - CZ CELLxGENE Discover were made. What is the origin of those floats?

I finally found that the raw data is under adata._raw.X

Second, is there any accessible AWS or gcloud storage where the raw counts are located, something like a collection of .h5ad files (e.g. dataset_id.h5ad)? I am referring to the datasets with the raw counts generated after the cellxgene filtering pipeline that removes low-count cells, etc.

I am aware that the Python API can be used to download some of the .h5ad files; however, it is sometimes very slow or gets stuck.

Thank you in advance for your reply,

Best

mschilli · July 8, 2025, 8:04am

Hi @LysSanzMoreta,

I’m afraid I don’t know the answers to your questions but I wanted to point out that adata._raw is a private element so the API might change. There is adata.raw which currently just returns adata._raw but is meant as part of the stable API. So I suggest you use adata.raw.X to retrieve the raw counts rather than adata._raw.X.
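
As a rough way to tell whether a given matrix holds raw counts or transformed floats, here is a minimal sketch that checks whether every stored value is a non-negative integer. The helper names `stored_values` and `looks_like_raw_counts` are hypothetical (they are not part of anndata or scanpy), and the toy normalization below is only an illustration, not necessarily what was applied to any particular dataset.

```python
import numpy as np


def stored_values(matrix):
    """Return the stored entries of a dense or sparse matrix as a 1-D array."""
    if isinstance(matrix, np.ndarray):
        return matrix.ravel()
    # Assumption: scipy.sparse CSR/CSC matrices keep their non-zero
    # entries in a NumPy array called `.data`.
    data = getattr(matrix, "data", None)
    if isinstance(data, np.ndarray):
        return data.ravel()
    return np.asarray(matrix).ravel()


def looks_like_raw_counts(matrix):
    """True if every stored value is a non-negative whole number."""
    values = stored_values(matrix)
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))


# Toy data: integer counts stored as floats, and a log-normalized copy.
counts = np.array([[0, 3, 1], [2, 0, 5]], dtype=np.float32)
normalized = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)

print(looks_like_raw_counts(counts))
print(looks_like_raw_counts(normalized))
```

With a loaded AnnData object `adata`, the same check could be applied to `adata.raw.X` when `adata.raw` is not None, and to `adata.X` otherwise. It only inspects the values, so it cannot tell which transformation produced the floats.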

LysSanzMoreta · July 8, 2025, 9:56am

Thanks anyways!

Strange, .raw is not found in the keys; that is why I had to dig into the private keys.
