Does Scanpy store raw counts as int or float?

lwtan · March 24, 2024, 12:02am

Hi single-cell connoisseurs,

I am curious whether the default dtype of the AnnData raw object is float or int.

For example:

raw_adata = adata.raw.to_adata()
raw_adata.X

Output:

<220752x29484 sparse matrix of type '<class 'numpy.float64'>'
	with 307103988 stored elements in Compressed Sparse Row format>

Of course, I got this object from a published dataset and have to restore the raw counts. I am just curious whether raw counts are stored as float by default.

This question was motivated by a cute way of checking whether the data is normalized that I came across:

if check_counts:
    # check if observations are unnormalized using first 10
    X_subset = adata.X[:10]
    norm_error = 'Make sure that the dataset (adata.X) contains unnormalized count data.'
    if sp.sparse.issparse(X_subset):
        assert (X_subset.astype(int) != X_subset).nnz == 0, norm_error
    else:
        assert np.all(X_subset.astype(int) == X_subset), norm_error

Certainly, my case would have failed this check, if the sparse matrix is not by default stored in int.

Thank you.

Wil

ivirshup · March 25, 2024, 8:30pm

Hi Wil,

Scanpy should be able to work with either, though in-place normalization requires floating-point values. Data is typically stored as float32.
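To illustrate why in-place normalization needs a float dtype, here is a minimal NumPy-only sketch. It does not call Scanpy itself; the array names and the simple row-sum scaling are made up for illustration and are not Scanpy's actual normalization code.

```python
import numpy as np

# Hypothetical toy count matrix stored with an integer dtype.
counts_int = np.array([[1, 0, 3], [0, 2, 2]], dtype=np.int64)

try:
    # In-place true division produces floats, which NumPy refuses to
    # write back into an integer array.
    counts_int /= counts_int.sum(axis=1, keepdims=True)
    in_place_ok = True
except TypeError:
    in_place_ok = False

# The same operation works once the data is held as float32.
counts_float = counts_int.astype(np.float32)
counts_float /= counts_float.sum(axis=1, keepdims=True)

print(in_place_ok)         # whether the integer in-place division succeeded
print(counts_float.dtype)  # dtype after in-place scaling
```

Holding counts as floats from the start avoids this cast, which is one plausible reason to find float-typed raw counts in published objects.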

"Certainly, my case would have failed this check, if the sparse matrix is not by default stored in int."

I don’t think this is the case. For example:

import numpy as np

assert np.array_equal(
    np.arange(10, dtype=np.float64).astype(int),
    np.arange(10, dtype=np.float64)
)

If the numbers have a floating point type but are integer valued, this should be fine.
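The same reasoning can be tried on a sparse matrix, which is closer to the case in the question. This is a rough sketch that wraps the check quoted above in a hypothetical helper, `looks_like_counts` (the name and the toy matrices are assumptions for illustration), and applies it to float64 data that is integer-valued and to a log-transformed copy.

```python
import numpy as np
import scipy.sparse as sp


def looks_like_counts(X, n_rows=10):
    """Return True if the first n_rows of X contain only integer values."""
    X_subset = X[:n_rows]
    if sp.issparse(X_subset):
        # Count the entries that change when truncated to int.
        return (X_subset.astype(int) != X_subset).nnz == 0
    return bool(np.all(X_subset.astype(int) == X_subset))


# Integer-valued counts stored as float64, as in the question.
counts = sp.csr_matrix(
    np.array([[0, 1, 2], [3, 0, 0], [0, 5, 0]], dtype=np.float64)
)

# A log1p-transformed copy stands in for normalized data.
normalized = counts.copy()
normalized.data = np.log1p(normalized.data)

print(looks_like_counts(counts))      # float dtype, integer values
print(looks_like_counts(normalized))  # non-integer values
```

The check depends only on the values, not on the dtype, so a float64 matrix of whole numbers passes while the transformed copy does not.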
