Does Scanpy store raw counts as int or float?

Hi single-cell connoisseurs,

I am curious whether the default dtype of the AnnData raw object is float or int.

For example:

raw_adata = adata.raw.to_adata()
raw_adata.X

Output:

<220752x29484 sparse matrix of type '<class 'numpy.float64'>'
	with 307103988 stored elements in Compressed Sparse Row format>

Of course, I got this object from a published dataset and had to restore the raw counts. I am just curious whether raw counts are stored as float by default.

The motivation for this question is that I saw this cute way of checking whether the data is normalized:

if check_counts:
    # check if observations are unnormalized using first 10
    X_subset = adata.X[:10]
    norm_error = 'Make sure that the dataset (adata.X) contains unnormalized count data.'
    if sp.sparse.issparse(X_subset):
        assert (X_subset.astype(int) != X_subset).nnz == 0, norm_error
    else:
        assert np.all(X_subset.astype(int) == X_subset), norm_error

Certainly, my case would have failed this check, if the sparse matrix is not by default stored in int.

Thank you.

Wil

Hi Wil,

Scanpy should be able to work with either dtype, though in-place normalization requires floating-point values. Data is typically stored as float32.
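
To illustrate why in-place normalization needs a floating-point dtype, here is a minimal sketch in plain NumPy. It is not Scanpy's own code, and the array names are made up for the example; it only shows that fractional results cannot be written back into an integer array.

```python
import numpy as np

# Hypothetical toy counts: 2 cells x 3 genes.
int_counts = np.array([[2, 0, 4], [1, 3, 0]])      # integer dtype
float_counts = int_counts.astype(np.float32)       # same values, float dtype

# In-place true division produces floats, which NumPy refuses to
# cast back into an integer array.
try:
    int_counts /= int_counts.sum(axis=1, keepdims=True)
    int_inplace_ok = True
except TypeError:
    int_inplace_ok = False

# The same operation works in place on the float array.
float_counts /= float_counts.sum(axis=1, keepdims=True)
```

After this runs, int_inplace_ok is False, while each row of float_counts sums to 1.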

"Certainly, my case would have failed this check, if the sparse matrix is not by default stored in int."

I don't think this is the case. For example:

import numpy as np

assert np.array_equal(
    np.arange(10, dtype=np.float64).astype(int),
    np.arange(10, dtype=np.float64)
)

If the values have a floating-point dtype but are integer-valued, the check passes, because casting them to int and back changes nothing.
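
The same holds for the sparse branch of the check. Below is a small sketch that applies the quoted comparison to a float64 CSR matrix; the helper name is_integer_valued and the toy matrices are assumptions made for illustration, not part of Scanpy or AnnData.

```python
import numpy as np
from scipy import sparse


def is_integer_valued(X):
    """Return True if casting X to int leaves every entry unchanged."""
    if sparse.issparse(X):
        # Same comparison as the quoted check: count entries that differ.
        return bool((X.astype(int) != X).nnz == 0)
    X = np.asarray(X)
    return bool(np.all(X.astype(int) == X))


# Integer-valued counts stored as float64, as in the question.
counts = sparse.csr_matrix(np.array([[0.0, 3.0, 1.0], [2.0, 0.0, 5.0]]))
# Scaling makes the stored values fractional.
scaled = sparse.csr_matrix(counts * 0.1)

counts_pass = is_integer_valued(counts)
scaled_pass = is_integer_valued(scaled)
```

Here counts_pass is True even though counts.dtype is float64, and scaled_pass is False because the scaled values are no longer whole numbers.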