[loom] Why is adata.layers.matrix stored as float?

natalia.nutella · June 16, 2022, 5:28pm

import scanpy as sc

rep1_loom = sc.read('../velocyto_outputs/rep1/possorted_genome_bam_5NXZV.loom',
                   cache=True)

I have not normalized the raw data. However, when I try to run the following, I get a TypeError:

sc.pp.highly_variable_genes(rep1_loom, flavor='seurat_v3')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/local/42995467/ipykernel_25231/942804383.py in <module>
----> 1 sc.pp.highly_variable_genes(rep1_loom, flavor='seurat_v3')

~/.conda/envs/scarches/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py in highly_variable_genes(adata, layer, n_top_genes, min_disp, max_disp, min_mean, max_mean, span, n_bins, flavor, subset, inplace, batch_key, check_values)
    428             span=span,
    429             subset=subset,
--> 430             inplace=inplace,
    431         )
    432 

~/.conda/envs/scarches/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py in _highly_variable_genes_seurat_v3(adata, layer, n_top_genes, batch_key, check_values, span, subset, inplace)
    124     ranked_norm_gene_vars = ranked_norm_gene_vars.astype(np.float32)
    125     num_batches_high_var = np.sum(
--> 126         (ranked_norm_gene_vars < n_top_genes).astype(int), axis=0
    127     )
    128     ranked_norm_gene_vars[ranked_norm_gene_vars >= n_top_genes] = np.nan

TypeError: '<' not supported between instances of 'float' and 'NoneType'

Why is the raw data (stored in adata.layers['matrix']) saved as float32?

dict(rep1_loom.layers)

{'ambiguous': <6720x25607 sparse matrix of type '<class 'numpy.uint32'>'
 	with 5028254 stored elements in Compressed Sparse Row format>,
 'matrix': <6720x25607 sparse matrix of type '<class 'numpy.float32'>'
 	with 30746240 stored elements in Compressed Sparse Row format>,
 'spliced': <6720x25607 sparse matrix of type '<class 'numpy.uint32'>'
 	with 20652564 stored elements in Compressed Sparse Row format>,
 'unspliced': <6720x25607 sparse matrix of type '<class 'numpy.uint32'>'
 	with 12619025 stored elements in Compressed Sparse Row format>}

print(rep1_loom.layers['matrix'][1,:])
  (0, 42)	1.0
  (0, 50)	1.0
  (0, 53)	1.0
  (0, 58)	1.0
  (0, 59)	4.0
  (0, 61)	1.0
  (0, 68)	1.0
  (0, 73)	5.0
  (0, 80)	1.0
  (0, 81)	1.0
  (0, 106)	1.0
  (0, 121)	8.0
  (0, 125)	2.0
  (0, 132)	2.0
  (0, 134)	1.0
  (0, 146)	2.0
  (0, 148)	5.0
  (0, 159)	1.0
  (0, 160)	2.0
  (0, 168)	3.0
  (0, 170)	1.0
  (0, 218)	2.0
  (0, 223)	1.0
  (0, 227)	1.0
  (0, 237)	2.0
  :	:
  (0, 36363)	1.0
  (0, 36373)	1.0
  (0, 36377)	1.0
  (0, 36381)	1.0
  (0, 36387)	1.0
  (0, 36407)	1.0
  (0, 36418)	1.0
  (0, 36421)	2.0
  (0, 36424)	3.0
  (0, 36441)	1.0
  (0, 36456)	6.0
  (0, 36467)	1.0
  (0, 36468)	46.0
  (0, 36471)	1.0
  (0, 36472)	1.0
  (0, 36477)	1.0
  (0, 36480)	2.0
  (0, 36482)	3.0
  (0, 36507)	2.0
  (0, 36543)	4.0
  (0, 36544)	1.0
  (0, 36571)	2.0
  (0, 36572)	1.0
  (0, 36575)	2.0
  (0, 36582)	4.0

As you can see, they are all whole numbers. Why aren’t they saved as int?

ivirshup · June 17, 2022, 12:17pm

natalia.nutella:

{'ambiguous': <6720x25607 sparse matrix of type '<class 'numpy.uint32'>'
 	with 5028254 stored elements in Compressed Sparse Row format>,
 'matrix': <6720x25607 sparse matrix of type '<class 'numpy.float32'>'
 	with 30746240 stored elements in Compressed Sparse Row format>,
 'spliced': <6720x25607 sparse matrix of type '<class 'numpy.uint32'>'
 	with 20652564 stored elements in Compressed Sparse Row format>,
 'unspliced': <6720x25607 sparse matrix of type '<class 'numpy.uint32'>'
 	with 12619025 stored elements in Compressed Sparse Row format>}

It looks like some of them are saved as unsigned integers, just not "matrix".

For the range of values that we see in single cell sequencing data, there also shouldn’t be any difference in accuracy: float32 represents every integer up to 2^24 (16,777,216) exactly, and many downstream computations will be performing floating point operations anyway.
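To make the accuracy point concrete, here is a minimal sketch using only the standard library. It round-trips a few integer counts through a 32-bit float and back. The helper name `roundtrip_float32` and the sample counts are made up for illustration; nothing here touches the actual loom file.

```python
import struct

# Largest n such that every integer in [0, n] is exactly representable
# in IEEE 754 single precision (24 bits of significand).
FLOAT32_EXACT_LIMIT = 2 ** 24  # 16,777,216


def roundtrip_float32(value: int) -> float:
    """Store an integer as a 32-bit float and read it back (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# A few sample counts, including the 46.0 seen in the printed row above,
# come back unchanged after being stored as float32.
for count in (1, 8, 46, 65_535, FLOAT32_EXACT_LIMIT):
    assert roundtrip_float32(count) == count

# Only above 2**24 do integers start to collide.
print(roundtrip_float32(FLOAT32_EXACT_LIMIT + 1))  # 16777216.0
```

Per-gene UMI counts in a single cell are orders of magnitude below that limit, so storing them as float32 should not lose information.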

I think the error you are hitting is due to n_top_genes being None, as opposed to the datatype of the matrix.
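A rough sketch of that diagnosis: the message in the traceback is what plain Python produces when a float is compared with None, and `n_top_genes` is mandatory for `flavor='seurat_v3'`. The wrapper `select_hvgs` and its default of 2000 genes are hypothetical choices for illustration, not part of scanpy. scanpy is imported inside the function so that the argument check can run on its own.

```python
# Reproduce the message from the traceback without any scanpy involved.
try:
    _ = 1.0 < None
except TypeError as err:
    message = str(err)

print(message)


def select_hvgs(adata, n_top_genes=2000, flavor="seurat_v3"):
    """Hypothetical wrapper: fail early with a clear error, then call scanpy.

    The default of 2000 genes is an arbitrary illustrative value.
    """
    if flavor == "seurat_v3" and n_top_genes is None:
        raise ValueError("n_top_genes must be set when flavor='seurat_v3'")

    import scanpy as sc  # imported lazily; required only for the real call

    sc.pp.highly_variable_genes(adata, flavor=flavor, n_top_genes=n_top_genes)
    return adata.var["highly_variable"]
```

With the object from the original post, the call would then look like `select_hvgs(rep1_loom, n_top_genes=2000)`, or directly `sc.pp.highly_variable_genes(rep1_loom, flavor='seurat_v3', n_top_genes=2000)`.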
