[loom] Why is adata.layers.matrix stored as float?

rep1_loom = sc.read('../velocyto_outputs/rep1/possorted_genome_bam_5NXZV.loom',
                   cache=True)

I have not normalized the raw data. However, when I try to run:

sc.pp.highly_variable_genes(rep1_loom, flavor='seurat_v3')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/local/42995467/ipykernel_25231/942804383.py in <module>
----> 1 sc.pp.highly_variable_genes(rep1_loom, flavor='seurat_v3')

~/.conda/envs/scarches/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py in highly_variable_genes(adata, layer, n_top_genes, min_disp, max_disp, min_mean, max_mean, span, n_bins, flavor, subset, inplace, batch_key, check_values)
    428             span=span,
    429             subset=subset,
--> 430             inplace=inplace,
    431         )
    432 

~/.conda/envs/scarches/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py in _highly_variable_genes_seurat_v3(adata, layer, n_top_genes, batch_key, check_values, span, subset, inplace)
    124     ranked_norm_gene_vars = ranked_norm_gene_vars.astype(np.float32)
    125     num_batches_high_var = np.sum(
--> 126         (ranked_norm_gene_vars < n_top_genes).astype(int), axis=0
    127     )
    128     ranked_norm_gene_vars[ranked_norm_gene_vars >= n_top_genes] = np.nan

TypeError: '<' not supported between instances of 'float' and 'NoneType'

Why is the raw data (stored in adata.layers['matrix']) saved as float32?

dict(rep1_loom.layers)

{'ambiguous': <6720x25607 sparse matrix of type '<class 'numpy.uint32'>'
 	with 5028254 stored elements in Compressed Sparse Row format>,
 'matrix': <6720x25607 sparse matrix of type '<class 'numpy.float32'>'
 	with 30746240 stored elements in Compressed Sparse Row format>,
 'spliced': <6720x25607 sparse matrix of type '<class 'numpy.uint32'>'
 	with 20652564 stored elements in Compressed Sparse Row format>,
 'unspliced': <6720x25607 sparse matrix of type '<class 'numpy.uint32'>'
 	with 12619025 stored elements in Compressed Sparse Row format>}

print(rep1_loom.layers['matrix'][1,:])
  (0, 42)	1.0
  (0, 50)	1.0
  (0, 53)	1.0
  (0, 58)	1.0
  (0, 59)	4.0
  (0, 61)	1.0
  (0, 68)	1.0
  (0, 73)	5.0
  (0, 80)	1.0
  (0, 81)	1.0
  (0, 106)	1.0
  (0, 121)	8.0
  (0, 125)	2.0
  (0, 132)	2.0
  (0, 134)	1.0
  (0, 146)	2.0
  (0, 148)	5.0
  (0, 159)	1.0
  (0, 160)	2.0
  (0, 168)	3.0
  (0, 170)	1.0
  (0, 218)	2.0
  (0, 223)	1.0
  (0, 227)	1.0
  (0, 237)	2.0
  :	:
  (0, 36363)	1.0
  (0, 36373)	1.0
  (0, 36377)	1.0
  (0, 36381)	1.0
  (0, 36387)	1.0
  (0, 36407)	1.0
  (0, 36418)	1.0
  (0, 36421)	2.0
  (0, 36424)	3.0
  (0, 36441)	1.0
  (0, 36456)	6.0
  (0, 36467)	1.0
  (0, 36468)	46.0
  (0, 36471)	1.0
  (0, 36472)	1.0
  (0, 36477)	1.0
  (0, 36480)	2.0
  (0, 36482)	3.0
  (0, 36507)	2.0
  (0, 36543)	4.0
  (0, 36544)	1.0
  (0, 36571)	2.0
  (0, 36572)	1.0
  (0, 36575)	2.0
  (0, 36582)	4.0

As you can see, they are all whole numbers. Why aren’t they saved as int?

It looks like the 'ambiguous', 'spliced', and 'unspliced' layers are saved as unsigned integers (uint32); only "matrix" is float32.

For the range of values that we see in single-cell sequencing data, there also shouldn't be any loss of accuracy: float32 represents every integer up to 2^24 (16,777,216) exactly, and many downstream computations perform floating-point operations anyway.
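
To illustrate the accuracy point, here is a minimal sketch using NumPy only. The count values are made up for illustration (they are not taken from the loom file); the sketch checks that whole-number counts survive a round trip through float32 and shows where that stops being true.

```python
import numpy as np

# Hypothetical counts, chosen to span the range seen in the post
# plus the float32 exactness limit of 2**24 = 16,777,216.
counts = np.array([0, 1, 46, 5_000, 2**24], dtype=np.uint32)

# Cast to float32 (the dtype of the 'matrix' layer) and back again.
as_float = counts.astype(np.float32)
round_trip = as_float.astype(np.uint32)

# Every integer up to 2**24 is represented exactly, so nothing is lost.
lossless = np.array_equal(counts, round_trip)

# Just above 2**24, float32 can no longer separate neighbouring integers.
collides = np.float32(2**24 + 1) == np.float32(2**24)

print(lossless, collides)
```

Per-gene counts in a single cell are normally orders of magnitude below that limit, so storing them as float32 should not change their values.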


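I think the error you are hitting is caused by n_top_genes being None, not by the datatype of the matrix. The traceback fails on the comparison ranked_norm_gene_vars < n_top_genes, and n_top_genes has to be set explicitly when flavor='seurat_v3'.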
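
As a rough check of that diagnosis, the sketch below reproduces the failing comparison with NumPy alone. The array `ranked` and the helper `count_top` are hypothetical stand-ins, not scanpy code; only the comparison itself is copied from the traceback.

```python
import numpy as np

# Hypothetical stand-in for the ranked gene variances scanpy computes internally.
ranked = np.array([0.0, 3.0, 7.0], dtype=np.float32)


def count_top(ranked, n_top_genes):
    """Count entries ranked below n_top_genes, using the comparison from the traceback."""
    return int(np.sum((ranked < n_top_genes).astype(int), axis=0))


# With n_top_genes=None the comparison itself fails, whatever the dtype.
try:
    count_top(ranked, None)
    raised = False
except TypeError as err:
    raised = True
    print(err)

# With an explicit integer the same float32 data works fine.
n_below = count_top(ranked, 5)
print(raised, n_below)
```

In the notebook, the corresponding change would be to pass the argument explicitly, for example sc.pp.highly_variable_genes(rep1_loom, flavor='seurat_v3', n_top_genes=2000). The value 2000 is only an illustrative choice, not something taken from this thread.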