sc.pp.neighbors not always returning the requested n_neighbors = 5

Hey,

I have realized that sc.pp.neighbors(adata, n_neighbors = 5) does not create 4 neighbours for all cells; for some cells the neighbour list is empty:

adata_train.obsp['distances'].tolil().rows

array([list([223, 280, 316, 5791]), list([3877, 5899, 7766, 7807]),
       list([165, 304, 423, 713]), ..., list([]),
       list([94, 865, 7077, 7666]), list([])], dtype=object)

This ends up in an error when using sc.tl.ingest(). The same happens with other values of n_neighbors. Is this a bug or a feature? My dataset only has 92 genes. How can I avoid this?
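For anyone who wants to check their own data, here is a minimal sketch of how the affected cells could be listed by counting the stored entries per row of the sparse distance matrix. The helper name `cells_with_missing_neighbors` and the toy matrix are made up for illustration; on real data the input would be something like `adata.obsp['distances']`, assuming it is a SciPy sparse matrix.

```python
import numpy as np
from scipy import sparse


def cells_with_missing_neighbors(distances, n_neighbors):
    """Return row indices that store fewer than n_neighbors - 1 distances.

    Hypothetical helper, not part of scanpy. It assumes each cell should
    have n_neighbors - 1 stored entries (the cell itself is not stored).
    """
    d = sparse.csr_matrix(distances)
    # In CSR format, consecutive differences of indptr give the number of
    # stored entries in each row.
    stored_per_row = np.diff(d.indptr)
    return np.flatnonzero(stored_per_row < n_neighbors - 1)


# Toy 4-cell matrix standing in for adata.obsp['distances'] (n_neighbors = 3).
toy = sparse.csr_matrix(np.array([
    [0.0, 1.0, 2.0, 0.0],  # two stored neighbours, as expected
    [1.0, 0.0, 0.0, 3.0],  # two stored neighbours, as expected
    [0.0, 0.0, 0.0, 0.0],  # no stored neighbours
    [0.0, 3.0, 0.0, 0.0],  # only one stored neighbour
]))

short_rows = cells_with_missing_neighbors(toy, n_neighbors=3)
print(short_rows.tolist())  # [2, 3]
```

On the real object the call would presumably look like `cells_with_missing_neighbors(adata_train.obsp['distances'], n_neighbors=5)`.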

I also wrote about this on GitHub: "Ingest won't integrate datasets of different lengths" (Issue #2085, scverse/scanpy).

Cheers,

I found what causes this: the closest neighbours are not reported if their distance is 0. That means duplicated rows cause empty lists to be returned. Even almost-duplicated rows have the same effect, like these two cells (yes, I am only dealing with a couple dozen genes and very low gene counts):

>adata.X[[1662, 3578]]
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 8.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 8.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0.]], dtype=float32)

>adata.obs.iloc[[1662, 3578]]
	celltype 	n_counts 	n_genes
2328 	hepatocytes 	10.0 	3
5105 	hepatocytes 	11.0 	4
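Since exact duplicates seem to be the trigger, a rough way to flag them before computing neighbours might look like the sketch below. The function name `duplicated_row_mask` and the toy array are hypothetical, and the input is assumed to be a dense array (a sparse `adata.X` would need `.toarray()` first). It only catches exact duplicates, not near-duplicates.

```python
import numpy as np


def duplicated_row_mask(X):
    """Return a boolean mask marking rows that occur more than once in X.

    Hypothetical helper for illustration; X is assumed to be a dense
    2-D array of expression values (cells x genes).
    """
    X = np.asarray(X)
    # Group identical rows: inverse maps each row to its unique-row group,
    # counts holds the size of each group.
    _, inverse, counts = np.unique(
        X, axis=0, return_inverse=True, return_counts=True
    )
    # ravel() keeps this robust to NumPy versions that return a 2-D inverse.
    return counts[np.ravel(inverse)] > 1


# Toy expression matrix: the first two cells are exact duplicates.
toy_X = np.array(
    [[0, 1, 8],
     [0, 1, 8],
     [2, 0, 0],
     [0, 0, 5]],
    dtype=np.float32,
)

dup_mask = duplicated_row_mask(toy_X)
print(dup_mask.tolist())  # [True, True, False, False]
```

The resulting mask could then be compared against the cells with empty neighbour lists to confirm whether duplicates explain all of them.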