"ValueError: cannot specify integer `bins` when input data contains infinity" in multi sample data

I have a few samples that I merged (so adata contains 6 samples). I followed the scanpy tutorial without any problem until I reached the step where highly variable genes are extracted with this command:

sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)

However, I keep getting this error:

extracting highly variable genes
/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py:200: RuntimeWarning: overflow encountered in expm1
  X = np.expm1(X)
/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_utils.py:11: RuntimeWarning: overflow encountered in multiply
  mean_sq = np.multiply(X, X).mean(axis=axis, dtype=np.float64)
/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_utils.py:12: RuntimeWarning: invalid value encountered in subtract
  var = mean_sq - mean**2
Traceback (most recent call last):
  File "", line 1, in
  File "/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 440, in highly_variable_genes
    df = _highly_variable_genes_single_batch(
  File "/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 215, in _highly_variable_genes_single_batch
    df['mean_bin'] = pd.cut(df['means'], bins=n_bins)
  File "/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/pandas/core/reshape/tile.py", line 263, in cut
    raise ValueError(
ValueError: cannot specify integer `bins` when input data contains infinity

I looked for the possible solutions but have not found yet. Would you please let me know how I can fix the error?

Hello Behzad,
The warnings you are getting (before the ValueError) indicate that you have some extremely large numbers in your dataset, which cause an overflow in the np.expm1 function.

The scanpy documentation for sc.pp.highly_variable_genes states that the function “Expects logarithmized data, except when flavor='seurat_v3', in which count data is expected.”
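To make the chain from the warnings to the final error concrete, here is a minimal sketch. It assumes the matrix is stored as float32 (the thread does not state the dtype), and all values are made up for illustration. It is not scanpy's code; it only reproduces the overflow in np.expm1 and the resulting pd.cut failure.

```python
import numpy as np
import pandas as pd

# Hypothetical values for one gene across three cells.
# 100.0 is far larger than anything log1p of normalized counts would produce.
x = np.array([0.5, 1.2, 100.0], dtype=np.float32)

with np.errstate(over="ignore"):  # silence the RuntimeWarning for this demo
    undone = np.expm1(x)          # exp(100) exceeds the float32 maximum (~3.4e38)

# A single inf is enough to make the per-gene mean infinite.
gene_mean = undone.mean(dtype=np.float64)

# Binning the means with an integer number of bins then fails.
means = pd.Series([0.1, 0.8, gene_mean])
try:
    pd.cut(means, bins=20)
    message = None
except ValueError as err:
    message = str(err)

print(np.isinf(undone).any(), gene_mean)
print(message)
```

So the infinity does not have to be present in the input itself; it can appear only after the logarithm is undone.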

My suggestion is to first verify that you have run the standard normalization steps (scanpy.pp.normalize_total and scanpy.pp.log1p), then re-run the code and see whether the error repeats.
If it does, I suggest checking the maximal value of your adata.X and making sure that no infinite (np.infty) or extremely large values appear there.
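One possible way to run this check is sketched below. The helper name summarize_values is hypothetical and the demo array is made up; it only stands in for adata.X.

```python
import numpy as np

def summarize_values(values):
    """Report dtype, finite range, and non-finite count for a dense array."""
    values = np.asarray(values)
    finite = np.isfinite(values)  # False for inf, -inf and NaN
    return {
        "dtype": str(values.dtype),
        "min": float(values[finite].min()),
        "max": float(values[finite].max()),
        "n_nonfinite": int((~finite).sum()),
    }

# Made-up stand-in for adata.X.
# For a sparse CSR/CSC matrix, pass adata.X.data (the stored values) instead.
demo = np.array([[0.0, 2.3, np.inf], [1.1, np.nan, 4.0]], dtype=np.float32)
report = summarize_values(demo)
print(report)
```

The dtype is included because the largest value that survives np.expm1 depends on it: float32 overflows much earlier than float64.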

Good luck!

Hello Yotam,

1- I tried scanpy.pp.normalize_total and scanpy.pp.log1p but got the same error. I tried normalize_per_cell as well.
2- I also checked the max value of my adata.X, which is “175.02573”; the min value is “-9.317178”. However, I do not know how this could be used to find a solution.
3- I tried np.infty, but constantly got this error:

TypeError: ‘float’ object is not callable

However, I used the to_list function, which produced a list (length 32100) of lists (length 24062) in Python. I looked for infinity values there and did not find any.
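For reference, a short sketch of what is probably happening here. It assumes that np.infty was called like a function (the exact call is not shown in the post) and that the matrix is float32. np.infty is a float constant, an alias of np.inf that was removed in NumPy 2.0, so the sketch uses np.inf and the checker functions np.isinf and np.isfinite.

```python
import numpy as np

# Calling the constant reproduces the TypeError from the post.
try:
    np.inf(3.0)
    call_error = None
except TypeError as err:
    call_error = str(err)

# The reported min and max, plus a zero, as a tiny stand-in for adata.X.
values = np.array([-9.317178, 0.0, 175.02573], dtype=np.float32)

has_inf = bool(np.isinf(values).any())         # is any entry infinite?
all_finite = bool(np.isfinite(values).all())   # also rules out NaN

# Undoing the log, as highly_variable_genes does, is where inf can appear.
with np.errstate(over="ignore"):
    overflows = bool(np.isinf(np.expm1(values)).any())

print(call_error)
print(has_inf, all_finite, overflows)
```

Under these assumptions the matrix itself contains no infinity, yet a value of about 175 is already large enough to overflow once np.expm1 is applied in float32.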

So I still have the problem. As more info, I have 6 samples (3 controls and 3 conditions) and concatenated them into one variable (adata).

I tried the same code on every single sample separately and it worked perfectly, but when I combine the samples into adata (concatenate them), even if I use only 2 samples, I get the same error.
The only difference I see in adata is that adata.obs has an extra column, batch, compared to the single-sample analysis.

Very strange.
The fact that you have a minimal value below 0 in your data means you are not working with count data, but that you’ve performed some normalization.
I suggest you retry things with the raw count data, and make sure to concatenate the datasets before you perform any normalization.
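To illustrate the suggested order, here is a NumPy-only sketch that mimics the idea behind normalize_total followed by log1p. It is not scanpy's implementation; the count matrices are made up and target_sum = 1e4 is an assumed value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two made-up raw count matrices (cells x genes) standing in for two samples.
sample_a = rng.poisson(2.0, size=(50, 30))
sample_b = rng.poisson(5.0, size=(40, 30))

# Concatenate the raw counts first, then normalize the combined matrix.
counts = np.vstack([sample_a, sample_b]).astype(np.float64)

target_sum = 1e4  # assumed value
totals = counts.sum(axis=1, keepdims=True)
totals[totals == 0] = 1.0            # guard against empty cells
normalized = counts / totals * target_sum
logged = np.log1p(normalized)

print(logged.min(), logged.max(), np.log1p(target_sum))
```

Values produced this way lie between 0 and log1p(target_sum), about 9.21 here, so np.expm1 cannot overflow on them. A minimum of -9.3 and a maximum of 175 therefore point to some additional transformation having been applied to the matrix.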

It is solved. I just wrote the adata to a file and read it back. Everything worked as it should.

Hi,

I also ran into the same problem. I wrote the concatenated adata to a file and read it back, but still got the error.