"ValueError: cannot specify integer `bins` when input data contains infinity" in multi sample data

Behzad · March 15, 2023, 8:37am

I have several samples that I merged into one object (so the adata contains 6 samples). I followed the scanpy tutorial without any problem until I reached the point where I had to extract highly variable genes using this command:

sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)

but keep getting this error:

extracting highly variable genes
/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py:200: RuntimeWarning: overflow encountered in expm1
  X = np.expm1(X)
/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_utils.py:11: RuntimeWarning: overflow encountered in multiply
  mean_sq = np.multiply(X, X).mean(axis=axis, dtype=np.float64)
/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_utils.py:12: RuntimeWarning: invalid value encountered in subtract
  var = mean_sq - mean2
Traceback (most recent call last):
  File "", line 1, in
  File "/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 440, in highly_variable_genes
    df = _highly_variable_genes_single_batch(
  File "/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 215, in _highly_variable_genes_single_batch
    df['mean_bin'] = pd.cut(df['means'], bins=n_bins)
  File "/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/pandas/core/reshape/tile.py", line 263, in cut
    raise ValueError(
ValueError: cannot specify integer `bins` when input data contains infinity

I looked for possible solutions but have not found one yet. Could you please let me know how I can fix the error?

yotamcons · March 16, 2023, 8:52am

Hello Behzad,
The warnings you are getting (before the ValueError) indicate that you have some very large numbers in your dataset, which cause np.expm1 to overflow.

The scanpy documentation for sc.pp.highly_variable_genes states that the function “Expects logarithmized data, except when flavor='seurat_v3' , in which count data is expected.”

My suggestion is to first verify that you have run the standard normalizations (scanpy.pp.normalize_total and scanpy.pp.log1p), then re-run the code and see whether the error repeats.
If it does, I suggest checking the maximal value of your adata.X and making sure that no infinities (np.infty) or humongous values appear there.
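One way to run that check is sketched below. This is only an illustration: `describe_values` is a hypothetical helper, not part of scanpy, and the small array stands in for `adata.X`. It reports the value range, counts NaN and infinite entries, and counts values above roughly 88.7, the point past which `np.expm1` no longer fits in float32.

```python
import numpy as np

def describe_values(X):
    """Summarise the stored values of a dense array or a SciPy-style sparse matrix.

    Hypothetical helper for illustration; not part of scanpy or anndata.
    """
    # Sparse matrices keep their explicit entries in `.data`. The duck-typed
    # `tocsr` check avoids importing scipy; implicit zeros are not included.
    values = X.data if hasattr(X, "tocsr") else X
    values = np.asarray(values, dtype=np.float64)
    # Largest input for which expm1 still fits in float32 (about 88.7).
    limit = float(np.log(np.finfo(np.float32).max))
    return {
        "min": float(values.min()),
        "max": float(values.max()),
        "n_nan": int(np.isnan(values).sum()),
        "n_inf": int(np.isinf(values).sum()),
        "n_above_float32_expm1_limit": int((values > limit).sum()),
    }

# Made-up stand-in for adata.X
X = np.array([[0.0, 1.5], [175.02573, -9.317178]], dtype=np.float32)
report = describe_values(X)
print(report)
```

Note that np.inf (and its older alias np.infty) is a float constant, not a function, so the test for infinities goes through np.isinf or np.isfinite applied to the array.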

Good luck!

Behzad · March 16, 2023, 5:16pm

Hello Yotam,

1- I tried scanpy.pp.normalize_total and scanpy.pp.log1p but got the same error. I tried normalize_per_cell as well.
2- I also checked the max value of my adata.X, which is "175.02573"; the min value is "-9.317178". However, I do not know how these could be used to find a solution for this issue.
3- I tried np.infty, but constantly got this error:

TypeError: ‘float’ object is not callable

Instead, I used the to_list function, which produced a matrix as a Python list (length 32100) of lists (length 24062). I looked for infinity values there and did not find any.

So, I still have the problem. As more info, I have 6 samples (3 controls and 3 conditions) and concatenated them into one variable (adata).
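The reported maximum of 175.02573 is finite, yet it may already be enough to explain the warnings. A minimal sketch, assuming the matrix is stored as float32 (an assumption, since the thread does not state the dtype), replays the three operations named in the warnings on a made-up column of values:

```python
import numpy as np

# Hypothetical gene column: small values plus the maximum reported above.
x = np.array([0.5, 1.2, 175.02573], dtype=np.float32)

# Silence the RuntimeWarnings so the resulting values can be inspected.
with np.errstate(over="ignore", invalid="ignore"):
    linear = np.expm1(x)                                          # overflows to inf in float32
    mean = linear.mean(dtype=np.float64)                          # inf
    mean_sq = np.multiply(linear, linear).mean(dtype=np.float64)  # inf
    var = mean_sq - mean**2                                       # inf - inf gives nan

# The same input is unproblematic in float64.
linear64 = np.expm1(np.float64(175.02573))

print(linear, mean, var, linear64)
```

A per-gene mean that is infinite is the kind of input that pd.cut rejects when it is given an integer number of bins, which matches the final ValueError in the traceback.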

Behzad · March 17, 2023, 10:46am

I tried the same code on every single sample separately and it worked perfectly, but when I combine the samples into adata (concatenate them), I get the same error even if I use only 2 samples.
The only difference I see in adata is that adata.obs has an extra column, batch, compared to when I do a single-sample analysis.

yotamcons · March 30, 2023, 10:37am

Very strange.
The fact that you have a minimal value below 0 in your data means you are not working with counts data, but that you’ve performed some normalization.
I suggest you retry things with the raw data counts, and make sure to concatenate the datasets before you perform any normalizations.
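The order suggested here can be sketched in plain NumPy. This is a stand-in under stated assumptions, not scanpy code: the Poisson matrices are made-up raw counts, and the per-cell scaling to the median total followed by log1p only imitates what scanpy.pp.normalize_total (with its default target) and scanpy.pp.log1p do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw count matrices for two samples (cells x genes).
sample_a = rng.poisson(2.0, size=(50, 20)).astype(np.float32)
sample_b = rng.poisson(5.0, size=(40, 20)).astype(np.float32)

# 1. Concatenate the raw counts first.
counts = np.vstack([sample_a, sample_b])

# 2. Scale every cell to the median total count, then apply log1p.
totals = counts.sum(axis=1, keepdims=True)
target = np.median(totals)
normalized = counts / np.maximum(totals, 1.0) * target  # guard against empty cells
logged = np.log1p(normalized)

# 3. Log-normalized counts are non-negative and small enough for expm1.
print(logged.min(), logged.max(), np.isfinite(np.expm1(logged)).all())
```

Data produced this way cannot contain negative values, so a minimum of -9.317178 suggests that some additional transformation was applied to at least one of the inputs.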

Behzad · April 5, 2023, 10:11pm

It is solved. I just wrote the adata to a file and read it back. Everything worked as it should.

GangLi · June 27, 2023, 10:36pm

Hi,

I also ran into the same problem. I wrote the concatenated adata to a file and read it back, but still got the error.
