I have a few samples and merged them all (so adata contains 6 samples). I followed the scanpy tutorial without any problem until I reached the point where I had to extract highly variable genes using this command:
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
but I keep getting this error:
extracting highly variable genes
/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py:200: RuntimeWarning: overflow encountered in expm1
  X = np.expm1(X)
/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_utils.py:11: RuntimeWarning: overflow encountered in multiply
  mean_sq = np.multiply(X, X).mean(axis=axis, dtype=np.float64)
/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_utils.py:12: RuntimeWarning: invalid value encountered in subtract
  var = mean_sq - mean2
Traceback (most recent call last):
  File "", line 1, in
  File "/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 440, in highly_variable_genes
    df = _highly_variable_genes_single_batch(
  File "/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 215, in _highly_variable_genes_single_batch
    df['mean_bin'] = pd.cut(df['means'], bins=n_bins)
  File "/home/.conda/envs/single_cell_scanpy/lib/python3.10/site-packages/pandas/core/reshape/tile.py", line 263, in cut
    raise ValueError(
ValueError: cannot specify integer `bins` when input data contains infinity
I looked for possible solutions but have not found one yet. Could you please let me know how I can fix this error?
The warnings you are getting (before the ValueError) indicate that your dataset contains some very large values, which cause an overflow (see the np.expm1 warning in your log).
The scanpy documentation for sc.pp.highly_variable_genes states that the function "Expects logarithmized data, except when flavor='seurat_v3', in which count data is expected."
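To see why large values turn into infinity, note that the log above shows highly_variable_genes calling np.expm1(X) to undo the log transform. Assuming adata.X is stored as float32 (an assumption about this dataset, though float32 is common for AnnData matrices), any entry above ln(float32 max), roughly 88.72, becomes inf after expm1. The helper name expm1_overflows is hypothetical; this is a minimal sketch of the overflow, not scanpy's code:

```python
import numpy as np

# Largest finite float32 value (~3.4e38); exp(x) overflows above its natural log
F32_MAX = np.finfo(np.float32).max
OVERFLOW_THRESHOLD = float(np.log(F32_MAX))  # roughly 88.72

def expm1_overflows(values, dtype=np.float32):
    """Return a boolean mask of entries whose expm1 is infinite in `dtype`."""
    arr = np.asarray(values, dtype=dtype)
    # Silence the RuntimeWarning so only the resulting mask is reported
    with np.errstate(over="ignore"):
        return np.isinf(np.expm1(arr))

# Typical log1p-transformed values stay small; a value like 175.02573 does not
mask = expm1_overflows([2.5, 9.2, 88.0, 89.0, 175.02573])
print(OVERFLOW_THRESHOLD)
print(mask)  # [False False False  True  True]
```

The same inputs do not overflow in float64 (the limit there is about 709), which is why a value of 175 is only a problem for float32 data.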
My suggestion is to first verify that you have run the standard normalization steps (e.g. scanpy.pp.log1p), then re-run the code and see whether the error persists.
If it does, I suggest checking the maximum value of adata.X and making sure that no infinite values (np.infty) or extremely large values appear there.
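One way to do that check without converting the matrix to Python lists is a small vectorized summary. The function name summarize_matrix is hypothetical; it assumes adata.X is either a NumPy array or a SciPy sparse matrix (both are common for AnnData). For sparse input, only the stored entries can be inf or NaN, but implicit zeros still count toward the minimum and maximum:

```python
import numpy as np
import scipy.sparse as sp

def summarize_matrix(X):
    """Report finite min/max and the number of inf/NaN entries of a dense or sparse matrix."""
    if sp.issparse(X):
        values = X.data  # explicitly stored entries only
        has_implicit_zeros = X.nnz < X.shape[0] * X.shape[1]
    else:
        values = np.asarray(X).ravel()
        has_implicit_zeros = False
    finite = values[np.isfinite(values)]
    if has_implicit_zeros:
        # Unstored entries of a sparse matrix are zeros and take part in min/max
        finite = np.append(finite, 0.0)
    return {
        "min_finite": float(finite.min()) if finite.size else None,
        "max_finite": float(finite.max()) if finite.size else None,
        "n_inf": int(np.isinf(values).sum()),
        "n_nan": int(np.isnan(values).sum()),
    }

print(summarize_matrix(np.array([[1.0, np.inf], [-3.0, np.nan]])))
print(summarize_matrix(sp.csr_matrix(np.array([[0.0, 5.0], [0.0, 0.0]]))))
```

Usage would be summarize_matrix(adata.X). Because the overflow happens inside np.expm1, a large max_finite is as informative here as a nonzero n_inf.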
1- I tried scanpy.pp.log1p but got the same error. I tried normalize_per_cell as well.
2- I also checked the max value of adata.X, which is 175.02573, and the min value is -9.317178, but I do not know how this could be used to find a solution.
3- I tried np.infty but constantly got this error:
TypeError: 'float' object is not callable
Instead, I used the to_list function, which produced a matrix (a Python list of length 32100 whose elements are lists of length 24062). I looked for infinity values there and did not find any.
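The TypeError in point 3 comes from calling a constant: np.infty is an alias of np.inf (the alias was removed in NumPy 2.0), and np.inf is a plain Python float, not a function. A minimal sketch showing the error and a vectorized alternative to building nested lists (the helper names are hypothetical; for a sparse adata.X you would pass adata.X.data):

```python
import numpy as np

def call_inf_like_a_function():
    """Reproduce the error: np.inf is a float constant, so it cannot be called."""
    try:
        np.inf([1.0, 2.0])
    except TypeError as err:
        return str(err)
    return None

def has_infinity(X):
    """Vectorized check for +/-inf in a dense NumPy array, no list conversion needed."""
    return bool(np.isinf(X).any())

X = np.array([[0.5, 175.02573], [-9.317178, 1.0]], dtype=np.float32)
print(call_inf_like_a_function())  # 'float' object is not callable
print(has_infinity(X))             # False
```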
So I still have the problem. For more context, I have 6 samples (3 controls and 3 conditions) and concatenated them into one variable (adata).
I tried the same code on every single sample separately and it worked perfectly, but when I combine the samples into adata (concatenate them), I get the same error even with only 2 samples.
The only difference I see in adata compared to single-sample analysis is that adata.obs has an extra column, batch.
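Since the error only appears after concatenation, one possible diagnostic (an assumption, not something suggested in this thread) is to compare value ranges per batch: if one sample was log-transformed or scaled and another was not, their maxima will differ sharply. The function max_per_batch is hypothetical and expects a dense matrix plus one label per cell, e.g. a densified adata.X and adata.obs["batch"]:

```python
import numpy as np

def max_per_batch(X, batch_labels):
    """Return {batch: max value} over the rows (cells) of a dense matrix grouped by batch."""
    X = np.asarray(X)
    labels = np.asarray(batch_labels)
    if X.shape[0] != labels.shape[0]:
        raise ValueError("need one batch label per row (cell)")
    return {str(b): float(X[labels == b].max()) for b in np.unique(labels)}

# Batch "0" looks log-scaled, batch "1" looks like raw or differently processed values
X = np.array([[0.0, 2.5], [1.0, 0.5], [175.0, 3.0]])
print(max_per_batch(X, ["0", "0", "1"]))  # {'0': 2.5, '1': 175.0}
```

On a large sparse matrix, densifying everything may not fit in memory; checking one batch slice at a time avoids that.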
The fact that the minimum value in your data is below 0 means you are not working with count data, but that you've already performed some normalization.
I suggest you retry with the raw counts, and make sure to concatenate the datasets before you perform any normalization.
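The suggested order (concatenate raw counts first, then normalize once) can be sketched with plain NumPy. In scanpy the corresponding steps would be anndata.concat (for example with label="batch") followed by sc.pp.normalize_total and sc.pp.log1p; the helper concat_then_normalize below only approximates those steps, and target_sum=1e4 is an assumed value:

```python
import numpy as np

def concat_then_normalize(count_matrices, target_sum=1e4):
    """Stack raw per-sample count matrices, then normalize and log-transform once."""
    # 1) Concatenate raw counts first (cells x genes, same gene order in every sample)
    counts = np.vstack([np.asarray(m, dtype=np.float64) for m in count_matrices])
    # 2) Scale every cell to the same total; all-zero cells are left at zero
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    normalized = counts / totals * target_sum
    # 3) log(1 + x), giving the logarithmized data the default flavor expects
    return np.log1p(normalized)

sample_a = np.array([[10, 0, 5], [0, 3, 1]])
sample_b = np.array([[200, 50, 0]])
logged = concat_then_normalize([sample_a, sample_b])
print(logged.shape)         # (3, 3)
print(logged.min() >= 0.0)  # True
```

After this step every value is non-negative and at most log1p(target_sum), about 9.21, so np.expm1 inside highly_variable_genes stays far from the float32 overflow range.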
It is solved. I just wrote adata to a file and read it back, and everything worked as it should.
I ran into the same problem. I wrote the concatenated adata to a file and read it back, but still got the error.