I have a few samples and merged them all (so adata contains 6 samples), and I followed the scanpy tutorial without any problem until I reached the point where I had to extract highly variable genes using this command:
Hello Behzad,
The warning you are getting (before the ValueError) indicates that you have some very large numbers in your dataset, which cause an overflow in the np.expm1 function.
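A small NumPy sketch of that overflow: np.expm1 computes exp(x) - 1, and exp(x) exceeds the largest representable float32 once x is above roughly 88.7 (roughly 709.8 for float64). Storing the matrix as float32 is an assumption here; check adata.X.dtype in your own data. The value 175.0 is just an illustrative large entry.

```python
import numpy as np

# Largest x for which exp(x) still fits in each float type (ln of the dtype's max).
limit32 = float(np.log(np.finfo(np.float32).max))  # about 88.7
limit64 = float(np.log(np.finfo(np.float64).max))  # about 709.8

x32 = np.array([10.0, 175.0], dtype=np.float32)
with np.errstate(over="ignore"):  # silence the overflow RuntimeWarning for this demo
    y32 = np.expm1(x32)  # 175.0 is above limit32, so that entry becomes inf

# The same values in float64 stay finite.
y64 = np.expm1(x32.astype(np.float64))

print(limit32, limit64)
print(y32)
print(np.isfinite(y64).all())
```

Without the np.errstate block, NumPy emits an "overflow encountered in expm1" RuntimeWarning for the float32 case, which is the kind of warning that precedes the error here.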
The scanpy documentation for sc.pp.highly_variable_genes states that the function "Expects logarithmized data, except when flavor='seurat_v3', in which count data is expected."
My suggestion is to first verify that you have run the standard normalizations (scanpy.pp.normalize_total and scanpy.pp.log1p), then re-run the code and see if the error repeats.
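To make the two steps concrete, here is a plain-NumPy illustration of the idea behind normalize_total followed by log1p: scale each cell (row) to the same total, then apply log(1 + x). This is not scanpy's implementation, and target_sum=1e4 is an assumed value (the one used in the scanpy PBMC tutorial); scanpy's own default may differ.

```python
import numpy as np

def normalize_and_log(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x).

    Illustration of the normalize_total + log1p idea, not scanpy's code.
    """
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid dividing by zero for cells with no counts
    return np.log1p(counts / totals * target_sum)

counts = np.array([[0, 2, 8], [5, 5, 0]])  # toy matrix: 2 cells x 3 genes
logged = normalize_and_log(counts)

# Undoing the log with expm1 recovers rows that each sum to target_sum.
print(np.expm1(logged).sum(axis=1))
```

Because log1p of non-negative values is non-negative, data produced only by these two steps should never contain negative entries.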
If it does, I suggest checking the maximum value of your adata.X and making sure that no infinite values (np.infty) or extremely large values appear there.
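One way to run that check in a single pass is sketched below; summarize_values is a hypothetical helper, not part of scanpy. It reports the range of the finite entries and counts any inf or NaN entries, using vectorized NumPy instead of converting the matrix to Python lists.

```python
import numpy as np

def summarize_values(values):
    """Return min/max of the finite entries plus counts of inf and NaN entries."""
    values = np.asarray(values)
    finite = values[np.isfinite(values)]
    return {
        "min": float(finite.min()) if finite.size else None,
        "max": float(finite.max()) if finite.size else None,
        "n_inf": int(np.isinf(values).sum()),
        "n_nan": int(np.isnan(values).sum()),
    }

X = np.array([[0.0, 3.5], [np.inf, -1.0]], dtype=np.float32)  # toy data
print(summarize_values(X))
```

For a dense adata.X you could pass the matrix directly; if adata.X is a scipy sparse matrix, passing adata.X.data checks only the stored (nonzero) entries, so implicit zeros are not reflected in the reported min and max.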
1- I tried scanpy.pp.normalize_total and scanpy.pp.log1p but got the same error. I tried normalize_per_cell as well.
2- I also checked the max value of my adata.X, which is 175.02573, and the min value, which is -9.317178, but I do not know how this could be used to find a solution to this issue.
3- I tried np.infty, but I constantly got this error:
TypeError: 'float' object is not callable
So instead I used the to_list function, which gave me a matrix (a Python list of 32100 lists, each of length 24062). I looked for infinite values there and did not find any.
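The TypeError is presumably what happens when the constant is called like a function: np.infty (an alias of np.inf that was removed in NumPy 2.0) is a plain float, not a function. The sketch below reproduces the error on a toy matrix and shows a vectorized infinity check that avoids building a 32100 x 24062 nested list.

```python
import numpy as np

X = np.array([[0.5, 3.0], [np.inf, 1.2]], dtype=np.float32)  # toy matrix

# np.inf is a float constant, so calling it raises the same TypeError.
try:
    np.inf(X)
except TypeError as err:
    message = str(err)

# Vectorized checks: compare against the constant or use np.isinf.
has_inf = bool(np.isinf(X).any())
n_inf = int((X == np.inf).sum())

print(message)
print(has_inf, n_inf)
```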
So I still have the problem. For more context, I have 6 samples (3 controls and 3 conditions) and concatenated them into one object (adata).
I ran the same code on every single sample separately and it worked perfectly, but when I combine the samples into adata (concatenate them), I get the same error, even if I use only 2 samples.
The only difference I see in adata, compared to the single-sample analysis, is that adata.obs has an extra column, batch.
Very strange.
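The fact that your data has a minimum value below 0 means you are not working with count data, but that you've already performed some normalization. (normalize_total and log1p alone cannot turn non-negative counts into negative values, so a step such as scaling has likely been applied as well.)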
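A quick heuristic for telling raw counts apart from transformed data is sketched below; looks_like_counts is a hypothetical helper. Raw counts should be non-negative whole numbers, so a minimum like -9.317178 or any fractional entry rules them out.

```python
import numpy as np

def looks_like_counts(values):
    """Heuristic: raw counts are non-negative whole numbers."""
    values = np.asarray(values)
    if values.size == 0:
        return True
    non_negative = bool((values >= 0).all())
    # np.mod of inf/NaN is NaN, which fails the == 0 test, so those entries fail too.
    whole = bool(np.all(np.mod(values, 1) == 0))
    return non_negative and whole

print(looks_like_counts([[0, 3], [1, 7]]))      # integer counts
print(looks_like_counts([[0.0, 1.2]]))          # fractional, e.g. log-normalized
print(looks_like_counts([[-9.317178, 175.0]]))  # negative, e.g. scaled
```

This only inspects values; it cannot tell you which transformation was applied, just that the matrix is no longer raw counts.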
I suggest you retry with the raw counts, and make sure to concatenate the datasets before you perform any normalization.
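The suggested order can be sketched with toy NumPy matrices: concatenate the raw counts first while recording a batch label per cell, then normalize the combined matrix. In scanpy this would roughly correspond to concatenating the AnnData objects (for example with anndata.concat) and only then calling normalize_total and log1p; the matrices and the 1e4 target below are illustrative assumptions.

```python
import numpy as np

# Two toy raw-count matrices (cells x genes) with the same gene order.
sample_a = np.array([[1, 0, 4], [2, 2, 0]])
sample_b = np.array([[0, 3, 3]])

# 1) Concatenate the raw counts first, recording which sample each cell came from.
counts = np.vstack([sample_a, sample_b])
batch = np.repeat(["a", "b"], [sample_a.shape[0], sample_b.shape[0]])

# 2) Only then normalize the combined matrix: per-cell scaling to 1e4, then log1p.
totals = counts.sum(axis=1, keepdims=True)
logged = np.log1p(counts / totals * 1e4)

print(counts.shape, list(batch))
print(logged.min() >= 0)
```

Keeping the batch labels alongside the combined matrix mirrors the extra batch column that appears in adata.obs after concatenation.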