Hello scverse community,
I need some help troubleshooting an issue I do not understand.
I have a (non-public) H5ad file with a dataset that was analysed (QC, cell/gene/sample annotation, normalisation, dimensionality reduction, clustering, cluster annotation) using scanpy (but I do not have access to the actual code used for these steps).
I want to run (per-cluster) differential gene expression analyses on that dataset, so I need the raw (integer) counts.
However, the AnnData object does not have any layers, and `adata.X` contains normalised (and scaled) values.
From the AnnData documentation, I understood that `adata.raw.X` should contain the raw data, i.e. integer counts.
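However, running

```python
import numpy as np

# np.unique already returns the unique values in sorted order
vals = np.unique(adata.raw.X.data)  # .data = stored non-zero entries of the sparse matrix
print(vals)
```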
returns

```
array([0.09674773, 0.10480512, 0.10735171, ..., 4.8433833 , 4.852986 ,
       4.8590107 ], dtype=float32)
```
Does anyone have a clue
- what those values could be (they are not, for example, natural logs of integers), and
- how these values could have ended up in `adata.raw.X`?
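One hypothesis I have not verified yet: a common scanpy pattern (e.g. in the PBMC3k clustering tutorial) is to set `adata.raw = adata` after `sc.pp.normalize_total` and `sc.pp.log1p`, in which case `.raw` holds log-normalised values rather than integer counts. Assuming that is what happened here, and assuming every cell has at least one gene with a raw count of exactly 1, undoing `log1p` with `np.expm1` and dividing each cell by its smallest non-zero value should give back integers. The sketch below checks this; it is a heuristic, not a guaranteed recovery, and both helper names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def is_integer_valued(X, atol=1e-6):
    """True if every stored value of X is a non-negative whole number (within atol)."""
    values = X.data if sp.issparse(X) else np.asarray(X).ravel()
    return bool(np.all(values >= 0) and np.allclose(values, np.round(values), rtol=0, atol=atol))


def rescale_after_expm1(X):
    """Heuristic inverse of log1p(counts_ij * f_i) for an unknown per-cell factor f_i.

    Assumes every cell (row) has at least one gene whose raw count is exactly 1.
    Returns a new float64 CSR matrix; the input X is left unchanged.
    """
    X = sp.csr_matrix(X, dtype=np.float64, copy=True)
    X.eliminate_zeros()        # drop explicitly stored zeros so each row minimum is > 0
    X.data = np.expm1(X.data)  # undo log1p on the stored non-zero entries
    for i in range(X.shape[0]):
        start, end = X.indptr[i], X.indptr[i + 1]
        if end > start:
            # rescale this cell so that its smallest non-zero value becomes 1
            X.data[start:end] /= X.data[start:end].min()
    return X
```

On the real data this would be something like `is_integer_valued(rescale_after_expm1(adata.raw.X), atol=1e-2)`; because the stored values are float32, a looser tolerance than the default is probably needed.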
I was under the impression that `adata = adata.raw` should
- restore the raw data to `adata.X`,
- drop any layers that may exist (in my case: none), but
- preserve the metadata associated with the cells (so I could use the existing clusters to generate pseudobulk samples by summing up the raw counts).
Is this a wrong assumption of mine or a bug somewhere?
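For reference, this is roughly how I intended to build the pseudobulk samples once integer counts are available. It is only a sketch: `pseudobulk_sum` and the `adata.obs` column name `"cluster"` used below are hypothetical, and the count matrix itself is exactly what is missing at the moment.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def pseudobulk_sum(X, groups, var_names):
    """Sum the rows (cells) of a counts matrix within each group.

    X: cells x genes matrix (dense NumPy array or SciPy sparse matrix).
    groups: one label per cell, e.g. a cluster column from adata.obs.
    var_names: gene names for the columns of X.
    Returns a groups x genes DataFrame of summed counts.
    """
    groups = pd.Categorical(groups)
    if (groups.codes < 0).any():
        raise ValueError("every cell needs a non-missing group label")
    n_cells = len(groups)
    # Sparse one-hot indicator: entry (g, c) is 1 if cell c belongs to group g
    indicator = sp.csr_matrix(
        (np.ones(n_cells), (groups.codes, np.arange(n_cells))),
        shape=(len(groups.categories), n_cells),
    )
    summed = indicator @ X  # groups x genes, small enough to densify
    if sp.issparse(summed):
        summed = summed.toarray()
    return pd.DataFrame(np.asarray(summed), index=groups.categories, columns=var_names)
```

Usage would be something like `pseudobulk_sum(counts, adata.obs["cluster"], adata.raw.var_names)`, where `counts` is an integer count matrix with cells in the same order as `adata.obs`. Multiplying by a sparse indicator matrix avoids densifying the full cells x genes matrix.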
Thank you in advance for your help!
Cheers,
Marcel