How can I export list of genes and counts for each cluster from adata.raw?

st4302 · April 2, 2024, 8:24pm

Hello,

I have an anndata object from a sample that I am processing through scanpy. I have reached the point where I did the leiden clustering. I would like to export to a csv file all the genes expressed in each cluster and the counts for each gene.

df_gene_expression = pd.DataFrame(adata.X, index=adata.obs.index, columns=adata.var.index)
df_gene_expression['cluster'] = adata.obs['leiden_0.4']

The above seems to work but I also want to do this for the adata.raw data.
I first ran the line below and then ran the command to create the dataframe again, this time on ad5.

ad5 = adata.raw.to_adata()

I get this message:

ValueError: Shape of passed values is (11647, 1), indices imply (11647, 18845)

I looked at the ad5.X and it looks like this:

<11647x18845 sparse matrix of type '<class 'numpy.float32'>'
with 16326262 stored elements in Compressed Sparse Row format>

How can I use it properly to export the list?

Thank you
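
One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the repr above shows that ad5.X is a SciPy sparse matrix, and pd.DataFrame does not expand a sparse matrix into a 2D table the way it does a NumPy array. Densifying first with .toarray() sidesteps that. The example below uses toy stand-ins (X, cells, genes, clusters, and the helper to_dense_frame are all hypothetical names) in place of ad5.X, ad5.obs.index, ad5.var.index and adata.obs['leiden_0.4'].

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-ins for ad5.X, ad5.obs.index, ad5.var.index and adata.obs['leiden_0.4'].
X = sparse.csr_matrix(
    np.array([[1, 0, 2], [0, 3, 0], [4, 0, 0], [0, 0, 5]], dtype=np.float32)
)
cells = ["cell1", "cell2", "cell3", "cell4"]
genes = ["geneA", "geneB", "geneC"]
clusters = pd.Series(["0", "1", "0", "1"], index=cells, name="cluster")


def to_dense_frame(X, index, columns):
    """Build a cell-by-gene DataFrame, densifying X first if it is sparse."""
    dense = X.toarray() if sparse.issparse(X) else np.asarray(X)
    return pd.DataFrame(dense, index=index, columns=columns)


# Cell-by-gene table, same layout as df_gene_expression in the question.
df = to_dense_frame(X, cells, genes)

# Sum each gene's values over the cells of each cluster.
per_cluster = df.groupby(clusters).sum()

# With no path, to_csv returns the CSV text; pass a filename to write to disk.
csv_text = per_cluster.to_csv()
```

With the real object, the call would presumably be to_dense_frame(ad5.X, ad5.obs.index, ad5.var.index), and AnnData's to_df() method should give an equivalent dense table. Note that a dense 11647 x 18845 float32 array takes roughly 0.9 GB, so this assumes enough memory is available.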

ivirshup · April 12, 2024, 10:06pm

That’s weird!

Please report this as a bug over on anndata:
