Thanks for the reply.
There is a lot of publicly available RNA-seq data. I’m wondering how to get it into the proper format for scanpy.
Any ideas?
As @ivirshup mentioned, the problem is that each GEO entry has a different format: some store just a CSV, others h5 files, others zip files, etc. So depending on the dataset, you will need to tailor the processing that turns it into an AnnData object.
The Bioconductor community has previously tackled this problem with recount (recount2: analysis-ready RNA-seq gene and exon counts datasets), a resource consisting of many RNA-seq datasets available in the SummarizedExperiment format. You could retrieve these objects and then convert them to AnnData using zellkonverter (Conversion Between scRNA-seq Objects • zellkonverter). Unfortunately, I don’t think there is an alternative that is 100% native in Python. The closest thing is the scanpy function sc.datasets.ebi_expression_atlas(), which lets you download scRNA-seq datasets stored in the EBI Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/experiments).
Alternatively, if you are interested in only one dataset from GEO, you can always download it and manually process it into an AnnData object. For an example, check the beginning of this vignette in decoupler: Bulk functional analysis — decoupler 1.2.1 documentation