I have a question about importing data from GEO into a scanpy object.
I want to turn the GSE81608 dataset (GEO Accession viewer) into a scanpy object.
Usually I build a scanpy object from three files (barcodes, features, matrix), but GSE81608 has only one .txt file.
So how can I convert the GSE81608 dataset into a scanpy object?
There’s no standardization of the files uploaded to GEO, so you’re going to have to figure it out on a dataset-by-dataset basis.
Thanks for the reply.
There is a lot of publicly available RNA-seq data. I’m wondering how to get it into the proper format for scanpy.
As @ivirshup mentioned, the problem is that each GEO entry has a different format: some store just a CSV file, others h5 files, others zip archives, and so on. So, depending on the dataset, you will need to tailor the processing into an AnnData object.
The Bioconductor community has previously tackled this problem with recount (recount2: analysis-ready RNA-seq gene and exon counts datasets), a resource consisting of many RNA-seq datasets available in the SummarizedExperiment format. You could retrieve these objects and then convert them to AnnData using Zellkonverter (Conversion Between scRNA-seq Objects • zellkonverter). Unfortunately, I don’t think there is a fully native Python alternative. The closest thing is sc.datasets.ebi_expression_atlas(), which lets you download scRNA-seq datasets stored in the EBI Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/experiments).
Alternatively, if you are only interested in one dataset from GEO, you can always download it and manually process it into an AnnData object. For an example, check the beginning of the "Bulk functional analysis" vignette in the decoupler 1.2.1 documentation.
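For the download step, GEO serves supplementary files over HTTPS from directories that group series by accession, with the last three digits replaced by "nnn" (GSE81608 lives under GSE81nnn). The sketch below builds that URL and fetches one file; the helper names are hypothetical, and the exact file name has to be copied from the GEO accession page, since it varies per series.

```python
import urllib.request
from pathlib import Path


def geo_suppl_url(accession):
    """Directory URL for a GEO series' supplementary files (hypothetical helper)."""
    accession = accession.upper()
    digits = accession[3:]
    # GEO buckets series by replacing the last three digits with "nnn"
    stub = "GSE" + (digits[:-3] if len(digits) > 3 else "") + "nnn"
    return f"https://ftp.ncbi.nlm.nih.gov/geo/series/{stub}/{accession}/suppl/"


def download_geo_file(accession, filename, dest="."):
    """Download one supplementary file; `filename` must come from the GEO page."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / filename
    urllib.request.urlretrieve(geo_suppl_url(accession) + filename, out)
    return out
```

Once the file is on disk, read it with pandas (gzip is handled automatically) and wrap the resulting table in AnnData, transposing first if genes are in rows.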
Hope this is helpful!
Thanks a lot for the comprehensive reply. The resources you pointed out are very helpful.