Loading ATAC dataset

Hello - I’m a novice with multimodal analysis, but relatively comfortable with scanpy. I have a scATAC dataset (GEO Accession viewer) that I’m trying to pair with another scRNA dataset, but I’m having trouble loading the ATAC dataset with muon.

Here is the data in question:

# Download the RAW file from the output above
ftp_link = r'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE182nnn/GSE182134/suppl/GSE182134_RAW.tar'
! curl -O $ftp_link 

After unzipping, I get the following files

./E11_5rep1/
./E11_5rep1/fragments.tsv.gz.tbi
./E11_5rep1/singlecell.csv
./E11_5rep1/fragments.tsv.gz

etc. … I’ve tried using muon.read_10x_mtx(path_), where path_ = r'/content/E11_5rep1/', but I get the error:

FileNotFoundError: Did not find file /content/E11_5rep1/matrix.mtx.gz.

… and I’m not sure how to troubleshoot from there. Any help is appreciated!

Best,
Panos

Hi Panos,

read_10x_mtx expects a (peak x cell) count matrix, which doesn’t exist in your folder.
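
If you want to confirm this yourself, a quick check along these lines lists which matrix files are absent. It is only a sketch: the helper name missing_10x_mtx_files is made up for illustration, and the list of file names is my assumption about the gzipped 10x matrix layout (your error message confirms matrix.mtx.gz; the other two are its usual companions).

```python
from pathlib import Path

# Assumed file names for the gzipped 10x matrix layout.
EXPECTED_FILES = ("matrix.mtx.gz", "barcodes.tsv.gz", "features.tsv.gz")


def missing_10x_mtx_files(folder):
    """Return the expected matrix files that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in EXPECTED_FILES if not (folder / name).exists()]
```

For a folder that only holds fragments and singlecell.csv, such as /content/E11_5rep1/, this should return all three names.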

Generally, ATAC preprocessing includes:
[1] alignment: fastqs + reference → fragments
[2] peak calling: fragments → count matrix

The files you have are the output from [1], so you still need to do [2].
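
To make step [2] concrete, here is a rough sketch of turning fragments into counts. Note that it swaps peak calling for something simpler: it counts fragments in fixed-width genomic bins, which is not what a real peak caller does. The function name count_fragments_in_bins and the 5000 bp default are my own choices for illustration, and I am assuming the usual tab-separated fragments layout (chrom, start, end, barcode, count).

```python
import gzip
from collections import Counter


def count_fragments_in_bins(fragments_path, bin_size=5000):
    """Count fragments per (chromosome, bin start, barcode).

    Assumes tab-separated columns: chrom, start, end, barcode, count.
    Lines starting with '#' are treated as comments and skipped.
    """
    counts = Counter()
    with gzip.open(fragments_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            chrom, start, _end, barcode = line.rstrip("\n").split("\t")[:4]
            # Simplification: assign each fragment to the bin that contains
            # its start coordinate, and count it once.
            bin_start = (int(start) // bin_size) * bin_size
            counts[(chrom, bin_start, barcode)] += 1
    return counts
```

The result is a sparse (bin x cell) table in dictionary form. A proper pipeline would additionally filter barcodes, call peaks, and write the matrix files to disk.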

You can do this with various preprocessing pipelines (e.g. ArchR), but the easiest is probably 10x’s cellranger, specifically the aggr function of the ATAC pipeline.

If you have multiple samples, make sure to include all of them in this step so that they all use the same set of peaks.

The output from the pipeline will include the files read_10x_mtx is looking for.
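
As far as I remember, aggr takes a CSV that lists one library per row, so a small helper like the one below could build it from your extracted folders. Treat it as a sketch under assumptions: the function name write_aggr_csv is hypothetical, and the header library_id,fragments,cells is how I recall the format from the 10x documentation, so please check it against the version you install.

```python
import csv
from pathlib import Path


def write_aggr_csv(root, out_csv):
    """Write one CSV row per sample folder under `root`.

    A folder counts as a sample only if it holds both fragments.tsv.gz
    and singlecell.csv. Returns the list of sample names written.
    """
    rows = []
    for sample_dir in sorted(Path(root).iterdir()):
        fragments = sample_dir / "fragments.tsv.gz"
        cells = sample_dir / "singlecell.csv"
        if sample_dir.is_dir() and fragments.exists() and cells.exists():
            rows.append([sample_dir.name, str(fragments.resolve()), str(cells.resolve())])
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Assumed header; verify against your cellranger-atac version.
        writer.writerow(["library_id", "fragments", "cells"])
        writer.writerows(rows)
    return [row[0] for row in rows]
```

You would then pass the resulting file to aggr through its --csv option, together with a reference; the 10x documentation has the exact arguments.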

Good luck,
Tal

This is a little overdue, but for anyone having similar problems, snapatac2 is a very user-friendly preprocessing pipeline for scATAC data.

Thank you for the tip!