Preprocessing scATAC data for PeakVI or MultiVI

Hi,

Many thanks for providing scvi-tools; it helps a lot when I integrate multiple datasets.

I processed my scATAC data with snapatac2, but snapatac2 outputs scATAC as bin counts (not peak counts). I tried many tools to integrate scRNA and scATAC using either the bin counts (5k) or a gene activity matrix generated from the bin counts (5k), but I could not get proper integration results.

Could you offer some suggestions? Do you think bin counts could be used for PeakVI or MultiVI, or could you suggest a preprocessing pipeline for merging multiple scATAC datasets?

Many thanks,
Xuan

Hi, MultiVI is intended to integrate, e.g., multiome data with RNA-only or ATAC-only data, but not to integrate RNA with ATAC data when no cells have both measured. Most other models also require some cells with joint measurements. I’m not aware of other tools that I would recommend for this task.

Many thanks for your quick reply. I am integrating scRNA and scATAC with multiomic data, so I tried MultiVI first.
I went through the MultiVI tutorial, but there the test data (scRNA, scATAC and multiomic) are all subsets of one multiomic dataset, which means they share the same var names. If I run MultiVI on my data, should I align every sample to the same var names, or is some overlap between the scATAC and multiomic features sufficient?

Best,
Xuan

Hi, you want to do peak calling across those datasets so that your ATAC data share a joint set of features. For example, the SnapATAC2 2.8.0 documentation tutorial "Atlas-scale Analysis: a cell atlas of human chromatin accessibility" provides guidelines for identifying joint peaks.
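
To illustrate what a "joint set of features" means, here is a minimal pure-Python sketch that union-merges overlapping peaks from two datasets into one shared peak list. This is only a simplified illustration, not the procedure from the SnapATAC2 tutorial; the toy coordinates and the helper names `merge_peaks` and `to_var_names` are made up for the example.

```python
from collections import defaultdict


def merge_peaks(peak_sets):
    """Union-merge overlapping peaks from several datasets.

    peak_sets: iterable of lists of (chrom, start, end) tuples.
    Returns a sorted list of merged (chrom, start, end) intervals.
    """
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def to_var_names(peaks):
    """Format intervals as 'chrom:start-end' strings usable as var names."""
    return [f"{c}:{s}-{e}" for c, s, e in peaks]


# Hypothetical toy peaks called separately on two datasets.
atac_only = [("chr1", 100, 600), ("chr1", 5000, 5400)]
multiome = [("chr1", 450, 900), ("chr2", 200, 700)]

joint = merge_peaks([atac_only, multiome])
joint_names = to_var_names(joint)
print(joint_names)
```

Every dataset would then be re-counted against this one joint list, so the ATAC var names match exactly across samples instead of only partially overlapping.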

Hi,

Thanks for your reply. I also tried snapatac2, but because it generates a bin matrix, I now generate the matrix with cellranger aggr instead.

Best,
Xuan

Bin counts should work just as well as peak counts. In my experience at least, integration depends on which regions you use (assuming you are using MultiVI or a similar VAE approach). If you use bulk peak calling to identify the regions to count, you are more likely to miss key regions that are cell-type selective, especially in smaller populations. Junk in, junk out.
In contrast, I’ve been able to integrate scRNA and scATAC relatively well in MultiVI when I select the right regions via non-bulk peak calling methods.

Here’s an example of an approach that does not rely on bulk peak calling (as well as a discussion as to why it matters).

https://www.nature.com/articles/s41467-024-50612-6
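
On the bin-count point above: one practical property of fixed-width bins is that any two samples binned on the same genome build with the same bin size automatically get the same feature names, so no separate alignment step is needed. Below is a rough pure-Python sketch of that idea. It assigns each fragment to a bin by its midpoint, which is a simplification and not necessarily how snapatac2 counts; the fragments, the naming scheme and the helper names are made up for the example.

```python
from collections import Counter

BIN_SIZE = 5000  # 5 kb bins, matching the bin size mentioned in the question


def bin_name(chrom, pos, bin_size=BIN_SIZE):
    """Return the name of the fixed-width bin that contains position `pos`."""
    start = (pos // bin_size) * bin_size
    return f"{chrom}:{start}-{start + bin_size}"


def count_bins(fragments, bin_size=BIN_SIZE):
    """Count fragments per bin, assigning each fragment by its midpoint.

    fragments: iterable of (chrom, start, end) tuples.
    """
    counts = Counter()
    for chrom, start, end in fragments:
        mid = (start + end) // 2
        counts[bin_name(chrom, mid, bin_size)] += 1
    return counts


# Hypothetical toy fragments from two independently processed samples.
sample_a = [("chr1", 1200, 1400), ("chr1", 4900, 5300), ("chr2", 10050, 10200)]
sample_b = [("chr1", 100, 300), ("chr2", 12000, 12100)]

counts_a = count_bins(sample_a)
counts_b = count_bins(sample_b)

# Bin names line up across samples without any peak calling.
shared = sorted(set(counts_a) & set(counts_b))
print(shared)
```

The remaining question is then which bins to keep as features, which is the region-selection issue discussed above.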