Preprocessing scATAC data for use with peakVI or MultiVI

Xuann · June 19, 2025, 2:24pm

Hi,

Many thanks for providing scvi-tools; it really helps a lot when I integrate multiple datasets.

I processed my scATAC data with SnapATAC2, but SnapATAC2 outputs scATAC as bin counts (not peak counts). I tried many tools to integrate scRNA and scATAC using either the bin counts (5k) or a gene activity matrix generated from the bin counts (5k), but I could not get proper integration results.

Do you have any suggestions? Do you think bin counts can be used for peakVI or MultiVI, or could you suggest a preprocessing pipeline for merging multiple scATAC datasets?

Many thanks,
Xuan

cane11 · June 19, 2025, 5:51pm

Hi, MultiVI is intended to integrate, e.g., multiome data with RNA-only or ATAC-only data, but not to integrate RNA-only with ATAC-only data on their own. Most other models also require some cells with joint measurements. I’m not aware of other tools that I would recommend for this task.

Xuann · June 19, 2025, 6:17pm

Many thanks for your quick reply. I am integrating scRNA and scATAC with multiomic data, so I first tried integration with MultiVI.
I went through the MultiVI tutorial, but there the test data (scRNA, scATAC and multiomic) are all subsets of one multiomic dataset, which means they share the same var names. If I run MultiVI with my data, should I align every sample to the same var names, or is some overlap between the scATAC and multiomic features sufficient?

Best,
Xuan

cane11 · August 26, 2025, 7:36am

Hi, you want to do peak calling across those datasets to have a joint set of features in your ATAC data. For example, the tutorial "Atlas-scale Analysis: a cell atlas of human chromatin accessibility" in the SnapATAC2 2.8.0 documentation provides guidelines for identifying joint peaks.
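
To make "a joint set of features" concrete, here is a minimal sketch in plain Python. It assumes each dataset already has its own peak calls as (chrom, start, end) tuples and simply union-merges overlapping intervals. The helper `union_merge_peaks` and the example coordinates are hypothetical, and this simple union is not the merging procedure SnapATAC2 itself uses; it only illustrates the idea that every dataset ends up counted against the same regions.

```python
from collections import defaultdict


def union_merge_peaks(peak_sets):
    """Union-merge overlapping (chrom, start, end) intervals from several datasets."""
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps (or touches) the current interval: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Hypothetical peak calls from an ATAC-only sample and a multiome sample.
atac_only_peaks = [("chr1", 100, 600), ("chr1", 5000, 5500), ("chr2", 300, 800)]
multiome_peaks = [("chr1", 450, 900), ("chr2", 2000, 2500)]

joint_peaks = union_merge_peaks([atac_only_peaks, multiome_peaks])

# One shared list of var names that every ATAC matrix would be counted against.
joint_var_names = [f"{c}:{s}-{e}" for c, s, e in joint_peaks]
print(joint_var_names)
```

Each dataset's fragments would then be re-counted over `joint_peaks`, so the ATAC-only and multiome matrices share identical var names rather than a partial overlap.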

Xuann · August 26, 2025, 12:53pm

Hi,

Thanks for your reply. I also tried SnapATAC2; I now generate the matrix with cellranger aggr, because SnapATAC2 generates a bin matrix.

Best,
Xuan

MP_Epana · September 3, 2025, 9:39pm

Bin counts should work just as well as peak counts. In my experience at least, integration depends on which peaks you use (assuming you are using MultiVI or other similar VAE approaches). If you use bulk peak calling to identify regions to count, you are more likely to miss key regions that are more cell-type selective, especially in smaller populations. Junk in, junk out.
In contrast, I’ve been able to integrate scRNA and scATAC relatively well in MultiVI, if I select the correct regions via non-bulk peak calling methods.

Here’s an example of an approach that does not rely on bulk peak calling (as well as a discussion as to why it matters).

https://www.nature.com/articles/s41467-024-50612-6
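
As a toy illustration of why bulk region selection can miss regions specific to small populations, here is a sketch in plain Python. It assumes a binarized cells-by-regions matrix and pre-existing cluster labels, and it uses a simple "fraction of accessible cells" threshold as a stand-in for peak calling. The data, the threshold of 0.2, and the helper `select_regions` are all hypothetical; this is not the method from the linked paper.

```python
def select_regions(counts, cell_idx, min_frac):
    """Keep regions accessible in at least min_frac of the given cells."""
    n_regions = len(counts[0])
    keep = set()
    for r in range(n_regions):
        frac = sum(counts[c][r] > 0 for c in cell_idx) / len(cell_idx)
        if frac >= min_frac:
            keep.add(r)
    return keep


# Toy data: 18 cells of a large cluster "A" and 2 cells of a rare cluster "B".
# Region 0 is open in A, region 1 is open only in B, region 2 is sparse noise.
counts = [[1, 0, 0] for _ in range(18)] + [[0, 1, 0] for _ in range(2)]
counts[0][2] = 1  # a single stray read in region 2
labels = ["A"] * 18 + ["B"] * 2

# "Bulk" selection: threshold on all cells pooled together.
bulk = select_regions(counts, range(len(counts)), min_frac=0.2)

# Cluster-level selection: threshold within each cluster, then take the union.
per_cluster = set()
for lab in sorted(set(labels)):
    idx = [i for i, l in enumerate(labels) if l == lab]
    per_cluster |= select_regions(counts, idx, min_frac=0.2)

print("bulk:", sorted(bulk))
print("per-cluster:", sorted(per_cluster))
```

In this toy setup the region that marks the rare cluster is open in only 2 of 20 cells, so it falls below the pooled threshold but passes once the threshold is applied within each cluster, while the noise region is rejected either way.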
