PeakVI preprocessing for multi-timepoint scATAC-seq when peak coordinates differ slightly across samples

khagani · April 20, 2026, 4:30pm

I am working with a scATAC-seq / multi-timepoint reprogramming dataset and want to run PeakVI on a combined AnnData object across samples/timepoints. The issue is that peaks were called separately for each sample, so the peak coordinates differ slightly between samples even when they appear to represent the same biological region.

For example, for what looks like the same locus, I see:
* Day0: chr1:3210643-3211508
* Day4: chr1:3210640-3211510
* Day7: chr1:3210651-3211509

When I compare exact peak names across samples, the overlap is almost zero, even though biologically these regions seem to correspond to the same accessible site.

Example overlap counts:
* total peaks:
* Day0: 292,251
* Day4: 253,628
* Day7: 198,323

* exact intersections:
* Day0 ∩ Day4: 20
* Day0 ∩ Day7: 14
* Day4 ∩ Day7: 25

So exact string matching is clearly not the right way to harmonize features here. My question is: What is the recommended way to construct a shared feature space for PeakVI when peak boundaries differ slightly across samples?

More specifically: Using 10x / Cell Ranger / Cell Ranger ARC outputs, what is the best practice to harmonize peaks before building the final AnnData for PeakVI?

My concern is that if I use only the exact intersection of peaks, I will lose genuine timepoint-specific accessibility changes, which are biologically important in my context as a reprogramming trajectory. And also when running it as is I see 0 integration of timepoints (which should be happening based on what we know about the biology) with all different resolutions Any guidance on best practice for preprocessing before PeakVI would be very helpful.

ori-kron-wis · April 23, 2026, 7:05am

I can only suggest posting on scATAC-seq - scverse, as this is more of a pre-processing question rather than scvi per se.

cane11 · April 23, 2026, 9:19am

Please check SnapATAC2 tutorials for coherent peak calling across your different samples: Standard pipeline: analyzing 5K PBMC dataset from 10X genomics — SnapATAC2 2.9.0 documentation.

Topic		Replies	Views
Preprocess of scATAC use for peakVI or MultiVI scvi-tools	5	220	September 3, 2025
Integrate multiple samples for paired multi-omics data scvi-tools multivi	1	663	February 15, 2022
Preprocess of scATAC use for peakVI or MultiVII Help	0	34	September 12, 2025
Working with Multiple samples with PEAKVI scvi-tools scatac-seq , scvi , peakvi	1	650	May 5, 2023
Anndata.concatenate() with two 10x multiome datasets? anndata integration , multivi	2	719	December 29, 2022

PeakVI preprocessing for multi-timepoint scATAC-seq when peak coordinates differ slightly across samples

Related topics