I have a 10x Multiome data set (GSE199994) and ran
scvi.data.read_10x_multiome() on each of 10 batches (1 batch per patient).
But when I try
adataconcat = adata1.concatenate(adata2)
the problem seems to be that there are no shared peaks between any of the batches.
For instance, adata1.var might contain 70,000 peaks with names like
chr11:32333163-32334040, but none of these exist in any of the other patients.
Hi @mkarikom, I think concatenation here is only defined for the same feature sets. In this case it seems the peaks were called separately on different samples, so the peak coordinates, and therefore the feature names, do not match across batches.
See a discussion on a seemingly similar topic in the MuData repository here for some more details.
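To illustrate the point about feature sets, here is a minimal sketch in plain Python (no AnnData required). It assumes peak names follow the chrom:start-end pattern quoted in the question. Only the first peak name comes from this thread; the others are made up for illustration. Matching by name finds nothing, even though the underlying intervals overlap:

```python
def parse_peak(name):
    """Split a name like 'chr11:32333163-32334040' into (chrom, start, end)."""
    chrom, span = name.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def overlaps(a, b):
    """True if two peak names are on the same chromosome and their ranges intersect."""
    chrom_a, start_a, end_a = parse_peak(a)
    chrom_b, start_b, end_b = parse_peak(b)
    return chrom_a == chrom_b and start_a <= end_b and start_b <= end_a


# Hypothetical peak lists for two patients (illustrative values only).
patient1 = ["chr11:32333163-32334040", "chr1:1000-1500"]
patient2 = ["chr11:32333201-32334102", "chr1:1100-1600"]

# Matching on exact names, which is what a feature-wise join effectively does.
shared_names = set(patient1) & set(patient2)

# Matching on genomic overlap instead.
overlapping = [(a, b) for a in patient1 for b in patient2 if overlaps(a, b)]

print(len(shared_names), len(overlapping))
```

With these toy inputs the name-based intersection is empty while both peaks have an overlapping counterpart, which is the situation described in the question.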
Thanks @gtca, I ended up re-using the 10x cell-calling already performed for each batch and reducing the peaks in Signac using the raw ATAC data, then substituting these features for the ones generated by the per-batch peak-calling.
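For readers who want to see what reducing peaks amounts to, below is a rough Python sketch of the idea: overlapping intervals from all batches are merged into one unified peak set. This is not the code used above (that was done in R with Signac), and the function name reduce_peaks and the toy peak lists are hypothetical. It is also a simplification, since it only merges intervals that overlap or touch at a coordinate.

```python
from collections import defaultdict


def reduce_peaks(peak_names):
    """Merge overlapping 'chrom:start-end' peaks into a unified, sorted list."""
    by_chrom = defaultdict(list)
    for name in peak_names:
        chrom, span = name.split(":")
        start, end = span.split("-")
        by_chrom[chrom].append((int(start), int(end)))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                # First interval on this chromosome.
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps the running interval: extend it.
                current_end = max(current_end, end)
            else:
                # Gap found: emit the running interval and start a new one.
                merged.append(f"{chrom}:{current_start}-{current_end}")
                current_start, current_end = start, end
        merged.append(f"{chrom}:{current_start}-{current_end}")
    return merged


# Hypothetical per-batch peak calls (illustrative values only).
batch_a = ["chr11:32333163-32334040", "chr1:1000-1500"]
batch_b = ["chr11:32333201-32334102", "chr1:1100-1600", "chr2:500-900"]

unified = reduce_peaks(batch_a + batch_b)
print(unified)
```

A unified peak set like this only provides the feature names. The counts for each batch still have to be recomputed against it from the raw ATAC data, as described above, before the per-batch objects can be concatenated on shared features.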