AnnData.concatenate() with two 10x multiome datasets?

I have a 10x Multiome data set (GSE199994) and ran scvi.data.read_10x_multiome() on each of 10 batches (1 batch per patient).

But when I try adataconcat = adata1.concatenate(adata2), the issue seems to be that there are no shared peaks between any of the batches:

For instance, adata1.var might contain 70,000 peaks with names like chr11:32333163-32334040, but none of these exist in any of the other patients.

Hi @mkarikom, I think concatenation here is only defined for identical feature sets. In this case, it seems the peaks were called separately for each sample, so the peak coordinates (and therefore the feature names) differ between batches.

See the discussion of a seemingly similar topic in the MuData repository for more details.
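To see why exact-name matching finds nothing in common even when the batches cover the same regions, here is a minimal sketch in plain Python. The peak lists are made-up stand-ins for `adata1.var_names` and `adata2.var_names`, and `parse_peak` and `overlaps` are hypothetical helpers, not part of anndata or scvi-tools. Coordinates are treated as closed intervals, which is an assumption.

```python
# Hypothetical peak names standing in for adata1.var_names / adata2.var_names.
peaks_batch1 = ["chr11:32333163-32334040", "chr1:1000-1500"]
peaks_batch2 = ["chr11:32333200-32334100", "chr1:1020-1490"]


def parse_peak(name):
    """Split a 'chrom:start-end' peak name into (chrom, start, end)."""
    chrom, span = name.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


# Matching on exact names: peaks called per sample rarely have
# identical boundaries, so this set comes back empty.
shared_names = set(peaks_batch1) & set(peaks_batch2)

# Matching on genomic overlap: the same regions are in fact covered.
overlapping = [
    (p, q)
    for p in peaks_batch1
    for q in peaks_batch2
    if overlaps(parse_peak(p), parse_peak(q))
]

print("shared names:", len(shared_names))
print("overlapping pairs:", len(overlapping))
```

If the second number is large while the first is zero, the batches likely describe the same regions with slightly different boundaries, which points to building a common peak set rather than concatenating directly.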


Thanks @gtca. I ended up reusing the 10x cell calling already performed for each batch and reducing the peaks in Signac using the raw ATAC data, then substituting the resulting features for the ones generated by the per-batch peak calling.
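For readers who want the idea behind "reducing the peaks" without the R tooling, the following is a rough sketch in plain Python of merging overlapping per-batch peaks into one shared set. It is not the Signac workflow itself: `merge_peaks` and `parse_peak` are hypothetical helpers, the input lists are made up, and intervals are assumed to be closed.

```python
from collections import defaultdict


def parse_peak(name):
    """Split a 'chrom:start-end' peak name into (chrom, start, end)."""
    chrom, span = name.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def merge_peaks(*batches):
    """Merge overlapping peaks from several batches into one peak list."""
    by_chrom = defaultdict(list)
    for batch in batches:
        for name in batch:
            chrom, start, end = parse_peak(name)
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps the current merged peak: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: emit the current peak and start a new one.
                merged.append(f"{chrom}:{cur_start}-{cur_end}")
                cur_start, cur_end = start, end
        merged.append(f"{chrom}:{cur_start}-{cur_end}")
    return merged


# Made-up example input, one list of peak names per batch.
batch1 = ["chr11:32333163-32334040", "chr1:1000-1500"]
batch2 = ["chr11:32333200-32334100", "chr1:2000-2500"]

unified = merge_peaks(batch1, batch2)
print(unified)
```

Note that merging only yields the shared feature names. The counts still have to be recomputed against the unified peaks from the raw ATAC fragments for each batch, which is the step described above; after that, every batch has the same features and concatenation is well defined.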
