Working with multiple samples with PeakVI

Hello scverse community.

I am analyzing a scATAC-seq dataset with 5 samples for 5 time points (d01, d05, d10, d25 and an infected sample). I did the usual pre-processing for each sample (via Signac) and merged the 5 individual samples into a single Seurat object:

atac_data <- Reduce(function(x, y) merge(x, y), list(d01, d05, d10, d25, Inf_dpi4))

I then converted this merged object to an AnnData object:

adata <- convertFormat(atac_data, from = "seurat", to = "anndata", main_layer = "counts", assay = "peaks", drop_single_values = FALSE, outFile = "converted_object_all5_unfiltered.h5ad")

and then used it with PeakVI to obtain the UMAP.

Although I obtained the UMAP for this entire "merged" dataset, I would like to see the sample-wise contribution to the UMAP. As an example, please see the image below from ArchR.

Question 1: Was it correct to merge the individual Seurat objects into a single Seurat object and then convert it, or would it have been better to convert the 5 Seurat objects individually into AnnData objects and then concatenate them? Would I then be able to access sample-wise information in the UMAP?
Question 2: Is it possible to obtain the sample-wise information for a UMAP obtained from PeakVI at all, or do I need to convert the PeakVI-trained object back into a Seurat object and then try to generate a UMAP there?

I feel like I am missing a step or trick somewhere, but I can't see what it is by myself.
If you could provide the information and/or point me to a link or tutorial where I can find a solution, it would be very helpful. Thank you for your time.

Hi, thank you for your question. If you are referring to coloring your UMAP by each experimental sample, I think either of the methods you have listed would work. You would just have to include an observation-wise metadata column indicating which sample each observation originates from.

If you were to convert each Seurat object to AnnData first, you could just add an obs column to each AnnData with the sample labels. Then you can pass that column to your plotting function.

In Python, it might look something like this:

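import anndata
import scanpy as sc

# label each per-sample AnnData before concatenating
adata1.obs["sample"] = "d01"
adata2.obs["sample"] = "d05"
# same for other anndatas

adata_all_samples = anndata.concat([adata1, adata2, ...])
# sc.pl.umap needs a UMAP embedding stored in adata_all_samples.obsm["X_umap"]
sc.pl.umap(adata_all_samples, color="sample")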