Data preprocessing for scATAC-seq

Hi Adam @adamgayoso,
I would like to ask whether scvi-tools supports functions for preprocessing scATAC-seq data, similar to those provided by Seurat. For example, does it have a function for filtering out cells based on the fraction of fragments in peaks, fragments overlapping TSSs, etc. (examples are given here: Analyzing PBMC scATAC-seq • Signac)? I have looked for them, but it seems scvi-tools only supports read_10x_atac for reading ATAC-seq data in, which loses the other metadata needed for preprocessing.
Thank you very much for a great framework, and I look forward to hearing from you.

We do not have this functionality and would recommend using Seurat for this. We will have a tutorial up for this soon, but in the meantime you can use the sceasy R package to convert from R to anndata.
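For anyone who ends up with per-cell fragment counts on the Python side, the fraction-of-fragments-in-peaks filter mentioned in the question can be sketched in plain NumPy. This is only an illustration, not scvi-tools functionality: the function name `frip_filter`, the thresholds, and the toy counts are all made up here and should be tuned per dataset.

```python
import numpy as np

def frip_filter(frags_in_peaks, total_frags, min_frip=0.15, min_peak_frags=1000):
    """Return a boolean mask of cells passing a simple scATAC-seq QC filter.

    Hypothetical helper: thresholds are illustrative defaults, not recommendations.
    """
    frags_in_peaks = np.asarray(frags_in_peaks, dtype=float)
    total_frags = np.asarray(total_frags, dtype=float)
    # Fraction of fragments in peaks; cells with zero fragments get a value of 0.
    frip = np.divide(
        frags_in_peaks,
        total_frags,
        out=np.zeros_like(frags_in_peaks),
        where=total_frags > 0,
    )
    # Keep cells with enough signal in peaks and enough fragments overall in peaks.
    return (frip >= min_frip) & (frags_in_peaks >= min_peak_frags)

# Toy per-cell counts, invented for illustration only.
in_peaks = np.array([3000, 500, 4000, 0])
total = np.array([6000, 5000, 40000, 0])
mask = frip_filter(in_peaks, total)
print(mask)
```

The resulting mask can then be used to subset an AnnData object, e.g. `adata[mask].copy()`, before passing it on to a model.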

Hi Adam,
Thanks for your prompt feedback. As you suggested, that may be the best option at the moment for preprocessing data before running scvi. I hope to see functional updates from you in the near future.
Kind regards,

Just to be clear, we do not anticipate adding preprocessing functionality directly in scvi-tools as it’s out of scope in our opinion. We will instead show how to use other tools.

Hi Adam,

Thanks for clarifying. Based on your suggestion, I have found a way to work around the preprocessing of the data. It’s clear that adding core functionality to scvi-tools should have higher priority.

By the way, is there an appropriate way to assess the goodness of a trained PeakVI model besides the ELBO on validation data? I have seen some posts suggesting the use of posterior predictive values of gene expression for scVI models, but I am not sure whether there is something similar for chromatin openness.
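To make the question concrete, one check I could imagine is scoring how well predicted accessibility probabilities separate open from closed entries in the binarized held-out data, for example with an AUROC. The sketch below is plain NumPy and only an assumption about what such a check might look like; the helper name `auroc` is made up, and where the predicted matrix comes from (for instance, the model's accessibility estimates on validation cells) is also an assumption.

```python
import numpy as np

def auroc(observed, predicted):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    observed:  cell-by-region counts (binarized here as count > 0)
    predicted: cell-by-region predicted accessibility probabilities
    """
    y = np.asarray(observed).ravel() > 0
    p = np.asarray(predicted, dtype=float).ravel()
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both open and closed entries to compute AUROC")
    # Average 1-based ranks so that tied predictions are handled correctly.
    _, inverse, counts = np.unique(p, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    # U statistic for the positive (open) class, normalized to [0, 1].
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# Toy matrices, invented for illustration only.
obs = np.array([[1, 0], [0, 2]])
pred = np.array([[0.9, 0.2], [0.1, 0.8]])
score = auroc(obs, pred)
print(score)
```

A value near 0.5 would mean the predictions are no better than chance at ranking open entries above closed ones; whether this is a sensible criterion for PeakVI is exactly what I am unsure about.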

Hope to see many more functions to come.

Kind regards,