How to best integrate Genomic Variant Information

Hi there,

I am currently evaluating Muon as the backend for multi-omics datasets at the patient level (i.e., not single-cell).
What I have not found yet is whether there is already a recommended way, or an example, of how to include genomic variants.

My idea was to include them as a separate data layer.
In the AnnData object, each variant would be a variable.
As the data are binary and very sparse, the data matrix would be a sparse matrix with a bool dtype.

Does this sound reasonable?


Hi Vito,

I think your approach might work well. I was wondering whether a genomic variant could be encoded as more than a binary “present/absent” label, for example by stating the specific type of variation at a given genomic position. That might help with compressing the information. Also, do you know how well the variables can be subset (e.g., if I am looking for a specific type of variation rather than a genomic position)?