Managing anndata with multiple features from each gene

Calibarn · April 16, 2024, 9:00pm

I am working on an analysis where each variable (gene) has N properties. If I want any downstream analysis (PCA, neighbors, Leiden…) to take all N values of each gene into account, should I:

Create multiple layers for each property.
Create a separate anndata object for each property.
Expand the number of vars into N*vars to include the N number of properties for each gene.

I am aware that neighbors and umap have no option for choosing which layer to operate on, so what is the best way to deal with this?

ivirshup · April 17, 2024, 10:21am

What do you mean by “factor in”? Do you want each sub-gene feature to be considered a variable?

Calibarn · April 17, 2024, 11:05am

Yes, that is correct. I’m just wondering whether it will mess anything up if I artificially increase the number of variables.

ivirshup · April 17, 2024, 11:30am

I think the:

Expand the number of vars into N*vars to include the N number of properties for each gene.

approach is fine and the way to go if you want to treat each of these gene/property combinations as a separate variable.

I don’t think this will mess anything up. But how you structure the data really depends on what you’re doing with it downstream.

Create multiple layers for each property

This is the approach taken for an scvelo-like model, but it doesn’t get you the “each sub-gene feature to be considered a variable” behavior.

Calibarn · April 17, 2024, 5:56pm

I do understand that this would definitely not work for cell annotation, or at least I would have to create a separate anndata with just the gene counts.

For algorithms like neighbors, UMAP and Leiden, I want to be sure that they will be able to pick up the differences between the gene properties and properly cluster the cells based on them.
