I am running my multiome data (ATAC + GEX) with muon and am getting an error when the variable names are written back to the index.
To be clear, I am running muon on data that was created with bamtofastq, as we no longer have the index files for our FASTQs (not sure if that’s why I’m having trouble).
mdata.var_names_make_unique()
ValueError: Length mismatch: Expected axis has 193932 elements, new values have 193244 elements
It appears each modality has unique variable names, but that uniqueness is lost when the modalities are merged.
It would be great if we manage to reproduce it together first to figure out where the error is.
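As a starting point for reading the message: "Length mismatch" is what pandas raises when an axis is assigned a list of labels whose length differs from the axis length. The sketch below reproduces that class of error on a toy table; the table and names are made up for illustration and say nothing about where the mismatch originates in muon.

```python
import pandas as pd

# Toy stand-in for a global .var table with 5 features (hypothetical data).
var = pd.DataFrame(
    {"n_cells": [3, 1, 4, 1, 5]},
    index=["a", "b", "c", "d", "e"],
)

# Assigning fewer names than there are rows fails with a length mismatch.
new_names = ["a", "b", "c"]
try:
    var.index = new_names
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Read this way, the numbers in the reported error suggest that the global table has 193932 rows while only 193244 new names were produced for it.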
Here are code snippets of how it looks for the latest mudata.
Make some objects first:
from mudata import AnnData, MuData
import numpy as np
from jax import random
# modality x
x = np.array(random.normal(random.PRNGKey(1), (100, 10)))
ad_x = AnnData(x)
ad_x.var_names = [f"x{i}" for i in range(ad_x.n_vars)]
# modality y
y = np.array(random.normal(random.PRNGKey(1), (100, 20)))
ad_y = AnnData(y)
ad_y.var_names = [f"y{i}" for i in range(ad_y.n_vars)]
It seems to work as expected for the three scenarios outlined below:
# case 1: var_names unique for each modality
mdata = MuData({"x": ad_x, "y": ad_y})
mdata.var.index.is_unique # => True (is_unique is a property, not a method)
# case 2: some var_names are in both modalities
ad_x.var_names = [f"y{i}" for i in range(ad_x.n_vars)]
mdata = MuData({"x": ad_x, "y": ad_y})
mdata.var.index.is_unique # => False
mdata.var_names_make_unique()
mdata.var.index.is_unique # => True
# case 3: duplicate feature names within one modality, on top of case 2
ad_x.var_names = [f"y{i}" for i in range(ad_x.n_vars)]
ad_y.var_names = [f"y{i}" for i in range(ad_y.n_vars)]
var_x = ad_x.var_names.to_numpy(copy=True)  # copy, so the existing index is not mutated in place
var_x[0] = "y1"
ad_x.var_names = var_x
mdata = MuData({"x": ad_x, "y": ad_y})
mdata.var_names_make_unique()
mdata.var.index.is_unique # => True
Are there any less evident pd.Index manipulations in your pipeline that haven’t been addressed by these scenarios?
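To help narrow this down, here is a minimal diagnostic sketch that compares per-modality feature names with the global ones. `summarize_var_names` is a hypothetical helper written for this thread, not part of mudata or muon, and it only relies on pandas.

```python
import pandas as pd


def summarize_var_names(var_names_by_mod, global_var_names):
    """Compare per-modality feature names with the global ones (hypothetical helper)."""
    per_mod = {}
    for mod, names in var_names_by_mod.items():
        idx = pd.Index(names)
        per_mod[mod] = {
            "n_vars": len(idx),
            "n_duplicated": int(idx.duplicated().sum()),
        }
    # All names stacked together, in modality order.
    stacked = pd.Index([n for names in var_names_by_mod.values() for n in names])
    return {
        "per_modality": per_mod,
        "n_vars_sum": len(stacked),
        "n_vars_global": len(global_var_names),
        "n_duplicated_stacked": int(stacked.duplicated().sum()),
    }


# Toy example: names are unique within each modality, but "y0" is shared.
toy = {"x": ["y0", "x1", "x2"], "y": ["y0", "y1"]}
report = summarize_var_names(toy, global_var_names=["y0", "x1", "x2", "y0", "y1"])
print(report)
```

On a real object it could be called as `summarize_var_names({m: a.var_names for m, a in mdata.mod.items()}, mdata.var_names)`. If `n_vars_sum` and `n_vars_global` differ, the global `.var` is probably out of sync with the modalities (for example after subsetting a modality in place), and calling `mdata.update()` before `mdata.var_names_make_unique()` may be worth trying.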