Subsetting AnnData using a gene list

Hi there, I want to subset my anndata using a list of gene names. I have about 2000 genes to extract from the original anndata. Is there a way to do this?
Any suggestions/help would be much appreciated!
Thanks

Hi @skoturan,

You can index the AnnData object's variables (its columns) directly with a list of gene names, like adata[:, ["gene1", "gene2", ...]]. You may also want to check out this page from the anndata docs:

https://anndata-tutorials.readthedocs.io/en/latest/getting-started.html#Subsetting-AnnData

Thank you! It worked :smile:


Hi,

When I try this with the following code (the subsetting works, but the PCA does not):

adata0 = adata[:, genes1]
sc.tl.pca(adata0)

I get the following error:

File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/site-packages/anndata/_core/anndata.py:683, in AnnData.X(self)
    680     X = None
    681 elif self.is_view:
    682     X = as_view(
--> 683         _subset(self._adata_ref.X, (self._oidx, self._vidx)),
    684         ElementRef(self, "X"),
    685     )
    686 else:
    687     X = self._X

File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/functools.py:888, in singledispatch.<locals>.wrapper(*args, **kw)
    884 if not args:
    885     raise TypeError(f'{funcname} requires at least '
    886                     '1 positional argument')
--> 888 return dispatch(args[0].__class__)(*args, **kw)

File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/site-packages/anndata/_core/index.py:168, in _subset_spmatrix(a, subset_idx)
    166 if len(subset_idx) > 1 and all(isinstance(x, cabc.Iterable) for x in subset_idx):
    167     subset_idx = (subset_idx[0].reshape(-1, 1), *subset_idx[1:])
--> 168 return a[subset_idx]

File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/site-packages/scipy/sparse/_index.py:70, in IndexMixin.__getitem__(self, key)
     68         return self._get_sliceXslice(row, col)
     69     elif col.ndim == 1:
---> 70         return self._get_sliceXarray(row, col)
     71     raise IndexError('index results in >2 dimensions')
     72 elif row.ndim == 1:

File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/site-packages/scipy/sparse/_csr.py:207, in _csr_base._get_sliceXarray(self, row, col)
    206 def _get_sliceXarray(self, row, col):
--> 207     return self._major_slice(row)._minor_index_fancy(col)

File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/site-packages/scipy/sparse/_compressed.py:774, in _cs_matrix._minor_index_fancy(self, idx)
    772 col_offsets = np.zeros(N, dtype=idx_dtype)
    773 res_indptr = np.empty_like(self.indptr)
--> 774 csr_column_index1(k, idx, M, N, self.indptr, self.indices,
    775                   col_offsets, res_indptr)
    777 # pass 2: copy indices/data for selected idxs
    778 col_order = np.argsort(idx).astype(idx_dtype, copy=False)

ValueError: Output dtype not compatible with inputs.

Please advise.

It’s been resolved. Thank you.
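
The thread does not record how this was resolved. One thing that may be worth checking, offered as an assumption and not as the confirmed cause, is whether the sparse matrix's indices and indptr arrays have the same integer dtype, since the traceback ends in SciPy's column-indexing routine, which receives both arrays. The sketch below uses plain SciPy on a toy matrix; index_dtypes and harmonize_index_dtypes are hypothetical helper names, not part of anndata or scanpy.

```python
import numpy as np
from scipy import sparse


def index_dtypes(m):
    """Return the dtypes of a CSR/CSC matrix's indices and indptr arrays."""
    return m.indices.dtype, m.indptr.dtype


def harmonize_index_dtypes(m):
    """Return a copy of a CSR/CSC matrix whose index arrays share one dtype."""
    out = m.copy()
    common = np.result_type(out.indices.dtype, out.indptr.dtype)
    out.indices = out.indices.astype(common)
    out.indptr = out.indptr.astype(common)
    return out


# Toy CSR matrix standing in for adata.X (hypothetical data).
dense = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]], dtype=np.float32)
mat = sparse.csr_matrix(dense)

# Force a mismatch so the check has something to find.
mat.indptr = mat.indptr.astype(np.int64)
mat.indices = mat.indices.astype(np.int32)
print(index_dtypes(mat))

fixed = harmonize_index_dtypes(mat)
print(index_dtypes(fixed))
```

If adata.X turns out to have mismatched index dtypes, applying the same idea to it before subsetting might avoid the error, but that is untested against the setup in this thread.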