Hi there, I want to subset my anndata using a list of gene names. I have about 2000 genes to extract from the original anndata. Is there a way to do this?
Any suggestions/help would be much appreciated!
Thanks
Hi @skoturan,
You can index into the anndata’s columns with the gene names, like adata[:, ["gene1", "gene2", ...]]. You may also want to check out this page from the anndata docs:
https://anndata-tutorials.readthedocs.io/en/latest/getting-started.html#Subsetting-AnnData
Thank you! It worked
Hi,
When I try this with the following code, the subsetting works but the PCA does not:
adata0 = adata[:, genes1]
sc.tl.pca(adata0)
I get the following error:
File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/site-packages/anndata/_core/anndata.py:683, in AnnData.X(self)
680 X = None
681 elif self.is_view:
682 X = as_view(
--> 683 _subset(self._adata_ref.X, (self._oidx, self._vidx)),
684 ElementRef(self, "X"),
685 )
686 else:
687 X = self._X
File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/functools.py:888, in singledispatch.<locals>.wrapper(*args, **kw)
884 if not args:
885 raise TypeError(f'{funcname} requires at least '
886 '1 positional argument')
--> 888 return dispatch(args[0].__class__)(*args, **kw)
File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/site-packages/anndata/_core/index.py:168, in _subset_spmatrix(a, subset_idx)
166 if len(subset_idx) > 1 and all(isinstance(x, cabc.Iterable) for x in subset_idx):
167 subset_idx = (subset_idx[0].reshape(-1, 1), *subset_idx[1:])
--> 168 return a[subset_idx]
File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/site-packages/scipy/sparse/_index.py:70, in IndexMixin.__getitem__(self, key)
68 return self._get_sliceXslice(row, col)
69 elif col.ndim == 1:
---> 70 return self._get_sliceXarray(row, col)
71 raise IndexError('index results in >2 dimensions')
72 elif row.ndim == 1:
File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/site-packages/scipy/sparse/_csr.py:207, in _csr_base._get_sliceXarray(self, row, col)
206 def _get_sliceXarray(self, row, col):
--> 207 return self._major_slice(row)._minor_index_fancy(col)
File /PUHTI_TYKKY_Cq2gHLh/miniconda/envs/env1/lib/python3.9/site-packages/scipy/sparse/_compressed.py:774, in _cs_matrix._minor_index_fancy(self, idx)
772 col_offsets = np.zeros(N, dtype=idx_dtype)
773 res_indptr = np.empty_like(self.indptr)
--> 774 csr_column_index1(k, idx, M, N, self.indptr, self.indices,
775 col_offsets, res_indptr)
777 # pass 2: copy indices/data for selected idxs
778 col_order = np.argsort(idx).astype(idx_dtype, copy=False)
ValueError: Output dtype not compatible with inputs.
Please advise.
It’s been resolved. Thank you.
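The thread does not record what the fix was. For anyone landing here with the same traceback, one thing that may be worth checking is the dtypes of the sparse matrix's index arrays, since the error is raised inside scipy's column indexing (csr_column_index1). This is only a guess at a possible cause, not a confirmed diagnosis. The sketch below uses a small stand-in matrix, and `index_dtypes` is a hypothetical helper, not part of scipy or anndata. On a real object you would pass `adata.X` instead, assuming it is a CSR or CSC matrix.

```python
import numpy as np
from scipy import sparse


def index_dtypes(matrix):
    """Return the dtypes of a CSR/CSC matrix's index arrays and its data array."""
    return {
        "indptr": matrix.indptr.dtype,
        "indices": matrix.indices.dtype,
        "data": matrix.data.dtype,
    }


# Stand-in for adata.X: a small CSR matrix built from a dense array.
X = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))

report = index_dtypes(X)
print(report)

# Assumption: a mismatch between these two dtypes is one thing that could
# confuse scipy's compiled indexing routines.
consistent = report["indptr"] == report["indices"]
print("indptr and indices share a dtype:", consistent)
```

If the two index dtypes differ on your own matrix, that would at least be a concrete detail to include when asking for help.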