Hello,
I am trying to filter out a particular set of genes from an AnnData object.
So far I tried:
# Assuming 'adata' is your AnnData object
# List of genes to remove
genes_to_remove
# Convert genes_to_remove to a set for efficient lookup
genes_to_remove_set = set(genes_to_remove)
# Check which genes are in the AnnData object
genes_in_adata = set(adata.var_names)
# Find the intersection of genes_to_remove and genes_in_adata
common_genes_to_remove = genes_to_remove_set.intersection(genes_in_adata)
# Print the genes that will be removed
print("Genes to be removed:", common_genes_to_remove)
# Create a boolean mask to keep only the genes not in 'common_genes_to_remove'
mask = ~adata.var_names.isin(list(common_genes_to_remove))
# Subset the AnnData object to exclude the genes in 'common_genes_to_remove'
adata_filtered = adata[:, mask].copy()
# Verify that the genes have been removed
print("Remaining genes:", adata_filtered.var_names)
This initially seems to work. If I run:
for gene in genes_to_remove:
    print(gene)
    # Report the gene if it is no longer in the filtered data
    if gene not in adata_filtered.var_names:
        print(f"Gene {gene} not found in the dataset.")
it does indeed report that every gene in my list is gone. However, if I plot a UMAP colored by one of these genes, or if I run a Wilcoxon analysis, the genes still seem to be there…
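One thing I have not ruled out (this is only a guess on my part) is that the genes are still stored in `adata_filtered.raw`. As far as I understand, subsetting by genes leaves `.raw` untouched, and Scanpy's plotting and `rank_genes_groups` read from `.raw` by default when it is set. Below is a minimal sketch of how I could check this. The helper name `find_lingering_genes` and the gene names are made up for illustration, and the `SimpleNamespace` object is only a stand-in so the snippet runs without my data:

```python
from types import SimpleNamespace


def find_lingering_genes(adata, genes):
    """Report which of `genes` are still in adata.var_names and in adata.raw.var_names."""
    wanted = set(genes)
    in_var = sorted(wanted & set(adata.var_names))
    # .raw is None on an AnnData object when it was never set
    raw = getattr(adata, "raw", None)
    in_raw = sorted(wanted & set(raw.var_names)) if raw is not None else []
    return {"var": in_var, "raw": in_raw}


# Stand-in for a filtered AnnData object (hypothetical gene names):
# "GeneB" was removed from var_names but is still present in .raw
demo = SimpleNamespace(
    var_names=["GeneA", "GeneC"],
    raw=SimpleNamespace(var_names=["GeneA", "GeneB", "GeneC"]),
)

report = find_lingering_genes(demo, ["GeneB"])
print(report)  # {'var': [], 'raw': ['GeneB']}
```

On my real data I would call `find_lingering_genes(adata_filtered, genes_to_remove)`. If the `raw` entry comes back non-empty, passing `use_raw=False` to `sc.pl.umap` and `sc.tl.rank_genes_groups` might be the next thing to try, but I am not sure this is the cause.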
Any thoughts?