How to make a UMAP for single cell data and color cells by average expression of a list of genes in scanpy?

bioinf · February 13, 2023, 5:01pm

Hello,

I would like to make a UMAP where the cells are colored by the average expression of the bulk signature genes but I am not confident that I did it correctly. I would like to use scanpy for it.

I did the below:

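import numpy as np
import scanpy as sc

bulk_de_genes_up_list = bulk_de_genes['Gene'].tolist()
# Subset the data to the genes in the list (copy so we can write to it)
adata2 = adata[:, adata.var_names.isin(bulk_de_genes_up_list)].copy()
# Mean over genes (axis=1) gives one value per cell; flatten in case .X is sparse
average_expression = np.asarray(adata2.X.mean(axis=1)).ravel()
# Per-cell values go in .obs (.var holds per-gene annotations)
adata2.obs['bulk_de_gene_average'] = average_expression
sc.pl.umap(adata2, color='bulk_de_gene_average', cmap='viridis')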

I do get a UMAP as an output but I am not sure if it is done correctly. I am mainly worried about average_expression = adata2.X.mean(axis=1)

Is that the correct way of calculating the mean of the gene expression per cell?

Thank you

yotamcons · March 16, 2023, 10:42am

Hey there,
A simple test would be to select one gene and check that you get the right numbers for it. If that works, try two genes and check that you get their average.
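
A minimal sketch of that sanity check, using a small made-up NumPy matrix in place of a dense adata.X. The gene names and the helper mean_expression are hypothetical, only there for illustration:

```python
import numpy as np

# Hypothetical toy data: 3 cells x 4 genes, standing in for a dense adata.X
X = np.array([
    [1.0, 0.0, 3.0, 2.0],
    [0.0, 4.0, 1.0, 0.0],
    [2.0, 2.0, 0.0, 6.0],
])
var_names = np.array(["GeneA", "GeneB", "GeneC", "GeneD"])


def mean_expression(X, var_names, genes):
    """Per-cell mean over the columns whose gene name is in `genes`."""
    mask = np.isin(var_names, genes)
    return np.asarray(X[:, mask].mean(axis=1)).ravel()


# One gene: the "mean" must equal that gene's own column
one_gene = mean_expression(X, var_names, ["GeneC"])
assert np.allclose(one_gene, X[:, 2])

# Two genes: the result must be the average of the two columns
two_genes = mean_expression(X, var_names, ["GeneA", "GeneD"])
assert np.allclose(two_genes, (X[:, 0] + X[:, 3]) / 2)
```

Both results have one entry per cell, which is the shape a UMAP coloring needs.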

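Your calculation of the mean looks OK (axis=1 averages across the columns, which are the genes, so you get one value per cell). Note that a per-cell value should be stored in adata.obs rather than adata.var, because sc.pl.umap looks up the color key in .obs (or among the gene names). There also doesn't seem to be a need to create a new adata:

bulk_de_genes_up_list = bulk_de_genes['Gene'].tolist()

# Mean over the signature genes (axis=1): one value per cell
average_expression = adata[:, adata.var_names.isin(bulk_de_genes_up_list)].X.mean(axis=1)

# Flatten the (n_cells, 1) matrix returned when .X is sparse (assumes import numpy as np)
adata.obs['bulk_de_gene_average'] = np.asarray(average_expression).ravel()

sc.pl.umap(adata, color='bulk_de_gene_average', cmap='viridis')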

The only issue that might arise from the way you are calculating the mean is if you transformed the data beforehand. Total-count normalization (e.g. CPM) is not an issue for this, but averaging log1p data is not the right way. In that case you might use the .raw counts if you've saved them, or try exponentiating the data before taking the average using np.expm1:
np.expm1(adata[:, adata.var_names.isin(bulk_de_genes_up_list)].X).mean(axis=1)
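
To illustrate why the transformation matters, here is a small sketch with made-up numbers (not real data). It assumes the data were logged with the natural-log log1p, so that np.expm1 undoes it exactly:

```python
import numpy as np

# Hypothetical normalized values: 2 cells x 3 signature genes
counts = np.array([
    [0.0, 10.0, 100.0],  # cell with very uneven expression
    [5.0, 5.0, 5.0],     # cell with even expression
])
logged = np.log1p(counts)  # what .X would hold after a natural-log log1p

# Averaging the logged values directly
mean_of_logs = logged.mean(axis=1)

# Undoing the log first, then averaging on the linear scale
mean_linear = np.expm1(logged).mean(axis=1)

# The back-transformed average recovers the plain mean of the values
assert np.allclose(mean_linear, counts.mean(axis=1))

# The mean of logs understates the uneven cell and matches the even one
assert np.expm1(mean_of_logs)[0] < mean_linear[0]
assert np.isclose(np.expm1(mean_of_logs)[1], mean_linear[1])
```

The two approaches agree only when all genes in a cell have the same value. The more the genes differ, the more the mean of the logged values falls below the mean on the linear scale.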
