I would like to make a UMAP where the cells are colored by the average expression of the bulk signature genes but I am not confident that I did it correctly. I would like to use scanpy for it.
I did the below:
bulk_de_genes_up_list = bulk_de_genes['Gene'].tolist()
#Subset the data based on the list of genes
adata2 = adata[:, adata.var_names.isin(bulk_de_genes_up_list)]
average_expression = adata2.X.mean(axis=1)
adata2.obs['bulk_de_gene_average'] = np.asarray(average_expression).ravel()
sc.pl.umap(adata2, color='bulk_de_gene_average', cmap='viridis')
I do get a UMAP as an output but I am not sure if it is done correctly. I am mainly worried about average_expression = adata2.X.mean(axis=1)
Is that the correct way of calculating the mean of the gene expression per cell?
The only issue that might arise from the way you are calculating the mean is a transformation applied to the data beforehand. Total-count (CPM) normalization is not a problem here, but averaging log1p-transformed values is not the right way, because the mean of the logs is not the log of the mean. In that case, use the .raw counts if you’ve saved them, or try exponentiating the data with np.expm1 before taking the average: np.expm1(adata[:, adata.var_names.isin(bulk_de_genes_up_list)].X).mean(axis=1)
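As a rough illustration of the averaging discussed above, here is a minimal sketch that works on a plain cells-by-genes matrix instead of a real AnnData object. The helper name signature_mean_per_cell, its log1p_transformed flag, and the toy gene names are hypothetical and only serve the example. It assumes rows are cells and columns are genes, as in adata.X.

```python
import numpy as np


def signature_mean_per_cell(X, var_names, genes, log1p_transformed=False):
    """Return one mean-expression value per cell (row of X) over `genes`.

    Hypothetical helper: X is a cells x genes matrix (dense ndarray or
    scipy sparse), var_names holds the gene name of each column.
    """
    mask = np.isin(np.asarray(var_names), list(genes))
    if not mask.any():
        raise ValueError("none of the signature genes are in var_names")
    sub = X[:, mask]
    if log1p_transformed:
        # Undo log1p before averaging; scipy sparse matrices expose their
        # own .expm1() method, dense arrays go through np.expm1.
        sub = sub.expm1() if hasattr(sub, "expm1") else np.expm1(sub)
    # axis=1 averages across genes (columns), giving one value per cell.
    # np.asarray(...).ravel() flattens the (n_cells, 1) result that
    # sparse matrices return.
    return np.asarray(sub.mean(axis=1)).ravel()


# Toy data: 2 cells x 3 genes, stored as log1p values.
X = np.log1p(np.array([[1.0, 3.0, 10.0],
                       [0.0, 8.0, 20.0]]))
var_names = ["GeneA", "GeneB", "GeneC"]
scores = signature_mean_per_cell(
    X, var_names, ["GeneA", "GeneB"], log1p_transformed=True
)
print(scores)
```

With a real object, the result would presumably be stored per cell, e.g. adata.obs['bulk_de_gene_average'] = signature_mean_per_cell(adata.X, adata.var_names, bulk_de_genes_up_list, log1p_transformed=True), and then plotted with sc.pl.umap(adata, color='bulk_de_gene_average', cmap='viridis'). The values go in .obs rather than .var because there is one value per cell, not one per gene.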