I’m interested in using scanpy.pl.stacked_violin() to plot multiple gene scores sc.tl.score_genes(). As far as I can tell scanpy.pl.stacked_violin() only takes var inputs and groups by an obs. Is there a way to do obs x obs with stacked_violin or is there some technical reason I would not want to do this?
As you mentioned, the way sc.pl.stacked_violin works is to check for var_names so you cannot use values stored in obs. One way to overcome this would be to create a new AnnData object where in .X you would store the obtained scores (extracted from .obs of the original adata object), with dimensions n_cells x n_genesets. This way you can plug this directly into scanpy plotting functions.
Just as an alternative example, this is how its done in decoupler:
import scanpy as sc
import pandas as pd
import decoupler as dc
# Load data
adata = sc.datasets.pbmc3k_processed()
# Define genesets
net = pd.DataFrame([
['gset B cells', 'CD74'],
['gset B cells', 'HLA-DRA'],
['gset B cells', 'HLA-DPB1'],
['gset T cells', 'CD3D'],
['gset T cells', 'CD3E'],
['gset T cells', 'IL32'],
], columns=['source', 'target']
)
# Run enrichment (equivalent of sc.tl.score_genes)
dc.run_ulm(adata, net, weight=None, min_n=0)
# Create new AnnData with geneset scores in .X
acts = dc.get_acts(adata, 'ulm_estimate')
# Plot UMAP scores
sc.pl.umap(acts, color=acts.var_names)
# Plot violin
sc.pl.stacked_violin(acts, var_names=acts.var_names, groupby='louvain', standard_scale='var')