Efficient pseudobulking

emdann · February 26, 2025, 8:16pm

With the newest anndata/scanpy releases, what is the recommended scverse workflow to efficiently pseudobulk a large AnnData object, i.e. to sum the counts for each var across groups of cells defined by some column in obs? In the past I used decoupler, but this can be pretty slow when you have hundreds of groups (e.g. celltype x perturbation). Are there better solutions?

grst · February 26, 2025, 8:18pm

scanpy.get.aggregate

ilan-gold · February 27, 2025, 1:55pm

I am looking into making this dask-compatible as well. The issue is that the worst case is pretty bad… looking into it though!

emdann · May 27, 2025, 6:26pm

@ilan-gold any progress on Dask-compatible aggregate?

ilan-gold · May 28, 2025, 12:08pm

I will add it to our next sprint. It should be doable; sorry for the delay here.
