Efficient pseudobulking

With the newest anndata/scanpy releases, what is the recommended scverse workflow to efficiently pseudobulk a large AnnData object, i.e. to sum the counts for each var across groups of cells defined by some column in obs? In the past I used decoupler, but it can be pretty slow when you have hundreds of groups (e.g. celltype x perturbation). Are there better solutions?

scanpy.get.aggregate


I am looking into making this Dask-compatible as well. The issue is that the worst case is pretty bad… looking into it though!