Efficient pseudobulking

With the newest anndata/scanpy releases, what is the recommended scverse workflow for efficiently pseudobulking a large AnnData object, i.e. summing the counts for each var across groups of cells defined by some column in obs? In the past I used decoupler, but it can be pretty slow when you have hundreds of groups (e.g. celltype x perturbation). Are there better solutions?

scanpy.get.aggregate

I am looking into making this Dask-compatible as well. The issue is that the worst case is pretty bad, but I am looking into it!

@ilan-gold any progress on Dask-compatible aggregate? :grimacing:

I will add it to our next sprint. It should be doable; sorry for the delay here.