Hi,
I have a perhaps unusual use-case.
I am working with the output of cellranger multi for the new probe-based fixed single-cell kit.
The unfiltered raw_feature_bc_matrix.h5, which I want to use with CellBender and co., contains probes rather than transcript species.
That means when I load it with sc.read_10x_h5(), there will be duplicate entries in adata.var and in the columns of adata.X, as some genes have multiple probes targeting them. These entries have identical var_names.
What would be the most graceful way to merge these entries?
Do you have both the same targets and the same expression levels?
Otherwise, maybe you’d want the “better” probe? I don’t know how you’d decide that though.
Hi Isaac, the expression levels differ between probes.
They seem to be targeting different forms of the genes.
However, I see now that 10x simply removes blacklisted probes, which results in unique .var in the end.
So not relevant after all.
Out of curiosity, how would one merge genes? Extract the index numbers and then operate on anndata.X?
When I’ve done this with microarray data in the past, it was something like:
DataFrame where each row is a gene
Group by probe target
Some aggregation (max, mean, etc.)
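The steps above can be sketched with pandas on a small dense toy matrix. The names `counts` and `probe_targets` are made up for illustration; with a real AnnData object they would come from a densified `adata.X` and `adata.var_names`.

```python
import numpy as np
import pandas as pd

# Toy dense counts: 3 cells x 4 probes; "GeneA" is targeted by two probes.
# (Hypothetical data, standing in for a densified adata.X.)
counts = np.array([
    [1, 5, 0, 2],
    [0, 3, 4, 0],
    [2, 2, 1, 7],
])
probe_targets = ["GeneA", "GeneA", "GeneB", "GeneC"]

# DataFrame where each row is a probe, labelled by the gene it targets.
df = pd.DataFrame(counts.T, index=probe_targets)

# Group rows by probe target and aggregate; swap .max() for .mean(), .sum(), etc.
merged = df.groupby(level=0, sort=True).max()

# Back to the cells x genes orientation.
merged_counts = merged.T.to_numpy()
merged_names = merged.index.to_list()
```

Whether `max`, `mean`, or `sum` is the right aggregation depends on what the probes measure, so treat the choice here as a placeholder.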
If you are okay with densifying the matrix, this should be straightforward, maybe with flox or numpy-groupies.
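For reference, the same grouped maximum can be written in plain NumPy without either library. This is only a sketch on toy data: `counts` and `probe_targets` are hypothetical names, and it assumes non-negative counts so that zero is a safe starting value.

```python
import numpy as np

# Toy dense counts: 3 cells x 4 probes (hypothetical data).
counts = np.array([
    [1, 5, 0, 2],
    [0, 3, 4, 0],
    [2, 2, 1, 7],
])
probe_targets = np.array(["GeneA", "GeneA", "GeneB", "GeneC"])

# Map every probe to the integer index of its (sorted, unique) target gene.
genes, group_idx = np.unique(probe_targets, return_inverse=True)

# Running max per gene, stored genes x cells so the group index is on axis 0.
# Counts are assumed non-negative, so starting from zero is safe.
grouped_max_T = np.zeros((genes.size, counts.shape[0]), dtype=counts.dtype)

# Unbuffered in-place maximum: rows sharing a group index are combined.
np.maximum.at(grouped_max_T, group_idx, counts.T)

grouped_max = grouped_max_T.T  # cells x genes
```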
For sparse, it’s a little more complicated. But this would be a good extension of the new sc.get.aggregate, and I’ve opened an issue to track it:
opened 05:58PM - 06 Mar 24 UTC
Enhancement ✨
Area – API
### What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?
### Please describe your wishes
@Intron7, found a use case 😆
It could be nice for `sc.get.aggregate` to be able to return sparse matrices where we don't expect the aggregation to return very dense data.
Previously discussed in:
* https://github.com/scverse/scanpy/issues/2892
Usecases include:
* Taking the `max` for multiple reports of a gene (`sc.get.aggregate(adata, "probe_target", "max")`, e.g. https://discourse.scverse.org/t/merging-identical-genes-from-10x-fixed-scrna/2142)
* *(note: max is not currently implemented)*
* Small aggregations, e.g. only summing neighbors
This would require both api design choices for what the argument is called, and efficient implementations for both dense and sparse results (`python-graphblas` could be useful here)
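Until something like that exists, one way to keep the result sparse is to multiply by a probe-to-gene indicator matrix. Note that this gives the sum over probes, not the `max` discussed above, and it is only a sketch: `X` and `probe_targets` are toy stand-ins for `adata.X` and `adata.var_names`.

```python
import numpy as np
from scipy import sparse

# Toy sparse counts: 3 cells x 4 probes (hypothetical data).
X = sparse.csr_matrix(np.array([
    [1, 5, 0, 2],
    [0, 3, 4, 0],
    [2, 2, 1, 7],
]))
probe_targets = np.array(["GeneA", "GeneA", "GeneB", "GeneC"])

# Map every probe to the integer index of its target gene.
genes, group_idx = np.unique(probe_targets, return_inverse=True)

# Indicator matrix (probes x genes) with a single 1 per row,
# in the column of the gene that probe targets.
n_probes = probe_targets.size
indicator = sparse.csr_matrix(
    (np.ones(n_probes, dtype=X.dtype), (np.arange(n_probes), group_idx)),
    shape=(n_probes, genes.size),
)

# Sparse matrix product sums the columns belonging to the same gene
# and never densifies the data.
X_summed = X @ indicator  # cells x genes, still sparse
```

A sparse `max` has no equally simple matrix-product form, which is part of why it is a little more complicated.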