Merging identical genes from 10x fixed scRNA

pakiessling · March 6, 2024, 5:13pm

Hi,

I have a perhaps unusual use-case.
I am working with the output of cellranger multi for the new probe-based fixed single-cell kit.
The unfiltered raw_feature_bc_matrix.h5, which I want to use with CellBender and co., contains probes rather than transcript species.

That means when I load it with sc.read_10x_h5(), there are duplicate entries in adata.var and in the columns of adata.X, because some genes have multiple probes targeting them. These entries have identical var_names.
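A quick way to see which features collide is to look at the duplicated names directly. This is a minimal sketch, assuming the feature names are available as a pandas Index (which is what adata.var_names is in AnnData). The gene names and probe counts below are made up for illustration:

```python
import pandas as pd

# Hypothetical feature names in the style of adata.var_names:
# GAPDH and ACTB are each targeted by more than one probe
var_names = pd.Index(["GAPDH", "ACTB", "GAPDH", "MYH6", "ACTB", "ACTB"])

# keep=False marks every occurrence of a repeated name, not only the later ones
dup_mask = var_names.duplicated(keep=False)

# Number of probes per repeated name
counts = var_names.value_counts()
repeated = counts[counts > 1]

print(var_names.is_unique)               # False
print(sorted(set(var_names[dup_mask])))  # ['ACTB', 'GAPDH']
print(repeated.to_dict())
```

Note that AnnData's var_names_make_unique() only renames the duplicates by appending suffixes; it does not merge their counts.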

What would be the most graceful way to merge these entries?

ivirshup · March 6, 2024, 5:55pm

Do you have both the same targets and the same expression levels?

Otherwise, maybe you’d want the “better” probe? I don’t know how you’d decide that though.

pakiessling · March 6, 2024, 6:17pm

Hi Isaac, the expression levels of the probes are different.
They seem to be targeting different forms of the genes.

However, I see now that 10x simply removes blacklisted probes, which results in unique .var names in the end.


So not relevant after all.

Out of curiosity, how would one merge genes? Extract the index positions and then operate on adata.X?
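For a dense matrix, "extract the index positions and operate on X" can be taken quite literally. This is a minimal sketch with a small made-up cells × probes matrix standing in for adata.X; it sums the columns that share a name. The values and gene names are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy cells x probes count matrix standing in for adata.X (3 cells, 4 probes)
X = np.array([
    [1, 0, 2, 5],
    [0, 3, 1, 0],
    [4, 1, 0, 2],
])
var_names = pd.Index(["GAPDH", "ACTB", "GAPDH", "MYH6"])

# Unique gene names, in order of first appearance
genes = var_names.unique()

# For each gene, find the column positions carrying that name and sum them
merged = np.column_stack(
    [X[:, np.flatnonzero(var_names == g)].sum(axis=1) for g in genes]
)

print(list(genes))  # ['GAPDH', 'ACTB', 'MYH6']
print(merged)
```

Swapping .sum(axis=1) for .max(axis=1) keeps the strongest probe per cell instead. The Python loop over genes is simple but not the fastest option for many thousands of features.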

ivirshup · March 6, 2024, 6:24pm

When I’ve done this with microarray data in the past, it was something like:

* A DataFrame where each row is a probe
* Group by probe target
* Some aggregation (max, mean, etc.)
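The steps above map almost directly onto pandas. This is a minimal sketch with made-up values, where rows are probes, columns are cells, and the index holds each probe's target gene (the name probe_target is just an illustrative label). With AnnData, something like adata.to_df().T would give this layout, at the cost of densifying X:

```python
import pandas as pd

# Rows are probes, columns are cells; the index holds the probe target (gene)
df = pd.DataFrame(
    {"cell1": [1, 2, 5, 0], "cell2": [0, 1, 0, 3], "cell3": [4, 0, 2, 1]},
    index=pd.Index(["GAPDH", "GAPDH", "MYH6", "ACTB"], name="probe_target"),
)

# Group rows by target gene and keep the maximum per cell
by_max = df.groupby(level="probe_target").max()

# Or average the probes of each gene instead
by_mean = df.groupby(level="probe_target").mean()

print(by_max)
print(by_mean)
```

By default groupby sorts the group keys, so the result is ordered alphabetically by gene rather than by first appearance.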

If you are okay with densifying the matrix, this should be straightforward, maybe with flox or numpy-groupies.
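flox and numpy-groupies provide optimized grouped reductions. As a dependency-free stand-in (this is plain NumPy, not either library's API), the unbuffered ufunc.at methods can perform the same grouped sum or max on a dense matrix. This is a minimal sketch with made-up data:

```python
import numpy as np

# Toy dense cells x probes matrix (standing in for a densified adata.X)
X = np.array([
    [1.0, 0.0, 2.0, 5.0],
    [0.0, 3.0, 1.0, 0.0],
    [4.0, 1.0, 0.0, 2.0],
])
probe_targets = np.array(["GAPDH", "ACTB", "GAPDH", "MYH6"])

# Integer group code per probe column; genes comes back sorted alphabetically
genes, codes = np.unique(probe_targets, return_inverse=True)
n_cells, n_genes = X.shape[0], len(genes)

# Grouped sum: add every probe column into the column of its gene.
# Working on the transposed views lets codes index the gene axis directly.
summed = np.zeros((n_cells, n_genes))
np.add.at(summed.T, codes, X.T)

# Grouped max: start from -inf so any observed value replaces it
maxed = np.full((n_cells, n_genes), -np.inf)
np.maximum.at(maxed.T, codes, X.T)

print(genes)
print(summed)
print(maxed)
```

ufunc.at is unbuffered, so repeated codes accumulate correctly, which is exactly what a fancy-indexed `summed.T[codes] += X.T` would get wrong.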

For sparse matrices it’s a little more complicated, but this would be a good extension of the new sc.get.aggregate, and I’ve opened an issue to track it:

github.com/scverse/scanpy

Option to return sparse arrays from `sc.get.aggregate` (opened 05:58 PM UTC, 06 Mar 2024, by ivirshup; labels: Enhancement ✨, Area – API)

What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?

Please describe your wishes
@Intron7, found a use case 😆 It could be nice for `sc.get.aggregate` to be able to return sparse matrices where we don't expect the aggregation to return very dense data.

Previously discussed in:
* https://github.com/scverse/scanpy/issues/2892

Use cases include:
* Taking the `max` for multiple reports of a gene (`sc.get.aggregate(adata, "probe_target", "max")`, e.g. https://discourse.scverse.org/t/merging-identical-genes-from-10x-fixed-scrna/2142)
  * (note: max is not currently implemented)
* Small aggregations, e.g. only summing neighbors

This would require both API design choices for what the argument is called and efficient implementations for both dense and sparse results (`python-graphblas` could be useful here).
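Until something like that is available, one way to handle a sparse X by hand is to express the per-gene sum as a matrix product with a sparse probes × genes indicator matrix, which never densifies X. This is a minimal sketch with scipy.sparse and made-up data. The per-gene max uses a simple loop over column slices; that is one possible approach, not how sc.get.aggregate works internally:

```python
import numpy as np
import scipy.sparse as sp

# Toy sparse cells x probes matrix (standing in for adata.X)
X = sp.csr_matrix(np.array([
    [1, 0, 2, 5],
    [0, 3, 1, 0],
    [4, 1, 0, 2],
]))
probe_targets = np.array(["GAPDH", "ACTB", "GAPDH", "MYH6"])
genes, codes = np.unique(probe_targets, return_inverse=True)

# Indicator matrix: entry (probe i, gene j) is 1 if probe i targets gene j
n_probes = len(codes)
indicator = sp.csr_matrix(
    (np.ones(n_probes, dtype=X.dtype), (np.arange(n_probes), codes)),
    shape=(n_probes, len(genes)),
)

# Sum per gene: (cells x probes) @ (probes x genes) stays sparse
summed = X @ indicator

# Max per gene: slice each gene's columns (CSC makes column slicing cheap).
# Implicit zeros take part in the max, which is harmless for non-negative counts.
X_csc = X.tocsc()
maxed = sp.hstack(
    [X_csc[:, np.flatnonzero(codes == j)].max(axis=1) for j in range(len(genes))],
    format="csr",
)

print(genes)
print(summed.toarray())
print(maxed.toarray())
```

The indicator-matrix product handles any linear aggregation (sums, or means after dividing by the probe count per gene); non-linear ones like max need a different strategy such as the loop above.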
