Benchmarking with scib-metrics: Graph Connectivity as a batch correction metric

Hello!
I was wondering why graph connectivity is grouped as a batch correction metric in scib-metrics? From my understanding of the (reference) it seems to be measuring if cells of the same cell type are connected in the kNN graph - which sounds like a bioconservation metric?
Any insight would be appreciated!
Thanks,

Hello, the scvi-tools tag seems confusing here. The metric measures how well connected the cells of a given cell type are in the kNN graph. If batch correction fails, you get separated subgraphs for the same cell type (for example, one per batch). For intuition: batch correction metrics such as graph connectivity improve if you embed the full dataset as a single point.

Ah yes, I didn’t know which tag to put it under. I guess to me that still sounds like a bioconservation metric (i.e. how well a given cell type is embedded within neighbors of the same cell type). If there are separate graphs for a single cell type, but within each of those graphs the ‘batch label’ is fully mixed, then one would say that in terms of ‘batch correction’ alone the method did its job well. Hope that makes sense!

Again, this metric would be optimized by using a single point as the embedding. It therefore doesn’t measure bioconservation, since a single point is the worst possible embedding for cell-type conservation. It computes a subgraph per cell type and so has no notion of distances between cell types. See the scib.metrics.graph_connectivity page of the scib 1.1.4 documentation for a good description.
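To make the per-cell-type subgraph idea concrete, here is a minimal pure-Python sketch of the computation as the scib documentation describes it: for each cell type, keep only the kNN edges between cells of that type, take the fraction of those cells that sit in the largest connected component, and average over cell types. This is not the scib or scib-metrics code; the function names and the toy graph are made up for illustration.

```python
def largest_component_fraction(adjacency, members):
    """Fraction of `members` inside the largest connected component of the
    subgraph induced by `members` (edges leaving the set are ignored)."""
    seen, best = set(), 0
    for start in members:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency.get(node, ()):
                if neighbor in members and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best / len(members)


def graph_connectivity(adjacency, labels):
    """Mean over cell types of the largest-component fraction (hypothetical helper)."""
    per_label = {}
    for label in set(labels.values()):
        members = {cell for cell, lab in labels.items() if lab == label}
        per_label[label] = largest_component_fraction(adjacency, members)
    return sum(per_label.values()) / len(per_label), per_label


def build_adjacency(edges):
    """Undirected adjacency dict from a list of (cell, cell) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


# Toy graph (made up): four cells of type "T", two of type "B".
labels = {0: "T", 1: "T", 2: "T", 3: "T", 4: "B", 5: "B"}
# T is split into {0, 1} and {2, 3}; both halves touch B cell 4.
edges = [(0, 1), (2, 3), (4, 5), (1, 4), (2, 4)]

score_split, per_label_split = graph_connectivity(build_adjacency(edges), labels)
print(score_split, per_label_split["T"], per_label_split["B"])  # 0.75 0.5 1.0

# Adding one direct T-T edge joins the two halves of T.
score_joined, _ = graph_connectivity(build_adjacency(edges + [(1, 2)]), labels)
print(score_joined)  # 1.0
```

In the first graph all T cells can reach each other through B cell 4, but that path is discarded once the graph is restricted to T. That is why the score reflects whether a cell type is fragmented and says nothing about how far apart T and B are.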

Sorry to belabor the point, but the above documentation says subgraph connectivity is quantified per cell type. Wouldn’t a single point be the best embedding (from a bioconservation point of view) for a given cell type? I.e. all data points belonging to a given cell type are given the same embedding.

Bioconservation checks distances between cell types (how well separated the cell types are). Because the graph is built for one cell type at a time, the metric doesn’t measure whether different cell types are separated. You can therefore easily maximize this metric while minimizing bioconservation (all cell types embedded at a single point).
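The single-point argument can be checked on toy data. The sketch below (pure Python, with hypothetical helper names and made-up 1-D coordinates, not scib-metrics code) builds a symmetric kNN graph for three embeddings and scores each: one where every cell type is split by batch, one where batches are merged and cell types stay apart, and one where all cells are squeezed into a single blob with the cell types interleaved.

```python
from collections import defaultdict


def knn_graph(positions, k):
    """Symmetric brute-force kNN graph over 1-D positions."""
    adjacency = defaultdict(set)
    for i, p in enumerate(positions):
        # Sort by (distance, index) so the neighbor choice is deterministic.
        ranked = sorted((abs(p - q), j) for j, q in enumerate(positions) if j != i)
        for _, j in ranked[:k]:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def graph_connectivity(adjacency, labels):
    """Mean over cell types of (largest within-type component) / (cells of that type)."""
    fractions = []
    for label in sorted(set(labels)):
        members = {i for i, lab in enumerate(labels) if lab == label}
        seen, best = set(), 0
        for start in members:
            if start in seen:
                continue
            seen.add(start)
            stack, size = [start], 0
            while stack:
                node = stack.pop()
                size += 1
                for neighbor in adjacency[node]:
                    if neighbor in members and neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
            best = max(best, size)
        fractions.append(best / len(members))
    return sum(fractions) / len(fractions)


K = 4
by_type = ["A"] * 10 + ["B"] * 10

# 1) Batch effect remains: each cell type forms two far-apart groups of 5.
uncorrected = [offset + j for offset in (0.0, 1000.0, 2000.0, 3000.0) for j in range(5)]
# 2) Batches merged, cell types kept apart: one group of 10 per type.
integrated = [offset + j for offset in (0.0, 1000.0) for j in range(10)]
# 3) Everything in one blob with types interleaved. kNN only depends on the
#    neighbor ranking, so unit spacing stands in for "essentially one point".
collapsed = [float(i) for i in range(20)]
interleaved = ["A", "B"] * 10

score_uncorrected = graph_connectivity(knn_graph(uncorrected, K), by_type)
score_integrated = graph_connectivity(knn_graph(integrated, K), by_type)
score_collapsed = graph_connectivity(knn_graph(collapsed, K), interleaved)

print(score_uncorrected)  # 0.5
print(score_integrated)   # 1.0
print(score_collapsed)    # 1.0
```

The collapsed embedding scores as well as the well-integrated one even though its cell types are indistinguishable, while the only embedding that is penalized is the one where batches still split each cell type.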