Can I use the package scib-metrics on methods that don't output an embedding?

Hello,

I am currently testing several tools to integrate my data, and I would like to compare them using the metrics computed by scIB.

I want to compare outputs of Seurat RPCA, Scanorama and scVI.
As far as I understand, while Scanorama and scVI output a low-dimensional embedding of the data, Seurat RPCA doesn't.

I want to use the scib-metrics package to benchmark the different integrations, but the documentation seems to suggest that this package currently works only on embedding-based methods:

In the tutorial we find:

Here we run a few embedding-based methods. By focusing on embedding-based methods, we can substantially reduce the runtime of the benchmarking metrics.

In principle, graph-based integration methods can also be benchmarked on some of the metrics that have graph inputs. Future work can explore using graph convolutional networks to embed the graph and then using the embedding-based metrics.

In the scib_metrics.benchmark.Benchmarker function documentation:
**embedding_obsm_keys** – List of obsm keys that contain the embeddings to be benchmarked.

This suggests I wouldn't be able to compare the output of RPCA with the others.
So my main question is: can I use the package scib-metrics on methods that don't output an embedding?

The original scib package was used to compare more integration tools than just the embedding-based ones. I tried installing it, but it conflicts with my version of pandas. I'll try to solve that (I guess I'll have to set up proper conda environments), but scib-metrics seemed more straightforward for what I was trying to do.

Any help would be appreciated !

From my understanding, it seems like Seurat RPCA is able to output a lower-dimensional representation of the data, right? If this is the case, you should be able to feed this into scib-metrics just like any embedding method. For reference, by default we compute PCA on the raw counts as a “benchmark” embedding in scib-metrics.
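
As a sketch of what that comparison could look like, assuming the embeddings are already stored in `adata.obsm` (the `.obs` column names and obsm keys below are hypothetical, and the call follows the API shown in the scib-metrics documentation):

```python
def run_benchmark(adata, embedding_keys):
    """Compare several integration embeddings with scib-metrics (sketch)."""
    # Deferred import: requires the scib-metrics package to be installed
    from scib_metrics.benchmark import Benchmarker

    bm = Benchmarker(
        adata,
        batch_key="batch",                    # hypothetical .obs column names
        label_key="cell_type",
        embedding_obsm_keys=embedding_keys,   # e.g. ["X_scvi", "X_scanorama", "X_rpca"]
    )
    bm.benchmark()
    bm.plot_results_table(min_max_scale=False)
    return bm
```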

Hello,

Thank you for your answer!

So, I mistakenly thought that RPCA only output corrected gene expression, but I was wrong: a new cell embedding is indeed computed by Seurat RPCA.

For anyone interested, in Seurat V5 (5.0.1):
After running:

#R
se <- IntegrateLayers(
  object = se, method = RPCAIntegration,
  orig.reduction = "pca", new.reduction = "integrated.rpca",
  verbose = FALSE
)

The integrated embeddings are available in the DimReduc “integrated.rpca” (or whatever name you gave it), and the cell embedding matrix can be extracted easily with:

#R
rpca_embed <- Embeddings(se, reduction = "integrated.rpca")

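To get that matrix into Python, one option (a sketch; the filename and obsm key are hypothetical) is to export it to CSV from R, e.g. with `write.csv(rpca_embed, "rpca_embed.csv")`, and read it back on the Python side:

```python
import csv

def load_embedding(path):
    """Read a cells x dimensions CSV exported from R (first column = barcode)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        barcodes, matrix = [], []
        for row in reader:
            barcodes.append(row[0])
            matrix.append([float(x) for x in row[1:]])
    return barcodes, matrix

# With an AnnData object (requires anndata/numpy; key name is hypothetical):
# import numpy as np
# barcodes, matrix = load_embedding("rpca_embed.csv")
# adata = adata[barcodes].copy()              # align cell order with the embedding
# adata.obsm["X_rpca"] = np.asarray(matrix)
```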
@mlebel Hello, were you able to run the tool? How did you import rpca_embed into the adata object in Python? I tried to use the tool, and the progress bar is not moving at all.

I assume you also created a GitHub issue; let me respond here. Your dataset is rather large (1.3 million cells). The implementation in scib-metrics is more performant than the original one, but computing the metrics is still expensive (NN graphs with a high number of neighbors).
We have GPU support within scib-metrics, which requires installing a GPU version of JAX; you can check that a GPU is actually being used. Otherwise, it is safe to downsample to e.g. 100k cells and compute the metrics on this subset. Above 300k cells, running on CPU will take a very long time.
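Downsampling can be done on the AnnData side after importing the embeddings, which sidesteps the Seurat assay question: subsetting an AnnData by cell indices subsets the `.obsm` embeddings consistently. A minimal sketch of drawing a random subset of cell indices (stdlib only; applying it to `adata` is shown in comments):

```python
import random

def subsample_indices(n_obs, target, seed=0):
    """Pick `target` distinct cell indices out of `n_obs`, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_obs), target))

# Applied to an AnnData object (requires anndata; sketch):
# idx = subsample_indices(adata.n_obs, 100_000)
# adata_sub = adata[idx].copy()   # matrices in .obsm are subset along with the cells
```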

I see, thank you. What is the expected time for the 1.3M cells to finish on GPU (comparing 5 methods)? I am not sure how downsampling the Seurat object affects the integrated assay. Perhaps the integrated assay will remain the same size while the number of cells in the object becomes smaller than the array (unequal sizes), and it will throw errors. Or I might be mistaken.

It depends a bit on the size of each sample, but expect it to take roughly 30 minutes per embedding.