Scib_metrics run time

Hi,

I am trying to compare several batch effect removal tools using scib_metrics.
However, the benchmarking function didn’t progress after waiting for an hour.
What runtime would one expect for this function?

Stats about the dataset:


Thanks for trying out the package. It should only take a few minutes. I assume you are only using CPU and do not have access to a GPU?

Can you try running like this?

from scib_metrics.benchmark import Benchmarker, BioConservation

biocons = BioConservation(
    isolated_labels=False, nmi_ari_cluster_labels_leiden=False, nmi_ari_cluster_labels_kmeans=False
)
bm = Benchmarker(
    adata,
    batch_key="sample",
    label_key="major_cell_type",
    embedding_obsm_keys=["Unintegrated", ...],
    pre_integrated_embedding_obsm_key="X_pca",
    bio_conservation_metrics=biocons,
    n_jobs=16,
)
bm.benchmark()

I think nmi_ari_cluster_labels_leiden or isolated_labels may be slow.

I tried running the snippet and again it got stuck:

I believe it’s connected to the silhouette score, because when I interrupt manually I see that this is what it’s working on:

I will try to take it out.

In your example, is "Unintegrated" the same as X_pca?

Can you also try the following?

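from scib_metrics.benchmark import BatchCorrection, BioConservation

# Skip the clustering-based metrics and compute the silhouette scores in smaller chunks
biocons = BioConservation(
    isolated_labels=False,
    nmi_ari_cluster_labels_leiden=False,
    nmi_ari_cluster_labels_kmeans=False,
    silhouette_label={"chunk_size": 64},
)
batchcorr = BatchCorrection(
    silhouette_batch={"chunk_size": 64},
)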

It’s possible that you’re running into memory issues here.

Trying this would require updating to the latest version.
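
To give a rough sense of why a smaller chunk_size can help: assuming the silhouette computation builds one dense block of distances of shape (chunk_size, n_cells) at a time (an assumption about the implementation, not something confirmed in this thread), the memory per block grows linearly with chunk_size. Below is a back-of-the-envelope sketch in plain Python. The helper name and the cell count are hypothetical and only for illustration.

```python
def distance_block_bytes(n_cells: int, chunk_size: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for one dense (chunk_size x n_cells) block of distances.

    Assumes float32 values (4 bytes each) by default. This is an illustrative
    estimate, not a measurement of scib_metrics itself.
    """
    return n_cells * chunk_size * bytes_per_value


n_cells = 500_000  # hypothetical dataset size, not taken from this thread
for chunk_size in (64, 256, 1024):
    gb = distance_block_bytes(n_cells, chunk_size) / 1e9
    print(f"chunk_size={chunk_size:>5}: ~{gb:.2f} GB per distance block")
```

If the estimate for your own number of cells gets close to the available RAM, lowering chunk_size further is worth a try.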

Thanks a lot for helping out, Adam.
I updated to the latest version but unfortunately it still returned an error: