Hi,
I am trying to compare several batch effect removal tools using scib_metrics.
However, the benchmarking function made no progress even after waiting for an hour.
What runtime should one expect for this function?
Stats about the dataset:
Thanks for trying out the package. It should only take a few minutes. I assume you are only using CPU and do not have access to a GPU?
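If you're not sure, a quick way to check which devices JAX (which scib-metrics uses to compute its metrics) can see:

import jax

# On a CPU-only machine this prints something like [CpuDevice(id=0)];
# with a usable GPU you would see a GPU/CUDA device instead.
print(jax.devices())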
Can you try running like this?
from scib_metrics.benchmark import Benchmarker, BioConservation

# Disable the bio-conservation metrics most likely to be slow.
biocons = BioConservation(
    isolated_labels=False,
    nmi_ari_cluster_labels_leiden=False,
    nmi_ari_cluster_labels_kmeans=False,
)

bm = Benchmarker(
    adata,
    batch_key="sample",
    label_key="major_cell_type",
    embedding_obsm_keys=["Unintegrated", ...],
    pre_integrated_embedding_obsm_key="X_pca",
    bio_conservation_metrics=biocons,
    n_jobs=16,
)
bm.benchmark()
I think nmi_ari_cluster_labels_leiden or isolated_labels may be the slow ones.
I tried running the snippet and it got stuck again:
I believe it's connected to the silhouette score, because when I interrupt the run manually I can see that this is what it's working on:
I will try to take it out, as sketched below.
In your example, is "Unintegrated" the same as X_pca?
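For reference, a minimal sketch of what I mean by taking it out, on top of the config from your snippet (this assumes silhouette_label is the right flag on the bio-conservation side and that the batch-correction metrics expose a matching silhouette_batch flag):

from scib_metrics.benchmark import BioConservation, BatchCorrection

# Same config as above, but with the silhouette metrics switched off
# entirely in addition to the clustering-based ones.
biocons = BioConservation(
    isolated_labels=False,
    nmi_ari_cluster_labels_leiden=False,
    nmi_ari_cluster_labels_kmeans=False,
    silhouette_label=False,
)
batchcorr = BatchCorrection(silhouette_batch=False)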
Can you also try the following?
from scib_metrics.benchmark import BioConservation, BatchCorrection

# Keep the silhouette metrics, but compute them in smaller chunks.
biocons = BioConservation(
    isolated_labels=False,
    nmi_ari_cluster_labels_leiden=False,
    nmi_ari_cluster_labels_kmeans=False,
    silhouette_label={"chunk_size": 64},
)
batchcorr = BatchCorrection(
    silhouette_batch={"chunk_size": 64},
)
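These would then be passed to the Benchmarker via bio_conservation_metrics and batch_correction_metrics; a sketch reusing the same adata and keys as the earlier call:

bm = Benchmarker(
    adata,
    batch_key="sample",
    label_key="major_cell_type",
    embedding_obsm_keys=["Unintegrated", ...],
    pre_integrated_embedding_obsm_key="X_pca",
    bio_conservation_metrics=biocons,
    batch_correction_metrics=batchcorr,
    n_jobs=16,
)
bm.benchmark()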
It’s possible that you’re running into memory issues here; a smaller chunk_size makes the silhouette computation process the pairwise distances in smaller blocks instead of all at once, at some cost in speed.
Note that trying this requires updating to the latest version of the package.
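In case it helps, a quick way to confirm which version is installed after upgrading (e.g. with pip install -U scib-metrics):

import importlib.metadata

# Prints the installed scib-metrics version.
print(importlib.metadata.version("scib-metrics"))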