Inconsistency of scvi/SOLO in predicting doublets?

yiwei24 · November 3, 2024, 12:53pm

I’m using Colab to run my analysis, and since the session restarts every time I log out, or is terminated after a long pause, I have come to realize how much the inconsistency of the doublet predictions is affecting my analysis.

This is most evident when I perform clustering: each restart generates a different UMAP.

Should I set a random seed?
Or is this normal? (I’m a newbie in this field.)

I would highly appreciate any advice I could get in this forum.

ori-kron-wis · November 3, 2024, 1:12pm

You should set scvi.settings.seed = 0 at the beginning (see any of our tutorials), but it’s not enough.

You will only get exactly the same UMAPs when comparing two runs that each started from a freshly restarted session (in an interactive setting).
In other words, you might get different UMAPs when rerunning the exact same code within the same session, even after setting the seed.

Only setting the seed and restarting the session each time will guarantee reproducible results (assuming, of course, that you run the same code).
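
The difference between seeding and restarting can be illustrated without scvi-tools at all. The sketch below uses Python's standard random module as a stand-in for the random number generators used during training and embedding; the function name run_analysis is hypothetical and not part of any library. With scvi-tools, the seed itself would be set with scvi.settings.seed = 0 as described above.

```python
import random


def run_analysis(n=3):
    """Hypothetical stand-in for a stochastic step (training, UMAP, ...)."""
    return [random.random() for _ in range(n)]


# Fresh session: set the seed once at the top, then run.
random.seed(0)
first_run = run_analysis()

# Same session, same code, no restart: the generator has already advanced,
# so the second run draws different numbers.
second_run = run_analysis()

# Emulating a restart: the generator is back at the seeded state before the run.
random.seed(0)
after_restart = run_analysis()

print(first_run == second_run)     # False
print(first_run == after_restart)  # True
```

In this toy case, reseeding is equivalent to a restart. In a real session other state may persist as well, which is presumably why a full restart is the safer way to get back to a clean slate.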

yiwei24 · November 3, 2024, 1:32pm

Does this mean I must return the session to a clean slate every time I run scvi?

Would running multiple samples in the same session affect the consistency?

ori-kron-wis · November 3, 2024, 1:39pm

When you need to exactly reproduce your UMAPs and results, yes. This rule of thumb holds for any statistical code, not just SCVI.

If not, you should still get similar results, just slightly different because of the randomness in the process.

I didn’t see exactly what you did, but having multiple samples should not be the reason.

yiwei24 · November 3, 2024, 1:59pm

Thank you for your thorough explanation.

cane11 · November 7, 2024, 5:02am

It’s unfortunately slightly worse on Colab: there is no guarantee that you get exactly the same plots. Some additional variation comes from the GPU in use (CUDA results are not guaranteed to be identical across different devices), and Colab sometimes switches the GPU or updates its cuDNN library. I don’t find differences in the downstream analysis of results, but sometimes the number of Leiden clusters etc. differs. The Reproducibility page of the PyTorch 2.5 documentation gives a short overview.
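
As a rough sketch of what that PyTorch page suggests, the helper below seeds the generators that are installed, asks PyTorch for deterministic kernels, and records the GPU and cuDNN version so that a hardware or library switch between Colab sessions is at least visible. The name make_run_reproducible and the layout of the returned report are my own assumptions, not part of scvi-tools or PyTorch, and scvi.settings.seed may already cover part of this. It does not make results identical across different GPUs.

```python
import os
import random


def make_run_reproducible(seed=0):
    """Hypothetical helper: seed available RNGs and request deterministic kernels."""
    # PyTorch documents this variable as required for deterministic cuBLAS;
    # it has to be set before any CUDA work starts.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    report = {"seed": seed, "numpy": None, "torch": None, "gpu": None, "cudnn": None}

    try:
        import numpy as np
        np.random.seed(seed)
        report["numpy"] = np.__version__
    except ImportError:
        pass

    try:
        import torch
    except ImportError:
        return report  # nothing more to configure without PyTorch

    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    torch.backends.cudnn.benchmark = False  # avoid run-dependent kernel selection
    # Warn instead of raising when an operation has no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)
    report["torch"] = torch.__version__
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
        report["cudnn"] = torch.backends.cudnn.version()
    return report


# Call once at the top of a freshly restarted session and keep the output
# next to the results, so two runs can be compared on equal footing.
print(make_run_reproducible(0))
```

If the reported GPU or cuDNN version differs between two sessions, small differences in plots or in the number of Leiden clusters are to be expected even with identical code and seed.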
