I’m using Colab to run my analysis. Since the session restarts every time I log out, or is terminated after a long pause, I have come to realize how much the inconsistency of the doublet predictions is affecting my analysis.
This is most evident when I perform clustering: each restart generates a different UMAP profile.
Should I set a random seed?
Or is this normal? (I’m a newbie in this field.)
I would highly appreciate any advice I could get in this forum.
You should set scvi.settings.seed = 0 at the beginning (see any of our tutorials), but it’s not enough.
You will only get exactly the same UMAPs when comparing two runs that were each done after restarting the session (in an interactive session).
In other words, you might get different UMAPs when running the exact same code again within the same session, even after setting that seed.
Only setting the seed and restarting the session each time will guarantee reproducible results (given, of course, that you are running the same logic).
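One way to picture why a rerun inside the same session can differ: a seeded random generator keeps advancing as code runs, so the second call to a stochastic step starts from a different state than the first. The sketch below is only an analogy using Python's built-in `random` module. The `draw` function is a hypothetical stand-in for any stochastic step (model initialization, UMAP layout), not scvi-tools code, and this may not be the only source of variation in a real session.

```python
import random


def draw(n=3):
    """Hypothetical stand-in for a stochastic step such as training or UMAP."""
    return [random.random() for _ in range(n)]


# Seed once at the top, analogous to scvi.settings.seed = 0 in the first cell.
random.seed(0)
first = draw()

# Same code, same session: the generator state has moved on since the seed.
second = draw()

# Analogous to restarting the session and seeding again from scratch.
random.seed(0)
after_reset = draw()

print("rerun in same session matches:", first == second)
print("fresh start with same seed matches:", first == after_reset)
```

In a notebook this corresponds to restarting the runtime and running all cells from the top, with the seed set in the first cell.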
It’s unfortunately slightly worse on Colab. There is no guarantee that you get exactly the same plots in Colab. Some additional variation is due to the GPU used (CUDA results are not guaranteed to be identical across two different devices), and Colab sometimes switches the GPU or updates its cuDNN library. I don’t find differences in the downstream analysis of results, but sometimes in the number of Leiden clusters etc. The Reproducibility page of the PyTorch 2.5 documentation gives a short overview.
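If you want to narrow things down on the PyTorch side, here is a minimal sketch that applies the determinism switches described on that Reproducibility page and records the library and GPU versions, so you can at least tell when Colab has handed you a different environment. The helper name `describe_and_pin` is made up for this example and is not part of scvi-tools. These switches reduce run-to-run variation on one machine; they do not make results identical across different GPUs.

```python
import importlib.util


def describe_and_pin(seed=0):
    """Apply PyTorch determinism switches and report the runtime environment.

    Hypothetical helper for illustration; returns a dict of version info.
    """
    info = {"torch": None, "cuda": None, "cudnn": None, "gpu": None}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    import torch

    torch.manual_seed(seed)
    # Disable cuDNN autotuning, which may pick different kernels per run.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Warn (rather than raise) when an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["cudnn"] = torch.backends.cudnn.version()
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


print(describe_and_pin())
```

Saving this printout next to your figures makes it easier to check whether two runs that disagree were produced on the same GPU and cuDNN version.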