Random UMAP and clustering results

Hello everyone, I have a problem: I get different UMAP and Louvain results on two machines with the same script, the same data, and the same software.

Also, very occasionally, repeatedly running PCA, neighbors, and Louvain on the same adata gives different results.

Hi Yan!

AFAIK all of these methods, except for neighbors, have a random element. PCA in particular, when it uses a randomized solver without a fixed seed, may not return identical components if run twice. This means that the downstream calculations will be subtly different, although the coarse result should be the same (unless you have different preprocessing too?). Setting a random state could be a way to solve this, by forcing Python to make the same “random” choices (see here for a more in-depth explanation and here for more context).
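As a small illustration of what fixing a random state does, here is a sketch using only Python's standard `random` module rather than scanpy itself. The helper name `draw` is made up for this example. Two generators seeded with the same value produce the same sequence, while unseeded ones generally do not:

```python
import random


def draw(seed=None, n=5):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # seed=None lets Python pick a seed itself
    return [rng.random() for _ in range(n)]


# Same seed -> identical "random" choices on every run.
print(draw(seed=0) == draw(seed=0))

# No seed -> almost certainly a different sequence each time.
print(draw() == draw())
```

The `random_state` arguments in scanpy and sklearn play the same role for their own internal generators.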

Thanks galicae. I set random_state to 0, the default, but still get different results on the two workstations.

I think I need to check whether their Python environments are identical.

I use embedded Python packages, and I wonder whether the same module versions are imported at runtime, because one machine has another Python environment installed.
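One way to check this is to print the interpreter, the package versions, and the file each module would be loaded from, then compare the output from both workstations. The following is a minimal sketch using only the standard library. The package list and the helper names `describe_environment` and `module_location` are assumptions for illustration, so adjust them to whatever your script actually imports:

```python
import importlib.util
import sys
from importlib import metadata

# Hypothetical list of distributions to compare; edit as needed.
PACKAGES = ["scanpy", "anndata", "numpy", "scipy",
            "scikit-learn", "umap-learn", "louvain"]


def describe_environment(packages=PACKAGES):
    """Map each distribution name to its installed version (or 'not installed')."""
    report = {"python": sys.version.split()[0], "executable": sys.executable}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


def module_location(module_name):
    """Return the file a top-level module would be imported from, without importing it."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
    # Shows whether scanpy comes from the embedded packages or another install.
    print("scanpy loaded from:", module_location("scanpy"))
```

Run it with the same interpreter that runs your analysis script on each machine. If the versions or paths differ, the two machines are not running the same code.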

Yeah, that could be another reason. Overall, if it’s so important that you have exactly the same object, it might be worth the effort of just copying a master version over.

I think it is also possible that multiple packages will require you to set the random seed for each of them independently. For example, the umap package has its own random_state parameter: UMAP Reproducibility — umap 0.5 documentation

/Valentine

It has something to do with the PCA function under the hood of scanpy, which comes from sklearn, I think.
If you set the solver (the svd_solver argument) to ‘full’, then it is reproducible.
I think the random state is not the issue, because it is set to ‘0’ by default for almost all of the scanpy functions.
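
Pulling the suggestions in this thread together, a rough sketch could look like the following. The function name `run_seeded_pipeline` is made up for illustration. It assumes that your scanpy version accepts `svd_solver="full"` for your data (this may depend on the version and on whether the matrix is sparse) and that `adata` is already preprocessed:

```python
def run_seeded_pipeline(adata, seed=0, svd_solver="full"):
    """Run PCA, neighbors, UMAP and Louvain with one explicit seed (a sketch)."""
    # Imported here so the function can be defined even without scanpy installed.
    import scanpy as sc

    # Deterministic PCA solver, as suggested above; it can be slower than
    # the randomized or arpack solvers on large matrices.
    sc.pp.pca(adata, svd_solver=svd_solver, random_state=seed)

    # Pass the seed to every step explicitly instead of relying on defaults.
    sc.pp.neighbors(adata, random_state=seed)
    sc.tl.umap(adata, random_state=seed)
    sc.tl.louvain(adata, random_state=seed)
    return adata
```

Calling `run_seeded_pipeline(adata)` on both machines and comparing `adata.obsm["X_pca"]` first should show whether the difference already appears at the PCA step. Even with all of this, small floating-point differences between machines with different CPUs or numerical libraries may still remain.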