How to interpret Diffusion Maps?

I’m trying to adapt this to a very simple case which is the iris dataset:

import anndata as ad
import numpy as np
import pandas as pd
import scanpy as sc
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import csr_matrix
from sklearn import datasets

iris = datasets.load_iris()
X_iris = pd.DataFrame(
    iris.data,
    columns=iris.feature_names,
)
X_iris.index = X_iris.index.map(str)
df_meta = pd.Series(iris.target).map(lambda x: iris.target_names[x]).to_frame("species")
df_meta.index = df_meta.index.map(str)


adata = ad.AnnData(X_iris, obs=df_meta)
sc.pp.neighbors(adata, n_neighbors=10, use_rep='X', method='gauss', metric="euclidean")
sc.tl.diffmap(adata, n_comps=min(adata.shape))
adata.obsm["X_diffmap_"] = adata.obsm["X_diffmap"][:, 1:] 
sc.pl.embedding(adata, "diffmap_", color=["species"])


I have a few questions:

  • How are the distances in adata.obsp["distances"] calculated?
    I assumed they were computed with the following code, but the check below shows they don't match:
neigh = NearestNeighbors(n_neighbors=10, metric="euclidean")
neigh.fit(adata.X)
distances = neigh.kneighbors_graph(np.ascontiguousarray(adata.X, dtype=np.float32), mode="distance")
np.allclose(distances.toarray(), adata.obsp["distances"].toarray())
# False
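One possibility I considered (an assumption on my part, not confirmed from Scanpy's source): Scanpy appears to count each cell as its own first neighbor, so the stored graph has n_neighbors - 1 nonzero entries per row. Dropping the self column from sklearn's result at least reproduces that sparsity pattern:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors

X = datasets.load_iris().data.astype(np.float32)
n_obs, n_neighbors = X.shape[0], 10

# Querying the training data itself: each point's nearest neighbor is itself
# at distance 0, so drop that first column to mimic Scanpy's convention.
nn = NearestNeighbors(n_neighbors=n_neighbors, metric="euclidean").fit(X)
dist, idx = nn.kneighbors(X)
dist, idx = dist[:, 1:], idx[:, 1:]

indptr = np.arange(0, n_obs * (n_neighbors - 1) + 1, n_neighbors - 1)
distances = csr_matrix((dist.ravel(), idx.ravel(), indptr), shape=(n_obs, n_obs))
# Each row now holds 9 stored distances, matching the sparsity pattern of
# adata.obsp["distances"]; whether the values agree exactly may still depend
# on float32 rounding and the approximate-kNN backend.
```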
  • Similarly, how are the connectivities calculated?
    I thought it would come from similar code, but sklearn's NearestNeighbors with mode="connectivity" returns binary 0/1 values, not continuous floats:
neigh.kneighbors_graph(np.ascontiguousarray(adata.X, dtype=np.float32), mode="connectivity")
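My working guess (hedged; the 'gauss' method comes from Haghverdi et al., and Scanpy's actual bandwidth rule may well differ) is that the connectivities are Gaussian kernel weights with a per-cell adaptive width, something in the spirit of:

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors

X = datasets.load_iris().data
n_neighbors = 10

nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
dist, _ = nn.kneighbors(X)
# Per-cell bandwidth: distance to the farthest of the k neighbors
# (a hypothetical choice; Scanpy's actual sigma rule may differ).
sigma = dist[:, -1]

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))
np.fill_diagonal(W, 0.0)  # no self-connectivity

# W is symmetric with weights in [0, 1]; a kNN mask would then sparsify it,
# which would explain the continuous floats in adata.obsp["connectivities"].
```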
  • How are the eigenvalues interpreted?

According to this thread, you are supposed to drop the first dimension because it’s the steady state and not informative: Error when repeating the tutorial for diffusion map in v1.9.1 scanpy · Issue #2254 · scverse/scanpy · GitHub

adata.uns["diffmap_evals"]
# array([1.        , 1.        , 0.9813439 , 0.94277596], dtype=float32)

Are the eigenvalues transformed?

In some literature, the steady-state eigenvector's eigenvalue is 0, and a component's importance is inversely proportional to its non-zero eigenvalue.

The Laplacian eigenvectors that are identified in the process are also of interest for a different reason. The nth eigenvector vn contains one element corresponding to each of the species, which is related to the nth i-trait for that species. The corresponding eigenvalue λn is inversely proportional to the relative importance of this nth trait axis (SI Appendix).

Ryabov et al. 2022 doi:10.1073/pnas.2118156119
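To try to reconcile the two conventions, I sanity-checked numerically that if P is the row-stochastic transition matrix, the random-walk Laplacian is L = I − P, so a transition eigenvalue λ near 1 corresponds to a Laplacian eigenvalue 1 − λ near 0 (pure NumPy, not Scanpy's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))

# Symmetric Gaussian kernel and its row-stochastic transition matrix P
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2)
P = K / K.sum(axis=1, keepdims=True)

evals = np.sort(np.linalg.eigvals(P).real)[::-1]       # descending
lap_evals = np.sort(np.linalg.eigvals(np.eye(20) - P).real)  # ascending

# The largest eigenvalue of P is exactly 1 (the steady state); the matching
# Laplacian eigenvalue is 0, which would line up with Ryabov et al.'s
# "eigenvalue 0" convention while Scanpy reports the transition-matrix 1.
```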

Are eigenvalues in descending order of importance?

What does it mean that the 2nd eigenvalue is also 1? Does that mean this dimension is also uninformative?
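One thing I checked (a standard spectral-graph fact, not Scanpy-specific): the multiplicity of eigenvalue 1 of the transition matrix equals the number of connected components of the graph, so a repeated 1 might just mean the kNN graph splits into disconnected pieces (e.g. setosa separating from the other species):

```python
import numpy as np
from scipy.linalg import block_diag

def random_walk(n, seed):
    """Row-stochastic transition matrix from a random symmetric kernel."""
    rng = np.random.default_rng(seed)
    K = rng.random((n, n))
    K = K + K.T  # symmetric, strictly positive kernel
    return K / K.sum(axis=1, keepdims=True)

# Two disconnected components -> block-diagonal transition matrix
P = block_diag(random_walk(5, 0), random_walk(7, 1))
evals = np.sort(np.linalg.eigvals(P).real)[::-1]
n_ones = np.isclose(evals, 1.0).sum()
# One eigenvalue equal to 1 per connected component, so n_ones is 2 here.
```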

  • Are there any methods with Scanpy to fit a model and transform new data based on the model?

For example, like this: Usage — pydiffmap 0.2.0.1 documentation

mydmap.fit(X)
dmap_X = mydmap.transform(X)
dmap_Y = mydmap.transform(Y)
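As far as I can tell Scanpy has no transform step for new observations, so I sketched a Nyström-style out-of-sample extension myself (hypothetical helpers, hedged: a plain Gaussian kernel, not Scanpy's adaptive one, and pydiffmap's transform is presumably more careful about bandwidth and normalization):

```python
import numpy as np

def fit_diffmap(X, eps, n_comps):
    """Fit a plain Gaussian-kernel diffusion map (a sketch, not Scanpy's)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)
    d = K.sum(axis=1)
    # Symmetric conjugate of P = D^-1 K, so np.linalg.eigh applies
    S = K / np.sqrt(np.outer(d, d))
    w, V = np.linalg.eigh(S)
    order = np.argsort(w)[::-1][: n_comps + 1]
    w, V = w[order], V[:, order]
    Psi = V / np.sqrt(d)[:, None]  # right eigenvectors of P
    # Drop the trivial steady-state component (eigenvalue 1)
    return {"X": X, "eps": eps, "evals": w[1:], "Psi": Psi[:, 1:]}

def transform_diffmap(model, Y):
    """Nystrom extension: psi_j(y) = (1/lambda_j) * sum_i P(y, x_i) psi_j(x_i)."""
    d2 = ((Y[:, None, :] - model["X"][None, :, :]) ** 2).sum(-1)
    Ky = np.exp(-d2 / model["eps"])
    Py = Ky / Ky.sum(axis=1, keepdims=True)
    # The 1/lambda_j cancels against the lambda_j in the diffusion coordinate
    return Py @ model["Psi"]

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
model = fit_diffmap(X, eps=2.0, n_comps=2)
coords = model["Psi"] * model["evals"]  # diffusion coordinates of the fit set
# Transforming the training data reproduces its own coordinates,
# because P @ Psi = Psi * evals holds exactly on the fit set.
```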
  • What does it mean when you get negative eigenvalues?

  • I noticed that I was able to generate more Diffusion Map dimensions than I had dimensions in my original data. Are any of these dimensions informative? Is there a rule of thumb?
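My current rule-of-thumb candidate (a heuristic from the spectral clustering literature, not a Scanpy recommendation): the decomposition is of the n_obs × n_obs transition matrix, so up to n_obs components exist regardless of the number of input features, and one keeps components up to the largest gap in the eigenvalue spectrum:

```python
import numpy as np

# Hypothetical eigenvalue spectrum (descending, steady state already dropped)
evals = np.array([0.98, 0.94, 0.91, 0.52, 0.48, 0.45, 0.41])

# Keep the components before the largest drop between consecutive eigenvalues
gaps = evals[:-1] - evals[1:]
n_keep = int(np.argmax(gaps)) + 1
# The big gap between 0.91 and 0.52 suggests keeping 3 components here.
```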