I’m trying to adapt this to a very simple case, the iris dataset:
import numpy as np
import pandas as pd
import anndata as ad
import scanpy as sc
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import csr_matrix
from sklearn import datasets
iris = datasets.load_iris()
X_iris = pd.DataFrame(
    iris.data,
    columns=iris.feature_names,
)
X_iris.index = X_iris.index.map(str)
df_meta = pd.Series(iris.target).map(lambda x: iris.target_names[x]).to_frame("species")
df_meta.index = df_meta.index.map(str)
adata = ad.AnnData(X_iris, obs=df_meta)
sc.pp.neighbors(adata, n_neighbors=10, use_rep='X', method='gauss', metric="euclidean")
sc.tl.diffmap(adata, n_comps=min(adata.shape))
adata.obsm["X_diffmap_"] = adata.obsm["X_diffmap"][:, 1:]
sc.pl.embedding(adata, "diffmap_", color=["species"])
I have a few questions:
How are the distances in adata.obsp["distances"] calculated?
I thought they were calculated with the following code, but apparently I was wrong:
neigh = NearestNeighbors(n_neighbors=10, metric="euclidean")
neigh.fit(adata.X)
distances = neigh.kneighbors_graph(np.ascontiguousarray(adata.X, dtype=np.float32), mode="distance")
np.allclose(distances.toarray(), adata.obsp["distances"].toarray())
# False
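One plausible explanation for the `False` (this is my assumption, not confirmed scanpy behavior): scanpy seems to count each observation itself among `n_neighbors`, so `adata.obsp["distances"]` stores only `n_neighbors - 1` nonzero entries per row. With sklearn you can mimic that by fitting with 9 neighbors and calling `kneighbors_graph()` *without* passing `X`, which makes sklearn exclude each training point from its own neighbor list:

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors

X = datasets.load_iris().data.astype(np.float32)

# Hypothesis: scanpy's n_neighbors=10 includes the point itself, so only
# 9 non-self distances per row end up in adata.obsp["distances"].
neigh = NearestNeighbors(n_neighbors=9, metric="euclidean")
neigh.fit(X)

# Calling kneighbors_graph() with no query argument excludes each training
# point from its own neighbor list.
D = neigh.kneighbors_graph(mode="distance")
print(D.shape, D.getnnz(axis=1).max())  # (150, 150), 9 stored neighbors per row
```

Comparing this `D` against `adata.obsp["distances"]` would be the way to verify the hypothesis.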
Similarly, how are the connectivities calculated?
I thought it would come from similar code, but sklearn's NearestNeighbors with mode="connectivity" returns binary 0/1 entries, not continuous floats:
neigh.kneighbors_graph(np.ascontiguousarray(adata.X, dtype=np.float32), mode="connectivity")
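From reading around, the connectivities with `method='gauss'` appear to come from a Gaussian kernel applied to the kNN distances with an adaptive, per-cell bandwidth (Haghverdi et al. 2016), while the default `method='umap'` uses fuzzy simplicial set membership strengths. Here is a toy *fixed*-bandwidth sketch (`sigma` is my own simplification, not scanpy's adaptive width) just to show how continuous weights arise from distances:

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors

X = datasets.load_iris().data.astype(np.float32)
D = NearestNeighbors(n_neighbors=9, metric="euclidean").fit(X).kneighbors_graph(mode="distance")

# Toy fixed-bandwidth Gaussian kernel: w_ij = exp(-d_ij^2 / (2 * sigma^2)).
# NOTE: sigma here is a crude global choice; scanpy's method='gauss' uses
# an adaptive per-cell bandwidth instead.
sigma = D.data.mean()
W = D.copy()
W.data = np.exp(-W.data**2 / (2 * sigma**2))

# Symmetrize: kNN graphs are directed, so keep an edge if either direction
# has it, taking the elementwise maximum of the two weights.
W = W.maximum(W.T)
print(W.data.min(), W.data.max())  # continuous weights in (0, 1]
```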
How are the eigenvalues interpreted?
According to this thread, you are supposed to drop the first dimension because it’s the steady state and not informative: Error when repeating the tutorial for diffusion map in v1.9.1 scanpy · Issue #2254 · scverse/scanpy · GitHub
adata.uns["diffmap_evals"]
# array([1. , 1. , 0.9813439 , 0.94277596], dtype=float32)
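One thing that helped me make sense of the leading 1: the diffusion operator is a row-stochastic transition matrix, and any row-stochastic matrix P satisfies P @ 1 = 1, so the constant vector is always an eigenvector with eigenvalue exactly 1, and no eigenvalue exceeds it in modulus. A quick numeric check on a toy matrix:

```python
import numpy as np

# Any row-stochastic matrix has eigenvalue 1 (eigenvector = constant vector),
# and by Perron-Frobenius its spectral radius is exactly 1.
rng = np.random.default_rng(0)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)  # make rows sum to 1

evals = np.linalg.eigvals(P)
print(np.max(np.abs(evals)))  # 1.0 up to floating point
```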
Are the eigenvalues transformed?
Some of the literature says the steady-state eigenvector's eigenvalue is 0, and that importance is inversely proportional to the nonzero eigenvalues.
The Laplacian eigenvectors that are identified in the process are also of interest for a different reason. The nth eigenvector vn contains one element corresponding to each of the species, which is related to the nth trait for that species. The corresponding eigenvalue λn is inversely proportional to the relative importance of this nth trait axis (SI Appendix).
Ryabov et al. 2022 doi:10.1073/pnas.2118156119
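If it helps reconcile the two conventions: for a random-walk graph Laplacian L = I - P, the eigenvalues are exactly 1 minus the transition-matrix eigenvalues (same eigenvectors), so the steady state (transition eigenvalue 1) corresponds to Laplacian eigenvalue 0, and "small Laplacian eigenvalue" means "important diffusion component". A quick check:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)   # row-stochastic transition matrix

# L = I - P shares eigenvectors with P; each eigenvalue lambda of P maps to
# 1 - lambda for L, so lambda = 1 (steady state) becomes 0.
eig_P = np.linalg.eigvals(P)
eig_L = np.linalg.eigvals(np.eye(5) - P)
print(np.allclose(np.sort_complex(1 - eig_P), np.sort_complex(eig_L)))
```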
Are eigenvalues in descending order of importance?
What does it mean that the 2nd eigenvalue is also 1? Does that mean this dimension is also uninformative?
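My guess (worth checking against your graph) is that an eigenvalue 1 with multiplicity 2 indicates the neighbor graph has two connected components, since the multiplicity of eigenvalue 1 of a stochastic matrix equals the number of its components; for iris, setosa plausibly forms its own component at n_neighbors=10. A toy demonstration with a block-diagonal transition matrix:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
A = rng.random((3, 3))
A /= A.sum(axis=1, keepdims=True)
B = rng.random((4, 4))
B /= B.sum(axis=1, keepdims=True)

# Two disconnected components glued into one transition matrix:
# eigenvalue 1 now has multiplicity 2 (one steady state per component).
P = block_diag(A, B)
evals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
print(evals[:3])  # two leading 1s, then a strictly smaller eigenvalue
```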
Are there any methods in Scanpy to fit a model and then transform new data with that model?
For example, like this: Usage — pydiffmap 0.2.0.1 documentation
mydmap.fit(X)
dmap_X = mydmap.transform(X)
dmap_Y = mydmap.transform(Y)
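As far as I can tell, Scanpy does not expose a transform() for diffmap (sc.tl.ingest covers PCA/UMAP-style mapping of new data onto a reference). A standard workaround is the Nyström out-of-sample extension: project the kernel rows of new points onto the training eigenvectors and rescale by the eigenvalues. A rough sketch, not Scanpy or pydiffmap API:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.random((30, 4))

# Symmetric Gaussian kernel on the training data and its top eigenpairs.
K = np.exp(-cdist(X, X, "sqeuclidean"))
evals, evecs = np.linalg.eigh(K)                       # ascending order
evals, evecs = evals[::-1][:3], evecs[:, ::-1][:, :3]  # top 3 components

def nystrom_extend(K_new, evecs, evals):
    """Embed new points from their kernel rows against the training set."""
    return K_new @ evecs / evals

# Sanity check: extending the training points themselves recovers their
# own eigenvector coordinates, since K @ v = lambda * v.
coords = nystrom_extend(K, evecs, evals)
print(np.allclose(coords, evecs))  # True
```

For genuinely new data Y, you would pass `np.exp(-cdist(Y, X, "sqeuclidean"))` as `K_new`.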

What does it mean when you get negative eigenvalues?
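On negative eigenvalues: transition-matrix eigenvalues lie in [-1, 1], and negative ones indicate oscillatory (near-bipartite) structure in the walk, where probability mass sloshes back and forth instead of diffusing smoothly. The extreme case:

```python
import numpy as np

# A walk that deterministically alternates between two states. The graph is
# bipartite, and bipartite structure produces negative eigenvalues; here
# they are exactly 1 and -1.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(np.sort(np.linalg.eigvals(P).real))  # [-1.  1.]
```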

I noticed that I was able to generate more Diffusion Map dimensions than I had dimensions in my original data. Are any of these dimensions informative? Is there a rule of thumb?
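On this last point, my understanding is that the number of diffusion components is bounded by the number of observations, not the number of input features, because the eigenvectors come from the n_obs × n_obs transition matrix; how informative the extra components are depends on their eigenvalues, and a common heuristic is to look for a gap or elbow in the spectrum. A toy check with 2-feature data:

```python
import numpy as np
from scipy.spatial.distance import cdist

# The diffusion operator is an n_obs x n_obs transition matrix, so it has
# up to n_obs eigenvectors regardless of how few input features there are.
rng = np.random.default_rng(0)
X = rng.random((50, 2))                      # only 2 input features
K = np.exp(-cdist(X, X, "sqeuclidean"))
P = K / K.sum(axis=1, keepdims=True)
evals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
print((evals > 1e-6).sum())                  # well more than 2 nontrivial eigenvalues
```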