How to interpret Diffusion Maps?

I’m trying to adapt this to a very simple case which is the iris dataset:

import anndata as ad
import numpy as np
import pandas as pd
import scanpy as sc
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import csr_matrix
from sklearn import datasets

iris = datasets.load_iris()
X_iris = pd.DataFrame(
    iris.data,
    columns=iris.feature_names,
)
X_iris.index = X_iris.index.map(str)
df_meta = pd.Series(iris.target).map(lambda x: iris.target_names[x]).to_frame("species")
df_meta.index = df_meta.index.map(str)


adata = ad.AnnData(X_iris, obs=df_meta)
sc.pp.neighbors(adata, n_neighbors=10, use_rep='X', method='gauss', metric="euclidean")
sc.tl.diffmap(adata, n_comps=min(adata.shape))
adata.obsm["X_diffmap_"] = adata.obsm["X_diffmap"][:, 1:] 
sc.pl.embedding(adata, "diffmap_", color=["species"])


I have a few questions:

  • How are the distances in adata.obsp["distances"] calculated?
    I assumed they were computed with the following code, but the check below shows they don't match:
neigh = NearestNeighbors(n_neighbors=10, metric="euclidean")
neigh.fit(adata.X)
distances = neigh.kneighbors_graph(np.ascontiguousarray(adata.X, dtype=np.float32), mode="distance")
np.allclose(distances.toarray(), adata.obsp["distances"].toarray())
# False
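One possibility I considered (an assumption on my part, not confirmed from Scanpy's source): Scanpy appears to count each cell as its own first neighbor, so the stored graph has n_neighbors - 1 nonzero entries per row. Dropping the self column from sklearn's result at least reproduces that sparsity pattern:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors

X = datasets.load_iris().data.astype(np.float32)
n_obs, n_neighbors = X.shape[0], 10

# Querying the training data itself: each point's nearest neighbor is itself
# at distance 0, so drop that first column to mimic Scanpy's convention.
nn = NearestNeighbors(n_neighbors=n_neighbors, metric="euclidean").fit(X)
dist, idx = nn.kneighbors(X)
dist, idx = dist[:, 1:], idx[:, 1:]

indptr = np.arange(0, n_obs * (n_neighbors - 1) + 1, n_neighbors - 1)
distances = csr_matrix((dist.ravel(), idx.ravel(), indptr), shape=(n_obs, n_obs))
# Each row now holds 9 stored distances, matching the sparsity pattern of
# adata.obsp["distances"]; whether the values agree exactly may still depend
# on float32 rounding and the approximate-kNN backend.
```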
  • Similarly, how are the connectivities calculated?
    I thought it would come from similar code, but sklearn's NearestNeighbors with mode="connectivity" returns binary 0/1 values, not continuous floats:
neigh.kneighbors_graph(np.ascontiguousarray(adata.X, dtype=np.float32), mode="connectivity")
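My working guess (hedged; the 'gauss' method comes from Haghverdi et al., and Scanpy's actual bandwidth rule may well differ) is that the connectivities are Gaussian kernel weights with a per-cell adaptive width, something in the spirit of:

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors

X = datasets.load_iris().data
n_neighbors = 10

nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
dist, _ = nn.kneighbors(X)
# Per-cell bandwidth: distance to the farthest of the k neighbors
# (a hypothetical choice; Scanpy's actual sigma rule may differ).
sigma = dist[:, -1]

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))
np.fill_diagonal(W, 0.0)  # no self-connectivity

# W is symmetric with weights in [0, 1]; a kNN mask would then sparsify it,
# which would explain the continuous floats in adata.obsp["connectivities"].
```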
  • How are the eigenvalues interpreted?

According to this thread, you are supposed to drop the first dimension because it’s the steady state and not informative: Error when repeating the tutorial for diffusion map in v1.9.1 scanpy · Issue #2254 · scverse/scanpy · GitHub

adata.uns["diffmap_evals"]
# array([1.        , 1.        , 0.9813439 , 0.94277596], dtype=float32)

Are the eigenvalues transformed?

In some literature, the steady-state eigenvector's eigenvalue is 0, and a component's importance is inversely proportional to its non-zero eigenvalue.

The Laplacian eigenvectors that are identified in the process are also of interest for a different reason. The nth eigenvector vn contains one element corresponding to each of the species, which is related to the nth i-trait for that species. The corresponding eigenvalue λn is inversely proportional to the relative importance of this nth trait axis (SI Appendix).

Ryabov et al. 2022 doi:10.1073/pnas.2118156119
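To try to reconcile the two conventions, I sanity-checked numerically that if P is the row-stochastic transition matrix, the random-walk Laplacian is L = I − P, so a transition eigenvalue λ near 1 corresponds to a Laplacian eigenvalue 1 − λ near 0 (pure NumPy, not Scanpy's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))

# Symmetric Gaussian kernel and its row-stochastic transition matrix P
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2)
P = K / K.sum(axis=1, keepdims=True)

evals = np.sort(np.linalg.eigvals(P).real)[::-1]       # descending
lap_evals = np.sort(np.linalg.eigvals(np.eye(20) - P).real)  # ascending

# The largest eigenvalue of P is exactly 1 (the steady state); the matching
# Laplacian eigenvalue is 0, which would line up with Ryabov et al.'s
# "eigenvalue 0" convention while Scanpy reports the transition-matrix 1.
```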

Are eigenvalues in descending order of importance?

What does it mean that the 2nd eigenvalue is also 1? Does that mean this dimension is also uninformative?
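One thing I checked (a standard spectral-graph fact, not Scanpy-specific): the multiplicity of eigenvalue 1 of the transition matrix equals the number of connected components of the graph, so a repeated 1 might just mean the kNN graph splits into disconnected pieces (e.g. setosa separating from the other species):

```python
import numpy as np
from scipy.linalg import block_diag

def random_walk(n, seed):
    """Row-stochastic transition matrix from a random symmetric kernel."""
    rng = np.random.default_rng(seed)
    K = rng.random((n, n))
    K = K + K.T  # symmetric, strictly positive kernel
    return K / K.sum(axis=1, keepdims=True)

# Two disconnected components -> block-diagonal transition matrix
P = block_diag(random_walk(5, 0), random_walk(7, 1))
evals = np.sort(np.linalg.eigvals(P).real)[::-1]
n_ones = np.isclose(evals, 1.0).sum()
# One eigenvalue equal to 1 per connected component, so n_ones is 2 here.
```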

  • Are there any methods with Scanpy to fit a model and transform new data based on the model?

For example, like this: Usage — pydiffmap 0.2.0.1 documentation

mydmap.fit(X)
dmap_X = mydmap.transform(X)
dmap_Y = mydmap.transform(Y)
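As far as I can tell Scanpy has no transform step for new observations, so I sketched a Nyström-style out-of-sample extension myself (hypothetical helpers, hedged: a plain Gaussian kernel, not Scanpy's adaptive one, and pydiffmap's transform is presumably more careful about bandwidth and normalization):

```python
import numpy as np

def fit_diffmap(X, eps, n_comps):
    """Fit a plain Gaussian-kernel diffusion map (a sketch, not Scanpy's)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)
    d = K.sum(axis=1)
    # Symmetric conjugate of P = D^-1 K, so np.linalg.eigh applies
    S = K / np.sqrt(np.outer(d, d))
    w, V = np.linalg.eigh(S)
    order = np.argsort(w)[::-1][: n_comps + 1]
    w, V = w[order], V[:, order]
    Psi = V / np.sqrt(d)[:, None]  # right eigenvectors of P
    # Drop the trivial steady-state component (eigenvalue 1)
    return {"X": X, "eps": eps, "evals": w[1:], "Psi": Psi[:, 1:]}

def transform_diffmap(model, Y):
    """Nystrom extension: psi_j(y) = (1/lambda_j) * sum_i P(y, x_i) psi_j(x_i)."""
    d2 = ((Y[:, None, :] - model["X"][None, :, :]) ** 2).sum(-1)
    Ky = np.exp(-d2 / model["eps"])
    Py = Ky / Ky.sum(axis=1, keepdims=True)
    # The 1/lambda_j cancels against the lambda_j in the diffusion coordinate
    return Py @ model["Psi"]

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
model = fit_diffmap(X, eps=2.0, n_comps=2)
coords = model["Psi"] * model["evals"]  # diffusion coordinates of the fit set
# Transforming the training data reproduces its own coordinates,
# because P @ Psi = Psi * evals holds exactly on the fit set.
```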
  • What does it mean when you get negative eigenvalues?

  • I noticed that I was able to generate more Diffusion Map dimensions than I had dimensions in my original data. Are any of these dimensions informative? Is there a rule of thumb?
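My current rule-of-thumb candidate (a heuristic from the spectral clustering literature, not a Scanpy recommendation): the decomposition is of the n_obs × n_obs transition matrix, so up to n_obs components exist regardless of the number of input features, and one keeps components up to the largest gap in the eigenvalue spectrum:

```python
import numpy as np

# Hypothetical eigenvalue spectrum (descending, steady state already dropped)
evals = np.array([0.98, 0.94, 0.91, 0.52, 0.48, 0.45, 0.41])

# Keep the components before the largest drop between consecutive eigenvalues
gaps = evals[:-1] - evals[1:]
n_keep = int(np.argmax(gaps)) + 1
# The big gap between 0.91 and 0.52 suggests keeping 3 components here.
```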