Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon which computes a family of embeddings of a data set into Euclidean space (often low-dimensional) whose coordinates are derived from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the "diffusion distance" between probability distributions centered at those points. Unlike linear dimensionality reduction methods such as principal component analysis (PCA), diffusion maps is part of the family of nonlinear dimensionality reduction methods, which focus on discovering the underlying manifold from which the data have been sampled. By integrating local similarities at different scales, diffusion maps give a global description of the data set. Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive.


Definition of diffusion maps

Following Coifman and Lafon, diffusion maps can be defined in four steps.


Connectivity

Diffusion maps exploit the relationship between heat diffusion and random walk Markov chains. The basic observation is that if we take a random walk on the data, walking to a nearby data point is more likely than walking to one that is far away. Let (X, \mathcal{A}, \mu) be a measure space, where X is the data set and \mu represents the distribution of the points on X.

Based on this, the connectivity k between two data points, x and y, can be defined as the probability of walking from x to y in one step of the random walk. Usually, this probability is specified in terms of a kernel function of the two points, k: X \times X \rightarrow \mathbb{R}. For example, the popular Gaussian kernel:

: k(x,y)=\exp\left(-\frac{\|x-y\|^2}{\epsilon}\right)

More generally, the kernel function has the following properties:

: k(x,y) = k(y,x) (k is symmetric)

: k(x,y) \geq 0 \,\,\forall x,y (k is positivity preserving).

The kernel constitutes the prior definition of the ''local'' geometry of the data set. Since a given kernel will capture a specific feature of the data set, its choice should be guided by the application that one has in mind. This is a major difference from methods such as principal component analysis (PCA), where correlations between all data points are taken into account at once.

Given (X, k), we can then construct a reversible discrete-time Markov chain on X (a process known as the normalized graph Laplacian construction): define

: d(x) = \int_X k(x,y) \, d\mu(y)

and

: p(x,y) = \frac{k(x,y)}{d(x)}.

Although the new normalized kernel does not inherit the symmetry property, it does inherit the positivity-preserving property and gains a conservation property:

: \int_X p(x,y) \, d\mu(y) = 1.
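For illustration, the following is a minimal NumPy sketch of this construction on a finite data set, where the integral defining d(x) becomes a sum over the sample points (the function name transition_kernel and the bandwidth parameter epsilon are illustrative choices, not notation from the cited papers):

 import numpy as np

 def transition_kernel(X, epsilon):
     """One-step transition probabilities p(x, y) from a Gaussian kernel.

     X is an (n, d) array of n data points; epsilon is the kernel bandwidth.
     """
     # Pairwise squared Euclidean distances ||x - y||^2.
     sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
     K = np.exp(-sq_dists / epsilon)   # symmetric, positivity-preserving kernel
     d = K.sum(axis=1)                 # degrees d(x) = sum_y k(x, y)
     return K / d[:, None]             # rows sum to 1 (conservation property)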


Diffusion process

From p(x,y) we can construct a transition matrix of a Markov chain (M) on X. In other words, p(x,y) represents the one-step transition probability from x to y, and M^t gives the t-step transition matrix.

We define the diffusion matrix L (which is also a version of the graph Laplacian matrix):

: L_{i,j}=k(x_i,x_j).

We then define the new kernel

: L^{(\alpha)}_{i,j} = k^{(\alpha)}(x_i,x_j) = \frac{L_{i,j}}{(d(x_i)\,d(x_j))^{\alpha}},

or equivalently,

: L^{(\alpha)} = D^{-\alpha} L D^{-\alpha},

where D is a diagonal matrix with D_{i,i} = \sum_j L_{i,j}. We then apply the graph Laplacian normalization to this new kernel:

: M = (D^{(\alpha)})^{-1} L^{(\alpha)},

where D^{(\alpha)} is a diagonal matrix with D^{(\alpha)}_{i,i} = \sum_j L^{(\alpha)}_{i,j}. The t-step transition probability is then

: p(x_j, t \mid x_i) = M^t_{i,j}.

One of the main ideas of the diffusion framework is that running the chain forward in time (taking larger and larger powers of M) reveals the geometric structure of X at larger and larger scales (the diffusion process). Specifically, the notion of a ''cluster'' in the data set is quantified as a region in which the probability of escaping this region is low (within a certain time t). Therefore, t serves not only as a time parameter but also as a scale parameter.

The eigendecomposition of the matrix M^t yields

: M^t_{i,j} = \sum_l \lambda_l^t \psi_l(x_i)\phi_l(x_j),

where \{\lambda_l\} is the sequence of eigenvalues of M and \{\psi_l\} and \{\phi_l\} are the biorthogonal right and left eigenvectors, respectively. Due to the spectral decay of the eigenvalues, only a few terms are necessary to achieve a given relative accuracy in this sum.
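A sketch of this eigendecomposition with NumPy, assuming M has already been built as above (the helper name is hypothetical):

 import numpy as np

 def biorthogonal_eigensystem(M):
     """Eigenpairs of the Markov matrix M, sorted by decaying |lambda_l|.

     Returns lam, psi, phi with M^t = psi @ diag(lam**t) @ phi, so that
     M^t[i, j] = sum_l lam[l]**t * psi[i, l] * phi[l, j].
     """
     lam, psi = np.linalg.eig(M)          # right eigenvectors (columns)
     order = np.argsort(-np.abs(lam))
     lam, psi = lam[order], psi[:, order]
     phi = np.linalg.inv(psi)             # biorthogonal left eigenvectors (rows)
     # M is conjugate to a symmetric matrix, so its spectrum is real up to
     # round-off; discard spurious imaginary parts.
     return np.real(lam), np.real(psi), np.real(phi)

Because of the spectral decay, truncating the reconstruction to the first few terms, e.g. psi[:, :5] @ np.diag(lam[:5] ** 3) @ phi[:5, :] for t = 3, already approximates M^t well.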


Parameter α and the diffusion operator

The reason to introduce the normalization step involving \alpha is to tune the influence of the data-point density on the infinitesimal transitions of the diffusion. In some applications, the sampling of the data is not related to the geometry of the manifold we are interested in describing. In this case, we can set \alpha=1 and the diffusion operator approximates the Laplace–Beltrami operator, recovering the Riemannian geometry of the data set regardless of the distribution of the points. To describe the long-term behavior of the point distribution of a system of stochastic differential equations, we can use \alpha=0.5, and the resulting Markov chain approximates the Fokker–Planck diffusion. With \alpha=0, it reduces to the classical graph Laplacian normalization.
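A sketch of how the three choices of \alpha enter the computation, assuming a precomputed symmetric kernel matrix K (the function name is illustrative):

 import numpy as np

 def alpha_markov_matrix(K, alpha):
     """Markov matrix after alpha-normalization of the kernel matrix K.

     alpha = 0   -> classical graph Laplacian normalization,
     alpha = 0.5 -> Fokker-Planck diffusion,
     alpha = 1   -> approximation of the Laplace-Beltrami operator.
     """
     d = K.sum(axis=1)                        # degrees d(x_i)
     L_alpha = K / np.outer(d, d) ** alpha    # L^(a) = D^-a L D^-a
     return L_alpha / L_alpha.sum(axis=1, keepdims=True)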


Diffusion distance

The diffusion distance at time t between two points measures the similarity of the two points in the observation space in terms of the connectivity between them. It is given by

: D_t(x_i,x_j)^2 = \sum_y \frac{(p(y,t \mid x_i) - p(y,t \mid x_j))^2}{\phi_0(y)},

where \phi_0(y) is the stationary distribution of the Markov chain, given by the first left eigenvector of M. Explicitly:

: \phi_0(y) = \frac{d(y)}{\sum_{z \in X} d(z)}.

Intuitively, D_t(x_i,x_j) is small if there is a large number of short paths connecting x_i and x_j. There are several interesting features associated with the diffusion distance, based on our previous observation that t also serves as a scale parameter:

# Points are closer at a given scale (as specified by D_t(x_i,x_j)) if they are highly connected in the graph, which emphasizes the concept of a cluster.
# This distance is robust to noise, since the distance between two points depends on all possible paths of length t between the points.
# From a machine learning point of view, the distance takes into account all of the evidence linking x_i to x_j, making it appropriate for the design of inference algorithms based on the preponderance of evidence.
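As a concrete check of the definition, the diffusion distance can be evaluated directly from the t-step transition probabilities. A sketch assuming a precomputed symmetric kernel matrix K, using the \alpha=0 normalization for simplicity:

 import numpy as np

 def diffusion_distance(K, t, i, j):
     """D_t(x_i, x_j) computed from t-step transition probabilities."""
     d = K.sum(axis=1)
     M = K / d[:, None]                    # one-step Markov matrix
     phi0 = d / d.sum()                    # stationary distribution phi_0
     Mt = np.linalg.matrix_power(M, t)     # row i holds p(y, t | x_i)
     return np.sqrt(np.sum((Mt[i] - Mt[j]) ** 2 / phi0))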


Diffusion process and low-dimensional embedding

The diffusion distance can be calculated using the eigenvectors by

: D_t(x_i,x_j)^2 = \sum_l \lambda_l^{2t} (\psi_l(x_i)-\psi_l(x_j))^2,

so the eigenvectors can be used as a new set of coordinates for the data. The diffusion map is defined as:

: \Psi_t(x)=(\lambda_1^t\psi_1(x),\lambda_2^t\psi_2(x),\ldots,\lambda_k^t\psi_k(x)).

Because of the spectral decay, it is sufficient to use only the first ''k'' eigenvectors and eigenvalues. Thus we obtain the diffusion map from the original data to a ''k''-dimensional space. It is proved in Coifman and Lafon (2006) that

: D_t(x_i,x_j)^2 = \|\Psi_t(x_i)-\Psi_t(x_j)\|^2,

so the Euclidean distance in the diffusion coordinates approximates the diffusion distance.
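In code, the embedding can be obtained from the leading right eigenvectors of M. A sketch; the trivial pair \lambda_0 = 1 with constant \psi_0 is skipped because it contributes nothing to the distance:

 import numpy as np

 def diffusion_map(M, t, k):
     """Coordinates Psi_t(x) = (lam_1^t psi_1(x), ..., lam_k^t psi_k(x))."""
     lam, psi = np.linalg.eig(M)
     order = np.argsort(-np.abs(lam))
     lam, psi = np.real(lam[order]), np.real(psi[:, order])
     # Skip l = 0 (lam_0 = 1, constant eigenvector) and scale by lam_l^t.
     return psi[:, 1:k + 1] * lam[1:k + 1] ** t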


Algorithm

The basic algorithmic framework of the diffusion map is as follows:

# Given the similarity matrix ''L''.
# Normalize the matrix according to the parameter \alpha: L^{(\alpha)} = D^{-\alpha} L D^{-\alpha}.
# Form the normalized matrix M = (D^{(\alpha)})^{-1} L^{(\alpha)}.
# Compute the ''k'' largest eigenvalues of M^t and the corresponding eigenvectors.
# Use the diffusion map \Psi_t to obtain the embedding.
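Putting the five steps together, a self-contained end-to-end sketch (the parameters epsilon, alpha, t and k are the quantities introduced above; the Gaussian kernel is one possible choice for Step 1):

 import numpy as np

 def diffusion_map_embedding(X, epsilon, alpha=1.0, t=1, k=2):
     """End-to-end diffusion map following Steps 1-5."""
     # Step 1: similarity matrix L from a Gaussian kernel.
     sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
     L = np.exp(-sq / epsilon)
     # Step 2: alpha-normalization L^(alpha) = D^-alpha L D^-alpha.
     d = L.sum(axis=1)
     L_alpha = L / np.outer(d, d) ** alpha
     # Step 3: row-normalize to the Markov matrix M.
     M = L_alpha / L_alpha.sum(axis=1, keepdims=True)
     # Step 4: leading eigenpairs of M (skip the trivial lambda_0 = 1).
     lam, psi = np.linalg.eig(M)
     order = np.argsort(-np.abs(lam))
     lam, psi = np.real(lam[order]), np.real(psi[:, order])
     # Step 5: diffusion map coordinates Psi_t.
     return psi[:, 1:k + 1] * lam[1:k + 1] ** t

 # Example: embed 500 noisy points on a circle into 2 diffusion coordinates.
 theta = np.random.uniform(0, 2 * np.pi, 500)
 X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * np.random.randn(500, 2)
 embedding = diffusion_map_embedding(X, epsilon=0.1, t=2, k=2)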


Application

Nadler et al. showed how to design a kernel that reproduces the diffusion induced by a Fokker–Planck equation. They also explained that, when the data approximate a manifold, one can recover the geometry of this manifold by computing an approximation of the Laplace–Beltrami operator. This computation is completely insensitive to the distribution of the points and therefore provides a separation of the statistics and the geometry of the data. Since diffusion maps give a global description of the data set, they can measure the distances between pairs of sample points in the manifold in which the data is embedded. Applications based on diffusion maps include face recognition, spectral clustering, low-dimensional representation of images, image segmentation, 3D model segmentation, speaker verification and identification, sampling on manifolds, anomaly detection, image inpainting, revealing resting-state brain network organization, and so on.

Furthermore, the diffusion maps framework has been productively extended to complex networks, revealing a functional organisation of networks which differs from the purely topological or structural one.


See also

* Nonlinear dimensionality reduction
* Spectral clustering


References

* Coifman, R.R.; Lafon, S. (2006). "Diffusion maps". Applied and Computational Harmonic Analysis. 21: 5–30. doi:10.1016/j.acha.2006.04.006.
* Coifman, R.R.; Lafon, S.; Lee, A.B.; Maggioni, M.; Nadler, B.; Warner, F.; Zucker, S.W. (2005). "Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps". PNAS. 102 (21): 7426–7431. doi:10.1073/pnas.0500334102. PMID 15899970. PMC 1140422.
* Coifman, R.R.; Lafon, S.; Lee, A.B.; Maggioni, M.; Nadler, B.; Warner, F.; Zucker, S.W. (2005). "Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods". PNAS. 102 (21): 7432–7437. doi:10.1073/pnas.0500896102. PMID 15899969. PMC 1140426.
* Nadler, Boaz; Lafon, Stéphane; Coifman, Ronald R.; Kevrekidis, Ioannis G. (2005). "Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker–Planck Operators". Advances in Neural Information Processing Systems. 18. arXiv:math/0506090. http://www.wisdom.weizmann.ac.il/~nadler/Publications/dm_nips05.pdf
* Lafon, S.S. (2004). Diffusion Maps and Geometric Harmonics (PhD thesis). Yale University. https://690cd646-a-62cb3a1a-s-sites.googlegroups.com/site/stefansresearchpapers/home/dissertation.pdf
* De la Porte, J.; Herbst, B.M.; Hereman, W.; Van der Walt, S.J. (2008). "An Introduction to Diffusion Maps". Proceedings of the Nineteenth Annual Symposium of the Pattern Recognition Association of South Africa (PRASA). CiteSeerX 10.1.1.309.674.
* Farbman, Zeev; Fattal, Raanan; Lischinski, Dani (2010). "Diffusion maps for edge-aware image editing". ACM Transactions on Graphics. 29 (6): 145:1–145:10. doi:10.1145/1882261.1866171.
* Sidi, Oana; van Kaick, Oliver; Kleiman, Yanir; Zhang, Hao; Cohen-Or, Daniel (2011). "Unsupervised Co-Segmentation of a Set of Shapes via Descriptor-Space Spectral Clustering". ACM Transactions on Graphics. https://www.cs.sfu.ca/~haoz/pubs/sidi_siga11_coseg.pdf
* Michalevsky, Yan; Talmon, Ronen; Cohen, Israel (2011). "Speaker Identification Using Diffusion Maps". European Signal Processing Conference 2011. http://www.eurasip.org/Proceedings/Eusipco/Eusipco2011/papers/1569414913.pdf
* Mishne, Gal; Cohen, Israel (2013). "Multiscale Anomaly Detection Using Diffusion Maps". IEEE Journal of Selected Topics in Signal Processing. 7 (1): 111–123. doi:10.1109/jstsp.2012.2232279.
* Gepshtein, Shai; Keller, Yosi (2013). "Image Completion by Diffusion Maps and Spectral Relaxation". IEEE Transactions on Image Processing. 22 (8): 2983–2994. doi:10.1109/tip.2013.2237916. PMID 23322762.