RBF Kernel

In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification.

The RBF kernel on two samples \mathbf{x} \in \mathbb{R}^k and \mathbf{x'}, represented as feature vectors in some ''input space'', is defined as (Jean-Philippe Vert, Koji Tsuda, and Bernhard Schölkopf (2004), "A primer on kernel methods", ''Kernel Methods in Computational Biology''):
:K(\mathbf{x}, \mathbf{x'}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x'}\|^2}{2\sigma^2}\right)

\textstyle\|\mathbf{x} - \mathbf{x'}\|^2 may be recognized as the squared Euclidean distance between the two feature vectors, and \sigma is a free parameter. An equivalent definition involves a parameter \textstyle\gamma = \tfrac{1}{2\sigma^2}:

:K(\mathbf{x}, \mathbf{x'}) = \exp(-\gamma\|\mathbf{x} - \mathbf{x'}\|^2)

Since the value of the RBF kernel decreases with distance and ranges between zero (in the limit) and one (when \mathbf{x} = \mathbf{x'}), it has a ready interpretation as a similarity measure. The feature space of the kernel has an infinite number of dimensions; for \sigma = 1, its expansion using the multinomial theorem is:

: \begin{align}
\exp\left(-\frac{1}{2}\|\mathbf{x} - \mathbf{x'}\|^2\right) &= \exp\left(\mathbf{x}^\top \mathbf{x'} - \frac{1}{2}\|\mathbf{x}\|^2 - \frac{1}{2}\|\mathbf{x'}\|^2\right)\\
&= \exp(\mathbf{x}^\top \mathbf{x'}) \exp\left(-\frac{1}{2}\|\mathbf{x}\|^2\right) \exp\left(-\frac{1}{2}\|\mathbf{x'}\|^2\right) \\
&= \sum_{j=0}^\infty \frac{(\mathbf{x}^\top \mathbf{x'})^j}{j!} \exp\left(-\frac{1}{2}\|\mathbf{x}\|^2\right) \exp\left(-\frac{1}{2}\|\mathbf{x'}\|^2\right)\\
&= \sum_{j=0}^\infty \quad \sum_{n_1+n_2+\dots+n_k=j} \exp\left(-\frac{1}{2}\|\mathbf{x}\|^2\right) \frac{x_1^{n_1}\cdots x_k^{n_k}}{\sqrt{n_1!\cdots n_k!}} \exp\left(-\frac{1}{2}\|\mathbf{x'}\|^2\right) \frac{{x'}_1^{n_1}\cdots {x'}_k^{n_k}}{\sqrt{n_1!\cdots n_k!}} \\
&= \langle \varphi(\mathbf{x}), \varphi(\mathbf{x'}) \rangle
\end{align}

where

: \varphi(\mathbf{x}) = \exp\left(-\frac{1}{2}\|\mathbf{x}\|^2\right) \left(a^{(0)}_{l_0}, a^{(1)}_1, \dots, a^{(1)}_{l_1}, \dots, a^{(j)}_1, \dots, a^{(j)}_{l_j}, \dots\right)

with l_j = \tbinom{k+j-1}{j} and

: a^{(j)}_l = \frac{x_1^{n_1}\cdots x_k^{n_k}}{\sqrt{n_1!\cdots n_k!}} \quad, \quad n_1+n_2+\dots+n_k = j \wedge 1 \leq l \leq l_j
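The definitions above can be checked numerically. The following sketch (a minimal NumPy illustration; the function names are for this example, not from any particular library) evaluates the kernel through both parametrizations and, for \sigma = 1, approximates it by truncating the series \exp(\mathbf{x}^\top \mathbf{x'}) = \sum_j (\mathbf{x}^\top \mathbf{x'})^j / j! after a finite number of terms:

```python
import math
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = np.sum((x - y) ** 2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def rbf_kernel_gamma(x, y, gamma):
    """Equivalent form K(x, y) = exp(-gamma ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def truncated_rbf(x, y, n_terms=20):
    """Series approximation for sigma = 1:
    K(x, y) ~ exp(-||x||^2/2) exp(-||y||^2/2) sum_{j<n} (x.y)^j / j!
    """
    dot = float(x @ y)
    series = sum(dot ** j / math.factorial(j) for j in range(n_terms))
    return np.exp(-x @ x / 2.0) * np.exp(-y @ y / 2.0) * series

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])
sigma = 1.5
gamma = 1.0 / (2.0 * sigma ** 2)

# The two parametrizations agree.
assert np.isclose(rbf_kernel(x, y, sigma), rbf_kernel_gamma(x, y, gamma))

# For sigma = 1, twenty series terms already reproduce the kernel closely.
assert np.isclose(rbf_kernel(x, y, sigma=1.0), truncated_rbf(x, y))
```

The rapid convergence of the truncated series is what makes the infinite-dimensional feature map tractable to reason about: the coefficients of high-degree monomials decay factorially.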


Approximations

Because support vector machines and other models employing the kernel trick do not scale well to large numbers of training samples or large numbers of features in the input space, several approximations to the RBF kernel (and similar kernels) have been introduced. Typically, these take the form of a function ''z'' that maps a single vector to a vector of higher dimensionality, approximating the kernel:

:\langle z(\mathbf{x}), z(\mathbf{x'}) \rangle \approx \langle \varphi(\mathbf{x}), \varphi(\mathbf{x'}) \rangle = K(\mathbf{x}, \mathbf{x'})

where \textstyle\varphi is the implicit mapping embedded in the RBF kernel. One way to construct such a ''z'' is to randomly sample from the Fourier transformation of the kernel. Another approach uses the Nyström method to approximate the eigendecomposition of the Gram matrix ''K'', using only a random sample of the training set.
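The Fourier-sampling approach is the random Fourier features construction of Rahimi and Recht: draw frequencies from the Gaussian spectral density of the RBF kernel and map each input through random cosines, so that inner products of the resulting features approximate the kernel. The sketch below is a minimal NumPy illustration of that idea (the function name and test values are this example's own, not a library API):

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Map rows of X to z(X) with <z(x), z(y)> ~ exp(-gamma ||x - y||^2).

    Frequencies are drawn from N(0, 2*gamma*I), the spectral density of
    the RBF kernel with this gamma; phases are uniform on [0, 2*pi].
    """
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
gamma = 0.5

Z = random_fourier_features(X, n_features=5000, gamma=gamma, rng=rng)
K_approx = Z @ Z.T  # explicit features: a plain matrix product

# Exact RBF Gram matrix for comparison.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-gamma * sq_dists)

# With many random features the entrywise error is small.
err = np.max(np.abs(K_approx - K_exact))
assert err < 0.15
```

The payoff is that a linear model trained on ''z''(''X'') approximates a kernelized model at a cost linear in the number of samples, since the Gram matrix never needs to be formed.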


See also

* Gaussian function
* Kernel (statistics)
* Polynomial kernel
* Radial basis function
* Radial basis function network


References

* Jean-Philippe Vert, Koji Tsuda, and Bernhard Schölkopf (2004). "A primer on kernel methods". ''Kernel Methods in Computational Biology''.