RBF Kernel

In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification.

The RBF kernel on two samples \mathbf{x} \in \mathbb{R}^k and \mathbf{x'}, represented as feature vectors in some ''input space'', is defined as (Jean-Philippe Vert, Koji Tsuda, and Bernhard Schölkopf (2004), "A primer on kernel methods", ''Kernel Methods in Computational Biology''):
:K(\mathbf{x}, \mathbf{x'}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x'}\|^2}{2\sigma^2}\right)

\textstyle\|\mathbf{x} - \mathbf{x'}\|^2 may be recognized as the squared Euclidean distance between the two feature vectors, and \sigma is a free parameter. An equivalent definition involves a parameter \textstyle\gamma = \tfrac{1}{2\sigma^2}:

:K(\mathbf{x}, \mathbf{x'}) = \exp(-\gamma\|\mathbf{x} - \mathbf{x'}\|^2)

Since the value of the RBF kernel decreases with distance and ranges between zero (in the limit) and one (when \mathbf{x} = \mathbf{x'}), it has a ready interpretation as a similarity measure. The feature space of the kernel has an infinite number of dimensions; for \sigma = 1, its expansion using the multinomial theorem is:

: \begin{align}
\exp\left(-\frac{1}{2}\|\mathbf{x} - \mathbf{x'}\|^2\right) &= \exp\left(\mathbf{x}^\top \mathbf{x'} - \frac{1}{2}\|\mathbf{x}\|^2 - \frac{1}{2}\|\mathbf{x'}\|^2\right)\\
&= \exp(\mathbf{x}^\top \mathbf{x'}) \exp\left(-\frac{1}{2}\|\mathbf{x}\|^2\right) \exp\left(-\frac{1}{2}\|\mathbf{x'}\|^2\right) \\
&= \sum_{j=0}^\infty \frac{(\mathbf{x}^\top \mathbf{x'})^j}{j!} \exp\left(-\frac{1}{2}\|\mathbf{x}\|^2\right) \exp\left(-\frac{1}{2}\|\mathbf{x'}\|^2\right)\\
&= \sum_{j=0}^\infty \quad \sum_{n_1+n_2+\dots+n_k=j} \exp\left(-\frac{1}{2}\|\mathbf{x}\|^2\right) \frac{x_1^{n_1}\cdots x_k^{n_k}}{\sqrt{n_1!\cdots n_k!}} \exp\left(-\frac{1}{2}\|\mathbf{x'}\|^2\right) \frac{{x'}_1^{n_1}\cdots {x'}_k^{n_k}}{\sqrt{n_1!\cdots n_k!}} \\
&= \langle \varphi(\mathbf{x}), \varphi(\mathbf{x'}) \rangle
\end{align}

where

: \varphi(\mathbf{x}) = \exp\left(-\frac{1}{2}\|\mathbf{x}\|^2\right) \left(a^{(0)}_{l_0}, a^{(1)}_1, \dots, a^{(1)}_{l_1}, \dots, a^{(j)}_1, \dots, a^{(j)}_{l_j}, \dots\right)

with l_j = \tbinom{k+j-1}{j} and

: a^{(j)}_l = \frac{x_1^{n_1}\cdots x_k^{n_k}}{\sqrt{n_1!\cdots n_k!}} \quad, \quad n_1+n_2+\dots+n_k = j \wedge 1 \leq l \leq l_j
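The definitions above can be checked numerically. The following sketch (a minimal NumPy illustration; the function names are for this example, not from any particular library) evaluates the kernel through both parametrizations and, for \sigma = 1, approximates it by truncating the series \exp(\mathbf{x}^\top \mathbf{x'}) = \sum_j (\mathbf{x}^\top \mathbf{x'})^j / j! after a finite number of terms:

```python
import math
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = np.sum((x - y) ** 2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def rbf_kernel_gamma(x, y, gamma):
    """Equivalent form K(x, y) = exp(-gamma ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def truncated_rbf(x, y, n_terms=20):
    """Series approximation for sigma = 1:
    K(x, y) ~ exp(-||x||^2/2) exp(-||y||^2/2) sum_{j<n} (x.y)^j / j!
    """
    dot = float(x @ y)
    series = sum(dot ** j / math.factorial(j) for j in range(n_terms))
    return np.exp(-x @ x / 2.0) * np.exp(-y @ y / 2.0) * series

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])
sigma = 1.5
gamma = 1.0 / (2.0 * sigma ** 2)

# The two parametrizations agree.
assert np.isclose(rbf_kernel(x, y, sigma), rbf_kernel_gamma(x, y, gamma))

# For sigma = 1, twenty series terms already reproduce the kernel closely.
assert np.isclose(rbf_kernel(x, y, sigma=1.0), truncated_rbf(x, y))
```

The rapid convergence of the truncated series is what makes the infinite-dimensional feature map tractable to reason about: the coefficients of high-degree monomials decay factorially.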


Approximations

Because support vector machines and other models employing the kernel trick do not scale well to large numbers of training samples or large numbers of features in the input space, several approximations to the RBF kernel (and similar kernels) have been introduced. Typically, these take the form of a function ''z'' that maps a single vector to a vector of higher dimensionality, approximating the kernel:

:\langle z(\mathbf{x}), z(\mathbf{x'}) \rangle \approx \langle \varphi(\mathbf{x}), \varphi(\mathbf{x'}) \rangle = K(\mathbf{x}, \mathbf{x'})

where \textstyle\varphi is the implicit mapping embedded in the RBF kernel. One way to construct such a ''z'' is to randomly sample from the Fourier transformation of the kernel. Another approach uses the Nyström method to approximate the eigendecomposition of the Gram matrix ''K'', using only a random sample of the training set.
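The Fourier-sampling approach is the random Fourier features construction of Rahimi and Recht: draw frequencies from the Gaussian spectral density of the RBF kernel and map each input through random cosines, so that inner products of the resulting features approximate the kernel. The sketch below is a minimal NumPy illustration of that idea (the function name and test values are this example's own, not a library API):

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Map rows of X to z(X) with <z(x), z(y)> ~ exp(-gamma ||x - y||^2).

    Frequencies are drawn from N(0, 2*gamma*I), the spectral density of
    the RBF kernel with this gamma; phases are uniform on [0, 2*pi].
    """
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
gamma = 0.5

Z = random_fourier_features(X, n_features=5000, gamma=gamma, rng=rng)
K_approx = Z @ Z.T  # explicit features: a plain matrix product

# Exact RBF Gram matrix for comparison.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-gamma * sq_dists)

# With many random features the entrywise error is small.
err = np.max(np.abs(K_approx - K_exact))
assert err < 0.15
```

The payoff is that a linear model trained on ''z''(''X'') approximates a kernelized model at a cost linear in the number of samples, since the Gram matrix never needs to be formed.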


See also

* Gaussian function
* Kernel (statistics)
* Polynomial kernel
* Radial basis function
* Radial basis function network


References

* Jean-Philippe Vert, Koji Tsuda, and Bernhard Schölkopf (2004). "A primer on kernel methods". ''Kernel Methods in Computational Biology''.