In
operator theory
In mathematics, operator theory is the study of linear operators on function spaces, beginning with differential operators and integral operators. The operators may be presented abstractly by their characteristics, such as bounded linear operato ...
, a branch of mathematics, a positive-definite kernel is a generalization of a
positive-definite function
In mathematics, a positive-definite function is, depending on the context, either of two types of function.
Definition 1
Let \mathbb be the set of real numbers and \mathbb be the set of complex numbers.
A function f: \mathbb \to \mathbb is ...
or a
positive-definite matrix
In mathematics, a symmetric matrix M with real entries is positive-definite if the real number \mathbf^\mathsf M \mathbf is positive for every nonzero real column vector \mathbf, where \mathbf^\mathsf is the row vector transpose of \mathbf.
Mo ...
. It was first introduced by
James Mercer in the early 20th century, in the context of solving
integral operator equations. Since then, positive-definite functions and their various analogues and generalizations have arisen in diverse parts of mathematics. They occur naturally in
Fourier analysis
In mathematics, Fourier analysis () is the study of the way general functions may be represented or approximated by sums of simpler trigonometric functions. Fourier analysis grew from the study of Fourier series, and is named after Joseph Fo ...
,
probability theory
Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expre ...
,
operator theory
In mathematics, operator theory is the study of linear operators on function spaces, beginning with differential operators and integral operators. The operators may be presented abstractly by their characteristics, such as bounded linear operato ...
,
complex function-theory,
moment problem
In mathematics, a moment problem arises as the result of trying to invert the mapping that takes a measure \mu to the sequence of moments
:m_n = \int_^\infty x^n \,d\mu(x)\,.
More generally, one may consider
:m_n = \int_^\infty M_n(x) \,d\mu( ...
s,
integral equation
In mathematical analysis, integral equations are equations in which an unknown function appears under an integral sign. In mathematical notation, integral equations may thus be expressed as being of the form: f(x_1,x_2,x_3,\ldots,x_n ; u(x_1,x_2 ...
s,
boundary-value problems for
partial differential equation
In mathematics, a partial differential equation (PDE) is an equation which involves a multivariable function and one or more of its partial derivatives.
The function is often thought of as an "unknown" that solves the equation, similar to ho ...
s,
machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
,
embedding problem,
information theory
Information theory is the mathematical study of the quantification (science), quantification, Data storage, storage, and telecommunications, communication of information. The field was established and formalized by Claude Shannon in the 1940s, ...
, and other areas.
Definition
Let
be a nonempty set, sometimes referred to as the index set. A
symmetric function
In mathematics, a function of n variables is symmetric if its value is the same no matter the order of its arguments. For example, a function f\left(x_1,x_2\right) of two arguments is a symmetric function if and only if f\left(x_1,x_2\right) = f\ ...
is called a positive-definite (p.d.) kernel on
if
holds for all
,
.
In probability theory, a distinction is sometimes made between positive-definite kernels, for which equality in (1.1) implies
, and positive semi-definite (p.s.d.) kernels, which do not impose this condition. Note that this is equivalent to requiring that every finite matrix constructed by pairwise evaluation,
, has either entirely positive (p.d.) or nonnegative (p.s.d.)
eigenvalues
In linear algebra, an eigenvector ( ) or characteristic vector is a vector that has its direction unchanged (or reversed) by a given linear transformation. More precisely, an eigenvector \mathbf v of a linear transformation T is scaled by a ...
.
In mathematical literature, kernels are usually complex-valued functions. That is, a complex-valued function
is called a Hermitian kernel if
and positive definite if for every finite set of points
and any complex numbers
,
:
where
denotes the
complex conjugate
In mathematics, the complex conjugate of a complex number is the number with an equal real part and an imaginary part equal in magnitude but opposite in sign. That is, if a and b are real numbers, then the complex conjugate of a + bi is a - ...
. In the rest of this article we assume real-valued functions, which is the common practice in applications of p.d. kernels.
Some general properties
* For a family of p.d. kernels
** The conical sum
is p.d., given
** The product
is p.d., given
** The limit
is p.d. if the limit exists.
* If
is a sequence of sets, and
a sequence of p.d. kernels, then both
and
are p.d. kernels on
.
* Let
. Then the restriction
of
to
is also a p.d. kernel.
Examples of p.d. kernels
* Common examples of p.d. kernels defined on Euclidean space
include:
** Linear kernel:
.
**
Polynomial kernel:
.
**
Gaussian kernel
In mathematics, a Gaussian function, often simply referred to as a Gaussian, is a function of the base form
f(x) = \exp (-x^2)
and with parametric extension
f(x) = a \exp\left( -\frac \right)
for arbitrary real constants , and non-zero . It is ...
(
RBF kernel):
.
** Laplacian kernel:
.
** Abel kernel:
.
** Kernel generating
Sobolev space
In mathematics, a Sobolev space is a vector space of functions equipped with a norm that is a combination of ''Lp''-norms of the function together with its derivatives up to a given order. The derivatives are understood in a suitable weak sense ...
s
:
, where
is the
Bessel function of the third kind.
** Kernel generating Paley–Wiener space:
.
* If
is a
Hilbert space
In mathematics, a Hilbert space is a real number, real or complex number, complex inner product space that is also a complete metric space with respect to the metric induced by the inner product. It generalizes the notion of Euclidean space. The ...
, then its corresponding inner product
is a p.d. kernel. Indeed, we have
* Kernels defined on
and histograms: Histograms are frequently encountered in applications of real-life problems. Most observations are usually available under the form of nonnegative vectors of counts, which, if normalized, yield histograms of frequencies. It has been shown that the following family of squared metrics, respectively Jensen divergence, the
-square, Total Variation, and two variations of the Hellinger distance:
can be used to define p.d. kernels using the following formula
Examples of other kernels
The sigmoid kernel, or hyperbolic tangent kernel, is defined as
where
are real parameters. The kernel is not PD, but has been sometimes used for kernel algorithms.
History
Positive-definite kernels, as defined in (1.1), appeared first in 1909 in a paper on integral equations by James Mercer. Several other authors made use of this concept in the following two decades, but none of them explicitly used kernels
, i.e. p.d. functions (indeed M. Mathias and
S. Bochner seem not to have been aware of the study of p.d. kernels). Mercer’s work arose from Hilbert’s paper of 1904 on
Fredholm integral equation
In mathematics, the Fredholm integral equation is an integral equation whose solution gives rise to Fredholm theory, the study of Fredholm kernels and Fredholm operators. The integral equation was studied by Ivar Fredholm. A useful method to ...
s of the second kind:
In particular, Hilbert had shown that
where
is a continuous real symmetric kernel,
is continuous,
is a complete system of
orthonormal eigenfunctions, and
’s are the corresponding
eigenvalues
In linear algebra, an eigenvector ( ) or characteristic vector is a vector that has its direction unchanged (or reversed) by a given linear transformation. More precisely, an eigenvector \mathbf v of a linear transformation T is scaled by a ...
of (1.2). Hilbert defined a “definite” kernel as one for which the double integral
satisfies
except for
. The original object of Mercer’s paper was to characterize the kernels which are definite in the sense of Hilbert, but Mercer soon found that the class of such functions was too restrictive to characterize in terms of determinants. He therefore defined a continuous real symmetric kernel
to be of positive type (i.e. positive-definite) if
for all real continuous functions
on