In
operator theory
In mathematics, operator theory is the study of linear operators on function spaces, beginning with differential operators and integral operators. The operators may be presented abstractly by their characteristics, such as bounded linear operat ...
, a branch of mathematics, a positive-definite kernel is a generalization of a
positive-definite function
In mathematics, a positive-definite function is, depending on the context, either of two types of function.
Most common usage
A ''positive-definite function'' of a real variable ''x'' is a complex-valued function f: \mathbb \to \mathbb such th ...
or a
positive-definite matrix
In mathematics, a symmetric matrix M with real entries is positive-definite if the real number z^\textsfMz is positive for every nonzero real column vector z, where z^\textsf is the transpose of More generally, a Hermitian matrix (that is, a co ...
. It was first introduced by
James Mercer in the early 20th century, in the context of solving
integral operator equations. Since then, positive-definite functions and their various analogues and generalizations have arisen in diverse parts of mathematics. They occur naturally in
Fourier analysis
In mathematics, Fourier analysis () is the study of the way general functions may be represented or approximated by sums of simpler trigonometric functions. Fourier analysis grew from the study of Fourier series, and is named after Josep ...
,
probability theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...
,
operator theory
In mathematics, operator theory is the study of linear operators on function spaces, beginning with differential operators and integral operators. The operators may be presented abstractly by their characteristics, such as bounded linear operat ...
,
complex function-theory,
moment problem
In mathematics, a moment problem arises as the result of trying to invert the mapping that takes a measure ''μ'' to the sequences of moments
:m_n = \int_^\infty x^n \,d\mu(x)\,.
More generally, one may consider
:m_n = \int_^\infty M_n(x) ...
s,
integral equation
In mathematics, integral equations are equations in which an unknown Function (mathematics), function appears under an integral sign. In mathematical notation, integral equations may thus be expressed as being of the form: f(x_1,x_2,x_3,...,x_n ; ...
s,
boundary-value problems for
partial differential equation
In mathematics, a partial differential equation (PDE) is an equation which imposes relations between the various partial derivatives of a Multivariable calculus, multivariable function.
The function is often thought of as an "unknown" to be sol ...
s,
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
,
embedding problem In Galois theory, a branch of mathematics, the embedding problem is a generalization of the inverse Galois problem. Roughly speaking, it asks whether a given Galois extension can be embedded into a Galois extension in such a way that the Restriction ...
,
information theory
Information theory is the scientific study of the quantification (science), quantification, computer data storage, storage, and telecommunication, communication of information. The field was originally established by the works of Harry Nyquist a ...
, and other areas.
This article will discuss some of the historical and current developments of the theory of positive-definite kernels, starting with the general idea and properties before considering practical applications.
Definition
Let
be a nonempty set, sometimes referred to as the index set. A
symmetric function
In mathematics, a function of n variables is symmetric if its value is the same no matter the order of its arguments. For example, a function f\left(x_1,x_2\right) of two arguments is a symmetric function if and only if f\left(x_1,x_2\right) = f\l ...
is called a positive-definite (p.d.) kernel on
if
holds for any
, given
.
In probability theory, a distinction is sometimes made between positive-definite kernels, for which equality in (1.1) implies
, and positive semi-definite (p.s.d.) kernels, which do not impose this condition. Note that this is equivalent to requiring that any finite matrix constructed by pairwise evaluation,
, has either entirely positive (p.d.) or nonnegative (p.s.d.)
eigenvalues
In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...
.
In mathematical literature, kernels are usually complex valued functions, but in this article we assume real-valued functions, which is the common practice in applications of p.d. kernels.
Some general properties
* For a family of p.d. kernels
** The conical sum
is p.d., given
** The product
is p.d., given
** The limit
is p.d. if the limit exists.
* If
is a sequence of sets, and
a sequence of p.d. kernels, then both
and
are p.d. kernels on
.
* Let
. Then the restriction
of
to
is also a p.d. kernel.
Examples of p.d. kernels
* Common examples of p.d. kernels defined on Euclidean space
include:
** Linear kernel:
.
**
Polynomial kernel
In machine learning, the polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized models, that represents the similarity of vectors (training samples) in a feature space over polynomials of th ...
:
.
** Gaussian kernel (
RBF kernel In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification.
The RBF kernel on two s ...
):
.
** Laplacian kernel:
.
** Abel kernel:
.
** kernel generating
Sobolev space
In mathematics, a Sobolev space is a vector space of functions equipped with a norm that is a combination of ''Lp''-norms of the function together with its derivatives up to a given order. The derivatives are understood in a suitable weak sense t ...
s
:
, where
is the
Bessel function of the third kind.
** kernel generating Paley–Wiener space:
.
* If
is a
Hilbert space
In mathematics, Hilbert spaces (named after David Hilbert) allow generalizing the methods of linear algebra and calculus from (finite-dimensional) Euclidean vector spaces to spaces that may be infinite-dimensional. Hilbert spaces arise natural ...
, then its corresponding inner product
is a p.d. kernel. Indeed, we have
* Kernels defined on
and histograms: Histograms are frequently encountered in applications of real-life problems. Most observations are usually available under the form of nonnegative vectors of counts, which, if normalized, yield histograms of frequencies. It has been shown that the following family of squared metrics, respectively Jensen divergence, the
-square, Total Variation, and two variations of the Hellinger distance:
can be used to define p.d. kernels using the following formula
History
Positive-definite kernels, as defined in (1.1), appeared first in 1909 in a paper on integral equations by James Mercer. Several other authors made use of this concept in the following two decades, but none of them explicitly used kernels
, i.e. p.d. functions (indeed M. Mathias and
S. Bochner seem not to have been aware of the study of p.d. kernels). Mercer’s work arose from Hilbert’s paper of 1904 on
Fredholm integral equation In mathematics, the Fredholm integral equation is an integral equation whose solution gives rise to Fredholm theory, the study of Fredholm kernels and Fredholm operators. The integral equation was studied by Ivar Fredholm. A useful method to solve ...
s of the second kind:
In particular, Hilbert had shown that
where
is a continuous real symmetric kernel,
is continuous,
is a complete system of
orthonormal eigenfunctions, and
’s are the corresponding
eigenvalues
In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...
of (1.2). Hilbert defined a “definite” kernel as one for which the double integral
satisfies
except for
. The original object of Mercer’s paper was to characterize the kernels which are definite in the sense of Hilbert, but Mercer soon found that the class of such functions was too restrictive to characterize in terms of determinants. He therefore defined a continuous real symmetric kernel
to be of positive type (i.e. positive-definite) if
for all real continuous functions
on