In
operator theory
In mathematics, operator theory is the study of linear operators on function spaces, beginning with differential operators and integral operators. The operators may be presented abstractly by their characteristics, such as bounded linear oper ...
, a branch of mathematics, a positive-definite kernel is a generalization of a
positive-definite function
In mathematics, a positive-definite function is, depending on the context, either of two types of function.
Most common usage
A ''positive-definite function'' of a real variable ''x'' is a complex-valued function f: \mathbb \to \mathbb suc ...
or a
positive-definite matrix
In mathematics, a symmetric matrix M with real entries is positive-definite if the real number z^\textsfMz is positive for every nonzero real column vector z, where z^\textsf is the transpose of More generally, a Hermitian matrix (that is, ...
. It was first introduced by
James Mercer in the early 20th century, in the context of solving
integral operator equations. Since then, positive-definite functions and their various analogues and generalizations have arisen in diverse parts of mathematics. They occur naturally in
Fourier analysis,
probability theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...
,
operator theory
In mathematics, operator theory is the study of linear operators on function spaces, beginning with differential operators and integral operators. The operators may be presented abstractly by their characteristics, such as bounded linear oper ...
,
complex function-theory,
moment problems,
integral equation
In mathematics, integral equations are equations in which an unknown function appears under an integral sign. In mathematical notation, integral equations may thus be expressed as being of the form: f(x_1,x_2,x_3,...,x_n ; u(x_1,x_2,x_3,...,x_n ...
s,
boundary-value problems for
partial differential equation
In mathematics, a partial differential equation (PDE) is an equation which imposes relations between the various partial derivatives of a multivariable function.
The function is often thought of as an "unknown" to be solved for, similarly to ...
s,
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
,
embedding problem,
information theory
Information theory is the scientific study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. ...
, and other areas.
This article will discuss some of the historical and current developments of the theory of positive-definite kernels, starting with the general idea and properties before considering practical applications.
Definition
Let
be a nonempty set, sometimes referred to as the index set. A
symmetric function is called a positive-definite (p.d.) kernel on
if
holds for any
, given
.
In probability theory, a distinction is sometimes made between positive-definite kernels, for which equality in (1.1) implies
, and positive semi-definite (p.s.d.) kernels, which do not impose this condition. Note that this is equivalent to requiring that any finite matrix constructed by pairwise evaluation,
, has either entirely positive (p.d.) or nonnegative (p.s.d.)
eigenvalues
In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...
.
In mathematical literature, kernels are usually complex valued functions, but in this article we assume real-valued functions, which is the common practice in applications of p.d. kernels.
Some general properties
* For a family of p.d. kernels
** The conical sum
is p.d., given
** The product
is p.d., given
** The limit
is p.d. if the limit exists.
* If
is a sequence of sets, and
a sequence of p.d. kernels, then both
and
are p.d. kernels on
.
* Let
. Then the restriction
of
to
is also a p.d. kernel.
Examples of p.d. kernels
* Common examples of p.d. kernels defined on Euclidean space
include:
** Linear kernel:
.
**
Polynomial kernel:
.
** Gaussian kernel (
RBF kernel):
.
** Laplacian kernel:
.
** Abel kernel:
.
** kernel generating
Sobolev space
In mathematics, a Sobolev space is a vector space of functions equipped with a norm that is a combination of ''Lp''-norms of the function together with its derivatives up to a given order. The derivatives are understood in a suitable weak sense ...
s
:
, where
is the
Bessel function of the third kind.
** kernel generating Paley–Wiener space:
.
* If
is a
Hilbert space
In mathematics, Hilbert spaces (named after David Hilbert) allow generalizing the methods of linear algebra and calculus from (finite-dimensional) Euclidean vector spaces to spaces that may be infinite-dimensional. Hilbert spaces arise natu ...
, then its corresponding inner product
is a p.d. kernel. Indeed, we have
* Kernels defined on
and histograms: Histograms are frequently encountered in applications of real-life problems. Most observations are usually available under the form of nonnegative vectors of counts, which, if normalized, yield histograms of frequencies. It has been shown that the following family of squared metrics, respectively Jensen divergence, the
-square, Total Variation, and two variations of the Hellinger distance:
can be used to define p.d. kernels using the following formula
History
Positive-definite kernels, as defined in (1.1), appeared first in 1909 in a paper on integral equations by James Mercer. Several other authors made use of this concept in the following two decades, but none of them explicitly used kernels
, i.e. p.d. functions (indeed M. Mathias and
S. Bochner seem not to have been aware of the study of p.d. kernels). Mercer’s work arose from Hilbert’s paper of 1904 on
Fredholm integral equation In mathematics, the Fredholm integral equation is an integral equation whose solution gives rise to Fredholm theory, the study of Fredholm kernels and Fredholm operators. The integral equation was studied by Ivar Fredholm. A useful method to ...
s of the second kind:
In particular, Hilbert had shown that
where
is a continuous real symmetric kernel,
is continuous,
is a complete system of
orthonormal eigenfunctions, and
’s are the corresponding
eigenvalues
In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...
of (1.2). Hilbert defined a “definite” kernel as one for which the double integral
satisfies
except for
. The original object of Mercer’s paper was to characterize the kernels which are definite in the sense of Hilbert, but Mercer soon found that the class of such functions was too restrictive to characterize in terms of determinants. He therefore defined a continuous real symmetric kernel
to be of positive type (i.e. positive-definite) if
for all real continuous functions
on