mathematics Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics ...

, specifically

functional analysis Functional analysis is a branch of mathematical analysis, the core of which is formed by the study of vector spaces endowed with some kind of limit-related structure (e.g. Inner product space#Definition, inner product, Norm (mathematics)#Defini ...

, Mercer's theorem is a representation of a symmetric

positive-definite In mathematics, positive definiteness is a property of any object to which a bilinear form or a sesquilinear form may be naturally associated, which is positive-definite. See, in particular: * Positive-definite bilinear form * Positive-definite fu ...

function on a square as a sum of a convergent sequence of product functions. This theorem, presented in , is one of the most notable results of the work of James Mercer (1883–1932). It is an important theoretical tool in the theory of

integral equation In mathematics, integral equations are equations in which an unknown Function (mathematics), function appears under an integral sign. In mathematical notation, integral equations may thus be expressed as being of the form: f(x_1,x_2,x_3,...,x_n ; ...

s; it is used in the

Hilbert space In mathematics, Hilbert spaces (named after David Hilbert) allow generalizing the methods of linear algebra and calculus from (finite-dimensional) Euclidean vector spaces to spaces that may be infinite-dimensional. Hilbert spaces arise natural ...

theory of

stochastic process In probability theory and related fields, a stochastic () or random process is a mathematical object usually defined as a family of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that appea ...

es, for example the Karhunen–Loève theorem; and it is also used to characterize a symmetric positive semi-definite

kernel Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine learnin ...

Introduction

To explain Mercer's

theorem In mathematics, a theorem is a statement that has been proved, or can be proved. The ''proof'' of a theorem is a logical argument that uses the inference rules of a deductive system to establish that the theorem is a logical consequence of th ...

, we first consider an important special case; see

below Below may refer to: *Earth *Ground (disambiguation) *Soil *Floor *Bottom (disambiguation) Bottom may refer to: Anatomy and sex * Bottom (BDSM), the partner in a BDSM who takes the passive, receiving, or obedient role, to that of the top or ...

for a more general formulation. A ''kernel'', in this context, is a symmetric continuous function :

\rightarrow \mathbb

where symmetric means that

K(x,y) = K(y,x)

for all

x,y \in,b /math>.

''K'' is said to be ''non-negative definite'' (or positive semidefinite) if and only if

: \sum_^n\sum_^n K(x_i, x_j) c_i c_j \geq 0 for all finite sequences of points ''x''

₁, ..., ''x''_''n'' of 'a'', ''b''and all choices of real numbers ''c''₁, ..., ''c''_''n'' (cf.

positive-definite kernel In operator theory, a branch of mathematics, a positive-definite kernel is a generalization of a positive-definite function or a positive-definite matrix. It was first introduced by James Mercer in the early 20th century, in the context of solving ...

). Associated to ''K'' is a

linear operator In mathematics, and more specifically in linear algebra, a linear map (also called a linear mapping, linear transformation, vector space homomorphism, or in some contexts linear function) is a mapping V \to W between two vector spaces that pre ...

(more specifically a

Hilbert–Schmidt integral operator In mathematics, a Hilbert–Schmidt integral operator is a type of integral transform. Specifically, given a domain (an open and connected set) Ω in ''n''-dimensional Euclidean space R''n'', a Hilbert–Schmidt kernel is a function ''k'' ...

) on functions defined by the integral :

x) =\int_a^b K(x,s) \varphi(s)\, ds.

For technical considerations we assume

\varphi

can range through the space ''L''² 'a'', ''b''(see

Lp space In mathematics, the spaces are function spaces defined using a natural generalization of the Norm (mathematics)#p-norm, -norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue , although ...

) of square-integrable real-valued functions. Since ''T_K'' is a linear operator, we can talk about

eigenvalues In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...

and

eigenfunction In mathematics, an eigenfunction of a linear operator ''D'' defined on some function space is any non-zero function f in that space that, when acted upon by ''D'', is only multiplied by some scaling factor called an eigenvalue. As an equation, th ...

s of ''T_K''. Theorem. Suppose ''K'' is a continuous symmetric non-negative definite kernel. Then there is an

orthonormal basis In mathematics, particularly linear algebra, an orthonormal basis for an inner product space ''V'' with finite dimension is a basis for V whose vectors are orthonormal, that is, they are all unit vectors and orthogonal to each other. For example, ...

_i of ''L''² 'a'', ''b''consisting of eigenfunctions of ''T''_''K'' such that the corresponding sequence of eigenvalues _''i'' is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on 'a'', ''b''and ''K'' has the representation :

K(s,t) = \sum_^\infty \lambda_j \, e_j(s) \, e_j(t)

where the convergence is absolute and uniform.

Details

We now explain in greater detail the structure of the proof of Mercer's theorem, particularly how it relates to

spectral theory of compact operators In functional analysis, compact operators are linear operators on Banach spaces that map bounded sets to relatively compact sets. In the case of a Hilbert space ''H'', the compact operators are the closure of the finite rank operators in the unifo ...

. * The map ''K'' ↦ ''T''_''K'' is

injective In mathematics, an injective function (also known as injection, or one-to-one function) is a function that maps distinct elements of its domain to distinct elements; that is, implies . (Equivalently, implies in the equivalent contrapositiv ...

. * ''T''_''K'' is a non-negative symmetric compact operator on ''L''² 'a'',''b'' moreover ''K''(''x'', ''x'') ≥ 0. To show compactness, show that the image of the

unit ball Unit may refer to: Arts and entertainment * UNIT, a fictional military organization in the science fiction television series ''Doctor Who'' * Unit of action, a discrete piece of action (or beat) in a theatrical presentation Music * ''Unit'' (alb ...

of ''L''² 'a'',''b''under ''T''_''K''

equicontinuous In mathematical analysis, a family of functions is equicontinuous if all the functions are continuous and they have equal variation over a given neighbourhood, in a precise sense described herein. In particular, the concept applies to countable fa ...

and apply Ascoli's theorem, to show that the image of the unit ball is relatively compact in C( 'a'',''b'' with the

uniform norm In mathematical analysis, the uniform norm (or ) assigns to real- or complex-valued bounded functions defined on a set the non-negative number :\, f\, _\infty = \, f\, _ = \sup\left\. This norm is also called the , the , the , or, when the ...

and ''a fortiori'' in ''L''² 'a'',''b'' Now apply the

spectral theorem In mathematics, particularly linear algebra and functional analysis, a spectral theorem is a result about when a linear operator or matrix (mathematics), matrix can be Diagonalizable matrix, diagonalized (that is, represented as a diagonal matrix i ...

for compact operators on Hilbert spaces to ''T''_''K'' to show the existence of the orthonormal basis _i of ''L''² 'a'',''b'' :

t) = \int_a^b K(t,s) e_i(s)\, ds.

If λ_i ≠ 0, the eigenvector (

) ''e''_i is seen to be continuous on 'a'',''b'' Now :

\sum_^\infty \lambda_i , e_i(t) e_i(s),  \leq \sup_ , K(x,x), ,

which shows that the sequence :

\sum_^\infty \lambda_i e_i(t) e_i(s)

converges absolutely and uniformly to a kernel ''K''₀ which is easily seen to define the same operator as the kernel ''K''. Hence ''K''=''K''₀ from which Mercer's theorem follows. Finally, to show non-negativity of the eigenvalues one can write

\lambda \langle f,f \rangle= \langle f, T_f \rangle

and expressing the right hand side as an integral well approximated by its Riemann sums, which are non-negative by positive-definiteness of ''K'', implying

\lambda \langle f,f \rangle \geq 0

, implying

\lambda \geq 0

Trace

The following is immediate: Theorem. Suppose ''K'' is a continuous symmetric non-negative definite kernel; ''T''_''K'' has a sequence of nonnegative eigenvalues _i. Then :

\int_a^b K(t,t)\, dt = \sum_i \lambda_i.

This shows that the operator ''T''_''K'' is a

trace class In mathematics, specifically functional analysis, a trace-class operator is a linear operator for which a Trace (linear algebra), trace may be defined, such that the trace is a finite number independent of the choice of basis used to compute the tra ...

operator and :

\operatorname(T_K) = \int_a^b K(t,t)\, dt.

Generalizations

Mercer's theorem itself is a generalization of the result that any

symmetric Symmetry (from grc, συμμετρία "agreement in dimensions, due proportion, arrangement") in everyday language refers to a sense of harmonious and beautiful proportion and balance. In mathematics, "symmetry" has a more precise definiti ...

positive-semidefinite matrix In mathematics, a symmetric matrix M with real entries is positive-definite if the real number z^\textsfMz is positive for every nonzero real column vector z, where z^\textsf is the transpose of More generally, a Hermitian matrix (that is, a co ...

is the

Gramian matrix In linear algebra, the Gram matrix (or Gramian matrix, Gramian) of a set of vectors v_1,\dots, v_n in an inner product space is the Hermitian matrix of inner products, whose entries are given by the inner product G_ = \left\langle v_i, v_j \right\r ...

of a set of vectors. The first generalization replaces the interval 'a'', ''b''with any

compact Hausdorff space In mathematics, specifically general topology, compactness is a property that seeks to generalize the notion of a closed and bounded subset of Euclidean space by making precise the idea of a space having no "punctures" or "missing endpoints", i ...

and Lebesgue measure on 'a'', ''b''is replaced by a finite countably additive measure μ on the

Borel algebra In mathematics, a Borel set is any set in a topological space that can be formed from open sets (or, equivalently, from closed sets) through the operations of countable union, countable intersection, and relative complement. Borel sets are named ...

of ''X'' whose support is ''X''. This means that μ(''U'') > 0 for any nonempty open subset ''U'' of ''X''. A recent generalization replaces these conditions by the following: the set ''X'' is a

first-countable In topology, a branch of mathematics, a first-countable space is a topological space satisfying the "first axiom of countability". Specifically, a space X is said to be first-countable if each point has a countable neighbourhood basis (local base) ...

topological space endowed with a Borel (complete) measure μ. ''X'' is the support of μ and, for all ''x'' in ''X'', there is an open set ''U'' containing ''x'' and having finite measure. Then essentially the same result holds: Theorem. Suppose ''K'' is a continuous symmetric positive-definite kernel on ''X''. If the function κ is ''L''¹_μ(''X''), where κ(x)=K(x,x), for all ''x'' in ''X'', then there is an

orthonormal set In linear algebra, two vectors in an inner product space are orthonormal if they are orthogonal (or perpendicular along a line) unit vectors. A set of vectors form an orthonormal set if all vectors in the set are mutually orthogonal and all of un ...

_i of ''L''²_μ(''X'') consisting of eigenfunctions of ''T''_''K'' such that corresponding sequence of eigenvalues _i is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on ''X'' and ''K'' has the representation :

K(s,t) = \sum_^\infty \lambda_j \, e_j(s) \, e_j(t)

where the convergence is absolute and uniform on compact subsets of ''X''. The next generalization deals with representations of ''measurable'' kernels. Let (''X'', ''M'', μ) be a σ-finite measure space. An ''L''² (or square-integrable) kernel on ''X'' is a function :

K \in L^2_(X \times X).

''L''² kernels define a bounded operator ''T''_''K'' by the formula :

\langle T_K \varphi, \psi \rangle = \int_ K(y,x) \varphi(y) \psi(x) \,d mu \otimes \mu y,x).

''T''_''K'' is a compact operator (actually it is even a

Hilbert–Schmidt operator In mathematics, a Hilbert–Schmidt operator, named after David Hilbert and Erhard Schmidt, is a bounded operator A \colon H \to H that acts on a Hilbert space H and has finite Hilbert–Schmidt norm \, A\, ^2_ \ \stackrel\ \sum_ \, Ae_i\, ^2_ ...

). If the kernel ''K'' is symmetric, by the

, ''T''_''K'' has an orthonormal basis of eigenvectors. Those eigenvectors that correspond to non-zero eigenvalues can be arranged in a sequence _''i'' (regardless of separability). Theorem. If ''K'' is a symmetric positive-definite kernel on (''X'', ''M'', μ), then :

K(y,x) = \sum_ \lambda_i e_i(y) e_i(x)

where the convergence in the ''L''² norm. Note that when continuity of the kernel is not assumed, the expansion no longer converges uniformly.

Mercer's condition

, a

real Real may refer to: Currencies * Brazilian real (R$) * Central American Republic real * Mexican real * Portuguese real * Spanish real * Spanish colonial real Music Albums * ''Real'' (L'Arc-en-Ciel album) (2000) * ''Real'' (Bright album) (2010) ...

-valued

function Function or functionality may refer to: Computing * Function key, a type of key on computer keyboards * Function model, a structured representation of processes in a system * Function object or functor or functionoid, a concept of object-oriente ...

''K(x,y)'' is said to fulfill Mercer's condition if for all

square-integrable function In mathematics, a square-integrable function, also called a quadratically integrable function or L^2 function or square-summable function, is a real- or complex-valued measurable function for which the integral of the square of the absolute value i ...

s ''g''(''x'') one has :

\iint g(x)K(x,y)g(y)\,dx\,dy \geq 0.

Discrete analog

This is analogous to the definition of a

. This is a matrix

K

of dimension

N

, which satisfies, for all vectors

g

, the property :

(g,Kg)=g^Kg=\sum_^N\sum_^N\,g_i\,K_\,g_j\geq0

Examples

A positive constant function :

K(x, y)=c\,

satisfies Mercer's condition, as then the integral becomes by

Fubini's theorem In mathematical analysis Fubini's theorem is a result that gives conditions under which it is possible to compute a double integral by using an iterated integral, introduced by Guido Fubini in 1907. One may switch the order of integration if the ...

\iint g(x)\,c\,g(y)\,dx dy = c\int\! g(x) \,dx \int\! g(y) \,dy = c\left(\int\! g(x) \,dx\right)^2

which is indeed

non-negative In mathematics, the sign of a real number is its property of being either positive, negative, or zero. Depending on local conventions, zero may be considered as being neither positive nor negative (having no sign or a unique third sign), or it ...

Notes

References

* Adriaan Zaanen, ''Linear Analysis'', North Holland Publishing Co., 1960, * Ferreira, J. C., Menegatto, V. A., ''Eigenvalues of integral operators defined by smooth positive definite kernels'', Integral equation and Operator Theory, 64 (2009), no. 1, 61–81. (Gives the generalization of Mercer's theorem for metric spaces. The result is easily adapted to first countable topological spaces) * Konrad Jörgens, ''Linear integral operators'', Pitman, Boston, 1982, *

Richard Courant Richard Courant (January 8, 1888 – January 27, 1972) was a German American mathematician. He is best known by the general public for the book '' What is Mathematics?'', co-written with Herbert Robbins. His research focused on the areas of real ...

and

David Hilbert David Hilbert (; ; 23 January 1862 – 14 February 1943) was a German mathematician, one of the most influential mathematicians of the 19th and early 20th centuries. Hilbert discovered and developed a broad range of fundamental ideas in many a ...

, ''

Methods of Mathematical Physics ''Methoden der mathematischen Physik'' (Methods of Mathematical Physics) is a 1924 book, in two volumes totalling around 1000 pages, published under the names of Richard Courant and David Hilbert. It was a comprehensive treatment of the "methods o ...

'', vol 1, Interscience 1953, * Robert Ash, ''Information Theory'', Dover Publications, 1990, * , * * H. König, ''Eigenvalue distribution of compact operators'', Birkhäuser Verlag, 1986. (Gives the generalization of Mercer's theorem for finite measures μ.) {{Functional analysis Theorems in functional analysis