In mathematics, the Rayleigh quotient for a given complex Hermitian matrix ''M'' and nonzero vector ''x'' is defined as:

R(M,x) = \frac{x^* M x}{x^* x}.

For real matrices and vectors, the condition of being Hermitian reduces to that of being symmetric, and the conjugate transpose x^* reduces to the usual transpose x'. Note that R(M, cx) = R(M,x) for any non-zero scalar ''c''. Recall that a Hermitian (or real symmetric) matrix is diagonalizable with only real eigenvalues. It can be shown that, for a given matrix, the Rayleigh quotient reaches its minimum value \lambda_\min (the smallest eigenvalue of ''M'') when ''x'' is v_\min (the corresponding eigenvector). Similarly, R(M, x) \leq \lambda_\max and R(M, v_\max) = \lambda_\max.
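As a minimal sketch of the definition, the following pure-Python function evaluates R(M,x) for a small complex Hermitian matrix stored as a list of rows. The helper name `rayleigh_quotient` and the 2×2 example matrix are illustrative assumptions, not part of any library. The sketch checks that the eigenvectors give the extreme eigenvalues, that the quotient is unchanged when ''x'' is multiplied by a non-zero complex scalar, and that the value is real and lies between \lambda_\min and \lambda_\max.

```python
def rayleigh_quotient(M, x):
    """R(M, x) = (x^* M x) / (x^* x) for a square matrix M given as a list of rows."""
    n = len(x)
    Mx = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    numerator = sum(x[i].conjugate() * Mx[i] for i in range(n))
    denominator = sum(abs(xi) ** 2 for xi in x)
    if denominator == 0:
        raise ValueError("the Rayleigh quotient is undefined for x = 0")
    return numerator / denominator


# Hypothetical 2x2 Hermitian example: M equals its own conjugate transpose.
# By hand, M (1, -i) = 3 (1, -i) and M (1, i) = 1 (1, i).
M = [[2, 1j],
     [-1j, 2]]
v_max, v_min = [1, -1j], [1, 1j]

print(rayleigh_quotient(M, v_max).real)  # 3.0 (lambda_max)
print(rayleigh_quotient(M, v_min).real)  # 1.0 (lambda_min)

# Scale invariance: R(M, c x) = R(M, x) for any non-zero scalar c.
x = [1, 2 + 1j]
c = 2 - 5j
print(abs(rayleigh_quotient(M, [c * xi for xi in x]) - rayleigh_quotient(M, x)) < 1e-12)  # True

# For Hermitian M the value is real and lies in [lambda_min, lambda_max] = [1, 3].
r = rayleigh_quotient(M, x)
print(abs(r.imag) < 1e-12, 1 <= r.real <= 3)  # True True
```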

The Rayleigh quotient is used in the min-max theorem to get exact values of all eigenvalues. It is also used in eigenvalue algorithms (such as Rayleigh quotient iteration) to obtain an eigenvalue approximation from an eigenvector approximation.

The range of the Rayleigh quotient (for any matrix, not necessarily Hermitian) is called the numerical range, and it contains the spectrum of the matrix. When the matrix is Hermitian, the numerical radius is equal to the spectral norm. Still in functional analysis, \max(|\lambda_\min|, |\lambda_\max|) is known as the spectral radius; when ''M'' is positive semi-definite this is simply \lambda_\max. In the context of C*-algebras or algebraic quantum mechanics, the function that to ''M'' associates the Rayleigh–Ritz quotient ''R''(''M'',''x'') for a fixed ''x'' and ''M'' varying through the algebra would be referred to as a "vector state" of the algebra.

In quantum mechanics, the Rayleigh quotient gives the expectation value of the observable corresponding to the operator ''M'' for a system whose state is given by ''x''.

If we fix the complex matrix ''M'', then the resulting Rayleigh quotient map (considered as a function of ''x'') completely determines ''M'' via the polarization identity; indeed, this remains true even if we allow ''M'' to be non-Hermitian. (However, if we restrict the field of scalars to the real numbers, then the Rayleigh quotient only determines the symmetric part of ''M''.)
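Rayleigh quotient iteration, mentioned above, can be sketched as follows for a small real symmetric matrix: each step solves (M - \mu I) y = x with the current Rayleigh quotient \mu as the shift, renormalizes, and recomputes \mu = R(M,x). This is a minimal illustration under stated assumptions (pure Python, a hand-written Gaussian elimination, an arbitrary tolerance and iteration cap, hypothetical function names), not a production eigensolver.

```python
import math


def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def rayleigh_quotient(M, x):
    return sum(a * b for a, b in zip(x, matvec(M, x))) / sum(a * a for a in x)


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-300:
            raise ZeroDivisionError("shifted matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (aug[i][n] - sum(aug[i][k] * y[k] for k in range(i + 1, n))) / aug[i][i]
    return y


def rayleigh_quotient_iteration(M, x, tol=1e-12, max_iter=50):
    """Return an approximate eigenpair (mu, x) of the real symmetric matrix M."""
    n = len(x)
    norm = math.sqrt(sum(a * a for a in x))
    x = [a / norm for a in x]
    mu = rayleigh_quotient(M, x)
    for _ in range(max_iter):
        residual = math.sqrt(sum((a - mu * b) ** 2 for a, b in zip(matvec(M, x), x)))
        if residual < tol:
            break
        shifted = [[M[i][j] - (mu if i == j else 0.0) for j in range(n)] for i in range(n)]
        try:
            y = solve(shifted, x)  # one inverse-iteration step with shift mu
        except ZeroDivisionError:
            break  # mu is already an eigenvalue to working precision
        norm = math.sqrt(sum(a * a for a in y))
        x = [a / norm for a in y]
        mu = rayleigh_quotient(M, x)  # eigenvalue estimate from the eigenvector estimate
    return mu, x


M = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 0.0],
     [0.0, 0.0, 1.0]]  # eigenvalues 5, 3, 1
mu, v = rayleigh_quotient_iteration(M, [1.0, 0.8, 0.1])
print(mu)  # approximately 5.0, the eigenvalue nearest the starting estimate
```

The starting vector decides which eigenpair is found: here its Rayleigh quotient (about 4.95) is already close to 5, so the shifted solves amplify the component along (1, 1, 0).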

Bounds for Hermitian ''M''

As stated in the introduction, for any non-zero vector ''x'', one has R(M,x) \in [\lambda_\min, \lambda_\max], where \lambda_\min and \lambda_\max are respectively the smallest and largest eigenvalues of ''M''. This is immediate after observing that the Rayleigh quotient is a weighted average of the eigenvalues of ''M'':

R(M,x) = \frac{x^* M x}{x^* x} = \frac{\sum_{i=1}^n \lambda_i |y_i|^2}{\sum_{i=1}^n |y_i|^2},

where (\lambda_i, v_i) is the i-th eigenpair after orthonormalization and y_i = v_i^* x is the i-th coordinate of ''x'' in the eigenbasis. The weights |y_i|^2 / \sum_j |y_j|^2 are non-negative and sum to one, so the average cannot leave [\lambda_\min, \lambda_\max]. It is then easy to verify that the bounds are attained at the corresponding eigenvectors v_\min and v_\max.
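The weighted-average form can be checked numerically. The sketch below uses a hypothetical real symmetric 3×3 matrix (the real special case, where y_i = v_i' x) whose orthonormal eigenpairs were worked out by hand, and compares the direct quotient with \sum_i \lambda_i y_i^2 / \sum_i y_i^2. The helper names are illustrative only.

```python
import math

s = 1 / math.sqrt(2)
M = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 0.0],
     [0.0, 0.0, 1.0]]
# Orthonormal eigenpairs of M (M v_i = lambda_i v_i, checked by hand).
eigenpairs = [(5.0, [s, s, 0.0]),
              (3.0, [s, -s, 0.0]),
              (1.0, [0.0, 0.0, 1.0])]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rayleigh_quotient(M, x):
    return dot(x, [dot(row, x) for row in M]) / dot(x, x)


def weighted_average_form(eigenpairs, x):
    """sum_i lambda_i y_i^2 / sum_i y_i^2 with y_i = v_i' x (coordinates in the eigenbasis)."""
    ys = [dot(v, x) for _, v in eigenpairs]
    weights = [y * y for y in ys]
    return sum(lam * w for (lam, _), w in zip(eigenpairs, weights)) / sum(weights)


x = [0.3, -2.0, 1.5]
print(rayleigh_quotient(M, x), weighted_average_form(eigenpairs, x))  # equal up to rounding, inside [1, 5]
```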

The fact that the quotient is a weighted average of the eigenvalues can be used to identify the second, the third, ... largest eigenvalues. Let \lambda_\max = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n = \lambda_\min be the eigenvalues in decreasing order. If n \ge 2 and ''x'' is constrained to be orthogonal to v_1, in which case y_1 = v_1^* x = 0, then R(M,x) has maximum value \lambda_2, which is achieved when x = v_2. In the same way, constraining ''x'' to be orthogonal to v_1, \ldots, v_{k-1} gives maximum value \lambda_k, attained at x = v_k.
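A quick numerical illustration of the constrained statement, using the same kind of hypothetical 3×3 example (eigenvalues 5, 3, 1): random vectors are projected onto the orthogonal complement of v_1, and the largest Rayleigh quotient found never exceeds \lambda_2 = 3, which v_2 attains. Random sampling only illustrates the bound; it does not prove it. The seed and sample count are arbitrary choices.

```python
import math
import random

s = 1 / math.sqrt(2)
M = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 0.0],
     [0.0, 0.0, 1.0]]
v1 = [s, s, 0.0]   # unit eigenvector for lambda_1 = 5
v2 = [s, -s, 0.0]  # unit eigenvector for lambda_2 = 3


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rayleigh_quotient(M, x):
    return dot(x, [dot(row, x) for row in M]) / dot(x, x)


def project_out(x, v):
    """Remove the component of x along the unit vector v, so that v' x = 0."""
    c = dot(v, x)
    return [a - c * b for a, b in zip(x, v)]


random.seed(0)
best = -math.inf
for _ in range(10_000):
    x = project_out([random.gauss(0.0, 1.0) for _ in range(3)], v1)
    best = max(best, rayleigh_quotient(M, x))

print(best <= 3.0 + 1e-12)       # True: no constrained sample beats lambda_2
print(rayleigh_quotient(M, v2))  # lambda_2 = 3, attained at v_2 (up to rounding)
```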

Special case of covariance matrices

An empirical covariance matrix M can be represented as the product A'A of the data matrix A pre-multiplied by its transpose A' (assuming the columns of A have been centered, and up to a positive scalar factor). Being a positive semi-definite matrix, M has non-negative eigenvalues and orthogonal (or orthogonalisable) eigenvectors, which can be demonstrated as follows.

First, we show that the eigenvalues \lambda_i are non-negative:

\begin{align}
&M v_i = A' A v_i = \lambda_i v_i \\
\Rightarrow& v_i' A' A v_i = v_i' \lambda_i v_i \\
\Rightarrow& \left\| A v_i \right\|^2 = \lambda_i \left\| v_i \right\|^2 \\
\Rightarrow& \lambda_i = \frac{\left\| A v_i \right\|^2}{\left\| v_i \right\|^2} \geq 0.
\end{align}
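The identity v' A' A v = \|Av\|^2 behind this step holds for every vector, not only for eigenvectors, so R(A'A, x) = \|Ax\|^2 / \|x\|^2 \ge 0 for any non-zero x. The sketch below checks this for a small hypothetical data matrix A; its values are arbitrary, and no centering is applied because the identity does not need it.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 3x2 data matrix: 3 observations (rows) of 2 variables (columns).
A = [[1.0, 2.0],
     [0.5, -1.0],
     [3.0, 0.0]]
M = matmul(transpose(A), A)  # M = A'A, here [[10.25, 1.5], [1.5, 5.0]]

for x in ([1.0, 0.0], [2.0, 1.0], [-1.0, 3.0]):
    Ax = matvec(A, x)
    quotient = dot(x, matvec(M, x)) / dot(x, x)  # R(M, x) = x'A'Ax / x'x
    norm_form = dot(Ax, Ax) / dot(x, x)          # ||Ax||^2 / ||x||^2
    print(quotient, norm_form)                   # same value up to rounding, never negative
```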

Second, we show that the eigenvectors v_i are orthogonal to one another:

\begin{align}
&M v_i = \lambda_i v_i \\
\Rightarrow& v_j' M v_i = v_j' \lambda_i v_i \\
\Rightarrow& \left(M v_j\right)' v_i = \lambda_i v_j' v_i \\
\Rightarrow& \lambda_j v_j' v_i = \lambda_i v_j' v_i \\
\Rightarrow& \left(\lambda_j - \lambda_i\right) v_j' v_i = 0 \\
\Rightarrow& v_j' v_i = 0
\end{align}

if the eigenvalues are different; in the case of multiplicity, the basis can be orthogonalized. To now establish that the Rayleigh quotient is maximized by the eigenvector with the largest eigenvalue, consider decomposing an arbitrary vector x on the basis of the eigenvectors v_i:

x = \sum_{i=1}^n \alpha_i v_i,

where

\alpha_i = \frac{x' v_i}{v_i' v_i} = \frac{\langle x, v_i \rangle}{\left\| v_i \right\|^2}

is the coordinate of x orthogonally projected onto v_i. Therefore, we have:

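\begin{align}
R(M,x) &= \frac{x' A' A x}{x' x} \\
&= \frac{\Bigl(\sum_{j=1}^n \alpha_j v_j\Bigr)' \left(A' A\right) \Bigl(\sum_{i=1}^n \alpha_i v_i\Bigr)}{\Bigl(\sum_{j=1}^n \alpha_j v_j\Bigr)' \Bigl(\sum_{i=1}^n \alpha_i v_i\Bigr)} \\
&= \frac{\Bigl(\sum_{j=1}^n \alpha_j v_j\Bigr)' \Bigl(\sum_{i=1}^n \alpha_i \left(A' A\right) v_i\Bigr)}{\sum_{i=1}^n \alpha_i^2 v_i' v_i} \\
&= \frac{\Bigl(\sum_{j=1}^n \alpha_j v_j\Bigr)' \Bigl(\sum_{i=1}^n \alpha_i \lambda_i v_i\Bigr)}{\sum_{i=1}^n \alpha_i^2 \left\| v_i \right\|^2}
\end{align}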

which, by orthonormality of the eigenvectors (normalizing them so that \left\| v_i \right\| = 1), becomes:

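\begin{align}
R(M,x) &= \frac{\sum_{i=1}^n \alpha_i^2 \lambda_i}{\sum_{i=1}^n \alpha_i^2} \\
&= \sum_{i=1}^n \lambda_i \frac{(x' v_i)^2}{(x' x)(v_i' v_i)^2} \\
&= \sum_{i=1}^n \lambda_i \frac{(x' v_i)^2}{x' x}
\end{align}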

The last representation establishes that the Rayleigh quotient is the sum of the squared cosines of the angles formed by the vector x and each eigenvector v_i, weighted by the corresponding eigenvalues.

If a vector x maximizes R(M,x), then any non-zero scalar multiple kx also maximizes R, so the problem can be reduced to the constrained problem of maximizing \sum_{i=1}^n \alpha_i^2 \lambda_i under the constraint that \sum_{i=1}^n \alpha_i^2 = 1.

Define \beta_i = \alpha_i^2. This then becomes a linear program in \beta_1, \ldots, \beta_n (maximize \sum_i \beta_i \lambda_i subject to \sum_i \beta_i = 1 and \beta_i \ge 0), which always attains its maximum at one of the corners of the domain. When the eigenvalues are ordered so that \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n, a maximum point will have \alpha_1 = \pm 1 and \alpha_i = 0 for all i > 1. Thus, the Rayleigh quotient is maximized by the eigenvector with the largest eigenvalue.
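The whole argument can be traced numerically. The sketch below assumes a small hypothetical data matrix A, forms M = A'A, and computes an orthonormal eigenbasis with the classical cyclic Jacobi rotation method (a swapped-in textbook algorithm, chosen only because it fits in a few lines of pure Python). It then checks that the eigenvalues are non-negative, that the eigenvectors are orthonormal, that R(M,x) equals the eigenvalue-weighted sum of squared cosines, and that R(M,x) never exceeds the value \lambda_1 attained at v_1.

```python
import math


def jacobi_eigen(S, max_sweeps=100, tol=1e-22):
    """Eigenpairs of a real symmetric matrix via cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors), sorted by decreasing eigenvalue.
    """
    n = len(S)
    a = [list(map(float, row)) for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulated rotations
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- P' a (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[V[k][i] for k in range(n)] for i in order]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


# Hypothetical 4x3 data matrix and M = A'A.
A = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 1.0],
     [3.0, 0.0, -2.0],
     [0.0, 1.0, 1.0]]
M = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
lams, vs = jacobi_eigen(M)

x = [1.0, -0.5, 2.0]
R = dot(x, [dot(row, x) for row in M]) / dot(x, x)
# Squared cosines of the angles between x and each eigenvector v_i.
cos2 = [dot(x, v) ** 2 / (dot(x, x) * dot(v, v)) for v in vs]
R_weighted = sum(lam * c2 for lam, c2 in zip(lams, cos2))

print(lams)           # non-negative, in decreasing order
print(R, R_weighted)  # equal up to rounding, and at most lams[0]
```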

Formulation using Lagrange multipliers

Alternatively, this result can be arrived at by the method of Lagrange multipliers. The first part is to show that the quotient is constant under scaling x \to cx, where c is a non-zero scalar:

R(M,cx) = \frac{(cx)^* M cx}{(cx)^* cx} = \frac{c^* c}{c^* c} \frac{x^* M x}{x^* x} = R(M,x).

Because of this invariance, it is sufficient to study the special case \left\| x \right\|^2 = x^\mathsf{T} x = 1 (here ''M'' is taken to be real symmetric and ''x'' real). The problem is then to find the critical points of the function

R(M,x) = x^\mathsf{T} M x,

subject to the constraint

\left\| x \right\|^2 = x^\mathsf{T} x = 1.

In other words, it is to find the critical points of

\mathcal{L}(x) = x^\mathsf{T} M x - \lambda \left(x^\mathsf{T} x - 1\right),

where \lambda is a Lagrange multiplier. The stationary points of \mathcal{L}(x) occur at

\begin{align}
&\frac{d\mathcal{L}(x)}{dx} = 0 \\
\Rightarrow& 2x^\mathsf{T} M - 2\lambda x^\mathsf{T} = 0 \\
\Rightarrow& 2Mx - 2\lambda x = 0 \quad \text{(taking the transpose of both sides and using } M^\mathsf{T} = M\text{)} \\
\Rightarrow& M x = \lambda x
\end{align}

and

\therefore R(M,x) = \frac{x^\mathsf{T} M x}{x^\mathsf{T} x} = \lambda \frac{x^\mathsf{T} x}{x^\mathsf{T} x} = \lambda.

Therefore, the eigenvectors x_1, \ldots, x_n of ''M'' are the critical points of the Rayleigh quotient, and their corresponding eigenvalues \lambda_1, \ldots, \lambda_n are the stationary values of \mathcal{L}. This property is the basis for principal components analysis and canonical correlation.
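As a small check of this result, the sketch below uses the gradient of the unconstrained quotient, \nabla_x R(M,x) = 2\,(Mx - R(M,x)\,x) / (x^\mathsf{T} x) for real symmetric ''M'' (which follows from the quotient rule), cross-checks it against central finite differences, and confirms that it vanishes at the eigenvectors of a hypothetical 3×3 matrix but not at a generic vector. The helper names and step size are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, x):
    return [dot(row, x) for row in M]


def rayleigh_quotient(M, x):
    return dot(x, matvec(M, x)) / dot(x, x)


def grad_rayleigh(M, x):
    """Analytic gradient 2 (M x - R x) / (x'x), valid for real symmetric M."""
    r = rayleigh_quotient(M, x)
    xx = dot(x, x)
    return [2.0 * (mxi - r * xi) / xx for mxi, xi in zip(matvec(M, x), x)]


def grad_finite_difference(M, x, h=1e-6):
    """Central differences, used only to cross-check grad_rayleigh."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((rayleigh_quotient(M, xp) - rayleigh_quotient(M, xm)) / (2.0 * h))
    return g


M = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 0.0],
     [0.0, 0.0, 1.0]]
eigenvectors = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 1.0]]  # eigenvalues 5, 3, 1

for v in eigenvectors:
    print(rayleigh_quotient(M, v), grad_rayleigh(M, v))  # stationary value, zero gradient

x = [1.0, 0.0, 1.0]  # not an eigenvector
print(grad_rayleigh(M, x), grad_finite_difference(M, x))  # non-zero, and the two agree closely
```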

Use in Sturm–Liouville theory

Sturm–Liouville theory concerns the action of the linear operator

L(y) = \frac{1}{w(x)}\left(-\frac{d}{dx}\left[p(x)\frac{dy}{dx}\right] + q(x)y\right)

on the inner product space defined by

\langle y_1, y_2 \rangle = \int_a^b w(x) y_1(x) y_2(x) \, dx

of functions satisfying some specified boundary conditions at ''a'' and ''b''. In this case the Rayleigh quotient is

\frac{\langle y, Ly \rangle}{\langle y, y \rangle} = \frac{\int_a^b y(x)\left(-\frac{d}{dx}\left[p(x)\frac{dy}{dx}\right] + q(x)y(x)\right)dx}{\int_a^b w(x) y(x)^2 \, dx}.

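This is sometimes presented in an equivalent form, obtained by separating the integral in the numerator and using integration by parts:

\begin{align}
\frac{\langle y, Ly \rangle}{\langle y, y \rangle} &= \frac{\left\{\int_a^b y(x)\left(-\frac{d}{dx}\left[p(x)y'(x)\right]\right)dx\right\} + \left\{\int_a^b q(x)y(x)^2 \, dx\right\}}{\int_a^b w(x)y(x)^2 \, dx} \\
&= \frac{\left\{\left. -y(x)\left[p(x)y'(x)\right]\right|_a^b\right\} + \left\{\int_a^b y'(x)\left[p(x)y'(x)\right] dx\right\} + \left\{\int_a^b q(x)y(x)^2 \, dx\right\}}{\int_a^b w(x)y(x)^2 \, dx} \\
&= \frac{\left\{\left. -p(x)y(x)y'(x)\right|_a^b\right\} + \left\{\int_a^b \left[p(x)y'(x)^2 + q(x)y(x)^2\right] dx\right\}}{\int_a^b w(x)y(x)^2 \, dx}.
\end{align}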
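The two forms of the quotient can be compared numerically. The sketch below assumes specific coefficient functions p, q, w and trial functions y with hand-computed derivatives (all hypothetical choices), integrates with composite Simpson's rule, and evaluates both the original and the integrated-by-parts form. For p = 1, q = 0, w = 1 on [0, \pi], the trial function y = x(\pi - x), which vanishes at both ends, gives exactly 10/\pi^2 \approx 1.013. This lies slightly above 1, the smallest eigenvalue of -y'' = \lambda y with y(0) = y(\pi) = 0 (eigenfunction \sin x), consistent with the well-known fact that the smallest eigenvalue of such a problem is the minimum of the Rayleigh quotient over admissible functions.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def quotient_both_forms(p, dp, q, w, y, dy, d2y, a, b):
    """Return (original form, integrated-by-parts form) of <y, Ly> / <y, y>."""
    denominator = simpson(lambda x: w(x) * y(x) ** 2, a, b)
    # Original numerator: integral of y * ( -(p y')' + q y ), with (p y')' = p' y' + p y''.
    original = simpson(lambda x: y(x) * (-(dp(x) * dy(x) + p(x) * d2y(x)) + q(x) * y(x)), a, b)
    # Integrated by parts: [-p y y'] evaluated from a to b, plus integral of p y'^2 + q y^2.
    boundary = -p(b) * y(b) * dy(b) + p(a) * y(a) * dy(a)
    by_parts = boundary + simpson(lambda x: p(x) * dy(x) ** 2 + q(x) * y(x) ** 2, a, b)
    return original / denominator, by_parts / denominator


def y(x):  # trial function with y(0) = y(pi) = 0
    return x * (math.pi - x)


def dy(x):
    return math.pi - 2 * x


def d2y(x):
    return -2.0


def one(x):
    return 1.0


def zero(x):
    return 0.0


# Case 1: p = 1, q = 0, w = 1, i.e. -y'' = lambda y on [0, pi].
print(quotient_both_forms(one, zero, zero, one, y, dy, d2y, 0.0, math.pi))  # both about 10/pi^2

# Case 2: hypothetical non-constant coefficients; the two forms still agree.
print(quotient_both_forms(lambda x: 1 + x, one, lambda x: x, lambda x: 2 + math.sin(x),
                          y, dy, d2y, 0.0, math.pi))
```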

Generalizations

# For a given pair (''A'', ''B'') of matrices, and a given non-zero vector ''x'', the generalized Rayleigh quotient is defined as:

R(A,B; x) := \frac{x^* A x}{x^* B x}.

The generalized Rayleigh quotient can be reduced to the Rayleigh quotient R(D, C^* x) through the transformation

D = C^{-1} A {C^*}^{-1},

where CC^* is the Cholesky decomposition of the Hermitian positive-definite matrix ''B''.
# For a given pair (''x'', ''y'') of non-zero vectors, and a given Hermitian matrix ''H'', the generalized Rayleigh quotient can be defined as:

R(H; x, y) := \frac{y^* H x}{\sqrt{y^* y \cdot x^* x}},

which coincides with ''R''(''H'',''x'') when ''x'' = ''y''. In quantum mechanics, this quantity is called a "matrix element" or sometimes a "transition amplitude".
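The Cholesky reduction in the first item can be checked on a small real example (the real symmetric special case, where C^* = C'). The sketch below assumes a hypothetical symmetric A and symmetric positive-definite B, factors B = CC' with the textbook row-by-row (Cholesky–Banachiewicz) recurrence, forms D = C^{-1} A C'^{-1} by forward substitution rather than explicit inversion, and compares R(A,B;x) with R(D, C'x). All function names are illustrative.

```python
import math


def cholesky(B):
    """Lower-triangular C with B = C C' (B real symmetric positive-definite)."""
    n = len(B)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = B[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("B is not positive-definite")
                C[i][i] = math.sqrt(s)
            else:
                C[i][j] = s / C[j][j]
    return C


def forward_solve(C, b):
    """Solve C y = b for lower-triangular C."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(C[i][k] * y[k] for k in range(i))) / C[i][i])
    return y


def transpose(X):
    return [list(col) for col in zip(*X)]


def quadratic_form(M, u):
    return sum(u[i] * M[i][j] * u[j] for i in range(len(u)) for j in range(len(u)))


def reduce_pencil(A, B):
    """Return (D, C) with B = C C' and D = C^{-1} A C'^{-1}."""
    C = cholesky(B)
    Y = transpose([forward_solve(C, col) for col in transpose(A)])  # Y = C^{-1} A
    # D' = C^{-1} Y', so row j of D is C^{-1} applied to row j of Y.
    D = [forward_solve(C, row) for row in Y]
    return D, C


A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 1.0]]
B = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 2.0]]  # symmetric and strictly diagonally dominant, hence positive-definite
x = [1.0, -2.0, 0.5]

D, C = reduce_pencil(A, B)
z = [sum(C[k][i] * x[k] for k in range(len(x))) for i in range(len(x))]  # z = C' x

generalized = quadratic_form(A, x) / quadratic_form(B, x)  # R(A, B; x)
reduced = quadratic_form(D, z) / sum(zi * zi for zi in z)    # R(D, C' x)
print(generalized, reduced)  # equal up to rounding
```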
