Cayley–Hamilton theorem

In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over a commutative ring (such as the real or complex numbers or the integers) satisfies its own characteristic equation.

If A is a given n \times n matrix and I_n is the n \times n identity matrix, then the characteristic polynomial of A is defined as p_A(\lambda)=\det(\lambda I_n-A), where \det is the determinant operation and \lambda is a variable for a scalar element of the base ring. Since the entries of the matrix (\lambda I_n-A) are (linear or constant) polynomials in \lambda, the determinant is also a degree-n monic polynomial in \lambda,
:p_A(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0~.
One can create an analogous polynomial p_A(A) in the matrix A instead of the scalar variable \lambda, defined as
:p_A(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n~.
The Cayley–Hamilton theorem states that this polynomial expression is equal to the zero matrix, which is to say that p_A(A) = \mathbf 0. The theorem allows A^n to be expressed as a linear combination of the lower matrix powers of A. When the ring is a field, the Cayley–Hamilton theorem is equivalent to the statement that the minimal polynomial of a square matrix divides its characteristic polynomial.

The theorem was first proven in 1853 by Hamilton, in terms of inverses of linear functions of quaternions, a ''non-commutative'' ring. This corresponds to the special case of certain 4 \times 4 real or 2 \times 2 complex matrices. The theorem holds for general quaternionic matrices. (Due to the non-commutative nature of the multiplication operation for quaternions and related constructions, care needs to be taken with definitions, most notably in this context, for the determinant.) The theorem holds as well for the slightly less well-behaved split-quaternions. The rings of quaternions and split-quaternions can both be represented by certain 2 \times 2 complex matrices. (When restricted to unit norm, these are the groups SU(2) and SU(1,1) respectively.) Therefore it is not surprising that the theorem holds. There is no such matrix representation for the octonions, since the multiplication operation is not associative in this case; however, a modified Cayley–Hamilton theorem still holds for the octonions. Cayley stated the theorem in 1858 for 3 \times 3 and smaller matrices, but only published a proof for the 2 \times 2 case. The general case was first proved by Ferdinand Frobenius in 1878.


Examples


1 \times 1 matrices

For a 1 \times 1 matrix A = (a), the characteristic polynomial is given by p(\lambda) = \lambda - a, and so p(A) = A - aI_1 = (0) is trivial.


2 \times 2 matrices

As a concrete example, let
:A = \begin{pmatrix}1&2\\3&4\end{pmatrix}.
Its characteristic polynomial is given by
:p(\lambda) = \det(\lambda I_2-A) = \det\!\begin{pmatrix}\lambda-1&-2\\ -3&\lambda-4\end{pmatrix}=(\lambda-1)(\lambda-4)-(-2)(-3)=\lambda^2-5\lambda-2.
The Cayley–Hamilton theorem claims that, if we ''define''
:p(X)=X^2-5X-2I_2,
then
:p(A)=A^2-5A-2I_2 = \begin{pmatrix}0&0\\0&0\end{pmatrix}.
We can verify by computation that indeed,
:A^2-5A-2I_2 = \begin{pmatrix}7&10\\15&22\end{pmatrix}-\begin{pmatrix}5&10\\15&20\end{pmatrix}-\begin{pmatrix}2&0\\0&2\end{pmatrix}=\begin{pmatrix}0&0\\0&0\end{pmatrix}.
For a generic 2 \times 2 matrix,
:A=\begin{pmatrix}a&b\\c&d\end{pmatrix},
the characteristic polynomial is given by p(\lambda)=\lambda^2-(a+d)\lambda+(ad-bc), so the Cayley–Hamilton theorem states that
:p(A)=A^2-(a+d)A+(ad-bc)I_2=\begin{pmatrix}0&0\\0&0\end{pmatrix};
which is indeed always the case, evident by working out the entries of A^2:
:\begin{align}
A^2-(a+d)A+(ad-bc)I_2
&=\begin{pmatrix}a^2+bc&ab+bd\\ac+cd&bc+d^2\end{pmatrix}-\begin{pmatrix}a(a+d)&b(a+d)\\c(a+d)&d(a+d)\end{pmatrix}+(ad-bc)I_2 \\
&=\begin{pmatrix}bc-ad&0\\0&bc-ad\end{pmatrix}+(ad-bc)I_2 \\
&=\begin{pmatrix}0&0\\0&0\end{pmatrix}.
\end{align}
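This identity is easy to check numerically. The following is a minimal sketch (assuming NumPy is available) verifying p(A) = A^2 - 5A - 2I_2 = 0 for the matrix above:

```python
# Numerical check of the 2x2 example: A^2 - 5A - 2I should be the zero matrix.
import numpy as np

A = np.array([[1, 2],
              [3, 4]], dtype=float)
I2 = np.eye(2)

residual = A @ A - 5 * A - 2 * I2
print(residual)                      # [[0. 0.] [0. 0.]]
print(np.allclose(residual, 0))      # True
```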


Applications


Determinant and inverse matrix

For a general n \times n invertible matrix A, i.e., one with nonzero determinant, A^{-1} can thus be written as an (n-1)-th order polynomial expression in A: as indicated, the Cayley–Hamilton theorem amounts to the identity
:p(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + (-1)^n\det(A) I_n = 0.
The coefficients c_i are given by the elementary symmetric polynomials of the eigenvalues of A. Using Newton identities, the elementary symmetric polynomials can in turn be expressed in terms of power sum symmetric polynomials of the eigenvalues:
:s_k = \sum_{i=1}^n \lambda_i^k = \operatorname{tr}(A^k),
where \operatorname{tr}(A^k) is the trace of the matrix A^k. Thus, we can express c_i in terms of the traces of powers of A.

In general, the formula for the coefficients c_i is given in terms of complete exponential Bell polynomials as
:c_{n-k} = \frac{(-1)^k}{k!} B_k(s_1, -1!\, s_2, 2!\, s_3, \ldots, (-1)^{k-1}(k-1)!\, s_k).
(An equivalent explicit expression for these coefficients is
:c_i = \sum_{k_1,k_2,\ldots,k_{n-i}} \prod_{l=1}^{n-i} \frac{(-1)^{k_l}}{l^{k_l}\,k_l!}\operatorname{tr}(A^l)^{k_l},
where the sum is taken over all tuples of non-negative integers k_l satisfying the equation
:\sum_{l=1}^{n-i} l\, k_l = n-i.)
In particular, the determinant of A equals (-1)^n c_0. Thus, the determinant can be written as the trace identity:
:\det(A) = \frac{1}{n!} B_n(s_1, -1!\, s_2, 2!\, s_3, \ldots, (-1)^{n-1}(n-1)!\, s_n).
Likewise, the characteristic polynomial can be written as
:-(-1)^n\det(A)I_n = A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1 I_n),
and, by multiplying both sides by A^{-1} (note -(-1)^n = (-1)^{n-1}), one is led to an expression for the inverse of A as a trace identity,
:\begin{align}
A^{-1} & = \frac{(-1)^{n-1}}{\det A}(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1 I_n), \\
& = \frac{1}{\det A}\sum_{k=0}^{n-1} (-1)^{n+k-1}\frac{A^{n-k-1}}{k!} B_k(s_1, -1!\, s_2, 2!\, s_3, \ldots, (-1)^{k-1}(k-1)!\, s_k).
\end{align}

Another method for obtaining these coefficients c_k for a general n \times n matrix, provided no eigenvalue is zero, relies on the following alternative expression for the determinant,
:p(\lambda)= \det (\lambda I_n -A) = \lambda^n \exp (\operatorname{tr} (\log (I_n - A/\lambda))).
Hence, by virtue of the Mercator series,
:p(\lambda)= \lambda^n \exp \left( -\operatorname{tr} \sum_{m=1}^\infty \frac{(A/\lambda)^m}{m} \right),
where the exponential ''only'' needs be expanded to order \lambda^{-n}, since p(\lambda) is of order n, the net negative powers of \lambda automatically vanishing by the C–H theorem. (Again, this requires a ring containing the rational numbers.) Differentiation of this expression with respect to \lambda allows one to express the coefficients of the characteristic polynomial for general n as determinants of m \times m matrices,
:c_{n-m} = \frac{(-1)^m}{m!}
\begin{vmatrix}
\operatorname{tr}A & m-1 & 0 & \cdots & 0 \\
\operatorname{tr}A^2 & \operatorname{tr}A & m-2 & \cdots & 0 \\
\vdots & \vdots & & & \vdots \\
\operatorname{tr}A^{m-1} & \operatorname{tr}A^{m-2} & \cdots & \cdots & 1 \\
\operatorname{tr}A^m & \operatorname{tr}A^{m-1} & \cdots & \cdots & \operatorname{tr}A
\end{vmatrix}~.
The derivative here is supplied by Jacobi's formula,
:\partial p(\lambda) /\partial \lambda= p(\lambda) \sum_{m=0}^\infty \lambda^{-(m+1)} \operatorname{tr}A^m = p(\lambda)~\operatorname{tr} \frac{I_n}{\lambda I_n - A}\equiv\operatorname{tr} B~,
where B is the adjugate matrix of the next section. There also exists an equivalent, related recursive algorithm introduced by Urbain Le Verrier and Dmitry Konstantinovich Faddeev—the Faddeev–LeVerrier algorithm, which reads
:\begin{align} M_0 &\equiv O & c_n &= 1 \qquad &(k=0) \\
M_k &\equiv AM_{k-1} + c_{n-k+1} I_n \qquad \qquad & c_{n-k} &= -\frac 1 k \operatorname{tr}(AM_k) \qquad &k=1,\ldots ,n ~. \end{align}
Observe A M_n + c_0 I_n = O as the recursion terminates. See the algebraic proof in the following section, which relies on the modes of the adjugate, B_k \equiv M_{n-k}. Specifically, (\lambda I_n-A) B = I_n\, p(\lambda), and tracing the derivative identity above yields
:\lambda p' -n p =\operatorname{tr} (AB)~,
and the above recursions, in turn.

; Examples
For instance, the first few Bell polynomials are B_0 = 1, B_1(x_1) = x_1, B_2(x_1,x_2) = x_1^2 + x_2, and B_3(x_1,x_2,x_3)=x_1^3 + 3 x_1 x_2 + x_3.

Using these to specify the coefficients of the characteristic polynomial of a 2 \times 2 matrix yields
:\begin{align}
c_2 &= B_0 = 1, \\
c_1 &= \frac{-1}{1!} B_1(s_1) = - s_1 = - \operatorname{tr}(A), \\
c_0 &= \frac{1}{2!} B_2(s_1, -1!\, s_2) = \frac12(s_1^2 - s_2) = \frac12\bigl((\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)\bigr).
\end{align}
The coefficient c_0 gives the determinant of the 2 \times 2 matrix, c_1 minus its trace, while its inverse is given by
: A^{-1} = \frac{-1}{\det A}(A + c_1 I_2) = \frac{2\bigl((\operatorname{tr}A)\, I_2 - A\bigr)}{(\operatorname{tr}A)^2 - \operatorname{tr}(A^2)}.
It is apparent from the general formula for c_{n-k}, expressed in terms of Bell polynomials, that the expressions
:-\operatorname{tr}(A)\quad \text{and} \quad \tfrac 1 2 \bigl(\operatorname{tr}(A)^2 - \operatorname{tr}(A^2)\bigr)
always give the coefficients c_{n-1} of \lambda^{n-1} and c_{n-2} of \lambda^{n-2} in the characteristic polynomial of any n \times n matrix, respectively. So, for a 3 \times 3 matrix A, the statement of the Cayley–Hamilton theorem can also be written as
:A^3- (\operatorname{tr}A)A^2+\frac12\left((\operatorname{tr}A)^2-\operatorname{tr}(A^2)\right)A-\det(A)I_3=O,
where the right-hand side designates a 3 \times 3 matrix with all entries reduced to zero. Likewise, this determinant in the n = 3 case, is now
:\begin{align} \det(A) &= \frac{1}{3!} B_3(s_1, -1!\, s_2, 2!\, s_3) = \frac16\bigl(s_1^3 + 3 s_1 (-s_2) + 2 s_3\bigr) \\
&= \tfrac16 \left ( (\operatorname{tr}A)^3-3\operatorname{tr}(A^2)(\operatorname{tr}A)+2\operatorname{tr}(A^3) \right ). \end{align}
This expression gives the negative of the coefficient c_{n-3} of \lambda^{n-3} in the general case, as seen below.

Similarly, one can write for a 4 \times 4 matrix A,
: A^4-(\operatorname{tr}A)A^3 + \tfrac12\bigl((\operatorname{tr}A)^2-\operatorname{tr}(A^2)\bigr)A^2 - \tfrac16\bigl( (\operatorname{tr}A)^3-3\operatorname{tr}(A^2)(\operatorname{tr}A)+2\operatorname{tr}(A^3)\bigr)A + \det(A)I_4 = O,
where, now, the determinant is c_{n-4},
:\tfrac{1}{24}\! \left( (\operatorname{tr}A)^4-6 \operatorname{tr}(A^2)(\operatorname{tr}A)^2+3(\operatorname{tr}(A^2))^2+8\operatorname{tr}(A^3)\operatorname{tr}(A) -6\operatorname{tr}(A^4) \right),
and so on for larger matrices. The increasingly complex expressions for the coefficients c_k are deducible from Newton's identities or the Faddeev–LeVerrier algorithm.
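The Faddeev–LeVerrier recursion above is straightforward to implement. The following sketch (assuming NumPy; the function name faddeev_leverrier and the sample matrix are illustrative choices, not part of the original text) computes the coefficients c_n, ..., c_0 and recovers the inverse from M_n:

```python
# Faddeev-LeVerrier recursion: M_0 = 0, c_n = 1;
# M_k = A M_{k-1} + c_{n-k+1} I,  c_{n-k} = -(1/k) tr(A M_k),  k = 1, ..., n.
import numpy as np

def faddeev_leverrier(A):
    n = A.shape[0]
    M = np.zeros_like(A, dtype=float)       # M_0 = O
    c = [1.0]                                # c_n = 1
    for k in range(1, n + 1):
        M = A @ M + c[-1] * np.eye(n)        # M_k
        c.append(-np.trace(A @ M) / k)       # c_{n-k}
    return c, M                              # c = [c_n, ..., c_0], M = M_n

A = np.array([[1., 2., 0.],
              [3., 4., 5.],
              [0., 1., 1.]])
c, M_n = faddeev_leverrier(A)
print(np.round(c, 6))                               # coefficients of det(tI - A)
print(np.allclose(c, np.poly(A)))                   # agrees with NumPy's characteristic polynomial
print(np.allclose(np.linalg.inv(A), -M_n / c[-1]))  # A^{-1} = -M_n / c_0
```

The last check uses A M_n + c_0 I_n = O, i.e. A^{-1} = -M_n / c_0, which is the Cayley–Hamilton identity in disguise.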


''n''-th power of matrix

The Cayley–Hamilton theorem always provides a relationship between the powers of A (though not always the simplest one), which allows one to simplify expressions involving such powers, and evaluate them without having to compute the power A^n or any higher powers of A.

As an example, for A = \begin{pmatrix}1&2\\3&4\end{pmatrix} the theorem gives
:A^2=5A+2I_2\, .
Then, to calculate A^4, observe
:A^3=(5A+2I_2)A=5A^2+2A=5(5A+2I_2)+2A=27A+10I_2,
:A^4=A^3A=(27A+10I_2)A=27A^2+10A=27(5A+2I_2)+10A=145A+54I_2\, .
Likewise,
:A^{-1}=\frac{A-5I_2}{2}~,
:A^{-2}=A^{-1}A^{-1}=\frac{(A-5I_2)(A-5I_2)}{4}=\frac{A^2-10A+25I_2}{4}=\frac{27I_2-5A}{4}~.
Notice that we have been able to write the matrix power as the sum of two terms. In fact, a matrix power of any order can be written as a matrix polynomial of degree at most n-1, where n is the size of the square matrix. This is an instance where the Cayley–Hamilton theorem can be used to express a matrix function, which we will discuss below systematically.
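As a quick numerical sanity check of the reductions above (assuming NumPy is available):

```python
# A^4 = 145A + 54I and A^{-1} = (A - 5I)/2 follow from A^2 = 5A + 2I.
import numpy as np

A = np.array([[1, 2],
              [3, 4]], dtype=float)
I2 = np.eye(2)

print(np.allclose(np.linalg.matrix_power(A, 4), 145 * A + 54 * I2))  # True
print(np.allclose(np.linalg.inv(A), (A - 5 * I2) / 2))               # True
```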


Matrix functions

Given an analytic function
:f(x) = \sum_{k=0}^\infty a_k x^k
and the characteristic polynomial p(x) of degree n of an n \times n matrix A, the function can be expressed using long division as
:f(x) = q(x) p(x) + r(x),
where q(x) is some quotient polynomial and r(x) is a remainder polynomial such that 0 \le \deg r(x) < n. By the Cayley–Hamilton theorem, replacing x by the matrix A gives p(A) = 0, so one has
:f(A) = r(A).
Thus, the analytic function of the matrix A can be expressed as a matrix polynomial of degree less than n.

Let the remainder polynomial be
:r(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}.
Since p(\lambda_i) = 0, evaluating the function f(x) at the n eigenvalues of A yields
: f(\lambda_i) = r(\lambda_i) = c_0 + c_1 \lambda_i + \cdots + c_{n-1} \lambda_i^{n-1}, \qquad \text{for } i=1,2,...,n.
This amounts to a system of n linear equations, which can be solved to determine the coefficients c_i. Thus, one has
:f(A) = \sum_{k=0}^{n-1} c_k A^k.
When the eigenvalues are repeated, that is \lambda_i = \lambda_j for some i \ne j, two or more equations are identical; and hence the linear equations cannot be solved uniquely. For such cases, for an eigenvalue \lambda with multiplicity m, the first m-1 derivatives of p(x) vanish at the eigenvalue. This leads to the extra m-1 linearly independent equations
:\frac{\mathrm d^k f(x)}{\mathrm dx^k}\bigg|_{x=\lambda} = \frac{\mathrm d^k r(x)}{\mathrm dx^k}\bigg|_{x=\lambda}\qquad \text{for } k = 1, 2, \ldots, m-1,
which, combined with the others, yield the required n equations to solve for the c_i.

Finding a polynomial that passes through the points (\lambda_i, f(\lambda_i)) is essentially an interpolation problem, and can be solved using Lagrange or Newton interpolation techniques, leading to Sylvester's formula.

For example, suppose the task is to find the polynomial representation of
:f(A) = e^{At} \qquad \mathrm{where} \qquad A = \begin{pmatrix}1&2\\0&3\end{pmatrix}.
The characteristic polynomial is p(x) = (x-1)(x-3) = x^2-4x+3, and the eigenvalues are \lambda = 1, 3. Let r(x) = c_0 + c_1 x. Evaluating f(\lambda) = e^{\lambda t} at the eigenvalues, one obtains two linear equations, e^{t} = c_0 + c_1 and e^{3t} = c_0 + 3 c_1. Solving the equations yields c_0 = (3e^{t} - e^{3t})/2 and c_1 = (e^{3t} - e^{t})/2. Thus, it follows that
:e^{At} = c_0 I_2 + c_1 A = \begin{pmatrix}c_0 + c_1 & 2 c_1\\ 0 & c_0 + 3 c_1\end{pmatrix} = \begin{pmatrix}e^{t} & e^{3t} - e^{t} \\ 0 & e^{3t}\end{pmatrix}.
If, instead, the function were f(A) = \sin At, then the coefficients would have been c_0 = (3\sin t - \sin 3t)/2 and c_1 = (\sin 3t - \sin t)/2; hence
:\sin(At) = c_0 I_2 + c_1 A = \begin{pmatrix}\sin t & \sin 3t - \sin t \\ 0 & \sin 3t\end{pmatrix}.
As a further example, when considering
:f(A) = e^{At} \qquad \mathrm{where} \qquad A = \begin{pmatrix}0 & 1\\-1 & 0\end{pmatrix},
then the characteristic polynomial is p(x) = x^2 + 1, and the eigenvalues are \lambda = \pm i. As before, evaluating the function at the eigenvalues gives us the linear equations e^{it} = c_0 + i c_1 and e^{-it} = c_0 - i c_1; the solution of which gives c_0 = (e^{it} + e^{-it})/2 = \cos t and c_1 = (e^{it} - e^{-it})/2i = \sin t. Thus, for this case,
:e^{At} = (\cos t) I_2 + (\sin t) A = \begin{pmatrix}\cos t & \sin t\\ -\sin t & \cos t \end{pmatrix},
which is a rotation matrix.

A standard example of such usage is the exponential map from the Lie algebra of a matrix Lie group into the group. It is given by a matrix exponential,
:\exp: \mathfrak g \rightarrow G; \qquad tX \mapsto e^{tX} = \sum_{k=0}^\infty \frac{t^k X^k}{k!} = I + tX + \frac{t^2X^2}{2} + \cdots, \quad t \in \mathbb R,\ X \in \mathfrak g .
Such expressions have long been known for SU(2),
:e^{i(\theta/2)(\hat n \cdot \sigma)} = I_2 \cos \theta/2 + i(\hat n \cdot \sigma) \sin \theta/2,
where the \sigma are the Pauli matrices, and for SO(3),
:e^{i\theta(\hat n \cdot \mathbf J)} = I_3 + i(\hat n \cdot \mathbf J) \sin \theta + (\hat n \cdot \mathbf J)^2 (\cos \theta - 1),
which is Rodrigues' rotation formula. For the notation, see 3D rotation group#A note on Lie algebras.

More recently, expressions have appeared for other groups, like the Lorentz group SO(3,1), O(4,2) and SU(2,2), as well as GL(n,\mathbb R). The group O(4,2) is the conformal group of spacetime, SU(2,2) its simply connected cover (to be precise, the simply connected cover of the connected component SO^+(4,2) of O(4,2)). The expressions obtained apply to the standard representation of these groups. They require knowledge of (some of) the eigenvalues of the matrix to exponentiate. For SU(2) (and hence for SO(3)), closed expressions have been obtained for ''all'' irreducible representations, i.e. of any spin.
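The interpolation recipe above translates directly into code. The sketch below (assuming NumPy, distinct eigenvalues, and an illustrative function name fun_of_matrix) computes e^{At} for the second example and compares it with the rotation matrix derived above; the repeated-eigenvalue case would additionally need the derivative conditions discussed in the text:

```python
# f(A) via the remainder polynomial r(x): solve f(lam_i) = sum_k c_k lam_i^k,
# then form f(A) = sum_k c_k A^k.
import numpy as np

def fun_of_matrix(f, A):
    n = A.shape[0]
    lam = np.linalg.eigvals(A)
    V = np.vander(lam, n, increasing=True)        # Vandermonde system for c_0, ..., c_{n-1}
    c = np.linalg.solve(V, f(lam))
    return sum(ck * np.linalg.matrix_power(A.astype(complex), k)
               for k, ck in enumerate(c))

t = 0.7
A = np.array([[0., 1.],
              [-1., 0.]])
E = fun_of_matrix(lambda x: np.exp(x * t), A)
R = np.array([[np.cos(t), np.sin(t)],
              [-np.sin(t), np.cos(t)]])           # the rotation matrix derived above
print(np.allclose(E, R))                          # True (up to rounding)
```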


Algebraic number theory

The Cayley–Hamilton theorem is an effective tool for computing the minimal polynomial of algebraic integers. For example, given a finite extension \mathbb{Q}[\alpha_1,\ldots,\alpha_k] of \mathbb{Q} and an algebraic integer \alpha \in \mathbb{Q}[\alpha_1,\ldots,\alpha_k] which is a non-zero linear combination of the monomials \alpha_1^{n_1}\cdots\alpha_k^{n_k}, we can compute the minimal polynomial of \alpha by finding a matrix representing the \mathbb{Q}-linear transformation
:\cdot \alpha : \mathbb{Q}[\alpha_1,\ldots,\alpha_k]\to \mathbb{Q}[\alpha_1,\ldots,\alpha_k].
If we call this transformation matrix A, then we can find the minimal polynomial by applying the Cayley–Hamilton theorem to A.
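As a concrete sketch (assuming SymPy; the example \alpha = \sqrt 2 + \sqrt 3 and the basis (1, \sqrt 2, \sqrt 3, \sqrt 6) of \mathbb{Q}[\sqrt 2, \sqrt 3] are illustrative choices, not taken from the original text), the matrix of multiplication by \alpha has characteristic polynomial x^4 - 10x^2 + 1; since this is irreducible over \mathbb{Q}, it is the minimal polynomial of \alpha:

```python
# Multiplication-by-alpha matrix for alpha = sqrt(2) + sqrt(3)
# on the Q-basis (1, sqrt(2), sqrt(3), sqrt(6)) of Q[sqrt(2), sqrt(3)].
import sympy as sp

A = sp.Matrix([
    [0, 2, 3, 0],
    [1, 0, 0, 3],
    [1, 0, 0, 2],
    [0, 1, 1, 0],
])

x = sp.symbols('x')
p = A.charpoly(x).as_expr()
print(sp.factor(p))   # x**4 - 10*x**2 + 1, irreducible over Q,
                      # hence the minimal polynomial of sqrt(2) + sqrt(3)
```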


Proofs

The Cayley–Hamilton theorem is an immediate consequence of the existence of the Jordan normal form for matrices over algebraically closed fields. In this section, direct proofs are presented.

As the examples above show, obtaining the statement of the Cayley–Hamilton theorem for an n \times n matrix
:A = (a_{ij})_{i,j=1}^n
requires two steps: first the coefficients c_i of the characteristic polynomial are determined by development as a polynomial in t of the determinant
: \begin{align} p(t) & = \det(t I_n - A) = \begin{vmatrix}t-a_{11}&-a_{12}&\cdots&-a_{1n} \\ -a_{21}&t-a_{22}&\cdots&-a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1}&-a_{n2}& \cdots& t-a_{nn} \end{vmatrix} \\
& = t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0, \end{align}
and then these coefficients are used in a linear combination of powers of A that is equated to the n \times n zero matrix:
:A^n+c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}.
The left-hand side can be worked out to an n \times n matrix whose entries are (enormous) polynomial expressions in the set of entries a_{ij} of A, so the Cayley–Hamilton theorem states that each of these n^2 expressions equals 0. For any fixed value of n, these identities can be obtained by tedious but straightforward algebraic manipulations. None of these computations, however, can show why the Cayley–Hamilton theorem should be valid for matrices of all possible sizes n, so a uniform proof for all n is needed.
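For a fixed small n this "tedious but straightforward" verification can be delegated to a computer algebra system. A sketch for n = 3 (assuming SymPy is available):

```python
# Symbolic verification of p(A) = 0 for a generic 3x3 matrix.
import sympy as sp

t = sp.symbols('t')
a = sp.symbols('a0:9')
A = sp.Matrix(3, 3, a)                                  # generic 3x3 matrix

p = sp.expand((t * sp.eye(3) - A).det())                # characteristic polynomial p(t)
coeffs = [p.coeff(t, k) for k in range(4)]              # c_0, c_1, c_2, c_3 = 1

pA = sum((coeffs[k] * A**k for k in range(4)), sp.zeros(3, 3))
print(pA.applyfunc(sp.expand) == sp.zeros(3, 3))        # True: p(A) is the zero matrix
```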


Preliminaries

If a vector v of size n is an eigenvector of A with eigenvalue \lambda, in other words if A\cdot v = \lambda v, then
:\begin{align} p(A)\cdot v & = A^n\cdot v+c_{n-1}A^{n-1}\cdot v+\cdots+c_1A\cdot v+c_0I_n\cdot v \\
& = \lambda^n v+c_{n-1}\lambda^{n-1}v+\cdots+c_1\lambda v+c_0 v=p(\lambda)v, \end{align}
which is the zero vector since p(\lambda)=0 (the eigenvalues of A are precisely the roots of p). This holds for all possible eigenvalues \lambda, so the two matrices equated by the theorem certainly give the same (null) result when applied to any eigenvector. Now if A admits a basis of eigenvectors, in other words if A is diagonalizable, then the Cayley–Hamilton theorem must hold for A, since two matrices that give the same values when applied to each element of a basis must be equal.
:A=XDX^{-1}, \quad D=\operatorname{diag}(\lambda_i), \quad i=1,2,...,n
:p_A(\lambda)=|\lambda I-A|=\prod_{i=1}^n (\lambda-\lambda_i)\equiv \sum_{k=0}^n c_k\lambda^k
:p_A(A)=\sum_k c_k A^k=X p_A(D)X^{-1}=X C X^{-1}
:C_{ii}=\sum_{k=0}^n c_k\lambda_i^k=p_A(\lambda_i)=\prod_{j=1}^n(\lambda_i-\lambda_j)=0, \qquad C_{i\ne j}=0
:\therefore p_A(A)=XCX^{-1}=O .

Consider now the function e\colon M_n \to M_n which maps n \times n matrices to n \times n matrices given by the formula e(A)=p_A(A), i.e. which takes a matrix A and plugs it into its own characteristic polynomial. Not all matrices are diagonalizable, but for matrices with complex coefficients many of them are: the set D of diagonalizable complex square matrices of a given size is dense in the set of all such square matrices (for a matrix to be diagonalizable it suffices for instance that its characteristic polynomial not have any multiple roots). Now viewed as a function e\colon \C^{n^2}\to \C^{n^2} (since n \times n matrices have n^2 entries) we see that this function is continuous. This is true because the entries of the image of a matrix are given by polynomials in the entries of the matrix. Since e(D) = \{0\} and since the set D is dense, by continuity this function must map the entire set of n \times n matrices to the zero matrix. Therefore, the Cayley–Hamilton theorem is true for complex numbers, and must therefore also hold for \Q- or \R-valued matrices.

While this provides a valid proof, the argument is not very satisfactory, since the identities represented by the theorem do not in any way depend on the nature of the matrix (diagonalizable or not), nor on the kind of entries allowed (for matrices with real entries the diagonalizable ones do not form a dense set, and it seems strange one would have to consider complex matrices to see that the Cayley–Hamilton theorem holds for them). We shall therefore now consider only arguments that prove the theorem directly for any matrix using algebraic manipulations only; these also have the benefit of working for matrices with entries in any commutative ring.

There is a great variety of such proofs of the Cayley–Hamilton theorem, of which several will be given here. They vary in the amount of abstract algebraic notions required to understand the proof. The simplest proofs use just those notions needed to formulate the theorem (matrices, polynomials with numeric entries, determinants), but involve technical computations that render somewhat mysterious the fact that they lead precisely to the correct conclusion. It is possible to avoid such details, but at the price of involving more subtle algebraic notions: polynomials with coefficients in a non-commutative ring, or matrices with unusual kinds of entries.


Adjugate matrices

All proofs below use the notion of the adjugate matrix \operatorname{adj}(M) of an n \times n matrix M, the transpose of its cofactor matrix. This is a matrix whose coefficients are given by polynomial expressions in the coefficients of M (in fact, by certain (n-1) \times (n-1) determinants), in such a way that the following fundamental relations hold,
:\operatorname{adj}(M)\cdot M=\det(M)I_n=M\cdot\operatorname{adj}(M)~.
These relations are a direct consequence of the basic properties of determinants: evaluation of the (i,j) entry of the matrix product on the left gives the expansion by column j of the determinant of the matrix obtained from M by replacing column i by a copy of column j, which is \det(M) if i=j and zero otherwise; the matrix product on the right is similar, but for expansions by rows.

Being a consequence of just algebraic expression manipulation, these relations are valid for matrices with entries in any commutative ring (commutativity must be assumed for determinants to be defined in the first place). This is important to note here, because these relations will be applied below for matrices with non-numeric entries such as polynomials.
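These fundamental relations can be checked symbolically for a small size; the following sketch (assuming SymPy) verifies them for a generic 3 \times 3 matrix:

```python
# Check adj(M) * M = det(M) * I = M * adj(M) for a symbolic 3x3 matrix.
import sympy as sp

m = sp.symbols('m0:9')
M = sp.Matrix(3, 3, m)
detI = M.det() * sp.eye(3)

print((M.adjugate() * M - detI).applyfunc(sp.expand) == sp.zeros(3, 3))   # True
print((M * M.adjugate() - detI).applyfunc(sp.expand) == sp.zeros(3, 3))   # True
```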


A direct algebraic proof

This proof uses just the kind of objects needed to formulate the Cayley–Hamilton theorem: matrices with polynomials as entries. The matrix tI_n-A whose determinant is the characteristic polynomial of A is such a matrix, and since polynomials form a commutative ring, it has an adjugate
:B=\operatorname{adj}(tI_n-A).
Then, according to the right-hand fundamental relation of the adjugate, one has
:(t I_n - A)B = \det(t I_n - A) I_n = p(t) I_n.
Since B is also a matrix with polynomials in t as entries, one can, for each i, collect the coefficients of t^i in each entry to form a matrix B_i of numbers, such that one has
:B = \sum_{i=0}^{n-1} t^i B_i.
(The way the entries of B are defined makes clear that no powers higher than t^{n-1} occur). While this ''looks'' like a polynomial with matrices as coefficients, we shall not consider such a notion; it is just a way to write a matrix with polynomial entries as a linear combination of n constant matrices, and the coefficient t^i has been written to the left of the matrix to stress this point of view.

Now, one can expand the matrix product in our equation by bilinearity:
:\begin{align} p(t) I_n &= (t I_n - A)B \\
&=(t I_n - A)\sum_{i=0}^{n-1} t^i B_i \\
&=\sum_{i=0}^{n-1} tI_n\cdot t^i B_i - \sum_{i=0}^{n-1} A\cdot t^i B_i \\
&=\sum_{i=0}^{n-1} t^{i+1} B_i- \sum_{i=0}^{n-1} t^i AB_i \\
&=t^n B_{n-1} + \sum_{i=1}^{n-1} t^i(B_{i-1} - AB_i) - AB_0. \end{align}
Writing
:p(t)I_n=t^nI_n+t^{n-1}c_{n-1}I_n+\cdots+tc_1I_n+c_0I_n,
one obtains an equality of two matrices with polynomial entries, written as linear combinations of constant matrices with powers of t as coefficients. Such an equality can hold only if in any matrix position the entry that is multiplied by a given power t^i is the same on both sides; it follows that the constant matrices with coefficient t^i in both expressions must be equal. Writing these equations then for i from n down to 0, one finds
:B_{n-1} = I_n, \qquad B_{i-1} - AB_i = c_i I_n\quad \text{for } 1 \leq i \leq n-1, \qquad -A B_0 = c_0 I_n.
Finally, multiply the equation of the coefficients of t^i from the left by A^i, and sum up:
:A^n B_{n-1} + \sum_{i=1}^{n-1}\left( A^i B_{i-1} - A^{i+1}B_i\right) -A B_0 = A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I_n.
The left-hand sides form a telescoping sum and cancel completely; the right-hand sides add up to p(A):
:0 = p(A).
This completes the proof.
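The matrices B_i and the coefficient equations just derived can be inspected symbolically. A sketch for a generic 3 \times 3 matrix (assuming SymPy; the helper is_zero is an illustrative name):

```python
# Extract B_i from B = adj(tI - A) and check the coefficient equations.
import sympy as sp

t = sp.symbols('t')
a = sp.symbols('a0:9')
A = sp.Matrix(3, 3, a)
n = 3

B = (t * sp.eye(n) - A).adjugate()
B_i = [B.applyfunc(lambda e, k=k: sp.expand(e).coeff(t, k)) for k in range(n)]
p = sp.expand((t * sp.eye(n) - A).det())
c = [p.coeff(t, k) for k in range(n)]                    # c_0, c_1, c_2

def is_zero(Mtx):
    return Mtx.applyfunc(sp.expand) == sp.zeros(n, n)

print(B_i[n - 1] == sp.eye(n))                                           # B_{n-1} = I_n
print(all(is_zero(B_i[i - 1] - A * B_i[i] - c[i] * sp.eye(n))
          for i in range(1, n)))                                          # B_{i-1} - A B_i = c_i I_n
print(is_zero(-A * B_i[0] - c[0] * sp.eye(n)))                            # -A B_0 = c_0 I_n
```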


A proof using polynomials with matrix coefficients

This proof is similar to the first one, but tries to give meaning to the notion of polynomial with matrix coefficients that was suggested by the expressions occurring in that proof. This requires considerable care, since it is somewhat unusual to consider polynomials with coefficients in a non-commutative ring, and not all reasoning that is valid for commutative polynomials can be applied in this setting. Notably, while arithmetic of polynomials over a commutative ring models the arithmetic of polynomial functions, this is not the case over a non-commutative ring (in fact there is no obvious notion of polynomial function in this case that is closed under multiplication). So when considering polynomials in t with matrix coefficients, the variable t must not be thought of as an "unknown", but as a formal symbol that is to be manipulated according to given rules; in particular one cannot just set t to a specific value.

Let M(n,R) be the ring of n \times n matrices with entries in some ring R (such as the real or complex numbers) that has A as an element. Matrices with as coefficients polynomials in t, such as tI_n - A or its adjugate B in the first proof, are elements of M(n,R[t]). By collecting like powers of t, such matrices can be written as "polynomials" in t with constant matrices as coefficients; write M(n,R)[t] for the set of such polynomials. Since this set is in bijection with M(n,R[t]), one defines arithmetic operations on it correspondingly; addition is performed termwise,
:(f+g)(t) = \sum_i \left (f_i+g_i \right )t^i = f(t) + g(t),
and in particular multiplication is given by
:\left( \sum_i M_i t^i \right)\!\!\left( \sum_j N_j t^j \right) = \sum_{i,j}(M_i N_j)t^{i+j},
respecting the order of the coefficient matrices from the two operands; obviously this gives a non-commutative multiplication. Thus, the identity
:(t I_n - A)B = p(t) I_n
from the first proof can be viewed as one involving a multiplication of elements in M(n,R)[t].

At this point, it is tempting to simply set t equal to the matrix A, which makes the first factor on the left equal to the zero matrix, and the right hand side equal to p(A); however, this is not an allowed operation when the coefficients do not commute. It is possible to define a "right-evaluation map" \operatorname{ev}_A : M[t] \to M, which replaces each t^i by the matrix power A^i of A, where one stipulates that the power is always to be multiplied on the right to the corresponding coefficient. But this map is not a ring homomorphism: the right-evaluation of a product differs in general from the product of the right-evaluations. This is so because multiplication of polynomials with matrix coefficients does not model multiplication of expressions containing unknowns: a product Mt^i \cdot Nt^j = (M\cdot N)t^{i+j} is defined assuming that t commutes with N, but this may fail if t is replaced by the matrix A.

One can work around this difficulty in the particular situation at hand, since the above right-evaluation map does become a ring homomorphism if the matrix A is in the center of the ring of coefficients, so that it commutes with all the coefficients of the polynomials (the argument proving this is straightforward, exactly because commuting t with coefficients is now justified after evaluation). Now, A is not always in the center of M, but we may replace M with a smaller ring provided it contains all the coefficients of the polynomials in question: I_n, A, and the coefficients B_i of the polynomial B. The obvious choice for such a subring is the centralizer Z of A, the subring of all matrices that commute with A; by definition A is in the center of Z. This centralizer obviously contains I_n and A, but one has to show that it contains the matrices B_i. To do this, one combines the two fundamental relations for adjugates, writing out the adjugate B as a polynomial:
:\begin{align}
\left(\sum_{i=0}^m B_i t^i\right)\!(t I_n - A) &= (tI_n - A) \sum_{i=0}^m B_i t^i \\
\sum_{i=0}^m B_i t^{i+1} - \sum_{i=0}^m B_i A t^i &= \sum_{i=0}^m B_i t^{i+1} - \sum_{i=0}^m A B_i t^i \\
\sum_{i=0}^m B_i A t^i &= \sum_{i=0}^m A B_i t^i .
\end{align}
Equating the coefficients shows that for each i, we have A B_i = B_i A as desired. Having found the proper setting in which \operatorname{ev}_A is indeed a homomorphism of rings, one can complete the proof as suggested above:
:\begin{align}
\operatorname{ev}_A\left(p(t)I_n\right) &= \operatorname{ev}_A((tI_n-A)B) \\
p(A) &= \operatorname{ev}_A(tI_n - A)\cdot \operatorname{ev}_A(B) \\
p(A) &= (AI_n-A) \cdot \operatorname{ev}_A(B) = O\cdot\operatorname{ev}_A(B)=O.
\end{align}
This completes the proof.


A synthesis of the first two proofs

In the first proof, one was able to determine the coefficients B_i of B based on the right-hand fundamental relation for the adjugate only. In fact the first n equations derived can be interpreted as determining the quotient B of the Euclidean division of the polynomial p(t)I_n on the left by the monic polynomial tI_n-A, while the final equation expresses the fact that the remainder is zero. This division is performed in the ring of polynomials with matrix coefficients. Indeed, even over a non-commutative ring, Euclidean division by a monic polynomial P is defined, and always produces a unique quotient and remainder with the same degree condition as in the commutative case, provided it is specified at which side one wishes P to be a factor (here that is to the left). To see that quotient and remainder are unique (which is the important part of the statement here), it suffices to write PQ+r = PQ'+r' as P(Q-Q') = r'-r and observe that since P is monic, P(Q-Q') cannot have a degree less than that of P, unless Q=Q'.

But the dividend p(t)I_n and divisor tI_n-A used here both lie in the subring (R[A])[t], where R[A] is the subring of the matrix ring M(n,R) generated by A: the R-linear span of all powers of A. Therefore, the Euclidean division can in fact be performed within that ''commutative'' polynomial ring, and of course it then gives the same quotient B and remainder 0 as in the larger ring; in particular this shows that B in fact lies in (R[A])[t]. But, in this commutative setting, it is valid to set t to A in the equation
:p(t)I_n=(tI_n-A)B;
in other words, to apply the evaluation map
:\operatorname{ev}_A:(R[A])[t] \to R[A]
which is a ring homomorphism, giving
:p(A)=0\cdot\operatorname{ev}_A(B)=0
just like in the second proof, as desired.

In addition to proving the theorem, the above argument tells us that the coefficients B_i of B are polynomials in A, while from the second proof we only knew that they lie in the centralizer Z of A; in general Z is a larger subring than R[A], and not necessarily commutative. In particular the constant term B_0=\operatorname{adj}(-A) lies in R[A]. Since A is an arbitrary square matrix, this proves that \operatorname{adj}(A) can always be expressed as a polynomial in A (with coefficients that depend on A). In fact, the equations found in the first proof allow successively expressing B_{n-1}, \ldots, B_1, B_0 as polynomials in A, which leads to the identity
:\operatorname{adj}(-A) = \sum_{i=1}^{n} c_i A^{i-1},
valid for all n \times n matrices, where
:p(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0
is the characteristic polynomial of A (and c_n = 1). Note that this identity also implies the statement of the Cayley–Hamilton theorem: one may move \operatorname{adj}(-A) to the right hand side, multiply the resulting equation (on the left or on the right) by A, and use the fact that
:-A\cdot \operatorname{adj}(-A) = \operatorname{adj}(-A)\cdot (-A) = \det(-A)I_n = c_0I_n.


A proof using matrices of endomorphisms

As was mentioned above, the matrix p(A) in the statement of the theorem is obtained by first evaluating the determinant and then substituting the matrix A for t; doing that substitution into the matrix tI_n-A before evaluating the determinant is not meaningful. Nevertheless, it is possible to give an interpretation where p(A) is obtained directly as the value of a certain determinant, but this requires a more complicated setting, one of matrices over a ring in which one can interpret both the entries A_{ij} of A, and all of A itself. One could take for this the ring M(n,R) of n \times n matrices over R, where the entry A_{ij} is realised as A_{ij}I_n, and A as itself. But considering matrices with matrices as entries might cause confusion with block matrices, which is not intended, as that gives the wrong notion of determinant (recall that the determinant of a matrix is defined as a sum of products of its entries, and in the case of a block matrix this is generally not the same as the corresponding sum of products of its blocks!). It is clearer to distinguish A from the endomorphism \varphi of an n-dimensional vector space V (or free R-module if R is not a field) defined by it in a basis e_1, \ldots, e_n, and to take matrices over the ring \operatorname{End}(V) of all such endomorphisms. Then \varphi \in \operatorname{End}(V) is a possible matrix entry, while A designates the element of M(n,\operatorname{End}(V)) whose (i,j) entry is the endomorphism of scalar multiplication by A_{ij}; similarly I_n will be interpreted as an element of M(n,\operatorname{End}(V)). However, since \operatorname{End}(V) is not a commutative ring, no determinant is defined on M(n,\operatorname{End}(V)); this can only be done for matrices over a commutative subring of \operatorname{End}(V). Now the entries of the matrix \varphi I_n-A all lie in the subring R[\varphi] generated by the identity and \varphi, which is commutative. Then a determinant map M(n,R[\varphi]) \to R[\varphi] is defined, and \det(\varphi I_n-A) evaluates to the value p(\varphi) of the characteristic polynomial of A at \varphi (this holds independently of the relation between A and \varphi); the Cayley–Hamilton theorem states that p(\varphi) is the null endomorphism.

In this form, the following proof can be obtained from that of a more general statement related to the Nakayama lemma (one takes for the ideal in that proposition the whole ring R). The fact that A is the matrix of \varphi in the basis e_1, \ldots, e_n means that
:\varphi(e_i) = \sum_{j=1}^n A_{ji} e_j \quad\text{for } i=1,\ldots,n.
One can interpret these as n components of one equation in V^n, whose members can be written using the matrix-vector product M(n,\operatorname{End}(V)) \times V^n \to V^n that is defined as usual, but with individual entries \psi \in \operatorname{End}(V) and v in V being "multiplied" by forming \psi(v); this gives:
:\varphi I_n \cdot E= A^\operatorname{tr}\cdot E,
where E\in V^n is the element whose component i is e_i (in other words it is the basis e_1, \ldots, e_n of V written as a column of vectors). Writing this equation as
:(\varphi I_n-A^\operatorname{tr})\cdot E = 0\in V^n
one recognizes the transpose of the matrix \varphi I_n-A considered above, and its determinant (as element of M(n,R[\varphi])) is also p(\varphi). To derive from this equation that p(\varphi) = 0 \in \operatorname{End}(V), one left-multiplies by the adjugate matrix of \varphi I_n-A^\operatorname{tr}, which is defined in the matrix ring M(n,R[\varphi]), giving
:\begin{align} 0&=\operatorname{adj}(\varphi I_n-A^\operatorname{tr})\cdot((\varphi I_n-A^\operatorname{tr})\cdot E)\\
&= (\operatorname{adj}(\varphi I_n-A^\operatorname{tr})\cdot(\varphi I_n-A^\operatorname{tr}))\cdot E\\
&= (\det(\varphi I_n-A^\operatorname{tr})I_n)\cdot E\\
&= (p(\varphi)I_n)\cdot E; \end{align}
the associativity of matrix-matrix and matrix-vector multiplication used in the first step is a purely formal property of those operations, independent of the nature of the entries. Now component i of this equation says that p(\varphi)(e_i) = 0 \in V; thus p(\varphi) vanishes on all the e_i, and since these elements generate V it follows that p(\varphi) = 0 \in \operatorname{End}(V), completing the proof.

One additional fact that follows from this proof is that the matrix A whose characteristic polynomial is taken need not be identical to the value \varphi substituted into that polynomial; it suffices that \varphi be an endomorphism of V satisfying the initial equations
:\varphi(e_i) = \sum_j A_{ji} e_j
for ''some'' sequence of elements e_1, \ldots, e_n that generate V (which space might have smaller dimension than n, or in case the ring R is not a field it might not be a free module at all).


A bogus "proof": ''p''(''A'') = det(''AI''''n'' − ''A'') = det(''A'' − ''A'') = 0

One persistent elementary but ''incorrect'' argument for the theorem is to "simply" take the definition
:p(\lambda) = \det(\lambda I_n - A)
and substitute A for \lambda, obtaining
:p(A)=\det(A I_n - A) = \det(A - A) = \det(\mathbf{0}) = 0.
There are many ways to see why this argument is wrong. First, in the Cayley–Hamilton theorem, p(A) is an ''n \times n matrix''. However, the right hand side of the above equation is the value of a determinant, which is a ''scalar''. So they cannot be equated unless n = 1 (i.e. A is just a scalar). Second, in the expression \det(\lambda I_n - A), the variable \lambda actually occurs at the diagonal entries of the matrix \lambda I_n - A. To illustrate, consider the characteristic polynomial in the previous example again:
:\det\!\begin{pmatrix}\lambda-1&-2\\-3&\lambda-4\end{pmatrix}.
If one substitutes the entire matrix A for \lambda in those positions, one obtains
:\det\!\begin{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 1 & -2 \\ -3 &\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 4\end{pmatrix},
in which the "matrix" expression is simply not a valid one. Note, however, that if scalar multiples of identity matrices instead of scalars are subtracted in the above, i.e. if the substitution is performed as
: \det\!\begin{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - I_2 & -2I_2 \\ -3I_2 &\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 4I_2 \end{pmatrix},
then the determinant is indeed zero, but the expanded 4 \times 4 matrix in question does not evaluate to A I_n-A; nor can its determinant (a scalar) be compared to p(A) (a matrix). So the argument that p(A) = \det(AI_n-A) = 0 still does not apply.

Actually, if such an argument holds, it should also hold when other multilinear forms instead of the determinant are used. For instance, if we consider the permanent function and define q(\lambda) = \operatorname{perm}(\lambda I_n - A), then by the same argument, we should be able to "prove" that q(A) = 0. But this statement is demonstrably wrong: in the 2-dimensional case, for instance, the permanent of a matrix is given by
:\operatorname{perm}\!\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad + bc.
So, for the matrix A in the previous example,
:\begin{align} q(\lambda) & = \operatorname{perm}(\lambda I_2 - A) = \operatorname{perm}\!\begin{pmatrix} \lambda - 1 & -2 \\ -3 & \lambda-4 \end{pmatrix} \\
& = (\lambda - 1)(\lambda - 4) + (-2)(-3) = \lambda^2 - 5\lambda + 10. \end{align}
Yet one can verify that
:q(A) = A^2-5A+10I_2=12I_2\not=0.
One of the proofs for the Cayley–Hamilton theorem above bears some similarity to the argument that p(A)=\det(AI_n-A)=0. By introducing a matrix with non-numeric coefficients, one can actually let A live inside a matrix entry, but then A I_n is not equal to A, and the conclusion is reached differently.
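The failure of the permanent analogue is easy to confirm numerically (assuming NumPy is available):

```python
# q(lambda) = perm(lambda*I - A) = lambda^2 - 5*lambda + 10, yet q(A) != 0.
import numpy as np

A = np.array([[1, 2],
              [3, 4]], dtype=float)
I2 = np.eye(2)

qA = A @ A - 5 * A + 10 * I2
print(qA)                      # 12 * I_2, not the zero matrix
```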


Proofs using methods of abstract algebra

Basic properties of Hasse–Schmidt derivations on the exterior algebra A = \bigwedge M of some B-module M (supposed to be free and of finite rank) have been used to prove the Cayley–Hamilton theorem.


Abstraction and generalizations

The above proofs show that the Cayley–Hamilton theorem holds for matrices with entries in any commutative ring R, and that p(\varphi) = 0 will hold whenever \varphi is an endomorphism of an R-module generated by elements e_1,\ldots,e_n that satisfies
:\varphi(e_j)=\sum_i a_{ij}e_i, \qquad j =1, \ldots, n.
This more general version of the theorem is the source of the celebrated Nakayama lemma in commutative algebra and algebraic geometry.


See also

* Companion matrix




References

"Classroom Note: A Simple Proof of the Leverrier--Faddeev Characteristic Polynomial Algorithm"


External links

* A proof from PlanetMath.