In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices. It collects the various partial derivatives of a single function with respect to many variables, and/or of a multivariate function with respect to a single variable, into vectors and matrices that can be treated as single entities. This greatly simplifies operations such as finding the maximum or minimum of a multivariate function and solving systems of differential equations. The notation used here is commonly used in statistics and engineering, while the tensor index notation is preferred in physics.
Two competing notational conventions split the field of matrix calculus into two separate groups. The two groups can be distinguished by whether they write the derivative of a scalar with respect to a vector as a column vector or a row vector. Both of these conventions are possible even when the common assumption is made that vectors should be treated as column vectors when combined with matrices (rather than row vectors). A single convention can be somewhat standard throughout a single field that commonly uses matrix calculus (e.g. econometrics, statistics, estimation theory and machine learning). However, even within a given field different authors can be found using competing conventions. Authors of both groups often write as though their specific conventions were standard. Serious mistakes can result when combining results from different authors without carefully verifying that compatible notations have been used. Definitions of these two conventions and comparisons between them are collected in the layout conventions section.
Scope
Matrix calculus refers to a number of different notations that use matrices and vectors to collect the derivative of each component of the dependent variable with respect to each component of the independent variable. In general, the independent variable can be a scalar, a vector, or a matrix while the dependent variable can be any of these as well. Each different situation will lead to a different set of rules, or a separate calculus, using the broader sense of the term. Matrix notation serves as a convenient way to collect the many derivatives in an organized way.
As a first example, consider the gradient from vector calculus. For a scalar function of three independent variables, <math>f(x_1, x_2, x_3)</math>, the gradient is given by the vector equation
:<math>\nabla f = \frac{\partial f}{\partial x_1} \hat{x}_1 + \frac{\partial f}{\partial x_2} \hat{x}_2 + \frac{\partial f}{\partial x_3} \hat{x}_3,</math>
where <math>\hat{x}_i</math> represents a unit vector in the <math>x_i</math> direction for <math>1 \le i \le 3</math>. This type of generalized derivative can be seen as the derivative of a scalar, ''f'', with respect to a vector, <math>\mathbf{x}</math>, and its result can be easily collected in vector form.
:<math>\nabla f = \left(\frac{\partial f}{\partial \mathbf{x}}\right)^\top = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \dfrac{\partial f}{\partial x_3} \end{bmatrix}^\top.</math>
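As a rough numerical illustration of collecting the three partial derivatives into one vector, the sketch below estimates the gradient by central differences. The function <code>f</code>, the evaluation point and the step size <code>h</code> are hypothetical choices made only for this example.

```python
def gradient(f, x, h=1e-6):
    """Central-difference estimate of [df/dx_1, ..., df/dx_n] at the point x."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    # Hypothetical example function: f(x1, x2, x3) = x1^2 + 3*x2 + x1*x3
    return x[0] ** 2 + 3 * x[1] + x[0] * x[2]


# The analytic gradient is [2*x1 + x3, 3, x1], which is [5, 3, 1] at (1, 2, 3).
g = gradient(f, [1.0, 2.0, 3.0])
print(g)
```

The three entries of <code>g</code> are the components that the vector equation above attaches to the three unit vectors.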
More complicated examples include the derivative of a scalar function with respect to a matrix, known as the gradient matrix, which collects the derivative with respect to each matrix element in the corresponding position in the resulting matrix. In that case the scalar must be a function of each of the independent variables in the matrix. As another example, if we have an ''n''-vector of dependent variables, or functions, of ''m'' independent variables we might consider the derivative of the dependent vector with respect to the independent vector. The result could be collected in an ''m''×''n'' matrix (or its ''n''×''m'' transpose, depending on the layout convention) consisting of all of the possible derivative combinations.
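There are a total of nine possibilities using scalars, vectors, and matrices. Notice that as we consider higher numbers of components in each of the independent and dependent variables we can be left with a very large number of possibilities. The six kinds of derivatives that can be most neatly organized in matrix form are collected in the following table, in which the columns give the type of the dependent variable and the rows give the type of the independent variable.
:<math>\begin{array}{c|ccc}
\text{Types} & \text{Scalar} & \text{Vector} & \text{Matrix} \\ \hline
\text{Scalar} & \dfrac{\partial y}{\partial x} & \dfrac{\partial \mathbf{y}}{\partial x} & \dfrac{\partial \mathbf{Y}}{\partial x} \\
\text{Vector} & \dfrac{\partial y}{\partial \mathbf{x}} & \dfrac{\partial \mathbf{y}}{\partial \mathbf{x}} & \\
\text{Matrix} & \dfrac{\partial y}{\partial \mathbf{X}} & &
\end{array}</math>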
Here, we have used the term "matrix" in its most general sense, recognizing that vectors are simply matrices with one column (and scalars are simply vectors with one row). Moreover, we have used bold letters to indicate vectors and bold capital letters for matrices. This notation is used throughout.
Notice that we could also talk about the derivative of a vector with respect to a matrix, or any of the other unfilled cells in our table. However, these derivatives are most naturally organized in a tensor of rank higher than 2, so that they do not fit neatly into a matrix. In the following three sections we will define each one of these derivatives and relate them to other branches of mathematics. See the layout conventions section for a more detailed table.
Relation to other derivatives
The matrix derivative is a convenient notation for keeping track of partial derivatives for doing calculations. The Fréchet derivative is the standard way in the setting of functional analysis to take derivatives with respect to vectors. In the case that a matrix function of a matrix is Fréchet differentiable, the two derivatives will agree up to translation of notations. As is the case in general for partial derivatives, some formulae may extend under weaker analytic conditions than the existence of the derivative as an approximating linear mapping.
Usages
Matrix calculus is used for deriving optimal stochastic estimators, often involving the use of Lagrange multipliers. This includes the derivation of:
* Kalman filter
* Wiener filter
* Expectation-maximization algorithm for Gaussian mixture
* Gradient descent
Notation
The vector and matrix derivatives presented in the sections to follow take full advantage of matrix notation, using a single variable to represent a large number of variables. In what follows we will distinguish scalars, vectors and matrices by their typeface. We will let <math>M(n, m)</math> denote the space of real ''n''×''m'' matrices with ''n'' rows and ''m'' columns. Such matrices will be denoted using bold capital letters: '''A''', '''X''', '''Y''', etc. An element of <math>M(n, 1)</math>, that is, a column vector, is denoted with a boldface lowercase letter: '''a''', '''x''', '''y''', etc. An element of <math>M(1, 1)</math> is a scalar, denoted with lowercase italic typeface: ''a'', ''t'', ''x'', etc. <math>\mathbf{X}^\top</math> denotes matrix transpose, <math>\operatorname{tr}(\mathbf{X})</math> is the trace, and <math>\det(\mathbf{X})</math> or <math>|\mathbf{X}|</math> is the determinant. All functions are assumed to be of differentiability class <math>C^1</math> unless otherwise noted. Generally letters from the first half of the alphabet (a, b, c, ...) will be used to denote constants, and from the second half (t, x, y, ...) to denote variables.
NOTE: As mentioned above, there are competing notations for laying out systems of partial derivatives in vectors and matrices, and no standard appears to be emerging yet. The next two introductory sections use the numerator layout convention simply for the purposes of convenience, to avoid overly complicating the discussion. The section after them discusses layout conventions in more detail. It is important to realize the following:
#Despite the use of the terms "numerator layout" and "denominator layout", there are actually more than two possible notational choices involved. The reason is that the choice of numerator vs. denominator (or in some situations, numerator vs. mixed) can be made independently for scalar-by-vector, vector-by-scalar, vector-by-vector, and scalar-by-matrix derivatives, and a number of authors mix and match their layout choices in various ways.
#The choice of numerator layout in the introductory sections below does not imply that this is the "correct" or "superior" choice. There are advantages and disadvantages to the various layout types. Serious mistakes can result from carelessly combining formulas written in different layouts, and converting from one layout to another requires care to avoid errors. As a result, when working with existing formulas the best policy is probably to identify whichever layout is used and maintain consistency with it, rather than attempting to use the same layout in all situations.
Alternatives
The tensor index notation with its Einstein summation convention is very similar to the matrix calculus, except one writes only a single component at a time. It has the advantage that one can easily manipulate arbitrarily high rank tensors, whereas tensors of rank higher than two are quite unwieldy with matrix notation. All of the work here can be done in this notation without use of the single-variable matrix notation. However, many problems in estimation theory and other areas of applied mathematics would result in too many indices to properly keep track of, pointing in favor of matrix calculus in those areas. Also, Einstein notation can be very useful in proving the identities presented here (see section on differentiation) as an alternative to typical element notation, which can become cumbersome when the explicit sums are carried around. Note that a matrix can be considered a tensor of rank two.
Derivatives with vectors
Because vectors are matrices with only one column, the simplest matrix derivatives are vector derivatives.
The notations developed here can accommodate the usual operations of vector calculus by identifying the space <math>M(n, 1)</math> of ''n''-vectors with the Euclidean space <math>\mathbb{R}^n</math>, and the scalar <math>M(1, 1)</math> is identified with <math>\mathbb{R}</math>. The corresponding concept from vector calculus is indicated at the end of each subsection.
NOTE: The discussion in this section assumes the numerator layout convention for pedagogical purposes. Some authors use different conventions. The section on layout conventions discusses this issue in greater detail. The identities given further down are presented in forms that can be used in conjunction with all common layout conventions.
Vector-by-scalar
The derivative of a vector <math>\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^\top</math>, by a scalar ''x'' is written (in numerator layout notation) as
:<math>\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} \\ \dfrac{\partial y_2}{\partial x} \\ \vdots \\ \dfrac{\partial y_m}{\partial x} \end{bmatrix}.</math>
In vector calculus the derivative of a vector '''y''' with respect to a scalar ''x'' is known as the tangent vector of the vector '''y''', <math>\frac{\partial \mathbf{y}}{\partial x}</math>. Notice here that <math>\mathbf{y}\colon \mathbb{R}^1 \to \mathbb{R}^m</math>.
Example: Simple examples of this include the velocity vector in Euclidean space, which is the tangent vector of the position vector (considered as a function of time). Also, the acceleration is the tangent vector of the velocity.
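The velocity example can be sketched numerically. The helical path <code>position</code> below is a hypothetical curve chosen for illustration, and the derivative is only approximated by central differences.

```python
import math


def tangent(r, t, h=1e-6):
    """Central-difference estimate of dr/dt, one entry per component of r(t)."""
    plus = r(t + h)
    minus = r(t - h)
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]


def position(t):
    # Hypothetical path: a helix in three-dimensional Euclidean space.
    return [math.cos(t), math.sin(t), t]


# The analytic velocity is [-sin(t), cos(t), 1], which is [0, 1, 1] at t = 0.
velocity = tangent(position, 0.0)
print(velocity)
```

The result has one entry per component of the position vector, matching the column layout of a vector-by-scalar derivative.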
Scalar-by-vector
The derivative of a scalar ''y'' by a vector <math>\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^\top</math>, is written (in numerator layout notation) as
:<math>\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}.</math>
In vector calculus, the gradient of a scalar field <math>f\colon \mathbb{R}^n \to \mathbb{R}</math> (whose independent coordinates are the components of '''x''') is the transpose of the derivative of a scalar by a vector.
:<math>\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix} = \left(\frac{\partial f}{\partial \mathbf{x}}\right)^\top</math>
For example, in physics, the electric field is the negative vector gradient of the electric potential.
The directional derivative of a scalar function <math>f(\mathbf{x})</math> of the space vector '''x''' in the direction of the unit vector '''u''' (represented in this case as a column vector) is defined using the gradient as follows.
:<math>\nabla_{\mathbf{u}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{u}</math>
Using the notation just defined for the derivative of a scalar with respect to a vector we can re-write the directional derivative as <math>\nabla_{\mathbf{u}} f = \frac{\partial f}{\partial \mathbf{x}} \mathbf{u}.</math> This type of notation will be nice when proving product rules and chain rules that come out looking similar to what we are familiar with for the scalar derivative.
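The agreement between the directional derivative and the product of the row derivative with the direction vector can be checked numerically. This is only a sketch: the function <code>f</code>, the point <code>x0</code> and the unit vector <code>u</code> are hypothetical choices.

```python
def row_derivative(f, x, h=1e-6):
    """Numerator-layout derivative of a scalar by a vector: a list of n partials."""
    row = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        row.append((f(plus) - f(minus)) / (2 * h))
    return row


def f(x):
    # Hypothetical scalar function: f(x1, x2) = x1^2 + x1*x2
    return x[0] ** 2 + x[0] * x[1]


x0 = [1.0, 2.0]
u = [0.6, 0.8]  # unit vector, since 0.36 + 0.64 = 1

# Row derivative times the column vector u.
via_gradient = sum(d * ui for d, ui in zip(row_derivative(f, x0), u))

# Direct rate of change of f along the line x0 + t*u at t = 0.
h = 1e-6
ahead = [xi + h * ui for xi, ui in zip(x0, u)]
behind = [xi - h * ui for xi, ui in zip(x0, u)]
direct = (f(ahead) - f(behind)) / (2 * h)

# Analytically the gradient at (1, 2) is [4, 1], so both values are close to 3.2.
print(via_gradient, direct)
```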
Vector-by-vector
Each of the previous two cases can be considered as an application of the derivative of a vector with respect to a vector, using a vector of size one appropriately. Similarly we will find that the derivatives involving matrices will reduce to derivatives involving vectors in a corresponding way.
The derivative of a vector function (a vector whose components are functions) <math>\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^\top</math>, with respect to an input vector, <math>\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^\top</math>, is written (in numerator layout notation) as
:<math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \dfrac{\partial y_m}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}.</math>
In vector calculus, the derivative of a vector function '''y''' with respect to a vector '''x''' whose components represent a space is known as the pushforward (or differential), or the Jacobian matrix.
The pushforward along a vector function '''f''' with respect to vector '''v''' in <math>\mathbb{R}^n</math> is given by <math>d\mathbf{f}(\mathbf{v}) = \frac{\partial \mathbf{f}}{\partial \mathbf{v}} d\mathbf{v}.</math>
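A minimal numerical sketch of the vector-by-vector derivative in numerator layout follows. The vector function <code>y</code> and the evaluation point are hypothetical, and the partial derivatives are approximated by central differences rather than computed symbolically.

```python
def jacobian(func, x, h=1e-6):
    """Numerator-layout Jacobian: m rows (outputs) by n columns (inputs)."""
    m = len(func(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(x)
        minus = list(x)
        plus[j] += h
        minus[j] -= h
        f_plus = func(plus)
        f_minus = func(minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def y(x):
    # Hypothetical function from R^2 to R^3.
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]


# Analytic Jacobian: [[x2, x1], [1, 1], [0, 2*x2]], i.e. [[3, 2], [1, 1], [0, 6]] at (2, 3).
J = jacobian(y, [2.0, 3.0])
for row in J:
    print(row)
```

With ''m'' = 3 outputs and ''n'' = 2 inputs the result has three rows and two columns, as the numerator layout prescribes.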
Derivatives with matrices
There are two types of derivatives with matrices that can be organized into a matrix of the same size. These are the derivative of a matrix by a scalar and the derivative of a scalar by a matrix. These can be useful in minimization problems found in many areas of applied mathematics and have adopted the names tangent matrix and gradient matrix respectively after their analogs for vectors.
Note: The discussion in this section assumes the numerator layout convention for pedagogical purposes. Some authors use different conventions. The section on layout conventions discusses this issue in greater detail. The identities given further down are presented in forms that can be used in conjunction with all common layout conventions.
Matrix-by-scalar
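The derivative of a matrix function '''Y''' by a scalar ''x'' is known as the tangent matrix and is given (in numerator layout notation) by
:<math>\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix}
\dfrac{\partial y_{11}}{\partial x} & \dfrac{\partial y_{12}}{\partial x} & \cdots & \dfrac{\partial y_{1n}}{\partial x} \\
\dfrac{\partial y_{21}}{\partial x} & \dfrac{\partial y_{22}}{\partial x} & \cdots & \dfrac{\partial y_{2n}}{\partial x} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_{m1}}{\partial x} & \dfrac{\partial y_{m2}}{\partial x} & \cdots & \dfrac{\partial y_{mn}}{\partial x}
\end{bmatrix}.</math>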
Scalar-by-matrix
The derivative of a scalar function ''y'', with respect to a ''p''×''q'' matrix '''X''' of independent variables, is given (in numerator layout notation) by
:<math>\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{21}} & \cdots & \dfrac{\partial y}{\partial x_{p1}} \\
\dfrac{\partial y}{\partial x_{12}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{p2}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial x_{1q}} & \dfrac{\partial y}{\partial x_{2q}} & \cdots & \dfrac{\partial y}{\partial x_{pq}}
\end{bmatrix}.</math>
Important examples of scalar functions of matrices include the trace of a matrix and the determinant.
In analogy with vector calculus this derivative is often written as the following.
:<math>\nabla_{\mathbf{X}} y(\mathbf{X}) = \frac{\partial y(\mathbf{X})}{\partial \mathbf{X}}</math>
Also in analogy with vector calculus, the directional derivative of a scalar <math>f(\mathbf{X})</math> of a matrix '''X''' in the direction of matrix '''Y''' is given by
:<math>\nabla_{\mathbf{Y}} f = \operatorname{tr}\left(\frac{\partial f}{\partial \mathbf{X}} \mathbf{Y}\right).</math>
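The scalar-by-matrix derivative and the trace formula for the directional derivative can be illustrated with the trace itself. The sketch below assumes the hypothetical scalar function ''f''('''X''') = tr('''AX'''), whose numerator-layout derivative is '''A'''; the matrices are arbitrary example values and the partial derivatives are approximated by central differences.

```python
def trace_of_product(A, X):
    """tr(AX) for A of size q x p and X of size p x q, stored as nested lists."""
    return sum(A[i][j] * X[j][i] for i in range(len(A)) for j in range(len(X)))


def scalar_by_matrix(f, X, h=1e-6):
    """Numerator layout: entry (i, j) holds df/dX[j][i], so the result is q x p."""
    p, q = len(X), len(X[0])
    D = [[0.0] * p for _ in range(q)]
    for j in range(p):
        for i in range(q):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[j][i] += h
            minus[j][i] -= h
            D[i][j] = (f(plus) - f(minus)) / (2 * h)
    return D


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # q x p = 3 x 2, constant
X = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]   # p x q = 2 x 3, independent variables
Y = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]    # direction, same shape as X

D = scalar_by_matrix(lambda M: trace_of_product(A, M), X)

# Directional derivative of f along Y, computed as tr(D Y).
directional = trace_of_product(D, Y)
print(D, directional)
```

Here <code>D</code> reproduces '''A''' up to rounding, and <code>directional</code> is close to tr('''AY''') = 9, the rate of change of tr('''A'''('''X''' + ''t'''''Y''')) in ''t''.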
It is the gradient matrix, in particular, that finds many uses in minimization problems in estimation theory, particularly in the derivation of the Kalman filter algorithm, which is of great importance in the field.
Other matrix derivatives
The three types of derivatives that have not been considered are those involving vectors-by-matrices, matrices-by-vectors, and matrices-by-matrices. These are not as widely considered and a notation is not widely agreed upon.
Layout conventions
This section discusses the similarities and differences between notational conventions that are used in the various fields that take advantage of matrix calculus. Although there are largely two consistent conventions, some authors find it convenient to mix the two conventions in forms that are discussed below. After this section, equations will be listed in both competing forms separately.
The fundamental issue is that the derivative of a vector with respect to a vector, i.e. <math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}}</math>, is often written in two competing ways. If the numerator '''y''' is of size ''m'' and the denominator '''x''' of size ''n'', then the result can be laid out as either an ''m''×''n'' matrix or ''n''×''m'' matrix, i.e. the ''m'' elements of '''y''' laid out in rows and the ''n'' elements of '''x''' laid out in columns, or vice versa. This leads to the following possibilities:
#''Numerator layout'', i.e. lay out according to '''y''' and <math>\mathbf{x}^\top</math> (i.e. contrarily to '''x'''). This is sometimes known as the ''Jacobian formulation''. This corresponds to the ''m''×''n'' layout in the previous example, which means that the number of rows of <math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}}</math> equals the size of the numerator '''y''' and the number of columns of <math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}}</math> equals the size of <math>\mathbf{x}^\top</math>.
#''Denominator layout'', i.e. lay out according to <math>\mathbf{y}^\top</math> and '''x''' (i.e. contrarily to '''y'''). This is sometimes known as the ''Hessian formulation''. Some authors term this layout the ''gradient'', in distinction to the ''Jacobian'' (numerator layout), which is its transpose. (However, ''gradient'' more commonly means the derivative <math>\frac{\partial y}{\partial \mathbf{x}}</math>, regardless of layout.) This corresponds to the ''n×m'' layout in the previous example, which means that the number of rows of <math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}}</math> equals the size of '''x''' (the denominator).
#A third possibility sometimes seen is to insist on writing the derivative as <math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}'}</math> (i.e. the derivative is taken with respect to the transpose of '''x''') and follow the numerator layout. This makes it possible to claim that the matrix is laid out according to both numerator and denominator. In practice this produces results the same as the numerator layout.
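The difference between the first two layouts can be made concrete with a linear map. The sketch below assumes the hypothetical function '''y''' = '''Ax''' with a 3×2 example matrix: its numerator-layout derivative is '''A''' itself, and the denominator-layout derivative is the transpose.

```python
def numerator_layout(func, x, h=1e-6):
    """m x n matrix: row i follows output component i, column j follows input j."""
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(x)
        minus = list(x)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def denominator_layout(func, x, h=1e-6):
    """n x m matrix: the transpose of the numerator-layout result."""
    return [list(col) for col in zip(*numerator_layout(func, x, h))]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # hypothetical 3 x 2 constant matrix


def linear_map(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


x0 = [0.5, -1.5]
num = numerator_layout(linear_map, x0)    # 3 x 2, approximately A
den = denominator_layout(linear_map, x0)  # 2 x 3, approximately the transpose of A
print(num)
print(den)
```

Both results hold the same six partial derivatives; only their arrangement differs, which is why formulas from the two conventions cannot be mixed without transposing.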
When handling the gradient <math>\frac{\partial y}{\partial \mathbf{x}}</math> and the opposite case <math>\frac{\partial \mathbf{y}}{\partial x},</math> we have the same issues. To be consistent, we should do one of the following:
#If we choose numerator layout for <math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}},</math> we should lay out the gradient <math>\frac{\partial y}{\partial \mathbf{x}}</math> as a row vector, and <math>\frac{\partial \mathbf{y}}{\partial x}</math> as a column vector.
#If we choose denominator layout for <math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}},</math> we should lay out the gradient <math>\frac{\partial y}{\partial \mathbf{x}}</math> as a column vector, and <math>\frac{\partial \mathbf{y}}{\partial x}</math> as a row vector.
#In the third possibility above, we write <math>\frac{\partial y}{\partial \mathbf{x}'}</math> and <math>\frac{\partial \mathbf{y}}{\partial x},</math> and use numerator layout.
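Not all math textbooks and papers are consistent in this respect throughout. That is, sometimes different conventions are used in different contexts within the same book or paper. For example, some choose denominator layout for gradients (laying them out as column vectors), but numerator layout for the vector-by-vector derivative <math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}}.</math>
Similarly, when it comes to scalar-by-matrix derivatives <math>\frac{\partial y}{\partial \mathbf{X}}</math> and matrix-by-scalar derivatives <math>\frac{\partial \mathbf{Y}}{\partial x},</math> consistent numerator layout lays out according to '''Y''' and <math>\mathbf{X}^\top</math>, while consistent denominator layout lays out according to <math>\mathbf{Y}^\top</math> and '''X'''. In practice, however, following a denominator layout for <math>\frac{\partial \mathbf{Y}}{\partial x},</math> and laying the result out according to <math>\mathbf{Y}^\top</math>, is rarely seen because it makes for ugly formulas that do not correspond to the scalar formulas. As a result, the following layouts can often be found:
#''Consistent numerator layout'', which lays out <math>\frac{\partial \mathbf{Y}}{\partial x}</math> according to '''Y''' and <math>\frac{\partial y}{\partial \mathbf{X}}</math> according to <math>\mathbf{X}^\top</math>.
#''Mixed layout'', which lays out <math>\frac{\partial \mathbf{Y}}{\partial x}</math> according to '''Y''' and <math>\frac{\partial y}{\partial \mathbf{X}}</math> according to '''X'''.
#Use the notation <math>\frac{\partial y}{\partial \mathbf{X}'},</math> with results the same as consistent numerator layout.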
In the following formulas, we handle the five possible combinations <math>\frac{\partial y}{\partial \mathbf{x}}, \frac{\partial \mathbf{y}}{\partial x}, \frac{\partial \mathbf{y}}{\partial \mathbf{x}}, \frac{\partial y}{\partial \mathbf{X}}</math> and <math>\frac{\partial \mathbf{Y}}{\partial x}</math> separately. We also handle cases of scalar-by-scalar derivatives that involve an intermediate vector or matrix. (This can arise, for example, if a multi-dimensional parametric curve is defined in terms of a scalar variable, and then a derivative of a scalar function of the curve is taken with respect to the scalar that parameterizes the curve.) For each of the various combinations, we give numerator-layout and denominator-layout results, except in the cases above where denominator layout rarely occurs. In cases involving matrices where it makes sense, we give numerator-layout and mixed-layout results. As noted above, cases where vector and matrix denominators are written in transpose notation are equivalent to numerator layout with the denominators written without the transpose.
Keep in mind that various authors use different combinations of numerator and denominator layouts for different types of derivatives, and there is no guarantee that an author will consistently use either numerator or denominator layout for all types. Match up the formulas below with those quoted in the source to determine the layout used for that particular type of derivative, but be careful not to assume that derivatives of other types necessarily follow the same kind of layout.
When taking derivatives with an aggregate (vector or matrix) denominator in order to find a maximum or minimum of the aggregate, it should be kept in mind that using numerator layout will produce results that are transposed with respect to the aggregate. For example, in attempting to find the maximum likelihood estimate of a multivariate normal distribution using matrix calculus, if the domain is a ''k''×1 column vector, then the result using the numerator layout will be in the form of a 1×''k'' row vector. Thus, either the results should be transposed at the end or the denominator layout (or mixed layout) should be used.
:
The results of operations will be transposed when switching between numerator-layout and denominator-layout notation.
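The practical effect of this transposition can be sketched with a single descent step. The quadratic objective <code>f</code>, the target point <code>c</code> and the step size 0.5 below are hypothetical choices; the point is only that the 1×''k'' numerator-layout result must be transposed before it can be combined with a ''k''×1 column vector.

```python
def row_derivative(f, x, h=1e-6):
    """Numerator-layout derivative of a scalar by a k-vector: a 1 x k nested list."""
    row = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        row.append((f(plus) - f(minus)) / (2 * h))
    return [row]


def transpose(M):
    return [list(col) for col in zip(*M)]


c = [1.0, -2.0, 3.0]


def f(x):
    # Hypothetical objective: squared distance from x to the point c.
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


x = [0.0, 0.0, 0.0]
d_num = row_derivative(f, x)  # 1 x 3 row, approximately [[-2, 4, -6]]
d_den = transpose(d_num)      # 3 x 1 column, the shape of x written as a column

# One descent step with step size 0.5, which lands on the minimiser c for this f.
x_new = [xi - 0.5 * d[0] for xi, d in zip(x, d_den)]
print(d_num, d_den, x_new)
```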
Numerator-layout notation
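Using numerator-layout notation, we have:
:<math>\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}, \qquad
\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} \\ \dfrac{\partial y_2}{\partial x} \\ \vdots \\ \dfrac{\partial y_m}{\partial x} \end{bmatrix},</math>
:<math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}, \qquad
\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \cdots & \dfrac{\partial y}{\partial x_{p1}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial x_{1q}} & \cdots & \dfrac{\partial y}{\partial x_{pq}}
\end{bmatrix}.</math>
The following definitions are only provided in numerator-layout notation:
:<math>\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix}
\dfrac{\partial y_{11}}{\partial x} & \cdots & \dfrac{\partial y_{1n}}{\partial x} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y_{m1}}{\partial x} & \cdots & \dfrac{\partial y_{mn}}{\partial x}
\end{bmatrix}, \qquad
d\mathbf{X} = \begin{bmatrix}
dx_{11} & \cdots & dx_{1n} \\
\vdots & \ddots & \vdots \\
dx_{m1} & \cdots & dx_{mn}
\end{bmatrix}.</math>
Denominator-layout notation
Using denominator-layout notation, we have:
:<math>\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \dfrac{\partial y}{\partial x_2} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix}, \qquad
\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} & \dfrac{\partial y_2}{\partial x} & \cdots & \dfrac{\partial y_m}{\partial x} \end{bmatrix},</math>
:<math>\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y_1}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}, \qquad
\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \cdots & \dfrac{\partial y}{\partial x_{1q}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial x_{p1}} & \cdots & \dfrac{\partial y}{\partial x_{pq}}
\end{bmatrix}.</math>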
Identities
As noted above, in general, the results of operations will be transposed when switching between numerator-layout and denominator-layout notation.
To help make sense of all the identities below, keep in mind the most important rules: the chain rule, product rule and sum rule. The sum rule applies universally, and the product rule applies in most of the cases below, provided that the order of matrix products is maintained, since matrix products are not commutative. The chain rule applies in some of the cases, but unfortunately does ''not'' apply in matrix-by-scalar derivatives or scalar-by-matrix derivatives (in the latter case, mostly involving the trace operator applied to matrices). In the latter case, the product rule cannot be applied directly either, but the equivalent can be done with a bit more work using the differential identities.
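The requirement that the order of matrix products be maintained can be checked numerically. The following is a minimal sketch in plain Python; the matrix functions <code>U</code> and <code>V</code> and the evaluation point are hypothetical examples, and a central finite difference stands in for the exact derivative.

```python
# Finite-difference check of the matrix product rule
#   d(UV)/dx = (dU/dx) V + U (dV/dx),
# with the order of the factors preserved.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def U(x):    # hypothetical matrix-valued function of a scalar
    return [[x, x * x], [1.0, 2.0 * x]]

def dU(x):   # its entrywise derivative
    return [[1.0, 2.0 * x], [0.0, 2.0]]

def V(x):    # second hypothetical matrix-valued function
    return [[x ** 3, 1.0], [x, -x]]

def dV(x):
    return [[3.0 * x * x, 0.0], [1.0, -1.0]]

def central_diff(f, x, h=1e-6):
    fp, fm = f(x + h), f(x - h)
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
            for rp, rm in zip(fp, fm)]

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

x0 = 0.7
numeric = central_diff(lambda x: matmul(U(x), V(x)), x0)

# Product rule with the factor order kept
ordered = add(matmul(dU(x0), V(x0)), matmul(U(x0), dV(x0)))
# Same terms with the factors swapped, which is not valid in general
swapped = add(matmul(V(x0), dU(x0)), matmul(dV(x0), U(x0)))

err_ordered = max_abs_diff(numeric, ordered)   # close to zero
err_swapped = max_abs_diff(numeric, swapped)   # clearly nonzero
```

The ordered form agrees with the finite difference to within rounding error, while the swapped form does not, because these two matrices do not commute.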
The following identities adopt the following conventions:
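* the scalars ''a'', ''b'', ''c'', ''d'', and ''e'' are constant with respect to, and the scalars ''u'' and ''v'' are functions of, one of ''x'', '''x''', or '''X''';
* the vectors '''a''', '''b''', '''c''', '''d''', and '''e''' are constant with respect to, and the vectors '''u''' and '''v''' are functions of, one of ''x'', '''x''', or '''X''';
* the matrices '''A''', '''B''', '''C''', '''D''', and '''E''' are constant with respect to, and the matrices '''U''' and '''V''' are functions of, one of ''x'', '''x''', or '''X'''.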
Vector-by-vector identities
This is presented first because all of the operations that apply to vector-by-vector differentiation apply directly to vector-by-scalar or scalar-by-vector differentiation simply by reducing the appropriate vector in the numerator or denominator to a scalar.
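As an illustration of this reduction, the sketch below computes a finite-difference Jacobian of the linear map '''y''' = '''Ax''' in plain Python. The matrix <code>A</code>, the point <code>x0</code>, and the helper names are hypothetical. In numerator layout the Jacobian should equal '''A''', in denominator layout its transpose, and picking a single component of '''y''' gives the scalar-by-vector case.

```python
# Finite-difference Jacobian of y = A x.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # hypothetical constant 2 x 3 matrix

def f(x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def jacobian_numerator(func, x, h=1e-6):
    """Return J with J[i][j] = d y_i / d x_j (numerator layout, m x n)."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        yp, ym = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (yp[i] - ym[i]) / (2 * h)
    return J

def transpose(M):
    return [list(col) for col in zip(*M)]

x0 = [0.5, -1.0, 2.0]
J_num = jacobian_numerator(f, x0)   # approximately A      (2 x 3)
J_den = transpose(J_num)            # approximately A^T    (3 x 2)

# Reducing y to its first component gives a scalar-by-vector derivative:
grad_row = J_num[0]                 # row vector in numerator layout
```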
:
Scalar-by-vector identities
The fundamental identities are placed above the thick black line.
:{, class="wikitable" style="text-align: center;"
, + Identities: scalar-by-vector
! scope="col" width="150" , Condition
! scope="col" width="200" , Expression
! scope="col" width="200" , Numerator layout,
i.e. by ; result is row vector
! scope="col" width="200" , Denominator layout,
i.e. by ; result is column vector
, -
, is not a function of , ,
,
[Here, refers to a ]column vector
of all 0's, of size , where is the length of ., ,
, -
, is not a function of ,
, ,
, colspan=2,
, -
, , , ,
, colspan=2,
, -
, , , ,
, colspan=2,
, -
, , ,
, colspan=2,
, -
, , ,
, colspan=2,
, -
, ,
,
,
in numerator layout
,
in denominator layout
, -
, , ,
is not a function of
,
,
in numerator layout
,
in denominator layout
, -
,
,
,
,
, the
Hessian matrix
, - style="border-top: 3px solid;"
, is not a function of , ,
, ,
, ,
, -
, is not a function of
is not a function of , ,
, ,
, ,
, -
, is not a function of , ,
, ,
, ,
, -
, is not a function of
is
symmetric , ,
, ,
, ,
, -
, is not a function of , ,
, , colspan=2,
, -
, is not a function of
is
symmetric , ,
, , colspan=2,
, -
, , ,
, ,
, ,
, -
, is not a function of ,
,
,
in numerator layout
,
in denominator layout
, -
, , are not functions of , ,
, ,
, ,
, -
, , , , , are not functions of , ,
, ,
, ,
, -
, is not a function of , ,
, ,
, ,
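One commonly quoted scalar-by-vector identity is the derivative of a quadratic form: for '''A''' not a function of '''x''', the derivative of x^T A x is x^T (A + A^T) in numerator layout and (A + A^T) x in denominator layout. The sketch below checks the entries numerically in plain Python; the matrix <code>A</code> and the point <code>x0</code> are hypothetical.

```python
# Check d(x^T A x)/dx against (A + A^T) x by central differences.
A = [[2.0, -1.0, 0.5],
     [3.0,  1.0, 4.0],
     [0.0,  2.0, -1.0]]   # hypothetical, deliberately non-symmetric

def quad(x):
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def grad_fd(func, x, h=1e-6):
    g = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        g.append((func(xp) - func(xm)) / (2 * h))
    return g

x0 = [1.0, -2.0, 0.5]
n = len(x0)

# Entries of (A + A^T) x; read as a column (denominator layout)
# or as a row (numerator layout), the numbers are the same.
analytic = [sum((A[i][j] + A[j][i]) * x0[j] for j in range(n)) for i in range(n)]
numeric = grad_fd(quad, x0)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
```

When '''A''' is symmetric the result simplifies to 2 x^T A (numerator layout) or 2 A x (denominator layout).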
Vector-by-scalar identities
:{, class="wikitable" style="text-align: center;"
, + Identities: vector-by-scalar
! scope="col" width="150" , Condition
! scope="col" width="100" , Expression
! scope="col" width="100" , Numerator layout, i.e. by ,
result is column vector
! scope="col" width="100" , Denominator layout, i.e. by ,
result is row vector
, -
, is not a function of , ,
, , colspan=2,
, -
, is not a function of ,
, ,
, colspan=2,
, -
, is not a function of ''x'',
, ,
, ,
, ,
, -
, , ,
, colspan=2,
, -
, , , ,
, colspan=2,
, -
, , , ,
, ,
, ,
, -
, rowspan=2, , , rowspan=2,
, ,
, ,
, -
, colspan=2, Assumes consistent matrix layout; see below.
, -
, rowspan=2, , , rowspan=2,
, ,
, ,
, -
, colspan=2, Assumes consistent matrix layout; see below.
, -
, , , ,
, ,
, ,
NOTE: The formulas involving vector-by-vector derivatives (whose outputs are matrices) assume the matrices are laid out consistently with the vector layout, i.e. numerator-layout matrix when numerator-layout vector and vice versa; otherwise, transpose the vector-by-vector derivatives.
Scalar-by-matrix identities
Note that exact equivalents of the scalar product rule and chain rule do not exist when applied to matrix-valued functions of matrices. However, the product rule of this sort does apply to the differential form (see below), and this is the way to derive many of the identities below involving the trace function, combined with the fact that the trace function allows transposing and cyclic permutation, i.e.:
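:<math>\operatorname{tr}(\mathbf{A}) = \operatorname{tr}(\mathbf{A}^\top), \qquad \operatorname{tr}(\mathbf{ABCD}) = \operatorname{tr}(\mathbf{BCDA}) = \operatorname{tr}(\mathbf{CDAB}) = \operatorname{tr}(\mathbf{DABC})</math>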
For example, to compute
Therefore,
:
(numerator layout)
:
(denominator layout)
(For the last step, see the Conversion from differential to derivative form section.)
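As a sketch of the technique, assume the scalar function is tr(A X B X^T C), where '''A''', '''B''' and '''C''' are not functions of '''X'''. This specific choice of function is an assumption made for illustration. Applying the product rule in differential form, then transposition and cyclic permutation inside the trace, gives:

```latex
\begin{aligned}
d\operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C})
 &= \operatorname{tr}\!\left(\mathbf{A}\,(d\mathbf{X})\,\mathbf{B}\mathbf{X}^\top\mathbf{C}\right)
  + \operatorname{tr}\!\left(\mathbf{A}\mathbf{X}\mathbf{B}\,(d\mathbf{X})^\top\mathbf{C}\right) \\
 &= \operatorname{tr}\!\left(\mathbf{B}\mathbf{X}^\top\mathbf{C}\mathbf{A}\,d\mathbf{X}\right)
  + \operatorname{tr}\!\left(\mathbf{C}\mathbf{A}\mathbf{X}\mathbf{B}\,(d\mathbf{X})^\top\right) \\
 &= \operatorname{tr}\!\left(\mathbf{B}\mathbf{X}^\top\mathbf{C}\mathbf{A}\,d\mathbf{X}\right)
  + \operatorname{tr}\!\left(\mathbf{B}^\top\mathbf{X}^\top\mathbf{A}^\top\mathbf{C}^\top\,d\mathbf{X}\right) \\
 &= \operatorname{tr}\!\left(\left(\mathbf{B}\mathbf{X}^\top\mathbf{C}\mathbf{A}
  + \mathbf{B}^\top\mathbf{X}^\top\mathbf{A}^\top\mathbf{C}^\top\right) d\mathbf{X}\right)
\end{aligned}
```

Reading the coefficient of d'''X''' off the final trace then yields:

```latex
\frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C})}{\partial \mathbf{X}}
 = \mathbf{B}\mathbf{X}^\top\mathbf{C}\mathbf{A} + \mathbf{B}^\top\mathbf{X}^\top\mathbf{A}^\top\mathbf{C}^\top
 \quad \text{(numerator layout)},
\qquad
\frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C})}{\partial \mathbf{X}}
 = \mathbf{C}\mathbf{A}\mathbf{X}\mathbf{B} + \mathbf{A}^\top\mathbf{C}^\top\mathbf{X}\mathbf{B}^\top
 \quad \text{(denominator layout)}.
```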
:{, class="wikitable" style="text-align: center;"
, + Identities: scalar-by-matrix
! scope="col" width="175" , Condition
! scope="col" width="10" , Expression
! scope="col" width="100" , Numerator layout, i.e. by
! scope="col" width="100" , Denominator layout, i.e. by
, -
, is not a function of , ,
,
[Here, refers to a matrix of all 0's, of the same shape as .], ,
, -
, is not a function of , , ,
, colspan=2,
, -
, , , ,
, colspan=2,
, -
, , , ,
, colspan=2,
, -
, , ,
, colspan=2,
, -
, , ,
, colspan=2,
, -
, rowspan=2, , , rowspan=2,
, ,
, ,
, -
, colspan=2, Both forms assume ''numerator'' layout for
i.e. mixed layout if denominator layout for is being used.
, - style="border-top: 3px solid;"
, and are not functions of , ,
,
, ,
, -
, and are not functions of , ,
,
, ,
, -
, and are not functions of , is a real-valued differentiable function
,
,
,
, -
, , and are not functions of , ,
,
, ,
, -
, , and are not functions of , ,
,
, ,
, - style="border-top: 3px solid;"
, , ,
, , colspan="2" ,
, -
, , , ,
, , colspan="2" ,
, -
, is not a function of ,
, ,
, , colspan="2" ,
, -
, is any
polynomial
with scalar coefficients, or any matrix function defined by an infinite polynomial series (e.g. , , , , etc. using a
Taylor series
); is the equivalent scalar function, is its derivative, and is the corresponding matrix function , ,
, ,
, ,
, -
, is not a function of , ,
, ,
, ,
, -
, is not a function of , ,
, ,
, ,
, -
, is not a function of , ,
, ,
, ,
, -
, is not a function of , ,
, ,
, ,
, -
, , are not functions of , ,
, ,
, ,
, -
, , , are not functions of , ,
, ,
, ,
, -
, is a positive integer , ,
, ,
, ,
, -
, is not a function of ,
is a positive integer , ,
, ,
, ,
, -
, , ,
, ,
, ,
, -
, , ,
, ,
, ,
, - style="border-top: 3px solid;"
, , ,
, ,
, ,
, -
, is not a function of , ,
[The constant disappears in the result. This is intentional. In general,
or, also
]
, ,
, ,
, -
, , are not functions of , ,
, ,
, ,
, -
, is a positive integer , ,
, ,
, ,
, -
, (see
pseudo-inverse) , ,
, ,
, ,
, -
, (see
pseudo-inverse) , ,
, ,
, ,
, -
, is not a function of ,
is square and invertible , ,
, ,
, ,
, -
, is not a function of ,
is non-square,
is symmetric , ,
, ,
, ,
, -
, is not a function of ,
is non-square,
is non-symmetric , ,
,
,
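A familiar scalar-by-matrix identity is the derivative of the determinant: |X| X^{-1} in numerator layout and |X| (X^{-1})^T in denominator layout. The sketch below checks it for a 2×2 matrix in plain Python; the matrix <code>X0</code> and the helper names are hypothetical, and the finite-difference result is stored entry by entry, which corresponds to denominator layout.

```python
# Check d|X|/dX for a 2 x 2 matrix by central differences.

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    d = det2(X)
    return [[ X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d,  X[0][0] / d]]

def fd_by_entry(func, X, h=1e-6):
    """G[i][j] = d func / d X[i][j]: same shape as X (denominator layout)."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G

X0 = [[3.0, 1.0],
      [2.0, 4.0]]   # hypothetical invertible matrix
d = det2(X0)
Xinv = inv2(X0)

numerator_layout = [[d * Xinv[i][j] for j in range(2)] for i in range(2)]    # |X| X^{-1}
denominator_layout = [[d * Xinv[j][i] for j in range(2)] for i in range(2)]  # |X| X^{-T}

G = fd_by_entry(det2, X0)
max_err = max(abs(G[i][j] - denominator_layout[i][j])
              for i in range(2) for j in range(2))
```

The two layouts differ only by a transpose, so the numerator-layout result is recovered by transposing <code>G</code>.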
Matrix-by-scalar identities
:{, class="wikitable" style="text-align: center;"
, + Identities: matrix-by-scalar
! scope="col" width="175" , Condition
! scope="col" width="100" , Expression
! scope="col" width="100" , Numerator layout, i.e. by
, -
, , ,
, ,
, -
, , are not functions of ''x'',
, ,
, ,
, -
, , , ,
, ,
, -
, , , ,
, ,
, -
, , , ,
, ,
, -
, , , ,
, ,
, -
, , ,
, ,
, -
, , ,
, ,
, -
, is not a function of , is any polynomial with scalar coefficients, or any matrix function defined by an infinite polynomial series (e.g. , , , , etc.); is the equivalent scalar function, is its derivative, and is the corresponding matrix function , ,
, ,
, -
, is not a function of , ,
, ,
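A widely used matrix-by-scalar identity is the derivative of an inverse, d(U^{-1})/dx = -U^{-1} (dU/dx) U^{-1}. The sketch below checks it numerically in plain Python for a 2×2 case; the matrix function <code>U</code> and the point <code>x0</code> are hypothetical.

```python
# Check d(U^{-1})/dx = -U^{-1} (dU/dx) U^{-1} by central differences.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(M):
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

def U(x):    # hypothetical matrix-valued function, invertible near x0
    return [[2.0 + x, x * x], [1.0, 3.0 - x]]

def dU(x):   # its entrywise derivative
    return [[1.0, 2.0 * x], [0.0, -1.0]]

x0 = 0.4
h = 1e-6

Uinv = inv2(U(x0))
analytic = [[-v for v in row] for row in matmul(matmul(Uinv, dU(x0)), Uinv)]

inv_plus = inv2(U(x0 + h))
inv_minus = inv2(U(x0 - h))
numeric = [[(inv_plus[i][j] - inv_minus[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]

max_err = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
```

Note that the two inverse factors sit on either side of dU/dx; collapsing them into a single U^{-2} is only valid when the matrices commute.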
Scalar-by-scalar identities
With vectors involved
:{, class="wikitable" style="text-align: center;"
, + Identities: scalar-by-scalar, with vectors involved
! scope="col" width="150" , Condition
! scope="col" width="10" , Expression
! scope="col" width="150" , Any layout (assumes
dot product
ignores row vs. column layout)
, -
, , ,
, ,
, -
, , , ,
,
With matrices involved
:{, class="wikitable" style="text-align: center;"
, +Identities: scalar-by-scalar, with matrices involved
[ This book uses a mixed layout, i.e. by in by in ]
! scope="col" width="175" , Condition
! scope="col" width="100" , Expression
! scope="col" width="100" , Consistent numerator layout,
i.e. by and
! scope="col" width="100" , Mixed layout,
i.e. by and
, -
, , ,
, , colspan=2,
, -
, , ,
, , colspan=2,
, -
, , ,
, colspan=2 ,
, -
,
,
,
,
, -
, is not a function of , is any polynomial with scalar coefficients, or any matrix function defined by an infinite polynomial series (e.g. , , , , etc.); is the equivalent scalar function, is its derivative, and is the corresponding matrix function. , ,
, , colspan=2,
, -
, is not a function of , ,
, , colspan=2,
Identities in differential form
It is often easier to work in differential form and then convert back to normal derivatives. This only works well using the numerator layout. In these rules, ''a'' is a scalar.
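:{| class="wikitable" style="text-align: center;"
|+ Differential identities: scalar involving matrix
! Expression !! Result (numerator layout)
|-
| <math>d(\operatorname{tr}(\mathbf{X}))</math> || <math>\operatorname{tr}(d\mathbf{X})</math>
|-
| <math>d(|\mathbf{X}|)</math> || <math>|\mathbf{X}|\operatorname{tr}\left(\mathbf{X}^{-1}\,d\mathbf{X}\right)</math>
|-
| <math>d(\ln|\mathbf{X}|)</math> || <math>\operatorname{tr}\left(\mathbf{X}^{-1}\,d\mathbf{X}\right)</math>
|}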
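:{| class="wikitable" style="text-align: center;"
|+ Differential identities: matrix
! Condition !! Expression !! Result (numerator layout)
|-
| <math>\mathbf{A}</math> is not a function of <math>\mathbf{X}</math> || <math>d(\mathbf{A})</math> || <math>0</math>
|-
| <math>a</math> is not a function of <math>\mathbf{X}</math> || <math>d(a\mathbf{X})</math> || <math>a\,d\mathbf{X}</math>
|-
| || <math>d(\mathbf{X} + \mathbf{Y})</math> || <math>d\mathbf{X} + d\mathbf{Y}</math>
|-
| || <math>d(\mathbf{X}\mathbf{Y})</math> || <math>(d\mathbf{X})\mathbf{Y} + \mathbf{X}(d\mathbf{Y})</math>
|-
| (Kronecker product) || <math>d(\mathbf{X} \otimes \mathbf{Y})</math> || <math>(d\mathbf{X}) \otimes \mathbf{Y} + \mathbf{X} \otimes (d\mathbf{Y})</math>
|-
| (Hadamard product) || <math>d(\mathbf{X} \circ \mathbf{Y})</math> || <math>(d\mathbf{X}) \circ \mathbf{Y} + \mathbf{X} \circ (d\mathbf{Y})</math>
|-
| || <math>d\left(\mathbf{X}^\top\right)</math> || <math>(d\mathbf{X})^\top</math>
|-
| || <math>d\left(\mathbf{X}^{-1}\right)</math> || <math>-\mathbf{X}^{-1}(d\mathbf{X})\mathbf{X}^{-1}</math>
|-
| (conjugate transpose) || <math>d\left(\mathbf{X}^{\mathrm{H}}\right)</math> || <math>(d\mathbf{X})^{\mathrm{H}}</math>
|-
| <math>n</math> is a positive integer || <math>d\left(\mathbf{X}^n\right)</math> || <math>\sum_{i=0}^{n-1} \mathbf{X}^i (d\mathbf{X}) \mathbf{X}^{n-i-1}</math>
|-
| || <math>d\left(e^{\mathbf{X}}\right)</math> || <math>\int_0^1 e^{a\mathbf{X}} (d\mathbf{X}) e^{(1-a)\mathbf{X}}\,da</math>
|-
| || <math>d\left(\log \mathbf{X}\right)</math> || <math>\int_0^\infty (\mathbf{X} + z\mathbf{I})^{-1} (d\mathbf{X}) (\mathbf{X} + z\mathbf{I})^{-1}\,dz</math>
|-
| <math>\mathbf{X}</math> is diagonalizable; <math>f</math> is differentiable at every eigenvalue <math>\lambda_i</math> || <math>d\left(f(\mathbf{X})\right)</math> || <math>\sum_{ij} \mathbf{P}_i (d\mathbf{X}) \mathbf{P}_j \begin{cases} f'(\lambda_i) & \lambda_i = \lambda_j \\ \dfrac{f(\lambda_i) - f(\lambda_j)}{\lambda_i - \lambda_j} & \lambda_i \neq \lambda_j \end{cases}</math>
|}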
In the last row, <math>\delta_{ij}</math> is the Kronecker delta and <math>(\mathbf{P}_k)_{ij} = (\mathbf{Q})_{ik} (\mathbf{Q}^{-1})_{kj}</math> is the set of orthogonal projection operators that project onto the ''k''-th eigenvector of <math>\mathbf{X}</math>. <math>\mathbf{Q}</math> is the matrix of eigenvectors of <math>\mathbf{X} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^{-1}</math>, and <math>(\mathbf{\Lambda})_{ii} = \lambda_i</math> are the eigenvalues. The matrix function <math>f(\mathbf{X})</math> is defined in terms of the scalar function <math>f(x)</math> for diagonalizable matrices by <math>f(\mathbf{X}) = \sum_i f(\lambda_i) \mathbf{P}_i</math>, where <math>\mathbf{X} = \sum_i \lambda_i \mathbf{P}_i</math> with <math>\mathbf{P}_i \mathbf{P}_j = \delta_{ij} \mathbf{P}_i</math>.
To convert to normal derivative form, first convert it to one of the following canonical forms, and then use these identities:
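:{| class="wikitable" style="text-align: center;"
|+ Conversion from differential to derivative form
! Canonical differential form !! Equivalent derivative form (numerator layout)
|-
| <math>dy = a\,dx</math> || <math>\frac{dy}{dx} = a</math>
|-
| <math>dy = \mathbf{a}^\top d\mathbf{x}</math> || <math>\frac{dy}{d\mathbf{x}} = \mathbf{a}^\top</math>
|-
| <math>dy = \operatorname{tr}(\mathbf{A}\,d\mathbf{X})</math> || <math>\frac{dy}{d\mathbf{X}} = \mathbf{A}</math>
|-
| <math>d\mathbf{y} = \mathbf{a}\,dx</math> || <math>\frac{d\mathbf{y}}{dx} = \mathbf{a}</math>
|-
| <math>d\mathbf{y} = \mathbf{A}\,d\mathbf{x}</math> || <math>\frac{d\mathbf{y}}{d\mathbf{x}} = \mathbf{A}</math>
|-
| <math>d\mathbf{Y} = \mathbf{A}\,dx</math> || <math>\frac{d\mathbf{Y}}{dx} = \mathbf{A}</math>
|}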
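As a short worked conversion (the choice of function is an illustration, not taken from the table), consider y = ln|X|. Its differential is already a trace of a matrix times d'''X''', which is the canonical scalar-by-matrix form, so the derivative can be read off directly:

```latex
dy = d\left(\ln|\mathbf{X}|\right)
   = \operatorname{tr}\!\left(\mathbf{X}^{-1}\,d\mathbf{X}\right)
\quad\Longrightarrow\quad
\frac{\partial \ln|\mathbf{X}|}{\partial \mathbf{X}} = \mathbf{X}^{-1}
\quad \text{(numerator layout)},
\qquad
\frac{\partial \ln|\mathbf{X}|}{\partial \mathbf{X}} = \left(\mathbf{X}^{-1}\right)^\top
\quad \text{(denominator layout)}.
```

The denominator-layout result is obtained by transposing the numerator-layout one.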
Applications
Matrix differential calculus is used in statistics and econometrics, particularly for the statistical analysis of multivariate distributions, especially the multivariate normal distribution and other elliptical distributions.
It is used in regression analysis to compute, for example, the ordinary least squares regression formula for the case of multiple explanatory variables.
It is also used in random matrices, statistical moments, local sensitivity and statistical diagnostics.
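The ordinary least squares case can be sketched as follows. Minimizing the residual sum of squares S(β) = (y − Xβ)^T (y − Xβ) and setting its derivative, −2 X^T (y − Xβ) in denominator layout, to zero gives the normal equations X^T X β = X^T y, hence β = (X^T X)^{-1} X^T y when X^T X is invertible. The plain-Python example below uses hypothetical data with an intercept column and one regressor, and checks that the gradient vanishes at the computed solution.

```python
# Ordinary least squares via the normal equations, for a 4 x 2 design matrix.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(M):
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

# Hypothetical data: first column is the intercept, second the regressor.
X = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
y = [1.0, 2.9, 5.2, 7.1]

Xt = transpose(X)
XtX = matmul(Xt, X)                      # 2 x 2
Xty = matmul(Xt, [[v] for v in y])       # 2 x 1
beta = matmul(inv2(XtX), Xty)            # (X^T X)^{-1} X^T y, as a 2 x 1 column

# Residuals y - X beta
resid = [yi - sum(x_ij * b[0] for x_ij, b in zip(row, beta))
         for row, yi in zip(X, y)]

# Gradient of the residual sum of squares: -2 X^T (y - X beta)
grad = [-2.0 * sum(Xt[j][i] * resid[i] for i in range(len(y)))
        for j in range(2)]
```

For larger problems an explicit inverse is usually avoided in favor of a linear solver or a QR decomposition; the 2×2 inverse is used here only to keep the sketch self-contained.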
See also
* Derivative (generalizations)
* Product integral
* Ricci calculus
* Tensor derivative
Further reading
*. Note that this Wikipedia article has been nearly completely revised from the version criticized in this article.
External links
Software
* MatrixCalculus.org, a website for evaluating matrix calculus expressions symbolically
* NCAlgebra, an open-source Mathematica package that has some matrix calculus functionality
* SymPy supports symbolic matrix derivatives in its matrix expression module, as well as symbolic tensor derivatives
* Tensorgrad, an open-source Python package for matrix calculus. Supports general symbolic tensor derivatives using Penrose graphical notation.
Information
* Matrix Reference Manual, Mike Brookes, Imperial College London.
* Matrix Differentiation (and some other stuff), Randal J. Barnes, Department of Civil Engineering, University of Minnesota.
* Notes on Matrix Calculus, Paul L. Fackler, North Carolina State University.
* Matrix Differential Calculus (slide presentation), Zhang Le, University of Edinburgh.
* Introduction to Vector and Matrix Differentiation (notes on matrix differentiation, in the context of Econometrics), Heino Bohn Nielsen.
* A note on differentiating matrices (notes on matrix differentiation), Pawel Koval, from Munich Personal RePEc Archive.
* Vector/Matrix Calculus: more notes on matrix differentiation.
* Matrix Identities (notes on matrix differentiation), Sam Roweis.
* Tensor Cookbook: matrix calculus using tensor diagrams.