Chain Rule

In calculus, the chain rule is a formula that expresses the derivative of the composition of two differentiable functions f and g in terms of the derivatives of f and g. More precisely, if h = f\circ g is the function such that h(x)=f(g(x)) for every x, then the chain rule is, in Lagrange's notation,
:h'(x) = f'(g(x)) g'(x),
or, equivalently,
:h'=(f\circ g)'=(f'\circ g)\cdot g'.

The chain rule may also be expressed in Leibniz's notation. If a variable z depends on the variable y, which itself depends on the variable x (that is, y and z are dependent variables), then z depends on x as well, via the intermediate variable y. In this case, the chain rule is expressed as
:\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx},
and
:\left.\frac{dz}{dx}\right|_{x} = \left.\frac{dz}{dy}\right|_{y(x)} \cdot \left.\frac{dy}{dx}\right|_{x},
for indicating at which points the derivatives have to be evaluated.

In integration, the counterpart to the chain rule is the substitution rule.


Intuitive explanation

Intuitively, the chain rule states that knowing the instantaneous rate of change of z relative to y and that of y relative to x allows one to calculate the instantaneous rate of change of z relative to x as the product of the two rates of change. As put by George F. Simmons: "if a car travels twice as fast as a bicycle and the bicycle is four times as fast as a walking man, then the car travels 2 × 4 = 8 times as fast as the man."

The relationship between this example and the chain rule is as follows. Let z, y and x be the (variable) positions of the car, the bicycle, and the walking man, respectively. The rate of change of relative positions of the car and the bicycle is
:\frac{dz}{dy}=2.
Similarly,
:\frac{dy}{dx}=4.
So, the rate of change of the relative positions of the car and the walking man is
:\frac{dz}{dx}=\frac{dz}{dy}\cdot\frac{dy}{dx}=2\cdot 4=8.
The rate of change of positions is the ratio of the speeds, and the speed is the derivative of the position with respect to the time; that is,
:\frac{dz}{dy}=\frac{dz/dt}{dy/dt},
or, equivalently,
:\frac{dz}{dt}=\frac{dz}{dy}\cdot \frac{dy}{dt},
which is also an application of the chain rule.


History

The chain rule seems to have first been used by Gottfried Wilhelm Leibniz. He used it to calculate the derivative of \sqrt{a + bz + cz^2} as the composite of the square root function and the function a + bz + cz^2. He first mentioned it in a 1676 memoir (with a sign error in the calculation). The common notation of the chain rule is due to Leibniz. Guillaume de l'Hôpital used the chain rule implicitly in his ''Analyse des infiniment petits''. The chain rule does not appear in any of Leonhard Euler's analysis books, even though they were written over a hundred years after Leibniz's discovery.


Statement

The simplest form of the chain rule is for real-valued functions of one real variable. It states that if g is a function that is differentiable at a point c (i.e. the derivative g'(c) exists) and f is a function that is differentiable at g(c), then the composite function f\circ g is differentiable at c, and the derivative is
:(f\circ g)'(c) = f'(g(c))\cdot g'(c).
The rule is sometimes abbreviated as
:(f\circ g)' = (f'\circ g) \cdot g'.
If y = f(u) and u = g(x), then this abbreviated form is written in Leibniz notation as:
:\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}.
The points where the derivatives are evaluated may also be stated explicitly:
:\left.\frac{dy}{dx}\right|_{x} = \left.\frac{dy}{du}\right|_{u=g(x)} \cdot \left.\frac{du}{dx}\right|_{x}.
Carrying the same reasoning further, given n functions f_1, \ldots, f_n with the composite function f_1 \circ ( f_2 \circ \cdots (f_{n-1} \circ f_n) ), if each function f_i is differentiable at its immediate input, then the composite function is also differentiable by the repeated application of the chain rule, where the derivative is (in Leibniz's notation):
:\frac{df_1}{dx} = \frac{df_1}{df_2}\frac{df_2}{df_3}\cdots\frac{df_n}{dx}.
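A quick numerical sketch of this statement (not part of the original article; the particular f, g, point c, and the finite-difference helper are illustrative assumptions) compares a difference-quotient estimate of (f ∘ g)'(c) with the product f'(g(c))·g'(c):

```python
# Numerical check of the one-variable chain rule (f ∘ g)'(c) = f'(g(c)) * g'(c).
import math

def num_deriv(fn, x, h=1e-6):
    """Central-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

f = math.sin              # outer function f (illustrative choice)
g = lambda x: x**2 + 1    # inner function g (illustrative choice)
c = 0.7                   # point of evaluation

composite = lambda x: f(g(x))
lhs = num_deriv(composite, c)               # (f ∘ g)'(c)
rhs = num_deriv(f, g(c)) * num_deriv(g, c)  # f'(g(c)) * g'(c)
print(lhs, rhs)  # the two values agree up to finite-difference error
```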


Applications


Composites of more than two functions

The chain rule can be applied to composites of more than two functions. To take the derivative of a composite of more than two functions, notice that the composite of f, g, and h (in that order) is the composite of f with g ∘ h. The chain rule states that to compute the derivative of f ∘ g ∘ h, it is sufficient to compute the derivative of f and the derivative of g ∘ h. The derivative of f can be calculated directly, and the derivative of g ∘ h can be calculated by applying the chain rule again.

For concreteness, consider the function
:y = e^{\sin(x^2)}.
This can be decomposed as the composite of three functions:
:\begin{align} y &= f(u) = e^u, \\ u &= g(v) = \sin v = \sin(x^2), \\ v &= h(x) = x^2. \end{align}
Their derivatives are:
:\begin{align} \frac{dy}{du} &= f'(u) = e^u = e^{\sin(x^2)}, \\ \frac{du}{dv} &= g'(v) = \cos v = \cos(x^2), \\ \frac{dv}{dx} &= h'(x) = 2x. \end{align}
The chain rule states that the derivative of their composite at the point x = a is:
:\begin{align} (f \circ g \circ h)'(a) & = f'((g \circ h)(a))\cdot (g \circ h)'(a) \\ & = f'((g \circ h)(a)) \cdot g'(h(a)) \cdot h'(a) = (f' \circ g \circ h)(a) \cdot (g' \circ h)(a) \cdot h'(a). \end{align}
In Leibniz's notation, this is:
:\frac{dy}{dx} = \left.\frac{dy}{du}\right|_{u=g(h(a))}\cdot\left.\frac{du}{dv}\right|_{v=h(a)}\cdot\left.\frac{dv}{dx}\right|_{x=a},
or for short,
:\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dv}\cdot\frac{dv}{dx}.
The derivative function is therefore:
:\frac{dy}{dx} = e^{\sin(x^2)}\cdot\cos(x^2)\cdot 2x.

Another way of computing this derivative is to view the composite function f ∘ g ∘ h as the composite of f ∘ g and h. Applying the chain rule in this manner would yield:
:(f \circ g \circ h)'(a) = (f \circ g)'(h(a))\cdot h'(a) = f'(g(h(a)))\cdot g'(h(a))\cdot h'(a).
This is the same as what was computed above. This should be expected because (f ∘ g) ∘ h = f ∘ (g ∘ h).

Sometimes, it is necessary to differentiate an arbitrarily long composition of the form f_1 \circ f_2 \circ \cdots \circ f_{n-1} \circ f_n. In this case, define
:f_{a..b} = f_{a} \circ f_{a+1} \circ \cdots \circ f_{b-1} \circ f_{b}
where f_{a..a} = f_a and f_{a..b}(x) = x when b < a. Then the chain rule takes the form
:Df_{1..n} = (Df_1 \circ f_{2..n}) (Df_2 \circ f_{3..n}) \cdots (Df_{n-1} \circ f_{n..n})\, Df_n = \prod_{k=1}^{n} \left[Df_k \circ f_{(k+1)..n}\right]
or, in the Lagrange notation,
:f_{1..n}'(x) = f_1' \left( f_{2..n}(x) \right) \; f_2' \left( f_{3..n}(x) \right) \cdots f_{n-1}' \left(f_{n..n}(x)\right) \; f_n'(x) = \prod_{k=1}^{n} f_k' \left(f_{(k+1)..n}(x) \right).
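The worked example above can be checked symbolically. The following sketch (an illustration, assuming the sympy library is available) differentiates y = e^{sin(x²)} directly and compares the result with the product obtained from the chain rule:

```python
# Symbolic check that d/dx exp(sin(x**2)) equals exp(sin(x**2)) * cos(x**2) * 2x.
import sympy as sp

x = sp.symbols('x')
y = sp.exp(sp.sin(x**2))

direct = sp.diff(y, x)
by_chain = sp.exp(sp.sin(x**2)) * sp.cos(x**2) * 2*x
print(sp.simplify(direct - by_chain))  # 0
```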


Quotient rule

The chain rule can be used to derive some well-known differentiation rules. For example, the quotient rule is a consequence of the chain rule and the product rule. To see this, write the function f(x)/g(x) as the product f(x) \cdot 1/g(x). First apply the product rule:
:\begin{align} \frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) &= \frac{d}{dx}\left(f(x)\cdot\frac{1}{g(x)}\right) \\ &= f'(x)\cdot\frac{1}{g(x)} + f(x)\cdot\frac{d}{dx}\left(\frac{1}{g(x)}\right). \end{align}
To compute the derivative of 1/g(x), notice that it is the composite of g with the reciprocal function, that is, the function that sends x to 1/x. The derivative of the reciprocal function is -1/x^2. By applying the chain rule, the last expression becomes:
:f'(x)\cdot\frac{1}{g(x)} + f(x)\cdot\left(-\frac{1}{g(x)^2}\cdot g'(x)\right) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2},
which is the usual formula for the quotient rule.
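A short symbolic sketch (illustrative only; the particular f and g are arbitrary choices, and sympy is assumed available) confirms that differentiating f/g directly reproduces the quotient-rule formula derived above:

```python
# Check that d/dx (f/g) equals (f'g - f g') / g**2 for sample f and g.
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)     # illustrative numerator
g = x**2 + 1      # illustrative (nonvanishing) denominator

direct = sp.diff(f / g, x)
by_rule = (sp.diff(f, x) * g - f * sp.diff(g, x)) / g**2
print(sp.simplify(direct - by_rule))  # 0
```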


Derivatives of inverse functions

Suppose that y = g(x) has an inverse function. Call its inverse function f so that we have x = f(y). There is a formula for the derivative of f in terms of the derivative of g. To see this, note that f and g satisfy the formula
:f(g(x)) = x.
And because the functions f(g(x)) and x are equal, their derivatives must be equal. The derivative of x is the constant function with value 1, and the derivative of f(g(x)) is determined by the chain rule. Therefore, we have that:
:f'(g(x)) g'(x) = 1.
To express f' as a function of an independent variable y, we substitute f(y) for x wherever it appears. Then we can solve for f'.
:\begin{align} f'(g(f(y))) g'(f(y)) &= 1 \\ f'(y) g'(f(y)) &= 1 \\ f'(y) &= \frac{1}{g'(f(y))}. \end{align}
For example, consider the function g(x) = e^x. It has an inverse f(y) = \ln y. Because g'(x) = e^x, the above formula says that
:\frac{d}{dy}\ln y = \frac{1}{e^{\ln y}} = \frac{1}{y}.
This formula is true whenever g is differentiable and its inverse f is also differentiable. This formula can fail when one of these conditions is not true. For example, consider g(x) = x^3. Its inverse is f(y) = y^{1/3}, which is not differentiable at zero. If we attempt to use the above formula to compute the derivative of f at zero, then we must evaluate 1/g'(f(0)). Since f(0) = 0 and g'(0) = 0, we must evaluate 1/0, which is undefined. Therefore, the formula fails in this case. This is not surprising because f is not differentiable at zero.
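The exponential/logarithm example can be verified symbolically. The sketch below (illustrative, assuming sympy) checks that f'(y) = 1/g'(f(y)) for g(x) = e^x and f(y) = ln y:

```python
# Check the inverse-function formula d/dy ln y = 1 / g'(ln y) with g(x) = exp(x).
import sympy as sp

x, y = sp.symbols('x y', positive=True)

g = sp.exp(x)      # g(x) = e^x
f = sp.log(y)      # its inverse, f(y) = ln y

lhs = sp.diff(f, y)                           # f'(y)
rhs = 1 / sp.diff(g, x).subs(x, sp.log(y))    # 1 / g'(f(y))
print(sp.simplify(lhs - rhs))                 # 0, i.e. d/dy ln y = 1/y
```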


Higher derivatives

Faà di Bruno's formula generalizes the chain rule to higher derivatives. Assuming that y = f(u) and u = g(x), then the first few derivatives are:
:\begin{align}
\frac{dy}{dx} & = \frac{dy}{du} \frac{du}{dx} \\
\frac{d^2 y}{dx^2} & = \frac{d^2 y}{du^2} \left(\frac{du}{dx}\right)^2 + \frac{dy}{du} \frac{d^2 u}{dx^2} \\
\frac{d^3 y}{dx^3} & = \frac{d^3 y}{du^3} \left(\frac{du}{dx}\right)^3 + 3 \, \frac{d^2 y}{du^2} \frac{du}{dx} \frac{d^2 u}{dx^2} + \frac{dy}{du} \frac{d^3 u}{dx^3} \\
\frac{d^4 y}{dx^4} & = \frac{d^4 y}{du^4} \left(\frac{du}{dx}\right)^4 + 6 \, \frac{d^3 y}{du^3} \left(\frac{du}{dx}\right)^2 \frac{d^2 u}{dx^2} + \frac{d^2 y}{du^2} \left( 4 \, \frac{du}{dx} \frac{d^3 u}{dx^3} + 3 \, \left(\frac{d^2 u}{dx^2}\right)^2\right) + \frac{dy}{du} \frac{d^4 u}{dx^4}.
\end{align}
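The second-derivative case can be checked symbolically. The sketch below (illustrative, assuming sympy; the specific f and g are arbitrary) compares a direct second derivative with the formula above:

```python
# Check the formula y'' = f''(g) * g'**2 + f'(g) * g'' for sample f and g.
import sympy as sp

x, u = sp.symbols('x u')

f = sp.exp(u)      # y = f(u), illustrative choice
g = sp.sin(x)      # u = g(x), illustrative choice

y = f.subs(u, g)
direct = sp.diff(y, x, 2)
by_formula = (sp.diff(f, u, 2).subs(u, g) * sp.diff(g, x)**2
              + sp.diff(f, u).subs(u, g) * sp.diff(g, x, 2))
print(sp.simplify(direct - by_formula))  # 0
```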


Proofs


First proof

One proof of the chain rule begins by defining the derivative of the composite function f ∘ g, where we take the limit of the difference quotient for f ∘ g as x approaches a:
:(f \circ g)'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a}.
Assume for the moment that g(x) does not equal g(a) for any x near a. Then the previous expression is equal to the product of two factors:
:\lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a}.
If g oscillates near a, then it might happen that no matter how close one gets to a, there is always an even closer x such that g(x) = g(a). For example, this happens near a = 0 for the continuous function g defined by g(x) = 0 for x = 0 and g(x) = x^2 \sin(1/x) otherwise. Whenever this happens, the above expression is undefined because it involves division by zero. To work around this, introduce a function Q as follows:
:Q(y) = \begin{cases} \dfrac{f(y) - f(g(a))}{y - g(a)}, & y \neq g(a), \\ f'(g(a)), & y = g(a). \end{cases}
We will show that the difference quotient for f ∘ g is always equal to:
:Q(g(x)) \cdot \frac{g(x) - g(a)}{x - a}.
Whenever g(x) is not equal to g(a), this is clear because the factors of g(x) − g(a) cancel. When g(x) equals g(a), then the difference quotient for f ∘ g is zero because f(g(x)) equals f(g(a)), and the above product is zero because it equals f'(g(a)) times zero. So the above product is always equal to the difference quotient, and to show that the derivative of f ∘ g at a exists and to determine its value, we need only show that the limit as x goes to a of the above product exists and determine its value.

To do this, recall that the limit of a product exists if the limits of its factors exist. When this happens, the limit of the product of these two factors will equal the product of the limits of the factors. The two factors are Q(g(x)) and (g(x) − g(a))/(x − a). The latter is the difference quotient for g at a, and because g is differentiable at a by assumption, its limit as x tends to a exists and equals g'(a). As for Q(g(x)), notice that Q is defined wherever f is. Furthermore, f is differentiable at g(a) by assumption, so Q is continuous at g(a), by definition of the derivative. The function g is continuous at a because it is differentiable at a, and therefore Q ∘ g is continuous at a. So its limit as x goes to a exists and equals Q(g(a)), which is f'(g(a)). This shows that the limits of both factors exist and that they equal f'(g(a)) and g'(a), respectively. Therefore, the derivative of f ∘ g at a exists and equals f'(g(a))g'(a).
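The factorization through Q can be seen numerically. The sketch below (illustrative only; the particular f, g, a, and x are arbitrary choices) verifies that the difference quotient of f ∘ g equals Q(g(x)) times the difference quotient of g:

```python
# Numerical illustration of the Q-function factorization used in the proof.
import math

f = math.sin
fprime = math.cos            # derivative of f = sin
g = lambda x: x**3 + x
a = 0.5

def Q(y):
    ga = g(a)
    if y != ga:
        return (f(y) - f(ga)) / (y - ga)
    return fprime(ga)        # value f'(g(a)) used when y == g(a)

x = 0.5001
diff_quotient = (f(g(x)) - f(g(a))) / (x - a)
factored = Q(g(x)) * (g(x) - g(a)) / (x - a)
print(diff_quotient, factored)  # identical values
```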


Second proof

Another way of proving the chain rule is to measure the error in the linear approximation determined by the derivative. This proof has the advantage that it generalizes to several variables. It relies on the following equivalent definition of differentiability at a point: A function g is differentiable at a if there exists a real number g'(a) and a function ε(h) that tends to zero as h tends to zero, and furthermore
:g(a + h) - g(a) = g'(a) h + \varepsilon(h) h.
Here the left-hand side represents the true difference between the value of g at a and at a + h, whereas the right-hand side represents the approximation determined by the derivative plus an error term.

In the situation of the chain rule, such a function ε exists because g is assumed to be differentiable at a. Again by assumption, a similar function also exists for f at g(a). Calling this function η, we have
:f(g(a) + k) - f(g(a)) = f'(g(a)) k + \eta(k) k.
The above definition imposes no constraints on η(0), even though it is assumed that η(k) tends to zero as k tends to zero. If we set η(0) = 0, then η is continuous at 0.

Proving the theorem requires studying the difference f(g(a + h)) − f(g(a)) as h tends to zero. The first step is to substitute for g(a + h) using the definition of differentiability of g at a:
:f(g(a + h)) - f(g(a)) = f(g(a) + g'(a) h + \varepsilon(h) h) - f(g(a)).
The next step is to use the definition of differentiability of f at g(a). This requires a term of the form f(g(a) + k) for some k. In the above equation, the correct k varies with h. Set k_h = g'(a) h + \varepsilon(h) h and the right hand side becomes f(g(a) + k_h) - f(g(a)). Applying the definition of the derivative gives:
:f(g(a) + k_h) - f(g(a)) = f'(g(a)) k_h + \eta(k_h) k_h.
To study the behavior of this expression as h tends to zero, expand k_h. After regrouping the terms, the right-hand side becomes:
:f'(g(a)) g'(a)h + \left[f'(g(a)) \varepsilon(h) + \eta(k_h) g'(a) + \eta(k_h) \varepsilon(h)\right]h.
Because ε(h) and η(k_h) tend to zero as h tends to zero, the first two bracketed terms tend to zero as h tends to zero. Applying the same theorem on products of limits as in the first proof, the third bracketed term also tends to zero. Because the above expression is equal to the difference f(g(a + h)) − f(g(a)), by the definition of the derivative f ∘ g is differentiable at a and its derivative is f'(g(a))g'(a).

The role of Q in the first proof is played by η in this proof. They are related by the equation:
:Q(y) = f'(g(a)) + \eta(y - g(a)).
The need to define Q at g(a) is analogous to the need to define η at zero.
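The error term ε(h) at the heart of this proof can be observed numerically. The sketch below (illustrative only; g and a are arbitrary choices) computes ε(h) = (g(a + h) − g(a))/h − g'(a) for shrinking h and shows it tending to zero:

```python
# Numerical sketch of the error-term definition of differentiability.
import math

g = math.sin
gprime = math.cos   # derivative of g = sin
a = 1.0

for h in (0.1, 0.01, 0.001, 0.0001):
    eps = (g(a + h) - g(a)) / h - gprime(a)
    print(h, eps)   # eps shrinks roughly in proportion to h
```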


Third proof

Constantin Carathéodory's alternative definition of the differentiability of a function can be used to give an elegant proof of the chain rule. Under this definition, a function f is differentiable at a point a if and only if there is a function q, continuous at a and such that f(x) − f(a) = q(x)(x − a). There is at most one such function, and if f is differentiable at a then f'(a) = q(a).

Given the assumptions of the chain rule and the fact that differentiable functions and compositions of continuous functions are continuous, we have that there exist functions q, continuous at g(a), and r, continuous at a, and such that,
:f(g(x))-f(g(a))=q(g(x))(g(x)-g(a))
and
:g(x)-g(a)=r(x)(x-a).
Therefore,
:f(g(x))-f(g(a))=q(g(x))r(x)(x-a),
but the function given by h(x) = q(g(x))r(x) is continuous at a, and we get, for this a,
:(f(g(a)))'=q(g(a))r(a)=f'(g(a))g'(a).
A similar approach works for continuously differentiable (vector-)functions of many variables. This method of factoring also allows a unified approach to stronger forms of differentiability, when the derivative is required to be Lipschitz continuous, Hölder continuous, etc. Differentiation itself can be viewed as the polynomial remainder theorem (the little Bézout theorem, or factor theorem), generalized to an appropriate class of functions.


Proof via infinitesimals

If y=f(x) and x=g(t), then choosing an infinitesimal \Delta t\not=0 we compute the corresponding \Delta x=g(t+\Delta t)-g(t) and then the corresponding \Delta y=f(x+\Delta x)-f(x), so that
:\frac{\Delta y}{\Delta t}=\frac{\Delta y}{\Delta x} \frac{\Delta x}{\Delta t}
and applying the standard part we obtain
:\frac{dy}{dt}=\frac{dy}{dx} \frac{dx}{dt}
which is the chain rule.


Multivariable case

The generalization of the chain rule to multi-variable functions is rather technical. However, it is simpler to write in the case of functions of the form :f(g_1(x), \dots, g_k(x)). As this case occurs often in the study of functions of a single variable, it is worth describing it separately.


Case of f(g_1(x), \dots, g_k(x))

For writing the chain rule for a function of the form
:f(g_1(x), \dots, g_k(x)),
one needs the partial derivatives of f with respect to its k arguments. The usual notations for partial derivatives involve names for the arguments of the function. As these arguments are not named in the above formula, it is simpler and clearer to denote by
:D_i f
the partial derivative of f with respect to its ith argument, and by
:D_i f(z)
the value of this derivative at z.

With this notation, the chain rule is
:\frac{d}{dx}f(g_1(x), \dots, g_k (x))=\sum_{i=1}^k \left(\frac{d g_i}{dx}(x)\right) D_i f(g_1(x), \dots, g_k (x)).
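This formula can be checked symbolically for a small case. The sketch below (illustrative only, assuming sympy; f, g_1, g_2 are arbitrary choices with k = 2) compares a direct derivative with the sum over partial derivatives:

```python
# Check d/dx f(g1(x), g2(x)) = g1'(x) * D1 f + g2'(x) * D2 f for sample functions.
import sympy as sp

x, u, v = sp.symbols('x u v')

f = u * sp.exp(v)    # outer function f(u, v), illustrative choice
g1 = sp.sin(x)       # inner function g1
g2 = x**2            # inner function g2

direct = sp.diff(f.subs({u: g1, v: g2}), x)
by_rule = (sp.diff(g1, x) * sp.diff(f, u).subs({u: g1, v: g2})
           + sp.diff(g2, x) * sp.diff(f, v).subs({u: g1, v: g2}))
print(sp.simplify(direct - by_rule))  # 0
```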


Example: arithmetic operations

If the function f is addition, that is, if
:f(u,v)=u+v,
then D_1 f = \frac{\partial f}{\partial u} = 1 and D_2 f = \frac{\partial f}{\partial v} = 1. Thus, the chain rule gives
:\frac{d}{dx}(g(x)+h(x)) = \left( \frac{d}{dx}g(x) \right) D_1 f+\left( \frac{d}{dx}h(x)\right) D_2 f=\frac{d}{dx}g(x) +\frac{d}{dx}h(x).
For multiplication
:f(u,v)=uv,
the partials are D_1 f = v and D_2 f = u. Thus,
:\frac{d}{dx}(g(x)h(x)) = h(x) \frac{d}{dx} g(x) + g(x) \frac{d}{dx} h(x).
The case of exponentiation
:f(u,v)=u^v
is slightly more complicated, as
:D_1 f = vu^{v-1},
and, as u^v=e^{v\ln u},
:D_2 f = u^v\ln u.
It follows that
:\frac{d}{dx}\left(g(x)^{h(x)}\right) = h(x)g(x)^{h(x)-1} \frac{d}{dx}g(x) + g(x)^{h(x)} \ln g(x) \frac{d}{dx}h(x).
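The exponentiation case is the least obvious of the three, so a quick symbolic check helps. The sketch below (illustrative only, assuming sympy; g and h are arbitrary choices with g positive so that the logarithm is defined) confirms the last formula:

```python
# Check d/dx g(x)**h(x) = h*g**(h-1)*g' + g**h * ln(g) * h' for sample g and h.
import sympy as sp

x = sp.symbols('x', positive=True)  # positive so g(x)**h(x) and ln g(x) are well defined
g = x**2 + 1
h = sp.sin(x)

direct = sp.diff(g**h, x)
by_rule = h * g**(h - 1) * sp.diff(g, x) + g**h * sp.log(g) * sp.diff(h, x)
print(sp.simplify(direct - by_rule))  # 0
```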


General rule

The simplest way for writing the chain rule in the general case is to use the total derivative, which is a linear transformation that captures all directional derivatives in a single formula. Consider differentiable functions f : R^m → R^k and g : R^n → R^m, and a point a in R^n. Let D_{\mathbf{a}}g denote the total derivative of g at a and D_{g(\mathbf{a})}f denote the total derivative of f at g(a). These two derivatives are linear transformations R^n → R^m and R^m → R^k, respectively, so they can be composed. The chain rule for total derivatives is that their composite is the total derivative of f ∘ g at a:
:D_{\mathbf{a}}(f \circ g) = D_{g(\mathbf{a})}f \circ D_{\mathbf{a}}g,
or for short,
:D(f \circ g) = Df \circ Dg.
The higher-dimensional chain rule can be proved using a technique similar to the second proof given above.

Because the total derivative is a linear transformation, the functions appearing in the formula can be rewritten as matrices. The matrix corresponding to a total derivative is called a Jacobian matrix, and the composite of two derivatives corresponds to the product of their Jacobian matrices. From this perspective the chain rule therefore says:
:J_{f \circ g}(\mathbf{a}) = J_{f}(g(\mathbf{a})) J_{g}(\mathbf{a}),
or for short,
:J_{f \circ g} = (J_f \circ g)J_g.
That is, the Jacobian of a composite function is the product of the Jacobians of the composed functions (evaluated at the appropriate points).

The higher-dimensional chain rule is a generalization of the one-dimensional chain rule. If k, m, and n are 1, so that f : R → R and g : R → R, then the Jacobian matrices of f and g are 1 × 1. Specifically, they are:
:\begin{align} J_g(a) &= \begin{pmatrix} g'(a) \end{pmatrix}, \\ J_{f}(g(a)) &= \begin{pmatrix} f'(g(a)) \end{pmatrix}. \end{align}
The Jacobian of f ∘ g is the product of these 1 × 1 matrices, so it is f'(g(a))\cdot g'(a), as expected from the one-dimensional chain rule. In the language of linear transformations, D_a(g) is the function which scales a vector by a factor of g'(a) and D_{g(a)}(f) is the function which scales a vector by a factor of f'(g(a)). The chain rule says that the composite of these two linear transformations is the linear transformation D_a(f ∘ g), and therefore it is the function that scales a vector by f'(g(a))\cdot g'(a).

Another way of writing the chain rule is used when f and g are expressed in terms of their components as y = f(u) = (f_1(u), \ldots, f_k(u)) and u = g(x) = (g_1(x), \ldots, g_m(x)). In this case, the above rule for Jacobian matrices is usually written as:
:\frac{\partial(y_1, \ldots, y_k)}{\partial(x_1, \ldots, x_n)} = \frac{\partial(y_1, \ldots, y_k)}{\partial(u_1, \ldots, u_m)} \frac{\partial(u_1, \ldots, u_m)}{\partial(x_1, \ldots, x_n)}.
The chain rule for total derivatives implies a chain rule for partial derivatives. Recall that when the total derivative exists, the partial derivative in the ith coordinate direction is found by multiplying the Jacobian matrix by the ith basis vector. By doing this to the formula above, we find:
:\frac{\partial(y_1, \ldots, y_k)}{\partial x_i} = \frac{\partial(y_1, \ldots, y_k)}{\partial(u_1, \ldots, u_m)} \frac{\partial(u_1, \ldots, u_m)}{\partial x_i}.
Since the entries of the Jacobian matrix are partial derivatives, we may simplify the above formula to get:
:\frac{\partial(y_1, \ldots, y_k)}{\partial x_i} = \sum_{\ell = 1}^m \frac{\partial(y_1, \ldots, y_k)}{\partial u_\ell} \frac{\partial u_\ell}{\partial x_i}.
More conceptually, this rule expresses the fact that a change in the x_i direction may change all of g_1 through g_m, and any of these changes may affect f.

In the special case where k = 1, so that f is a real-valued function, then this formula simplifies even further:
:\frac{\partial y}{\partial x_i} = \sum_{\ell = 1}^m \frac{\partial y}{\partial u_\ell} \frac{\partial u_\ell}{\partial x_i}.
This can be rewritten as a dot product. Recalling that u = (g_1, \ldots, g_m), the partial derivative \partial u / \partial x_i is also a vector, and the chain rule says that:
:\frac{\partial y}{\partial x_i} = \nabla y \cdot \frac{\partial u}{\partial x_i}.


Example

Given u(x, y) = x^2 + 2y where x(r, t) = r \sin(t) and y(r, t) = \sin^2(t), determine the value of \frac{\partial u}{\partial r} and \frac{\partial u}{\partial t} using the chain rule.
:\frac{\partial u}{\partial r}=\frac{\partial u}{\partial x} \frac{\partial x}{\partial r}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial r} = (2x)(\sin(t)) + (2)(0) = 2r \sin^2(t),
and
:\begin{align}\frac{\partial u}{\partial t} &= \frac{\partial u}{\partial x} \frac{\partial x}{\partial t}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial t} \\ &= (2x)(r\cos(t)) + (2)(2\sin(t)\cos(t)) \\ &= (2r\sin(t))(r\cos(t)) + 4\sin(t)\cos(t) \\ &= 2(r^2 + 2) \sin(t)\cos(t) \\ &= (r^2 + 2) \sin(2t).\end{align}
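This worked example is easy to verify by substituting and differentiating directly. The sketch below (illustrative only, assuming sympy) confirms both results:

```python
# Verify the partial derivatives of u = x**2 + 2*y with x = r*sin(t), y = sin(t)**2.
import sympy as sp

r, t, x, y = sp.symbols('r t x y')

u = x**2 + 2*y
u_rt = u.subs({x: r * sp.sin(t), y: sp.sin(t)**2})

print(sp.simplify(sp.diff(u_rt, r) - 2*r*sp.sin(t)**2))        # 0
print(sp.simplify(sp.diff(u_rt, t) - (r**2 + 2)*sp.sin(2*t)))  # 0
```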


Higher derivatives of multivariable functions

Faà di Bruno's formula for higher-order derivatives of single-variable functions generalizes to the multivariable case. If y = f(u) is a function of u = g(x) as above, then the second derivative of f ∘ g is:
:\frac{\partial^2 y}{\partial x_i \partial x_j} = \sum_k \left(\frac{\partial y}{\partial u_k}\frac{\partial^2 u_k}{\partial x_i \partial x_j}\right) + \sum_{k, \ell} \left(\frac{\partial^2 y}{\partial u_k \partial u_\ell}\frac{\partial u_k}{\partial x_i}\frac{\partial u_\ell}{\partial x_j}\right).
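This second-derivative formula can also be verified symbolically for a small case. The sketch below (illustrative only, assuming sympy; f, g_1, g_2 and the index pair (i, j) = (1, 2) are arbitrary choices) compares a direct mixed second derivative with the two sums in the formula:

```python
# Check the multivariable second-derivative formula for a sample f(u1, u2) and g(x1, x2).
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
u1, u2 = sp.symbols('u1 u2')

f = u1**2 * u2                 # outer function, illustrative choice
gs = [sp.sin(x1 * x2), x1 + x2**2]   # inner functions u1(x), u2(x)
us = [u1, u2]
subs = dict(zip(us, gs))

y = f.subs(subs)
direct = sp.diff(y, x1, x2)    # ∂²y / ∂x1 ∂x2 computed directly

rhs = sum(sp.diff(f, us[k]).subs(subs) * sp.diff(gs[k], x1, x2) for k in range(2))
rhs += sum(sp.diff(f, us[k], us[l]).subs(subs) * sp.diff(gs[k], x1) * sp.diff(gs[l], x2)
           for k in range(2) for l in range(2))
print(sp.simplify(direct - rhs))  # 0
```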


Further generalizations

All extensions of calculus have a chain rule. In most of these, the formula remains the same, though the meaning of that formula may be vastly different.

One generalization is to manifolds. In this situation, the chain rule represents the fact that the derivative of f ∘ g is the composite of the derivative of f and the derivative of g. This theorem is an immediate consequence of the higher dimensional chain rule given above, and it has exactly the same formula.

The chain rule is also valid for Fréchet derivatives in Banach spaces. The same formula holds as before. This case and the previous one admit a simultaneous generalization to Banach manifolds.

In differential algebra, the derivative is interpreted as a morphism of modules of Kähler differentials. A ring homomorphism of commutative rings determines a morphism of Kähler differentials which sends an element ''dr'' to ''d''(''f''(''r'')), the exterior differential of ''f''(''r''). The formula D(f ∘ g) = Df ∘ Dg holds in this context as well.

The common feature of these examples is that they are expressions of the idea that the derivative is part of a functor. A functor is an operation on spaces and functions between them. It associates to each space a new space and to each function between two spaces a new function between the corresponding new spaces. In each of the above cases, the functor sends each space to its tangent bundle and it sends each function to its derivative. For example, in the manifold case, the derivative sends a ''C''''r''-manifold to a ''C''''r''−1-manifold (its tangent bundle) and a ''C''''r''-function to its total derivative. There is one requirement for this to be a functor, namely that the derivative of a composite must be the composite of the derivatives. This is exactly the formula D(f ∘ g) = Df ∘ Dg.

There are also chain rules in stochastic calculus. One of these, Itō's lemma, expresses the composite of an Itō process (or more generally a semimartingale) ''dX''''t'' with a twice-differentiable function ''f''. In Itō's lemma, the derivative of the composite function depends not only on ''dX''''t'' and the derivative of ''f'' but also on the second derivative of ''f''. The dependence on the second derivative is a consequence of the non-zero quadratic variation of the stochastic process, which broadly speaking means that the process can move up and down in a very rough way. This variant of the chain rule is not an example of a functor because the two functions being composed are of different types.


See also

* Automatic differentiation − a computational method that makes heavy use of the chain rule to compute exact numerical derivatives.

