Delta Method

In statistics, the delta method is a result concerning the approximate probability distribution for a function of an asymptotically normal statistical estimator from knowledge of the limiting variance of that estimator.


History

The delta method was derived from propagation of error, and the idea behind it was known in the early 19th century. Its statistical application can be traced as far back as 1928, to T. L. Kelley. A formal description of the method was presented by J. L. Doob in 1935. Robert Dorfman also described a version of it in 1938.


Univariate delta method

While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables X_n satisfying

:\sqrt{n}[X_n-\theta]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2),

where ''θ'' and ''σ''² are finite valued constants and \xrightarrow{D} denotes convergence in distribution, then

:\sqrt{n}[g(X_n)-g(\theta)]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2[g'(\theta)]^2)

for any function ''g'' satisfying the property that g'(\theta) exists and is non-zero valued.
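As a quick numeric illustration of this statement (a sketch, not part of the original article; NumPy assumed), the following simulates \sqrt{n}[g(X_n)-g(\theta)] for the sample mean X_n of Exponential(1) draws and g(x)=\log x, so that \theta=1, \sigma^2=1, and g'(\theta)=1; the delta method predicts an approximate N(0,1) distribution.

```python
# Minimal sketch: univariate delta method for g(x) = log(x), theta = 1.
# X_n is the mean of n Exponential(1) draws, so sqrt(n)(X_n - 1) -> N(0, 1)
# and the delta method predicts sqrt(n)(log(X_n) - log(1)) -> N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2_000, 2_000
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (np.log(xbar) - np.log(1.0))

print(f"mean {z.mean():+.3f} (predicted 0), var {z.var():.3f} (predicted 1)")
```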


Proof in the univariate case

Demonstration of this result is fairly straightforward under the assumption that g' is continuous. To begin, we use the mean value theorem (i.e.: the first order approximation of a Taylor series using Taylor's theorem):

:g(X_n)=g(\theta)+g'(\tilde{\theta})(X_n-\theta),

where \tilde{\theta} lies between X_n and ''θ''. Note that since X_n\,\xrightarrow{P}\,\theta and |\tilde{\theta}-\theta|<|X_n-\theta|, it must be that \tilde{\theta}\,\xrightarrow{P}\,\theta, and since g' is continuous, applying the continuous mapping theorem yields

:g'(\tilde{\theta})\,\xrightarrow{P}\,g'(\theta),

where \xrightarrow{P} denotes convergence in probability. Rearranging the terms and multiplying by \sqrt{n} gives

:\sqrt{n}[g(X_n)-g(\theta)]=g'\left(\tilde{\theta}\right)\sqrt{n}[X_n-\theta].

Since \sqrt{n}[X_n-\theta]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2) by assumption, it follows immediately from appeal to Slutsky's theorem that

:\sqrt{n}[g(X_n)-g(\theta)]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2[g'(\theta)]^2).

This concludes the proof.
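The key step g'(\tilde{\theta})\,\xrightarrow{P}\,g'(\theta) can also be checked numerically, since the mean value theorem makes g'(\tilde{\theta}) computable as the difference quotient (g(X_n)-g(\theta))/(X_n-\theta). A sketch under the same exponential-mean setup as above (an illustration, not from the article):

```python
# Sketch: g'(theta_tilde) = (g(X_n) - g(theta)) / (X_n - theta) -> g'(theta) = 1.
# The mean of n Exponential(1) draws is simulated exactly as Gamma(n) / n.
import numpy as np

rng = np.random.default_rng(1)
for n in (100, 10_000, 1_000_000):
    xbar = rng.gamma(shape=n, size=2_000) / n      # mean of n Exp(1) draws
    dq = (np.log(xbar) - np.log(1.0)) / (xbar - 1.0)
    print(n, np.quantile(np.abs(dq - 1.0), 0.95))  # 95th pct of |g'(tilde) - 1|
```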


Proof with an explicit order of approximation

Alternatively, one can add one more step at the end, to obtain the order of approximation:

:\begin{align}
\sqrt{n}[g(X_n)-g(\theta)]&=g'\left(\tilde{\theta}\right)\sqrt{n}[X_n-\theta]\\
&=\sqrt{n}[X_n-\theta]\left[g'(\tilde{\theta})+g'(\theta)-g'(\theta)\right]\\
&=\sqrt{n}[X_n-\theta]\left[g'(\theta)\right]+\sqrt{n}[X_n-\theta]\left[g'(\tilde{\theta})-g'(\theta)\right]\\
&=\sqrt{n}[X_n-\theta]\left[g'(\theta)\right]+O_p(1)\cdot o_p(1)\\
&=\sqrt{n}[X_n-\theta]\left[g'(\theta)\right]+o_p(1)
\end{align}

This suggests that the error in the approximation converges to 0 in probability.
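The shrinking o_p(1) remainder can be watched directly (again an illustrative sketch, not from the article, with the same g = \log, \theta = 1 setup as above): the gap between \sqrt{n}[g(X_n)-g(\theta)] and its linearization \sqrt{n}[X_n-\theta]\,g'(\theta) should tend to zero in probability as n grows.

```python
# Sketch: remainder sqrt(n)[g(X_n) - g(theta)] - sqrt(n)[X_n - theta] g'(theta)
# is o_p(1); its quantiles shrink as n grows (g = log, theta = 1, g'(1) = 1).
import numpy as np

rng = np.random.default_rng(2)
for n in (100, 10_000, 1_000_000):
    xbar = rng.gamma(shape=n, size=2_000) / n      # mean of n Exp(1) draws
    rem = np.sqrt(n) * (np.log(xbar) - (xbar - 1.0))
    print(n, np.quantile(np.abs(rem), 0.95))       # shrinks roughly like 1/sqrt(n)
```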


Multivariate delta method

By definition, a consistent estimator ''B'' converges in probability to its true value ''β'', and often a central limit theorem can be applied to obtain asymptotic normality:

:\sqrt{n}\left(B-\beta\right)\,\xrightarrow{D}\,N\left(0,\Sigma\right),

where ''n'' is the number of observations and Σ is a (symmetric positive semi-definite) covariance matrix. Suppose we want to estimate the variance of a scalar-valued function ''h'' of the estimator ''B''. Keeping only the first two terms of the Taylor series, and using vector notation for the gradient, we can estimate ''h(B)'' as

:h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B-\beta),

which implies the variance of ''h(B)'' is approximately

:\begin{align}
\operatorname{Var}\left(h(B)\right) & \approx \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)\right) \\
& = \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta\right) \\
& = \operatorname{Var}\left(\nabla h(\beta)^T \cdot B\right) \\
& = \nabla h(\beta)^T \cdot \operatorname{Cov}(B) \cdot \nabla h(\beta) \\
& = \nabla h(\beta)^T \cdot \frac{\Sigma}{n} \cdot \nabla h(\beta)
\end{align}

One can use the mean value theorem (for real-valued functions of many variables) to see that this does not rely on taking a first order approximation. The delta method therefore implies that

:\sqrt{n}\left(h(B)-h(\beta)\right)\,\xrightarrow{D}\,N\left(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)\right)

or in univariate terms,

:\sqrt{n}\left(h(B)-h(\beta)\right)\,\xrightarrow{D}\,N\left(0, \sigma^2 \cdot \left(h'(\beta)\right)^2\right).
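As a concrete illustration of the variance formula (a hedged sketch: the function ''h'', the parameter values, and Σ below are assumptions chosen for the example, not taken from the article), consider the ratio h(b_1,b_2)=b_1/b_2, whose gradient at ''β'' is (1/\beta_2,\,-\beta_1/\beta_2^2)^T:

```python
# Sketch: Var(h(B)) ~ grad_h(beta)^T (Sigma/n) grad_h(beta) for h(b) = b1/b2.
import numpy as np

beta = np.array([2.0, 5.0])                  # assumed true parameter vector
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])               # assumed limiting covariance matrix
n = 1_000                                    # assumed sample size

grad = np.array([1.0 / beta[1], -beta[0] / beta[1] ** 2])  # gradient of h at beta
var_h = grad @ (Sigma / n) @ grad            # delta-method variance of h(B)
print(f"approximate Var(h(B)) = {var_h:.2e}")
```

The same gradient vector also gives the asymptotic distribution N(0, \nabla h(\beta)^T \Sigma \nabla h(\beta)) stated above.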


Example: the binomial proportion

Suppose X_n is binomial with parameters p \in (0,1] and ''n''. Since

:\sqrt{n}\left[\frac{X_n}{n}-p\right]\,\xrightarrow{D}\,N(0,p(1-p)),

we can apply the delta method with g(\theta)=\log(\theta) to see

:\sqrt{n}\left[\log\left(\frac{X_n}{n}\right)-\log(p)\right]\,\xrightarrow{D}\,N\left(0,p(1-p)[1/p]^2\right).

Hence, even though for any finite ''n'' the variance of \log\left(\frac{X_n}{n}\right) does not actually exist (since X_n can be zero), the asymptotic variance of \log\left(\frac{X_n}{n}\right) does exist and is equal to

:\frac{1-p}{p}.

Note that since ''p>0'', \Pr\left(\frac{X_n}{n}>0\right)\rightarrow 1 as n\rightarrow\infty, so with probability converging to one, \log\left(\frac{X_n}{n}\right) is finite for large ''n''.

Moreover, if \hat{p} and \hat{q} are estimates of different group rates from independent samples of sizes ''n'' and ''m'' respectively, then the logarithm of the estimated relative risk \frac{\hat{p}}{\hat{q}} has asymptotic variance equal to

:\frac{1-p}{p\,n}+\frac{1-q}{q\,m}.

This is useful to construct a hypothesis test or to make a confidence interval for the relative risk.
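A brief simulation sketch of this example (illustrative, not from the article; NumPy assumed) compares the sample variance of \sqrt{n}\left[\log(X_n/n)-\log p\right] with the delta-method value (1-p)/p:

```python
# Sketch: binomial log-proportion, asymptotic variance (1 - p)/p.
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 5_000, 0.3, 20_000
x = rng.binomial(n, p, size=reps)
x = x[x > 0]                       # log needs X_n > 0; P(X_n > 0) -> 1, as noted
z = np.sqrt(n) * (np.log(x / n) - np.log(p))
print(f"sample var {z.var():.3f} vs (1 - p)/p = {(1 - p) / p:.3f}")
```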


Alternative form

The delta method is often used in a form that is essentially identical to that above, but without the assumption that X_n or ''B'' is asymptotically normal. Often the only context is that the variance is "small". The results then just give approximations to the means and covariances of the transformed quantities. For example, the formulae presented in Klein (1953, p. 258) are:

:\begin{align}
\operatorname{Var}\left(h_r\right) &= \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)^2 \operatorname{Var}\left(B_i\right) + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_r}{\partial B_j}\right) \operatorname{Cov}\left(B_i, B_j\right) \\
\operatorname{Cov}\left(h_r, h_s\right) &= \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_i}\right) \operatorname{Var}\left(B_i\right) + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_j}\right) \operatorname{Cov}\left(B_i, B_j\right)
\end{align}

where h_r is the ''r''th element of ''h''(''B'') and B_i is the ''i''th element of ''B''.
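In matrix form, the two formulas above are just the entries of J\,\operatorname{Cov}(B)\,J^T, where ''J'' is the Jacobian of ''h'' at ''B''. A generic sketch (the helper delta_cov and all numbers below are hypothetical, chosen only for illustration):

```python
# Sketch: propagation-of-error form of the delta method.
# J[r, i] = dh_r/dB_i; the diagonal of the result gives the Var(h_r) formula
# above, and the off-diagonal entries give the Cov(h_r, h_s) formula.
import numpy as np

def delta_cov(J: np.ndarray, cov_B: np.ndarray) -> np.ndarray:
    """Approximate covariance matrix of h(B), namely J Cov(B) J^T."""
    return J @ cov_B @ J.T

J = np.array([[1.0, 0.5, 0.0],        # hypothetical Jacobian, h: R^3 -> R^2
              [0.0, 2.0, -1.0]])
cov_B = np.diag([0.04, 0.01, 0.09])   # hypothetical covariance of B
print(delta_cov(J, cov_B))
```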


Second-order delta method

When g'(\theta)=0 the delta method cannot be applied. However, if g''(\theta) exists and is not zero, the second-order delta method can be applied. By the Taylor expansion,

:n[g(X_n)-g(\theta)]=\frac{1}{2}n[X_n-\theta]^2\left[g''(\theta)\right]+o_p(1),

so that the variance of g\left(X_n\right) relies on up to the 4th moment of X_n. The second-order delta method is also useful in conducting a more accurate approximation of g\left(X_n\right)'s distribution when the sample size is small:

:\sqrt{n}[g(X_n)-g(\theta)]=\sqrt{n}[X_n-\theta]g'(\theta)+\frac{1}{2}\sqrt{n}[X_n-\theta]^2 g''(\theta)+o_p(1).

For example, when X_n follows the standard normal distribution, g\left(X_n\right) can be approximated as the weighted sum of a standard normal and a chi-square variable with one degree of freedom.
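A simulation sketch of the degenerate case (with illustrative assumptions not from the article: X_n is the mean of ''n'' Exponential(1) draws, \theta=1, and g(x)=(x-1)^2, so g'(\theta)=0, g''(\theta)=2, and \sigma^2=1; the limit of n[g(X_n)-g(\theta)] is then \frac{\sigma^2 g''(\theta)}{2}\chi^2_1=\chi^2_1):

```python
# Sketch: second-order delta method when g'(theta) = 0.
# With g(x) = (x - 1)^2, theta = 1, sigma^2 = 1, the limit of
# n [g(X_n) - g(theta)] is (sigma^2 g''(theta) / 2) chi2_1 = chi2_1.
import numpy as np

rng = np.random.default_rng(4)
n, reps = 10_000, 50_000
xbar = rng.gamma(shape=n, size=reps) / n   # mean of n Exp(1) draws
z = n * (xbar - 1.0) ** 2                  # n [g(X_n) - g(theta)]
print(f"mean {z.mean():.3f} (chi2_1: 1), var {z.var():.3f} (chi2_1: 2)")
```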


Nonparametric delta method

A version of the delta method exists in nonparametric statistics. Let X_1,\ldots,X_n be independent and identically distributed random variables with distribution F and empirical distribution function \hat{F}_n, and let T be a functional. If T is Hadamard differentiable with respect to the Chebyshev metric, then

:\frac{T(\hat{F}_n)-T(F)}{\widehat{\operatorname{se}}}\,\xrightarrow{D}\,N(0,1),

where \widehat{\operatorname{se}}=\frac{\hat{\tau}}{\sqrt{n}} and \hat{\tau}^2=\frac{1}{n}\sum_{i=1}^n \hat{L}^2(X_i), with \hat{L}(x)=L_{\hat{F}_n}(\delta_x) denoting the empirical influence function for T. A nonparametric (1-\alpha) pointwise asymptotic confidence interval for T(F) is therefore given by

:T(\hat{F}_n) \pm z_{\alpha/2}\,\widehat{\operatorname{se}},

where z_q denotes the upper q-quantile of the standard normal. See Wasserman (2006) p. 19f. for details and examples.
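For the simplest functional, the mean T(F)=\int x\,dF(x), the empirical influence function is \hat{L}(x)=x-\bar{X}, and the interval above reduces to the usual normal-approximation interval for a sample mean. A hedged sketch (NumPy and SciPy assumed; the exponential sample is an arbitrary illustration):

```python
# Sketch: nonparametric delta method for T(F) = mean of F.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=500)        # assumed i.i.d. sample
alpha = 0.05

t_hat = x.mean()                                # T(F_hat_n)
L_hat = x - t_hat                               # empirical influence values
se_hat = np.sqrt((L_hat ** 2).mean() / x.size)  # tau_hat / sqrt(n)
z = norm.ppf(1 - alpha / 2)                     # z_{alpha/2}
print(f"95% CI for T(F): ({t_hat - z * se_hat:.3f}, {t_hat + z * se_hat:.3f})")
```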


See also

* Taylor expansions for the moments of functions of random variables
* Variance-stabilizing transformation


References

* Klein, L. R. (1953). ''A Textbook of Econometrics''. p. 258.
* Wasserman, L. (2006). ''All of Nonparametric Statistics''. Springer. p. 19f.



External links

* Xu, Jun; Long, J. Scott (August 22, 2005). "Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes". Lecture notes, Indiana University. http://www.indiana.edu/~jslsoc/stata/ci_computations/spost_deltaci.pdf