
In statistics, the delta method is a result concerning the approximate probability distribution for a function of an asymptotically normal statistical estimator from knowledge of the limiting variance of that estimator.


History

The delta method was derived from propagation of error, and the idea behind it was known in the early 19th century. Its statistical application can be traced as far back as 1928 by T. L. Kelley. A formal description of the method was presented by J. L. Doob in 1935. Robert Dorfman also described a version of it in 1938.


Univariate delta method

While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables ''X_n'' satisfying

:\sqrt{n}[X_n-\theta]\,\xrightarrow{D}\,N(0,\sigma^2),

where ''θ'' and ''σ''^2 are finite valued constants and \xrightarrow{D} denotes convergence in distribution, then

:\sqrt{n}[g(X_n)-g(\theta)]\,\xrightarrow{D}\,N\left(0,\sigma^2[g'(\theta)]^2\right)

for any function ''g'' satisfying the property that ''g′''(''θ'') exists and is non-zero valued.
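As a quick numerical check, the statement above can be verified by Monte Carlo simulation. This is an illustrative sketch, not part of the original article: the distribution (Exp(1), so θ = σ² = 1) and the function g(x) = x² are arbitrary choices, for which the delta method predicts a limiting variance of σ²[g′(θ)]² = 4.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 10_000

# X_n = sample mean of n Exp(1) draws, so theta = 1 and sigma^2 = 1;
# g(x) = x^2 gives g'(theta) = 2, hence a predicted limiting variance of 4.
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar**2 - 1.0)

print(np.var(z))   # should be close to sigma^2 * g'(theta)^2 = 4
```

The empirical variance of z over the replications should approach 4 as n grows, up to O(1/n) finite-sample corrections.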


Proof in the univariate case

Demonstration of this result is fairly straightforward under the assumption that ''g′'' is continuous. To begin, we use the mean value theorem (i.e.: the first order approximation of a Taylor series using Taylor's theorem):

:g(X_n)=g(\theta)+g'(\tilde{\theta})(X_n-\theta),

where \tilde{\theta} lies between ''X_n'' and ''θ''. Note that since X_n\,\xrightarrow{P}\,\theta and |\tilde{\theta}-\theta|<|X_n-\theta|, it must be that \tilde{\theta}\,\xrightarrow{P}\,\theta, and since ''g′'' is continuous, applying the continuous mapping theorem yields

:g'(\tilde{\theta})\,\xrightarrow{P}\,g'(\theta),

where \xrightarrow{P} denotes convergence in probability. Rearranging the terms and multiplying by \sqrt{n} gives

:\sqrt{n}[g(X_n)-g(\theta)]=g'\left(\tilde{\theta}\right)\sqrt{n}[X_n-\theta].

Since

:\sqrt{n}[X_n-\theta]\,\xrightarrow{D}\,N(0,\sigma^2)

by assumption, it follows immediately from appeal to Slutsky's theorem that

:\sqrt{n}[g(X_n)-g(\theta)]\,\xrightarrow{D}\,N\left(0,\sigma^2[g'(\theta)]^2\right).

This concludes the proof.


Proof with an explicit order of approximation

Alternatively, one can add one more step at the end, to obtain the order of approximation:

:\begin{align}
\sqrt{n}[g(X_n)-g(\theta)]&=g'\left(\tilde{\theta}\right)\sqrt{n}[X_n-\theta]\\
&=\sqrt{n}[X_n-\theta]\left[g'(\tilde{\theta})+g'(\theta)-g'(\theta)\right]\\
&=\sqrt{n}[X_n-\theta]\left[g'(\theta)\right]+\sqrt{n}[X_n-\theta]\left[g'(\tilde{\theta})-g'(\theta)\right]\\
&=\sqrt{n}[X_n-\theta]\left[g'(\theta)\right]+O_p(1)\cdot o_p(1)\\
&=\sqrt{n}[X_n-\theta]\left[g'(\theta)\right]+o_p(1)
\end{align}

This suggests that the error in the approximation converges to 0 in probability.


Multivariate delta method

By definition, a consistent estimator ''B'' converges in probability to its true value ''β'', and often a central limit theorem can be applied to obtain asymptotic normality:

:\sqrt{n}\left(B-\beta\right)\,\xrightarrow{D}\,N\left(0,\Sigma\right),

where ''n'' is the number of observations and Σ is a (symmetric positive semi-definite) covariance matrix. Suppose we want to estimate the variance of a scalar-valued function ''h'' of the estimator ''B''. Keeping only the first two terms of the Taylor series, and using vector notation for the gradient, we can estimate ''h(B)'' as

:h(B)\approx h(\beta)+\nabla h(\beta)^T\cdot(B-\beta)

which implies the variance of ''h(B)'' is approximately

:\begin{align}
\operatorname{Var}\left(h(B)\right)&\approx\operatorname{Var}\left(h(\beta)+\nabla h(\beta)^T\cdot(B-\beta)\right)\\
&=\operatorname{Var}\left(h(\beta)+\nabla h(\beta)^T\cdot B-\nabla h(\beta)^T\cdot\beta\right)\\
&=\operatorname{Var}\left(\nabla h(\beta)^T\cdot B\right)\\
&=\nabla h(\beta)^T\cdot\operatorname{Cov}(B)\cdot\nabla h(\beta)\\
&=\nabla h(\beta)^T\cdot\frac{\Sigma}{n}\cdot\nabla h(\beta)
\end{align}

One can use the mean value theorem (for real-valued functions of many variables) to see that this does not rely on taking a first-order approximation. The delta method therefore implies that

:\sqrt{n}\left(h(B)-h(\beta)\right)\,\xrightarrow{D}\,N\left(0,\nabla h(\beta)^T\cdot\Sigma\cdot\nabla h(\beta)\right)

or in univariate terms,

:\sqrt{n}\left(h(B)-h(\beta)\right)\,\xrightarrow{D}\,N\left(0,\sigma^2\cdot\left(h'(\beta)\right)^2\right).
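The sandwich formula ∇h(β)ᵀ(Σ/n)∇h(β) can be checked numerically. The sketch below uses illustrative values throughout: the ratio h(b) = b₁/b₂ and the particular β, Σ, n are arbitrary choices, not from the article.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
n = 4_000

def h(b):                         # scalar function of the estimator: a ratio
    return b[0] / b[1]

grad = np.array([1 / beta[1], -beta[0] / beta[1]**2])   # gradient of h at beta

# Delta-method variance of h(B):  grad^T (Sigma / n) grad
var_delta = grad @ (Sigma / n) @ grad

# Monte Carlo check, treating B as (approximately) N(beta, Sigma / n)
reps = 20_000
B = rng.multivariate_normal(beta, Sigma / n, size=reps)
var_mc = np.var(np.array([h(b) for b in B]))
print(var_delta, var_mc)          # the two should be close
```

The agreement improves as n grows, since the linearization error in h shrinks relative to the sampling noise.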


Example: the binomial proportion

Suppose ''X_n'' is binomial with parameters p\in(0,1] and ''n''. Since

:\sqrt{n}\left[\frac{X_n}{n}-p\right]\,\xrightarrow{D}\,N(0,p(1-p)),

we can apply the delta method with g(\theta)=\log(\theta) to see

:\sqrt{n}\left[\log\left(\frac{X_n}{n}\right)-\log(p)\right]\,\xrightarrow{D}\,N\left(0,p(1-p)\left[\frac{1}{p}\right]^2\right).

Hence, even though for any finite ''n'' the variance of \log\left(\frac{X_n}{n}\right) does not actually exist (since ''X_n'' can be zero), the asymptotic variance of \log\left(\frac{X_n}{n}\right) does exist and is equal to

:\frac{1-p}{p}.

Note that since ''p'' > 0, \Pr\left(\frac{X_n}{n}>0\right)\rightarrow 1 as n\rightarrow\infty, so with probability converging to one, \log\left(\frac{X_n}{n}\right) is finite for large ''n''. Moreover, if \hat{p} and \hat{q} are estimates of different group rates from independent samples of sizes ''n'' and ''m'' respectively, then the logarithm of the estimated relative risk \frac{\hat{p}}{\hat{q}} has asymptotic variance equal to

:\frac{1-p}{p\,n}+\frac{1-q}{q\,m}.

This is useful to construct a hypothesis test or to make a confidence interval for the relative risk.
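The binomial calculation can be simulated directly. This sketch uses illustrative values p = 0.3 and n = 1000 (large enough that X_n = 0 essentially never occurs), for which the predicted asymptotic variance is (1 − p)/p = 7/3.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.3, 1_000, 20_000

# X_n ~ Binomial(n, p); apply g(t) = log(t) to the sample proportion X_n / n
x = rng.binomial(n, p, size=reps)
z = np.sqrt(n) * (np.log(x / n) - np.log(p))

# Delta method predicts asymptotic variance (1 - p) / p = 7/3
print(np.var(z))
```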


Alternative form

The delta method is often used in a form that is essentially identical to that above, but without the assumption that ''X_n'' or ''B'' is asymptotically normal. Often the only context is that the variance is "small". The results then just give approximations to the means and covariances of the transformed quantities. For example, the formulae presented in Klein (1953, p. 258) are:

:\begin{align}
\operatorname{Var}\left(h_r\right)&=\sum_i\left(\frac{\partial h_r}{\partial B_i}\right)^2\operatorname{Var}\left(B_i\right)+\sum_i\sum_{j\neq i}\left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_r}{\partial B_j}\right)\operatorname{Cov}\left(B_i,B_j\right)\\
\operatorname{Cov}\left(h_r,h_s\right)&=\sum_i\left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_i}\right)\operatorname{Var}\left(B_i\right)+\sum_i\sum_{j\neq i}\left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_j}\right)\operatorname{Cov}\left(B_i,B_j\right)
\end{align}

where h_r is the ''r''th element of ''h''(''B'') and ''B_i'' is the ''i''th element of ''B''.
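Klein's elementwise sums are the same quantity as the matrix sandwich form from the previous section. A minimal sketch, with an illustrative function h(B) = B₀·B₁ and made-up point estimates and covariance matrix:

```python
import numpy as np

# Illustrative inputs (not from the article): two estimates and their
# estimated covariance matrix -- no asymptotic normality is assumed here.
B = np.array([1.5, 0.8])
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
grad = np.array([B[1], B[0]])      # partial derivatives of h(B) = B_0 * B_1

# Klein-style double sum (variance terms plus i != j covariance terms) ...
var_klein = sum(grad[i] * grad[j] * cov[i, j]
                for i in range(2) for j in range(2))

# ... agrees with the matrix form  grad^T Cov grad
var_matrix = grad @ cov @ grad
print(var_klein, var_matrix)
```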


Second-order delta method

When ''g′''(''θ'') = 0 the delta method cannot be applied. However, if ''g″''(''θ'') exists and is not zero, the second-order delta method can be applied. By the Taylor expansion,

:n[g(X_n)-g(\theta)]=\frac{1}{2}\left[\sqrt{n}(X_n-\theta)\right]^2\left[g''(\theta)\right]+o_p(1),

so that the variance of g\left(X_n\right) relies on up to the 4th moment of X_n. The second-order delta method is also useful in conducting a more accurate approximation of g\left(X_n\right)'s distribution when sample size is small:

:\sqrt{n}[g(X_n)-g(\theta)]=\sqrt{n}[X_n-\theta]g'(\theta)+\frac{1}{2}\left[\sqrt{n}(X_n-\theta)\right]^2\frac{g''(\theta)}{\sqrt{n}}+o_p(1).

For example, when X_n follows the standard normal distribution, g\left(X_n\right) can be approximated as the weighted sum of a standard normal and a chi-square with degree-of-freedom of 1.
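The degenerate case is easy to simulate. In this illustrative sketch, X_n is the mean of n standard normals and g(x) = x² at θ = 0, so g′(0) = 0 but g″(0) = 2; the second-order delta method then gives n[g(X_n) − g(0)] → (σ²/2)·g″(0)·χ²₁ = χ²₁.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 20_000

# X_n = mean of n standard normals (theta = 0, sigma^2 = 1); g(x) = x^2
# has g'(0) = 0, so the first-order delta method degenerates, but the
# second-order version predicts n * g(X_n) -> chi-square with 1 d.o.f.
xbar = rng.normal(size=(reps, n)).mean(axis=1)
z = n * xbar**2

print(np.mean(z), np.var(z))   # chi^2_1 has mean 1 and variance 2
```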


Nonparametric delta method

A version of the delta method exists in nonparametric statistics. Let X_i\sim F be independent and identically distributed random variables with a sample of size ''n'' and an empirical distribution function \hat{F}_n, and let T be a functional. If T is Hadamard differentiable with respect to the Chebyshev metric, then

:\frac{T(\hat{F}_n)-T(F)}{\widehat{\operatorname{se}}}\,\xrightarrow{D}\,N(0,1)

where \widehat{\operatorname{se}}=\frac{\hat{\tau}}{\sqrt{n}} and \hat{\tau}^2=\frac{1}{n}\sum_{i=1}^n\hat{L}^2(X_i), with \hat{L}(x)=L_{\hat{F}_n}(\delta_x) denoting the empirical influence function for T. A nonparametric (1-\alpha) pointwise asymptotic confidence interval for T(F) is therefore given by

:T(\hat{F}_n)\pm z_{\alpha/2}\,\widehat{\operatorname{se}}

where z_{\alpha/2}=\Phi^{-1}(1-\alpha/2) denotes the upper \alpha/2 quantile of the standard normal. See Wasserman (2006) p. 19f. for details and examples.
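For the simplest functional, T(F) = mean of F, the empirical influence function is x − T(\hat F_n), and the recipe above reduces to the usual normal-approximation interval. A minimal sketch with illustrative data (500 draws from Exp(2)):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=500)

# Functional T(F) = mean of F; its empirical influence function is
# L_hat(x_i) = x_i - T(F_hat_n), so tau_hat^2 is the plug-in variance.
T_hat = x.mean()
L_hat = x - T_hat
se_hat = np.sqrt(np.mean(L_hat**2) / len(x))

z = 1.96                         # approximate 0.975-quantile of N(0, 1)
lo, hi = T_hat - z * se_hat, T_hat + z * se_hat
print(f"95% CI for the mean: ({lo:.3f}, {hi:.3f})")
```

For other functionals (quantiles, trimmed means, ...), only the influence function changes; the interval construction is identical.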


See also

* Taylor expansions for the moments of functions of random variables
* Variance-stabilizing transformation




External links

* Xu, Jun; Long, J. Scott (August 22, 2005). "Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes". Lecture notes, Indiana University. http://www.indiana.edu/~jslsoc/stata/ci_computations/spost_deltaci.pdf