Delta method

In statistics, the delta method is a method of deriving the asymptotic distribution of a random variable. It is applicable when the random variable being considered can be defined as a differentiable function of a random variable which is asymptotically Gaussian.


History

The delta method was derived from propagation of error, and the idea behind it was known in the early 20th century. Its statistical application can be traced as far back as 1928 by T. L. Kelley. A formal description of the method was presented by J. L. Doob in 1935. Robert Dorfman also described a version of it in 1938.


Univariate delta method

While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables X_n satisfying

\sqrt{n}\,[X_n - \theta] \,\xrightarrow{D}\, N(0, \sigma^2),

where \theta and \sigma^2 are finite-valued constants and \xrightarrow{D} denotes convergence in distribution, then

\sqrt{n}\,[g(X_n) - g(\theta)] \,\xrightarrow{D}\, N\left(0, \sigma^2\,[g'(\theta)]^2\right)

for any function g satisfying the property that its first derivative, evaluated at \theta, g'(\theta), exists and is non-zero valued. The intuition of the delta method is that any such function g, over a "small enough" range, can be approximated via a first-order Taylor series (which is essentially a linear function). If the random variable is roughly normal, then a linear transformation of it is also normal. A small range can be achieved when approximating the function around the mean, provided the variance is "small enough". When g is applied to a random variable such as the mean, the delta method tends to work better as the sample size increases, since a larger sample reduces the variance, and so the Taylor approximation is applied over a smaller range of g around the point of interest.
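The first-order result above can be checked numerically. The sketch below uses illustrative choices not taken from the article (g(x) = log x, and X_n the mean of n draws from N(\theta, \sigma^2)), and compares the delta-method variance \sigma^2 [g'(\theta)]^2 with the variance observed in simulation:

```python
import numpy as np

# Monte Carlo check of the univariate delta method. Illustrative choices:
# theta = 2, sigma = 1, g(x) = log x, so g'(theta) = 1/theta.
rng = np.random.default_rng(0)
theta, sigma, n, reps = 2.0, 1.0, 500, 20000

# X_n = mean of n draws from N(theta, sigma^2), so sqrt(n)(X_n - theta) ~ N(0, sigma^2)
xbar = rng.normal(theta, sigma, size=(reps, n)).mean(axis=1)

# Delta method: sqrt(n)(g(X_n) - g(theta)) -> N(0, sigma^2 * g'(theta)^2)
predicted_var = sigma**2 * (1.0 / theta) ** 2
empirical_var = np.var(np.sqrt(n) * (np.log(xbar) - np.log(theta)))
```

With these values the predicted asymptotic variance is 0.25, and the simulated variance should land close to it.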


Proof in the univariate case

Demonstration of this result is fairly straightforward under the assumption that g is differentiable in a neighborhood of \theta and g' is continuous at \theta with g'(\theta) \neq 0. To begin, we use the mean value theorem (i.e. the first-order approximation of a Taylor series using Taylor's theorem):

g(X_n) = g(\theta) + g'(\tilde\theta)(X_n - \theta),

where \tilde\theta lies between X_n and \theta. Note that since X_n \,\xrightarrow{P}\, \theta and |\tilde\theta - \theta| < |X_n - \theta|, it must be that \tilde\theta \,\xrightarrow{P}\, \theta, and since g' is continuous, applying the continuous mapping theorem yields

g'(\tilde\theta) \,\xrightarrow{P}\, g'(\theta),

where \xrightarrow{P} denotes convergence in probability. Rearranging the terms and multiplying by \sqrt{n} gives

\sqrt{n}\,[g(X_n) - g(\theta)] = g'(\tilde\theta)\,\sqrt{n}\,[X_n - \theta].

Since

\sqrt{n}\,[X_n - \theta] \,\xrightarrow{D}\, N(0, \sigma^2)

by assumption, it follows immediately from appeal to Slutsky's theorem that

\sqrt{n}\,[g(X_n) - g(\theta)] \,\xrightarrow{D}\, N(0, \sigma^2\,[g'(\theta)]^2).

This concludes the proof.


Proof with an explicit order of approximation

Alternatively, one can add one more step at the end to obtain the order of approximation:

\begin{aligned}
\sqrt{n}\,[g(X_n) - g(\theta)] &= g'(\tilde\theta)\,\sqrt{n}\,[X_n - \theta] \\
&= \sqrt{n}\,[X_n - \theta]\left[g'(\tilde\theta) + g'(\theta) - g'(\theta)\right] \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + \sqrt{n}\,[X_n - \theta]\left[g'(\tilde\theta) - g'(\theta)\right] \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + O_p(1) \cdot o_p(1) \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + o_p(1)
\end{aligned}

This suggests that the error in the approximation converges to 0 in probability.


Multivariate delta method

By definition, a consistent estimator B converges in probability to its true value \beta, and often a central limit theorem can be applied to obtain asymptotic normality:

\sqrt{n}\,(B - \beta) \,\xrightarrow{D}\, N(0, \Sigma),

where n is the number of observations and \Sigma is a (symmetric positive semi-definite) covariance matrix. Suppose we want to estimate the variance of a scalar-valued function h of the estimator B. Keeping only the first two terms of the Taylor series, and using vector notation for the gradient, we can estimate h(B) as

h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B - \beta),

which implies the variance of h(B) is approximately

\begin{aligned}
\operatorname{Var}\left(h(B)\right) &\approx \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot (B - \beta)\right) \\
&= \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta\right) \\
&= \operatorname{Var}\left(\nabla h(\beta)^T \cdot B\right) \\
&= \nabla h(\beta)^T \cdot \operatorname{Var}(B) \cdot \nabla h(\beta) \\
&= \nabla h(\beta)^T \cdot \frac{\Sigma}{n} \cdot \nabla h(\beta)
\end{aligned}

One can use the mean value theorem (for real-valued functions of many variables) to see that this does not rely on taking a first-order approximation. The delta method therefore implies that

\sqrt{n}\,\left(h(B) - h(\beta)\right) \,\xrightarrow{D}\, N\left(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)\right),

or in univariate terms,

\sqrt{n}\,\left(h(B) - h(\beta)\right) \,\xrightarrow{D}\, N\left(0, \sigma^2 \cdot \left(h'(\beta)\right)^2\right).


Example: the binomial proportion

Suppose X_n is binomial with parameters p \in (0,1] and n. Since

\sqrt{n}\left[\frac{X_n}{n} - p\right] \,\xrightarrow{D}\, N(0, p(1-p)),

we can apply the delta method with g(\theta) = \log(\theta) to see

\sqrt{n}\left[\log\left(\frac{X_n}{n}\right) - \log(p)\right] \,\xrightarrow{D}\, N\left(0, p(1-p)\left[\frac{1}{p}\right]^2\right).

Hence, even though for any finite n the variance of \log\left(\frac{X_n}{n}\right) does not actually exist (since X_n can be zero), the asymptotic variance of \log\left(\frac{X_n}{n}\right) does exist and is equal to

\frac{1-p}{p}.

Note that since p > 0, \Pr\left(\frac{X_n}{n} > 0\right) \rightarrow 1 as n \rightarrow \infty, so with probability converging to one, \log\left(\frac{X_n}{n}\right) is finite for large n. Moreover, if \hat p and \hat q are estimates of different group rates from independent samples of sizes n and m respectively, then the logarithm of the estimated relative risk \frac{\hat p}{\hat q} has asymptotic variance equal to

\frac{1-p}{p\,n} + \frac{1-q}{q\,m}.

This is useful to construct a hypothesis test or to make a confidence interval for the relative risk.


Alternative form

The delta method is often used in a form that is essentially identical to that above, but without the assumption that X_n or B is asymptotically normal. Often the only context is that the variance is "small". The results then just give approximations to the means and covariances of the transformed quantities. For example, the formulae presented in Klein (1953, p. 258) are:

\begin{aligned}
\operatorname{Var}(h_r) &= \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)^2 \operatorname{Var}(B_i) + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_r}{\partial B_j}\right) \operatorname{Cov}(B_i, B_j) \\
\operatorname{Cov}(h_r, h_s) &= \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_i}\right) \operatorname{Var}(B_i) + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_j}\right) \operatorname{Cov}(B_i, B_j)
\end{aligned}

where h_r is the r-th element of h(B) and B_i is the i-th element of B.
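Klein's double-sum formulas are the elementwise form of the matrix product J \operatorname{Cov}(B) J^T, where J is the Jacobian of h. The sketch below uses an arbitrary two-component h and an assumed covariance matrix for B, and checks one entry both ways:

```python
import numpy as np

# Illustrative h(B) = (B1*B2, B1/B2) with an assumed covariance for B = (B1, B2).
B = np.array([3.0, 2.0])
cov_B = np.array([[0.04, 0.01],
                  [0.01, 0.09]])

# Jacobian of h at B: rows are the gradients of h_1 = B1*B2 and h_2 = B1/B2
J = np.array([[B[1], B[0]],
              [1.0 / B[1], -B[0] / B[1] ** 2]])

cov_h = J @ cov_B @ J.T              # approximate covariance matrix of h(B)

# Klein's double-sum form for Var(h_1), written out explicitly
var_h1 = sum(J[0, i] ** 2 * cov_B[i, i] for i in range(2)) \
       + sum(J[0, i] * J[0, j] * cov_B[i, j]
             for i in range(2) for j in range(2) if i != j)
```

The scalar sum and the matrix entry agree, which is why in practice the matrix form is usually preferred.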


Second-order delta method

When g'(\theta) = 0 the delta method cannot be applied. However, if g''(\theta) exists and is not zero, the second-order delta method can be applied. By the Taylor expansion,

n\,[g(X_n) - g(\theta)] = \frac{n\,[X_n - \theta]^2}{2}\left[g''(\theta)\right] + o_p(1),

so that the variance of g\left(X_n\right) relies on up to the 4th moment of X_n. The second-order delta method is also useful for a more accurate approximation of the distribution of g\left(X_n\right) when the sample size is small:

\sqrt{n}\,[g(X_n) - g(\theta)] = \sqrt{n}\,[X_n - \theta]\,g'(\theta) + \frac{\sqrt{n}}{2}\,[X_n - \theta]^2\,g''(\theta) + o_p(1).

For example, when X_n follows the standard normal distribution, g\left(X_n\right) can be approximated as the weighted sum of a standard normal and a chi-square with 1 degree of freedom.
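The degenerate case g'(\theta) = 0 can be illustrated with the arbitrary choice g(x) = x^2 and \theta = 0, where g''(0) = 2 and the second-order result gives n\,[g(X_n) - g(0)] converging to (\sigma^2/2)\,g''(0)\,\chi^2_1; with \sigma = 1 this is a plain chi-square with 1 degree of freedom:

```python
import numpy as np

# Second-order delta method at a critical point: g(x) = x^2, theta = 0,
# so g'(0) = 0 and g''(0) = 2. X_n is a mean of n standard normals.
rng = np.random.default_rng(3)
n, reps = 400, 20000

xbar = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)
scaled = n * xbar**2                 # should behave like chi-square with 1 d.o.f.

mean_scaled = scaled.mean()          # chi2_1 has mean 1
var_scaled = np.var(scaled)          # chi2_1 has variance 2
```

Here the limit is in fact exact for every n, since xbar is exactly normal with variance 1/n.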


Nonparametric delta method

A version of the delta method exists in nonparametric statistics. Let X_i \sim F be independent and identically distributed random variables with a sample of size n and empirical distribution function \hat{F}_n, and let T be a functional. If T is Hadamard differentiable with respect to the Chebyshev metric, then

\frac{T(\hat{F}_n) - T(F)}{\widehat{\mathrm{se}}} \,\xrightarrow{D}\, N(0, 1),

where \widehat{\mathrm{se}} = \frac{\hat\tau}{\sqrt{n}} and \hat\tau^2 = \frac{1}{n}\sum_{i=1}^n \hat{L}^2(X_i), with \hat{L}(x) = L_{\hat{F}_n}(\delta_x) denoting the empirical influence function for T. A nonparametric (1-\alpha) pointwise asymptotic confidence interval for T(F) is therefore given by

T(\hat{F}_n) \pm z_{1-\alpha/2}\, \widehat{\mathrm{se}},

where z_q denotes the q-quantile of the standard normal. See Wasserman (2006) p. 19f. for details and examples.
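For the simple functional T(F) equal to the mean of F, the empirical influence function is \hat{L}(x) = x - T(\hat{F}_n), and the interval above reduces to the familiar normal interval for a mean. A brief sketch, with the data distribution an arbitrary choice:

```python
import numpy as np

# Nonparametric delta method for the mean functional. The empirical
# influence function for the mean is L(x) = x - mean of the sample.
rng = np.random.default_rng(4)
x = rng.exponential(1.0, size=500)
n = len(x)

T_hat = x.mean()
L_hat = x - T_hat                    # empirical influence function values
tau2_hat = np.mean(L_hat**2)
se_hat = np.sqrt(tau2_hat / n)

z = 1.959964                         # ~0.975-quantile of the standard normal
ci = (T_hat - z * se_hat, T_hat + z * se_hat)
```

Note that tau2_hat here is just the (biased) sample variance, so the interval matches the usual large-sample interval for a mean.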


See also

* Taylor expansions for the moments of functions of random variables
* Variance-stabilizing transformation




External links

* Feiveson, Alan H. "Explanation of the delta method". Stata Corp. https://www.stata.com/support/faqs/statistics/delta-method/