Two-step M-estimators

Two-step M-estimation deals with M-estimation problems that require preliminary estimation to obtain the parameter of interest. Two-step M-estimation differs from the usual M-estimation problem because the asymptotic distribution of the second-step estimator generally depends on the first-step estimator. Accounting for this change in the asymptotic distribution is important for valid inference.


Description

The class of two-step M-estimators includes Heckman's sample selection estimator, weighted non-linear least squares, and ordinary least squares with generated regressors (Wooldridge 2002).

To fix ideas, let \{W_i\}_{i=1}^n \subseteq R^d be an i.i.d. sample. \Theta and \Gamma are subsets of Euclidean spaces R^p and R^q, respectively. Given a function m(\cdot,\cdot,\cdot): R^d \times \Theta \times \Gamma \rightarrow R, the two-step M-estimator \hat\theta is defined as:

:\hat\theta := \arg\max_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^n m(W_i, \theta, \hat\gamma)

where \hat\gamma is an M-estimate of a nuisance parameter that needs to be calculated in the first step.

Consistency of two-step M-estimators can be verified by checking the consistency conditions for usual M-estimators, although some modification might be necessary. In practice, the important condition to check is the identification condition: if \hat\gamma \rightarrow \gamma^*, where \gamma^* is a non-random vector, then the identification condition is that \mathrm{E}[m(W_i, \theta, \gamma^*)] has a unique maximizer over \Theta.
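
To illustrate, the following minimal sketch in Python implements one member of this class, a feasible weighted least squares estimator; the data-generating process and all variable names are hypothetical, chosen only for the demonstration. The first step computes an M-estimate \hat\gamma of the nuisance parameter indexing the conditional variance; the second step maximizes the sample average of m(W_i, \theta, \hat\gamma) = -(y_i - x_i'\theta)^2 \exp(-x_i'\hat\gamma), whose maximizer has a closed form.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])

    # Heteroskedastic errors: Var(u | x) = exp(x'gamma0) with gamma0 = (0.2, 0.6).
    u = rng.normal(size=n) * np.exp(0.5 * (X @ np.array([0.2, 0.6])))
    y = X @ np.array([1.0, 2.0]) + u            # true theta0 = (1, 2)

    # First step: M-estimate of the nuisance parameter gamma, from a regression
    # of log squared OLS residuals on X.  The intercept of this regression is
    # biased, but that only rescales the weights and leaves the second step
    # unaffected.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    log_e2 = np.log((y - X @ beta_ols) ** 2 + 1e-12)
    gamma_hat, *_ = np.linalg.lstsq(X, log_e2, rcond=None)

    # Second step: maximize (1/n) sum_i m(W_i, theta, gamma_hat), i.e. weighted
    # least squares with weights w_i = exp(-x_i'gamma_hat); closed-form argmax.
    w = np.exp(-(X @ gamma_hat))
    Xw = X * w[:, None]
    theta_hat = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    print(theta_hat)                             # close to (1, 2) in large samples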


Asymptotic distribution

Under regularity conditions, two-step M-estimators are asymptotically normal. An important point to note is that the asymptotic variance of a two-step M-estimator is generally not the same as that of the usual M-estimator in which the first-step estimation is not necessary (Newey and McFadden 1994). This fact is intuitive because \hat\gamma is a random object and its variability should influence the estimation of \theta. However, there exists a special case in which the asymptotic variance of the two-step M-estimator takes the same form as if there were no first-step estimation procedure. This special case occurs if:

:\mathrm{E}\left[\frac{\partial^2}{\partial\gamma\,\partial\theta} m(W_i, \theta_0, \gamma^*)\right] = 0

where \theta_0 is the true value of \theta and \gamma^* is the probability limit of \hat\gamma. To interpret this condition, first note that under regularity conditions \mathrm{E}\left[\frac{\partial}{\partial\theta} m(W_i, \theta_0, \gamma^*)\right] = 0, since \theta_0 is the maximizer of \mathrm{E}[m(W_i, \theta, \gamma^*)]. The condition above thus implies that a small perturbation in \gamma has no impact on the first-order condition. Hence, in large samples, the variability of \hat\gamma does not affect the argmax of the objective function, which explains the invariance of the asymptotic variance. Of course, this result is valid only as the sample size tends to infinity, so the finite-sample properties could be quite different.
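
The feasible weighted least squares sketch above happens to satisfy this special-case condition: at (\theta_0, \gamma^*) the cross-derivative \partial^2 m / \partial\gamma\,\partial\theta is proportional to the regression error, so its expectation vanishes. The following Monte Carlo sketch (same hypothetical setup as before) illustrates the resulting invariance by comparing the sampling spread of the second-step slope estimate when \gamma is estimated with the spread when it is known.

    import numpy as np

    rng = np.random.default_rng(1)

    def wls(X, y, w):
        # Weighted least squares: argmax of -sum_i w_i * (y_i - x_i'theta)^2.
        Xw = X * w[:, None]
        return np.linalg.solve(X.T @ Xw, Xw.T @ y)

    n, reps = 500, 2000
    gamma0 = np.array([0.2, 0.6])
    slope_feasible, slope_oracle = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        u = rng.normal(size=n) * np.exp(0.5 * (X @ gamma0))
        y = X @ np.array([1.0, 2.0]) + u
        # First step: estimate gamma from the log squared OLS residuals.
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        g, *_ = np.linalg.lstsq(X, np.log((y - X @ b) ** 2 + 1e-12), rcond=None)
        slope_feasible.append(wls(X, y, np.exp(-(X @ g)))[1])
        slope_oracle.append(wls(X, y, np.exp(-(X @ gamma0)))[1])

    # The two sampling standard deviations are nearly identical, as the
    # special case predicts.
    print(np.std(slope_feasible), np.std(slope_oracle))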


Involving MLE

When the first step is a maximum likelihood estimator, under some assumptions a two-step M-estimator is more asymptotically efficient (i.e. has smaller asymptotic variance) than an M-estimator with known first-step parameter. Consistency and asymptotic normality
of the estimator follow from the general result on two-step M-estimators (Wooldridge 2002).

Let \{(V_i, W_i, Z_i)\}_{i=1}^n be a random sample and define the second-step M-estimator \hat\theta as:

:\hat\theta := \arg\max_{\theta \in \Theta} \sum_{i=1}^n m(v_i, w_i, z_i; \theta, \hat\gamma)

where \hat\gamma is the parameter estimated by maximum likelihood in the first step. For the MLE,

:\hat\gamma := \arg\max_{\gamma \in \Gamma} \sum_{i=1}^n \log f(v_i \mid z_i; \gamma)

where f is the conditional density of V given Z. Now, suppose that given Z, V is conditionally independent of W. This is called the conditional independence assumption or selection on observables (Heckman and Robb 1985). Intuitively, this condition means that Z is a good predictor of V, so that once conditioned on Z, V has no systematic dependence on W. Under the conditional independence assumption, the asymptotic variance of the two-step estimator is:

:\mathrm{E}[\nabla_\theta s(\theta_0, \gamma_0)]^{-1}\, \mathrm{E}[g(\theta_0, \gamma_0)\, g(\theta_0, \gamma_0)']\, \mathrm{E}[\nabla_\theta s(\theta_0, \gamma_0)]^{-1}

where

:\begin{align} g(\theta,\gamma) &:= s(\theta,\gamma) + \mathrm{E}[\nabla_\gamma s(\theta,\gamma)]\, \mathrm{E}[d(\gamma)\, d(\gamma)']^{-1}\, d(\gamma) \\ s(\theta,\gamma) &:= \nabla_\theta m(V, W, Z; \theta, \gamma) \\ d(\gamma) &:= \nabla_\gamma \log f(V \mid Z; \gamma) \end{align}

and \nabla denotes the partial derivative with respect to a row vector. In the case where \gamma_0 is known, the asymptotic variance is:

:\mathrm{E}[\nabla_\theta s(\theta_0, \gamma_0)]^{-1}\, \mathrm{E}[s(\theta_0, \gamma_0)\, s(\theta_0, \gamma_0)']\, \mathrm{E}[\nabla_\theta s(\theta_0, \gamma_0)]^{-1}

Under conditional independence, the generalized information matrix equality \mathrm{E}[s\, d'] = -\mathrm{E}[\nabla_\gamma s] holds, which gives \mathrm{E}[g\, g'] = \mathrm{E}[s\, s'] - \mathrm{E}[\nabla_\gamma s]\, \mathrm{E}[d\, d']^{-1}\, \mathrm{E}[\nabla_\gamma s]'. Therefore, unless \mathrm{E}[\nabla_\gamma s(\theta, \gamma)] = 0, the two-step M-estimator is more efficient than the usual M-estimator. This fact suggests that even when \gamma_0 is known a priori, there is an efficiency gain from estimating \gamma by MLE. An application of this result can be found, for example, in treatment effect estimation.
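
A minimal simulation sketch of this efficiency gain, in the spirit of treatment effect estimation: the outcome Y is observed only when V = 1, selection depends on Z alone (conditional independence), and the second step is an inverse-probability-weighted mean. The setup and all names below are hypothetical. Weighting by first-step logit MLE probabilities yields a smaller sampling spread than weighting by the true probabilities.

    import numpy as np

    rng = np.random.default_rng(2)

    def logit_mle(Z, v, iters=25):
        # First-step MLE: Newton-Raphson for a logit model P(V=1|Z) = L(Z'gamma).
        g = np.zeros(Z.shape[1])
        for _ in range(iters):
            p = 1.0 / (1.0 + np.exp(-(Z @ g)))
            H = Z.T @ (Z * (p * (1.0 - p))[:, None])   # negative Hessian
            g += np.linalg.solve(H, Z.T @ (v - p))     # score step
        return g

    n, reps, theta0 = 2000, 2000, 1.0
    gamma0 = np.array([0.3, 1.0])
    est_mle, est_known = [], []
    for _ in range(reps):
        z = rng.normal(size=n)
        Z = np.column_stack([np.ones(n), z])
        p = 1.0 / (1.0 + np.exp(-(Z @ gamma0)))        # true selection probability
        v = (rng.uniform(size=n) < p).astype(float)    # V = 1 when Y is observed
        y = theta0 + z + rng.normal(size=n)            # Y independent of V given Z
        p_hat = 1.0 / (1.0 + np.exp(-(Z @ logit_mle(Z, v))))
        # Second step: argmax of -sum_i v_i (y_i - theta)^2 / p_i, closed form.
        est_mle.append(np.sum(v * y / p_hat) / np.sum(v / p_hat))
        est_known.append(np.sum(v * y / p) / np.sum(v / p))

    # The estimator built on the first-step MLE has the smaller spread.
    print(np.std(est_mle), np.std(est_known))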


Examples

* Generated regressor
* Heckman correction
* Feasible generalized least squares
* Two-step feasible generalized method of moments


See also

* Adaptive estimator


References

* Heckman, J.J., and R. Robb (1985). "Alternative Methods for Evaluating the Impact of Interventions: An Overview". Journal of Econometrics, 30, 239–267.
* Newey, W.K., and D. McFadden (1994). "Large Sample Estimation and Hypothesis Testing". In R. Engle and D. McFadden (eds.), Handbook of Econometrics, Vol. 4. Amsterdam: North-Holland.
* Wooldridge, J.M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, Mass.: MIT Press.