Two-step M-estimator

Two-step M-estimators deal with M-estimation problems that require preliminary estimation to obtain the parameter of interest. Two-step M-estimation differs from the usual M-estimation problem because the asymptotic distribution of the second-step estimator generally depends on the first-step estimator. Accounting for this change in asymptotic distribution is important for valid inference.


Description

The class of two-step M-estimators includes Heckman's sample selection estimator, weighted non-linear least squares, and ordinary least squares with generated regressors (Wooldridge, Econometric Analysis of Cross Section and Panel Data). To fix ideas, let \{W_i\}_{i=1}^n \subseteq \mathbb{R}^d be an i.i.d. sample. \Theta and \Gamma are subsets of the Euclidean spaces \mathbb{R}^p and \mathbb{R}^q, respectively. Given a function m(\cdot,\cdot,\cdot): \mathbb{R}^d \times \Theta \times \Gamma \rightarrow \mathbb{R}, the two-step M-estimator \hat\theta is defined as:

:\hat\theta := \arg\max_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} m\bigl(W_i, \theta, \hat\gamma\bigr)

where \hat\gamma is an M-estimate of a nuisance parameter that needs to be calculated in the first step.
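As a concrete illustration, ordinary least squares with a generated regressor fits this template: the first step estimates the nuisance parameter \gamma by a preliminary regression, and the second step maximizes a least-squares objective that plugs in \hat\gamma. The sketch below is illustrative only; all data-generating values and variable names are assumptions made up for the example.

```python
# Illustrative sketch of a two-step M-estimator: OLS with a generated
# regressor. All data-generating values here are made-up assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                    # first-step regressor
x = 2.0 + 1.5 * z + rng.normal(size=n)    # observed regressor
y = 1.0 + 0.5 * x + rng.normal(size=n)    # outcome; true theta = (1.0, 0.5)

# Step 1: estimate the nuisance parameter gamma by OLS of x on z.
Z = np.column_stack([np.ones(n), z])
gamma_hat = np.linalg.lstsq(Z, x, rcond=None)[0]
x_gen = Z @ gamma_hat                     # generated regressor

# Step 2: M-estimation (here least squares) of theta, plugging in gamma_hat.
X = np.column_stack([np.ones(n), x_gen])
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta_hat)                          # roughly [1.0, 0.5]
```

Here the least-squares objective plays the role of m, and the first-step OLS coefficients play the role of \hat\gamma.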
Consistency of two-step M-estimators can be verified by checking the consistency conditions for usual M-estimators, although some modification might be necessary. In practice, the important condition to check is the identification condition. If \hat\gamma \rightarrow \gamma^* in probability, where \gamma^* is a non-random vector, then the identification condition is that E\bigl[m(W_i, \theta, \gamma^*)\bigr] has a unique maximizer over \Theta.


Asymptotic distribution

Under regularity conditions, two-step M-estimators are asymptotically normal. An important point to note is that the asymptotic variance of a two-step M-estimator is generally not the same as that of the usual M-estimator in which the first-step estimation is not necessary (Newey and McFadden, Handbook of Econometrics, Vol. 4). This fact is intuitive because \hat\gamma is a random object and its variability should influence the estimation of \theta. However, there exists a special case in which the asymptotic variance of the two-step M-estimator takes the form as if there were no first-step estimation procedure. This special case occurs if:

:E\bigl[\nabla_\gamma \nabla_\theta m(W_i, \theta_0, \gamma^*)\bigr] = 0

where \theta_0 is the true value of \theta and \gamma^* is the probability limit of \hat\gamma. To interpret this condition, first note that under regularity conditions, E\bigl[\nabla_\theta m(W_i, \theta_0, \gamma^*)\bigr] = 0, since \theta_0 is the maximizer of E\bigl[m(W_i, \theta, \gamma^*)\bigr]. So the condition above implies that a small perturbation in \gamma has no impact on the first-order condition. Thus, in large samples, the variability of \hat\gamma does not affect the argmax of the objective function, which explains the invariance of the asymptotic variance. Of course, this result is valid only as the sample size tends to infinity, so the finite-sample properties could be quite different.
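The role of the first step can be made explicit with a heuristic first-order expansion of the second-step first-order condition around (\theta_0, \gamma^*) (a sketch under standard regularity conditions, not a rigorous proof):

```latex
0 = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \nabla_\theta m(W_i, \hat\theta, \hat\gamma)
  \approx \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \nabla_\theta m(W_i, \theta_0, \gamma^*)
  + \mathrm{E}\bigl[\nabla_{\theta\theta} m\bigr]\,\sqrt{n}\,(\hat\theta - \theta_0)
  + \mathrm{E}\bigl[\nabla_{\gamma}\nabla_{\theta} m\bigr]\,\sqrt{n}\,(\hat\gamma - \gamma^*)
```

Solving for \sqrt{n}(\hat\theta - \theta_0) shows that it carries a term proportional to \sqrt{n}(\hat\gamma - \gamma^*); this first-step contribution to the asymptotic variance drops out exactly when E\bigl[\nabla_\gamma \nabla_\theta m(W_i, \theta_0, \gamma^*)\bigr] = 0.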


Involving MLE

When the first step is a maximum likelihood estimator (MLE), under some assumptions, the two-step M-estimator is asymptotically more efficient (i.e., has smaller asymptotic variance) than the M-estimator with known first-step parameter. Consistency and asymptotic normality of the estimator follow from the general result on two-step M-estimators (Wooldridge, Econometric Analysis of Cross Section and Panel Data). Let \{V_i, W_i, Z_i\}_{i=1}^n be a random sample and let the second-step M-estimator \hat\theta be:

:\hat\theta := \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} m(v_i, w_i, z_i; \theta, \hat\gamma)

where \hat\gamma is the parameter estimated by maximum likelihood in the first step. For the MLE,

:\hat\gamma := \arg\max_{\gamma \in \Gamma} \sum_{i=1}^{n} \log f(v_i : z_i, \gamma)

where f is the conditional density of V given Z. Now, suppose that given Z, V is conditionally independent of W. This is called the conditional independence assumption or selection on observables (Heckman and Robb, 1985). Intuitively, this condition means that Z is a good predictor of V, so that once conditioned on Z, V has no systematic dependence on W. Under the conditional independence assumption, the asymptotic variance of the two-step estimator is:

:E\bigl[\nabla_\theta s(\theta_0, \gamma_0)\bigr]^{-1} E\bigl[g(\theta_0, \gamma_0)\, g(\theta_0, \gamma_0)^{\mathrm T}\bigr] E\bigl[\nabla_\theta s(\theta_0, \gamma_0)\bigr]^{-1}

where

:\begin{align} g(\theta, \gamma) &:= s(\theta, \gamma) - E\bigl[s(\theta, \gamma)\, d(\gamma)^{\mathrm T}\bigr] E\bigl[d(\gamma)\, d(\gamma)^{\mathrm T}\bigr]^{-1} d(\gamma) \\ s(\theta, \gamma) &:= \nabla_\theta m(V, W, Z; \theta, \gamma) \\ d(\gamma) &:= \nabla_\gamma \log f(V : Z, \gamma) \end{align}

and \nabla represents the partial derivative with respect to a row vector. In the case where \gamma_0 is known, the asymptotic variance is

:E\bigl[\nabla_\theta s(\theta_0, \gamma_0)\bigr]^{-1} E\bigl[s(\theta_0, \gamma_0)\, s(\theta_0, \gamma_0)^{\mathrm T}\bigr] E\bigl[\nabla_\theta s(\theta_0, \gamma_0)\bigr]^{-1}

and therefore, unless E\bigl[s(\theta, \gamma)\, d(\gamma)^{\mathrm T}\bigr] = 0, the two-step M-estimator is more efficient than the M-estimator with known \gamma_0. This fact suggests that even when \gamma_0 is known a priori, there is an efficiency gain from estimating \gamma by MLE. An application of this result can be found, for example, in treatment effect estimation.
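The variance formula lends itself to a plug-in estimate using sample analogues. The sketch below assumes the user's model code already supplies the second-step scores s_i, the first-step ML scores d_i, and an estimate H of E[\nabla_\theta s]; the function name and array shapes are illustrative assumptions.

```python
# Plug-in estimate of the corrected asymptotic variance under the
# conditional independence assumption. The inputs (second-step scores s,
# first-step ML scores d, and Hessian estimate H) are assumed to come
# from the user's model code; names and shapes are illustrative.
import numpy as np

def two_step_avar(s, d, H):
    """Estimate the asymptotic variance of sqrt(n) * (theta_hat - theta_0).

    s: (n, p) array of second-step scores s_i = grad_theta m.
    d: (n, q) array of first-step ML scores d_i = grad_gamma log f.
    H: (p, p) estimate of E[grad_theta s].
    """
    n = s.shape[0]
    A = (s.T @ d) / n                       # sample analogue of E[s d']
    B = (d.T @ d) / n                       # sample analogue of E[d d']
    # Residualise the second-step score on the first-step score:
    # g_i = s_i - E[s d'] E[d d']^{-1} d_i.
    g = s - d @ np.linalg.solve(B, A.T)
    Hinv = np.linalg.inv(H)
    return Hinv @ ((g.T @ g) / n) @ Hinv.T  # sandwich formula
```

Because g residualises s on d, the middle term E[g g'] is weakly smaller than E[s s'] in the matrix sense, which is the source of the efficiency gain described above.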


Examples

* Generated regressor
* Heckman correction
* Feasible generalized least squares
* Two-step feasible generalized method of moments
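Feasible generalized least squares, for instance, follows the two-step pattern directly: the first step estimates a model of the error variance (the nuisance parameter), and the second step runs weighted least squares with the estimated weights. The sketch below is illustrative; the variance model and all numerical values are assumptions.

```python
# Feasible GLS as a two-step M-estimator (illustrative sketch; the
# variance model and all numerical values are assumptions).
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(1.0, 3.0, size=n)
sigma2 = 0.5 * x**2                              # heteroskedastic error variance
y = 2.0 + 1.0 * x + rng.normal(size=n) * np.sqrt(sigma2)
X = np.column_stack([np.ones(n), x])

# Step 1: OLS residuals, then regress log(resid^2) on [1, log x] to
# estimate the nuisance parameter of the variance function.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_ols
G = np.column_stack([np.ones(n), np.log(x)])
gamma_hat = np.linalg.lstsq(G, np.log(resid**2), rcond=None)[0]
w = np.exp(-(G @ gamma_hat))                     # estimated 1/sigma^2 (up to scale)

# Step 2: weighted least squares with the first-step weights.
Xw = X * w[:, None]
beta_fgls = np.linalg.solve(X.T @ Xw, Xw.T @ y)
print(beta_fgls)                                 # close to [2.0, 1.0]
```

Note that the weights only need to be proportional to 1/\sigma^2: a constant multiplicative offset in the first-step fit leaves the second-step estimates unchanged.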


See also

* Adaptive estimator


References

* Heckman, J.J., and R. Robb, 1985, "Alternative Methods for Evaluating the Impact of Interventions: An Overview", Journal of Econometrics, 30, 239–267.
* Newey, W.K., and D. McFadden, "Large Sample Estimation and Hypothesis Testing", in R. Engle and D. McFadden, eds., Handbook of Econometrics, Vol. 4, Amsterdam: North-Holland.
* Wooldridge, J.M., Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass.