Two-step M-estimator

Two-step M-estimators deal with M-estimation problems that require preliminary estimation to obtain the parameter of interest. Two-step M-estimation differs from the usual M-estimation problem because the asymptotic distribution of the second-step estimator generally depends on the first-step estimator. Accounting for this change in asymptotic distribution is important for valid inference.


Description

The class of two-step M-estimators includes Heckman's sample selection estimator, weighted non-linear least squares, and ordinary least squares with generated regressors (Wooldridge, Econometric Analysis of Cross Section and Panel Data). To fix ideas, let \{W_i\}_{i=1}^n \subseteq \mathbb{R}^d be an i.i.d. sample. \Theta and \Gamma are subsets of the Euclidean spaces \mathbb{R}^p and \mathbb{R}^q, respectively. Given a function m(\cdot,\cdot,\cdot): \mathbb{R}^d \times \Theta \times \Gamma \rightarrow \mathbb{R}, the two-step M-estimator \hat\theta is defined as:

:\hat\theta := \arg\max_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} m\bigl(W_i, \theta, \hat\gamma\bigr)

where \hat\gamma is an M-estimate of a nuisance parameter that needs to be calculated in the first step.
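As a concrete illustration, ordinary least squares with a generated regressor fits this template: the first step estimates the nuisance parameter \gamma by a preliminary regression, and the second step maximizes a least-squares objective that plugs in \hat\gamma. The sketch below is illustrative only; all data-generating values and variable names are assumptions made up for the example.

```python
# Illustrative sketch of a two-step M-estimator: OLS with a generated
# regressor. All data-generating values here are made-up assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                    # first-step regressor
x = 2.0 + 1.5 * z + rng.normal(size=n)    # observed regressor
y = 1.0 + 0.5 * x + rng.normal(size=n)    # outcome; true theta = (1.0, 0.5)

# Step 1: estimate the nuisance parameter gamma by OLS of x on z.
Z = np.column_stack([np.ones(n), z])
gamma_hat = np.linalg.lstsq(Z, x, rcond=None)[0]
x_gen = Z @ gamma_hat                     # generated regressor

# Step 2: M-estimation (here least squares) of theta, plugging in gamma_hat.
X = np.column_stack([np.ones(n), x_gen])
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta_hat)                          # roughly [1.0, 0.5]
```

Here the least-squares objective plays the role of m, and the first-step OLS coefficients play the role of \hat\gamma.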
Consistency of two-step M-estimators can be verified by checking the consistency conditions for usual M-estimators, although some modification might be necessary. In practice, the important condition to check is the identification condition. If \hat\gamma \rightarrow \gamma^* in probability, where \gamma^* is a non-random vector, then the identification condition is that E\bigl[m(W_i, \theta, \gamma^*)\bigr] has a unique maximizer over \Theta.


Asymptotic distribution

Under regularity conditions, two-step M-estimators are asymptotically normal. An important point to note is that the asymptotic variance of a two-step M-estimator is generally not the same as that of the usual M-estimator in which the first-step estimation is not necessary (Newey and McFadden, Handbook of Econometrics, Vol. 4). This fact is intuitive because \hat\gamma is a random object and its variability should influence the estimation of \theta. However, there exists a special case in which the asymptotic variance of the two-step M-estimator takes the form as if there were no first-step estimation procedure. This special case occurs if:

:E\bigl[\nabla_\gamma \nabla_\theta m(W_i, \theta_0, \gamma^*)\bigr] = 0

where \theta_0 is the true value of \theta and \gamma^* is the probability limit of \hat\gamma. To interpret this condition, first note that under regularity conditions, E\bigl[\nabla_\theta m(W_i, \theta_0, \gamma^*)\bigr] = 0, since \theta_0 is the maximizer of E\bigl[m(W_i, \theta, \gamma^*)\bigr]. So the condition above implies that a small perturbation in \gamma has no impact on the first-order condition. Thus, in large samples, the variability of \hat\gamma does not affect the argmax of the objective function, which explains the invariance of the asymptotic variance. Of course, this result is valid only as the sample size tends to infinity, so the finite-sample properties could be quite different.
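The role of the first step can be made explicit with a heuristic first-order expansion of the second-step first-order condition around (\theta_0, \gamma^*) (a sketch under standard regularity conditions, not a rigorous proof):

```latex
0 = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \nabla_\theta m(W_i, \hat\theta, \hat\gamma)
  \approx \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \nabla_\theta m(W_i, \theta_0, \gamma^*)
  + \mathrm{E}\bigl[\nabla_{\theta\theta} m\bigr]\,\sqrt{n}\,(\hat\theta - \theta_0)
  + \mathrm{E}\bigl[\nabla_{\gamma}\nabla_{\theta} m\bigr]\,\sqrt{n}\,(\hat\gamma - \gamma^*)
```

Solving for \sqrt{n}(\hat\theta - \theta_0) shows that it carries a term proportional to \sqrt{n}(\hat\gamma - \gamma^*); this first-step contribution to the asymptotic variance drops out exactly when E\bigl[\nabla_\gamma \nabla_\theta m(W_i, \theta_0, \gamma^*)\bigr] = 0.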


Involving MLE

When the first step is a maximum likelihood estimator (MLE), under some assumptions, the two-step M-estimator is asymptotically more efficient (i.e., has smaller asymptotic variance) than the M-estimator with known first-step parameter. Consistency and asymptotic normality of the estimator follow from the general result on two-step M-estimators (Wooldridge, Econometric Analysis of Cross Section and Panel Data). Let \{V_i, W_i, Z_i\}_{i=1}^n be a random sample and let the second-step M-estimator \hat\theta be:

:\hat\theta := \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} m(v_i, w_i, z_i; \theta, \hat\gamma)

where \hat\gamma is the parameter estimated by maximum likelihood in the first step. For the MLE,

:\hat\gamma := \arg\max_{\gamma \in \Gamma} \sum_{i=1}^{n} \log f(v_i : z_i, \gamma)

where f is the conditional density of V given Z. Now, suppose that given Z, V is conditionally independent of W. This is called the conditional independence assumption or selection on observables (Heckman and Robb, 1985). Intuitively, this condition means that Z is a good predictor of V, so that once conditioned on Z, V has no systematic dependence on W. Under the conditional independence assumption, the asymptotic variance of the two-step estimator is:

:E\bigl[\nabla_\theta s(\theta_0, \gamma_0)\bigr]^{-1} E\bigl[g(\theta_0, \gamma_0)\, g(\theta_0, \gamma_0)^{\mathrm T}\bigr] E\bigl[\nabla_\theta s(\theta_0, \gamma_0)\bigr]^{-1}

where

:\begin{align} g(\theta, \gamma) &:= s(\theta, \gamma) - E\bigl[s(\theta, \gamma)\, d(\gamma)^{\mathrm T}\bigr] E\bigl[d(\gamma)\, d(\gamma)^{\mathrm T}\bigr]^{-1} d(\gamma) \\ s(\theta, \gamma) &:= \nabla_\theta m(V, W, Z; \theta, \gamma) \\ d(\gamma) &:= \nabla_\gamma \log f(V : Z, \gamma) \end{align}

and \nabla represents the partial derivative with respect to a row vector. In the case where \gamma_0 is known, the asymptotic variance is

:E\bigl[\nabla_\theta s(\theta_0, \gamma_0)\bigr]^{-1} E\bigl[s(\theta_0, \gamma_0)\, s(\theta_0, \gamma_0)^{\mathrm T}\bigr] E\bigl[\nabla_\theta s(\theta_0, \gamma_0)\bigr]^{-1}

and therefore, unless E\bigl[s(\theta, \gamma)\, d(\gamma)^{\mathrm T}\bigr] = 0, the two-step M-estimator is more efficient than the M-estimator with known \gamma_0. This fact suggests that even when \gamma_0 is known a priori, there is an efficiency gain from estimating \gamma by MLE. An application of this result can be found, for example, in treatment effect estimation.
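The variance formula lends itself to a plug-in estimate using sample analogues. The sketch below assumes the user's model code already supplies the second-step scores s_i, the first-step ML scores d_i, and an estimate H of E[\nabla_\theta s]; the function name and array shapes are illustrative assumptions.

```python
# Plug-in estimate of the corrected asymptotic variance under the
# conditional independence assumption. The inputs (second-step scores s,
# first-step ML scores d, and Hessian estimate H) are assumed to come
# from the user's model code; names and shapes are illustrative.
import numpy as np

def two_step_avar(s, d, H):
    """Estimate the asymptotic variance of sqrt(n) * (theta_hat - theta_0).

    s: (n, p) array of second-step scores s_i = grad_theta m.
    d: (n, q) array of first-step ML scores d_i = grad_gamma log f.
    H: (p, p) estimate of E[grad_theta s].
    """
    n = s.shape[0]
    A = (s.T @ d) / n                       # sample analogue of E[s d']
    B = (d.T @ d) / n                       # sample analogue of E[d d']
    # Residualise the second-step score on the first-step score:
    # g_i = s_i - E[s d'] E[d d']^{-1} d_i.
    g = s - d @ np.linalg.solve(B, A.T)
    Hinv = np.linalg.inv(H)
    return Hinv @ ((g.T @ g) / n) @ Hinv.T  # sandwich formula
```

Because g residualises s on d, the middle term E[g g'] is weakly smaller than E[s s'] in the matrix sense, which is the source of the efficiency gain described above.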


Examples

* Generated regressor
* Heckman correction
* Feasible generalized least squares
* Two-step feasible generalized method of moments
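Feasible generalized least squares, for instance, follows the two-step pattern directly: the first step estimates a model of the error variance (the nuisance parameter), and the second step runs weighted least squares with the estimated weights. The sketch below is illustrative; the variance model and all numerical values are assumptions.

```python
# Feasible GLS as a two-step M-estimator (illustrative sketch; the
# variance model and all numerical values are assumptions).
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(1.0, 3.0, size=n)
sigma2 = 0.5 * x**2                              # heteroskedastic error variance
y = 2.0 + 1.0 * x + rng.normal(size=n) * np.sqrt(sigma2)
X = np.column_stack([np.ones(n), x])

# Step 1: OLS residuals, then regress log(resid^2) on [1, log x] to
# estimate the nuisance parameter of the variance function.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_ols
G = np.column_stack([np.ones(n), np.log(x)])
gamma_hat = np.linalg.lstsq(G, np.log(resid**2), rcond=None)[0]
w = np.exp(-(G @ gamma_hat))                     # estimated 1/sigma^2 (up to scale)

# Step 2: weighted least squares with the first-step weights.
Xw = X * w[:, None]
beta_fgls = np.linalg.solve(X.T @ Xw, Xw.T @ y)
print(beta_fgls)                                 # close to [2.0, 1.0]
```

Note that the weights only need to be proportional to 1/\sigma^2: a constant multiplicative offset in the first-step fit leaves the second-step estimates unchanged.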


See also

* Adaptive estimator


References

* Heckman, J.J., and R. Robb, 1985, "Alternative Methods for Evaluating the Impact of Interventions: An Overview", Journal of Econometrics, 30, 239–267.
* Newey, W.K., and D. McFadden, "Large Sample Estimation and Hypothesis Testing", in R. Engle and D. McFadden, eds., Handbook of Econometrics, Vol. 4, Amsterdam: North-Holland.
* Wooldridge, J.M., Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass.