Generalized least squares
In statistics, generalized least squares (GLS) is a technique for estimating the unknown parameters in a linear regression model when there is a certain degree of correlation between the residuals in the regression model. In these cases, ordinary least squares and weighted least squares can be statistically inefficient, or even give misleading inferences. GLS was first described by Alexander Aitken in 1936.


Method outline

In standard linear regression models we observe data \left\{ y_i, x_{ij} \right\}_{i=1,\dots,n,\ j=2,\dots,k} on ''n'' statistical units. The response values are placed in a vector \mathbf{y} = \left( y_1, \dots, y_n \right)^{\mathsf{T}}, and the predictor values are placed in the design matrix \mathbf{X} = \left( \mathbf{x}_1^{\mathsf{T}}, \dots, \mathbf{x}_n^{\mathsf{T}} \right)^{\mathsf{T}}, where \mathbf{x}_i = \left( 1, x_{i2}, \dots, x_{ik} \right) is a vector of the ''k'' predictor variables (including a constant) for the ''i''th unit. The model forces the conditional mean of \mathbf{y} given \mathbf{X} to be a linear function of \mathbf{X}, and assumes the conditional variance of the error term given \mathbf{X} is a ''known'' nonsingular ''covariance matrix'' \mathbf{\Omega}. This is usually written as

: \mathbf{y} = \mathbf{X} \boldsymbol\beta + \boldsymbol\varepsilon, \qquad \operatorname{E}[\boldsymbol\varepsilon \mid \mathbf{X}] = 0, \quad \operatorname{Cov}[\boldsymbol\varepsilon \mid \mathbf{X}] = \mathbf{\Omega}.

Here \boldsymbol\beta \in \mathbb{R}^k is a vector of unknown constants (known as "regression coefficients") that must be estimated from the data.

Suppose \mathbf{b} is a candidate estimate for \boldsymbol\beta. Then the residual vector for \mathbf{b} will be \mathbf{y} - \mathbf{X} \mathbf{b}. The generalized least squares method estimates \boldsymbol\beta by minimizing the squared Mahalanobis length of this residual vector:

: \hat{\boldsymbol\beta} = \underset{\mathbf{b}}{\operatorname{arg\,min}} \, (\mathbf{y} - \mathbf{X} \mathbf{b})^{\mathsf{T}} \mathbf{\Omega}^{-1} (\mathbf{y} - \mathbf{X} \mathbf{b}) = \underset{\mathbf{b}}{\operatorname{arg\,min}} \, \mathbf{y}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} + (\mathbf{X} \mathbf{b})^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \mathbf{b} - \mathbf{y}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \mathbf{b} - (\mathbf{X} \mathbf{b})^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} \, ,

where the last two terms evaluate to scalars, resulting in

: \hat{\boldsymbol\beta} = \underset{\mathbf{b}}{\operatorname{arg\,min}} \, \mathbf{y}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} + \mathbf{b}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \mathbf{b} - 2 \mathbf{b}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} \, .

This objective is a quadratic form in \mathbf{b}. Taking the gradient of this quadratic form with respect to \mathbf{b} and equating it to zero (when \mathbf{b} = \hat{\boldsymbol\beta}) gives

: 2 \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \hat{\boldsymbol\beta} - 2 \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} = 0.

Therefore, the minimum of the objective function can be computed, yielding the explicit formula

: \hat{\boldsymbol\beta} = \left( \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y}.

The quantity \mathbf{\Omega}^{-1} is known as the ''precision matrix'' (or ''dispersion matrix''), a generalization of the diagonal weight matrix used in weighted least squares.
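As a concrete illustration, the explicit formula above can be evaluated directly. The following is a minimal Python/NumPy sketch (the function name gls and the AR(1) toy covariance are illustrative choices, not a standard API); a careful implementation would factor \mathbf{\Omega} rather than invert it explicitly:

    import numpy as np

    def gls(X, y, Omega):
        """Closed-form GLS: (X' Omega^-1 X)^-1 X' Omega^-1 y."""
        Omega_inv = np.linalg.inv(Omega)          # fine for small n; factor Omega in practice
        XtOi = X.T @ Omega_inv
        return np.linalg.solve(XtOi @ X, XtOi @ y)

    # Toy data: a constant plus one predictor, errors with AR(1) covariance (rho = 0.6).
    rng = np.random.default_rng(0)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Omega = 0.6 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(Omega) @ rng.normal(size=n)
    print(gls(X, y, Omega))                       # approximately [1.0, 2.0]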


Properties

The GLS estimator is unbiased, consistent, efficient, and asymptotically normal with

: \operatorname{E}[\hat{\boldsymbol\beta} \mid \mathbf{X}] = \boldsymbol\beta \quad \text{and} \quad \operatorname{Cov}[\hat{\boldsymbol\beta} \mid \mathbf{X}] = \left( \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \right)^{-1}.

GLS is equivalent to applying ordinary least squares to a linearly transformed version of the data. To see this, factor \mathbf{\Omega} = \mathbf{C} \mathbf{C}^{\mathsf{T}}, for instance using the Cholesky decomposition. Then if we pre-multiply both sides of the equation \mathbf{y} = \mathbf{X} \boldsymbol\beta + \boldsymbol\varepsilon by \mathbf{C}^{-1}, we get an equivalent linear model \mathbf{y}^{*} = \mathbf{X}^{*} \boldsymbol\beta + \boldsymbol\varepsilon^{*}, where \mathbf{y}^{*} = \mathbf{C}^{-1} \mathbf{y}, \mathbf{X}^{*} = \mathbf{C}^{-1} \mathbf{X}, and \boldsymbol\varepsilon^{*} = \mathbf{C}^{-1} \boldsymbol\varepsilon. In this model

: \operatorname{Var}[\boldsymbol\varepsilon^{*} \mid \mathbf{X}] = \mathbf{C}^{-1} \mathbf{\Omega} \left( \mathbf{C}^{-1} \right)^{\mathsf{T}} = \mathbf{I},

where \mathbf{I} is the identity matrix. Thus we can efficiently estimate \boldsymbol\beta by applying ordinary least squares (OLS) to the transformed data, which requires minimizing

: \left( \mathbf{y}^{*} - \mathbf{X}^{*} \mathbf{b} \right)^{\mathsf{T}} (\mathbf{y}^{*} - \mathbf{X}^{*} \mathbf{b}) = (\mathbf{y} - \mathbf{X} \mathbf{b})^{\mathsf{T}} \, \mathbf{\Omega}^{-1} (\mathbf{y} - \mathbf{X} \mathbf{b}).

This has the effect of standardizing the scale of the errors and "de-correlating" them. Since OLS is applied to data with homoscedastic errors, the Gauss–Markov theorem applies, and therefore the GLS estimate is the best linear unbiased estimator for ''β''.
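A short NumPy sketch of this equivalence (under the same illustrative assumptions as the sketch above; gls_via_whitening is a hypothetical name): whitening the data with the Cholesky factor of \mathbf{\Omega} and running OLS reproduces the direct GLS formula.

    import numpy as np

    def gls_via_whitening(X, y, Omega):
        """GLS computed as OLS on whitened data, using Omega = C C'."""
        C = np.linalg.cholesky(Omega)
        X_star = np.linalg.solve(C, X)   # X* = C^-1 X, without forming C^-1
        y_star = np.linalg.solve(C, y)   # y* = C^-1 y
        beta_hat, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
        return beta_hat

On the toy data generated earlier, gls_via_whitening(X, y, Omega) agrees with gls(X, y, Omega) up to floating-point error.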


Weighted least squares

A special case of GLS called weighted least squares (WLS) occurs when all the off-diagonal entries of ''Ω'' are 0. This situation arises when the variances of the observed values are unequal (i.e. heteroscedasticity is present), but the errors are uncorrelated across observations. The weight for unit ''i'' is proportional to the reciprocal of the variance of the response for unit ''i''.
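In this diagonal case the GLS formula reduces to scaling each observation by the reciprocal of its error variance, as in the following illustrative NumPy sketch (wls is a hypothetical helper, not a library function):

    import numpy as np

    def wls(X, y, variances):
        """Weighted least squares: GLS with a diagonal covariance matrix."""
        w = 1.0 / np.asarray(variances)   # weights are reciprocal error variances
        Xw = X * w[:, None]               # each row of X scaled by its weight
        return np.linalg.solve(Xw.T @ X, Xw.T @ y)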


Feasible generalized least squares

If the covariance of the errors \Omega is unknown, one can get a consistent estimate of \Omega, say \widehat\Omega (Baltagi 2008), using an implementable version of GLS known as the feasible generalized least squares (FGLS) estimator. In FGLS, modeling proceeds in two stages: (1) the model is estimated by OLS or another consistent (but inefficient) estimator, and the residuals are used to build a consistent estimator of the errors covariance matrix (to do so, one often needs to examine the model adding additional constraints; for example, if the errors follow a time series process, a statistician generally needs some theoretical assumptions on this process to ensure that a consistent estimator is available); and (2) using the consistent estimator of the covariance matrix of the errors, one can implement GLS ideas.

Whereas GLS is more efficient than OLS under heteroscedasticity (also spelled heteroskedasticity) or autocorrelation, this is not true for FGLS. The feasible estimator is ''asymptotically'' more efficient, provided the errors covariance matrix is consistently estimated, but for a small or medium-sized sample it can actually be less efficient than OLS. This is why some authors prefer to use OLS and reformulate their inferences by simply considering an alternative estimator for the variance of the estimator that is robust to heteroscedasticity or serial autocorrelation. But for large samples, FGLS is preferred over OLS under heteroskedasticity or serial correlation (Greene 2003).

A cautionary note is that the FGLS estimator is not always consistent. One case in which FGLS might be inconsistent is if there are individual-specific fixed effects. In general this estimator has different properties than GLS. For large samples (i.e., asymptotically) all properties are (under appropriate conditions) common with respect to GLS, but for finite samples the properties of FGLS estimators are unknown: they vary dramatically with each particular model, and as a general rule their exact distributions cannot be derived analytically. For finite samples, FGLS may be even less efficient than OLS in some cases. Thus, while GLS can be made feasible, it is not always wise to apply this method when the sample is small.

A method sometimes used to improve the accuracy of the estimators in finite samples is to iterate: the residuals from FGLS are used to update the errors covariance estimator, and the FGLS estimate is then recomputed, applying the same idea iteratively until the estimates vary by less than some tolerance. But this method does not necessarily improve the efficiency of the estimator very much if the original sample was small.

A reasonable option when samples are not too large is to apply OLS but discard the classical variance estimator

: \sigma^2 (X'X)^{-1}

(which is inconsistent in this framework) and instead use a HAC (heteroskedasticity and autocorrelation consistent) estimator. For example, in the autocorrelation context we can use the Bartlett estimator (often known as the Newey–West estimator, since these authors popularized its use among econometricians in their 1987 ''Econometrica'' article), and in the heteroskedastic context we can use the Eicker–White estimator. This approach is much safer, and it is the appropriate path to take unless the sample is large, where "large" is sometimes a slippery issue (e.g., if the error distribution is asymmetric the required sample can be much larger).
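As an illustration of this robust-OLS route, the statsmodels library exposes HAC standard errors through its OLS fit options; a brief sketch under the assumption of a recent statsmodels version (the lag length 4 is an arbitrary illustrative choice, and X and y are the toy data from the earlier sketch):

    import statsmodels.api as sm

    # OLS point estimates with Newey-West (HAC) standard errors.
    res = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': 4})
    print(res.bse)   # standard errors robust to autocorrelation and heteroskedasticity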
The ordinary least squares (OLS) estimator is calculated as usual by

: \widehat\beta_\text{OLS} = (X'X)^{-1} X'y

and estimates of the residuals \widehat{u}_j = (y - X \widehat\beta_\text{OLS})_j are constructed.

For simplicity, consider the model for heteroscedastic and non-autocorrelated errors. Assume that the variance-covariance matrix \Omega of the error vector is diagonal, or equivalently that errors from distinct observations are uncorrelated. Then each diagonal entry may be estimated from the fitted residuals \widehat{u}_j, so \widehat\Omega_\text{OLS} may be constructed by

: \widehat\Omega_\text{OLS} = \operatorname{diag}(\widehat\sigma^2_1, \widehat\sigma^2_2, \dots, \widehat\sigma^2_n).

It is important to notice that the squared residuals cannot be used directly in the previous expression; we need an estimator of the error variances. To do so, we can use a parametric heteroskedasticity model or a nonparametric estimator. Once this step is fulfilled, we can proceed: estimate \beta_\text{FGLS1} using \widehat\Omega_\text{OLS} and weighted least squares:

: \widehat\beta_\text{FGLS1} = (X' \widehat\Omega^{-1}_\text{OLS} X)^{-1} X' \widehat\Omega^{-1}_\text{OLS} y.

The procedure can be iterated. The first iteration is given by

: \widehat{u}_\text{FGLS1} = y - X \widehat\beta_\text{FGLS1},
: \widehat\Omega_\text{FGLS1} = \operatorname{diag}(\widehat\sigma^2_{\text{FGLS1},1}, \widehat\sigma^2_{\text{FGLS1},2}, \dots, \widehat\sigma^2_{\text{FGLS1},n}),
: \widehat\beta_\text{FGLS2} = (X' \widehat\Omega^{-1}_\text{FGLS1} X)^{-1} X' \widehat\Omega^{-1}_\text{FGLS1} y.

This estimation of \widehat\Omega can be iterated to convergence. Under regularity conditions, the FGLS estimator (or that of any of its iterations, if we iterate a finite number of times) is asymptotically distributed as

: \sqrt{n}(\hat\beta_\text{FGLS} - \beta) \ \xrightarrow{d} \ \mathcal{N}\!\left(0, V\right),

where ''n'' is the sample size and

: V = \operatorname{p-lim}\left(X'\Omega^{-1}X/n\right)^{-1},

where p-lim denotes the limit in probability.
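A minimal sketch of this two-stage procedure for the heteroscedastic, uncorrelated case, again in Python/NumPy. Here the error variances are estimated, as the text requires, through one hypothetical parametric model (regressing log squared residuals on the predictors); any consistent variance model could be substituted:

    import numpy as np

    def fgls(X, y, n_iter=5):
        """Iterated FGLS for heteroscedastic, uncorrelated errors.

        Variances are estimated with a simple parametric model:
        log(u_j^2) regressed on X (an illustrative choice, not the only one).
        """
        beta = np.linalg.lstsq(X, y, rcond=None)[0]      # stage 1: OLS
        for _ in range(n_iter):
            u = y - X @ beta                             # residuals
            # Fit log squared residuals, then exponentiate: keeps variances positive.
            gamma = np.linalg.lstsq(X, np.log(u**2 + 1e-12), rcond=None)[0]
            var_hat = np.exp(X @ gamma)                  # estimated sigma_i^2
            w = 1.0 / var_hat
            Xw = X * w[:, None]
            beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)   # stage 2: weighted LS
        return beta

Setting n_iter=1 gives the plain two-stage FGLS estimator; larger values iterate the covariance update to convergence as described above.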


See also

* Confidence region
* Effective degrees of freedom
* Prais–Winsten estimation


References

* Baltagi, B. H. (2008). ''Econometrics'' (4th ed.). New York: Springer.
* Greene, W. H. (2003). ''Econometric Analysis'' (5th ed.). Upper Saddle River, NJ: Prentice Hall.

