The topic of heteroskedasticity-consistent (HC) standard errors arises in statistics and econometrics in the context of linear regression and time series analysis. These are also known as heteroskedasticity-robust standard errors (or simply robust standard errors), Eicker–Huber–White standard errors (also Huber–White standard errors or White standard errors), to recognize the contributions of Friedhelm Eicker, Peter J. Huber, and Halbert White.
In regression and time-series modelling, basic forms of models make use of the assumption that the errors or disturbances ''u''<sub>''i''</sub> have the same variance across all observation points. When this is not the case, the errors are said to be heteroskedastic, or to have heteroskedasticity, and this behaviour will be reflected in the residuals estimated from a fitted model. Heteroskedasticity-consistent standard errors are used to allow the fitting of a model that does contain heteroskedastic residuals. The first such approach was proposed by Huber (1967), and further improved procedures have been produced since for cross-sectional data, time-series data and GARCH estimation.
Heteroskedasticity-consistent standard errors that differ from classical standard errors may indicate model misspecification. Substituting heteroskedasticity-consistent standard errors does not resolve this misspecification, which may lead to bias in the coefficients. In most situations, the problem should be found and fixed. Other types of standard error adjustments, such as
clustered standard errors or
HAC standard errors, may be considered as extensions to HC standard errors.
History
Heteroskedasticity-consistent standard errors were introduced by Friedhelm Eicker, and popularized in econometrics by Halbert White.
Problem
Consider the linear regression model for the scalar ''y'':

:y_i = x_i^\top \beta + u_i,

where x_i is a ''k'' × 1 column vector of explanatory variables (features), \beta is a ''k'' × 1 column vector of parameters to be estimated, and u_i is the residual error.
The ordinary least squares (OLS) estimator is

:\widehat{\beta}_\text{OLS} = (X^\top X)^{-1} X^\top y,

where y is an ''n'' × 1 vector of observations y_i, and X denotes the ''n'' × ''k'' matrix of stacked x_i^\top values observed in the data.
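As an illustration, the closed-form OLS estimator above can be computed directly with NumPy. This is a minimal sketch using simulated data; the sample size, true parameter values, and variable names are chosen for the example only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3

X = rng.normal(size=(n, k))        # n x k design matrix of stacked x_i'
beta = np.array([1.0, -2.0, 0.5])  # true parameters (illustrative values)
u = rng.normal(size=n)             # homoskedastic error terms
y = X @ beta + u

# OLS estimator: (X'X)^{-1} X'y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```

In practice a least-squares solver such as `np.linalg.lstsq` is numerically preferable to forming X'X explicitly, but solving the normal equations mirrors the formula above.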
If the sample errors have equal variance \sigma^2 and are uncorrelated, then the least-squares estimate of \beta is BLUE (best linear unbiased estimator), and its variance is estimated with

:\widehat{V}[\widehat{\beta}_\text{OLS}] = s^2 (X^\top X)^{-1}, \qquad s^2 = \frac{1}{n-k} \sum_i \widehat{u}_i^2,

where \widehat{u}_i = y_i - x_i^\top \widehat{\beta}_\text{OLS} are the regression residuals.
When the error terms do not have constant variance (i.e., the assumption of \operatorname{E}[u_i^2 \mid x_i] = \sigma^2 is untrue), the OLS estimator loses its desirable properties. The formula for variance now cannot be simplified:

:V[\widehat{\beta}_\text{OLS}] = (X^\top X)^{-1} X^\top \Sigma X (X^\top X)^{-1},

where \Sigma = V[u \mid X]. While the OLS point estimator remains unbiased, it is not "best" in the sense of having minimum mean square error, and the OLS variance estimator s^2 (X^\top X)^{-1} does not provide a consistent estimate of the true variance of \widehat{\beta}_\text{OLS}.
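The sandwich form above suggests White's HC0 estimator, which replaces the unknown \Sigma with a diagonal matrix of squared residuals, \operatorname{diag}(\widehat{u}_1^2, \ldots, \widehat{u}_n^2). A self-contained NumPy sketch, with simulated heteroskedastic data whose error variance grows with the regressor (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 5, size=n)])  # intercept + one regressor
u = rng.normal(size=n) * X[:, 1]   # heteroskedastic: Var(u_i) proportional to x_i^2
y = X @ np.array([2.0, 1.5]) + u

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Classical (homoskedasticity-assuming) variance estimate: s^2 (X'X)^{-1}
s2 = resid @ resid / (n - k)
V_classical = s2 * XtX_inv

# White HC0 sandwich: (X'X)^{-1} [sum_i u_hat_i^2 x_i x_i'] (X'X)^{-1}
meat = (X * resid[:, None] ** 2).T @ X
V_hc0 = XtX_inv @ meat @ XtX_inv

se_classical = np.sqrt(np.diag(V_classical))
se_hc0 = np.sqrt(np.diag(V_hc0))
print(se_classical, se_hc0)
```

Library implementations (for example, statsmodels with `cov_type="HC0"`, or higher-order variants HC1 through HC3 that apply small-sample corrections) compute the same sandwich with additional refinements.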