Breusch–Pagan Test

In statistics, the Breusch–Pagan test, developed in 1979 by Trevor Breusch and Adrian Pagan, is used to test for heteroskedasticity in a linear regression model. It was independently suggested, with some extension, by R. Dennis Cook and Sanford Weisberg in 1983 (the Cook–Weisberg test). Derived from the Lagrange multiplier test principle, it tests whether the variance of the errors from a regression depends on the values of the independent variables; if it does, heteroskedasticity is present.

Suppose that we estimate the regression model

: y = \beta_0 + \beta_1 x + u, \,

and obtain from this fitted model a set of values for \widehat{u}, the residuals. Ordinary least squares (OLS) constrains these so that their mean is 0, so, given the assumption that their variance does not depend on the independent variables, an estimate of this variance can be obtained from the average of the squared residuals. If the assumption does not hold, a simple alternative model is that the variance is linearly related to the independent variables. Such a model can be examined by regressing the squared residuals on the independent variables, using an auxiliary regression equation of the form

: \widehat{u}^2 = \gamma_0 + \gamma_1 x + v. \,

This is the basis of the Breusch–Pagan test. It is a chi-squared test: the test statistic nR^2 (the sample size times the R^2 of the auxiliary regression) is asymptotically distributed as \chi^2 with ''k'' degrees of freedom, where ''k'' is the number of regressors in the auxiliary regression, excluding the constant. If the test statistic has a p-value below an appropriate threshold (e.g. ''p'' < 0.05), then the null hypothesis of homoskedasticity is rejected and heteroskedasticity is assumed. If the Breusch–Pagan test shows that conditional heteroskedasticity is present, one can either use weighted least squares (if the source of the heteroskedasticity is known) or use heteroskedasticity-consistent standard errors.
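The auxiliary-regression idea above can be sketched in a few lines of NumPy. This is an illustrative sketch: the simulated data, coefficients, and seed are assumptions, not taken from the source.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 5, n)
# Error standard deviation grows with x, so the data are heteroskedastic.
y = 2.0 + 3.0 * x + rng.normal(0, x)

# Fit y = b0 + b1*x by OLS and take the residuals.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Auxiliary regression: squared residuals on the same regressors.
u2 = resid ** 2
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ g
r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)

lm = n * r2                  # test statistic, chi-squared with k = 1 df here
p = stats.chi2.sf(lm, df=1)  # small p-value -> reject homoskedasticity
```

With variance that grows this strongly in x, the p-value comes out far below 0.05 and the null of homoskedasticity is rejected.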


Procedure

Under the classical assumptions, ordinary least squares is the best linear unbiased estimator (BLUE), i.e., it is unbiased and efficient. It remains unbiased under heteroskedasticity, but efficiency is lost. Before deciding on an estimation method, one may conduct the Breusch–Pagan test to examine the presence of heteroskedasticity. The Breusch–Pagan test is based on models of the type

: \sigma_i^2 = h(z_i'\gamma)

for the variances of the observations, where z_i = (1, z_{2i}, \ldots, z_{pi}) explain the difference in the variances. The null hypothesis is equivalent to the (p - 1)\, parameter restrictions

: \gamma_2 = \cdots = \gamma_p = 0.

The following Lagrange multiplier (LM) statistic yields the test statistic for the Breusch–Pagan test:

: \text{LM} = \left(\frac{\partial \ell}{\partial \theta}\right)' \left(-E\left[\frac{\partial^2 \ell}{\partial \theta \, \partial \theta'}\right]\right)^{-1} \left(\frac{\partial \ell}{\partial \theta}\right),

where \ell is the log-likelihood and \theta the parameter vector. This test can be implemented via the following three-step procedure:

* ''Step 1'': Apply OLS in the model
:: y_i = X_i\beta + \varepsilon_i, \quad i = 1, \dots, n.
* ''Step 2'': Compute the regression residuals \hat{\varepsilon}_i, square them, and divide by the maximum likelihood estimate of the error variance from the Step 1 regression, to obtain what Breusch and Pagan call g_i:
:: g_i = \hat{\varepsilon}_i^2 / \hat{\sigma}^2, \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n \hat{\varepsilon}_i^2.
: Then estimate the auxiliary regression
:: g_i = \gamma_1 + \gamma_2 z_{2i} + \cdots + \gamma_p z_{pi} + \eta_i,
: where the ''z'' terms will typically but not necessarily be the same as the original covariates ''x''.
* ''Step 3'': The LM test statistic is then half of the explained sum of squares from the auxiliary regression in Step 2:
:: \text{LM} = \tfrac{1}{2}\left(\text{TSS} - \text{SSR}\right),
: where TSS is the sum of squared deviations of the g_i from their mean of 1, and SSR is the sum of squared residuals from the auxiliary regression. The test statistic is asymptotically distributed as \chi^2_{p-1} under the null hypothesis of homoskedasticity and normally distributed \varepsilon_i, as proved by Breusch and Pagan in their 1979 paper.
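The three steps above can be carried out directly with NumPy. This is a sketch on simulated data; the data-generating process and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
z = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), z])
# Error sd proportional to z: strong heteroskedasticity.
y = 1.0 + 2.0 * z + rng.normal(0, z)

# Step 1: OLS residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
eps = y - X @ beta

# Step 2: g_i = eps_i^2 / sigma_hat^2, using the ML variance estimate,
# then regress g on the z variables (here the original regressors).
sigma2 = np.mean(eps ** 2)
g = eps ** 2 / sigma2            # by construction, g has mean 1
gamma, *_ = np.linalg.lstsq(X, g, rcond=None)
g_hat = X @ gamma

# Step 3: LM = (TSS - SSR) / 2, with TSS taken around the mean of g.
tss = np.sum((g - g.mean()) ** 2)
ssr = np.sum((g - g_hat) ** 2)
lm = 0.5 * (tss - ssr)           # compare to chi2 with p - 1 = 1 df
```

Here the statistic greatly exceeds the 5% critical value of \chi^2_1 (about 3.84), so homoskedasticity is rejected.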


Robust variant

A variant of this test, robust in the case of a non-Gaussian error term, was proposed by Roger Koenker. In this variant, the dependent variable in the auxiliary regression is just the squared residual from the Step 1 regression, \hat{\varepsilon}_i^2, and the test statistic is nR^2 from the auxiliary regression. As Koenker notes (1981, page 111), while the revised statistic has correct asymptotic size, its power "may be quite poor except under idealized Gaussian conditions."
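A sketch of Koenker's version, using heavy-tailed (t-distributed) errors, the setting in which the studentized statistic is the safer choice. All data, parameters, and the seed here are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 400
z = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), z])
# t(5) errors scaled by z: non-Gaussian and heteroskedastic, so the
# normality assumption behind the original BP statistic fails.
y = 1.0 + 2.0 * z + z * rng.standard_t(5, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ beta) ** 2

# Koenker: regress the squared residuals themselves on z;
# the statistic is n times the R^2 of this auxiliary regression.
gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
ssr = np.sum((e2 - X @ gamma) ** 2)
tss = np.sum((e2 - e2.mean()) ** 2)
lm_koenker = n * (1 - ssr / tss)
p = stats.chi2.sf(lm_koenker, df=1)
```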


Software

In R, this test is performed by the function ncvTest in the car package, the function bptest in the lmtest package, the function plmtest in the plm package, or the function breusch_pagan in the skedastic package. In Stata, one specifies the full regression, and then enters the command estat hettest followed by all independent variables. In SAS, the Breusch–Pagan test can be obtained using the Proc Model option. In Python, the method het_breuschpagan in statsmodels.stats.diagnostic (the statsmodels package) performs the Breusch–Pagan test. In gretl, the command modtest --breusch-pagan can be applied following an OLS regression.


See also

* White test

