Durbin–Watson statistic

In statistics, the Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals (prediction errors) from a regression analysis. It is named after James Durbin and Geoffrey Watson. The small sample distribution of this ratio was derived by John von Neumann (von Neumann, 1941). Durbin and Watson (1950, 1951) applied this statistic to the residuals from least squares regressions, and developed bounds tests for the null hypothesis that the errors are serially uncorrelated against the alternative that they follow a first-order autoregressive process. Note that the distribution of this test statistic does not depend on the estimated regression coefficients or on the variance of the errors. A similar assessment can also be carried out with the Breusch–Godfrey test and the Ljung–Box test.


Computing and interpreting the Durbin–Watson statistic

If e_t is the residual given by e_t = \rho e_{t-1} + \nu_t, the Durbin–Watson test statistic is
: d = \frac{\sum_{t=2}^T (e_t - e_{t-1})^2}{\sum_{t=1}^T e_t^2},
where ''T'' is the number of observations. For large ''T'', ''d'' is approximately equal to 2(1 − \hat \rho), where \hat \rho is the sample autocorrelation of the residuals at lag 1 (Gujarati (2003) p. 469); this follows because expanding the squared differences in the numerator gives approximately 2 \sum e_t^2 - 2 \sum e_t e_{t-1}. A value of ''d'' = 2 therefore indicates no autocorrelation. The value of ''d'' always lies between 0 and 4.

If the Durbin–Watson statistic is substantially less than 2, there is evidence of positive serial correlation. As a rough rule of thumb, if Durbin–Watson is less than 1.0, there may be cause for alarm. Small values of ''d'' indicate that successive error terms are positively correlated. If ''d'' > 2, successive error terms are negatively correlated. In regressions, this can imply an underestimation of the level of statistical significance.

To test for positive autocorrelation at significance ''α'', the test statistic ''d'' is compared to lower and upper critical values (d_{L,\alpha} and d_{U,\alpha}):
:*If ''d'' < d_{L,\alpha}, there is statistical evidence that the error terms are positively autocorrelated.
:*If ''d'' > d_{U,\alpha}, there is no statistical evidence that the error terms are positively autocorrelated.
:*If d_{L,\alpha} < ''d'' < d_{U,\alpha}, the test is inconclusive.
Positive serial correlation is serial correlation in which a positive error for one observation increases the chances of a positive error for another observation.

To test for negative autocorrelation at significance ''α'', the test statistic (4 − ''d'') is compared to lower and upper critical values (d_{L,\alpha} and d_{U,\alpha}):
:*If (4 − ''d'') < d_{L,\alpha}, there is statistical evidence that the error terms are negatively autocorrelated.
:*If (4 − ''d'') > d_{U,\alpha}, there is no statistical evidence that the error terms are negatively autocorrelated.
:*If d_{L,\alpha} < (4 − ''d'') < d_{U,\alpha}, the test is inconclusive.
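
The definition of ''d'' and the bounds rules above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names durbin_watson_d and bounds_test are hypothetical, the residual series is simulated rather than taken from a real regression, and the critical values passed to bounds_test are placeholders that would normally be looked up in a published table.

```python
import numpy as np

def durbin_watson_d(residuals):
    """d = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def bounds_test(d, d_lower, d_upper, alternative="positive"):
    """Apply the bounds rule; d_lower and d_upper must be supplied by the caller."""
    if alternative not in ("positive", "negative"):
        raise ValueError("alternative must be 'positive' or 'negative'")
    # For negative autocorrelation the compared quantity is 4 - d.
    stat = d if alternative == "positive" else 4.0 - d
    if stat < d_lower:
        return f"evidence of {alternative} autocorrelation"
    if stat > d_upper:
        return f"no evidence of {alternative} autocorrelation"
    return "inconclusive"

# Simulated residuals (an assumption for illustration): e_t = 0.7 e_{t-1} + nu_t.
rng = np.random.default_rng(0)
e = np.zeros(200)
for t in range(1, e.size):
    e[t] = 0.7 * e[t - 1] + rng.standard_normal()

d = durbin_watson_d(e)
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)  # lag-1 sample autocorrelation
print(d, 2 * (1 - rho_hat))                        # the two values should be close

# Placeholder critical values, NOT taken from a real table.
print(bounds_test(d, d_lower=1.6, d_upper=1.7, alternative="positive"))
```

Keeping the critical values as explicit arguments reflects the point made in the text that they depend on the significance level and on the regression, so they cannot be hard-coded.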
Negative serial correlation implies that a positive error for one observation increases the chance of a negative error for another observation, and a negative error for one observation increases the chance of a positive error for another.

The critical values, d_{L,\alpha} and d_{U,\alpha}, vary by level of significance (''α'') and the degrees of freedom in the regression equation. Their derivation is complex; statisticians typically obtain them from the appendices of statistical texts.

If the design matrix \mathbf{X} of the regression is known, exact critical values for the distribution of ''d'' under the null hypothesis of no serial correlation can be calculated. Under the null hypothesis ''d'' is distributed as
: \frac{\sum_{i=1}^{n-k} \nu_i \xi_i^2}{\sum_{i=1}^{n-k} \xi_i^2},
where ''n'' is the number of observations and ''k'' the number of regression variables; the \xi_i are independent standard normal random variables; and the \nu_i are the nonzero eigenvalues of ( \mathbf{I} - \mathbf{X} ( \mathbf{X}^T \mathbf{X} )^{-1} \mathbf{X}^T ) \mathbf{A}, where \mathbf{A} is the matrix that transforms the residuals \mathbf{e} into the numerator of the ''d'' statistic, i.e. d = \mathbf{e}^T \mathbf{A} \mathbf{e} / \mathbf{e}^T \mathbf{e}. A number of computational algorithms for finding percentiles of this distribution are available.

Although serial correlation does not affect the consistency of the estimated regression coefficients, it does affect our ability to conduct valid statistical tests. First, the F-statistic to test for overall significance of the regression may be inflated under positive serial correlation because the mean squared error (MSE) will tend to underestimate the population error variance. Second, positive serial correlation typically causes the ordinary least squares (OLS) standard errors for the regression coefficients to underestimate the true standard errors. As a consequence, if positive serial correlation is present in the regression, standard linear regression analysis will typically lead us to compute artificially small standard errors for the regression coefficients. These small standard errors will cause the estimated t-statistic to be inflated, suggesting significance where perhaps there is none. The inflated t-statistic may, in turn, lead us to incorrectly reject null hypotheses about population values of the parameters of the regression model more often than we would if the standard errors were correctly estimated.

If the Durbin–Watson statistic indicates the presence of serial correlation of the residuals, this can be remedied by using the Cochrane–Orcutt procedure.
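
Returning to the exact null distribution for a known design matrix: one simple way to approximate its percentiles is Monte Carlo simulation of the eigenvalue-weighted ratio of squared standard normals. The sketch below is only one possible approach and not one of the specialised algorithms alluded to in the text. The function names are hypothetical, ''k'' is taken to be the number of columns of the design matrix (including any intercept column), the design matrix is assumed to have full column rank, and the example design (intercept plus linear trend) is made up.

```python
import numpy as np

def dw_null_eigenvalues(X):
    """Nonzero eigenvalues nu_i of (I - X (X'X)^{-1} X') A."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    # Residual-maker matrix M = I - X (X'X)^{-1} X'.
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    # First-difference operator D, so that e' A e = sum_{t=2}^n (e_t - e_{t-1})^2.
    D = np.diff(np.eye(n), axis=0)
    A = D.T @ D
    # M A M is symmetric and has the same nonzero eigenvalues as M A.
    eig = np.linalg.eigvalsh(M @ A @ M)  # ascending order
    return eig[k:]                       # drop the k (numerically) zero eigenvalues

def simulate_dw_null(X, n_draws=20_000, seed=0):
    """Draw from the null distribution of d for design matrix X."""
    nu = dw_null_eigenvalues(X)
    rng = np.random.default_rng(seed)
    xi_sq = rng.standard_normal((n_draws, nu.size)) ** 2
    return (xi_sq @ nu) / xi_sq.sum(axis=1)

# Made-up design: intercept and a linear time trend, n = 30 observations.
n = 30
X = np.column_stack([np.ones(n), np.arange(n)])
draws = simulate_dw_null(X)
# Simulated 5% and 95% points of d under the null for this particular X.
print(np.quantile(draws, [0.05, 0.95]))
```

The simulated percentiles are specific to the supplied design matrix, which is why they can be sharper than the tabulated bounds that must hold for any set of regressors.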
The Durbin–Watson statistic, while displayed by many regression analysis programs, is not applicable in certain situations. For instance, when lagged dependent variables are included in the explanatory variables, it is inappropriate to use this test. Durbin's h-test (see below) or likelihood ratio tests, which are valid in large samples, should be used instead.


Durbin h-statistic

The Durbin–Watson statistic is biased for autoregressive moving average models, so that autocorrelation is underestimated. But for large samples one can easily compute the unbiased normally distributed h-statistic:
: h = \left( 1 - \frac{1}{2} d \right) \sqrt{\frac{T}{1 - T \cdot \widehat{\operatorname{Var}}(\widehat\beta_1)}},
using the Durbin–Watson statistic ''d'' and the estimated variance
: \widehat{\operatorname{Var}}(\widehat\beta_1)
of the regression coefficient of the lagged dependent variable, provided
: T \cdot \widehat{\operatorname{Var}}(\widehat\beta_1) < 1.
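
The h-statistic is straightforward to compute once ''d'', ''T'' and the estimated variance of the lagged-dependent-variable coefficient are available. The following sketch uses hypothetical function names and made-up input values. The p-value helper treats ''h'' as approximately standard normal under the null hypothesis in large samples, which is an assumption of this sketch about how "normally distributed" above is to be read.

```python
import math

def durbin_h(d, n_obs, var_beta1):
    """h = (1 - d/2) * sqrt(T / (1 - T * Var(beta_1)))."""
    denom = 1.0 - n_obs * var_beta1
    if denom <= 0:
        # The statistic is only defined when T * Var(beta_1) < 1.
        raise ValueError("h is undefined unless T * Var(beta_1) < 1")
    return (1.0 - 0.5 * d) * math.sqrt(n_obs / denom)

def two_sided_p_value(h):
    """Two-sided p-value, assuming h is approximately N(0, 1) under the null."""
    return math.erfc(abs(h) / math.sqrt(2.0))

# Made-up inputs for illustration: d = 1.5, T = 50, Var(beta_1) = 0.01.
h = durbin_h(d=1.5, n_obs=50, var_beta1=0.01)
print(h)                     # 0.25 * sqrt(50 / 0.5) = 2.5
print(two_sided_p_value(h))
```

Raising an error when the condition fails, rather than returning a complex or undefined value, makes it explicit that a different test is needed in that case.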


Implementations in statistics packages

# R: the dwtest function in the lmtest package, the durbinWatsonTest (or dwt for short) function in the car package, and pdwtest and pbnftest for panel models in the plm package.
# MATLAB: the dwtest function in the Statistics Toolbox.
# Mathematica: the Durbin–Watson (''d'') statistic is included as an option in the LinearModelFit function.
# SAS: a standard output when using proc model, and an option (dw) when using proc reg.
# EViews: automatically calculated when using OLS regression.
# gretl: automatically calculated when using OLS regression.
# Stata: the command estat dwatson, following regress in time series data. Engle's LM test for autoregressive conditional heteroskedasticity (ARCH), a test for time-dependent volatility, the Breusch–Godfrey test, and Durbin's alternative test for serial correlation are also available. All of these except estat dwatson can test separately for higher-order serial correlation. The Breusch–Godfrey test and Durbin's alternative test also allow regressors that are not strictly exogenous.
# Excel: although Microsoft Excel 2007 does not have a specific Durbin–Watson function, the ''d''-statistic may be calculated using =SUMXMY2(x_array,y_array)/SUMSQ(array), where x_array and y_array are the residual column offset by one row and array is the full residual column.
# Minitab: the option to report the statistic in the Session window can be found under the "Options" box under Regression and via the "Results" box under General Regression.
# Python: a durbin_watson function is included in the statsmodels package (statsmodels.stats.stattools.durbin_watson), but statistical tables for critical values are not available there.
# SPSS: included as an option in the Regression function.
# Julia: the ''DurbinWatsonTest'' function is available in the ''HypothesisTests'' package.


See also

* Time-series regression
* ACF / PACF
* Correlation dimension
* Breusch–Godfrey test
* Ljung–Box test




External links


Table for high ''n'' and ''k''
* by Mark Thoma