The Chow test, proposed by econometrician Gregory Chow in 1960, is a statistical test of whether the true coefficients in two linear regressions on different data sets are equal. In econometrics, it is most commonly used in time series analysis to test for the presence of a structural break at a period which can be assumed to be known ''a priori'' (for instance, a major historical event such as a war). In program evaluation, the Chow test is often used to determine whether the independent variables have different impacts on different subgroups of the population.
Illustrations
First Chow Test
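Suppose that we model our data as
:<math>y_t = a + b x_{1t} + c x_{2t} + \varepsilon_t.</math>
If we split our data into two groups, then we have
:<math>y_t = a_1 + b_1 x_{1t} + c_1 x_{2t} + \varepsilon_t</math>
and
:<math>y_t = a_2 + b_2 x_{1t} + c_2 x_{2t} + \varepsilon_t.</math>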
The null hypothesis of the Chow test asserts that <math>a_1 = a_2</math>, <math>b_1 = b_2</math>, and <math>c_1 = c_2</math>. The test also assumes that the model errors <math>\varepsilon_t</math> are independent and identically distributed draws from a normal distribution with unknown variance (so the variance is the same in both groups).
Let <math>S_C</math> be the sum of squared residuals from the regression on the combined data, <math>S_1</math> the sum of squared residuals from the regression on the first group, and <math>S_2</math> the sum of squared residuals from the regression on the second group. <math>N_1</math> and <math>N_2</math> are the numbers of observations in the two groups, and <math>k</math> is the total number of parameters (in this case 3: the coefficients of the 2 independent variables plus the intercept). Then the Chow test statistic is
:<math>F = \frac{(S_C - (S_1 + S_2))/k}{(S_1 + S_2)/(N_1 + N_2 - 2k)}.</math>
Under the null hypothesis, the test statistic follows the ''F''-distribution with <math>k</math> and <math>N_1 + N_2 - 2k</math> degrees of freedom.
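The statistic above needs only three least-squares fits: one per group and one on the pooled data. The following is a minimal pure-Python sketch, assuming each design matrix is a list of rows that already includes a column of ones for the intercept. The function names `ssr` and `chow_test` are hypothetical helpers, not part of any library, and solving the normal equations directly is a choice made for brevity rather than numerical robustness.

```python
def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on the columns of X.

    Solves the normal equations (X'X) beta = X'y by Gaussian elimination
    with partial pivoting; adequate for small, well-conditioned examples.
    """
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        # Swap in the row with the largest pivot to limit rounding error.
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back-substitution on the upper-triangular system.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def chow_test(X1, y1, X2, y2):
    """Return (F, df1, df2) for the Chow test of equal coefficients.

    X1, X2: lists of rows holding all k regressors, intercept column included.
    y1, y2: the matching responses.
    """
    k = len(X1[0])                          # total number of parameters
    n1, n2 = len(y1), len(y2)
    s1 = ssr(X1, y1)                        # S_1: first group alone
    s2 = ssr(X2, y2)                        # S_2: second group alone
    sc = ssr(list(X1) + list(X2), list(y1) + list(y2))  # S_C: pooled fit
    df2 = n1 + n2 - 2 * k
    f = ((sc - (s1 + s2)) / k) / ((s1 + s2) / df2)
    return f, k, df2
```

Under the null, the returned <math>F</math> is compared with an upper quantile of the ''F''-distribution with the returned degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f, df1, df2)` gives the corresponding p-value. Because the pooled regression is a restricted version of the two separate fits, <math>S_C \ge S_1 + S_2</math> and the statistic is never negative.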
The same result can be achieved via dummy variables.
Consider the two data sets being compared. First there is the 'primary' data set, <math>i = 1, \dots, n_1</math>, and the 'secondary' data set, <math>i = n_1 + 1, \dots, n</math>. Their union is <math>i = 1, \dots, n</math>. If there is no structural change between the primary and secondary data sets, a single regression can be run over the union without biasing the estimators.
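Consider the regression
:<math>y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \gamma_0 D_i + \sum_{j=1}^{k} \gamma_j x_{ji} D_i + \varepsilon_i,</math>
which is run over <math>i = 1, \dots, n</math>. <math>D_i</math> is a dummy variable taking the value 1 for <math>i = n_1 + 1, \dots, n</math> and 0 otherwise.
If both data sets can be explained fully by <math>(\beta_0, \beta_1, \dots, \beta_k)</math>, the dummy variable and its interactions add nothing, because the data are explained fully by the restricted equation <math>y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i</math>. That is, under the assumption of no structural change we have the null and alternative hypotheses
:<math>H_0: \gamma_0 = \gamma_1 = \dots = \gamma_k = 0,</math>
:<math>H_1: \gamma_j \neq 0 \text{ for at least one } j.</math>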
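The null hypothesis of joint insignificance of D (the intercept shift <math>\gamma_0</math> together with the interaction coefficients <math>\gamma_1, \dots, \gamma_k</math>) can be tested with an F-test with <math>k + 1</math> and <math>\mathrm{DoF} = n - 2(k + 1)</math> degrees of freedom. That is:
:<math>F = \frac{(RSS_R - RSS_U)/(k + 1)}{RSS_U/\mathrm{DoF}},</math>
where <math>RSS_R</math> is the sum of squared residuals from the restricted regression (without the dummy terms) and <math>RSS_U</math> the sum of squared residuals from the unrestricted regression that includes them. Here <math>k</math> counts the regressors excluding the intercept, so each data set has <math>k + 1</math> coefficients. Because the fully interacted regression fits each data set with its own coefficients, <math>RSS_U</math> equals the sum of the squared residuals from separate regressions on the two data sets, which is why this F statistic coincides with the Chow statistic.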
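The equivalence can be checked numerically. The sketch below builds the fully interacted design over the union of the two data sets, fits it and the restricted regression, and returns the F statistic with its degrees of freedom. It is a pure-Python sketch under stated assumptions: the helper names `ssr` and `dummy_f_test` are hypothetical, the regressor rows are passed without an intercept column, and, as in this section, <math>k</math> counts the regressors excluding the intercept.

```python
def ssr(X, y):
    """Sum of squared residuals from an OLS fit via the normal equations."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        # Partial pivoting, then eliminate entries below the pivot.
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def dummy_f_test(Z1, y1, Z2, y2):
    """F test of H0: gamma_0 = ... = gamma_k = 0 in the interacted regression.

    Z1, Z2: rows of the k regressors (no intercept) for the primary and
    secondary data sets; y1, y2: the matching responses.
    Returns (F, numerator DoF, denominator DoF).
    """
    k = len(Z1[0])
    y = list(y1) + list(y2)
    n = len(y)
    rows = []
    for d, Z in ((0.0, Z1), (1.0, Z2)):      # D = 0 primary, D = 1 secondary
        for z in Z:
            # Columns: 1, x_1..x_k, D, D*x_1..D*x_k
            rows.append([1.0] + list(z) + [d] + [d * v for v in z])
    rss_u = ssr(rows, y)                           # unrestricted model
    rss_r = ssr([r[: k + 1] for r in rows], y)     # restricted: drop D terms
    dof = n - 2 * (k + 1)
    f = ((rss_r - rss_u) / (k + 1)) / (rss_u / dof)
    return f, k + 1, dof
```

Design note: the restricted model is obtained by slicing off the dummy columns rather than building a second design, so both fits are guaranteed to use the same rows in the same order.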
Remarks
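* The sum of squared residuals from the combined data (the global sum of squares, SSE) is often called the Restricted Sum of Squares (RSSM), because the pooled regression is a constrained model: it forces each of the k coefficients to be equal in the two groups, which amounts to k equality restrictions (with k the number of parameters, intercept included).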
* Some software, such as SAS, uses a predictive Chow test when one subsample has fewer observations than there are regressors, since a separate regression cannot then be estimated on that subsample.
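For reference, the predictive (forecast) Chow test is usually stated in textbooks as
:<math>F = \frac{(S_C - S_1)/N_2}{S_1/(N_1 - k)},</math>
where <math>S_C</math> is the sum of squared residuals from the pooled regression, <math>S_1</math> that from the regression on the larger subsample of <math>N_1</math> observations, <math>N_2</math> the size of the short subsample, and <math>k</math> the number of parameters; under the null it follows an ''F''-distribution with <math>N_2</math> and <math>N_1 - k</math> degrees of freedom. The sketch below assumes this textbook form; `predictive_chow` is a hypothetical name, and whether a particular package computes exactly this variant should be checked against its own documentation.

```python
def predictive_chow(s_c, s_1, n1, n2, k):
    """Predictive Chow F statistic and its degrees of freedom (textbook form).

    s_c: SSR from the pooled regression on all n1 + n2 observations.
    s_1: SSR from the regression on the first (larger) subsample only.
    n1, n2: subsample sizes; k: number of parameters including the intercept.
    Only the first subsample must satisfy n1 > k, so n2 may be smaller than k.
    """
    if n1 <= k:
        raise ValueError("the first subsample needs more observations than parameters")
    df1, df2 = n2, n1 - k
    f = ((s_c - s_1) / df1) / (s_1 / df2)
    return f, df1, df2
```

For instance, with illustrative values <math>S_C = 30</math>, <math>S_1 = 20</math>, <math>N_1 = 40</math>, <math>N_2 = 2</math> and <math>k = 3</math>, the statistic is <math>(10/2)/(20/37) = 9.25</math> with 2 and 37 degrees of freedom, even though the second subsample is too small for its own regression.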
External links
* Computing the Chow statistic, a series of FAQ explanations from the Stata Corporation at https://www.stata.com/support/faqs/
* Series of FAQ explanations from the SAS Institute