Co-integration

In econometrics, cointegration is a statistical property describing a long-term, stable relationship between two or more time series variables, even if those variables are individually non-stationary (i.e., they have trends). Despite their individual fluctuations, the variables move together in the long run, anchored by an underlying equilibrium relationship. More formally, if several time series are individually integrated of order d (meaning they require d differences to become stationary) but a linear combination of them is integrated of a lower order, then those time series are said to be cointegrated. That is, if X, Y, and Z are each integrated of order d, and there exist coefficients a, b, c such that aX + bY + cZ is integrated of order less than d, then X, Y, and Z are cointegrated. Cointegration is a crucial concept in time series analysis, particularly when dealing with variables that exhibit trends, such as macroeconomic data. In an influential paper, Charles Nelson and Charles Plosser (1982) provided statistical evidence that many US macroeconomic time series (such as GNP, wages, and employment) have stochastic trends.


Introduction

If two or more series are individually integrated (in the time series sense) but some linear combination of them has a lower order of integration, then the series are said to be cointegrated. A common example is where the individual series are first-order integrated, I(1), but some (cointegrating) vector of coefficients exists to form a stationary, I(0), linear combination of them.
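The definition can be illustrated with simulated data. The following is a minimal sketch, not a test procedure: it builds a random walk x, constructs y so that it shares x's stochastic trend, and compares the sample variances of the series with that of the combination y - 2x. The coefficient 2, the seed, the sample size, and all function names are illustrative assumptions.

```python
import random
import statistics


def random_walk(n, rng):
    """Return an I(1) series: the cumulative sum of standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
n = 2000

x = random_walk(n, rng)  # I(1) by construction
# y inherits the stochastic trend of x; the factor 2.0 is an assumption
y = [2.0 * xt + rng.gauss(0.0, 1.0) for xt in x]

# The cointegrating combination removes the common trend,
# leaving only the stationary noise term.
combo = [yt - 2.0 * xt for xt, yt in zip(x, y)]

var_x = statistics.pvariance(x)
var_y = statistics.pvariance(y)
var_combo = statistics.pvariance(combo)

print(f"var(x)      = {var_x:.2f}")
print(f"var(y)      = {var_y:.2f}")
print(f"var(y - 2x) = {var_combo:.2f}")
```

The two individual series wander without a fixed mean, so their sample variances grow with the sample length, whereas the combination stays close to the variance of the added noise.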


History

The first to introduce and analyse the concept of spurious (or nonsense) regression was Udny Yule in 1926. Before the 1980s, many economists used linear regressions on non-stationary time series data, which Nobel laureate Clive Granger and Paul Newbold showed to be a dangerous approach that could produce spurious correlation, since standard detrending techniques can result in data that are still non-stationary. Granger's 1987 paper with Robert Engle formalized the cointegrating vector approach and coined the term.

For integrated processes, Granger and Newbold showed that de-trending does not eliminate the problem of spurious correlation, and that the superior alternative is to check for cointegration. Two series with trends can be cointegrated only if there is a genuine relationship between the two. Thus the standard current methodology for time series regressions is to check all time series involved for integration. If there are integrated series on both sides of the regression relationship, then it is possible for regressions to give misleading results.

The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots (i.e. integrated of at least order one). The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares (OLS) regressions on data which had been differenced. This method is biased if the non-stationary variables are cointegrated.

For example, regressing the consumption series for any country (e.g. Fiji) against the GNP for a randomly selected dissimilar country (e.g. Afghanistan) might give a high R-squared (suggesting that Afghanistan's GNP has high explanatory power for Fiji's consumption). This is called spurious regression: two integrated series which are not directly causally related may nonetheless show a significant correlation.
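The spurious regression effect is easy to reproduce by simulation. The sketch below is only an illustration under stated assumptions (sample length 200, 300 replications, an arbitrary seed, hypothetical helper names): it regresses pairs of independent random walks on each other and compares the average R-squared with that obtained from pairs of independent white-noise series.

```python
import random


def random_walk(n, rng):
    """Cumulative sum of standard normal shocks (an I(1) series)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(n, rng):
    """Independent standard normal draws (a stationary series)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def r_squared(x, y):
    """R-squared of an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def mean_r2(generator, n, reps, rng):
    """Average R-squared over `reps` pairs of independently generated series."""
    total = 0.0
    for _ in range(reps):
        total += r_squared(generator(n, rng), generator(n, rng))
    return total / reps


rng = random.Random(1)  # arbitrary fixed seed
mean_r2_walks = mean_r2(random_walk, n=200, reps=300, rng=rng)
mean_r2_noise = mean_r2(white_noise, n=200, reps=300, rng=rng)

print(f"mean R^2, independent random walks: {mean_r2_walks:.3f}")
print(f"mean R^2, independent white noise:  {mean_r2_noise:.3f}")
```

Although every pair is unrelated by construction, the random-walk regressions typically report a far higher R-squared than the white-noise regressions, which is the pattern the Fiji and Afghanistan example describes.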


Tests

The three main methods for testing for cointegration are:


Engle–Granger two-step method

If $x_t$ and $y_t$ both have order of integration d = 1 and are cointegrated, then a linear combination of them must be stationary for some value of $\beta$ and $u_t$. In other words:

$$y_t - \beta x_t = u_t$$

where $u_t$ is stationary.

If $\beta$ is known, we can test $u_t$ for stationarity with an augmented Dickey–Fuller (ADF) test or a Phillips–Perron test. If $\beta$ is unknown, we must first estimate it. This is typically done by ordinary least squares (regressing $y_t$ on $x_t$ and an intercept). Then, we can run an ADF test on the estimated residuals $\hat{u}_t$. However, when $\beta$ is estimated, the critical values of this ADF test are non-standard, and increase in absolute value as more regressors are included.

If the variables are found to be cointegrated, a second-stage regression is conducted. This is a regression of $\Delta y_t$ on the differenced regressors $\Delta x_t$ and the lagged residuals from the first stage, $\hat{u}_{t-1}$. The second-stage regression is given as:

$$\Delta y_t = \Delta x_t b + \alpha \hat{u}_{t-1} + \varepsilon_t$$

If the variables are not cointegrated (if we cannot reject the null of no cointegration when testing $u_t$), then $\alpha = 0$ and we estimate a differences model:

$$\Delta y_t = \Delta x_t b + \varepsilon_t$$
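The two steps described above can be sketched in plain Python for the case of a single regressor. This is a simplified illustration, not a reference implementation: the unit root check is a Dickey–Fuller regression without lagged differences rather than a full ADF test, the data are simulated with an assumed true slope of 2, and the cut-off of -3.34 is an approximate asymptotic 5% value for two variables with an intercept taken from MacKinnon's tables. All function and variable names are hypothetical.

```python
import math
import random


def random_walk(n, rng):
    """Cumulative sum of standard normal shocks (an I(1) series)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def ols_with_intercept(x, y):
    """Return (intercept, slope) from regressing y on x and a constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def df_t_stat(u):
    """t-statistic on rho in du_t = rho * u_{t-1} + e_t (no lags, no constant)."""
    lagged = u[:-1]
    diffs = [b - a for a, b in zip(u[:-1], u[1:])]
    s_ll = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, diffs)) / s_ll
    resid = [d - rho * l for l, d in zip(lagged, diffs)]
    sigma2 = sum(e * e for e in resid) / (len(resid) - 1)
    return rho / math.sqrt(sigma2 / s_ll)


def engle_granger(x, y):
    """Step 1: estimate beta by OLS, then test the residuals for a unit root."""
    a, beta = ols_with_intercept(x, y)
    u_hat = [yt - a - beta * xt for xt, yt in zip(x, y)]
    return beta, u_hat, df_t_stat(u_hat)


def error_correction(x, y, u_hat):
    """Step 2: regress dy_t on dx_t and u_hat_{t-1}; return (b, alpha)."""
    dx = [b - a for a, b in zip(x[:-1], x[1:])]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    ul = u_hat[:-1]  # residual lagged one period relative to dx and dy
    # Solve the 2x2 normal equations with Cramer's rule.
    s11 = sum(v * v for v in dx)
    s22 = sum(v * v for v in ul)
    s12 = sum(p * q for p, q in zip(dx, ul))
    r1 = sum(p * q for p, q in zip(dx, dy))
    r2 = sum(p * q for p, q in zip(ul, dy))
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det


CRIT_5PCT = -3.34  # approximate value, see the assumptions stated above

rng = random.Random(2)  # arbitrary fixed seed
n = 500
x = random_walk(n, rng)
y_coint = [1.0 + 2.0 * xt + rng.gauss(0.0, 1.0) for xt in x]  # cointegrated
y_indep = random_walk(n, rng)                                  # unrelated

beta, u_hat, t_coint = engle_granger(x, y_coint)
_, _, t_indep = engle_granger(x, y_indep)
b, alpha = error_correction(x, y_coint, u_hat)

print(f"beta_hat = {beta:.3f}, residual DF t (cointegrated) = {t_coint:.2f}")
print(f"residual DF t (independent walks) = {t_indep:.2f}")
print(f"reject no cointegration at 5%: {t_coint < CRIT_5PCT}")
print(f"second stage: b = {b:.3f}, alpha = {alpha:.3f}")
```

A negative estimate of alpha in the second stage indicates that deviations from the long-run relationship are corrected in subsequent periods. In applied work the lag length of the ADF regression and the exact critical values would need more care than this sketch takes.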


Johansen test

The Johansen test is a test for cointegration that allows for more than one cointegrating relationship, unlike the Engle–Granger method. However, its validity rests on asymptotic properties, i.e. on large samples. If the sample size is too small, the results will not be reliable and one should use autoregressive distributed lag (ARDL) models instead.
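For orientation, the test is usually set up in a vector error correction form. The block below is a sketch in common textbook notation, none of which is defined elsewhere in this article: $X_t$ is a vector of $k$ I(1) series, $T$ is the sample size, and $\hat{\lambda}_i$ are the ordered estimated eigenvalues. Here $\alpha$ and $\beta$ denote matrices and are unrelated to the scalars of the same name in the Engle–Granger section.

```latex
\Delta X_t = \Pi X_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \, \Delta X_{t-i} + \varepsilon_t,
\qquad \Pi = \alpha \beta^{\top}, \qquad \operatorname{rank}(\Pi) = r

\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{k} \ln\bigl(1 - \hat{\lambda}_i\bigr)
```

The rank $r$ of $\Pi$ is the number of cointegrating relationships, the columns of $\beta$ are the cointegrating vectors, and the trace statistic tests the hypothesis of at most $r$ such relationships.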


Phillips–Ouliaris cointegration test

Peter C. B. Phillips and Sam Ouliaris (1990) show that residual-based unit root tests applied to the estimated cointegrating residuals do not have the usual Dickey–Fuller distributions under the null hypothesis of no cointegration. Because of the spurious regression phenomenon under the null hypothesis, these test statistics have asymptotic distributions that depend on (1) the number of deterministic trend terms and (2) the number of variables with which cointegration is being tested. These distributions are known as Phillips–Ouliaris distributions, and critical values have been tabulated. In finite samples, a superior alternative to these asymptotic critical values is to generate critical values from simulations.
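Generating critical values by simulation can be sketched as follows. Note the simplification: the statistic used here is a plain Dickey–Fuller t-statistic on the OLS residuals, not the Phillips–Ouliaris statistics themselves, and the sample length (100), number of replications (1000), seed, and helper names are all illustrative assumptions. The null hypothesis is imposed by drawing two independent random walks in each replication.

```python
import math
import random


def random_walk(n, rng):
    """Cumulative sum of standard normal shocks (an I(1) series)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def ols_residuals(x, y):
    """Residuals from regressing y on x and an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yt - intercept - slope * xt for xt, yt in zip(x, y)]


def df_t_stat(u):
    """t-statistic on rho in du_t = rho * u_{t-1} + e_t (no lags, no constant)."""
    lagged = u[:-1]
    diffs = [b - a for a, b in zip(u[:-1], u[1:])]
    s_ll = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, diffs)) / s_ll
    resid = [d - rho * l for l, d in zip(lagged, diffs)]
    sigma2 = sum(e * e for e in resid) / (len(resid) - 1)
    return rho / math.sqrt(sigma2 / s_ll)


def simulated_critical_value(n, reps, level, rng):
    """Empirical lower `level` quantile of the statistic under no cointegration."""
    stats = []
    for _ in range(reps):
        x = random_walk(n, rng)
        y = random_walk(n, rng)  # independent of x, so the null holds
        stats.append(df_t_stat(ols_residuals(x, y)))
    stats.sort()
    return stats[int(level * reps)]


rng = random.Random(3)  # arbitrary fixed seed
cv_5pct = simulated_critical_value(n=100, reps=1000, level=0.05, rng=rng)
print(f"simulated 5% critical value (n=100): {cv_5pct:.2f}")
```

The simulated value should lie well below the ordinary Dickey–Fuller 5% value of roughly -2.86 (for a regression with a constant), which reflects the point made above that the usual distributions do not apply to estimated residuals. Matching `n` to the actual sample size is what makes the result a finite-sample critical value.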


Multicointegration

In practice, cointegration is often used for two series, but it is more generally applicable and can be used for variables integrated of higher order (to detect correlated accelerations or other second-difference effects). Multicointegration extends the cointegration technique beyond two variables, and occasionally to variables integrated at different orders.


Variable shifts in long time series

Tests for cointegration assume that the cointegrating vector is constant during the period of study. In reality, the long-run relationship between the underlying variables may change, so that shifts in the cointegrating vector occur. Possible reasons include technological progress, economic crises, changes in people's preferences and behaviour, policy or regime changes, and organizational or institutional developments. Such shifts are especially likely if the sample period is long. To take this issue into account, tests have been introduced for cointegration with one unknown structural break, and tests for cointegration with two unknown breaks are also available.
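The effect of an ignored shift can be illustrated by simulation. In the sketch below, which is not one of the structural break tests mentioned above, the slope of the long-run relationship is assumed to change from 1 to 2 at a known mid-sample date. A residual-based Dickey–Fuller t-statistic (no lagged differences) is then computed for the full sample and for each regime separately. The break date, slopes, seed, and helper names are illustrative assumptions.

```python
import math
import random


def random_walk(n, rng):
    """Cumulative sum of standard normal shocks (an I(1) series)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def ols_residuals(x, y):
    """Residuals from regressing y on x and an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yt - intercept - slope * xt for xt, yt in zip(x, y)]


def df_t_stat(u):
    """t-statistic on rho in du_t = rho * u_{t-1} + e_t (no lags, no constant)."""
    lagged = u[:-1]
    diffs = [b - a for a, b in zip(u[:-1], u[1:])]
    s_ll = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, diffs)) / s_ll
    resid = [d - rho * l for l, d in zip(lagged, diffs)]
    sigma2 = sum(e * e for e in resid) / (len(resid) - 1)
    return rho / math.sqrt(sigma2 / s_ll)


rng = random.Random(4)  # arbitrary fixed seed
n = 600
half = n // 2           # assumed (known) break date

x = random_walk(n, rng)
# The cointegrating slope shifts from 1.0 to 2.0 at mid-sample.
y = [(1.0 if t < half else 2.0) * x[t] + rng.gauss(0.0, 1.0) for t in range(n)]

t_full = df_t_stat(ols_residuals(x, y))                    # break ignored
t_first = df_t_stat(ols_residuals(x[:half], y[:half]))     # first regime
t_second = df_t_stat(ols_residuals(x[half:], y[half:]))    # second regime

print(f"full sample:   {t_full:.2f}")
print(f"first regime:  {t_first:.2f}")
print(f"second regime: {t_second:.2f}")
```

Within each regime the residuals behave like stationary noise and the statistic is strongly negative, while the full-sample residuals absorb the shift and give much weaker evidence of cointegration. Tests that allow for unknown breaks address the more realistic case in which the break date is not known in advance.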


Bayesian inference

Several Bayesian methods have been proposed to compute the posterior distribution of the number of cointegrating relationships and the cointegrating linear combinations.


See also

* Error correction model
* Granger causality
* Stationary subspace analysis
* Asymmetric cointegration

