Correlogram

In the analysis of data, a correlogram is a chart of correlation statistics. For example, in time series analysis, a plot of the sample autocorrelations r_h versus h (the time lags) is an autocorrelogram. If cross-correlation is plotted, the result is called a cross-correlogram.

The correlogram is a commonly used tool for checking randomness in a data set. If the data are random, the autocorrelations should be near zero for all time-lag separations. If they are not random, one or more of the autocorrelations will be significantly non-zero. If the analyst does not check for randomness, the validity of many of the statistical conclusions becomes suspect.

In addition, correlograms are used in the model identification stage for Box–Jenkins autoregressive moving average time series models.

In multivariate analysis, correlation matrices shown as color-mapped images may also be called "correlograms" or "corrgrams".


Applications

The correlogram can help provide answers to the following questions:
* Are the data random?
* Is an observation related to an adjacent observation?
* Is an observation related to an observation twice-removed? (etc.)
* Is the observed time series white noise?
* Is the observed time series sinusoidal?
* Is the observed time series autoregressive?
* What is an appropriate model for the observed time series?
* Is the model
:: Y = \text{constant} + \text{error}
: valid and sufficient?
* Is the formula s_{\bar{Y}} = s/\sqrt{N} valid?


Importance

Randomness (along with fixed model, fixed variation, and fixed distribution) is one of the four assumptions that typically underlie all measurement processes. The randomness assumption is critically important for the following three reasons:
* Most standard statistical tests depend on randomness. The validity of the test conclusions is directly linked to the validity of the randomness assumption.
* Many commonly used statistical formulae depend on the randomness assumption, the most common being the formula for the standard error of the sample mean:
:: s_{\bar{Y}} = s/\sqrt{N}
: where ''s'' is the standard deviation of the data and ''N'' is the sample size. Although heavily used, the results from this formula are of no value unless the randomness assumption holds.
* For univariate data, the default model is
:: Y = \text{constant} + \text{error}
: If the data are not random, this model is incorrect and invalid, and the estimates for the parameters (such as the constant) become nonsensical and invalid.
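As a small worked illustration of the standard error formula, the sketch below computes s/\sqrt{N} for a short list of numbers. The function name and the sample data are hypothetical, and ''s'' is assumed here to be the sample standard deviation (divisor N − 1). The result is only meaningful if the observations are random, which is the point made in the list.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(data):
    """Return s / sqrt(N), with s taken as the sample standard deviation.

    Assumption: the observations are random (uncorrelated); for
    autocorrelated data this value can be badly misleading.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    return stdev(data) / sqrt(n)


# Hypothetical measurements, for illustration only.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = standard_error_of_mean(measurements)
```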


Estimation of autocorrelations

The autocorrelation coefficient at lag ''h'' is given by
: r_h = c_h/c_0
where ''c_h'' is the autocovariance function
: c_h = \frac{1}{N} \sum_{t=1}^{N-h} \left(Y_t - \bar{Y}\right)\left(Y_{t+h} - \bar{Y}\right)
and ''c_0'' is the variance function
: c_0 = \frac{1}{N} \sum_{t=1}^{N} \left(Y_t - \bar{Y}\right)^2
The resulting value of ''r_h'' will range between −1 and +1.
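The estimate above can be sketched in a few lines of plain Python. This is a minimal illustration using the 1/N autocovariance, not a library implementation; the function name and the example series are hypothetical.

```python
def sample_autocorrelation(y, h):
    """Return r_h = c_h / c_0 using the 1/N autocovariance estimate."""
    n = len(y)
    if not 0 <= h < n:
        raise ValueError("lag h must satisfy 0 <= h < len(y)")
    mean = sum(y) / n
    # c_h: sum over the N - h overlapping pairs (Y_t, Y_{t+h}), divided by N.
    c_h = sum((y[t] - mean) * (y[t + h] - mean) for t in range(n - h)) / n
    # c_0: the same expression at lag 0.
    c_0 = sum((v - mean) ** 2 for v in y) / n
    if c_0 == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return c_h / c_0


# Hypothetical series, for illustration only.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
r = [sample_autocorrelation(series, h) for h in range(3)]
# r[0] is always 1; for this series r[1] is 0.4 and r[2] is -0.1
```

Plotting these values against the lag gives the autocorrelogram.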


Alternate estimate

Some sources may use the following formula for the autocovariance function:
: c_h = \frac{1}{N-h} \sum_{t=1}^{N-h} \left(Y_t - \bar{Y}\right)\left(Y_{t+h} - \bar{Y}\right)
Although this definition has less bias, the (1/''N'') formulation has some desirable statistical properties and is the form most commonly used in the statistics literature. See pages 20 and 49–50 in Chatfield for details.
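The two divisors can be compared side by side. The following sketch is illustrative only: the function name, the `divide_by_n_minus_h` flag and the example series are hypothetical. Both estimates share the same sum and differ only in the divisor.

```python
def autocovariance(y, h, divide_by_n_minus_h=False):
    """Return c_h with divisor N (default) or N - h (alternate estimate)."""
    n = len(y)
    if not 0 <= h < n:
        raise ValueError("lag h must satisfy 0 <= h < len(y)")
    mean = sum(y) / n
    # The sum over the N - h overlapping pairs is identical for both estimates.
    total = sum((y[t] - mean) * (y[t + h] - mean) for t in range(n - h))
    divisor = (n - h) if divide_by_n_minus_h else n
    return total / divisor


# Hypothetical series, for illustration only.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
c2_standard = autocovariance(series, 2)                             # divisor N = 5
c2_alternate = autocovariance(series, 2, divide_by_n_minus_h=True)  # divisor N - h = 3
```

Because N − h is smaller than N for any positive lag, the alternate estimate is larger in magnitude, and the gap widens as the lag approaches the length of the series.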


Statistical inference with correlograms

In the same graph one can draw upper and lower bounds for autocorrelation with significance level \alpha:
: B = \pm z_{1-\alpha/2} \, SE(r_h)
with r_h as the estimated autocorrelation at lag h.

If the autocorrelation is higher (lower) than this upper (lower) bound, the null hypothesis that there is no autocorrelation at and beyond a given lag is rejected at a significance level of \alpha. This test is an approximate one and assumes that the time series is Gaussian.

In the above, z_{1-\alpha/2} is the quantile of the normal distribution; SE is the standard error, which can be computed by Bartlett's formula for MA(''ℓ'') processes:
: SE(r_1) = \frac{1}{\sqrt{N}}
: SE(r_h) = \sqrt{\frac{1 + 2\sum_{i=1}^{h-1} r_i^2}{N}} for h > 1.

In the example plotted, we can reject the null hypothesis that there is no autocorrelation between time-points which are separated by lags up to 4. For most longer periods one cannot reject the null hypothesis of no autocorrelation.

Note that there are two distinct formulas for generating the confidence bands:

1. If the correlogram is being used to test for randomness (i.e., there is no time dependence in the data), the following formula is recommended:
: \pm \frac{z_{1-\alpha/2}}{\sqrt{N}}
where ''N'' is the sample size, ''z'' is the quantile function of the standard normal distribution and ''α'' is the significance level. In this case, the confidence bands have fixed width that depends on the sample size.

2. Correlograms are also used in the model identification stage for fitting ARIMA models. In this case, a moving average model is assumed for the data and the following confidence bands should be generated:
: \pm z_{1-\alpha/2} \sqrt{\frac{1}{N}\left(1 + 2\sum_{i=1}^{k} r_i^2\right)}
where ''k'' is the lag. In this case, the confidence bands increase as the lag increases.
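Both kinds of band can be sketched with the Python standard library. This is an illustration under stated assumptions rather than a reference implementation: the function names and example autocorrelations are hypothetical, and the lag-dependent band is built from the Bartlett standard error SE(r_h) given earlier in this section (sum over lags 1 to h − 1), combined as B = ± z SE(r_h).

```python
from math import sqrt
from statistics import NormalDist


def randomness_band(n, alpha=0.05):
    """Half-width z_{1-alpha/2} / sqrt(N) of the fixed band for testing randomness."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(n)


def bartlett_standard_errors(r, n):
    """Return SE(r_h) for h = 1..len(r); r[h - 1] holds the sample value r_h."""
    errors = []
    running = 0.0  # sum of r_i^2 for i = 1..h-1 (empty sum at h = 1)
    for r_h in r:
        errors.append(sqrt((1 + 2 * running) / n))
        running += r_h ** 2
    return errors


def bartlett_bands(r, n, alpha=0.05):
    """Half-widths z_{1-alpha/2} * SE(r_h) of the lag-dependent band."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return [z * se for se in bartlett_standard_errors(r, n)]


# Hypothetical sample autocorrelations r_1, r_2 from a series of length 100.
r_values = [0.5, 0.5]
fixed = randomness_band(100)
se = bartlett_standard_errors(r_values, 100)
bands = bartlett_bands(r_values, 100)
```

The fixed band depends only on N and alpha. The Bartlett-based band can only stay level or widen with the lag, because each additional squared autocorrelation adds a non-negative term to the running sum.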


Software

Correlograms are available in most general purpose statistical libraries.

Correlograms:
* Python pandas: pandas.plotting.autocorrelation_plot
* R: functions acf and pacf

Corrgrams:
* Python seaborn: heatmap, pairplot
* R: corrgram


Related techniques

* Partial autocorrelation function
* Lag plot
* Spectral plot
* Seasonal subseries plot
* Scaled correlation
* Variogram



External links


Autocorrelation Plot