In time series analysis used in statistics and econometrics, autoregressive integrated moving average (ARIMA) and seasonal ARIMA (SARIMA) models are generalizations of the autoregressive moving average (ARMA) model to non-stationary series and periodic variation, respectively. All these models are fitted to time series in order to better understand the data and predict future values. The purpose of these generalizations is to fit the data as well as possible. Specifically, ARMA assumes that the series is stationary, that is, its expected value is constant in time. If instead the series has a trend (but a constant variance/autocovariance), the trend is removed by "differencing", leaving a stationary series. This operation generalizes ARMA and corresponds to the "integrated" part of ARIMA. Analogously, periodic variation is removed by "seasonal differencing".
Components
As in ARMA, the "autoregressive" (AR) part of ARIMA indicates that the evolving variable of interest is regressed on its prior values. The "moving average" (MA) part indicates that the regression error is a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The "integrated" (I) part indicates that the data values have been replaced with the difference between each value and the previous value.
According to Wold's decomposition theorem, the ARMA model is sufficient to describe a regular (a.k.a. purely nondeterministic) wide-sense stationary time series. This motivates making a non-stationary time series stationary, e.g., by differencing, before using ARMA.
If the time series contains a predictable sub-process (a.k.a. pure sine or complex-valued exponential process), the predictable component is treated in the ARIMA framework as a non-zero-mean but periodic (i.e., seasonal) component, which is eliminated by seasonal differencing.
Mathematical formulation
Non-seasonal ARIMA models are usually denoted ARIMA(''p'', ''d'', ''q'') where parameters ''p'', ''d'', ''q'' are non-negative integers: ''p'' is the order (number of time lags) of the autoregressive model, ''d'' is the degree of differencing (the number of times the data have had past values subtracted), and ''q'' is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(''p'', ''d'', ''q'')(''P'', ''D'', ''Q'')<sub>''m''</sub>, where the uppercase ''P'', ''D'', ''Q'' are the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model and ''m'' is the number of periods in each season.
When two of the parameters are 0, the model may be referred to based on the non-zero parameter, dropping "AR", "I" or "MA" from the acronym. For example, ARIMA(1, 0, 0) is AR(1), ARIMA(0, 1, 0) is I(1), and ARIMA(0, 0, 1) is MA(1).
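The naming convention can be sketched in a few lines of Python. The helper name `short_name` is hypothetical and serves only to illustrate the rule that a model with exactly one non-zero order is named after that part alone.

```python
def short_name(p: int, d: int, q: int) -> str:
    """Return the abbreviated name of a non-seasonal ARIMA(p, d, q) model.

    When exactly two of the three orders are zero, the acronym is
    shortened to the part that remains (AR, I or MA).
    """
    if min(p, d, q) < 0:
        raise ValueError("p, d and q must be non-negative integers")
    parts = (("AR", p), ("I", d), ("MA", q))
    nonzero = [(name, order) for name, order in parts if order > 0]
    if len(nonzero) == 1:
        name, order = nonzero[0]
        return f"{name}({order})"
    return f"ARIMA({p}, {d}, {q})"


for order in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 1, 1)]:
    print(order, "->", short_name(*order))
```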
Given time series data <math>X_t</math> where <math>t</math> is an integer index and the <math>X_t</math> are real numbers, an ARMA(''p′'', ''q'') model is given by
:<math>X_t - \alpha_1 X_{t-1} - \dots - \alpha_{p'} X_{t-p'} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},</math>
or equivalently by
:<math>\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t</math>
where <math>L</math> is the lag operator, the <math>\alpha_i</math> are the parameters of the autoregressive part of the model, the <math>\theta_i</math> are the parameters of the moving average part and the <math>\varepsilon_t</math> are error terms. The error terms <math>\varepsilon_t</math> are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean.
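As an illustration, the ARMA recursion X_t = α_1 X_{t−1} + … + α_p′ X_{t−p′} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q} can be simulated directly. The following Python sketch is not taken from any particular library; the function name `simulate_arma`, the zero pre-sample values and the coefficient values are assumptions made for the example.

```python
import random


def simulate_arma(alpha, theta, n, sigma=1.0, seed=0):
    """Simulate n values of an ARMA process.

    alpha: autoregressive coefficients (alpha_1, ..., alpha_p')
    theta: moving-average coefficients (theta_1, ..., theta_q)
    Pre-sample values of X and of the errors are taken to be zero
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    x, eps = [], []
    for t in range(n):
        e = rng.gauss(0.0, sigma)  # i.i.d. normal error with zero mean
        ar = sum(a * x[t - i] for i, a in enumerate(alpha, start=1) if t - i >= 0)
        ma = sum(th * eps[t - j] for j, th in enumerate(theta, start=1) if t - j >= 0)
        eps.append(e)
        x.append(ar + e + ma)
    return x, eps


# ARMA(1, 1) with alpha_1 = 0.6 and theta_1 = 0.3 (illustrative values)
x, eps = simulate_arma([0.6], [0.3], n=200)
print(x[:5])
```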
If the polynomial <math>\textstyle \left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right)</math> has a unit root (a factor <math>(1 - L)</math>) of multiplicity ''d'', then it can be rewritten as:
:<math>\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) = \left(1 - \sum_{i=1}^{p'-d} \varphi_i L^i\right) (1 - L)^d.</math>
An ARIMA(''p'', ''d'', ''q'') process expresses this polynomial factorisation property with ''p'' = ''p′'' − ''d'', and is given by:
:<math>\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) (1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t</math>
and so is a special case of an ARMA(''p'' + ''d'', ''q'') process having the autoregressive polynomial with ''d'' unit roots. (This is why no process that is accurately described by an ARIMA model with ''d'' > 0 is wide-sense stationary.)
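The factorisation can be checked by expanding the product of the stationary autoregressive polynomial and (1 − L)^d. The Python sketch below represents polynomials in L as coefficient lists; the helper names `poly_mul` and `arima_ar_polynomial` and the value φ_1 = 0.5 are assumptions chosen for illustration.

```python
def poly_mul(a, b):
    """Multiply two polynomials in L given as coefficient lists (constant term first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def arima_ar_polynomial(phi, d):
    """Expand (1 - phi_1 L - ... - phi_p L^p) (1 - L)^d into one polynomial in L."""
    poly = [1.0] + [-c for c in phi]
    for _ in range(d):
        poly = poly_mul(poly, [1.0, -1.0])  # multiply by one factor (1 - L)
    return poly


# p = 1, d = 1, phi_1 = 0.5: (1 - 0.5 L)(1 - L) = 1 - 1.5 L + 0.5 L^2
combined = arima_ar_polynomial([0.5], d=1)
print(combined)       # [1.0, -1.5, 0.5]
print(sum(combined))  # polynomial evaluated at L = 1; 0.0 signals a unit root
```

The expanded polynomial has degree ''p'' + ''d'', which is the autoregressive order of the equivalent ARMA representation, and it vanishes at ''L'' = 1.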
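The above can be generalized as follows.
:<math>\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) (1 - L)^d X_t = \delta + \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t.</math>
This defines an ARIMA(''p'', ''d'', ''q'') process with drift <math>\tfrac{\delta}{1 - \sum \varphi_i}</math>.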
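The drift value can be checked numerically. With the error terms set to zero, the ''d''-times differenced series Y_t obeys Y_t = δ + φ_1 Y_{t−1} + … + φ_p Y_{t−p}, and when the autoregressive part is stationary this iteration settles at δ / (1 − Σ φ_i). The Python sketch below is an illustration only; the function name `mean_increment`, the zero start-up values and the numbers used are assumptions.

```python
def mean_increment(delta, phi, steps=200):
    """Iterate Y_t = delta + sum_i phi_i * Y_{t-i} with the error terms set to zero.

    Y_t stands for the d-times differenced series. For a stationary
    autoregressive part the iteration converges to the long-run average
    increment of the process.
    """
    y = [0.0] * len(phi)  # zero start-up values (an assumption)
    for _ in range(steps):
        y.append(delta + sum(p * y[-i] for i, p in enumerate(phi, start=1)))
    return y[-1]


delta, phi = 1.0, [0.5]
print(mean_increment(delta, phi))  # numerical limit of the iteration
print(delta / (1 - sum(phi)))      # closed form delta / (1 - sum of phi_i)
```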
Other special forms
The explicit identification of the factorization of the autoregression polynomial into factors as above can be extended to other cases, firstly to apply to the moving average polynomial and secondly to include other special factors. For example, having a factor <math>(1 - L^s)</math> in a model is one way of including a non-stationary seasonality of period ''s'' into the model; this factor has the effect of re-expressing the data as changes from ''s'' periods ago. Another example is the factor
<math>(1 + L)</math>, which includes a (non-stationary) seasonality of period 2. The effect of the first type of factor is to allow each season's value to drift separately over time, whereas with the second type values for adjacent seasons move together.
Identification and specification of appropriate factors in an ARIMA model can be an important step in modeling as it can allow a reduction in the overall number of parameters to be estimated while allowing the imposition on the model of types of behavior that logic and experience suggest should be there.
Differencing
A stationary time series's properties do not change over time. Specifically, for a wide-sense stationary time series, the mean and the variance/autocovariance are constant over time. Differencing in statistics is a transformation applied to a non-stationary time series in order to make it stationary in the mean, by removing or subtracting the trend or non-constant mean. However, it does not affect the non-stationarity of the variance or autocovariance. Likewise, ''seasonal differencing'' or ''deseasonalization'' is applied to a time series to remove the seasonal component.
From the perspective of signal processing, especially Fourier spectral analysis, the trend is the low-frequency part of the spectrum of a series, while the seasonal component is a periodic-frequency part. Therefore, differencing is a high-pass (that is, low-stop) filter and seasonal differencing is a comb filter, which suppress the low-frequency trend and the periodic-frequency seasonal component, respectively, in the spectral domain (rather than directly in the time domain).
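The filter interpretation can be made concrete by evaluating the frequency response of the two operators. Substituting e^{−iω} for the lag operator gives the gain |1 − e^{−iω}| = 2|sin(ω/2)| for ordinary differencing and |1 − e^{−iωm}| for seasonal differencing with period ''m''. The Python sketch below uses hypothetical function names and an illustrative period of 12.

```python
import cmath
import math


def gain_difference(omega):
    """Magnitude response of the first-difference filter 1 - L at angular frequency omega."""
    return abs(1 - cmath.exp(-1j * omega))


def gain_seasonal_difference(omega, m):
    """Magnitude response of the seasonal-difference filter 1 - L^m."""
    return abs(1 - cmath.exp(-1j * omega * m))


# First difference: zero gain at frequency 0 (the trend), largest gain at pi.
for omega in (0.0, math.pi / 4, math.pi / 2, math.pi):
    print(f"omega = {omega:.3f}  gain = {gain_difference(omega):.3f}")

# Seasonal difference with m = 12: notches at the multiples of 2*pi/12.
for k in range(4):
    omega = 2 * math.pi * k / 12
    print(f"harmonic {k}  gain = {gain_seasonal_difference(omega, 12):.3e}")
```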
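To difference the data, we compute the difference between consecutive observations. Mathematically, this is shown as
:<math>X_t' = X_t - X_{t-1}.</math>
It may be necessary to difference the data a second time to obtain a stationary time series, which is referred to as second-order differencing:
:<math>X_t'' = X_t' - X_{t-1}' = (X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) = X_t - 2X_{t-1} + X_{t-2}.</math>
Seasonal differencing involves computing the difference between an observation and the corresponding observation in the previous season (e.g., a year). This is shown as:
:<math>X_t' = X_t - X_{t-m},</math>
where ''m'' is the number of periods in each season.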
The differenced data are then used for the estimation of an ARMA model.
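The three differencing operations can be illustrated with a single helper that subtracts the value observed a given number of steps earlier. The following Python sketch is illustrative; the function name `difference` and the toy series are assumptions.

```python
def difference(x, lag=1):
    """Return x_t - x_{t-lag} for t = lag, ..., len(x) - 1."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


trend = [3 + 2 * t for t in range(8)]   # linear trend
quadratic = [t * t for t in range(8)]   # quadratic trend
seasonal = [10, 20, 30, 40] * 3         # pure seasonal pattern with period m = 4

print(difference(trend))                  # [2, 2, 2, 2, 2, 2, 2]
print(difference(difference(quadratic)))  # [2, 2, 2, 2, 2, 2]
print(difference(seasonal, lag=4))        # [0, 0, 0, 0, 0, 0, 0, 0]
```

One difference removes a linear trend, a second difference is needed for a quadratic trend, and differencing at the seasonal lag removes a fixed seasonal pattern. Each pass shortens the series by the lag used.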
Examples
Some well-known special cases arise naturally or are mathematically equivalent to other popular forecasting models. For example:
* ARIMA(0, 0, 0) models white noise.
* An ARIMA(0, 1, 0) model is a random walk.
* An ARIMA(1, 1, 2) model is a damped Holt's model (damped-trend linear exponential smoothing).
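* An ARIMA(0, 1, 1) model without constant is a basic exponential smoothing model.
* An ARIMA(0, 2, 2) model is given by <math>X_t = 2X_{t-1} - X_{t-2} + (\alpha + \beta - 2)\varepsilon_{t-1} + (1 - \alpha)\varepsilon_{t-2} + \varepsilon_t</math>, where <math>\alpha</math> and <math>\beta</math> are the smoothing parameters of Holt's method; this is equivalent to Holt's linear method with additive errors, or double exponential smoothing.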
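The equivalence between ARIMA(0, 1, 1) without constant and basic exponential smoothing can be verified on the one-step forecasts. Writing the model as X_t = X_{t−1} + ε_t + θ ε_{t−1}, the forecast of X_{t+1} is X_t + θ ε_t, and with ε_t taken as the previous forecast error this equals α X_t + (1 − α) × (previous forecast) when θ = α − 1. The Python sketch below is illustrative; the function names, data and starting value are assumptions.

```python
def ses_forecasts(x, alpha, initial):
    """One-step forecasts from simple exponential smoothing."""
    f = [initial]
    for value in x:
        f.append(alpha * value + (1 - alpha) * f[-1])
    return f  # f[t] forecasts x[t]; the last entry forecasts the next, unseen value


def arima011_forecasts(x, theta, initial):
    """One-step forecasts from X_t = X_{t-1} + eps_t + theta * eps_{t-1} (no constant)."""
    f = [initial]
    for value in x:
        residual = value - f[-1]            # eps_t estimated as the one-step forecast error
        f.append(value + theta * residual)  # forecast of X_{t+1} is X_t + theta * eps_t
    return f


data = [12.0, 13.5, 12.8, 14.1, 15.0, 14.2]
alpha = 0.3
ses = ses_forecasts(data, alpha, initial=data[0])
arima = arima011_forecasts(data, theta=alpha - 1, initial=data[0])
print(max(abs(a - b) for a, b in zip(ses, arima)))  # difference at rounding-error level
```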
Choosing the order
The orders ''p'' and ''q'' can be determined using the sample autocorrelation function (ACF), partial autocorrelation function (PACF), and/or extended autocorrelation function (EACF) method.
Other alternative methods include AIC, BIC, etc.
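As an illustration of the first two tools, the sample ACF can be computed directly from the data, and partial autocorrelations can be obtained from autocorrelations with the Durbin–Levinson recursion. The Python sketch below is a minimal version under stated assumptions: the function names are hypothetical and the ACF uses the common estimator that divides by the series length at every lag. For a pure AR(''p'') process the theoretical PACF is zero beyond lag ''p'', which the example reproduces for ''p'' = 1.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n / c0
        for k in range(max_lag + 1)
    ]


def pacf_from_acf(rho):
    """Partial autocorrelations at lags 1..K from rho[0..K] (Durbin-Levinson recursion)."""
    pacf, phi_prev = [], []
    for k in range(1, len(rho)):
        if k == 1:
            phi_kk = rho[1]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
        # update the coefficients of the order-k autoregression
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: rho_k = 0.5 ** k
ar1_acf = [0.5 ** k for k in range(5)]
print(pacf_from_acf(ar1_acf))  # 0.5 at lag 1, zero at higher lags

print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2))
```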
To determine the order of a non-seasonal ARIMA model, a useful criterion is the Akaike information criterion (AIC). It is written as
:<math>\text{AIC} = -2 \log(L) + 2(p + q + k),</math>
where ''L'' is the likelihood of the data, ''p'' is the order of the autoregressive part and ''q'' is the order of the moving average part. The ''k'' represents the intercept of the ARIMA model. For AIC, if ''k'' = 1 then there is an intercept in the ARIMA model (''c'' ≠ 0) and if ''k'' = 0 then there is no intercept in the ARIMA model (''c'' = 0).
The corrected AIC for ARIMA models can be written as
:<math>\text{AICc} = \text{AIC} + \frac{2(p + q + k)(p + q + k + 1)}{T - p - q - k - 1},</math>
where ''T'' is the number of observations. The Bayesian Information Criterion (BIC) can be written as
:<math>\text{BIC} = \text{AIC} + ((\log T) - 2)(p + q + k).</math>
The objective is to minimize the AIC, AICc or BIC values for a good model. Among the candidate models being investigated, the lower the value of one of these criteria, the better the model suits the data. The AIC and the BIC serve different purposes: the AIC tries to select the model that best approximates the reality of the situation, whereas the BIC attempts to find the perfect fit. The BIC approach is often criticized because there is never a perfect fit to complex real-life data; however, it is still a useful method for selection because it penalizes models more heavily for having more parameters than the AIC does.
AICc can only be used to compare ARIMA models with the same orders of differencing. For ARIMAs with different orders of differencing, RMSE can be used for model comparison.
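The three criteria are simple functions of the maximised log-likelihood. The Python sketch below assumes the parameter count ''p'' + ''q'' + ''k'' and writes ''T'' for the number of observations; some texts also count the error variance as a parameter, which adds one to that count. The function name `information_criteria` and the numbers in the example are hypothetical.

```python
import math


def information_criteria(log_likelihood, p, q, k, T):
    """Return (AIC, AICc, BIC) for an ARIMA model.

    log_likelihood: maximised log-likelihood, log(L)
    p, q: autoregressive and moving-average orders
    k: 1 if the model has an intercept, 0 otherwise
    T: number of observations used in the fit
    """
    n_params = p + q + k
    aic = -2.0 * log_likelihood + 2.0 * n_params
    aicc = aic + 2.0 * n_params * (n_params + 1) / (T - n_params - 1)
    bic = aic + (math.log(T) - 2.0) * n_params
    return aic, aicc, bic


# Two hypothetical candidates fitted to the same T = 50 observations
for label, loglik, p, q in [("ARIMA(1,1,0)", -101.2, 1, 0), ("ARIMA(2,1,2)", -99.8, 2, 2)]:
    aic, aicc, bic = information_criteria(loglik, p, q, k=0, T=50)
    print(f"{label}: AIC={aic:.2f}  AICc={aicc:.2f}  BIC={bic:.2f}")
```

Because log(''T'') exceeds 2 once ''T'' is at least 8, the BIC penalty per parameter is larger than the AIC penalty for all but very short series.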
Estimation of coefficients
Forecasts using ARIMA models
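The ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary:
:<math>Y_t = (1 - L)^d X_t</math>
while the second is wide-sense stationary:
:<math>\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) Y_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t.</math>
Now forecasts can be made for the process <math>Y_t</math>, using a generalization of the method of autoregressive forecasting.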
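The cascade can be illustrated for the simplest non-trivial case, ARIMA(1, 1, 0) without constant: difference once, forecast the stationary AR(1) series, then add the forecast increments back onto the last observed level. The Python sketch below assumes that the coefficient φ is already known rather than estimated; the function name and data are hypothetical.

```python
def forecast_arima_110(x, phi, horizon):
    """Point forecasts for ARIMA(1, 1, 0) without constant, given a known phi.

    Step 1: difference once, Y_t = X_t - X_{t-1} (the non-stationary stage).
    Step 2: forecast the stationary AR(1) series, Y_hat_{T+h} = phi**h * Y_T.
    Step 3: integrate, adding the forecast increments to the last level.
    """
    y_last = x[-1] - x[-2]
    level = x[-1]
    forecasts = []
    for _ in range(horizon):
        y_last = phi * y_last  # next forecast increment
        level += y_last        # undo the differencing
        forecasts.append(level)
    return forecasts


print(forecast_arima_110([10.0, 12.0], phi=0.5, horizon=3))  # [13.0, 13.5, 13.75]
```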
Forecast intervals
The forecast intervals (confidence intervals for forecasts) for ARIMA models are based on assumptions that the residuals are uncorrelated and normally distributed. If either of these assumptions does not hold, then the forecast intervals may be incorrect. For this reason, researchers plot the ACF and histogram of the residuals to check the assumptions before producing forecast intervals.
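95% forecast interval: <math>\hat{X}_{T+h \mid T} \pm 1.96 \sqrt{v_{T+h \mid T}}</math>, where <math>v_{T+h \mid T}</math> is the variance of <math>X_{T+h} \mid X_1, \dots, X_T</math>.
For <math>h = 1</math>, <math>v_{T+h \mid T} = \hat{\sigma}^2</math> for all ARIMA models regardless of parameters and orders, where <math>\hat{\sigma}^2</math> is the estimated variance of the error terms.
For ARIMA(0,0,q), <math>X_t = \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}</math> and
:<math>v_{T+h \mid T} = \hat{\sigma}^2 \left[1 + \sum_{i=1}^{h-1} \theta_i^2\right], \text{ for } h = 2, 3, \ldots,</math>
with <math>\theta_i = 0</math> for <math>i > q</math>.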
In general, forecast intervals from ARIMA models will increase as the forecast horizon increases.
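For a pure moving-average model the widening of the intervals can be computed in closed form: the ''h''-step forecast variance is the error variance times one plus the sum of the first ''h'' − 1 squared moving-average coefficients, so it stops growing once ''h'' exceeds ''q''. The Python sketch below is illustrative; the function names, coefficients and point forecast are assumptions.

```python
import math


def ma_forecast_variance(theta, sigma2, h):
    """Variance of the h-step forecast error for ARIMA(0, 0, q).

    theta: (theta_1, ..., theta_q); coefficients beyond lag q count as zero.
    sigma2: estimated variance of the error terms.
    """
    return sigma2 * (1.0 + sum(t * t for t in theta[: h - 1]))


def forecast_interval(point, variance, z=1.96):
    """Approximate 95% interval: point forecast +/- 1.96 * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return point - half_width, point + half_width


theta, sigma2 = [0.5, 0.3], 4.0  # an MA(2) model with illustrative values
for h in range(1, 5):
    v = ma_forecast_variance(theta, sigma2, h)
    low, high = forecast_interval(10.0, v)
    print(f"h={h}: variance={v:.2f}, interval=({low:.2f}, {high:.2f})")
```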
Variations and extensions
A number of variations on the ARIMA model are commonly employed. If multiple time series are used then the <math>X_t</math> can be thought of as vectors and a VARIMA model may be appropriate. Sometimes a seasonal effect is suspected in the model; in that case, it is generally considered better to use a SARIMA (seasonal ARIMA) model than to increase the order of the AR or MA parts of the model.
If the time series is suspected to exhibit long-range dependence, then the ''d'' parameter may be allowed to have non-integer values in an autoregressive fractionally integrated moving average model, which is also called a Fractional ARIMA (FARIMA or ARFIMA) model.
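With non-integer ''d'', the operator (1 − L)^d is understood through its binomial series (1 − L)^d = Σ_k w_k L^k, whose coefficients follow the recursion w_0 = 1, w_k = w_{k−1} (k − 1 − d) / k. For integer ''d'' the series terminates and reproduces ordinary differencing; for fractional ''d'' the weights decay slowly, which is how long-range dependence enters. The Python sketch below is illustrative and the function name is hypothetical.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients w_0, ..., w_{n_weights-1} of the expansion (1 - L)^d = sum_k w_k L^k."""
    w = [1.0]
    for k in range(1, n_weights):
        w.append(w[-1] * (k - 1 - d) / k)
    return w


print(frac_diff_weights(1, 4))    # ordinary first difference: weights 1, -1, then zeros
print(frac_diff_weights(2, 4))    # second difference: weights 1, -2, 1, then zero
print(frac_diff_weights(0.5, 4))  # fractional case: weights shrink but never reach zero
```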
Software implementations
Various packages that apply methodology like Box–Jenkins parameter optimization are available to find the right parameters for the ARIMA model.
* EViews: has extensive ARIMA and SARIMA capabilities.
* Julia: contains an ARIMA implementation in the TimeModels package.
* Mathematica: includes the ARIMAProcess function.
* MATLAB: the Econometrics Toolbox includes ARIMA models.
* NCSS: includes several procedures for ARIMA fitting and forecasting.
* Python: the "statsmodels" package includes models for time series analysis: univariate time series analysis (AR, ARIMA), vector autoregressive models (VAR and structural VAR), and descriptive statistics and process models for time series analysis.
* R: the standard R ''stats'' package includes an ''arima'' function, which is documented in "ARIMA Modelling of Time Series". Besides the ARIMA(''p'', ''d'', ''q'') part, the function also includes seasonal factors, an intercept term, and exogenous variables (''xreg'', called "external regressors"). A further package has scripts such as ''sarima'' to estimate seasonal or nonseasonal models and ''sarima.sim'' to simulate from these models. The CRAN task view on time series is the reference with many more links. Another R package can automatically select an ARIMA model for a given time series, although the automatic selection can often give questionable results, and can also simulate seasonal and non-seasonal ARIMA models.
* Ruby: the "statsample-timeseries" gem is used for time series analysis, including ARIMA models and Kalman Filtering.
* JavaScript: the "arima" package includes models for time series analysis and forecasting (ARIMA, SARIMA, SARIMAX, AutoARIMA).
* C: the "ctsa" package includes ARIMA, SARIMA, SARIMAX, AutoARIMA and multiple methods for time series analysis.
* SAFE TOOLBOXES: includes ARIMA modelling.
* SAS: includes extensive ARIMA processing in its Econometric and Time Series Analysis system: SAS/ETS.
* IBM SPSS: includes ARIMA modeling in the Professional and Premium editions of its Statistics package as well as its Modeler package. The default Expert Modeler feature evaluates a range of seasonal and non-seasonal autoregressive (''p''), integrated (''d''), and moving average (''q'') settings and seven exponential smoothing models. The Expert Modeler can also transform the target time-series data into its square root or natural log. The user also has the option to restrict the Expert Modeler to ARIMA models, or to manually enter ARIMA nonseasonal and seasonal ''p'', ''d'', and ''q'' settings without Expert Modeler. Automatic outlier detection is available for seven types of outliers, and the detected outliers will be accommodated in the time-series model if this feature is selected.
* SAP: the APO-FCS package in SAP ERP from SAP allows creation and fitting of ARIMA models using the Box–Jenkins methodology.
* SQL Server Analysis Services: from Microsoft, includes ARIMA as a Data Mining algorithm.
* Stata: includes ARIMA modelling (using its arima command) as of Stata 9.
* StatSim: includes ARIMA models in the Forecast web app.
* Teradata Vantage: has the ARIMA function as part of its machine learning engine.
* TOL (Time Oriented Language): is designed to model ARIMA models (including SARIMA, ARIMAX and DSARIMAX variants).
* Scala: the spark-timeseries library contains an ARIMA implementation for Scala, Java and Python. The implementation is designed to run on Apache Spark.
* PostgreSQL/MadLib: Time Series Analysis/ARIMA.
* X-12-ARIMA: from the US Bureau of the Census.
See also
* Autocorrelation
* ARMA
* Finite impulse response
* Infinite impulse response
* Partial autocorrelation
* X-13ARIMA-SEATS
References
Further reading
* Shumway R.H. and Stoffer, D.S. (2017). ''Time Series Analysis and Its Applications: With R Examples''. Springer. DOI: 10.1007/978-3-319-52452-8
* ARIMA Models in R. Become an expert in fitting ARIMA (autoregressive integrated moving average) models to time series data using R.
External links
by Robert Nau at Duke University