In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). ARIMA models are applied in some cases where data show evidence of non-stationarity in the sense of mean (but not variance/autocovariance), where an initial differencing step (corresponding to the "integrated" part of the model) can be applied one or more times to eliminate the non-stationarity of the mean function (i.e., the trend). When a time series shows seasonality, seasonal differencing can be applied to eliminate the seasonal component. Since the ARMA model, according to Wold's decomposition theorem, is theoretically sufficient to describe a regular (a.k.a. purely nondeterministic) wide-sense stationary time series, we are motivated to make a non-stationary time series stationary, e.g., by differencing, before using the ARMA model. Note that if the time series contains a predictable sub-process (a.k.a. pure sine or complex-valued exponential process), the predictable component is treated as a non-zero-mean but periodic (i.e., seasonal) component in the ARIMA framework, so that it is eliminated by seasonal differencing.
The AR part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged (i.e., prior) values. The MA part indicates that the regression error is actually a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The I (for "integrated") indicates that the data values have been replaced with the difference between their values and the previous values (and this differencing process may have been performed more than once). The purpose of each of these features is to make the model fit the data as well as possible.
Non-seasonal ARIMA models are generally denoted ARIMA(''p'',''d'',''q'') where the parameters ''p'', ''d'', and ''q'' are non-negative integers: ''p'' is the order (number of time lags) of the autoregressive model, ''d'' is the degree of differencing (the number of times the data have had past values subtracted), and ''q'' is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(''p'',''d'',''q'')(''P'',''D'',''Q'')<sub>''m''</sub>, where ''m'' refers to the number of periods in each season, and the uppercase ''P'', ''D'', ''Q'' refer to the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model.
When two out of the three terms are zero, the model may be referred to based on the non-zero parameter, dropping "AR", "I" or "MA" from the acronym describing the model. For example, ARIMA(1,0,0) is AR(1), ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1).
ARIMA models can be estimated following the
Box–Jenkins approach.
Definition
Given time series data <math>X_t</math> where <math>t</math> is an integer index and the <math>X_t</math> are real numbers, an <math>\text{ARMA}(p', q)</math> model is given by
:<math>X_t - \alpha_1 X_{t-1} - \dots - \alpha_{p'} X_{t-p'} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},</math>
or equivalently by
:<math>\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) X_t = \left(1 + \sum_{i=1}^q \theta_i L^i\right) \varepsilon_t</math>
where <math>L</math> is the lag operator, the <math>\alpha_i</math> are the parameters of the autoregressive part of the model, the <math>\theta_i</math> are the parameters of the moving average part and the <math>\varepsilon_t</math> are error terms. The error terms <math>\varepsilon_t</math> are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean.
Assume now that the polynomial <math>\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right)</math> has a unit root (a factor <math>(1-L)</math>) of multiplicity ''d''. Then it can be rewritten as:
:<math>\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) = \left(1 - \sum_{i=1}^{p'-d} \varphi_i L^i\right)(1-L)^d.</math>
An ARIMA(''p'',''d'',''q'') process expresses this polynomial factorisation property with ''p'' = ''p′'' − ''d'', and is given by:
:<math>\left(1 - \sum_{i=1}^p \varphi_i L^i\right)(1-L)^d X_t = \left(1 + \sum_{i=1}^q \theta_i L^i\right)\varepsilon_t</math>
and thus can be thought of as a particular case of an ARMA(''p''+''d'',''q'') process having the autoregressive polynomial with ''d'' unit roots. (For this reason, no process that is accurately described by an ARIMA model with ''d'' > 0 is wide-sense stationary.)
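The above can be generalized as follows.
:<math>\left(1 - \sum_{i=1}^p \varphi_i L^i\right)(1-L)^d X_t = \delta + \left(1 + \sum_{i=1}^q \theta_i L^i\right)\varepsilon_t.</math>
This defines an ARIMA(''p'',''d'',''q'') process with drift <math>\frac{\delta}{1 - \sum \varphi_i}</math>.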
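The definition can be illustrated by simulating a path: generate the stationary ARMA part, then sum it ''d'' times to undo the <math>(1-L)^d</math> differencing. The following is only a sketch under stated assumptions: pre-sample values are taken as zero, the errors are Gaussian, and the function name <code>simulate_arima</code> and its arguments are hypothetical rather than taken from any library.

```python
import random

def simulate_arima(phi, theta, d, n, sigma=1.0, delta=0.0, seed=0):
    """Simulate n points of an ARIMA(p, d, q) path, p = len(phi), q = len(theta).

    The stationary part follows
        Y_t = delta + sum(phi_i * Y_{t-i}) + eps_t + sum(theta_i * eps_{t-i}),
    and X is obtained by summing Y d times (zero pre-sample values assumed).
    """
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    y = []
    for t in range(n):
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[i] * eps[t - 1 - i] for i in range(q) if t - 1 - i >= 0)
        y.append(delta + ar + eps[t] + ma)
    x = y
    for _ in range(d):
        # A cumulative sum is the inverse of one round of first differencing.
        total, integrated = 0.0, []
        for value in x:
            total += value
            integrated.append(total)
        x = integrated
    return x

path = simulate_arima(phi=[0.5], theta=[0.3], d=1, n=200)
```

With the same seed, differencing <code>path</code> once recovers the ARMA(1,1) series produced by calling the function with <code>d=0</code>, up to floating-point rounding.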
Other special forms
The explicit identification of the factorisation of the autoregression polynomial into factors as above can be extended to other cases, firstly to apply to the moving average polynomial and secondly to include other special factors. For example, having a factor <math>(1 - L^s)</math> in a model is one way of including a non-stationary seasonality of period ''s'' into the model; this factor has the effect of re-expressing the data as changes from ''s'' periods ago. Another example is the factor
, which includes a (non-stationary) seasonality of period 2. The effect of the first type of factor is to allow each season's value to drift separately over time, whereas with the second type values for adjacent seasons move together.
Identification and specification of appropriate factors in an ARIMA model can be an important step in modelling as it can allow a reduction in the overall number of parameters to be estimated, while allowing the imposition on the model of types of behaviour that logic and experience suggest should be there.
Differencing
The properties of a stationary time series do not depend on the time at which the series is observed. Specifically, for a wide-sense stationary time series, the mean and the variance/autocovariance remain constant over time. Differencing in statistics is a transformation applied to a non-stationary time series in order to make it stationary ''in the mean sense'' (viz., to remove the non-constant trend); it does nothing about non-stationarity of the variance or autocovariance. Likewise, seasonal differencing is applied to a seasonal time series to remove the seasonal component. From the perspective of signal processing, especially Fourier spectral analysis, the trend is the low-frequency part of the spectrum of a non-stationary time series, while the season is the periodic-frequency part. Therefore, differencing works as a high-pass (i.e., low-stop) filter and seasonal differencing as a comb filter, suppressing the low-frequency trend and the periodic-frequency season, respectively, in the spectrum domain (rather than directly in the time domain).
To difference the data, the difference between consecutive observations is computed. Mathematically, this is shown as
:<math>y_t' = y_t - y_{t-1}.</math>
Differencing removes the changes in the level of a time series, eliminating trend and seasonality and consequently stabilizing the mean of the time series.
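Sometimes it may be necessary to difference the data a second time to obtain a stationary time series, which is referred to as second-order differencing:
:<math>y_t^* = y_t' - y_{t-1}' = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}.</math>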
Another method of differencing data is seasonal differencing, which involves computing the difference between an observation and the corresponding observation in the previous season, e.g., a year. This is shown as:
:<math>y_t' = y_t - y_{t-m} \quad \text{where } m = \text{duration of season}.</math>
The differenced data are then used for the estimation of an
ARMA model.
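The three differencing operations described in this section can be sketched with one small helper. The data below are made up for illustration (a linear trend plus a pattern repeating every 4 steps), and the name <code>difference</code> is hypothetical.

```python
def difference(series, lag=1):
    """Return y_t - y_{t-lag}; the result is `lag` points shorter than the input."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical data: linear trend plus a pattern that repeats every 4 steps.
pattern = [5.0, -2.0, 0.0, 3.0]
series = [2.0 * t + pattern[t % 4] for t in range(12)]

first = difference(series)             # first-order differencing
second = difference(first)             # second-order differencing
seasonal = difference(series, lag=4)   # seasonal differencing with m = 4
```

For this series the seasonal difference is constant, because subtracting the value from 4 steps earlier cancels the repeating pattern and leaves only the trend increment over one season.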
Examples
Some well-known special cases arise naturally or are mathematically equivalent to other popular forecasting models. For example:
* An ARIMA(0, 1, 0) model (or I(1) model) is given by <math>X_t = X_{t-1} + \varepsilon_t</math>, which is simply a random walk.
* An ARIMA(0, 1, 0) with a constant, given by <math>X_t = c + X_{t-1} + \varepsilon_t</math>, which is a random walk with drift.
* An ARIMA(0, 0, 0) model is a white noise model.
* An ARIMA(1, 1, 2) model is a Damped Holt's model.
* An ARIMA(0, 1, 1) model without constant is a basic exponential smoothing model.
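* An ARIMA(0, 2, 2) model is given by <math>X_t = 2X_{t-1} - X_{t-2} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}</math>, which is equivalent to Holt's linear method with additive errors, or double exponential smoothing.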
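The link between ARIMA(0, 1, 1) and basic exponential smoothing can be checked numerically. The sketch below assumes the sign convention <math>X_t = X_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}</math>, under which a smoothing weight <math>\alpha</math> corresponds to <math>\theta = \alpha - 1</math>; both function names, the data, and the initial forecast are hypothetical.

```python
def ses_forecasts(y, alpha, initial):
    """One-step forecasts from simple exponential smoothing."""
    forecasts = [initial]
    for obs in y:
        forecasts.append(alpha * obs + (1 - alpha) * forecasts[-1])
    return forecasts  # forecasts[t] predicts y[t]; the last entry is out of sample

def arima011_forecasts(y, theta, initial):
    """One-step forecasts from X_t = X_{t-1} + eps_t + theta * eps_{t-1}."""
    forecasts = [initial]
    for obs in y:
        error = obs - forecasts[-1]            # one-step forecast error
        forecasts.append(obs + theta * error)  # last value plus MA correction
    return forecasts

y = [3.0, 5.0, 4.0, 6.0, 7.0]   # made-up observations
alpha = 0.3
ses = ses_forecasts(y, alpha, initial=y[0])
arima = arima011_forecasts(y, alpha - 1.0, initial=y[0])
```

The two forecast sequences agree up to floating-point rounding, since <math>y_t + (\alpha - 1)(y_t - \hat{y}_t) = \alpha y_t + (1 - \alpha)\hat{y}_t</math>.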
Choosing the order
The orders ''p'' and ''q'' can be determined using the sample autocorrelation function (ACF), partial autocorrelation function (PACF), and/or extended autocorrelation function (EACF) method.
Other alternative methods include AIC, BIC, etc.
To determine the order of a non-seasonal ARIMA model, a useful criterion is the Akaike information criterion (AIC). It is written as
:<math>\text{AIC} = -2\log(L) + 2(p + q + k),</math>
where ''L'' is the likelihood of the data, ''p'' is the order of the autoregressive part and ''q'' is the order of the moving average part. The ''k'' indicates whether the ARIMA model has an intercept: ''k'' = 1 if there is an intercept (''c'' ≠ 0) and ''k'' = 0 if there is none (''c'' = 0).
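The corrected AIC for ARIMA models can be written as
:<math>\text{AICc} = \text{AIC} + \frac{2(p+q+k)(p+q+k+1)}{T-p-q-k-1},</math>
where ''T'' is the number of observations. The Bayesian Information Criterion (BIC) can be written as
:<math>\text{BIC} = \text{AIC} + ((\log T) - 2)(p+q+k).</math>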
The objective is to minimize the AIC, AICc or BIC values for a good model. The lower the value of one of these criteria for a range of models being investigated, the better the model will suit the data. The AIC and the BIC are used for two completely different purposes. While the AIC tries to approximate models towards the reality of the situation, the BIC attempts to find the perfect fit. The BIC approach is often criticized as there never is a perfect fit to real-life complex data; however, it is still a useful method for selection as it penalizes models more heavily for having more parameters than the AIC would.
AICc can only be used to compare ARIMA models with the same orders of differencing. For ARIMAs with different orders of differencing,
RMSE
can be used for model comparison.
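As a rough sketch, the three criteria can be computed from a fitted model's log-likelihood. The function below assumes the parameter-count convention ''p'' + ''q'' + ''k'' (some texts also count the error variance, which adds one) and takes the number of observations as <code>n_obs</code>; the function name and the input values are hypothetical.

```python
import math

def information_criteria(log_likelihood, p, q, k, n_obs):
    """Return (AIC, AICc, BIC) for an ARIMA fit with p + q + k parameters."""
    n_params = p + q + k
    aic = -2.0 * log_likelihood + 2.0 * n_params
    # Small-sample correction; requires n_obs > n_params + 1.
    aicc = aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = aic + (math.log(n_obs) - 2.0) * n_params
    return aic, aicc, bic

# Made-up example: log-likelihood -100 for an ARIMA(1, d, 1) with intercept, 50 points.
aic, aicc, bic = information_criteria(-100.0, p=1, q=1, k=1, n_obs=50)
```

Candidate orders would be compared by computing these values for each fitted model and preferring the smallest, subject to the caveat about comparing models with the same order of differencing.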
Estimation of coefficients
Forecasts using ARIMA models
The ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary:
:<math>Y_t = (1-L)^d X_t</math>
while the second is wide-sense stationary:
:<math>\left(1 - \sum_{i=1}^p \varphi_i L^i\right) Y_t = \left(1 + \sum_{i=1}^q \theta_i L^i\right) \varepsilon_t.</math>
Now forecasts can be made for the process <math>Y_t</math>, using a generalization of the method of autoregressive forecasting.
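The cascade view can be sketched for the simplest non-trivial case, an ARIMA(1, 1, 0) model with a known coefficient: forecast the differenced series with the AR(1) recursion, then add the forecasts back onto the last observed level. The function name, the coefficient value, and the data are hypothetical; in practice the coefficient would be estimated.

```python
def forecast_arima110(x, phi, horizon):
    """Point forecasts for (1 - phi L)(1 - L) X_t = eps_t, with phi assumed known."""
    y_last = x[-1] - x[-2]       # last value of the differenced series Y
    level = x[-1]
    forecasts = []
    for _ in range(horizon):
        y_last = phi * y_last    # AR(1) forecast of the next Y (future errors have mean 0)
        level += y_last          # undo the differencing: X = previous X + Y
        forecasts.append(level)
    return forecasts

print(forecast_arima110([10.0, 12.0], phi=0.5, horizon=3))  # [13.0, 13.5, 13.75]
```

The forecast increments shrink geometrically, so the point forecasts level off instead of continuing the last observed change indefinitely.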
Forecast intervals
The forecast intervals (confidence intervals for forecasts) for ARIMA models are based on assumptions that the residuals are uncorrelated and normally distributed. If either of these assumptions does not hold, then the forecast intervals may be incorrect. For this reason, researchers plot the ACF and histogram of the residuals to check the assumptions before producing forecast intervals.
95% forecast interval: <math>\hat{y}_{T+h\mid T} \pm 1.96\sqrt{v_{T+h\mid T}}</math>, where <math>v_{T+h\mid T}</math> is the variance of <math>y_{T+h} \mid y_1, \dots, y_T</math>.
For <math>h = 1</math>, <math>v_{T+h\mid T} = \hat{\sigma}^2</math> for all ARIMA models regardless of parameters and orders.
For ARIMA(0,0,q), <math>y_t = \varepsilon_t + \sum_{i=1}^q \theta_i \varepsilon_{t-i}</math> and
:<math>v_{T+h\mid T} = \hat{\sigma}^2\left[1 + \sum_{i=1}^{h-1}\theta_i^2\right], \text{ for } h = 2, 3, \ldots,</math>
with <math>\theta_i = 0</math> for <math>i > q</math>.
In general, forecast intervals from ARIMA models will increase as the forecast horizon increases.
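For the pure moving-average case, the interval can be computed directly. The sketch below assumes an ARIMA(0,0,q) model with known coefficients and residual variance, and uses the standard result that the ''h''-step forecast variance is the residual variance times one plus the sum of the first ''h'' − 1 squared MA coefficients; the function name and the numbers are hypothetical.

```python
import math

def ma_forecast_interval(point_forecast, theta, sigma2, h, z=1.96):
    """Approximate 95% interval for an h-step forecast of an ARIMA(0,0,q) model."""
    # Only the first h-1 MA coefficients contribute; theta_i is 0 for i > q.
    variance = sigma2 * (1.0 + sum(c * c for c in theta[: h - 1]))
    half_width = z * math.sqrt(variance)
    return point_forecast - half_width, point_forecast + half_width

theta = [0.5, 0.4]   # made-up MA(2) coefficients
widths = []
for h in (1, 2, 3, 5):
    low, high = ma_forecast_interval(0.0, theta, sigma2=1.0, h=h)
    widths.append(high - low)
```

In this example the width grows for ''h'' = 1, 2, 3 and then stays constant, because an MA(''q'') process has no memory beyond ''q'' steps; for models with ''d'' > 0 the intervals keep widening.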
Variations and extensions
A number of variations on the ARIMA model are commonly employed. If multiple time series are used then the <math>X_t</math> can be thought of as vectors and a VARIMA model may be appropriate. Sometimes a seasonal effect is suspected in the model; in that case, it is generally considered better to use a SARIMA (seasonal ARIMA) model than to increase the order of the AR or MA parts of the model.
If the time series is suspected to exhibit long-range dependence, then the ''d'' parameter may be allowed to have non-integer values in an autoregressive fractionally integrated moving average model, which is also called a Fractional ARIMA (FARIMA or ARFIMA) model.
Software implementations
Various packages that apply methodology like
Box–Jenkins parameter optimization are available to find the right parameters for the ARIMA model.
* EViews: has extensive ARIMA and SARIMA capabilities.
* Julia: contains an ARIMA implementation in the TimeModels package.
* Mathematica: includes the ARIMAProcess function.
* MATLAB: the Econometrics Toolbox includes ARIMA models and regression with ARIMA errors.
* NCSS: includes several procedures for ARIMA fitting and forecasting.
* Python: the "statsmodels" package includes models for time series analysis: univariate models (AR, ARIMA), vector autoregressive models (VAR and structural VAR), and descriptive statistics and process models for time series analysis.
* R: the standard R ''stats'' package includes an ''arima'' function, which is documented in "ARIMA Modelling of Time Series". Besides the ARIMA(''p'',''d'',''q'') part, the function also includes seasonal factors, an intercept term, and exogenous variables (''xreg'', called "external regressors"). The CRAN task view on Time Series is the reference with many more links. The "forecast" package in R can automatically select an ARIMA model for a given time series with the auto.arima() function and can also simulate seasonal and non-seasonal ARIMA models with its simulate.Arima() function.
* Ruby: the "statsample-timeseries" gem is used for time series analysis, including ARIMA models and Kalman Filtering.
* JavaScript: the "arima" package includes models for time series analysis and forecasting (ARIMA, SARIMA, SARIMAX, AutoARIMA).
* C: the "ctsa" package includes ARIMA, SARIMA, SARIMAX, AutoARIMA and multiple methods for time series analysis.
* SAFE TOOLBOXES: includes ARIMA modelling.
* SAS: includes extensive ARIMA processing in its Econometric and Time Series Analysis system: SAS/ETS.
* IBM SPSS: includes ARIMA modeling in its Statistics and Modeler statistical packages. The default Expert Modeler feature evaluates a range of seasonal and non-seasonal autoregressive (''p''), integrated (''d''), and moving average (''q'') settings and seven exponential smoothing models. The Expert Modeler can also transform the target time-series data into its square root or natural log. The user also has the option to restrict the Expert Modeler to ARIMA models, or to manually enter ARIMA nonseasonal and seasonal ''p'', ''d'', and ''q'' settings without Expert Modeler. Automatic outlier detection is available for seven types of outliers, and the detected outliers will be accommodated in the time-series model if this feature is selected.
* SAP: the APO-FCS package in SAP ERP from SAP allows creation and fitting of ARIMA models using the Box–Jenkins methodology.
* SQL Server Analysis Services: from Microsoft, includes ARIMA as a Data Mining algorithm.
* Stata: includes ARIMA modelling (using its arima command) as of Stata 9.
* StatSim: includes ARIMA models in the Forecast web app.
* Teradata Vantage: has the ARIMA function as part of its machine learning engine.
* TOL (Time Oriented Language) is designed to model ARIMA models (including SARIMA, ARIMAX and DSARIMAX variants).
* Scala: the spark-timeseries library contains an ARIMA implementation for Scala, Java and Python. The implementation is designed to run on Apache Spark.
* PostgreSQL/MadLib: Time Series Analysis/ARIMA.
* X-12-ARIMA: from the US Bureau of the Census.
See also
* Autocorrelation
* ARMA
* Partial autocorrelation
* Finite impulse response
* Infinite impulse response
References
External links
The US Census Bureau uses ARIMA for "seasonally adjusted" data (programs, docs, and papers here)