Time-series analysis
   HOME

TheInfoList



OR:

In mathematics, a time series is a series of
data point In statistics, a unit of observation is the unit described by the data that one analyzes. A study may treat groups as a unit of observation with a country as the unit of analysis, drawing conclusions on group characteristics from data collected at ...
s indexed (or listed or graphed) in time order. Most commonly, a time series is a
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is calle ...
taken at successive equally spaced points in time. Thus it is a sequence of
discrete-time In mathematical dynamics, discrete time and continuous time are two alternative frameworks within which variables that evolve over time are modeled. Discrete time Discrete time views values of variables as occurring at distinct, separate "po ...
data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the
Dow Jones Industrial Average The Dow Jones Industrial Average (DJIA), Dow Jones, or simply the Dow (), is a stock market index of 30 prominent companies listed on stock exchanges in the United States. The DJIA is one of the oldest and most commonly followed equity inde ...
. A time series is very frequently plotted via a
run chart A run chart, also known as a run-sequence plot is a graph that displays observed data in a time sequence. Often, the data displayed represent some aspect of the output or performance of a manufacturing or other business process. It is therefore ...
(which is a temporal
line chart A line chart or line graph or curve chart is a type of chart which displays information as a series of data points called 'markers' connected by straight line segments. It is a basic type of chart common in many fields. It is similar to a ...
). Time series are used in statistics,
signal processing Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing ''signals'', such as sound, images, and scientific measurements. Signal processing techniques are used to optimize transmissions, ...
,
pattern recognition Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics ...
,
econometrics Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships. M. Hashem Pesaran (1987). "Econometrics," '' The New Palgrave: A Dictionary of Economics'', v. 2, p. 8 p. 8 ...
, mathematical finance,
weather forecasting Weather forecasting is the application of science and technology to predict the conditions of the atmosphere for a given location and time. People have attempted to predict the weather informally for millennia and formally since the 19th cent ...
,
earthquake prediction Earthquake prediction is a branch of the science of seismology concerned with the specification of the time, location, and magnitude of future earthquakes within stated limits, and particularly "the determination of parameters for the ''next'' ...
, electroencephalography,
control engineering Control engineering or control systems engineering is an engineering discipline that deals with control systems, applying control theory to design equipment and systems with desired behaviors in control environments. The discipline of controls o ...
,
astronomy Astronomy () is a natural science that studies celestial objects and phenomena. It uses mathematics, physics, and chemistry in order to explain their origin and evolution. Objects of interest include planets, moons, stars, nebulae, g ...
,
communications engineering Telecommunications Engineering is a subfield of electrical engineering which seeks to design and devise systems of communication at a distance. The work ranges from basic circuit design to strategic mass developments. A telecommunication enginee ...
, and largely in any domain of applied
science Science is a systematic endeavor that Scientific method, builds and organizes knowledge in the form of Testability, testable explanations and predictions about the universe. Science may be as old as the human species, and some of the earli ...
and
engineering Engineering is the use of scientific principles to design and build machines, structures, and other items, including bridges, tunnels, roads, vehicles, and buildings. The discipline of engineering encompasses a broad range of more speciali ...
which involves temporal measurements. Time series ''analysis'' comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series ''forecasting'' is the use of a
model A model is an informative representation of an object, person or system. The term originally denoted the plans of a building in late 16th-century English, and derived via French and Italian ultimately from Latin ''modulus'', a measure. Models c ...
to predict future values based on previously observed values. While
regression analysis In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...
is often employed in such a way as to test relationships between one or more different time series, this type of analysis is not usually called "time series analysis", which refers in particular to relationships between different points in time within a single series. Time series data have a natural temporal ordering. This makes time series analysis distinct from
cross-sectional studies In medical research, social science, and biology, a cross-sectional study (also known as a cross-sectional analysis, transverse study, prevalence study) is a type of observational study that analyzes data from a population, or a representative su ...
, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their respective education levels, where the individuals' data could be entered in any order). Time series analysis is also distinct from
spatial data analysis Spatial analysis or spatial statistics includes any of the formal techniques which studies entities using their topological, geometric, or geographic properties. Spatial analysis includes a variety of techniques, many still in their early dev ...
where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather than from future values (see
time reversibility A mathematical or physical process is time-reversible if the dynamics of the process remain well-defined when the sequence of time-states is reversed. A deterministic process is time-reversible if the time-reversed process satisfies the same dyn ...
). Time series analysis can be applied to
real-valued In mathematics, value may refer to several, strongly related notions. In general, a mathematical value may be any definite mathematical object. In elementary mathematics, this is most often a number – for example, a real number such as or an i ...
, continuous data,
discrete Discrete may refer to: *Discrete particle or quantum in physics, for example in quantum theory *Discrete device, an electronic component with just one circuit element, either passive or active, other than an integrated circuit *Discrete group, a g ...
numeric data, or discrete symbolic data (i.e. sequences of characters, such as letters and words in the
English language English is a West Germanic language of the Indo-European language family, with its earliest forms spoken by the inhabitants of early medieval England. It is named after the Angles, one of the ancient Germanic peoples that migrated to the ...
).


Methods for analysis

Methods for time series analysis may be divided into two classes:
frequency-domain In physics, electronics, control systems engineering, and statistics, the frequency domain refers to the analysis of mathematical functions or signals with respect to frequency, rather than time. Put simply, a time-domain graph shows how a si ...
methods and
time-domain Time domain refers to the analysis of mathematical functions, physical signals or time series of economic or environmental data, with respect to time. In the time domain, the signal or function's value is known for all real numbers, for the ca ...
methods. The former include spectral analysis and wavelet analysis; the latter include auto-correlation and cross-correlation analysis. In the time domain, correlation and analysis can be made in a filter-like manner using
scaled correlation In statistics, scaled correlation is a form of a coefficient of correlation applicable to data that have a temporal component such as time series. It is the average short-term correlation. If the signals have multiple components (slow and fast), sca ...
, thereby mitigating the need to operate in the frequency domain. Additionally, time series analysis techniques may be divided into parametric and
non-parametric Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being distri ...
methods. The parametric approaches assume that the underlying
stationary stochastic process In mathematics and statistics, a stationary process (or a strict/strictly stationary process or strong/strongly stationary process) is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Con ...
has a certain structure which can be described using a small number of parameters (for example, using an
autoregressive In statistics, econometrics and signal processing, an autoregressive (AR) model is a representation of a type of random process; as such, it is used to describe certain time-varying processes in nature, economics, etc. The autoregressive model spe ...
or
moving average model In time series analysis, the moving-average model (MA model), also known as moving-average process, is a common approach for modeling univariate time series. The moving-average model specifies that the output variable is cross-correlated with a ...
). In these approaches, the task is to estimate the parameters of the model that describes the stochastic process. By contrast, non-parametric approaches explicitly estimate the
covariance In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the ...
or the
spectrum A spectrum (plural ''spectra'' or ''spectrums'') is a condition that is not limited to a specific set of values but can vary, without gaps, across a continuum. The word was first used scientifically in optics to describe the rainbow of colors ...
of the process without assuming that the process has any particular structure. Methods of time series analysis may also be divided into
linear Linearity is the property of a mathematical relationship ('' function'') that can be graphically represented as a straight line. Linearity is closely related to '' proportionality''. Examples in physics include rectilinear motion, the linear ...
and non-linear, and
univariate In mathematics, a univariate object is an expression, equation, function or polynomial involving only one variable. Objects involving more than one variable are multivariate. In some cases the distinction between the univariate and multivariate ...
and multivariate.


Panel data

A time series is one type of
panel data In statistics and econometrics, panel data and longitudinal data are both multi-dimensional data involving measurements over time. Panel data is a subset of longitudinal data where observations are for the same subjects each time. Time series and ...
. Panel data is the general class, a multidimensional data set, whereas a time series data set is a one-dimensional panel (as is a
cross-sectional data Cross-sectional data, or a cross section of a study population, in statistics and econometrics, is a type of data collected by observing many subjects (such as individuals, firms, countries, or regions) at the one point or period of time. The anal ...
set). A data set may exhibit characteristics of both panel data and time series data. One way to tell is to ask what makes one data record unique from the other records. If the answer is the time data field, then this is a time series data set candidate. If determining a unique record requires a time data field and an additional identifier which is unrelated to time (e.g. student ID, stock symbol, country code), then it is panel data candidate. If the differentiation lies on the non-time identifier, then the data set is a cross-sectional data set candidate.


Analysis

There are several types of motivation and data analysis available for time series which are appropriate for different purposes.


Motivation

In the context of statistics,
econometrics Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships. M. Hashem Pesaran (1987). "Econometrics," '' The New Palgrave: A Dictionary of Economics'', v. 2, p. 8 p. 8 ...
,
quantitative finance Mathematical finance, also known as quantitative finance and financial mathematics, is a field of applied mathematics, concerned with mathematical modeling of financial markets. In general, there exist two separate branches of finance that require ...
,
seismology Seismology (; from Ancient Greek σεισμός (''seismós'') meaning "earthquake" and -λογία (''-logía'') meaning "study of") is the scientific study of earthquakes and the propagation of elastic waves through the Earth or through other ...
,
meteorology Meteorology is a branch of the atmospheric sciences (which include atmospheric chemistry and physics) with a major focus on weather forecasting. The study of meteorology dates back millennia, though significant progress in meteorology did no ...
, and
geophysics Geophysics () is a subject of natural science concerned with the physical processes and physical properties of the Earth and its surrounding space environment, and the use of quantitative methods for their analysis. The term ''geophysics'' so ...
the primary goal of time series analysis is
forecasting Forecasting is the process of making predictions based on past and present data. Later these can be compared (resolved) against what happens. For example, a company might estimate their revenue in the next year, then compare it against the actual ...
. In the context of
signal processing Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing ''signals'', such as sound, images, and scientific measurements. Signal processing techniques are used to optimize transmissions, ...
,
control engineering Control engineering or control systems engineering is an engineering discipline that deals with control systems, applying control theory to design equipment and systems with desired behaviors in control environments. The discipline of controls o ...
and
communication engineering Telecommunications Engineering is a subfield of electrical engineering which seeks to design and devise systems of communication at a distance. The work ranges from basic circuit design to strategic mass developments. A telecommunication enginee ...
it is used for signal detection. Other applications are in data mining,
pattern recognition Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics ...
and
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
, where time series analysis can be used for clustering, classification, query by content,
anomaly detection In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority o ...
as well as
forecasting Forecasting is the process of making predictions based on past and present data. Later these can be compared (resolved) against what happens. For example, a company might estimate their revenue in the next year, then compare it against the actual ...
.


Exploratory analysis

A straightforward way to examine a regular time series is manually with a
line chart A line chart or line graph or curve chart is a type of chart which displays information as a series of data points called 'markers' connected by straight line segments. It is a basic type of chart common in many fields. It is similar to a ...
. An example chart is shown on the right for tuberculosis incidence in the United States, made with a spreadsheet program. The number of cases was standardized to a rate per 100,000 and the percent change per year in this rate was calculated. The nearly steadily dropping line shows that the TB incidence was decreasing in most years, but the percent change in this rate varied by as much as +/- 10%, with 'surges' in 1975 and around the early 1990s. The use of both vertical axes allows the comparison of two time series in one graphic. A study of corporate data analysts found two challenges to exploratory time series analysis: discovering the shape of interesting patterns, and finding an explanation for these patterns. Visual tools that represent time series data as heat map matrices can help overcome these challenges. Other techniques include: * Autocorrelation analysis to examine
serial dependence Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable a ...
* Spectral analysis to examine cyclic behavior which need not be related to
seasonality In time series data, seasonality is the presence of variations that occur at specific regular intervals less than a year, such as weekly, monthly, or quarterly. Seasonality may be caused by various factors, such as weather, vacation, and holidays a ...
. For example, sunspot activity varies over 11 year cycles. Other common examples include celestial phenomena, weather patterns, neural activity, commodity prices, and economic activity. * Separation into components representing trend, seasonality, slow and fast variation, and cyclical irregularity: see
trend estimation Linear trend estimation is a statistical technique to aid interpretation of data. When a series of measurements of a process are treated as, for example, a sequences or time series, trend estimation can be used to make and justify statements abou ...
and
decomposition of time series The decomposition of time series is a statistical task that deconstructs a time series into several components, each representing one of the underlying categories of patterns. There are two principal types of decomposition, which are outlined belo ...


Curve fitting

Curve fitting is the process of constructing a
curve In mathematics, a curve (also called a curved line in older texts) is an object similar to a line, but that does not have to be straight. Intuitively, a curve may be thought of as the trace left by a moving point. This is the definition that ...
, or
mathematical function In mathematics, a function from a set to a set assigns to each element of exactly one element of .; the words map, mapping, transformation, correspondence, and operator are often used synonymously. The set is called the domain of the functi ...
, that has the best fit to a series of
data In the pursuit of knowledge, data (; ) is a collection of discrete Value_(semiotics), values that convey information, describing quantity, qualitative property, quality, fact, statistics, other basic units of meaning, or simply sequences of sy ...
points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data is required, or
smoothing In statistics and image processing, to smooth a data set is to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures/rapid phenomena. In smoothing, the dat ...
, in which a "smooth" function is constructed that approximately fits the data. A related topic is
regression analysis In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...
, which focuses more on questions of statistical inference such as how much uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves can be used as an aid for data visualization, to infer values of a function where no data are available, and to summarize the relationships among two or more variables.
Extrapolation In mathematics, extrapolation is a type of estimation, beyond the original observation range, of the value of a variable on the basis of its relationship with another variable. It is similar to interpolation, which produces estimates between know ...
refers to the use of a fitted curve beyond the range of the observed data, and is subject to a degree of uncertainty since it may reflect the method used to construct the curve as much as it reflects the observed data. The construction of economic time series involves the estimation of some components for some dates by interpolation between values ("benchmarks") for earlier and later dates. Interpolation is estimation of an unknown quantity between two known quantities (historical data), or drawing conclusions about missing information from the available information ("reading between the lines"). Interpolation is useful where the data surrounding the missing data is available and its trend, seasonality, and longer-term cycles are known. This is often done by using a related series known for all relevant dates. Alternatively
polynomial interpolation In numerical analysis, polynomial interpolation is the interpolation of a given data set by the polynomial of lowest possible degree that passes through the points of the dataset. Given a set of data points (x_0,y_0), \ldots, (x_n,y_n), with no ...
or
spline interpolation In the mathematical field of numerical analysis, spline interpolation is a form of interpolation where the interpolant is a special type of piecewise polynomial called a spline. That is, instead of fitting a single, high-degree polynomial to all ...
is used where piecewise
polynomial In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An example ...
functions are fit into time intervals such that they fit smoothly together. A different problem which is closely related to interpolation is the approximation of a complicated function by a simple function (also called regression). The main difference between regression and interpolation is that polynomial regression gives a single polynomial that models the entire data set. Spline interpolation, however, yield a piecewise continuous function composed of many polynomials to model the data set.
Extrapolation In mathematics, extrapolation is a type of estimation, beyond the original observation range, of the value of a variable on the basis of its relationship with another variable. It is similar to interpolation, which produces estimates between know ...
is the process of estimating, beyond the original observation range, the value of a variable on the basis of its relationship with another variable. It is similar to interpolation, which produces estimates between known observations, but extrapolation is subject to greater
uncertainty Uncertainty refers to epistemic situations involving imperfect or unknown information. It applies to predictions of future events, to physical measurements that are already made, or to the unknown. Uncertainty arises in partially observable ...
and a higher risk of producing meaningless results.


Function approximation

In general, a function approximation problem asks us to select a
function Function or functionality may refer to: Computing * Function key, a type of key on computer keyboards * Function model, a structured representation of processes in a system * Function object or functor or functionoid, a concept of object-oriente ...
among a well-defined class that closely matches ("approximates") a target function in a task-specific way. One can distinguish two major classes of function approximation problems: First, for known target functions, approximation theory is the branch of
numerical analysis Numerical analysis is the study of algorithms that use numerical approximation (as opposed to symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics). It is the study of numerical methods ...
that investigates how certain known functions (for example,
special function Special functions are particular mathematical functions that have more or less established names and notations due to their importance in mathematical analysis, functional analysis, geometry, physics, or other applications. The term is defined b ...
s) can be approximated by a specific class of functions (for example,
polynomial In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An example ...
s or rational functions) that often have desirable properties (inexpensive computation, continuity, integral and limit values, etc.). Second, the target function, call it ''g'', may be unknown; instead of an explicit formula, only a set of points (a time series) of the form (''x'', ''g''(''x'')) is provided. Depending on the structure of the domain and
codomain In mathematics, the codomain or set of destination of a function is the set into which all of the output of the function is constrained to fall. It is the set in the notation . The term range is sometimes ambiguously used to refer to either th ...
of ''g'', several techniques for approximating ''g'' may be applicable. For example, if ''g'' is an operation on the
real number In mathematics, a real number is a number that can be used to measure a ''continuous'' one-dimensional quantity such as a distance, duration or temperature. Here, ''continuous'' means that values can have arbitrarily small variations. Every ...
s, techniques of interpolation,
extrapolation In mathematics, extrapolation is a type of estimation, beyond the original observation range, of the value of a variable on the basis of its relationship with another variable. It is similar to interpolation, which produces estimates between know ...
,
regression analysis In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...
, and
curve fitting Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data i ...
can be used. If the
codomain In mathematics, the codomain or set of destination of a function is the set into which all of the output of the function is constrained to fall. It is the set in the notation . The term range is sometimes ambiguously used to refer to either th ...
(range or target set) of ''g'' is a finite set, one is dealing with a classification problem instead. A related problem of ''online'' time series approximation is to summarize the data in one-pass and construct an approximate representation that can support a variety of time series queries with bounds on worst-case error. To some extent, the different problems ( regression, classification,
fitness approximation Fitness approximationY. JinA comprehensive survey of fitness approximation in evolutionary computation ''Soft Computing'', 9:3–12, 2005 aims to approximate the objective or fitness functions in evolutionary optimization by building up machine l ...
) have received a unified treatment in
statistical learning theory Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on dat ...
, where they are viewed as
supervised learning Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...
problems.


Prediction and forecasting

In statistics, prediction is a part of statistical inference. One particular approach to such inference is known as
predictive inference Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properti ...
, but the prediction can be undertaken within any of the several approaches to statistical inference. Indeed, one description of statistics is that it provides a means of transferring knowledge about a sample of a population to the whole population, and to other related populations, which is not necessarily the same as prediction over time. When information is transferred across time, often to specific points in time, the process is known as
forecasting Forecasting is the process of making predictions based on past and present data. Later these can be compared (resolved) against what happens. For example, a company might estimate their revenue in the next year, then compare it against the actual ...
. * Fully formed statistical models for
stochastic simulation A stochastic simulation is a simulation of a system that has variables that can change stochastically (randomly) with individual probabilities.DLOUHÝ, M.; FÁBRY, J.; KUNCOVÁ, M.. Simulace pro ekonomy. Praha : VŠE, 2005. Realizations of these ...
purposes, so as to generate alternative versions of the time series, representing what might happen over non-specific time-periods in the future * Simple or fully formed statistical models to describe the likely outcome of the time series in the immediate future, given knowledge of the most recent outcomes (forecasting). * Forecasting on time series is usually done using automated statistical software packages and programming languages, such as
Julia Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
,
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
, R, SAS,
SPSS SPSS Statistics is a statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, and criminal investigation. Long produced by SPSS Inc., it was acquired by IBM in 2009. C ...
and many others. * Forecasting on large scale data can be done with Apache Spark using the Spark-TS library, a third-party package.


Classification

Assigning time series pattern to a specific category, for example identify a word based on series of hand movements in
sign language Sign languages (also known as signed languages) are languages that use the visual-manual modality to convey meaning, instead of spoken words. Sign languages are expressed through manual articulation in combination with non-manual markers. Sign l ...
.


Signal estimation

This approach is based on harmonic analysis and filtering of signals in the
frequency domain In physics, electronics, control systems engineering, and statistics, the frequency domain refers to the analysis of mathematical functions or signals with respect to frequency, rather than time. Put simply, a time-domain graph shows how a s ...
using the Fourier transform, and spectral density estimation, the development of which was significantly accelerated during
World War II World War II or the Second World War, often abbreviated as WWII or WW2, was a world war that lasted from 1939 to 1945. It involved the vast majority of the world's countries—including all of the great powers—forming two opposing ...
by mathematician Norbert Wiener, electrical engineers Rudolf E. Kálmán,
Dennis Gabor Dennis Gabor ( ; hu, Gábor Dénes, ; 5 June 1900 – 9 February 1979) was a Hungarian-British electrical engineer and physicist, most notable for inventing holography, for which he later received the 1971 Nobel Prize in Physics. He obtained ...
and others for filtering signals from noise and predicting signal values at a certain point in time. See
Kalman filter For statistics and control theory, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, and produces estima ...
, Estimation theory, and Digital signal processing


Segmentation

Splitting a time-series into a sequence of segments. It is often the case that a time-series can be represented as a sequence of individual segments, each with its own characteristic properties. For example, the audio signal from a conference call can be partitioned into pieces corresponding to the times during which each person was speaking. In time-series segmentation, the goal is to identify the segment boundary points in the time-series, and to characterize the dynamical properties associated with each segment. One can approach this problem using change-point detection, or by modeling the time-series as a more sophisticated system, such as a Markov jump linear system.


Models

Models for time series data can have many forms and represent different stochastic processes. When modeling variations in the level of a process, three broad classes of practical importance are the ''
autoregressive In statistics, econometrics and signal processing, an autoregressive (AR) model is a representation of a type of random process; as such, it is used to describe certain time-varying processes in nature, economics, etc. The autoregressive model spe ...
'' (AR) models, the ''integrated'' (I) models, and the ''
moving average In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating a series of averages of different subsets of the full data set. It is also called a moving mean (MM) or rolling mean and is ...
'' (MA) models. These three classes depend linearly on previous data points. Combinations of these ideas produce
autoregressive moving average In statistics, econometrics and signal processing, an autoregressive (AR) model is a representation of a type of random process; as such, it is used to describe certain time-varying processes in nature, economics, etc. The autoregressive model spe ...
(ARMA) and
autoregressive integrated moving average In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time ser ...
(ARIMA) models. The
autoregressive fractionally integrated moving average In statistics, autoregressive fractionally integrated moving average models are time series models that generalize ARIMA (''autoregressive integrated moving average'') models by allowing non-integer values of the differencing parameter. These model ...
(ARFIMA) model generalizes the former three. Extensions of these classes to deal with vector-valued data are available under the heading of multivariate time-series models and sometimes the preceding acronyms are extended by including an initial "V" for "vector", as in VAR for
vector autoregression Vector autoregression (VAR) is a statistical model used to capture the relationship between multiple quantities as they change over time. VAR is a type of stochastic process model. VAR models generalize the single-variable (univariate) autoregres ...
. An additional set of extensions of these models is available for use where the observed time-series is driven by some "forcing" time-series (which may not have a causal effect on the observed series): the distinction from the multivariate case is that the forcing series may be deterministic or under the experimenter's control. For these models, the acronyms are extended with a final "X" for "exogenous". Non-linear dependence of the level of a series on previous data points is of interest, partly because of the possibility of producing a chaotic time series. However, more importantly, empirical investigations can indicate the advantage of using predictions derived from non-linear models, over those from linear models, as for example in nonlinear autoregressive exogenous models. Further references on nonlinear time series analysis: (Kantz and Schreiber), and (Abarbanel) Among other types of non-linear time series models, there are models to represent the changes of variance over time (
heteroskedasticity In statistics, a sequence (or a vector) of random variables is homoscedastic () if all its random variables have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The s ...
). These models represent autoregressive conditional heteroskedasticity (ARCH) and the collection comprises a wide variety of representation (
GARCH In econometrics, the autoregressive conditional heteroskedasticity (ARCH) model is a statistical model for time series data that describes the variance of the current error term or innovation as a function of the actual sizes of the previous time ...
, TARCH, EGARCH, FIGARCH, CGARCH, etc.). Here changes in variability are related to, or predicted by, recent past values of the observed series. This is in contrast to other possible representations of locally varying variability, where the variability might be modelled as being driven by a separate time-varying process, as in a
doubly stochastic model In statistics, a doubly stochastic model is a type of model that can arise in many contexts, but in particular in modelling time-series and stochastic processes. The basic idea for a doubly stochastic model is that an observed random variable is m ...
. In recent work on model-free analyses, wavelet transform based methods (for example locally stationary wavelets and wavelet decomposed neural networks) have gained favor. Multiscale (often referred to as multiresolution) techniques decompose a given time series, attempting to illustrate time dependence at multiple scales. See also Markov switching multifractal (MSMF) techniques for modeling volatility evolution. A
Hidden Markov model A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it X — with unobservable ("''hidden''") states. As part of the definition, HMM requires that there be an o ...
(HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered as the simplest
dynamic Bayesian network A Dynamic Bayesian Network (DBN) is a Bayesian network (BN) which relates variables to each other over adjacent time steps. This is often called a ''Two-Timeslice'' BN (2TBN) because it says that at any point in time T, the value of a variable c ...
. HMM models are widely used in
speech recognition Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers with the ...
, for translating a time series of spoken words into text.


Notation

A number of different notations are in use for time-series analysis. A common notation specifying a time series ''X'' that is indexed by the
natural number In mathematics, the natural numbers are those numbers used for counting (as in "there are ''six'' coins on the table") and ordering (as in "this is the ''third'' largest city in the country"). Numbers used for counting are called ''cardinal ...
s is written :''X'' = (''X''1, ''X''2, ...). Another common notation is :''Y'' = (''Yt'': ''t'' ∈ ''T''), where ''T'' is the
index set In mathematics, an index set is a set whose members label (or index) members of another set. For instance, if the elements of a set may be ''indexed'' or ''labeled'' by means of the elements of a set , then is an index set. The indexing consists ...
.


Conditions

There are two sets of conditions under which much of the theory is built: *
Stationary process In mathematics and statistics, a stationary process (or a strict/strictly stationary process or strong/strongly stationary process) is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Con ...
*
Ergodic process In physics, statistics, econometrics and signal processing, a stochastic process is said to be in an ergodic regime if an observable's ensemble average equals the time average. In this regime, any collection of random samples from a process must r ...
Ergodicity implies stationarity, but the converse is not necessarily the case. Stationarity is usually classified into strict stationarity and wide-sense or second-order stationarity. Both models and applications can be developed under each of these conditions, although the models in the latter case might be considered as only partly specified. In addition, time-series analysis can be applied where the series are seasonally stationary or non-stationary. Situations where the amplitudes of frequency components change with time can be dealt with in time-frequency analysis which makes use of a
time–frequency representation A time–frequency representation (TFR) is a view of a signal (taken to be a function of time) represented over both time and frequency. Time–frequency analysis means analysis into the time–frequency domain provided by a TFR. This is achieved b ...
of a time-series or signal.


Tools

Tools for investigating time-series data include: * Consideration of the
autocorrelation function Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variabl ...
and the spectral density function (also
cross-correlation function In signal processing, cross-correlation is a measure of similarity of two series as a function of the displacement of one relative to the other. This is also known as a ''sliding dot product'' or ''sliding inner-product''. It is commonly used f ...
s and cross-spectral density functions) * Scaled cross- and auto-correlation functions to remove contributions of slow components * Performing a Fourier transform to investigate the series in the
frequency domain In physics, electronics, control systems engineering, and statistics, the frequency domain refers to the analysis of mathematical functions or signals with respect to frequency, rather than time. Put simply, a time-domain graph shows how a s ...
* Discrete, continuous or mixed spectra of time series, depending on whether the time series contains a (generalized) harmonic signal or not * Use of a
filter Filter, filtering or filters may refer to: Science and technology Computing * Filter (higher-order function), in functional programming * Filter (software), a computer program to process a data stream * Filter (video), a software component tha ...
to remove unwanted
noise Noise is unwanted sound considered unpleasant, loud or disruptive to hearing. From a physics standpoint, there is no distinction between noise and desired sound, as both are vibrations through a medium, such as air or water. The difference aris ...
* Principal component analysis (or empirical orthogonal function analysis) * Singular spectrum analysis * "Structural" models: ** General
State Space Model In control engineering, a state-space representation is a mathematical model of a physical system as a set of input, output and state variables related by first-order differential equations or difference equations. State variables are variables wh ...
s ** Unobserved Components Models *
Machine Learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
**
Artificial neural network Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected unit ...
s ** Support vector machine ** Fuzzy logic ** Gaussian process **
Genetic Programming In artificial intelligence, genetic programming (GP) is a technique of evolving programs, starting from a population of unfit (usually random) programs, fit for a particular task by applying operations analogous to natural genetic processes to t ...
** Gene expression programming **
Hidden Markov model A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it X — with unobservable ("''hidden''") states. As part of the definition, HMM requires that there be an o ...
**
Multi expression programming Multi Expression Programming (MEP) is an evolutionary algorithm for generating mathematical functions describing a given set of data. MEP is a Genetic Programming variant encoding multiple solutions in the same chromosome. MEP representation is no ...
*
Queueing theory Queueing theory is the mathematical study of waiting lines, or queues. A queueing model is constructed so that queue lengths and waiting time can be predicted. Queueing theory is generally considered a branch of operations research because the ...
analysis *
Control chart Control charts is a graph used in production control to determine whether quality and manufacturing processes are being controlled under stable conditions. (ISO 7870-1) The hourly status is arranged on the graph, and the occurrence of abnormalit ...
**
Shewhart individuals control chart In statistical quality control, the individual/moving-range chart is a type of control chart used to monitor variables data from a business or industrial process for which it is impractical to use rational subgroups. The chart is necessary in t ...
**
CUSUM In statistical quality control, the CUsUM (or cumulative sum control chart) is a sequential analysis technique developed by E. S. Page of the University of Cambridge. It is typically used for monitoring change detection. CUSUM was announced in ...
chart ** EWMA chart *
Detrended fluctuation analysis In stochastic processes, chaos theory and time series analysis, detrended fluctuation analysis (DFA) is a method for determining the statistical self-affinity of a signal. It is useful for analysing time series that appear to be long-memory proces ...
* Nonlinear mixed-effects modeling *
Dynamic time warping In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walk ...
*
Dynamic Bayesian network A Dynamic Bayesian Network (DBN) is a Bayesian network (BN) which relates variables to each other over adjacent time steps. This is often called a ''Two-Timeslice'' BN (2TBN) because it says that at any point in time T, the value of a variable c ...
* Time-frequency analysis techniques: ** Fast Fourier transform **
Continuous wavelet transform Continuity or continuous may refer to: Mathematics * Continuity (mathematics), the opposing concept to discreteness; common examples include ** Continuous probability distribution or random variable in probability and statistics ** Continuous g ...
**
Short-time Fourier transform The short-time Fourier transform (STFT), is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. In practice, the procedure for computing STFTs is to divi ...
**
Chirplet transform In signal processing, the chirplet transform is an inner product of an input signal with a family of analysis primitives called chirplets.S. Mann and S. Haykin,The Chirplet transform: A generalization of Gabor's logon transform, ''Proc. Vision Int ...
** Fractional Fourier transform * Chaotic analysis **
Correlation dimension In chaos theory, the correlation dimension (denoted by ''ν'') is a measure of the dimensionality of the space occupied by a set of random points, often referred to as a type of fractal dimension. For example, if we have a set of random points on t ...
**
Recurrence plot In descriptive statistics and chaos theory, a recurrence plot (RP) is a plot showing, for each moment i in time, the times at which the state of a dynamical system returns to the previous state at i, i.e., when the phase space trajectory visits roug ...
s **
Recurrence quantification analysis Recurrence quantification analysis (RQA) is a method of nonlinear data analysis (cf. chaos theory) for the investigation of dynamical systems. It quantifies the number and duration of recurrences of a dynamical system presented by its phase space tr ...
** Lyapunov exponents **
Entropy encoding In information theory, an entropy coding (or entropy encoding) is any lossless data compression method that attempts to approach the lower bound declared by Shannon's source coding theorem, which states that any lossless data compression method ...


Measures

Time series metrics or features that can be used for time series classification or
regression analysis In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...
: * Univariate linear measures ** Moment (mathematics) ** Spectral band power ** Spectral edge frequency ** Accumulated
Energy (signal processing) In signal processing, the energy E_s of a continuous-time signal ''x''(''t'') is defined as the area under the squared magnitude of the considered signal i.e., mathematically :E_ \ \ = \ \ \langle x(t), x(t)\rangle \ \ = \int_^dt :Unit of E_sw ...
** Characteristics of the autocorrelation function ** Hjorth parameters **
FFT A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in the ...
parameters **
Autoregressive model In statistics, econometrics and signal processing, an autoregressive (AR) model is a representation of a type of random process; as such, it is used to describe certain time-varying processes in nature, economics, etc. The autoregressive model spe ...
parameters ** Mann–Kendall test * Univariate non-linear measures ** Measures based on the correlation sum **
Correlation dimension In chaos theory, the correlation dimension (denoted by ''ν'') is a measure of the dimensionality of the space occupied by a set of random points, often referred to as a type of fractal dimension. For example, if we have a set of random points on t ...
**
Correlation integral In chaos theory, the correlation integral is the mean probability that the states at two different times are close: :C(\varepsilon) = \lim_ \frac \sum_^N \Theta(\varepsilon - \, \vec(i) - \vec(j)\, ), \quad \vec(i) \in \mathbb^m, where N is the n ...
** Correlation density ** Correlation entropy **
Approximate entropy In statistics, an approximate entropy (ApEn) is a technique used to quantify the amount of regularity and the unpredictability of fluctuations over time-series data. For example, consider two series of data: : Series A: (0, 1, 0, 1, 0, 1, 0, 1, 0, ...
** Sample entropy ** ** Wavelet entropy ** Dispersion entropy ** Fluctuation dispersion entropy ** Rényi entropy ** Higher-order methods ** Marginal predictability ** Dynamical similarity index **
State space A state space is the set of all possible configurations of a system. It is a useful abstraction for reasoning about the behavior of a given system and is widely used in the fields of artificial intelligence and game theory. For instance, the to ...
dissimilarity measures ** Lyapunov exponent ** Permutation methods **
Local flow In mathematics, a flow formalizes the idea of the motion of particles in a fluid. Flows are ubiquitous in science, including engineering and physics. The notion of flow is basic to the study of ordinary differential equations. Informally, a f ...
* Other univariate measures ** Algorithmic complexity **
Kolmogorov complexity In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produ ...
estimates **
Hidden Markov Model A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it X — with unobservable ("''hidden''") states. As part of the definition, HMM requires that there be an o ...
states ** Rough path signature ** Surrogate time series and surrogate correction ** Loss of recurrence (degree of non-stationarity) * Bivariate linear measures ** Maximum linear cross-correlation ** Linear
Coherence (signal processing) In signal processing, the coherence is a statistic that can be used to examine the relation between two signals or data sets. It is commonly used to estimate the power transfer between input and output of a linear system. If the signals are ergodi ...
* Bivariate non-linear measures ** Non-linear interdependence ** Dynamical Entrainment (physics) ** Measures for
Phase synchronization {{no footnotes, date=June 2017 Phase synchronization is the process by which two or more cyclic signals tend to oscillate with a repeating sequence of relative phase angles. Phase synchronisation is usually applied to two waveforms of the same fr ...
** Measures for
Phase locking In mathematics, particularly in dynamical systems, Arnold tongues (named after Vladimir Arnold) Section 12 in page 78 has a figure showing Arnold tongues. are a pictorial phenomenon that occur when visualizing how the rotation number of a dynami ...
* Similarity measures: ** Cross-correlation **
Dynamic Time Warping In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walk ...
** Hidden Markov Models **
Edit distance In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two strings (e.g., words) are to one another, that is measured by counting the minimum number of operations required to ...
** Total correlation ** Newey–West estimator ** Prais–Winsten transformation ** Data as Vectors in a Metrizable Space *** Minkowski distance *** Mahalanobis distance ** Data as time series with envelopes *** Global standard deviation *** Local standard deviation *** Windowed standard deviation ** Data interpreted as stochastic series ***
Pearson product-moment correlation coefficient In statistics, the Pearson correlation coefficient (PCC, pronounced ) ― also known as Pearson's ''r'', the Pearson product-moment correlation coefficient (PPMCC), the bivariate correlation, or colloquially simply as the correlation coefficient ...
***
Spearman's rank correlation coefficient In statistics, Spearman's rank correlation coefficient or Spearman's ''ρ'', named after Charles Spearman and often denoted by the Greek letter \rho (rho) or as r_s, is a nonparametric measure of rank correlation ( statistical dependence betwee ...
** Data interpreted as a probability distribution function ***
Kolmogorov–Smirnov test In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous, see Section 2.2), one-dimensional probability distributions that can be used to compare a sample wit ...
***
Cramér–von Mises criterion In statistics the Cramér–von Mises criterion is a criterion used for judging the goodness of fit of a cumulative distribution function F^* compared to a given empirical distribution function F_n, or for comparing two empirical distributions. ...


Visualization

Time series can be visualized with two categories of chart: Overlapping Charts and Separated Charts. Overlapping Charts display all-time series on the same layout while Separated Charts presents them on different layouts (but aligned for comparison purpose)


Overlapping charts

*
Braided graphs Braided is a musical group consisting of Casey LeBlanc, Ashley Leitão, and Amber Fleury, who all competed on the third season of ''Canadian Idol ''Canadian Idol'' is a Canadian reality television competition show which aired on CTV, based on ...
* Line charts * Slope graphs *


Separated charts

*
Horizon graphs The horizon is the apparent line that separates the surface of a celestial body from its sky when viewed from the perspective of an observer on or near the surface of the relevant body. This line divides all viewing directions based on whether ...
* Reduced line chart (small multiples) * Silhouette graph * Circular silhouette graph


See also


References


Further reading

* * * Durbin J., Koopman S.J. (2001), ''Time Series Analysis by State Space Methods'',
Oxford University Press Oxford University Press (OUP) is the university press of the University of Oxford. It is the largest university press in the world, and its printing history dates back to the 1480s. Having been officially granted the legal right to print books ...
. * * * Priestley, M. B. (1981), ''Spectral Analysis and Time Series'',
Academic Press Academic Press (AP) is an academic book publisher founded in 1941. It was acquired by Harcourt, Brace & World in 1969. Reed Elsevier bought Harcourt in 2000, and Academic Press is now an imprint of Elsevier. Academic Press publishes referen ...
. * * Shumway R. H., Stoffer D. S. (2017), ''Time Series Analysis and its Applications: With R Examples (ed. 4)'', Springer, * Weigend A. S., Gershenfeld N. A. (Eds.) (1994), ''Time Series Prediction: Forecasting the Future and Understanding the Past''. Proceedings of the NATO Advanced Research Workshop on Comparative Time Series Analysis (Santa Fe, May 1992),
Addison-Wesley Addison-Wesley is an American publisher of textbooks and computer literature. It is an imprint of Pearson PLC, a global publishing and education company. In addition to publishing books, Addison-Wesley also distributes its technical titles throu ...
. * Wiener, N. (1949), ''Extrapolation, Interpolation, and Smoothing of Stationary Time Series'',
MIT Press The MIT Press is a university press affiliated with the Massachusetts Institute of Technology (MIT) in Cambridge, Massachusetts (United States). It was established in 1962. History The MIT Press traces its origins back to 1926 when MIT publish ...
. * Woodward, W. A., Gray, H. L. & Elliott, A. C. (2012), ''Applied Time Series Analysis'', CRC Press. *


External links


Introduction to Time series Analysis (Engineering Statistics Handbook)
— A practical guide to Time series analysis. {{DEFAULTSORT:Time Series Statistical data types Mathematical and quantitative methods (economics) Machine learning Mathematics in medicine