
In statistical analysis, change detection or change point detection tries to identify times when the probability distribution of a stochastic process or time series changes. In general the problem concerns both detecting whether or not a change has occurred, or whether several changes might have occurred, and identifying the times of any such changes. Specific applications, like step detection and edge detection, may be concerned with changes in the mean, variance, correlation, or spectral density of the process. More generally, change detection also includes the detection of anomalous behavior: anomaly detection.


Introduction

A time series measures the progression of one or more quantities over time. For instance, the figure above shows the level of water in the Nile river between 1870 and 1970. Change point detection is concerned with identifying whether, and if so ''when'', the behavior of the series changes significantly. In the Nile river example, the volume of water changes significantly after a dam was built in the river. Importantly, anomalous observations that differ from the ongoing behavior of the time series are not generally considered change points as long as the series returns to its previous behavior afterwards.

Mathematically, we can describe a time series as an ordered sequence of observations (x_1, x_2, \ldots). We can write the joint distribution of a subset x_{a:b} = (x_a, x_{a+1}, \ldots, x_b) of the time series as p(x_{a:b}). If the goal is to determine whether a change point occurred at a time \tau in a finite time series of length T, then we really ask whether p(x_{1:\tau}) equals p(x_{\tau+1:T}). This problem can be generalized to the case of more than one change point.

The problem of change point detection can be narrowed down further into more specific problems. In ''offline'' change point detection it is assumed that a sequence of length T is available and the goal is to identify whether any change point(s) occurred in the series. This is an example of post hoc analysis and is often approached using hypothesis testing methods. By contrast, ''online'' change point detection is concerned with detecting change points in an incoming data stream.


Algorithms


Online change detection

Using the sequential analysis ("online") approach, any change test must make a trade-off between these common metrics:

* False alarm rate
* Misdetection rate
* Detection delay

In a Bayes change-detection problem, a prior distribution is available for the change time. Online change detection is also done using streaming algorithms.
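
The trade-off between false alarms and detection delay can be illustrated with a small sketch. The code below uses a one-sided CUSUM statistic, which is one common sequential test and is chosen here only as an example. The function name `cusum_online`, the parameters `target_mean`, `drift` and `threshold`, and the sample data are all hypothetical, and the pre-change mean is assumed to be known in advance.

```python
from typing import Iterable, Optional


def cusum_online(stream: Iterable[float], target_mean: float,
                 drift: float, threshold: float) -> Optional[int]:
    """Scan a stream once and return the 0-based index at which an
    upward shift in the mean is flagged, or None if no alarm is raised.

    Assumptions (illustrative only): the pre-change mean is known,
    and only upward shifts are of interest.
    """
    g = 0.0  # cumulative evidence for an upward shift
    for t, x in enumerate(stream):
        # Accumulate deviations above target_mean + drift; reset at zero
        # so that evidence against a change does not build up indefinitely.
        g = max(0.0, g + (x - target_mean - drift))
        if g > threshold:
            return t
    return None


# Made-up data: mean 0 for 20 samples, then mean 2 from index 20 onward.
data = [0.0] * 20 + [2.0] * 20
alarm = cusum_online(data, target_mean=0.0, drift=0.5, threshold=4.0)
```

With these made-up values the alarm is raised at index 22, two samples after the first post-change observation. Lowering `threshold` shortens the detection delay but makes false alarms more likely on noisy data, while raising it has the opposite effect.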


Offline change detection

Basseville (1993, Section 2.6) discusses offline change-in-mean detection with hypothesis testing based on the works of Page and Picard and maximum-likelihood estimation of the change time, related to two-phase regression. Other approaches employ clustering based on maximum likelihood estimation, use optimization to infer the number and times of changes, or rely on spectral analysis or singular spectrum analysis.

"Offline" approaches cannot be used on streaming data because they need to compare against statistics of the complete time series. They cannot react to changes in real time, but they often provide a more accurate estimate of the change time and magnitude.
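
As a rough illustration of maximum-likelihood estimation of a single change time, the sketch below assumes independent Gaussian observations with a common variance and a mean that changes at most once. Under those assumptions, maximizing the likelihood over the change time is equivalent to choosing the split that minimizes the total within-segment sum of squared deviations. The function name `best_mean_split` and the sample series are hypothetical, and the sketch only estimates where a change is most likely; it does not test whether a change occurred at all.

```python
from typing import Sequence, Tuple


def best_mean_split(x: Sequence[float]) -> Tuple[int, float]:
    """Return (tau, cost), where x[:tau] and x[tau:] are the two segments
    and cost is the total within-segment sum of squared deviations.

    Assumption (illustrative only): exactly one change in the mean,
    with independent Gaussian noise of equal variance on both sides.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")

    total = sum(x)
    total_sq = sum(v * v for v in x)

    best_tau, best_cost = 1, float("inf")
    left_sum = 0.0
    left_sq = 0.0
    for tau in range(1, n):
        # Move one observation from the right segment to the left one.
        v = x[tau - 1]
        left_sum += v
        left_sq += v * v
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Squared deviations of a segment: sum of squares - (sum)^2 / length.
        cost = (left_sq - left_sum ** 2 / tau) \
            + (right_sq - right_sum ** 2 / (n - tau))
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau, best_cost


# Made-up series: mean near 1 for five samples, then mean near 5.
series = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
tau, cost = best_mean_split(series)
```

For this made-up series the estimated split is `tau = 5`, which separates the first five observations from the last five. Because the whole series must be available before the scan starts, the sketch is offline in the sense described above.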


Applications of change detection

Change detection tests are often used in manufacturing (quality control), intrusion detection, spam filtering, website tracking, and medical diagnostics.


Linguistic change detection

Linguistic change detection refers to the ability to detect word-level changes across multiple presentations of the same sentence. Researchers have found that the amount of semantic overlap (i.e., relatedness) between the changed word and the new word influences the ease with which such a detection is made (Sturt, Sanford, Stewart, & Dawydiak, 2004). Additional research has found that focusing attention on the word that will be changed during the initial reading of the original sentence can improve detection. This was shown using italicized text to focus attention, whereby the word that will be changing is italicized in the original sentence (Sanford, Sanford, Molle, & Emmott, 2006), as well as using clefting constructions such as "''It was the'' tree that needed water." (Kennette, Wurm, & Van Havermaet, 2010). These change-detection phenomena appear to be robust, even occurring cross-linguistically when bilinguals read the original sentence in their native language and the changed sentence in their second language (Kennette, Wurm, & Van Havermaet, 2010).

Recently, researchers have used change point detection to detect word-level changes in semantics across time by computationally analyzing temporal corpora (for example, the word "gay" has acquired a new meaning over time).


See also

* Structural break: change in model structure
* Detection theory
* Hypothesis testing
* Recall rate
* Receiver operating characteristic

