
Surrogate data testing (or the ''method of surrogate data'') is a statistical proof by contradiction technique, similar to permutation tests and, as a resampling technique, related to (but different from) parametric bootstrapping. It is used to detect non-linearity in a time series. The technique involves specifying a null hypothesis H_0 describing a linear process and then generating several surrogate data sets according to H_0 using Monte Carlo methods. A discriminating statistic is then calculated for the original time series and for each of the surrogate sets. If the value of the statistic for the original series differs significantly from its values for the surrogates, the null hypothesis is rejected and non-linearity is assumed.

The particular surrogate data testing method to be used is directly related to the null hypothesis. Usually this is similar to the following: ''The data are a realization of a stationary linear system, whose output has possibly been measured by a monotonically increasing, possibly nonlinear (but static) function''. Here ''linear'' means that each value depends linearly on past values, or on present and past values of some independent identically distributed (i.i.d.) process, usually also Gaussian. This is equivalent to saying that the process is of ARMA type. In the case of flows (continuous mappings), linearity of the system means that it can be expressed by a linear differential equation. In this hypothesis, the ''static'' measurement function is one which depends only on the present value of its argument, not on past ones.
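The testing procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`lag1_autocorr`, `shuffle_surrogate`, `surrogate_test`), the choice of lag-1 autocorrelation as the discriminating statistic, the use of random-shuffle surrogates (the i.i.d. null hypothesis) and the two-sided rank-based p-value are all assumptions made for the example.

```python
import random


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, used here as the discriminating statistic."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var


def shuffle_surrogate(x, rng):
    """Random permutation of x: same values, temporal order destroyed."""
    s = list(x)
    rng.shuffle(s)
    return s


def surrogate_test(x, statistic, make_surrogate, n_surrogates=99, seed=0):
    """Compare the statistic of x with its distribution over surrogates.

    Returns the original statistic, the surrogate statistics and a
    two-sided Monte Carlo p-value (assumed convention: distance from the
    surrogate mean, with the usual +1 correction).
    """
    rng = random.Random(seed)
    t_orig = statistic(x)
    t_surr = [statistic(make_surrogate(x, rng)) for _ in range(n_surrogates)]
    center = sum(t_surr) / n_surrogates
    n_extreme = sum(abs(t - center) >= abs(t_orig - center) for t in t_surr)
    p_value = (n_extreme + 1) / (n_surrogates + 1)
    return t_orig, t_surr, p_value


# Example: a correlated AR(1) series should be told apart from shuffled copies.
data_rng = random.Random(42)
series = [0.0]
for _ in range(499):
    series.append(0.8 * series[-1] + data_rng.gauss(0.0, 1.0))

t_orig, t_surr, p_value = surrogate_test(series, lag1_autocorr, shuffle_surrogate)
print("statistic:", t_orig, "p-value:", p_value)
```

With 99 surrogates the smallest attainable p-value is 0.01, so the number of surrogates bounds the significance level that can be reported. Rejecting this particular null only says the data are not i.i.d.; detecting non-linearity requires surrogates that preserve the linear structure and a statistic sensitive to non-linearity.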


Methods

Many algorithms to generate surrogate data have been proposed. They are usually classified in two groups:
* ''Typical realizations'': data series are generated as outputs of a model that has been fitted to the original data.
* ''Constrained realizations'': data series are created directly from the original data, generally by some suitable transformation of it.
The latter methods do not depend on a particular model, nor on any parameters, and are therefore non-parametric. They are usually based on preserving the linear structure of the original series (for instance, by preserving the autocorrelation function, or equivalently the periodogram, an estimate of the spectrum). Among constrained-realization methods, the most widely used (which could therefore be called the ''classical methods'') are:
# Algorithm 0, or RS (for ''Random Shuffle''): New data are created simply by random permutations of the original series. This concept is also used in permutation tests. The permutations guarantee the same amplitude distribution as the original series, but destroy any temporal correlation that may have been in the original data. This method is associated with the null hypothesis that the data are uncorrelated i.i.d. noise (possibly Gaussian and measured by a static nonlinear function).
# Algorithm 1, or RP (for ''Random Phases''; also known as FT, for Fourier Transform): In order to preserve the linear correlation (the periodogram) of the series, surrogate data are created by taking the inverse Fourier transform of the moduli of the Fourier transform of the original data combined with new (uniformly random) phases. If the surrogates must be real, the Fourier phases must be antisymmetric about zero frequency.
# Algorithm 2, or AAFT (for ''Amplitude Adjusted Fourier Transform''): This method approximately combines the advantages of the two previous ones: it tries to preserve both the linear structure and the amplitude distribution. It consists of these steps:
#* Rescaling the data to a Gaussian distribution (''Gaussianization'').
#* Performing an RP transformation of the new data.
#* Finally applying the inverse of the first transformation (''de-Gaussianization'').
#:The drawback of this method is precisely that the last step somewhat changes the linear structure.
# Iterative algorithm 2, or IAAFT (for ''Iterative Amplitude Adjusted Fourier Transform''): This algorithm is an iterative version of AAFT. The steps are repeated until the autocorrelation function is sufficiently similar to the original, or until there is no change in the amplitudes.
Many other surrogate data methods have been proposed, some based on optimizations to achieve an autocorrelation close to the original one, some based on the wavelet transform, and some capable of dealing with certain types of non-stationary data.

The above-mentioned techniques are called linear surrogate methods, because they are based on a linear process and address a linear null hypothesis. Broadly speaking, these methods are useful for data showing irregular fluctuations (short-term variability), and data with such behaviour abound in the real world. However, data with obvious periodicity are also common, for example annual sunspot numbers and electrocardiograms (ECG). Time series exhibiting strong periodicities are clearly not consistent with the linear null hypotheses. To tackle this case, other algorithms and null hypotheses have been proposed.
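The four classical constrained-realization algorithms (RS, RP, AAFT and IAAFT) listed above can be sketched with NumPy as follows. This is a hedged illustration under stated assumptions: the function names, the use of the real FFT (`numpy.fft.rfft` / `irfft`) to enforce the phase symmetry needed for real surrogates, the rank-based rescaling, and the IAAFT stopping rule and `max_iter` default are choices made for this example rather than details given in the text.

```python
import numpy as np


def _ranks(x):
    """Rank of each element (0 = smallest); ties are broken arbitrarily."""
    return np.argsort(np.argsort(x))


def random_shuffle_surrogate(x, rng):
    """Algorithm 0 (RS): random permutation of the original values."""
    return rng.permutation(np.asarray(x, dtype=float))


def random_phase_surrogate(x, rng):
    """Algorithm 1 (RP): keep Fourier moduli, draw uniformly random phases."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, spectrum.size)
    new_spectrum = np.abs(spectrum) * np.exp(1j * phases)
    new_spectrum[0] = spectrum[0]          # keep the mean (DC term) unchanged
    if x.size % 2 == 0:
        new_spectrum[-1] = spectrum[-1]    # Nyquist term must stay real
    # irfft assumes Hermitian symmetry, so the output is real-valued.
    return np.fft.irfft(new_spectrum, n=x.size)


def aaft_surrogate(x, rng):
    """Algorithm 2 (AAFT): Gaussianize, randomize phases, de-Gaussianize."""
    x = np.asarray(x, dtype=float)
    gaussian = np.sort(rng.standard_normal(x.size))
    y = gaussian[_ranks(x)]                # Gaussian values in the rank order of x
    y_rp = random_phase_surrogate(y, rng)
    return np.sort(x)[_ranks(y_rp)]        # original values in the rank order of y_rp


def iaaft_surrogate(x, rng, max_iter=200):
    """Iterative AAFT: alternate spectrum and amplitude adjustments."""
    x = np.asarray(x, dtype=float)
    target_moduli = np.abs(np.fft.rfft(x))
    sorted_x = np.sort(x)
    s = rng.permutation(x)                 # start from a random shuffle
    previous_ranks = None
    for _ in range(max_iter):
        # Impose the original Fourier moduli, keeping the current phases.
        phases = np.angle(np.fft.rfft(s))
        candidate = np.fft.irfft(target_moduli * np.exp(1j * phases), n=x.size)
        # Impose the original amplitude distribution by rank remapping.
        ranks = _ranks(candidate)
        s = sorted_x[ranks]
        if previous_ranks is not None and np.array_equal(ranks, previous_ranks):
            break                          # no further change in the amplitudes
        previous_ranks = ranks
    return s


# Example: surrogates of an AR(1) series (illustrative data only).
rng = np.random.default_rng(0)
noise = rng.standard_normal(512)
series = np.empty(512)
series[0] = noise[0]
for t in range(1, 512):
    series[t] = 0.8 * series[t - 1] + noise[t]

surrogates = {
    "RS": random_shuffle_surrogate(series, rng),
    "RP": random_phase_surrogate(series, rng),
    "AAFT": aaft_surrogate(series, rng),
    "IAAFT": iaaft_surrogate(series, rng),
}
```

In this sketch RS, AAFT and IAAFT return exact permutations of the original values, while RP preserves the periodogram exactly but generally changes the amplitude distribution. IAAFT ends on the amplitude-adjustment step, so its spectrum matches the original only approximately.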


See also

* Resampling (statistics)
* Permutation test

