Kolmogorov–Zurbenko Filter
   HOME

TheInfoList



OR:

Within
statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
, the Kolmogorov–Zurbenko (KZ) filter was first proposed by A. N.
Kolmogorov Andrey Nikolaevich Kolmogorov ( rus, Андре́й Никола́евич Колмого́ров, p=ɐnˈdrʲej nʲɪkɐˈlajɪvʲɪtɕ kəlmɐˈɡorəf, a=Ru-Andrey Nikolaevich Kolmogorov.ogg, 25 April 1903 – 20 October 1987) was a Sovi ...
and formally defined by Zurbenko.I. Zurbenko. The Spectral Analysis of Time Series. North-Holland Series in Statistics and Probability, 1986. It is a series of
iteration Iteration is the repetition of a process in order to generate a (possibly unbounded) sequence of outcomes. Each repetition of the process is a single iteration, and the outcome of each iteration is then the starting point of the next iteration. ...
s of a
moving average In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating a series of averages of different subsets of the full data set. It is also called a moving mean (MM) or rolling mean and is ...
filter of length ''m'', where ''m'' is a positive, odd integer. The KZ filter belongs to the class of
low-pass filter A low-pass filter is a filter that passes signals with a frequency lower than a selected cutoff frequency and attenuates signals with frequencies higher than the cutoff frequency. The exact frequency response of the filter depends on the filter des ...
s. The KZ filter has two parameters, the length ''m'' of the moving average window and the number of iterations ''k'' of the moving average itself. It also can be considered as a special
window function In signal processing and statistics, a window function (also known as an apodization function or tapering function) is a mathematical function that is zero-valued outside of some chosen interval, normally symmetric around the middle of the inte ...
designed to eliminate spectral leakage.


Background

A. N. Kolmogorov had the original idea for the KZ filter during a study of
turbulence In fluid dynamics, turbulence or turbulent flow is fluid motion characterized by chaotic changes in pressure and flow velocity. It is in contrast to a laminar flow, which occurs when a fluid flows in parallel layers, with no disruption between ...
in the Pacific Ocean. Kolmogorov had just received the International
Balzan Prize The International Balzan Prize Foundation awards four annual monetary prizes to people or organizations who have made outstanding achievements in the fields of humanities, natural sciences, culture, as well as for endeavours for peace and the br ...
for his law of 5/3 in the energy spectra of turbulence. Surprisingly the 5/3 law was not obeyed in the Pacific Ocean, causing great concern. Standard
fast Fourier transform A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in th ...
(FFT) was completely fooled by the noisy and non-stationary ocean environment. KZ filtration resolved the problem and enabled proof of Kolmogorov's law in that domain. Filter construction relied on the main concepts of the continuous
Fourier transform A Fourier transform (FT) is a mathematical transform that decomposes functions into frequency components, which are represented by the output of the transform as a function of frequency. Most commonly functions of time or space are transformed, ...
and their discrete analogues. The
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
of the KZ filter came from the definition of higher-order derivatives for discrete functions as higher-order differences. Believing that infinite smoothness in the
Gaussian window In discrete-time signal processing, windowing is a preliminary signal shaping technique, usually applied to improve the appearance and usefulness of a subsequent Discrete Fourier Transform. Several ''window functions'' can be defined, based on a ...
was a beautiful but unrealistic approximation of a truly discrete world, Kolmogorov chose a finitely
differentiable In mathematics, a differentiable function of one real variable is a function whose derivative exists at each point in its domain. In other words, the graph of a differentiable function has a non-vertical tangent line at each interior point in its ...
tapering window with finite support, and created this mathematical construction for the discrete case. The KZ filter is robust and nearly optimal. Because its operation is a simple Moving Average (MA), the KZ filter performs well in a missing data environment, especially in multidimensional time series where missing data problem arises from spatial sparseness. Another nice feature of the KZ filter is that the two parameters have clear interpretation so that it can be easily adopted by specialists in different areas. A few software packages for time series, longitudinal and spatial data have been developed in the popular statistical software R, which facilitate the use of the KZ filter and its extensions in different areas. I.Zurbenko Postdoctoral position at UC Berkeley with
Jerzy Neyman Jerzy Neyman (April 16, 1894 – August 5, 1981; born Jerzy Spława-Neyman; ) was a Polish mathematician and statistician who spent the first part of his professional career at various institutions in Warsaw, Poland and then at University College ...
and Elizabeth Scott provided a lot of ideas of applications supported in contacts with Murray Rosenblatt, Robert Shumway,
Harald Cramér Harald Cramér (; 25 September 1893 – 5 October 1985) was a Swedish mathematician, actuary, and statistician, specializing in mathematical statistics and probabilistic number theory. John Kingman described him as "one of the giants of statist ...
, David Brillinger,
Herbert Robbins Herbert Ellis Robbins (January 12, 1915 – February 12, 2001) was an American mathematician and statistician. He did research in topology, measure theory, statistics, and a variety of other fields. He was the co-author, with Richard Courant ...
,
Wilfrid Dixon Wilfrid Joseph Dixon (December 13, 1915 – September 20, 2008) was an American mathematician and statistician. He made notable contributions to nonparametric statistics, statistical education and experimental design. A native of Portland, ...
,
Emanuel Parzen Emanuel Parzen (April 21, 1929 – February 6, 2016) was an American statistician. He worked and published on signal detection theory and time series analysis, where he pioneered the use of kernel density estimation (also known as the Parzen win ...
.


Definition

KZ Filter Let \, t=0, \pm 1, \pm 2, \dots be a
real-valued In mathematics, value may refer to several, strongly related notions. In general, a mathematical value may be any definite mathematical object. In elementary mathematics, this is most often a number – for example, a real number such as or an i ...
time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Exa ...
, the KZ filter with
parameter A parameter (), generally, is any characteristic that can help in defining or classifying a particular system (meaning an event, project, object, situation, etc.). That is, a parameter is an element of a system that is useful, or critical, when ...
s m and k is defined as : KZ_ (t)= \sum_^ where coefficients : a_s^=\frac,s=\frac,\dots,\frac are given by the
polynomial In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An exa ...
coefficient In mathematics, a coefficient is a multiplicative factor in some term of a polynomial, a series, or an expression; it is usually a number, but may be any expression (including variables such as , and ). When the coefficients are themselves var ...
s obtained from equation : \sum_^ From another point of view, the KZ filter with parameters m and k can be defined as k time iterations of a moving average (MA) filter of m points. It can be obtained through iterations. First iteration is to apply a MA filter over process X(t) : KZ_ (t) \sum_^ \times \frac The second iteration is to apply the MA operation to the result of the first iteration, : \begin & KZ_ (t)\sum_^ \\ = & \sum_^ \end Generally the ''k''th iteration is an application of the MA filter to the (''k'' − 1)th iteration. The iteration process of a simple operation of MA is very convenient computationally.


Properties

The impulse response function of the product of filters is the convolution of impulse responses. The coefficients of the KZ filter , can be interpreted as a
distribution Distribution may refer to: Mathematics *Distribution (mathematics), generalized functions used to formulate solutions of partial differential equations * Probability distribution, the probability of a particular value or value range of a vari ...
obtained by the
convolution In mathematics (in particular, functional analysis), convolution is a operation (mathematics), mathematical operation on two function (mathematics), functions ( and ) that produces a third function (f*g) that expresses how the shape of one is ...
of ''k'' uniform discrete distributions on the interval where ''m'' is an odd integer. Therefore, the coefficient forms a tapering window which has
finite support In mathematics, the support of a real-valued function f is the subset of the function domain containing the elements which are not mapped to zero. If the domain of f is a topological space, then the support of f is instead defined as the small ...
. The KZ filter has main weight concentrated on a length of with weights vanishing to zero outside. The impulse response function of the KZ filter has continuous derivatives and is asymptotically Gaussian distributed. Zero derivatives at the edges for the
impulse response function In signal processing and control theory, the impulse response, or impulse response function (IRF), of a dynamic system is its output when presented with a brief input signal, called an impulse (). More generally, an impulse response is the reacti ...
make from it a sharply declining function, what is resolving in high frequency resolution. The energy
transfer function In engineering, a transfer function (also known as system function or network function) of a system, sub-system, or component is a function (mathematics), mathematical function that mathematical model, theoretically models the system's output for ...
of the KZ filter is : , B_(\omega), ^2=\left\^. It is a low-pass filter with a cut-off frequency of : \omega_0 \approx \frac \sqrt. Compared to a MA filter, the KZ filter has much better performance in terms of attenuating the frequency components above the cutoff frequency. The KZ filter is essentially a repetitive MA filter. It is easy to compute and allows for a straightforward way to deal with missing data. The main piece of this procedure is a simple average of available information within the interval of m points disregarding the missing observations within the interval. The same idea can be easily extended to spatial data analysis. It has been shown that missing values have very little effect on the transfer function of the KZ filter. Arbitrary ''k'' will provide ''k'' power of this transfer function and will reduce side lobe value to . It will be a perfect low pass filter. For practical purposes a choice of ''k'' within a range 3 to 5 is usually sufficient, when regular MA (''k'' = 1) is providing strong spectral leakage of about 5%.


Optimality

The KZ filter is robust and nearly optimal. Because its operation is a simple moving average, the KZ filter performs well in a missing data environment, especially in multidimensional time and space where missing data can cause problems arising from spatial sparseness. Another nice feature of the KZ filter is that the two parameters each have clear interpretations so that it can be easily adopted by specialists in different areas. Software implementations for time series, longitudinal and spatial data have been developed in the popular statistical package R, which facilitate the use of the KZ filter and its extensions in different areas. KZ filter can be used to smooth the
periodogram In signal processing, a periodogram is an estimate of the spectral density of a signal. The term was coined by Arthur Schuster in 1898. Today, the periodogram is a component of more sophisticated methods (see spectral estimation). It is the most co ...
. For a class of
stochastic process In probability theory and related fields, a stochastic () or random process is a mathematical object usually defined as a family of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that appea ...
es, Zurbenko considered the worst-case scenario where the only information available about a process is its spectral density and smoothness quantified by
Hölder condition In mathematics, a real or complex-valued function ''f'' on ''d''-dimensional Euclidean space satisfies a Hölder condition, or is Hölder continuous, when there are nonnegative real constants ''C'', α > 0, such that : , f(x) - f(y) , \leq C\, ...
. He derived the optimal bandwidth of the spectral window, which is dependent upon the underlying smoothness of the spectral density. Zurbenko compared the performance of Kolmogorov–Zurbenko (KZ) window to the other typically used spectral windows including Bartlett window,
Parzen window In statistics, kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on ''kernels'' as wei ...
, Tukey–Hamming window and uniform window and showed that the result from KZ window is closest to optimum. Developed as an abstract discrete construction, KZ filtration is robust and statistically nearly optimal. At the same time, because of its natural form, it has computational advantages, permitting analysis of space/time problems with data that has much as 90% of observations missing, and which represent a messy combination of several different physical phenomena. Clear answers can often be found for "unsolvable" problems. Unlike some mathematical developments, KZ is adaptable by specialists in different areas because it has a clear physical interpretation behind it.


Extensions

Extensions of KZ filter include KZ adaptive (KZA) filter, spatial KZ filter and KZ Fourier transform (KZFT). Yang and Zurbenko provided a detailed review of KZ filter and its extensions. R packages are also available to implement KZ filtrationB.Close, I.Zurbenko, Kolmogorov–Zurbenko adaptive algorithm, Proceedings JSM, 2011 KZFT KZFT filter is design for a reconstruction of periodic signals or seasonality covered by heavy noise. Seasonality is one of the key forms of nonstationarity that is often seen in time series. It is usually defined as the periodic components within the time series. Spectral analysis is a powerful tool to analyze time series with seasonality. If a process is stationary, its spectrum is a continuous form as well. It can be treated parametrically for simplicity of prediction. If a spectrum contains lines, it indicates that the process is not stationary and contains periodicities. In this situation, parametric fitting generally results in seasonal residuals with reduced energies. This is due to the season to season variations. To avoid this problem, nonparametric approaches including band pass filters are recommended. Kolmogorov–Zurbenko Fourier Transform (KZFT) is one of such filters. The purpose of many applications is to reconstruct high resolution wavelet from the noisy environment. It was proven that KZFT provides the best possible resolution in spectral domain. It permits the separation of two signals on the edge of a theoretically smallest distance, or reconstruct periodic signals covered by heavy noise and irregularly observed in time. Because of this, KZFT provides a unique opportunity for various applications. A computer algorithm to implement the KZFT has been provided in the R software. The KZFT is essentially a band pass filter that belongs to the category of
short-time Fourier transform The short-time Fourier transform (STFT), is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. In practice, the procedure for computing STFTs is to divid ...
(STFT) with a unique time window. KZFT readily uncovers small deviations from a constant spectral density of
white noise In signal processing, white noise is a random signal having equal intensity at different frequencies, giving it a constant power spectral density. The term is used, with this or similar meanings, in many scientific and technical disciplines, ...
resulting from computer random numbers generator. Such computer random number generations become predictable in the long run.
Kolmogorov complexity In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produ ...
provides the opportunity to generate unpredictable sequences of random numbers. Formally, we have a process ,, the KZFT filter with parameters ''m'' and ''k'', computed at frequency ''ν''0, produces an output process, which is defined as following: : KZFT_ (t)= \sum_^ where is defined as: , ,..., and the polynomial coefficients is given by . Apparently filter is equivalent to the application of filter to the process . Similarly, the KZFT filter can be obtained through iterations in the same way as KZ filter. The average of the square of KZFT in time over ''S'' periods of will provide an estimate of the square amplitude of the wave at frequency ''ν''0 or KZ periodogram (KZP) based on 2''Sρ''0 observations around moment ''t'': : \operatorname(t,m,k,\nu_0) = 2\left, \frac \sum_^ 2\operatorname ZFT_[X(\tau+t)^2 \ Transfer function of KZFT is provided in Figure 2 has a very sharp frequency resolution with bandwidth limited by . For a complex-valued process , the outcome is unchanged. For a real-valued process, it distributes energy evenly over the real and complex domains. In other words, reconstructs a cosine or sine wave at the same frequency ν0. It follows that correctly reconstructs the amplitude and phase of an unknown wave with frequency ν0. Figure below is providing power transfer function of KZFT filtration. It clearly display that it perfectly captured frequency of interest ν0 = 0.4 and provide practically no spectral leakage from a side lobes which control by parameter ''k'' of filtration. For practical purposes choice of ''k'' within range 3–5 is usually sufficient, when regular FFT (''k'' = 1) is providing strong leakage of about 5%. Example: Simulated signal
normal random noise N(0,16) was used to test the KZFT algorithm's ability to accurately determine spectra of datasets with missing values. For practical considerations, the percentage of missing values was used at p=70% to determine if the spectrum could continue to capture the dominant frequencies. Using a wider window length of m=600 and k=3 iterations, adaptively smoothed KZP algorithm was used to determine the spectrum for the simulated longitudinal dataset. It is apparent in Figure 3 that the dominant frequencies of 0.08 and 0.10 cycles per unit time are identifiable as the signal's inherent frequencies. KZFT reconstruction of original signal embedded in the high noise of longitudinal observations ( missing rate 60%.) The KZFT filter in the KZA package of R-software has a parameter ''f'' = frequency. By defining this parameter for each of the known dominant frequencies found in the spectrum, KZFT filter with parameters m=300 and k=3 to reconstruct the signal about each frequency (0.08 and 0.10 cycles per unit time). The reconstructed signal was determined by applying the KZFT filter twice (once about each dominant frequency) and then the summing the results of each filter. The correlation between the true signal and the reconstructed signal was 96.4%; displayed in figure 4. The original observations provide no guess of the complex, hidden periodicity, which was perfectly reconstructed by the algorithm. Raw data frequently contain hidden frequencies. Combinations of a few fixed frequency waves can complicate the recognition of the mixture of signals, but still remain predictable over time. Publications show that atmospheric pressure contains hidden periodicities resulting from the gravitational force of the moon and the daily period of the sun. The reconstruction of these periodic signals of atmospheric tidal waves allows for an explanation and prediction of many anomalies present in extreme weather. Similar tidal waves must exist on the sun resulting from the gravitational force of planets. The rotation of the sun around its axes will cause a current, similar to the equatorial current on the earth. Perturbations or eddies around the current will cause anomalies on the surface of the sun. Horizontal rotational eddies in highly magnetic plasma will create a vertical explosion which will transport deeper, hotter plasma to above the surface of the sun. Each planet creates a tidal wave with a specific frequency on the sun. At times any two of the waves will occur in phase and other times will be out of phase. The resulting amplitude will oscillate with a difference frequency. The estimation of the spectra of sunspot data using the DZ algorithm provides two sharp frequency lines with periodicities close to 9.9 and 11.7 years. These frequency lines can be considered as difference frequencies caused by Jupiter and Saturn (9.9) and Venus and Earth (11.7). The difference frequency between 9.9 and 11.7 yields a frequency with a 64-year period. All of these periods are identifiable in sunspot data. The 64-year period component is currently in a declining mode. This decline may cause a cooling effect on the earth in the near future. An examination of the joint effect of multiple planets may reveal some long periods in sun activity and help explain climate fluctuations on earth. KZA Adaptive version of KZ filter, called KZ adaptive (KZA) filter, was developed for a search of breaks in nonparametric signals covered by heavy noise.. The KZA filter first identifies potential time intervals when a break occurs. It then examines these time intervals more carefully by reducing the window size so that the resolution of the smoothed outcome increases. As an example of break point detection, we simulate a long-term trend containing a break buried in seasonality and noise. Figure 2 is a plot of a seasonal sine wave with amplitude of 1 unit, normally distributed noise (), and a base signal with a break. To make things more challenging, the base signal contains an overall downward trend of 1 unit and an upward break of 0.5 units. The downward trend and break are hardly visible in the original data. The base signal is a step function , with and with . The application of a low-pass smoothing filter to the original data results in an over smoothing of the break as shown in Figure 6. The position of the break is no longer obvious. The application of an adaptive version of the KZ filter (KZA) finds the break as shown in Figure 5b. The construction of KZA is based on an adaptive version of the iterated smoothing filter KZ. The idea is to change the size of the filtering window based on the trends found with KZ. This will cause the filter to zoom in on the areas where the data is changing; the more rapid the change, the tighter the zoom will be. The first step in constructing KZA is to use KZ; where ''k'' is iterations and ''q'' is the filter length, where is an iterated moving average where are the original data and are the filtered data. This result is used to build an adaptive version of the filter. The filter is composed of a head and tail (''q''''f'' and ''q''''b'') respectively, with f = head and b = tail) that adjust in size in response to the data, effectively zooming in on regions where the data are changing rapidly. The head ''q''''f'' shrinks in response to the break in the data. The difference vector built from KZ; is used to find the discrete equivalent of the derivative '. This result determines the sizes of the head and the tail (''q''''f'' and ''q''''b'' respectively) of the filtering window. If the slope is positive the head will shrink and the tail will expand to full size ('(, then and ) with . If the slope is negative the head of the window will be full sized while the tail will shrink (', then and . Detailed code of KZA is available. The KZA algorithm has all of the typical advantages of a nonparametric approach; it does not require any specific model of the time series under investigation. It searches for sudden changes over a low frequency signal of any nature covered by heavy noise. KZA shows very high sensitivity for break detection, even with a very low signal-to-noise ratio; the accuracy of the detection of the time of the break is also very high. The KZA algorithm can be applied to restore noisy two-dimensional images. This could be a two-level function f(x,y) as a black-and-white picture damaged by strong noise, or a multilevel color picture. KZA can be applied line by line to detect the break (change of color), then the break points at different lines would be smoothed by the regular KZ filter. Demonstration of spatial KZA is provided in Figure 7. Determinations of sharp frequency lines in the spectra can be determine by adaptively smoothed periodogram. The central idea of the algorithm is adaptively smoothing the logarithm of a KZ periodogram. The range of smoothing is provided by some fixed percentage of conditional entropy from total Entropy (information theory), entropy. Roughly speaking, the algorithm operates uniformly on an information scale rather than a frequency scale. This algorithm is also known for parameter k=1 in KZP as Dirienzo-Zurbenko algorithm and provided in software. Spatial KZ filter Spatial KZ filter can be applied to the variable recorded in time and space. Parameters of the filter can be chosen separately in time and space. Usually physical sense can be applied what scale of averaging is reasonable in space and what scale of averaging is reasonable in time. Parameter k is controlling sharpness of resolution of the filter or suppression of leak of frequencies. An algorithms for spatial KZ filter are available in R software. Outcome time parameter can be treated as virtual time, then images of results of filtration in space can be displayed as "movie" in virtual time. We may demonstrate application of 3D spatial KZ filter applied to the world records of temperature ''T''(''t'', ''x'', ''y'') as a function of time ''t'', longitude ''x'' and latitude ''y''. To select global climate fluctuations component parameters 25 month for time ''t'', 3° for longitude and latitude were chosen for KZ filtration. Parameter ''k'' were chosen equal 5 to accommodate resolutions of scales. Single slide of outcome "movie" is provided in Figure 8 below. Standard average cosine square temperature distribution low along latitudes were subtracted to identify fluctuations of climate in time and space. We can see anomalies of temperature fluctuations from cosine square law over globe for 2007. Temperature anomalies are displayed over globe in the provided in figure scale on the right. It displays very high positive anomaly over Europe and North Africa, which were extending over last 100 years. Absolute humidity variable is keeping responsibility for major regional climate changings as it was displayed recently by Zurbenko Igor and Smith Devin in Kolmogorov–Zurbenko filters in spatiotemporal analysis. Those anomalies are slowly changing in time in the outcome "movie" of KZ filtration, slow intensification of observed anomalies were identified in time. Different scales fluctuations like El Niño scale and others are also can be identified by spatial KZ filtration. High definition "movie" of those scales are provided in over North America. Different scales can be selected by KZ filtration for a different variable and corresponding multivariate analysis can provide high efficiency results for investigating outcome variable over other covariates. KZ filter resolution performs exceptionally well compare to conventional methods and in fact is computationally optimal.


Implementations

*W. Yang and I. Zurbenko. kzft: Kolmogorov–Zurbenko Fourier transform and application. R package, 2006. *B. Close and I. Zurbenko. kza: Kolmogorov–Zurbenko adaptive algorithm for the image detection. R package, 2016 (https://cran.r-project.org/web/packages/kza/) *KZ and KZA Java implementation for 1-dimensional arrays of Andreas Weiler and Michael Grossniklaus (University of Konstanz, Germany) (https://web.archive.org/web/20140914054417/http://dbis.uni-konstanz.de/research/social-media-stream-analysis/) *Python implementation of KZ, KZFT and KZP of Mathieu Schopfer (University of Lausanne, Switzerland) (https://github.com/MathieuSchopfer/kolmogorov-zurbenko-filter)


References

{{DEFAULTSORT:Kolmogorov-Zurbenko filter Statistical signal processing Filter theory