Least-squares spectral analysis
Least-squares spectral analysis (LSSA) is a method of estimating a frequency spectrum based on a least-squares fit of sinusoids to data samples, similar to Fourier analysis. Fourier analysis, the most widely used spectral method in science, generally boosts long-periodic noise in long, gapped records; LSSA mitigates such problems. Unlike Fourier analysis, LSSA does not require the data samples to be equally spaced. LSSA is also known as the Vaníček method or the Gauss–Vaníček method, after Petr Vaníček, and as the Lomb method or the Lomb–Scargle periodogram, based on the contributions of Nicholas R. Lomb and, independently, Jeffrey D. Scargle.


Historical background

The close connections between Fourier analysis, the periodogram, and least-squares fitting of sinusoids have long been known. Most developments, however, are restricted to complete data sets of equally spaced samples. In 1963, Freek J. M. Barning of Mathematisch Centrum, Amsterdam, handled unequally spaced data by similar techniques, including both a periodogram analysis equivalent to what is now referred to as the Lomb method and least-squares fitting of selected frequencies of sinusoids determined from such periodograms, connected by a procedure now known as matching pursuit with post-backfitting or orthogonal matching pursuit.

Petr Vaníček, a Canadian geodesist of the University of New Brunswick, also proposed the matching-pursuit approach in 1969, calling it "successive spectral analysis" and its result a "least-squares periodogram", for both equally and unequally spaced data. In 1971 he generalized the method to account for systematic components beyond a simple mean, such as a "predicted linear (quadratic, exponential, ...) secular trend of unknown magnitude", and applied it to a variety of samples. The Vaníček method was then simplified in 1976 by Nicholas R. Lomb of the University of Sydney, who pointed out its close connection to periodogram analysis. The definition of a periodogram of unequally spaced data was subsequently modified and analyzed by Jeffrey D. Scargle of NASA Ames Research Center, who showed that, with minor changes, it could be made identical to Lomb's least-squares formula for fitting individual sinusoid frequencies. Scargle states that his paper "does not introduce a new detection technique, but instead studies the reliability and efficiency of detection with the most commonly used technique, the periodogram, in the case where the observation times are unevenly spaced," and further points out, in reference to least-squares fitting of sinusoids compared to periodogram analysis, that his paper "establishes, apparently for the first time, that (with the proposed modifications) these two methods are exactly equivalent." Press summarizes the development this way:

In 1989, Michael J. Korenberg of Queen's University in Kingston, Ontario, developed the "fast orthogonal search" method of more quickly finding a near-optimal decomposition of spectra or other problems, similar to the technique that later became known as orthogonal matching pursuit.


Method variants


The Vaníček method

In the Vaníček method, a discrete data set is approximated by a weighted sum of sinusoids of progressively determined frequencies, using a standard linear regression, or least-squares fit.Wells, D.E., P. Vaníček, S. Pagiatakis, 1985. Least-squares spectral analysis revisited. Department of Surveying Engineering Technical Report 84, University of New Brunswick, Fredericton, 68 pages.

The frequencies are chosen using a method similar to Barning's, but going further in optimizing the choice of each successive new frequency by picking the frequency that minimizes the residual after least-squares fitting (equivalent to the fitting technique now known as matching pursuit with pre-backfitting). The number of sinusoids must be less than or equal to the number of data samples (counting sines and cosines of the same frequency as separate sinusoids).

A data vector ''Φ'' is represented as a weighted sum of sinusoidal basis functions, tabulated in a matrix A by evaluating each function at the sample times, with weight vector ''x'':

:\phi \approx \textbf{A}x,

where the weight vector ''x'' is chosen to minimize the sum of squared errors in approximating ''Φ''. The solution for ''x'' is closed-form, using standard linear regression:

:x = (\textbf{A}^{\mathrm{T}}\textbf{A})^{-1}\textbf{A}^{\mathrm{T}}\phi.

Here the matrix A can be based on any set of functions that are mutually independent (not necessarily orthogonal) when evaluated at the sample times; for spectral analysis, the functions used are typically sines and cosines evenly distributed over the frequency range of interest. If too many frequencies are chosen in a too-narrow frequency range, the functions will not be sufficiently independent, the matrix will be ill-conditioned, and the resulting spectrum will not be meaningful.

When the basis functions in A are orthogonal (that is, not correlated, meaning the columns have zero pair-wise dot products), the matrix A^TA is a diagonal matrix; when the columns all have the same power (sum of squares of elements), that matrix is an identity matrix times a constant, so the inversion is trivial. The latter is the case when the sample times are equally spaced and the sinusoids are chosen to be sines and cosines equally spaced in pairs on the frequency interval 0 to a half cycle per sample (spaced by 1/N cycle per sample, omitting the sine phases at 0 and the maximum frequency, where they are identically zero). This particular case is known as the discrete Fourier transform, slightly rewritten in terms of real data and coefficients:

:x = \textbf{A}^{\mathrm{T}}\phi     (DFT case for ''N'' equally spaced samples and frequencies, within a scalar factor)
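The closed-form solution above is an ordinary linear regression, so the core of the method fits in a few lines of code. The following Python/NumPy sketch (an illustration under assumed inputs, not Vaníček's published implementation) builds the matrix A from sine and cosine columns at user-chosen trial angular frequencies evaluated at arbitrary, possibly unevenly spaced, sample times and solves for the weight vector; the names lssa_weights, t, phi and omegas are inventions of this example.

 import numpy as np

 def lssa_weights(t, phi, omegas):
     """Least-squares fit of sine/cosine pairs at the given angular
     frequencies to unevenly spaced samples (t, phi).  Returns the
     weight vector x and the design matrix A, so that A @ x is the
     least-squares approximation of phi."""
     t = np.asarray(t, dtype=float)
     phi = np.asarray(phi, dtype=float)
     # Columns of A: [sin(w1 t), cos(w1 t), sin(w2 t), cos(w2 t), ...]
     cols = []
     for w in omegas:
         cols.append(np.sin(w * t))
         cols.append(np.cos(w * t))
     A = np.column_stack(cols)
     # Solve min ||A x - phi||^2; lstsq is numerically preferable to
     # forming (A^T A)^(-1) A^T phi explicitly.
     x, *_ = np.linalg.lstsq(A, phi, rcond=None)
     return x, A

 # Toy usage: a 0.7 Hz tone sampled at 100 irregular times.
 rng = np.random.default_rng(0)
 t = np.sort(rng.uniform(0.0, 20.0, 100))
 phi = 1.5 * np.sin(2 * np.pi * 0.7 * t) + 0.1 * rng.standard_normal(100)
 omegas = 2 * np.pi * np.linspace(0.1, 2.0, 30)    # trial frequencies
 x, A = lssa_weights(t, phi, omegas)
 power = x[0::2] ** 2 + x[1::2] ** 2               # squared amplitude per trial frequency
 print(omegas[np.argmax(power)] / (2 * np.pi))     # close to 0.7 Hz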


The Lomb method

Trying in 1976 to lower the computational burden of the Vaníček method (no longer an issue today), Lomb proposed using the above simplification in general, except for pair-wise correlations between the sine and cosine bases of the same frequency, since the correlations between pairs of sinusoids are often small, at least when they are not too closely spaced. This is essentially the traditional periodogram formulation, but now adapted for use with unevenly spaced samples. The vector ''x'' is a good estimate of the underlying spectrum, but since correlations are ignored, A''x'' is no longer a good approximation to the signal, and the method is no longer a least-squares method – yet it has continued to be referred to as such.

Rather than just taking dot products of the data with sine and cosine waveforms directly, Scargle modified the standard periodogram formula so as first to find a time delay \tau such that this pair of sinusoids would be mutually orthogonal at the sample times t_j, and also adjusted for the potentially unequal powers of these two basis functions, to obtain a better estimate of the power at a frequency; this made his modified periodogram method exactly equivalent to Lomb's method. The time delay \tau is defined by the formula

:\tan 2\omega\tau = \frac{\sum_j \sin 2\omega t_j}{\sum_j \cos 2\omega t_j}.

The periodogram at frequency \omega is then estimated as:

:P_x(\omega) = \frac{1}{2} \left( \frac{\left[\sum_j \phi_j \cos \omega(t_j - \tau)\right]^2}{\sum_j \cos^2 \omega(t_j - \tau)} + \frac{\left[\sum_j \phi_j \sin \omega(t_j - \tau)\right]^2}{\sum_j \sin^2 \omega(t_j - \tau)} \right),

which Scargle reports then has the same statistical distribution as the periodogram in the evenly sampled case. At any individual frequency \omega, this method gives the same power as does a least-squares fit to sinusoids of that frequency, of the form

:\phi(t) = A \sin \omega t + B \cos \omega t.
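A direct NumPy transcription of the two formulas above is given below as a sketch; it assumes the data phi have already had their mean removed (the classic, non-generalized form), and the function name lomb_scargle is this example's own. SciPy ships an optimized routine for the same classic formula, scipy.signal.lombscargle(t, phi, omegas), which can serve as a cross-check.

 import numpy as np

 def lomb_scargle(t, phi, omegas):
     """Evaluate Scargle's tau-shifted power formula at each angular
     frequency in omegas, for mean-subtracted samples phi at times t."""
     t = np.asarray(t, dtype=float)
     phi = np.asarray(phi, dtype=float)
     P = np.empty(len(omegas))
     for i, w in enumerate(omegas):
         # tan(2 w tau) = sum_j sin(2 w t_j) / sum_j cos(2 w t_j)
         tau = np.arctan2(np.sin(2 * w * t).sum(),
                          np.cos(2 * w * t).sum()) / (2 * w)
         c = np.cos(w * (t - tau))
         s = np.sin(w * (t - tau))
         P[i] = 0.5 * ((phi @ c) ** 2 / (c @ c) + (phi @ s) ** 2 / (s @ s))
     return P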


The generalized Lomb–Scargle periodogram

The standard Lomb–Scargle periodogram is valid for a model with zero mean. Commonly, this is approximated by subtracting the mean of the data before calculating the periodogram. However, this is an inaccurate assumption when the mean of the model (the fitted sinusoids) is non-zero. The ''generalized'' Lomb–Scargle periodogram removes this assumption, and explicitly solves for the mean. In this case, the function fitted is :\phi(t) = A \sin \omega t + B \cos \omega t + C. The generalized Lomb–Scargle periodogram has also been referred to as a ''floating mean periodogram''.
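A per-frequency sketch of the floating-mean fit follows; it is illustrative only (normalization conventions for the generalized periodogram vary, and the helper name floating_mean_power is this example's own).

 import numpy as np

 def floating_mean_power(t, phi, omegas):
     """At each angular frequency, fit phi(t) ~ A sin(wt) + B cos(wt) + C
     by least squares, solving for the offset C instead of pre-subtracting
     the sample mean; report the (unnormalized) power A^2 + B^2."""
     t = np.asarray(t, dtype=float)
     phi = np.asarray(phi, dtype=float)
     power = np.empty(len(omegas))
     for i, w in enumerate(omegas):
         M = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
         (a, b, c), *_ = np.linalg.lstsq(M, phi, rcond=None)
         power[i] = a * a + b * b
     return power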


Korenberg's "fast orthogonal search" method

Michael Korenberg of Queen's University in Kingston, Ontario, developed a method for choosing a sparse set of components from an over-complete set, such as sinusoidal components for spectral analysis, called fast orthogonal search (FOS). Mathematically, FOS uses a slightly modified Cholesky decomposition in a mean-square error reduction (MSER) process, implemented as a sparse matrix inversion. As with the other LSSA methods, FOS avoids the major shortcoming of discrete Fourier analysis, can achieve highly accurate identification of embedded periodicities, and excels with unequally spaced data. The fast orthogonal search method has also been applied to other problems, such as nonlinear system identification.
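The sketch below illustrates only the greedy mean-square-error-reduction idea; Korenberg's FOS achieves it with a fast Cholesky-based recursion rather than the repeated full least-squares solves used here, so this is a slow stand-in rather than FOS itself, and the function name is this example's own.

 import numpy as np

 def greedy_sinusoid_search(t, phi, omegas, n_terms):
     """Greedily pick the sinusoid pair (sine and cosine at one frequency)
     that most reduces the residual mean-square error, re-fitting all
     previously chosen terms at each step."""
     t = np.asarray(t, dtype=float)
     phi = np.asarray(phi, dtype=float)
     chosen, cols = [], []
     for _ in range(n_terms):
         best = None
         for w in omegas:
             if w in chosen:
                 continue
             A = np.column_stack(cols + [np.sin(w * t), np.cos(w * t)])
             x, *_ = np.linalg.lstsq(A, phi, rcond=None)
             mse = np.mean((phi - A @ x) ** 2)
             if best is None or mse < best[0]:
                 best = (mse, w)
         chosen.append(best[1])
         cols += [np.sin(best[1] * t), np.cos(best[1] * t)]
     return chosen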


Palmer's Chi-squared method

Palmer has developed a method for finding the best-fit function to any chosen number of harmonics, allowing more freedom to find non-sinusoidal harmonic functions. This method is a fast (FFT-based) technique for doing weighted least-squares analysis on arbitrarily spaced data with non-uniform standard errors; source code that implements it is available. Because data are often not sampled at uniformly spaced discrete times, this method "grids" the data by sparsely filling a time-series array at the sample times. All intervening grid points receive zero statistical weight, equivalent to having infinite error bars at the times between samples.
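Palmer's distributed source code is not reproduced here; the fragment below only sketches the gridding step just described, under the assumption that each sample carries a standard error sigma and that the grid spacing dt is fine enough that at most one sample falls in each cell (the function name grid_samples is this example's own).

 import numpy as np

 def grid_samples(t, phi, sigma, dt):
     """Place unevenly spaced samples on a uniform time grid of spacing dt.
     Empty grid points keep zero weight (equivalent to infinite error bars),
     so they contribute nothing to a subsequent weighted, FFT-based fit."""
     t = np.asarray(t, dtype=float)
     n = int(np.ceil((t.max() - t.min()) / dt)) + 1
     data = np.zeros(n)
     weight = np.zeros(n)
     idx = np.rint((t - t.min()) / dt).astype(int)
     data[idx] = phi
     weight[idx] = 1.0 / np.asarray(sigma, dtype=float) ** 2
     return data, weight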


Applications

The most useful feature of LSSA is that it enables incomplete records to be spectrally analyzed without the need to manipulate the record or to invent otherwise non-existent data. Magnitudes in the LSSA spectrum depict the contribution of a frequency or period to the variance of the time series. Generally, spectral magnitudes defined in this way allow a straightforward significance-level treatment of the output.Beard, A.G., Williams, P.J.S., Mitchell, N.J. & Muller, H.G. A special climatology of planetary waves and tidal variability, J. Atm. Solar-Terr. Phys. 63 (09), p. 801–811 (2001). Alternatively, magnitudes in the Vaníček spectrum can be expressed in dB.Pagiatakis, S. Stochastic significance of peaks in the least-squares spectrum, J. of Geodesy 73, p. 67–78 (1999). Note that magnitudes in the Vaníček spectrum follow a β-distribution.

Inverse transformation of Vaníček's LSSA is possible, as is most easily seen by writing the forward transform as a matrix; the matrix inverse (when the matrix is not singular) or pseudo-inverse will then be an inverse transformation; the inverse will exactly match the original data if the chosen sinusoids are mutually independent at the sample points and their number is equal to the number of data points.Craymer, M.R., ''The Least Squares Spectrum, Its Inverse Transform and Autocorrelation Function: Theory and Some Applications in Geodesy'', Ph.D. Dissertation, University of Toronto, Canada (1998). No such inverse procedure is known for the periodogram method.
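In code, the inverse transform is simply the synthesis step of the earlier Vaníček sketch: with weights x and design matrix A, the reconstruction is A @ x, which reproduces the data exactly when A is square and non-singular (as many independent sine/cosine columns as samples) and is the least-squares approximation otherwise. A minimal illustration (the helper name is this example's own):

 import numpy as np

 def lssa_synthesis(A, x):
     """Inverse (synthesis) step: rebuild the time series from the
     least-squares spectrum weights.  Exact when A is square and
     non-singular; otherwise the least-squares approximation."""
     return A @ x

 # With A square and well conditioned (2 * number of frequencies equal to
 # the number of samples), np.allclose(lssa_synthesis(A, x), phi) holds.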


Implementation

The LSSA can be implemented in less than a page of MATLAB code. In essence, "to compute the least-squares spectrum we must compute ''m'' spectral values ... which involves performing the least-squares approximation ''m'' times, each time to get [the spectral power] for a different frequency".

That is, for each frequency in a desired set of frequencies, sine and cosine functions are evaluated at the times corresponding to the data samples, and dot products of the data vector with the sinusoid vectors are taken and appropriately normalized; following the method known as the Lomb/Scargle periodogram, a time shift is calculated for each frequency to orthogonalize the sine and cosine components before the dot product, as described by Craymer; finally, a power is computed from those two amplitude components. This same process implements a discrete Fourier transform when the data are uniformly spaced in time and the frequencies chosen correspond to integer numbers of cycles over the finite data record.

This method treats each sinusoidal component independently, or out of context, even though the components may not be orthogonal on the data points; it is Vaníček's original method. In contrast, as Craymer explains, it is also possible to perform a full simultaneous or in-context least-squares fit by solving a matrix equation, partitioning the total data variance between the specified sinusoid frequencies. Such a matrix least-squares solution is natively available in MATLAB as the backslash operator. Craymer explains that the simultaneous or in-context method, as opposed to the independent or out-of-context version (as well as the periodogram version due to Lomb), cannot fit more components (sines and cosines) than there are data samples, and further that:

Lomb's periodogram method, on the other hand, can use an arbitrarily high number, or density, of frequency components, as in a standard periodogram; that is, the frequency domain can be over-sampled by an arbitrary factor. In Fourier analysis, such as the Fourier transform or the discrete Fourier transform, the sinusoids being fitted to the data are all mutually orthogonal, so there is no distinction between the simple out-of-context dot-product-based projection onto basis functions and an in-context simultaneous least-squares fit; that is, no matrix inversion is required to least-squares partition the variance between orthogonal sinusoids of different frequencies. The Fourier method is usually preferred for its efficient fast Fourier transform implementation when complete data records with equally spaced samples are available.
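The contrast between the two approaches can be made concrete in a short sketch (illustrative only, using NumPy's lstsq in place of MATLAB's backslash; function names are this example's own): the first routine treats every frequency independently, the second fits all of them simultaneously.

 import numpy as np

 def out_of_context_power(t, phi, omegas):
     """Each frequency treated independently: project the data onto the
     sine and cosine vectors for that frequency alone (no tau shift)."""
     t = np.asarray(t, dtype=float)
     phi = np.asarray(phi, dtype=float)
     power = np.empty(len(omegas))
     for i, w in enumerate(omegas):
         s, c = np.sin(w * t), np.cos(w * t)
         power[i] = (phi @ s) ** 2 / (s @ s) + (phi @ c) ** 2 / (c @ c)
     return power

 def in_context_power(t, phi, omegas):
     """All frequencies fitted simultaneously (the matrix solution:
     MATLAB's backslash, NumPy's lstsq).  Unlike the independent version,
     this cannot use more columns (2 per frequency) than samples."""
     t = np.asarray(t, dtype=float)
     phi = np.asarray(phi, dtype=float)
     A = np.column_stack([f(w * t) for w in omegas for f in (np.sin, np.cos)])
     x, *_ = np.linalg.lstsq(A, phi, rcond=None)
     return x[0::2] ** 2 + x[1::2] ** 2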


See also

* Non-uniform discrete Fourier transform
* Orthogonal functions
* SigSpec
* Sinusoidal model
* Spectral density
* Spectral density estimation, for competing alternatives


References


External links


* LSSA package freeware download: FORTRAN, Vaníček's least-squares spectral analysis method, from the University of New Brunswick.
* LSWAVE package freeware download: MATLAB, includes Vaníček's least-squares spectral analysis method, from the U.S. National Geodetic Survey.
* LSSA software freeware download (via ftp, ftp://ftp.geod.nrcan.gc.ca/pub/GSD/craymer/software/lssa/): FORTRAN, Vaníček's method, from Natural Resources Canada.