Least-squares spectral analysis
   HOME

TheInfoList



OR:

Least-squares spectral analysis (LSSA) is a method of estimating a
frequency spectrum The power spectrum S_(f) of a time series x(t) describes the distribution of power into frequency components composing that signal. According to Fourier analysis, any physical signal can be decomposed into a number of discrete frequencies, ...
, based on a
least squares The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the res ...
fit of
sinusoid A sine wave, sinusoidal wave, or just sinusoid is a mathematical curve defined in terms of the ''sine'' trigonometric function, of which it is the graph. It is a type of continuous wave and also a smooth periodic function. It occurs often in ma ...
s to data samples, similar to
Fourier analysis In mathematics, Fourier analysis () is the study of the way general functions may be represented or approximated by sums of simpler trigonometric functions. Fourier analysis grew from the study of Fourier series, and is named after Josep ...
. Fourier analysis, the most used spectral method in science, generally boosts long-periodic noise in long gapped records; LSSA mitigates such problems. Unlike with Fourier analysis, data need not be equally spaced to use LSSA. LSSA is also known as the Vaníček method or the Gauss-Vaniček method after
Petr Vaníček Petr Vaníček (born 18 July 1935) is a Czech Canadian geodesist and theoretical List of geophysicists, geophysicist who has made important breakthroughs in theory of frequency spectrum#Spectrum analysis, spectral analysis and geoid computation. ...
, and as the Lomb method or the Lomb–Scargle periodogram, based on the contributions of Nicholas R. Lomb and, independently, Jeffrey D. Scargle.


Historical background

The close connections between
Fourier analysis In mathematics, Fourier analysis () is the study of the way general functions may be represented or approximated by sums of simpler trigonometric functions. Fourier analysis grew from the study of Fourier series, and is named after Josep ...
, the
periodogram In signal processing, a periodogram is an estimate of the spectral density of a signal. The term was coined by Arthur Schuster in 1898. Today, the periodogram is a component of more sophisticated methods (see spectral estimation). It is the most ...
, and
least-squares The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the res ...
fitting of sinusoids have long been known. Most developments, however, are restricted to complete data sets of equally spaced samples. In 1963, Freek J. M. Barning of
Mathematisch Centrum The (abbr. CWI; English: "National Research Institute for Mathematics and Computer Science") is a research centre in the field of mathematics and theoretical computer science. It is part of the institutes organization of the Dutch Research Cou ...
, Amsterdam, handled unequally spaced data by similar techniques, including both a periodogram analysis equivalent to what is now referred to the Lomb method, and least-squares fitting of selected frequencies of sinusoids determined from such periodograms, connected by a procedure that is now known as
matching pursuit Matching pursuit (MP) is a sparse approximation algorithm which finds the "best matching" projections of multidimensional data onto the span of an over-complete (i.e., redundant) dictionary D. The basic idea is to approximately represent a signal ...
with post-backfitting or orthogonal matching pursuit.
Petr Vaníček Petr Vaníček (born 18 July 1935) is a Czech Canadian geodesist and theoretical List of geophysicists, geophysicist who has made important breakthroughs in theory of frequency spectrum#Spectrum analysis, spectral analysis and geoid computation. ...
, a Canadian
geodesist Geodesy ( ) is the Earth science of accurately measuring and understanding Earth's figure (geometric shape and size), orientation in space, and gravity. The field also incorporates studies of how these properties change over time and equival ...
of the
University of New Brunswick The University of New Brunswick (UNB) is a public university with two primary campuses in Fredericton and Saint John, New Brunswick. It is the oldest English-language university in Canada, and among the oldest public universities in North Ameri ...
, also proposed the matching-pursuit approach, which he called "successive spectral analysis" and the result a "least-squares periodogram", with equally and unequally spaced data, in 1969. He generalized this method to account for systematic components beyond a simple mean, such as a "predicted linear (quadratic, exponential, ...) secular trend of unknown magnitude", and applied it to a variety of samples, in 1971. The Vaníček method was then simplified in 1976 by Nicholas R. Lomb of the
University of Sydney The University of Sydney (USYD), also known as Sydney University, or informally Sydney Uni, is a public research university located in Sydney, Australia. Founded in 1850, it is the oldest university in Australia and is one of the country's si ...
, who pointed out its close connection to
periodogram In signal processing, a periodogram is an estimate of the spectral density of a signal. The term was coined by Arthur Schuster in 1898. Today, the periodogram is a component of more sophisticated methods (see spectral estimation). It is the most ...
analysis. The definition of a periodogram of unequally spaced data was subsequently further modified and analyzed by Jeffrey D. Scargle of
NASA Ames Research Center The Ames Research Center (ARC), also known as NASA Ames, is a major NASA research center at Moffett Federal Airfield in California's Silicon Valley. It was founded in 1939 as the second National Advisory Committee for Aeronautics (NACA) laborat ...
, who showed that with minor changes it could be made identical to Lomb's least-squares formula for fitting individual sinusoid frequencies. Scargle states that his paper "does not introduce a new detection technique, but instead studies the reliability and efficiency of detection with the most commonly used technique, the periodogram, in the case where the observation times are unevenly spaced," and further points out in reference to least-squares fitting of sinusoids compared to periodogram analysis, that his paper "establishes, apparently for the first time, that (with the proposed modifications) these two methods are exactly equivalent." Press summarizes the development this way: In 1989, Michael J. Korenberg of Queen's University in Kingston, Ontario, developed the "fast orthogonal search" method of more quickly finding a near-optimal decomposition of spectra or other problems, similar to the technique that later became known as orthogonal matching pursuit.


Method variants


The Vaníček method

In the Vaníček method, a discrete data set is approximated by a weighted sum of sinusoids of progressively determined frequencies, using a standard
linear regression In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is call ...
, or
least-squares The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the res ...
fit.Wells, D.E., P. Vaníček, S. Pagiatakis, 1985. Least-squares spectral analysis revisited. Department of Surveying Engineering Technical Report 84, University of New Brunswick, Fredericton, 68 pages, Available a

The frequencies are chosen using a method similar to Barning's, but going further in optimizing the choice of each successive new frequency by picking the frequency that minimizes the residual after least-squares fitting (equivalent to the fitting technique now known as
matching pursuit Matching pursuit (MP) is a sparse approximation algorithm which finds the "best matching" projections of multidimensional data onto the span of an over-complete (i.e., redundant) dictionary D. The basic idea is to approximately represent a signal ...
with pre-backfitting). The number of sinusoids must be less than or equal to the number of data samples (counting sines and cosines of the same frequency as separate sinusoids). A data vector ''Φ'' is represented as a weighted sum of sinusoidal basis functions, tabulated in a matrix A by evaluating each function at the sample times, with weight vector ''x'': :\phi \approx \textbfx where the weight vector ''x'' is chosen to minimize the sum of squared errors in approximating ''Φ''. The solution for ''x'' is closed-form, using standard
linear regression In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is call ...
: :x = (\textbf^\textbf)^\textbf^\phi. Here the matrix A can be based on any set of functions that are mutually independent (not necessarily orthogonal) when evaluated at the sample times; for spectral analysis, the functions used are typically sines and cosines evenly distributed over the frequency range of interest. If too many frequencies are chosen in a too-narrow frequency range, the functions will not be sufficiently independent, the matrix will be badly conditioned, and the resulting spectrum will not be meaningful. When the basis functions in A are orthogonal (that is, not correlated, meaning the columns have zero pair-wise
dot product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an algebra ...
s), the matrix ATA is a diagonal matrix; when the columns all have the same power (sum of squares of elements), then that matrix is an
identity matrix In linear algebra, the identity matrix of size n is the n\times n square matrix with ones on the main diagonal and zeros elsewhere. Terminology and notation The identity matrix is often denoted by I_n, or simply by I if the size is immaterial o ...
times a constant, so the inversion is trivial. The latter is the case when the sample times are equally spaced and the sinusoids are chosen to be sines and cosines equally spaced in pairs on the frequency interval 0 to a half cycle per sample (spaced by 1/N cycle per sample, omitting the sine phases at 0 and maximum frequency where they are identically zero). This particular case is known as the
discrete Fourier transform In mathematics, the discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into a same-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a complex- ...
, slightly rewritten in terms of real data and coefficients. :x = \textbf^\phi     (DFT case for ''N'' equally spaced samples and frequencies, within a scalar factor)


The Lomb method

Trying to lower the computational burden of the Vaníček method in 1976 (no longer an issue), Lomb proposed using the above simplification in general, except for pair-wise correlations between sine and cosine bases of the same frequency, since the correlations between pairs of sinusoids are often small, at least when they are not too closely spaced. This is essentially the traditional
periodogram In signal processing, a periodogram is an estimate of the spectral density of a signal. The term was coined by Arthur Schuster in 1898. Today, the periodogram is a component of more sophisticated methods (see spectral estimation). It is the most ...
formulation, but now adopted for use with unevenly spaced samples. The vector ''x'' is a good estimate of an underlying spectrum, but since correlations are ignored, A''x'' is no longer a good approximation to the signal, and the method is no longer a least-squares method – yet it has continued to be referred to as such. Rather than just taking dot products of the data with sine and cosine waveforms directly, Scargle modified the standard periodogram formula to first find a time delay \tau such that this pair of sinusoids would be mutually orthogonal at sample times t_j, and also adjusted for the potentially unequal powers of these two basis functions, to obtain a better estimate of the power at a frequency, which made his modified periodogram method exactly equivalent to Lomb's method. The time delay \tau is defined by the formula :\tan = \frac. The periodogram at frequency \omega is then estimated as: :P_x(\omega) = \frac \left( \frac + \frac \right) which Scargle reports then has the same statistical distribution as the periodogram in the evenly sampled case. At any individual frequency \omega, this method gives the same power as does a least-squares fit to sinusoids of that frequency, of the form :\phi(t) = A \sin \omega t + B \cos \omega t.


The generalized Lomb–Scargle periodogram

The standard Lomb–Scargle periodogram is valid for a model with zero mean. Commonly, this is approximated by subtracting the mean of the data before calculating the periodogram. However, this is an inaccurate assumption when the mean of the model (the fitted sinusoids) is non-zero. The ''generalized'' Lomb–Scargle periodogram removes this assumption, and explicitly solves for the mean. In this case, the function fitted is :\phi(t) = A \sin \omega t + B \cos \omega t + C. The generalized Lomb–Scargle periodogram has also been referred to as a ''floating mean periodogram''.


Korenberg's "fast orthogonal search" method

Michael Korenberg of Queen's University in
Kingston, Ontario Kingston is a city in Ontario, Canada. It is located on the north-eastern end of Lake Ontario, at the beginning of the St. Lawrence River and at the mouth of the Cataraqui River (south end of the Rideau Canal). The city is midway between Toro ...
, developed a method for choosing a sparse set of components from an over-complete set, such as sinusoidal components for spectral analysis, called fast orthogonal search (FOS). Mathematically, FOS uses a slightly modified
Cholesky decomposition In linear algebra, the Cholesky decomposition or Cholesky factorization (pronounced ) is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, which is useful for effici ...
in a mean-square error reduction (MSER) process, implemented as a
sparse matrix In numerical analysis and scientific computing, a sparse matrix or sparse array is a matrix in which most of the elements are zero. There is no strict definition regarding the proportion of zero-value elements for a matrix to qualify as sparse b ...
inversion. As with the other LSSA methods, FOS avoids the major shortcoming of discrete Fourier analysis, and can achieve highly accurate identifications of embedded periodicities and excels with unequally spaced data; the fast orthogonal search method has also been applied to other problems such as nonlinear system identification.


Palmer's Chi-squared method

Palmer has developed a method for finding the best-fit function to any chosen number of harmonics, allowing more freedom to find non-sinusoidal harmonic functions. This method is a fast technique (
FFT A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in the ...
-based) for doing weighted least-squares analysis on arbitrarily spaced data with non-uniform standard errors. Source code that implements this technique is available. Because data are often not sampled at uniformly spaced discrete times, this method "grids" the data by sparsely filling a time series array at the sample times. All intervening grid points receive zero statistical weight, equivalent to having infinite error bars at times between samples.


Applications

The most useful feature of the LSSA method is enabling incomplete records to be spectrally analyzed, without the need to
manipulate Manipulation may refer to: * Manipulation (psychology) - the action of manipulating someone in a clever or unscrupulous way *Crowd manipulation - use of crowd psychology to direct the behavior of a crowd toward a specific action ::* Internet mani ...
the record or to invent otherwise non-existent data.
Magnitude Magnitude may refer to: Mathematics *Euclidean vector, a quantity defined by both its magnitude and its direction *Magnitude (mathematics), the relative size of an object *Norm (mathematics), a term for the size or length of a vector *Order of ...
s in the LSSA
spectrum A spectrum (plural ''spectra'' or ''spectrums'') is a condition that is not limited to a specific set of values but can vary, without gaps, across a continuum. The word was first used scientifically in optics to describe the rainbow of colors i ...
depict the contribution of a frequency or period to the
variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers ...
of the
time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Exa ...
. Generally, spectral magnitudes defined in the above manner enable the output's straightforward
significance level In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred given the null hypothesis (simply by chance alone). More precisely, a study's defined significance level, denoted by \alpha, is the ...
regime.Beard, A.G., Williams, P.J.S., Mitchell, N.J. & Muller, H.G. A special climatology of planetary waves and tidal variability, J Atm. Solar-Ter. Phys. 63 (09), p.801–811 (2001). Alternatively, magnitudes in the Vaníček spectrum can also be expressed in dB.Pagiatakis, S. Stochastic significance of peaks in the least-squares spectrum, J of Geodesy 73, p.67-78 (1999). Note that magnitudes in the Vaníček spectrum follow β-distribution. Inverse transformation of Vaníček's LSSA is possible, as is most easily seen by writing the forward transform as a matrix; the matrix inverse (when the matrix is not singular) or pseudo-inverse will then be an inverse transformation; the inverse will exactly match the original data if the chosen sinusoids are mutually independent at the sample points and their number is equal to the number of data points.Craymer, M.R.
''The Least Squares Spectrum, Its Inverse Transform and Autocorrelation Function: Theory and Some Applications in Geodesy''
Ph.D. Dissertation, University of Toronto, Canada (1998).
No such inverse procedure is known for the periodogram method.


Implementation

The LSSA can be implemented in less than a page of
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation ...
code. In essence:
"to compute the least-squares spectrum we must compute ''m'' spectral values ... which involves performing the least-squares approximation ''m'' times, each time to get he spectral powerfor a different frequency"
I.e., for each frequency in a desired set of frequencies,
sine In mathematics, sine and cosine are trigonometric functions of an angle. The sine and cosine of an acute angle are defined in the context of a right triangle: for the specified angle, its sine is the ratio of the length of the side that is oppo ...
and cosine functions are evaluated at the times corresponding to the data samples, and
dot product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an algebra ...
s of the data
vector Vector most often refers to: *Euclidean vector, a quantity with a magnitude and a direction *Vector (epidemiology), an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematic ...
with the sinusoid vectors are taken and appropriately normalized; following the method known as Lomb/Scargle periodogram, a time shift is calculated for each frequency to orthogonalize the sine and cosine components before the dot product, as described by Craymer; finally, a power is computed from those two
amplitude The amplitude of a periodic variable is a measure of its change in a single period (such as time or spatial period). The amplitude of a non-periodic signal is its magnitude compared with a reference value. There are various definitions of amplit ...
components. This same process implements a
discrete Fourier transform In mathematics, the discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into a same-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a complex- ...
when the data are uniformly spaced in time and the frequencies chosen correspond to integer numbers of cycles over the finite data record. This method treats each sinusoidal component independently, or out of context, even though they may not be orthogonal on the data points; it is Vaníček's original method. In contrast, as Craymer explains, it is also possible to perform a full simultaneous or in-context least-squares fit by solving a matrix equation, partitioning the total data variance between the specified sinusoid frequencies. Such a matrix least-squares solution is natively available in MATLAB as the
backslash The backslash is a typographical mark used mainly in computing and mathematics. It is the mirror image of the common slash . It is a relatively recent mark, first documented in the 1930s. History , efforts to identify either the origin o ...
operator. Craymer explains that the simultaneous or in-context method, as opposed to the independent or out-of-context version (as well as the periodogram version due to Lomb), cannot fit more components (sines and cosines) than there are data samples, and further that: Lomb's periodogram method, on the other hand, can use an arbitrarily high number of, or density of, frequency components, as in a standard
periodogram In signal processing, a periodogram is an estimate of the spectral density of a signal. The term was coined by Arthur Schuster in 1898. Today, the periodogram is a component of more sophisticated methods (see spectral estimation). It is the most ...
; that is, the frequency domain can be over-sampled by an arbitrary factor. In Fourier analysis, such as the
Fourier transform A Fourier transform (FT) is a mathematical transform that decomposes functions into frequency components, which are represented by the output of the transform as a function of frequency. Most commonly functions of time or space are transformed, ...
or the
discrete Fourier transform In mathematics, the discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into a same-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a complex- ...
, the sinusoids being fitted to the data are all mutually orthogonal, so there is no distinction between the simple out-of-context dot-product-based projection onto basis functions versus an in-context simultaneous least-squares fit; that is, no matrix inversion is required to least-squares partition the variance between orthogonal sinusoids of different frequencies. This method is usually preferred for its efficient fast Fourier transform implementation, when complete data records with equally spaced samples are available.


See also

*
Non-uniform discrete Fourier transform In applied mathematics, the nonuniform discrete Fourier transform (NUDFT or NDFT) of a signal is a type of Fourier transform, related to a discrete Fourier transform or discrete-time Fourier transform, but in which the input signal is not sampled ...
*
Orthogonal functions In mathematics, orthogonal functions belong to a function space that is a vector space equipped with a bilinear form. When the function space has an interval (mathematics), interval as the domain of a function, domain, the bilinear form may be the i ...
* SigSpec *
Sinusoidal model In statistics, signal processing, and time series analysis, a sinusoidal model is used to approximate a sequence ''Yi'' to a sine function: :Y_i = C + \alpha\sin(\omega T_i + \phi) + E_i where ''C'' is constant defining a mean level, α is an ...
*
Spectral density The power spectrum S_(f) of a time series x(t) describes the distribution of power into frequency components composing that signal. According to Fourier analysis, any physical signal can be decomposed into a number of discrete frequencies, o ...
* Spectral density estimation, for competing alternatives


References


External links


LSSA package freeware download
FORTRAN, Vaníček's least-squares spectral analysis method, from the
University of New Brunswick The University of New Brunswick (UNB) is a public university with two primary campuses in Fredericton and Saint John, New Brunswick. It is the oldest English-language university in Canada, and among the oldest public universities in North Ameri ...
.
LSWAVE package freeware download
MATLAB, includes the Vaníček's least-squares spectral analysis method, from the
U.S. National Geodetic Survey The National Geodetic Survey (NGS) is a United States federal agency that defines and manages a national coordinate system, providing the foundation for transportation and communication; mapping and charting; and a large number of applications ...
. * tp://ftp.geod.nrcan.gc.ca/pub/GSD/craymer/software/lssa/ LSSA software freeware download(via ftp), FORTRAN, Vaníček's method, from the
Natural Resources Canada Natural Resources Canada (NRCan; french: Ressources naturelles Canada; french: RNCan, label=none)Natural Resources Canada is the applied title under the Federal Identity Program; the legal title is Department of Natural Resources (). is the depa ...
. {{Authority control Algorithms Analysis of variance Applied mathematics Applied statistics Carl Friedrich Gauss Computational mathematics Computational science Data processing Digital signal processing Engineering statistics Frequency Frequency-domain analysis Harmonic analysis Iterative methods Least squares Linear algebra Mathematical analysis Mathematical optimization Mathematical physics Mathematics of computing Multivariate statistics Numerical analysis Numerical linear algebra Optimization algorithms and methods Sequences in time Signal processing Statistical forecasting Statistical methods Statistical signal processing Stochastic processes Theoretical computer science Time series