Smoothing spline

Smoothing splines are function estimates, \hat f(x), obtained from a set of noisy observations y_i of the target f(x_i), in order to balance a measure of goodness of fit of \hat f(x_i) to y_i with a derivative-based measure of the smoothness of \hat f(x). They provide a means for smoothing noisy x_i, y_i data. The most familiar example is the cubic smoothing spline, but there are many other possibilities, including splines for the case where x is a vector quantity.


Cubic spline definition

Let \{x_i, Y_i : i = 1, \dots, n\} be a set of observations, modeled by the relation Y_i = f(x_i) + \epsilon_i, where the \epsilon_i are independent, zero-mean random variables (usually assumed to have constant variance). The cubic smoothing spline estimate \hat f of the function f is defined to be the minimizer (over the class of twice differentiable functions) of

: \sum_{i=1}^n \{Y_i - \hat f(x_i)\}^2 + \lambda \int \hat f''(x)^2 \,dx.

Remarks:
* \lambda \ge 0 is a smoothing parameter, controlling the trade-off between fidelity to the data and roughness of the function estimate. It is often estimated by generalized cross-validation, or by restricted marginal likelihood (REML), which exploits the link between spline smoothing and Bayesian estimation (the smoothing penalty can be viewed as being induced by a prior on f).
* The integral is often evaluated over the whole real line, although it is also possible to restrict the range to that of the x_i.
* As \lambda\to 0 (no smoothing), the smoothing spline converges to the interpolating spline.
* As \lambda\to\infty (infinite smoothing), the roughness penalty becomes paramount and the estimate converges to a linear least squares estimate.
* The roughness penalty based on the second derivative is the most common in the modern statistics literature, although the method can easily be adapted to penalties based on other derivatives.
* In early literature, with equally-spaced ordered x_i, second- or third-order differences were used in the penalty, rather than derivatives.
* The penalized sum-of-squares smoothing objective can be replaced by a ''penalized likelihood'' objective in which the sum-of-squares term is replaced by another log-likelihood-based measure of fidelity to the data. The sum-of-squares term corresponds to penalized likelihood with a Gaussian assumption on the \epsilon_i.
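As a concrete illustration of this criterion, the following minimal sketch fits cubic smoothing splines in Python. It assumes SciPy 1.10 or later, whose make_smoothing_spline minimizes this penalized sum of squares (and selects \lambda by generalized cross-validation when it is not supplied); the data here are purely illustrative.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Noisy observations Y_i = f(x_i) + eps_i of a smooth target.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# lam is the smoothing parameter lambda; lam=None would choose it by GCV.
spline = make_smoothing_spline(x, y, lam=1e-2)
y_hat = spline(x)  # fitted values f_hat(x_i)

# lambda -> 0 approaches the interpolating spline; large lambda
# approaches the linear least squares fit.
rough = make_smoothing_spline(x, y, lam=1e-8)
flat = make_smoothing_spline(x, y, lam=1e3)
```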


Derivation of the cubic smoothing spline

It is useful to think of fitting a smoothing spline in two steps:
# First, derive the values \hat f(x_i); i=1,\ldots,n.
# From these values, derive \hat f(x) for all ''x''.

Now, treat the second step first. Given the vector \hat m = (\hat f(x_1),\ldots,\hat f(x_n))^T of fitted values, the sum-of-squares part of the spline criterion is fixed. It remains only to minimize \int \hat f''(x)^2 \, dx, and the minimizer is a natural cubic spline that interpolates the points (x_i,\hat f(x_i)). This interpolating spline is a linear operator, and can be written in the form

: \hat f(x) = \sum_{i=1}^n \hat f(x_i) f_i(x)

where the f_i(x) are a set of spline basis functions. As a result, the roughness penalty has the form

: \int \hat f''(x)^2 \, dx = \hat m^T A \hat m,

where the elements of ''A'' are \int f_i''(x) f_j''(x) \, dx. The basis functions, and hence the matrix ''A'', depend on the configuration of the predictor variables x_i, but not on the responses Y_i or \hat m. ''A'' is an ''n''×''n'' matrix given by A = \Delta^T W^{-1} \Delta.

''Δ'' is an (''n''−2)×''n'' matrix of second differences with elements

: \Delta_{ii} = 1/h_i, \quad \Delta_{i,i+1} = -1/h_i - 1/h_{i+1}, \quad \Delta_{i,i+2} = 1/h_{i+1},

''W'' is an (''n''−2)×(''n''−2) symmetric tridiagonal matrix with elements

: W_{i-1,i} = W_{i,i-1} = h_i/6, \quad W_{ii} = (h_i + h_{i+1})/3,

and h_i = \xi_{i+1} - \xi_i, the distances between successive knots (or x values).

Now back to the first step. The penalized sum-of-squares can be written as

: \{Y - \hat m\}^T \{Y - \hat m\} + \lambda \hat m^T A \hat m,

where Y=(Y_1,\ldots,Y_n)^T. Minimizing over \hat m by differentiating with respect to \hat m gives

: -2 \{Y - \hat m\} + 2 \lambda A \hat m = 0

and hence

: \hat m = (I + \lambda A)^{-1} Y.
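A direct translation of these matrices into code may clarify the computation. The sketch below is a hypothetical helper, assuming the knots \xi_i coincide with the sorted x_i; it builds Δ and W, forms A = Δ^T W^{-1} Δ, and solves (I + λA)\hat m = Y.

```python
import numpy as np

def smoothing_spline_values(x, y, lam):
    """Fitted values m_hat = (I + lam*A)^{-1} y with A = Delta^T W^{-1} Delta.

    Hypothetical helper: x must be sorted and strictly increasing, and the
    knots are taken to be the x values themselves.
    """
    n = len(x)
    h = np.diff(x)                           # h[i] = x[i+1] - x[i]
    Delta = np.zeros((n - 2, n))             # second-difference matrix
    W = np.zeros((n - 2, n - 2))             # symmetric tridiagonal matrix
    for i in range(n - 2):
        Delta[i, i] = 1.0 / h[i]
        Delta[i, i + 1] = -1.0 / h[i] - 1.0 / h[i + 1]
        Delta[i, i + 2] = 1.0 / h[i + 1]
        W[i, i] = (h[i] + h[i + 1]) / 3.0
        if i > 0:
            W[i, i - 1] = W[i - 1, i] = h[i] / 6.0
    A = Delta.T @ np.linalg.solve(W, Delta)  # A = Delta^T W^{-1} Delta
    return np.linalg.solve(np.eye(n) + lam * A, y)
```

The dense linear algebra here is purely for exposition; practical implementations exploit the banded structure of Δ and W to obtain O(n) cost.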


De Boor's approach

De Boor's approach exploits the same idea, of finding a balance between having a smooth curve and being close to the given data:

: p \sum_{i=1}^n \left( \frac{Y_i - \hat f(x_i)}{\delta_i} \right)^2 + (1-p) \int \left( \hat f^{(m)}(x) \right)^2 \, dx

where p is a parameter called the smoothing factor and belongs to the interval [0,1], and the \delta_i; i=1,\dots,n are quantities controlling the extent of smoothing (they represent the weight \delta_i^{-2} of each point Y_i). In practice, since cubic splines are mostly used, m is usually 2. The solution for m=2 was proposed by Reinsch in 1967. For m=2, when p approaches 1, \hat f converges to the "natural" spline interpolant to the given data. As p approaches 0, \hat f converges to a straight line (the smoothest curve). Since finding a suitable value of p is a matter of trial and error, a redundant constant S was introduced for convenience. S is used to numerically determine the value of p so that the function \hat f meets the following condition:

: \sum_{i=1}^n \left( \frac{Y_i - \hat f(x_i)}{\delta_i} \right)^2 \le S

The algorithm described by de Boor starts with p=0 and increases p until the condition is met. If \delta_i is an estimate of the standard deviation of Y_i, the constant S is recommended to be chosen in the interval \left[ n-\sqrt{2n},\, n+\sqrt{2n} \right]. Having S=0 means the solution is the "natural" spline interpolant. Increasing S means we obtain a smoother curve by getting farther from the given data.
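SciPy's UnivariateSpline wraps Dierckx's FITPACK, which implements essentially this scheme: given weights w_i (playing the role of 1/\delta_i), it searches for the smoothest cubic spline whose weighted residual sum of squares does not exceed s. A minimal sketch, with an assumed noise level sigma:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
sigma = 0.2                                   # assumed noise level, delta_i
y = np.exp(-x / 3) * np.sin(x) + rng.normal(scale=sigma, size=x.size)

# With weights w_i = 1/delta_i, FITPACK returns the smoothest cubic spline
# satisfying sum((w_i * (y_i - f(x_i)))**2) <= s, the role played by S above.
n = x.size
S = float(n)  # midpoint of the recommended interval [n - sqrt(2n), n + sqrt(2n)]
spl = UnivariateSpline(x, y, w=np.full(n, 1.0 / sigma), s=S, k=3)

# s=0 reproduces the "natural" interpolant; larger s gives a smoother curve.
interp = UnivariateSpline(x, y, s=0)
```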


Multidimensional splines

There are two main classes of method for generalizing from smoothing with respect to a scalar x to smoothing with respect to a vector x. The first approach simply generalizes the spline smoothing penalty to the multidimensional setting. For example, to estimate f(x,z) we might use the thin plate spline penalty and find the \hat f(x,z) minimizing

: \sum_{i=1}^n \{Y_i - \hat f(x_i,z_i)\}^2 + \lambda \int \left[ \left(\frac{\partial^2 \hat f}{\partial x^2}\right)^2 + 2\left(\frac{\partial^2 \hat f}{\partial x \, \partial z}\right)^2 + \left(\frac{\partial^2 \hat f}{\partial z^2}\right)^2 \right] \mathrm{d}x \, \mathrm{d}z.

The thin plate spline approach can be generalized to smoothing with respect to more than two dimensions and to other orders of differentiation in the penalty. As the dimension increases there are some restrictions on the smallest order of differential that can be used, but Duchon's original paper gives slightly more complicated penalties that can avoid this restriction.

The thin plate splines are isotropic, meaning that if we rotate the x,z co-ordinate system the estimate will not change, but also that we are assuming that the same level of smoothing is appropriate in all directions. This is often considered reasonable when smoothing with respect to spatial location, but in many other cases isotropy is not an appropriate assumption and can lead to sensitivity to apparently arbitrary choices of measurement units. For example, an isotropic smoother applied to distance and time will give different results if distance is measured in metres and time in seconds than if the units are changed to centimetres and hours.

The second class of generalizations to multidimensional smoothing deals directly with this scale-sensitivity issue using tensor product spline constructions. Such splines have smoothing penalties with multiple smoothing parameters, which is the price that must be paid for not assuming that the same degree of smoothness is appropriate in all directions.
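In practice, thin plate spline smoothing is usually computed through a radial basis function representation rather than by direct minimization of the integral above. A minimal sketch using SciPy's RBFInterpolator with its thin-plate-spline kernel (the smoothing value and data here are arbitrary illustrative choices):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(2)
pts = rng.uniform(0, 1, size=(200, 2))            # (x_i, z_i) locations
vals = np.sin(2 * np.pi * pts[:, 0]) * pts[:, 1]  # smooth target f(x, z)
vals += rng.normal(scale=0.1, size=200)           # noisy observations Y_i

# smoothing > 0 penalizes bending energy; smoothing=0 interpolates exactly.
f_hat = RBFInterpolator(pts, vals, kernel='thin_plate_spline', smoothing=1.0)

# Evaluate the smoothed surface on a grid.
gx, gz = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid = np.column_stack([gx.ravel(), gz.ravel()])
surface = f_hat(grid).reshape(gx.shape)
```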


Related methods

Smoothing splines are related to, but distinct from:
* ''Regression splines''. In this method, the data is fitted to a set of spline basis functions with a reduced set of knots, typically by least squares. No roughness penalty is used. (See also multivariate adaptive regression splines.)
* ''Penalized splines''. This combines the reduced knots of regression splines with the roughness penalty of smoothing splines.
* ''Thin plate splines'' and the elastic maps method for manifold learning. This method combines the least squares penalty for approximation error with the bending and stretching penalty of the approximating manifold, and uses a coarse discretization of the optimization problem. (A regression spline sketch is given below for contrast.)
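To make the contrast with smoothing splines concrete, a regression spline can be sketched with SciPy's LSQUnivariateSpline, which performs a least-squares fit over a reduced, fixed set of interior knots and applies no roughness penalty (the knot vector below is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(3)
x = np.linspace(0, 4, 200)
y = np.cos(2 * x) + rng.normal(scale=0.2, size=x.size)

# A reduced set of interior knots; smoothness is controlled entirely by
# how few knots we allow, not by a penalty term.
t = np.linspace(0.5, 3.5, 7)
reg_spline = LSQUnivariateSpline(x, y, t, k=3)
y_hat = reg_spline(x)
```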


Source code

Source code for spline smoothing can be found in the examples from Carl de Boor's book ''A Practical Guide to Splines''. The examples are in the Fortran programming language. The updated sources are also available on Carl de Boor's official site.




Further reading

* Wahba, G. (1990). ''Spline Models for Observational Data''. SIAM, Philadelphia.
* Green, P. J. and Silverman, B. W. (1994). ''Nonparametric Regression and Generalized Linear Models''. CRC Press.
* De Boor, C. (2001). ''A Practical Guide to Splines (Revised Edition)''. Springer.