In statistics and image processing, to smooth a data set is to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures and rapid phenomena. In smoothing, the data points of a signal are modified so that individual points higher than the adjacent points (presumably because of noise) are reduced, and points that are lower than the adjacent points are increased, leading to a smoother signal. Smoothing may aid data analysis in two important ways: (1) by extracting more information from the data, as long as the assumption of smoothing is reasonable, and (2) by providing analyses that are both flexible and robust. Many different algorithms are used in smoothing.
Compared to curve fitting
Smoothing may be distinguished from the related and partially overlapping concept of curve fitting in the following ways:
* curve fitting often involves the use of an explicit function form for the result, whereas the immediate results from smoothing are the "smoothed" values, with no later use made of a functional form if there is one;
* the aim of smoothing is to give a general idea of relatively slow changes of value, with little attention paid to the close matching of data values, while curve fitting concentrates on achieving as close a match as possible;
* smoothing methods often have an associated tuning parameter that controls the extent of smoothing, whereas curve fitting adjusts any number of parameters of the function to obtain the "best" fit.
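The contrast in the list above can be illustrated with a minimal sketch. The function names `fit_line` and `moving_average`, the straight-line model, and the sample data are all assumptions chosen for illustration; they are not taken from the article. The fit returns the parameters of an explicit function, while the smoother returns only smoothed values and is controlled by a single width parameter.

```python
def fit_line(x, y):
    """Curve fitting: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def moving_average(y, width):
    """Smoothing: return smoothed values only; `width` is the tuning parameter.

    Windows are shortened at the ends of the series (an illustrative choice).
    """
    half = width // 2
    out = []
    for i in range(len(y)):
        lo = max(0, i - half)
        hi = min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


# Hypothetical sample data.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.5, 4.5, 7.5, 8.5]

slope, intercept = fit_line(x, y)   # parameters of an explicit function
light = moving_average(y, 3)        # mild smoothing
heavy = moving_average(y, 5)        # stronger smoothing, same method
```

Changing `width` changes how much detail survives, whereas the fitted line is fully described by its two parameters.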
Linear smoothers
In the case that the smoothed values can be written as a linear transformation of the observed values, the smoothing operation is known as a linear smoother; the matrix representing the transformation is known as a smoother matrix or hat matrix.
The operation of applying such a matrix transformation is called convolution when the same weights are used at every position. Thus the matrix is also called a convolution matrix or a convolution kernel. In the case of a simple series of data points (rather than a multi-dimensional image), the convolution kernel is a one-dimensional vector.
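As a rough sketch of the relationship described above, the following builds a smoother matrix from a one-dimensional kernel and checks that multiplying by it gives the same result as sliding the kernel along the series. The helper names, the three-point kernel, and the boundary handling (truncating and renormalising the weights at the ends) are assumptions made for this example. The kernel is assumed symmetric, so the flip that a strict convolution applies makes no difference.

```python
def smoother_matrix(n, kernel):
    """Build an n x n smoother matrix from an odd-length 1-D kernel.

    Rows near the ends are truncated and renormalised so that every row
    sums to 1 (one of several possible boundary conventions).
    """
    half = len(kernel) // 2
    rows = []
    for i in range(n):
        row = [0.0] * n
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                row[j] = w
        total = sum(row)
        rows.append([w / total for w in row])
    return rows


def apply_matrix(matrix, y):
    """Smoothed values as a linear transformation of the observed values."""
    return [sum(w * v for w, v in zip(row, y)) for row in matrix]


def sliding_kernel(y, kernel):
    """Slide the kernel along y, using the same boundary convention."""
    half = len(kernel) // 2
    out = []
    for i in range(len(y)):
        num = 0.0
        den = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(y):
                num += w * y[j]
                den += w
        out.append(num / den)
    return out


# Hypothetical data with one point well above its neighbours.
y = [1.0, 2.0, 9.0, 3.0, 4.0, 5.0]
kernel = [1 / 3, 1 / 3, 1 / 3]

S = smoother_matrix(len(y), kernel)
smoothed = apply_matrix(S, y)
same = sliding_kernel(y, kernel)
```

Away from the ends, every row of `S` holds the same weights shifted by one position, which is what makes the matrix product equivalent to sliding a single kernel.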
Algorithms
One of the most common algorithms is the "moving average", often used to try to capture important trends in repeated statistical surveys. In image processing and computer vision, smoothing ideas are used in scale space representations. The simplest smoothing algorithm is the "rectangular" or "unweighted sliding-average smooth". This method replaces each point in the signal with the average of "m" adjacent points, where "m" is a positive integer called the "smooth width". Usually "m" is an odd number. The ''triangular smooth'' is like the ''rectangular smooth'' except that it implements a weighted smoothing function.
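The rectangular and triangular smooths described above might be sketched as follows. The function names, the choice to leave the first and last m // 2 points unchanged, and the specific triangular weights (rising linearly to the centre, such as 1, 2, 1 for m = 3) are assumptions for illustration; other edge treatments and weightings are possible.

```python
def _check_width(m):
    if m < 1 or m % 2 == 0:
        raise ValueError("smooth width m must be a positive odd integer")


def rectangular_smooth(y, m):
    """Replace each interior point with the unweighted mean of m adjacent points.

    The first and last m // 2 points are returned unchanged (a simplification).
    """
    _check_width(m)
    half = m // 2
    out = list(y)
    for i in range(half, len(y) - half):
        window = y[i - half : i + half + 1]
        out[i] = sum(window) / m
    return out


def triangular_smooth(y, m):
    """Like the rectangular smooth, but with weights peaking at the centre point."""
    _check_width(m)
    half = m // 2
    weights = [half + 1 - abs(k) for k in range(-half, half + 1)]
    total = sum(weights)
    out = list(y)
    for i in range(half, len(y) - half):
        window = y[i - half : i + half + 1]
        out[i] = sum(w * v for w, v in zip(weights, window)) / total
    return out


# Hypothetical signal: a single spike on a flat baseline.
signal = [0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0]
rect = rectangular_smooth(signal, 3)
tri = triangular_smooth(signal, 3)
```

With this input the rectangular smooth spreads the spike evenly over three points, while the triangular smooth keeps more of the weight on the central point.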
Some specific smoothing and filter types, with their respective uses, pros and cons are:
See also
* Convolution
* Curve fitting
* Discretization
* Edge preserving smoothing
* Filtering (signal processing)
* Graph cuts in computer vision
* Interpolation
* Numerical smoothing and differentiation
* Scale space
* Scatterplot smoothing
* Smoothing spline
* Smoothness
* Statistical signal processing
* Subdivision surface, used in computer graphics
* Window function
Further reading
* Hastie, T.J. and Tibshirani, R.J. (1990), ''Generalized Additive Models'', New York: Chapman and Hall.