Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data is required, or smoothing, in which a "smooth" function is constructed that approximately fits the data. A related topic is regression analysis, which focuses more on questions of statistical inference such as how much uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves can be used as an aid for data visualization, to infer values of a function where no data are available, and to summarize the relationships among two or more variables.
Extrapolation refers to the use of a fitted curve beyond the range of the observed data, and is subject to a degree of uncertainty since it may reflect the method used to construct the curve as much as it reflects the observed data.
For linear-algebraic analysis of data, "fitting" usually means trying to find the curve that minimizes the vertical (''y''-axis) displacement of a point from the curve (e.g., ordinary least squares). However, for graphical and image applications, geometric fitting seeks to provide the best visual fit, which usually means trying to minimize the orthogonal distance to the curve (e.g., total least squares), or to otherwise include both axes of displacement of a point from the curve. Geometric fits are not popular because they usually require non-linear and/or iterative calculations, although they have the advantage of a more aesthetic and geometrically accurate result.
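The difference between the two criteria can be illustrated for the simplest case, a straight line. The following is a minimal sketch, not a reference implementation: the function names and the synthetic data are assumptions made for illustration, and the orthogonal fit uses the singular value decomposition of the centered data, which is one standard way to obtain a total least squares line.

```python
import numpy as np

def fit_line_ols(x, y):
    """Slope and intercept minimizing the sum of squared vertical residuals."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

def fit_line_tls(x, y):
    """Slope and intercept minimizing the sum of squared orthogonal distances.

    The line passes through the centroid, and its direction is the leading
    right-singular vector of the centered data. Assumes the best line is
    not vertical.
    """
    pts = np.column_stack([x, y])
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    dx, dy = vt[0]
    slope = dy / dx
    intercept = centroid[1] - slope * centroid[0]
    return slope, intercept

def vertical_sse(slope, intercept, x, y):
    """Sum of squared vertical (y-axis) displacements from the line."""
    return float(np.sum((y - slope * x - intercept) ** 2))

def orthogonal_sse(slope, intercept, x, y):
    """Sum of squared perpendicular distances from the line."""
    return vertical_sse(slope, intercept, x, y) / (1.0 + slope ** 2)

# Hypothetical data with noise in both coordinates.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
x = t + rng.normal(scale=0.5, size=t.size)
y = 2.0 * t + 1.0 + rng.normal(scale=0.5, size=t.size)

ols = fit_line_ols(x, y)
tls = fit_line_tls(x, y)
print("OLS line:", ols, "TLS line:", tls)
```

Each line is optimal only under its own criterion: the ordinary least squares line has the smaller vertical error, while the total least squares line has the smaller orthogonal error.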
Algebraic fitting of functions to data points
Most commonly, one fits a function of the form ''y'' = ''f''(''x'').
Fitting lines and polynomial functions to data points
The first degree polynomial equation
:y = ax + b
is a line with slope ''a''. A line will connect any two points, so a first degree polynomial equation is an exact fit through any two points with distinct ''x'' coordinates.
If the order of the equation is increased to a second degree polynomial, the following results:
:y = ax^2 + bx + c
This will exactly fit a simple curve to three points.
If the order of the equation is increased to a third degree polynomial, the following is obtained:
:y = ax^3 + bx^2 + cx + d
This will exactly fit four points.
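The exact fits described above amount to solving a square linear system for the polynomial coefficients. The sketch below assumes NumPy; the function name `exact_polynomial` and the sample points are illustrative, not taken from any particular source.

```python
import numpy as np

def exact_polynomial(xs, ys):
    """Coefficients (highest power first) of the unique polynomial of degree
    len(xs) - 1 through the given points. The x values must be distinct."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    vandermonde = np.vander(xs, len(xs))  # columns: x^n, ..., x, 1
    return np.linalg.solve(vandermonde, ys)

# Two points determine a line, three a parabola, four a cubic.
line = exact_polynomial([0, 2], [1, 5])
parabola = exact_polynomial([0, 1, 2], [1, 0, 3])
cubic = exact_polynomial([-1, 0, 1, 2], [2, 1, 0, 5])

print("line coefficients (a, b):", line)
print("parabola evaluated at its data points:", np.polyval(parabola, [0, 1, 2]))
print("cubic evaluated at its data points:", np.polyval(cubic, [-1, 0, 1, 2]))
```

Solving the Vandermonde system directly is adequate for a handful of points; for many points it becomes ill-conditioned, and other formulations are usually preferred.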
A more general statement would be to say it will exactly fit four constraints. Each constraint can be a point, angle, or curvature (which is the reciprocal of the radius of an osculating circle). Angle and curvature constraints are most often added to the ends of a curve, and in such cases are called end conditions. Identical end conditions are frequently used to ensure a smooth transition between polynomial curves contained within a single spline. Higher-order constraints, such as "the change in the rate of curvature", could also be added. This, for example, would be useful in highway cloverleaf design to understand the rate of change of the forces applied to a car (see jerk), as it follows the cloverleaf, and to set reasonable speed limits, accordingly.
The first degree polynomial equation could also be an exact fit for a single point and an angle while the third degree polynomial equation could also be an exact fit for two points, an angle constraint, and a curvature constraint. Many other combinations of constraints are possible for these and for higher order polynomial equations.
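As an illustration of mixed constraints, the cubic case can be written as a 4×4 linear system. The sketch below makes several assumptions: the angle and curvature are both imposed at the first point, the angle is measured from the ''x''-axis, and the curvature is converted to a second derivative using κ = y″ / (1 + y′²)^(3/2). The function name and the numerical values are hypothetical.

```python
import numpy as np

def cubic_from_constraints(p0, p1, angle, curvature):
    """Coefficients (a, b, c, d) of y = a x^3 + b x^2 + c x + d that passes
    through p0 and p1 and has the given tangent angle (radians) and signed
    curvature at p0."""
    x0, y0 = p0
    x1, y1 = p1
    slope = np.tan(angle)
    # Curvature of a graph: kappa = y'' / (1 + y'^2)^(3/2), solved for y''.
    second_derivative = curvature * (1.0 + slope ** 2) ** 1.5
    system = np.array([
        [x0 ** 3, x0 ** 2, x0, 1.0],       # y(x0)   = y0
        [x1 ** 3, x1 ** 2, x1, 1.0],       # y(x1)   = y1
        [3 * x0 ** 2, 2 * x0, 1.0, 0.0],   # y'(x0)  = tan(angle)
        [6 * x0, 2.0, 0.0, 0.0],           # y''(x0) = second_derivative
    ])
    rhs = np.array([y0, y1, slope, second_derivative])
    return np.linalg.solve(system, rhs)

# Start at the origin heading at 45 degrees with zero curvature, end at (2, 1).
coeffs = cubic_from_constraints((0.0, 0.0), (2.0, 1.0), np.pi / 4, 0.0)
print("cubic coefficients (a, b, c, d):", coeffs)
```

Because the slope at the point is fixed by the angle constraint, the curvature constraint becomes linear in the coefficients; if the slope were unknown, the curvature condition would make the system non-linear.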
If there are more than ''n'' + 1 constraints (''n'' being the degree of the polynomial), the polynomial curve can still be run through those constraints. An exact fit to all constraints is not certain (but might happen, for example, in the case of a first degree polynomial exactly fitting three collinear points). In general, however, some method is then needed to evaluate each approximation. The least squares method is one way to compare the deviations.
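A short sketch of the overdetermined case follows, assuming NumPy's `numpy.linalg.lstsq`. The data values are made up for illustration: one set is exactly collinear, the other is not.

```python
import numpy as np

def fit_line_least_squares(x, y):
    """Least-squares line through more than two points.

    Returns the coefficients (slope, intercept) and the sum of squared
    deviations, which is the quantity least squares minimizes.
    """
    design = np.column_stack([x, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    return coeffs, float(residuals @ residuals)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Five collinear points: more constraints than coefficients, yet an exact fit.
y_collinear = 2.0 * x + 1.0
coeffs_exact, sse_exact = fit_line_least_squares(x, y_collinear)

# Hypothetical measurements that no single line passes through.
y_scattered = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
coeffs_approx, sse_approx = fit_line_least_squares(x, y_scattered)

print("collinear:", coeffs_exact, "sum of squares:", sse_exact)
print("scattered:", coeffs_approx, "sum of squares:", sse_approx)
```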
There are several reasons to prefer an approximate fit even when it is possible to simply increase the degree of the polynomial equation and get an exact match:
* Even if an exact match exists, it does not necessarily follow that it can be readily discovered. Depending on the algorithm used there may be a divergent case, where the exact fit cannot be calculated, or it might take too much computer time to find the solution. This situation might require an approximate solution.
* The effect of averaging out questionable data points in a sample, rather than distorting the curve to fit them exactly, may be desirable.
* Runge's phenomenon: high order polynomials can be highly oscillatory. If a curve runs through two points ''A'' and ''B'', it would be expected that the curve would run somewhat near the midpoint of ''A'' and ''B'', as well. This may not happen with high-order polynomial curves; they may even have values that are very large in positive or negative magnitude. With low-order polynomials, the curve is more likely to fall near the midpoint (it is even guaranteed to run exactly through the midpoint on a first degree polynomial).
* Low-order polynomials tend to be smooth and high order polynomial curves tend to be "lumpy". To define this more precisely, the maximum number of inflection points possible in a polynomial curve is ''n'' - 2, where ''n'' is the order of the polynomial equation. An inflection point is a location on the curve where the curvature changes sign, that is, where the curve switches from a positive radius to a negative one. We can also say this is where it transitions from "holding water" to "shedding water". Note that it is only "possible" that high order polynomials will be lumpy; they could also be smooth, but there is no guarantee of this, unlike with low order polynomial curves. A fifteenth degree polynomial could have, at most, thirteen inflection points, but could also have eleven, or nine or any odd number down to one. (Polynomials with even numbered degree could have any even number of inflection points from ''n'' - 2 down to zero.)
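The oscillation described in the list above can be reproduced numerically. The sketch below uses the function 1/(1 + 25''x''²) on [-1, 1], the example customarily associated with Runge's phenomenon, and interpolates it at equally spaced points; the helper names are assumptions, and `numpy.polynomial.Polynomial.fit` is used only as a convenient way to obtain the interpolant.

```python
import numpy as np
from numpy.polynomial import Polynomial

def runge(x):
    """The classic test function 1 / (1 + 25 x^2)."""
    return 1.0 / (1.0 + 25.0 * x ** 2)

def max_interpolation_error(degree):
    """Largest deviation on [-1, 1] between runge() and the polynomial of the
    given degree that interpolates it at degree + 1 equally spaced points."""
    nodes = np.linspace(-1.0, 1.0, degree + 1)
    # With degree + 1 points, the least-squares fit is the exact interpolant.
    poly = Polynomial.fit(nodes, runge(nodes), degree)
    dense = np.linspace(-1.0, 1.0, 2001)
    return float(np.max(np.abs(poly(dense) - runge(dense))))

errors = {degree: max_interpolation_error(degree) for degree in (4, 8, 12, 16)}
for degree, error in errors.items():
    print(f"degree {degree:2d}: max error {error:.3f}")
```

Although every interpolant passes exactly through its data points, the error between the points grows with the degree, and the largest deviations occur near the ends of the interval.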
The degree of the polynomial curve being higher than needed for an exact fit is undesirable for all the reasons listed previously for high order polynomials, but also leads to a case where there are an infinite number of solutions. For example, a first degree polynomial (a line) constrained by only a single point, instead of the usual two, would give an infinite number of solutions. This brings up the problem of how to compare and choose just one solution, which can be a problem for software and for humans, as well. For this reason, it is usually best to choose as low a degree as possible for an exact match on all constraints, and perhaps an even lower degree, if an approximate fit is acceptable.
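The single-point example can be made concrete. In the sketch below (the point and the helper name are hypothetical), the system has one equation and two unknowns; `numpy.linalg.lstsq` reports the rank deficiency and returns one particular solution, the one with the smallest coefficient norm, which is only one of infinitely many valid lines.

```python
import numpy as np

x0, y0 = 2.0, 3.0  # the single constraint: the line must pass through (2, 3)

# One equation, a * x0 + b = y0, in two unknowns (a, b).
system = np.array([[x0, 1.0]])
rhs = np.array([y0])
solution, _, rank, _ = np.linalg.lstsq(system, rhs, rcond=None)

def line_through_point(slope):
    """Any slope gives a valid line; the intercept follows from the point."""
    return slope, y0 - slope * x0

print("rank of the system:", rank, "for 2 unknowns")
print("minimum-norm solution (a, b):", solution)
print("other exact solutions:", [line_through_point(m) for m in (-1.0, 0.0, 5.0)])
```

Choosing the minimum-norm solution is a convention of the solver, not a property of the data, which illustrates why software needs some additional rule to pick among the candidates.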
Fitting other functions to data points
Other types of curves, such as trigonometric functions (such as sine and cosine), may also be used, in certain cases.
In spectroscopy, data may be fitted with Gaussian, Lorentzian, Voigt and related functions.
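A Gaussian peak fit might look like the following sketch, which assumes SciPy's `scipy.optimize.curve_fit`. The peak parameters, noise level and parameter names are invented for illustration and do not come from any measured spectrum.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, center, width):
    """Gaussian peak; width is the standard deviation."""
    return amplitude * np.exp(-0.5 * ((x - center) / width) ** 2)

# Synthetic "spectrum": a single peak plus measurement noise (hypothetical).
rng = np.random.default_rng(1)
x = np.linspace(-5.0, 5.0, 200)
true_params = (3.0, 0.5, 1.2)
y = gaussian(x, *true_params) + rng.normal(scale=0.05, size=x.size)

# A rough initial guess read off the data helps the iterative solver converge.
initial_guess = (y.max(), x[np.argmax(y)], 1.0)
params, covariance = curve_fit(gaussian, x, y, p0=initial_guess)
std_errors = np.sqrt(np.diag(covariance))

for name, value, error in zip(("amplitude", "center", "width"), params, std_errors):
    print(f"{name}: {value:.3f} +/- {error:.3f}")
```

Unlike the polynomial fits above, this model is non-linear in its parameters, so the fit is iterative and depends on a reasonable starting point.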
In biology, ecology, demography, epidemiology, and many other disciplines, the growth of a population, the spread of infectious disease, etc. can be fitted using the logistic function.
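The following sketch fits a three-parameter logistic curve to synthetic growth data, again assuming SciPy's `scipy.optimize.curve_fit`. The parameterization (carrying capacity, growth rate, midpoint) is one common choice, and all numerical values are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, capacity, rate, midpoint):
    """Logistic growth curve that saturates at `capacity`."""
    return capacity / (1.0 + np.exp(-rate * (t - midpoint)))

# Synthetic population counts over time (hypothetical values).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 20.0, 41)
true_params = (100.0, 0.6, 10.0)
population = logistic(t, *true_params) + rng.normal(scale=1.0, size=t.size)

# Initial guess: largest observation, unit growth rate, middle of the time span.
initial_guess = (population.max(), 1.0, np.median(t))
params, _ = curve_fit(logistic, t, population, p0=initial_guess)

print("capacity, rate, midpoint:", params)
```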
In agriculture the inverted logistic sigmoid function (S-curve) is used to describe the relation between crop yield and growth factors. The blue figure was made by a sigmoid regression of data measured in farm lands. It can be seen that initially, i.e. at low soil salinity, the crop yield decreases slowly with increasing soil salinity, while thereafter the decrease progresses faster.
Geometric fitting of plane curves to data points
If a function of the form ''y'' = ''f''(''x'') cannot be postulated, one can still try to fit a plane curve.
Other types of curves, such as conic sections (circular, elliptical, parabolic, and hyperbolic arcs) or trigonometric functions (such as sine and cosine), may also be used, in certain cases. For example, trajectories of objects under the influence of gravity follow a parabolic path, when air resistance is ignored. Hence, matching trajectory data points to a parabolic curve would make sense. Tides follow sinusoidal patterns, hence tidal data points should be matched to a sine wave, or the sum of two sine waves of different periods, if the effects of the Moon and Sun are both considered.
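When the periods are known in advance, fitting a sum of sine waves reduces to linear least squares, because each wave can be written as a combination of a sine and a cosine term. The sketch below assumes periods of 12.42 hours and 12.00 hours as stand-ins for the lunar and solar contributions; these periods, the function names and the synthetic tide heights are all assumptions made for illustration.

```python
import numpy as np

def fit_sinusoids(t, y, periods):
    """Least-squares fit of y ~ mean + sum_k [a_k sin(w_k t) + b_k cos(w_k t)]
    for known periods. Returns the coefficients and the fitted values."""
    columns = [np.ones_like(t)]
    for period in periods:
        omega = 2.0 * np.pi / period
        columns += [np.sin(omega * t), np.cos(omega * t)]
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs, design @ coeffs

def amplitudes(coeffs):
    """Amplitude of each sine wave, from its (sin, cos) coefficient pair."""
    pairs = coeffs[1:].reshape(-1, 2)
    return np.hypot(pairs[:, 0], pairs[:, 1])

# Synthetic tide heights: 30 days of half-hourly samples (hypothetical).
periods = (12.42, 12.00)  # hours; assumed lunar and solar periods
rng = np.random.default_rng(3)
t = np.arange(0.0, 30 * 24.0, 0.5)
height = (2.0
          + 1.5 * np.sin(2.0 * np.pi * t / periods[0] + 0.3)
          + 0.5 * np.sin(2.0 * np.pi * t / periods[1] - 1.0)
          + rng.normal(scale=0.05, size=t.size))

coeffs, fitted = fit_sinusoids(t, height, periods)
print("mean level:", coeffs[0])
print("amplitudes of the two waves:", amplitudes(coeffs))
```

If the periods themselves were unknown, the problem would become non-linear and would need an iterative method instead.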
For a parametric curve, it is effective to fit each of its coordinates as a separate function of arc length; assuming that data points can be ordered, the chord distance may be used.
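A sketch of this idea follows, using the cumulative chord distance between consecutive points as a stand-in for arc length and fitting each coordinate with a polynomial. The function names, the polynomial degree and the semicircular test data are illustrative assumptions.

```python
import numpy as np

def chord_length_parameter(points):
    """Cumulative chord distance along ordered points, starting at zero."""
    steps = np.linalg.norm(np.diff(points, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(steps)])

def fit_parametric(points, degree):
    """Fit x(s) and y(s) separately as polynomials in the chord parameter s."""
    s = chord_length_parameter(points)
    coeffs_x = np.polyfit(s, points[:, 0], degree)
    coeffs_y = np.polyfit(s, points[:, 1], degree)
    return s, coeffs_x, coeffs_y

# Ordered samples of a semicircle, which is not the graph of a single
# function x = g(y), so the coordinates are fitted against s instead.
theta = np.linspace(0.0, np.pi, 50)
points = np.column_stack([np.cos(theta), np.sin(theta)])

s, coeffs_x, coeffs_y = fit_parametric(points, degree=7)
fitted = np.column_stack([np.polyval(coeffs_x, s), np.polyval(coeffs_y, s)])
print("total chord length:", s[-1])
print("largest deviation from the data:", np.max(np.abs(fitted - points)))
```

The total chord length is slightly shorter than the true arc length, and the approximation improves as the points become more closely spaced.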
Fitting a circle by geometric fit
Coope approaches the problem of trying to find the best visual fit of circle to a set of 2D data points. The method elegantly transforms the ordinarily non-linear problem into a linear problem that can be solved without using iterative numerical methods, and is hence much faster than previous techniques.
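The source does not spell out the transformation, so the following is only a sketch of one well-known linearization of the circle-fitting problem, in the spirit of the approach described: expanding |''p'' - ''c''|² = ''r''² gives |''p''|² = 2''p''·''c'' + (''r''² - |''c''|²), which is linear in the unknowns 2''c'' and ''r''² - |''c''|². The function name and test data are hypothetical.

```python
import numpy as np

def fit_circle_linear(points):
    """Circle fit through a single linear least-squares solve.

    Unknowns are (2*cx, 2*cy, r^2 - cx^2 - cy^2); the centre and radius are
    recovered from them afterwards.
    """
    points = np.asarray(points, dtype=float)
    design = np.column_stack([points, np.ones(len(points))])
    rhs = np.sum(points ** 2, axis=1)
    solution, *_ = np.linalg.lstsq(design, rhs, rcond=None)
    center = solution[:2] / 2.0
    radius = float(np.sqrt(solution[2] + center @ center))
    return center, radius

# Noisy samples of a circle with centre (1, -2) and radius 3 (hypothetical).
rng = np.random.default_rng(4)
angles = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
exact = np.column_stack([1.0 + 3.0 * np.cos(angles), -2.0 + 3.0 * np.sin(angles)])
noisy = exact + rng.normal(scale=0.05, size=exact.shape)

center, radius = fit_circle_linear(noisy)
print("centre:", center, "radius:", radius)
```

No starting guess or iteration is needed, which is the practical advantage over minimizing the orthogonal distances directly.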
Fitting an ellipse by geometric fit
The above technique is extended to general ellipses (Paul Sheer, ''A software assistant for manual stereo photometrology'', M.Sc. thesis, 1997) by adding a non-linear step, resulting in a method that is fast, yet finds visually pleasing ellipses of arbitrary orientation and displacement.
Fitting surfaces
Note that while this discussion was in terms of 2D curves, much of this logic also extends to 3D surfaces, each patch of which is defined by a net of curves in two parametric directions, typically called u and v. A surface may be composed of one or more surface patches in each direction.
Software
Many statistical packages such as R and numerical software such as gnuplot, GNU Scientific Library, MLAB, Maple, MATLAB, TK Solver 6.0, Scilab, Mathematica, GNU Octave, and SciPy include commands for doing curve fitting in a variety of scenarios. There are also programs specifically written to do curve fitting; they can be found in the lists of statistical and numerical-analysis programs as well as in Category:Regression and curve fitting software.
See also
* Calibration curve
* Curve-fitting compaction
* Estimation theory
* Function approximation
* Goodness of fit
* Genetic programming
* Least-squares adjustment
* Levenberg–Marquardt algorithm
* Line fitting
* Multi expression programming
* Nonlinear regression
* Overfitting
* Plane curve
* Probability distribution fitting
* Sinusoidal model
* Smoothing
* Splines (interpolating, smoothing)
* Time series
* Total least squares
* Linear trend estimation
Further reading
*N. Chernov (2010), ''Circular and linear regression: Fitting circles and lines by least squares'', Chapman & Hall/CRC, Monographs on Statistics and Applied Probability, Volume 117 (256 pp.)