![Isotonic regression](https://upload.wikimedia.org/wikipedia/commons/3/30/Isotonic_regression.svg)
In
statistics
Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
and
numerical analysis
Numerical analysis is the study of algorithms that use numerical approximation (as opposed to symbolic computation, symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics). It is the study of ...
, isotonic regression or
monotonic
In mathematics, a monotonic function (or monotone function) is a function between ordered sets that preserves or reverses the given order. This concept first arose in calculus, and was later generalized to the more abstract setting of order ...
regression is the technique of fitting a free-form line to a sequence of observations such that the fitted line is
non-decreasing (or non-increasing) everywhere, and lies as close to the observations as possible.
Applications
Isotonic regression has applications in
statistical inference
Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution, distribution of probability.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical ...
. For example, one might use it to fit an isotonic curve to the means of some set of experimental results when an increase in those means according to some particular ordering is expected. A benefit of isotonic regression is that it is not constrained by any functional form, such as the linearity imposed by
linear regression, as long as the function is monotonic increasing.
Another application is nonmetric
multidimensional scaling
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of n objects or individuals" into a configurati ...
, where a low-dimensional
embedding
In mathematics, an embedding (or imbedding) is one instance of some mathematical structure contained within another instance, such as a group that is a subgroup.
When some object X is said to be embedded in another object Y, the embedding is gi ...
for data points is sought such that order of distances between points in the embedding matches
order of dissimilarity between points. Isotonic regression is used iteratively to fit ideal distances to preserve relative dissimilarity order.
Isotonic regression is also used in
probabilistic classification
In machine learning, a probabilistic classifier is a classifier that is able to predict, given an observation of an input, a probability distribution over a set of classes, rather than only outputting the most likely class that the observation sho ...
to calibrate the predicted probabilities of
supervised machine learning
Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...
models.
Isotonic regression for the simply ordered case with univariate
has been applied to estimating continuous dose-response relationships in fields such as anesthesiology and toxicology. Narrowly speaking, isotonic regression only provides point estimates at observed values of
Estimation of the complete dose-response curve without any additional assumptions is usually done via linear interpolation between the point estimates.
Software for computing isotone (monotonic) regression has been developed for
R,
Stata, and
Python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
.
Problem Statement and Algorithms
Let
be a given set of observations, where the
and the
fall in some
partially ordered set
In mathematics, especially order theory, a partially ordered set (also poset) formalizes and generalizes the intuitive concept of an ordering, sequencing, or arrangement of the elements of a set. A poset consists of a set together with a bina ...
. For generality, each observation
may be given a weight
, although commonly
for all
.
Isotonic regression seeks a weighted
least-squares
The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the res ...
fit
for all
, subject to the constraint that
whenever
. This gives the following
quadratic program (QP) in the variables
:
:
subject to
where
specifies the partial ordering of the observed inputs
(and may be regarded as the set of edges of some
directed acyclic graph
In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one ve ...
(dag) with vertices
). Problems of this form may be solved by generic quadratic programming techniques.
In the usual setting where the
values fall in a
totally ordered set
In mathematics, a total or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation \leq on some set X, which satisfies the following for all a, b and c in X:
# a \leq a ( reflexive) ...
such as
, we may assume
WLOG
''Without loss of generality'' (often abbreviated to WOLOG, WLOG or w.l.o.g.; less commonly stated as ''without any loss of generality'' or ''with no loss of generality'') is a frequently used expression in mathematics. The term is used to indicate ...
that the observations have been sorted so that
, and take
. In this case, a simple
iterative algorithm
In computational mathematics, an iterative method is a mathematical procedure that uses an initial value to generate a sequence of improving approximate solutions for a class of problems, in which the ''n''-th approximation is derived from the pre ...
for solving the quadratic program is the
pool adjacent violators algorithm
Pool may refer to:
Water pool
* Swimming pool, usually an artificial structure containing a large body of water intended for swimming
* Reflecting pool, a shallow pool designed to reflect a structure and its surroundings
* Tide pool, a rocky poo ...
. Conversely, Best and Chakravarti
studied the problem as an active set identification problem, and proposed a primal algorithm. These two algorithms can be seen as each other's dual, and both have a
computational complexity of
on already sorted data.
To complete the isotonic regression task, we may then choose any non-decreasing function
such that
for all i. Any such function obviously solves
:
subject to
being nondecreasing
and can be used to predict the
values for new values of
. A common choice when
would be to interpolate linearly between the points
, as illustrated in the figure, yielding a continuous piecewise linear function:
:
Centered Isotonic Regression
As this article's first figure shows, in the presence of monotonicity violations the resulting interpolated curve will have flat (constant) intervals. In dose-response applications it is usually known that
is not only monotone but also smooth. The flat intervals are incompatible with
's assumed shape, and can be shown to be biased. A simple improvement for such applications, named centered isotonic regression (CIR), was developed by Oron and Flournoy and shown to substantially reduce estimation error for both dose-response and dose-finding applications.
Both CIR and the standard isotonic regression for the univariate, simply ordered case, are implemented in the R package "cir".
[ This package also provides analytical confidence-interval estimates.
]
References
Further reading
*
*
*
*
{{DEFAULTSORT:Isotonic Regression
Nonparametric regression
Nonparametric Bayesian statistics
Numerical analysis