statistics Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, indust ...

and

numerical analysis Numerical analysis is the study of algorithms that use numerical approximation (as opposed to symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics). It is the study of numerical methods ...

, isotonic regression or

monotonic In mathematics, a monotonic function (or monotone function) is a function between ordered sets that preserves or reverses the given order. This concept first arose in calculus, and was later generalized to the more abstract setting of order ...

regression is the technique of fitting a free-form line to a sequence of observations such that the fitted line is non-decreasing (or non-increasing) everywhere, and lies as close to the observations as possible.

Applications

Isotonic regression has applications in

statistical inference Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properti ...

. For example, one might use it to fit an isotonic curve to the means of some set of experimental results when an increase in those means according to some particular ordering is expected. A benefit of isotonic regression is that it is not constrained by any functional form, such as the linearity imposed by

linear regression In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is cal ...

, as long as the function is monotonic increasing. Another application is nonmetric

multidimensional scaling Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of n objects or individuals" into a configurati ...

, where a low-dimensional

embedding In mathematics, an embedding (or imbedding) is one instance of some mathematical structure contained within another instance, such as a group that is a subgroup. When some object X is said to be embedded in another object Y, the embedding is g ...

for data points is sought such that order of distances between points in the embedding matches order of dissimilarity between points. Isotonic regression is used iteratively to fit ideal distances to preserve relative dissimilarity order. Isotonic regression is also used in probabilistic classification to calibrate the predicted probabilities of

supervised machine learning Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...

models. Isotonic regression for the simply ordered case with univariate

x, y

has been applied to estimating continuous dose-response relationships in fields such as anesthesiology and toxicology. Narrowly speaking, isotonic regression only provides point estimates at observed values of

x.

Estimation of the complete dose-response curve without any additional assumptions is usually done via linear interpolation between the point estimates. Software for computing isotone (monotonic) regression has been developed for R,

Stata Stata (, , alternatively , occasionally stylized as STATA) is a general-purpose statistical software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting. It is used by researchers in many fie ...

, and Python.

Problem Statement and Algorithms

Let

(x_1,y_1),\ldots,(x_n,y_n)

be a given set of observations, where the

y_i \in \mathbb

and the

x_i

fall in some

partially ordered set In mathematics, especially order theory, a partially ordered set (also poset) formalizes and generalizes the intuitive concept of an ordering, sequencing, or arrangement of the elements of a set. A poset consists of a set together with a binary ...

. For generality, each observation

(x_i,y_i)

may be given a weight

w_i \ge 0

, although commonly

w_i=1

for all

i

. Isotonic regression seeks a weighted

least-squares The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the res ...

fit

\hat_i \approx y_i

for all

i

, subject to the constraint that

\hat_i \le \hat_j

whenever

x_i \le x_j

. This gives the following

quadratic program Quadratic programming (QP) is the process of solving certain mathematical optimization problems involving quadratic functions. Specifically, one seeks to optimize (minimize or maximize) a multivariate quadratic function subject to linear const ...

(QP) in the variables

\hat_1, \ldots, \hat_n

: :

\min \sum_^n w_i (\hat_i - y_i)^2

subject to

\hat_i \le \hat_j \text (i,j) \in E

where

E = \

specifies the partial ordering of the observed inputs

x_i

(and may be regarded as the set of edges of some

directed acyclic graph In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one ...

(dag) with vertices

1,2,\ldots n

). Problems of this form may be solved by generic quadratic programming techniques. In the usual setting where the

x_i

values fall in a

totally ordered set In mathematics, a total or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation \leq on some set X, which satisfies the following for all a, b and c in X: # a \leq a ( reflexive) ...

such as

\mathbb

, we may assume WLOG that the observations have been sorted so that

x_1 \le x_2 \le \cdots \le x_n

, and take

E = \

. In this case, a simple

iterative algorithm In computational mathematics, an iterative method is a mathematical procedure that uses an initial value to generate a sequence of improving approximate solutions for a class of problems, in which the ''n''-th approximation is derived from the pre ...

for solving the quadratic program is the

pool adjacent violators algorithm Pool may refer to: Water pool * Swimming pool, usually an artificial structure containing a large body of water intended for swimming * Reflecting pool, a shallow pool designed to reflect a structure and its surroundings * Tide pool, a rocky p ...

. Conversely, Best and Chakravarti studied the problem as an active set identification problem, and proposed a primal algorithm. These two algorithms can be seen as each other's dual, and both have a

computational complexity In computer science, the computational complexity or simply complexity of an algorithm is the amount of resources required to run it. Particular focus is given to computation time (generally measured by the number of needed elementary operations) ...

O(n)

on already sorted data. To complete the isotonic regression task, we may then choose any non-decreasing function

f(x)

such that

f(x_i) = \hat_i

for all i. Any such function obviously solves :

\min_f \sum_^n w_i (f(x_i) - y_i)^2

subject to

f

being nondecreasing and can be used to predict the

y

values for new values of

x

. A common choice when

x_i \in \mathbb

would be to interpolate linearly between the points

(x_i,\hat_i)

, as illustrated in the figure, yielding a continuous piecewise linear function: :

f(x) = \begin 
                 \hat_1 & \text x \le x_1 \\
                 \hat_i + \frac(\hat_-\hat_i) & \text x_i \le x \le x_ \\
                 \hat_n & \text x \ge x_n
              \end

Centered Isotonic Regression

As this article's first figure shows, in the presence of monotonicity violations the resulting interpolated curve will have flat (constant) intervals. In dose-response applications it is usually known that

f(x)

is not only monotone but also smooth. The flat intervals are incompatible with

f(x)

's assumed shape, and can be shown to be biased. A simple improvement for such applications, named centered isotonic regression (CIR), was developed by Oron and Flournoy and shown to substantially reduce estimation error for both dose-response and dose-finding applications. Both CIR and the standard isotonic regression for the univariate, simply ordered case, are implemented in the R package "cir". This package also provides analytical confidence-interval estimates.

Applications

Problem Statement and Algorithms

Centered Isotonic Regression

References

Further reading