Interior-point methods (also referred to as barrier methods or IPMs) are algorithms for solving linear and non-linear convex optimization problems. IPMs combine two advantages of previously-known algorithms:
* Theoretically, their run-time is polynomial—in contrast to the simplex method, which has exponential run-time in the worst case.
* Practically, they run as fast as the simplex method—in contrast to the ellipsoid method, which has polynomial run-time in theory but is very slow in practice.
In contrast to the simplex method, which traverses the ''boundary'' of the feasible region, and the ellipsoid method, which bounds the feasible region from ''outside'', an IPM reaches an optimal solution by traversing the ''interior'' of the feasible region—hence the name.
History
An interior point method was discovered by Soviet mathematician I. I. Dikin in 1967. The method was reinvented in the U.S. in the mid-1980s. In 1984, Narendra Karmarkar developed a method for linear programming called Karmarkar's algorithm, which runs in provably polynomial time ($O(n^{3.5} L)$ operations on ''L''-bit numbers, where ''n'' is the number of variables), and is also very efficient in practice. Karmarkar's paper created a surge of interest in interior point methods. Two years later,
James Renegar invented the first ''path-following'' interior-point method, with run-time $O(m^{1/2} L)$. The method was later extended from linear to convex optimization problems, based on a self-concordant barrier function used to encode the convex set.
Any convex optimization problem can be transformed into minimizing (or maximizing) a linear function over a convex set by converting to the epigraph form.
The idea of encoding the feasible set using a barrier and designing barrier methods was studied by Anthony V. Fiacco, Garth P. McCormick, and others in the early 1960s. These ideas were mainly developed for general nonlinear programming, but they were later abandoned due to the presence of more competitive methods for this class of problems (e.g. sequential quadratic programming).
Yurii Nesterov and Arkadi Nemirovski came up with a special class of such barriers that can be used to encode any convex set. They guarantee that the number of iterations of the algorithm is bounded by a polynomial in the dimension and accuracy of the solution.
The class of primal-dual path-following interior-point methods is considered the most successful.
Mehrotra's predictor–corrector algorithm provides the basis for most implementations of this class of methods.
Definitions
We are given a convex program of the form:

minimize $f(x)$  s.t.  $x \in G,$

where ''f'' is a convex function and ''G'' is a convex set. Without loss of generality, we can assume that the objective ''f'' is a linear function. Usually, the convex set ''G'' is represented by a set of convex inequalities and linear equalities; the linear equalities can be eliminated using linear algebra, so for simplicity we assume there are only convex inequalities, and the program can be described as follows, where the $g_i$ are convex functions:

minimize $f(x)$  s.t.  $g_i(x) \le 0$ for $i = 1, \dots, m.$
We assume that the constraint functions belong to some family (e.g. quadratic functions), so that the program can be represented by a finite ''vector of coefficients'' (e.g. the coefficients of the quadratic functions). The dimension of this coefficient vector is called the ''size'' of the program. A ''numerical solver'' for a given family of programs is an algorithm that, given the coefficient vector, generates a sequence of approximate solutions $x_t$ for ''t'' = 1, 2, ..., using finitely many arithmetic operations. A numerical solver is called ''convergent'' if, for any program from the family and any ''ε'' > 0, there is some ''T'' (which may depend on the program and on ''ε'') such that, for any ''t'' > ''T'', the approximate solution $x_t$ is ''ε-approximate,'' that is:

$f(x_t) - f(x^*) \le \varepsilon,$

where $x^*$ is the optimal solution. A solver is called ''polynomial'' if the total number of arithmetic operations in the first ''T'' steps is at most
poly(problem-size) * log(''V''/''ε''),
where ''V'' is some data-dependent constant, e.g., the difference between the largest and smallest value in the feasible set. In other words, ''V''/''ε'' is the "relative accuracy" of the solution - the accuracy w.r.t. the largest coefficient. log(''V''/''ε'') represents the number of "accuracy digits". Therefore, a solver is 'polynomial' if each additional digit of accuracy requires a number of operations that is polynomial in the problem size.
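As a tiny illustration of the "accuracy digits" notion (a sketch with made-up numbers, not from the source):

```python
import math

V, eps = 1e3, 1e-6            # hypothetical data range and target accuracy
digits = math.log10(V / eps)  # number of base-10 accuracy digits: 9.0
# A polynomial solver pays at most poly(problem-size) operations per digit.
```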
Types
Types of interior point methods include:
* Potential reduction methods: Karmarkar's algorithm was the first one.
* Path-following methods: the algorithms of James Renegar and Clovis Gonzaga were the first ones.
* Primal-dual methods.
Path-following methods
Idea
Given a convex optimization program (P) with constraints, we can convert it to an ''unconstrained'' program by adding a barrier function. Specifically, let ''b'' be a smooth convex function, defined in the interior of the feasible region ''G'', such that for any sequence $\{x_j\}$ whose limit is on the boundary of ''G'': $\lim_{j\to\infty} b(x_j) = \infty$. We also assume that ''b'' is non-degenerate, that is: its Hessian $\nabla^2 b(x)$ is positive definite for all ''x'' in interior(''G''). Now, consider the family of programs:

($P_t$)  minimize  $t \cdot f(x) + b(x)$

Technically the program is restricted, since ''b'' is defined only in the interior of ''G''. But practically, it is possible to solve it as an unconstrained program, since any solver trying to minimize the function will not approach the boundary, where ''b'' approaches infinity. Therefore, ($P_t$) has a unique solution, denoted by $x^*(t)$. The function $x^*$ is a continuous function of ''t'', which is called the ''central path''. All limit points of $x^*(t)$, as ''t'' approaches infinity, are optimal solutions of the original program (P).
A path-following method is a method of tracking the function $x^*$ along a certain increasing sequence $t_1, t_2, \dots$; that is: computing a good-enough approximation $x_i$ to the point $x^*(t_i)$, such that the difference $x_i - x^*(t_i)$ approaches 0 as ''i'' approaches infinity; then the sequence $x_i$ approaches the optimal solution of (P). This requires specifying three things:
* The barrier function ''b''(''x'').
* A policy for determining the penalty parameters $t_i$.
* The unconstrained-optimization solver used to solve ($P_{t_i}$) and find $x_i$, such as Newton's method. Note that we can use each $x_i$ as a starting point for solving the next problem ($P_{t_{i+1}}$). A minimal code sketch of this loop is given below.
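The following is a minimal sketch of such a loop (not from the original source), assuming a linear objective given by a vector ''c'', callables for the barrier's gradient and Hessian, the barrier parameter ''M'', and a strictly feasible starting point; the inner solver is a damped Newton method:

```python
import numpy as np

def path_following(c, grad_b, hess_b, x, M, t=1.0, mu=10.0, eps=1e-8):
    """Sketch of a path-following loop for: minimize c@x over G.

    grad_b, hess_b -- gradient and Hessian of a barrier b for G;
    M  -- barrier parameter (suboptimality on the central path <= M/t);
    x  -- strictly feasible starting point; mu -- penalty update factor.
    """
    while M / t > eps:                 # gap bound along the central path
        t *= mu                        # geometric penalty update
        for _ in range(100):           # damped Newton on f_t = t*c@x + b(x)
            g = t * c + grad_b(x)      # gradient of f_t (c is linear)
            H = hess_b(x)              # Hessian of f_t equals Hessian of b
            dx = np.linalg.solve(H, -g)
            lam = np.sqrt(-g @ dx)     # Newton decrement of f_t at x
            if lam < 1e-6:             # x is close enough to x*(t)
                break
            x = x + dx / (1.0 + lam)   # damped Newton step
    return x
```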
The main challenge in proving that the method is polytime is that, as the penalty parameter grows, the solution gets near the boundary, and the function becomes steeper. The run-time of solvers such as Newton's method becomes longer, and it is hard to prove that the total runtime is polynomial.
Renegar and Gonzaga proved that a specific instance of a path-following method is polytime:
* The constraints (and the objective) are linear functions;
* The barrier function is logarithmic: $b(x) := -\sum_j \log(-g_j(x))$.
* The penalty parameter ''t'' is updated geometrically, that is, $t_{i+1} := \mu \cdot t_i$, where ''μ'' is a constant (they took a value of the form $\mu = 1 + \frac{c}{\sqrt{m}}$ for a small constant ''c'', where ''m'' is the number of inequality constraints);
* The solver is Newton's method, and a ''single'' step of Newton is done for each single step in ''t''.
They proved that, in this case, the difference $x_i - x^*(t_i)$ remains at most 0.01, and $f(x_i) - f^*$ is at most $2m/t_i$. Thus, the solution accuracy is proportional to $1/t_i$, so to add a single accuracy digit, it is sufficient to multiply $t_i$ by 2 (or any other constant factor), which requires $O(\sqrt{m})$ Newton steps. Since each Newton step takes $O(m n^2)$ operations, the total complexity is $O(m^{3/2} n^2)$ operations per accuracy digit.
Yurii Nesterov extended the idea from linear to non-linear programs. He noted that the main property of the logarithmic barrier, used in the above proofs, is that it is
self-concordant with a finite barrier parameter. Therefore, many other classes of convex programs can be solved in polytime using a path-following method, if we can find a suitable self-concordant barrier function for their feasible region.
Details
We are given a convex optimization problem (P) in "standard form":
minimize $c^T x$  s.t.  $x \in G,$
where ''G'' is convex and closed. We can also assume that ''G'' is bounded (we can easily make it bounded by adding a constraint $\|x\| \le R$ for some sufficiently large ''R'').
To use the interior-point method, we need a self-concordant barrier for ''G''. Let ''b'' be an ''M''-self-concordant barrier for ''G'', where ''M'' ≥ 1 is the self-concordance parameter. We assume that we can compute efficiently the value of ''b'', its gradient, and its Hessian, for every point ''x'' in the interior of ''G''.
For every ''t'' > 0, we define the ''penalized objective'' $f_t(x) := t\, c^T x + b(x)$. We define the path of minimizers by: $x^*(t) := \arg\min_x f_t(x)$. We approximate this path along an increasing sequence $t_i$. The sequence is initialized by a certain non-trivial two-phase initialization procedure. Then, it is updated according to the following rule:

$t_{i+1} := \left(1 + \frac{r}{\sqrt{M}}\right) \cdot t_i,$

where ''r'' is a fixed parameter (the penalty updating rate).
For each $t_i$, we find an approximate minimum of $f_{t_i}$, denoted by $x_i$. The approximate minimum is chosen to satisfy the following "closeness condition" (where ''L'' is the ''path tolerance''):

$\sqrt{\nabla f_{t_i}(x_i)^T\, [\nabla^2 f_{t_i}(x_i)]^{-1}\, \nabla f_{t_i}(x_i)} \le L.$

To find $x_{i+1}$, we start with $x_i$ and apply the damped Newton method. We apply several steps of this method, until the above "closeness condition" is satisfied. The first point that satisfies this condition is denoted by $x_{i+1}$. The quantity on the left is the ''Newton decrement'' of $f_{t_i}$ at $x_i$; a sketch of its computation follows.
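As a small illustration (a sketch, assuming the decrement form given above), the closeness condition can be checked as follows:

```python
import numpy as np

def newton_decrement(grad, hess):
    """lambda(f_t, x) = sqrt(g^T H^{-1} g) for g = grad f_t(x), H = Hessian."""
    return float(np.sqrt(grad @ np.linalg.solve(hess, grad)))

# x_i is accepted once newton_decrement(g, H) <= L (the path tolerance).
```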
Convergence and complexity
The convergence rate of the method is given by the following formula, for every ''i'':

$c^T x_i - c^* \le \frac{2M}{t_i}.$

Taking ''r'' and ''L'' to be fixed constants, the number of Newton steps required to go from $x_i$ to $x_{i+1}$ is at most a fixed number, that depends only on ''r'' and ''L''. In particular, the total number of Newton steps required to find an ''ε''-approximate solution (i.e., finding ''x'' in ''G'' such that $c^T x - c^* \le \varepsilon$) is at most:

$O(1) \cdot \sqrt{M} \cdot \ln\left(\frac{M}{t_1\, \varepsilon} + 1\right),$

where the constant factor O(1) depends only on ''r'' and ''L''. The number of Newton steps required for the two-phase initialization procedure obeys a similar bound, where the constant factor depends only on ''r'', ''L'', and on how deep a chosen starting point $\bar{x}$ lies in the interior of ''G''. Overall, the Newton complexity of finding an ''ε''-approximate solution is at most

$O\left(\sqrt{M}\, \ln\frac{V}{\varepsilon}\right),$

where ''V'' is some problem-dependent constant, e.g., the difference between the largest and smallest value of the objective in the feasible set: $V = \max_{x\in G} c^T x - \min_{x\in G} c^T x$. Each Newton step takes $O(n^3)$ arithmetic operations.
Initialization: phase-I methods
To initialize the path-following methods, we need a point in the relative interior of the feasible region ''G''. In other words: if ''G'' is defined by the inequalities $g_i(x) \le 0$, then we need some ''x'' for which $g_i(x) < 0$ for all ''i'' in 1,...,''m''. If we do not have such a point, we need to find one using a so-called phase-I method.
A simple phase-I method is to solve the following convex program:

minimize ''s''  s.t.  $g_i(x) \le s$ for all ''i'' in 1,...,''m''.

Denote the optimal solution by $x^*, s^*$.
* If $s^* < 0$, then we know that $x^*$ is an interior point of the original problem and can go on to "phase II", which is solving the original problem.
* If $s^* > 0$, then we know that the original program is infeasible - the feasible region is empty.
* If $s^* = 0$ and it is attained by some solution $x^*$, then the problem is feasible but has no interior point; if it is not attained, then the problem is infeasible.
For this program it is easy to get an interior point: we can take, arbitrarily, ''x'' = 0, and take ''s'' to be any number larger than $\max(g_1(0), \dots, g_m(0))$. Therefore, it can be solved using interior-point methods. However, the run-time is proportional to $\log(1/s^*)$. As $s^*$ comes near 0, it becomes harder and harder to find an exact solution to the phase-I problem, and thus harder to decide whether the original problem is feasible.
Practical considerations
The theoretic guarantees assume that the penalty parameter is increased at the rate $\mu = 1 + \frac{r}{\sqrt{M}}$, so the worst-case number of required Newton steps is $O(\sqrt{M}\, \ln(1/\varepsilon))$. In theory, if ''μ'' is larger (e.g. 2 or more), then the worst-case number of required Newton steps is in $O(M \ln(1/\varepsilon))$. However, in practice, larger ''μ'' leads to much faster convergence. These methods are called ''long-step methods''.
In practice, if ''μ'' is between 3 and 100, then the program converges within 20-40 Newton steps, regardless of the number of constraints (though the runtime of each Newton step of course grows with the number of constraints). The exact value of ''μ'' within this range has little effect on the performance.
Potential-reduction methods
For potential-reduction methods, the problem is presented in the ''conic form'':

minimize $c^T x$  s.t.  $x \in (b + L) \cap K,$

where ''b'' is a vector in $\mathbb{R}^n$, ''L'' is a linear subspace in $\mathbb{R}^n$ (so ''b'' + ''L'' is an affine plane), and ''K'' is a closed pointed convex cone with a nonempty interior. Every convex program can be converted to the conic form. To use the potential-reduction method (specifically, the extension of Karmarkar's algorithm to convex programming), we need the following assumptions:
* A. The feasible set $(b + L) \cap K$ is bounded, and intersects the interior of the cone ''K''.
* B. We are given in advance a strictly-feasible solution $\hat{x}$, that is, a feasible solution in the interior of ''K''.
* C. We know in advance the optimal objective value, ''c''*, of the problem.
* D. We are given an ''M''-logarithmically-homogeneous self-concordant barrier ''F'' for the cone ''K''.
Assumptions A, B and D are needed in most interior-point methods. Assumption C is specific to Karmarkar's approach; it can be alleviated by using a "sliding objective value". It is possible to further reduce the program to the ''Karmarkar format'':

minimize $s^T x$  s.t.  $x \in M \cap K$ and $e^T x = 1,$

where ''M'' is a linear subspace of $\mathbb{R}^n$, and the optimal objective value is 0.
The method is based on the following scalar potential function:

$v(x) = F(x) + M \ln(s^T x),$

where ''F'' is the ''M''-self-concordant barrier for the feasible cone. It is possible to prove that, when ''x'' is strictly feasible and ''v''(''x'') is very small (i.e., very negative), ''x'' is approximately optimal. The idea of the potential-reduction method is to modify ''x'' such that the potential at each iteration drops by at least a fixed constant ''X'' (specifically, ''X'' = 1/3 - ln(4/3)). This implies that, after ''i'' iterations, the difference between the objective value and the optimal objective value is at most $V \cdot \exp(-i X / M)$, where ''V'' is a data-dependent constant. Therefore, the number of Newton steps required for an ''ε''-approximate solution is at most

$O\left(M \cdot \ln\frac{V}{\varepsilon}\right).$
Note that in path-following methods the corresponding expression is $\sqrt{M}$ rather than ''M'', which is better in theory. But in practice, Karmarkar's method allows taking much larger steps towards the goal, so it may converge much faster than the theoretical guarantees.
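As a small illustration (a sketch, with ''F'', ''M'' and ''s'' supplied by the caller), the potential and its guaranteed per-iteration decrease can be written as:

```python
import numpy as np

def potential(F, M, s, x):
    """Karmarkar-style potential v(x) = F(x) + M * ln(s^T x)."""
    return F(x) + M * np.log(s @ x)

X = 1/3 - np.log(4/3)   # guaranteed decrease per iteration (~0.0457)
```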
Primal-dual methods
The primal-dual method's idea is easy to demonstrate for constrained nonlinear optimization. For simplicity, consider the following nonlinear optimization problem with inequality constraints:

(1)  minimize $f(x)$  s.t.  $x \in \mathbb{R}^n$, $c_i(x) \ge 0$ for $i = 1, \dots, m.$
This inequality-constrained optimization problem is solved by converting it into an unconstrained objective function whose minimum we hope to find efficiently. Specifically, the logarithmic barrier function associated with (1) is

(2)  $B(x, \mu) = f(x) - \mu \sum_{i=1}^m \ln(c_i(x)).$

Here ''μ'' is a small positive scalar, sometimes called the "barrier parameter". As ''μ'' converges to zero, the minimum of $B(x, \mu)$ should converge to a solution of (1).
The gradient of a differentiable function $h$ is denoted $\nabla h$. The gradient of the barrier function is

(3)  $\nabla B(x, \mu) = \nabla f(x) - \mu \sum_{i=1}^m \frac{1}{c_i(x)} \nabla c_i(x).$
In addition to the original ("primal") variable $x$, we introduce a Lagrange-multiplier-inspired dual variable $\lambda \in \mathbb{R}^m$, defined by

(4)  $c_i(x)\, \lambda_i = \mu, \quad \forall i = 1, \dots, m.$

Equation (4) is sometimes called the "perturbed complementarity" condition, for its resemblance to "complementary slackness" in KKT conditions. We try to find those $(x_\mu, \lambda_\mu)$ for which the gradient of the barrier function is zero. Substituting $\lambda_i = \mu / c_i(x)$ from (4) into (3), we get an equation for the gradient:

(5)  $\nabla f(x) - J(x)^T \lambda = 0,$

where the matrix $J$ is the Jacobian of the constraints $c(x)$.
The intuition behind (5) is that the gradient of $f$ should lie in the subspace spanned by the constraints' gradients. The "perturbed complementarity" condition (4) with small $\mu$ can be understood as the condition that the solution should either lie near the boundary $c_i(x) = 0$, or that the projection of the gradient $\nabla f$ on the normal of the constraint component $c_i(x)$ should be almost zero.
Let $(p_x, p_\lambda)$ be the search direction for iteratively updating $(x, \lambda)$. Applying Newton's method to (4) and (5), we get an equation for $(p_x, p_\lambda)$:

$\begin{pmatrix} W & -J^T \\ \Lambda J & C \end{pmatrix} \begin{pmatrix} p_x \\ p_\lambda \end{pmatrix} = \begin{pmatrix} -\nabla f(x) + J^T \lambda \\ \mu 1 - C \lambda \end{pmatrix},$

where $W$ is the Hessian matrix of $f(x)$, $\Lambda$ is a diagonal matrix of $\lambda$, and $C$ is the diagonal matrix of $c(x)$.
Because of (1) and (4), the condition

$\lambda \ge 0$

should be enforced at each step. This can be done by choosing an appropriate step size ''α'':

$(x, \lambda) \leftarrow (x + \alpha p_x,\; \lambda + \alpha p_\lambda).$

A minimal sketch of this Newton step is given below.
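Below is a minimal numpy sketch of one such primal-dual Newton step (an illustration, not a production implementation). It assembles the block system above and uses a simple fraction-to-boundary rule for ''α''; a practical method would also shrink ''α'' to keep $c(x + \alpha p_x) > 0$ and would decrease ''μ'' across iterations.

```python
import numpy as np

def primal_dual_step(grad_f, hess_f, c, jac_c, x, lam, mu):
    """One Newton step on the perturbed conditions (4)-(5).

    grad_f, hess_f -- gradient/Hessian of f (hess_f plays the role of W);
    c, jac_c -- constraint values c(x) (must stay > 0) and Jacobian J(x).
    """
    g, J, cx, W = grad_f(x), jac_c(x), c(x), hess_f(x)
    m, n = J.shape
    Lam, C = np.diag(lam), np.diag(cx)
    # Block system [[W, -J^T], [Lam J, C]] [p_x; p_lam] = [-g + J^T lam; mu*1 - C lam]
    KKT = np.block([[W, -J.T], [Lam @ J, C]])
    rhs = np.concatenate([-g + J.T @ lam, mu * np.ones(m) - cx * lam])
    p = np.linalg.solve(KKT, rhs)
    p_x, p_lam = p[:n], p[n:]
    # Fraction-to-boundary: keep the updated lam strictly positive.
    alpha, neg = 1.0, p_lam < 0
    if neg.any():
        alpha = min(1.0, 0.99 * float(np.min(-lam[neg] / p_lam[neg])))
    return x + alpha * p_x, lam + alpha * p_lam
```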
Types of convex programs solvable via interior-point methods
Here are some special cases of convex programs that can be solved efficiently by interior-point methods.
Linear programs
Consider a linear program of the form:

minimize $c^T x$  s.t.  $Ax \le b$.

We can apply path-following methods with the barrier

$b(x) := -\sum_{j=1}^m \ln(b_j - a_j^T x).$

The function is self-concordant with parameter ''M'' = ''m'' (the number of constraints). Each Newton step requires $O(m n^2)$ arithmetic operations, and the total runtime complexity is $O(m^{3/2} n^2)$.
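For instance, the gradient and Hessian of this barrier can be formed in $O(m n^2)$ arithmetic operations, which is where the per-step cost quoted above comes from (a sketch; ''A'' and ''b'' define the constraints $Ax \le b$):

```python
import numpy as np

def lp_barrier_derivatives(A, b, x):
    """Gradient and Hessian of the barrier b(x) = -sum_j ln(b_j - a_j^T x)."""
    r = b - A @ x                   # slacks; > 0 at a strictly feasible x
    grad = A.T @ (1.0 / r)          # sum_j a_j / r_j
    H = A.T @ (A / r[:, None]**2)   # sum_j a_j a_j^T / r_j^2 -- O(m n^2)
    return grad, H
```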
Quadratically constrained quadratic programs
Given a quadratically constrained quadratic program of the form:

minimize a convex quadratic function  s.t.  $g_j(x) := x^T A_j x + b_j^T x + c_j \le 0$ for $j = 1, \dots, m,$

where all matrices $A_j$ are positive-semidefinite, we can apply path-following methods with the barrier

$b(x) := -\sum_{j=1}^m \ln(-g_j(x)).$

The function is a self-concordant barrier with parameter ''M'' = ''m''. The Newton complexity is $O((m+n) n^2)$, and the total runtime complexity is $O(m^{1/2} (m+n) n^2)$.
Lp norm approximation
Consider a problem of the form

minimize $\left\|\,(v_j - u_j^T x)_{j=1}^m\right\|_p,$

where each $u_j$ is a vector, each $v_j$ is a scalar, and $\|\cdot\|_p$ is an $L^p$ norm with fixed $p \ge 1$. After converting to the standard form, we can apply path-following methods with a self-concordant barrier with parameter ''M'' = 4''m''. The Newton complexity is $O((m+n) n^2)$, and the total runtime complexity is $O(m^{1/2} (m+n) n^2)$.
Geometric programs
Consider a geometric program in convex form, in which the objective and constraints involve sums of exponentials of affine functions. There is a self-concordant barrier with parameter 2''k'' + ''m''. The path-following method has Newton complexity $O(m k^2 + k^3 + n^3)$ and total complexity $O\left((k+m)^{1/2} (m k^2 + k^3 + n^3)\right)$.
Semidefinite programs
Interior point methods can be used to solve semidefinite programs.
See also
* Affine scaling
* Augmented Lagrangian method
* Chambolle-Pock algorithm
* Karush–Kuhn–Tucker conditions
* Penalty method