mathematical optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives. It is generally divided into two subfiel ...

theory, duality or the duality principle is the principle that

optimization problem In mathematics, engineering, computer science and economics Economics () is a behavioral science that studies the Production (economics), production, distribution (economics), distribution, and Consumption (economics), consumption of goo ...

s may be viewed from either of two perspectives, the primal problem or the dual problem. If the primal is a minimization problem then the dual is a maximization problem (and vice versa). Any feasible solution to the primal (minimization) problem is at least as large as any feasible solution to the dual (maximization) problem. Therefore, the solution to the primal is an upper bound to the solution of the dual, and the solution of the dual is a lower bound to the solution of the primal. This fact is called weak duality. In general, the optimal values of the primal and dual problems need not be equal. Their difference is called the duality gap. For

convex optimization Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets (or, equivalently, maximizing concave functions over convex sets). Many classes of convex optimization problems ...

problems, the duality gap is zero under a constraint qualification condition. This fact is called strong duality.

Dual problem

Usually the term "dual problem" refers to the ''Lagrangian dual problem'' but other dual problems are used – for example, the Wolfe dual problem and the Fenchel dual problem. The Lagrangian dual problem is obtained by forming the Lagrangian of a minimization problem by using nonnegative

Lagrange multiplier In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function (mathematics), function subject to constraint (mathematics), equation constraints (i.e., subject to the conditio ...

s to add the constraints to the objective function, and then solving for the primal variable values that minimize the original objective function. This solution gives the primal variables as functions of the Lagrange multipliers, which are called dual variables, so that the new problem is to maximize the objective function with respect to the dual variables under the derived constraints on the dual variables (including at least the nonnegativity constraints). In general given two dual pairs of separated locally convex spaces

\left(X,X^*\right)

and

\left(Y,Y^*\right)

and the function

f: X \to \mathbb \cup \

, we can define the primal problem as finding

\hat

such that

f(\hat) = \inf_ f(x). \,

In other words, if

\hat

exists,

f(\hat)

is the minimum of the function

f

and the infimum (greatest lower bound) of the function is attained. If there are constraint conditions, these can be built into the function

f

by letting

\tilde = f + I_

where

I_

is a suitable function on

X

that has a minimum 0 on the constraints, and for which one can prove that

\inf_ \tilde(x) = \inf_ f(x)

. The latter condition is trivially, but not always conveniently, satisfied for the characteristic function (i.e.

I_(x) = 0

for

x

satisfying the constraints and

I_(x) = \infty

otherwise). Then extend

\tilde

to a perturbation function

F: X \times Y \to \mathbb \cup \

such that

F(x,0) = \tilde(x)

. The duality gap is the difference of the right and left hand sides of the inequality :

\sup_ -F^*(0,y^*) \le \inf_ F(x,0), \,

where

F^*

is the convex conjugate in both variables and

\sup

denotes the

supremum In mathematics, the infimum (abbreviated inf; : infima) of a subset S of a partially ordered set P is the greatest element in P that is less than or equal to each element of S, if such an element exists. If the infimum of S exists, it is unique, ...

(least upper bound).

Duality gap

The duality gap is the difference between the values of any primal solutions and any dual solutions. If

d^*

is the optimal dual value and

p^*

is the optimal primal value, then the duality gap is equal to

p^* - d^*

. This value is always greater than or equal to 0 (for minimization problems). The duality gap is zero if and only if strong duality holds. Otherwise the gap is strictly positive and weak duality holds. In computational optimization, another "duality gap" is often reported, which is the difference in value between any dual solution and the value of a feasible but suboptimal iterate for the primal problem. This alternative "duality gap" quantifies the discrepancy between the value of a current feasible but suboptimal iterate for the primal problem and the value of the dual problem; the value of the dual problem is, under regularity conditions, equal to the value of the ''convex relaxation'' of the primal problem: The convex relaxation is the problem arising replacing a non-convex feasible set with its closed convex hull and with replacing a non-convex function with its convex closure, that is the function that has the epigraph that is the closed convex hull of the original primal objective function.

Linear case

Linear programming Linear programming (LP), also called linear optimization, is a method to achieve the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements and objective are represented by linear function#As a polynomia ...

problems are

optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives. It is generally divided into two subfiel ...

problems in which the

objective function In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost ...

and the constraints are all

linear In mathematics, the term ''linear'' is used in two distinct senses for two different properties: * linearity of a '' function'' (or '' mapping''); * linearity of a '' polynomial''. An example of a linear function is the function defined by f(x) ...

. In the primal problem, the objective function is a

linear combination In mathematics, a linear combination or superposition is an Expression (mathematics), expression constructed from a Set (mathematics), set of terms by multiplying each term by a constant and adding the results (e.g. a linear combination of ''x'' a ...

of ''n'' variables. There are ''m'' constraints, each of which places an upper bound on a linear combination of the ''n'' variables. The goal is to maximize the value of the objective function subject to the constraints. A ''solution'' is a vector (a list) of ''n'' values that achieves the maximum value for the objective function. In the dual problem, the objective function is a linear combination of the ''m'' values that are the limits in the ''m'' constraints from the primal problem. There are ''n'' dual constraints, each of which places a lower bound on a linear combination of ''m'' dual variables.

Relationship between the primal problem and the dual problem

In the linear case, in the primal problem, from each sub-optimal point that satisfies all the constraints, there is a direction or subspace of directions to move that increases the objective function. Moving in any such direction is said to remove slack between the

candidate solution In mathematical optimization and computer science, a feasible region, feasible set, or solution space is the set of all possible points (sets of values of the choice variables) of an optimization problem that satisfy the problem's constraints, ...

and one or more constraints. An ''infeasible'' value of the candidate solution is one that exceeds one or more of the constraints. In the dual problem, the dual vector multiplies the constraints that determine the positions of the constraints in the primal. Varying the dual vector in the dual problem is equivalent to revising the upper bounds in the primal problem. The lowest upper bound is sought. That is, the dual vector is minimized in order to remove slack between the candidate positions of the constraints and the actual optimum. An infeasible value of the dual vector is one that is too low. It sets the candidate positions of one or more of the constraints in a position that excludes the actual optimum. This intuition is made formal by the equations in Linear programming: Duality.

Nonlinear case

In nonlinear programming, the constraints are not necessarily linear. Nonetheless, many of the same principles apply. To ensure that the global maximum of a non-linear problem can be identified easily, the problem formulation often requires that the functions be convex and have compact lower level sets. This is the significance of the Karush–Kuhn–Tucker conditions. They provide necessary conditions for identifying local optima of non-linear programming problems. There are additional conditions (constraint qualifications) that are necessary so that it will be possible to define the direction to an ''optimal'' solution. An optimal solution is one that is a local optimum, but possibly not a global optimum.

Lagrange duality

Motivation Suppose we want to solve the following nonlinear programming problem:

$\begin
\text &f_0(x) \\
\text &f_i(x) \leq 0,\ i \in \left \ \\
\end$

The problem has constraints; we would like to convert it to a program without constraints. Theoretically, it is possible to do it by minimizing the function

J(x)

, defined as

$J(x) = f_0(x) + \sum_i I_i(x)$

where

I

is an infinite step function:

I 0

u \leq 0

, and

\infty

otherwise. But

J(x)

is hard to solve as it is not continuous. It is possible to "approximate"

I /math> by \lambda u, where \lambda is a positive constant. This yields a function known as the Lagrangian:

$L(x,\lambda) = f_0(x) + \sum_i \lambda_i f_i(x)$

Note that, for every

x

$\max_ L(x,\lambda) = J(x)$ .

''Proof'': * If x satisfies all constraints ''f_i''(''x'')≤0, then L(''x'',''λ'') is maximized when taking ''λ''=0, and its value is then f(x); * If ''x'' violates some constraint, ''f_i''(''x'')>0 for some ''i'', then L(x,''λ'')→∞ when ''λ_i→∞''. Therefore, the original problem is equivalent to:

$\min_x \max_ L(x,\lambda)$ .

By reversing the order of min and max, we get:

$\max_ \min_x L(x,\lambda)$ .

The ''dual function'' is the inner problem in the above formula:

$g(\lambda) := \min_x L(x,\lambda)$ .

The Lagrangian dual program is the program of maximizing g:

$\max_ g(\lambda)$ .

The optimal solution to the dual program is a lower bound for the optimal solution of the original (primal) program; this is the ''weak duality'' principle. If the primal problem is convex and bounded from below, and there exists a point in which all nonlinear constraints are strictly satisfied ( Slater's condition), then the optimal solution to the dual program ''equals'' the optimal solution of the primal program; this is the ''strong duality'' principle. In this case, we can solve the primal program by finding an optimal solution ''λ''* to the dual program, and then solving:

$\min_x L(x,\lambda^*)$ .

Note that, to use either the weak or the strong duality principle, we need a way to compute g(''λ''). In general this may be hard, as we need to solve a different minimization problem for every ''λ''. But for some classes of functions, it is possible to get an explicit formula for g(). Solving the primal and dual programs together is often easier than solving only one of them. Examples are

linear programming Linear programming (LP), also called linear optimization, is a method to achieve the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements and objective are represented by linear function#As a polynomia ...

and quadratic programming. A better and more general approach to duality is provided by Fenchel's duality theorem. Another condition in which the min-max and max-min are equal is when the Lagrangian has a

saddle point In mathematics, a saddle point or minimax point is a Point (geometry), point on the surface (mathematics), surface of the graph of a function where the slopes (derivatives) in orthogonal directions are all zero (a Critical point (mathematics), ...

: (x∗, λ∗) is a saddle point of the Lagrange function L if and only if x∗ is an optimal solution to the primal, λ∗ is an optimal solution to the dual, and the optimal values in the indicated problems are equal to each other.

The strong Lagrange principle

Given a nonlinear programming problem in standard form :

\begin
\text    &f_0(x) \\
\text &f_i(x) \leq 0,\ i \in \left \ \\
                    &h_i(x) = 0,\ i \in \left \
\end

with the domain

\mathcal \subset \mathbb^n

having non-empty interior, the ''Lagrangian function''

\mathcal: \mathbb^n \times \mathbb^m \times \mathbb^p \to \mathbb

is defined as :

\mathcal(x,\lambda,\nu) = f_0(x) + \sum_^m \lambda_i f_i(x) + \sum_^p \nu_i h_i(x).

The vectors

\lambda

and

\nu

are called the ''dual variables'' or ''Lagrange multiplier vectors'' associated with the problem. The ''Lagrange dual function''

g:\mathbb^m \times \mathbb^p \to \mathbb

is defined as :

g(\lambda,\nu) = \inf_ \mathcal(x,\lambda,\nu) = \inf_ \left\.

The dual function ''g'' is concave, even when the initial problem is not convex, because it is a point-wise infimum of affine functions. The dual function yields lower bounds on the optimal value

p^*

of the initial problem; for any

\lambda \geq 0

and any

\nu

we have

g(\lambda,\nu) \leq p^*

. If a constraint qualification such as Slater's condition holds and the original problem is convex, then we have strong duality, i.e.

d^* = \max_ g(\lambda,\nu) = \inf f_0 = p^*

Convex problems

For a convex minimization problem with inequality constraints, :

\begin
&\underset& & f(x) \\
&\operatorname
& &g_i(x) \leq 0, \quad i = 1,\ldots,m
\end

the Lagrangian dual problem is :

\begin
&\underset& & \inf_x \left(f(x) + \sum_^m u_j g_j(x)\right) \\
&\operatorname
& &u_i \geq 0, \quad i = 1,\ldots,m
\end

where the objective function is the Lagrange dual function. Provided that the functions

f

and

g_1, \ldots, g_m

are continuously differentiable, the infimum occurs where the gradient is equal to zero. The problem :

\begin
&\underset& & f(x) + \sum_^m u_j g_j(x) \\
&\operatorname
& & \nabla f(x) + \sum_^m u_j \, \nabla g_j(x) = 0 \\
&&&u_i \geq 0, \quad i = 1,\ldots,m
\end

is called the Wolfe dual problem. This problem may be difficult to deal with computationally, because the objective function is not concave in the joint variables

(u,x)

. Also, the equality constraint

\nabla f(x) + \sum_^m u_j \, \nabla g_j(x)

is nonlinear in general, so the Wolfe dual problem is typically a nonconvex optimization problem. In any case, weak duality holds.

History

According to George Dantzig, the duality theorem for linear optimization was conjectured by

John von Neumann John von Neumann ( ; ; December 28, 1903 – February 8, 1957) was a Hungarian and American mathematician, physicist, computer scientist and engineer. Von Neumann had perhaps the widest coverage of any mathematician of his time, in ...

immediately after Dantzig presented the linear programming problem. Von Neumann noted that he was using information from his

game theory Game theory is the study of mathematical models of strategic interactions. It has applications in many fields of social science, and is used extensively in economics, logic, systems science and computer science. Initially, game theory addressed ...

, and conjectured that two person zero sum matrix game was equivalent to linear programming. Rigorous proofs were first published in 1948 by Albert W. Tucker and his group. (Dantzig's foreword to Nering and Tucker, 1993)

Applications

In support vector machines (SVMs), formulating the primal problem of SVMs as the dual problem can be used to implement the Kernel trick, but the latter has higher time complexity in the historical cases.

Notes

References

Books

* * * * * * * * * * * * * * * *

Articles

* *
Duality in Linear Programming
Gary D. Knott {{Convex analysis and variational analysis Convex optimization Linear programming

Dual problem

Duality gap

Linear case

Relationship between the primal problem and the dual problem

Nonlinear case

Lagrange duality

The strong Lagrange principle

Convex problems

History

Applications

See also

Notes

References

Books

Articles