The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method, the reduced gradient algorithm and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956. In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function (taken over the same domain).
Problem statement
Suppose <math>\mathcal{D}</math> is a compact convex set in a vector space and <math>f \colon \mathcal{D} \to \mathbb{R}</math> is a convex, differentiable real-valued function. The Frank–Wolfe algorithm solves the optimization problem
:Minimize <math>f(\mathbf{x})</math>
:subject to <math>\mathbf{x} \in \mathcal{D}</math>.
Algorithm

:''Initialization:'' Let <math>k \leftarrow 0</math>, and let <math>\mathbf{x}_0</math> be any point in <math>\mathcal{D}</math>.
:Step 1. ''Direction-finding subproblem:'' Find <math>\mathbf{s}_k</math> solving
::Minimize <math>\mathbf{s}^{\mathsf{T}} \nabla f(\mathbf{x}_k)</math>
::Subject to <math>\mathbf{s} \in \mathcal{D}</math>
:''(Interpretation: Minimize the linear approximation of the problem given by the first-order Taylor approximation of <math>f</math> around <math>\mathbf{x}_k</math>, constrained to stay within <math>\mathcal{D}</math>.)''
:Step 2. ''Step size determination:'' Set <math>\alpha \leftarrow \tfrac{2}{k+2}</math>, or alternatively find <math>\alpha</math> that minimizes <math>f(\mathbf{x}_k + \alpha(\mathbf{s}_k - \mathbf{x}_k))</math> subject to <math>0 \le \alpha \le 1</math>.
:Step 3. ''Update:'' Let <math>\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \alpha(\mathbf{s}_k - \mathbf{x}_k)</math>, let <math>k \leftarrow k + 1</math>, and go to Step 1.
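The three steps above can be sketched directly in Python. The quadratic objective, the probability-simplex domain, and every name below (`frank_wolfe`, `simplex_lmo`, …) are illustrative choices, not part of the original method description:

```python
import numpy as np

def frank_wolfe(grad_f, lmo, x0, num_iters=200):
    """Frank-Wolfe with the default step size alpha = 2/(k+2).

    grad_f -- gradient of the objective f
    lmo    -- linear minimization oracle: given a gradient g, returns
              argmin_{s in D} s^T g (Step 1, the direction-finding subproblem)
    x0     -- any feasible starting point in D
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        s = lmo(grad_f(x))            # Step 1: direction-finding subproblem
        alpha = 2.0 / (k + 2.0)       # Step 2: default step size
        x = x + alpha * (s - x)       # Step 3: update; a convex combination,
                                      # so x stays in D automatically
    return x

# Illustrative instance: minimize f(x) = 0.5 * ||x - c||^2 over the
# probability simplex {x : x >= 0, sum(x) = 1}. Here the LMO simply picks
# the vertex e_i at the most negative gradient coordinate.
c = np.array([0.2, 0.5, 0.3])
grad_f = lambda x: x - c

def simplex_lmo(g):
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x_star = frank_wolfe(grad_f, simplex_lmo, x0=np.array([1.0, 0.0, 0.0]))
```

Since the minimizer `c` lies in the simplex, the iterates approach it at the sublinear rate discussed under Properties.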
Properties
While competing methods such as
gradient descent
Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate function.
The idea is to take repeated steps in the opposite direction of the gradi ...
for constrained optimization require a
projection step back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a convex problem over the same set in each iteration, and automatically stays in the feasible set.
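A sketch of this projection-free property (the domain and all names here are our own): over the ℓ1 ball the linear subproblem has a closed-form solution at a signed vertex, whereas a projected-gradient step would need a full ℓ1-ball projection.

```python
import numpy as np

def l1_ball_lmo(g, tau):
    """Solve min_{||s||_1 <= tau} s^T g in closed form.

    The minimum of a linear function over the l1 ball is attained at a
    vertex: +-tau * e_i at the coordinate i with the largest |g_i|.
    """
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -tau * np.sign(g[i])
    return s

g = np.array([0.3, -2.0, 0.5])
s = l1_ball_lmo(g, tau=2.0)   # vertex (0, 2, 0): coordinate 1, sign flipped
```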
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective function to the optimum is <math>O(1/k)</math> after ''k'' iterations, so long as the gradient is Lipschitz continuous with respect to some norm. The same convergence rate can also be shown if the sub-problems are only solved approximately.
The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in machine learning and signal processing problems, as well as, for example, the optimization of minimum-cost flows in transportation networks.
If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a linear program.
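A sketch using SciPy's `linprog` as the per-iteration LP solver (the triangle constraints below are an illustrative choice):

```python
import numpy as np
from scipy.optimize import linprog

def polytope_lmo(grad, A_ub, b_ub):
    """Direction-finding subproblem  min_s grad^T s  s.t.  A_ub @ s <= b_ub."""
    res = linprog(c=grad, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * len(grad))
    return res.x

# Illustrative polytope: the triangle x >= 0, y >= 0, x + y <= 1.
A_ub = np.array([[-1.0, 0.0],
                 [0.0, -1.0],
                 [1.0, 1.0]])
b_ub = np.array([0.0, 0.0, 1.0])

s = polytope_lmo(np.array([1.0, -1.0]), A_ub, b_ub)   # vertex (0, 1)
```

The LP solution is always attained at a vertex of the polytope, which is what makes the iterates sparse convex combinations of extreme points.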
While the worst-case convergence rate of <math>O(1/k)</math> cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.
Lower bounds on the solution value, and primal-dual analysis
Since <math>f</math> is convex, for any two points <math>\mathbf{x}, \mathbf{y} \in \mathcal{D}</math> we have:
:<math>f(\mathbf{y}) \geq f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^{\mathsf{T}} \nabla f(\mathbf{x})</math>
This also holds for the (unknown) optimal solution <math>\mathbf{x}^*</math>. That is, <math>f(\mathbf{x}^*) \geq f(\mathbf{x}) + (\mathbf{x}^* - \mathbf{x})^{\mathsf{T}} \nabla f(\mathbf{x})</math>. The best lower bound with respect to a given point <math>\mathbf{x}</math> is given by
:<math>f(\mathbf{x}^*) \geq \min_{\mathbf{y} \in \mathcal{D}} \left\{ f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^{\mathsf{T}} \nabla f(\mathbf{x}) \right\}</math>
The latter optimization problem is solved in every iteration of the Frank–Wolfe algorithm, therefore the solution <math>\mathbf{s}_k</math> of the direction-finding subproblem of the <math>k</math>-th iteration can be used to determine increasing lower bounds <math>l_k</math> during each iteration by setting <math>l_0 = -\infty</math> and
:<math>l_k := \max(l_{k-1},\, f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^{\mathsf{T}} \nabla f(\mathbf{x}_k))</math>
Such lower bounds on the unknown optimal value are important in practice because they can be used as a stopping criterion, and give an efficient certificate of the approximation quality in every iteration, since always <math>l_k \leq f(\mathbf{x}^*) \leq f(\mathbf{x}_k)</math>.
It has been shown that this corresponding duality gap, that is, the difference between <math>f(\mathbf{x}_k)</math> and the lower bound <math>l_k</math>, decreases with the same convergence rate, i.e. <math>f(\mathbf{x}_k) - l_k = O(1/k)</math>.
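Both the lower bound and the gap reuse the subproblem solution, so they cost essentially nothing extra per iteration. A minimal sketch on an illustrative quadratic over the probability simplex (all names and the stopping tolerance are our own choices):

```python
import numpy as np

# Illustrative instance: f(x) = 0.5 * ||x - c||^2 over the probability simplex.
c = np.array([0.2, 0.5, 0.3])
f = lambda x: 0.5 * np.sum((x - c) ** 2)
grad_f = lambda x: x - c

def simplex_lmo(g):
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x = np.array([1.0, 0.0, 0.0])
lower = -np.inf
for k in range(200):
    g = grad_f(x)
    s = simplex_lmo(g)
    gap = (x - s) @ g                # duality gap: gap >= f(x_k) - f(x*)
    lower = max(lower, f(x) - gap)   # l_k = max(l_{k-1}, f(x_k) + (s_k - x_k)^T g)
    if gap <= 1e-8:                  # certificate-based stopping criterion
        break
    x = x + 2.0 / (k + 2.0) * (s - x)
```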
External links
* https://conditional-gradients.org/ : a survey of Frank–Wolfe algorithms.
* Marguerite Frank giving a personal account of the history of the algorithm
See also
* Proximal gradient methods
{{DEFAULTSORT:Frank-Wolfe algorithm}}
Optimization algorithms and methods
Iterative methods
First order methods
Gradient methods