In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function. It generalizes algorithms such as gradient descent and multiplicative weights.
History
Mirror descent was originally proposed by
Nemirovski and Yudin in 1983.
Motivation
In gradient descent with the sequence of learning rates $(\eta_n)_{n \ge 0}$ applied to a differentiable function $F$, one starts with a guess $x_0$ for a local minimum of $F$, and considers the sequence $x_0, x_1, x_2, \ldots$ such that

$$x_{n+1} = x_n - \eta_n \nabla F(x_n), \qquad n \ge 0.$$
This can be reformulated by noting that

$$x_{n+1} = \arg\min_{x} \left( F(x_n) + \nabla F(x_n)^{\top}(x - x_n) + \frac{1}{2\eta_n}\|x - x_n\|_2^2 \right).$$

In other words, $x_{n+1}$ minimizes the first-order approximation to $F$ at $x_n$ with an added proximity term $\tfrac{1}{2\eta_n}\|x - x_n\|_2^2$.
This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances will yield other algorithms such as Hedge, which may be more suited to optimization over particular geometries.
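For example, on the probability simplex, replacing the squared Euclidean proximity term with the Kullback–Leibler divergence $D_{\mathrm{KL}}(x \,\|\, x_n) = \sum_i x_i \log \frac{x_i}{x_{n,i}}$ (the Bregman distance generated by the negative entropy) gives the update

$$x_{n+1,i} = \frac{x_{n,i}\, e^{-\eta_n [\nabla F(x_n)]_i}}{\sum_j x_{n,j}\, e^{-\eta_n [\nabla F(x_n)]_j}},$$

which is the multiplicative-weights (Hedge) update.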
Formulation
We are given a convex function $f$ to optimize over a convex set $K \subset \mathbb{R}^n$, and given some norm $\|\cdot\|$ on $\mathbb{R}^n$.

We are also given a differentiable convex function $h \colon \mathbb{R}^n \to \mathbb{R}$ that is $\alpha$-strongly convex with respect to the given norm. This is called the ''distance-generating function'', and its gradient $\nabla h \colon \mathbb{R}^n \to \mathbb{R}^n$ is known as the ''mirror map''.
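A standard example of a distance-generating function is the squared Euclidean norm, for which mirror descent reduces to projected gradient descent:

$$h(x) = \tfrac{1}{2}\|x\|_2^2, \qquad \nabla h(x) = x, \qquad D_h(x, y) = \tfrac{1}{2}\|x - y\|_2^2.$$

Here $h$ is $1$-strongly convex with respect to the Euclidean norm, and the mirror map is the identity.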
Starting from initial $x_0 \in K$, each iteration of Mirror Descent performs the following steps (a concrete sketch of these steps is given below the list):
* Map to the dual space: $\theta_t \leftarrow \nabla h(x_t)$
* Update in the dual space using a gradient step: $\theta_{t+1} \leftarrow \theta_t - \eta_t \nabla f(x_t)$
* Map back to the primal space: $x'_{t+1} \leftarrow (\nabla h)^{-1}(\theta_{t+1})$
* Project back to the feasible region $K$: $x_{t+1} \leftarrow \arg\min_{x \in K} D_h(x \,\|\, x'_{t+1})$, where $D_h$ is the Bregman divergence.
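As a minimal illustrative sketch (the function name and interface below are illustrative, not standard), the steps above can be written out for the entropic distance-generating function $h(x) = \sum_i x_i \log x_i$ on the probability simplex, for which the mirror map, its inverse, and the Bregman projection all have closed forms:

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, step_sizes):
    """Mirror descent on the probability simplex with the entropic
    distance-generating function h(x) = sum_i x_i log x_i.

    For this h, the mirror map is (grad h)(x)_i = 1 + log x_i, its inverse is
    theta_i -> exp(theta_i - 1), and the Bregman (KL) projection onto the
    simplex amounts to renormalization, so each iteration is an
    exponentiated-gradient step.
    """
    x = np.asarray(x0, dtype=float)
    for eta in step_sizes:
        g = grad_f(x)
        theta = 1.0 + np.log(x)        # map to the dual space
        theta = theta - eta * g        # gradient step in the dual space
        y = np.exp(theta - 1.0)        # map back to the primal space
        x = y / y.sum()                # Bregman (KL) projection onto the simplex
    return x

# Example: minimize the linear function f(x) = <c, x> over the simplex;
# the iterates concentrate on the coordinate with the smallest cost.
c = np.array([0.3, 0.1, 0.6])
x_min = mirror_descent_simplex(lambda x: c, np.ones(3) / 3, [0.5] * 200)
```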
Extensions
Mirror descent in the
online optimization setting is known as Online Mirror Descent (OMD).
See also
* Gradient descent
* Multiplicative weight update method
* Hedge algorithm
* Bregman divergence