In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function. It generalizes algorithms such as gradient descent and multiplicative weights.
History
Mirror descent was originally proposed by
Nemirovski and Yudin in 1983.
Motivation
In gradient descent with the sequence of learning rates $(\eta_n)_{n \ge 0}$ applied to a differentiable function $F$, one starts with a guess $x_0$ for a local minimum of $F$, and considers the sequence $x_0, x_1, x_2, \ldots$ such that

$$x_{n+1} = x_n - \eta_n \nabla F(x_n), \qquad n \ge 0.$$
This can be reformulated by noting that

$$x_{n+1} = \arg\min_{x} \left( F(x_n) + \nabla F(x_n)^{\top}(x - x_n) + \frac{1}{2\eta_n}\|x - x_n\|_2^2 \right).$$

In other words, $x_{n+1}$ minimizes the first-order approximation to $F$ at $x_n$ with an added proximity term $\tfrac{1}{2\eta_n}\|x - x_n\|_2^2$.
This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances will yield other algorithms such as Hedge, which may be more suited to optimization over particular geometries.
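For example, on the probability simplex, replacing the squared Euclidean proximity term with the Kullback–Leibler divergence $D_{\mathrm{KL}}(x \,\|\, x_n) = \sum_i x_i \log \frac{x_i}{x_{n,i}}$ (the Bregman distance generated by the negative entropy) gives the update

$$x_{n+1,i} = \frac{x_{n,i}\, e^{-\eta_n [\nabla F(x_n)]_i}}{\sum_j x_{n,j}\, e^{-\eta_n [\nabla F(x_n)]_j}},$$

which is the multiplicative-weights (Hedge) update.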
Formulation
We are given a convex function $f$ to optimize over a convex set $K \subset \mathbb{R}^n$, and given some norm $\|\cdot\|$ on $\mathbb{R}^n$.

We are also given a differentiable convex function $h \colon \mathbb{R}^n \to \mathbb{R}$ that is $\alpha$-strongly convex with respect to the given norm. This is called the ''distance-generating function'', and its gradient $\nabla h \colon \mathbb{R}^n \to \mathbb{R}^n$ is known as the ''mirror map''.
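A standard example of a distance-generating function is the squared Euclidean norm, for which mirror descent reduces to projected gradient descent:

$$h(x) = \tfrac{1}{2}\|x\|_2^2, \qquad \nabla h(x) = x, \qquad D_h(x, y) = \tfrac{1}{2}\|x - y\|_2^2.$$

Here $h$ is $1$-strongly convex with respect to the Euclidean norm, and the mirror map is the identity.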
Starting from initial $x_0 \in K$, each iteration of Mirror Descent performs the following steps (a concrete sketch of these steps is given below the list):
* Map to the dual space: $\theta_t \leftarrow \nabla h(x_t)$
* Update in the dual space using a gradient step: $\theta_{t+1} \leftarrow \theta_t - \eta_t \nabla f(x_t)$
* Map back to the primal space: $x'_{t+1} \leftarrow (\nabla h)^{-1}(\theta_{t+1})$
* Project back to the feasible region $K$: $x_{t+1} \leftarrow \arg\min_{x \in K} D_h(x \,\|\, x'_{t+1})$, where $D_h$ is the Bregman divergence.
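As a minimal illustrative sketch (the function name and interface below are illustrative, not standard), the steps above can be written out for the entropic distance-generating function $h(x) = \sum_i x_i \log x_i$ on the probability simplex, for which the mirror map, its inverse, and the Bregman projection all have closed forms:

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, step_sizes):
    """Mirror descent on the probability simplex with the entropic
    distance-generating function h(x) = sum_i x_i log x_i.

    For this h, the mirror map is (grad h)(x)_i = 1 + log x_i, its inverse is
    theta_i -> exp(theta_i - 1), and the Bregman (KL) projection onto the
    simplex amounts to renormalization, so each iteration is an
    exponentiated-gradient step.
    """
    x = np.asarray(x0, dtype=float)
    for eta in step_sizes:
        g = grad_f(x)
        theta = 1.0 + np.log(x)        # map to the dual space
        theta = theta - eta * g        # gradient step in the dual space
        y = np.exp(theta - 1.0)        # map back to the primal space
        x = y / y.sum()                # Bregman (KL) projection onto the simplex
    return x

# Example: minimize the linear function f(x) = <c, x> over the simplex;
# the iterates concentrate on the coordinate with the smallest cost.
c = np.array([0.3, 0.1, 0.6])
x_min = mirror_descent_simplex(lambda x: c, np.ones(3) / 3, [0.5] * 200)
```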
Extensions
Mirror descent in the
online optimization setting is known as Online Mirror Descent (OMD).
See also
* Gradient descent
* Multiplicative weight update method
* Hedge algorithm
* Bregman divergence