The Bregman method is an iterative algorithm to solve certain convex optimization problems involving regularization. The original version is due to Lev M. Bregman, who published it in 1967.
The algorithm is a row-action method accessing constraint functions one by one, and it is particularly suited for large optimization problems whose constraints can be efficiently enumerated. The algorithm works particularly well for regularizers such as the <math>\ell_1</math> norm, where it converges very quickly because of an error-cancellation effect.
Algorithm
In order to be able to use the Bregman method, one must frame the problem of interest as finding <math>\min_u J(u) + f(u)</math>, where <math>J</math> is a regularizing function such as <math>\|\cdot\|_1</math>.
The ''Bregman distance'' is defined as <math>D_J^p(u, v) := J(u) - J(v) - \langle p, u - v\rangle</math>, where <math>p</math> belongs to the subdifferential of <math>J</math> at <math>v</math> (which we denote <math>\partial J(v)</math>).
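As an illustration (not part of the original article), for the smooth choice <math>J(u) = \tfrac12\|u\|_2^2</math> the subdifferential at <math>v</math> is the singleton <math>\{v\}</math>, and the Bregman distance reduces to the squared Euclidean distance <math>\tfrac12\|u - v\|_2^2</math>. A minimal numerical check:

```python
import numpy as np

def bregman_distance(J, grad_J, u, v):
    """D_J^p(u, v) = J(u) - J(v) - <p, u - v> with p = grad_J(v).

    Assumes J is differentiable at v, so the subdifferential
    there is the singleton {grad_J(v)}.
    """
    p = grad_J(v)
    return J(u) - J(v) - p @ (u - v)

# For J(u) = 0.5*||u||^2, the Bregman distance is 0.5*||u - v||^2.
J = lambda u: 0.5 * (u @ u)
grad_J = lambda v: v

u = np.array([1.0, 2.0, -1.0])
v = np.array([0.5, 0.0, 1.0])
d = bregman_distance(J, grad_J, u, v)
```

Here `d` agrees with `0.5 * np.sum((u - v) ** 2)`, confirming the reduction for this particular <math>J</math>.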
One performs the iteration <math>u_{k+1} := \min_u(\alpha D(u, u_k) + f(u))</math>, with <math>\alpha</math> a constant to be chosen by the user (and the minimization performed by an ordinary convex optimization algorithm), or <math>u_{k+1} := \min_u(D^{p_k}(u, u_k) + f(u))</math>, with <math>p_k</math> chosen each time to be a member of <math>\partial J(u_k)</math>.
The algorithm starts with a pair of primal and dual variables. Then, for each constraint, a generalized projection onto its feasible set is performed, updating both the constraint's dual variable and all primal variables for which there are non-zero coefficients in the constraint function's gradient. If the objective is strictly convex and all constraint functions are convex, the limit of this iterative projection converges to the optimal primal-dual pair.
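As a hedged sketch (an illustration, not from the original article), take the quadratic regularizer <math>J(u) = \tfrac12\|u\|_2^2</math> and <math>f(u) = \tfrac12\|Au - b\|_2^2</math>. Then the Bregman distance is <math>\tfrac12\|u - u_k\|_2^2</math> and each inner minimization has a closed form:

```python
import numpy as np

# Bregman iteration for J(u) = 0.5*||u||^2, f(u) = 0.5*||Au - b||^2.
# Each step minimizes  alpha/2*||u - u_k||^2 + 1/2*||Au - b||^2,
# whose optimality condition is  (alpha*I + A^T A) u = alpha*u_k + A^T b.
def bregman_iteration(A, b, alpha=1.0, n_iter=100):
    m, n = A.shape
    u = np.zeros(n)
    M = alpha * np.eye(n) + A.T @ A
    for _ in range(n_iter):
        u = np.linalg.solve(M, alpha * u + A.T @ b)
    return u

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([1.0, 2.0])
u = bregman_iteration(A, b)
# The iterates progressively drive the residual Au - b to zero.
```

Note how a single step is only a damped least-squares solve; it is the repeated Bregman update that forces <math>Au = b</math> in the limit.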
In the case of a basis pursuit-type problem <math>\min_{x:\, Ax = b} \|x\|_1 + \tfrac{1}{2\alpha}\|x\|_2^2</math>, the Bregman method is equivalent to ordinary gradient descent on the dual problem.
An exact regularization-type effect also occurs in this case: if <math>\alpha</math> exceeds a certain threshold, the optimum value of <math>x</math> is precisely the optimum solution of <math>\min_{x:\, Ax = b} \|x\|_1</math>.
Applications
The Bregman method or its generalizations can be applied to:
* Image deblurring or denoising (including total variation denoising)
* MR image reconstruction
* Radar
* Hyperspectral imaging
* Compressed sensing
* Least absolute deviations or <math>\ell_1</math>-regularized linear regression
* Covariance selection (learning a sparse covariance matrix)
* Matrix completion
* Structural risk minimization
Generalizations and drawbacks
The method has links to the method of multipliers and dual ascent method (through the so-called ''Bregman alternating direction method of multipliers'', generalizing the ''alternating direction method of multipliers''), and multiple generalizations exist.
One drawback of the method is that it is only provably convergent if the objective function is ''strictly'' convex. In case this cannot be ensured, as for linear programs or non-strictly convex quadratic programs, additional methods such as proximal gradient methods have been developed. In the case of the Rudin-Osher-Fatemi model of image denoising, the Bregman method provably converges.
Some generalizations of the Bregman method include:
* Inverse scale space method
* Linearized Bregman
* Logistic Bregman
* Split Bregman
Linearized Bregman
In the Linearized Bregman method, one linearizes the intermediate objective functions <math>D^{p_k}(u, u_k) + f(u)</math> by replacing the second term with <math>f(u_k) + \langle \nabla f(u_k), u - u_k\rangle</math> (which approximates the second term near <math>u_k</math>) and adding the penalty term <math>\frac{1}{2\delta}\|u - u_k\|_2^2</math> for a constant <math>\delta</math>. The result is much more computationally tractable, especially in basis pursuit-type problems.
In the case of a generic basis pursuit problem <math>\min_x \mu\|x\|_1 + \tfrac12\|Ax - b\|_2^2</math>, one can express the iteration as <math>v^{k+1} := v^k + A^{\mathsf T}(b - Au^k)</math>, <math>u_i^{k+1} := \delta \cdot \operatorname{shrink}(v_i^{k+1}, \mu)</math> for each component <math>i</math>, where we define <math>\operatorname{shrink}(y, a) := \operatorname{sgn}(y)\max(|y| - a, 0)</math>.
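A minimal runnable sketch of this iteration (an illustration, not from the original article; the matrix and parameter values below are arbitrary choices), with the shrinkage operator applied componentwise:

```python
import numpy as np

def shrink(y, a):
    """Componentwise soft-thresholding: sgn(y) * max(|y| - a, 0)."""
    return np.sign(y) * np.maximum(np.abs(y) - a, 0.0)

def linearized_bregman(A, b, mu=1.0, delta=1.0, n_iter=100):
    """Linearized Bregman iteration for min mu*||x||_1 s.t. Ax = b
    (with an additional 1/(2*delta)*||x||^2 regularization term)."""
    m, n = A.shape
    u = np.zeros(n)
    v = np.zeros(n)
    for _ in range(n_iter):
        v = v + A.T @ (b - A @ u)  # accumulate the residual
        u = delta * shrink(v, mu)  # scaled soft-thresholding
    return u

# Example: A selects the first two coordinates, so the constraint
# Ax = b pins x[0] and x[1]; x[2] is never driven away from 0.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.array([2.0, -1.5])
u = linearized_bregman(A, b)
```

For this toy system the iterates settle onto the feasible point <math>(2, -1.5, 0)</math> after only a few steps; for general <math>A</math>, <math>\delta</math> must be chosen small enough relative to <math>\|A\|</math> for the iteration to converge.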
Sometimes, when running the Linearized Bregman method, there are periods of "stagnation" where the residual is almost constant. To alleviate this issue, one can use the Linearized Bregman method with ''kicking'', where one essentially detects the beginning of a stagnation period, then predicts and skips to the end of it.
Since Linearized Bregman is mathematically equivalent to gradient descent, it can be accelerated with methods to accelerate gradient descent, such as line search, L-BFGS, Barzilai-Borwein steps, or the Nesterov method; the last has been proposed as the ''accelerated linearized Bregman method''.
Split Bregman
The Split Bregman method solves problems of the form <math>\min_u \|\Phi(u)\|_1 + H(u)</math>, where <math>\Phi</math> and <math>H</math> are both convex, particularly problems of the form <math>\min_u \|\Phi u\|_1 + H(u)</math> for a matrix <math>\Phi</math>. We start by rewriting it as the constrained optimization problem <math>\min_{u,v:\, v = \Phi u} \|v\|_1 + H(u)</math>, then relax it into <math>\min_{u,v} \|v\|_1 + H(u) + \frac{\lambda}{2}\|v - \Phi u\|_2^2</math>, where <math>\lambda</math> is a constant. By defining <math>J(u,v) := \|v\|_1 + H(u)</math>, one reduces the problem to one that can be solved with the ordinary Bregman algorithm.
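To make the alternating structure concrete, here is a hedged sketch (an illustration, not from the original article) for the special case <math>\Phi = I</math> and <math>H(u) = \tfrac{\eta}{2}\|u - f\|_2^2</math>, written in the ADMM-equivalent scaled form of the split Bregman updates, where both subproblems have closed forms:

```python
import numpy as np

def shrink(y, a):
    """Componentwise soft-thresholding: sgn(y) * max(|y| - a, 0)."""
    return np.sign(y) * np.maximum(np.abs(y) - a, 0.0)

def split_bregman_denoise(f, eta=1.0, lam=1.0, n_iter=200):
    """Minimize ||u||_1 + (eta/2)*||u - f||^2 via the splitting v = u.

    Scaled-form updates (w accumulates the constraint residual,
    playing the role of the Bregman/dual variable):
      u-step: minimize (eta/2)*||u-f||^2 + (lam/2)*||u - v + w||^2
      v-step: minimize ||v||_1 + (lam/2)*||u - v + w||^2
      w-step: w += u - v
    """
    u = np.zeros_like(f)
    v = np.zeros_like(f)
    w = np.zeros_like(f)
    for _ in range(n_iter):
        u = (eta * f + lam * (v - w)) / (eta + lam)  # quadratic solve
        v = shrink(u + w, 1.0 / lam)                 # soft-threshold
        w = w + u - v                                # Bregman update
    return u

f = np.array([2.0, 0.3, -1.0])
u = split_bregman_denoise(f)
# The minimizer is the soft-threshold of f at level 1/eta.
```

The payoff of the splitting is visible here: the <math>\ell_1</math> term only ever appears in a subproblem solved by componentwise shrinkage, while <math>H</math> only appears in a smooth quadratic solve.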
The Split Bregman method has been generalized to optimization over complex numbers using Wirtinger derivatives.
References
{{reflist}}
Optimization algorithms and methods
Convex optimization