The MM algorithm is an iterative optimization method which exploits the convexity of a function in order to find its maxima or minima. The MM stands for “Majorize-Minimization” or “Minorize-Maximization”, depending on whether the desired optimization is a minimization or a maximization. Despite the name, MM itself is not an algorithm, but a description of how to construct an optimization algorithm. The expectation–maximization algorithm can be treated as a special case of the MM algorithm. However, in the EM algorithm conditional expectations are usually involved, while in the MM algorithm convexity and inequalities are the main focus, and it is easier to understand and apply in most cases.

History

The historical basis for the MM algorithm can be dated back to at least 1970, when Ortega and Rheinboldt were performing studies related to line search methods. The same concept continued to reappear in different areas in different forms. In 2000, Hunter and Lange put forth "MM" as a general framework. Recent studies have applied the method in a wide range of subject areas, such as mathematics, statistics, machine learning and engineering.

Algorithm

The MM algorithm works by finding a surrogate function that minorizes or majorizes the objective function. Optimizing the surrogate function will either improve the value of the objective function or leave it unchanged. Taking the minorize-maximization version, let

f(\theta)

be the concave objective function to be maximized. At step m of the algorithm, where

m = 0, 1, \ldots

, the constructed function

g(\theta, \theta_m)

will be called the minorized version of the objective function (the surrogate function) at

\theta_m

if:

g(\theta, \theta_m) \le f(\theta) \text{ for all } \theta

g(\theta_m, \theta_m) = f(\theta_m)

Then, maximize

g(\theta, \theta_m)

instead of

f(\theta)

, and let:

\theta_{m+1} = \arg\max_{\theta} g(\theta, \theta_m)

The above iterative method will guarantee that

f(\theta_m)

will converge to a local optimum or a saddle point as m goes to infinity. By the above construction:

f(\theta_{m+1}) \ge g(\theta_{m+1}, \theta_m) \ge g(\theta_m, \theta_m) = f(\theta_m)
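Each link in this chain follows from one part of the construction. Written out with the step index explicit (a restatement in the notation used above, keeping the comma-separated arguments of g as in the text):

```latex
\begin{align*}
f(\theta_{m+1}) &\ge g(\theta_{m+1}, \theta_m)
  && \text{since } g(\theta, \theta_m) \le f(\theta) \text{ for all } \theta \\
g(\theta_{m+1}, \theta_m) &\ge g(\theta_m, \theta_m)
  && \text{since } \theta_{m+1} \text{ maximizes } g(\cdot, \theta_m) \\
g(\theta_m, \theta_m) &= f(\theta_m)
  && \text{since the surrogate touches } f \text{ at } \theta_m
\end{align*}
```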

The marching of

\theta_m

and the surrogate functions relative to the objective function is shown in the figure. Majorize-Minimization is the same procedure but with a convex objective to be minimized.
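As an illustration of the majorize-minimization variant, here is a minimal Python sketch that is not taken from the source. It assumes a hypothetical objective, the smoothed sum of absolute deviations f(θ) = Σ sqrt((x_i − θ)² + ε), whose minimizer lies close to the sample median. Each term is majorized by the tangent line of the concave square root, so the surrogate is a weighted least-squares problem whose minimizer is a weighted mean. The function names, the default `eps` and the stopping rule are assumptions made for the example.

```python
import math


def smoothed_abs_loss(theta, xs, eps=1e-6):
    """Objective f(theta) = sum_i sqrt((x_i - theta)^2 + eps)."""
    return sum(math.sqrt((x - theta) ** 2 + eps) for x in xs)


def mm_smoothed_median(xs, theta0=None, eps=1e-6, max_iter=200, tol=1e-10):
    """Majorize-minimize iterations for the smoothed absolute loss.

    Because sqrt is concave, sqrt(u) <= sqrt(u_m) + (u - u_m) / (2 sqrt(u_m)).
    With u = (x_i - theta)^2 + eps this gives a quadratic surrogate that lies
    above f everywhere and touches it at theta_m.
    """
    theta = sum(xs) / len(xs) if theta0 is None else theta0
    history = [smoothed_abs_loss(theta, xs, eps)]
    for _ in range(max_iter):
        # Weights of the quadratic surrogate built at the current iterate.
        weights = [1.0 / math.sqrt((x - theta) ** 2 + eps) for x in xs]
        # The surrogate is minimized by the weighted mean of the data.
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        history.append(smoothed_abs_loss(new_theta, xs, eps))
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, history


if __name__ == "__main__":
    theta, history = mm_smoothed_median([1.0, 2.0, 3.0, 4.0, 100.0])
    print("estimate:", theta)
    print("objective values:", history[:5], "...")
```

Under these assumptions the recorded objective values should never increase from one iteration to the next, which is the minimization counterpart of the ascent property derived above.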

Constructing the surrogate function

One can use any inequality to construct the desired majorized/minorized version of the objective function. Typical choices include:
* Jensen's inequality
* Convexity inequality
* Cauchy–Schwarz inequality
* Inequality of arithmetic and geometric means
* Quadratic majorization/minorization via second-order Taylor expansion of twice-differentiable functions with bounded curvature
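The last choice in the list, quadratic majorization, can be sketched in Python as follows. This is an illustrative example rather than anything given in the source: it assumes a one-dimensional logistic loss f(β) = Σ log(1 + exp(−y_i x_i β)) with labels y_i in {−1, +1}. Since the second derivative of log(1 + e^{−t}) is at most 1/4, the curvature of f is bounded by L = Σ x_i² / 4. The second-order Taylor bound g(β, β_m) = f(β_m) + f′(β_m)(β − β_m) + (L/2)(β − β_m)² then majorizes f and is minimized at β_m − f′(β_m)/L. All function names and the toy data are hypothetical.

```python
import math


def logistic_loss(beta, xs, ys):
    """f(beta) = sum_i log(1 + exp(-y_i * x_i * beta)), labels in {-1, +1}."""
    return sum(math.log1p(math.exp(-y * x * beta)) for x, y in zip(xs, ys))


def logistic_grad(beta, xs, ys):
    """First derivative of the logistic loss with respect to beta."""
    return sum(-y * x / (1.0 + math.exp(y * x * beta)) for x, y in zip(xs, ys))


def mm_quadratic_majorization(xs, ys, beta0=0.0, max_iter=500, tol=1e-10):
    """Minimize the logistic loss by repeatedly minimizing a quadratic majorizer."""
    # Upper bound on the second derivative: each term contributes at most x_i^2 / 4.
    curvature_bound = sum(x * x for x in xs) / 4.0
    beta = beta0
    losses = [logistic_loss(beta, xs, ys)]
    for _ in range(max_iter):
        # Closed-form minimizer of the quadratic surrogate built at beta.
        step = logistic_grad(beta, xs, ys) / curvature_bound
        beta -= step
        losses.append(logistic_loss(beta, xs, ys))
        if abs(step) < tol:
            break
    return beta, losses


if __name__ == "__main__":
    # Toy data that is not linearly separable, so a finite minimizer exists.
    xs = [1.0, 2.0, -1.0, 0.5]
    ys = [1, 1, -1, -1]
    beta, losses = mm_quadratic_majorization(xs, ys)
    print("beta:", beta, "final loss:", losses[-1])
```

With a fixed curvature bound, the update coincides with gradient descent using step size 1/L. The MM view adds the guarantee that the loss cannot increase, because the surrogate lies above the objective everywhere and touches it at the current iterate.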
