In optimization, a self-concordant function is a function f : \mathbb{R} \rightarrow \mathbb{R} for which

:|f'''(x)| \leq 2 f''(x)^{3/2}

or, equivalently, a function f : \mathbb{R} \rightarrow \mathbb{R} that, wherever f''(x) > 0, satisfies

:\left| \frac{d}{dx} \frac{1}{\sqrt{f''(x)}} \right| \leq 1

and which satisfies f'''(x) = 0 elsewhere.

More generally, a multivariate function f : \mathbb{R}^n \rightarrow \mathbb{R} is self-concordant if

:\left. \frac{d}{d\alpha} \nabla^2 f(x + \alpha y) \right|_{\alpha = 0} \preceq 2 \sqrt{y^T \nabla^2 f(x) \, y} \; \nabla^2 f(x)

or, equivalently, if its restriction to any arbitrary line is self-concordant.
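As a concrete sanity check, the one-dimensional inequality can be verified numerically. The sketch below uses f(x) = -\log x, for which equality holds at every x > 0; the helper name `check_self_concordant` is our own, not from any library:

```python
# Numeric check of the defining inequality |f'''(x)| <= 2 f''(x)**(3/2),
# sketched for f(x) = -log(x), where equality holds at every x > 0.

def check_self_concordant(f2, f3, xs, M=2.0):
    """True if |f'''(x)| <= M * f''(x)**(3/2) at every sample point (up to rounding)."""
    return all(abs(f3(x)) <= M * f2(x) ** 1.5 * (1.0 + 1e-9) for x in xs)

# f(x) = -log x:  f''(x) = 1/x^2,  f'''(x) = -2/x^3
f2 = lambda x: 1.0 / x**2
f3 = lambda x: -2.0 / x**3

print(check_self_concordant(f2, f3, [0.1 * k for k in range(1, 100)]))  # -> True
```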


History

As mentioned in the "Bibliography Comments" of their 1994 book, self-concordant functions were introduced in 1988 by Yurii Nesterov and further developed with Arkadi Nemirovski. Their basic observation was that the Newton method is affine invariant, in the sense that if for a function f(x) we have Newton steps

:x_{k+1} = x_k - [f''(x_k)]^{-1} f'(x_k)

then for a function \phi(y) = f(Ay), where A is a non-degenerate linear transformation, starting from y_0 = A^{-1} x_0 we have the Newton steps y_k = A^{-1} x_k, which can be shown recursively:

:y_{k+1} = y_k - [\phi''(y_k)]^{-1} \phi'(y_k) = y_k - [A^T f''(A y_k) A]^{-1} A^T f'(A y_k) = A^{-1} x_k - A^{-1} [f''(x_k)]^{-1} f'(x_k) = A^{-1} x_{k+1}

However, the standard analysis of the Newton method supposes that the Hessian of f is Lipschitz continuous, that is, \|f''(x) - f''(y)\| \leq M \|x - y\| for some constant M. If we suppose that f is three times continuously differentiable, then this is equivalent to

:|\langle f'''(x)[u] \, v, v \rangle| \leq M \|u\| \|v\|^2

for all u, v \in \mathbb{R}^n, where f'''(x)[u] = \lim_{\alpha \to 0} \alpha^{-1} [f''(x + \alpha u) - f''(x)]. The left hand side of the above inequality is invariant under the affine transformation f(x) \to \phi(y) = f(Ay), u \to A^{-1} u, v \to A^{-1} v, but the right hand side is not. The authors note that the right hand side can also be made invariant if we replace the Euclidean metric by the scalar product defined by the Hessian of f, that is, \|w\|_{f''(x)} = \langle f''(x) w, w \rangle^{1/2} for w \in \mathbb{R}^n. They then arrive at the definition of a self-concordant function as

:|\langle f'''(x)[u] \, u, u \rangle| \leq M \langle f''(x) u, u \rangle^{3/2}
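The affine invariance of the Newton step can be illustrated numerically in one dimension, where A reduces to a nonzero scalar a. The function f(x) = -\log x + x and the value a = 3 below are illustrative choices of our own:

```python
# Sketch of the affine invariance of the Newton step in 1D (A = a, a != 0),
# using f(x) = -log(x) + x as an illustrative self-concordant function.

def newton_step(fp, fpp, x):
    """One Newton step x_+ = x - f'(x)/f''(x)."""
    return x - fp(x) / fpp(x)

# f(x) = -log x + x:  f'(x) = 1 - 1/x,  f''(x) = 1/x^2
fp  = lambda x: 1.0 - 1.0 / x
fpp = lambda x: 1.0 / x**2

a = 3.0                                  # non-degenerate linear map (arbitrary choice)
phi_p  = lambda y: a * fp(a * y)         # phi(y) = f(a y):  phi'(y)  = a f'(a y)
phi_pp = lambda y: a * a * fpp(a * y)    #                   phi''(y) = a^2 f''(a y)

x0 = 0.5
y0 = x0 / a                              # y_0 = A^{-1} x_0
x1 = newton_step(fp, fpp, x0)
y1 = newton_step(phi_p, phi_pp, y0)
print(abs(a * y1 - x1) < 1e-12)          # -> True: A y_1 = x_1, iterates correspond
```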


Properties


Linear combination

If f_1 and f_2 are self-concordant with constants M_1 and M_2 and \alpha, \beta > 0, then \alpha f_1 + \beta f_2 is self-concordant with constant \max(\alpha^{-1/2} M_1, \beta^{-1/2} M_2).
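The scaling behaviour behind this constant can be checked numerically for a single function: multiplying f by \alpha multiplies the third derivative by \alpha but the Hessian term f''(x)^{3/2} by \alpha^{3/2}, so the constant rescales by \alpha^{-1/2}. A sketch using f(x) = -\log x (for which M = 2):

```python
# Sketch: scaling f by alpha rescales the self-concordance constant by alpha**-0.5,
# checked for f(x) = -log(x) (an illustrative choice with M = 2).
f2 = lambda x: 1.0 / x**2    # f''(x)
f3 = lambda x: -2.0 / x**3   # f'''(x)

alpha, M = 4.0, 2.0
M_scaled = alpha ** -0.5 * M   # claimed constant for alpha * f

ok = all(
    abs(alpha * f3(x)) <= M_scaled * (alpha * f2(x)) ** 1.5 * (1.0 + 1e-9)
    for x in [0.1, 0.5, 1.0, 2.0, 10.0]
)
print(ok)  # -> True: equality holds at every sample point
```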


Affine transformation

If f is self-concordant with constant M and Ax + b is an affine transformation of \mathbb R^n, then \phi(x) = f(Ax+b) is also self-concordant with parameter M.


Convex conjugate

If f is self-concordant, then its convex conjugate f^* is also self-concordant.


Non-singular Hessian

If f is self-concordant and the domain of f contains no straight line (infinite in both directions), then f'' is non-singular.

Conversely, if for some x in the domain of f and some u \in \mathbb{R}^n, u \neq 0, we have \langle f''(x) u, u \rangle = 0, then \langle f''(x + \alpha u) u, u \rangle = 0 for all \alpha for which x + \alpha u is in the domain of f. Then f(x + \alpha u) is linear in \alpha and cannot have a maximum, so all of x + \alpha u, \alpha \in \mathbb{R}, is in the domain of f. We note also that f cannot have a minimum inside its domain.


Applications

Among other things, self-concordant functions are useful in the analysis of Newton's method. Self-concordant ''barrier functions'' are used to develop the barrier functions used in interior point methods for convex and nonlinear optimization. The usual analysis of the Newton method does not work for barrier functions, as their second derivative cannot be Lipschitz continuous; otherwise they would be bounded on any compact subset of \mathbb{R}^n.

Self-concordant barrier functions:
* are a class of functions that can be used as barriers in constrained optimization methods;
* can be minimized using the Newton algorithm with provable convergence properties analogous to the usual case (but these results are somewhat more difficult to derive);
* to obtain both of the above, replace the usual constant bound on the third derivative of the function (required to get the usual convergence results for the Newton method) with a bound relative to the Hessian.


Minimizing a self-concordant function

A self-concordant function may be minimized with a modified Newton method where we have a bound on the number of steps required for convergence. We suppose here that f is a ''standard'' self-concordant function, that is, self-concordant with parameter M = 2.

We define the ''Newton decrement'' \lambda_f(x) of f at x as the size of the Newton step [f''(x)]^{-1} f'(x) in the local norm defined by the Hessian of f at x:

:\lambda_f(x) = \langle f''(x) [f''(x)]^{-1} f'(x), [f''(x)]^{-1} f'(x) \rangle^{1/2} = \langle [f''(x)]^{-1} f'(x), f'(x) \rangle^{1/2}

Then for x in the domain of f, if \lambda_f(x) < 1, it is possible to prove that the Newton iterate

:x_+ = x - [f''(x)]^{-1} f'(x)

will also be in the domain of f. This is because, based on the self-concordance of f, it is possible to give some finite bounds on the value of f(x_+). We further have

:\lambda_f(x_+) \leq \left( \frac{\lambda_f(x)}{1 - \lambda_f(x)} \right)^2

Then if we have

:\lambda_f(x) < \bar\lambda = \frac{3 - \sqrt{5}}{2}

then it is also guaranteed that \lambda_f(x_+) < \lambda_f(x), so that we can continue to use the Newton method until convergence. Note that for \lambda_f(x_+) < \beta for some \beta \in (0, \bar\lambda) we have quadratic convergence of \lambda_f to 0, as \lambda_f(x_+) \leq (1 - \beta)^{-2} \lambda_f(x)^2. This then gives quadratic convergence of f(x_k) to f(x^*) and of x to x^*, where x^* = \arg\min f(x), by the following theorem. If \lambda_f(x) < 1, then

:\omega(\lambda_f(x)) \leq f(x) - f(x^*) \leq \omega_*(\lambda_f(x))
:\omega'(\lambda_f(x)) \leq \|x - x^*\|_x \leq \omega_*'(\lambda_f(x))

with the following definitions:

:\omega(t) = t - \log(1 + t)
:\omega_*(t) = -t - \log(1 - t)
:\|u\|_x = \langle f''(x) u, u \rangle^{1/2}

If we start the Newton method from some x_0 with \lambda_f(x_0) \geq \bar\lambda, then we have to start by using a ''damped Newton method'' defined by

:x_{k+1} = x_k - \frac{1}{1 + \lambda_f(x_k)} [f''(x_k)]^{-1} f'(x_k)

For this it can be shown that f(x_{k+1}) \leq f(x_k) - \omega(\lambda_f(x_k)), with \omega as defined previously. Note that \omega(t) is an increasing function for t > 0, so \omega(t) \geq \omega(\bar\lambda) for any t \geq \bar\lambda; thus the value of f is guaranteed to decrease by a certain amount in each iteration, which also proves that x_{k+1} is in the domain of f.
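The scheme above (damped steps while \lambda_f(x) \geq \bar\lambda, full Newton steps below the threshold) can be sketched in one dimension as follows. The test function f(x) = x - \log x with minimizer x^* = 1 is an illustrative choice, and the helper name `minimize_sc` is our own:

```python
# Minimal sketch of the damped/full Newton scheme for a standard
# self-concordant function (M = 2), using f(x) = x - log(x) on x > 0,
# whose minimizer is x* = 1.
import math

fp  = lambda x: 1.0 - 1.0 / x   # f'(x)
fpp = lambda x: 1.0 / x**2      # f''(x)

LAMBDA_BAR = (3.0 - math.sqrt(5.0)) / 2.0   # threshold from the text

def minimize_sc(x, tol=1e-12, max_iter=100):
    for _ in range(max_iter):
        step = fp(x) / fpp(x)                 # [f''(x)]^{-1} f'(x)
        lam = math.sqrt(fp(x) * step)         # Newton decrement (fp*step = fp^2/fpp >= 0)
        if lam < tol:
            break
        if lam >= LAMBDA_BAR:
            x -= step / (1.0 + lam)           # damped Newton phase
        else:
            x -= step                         # full Newton: quadratic convergence phase
    return x

print(abs(minimize_sc(5.0) - 1.0) < 1e-8)    # -> True
```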


Examples


Self-concordant functions

* Linear and convex quadratic functions are self-concordant with M = 0, since their third derivative is zero.
* Any function f(x) = -\log(-g(x)) - \log x, where g(x) is defined and convex for all x > 0 and satisfies |g'''(x)| \leq 3 g''(x)/x, is self-concordant on its domain \{x \mid x > 0, g(x) < 0\}. Some examples are:
** g(x) = -x^p for 0 < p \leq 1
** g(x) = -\log x
** g(x) = x^p for -1 \leq p \leq 0
** g(x) = (ax + b)^2 / x
** for any function g satisfying the conditions, the function g(x) + a x^2 + bx + c with a \geq 0 also satisfies the conditions

Functions that are not self-concordant:
* f(x) = e^x
* f(x) = \frac{1}{x^p}, \quad x > 0, \; p > 0
* f(x) = |x|^p, \quad p > 2
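The failure of f(x) = e^x can be seen directly: the bound |f'''(x)| \leq 2 f''(x)^{3/2} reads e^x \leq 2 e^{3x/2}, which fails for x < -2 \log 2. A quick numeric check:

```python
# Sketch: f(x) = exp(x) satisfies the self-concordance bound near 0
# but violates it for sufficiently negative x.
import math

def satisfies_bound(x):
    f2 = math.exp(x)   # f''(x)  = e^x
    f3 = math.exp(x)   # f'''(x) = e^x
    return abs(f3) <= 2.0 * f2 ** 1.5

print(satisfies_bound(0.0))    # -> True  (1 <= 2)
print(satisfies_bound(-10.0))  # -> False (bound fails as x -> -infinity)
```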


Self-concordant barriers

* positive half-line \mathbb{R}_+: f(x) = -\log x with domain x > 0 is a self-concordant barrier with M = 1.
* positive orthant \mathbb{R}_+^n: f(x) = -\sum_{i=1}^n \log x_i with M = n
* the logarithmic barrier f(x) = -\log \phi(x) for the quadratic region defined by \phi(x) > 0, where \phi(x) = \alpha + \langle a, x \rangle - \frac{1}{2} \langle Ax, x \rangle and A = A^T \succeq 0 is a positive semi-definite symmetric matrix, is self-concordant with M = 2
* second-order cone \{(x, y) \mid \|x\| \leq y\}: f(x, y) = -\log(y^2 - x^T x)
* semi-definite cone A = A^T \succeq 0: f(A) = -\log \det A
* exponential cone \{(x, y, z) \mid y e^{x/y} \leq z, \; y > 0\}: f(x, y, z) = -\log(y \log(z/y) - x) - \log z - \log y
* power cone \{(x_1, x_2, y) \mid |y| \leq x_1^\alpha x_2^{1-\alpha}\}, 0 < \alpha < 1: f(x_1, x_2, y) = -\log(x_1^{2\alpha} x_2^{2 - 2\alpha} - y^2) - \log x_1 - \log x_2
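For the half-line barrier, the claimed parameter can be checked against the usual barrier-parameter condition \langle f'(x), u \rangle^2 \leq M \langle f''(x) u, u \rangle (our statement of the standard definition, not spelled out in the text above), which for f(x) = -\log x holds with equality at M = 1:

```python
# Sketch: barrier-parameter check f'(x)^2 / f''(x) <= M for f(x) = -log(x),
# which the list above gives with M = 1; equality holds at every x > 0.
fp  = lambda x: -1.0 / x     # f'(x)
fpp = lambda x: 1.0 / x**2   # f''(x)

ok = all(fp(x) ** 2 / fpp(x) <= 1.0 + 1e-12 for x in [0.01, 0.1, 1.0, 7.0, 100.0])
print(ok)  # -> True
```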

