Nonlinear Conjugate Gradient Method
In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function f(x) = \|Ax - b\|^2, the minimum of f is obtained when the gradient is 0: \nabla_x f = 2A^T(Ax - b) = 0. Whereas linear conjugate gradient seeks a solution to the linear equation A^T A x = A^T b, the nonlinear conjugate gradient method is generally used to find the local minimum of a nonlinear function using its gradient \nabla_x f alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there. Given a function f(x) of N variables to minimize, its gradient \nabla_x f indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction: \Delta x_0 = -\nabla_x f(x_0) with an adjustable s ...
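To make the iteration concrete, here is a minimal Python sketch of the Fletcher–Reeves variant with a simple backtracking (Armijo) line search; the test matrix A, vector b, tolerances, and step-halving rule are illustrative choices, not prescribed by the method.

import numpy as np

def nonlinear_cg(f, grad, x0, tol=1e-8, max_iter=200):
    # Fletcher-Reeves nonlinear conjugate gradient with a backtracking
    # (Armijo) line search standing in for a full Wolfe-condition search.
    x = x0.astype(float)
    g = grad(x)
    d = -g                                   # steepest-descent start
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                       # safeguard: restart downhill
            d = -g
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= 0.5                     # halve until sufficient decrease
        x_new = x + alpha * d
        g_new = grad(x_new)
        beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves coefficient
        d = -g_new + beta * d                # new conjugate direction
        x, g = x_new, g_new
    return x

# Test on the quadratic f(x) = ||Ax - b||^2 discussed above.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: np.sum((A @ x - b) ** 2)
grad = lambda x: 2 * A.T @ (A @ x - b)
print(nonlinear_cg(f, grad, np.zeros(2)))    # approaches A^{-1} b = [0.2, 0.4]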

Numerical Optimization
Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries. In the more general approach, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. The generalization of optimization theory and techniques to other formulations constitutes a large area of applied mathematics. More generally, optimization includes finding "best available" values of some objective function given a define ...
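As a toy illustration of "systematically choosing input values from within an allowed set and computing the value of the function", the following brute-force Python sketch minimizes a made-up one-dimensional objective over a hypothetical candidate grid.

import numpy as np

candidates = np.linspace(-5.0, 5.0, 1001)   # hypothetical allowed set of inputs
objective = lambda x: (x - 3.0) ** 2 + 1.0  # hypothetical real-valued criterion

values = objective(candidates)              # compute the value at each input
best = candidates[np.argmin(values)]        # select the best element
print(best, objective(best))                # approximately 3.0 and 1.0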


L-BFGS
Limited-memory BFGS (L-BFGS or LM-BFGS) is an optimization algorithm in the family of quasi-Newton methods that approximates the Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS) using a limited amount of computer memory. It is a popular algorithm for parameter estimation in machine learning. The algorithm's target problem is to minimize f(\mathbf{x}) over unconstrained values of the real vector \mathbf{x}, where f is a differentiable scalar function. Like the original BFGS, L-BFGS uses an estimate of the inverse Hessian matrix to steer its search through variable space, but where BFGS stores a dense n \times n approximation to the inverse Hessian (n being the number of variables in the problem), L-BFGS stores only a few vectors that represent the approximation implicitly. Due to its resulting linear memory requirement, the L-BFGS method is particularly well suited for optimization problems with many variables. Instead of the inverse Hessian H_k, L-BFGS maintains a history of t ...
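The implicit representation can be sketched with the standard two-loop recursion; the function below is a minimal illustration, assuming the caller maintains the m most recent (s, y) difference pairs.

import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    # Two-loop recursion: apply the implicitly stored inverse-Hessian
    # approximation to the gradient g. The history holds pairs
    # s_i = x_{i+1} - x_i and y_i = grad_{i+1} - grad_i, most recent last.
    q = g.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_hist, y_hist)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):
        a = rho * (s @ q)
        q -= a * y
        alphas.append(a)
    # Scale by gamma * I, a common choice of initial inverse-Hessian guess.
    if s_hist:
        gamma = (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
    else:
        gamma = 1.0
    r = gamma * q
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return -r    # search direction; memory cost is O(m*n), not O(n^2)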


Wolfe Conditions
In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods, first published by Philip Wolfe in 1969. In these methods the idea is to find \min_{\mathbf{x}} f(\mathbf{x}) for some smooth f\colon \mathbb{R}^n \to \mathbb{R}. Each step often involves approximately solving the subproblem \min_{\alpha} f(\mathbf{x}_k + \alpha \mathbf{p}_k), where \mathbf{x}_k is the current best guess, \mathbf{p}_k \in \mathbb{R}^n is a search direction, and \alpha \in \mathbb{R} is the step length. The inexact line searches provide an efficient way of computing an acceptable step length \alpha that reduces the objective function 'sufficiently', rather than minimizing the objective function over \alpha \in \mathbb{R}^+ exactly. A line search algorithm can use Wolfe conditions as a requirement for any guessed \alpha, before finding a new search direction \mathbf{p}_k. Armijo rule and curvature A step length \alpha_k is said to satisfy the ' ...
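A minimal Python sketch of checking both inequalities for a trial step follows; the constants c1 = 1e-4 and c2 = 0.9 are customary but adjustable choices, and the test problem is made up.

import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    # Weak Wolfe conditions for a trial step length alpha, with constants
    # 0 < c1 < c2 < 1.
    g0 = grad(x) @ p                                      # slope at x along p
    armijo = f(x + alpha * p) <= f(x) + c1 * alpha * g0   # sufficient decrease
    curvature = grad(x + alpha * p) @ p >= c2 * g0        # curvature condition
    return armijo and curvature

# Example on f(x) = ||x||^2 with a steepest-descent direction.
f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([1.0, 1.0])
p = -grad(x)
print(satisfies_wolfe(f, grad, x, p, alpha=0.25))   # True for this step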



Nelder–Mead Method
The Nelder–Mead method (also downhill simplex method, amoeba method, or polytope method) is a numerical method used to find the minimum or maximum of an objective function in a multidimensional space. It is a direct search method (based on function comparison) and is often applied to nonlinear optimization problems for which derivatives may not be known. However, the Nelder–Mead technique is a heuristic search method that can converge to non-stationary points on problems that can be solved by alternative methods (Yu, Wen Ci. 1979. "Positive basis and a class of direct search techniques". Scientia Sinica [Zhongguo Kexue]: 53–68; Yu, Wen Ci. 1979. "The convergent property of the simplex evolutionary technique". Scientia Sinica [Zhongguo Kexue]: 69–77). The Nelder–Mead technique was proposed by John Nelder and Roger Mead in 1965, as a development of the method of Spendley et al. Overview The method uses the ...
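As a usage sketch (not the original 1965 formulation), SciPy's implementation can be invoked as follows on the Rosenbrock test function; the starting point is an arbitrary illustrative choice.

import numpy as np
from scipy.optimize import minimize

# Derivative-free minimization of the Rosenbrock function, a standard
# nonlinear test problem, using SciPy's Nelder-Mead implementation.
rosenbrock = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="Nelder-Mead")
print(result.x)    # close to the true minimizer (1, 1)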



Conjugate Gradient Method
In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-definite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems. The conjugate gradient method can also be used to solve unconstrained optimization problems such as energy minimization. It is commonly attributed to Magnus Hestenes and Eduard Stiefel, who programmed it on the Z4 and extensively researched it. The biconjugate gradient method provides a generalization to non-symmetric matrices. Various nonlinear conjugate gradient methods seek minima of nonlinear optimization problems. Description of the problem addressed by co ...
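A compact Python sketch of the Hestenes–Stiefel iteration for a symmetric positive-definite system follows; the small test matrix and tolerance are illustrative.

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    # Solve Ax = b for symmetric positive-definite A. Only matrix-vector
    # products with A are needed, so A may be large and sparse.
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x                   # residual, equal to the negative gradient
    p = r.copy()                    # first search direction
    rs = r @ r
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = rs / (p @ Ap)       # exact step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p   # next A-conjugate direction
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))     # approximately [0.0909, 0.6364]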


Broyden–Fletcher–Goldfarb–Shanno Algorithm
In numerical optimization, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is an iterative method for solving unconstrained nonlinear optimization problems. Like the related Davidon–Fletcher–Powell method, BFGS determines the descent direction by preconditioning the gradient with curvature information. It does so by gradually improving an approximation to the Hessian matrix of the loss function, obtained only from gradient evaluations (or approximate gradient evaluations) via a generalized secant method. Since the updates of the BFGS curvature matrix do not require matrix inversion, its computational complexity is only \mathcal{O}(n^2), compared to \mathcal{O}(n^3) in Newton's method. Also in common use is L-BFGS, which is a limited-memory version of BFGS that is particularly suited to problems with very large numbers of variables (e.g., >1000). The BFGS-B variant handles simple box constraints. The algorithm is named after Charles George Broyden, Roger Fletcher, ...
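One BFGS update of the inverse-Hessian approximation can be sketched as below; the expanded rank-two form, the damped fixed step, and the tiny test objective are illustrative simplifications (a real implementation would pair the update with a Wolfe line search).

import numpy as np

def bfgs_update(H, s, y):
    # Rank-two BFGS update of the inverse-Hessian approximation H, written
    # in expanded form so one update costs O(n^2) and needs no inversion.
    # s = x_{k+1} - x_k, y = grad_{k+1} - grad_k; requires s @ y > 0, which
    # a Wolfe-condition line search guarantees.
    sy = s @ y
    Hy = H @ y
    return (H + ((sy + y @ Hy) / sy**2) * np.outer(s, s)
              - (np.outer(Hy, s) + np.outer(s, Hy)) / sy)

# Illustrative quasi-Newton loop on f(x) = x @ x (gradient 2x).
grad = lambda x: 2 * x
x, H = np.array([3.0, -2.0]), np.eye(2)
for _ in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    s = -0.5 * (H @ g)             # damped quasi-Newton step (no line search)
    x_new = x + s
    y = grad(x_new) - g
    if s @ y > 1e-12:              # curvature guard keeps H positive-definite
        H = bfgs_update(H, s, y)
    x = x_new
print(x)                           # converges to the minimizer [0, 0]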

Double Integrator
In systems and control theory, the double integrator is a canonical example of a second-order control system. It models the dynamics of a simple mass in one-dimensional space under the effect of a time-varying force input \mathbf{u}. Differential equations The differential equations which represent a double integrator are: \ddot{q}(t) = u(t), y(t) = q(t), where both q(t), u(t) \in \mathbb{R}. Let us now represent this in state space form with the vector \mathbf{x} = \begin{bmatrix} q \\ \dot{q} \end{bmatrix}: \dot{\mathbf{x}}(t) = \frac{d\mathbf{x}}{dt} = \begin{bmatrix} \dot{q} \\ \ddot{q} \end{bmatrix}. In this representation, it is clear that the control input \mathbf{u} is the second derivative of the output \mathbf{y}. In the scalar form, the control input is the second derivative of the output q. State space representation The normalized state space model of a double integrator t ...
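A minimal forward-Euler simulation of this state-space model is sketched below; the time step, horizon, and constant unit force are illustrative choices.

import numpy as np

# Forward-Euler simulation of the double integrator in state-space form,
# with state x = [q, q_dot].
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])

dt, steps = 0.01, 1000
x = np.zeros(2)
for _ in range(steps):
    u = 1.0                          # constant applied force
    x = x + dt * (A @ x + B * u)     # Euler step of x' = Ax + Bu
y = C @ x                            # output is the position q
print(y)                             # ~ u * t^2 / 2 = 50 at t = 10 s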



Feedback Control
Feedback occurs when outputs of a system are routed back as inputs as part of a chain of cause-and-effect that forms a circuit or loop. The system can then be said to ''feed back'' into itself. The notion of cause-and-effect has to be handled carefully when applied to feedback systems. History Self-regulating mechanisms have existed since antiquity, and the idea of feedback had started to enter economic theory in Britain by the 18th century, but it was not at that time recognized as a universal abstraction and so did not have a name. The first ever known artificial feedback device was a float valve, for maintaining water at a constant level, invented in 270 BC in Alexandria, Egypt. This device illustrated the principle of feedback: a low water level opens the valve, the rising water then provides feedback into the system, closing the valve when the required level is reached. This then reoccurs in a circular fashion as the water level fluctuates. Centrifugal governors were u ...
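The float valve's behaviour can be caricatured as an on/off feedback loop in a few lines of Python; the fill rate, draw-down, and target level are invented numbers, chosen only to show the level settling around the setpoint.

# Toy on/off feedback loop modeled on the float valve: a low water level
# opens the valve, and the rising water feeds back to close it again.
level, target, inflow, outflow = 0.0, 10.0, 0.5, 0.2
valve_open = True
for t in range(100):
    if valve_open:
        level += inflow            # open valve admits water
    level -= outflow               # constant draw-down (disturbance)
    valve_open = level < target    # feedback: the level itself sets the valve
print(round(level, 2))             # oscillates near the target level of 10.0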



Optimal Control
Optimal control theory is a branch of mathematical optimization that deals with finding a control for a dynamical system over a period of time such that an objective function is optimized. It has numerous applications in science, engineering and operations research. For example, the dynamical system might be a spacecraft with controls corresponding to rocket thrusters, and the objective might be to reach the moon with minimum fuel expenditure. Or the dynamical system could be a nation's economy, with the objective to minimize unemployment; the controls in this case could be fiscal and monetary policy. A dynamical system may also be introduced to embed operations research problems within the framework of optimal control theory. Optimal control is an extension of the calculus of variations, and is a mathematical optimization method for deriving control policies. The method is largely due to the work of Lev Pontryagin and Richard Bellman in the 1950s, after contributions to calc ...
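As a small worked instance, the discretized double integrator (a crude "spacecraft") can be driven to a target state with minimum control energy via a least-norm solve; the horizon, time step, and target below are illustrative assumptions, not part of the general theory.

import numpy as np

# Minimum-energy control of the discretized double integrator: choose the
# force sequence u_0..u_{N-1} of least squared norm that drives the state
# from rest at the origin to a hypothetical target (position 1, velocity 0).
dt, N = 0.1, 50
Ad = np.array([[1.0, dt], [0.0, 1.0]])       # discrete-time dynamics
Bd = np.array([0.5 * dt**2, dt])

# x_N = sum_k Ad^(N-1-k) Bd u_k, so stack the columns into a 2 x N map G.
G = np.column_stack([np.linalg.matrix_power(Ad, N - 1 - k) @ Bd
                     for k in range(N)])
target = np.array([1.0, 0.0])
u = np.linalg.pinv(G) @ target               # least-norm control sequence

# Verify by simulating forward.
x = np.zeros(2)
for k in range(N):
    x = Ad @ x + Bd * u[k]
print(np.round(x, 6))                        # approximately [1, 0]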


Quasi-Newton Method
Quasi-Newton methods are methods used to either find zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration. The "full" Newton's method requires the Jacobian in order to search for zeros, or the Hessian for finding extrema. Search for zeros: root finding Newton's method to find zeroes of a function g of multiple variables is given by x_{n+1} = x_n - [J_g(x_n)]^{-1} g(x_n), where [J_g(x_n)]^{-1} is the left inverse of the Jacobian matrix J_g(x_n) of g evaluated for x_n. Strictly speaking, any method that replaces the exact Jacobian J_g(x_n) with an approximation is a quasi-Newton method. For instance, the chord method (where J_g(x_n) is replaced by J_g(x_0) for all iterations) is a simple example. The methods given below for optimization refer to an important subclass of quasi-Newton methods, secant methods. Using methods developed to find extrema in order to fi ...
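The chord method mentioned above is easy to sketch: Newton's iteration with the Jacobian frozen at the starting point. The two-variable test system below is an invented example.

import numpy as np

def chord_method(g, J0, x0, tol=1e-10, max_iter=100):
    # Chord method: Newton iteration that reuses the Jacobian J0 evaluated
    # once at x0, the simplest quasi-Newton scheme.
    x = x0.astype(float)
    for _ in range(max_iter):
        step = np.linalg.solve(J0, g(x))   # frozen Jacobian, fresh residual
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Example: solve g(x, y) = (x^2 + y - 2, y^2 - 1) = 0 near (1, 1).
g = lambda v: np.array([v[0]**2 + v[1] - 2.0, v[1]**2 - 1.0])
x0 = np.array([1.5, 1.5])
J0 = np.array([[2 * x0[0], 1.0],           # exact Jacobian, at x0 only
               [0.0, 2 * x0[1]]])
print(chord_method(g, J0, x0))             # converges to (1, 1)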


Hessian Matrix
In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse originally used the term "functional determinants". Definitions and properties Suppose f : \R^n \to \R is a function taking as input a vector \mathbf{x} \in \R^n and outputting a scalar f(\mathbf{x}) \in \R. If all second-order partial derivatives of f exist, then the Hessian matrix \mathbf{H} of f is a square n \times n matrix, usually defined and arranged as follows: \mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}, or, by stating an equation for the coefficients using indices i and j, (\mathbf{H}_f)_{i,j} = \dfrac{\partial^2 f}{\partial x_i \, \partial x_j} ...
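The coefficient formula lends itself to a finite-difference approximation; the sketch below uses central differences with an arbitrary step h and a made-up test function.

import numpy as np

def numerical_hessian(f, x, h=1e-5):
    # Approximate (H_f)_{i,j} = d^2 f / (dx_i dx_j) at x by central
    # finite differences; accuracy degrades if h is too small or too large.
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = h, h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

# Example: f(x, y) = x^2 y has Hessian [[2y, 2x], [2x, 0]].
f = lambda v: v[0] ** 2 * v[1]
print(np.round(numerical_hessian(f, np.array([1.0, 2.0])), 4))
# approximately [[4, 2], [2, 0]]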