The Symmetric Rank 1 (SR1) method is a quasi-Newton method to update the second derivative (Hessian) based on the derivatives (gradients) calculated at two points. It is a generalization of the secant method to a multidimensional problem. This update maintains the ''symmetry'' of the matrix but does ''not'' guarantee that the update be ''positive definite''.

The sequence of Hessian approximations generated by the SR1 method converges to the true Hessian under mild conditions, in theory; in practice, the approximate Hessians generated by the SR1 method show faster progress towards the true Hessian than do popular alternatives (BFGS or DFP), in preliminary numerical experiments. The SR1 method has computational advantages for sparse or partially separable problems.

A twice continuously differentiable function x \mapsto f(x) has a gradient (\nabla f) and Hessian matrix B: the function f has an expansion as a Taylor series at x_0, which can be truncated
::f(x_0+\Delta x) \approx f(x_0)+\nabla f(x_0)^T \Delta x+\frac{1}{2} \Delta x^T B \Delta x ;
its gradient has a Taylor-series approximation also
::\nabla f(x_0+\Delta x) \approx \nabla f(x_0)+B \Delta x ,
which is used to update B. The above secant equation need not have a unique solution B. The SR1 formula computes (via an update of rank 1) the symmetric solution that is closest to the current approximate value B_k:
::B_{k+1}=B_{k}+\frac{(y_k-B_k \Delta x_k)(y_k-B_k \Delta x_k)^T}{(y_k-B_k \Delta x_k)^T \Delta x_k} ,
where
::y_k=\nabla f(x_k+\Delta x_k)-\nabla f(x_k) .
The corresponding update to the approximate inverse-Hessian H_k=B_k^{-1} is
::H_{k+1}=H_{k}+\frac{(\Delta x_k-H_k y_k)(\Delta x_k-H_k y_k)^T}{(\Delta x_k-H_k y_k)^T y_k} .

One might wonder why positive-definiteness is not preserved: after all, a rank-1 update of the form B_{k+1} = B_k + vv^T is positive-definite if B_k is. The explanation is that the update might instead be of the form B_{k+1} = B_k - vv^T, because the denominator can be negative, and in that case there are no guarantees about positive-definiteness.

The SR1 formula has been rediscovered a number of times. Since the denominator can vanish, some authors have suggested that the update be applied only if
::|\Delta x_k^T (y_k-B_k \Delta x_k)| \geq r \, \|\Delta x_k\| \cdot \|y_k-B_k \Delta x_k\| ,
where r\in(0,1) is a small number, e.g. 10^{-8}.
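As a concrete illustration of the update and the skipping rule above, the following is a minimal NumPy sketch; the function name sr1_update, the argument names, and the default threshold r = 1e-8 are illustrative assumptions rather than part of any standard library.

<syntaxhighlight lang="python">
import numpy as np

def sr1_update(B, dx, y, r=1e-8):
    """One SR1 update of the Hessian approximation B (minimal sketch).

    B  : current symmetric approximation (n x n array)
    dx : step Delta x_k = x_{k+1} - x_k
    y  : gradient difference y_k = grad f(x_{k+1}) - grad f(x_k)
    r  : safeguard threshold in (0, 1); the update is skipped when the
         denominator is small relative to ||dx|| * ||y - B dx||.
    """
    v = y - B @ dx                       # y_k - B_k dx_k
    denom = v @ dx                       # (y_k - B_k dx_k)^T dx_k
    if abs(denom) < r * np.linalg.norm(dx) * np.linalg.norm(v):
        return B                         # denominator nearly vanishes: keep B_k
    return B + np.outer(v, v) / denom    # symmetric rank-1 correction

# Illustrative use on a quadratic f(x) = 1/2 x^T A x, whose true Hessian is A,
# so the exact gradient difference along a step dx is simply y = A dx.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.eye(2)                            # initial approximation
dx = np.array([1.0, 0.5])
B = sr1_update(B, dx, A @ dx)            # B now satisfies B dx = A dx
</syntaxhighlight>

Note that the sketch returns the old B_k when the safeguard rejects the step, which is the usual "skip the update" behaviour described above.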


Limited Memory

The SR1 update maintains a dense matrix, which can be prohibitive for large problems. Similar to the L-BFGS method, a limited-memory SR1 (L-SR1) algorithm also exists. Instead of storing the full Hessian approximation, an L-SR1 method only stores the m most recent pairs \{(s_i, y_i)\}_{i=k-m}^{k-1}, where \Delta x_i := s_i and m is an integer much smaller than the problem size (m \ll n). The limited-memory matrix is based on a compact matrix representation built from
::S_k = \begin{bmatrix} s_{k-m} & s_{k-m+1} & \ldots & s_{k-1} \end{bmatrix}, \quad Y_k = \begin{bmatrix} y_{k-m} & y_{k-m+1} & \ldots & y_{k-1} \end{bmatrix},
::\big(L_k\big)_{ij} = \begin{cases} s^T_i y_j & \text{if } i > j \\ 0 & \text{otherwise} \end{cases}, \quad (D_k)_{ii} = s^T_i y_i, \quad k-m \le i, j \le k-1,
that is, L_k is the strictly lower-triangular part and D_k the diagonal of S_k^T Y_k. Since the update can be indefinite, the L-SR1 algorithm is suitable for a trust-region strategy. Because of the limited-memory matrix, the trust-region L-SR1 algorithm scales linearly with the problem size, just like L-BFGS.
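To illustrate why the limited-memory form scales linearly with the problem size, the sketch below applies B_k to a vector through the compact representation of the stored pairs, taking B_0 = gamma*I; the function name lsr1_matvec and the choice of B_0 are assumptions made for illustration, and a full trust-region L-SR1 method would additionally skip near-singular updates and solve a trust-region subproblem with this operator.

<syntaxhighlight lang="python">
import numpy as np

def lsr1_matvec(v, pairs, gamma=1.0):
    """Apply the limited-memory SR1 approximation B_k to a vector v (sketch).

    pairs : list of the m most recent (s_i, y_i) tuples, oldest first
    gamma : scalar defining the base matrix B_0 = gamma * I

    Uses the compact quasi-Newton representation
        B_k = B_0 + J M^{-1} J^T,  J = Y_k - B_0 S_k,
        M = D_k + L_k + L_k^T - S_k^T B_0 S_k,
    so each product costs O(m^2 n) with m << n, i.e. linear in n.
    """
    if not pairs:
        return gamma * v
    S = np.column_stack([s for s, _ in pairs])    # n x m matrix S_k
    Y = np.column_stack([y for _, y in pairs])    # n x m matrix Y_k
    SY = S.T @ Y                                  # m x m matrix S_k^T Y_k
    D = np.diag(np.diag(SY))                      # D_k: diagonal of S_k^T Y_k
    L = np.tril(SY, k=-1)                         # L_k: strictly lower triangle
    J = Y - gamma * S                             # Y_k - B_0 S_k
    M = D + L + L.T - gamma * (S.T @ S)           # small m x m middle matrix
    return gamma * v + J @ np.linalg.solve(M, J.T @ v)
</syntaxhighlight>

Only m-by-m systems are factorized, and only matrices with n rows and m columns are stored, which is the source of the linear scaling mentioned above.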


See also

* Quasi-Newton method
* Broyden's method
* Newton's method in optimization
* Broyden-Fletcher-Goldfarb-Shanno (BFGS) method
* L-BFGS method
* Compact quasi-Newton representation

