The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem. It has applications in geophysics, seismic imaging, photonics and more recently in neural networks. The adjoint state space is chosen to simplify the physical interpretation of equation constraints (Plessix, R-E. "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications." Geophysical Journal International, 2006, 167(2): 495–503). Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters as a constrained optimization problem. By using the dual form of this constrained optimization problem, the gradient can be computed very quickly. A useful property is that the number of computations is independent of the number of parameters for which the gradient is wanted. The adjoint method is derived from the dual problem and is used e.g. in the Landweber iteration method.

The name adjoint state method refers to the dual form of the problem, where the adjoint matrix A^*=\overline{A}^T is used. When the initial problem consists of calculating the product s^T x, where x must satisfy Ax=b, the dual problem can be realized as calculating the product r^T b, where r must satisfy A^* r = s; r is called the adjoint state vector.
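This dual identity is easy to check numerically. The following sketch (using NumPy; the matrix and vectors are arbitrary test data, and the real case is taken so that A^* = A^T) verifies that s^T x and r^T b agree:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # shifted to keep A invertible
b = rng.standard_normal(4)
s = rng.standard_normal(4)

# Direct computation: solve Ax = b, then take s^T x.
x = np.linalg.solve(A, b)
direct = s @ x

# Adjoint computation: solve A^T r = s, then take r^T b.
r = np.linalg.solve(A.T, s)
adjoint = r @ b
```

The payoff appears when s^T x is needed for many right-hand sides b: a single adjoint solve for r turns each evaluation into one inner product r^T b.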


General case

The original adjoint calculation method goes back to Jean Céa, who used the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable u \in \mathcal{U}, an optimization variable v\in\mathcal{V}, an objective functional J:\mathcal{U}\times\mathcal{V}\to\mathbb{R} is defined. The state variable u is often implicitly dependent on v through the (direct) state equation D_v(u)=0 (usually the weak form of a partial differential equation), thus the considered objective is j(v) = J(u_v,v). Usually, one would be interested in calculating \nabla j(v) using the chain rule:

: \nabla j(v) = \nabla_v J(u_v,v) + \nabla_u J(u_v,v)\,\nabla_v u_v.

Unfortunately, the term \nabla_v u_v is often very hard to obtain analytically, since the dependence of u_v on v is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of j, the problem

: \text{minimize}\ j(v) = J(u_v,v)
: \text{subject to}\ D_v(u_v) = 0

has an associated Lagrangian functional \mathcal{L}:\mathcal{U}\times\mathcal{V}\times\mathcal{U} \to \mathbb{R} defined by

: \mathcal{L}(u,v,\lambda) = J(u,v) + \langle D_v(u),\lambda\rangle,

where \lambda\in\mathcal{U} is a Lagrange multiplier or adjoint state variable and \langle\cdot,\cdot\rangle is an inner product on \mathcal{U}. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

: \begin{cases} d_u\mathcal{L}(u,v,\lambda;\delta_u) = d_uJ(u,v;\delta_u) + \langle \delta_u,D_v^*(\lambda)\rangle = 0 & \forall \delta_u \in \mathcal{U},\\ d_v\mathcal{L}(u,v,\lambda;\delta_v) = d_vJ(u,v;\delta_v) + \langle d_vD_v(u;\delta_v),\lambda\rangle = 0 & \forall \delta_v\in\mathcal{V},\\ d_\lambda\mathcal{L}(u,v,\lambda;\delta_\lambda) = \langle D_v(u), \delta_\lambda\rangle = 0 & \forall \delta_\lambda \in \mathcal{U}, \end{cases}

where d_xF(x;\delta_x) is the Gateaux derivative of F with respect to x in the direction \delta_x. The last equation is equivalent to D_v(u) = 0, the state equation, whose solution is u_v. The first equation is the so-called adjoint state equation,

: \langle \delta_u,D_v^*(\lambda)\rangle = -d_uJ(u_v,v;\delta_u) \quad \forall \delta_u \in \mathcal{U},

because the operator involved is the adjoint operator of D_v, D_v^*. Resolving this equation yields the adjoint state \lambda_v. The gradient of the quantity of interest j with respect to v satisfies \langle \nabla j(v),\delta_v\rangle = d_v j(v;\delta_v) = d_v \mathcal{L}(u_v,v,\lambda_v;\delta_v) (the second equation with u=u_v and \lambda=\lambda_v), thus it can be easily identified by subsequently resolving the direct and adjoint state equations. The process is even simpler when the operator D_v is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
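The recipe — solve the direct state equation, solve the adjoint equation, assemble the gradient from d_v\mathcal{L} — can be illustrated on a hypothetical scalar problem. The state equation D_v(u) = u^3 + u - v = 0 and objective J(u,v) = u^2 below are made up purely for illustration; the adjoint-based gradient is checked against a finite difference of j(v) = u_v^2:

```python
def solve_state(v, iters=50):
    """Solve the direct state equation u^3 + u - v = 0 by Newton's method."""
    u = 0.0
    for _ in range(iters):
        u -= (u**3 + u - v) / (3 * u**2 + 1)
    return u

def grad_adjoint(v):
    u = solve_state(v)
    # Adjoint equation: (dD/du) * lam = -dJ/du, i.e. (3u^2 + 1) lam = -2u.
    lam = -2 * u / (3 * u**2 + 1)
    # Gradient: dj/dv = dJ/dv + lam * dD/dv, with dJ/dv = 0 and dD/dv = -1.
    return -lam

# Finite-difference check of dj/dv at an arbitrary point.
v0, h = 1.7, 1e-6
fd = (solve_state(v0 + h)**2 - solve_state(v0 - h)**2) / (2 * h)
```

Note that no explicit expression for du/dv was ever needed; the implicit dependence is handled entirely by the adjoint variable.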


Example: Linear case

In a real finite-dimensional linear programming context, the objective function could be J(u,v) = \langle Au , v \rangle, for v\in\mathbb{R}^n, u\in\mathbb{R}^m and A \in \mathbb{R}^{n\times m}, and let the state equation be B_vu = b, with B_v \in \mathbb{R}^{m\times m} and b\in \mathbb{R}^m.

The Lagrangian function of the problem is \mathcal{L}(u,v,\lambda) = \langle Au,v\rangle + \langle B_vu - b,\lambda\rangle, where \lambda\in\mathbb{R}^m.

The derivative of \mathcal{L} with respect to \lambda yields the state equation as shown before, and the state variable is u_v = B_v^{-1}b. The derivative of \mathcal{L} with respect to u is equivalent to the adjoint equation, which is, for every \delta_u \in \mathbb{R}^m,

: d_u[\langle B_v\cdot - b, \lambda\rangle](u;\delta_u) = -\langle A^\top v,\delta_u\rangle \iff \langle B_v \delta_u,\lambda\rangle = -\langle A^\top v,\delta_u\rangle \iff \langle B_v^\top \lambda + A^\top v,\delta_u\rangle = 0 \iff B_v^\top \lambda = -A^\top v.

Thus, we can write symbolically \lambda_v = -B_v^{-\top}A^\top v. The gradient would be

: \langle \nabla j(v),\delta_v\rangle = \langle Au_v,\delta_v\rangle + \langle \nabla_vB_v : \lambda_v\otimes u_v,\delta_v\rangle,

where \nabla_v B_v = \frac{\partial B_v}{\partial v} is a third-order tensor, \lambda_v \otimes u_v = \lambda_v u_v^\top is the dyadic product between the direct and adjoint states, and : denotes a double tensor contraction. It is assumed that B_v has a known analytic expression that can be differentiated easily.
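A minimal numerical sketch of this linear case follows (using NumPy; the matrices A, B_0, C_k and vector b are made-up test data, with B_v chosen affine in v so that \nabla_v B_v is a constant third-order tensor). The adjoint gradient is verified against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A  = rng.standard_normal((n, m))
b  = rng.standard_normal(m)
B0 = rng.standard_normal((m, m)) + 5 * np.eye(m)  # shifted to keep B_v invertible
C  = rng.standard_normal((n, m, m))               # dB_v/dv_k = C[k]

def B(v):
    """Affine parameterization B_v = B0 + sum_k v_k C_k (illustrative choice)."""
    return B0 + np.einsum('k,kij->ij', v, C)

def j(v):
    u = np.linalg.solve(B(v), b)   # direct state equation B_v u = b
    return A @ u @ v               # J(u, v) = <Au, v>

def grad_adjoint(v):
    u   = np.linalg.solve(B(v), b)
    lam = np.linalg.solve(B(v).T, -A.T @ v)   # adjoint equation B_v^T lam = -A^T v
    # Gradient component k: (A u)_k + lam^T C_k u  (the double contraction term)
    return A @ u + np.einsum('i,kij,j->k', lam, C, u)

v0 = 0.1 * rng.standard_normal(n)
g  = grad_adjoint(v0)
h  = 1e-6
fd = np.array([(j(v0 + h * e) - j(v0 - h * e)) / (2 * h) for e in np.eye(n)])
```

Only two m-by-m linear solves are needed regardless of the number n of parameters, which is the central advantage of the adjoint method.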


Numerical consideration for the self-adjoint case

If the operator B_v were self-adjoint, B_v = B_v^\top, the direct state equation and the adjoint state equation would have the same left-hand side. With the goal of never inverting a matrix, which is numerically very slow, an LU decomposition can be used instead to solve the state equation, in O(m^3) operations for the decomposition and O(m^2) operations for the resolution. That same decomposition can then be used to solve the adjoint state equation in only O(m^2) operations, since the matrices are the same.
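A sketch of this reuse, with a hand-rolled Doolittle LU factorization (no pivoting, so the example matrix is made symmetric positive definite to keep that valid; the matrix and vectors are made-up test data). The single O(m^3) factorization serves both triangular solves:

```python
import numpy as np

def lu(Bmat):
    """Doolittle LU factorization without pivoting: Bmat = L @ U, O(m^3)."""
    m = Bmat.shape[0]
    L, U = np.eye(m), Bmat.astype(float).copy()
    for k in range(m - 1):
        for i in range(k + 1, m):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

def lu_solve(L, U, rhs):
    """Forward then back substitution, O(m^2) per right-hand side."""
    m = len(rhs)
    y = np.zeros(m)
    for i in range(m):
        y[i] = rhs[i] - L[i, :i] @ y[:i]
    x = np.zeros(m)
    for i in reversed(range(m)):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

rng = np.random.default_rng(2)
m = 5
S = rng.standard_normal((m, m))
Bsym = S + S.T + 10 * np.eye(m)   # self-adjoint (and positive definite) B_v
b = rng.standard_normal(m)
s = rng.standard_normal(m)        # stands in for the adjoint right-hand side

L, U = lu(Bsym)                   # one O(m^3) factorization ...
u    = lu_solve(L, U, b)          # ... reused for the direct state equation
lam  = lu_solve(L, U, -s)         # ... and for the adjoint state equation
```

In practice a library routine (e.g. a factor-and-solve pair from a linear algebra package) would replace the hand-written factorization; the point is that the factorization is computed once and applied to both right-hand sides.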


See also

* Backpropagation
* Adjoint equation
* Shape optimization
* Method of Lagrange multipliers


References


External links

* A well written explanation by Errico: "What is an adjoint model?"
* Another well written explanation with worked examples, written by Bradle
* More technical explanation: a review of the adjoint-state method for computing the gradient of a functional with geophysical applications
* MIT course
* MIT notes