The Volterra series is a model for non-linear behavior similar to the Taylor series. It differs from the Taylor series in its ability to capture "memory" effects. The Taylor series can be used for approximating the response of a nonlinear system to a given input if the output of this system depends strictly on the input at that particular time. In the Volterra series, the output of the nonlinear system depends on the input to the system at ''all'' other times. This provides the ability to capture the "memory" effect of devices like capacitors and inductors.

It has been applied in the fields of medicine (biomedical engineering) and biology, especially neuroscience. It is also used in electrical engineering to model intermodulation distortion in many devices, including power amplifiers and frequency mixers. Its main advantage lies in its generality: it can represent a wide range of systems. Thus it is sometimes considered a non-parametric model.

In mathematics, a Volterra series denotes a functional expansion of a dynamic, nonlinear, time-invariant functional. Volterra series are frequently used in system identification. The Volterra series, which is used to prove the Volterra theorem, is an infinite sum of multidimensional convolutional integrals.

History

The Volterra series is a modernized version of the theory of analytic functionals due to the Italian mathematician Vito Volterra, in work dating from 1887. Norbert Wiener became interested in this theory in the 1920s from contact with Volterra's student Paul Lévy. He applied his theory of Brownian motion to the integration of Volterra analytic functionals.

The use of Volterra series for system analysis originated from a restricted 1942 wartime report of Wiener, then professor of mathematics at MIT. It used the series to make an approximate analysis of the effect of radar noise in a nonlinear receiver circuit. The report became public after the war. As a general method of analysis of nonlinear systems, Volterra series came into use after about 1957 as the result of a series of reports, at first privately circulated, from MIT and elsewhere. The name ''Volterra series'' came into use a few years later.

Mathematical theory

The theory of Volterra series can be viewed from two different perspectives:
* An operator mapping between two function spaces (real or complex)
* A real or complex functional mapping from a function space into real or complex numbers

The latter, functional mapping perspective is in more frequent use due to the assumed time-invariance of the system.

Continuous time

A continuous time-invariant system with ''x''(''t'') as input and ''y''(''t'') as output can be expanded in Volterra series as

: <math>y(t) = h_0 + \sum_{n=1}^N \int_a^b \cdots \int_a^b h_n(\tau_1, \dots, \tau_n) \prod^n_{j=1} x(t - \tau_j) \,d\tau_j.</math>

Here the constant term <math>h_0</math> on the right side is usually taken to be zero by suitable choice of output level <math>y</math>. The function <math>h_n(\tau_1, \dots, \tau_n)</math> is called the ''n''-th-order Volterra kernel. It can be regarded as a higher-order impulse response of the system. For the representation to be unique, the kernels must be symmetrical in the ''n'' variables <math>\tau</math>. If a kernel is not symmetrical, it can be replaced by a symmetrized kernel, which is the average over the ''n''! permutations of these ''n'' variables <math>\tau</math>.

If ''N'' is finite, the series is said to be ''truncated''. If ''a'', ''b'', and ''N'' are finite, the series is called ''doubly finite''. Sometimes the ''n''-th-order term is divided by ''n''!, a convention which is convenient when taking the output of one Volterra system as the input of another ("cascading").

''The causality condition'': Since in any physically realizable system the output can only depend on previous values of the input, the kernels <math>h_n(t_1, t_2, \ldots, t_n)</math> will be zero if any of the variables <math>t_1, t_2, \ldots, t_n</math> is negative. The integrals may then be written over the half range from zero to infinity. So if the operator is causal, <math>a \geq 0</math>.

''Fréchet's approximation theorem'': The use of the Volterra series to represent a time-invariant functional relation is often justified by appealing to a theorem due to Fréchet. This theorem states that a time-invariant functional relation (satisfying certain very general conditions) can be approximated uniformly and to an arbitrary degree of precision by a sufficiently high finite-order Volterra series. Among other conditions, the set of admissible input functions <math>x(t)</math> for which the approximation will hold is required to be compact. It is usually taken to be an equicontinuous, uniformly bounded set of functions, which is compact by the Arzelà–Ascoli theorem. In many physical situations, this assumption about the input set is a reasonable one. The theorem, however, gives no indication as to how many terms are needed for a good approximation, which is an essential question in applications.
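As a worked illustration of the symmetrization described above, the second-order case can be written out explicitly. The superscript "sym" is notation introduced here for the example only; it does not appear elsewhere in this article.

```latex
% Symmetrized second-order kernel: average over the 2! orderings of (\tau_1, \tau_2)
h_2^{\mathrm{sym}}(\tau_1, \tau_2)
  = \frac{1}{2!} \left[ h_2(\tau_1, \tau_2) + h_2(\tau_2, \tau_1) \right]

% General case: average over all n! permutations \pi of the n arguments
h_n^{\mathrm{sym}}(\tau_1, \dots, \tau_n)
  = \frac{1}{n!} \sum_{\pi} h_n(\tau_{\pi(1)}, \dots, \tau_{\pi(n)})
```

Because the product of the inputs <math>x(t - \tau_j)</math> is unchanged by any reordering of the <math>\tau_j</math>, replacing a kernel by its symmetrized version leaves the output ''y''(''t'') unchanged.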

Discrete time

This is similar to the continuous-time case:

: <math>y(n) = h_0 + \sum_{p=1}^P \sum_{\tau_1=a}^b \cdots \sum_{\tau_p=a}^b h_p(\tau_1, \dots, \tau_p) \prod^p_{j=1} x(n - \tau_j),</math>

where <math>h_p(\tau_1, \dots, \tau_p)</math> are called discrete-time Volterra kernels. If ''P'' is finite, the series operator is said to be truncated. If ''a'', ''b'' and ''P'' are finite, the series operator is called a doubly finite Volterra series. If <math>a \geq 0</math>, the operator is said to be ''causal''.

We can always consider, without loss of generality, the kernel <math>h_p(\tau_1, \dots, \tau_p)</math> as symmetrical. In fact, by the commutativity of multiplication, it is always possible to symmetrize it by forming a new kernel taken as the average of the kernels for all permutations of the variables <math>\tau_1, \dots, \tau_p</math>.

For a causal system with symmetrical kernels we can rewrite the ''p''-th term approximately in triangular form

: <math>\sum_{\tau_1=0}^M \sum_{\tau_2=\tau_1}^M \cdots \sum_{\tau_p=\tau_{p-1}}^M h_p(\tau_1, \dots, \tau_p) \prod^p_{j=1} x(n - \tau_j).</math>
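The discrete-time series above can be evaluated directly for small orders. The following is a minimal sketch, assuming a causal series truncated at ''P'' = 2 with lags 0 to ''M'' and zero input before the first sample. The function names and the example kernel values are hypothetical. In the triangular version, off-diagonal coefficients are doubled so that the sum over <math>\tau_1 \leq \tau_2</math> matches the full sum exactly; that weighting is a choice made for this sketch and is not specified in the text.

```python
from itertools import product


def past_sample(x, n, lag):
    """Return x(n - lag), treating samples outside the record as zero."""
    idx = n - lag
    return x[idx] if 0 <= idx < len(x) else 0.0


def volterra_output(x, n, h0, h1, h2):
    """Evaluate a causal Volterra series truncated at order 2 at time n.

    h1[t] is the first-order kernel and h2[t1][t2] the second-order kernel,
    both defined for lags 0..M.
    """
    lags = range(len(h1))
    y = h0
    y += sum(h1[t] * past_sample(x, n, t) for t in lags)
    y += sum(h2[t1][t2] * past_sample(x, n, t1) * past_sample(x, n, t2)
             for t1, t2 in product(lags, repeat=2))
    return y


def triangular_second_order(x, n, h2):
    """Second-order term summed only over t1 <= t2 (symmetric h2 assumed).

    Each off-diagonal pair occurs twice in the full sum, so its coefficient
    is doubled here.
    """
    lags = range(len(h2))
    total = 0.0
    for t1 in lags:
        for t2 in range(t1, len(h2)):
            weight = 1.0 if t1 == t2 else 2.0
            total += weight * h2[t1][t2] * past_sample(x, n, t1) * past_sample(x, n, t2)
    return total


# Hypothetical example values
x = [1.0, 2.0, -1.0, 0.5]
h0 = 0.0
h1 = [0.5, -0.3]
h2 = [[0.2, 0.1],
      [0.1, 0.0]]

y3 = volterra_output(x, 3, h0, h1, h2)
tri = triangular_second_order(x, 3, h2)
```

The number of kernel coefficients grows as (''M'' + 1)<sup>''p''</sup> with the order ''p'', which is why the triangular form, with roughly a factor ''p''! fewer terms, is attractive in practice.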

Methods to estimate the kernel coefficients

Estimating the Volterra coefficients individually is complicated, since the basis functionals of the Volterra series are correlated. This leads to the problem of simultaneously solving a set of integral equations for the coefficients. Hence, estimation of Volterra coefficients is generally performed by estimating the coefficients of an orthogonalized series, e.g. the Wiener series, and then recomputing the coefficients of the original Volterra series. The Volterra series' main appeal over the orthogonalized series lies in its intuitive, canonical structure, i.e. all interactions of the input have one fixed degree. The orthogonalized basis functionals will generally be quite complicated.

An important aspect, with respect to which the following methods differ, is whether the orthogonalization of the basis functionals is to be performed over the idealized specification of the input signal (e.g. Gaussian white noise) or over the actual realization of the input (i.e. the pseudo-random, bounded, almost-white version of Gaussian white noise, or any other stimulus). The latter methods, despite their lack of mathematical elegance, have been shown to be more flexible (as arbitrary inputs can be easily accommodated) and more precise (because the idealized version of the input signal is not always realizable).

Crosscorrelation method

This method, developed by Lee and Schetzen, orthogonalizes with respect to the actual mathematical description of the signal, i.e. the projection onto the new basis functionals is based on the knowledge of the moments of the random signal. We can write the Volterra series in terms of homogeneous operators, as

: <math>y(n) = h_0 + \sum_{p=1}^P H_p x(n),</math>

where

: <math>H_p x(n) = \sum_{\tau_1=a}^b \cdots \sum_{\tau_p=a}^b h_p(\tau_1, \dots, \tau_p) \prod^p_{j=1} x(n - \tau_j).</math>

To allow identification by orthogonalization, the Volterra series must be rearranged in terms of orthogonal non-homogeneous ''G'' operators (Wiener series):

: <math>y(n) = \sum_p H_p x(n) \equiv \sum_p G_p x(n).</math>

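The ''G'' operators can be defined by the following:

: <math>E\{H_i x(n) G_j x(n)\} = 0; \quad i < j,</math>

: <math>E\{G_i x(n) G_j x(n)\} = 0; \quad i \neq j,</math>

whenever <math>H_i x(n)</math> is an arbitrary homogeneous Volterra functional and ''x''(''n'') is some stationary white noise (SWN) with zero mean and variance ''A''. Recalling that every Volterra functional is orthogonal to all Wiener functionals of greater order, and considering the following Volterra functional:

: <math>H^*_{\overline{p}} x(n) = \prod^{\overline{p}}_{j=1} x(n - \tau_j),</math>

we can write

: <math>E\left\{y(n) H^*_{\overline{p}} x(n)\right\} = E\left\{\sum_{p=0}^\infty G_p x(n) H^*_{\overline{p}} x(n)\right\}.</math>

If ''x'' is SWN, <math>\tau_1 \neq \tau_2 \neq \ldots \neq \tau_P</math> and by letting <math>A = \sigma^2_x</math>, we have

: <math>E\left\{y(n) \prod_{j=1}^{\overline{p}} x(n - \tau_j)\right\} = E\left\{G_{\overline{p}} x(n) \prod_{j=1}^{\overline{p}} x(n - \tau_j)\right\} = \overline{p}! A^{\overline{p}} k_{\overline{p}}(\tau_1, \dots, \tau_{\overline{p}}).</math>

So if we exclude the diagonal elements, i.e. we require <math>\tau_i \neq \tau_j</math> for all <math>i \neq j</math>, it is

: <math>k_p(\tau_1, \dots, \tau_p) = \frac{E\left\{y(n) x(n - \tau_1) \cdots x(n - \tau_p)\right\}}{p! A^p}.</math>

If we want to consider the diagonal elements, the solution proposed by Lee and Schetzen is

: <math>k_p(\tau_1, \dots, \tau_p) = \frac{E\left\{\left(y(n) - \sum\limits_{m=0}^{p-1} G_m x(n)\right) x(n - \tau_1) \cdots x(n - \tau_p)\right\}}{p! A^p}.</math>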

The main drawback of this technique is that the estimation errors, made on all elements of lower-order kernels, will affect each diagonal element of order ''p'' by means of the summation <math>\sum\limits_{m=0}^{p-1} G_m x(n)</math>, conceived as the solution for the estimation of the diagonal elements themselves. Efficient formulas to avoid this drawback and references for diagonal kernel element estimation exist.

Once the Wiener kernels have been identified, Volterra kernels can be obtained by using Wiener-to-Volterra formulas, reported here for a fifth-order Volterra series:

: <math>h_5 = k_5,</math>

: <math>h_4 = k_4,</math>

: <math>h_3 = k_3 - 10 A \sum_{\tau_4} k_5(\tau_1, \tau_2, \tau_3, \tau_4, \tau_4),</math>

: <math>h_2 = k_2 - 6 A \sum_{\tau_3} k_4(\tau_1, \tau_2, \tau_3, \tau_3),</math>

: <math>h_1 = k_1 - 3 A \sum_{\tau_2} k_3(\tau_1, \tau_2, \tau_2) + 15 A^2 \sum_{\tau_2} \sum_{\tau_3} k_5(\tau_1, \tau_2, \tau_2, \tau_3, \tau_3),</math>

: <math>h_0 = k_0 - A \sum_{\tau_1} k_2(\tau_1, \tau_1) + 3 A^2 \sum_{\tau_1} \sum_{\tau_2} k_4(\tau_1, \tau_1, \tau_2, \tau_2).</math>
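The crosscorrelation estimate and the Wiener-to-Volterra conversion can be tried on synthetic data. The sketch below is illustrative only: the test system, the record length, the seed and all function names are assumptions made for this example. It estimates only the first-order kernel and one off-diagonal second-order element, using sample averages in place of the expectation, and it applies the conversion for a series truncated at order 2, where the formulas above reduce to <math>h_2 = k_2</math>, <math>h_1 = k_1</math> and <math>h_0 = k_0 - A \sum_{\tau_1} k_2(\tau_1, \tau_1)</math>.

```python
import random

A = 1.0        # variance of the white Gaussian input (known by construction)
N = 20000      # number of samples; an arbitrary choice for this sketch
rng = random.Random(0)
x = [rng.gauss(0.0, A ** 0.5) for _ in range(N)]

# Hypothetical test system with h1 = [0.5, -0.3] and a symmetric
# second-order kernel whose only nonzero entries are h2(0,1) = h2(1,0) = 0.1.
y = [0.0] * N
for n in range(1, N):
    y[n] = 0.5 * x[n] - 0.3 * x[n - 1] + 0.2 * x[n] * x[n - 1]

times = range(1, N)  # time indices where both x(n) and x(n-1) are available


def k1_hat(tau):
    """First-order Wiener kernel estimate: E{y(n) x(n - tau)} / A."""
    return sum(y[n] * x[n - tau] for n in times) / len(times) / A


def k2_hat_offdiag(tau1, tau2):
    """Second-order estimate for tau1 != tau2: E{y x x} / (2! A^2)."""
    acc = sum(y[n] * x[n - tau1] * x[n - tau2] for n in times)
    return acc / len(times) / (2.0 * A ** 2)


def wiener_to_volterra_order2(k0, k1, k2, variance):
    """Wiener-to-Volterra conversion for a series truncated at order 2."""
    h2 = [row[:] for row in k2]
    h1 = list(k1)
    h0 = k0 - variance * sum(k2[t][t] for t in range(len(k2)))
    return h0, h1, h2


k1_est = [k1_hat(0), k1_hat(1)]
k2_01_est = k2_hat_offdiag(0, 1)
h0_demo, h1_demo, h2_demo = wiener_to_volterra_order2(
    1.0, [0.5, -0.3], [[0.2, 0.1], [0.1, 0.4]], 2.0)
```

With a finite record the estimates only approximate the true kernel values, and the error shrinks roughly as the inverse square root of the record length.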

Multiple-variance method

In the traditional orthogonal algorithm, using inputs with high <math>\sigma_x</math> has the advantage of stimulating high-order nonlinearity, so as to achieve more accurate high-order kernel identification. As a drawback, the use of high <math>\sigma_x</math> values causes high identification error in lower-order kernels, mainly due to nonideality of the input and truncation errors.

Conversely, the use of lower <math>\sigma_x</math> in the identification process can lead to a better estimation of the lower-order kernels, but can be insufficient to stimulate high-order nonlinearity. This phenomenon, which can be called ''locality'' of truncated Volterra series, can be revealed by calculating the output error of a series as a function of different variances of input. This test can be repeated with series identified with different input variances, obtaining different curves, each with a minimum at the variance used in the identification.

To overcome this limitation, a low <math>\sigma_x</math> value should be used for the lower-order kernel and gradually increased for higher-order kernels. This is not a theoretical problem in Wiener kernel identification, since the Wiener functionals are orthogonal to each other, but an appropriate normalization is needed in the Wiener-to-Volterra conversion formulas to take into account the use of different variances. Furthermore, new Wiener-to-Volterra conversion formulas are needed. The traditional Wiener kernel identification should be changed as follows:

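: <math>k_0^{(0)} = E\left\{y^{(0)}(n)\right\},</math>

: <math>k_1^{(1)}(\tau_1) = \frac{1}{A_1} E\left\{y^{(1)}(n) x^{(1)}(n - \tau_1)\right\},</math>

: <math>k_2^{(2)}(\tau_1, \tau_2) = \frac{1}{2! A_2^2} \left\{E\left\{y^{(2)}(n) \prod_{i=1}^2 x^{(2)}(n - \tau_i)\right\} - A_2 k_0^{(2)} \delta_{\tau_1 \tau_2}\right\},</math>

: <math>k_3^{(3)}(\tau_1, \tau_2, \tau_3) = \frac{1}{3! A_3^3} \left\{E\left\{y^{(3)}(n) \prod_{i=1}^3 x^{(3)}(n - \tau_i)\right\} - A_3^2 \left[k_1^{(3)}(\tau_1) \delta_{\tau_2 \tau_3} + k_1^{(3)}(\tau_2) \delta_{\tau_1 \tau_3} + k_1^{(3)}(\tau_3) \delta_{\tau_1 \tau_2}\right]\right\}.</math>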

In the above formulas the impulse functions are introduced for the identification of diagonal kernel points. If the Wiener kernels are extracted with the new formulas, the following Wiener-to-Volterra formulas (written out up to the fifth order) are needed:

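: <math>h_5 = k_5^{(5)},</math>

: <math>h_4 = k_4^{(4)},</math>

: <math>h_3 = k_3^{(3)} - 10 A_3 \sum_{\tau_4} k_5^{(3)}(\tau_1, \tau_2, \tau_3, \tau_4, \tau_4),</math>

: <math>h_2 = k_2^{(2)} - 6 A_2 \sum_{\tau_3} k_4^{(2)}(\tau_1, \tau_2, \tau_3, \tau_3),</math>

: <math>h_1 = k_1^{(1)} - 3 A_1 \sum_{\tau_2} k_3^{(1)}(\tau_1, \tau_2, \tau_2) + 15 A_1^2 \sum_{\tau_2} \sum_{\tau_3} k_5^{(1)}(\tau_1, \tau_2, \tau_2, \tau_3, \tau_3),</math>

: <math>h_0 = k_0^{(0)} - A_0 \sum_{\tau_1} k_2^{(0)}(\tau_1, \tau_1) + 3 A_0^2 \sum_{\tau_1} \sum_{\tau_2} k_4^{(0)}(\tau_1, \tau_1, \tau_2, \tau_2).</math>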

As can be seen, the drawback with respect to the previous formula is that for the identification of the ''n''-th-order kernel, all lower kernels must be identified again with the higher variance. However, an outstanding improvement in the output MSE will be obtained if the Wiener and Volterra kernels are obtained with the new formulas.

Feedforward network

This method was developed by Wray and Green (1994) and utilizes the fact that a simple 2-layer neural network (i.e. a multilayer perceptron or feedforward network) is computationally equivalent to the Volterra series and therefore contains the kernels hidden in its architecture. After such a network has been trained to successfully predict the output based on the current state and memory of the system, the kernels can then be computed from the weights and biases of that network.

The general notation for the ''n''-th-order Volterra kernel is given by

: <math>h_n(\tau_1, \dots, \tau_n) = \sum_{i=1}^M (c_i a_{ji} \omega_{\tau_1 i} \dots \omega_{\tau_n i}),</math>

where <math>n</math> is the order, <math>c_i</math> the weights to the linear output node, <math>a_{ji}</math> the coefficients of the polynomial expansion of the output function of the hidden nodes, and <math>\omega_{ji}</math> are the weights from the input layer to the non-linear hidden layer. It is important to note that this method allows kernel extraction only up to the number of input delays in the architecture of the network. Furthermore, it is vital to choose the size of the network input layer carefully so that it represents the effective memory of the system.
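The kernel formula above can be checked numerically on a toy network. The sketch below is an illustration under simplifying assumptions, not the procedure of Wray and Green: each hidden node is given an exactly quadratic activation with no bias, so the polynomial expansion is exact and only kernels up to order 2 exist. The polynomial index ''j'' in <math>a_{ji}</math> is taken equal to the kernel order. All names and numerical values are hypothetical.

```python
def extract_kernels(c, a, w):
    """Compute Volterra kernels of orders 0, 1 and 2 from network weights.

    c[i]      : weight from hidden node i to the linear output node
    a[j][i]   : coefficient of order j in the polynomial activation of node i
    w[tau][i] : weight from input delay tau to hidden node i
    """
    nodes = range(len(c))
    delays = range(len(w))
    h0 = sum(c[i] * a[0][i] for i in nodes)
    h1 = [sum(c[i] * a[1][i] * w[t][i] for i in nodes) for t in delays]
    h2 = [[sum(c[i] * a[2][i] * w[t1][i] * w[t2][i] for i in nodes)
           for t2 in delays] for t1 in delays]
    return h0, h1, h2


def network_output(c, a, w, window):
    """Forward pass; window[tau] holds the delayed input x(n - tau)."""
    y = 0.0
    for i in range(len(c)):
        u = sum(w[t][i] * window[t] for t in range(len(w)))
        y += c[i] * sum(a[j][i] * u ** j for j in range(len(a)))
    return y


def series_output(h0, h1, h2, window):
    """Second-order Volterra series evaluated on the same input window."""
    delays = range(len(window))
    y = h0 + sum(h1[t] * window[t] for t in delays)
    y += sum(h2[t1][t2] * window[t1] * window[t2]
             for t1 in delays for t2 in delays)
    return y


# Hypothetical network: 2 input delays, 2 hidden nodes
c = [1.0, -2.0]
a = [[0.1, 0.0],    # order-0 coefficients of nodes 0 and 1
     [0.5, 1.0],    # order-1 coefficients
     [0.25, -0.5]]  # order-2 coefficients
w = [[1.0, 0.5],
     [-1.0, 2.0]]
window = [0.3, -0.7]

h0, h1, h2 = extract_kernels(c, a, w)
y_net = network_output(c, a, w, window)
y_series = series_output(h0, h1, h2, window)
```

For a trained network with a sigmoidal activation, the coefficients <math>a_{ji}</math> would instead come from a polynomial expansion of that activation around each node's bias, and the extracted series is then only as accurate as that expansion.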

Exact orthogonal algorithm

This method and its more efficient version (fast orthogonal algorithm) were invented by Korenberg. In this method the orthogonalization is performed empirically over the actual input. It has been shown to perform more precisely than the crosscorrelation method. Another advantage is that arbitrary inputs can be used for the orthogonalization and that fewer data points suffice to reach a desired level of accuracy. Also, estimation can be performed incrementally until some criterion is fulfilled.

Linear regression

Linear regression is a standard tool from linear analysis. Hence, one of its main advantages is the widespread existence of standard tools for solving linear regressions efficiently. It has some educational value, since it highlights the basic property of Volterra series: a linear combination of non-linear basis functionals. For estimation, the order of the original system should be known, since the Volterra basis functionals are not orthogonal, and thus estimation cannot be performed incrementally.
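Because the series is linear in its kernel coefficients, ordinary least squares applies directly once the products of delayed inputs are used as regressors. The following sketch assumes a second-order series with a memory of two samples and noise-free data; the true coefficients, the input distribution and the helper names are hypothetical. It solves the normal equations with a small hand-written Gaussian elimination to stay within the standard library; a numerical library's least-squares routine would normally be used instead.

```python
import random


def features(x, n):
    """Volterra basis functionals up to order 2 for lags 0 and 1."""
    x0, x1 = x[n], x[n - 1]
    return [1.0, x0, x1, x0 * x0, x0 * x1, x1 * x1]


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][k] * sol[k] for k in range(r + 1, size))
        sol[r] = (aug[r][size] - tail) / aug[r][r]
    return sol


def fit(x, y, times):
    """Least-squares estimate of the coefficients via the normal equations."""
    rows = [features(x, n) for n in times]
    dim = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(dim)]
            for i in range(dim)]
    moment = [sum(r[i] * y[n] for r, n in zip(rows, times)) for i in range(dim)]
    return solve(gram, moment)


rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
# Order: h0, h1(0), h1(1), h2(0,0), 2*h2(0,1), h2(1,1) for a symmetric h2
true_theta = [0.1, 0.5, -0.3, 0.2, 0.2, 0.0]
times = list(range(1, len(x)))
y = [0.0] * len(x)
for n in times:
    y[n] = sum(t * f for t, f in zip(true_theta, features(x, n)))

theta = fit(x, y, times)
```

The regressors are correlated (for example the constant and the squared terms), which is why all coefficients are estimated jointly here rather than one order at a time.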

Kernel method

This method was invented by Franz and Schölkopf and is based on statistical learning theory. Consequently, this approach is also based on minimizing the empirical error (often called empirical risk minimization). Franz and Schölkopf proposed that the kernel method could essentially replace the Volterra series representation, although noting that the latter is more intuitive.

Differential sampling

This method was developed by van Hemmen and coworkers and utilizes Dirac delta functions to sample the Volterra coefficients.
