HOME

TheInfoList



OR:

System identification The field of system identification uses statistical methods to build mathematical models of dynamical systems from measured data. System identification also includes the optimal design of experiments for efficiently generating informative dat ...
is a method of identifying or measuring the
mathematical model A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modeling. Mathematical models are used in the natural sciences (such as physics, ...
of a
system A system is a group of interacting or interrelated elements that act according to a set of rules to form a unified whole. A system, surrounded and influenced by its environment, is described by its boundaries, structure and purpose and express ...
from measurements of the system inputs and outputs. The applications of system identification include any system where the inputs and outputs can be measured and include
industrial process Industrial processes are procedures involving chemical, physical, electrical or mechanical steps to aid in the manufacturing of an item or items, usually carried out on a very large scale. Industrial processes are the key components of heavy in ...
es,
control system A control system manages, commands, directs, or regulates the behavior of other devices or systems using control loops. It can range from a single home heating controller using a thermostat controlling a domestic boiler to large industrial ...
s,
economic data Economic data are data describing an actual economy, past or present. These are typically found in time-series form, that is, covering more than one time period (say the monthly unemployment rate for the last five years) or in cross-sectional dat ...
,
biology Biology is the scientific study of life. It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field. For instance, all organisms are made up of cells that process hereditary ...
and the
life sciences This list of life sciences comprises the branches of science that involve the scientific study of life – such as microorganisms, plants, and animals including human beings. This science is one of the two major branches of natural science, th ...
,
medicine Medicine is the science and practice of caring for a patient, managing the diagnosis, prognosis, prevention, treatment, palliation of their injury or disease, and promoting their health. Medicine encompasses a variety of health care pr ...
,
social system In sociology, a social system is the patterned network of relationships constituting a coherent whole that exist between individuals, groups, and institutions. It is the formal structure of role and status that can form in a small, stable group. A ...
s and many more. A
nonlinear system In mathematics and science, a nonlinear system is a system in which the change of the output is not proportional to the change of the input. Nonlinear problems are of interest to engineers, biologists, physicists, mathematicians, and many othe ...
is defined as any system that is not linear, that is any system that does not satisfy the
superposition principle The superposition principle, also known as superposition property, states that, for all linear systems, the net response caused by two or more stimuli is the sum of the responses that would have been caused by each stimulus individually. So tha ...
. This negative definition tends to obscure that there are very many different types of nonlinear systems. Historically, system identification for nonlinear systemsNelles O. "Nonlinear System Identification: From Classical Approaches to Neural Networks". Springer Verlag,2001Billings S.A. "Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains". Wiley, 2013 has developed by focusing on specific classes of system and can be broadly categorized into five basic approaches, each defined by a model class: #
Volterra series The Volterra series is a model for non-linear behavior similar to the Taylor series. It differs from the Taylor series in its ability to capture "memory" effects. The Taylor series can be used for approximating the response of a nonlinear system ...
models, # Block-structured models, #
Neural network A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up of biological ...
models, # NARMAX models, and # State-space models. There are four steps to be followed for system identification: data gathering, model postulate, parameter identification and model validation. Data gathering is considered as the first and essential part in identification terminology, used as the input for the model which is prepared later. It consists of selecting an appropriate data set, pre-processing and processing. It involves the implementation of the known algorithms together with the transcription of flight tapes, data storage and data management, calibration, processing, analysis and presentation. Moreover, model validation is necessary to gain confidence in, or reject, a particular model. In particular, the parameter estimation and the model validation are integral parts of the system identification. Validation refers to the process of confirming the conceptual model and demonstrating an adequate correspondence between the computational results of the model and the actual data.


Volterra series methods

The early work was dominated by methods based on the
Volterra series The Volterra series is a model for non-linear behavior similar to the Taylor series. It differs from the Taylor series in its ability to capture "memory" effects. The Taylor series can be used for approximating the response of a nonlinear system ...
, which in the discrete time case can be expressed as : \begin y(k)& = h_0+\sum\limits_^M h_1(m_1)u(k-m_1) + \sum\limits_^M \sum\limits_^M h_2(m_1,m_2)u(k-m_1)u(k-m_2) \\ & \quad+\sum\limits_^M \sum\limits_^M \sum\limits_^M h_3(m_1,m_2,m_3)u(k-m_1)u(k-m_2)u(k-m_3) + \cdots \end where ''u''(''k''), ''y''(''k''); ''k'' = 1, 2, 3, ... are the measured input and output respectively and h_\ell(m_1,\ldots ,m_\ell) is the ''l''th-order Volterra kernel, or ''l''th-order nonlinear impulse response. The Volterra series is an extension of the linear
convolution In mathematics (in particular, functional analysis), convolution is a mathematical operation on two functions ( and ) that produces a third function (f*g) that expresses how the shape of one is modified by the other. The term ''convolution'' ...
integral. Most of the earlier identification algorithms assumed that just the first two, linear and quadratic, Volterra kernels are present and used special inputs such as Gaussian white noise and correlation methods to identify the two Volterra kernels. In most of these methods the input has to be Gaussian and white which is a severe restriction for many real processes. These results were later extended to include the first three Volterra kernels, to allow different inputs, and other related developments including the Wiener series. A very important body of work was developed by Wiener, Lee, Bose and colleagues at MIT from the 1940s to the 1960s including the famous Lee and Schetzen method. While these methods are still actively studied today there are several basic restrictions. These include the necessity of knowing the number of Volterra series terms a priori, the use of special inputs, and the large number of estimates that have to be identified. For example, for a system where the first order Volterra kernel is described by say 30 samples, 30x30 points will be required for the second order kernel, 30x30x30 for the third order and so on and hence the amount of data required to provide good estimates becomes excessively large.Billings S.A.
Identification of Nonlinear Systems: A Survey
. IEE Proceedings Part D 127(6), 272–285,1980
These numbers can be reduced by exploiting certain symmetries but the requirements are still excessive irrespective of what algorithm is used for the identification.


Block-structured systems

Because of the problems of identifying Volterra models other model forms were investigated as a basis for system identification for nonlinear systems. Various forms of block structured nonlinear models have been introduced or re-introduced. The Hammerstein model consists of a static single valued nonlinear element followed by a linear dynamic element. The Wiener model is the reverse of this combination so that the linear element occurs before the static nonlinear characteristic. The Wiener-Hammerstein model consists of a static nonlinear element sandwiched between two dynamic linear elements, and several other model forms are available. The Hammerstein-Wiener model consists of a linear dynamic block sandwiched between two static nonlinear blocks. The Urysohn model is different from other block models, it does not consists of sequence linear and nonlinear blocks, but describes both dynamic and static nonlinearities in the expression of the kernel of an operator. All these models can be represented by a Volterra series but in this case the Volterra kernels take on a special form in each case. Identification consists of correlation based and parameter estimation methods. The correlation methods exploit certain properties of these systems, which means that if specific inputs are used, often white Gaussian noise, the individual elements can be identified one at a time. This results in manageable data requirements and the individual blocks can sometimes be related to components in the system under study. More recent results are based on parameter estimation and neural network based solutions. Many results have been introduced and these systems continue to be studied in depth. One problem is that these methods are only applicable to a very special form of model in each case and usually this model form has to be known prior to identification.


Neural networks

Artificial neural networks Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected units ...
try loosely to imitate the network of neurons in the brain where computation takes place through a large number of simple processing elements. A typical neural network consists of a number of simple processing units interconnected to form a complex network. Layers of such units are arranged so that data is entered at the input layer and passes through either one or several intermediate layers before reaching the output layer. In
supervised learning Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...
the network is trained by operating on the difference between the actual output and the desired output of the network, the prediction error, to change the connection strengths between the nodes. By iterating the weights are modified until the output error reaches an acceptable level. This process is called machine learning because the network adjusts the weights so that the output pattern is reproduced. Neural networks have been extensively studied and there are many excellent textbooks devoted to this topic in general, and more focused textbooks which emphasise control and systems applications,. There are two main problem types that can be studied using neural networks: static problems, and dynamic problems. Static problems include
pattern recognition Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics ...
,
class Class or The Class may refer to: Common uses not otherwise categorized * Class (biology), a taxonomic rank * Class (knowledge representation), a collection of individuals or objects * Class (philosophy), an analytical concept used differently ...
ification, and
approximation An approximation is anything that is intentionally similar but not exactly equal to something else. Etymology and usage The word ''approximation'' is derived from Latin ''approximatus'', from ''proximus'' meaning ''very near'' and the prefix ' ...
. Dynamic problems involve lagged variables and are more appropriate for system identification and related applications. Depending on the architecture of the network the training problem can be either nonlinear-in-the-parameters which involves optimisation or linear-in-the-parameters which can be solved using classical approaches. The training algorithms can be categorised into supervised, unsupervised, or reinforcement learning. Neural networks have excellent approximation properties but these are usually based on standard function approximation results using for example the
Weierstrass Karl Theodor Wilhelm Weierstrass (german: link=no, Weierstraß ; 31 October 1815 – 19 February 1897) was a German mathematician often cited as the "father of modern analysis". Despite leaving university without a degree, he studied mathematics ...
Theorem that applies equally well to polynomials, rational functions, and other well-known models. Neural networks have been applied extensively to system identification problems which involve nonlinear and dynamic relationships. However, classical neural networks are purely gross static approximating machines. There is no dynamics within the network. Hence when fitting dynamic models all the dynamics arise by allocating lagged inputs and outputs to the input layer of the network. The training procedure then produces the best static approximation that relates the lagged variables assigned to the input nodes to the output. There are more complex network architectures, including recurrent networks, that produce dynamics by introducing increasing orders of lagged variables to the input nodes. But in these cases it is very easy to over specify the lags and this can lead to over fitting and poor generalisation properties. Neural networks have several advantages; they are conceptually simple, easy to train and to use, have excellent approximation properties, the concept of local and parallel processing is important and this provides integrity and fault tolerant behaviour. The biggest criticism of the classical neural network models is that the models produced are completely opaque and usually cannot be written down or analysed. It is therefore very difficult to know what is causing what, to analyse the model, or to compute dynamic characteristics from the model. Some of these points will not be relevant to all applications but they are for dynamic modelling.


NARMAX methods

The nonlinear autoregressive moving average model with exogenous inputs (NARMAX model) can represent a wide class of nonlinear systems, and is defined as : \begin y(k) & =F (k-1),y(k-2),\ldots ,y(k-n_y),u(k-d),u(k-d-1),\ldots ,u(k-d-n_u), \\ & \quad e(k-1),e(k-2),\ldots ,e(k-n_e)e(k) \end where ''y''(''k''), ''u''(''k'') and ''e''(''k'') are the system output, input, and noise sequences respectively; n_y, n_u, and n_e are the maximum lags for the system output, input and noise; F ��is some nonlinear function, d is a time delay typically set to ''d'' = 1.The model is essentially an expansion of past inputs, outputs and noise terms. Because the
noise Noise is unwanted sound considered unpleasant, loud or disruptive to hearing. From a physics standpoint, there is no distinction between noise and desired sound, as both are vibrations through a medium, such as air or water. The difference aris ...
is modelled explicitly, unbiased estimates of the system model can be obtained in the presence of unobserved highly correlated and nonlinear noise. The Volterra, the block structured models and many neural network architectures can all be considered as subsets of the NARMAX model. Since NARMAX was introduced, by proving what class of nonlinear systems can be represented by this model, many results and algorithms have been derived based around this description. Most of the early work was based on polynomial expansions of the NARMAX model. These are still the most popular methods today but other more complex forms based on
wavelets A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases or decreases, and then returns to zero one or more times. Wavelets are termed a "brief oscillation". A taxonomy of wavelets has been established, based on the nu ...
and other expansions have been introduced to represent severely nonlinear and highly complex nonlinear systems. A significant proportion of nonlinear systems can be represented by a NARMAX model including systems with exotic behaviours such as chaos, bifurcations, and subharmonics. While NARMAX started as the name of a model it has now developed into a philosophy of nonlinear system identification,. The NARMAX approach consists of several steps: * Structure detection: which terms are in the model * Parameter estimation: determine the model coefficients * Model validation: is the model unbiased and correct * Prediction: what is the output at some future time * Analysis: what are the dynamical properties of the system Structure detection forms the most fundamental part of NARMAX. For example, a NARMAX model which consists of one lagged input and one lagged output term, three lagged noise terms, expanded as a cubic polynomial would consist of eighty two possible candidate terms. This number of candidate terms arises because the expansion by definition includes all possible combinations within the cubic expansion. Naively proceeding to estimate a model which includes all these terms and then pruning will cause numerical and computational problems and should always be avoided. However, only a few terms are often important in the model. Structure detection, which aims to select terms one at a time, is therefore critically important. These objectives can easily be achieved by using the Orthogonal Least Squares algorithm and its derivatives to select the NARMAX model terms one at a time. These ideas can also be adapted for
pattern recognition Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics ...
and
feature selection In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construc ...
and provide an alternative to
principal component analysis Principal component analysis (PCA) is a popular technique for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and ...
but with the advantage that the features are revealed as basis functions that are easily related back to the original problem.
NARMAX methods are designed to do more than find the best approximating model. System identification can be divided into two aims. The first involves approximation where the key aim is to develop a model that approximates the data set such that good predictions can be made. There are many applications where this approach is appropriate, for example in time series prediction of the weather, stock prices, speech, target tracking, pattern classification etc. In such applications the form of the model is not that important. The objective is to find an approximation scheme which produces the minimum prediction errors. A second objective of system identification, which includes the first objective as a subset, involves much more than just finding a model to achieve the best mean squared errors. This second aim is why the NARMAX philosophy was developed and is linked to the idea of finding the simplest model structure. The aim here is to develop models that reproduce the dynamic characteristics of the underlying system, to find the simplest possible model, and if possible to relate this to components and behaviours of the system under study. The core aim of this second approach to identification is therefore to identify and reveal the rule that represents the system. These objectives are relevant to model simulation and control systems design, but increasingly to applications in medicine, neuro science, and the life sciences. Here the aim is to identify models, often nonlinear, that can be used to understand the basic mechanisms of how these systems operate and behave so that we can manipulate and utilise these. NARMAX methods have also been developed in the frequency and spatio-temporal domains.


Stochastic nonlinear models

In a general situation, it might be the case that some exogenous uncertain disturbance passes through the nonlinear dynamics and influence the outputs. A model class that is general enough to capture this situation is the class of stochastic nonlinear state-space models. A state-space model is usually obtained using first principle laws, such as mechanical, electrical, or thermodynamic physical laws, and the parameters to be identified usually have some physical meaning or significance. A discrete-time state-space model may be defined by the difference equations: : \begin x_ &= f(x_t, u_t, w_t; \theta),\\ y_t &= g(x_t, u_t, v_t; \theta), \quad t =1,2, \dots \end in which t is a positive integer referring to time. The functions f and g are general nonlinear functions. The first equation is known as the state equation and the second is known as the output equation. All the signals are modeled using
stochastic process In probability theory and related fields, a stochastic () or random process is a mathematical object usually defined as a family of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that ap ...
es. The process x_t is known as the state process, w_t and v_t are usually assumed
independent Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s * Independe ...
and mutually independent such that w_t \sim p(w; \theta),\; v_t \sim p(v; \theta). The parameter \theta is usually a finite-dimensional (real) parameter to be estimated (using experimental data). Observe that the state process does not have to be a physical signal, and it is normally unobserved (not measured). The data set is given as a set of input-output pairs (y_t, u_t) for t = 1, \dots, N for some finite positive integer value N. Unfortunately, due to the nonlinear transformation of unobserved random variables, the
likelihood function The likelihood function (often simply called the likelihood) represents the probability of random variable realizations conditional on particular values of the statistical parameters. Thus, when evaluated on a given sample, the likelihood functi ...
of the outputs is analytically intractable; it is given in terms of a multidimensional marginalization integral. Consequently, commonly used parameter estimation methods such as the Maximum Likelihood Method or the Prediction Error Method based on the optimal one-step ahead predictor are analytically intractable. Recently, algorithms based on sequential Monte Carlo methods have been used to approximate the conditional mean of the outputs or, in conjunction with the Expectation-Maximization algorithm, to approximate the maximum likelihood estimator. These methods, albeit asymptotically optimal, are computationally demanding and their use is limited to specific cases where the fundamental limitations of the employed particle filters can be avoided. An alternative solution is to apply the prediction error method using a sub-optimal predictor. The resulting estimator can be shown to be strongly consistent and asymptotically normal and can be evaluated using relatively simple algorithms.


See also

* Grey box model *
Statistical Model A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form ...


References


Further reading

* Lennart Ljung: System Identification — Theory For the User, 2nd ed, PTR Prentice Hall, Upper Saddle River, N. J., 1999. * R. Pintelon, J. Schoukens, System Identification: A Frequency Domain Approach, IEEE Press, New York, 2001. * T. Söderström, P. Stoica, System Identification, Prentice Hall, Upper Saddle River, N.J., 1989. * R. K. Pearson:
Discrete-Time Dynamic Models
Oxford University Press, 1999. * P. Marmarelis, V. Marmarelis, V
Analysis of Physiological Systems
Plenum, 1978. * K. Worden, G. R. Tomlinson, Nonlinearity in Structural Dynamics, Institute of Physics Publishing, 2001. {{DEFAULTSORT:Nonlinear system identification Dynamical systems