Generalized filtering is a generic Bayesian filtering scheme for nonlinear state-space models. It is based on a variational principle of least action, formulated in generalized coordinates of motion. Note that "generalized coordinates of motion" are related to, but distinct from, the generalized coordinates used in (multibody) dynamical systems analysis. Generalized filtering furnishes posterior densities over hidden states (and parameters) generating observed data, using a generalized gradient descent on variational free energy under the Laplace assumption. Unlike classical (e.g., Kalman-Bucy or particle) filtering, generalized filtering eschews Markovian assumptions about random fluctuations. Furthermore, it operates online, assimilating data to approximate the posterior density over unknown quantities without the need for a backward pass. Special cases include variational filtering, dynamic expectation maximization and generalized predictive coding.


Definition

Generalized filtering rests on the tuple (\Omega, U, X, S, p, q):

* ''A sample space'' \Omega from which random fluctuations \omega \in \Omega are drawn
* ''Control states'' U \in \mathbb{R} that act as external causes, input or forcing terms
* ''Hidden states'' X : X \times U \times \Omega \to \mathbb{R} that cause sensory states and depend on control states
* ''Sensor states'' S : X \times U \times \Omega \to \mathbb{R}, a probabilistic mapping from hidden and control states
* ''Generative density'' p(\tilde{s}, \tilde{x}, \tilde{u} \mid m) over sensory, hidden and control states under a generative model m
* ''Variational density'' q(\tilde{x}, \tilde{u} \mid \tilde{\mu}) over hidden and control states with mean \tilde{\mu} \in \mathbb{R}

Here, a tilde denotes a variable in generalized coordinates of motion: \tilde{u} = [u, u', u'', \ldots]^T


Generalized filtering

The objective is to approximate the posterior density over hidden and control states, given sensor states and a generative model, and to estimate the (path integral of) model evidence p(\tilde{s}(t) \mid m) to compare different models. This generally involves an intractable marginalization over hidden states, so model evidence (or marginal likelihood) is replaced with a variational free energy bound. Given the following definitions:

: \tilde{\mu}(t) = \underset{\tilde{\mu}}{\operatorname{arg\,min}}\ F(\tilde{s}(t), \tilde{\mu})

: G(\tilde{s}, \tilde{x}, \tilde{u}) = -\ln p(\tilde{s}, \tilde{x}, \tilde{u} \mid m)

Denote the Shannon entropy of the density q by H[q] = -E_q[\log q]. We can then write the variational free energy in two ways:

: F(\tilde{s}, \tilde{\mu}) = E_q[G(\tilde{s}, \tilde{x}, \tilde{u})] - H[q(\tilde{x}, \tilde{u} \mid \tilde{\mu})] = -\ln p(\tilde{s} \mid m) + D_{KL}[q(\tilde{x}, \tilde{u} \mid \tilde{\mu}) \,\|\, p(\tilde{x}, \tilde{u} \mid \tilde{s}, m)]

The second equality shows that minimizing variational free energy (i) minimizes the Kullback-Leibler divergence between the variational and true posterior density and (ii) renders the variational free energy a bound approximation to the negative log evidence (because the divergence can never be less than zero). Under the Laplace assumption q(\tilde{x}, \tilde{u} \mid \tilde{\mu}) = \mathcal{N}(\tilde{\mu}, C), the variational density is Gaussian and the precision that minimizes free energy is C^{-1} = \Pi = \partial_{\mu\mu} G(\tilde{s}, \tilde{\mu}). This means that free energy can be expressed in terms of the variational mean (omitting constants):

: F = G(\tilde{s}, \tilde{\mu}) + \tfrac{1}{2} \ln \lvert \partial_{\mu\mu} G(\tilde{s}, \tilde{\mu}) \rvert

The variational means that minimize the (path integral of) free energy can now be recovered by solving the generalized filter:

: \dot{\tilde{\mu}} = D\tilde{\mu} - \partial_{\mu} F(\tilde{s}, \tilde{\mu})

where D is a block-matrix derivative operator of identity matrices such that D\tilde{u} = [u', u'', \ldots]^T


Variational basis

Generalized filtering is based on the following lemma: ''The self-consistent solution to'' \dot{\tilde{\mu}} = D\tilde{\mu} - \partial_{\mu} F(\tilde{s}, \tilde{\mu}) ''satisfies the variational principle of stationary action, where action is the path integral of variational free energy''

: S = \int dt\, F(\tilde{s}(t), \tilde{\mu}(t))

Proof: self-consistency requires the motion of the mean to be the mean of the motion and (by the fundamental lemma of variational calculus)

: \dot{\tilde{\mu}} = D\tilde{\mu} \Leftrightarrow \partial_{\mu} F(\tilde{s}, \tilde{\mu}) = 0 \Leftrightarrow \delta_{\mu} S = 0

Put simply, small perturbations to the path of the mean do not change variational free energy, and it has the least action of all possible (local) paths.

Remarks: Heuristically, generalized filtering performs a gradient descent on variational free energy in a moving frame of reference: \dot{\tilde{\mu}} - D\tilde{\mu} = -\partial_{\mu} F(\tilde{s}, \tilde{\mu}), where the frame itself minimizes variational free energy. For a related example in statistical physics, see Kerr and Graham, who use ensemble dynamics in generalized coordinates to provide a generalized phase-space version of Langevin and associated Fokker-Planck equations. In practice, generalized filtering uses local linearization over intervals \Delta t to recover discrete updates

: \begin{align} \Delta\tilde{\mu} & = (\exp(\Delta t \cdot J) - I) J^{-1} \dot{\tilde{\mu}} \\ J & = \partial_{\tilde{\mu}} \dot{\tilde{\mu}} = D - \partial_{\mu\mu} F(\tilde{s}, \tilde{\mu}) \end{align}

This updates the means of hidden variables at each interval (usually the interval between observations).
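The local-linearization update above can be sketched numerically. The flow and its Jacobian here are a hypothetical linear stand-in for D - \partial_{\mu\mu}F, chosen because the update is then exact and easy to check:

```python
import numpy as np

def expm(A, terms=30):
    """Matrix exponential via truncated Taylor series (adequate for the
    small, well-scaled matrices used in this sketch)."""
    E, T = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

def local_linear_step(mu, flow, jacobian, dt):
    """One local-linearization update:
    delta = (expm(dt * J) - I) @ inv(J) @ mu_dot."""
    J = jacobian(mu)
    delta = (expm(dt * J) - np.eye(len(mu))) @ np.linalg.solve(J, flow(mu))
    return mu + delta

# Hypothetical stable linear flow mu_dot = J0 @ mu, for which the
# local-linearization update reproduces the exact solution expm(t * J0) @ mu0.
J0 = np.array([[-1.0, 1.0], [0.0, -2.0]])
mu = np.array([1.0, 1.0])
for _ in range(100):
    mu = local_linear_step(mu, lambda m: J0 @ m, lambda m: J0, dt=0.1)
```

For nonlinear flows the Jacobian is re-evaluated at each step, which is what makes the update "local".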


Generative (state-space) models in generalized coordinates

Usually, the generative density or model is specified in terms of a nonlinear input-state-output model with continuous nonlinear functions:

: \begin{align} s & = g(x,u) + \omega_s \\ \dot{x} & = f(x,u) + \omega_x \end{align}

The corresponding generalized model (under local linearity assumptions) is obtained from the chain rule:

: \begin{align} \tilde{s} & = \tilde{g}(\tilde{x},\tilde{u}) + \tilde{\omega}_s \\ \\ s & = g(x,u) + \omega_s \\ s' & = \partial_x g \cdot x' + \partial_u g \cdot u' + \omega'_s \\ s'' & = \partial_x g \cdot x'' + \partial_u g \cdot u'' + \omega''_s \\ & \vdots \end{align} \qquad \begin{align} D\tilde{x} & = \tilde{f}(\tilde{x},\tilde{u}) + \tilde{\omega}_x \\ \\ \dot{x} & = f(x,u) + \omega_x \\ \dot{x}' & = \partial_x f \cdot x' + \partial_u f \cdot u' + \omega'_x \\ \dot{x}'' & = \partial_x f \cdot x'' + \partial_u f \cdot u'' + \omega''_x \\ & \vdots \end{align}

Gaussian assumptions about the random fluctuations \omega then prescribe the likelihood and empirical priors on the motion of hidden states:

: \begin{align} p(\tilde{s},\tilde{x},\tilde{u} \mid m) & = p(\tilde{s} \mid \tilde{x},\tilde{u},m)\, p(D\tilde{x} \mid \tilde{x},\tilde{u},m)\, p(x \mid m)\, p(\tilde{u} \mid m) \\ p(\tilde{s} \mid \tilde{x},\tilde{u},m) & = \mathcal{N}(\tilde{g}(\tilde{x},\tilde{u}), \tilde{\Sigma}(\tilde{x},\tilde{u})_s) \\ p(D\tilde{x} \mid \tilde{x},\tilde{u},m) & = \mathcal{N}(\tilde{f}(\tilde{x},\tilde{u}), \tilde{\Sigma}(\tilde{x},\tilde{u})_x) \end{align}

The covariances \tilde{\Sigma} = V \otimes \Sigma factorize into a covariance \Sigma among variables and correlations V among generalized fluctuations that encode their autocorrelation:

: V = \begin{bmatrix} 1 & 0 & \ddot{\rho}(0) & \cdots \\ 0 & -\ddot{\rho}(0) & 0 & \\ \ddot{\rho}(0) & 0 & \rho^{(4)}(0) & \\ \vdots & & & \ddots \end{bmatrix}

Here, \ddot{\rho}(0) is the second derivative of the autocorrelation function \rho, evaluated at zero. This is a ubiquitous measure of roughness in the theory of stochastic processes. Crucially, the precision (inverse variance) of high-order derivatives falls to zero fairly quickly, which means it is only necessary to model relatively low-order generalized motion (usually between two and eight) for any given or parameterized autocorrelation function.
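The correlation matrix V can be built from the even derivatives of any smooth autocorrelation function, since for a stationary process the covariance between the i-th and j-th generalized fluctuations is (-1)^i \rho^{(i+j)}(0) (zero when i + j is odd). The Gaussian form of \rho and the roughness parameter gamma below are illustrative assumptions, not part of the scheme:

```python
import numpy as np

def gaussian_rho_derivs(gamma, order):
    """Derivatives at zero of the Gaussian autocorrelation
    rho(h) = exp(-gamma * h**2 / 2): rho^(2k)(0) = (-1)^k (2k-1)!! gamma^k,
    odd derivatives vanish."""
    derivs = np.zeros(order + 1)          # derivs[m] = rho^(m)(0)
    for m in range(0, order + 1, 2):
        k = m // 2
        dfact = 1
        for j in range(1, 2 * k, 2):      # (2k-1)!!
            dfact *= j
        derivs[m] = (-1) ** k * dfact * gamma ** k
    return derivs

def generalized_correlations(gamma, n):
    """Correlation matrix V among n orders of generalized motion:
    V[i, j] = (-1)**i * rho^(i+j)(0)."""
    r = gaussian_rho_derivs(gamma, 2 * (n - 1))
    V = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            V[i, j] = (-1) ** i * r[i + j]
    return V

V = generalized_correlations(gamma=1.0, n=3)
```

The resulting matrix reproduces the checkerboard pattern displayed above, with \ddot{\rho}(0) on the anti-diagonal of the leading block, and is symmetric and positive definite as a correlation matrix must be.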


Special cases


Filtering discrete time series

When time series are observed as a discrete sequence of N observations, the implicit sampling is treated as part of the generative process, where (using Taylor's theorem)

: [s_1, \ldots, s_N]^T = (E \otimes I) \cdot \tilde{s}(t) : \qquad E_{ij} = \frac{(t_i - t)^{j-1}}{(j-1)!}

In principle, the entire sequence could be used to estimate hidden variables at each point in time. However, the precision of samples in the distant past and future falls quickly and can be ignored. This allows the scheme to assimilate data online, using local observations around each time point (typically between two and eight).
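The Taylor embedding operator E can be sketched directly from its definition. The signal coefficients and sample times below are hypothetical; the check exploits the fact that a cubic signal is represented exactly by four generalized coordinates:

```python
import numpy as np
from math import factorial

def embedding_operator(times, t, n):
    """Taylor embedding: E[i, j] = (t_i - t)**j / j! maps generalized
    coordinates [s, s', ..., s^(n-1)] at time t onto samples at `times`."""
    E = np.zeros((len(times), n))
    for i, ti in enumerate(times):
        for j in range(n):
            E[i, j] = (ti - t) ** j / factorial(j)
    return E

t = 0.5
s_tilde = np.array([1.0, 2.0, -3.0, 0.6])   # [s, s', s'', s'''] at time t
times = np.array([0.3, 0.4, 0.5, 0.6, 0.7])
samples = embedding_operator(times, t, 4) @ s_tilde
```

The pseudoinverse of E maps local samples back to generalized coordinates, which is how discrete observations enter the (continuous) generalized filter.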


Generalized filtering and model parameters

For any slowly varying model parameters of the equations of motion f(x,u,\theta) or precision \tilde{\Pi}(x,u,\theta), generalized filtering takes the following form (where \mu corresponds to the variational mean of the parameters):

: \begin{align} \dot{\mu} & = \mu' \\ \dot{\mu}' & = -\partial_\mu F(\tilde{s},\mu) - \kappa \mu' \end{align}

Here, the solution \dot{\mu}' = 0 minimizes variational free energy when the motion of the mean is small. This can be seen by noting \dot{\mu}' = \dot{\mu} = 0 \Rightarrow \partial_\mu F = 0 \Rightarrow \delta_\mu S = 0. It is straightforward to show that this solution corresponds to a classical Newton update.
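This damped second-order ascent can be sketched for a hypothetical quadratic free energy in a scalar parameter; the curvature, damping \kappa and step size are illustrative choices:

```python
import numpy as np

def parameter_update(grad_F, mu, mu_prime, kappa, dt, steps):
    """Euler integration of mu_dot = mu', mu'_dot = -grad_F(mu) - kappa * mu'."""
    for _ in range(steps):
        mu, mu_prime = (mu + dt * mu_prime,
                        mu_prime + dt * (-grad_F(mu) - kappa * mu_prime))
    return mu, mu_prime

# Hypothetical free energy F = 0.5 * (mu - 3)**2, so grad F = mu - 3.
grad_F = lambda mu: mu - 3.0
mu, mu_prime = parameter_update(grad_F, mu=0.0, mu_prime=0.0,
                                kappa=2.0, dt=0.01, steps=5000)
```

With this damping the dynamics are critically damped, so the parameter mean settles on the minimizer without oscillation and its motion decays to zero.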


Relationship to Bayesian filtering and predictive coding


Generalized filtering and Kalman filtering

Classical filtering under Markovian or Wiener assumptions is equivalent to assuming the precision of the motion of random fluctuations is zero. In this limiting case, one only has to consider the states and their first derivative: \tilde{\mu} = (\mu, \mu'). This means generalized filtering takes the form of a Kalman-Bucy filter, with prediction and correction terms:

: \begin{align} \dot{\mu} & = \mu' - \partial_\mu F(\tilde{s},\tilde{\mu}) \\ \dot{\mu}' & = -\partial_{\mu'} F(\tilde{s},\tilde{\mu}) \end{align}

Substituting this first-order filtering into the discrete update scheme above gives the equivalent of (extended) Kalman filtering.
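The prediction-correction structure can be sketched for a hypothetical scalar generative model s = x + \omega_s, \dot{x} = -x + \omega_x, with illustrative precisions and a constant observation. This is a toy showing the shape of the update, not a full Kalman-Bucy implementation:

```python
import numpy as np

# First-order free energy for the hypothetical model:
# F = 0.5 * pi_s * (s - mu)**2 + 0.5 * pi_x * (mu_p + mu)**2
pi_s, pi_x = 9.0, 1.0         # illustrative sensory and dynamical precisions
s = 1.0                        # a constant observation, for simplicity

mu, mu_p = 0.0, 0.0            # state mean and its first derivative
dt = 0.01
for _ in range(2000):
    dF_dmu = -pi_s * (s - mu) + pi_x * (mu_p + mu)
    dF_dmu_p = pi_x * (mu_p + mu)
    mu, mu_p = (mu + dt * (mu_p - dF_dmu),    # prediction + correction
                mu_p + dt * (-dF_dmu_p))      # correction of the motion
# For these values the fixed point balances likelihood against prior
# dynamics at mu = pi_s / (pi_s + 1) = 0.9, with mu_p = -mu.
```

The equilibrium shrinks the observation toward the prior (decay) dynamics in proportion to the relative precisions, which is exactly the precision-weighted compromise a Kalman gain implements.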


Generalized filtering and particle filtering

Particle filtering is a sampling-based scheme that relaxes assumptions about the form of the variational or approximate posterior density. The corresponding generalized filtering scheme is called variational filtering (K J Friston, "Variational filtering," NeuroImage, vol. 41, no. 3, pp. 747-766, 2008). In variational filtering, an ensemble of particles diffuses over the free energy landscape in a frame of reference that moves with the expected (generalized) motion of the ensemble. This provides a relatively simple scheme that eschews Gaussian (unimodal) assumptions. Unlike particle filtering, it does not require proposal densities, or the elimination or creation of particles.
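The ensemble idea can be sketched with particles performing a Langevin-style descent on a hypothetical free-energy landscape; here the landscape is a static quadratic with its minimum at 1, and the moving frame of reference is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
grad_F = lambda mu: mu - 1.0          # hypothetical free energy, minimum at 1
particles = rng.normal(0.0, 2.0, size=256)

dt = 0.01
for _ in range(4000):
    drift = -grad_F(particles) * dt                       # descent on F
    diffusion = np.sqrt(2 * dt) * rng.normal(size=256)    # random exploration
    particles += drift + diffusion

# The ensemble samples the stationary density proportional to exp(-F),
# approximately N(1, 1) for this quadratic landscape.
```

Because the posterior is represented by the ensemble itself, no parametric (e.g., Gaussian) form is imposed, and no resampling of particles is needed.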


Generalized filtering and variational Bayes

Variational Bayes rests on a mean field partition of the variational density:

: q(\tilde{x}, \tilde{u}, \theta, \ldots \mid \tilde{\mu}, \mu) = q(\tilde{x}, \tilde{u} \mid \tilde{\mu})\, q(\theta \mid \mu) \ldots

This partition induces a variational update or step for each marginal density, which is usually solved analytically using conjugate priors. In generalized filtering, this leads to dynamic expectation maximisation (K J Friston, N Trujillo-Barreto, and J Daunizeau, "DEM: A variational treatment of dynamic systems," NeuroImage, vol. 41, no. 3, pp. 849-885, 2008), which comprises a D-step that optimizes the sufficient statistics of unknown states, an E-step for parameters and an M-step for precisions.


Generalized filtering and predictive coding

Generalized filtering is usually used to invert hierarchical models of the following form:

: \begin{align} \tilde{s} & = \tilde{g}^{(1)}(\tilde{x}^{(1)}, \tilde{u}^{(1)}) + \tilde{\omega}_s^{(1)} \\ D\tilde{x}^{(1)} & = \tilde{f}^{(1)}(\tilde{x}^{(1)}, \tilde{u}^{(1)}) + \tilde{\omega}_x^{(1)} \\ & \vdots \\ \tilde{u}^{(i-1)} & = \tilde{g}^{(i)}(\tilde{x}^{(i)}, \tilde{u}^{(i)}) + \tilde{\omega}_u^{(i)} \\ D\tilde{x}^{(i)} & = \tilde{f}^{(i)}(\tilde{x}^{(i)}, \tilde{u}^{(i)}) + \tilde{\omega}_x^{(i)} \\ & \vdots \end{align}

The ensuing generalized gradient descent on free energy can then be expressed compactly in terms of prediction errors, where (omitting high order terms):

: \begin{align} \dot{\tilde{\mu}}_u^{(i)} & = D\tilde{\mu}_u^{(i)} - \partial_u \tilde{\varepsilon}^{(i)} \cdot \Pi^{(i)} \tilde{\varepsilon}^{(i)} - \Pi^{(i+1)} \tilde{\varepsilon}_u^{(i+1)} \\ \dot{\tilde{\mu}}_x^{(i)} & = D\tilde{\mu}_x^{(i)} - \partial_x \tilde{\varepsilon}^{(i)} \cdot \Pi^{(i)} \tilde{\varepsilon}^{(i)} \\ \\ \tilde{\varepsilon}_u^{(i)} & = \tilde{\mu}_u^{(i-1)} - \tilde{g}^{(i)}(\tilde{\mu}_x^{(i)}, \tilde{\mu}_u^{(i)}) \\ \tilde{\varepsilon}_x^{(i)} & = D\tilde{\mu}_x^{(i)} - \tilde{f}^{(i)}(\tilde{\mu}_x^{(i)}, \tilde{\mu}_u^{(i)}) \end{align}

Here, \Pi^{(i)} is the precision of random fluctuations at the ''i''-th level. This is known as generalized predictive coding, with linear predictive coding as a special case.
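A minimal two-level sketch of these prediction-error dynamics can make the precision-weighting concrete. The linear mapping, precisions, datum and prior below are hypothetical, and generalized motion is truncated to order zero for brevity:

```python
import numpy as np

# Two-level static sketch: sensory data s is predicted as w * mu from a
# cause mu, which itself has a prior mean p at the level above.
w, pi1, pi2 = 1.0, 4.0, 1.0     # illustrative mapping and precisions
s, p = 2.0, 0.0                  # observed datum and prior mean

mu = 0.0
dt = 0.01
for _ in range(2000):
    eps1 = s - w * mu            # level-1 (sensory) prediction error
    eps2 = mu - p                # level-2 (prior) prediction error
    mu += dt * (w * pi1 * eps1 - pi2 * eps2)   # precision-weighted descent

# Analytic fixed point: mu* = (w * pi1 * s + pi2 * p) / (w**2 * pi1 + pi2) = 1.6
```

The estimated cause settles where the two precision-weighted prediction errors balance; raising the sensory precision pi1 pulls the estimate toward the data, raising pi2 pulls it toward the prior.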


Applications

Generalized filtering has been primarily applied to biological time series, in particular functional magnetic resonance imaging and electrophysiological data. This is usually in the context of dynamic causal modelling, to make inferences about the underlying architectures of (neuronal) systems generating data. It is also used to simulate inference in terms of generalized (hierarchical) predictive coding in the brain (K Friston, "Hierarchical models in the brain," PLoS Comput. Biol., vol. 4, no. 11, p. e1000211, 2008).


See also

* Dynamic Bayesian network
* Kalman filter
* Linear predictive coding
* Optimal control
* Particle filter
* Recursive Bayesian estimation
* System identification
* Variational Bayesian methods




External links


* Software: demonstrations and applications are available as academic freeware (as Matlab code) in the DEM toolbox of SPM
* Papers: a collection of technical and application papers