Bayesian programming is a formalism and a methodology for specifying probabilistic models and solving problems when less than the necessary information is available.

Edwin T. Jaynes proposed that probability could be considered as an alternative and an extension of logic for rational reasoning with incomplete and uncertain information. In his founding book ''Probability Theory: The Logic of Science'' he developed this theory and proposed what he called "the robot," which was not a physical device, but an inference engine to automate probabilistic reasoning: a kind of Prolog for probability instead of logic. Bayesian programming is a formal and concrete implementation of this "robot".

Bayesian programming may also be seen as an algebraic formalism to specify graphical models such as, for instance, Bayesian networks, dynamic Bayesian networks, Kalman filters or hidden Markov models. Indeed, Bayesian programming is more general than Bayesian networks and has a power of expression equivalent to probabilistic factor graphs.


Formalism

A Bayesian program is a means of specifying a family of probability distributions. The constituent elements of a Bayesian program are presented below:

: \text{Program}\begin{cases}
\text{Description}\begin{cases}
\text{Specification }(\pi)\begin{cases}
\text{Variables}\\
\text{Decomposition}\\
\text{Forms}
\end{cases}\\
\text{Identification (based on }\delta)
\end{cases}\\
\text{Question}
\end{cases}

# A program is constructed from a description and a question.
# A description is constructed using some specification (\pi) as given by the programmer and an identification or learning process for the parameters not completely specified by the specification, using a data set (\delta).
# A specification is constructed from a set of pertinent variables, a decomposition and a set of forms.
# Forms are either parametric forms or questions to other Bayesian programs.
# A question specifies which probability distribution has to be computed.
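
As an illustration only, the nesting above can be mirrored by a small data structure; in the following Python sketch every class and field name is hypothetical, and the decomposition and forms are stored symbolically rather than executed:

<syntaxhighlight lang="python">
# A purely illustrative sketch of the program / description / specification / question
# nesting described above; all class and field names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Specification:
    variables: List[str]            # the pertinent variables
    decomposition: List[str]        # the factors P(L_k | R_k ...), written symbolically
    forms: Dict[str, Callable]      # a parametric form (or a sub-program's question) per factor

@dataclass
class Description:
    specification: Specification    # pi, given by the programmer
    data: Optional[object] = None   # delta, used to identify the free parameters

@dataclass
class BayesianProgram:
    description: Description
    question: str                   # which distribution P(Searched | Known) to compute
</syntaxhighlight>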


Description

The purpose of a description is to specify an effective method of computing a joint probability distribution on a set of variables \left\{X_1,X_2,\cdots,X_N\right\} given a set of experimental data \delta and some specification \pi. This joint distribution is denoted as: P\left(X_1\wedge X_2\wedge\cdots\wedge X_N\mid\delta\wedge\pi\right).

To specify preliminary knowledge \pi, the programmer must undertake the following:
# Define the set of relevant variables \left\{X_1,X_2,\cdots,X_N\right\} on which the joint distribution is defined.
# Decompose the joint distribution (break it into relevant independent or conditional probabilities).
# Define the forms of each of the distributions (e.g., for each variable, one of the list of probability distributions).


Decomposition

Given a partition of \left\{X_1,\cdots,X_N\right\} containing K subsets, K variables are defined L_1,\cdots,L_K, each corresponding to one of these subsets. Each variable L_k is obtained as the conjunction of the variables \left\{X_{k_1},X_{k_2},\cdots\right\} belonging to the k^\text{th} subset. Recursive application of Bayes' theorem leads to:

: \begin{align}
& P\left(X_1\wedge X_2\wedge\cdots\wedge X_N\mid\delta\wedge\pi\right)\\
= {} & P\left(L_1\wedge\cdots\wedge L_K\mid\delta\wedge\pi\right)\\
= {} & P\left(L_1\mid\delta\wedge\pi\right)\times P\left(L_2\mid L_1\wedge\delta\wedge\pi\right)\times\cdots\times P\left(L_K\mid L_{K-1}\wedge\cdots\wedge L_1\wedge\delta\wedge\pi\right)
\end{align}

Conditional independence hypotheses then allow further simplifications. A conditional independence hypothesis for variable L_k is defined by choosing some variables X_n among the variables appearing in the conjunction L_{k-1}\wedge\cdots\wedge L_2\wedge L_1, labelling R_k as the conjunction of these chosen variables and setting:

: P\left(L_k\mid L_{k-1}\wedge\cdots\wedge L_1\wedge\delta\wedge\pi\right) = P\left(L_k\mid R_k\wedge\delta\wedge\pi\right)

We then obtain:

: \begin{align}
& P\left(X_1\wedge X_2\wedge\cdots\wedge X_N\mid\delta\wedge\pi\right)\\
= {} & P\left(L_1\mid\delta\wedge\pi\right)\times P\left(L_2\mid R_2\wedge\delta\wedge\pi\right)\times\cdots\times P\left(L_K\mid R_K\wedge\delta\wedge\pi\right)
\end{align}

Such a simplification of the joint distribution as a product of simpler distributions is called a decomposition, derived using the chain rule. This ensures that each variable appears at most once on the left of a conditioning bar, which is the necessary and sufficient condition to write mathematically valid decompositions.
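
For instance (an assumed example, not from the text), with four variables and the conditional independence hypotheses that X_3 depends only on X_1 and that X_4 depends only on X_3, one valid decomposition is:

: P\left(X_1\wedge X_2\wedge X_3\wedge X_4\mid\delta\wedge\pi\right) = P\left(X_1\mid\delta\wedge\pi\right)\times P\left(X_2\mid X_1\wedge\delta\wedge\pi\right)\times P\left(X_3\mid X_1\wedge\delta\wedge\pi\right)\times P\left(X_4\mid X_3\wedge\delta\wedge\pi\right)

Each of X_1, X_2, X_3 and X_4 appears exactly once on the left of a conditioning bar, as required.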


Forms

Each distribution P\left(L_k\mid R_k\wedge\delta\wedge\pi\right) appearing in the product is then associated with either a parametric form (i.e., a function f_\mu\left(L_k\right)) or a question to another Bayesian program P\left(L_k\mid R_k\wedge\delta\wedge\pi\right) = P\left(L\mid R\wedge\widehat{\delta}\wedge\widehat{\pi}\right). When it is a form f_\mu\left(L_k\right), in general, \mu is a vector of parameters that may depend on R_k or \delta or both. Learning takes place when some of these parameters are computed using the data set \delta.

An important feature of Bayesian programming is this capacity to use questions to other Bayesian programs as components of the definition of a new Bayesian program. P\left(L_k\mid R_k\wedge\delta\wedge\pi\right) is obtained by some inferences done by another Bayesian program defined by the specifications \widehat{\pi} and the data \widehat{\delta}. This is similar to calling a subroutine in classical programming and provides an easy way to build hierarchical models.


Question

Given a description (i.e., P\left(X_1\wedge X_2\wedge\cdots\wedge X_N\mid\delta\wedge\pi\right)), a question is obtained by partitioning \left\{X_1,X_2,\cdots,X_N\right\} into three sets: the searched variables, the known variables and the free variables. The three variables Searched, Known and Free are defined as the conjunction of the variables belonging to these sets. A question is defined as the set of distributions:

: P\left(\text{Searched}\mid\text{Known}\wedge\delta\wedge\pi\right)

made of as many "instantiated questions" as the cardinality of Known, each instantiated question being the distribution:

: P\left(\text{Searched}\mid\text{known}\wedge\delta\wedge\pi\right)

where known denotes one particular value of the variable Known.


Inference

Given the joint distribution P\left(X_1\wedge X_2\wedge\cdots\wedge X_N\mid\delta\wedge\pi\right), it is always possible to compute any possible question using the following general inference:

: \begin{align}
& P\left(\text{Searched}\mid\text{Known}\wedge\delta\wedge\pi\right)\\
= {} & \sum_\text{Free} P\left(\text{Searched}\wedge\text{Free}\mid\text{Known}\wedge\delta\wedge\pi\right)\\
= {} & \frac{\sum_\text{Free} P\left(\text{Searched}\wedge\text{Free}\wedge\text{Known}\mid\delta\wedge\pi\right)}{P\left(\text{Known}\mid\delta\wedge\pi\right)}\\
= {} & \frac{\sum_\text{Free} P\left(\text{Searched}\wedge\text{Free}\wedge\text{Known}\mid\delta\wedge\pi\right)}{\sum_{\text{Searched}\wedge\text{Free}} P\left(\text{Searched}\wedge\text{Free}\wedge\text{Known}\mid\delta\wedge\pi\right)}\\
= {} & \frac{1}{Z}\times\sum_\text{Free} P\left(\text{Searched}\wedge\text{Free}\wedge\text{Known}\mid\delta\wedge\pi\right)
\end{align}

where the first equality results from the marginalization rule, the second results from Bayes' theorem and the third corresponds to a second application of marginalization. The denominator appears to be a normalization term and can be replaced by a constant Z.

Theoretically, this allows one to solve any Bayesian inference problem. In practice, however, the cost of computing P\left(\text{Searched}\mid\text{Known}\wedge\delta\wedge\pi\right) exhaustively and exactly is too great in almost all cases. Replacing the joint distribution by its decomposition we get:

: P\left(\text{Searched}\mid\text{Known}\wedge\delta\wedge\pi\right) = \frac{1}{Z}\sum_\text{Free}\left[\prod_{k=1}^K P\left(L_k\mid R_k\wedge\delta\wedge\pi\right)\right]

which is usually a much simpler expression to compute, as the dimensionality of the problem is considerably reduced by the decomposition into a product of lower dimension distributions.
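
A minimal sketch of this exhaustive inference for discrete variables, assuming the joint distribution is available as a table (all variable names and probability values below are illustrative):

<syntaxhighlight lang="python">
# Exhaustive inference P(Searched | Known) by summing the free variables out of a
# joint table and normalizing. Purely illustrative; variables and values are made up.
from itertools import product

# Joint P(A, B, C) over three binary variables, stored as {(a, b, c): probability}.
joint = {abc: p for abc, p in zip(
    product([0, 1], repeat=3),
    [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.05, 0.30])}

def infer(joint, searched_index, known_index, known_value):
    """Return P(Searched | Known = known_value); the remaining variable is summed out (Free)."""
    unnormalized = {}
    for assignment, p in joint.items():
        if assignment[known_index] != known_value:
            continue
        s = assignment[searched_index]
        unnormalized[s] = unnormalized.get(s, 0.0) + p   # marginalize over Free
    z = sum(unnormalized.values())                        # normalization constant Z
    return {s: p / z for s, p in unnormalized.items()}

print(infer(joint, searched_index=0, known_index=2, known_value=1))  # P(A | C = 1)
</syntaxhighlight>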


Example


Bayesian spam detection

The purpose of Bayesian spam filtering is to eliminate junk e-mails. The problem is very easy to formulate. E-mails should be classified into one of two categories: non-spam or spam. The only available information to classify the e-mails is their content: a set of words. Using these words without taking the order into account is commonly called a bag of words model.

The classifier should furthermore be able to adapt to its user and to learn from experience. Starting from an initial standard setting, the classifier should modify its internal parameters when the user disagrees with its decision. It will hence adapt to the user's criteria to differentiate between non-spam and spam. It will improve its results as it encounters an increasing number of classified e-mails.


Variables

The variables necessary to write this program are as follows:
# Spam: a binary variable, false if the e-mail is not spam and true otherwise.
# W_0,W_1,\ldots,W_{N-1}: N binary variables. W_n is true if the n^\text{th} word of the dictionary is present in the text.

These N + 1 binary variables sum up all the information about an e-mail.


Decomposition

Starting from the joint distribution and applying Bayes' theorem recursively, we obtain:

: \begin{align}
& P\left(\text{Spam}\wedge W_0\wedge\cdots\wedge W_{N-1}\right)\\
= {} & P(\text{Spam})\times P\left(W_0\mid\text{Spam}\right)\times P\left(W_1\mid\text{Spam}\wedge W_0\right)\\
& \times\cdots\\
& \times P\left(W_{N-1}\mid\text{Spam}\wedge W_0\wedge\cdots\wedge W_{N-2}\right)
\end{align}

This is an exact mathematical expression. It can be drastically simplified by assuming that the probability of appearance of a word knowing the nature of the text (spam or not) is independent of the appearance of the other words. This assumption makes this spam filter a naive Bayes model. For instance, the programmer can assume that:

: P\left(W_1\mid\text{Spam}\wedge W_0\right) = P\left(W_1\mid\text{Spam}\right)

to finally obtain:

: P\left(\text{Spam}\wedge W_0\wedge\cdots\wedge W_{N-1}\right) = P(\text{Spam})\prod_{n=0}^{N-1} P\left(W_n\mid\text{Spam}\right)

This kind of assumption is known as the naive Bayes assumption. It is "naive" in the sense that the independence between words is clearly not completely true. For instance, it completely neglects that the appearance of pairs of words may be more significant than isolated appearances. However, the programmer may assume this hypothesis and may develop the model and the associated inferences to test how reliable and efficient it is.


Parametric forms

To be able to compute the joint distribution, the programmer must now specify the N + 1 distributions appearing in the decomposition:
# P(\text{Spam}) is a prior defined, for instance, by P([\text{Spam}=\text{true}]) = 0.75.
# Each of the N forms P\left(W_n\mid\text{Spam}\right) may be specified using the Laplace rule of succession (a pseudocount-based smoothing technique that counters the zero-frequency problem of words never seen before):
## P\left(W_n\mid[\text{Spam}=\text{false}]\right) = \frac{1+a_f^n}{2+a_f}
## P\left(W_n\mid[\text{Spam}=\text{true}]\right) = \frac{1+a_t^n}{2+a_t}
where a_f^n stands for the number of appearances of the n^\text{th} word in non-spam e-mails and a_f stands for the total number of non-spam e-mails. Similarly, a_t^n stands for the number of appearances of the n^\text{th} word in spam e-mails and a_t stands for the total number of spam e-mails.
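
For instance (with purely illustrative counts), if the n^\text{th} word has appeared in 3 of the 10 non-spam e-mails seen so far, the Laplace rule of succession gives:

: P\left(W_n\mid[\text{Spam}=\text{false}]\right) = \frac{1+3}{2+10} = \frac{4}{12}\approx 0.33

so that a word never seen before (a_f^n = 0) still receives the non-zero probability \frac{1}{2+a_f}.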


Identification

The N forms P\left(W_n\mid\text{Spam}\right) are not yet completely specified because the 2N + 2 parameters a_f^n, a_t^n, a_f and a_t have no values yet. The identification of these parameters could be done either by batch processing a series of classified e-mails or by an incremental updating of the parameters using the user's classifications of the e-mails as they arrive. Both methods could be combined: the system could start with initial standard values of these parameters issued from a generic database, then some incremental learning customizes the classifier to each individual user.


Question

The question asked to the program is: "what is the probability for a given text to be spam, knowing which words appear and don't appear in this text?" It can be formalized by:

: P\left(\text{Spam}\mid w_0\wedge\cdots\wedge w_{N-1}\right)

which can be computed as follows:

: P\left(\text{Spam}\mid w_0\wedge\cdots\wedge w_{N-1}\right) = \frac{P(\text{Spam})\prod_{n=0}^{N-1} P\left(w_n\mid\text{Spam}\right)}{\sum_\text{Spam}\left[P(\text{Spam})\prod_{n=0}^{N-1} P\left(w_n\mid\text{Spam}\right)\right]}

The denominator appears to be a normalization constant. It is not necessary to compute it to decide if we are dealing with spam. For instance, an easy trick is to compute the ratio:

: \frac{P\left([\text{Spam}=\text{true}]\mid w_0\wedge\cdots\wedge w_{N-1}\right)}{P\left([\text{Spam}=\text{false}]\mid w_0\wedge\cdots\wedge w_{N-1}\right)} = \frac{P([\text{Spam}=\text{true}])}{P([\text{Spam}=\text{false}])}\times\prod_{n=0}^{N-1}\left[\frac{P\left(w_n\mid[\text{Spam}=\text{true}]\right)}{P\left(w_n\mid[\text{Spam}=\text{false}]\right)}\right]

This computation is faster and easier because it requires only 2N products.


Bayesian program

The Bayesian spam filter program is completely defined by:

: \Pr\begin{cases}
Ds\begin{cases}
Sp(\pi)\begin{cases}
Va: \text{Spam}, W_0, W_1, \ldots, W_{N-1}\\
Dc: \begin{cases}
P\left(\text{Spam}\wedge W_0\wedge\cdots\wedge W_{N-1}\right)\\
= P(\text{Spam})\prod_{n=0}^{N-1} P\left(W_n\mid\text{Spam}\right)
\end{cases}\\
Fo: \begin{cases}
P(\text{Spam}): \begin{cases}
P([\text{Spam}=\text{false}]) = 0.25\\
P([\text{Spam}=\text{true}]) = 0.75
\end{cases}\\
P\left(W_n\mid\text{Spam}\right): \begin{cases}
P\left(W_n\mid[\text{Spam}=\text{false}]\right) = \frac{1+a_f^n}{2+a_f}\\
P\left(W_n\mid[\text{Spam}=\text{true}]\right) = \frac{1+a_t^n}{2+a_t}
\end{cases}
\end{cases}
\end{cases}\\
\text{Identification (based on }\delta)
\end{cases}\\
Qu: P\left(\text{Spam}\mid w_0\wedge\cdots\wedge w_{N-1}\right)
\end{cases}
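
A minimal runnable sketch of this filter is given below; the dictionary, the training e-mails and their counts are illustrative, while the priors and the Laplace rule of succession follow the program above:

<syntaxhighlight lang="python">
# Naive Bayes spam filter following the Bayesian program above.
# The dictionary and training e-mails are illustrative values.
dictionary = ["free", "meeting", "winner", "project", "offer"]

# Classified training e-mails: (set of words present, is_spam)
training = [
    ({"free", "winner", "offer"}, True),
    ({"free", "offer"}, True),
    ({"meeting", "project"}, False),
    ({"project", "offer"}, False),
]

# Identification: count a_t (spam e-mails), a_f (non-spam), and a_t^n, a_f^n per word.
a_t = sum(1 for _, spam in training if spam)
a_f = len(training) - a_t
a_t_n = [sum(1 for words, spam in training if spam and w in words) for w in dictionary]
a_f_n = [sum(1 for words, spam in training if not spam and w in words) for w in dictionary]

P_SPAM_TRUE, P_SPAM_FALSE = 0.75, 0.25      # priors from the program above

def p_word(n, present, spam):
    """P(W_n = present | Spam = spam) with the Laplace rule of succession."""
    count, total = (a_t_n[n], a_t) if spam else (a_f_n[n], a_f)
    p_true = (1 + count) / (2 + total)
    return p_true if present else 1 - p_true

def spam_ratio(words):
    """Ratio P(Spam=true | w) / P(Spam=false | w); > 1 means 'spam' is more probable."""
    ratio = P_SPAM_TRUE / P_SPAM_FALSE
    for n in range(len(dictionary)):
        present = dictionary[n] in words
        ratio *= p_word(n, present, True) / p_word(n, present, False)
    return ratio

print(spam_ratio({"free", "offer", "winner"}))   # large ratio: classified as spam
print(spam_ratio({"meeting", "project"}))        # small ratio: classified as non-spam
</syntaxhighlight>

The ratio returned by spam_ratio implements the 2N-product trick of the previous section, so the normalization constant is never computed.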


Bayesian filter, Kalman filter and hidden Markov model

Bayesian filters (often called recursive Bayesian estimation) are generic probabilistic models for time evolving processes. Numerous models are particular instances of this generic approach, for instance: the Kalman filter or the hidden Markov model (HMM).


Variables

* Variables S^0,\ldots,S^T are a time series of state variables considered to be on a time horizon ranging from 0 to T.
* Variables O^0,\ldots,O^T are a time series of observation variables on the same horizon.


Decomposition

The decomposition is based:
* on P\left(S^t\mid S^{t-1}\right), called the system model, transition model or dynamic model, which formalizes the transition from the state at time t-1 to the state at time t;
* on P\left(O^t\mid S^t\right), called the observation model, which expresses what can be observed at time t when the system is in state S^t;
* on an initial state at time 0: P\left(S^0\wedge O^0\right).


Parametric forms

The parametric forms are not constrained and different choices lead to different well-known models: see the Kalman filter and hidden Markov model sections just below.


Question

The typical question for such models is P\left(S^{t+k}\mid O^0\wedge\cdots\wedge O^t\right): what is the probability distribution for the state at time t + k, knowing the observations from instant 0 to t? The most common case is Bayesian filtering where k=0, which searches for the present state, knowing past observations. However, it is also possible (k>0) to extrapolate a future state from past observations, or (k<0) to do smoothing, that is, to recover a past state from observations made either before or after that instant. More complicated questions may also be asked as shown below in the HMM section.

Bayesian filters (k=0) have a very interesting recursive property, which contributes greatly to their attractiveness. P\left(S^t\mid O^0\wedge\cdots\wedge O^t\right) may be computed simply from P\left(S^{t-1}\mid O^0\wedge\cdots\wedge O^{t-1}\right) with the following formula (up to a normalization constant):

: \begin{align}
& P\left(S^t\mid O^0\wedge\cdots\wedge O^t\right)\\
= {} & P\left(O^t\mid S^t\right)\times\sum_{S^{t-1}}\left[P\left(S^t\mid S^{t-1}\right)\times P\left(S^{t-1}\mid O^0\wedge\cdots\wedge O^{t-1}\right)\right]
\end{align}

Another interesting point of view for this equation is to consider that there are two phases: a prediction phase and an estimation phase:
* During the prediction phase, the state is predicted using the dynamic model and the estimation of the state at the previous moment:
:: \begin{align}
& P\left(S^t\mid O^0\wedge\cdots\wedge O^{t-1}\right)\\
= {} & \sum_{S^{t-1}}\left[P\left(S^t\mid S^{t-1}\right)\times P\left(S^{t-1}\mid O^0\wedge\cdots\wedge O^{t-1}\right)\right]
\end{align}
* During the estimation phase, the prediction is either confirmed or invalidated using the last observation:
:: \begin{align}
& P\left(S^t\mid O^0\wedge\cdots\wedge O^t\right)\\
= {} & P\left(O^t\mid S^t\right)\times P\left(S^t\mid O^0\wedge\cdots\wedge O^{t-1}\right)
\end{align}
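
A minimal sketch of Bayesian filtering (k = 0) for a two-state discrete system, making the prediction and estimation phases explicit; the transition and observation matrices are illustrative values:

<syntaxhighlight lang="python">
# A discrete Bayes filter (k = 0) with the prediction and estimation phases above.
# The two-state transition and observation models are illustrative values.

# P(S^t | S^{t-1}): transition[previous][current]
transition = [[0.7, 0.3],
              [0.2, 0.8]]
# P(O^t | S^t): observation[state][observed symbol]
observation = [[0.9, 0.1],
               [0.3, 0.7]]

def bayes_filter_step(belief, observed):
    """One recursion: from P(S^{t-1} | O^0..O^{t-1}) and O^t to P(S^t | O^0..O^t)."""
    # Prediction phase: sum over the previous state using the dynamic model.
    predicted = [sum(transition[prev][s] * belief[prev] for prev in range(2))
                 for s in range(2)]
    # Estimation phase: weight the prediction by the observation model, then normalize.
    unnormalized = [observation[s][observed] * predicted[s] for s in range(2)]
    z = sum(unnormalized)
    return [p / z for p in unnormalized]

belief = [0.5, 0.5]                 # P(S^0): uniform initial state
for o in [0, 0, 1, 1]:              # a sequence of observations
    belief = bayes_filter_step(belief, o)
    print(belief)
</syntaxhighlight>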


Bayesian program

: \Pr\begin{cases}
Ds\begin{cases}
Sp(\pi)\begin{cases}
Va:\\
S^0,\cdots,S^T,O^0,\cdots,O^T\\
Dc:\\
\begin{cases}
P\left(S^0\wedge\cdots\wedge S^T\wedge O^0\wedge\cdots\wedge O^T\mid\pi\right)\\
= P\left(S^0\wedge O^0\right)\times\prod_{t=1}^T\left[P\left(S^t\mid S^{t-1}\right)\times P\left(O^t\mid S^t\right)\right]
\end{cases}\\
Fo:\\
\begin{cases}
P\left(S^0\wedge O^0\right)\\
P\left(S^t\mid S^{t-1}\right)\\
P\left(O^t\mid S^t\right)
\end{cases}
\end{cases}\\
Id
\end{cases}\\
Qu:\\
\begin{cases}
P\left(S^{t+k}\mid O^0\wedge\cdots\wedge O^t\right)\\
\left(k=0\right)\equiv\text{Filtering}\\
\left(k>0\right)\equiv\text{Prediction}\\
\left(k<0\right)\equiv\text{Smoothing}
\end{cases}
\end{cases}


Kalman filter

The very well-known Kalman filters are a special case of Bayesian filters. They are defined by the following Bayesian program:

: \Pr\begin{cases}
Ds\begin{cases}
Sp(\pi)\begin{cases}
Va:\\
S^0,\cdots,S^T,O^0,\cdots,O^T\\
Dc:\\
\begin{cases}
P\left(S^0\wedge\cdots\wedge O^T\mid\pi\right)\\
= P\left(S^0\wedge O^0\mid\pi\right)\times\prod_{t=1}^T\left[P\left(S^t\mid S^{t-1}\wedge\pi\right)\times P\left(O^t\mid S^t\wedge\pi\right)\right]
\end{cases}\\
Fo:\\
\begin{cases}
P\left(S^t\mid S^{t-1}\wedge\pi\right)\equiv G\left(S^t, A\bullet S^{t-1}, Q\right)\\
P\left(O^t\mid S^t\wedge\pi\right)\equiv G\left(O^t, H\bullet S^t, R\right)
\end{cases}
\end{cases}\\
Id
\end{cases}\\
Qu:\\
P\left(S^T\mid O^0\wedge\cdots\wedge O^T\wedge\pi\right)
\end{cases}

* Variables are continuous.
* The transition model P\left(S^t\mid S^{t-1}\wedge\pi\right) and the observation model P\left(O^t\mid S^t\wedge\pi\right) are both specified using Gaussian laws with means that are linear functions of the conditioning variables.

With these hypotheses and by using the recursive formula, it is possible to solve the inference problem analytically to answer the usual P\left(S^T\mid O^0\wedge\cdots\wedge O^T\wedge\pi\right) question. This leads to an extremely efficient algorithm, which explains the popularity of Kalman filters and the number of their everyday applications.

When there are no obvious linear transition and observation models, it is still often possible, using a first-order Taylor expansion, to treat these models as locally linear. This generalization is commonly called the extended Kalman filter.
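
As a sketch only, the recursion can be written in closed form for a one-dimensional state; the coefficients A, H and the noise variances Q, R below are illustrative values:

<syntaxhighlight lang="python">
# One-dimensional Kalman filter: prediction and estimation in closed form.
# A, H, Q, R and the measurements are illustrative values, not from the text.
A, H = 1.0, 1.0          # linear transition and observation coefficients
Q, R = 0.01, 0.25        # transition and observation noise variances

def kalman_step(mean, var, measurement):
    """One recursion of the Kalman filter for a scalar state."""
    # Prediction phase: propagate the Gaussian through the dynamic model.
    mean_pred = A * mean
    var_pred = A * A * var + Q
    # Estimation phase: correct the prediction with the new observation.
    gain = var_pred * H / (H * H * var_pred + R)
    mean_new = mean_pred + gain * (measurement - H * mean_pred)
    var_new = (1 - gain * H) * var_pred
    return mean_new, var_new

mean, var = 0.0, 1.0                      # P(S^0): prior Gaussian on the state
for z in [0.9, 1.1, 1.0, 0.95]:           # observations O^1 .. O^4
    mean, var = kalman_step(mean, var, z)
    print(mean, var)
</syntaxhighlight>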


Hidden Markov model

Hidden Markov models (HMMs) are another very popular specialization of Bayesian filters. They are defined by the following Bayesian program:

: \Pr\begin{cases}
Ds\begin{cases}
Sp(\pi)\begin{cases}
Va:\\
S^0,\ldots,S^T,O^0,\ldots,O^T\\
Dc:\\
\begin{cases}
P\left(S^0\wedge\cdots\wedge O^T\mid\pi\right)\\
= P\left(S^0\wedge O^0\mid\pi\right)\times\prod_{t=1}^T\left[P\left(S^t\mid S^{t-1}\wedge\pi\right)\times P\left(O^t\mid S^t\wedge\pi\right)\right]
\end{cases}\\
Fo:\\
\begin{cases}
P\left(S^0\wedge O^0\mid\pi\right)\equiv\text{Matrix}\\
P\left(S^t\mid S^{t-1}\wedge\pi\right)\equiv\text{Matrix}\\
P\left(O^t\mid S^t\wedge\pi\right)\equiv\text{Matrix}
\end{cases}
\end{cases}\\
Id
\end{cases}\\
Qu:\\
\max_{S^0\wedge\cdots\wedge S^{T-1}}\left[P\left(S^0\wedge\cdots\wedge S^{T-1}\mid S^T\wedge O^0\wedge\cdots\wedge O^T\wedge\pi\right)\right]
\end{cases}

* Variables are treated as being discrete.
* The transition model P\left(S^t\mid S^{t-1}\wedge\pi\right) and the observation model P\left(O^t\mid S^t\wedge\pi\right) are both specified using probability matrices.
* The question most frequently asked of HMMs is:
:: \max_{S^0\wedge\cdots\wedge S^{T-1}}\left[P\left(S^0\wedge\cdots\wedge S^{T-1}\mid S^T\wedge O^0\wedge\cdots\wedge O^T\wedge\pi\right)\right]

What is the most probable series of states that leads to the present state, knowing the past observations? This particular question may be answered with a specific and very efficient algorithm called the Viterbi algorithm. The Baum–Welch algorithm, which estimates the parameters of an HMM from observation sequences, has also been developed for HMMs.
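
A sketch of the Viterbi algorithm for a two-state HMM; the initial, transition and observation matrices are illustrative values:

<syntaxhighlight lang="python">
# Viterbi algorithm: most probable state sequence for a discrete HMM.
# The two-state initial, transition and observation matrices are illustrative.
initial = [0.6, 0.4]                         # P(S^0)
transition = [[0.7, 0.3], [0.4, 0.6]]        # P(S^t | S^{t-1})
observation = [[0.9, 0.1], [0.2, 0.8]]       # P(O^t | S^t)

def viterbi(observations):
    """Return the most probable state sequence and its probability."""
    n_states = len(initial)
    # prob[s]: probability of the best path ending in state s; paths[s]: that path.
    prob = [initial[s] * observation[s][observations[0]] for s in range(n_states)]
    paths = [[s] for s in range(n_states)]
    for o in observations[1:]:
        new_prob, new_paths = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: prob[p] * transition[p][s])
            new_prob.append(prob[best_prev] * transition[best_prev][s] * observation[s][o])
            new_paths.append(paths[best_prev] + [s])
        prob, paths = new_prob, new_paths
    best = max(range(n_states), key=lambda s: prob[s])
    return paths[best], prob[best]

print(viterbi([0, 0, 1, 1, 1]))   # ([0, 0, 1, 1, 1], ...) for these matrices
</syntaxhighlight>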


Applications


Academic applications

Since 2000, Bayesian programming has been used to develop both robotics applications and life sciences models.


Robotics

In robotics, Bayesian programming was applied to autonomous robotics, robotic CAD systems, advanced driver-assistance systems, robotic arm control, mobile robotics, human-robot interaction, human-vehicle interaction (Bayesian autonomous driver models), video game avatar programming and training, and real-time strategy games (AI).


Life sciences

In life sciences, Bayesian programming was used in vision to reconstruct shape from motion, to model visuo-vestibular interaction and to study saccadic eye movements; in speech perception and control to study early speech acquisition and the emergence of articulatory-acoustic systems; and to model handwriting perception and control.


Pattern recognition

Bayesian program learning has potential applications in voice recognition and synthesis, image recognition and natural language processing. It employs the principles of ''compositionality'' (building abstract representations from parts), ''causality'' (building complexity from parts) and ''learning to learn'' (using previously recognized concepts to ease the creation of new concepts).


Possibility theories

The comparison between probabilistic approaches (not only Bayesian programming) and possibility theories continues to be debated. Possibility theories like, for instance, fuzzy sets, fuzzy logic and possibility theory are alternatives to probability for modelling uncertainty. They argue that probability is insufficient or inconvenient to model certain aspects of incomplete or uncertain knowledge. The defense of probability is mainly based on Cox's theorem, which starts from four postulates concerning rational reasoning in the presence of uncertainty. It demonstrates that the only mathematical framework that satisfies these postulates is probability theory. The argument is that any approach other than probability necessarily infringes one of these postulates, raising the question of the value of such an infringement.


Probabilistic programming

The purpose of probabilistic programming is to unify the scope of classical programming languages with probabilistic modeling (especially Bayesian networks) to deal with uncertainty while profiting from the programming languages' expressiveness to encode complexity. Extended classical programming languages include logical languages as proposed in Probabilistic Horn Abduction, Independent Choice Logic, PRISM, and ProbLog, which proposes an extension of Prolog. They can also be extensions of functional programming languages (essentially Lisp and Scheme) such as IBAL or CHURCH. The underlying programming languages can be object-oriented as in BLOG and FACTORIE or more standard ones as in CES and FIGARO.

The purpose of Bayesian programming is different. Jaynes' precept of "probability as logic" argues that probability is an extension of and an alternative to logic, above which a complete theory of rationality, computation and programming can be rebuilt. Bayesian programming attempts to replace classical languages with a programming approach based on probability that considers incompleteness and uncertainty.

The precise comparison between the semantics and power of expression of Bayesian and probabilistic programming is an open question.



External links


* A companion site to the ''Bayesian programming'' book, where ProBT, an inference engine dedicated to Bayesian programming, can be downloaded.
* The Bayesian-programming.org site (archived at https://archive.today/20131123162815/http://bayesian-programming.org/) for the promotion of Bayesian programming, with detailed information and numerous publications.