Bayesian programming is a formalism and a methodology for specifying probabilistic models and solving problems when less than the necessary information is available.
Edwin T. Jaynes proposed that probability could be considered as an alternative to and an extension of logic for rational reasoning with incomplete and uncertain information. In his founding book ''Probability Theory: The Logic of Science'' he developed this theory and proposed what he called "the robot", which was not a physical device but an inference engine to automate probabilistic reasoning, a kind of Prolog for probability instead of logic. Bayesian programming is a formal and concrete implementation of this "robot".
Bayesian programming may also be seen as an algebraic formalism to specify graphical models such as, for instance, Bayesian networks, dynamic Bayesian networks, Kalman filters or hidden Markov models. Indeed, Bayesian programming is more general than Bayesian networks and has a power of expression equivalent to probabilistic factor graphs.
Formalism
A Bayesian program is a means of specifying a family of probability distributions.
The constituent elements of a Bayesian program are presented below:
# A program is constructed from a description and a question.
# A description is constructed using some specification (π) as given by the programmer and an identification or learning process for the parameters not completely specified by the specification, using a data set (δ).
# A specification is constructed from a set of pertinent variables, a decomposition and a set of forms.
# Forms are either parametric forms or questions to other Bayesian programs.
# A question specifies which probability distribution has to be computed.
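The elements listed above can be mirrored directly in ordinary code. The following minimal Python sketch simply records them in data structures; the class and field names (BayesianProgram, Specification and so on) are illustrative, not part of any standard library:
<syntaxhighlight lang="python">
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Specification:
    """The specification (pi), given by the programmer."""
    variables: List[str]                          # pertinent variables
    decomposition: List[Tuple[str, List[str]]]    # terms P(L_k | R_k) stored as (L_k, R_k)
    forms: Dict[str, Callable]                    # parametric form (or sub-program) for each term

@dataclass
class BayesianProgram:
    """A program = a description (specification + identification) and a question."""
    specification: Specification                  # pi
    data: list = field(default_factory=list)      # delta, used to identify free parameters
    question: Tuple[tuple, tuple] = ((), ())      # (searched variables, known variables)
</syntaxhighlight>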
Description
The purpose of a description is to specify an effective method of computing a joint probability distribution on a set of variables <math>\{X_1, X_2, \ldots, X_N\}</math> given a set of experimental data δ and some specification π. This joint distribution is denoted as:
: <math>P(X_1 \wedge X_2 \wedge \cdots \wedge X_N \mid \delta \wedge \pi)</math>
To specify preliminary knowledge π, the programmer must undertake the following:
# Define the set of relevant variables <math>\{X_1, X_2, \ldots, X_N\}</math> on which the joint distribution is defined.
# Decompose the joint distribution (break it into relevant independent or conditional probabilities).
# Define the forms of each of the distributions (e.g., for each variable, one of the list of probability distributions).
Decomposition
Given a partition of <math>\{X_1, \ldots, X_N\}</math> containing K subsets, K variables <math>L_1, \ldots, L_K</math> are defined, each corresponding to one of these subsets. Each variable <math>L_k</math> is obtained as the conjunction of the variables belonging to the k-th subset. Recursive application of Bayes' theorem leads to:
: <math>P(X_1 \wedge X_2 \wedge \cdots \wedge X_N \mid \delta \wedge \pi) = P(L_1 \mid \delta \wedge \pi) \times P(L_2 \mid L_1 \wedge \delta \wedge \pi) \times \cdots \times P(L_K \mid L_{K-1} \wedge \cdots \wedge L_1 \wedge \delta \wedge \pi)</math>
Conditional independence hypotheses then allow further simplifications. A conditional independence hypothesis for variable <math>L_k</math> is defined by choosing some variables <math>X_n</math> among the variables appearing in the conjunction <math>L_{k-1} \wedge \cdots \wedge L_1</math>, labelling <math>R_k</math> as the conjunction of these chosen variables and setting:
: <math>P(L_k \mid L_{k-1} \wedge \cdots \wedge L_1 \wedge \delta \wedge \pi) = P(L_k \mid R_k \wedge \delta \wedge \pi)</math>
We then obtain:
: <math>P(X_1 \wedge X_2 \wedge \cdots \wedge X_N \mid \delta \wedge \pi) = \prod_{k=1}^{K} P(L_k \mid R_k \wedge \delta \wedge \pi)</math>
Such a simplification of the joint distribution as a product of simpler distributions is called a decomposition, derived using the chain rule.
This ensures that each variable appears at most once on the left of a conditioning bar, which is the necessary and sufficient condition to write mathematically valid decompositions.
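For instance, with three variables <math>X_1</math>, <math>X_2</math> and <math>X_3</math>, choosing the trivial partition <math>L_1 = X_1</math>, <math>L_2 = X_2</math>, <math>L_3 = X_3</math> and assuming that <math>X_3</math> is conditionally independent of <math>X_1</math> given <math>X_2</math> yields the decomposition
: <math>P(X_1 \wedge X_2 \wedge X_3 \mid \delta \wedge \pi) = P(X_1 \mid \delta \wedge \pi) \times P(X_2 \mid X_1 \wedge \delta \wedge \pi) \times P(X_3 \mid X_2 \wedge \delta \wedge \pi)</math>
in which each variable appears exactly once on the left of a conditioning bar.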
Forms
Each distribution <math>P(L_k \mid R_k \wedge \delta \wedge \pi)</math> appearing in the product is then associated with either a parametric form (i.e., a function <math>f_{\mu}(L_k)</math>) or a question to another Bayesian program.
When it is a form <math>f_{\mu}(L_k)</math>, in general, <math>\mu</math> is a vector of parameters that may depend on <math>R_k</math> or δ or both. Learning takes place when some of these parameters are computed using the data set δ.
An important feature of Bayesian programming is this capacity to use questions to other Bayesian programs as components of the definition of a new Bayesian program. <math>P(L_k \mid R_k \wedge \delta \wedge \pi)</math> is obtained by some inferences done by another Bayesian program defined by its own specifications and data. This is similar to calling a subroutine in classical programming and provides an easy way to build hierarchical models.
Question
Given a description (i.e., <math>P(X_1 \wedge \cdots \wedge X_N \mid \delta \wedge \pi)</math>), a question is obtained by partitioning <math>\{X_1, \ldots, X_N\}</math> into three sets: the searched variables, the known variables and the free variables.
The three variables Searched, Known and Free are defined as the conjunction of the variables belonging to these sets.
A question is defined as the set of distributions:
: <math>P(\text{Searched} \mid \text{Known})</math>
made of as many "instantiated questions" as the cardinal of Known, each instantiated question being the distribution:
: <math>P(\text{Searched} \mid \text{Known} = \text{known})</math>
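For instance, in the spam filter described below, the searched variable is Spam, the known variables are the words <math>W_0, \ldots, W_{N-1}</math> observed in an e-mail, and the set of free variables is empty, so the question is <math>P(\text{Spam} \mid W_0 \wedge \cdots \wedge W_{N-1})</math>.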
Inference
Given the joint distribution <math>P(X_1 \wedge \cdots \wedge X_N \mid \delta \wedge \pi)</math>, it is always possible to compute any possible question using the following general inference:
: <math>P(\text{Searched} \mid \text{Known}) = \sum_{\text{Free}} P(\text{Searched} \wedge \text{Free} \mid \text{Known}) = \frac{\sum_{\text{Free}} P(\text{Searched} \wedge \text{Free} \wedge \text{Known})}{P(\text{Known})} = \frac{1}{Z} \sum_{\text{Free}} P(\text{Searched} \wedge \text{Free} \wedge \text{Known})</math>
where the first equality results from the marginalization rule, the second results from Bayes' theorem and the third corresponds to a second application of marginalization. The denominator appears to be a normalization term and can be replaced by a constant Z.
Theoretically, this allows any Bayesian inference problem to be solved. In practice, however, the cost of computing <math>P(\text{Searched} \mid \text{Known})</math> exhaustively and exactly is too great in almost all cases.
Replacing the joint distribution by its decomposition we get:
: <math>P(\text{Searched} \mid \text{Known}) = \frac{1}{Z} \sum_{\text{Free}} \prod_{k=1}^{K} P(L_k \mid R_k \wedge \delta \wedge \pi)</math>
which is usually a much simpler expression to compute, as the dimensionality of the problem is considerably reduced by the decomposition into a product of lower-dimension distributions.
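As a minimal illustration of this computation, the following Python sketch answers a question by brute-force enumeration over the free variables for a toy decomposition <math>P(A \wedge B \wedge C) = P(A)\,P(B \mid A)\,P(C \mid B)</math>; the variable names and probability tables are invented for the example:
<syntaxhighlight lang="python">
# Toy decomposition P(A, B, C) = P(A) P(B|A) P(C|B) over binary variables.
# The tables below are invented for illustration only.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}    # key: (b, a)
p_c_given_b = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.25, (1, 1): 0.75}  # key: (c, b)

def joint(a, b, c):
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

def infer_a_given_c(c):
    """P(A | C=c): A is searched, C is known, B is free (summed out)."""
    unnormalized = {a: sum(joint(a, b, c) for b in (0, 1)) for a in (0, 1)}
    z = sum(unnormalized.values())      # normalization constant Z
    return {a: v / z for a, v in unnormalized.items()}

print(infer_a_given_c(1))
</syntaxhighlight>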
Example
Bayesian spam detection
The purpose of Bayesian spam filtering is to eliminate junk e-mails.
The problem is very easy to formulate. E-mails should be classified into one of two categories: non-spam or spam. The only available information to classify the e-mails is their content: a set of words. Using these words without taking the order into account is commonly called a bag of words model.
The classifier should furthermore be able to adapt to its user and to learn from experience. Starting from an initial standard setting, the classifier should modify its internal parameters when the user disagrees with its decisions. It will hence adapt to the user's criteria to differentiate between non-spam and spam. It will improve its results as it encounters more and more classified e-mails.
Variables
The variables necessary to write this program are as follows:
# Spam: a binary variable, false if the e-mail is not spam and true otherwise.
# <math>W_0, W_1, \ldots, W_{N-1}</math>: N binary variables. <math>W_n</math> is true if the n-th word of the dictionary is present in the text.
These N binary variables sum up all the information about an e-mail.
Decomposition
Starting from the joint distribution and applying Bayes' theorem recursively, we obtain:
: <math>P(\text{Spam} \wedge W_0 \wedge \cdots \wedge W_{N-1}) = P(\text{Spam}) \times P(W_0 \mid \text{Spam}) \times P(W_1 \mid W_0 \wedge \text{Spam}) \times \cdots \times P(W_{N-1} \mid W_{N-2} \wedge \cdots \wedge W_0 \wedge \text{Spam})</math>
This is an exact mathematical expression.
It can be drastically simplified by assuming that the probability of appearance of a word knowing the nature of the text (spam or not) is independent of the appearance of the other words. This is the naive Bayes assumption and it makes this spam filter a naive Bayes model.
For instance, the programmer can assume that:
: <math>P(W_n \mid W_{n-1} \wedge \cdots \wedge W_0 \wedge \text{Spam}) = P(W_n \mid \text{Spam})</math>
to finally obtain:
: <math>P(\text{Spam} \wedge W_0 \wedge \cdots \wedge W_{N-1}) = P(\text{Spam}) \prod_{n=0}^{N-1} P(W_n \mid \text{Spam})</math>
This kind of assumption is known as the
naive Bayes' assumption. It is "naive" in the sense that the independence between words is clearly not completely true. For instance, it completely neglects that the appearance of pairs of words may be more significant than isolated appearances. However, the programmer may assume this hypothesis and may develop the model and the associated inferences to test how reliable and efficient it is.
Parametric forms
To be able to compute the joint distribution, the programmer must now specify the
distributions appearing in the decomposition:
# <math>P(\text{Spam})</math> is a prior, defined for instance by assigning a fixed prior probability to each of its two values.
# Each of the N forms <math>P(W_n \mid \text{Spam})</math> may be specified using the Laplace rule of succession (this is a pseudocount-based smoothing technique to counter the zero-frequency problem of words never seen before):
## <math>P([W_n = \text{true}] \mid [\text{Spam} = \text{false}]) = \frac{1 + a_f^n}{2 + a_f}</math>
## <math>P([W_n = \text{true}] \mid [\text{Spam} = \text{true}]) = \frac{1 + a_t^n}{2 + a_t}</math>
where <math>a_f^n</math> stands for the number of appearances of the n-th word in non-spam e-mails and <math>a_f</math> stands for the total number of non-spam e-mails. Similarly, <math>a_t^n</math> stands for the number of appearances of the n-th word in spam e-mails and <math>a_t</math> stands for the total number of spam e-mails.
Identification
The N forms <math>P(W_n \mid \text{Spam})</math> are not yet completely specified because the parameters <math>a_f^n</math>, <math>a_t^n</math>, <math>a_f</math> and <math>a_t</math> have no values yet.
The identification of these parameters could be done either by batch processing a series of classified e-mails or by an incremental updating of the parameters using the user's classifications of the e-mails as they arrive.
Both methods could be combined: the system could start with initial standard values of these parameters issued from a generic database, and incremental learning then customizes the classifier to each individual user.
Question
The question asked to the program is: "what is the probability for a given text to be spam knowing which words appear and don't appear in this text?"
It can be formalized by:
: <math>P(\text{Spam} \mid w_0 \wedge \cdots \wedge w_{N-1})</math>
which can be computed as follows:
: <math>P(\text{Spam} \mid w_0 \wedge \cdots \wedge w_{N-1}) = \frac{P(\text{Spam}) \prod_{n=0}^{N-1} P(w_n \mid \text{Spam})}{\sum_{\text{Spam}} \left[ P(\text{Spam}) \prod_{n=0}^{N-1} P(w_n \mid \text{Spam}) \right]}</math>
The denominator appears to be a normalization constant. It is not necessary to compute it to decide if we are dealing with spam. For instance, an easy trick is to compute the ratio:
: <math>\frac{P([\text{Spam}=\text{true}] \mid w_0 \wedge \cdots \wedge w_{N-1})}{P([\text{Spam}=\text{false}] \mid w_0 \wedge \cdots \wedge w_{N-1})} = \frac{P([\text{Spam}=\text{true}])}{P([\text{Spam}=\text{false}])} \times \prod_{n=0}^{N-1} \frac{P(w_n \mid [\text{Spam}=\text{true}])}{P(w_n \mid [\text{Spam}=\text{false}])}</math>
This computation is faster and easier because it requires only 2N products.
Bayesian program
The Bayesian spam filter program is completely defined by the variables Spam and <math>W_0, \ldots, W_{N-1}</math>, the decomposition <math>P(\text{Spam} \wedge W_0 \wedge \cdots \wedge W_{N-1}) = P(\text{Spam}) \prod_{n=0}^{N-1} P(W_n \mid \text{Spam})</math>, the parametric forms given above, the identification of their parameters from classified e-mails, and the question <math>P(\text{Spam} \mid w_0 \wedge \cdots \wedge w_{N-1})</math>.
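A minimal Python sketch of this spam filter is given below. It implements the decomposition, the Laplace rule of succession and the incremental identification described above; the dictionary, the prior and the training e-mails are illustrative only:
<syntaxhighlight lang="python">
# Minimal naive Bayes spam filter following the decomposition above.
# Dictionary, prior and training data are illustrative only.
dictionary = ["money", "meeting", "free", "project"]

n_spam, n_ham = 0, 0                       # a_t and a_f: numbers of spam / non-spam e-mails
spam_counts = {w: 0 for w in dictionary}   # a_t^n: appearances of word n in spam e-mails
ham_counts = {w: 0 for w in dictionary}    # a_f^n: appearances of word n in non-spam e-mails

def learn(words, is_spam):
    """Incremental identification: update the counts with one classified e-mail."""
    global n_spam, n_ham
    if is_spam:
        n_spam += 1
    else:
        n_ham += 1
    counts = spam_counts if is_spam else ham_counts
    for w in set(words) & set(dictionary):
        counts[w] += 1

def p_word(w, present, is_spam):
    """P(W_n | Spam) with the Laplace rule of succession: (1 + count) / (2 + total)."""
    counts, total = (spam_counts, n_spam) if is_spam else (ham_counts, n_ham)
    p_present = (1 + counts[w]) / (2 + total)
    return p_present if present else 1 - p_present

def p_spam(words, prior_spam=0.5):
    """P([Spam = true] | w_0 ... w_{N-1}) for an e-mail given as a set of words."""
    present = set(words)
    score = {True: prior_spam, False: 1 - prior_spam}
    for cls in (True, False):
        for w in dictionary:
            score[cls] *= p_word(w, w in present, cls)
    return score[True] / (score[True] + score[False])

learn(["free", "money"], True)
learn(["meeting", "project"], False)
print(p_spam(["free", "money", "project"]))
</syntaxhighlight>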
Bayesian filter, Kalman filter and hidden Markov model
Bayesian filters (often called recursive Bayesian estimation) are generic probabilistic models for time-evolving processes. Numerous models are particular instances of this generic approach, for instance: the Kalman filter or the hidden Markov model (HMM).
Variables
* Variables <math>X^0, \ldots, X^T</math> are a time series of state variables considered to be on a time horizon ranging from 0 to T.
* Variables <math>O^0, \ldots, O^T</math> are a time series of observation variables on the same horizon.
Decomposition
The decomposition is based:
* on <math>P(X^t \mid X^{t-1})</math>, called the system model, transition model or dynamic model, which formalizes the transition from the state at time t-1 to the state at time t;
* on <math>P(O^t \mid X^t)</math>, called the observation model, which expresses what can be observed at time t when the system is in state <math>X^t</math>;
* on an initial state at time 0: <math>P(X^0 \wedge O^0)</math>.
Parametrical forms
The parametrical forms are not constrained and different choices lead to different well-known models: see Kalman filters and Hidden Markov models just below.
Question
The typical question for such models is <math>P(X^{t+k} \mid O^0 \wedge \cdots \wedge O^t)</math>: what is the probability distribution for the state at time t + k knowing the observations from instant 0 to t?
The most common case is Bayesian filtering where k = 0, which searches for the present state, knowing past observations.
However, it is also possible to do prediction, with k > 0, to extrapolate a future state from past observations, or to do smoothing, with k < 0, to recover a past state from observations made either before or after that instant.
More complicated questions may also be asked as shown below in the HMM section.
Bayesian filters have a very interesting recursive property, which contributes greatly to their attractiveness. <math>P(X^t \mid O^0 \wedge \cdots \wedge O^t)</math> may be computed simply from <math>P(X^{t-1} \mid O^0 \wedge \cdots \wedge O^{t-1})</math> with the following formula:
: <math>P(X^t \mid O^0 \wedge \cdots \wedge O^t) \propto P(O^t \mid X^t) \sum_{X^{t-1}} \left[ P(X^t \mid X^{t-1})\, P(X^{t-1} \mid O^0 \wedge \cdots \wedge O^{t-1}) \right]</math>
Another interesting point of view for this equation is to consider that there are two phases: a prediction phase and an estimation phase:
* During the prediction phase, the state is predicted using the dynamic model and the estimation of the state at the previous moment:
:: <math>P(X^t \mid O^0 \wedge \cdots \wedge O^{t-1}) = \sum_{X^{t-1}} \left[ P(X^t \mid X^{t-1})\, P(X^{t-1} \mid O^0 \wedge \cdots \wedge O^{t-1}) \right]</math>
* During the estimation phase, the prediction is either confirmed or invalidated using the last observation:
:: <math>P(X^t \mid O^0 \wedge \cdots \wedge O^t) \propto P(O^t \mid X^t)\, P(X^t \mid O^0 \wedge \cdots \wedge O^{t-1})</math>
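A minimal Python sketch of these two phases for a discrete state space is given below; the states, the transition table and the observation table are illustrative only:
<syntaxhighlight lang="python">
# Discrete Bayesian filter: one prediction step and one estimation step.
# States and probability tables are illustrative only.
states = ["rain", "dry"]
transition = {("rain", "rain"): 0.7, ("dry", "rain"): 0.3,   # P(X^t | X^{t-1}), key: (x_t, x_prev)
              ("rain", "dry"): 0.2, ("dry", "dry"): 0.8}
observation = {("umbrella", "rain"): 0.9, ("umbrella", "dry"): 0.2}  # P(O^t | X^t), key: (o, x)

def predict(belief):
    """Prediction phase: P(X^t | O^0..O^{t-1}) from the previous belief and the dynamic model."""
    return {x: sum(transition[(x, xp)] * belief[xp] for xp in states) for x in states}

def estimate(predicted, obs_likelihood):
    """Estimation phase: multiply by P(O^t | X^t) and renormalize."""
    unnormalized = {x: obs_likelihood[x] * predicted[x] for x in states}
    z = sum(unnormalized.values())
    return {x: v / z for x, v in unnormalized.items()}

belief = {"rain": 0.5, "dry": 0.5}   # P(X^0)
belief = estimate(predict(belief), {x: observation[("umbrella", x)] for x in states})
print(belief)
</syntaxhighlight>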
Bayesian program
The generic Bayesian filter is thus defined by the joint decomposition
: <math>P(X^0 \wedge \cdots \wedge X^T \wedge O^0 \wedge \cdots \wedge O^T) = P(X^0 \wedge O^0) \prod_{t=1}^{T} \left[ P(X^t \mid X^{t-1})\, P(O^t \mid X^t) \right]</math>
with unconstrained parametric forms and the question <math>P(X^{t+k} \mid O^0 \wedge \cdots \wedge O^t)</math>.
Kalman filter
The very well-known Kalman filters are a special case of Bayesian filters.
They are defined by the following Bayesian program:
* Variables are continuous.
* The transition model <math>P(X^t \mid X^{t-1})</math> and the observation model <math>P(O^t \mid X^t)</math> are both specified using Gaussian laws with means that are linear functions of the conditioning variables.
With these hypotheses and by using the recursive formula, it is possible to solve the inference problem analytically to answer the usual <math>P(X^t \mid O^0 \wedge \cdots \wedge O^t)</math> question. This leads to an extremely efficient algorithm, which explains the popularity of Kalman filters and the number of their everyday applications.
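For instance, in one dimension, with a transition model <math>x_t = a\,x_{t-1} + \text{noise}</math> and an observation model <math>o_t = h\,x_t + \text{noise}</math>, the prediction and estimation phases reduce to a few arithmetic operations, as in this Python sketch (all numerical values are illustrative):
<syntaxhighlight lang="python">
# One-dimensional Kalman filter: Gaussian beliefs, linear transition and observation models.
# Parameters a, h, q, r and the measurements are illustrative only.
a, q = 1.0, 0.1       # transition model: x_t = a * x_{t-1} + noise of variance q
h, r = 1.0, 0.5       # observation model: o_t = h * x_t + noise of variance r

mean, var = 0.0, 1.0  # Gaussian belief P(X^0)

for o in [0.9, 1.1, 1.0]:
    # Prediction phase
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Estimation phase (Kalman gain)
    k = var_pred * h / (h * h * var_pred + r)
    mean = mean_pred + k * (o - h * mean_pred)
    var = (1 - k * h) * var_pred
    print(mean, var)
</syntaxhighlight>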
When there are no obvious linear transition and observation models, it is still often possible, using a first-order Taylor expansion, to treat these models as locally linear. This generalization is commonly called the extended Kalman filter.
Hidden Markov model
Hidden Markov models (HMMs) are another very popular specialization of Bayesian filters.
They are defined by the following Bayesian program:
* Variables are treated as being discrete.
* The transition model <math>P(X^t \mid X^{t-1})</math> and the observation model <math>P(O^t \mid X^t)</math> are both specified using probability matrices.
* The question most frequently asked of HMMs is:
:: What is the most probable series of states that leads to the present state, knowing the past observations?
This particular question may be answered with a specific and very efficient algorithm called the Viterbi algorithm.
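A compact Python sketch of the Viterbi algorithm for such a discrete HMM is shown below; the states, probability matrices and observation sequence are illustrative only:
<syntaxhighlight lang="python">
# Viterbi algorithm: most probable state sequence for a discrete HMM.
# States, matrices and the observation sequence are illustrative only.
states = ["healthy", "fever"]
start = {"healthy": 0.6, "fever": 0.4}                           # P(X^0)
trans = {"healthy": {"healthy": 0.7, "fever": 0.3},              # P(X^t | X^{t-1})
         "fever": {"healthy": 0.4, "fever": 0.6}}
emit = {"healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},   # P(O^t | X^t)
        "fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

def viterbi(observations):
    # best[x] = (probability of the best path ending in state x, that path)
    best = {x: (start[x] * emit[x][observations[0]], [x]) for x in states}
    for o in observations[1:]:
        best = {x: max(((p * trans[xp][x] * emit[x][o], path + [x])
                        for xp, (p, path) in best.items()), key=lambda t: t[0])
                for x in states}
    return max(best.values(), key=lambda t: t[0])

print(viterbi(["normal", "cold", "dizzy"]))
</syntaxhighlight>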
The Baum–Welch algorithm has been developed to identify the unknown parameters of HMMs.
Applications
Academic applications
Since 2000, Bayesian programming has been used to develop both robotics applications and life sciences models.
Robotics
In robotics, Bayesian programming was applied to autonomous robotics, robotic CAD systems, advanced driver-assistance systems, robotic arm control, mobile robotics, human-robot interaction, human-vehicle interaction (Bayesian autonomous driver models), video game avatar programming and training, and real-time strategy games (AI).
Life sciences
In life sciences, Bayesian programming was used in vision to reconstruct shape from motion, to model visuo-vestibular interaction and to study saccadic eye movements; in speech perception and control to study early speech acquisition and the emergence of articulatory-acoustic systems; and to model handwriting perception and control.
Pattern recognition
Bayesian program learning has potential applications in voice recognition and synthesis, image recognition and natural language processing. It employs the principles of ''compositionality'' (building abstract representations from parts), ''causality'' (building complexity from parts) and ''learning to learn'' (using previously recognized concepts to ease the creation of new concepts).
Possibility theories
The comparison between probabilistic approaches (not only Bayesian programming) and possibility theories continues to be debated.
Possibility theories like, for instance, fuzzy sets, fuzzy logic and possibility theory are alternatives to probability for modelling uncertainty. Their proponents argue that probability is insufficient or inconvenient to model certain aspects of incomplete/uncertain knowledge.
The defense of probability is mainly based on Cox's theorem, which starts from four postulates concerning rational reasoning in the presence of uncertainty. It demonstrates that the only mathematical framework that satisfies these postulates is probability theory. The argument is that any approach other than probability necessarily infringes one of these postulates, and the question is then what is gained by this infringement.
Probabilistic programming
The purpose of probabilistic programming is to unify the scope of classical programming languages with probabilistic modeling (especially Bayesian networks) to deal with uncertainty while profiting from the programming languages' expressiveness to encode complexity.
Extended classical programming languages include logical languages as proposed in Probabilistic Horn Abduction, Independent Choice Logic, PRISM, and ProbLog, which proposes an extension of Prolog.
They can also be extensions of functional programming languages (essentially Lisp and Scheme) such as IBAL or CHURCH. The underlying programming languages can be object-oriented as in BLOG and FACTORIE or more standard ones as in CES and FIGARO.
The purpose of Bayesian programming is different. Jaynes' precept of "probability as logic" argues that probability is an extension of and an alternative to logic above which a complete theory of rationality, computation and programming can be rebuilt.
Bayesian programming attempts to replace classical languages with a programming approach based on probability that considers incompleteness and uncertainty.
The precise comparison between the semantics and power of expression of Bayesian and probabilistic programming is an open question.
External links
* A companion site to the ''Bayesian programming'' book, where ProBT, an inference engine dedicated to Bayesian programming, can be downloaded.
* The Bayesian-programming.org site (archived at https://archive.today/20131123162815/http://bayesian-programming.org/) for the promotion of Bayesian programming, with detailed information and numerous publications.
Bayesian statistics
Artificial intelligence