Inside–outside Algorithm

For parsing algorithms in computer science, the inside–outside algorithm is a way of re-estimating production probabilities in a probabilistic context-free grammar. It was introduced by James K. Baker in 1979 as a generalization of the forward–backward algorithm for parameter estimation on hidden Markov models to stochastic context-free grammars. It is used to compute expectations, for example as part of the expectation–maximization algorithm (an unsupervised learning algorithm).


Inside and outside probabilities

The inside probability \beta_j(p,q) is the total probability of generating the words w_p \cdots w_q, given the root nonterminal N^j and a grammar G:

:\beta_j(p,q) = P(w_{pq} \mid N^j_{pq}, G)

The outside probability \alpha_j(p,q) is the total probability of beginning with the start symbol N^1 and generating the nonterminal N^j_{pq} and all the words outside w_p \cdots w_q, given a grammar G:

:\alpha_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)n} \mid G)

Here the sentence is w_1 \cdots w_n.
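A consequence of these definitions, useful in the expectation step of EM, is that the product of the two quantities gives the probability of the whole sentence together with a constituent N^j spanning positions p through q:

:\alpha_j(p,q)\,\beta_j(p,q) = P(w_{1n}, N^j_{pq} \mid G)

In particular, \beta_1(1,n) is the total probability of the sentence under the grammar.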


Computing inside probabilities

Base case:

:\beta_j(p,p) = P(w_{pp} \mid N^j, G)

General case: suppose there is a rule N_j \rightarrow N_r N_s in the grammar. Then the probability of generating w_p \cdots w_q starting with a subtree rooted at N_j via that rule is:

:\sum_{k=p}^{q-1} P(N_j \rightarrow N_r N_s)\,\beta_r(p,k)\,\beta_s(k+1,q)

The inside probability \beta_j(p,q) is just the sum over all such possible rules:

:\beta_j(p,q) = \sum_{r,s} \sum_{k=p}^{q-1} P(N_j \rightarrow N_r N_s)\,\beta_r(p,k)\,\beta_s(k+1,q)
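The inside recursion can be sketched as a CYK-style dynamic program over span lengths. The grammar, rule probabilities, and sentence below are a hypothetical toy example (not from the original sources), and 0-based spans are used instead of the 1-based indices in the formulas:

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form (hypothetical example).
# Binary rules: (parent, left_child, right_child) -> probability
binary_rules = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
# Lexical rules: (nonterminal, word) -> probability
lexical_rules = {
    ("NP", "she"): 0.5,
    ("NP", "fish"): 0.5,
    ("V", "eats"): 1.0,
}

def inside_probs(words):
    """Compute beta[(j, p, q)] = P(words[p..q] | N^j): shorter spans
    first, then combine children with the binary rules."""
    n = len(words)
    beta = defaultdict(float)
    # Base case: spans of length 1 come from lexical rules.
    for p, w in enumerate(words):
        for (nt, word), prob in lexical_rules.items():
            if word == w:
                beta[(nt, p, p)] += prob
    # General case: sum over rules N_j -> N_r N_s and split points k.
    for span in range(2, n + 1):
        for p in range(n - span + 1):
            q = p + span - 1
            for (j, r, s), rule_p in binary_rules.items():
                for k in range(p, q):
                    beta[(j, p, q)] += rule_p * beta[(r, p, k)] * beta[(s, k + 1, q)]
    return beta

beta = inside_probs(["she", "eats", "fish"])
print(beta[("S", 0, 2)])  # total sentence probability: 0.5 * 1.0 * 0.5 = 0.25
```

The table is indexed by (nonterminal, span start, span end), so a missing entry defaults to probability zero, matching the convention that unreachable spans contribute nothing to the sums.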


Computing outside probabilities

Base case:

:\alpha_j(1,n) = \begin{cases} 1 & \mbox{if } j=1 \\ 0 & \mbox{otherwise} \end{cases}

Here the start symbol is N_1.

General case: suppose there is a rule N_r \rightarrow N_j N_s in the grammar that generates N_j. Then the ''left'' contribution of that rule to the outside probability \alpha_j(p,q) is:

:\sum_{k=q+1}^{n} P(N_r \rightarrow N_j N_s)\,\alpha_r(p,k)\,\beta_s(q+1,k)

Now suppose there is a rule N_r \rightarrow N_s N_j in the grammar. Then the ''right'' contribution of that rule to the outside probability \alpha_j(p,q) is:

:\sum_{k=1}^{p-1} P(N_r \rightarrow N_s N_j)\,\alpha_r(k,q)\,\beta_s(k,p-1)

The outside probability \alpha_j(p,q) is the sum of the left and right contributions over all such rules:

:\alpha_j(p,q) = \sum_{r,s} \sum_{k=q+1}^{n} P(N_r \rightarrow N_j N_s)\,\alpha_r(p,k)\,\beta_s(q+1,k) + \sum_{r,s} \sum_{k=1}^{p-1} P(N_r \rightarrow N_s N_j)\,\alpha_r(k,q)\,\beta_s(k,p-1)
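The outside recursion can be sketched in the same style, sweeping from large spans down to small ones and reading the completed inside table. The toy CNF grammar here is again a hypothetical example, restated (with the inside pass) so the snippet runs on its own; as a sanity check, \alpha_j(p,p)\,\beta_j(p,p) summed over nonterminals j equals the total sentence probability at every position p:

```python
from collections import defaultdict

# Toy CNF grammar (hypothetical example, restated for self-containment).
binary_rules = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
lexical_rules = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}

def inside_probs(words):
    """CYK-style inside pass: beta[(j, p, q)] = P(words[p..q] | N^j)."""
    n = len(words)
    beta = defaultdict(float)
    for p, w in enumerate(words):
        for (nt, word), prob in lexical_rules.items():
            if word == w:
                beta[(nt, p, p)] += prob
    for span in range(2, n + 1):
        for p in range(n - span + 1):
            q = p + span - 1
            for (j, r, s), rule_p in binary_rules.items():
                for k in range(p, q):
                    beta[(j, p, q)] += rule_p * beta[(r, p, k)] * beta[(s, k + 1, q)]
    return beta

def outside_probs(words, beta, start="S"):
    """alpha[(j, p, q)]: probability of deriving everything outside
    words[p..q] from the start symbol, with an N^j gap over p..q."""
    n = len(words)
    alpha = defaultdict(float)
    alpha[(start, 0, n - 1)] = 1.0       # base case: the root spans everything
    for span in range(n - 1, 0, -1):     # parents are strictly larger spans
        for p in range(n - span + 1):
            q = p + span - 1
            for (r, j, s), rule_p in binary_rules.items():
                # left contribution: N_j is the left child, sibling spans q+1..k
                for k in range(q + 1, n):
                    alpha[(j, p, q)] += rule_p * alpha[(r, p, k)] * beta[(s, q + 1, k)]
            for (r, s, j), rule_p in binary_rules.items():
                # right contribution: N_j is the right child, sibling spans k..p-1
                for k in range(p):
                    alpha[(j, p, q)] += rule_p * alpha[(r, k, q)] * beta[(s, k, p - 1)]
    return alpha

words = ["she", "eats", "fish"]
beta = inside_probs(words)
alpha = outside_probs(words, beta)
total = beta[("S", 0, len(words) - 1)]
for p in range(len(words)):
    check = sum(alpha[(j, p, p)] * beta[(j, p, p)] for j in {"S", "NP", "VP", "V"})
    print(p, check, total)  # the two probabilities agree at every position
```

Iterating span lengths from large to small guarantees that every parent span's outside probability is already available when a child span needs it.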


References

* J. Baker (1979). "Trainable grammars for speech recognition". In J. J. Wolf and D. H. Klatt, editors, ''Speech communication papers presented at the 97th meeting of the Acoustical Society of America'', pages 547–550, Cambridge, MA, June 1979. MIT.
* Karim Lari, Steve J. Young (1990). "The estimation of stochastic context-free grammars using the inside–outside algorithm". ''Computer Speech and Language'', 4:35–56.
* Karim Lari, Steve J. Young (1991). "Applications of stochastic context-free grammars using the Inside–Outside algorithm". ''Computer Speech and Language'', 5:237–257.
* Fernando Pereira, Yves Schabes (1992). "Inside–outside reestimation from partially bracketed corpora". ''Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics'', pages 128–135.


External links

* Inside-outside algorithm - Fei Xia
* The Inside-Outside Algorithm - Michael Collins