Chomsky–Schützenberger Representation Theorem
   HOME

TheInfoList



OR:

In
formal language theory In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...
, the Chomsky–Schützenberger representation theorem is a theorem derived by
Noam Chomsky Avram Noam Chomsky (born December 7, 1928) is an American public intellectual: a linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes called "the father of modern linguistics", Chomsky is ...
and
Marcel-Paul Schützenberger Marcel-Paul "Marco" Schützenberger (24 October 1920 – 29 July 1996) was a French mathematician and Doctor of Medicine. He worked in the fields of formal language, combinatorics, and information theory.Herbert Wilf, Dominique Foata, ''et al.' ...
about representing a given
context-free language In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG). Context-free languages have many applications in programming languages, in particular, most arithmetic expressions are generated by ...
in terms of two simpler languages. These two simpler languages, namely a
regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
and a
Dyck language In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of square brackets and The set of Dyck words forms the Dyck language. Dyck words and language are named after the mathematici ...
, are combined by means of an
intersection In mathematics, the intersection of two or more objects is another object consisting of everything that is contained in all of the objects simultaneously. For example, in Euclidean geometry, when two lines in a plane are not parallel, their i ...
and a
homomorphism In algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type (such as two groups, two rings, or two vector spaces). The word ''homomorphism'' comes from the Ancient Greek language: () meaning "same" ...
. A few notions from formal language theory are in order. A context-free language is '' regular'', if it can be described by a
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
, or, equivalently, if it is accepted by a
finite automaton A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
. A homomorphism is based on a function h which maps symbols from an alphabet \Gamma to words over another alphabet \Sigma; If the domain of this function is extended to words over \Gamma in the natural way, by letting h(xy)=h(x)h(y) for all words x and y, this yields a ''
homomorphism In algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type (such as two groups, two rings, or two vector spaces). The word ''homomorphism'' comes from the Ancient Greek language: () meaning "same" ...
'' h:\Gamma^*\to \Sigma^*. A ''matched alphabet'' T \cup \overline T is an alphabet with two equal-sized sets; it is convenient to think of it as a set of parentheses types, where T contains the opening parenthesis symbols, whereas the symbols in \overline T contains the closing parenthesis symbols. For a matched alphabet T \cup \overline T, the ''
Dyck language In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of square brackets and The set of Dyck words forms the Dyck language. Dyck words and language are named after the mathematici ...
'' D_T is given by :D_T = \. Chomsky–Schützenberger theorem. A language ''L'' over the alphabet \Sigma is context-free if and only if there exists :* a matched alphabet T \cup \overline T :* a regular language R over T \cup \overline T, :* and a homomorphism h : (T \cup \overline T)^* \to \Sigma^* :such that L = h(D_T \cap R). Proofs of this theorem are found in several textbooks, e.g. or .


References

* * {{DEFAULTSORT:Chomsky-Schutzenberger representation theorem Noam Chomsky Formal languages Theorems in discrete mathematics