In
formal language theory
In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.
The alphabet of a formal language consists of sy ...
, the Chomsky–Schützenberger representation theorem is a theorem derived by
Noam Chomsky
Avram Noam Chomsky (born December 7, 1928) is an American public intellectual: a linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes called "the father of modern linguistics", Chomsky is ...
and
Marcel-Paul Schützenberger
Marcel-Paul "Marco" Schützenberger (24 October 1920 – 29 July 1996) was a French mathematician and Doctor of Medicine. He worked in the fields of formal language, combinatorics, and information theory.Herbert Wilf, Dominique Foata, ''et al.'', ...
about representing a given
context-free language
In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).
Context-free languages have many applications in programming languages, in particular, most arithmetic expressions are generated by ...
in terms of two simpler languages. These two simpler languages, namely a
regular language
In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
and a
Dyck language
In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of square brackets and The set of Dyck words forms the Dyck language.
Dyck words and language are named after the mathematic ...
, are combined by means of an
intersection
In mathematics, the intersection of two or more objects is another object consisting of everything that is contained in all of the objects simultaneously. For example, in Euclidean geometry, when two lines in a plane are not parallel, thei ...
and a
homomorphism
In algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type (such as two groups, two rings, or two vector spaces). The word ''homomorphism'' comes from the Ancient Greek language: () meaning "sa ...
.
A few notions from formal language theory are in order. A context-free language is ''
regular
The term regular can mean normal or in accordance with rules. It may refer to:
People
* Moses Regular (born 1971), America football player
Arts, entertainment, and media Music
* "Regular" (Badfinger song)
* Regular tunings of stringed instrum ...
'', if it can be described by a
regular expression
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
, or, equivalently, if it is accepted by a
finite automaton
A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
. A homomorphism is based on a function
which maps symbols from an alphabet
to words over another alphabet
; If the domain of this function is extended to words over
in the natural way, by letting
for all words
and
, this yields a ''
homomorphism
In algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type (such as two groups, two rings, or two vector spaces). The word ''homomorphism'' comes from the Ancient Greek language: () meaning "sa ...
''
. A ''matched alphabet''
is an alphabet with two equal-sized sets; it is convenient to think of it as a set of parentheses types, where
contains the opening parenthesis symbols, whereas the symbols in
contains the closing parenthesis symbols. For a matched alphabet
, the ''
Dyck language
In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of square brackets and The set of Dyck words forms the Dyck language.
Dyck words and language are named after the mathematic ...
''
is given by
:
Chomsky–Schützenberger theorem. A language ''L'' over the alphabet
is context-free if and only if there exists
:* a matched alphabet
:* a regular language
over
,
:* and a homomorphism
:such that
.
Proofs of this theorem are found in several textbooks, e.g. or .
References
*
*
{{DEFAULTSORT:Chomsky-Schutzenberger representation theorem
Noam Chomsky
Formal languages
Theorems in discrete mathematics