In
automata theory
Automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. It is a theory in theoretical computer science. The word ''automata'' comes from the Greek word αὐτόματο ...
, the class of unrestricted grammars (also called semi-Thue, type-0 or phrase structure grammars) is the most general class of grammars in the
Chomsky hierarchy
In formal language theory, computer science and linguistics, the Chomsky hierarchy (also referred to as the Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars.
This hierarchy of grammars was described by ...
. No restrictions are made on the productions of an unrestricted grammar, other than each of their left-hand sides being non-empty.
This grammar class can generate arbitrary
recursively enumerable language
In mathematics, logic and computer science, a formal language is called recursively enumerable (also recognizable, partially decidable, semidecidable, Turing-acceptable or Turing-recognizable) if it is a recursively enumerable subset in the set o ...
s.
Formal definition
An unrestricted grammar is a
formal grammar
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
, where
*
is a
finite set
In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example,
:\
is a finite set with five elements. Th ...
of
nonterminal symbols,
*
is a finite set of
terminal symbol
In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. ''Terminal symbols'' are the elementary symbols of the language defined by a formal grammar. ...
s with
and
disjoint,
[Actually, is not strictly necessary since unrestricted grammars make no real distinction between the two. The designation exists purely so that one knows when to stop generating sentential forms of the grammar; more precisely, the language recognized by is restricted to strings of terminal symbols.]
*
is a finite set of
production rules of the form
where
and
are strings of symbols in
and
is not the empty string, and
*
is a specially designated start symbol.
As the name implies, there are no real restrictions on the types of production rules that unrestricted grammars can have.
[While Hopcroft and Ullman (1979) do not mention the cardinalities of , , explicitly, the proof of their Theorem 9.3 (construction of an equivalent Turing machine from a given unrestricted grammar, p.221, cf. Section #Equivalence to Turing machines) tacitly requires finiteness of and finite lengths of all strings in rules of . Any member of or that does not occur in can be omitted without affecting the generated language.]
Equivalence to Turing machines
The unrestricted grammars characterize the recursively enumerable languages. This is the same as saying that for every unrestricted grammar
there exists some
Turing machine
A Turing machine is a mathematical model of computation describing an abstract machine that manipulates symbols on a strip of tape according to a table of rules. Despite the model's simplicity, it is capable of implementing any computer algori ...
capable of recognizing
and vice versa. Given an unrestricted grammar, such a Turing machine is simple enough to construct, as a two-tape
nondeterministic Turing machine
In theoretical computer science, a nondeterministic Turing machine (NTM) is a theoretical model of computation whose governing rules specify more than one possible action when in some given situations. That is, an NTM's next state is ''not'' comp ...
.
The first tape contains the input word
to be tested, and the second tape is used by the machine to generate
sentential forms from
. The Turing machine then does the following:
# Start at the left of the second tape and repeatedly choose to move right or select the current position on the tape.
#
Nondeterministically choose a production
from the productions in
.
# If
appears at some position on the second tape, replace
by
at that point, possibly shifting the symbols on the tape left or right depending on the relative lengths of
and
(e.g. if
is longer than
, shift the tape symbols left).
# Compare the resulting sentential form on tape 2 to the word on tape 1. If they match, then the Turing machine accepts the word. If they don't, the Turing machine will go back to step 1.
It is easy to see that this Turing machine will generate all and only the sentential forms of
on its second tape after the last step is executed an arbitrary number of times, thus the language
must be recursively enumerable.
The reverse construction is also possible. Given some Turing machine, it is possible to create an equivalent unrestricted grammar
which even uses only productions with one or more non-terminal symbols on their left-hand sides. Therefore, an arbitrary unrestricted grammar can always be equivalently converted to obey the latter form, by converting it to a Turing machine and back again. Some authors use the latter form as definition of ''unrestricted grammar''.
Computational properties
The
decision problem
In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm wheth ...
of whether a given string
can be generated by a given unrestricted grammar is equivalent to the problem of whether it can be accepted by the Turing machine equivalent to the grammar. The latter problem is called the
Halting problem
In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a ...
and is undecidable.
Recursively enumerable languages are
closed under
Kleene star
In mathematical logic and computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation, either on sets of strings or on sets of symbols or characters. In mathematics,
it is more commonly known as the free monoid ...
,
concatenation
In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalisations of concatenat ...
,
union
Union commonly refers to:
* Trade union, an organization of workers
* Union (set theory), in mathematics, a fundamental operation on sets
Union may also refer to:
Arts and entertainment
Music
* Union (band), an American rock group
** ''Un ...
, and
intersection, but not under
set difference; see
Recursively enumerable language#Closure properties.
The equivalence of unrestricted grammars to Turing machines implies the existence of a universal unrestricted grammar, a grammar capable of accepting any other unrestricted grammar's language given a description of the language. For this reason, it is theoretically possible to build a
programming language
A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.
The description of a programming ...
based on unrestricted grammars (e.g.
Thue).
See also
*
Lambda calculus
Lambda calculus (also written as ''λ''-calculus) is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. It is a universal model of computation ...
*
Semi-Thue system
In theoretical computer science and mathematical logic a string rewriting system (SRS), historically called a semi- Thue system, is a rewriting system over strings from a (usually finite) alphabet. Given a binary relation R between fixed strings o ...
— doesn't distinguish terminal and nonterminal symbols, admits empty left-hand sides
Notes
References
{{Formal languages and grammars
Formal languages