In
theoretical computer science
Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory.
It is difficult to circumsc ...
and
formal language theory
In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.
The alphabet of a formal language consists of sy ...
, a regular tree grammar is a
formal grammar
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
that describes a set of
directed trees, or
terms. A
regular word grammar can be seen as a special kind of regular tree grammar, describing a set of single-
path
A path is a route for physical travel – see Trail.
Path or PATH may also refer to:
Physical paths of different types
* Bicycle path
* Bridle path, used by people on horseback
* Course (navigation), the intended path of a vehicle
* Desire ...
trees.
Definition
A regular tree grammar ''G'' is defined by the tuple
''G'' = (''N'', Σ, ''Z'', ''P''),
where
* ''N'' is a finite set of nonterminals,
* Σ is a
ranked alphabet (i.e., an alphabet whose symbols have an associated
arity
Arity () is the number of arguments or operands taken by a function, operation or relation in logic, mathematics, and computer science. In mathematics, arity may also be named ''rank'', but this word can have many other meanings in mathematics. In ...
) disjoint from ''N'',
* ''Z'' is the starting nonterminal, with , and
* ''P'' is a finite set of productions of the form ''A'' → ''t'', with , and , where ''T''
Σ(''N'') is the associated
term algebra
In universal algebra and mathematical logic, a term algebra is a freely generated algebraic structure over a given signature. For example, in a signature consisting of a single binary operation, the term algebra over a set ''X'' of variables is exa ...
, i.e. the set of all trees composed from symbols in according to their arities, where nonterminals are considered nullary.
Derivation of trees
The grammar ''G'' implicitly defines a set of trees: any tree that can be derived from ''Z'' using the rule set ''P'' is said to be described by ''G''.
This set of trees is known as the
language
Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of ...
of ''G''.
More formally, the relation ⇒
''G'' on the set ''T''
Σ(''N'') is defined as follows:
A tree can be derived in a single step into a tree
(in short: ''t''
1 ⇒
''G'' ''t''
2), if there is a context ''S'' and a production such that:
* ''t''
1 = ''S''
'A'' and
* ''t''
2 = ''S''
't''
Here, a ''context'' means a tree with exactly one hole in it; if ''S'' is such a context, ''S''
't''denotes the result of filling the tree ''t'' into the hole of ''S''.
The tree language generated by ''G'' is the language .
Here, ''T''
Σ denotes the set of all trees composed from symbols of Σ, while ⇒
''G''* denotes successive applications of ⇒
''G''.
A language generated by some regular tree grammar is called a regular
tree language
A tree automaton is a type of state machine. Tree automata deal with tree structures, rather than the strings of more conventional state machines.
The following article deals with branching tree automata, which correspond to regular languages of ...
.
Examples

Let ''G''
1 = (''N''
1,Σ
1,''Z''
1,''P''
1), where
* ''N''
1 = is our set of nonterminals,
* Σ
1 = is our ranked alphabet, arities indicated by dummy arguments (i.e. the symbol ''cons'' has arity 2),
* ''Z''
1 = ''BList'' is our starting nonterminal, and
* the set ''P''
1 consists of the following productions:
** ''Bool'' → ''false''
** ''Bool'' → ''true''
** ''BList'' → ''nil''
** ''BList'' → ''cons''(''Bool'',''BList'')
An example derivation from the grammar ''G''
1 is
''BList''
⇒ ''cons''(''Bool'',''BList'')
⇒ ''cons''(''false'',''cons''(''Bool'',''BList''))
⇒ ''cons''(''false'',''cons''(''true'',''nil'')).
The image shows the corresponding
derivation tree; it is a tree of trees (main picture), whereas a derivation tree in
word grammars is a tree of strings (upper left table).
The tree language generated by ''G''
1 is the set of all finite lists of boolean values, that is, ''L''(''G''
1) happens to equal ''T''
Σ1.
The grammar ''G''
1 corresponds to the algebraic data type declarations (in the
Standard ML
Standard ML (SML) is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of ...
programming language):
datatype Bool
= false
, true
datatype BList
= nil
, cons of Bool * BList
Every member of ''L''(''G''
1) corresponds to a Standard-ML value of type BList.
For another example, let , using the nonterminal set and the alphabet from above, but extending the production set by ''P''
2, consisting of the following productions:
* ''BList'' → ''cons''(''true'',''BList'')
* ''BList'' → ''cons''(''false'',''BList'')
The language ''L''(''G''
2) is the set of all finite lists of boolean values that contain ''true'' at least once. The set ''L''(''G''
2) has no datatype counterpart in Standard ML, nor in any other functional language.
It is a proper subset of ''L''(''G''
1).
The above example term happens to be in ''L''(''G''
2), too, as the following derivation shows:
''BList''
⇒ ''cons''(''false'',''BList'')
⇒ ''cons''(''false'',''cons''(''true'',''BList''))
⇒ ''cons''(''false'',''cons''(''true'',''nil'')).
Language properties
If ''L''
1, ''L''
2 both are regular tree languages, then the tree sets , and ''L''
1 \ ''L''
2 are also regular tree languages, and it is decidable whether , and whether ''L''
1 = ''L''
2.
Alternative characterizations and relation to other formal languages
*Regular tree grammars are a generalization of
regular word grammars.
*The regular tree languages are also the languages recognized by bottom-up
tree automata and nondeterministic top-down tree automata.
*Rajeev Alur and Parthasarathy Madhusudan related a subclass of regular binary tree languages to
nested words and
visibly pushdown language In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and ...
s.
[ Sect.4, Theorem 5,][ Sect.7]
Applications
Applications of regular tree grammars include:
*
Instruction selection __NOTOC__
In computer science, ''instruction selection'' is the stage of a compiler backend that transforms its middle-level intermediate representation (IR) into a low-level IR. In a typical compiler, instruction selection precedes both instruction ...
in compiler code generation
* A
decision procedure for the
first-order logic
First-order logic—also known as predicate logic, quantificational logic, and first-order predicate calculus—is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quanti ...
theory
A theory is a rational type of abstract thinking about a phenomenon, or the results of such thinking. The process of contemplative and rational thinking is often associated with such processes as observational study or research. Theories may ...
of formulas over
equality (=) and
set membership
In mathematics, an element (or member) of a set is any one of the distinct objects that belong to that set.
Sets
Writing A = \ means that the elements of the set are the numbers 1, 2, 3 and 4. Sets of elements of , for example \, are subset ...
(∈) as the only
predicates
* Solving
constraints about mathematical sets
* The set of all truths expressible in
first-order logic
First-order logic—also known as predicate logic, quantificational logic, and first-order predicate calculus—is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quanti ...
about a finite algebra (which is always a regular tree language)
* Graph-search
/ref>
See also
* Set constraint – a generalization of regular tree grammars
* Tree-adjoining grammar
Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free gr ...
References
Further reading
* Regular tree grammars were already described in 1968 by:
**
**
* A book devoted to tree grammars is:
* Algorithms on regular tree grammars are discussed from an efficiency-oriented view in:
* Given a mapping from trees to weights, Donald Knuth
Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer ...
's generalization of Dijkstra's shortest-path algorithm can be applied to a regular tree grammar to compute for each nonterminal the minimum weight of a derivable tree. Based on this information, it is straightforward to enumerate its language in increasing weight order. In particular, any nonterminal with infinite minimum weight produces the empty language. See:
* Regular tree automata have been generalized to admit equality tests between sibling nodes in trees. See:
* Allowing equality tests between deeper nodes leads to undecidability. See:
{{Formal languages and grammars , state=collapsed
Formal languages