HOME

TheInfoList



OR:

In
theoretical computer science Theoretical computer science is a subfield of computer science and mathematics that focuses on the Abstraction, abstract and mathematical foundations of computation. It is difficult to circumscribe the theoretical areas precisely. The Associati ...
and
formal language theory In logic, mathematics, computer science, and linguistics, a formal language is a set of string (computer science), strings whose symbols are taken from a set called "#Definition, alphabet". The alphabet of a formal language consists of symbol ...
, a regular tree grammar is a
formal grammar A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
that describes a set of directed trees, or terms. A regular word grammar can be seen as a special kind of regular tree grammar, describing a set of single-
path A path is a route for physical travel – see Trail. Path or PATH may also refer to: Physical paths of different types * Bicycle path * Bridle path, used by people on horseback * Course (navigation), the intended path of a vehicle * Desir ...
trees.


Definition

A regular tree grammar ''G'' is defined by the tuple ''G'' = (''N'', Σ, ''Z'', ''P''), where * ''N'' is a finite set of nonterminals, * Σ is a ranked alphabet (i.e., an alphabet whose symbols have an associated
arity In logic, mathematics, and computer science, arity () is the number of arguments or operands taken by a function, operation or relation. In mathematics, arity may also be called rank, but this word can have many other meanings. In logic and ...
) disjoint from ''N'', * ''Z'' is the starting nonterminal, with , and * ''P'' is a finite set of productions of the form ''A'' → ''t'', with , and , where ''T''Σ(''N'') is the associated
term algebra Term may refer to: Language *Terminology, context-specific nouns or compound words **Technical term (or ''term of art''), used by specialists in a field ***Scientific terminology, used by scientists *Term (argumentation), part of an argument in d ...
, i.e. the set of all trees composed from symbols in according to their arities, where nonterminals are considered nullary.


Derivation of trees

The grammar ''G'' implicitly defines a set of trees: any tree that can be derived from ''Z'' using the rule set ''P'' is said to be described by ''G''. This set of trees is known as the
language Language is a structured system of communication that consists of grammar and vocabulary. It is the primary means by which humans convey meaning, both in spoken and signed language, signed forms, and may also be conveyed through writing syste ...
of ''G''. More formally, the relation ⇒''G'' on the set ''T''Σ(''N'') is defined as follows: A tree can be derived in a single step into a tree (in short: ''t''1''G'' ''t''2), if there is a context ''S'' and a production such that: * ''t''1 = ''S'' 'A'' and * ''t''2 = ''S'' 't'' Here, a ''context'' means a tree with exactly one hole in it; if ''S'' is such a context, ''S'' 't''denotes the result of filling the tree ''t'' into the hole of ''S''. The tree language generated by ''G'' is the language . Here, ''T''Σ denotes the set of all trees composed from symbols of Σ, while ⇒''G''* denotes successive applications of ⇒''G''. A language generated by some regular tree grammar is called a regular tree language.


Examples

Let ''G''1 = (''N''11,''Z''1,''P''1), where * ''N''1 = is our set of nonterminals, * Σ1 = is our ranked alphabet, arities indicated by dummy arguments (i.e. the symbol ''cons'' has arity 2), * ''Z''1 = ''BList'' is our starting nonterminal, and * the set ''P''1 consists of the following productions: ** ''Bool'' → ''false'' ** ''Bool'' → ''true'' ** ''BList'' → ''nil'' ** ''BList'' → ''cons''(''Bool'',''BList'') An example derivation from the grammar ''G''1 is ''BList'' ⇒ ''cons''(''Bool'',''BList'') ⇒ ''cons''(''false'',''cons''(''Bool'',''BList'')) ⇒ ''cons''(''false'',''cons''(''true'',''nil'')). The image shows the corresponding
derivation tree A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree (data structure), tree that represents the syntax, syntactic structure of a string (computer science), string according to some cont ...
; it is a tree of trees (main picture), whereas a derivation tree in word grammars is a tree of strings (upper left table). The tree language generated by ''G''1 is the set of all finite lists of boolean values, that is, ''L''(''G''1) happens to equal ''T''Σ1. The grammar ''G''1 corresponds to the algebraic data type declarations (in the
Standard ML Standard ML (SML) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Modular programming, modular, Functional programming, functional programming language with compile-time type checking and t ...
programming language): datatype Bool = false , true datatype BList = nil , cons of Bool * BList Every member of ''L''(''G''1) corresponds to a Standard-ML value of type BList. For another example, let , using the nonterminal set and the alphabet from above, but extending the production set by ''P''2, consisting of the following productions: * ''BList'' → ''cons''(''true'',''BList'') * ''BList'' → ''cons''(''false'',''BList'') The language ''L''(''G''2) is the set of all finite lists of boolean values that contain ''true'' at least once. The set ''L''(''G''2) has no datatype counterpart in Standard ML, nor in any other functional language. It is a proper subset of ''L''(''G''1). The above example term happens to be in ''L''(''G''2), too, as the following derivation shows: ''BList'' ⇒ ''cons''(''false'',''BList'') ⇒ ''cons''(''false'',''cons''(''true'',''BList'')) ⇒ ''cons''(''false'',''cons''(''true'',''nil'')).


Language properties

If ''L''1, ''L''2 both are regular tree languages, then the tree sets , and ''L''1 \ ''L''2 are also regular tree languages, and it is decidable whether , and whether ''L''1 = ''L''2.


Alternative characterizations and relation to other formal languages

*Regular tree grammars are a generalization of regular word grammars. *The regular tree languages are also the languages recognized by bottom-up tree automata and nondeterministic top-down tree automata. *Rajeev Alur and Parthasarathy Madhusudan related a subclass of regular binary tree languages to nested words and
visibly pushdown language In computer science, more specifically in automata theory, automata and formal language theory, nested words are a concept proposed by Rajeev Alur, Alur and Madhusudan as a joint generalization of String (computer science), words, as traditionally ...
s. Sect.4, Theorem 5, Sect.7


Applications

Applications of regular tree grammars include: *
Instruction selection __NOTOC__ In computer science, ''instruction selection'' is the stage of a compiler backend that transforms its middle-level intermediate representation (IR) into a low-level IR. In a typical compiler, instruction selection precedes both instructio ...
in compiler code generation * A
decision procedure Decision may refer to: Law and politics *Judgment (law), as the outcome of a legal case * Landmark decision, the outcome of a case that sets a legal precedent * ''Per curiam'' decision, by a court with multiple judges Books * ''Decision'' (novel ...
for the
first-order logic First-order logic, also called predicate logic, predicate calculus, or quantificational logic, is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over ...
theory A theory is a systematic and rational form of abstract thinking about a phenomenon, or the conclusions derived from such thinking. It involves contemplative and logical reasoning, often supported by processes such as observation, experimentation, ...
of formulas over
equality Equality generally refers to the fact of being equal, of having the same value. In specific contexts, equality may refer to: Society * Egalitarianism, a trend of thought that favors equality for all people ** Political egalitarianism, in which ...
(=) and
set membership In mathematics, an element (or member) of a set is any one of the distinct objects that belong to that set. For example, given a set called containing the first four positive integers (A = \), one could say that "3 is an element of ", expressed ...
(∈) as the only predicates * Solving constraints about mathematical sets * The set of all truths expressible in
first-order logic First-order logic, also called predicate logic, predicate calculus, or quantificational logic, is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over ...
about a finite algebra (which is always a regular tree language) * Graph-search
/ref>


See also

* Set constraint – a generalization of regular tree grammars *
Tree-adjoining grammar Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free gr ...


References


Further reading

* Regular tree grammars were already described in 1968 by: ** ** * A book devoted to tree grammars is: * Algorithms on regular tree grammars are discussed from an efficiency-oriented view in: * Given a mapping from trees to weights,
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...
's generalization of Dijkstra's shortest-path algorithm can be applied to a regular tree grammar to compute for each nonterminal the minimum weight of a derivable tree. Based on this information, it is straightforward to enumerate its language in increasing weight order. In particular, any nonterminal with infinite minimum weight produces the empty language. See: * Regular tree automata have been generalized to admit equality tests between sibling nodes in trees. See: * Allowing equality tests between deeper nodes leads to undecidability. See: {{Formal languages and grammars , state=collapsed Formal languages