Tree Language
   HOME

TheInfoList



OR:

A tree automaton is a type of
state machine A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
. Tree automata deal with
tree structure A tree structure, tree diagram, or tree model is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, although the chart is gener ...
s, rather than the
strings String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
of more conventional state machines. The following article deals with branching tree automata, which correspond to regular languages of trees. As with classical automata, finite tree automata (FTA) can be either a
deterministic automaton In computer science, a deterministic automaton is a concept of automata theory in which the outcome of a transition from one state to another is determined by the input. A common deterministic automaton is a deterministic finite automaton (DFA) whi ...
or not. According to how the automaton processes the input tree, finite tree automata can be of two types: (a) bottom up, (b) top down. This is an important issue, as although non-deterministic (ND) top-down and ND bottom-up tree automata are equivalent in expressive power, deterministic top-down automata are strictly less powerful than their deterministic bottom-up counterparts, because tree properties specified by deterministic top-down tree automata can only depend on path properties. (Deterministic bottom-up tree automata are as powerful as ND tree automata.)


Definitions

A bottom-up finite tree automaton over ''F'' is defined as a tuple (''Q'', ''F'', ''Q''''f'', Δ), where ''Q'' is a set of states, ''F'' is a
ranked alphabet In theoretical computer science and formal language theory, a ranked alphabet is a pair of an ordinary alphabet ''F'' and a function ''Arity'': ''F''→\mathbb. Each letter in ''F'' has its arity so it can be used to build terms. Nullary elements ...
(i.e., an alphabet whose symbols have an associated
arity Arity () is the number of arguments or operands taken by a function, operation or relation in logic, mathematics, and computer science. In mathematics, arity may also be named ''rank'', but this word can have many other meanings in mathematics. In ...
), ''Q''''f'' ⊆ ''Q'' is a set of final states, and Δ is a set of transition rules of the form ''f''(''q''1(''x''1),...,''q''''n''(''x''''n'')) → ''q''(''f''(''x''1,...,''x''''n'')), for an ''n''-ary ''f'' ∈ ''F'', ''q'', ''q''''i'' ∈ ''Q'', and ''x''''i'' variables denoting subtrees. That is, members of Δ are rewrite rules from nodes whose childs' roots are states, to nodes whose roots are states. Thus the state of a node is deduced from the states of its children. For ''n''=0, that is, for a constant symbol ''f'', the above transition rule definition reads ''f''() → ''q''(''f''()); often the empty parentheses are omitted for convenience: ''f'' → ''q''(''f''). Since these transition rules for constant symbols (leaves) do not require a state, no explicitly defined initial states are needed. A bottom-up tree automaton is run on a
ground term In mathematical logic, a ground term of a formal system is a term that does not contain any variables. Similarly, a ground formula is a formula that does not contain any variables. In first-order logic with identity, the sentence Q(a) \lor P(b) ...
over ''F'', starting at all its leaves simultaneously and moving upwards, associating a run state from ''Q'' with each subterm. The term is accepted if its root is associated to an accepting state from ''Q''''f''. A top-down finite tree automaton over ''F'' is defined as a tuple (''Q'', ''F'', ''Q''''i'', Δ), with two differences with bottom-up tree automata. First, ''Q''''i'' ⊆ ''Q'', the set of its initial states, replaces ''Q''''f''; second, its transition rules are oriented conversely: ''q''(''f''(''x''1,...,''x''''n'')) → ''f''(''q''1(''x''1),...,''q''''n''(''x''''n'')), for an ''n''-ary ''f'' ∈ ''F'', ''q'', ''q''''i'' ∈ ''Q'', and ''x''''i'' variables denoting subtrees. That is, members of Δ are here rewrite rules from nodes whose roots are states to nodes whose children's roots are states. A top-down automaton starts in some of its initial states at the root and moves downward along branches of the tree, associating along a run a state with each subterm inductively. A tree is accepted if every branch can be gone through this way. A tree automaton is called deterministic (abbreviated DFTA) if no two rules from Δ have the same left hand side; otherwise it is called nondeterministic (NFTA). Non-deterministic top-down tree automata have the same expressive power as non-deterministic bottom-up ones; the transition rules are simply reversed, and the final states become the initial states. In contrast, deterministic top-down tree automata are less powerful than their bottom-up counterparts, because in a deterministic tree automaton no two transition rules have the same left-hand side. For tree automata, transition rules are rewrite rules; and for top-down ones, the left-hand side will be parent nodes. Consequently, a deterministic top-down tree automaton will only be able to test for tree properties that are true in all branches, because the choice of the state to write into each child branch is determined at the parent node, without knowing the child branches contents. Infinite-tree automata extend top-down automata to infinite trees, and can be used to prove decidability of S2S, the monadic second-order theory with two successors. Finite tree automata (nondeterministic if top-down) suffice for WS2S.


Examples


Bottom-up automaton accepting boolean lists

Employing coloring to distinguish members of ''F'' and ''Q'', and using the ranked alphabet ''F''=, with having arity 2 and all other symbols having arity 0, a bottom-up tree automaton accepting the set of all finite lists of boolean values can be defined as (''Q'', ''F'', ''Q''''f'', Δ) with , ''Q''''f''=, and Δ consisting of the rules In this example, the rules can be understood intuitively as assigning to each term its type in a bottom-up manner; e.g. rule (4) can be read as "A term (''x''1,''x''2) has type , provided ''x''1 and ''x''2 has type and , respectively". An accepting example run is Cf. the derivation of the same term from a regular tree grammar corresponding to the automaton, shown at Regular tree grammar#Examples. A rejecting example run is Intuitively, this corresponds to the term (,) not being well-typed.


Top-down automaton accepting multiples of 3 in binary notation

Using the same colorization as above, this example shows how tree automata generalize ordinary string automata. The finite deterministic string automaton shown in the picture accepts all strings of binary digits that denote a multiple of 3. Using the notions from Deterministic finite automaton#Formal definition, it is defined by: * the set ''Q'' of states being , * the input alphabet being , * the initial state being , * the set of final states being , and * the transitions being as shown in column (B) of the table. In the tree automaton setting, the input alphabet is changed such that the symbols and are both unary, and a nullary symbol, say is used for tree leaves. For example, the binary string "" in the string automaton setting corresponds to the term "((()))" in the tree automaton setting; this way, strings can be generalized to trees, or terms. The top-down finite tree automaton accepting the set of all terms corresponding to multiples of 3 in binary string notation is then defined by: * the set ''Q'' of states being still , * the ranked input alphabet being , with ''Arity''()=''Arity''()=1 and ''Arity''()=0, as explained, * the set of initial states being , and * the transitions being as shown in column (C) of the table. For example, the tree "((()))" is accepted by the following tree automaton run: In contrast, the term "(())" leads to following non-accepting automaton run: Since there are no other initial states than to start an automaton run with, the term "(())" is not accepted by the tree automaton. For comparison purposes, the table gives in column (A) and (D) a (right) regular (string) grammar, and a
regular tree grammar In theoretical computer science and formal language theory, a regular tree grammar is a formal grammar that describes a set of directed trees, or terms. A regular word grammar can be seen as a special kind of regular tree grammar, describing a se ...
, respectively, each accepting the same language as its automaton counterpart.


Properties


Recognizability

For a bottom-up automaton, a ground term ''t'' (that is, a tree) is accepted if there exists a reduction that starts from ''t'' and ends with ''q''(''t''), where ''q'' is a final state. For a top-down automaton, a ground term ''t'' is accepted if there exists a reduction that starts from ''q''(''t'') and ends with ''t'', where ''q'' is an initial state. The tree language ''L''(''A'') accepted, or recognized, by a tree automaton ''A'' is the set of all ground terms accepted by ''A''. A set of ground terms is recognizable if there exists a tree automaton that accepts it. A linear (that is, arity-preserving) tree homomorphism preserves recognizability.


Completeness and reduction

A non-deterministic finite tree automaton is complete if there is at least one transition rule available for every possible symbol-states combination. A state ''q'' is accessible if there exists a ground term ''t'' such that there exists a reduction from ''t'' to ''q''(''t''). An NFTA is reduced if all its states are accessible.


Pumping lemma

Every sufficiently large ground term ''t'' in a recognizable tree language ''L'' can be vertically tripartited such that arbitrary repetition ("pumping") of the middle part keeps the resulting term in ''L''.Formally: ''C'' 'C''’''n''[''u'' ∈ ''L'' for all ''n'' ≥ 0. The notation ''C''''n''[.">'u''.html" ;"title="'C''’''n''[''u''">'C''’''n''[''u'' ∈ ''L'' for all ''n'' ≥ 0. The notation ''C''''n''[.means the result of stacking ''n'' copies of ''C''[.] one in another, cf. . For the language of all finite lists of boolean values from the above example, all terms beyond the height limit ''k''=2 can be pumped, since they need to contain an occurrence of . For example, all belong to that language.


Closure

The class of recognizable tree languages is closed under union, under complementation, and under intersection.


Myhill–Nerode theorem

A congruence on the set of all trees over a ranked alphabet ''F'' is an
equivalence relation In mathematics, an equivalence relation is a binary relation that is reflexive, symmetric and transitive. The equipollence relation between line segments in geometry is a common example of an equivalence relation. Each equivalence relation ...
such that ''u''1 ≡ ''v''1 and ... and ''u''''n'' ≡ ''v''''n'' implies ''f''(''u''1,...,''u''''n'') ≡ ''f''(''v''1,...,''v''''n''), for every ''f'' ∈ ''F''. It is of finite index if its number of equivalence-classes is finite. For a given tree-language ''L'', a congruence can be defined by ''u'' ≡''L'' ''v'' if ''C'' 'u''∈ ''L'' ⇔ ''C'' 'v''∈ ''L'' for each context ''C''. The
Myhill–Nerode theorem In the theory of formal languages, the Myhill–Nerode theorem provides a necessary and sufficient condition for a language to be regular. The theorem is named for John Myhill and Anil Nerode, who proved it at the University of Chicago in 1958 . ...
for tree automata states that the following three statements are equivalent: # ''L'' is a recognizable tree language # ''L'' is the union of some equivalence classes of a congruence of finite index # the relation ≡''L'' is a congruence of finite index


See also

*
Courcelle's theorem In the study of graph algorithms, Courcelle's theorem is the statement that every graph property definable in the monadic second-order logic of graphs can be decided in linear time on graphs of bounded treewidth. The result was first proved by Bru ...
- an application of tree automata to prove an algorithmic meta-theorem about graphs *
Tree transducers In theoretical computer science and formal language theory, a tree transducer (TT) is an abstract machine taking as input a tree, and generating output – generally other trees, but models producing words or other structures exist. Roughly speaking ...
- extend tree automata in the same way that word transducers extend word automata. *
Alternating tree automata In automata theory, an alternating tree automaton (ATA) is an extension of nondeterministic tree automaton as same as alternating finite automaton extends nondeterministic finite automaton (NFA). Computational complexity The emptiness problem ( ...
* Infinite-tree automata


Notes


References

* * * *


External links


Implementations


Grappa
- ranked and unranked tree automata libraries (OCaml)
Timbuk
- tools for reachability analysis and tree automata calculations (OCaml)
LETHAL
- library for working with finite tree and hedge automata (Java)
Machine-checked tree automata library
(Isabelle Caml, SML, Haskell
VATA
- a library for efficient manipulation of non-deterministic tree automata (C++) {{Formal languages and grammars Trees (data structures) Automata (computation) Formal languages Theoretical computer science