Tree Transducers
   HOME

TheInfoList



OR:

In
theoretical computer science Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumsc ...
and
formal language theory In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...
, a tree transducer (TT) is an
abstract machine An abstract machine is a computer science theoretical model that allows for a detailed and precise analysis of how a computer system functions. It is analogous to a mathematical function in that it receives inputs and produces outputs based on pre ...
taking as input a
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
, and generating output – generally other trees, but models producing
words A word is a basic element of language that carries an objective or practical meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no consen ...
or other structures exist. Roughly speaking, tree transducers extend
tree automata A tree automaton is a type of state machine. Tree automata deal with tree structures, rather than the strings of more conventional state machines. The following article deals with branching tree automata, which correspond to regular languages ...
in the same way that word transducers extend word automata. Manipulating tree structures instead of words enable TT to model syntax-directed transformations of formal or natural languages. However, TT are not as well-behaved as their word counterparts in terms of
algorithmic complexity Algorithmic may refer to: *Algorithm, step-by-step instructions for a calculation **Algorithmic art, art made by an algorithm **Algorithmic composition, music made by an algorithm ** Algorithmic trading, trading decisions made by an algorithm **Alg ...
, closure properties, etcetera. In particular, most of the main classes are not closed under
composition Composition or Compositions may refer to: Arts and literature *Composition (dance), practice and teaching of choreography *Composition (language), in literature and rhetoric, producing a work in spoken tradition and written discourse, to include v ...
. The main classes of tree transducers are:


Top-Down Tree Transducers (TOP)

A TOP ''T'' is a tuple (''Q'', Σ, Γ, ''I'', δ) such that: * ''Q'' is a
finite set In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example, :\ is a finite set with five elements. Th ...
, the set of ''states''; * Σ is a finite
ranked alphabet In theoretical computer science and formal language theory, a ranked alphabet is a pair of an Alphabet (computer science), ordinary alphabet ''F'' and a function ''Arity'': ''F''→\mathbb. Each letter in ''F'' has its arity so it can be used to bui ...
, called the ''input alphabet''; * Γ is a finite ranked alphabet, called the ''output alphabet''; * ''I'' is a
subset In mathematics, Set (mathematics), set ''A'' is a subset of a set ''B'' if all Element (mathematics), elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they are ...
of ''Q'', the set of ''initial states''; and * \delta is a set of ''rules'' of the form q(f(x_1,\dots,x_n)) \to u, where ''f'' is a symbol of Σ, ''n'' is the arity of ''f'', ''q'' is a state, and ''u'' is a tree on Γ and Q\times 1..n, such pairs being
nullary Arity () is the number of arguments or operands taken by a function, operation or relation in logic, mathematics, and computer science. In mathematics, arity may also be named ''rank'', but this word can have many other meanings in mathematics. ...
.


Examples of rules and intuitions on semantics

For instance, :q(f(x_1,\dots,x_3)) \to g(a,q'(x_1),h(q''(x_3))) is a rule – one customarily writes q(x_i) instead of the pair (q,x_i) – and its intuitive semantics is that, under the action of ''q'', a tree with ''f'' at the root and three children is transformed into :g(a,q'(x_1),h(q''(x_3))) where, recursively, q'(x_1) and q''(x_3) are replaced, respectively, with the application of q' on the first child and with the application of q'' on the third.


Semantics as

term rewriting In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems (also known as rewrite systems, rewrite engines, or reduc ...

The
semantics Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy Philosophy (f ...
of each state of the transducer ''T'', and of ''T'' itself, is a
binary relation In mathematics, a binary relation associates elements of one set, called the ''domain'', with elements of another set, called the ''codomain''. A binary relation over Set (mathematics), sets and is a new set of ordered pairs consisting of ele ...
between input trees (on Σ) and output trees (on Γ). A way of defining the semantics formally is to see \delta as a
term rewriting system In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems (also known as rewrite systems, rewrite engines, or red ...
, provided that in the right-hand sides the calls are written in the form q(x_i), where states ''q'' are unary symbols. Then the semantics ![q!.html" ;"title=".html" ;"title="![q">![q!">.html" ;"title="![q">![q!/math> of a state ''q'' is given by : ![q!] = \ . The semantics of ''T'' is then defined as the union of the semantics of its initial states: :[\![T]\!] = \bigcup_ ! !


Determinism and domain

As with tree automata, a TOP is said to be deterministic (abbreviated DTOP) if no two rules of δ share the same left-hand side, and there is at most one initial state. In that case, the semantics of the DTOP is a
partial function In mathematics, a partial function from a set to a set is a function from a subset of (possibly itself) to . The subset , that is, the domain of viewed as a function, is called the domain of definition of . If equals , that is, if is de ...
from input trees (on Σ) to output trees (on Γ), as are the semantics of each of the DTOP's states. The domain of a transducer is the
domain Domain may refer to: Mathematics *Domain of a function, the set of input values for which the (total) function is defined **Domain of definition of a partial function **Natural domain of a partial function **Domain of holomorphy of a function * Do ...
of its semantics. Likewise, the image of a transducer is the
image An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...
of its semantics.


Properties of DTOP

* DTOP are not closed under
union Union commonly refers to: * Trade union, an organization of workers * Union (set theory), in mathematics, a fundamental operation on sets Union may also refer to: Arts and entertainment Music * Union (band), an American rock group ** ''Un ...
: this is already the case for deterministic word transducers. * The domain of a DTOP is a
regular tree language In theoretical computer science and formal language theory, a regular tree grammar is a formal grammar that describes a set of directed trees, or terms. A regular word grammar can be seen as a special kind of regular tree grammar, describing a se ...
. Furthermore, the domain is recognisable by a deterministic top-down tree automaton (DTTA) of size at most exponential in that of the initial DTOP. ::That the domain is DTTA-recognizable is not surprising, considering that the left-hand sides of DTOP rules are the same as for DTTA. As for the reason for the exponential explosion in the worst case (that does not exist in the word case), consider the rule q(f(x_1,x_2)) \to g(p_1(x_1),p_2(x_1),p_3(x_2)). In order for the computation to succeed, it must succeed for both children. That means that the right child must be in the domain of p_3. As for the left child, it must be in the domain of ''both'' p_1 and p_2. Generally, since subtrees can be copied, a single subtree can be evaluated by multiple states during a run, despite the determinism, and unlike DTTA. Thus the construction of the DTTA recognising the domain of a DTOP must account for ''sets'' of states and compute the intersections of their domains, hence the exponential. In the special case of linear DTOP, that is to say DTOP where each x_i appears at most once in the right-hand side of each rule, the construction is linear in time and space. * The image of a DTOP is not a regular tree language. ::Consider the transducer coding the transformation f(x)\to g(x,x); that is, duplicate the child of the input. This is easily done by a rule q(f(x_1)) \to g(p(x_1),p(x_1)), where ''p'' encodes the
identity Identity may refer to: * Identity document * Identity (philosophy) * Identity (social science) * Identity (mathematics) Arts and entertainment Film and television * ''Identity'' (1987 film), an Iranian film * ''Identity'' (2003 film), ...
. Then, absent any restrictions on the first child of the input, the image is a classical non-regular tree language. * However, the domain of a DTOP cannot be ''restricted'' to a regular tree language. That is to say, given a DTOP ''T'' and a language ''L'', one cannot in general build a DTOP T' such that the semantics of T' is that of ''T'', restricted to ''L''. ::This property is linked to the reason deterministic top-down tree automata are less expressive than bottom-up automata: once you go down a given path, information from other paths is inaccessible. Consider the transducer coding the transformation f(x,y)\to y; that is, output the right child of the input. This is easily done by a rule q(f(x_1,x_2)) \to p(x_2), where ''p'' encodes the identity. Now let's say we want to restrict this transducer to the finite (and thus, in particular, regular) domain \. We must use the rules q(f(x_1,x_2)) \to p(x_2),\ p(a) \to a,\ p(b)\to b. But in the first rule, x_1 does not appear at all, since nothing is produced from the left child. Thus, it is not possible to test that the left child is ''c''. In contrast, since we produce from the right child, we can test that it is ''a'' or ''b''. In general, the criterion is that DTOP cannot test properties of subtrees from which they do not produce output. * DTOP are not closed under
composition Composition or Compositions may refer to: Arts and literature *Composition (dance), practice and teaching of choreography *Composition (language), in literature and rhetoric, producing a work in spoken tradition and written discourse, to include v ...
. However this problem can be solved by the addition of a lookahead: a tree automaton, coupled to the transducer, that can perform tests on the
domain Domain may refer to: Mathematics *Domain of a function, the set of input values for which the (total) function is defined **Domain of definition of a partial function **Natural domain of a partial function **Domain of holomorphy of a function * Do ...
which the transducer is incapable of. ::This follows from the point about domain restriction: composing the DTOP encoding identity on \ with the one encoding f(x,y)\to y must yield a transducer with the semantics \, which we know is not expressible by a DTOP. * The
typechecking In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
problem—testing whether the image of a regular tree language is included in another regular tree language—is decidable. * The
equivalence problem In theoretical computer science and formal language theory, the equivalence problem is the question of determining, given two representations of formal languages, whether they denote the same formal language. The complexity and decidability of thi ...
—testing whether two DTOP define the same functions—is decidable.


Bottom-Up Tree Transducers (BOT)

As in the simpler case of tree automata, bottom-up tree transducers are defined similarly to their top-down counterparts, but proceed from the leaves of the tree to the root, instead of from the root to the leaves. Thus the main difference is in the form of the rules, which are of the form f(q_1(x_1),\dots,q_n(x_n))) \to q(u).


References

* * {{reflist Trees (data structures) Automata (computation) Finite automata Formal languages Theoretical computer science