Tree-adjoining grammar (TAG) is a
grammar formalism
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
defined by
Aravind Joshi
Aravind Krishna Joshi (August 5, 1929 – December 31, 2017) was the Henry Salvatori Professor of Computer and Cognitive Science in the computer science department of the University of Pennsylvania. Joshi defined the tree-adjoining grammar form ...
. Tree-adjoining grammars are somewhat similar to
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
s, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (see
tree (graph theory)
In graph theory
In mathematics, graph theory is the study of ''graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of '' vertices'' (also called ''nodes'' or ''points ...
and
tree (data structure)
In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be conn ...
).
History
TAG originated in investigations by Joshi and his students into the family of adjunction grammars (AG),
[
]
the "string grammar" of
Zellig Harris
Zellig Sabbettai Harris (; October 23, 1909 – May 22, 1992) was an influential American linguist, mathematical syntactician, and methodologist of science. Originally a Semiticist, he is best known for his work in structural linguistics and dis ...
. AGs handle
exocentric
In theoretical linguistics, a distinction is made between endocentric and exocentric constructions. A grammatical construction (for instance, a phrase or compound) is said to be ''endocentric'' if it fulfils the same linguistic function as one of ...
properties of language in a natural and effective way, but do not have a good characterization of
endocentric
In theoretical linguistics, a distinction is made between endocentric and exocentric constructions. A grammatical construction (for instance, a phrase or compound) is said to be ''endocentric'' if it fulfils the same linguistic function as one o ...
constructions; the converse is true of
rewrite grammars, or
phrase-structure grammar (PSG). In 1969, Joshi introduced a family of grammars that exploits this complementarity by mixing the two types of rules. A few very simple rewrite rules suffice to generate the vocabulary of strings for adjunction rules. This family is distinct from the
Chomsky-Schützenberger hierarchy but intersects it in interesting and linguistically relevant ways.
The center strings and adjunct strings can also be generated by a
dependency grammar
Dependency grammar (DG) is a class of modern grammatical theories that are all based on the dependency relation (as opposed to the ''constituency relation'' of phrase structure) and that can be traced back primarily to the work of Lucien Tesnià ...
, avoiding the limitations of rewrite systems entirely.
Description
The rules in a TAG are trees with a special leaf node known as the ''foot node'', which is anchored to a word.
There are two types of basic trees in TAG: ''initial'' trees (often represented as '
') and ''auxiliary'' trees ('
'). Initial trees represent basic valency relations, while auxiliary trees allow for recursion.
Auxiliary trees have the root (top) node and foot node labeled with the same symbol.
A derivation starts with an initial tree, combining via either ''substitution'' or ''adjunction''. Substitution replaces a frontier node with another tree whose top node has the same label. The root/foot label of the auxiliary tree must match the label of the node at which it adjoins. Adjunction can thus have the effect of inserting an auxiliary tree into the center of another tree.
Other variants of TAG allow
multi-component trees, trees with multiple foot nodes, and other extensions.
Complexity and application
Tree-adjoining grammars are more powerful (in terms of
weak generative capacity) than
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
s, but less powerful than
linear context-free rewriting system
Generalized context-free grammar (GCFG) is a grammar formalism that expands on context-free grammars by adding potentially non-context-free composition functions to rewrite rules. Head grammar (and its weak equivalents) is an instance of such a GC ...
s,
[Kallmeyer, Laura (2010). Parsing Beyond Context-Free Grammars. Springer. Here: p.215-216] indexed[since for each tree-adjoining grammar, a linear indexed grammar can be found producing the same language, see below, and for the latter, a weakly equivalent (proper) indexed grammar can be found, in turn, see Indexed grammar#Computational Power] or
context-sensitive grammars.
A TAG can describe the language of squares (in which some arbitrary string is repeated), and the language
. This type of processing can be represented by an
embedded pushdown automaton An embedded pushdown automaton or EPDA is a computational model for parsing languages generated by tree-adjoining grammars (TAGs). It is similar to the context-free grammar-parsing pushdown automaton, but instead of using a plain stack to store sy ...
.
Languages with cubes (i.e. triplicated strings) or with more than four distinct character strings of equal length cannot be generated by tree-adjoining grammars.
For these reasons, tree-adjoining grammars are often described as
mildly context-sensitive.
These grammar classes are conjectured to be powerful enough to model
natural language
In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages ...
s while remaining efficiently
parsable
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from L ...
in the general case.
Equivalences
Vijay-Shanker and Weir (1994)
[Vijay-Shanker, K. and Weir, David J. 1994. ''The Equivalence of Four Extensions of Context-Free Grammars''. Mathematical Systems Theory 27(6): 511–546.] demonstrate that
linear indexed grammars
Linearity is the property of a mathematical relationship (''function'') that can be graphically represented as a straight line. Linearity is closely related to '' proportionality''. Examples in physics include rectilinear motion, the linear r ...
,
combinatory categorial grammar
Combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate–argument structur ...
, tree-adjoining grammars, and
head grammar
Head grammar (HG) is a grammar formalism introduced in Carl Pollard (1984) Pollard, C. 1984. ''Generalized Phrase Structure Grammars, Head Grammars, and Natural Language''. Ph.D. thesis, Stanford University, CA. as an extension of the context-fr ...
s are
weakly equivalent formalisms, in that they all define the same string languages.
Lexicalized
Lexicalized tree-adjoining grammars (LTAG) are a variant of TAG in which each elementary tree (initial or auxiliary) is associated with a lexical item. A lexicalized grammar for English has been developed by the XTAG Research Group of the Institute for Research in Cognitive Science at the University of Pennsylvania.
Notes
References
External links
The XTAG project which uses a TAG for natural language processing.
SemConst DocumentationA quick survey on Syntax and Semantic Interface problematic within the TAG framework.
The TuLiPa projectThe Tübingen Linguistic Parsing Architecture (TuLiPA) is a multi-formalism syntactic (and semantic) parsing environment, designed mainly for
multi-component tree adjoining grammars with
tree tuple
In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are u ...
s
The Metagrammar Toolkitwhich provides several tools to edit and compile
MetaGrammars into TAGs. It also include a wide coverage French Metagrammars.
LLP2A
lexicalized tree adjoining grammar
In linguistics, lexicalization is the process of adding words, set phrases, or word patterns to a language's lexicon.
Whether ''word formation'' and ''lexicalization'' refer to the same process is controversial within the field of linguistics. M ...
parser which provides an easy to use graphical environment (page in French)
{{DEFAULTSORT:Tree-Adjoining Grammar
Generative linguistics
Grammar frameworks