Link Grammar

Link grammar (LG) is a theory of syntax by Davy Temperley and Daniel Sleator which builds relations between pairs of words, rather than constructing constituents in a phrase structure hierarchy. Link grammar is similar to dependency grammar, but dependency grammar includes a head-dependent relationship, whereas Link Grammar makes the head-dependent relationship optional (links need not indicate direction). Colored Multiplanar Link Grammar (CMLG) is an extension of LG allowing crossing relations between pairs of words. The relationship between words is indicated with link types, thus making link grammar closely related to certain categorial grammars.

For example, in a subject–verb–object language like English, the verb would look left to form a subject link, and right to form an object link. Nouns would look right to complete the subject link, or left to complete the object link. In a subject–object–verb language like Persian, the verb would look left to form an object link, and a more distant left to form a subject link. Nouns would look to the right for both subject and object links.


Overview

Link grammar connects the words in a sentence with links, similar in form to a catena. Unlike the catena or a traditional dependency grammar, the marking of the head-dependent relationship is optional for most languages, becoming mandatory only in free-word-order languages (such as Turkish, Finnish, Hungarian, Lithuanian). That is, in English, the subject-verb relationship is "obvious", in that the subject is almost always to the left of the verb, and thus no specific indication of dependency needs to be made. In the case of subject-verb inversion, a distinct link type is employed. For free word-order languages, this can no longer hold, and a link between the subject and verb must contain an explicit directional arrow to indicate which of the two words is which.

Link grammar also differs from traditional dependency grammars by allowing cyclic relations between words. Thus, for example, there can be links indicating both the head verb of a sentence and the head subject of the sentence, as well as a link between the subject and the verb. These three links thus form a cycle (a triangle, in this case). Cycles are useful in constraining what might otherwise be ambiguous parses; cycles help "tighten up" the set of allowable parses of a sentence. For example, in the parse

        +---->WV--->+
        +--Wd--+-Ss-+--Pa--+
        |      |    |      |
    LEFT-WALL  he  runs  fast

the LEFT-WALL indicates the start of the sentence, or the root node. The directional WV link (with arrows) points at the head verb of the sentence; it is the Wall-Verb link. The Wd link (drawn here without arrows) indicates the head noun (the subject) of the sentence. The link type Wd indicates both that it connects to the wall (W) and that the sentence is a declarative sentence (the lower-case "d" subtype). The Ss link indicates the subject-verb relationship; the lower-case "s" indicates that the subject is singular. Note that the WV, Wd and Ss links form a cycle. The Pa link connects the verb to a complement; the lower-case "a" indicates that it is a predicative adjective in this case.


Parsing algorithm

Parsing is performed in analogy to assembling a jigsaw puzzle (representing the parsed sentence) from puzzle pieces (representing individual words). A language is represented by means of a dictionary or lexis, which consists of words and the set of allowed "jigsaw puzzle shapes" that each word can have. The shape is indicated by a "connector", which is a link-type, and a direction indicator + or - indicating right or left. Thus, for example, a transitive verb may have the connectors S- & O+, indicating that the verb may form a subject ("S") connection to its left ("-") and an object ("O") connection to its right ("+"). Similarly, a common noun may have the connectors D- & S+, indicating that it may connect to a determiner on the left ("D-") and act as a subject when connecting to a verb on the right ("S+"). The act of parsing is then to identify that the S+ connector can attach to the S- connector, forming an "S" link between the two words. Parsing completes when all connectors have been connected.

A given word may have dozens or even hundreds of allowed puzzle-shapes (termed "disjuncts"): for example, many verbs may be optionally transitive, thus making the O+ connector optional; such verbs might also take adverbial modifiers (E connectors), which are inherently optional. More complex verbs may have additional connectors for indirect objects, or for particles or prepositions. Thus, a part of parsing also involves picking one single unique disjunct for a word; the final parse must satisfy (connect) ''all'' connectors for that disjunct.
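
As an illustration of this "jigsaw" idea, the toy Python sketch below models connectors as (label, direction) pairs and disjuncts as lists of connectors; the word lists and the matching helper are hypothetical illustrations, not the parser's actual dictionary format or code:

    # Toy model: a connector is (label, direction); a disjunct is the list of
    # connectors that a word must satisfy in a given parse.
    lexicon = {
        "the":     [[("D", "+")]],
        "boy":     [[("D", "-"), ("S", "+")]],     # determiner to the left, subject link to the right
        "painted": [[("S", "-"), ("O", "+")],      # transitive disjunct
                    [("S", "-")]],                 # intransitive disjunct (O+ dropped)
    }

    def can_link(c1, c2):
        """Two connectors join iff their labels match and their directions oppose."""
        (label1, dir1), (label2, dir2) = c1, c2
        return label1 == label2 and {dir1, dir2} == {"+", "-"}

    print(can_link(("S", "+"), ("S", "-")))   # True  -> an "S" link can be formed
    print(can_link(("S", "+"), ("O", "-")))   # False -> the labels differ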


Dependency

Connectors may also include head-dependent indicators h and d. In this case, a connector containing a head indicator is only allowed to connect to a connector containing the dependent indicator (or to a connector without any h-d indicators on it). When these indicators are used, the link is decorated with arrows to indicate the link direction. A recent extension simplifies the specification of connectors for languages that have little or no restrictions on word-order, such as Lithuanian. There are also extensions to make it easier to support languages with concatenative morphologies.
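
A minimal sketch of that compatibility rule (a hypothetical helper, not the library's code):

    def hd_compatible(a, b):
        """a and b are 'h' (head), 'd' (dependent) or '' (no indicator)."""
        if a == "" or b == "":
            return True                  # unmarked connectors impose no head/dependent constraint
        return {a, b} == {"h", "d"}      # otherwise one end must be the head, the other the dependent

    assert hd_compatible("h", "d")
    assert hd_compatible("", "h")
    assert not hd_compatible("h", "h")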


Planarity

The parsing algorithm also requires that the final graph is a planar graph, i.e. that no links cross. This constraint is based on empirical psycho-linguistic evidence that, indeed, for most languages, in nearly all situations, dependency links really do not cross. There are rare exceptions, e.g. in Finnish, and even in English; they can be parsed by link-grammar only by introducing more complex and selective connector types to capture these situations.
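
The constraint itself can be stated directly over word positions; the following is an illustrative check (not the parser's implementation) that rejects crossing links:

    def is_planar(links):
        """links are (i, j) word-index pairs; two links cross iff i < k < j < l."""
        links = [tuple(sorted(link)) for link in links]
        return not any(i < k < j < l for (i, j) in links for (k, l) in links)

    # "LEFT-WALL he runs fast" numbered 0..3: WV(0,2), Wd(0,1), Ss(1,2), Pa(2,3)
    print(is_planar([(0, 2), (0, 1), (1, 2), (2, 3)]))   # True: no crossings
    print(is_planar([(0, 2), (1, 3)]))                   # False: the two links cross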


Costs and selection

Connectors can have an optional floating-point cost markup, so that some are "cheaper" to use than others, thus giving preference to certain parses over others. That is, the total cost of a parse is the sum of the individual costs of the connectors that were used; the cheapest parse indicates the most likely parse. This is used for ranking multiple ambiguous parses. The fact that the costs are local to the connectors, and are not a global property of the algorithm, makes them essentially Markovian in nature.

The assignment of a log-likelihood to linkages allows link grammar to implement the semantic selection of predicate-argument relationships. That is, certain constructions, although syntactically valid, are extremely unlikely. In this way, link grammar embodies some of the ideas present in operator grammar.

Because the costs are additive, they behave like the logarithm of the probability (since log-likelihoods are additive), or equivalently, somewhat like the entropy (since entropies are additive). This makes Link Grammar compatible with machine learning techniques such as hidden Markov models and the Viterbi algorithm, because the link costs correspond to the link weights in Markov networks or Bayesian networks.
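
For example, ranking by additive cost can be pictured as follows (a sketch with made-up connector costs, not values taken from any distributed dictionary):

    import math

    candidate_parses = {
        "linkage A": [0.0, 0.0, 0.5],   # costs of the connectors used in this linkage
        "linkage B": [0.0, 1.0, 1.0],
    }
    for name, costs in sorted(candidate_parses.items(), key=lambda kv: sum(kv[1])):
        total = sum(costs)
        # additive costs behave like negative log-likelihoods, so exp(-total)
        # acts as an (unnormalized) relative likelihood of the linkage
        print(f"{name}: total cost {total:.1f}, relative likelihood ~ {math.exp(-total):.2f}")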


Type theory

The Link Grammar link types can be understood to be the types in the sense of type theory. In effect, Link Grammar can be used to model the internal language of certain (non-symmetric) compact closed categories, such as pregroup grammars. In this sense, Link Grammar appears to be isomorphic or homomorphic to some categorial grammars. Thus, for example, in a categorial grammar the noun phrase "''the bad boy''" may be assigned the types ''NP/N'', ''N/N'', ''N'', whereas the corresponding disjuncts in Link Grammar would be:

    the: D+;
    bad: A+;
    boy: D- & A-;

The contraction rules (inference rules) of the Lambek calculus can be mapped to the connecting of connectors in Link Grammar. The + and - directional indicators correspond to the forward and backward slashes of the categorial grammar. Finally, the single-letter names A and D can be understood as labels or "easy-to-read" mnemonic names for the rather more verbose types ''NP/N'', etc. The primary distinction here is then that categorial grammars have two type constructors, the forward and backward slashes, that can be used to create new types (such as ''NP/N'') from base types (such as ''NP'' and ''N''). Link Grammar omits the use of type constructors, opting instead to define a much larger set of base types having compact, easy-to-remember mnemonics.
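
For instance, writing the slash types in pregroup form (a sketch added here for illustration, with ''NP'' and ''N'' as above and N^l a left adjoint of N), the phrase "''the bad boy''" reduces by two contractions of the form N^l·N → 1:

    % "the": NP \cdot N^l      "bad": N \cdot N^l      "boy": N
    NP \cdot N^{l} \;\; N \cdot N^{l} \;\; N
      \;\longrightarrow\; NP \cdot N^{l} \cdot N
      \;\longrightarrow\; NP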


Examples


Example 1

A basic rule file for an SVO language might look like:

    <determiner>      D+;
    <noun-subject>    {D-} & S+;
    <noun-object>     {D-} & O-;
    <verb>            S- & {O+};

Thus the English sentence, "The boy painted a picture", would appear as:

               +-----O-----+
     +-D-+--S--+     +--D--+
     |   |     |     |     |
    The boy painted  a  picture

Similar parses apply for Chinese.


Example 2

Conversely, a rule file for a null subject SOV language might consist of the following links:

    <noun-subject>    S+;
    <noun-object>     O+;
    <verb>            {O-} & {S-};

And a simple Persian sentence, ''man nAn xordam'' (من نان خوردم) 'I ate bread', would look like:

     +-----S-----+
     |     +--O--+
     |     |     |
    man   nAn xordam

VSO order can be likewise accommodated, such as for Arabic.


Example 3 (Morphology)

In many languages with a concatenative morphology, the stem plays no grammatical role; the grammar is determined by the suffixes. Thus, in Russian, the sentence ''вверху плыли редкие облачка'' might have the parse:

        +------------Wd-----------+---------------SIp---------------+
        |         +-------EI------+              +--------Api-------+
        |         |      +--LLCZD-+       +-LLAQZ+         +--LLCAO-+
        |         |      |        |       |      |         |        |
    LEFT-WALL  вверху.e плы.= =ли.vnndpp ре.= =дкие.api облачк.= =а.ndnpi

The subscripts, such as '.vnndpp', are used to indicate the grammatical category. The primary links Wd, EI, SIp and Api connect together the suffixes, as, in principle, other stems could appear here without altering the structure of the sentence. The Api link indicates the adjective; SIp denotes subject-verb inversion; EI is a modifier. The Wd link is used to indicate the head noun; the head verb is not indicated in this sentence. The LLXXX links serve only to attach stems to suffixes.


Example 4 (Phonology)

The link-grammar can also indicate phonological agreement between neighboring words. For example:

                             +---------Ost--------+
        +------>WV------>+   +------Ds**x-----+
        +----Wd---+-Ss*b-+   +--PHv-+----A----+
        |         |      |   |      |         |
    LEFT-WALL that.j-p is.v an abstract.a concept.n

Here, the connector 'PH' is used to constrain the determiners that can appear before the word 'abstract'. It effectively blocks (makes it costly) the use of the determiner 'a' in this sentence, while the link to 'an' remains cheap. The other links are roughly as in previous examples: S denoting subject, O denoting object, D denoting determiner. The 'WV' link indicates the head verb, and the 'W' link indicates the head noun. The lower-case letters following the upper-case link types serve to refine the type; so, for example, Ds can only connect to a singular noun; Ss only to a singular subject, and Os to a singular object. The lower-case v in PHv denotes 'vowel'; the lower-case d in Wd denotes a declarative sentence.


Example 5 - Vietnamese

The Vietnamese sentence "Bữa tiệc hôm qua là một thành công lớn" ("The party yesterday was a great success") may be parsed as follows:


Implementations

The link grammar syntax parser is a library for natural language processing written in C. It is available under the LGPL license. The parser is an ongoing project. Recent versions include improved sentence coverage; Russian, Persian and Arabic language support; prototypes for German, Hebrew, Lithuanian, Vietnamese and Turkish; and programming APIs for Python, Java, Common LISP, AutoIt and OCaml, with third-party bindings for Perl, Ruby and JavaScript (node.js). A current major undertaking is a project to learn the grammar and morphology of new languages, using unsupervised learning algorithms. The ''link-parser'' program, along with rules and word lists for English, may be found in standard Linux distributions, e.g., as a Debian package, although many of these are years out of date.
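
A brief usage sketch of the Python bindings shipped with the parser (class and method names as in recent releases of the ''linkgrammar'' module; the exact API may vary between versions):

    from linkgrammar import Dictionary, ParseOptions, Sentence

    po = ParseOptions()                             # default parse options
    sent = Sentence("The boy painted a picture", Dictionary("en"), po)
    for linkage in sent.parse():                    # linkages come back ranked by cost
        print(linkage.diagram())                    # ASCII parse diagram like the ones above
        break                                       # keep only the lowest-cost linkage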


Applications

AbiWord, a free word processor, uses Link Grammar for on-the-fly grammar checking. Words that cannot be linked anywhere are underlined in green.

The semantic relationship extractor RelEx, layered on top of the Link Grammar library, generates a dependency grammar output by making explicit the semantic relationships between words in a sentence. Its output can be classified as being at a level between that of SSyntR and DSyntR of Meaning-Text Theory. It also provides framing/grounding, anaphora resolution, head-word identification, lexical chunking, part-of-speech identification, and tagging, including entity, date, money and gender tagging. It includes a compatibility mode to generate dependency output compatible with the Stanford parser, and Penn Treebank-compatible POS tagging.

Link Grammar has also been employed for information extraction of biomedical texts and events described in news articles, as well as experimental machine translation systems from English to German, Turkish, Indonesian and Farsi.

The Link Grammar link dictionary is used to generate and verify the syntactic correctness of three different natural language generation systems: NLGen, NLGen2 and microplanner/surreal. It is also used as a part of the NLP pipeline in the OpenCog AI project.


Notes

* Link Grammar Bibliography
* An Introduction to the Link Grammar Parser
* AbiWord — Link Grammar Parser
* Microplanner and Surface Realization (SuReal)

Further reading

* Dennis Grinberg, John Lafferty and Daniel Sleator (September 1995). "A robust parsing algorithm for link grammars". ''Proceedings of the Fourth International Workshop on Parsing Technologies''. arXiv:cmp-lg/9508003. https://www.cs.cmu.edu/afs/cs.cmu.edu/project/link/pub/www/papers/ps/tr95-125.pdf


External links


* The original Link Grammar homepage (which has been replaced by the current project)
* BioLG, a modification of the Link Grammar Parser adapted for the biomedical domain (many, but not all, BioLG enhancements have been folded back into the main link-grammar distribution)
* Parsing sentences with Link Grammar and Python, by Jeff Elmore at PyCon 2012


Language extensions


* Source package
* Persian Link Grammar extension
* Russian Link Grammar demonstration
* Turkish Link Grammar extension, developed as a Master's degree thesis