Definite clause grammar
   HOME

TheInfoList



OR:

A definite clause grammar (DCG) is a way of expressing grammar, either for
natural Nature, in the broadest sense, is the physical world or universe. "Nature" can refer to the phenomena of the physical world, and also to life in general. The study of nature is a large, if not the only, part of science. Although humans are p ...
or formal languages, in a logic programming language such as
Prolog Prolog is a logic programming language associated with artificial intelligence and computational linguistics. Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is intended primarily ...
. It is closely related to the concept of
attribute grammar An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are resu ...
s /
affix grammar An affix grammar is a kind of formal grammar; it is used to describe the Syntax (programming languages), syntax of languages, mainly computer languages, using an approach based on how natural language is typically described.Koster, Cornelis HA.Affi ...
s from which Prolog was originally developed. DCGs are usually associated with Prolog, but similar languages such as
Mercury Mercury commonly refers to: * Mercury (planet), the nearest planet to the Sun * Mercury (element), a metallic chemical element with the symbol Hg * Mercury (mythology), a Roman god Mercury or The Mercury may also refer to: Companies * Merc ...
also include DCGs. They are called definite clause grammars because they represent a grammar as a set of definite clauses in
first-order logic First-order logic—also known as predicate logic, quantificational logic, and first-order predicate calculus—is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantifie ...
. The term DCG refers to the specific type of expression in Prolog and other similar languages; not all ways of expressing grammars using definite clauses are considered DCGs. However, all of the capabilities or properties of DCGs will be the same for any grammar that is represented with definite clauses in essentially the same way as in Prolog. The definite clauses of a DCG can be considered a set of axioms where the validity of a sentence, and the fact that it has a certain parse tree can be considered theorems that follow from these axioms. This has the advantage of making it so that recognition and parsing of expressions in a language becomes a general matter of proving statements, such as statements in a logic programming language.


History

The history of DCGs is closely tied to the history of Prolog, and the history of Prolog revolves around several researchers in both Marseille, France, and Edinburgh, Scotland. According to
Robert Kowalski Robert Anthony Kowalski (born 15 May 1941) is an American-British logician and computer scientist, whose research is concerned with developing both human-oriented models of computing and computational models of human thinking. He has spent mo ...
, an early developer of Prolog, the first Prolog system was developed in 1972 by
Alain Colmerauer Alain Colmerauer (24 January 1941 – 12 May 2017) was a French computer scientist. He was a professor at Aix-Marseille University, and the creator of the logic programming language Prolog. Early life Alain Colmerauer was born on 24 January 194 ...
and Phillipe Roussel. The first program written in the language was a large natural-language processing system. Fernando Pereira and David Warren at the University of Edinburgh were also involved in the early development of Prolog. Colmerauer had previously worked on a language processing system called Q-systems that was used to translate between English and French. In 1978, Colmerauer wrote a paper about a way of representing grammars called metamorphosis grammars which were part of the early version of Prolog called Marseille Prolog. In this paper, he gave a formal description of metamorphosis grammars and some examples of programs that use them. Fernando Pereira and David Warren, two other early architects of Prolog, coined the term "definite clause grammar" and created the notation for DCGs that is used in Prolog today. They gave credit for the idea to Colmerauer and Kowalski, and they note that DCGs are a special case of Colmerauer's metamorphosis grammars. They introduced the idea in an article called "Definite Clause Grammars for Language Analysis", where they describe DCGs as a "formalism ... in which grammars are expressed clauses of first-order predicate logic" that "constitute effective programs of the programming language Prolog". Pereira, Warren, and other pioneers of Prolog later wrote about several other aspects of DCGs. Pereira and Warren wrote an article called "Parsing as Deduction", describing things such as how the Earley Deduction proof procedure is used for parsing. Pereira also collaborated with Stuart M. Shieber on a book called "Prolog and Natural Language Analysis", that was intended as a general introduction to
computational linguistics Computational linguistics is an Interdisciplinarity, interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, comput ...
using logic programming.


Example

A basic example of DCGs helps to illustrate what they are and what they look like. sentence --> noun_phrase, verb_phrase. noun_phrase --> det, noun. verb_phrase --> verb, noun_phrase. det --> he det --> noun --> at noun --> at verb --> ats This generates sentences such as "the cat eats the bat", "a bat eats the cat". One can generate all of the valid expressions in the language generated by this grammar at a Prolog interpreter by typing sentence(X,[]). Similarly, one can test whether a sentence is valid in the language by typing something like sentence([the,bat,eats,the,bat],[]).


Translation into definite clauses

DCG notation is just
syntactic sugar In computer science, syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express. It makes the language "sweeter" for human use: things can be expressed more clearly, more concisely, or in an ...
for normal definite clauses in Prolog. For example, the previous example could be translated into the following: sentence(A,Z) :- noun_phrase(A,B), verb_phrase(B,Z). noun_phrase(A,Z) :- det(A,B), noun(B,Z). verb_phrase(A,Z) :- verb(A,B), noun_phrase(B,Z). det( X X). det( X X). noun( X X). noun( X X). verb( X X).


Difference lists

The arguments to each functor, such as (A,B) and (B,Z) are
difference list Difference, The Difference, Differences or Differently may refer to: Music * ''Difference'' (album), by Dreamtale, 2005 * ''Differently'' (album), by Cassie Davis, 2009 ** "Differently" (song), by Cassie Davis, 2009 * ''The Difference'' (al ...
s; difference lists are a way of representing a prefix of a list as the difference between its two suffixes (the bigger including the smaller one). Using Prolog's notation for lists, a singleton list prefix P = /code> can be seen as the difference between X/code> and X, and thus represented with the pair ( XX), for instance. Saying that P is the difference between A and B is the same as saying that append(P,B,A) holds. Or in the case of the previous example, append( X, X. Difference lists are used to represent lists with DCGs for reasons of efficiency. It is much more efficient to concatenate list differences (prefixes), in the circumstances that they can be used, because the concatenation of (A,B) and (B,Z) is just (A,Z). Indeed, append(P,B,A), append(Q,Z,B) entails append(P,Q,S), append(S,Z,A). This is the same as saying that list concatenation is ''associative'': A = P + B = P + (Q + Z) = (P + Q) + Z = S + Z = A


Non-context-free grammars

In pure Prolog, normal DCG rules with no extra arguments on the functors, such as the previous example, can only express
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
s; there is only one argument on the left side of the
production Production may refer to: Economics and business * Production (economics) * Production, the act of manufacturing goods * Production, in the outline of industrial organization, the act of making products (goods and services) * Production as a stati ...
. However,
context-sensitive grammar A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than co ...
s can also be expressed with DCGs, by providing extra arguments, such as in the following example: s --> a(N), b(N), c(N). a(0) --> []. a(M) --> [a], a(N), . b(0) --> []. b(M) --> [b], b(N), . c(0) --> []. c(M) --> [c], c(N), . This set of DCG rules describes the grammar which generates the language that consists of strings of the form a^b^c^. s --> symbols(Sem,a), symbols(Sem,b), symbols(Sem,c). symbols(end,_) --> []. symbols(s(Sem),S) --> [S], symbols(Sem,S). This set of DCG rules describes the grammar which generates the language that consists of strings of the form a^b^c^, by structurally representing


Representing features

Various linguistic
feature Feature may refer to: Computing * Feature (CAD), could be a hole, pocket, or notch * Feature (computer vision), could be an edge, corner or blob * Feature (software design) is an intentional distinguishing characteristic of a software item ...
s can also be represented fairly concisely with DCGs by providing extra arguments to the functors. For example, consider the following set of DCG rules: sentence --> pronoun(subject), verb_phrase. verb_phrase --> verb, pronoun(object). pronoun(subject) --> e pronoun(subject) --> he pronoun(object) --> im pronoun(object) --> er verb --> ikes This grammar allows sentences like "he likes her" and "he likes him", but ''not'' "her likes he" and "him likes him".


Parsing with DCGs

The main practical use of a DCG is to parse sentences of the given grammar, i.e. to construct a parse tree. This can be done by providing "extra arguments" to the functors in the DCG, like in the following rules: sentence(s(NP,VP)) --> noun_phrase(NP), verb_phrase(VP). noun_phrase(np(D,N)) --> det(D), noun(N). verb_phrase(vp(V,NP)) --> verb(V), noun_phrase(NP). det(d(the)) --> he det(d(a)) --> noun(n(bat)) --> at noun(n(cat)) --> at verb(v(eats)) --> ats One can now query the interpreter to yield a parse tree of any given sentence: , ?- sentence(Parse_tree, he,bat,eats,a,cat []). Parse_tree = s(np(d(the),n(bat)),vp(v(eats),np(d(a),n(cat)))) ? ;


Other uses

DCGs can serve as a convenient syntactic sugar to hide certain parameters in code in other places besides parsing applications. In the declaratively pure programming language
Mercury Mercury commonly refers to: * Mercury (planet), the nearest planet to the Sun * Mercury (element), a metallic chemical element with the symbol Hg * Mercury (mythology), a Roman god Mercury or The Mercury may also refer to: Companies * Merc ...
I/O must be represented by a pair of io.state arguments. DCG notation can be used to make using I/O more convenient, although state variable notation is usually preferred. DCG notation is also used for parsing and similar things in Mercury as it is in Prolog.


Extensions

Since DCGs were introduced by Pereira and Warren, several extensions have been proposed. Pereira himself proposed an extension called extraposition grammars (XGs). This formalism was intended in part to make it easier to express certain grammatical phenomena, such as left-extraposition. Pereira states, "The difference between XG rules and DCG rules is then that the left-hand side of an XG rule may contain several symbols." This makes it easier to express rules for context-sensitive grammars. Peter Van Roy extended DCGs to allow multiple accumulators. Another, more recent, extension was made by researchers at NEC Corporation called Multi-Modal Definite Clause Grammars (MM-DCGs) in 1995. Their extensions were intended to allow the recognizing and parsing expressions that include non-textual parts such as pictures. Another extension, called definite clause translation grammars (DCTGs) was described by Harvey Abramson in 1984. DCTG notation looks very similar to DCG notation; the major difference is that one uses ::= instead of --> in the rules. It was devised to handle grammatical attributes conveniently. The translation of DCTGs into normal Prolog clauses is like that of DCGs, but 3 arguments are added instead of 2.


See also

*
Chomsky hierarchy In formal language theory, computer science and linguistics, the Chomsky hierarchy (also referred to as the Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars. This hierarchy of grammars was described by ...
*
Context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
*
Natural language processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...
*
Phrase structure grammar The term phrase structure grammar was originally introduced by Noam Chomsky as the term for grammar studied previously by Emil Post and Axel Thue (Post canonical systems). Some authors, however, reserve the term for more restricted grammars in the ...


Notes


External links


NLP with Prolog

Context-free grammars and DCGs



Definite Clause Grammars for Language Analysis
{{Parsing algorithms Formal languages Logic programming Parsing