HOME

TheInfoList



OR:

Categorial grammar is a family of formalisms in
natural language In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages ...
syntax In linguistics, syntax () is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure ( constituenc ...
that share the central assumption that syntactic constituents combine as functions and
arguments An argument is a statement or group of statements called premises intended to determine the degree of truth or acceptability of another statement called conclusion. Arguments can be studied from three main perspectives: the logical, the dialectic ...
. Categorial grammar posits a close relationship between the syntax and semantic composition, since it typically treats syntactic categories as corresponding to semantic types. Categorial grammars were developed in the 1930s by Kazimierz Ajdukiewicz and in the 1950s by
Yehoshua Bar-Hillel Yehoshua Bar-Hillel ( he, יהושע בר-הלל; 8 September 1915, in Vienna – 25 September 1975, in Jerusalem) was an Israeli philosopher, mathematician, and linguist. He was a pioneer in the fields of machine translation and formal linguis ...
and
Joachim Lambek Joachim "Jim" Lambek (5 December 1922 – 23 June 2014) was a German-born Canadian mathematician. He was Peter Redpath Emeritus Professor of Pure Mathematics at McGill University, where he earned his PhD degree in 1950 with Hans Zassenhaus ...
. It saw a surge of interest in the 1970s following the work of
Richard Montague Richard Merritt Montague (September 20, 1930 – March 7, 1971) was an American mathematician and philosopher who made contributions to mathematical logic and the philosophy of language. He is known for proposing Montague grammar to formaliz ...
, whose
Montague grammar __notoc__ Montague grammar is an approach to natural language semantics, named after American logician Richard Montague. The Montague grammar is based on mathematical logic, especially higher-order predicate logic and lambda calculus, and makes ...
assumed a similar view of syntax. It continues to be a major paradigm, particularly within formal semantics.


Basics

A categorial grammar consists of two parts: a lexicon, which assigns a set of types (also called categories) to each basic symbol, and some
type inference Type inference refers to the automatic detection of the type of an expression in a formal language. These include programming languages and mathematical type systems, but also natural languages in some branches of computer science and linguistic ...
rules, which determine how the type of a string of symbols follows from the types of the constituent symbols. It has the advantage that the type inference rules can be fixed once and for all, so that the specification of a particular language grammar is entirely determined by the lexicon. A categorial grammar shares some features with the
simply typed lambda calculus The simply typed lambda calculus (\lambda^\to), a form of type theory, is a typed interpretation of the lambda calculus with only one type constructor (\to) that builds function types. It is the canonical and simplest example of a typed lambda c ...
. Whereas the
lambda calculus Lambda calculus (also written as ''λ''-calculus) is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. It is a universal model of computation th ...
has only one function type A \rightarrow B, a categorial grammar typically has two function types, one type that is applied on the left, and one on the right. For example, a simple categorial grammar might have two function types B/A\,\! and A\backslash B. The first, B/A\,\!, is the type of a phrase that results in a phrase of type B\,\! when followed (on the right) by a phrase of type A\,\!. The second, A\backslash B\,\!, is the type of a phrase that results in a phrase of type B\,\! when preceded (on the left) by a phrase of type A\,\!. The notation is based upon algebra. A fraction when multiplied by (i.e.
concatenated In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalisations of concatenat ...
with) its denominator yields its numerator. As concatenation is not
commutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Most familiar as the name of ...
, it makes a difference whether the denominator occurs to the left or right. The concatenation must be on the same side as the denominator for it to cancel out. The first and simplest kind of categorial grammar is called a basic categorial grammar, or sometimes an AB-grammar (after
Ajdukiewicz Kazimierz Ajdukiewicz (12 December 1890 – 12 April 1963) was a Polish philosopher and logician, a prominent figure in the Lwów–Warsaw school of logic. He originated many novel ideas in semantics. Among these was categorial grammar, a highl ...
and Bar-Hillel). Given a set of primitive types \text\,\!, let \text(\text)\,\! be the set of types constructed from primitive types. In the basic case, this is the least set such that \text\subseteq \text(\text) and if X, Y\in \text(\text) then (X/Y), (Y\backslash X) \in \text(\text). Think of these as purely formal expressions freely generated from the primitive types; any semantics will be added later. Some authors assume a fixed infinite set of primitive types used by all grammars, but by making the primitive types part of the grammar, the whole construction is kept finite. A basic categorial grammar is a tuple (\Sigma, \text, S, \triangleleft) where \Sigma\,\! is a finite set of symbols, \text\,\! is a finite set of primitive types, and S \in \text(\text). The relation \triangleleft is the lexicon, which relates types to symbols (\triangleleft) \subseteq \text(\text) \times \Sigma. Since the lexicon is finite, it can be specified by listing a set of pairs like TYPE\triangleleft\text. Such a grammar for English might have three basic types (N,NP, \text S)\,\!, assigning
count noun In linguistics, a count noun (also countable noun) is a noun that can be modified by a quantity and that occurs in both singular and plural forms, and that can co-occur with quantificational determiners like ''every'', ''each'', ''several'', e ...
s the type N\,\!, complete noun phrases the type NP\,\!, and sentences the type S\,\!. Then an
adjective In linguistics, an adjective ( abbreviated ) is a word that generally modifies a noun or noun phrase or describes its referent. Its semantic role is to change information given by the noun. Traditionally, adjectives were considered one of the ...
could have the type N/N\,\!, because if it is followed by a noun then the whole phrase is a noun. Similarly, a
determiner A determiner, also called determinative ( abbreviated ), is a word, phrase, or affix that occurs together with a noun or noun phrase and generally serves to express the reference of that noun or noun phrase in the context. That is, a determine ...
has the type NP/N\,\!, because it forms a complete noun phrase when followed by a noun. Intransitive
verb A verb () is a word ( part of speech) that in syntax generally conveys an action (''bring'', ''read'', ''walk'', ''run'', ''learn''), an occurrence (''happen'', ''become''), or a state of being (''be'', ''exist'', ''stand''). In the usual descr ...
s have the type NP\backslash S, and transitive verbs the type (NP\backslash S)/NP. Then a string of words is a sentence if it has overall type S\,\!. For example, take the string "the bad boy made that mess". Now "the" and "that" are determiners, "boy" and "mess" are nouns, "bad" is an adjective, and "made" is a transitive verb, so the lexicon is . and the sequence of types in the string is now find functions and appropriate arguments and reduce them according to the two
inference rule In the philosophy of logic, a rule of inference, inference rule or transformation rule is a logical form consisting of a function which takes premises, analyzes their syntax, and returns a conclusion (or conclusions). For example, the rule of ...
s X\leftarrow X/Y,\; Y and X\leftarrow Y,\; Y\backslash X: .\qquad NP/N,\; N/N,\; N,\; (NP\backslash S)/NP,\; \underbrace
.\qquad NP/N,\; N/N,\; N,\; \underbrace
.\qquad NP/N,\; \underbrace, \qquad (NP\backslash S)
.\qquad \underbrace,\; \qquad (NP\backslash S)
.\qquad \qquad\underbrace
.\qquad \qquad\qquad\quad\;\;\; S The fact that the result is S\,\! means that the string is a sentence, while the sequence of reductions shows that it can be parsed as ((the (bad boy)) (made (that mess))). Categorial grammars of this form (having only function application rules) are equivalent in generative capacity to
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ...
s and are thus often considered inadequate for theories of natural language syntax. Unlike CFGs, categorial grammars are lexicalized, meaning that only a small number of (mostly language-independent) rules are employed, and all other syntactic phenomena derive from the lexical entries of specific words. Another appealing aspect of categorial grammars is that it is often easy to assign them a compositional semantics, by first assigning
interpretation type Interpretation may refer to: Culture * Aesthetic interpretation, an explanation of the meaning of a work of art * Allegorical interpretation, an approach that assumes a text should not be interpreted literally * Dramatic Interpretation, an event ...
s to all the basic categories, and then associating all the derived categories with appropriate function types. The interpretation of any constituent is then simply the value of a function at an argument. With some modifications to handle intensionality and quantification, this approach can be used to cover a wide variety of semantic phenomena.


Lambek calculus

A Lambek grammar is an elaboration of this idea that has a concatenation operator for types, and several other inference rules. Mati Pentus has shown that these still have the generative capacity of context-free grammars. For the Lambek calculus, there is a type concatenation operator \star, so that \text\subseteq \text(\text) and if X, Y\in \text(\text) then (X/Y), (X\backslash Y), (X\star Y)\in \text(\text). The Lambek calculus consists of several deduction rules, which specify how type inclusion assertions can be derived. In the following rules, upper case roman letters stand for types, upper case Greek letters stand for sequences of types. A sequent of the form X \leftarrow \Gamma can be read: a string is of type if it consists of the concatenation of strings of each of the types in . If a type is interpreted as a set of strings, then the ← may be interpreted as ⊇, that is, "includes as a subset". A horizontal line means that the inclusion above the line implies the one below the line. The process is begun by the Axiom rule, which has no antecedents and just says that any type includes itself. : \text\quad The Cut rule says that inclusions can be composed. : \text \quad The other rules come in pairs, one pair for each type construction operator, each pair consisting of one rule for the operator in the target, one in the source, of the arrow. The name of a rule consists of the operator and an arrow, with the operator on the side of the arrow on which it occurs in the conclusion. : For an example, here is a derivation of "type raising", which says that (B/A)\backslash B \leftarrow A. The names of rules and the substitutions used are to the right. : \dfrac \qquad \begin \mbox\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\\ \\ \qquad\qquad\qquad\\ \end


Relation to context-free grammars

Recall that a
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ...
is a 4-tuple G = (V,\, \Sigma,\, ::=,\, S) where # V\, is a finite set of ''non-terminals'' or ''variables''. # \Sigma\, is a finite set of ''terminal symbols''. # ::=\, is a finite set of production rules, that is, a finite relation (::=)\subseteq V \times (V \cup \Sigma)^*. # S\, is the start variable. From the point of view of categorial grammars, a context-free grammar can be seen as a calculus with a set of special purpose axioms for each language, but with no type construction operators and no inference rules except Cut. Specifically, given a context-free grammar as above, define a categorial grammar (\text,\, \Sigma,\, \triangleleft,\, S) where \text=V\cup\Sigma, and \text(\text)=\text\,\!. Let there be an axiom for every symbol x \in V\cup\Sigma, an axiom for every production rule X ::= \Gamma\,\!, a lexicon entry for every terminal symbol s \in \Sigma, and Cut for the only rule. This categorial grammar generates the same language as the given CFG. Of course, this is not a basic categorial grammar, since it has special axioms that depend upon the language; i.e. it is not lexicalized. Also, it makes no use at all of non-primitive types. To show that any context-free language can be generated by a basic categorial grammar, recall that any context-free language can be generated by a context-free grammar in
Greibach normal form In formal language theory, a context-free grammar is in Greibach normal form (GNF) if the right-hand sides of all production rules start with a terminal symbol, optionally followed by some variables. A non-strict form allows one exception to this f ...
. The grammar is in Greibach normal form if every production rule is of the form A ::= s A_0 \ldots A_, where capital letters are variables, s \in \Sigma, and N\ge 0, that is, the right side of the production is a single terminal symbol followed by zero or more (non-terminal) variables. Now given a CFG in Greibach normal form, define a basic categorial grammar with a primitive type for each non-terminal variable \text=V\,\!, and with an entry in the lexicon A/A_/ \ldots /A_0 \triangleleft s , for each production rule A ::= s A_0 \ldots A_. It is fairly easy to see that this basic categorial grammar generates the same language as the original CFG. Note that the lexicon of this grammar will generally assign multiple types to each symbol. The same construction works for Lambek grammars, since they are an extension of basic categorial grammars. It is necessary to verify that the extra inference rules do not change the generated language. This can be done and shows that every context-free language is generated by some Lambek grammar. To show the converse, that every language generated by a Lambek grammar is context-free, is much more difficult. It was an open problem for nearly thirty years, from the early 1960s until about 1991 when it was proven by Pentus. The basic idea is, given a Lambek grammar, (\text,\, \Sigma,\, \triangleleft,\, S) construct a context-free grammar (V,\, \Sigma,\, ::=,\, S) with the same set of terminal symbols, the same start symbol, with variables some (not all) types V\subseteq \text(\text)\,\!, and with a production rule T::=\text\,\! for each entry T\triangleleft\text in the lexicon, and production rules T::=\Gamma\,\! for certain sequents T\leftarrow\Gamma that are derivable in the Lambek calculus. Of course, there are infinitely many types and infinitely many derivable sequents, so in order to make a finite grammar it is necessary put a bound on the size of the types and sequents that are needed. The heart of Pentus's proof is to show that there is such a finite bound.


Notation

The notation in this field is not standardized. The notations used in formal language theory, logic,
category theory Category theory is a general theory of mathematical structures and their relations that was introduced by Samuel Eilenberg and Saunders Mac Lane in the middle of the 20th century in their foundational work on algebraic topology. Nowadays, ca ...
, and linguistics, conflict with each other. In logic, arrows point to the more general from the more particular, that is, to the conclusion from the hypotheses. In this article, this convention is followed, i.e. the target of the arrow is the more general (inclusive) type. In logic, arrows usually point left to right. In this article this convention is reversed for consistency with the notation of context-free grammars, where the single non-terminal symbol is always on the left. We use the symbol ::= in a production rule as in
Backus–Naur form In computer science, Backus–Naur form () or Backus normal form (BNF) is a metasyntax notation for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document format ...
. Some authors use an arrow, which unfortunately may point in either direction, depending on whether the grammar is thought of as generating or recognizing the language. Some authors on categorial grammars write B\backslash A instead of A\backslash B. The convention used here follows Lambek and algebra.


Historical notes

The basic ideas of categorial grammar date from work by Kazimierz Ajdukiewicz (in 1935) and
Yehoshua Bar-Hillel Yehoshua Bar-Hillel ( he, יהושע בר-הלל; 8 September 1915, in Vienna – 25 September 1975, in Jerusalem) was an Israeli philosopher, mathematician, and linguist. He was a pioneer in the fields of machine translation and formal linguis ...
(in 1953). In 1958,
Joachim Lambek Joachim "Jim" Lambek (5 December 1922 – 23 June 2014) was a German-born Canadian mathematician. He was Peter Redpath Emeritus Professor of Pure Mathematics at McGill University, where he earned his PhD degree in 1950 with Hans Zassenhaus ...
introduced a syntactic calculus that formalized the function
type constructors In the area of mathematical logic and computer science known as type theory, a type constructor is a feature of a typed formal language that builds new types from old ones. Basic types are considered to be built using nullary type constructors. S ...
along with various rules for the combination of functions. This calculus is a forerunner of
linear logic Linear logic is a substructural logic proposed by Jean-Yves Girard as a refinement of classical and intuitionistic logic, joining the dualities of the former with many of the constructive properties of the latter. Although the logic has also ...
in that it is a substructural logic.
Montague grammar __notoc__ Montague grammar is an approach to natural language semantics, named after American logician Richard Montague. The Montague grammar is based on mathematical logic, especially higher-order predicate logic and lambda calculus, and makes ...
uses an ad hoc syntactic system for English that is based on the principles of categorial grammar. Although Montague's work is sometimes regarded as syntactically uninteresting, it helped to bolster interest in categorial grammar by associating it with a highly successful formal treatment of natural language
semantics Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and comput ...
. More recent work in categorial grammar has focused on the improvement of syntactic coverage. One formalism that has received considerable attention in recent years is Steedman and Szabolcsi's
combinatory categorial grammar Combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate–argument structur ...
, which builds on
combinatory logic Combinatory logic is a notation to eliminate the need for quantified variables in mathematical logic. It was introduced by Moses Schönfinkel and Haskell Curry, and has more recently been used in computer science as a theoretical model of com ...
invented by Moses Schönfinkel and
Haskell Curry Haskell Brooks Curry (; September 12, 1900 – September 1, 1982) was an American mathematician and logician. Curry is best known for his work in combinatory logic. While the initial concept of combinatory logic was based on a single paper by ...
. There are a number of related formalisms of this kind in linguistics, such as
type logical grammar Type may refer to: Science and technology Computing * Typing, producing text via a keyboard, typewriter, etc. * Data type, collection of values used for computations. * File type * TYPE (DOS command), a command to display contents of a file. * Typ ...
and
abstract categorial grammar Abstract may refer to: * ''Abstract'' (album), 1962 album by Joe Harriott * Abstract of title a summary of the documents affecting title to parcel of land * Abstract (law), a summary of a legal document * Abstract (summary), in academic publishi ...
.


Some definitions

;Derivation: A derivation is a binary tree that encodes a proof. ;Parse tree: A parse tree displays a derivation, showing the syntactic structure of a sentence. ;Functor and argument: In a right (left) function application, the node of the type A\B (B/A) is called the functor, and the node of the type A is called an argument. ;Functor–argument structure


Refinements of categorial grammar

A variety of changes to categorial grammar have been proposed to improve syntactic coverage. Some of the most common are listed below.


Features and subcategories

Most systems of categorial grammar subdivide categories. The most common way to do this is by tagging them with
features Feature may refer to: Computing * Feature (CAD), could be a hole, pocket, or notch * Feature (computer vision), could be an edge, corner or blob * Feature (software design) is an intentional distinguishing characteristic of a software ite ...
, such as
person A person ( : people) is a being that has certain capacities or attributes such as reason, morality, consciousness or self-consciousness, and being a part of a culturally established form of social relations such as kinship, ownership of prope ...
,
gender Gender is the range of characteristics pertaining to femininity and masculinity and differentiating between them. Depending on the context, this may include sex-based social structures (i.e. gender roles) and gender identity. Most culture ...
,
number A number is a mathematical object used to count, measure, and label. The original examples are the natural numbers 1, 2, 3, 4, and so forth. Numbers can be represented in language with number words. More universally, individual number ...
, and tense. Sometimes only atomic categories are tagged in this way. In Montague grammar, it is traditional to subdivide function categories using a multiple slash convention, so ''A/B'' and ''A//B'' would be two distinct categories of left-applying functions, that took the same arguments but could be distinguished between by other functions taking them as arguments.


Function composition

Rules of function composition are included in many categorial grammars. An example of such a rule would be one that allowed the concatenation of a constituent of type ''A/B'' with one of type ''B/C'' to produce a new constituent of type ''A/C''. The semantics of such a rule would simply involve the composition of the functions involved. Function composition is important in categorial accounts of
conjunction Conjunction may refer to: * Conjunction (grammar), a part of speech * Logical conjunction, a mathematical operator ** Conjunction introduction, a rule of inference of propositional logic * Conjunction (astronomy), in which two astronomical bodies ...
and extraction, especially as they relate to phenomena like
right node raising In linguistics, the term right node raising (RNR) denotes a sharing mechanism that sees the material to the immediate right of parallel structures being in some sense "shared" by those parallel structures, e.g. '' am likesbut red dislikesthe debat ...
. The introduction of function composition into a categorial grammar leads to many kinds of derivational ambiguity that are vacuous in the sense that they do not correspond to semantic ambiguities.


Conjunction

Many categorial grammars include a typical conjunction rule, of the general form ''X CONJ X → X'', where ''X'' is a category. Conjunction can generally be applied to nonstandard constituents resulting from type raising or function composition..


Discontinuity

The grammar is extended to handle linguistic phenomena such as discontinuous idioms, gapping and extraction.


See also

*
Combinatory categorial grammar Combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate–argument structur ...
* Link grammar *
Noncommutative logic Noncommutative logic is an extension of linear logic which combines the commutative connectives of linear logic with the noncommutative multiplicative connectives of the Lambek calculus. Its sequent calculus relies on the structure of order varieti ...
*
Pregroup Grammar Pregroup grammar (PG) is a grammar formalism intimately related to categorial grammars. Much like categorial grammar (CG), PG is a kind of type logical grammar. Unlike CG, however, PG does not have a distinguished function type. Rather, PG uses in ...
*
Scope Scope or scopes may refer to: People with the surname * Jamie Scope (born 1986), English footballer * John T. Scopes (1900–1970), central figure in the Scopes Trial regarding the teaching of evolution Arts, media, and entertainment * Cinema ...
*
Type shifter Type may refer to: Science and technology Computing * Typing, producing text via a keyboard, typewriter, etc. * Data type, collection of values used for computations. * File type * TYPE (DOS command), a command to display contents of a file. * ...


References

* * * * * * * * * * *


Further reading

* Michael Moortgat, ''Categorial Type Logics'', Chapter 2 in J. van Benthem and A. ter Meulen (eds.) ''Handbook of Logic and Language''. Elsevier, 1997, * Wojciech Buszkowski, ''Mathematical linguistics and proof theory'', Chapter 12 in J. van Benthem and A. ter Meulen (eds.) ''Handbook of Logic and Language''. Elsevier, 1997, * * *


External links


Grammar, categorial
at Springer Encyclopaedia of Mathematics * http://plato.stanford.edu/entries/typelogical-grammar/ {{Formal semantics Grammar frameworks Formal languages Computational linguistics Type theory Semantics