Generalized context-free grammar (GCFG) is a grammar formalism that expands on
context-free grammars
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
by adding potentially non-context-free composition functions to rewrite rules.
Head grammar
Head grammar (HG) is a grammar formalism introduced in Carl Pollard (1984) Pollard, C. 1984. ''Generalized Phrase Structure Grammars, Head Grammars, and Natural Language''. Ph.D. thesis, Stanford University, CA. as an extension of the context-f ...
(and its weak equivalents) is an instance of such a GCFG which is known to be especially adept at handling a wide variety of non-CF properties of natural language.
Description
A GCFG consists of two components: a set of composition functions that combine string tuples, and a set of rewrite rules. The composition functions all have the form
, where
is either a single string tuple, or some use of a (potentially different) composition function which reduces to a string tuple. Rewrite rules look like
, where
,
, ... are string tuples or non-terminal symbols.
The rewrite semantics of GCFGs is fairly straightforward. An occurrence of a non-terminal symbol is rewritten using rewrite rules as in a context-free grammar, eventually yielding just compositions (composition functions applied to string tuples or other compositions). The composition functions are then applied, successively reducing the tuples to a single tuple.
Example
A simple translation of a context-free grammar into a GCFG can be performed in the following fashion. Given the grammar in (), which generates the palindrome language
, where
is the string reverse of
, we can define the composition function ''conc'' as in () and the rewrite rules as in ().
The CF production of ' is
:
:
:
:
:
and the corresponding GCFG production is
:
:
:
:
:
:
:
:
Linear Context-free Rewriting Systems (LCFRSs)
Weir (1988)
describes two properties of composition functions, linearity and regularity. A function defined as
is linear if and only if each variable appears at most once on either side of the ''='', making
linear but not
. A function defined as
is regular if the left hand side and right hand side have exactly the same variables, making
regular but not
or
.
A grammar in which all composition functions are both linear and regular is called a Linear Context-free Rewriting System (LCFRS). LCFRS is a
proper subclass
In set theory and its applications throughout mathematics, a class is a collection of sets (or sometimes other mathematical objects) that can be unambiguously defined by a property that all its members share. Classes act as a way to have set-lik ...
of the GCFGs, i.e. it has strictly less computational power than the GCFGs as a whole.
On the other hand, LCFRSs are strictly more expressive than
linear-indexed grammar Indexed grammars are a generalization of context-free grammars in that nonterminals are equipped with lists of ''flags'', or ''index symbols''.
The language produced by an indexed grammar is called an indexed language.
Definition
Modern definitio ...
s and their
weakly equivalent variant
tree adjoining grammar
Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free gr ...
s (TAGs).
Head grammar
Head grammar (HG) is a grammar formalism introduced in Carl Pollard (1984) Pollard, C. 1984. ''Generalized Phrase Structure Grammars, Head Grammars, and Natural Language''. Ph.D. thesis, Stanford University, CA. as an extension of the context-f ...
is another example of an LCFRS that is strictly less powerful than the class of LCFRSs as a whole.
LCFRS are weakly equivalent to (set-local) ''multicomponent'' TAGs (
MCTAGs) and also with
multiple context-free grammar
Multiple may refer to:
Economics
*Multiple finance, a method used to analyze stock prices
*Multiples of the price-to-earnings ratio
*Chain stores, are also referred to as 'Multiples'
* Box office multiple, the ratio of a film's total gross to th ...
(MCFG
.
and
minimalist grammars (MGs). The languages generated by LCFRS (and their weakly equivalents) can be parsed in
polynomial time.
See also
*
Range concatenation grammar Range concatenation grammar (RCG) is a grammar formalism developed by Pierre Boullier in 1998 as an attempt to characterize a number of phenomena of natural language, such as Chinese numbers and German word order scrambling, which are outside the b ...
References
{{DEFAULTSORT:Generalized Context-Free Grammar
Formal languages
Grammar frameworks