Range concatenation grammar (RCG) is a grammar formalism developed by Pierre Boullier
in 1998 as an attempt to characterize a number of phenomena of natural language, such as Chinese numbers and German word order scrambling, which are outside the bounds of the
mildly context-sensitive language In computational linguistics, the term mildly context-sensitive grammar formalisms refers to several grammar formalisms that have been developed in an effort to provide adequate descriptions of the syntactic structure of natural language.
Every m ...
s.
From a theoretical point of view, any language that can be parsed in
polynomial time
In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
belongs to the subset of RCG called positive range concatenation grammars, and reciprocally.
Though intended as a variant on Groenink's
literal movement grammar In linguistics and theoretical computer science, literal movement grammars (LMGs) are a grammar formalism intended to characterize certain extraposition phenomena of natural language such as topicalization and cross-serial dependency. LMGs extend ...
s (LMGs), RCGs treat the grammatical process more as a proof than as a production. Whereas LMGs produce a terminal string from a start predicate, RCGs aim to reduce a start predicate (which predicates of a terminal string) to the empty string, which constitutes a proof of the terminal strings membership in the language.
Description
Formal definition
A Positive Range Concatenation Grammar (PRCG) is a tuple
, where:
*
,
and
are disjoint finite sets of (respectively) ''predicate names'', ''terminal symbols'' and ''variable names''. Each predicate name has an associated arity given by the function
.
*
is the start predicate name and verify
.
*
is a finite set of ''clauses'' of the form
, where the
are ''predicates'' of the form
with
and
.
A Negative Range Concatenation Grammar (NRCG) is defined like a PRCG, but with the addition that some predicates occurring in the right-hand side of a clause can have the form
. Such predicates are called ''negative predicates''.
A Range Concatenation Grammar is a positive or a negative one. Although PRCGs are technically NRCGs, the terms are used to highlight the absence (PRCG) or presence (NRCG) of negative predicates.
A range in a word
is a couple
, with
, where
is the length of
. Two ranges
and
can be concatenated iff
, and we then have:
.
For a word
, with
, the dotted notation for ranges is:
.
Recognition of strings
Like LMGs, RCG clauses have the general schema
, where in an RCG,
is either the empty string or a string of predicates. The arguments
consist of strings of terminal symbols and/or variable symbols, which pattern match against actual argument values like in LMG. Adjacent variables constitute a family of matches against partitions, so that the argument
, with two variables, matches the literal string
in three different ways:
.
Predicate terms come in two forms, positive (which produce the empty string on success), and negative (which produce the empty string on failure/if the positive term does ''not'' produce the empty string). Negative terms are denoted the same as positive terms, with an overbar, as in
.
The rewrite semantics for RCGs is rather simple, identical to the corresponding semantics of LMGs. Given a predicate string
, where the symbols
are terminal strings, if there is a rule
in the grammar that the predicate string matches, the predicate string is replaced by
, substituting for the matched variables in each
.
For example, given the rule
, where
and
are variable symbols and
and
are terminal symbols, the predicate string
can be rewritten as
, because
matches
when
. Similarly, if there were a rule
,
could be rewritten as
.
A proof/recognition of a string
is done by showing that
produces the empty string. For the individual rewrite steps, when multiple alternative variable matches are possible, any rewrite which could lead the whole proof to succeed is considered. Thus, if there is at least one way to produce the empty string from the initial string
, the proof is considered a success, regardless of how many other ways to fail exist.
Example
RCGs are capable of recognizing the non-linear index language
as follows:
Letting x, y, and z be variable symbols:
The proof for ' is then
Or, using the more correct dotted notation for ranges:
References
{{Formal languages and grammars
Formal languages
Grammar frameworks