A context-sensitive grammar (CSG) is a
formal grammar
A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
in which the left-hand sides and right-hand sides of any
production rules may be surrounded by a context of
terminal and
nonterminal symbols. Context-sensitive grammars are more general than
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the fo ...
s, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general (in the same sense) than
unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the
Chomsky hierarchy
The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a formal language's alphabet that are v ...
.
A
formal language
In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called "alphabet".
The alphabet of a formal language consists of symbols that concatenate into strings (also c ...
that can be described by a context-sensitive grammar, or, equivalently, by a
noncontracting grammar or a
linear bounded automaton, is called a
context-sensitive language. Some textbooks actually define CSGs as non-contracting,
although this is not how
Noam Chomsky
Avram Noam Chomsky (born December 7, 1928) is an American professor and public intellectual known for his work in linguistics, political activism, and social criticism. Sometimes called "the father of modern linguistics", Chomsky is also a ...
defined them in 1959.
This choice of definition makes no difference in terms of the languages generated (i.e. the two definitions are
weakly equivalent), but it does make a difference in terms of what grammars are structurally considered context-sensitive; the latter issue was analyzed by Chomsky in 1963.
Chomsky introduced context-sensitive grammars as a way to describe the syntax of
natural language
A natural language or ordinary language is a language that occurs naturally in a human community by a process of use, repetition, and change. It can take different forms, typically either a spoken language or a sign language. Natural languages ...
where it is often the case that a word may or may not be appropriate in a certain place depending on the context.
Walter Savitch has criticized the terminology "context-sensitive" as misleading and proposed "non-erasing" as better explaining the distinction between a CSG and an
unrestricted grammar.
Although it is well known that certain features of languages (e.g.
cross-serial dependency) are not context-free, it is an
open question how much of CSGs' expressive power is needed to capture the context sensitivity found in natural languages. Subsequent research in this area has focused on the more computationally tractable
mildly context-sensitive languages. The syntaxes of some
visual programming languages can be described by context-sensitive
graph grammars.
Formal definition
Formal grammar
Let us notate a
formal grammar
A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
as
, with
a set of nonterminal symbols,
a set of terminal symbols,
a set of production rules, and
the start symbol.
A string
''directly yields'', or ''directly derives to'', a string
, denoted as
, if ''v'' can be obtained from ''u'' by an application of some production rule in ''P'', that is, if
and
, where
is a production rule, and
is the unaffected left and right part of the string, respectively.
More generally, ''u'' is said to ''yield'', or ''derive to'', ''v'', denoted as
, if ''v'' can be obtained from ''u'' by repeated application of production rules, that is, if
for some ''n'' ≥ 0 and some strings
. In other words, the relation
is the
reflexive transitive closure of the relation
.
The language of the grammar ''G'' is the set of all terminal-symbol strings derivable from its start symbol, formally:
.
Derivations that do not end in a string composed of terminal symbols only are possible, but do not contribute to ''L''(''G'').
Context-sensitive grammar
A formal grammar is context-sensitive if each rule in ''P'' is either of the form
where
is the
empty string
In formal language theory, the empty string, or empty word, is the unique String (computer science), string of length zero.
Formal theory
Formally, a string is a finite, ordered sequence of character (symbol), characters such as letters, digits ...
, or of the form
: α''A''β → αγβ
with ''A'' ∈ ''N'',
[i.e., ''A'' a single nonterminal] ,
[i.e., α and β strings of nonterminals (except for the start symbol) and terminals] and
.
[i.e., γ is a nonempty string of nonterminals (except for the start symbol) and terminals]
The name ''context-sensitive'' is explained by the α and β that form the context of ''A'' and determine whether ''A'' can be replaced with γ or not.
By contrast, in a
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the fo ...
, no context is present: the left hand side of every production rule is just a nonterminal.
The string γ is not allowed to be empty. Without this restriction, the resulting grammars become equal in power to
unrestricted grammars.
(Weakly) equivalent definitions
A
noncontracting grammar is a grammar in which for any production rule, of the form ''u'' → ''v'', the length of ''u'' is less than or equal to the length of ''v''.
Every context-sensitive grammar is noncontracting, while every noncontracting grammar can be converted into an equivalent context-sensitive grammar; the two classes are
weakly equivalent.
Some authors use the term ''context-sensitive grammar'' to refer to noncontracting grammars in general.
The left-context- and right-context-sensitive grammars are defined by restricting the rules to just the form α''A'' → αγ and to just ''A''β → γβ, respectively. The languages generated by these grammars are also the full class of context-sensitive languages.
[ also at https://www.encyclopediaofmath.org/index.php/Grammar,_context-sensitive] The equivalence was established by
Penttonen normal form.
[ citing ]
Examples
''anbncn''
The following context-sensitive grammar, with start symbol ''S'', generates the canonical non-
context-free language :
Rules 1 and 2 allow for blowing-up ''S'' to ''a''
''n''''BC''(''BC'')
''n''−1; rules 3 to 6 allow for successively exchanging each ''CB'' to ''BC'' (
four rules are needed for that since a rule ''CB'' → ''BC'' wouldn't fit into the scheme α''A''β → αγβ); rules 7–10 allow replacing a non-terminal ''B'' or ''C'' with its corresponding terminal ''b'' or ''c'', respectively, provided it is in the right place.
A generation chain for ' is:
: ''S''
: →
2
: →
2
: →
1
: →
3
: →
4
: →
5
: →
6
: →
3
: →
4
: →
5
: →
6
: →
3
: →
4
: →
5
: →
6
: →
7
: →
8
: →
8
: →
9
: →
10
: →
10
:
''anbncndn'', etc.
More complicated grammars can be used to parse , and other languages with even more letters. Here we show a simpler approach using non-contracting grammars:
Start with a kernel of regular productions generating the sentential forms
and then include the non contracting productions
,
,
,
,
,
,
,
,
,
.
''ambncmdn''
A non contracting grammar (for which there is an equivalent CSG) for the language
is defined by
:
,
:
,
:
,
:
,
:
,
:
,
:
, and
:
.
With these definitions, a derivation for
is:
.
''a''2''i''
A noncontracting grammar for the language is constructed in Example 9.5 (p. 224) of (Hopcroft, Ullman, 1979):
#