HOME

TheInfoList



OR:

A context-sensitive grammar (CSG) is a
formal grammar A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
s, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general (in the same sense) than unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the
Chomsky hierarchy The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a formal language's alphabet that are v ...
. A
formal language In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called "alphabet". The alphabet of a formal language consists of symbols that concatenate into strings (also c ...
that can be described by a context-sensitive grammar, or, equivalently, by a noncontracting grammar or a linear bounded automaton, is called a context-sensitive language. Some textbooks actually define CSGs as non-contracting, although this is not how
Noam Chomsky Avram Noam Chomsky (born December 7, 1928) is an American professor and public intellectual known for his work in linguistics, political activism, and social criticism. Sometimes called "the father of modern linguistics", Chomsky is also a ...
defined them in 1959. This choice of definition makes no difference in terms of the languages generated (i.e. the two definitions are weakly equivalent), but it does make a difference in terms of what grammars are structurally considered context-sensitive; the latter issue was analyzed by Chomsky in 1963. Chomsky introduced context-sensitive grammars as a way to describe the syntax of
natural language A natural language or ordinary language is a language that occurs naturally in a human community by a process of use, repetition, and change. It can take different forms, typically either a spoken language or a sign language. Natural languages ...
where it is often the case that a word may or may not be appropriate in a certain place depending on the context. Walter Savitch has criticized the terminology "context-sensitive" as misleading and proposed "non-erasing" as better explaining the distinction between a CSG and an unrestricted grammar. Although it is well known that certain features of languages (e.g. cross-serial dependency) are not context-free, it is an open question how much of CSGs' expressive power is needed to capture the context sensitivity found in natural languages. Subsequent research in this area has focused on the more computationally tractable mildly context-sensitive languages. The syntaxes of some visual programming languages can be described by context-sensitive graph grammars.


Formal definition


Formal grammar

Let us notate a
formal grammar A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
as G = (N, \Sigma, P, S), with N a set of nonterminal symbols, \Sigma a set of terminal symbols, P a set of production rules, and S \in N the start symbol. A string u \in (N \cup \Sigma)^* ''directly yields'', or ''directly derives to'', a string v \in (N \cup \Sigma)^*, denoted as u \Rightarrow v, if ''v'' can be obtained from ''u'' by an application of some production rule in ''P'', that is, if u = \gamma L \delta and v = \gamma R \delta, where (L \to R) \in P is a production rule, and \gamma, \delta \in (N \cup \Sigma)^* is the unaffected left and right part of the string, respectively. More generally, ''u'' is said to ''yield'', or ''derive to'', ''v'', denoted as u \Rightarrow^* v, if ''v'' can be obtained from ''u'' by repeated application of production rules, that is, if u = u_0 \Rightarrow ... \Rightarrow u_n = v for some ''n'' ≥ 0 and some strings u_1, ..., u_ \in (N \cup \Sigma)^*. In other words, the relation \Rightarrow^* is the reflexive transitive closure of the relation \Rightarrow. The language of the grammar ''G'' is the set of all terminal-symbol strings derivable from its start symbol, formally: L(G) = \. Derivations that do not end in a string composed of terminal symbols only are possible, but do not contribute to ''L''(''G'').


Context-sensitive grammar

A formal grammar is context-sensitive if each rule in ''P'' is either of the form S \to \varepsilon where \varepsilon is the
empty string In formal language theory, the empty string, or empty word, is the unique String (computer science), string of length zero. Formal theory Formally, a string is a finite, ordered sequence of character (symbol), characters such as letters, digits ...
, or of the form : α''A''β → αγβ with ''A'' ∈ ''N'',i.e., ''A'' a single nonterminal \alpha, \beta\in (N \cup \Sigma \setminus\)^*,i.e., α and β strings of nonterminals (except for the start symbol) and terminals and \gamma\in (N \cup \Sigma \setminus\)^+.i.e., γ is a nonempty string of nonterminals (except for the start symbol) and terminals The name ''context-sensitive'' is explained by the α and β that form the context of ''A'' and determine whether ''A'' can be replaced with γ or not. By contrast, in a
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
, no context is present: the left hand side of every production rule is just a nonterminal. The string γ is not allowed to be empty. Without this restriction, the resulting grammars become equal in power to unrestricted grammars.


(Weakly) equivalent definitions

A noncontracting grammar is a grammar in which for any production rule, of the form ''u'' → ''v'', the length of ''u'' is less than or equal to the length of ''v''. Every context-sensitive grammar is noncontracting, while every noncontracting grammar can be converted into an equivalent context-sensitive grammar; the two classes are weakly equivalent. Some authors use the term ''context-sensitive grammar'' to refer to noncontracting grammars in general. The left-context- and right-context-sensitive grammars are defined by restricting the rules to just the form α''A'' → αγ and to just ''A''β → γβ, respectively. The languages generated by these grammars are also the full class of context-sensitive languages. also at https://www.encyclopediaofmath.org/index.php/Grammar,_context-sensitive The equivalence was established by Penttonen normal form. citing


Examples


''anbncn''

The following context-sensitive grammar, with start symbol ''S'', generates the canonical non- context-free language : Rules 1 and 2 allow for blowing-up ''S'' to ''a''''n''''BC''(''BC'')''n''−1; rules 3 to 6 allow for successively exchanging each ''CB'' to ''BC'' ( four rules are needed for that since a rule ''CB'' → ''BC'' wouldn't fit into the scheme α''A''β → αγβ); rules 7–10 allow replacing a non-terminal ''B'' or ''C'' with its corresponding terminal ''b'' or ''c'', respectively, provided it is in the right place. A generation chain for ' is: : ''S'' : →2 : →2 : →1 : →3 : →4 : →5 : →6 : →3 : →4 : →5 : →6 : →3 : →4 : →5 : →6 : →7 : →8 : →8 : →9 : →10 : →10 :


''anbncndn'', etc.

More complicated grammars can be used to parse , and other languages with even more letters. Here we show a simpler approach using non-contracting grammars: Start with a kernel of regular productions generating the sentential forms (ABCD)^abcd and then include the non contracting productions p_ : Da\rightarrow aD, p_ : Db\rightarrow bD, p_ : Dc\rightarrow cD, p_ : Dd\rightarrow dd, p_ : Ca\rightarrow aC, p_ : Cb\rightarrow bC, p_ : Cc\rightarrow cc, p_ : Ba\rightarrow aB, p_ : Bb\rightarrow bb, p_ : Aa\rightarrow aa.


''ambncmdn''

A non contracting grammar (for which there is an equivalent CSG) for the language L_ = \ is defined by :p_0 : S \rightarrow RT, :p_1 : R\rightarrow aRC , aC, :p_3 : T\rightarrow BTd , Bd, :p_5 : CB\rightarrow BC, :p_6 : aB\rightarrow ab, :p_7 : bB\rightarrow bb, :p_8 : Cd\rightarrow cd, and :p_9 : Cc\rightarrow cc. With these definitions, a derivation for a^3b^2c^3d^2 is: S \Rightarrow_ RT \Rightarrow_ a^3C^3T \Rightarrow_ a^3C^3B^2d^2 \Rightarrow_ a^3B^2C^3d^2 \Rightarrow_ a^3b^2C^3d^2 \Rightarrow_ a^3b^2c^3d^2 .


''a''2''i''

A noncontracting grammar for the language is constructed in Example 9.5 (p. 224) of (Hopcroft, Ullman, 1979): # S\rightarrow CaB/math> # \begin \ a\rightarrow aa a\\ \ aaB]\rightarrow aa aB\\ \ Ca\rightarrow a a\\ \ CaaB]\rightarrow a aB\\ \ CaBrightarrow aaCB] \\ \ aBrightarrow a CB\end # CBrightarrow DB/math> # CBrightarrow E/math> # \begin \ a arightarrow a \\ \ DBrightarrow aB\\ \ aDa]\rightarrow Da \\ \ a aBrightarrow aaB] \\ \ aDaB]\rightarrow DaaB] \end # Darightarrow Ca/math> # \begin \ a arightarrow a \\ \ Erightarrow a\\ \ aEa]\rightarrow Ea \end # Earightarrow a


Kuroda normal form

Every context-sensitive grammar which does not generate the empty string can be transformed into a weakly equivalent one in Kuroda normal form. "Weakly equivalent" here means that the two grammars generate the same language. The normal form will not in general be context-sensitive, but will be a noncontracting grammar. The Kuroda normal form is an actual normal form for non-contracting grammars.


Properties and uses


Equivalence to linear bounded automaton

A formal language can be described by a context-sensitive grammar if and only if it is accepted by some linear bounded automaton (LBA). In some textbooks this result is attributed solely to Landweber and Kuroda. Others call it the Myhill–Landweber–Kuroda theorem. (Myhill introduced the concept of deterministic LBA in 1960. Peter S. Landweber published in 1963 that the language accepted by a deterministic LBA is context sensitive. Kuroda introduced the notion of non-deterministic LBA and the equivalence between LBA and CSGs in 1964.) it is still an open question whether every context-sensitive language can be accepted by a ''deterministic'' LBA.


Closure properties

Context-sensitive languages are closed under complement. This 1988 result is known as the Immerman–Szelepcsényi theorem. Moreover, they are closed under union,
intersection In mathematics, the intersection of two or more objects is another object consisting of everything that is contained in all of the objects simultaneously. For example, in Euclidean geometry, when two lines in a plane are not parallel, their ...
,
concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalizations of concatenati ...
, substitution,more formally: if ''L'' ⊆ Σ* is a context-sensitive language and ''f'' maps each ''a''∈Σ to a context-sensitive language ''f''(''a''), the ''f''(''L'') is again a context-sensitive language inverse homomorphism, and Kleene plus. Every recursively enumerable language ''L'' can be written as ''h''(''L'') for some context-sensitive language ''L'' and some string homomorphism ''h''.


Computational problems

The
decision problem In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question on a set of input values. An example of a decision problem is deciding whether a given natura ...
that asks whether a certain string ''s'' belongs to the language of a given context-sensitive grammar ''G'', is
PSPACE-complete In computational complexity theory, a decision problem is PSPACE-complete if it can be solved using an amount of memory that is polynomial in the input length (PSPACE, polynomial space) and if every other problem that can be solved in polynomial sp ...
. Moreover, there are context-sensitive grammars whose languages are PSPACE-complete. In other words, there is a context-sensitive grammar ''G'' such that deciding whether a certain string ''s'' belongs to the language of ''G'' is PSPACE-complete (so ''G'' is fixed and only ''s'' is part of the input of the problem). The emptiness problem for context-sensitive grammars (given a context-sensitive grammar ''G'', is ''L''(''G'')=∅ ?) is undecidable.This also follows from (1) context-free languages being also context-sensitive, (2) context-sensitive language being closed under intersection, but (3) disjointness of context-free languages being undecidable.


As model of natural languages

Savitch has proven the following theoretical result, on which he bases his criticism of CSGs as basis for natural language: for any
recursively enumerable In computability theory, a set ''S'' of natural numbers is called computably enumerable (c.e.), recursively enumerable (r.e.), semidecidable, partially decidable, listable, provable or Turing-recognizable if: *There is an algorithm such that the ...
set ''R'', there exists a context-sensitive language/grammar ''G'' which can be used as a sort of proxy to test membership in ''R'' in the following way: given a string ''s'', ''s'' is in ''R'' if and only if there exists a positive integer ''n'' for which ''scn'' is in G, where ''c'' is an arbitrary symbol not part of ''R''. It has been shown that nearly all
natural language A natural language or ordinary language is a language that occurs naturally in a human community by a process of use, repetition, and change. It can take different forms, typically either a spoken language or a sign language. Natural languages ...
s may in general be characterized by context-sensitive grammars, but the whole class of CSGs seems to be much bigger than natural languages. Worse yet, since the aforementioned decision problem for CSGs is PSPACE-complete, that makes them totally unworkable for practical use, as a polynomial-time algorithm for a PSPACE-complete problem would imply P=NP. It was proven that some natural languages are not context-free, based on identifying so-called cross-serial dependencies and unbounded scrambling phenomena. However this does not necessarily imply that the class of CSGs is necessary to capture "context sensitivity" in the colloquial sense of these terms in natural languages. For example, linear context-free rewriting systems (LCFRSs) are strictly weaker than CSGs but can account for the phenomenon of cross-serial dependencies; one can write a LCFRS grammar for for example. Ongoing research on
computational linguistics Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics ...
has focused on formulating other classes of languages that are " mildly context-sensitive" whose decision problems are feasible, such as tree-adjoining grammars, combinatory categorial grammars, coupled context-free languages, and linear context-free rewriting systems. The languages generated by these formalisms properly lie between the context-free and context-sensitive languages. More recently, the class PTIME has been identified with range concatenation grammars, which are now considered to be the most expressive of the mild-context sensitive language classes.


See also

*
Chomsky hierarchy The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a formal language's alphabet that are v ...
* Growing context-sensitive grammar * Definite clause grammar#Non-context-free grammars * List of parser generators for context-sensitive grammars


Notes


References


Further reading

*


External links


Earley Parsing for Context-Sensitive Grammars
{{DEFAULTSORT:Context-Sensitive Grammar Formal languages Grammar frameworks