In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form

:A → α

with A a ''single'' nonterminal symbol, and α a string of terminals and/or nonterminals (α can be empty). A formal grammar is "context-free" if its production rules can be applied regardless of the context of a nonterminal. No matter which symbols surround it, the single nonterminal on the left hand side can always be replaced by the right hand side. This is what distinguishes it from a context-sensitive grammar.
A formal grammar is essentially a set of production rules that describe all possible strings in a given formal language. Production rules are simple replacements. For example, a rule such as

:S → aSb

replaces the nonterminal S with the string aSb. There can be multiple replacement rules for a given nonterminal symbol. The language generated by a grammar is the set of all strings of terminal symbols that can be derived, by repeated rule applications, from some particular nonterminal symbol ("start symbol").
Nonterminal symbols are used during the derivation process, but do not appear in its final result string.
Languages generated by context-free grammars are known as
context-free languages (CFL). Different context-free grammars can generate the same context-free language. It is important to distinguish the properties of the language (intrinsic properties) from the properties of a particular grammar (extrinsic properties). The
language equality question (do two given context-free grammars generate the same language?) is
undecidable.
Context-free grammars arise in linguistics, where they are used to describe the structure of sentences and words in a natural language, and they were invented by the linguist Noam Chomsky for this purpose. By contrast, in computer science, as the use of recursively defined concepts increased, context-free grammars were used more and more. In an early application, grammars were used to describe the structure of programming languages. In a newer application, they are used in an essential part of the Extensible Markup Language (XML) called the ''Document Type Definition''.
In linguistics, some authors use the term phrase structure grammar to refer to context-free grammars, whereby phrase-structure grammars are distinct from dependency grammars. In computer science, a popular notation for context-free grammars is Backus–Naur form, or ''BNF''.
Background
Since the time of Pāṇini, at least, linguists have described the grammars of languages in terms of their block structure, and described how sentences are recursively built up from smaller phrases, and eventually individual words or word elements. An essential property of these block structures is that logical units never overlap. For example, the sentence:

: ''John, whose blue car was in the garage, walked to the grocery store.''

can be logically parenthesized (with the logical metasymbols [ ]) as follows:

:[''John''[, [''whose'' [''blue car''] [''was'' [''in'' [''the garage'']]]],] [''walked'' [''to'' [''the'' [''grocery store'']]]]].
A context-free grammar provides a simple and mathematically precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks, capturing the "block structure" of sentences in a natural way. Its simplicity makes the formalism amenable to rigorous mathematical study. Important features of natural language syntax such as
agreement
and
reference are not part of the context-free grammar, but the basic recursive structure of sentences, the way in which clauses nest inside other clauses, and the way in which lists of adjectives and adverbs are swallowed by nouns and verbs, is described exactly.
Context-free grammars are a special form of
Semi-Thue systems that in their general form date back to the work of
Axel Thue.
The formalism of context-free grammars was developed in the mid-1950s by
Noam Chomsky,
[, p. 106.] and also their
classification as a special type of
formal grammar (which he called
phrase-structure grammars).
Some authors, however, reserve the term for more restricted grammars in the Chomsky hierarchy: context-sensitive grammars or context-free grammars. In a broader sense,
phrase structure grammars are also known as constituency grammars. The defining trait of phrase structure grammars is thus their adherence to the constituency relation, as opposed to the dependency relation of
dependency grammars. In Chomsky's
generative grammar framework, the syntax of natural language was described by context-free rules combined with transformation rules.
Block structure was introduced into computer
programming languages by the
Algol project (1957–1960), which, as a consequence, also featured a context-free grammar to describe the resulting Algol syntax. This became a standard feature of computer languages, and the notation for grammars used in concrete descriptions of computer languages came to be known as
Backus–Naur form, after two members of the Algol language design committee.
The "block structure" aspect that context-free grammars capture is so fundamental to grammar that the terms syntax and grammar are often identified with context-free grammar rules, especially in computer science. Formal constraints not captured by the grammar are then considered to be part of the "semantics" of the language.
Context-free grammars are simple enough to allow the construction of efficient
parsing algorithms that, for a given string, determine whether and how it can be generated from the grammar. An
Earley parser is an example of such an algorithm, while the widely used
LR and
LL parsers are simpler algorithms that deal only with more restrictive subsets of context-free grammars.
Formal definitions
A context-free grammar is defined by the 4-tuple G = (V, Σ, R, S),
where

# V is a finite set; each element v ∈ V is called ''a nonterminal character'' or a ''variable''. Each variable represents a different type of phrase or clause in the sentence. Variables are also sometimes called syntactic categories. Each variable defines a sub-language of the language defined by G.
# Σ is a finite set of ''terminal''s, disjoint from V, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar G.
# R is a finite relation in V × (V ∪ Σ)*, where the asterisk represents the Kleene star operation. The members of R are called the ''(rewrite) rule''s or ''production''s of the grammar. (R is also commonly symbolized by a ''P''.)
# S is the start variable (or start symbol), used to represent the whole sentence (or program). It must be an element of V.
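The definition can be mirrored directly in code. The following minimal sketch (illustrative, not part of the original text; names are arbitrary) represents the 4-tuple for the grammar with rules S → aSb and S → ab, which reappears in the examples below:

<syntaxhighlight lang="python">
# Sketch: the 4-tuple G = (V, Sigma, R, S) as plain Python data.
V = {"S"}                      # nonterminal symbols (variables)
Sigma = {"a", "b"}             # terminal symbols, disjoint from V
R = [                          # productions: pairs in V x (V ∪ Sigma)*
    ("S", ["a", "S", "b"]),    # S → aSb
    ("S", ["a", "b"]),         # S → ab
]
S = "S"                        # start symbol, an element of V

# Sanity checks matching the definition above.
assert S in V and V.isdisjoint(Sigma)
assert all(h in V and all(x in V | Sigma for x in b) for h, b in R)
</syntaxhighlight>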
Production rule notation
A production rule in R is formalized mathematically as a pair (α, β) ∈ R, where α ∈ V is a nonterminal and β ∈ (V ∪ Σ)* is a string of variables and/or terminals; rather than using ordered pair notation, production rules are usually written using an arrow operator with α as its left hand side and β as its right hand side:

:α → β.

It is allowed for β to be the empty string, and in this case it is customary to denote it by ε. The form

:α → ε

is called an ''ε-production''.
It is common to list all right-hand sides for the same left-hand side on the same line, using | (the vertical bar) to separate them. Rules α → β₁ and α → β₂ can hence be written as α → β₁ | β₂. In this case, β₁ and β₂ are called the first and second alternative, respectively.
Rule application
For any strings u, v ∈ (V ∪ Σ)*, we say u ''directly yields'' v, written as u ⇒ v, if there is a rule (α, β) ∈ R with α ∈ V and strings u₁, u₂ ∈ (V ∪ Σ)* such that u = u₁αu₂ and v = u₁βu₂. Thus, v is a result of applying the rule (α, β) to u.
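As a small illustrative sketch (not part of the original text), one step of this rewriting relation can be implemented directly: pick an occurrence of the rule's left-hand side in u and splice in the right-hand side:

<syntaxhighlight lang="python">
# Sketch: "directly yields" as one splice. Sentential forms are symbol lists.
def apply_rule(u, rule, position):
    """Rewrite u = u1 + [head] + u2 into u1 + body + u2 at the given index."""
    head, body = rule
    if u[position] != head:
        raise ValueError("rule head does not match the symbol at position")
    return u[:position] + body + u[position + 1:]

# S ⇒ aSb ⇒ aaSbb with the grammar S → aSb | ab:
u = apply_rule(["S"], ("S", ["a", "S", "b"]), 0)   # ['a', 'S', 'b']
u = apply_rule(u, ("S", ["a", "S", "b"]), 1)       # ['a', 'a', 'S', 'b', 'b']
print("".join(u))                                  # aaSbb
</syntaxhighlight>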
Repetitive rule application
For any strings u, v ∈ (V ∪ Σ)*, we say u ''yields'' v or v is ''derived'' from u if there is a positive integer k and strings u₁, …, u_k ∈ (V ∪ Σ)* such that u = u₁ ⇒ u₂ ⇒ ⋯ ⇒ u_k = v. This relation is denoted u ⇒* v, or u ⇒⇒ v in some textbooks. If k ≥ 2, the relation u ⇒⁺ v holds. In other words, (⇒*) and (⇒⁺) are the reflexive transitive closure (allowing a string to yield itself) and the transitive closure (requiring at least one step) of (⇒), respectively.
Context-free language
The language of a grammar G = (V, Σ, R, S) is the set

:L(G) = { w ∈ Σ* : S ⇒* w }

of all terminal-symbol strings derivable from the start symbol.

A language L is said to be a context-free language (CFL) if there exists a CFG G such that L = L(G).
Non-deterministic pushdown automata recognize exactly the context-free languages.
Examples
Words concatenated with their reverse
The grammar G = ({S}, {a, b}, P, S), with productions

:S → aSa,
:S → bSb,
:S → ε,

is context-free. It is not proper since it includes an ε-production. A typical derivation in this grammar is

:S → aSa → aaSaa → aabSbaa → aabbaa.

This makes it clear that L(G) = { ww^R : w ∈ {a, b}* }.
The language is context-free; however, it can be proved that it is not regular.
If the productions

:S → a,
:S → b,

are added, a context-free grammar for the set of all palindromes over the alphabet {a, b} is obtained.
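A small experiment can make such a grammar concrete. The sketch below (illustrative, not part of the original text) enumerates, by breadth-first rule application, every terminal string the palindrome grammar derives up to a given length:

<syntaxhighlight lang="python">
# Sketch: enumerate L(G) up to a length bound for S → aSa | bSb | ε | a | b.
from collections import deque

RULES = {"S": [["a", "S", "a"], ["b", "S", "b"], [], ["a"], ["b"]]}
TERMINALS = {"a", "b"}

def generate(start, max_len):
    out, seen, queue = set(), set(), deque([(start,)])
    while queue:
        form = queue.popleft()
        try:
            i = next(i for i, x in enumerate(form) if x in RULES)
        except StopIteration:          # no nonterminals left: a terminal string
            out.add("".join(form))
            continue
        for body in RULES[form[i]]:    # expand the leftmost nonterminal
            new = form[:i] + tuple(body) + form[i + 1:]
            # prune forms already holding more terminals than the bound allows
            if sum(x in TERMINALS for x in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(out, key=lambda s: (len(s), s))

print(generate("S", 3))
# ['', 'a', 'b', 'aa', 'bb', 'aaa', 'aba', 'bab', 'bbb'], the palindromes
</syntaxhighlight>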
Well-formed parentheses
The canonical example of a context-free grammar is parenthesis matching, which is representative of the general case. There are two terminal symbols "(" and ")" and one nonterminal symbol S. The production rules are
:S → SS,
:S → (S),
:S → ()
The first rule allows the S symbol to multiply; the second rule allows the S symbol to become enclosed by matching parentheses; and the third rule terminates the recursion.
Well-formed nested parentheses and square brackets
A second canonical example is two different kinds of matching nested parentheses, described by the productions:
:S → SS
:S → ()
:S → (S)
:S → []
:S → [S]

with terminal symbols ( ) [ ] and nonterminal S.

The following sequence can be derived in that grammar:

:([ [ [ ()() [ ][ ] ] ]([ ]) ])
Matching pairs
In a context-free grammar, we can pair up characters the way we do with brackets. The simplest example:

:S → aSb
:S → ab

This grammar generates the language { a^n b^n : n ≥ 1 }, which is not regular (according to the pumping lemma for regular languages).
The special character ε stands for the empty string. By changing the above grammar to

:S → aSb
:S → ε

we obtain a grammar generating the language { a^n b^n : n ≥ 0 } instead. This differs only in that it contains the empty string while the original grammar did not.
Distinct number of a's and b's
A context-free grammar for the language consisting of all strings over {a, b} containing an unequal number of a's and b's:

:S → T | U
:T → VaT | VaV | TaV
:U → VbU | VbV | UbV
:V → aVbV | bVaV | ε
Here, the nonterminal T can generate all strings with more a's than b's, the nonterminal U generates all strings with more b's than a's and the nonterminal V generates all strings with an equal number of a's and b's. Omitting the third alternative in the rules for T and U doesn't restrict the grammar's language.
Second block of b's of double size
Another example of a non-regular language is { b^n a^m b^{2n} : n ≥ 0, m ≥ 0 }. It is context-free as it can be generated by the following context-free grammar:

:S → bSbb | A
:A → aA | ε
First-order logic formulas
The
formation rules for the terms and formulas of formal logic fit the definition of context-free grammar, except that the set of symbols may be infinite and there may be more than one start symbol.
Examples of languages that are not context free
In contrast to well-formed nested parentheses and square brackets in the previous section, there is no context-free grammar for generating all sequences of two different types of parentheses, each separately balanced ''disregarding the other'', where the two types need not nest inside one another, for example:

:[ ( ] )

or

:( [ ) ]

The fact that this language is not context free can be proven using pumping lemma for context-free languages and a proof by contradiction, observing that all words of the form (^n [^n )^n ]^n should belong to the language. This language belongs instead to a more general class and can be described by a conjunctive grammar, which in turn also includes other non-context-free languages, such as the language of all words of the form a^n b^n c^n.
Regular grammars
Every
regular grammar is context-free, but not all context-free grammars are regular. The following context-free grammar, for example, is also regular.
:S → a
:S → aS
:S → bS

The terminals here are a and b, while the only nonterminal is S.
The language described is all nonempty strings of a's and b's that end in a.
This grammar is
regular: no rule has more than one nonterminal in its right-hand side, and each of these nonterminals is at the same end of the right-hand side.
Every regular grammar corresponds directly to a
nondeterministic finite automaton, so we know that this is a
regular language.
Using vertical bars, the grammar above can be described more tersely as follows:
:S → a | aS | bS
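Because the grammar is regular, membership can equivalently be tested with a finite automaton or a regular expression; the following sketch (illustrative, not part of the original text) uses the equivalent pattern [ab]*a:

<syntaxhighlight lang="python">
# Sketch: S → a | aS | bS describes the strings matched by [ab]*a,
# i.e. the nonempty strings over {a, b} that end in a.
import re

def in_language(w):
    return re.fullmatch(r"[ab]*a", w) is not None

assert in_language("a") and in_language("ba") and in_language("abba")
assert not in_language("") and not in_language("ab")
</syntaxhighlight>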
Derivations and syntax trees
A ''derivation'' of a string for a grammar is a sequence of grammar rule applications that transform the start symbol into the string.
A derivation proves that the string belongs to the grammar's language.
A derivation is fully determined by giving, for each step:
* the rule applied in that step
* the occurrence of its left-hand side to which it is applied
For clarity, the intermediate string is usually given as well.
For instance, with the grammar:
#S → S + S
#S → 1
#S → a
the string
:1 + 1 + a
can be derived from the start symbol with the following derivation:
:S
:→ S + S (by rule 1. on S)
:→ S + S + S (by rule 1. on the second S)
:→ 1 + S + S (by rule 2. on the first S)
:→ 1 + 1 + S (by rule 2. on the second S)
:→ 1 + 1 + a (by rule 3. on the third S)
Often, a strategy is followed that deterministically chooses the next nonterminal to rewrite:
* in a ''leftmost derivation'', it is always the leftmost nonterminal;
* in a ''rightmost derivation'', it is always the rightmost nonterminal.
Given such a strategy, a derivation is completely determined by the sequence of rules applied. For instance, one leftmost derivation of the same string is
:S
:→ S + S (by rule 1 on the leftmost S)
:→ 1 + S (by rule 2 on the leftmost S)
:→ 1 + S + S (by rule 1 on the leftmost S)
:→ 1 + 1 + S (by rule 2 on the leftmost S)
:→ 1 + 1 + a (by rule 3 on the leftmost S),
which can be summarized as
:rule 1
:rule 2
:rule 1
:rule 2
:rule 3.
One rightmost derivation is:
:S
:→ S + S (by rule 1 on the rightmost S)
:→ S + S + S (by rule 1 on the rightmost S)
:→ S + S + a (by rule 3 on the rightmost S)
:→ S + 1 + a (by rule 2 on the rightmost S)
:→ 1 + 1 + a (by rule 2 on the rightmost S),
which can be summarized as
:rule 1
:rule 1
:rule 3
:rule 2
:rule 2.
The distinction between leftmost derivation and rightmost derivation is important because in most
parsers the transformation of the input is defined by giving a piece of code for every grammar rule that is executed whenever the rule is applied. Therefore, it is important to know whether the parser determines a leftmost or a rightmost derivation because this determines the order in which the pieces of code will be executed. See for example LL parsers and LR parsers.
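Since the strategy fixes which nonterminal is rewritten, a leftmost derivation can be replayed from the rule numbers alone, as in this sketch (illustrative, not part of the original text):

<syntaxhighlight lang="python">
# Sketch: replaying rules 1, 2, 1, 2, 3 of the grammar S → S+S | 1 | a
# reproduces the leftmost derivation of "1 + 1 + a" shown above.
RULES = [["S", "+", "S"], ["1"], ["a"]]   # rule 1, rule 2, rule 3

def leftmost_derivation(rule_numbers, start="S"):
    form = [start]
    steps = ["".join(form)]
    for n in rule_numbers:
        i = form.index("S")               # the leftmost nonterminal
        form = form[:i] + RULES[n - 1] + form[i + 1:]
        steps.append("".join(form))
    return steps

print(" ⇒ ".join(leftmost_derivation([1, 2, 1, 2, 3])))
# S ⇒ S+S ⇒ 1+S ⇒ 1+S+S ⇒ 1+1+S ⇒ 1+1+a
</syntaxhighlight>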
A derivation also imposes in some sense a hierarchical structure on the string that is derived. For example, if the string "1 + 1 + a" is derived according to the leftmost derivation outlined above, the structure of the string would be:
:{ {1}_S + { {1}_S + {a}_S }_S }_S

where { … }_S indicates a substring recognized as belonging to S. This hierarchy can also be seen as a tree:

  S
 /|\
S + S
|   |
1   S
   /|\
  S + S
  |   |
  1   a

This tree is called a ''
parse tree
A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in co ...
'' or "concrete syntax tree" of the string, by contrast with the
abstract syntax tree. In this case the presented leftmost and the rightmost derivations define the same parse tree; however, there is another rightmost derivation of the same string
:S
:→ S + S (by rule 1 on the rightmost S)
:→ S + a (by rule 3 on the rightmost S)
:→ S + S + a (by rule 1 on the rightmost S)
:→ S + 1 + a (by rule 2 on the rightmost S)
:→ 1 + 1 + a (by rule 2 on the rightmost S),
which defines a string with a different structure
:{ { {1}_S + {1}_S }_S + {a}_S }_S

and a different parse tree:

      S
     /|\
    S + S
    |   |
    S   a
   /|\
  S + S
  |   |
  1   1

Note however that both parse trees can be obtained by both leftmost and rightmost derivations. For example, the last tree can be obtained with the leftmost derivation as follows:
:S
:→ S + S (by rule 1 on the leftmost S)
:→ S + S + S (by rule 1 on the leftmost S)
:→ 1 + S + S (by rule 2 on the leftmost S)
:→ 1 + 1 + S (by rule 2 on the leftmost S)
:→ 1 + 1 + a (by rule 3 on the leftmost S).
If a string in the language of the grammar has more than one parsing tree, then the grammar is said to be an ''
ambiguous grammar''. Such grammars are usually hard to parse because the parser cannot always decide which grammar rule it has to apply. Usually, ambiguity is a feature of the grammar, not the language, and an unambiguous grammar can be found that generates the same context-free language. However, there are certain languages that can only be generated by ambiguous grammars; such languages are called ''
inherently ambiguous languages''.
Example: Algebraic expressions
Here is a context-free grammar for syntactically correct infix algebraic expressions in the variables x, y and z:
#S → x
#S → y
#S → z
#S → S + S
#S → S − S
#S → S * S
#S → S / S
#S → (S)
This grammar can, for example, generate the string
:(x + y) * x − z * y / (x + x)

as follows:

:S
:→ S − S (by rule 5)
:→ S * S − S (by rule 6, applied to the leftmost S)
:→ S * S − S / S (by rule 7, applied to the rightmost S)
:→ (S) * S − S / S (by rule 8, applied to the leftmost S)
:→ (S) * S − S / (S) (by rule 8, applied to the rightmost S)
:→ (S + S) * S − S / (S) (by rule 4, applied to the leftmost S)
:→ (S + S) * S − S * S / (S) (by rule 6, applied to the fourth S)
:→ (S + S) * S − S * S / (S + S) (by rule 4, applied to the rightmost S)
:→ (x + S) * S − S * S / (S + S) (etc.)
:→ (x + y) * S − S * S / (S + S)
:→ (x + y) * x − S * S / (S + S)
:→ (x + y) * x − z * S / (S + S)
:→ (x + y) * x − z * y / (S + S)
:→ (x + y) * x − z * y / (x + S)
:→ (x + y) * x − z * y / (x + x)
Note that many choices were made underway as to which rewrite was going to be performed next.
These choices look quite arbitrary. As a matter of fact, they are, in the sense that the string finally generated is always the same. For example, the second and third rewrites
:→ S * S − S (by rule 6, applied to the leftmost S)
:→ S * S − S / S (by rule 7, applied to the rightmost S)

could be done in the opposite order:

:→ S − S / S (by rule 7, applied to the rightmost S)
:→ S * S − S / S (by rule 6, applied to the leftmost S)
Also, many choices were made on which rule to apply to each selected S.
Changing the choices made and not only the order they were made in usually affects which terminal string comes out at the end.
Let's look at this in more detail. Consider the
parse tree
of this derivation:
Starting at the top, step by step, an S in the tree is expanded, until no more unexpanded S's (nonterminals) remain.
Picking a different order of expansion will produce a different derivation, but the same parse tree.
The parse tree will only change if we pick a different rule to apply at some position in the tree.
But can a different parse tree still produce the same terminal string,
which is (x + y) * x − z * y / (x + x) in this case?
Yes, for this particular grammar, this is possible.
Grammars with this property are called
''ambiguous''.
For example, x + y * z can be produced with these two different parse trees:

      S              S
     /|\            /|\
    S * S          S + S
    |   |          |   |
    S   z          x   S
   /|\                /|\
  S + S              S * S
  |   |              |   |
  x   y              y   z

However, the ''language'' described by this grammar is not inherently ambiguous:
an alternative, unambiguous grammar can be given for the language, for example:
:T → x
:T → y
:T → z
:S → S + T
:S → S − T
:S → S * T
:S → S / T
:T → (S)
:S → T,
once again picking S as the start symbol. This alternative grammar will produce x + y * z with a parse tree similar to the left one above, i.e. implicitly assuming the association (x + y) * z, which does not follow standard order of operations. More elaborate, unambiguous and context-free grammars can be constructed that produce parse trees that obey all desired
operator precedence and associativity rules.
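For instance, the usual layering of expression grammars into sums, products and atoms yields an unambiguous grammar respecting precedence; the sketch below (not from the original text; grammar and names are illustrative) parses it by recursive descent, replacing left recursion with iteration:

<syntaxhighlight lang="python">
# Sketch of a precedence-respecting grammar, parsed by recursive descent:
#   Expr   → Expr + Term | Expr - Term | Term
#   Term   → Term * Factor | Term / Factor | Factor
#   Factor → ( Expr ) | x | y | z
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        assert peek() == tok, f"expected {tok!r}, got {peek()!r}"
        pos += 1

    def expr():                            # sums: lowest precedence
        node = term()
        while peek() in ("+", "-"):
            op = peek(); eat(op)
            node = (op, node, term())      # iteration gives left associativity
        return node

    def term():                            # products bind tighter than sums
        node = factor()
        while peek() in ("*", "/"):
            op = peek(); eat(op)
            node = (op, node, factor())
        return node

    def factor():                          # atoms and parenthesized groups
        if peek() == "(":
            eat("("); node = expr(); eat(")")
            return node
        tok = peek()
        assert tok in ("x", "y", "z"), f"unexpected token {tok!r}"
        eat(tok)
        return tok

    result = expr()
    assert pos == len(tokens), "trailing input"
    return result

print(parse(list("x+y*z")))   # ('+', 'x', ('*', 'y', 'z')): * binds tighter
</syntaxhighlight>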
Normal forms
Every context-free grammar with no ε-production has an equivalent grammar in
Chomsky normal form, and a grammar in
Greibach normal form. "Equivalent" here means that the two grammars generate the same language.
The especially simple form of production rules in Chomsky normal form grammars has both theoretical and practical implications. For instance, given a context-free grammar, one can use the Chomsky normal form to construct a polynomial-time algorithm that decides whether a given string is in the language represented by that grammar or not (the CYK algorithm).
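The following sketch (illustrative, not part of the original text) shows the CYK idea on a Chomsky-normal-form grammar for { a^n b^n : n ≥ 1 }, filling a table of substring spans in O(n³) time:

<syntaxhighlight lang="python">
# Sketch: CYK membership test for the CNF grammar
#   S → AB | AX,  X → SB,  A → a,  B → b.
from itertools import product

UNIT = {"a": {"A"}, "b": {"B"}}                  # terminal rules
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
START = "S"

def cyk(w):
    n = len(w)
    if n == 0:
        return False
    # table[i][l-1] = nonterminals deriving the substring w[i:i+l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(w):
        table[i][0] = set(UNIT.get(c, set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for pair in product(left, right):
                    table[i][span - 1] |= BINARY.get(pair, set())
    return START in table[0][n - 1]

assert cyk("ab") and cyk("aabb") and not cyk("aab") and not cyk("ba")
</syntaxhighlight>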
Closure properties
Context-free languages are
closed under the various operations, that is, if the languages ''K'' and ''L'' are
context-free, so is the result of the following operations:

* union ''K'' ∪ ''L''; concatenation ''K'' ∘ ''L''; Kleene star ''L''*
* substitution (in particular homomorphism)
* inverse homomorphism
* intersection with a regular language
They are not closed under general intersection (hence neither under
complementation) and set difference.
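For union, the closure is witnessed by a direct grammar construction; a minimal sketch of the standard textbook construction (names are illustrative):

<syntaxhighlight lang="python">
# Sketch: given CFGs for K and L with disjoint nonterminal sets, a fresh
# start symbol S0 with the rule S0 → S_K | S_L yields a CFG for K ∪ L.
def union_grammar(rules_k, start_k, rules_l, start_l, fresh="S0"):
    assert fresh not in rules_k and fresh not in rules_l
    merged = {fresh: [[start_k], [start_l]]}     # S0 → S_K | S_L
    merged.update(rules_k)                       # nonterminals assumed disjoint
    merged.update(rules_l)
    return merged, fresh

# K = { a^n b^n : n ≥ 1 }, L = nonempty strings over {a, b} ending in a:
K = {"S_K": [["a", "S_K", "b"], ["a", "b"]]}
L = {"S_L": [["a"], ["a", "S_L"], ["b", "S_L"]]}
rules, start = union_grammar(K, "S_K", L, "S_L")
print(start, rules[start])                       # S0 [['S_K'], ['S_L']]
</syntaxhighlight>

Concatenation and Kleene star admit the same style of construction, via S0 → S_K S_L and S0 → S_L S0 | ε respectively.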
Decidable problems
The following are some decidable problems about context-free grammars.
Parsing
The parsing problem, checking whether a given word belongs to the language given by a context-free grammar, is decidable, using one of the general-purpose parsing algorithms:
*
CYK algorithm (for grammars in
Chomsky normal form)
*
Earley parser
*
GLR parser
*
LL parser (only for the proper subclass of LL(''k'') grammars)
Context-free parsing for
Chomsky normal form grammars was shown by
Leslie G. Valiant to be reducible to boolean
matrix multiplication, thus inheriting its complexity upper bound of
''O''(''n''^2.3728639).
[In Valiant's papers, ''O''(''n''2.81) is given, the then best known upper bound. See Matrix multiplication#Computational complexity for bound improvements since then.] Conversely,
Lillian Lee has shown ''O''(''n''^(3−ε)) boolean matrix multiplication to be reducible to ''O''(''n''^(3−3ε)) CFG parsing, thus establishing some kind of lower bound for the latter.
Reachability, productiveness, nullability
A nonterminal symbol X is called ''productive'', or ''generating'', if there is a derivation X ⇒* w for some string w of terminal symbols.
X is called ''reachable'' if there is a derivation S ⇒* αXβ for some strings α, β of nonterminal and terminal symbols from the start symbol.
X is called ''useless'' if it is unreachable or unproductive.
X is called ''nullable'' if there is a derivation X ⇒* ε. A rule X → ε is called an ''ε-production''. A derivation X ⇒⁺ X is called a ''cycle''.
Algorithms are known to eliminate from a given grammar, without changing its generated language,
* unproductive symbols,
* unreachable symbols,
* ε-productions, with one possible exception,
[If the grammar can generate ε, a rule S → ε cannot be avoided.] and
* cycles.
[This is a consequence of the unit-production elimination theorem in Hopcroft & Ullman (1979), p.91, Theorem 4.4]
In particular, an alternative containing a useless nonterminal symbol can be deleted from the right-hand side of a rule.
Such rules and alternatives are called ''useless''.
In the example grammar

:S → Bb | Cc | Ee
:B → Bb | b
:C → C
:D → Bd
:E → Ee

the nonterminal ''D'' is unreachable, and ''E'' is unproductive, while ''C'' → ''C'' causes a cycle.
Hence, omitting the last three rules doesn't change the language generated by the grammar, nor does omitting the alternatives "| ''Cc'' | ''Ee''" from the right-hand side of the rule for ''S''.
A context-free grammar is said to be ''proper'' if it has neither useless symbols nor ε-productions nor cycles.
Combining the above algorithms, every context-free grammar not generating ε can be transformed into a
weakly equivalent proper one.
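The fixpoint computations behind these algorithms are short; the sketch below (illustrative, not part of the original text) finds the productive, reachable and nullable nonterminals of a grammar given as a rule dictionary:

<syntaxhighlight lang="python">
# Sketch: X is productive if some rule rewrites it to terminals and
# productive symbols; reachable if the start symbol reaches it; nullable
# if it derives ε. Each set is grown to its smallest fixpoint.
def analyze(rules, nonterminals, start):
    def fixpoint(ok_body):
        found, changed = set(), True
        while changed:
            changed = False
            for head, bodies in rules.items():
                if head not in found and any(ok_body(b, found) for b in bodies):
                    found.add(head); changed = True
        return found

    productive = fixpoint(
        lambda body, f: all(x not in nonterminals or x in f for x in body))
    nullable = fixpoint(lambda body, f: all(x in f for x in body))

    reachable, stack = {start}, [start]
    while stack:
        for body in rules.get(stack.pop(), []):
            for x in body:
                if x in nonterminals and x not in reachable:
                    reachable.add(x); stack.append(x)
    return productive, reachable, nullable

# A trimmed version of the example grammar above: D is unreachable,
# E is unproductive.
rules = {"S": [["B", "b"]], "B": [["b"], ["B", "b"]],
         "D": [["B", "d"]], "E": [["E", "e"]]}
print(analyze(rules, {"S", "B", "D", "E"}, "S"))
# productive = {S, B, D}, reachable = {S, B}, nullable = empty set
</syntaxhighlight>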
Regularity and LL(''k'') checks
It is decidable whether a given ''grammar'' is a
regular grammar, as well as whether it is an
LL(''k'') grammar for a given ''k''≥0.
If ''k'' is not given, the latter problem is undecidable.
Given a context-free ''language'', it is neither decidable whether it is regular, nor whether it is an LL(''k'') language for a given ''k''.
Emptiness and finiteness
There are algorithms to decide whether the language of a given context-free grammar is empty, as well as whether it is finite.
Undecidable problems
Some questions that are undecidable for wider classes of grammars become decidable for context-free grammars; e.g. the
emptiness problem (whether the grammar generates any terminal strings at all), is undecidable for
context-sensitive grammars, but decidable for context-free grammars.
However, many problems are
undecidable even for context-free grammars. Examples are:
Universality
Given a CFG, does it generate the language of all strings over the alphabet of terminal symbols used in its rules?
[, Theorem 5.10, p. 181.]
A reduction can be demonstrated to this problem from the well-known undecidable problem of determining whether a
Turing machine accepts a particular input (the
halting problem). The reduction uses the concept of a ''
computation history'', a string describing an entire computation of a
Turing machine. A CFG can be constructed that generates all strings that are not accepting computation histories for a particular Turing machine on a particular input, and thus it will generate all strings only if the machine doesn't accept that input.
Language equality
Given two CFGs, do they generate the same language?
[, p. 281.]
The undecidability of this problem is a direct consequence of the previous: it is impossible to even decide whether a CFG is equivalent to the trivial CFG defining the language of all strings.
Language inclusion
Given two CFGs, can the first one generate all strings that the second one can generate?
If this problem were decidable, then language equality could be decided too: two CFGs G1 and G2 generate the same language if L(G1) is a subset of L(G2) and L(G2) is a subset of L(G1).
Being in a lower or higher level of the Chomsky hierarchy
Using
Greibach's theorem, it can be shown that the two following problems are undecidable:
* Given a
context-sensitive grammar, does it describe a context-free language?
* Given a context-free grammar, does it describe a
regular language?
[.]
Grammar ambiguity
Given a CFG, is it
ambiguous?
The undecidability of this problem follows from the fact that if an algorithm to determine ambiguity existed, the
Post correspondence problem could be decided, which is known to be undecidable.
Language disjointness
Given two CFGs, is there any string derivable from both grammars?
If this problem were decidable, the undecidable Post correspondence problem could be decided, too: given strings α_1, …, α_N, β_1, …, β_N over some alphabet {a_1, …, a_k}, let the grammar G_1 consist of the rule

:S → α_1 S β_1^rev | ⋯ | α_N S β_N^rev | α_1 b β_1^rev | ⋯ | α_N b β_N^rev;

where β_i^rev denotes the reversed string β_i, and b doesn't occur among the a_i; and let grammar G_2 consist of the rule

:T → a_1 T a_1 | ⋯ | a_k T a_k | a_1 b a_1 | ⋯ | a_k b a_k;

Then the Post problem given by α_1, …, α_N, β_1, …, β_N has a solution if and only if L(G_1) and L(G_2) share a derivable string.
Extensions
An obvious way to extend the context-free grammar formalism is to allow nonterminals to have arguments, the values of which are passed along within the rules. This allows natural language features such as agreement and reference, and programming language analogs such as the correct use and definition of identifiers, to be expressed in a natural way. E.g. we can now easily express that in English sentences, the subject and verb must agree in number. In computer science, examples of this approach include affix grammars, attribute grammars, indexed grammars, and Van Wijngaarden two-level grammars. Similar extensions exist in linguistics.
An extended context-free grammar (or regular right part grammar) is one in which the right-hand side of the production rules is allowed to be a
regular expression over the grammar's terminals and nonterminals. Extended context-free grammars describe exactly the context-free languages.
Another extension is to allow additional terminal symbols to appear at the left-hand side of rules, constraining their application. This produces the formalism of
context-sensitive grammars.
Subclasses
There are a number of important subclasses of the context-free grammars:
*
LR(''k'') grammars (also known as
deterministic context-free grammars) allow
parsing (string recognition) with
deterministic pushdown automata (PDA), but they can only describe
deterministic context-free languages.
*
Simple LR,
Look-Ahead LR grammars are subclasses that allow further simplification of parsing. SLR and LALR are recognized using the same PDA as LR, but with simpler tables, in most cases.
*
LL(''k'') and LL(''*'') grammars allow parsing by direct construction of a leftmost derivation as described above, and describe even fewer languages.
*
Simple grammars are a subclass of the LL(1) grammars, mostly interesting for the theoretical property that language equality of simple grammars is decidable, while language inclusion is not.
*
Bracketed grammars have the property that the terminal symbols are divided into left and right bracket pairs that always match up in rules.
*
Linear grammars have no rules with more than one nonterminal on the right-hand side.
*
Regular grammars are a subclass of the linear grammars and describe the
regular languages, i.e. they correspond to
finite automata and
regular expressions.
LR parsing extends LL parsing to support a larger range of grammars; in turn,
generalized LR parsing extends LR parsing to support arbitrary context-free grammars. On LL grammars and LR grammars, it essentially performs LL parsing and LR parsing, respectively, while on
nondeterministic grammars, it is as efficient as can be expected. Although GLR parsing was developed in the 1980s, many new language definitions and
parser generators continue to be based on LL, LALR or LR parsing up to the present day.
Linguistic applications
Chomsky initially hoped to overcome the limitations of context-free grammars by adding transformation rules.
Such rules are another standard device in traditional linguistics; e.g.
passivization in English. Much of
generative grammar has been devoted to finding ways of refining the descriptive mechanisms of phrase-structure grammar and transformation rules such that exactly the kinds of things can be expressed that natural language actually allows. Allowing arbitrary transformations does not meet that goal: they are much too powerful, being
Turing complete unless significant restrictions are added (e.g. no transformations that introduce and then rewrite symbols in a context-free fashion).
Chomsky's general position regarding the non-context-freeness of natural language has held up since then,
[.] although his specific examples regarding the inadequacy of context-free grammars in terms of their weak generative capacity were later disproved.
[.]
Gerald Gazdar and Geoffrey Pullum have argued that despite a few non-context-free constructions in natural language (such as cross-serial dependencies in Swiss German and reduplication in Bambara[.]), the vast majority of forms in natural language are indeed context-free.
See also
*
Parsing expression grammar
*
Stochastic context-free grammar
*
Algorithms for context-free grammar generation
*
Pumping lemma for context-free languages
References
Notes
Further reading
*. Chapter 4: Context-Free Grammars, pp. 77–106; Chapter 6: Properties of Context-Free Languages, pp. 125–137.
*. Chapter 2: Context-Free Grammars, pp. 91–122; Section 4.1.2: Decidable problems concerning context-free languages, pp. 156–159; Section 5.1.1: Reductions via computation histories: pp. 176–183.
External links
* Computer programmers may find this stack exchange answer useful.
* CFG Developer, created by Christopher Wong at Stanford University in 2014; modified by Kevin Gibbons in 2015.
{{DEFAULTSORT:Context-Free Grammar}}
1956 in computing
Compiler construction
Formal languages
Programming language topics
Wikipedia articles with ASCII art