In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).
Context-free languages have many applications in programming languages; in particular, most arithmetic expressions are generated by context-free grammars.

Background

Context-free grammar

Different context-free grammars can generate the same context-free language. Intrinsic properties of the language can be distinguished from extrinsic properties of a particular grammar by comparing multiple grammars that describe the language.

Automata

The set of all context-free languages is identical to the set of languages accepted by pushdown automata, which makes these languages amenable to parsing. Further, for a given CFG, there is a direct way to produce a pushdown automaton for the grammar (and thereby the corresponding language), though going the other way (producing a grammar given an automaton) is not as direct.

Examples

An example context-free language is $L = \{a^n b^n : n \geq 1\}$, the language of all non-empty even-length strings, the entire first halves of which are $a$'s, and the entire second halves of which are $b$'s. $L$ is generated by the grammar $S \to aSb ~|~ ab$. This language is not regular. It is accepted by the pushdown automaton $M = (\{q_0, q_1, q_f\}, \{a, b\}, \{a, z\}, \delta, q_0, z, \{q_f\})$ where $\delta$ is defined as follows (meaning of $\delta$'s arguments and results: $\delta(\mathrm{state}_1, \mathrm{read}, \mathrm{pop}) = (\mathrm{state}_2, \mathrm{push})$):

$\begin{aligned} \delta(q_0, a, z) &= (q_0, az) \\ \delta(q_0, a, a) &= (q_0, aa) \\ \delta(q_0, b, a) &= (q_1, \varepsilon) \\ \delta(q_1, b, a) &= (q_1, \varepsilon) \\ \delta(q_1, \varepsilon, z) &= (q_f, \varepsilon) \end{aligned}$

Unambiguous CFLs are a proper subset of all CFLs: there are inherently ambiguous CFLs. An example of an inherently ambiguous CFL is the union of $\{a^n b^m c^m d^n \mid n, m > 0\}$ with $\{a^n b^n c^m d^m \mid n, m > 0\}$. This set is context-free, since the union of two context-free languages is always context-free. But there is no way to unambiguously parse strings in the (non-context-free) subset $\{a^n b^n c^n d^n \mid n > 0\}$ which is the intersection of these two languages.
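The transition table of the pushdown automaton $M$ above can be simulated directly. The following is a minimal sketch, not a general PDA simulator: it assumes the table is deterministic (as it is here), writes the state names as strings, uses the empty string for $\varepsilon$, and accepts by final state once the whole input is consumed. The function name `accepts` is illustrative only.

```python
# Transition table from the text: (state, read, pop) -> (next_state, push).
# "" stands for epsilon; the first symbol of the pushed string ends up on top.
DELTA = {
    ("q0", "a", "z"): ("q0", "az"),
    ("q0", "a", "a"): ("q0", "aa"),
    ("q0", "b", "a"): ("q1", ""),
    ("q1", "b", "a"): ("q1", ""),
    ("q1", "", "z"): ("qf", ""),
}


def accepts(word):
    """Return True if the automaton ends in state qf with all input consumed."""
    state, stack, pos = "q0", ["z"], 0  # the top of the stack is stack[-1]
    while stack:
        top = stack[-1]
        if pos < len(word) and (state, word[pos], top) in DELTA:
            # Consume one input symbol.
            state, push = DELTA[(state, word[pos], top)]
            pos += 1
        elif (state, "", top) in DELTA:
            # Epsilon move: no input consumed.
            state, push = DELTA[(state, "", top)]
        else:
            break  # no applicable transition: the automaton is stuck
        stack.pop()
        stack.extend(reversed(push))
    return state == "qf" and pos == len(word)
```

For instance, `accepts("aabb")` follows the first four rows of the table and then the $\varepsilon$-move on $z$, while `accepts("aab")` gets stuck in $q_1$ with an $a$ still on the stack.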

Dyck language

The language of all properly matched parentheses is generated by the grammar $S \to SS ~|~ (S) ~|~ \varepsilon$.
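Membership in this language can be checked with a single counter that plays the role of a pushdown stack holding only one kind of symbol. The sketch below assumes the alphabet is exactly the two characters `(` and `)`; the function name `is_dyck` is illustrative.

```python
def is_dyck(s):
    """Return True if s is a properly matched string of parentheses."""
    depth = 0  # number of currently unmatched "("
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ")" with no matching "(" before it
        else:
            return False  # symbol outside the alphabet
    return depth == 0  # every "(" must have been closed
```

The empty string is accepted, matching the production $S \to \varepsilon$.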

Properties

Context-free parsing

The context-free nature of the language makes it simple to parse with a pushdown automaton. Deciding an instance of the membership problem, i.e. given a string $w$, determining whether $w \in L(G)$ where $L(G)$ is the language generated by a given grammar $G$, is also known as ''recognition''. Context-free recognition for Chomsky normal form grammars was shown by Leslie G. Valiant to be reducible to boolean matrix multiplication, thus inheriting its complexity upper bound of $O(n^{2.3728639})$. (In Valiant's paper, $O(n^{2.81})$ was the then-best known upper bound; see the Coppersmith–Winograd algorithm and later matrix multiplication algorithms for bound improvements since then.)
Conversely, Lillian Lee has shown $O(n^{3-\varepsilon})$ boolean matrix multiplication to be reducible to $O(n^{3-3\varepsilon})$ CFG parsing, thus establishing some kind of lower bound for the latter.
Practical uses of context-free languages also require producing a derivation tree that exhibits the structure that the grammar associates with the given string. The process of producing this tree is called ''parsing''. Known parsers have a time complexity that is cubic in the length of the string that is parsed.
Formally, the set of all context-free languages is identical to the set of languages accepted by pushdown automata (PDA). Parser algorithms for context-free languages include the CYK algorithm and Earley's algorithm.
A special subclass of context-free languages are the deterministic context-free languages, which are defined as the set of languages accepted by a deterministic pushdown automaton and can be parsed by an LR(k) parser.
See also parsing expression grammar as an alternative approach to grammar and parser.
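As an illustration of cubic-time recognition, the following is a minimal sketch of the CYK algorithm for a grammar in Chomsky normal form. The rule encoding (`unit_rules` for productions of the form $X \to t$, `pair_rules` for $X \to YZ$) and the example grammar, a Chomsky normal form version of $S \to aSb ~|~ ab$ with helper nonterminals $A$, $B$, $T$, are assumptions made for this example.

```python
def cyk(word, unit_rules, pair_rules, start="S"):
    """Return True if `start` derives `word` under the given CNF rules."""
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle the empty word
    # table[i][l] holds the nonterminals that derive word[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {lhs for lhs, terminal in unit_rules if terminal == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, (b, c) in pair_rules:
                    if b in left and c in right:
                        table[i][length].add(lhs)
    return start in table[0][n]


# CNF grammar for {a^n b^n : n >= 1}: S -> AB | AT, T -> SB, A -> a, B -> b.
UNIT_RULES = [("A", "a"), ("B", "b")]
PAIR_RULES = [("S", ("A", "B")), ("S", ("A", "T")), ("T", ("S", "B"))]
```

The three nested loops over `length`, `i`, and `split` give the cubic dependence on the length of the input; the grammar size contributes a further factor through the loop over `pair_rules`.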

Closure

The class of context-free languages is closed under the following operations. That is, if ''L'' and ''P'' are context-free languages, the following languages are context-free as well:
*the union $L \cup P$ of ''L'' and ''P''
*the reversal of ''L''
*the concatenation $L \cdot P$ of ''L'' and ''P''
*the Kleene star $L^*$ of ''L''
*the image $\varphi(L)$ of ''L'' under a homomorphism $\varphi$
*the image $\varphi^{-1}(L)$ of ''L'' under an inverse homomorphism $\varphi^{-1}$
*the circular shift of ''L'' (the language $\{vu : uv \in L\}$)
*the prefix closure of ''L'' (the set of all prefixes of strings from ''L'')
*the quotient ''L''/''R'' of ''L'' by a regular language ''R''
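Closure under union has a particularly short constructive proof: rename the nonterminals of the two grammars apart and add a fresh start symbol with one production to each old start symbol. The sketch below assumes grammars are encoded as dictionaries from nonterminals to lists of right-hand sides (tuples of symbols); the names `union_grammar`, `language_up_to`, and `S0` are hypothetical. The helper `language_up_to` enumerates short words and is only correct for grammars without $\varepsilon$-productions.

```python
def union_grammar(g1, s1, g2, s2, new_start="S0"):
    """Build a grammar for L(g1) ∪ L(g2) with fresh start symbol new_start."""
    def rename(grammar, tag):
        # Suffix every nonterminal so the two rule sets cannot interfere.
        names = {nt: f"{nt}_{tag}" for nt in grammar}
        rules = {
            names[nt]: [tuple(names.get(x, x) for x in rhs) for rhs in rhss]
            for nt, rhss in grammar.items()
        }
        return rules, names

    rules1, names1 = rename(g1, 1)
    rules2, names2 = rename(g2, 2)
    combined = {**rules1, **rules2}
    combined[new_start] = [(names1[s1],), (names2[s2],)]
    return combined


def language_up_to(grammar, start, max_len):
    """All generated words of length <= max_len (assumes no epsilon-productions)."""
    seen, frontier, words = {(start,)}, [(start,)], set()
    while frontier:
        form = frontier.pop()
        # Expand the leftmost nonterminal, if any.
        idx = next((i for i, x in enumerate(form) if x in grammar), None)
        if idx is None:
            words.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Without epsilon-productions forms never shrink, so pruning is safe.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return words


G1 = {"S": [("a", "S", "b"), ("a", "b")]}  # a^n b^n, n >= 1
G2 = {"S": [("c", "S"), ("c",)]}           # c^n, n >= 1
UNION = union_grammar(G1, "S", G2, "S")
```

Enumerating `language_up_to(UNION, "S0", 4)` yields the short words of both languages together, as expected for a union.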

Nonclosure under intersection, complement, and difference

The context-free languages are not closed under intersection. This can be seen by taking the languages $A = \{a^n b^n c^m \mid m, n \geq 0\}$ and $B = \{a^m b^n c^n \mid m, n \geq 0\}$, which are both context-free. (A context-free grammar for the language ''A'' is given by the following production rules, taking ''S'' as the start symbol: ''S'' → ''Sc'' | ''aTb'' | ''ε''; ''T'' → ''aTb'' | ''ε''. The grammar for ''B'' is analogous.) Their intersection is $A \cap B = \{a^n b^n c^n \mid n \geq 0\}$, which can be shown to be non-context-free by the pumping lemma for context-free languages.

As a consequence, context-free languages cannot be closed under complementation: they are closed under union, and for any languages ''A'' and ''B'', their intersection can be expressed by union and complement, $A \cap B = \overline{\overline{A} \cup \overline{B}}$, so closure under complementation would imply closure under intersection. In particular, context-free languages cannot be closed under difference, since complement can be expressed by difference: $\overline{L} = \Sigma^* \setminus L$.

However, if ''L'' is a context-free language and ''D'' is a regular language then both their intersection $L \cap D$ and their difference $L \setminus D$ are context-free languages.
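The claim that $A \cap B$ consists exactly of the strings $a^n b^n c^n$ can be checked by brute force on short strings. This is a sanity check under stated assumptions, not a proof: the membership predicates below are written directly from the set definitions (not from the grammars), and all function names are illustrative.

```python
import re
from itertools import product


def block_lengths(s):
    """Return (i, j, k) if s has the form a^i b^j c^k, otherwise None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return None if m is None else tuple(len(g) for g in m.groups())


def in_A(s):
    """Membership in A = {a^n b^n c^m}."""
    p = block_lengths(s)
    return p is not None and p[0] == p[1]


def in_B(s):
    """Membership in B = {a^m b^n c^n}."""
    p = block_lengths(s)
    return p is not None and p[1] == p[2]


def in_anbncn(s):
    """Membership in {a^n b^n c^n}."""
    p = block_lengths(s)
    return p is not None and p[0] == p[1] == p[2]


def agree_up_to(max_len):
    """Check that A ∩ B and {a^n b^n c^n} agree on all strings up to max_len."""
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            s = "".join(letters)
            if (in_A(s) and in_B(s)) != in_anbncn(s):
                return False
    return True
```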

Decidability

In formal language theory, questions about regular languages are usually decidable, but ones about context-free languages are often not. It is decidable whether such a language is finite, but not whether it contains every possible string, is regular, is unambiguous, or is equivalent to a language with a different grammar.

The following problems are undecidable for arbitrarily given context-free grammars ''A'' and ''B'':
*Equivalence: is $L(A) = L(B)$?
*Disjointness: is $L(A) \cap L(B) = \emptyset$? However, the intersection of a context-free language and a ''regular'' language is context-free, hence the variant of the problem where ''B'' is a regular grammar is decidable (see "Emptiness" below).
*Containment: is $L(A) \subseteq L(B)$? Again, the variant of the problem where ''B'' is a regular grammar is decidable, while that where ''A'' is regular is generally not.
*Universality: is $L(A) = \Sigma^*$?

The following problems are ''decidable'' for arbitrary context-free languages:
*Emptiness: Given a context-free grammar ''A'', is $L(A) = \emptyset$?
*Finiteness: Given a context-free grammar ''A'', is $L(A)$ finite?
*Membership: Given a context-free grammar ''G'' and a word $w$, does $w \in L(G)$ hold? Efficient polynomial-time algorithms for the membership problem are the CYK algorithm and Earley's algorithm.

According to Hopcroft, Motwani, Ullman (2003), many of the fundamental closure and (un)decidability properties of context-free languages were shown in the 1961 paper of Bar-Hillel, Perles, and Shamir.
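One standard way to decide emptiness is to compute the set of nonterminals that can derive some terminal string and check whether the start symbol is among them. The sketch below assumes a grammar encoded as a dictionary from nonterminals to lists of right-hand sides (tuples of symbols), with every symbol that is not a dictionary key treated as a terminal; the function name `is_empty` is illustrative.

```python
def is_empty(grammar, start):
    """Return True if the grammar generates no terminal string from `start`."""
    productive = set()  # nonterminals known to derive some terminal string
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            if nt in productive:
                continue
            # A rule is usable once every symbol on its right-hand side is
            # a terminal or an already productive nonterminal.
            if any(
                all(x not in grammar or x in productive for x in rhs)
                for rhs in rhss
            ):
                productive.add(nt)
                changed = True
    return start not in productive
```

Each pass either marks a new nonterminal or stops, so the loop runs at most once per nonterminal plus one final pass. An empty right-hand side (an $\varepsilon$-production) makes its nonterminal productive immediately.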

Languages that are not context-free

The set $\{a^n b^n c^n d^n \mid n > 0\}$ is a context-sensitive language, but there does not exist a context-free grammar generating this language. So there exist context-sensitive languages which are not context-free. To prove that a given language is not context-free, one may employ the pumping lemma for context-free languages or a number of other methods, such as Ogden's lemma or Parikh's theorem.
