formal language theory In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...

, a context-free language (CFL) is a

language Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...

generated by a context-free grammar (CFG). Context-free languages have many applications in

programming languages A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...

, in particular, most arithmetic expressions are generated by context-free grammars.

Background

Context-free grammar

Different context-free grammars can generate the same context-free language. Intrinsic properties of the language can be distinguished from extrinsic properties of a particular grammar by comparing multiple grammars that describe the language.

Automata

The set of all context-free languages is identical to the set of languages accepted by

pushdown automata In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is a type of automaton that employs a stack. Pushdown automata are used in theories about what can be computed by machines. They are more capab ...

, which makes these languages amenable to parsing. Further, for a given CFG, there is a direct way to produce a pushdown automaton for the grammar (and thereby the corresponding language), though going the other way (producing a grammar given an automaton) is not as direct.

Examples

An example context-free language is

L = \

, the language of all non-empty even-length strings, the entire first halves of which are 's, and the entire second halves of which are 's. is generated by the grammar

S\to aSb ~, ~ ab

. This language is not regular. It is accepted by the

pushdown automaton In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is a type of automaton that employs a stack. Pushdown automata are used in theories about what can be computed by machines. They are more capab ...

M=(\, \, \, \delta, q_0, z, \)

where

\delta

is defined as follows:meaning of

\delta

's arguments and results:

\delta(\mathrm_1, \mathrm, \mathrm) = (\mathrm_2, \mathrm)

\begin
\delta(q_0, a, z) &= (q_0, az) \\
\delta(q_0, a, a) &= (q_0, aa) \\
\delta(q_0, b, a) &= (q_1, \varepsilon) \\
\delta(q_1, b, a) &= (q_1, \varepsilon) \\
\delta(q_1, \varepsilon, z) &= (q_f, \varepsilon)
\end

Unambiguous CFLs are a proper subset of all CFLs: there are inherently ambiguous CFLs. An example of an inherently ambiguous CFL is the union of

\

with

\

. This set is context-free, since the union of two context-free languages is always context-free. But there is no way to unambiguously parse strings in the (non-context-free) subset

\

which is the intersection of these two languages.

Dyck language

The language of all properly matched parentheses is generated by the grammar

S\to SS ~, ~ (S) ~, ~ \varepsilon

Properties

Context-free parsing

The context-free nature of the language makes it simple to parse with a pushdown automaton. Determining an instance of the

membership problem Member may refer to: * Military jury, referred to as "Members" in military jargon * Element (mathematics), an object that belongs to a mathematical set * In object-oriented programming, a member of a class ** Field (computer science), entries in ...

; i.e. given a string

w

, determine whether

w \in L(G)

where

L

is the language generated by a given grammar

G

; is also known as ''recognition''. Context-free recognition for

Chomsky normal form In formal language theory, a context-free grammar, ''G'', is said to be in Chomsky normal form (first described by Noam Chomsky) if all of its production rules are of the form: : ''A'' → ''BC'', or : ''A'' → ''a'', or : ''S'' � ...

grammars was shown by Leslie G. Valiant to be reducible to boolean

matrix multiplication In mathematics, particularly in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the s ...

, thus inheriting its complexity upper bound of ''O''(''n''^2.3728596).In Valiant's paper, ''O''(''n''^2.81) was the then-best known upper bound. See Matrix multiplication#Computational complexity for bound improvements since then. Conversely, Lillian Lee has shown ''O''(''n''^3−ε) boolean matrix multiplication to be reducible to ''O''(''n''^3−3ε) CFG parsing, thus establishing some kind of lower bound for the latter. Practical uses of context-free languages require also to produce a derivation tree that exhibits the structure that the grammar associates with the given string. The process of producing this tree is called ''

parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...

''. Known parsers have a time complexity that is cubic in the size of the string that is parsed. Formally, the set of all context-free languages is identical to the set of languages accepted by pushdown automata (PDA). Parser algorithms for context-free languages include the

CYK algorithm In computer science, the Cocke–Younger–Kasami algorithm (alternatively called CYK, or CKY) is a parsing algorithm for context-free grammars published by Itiroo Sakai in 1961. The algorithm is named after some of its rediscoverers: John Cocke, ...

and Earley's Algorithm. A special subclass of context-free languages are the

deterministic context-free language In formal language theory, deterministic context-free languages (DCFL) are a proper subset of context-free languages. They are the context-free languages that can be accepted by a deterministic pushdown automaton. DCFLs are always unambiguous, mea ...

s which are defined as the set of languages accepted by a

deterministic pushdown automaton In automata theory, a deterministic pushdown automaton (DPDA or DPA) is a variation of the pushdown automaton. The class of deterministic pushdown automata accepts the deterministic context-free languages, a proper subset of context-free languages. ...

and can be parsed by a LR(k) parser. See also

parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 200 ...

as an alternative approach to grammar and parser.

Closure properties

The class of context-free languages is closed under the following operations. That is, if ''L'' and ''P'' are context-free languages, the following languages are context-free as well: *the

union Union commonly refers to: * Trade union, an organization of workers * Union (set theory), in mathematics, a fundamental operation on sets Union may also refer to: Arts and entertainment Music * Union (band), an American rock group ** ''Un ...

L \cup P

of ''L'' and ''P'' *the reversal of ''L'' *the

concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalisations of concatenat ...

L \cdot P

of ''L'' and ''P'' *the

Kleene star In mathematical logic and computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation, either on sets of strings or on sets of symbols or characters. In mathematics, it is more commonly known as the free monoid ...

L^*

of ''L'' *the image

\varphi(L)

of ''L'' under a

homomorphism In algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type (such as two groups, two rings, or two vector spaces). The word ''homomorphism'' comes from the Ancient Greek language: () meaning "same" ...

\varphi

*the image

\varphi^(L)

of ''L'' under an inverse homomorphism

\varphi^

*the

circular shift In combinatorial mathematics, a circular shift is the operation of rearranging the entries in a tuple, either by moving the final entry to the first position, while shifting all other entries to the next position, or by performing the inverse oper ...

of ''L'' (the language

\

) *the prefix closure of ''L'' (the set of all

prefix A prefix is an affix which is placed before the Word stem, stem of a word. Adding it to the beginning of one word changes it into another word. For example, when the prefix ''un-'' is added to the word ''happy'', it creates the word ''unhappy'' ...

es of strings from ''L'') *the

quotient In arithmetic, a quotient (from lat, quotiens 'how many times', pronounced ) is a quantity produced by the division of two numbers. The quotient has widespread use throughout mathematics, and is commonly referred to as the integer part of a ...

''L''/''R'' of ''L'' by a regular language ''R''

Nonclosure under intersection, complement, and difference

The context-free languages are not closed under intersection. This can be seen by taking the languages

A = \

and

B = \

, which are both context-free.A context-free grammar for the language ''A'' is given by the following production rules, taking ''S'' as the start symbol: ''S'' → ''Sc'' , ''aTb'' , ''ε''; ''T'' → ''aTb'' , ''ε''. The grammar for ''B'' is analogous. Their intersection is

A \cap B = \

, which can be shown to be non-context-free by the

pumping lemma for context-free languages Pumping may refer to: * The operation of a pump, for moving a liquid from one location to another **The use of a breast pump for extraction of milk * Pumping (audio), a creative misuse of dynamic range compression * Pumping (computer systems), the ...

. As a consequence, context-free languages cannot be closed under complementation, as for any languages ''A'' and ''B'', their intersection can be expressed by union and complement:

A \cap B = \overline

. In particular, context-free language cannot be closed under difference, since complement can be expressed by difference:

\overline = \Sigma^* \setminus L

. However, if ''L'' is a context-free language and ''D'' is a regular language then both their intersection

L\cap D

and their difference

L\setminus D

are context-free languages.

Decidability

In formal language theory, questions about regular languages are usually decidable, but ones about context-free languages are often not. It is decidable whether such a language is finite, but not whether it contains every possible string, is regular, is unambiguous, or is equivalent to a language with a different grammar. The following problems are undecidable for arbitrarily given context-free grammars A and B: *Equivalence: is

L(A)=L(B)

? *Disjointness: is

L(A) \cap L(B) = \emptyset

? However, the intersection of a context-free language and a ''regular'' language is context-free, hence the variant of the problem where ''B'' is a regular grammar is decidable (see "Emptiness" below). *Containment: is

L(A) \subseteq L(B)

? Again, the variant of the problem where ''B'' is a regular grammar is decidable, while that where ''A'' is regular is generally not. *Universality: is

L(A)=\Sigma^*

? *Regularity: is

L(A)

a regular language? *Ambiguity: is every grammar for

L(A)

ambiguous? The following problems are ''decidable'' for arbitrary context-free languages: *Emptiness: Given a context-free grammar ''A'', is

L(A) = \emptyset

? *Finiteness: Given a context-free grammar ''A'', is

L(A)

finite? *Membership: Given a context-free grammar ''G'', and a word

w

, does

w \in L(G)

? Efficient polynomial-time algorithms for the membership problem are the

and Earley's Algorithm. According to Hopcroft, Motwani, Ullman (2003), many of the fundamental closure and (un)decidability properties of context-free languages were shown in the 1961 paper of Bar-Hillel, Perles, and Shamir

Languages that are not context-free

The set

\

is a

context-sensitive language In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar (and equivalently by a noncontracting grammar). Context-sensitive is one of the four types of grammars in the Chomsky hierarch ...

, but there does not exist a context-free grammar generating this language. So there exist context-sensitive languages which are not context-free. To prove that a given language is not context-free, one may employ the

or a number of other methods, such as

Ogden's lemma In the theory of formal languages, Ogden's lemma (named after William F. Ogden) is a generalization of the pumping lemma for context-free languages Pumping may refer to: * The operation of a pump, for moving a liquid from one location to another ...

Parikh's theorem Parikh's theorem in theoretical computer science says that if one looks only at the number of occurrences of each terminal symbol in a context-free language, without regard to their order, then the language is indistinguishable from a regular langua ...

Notes

References

Works cited

* *