Parikh's theorem
   HOME

TheInfoList



OR:

Parikh's theorem in theoretical computer science says that if one looks only at the number of occurrences of each Context-free grammar#Formal definitions, terminal symbol in a context-free language, without regard to their order, then the language is indistinguishable from a regular language. It is useful for deciding that strings with a given number of terminals are not accepted by a context-free grammar. It was first proved by Rohit Jivanlal Parikh, Rohit Parikh in 1961 and republished in 1966.


Definitions and formal statement

Let \Sigma=\ be an Alphabet (formal languages), alphabet. The ''Parikh vector'' of a word is defined as the function p:\Sigma^*\to\mathbb^k, given by p(w)=(, w, _, , w, _, \ldots, , w, _) where , w, _ denotes the number of occurrences of the letter a_i in the word w. A subset of \mathbb^k is said to be ''linear'' if it is of the form u_0 + \mathbbu_1 + \dots + \mathbbu_m = \ for some vectors u_0,\ldots,u_m. A subset of \mathbb^k is said to be ''semi-linear'' if it is a union of finitely many linear subsets. In short, the image under p of context-free languages and of regular languages is the same, and it is equal to the set of semilinear sets. Two languages are said to be ''commutatively equivalent'' if they have the same set of Parikh vectors. Thus, every context-free language is commutatively equivalent to some regular language.


Proof

The second part is easy to prove. The first part is less easy. The following proof is credited to. First we need a small strengthening of the pumping lemma for context-free languages: The proof is essentially the same as the standard pumping lemma: use the pigeonhole principle to find k copies of some nonterminal symbol A in the longest path in the shortest derivation tree.


Strengthening for bounded languages

A language L is ''bounded'' if L\subset w_1^*\ldots w_k^* for some fixed words w_1,\ldots, w_k. Ginsburg and Spanier gave a necessary and sufficient condition, similar to Parikh's theorem, for bounded languages. Call a linear set ''stratified'', if in its definition for each i\ge 1 the vector u_i has the property that it has at most two non-zero coordinates, and for each i,j\ge 1 if each of the vectors u_i,u_j has two non-zero coordinates, i_1 and j_1, respectively, then their order is ''not'' i_1. A semi-linear set is stratified if it is a union of finitely many stratified linear subsets.


Significance

The theorem has multiple interpretations. It shows that a context-free language over a singleton alphabet must be a regular language and that some context-free languages can only have ambiguous grammars{{explain, reason=why exactly?, date=April 2017. Such languages are called ''inherently ambiguous languages''. From a formal grammar perspective, this means that some ambiguous context-free grammars cannot be converted to equivalent unambiguous context-free grammars.


References

Formal languages