Parikh’s theorem

Let $\Sigma$ be an alphabet. Fix an order on elements of $\Sigma$ , so that they are $a_{1},\ldots,a_{n}$ . For each word $w$ over $\Sigma$ , we can form an $n$ -tuple $(w_{1},\ldots,w_{n})$ so that each $w_{i}$ is the number of occurrences of $a_{i}$ in $w$ . The function $\Psi:\Sigma^{*}\to\mathbb{N}^{n}$ such that

\Psi(w)=(w_{1},\ldots,w_{n})

is called the Parikh mapping (over the alphabet $\Sigma$ ). Here, $\mathbb{N}$ is the set of non-negative integers.

Two languages $L_{1}$ and $L_{2}$ over some $\Sigma$ are called letter-equivalent if $\Psi(L_{1})=\Psi(L_{2})$ .

For example, $L_{1}:=\{a^{n}b^{n}\mid n\geq 0\}$ is letter-equivalent to the $L_{2}:=\{(ab)^{n}\mid n\geq 0\}$ , as both are mapped to the set $\{(n,n)\mid n\geq 0\}$ by $\Psi$ . Note that $L_{1}$ is context-free, while $L_{2}$ is regular. In fact, we have the following:

Theorem 1 (Parikh).

Every context-free language is letter-equivalent to a regular language.

Like the pumping lemma, Parikh’s theorem can be used to show that certain languages are not context-free. But in order to apply Parikh’s theorem above, one needs to know more about the structure of the set $\Psi(L)$ for a given context-free language $L$ .

First, note that $\mathbb{N}^{n}$ is a commutative monoid under addition. For any element $x$ and subset $S$ of $\mathbb{N}^{n}$ , define $x+S$ in the obvious manner. Next, let $x_{1},\ldots,x_{m}$ be elements of $\mathbb{N}^{n}$ . Denote

\langle x_{1},\ldots,x_{m}\rangle

the submonoid generated by $x_{1},\ldots,x_{m}$ .

A subset $S$ of $\mathbb{N}^{n}$ is said to be linear if it has the form

x_{0}+\langle x_{1},\ldots,x_{m}\rangle,

for some $x_{0},\ldots,x_{m}\in\mathbb{N}^{n}$ . It is not hard to see that every linear set is the image of some regular language under $\Psi$ . For example,

(1,0,0)+\langle(2,1,0),(0,3,2)\rangle=\Psi(\{a(a^{2}b)^{m}(b^{3}c^{2})^{n}\mid m% ,n\geq 0\}).

The example can be easily generalized.

A subset of $\mathbb{N}^{n}$ is said to be semilinear if it is the union of finitely many linear sets. Since regular languages are closed under union, every semilinear set is the image of a regular language under $\Psi$ . As a result, Parikh’s theorem can be restated as

Theorem 2 (Theorem 1 restated).

If $L$ is context-free, then $\Psi(L)$ is semilinear.

Using this theorem, one easily sees that the language $\{a^{p}\mid p\text{ is a prime number }\}$ is not context-free, and neither is the language

\{a^{m}b^{n}\mid\mbox{ either }m\mbox{ is prime, and }n\geq m,\mbox{ or }n% \mbox{ is prime, and }m\geq n\},

which can not be proved by the pumping lemma in a straightforward manner.

Remarks.

•

The converse of Theorem 2 above is false. For example, let $L=\{a^{n}b^{n}c^{n}\mid n\geq 0\}$ . Then $L$ is not context-free by the pumping lemma. However, $\Psi(L)=\langle(1,1,1)\rangle$ , which is clearly linear.
•

In the literature, a lanugage $L$ such that $\Psi(L)$ is semilinear is often called a slip-language. It can be shown that for a slip-language $L$ over an alphabet consisting of two symbols, the language $\Psi^{-1}\circ\Psi(L)$ is context-free.

References

1 H.R. Lewis, C.H. Papadimitriou, Elements of the Theory of Computation. Prentice-Hall, Englewood Cliffs, New Jersey (1981).
2 D. C. Kozen, Automata and Computability, Springer, New York (1997).
3 R.J. Parikh, On Context-Free Languages, Journal of the Association for Computing Machinery, 4 (1966), 570-581.
4 M. Latteux, Cônes rationnels commutatifs, J. Comput. Systems Sci. 18 (1979), 307-333.

Title	Parikh’s theorem
Canonical name	ParikhsTheorem
Date of creation	2013-03-22 18:55:49
Last modified on	2013-03-22 18:55:49
Owner	CWoo (3771)
Last modified by	CWoo (3771)
Numerical id	9
Author	CWoo (3771)
Entry type	Theorem
Classification	msc 68Q42
Classification	msc 68Q45
Defines	Parikh mapping
Defines	linear set
Defines	semilinear set
Defines	letter-equivalent
Defines	slip-language