<?xml version="1.0" encoding="UTF-8"?>

<record version="6" id="11783">
 <title>Parikh's theorem</title>
 <name>ParikhsTheorem</name>
 <created>2009-05-14 22:27:09</created>
 <modified>2009-05-19 06:09:38</modified>
 <type>Theorem</type>
<parent id="2546">context-free language</parent>
 <creator id="3771" name="CWoo"/>
 <author id="3771" name="CWoo"/>
 <classification>
	<category scheme="msc" code="68Q45"/>
	<category scheme="msc" code="68Q42"/>
 </classification>
 <defines>
	<concept>Parikh mapping</concept>
	<concept>linear set</concept>
	<concept>semilinear set</concept>
	<concept>letter-equivalent</concept>
	<concept>slip-language</concept>
 </defines>
 <preamble>\usepackage{amssymb,amscd}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{mathrsfs}

% used for TeXing text within eps files
%\usepackage{psfrag}
% need this for including graphics (\includegraphics)
%\usepackage{graphicx}
% for neatly defining theorems and propositions
\usepackage{amsthm}
% making logically defined graphics
\usepackage{xypic}
\usepackage{pst-plot}

% define commands here
\newcommand*{\abs}[1]{\left\lvert #1\right\rvert}
\newtheorem{prop}{Proposition}
\newtheorem{thm}{Theorem}
\newtheorem{ex}{Example}
\newcommand{\real}{\mathbb{R}}
\newcommand{\pdiff}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\mpdiff}[3]{\frac{\partial^#1 #2}{\partial #3^#1}}</preamble>
 <content>Let $\Sigma$ be an alphabet.  Fix an order on elements of $\Sigma$, so that they are $a_1,\ldots, a_n$.  For each word $w$ over $\Sigma$, we can form an $n$-tuple $(w_1,\ldots, w_n)$ so that each $w_i$ is the number of occurrences of $a_i$ in $w$.  The function $\Psi: \Sigma^* \to \mathbb{N}^n$ such that $$\Psi(w)=(w_1,\ldots, w_n)$$ is called the \emph{Parikh mapping} (over the alphabet $\Sigma$).  Here, $\mathbb{N}$ is the set of non-negative integers.

Two languages $L_1$ and $L_2$ over some $\Sigma$ are called \emph{letter-equivalent} if $\Psi(L_1)=\Psi(L_2)$.

For example, $L_1:=\lbrace a^nb^n \mid n\ge 0\rbrace$ is letter-equivalent to the $L_2:=\lbrace (ab)^n \mid n\ge 0 \rbrace$, as both are mapped to the set $\lbrace (n,n)\mid n\ge 0\rbrace$ by $\Psi$.  Note that $L_1$ is context-free, while $L_2$ is regular.  In fact, we have the following:

\begin{thm}[Parikh] Every context-free language is letter-equivalent to a regular language. \end{thm}

Like the pumping lemma, Parikh's theorem can be used to show that certain languages are not context-free.  But in order to apply Parikh's theorem above, one needs to know more about the structure of the set $\Psi(L)$ for a given context-free language $L$.

First, note that $\mathbb{N}^n$ is a commutative monoid under addition.  For any element $x$ and subset $S$ of $\mathbb{N}^n$, define $x+S$ in the obvious manner.  Next, let $x_1,\ldots, x_m$ be elements of $\mathbb{N}^n$.  Denote $$\langle x_1,\ldots, x_m \rangle$$ the submonoid generated by $x_1,\ldots, x_m$.  

A subset $S$ of $\mathbb{N}^n$ is said to be \emph{linear} if it has the form $$x_0 + \langle x_1,\ldots, x_m\rangle,$$ for some $x_0,\ldots, x_m \in \mathbb{N}^n$.  It is not hard to see that every linear set is the image of some regular language under $\Psi$.  For example, $$(1,0,0)+\langle (2,1,0),(0,3,2)\rangle = \Psi(\lbrace a(a^2b)^m (b^3c^2)^n \mid m,n\ge 0\rbrace).$$  The example can be easily generalized.

A subset of $\mathbb{N}^n$ is said to be \emph{semilinear} if it is the union of finitely many linear sets.  Since regular languages are closed under union, every semilinear set is the image of a regular language under $\Psi$.  As a result, Parikh's theorem can be restated as 

\begin{thm}[Theorem 1 restated] If $L$ is context-free, then $\Psi(L)$ is semilinear. \end{thm}

Using this theorem, one easily sees that the language $\lbrace a^p \mid p \text{ is a prime number } \rbrace$ is not context-free, and neither is the language $$\lbrace a^m b^n \mid \mbox{ either }m\mbox{ is prime, and }n\ge m,\mbox{ or }n\mbox{ is prime, and }m\ge n\rbrace,$$ which can not be proved by the pumping lemma in a straightforward manner.

\textbf{Remarks}.  
\begin{itemize}
\item The converse of Theorem 2 above is false.  For example, let $L=\lbrace a^nb^nc^n \mid n\ge 0\rbrace$.  Then $L$ is not context-free by the pumping lemma.  However, $\Psi(L)=\langle (1,1,1)\rangle$, which is clearly linear.
\item In the literature, a lanugage $L$ such that $\Psi(L)$ is semilinear is often called a \emph{slip-language}.  It can be shown that for a slip-language $L$ over an alphabet consisting of two symbols, the language $\Psi^{-1}\circ \Psi(L)$ is context-free.
\end{itemize}

\begin{thebibliography}{9}
\bibitem{hlcp} H.R. Lewis, C.H. Papadimitriou, {\em Elements of the Theory of Computation}. Prentice-Hall, Englewood Cliffs, New Jersey (1981).
\bibitem{dk} D. C. Kozen, {\em Automata and Computability}, Springer, New York (1997).
\bibitem{rjp} R.J. Parikh, {\em On Context-Free Languages, Journal of the Association for Computing Machinery, 4} (1966), 570-581.
\bibitem{ml} M. Latteux, {\em C\^{o}nes rationnels commutatifs}, J. Comput. Systems Sci. 18 (1979), 307-333.
\end{thebibliography}</content>
</record>
