<?xml version="1.0" encoding="UTF-8"?>

<record version="28" id="8609">
 <title>formal grammar</title>
 <name>FormalGrammar</name>
 <created>2006-12-09 14:38:42</created>
 <modified>2009-08-28 19:14:35</modified>
 <type>Definition</type>
 <creator id="3771" name="CWoo"/>
 <author id="3771" name="CWoo"/>
 <author id="12020" name="Lando47"/>
 <classification>
	<category scheme="msc" code="03D05"/>
	<category scheme="msc" code="68Q45"/>
	<category scheme="msc" code="68Q42"/>
	<category scheme="msc" code="91F20"/>
 </classification>
 <defines>
	<concept>formal language</concept>
	<concept>terminal</concept>
	<concept>non-terminal</concept>
	<concept>starting symbol</concept>
	<concept>production</concept>
	<concept>generable by a formal grammar</concept>
	<concept>sentential form</concept>
 </defines>
 <synonyms>
	<synonym concept="formal grammar" alias="phrase-structure grammar"/>
	<synonym concept="formal grammar" alias="unrestricted grammar"/>
	<synonym concept="formal grammar" alias="grammar"/>
	<synonym concept="formal grammar" alias="phrase-structure language"/>
	<synonym concept="formal grammar" alias="terminal symbol"/>
	<synonym concept="formal grammar" alias="non-terminal symbol"/>
	<synonym concept="formal grammar" alias="initial-symbol"/>
	<synonym concept="formal grammar" alias="initial symbol"/>
	<synonym concept="formal grammar" alias="start symbol"/>
 </synonyms>
 <related>
	<object name="SemiThueSystem"/>
	<object name="Language"/>
	<object name="PostSystem"/>
 </related>
 <preamble>% this is the default PlanetMath preamble.  as your knowledge
% of TeX increases, you will probably want to edit this, but
% it should be fine as is for beginners.

% almost certainly you want these
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{amsfonts}

% used for TeXing text within eps files
%\usepackage{psfrag}
% need this for including graphics (\includegraphics)
%\usepackage{graphicx}
% for neatly defining theorems and propositions
%\usepackage{amsthm}
% making logically defined graphics
%\usepackage{xypic}

% there are many more packages, add them here as you need them

% define commands here
\newcommand{\derive}{\stackrel{*}{\Rightarrow}}</preamble>
 <content>\PMlinkescapeword{rewriting rules}

\subsubsection*{Introduction}

A grammar, loosely speaking, is a set of rules that can be applied to words to generate sentences in a language.  For example, with the grammar of the English language, one can form syntactically correct sentences such as ``The elephant drove his bicycle to the moon,'' regardless whether the sentence is meaningful or not.

The mathematical abstraction of a grammar is known as a \emph{formal grammar}.  Instead of generating sentences from words, a formal grammar generates words from symbols and other words.  The following basic ingredients are necessary in a formal grammar: 
\begin{itemize}
\item a collection of symbols called an alphabet, 
\item a collection of rules, called \emph{rewriting rules}, specifying how one can \emph{generate} new words from existing ones, and 
\item a collection of \emph{initial} words that serve to initialize the generation of new words.
\end{itemize}
To see how these rewriting rules work, let us look at an example.  Let $\lbrace a,b\rbrace$ be the alphabet as well as the set of initial words.  With the rewriting rules given by: from a word $x$ we can form the word $ax$, as well as the word $xa$, we would be able to generate words like $$aa, \quad aaa, \quad ab, \quad baa$$  However, words such as $$bb, \quad baba, \quad baaab$$ can not be produced.

Note that by adding an extra symbol $\sigma$ to the above alphabet, and two additional rewriting rules given by ``from $\sigma$ form $a$'' and ``from $\sigma$ form $b$'', it is not hard to see that any word that can be generated by the first grammar can be generated by this new grammar.

\subsubsection*{Definition}

Formalizing what we have discussed above, we say that a \emph{formal grammar} $G$ is a quadruple $(\Sigma,N,P,\sigma)$, where
\begin{enumerate}
\item $(\Sigma,P)$ is a rewriting system;
\item $N$ is a subset of $\Sigma$ whose elements are called \emph{non-terminals}, and $T:=\Sigma-N$ the set of \emph{terminals};
\item an element $\sigma\in N$ called the \emph{starting symbol}.
\end{enumerate}
Instead of writing $G=(\Sigma,N,P,\sigma)$, the quadruple $(T,N,P,\sigma)$ is another way of representing $G$ (as long as the conditions $\Sigma=T\cup N$ and $T\cap N=\varnothing$ are satisfied).

A formal grammar is variously known as a \emph{phrase-structure grammar}, an \emph{unrestricted grammar}, or simply a \emph{grammar}.  A formal grammar is sometimes also called a rewriting system in the literature, although the two notions are distinct on PlanetMath.

Given a formal grammar $G$, a word $W$ over $\Sigma$ such that $\sigma \derive W$ is called a \emph{sentential form} of $G$.  A sentential form over $T$ is called a \emph{word} generated by $G$.  The set of all words generated by $G$ is called the \emph{formal language} generated by $G$, and is denoted by $L(G)$.  In other words,
$$L(G):=\lbrace w\in T^*\mid \sigma \derive w\rbrace,$$ 
where $\derive$ is the derivability relation in the rewriting system $(\Sigma,P)$.  A \emph{formal language} is also known as a \emph{phrase-structure language}.  

A language $L$ over an alphabet $A$ is said to be \emph{generable by a formal grammar} if there is a formal grammar $G$ such that $L=L(G)\cap A^*$.

\textbf{Example}.  Continuing from the example in the previous section, we can put $T=\lbrace a,b\rbrace$ and $N=\lbrace \sigma\rbrace$.  For the set $P$ of productions, we have four 
\begin{enumerate}
\item $\sigma \to \sigma a$
\item $\sigma \to a\sigma$
\item $\sigma \to a$
\item $\sigma \to b$
\end{enumerate}
Then $G=(\Sigma,N,P,\sigma)$ is a formal grammar.  It is not hard to see that $\sigma \derive baa$, as $\sigma \to \sigma a \to \sigma aa \to baa$.  In fact, $L(G)$ consists of all words such that $a$ occurs at least once and $b$ occurs at most once.

\textbf{Remarks}.
\begin{itemize}
\item Not every language can be generated by a formal grammar.  Given a finite alphabet $\Sigma$, $\Sigma^*$ is countably infinite, and therefore there are uncountably many languages over $\Sigma$.  However, there are only a countably infinitely many languages that can be generated by formal grammars.  
\item Every language generated by a formal grammar is recursively enumerable.
\item Every context-sensitive grammar is equivalent to a formal grammar, and under the Chomsky hierarchy, the class of formal languages is of class $0$.
\end{itemize}

\begin{thebibliography}{9}
\bibitem{hlcp} H.R. Lewis, C.H. Papadimitriou {\em Elements of the Theory of Computation}. Prentice-Hall, Englewood Cliffs, New Jersey (1981).
\end{thebibliography}</content>
</record>
