PlanetMath (more info)
 Math for the people, by the people. Sponsor PlanetMath
Encyclopedia | Requests | Forums | Docs | Wiki | Random | RSS  
Login
create new user
name:
pass:
forget your password?
Main Menu
Viewing Version 25 of 'formal grammar'
[ view 'formal grammar' | back to history ]

Title of object: formal grammar
Canonical Name: FormalGrammar
Type: Definition

Created on: 2006-12-09 14:38:42
Modified on: 2007-11-11 18:57:46

Creator: CWoo
Modifier: CWoo
Author: CWoo
Author: Lando47

Classification: msc:03D05, msc:68Q45, msc:68Q42, msc:91F20
Defines: formal language, terminal, non-terminal, starting symbol, production, generable by a formal grammar
Synonyms: formal grammar=phrase-structure grammar
formal grammar=unrestricted grammar
formal grammar=grammar
formal grammar=phrase-structure language
formal grammar=terminal symbol
formal grammar=non-terminal symbol
formal grammar=initial-symbol
formal grammar=initial symbol
formal grammar=start symbol

Preamble:

% this is the default PlanetMath preamble. as your knowledge
% of TeX increases, you will probably want to edit this, but
% it should be fine as is for beginners.

% almost certainly you want these
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{amsfonts}

% used for TeXing text within eps files
%\usepackage{psfrag}
% need this for including graphics (\includegraphics)
%\usepackage{graphicx}
% for neatly defining theorems and propositions
%\usepackage{amsthm}
% making logically defined graphics
%\usepackage{xypic}

% there are many more packages, add them here as you need them

% define commands here
\newcommand{\derive}{\stackrel{*}{\Rightarrow}}
Content:

\PMlinkescapeword{rewriting rules}

\subsubsection*{Introduction}

A grammar, loosely speaking, is a set of rules that can be applied to words to generate sentences in a language. For example, with the grammar of the English language, one can form syntactically correct sentences such as ``The elephant drove his bicycle to the moon,'' regardless whether the sentence is meaningful or not.

The mathematical abstraction of a grammar is known as a \emph{formal grammar}. Instead of generating sentences from words, a formal grammar generates words from symbols and other words. The following basic ingredients are necessary in a formal grammar:
\begin{itemize}
\item a collection of symbols called an alphabet,
\item a collection of rules, called \emph{rewriting rules}, specifying how one can \emph{generate} new words from existing ones, and
\item a collection of \emph{initial} words that serve to initialize the generation of new words.
\end{itemize}
To see how these rewriting rules work, let us look at an exaample. Let $\lbrace a,b\rbrace$ be the alphabet as well as the set of initial words. With the rewriting rules given by: from a word $x$ we can form the word $ax$, as well as the word $xa$, we would be able to generate words like $$aa, \quad aaa, \quad ab, \quad baa$$ However, words such as $$bb, \quad baba, \quad baaab$$ can not be produced.

Note that by adding an extra symbol $\sigma$ to the above alphabet, and two additional rewriting rules given by ``from $\sigma$ form $a$'' and ``from $\sigma$ form $b$'', it is not hard to see that any word that can be generated by the first grammar can be generated by this new grammar.

\subsubsection*{Definition}

Formalizing what we have discussed above, we say that a \emph{formal grammar} $G$ is a quadruple $(\Sigma,N,P,\sigma)$, where
\begin{enumerate}
\item $(\Sigma,P)$ is a rewriting system;
\item $N$ is a subset of $\Sigma$ whose elements are called \emph{non-terminals}, and $T:=\Sigma-N$ the set of \emph{terminals};
\item an element $\sigma\in N$ called the \emph{starting symbol}.
\end{enumerate}
Instead of writing $G=(\Sigma,N,P,\sigma)$, the quadruple $(T,N,P,\sigma)$ is another way of representing $G$ (as long as the conditions $\Sigma=T\cup N$ and $T\cap N=\varnothing$ are satisfied).

A formal grammar is variously known as a \emph{phrase-structure grammar}, an \emph{unrestricted grammar}, or simply a \emph{grammar}. A formal grammar is sometimes also called a rewriting system in the literature, although the two notions are distinct on PlanetMath.

Given a formal grammar $G$, the \emph{formal language} generated by $G$ is the set
$$L(G):=\lbrace w\in T^*\mid \sigma \derive w\rbrace,$$
where $\derive$ is the derivability relation in the rewriting system $(\Sigma,P)$. A \emph{formal language} is also known as a \emph{phrase-structure language}.

With this defined, two more concepts follow:
\begin{enumerate}
\item Two formal grammars $G_1=(\Sigma_1,N_1,P_1,\sigma_1)$ and $G_2=(\Sigma_2,N_2,P_2,\sigma_2)$ are said to be \emph{equivalent} if $L(G_1)=L(G_2)$. When $G_1$ is equivalent to $G_2$, we write $G_1\cong G_2$.
\item A language $L$ over an alphabet $A$ is said to be \emph{generable by a formal grammar} if there is a formal grammar $G$ such that $L=L(G)\cap A^*$.
\end{enumerate}

\textbf{Example}. Continuing from the example in the previous section, we can put $T=\lbrace a,b\rbrace$ and $N=\lbrace \sigma\rbrace$. For the set $P$ of productions, we have four
\begin{enumerate}
\item $\sigma \to \sigma a$
\item $\sigma \to a\sigma$
\item $\sigma \to a$
\item $\sigma \to b$
\end{enumerate}
Then $G=(\Sigma,N,P,\sigma)$ is a formal grammar. It is not hard to see that $\sigma \derive baa$, as $\sigma \to \sigma a \to \sigma aa \to baa$. In fact, $L(G)$ consists of all words such that $a$ occurs at least once and $b$ occurs at most once.

\textbf{Remarks}.
\begin{itemize}
\item Not every language can be generated by a formal grammar. Given a finite alphabet $\Sigma$, $\Sigma^*$ is countably infinite, and therefore there are uncountably many languages over $\Sigma$. However, there are only a countably infinitely many languages that can be generated by formal grammars.
\item Every language generated by a formal grammar is recursively enumerable.
\item Every context-sensitive grammar is equivalent to a formal grammar, and under the Chomsky hierarchy, the class of formal languages is of class $0$.
\end{itemize}

\begin{thebibliography}{9}
\bibitem{hlcp} H.R. Lewis, C.H. Papadimitriou {\em Elements of the Theory of Computation}. Prentice-Hall, Englewood Cliffs, New Jersey (1981).
\end{thebibliography}