|
|
|
Viewing Version
25
of
'formal grammar'
|
[ view 'formal grammar'
|
back to history
]
| Title of object: |
formal grammar |
| Canonical Name: |
FormalGrammar |
| Type: |
Definition |
| Created on: |
2006-12-09 14:38:42 |
| Modified on: |
2007-11-11 18:57:46 |
| Classification: |
msc:03D05, msc:68Q45, msc:68Q42, msc:91F20 |
| Defines: |
formal language, terminal, non-terminal, starting symbol, production, generable by a formal grammar |
| Synonyms: |
formal grammar=phrase-structure grammar formal grammar=unrestricted grammar formal grammar=grammar formal grammar=phrase-structure language formal grammar=terminal symbol formal grammar=non-terminal symbol formal grammar=initial-symbol formal grammar=initial symbol formal grammar=start symbol |
Preamble:
% this is the default PlanetMath preamble. as your knowledge
% of TeX increases, you will probably want to edit this, but
% it should be fine as is for beginners.
% almost certainly you want these
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{amsfonts}
% used for TeXing text within eps files
%\usepackage{psfrag}
% need this for including graphics (\includegraphics)
%\usepackage{graphicx}
% for neatly defining theorems and propositions
%\usepackage{amsthm}
% making logically defined graphics
%\usepackage{xypic}
% there are many more packages, add them here as you need them
% define commands here
\newcommand{\derive}{\stackrel{*}{\Rightarrow}} |
Content:
\PMlinkescapeword{rewriting rules}
\subsubsection*{Introduction}
A grammar, loosely speaking, is a set of rules that can be applied to words to generate sentences in a language. For example, with the grammar of the English language, one can form syntactically correct sentences such as ``The elephant drove his bicycle to the moon,'' regardless whether the sentence is meaningful or not.
The mathematical abstraction of a grammar is known as a \emph{formal grammar}. Instead of generating sentences from words, a formal grammar generates words from symbols and other words. The following basic ingredients are necessary in a formal grammar:
\begin{itemize}
\item a collection of symbols called an alphabet,
\item a collection of rules, called \emph{rewriting rules}, specifying how one can \emph{generate} new words from existing ones, and
\item a collection of \emph{initial} words that serve to initialize the generation of new words.
\end{itemize}
To see how these rewriting rules work, let us look at an exaample. Let $\lbrace a,b\rbrace$ be the alphabet as well as the set of initial words. With the rewriting rules given by: from a word $x$ we can form the word $ax$, as well as the word $xa$, we would be able to generate words like $$aa, \quad aaa, \quad ab, \quad baa$$ However, words such as $$bb, \quad baba, \quad baaab$$ can not be produced.
Note that by adding an extra symbol $\sigma$ to the above alphabet, and two additional rewriting rules given by ``from $\sigma$ form $a$'' and ``from $\sigma$ form $b$'', it is not hard to see that any word that can be generated by the first grammar can be generated by this new grammar.
\subsubsection*{Definition}
Formalizing what we have discussed above, we say that a \emph{formal grammar} $G$ is a quadruple $(\Sigma,N,P,\sigma)$, where
\begin{enumerate}
\item $(\Sigma,P)$ is a rewriting system;
\item $N$ is a subset of $\Sigma$ whose elements are called \emph{non-terminals}, and $T:=\Sigma-N$ the set of \emph{terminals};
\item an element $\sigma\in N$ called the \emph{starting symbol}.
\end{enumerate}
Instead of writing $G=(\Sigma,N,P,\sigma)$, the quadruple $(T,N,P,\sigma)$ is another way of representing $G$ (as long as the conditions $\Sigma=T\cup N$ and $T\cap N=\varnothing$ are satisfied).
A formal grammar is variously known as a \emph{phrase-structure grammar}, an \emph{unrestricted grammar}, or simply a \emph{grammar}. A formal grammar is sometimes also called a rewriting system in the literature, although the two notions are distinct on PlanetMath.
Given a formal grammar $G$, the \emph{formal language} generated by $G$ is the set
$$L(G):=\lbrace w\in T^*\mid \sigma \derive w\rbrace,$$
where $\derive$ is the derivability relation in the rewriting system $(\Sigma,P)$. A \emph{formal language} is also known as a \emph{phrase-structure language}.
With this defined, two more concepts follow:
\begin{enumerate}
\item Two formal grammars $G_1=(\Sigma_1,N_1,P_1,\sigma_1)$ and $G_2=(\Sigma_2,N_2,P_2,\sigma_2)$ are said to be \emph{equivalent} if $L(G_1)=L(G_2)$. When $G_1$ is equivalent to $G_2$, we write $G_1\cong G_2$.
\item A language $L$ over an alphabet $A$ is said to be \emph{generable by a formal grammar} if there is a formal grammar $G$ such that $L=L(G)\cap A^*$.
\end{enumerate}
\textbf{Example}. Continuing from the example in the previous section, we can put $T=\lbrace a,b\rbrace$ and $N=\lbrace \sigma\rbrace$. For the set $P$ of productions, we have four
\begin{enumerate}
\item $\sigma \to \sigma a$
\item $\sigma \to a\sigma$
\item $\sigma \to a$
\item $\sigma \to b$
\end{enumerate}
Then $G=(\Sigma,N,P,\sigma)$ is a formal grammar. It is not hard to see that $\sigma \derive baa$, as $\sigma \to \sigma a \to \sigma aa \to baa$. In fact, $L(G)$ consists of all words such that $a$ occurs at least once and $b$ occurs at most once.
\textbf{Remarks}.
\begin{itemize}
\item Not every language can be generated by a formal grammar. Given a finite alphabet $\Sigma$, $\Sigma^*$ is countably infinite, and therefore there are uncountably many languages over $\Sigma$. However, there are only a countably infinitely many languages that can be generated by formal grammars.
\item Every language generated by a formal grammar is recursively enumerable.
\item Every context-sensitive grammar is equivalent to a formal grammar, and under the Chomsky hierarchy, the class of formal languages is of class $0$.
\end{itemize}
\begin{thebibliography}{9}
\bibitem{hlcp} H.R. Lewis, C.H. Papadimitriou {\em Elements of the Theory of Computation}. Prentice-Hall, Englewood Cliffs, New Jersey (1981).
\end{thebibliography} |
|
|
|
|
|