derivation language

Background

Let $G=(\Sigma,N,P,\sigma)$ be a formal grammar. A pair $(W_{1},W_{2})$ of words over $\Sigma$ is said to correspond to the production $p\to q$ if

W_{1}=u_{1}pv_{1}\qquad\mbox{and}\qquad W_{2}=u_{2}qv_{2}

for some words $u_{i},v_{j}$ over $\Sigma$ . We also say that $(W_{1},W_{2})$ is a derivation step, and write $W_{1}\to W_{2}$ .

Recall that a derivation in $G$ is a finite sequence of words

W_{1},W_{2},\ldots,W_{n}

over $\Sigma$ such that $W_{i}\Rightarrow W_{i+1}$ , for $i=1,\ldots,n-1$ . The derivation is also written

W_{1}\Rightarrow W_{2}\Rightarrow\cdots\Rightarrow W_{n}.

The derivation above has $n-1$ steps. Zero-step derivations are also permitted. These are just words over $\Sigma$ .

The reflexive transitive closure of $\Rightarrow$ is $\Rightarrow^{*}$ . Thus, $V\Rightarrow^{*}W$ means that there is a derivation starting with $V$ and ending with $W$ . There may be more than one derivation from $V$ to $W$ .

If we consider each production as a “letter” in the alphabet $P$ , then the above derivation can be represented by a “word” over $P$ in the following manner: the “word” is formed by taking concatenations of the “letters”, where concatenations correspond to successive applications of productions in $P$ :

[p_{1}\to q_{1}][p_{2}\to q_{2}]\cdots[p_{n-1}\to q_{n-1}].

Derivation language is thus a certain collection of derivation words, formally defined below.

Definitions

Treating productions as “letters” is really nothing more than labeling each production with some symbol. Formally, call a labeling of an alphabet $P$ a surjection $f:F\to P$ , where $F$ is some alphabet. For any $p\in P$ , a label for $p$ is an element $x\in F$ such that $f(x)=p$ . We will only be interested in injective labeling (hence a bijection) from now on.

Definition. Suppose we are given a labeling $f$ of the set $P$ of productions in the grammar $G$ . Given a derivation $D:W_{1}\Rightarrow W_{2}\Rightarrow\ldots\Rightarrow W_{n}$ , a derivation word $U$ for $D$ is defined inductively as follows:

1.

if $n=1$ , then the empty word $U:=\lambda$ ,
2.

if $n=2$ , then $U:=x\in F$ is a label for a production that $W_{1}\Rightarrow W_{2}$ corresponds to,
3.
if $n>2$ , then $U:=U_{1}U_{2}$ , where
- –
  
  $U_{1}$ is a derivation word for the derivation $W_{1}\Rightarrow\cdots\Rightarrow W_{i}$ ,
- –
  
  $U_{2}$ is a derivation word for the derivation $W_{i}\Rightarrow\cdots\Rightarrow W_{n}$ .

If $U$ is a derivation word for derivation $D$ , let us abbreviate this by writing $f[U]=D$ . Note that we are not applying the labeling $f$ to $U$ , it is merely a notational convenience.

A derivation word is sometimes called a control word, for it defines or controls whether and how a word may be derived from another word. Note that any $W_{1}\Rightarrow^{*}W_{2}$ may correspond to several distinct derivations, and hence several distinct derivation words. Also, the same derivation word may correspond to distinct derivations.

For example, let $G$ be a grammar over two symbols $($ and $)$ with productions $\sigma\to\lambda$ , $\sigma\to(\sigma)$ , and $\sigma\to\sigma\sigma$ ( $G$ generates the parenthesis language $\boldsymbol{\operatorname{Paren}_{1}}$ ) Label the productions as $a, b, c$ respectively. Then the derivation $\sigma\Rightarrow^{*}(()())$ correspond to derivation words $b c b b a a$ and $b c b a b a$ . Notice that $\sigma\sigma\Rightarrow(\sigma)\sigma$ and $\sigma\sigma\Rightarrow\sigma(\sigma)$ both correspond to the derivation word $b$ .

Definition. The derivation language of a grammar $G=(\Sigma,N,P,\sigma)$ is the set of all derivation words for derivations on words generated by $G$ . In other words, consider the labeling $f:F\to P$ . The derivation language of $G$ is the set

\{U\in F^{*}\mid f[U]\mbox{ is a derivation of the form }\sigma\Rightarrow^{*}% u\mbox{ for some }u\in N^{*}\}.

The derivation language of $G$ is also called the Szilard language of $G$ , named after its inventor, and is denoted by $\operatorname{Sz}(G)$ .

For example, let $G$ be the grammar over a the letter $a$ , with productions given by $\sigma\to\sigma$ , $\sigma\to a$ . If the productions are labeled $b, c$ , then $\operatorname{Sz}(G)=\{b^{n}c\mid n\geq 0\}$ . Note that $L(G)=\{a\}$ . Likewise, if $G^{\prime}$ is the grammar over $a$ , with productions $\sigma\to A\sigma$ , $A\to\lambda$ , and $\sigma\to a$ , labeled $p, q, r$ respectively, then $L(G^{\prime})=\{a\}$ . However, $\operatorname{Sz}(G^{\prime})$ is quite different from $\operatorname{Sz}(G)$ :

	$\displaystyle\operatorname{Sz}(G^{\prime})$	$\displaystyle=\{u\in F^{*}\mid$	$\displaystyle u=vrw\mbox{, where }v\in\{p,q\}^{},w\in\{q\}^{},$
			$\displaystyle\#_{u}(p)=\#_{u}(q)\mbox{ and }\#_{x}(p)\geq\#_{x}(q)\mbox{ for % all }x\leq u\}$

where

•

$F=\{p,q,r\}$ ,
•

$\#_{u}(r)$ means the number of occurrences of $r$ in word $u$ ,
•

$v\leq u$ means that $v$ is a prefix of $u$ .

Remarks. Let $G$ be a formal grammar.

•
Some properties of $\operatorname{Sz}(G)$ :
1. (a)
  
  $\operatorname{Sz}(G)$ is always context-sensitive.
2. (b)
  
  If $G$ is regular, so is $\operatorname{Sz}(G)$ .
3. (c)
  
  if $G$ is context-free, $\operatorname{Sz}(G)$ may not be; in fact, for any context-free language $L$ , there is a context-free grammar $G$ such that $L=L(G)$ but $\operatorname{Sz}(G)$ is not context-free.
4. (d)
  
  There exists a context-free language $L$ such that $\operatorname{Sz}(G)$ is not context-free for any grammar $G$ generating $L$ .
•

However, if one considers the subset $\operatorname{Sz}_{L}(G)$ of $\operatorname{Sz}(G)$ consisting of all derivation words corresponding to leftmost derivations, then $\operatorname{Sz}_{L}(G)$ is context-free.
•

It is an open question that, given any (context-sensitive) language $L$ , whether there is a grammar $G$ such that $L=\operatorname{Sz}(G)$ .

References

1 A. Salomaa, Formal Languages, Academic Press, New York (1973).
2 G. E. Révész, Introduction to Formal Languages, Dover Publications (1991).

Title	derivation language
Canonical name	DerivationLanguage
Date of creation	2013-03-22 18:58:15
Last modified on	2013-03-22 18:58:15
Owner	CWoo (3771)
Last modified by	CWoo (3771)
Numerical id	13
Author	CWoo (3771)
Entry type	Definition
Classification	msc 68Q42
Classification	msc 68Q45
Synonym	Szilard language