LL(k)

Given a word $u$ and a context-free grammar $G$ , how do we determine if $u\in L(G)$ ?

There are in general two ways to proceed. Start from $u$ , and proceed backward to find $v$ such that $v\Rightarrow u$ . Keep going until a derivation $\sigma\Rightarrow^{*}u$ is found. This procedure is known as the bottom-up parsing of $u$ . The other method the top-down approach: begin with the starting symbol $\sigma$ , and work its way down to $u$ , so $\sigma\Rightarrow^{*}u$ .

As with the bottom-up approach, finding a derivation of $u$ from the top-down may be time consuming, if one is lucky enough to find a derivation at all.

There is a class of grammars, known as the $LL(k)$ grammars, which make the top-down parsing of a word natural and direct. The first $L$ in $LL(k)$ means scanning the symbols of $u$ from left to right, the second $L$ stands for finding a leftmost derivation ( $\Rightarrow_{L}$ ) for $u$ , and $k$ means having the allowance to look at up to $k$ symbols ahead while scanning.

Definition. Let $G=(\Sigma,N,P,\sigma)$ be a context-free grammar such that $\sigma\to\sigma$ is not a production of $G$ , and $k\geq 0$ an integer. Suppose $u\in L(G)$ with a $X\to U_{1}$ a production in a leftmost derivation of $u$ :

\sigma\Rightarrow_{L}^{*}UXU_{2}\Rightarrow_{L}UU_{1}U_{2}\Rightarrow_{L}^{*}u.

Let $n=|U|+k$ and $v$ be the prefix of $u$ of length $n$ (if $|u|<n$ , then set $v=u$ ).

Then $G$ is said to be $LL(k)$ if for any $w\in L(G)$ , with $v$ as a prefix, such that there is a production $X\to W_{1}$ in a leftmost derivation of $w$ :

\sigma\Rightarrow_{L}^{*}UXW_{2}\Rightarrow_{L}UW_{1}W_{2}\Rightarrow_{L}^{*}w,

implies that $W_{1}=U_{1}$ .

In a leftmost derivation $D_{u}$ of a word $u$ , call a prefix $v$ of $u$ is a leftmost descendant of a production $P\to U$ if $\sigma\Rightarrow^{*}vPU^{\prime}\Rightarrow vUU^{\prime}\Rightarrow^{*}u$ is $D_{u}$ . Then the definition above can be restated in words as follows:

Given a leftmost derivation $D_{u}$ of a word $u$ , a production used in $D_{u}$ is uniquely determined up to $k$ symbols beyond the prefix of $u$ which is a leftmost descendant of the production. In other words, if $D_{u}$ and $D_{w}$ are leftmost derivations of $u$ and $w$ which agree on $k$ symbols beyond the common prefix $v$ , where $v$ is both a leftmost descendant of $X\to U$ used in $D_{u}$ , and a leftmost descendant of $X\to W$ used in $D_{w}$ , then $X\to U$ and $X\to W$ are the same production, i.e. $U=W$ .

Every $LL(k)$ is unambiguous. Furthermore, every $LL(k)$ grammar is $LR(k)$ (http://planetmath.org/LRk).

Given a context-free grammar $G$ and $k\geq 0$ , there is an algorithm deciding whether $G$ is $LL(k)$ .

Examples

•

The grammar $G$ over $\Sigma=\{a,b\}$ , with productions $\sigma\to a^{2}\sigma b^{2}$ , $\sigma\to a$ and $\sigma\to\lambda$ is $LL(2)$ but not $LL(1)$ . It is not hard to see that $L(G)$ is the set $\{a^{m}b^{n}\mid n\mbox{ is even, and }n\leq m\leq n+1\}$ . On the other hand, the grammar $G^{\prime}$ over $\Sigma$ , with productions

$\sigma\to aX,\quad\sigma\to\lambda,\quad X\to aYb,\quad X\to\lambda,\quad Y\to aXb% ,\quad Y\to b$

also generates $L(G)$ , but is $LL(1)$ instead.
•

The grammar $G$ over $\{a,b,c\}$ , with productions

$\sigma\to X,\quad\sigma\to Y,\quad X\to aXb,\quad X\to ab,\quad Y\to aYc,\quad Y% \to ac$

is not $LL(k)$ for any $k\geq 0$ .

Definition A language is said to be $LL(k)$ if it is generated by an $LL(k)$ grammar. The family of $LL(k)$ languages is denoted by $\mathscr{LL}(k)$ .

It is easy to see that an $LL(0)$ contains no more than one word. Furthermore, it can be shown that

\mathscr{LL}(0)\subset\mathscr{LL}(1)\subset\cdots\subset\mathscr{LL}(k)% \subset\cdots,

and the inclusion is strict. If $\mathscr{LL}(k)^{\prime}$ denotes the family of $\lambda$ -free $LL(k)$ languages, then

\mathscr{LL}(0)^{\prime}=\mathscr{LL}(1)^{\prime}=\cdots=\mathscr{LL}(k)^{% \prime}=\cdots.

Given two $LL(k)$ grammars $G_{1}$ and $G_{2}$ , there is an algorithm that decides if $L(G_{1})=L(G_{2})$ .

References

1 A. Salomaa, Formal Languages, Academic Press, New York (1973).

Title	LL(k)
Canonical name	LLk
Date of creation	2013-03-22 19:00:54
Last modified on	2013-03-22 19:00:54
Owner	CWoo (3771)
Last modified by	CWoo (3771)
Numerical id	8
Author	CWoo (3771)
Entry type	Definition
Classification	msc 68Q05
Classification	msc 68Q42
Classification	msc 03D10
Related topic	LRk