LL(k)


Given a word u and a context-free grammar G, how do we determine if uL(G)?

There are in general two ways to proceed. Start from u, and proceed backward to find v such that vu. Keep going until a derivation σ*u is found. This procedure is known as the bottom-up parsing of u. The other method the top-down approach: begin with the starting symbol σ, and work its way down to u, so σ*u.

As with the bottom-up approach, finding a derivation of u from the top-down may be time consuming, if one is lucky enough to find a derivation at all.

There is a class of grammars, known as the LL(k) grammars, which make the top-down parsing of a word natural and direct. The first L in LL(k) means scanning the symbols of u from left to right, the second L stands for finding a leftmost derivation (L) for u, and k means having the allowance to look at up to k symbols ahead while scanning.

Definition. Let G=(Σ,N,P,σ) be a context-free grammar such that σσ is not a production of G, and k0 an integer. Suppose uL(G) with a XU1 a production in a leftmost derivation of u:

σL*UXU2LUU1U2L*u.

Let n=|U|+k and v be the prefix of u of length n (if |u|<n, then set v=u).

Then G is said to be LL(k) if for any wL(G), with v as a prefix, such that there is a production XW1 in a leftmost derivation of w:

σL*UXW2LUW1W2L*w,

implies that W1=U1.

In a leftmost derivation Du of a word u, call a prefix v of u is a leftmost descendant of a production PU if σ*vPUvUU*u is Du. Then the definition above can be restated in words as follows:

Given a leftmost derivation Du of a word u, a production used in Du is uniquely determined up to k symbols beyond the prefix of u which is a leftmost descendant of the production. In other words, if Du and Dw are leftmost derivations of u and w which agree on k symbols beyond the common prefix v, where v is both a leftmost descendant of XU used in Du, and a leftmost descendant of XW used in Dw, then XU and XW are the same production, i.e. U=W.

Every LL(k) is unambiguous. Furthermore, every LL(k) grammar is LR(k) (http://planetmath.org/LRk).

Given a context-free grammar G and k0, there is an algorithmMathworldPlanetmath deciding whether G is LL(k).

Examples

  • The grammar G over Σ={a,b}, with productions σa2σb2, σa and σλ is LL(2) but not LL(1). It is not hard to see that L(G) is the set {ambnn is even, and nmn+1}. On the other hand, the grammar G over Σ, with productions

    σaX,σλ,XaYb,Xλ,YaXb,Yb

    also generates L(G), but is LL(1) instead.

  • The grammar G over {a,b,c}, with productions

    σX,σY,XaXb,Xab,YaYc,Yac

    is not LL(k) for any k0.

Definition A languagePlanetmathPlanetmath is said to be LL(k) if it is generated by an LL(k) grammar. The family of LL(k) languages is denoted by (k).

It is easy to see that an LL(0) contains no more than one word. Furthermore, it can be shown that

(0)(1)(k),

and the inclusion is strict. If (k) denotes the family of λ-free LL(k) languages, then

(0)=(1)==(k)=.

Given two LL(k) grammars G1 and G2, there is an algorithm that decides if L(G1)=L(G2).

References

Title LL(k)
Canonical name LLk
Date of creation 2013-03-22 19:00:54
Last modified on 2013-03-22 19:00:54
Owner CWoo (3771)
Last modified by CWoo (3771)
Numerical id 8
Author CWoo (3771)
Entry type Definition
Classification msc 68Q05
Classification msc 68Q42
Classification msc 03D10
Related topic LRk