LR(k)


Given a word u and a context-free grammar G, how do we determine if uL(G)?

One way is to look for any subword v of u such that there is a production Av. If this is successful, we may replace v with A in u to obtain a word w, so that wu. We may then repeat the process on w to obtain another word x such that xw (if successful). In the end, if everything works successfully, we arrive at the starting non-terminal symbol σ, and get a derivation σ*u as a result, so that uL(G). This procedure is known as the bottom-up parsing of the word u.

In general, unless one is very lucky, successfully finding a derivation σ*u requires many trials and errors, since at each stage, for a given word u, there may be several words w such that wu.

Nevertheless, there is a particular family of context-free grammars, called the LR(k) grammars, which make the bottom-up parsing described above straightforward in the sense that, given a word u, once a word w is found such that wRu, any other word w such that wRu forces w=w. Here, R is known as the rightmost derivation (meaning that u is obtained from w by replacing the rightmost non-terminal in w). The L in LR(k) means scanning the symbols of u from left to right, R stands for finding a rightmost derivation for u, and k means having the allowance to look at up to k symbols ahead while scanning.

The details are as follows:

Definition. Let G=(Σ,N,P,σ) be a context-free grammar such that σσ is not a production of G, and k0 an integer. Suppose U is any sentential form over Σ with the following setup: U=U1U2U3 where

  • U3 is a terminal word,

  • XU2 a production, and

  • σR*U1XU3RU.

Let n=|U1U2|+k, and Z the prefix of U of length n (if |U|<n, then set Z=U).

Then G is said to be LR(k) if W is another sentential form having Z as a prefix, with the following setup: W=W1W2W3, where

  • W3 is a terminal,

  • YW2 is a production, and

  • σR*W1YW3RW,

implies that

W1=U1,Y=X,and  W2=U2.

Simply put, if DU and DW are the rightmost derivations of U and W respectively, and if the prefix of U obtained by including k symbols beyond the last replacement in DU is also a prefix of W, then the prefix of U obtained by including k symbols beyond the last replacement in DU is also a prefix of W, where U and W are words at the next to the last step in DU and DW respectively. In particular, if U=W, then U=W. This implies that any derivable in an LR(k) grammar has a unique rightmost derivation, hence

Proposition 1.

Any LR(k) grammar is unambiguous.

Examples.

  • Let G be the grammar consisting of one non-terminal symbol σ (which is also the final non-terminal symbol), two terminal symbols a,b, with productions

    σaσb,σσb  and  σb.

    Then G is not LR(k) for any k0. For instance, look at the following two derivations of U=a2σb3:

    σ*aσb2a2σb3  and  σ*a2σb2a2σb3

    Here, U1=a, U2=σb. Let k=1. Then the criteria in the definition are satisfied. Yet, W1=a2U1. Therefore, G is not LR(1).

  • Note that the grammar G above generates the languagePlanetmathPlanetmath L={ambnn>m}, which can also be generated by the grammar with three non-terminal symbols σ,X,Y, with σ the final non-terminal symbol, where the productions are given by

    σXY,XaXb,Xλ,YYb,andYb.

    However, this grammar is LR(1).

Determining whether a context-free grammar is LR(k) is a non-trivial problem. Nevertheless, an algorithmMathworldPlanetmath exists for determining, given a context-free grammar G and a non-negative integer k, whether G is LR(k). On the other hand, without specifying k in advance, no algorithms exist that determine if G is LR(k) for some k.

Definition. A language is said to be LR(k) if it can be generated by an LR(k) grammar.

Theorem 1.

Every LR(k) language is deterministic context-free. Every deterministic context-free language is LR(1).

Hence, deterministic context-free languages are the same as LR(1) languages.

References

  • 1 A. Salomaa, Formal Languages, Academic Press, New York (1973).
  • 2 J.E. Hopcroft, J.D. Ullman, Formal Languages and Their RelationMathworldPlanetmath to Automata, Addison-Wesley, (1969).
Title LR(k)
Canonical name LRk
Date of creation 2013-03-22 19:00:31
Last modified on 2013-03-22 19:00:31
Owner CWoo (3771)
Last modified by CWoo (3771)
Numerical id 11
Author CWoo (3771)
Entry type Definition
Classification msc 03D10
Classification msc 68Q42
Classification msc 68Q05
Synonym LR(k)
Related topic LLk