Loading [MathJax]/jax/output/HTML-CSS/jax.js
Close
Close Window

Show Source |    | About   «  10.1. Parsing Introduction   ::   Contents   ::   10.3. LR Parsing  »

10.2. LL Parsing

10.2.1. LL Parsing

10.2.1.1. LL(k) Parser

  • Top-down parser: starts with start symbol on stack, and repeatedly replace nonterminals until string is generated.

  • Predictive parser: predict next rewrite rule
    NOTE: use lookahead for this
  • First L of LL means that we read input string left to right

  • Second L of LL means that we produce the leftmost derivation

    Note

    ASK TO SEE IF THEY KNOW WHAT THIS IS

  • k: number of lookahead symbols used.
    Sometimes more than one symbol is needed

10.2.1.2. LL parsing process

  • Convert CFG to PDA (different method than before)

  • Use the PDA and lookahead symbols

  • Lookahead symbol is next symbol in input string

Notes:

  • The PDA is nondeterministic, so we will lookahead to the next input symbol and use it to determine which rewrite rule to use.

  • Nondeterministic, could use back-tracking, but this could take forever.

  • Remember: cannot necessarily construct a deterministic PDA from a NPDA.

10.2.1.3. Convert CFG to NPDA

NOTE: This is not the same construction method we used before. This method will apply to any CFG, even those that are not in GNF.

Idea: To derive a string with a CFG, start with the start symbol and repeatedly apply production rules until the string is derived. In order to simulate this process with an NPDA, start by pushing the start symbol on the stack. Whenever a production rule Aw would be applied, the variable A should be on top of the stack. A is popped (or replaced) and the right hand side of the rule, w, is pushed onto the stack. Whenever a terminal is on top of the stack, if it matches the next symbol in the input string, then it is popped from the stack. If it does not match, then this string is not in the language of the grammar. If starting with the start symbol S, one can apply replacement rules, match all the terminals in the input string and empty the stack, then the string is in the language.

Note

Just mention this stuff and then draw NPDA.

The constructed NPDA:

  • Three states: s,q,f
    As usual, start in state s
    Push S on stack, move into q
    All rewrite rules in state q: If left-hand-side of rewrite rule on top of stack, replace it with right-hand-side of rewrite rule and stay in state q
    Additional rules in q to recognize terminals: Read input symbol, pop input symbol, stay in state q
    Pop z from stack, move into f, accept

Example 10.2.1

L={anbbn:n0}
SaSbb
lt10pda1

Note that this is nondeterministic. You have to use the lookahead to decide which transition to take, in a sense adding determinism by using extra information.

Note

Trace aabbb.

  • symbol is a buffer that holds next input symbol (not processed right away)

  • We have gotten rid of the nondeterminism by using the lookahead symbol.

A parsing routine for this grammar:

symbol is the lookahead symbol and $ is the end-of-string marker:

state = s
push(S)
state = q
read(symbol)                               obtain the lookahead symbol
while top-of-stack <> z do                 while stack is not empty
   case top-of-stack of
   S: if symbol == a then                  cases for variables
            { pop(); push(aSb) }           replace S by aSb
         else if symbol == b then
            { pop(); push(b) }             replace S by b
         else error
      a: if symbol <> a, then error        cases for terminals
         else { pop(); read(symbol) }      pop a, get next lookahead
      b: if symbol <> b, then error
         else { pop(); read(symbol) }      pop b, get next lookahead
      end case
end while
pop()                                      pop z from the stack
if symbol <> $ then error
state = f

What are the drawbacks?

  • For a larger grammar, case statement can get quite long

  • Can put the case statement into a generic routine

10.2.1.4. LL Parse Table: 2-dimensional array

When the grammar is large, the parsing routine will have many cases. Alternatively, store the information for which rule to apply in a table.

  • Rows: variables

  • Columns: terminals, $ (end of string marker)

  • LL[i,j] contains the right-hand-side of a rule. This right-hand-side is pushed onto the stack when the left-hand-side of the rule is the variable representing the i th row and the lookahead is the symbol representing the j th column.

  • If we can specify any CFG by this type of parse table, then we can use a generic parser to determine if strings are in this language.

  • Gets rid of use of states

Example 10.2.2

Parse table for

L={anbbn:n0}
SaSbb
ab$SaSbberror

10.2.1.5. A generic parsing routine

Idea: To replace a variable on the top of the stack with its appropriate right-hand-side, use the lookahead and the left-hand-side to look up the right-hand-side in the LL parse table. (LL[,] is the parse table.):

push(S)
read(symbol)                                         obtain the lookahead symbol
while stack not empty do
   case top-of-stack of
      terminal:
         if top-of-stack == symbol
            then { pop(); read(symbol) }             pop terminal and get next lookahead
         else
            error
      variable:
         if LL[top-of-stack, symbol] <> error
            then { pop(),                            pop the lhs
                   push(LL[top-of-stack,symbol]) }   push the rhs
            else
               error
      end case
end while
if symbol <> $, then error

Note

For previous example, try the following traces:

Parse the string: aabbb

Parse the string: b

Example 10.2.3

SaSbScabc$SaSberrorcerror

In this example, it is clear that when S is on the stack and a is the lookahead, replace S by aSb. When S is on the stack and b is the lookahead, there is an error, because there must be a c between the a ‘s and b ‘s. When S is on the stack and $ is the lookahead, then there is an error, since S must be replaced by at least one terminal. When S is on the stack, and c is the lookahead, then S should be replaced by c.

Example 10.2.4

SAcBcAaAbλBb

When the grammar has a λ-rule, it can be difficult to compute parse tables. In this example, A can disappear (due to Aλ), so when S is on the stack, it can be replaced by Ac if either “a” or “c” are the lookahead, or it can be replaced by Bc if “b” is the lookahead.

We will use the following functions FIRST and FOLLOW to aid in computing the table.

10.2.1.6. To construct an LL parse table LL[rows,cols]

Note

Refresh memory as to what parse table is.

  1. For each rule Aw

    1. For each a in FIRST(w)
      add w to LL[A,a]
    2. If λ is in FIRST(w)
      add w to LL[A,b] for each b in FOLLOW(A)
      where bT{$}
  2. Each undefined entry is an error.

Example 10.2.5

SaScB
Bbλ

We have already calculated FIRST and FOLLOW for this Grammar:

FIRSTFOLLOWSa,b,λ$,cBb,λ$,c

To Compute the LL Parse Table for this example:

  • For SaSc,
    FIRST(aSc)={a}, so add aSc to LL[S,a] by step 1a.
  • For SB,
    FIRST(B)={b,λ}
    FOLLOW(S)={$,c}
    By step 1a, add B to LL[S,b]
    By step 1b, add B to LL[S,c] and LL[S,$]
  • For Bb,
    FIRST(b)={b}, so by step 1a add b to LL[B,b]
  • For Bλ
    FIRST(λ)={λ} and FOLLOW(B)={$,c}, so by step 1b add λ to LL[B,c] and add λ to LL[B,$].

LL(1) Parse Table

abc$SaScBBBBerrorbλλ

Parse string: aacc

aaSSBSSccccStack:S_c_c_c_c_c_c_c_symbol:aaaacccc

where a is the second a in the string and symbol is the lookahead symbol. This table is an LL(1) table because only 1 symbol of lookahead is needed.

Example 10.2.6

Trace aabcc

aaSSBbSScccccStack:S_c_c_c_c_c_c_c_c_symbol:aaaabbbcc

where a is the second a in the string and symbol is the lookahead symbol. This table is an LL(1) table because only 1 symbol of lookahead is needed.

Example 10.2.7

Construct Parse Table for:
L={anbncamcbm:n0,m0}
SAcB
AaAb
Aλ
BaBb
Bc
FIRST(A)={a,λ}
FIRST(S)={a,c}
NOTE: λ is not in FIRST(S)
FIRST(B)={a,c}
FOLLOW(A)={b,c}
FOLLOW(S)={$}
FOLLOW(B)={b,$}
To compute the parse table:
  • For SAcB,
    FIRST(AcB)={a,c} so add AcB to LL[S,a] and LL[S,c]
  • For AaAb,
    FIRST(aAb)={a} so add aAb to LL[A,a]
  • For Aλ,
    FIRST(λ)={λ} and FOLLOW(A)={b,c} so add λ to LL[A,b] and LL[A,c]
  • For BaBb,
    FIRST(aBb)={a} so add aBb to LL[B,a]
  • For Bc,
    FIRST(c)={c} so add c to LL[B,c]
  • All other entries are errors.
LL(1) Parse Table:
abc$SAcBerrorAcBerrorAaAbλλerrorBaBberrorcerror

parse string: abcacb

parse string: cc

parse string: abcab (not in language)

Example 10.2.8

L={anbncamcbm:n1,m1}

SAcB
AaAb
Aab
BaBb
Bacb
FIRSTFOLLOWSa$Aac,bBab,$

Note that FIRST and FOLLOW are quite easy to calculate since there are no λ rules! In this case, you don’t need FOLLOW to construct the parse table.

Try to construct LL(1) Parse table

abc$SAcBerrorerrorerrorAaAberrorerrorerrorabBaBberrorerrorerroracb

Note that you don`t know which rewrite rule to apply to replace A and B with just one lookahead symbol. A has two choices and both use a lookahead of ‘a’. There are two entries in the LL(1) parse table for T[A,a]. Thus, there is no LL(1) parse table. This means the grammar is not LL(1)!. We will try to use 2 symbols of lookahead.

For example, the string aabbcaacbb cannot be parsed with just one lookahead.

LL(2) Parse Table:

aaabaca$bc$SAcBAcBerrorerrorerrorerrorerrorAaAbaberrorerrorerrorerrorerrorBaBberroracberrorerrorerrorerror

There are no conflicts (only one rule in each entry of the table). This is an LL(2) parser - need two lookahead symbols.

parse string: aabbcacb

aaAAbbAbbbbbacccccccccStack:S_B_B_B_B_B_B_B_B_b_b_b_symbol:aaaaaaababbbbccaacaccbb$

Note the leftmost derivation! Also note that the two lookahead symbols are used whenever there is a variable on top of the stack.

An LL(k) parser needs k lookahead symbols.

Note

Mention that LL parser doesn’t work if the grammar is left recursive.

Example 10.2.9

L={an:n0}{anbn:n0}

SA
SB
AaA
Aλ
BaBb
Bλ

This grammar cannot be recognized by an LL(k) parser for any k! Consider the string aabb. You would need 3 lookahead to realize that you want to use SB. Consider the string aaabbb, you would need 4 lookahead. Consider string anbn, you would need n lookahead. There is no (constant) k such that k lookahead works for every string in the language.

Example 10.2.10

An LL(11) parser will work since all strings have 10 or fewer a ‘s.

Example 10.2.11

SbbCdBcc
BbBb
CcCc

This grammar is LL(5). We don’t know which S rule to apply with the string bbccd or bbcc$ until you have seen the fifth symbol.

This grammar cannot be recognized by an LL(k) parser.

When the lookahead is b, don’t know which rule to apply, either the second or third.

Comments:

There are some CFL’s that have no LL(k) Parser

There are some languages for which some grammars have LL(k) parsers and some don’t.

   «  10.1. Parsing Introduction   ::   Contents   ::   10.3. LR Parsing  »

nsf
Close Window