Loading [MathJax]/jax/output/HTML-CSS/jax.js
Close
Close Window

Show Source |    | About   «  9.3. Turing Machine Exercises   ::   Contents   ::   10.2. LL Parsing  »

10.1. Parsing Introduction

10.1.1. Introduction

Parsing: Deciding if xΣ is in L(G) for some CFG G.

Review: What have we done so far

Consider the CFG G:

SAaAAAABaλBBBabλ

Is ba in L(G)? Running time?

How do you determine whether a string is in L(G)?

Note ba is not in L(G) for this G!

Try all possible derivations, but don’t know when to stop. This runs forever!

Same grammar without lambda-rules:

Remove λ-rules, then unit productions, and then useless productions from the grammar G above. New grammar G is:

SAaaAAAABaAaBaaBBBaBaab

Is ba in L(G)? Running time?

Note

Earlier I said this was linear time.

Try all possible derivations, there will be at most |w| rounds. NOTE THIS IS NOT LINEAR TIME, IT TAKES A LONG TIME. Actual time is |w|p where p is the maximum number of rules for any variable.

Note

Given grammar to represent the C Programming language, we would want to know if C programs are syntactically correct. This is one of the phases in a compiler.

Want this to run as fast as possible, don’t want to sit there and wait for your program forever to compile.

We will be looking at parsing methods that are used in writing compilers. We would like to know which rule to apply next.

Consider string baa. We would like to only try the rules that give us the derivation and ignore false paths. This would be fast! SAaBaabaa

10.1.1.1. Top-down Parser:

  • Start with S and try to derive the string.

    SaSb
    lt10ptree1
  • Examples: LL Parser, Recursive Descent

Bottom-up Parser:

  • Start with string, work backwards, and try to derive S.

    lt10ptree2
  • Examples: Shift-reduce, Operator-Precedence, LR Parser

When the grammar has a λ-rule, it can be difficult to compute parse tables. In the example above, A can disappear (due to Aλ), so when S is on the stack, it can be replaced by Ac if either “a” or “c” are the lookahead or it can be replaced by Bc if “b” is the lookahead.

We will use the following functions FIRST and FOLLOW to aid in computing parse tables.

10.1.1.2. The function FIRST

Some notation that we will use in defining FIRST and FOLLOW.

G=(V,T,S,P)
w,v(VT)
aT
X,A,BV
XI(VT)+

Definition: FIRST(w)= the set of terminals that begin strings derived from w.

If wav then
a is in FIRST(w)
If wλ then
λ is in FIRST(w)

Example from previous grammar: FIRST(aAb)={a}, since you have aAba...b, and FIRST(Ac)={a,c}

To compute FIRST:

  1. FIRST(a)={a} where a is a terminal.

  2. FIRST(X) where X is a variable.

    1. If Xaw then

      a is in FIRST(X)

    2. If Xλ then

      λ is in FIRST(X)

    3. If XAw and λFIRST(A) then

      Everything in FIRST(w) is in FIRST(X)

  3. In general, FIRST(X1X2X3...XK)=

    • FIRST(X1)

    •  FIRST(X2) if λ is in FIRST(X1)

    •  FIRST(X3) if λ is in FIRST(X1)

      and λ is in FIRST(X2)

    •  FIRST(XK) if λ is in FIRST(X1)

      and λ is in FIRST(X2)

      … and λ is in FIRST(XK1)

    •  {λ} if λFIRST(XJ) for all J

    (where XI represents a terminal or a variable)

We will be computing FIRST(w) where w is the right hand side of a rule. Thus, we will need to compute FIRST(X) for each symbol X (either terminal or variable) that appears in the right hand side of a rule.

Example 10.1.1

L={anbmcn:n0,0m1}

SaScBBbλ

FIRST(B)={b,λ}

Using Bb gives that b is in FIRST(B). Using Bλ gives that λ is in FIRST(B).

FIRST(S)={a,b,λ}

Using SaSc gives that a is in FIRST(S).

Using SB and λ is in FIRST(B) gives that everything in FIRST(B) is in FIRST(S), so b and λ are in FIRST(S).

FIRST(Sc)={a,b,c}

Example 10.1.2

SBCDaDACEBaABbλCdBλDcAλEefE

Note

Why do we not calculate FIRST(S) first?

FIRST(S)={b,d,c,λ,a}

FIRST(A)={d,e,f,a}

FIRST(B)={b,λ}

FIRST(C)={d,λ}

FIRST(D)={c,λ}

FIRST(E)={e,f}

10.1.1.3. The function FOLLOW

Definition: FOLLOW(X)= set of terminals that can appear to the right of X in some derivation. (We only compute FOLLOW for variables.)

If SwAav then
a is in FOLLOW(A)
(where w and v are strings of terminals and variables, a is a terminal, and A is a variable)

To compute FOLLOW:

  1. $ is in FOLLOW(S)

  2. If AwBv and vλ then

    FIRST(v){λ} is in FOLLOW(B)

  3. If AwB or AwBv and λ is in FIRST(v) then

    FOLLOW(A) is in FOLLOW(B)

  4. λ is never in FOLLOW

Example 10.1.3

SaScB
Bbλ

Note

Do a sample derivation of aabcc and show that c follows S, c follows B.

Reminder: λ is never in a FOLLOW set.

FOLLOW(S)={$,c}

$ goes into FOLLOW(S) by rule 1. Then c goes into FOLLOW(S) by rule 2 since SaSc and FIRST(c)={c}.

FOLLOW(B)={$,c}

By rule 3 and SB, FOLLOW(S) is added to FOLLOW(B).

Example 10.1.4

SBCDaD
ACEBaA
Bbλ
CdBλ
DcAλ
EefE

FOLLOW(S)={$}

FOLLOW(A)={$}

FOLLOW(B)={d,c,e,f$}

FOLLOW(C)={c,e,f$}

FOLLOW(D)={$}

FOLLOW(E)={b,$}

   «  9.3. Turing Machine Exercises   ::   Contents   ::   10.2. LL Parsing  »

nsf
Close Window