Parsing Introduction¶
1. Introduction¶
Parsing: Deciding if x∈Σ∗ is in L(G) for some CFG G.
Review: What have we done so far
Consider the CFG G:
Is ba in L(G)? Running time?
How do you determine whether a string is in L(G)?
Note ba is not in L(G) for this G!
Try all possible derivations, but don't know when to stop. This runs forever!
Same grammar without lambda-rules:
Remove λ-rules, then unit productions, and then useless productions from the grammar G above. New grammar G′ is:
Is ba in L(G)? Running time?
Note
Earlier I said this was linear time.
Try all possible derivations, there will be at most |w| rounds. NOTE THIS IS NOT LINEAR TIME, IT TAKES A LONG TIME. Actual time is |w|∗p where p is the maximum number of rules for any variable.
Note
Given grammar to represent the C Programming language, we would want to know if C programs are syntactically correct. This is one of the phases in a compiler.
Want this to run as fast as possible, don't want to sit there and wait for your program forever to compile.
We will be looking at parsing methods that are used in writing compilers. We would like to know which rule to apply next.
Consider string baa. We would like to only try the rules that give us the derivation and ignore false paths. This would be fast! S⇒Aa⇒Baa⇒baa
1.1. Top-down Parser:¶
Start with S and try to derive the string.
Examples: LL Parser, Recursive Descent
Bottom-up Parser:
Start with string, work backwards, and try to derive S.
Examples: Shift-reduce, Operator-Precedence, LR Parser
We will use the following functions FIRST and FOLLOW to aid in computing parse tables.
Some notation that we will use in defining FIRST and FOLLOW.
G=(V,T,S,P)w,v∈(V∪T)∗a∈TX,A,B∈VXI∈(V∪T)+
1.2. The function FIRST¶
Definition: FIRST(w)= the set of terminals that begin strings derived from w.
If w∗⇒av thena is in FIRST(w)If w∗⇒λ thenλ is in FIRST(w)
To compute FIRST:
FIRST(a)={a} where a is a terminal.
FIRST(X) where X is a variable.
If X→aw then
a is in FIRST(X)
If X→λ then
λ is in FIRST(X)
If X→Aw and λ∈FIRST(A) then
Everything in FIRST(w) is in FIRST(X)
In general, FIRST(X1X2X3...XK)=
FIRST(X1)
∪ FIRST(X2) if λ is in FIRST(X1)
∪ FIRST(X3) if λ is in FIRST(X1)
and λ is in FIRST(X2)
...
∪ FIRST(XK) if λ is in FIRST(X1)
and λ is in FIRST(X2)
... and λ is in FIRST(XK−1)
− {λ} if λ∉FIRST(XJ) for all J
(where XI represents a terminal or a variable)
We will be computing FIRST(w) where w is the right hand side of a rule. Thus, we will need to compute FIRST(X) for each symbol X (either terminal or variable) that appears in the right hand side of a rule.
Example 1
L={anbmcn:n≥0,0≤m≤1}
FIRST(B)={b,λ}
Using B→b gives that b is in FIRST(B). Using B→λ gives that λ is in FIRST(B).
FIRST(S)={a,b,λ}
Using S→aSc gives that a is in FIRST(S).
Using S→B and λ is in FIRST(B) gives that everything in FIRST(B) is in FIRST(S), so b and λ are in FIRST(S).
FIRST(Sc)={a,b,c}
Example 2
Note
Why do we not calculate FIRST(S) first?
FIRST(S)={b,d,c,λ,a}
FIRST(A)={d,e,f,a}
FIRST(B)={b,λ}
FIRST(C)={d,λ}
FIRST(D)={c,λ}
FIRST(E)={e,f}
1.3. The function FOLLOW¶
Definition: FOLLOW(X)= set of terminals that can appear to the right of X in some derivation. (We only compute FOLLOW for variables.)
If S∗⇒wAav thena is in FOLLOW(A)(where w and v are strings of terminals and variables, a is a terminal, and A is a variable)
To compute FOLLOW:
$ is in FOLLOW(S)
If A→wBv and v≠λ then
FIRST(v)−{λ} is in FOLLOW(B)
If A→wB or A→wBv and λ is in FIRST(v) then
FOLLOW(A) is in FOLLOW(B)
λ is never in FOLLOW
Example 3
Note
Do a sample derivation of aabcc and show that c follows S, c follows B.
Reminder: λ is never in a FOLLOW set.
FOLLOW(S)={$,c}
$ goes into FOLLOW(S) by rule 1. Then c goes into FOLLOW(S) by rule 2 since S→aSc and FIRST(c)={c}.
FOLLOW(B)={$,c}
By rule 3 and S→B, FOLLOW(S) is added to FOLLOW(B).
Example 4
FOLLOW(S)={$}
FOLLOW(A)={$}
FOLLOW(B)={d,c,e,f$}
FOLLOW(C)={c,e,f$}
FOLLOW(D)={$}
FOLLOW(E)={b,$}