9.1. Parsing¶
9.1.1. Parsing Introduction¶
Parsing: Deciding if x∈Σ∗ is in L(G) for some CFG G.
Consider the CFG G:S→AaA→AA∣ABa∣λB→BBa∣b∣λIs ba in L(G)? Running time?How do you determine whether a string is in L(G)?Note: ba is not in L(G) for this G!Try all possible derivations, but don’t know when to stop. This runs forever!
9.1.2. Introduction Example (2)¶
Same grammar without lambda-rules:Remove λ-rules, then unit productions, and then useless productions from the grammar G above. New grammar G′ is:S→Aa∣aA→AA∣ABa∣Aa∣Ba∣aB→BBa∣Ba∣a∣bIs ba in L(G)? Running time?Try all possible derivations, there will be at most |w| rounds. NOTE THIS IS NOT LINEAR TIME, IT TAKES A LONG TIME. Actual time is |w|∗p where p is the maximum number of rules for any variable.
9.1.3. Introduction Example (3)¶
Grammar:S→Aa∣aA→AA∣ABa∣Aa∣Ba∣aB→BBa∣Ba∣a∣bConsider string baa.Goal: We would like to only try the rules that give us the derivation and ignore false paths. This would be fast!S⇒Aa⇒Baa⇒baa
9.1.4. Top-down Parser¶
9.1.5. Bottom-up Parser¶
9.1.6. Making Parse Tables¶
We want to construct simple tables that tell us:When you are working on this rule, and see this input, do somethingWe will use the functions FIRST and FOLLOW to aid in computing parse tables.Notation that we will use in defining FIRST and FOLLOW.G=(V,T,S,P)w,v∈(V∪T)∗a∈TX,A,B∈VXI∈(V∪T)+
9.1.7. The function FIRST¶
Definition: FIRST(w)= the set of terminals that begin strings derived from w.If w∗⇒av thena is in FIRST(w)If w∗⇒λ thenλ is in FIRST(w)
9.1.8. To compute FIRST (1)¶
1. FIRST(a)={a} where a is a terminal.2. FIRST(X) where X is a variable.(a) If X→aw thena is in FIRST(X)(b) If X→λ thenλ is in FIRST(X)(c) If X→Aw and λ∈FIRST(A) thenEverything in FIRST(w) is in FIRST(X)
9.1.9. To compute FIRST (2)¶
3. In general, FIRST(X1X2X3...XK)=* FIRST(X1)* ∪ FIRST(X2) if λ is in FIRST(X1)* ∪ FIRST(X3) if λ is in FIRST(X1)and λ is in FIRST(X2)…* ∪ FIRST(XK) if λ is in FIRST(X1)and λ is in FIRST(X2)… and λ is in FIRST(XK−1)* − {λ} if λ∉FIRST(XJ) for all J(where XI represents a terminal or a variable)
9.1.10. To compute FIRST (3)¶
We will be computing FIRST(w) where w is the right hand side of a rule.Thus, we will need to compute FIRST(X) for each symbol X (either terminal or variable) that appears in the right hand side of a rule.
9.1.11. Example (1)¶
L={anbmcn:n≥0,0≤m≤1}S→aSc∣BB→b∣λFIRST(B)={b,λ}Using B→b gives that b is in FIRST(B).Using B→λ gives that λ is in FIRST(B).FIRST(S)={a,b,λ}Using S→aSc gives that a is in FIRST(S).Using S→B and λ is in FIRST(B) gives that everything in FIRST(B) is in FIRST(S), so b and λ are in FIRST(S).FIRST(Sc)={a,b,c}
9.1.12. Example (2a)¶
S→BCD∣aDA→CEB∣aAB→b∣λC→dB∣λD→cA∣λE→e∣fE
9.1.13. Example (2b)¶
FIRST(B)=
FIRST(C)=
FIRST(D)=
FIRST(E)=
FIRST(A)=
FIRST(S)=
9.1.14. The function FOLLOW¶
Definition: FOLLOW(X)= set of terminals that can appear to the right of X in some derivation.(We only compute FOLLOW for variables.)If S∗⇒wAav thena is in FOLLOW(A)(where w and v are strings of terminals and variables, a is a terminal, and A is a variable)
9.1.15. Computing FOLLOW¶
$ is in FOLLOW(S)
If A→wBv and v≠λ then
FIRST(v)−{λ} is in FOLLOW(B)
If A→wB or A→wBv and λ is in FIRST(v) then
FOLLOW(A) is in FOLLOW(B)
λ is never in FOLLOW
9.1.16. Example (1)¶
S→aSc∣BB→b∣λReminder: λ is never in a FOLLOW set.
FOLLOW(S)={$,c}
$ goes into FOLLOW(S) by rule 1. Then c goes into FOLLOW(S) by rule 2 since S→aSc and FIRST(c)={c}.
FOLLOW(B)={$,c}
By rule 3 and S→B, FOLLOW(S) is added to FOLLOW(B).
9.1.17. Example (2a)¶
S→BCD∣aDA→CEB∣aAB→b∣λC→dB∣λD→cA∣λE→e∣fE
9.1.18. Example (2b)¶
FOLLOW(S)=
FOLLOW(A)=
FOLLOW(B)=
FOLLOW(C)=
FOLLOW(D)=
FOLLOW(E)=
9.1.19. LL(k) Parsing¶
We discussed this in principle before. Now we want to operationalize it.Note: A language is not LL(k), a grammar is.L={aiabci∣i>0}G1=S→aSc{aaa}S→aabc{aab}G2=S→aAA→Sc{aa}A→abc{ab}G3=S→aaAcA→aAc{a}A→b{b}
9.1.20. LL parsing process¶
9.1.21. Parsing routine (for this grammar)¶
symbol
is the lookahead symbol and $ is the end-of-string marker:state = s push(S) state = q read(symbol) obtain the lookahead symbol while top-of-stack <> z do while stack is not empty case top-of-stack of S: if symbol == a then cases for variables { pop(); push(aSb) } replace S by aSb else if symbol == b then { pop(); push(b) } replace S by b else error a: if symbol <> a, then error cases for terminals else { pop(); read(symbol) } pop a, get next lookahead b: if symbol <> b, then error else { pop(); read(symbol) } pop b, get next lookahead end case end while pop() pop z from the stack if symbol <> $ then error state = f
9.1.22. LL Parse Table¶
When the grammar is large, the parsing routine will have many cases.Alternatively, store the information for which rule to apply in a table.Rows: variablesColumns: terminals, $ (end of string marker)LL[i,j]
contains the right-hand-side of a rule. This right-hand-side is pushed onto the stack when the left-hand-side of the rule is the variable representing the i th row and the lookahead is the symbol representing the j th column.For any CFG that we can specify by this type of parse table, we can use a generic parser to determine if strings are in this language.Gets rid of use of states
9.1.23. Parse Table Example¶
Parse table for
L={anbbn:n≥0}S→aSb∣bab$SaSbberrorExample strings:aabbbb
9.1.24. A generic parsing routine¶
(
LL[,]
is the parse table.):push(S) read(symbol) obtain the lookahead symbol while stack not empty do case top-of-stack of terminal: if top-of-stack == symbol then { pop(); read(symbol) } pop terminal and get next lookahead else error variable: if LL[top-of-stack, symbol] <> error then { pop(), pop the lhs push(LL[top-of-stack,symbol]) } push the rhs else error end case end while if symbol <> $, then error
9.1.25. Example¶
S→aSbS→cabc$SaSberrorcerrorWhen S is on the stack and a is the lookahead, replace S by aSbWhen S is on the stack and b is the lookahead, there is an error (there must be a c between the a ‘s and b ‘s)When S is on the stack and $ is the lookahead, then there is an error (S must be replaced by at least one terminal)When S is on the stack, and c is the lookahead, then S should be replaced by c.
9.1.26. Example¶
S→Ac∣BcA→aAb∣λB→bWhen the grammar has a λ-rule, it can be difficult to compute parse tables.In this example, A can disappear (due to A→λ).So when S is on the stack, it can be replaced by Ac if either “a” or “c” are the lookahead, or it can be replaced by Bc if “b” is the lookahead.
9.1.27. Constructing an LL parse table¶
1. For each rule A→wa. For each a in FIRST(w)add w to LL[A,a]b. If λ is in FIRST(w)add w to LL[A,b] for each b in FOLLOW(A)where b∈T∪{$}2. Each undefined entry is an error.
9.1.28. Example (1): Need FIRST and FOLLOW¶
S→aSc∣BB→b∣λWe have already calculated FIRST and FOLLOW for this Grammar:
FIRSTFOLLOWSa,b,λ$,cBb,λ$,c
9.1.29. Example (2): Compute Parse Table¶
For S→aSc, FIRST(aSc)={a}, so add aSc toLL[S,a]
by step 1a.For S→B,FIRST(B)={b,λ}FOLLOW(S)={$,c}By step 1a, add B toLL[S,b]
By step 1b, add B toLL[S,c]
andLL[S,$]
For B→b, FIRST(b)={b}, so by step 1a add b toLL[B,b]
For B→λ FIRST(λ)={λ} and FOLLOW(B)={$,c},so by step 1b add λ toLL[B,c]
and add λ toLL[B,$]
.
9.1.30. Example (3): Sample Trace¶
abc$SaScBBBBerrorbλλParse string: aacc
aaSSBSSccccStack:S_c_c_c_c_c_c_c_symbol:aaa′a′cccc′where a′ is the second a in the string and
symbol
is the lookahead symbol. This table is an LL(1) table because only 1 symbol of lookahead is needed.
9.1.31. Example (4): Sample Trace¶
Trace aabcc
aaSSBbSScccccStack:S_c_c_c_c_c_c_c_c_symbol:aaa′a′bbbcc′where a′ is the second a in the string and
symbol
is the lookahead symbol. This table is an LL(1) table because only 1 symbol of lookahead is needed.
9.1.32. LL(k) Can’t Parse All CFGs¶
L={an:n≥0}∪{anbn:n≥0}S→AS→BA→aAA→λB→aBbB→λThis grammar cannot be recognized by an LL(k) parser for any k.Consider the string aabb. Need 3 lookahead to realize that we want S→B.For aaabbb, we need 4 lookahead.For anbn, we need n lookahead.
9.1.33. Conclusion¶
There are some CFL’s that have no LL(k) Parser
There are some languages for which some grammars have LL(k) parsers and some don’t.