5.1. Simplifying CFGs and Normal Forms¶
5.1.1. Transforming CFGs (1)¶
We use grammars to represent a programming language.Want to know: Is a given string (or program x) valid (syntactically correct)?Same as asking if it is in the language.We have seen that (with restrictions on form of CFG), we can determine whether w is in L(G) in 2|w| rounds, each step adding a terminal or increasing in length.This works, but it is not fast enough, that is, not linear!
5.1.2. Transforming CFGs (2)¶
Key question: Are there ways to transform (restrict) CFGs such that1) We can process efficiently2) without restricting the power of CFGs<< What does it mean “without restricting the power of CFGs”? >>
Specifically, we look at restrictions on the right hand side of the production rules.Need to be able to automatically transform an arbitrary CFG into an equivalent restricted CFG.
5.1.3. Substitution Theorem (1)¶
Sometimes it is easier to reason about grammars for languages without λ.(Besides, its pointless to keep expanding on productions that derive to the empty string.)Useful note: It would be easy to add λ to any grammar by adding a new start symbol S0,S0→S∣λSo in practice, if we can transform grammar G into ˆG such that L(ˆG)=L(G)−{λ}, then the two are effectively equivalent.
5.1.4. Substitution Theorem (2)¶
Let G be a CFG. Suppose G contains A→x1Bx2where xi∈(V∪T)∗, A and B are different variables, and B has the productionsB→y1∣y2∣…∣yn.We can construct G′ from G by deletingA→x1Bx2from P and adding to itA→x1y1x2∣x1y2x2∣…∣x1ynx2.Substitution Theorem: L(G)=L(G′).
5.1.5. Substitution Theorem Example¶
Grammar G:A→a∣aaA∣abBcB→abbA∣bSubstitute to get ˆG:A→a∣aaA∣ababbAc∣abbcB→abbA∣bThen the B productions become useless productions.<< Question: Why don’t we also delete B rules? >>
5.1.6. Substitution Theorem Example¶
What was the point to this? Look at derivations.
Derivation under G for aaabbc:A⇒aaA⇒aaabBc⇒aaabbcDerivation under ˆG for aaabbc:A⇒aaA⇒aaabbc
5.1.7. Useless Productions (1)¶
We left in the productions for B, but maybe there is no way remaining to reach them. In that situation, they can go.This example is not as obvious: (What is wrong with the A production?)S→aSb∣λ∣AA→aADefinition: Variable A∈V is said to be useful if and only if there is at least one w∈L(G) such that S∗⇒xAy∗⇒wWe want to eliminate both types of useless production (ones that can’t be reached, and ones that don’t terminate).
5.1.8. Theorem: (useless productions)¶
Let G be a CFG.Then there exists ˆG that does not contain any useless variables or productions such that L(G)=L(ˆG).
5.1.9. Process (1)¶
To Remove Useless Productions:
Let G=(V,T,S,P).
I. Compute V1= {Variables that can derive strings of terminals}1. V1=∅2. Repeat until no more variables added* For every A∈V with A→x1x2…xn, all xi∈(T∗∪V1), add A to V13. Take P1 as all productions in P whose symbols are all in (V1∪T)Then G1=(V1,T,S,P1) has no variables that can’t derive strings.NOTE: Now need to get rid of productions we can’t use.
5.1.10. Example (1)¶
S→aB∣bAA→aAB→Sa∣bC→cBc∣aD→bCbE→Aa∣bWe process this to eliminate A:B, C, and E all have productions with just terminals, so they go into V1.Now, D has a production with just symbols that are terminals or variables already in V1, so it goes into V1.
5.1.11. Process (2)¶
II. Draw Variable Dependency GraphFor A→xBy, draw A→B.Make A and B be nodes with an arc from A to B.Remove productions for V if there is no path from S to V in the dependency graph.Resulting Grammar G′ is such that L(G)=L(G′) and G′ has no useless productions.
5.1.12. Example (2)¶
5.1.13. Example (3)¶
G′:S→aBB→Sa∣b
5.1.14. Removing λ Productions¶
NOTE: Last time talked about simpler CFG that had no λ productions, now we will show how to get rid of them.
Theorem (remove λ productions)Let G be a CFG with λ not in L(G).Then there exists a CFG G′ having no λ productions such that L(G)=L(G′).
5.1.15. Process: Removing λ Productions¶
1. Let Vn={A∣∃ production A→λ}2. Repeat until no more additions* if B→A1A2…Am and Ai∈Vn for all i, then put B in VnTHUS, Vn={A∣A∗⇒λ}3. Construct G′ with productions P′ such that* If A→x1x2…xm∈P,m≥1, then put all productions formed when xj is replaced by λ (for all xj∈Vn) such that |rhs|≥1 into P′.
5.1.16. Example: Step 2¶
S→AbA→BCB∣AaB→b∣λC→cC∣λ2. Repeat until no more additions* if B→A1A2…Am and Ai∈Vn for all i, then put B in VnTHUS, Vn={A∣A∗⇒λ}Which variables can derive λ?B and C directlyA indirectly (because BCB can become λ)
5.1.17. Example: Step 3¶
3. Construct G′ with productions P′ such that* If A→x1x2…xm∈P,m≥1, then put all productions formed when xj is replaced by λ (for all xj∈Vn) such that |rhs|≥1 into P′.G′:S→Ab∣bA→BCB∣BC∣BB∣CB∣B∣C∣Aa∣aB→bC→cC∣cNOTE: Don’t add A→λ!
5.1.18. Unit Productions¶
Definition: Unit Production:A→Bwhere A,B∈V.Consider removing unit productions: Suppose we haveA→BB→a∣abThis becomes: A→a∣ab
5.1.19. Unit Productions (2)¶
But what if we haveA→B becomes A→CB→CB→AC→AC→BBut we didn’t get rid of unit productions!
5.1.20. Removing Unit Productions¶
Theorem (Remove unit productions)Let G=(V,T,S,P) be a CFG without λ productions.Then there exists CFG G′=(V′,T′,S,P′) that does not have any unit productions and L(G)=L(G′).
5.1.21. Process¶
1. Find for each A, all B such that A∗⇒B(Draw a dependency graph showing relationship of Unit productions.Make a node for each variable. For any rule A⇒B, draw an arc from A to B.)2. Construct G′=(V′,T′,S,P′) by(a) Put all non-unit productions in P′(b) For all A∗⇒B such that B→y1∣y2∣…yn∈P′, put A→y1∣y2∣…yn∈P′Run DFS with A as root.Note the star in A∗⇒BNever put a unit production in P′.
5.1.22. Example (1)¶
Original:S→Aa∣BB→A∣bbA→a∣bc∣BUnit Production Dependency Graph:
5.1.23. Example (2)¶
Remove the unit production rules, and add these rules:S→a∣bc∣bbA→bbB→a∣bcResult:S→a∣bc∣bb∣AaA→a∣bb∣bcB→a∣bb∣bc
5.1.24. Theorem¶
Theorem: Let L be a CFL that does not contain λ. Then there exists a CFG for L that does not have any useless productions, λ productions, or unit productions.Proof:1. Remove λ productions2. Remove unit productions3. Remove useless productionsOrder is important. Removing λ productions can create unit productions!
5.1.25. Chomsky Normal Form (CNF)¶
Definition: A CFG is in Chomsky Normal Form (CNF) if all productions are of the formA→BC or A→awhere A,B,C∈V and a∈T.Why would you want to put a grammar in this form?Because it is easier to work with (reason about), so will see it in future.
5.1.26. Theorem:¶
Any CFG G with λ not in L(G) has an equivalent grammar in CNF.
Proof:1. Remove λ productions, unit productions, and useless productions. (We already know how to do this.)2. For every right-hand-side of length >1, replace each terminal xi by a new variable Cj and add the production Cj→xi.Note: All productions are in the correct form or the right-hand-side is a string of variables.3. Replace every right-hand-side of length >2 by a series of productions, each with right-hand-side of length 2.
5.1.27. Example (1)¶
S→CBcdB→bC→Cc∣e(after removing λ, usless, unit productions:)S→CBC1C2B→bC→CC3∣eC1→cC2→dC3→c
5.1.28. Example (2)¶
(after step 2):S→CZ1Z1→BZ2Z2→C1C2B→bC→CC3∣eC1→cC2→dC3→c
5.1.29. Greibach Normal Form (GNF)¶
Definition: A CFG is in Greibach normal form (GNF) if all productions have the formA→axwhere a∈T and x∈V∗This is like an s-grammar (or simple grammar), except the s-grammar definition includes a further restriction that any pair (A,a) can occur at most in one rule.
This is so that you wouldn’t have to backtrack (only one choice to match the derivation of a string). So it is very restrictive.
5.1.30. GNF Theorem¶
For every CFG G, there exists a grammar in GNF.See proof in modules.Example:S→ABA→aA∣bB∣bB→bSimple substitutions give us:S→aAB∣bBB∣bBA→aA∣bB∣bB→b
5.1.31. What You Should Know¶
Know what usless productions, unit productions, etc. areDon’t memorize the processes for eliminating them, but understand how they work in principle.Know what the GNF and CNF forms are.