4.Transforming CFGs (1)
We use grammars to represent a programming language.
Want to know: Is a given string (or program x) valid
(syntactically correct)?
Same as asking if it is in the language.
We have seen that (with restrictions on form of CFG),
we can determine whether w is in L(G) in
2|w| rounds, each step adding a terminal
or increasing in length.
This works, but it is not fast enough, that is, not linear!
Transforming CFGs (2)
Key question: Are there ways to transform (restrict) CFGs such that
1) We can process efficiently
2) without restricting the power of CFGs
<< What does it mean “without restricting the power of CFGs”? >>
Specifically, we look at restrictions on the right hand side of the
production rules.
Need to be able to automatically transform an arbitrary CFG
into an equivalent restricted CFG.
Substitution Theorem (1)
Sometimes it is easier to reason about grammars for languages
without λ.
(Besides, its pointless to keep expanding on productions that
derive to the empty string.)
Useful note: It would be easy to add λ to any
grammar by adding a new start symbol S0,
So in practice, if we can transform grammar G into
ˆG such that L(ˆG)=L(G)−{λ},
then the two are effectively equivalent.
Substitution Theorem (2)
Let G be a CFG. Suppose G contains
A→x1Bx2
where xi∈(V∪T)∗,
A and B are different variables,
and B has the productions
We can construct G′ from G by deleting
from P and adding to it
A→x1y1x2|x1y2x2|…|x1ynx2.
Substitution Theorem: L(G)=L(G′).
Substitution Theorem Example
Substitute to get ˆG:
A→a∣aaA∣ababbAc∣abbc
B→abbA∣b
Then the B productions become useless productions.
<< Question: Why don’t we also delete B rules? >>
Substitution Theorem Example
What was the point to this? Look at derivations.
Derivation under G for aaabbc:
Derivation under ˆG for aaabbc:
Useless Productions (1)
We left in the productions for B, but maybe there is no way
remaining to reach them.
Obviously they can go.
This example is not as obvious:
Definition: Variable A∈V is said to be useful if
and only if there is at least one w∈L(G) such that
We want to eliminate both types of useless production.
Theorem: (useless productions)
Let G be a CFG.
Then there exists ˆG that does not contain any useless
variables or productions such that L(G)=L(ˆG).
Process (1)
To Remove Useless Productions:
Let G=(V,T,S,P).
I. Compute V1= {Variables that can derive strings of terminals}
1. V1=∅
2. Repeat until no more variables added
* For every A∈V with A→x1x2…xn,
all xi∈(T∗∪V1), add A to V1
3. Take P1 as all productions in P whose
symbols are all in (V1∪T)
Then G1=(V1,T,S,P1) has no variables that can’t derive
strings.
NOTE: Now need to get rid of productions we can’t use.
Example (1)
S→aB∣bA
A→aA
B→Sa∣b
C→cBc∣a
D→bCb
E→Aa∣b
We process this to eliminate A.
Process (2)
II. Draw Variable Dependency Graph
For A→xBy, draw A→B.
Draw A in a circle, B in a circle, and an arc from
A to B.
Remove productions for V if there is no path from S to
V in the dependency graph.
Resulting Grammar G′ is such that L(G)=L(G′) and
G′ has no useless productions.
Example (2)
S→aB
B→Sa∣b
C→cBc∣a
D→bCb
E→Aa∣b
Example (2)
Now, do it again.
Removing λ Productions
NOTE: Last time talked about simpler CFG that had no
λ-productions, now we will show how to get rid of them.
Theorem (remove λ productions)
Let G be a CFG with λ not in L(G).
Then there exists a CFG G′ having no
λ-productions such that L(G)=L(G′).
Process: Removing λ Productions
1. Let Vn={A∣∃ production A→λ}
2. Repeat until no more additions
* if B→A1A2…Am and Ai∈Vn
for all i, then put B in Vn
THUS, Vn={A∣A∗⇒λ}
3. Construct G′ with productions P′ such that
* If A→x1x2…xm∈P,m≥1, then
put all productions formed when xj is replaced by
λ (for all xj∈Vn) such that
|rhs|≥1 into P′.
Example
S→Ab
A→BCB∣Aa
B→b∣λ
C→cC∣λ
G′:
S→Ab∣b
A→BCB∣BC∣BB∣CB∣B∣C∣Aa∣a
B→b
C→cC∣c
NOTE: Don’t add A→λ!
Unit Productions
Definition: Unit Production:
Consider removing unit productions: Suppose we have
This becomes: A→a∣ab
Unit Productions (2)
But what if we have
A→B becomes A→C
B→CB→A
C→AC→B
But we didn’t get rid of unit productions!
Removing Unit Productions
Theorem (Remove unit productions)
Let G=(V,T,S,P) be a CFG without
λ-productions.
Then there exists CFG G′=(V′,T′,S,P′) that does not
have any unit-productions and L(G)=L(G′).
Process
1. Find for each A, all B such that
A∗⇒B
(Draw a dependency graph howing relationship of Unit
productions. Just draw arc for each A→B rule.
Draw A in a circle, B in a circle,
and an arc from A to B.)
2. Construct G′=(V′,T′,S,P′) by
(a) Put all non-unit productions in P′
(b) For all A∗⇒B such that
B→y1∣y2∣…yn∈P′,
put A→y1∣y2∣…yn∈P′
Note the star in A∗⇒B
Never put a unit production in P′.
Example (1)
Unit Production Dependency Graph:
Example (2)
Remove the unit production rules, and add these rules:
Result:
S→a∣bc∣bb∣Aa
A→a∣bb∣bc
B→a∣bb∣bc
Theorem
Theorem: Let L be a CFL that does not contain λ.
Then there exists a CFG for L that does not have any
useless productions, λ-productions, or unit-productions.
Proof:
1. Remove λ-productions
2. Remove unit-productions
3. Remove useless productions
Order is important.
Removing λ-productions can create unit-productions!
There are additional examples in the book.
Chomsky Normal Form (CNF)
Definition: A CFG is in Chomsky Normal Form (CNF) if all
productions are of the form
where A,B,C∈V and a∈T.
Why would you want to put a grammar in this form?
Because it is easier to work with (reason about), so will see it
in future.
Theorem:
Any CFG G with λ not in
L(G) has an equivalent grammar in CNF.
Proof:
1. Remove λ-productions, unit productions, and
useless productions. (We already know how to do this.)
2. For every right-hand-side of length >1,
replace each terminal xi by a new variable
Cj and add the production Cj→xi.
Note: All productions are in the correct form or the
right-hand-side is a string of variables.
3. Replace every right-hand-side of length >2 by a
series of productions, each with right-hand-side of length 2.
Example (1)
(after step 1)
S→CBC1C2
B→b
C→CC3∣e
C1→c
C2→d
C3→c
Example (2)
(after step 2):
S→CZ1
Z1→BZ2
Z2→C1C2
B→b
C→CC3∣e
C1→c
C2→d
C3→c
NOTE: Can get rid of λ-productions and unit
productions first!
Greibach Normal Form (GNF)
Definition: A CFG is in Greibach normal form (GNF) if
all productions have the form
where a∈T and x∈V∗
This is like an s-grammar (or simple grammar, Linz page 144),
except the s-grammar definition includes a further restriction that
any pair (A,a) can occur at most in one rule.
This is so that you wouldn’t have to backtrack (only one
choice to match the derivation of a string).
So it is very restrictive.
GNF Theorem
For every CFG G, there exists a grammar in GNF.
Simple substitutions give us:
S→aAB∣bBB∣bB
A→aA∣bB∣b
B→b
What You Should Know
Know what usless productions, unit productions, etc. are
Don’t memorize the processes for eliminating them, but
understand how they work in principle.
Know what the GNF and CNF forms are.