Loading [MathJax]/jax/output/HTML-CSS/jax.js

4.Transforming CFGs (1)§

We use grammars to represent a programming language.
Want to know: Is a given string (or program x) valid (syntactically correct)?
Same as asking if it is in the language.
We have seen that (with restrictions on form of CFG), we can determine whether w is in L(G) in 2|w| rounds, each step adding a terminal or increasing in length.
This works, but it is not fast enough, that is, not linear!

Transforming CFGs (2)

Key question: Are there ways to transform (restrict) CFGs such that
1) We can process efficiently
2) without restricting the power of CFGs

<< What does it mean “without restricting the power of CFGs”? >>

Specifically, we look at restrictions on the right hand side of the production rules.
Need to be able to automatically transform an arbitrary CFG into an equivalent restricted CFG.

Substitution Theorem (1)

Sometimes it is easier to reason about grammars for languages without λ.
(Besides, its pointless to keep expanding on productions that derive to the empty string.)
Useful note: It would be easy to add λ to any grammar by adding a new start symbol S0,
S0Sλ
So in practice, if we can transform grammar G into ˆG such that L(ˆG)=L(G){λ}, then the two are effectively equivalent.

Substitution Theorem (2)

Let G be a CFG. Suppose G contains Ax1Bx2
where xi(VT), A and B are different variables, and B has the productions
By1|y2||yn.
We can construct G from G by deleting
Ax1Bx2
from P and adding to it
Ax1y1x2|x1y2x2||x1ynx2.
Substitution Theorem: L(G)=L(G).

Substitution Theorem Example

Grammar G:
AaaaAabBc
BabbAb
Substitute to get ˆG:
AaaaAababbAcabbc
BabbAb
Then the B productions become useless productions.

<< Question: Why don’t we also delete B rules? >>

Substitution Theorem Example

What was the point to this? Look at derivations.

Derivation under G for aaabbc:
AaaAaaabBcaaabbc
Derivation under ˆG for aaabbc:
AaaAaaabbc

Useless Productions (1)

We left in the productions for B, but maybe there is no way remaining to reach them. Obviously they can go.
This example is not as obvious:
SaSbλA
AaA
Definition: Variable AV is said to be useful if and only if there is at least one wL(G) such that
SxAyw

We want to eliminate both types of useless production.

Theorem: (useless productions)

Let G be a CFG.
Then there exists ˆG that does not contain any useless variables or productions such that L(G)=L(ˆG).

Process (1)

To Remove Useless Productions:

Let G=(V,T,S,P).

I. Compute V1= {Variables that can derive strings of terminals}
1. V1=
2. Repeat until no more variables added
* For every AV with Ax1x2xn, all xi(TV1), add A to V1
3. Take P1 as all productions in P whose symbols are all in (V1T)

Then G1=(V1,T,S,P1) has no variables that can’t derive strings.

NOTE: Now need to get rid of productions we can’t use.

Example (1)

SaBbA
AaA
BSab
CcBca
DbCb
EAab

We process this to eliminate A.

Process (2)

II. Draw Variable Dependency Graph
For AxBy, draw AB.
Draw A in a circle, B in a circle, and an arc from A to B.
Remove productions for V if there is no path from S to V in the dependency graph.
Resulting Grammar G is such that L(G)=L(G) and G has no useless productions.

Example (2)

SaB
BSab
CcBca
DbCb
EAab
uselessgraph

Example (2)

G1:
SaB
BSab
CcBca

Now, do it again.

G:
SaB
BSab

Removing λ Productions

NOTE: Last time talked about simpler CFG that had no λ-productions, now we will show how to get rid of them.

Theorem (remove λ productions)
Let G be a CFG with λ not in L(G).
Then there exists a CFG G having no λ-productions such that L(G)=L(G).

Process: Removing λ Productions

1. Let Vn={A production Aλ}
2. Repeat until no more additions
* if BA1A2Am and AiVn for all i, then put B in Vn
THUS, Vn={AAλ}
3. Construct G with productions P such that
* If Ax1x2xmP,m1, then put all productions formed when xj is replaced by λ (for all xjVn) such that |rhs|1 into P.

Example

SAb
ABCBAa
Bbλ
CcCλ
G:
SAbb
ABCBBCBBCBBCAaa
Bb
CcCc

NOTE: Don’t add Aλ!

Unit Productions

Definition: Unit Production:
AB
where A,BV.
Consider removing unit productions: Suppose we have
AB
Baab

This becomes: Aaab

Unit Productions (2)

But what if we have
AB becomes AC
BCBA
CACB

But we didn’t get rid of unit productions!

Removing Unit Productions

Theorem (Remove unit productions)
Let G=(V,T,S,P) be a CFG without λ-productions.
Then there exists CFG G=(V,T,S,P) that does not have any unit-productions and L(G)=L(G).

Process

1. Find for each A, all B such that AB
(Draw a dependency graph howing relationship of Unit productions. Just draw arc for each AB rule.
Draw A in a circle, B in a circle, and an arc from A to B.)
2. Construct G=(V,T,S,P) by
(a) Put all non-unit productions in P
(b) For all AB such that By1y2ynP, put Ay1y2ynP
Run DFS with A as root.
Note the star in AB
Never put a unit production in P.

Example (1)

Original:
SAaB
BAbb
AabcB

Unit Production Dependency Graph:

Example (2)

Remove the unit production rules, and add these rules:
Sabcbb
Abb
Babc
Result:
SabcbbAa
Aabbbc
Babbbc

Theorem

Theorem: Let L be a CFL that does not contain λ. Then there exists a CFG for L that does not have any useless productions, λ-productions, or unit-productions.
Proof:
1. Remove λ-productions
2. Remove unit-productions
3. Remove useless productions
Order is important. Removing λ-productions can create unit-productions!

There are additional examples in the book.

Chomsky Normal Form (CNF)

Definition: A CFG is in Chomsky Normal Form (CNF) if all productions are of the form
ABC or Aa
where A,B,CV and aT.
Why would you want to put a grammar in this form?
Because it is easier to work with (reason about), so will see it in future.

Theorem:

Any CFG G with λ not in L(G) has an equivalent grammar in CNF.

Proof:
1. Remove λ-productions, unit productions, and useless productions. (We already know how to do this.)
2. For every right-hand-side of length >1, replace each terminal xi by a new variable Cj and add the production Cjxi.
Note: All productions are in the correct form or the right-hand-side is a string of variables.
3. Replace every right-hand-side of length >2 by a series of productions, each with right-hand-side of length 2.

Example (1)

SCBcd
Bb
CCce
(after step 1)
SCBC1C2
Bb
CCC3e
C1c
C2d
C3c

Example (2)

(after step 2):
SCZ1
Z1BZ2
Z2C1C2
Bb
CCC3e
C1c
C2d
C3c

NOTE: Can get rid of λ-productions and unit productions first!

Greibach Normal Form (GNF)

Definition: A CFG is in Greibach normal form (GNF) if all productions have the form
Aax
where aT and xV

This is like an s-grammar (or simple grammar, Linz page 144), except the s-grammar definition includes a further restriction that any pair (A,a) can occur at most in one rule.

This is so that you wouldn’t have to backtrack (only one choice to match the derivation of a string). So it is very restrictive.

GNF Theorem

For every CFG G, there exists a grammar in GNF.
See proof in book.
Example:
SAB
AaAbBb
Bb
Simple substitutions give us:
SaABbBBbB
AaAbBb
Bb

What You Should Know

Know what usless productions, unit productions, etc. are
Don’t memorize the processes for eliminating them, but understand how they work in principle.
Know what the GNF and CNF forms are.