Processing math: 100%

5.Transforming CFGs (1)§

We use grammars to represent a programming language.
Want to know: Is a given string (or program x) valid (syntactically correct)?
Same as asking if it is in the language.
We have seen that (with restrictions on form of CFG), we can determine whether w is in L(G) in 2|w| rounds, each step adding a terminal or increasing in length.
This works, but it is not fast enough, that is, not linear!

Transforming CFGs (2)

Key question: Are there ways to transform (restrict) CFGs such that
1) We can process efficiently
2) without restricting the power of CFGs

<< What does it mean “without restricting the power of CFGs”? >>

Specifically, we look at restrictions on the right hand side of the production rules.
Need to be able to automatically transform an arbitrary CFG into an equivalent restricted CFG.

Substitution Theorem (1)

Sometimes it is easier to reason about grammars for languages without λ.
(Besides, its pointless to keep expanding on productions that derive to the empty string.)
Useful note: It would be easy to add λ to any grammar by adding a new start symbol S0,
S0Sλ
So in practice, if we can transform grammar G into ˆG such that L(ˆG)=L(G){λ}, then the two are effectively equivalent.

Substitution Theorem (2)

Let G be a CFG. Suppose G contains Ax1Bx2
where xi(VT), A and B are different variables, and B has the productions
By1y2yn.
We can construct G from G by deleting
Ax1Bx2
from P and adding to it
Ax1y1x2x1y2x2x1ynx2.
Substitution Theorem: L(G)=L(G).

Substitution Theorem Example

Grammar G:
AaaaAabBc
BabbAb
Substitute to get ˆG:
AaaaAababbAcabbc
BabbAb
Then the B productions become useless productions.

<< Question: Why don’t we also delete B rules? >>

Substitution Theorem Example

What was the point to this? Look at derivations.

Derivation under G for aaabbc:
AaaAaaabBcaaabbc
Derivation under ˆG for aaabbc:
AaaAaaabbc

Useless Productions (1)

We left in the productions for B, but maybe there is no way remaining to reach them. In that situation, they can go.
This example is not as obvious: (What is wrong with the A production?)
SaSbλA
AaA
Definition: Variable AV is said to be useful if and only if there is at least one wL(G) such that SxAyw

We want to eliminate both types of useless production (ones that can’t be reached, and ones that don’t terminate).

Theorem: (useless productions)

Let G be a CFG.
Then there exists ˆG that does not contain any useless variables or productions such that L(G)=L(ˆG).

Process (1)

To Remove Useless Productions:

Let G=(V,T,S,P).

I. Compute V1= {Variables that can derive strings of terminals}
1. V1=
2. Repeat until no more variables added
* For every AV with Ax1x2xn, all xi(TV1), add A to V1
3. Take P1 as all productions in P whose symbols are all in (V1T)
Then G1=(V1,T,S,P1) has no variables that can’t derive strings.

NOTE: Now need to get rid of productions we can’t use.

Example (1)

SaBbA
AaA
BSab
CcBca
DbCb
EAab
We process this to eliminate A:
B, C, and E all have productions with just terminals, so they go into V1.
Now, D has a production with just symbols that are terminals or variables already in V1, so it goes into V1.

Process (2)

II. Draw Variable Dependency Graph
For AxBy, draw AB.
Make A and B be nodes with an arc from A to B.
Remove productions for V if there is no path from S to V in the dependency graph.
Resulting Grammar G is such that L(G)=L(G) and G has no useless productions.

Example (2)

SaB
BSab
CcBca
DbCb
Eb
uselessgraph

Example (3)

G:
SaB
BSab

Removing λ Productions

NOTE: Last time talked about simpler CFG that had no λ productions, now we will show how to get rid of them.

Theorem (remove λ productions)
Let G be a CFG with λ not in L(G).
Then there exists a CFG G having no λ productions such that L(G)=L(G).

Process: Removing λ Productions

1. Let Vn={A production Aλ}
2. Repeat until no more additions
* if BA1A2Am and AiVn for all i, then put B in Vn
THUS, Vn={AAλ}
3. Construct G with productions P such that
* If Ax1x2xmP,m1, then put all productions formed when xj is replaced by λ (for all xjVn) such that |rhs|1 into P.

Example: Step 2

SAb
ABCBAa
Bbλ
CcCλ
2. Repeat until no more additions
* if BA1A2Am and AiVn for all i, then put B in Vn
THUS, Vn={AAλ}
Which variables can derive λ?
B and C directly
A indirectly (because BCB can become λ)

Example: Step 3

3. Construct G with productions P such that
* If Ax1x2xmP,m1, then put all productions formed when xj is replaced by λ (for all xjVn) such that |rhs|1 into P.
G:
SAbb
ABCBBCBBCBBCAaa
Bb
CcCc

NOTE: Don’t add Aλ!

Unit Productions

Definition: Unit Production:
AB
where A,BV.
Consider removing unit productions: Suppose we have
AB
Baab

This becomes: Aaab

Unit Productions (2)

But what if we have
AB becomes AC
BCBA
CACB

But we didn’t get rid of unit productions!

Removing Unit Productions

Theorem (Remove unit productions)
Let G=(V,T,S,P) be a CFG without λ productions.
Then there exists CFG G=(V,T,S,P) that does not have any unit productions and L(G)=L(G).

Process

1. Find for each A, all B such that AB
(Draw a dependency graph showing relationship of Unit productions.
Make a node for each variable. For any rule AB, draw an arc from A to B.)
2. Construct G=(V,T,S,P) by
(a) Put all non-unit productions in P
(b) For all AB such that By1y2ynP, put Ay1y2ynP
Run DFS with A as root.
Note the star in AB
Never put a unit production in P.

Example (1)

Original:
SAaB
BAbb
AabcB

Unit Production Dependency Graph:

Example (2)

Remove the unit production rules, and add these rules:
Sabcbb
Abb
Babc
Result:
SabcbbAa
Aabbbc
Babbbc

Theorem

Theorem: Let L be a CFL that does not contain λ. Then there exists a CFG for L that does not have any useless productions, λ productions, or unit productions.
Proof:
1. Remove λ productions
2. Remove unit productions
3. Remove useless productions
Order is important. Removing λ productions can create unit productions!

Chomsky Normal Form (CNF)

Definition: A CFG is in Chomsky Normal Form (CNF) if all productions are of the form
ABC or Aa
where A,B,CV and aT.
Why would you want to put a grammar in this form?
Because it is easier to work with (reason about), so will see it in future.

Theorem:

Any CFG G with λ not in L(G) has an equivalent grammar in CNF.

Proof:
1. Remove λ productions, unit productions, and useless productions. (We already know how to do this.)
2. For every right-hand-side of length >1, replace each terminal xi by a new variable Cj and add the production Cjxi.
Note: All productions are in the correct form or the right-hand-side is a string of variables.
3. Replace every right-hand-side of length >2 by a series of productions, each with right-hand-side of length 2.

Example (1)

SCBcd
Bb
CCce
(after removing λ, usless, unit productions:)
SCBC1C2
Bb
CCC3e
C1c
C2d
C3c

Example (2)

(after step 2):
SCZ1
Z1BZ2
Z2C1C2
Bb
CCC3e
C1c
C2d
C3c

Greibach Normal Form (GNF)

Definition: A CFG is in Greibach normal form (GNF) if all productions have the form
Aax
where aT and xV

This is like an s-grammar (or simple grammar), except the s-grammar definition includes a further restriction that any pair (A,a) can occur at most in one rule.

This is so that you wouldn’t have to backtrack (only one choice to match the derivation of a string). So it is very restrictive.

GNF Theorem

For every CFG G, there exists a grammar in GNF.
See proof in modules.
Example:
SAB
AaAbBb
Bb
Simple substitutions give us:
SaABbBBbB
AaAbBb
Bb

What You Should Know

Know what usless productions, unit productions, etc. are
Don’t memorize the processes for eliminating them, but understand how they work in principle.
Know what the GNF and CNF forms are.