3.Programming Languages§
= \ ;
begin
… end
in Pascal)= \ ;
begin
… end
in Pascal)Definition: \(L\) is a context-free language (CFL) iff there exists a context-free grammar (CFG) \(G\) such that \(L = L(G)\).
Definition: A linear grammar has at most one variable on the right hand side of any production. Thus, right linear and left linear grammars are also linear grammars.
This grammar is not a linear grammar, as there is a choice of which variable to replace.
To write an algorithm to perform replacements, we need some order. We will see this when we look at parsing algorithms.
Definition: Leftmost derivation: in each step of a derivation, replace the leftmost variable. (See derivation 1 above.)
Definition: Rightmost derivation: in each step of a derivation, replace the rightmost variable. (See derivation 2 above.)
Derivation Trees (also known as “parse trees”): A derivation tree represents a derivation, but does not show the order in which productions were applied.
Definitions: Partial derivation tree - subtree of derivation tree.
If partial derivation tree has root \(S\) then it represents a sentential form.
Leaves from left to right in a derivation tree form the yield of the tree.
If \(w\) is the yield of a derivation tree, then it must be that \(w \in L(G)\).
The yield for the example above is \(aacbb\).
A partial derivation tree that has root S:
Membership: Given CFG \(G\) and string \(w \in \Sigma^*\), is \(w \in L(G)\)?
If we can find a derivation of \(w\), then we would know that \(w\) is in \(L(G)\).
This is (part of) what a compiler does. You write a program, you compile it, and the compiler finds all your syntax mistakes.
(It also “translates” the program into “bytecode” to be executed. We won’t talk much about that aspect of compilers in this class.)
Is \(abbab \in L(G)\)?
Theorem: If CFG \(G\) does not contain rules of the form
\(A \rightarrow \lambda\)\(A \rightarrow B\)
where \(A, B \in V\), then we can determine if \(w \in L(G)\) or if \(w \not\in L(G)\).
Proof: Consider
1. Length of sentential forms2. Number of terminal symbols in a sentential form
Either 1 or 2 increases with each derivation.
Derivation of string \(w\) in \(L(G)\) takes \(\le 2|w|\) times through loop in the exhaustive algorithm.
Thus, if there are \(> 2|w|\) times through loop, then \(w \not\in L(G)\).
Let \(L_2 = L_1 - \{\lambda\}\). \(L_2 = L(G)\) where \(G\) is:
\(S \rightarrow SS\ |\ aa\ |\ aSa\ |\ b\)
NOTE that this grammar is in the correct form for the theorem.
Show \(baaba \not\in L(G)\).
With each substitution, either there is at least one more terminal or the length of the sentential form has increased.
So after we process the loop for \(i = 10\), we can conclude that \(baaba\) is not in \(L(G)\).
Next chapter, we will learn methods for taking a grammar and transforming it into an equivalent (or almost) equivalent grammar.
For now, here is another form that will make membership testing easier.
If you use the exhaustive search method to ask if \(w \in L(G)\), where \(G\) is an s-grammar, the number of terminals increases with each step.
Definition: A CFG \(G\) is ambiguous if there exists some \(w \in L(G)\) which has two distinct derivation trees.
Expression grammar
\(G = (\{E, I\}, \{a, b, +, *, (, )\}, E, P), P =\)
\(E \rightarrow E+E\ |\ E*E\ |\ (E)\ |\ I\)\(I \rightarrow a\ |\ b\)
Derivation of \(a+b*a\) is:
\(E \Rightarrow \underline{E}+E \Rightarrow \underline{I}+E \Rightarrow a+\underline{E} \Rightarrow a+\underline{E}*E \Rightarrow a+\underline{I}*E \Rightarrow a+b*\underline{E} \Rightarrow a+b*\underline{I} \Rightarrow a+b*a\)
Corresponding derivation tree is:
Derivation trees of expressions are evaluated bottom up. So if \(a = 2\) and \(b = 4\), then the “result” of this expression is \(2+(4*2) = 10\).
If \(a = 2\) and \(b = 4\), then the “result” of this expression is \((2+4)*2 = 12\).
There are two distinct derivation trees for the same string. Thus the grammar is ambiguous. The string can have different meanings depending on which way it is interpreted.
If \(G\) is a grammar for Java programs and \(w\) is Bob’s Java program, he doesn’t want one compiler to give one meaning to his program and another compiler to interpret his program differently. Disaster!
Rewrite the grammar as an unambiguous grammar. (Specifically, with the meaning that multiplication has higher precedence than addition.)
\(E \rightarrow E+T\ |\ T\)\(T \rightarrow T*F\ |\ F\)\(F \rightarrow I\ |\ (E)\)\(I \rightarrow a\ |\ b\)
There is only one derivation tree for \(a+b*a\):
Try to get a derivation tree with the other meaning of \(a+b*c\), when \(*\) is closer to the root of the tree.
\(E \Rightarrow T \Rightarrow T*F ...\) Then the only way to include a “\(+\)” before the multiplication is if the addition is enclosed in parenthesis. Thus, there is only one meaning that is accepted.
Definition: If \(L\) is CFL and \(G\) is an unambiguous CFG such that \(L = L(G)\), then \(L\) is unambiguous.
<<Why are we studying CFL? Because we want to be able to represent syntactically correct programs.>>
Sample C++ Program::
main () {
int a; int b; int sum;
a = 40; b = 6; sum = a + b;
cout << "sum is "<< sum << endl;
}
“Attempt” to write a CFG for C++ in BNF (Note: \(<\mbox{program}>\) is start symbol of grammar:
<program> ::= main () <block> <block> ::= { <stmt-list> } <stmt-list> ::= <stmt> | <stmt> stmt-list> | <decl> | <decl> <stmt-list> <decl> ::= int <id> ; | double <id> ; <stmt> ::= <asgn-stmt> | <assgn-stmt> | <cout-stmt> <asgn-stmt> ::= <id> = <expr> ; <expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id> <cout-stmt> ::= cout <out-list>
etc., Must expand all nonterminals!
So a derivation of the program test would look like:
<program> ==> main() <block>
==> main() { <stmt-list> }
==> main() { <decl> <stmt-list> }
==> main() { int <id> <stmt-list> }
==> main() { int a <stmt-list> }
...
==> complete C++ program
Example: Formal parameters should match actual parameters (# and type):
declare: int Sum(int a, int b, int c) ...
call: newsum = Sum(x,y);