6.1. Context-Free Languages¶

6.1.1. Context-Free Languages¶

In the previous chapters, we saw that some languages are regular languages, which means that we can define a DFA, NFA, regular expression, or regular grammar for them.

Examples of regular languages:

keywords in a programming language
names of identifiers
integers
a finite list of miscillaneous symbols: = ;

Then we covered the Pumping Lemma which is a tool to prove that if a language is not regular.

Examples for Not Regular languages:

expressions: $((a + b) - c)$
block structures ( $\{\}$ in Java/C++ and begin … end in Pascal)

We know that not all languages are not regular, since we’ve proved some are not. Now we will look at a class of languages that is larger than the regular languages, and more powerful: Context-free languages.

1 / 14 Settings

<<<>>>

Definition: A grammar $G=(V,T,S,P)$ is context-free if all productions are of the form
$A \rightarrow x$
where $A \in V$ and $x \in (V∪T)^∗$.
(T includes $\lambda$.)

Saving...

Server Error
Resubmit

6.1.2. String Derivation¶

1 / 38 Settings

<<<>>>

Given a grammar $G$, the most important question to ask is whether a given string $w$ can be derived from $G$. This process is called string derivation.

String derivation begins with the start variable for the grammar (we often use $S$ as the start variable). A series of replacements on variables is performed until there are no more variables to replace. At each step, a variable is replaced by the right hand side of some production rule that has that variable as its left hand side. The goal is to select the proper replacement steps that will lead to $w$.

Saving...

Server Error
Resubmit

6.1.3. Derivation Trees¶

1 / 13 Settings

<<<>>>

We have covered how to derive a string by using sentential forms. Now we a more visual derivation method.

Derivation Trees (also known as parse trees) represent a derivation, but do not show the order in which the productions were applied.

Saving...

Server Error
Resubmit

6.1.4. Derivation Trees Example¶

1 / 7 Settings

<<<>>>

Nonterminals are enclosed in brackets $<>$

For “ $\rightarrow$ ” use instead “ $::=$ ”

Sample C++ Program::

main () {
  int a;     int b;   int sum;
  a = 40;    b = 6;   sum = a + b;
  cout << "sum is "<< sum << endl;
}

“Attempt” to write a CFG for C++ in BNF (Note: $<\mbox{program}>$ is start symbol of grammar.)

$\begin{split}\begin{eqnarray*} <\mbox{program}> &::=& \mbox{main} ()\ <\mbox{block}>\\ <\mbox{block}> &::=& \{\ <\mbox{stmt-list}>\ \}\\ <\mbox{stmt-list}> &::=& <\mbox{stmt}>\ |\ <\mbox{stmt}>\ <\mbox{stmt-list}>\ |\ <\mbox{decl}>\ |\ <\mbox{decl}> <\mbox{stmt-list}> \\ <\mbox{decl}> &::=& \mbox{int}\ <\mbox{id}>\ ;\ |\ \mbox{double}\ <\mbox{id}>\ ; \\ <\mbox{stmt}> &::=& <\mbox{asgn-stmt}>\ |\ <\mbox{cout-stmt}>\\ <\mbox{asgn-stmt}> &::=& <\mbox{id}>\ =\ <\mbox{expr}>\ ;\\ <\mbox{expr}> &::=& <\mbox{expr}>\ +\ <\mbox{expr}>\ |\ <\mbox{expr}>\ *\ <\mbox{expr}>\ |\ (\ <\mbox{expr}>\ )\ |\ <\mbox{id}>\\ <\mbox{cout-stmt}> &::=& \mbox{cout}\ <\mbox{out-list}>\\ \end{eqnarray*}\end{split}$

etc., Must expand all nonterminals!

So a derivation of the program test would look like:

$\begin{split}<\mbox{program}> &\Rightarrow&\ \mbox{main} ()\ <\mbox{block}> \\ &\Rightarrow&\ \mbox{main} ()\ \{\ <\mbox{stmt-list}>\ \} \\ &\Rightarrow&\ \mbox{main} ()\ \{\ <\mbox{decl}>\ <\mbox{stmt-list}>\ \} \\ &\Rightarrow&\ \mbox{main} ()\ \{\ \mbox{int}\ <\mbox{id}>\ <\mbox{stmt-list}>\ \} \\ &\Rightarrow&\ \mbox{main} ()\ \{\ \mbox{int}\ \mbox{a}\ <\mbox{stmt-list}>\ \} \\ &\stackrel{*}{\Rightarrow}&\ \mbox{complete C++ program}\end{split}$

This problem asks you to provide a characterization in English of the language generated by a BNF grammar. After you finish it, there is one more problem about Extended Backus-Naur Form, which is described before the problem.

6.1.11. Extended BNF¶

The symbols we have used in our representation of grammars collectively comprise what is known as Backus-Naur Form (BNF). In Extended Backus-Naur Form (EBNF) we add five meta-symbols to those already used in BNF notation:

Kleene closure operator $*$ , which means “zero or more”. Hence if $<fn\_name>$ were a non-terminal representing a valid function name and $<argument>$ were a non-terminal representing a valid argument, then the EBNF notation for function calls with zero or more arguments (with no commas between them) would be

$<fn\_name> "(" <argument>* ")"$

Positive closure operator $+$ . The EBNF notation for function calls that must have at least one argument would be

$<fn\_name> "(" <argument>+ ")"$

The two paired parenthesis symbols $( \; )$ , which are used for grouping. For example, if $<positive\_number>$ were the non-terminal denoting a valid positive number, then the following EBNF would dictate that we must have a plus or minus sign preceding a number

$(+ | -) <positive\_number>$

The “optional operator” $?$ , which specifies that you can have zero or one of whatever grammatical structure precedes the operator. For example, if our language allowed an optional plus or minus sign in front of a number, we would use the EBNF

$(+ | -)? <positive\_number>$

EBNF is used to reduce the number of productions a grammar needs to specify a language. However, it does not increase the expressive power of grammars, that is, any grammatical structure that can be expressed in EBNF can also be expressed in BNF if one is willing to use more productions.

This last problem is about the equivalence between a given BNF grammar (the same one as in part 4 above) and a smaller EBNF grammar.

More on CFG for C++

Last time we “attempted” to write a CFG for C++, it is possible to write a CFG that recognizes all syntactically correct C++ programs, but there is a problem that the CFG also accepts incorrect programs. For example, it can’t recognize that it is an error to declare the same variable twice, once as an integer and once as a char.

We can write a CFG $G$ such that $L(G) = \{ \mbox{syntactically correct C++ programs} \}$ .

But note that $\{ \mbox{semantically correct C++ programs} \} \subset L(G)$ .

Another example: Can’t recognize if formal parameters match actual parameters in number and type:

declare: int Sum(int a, int b, int c) …

call: newsum = Sum(x,y);

CS4114 Formal Languages and Automata

Chapter 6 Context-Free Grammars and Languages

6.1. Context-Free Languages¶

6.1.1. Context-Free Languages¶

6.1.2. String Derivation¶

6.1.3. Derivation Trees¶

6.1.4. Derivation Trees Example¶

6.1.5. Practice question 1¶

6.1.6. Membership Problem¶

6.1.7. Practice question 2¶

6.1.8. Ambiguity¶

6.1.8.1. Ambiguous Grammars (1)¶

6.1.8.2. Ambiguous Grammars (2)¶

6.1.8.3. Ambiguous Grammars (3)¶

6.1.8.4. Ambiguous Grammars (4)¶

6.1.9. Precedence Practice¶

6.1.10. Unambiguous grammar parse tree Example¶

6.1.10.1. Associativity¶

6.1.10.2. Precedence and Associativity¶

6.1.11. Extended BNF¶