6.Deterministic vs. Non-deterministic PDAs and Grammars§

Now we know that:
All CFL can be generated by a CFG, and implemented by a NPDA.
Not all CFL can be generated using a DPDA.
So some CFG are associated with only non-deterministic PDAs.
Nondeterminism gives us something more in terms of capability.

Why do we care?

Because we want to parse efficiently.
This means, we want to quickly determine if a given string can be generated from a given grammar.
Determinism seems to be a fundamental requirement if we hope to do this.

PDAs and Parsing

Think of a PDA as a parsing device.
No backtracking \(\Rightarrow\) can make a decision at every step on what to do next.
This is the same as knowing which grammar production comes next.
Clearly linear time when the grammar has been simplified (no unit productions), because every derivation step consumes an input character.
Oops… except for \(\lambda\) productions. But actually, these are not really a problem either.

Top-down Parsing with Lookahead

Start with the start symbol
Scan left-to-right through the string.
At each step, we want only to follow one rule when we look at the current character.
Perhaps we don’t see a production for the current character, but instead pop something off the stack (\(\lambda\) production).
(BTW: This is why \(\lambda\) productions are still linear, if we don’t put too much on the stack when we process a character.)

S-grammars

Recall that an S-grammar has all productions of the form:
\(A \rightarrow ax\)
where \(A \in V\), \(a \in T\), and \(x \in V^*\) AND any pair \((A, a)\) can occur in at most one rule.
Obviously this can be parsed efficiently.
But, s-grammars are more restrictive than we want.
Lots of useful language constructs cannot be defined using an s-grammar.
We want to generalize as much as we can to capture a broader subset of CFLs

LL(k) Grammars

LL means “left-to-right” and “left-most derivation” is constructed.
\(k\) means that we can look ahead at most \(k-1\) characters.
Every s-grammar is LL, but so are more grammars.

LL Example 1

Consider this grammar:
\(S \rightarrow aSb \mid ab\)
This is not an s-grammar.
But, this is an LL grammar.
By looking at the next two characters, we always know which rule to apply.
If we see \(ab\), then apply \(S \rightarrow ab\).
<< What gets consumed, what goes on the stack? >>
Otherwise, apply \(S \rightarrow aSb\)
<< What gets consumed, what goes on the stack? >>

LL Example 2 (1)

Consider this grammar:
\(S \rightarrow SS \mid aSb \mid ab\)
This is a useful grammar! It captures nested parentheses.
This is not an LL(k) grammar for any \(k\).
<< Why not?? >>

LL Example 2 (2)

Just because the grammar is not LL(k) does not mean that the language might not be deterministic.
The reasoning for why this was not LL(k) should help you to see how to fix the grammar.
Consider this grammar: \(S \rightarrow aSbS \mid \lambda\)
This is LL.
Example: Derive \(w = abab\).
\(S \Rightarrow aSbS \Rightarrow abS \Rightarrow abaSbS \Rightarrow ababS \Rightarrow abab\).
When the input next input symbol is \(a\), we must use \(S \rightarrow aSbS\).
When the input next input symbol is \(b\) or string is empty, we must use \(S \rightarrow \lambda\).

LL Example 2 (3)

One last problem: This grammar accepts the empty string.
If we don’t like that, then there is an easy fix.
Just define a new start symbol that avoids the \(\lambda\) production.
\(S_0 \rightarrow aSbS\)
\(S \rightarrow aSbS \mid \lambda\)