2.Regular Expressions
Regular expressions are a way to specify the set of strings that
define a language.
There are three operators that can be used:
\(+\) union (or)
\(\cdot\) concatenation (AND)
\(\,^*\) star-closure (repeat 0 or more times)
We often omit showing the \(\cdot\) for concatenation.
Examples
Example 1:
\((a + b)^* \cdot a \cdot (a + b)^* = (a + b)^*a(a + b)^*\)
What language is this?
All strings from \(\{a, b\}^*\) that contain at least one
\(a\) .
Example2 :
\((aa)*\)
What language is this?
Strings of \(a\) ‘s with an even number of \(a\) ‘s
Definition for Regular Expression
Given \(\Sigma\) ,
\(\emptyset\) , \(\lambda\) , and \(a \in \Sigma\) are
regular expressions.
If \(r\) and \(s\) are regular expressions then
\(r + s\) is a regular expression.
\(r s\) is a regular expression.
\((r)\) is a regular expression.
\(r^*\) is a regular expression.
\(r\) is a regular expression iff it can be derived from (1)
with a finite number of applications of (2).
Definition for Regular Language
\(L(r) =\) language denoted by some regular expression \(r\) .
\(\emptyset\) , \(\{\lambda\}\) , and \(\{a \in \Sigma\}\)
are languages denoted by a regular expression.
If \(r\) and \(s\) are regular expressions, then
\(L(r + s) = L(r) \cup L(s)\)
\(L(r s) = L(r) \cdot L(s)\)
\(L((r)) = L(r)\)
\(L((r)^*) = (L(r)^*)\)
Precedence Rules
\(^*\) highest
\(\cdot\)
\(+\) lowest
Example: \(ab^* + c = (a(b)^*) + c\)
Examples of Regular Languages
\(\Sigma = \{a,b\}\) ,
\(\{w \in {\Sigma}^{*} \mid w\)
has an odd number of \(a\) ‘s followed by an even number of
\(b\) ‘s \(\}\) :
\(\Sigma=\{a,b\}\) , \(\{w \in {\Sigma}^{*} \mid w\) has no more than
three \(a\) ‘s and must end in \(ab\}\) :
Regular expression for positive and negative integers:
Regular Expressions vs. Regular Languages
Recall that we previously defined the term regular language to mean
the languages that are recognized by a DFA.
(Which is the same as the languages recognized by an NFA.)
How do regular expressions relate to these?
We can easily see NFAs for \(\emptyset\) , \(\lambda\) , and
\(a \in \Sigma\) .
But what about the “more interesting” regular expressions?
Can every regular language \(r\) be described by some
regular expression?
Regular Expression \(\rightarrow\) NFA (1)
Theorem: Let \(r\) be a R.E.
Then \(\exists\) NFA \(M\) such that \(L(M) = L(r)\) .
Proof: By simple simulation/construction.
(This is a standard approach to proving such things!)
We aleady know that we can easily do \(\emptyset\) ,
\(\{\lambda\}\) , and \(\{a\}\) for \(a \in \Sigma\) .
Regular Expression \(\rightarrow\) NFA (1a)
Here is an NFA that accepts nothing (\(\emptyset\) ).
Here is an NFA that accepts an empty string (\(\lambda\) ).
Here is an NFA that accepts \(a \in \Sigma\) .
Regular Expression \(\rightarrow\) NFA (2)
Suppose that \(r\) and \(s\) are regular expressions.
(By induction…)
That means that there is an NFA for \(r\) and an NFA for
\(s\) .
\(r + s\) . Simply add a new start state and a new final
state, each connected (in parallel) with \(\lambda\)
transitions to both \(r\) and \(s\) .
\(r \cdot s\) . Add new start state and new final state,
and connect them with \(\lambda\) transitions in series.
\(r^*\) . Add new start and final states, along with
\(\lambda\) transitions that allow free movement between
them all.
<< Question: Why are we using NFAs for this proof?>>
Regular Expression \(\rightarrow\) NFA: Schematic
Regular Expression \(\rightarrow\) NFA: Or
.
.
Regular Expression \(\rightarrow\) NFA: Concatenation
Regular Expression \(\rightarrow\) NFA: Star
Example
\(ab^* + c\)
.
.
NFA \(\rightarrow\) Regular Expression
Theorem: Let \(L\) be regular. Then \(\exists\) a
regular expression such that \(L = L(r)\) .
Perhaps you see that any regular expression can be
implemented as a NFA.
For most of us, its not obvious that any NFA can be converted to a
regular expression.
Proof Idea:
Remove states sucessively, generating equivalent
generalized transition graphs (GTG) until only two states are
left (initial state and one final state).
The transition between these states is a regular expression
that is equivalent to the original NFA.
Generalized Transition Graph (GTG)
A Generalized Transition Graph (GTG) is a transition
graph whose edges can be labeled with any regular expression.
Thus, it “generalizes” the standard transition graph.
Definition: A complete GTG is a complete graph, meaning that every
state has a transition to every other state.
Any GTG can be converted to a complete GTG by adding edges labeled
\(\emptyset\) as needed.
Proof (1)
\(L\) is regular \(\Rightarrow \exists\) NFA \(M\) such
that \(L = L(M)\) .
1. Assume \(M\) has one final state, and \(q_0 \notin F\) .
2. Convert \(M\) to a complete GTG.
Let \(r_{ij}\) stand for the label of the edge from \(q_i\)
to \(q_j\) .
3. If the GTG has only two states, then it has this form:
Add an arrow to the start state.
Then, the corresponding regular expression is:
\(r = (r^*_{ii}r_{ij}r^*_{jj}r_{ji})^*r^*_{ii}r_{ij}r^*_{jj}\)
Proof (2)
(See homework question about why its fair to just assume that
there is a single final state, and that the start state is not final.)
If the GTG has three states, then it must have the following form:
Proof (3)
In this case, make the following replacements:
REPLACE \(\quad\) WITH
\(r_{ii} \qquad\qquad\qquad\qquad r_{ii}+r_{ik}r_{kk}^{*}r_{ki}\)
\(r_{jj} \qquad\qquad\qquad\qquad r_{jj}+r_{jk}r_{kk}^{*}r_{kj}\)
\(r_{ij} \qquad\qquad\qquad\qquad r_{ij}+r_{ik}r_{kk}^{*}r_{kj}\)
\(r_{ji} \qquad\qquad\qquad\qquad r_{ji}+r_{jk}r_{kk}^{*}r_{ki}\)
After these replacements, remove state \(q_k\) and its edges.
Proof (4)
If the GTG has four or more states, pick any state \(q_k\) that
is not the start or the final state.
It will be removed.
For all \(o \neq k, p \neq k\) , replace \(r_{op}\) with
\(r_{op} + r_{ok}r^*_{kk}r_{kp}\) .
When done, remove \(q_k\) and all its edges.
Continue eliminating states until only two states are left.
Finish with step 3 of the “Proof (1)” slide.
Proof (5)
In each step, we can simplify regular expressions \(r\) and
\(s\) with any of these rules that apply:
\(r + r = r\)
\(s + r{}^{*}s = r{}^{*}s\)
\(r + \emptyset = r\)
\(r\emptyset = \emptyset\)
\(\emptyset^{*} = \{\lambda\}\)
\(r\lambda = r\)
\((\lambda + r)^{*} = r^{*}\)
\((\lambda + r)r^{*} = r^{*}\)
And similar rules.
Example
Conclusion
We have now demonstrated that R.E. is equivalent (meaning, goes both
directions) to DFA.