2.Regular Expressions§

Regular expressions are a way to specify the set of strings that define a language.

There are three operators that can be used:
\(+\) union (or)
\(\cdot\) concatenation (AND)
\(\,^*\) star-closure (repeat 0 or more times)
We often omit showing the \(\cdot\) for concatenation.

Examples

Example 1:
\((a + b)^* \cdot a \cdot (a + b)^* = (a + b)^*a(a + b)^*\)
What language is this? All strings from \(\{a, b\}^*\) that contain at least one \(a\).
Example2 :
\((aa)*\)
What language is this? Strings of \(a\) ‘s with an even number of \(a\) ‘s

Definition for Regular Expression

Given \(\Sigma\),

  1. \(\emptyset\), \(\lambda\), and \(a \in \Sigma\) are regular expressions.

  2. If \(r\) and \(s\) are regular expressions then
    \(r + s\) is a regular expression.
    \(r s\) is a regular expression.
    \((r)\) is a regular expression.
    \(r^*\) is a regular expression.
  3. \(r\) is a regular expression iff it can be derived from (1) with a finite number of applications of (2).

Definition for Regular Language

\(L(r) =\) language denoted by some regular expression \(r\).

  1. \(\emptyset\), \(\{\lambda\}\), and \(\{a \in \Sigma\}\) are languages denoted by a regular expression.

  2. If \(r\) and \(s\) are regular expressions, then
    \(L(r + s) = L(r) \cup L(s)\)
    \(L(r s) = L(r) \cdot L(s)\)
    \(L((r)) = L(r)\)
    \(L((r)^*) = (L(r)^*)\)

Precedence Rules

Example: \(ab^* + c = (a(b)^*) + c\)

Examples of Regular Languages

  1. \(\Sigma = \{a,b\}\), \(\{w \in {\Sigma}^{*} \mid w\) has an odd number of \(a\) ‘s followed by an even number of \(b\) ‘s \(\}\):

    ??
  2. \(\Sigma=\{a,b\}\), \(\{w \in {\Sigma}^{*} \mid w\) has no more than three \(a\) ‘s and must end in \(ab\}\):

    ??
  3. Regular expression for positive and negative integers:

    ??

Regular Expressions vs. Regular Languages

Recall that we previously defined the term regular language to mean the languages that are recognized by a DFA.
(Which is the same as the languages recognized by an NFA.)
How do regular expressions relate to these?
We can easily see NFAs for \(\emptyset\), \(\lambda\), and \(a \in \Sigma\).
But what about the “more interesting” regular expressions?
Can every regular language \(r\) be described by some regular expression?

Regular Expression \(\rightarrow\) NFA (1)

Theorem: Let \(r\) be a R.E. Then \(\exists\) NFA \(M\) such that \(L(M) = L(r)\).

Proof: By simple simulation/construction.
(This is a standard approach to proving such things!)

We aleady know that we can easily do \(\emptyset\), \(\{\lambda\}\), and \(\{a\}\) for \(a \in \Sigma\).

Regular Expression \(\rightarrow\) NFA (1a)

Here is an NFA that accepts nothing (\(\emptyset\)).

Here is an NFA that accepts an empty string (\(\lambda\)).

Here is an NFA that accepts \(a \in \Sigma\).

Regular Expression \(\rightarrow\) NFA (2)

Suppose that \(r\) and \(s\) are regular expressions. (By induction…) That means that there is an NFA for \(r\) and an NFA for \(s\).

  1. \(r + s\). Simply add a new start state and a new final state, each connected (in parallel) with \(\lambda\) transitions to both \(r\) and \(s\).
  2. \(r \cdot s\). Add new start state and new final state, and connect them with \(\lambda\) transitions in series.
  3. \(r^*\). Add new start and final states, along with \(\lambda\) transitions that allow free movement between them all.

<< Question: Why are we using NFAs for this proof?>>

Regular Expression \(\rightarrow\) NFA: Schematic

Settings

Proficient Saving... Error Saving
Server Error
Resubmit

Regular Expression \(\rightarrow\) NFA: Or

Settings

Proficient Saving... Error Saving
Server Error
Resubmit

.

.

Regular Expression \(\rightarrow\) NFA: Concatenation

Settings

Proficient Saving... Error Saving
Server Error
Resubmit

Regular Expression \(\rightarrow\) NFA: Star

Settings

Proficient Saving... Error Saving
Server Error
Resubmit

Example

\(ab^* + c\)

Settings

Proficient Saving... Error Saving
Server Error
Resubmit

.

.

NFA \(\rightarrow\) Regular Expression

Theorem: Let \(L\) be regular. Then \(\exists\) a regular expression such that \(L = L(r)\).

Perhaps you see that any regular expression can be implemented as a NFA.
For most of us, its not obvious that any NFA can be converted to a regular expression.
Proof Idea:
Remove states sucessively, generating equivalent generalized transition graphs (GTG) until only two states are left (initial state and one final state).
The transition between these states is a regular expression that is equivalent to the original NFA.

Generalized Transition Graph (GTG)

A Generalized Transition Graph (GTG) is a transition graph whose edges can be labeled with any regular expression. Thus, it “generalizes” the standard transition graph.

Definition: A complete GTG is a complete graph, meaning that every state has a transition to every other state. Any GTG can be converted to a complete GTG by adding edges labeled \(\emptyset\) as needed.

Proof (1)

\(L\) is regular \(\Rightarrow \exists\) NFA \(M\) such that \(L = L(M)\).
1. Assume \(M\) has one final state, and \(q_0 \notin F\).
2. Convert \(M\) to a complete GTG.
Let \(r_{ij}\) stand for the label of the edge from \(q_i\) to \(q_j\).
3. If the GTG has only two states, then it has this form:
Add an arrow to the start state.
Then, the corresponding regular expression is: \(r = (r^*_{ii}r_{ij}r^*_{jj}r_{ji})^*r^*_{ii}r_{ij}r^*_{jj}\)

Proof (2)

(See homework question about why its fair to just assume that there is a single final state, and that the start state is not final.)

If the GTG has three states, then it must have the following form:

Proof (3)

In this case, make the following replacements:

REPLACE \(\quad\) WITH
\(r_{ii} \qquad\qquad\qquad\qquad r_{ii}+r_{ik}r_{kk}^{*}r_{ki}\)
\(r_{jj} \qquad\qquad\qquad\qquad r_{jj}+r_{jk}r_{kk}^{*}r_{kj}\)
\(r_{ij} \qquad\qquad\qquad\qquad r_{ij}+r_{ik}r_{kk}^{*}r_{kj}\)
\(r_{ji} \qquad\qquad\qquad\qquad r_{ji}+r_{jk}r_{kk}^{*}r_{ki}\)

After these replacements, remove state \(q_k\) and its edges.

Proof (4)

If the GTG has four or more states, pick any state \(q_k\) that is not the start or the final state. It will be removed.
For all \(o \neq k, p \neq k\), replace \(r_{op}\) with \(r_{op} + r_{ok}r^*_{kk}r_{kp}\).
When done, remove \(q_k\) and all its edges.
Continue eliminating states until only two states are left.
Finish with step 3 of the “Proof (1)” slide.

Proof (5)

In each step, we can simplify regular expressions \(r\) and \(s\) with any of these rules that apply:
\(r + r = r\)
\(s + r{}^{*}s = r{}^{*}s\)
\(r + \emptyset = r\)
\(r\emptyset = \emptyset\)
\(\emptyset^{*} = \{\lambda\}\)
\(r\lambda = r\)
\((\lambda + r)^{*} = r^{*}\)
\((\lambda + r)r^{*} = r^{*}\)

And similar rules.

Example

Settings

Proficient Saving... Error Saving
Server Error
Resubmit

Conclusion

We have now demonstrated that R.E. is equivalent (meaning, goes both directions) to DFA.