Regular expressions (also known as RegEx or RE) are another
way to define a language.
Practicing programmers use them widely, for example to define
simple search patterns.
They add to the ways of defining languages that we
already know: grammars, DFAs, and NFAs.
We could also just describe a language in plain English.
So why do we need yet another notation?
The problem with an English description (or any other language that
people speak) is that it is too imprecise, and not something that we
can easily implement.
Using a DFA or NFA typically requires some sort of graphical
editor, and it takes a bit of time to enter all the states and
transitions.
We will see that regular expressions are easy to type, and that they
tend to give relatively short descriptions of the common languages that we
want to represent.
Of course, even a relatively small and precise specification for a
language can be hard to come up with (or to understand).
But at least with a regular expression, it is usually quick and easy
to type once you have it.
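To make the "search pattern" use concrete, here is a minimal sketch using Python's standard `re` module. The example language (strings over {a, b} that end in "ab") is an assumption chosen for illustration. Note that programming-language notation differs slightly from textbook notation: Python writes union as `|`, where many textbooks write `+`.

```python
import re

# Pattern for the language of strings over {a, b} that end in "ab".
# Textbook notation would be (a+b)*ab; Python writes union as "|".
pattern = re.compile(r"(a|b)*ab")

for s in ["ab", "bbab", "aba", ""]:
    # fullmatch succeeds only if the WHOLE string is in the language.
    print(repr(s), bool(pattern.fullmatch(s)))
```

Using `fullmatch` rather than `search` matters here: membership in a language is a statement about the entire string, not about some substring of it.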
4.1.1. Definition and Examples of Regular Expressions
Part 1. Recall that we define a regular language to be a language that is recognized by some DFA. We know that these are the same as the languages recognized by NFAs, because every NFA can be converted to an equivalent DFA, and every DFA is already a special case of an NFA.
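As a reminder of why NFAs and DFAs recognize the same languages, the NFA-to-DFA conversion can be sketched with the usual subset construction. The representation below (a dictionary mapping `(state, symbol)` to a set of states, with `None` standing for $\lambda$) and the sample NFA are assumptions made for illustration.

```python
from collections import deque

def lambda_closure(states, delta):
    """All states reachable from `states` using only lambda (None) moves."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, None), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def nfa_to_dfa(start, finals, delta, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    d_start = lambda_closure({start}, delta)
    d_delta, seen, queue = {}, {d_start}, deque([d_start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            moved = set()
            for q in S:
                moved |= delta.get((q, a), set())
            T = lambda_closure(moved, delta)
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    # A DFA state is final if it contains at least one final NFA state.
    d_finals = {S for S in seen if S & finals}
    return d_start, d_finals, d_delta

def dfa_accepts(dfa, w):
    state, finals, delta = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals

# Hypothetical NFA for strings over {a, b} that end in "ab".
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa = nfa_to_dfa(0, {2}, delta, "ab")
print(dfa_accepts(dfa, "bbab"), dfa_accepts(dfa, "aba"))
```

Only the subsets that are actually reachable from the start state are generated, so the resulting DFA is often much smaller than the worst case of $2^n$ states.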
Now, we will show the relationship between regular languages (and thus, DFAs and NFAs) and Regular Expressions.
Part 2. In Part 1, we showed how to convert the base case REs ($\lambda$ and any symbol from $\Sigma$) to NFAs. And we showed that any NFA can be converted to an equivalent NFA with a single final state.
Now we will see how to convert more complex REs to an NFA.
Part 3. Next, we will define a construction for the NFA that can accept the RE $r \cdot s$, given that we have NFAs that are equivalent to $r$ and $s$.
Part 4. The last operator that we need to implement is the Kleene star ($*$) operator. Applied to a language, this operator yields every string formed by concatenating zero or more strings from that language.
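The constructions from Parts 1 to 4 can be sketched together in a few lines. This is a minimal illustration, not necessarily the exact construction shown in the slides: the `NFA` class, the helper names (`symbol`, `union`, `concat`, `star`, `accepts`), and the edge-list representation are all assumptions. Each NFA has a single start state and a single final state, and a label of `None` stands for $\lambda$.

```python
from itertools import count

_ids = count()  # source of fresh state names

class NFA:
    """NFA with one start and one final state; label None means lambda."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges

def symbol(a=None):
    """Base case: a single symbol, or lambda when a is None."""
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, a, f)])

def union(m, n):
    """r + s: a new start branches to both machines, both feed a new final."""
    s, f = next(_ids), next(_ids)
    extra = [(s, None, m.start), (s, None, n.start),
             (m.final, None, f), (n.final, None, f)]
    return NFA(s, f, m.edges + n.edges + extra)

def concat(m, n):
    """r . s: run m, then move by lambda into n."""
    s, f = next(_ids), next(_ids)
    extra = [(s, None, m.start), (m.final, None, n.start), (n.final, None, f)]
    return NFA(s, f, m.edges + n.edges + extra)

def star(m):
    """r*: allow skipping m entirely, or looping back to run it again."""
    s, f = next(_ids), next(_ids)
    extra = [(s, None, m.start), (m.final, None, f),
             (s, None, f),   # zero repetitions
             (f, None, s)]   # go around again
    return NFA(s, f, m.edges + extra)

def closure(nfa, states):
    """Extend a set of states with everything reachable by lambda moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for (p, a, r) in nfa.edges:
            if p == q and a is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, w):
    current = closure(nfa, {nfa.start})
    for ch in w:
        step = {r for (p, a, r) in nfa.edges if p in current and a == ch}
        current = closure(nfa, step)
    return nfa.final in current

# (a+b)* . a . b  -- every operand is built fresh, never reused.
m = concat(star(union(symbol("a"), symbol("b"))),
           concat(symbol("a"), symbol("b")))
print(accepts(m, "bbab"), accepts(m, "aba"))
```

One caveat of this sketch: each operand must be a freshly built NFA. Passing the same `NFA` object as both arguments of `concat` would make the two copies share states and change the language accepted.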
We now have a proof that any RegEx can be converted to an NFA. We also know some of the mechanics: in particular, we know how to combine two NFAs that represent RegExs into a single NFA using one of the RegEx builder rules. Unfortunately, that does not really help us when we are faced with a complex RegEx that we want to convert to an NFA. In this frameset, we show an algorithm for doing this.
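One common way to organize such a conversion (assumed here, and not necessarily the algorithm presented in the frameset) is to parse the RegEx into a tree first, then apply one builder rule per tree node from the leaves upward. The sketch below covers only the parsing step and the resulting build order. It assumes textbook syntax: `+` for union, `*` for star, juxtaposition for concatenation, and parentheses for grouping. It does not handle $\lambda$ or the empty set. The function names `parse` and `build_steps` are hypothetical.

```python
def parse(regex):
    """Recursive-descent parser. Precedence: star > concatenation > union.
         union   -> concat ('+' concat)*
         concat  -> starred starred*
         starred -> atom '*'*
         atom    -> symbol | '(' union ')'
    """
    pos = 0

    def peek():
        return regex[pos] if pos < len(regex) else None

    def union():
        nonlocal pos
        node = concat()
        while peek() == "+":
            pos += 1
            node = ("union", node, concat())
        return node

    def concat():
        node = starred()
        # Keep concatenating until we hit '+', ')' or the end of input.
        while peek() is not None and peek() not in "+)":
            node = ("concat", node, starred())
        return node

    def starred():
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = ("star", node)
        return node

    def atom():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            node = union()
            if peek() != ")":
                raise ValueError("expected ')'")
            pos += 1
            return node
        if ch is None or ch in "+*)":
            raise ValueError(f"unexpected {ch!r} at position {pos}")
        pos += 1
        return ("sym", ch)

    tree = union()
    if pos != len(regex):
        raise ValueError(f"unexpected {regex[pos]!r} at position {pos}")
    return tree

def build_steps(tree, steps=None):
    """Post-order walk: operands are built before the rule that combines them."""
    if steps is None:
        steps = []
    for child in tree[1:]:
        if isinstance(child, tuple):
            build_steps(child, steps)
    steps.append(tree[0] if tree[0] != "sym" else f"sym {tree[1]}")
    return steps

print(build_steps(parse("(a+b)*ab")))
```

Each entry in the returned list corresponds to one application of a base case or builder rule, so the list is a recipe for assembling the final NFA one small machine at a time.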
Since every regular expression has an NFA that implements it, the languages described by regular expressions are a subset of the regular languages. The next question is: Does every regular language have a regular expression?
Perhaps you found it fairly intuitive that any regular expression can be implemented as an NFA. But for most of us, going the other way is not at all obvious. The proof that any NFA can be converted to a regular expression is rather difficult, and we are just going to give a sketch.
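One standard way to carry out this conversion is state elimination, and the sketch may well follow a different route. The idea is to label edges with regular expressions instead of single symbols, then remove states one at a time while repairing the paths that went through them. Writing $r_{ij}$ for the label on the edge from $q_i$ to $q_j$ (a notation assumed here, with $\emptyset$ for a missing edge), removing state $q_k$ updates every remaining pair $q_i, q_j$ by

$$r_{ij}' = r_{ij} + r_{ik} \, r_{kk}^{*} \, r_{kj}$$

The second term accounts for every path that entered $q_k$ from $q_i$, looped on $q_k$ any number of times, and then left to $q_j$. When only the start state $q_0$ and a single, distinct final state $q_f$ remain, the regular expression for the whole machine is

$$r = r_{00}^{*} \, r_{0f} \, (r_{ff} + r_{f0} \, r_{00}^{*} \, r_{0f})^{*}$$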
Thus, all languages that can be represented by a regular
expression are regular, and all regular languages can be represented
by a regular expression.
As we noted at the start, regular expressions don’t give us any more
power than we already had with DFAs.
But they are often easier to write down, as you will see in the
following exercises.