Close
Register
Close Window

OpenDSA Stand-alone Modules

Chapter 0 modules

Show Source |    | About   «  0.269. Closure Properties of Regular Grammars   ::   Contents   ::   0.271. Properties  »

Identifying Non-regular Languages

1. Identifying Non-regular Languages

How do we prove that a language is regular? We have a number of approaches in our toolbox.

  1. Build a DFA.
  2. Build a NFA.
  3. Write a regular expression.
  4. Write a regular grammar.
  5. Start with known regular languages and apply operations known to be closed for regular languages.

Given so many tools for creating a regular language, are there languages that are not regular? (The very fact that we are concerned with this question is a hint that this can happen.)

Let's start with some basic questions. First, if a language \(L\) is finite, is \(L\) regular? Yes! All finite languages are regular.

If \(L\) is infinite, is \(L\) regular? It might be and it might not. For example, \(L_1 = \{a^nb^m | n > 0, m > 0 \} = aa^*bb^*\) is an infinite regular language.

So, what about a language that was mentioned earlier, with no clear resolution? \(L_2 = \{a^nb^n | n > 0 \}\) is an infinite language.

Prove that \(L_2 = \{a^nb^n | n > 0 \}\) is not regular.

Proof 1 (by contradiction)
Suppose that \(L_2\) is regular.
If \(L_2\) is regular then \(\exists\) DFA \(M\) that recognizes \(L_2\).
\(M\) has a finite number of states, say \(k\) states.
Consider a long string \(a^kb^k \in L_2\).
Since there are \(k\) states and \(k\) a's, some state in \(M\) must be reached more than once when following the path of \(a^k\).
In that case, there is a loop with one or more a's (say \(t\) a's for some \(t > 1\)) along the path.
Suppose we start at the initial state, traverse the same path for \(a^kb^k\), but we traverse the loop of \(a\) 's one additional time.
We will end up in the same final state that \(a^kb^k\) did, but our actual number of a's is some other value (call it \(m\)).
Therefore, the string \(a^mb^k\) is accepted by \(M\), but this string is not in \(L_2\). Contradiction!
Thus, \(L_2\) is not regular.

This is an example of the Pigeonhole Principle. The Pigeonhole Principle states that, given \(n\) pigeonholes and \(n+1\) pigeons, when all of the pigeons go into the holes we can be sure that at least one hole contains more than one pigeon. In our case, the number of \(a\) 's are the pigeons, and the states in the DFA are the pigeonholes. We can't distinguish the various possibilities for the number of \(a\) 's, so we can't verify that they properly match the number of \(b\) 's.

1.1. The Concept of Pumping

Proof 2 (by contradiction, but worded a little differently)

Proof: Suppose that \(L_2\) is regular.
Then \(\exists\) DFA \(M\) that recognizes \(L_2\).
\(M\) has a finite number of states, say \(k\) states.
Consider a long string \(a^kb^k \in L_2\).
Since there are \(k\) states and \(k\) a's, some state in \(M\) must be reached more than once when following the path of \(a^k\).
In that case, there is a loop with one or more a's (say \(t\) a's for some \(t > 1\)) along the path.
Suppose we start at the initial state, traverse the same path for \(a^kb^k\), but we traverse the loop of a's one additional time ("pump" the loop). We will end up in the same final state that \(a^kb^k\) did.
Therefore, the string \(a^{k+t}b^k\) is accepted by \(M\), but this string is not in \(L_2\). Contradiction!
Thus, \(L_2\) is not regular.

In this version, we introduce the concept of "pumping" the string as we go around the loop. Loops are how we get infinite languages. They are also how we lose count or otherwise lose the ability to distinguish various properties of the string being processed.

Another way of looking at it is that the memory of a DFA is embodied explicitly in its set of states. Since the number of states is finite, the memory is finite. Further, while a program in a traditional programming language might have one integer (that conceptually at least stores an infinite number of values, even if that is not literally true), a DFA has no such thing.

In general, consider a DFAs with or without cycles. If there is no cycle, the language accepted is finite. If there is one or more cycle, then the language is infinite. But we can actually say a bit more about this language. To explain the concept, let's assume that there is exactly one cycle. In that case, the cycle might be skipped, executed once, or executed more than once.

We can consider any string accepted by this DFA when the cycle is executed exactly once to be of the form \(w = w_1vw_2\) where the \(v\) represents the part of the string captured by the cycle. Now, if the cycle is skipped, then we get \(w = w_1w_2\), and if its executed twice we get \(w = w_1v^2w_2\). In general, the DFA accepts all strings like \(w = w_1v^*w_2\).

1.2. Pumping Lemma

Let \(L\) be an infinite regular language. \(\exists\) a constant \(m > 0\) such that any \(w \in L\) with \(|w| \ge m\) can be decomposed into three parts as \(w=xyz\) with:

\(|xy| \le m\)
\(|y| \ge 1\)
\(xy^iz \in L\) for all \(i\ge 0\)

Meaning: Every sufficiently long string in \(L\) (the constant \(m\) corresponds to the finite number of states in \(M\)) can be partitioned into three parts such that the middle part can be "pumped", resulting in strings that must be in \(L\).

How To Use the Pumping Lemma to prove L is not regular:

Proof by Contradiction.
Assume L is regular.
Therefore \(L\) satisfies the pumping lemma.
Choose a long string \(w \in L\), \(|w| \ge m\). (The choice of the string is crucial. We must pick a string that will yield a contradiction).
Show that there is NO division of \(w\) into \(xyz\) (must consider all possible divisions) such that \(|xy| \le m\), \(|y| \ge 1\) and \(xy^iz \in L \forall i \ge 0\).
The pumping lemma does not hold. Contradiction!
\(\Rightarrow L\) is not regular.

Note that, unfortunately, the pumping lemma is one-way: For (some) languages we can use the pumping lemma to prove that they are not regular. But we cannot use the pumping lemma to help us prove that a language is regular. And the pumping lemma is not a universal solution for determining that a language is non-regular. Its just a tool in the toolbox.

Example 1

\(L = \{a^nb^n | n \geq 0\}\)

Theorem: \(L\) is not regular.

Proof:

Assume \(L\) is regular, therefore the pumping lemma holds.
Choose \(w = a^mb^m\) where \(m\) is the constant in the pumping lemma. (Note that \(w\) must be choosen such that \(|w| \ge m\).)
We can always choose \(n = m\). Therefore, substring \(y\) must be some number of \(a\) 's.
So the partition is:
\[x=a^{n-k}\quad |\quad y=a^k\quad |\quad z=b^n\]
where \(n \leq m\) and \(k > 0\).
It should be true that \(xy^iz \in L\) for all \(i\ge 0\).
But clearly this is not true. Contradiction!
\(\Rightarrow L\) is not regular.

It is important to understand that the pumping lemma says that there is some way to define the language that meets the criteria. It is not enough to pick your favorite value of \(m\) for which the language would not be regular. You have to show that no satisfactory \(m\) can exist.

Here is an adversary argument way of looking at this. Your goal is to establish a contradiction (to prove the language is not regular), while the opponent tries to stop the proof. The moves in the game are:

  1. The opponent picks \(m\).
  2. We pick string \(w\) in \(L\) of length equal or greater than \(m\). We are free to chose any \(w\), so long as \(w \in L\) and \(|w| \geq m\).
  3. The opponent chooses the decomposition \(xyz\), such that \(|xy| \leq m, |y| \geq 1\). The opponent will make the choice that is hardest for us to win the game.
  4. We try to pick \(i\) so that the pumped string \(w_i = xy^iz\) is not in \(L\). If we can always do this, we win (\(L\) is not regular).

Example 2

\(L = \{ww^R : w \in \Sigma^*\}\).

Theorem: \(L\) is not regular.

Proof:

For any value \(m\), we pick the string \(a^mb^mb^ma^m\).
Since \(|xy| \leq m\), \(y\) must consist entirely of \(a\) 's.
If we pick \(i = 0\), then the resulting string has fewer \(a\) 's on the left than on the right and so cannot be of the form \(ww^R\).
Therefore, \(L\) is not regular.

Example 3

If the language is indeed regular, you should find it impossible to use the pumping lemma to prove it non-regular!

\(L = \{a^mb^n \mid n+m\) is odd \(\}\)

Prove \(L = \{a^mb^n \mid n+m\) is odd \(\}\) is not regular.
If the opponent picks \(m = 1\), then we can pick \(w = abb\).
Whatever the adversary picks for \(xyz\), we end up with \(y\) such that we can pump strings not in the language.
SO... does this mean that \(L\) is non-regular?
NO!! The adversary will not pick a bad choice for \(m\) if they don't have to!

Example 4

If the language is indeed regular, you should find it impossible to use the pumping lemma to prove it non-regular!

\(L = \{a^mb^n \mid n+m\) is odd \(\}\)

Theorem: \(L\) is not regular.

Proof:

Say that the opponent picks \(m = 3\).
We can choose this string that is in the language: \(aaabb\) so as to constrain the opponent to picking values for \(y\) with all \(a\) 's.
But unfortunately, the opponent picks decomposition \(a(aa)^ibb\).
We can't pick \(i\) that is not in the language.
The point is that we cannot find a string, for all values of \(m\), such that the opponent cannot also pick workable values for \(x, y, z\).
Consider the Pumping Lemma definition again:
Let \(L\) be an infinite regular language. There exists a constant \(m > 0\) such that any \(w \in L\) with \(|w| \ge m\) can be decomposed into three parts as \(w=xyz\) with:
\(|xy| \le m\)
\(|y| \ge 1\)
\(xy^iz \in L\) for all \(i\ge 0\)
1. The opponent picks \(m\).
2. We pick string \(w\).
3. The opponent chooses the decomposition \(xyz\).
4. We try to pick \(i\).
WE seek to prove the language non-regular.
The adversary seeks to stop us.
  1. There exists a constant \(m > 0\) [= Adversary picks a value for \(m\).]
  2. ... such that any \(w \in L\) with \(|w| \ge m\) [= WE pick our choice for \(w\).]
  3. ... can be decomposed into three parts as \(w=xyz\) [= Adversary picks \(xyz\)] (that meets the length criteria on \(xy\) and \(y\))
  4. ... such that \(xy^iz \in L\) for all \(i\ge 0\) [= WE pick a value for \(i\).]

Example 5

\(L = \{a^ncb^n | n > 0\}\)

Theorem: \(L\) is not regular.

Proof:

Assume \(L\) is regular, therefore the pumping lemma holds.
Choose \(w = a^mcb^m\) where \(m\) is the constant in the pumping lemma. (Note that \(w\) must be choosen such that \(|w|\ge m\).)
The only way to partition \(w\) into three parts, \(w=xyz\), is such that \(x\) contains 0 or more a's, \(y\) contains 1 or more a's, and \(z\) contains 0 or more a's concatenated with \(cb^m\). This is because of the restrictions \(|xy| \le m\) and \(|y|> 0\).
So the partition is:
\[x=a^k\quad |\quad y=a^j\quad |\quad z=a^{m-k-j}cb^m\]

where \(k \ge 0\), \(j > 0\), and \(k + j \le m\) for some constants \(k\) and \(j\).

It should be true that \(xy^iz \in L\) for all \(i\ge 0\).
\(xy^0z = a^{m-j}cb^{m} \not \in L\). Contradiction!
(Note that \(xy^2z\) would also give a contradiction, but you only need to find one contradiction.)
\(\Rightarrow L\) is not regular.

Example 6

\(L = \{a^nb^{n+s}c^s | n,s > 0\}\)

Theorem: L is not regular.

Proof:

Assume \(L\) is regular, therefore the pumping lemma holds.
Choose \(w = a^mb^{m+s}c^s\) where \(m\) is the constant in the pumping lemma. (Note: \(s\) could be replaced by any constant here, 5, 9, etc.)
The only way to partition \(w\) into three parts, \(w=xyz\), is such that \(x\) contains 0 or more a's, \(y\) contains 1 or more a's, and \(z\) contains 0 or more a's concatenated with the rest of the string \(b^{m+s}c^s\). This is because of the restrictions \(|xy| \le m\) and \(|y|> 0\).
So the partition is:
\[x=a^k\quad |\quad y=a^j\quad |\quad z=a^{m-k-j}b^{m+s}c^s\]

where \(k \ge 0\), \(j > 0\), and \(k + j \le m\) for some constants \(k\) and \(j\).

It should be true that \(xy^iz \in L\) for all \(i \ge 0\).
\(xy^2z = a^{m+j}b^{m+s}c^s \not\in L\). \(n_a + n_c > n_b\). Contradiction!
\(\Rightarrow L\) is not regular.

Example 7

\(\Sigma=\{a,b\}, L = \{w\in{\Sigma}^{*}\mid n_a(w) > n_b(w)\}\)

Theorem: L is not regular.

Proof:

Assume \(L\) is regular, therefore the pumping lemma holds.
Choose \(w = a^{m+1}b^{m}\) where \(m\) is the constant in the pumping lemma.
The only way to partition \(w\) into three parts, \(w=xyz\), is such that \(x\) contains 0 or more a's, \(y\) contains 1 or more a's, and \(z\) contains 1 or more a's concatenated with the rest of the string \(ab^{m}\). This is because of the restrictions \(|xy| \le m\) and \(|y| \ge 0\).
So the partition is:
\[x=a^k\quad |\quad y=a^j\quad |\quad z=a^{m+1-k-j}b^{m}\]

where \(k \ge 0\), \(j > 0\), and \(k + j \le m\) for some constants \(k\) and j.

It should be true that \(xy^iz \in L\) for all \(i \ge 0\).
\(xy^2z = a^{m+1+j}b^{m} \in L\). Not a contradiction.
\(xy^0z = a^{m+1-j}b^{m} \in L\). Since \(j > 0\), \(n_a \le n_b\). Contradiction!
\(\Rightarrow L\) is not regular.

Example 8

\(L = \{a^3b^nc^{n-3} | n > 3 \}\)

Theorem: L is not regular.

Proof:

Assume \(L\) is regular, therefore the pumping lemma holds.
Choose \(w = a^3b^mc^{m-3}\) where \(m\) is the constant in the pumping lemma. There are three ways to partition \(w\) into three parts, \(w=xyz\).
1) \(y\) contains only a's
2) \(y\) contains only b's, and
3) \(y\) contains a's and b's
We must show that each of these possible partitions lead to a contradiction. (Then, there would be no way to divide \(w\) into three parts such that the pumping lemma contraints were true).
Case 1: (\(y\) contains only a's). Then \(x\) contains 0 to 2 a's, \(y\) contains 1 to 3 a's, and \(z\) contains 0 to 2 a's concatenated with the rest of the string \(b^{m}c^{m-3}\), such that there are exactly 3 a's. So the partition is:
\[x=a^k\quad |\quad y=a^j\quad |\quad z=a^{3-k-j}b^{m}c^{m-3}\]

where \(k \ge 0, j > 0\), and \(k + j \le 3\) for some constants \(k\) and \(j\).

It should be true that \(xy^iz \in L\) for all \(i\ge 0\).
\(xy^2z = (x)(y)(y)(z) = (a^k)(a^j)(a^j)(a^{3-j-k}b^mc^{m-3}) = a^{3+j}b^{m}c^{m-3} \not\in L\) since \(j>0\), there are too many a's. Contradiction.
Case 2: (\(y\) contains only b's)
Then \(x\) contains 3 a's followed by 0 or more b's, \(y\) contains 1 to \(m-3\) b's, and \(z\) contains 3 to \(m-3\) b's concatenated with the rest of the string \(c^{m-3}\). So the partition is:
\[x=a^3b^k\quad |\quad y=b^j\quad |\quad z=b^{m-k-j}c^{m-3}\]

where \(k \ge 0\), \(j > 0\), and \(k + j \le m-3\) for some constants \(k\) and \(j\).

It should be true that \(xy^iz \in L\) for all \(i\ge 0\).
\(xy^0z = a^{3}b^{m-j}c^{m-3} \not\in L\) since \(j > 0\), there are too few b's. Contradiction.
Case 3: (\(y\) contains a's and b's)
Then \(x\) contains 0 to 2 a's, \(y\) contains 1 to 3 a's, and 1 to \(m-3\) b's, \(z\) contains 3 to \(m-1\) b's concatenated with the rest of the string \(c^{m-3}\). So the partition is:
\[x=a^{3-k}\quad |\quad y=a^{k}b^j\quad |\quad z=b^{m-j}c^{m-3}\]

where \(3 \ge k > 0\), and \(m-3 \ge j > 0\) for some constants \(k\) and \(j\).

It should be true that \(xy^iz \in L\) for all \(i\ge 0\).
\(xy^2z = a^{3}b^ja^kb^mc^{m-3} \not\in L\) since \(j, k > 0\), there are b's before a's. Contradiction.
\(\Rightarrow\) There is no partition of \(w\).
\(\Rightarrow L\) is not regular.

1.3. Use Closure Properties to prove \(L\) is not regular

Using closure properties of regular languages, construct a language that should be regular, but for which you have already shown is not regular. Contradiction.

Proof Outline:
Assume \(L\) is regular.
Apply closure properties to \(L\) and other regular languages, constructing \(L'\) that you know is not regular.
Closure properties \(\Rightarrow L'\) is regular.
Contradiction. So \(L\) is not regular.

Example 9

\(L = \{a^3b^nc^{n-3} | n > 3 \}\)

Theorem: \(L\) is not regular.

Proof: (proof by contradiction)

Assume \(L\) is regular.
Define a homomorphism \(h: \Sigma \rightarrow \Sigma^*\)
\[h(a) = a\quad |\quad h(b) = a\quad |\quad h(c) = b\]
\(h(L) = \{a^3a^nb^{n-3} | n > 3 \} = \{a^{n+3}b^{n-3} | n > 3\}\)
\(L\) is regular and closure under homomorphism \(\Rightarrow h(L)\) is regular.
The language \(\{b^6\}\) is a regular language.
By closure under concatenation, \(L' = h(L)\{b^6\} = \{a^{n+3}b^{n+3} | n > 3\}\) is regular.
The language \(L'' = \{ab, aabb, aaabbb, aaaabbbb, aaaaabbbbb, aaaaaabbbbbb\}\) is regular.
By closure under union, \(L' \cup L'' = \{a^nb^n | n > 0\}\) is regular.
But, we showed earlier that \(\{a^nb^n | n > 0 \}\) is not regular! Contradiction.
\(\Rightarrow L\) is not regular.

Example 10

\(L = \{a^nb^ma^{m}\ |\ m \ge 0, n \ge 0 \}\)

Theorem: \(L\) is not regular.

Proof: (proof by contradiction)

Assume \(L\) is regular.
\(L1 = \{ bb^{*}aa^{*}\}\)
\(L2 = L \cap L1 = \{b^na^n \mid n > 0\}\)
Define a homomorphism \(h: \Sigma \rightarrow \Sigma^*\)
\[h(a) = b\quad |\quad h(b) = a\]
\(h(L2) = \{a^nb^n | n>0 \}\) should be regular.
We showed earlier that \(\{a^nb^n | n > 0 \}\) is not regular. Contradiction.
\(\Rightarrow L\) is not regular.

Example 11

\(L_1 = \{a^nb^na^n\ |\ n > 0\}\)

Theorem: \(L_1\) is not regular.

Proof: (proof by contradiction)

Assume \(L_1\) is regular.
The goal is to try to construct \(\{a^nb^n | n > 0\}\) which we know is not regular.
NOTE: Trying to intersect with \(\{a^{*}b^{*} \}\) does not work.
Let \(L_2 = \{a^{*}\}\). \(L_2\) is regular.
By closure under right quotient, \(L_3 = L_1 \backslash L_2 = \{a^nb^na^p | 0 \le p \le n, n > 0\}\) is regular.
By closure under intersection, \(L_4 = L_3 \cap \{a^{*}b^{*}\} = \{a^nb^n | n > 0\}\) is regular.
We already proved that \(L_4\) is not regular. Contradiction.
\(\Rightarrow L_1\) is not regular.

   «  0.269. Closure Properties of Regular Grammars   ::   Contents   ::   0.271. Properties  »

nsf
Close Window