Regular Languages

Definition An alphabet is a nonempty, finite set of objects (symbols).

- Σ and Γ are usually used to denote alphabets.
- For instance, Σ1 = {a, b, c, d, e, f}, Σ2 = {0, 1}.

Definition A string over Σ is any finite sequence of symbols from Σ. The empty string ε (sometimes λ) is the string consisting of no symbols. If w = a1a2 ... an, n ≥ 0 and each ai ∈ Σ, is a string over Σ, then

- the length |w| of w is n. The length of ε is 0.
- w^R = an ... a2a1 is the reverse of w.
- a substring u of w is any consecutive sequence of 0 or more symbols of w.

- 0, 101, and 0101111 are strings over Σ = {0, 1}.
- |0| = 1, |101| = 3, and |0101111| = 7.

Let x = x1 ... xn and y = y1 ... ym be strings over some alphabet. xy = x1 ... xny1 ... ym is the concatenation of x and y.

- Σ = {0, 1}.
- x = 01, y = 001.
- xy = 01001, yx = 00101.

Definition If w is a string and k ∈ N, then w^k is the concatenation of k copies of w.

- If w = 001, then w^3 = 001001001.
- w^0 = ε for any string w.
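The string operations defined so far map directly onto ordinary string handling. The following is a minimal sketch in Python, assuming strings over Σ are represented as Python `str` values; the helper names `reverse` and `power` are illustrative and not part of the original notes.

```python
def reverse(w: str) -> str:
    """Return w^R, the reverse of w."""
    return w[::-1]


def power(w: str, k: int) -> str:
    """Return w^k, the concatenation of k copies of w (w^0 is the empty string)."""
    return w * k


x, y = "01", "001"
print(len("0101111"))    # the length |w|
print(x + y, y + x)      # the concatenations xy and yx
print(reverse("001"))    # w^R
print(power("001", 3))   # w^3
```

Note that concatenation is not commutative: `x + y` and `y + x` differ here, matching the xy and yx example above.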

Definition Let Σ be some alphabet. Σ∗ is the set of strings defined as follows:

- Basis: ε ∈ Σ∗.
- If w ∈ Σ∗ and a ∈ Σ, then wa ∈ Σ∗.
- Nothing but strings in the basis or formed by a finite number of applications of the above rule are members of Σ∗.

Definition A language over an alphabet Σ is any subset of Σ∗.

In other words, a language is a set of strings.

Definition Let A and B be languages. The regular operations are:

- Union A ∪ B: {w | w ∈ A or w ∈ B}.
- Concatenation A ◦ B: {uv | u ∈ A and v ∈ B}.
- (Kleene) Star A∗: {w1w2 ... wn | n ≥ 0 and each wi ∈ A}.

- Observe that for any language A, ε ∈ A∗.
- If Σ is a set of symbols, then Σ∗ is just the set of finite strings made from symbols of Σ.
- We often write AB rather than A ◦ B.

Example Let A = {good, bad} and B = {boy, girl}.

- A ∪ B: {good, bad, boy, girl}.
- A ◦ B: {goodboy, badboy, goodgirl, badgirl}.
- A∗: {ε, good, bad, goodgood, goodbad, badgood, badbad, ...}.
- B∗: {ε, boy, girl, boyboy, boygirl, girlboy, girlgirl, ...}.
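For finite languages, the regular operations can be tried out directly. The sketch below assumes languages are represented as Python sets of strings. Since A∗ is infinite whenever A contains a nonempty string, the hypothetical helper `star_up_to` only enumerates concatenations of at most a given number of factors.

```python
from itertools import product


def concat(A: set[str], B: set[str]) -> set[str]:
    """A ◦ B = {uv | u in A and v in B}."""
    return {u + v for u, v in product(A, B)}


def star_up_to(A: set[str], max_factors: int) -> set[str]:
    """Finite slice of A*: concatenations of at most max_factors strings of A."""
    result = {""}   # n = 0 contributes the empty string
    layer = {""}    # concatenations of exactly i strings of A
    for _ in range(max_factors):
        layer = concat(layer, A)
        result |= layer
    return result


A = {"good", "bad"}
B = {"boy", "girl"}
print(sorted(A | B))                                        # union
print(sorted(concat(A, B)))                                 # concatenation
print(sorted(star_up_to(A, 2), key=lambda s: (len(s), s)))  # part of A*
```

Even for the empty language, `star_up_to` returns a set containing the empty string, in line with the observation that ε ∈ A∗ for every A.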

- There are several equivalent definitions for regular languages.
- They can be defined recursively via regular operations.

Let Σ be an alphabet.

- Basis:
  - ∅ is a regular language over Σ.
  - {ε} is a regular language over Σ.
  - For each a ∈ Σ, {a} is a regular language over Σ.
- Recursion:
  - If A and B are regular languages over Σ, A ∪ B is a regular language over Σ.
  - If A and B are regular languages over Σ, A ◦ B is a regular language over Σ.
  - If A is a regular language over Σ, A∗ is a regular language over Σ.
- Closure: Only languages formed via finite applications of the above rules are regular languages over Σ.

- The above will work. However, we will instead define regular languages via automata.

- A finite automaton (FA) is a simple model of computation.
- It reads an input string from a tape.
- It determines whether the input string is in a language.
- Equivalently, it determines whether the answer to the problem is “YES” or “NO” for the given input on the tape.

- At the beginning,
  - the FA is in the start state (initial state);
  - its tape head points at the first cell.
- For each move, the FA
  - reads the symbol under its tape head;
  - changes its state (according to the transition function) to the next state determined by the symbol read from the tape and its current state;
  - moves its tape head one cell to the right.

- An FA stops when it has read all symbols on the tape.
- Then it answers whether the input string is in the specific language:
  - Answer “YES” if its last state is an accept state.
  - Answer “NO” if its last state is not an accept state.

- Regular languages can be defined using deterministic finite automata (DFAs).
- Informally, a DFA M consists of a set of states q0, q1, ..., qn.
  - q0 is called the start state.
  - Some subset of q0, q1, ..., qn comprises the accept states.
- DFA M reads an input string w = w1 ... wm, m ≥ 0:
  - M operates in discrete steps.
  - M occupies exactly one state at any given time.
  - M begins in state q0.
  - M reads w one character at a time, moving left to right.
  - A transition function, together with the current state and character, determines M’s next state.
  - If M reaches the end of w in an accept state, then M accepts string w. Otherwise, it rejects it.
- The language L(M) of M is the set of strings M accepts.
- A language is regular if and only if there is some DFA that accepts it.

Figure: An example DFA M1.

- A state diagram, a directed graph, is often used to represent DFAs.
  - Nodes represent states.
  - Nodes circled twice represent accept states.
  - An edge with no starting node indicates the start state.
  - Labelled edges represent the transition function.
  - An edge qi --a--> qj means that if M is in state qi and reads an a, it should move to state qj.

Here, L(M1), the language of M1, is the set of strings that have at least one 1, and the last 1 is followed by an even number of 0s.

- A deterministic finite automaton (DFA) is a 5-tuple (Q, Σ, δ, q0, F):
  - Q is a finite, nonempty set of states.
  - Σ is a finite, nonempty set (the alphabet).
  - δ : Q × Σ → Q is a total function, the transition function.
  - q0 ∈ Q is the start state.
  - F ⊆ Q is the set of accept states.

- Observe that Q and Σ must be finite and nonempty.
- q0 must be in Q.
- δ is a total function and so maps every pair (q, a) of state q and symbol a to some state q'.
- F might be empty.

The above DFA can be formally defined as follows:

- Q = {q1, q2, q3}.
- Σ = {0, 1}.
- δ is given by the table:

      δ     0     1
      q1    q1    q2
      q2    q3    q2
      q3    q2    q2

- q0 = q1.
- F = {q2}.
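The formal definition can be run directly. The following is a minimal sketch, assuming the transition table of M1 is stored as a Python dictionary keyed by (state, symbol) pairs; the names `DELTA`, `START`, `ACCEPT`, and `accepts` are illustrative choices, not notation from the notes.

```python
# Transition table of M1, copied from the table above.
DELTA = {
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q3", ("q2", "1"): "q2",
    ("q3", "0"): "q2", ("q3", "1"): "q2",
}
START = "q1"
ACCEPT = {"q2"}


def accepts(w: str) -> bool:
    """Run M1 on w and report whether it ends in an accept state."""
    state = START
    for symbol in w:
        # delta is a total function, so every (state, symbol) pair is defined
        state = DELTA[(state, symbol)]
    return state in ACCEPT


for w in ["", "1", "10", "100", "0101"]:
    print(repr(w), accepts(w))
```

The strings accepted here contain at least one 1 and have an even number of 0s after the last 1, which agrees with the description of L(M1).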

The above DFA M2 can be formally defined as follows:

- Q = {q1, q2}.
- Σ = {0, 1}.
- δ is given by the table:

      δ     0     1
      q1    q1    q2
      q2    q1    q2

- q0 = q1.
- F = {q2}.

L(M2) = {w| w ends in a 1}.

The above DFA M3 can be formally defined as follows:

- Q = {q1, q2}.
- Σ = {0, 1}.
- δ is given by the table:

      δ     0     1
      q1    q1    q2
      q2    q1    q2

- q0 = q1.
- F = {q1}.

L(M3) = {w| w ends in a 0} ∪ {ε}.

DFA M4 can be formally defined as follows:

- Q = {s, q1, q2, r1, r2}.
- Σ = {a, b}.
- δ is given by the table:

      δ     a     b
      s     q1    r1
      q1    q1    q2
      q2    q1    q2
      r1    r2    r1
      r2    r2    r1

- q0 = s.
- F = {q1, r1}.

L(M4) = {w|w begins and ends in a} ∪ {w|w begins and ends in b}.

- Let M = (Q, Σ, δ, q0, F) be a DFA.
- Let w = w1 ... wn be a string such that each wi ∈ Σ.
- M accepts w if and only if there exists a sequence r0, r1, ..., rn of states of Q such that
  - r0 = q0.
  - For each 0 ≤ i < n, δ(ri, wi+1) = ri+1.
  - rn ∈ F.
- Otherwise, M rejects w.
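Because δ is a total function, the sequence r0, ..., rn in this definition is uniquely determined by w. The sketch below makes the sequence explicit for DFA M4 as tabulated earlier; the function names `run` and `accepts` are illustrative assumptions.

```python
# M4: alphabet {a, b}, start state s, accept states {q1, r1}.
DELTA = {
    ("s", "a"): "q1",  ("s", "b"): "r1",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q2",
    ("r1", "a"): "r2", ("r1", "b"): "r1",
    ("r2", "a"): "r2", ("r2", "b"): "r1",
}
START, ACCEPT = "s", {"q1", "r1"}


def run(w: str) -> list[str]:
    """Return the state sequence r0, ..., rn that M4 follows on w."""
    states = [START]                                # r0 = q0
    for symbol in w:
        states.append(DELTA[(states[-1], symbol)])  # r_{i+1} = delta(r_i, w_{i+1})
    return states


def accepts(w: str) -> bool:
    """M4 accepts w exactly when the last state r_n is in F."""
    return run(w)[-1] in ACCEPT


print(run("abba"), accepts("abba"))
print(run("ab"), accepts("ab"))
```

For w = ε the sequence consists of r0 alone, so a DFA accepts ε exactly when its start state is an accept state.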

Definition If M is a DFA, then M recognizes language L if L = {w|M accepts w}.

Definition Language L is regular if there exists some DFA M such that M recognizes L.

- Determine what a DFA needs to memorize in order to recognize strings in the language.
  - Hint: the property of the strings in the language.
- Determine how many states are required to memorize what we want.
  - Accept state(s) memorize the property of the strings in the language.
- Find out how the thing we memorize changes once the next input symbol is read.
  - From this change, we get the transition function.

- Suppose that Σ = {0, 1} and the language consists of all strings with an odd number of 1s. Construct a DFA to accept this language.
  - How about all strings with an even number of 1s?
- Construct a DFA to accept the language consisting of strings that represent binary numbers divisible by 3.
  - Decide what a DFA needs to memorize.
  - Decide how many states are needed.
  - Construct the transition diagram.
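One possible way to approach the divisibility exercise is to let the DFA memorize the remainder modulo 3 of the number read so far, which needs three states. The sketch below is one such design and not necessarily the intended solution. It assumes the most significant bit is read first and that the empty string is treated as the number 0; both are conventions chosen here.

```python
def step(remainder: int, bit: str) -> int:
    """Transition function: appending a bit doubles the value and adds the bit."""
    return (2 * remainder + int(bit)) % 3


def divisible_by_3(w: str) -> bool:
    """Simulate the 3-state DFA whose states are the remainders 0, 1, 2."""
    state = 0              # start state: remainder 0 (empty prefix treated as 0)
    for bit in w:
        state = step(state, bit)
    return state == 0      # the only accept state is remainder 0


for w in ["0", "11", "110", "111", "1001"]:
    print(w, int(w, 2), divisible_by_3(w))
```

The transition rule follows from the fact that reading bit b after a prefix of value v gives the value 2v + b, so only v mod 3 needs to be remembered.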

1. Suppose that Σ = {0, 1} and the language consists of all strings that have 00 or 11 as substrings. Construct a DFA to accept this language.
2. Suppose that Σ = {0, 1} and the language consists of all strings that have 00 and 11 as substrings. Construct a DFA to accept this language.

- Decide what a DFA needs to memorize.
- Decide how many states are needed.
- Construct the transition diagram.

Definition Recall the regular operations:

- Union A ∪ B: {w | w ∈ A or w ∈ B}.
- Concatenation A ◦ B: {uv | u ∈ A and v ∈ B}.
- (Kleene) Star A∗: {w1w2 ... wn | n ≥ 0 and each wi ∈ A}.

- These can be used to define regular languages, but we will use a DFA-based account as our primitive.
- We will use this account to show that the set of regular languages is closed under the regular operations.

Let A1 and A2 be any languages defined over alphabet Σ.

- If A1 and A2 are regular languages, then A1 ∪ A2 is regular.
- If A1 and A2 are regular, then A1 ◦ A2 is regular.
- If A1 is regular, then A1∗ is regular.

Theorem If A1 and A2 are regular languages, then A1 ∪ A2 is regular.


Proof Without loss of generality, we may assume that A1 and A2 are defined over the same alphabet Σ. Since A1 and A2 are both regular, there exist DFAs M1 and M2 such that L(M1) = A1 and L(M2) = A2.

- M1 = (Q1, Σ, δ1, q1, F1)
- M2 = (Q2, Σ, δ2, q2, F2)

The proof is by construction. We construct a DFA M to recognize A1 ∪ A2. Specifically, M = (Q, Σ, δ, q0, F ), where

- Q = {(r1, r2) | r1 ∈ Q1 and r2 ∈ Q2};
- q0 = (q1, q2);
- F = {(r1, r2) | r1 ∈ F1 or r2 ∈ F2};
- for each (r1, r2) ∈ Q and each a ∈ Σ, δ((r1, r2), a) = (δ1(r1, a), δ2(r2, a)).


We must prove the construction works. We must prove L(M) = A1 ∪ A2. (LR) Suppose M accepts string w = w1w2 ... wn. Then by definition there exists a sequence of states (r0, s0), (r1, s1),... (rn, sn) such that

- (r0, s0) = q0;
- for each 0 ≤ i < n, δ((ri, si), wi+1) = (ri+1, si+1);
- (rn, sn) ∈ F.

However, by construction of M, r0 = q1 and s0 = q2, and since (rn, sn) ∈ F, it must be that either rn ∈ F1 or sn ∈ F2. Without loss of generality, assume it is the former (the case sn ∈ F2 is symmetric, using M2 in place of M1). Similarly, by the construction of δ, for each 0 ≤ i < n, if δ((ri, si), wi+1) = (ri+1, si+1), then δ1(ri, wi+1) = ri+1. Given all of this, the sequence r0 ... rn satisfies all of the requirements needed to show that M1 accepts w. Hence w ∈ A1 and consequently w ∈ A1 ∪ A2.


For the other direction, we show that if w ∈ L(M1), then w ∈ L(M); the case w ∈ L(M2) is symmetric. (RL) Suppose M1 accepts string w = w1w2 ... wn. Then by definition there exists a sequence of states r0, r1, ..., rn such that

- r0 = q1;
- for each 0 ≤ i < n, δ1(ri, wi+1) = ri+1;
- rn ∈ F1.

Construct the following sequence (r0, s0), (r1, s1),... (rn, sn), such that

- (r0, s0) = (q1, q2);
- for each 0 ≤ i < n, (ri+1, si+1) = (δ1(ri, wi+1), δ2(si, wi+1)).

Observe that:

- (r0, s0) = q0;
- for each 0 ≤ i < n, (ri+1, si+1) = (δ1(ri, wi+1), δ2(si, wi+1)) = δ((ri, si), wi+1);
- (rn, sn) ∈ F, because rn ∈ F1.

As such, the sequence (r0, s0), (r1, s1),... (rn, sn) satisfies all of the requirements needed to show that M accepts w, and so w ∈ L(M).
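The product construction used in this proof can be sketched in a few lines. The code below assumes a DFA is represented as a tuple (states, alphabet, delta, start, accepting), with delta a dictionary keyed by (state, symbol). The two example machines are illustrative inputs chosen here: one recognizes strings ending in 1 (as M2 does), and the hypothetical `odd_ones` recognizes strings with an odd number of 1s.

```python
from itertools import product


def union_dfa(d1, d2):
    """Product construction: build a DFA for L(d1) ∪ L(d2)."""
    Q1, sigma, delta1, q1, F1 = d1
    Q2, _, delta2, q2, F2 = d2
    Q = set(product(Q1, Q2))
    # Both components advance in lockstep on the same input symbol.
    delta = {((r1, r2), a): (delta1[r1, a], delta2[r2, a])
             for (r1, r2) in Q for a in sigma}
    F = {(r1, r2) for (r1, r2) in Q if r1 in F1 or r2 in F2}
    return Q, sigma, delta, (q1, q2), F


def accepts(dfa, w):
    """Run a DFA given as (states, alphabet, delta, start, accepting) on w."""
    _, _, delta, state, F = dfa
    for a in w:
        state = delta[state, a]
    return state in F


SIGMA = {"0", "1"}
ends_in_1 = ({"q1", "q2"}, SIGMA,
             {("q1", "0"): "q1", ("q1", "1"): "q2",
              ("q2", "0"): "q1", ("q2", "1"): "q2"}, "q1", {"q2"})
odd_ones = ({"e", "o"}, SIGMA,
            {("e", "0"): "e", ("e", "1"): "o",
             ("o", "0"): "o", ("o", "1"): "e"}, "e", {"o"})

M = union_dfa(ends_in_1, odd_ones)
print(len(M[0]), accepts(M, "10"), accepts(M, "110"))
```

Replacing `or` with `and` in the definition of F would give a machine for the intersection instead, since the rest of the construction does not depend on the accept condition.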

- The behavior of DFAs is completely deterministic. The next state of the machine is determined completely by its current state and the symbol to be read from input.
- Nondeterministic finite automata (NFAs) eliminate this determinism.
- With NFAs, there may be a choice of next state.
- Though NFAs in a sense generalize DFAs (all DFAs are NFAs but not vice versa), the two computational models are equivalent.
- NFAs and DFAs both accept exactly the regular languages.
  - A language is regular iff there is a DFA that accepts it.
  - A language is regular iff there is an NFA that accepts it.
- To prove that A1 ◦ A2 and A1∗ are regular languages (provided A1, A2 are regular), we will use NFAs.

In an NFA N

- For any state q and symbol a, q might have 0, 1, or more than 1 outgoing transitions labelled a.
- ε is allowed as a transition label (N switches states but consumes no input).

- In a DFA there is only one way to process an input string w.
- In an NFA, there might be multiple possible ways of processing it.
  - There might be multiple computation paths.
  - If there is a choice of the next state, you may think of the path as branching off in multiple directions.
- The NFA accepts a string w if at least one of these possible computation paths ends in an accept state.

NFA N2 accepts the language of bit-strings w such that w ends in 100, 101, 110, or 111.

- Often, it is easier to design and understand NFAs than DFAs.
- Though DFAs and NFAs are equivalent in computational power, the DFA to accept a given language might have many more states than an NFA that accepts it.

If Σ is an alphabet, then Σε = Σ ∪ {ε}.

- A nondeterministic finite automaton (NFA) is a 5-tuple (Q, Σ, δ, q0, F):
  - Q is a finite, nonempty set of states.
  - Σ is a finite, nonempty set (the alphabet).
  - δ : Q × Σε → P(Q) is a total function, the transition function.
  - q0 ∈ Q is the start state.
  - F ⊆ Q is the set of accept states.

- Observe that the transition function differs from that for DFAs.
  - In a DFA, δ maps a pair (q, a) to a single state q'.
  - In an NFA, δ maps a pair (q, a) to a set of states.
- Also, in an NFA, the domain of δ is Q × Σε and not Q × Σ.

- Let N = (Q, Σ, δ, q0, F) be an NFA.
- Let w be a string over Σ.
- N accepts w if and only if w can be written as w = w1 ... wn, where each wi ∈ Σε (so some wi may be ε), and there exists a sequence r0, r1, ..., rn of states of Q such that
  - r0 = q0.
  - For each 0 ≤ i < n, ri+1 ∈ δ(ri, wi+1).
  - rn ∈ F.
- Otherwise, N rejects w.

Definition If N is an NFA, then N recognizes language L if L = {w|N accepts w}.

Note that the “next” state ri+1 in the sequence is one of the set of states indicated by δ(ri , wi+1).

The above NFA N1 can be formally defined as follows:

- Q = {q1, q2, q3, q4}.
- Σ = {0, 1}.
- δ is given by the table:

      δ     0       1           ε
      q1    {q1}    {q1, q2}    ∅
      q2    {q3}    ∅           {q3}
      q3    ∅       {q4}        ∅
      q4    {q4}    {q4}        ∅

- q0 = q1.
- F = {q4}.
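One common way to simulate an NFA is to track the set of all states it could currently occupy, rather than following a single path. The sketch below does this for N1 as tabulated above. It assumes the empty string `""` is used as the dictionary key for ε-transitions and that missing entries stand for ∅; the names `eps_closure` and `accepts` are illustrative.

```python
# N1 from the table above; "" stands for ε, and missing entries mean ∅.
DELTA = {
    ("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"}, ("q2", ""): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}
START, ACCEPT = "q1", {"q4"}


def eps_closure(states: set[str]) -> set[str]:
    """All states reachable from `states` using zero or more ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(w: str) -> bool:
    """Track every state N1 could occupy; accept if some path ends in ACCEPT."""
    current = eps_closure({START})
    for symbol in w:
        moved = set()
        for q in current:
            moved |= DELTA.get((q, symbol), set())
        current = eps_closure(moved)
    return bool(current & ACCEPT)


for w in ["11", "101", "100", "010110"]:
    print(w, accepts(w))
```

Tracking sets of states in this way is the same idea that underlies converting an NFA into an equivalent DFA.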

- Every DFA is an NFA.

Theorem Every NFA has an equivalent DFA.

Let N = (Q, Σ, δ, q0, F) be an NFA. We construct an equivalent DFA M = (Q', Σ, δ', q0', F') as follows:

- Q' = P(Q) (the powerset of Q).
- q0' = {q0}.
- F' = {R | R ∈ Q' and there is a q ∈ R such that q ∈ F}.
- For any R ∈ Q' and a ∈ Σ, δ'(R, a) = {q | q ∈ δ(r, a) for some r ∈ R}.

- N processes a string in multiple parallel computation paths.
- N must still operate in discrete steps, however.
- At any step, N will “occupy” some subset S of states of Q.
- The states of M encode these sets of states.
- Consider reading string w1w2 ....

- M begins in state {q0}.
- After reading w1, M transitions to {q | q0 --w1--> q is an edge of N}.

CSCI 2670 Regular Languages Equivalence between NFAs and DFAs

- The previous construction did not account for ε-edges.
- Consider the following alterations:
  - E(R) = {q | q is reachable from some state of R by following zero or more ε-edges}.
  - For any R ∈ Q' and a ∈ Σ, δ'(R, a) = {q | q ∈ E(δ(r, a)) for some r ∈ R}.
  - q0' = E({q0}).
- These are sufficient to construct a DFA M equivalent to an arbitrary NFA N.

Note that Sipser does not provide a proof that the construction works. Instead he states that it “obviously works correctly”.

Given the equivalence of NFAs and DFAs, we obtain the following.

Corollary A language is regular if and only if it is recognized by some NFA.

NFA N4 has three states. The DFA M made from it has 8 states. Construct M.


    δM           a            b
    ∅            ∅            ∅
    {1}          ∅            {2}
    {2}          {2, 3}       {3}
    {3}          {1, 3}       ∅
    {1, 2}       {2, 3}       {2, 3}
    {1, 3}       {1, 3}       {2}
    {2, 3}       {1, 2, 3}    {3}
    {1, 2, 3}    {1, 2, 3}    {2, 3}

FM = {{1}, {1, 2}, {1, 3}, {1, 2, 3}}. q0 for M is {1, 3}.
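The table for M can be reproduced mechanically with the subset construction. Since the diagram of N4 is not reproduced in these notes, the transitions in `N_DELTA` below are an assumption: they are chosen so that the construction yields the table, start state, and accept states given above. The empty string `""` stands for ε.

```python
from itertools import combinations

# Assumed transitions of N4 ("" stands for ε); missing entries mean ∅.
N_DELTA = {
    (1, "b"): {2}, (1, ""): {3},
    (2, "a"): {2, 3}, (2, "b"): {3},
    (3, "a"): {1},
}
N_STATES, SIGMA, N_START, N_ACCEPT = {1, 2, 3}, "ab", 1, {1}


def E(R):
    """States reachable from R along zero or more ε-edges."""
    closure, stack = set(R), list(R)
    while stack:
        for q in N_DELTA.get((stack.pop(), ""), set()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return frozenset(closure)


def step(R, a):
    """delta'(R, a): union of E(delta(r, a)) over all r in R."""
    moved = set()
    for r in R:
        moved |= N_DELTA.get((r, a), set())
    return E(moved)


# Q' = P(Q): all 2^3 = 8 subsets of the NFA's states.
subsets = [frozenset(c) for k in range(4)
           for c in combinations(sorted(N_STATES), k)]
dfa_delta = {(R, a): step(R, a) for R in subsets for a in SIGMA}
dfa_start = E({N_START})
dfa_accept = {R for R in subsets if R & N_ACCEPT}

# Keep only the states reachable from the start state.
reachable, frontier = {dfa_start}, [dfa_start]
while frontier:
    R = frontier.pop()
    for a in SIGMA:
        S = dfa_delta[R, a]
        if S not in reachable:
            reachable.add(S)
            frontier.append(S)

for R in sorted(reachable, key=lambda s: (len(s), sorted(s))):
    print(sorted(R), [sorted(dfa_delta[R, a]) for a in SIGMA])
```

Under these assumed transitions, the subsets {1} and {1, 2} are never reached from the start state {1, 3}, which is the kind of simplification the notes refer to when unreachable nodes are removed.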

DFA M has been simplified by removing unreachable nodes.

Given that NFAs delineate the class of regular languages, we will use them to show that regular languages are closed under concatenation.


Theorem If A1 and A2 are regular languages, then A1 ◦ A2 is regular.

Since A1 and A2 are both regular, there exist NFAs N1 and N2 such that L(N1) = A1 and L(N2) = A2.

- N1 = (Q1, Σ, δ1, q1, F1)
- N2 = (Q2, Σ, δ2, q2, F2)


We construct an NFA N = (Q, Σ, δ, q0, F ) to recognize A1 ◦ A2:

- Q = Q1 ∪ Q2 (we assume Q1 and Q2 are disjoint; rename states if necessary).
- q0 = q1.
- F = F2.
- For each q ∈ Q and a ∈ Σε:
  - δ(q, a) = δ1(q, a) if q ∈ Q1 and q ∉ F1.
  - δ(q, a) = δ1(q, a) ∪ {q2} if q ∈ F1 and a = ε.
  - δ(q, a) = δ1(q, a) if q ∈ F1 and a ≠ ε.
  - δ(q, a) = δ2(q, a) if q ∈ Q2.
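The concatenation construction amounts to adding an ε-edge from every accept state of N1 to the start state of N2. The sketch below assumes an NFA is represented as a tuple (states, delta, start, accepting), with `""` as the key for ε-transitions and missing entries standing for ∅. The example machines `A` and `B` are hypothetical inputs chosen here, recognizing {a}∗ and {b}.

```python
def accepts(nfa, w):
    """Simulate an NFA (states, delta, start, accepting); "" labels ε-edges."""
    _, delta, start, accepting = nfa

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for r in delta.get((stack.pop(), ""), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for a in w:
        current = closure({r for q in current for r in delta.get((q, a), set())})
    return bool(current & accepting)


def concat_nfa(n1, n2):
    """NFA for L(n1) ◦ L(n2); the two state sets are assumed to be disjoint."""
    Q1, d1, q1, F1 = n1
    Q2, d2, q2, F2 = n2
    delta = {key: set(targets) for key, targets in {**d1, **d2}.items()}
    for f in F1:
        # ε-edge from each accept state of n1 to the start state of n2
        delta.setdefault((f, ""), set()).add(q2)
    return Q1 | Q2, delta, q1, set(F2)


A = ({"p"}, {("p", "a"): {"p"}}, "p", {"p"})         # recognizes {a}*
B = ({"u", "v"}, {("u", "b"): {"v"}}, "u", {"v"})    # recognizes {b}
AB = concat_nfa(A, B)
for w in ["b", "aab", "", "abb", "ba"]:
    print(repr(w), accepts(AB, w))
```

Only the accept states of the second machine remain accepting, so a string is accepted only after the guessed split point has been crossed.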

Closure of regular languages under star

Theorem If A is a regular language, then A∗ is regular.

Let N1 = (Q1, Σ, δ1, q1, F1) be an NFA such that L(N1) = A. We construct an NFA N = (Q, Σ, δ, q0, F) to recognize A∗:

- Q = Q1 ∪ {q0}, where q0 ∉ Q1 is a new start state.
- F = F1 ∪ {q0}.
- For each q ∈ Q and a ∈ Σε:
  - δ(q, a) = ∅ if q = q0 and a ≠ ε.
  - δ(q, a) = {q1} if q = q0 and a = ε.
  - δ(q, a) = δ1(q, a) if q ∈ Q1 and q ∉ F1.
  - δ(q, a) = δ1(q, a) if q ∈ F1 and a ≠ ε.
  - δ(q, a) = δ1(q, a) ∪ {q1} if q ∈ F1 and a = ε.
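The star construction can be sketched in the same style. The code assumes an NFA is a tuple (states, delta, start, accepting) with `""` as the key for ε-transitions. The example machine `A`, which recognizes {ab}, and the default name of the new start state are hypothetical choices.

```python
def accepts(nfa, w):
    """Simulate an NFA (states, delta, start, accepting); "" labels ε-edges."""
    _, delta, start, accepting = nfa

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for r in delta.get((stack.pop(), ""), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for a in w:
        current = closure({r for q in current for r in delta.get((q, a), set())})
    return bool(current & accepting)


def star_nfa(n1, new_start="q0"):
    """NFA for L(n1)*, built by adding a fresh accepting start state."""
    Q1, d1, q1, F1 = n1
    assert new_start not in Q1, "the new start state must be fresh"
    delta = {key: set(targets) for key, targets in d1.items()}
    delta[(new_start, "")] = {q1}                 # new start --ε--> old start
    for f in F1:
        delta.setdefault((f, ""), set()).add(q1)  # accept --ε--> old start
    return Q1 | {new_start}, delta, new_start, set(F1) | {new_start}


A = ({"x", "y", "z"}, {("x", "a"): {"y"}, ("y", "b"): {"z"}}, "x", {"z"})
A_star = star_nfa(A)
for w in ["", "ab", "abab", "a", "aba"]:
    print(repr(w), accepts(A_star, w))
```

A fresh start state is used, rather than simply marking q1 as accepting, so that ε is accepted without also accepting strings that merely lead N1 back to q1.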

Note again that Sipser does not provide proofs that these constructions work.
