
CS 341: Foundations of CS II
Chapter 1: Regular Languages

Marvin K. Nakayama
Computer Science Department
New Jersey Institute of Technology
Newark, NJ 07102

Contents
• Finite Automata
• Class of Regular Languages is Closed Under Some Operations
• Nondeterminism
• Regular Expressions
• Nonregular Languages

Introduction

• Now introduce a simple model of a computer having a finite amount of memory.
• This type of machine will be known as a finite-state machine or a finite automaton.
• Basic idea of how a finite automaton works:
    It is presented an input string w over an alphabet Σ; i.e., w ∈ Σ∗.
    It reads in the symbols of w from left to right, one at a time.
    After reading the last symbol, it indicates if it accepts or rejects the string.
• These machines are useful for string matching, compilers, etc.

Deterministic Finite Automata (DFA)

Example: DFA with alphabet Σ = {a, b}:

[Figure: state diagram of a DFA with states q1, q2, q3 and edges labeled a and b]

• q1, q2, q3 are the states.
• q1 is the start state as it has an arrow coming into it from nowhere.
• q2 is an accept state as it is drawn with a double circle.
• Edges tell how to move when in a state and a symbol from Σ is read.
• DFA is fed input string w ∈ Σ∗. After reading last symbol of w,
    if DFA is in an accept state, then string is accepted;
    otherwise, it is rejected.
• Process the following strings over Σ = {a, b} on above machine:
    abaa is accepted:  q1 -a→ q1 -b→ q2 -a→ q3 -a→ q2
    aba is rejected:   q1 -a→ q1 -b→ q2 -a→ q3
    ε is rejected:     q1

Formal Definition of DFA

Definition: A deterministic finite automaton (DFA) is a 5-tuple M = (Q, Σ, δ, q0, F), where
1. Q is a finite set of states.
2. Σ is an alphabet, and the DFA processes strings over Σ.
3. δ : Q × Σ → Q is the transition function.
    • δ defines the label on each edge.
4. q0 ∈ Q is the start state (or initial state).
5. F ⊆ Q is the set of accept states (or final states).

Remark: Sometimes refer to DFA as simply a finite automaton (FA).

Transition Function of DFA

[Figure: state diagram of the DFA with states q1, q2, q3]

Transition function δ : Q × Σ → Q works as follows:
• For each state and for each symbol of the input alphabet, the function δ tells which (one) state to go to next.
• Specifically, if r ∈ Q and ℓ ∈ Σ, then δ(r, ℓ) is the state that the DFA goes to when it is in state r and reads in ℓ, e.g., δ(q2, a) = q3.
• For each pair of state r ∈ Q and symbol ℓ ∈ Σ, there is exactly one arc leaving r labeled with ℓ.
• Thus, there is no choice in how to process a string. So the machine is deterministic.

Example of DFA

M = (Q, Σ, δ, q1, F) with
• Q = {q1, q2, q3}
• Σ = {a, b}
• δ : Q × Σ → Q is described as

           a     b
    q1     q1    q2
    q2     q3    q2
    q3     q2    q2

• q1 is the start state
• F = {q2}.

How a DFA Computes

• DFA is presented with an input string w ∈ Σ∗.
• DFA begins in the start state.
• DFA reads the string one symbol at a time, starting from the left.
• The symbols read in determine the sequence of states visited.
• Processing ends after the last symbol of w has been read.
• After reading the entire input string
    if DFA ends in an accept state, then input string w is accepted;
    otherwise, input string w is rejected.

Formal Definition of DFA Computation

• Let M = (Q, Σ, δ, q0, F) be a DFA.
• String w = w_1 w_2 ··· w_n ∈ Σ∗, where each w_i ∈ Σ and n ≥ 0.
• Then M accepts w if there exists a sequence of states r_0, r_1, r_2, ..., r_n ∈ Q such that
    1. r_0 = q0
       (first state r_0 in the sequence is the start state of DFA);
    2. r_n ∈ F
       (last state r_n in the sequence is an accept state);
    3. δ(r_i, w_{i+1}) = r_{i+1} for each i = 0, 1, 2, ..., n − 1
       (sequence of states corresponds to valid transitions for string w).

    r_0 -w_1→ r_1 -w_2→ r_2 ··· r_{n−1} -w_n→ r_n
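The computation just defined can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the transition table is the one given for the example DFA M, while the names `DELTA`, `START`, `ACCEPT` and `run_dfa` are assumptions introduced here.

```python
# Transition table of the example DFA M = (Q, Σ, δ, q1, F) from the slides.
# The dictionary encoding and all identifiers are illustrative choices.
DELTA = {
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q3", ("q2", "b"): "q2",
    ("q3", "a"): "q2", ("q3", "b"): "q2",
}
START = "q1"
ACCEPT = {"q2"}


def run_dfa(delta, start, accept, w):
    """Return (accepted, visited) where visited is the sequence r_0, ..., r_n."""
    state = start
    visited = [state]
    for symbol in w:
        # Determinism: exactly one next state for each (state, symbol) pair.
        state = delta[(state, symbol)]
        visited.append(state)
    return state in accept, visited


if __name__ == "__main__":
    for w in ["abaa", "aba", ""]:
        ok, path = run_dfa(DELTA, START, ACCEPT, w)
        print(repr(w), "accepted" if ok else "rejected", path)
```

The returned list of states is the sequence r_0, ..., r_n from the formal definition, so the three strings processed earlier (abaa, aba, ε) can be checked directly.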

Language of Machine

• Definition: If A is the set of all strings that machine M accepts, then we say
    A = L(M) is the language of machine M, and
    M recognizes A.
• If machine M has input alphabet Σ, then L(M) ⊆ Σ∗.
• Definition: A language is regular if it is recognized by some DFA.

Examples of Deterministic Finite Automata

Example: Consider the following DFA M1 with alphabet Σ = {0, 1}:

[Figure: state diagram of M1 with states q1, q2 and edges labeled 0 and 1]

Remarks:
• 010110 is accepted, but 0101 is rejected.
• L(M1) is the language of strings over Σ in which the total number of 1's is odd.
• Can you come up with a DFA that recognizes the language of strings over Σ having an even number of 1's?

Example: Consider the following DFA M2 with alphabet Σ = {0, 1}:

[Figure: state diagram of M2 with states q1, q2, q3 and edges labeled 0, 1]

Remarks:
• L(M2) is the language of strings over Σ that have length 1, i.e.,
    L(M2) = { w ∈ Σ∗ | |w| = 1 }.
• Recall that the complement of L(M2) is the set of strings over Σ not in L(M2), i.e., Σ∗ − L(M2).
• Can you come up with a DFA that recognizes the complement of L(M2)?

Example: Consider the following DFA M3 with alphabet Σ = {0, 1}:

[Figure: state diagram of M3 with states q1, q2, q3 and edges labeled 0, 1]

Remarks:
• L(M3) is the language of strings over Σ that do not have length 1, i.e., L(M3) is the complement of L(M2):
    L(M3) = Σ∗ − L(M2) = { w ∈ Σ∗ | |w| ≠ 1 }.
• DFA can have more than one accept state.
• Start state can also be an accept state.
• In general, a DFA accepts ε if and only if the start state is also an accept state.

Constructing DFA for Complement

• In general, given a DFA M for language A, we can make a DFA M̄ for the complement Ā from M by
    changing all accept states in M into non-accept states in M̄,
    changing all non-accept states in M into accept states in M̄.
• More formally, suppose language A over alphabet Σ has a DFA
    M = (Q, Σ, δ, q1, F).
• Then, a DFA for the complementary language Ā is
    M̄ = (Q, Σ, δ, q1, Q − F),
  where Q, Σ, δ, q1, F are the same as in DFA M.
• Why does this work?

Example: Consider the following DFA M4 with alphabet Σ = {a, b}:

[Figure: state diagram of M4 with states q1, q2, q3 and edges labeled a and b]

Remarks:
• L(M4) is the language of strings over Σ that end with bb, i.e.,
    L(M4) = { w ∈ Σ∗ | w = sbb for some s ∈ Σ∗ }.
• Note that abbb ∈ L(M4) and bba ∉ L(M4).
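The accept-state swap used to build a DFA for the complement can be sketched as follows. This is a minimal illustration under stated assumptions: the tuple encoding and the names `complement` and `accepts` are introduced here, and the transition table for M2 is reconstructed from its description (strings of length exactly 1 over {0, 1}).

```python
def complement(dfa):
    """Given M = (Q, Σ, δ, q1, F), return the DFA (Q, Σ, δ, q1, Q − F)."""
    states, alphabet, delta, start, accept = dfa
    return states, alphabet, delta, start, states - accept


def accepts(dfa, w):
    """Run the DFA on w and report whether it ends in an accept state."""
    _, _, delta, state, accept = dfa
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accept


# M2 accepts exactly the strings of length 1 (table reconstructed, see lead-in).
Q = frozenset({"q1", "q2", "q3"})
M2 = (
    Q,
    frozenset({"0", "1"}),
    {
        ("q1", "0"): "q2", ("q1", "1"): "q2",
        ("q2", "0"): "q3", ("q2", "1"): "q3",
        ("q3", "0"): "q3", ("q3", "1"): "q3",
    },
    "q1",
    frozenset({"q2"}),
)
M3 = complement(M2)  # accepts the strings whose length is not 1

if __name__ == "__main__":
    for w in ["", "0", "1", "01", "110"]:
        print(repr(w), accepts(M2, w), accepts(M3, w))
```

Because δ is a total function, every string ends in exactly one state, so a string is accepted by the swapped machine precisely when the original machine rejects it.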

Example: Consider the following DFA M5 with alphabet Σ = {a, b}:

[Figure: state diagram of M5 with states q1, q2, q3, q4, q5 and edges labeled a and b]

    L(M5) = { w ∈ Σ∗ | w = saa or w = sbb for some string s ∈ Σ∗ }.

Note that abbb ∈ L(M5) and bba ∉ L(M5).

Example: Consider the following DFA M6 with alphabet Σ = {a, b}:

[Figure: state diagram of M6 with a single state q1 and a self-loop labeled a, b]

Remarks:
• This DFA accepts all possible strings over Σ, i.e.,
    L(M6) = Σ∗.
• In general, any DFA in which all states are accept states recognizes the language Σ∗.

Example: Consider the following DFA M7 with alphabet Σ = {a, b}:

[Figure: state diagram of M7 with a single state q1 and a self-loop labeled a, b]

Remarks:
• This DFA accepts no strings over Σ, i.e.,
    L(M7) = ∅.
• In general,
    a DFA may have no accept states, i.e., F = ∅ ⊆ Q.
    any DFA with no accept states recognizes the language ∅.

Example: Consider the following DFA M8 with alphabet Σ = {a, b}:

[Figure: state diagram of M8 with states q1, q2, q3, q4 arranged in a square and edges labeled a and b]

• DFA moves left or right on a.
• DFA moves up or down on b.
• This DFA recognizes the language of strings over Σ having
    even number of a's and
    even number of b's.
• Note that ababaa ∈ L(M8) and bba ∉ L(M8).

Some Operations on Languages

• Let A and B be languages.
• Recall we previously defined the operations:
    Union:
      A ∪ B = { w | w ∈ A or w ∈ B }.
    Concatenation:
      A ◦ B = { vw | v ∈ A, w ∈ B }.
    Kleene star:
      A∗ = { w_1 w_2 ··· w_k | k ≥ 0 and each w_i ∈ A }.

Closed under Operation

• Recall that a collection S of objects is closed under operation f if applying f to members of S always returns an object still in S.
    e.g., N = {1, 2, 3, ...} is closed under addition but not subtraction.
• Previously saw that given a DFA M1 for language A, can construct DFA M2 for complementary language Ā.
    Make all accept states in M1 into non-accept states in M2.
    Make all non-accept states in M1 into accept states in M2.
• Thus, the class of regular languages is closed under complementation.
    i.e., if A is a regular language, then Ā is a regular language.
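For finite languages, the three operations recalled above can be tried out directly on Python sets of strings. This is a sketch only; the function names are assumptions, and because A∗ is usually infinite, the star is truncated at a chosen maximum length.

```python
def union(A, B):
    """A ∪ B = { w | w ∈ A or w ∈ B }."""
    return A | B


def concat(A, B):
    """A ◦ B = { vw | v ∈ A, w ∈ B }."""
    return {v + w for v in A for w in B}


def star_up_to(A, max_len):
    """All strings of A∗ whose length is at most max_len.

    A∗ is infinite unless A ⊆ {ε}, so only a finite slice is returned.
    """
    result = {""}            # k = 0 contributes the empty string ε
    frontier = {""}
    while frontier:
        # Extend every newly found string by one more element of A.
        frontier = {u + w for u in frontier for w in A
                    if len(u + w) <= max_len} - result
        result |= frontier
    return result


if __name__ == "__main__":
    print(union({"a"}, {"b", "ab"}))
    print(concat({"a", "ab"}, {"b", ""}))
    print(sorted(star_up_to({"0"}, 3), key=len))
```

Note that `star_up_to(set(), n)` returns `{""}` for any n, since the empty concatenation (k = 0) is always allowed.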

Regular Languages Closed Under Union

Theorem 1.25
The class of regular languages is closed under union.
• i.e., if A1 and A2 are regular languages, then so is A1 ∪ A2.

Proof Idea:
• Suppose A1 is regular, so it has a DFA M1.
• Suppose A2 is regular, so it has a DFA M2.
• w ∈ A1 ∪ A2 if and only if w ∈ A1 or w ∈ A2.
• w ∈ A1 ∪ A2 if and only if w is accepted by M1 or M2.
• Need DFA M3 to accept a string w iff w is accepted by M1 or M2.
• Construct M3 to keep track of where the input would be if it were simultaneously running on both M1 and M2.
• Accept string if and only if M1 or M2 accepts.

Example: Consider the following DFAs and languages over Σ = {a, b}:
• DFA M1 recognizes language A1 = L(M1)
• DFA M2 recognizes language A2 = L(M2)

[Figure: DFA M1 for A1 with states x1, x2, and DFA M2 for A2 with states y1, y2, y3; edges labeled a and b]

• We now want a DFA M3 for A1 ∪ A2.

Step 1 to build DFA M3 for A1 ∪ A2: Begin in start states for M1 and M2, i.e., in state (x1, y1).

Step 2: From (x1, y1) on input a, M1 moves to x1, and M2 moves to y2, so M3 moves to (x1, y2).

Step 3: From (x1, y1) on input b, M1 moves to x2, and M2 moves to y3, so M3 moves to (x2, y3).

Step 4: From (x1, y2) on input a, M1 moves to x1, and M2 moves to y1, so M3 moves to (x1, y1).

Step 5: From (x1, y2) on input b, M1 moves to x2, and M2 moves to y1, so M3 moves to (x2, y1), ....

Continue until each state has outgoing edge for each symbol in Σ.

[Figure: resulting DFA M3 with states (x1,y1), (x1,y2), (x1,y3), (x2,y1), (x2,y2), (x2,y3) and edges labeled a and b]

[Figure: DFA M1 for A1, DFA M2 for A2, and the product DFA M3 with its accept states marked]

Accept states for DFA M3 for A1 ∪ A2 have accept state from M1 or M2.

Proof that Regular Languages Closed Under Union

• Suppose A1 and A2 are defined over the same alphabet Σ.
• Suppose A1 recognized by DFA M1 = (Q1, Σ, δ1, q1, F1).
• Suppose A2 recognized by DFA M2 = (Q2, Σ, δ2, q2, F2).
• Define DFA M3 = (Q3, Σ, δ3, q3, F3) for A1 ∪ A2 as follows:
    Set of states of M3 is
      Q3 = Q1 × Q2 = { (x, y) | x ∈ Q1, y ∈ Q2 }.
    The alphabet of M3 is Σ.
    M3 has transition function δ3 : Q3 × Σ → Q3 such that for x ∈ Q1, y ∈ Q2, and ℓ ∈ Σ,
      δ3((x, y), ℓ) = ( δ1(x, ℓ), δ2(y, ℓ) ).
    The start state of M3 is
      q3 = (q1, q2) ∈ Q3.
    The set of accept states of M3 is
      F3 = { (x, y) ∈ Q1 × Q2 | x ∈ F1 or y ∈ F2 }
         = [F1 × Q2] ∪ [Q1 × F2].
• Because Q3 = Q1 × Q2, number of states in new machine M3 is |Q3| = |Q1| · |Q2|.
• Thus, |Q3| < ∞ because |Q1| < ∞ and |Q2| < ∞.

Remark:
• We can leave out a state (x, y) ∈ Q1 × Q2 from Q3 if (x, y) is not reachable from M3's initial state (q1, q2).
• This would result in fewer states in Q3, but still we have |Q1| · |Q2| as an upper bound for |Q3|; i.e., |Q3| ≤ |Q1| · |Q2| < ∞.

Regular Languages Closed Under Intersection

Theorem
The class of regular languages is closed under intersection.
• i.e., if A1 and A2 are regular languages, then so is A1 ∩ A2.

Proof Idea:
• A1 has DFA M1.
• A2 has DFA M2.
• w ∈ A1 ∩ A2 if and only if w ∈ A1 and w ∈ A2.
• w ∈ A1 ∩ A2 if and only if w is accepted by both M1 and M2.
• Need DFA M3 to accept string w iff w is accepted by M1 and M2.
• Construct M3 to simultaneously keep track of where the input would be if it were running on both M1 and M2.
• Accept string if and only if both M1 and M2 accept.
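The product construction used for union and intersection can be sketched in Python as below. It is an illustration rather than the course's own code: the tuple encoding, the function names, and the helper `parity_dfa` (which builds a two-state DFA for "even number of a given symbol") are all assumptions made for this example.

```python
from itertools import product


def product_dfa(m1, m2, mode="union"):
    """Build M3 from DFAs m1, m2 = (states, alphabet, delta, start, accept).

    Both machines are assumed to share the same alphabet.
    mode="union": accept if either accepts; "intersection": if both accept.
    """
    q1, sigma, d1, s1, f1 = m1
    q2, _, d2, s2, f2 = m2
    states = set(product(q1, q2))                        # Q3 = Q1 × Q2
    delta = {((x, y), c): (d1[(x, c)], d2[(y, c)])       # δ3 runs both machines
             for (x, y) in states for c in sigma}
    if mode == "union":
        accept = {(x, y) for (x, y) in states if x in f1 or y in f2}
    elif mode == "intersection":
        accept = {(x, y) for (x, y) in states if x in f1 and y in f2}
    else:
        raise ValueError(f"unknown mode: {mode}")
    return states, sigma, delta, (s1, s2), accept


def accepts(dfa, w):
    _, _, delta, state, accept = dfa
    for c in w:
        state = delta[(state, c)]
    return state in accept


def parity_dfa(symbol, sigma=("a", "b")):
    """Hypothetical helper: DFA for strings with an even number of `symbol`."""
    delta = {}
    for s in ("even", "odd"):
        flipped = "odd" if s == "even" else "even"
        for c in sigma:
            delta[(s, c)] = flipped if c == symbol else s
    return {"even", "odd"}, set(sigma), delta, "even", {"even"}


EVEN_A = parity_dfa("a")
EVEN_B = parity_dfa("b")
BOTH = product_dfa(EVEN_A, EVEN_B, "intersection")   # even a's and even b's
EITHER = product_dfa(EVEN_A, EVEN_B, "union")        # even a's or even b's

if __name__ == "__main__":
    for w in ["ababaa", "bba", "ab"]:
        print(w, accepts(BOTH, w), accepts(EITHER, w))
```

The intersection machine has |Q1| · |Q2| = 4 states and recognizes the same language as M8 (even number of a's and even number of b's); only the accept-state condition differs between the union and intersection versions.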

Regular Languages Closed Under Concatenation

Theorem 1.26
Class of regular languages is closed under concatenation.
• i.e., if A1 and A2 are regular languages, then so is A1 ◦ A2.

Remark:
• It is possible (but cumbersome) to directly construct a DFA for A1 ◦ A2 given DFAs for A1 and A2.
• There is a simpler way if we introduce a new type of machine.

Nondeterministic Finite Automata

• In any DFA, the next state the machine goes to on any given symbol is uniquely determined.

[Figure: state diagram of a DFA with states q1, q2, q3 and edges labeled a and b]

• This is why these machines are deterministic.
• Remember that the transition function in a DFA is defined as
    δ : Q × Σ → Q.
• Because range of δ is Q, function δ always returns a single state.
• DFA has exactly one transition leaving each state for each symbol.
    δ(q, ℓ) tells what state the edge out of q labeled with ℓ leads to.

Nondeterminism

• Nondeterministic finite automata (NFAs) allow for several or no choices to exist for the next state on a given symbol.
• For a state q and symbol ℓ ∈ Σ, NFA can have
    multiple edges leaving q labelled with the same symbol ℓ
    no edge leaving q labelled with symbol ℓ
    edges leaving q labelled with ε
      can take ε-edge without reading any symbol from input string.

Example: NFA N1 with alphabet Σ = {0, 1}.

[Figure: state diagram of NFA N1 with states q1, q2, q3, q4 and edges labeled 0, 1, and ε]

• Suppose NFA is in a state with multiple ways to proceed, e.g., in state q1 and the next symbol in input string is 1.
• The machine splits into multiple copies of itself (threads).
    Each copy proceeds with computation independently of others.
    NFA may be in a set of states, instead of a single state.
    NFA follows all possible computation paths in parallel.
    If a copy is in a state and next input symbol doesn't appear on any outgoing edge from the state, then the copy dies or crashes.
• If any copy ends in an accept state after reading entire input string, the NFA accepts the string.
• If no copy ends in an accept state after reading entire input string, then NFA does not accept (rejects) the string.

[Figure: state diagram of NFA N1 with states q1, q2, q3, q4 and edges labeled 0, 1, and ε]

• Similarly, if a state with an ε-transition is encountered,
    without reading an input symbol, NFA splits into multiple copies, each one following an exiting ε-transition (or staying put).
    Each copy proceeds independently of other copies.
    NFA follows all possible paths in parallel.
    NFA proceeds nondeterministically as before.
• What happens on input string 010110 ?

    Symbol read    States of the copies
    Start          q1
    0              q1
    1              q1 q2 q3
    0              q1 q3
    1              q1 q2 q3 q4
    1              q1 q2 q3 q4 q4
    0              q1 q3 q4 q4

Formal Definition of NFA

Definition: For an alphabet Σ, define Σ_ε = Σ ∪ {ε}.
• Σ_ε is set of possible labels on NFA edges.

Definition: A nondeterministic finite automaton (NFA) is a 5-tuple (Q, Σ, δ, q0, F), where
1. Q is a finite set of states
2. Σ is an alphabet
3. δ : Q × Σ_ε → P(Q) is the transition function, where
    • P(Q) is the power set of Q
    • δ defines the label on each edge.
4. q0 ∈ Q is the start state
5. F ⊆ Q is the set of accept states.

Example: NFA N

[Figure: state diagram of NFA N with states q1, q2, q3 and edges labeled a, b, and ε]

• N accepts strings ε, a, aa, baa, baba, ....
    e.g., aa = εaεa:  q1 -ε→ q3 -a→ q1 -ε→ q3 -a→ q1
• N does not accept (i.e., rejects) strings b, ba, bb, bbb, ....

Difference Between DFA and NFA

• DFA has transition function δ : Q × Σ → Q.

[Figure: state diagram of a DFA with states q1, q2 and edges labeled 0 and 1]

• NFA has transition function δ : Q × Σ_ε → P(Q).
    Returns a set of states rather than a single state.
    Allows for ε-transitions because Σ_ε = Σ ∪ {ε}.
    For state q ∈ Q and ℓ ∈ Σ_ε, δ(q, ℓ) is set of states where edges out of q labeled with ℓ lead to.

[Figure: state diagram of NFA N1 with states q1, q2, q3, q4 and edges labeled 0, 1, and ε]

• Remark: Note that every DFA is also an NFA.

Formal description of above NFA N = (Q, Σ, δ, q1, F)
• Q = {q1, q2, q3, q4} is the set of states
• Σ = {0, 1} is the alphabet
• Transition function δ : Q × Σ_ε → P(Q)

           0        1          ε
    q1     {q1}     {q1,q2}    ∅
    q2     {q3}     ∅          {q3}
    q3     ∅        {q4}       ∅
    q4     {q4}     {q4}       ∅

• q1 is the start state
• F = {q4} is the set of accept states

Formal Definition of NFA Computation

• Let N = (Q, Σ, δ, q0, F) be an NFA and w ∈ Σ∗.
• Then N accepts w if
    we can write w as w = y_1 y_2 ··· y_m for some m ≥ 0, where each y_i ∈ Σ_ε, and
    there is a sequence of states r_0, r_1, r_2, ..., r_m in Q such that
      1. r_0 = q0
      2. r_{i+1} ∈ δ(r_i, y_{i+1}) for each i = 0, 1, 2, ..., m − 1
      3. r_m ∈ F

    r_0 -y_1→ r_1 -y_2→ r_2 ··· r_{m−1} -y_m→ r_m

Definition: The set of all input strings that are accepted by NFA N is the language recognized by N and is denoted by L(N).

Equivalence of DFAs and NFAs

Definition: Two machines (of any types) are equivalent if they recognize the same language.

Theorem 1.39
Every NFA N has an equivalent DFA M.
• i.e., if N is some NFA, then ∃ DFA M such that L(M) = L(N).

Proof Idea:
• NFA N splits into multiple copies of itself on nondeterministic moves.
• NFA can be in a set of states at any one time.
• Build DFA M whose set of states is the power set of the set of states of NFA N, keeping track of where N can be at any time.
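The idea of tracking the set of states an NFA can be in can be sketched as follows, using the transition table given for N1. This is a minimal illustration: representing ε by the empty string and the names `EPS`, `eps_closure` and `nfa_trace` are assumptions of this sketch.

```python
EPS = ""  # the empty string stands in for ε in this sketch

# Transition table of N1; missing entries mean the empty set ∅.
DELTA = {
    ("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"}, ("q2", EPS): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}
START = "q1"
ACCEPT = {"q4"}


def eps_closure(delta, states):
    """States reachable from `states` using zero or more ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_trace(delta, start, w):
    """Return the list of state sets the NFA can be in after each symbol."""
    current = eps_closure(delta, {start})
    trace = [current]
    for c in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, c), set())   # copies with no edge die
        current = eps_closure(delta, moved)
        trace.append(current)
    return trace


def nfa_accepts(delta, start, accept, w):
    return bool(nfa_trace(delta, start, w)[-1] & accept)


if __name__ == "__main__":
    for step in nfa_trace(DELTA, START, "010110"):
        print(sorted(step))
    print(nfa_accepts(DELTA, START, ACCEPT, "010110"))
```

On input 010110 the successive sets agree with the computation shown for N1, except that duplicate copies in the same state (the two copies in q4) collapse into one set element.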

Example: Convert NFA N into equivalent DFA.

[Figure: state diagram of NFA N with states q1, q2, q3, q4 and edges labeled 0, 1, and ε; the DFA is built up one edge at a time in the steps below]

• N's start state q1 has no ε-edges out, so DFA has start state {q1}.
• On reading 0 from states in {q1}, can reach states {q1}.
• On reading 1 from states in {q1}, can reach states {q1, q2, q3}.
• On reading 0 from states in {q1, q2, q3}, can reach states {q1, q3}.
• On reading 1 from states in {q1, q2, q3}, can reach states {q1, q2, q3, q4}.
• On reading 0 from states in {q1, q3}, can reach states {q1}.
• On reading 1 from states in {q1, q3}, can reach states {q1, q2, q3, q4}.

Example: Convert NFA N into equivalent DFA.

Continue until each DFA state has a 0-edge and a 1-edge leaving it.
DFA accept states have ≥ 1 accept states from N.

[Figure: resulting DFA with states {q1}, {q1,q2,q3}, {q1,q3}, {q1,q2,q3,q4}, {q1,q3,q4}, {q1,q4} and edges labeled 0 and 1]

Proof. (Theorem 1.39)
• Consider NFA N = (Q, Σ, δ, q0, F):

[Figure: state diagram of NFA N with states q1, q2, q3, q4 and edges labeled 0, 1, and ε]

• Definition: The ε-closure of a set of states R ⊆ Q is
    E(R) = { q | q can be reached from R by travelling over 0 or more ε transitions }.
    e.g., E({q1, q2}) = {q1, q2, q3}.

Convert NFA to Equivalent DFA

Given NFA N = (Q, Σ, δ, q0, F), build an equivalent DFA M = (Q′, Σ, δ′, q0′, F′) as follows:
1. Calculate the ε-closure of every subset R ⊆ Q.
2. Define DFA M's set of states Q′ = P(Q).
3. Define DFA M's start state q0′ = E({q0}).
4. Define DFA M's set of accept states F′ to be all DFA states in Q′ that include an accept state of NFA N; i.e.,
    F′ = { R ∈ Q′ | R ∩ F ≠ ∅ }.
5. Calculate DFA M's transition function δ′ : Q′ × Σ → Q′ as
    δ′(R, ℓ) = { q ∈ Q | q ∈ E(δ(r, ℓ)) for some r ∈ R }
   for R ∈ Q′ = P(Q) and ℓ ∈ Σ.
6. Can leave out any state q′ ∈ Q′ not reachable from q0′, e.g., {q2, q3} in our previous example.

Regular ⇐⇒ NFA

Corollary 1.40
Language A is regular if and only if some NFA recognizes A.

Proof.
(⇒)
• If A is regular, then there is a DFA for it.
• But every DFA is also an NFA, so there is an NFA for A.
(⇐)
• Follows from previous theorem (1.39), which showed that every NFA has an equivalent DFA.
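The conversion steps above can be sketched in Python. The sketch below only builds the DFA states reachable from the start state (step 6), rather than the full power set, and it reuses the transition table of the example NFA; the encoding of ε as the empty string and all function names are assumptions introduced here.

```python
EPS = ""  # the empty string stands in for ε in this sketch

# Example NFA from the slides; missing entries mean ∅.
NFA_DELTA = {
    ("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"}, ("q2", EPS): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}


def eps_closure(delta, states):
    """E(R): states reachable from R over 0 or more ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_to_dfa(delta, start, accept, alphabet):
    """Subset construction restricted to reachable DFA states."""
    start_set = frozenset(eps_closure(delta, {start}))     # q0' = E({q0})
    dfa_delta = {}
    seen = {start_set}
    todo = [start_set]
    while todo:
        R = todo.pop()
        for c in alphabet:
            moved = set()
            for r in R:
                moved |= delta.get((r, c), set())
            S = frozenset(eps_closure(delta, moved))       # δ'(R, c)
            dfa_delta[(R, c)] = S
            if S not in seen:
                seen.add(S)
                todo.append(S)
    dfa_accept = {R for R in seen if R & accept}           # R ∩ F ≠ ∅
    return seen, dfa_delta, start_set, dfa_accept


if __name__ == "__main__":
    states, d, s0, acc = nfa_to_dfa(NFA_DELTA, "q1", {"q4"}, ["0", "1"])
    for (R, c), S in sorted(d.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
        print(sorted(R), c, "->", sorted(S))
```

Exploring only reachable subsets keeps the result small for this example, although in the worst case the number of DFA states can still be as large as the power set of Q.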

Class of Regular Languages Closed Under Union

Remark: Can use fact that every NFA has an equivalent DFA to simplify the proof that the class of regular languages is closed under union.

Remark: Recall union:
    A1 ∪ A2 = { w | w ∈ A1 or w ∈ A2 }.

Theorem 1.45
The class of regular languages is closed under union.

Proof Idea: Given NFAs N1 and N2 for A1 and A2, resp., construct NFA N for A1 ∪ A2 as follows:

[Figure: NFA N consisting of a new start state with ε-edges into N1 and N2]

Construct NFA for A1 ∪ A2 from NFAs for A1 and A2

• Let A1 be language recognized by NFA N1 = (Q1, Σ, δ1, q1, F1).
• Let A2 be language recognized by NFA N2 = (Q2, Σ, δ2, q2, F2).
• Construct NFA N = (Q, Σ, δ, q0, F) for A1 ∪ A2:
    Q = {q0} ∪ Q1 ∪ Q2 is set of states of N.
    q0 is start state of N.
    Set of accept states F = F1 ∪ F2.
    For q ∈ Q and a ∈ Σ_ε, transition function δ satisfies

      δ(q, a) = δ1(q, a)    if q ∈ Q1,
                δ2(q, a)    if q ∈ Q2,
                {q1, q2}    if q = q0 and a = ε,
                ∅           if q = q0 and a ≠ ε.

Class of Regular Languages Closed Under Concatenation

Remark: Recall concatenation:
    A ◦ B = { vw | v ∈ A, w ∈ B }.

Theorem 1.47
The class of regular languages is closed under concatenation.

Proof Idea: Given NFAs N1 and N2 for A1 and A2, resp., construct NFA N for A1 ◦ A2 as follows:

[Figure: NFA N consisting of N1 followed by N2, with ε-edges joining the two machines]

Construct NFA for A1 ◦ A2 from NFAs for A1 and A2

• Let A1 be language recognized by NFA N1 = (Q1, Σ, δ1, q1, F1).
• Let A2 be language recognized by NFA N2 = (Q2, Σ, δ2, q2, F2).
• Construct NFA N = (Q, Σ, δ, q1, F2) for A1 ◦ A2:
    Q = Q1 ∪ Q2 is set of states of N.
    Start state of N is q1, which is start state of N1.
    Set of accept states of N is F2, which is same as for N2.
    For q ∈ Q and a ∈ Σ_ε, transition function δ satisfies

      δ(q, a) = δ1(q, a)           if q ∈ Q1 − F1,
                δ1(q, a)           if q ∈ F1 and a ≠ ε,
                δ1(q, a) ∪ {q2}    if q ∈ F1 and a = ε,
                δ2(q, a)           if q ∈ Q2.

Class of Regular Languages Closed Under Star

Remark: Recall Kleene star:
    A∗ = { x_1 x_2 ··· x_k | k ≥ 0 and each x_i ∈ A }.

Theorem 1.49
The class of regular languages is closed under the Kleene-star operation.

Proof Idea: Given NFA N1 for A, construct NFA N for A∗ as follows:

[Figure: NFA N consisting of a new start state and N1, joined by ε-edges]

Construct NFA for A∗ from NFA for A

• Let A be language recognized by NFA N1 = (Q1, Σ, δ1, q1, F1).
• Construct NFA N = (Q, Σ, δ, q0, F) for A∗:
    Q = {q0} ∪ Q1 is set of states of N.
    q0 is start state of N.
    F = {q0} ∪ F1 is the set of accept states of N.
    For q ∈ Q and a ∈ Σ_ε, transition function δ satisfies

      δ(q, a) = δ1(q, a)           if q ∈ Q1 − F1,
                δ1(q, a)           if q ∈ F1 and a ≠ ε,
                δ1(q, a) ∪ {q1}    if q ∈ F1 and a = ε,
                {q1}               if q = q0 and a = ε,
                ∅                  if q = q0 and a ≠ ε.

Regular Expressions

• Regular expressions are a way of describing certain languages.
• Consider alphabet Σ = {0, 1}.
• Shorthand notation:
    0 means {0}
    1 means {1}
• Regular expressions use above shorthand notation and operations
    union ∪
    concatenation ◦
    Kleene star ∗
• When using concatenation, will often leave out operator "◦".

Interpreting Regular Expressions

Example: 0 ∪ 1 means {0} ∪ {1}, which equals {0, 1}.

Example:
• Consider (0 ∪ 1)0∗, which means (0 ∪ 1) ◦ 0∗.
• This equals {0, 1} ◦ {0}∗.
• Recall {0}∗ = { ε, 0, 00, 000, ... }.
• Thus, {0, 1} ◦ {0}∗ is the set of strings that start with symbol 0 or 1, and followed by zero or more 0's.

Another Example of a Regular Expression

Example:
• (0 ∪ 1)∗ means ({0} ∪ {1})∗.
• This equals {0, 1}∗, which is the set of all possible strings over the alphabet Σ = {0, 1}.
• When Σ = {0, 1}, often use shorthand notation Σ to denote regular expression (0 ∪ 1).

Hierarchy of Operations in Regular Expressions

• In most programming languages,
    multiplication has precedence over addition
      2 + 3 × 4 = 14
    parentheses change usual order
      (2 + 3) × 4 = 20
    exponentiation has precedence over multiplication and addition
      4 + 2 × 3² = ___ ,   4 + (2 × 3)² = ___ .
• Order of precedence for the regular operations:
    1. Kleene star
    2. concatenation
    3. union
• Parentheses change usual order.

More Examples of Regular Expressions

Example: 00 ∪ 101∗ is language consisting of
• string 00
• strings that begin with 10 and followed by zero or more 1's.

Example: 0(0 ∪ 101)∗ is the language consisting of strings that
• start with 0
• concatenated to a string in {0, 101}∗.
For example, 0101001010 is in the language because
    0101001010 = 0 ◦ 101 ◦ 0 ◦ 0 ◦ 101 ◦ 0.

Formal Definition of Regular Expression

Definition: R is a regular expression with alphabet Σ if R is
1. a for some a ∈ Σ
2. ε
3. ∅
4. (R1 ∪ R2), where R1 and R2 are regular expressions
5. (R1) ◦ (R2), also denoted by (R1)(R2), where R1 and R2 are regular expressions
6. (R1)∗, where R1 is a regular expression
7. (R1), where R1 is a regular expression.

Can remove redundant parentheses, e.g., ((0) ∪ (1))(1) → (0 ∪ 1)1.

Definition: If R is a regular expression, then L(R) is the language generated (or described or defined) by R.

Examples of Regular Expressions

Examples: For Σ = {0, 1},
1. (0 ∪ 1) = {0, 1}
2. 0∗10∗ = { w | w has exactly a single 1 }
3. Σ∗1Σ∗ = { w | w has at least one 1 }
4. Σ∗001Σ∗ = { w | w contains 001 as a substring }
5. (ΣΣ)∗ = { w | |w| is even }
6. (ΣΣΣ)∗ = { w | |w| is a multiple of three }
7. 0Σ∗0 ∪ 1Σ∗1 ∪ 0 ∪ 1 = { w | w starts and ends with the same symbol }
8. 1∗∅ = ∅; anything concatenated with ∅ is equal to ∅.
9. ∅∗ = {ε}
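Several of the example regular expressions can be checked mechanically with Python's `re` module. This is a hedged sketch: it assumes the translation ∪ → `|` and Σ → `[01]`, uses `re.fullmatch` so that the whole string must match, and compares each pattern with a plain-Python predicate on all strings up to a small length. The names `EXAMPLES`, `all_strings` and `mismatches` are introduced only for this illustration.

```python
import re
from itertools import product

# Pattern (Python syntax) paired with the set description from the examples.
EXAMPLES = {
    r"0*10*": lambda w: w.count("1") == 1,               # exactly a single 1
    r"[01]*1[01]*": lambda w: "1" in w,                   # at least one 1
    r"[01]*001[01]*": lambda w: "001" in w,               # contains 001
    r"([01][01])*": lambda w: len(w) % 2 == 0,            # even length
    r"([01][01][01])*": lambda w: len(w) % 3 == 0,        # multiple of three
    r"0[01]*0|1[01]*1|0|1":                               # same first and last symbol
        lambda w: len(w) >= 1 and w[0] == w[-1],
}


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def mismatches(max_len=6):
    """Return (pattern, string) pairs where regex and predicate disagree."""
    bad = []
    for pattern, predicate in EXAMPLES.items():
        for w in all_strings("01", max_len):
            if (re.fullmatch(pattern, w) is not None) != predicate(w):
                bad.append((pattern, w))
    return bad


if __name__ == "__main__":
    print("disagreements:", mismatches())
    print(bool(re.fullmatch(r"0(0|101)*", "0101001010")))
```

A bounded check like this is evidence rather than proof, but it is a quick way to catch a wrong translation of a regular expression.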

Examples:
1. R ∪ ∅ = ∅ ∪ R = R
2. R ◦ ε = ε ◦ R = R
3. R ◦ ∅ = ∅ ◦ R = ∅
4. R1(R2 ∪ R3) = R1R2 ∪ R1R3.
   Concatenation distributes over union.

Example:
• Define EVEN-EVEN over alphabet Σ = {a, b} as strings with an even number of a's and an even number of b's.
• For example, aababbaaababab ∈ EVEN-EVEN.
• Regular expression:
    ( aa ∪ bb ∪ (ab ∪ ba)(aa ∪ bb)∗(ab ∪ ba) )∗

Kleene's Theorem

Theorem 1.54
Language A is regular iff A has a regular expression.

Lemma 1.55
If a language is described by a regular expression, then it is regular.

Proof. Procedure to convert regular expression R into NFA N:

1. If R = a for some a ∈ Σ, then L(R) = {a}, which has NFA

   [Figure: start state q1 with an edge labeled a to accept state q2]

   N = ({q1, q2}, Σ, δ, q1, {q2}) where transition function δ
    • δ(q1, a) = {q2},
    • δ(r, b) = ∅ for any state r ≠ q1 or any b ∈ Σ_ε with b ≠ a.

2. If R = ε, then L(R) = {ε}, which has NFA

   [Figure: single state q1 that is both the start state and an accept state]

   N = ({q1}, Σ, δ, q1, {q1}) where
    • δ(r, b) = ∅ for any state r and any b ∈ Σ_ε.

3. If R = ∅, then L(R) = ∅, which has NFA

   [Figure: single start state q1 that is not an accept state]

   N = ({q1}, Σ, δ, q1, ∅) where
    • δ(r, b) = ∅ for any state r and any b ∈ Σ_ε.

4. If R = (R1 ∪ R2) and
    • L(R1) has NFA N1
    • L(R2) has NFA N2,
   then L(R) = L(R1) ∪ L(R2) has NFA N below:

   [Figure: NFA N consisting of a new start state with ε-edges into N1 and N2]

5. If R = (R1) ◦ (R2) and
    • L(R1) has NFA N1
    • L(R2) has NFA N2,
   then L(R) = L(R1) ◦ L(R2) has NFA N below:

   [Figure: NFA N consisting of N1 followed by N2, joined by ε-edges]

6. If R = (R1)∗ and L(R1) has NFA N1, then L(R) = (L(R1))∗ has NFA N below:

   [Figure: NFA N consisting of a new start state and N1, joined by ε-edges]

• Thus, can convert any regular expression R into an NFA.
• Hence, Corollary 1.40 implies that the language L(R) is regular.

Ex: Build NFA for (ab ∪ a)∗

[Figure: NFAs built in stages for a, b, ab, ab ∪ a, and (ab ∪ a)∗, joined by ε-edges. ∃ other correct NFAs.]

More of Kleene's Theorem

Lemma 1.60
If a language is regular, then it has a regular expression.

Proof Idea:
• Convert DFA into regular expression.
• Use generalized NFA (GNFA), which is an NFA with following modifications:
    no edges into start state.
    single accept state, with no edges out of it.
    labels on edges are regular expressions instead of elements from Σ_ε.
      can traverse edge on any string generated by its regular expression.

Example: GNFA

[Figure: GNFA with states q1, q2, q3, q4, q5; edge labels include ε, (aa ∪ b)∗, ba∗, (ab)∗a∗, and (b ∪ a∗b)∗]

• Can move from
    q1 to q2 on string ε.
    q2 to q3 on string aabaa.
    q3 to q3 on string b or baaa.
    q3 to q4 on string ε.
    q4 to q5 on string ε.
• GNFA accepts string ε ◦ aabaa ◦ b ◦ baaa ◦ ε ◦ ε = aabaabbaaa.

Method to convert DFA into regular expression

1. First convert DFA into equivalent GNFA.
2. Apply following iterative procedure:
    • In each step, eliminate one state from GNFA.
        When state is eliminated, need to account for every path that was previously possible.
        Can eliminate states in any order, but the resulting regular expression may differ (it still describes the same language).
        Never delete start or (unique) accept state.
    • Done when only 2 states remaining: start and accept.
        Label on remaining arc between start and accept states is a regular expression for language of original DFA.

Remark: Method also can convert NFA into a regular expression.

1. Convert DFA M = (Q, Σ, δ, q1, F) into equivalent GNFA G.
    • Introduce new start state s.
        Add edge from s to q1 with label ε.
        Make q1 no longer the start state.
    • Introduce new accept state t.
        Add edge with label ε from each state q ∈ F to t.
        Make each state originally in F no longer an accept state.
    • Change edge labels into regular expressions.
        e.g., "a, b" becomes "a ∪ b".

   [Figure: DFA M and the corresponding GNFA G with new states s and t attached by ε-edges]

2. Iteratively eliminate a state from GNFA G.
    • Need to take into account all possible previous paths.
    • Never eliminate new start state s or new accept state t.

   Example: Eliminate state q2, which has no other in/out edges.

   [Figure: before, edges q1 → q3 labeled R4, q1 → q2 labeled R1, loop on q2 labeled R2, and q2 → q3 labeled R3; after, a single edge q1 → q3 labeled R4 ∪ (R1)(R2)∗(R3)]

Example: Convert DFA M into regular expression.

[Figure: DFA M with states q1, q2, q3 and edges labeled a and b]

1) Convert DFA into GNFA:
    [Figure: GNFA with new start state s, new accept state t, ε-edges s → q1 and q3 → t, and a loop labeled a ∪ b on q3]
2.1) Eliminate state q2:
    s -ε→ q1 -(b ∪ aa∗b)→ q3 -ε→ t, with a loop labeled a ∪ b on q3
2.2) Eliminate state q3:
    s -ε→ q1 -(b ∪ aa∗b)(a ∪ b)∗→ t
2.3) Eliminate state q1:
    s -(b ∪ aa∗b)(a ∪ b)∗→ t

Example: Eliminate state x, which has no other in/out edges.

[Figure: GNFA fragment with states v, x, y, z and edge labels R1, ..., R9]

• Let C = {v, z}, which are states with arcs into x (except for x).
• Let D = {v, y, z}, which are states with arcs from x (except for x).
• When we eliminate x, need to account for paths
    from each state in C directly into x
    then from x directly to x
    finally from x directly to each state in D
• Recall C = {v, z} and D = {v, y, z}.
• So eliminating state x gives

[Figure: GNFA fragment on states v, y, z with edge labels (R1)(R2)∗(R3), (R1)(R2)∗(R4), (R1)(R2)∗(R5), (R6)(R2)∗(R3), R7, R8 ∪ (R6)(R2)∗(R4), and R9 ∪ (R6)(R2)∗(R5)]

• e.g., for path v → x → y, add arc from v to y with label
    (R1)(R2)∗(R4)

Example: Convert DFA into Regular Expression

[Figure: DFA with states 1, 2, 3 and edges labeled a and b; state 1 is the start state and states 2 and 3 are accept states]

Step 1. Convert DFA into GNFA
    [Figure: GNFA with new start state s (ε-edge to 1) and new accept state t (ε-edges from 2 and 3)]

Step 2.1. Eliminate state 1
    C = {s, 2, 3}
    D = {2, 3}
    [Figure: GNFA on s, 2, 3, t with edge labels a, b, aa ∪ b, ab, ba ∪ a, bb, and ε]

Step 2.2. Eliminate state 2
    C = {s, 3}
    D = {3, t}
    [Figure: GNFA on s, 3, t with edge labels a(aa ∪ b)∗ab ∪ b, (ba ∪ a)(aa ∪ b)∗ab ∪ bb, (ba ∪ a)(aa ∪ b)∗ ∪ ε, and a(aa ∪ b)∗]

Step 2.3. Eliminate state 3
    C = {s}, D = {t}
    The single remaining edge s → t has label

      (a(aa ∪ b)∗ab ∪ b) ((ba ∪ a)(aa ∪ b)∗ab ∪ bb)∗ ((ba ∪ a)(aa ∪ b)∗ ∪ ε) ∪ a(aa ∪ b)∗

    where
      a(aa ∪ b)∗ab ∪ b is the first visit to 3,
      ((ba ∪ a)(aa ∪ b)∗ab ∪ bb)∗ is 0 or more returns to 3,
      (ba ∪ a)(aa ∪ b)∗ ∪ ε is end in 2 or stay in 3,
      a(aa ∪ b)∗ ends in 2 with no visits to 3.

• Regular expression accounts for all paths starting in start state 1 and ending in accepting state (2 or 3):
    visit state 3 at least once (ending in 2 or 3), or
    never visit state 3 (ending in 2).
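The final regular expression can be sanity-checked against the three-state DFA by brute force. The sketch below makes two assumptions that should be stated plainly: the DFA's transition table is reconstructed from the edge labels that appear after eliminating state 1 (a, b, aa ∪ b, ab, ba ∪ a, bb), and the expression is transcribed into Python `re` syntax with ∪ written as `|` and ε as an empty alternative. All identifiers are illustrative.

```python
import re
from itertools import product

# Reconstructed DFA: start state 1, accept states 2 and 3 (see lead-in).
DELTA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 1, (2, "b"): 2,
    (3, "a"): 2, (3, "b"): 1,
}
START = 1
ACCEPT = {2, 3}

# (a(aa ∪ b)∗ab ∪ b)((ba ∪ a)(aa ∪ b)∗ab ∪ bb)∗((ba ∪ a)(aa ∪ b)∗ ∪ ε) ∪ a(aa ∪ b)∗
REGEX = re.compile(
    r"(a(aa|b)*ab|b)((ba|a)(aa|b)*ab|bb)*((ba|a)(aa|b)*|)|a(aa|b)*"
)


def dfa_accepts(w):
    state = START
    for c in w:
        state = DELTA[(state, c)]
    return state in ACCEPT


def disagreements(max_len):
    """Strings up to max_len on which the DFA and the regex differ."""
    bad = []
    for n in range(max_len + 1):
        for symbols in product("ab", repeat=n):
            w = "".join(symbols)
            if dfa_accepts(w) != (REGEX.fullmatch(w) is not None):
                bad.append(w)
    return bad


if __name__ == "__main__":
    print("disagreements up to length 8:", disagreements(8))
```

An empty list is what state elimination predicts, since each elimination step preserves the set of strings labeling paths from s to t.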

Finite Languages are Regular

Theorem
If A is a finite language, then A is regular.

Proof.
• Because A finite, we can write
  A = { w1, w2, ..., wn }
  for some n < ∞.
• A regular expression for A is then
  R = w1 ∪ w2 ∪ ··· ∪ wn
• Kleene's Theorem then implies A has a DFA, so A is regular.

Remark: The converse is not true. e.g., 1∗ generates a regular language, but it's infinite.

Pumping Lemma for Regular Languages

Example: DFA with alphabet Σ = {0, 1} for language A.
[Figure: DFA with states q1, q2, q3, q4, q5 and arcs labeled 0 and 1.]
• DFA has 5 states.
• DFA accepts string s = 0011, which has length 4.
• On s = 0011, DFA visits all of the states.
• For any string s with |s| ≥ 5, guaranteed to visit some state twice by the pigeonhole principle.
• String s = 0011011 is accepted by DFA, i.e., s ∈ A.
  symbols read:     0   0   1   1   0   1   1
  states visited: q1  q2  q4  q3  q5  q2  q3  q5
• q2 is first state visited twice.
• Using q2, divide string s into 3 parts x, y, z such that s = xyz.
  x = 0, the symbols read until first visit to q2.
  y = 0110, the symbols read from first to second visit to q2.
  z = 11, the symbols read after second visit to q2.
• Recall DFA accepts string
  s = 0 0110 11 (x = 0, y = 0110, z = 11).
• DFA also accepts strings
  xyyz = 0 0110 0110 11,
  xyyyz = 0 0110 0110 0110 11,
  xz = 0 11.
• String xy^i z ∈ A for each i ≥ 0.
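The pumping behavior in this example can be replayed with a short sketch. Only the six arcs that appear in the trace of 0011011 are recoverable from the example, so the table below is partial, and the accept set {q5} is an assumption based on where 0011 and 0011011 end. The names DELTA, run, and accepts are illustrative.

```python
# Partial transition table: only the six arcs used by the trace of 0011011.
# The remaining arcs of the 5-state DFA are not recoverable here and are omitted.
DELTA = {
    ("q1", "0"): "q2",
    ("q2", "0"): "q4",
    ("q2", "1"): "q3",
    ("q3", "1"): "q5",
    ("q4", "1"): "q3",
    ("q5", "0"): "q2",
}
START = "q1"
ACCEPT = {"q5"}  # assumed: both 0011 and 0011011 end in q5

def run(w):
    """Return the list of states visited on w, including the start state."""
    states = [START]
    for symbol in w:
        states.append(DELTA[(states[-1], symbol)])
    return states

def accepts(w):
    return run(w)[-1] in ACCEPT

x, y, z = "0", "0110", "11"
print(run(x + y + z))            # the trace for s = xyz
for i in range(4):
    w = x + y * i + z            # x y^i z
    print(i, w, accepts(w))
```

Every pumped string stays on the same six arcs, because y starts and ends in q2.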

• More generally, consider
  language A with DFA M having p states,
  string s ∈ A with |s| ≥ p.
• When processing s on M, guaranteed to visit some state twice.
• Let r be first state visited twice.
• Using state r, can divide s as s = xyz.
  x are symbols read until first visit to r.
  y are symbols read from first to second visit to r.
  z are symbols read from second visit to r to end of s.
[Figure: path reads x to reach r, loops on y from r back to r, then reads z.]

Pumping y
• Because y corresponds to starting in r and returning to r,
  xy^i z ∈ A for each i ≥ 1.
• Also, note xy^0 z = xz ∈ A, so
  xy^i z ∈ A for each i ≥ 0.
• |y| > 0 because
  y corresponds to starting in r and coming back; this consumes at least one symbol (because DFA), so y can't be empty.

Length of xy
• |xy| ≤ p, where p is number of states in DFA, because
  xy are symbols read up to second visit to r.
  Because r is the first state visited twice, all states visited before second visit to r are unique.
  So just before visiting r for second time, DFA visited at most p states, which corresponds to reading at most p − 1 symbols.
  The second visit to r, which is after reading 1 more symbol, corresponds to reading at most p symbols.

Pumping Lemma
Theorem 1.70
If A is regular language, then ∃ number p (pumping length) where, if s ∈ A with |s| ≥ p, then s can be split into 3 pieces, s = xyz, satisfying the conditions
1. xy^i z ∈ A for each i ≥ 0,
2. |y| > 0, and
3. |xy| ≤ p.

Remarks:
• y^i denotes i copies of y concatenated together, and y^0 = ε.
• |y| > 0 means y ≠ ε.
• |xy| ≤ p means x and y together have no more than p symbols total.
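The construction behind the theorem (split s at the first state visited twice) can be sketched as a small function. The function name pumping_split and the two-state example DFA (even number of 1's) are hypothetical, chosen only for illustration.

```python
def pumping_split(delta, start, s):
    """Split s as (x, y, z) at the first state that the DFA visits twice.

    Returns None if no state repeats, which can happen only when
    len(s) is smaller than the number of states.
    """
    first_seen = {start: 0}   # state -> number of symbols read at first visit
    state = start
    for pos, symbol in enumerate(s, start=1):
        state = delta[(state, symbol)]
        if state in first_seen:
            i = first_seen[state]
            return s[:i], s[i:pos], s[pos:]
        first_seen[state] = pos
    return None

# Hypothetical 2-state DFA over {0, 1}: accepts strings with an even number of 1's.
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
START, ACCEPT, P = "even", {"even"}, 2

def accepts(w):
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in ACCEPT

x, y, z = pumping_split(DELTA, START, "1001")
print((x, y, z), len(y) > 0, len(x + y) <= P)
print([accepts(x + y * i + z) for i in range(5)])
```

Because the loop stops at the first repeated state, at most p symbols have been read when it returns, which is Condition 3.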

Understanding the Pumping Lemma

M1: If A is regular language,
M2: then ∃ number p (pumping length) where,
M3: if s ∈ A with |s| ≥ p,
M4: then s can be split into 3 pieces, s = xyz, satisfying conditions
    1. xy^i z ∈ A for each i ≥ 0,
    2. |y| > 0, and
    3. |xy| ≤ p.

if (M1 is true), then
   M2 is true
   if (M3 is true), then
      M4 is true
   endif
endif

Nonregular Languages

Definition: Language is nonregular if there is no DFA for it.

Remarks:
• Pumping Lemma (PL) is a result about regular languages.
• But PL mainly used to prove that certain language A is nonregular.
• Typically done using proof by contradiction.
  Assume language A is regular.
  PL says that all strings s ∈ A that are at least a certain length must satisfy some conditions.
  By appropriately choosing s ∈ A, will eventually get contradiction.
  PL: can split s into s = xyz satisfying all of Conditions 1–3.
  To get contradiction, show cannot split s = xyz satisfying 1–3.
  Show all splits satisfying 2–3 violate Condition 1.
  Because Condition 3 of PL states |xy| ≤ p, often choose s ∈ A so that all of its first p symbols are the same.

Language A = { 0^n 1^n | n ≥ 0 } is Nonregular

Proof.
• Suppose A is regular, so PL implies A has "pumping length" p.
• Consider string s = 0^p 1^p ∈ A.
• |s| = 2p ≥ p, so Pumping Lemma will hold.
• So can split s into 3 pieces s = xyz satisfying conditions
  1. xy^i z ∈ A for each i ≥ 0,
  2. |y| > 0, and
  3. |xy| ≤ p.
• To get contradiction, must show cannot split s = xyz satisfying 1–3.
  Show all splits s = xyz satisfying Conditions 2 and 3 will violate 1.
• Because the first p symbols of s = 00···0 11···1 (p 0's followed by p 1's) are all 0's,
  Condition 3 implies that x and y consist of only 0's.
  z will be the rest of the 0's, followed by all p 1's.
• Key: y has some 0's, and z contains all the 1's (and maybe some 0's), so pumping y changes # of 0's but not # of 1's.
• So we have
  x = 0^j for some j ≥ 0,
  y = 0^k for some k ≥ 0,
  z = 0^m 1^p for some m ≥ 0.
• s = xyz implies
  0^p 1^p = 0^j 0^k 0^m 1^p = 0^(j+k+m) 1^p,
  so j + k + m = p.
• Condition 2 states that |y| > 0, so k > 0.
• Condition 1 implies xyyz ∈ A, but
  xyyz = 0^j 0^k 0^k 0^m 1^p
       = 0^(j+k+k+m) 1^p
       = 0^(p+k) 1^p ∉ A
  because j + k + m = p and k > 0.
• Contradiction, so A = { 0^n 1^n | n ≥ 0 } is nonregular.
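The "every split fails" step for s = 0^p 1^p can be checked by brute force for small stand-in values of p. The sketch below enumerates every split satisfying Conditions 2 and 3 and keeps those whose pumped strings stay in A. The names in_A and surviving_splits, and the cutoff max_i, are assumptions made for illustration. This is a finite check, not a replacement for the proof.

```python
def in_A(w):
    """Membership test for A = { 0^n 1^n | n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def surviving_splits(s, p, in_lang, max_i=3):
    """Splits (x, y, z) of s with |y| > 0 and |xy| <= p such that
    x y^i z stays in the language for every i in 0..max_i."""
    good = []
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):          # x_len < xy_len, so |y| > 0
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                good.append((x, y, z))
    return good

for p in range(1, 8):
    s = "0" * p + "1" * p
    print(p, surviving_splits(s, p, in_A))
```

An empty list for a given p means no split of 0^p 1^p passes even the bounded version of Condition 1, matching the argument that pumping y changes only the number of 0's.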

Language B = { ww | w ∈ {0, 1}∗ } is Nonregular

Proof.
• Suppose B is regular, so PL implies B has "pumping length" p.
• Consider string s = 0^p 1 0^p 1 ∈ B.
• |s| = 2p + 2 ≥ p, so Pumping Lemma will hold.
• So can split s into 3 pieces s = xyz satisfying conditions
  1. xy^i z ∈ B for each i ≥ 0,
  2. |y| > 0, and
  3. |xy| ≤ p.
• For contradiction, show cannot split s = xyz so that 1–3 hold.
  Show all splits s = xyz satisfying Conditions 2 and 3 will violate 1.
• Because first p symbols of s = 00···0 1 00···0 1 (two blocks of p 0's, each followed by a 1) are all 0's,
  Condition 3 implies that x and y consist only of 0's.
  z will be the rest of first set of 0's, followed by 1 0^p 1.
• Key: y has some of first 0's, and z has all of second 0's, so pumping y changes only # of first 0's.
• So we have
  x = 0^j for some j ≥ 0,
  y = 0^k for some k ≥ 0,
  z = 0^m 1 0^p 1 for some m ≥ 0.
• s = xyz implies
  0^p 1 0^p 1 = 0^j 0^k 0^m 1 0^p 1 = 0^(j+k+m) 1 0^p 1,
  so j + k + m = p.
• Condition 2 states that |y| > 0, so k > 0.
• Condition 1 implies xyyz ∈ B, but
  xyyz = 0^j 0^k 0^k 0^m 1 0^p 1
       = 0^(j+k+k+m) 1 0^p 1
       = 0^(p+k) 1 0^p 1 ∉ B
  because j + k + m = p and k > 0.
• Contradiction, so B = { ww | w ∈ {0, 1}∗ } is nonregular.

Important Steps in Proving Language is Nonregular

Pumping Lemma (PL):
If A is a regular language, then ∃ number p (pumping length) where, if s ∈ A with |s| ≥ p, then s can be split into 3 pieces, s = xyz, with
1. xy^i z ∈ A for each i ≥ 0,
2. |y| > 0, and
3. |xy| ≤ p.

Remarks:
• Must choose appropriate string s ∈ A to get contradiction.
  Some strings s ∈ A might not lead to contradiction.
• Because Condition 3 of PL states |xy| ≤ p, often choose s ∈ A so that all of its first p symbols are the same.
• Once appropriate s is chosen, need to show every possible split of s = xyz leads to contradiction.

Examples:
1. Let C = { w ∈ {a, b}∗ | w = w^R }, where w^R is the reverse of w.
   • To show C is nonregular, can choose s = a^p b a^p ∈ C.
   • Choosing s = a^p ∈ C does not work. Why?
2. To show D = { a^(2n) b^(3n) a^n | n ≥ 0 } is nonregular, can choose s = a^(2p) b^(3p) a^p ∈ D.
3. Consider language E = { w ∈ {a, b}∗ | w has more a's than b's }. For example, baaba ∈ E.
   • To show E is nonregular, can choose s = b^p a^(p+1) ∈ E.
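How much the choice of s matters in Examples 1 and 3 can be explored with a small experiment. For a stand-in value of p (the real pumping length is unknown), the sketch lists the splits satisfying Conditions 2 and 3 whose pumped strings all stay in the language. The names survivors, in_C, and in_E, and the cutoff max_i, are assumptions made for illustration.

```python
def survivors(s, p, in_lang, max_i=3):
    """Splits s = xyz with |y| > 0 and |xy| <= p whose pumped strings
    x y^i z all remain in the language for i = 0..max_i."""
    return [
        (s[:a], s[a:b], s[b:])
        for b in range(1, min(p, len(s)) + 1)   # b = |xy|
        for a in range(b)                       # a = |x| < |xy|
        if all(in_lang(s[:a] + s[a:b] * i + s[b:]) for i in range(max_i + 1))
    ]

def in_C(w):
    """Palindromes over {a, b}."""
    return w == w[::-1]

def in_E(w):
    """Strings with more a's than b's."""
    return w.count("a") > w.count("b")

p = 5  # stand-in value; the real pumping length is unknown
print(survivors("a" * p + "b" + "a" * p, p, in_C))   # s = a^p b a^p
print(survivors("a" * p, p, in_C))                   # s = a^p
print(survivors("b" * p + "a" * (p + 1), p, in_E))   # s = b^p a^(p+1)
```

A string is a useful choice only when no split survives. For s = a^p the list is not empty, so that choice cannot produce a contradiction.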

Common Mistake
• Consider D = { a^(2n) b^(3n) a^n | n ≥ 0 }.
• To show D is nonregular, can choose s = a^(2p) b^(3p) a^p ∈ D.
• Common mistake: try to apply Pumping Lemma with
  x = a^(2p), y = b^(3p), z = a^p.
• For this split, |xy| = 5p ≰ p, so Condition 3 fails.
• But Pumping Lemma states "If D is a regular language, then . . . can split s = xyz satisfying Conditions 1–3."
• To get contradiction, need to show cannot split s = xyz satisfying Conditions 1–3.
  Need to show every split s = xyz doesn't satisfy all of 1–3.
  Every split s = xyz satisfying Conditions 2 and 3 must have
  x = a^j, y = a^k, z = a^m b^(3p) a^p,
  where j + k ≤ p, j + k + m = 2p, and k ≥ 1.

F = { w | # of 0's in w equals # of 1's in w } is Nonregular
• Note that, e.g., 101100 ∈ F.
• Need to be careful when choosing string s ∈ F for Pumping Lemma.
  If xyz ∈ F with y ∈ F, then xy^i z ∈ F, so no contradiction.
• Another Approach: If F and G are regular, then F ∩ G is regular.
• Solution: Suppose that F is regular.
  Let G = { 0^n 1^m | n, m ≥ 0 }.
  G is regular: it has regular expression 0∗1∗.
  Then F ∩ G = { 0^n 1^n | n ≥ 0 }.
  But know that F ∩ G is not regular, which contradicts closure under intersection.
• Conclusion: F is not regular.

Hierarchy of Languages (so far)
[Figure: nested classes of languages, with an example in each region.]
  All languages: { 0^n 1^n | n ≥ 0 }
  Regular (DFA, NFA, RegExp): (0 ∪ 1)∗
  Finite: { 110, 01 }

Summary of Chapter 1
• DFA is a deterministic machine for recognizing certain languages.
• A language is regular if it has a DFA.
• The class of regular languages is closed under union, intersection, concatenation, Kleene-star, complementation.
• NFA can be nondeterministic: allows choice in how to process string.
• Every NFA has an equivalent DFA.
• Regular expression is a way of generating certain languages.
• Kleene's Theorem: Language A has DFA iff A has regular expression.
• Every finite language is regular, but not every regular language is finite.
• Use pumping lemma to prove certain languages are not regular.