Deterministic Finite Automata 0 0,1 1
Total Page:16
File Type:pdf, Size:1020Kb
Great Theoretical Ideas in CS V. Adamchik CS 15-251 Outline Lecture 21 Carnegie Mellon University DFAs Finite Automata Regular Languages 0n1n is not regular Union Theorem Kleene’s Theorem NFAs Application: KMP 11 1 Deterministic Finite Automata 0 0,1 1 A machine so simple that you can 0111 111 1 ϵ understand it in just one minute 0 0 1 The machine processes a string and accepts it if the process ends in a double circle The unique string of length 0 will be denoted by ε and will be called the empty or null string accept states (F) Anatomy of a Deterministic Finite start state (q0) 11 0 Automaton 0,1 1 The singular of automata is automaton. 1 The alphabet Σ of a finite automaton is the 0111 111 1 ϵ set where the symbols come from, for 0 0 example {0,1} transitions 1 The language L(M) of a finite automaton is the set of strings that it accepts states L(M) = {x∈Σ: M accepts x} The machine accepts a string if the process It’s also called the ends in an accept state (double circle) “language decided/accepted by M”. 1 The Language L(M) of Machine M The Language L(M) of Machine M 0 0 0 0,1 1 q 0 q1 q0 1 1 What language does this DFA decide/accept? L(M) = All strings of 0s and 1s The language of a finite automaton is the set of strings that it accepts L(M) = { w | w has an even number of 1s} M = (Q, Σ, , q0, F) Q = {q0, q1, q2, q3} Formal definition of DFAs where Σ = {0,1} A finite automaton is a 5-tuple M = (Q, Σ, , q0, F) q0 Q is start state Q is the finite set of states F = {q1, q2} Q accept states : Q Σ → Q transition function Σ is the alphabet : Q Σ → Q is the transition function q 1 0 1 0 1 0,1 q0 Q is the start state q0 q0 q1 1 q q1 q2 q2 F Q is the set of accept states 0 M q2 0 0 q2 q3 q2 q q q 1 3 0 2 L(M) = the language of machine M q3 = set of all strings machine M accepts EXAMPLE Determine the language An automaton that accepts all recognized by and only those strings that contain 001 1 0,1 0,1 0 1 0 0 0 1 {0} {00} {001} 1 L(M)={1,11,111, …} 2 Membership problem Determine the language decided by Determine whether some word belongs to the language. 0 0,1 0 1 0,1 1 L(M)={1, 01} Regular Languages DFA Membership problem A language over Σ is a set of strings over Σ Determine whether some word belongs to the language. A language L ⊆ Σ is regular if it is recognized by a deterministic finite automaton Theorem: The DFA Membership Problem is A language L ⊆ Σ is regular if there is solvable in linear time. a DFA which decides it. Let M = (Q, Σ, , q0, F) and w = w1...wm. L = { w | w contains 001} is regular Algorithm for DFA M: p := q0; L = { w | w has an even number of 1s} is regular for i := 1 to m do p := (p,wi); if pF then return Yes else return No. Theorem: L = {0n1n : n∈ℕ} is not regular Are all languages regular? Notation: If a∈Σ is a symbol and n∈N then an denotes the string aaa∙∙∙a (n times). E.g., a3 means aaa, a5 means aaaaa, 1 0 a means a, a means ϵ, etc. Theorem: Any finite language is regular Thus L = {ϵ, 01, 0011, 000111, 00001111, …}. 3 Theorem: L = {0n1n : n∈ℕ} is not regular L = strings where the number of occurrences of 01 is equal to the number of occurrences of 10 Wrong Intuition: 1 0 For a DFA to decide L, it seems like it needs 1 to “remember” how many 0’s it sees at the 0 0 beginning of the string, so that it can 1 0 “check” there are equally many 1’s. 0 But a DFA has only finitely many states — 1 1 shouldn’t be able to handle arbitrary n. M accepts only the strings with an equal number of 01’s and 10’s! For example, 010110 How to prove a language is not Theorem: L = {0n1n : n∈ℕ} is not regular regular… Full proof: Assume for contradiction there is a DFA M with L(M) = L. Suppose M is a DFA deciding L with, say, k states. i Argue (usually by Pigeonhole) there are two Let ri be the state M reaches after processing 0 . strings x and y which reach the same state in M. By Pigeonhole, there is a repeat among r0, r1, r2, …, rk. So say that rs = rt for some s ≠ t. Show there is a string z such that xz∈L but yz∉L. Contradiction, since M accepts either both (or s s s neither.) Since 0 1 ∈ L, starting from rs and processing 1 causes M to reach an accepting state. Theorem: L = {0n1n : n∈ℕ} is not regular Regular Languages Full proof: So on input 0s1s ∈ L, M will reach an accepting state. Definition: A language L ⊆ Σ is regular if there is a DFA which decides it. Consider input 0t1s ∉ L, s≠t. t M will process 0 , reach state rt = rs Questions: 1. Are all languages regular? then M will process 1s, and reach an accepting state. 2. Are there other ways to tell if L is regular? Contradiction! 4 Equivalence of two DFAs Union Theorem Definition: Two DFAs M and M over the same 1 2 Given two languages, L and L , define alphabet are equivalent if they 1 2 the union of L1 and L2 as accept the same language: L(M1) = L(M2). L1 L2 = { w | w L1 or w L2 } Given a few equivalent machines, we are Theorem: The union of two regular naturally interested in the smallest one languages is also a regular language. with the least number of states. Theorem: The union of two regular Union Theorem languages is also a regular language L1 = strings with qeven 0 Proof (Sketch): Let even # of 1’s M1 M1 = (Q1, Σ, 1, q0, F1) be finite automaton for L1 L2 = strings x with and 1 1 |x| div. by 3 M2 = (Q2, Σ, 2, q0, F2) be finite automaton for L2 qodd 0 We want to construct a finite automaton M = (Q, Σ, , q0, F) that recognizes L = L1 L2 M 2 0,1 0,1 Idea: Run both M and M at the same time. p0 p1 p2 1 2 0,1 Union Theorem Union Theorem qeven 0 qeven 0 M1 M1 Input: 101001 Input: 101001 1 1 1 1 qodd 0 qodd 0 M M 2 0,1 0,1 2 0,1 0,1 p0 p1 p2 p0 p1 p2 0,1 0,1 5 Union Theorem Union Theorem qeven 0 qeven 0 M1 M1 Input: 101001 Input: 101001 1 1 1 1 qodd 0 qodd 0 M M 2 0,1 0,1 2 0,1 0,1 p0 p1 p2 p0 p1 p2 0,1 0,1 Union Theorem Union Theorem qeven 0 qeven 0 M1 M1 Input: 101001 Input: 101001 1 1 1 1 qodd 0 qodd 0 M M 2 0,1 0,1 2 0,1 0,1 p0 p1 p2 p0 p1 p2 0,1 0,1 Union Theorem Union Theorem Make a DFA keeping qeven 0 qeven 0 M1 track of both at once. M1 Input: 101001 1 1 1 1 Accept. qodd 0 qodd 0 M M 2 0,1 0,1 2 0,1 0,1 p0 p1 p2 p0 p1 p2 0,1 0,1 6 Union Theorem The Regular Operations Q = pairs of states, one from M and one from M 1 2 Union: A B = { w | w A or w B } = { (q1, q2) | q1 Q1 and q2 Q2 } = Q Q Intersection: A B = { w | w A and w B } 1 2 0 0 0 Negation: A = { w | w A } q p q p q p even, 0 even, 1 even, 2 R Reverse: A = { w1 …wk | wk …w1 A } 1 1 Concatenation: A B = { vw | v A and w B } Star: A* = { w …w | k ≥ 0 and each w A } qodd, p0 0 qodd, p1 0 qodd, p2 1 k i 1 1 The Kleene closure: A* The Kleene closure: A* Star: A* = { w1 …wk | k ≥ 0 and each wi A } What is A* of A={0,1}? From the definition of the concatenation, All binary strings we definite An, n =0, 1, 2, … recursively A0 = {ε} An+1 = An A What is A* of A={11}? A* is a set consisting of concatenations of arbitrary many strings from A. All binary strings of an even number of 1s A* UAk k 0 Regular Languages Are Closed The Kleene Theorem (1956) Under The Regular Operations An axiomatic system for regular languages Every regular language over Σ can be constructed from ∅ and {a}, a ∈ Σ, using only Vocabulary: Languages over alphabet Σ the operations union, concatenation Axioms: ∅, {a} for each a∈Σ and Kleene star. Deduction rules: Given L1, L2, can obtain L1 ⋃ L2 Given L1, L2, can obtain L1 ⋅ L2 * Given L, can obtain L 7 Reverse Nondeterministic finite automaton R Reverse: A = { w1 …wk | wk …w1 A } There is another type How to construct a DFA for the reversal machine in which there a of a language? may be several possible a 1 0,1 next states.