Deterministic Finite Automata 0 0,1 1

Great Theoretical Ideas in CS V. Adamchik CS 15-251 Outline Lecture 21 Carnegie Mellon University DFAs Finite Automata Regular Languages 0n1n is not regular Union Theorem Kleene’s Theorem NFAs Application: KMP 11 1 Deterministic Finite Automata 0 0,1 1 A machine so simple that you can 0111 111 1 ϵ understand it in just one minute 0 0 1 The machine processes a string and accepts it if the process ends in a double circle The unique string of length 0 will be denoted by ε and will be called the empty or null string accept states (F) Anatomy of a Deterministic Finite start state (q0) 11 0 Automaton 0,1 1 The singular of automata is automaton. 1 The alphabet Σ of a finite automaton is the 0111 111 1 ϵ set where the symbols come from, for 0 0 example {0,1} transitions 1 The language L(M) of a finite automaton is the set of strings that it accepts states L(M) = {x∈Σ: M accepts x} The machine accepts a string if the process It’s also called the ends in an accept state (double circle) “language decided/accepted by M”. 1 The Language L(M) of Machine M The Language L(M) of Machine M 0 0 0 0,1 1 q 0 q1 q0 1 1 What language does this DFA decide/accept? L(M) = All strings of 0s and 1s The language of a finite automaton is the set of strings that it accepts L(M) = { w | w has an even number of 1s} M = (Q, Σ, , q0, F) Q = {q0, q1, q2, q3} Formal definition of DFAs where Σ = {0,1} A finite automaton is a 5-tuple M = (Q, Σ, , q0, F) q0 Q is start state Q is the finite set of states F = {q1, q2} Q accept states : Q Σ → Q transition function Σ is the alphabet : Q Σ → Q is the transition function q 1 0 1 0 1 0,1 q0 Q is the start state q0 q0 q1 1 q q1 q2 q2 F Q is the set of accept states 0 M q2 0 0 q2 q3 q2 q q q 1 3 0 2 L(M) = the language of machine M q3 = set of all strings machine M accepts EXAMPLE Determine the language An automaton that accepts all recognized by and only those strings that contain 001 1 0,1 0,1 0 1 0 0 0 1 {0} {00} {001} 1 L(M)={1,11,111, …} 2 Membership problem Determine the language decided by Determine whether some word belongs to the language. 0 0,1 0 1 0,1 1 L(M)={1, 01} Regular Languages DFA Membership problem A language over Σ is a set of strings over Σ Determine whether some word belongs to the language. A language L ⊆ Σ is regular if it is recognized by a deterministic finite automaton Theorem: The DFA Membership Problem is A language L ⊆ Σ is regular if there is solvable in linear time. a DFA which decides it. Let M = (Q, Σ, , q0, F) and w = w1...wm. L = { w | w contains 001} is regular Algorithm for DFA M: p := q0; L = { w | w has an even number of 1s} is regular for i := 1 to m do p := (p,wi); if pF then return Yes else return No. Theorem: L = {0n1n : n∈ℕ} is not regular Are all languages regular? Notation: If a∈Σ is a symbol and n∈N then an denotes the string aaa∙∙∙a (n times). E.g., a3 means aaa, a5 means aaaaa, 1 0 a means a, a means ϵ, etc. Theorem: Any finite language is regular Thus L = {ϵ, 01, 0011, 000111, 00001111, …}. 3 Theorem: L = {0n1n : n∈ℕ} is not regular L = strings where the number of occurrences of 01 is equal to the number of occurrences of 10 Wrong Intuition: 1 0 For a DFA to decide L, it seems like it needs 1 to “remember” how many 0’s it sees at the 0 0 beginning of the string, so that it can 1 0 “check” there are equally many 1’s. 0 But a DFA has only finitely many states — 1 1 shouldn’t be able to handle arbitrary n. M accepts only the strings with an equal number of 01’s and 10’s! For example, 010110 How to prove a language is not Theorem: L = {0n1n : n∈ℕ} is not regular regular… Full proof: Assume for contradiction there is a DFA M with L(M) = L. Suppose M is a DFA deciding L with, say, k states. i Argue (usually by Pigeonhole) there are two Let ri be the state M reaches after processing 0 . strings x and y which reach the same state in M. By Pigeonhole, there is a repeat among r0, r1, r2, …, rk. So say that rs = rt for some s ≠ t. Show there is a string z such that xz∈L but yz∉L. Contradiction, since M accepts either both (or s s s neither.) Since 0 1 ∈ L, starting from rs and processing 1 causes M to reach an accepting state. Theorem: L = {0n1n : n∈ℕ} is not regular Regular Languages Full proof: So on input 0s1s ∈ L, M will reach an accepting state. Definition: A language L ⊆ Σ is regular if there is a DFA which decides it. Consider input 0t1s ∉ L, s≠t. t M will process 0 , reach state rt = rs Questions: 1. Are all languages regular? then M will process 1s, and reach an accepting state. 2. Are there other ways to tell if L is regular? Contradiction! 4 Equivalence of two DFAs Union Theorem Definition: Two DFAs M and M over the same 1 2 Given two languages, L and L , define alphabet are equivalent if they 1 2 the union of L1 and L2 as accept the same language: L(M1) = L(M2). L1 L2 = { w | w L1 or w L2 } Given a few equivalent machines, we are Theorem: The union of two regular naturally interested in the smallest one languages is also a regular language. with the least number of states. Theorem: The union of two regular Union Theorem languages is also a regular language L1 = strings with qeven 0 Proof (Sketch): Let even # of 1’s M1 M1 = (Q1, Σ, 1, q0, F1) be finite automaton for L1 L2 = strings x with and 1 1 |x| div. by 3 M2 = (Q2, Σ, 2, q0, F2) be finite automaton for L2 qodd 0 We want to construct a finite automaton M = (Q, Σ, , q0, F) that recognizes L = L1 L2 M 2 0,1 0,1 Idea: Run both M and M at the same time. p0 p1 p2 1 2 0,1 Union Theorem Union Theorem qeven 0 qeven 0 M1 M1 Input: 101001 Input: 101001 1 1 1 1 qodd 0 qodd 0 M M 2 0,1 0,1 2 0,1 0,1 p0 p1 p2 p0 p1 p2 0,1 0,1 5 Union Theorem Union Theorem qeven 0 qeven 0 M1 M1 Input: 101001 Input: 101001 1 1 1 1 qodd 0 qodd 0 M M 2 0,1 0,1 2 0,1 0,1 p0 p1 p2 p0 p1 p2 0,1 0,1 Union Theorem Union Theorem qeven 0 qeven 0 M1 M1 Input: 101001 Input: 101001 1 1 1 1 qodd 0 qodd 0 M M 2 0,1 0,1 2 0,1 0,1 p0 p1 p2 p0 p1 p2 0,1 0,1 Union Theorem Union Theorem Make a DFA keeping qeven 0 qeven 0 M1 track of both at once. M1 Input: 101001 1 1 1 1 Accept. qodd 0 qodd 0 M M 2 0,1 0,1 2 0,1 0,1 p0 p1 p2 p0 p1 p2 0,1 0,1 6 Union Theorem The Regular Operations Q = pairs of states, one from M and one from M 1 2 Union: A B = { w | w A or w B } = { (q1, q2) | q1 Q1 and q2 Q2 } = Q Q Intersection: A B = { w | w A and w B } 1 2 0 0 0 Negation: A = { w | w A } q p q p q p even, 0 even, 1 even, 2 R Reverse: A = { w1 …wk | wk …w1 A } 1 1 Concatenation: A B = { vw | v A and w B } Star: A* = { w …w | k ≥ 0 and each w A } qodd, p0 0 qodd, p1 0 qodd, p2 1 k i 1 1 The Kleene closure: A* The Kleene closure: A* Star: A* = { w1 …wk | k ≥ 0 and each wi A } What is A* of A={0,1}? From the definition of the concatenation, All binary strings we definite An, n =0, 1, 2, … recursively A0 = {ε} An+1 = An A What is A* of A={11}? A* is a set consisting of concatenations of arbitrary many strings from A. All binary strings of an even number of 1s A* UAk k 0 Regular Languages Are Closed The Kleene Theorem (1956) Under The Regular Operations An axiomatic system for regular languages Every regular language over Σ can be constructed from ∅ and {a}, a ∈ Σ, using only Vocabulary: Languages over alphabet Σ the operations union, concatenation Axioms: ∅, {a} for each a∈Σ and Kleene star. Deduction rules: Given L1, L2, can obtain L1 ⋃ L2 Given L1, L2, can obtain L1 ⋅ L2 * Given L, can obtain L 7 Reverse Nondeterministic finite automaton R Reverse: A = { w1 …wk | wk …w1 A } There is another type How to construct a DFA for the reversal machine in which there a of a language? may be several possible a 1 0,1 next states.

Deterministic Finite Automata 0 0,1 1

Midterm Study Sheet for CS3719 Regular Languages and Finite

Chapter 6 Formal Language Theory

6.045 Final Exam Name

1 Alphabets and Languages

Context Free Languages II Announcement Announcement

Kleene Algebra with Tests and Coq Tools for While Programs Damien Pous

(A) for Any Regular Expression R, the Set L(R) of Strings

Context-Free Languages ∗

Formal Languages We’Ll Use the English Language As a Running Example

Finite-State Automata and Algorithms

Using Jflap in Academics to Understand Automata Theory in a Better Way

Chapter 3: Lexing and Parsing