Regular Languages and Finite State Automata Data Structures and Algorithms for Computational Linguistics III
Total Page:16
File Type:pdf, Size:1020Kb
Regular Languages and Finite State Automata Data structures and algorithms for Computational Linguistics III Çağrı Çöltekin [email protected] University of Tübingen Seminar für Sprachwissenschaft Winter Semester 2019–2020 Introduction DFA NFA Regular languages Minimization Regular expressions Why study finite-state automata? • Unlike some of the abstract machines we discussed, finite-state automata are efficient models of computation • There are many applications – Electronic circuit design – Workflow management – Games – Pattern matching – … But more importantly ;-) – Tokenization, stemming – Morphological analysis – Shallow parsing/chunking – … Ç. Çöltekin, SfS / University of Tübingen WS 19–20 1 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions Finite-state automata (FSA) • A finite-state machine is in one of a finite-number of states in a giventime • The machine changes its state based on its input • Every regular language is generated/recognized by an FSA • Every FSA generates/recognizes a regular language • Two flavors: – Deterministic finite automata (DFA) – Non-deterministic finite automata (NFA) Note: the NFA is a superset of DFA. Ç. Çöltekin, SfS / University of Tübingen WS 19–20 2 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions DFA as a graph b state • States are represented as nodes 1 b • Transitions are shown by the edges, transition labeled with symbols from an alphabet 0 a • One of the states is marked as the initial state initial state • a Some states are accepting states 2 accepting state Ç. Çöltekin, SfS / University of Tübingen WS 19–20 3 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions DFA: formal definition Formally, a finite state automaton, M, is a tuple (Σ, Q, q0, F, ∆) with Σ is the alphabet, a finite set of symbols Q a finite set of states q0 is the start state, q0 2 Q F is the set of final states, F ⊆ Q ∆ is a function that takes a state and a symbol in the alphabet, and returns another state (∆ : Q × Σ ! Q) At any given time, for any input, a DFA has a single well-defined action to take. Ç. Çöltekin, SfS / University of Tübingen WS 19–20 4 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions DFA: formal definition an example b Σ = fa, bg 1 Q = fq0, q1, q2g b q0 = q0 F = fq g 2 0 a ∆ = f(q0, a) ! q2, (q0, b) ! q1, (q1, a) ! q2, (q1, b) ! q1g a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 5 / 56 – In that case, when we reach a dead end, recognition fails • To make all transitions well-defined, we can add a sink (or error) state • For brevity, we skip the explicit error state Introduction DFA NFA Regular languages Minimization Regular expressions Another note on DFA b • Is this FSA deterministic? 1 b a, b 0 3 a a, b a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 6 / 56 – In that case, when we reach a dead end, recognition fails • For brevity, we skip the explicit error state Introduction DFA NFA Regular languages Minimization Regular expressions Another note on DFA error or sink state b • Is this FSA deterministic? 1 • To make all transitions well-defined, b we can add a sink (or error) state a, b 0 3 a a, b a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 6 / 56 – In that case, when we reach a dead end, recognition fails Introduction DFA NFA Regular languages Minimization Regular expressions Another note on DFA error or sink state b • Is this FSA deterministic? 1 • To make all transitions well-defined, b we can add a sink (or error) state • For brevity, we skip the explicit error a, b state 0 3 a a, b a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 6 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions Another note on DFA error or sink state b • Is this FSA deterministic? 1 • To make all transitions well-defined, b we can add a sink (or error) state • For brevity, we skip the explicit error a, b state 0 3 a – In that case, when we reach a dead end, recognition fails a, b a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 6 / 56 3 3 3 Introduction DFA NFA Regular languages Minimization Regular expressions DFA: the transition table transition table b symbol a b 1 !0 2 1 b 1 2 1 state *2 ?? a, b 0 3 a a, b ! a marks the start state 2 * marks the accepting state(s) Ç. Çöltekin, SfS / University of Tübingen WS 19–20 7 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions DFA: the transition table transition table b symbol a b 1 !0 2 1 b 1 2 1 state *2 3 3 a, b 3 3 3 0 3 a a, b ! a marks the start state 2 * marks the accepting state(s) Ç. Çöltekin, SfS / University of Tübingen WS 19–20 7 / 56 • What is the complexity of the algorithm? • How about inputs: – bbbb – aa Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a a 2 Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 • What is the complexity of the algorithm? • How about inputs: – bbbb – aa Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a a 2 Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 • What is the complexity of the algorithm? • How about inputs: – bbbb – aa Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a a 2 Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 • What is the complexity of the algorithm? • How about inputs: – bbbb – aa Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a a 2 Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 • What is the complexity of the algorithm? • How about inputs: – bbbb – aa Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a a 2 Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a • What is the complexity of the a 2 algorithm? • How about inputs: – bbbb – aa Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 • Can you draw a simpler DFA for the same language? • Draw a DFA recognizing strings with even number of ‘a’s over Σ = fa, bg Introduction DFA NFA Regular languages Minimization Regular expressions A few questions • What is the language recognized by b this FSA? b 1 0 a a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 9 / 56 • Draw a DFA recognizing strings with even number of ‘a’s over Σ = fa, bg Introduction DFA NFA Regular languages Minimization Regular expressions A few questions • What is the language recognized by b this FSA? • Can you draw a simpler DFA for the b 1 same language? 0 a a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 9 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions A few questions • What is the language recognized by b this FSA? • Can you draw a simpler DFA for the b 1 same language? • Draw a DFA recognizing strings 0 a with even number of ‘a’s over f , g Σ = a b a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 9 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions Non-deterministic finite automata Formal definition A non-deterministic finite state automaton, M, is a tuple (Σ, Q, q0, F, ∆) with Σ is the alphabet, a finite set of symbols Q a finite set of states q0 is the start state, q0 2 Q F is the set of final states, F ⊆ Q ∆ is a function from (Q, Σ) to P(Q), power set of Q (∆ : Q × Σ ! P(Q)) Ç. Çöltekin, SfS / University of Tübingen WS 19–20 10 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions An example NFA a,b transition table a,b a,b 1 symbol a b 0 a !0 0,1 0,1 1 1,2 1 a,b 2 state *2 0,2 0 a • We have nondeterminism, e.g., if the first input is a, we need to choose between states 0 or 1 • Transition table cells have sets of states Ç. Çöltekin, SfS / University of Tübingen WS 19–20 11 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions Dealing with non-determinism • Follow one of the links, store alternatives, and backtrack on failure • Follow all options in parallel • Use dynamic programming (e.g., as in chart parsing) Ç.