Regular Languages and Finite State Automata Data Structures and Algorithms for Computational Linguistics III

Çağrı Çöltekin
[email protected]
University of Tübingen, Seminar für Sprachwissenschaft
Winter Semester 2019–2020

Sections: Introduction, DFA, NFA, Regular languages, Minimization, Regular expressions

Why study finite-state automata?
• Unlike some of the abstract machines we discussed, finite-state automata are efficient models of computation
• There are many applications
  – Electronic circuit design
  – Workflow management
  – Games
  – Pattern matching
  – …
• But more importantly ;-)
  – Tokenization, stemming
  – Morphological analysis
  – Shallow parsing/chunking
  – …

Ç. Çöltekin, SfS / University of Tübingen WS 19–20 1 / 56

Finite-state automata (FSA)
• A finite-state machine is in one of a finite number of states at any given time
• The machine changes its state based on its input
• Every regular language is generated/recognized by an FSA
• Every FSA generates/recognizes a regular language
• Two flavors:
  – Deterministic finite automata (DFA)
  – Non-deterministic finite automata (NFA)
Note: every DFA is also an NFA, so the class of NFAs is a superset of the class of DFAs.

DFA as a graph
• States are represented as nodes
• Transitions are shown by the edges, labeled with symbols from an alphabet
• One of the states is marked as the initial state
• Some states are accepting states
[Figure: a DFA drawn as a graph with states 0, 1, and 2 and edges labeled a and b; annotations point out a state, a transition, the initial state (0), and the accepting state (2).]

DFA: formal definition
Formally, a finite state automaton, M, is a tuple (Σ, Q, q0, F, ∆) with
  Σ   the alphabet, a finite set of symbols
  Q   a finite set of states
  q0  the start state, q0 ∈ Q
  F   the set of final states, F ⊆ Q
  ∆   a function that takes a state and a symbol in the alphabet, and returns a state (∆ : Q × Σ → Q)
At any given time, for any input, a DFA has a single well-defined action to take.

DFA: formal definition, an example
  Σ = {a, b}
  Q = {q0, q1, q2}
  q0 = q0
  F = {q2}
  ∆ = {(q0, a) → q2, (q0, b) → q1, (q1, a) → q2, (q1, b) → q1}
[Figure: the graph of this DFA, with states 0, 1, and 2.]

Another note on DFA
• Is this FSA deterministic?
• To make all transitions well-defined, we can add a sink (or error) state
• For brevity, we skip the explicit error state
  – In that case, when we reach a dead end, recognition fails
[Figure: the example DFA extended with state 3, the error or sink state; state 2 moves to state 3 on a and b, and state 3 loops to itself on a and b.]
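
The example tuple and the sink-state construction can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the dictionary encoding, the names `DELTA` and `add_sink`, and the use of integers 0, 1, 2, 3 for q0, q1, q2 and the sink are all assumptions made for the sketch.

```python
# Hypothetical encoding of the example DFA (names are illustrative).
ALPHABET = {"a", "b"}
STATES = {0, 1, 2}
START = 0
FINAL = {2}
# Partial transition function: state 2 has no outgoing transitions.
DELTA = {(0, "a"): 2, (0, "b"): 1, (1, "a"): 2, (1, "b"): 1}


def add_sink(states, alphabet, delta, sink):
    """Return (states, delta) with every missing transition sent to `sink`."""
    if sink in states:
        raise ValueError("sink must be a fresh state")
    all_states = set(states) | {sink}
    total = dict(delta)
    for state in all_states:
        for symbol in alphabet:
            # Keep existing transitions, fill only the undefined ones.
            total.setdefault((state, symbol), sink)
    return all_states, total


TOTAL_STATES, TOTAL_DELTA = add_sink(STATES, ALPHABET, DELTA, sink=3)
```

After the call, `TOTAL_DELTA` defines a transition for every pair of state and symbol, so it is a total function Q × Σ → Q as the formal definition requires.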

DFA: the transition table

          symbol
  state   a   b
  →0      2   1
   1      2   1
  *2      3   3
   3      3   3

→ marks the start state
* marks the accepting state(s)
State 3 is the sink (error) state.

DFA recognition
1. Start at q0
2. Process an input symbol, move accordingly
3. Accept if in a final state at the end of the input
Input: b b a
• What is the complexity of the algorithm?
• How about inputs:
  – bbbb
  – aa
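
The three recognition steps can be sketched as a short Python function. This is a minimal illustration under assumptions: the name `dfa_accepts` and the dictionary encoding of the transition table are not from the slides, and the sink state is left implicit, so a missing entry is treated as a dead end.

```python
def dfa_accepts(delta, start, final, word):
    """Run a DFA with a (possibly partial) transition table on `word`."""
    state = start                      # 1. start at q0
    for symbol in word:                # 2. process one symbol at a time
        if (state, symbol) not in delta:
            return False               # dead end: recognition fails
        state = delta[(state, symbol)]
    return state in final              # 3. accept if in a final state


# The example DFA, without the explicit sink state (assumed encoding).
DELTA = {(0, "a"): 2, (0, "b"): 1, (1, "a"): 2, (1, "b"): 1}

for word in ["bba", "bbbb", "aa"]:
    print(word, dfa_accepts(DELTA, 0, {2}, word))
```

With a dictionary-based table, the loop does one lookup per input symbol, so the running time grows linearly with the length of the input.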

A few questions
• What is the language recognized by this FSA?
• Can you draw a simpler DFA for the same language?
• Draw a DFA recognizing strings with an even number of ‘a’s over Σ = {a, b}
[Figure: the example DFA with states 0, 1, and 2 and transitions (0, a) → 2, (0, b) → 1, (1, a) → 2, (1, b) → 1; state 0 is initial and state 2 is accepting.]
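
One way to explore the first two questions is to enumerate the short strings a DFA accepts and compare two automata on them. The sketch below is an illustration only: `TWO_STATE` is a hypothetical candidate for a simpler DFA, not an answer given in the slides, and agreement on all strings up to a fixed length is evidence of equivalence, not a proof.

```python
from itertools import product


def accepts(delta, start, final, word):
    """DFA recognition with a partial table; a missing entry is a dead end."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in final


def language_up_to(dfa, alphabet, max_len):
    """List all accepted strings of length <= max_len, shortest first."""
    delta, start, final = dfa
    words = []
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            word = "".join(letters)
            if accepts(delta, start, final, word):
                words.append(word)
    return words


# The three-state example DFA from the slides (assumed encoding).
THREE_STATE = ({(0, "a"): 2, (0, "b"): 1, (1, "a"): 2, (1, "b"): 1}, 0, {2})
# Hypothetical two-state candidate: loop on b, move to the final state on a.
TWO_STATE = ({(0, "a"): 1, (0, "b"): 0}, 0, {1})

print(language_up_to(THREE_STATE, "ab", 4))
print(language_up_to(THREE_STATE, "ab", 6) == language_up_to(TWO_STATE, "ab", 6))
```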

Non-deterministic finite automata: formal definition
A non-deterministic finite state automaton, M, is a tuple (Σ, Q, q0, F, ∆) with
  Σ   the alphabet, a finite set of symbols
  Q   a finite set of states
  q0  the start state, q0 ∈ Q
  F   the set of final states, F ⊆ Q
  ∆   a function from Q × Σ to P(Q), the power set of Q (∆ : Q × Σ → P(Q))

An example NFA

          symbol
  state   a      b
  →0      0,1    0,1
   1      1,2    1
  *2      0,2    0

• We have nondeterminism, e.g., if the first input is a, we need to choose between states 0 and 1
• Transition table cells have sets of states
[Figure: the graph of this NFA, with states 0, 1, and 2 and edges labeled a and b.]

Dealing with non-determinism
• Follow one of the links, store alternatives, and backtrack on failure
• Follow all options in parallel
• Use dynamic programming (e.g., as in chart parsing)
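
The second option, following all alternatives in parallel, can be sketched by tracking the set of states the NFA could be in after each symbol. This is a minimal illustration, assuming the example table above is encoded as a dictionary from (state, symbol) to a set of states; the names `NFA_DELTA` and `nfa_accepts` are hypothetical.

```python
# The example NFA table, encoded as (state, symbol) -> set of next states.
NFA_DELTA = {
    (0, "a"): {0, 1}, (0, "b"): {0, 1},
    (1, "a"): {1, 2}, (1, "b"): {1},
    (2, "a"): {0, 2}, (2, "b"): {0},
}


def nfa_accepts(delta, start, final, word):
    """Simulate an NFA by following all possible transitions in parallel."""
    current = {start}
    for symbol in word:
        # Union of the successor sets of every state we might be in.
        current = {
            nxt
            for state in current
            for nxt in delta.get((state, symbol), set())
        }
    # Accept if at least one reachable state is final.
    return bool(current & final)


for word in ["aa", "ba", "ab", "a"]:
    print(word, nfa_accepts(NFA_DELTA, 0, {2}, word))
```

Compared with backtracking, this approach reads the input only once, at the cost of handling up to |Q| states per symbol.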