Regular Languages and Finite State Automata Data Structures and Algorithms for Computational Linguistics III

Regular Languages and Finite State Automata Data Structures and Algorithms for Computational Linguistics III

Regular Languages and Finite State Automata Data structures and algorithms for Computational Linguistics III Çağrı Çöltekin [email protected] University of Tübingen Seminar für Sprachwissenschaft Winter Semester 2019–2020 Introduction DFA NFA Regular languages Minimization Regular expressions Why study finite-state automata? • Unlike some of the abstract machines we discussed, finite-state automata are efficient models of computation • There are many applications – Electronic circuit design – Workflow management – Games – Pattern matching – … But more importantly ;-) – Tokenization, stemming – Morphological analysis – Shallow parsing/chunking – … Ç. Çöltekin, SfS / University of Tübingen WS 19–20 1 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions Finite-state automata (FSA) • A finite-state machine is in one of a finite-number of states in a giventime • The machine changes its state based on its input • Every regular language is generated/recognized by an FSA • Every FSA generates/recognizes a regular language • Two flavors: – Deterministic finite automata (DFA) – Non-deterministic finite automata (NFA) Note: the NFA is a superset of DFA. Ç. Çöltekin, SfS / University of Tübingen WS 19–20 2 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions DFA as a graph b state • States are represented as nodes 1 b • Transitions are shown by the edges, transition labeled with symbols from an alphabet 0 a • One of the states is marked as the initial state initial state • a Some states are accepting states 2 accepting state Ç. Çöltekin, SfS / University of Tübingen WS 19–20 3 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions DFA: formal definition Formally, a finite state automaton, M, is a tuple (Σ, Q, q0, F, ∆) with Σ is the alphabet, a finite set of symbols Q a finite set of states q0 is the start state, q0 2 Q F is the set of final states, F ⊆ Q ∆ is a function that takes a state and a symbol in the alphabet, and returns another state (∆ : Q × Σ ! Q) At any given time, for any input, a DFA has a single well-defined action to take. Ç. Çöltekin, SfS / University of Tübingen WS 19–20 4 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions DFA: formal definition an example b Σ = fa, bg 1 Q = fq0, q1, q2g b q0 = q0 F = fq g 2 0 a ∆ = f(q0, a) ! q2, (q0, b) ! q1, (q1, a) ! q2, (q1, b) ! q1g a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 5 / 56 – In that case, when we reach a dead end, recognition fails • To make all transitions well-defined, we can add a sink (or error) state • For brevity, we skip the explicit error state Introduction DFA NFA Regular languages Minimization Regular expressions Another note on DFA b • Is this FSA deterministic? 1 b a, b 0 3 a a, b a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 6 / 56 – In that case, when we reach a dead end, recognition fails • For brevity, we skip the explicit error state Introduction DFA NFA Regular languages Minimization Regular expressions Another note on DFA error or sink state b • Is this FSA deterministic? 1 • To make all transitions well-defined, b we can add a sink (or error) state a, b 0 3 a a, b a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 6 / 56 – In that case, when we reach a dead end, recognition fails Introduction DFA NFA Regular languages Minimization Regular expressions Another note on DFA error or sink state b • Is this FSA deterministic? 1 • To make all transitions well-defined, b we can add a sink (or error) state • For brevity, we skip the explicit error a, b state 0 3 a a, b a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 6 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions Another note on DFA error or sink state b • Is this FSA deterministic? 1 • To make all transitions well-defined, b we can add a sink (or error) state • For brevity, we skip the explicit error a, b state 0 3 a – In that case, when we reach a dead end, recognition fails a, b a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 6 / 56 3 3 3 Introduction DFA NFA Regular languages Minimization Regular expressions DFA: the transition table transition table b symbol a b 1 !0 2 1 b 1 2 1 state *2 ?? a, b 0 3 a a, b ! a marks the start state 2 * marks the accepting state(s) Ç. Çöltekin, SfS / University of Tübingen WS 19–20 7 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions DFA: the transition table transition table b symbol a b 1 !0 2 1 b 1 2 1 state *2 3 3 a, b 3 3 3 0 3 a a, b ! a marks the start state 2 * marks the accepting state(s) Ç. Çöltekin, SfS / University of Tübingen WS 19–20 7 / 56 • What is the complexity of the algorithm? • How about inputs: – bbbb – aa Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a a 2 Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 • What is the complexity of the algorithm? • How about inputs: – bbbb – aa Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a a 2 Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 • What is the complexity of the algorithm? • How about inputs: – bbbb – aa Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a a 2 Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 • What is the complexity of the algorithm? • How about inputs: – bbbb – aa Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a a 2 Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 • What is the complexity of the algorithm? • How about inputs: – bbbb – aa Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a a 2 Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions DFA recognition 1. Start at q0 2. Process an input symbol, move accordingly b 3. Accept if in a final state at the end of the input b 1 0 a • What is the complexity of the a 2 algorithm? • How about inputs: – bbbb – aa Input: b b a Ç. Çöltekin, SfS / University of Tübingen WS 19–20 8 / 56 • Can you draw a simpler DFA for the same language? • Draw a DFA recognizing strings with even number of ‘a’s over Σ = fa, bg Introduction DFA NFA Regular languages Minimization Regular expressions A few questions • What is the language recognized by b this FSA? b 1 0 a a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 9 / 56 • Draw a DFA recognizing strings with even number of ‘a’s over Σ = fa, bg Introduction DFA NFA Regular languages Minimization Regular expressions A few questions • What is the language recognized by b this FSA? • Can you draw a simpler DFA for the b 1 same language? 0 a a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 9 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions A few questions • What is the language recognized by b this FSA? • Can you draw a simpler DFA for the b 1 same language? • Draw a DFA recognizing strings 0 a with even number of ‘a’s over f , g Σ = a b a 2 Ç. Çöltekin, SfS / University of Tübingen WS 19–20 9 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions Non-deterministic finite automata Formal definition A non-deterministic finite state automaton, M, is a tuple (Σ, Q, q0, F, ∆) with Σ is the alphabet, a finite set of symbols Q a finite set of states q0 is the start state, q0 2 Q F is the set of final states, F ⊆ Q ∆ is a function from (Q, Σ) to P(Q), power set of Q (∆ : Q × Σ ! P(Q)) Ç. Çöltekin, SfS / University of Tübingen WS 19–20 10 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions An example NFA a,b transition table a,b a,b 1 symbol a b 0 a !0 0,1 0,1 1 1,2 1 a,b 2 state *2 0,2 0 a • We have nondeterminism, e.g., if the first input is a, we need to choose between states 0 or 1 • Transition table cells have sets of states Ç. Çöltekin, SfS / University of Tübingen WS 19–20 11 / 56 Introduction DFA NFA Regular languages Minimization Regular expressions Dealing with non-determinism • Follow one of the links, store alternatives, and backtrack on failure • Follow all options in parallel • Use dynamic programming (e.g., as in chart parsing) Ç.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    151 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us