
Formal Language Theory: Integrating Experimentation and Proof

Alley Stoughton

Draft of Fall 2019

Copyright © 2019 Alley Stoughton

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. The LaTeX source of this book is part of the Forlan distribution, which is available on the Web at http://alleystoughton.us/forlan. A copy of the GNU Free Documentation License is included in the Forlan distribution.

Contents

Preface

1 Mathematical Background
  1.1 Basic Set Theory
    1.1.1 Review of Classical Logic
    1.1.2 Describing Sets by Listing Their Elements
    1.1.3 Sets of Numbers
    1.1.4 Relationships between Sets
    1.1.5 Set Formation
    1.1.6 Operations on Sets
    1.1.7 Relations and Functions
    1.1.8 Set Cardinality
    1.1.9 Data Structures
    1.1.10 Notes
  1.2 Induction
    1.2.1 Mathematical Induction
    1.2.2 Strong Induction
    1.2.3 Well-founded Induction
    1.2.4 Notes
  1.3 Inductive Definitions and Recursion
    1.3.1 Inductive Definition of Trees
    1.3.2 Recursion
    1.3.3 Paths in Trees
    1.3.4 Notes

2 Formal Languages
  2.1 Symbols, Strings, Alphabets and (Formal) Languages
    2.1.1 Symbols
    2.1.2 Strings
    2.1.3 Alphabets
    2.1.4 Languages
    2.1.5 Notes
  2.2 Using Induction to Prove Language Equalities


    2.2.1 String Induction Principles
    2.2.2 Proving Language Equalities
    2.2.3 Notes
  2.3 Introduction to Forlan
    2.3.1 Invoking Forlan
    2.3.2 The SML Core of Forlan
    2.3.3 Symbols
    2.3.4 Sets
    2.3.5 Sets of Symbols
    2.3.6 Strings
    2.3.7 Sets of Strings
    2.3.8 Relations on Symbols
    2.3.9 Notes

3 Regular Languages
  3.1 Regular Expressions and Languages
    3.1.1 Operations on Languages
    3.1.2 Regular Expressions
    3.1.3 Processing Regular Expressions in Forlan
    3.1.4 JForlan
    3.1.5 Notes
  3.2 Equivalence and Correctness of Regular Expressions
    3.2.1 Equivalence of Regular Expressions
    3.2.2 Proving the Correctness of Regular Expressions
    3.2.3 Notes
  3.3 Simplification of Regular Expressions
    3.3.1 Regular Expression Complexity
    3.3.2 Weak Simplification
    3.3.3 Local and Global Simplification
    3.3.4 Notes
  3.4 Finite Automata and Labeled Paths
    3.4.1 Finite Automata
    3.4.2 Labeled Paths and FA Meaning
    3.4.3 Design of Finite Automata
    3.4.4 Notes
  3.5 Isomorphism of Finite Automata
    3.5.1 Definition and Algorithm
    3.5.2 Isomorphism Finding/Checking in Forlan
    3.5.3 Notes
  3.6 Checking Acceptance and Finding Accepting Paths
    3.6.1 Processing a String from a Set of States
    3.6.2 Checking String Acceptance and Finding Accepting Paths
    3.6.3 Notes

  3.7 Simplification of Finite Automata
    3.7.1 Notes
  3.8 Proving the Correctness of Finite Automata
    3.8.1 Definition of Λ
    3.8.2 Proving that Enough is Accepted
    3.8.3 Proving that Everything Accepted is Wanted
    3.8.4 Notes
  3.9 Empty-string Finite Automata
    3.9.1 Definition of EFAs
    3.9.2 Converting FAs to EFAs
    3.9.3 Processing EFAs in Forlan
    3.9.4 Notes
  3.10 Nondeterministic Finite Automata
    3.10.1 Definition of NFAs
    3.10.2 Converting EFAs to NFAs
    3.10.3 Converting EFAs to NFAs, and Processing NFAs in Forlan
    3.10.4 Notes
  3.11 Deterministic Finite Automata
    3.11.1 Definition of DFAs
    3.11.2 Proving the Correctness of DFAs
    3.11.3 Simplification of DFAs
    3.11.4 Converting NFAs to DFAs
    3.11.5 Processing DFAs in Forlan
    3.11.6 Notes
  3.12 Properties of Regular Languages
    3.12.1 Converting Regular Expressions to FAs
    3.12.2 Converting FAs to Regular Expressions
    3.12.3 Characterization of Regular Languages
    3.12.4 More Closure Properties
    3.12.5 Notes
  3.13 Equivalence-testing and Minimization of DFAs
    3.13.1 Testing the Equivalence of DFAs
    3.13.2 Minimization of DFAs
    3.13.3 Notes
  3.14 The Pumping Lemma for Regular Languages
    3.14.1 Experimenting with the Pumping Lemma Using Forlan
    3.14.2 Notes
  3.15 Applications of Finite Automata and Regular Expressions
    3.15.1 Representing Character Sets and Files
    3.15.2 Searching for Strings in Files
    3.15.3 Lexical Analysis
    3.15.4 Design of Finite State Systems
    3.15.5 Notes

4 Context-free Languages
  4.1 Grammars, Parse Trees and Context-free Languages
    4.1.1 Grammars
    4.1.2 Parse Trees and Grammar Meaning
    4.1.3 Grammar Synthesis
    4.1.4 Notes
  4.2 Isomorphism of Grammars
    4.2.1 Definition and Algorithm
    4.2.2 Isomorphism Finding/Checking in Forlan
    4.2.3 Notes
  4.3 A Parsing Algorithm
    4.3.1 Algorithm
    4.3.2 Parsing in Forlan
    4.3.3 Notes
  4.4 Simplification of Grammars
    4.4.1 Definition and Algorithm
    4.4.2 Simplification in Forlan
    4.4.3 Hand-simplification Operations
    4.4.4 Notes
  4.5 Proving the Correctness of Grammars
    4.5.1 Preliminaries
    4.5.2 Proving that Enough is Generated
    4.5.3 Proving that Everything Generated is Wanted
    4.5.4 Notes
  4.6 Ambiguity of Grammars
    4.6.1 Definition
    4.6.2 Disambiguating Grammars of Operators
    4.6.3 Top-down Parsing
    4.6.4 Notes
  4.7 Closure Properties of Context-free Languages
    4.7.1 Operations on Grammars
    4.7.2 Operations on Grammars in Forlan
    4.7.3 Notes
  4.8 Converting Regular Expressions and FAs to Grammars
    4.8.1 Converting Regular Expressions to Grammars
    4.8.2 Converting Finite Automata to Grammars
    4.8.3 Consequences of Conversion Functions
    4.8.4 Notes
  4.9 Chomsky Normal Form
    4.9.1 Removing %-Productions
    4.9.2 Removing Unit Productions
    4.9.3 Generating a Grammar's Language When Finite
    4.9.4 Chomsky Normal Form

    4.9.5 Notes
  4.10 The Pumping Lemma for Context-free Languages
    4.10.1 Statement, Application and Proof of Pumping Lemma
    4.10.2 Experimenting with the Pumping Lemma Using Forlan
    4.10.3 Consequences of Pumping Lemma
    4.10.4 Notes

5 Recursive and Recursively Enumerable Languages
  5.1 Programs and Recursive and RE Languages
    5.1.1 Programs
    5.1.2 Program Meaning
    5.1.3 Programs as Data
    5.1.4 Recursive and Recursively Enumerable Languages
    5.1.5 Notes
  5.2 Closure Properties of Recursive and R.E. Languages
    5.2.1 Closure Properties of Recursive Languages
    5.2.2 Closure Properties of Recursively Enumerable Languages
    5.2.3 Notes
  5.3 Diagonalization and Undecidable Problems
    5.3.1 Diagonalization
    5.3.2 Undecidability of the Halting Problem
    5.3.3 Other Undecidable Problems
    5.3.4 Notes

Bibliography

Index

List of Figures

1.1 Example Diagonalization Table for Cardinality Proof

3.1 DFA Accepting AllLongStutter

5.1 Example Diagonalization Table


Preface

Background

Since the 1930s, the subject of formal language theory, also known as automata theory, has been developed by computer scientists, linguists and mathematicians. Formal languages (or simply languages) are sets of strings over finite sets of symbols, called alphabets, and various ways of describing such languages have been developed and studied, including regular expressions (which “generate” languages), finite automata (which “accept” languages), grammars (which “generate” languages) and Turing machines (which “accept” languages). For example, the set of identifiers of a given programming language is a formal language—one that can be described by a regular expression or a finite automaton. And, the set of all strings of tokens that are generated by a programming language’s grammar is another example of a formal language. Because of its applications to computer science, most computer science programs offer courses in this subject. Perhaps the best known applications are to compiler construction. For example, regular expressions and finite automata are used when specifying and implementing lexical analyzers, and grammars are used to specify and implement parsers. Finite automata are used when designing hardware and network protocols. And Turing machines—or other machines/programs of equivalent power—are used to formalize the notion of algorithm, which in turn makes possible the study of what is, and is not, computable. Formal language theory is largely concerned with algorithms, both ones that are explicitly presented, and ones implicit in theorems that are proved constructively. In typical courses on formal language theory, students apply these algorithms to toy examples by hand, and learn how they are used in applications. Although much can be achieved by a paper-and-pencil approach to the subject, students would obtain a deeper understanding of the subject if they could experiment with the algorithms of formal language theory using computer tools. Consider, e.g., a typical exercise of a formal language theory course in which students are asked to synthesize a deterministic finite automaton that accepts some language, L. With the paper-and-pencil approach, the student is obliged

to build the machine by hand, and then (hopefully) prove it correct. But, given the right computer tools, another approach would be possible. First, the student could try to express L in terms of simpler languages, making use of various language operations (e.g., union, intersection, difference, concatenation, closure). The student could then synthesize automata accepting the simpler languages, enter these machines into the system, and then combine these machines using operations corresponding to the language operations used to express L. Finally, the resulting machine could be minimized. With some such exercises, a student could solve the exercise in both ways, and could compare the results. Other exercises of this type could only be solved with machine support.

Integrating Experimentation and Proof

To support experimentation with formal languages, I designed and implemented a computer toolset called Forlan [Sto05, Sto08]. Forlan is implemented in the functional programming language Standard ML (SML) [MTHM97, Pau96], a language whose notation and concepts are similar to those of mathematics. Forlan is a library on top of the Standard ML of New Jersey (SML/NJ) implementation of SML [AM91]. It’s used interactively, and users are able to extend Forlan by defining SML functions.

In Forlan, the usual objects of formal language theory—finite automata, regular expressions, grammars, labeled paths, parse trees, etc.—are defined as abstract types, and have concrete syntax. Instead of Turing machines, Forlan implements a simple functional programming language of equivalent power, but which has the advantage of being much easier to program in than Turing machines. Programs are also abstract types, and have concrete syntax. Although mainly not graphical in nature, Forlan includes the Java program JForlan, a graphical editor for finite automata and regular expression, parse and program trees. It can be invoked directly, or via Forlan.

Numerous algorithms of formal language theory are implemented in Forlan, including conversions between regular expressions and different kinds of automata, the usual operations (e.g., union) on regular expressions, automata and grammars, equivalence testing and minimization of deterministic finite automata, a general parser for grammars, etc. Forlan provides support for regular expression simplification, although the algorithms used are works in progress. It also implements the functional programming language used as a substitute for Turing machines.

This textbook and Forlan were designed and developed together. I have attempted to keep the conceptual and notational distance between the textbook and toolset as small as possible. The book treats most concepts and algorithms both theoretically, especially using proof, and through experimentation, using Forlan. In contrast to some books on formal language theory, the book emphasizes the concrete over the abstract, providing numerous, fully worked-out examples of how regular expressions, finite automata, grammars and programs can be designed and proved correct. In my view, students are most able to learn how to write proofs—and to see the benefit of doing so—if their proofs are about things that they have designed.

I have also attempted to simplify the foundations of the subject, using alternative definitions when needed. E.g., finite automata are defined in such a way that they form a set (as opposed to a proper class), and so that more restricted forms of automata (e.g., deterministic finite automata (DFAs)) are proper subsets of that set. And the meanings of finite automata are given using labeled paths, from which the traditional δ notation is derived, in the case of DFAs. Furthermore, the book treats the set theoretic foundations of the subject more rigorously than is typical.

Readers of this book are assumed to have significant experience reading and writing informal mathematical proofs, of the kind one finds in mathematics books. This experience could have been gained, e.g., in courses on discrete mathematics, logic or set theory. The book assumes no previous knowledge of Standard ML. In order to understand and extend the implementation of Forlan, though, one must have a good working knowledge of Standard ML, as could be obtained by working through [Pau96] or [Ull98].
Drafts of this book were successfully used at Kansas State University in a semester-long, advanced undergraduate-level course on formal language theory.
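To give a feel for this interactive style, here is what a small session with the SML/NJ read-eval-print loop looks like (a generic SML session, not a Forlan-specific one; Forlan sessions extend this with the toolset’s types and functions):

    - fun double n = n * 2;
    val double = fn : int -> int
    - double 13;
    val it = 26 : int
    - map double [1, 2, 3];
    val it = [2,4,6] : int list

Declarations are entered at the - prompt, and the system responds with the value and inferred type of each binding.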

Outline of the Book

The book consists of five chapters. Chapter 1, Mathematical Background, consists of the material on set theory, induction and recursion, and trees and inductive definitions that is required in the remaining chapters. In Chapter 2, Formal Languages, we say what symbols, strings, alphabets and (formal) languages are, show how to use various induction principles to prove language equalities, and give an introduction to the Forlan toolset. The remaining three chapters introduce and study more restricted sets of languages. In Chapter 3, Regular Languages, we study regular expressions and languages, five kinds of finite automata, algorithms for processing and converting between regular expressions and finite automata, properties of regular languages, and applications of regular expressions and finite automata to searching in text files, lexical analysis, and the design of finite state systems. In Chapter 4, Context-free Languages, we study context-free grammars and languages, algorithms for processing grammars and for converting regular expressions and finite automata to grammars, top-down (recursive descent) parsing, and properties of context-free languages. We prove that the set of context-free languages is a proper superset of the set of regular languages.

Finally, in Chapter 5, Recursive and Recursively Enumerable Languages, we study the functional programming language that we use instead of Turing machines to define the recursive and recursively enumerable languages. We study algorithms for processing programs and for converting grammars to programs, and properties of recursive and recursively enumerable languages. We prove that the set of context-free languages is a proper subset of the set of recursive languages, that the set of recursive languages is a proper subset of the set of recursively enumerable languages, and that there are languages that are not recursively enumerable. Furthermore, we show that there are problems, like the halting problem (the problem of determining whether a program halts when run on a given input), that can’t be solved by programs.

Further Reading and Related Work

This book covers most of the material that is typically presented in an undergraduate course on formal language theory. On the other hand, typical textbooks on formal language theory cover much more of the subject than we do. Readers who are interested in learning more, or who would like to be exposed to alternative presentations of some of the material in this book, should consult one of the many fine books on formal language theory, such as [HMU01, Koz97, LP98, Mar91, Lin01]. Neil Jones [Jon97] pioneered the use of a programming language with structured data as an alternative to Turing machines for studying the limits of what is computable. In Chapter 5, we have followed Jones’s approach in some ways. On the other hand, our programming language is functional, not imperative (assignment-oriented), and it has explicit support for the symbols and strings of formal language theory. The existing formal languages toolsets fit into two categories. In the first category are tools, like JFLAP [BLP+97, HR00, Rod06], Pâté [BLP+97, HR00], the Java Computability Toolkit [RHND99], and Turing’s World [BE93], that are graphically oriented and help students work out relatively small examples. The books [Rod06] (on JFLAP) and [Lin01] (an introduction to formal language theory) are intended to be used in conjunction with each other. The second category consists of toolsets that, like Forlan, are embedded in programming languages, and so support sophisticated experimentation with formal languages. Toolsets in this category include Automata [Sut92], Grail+ [RW94, RWYC17], HaLeX [Sar02], Leiß’s Automata Library [Lei10] and Vaucanson [LRGS04]. I am not aware of other textbook/toolset packages whose toolsets are members of this second category.

Notes, Exercises and Website

In the “notes” subsections that conclude most sections of the book, I have restricted myself to describing how the book’s approach differs from standard practice. Readers interested in the history of the subject can consult [HMU01, Koz97, LP98]. The book contains numerous fully worked-out examples, many of which consist of designing and proving the correctness of regular expressions, finite automata, grammars and programs. Similar exercises, as well as other kinds of exercises, are scattered throughout the book. The Forlan website

http://alleystoughton.us/forlan contains:

• instructions for downloading and installing the Forlan toolset, and JForlan;

• the Forlan manual;

• instructions for reporting errors or making suggestions; and

• the Forlan distribution, including the source for Forlan and JForlan, as well as the LaTeX source for this book.

Acknowledgments

Leonard Lee and Jessica Sherrill designed and implemented graphical editors for Forlan finite automata (JFA), and regular expression and parse trees (JTR), respectively. Their work was unified and enhanced (of particular note was the addition of support for program trees) by Srinivasa Aditya Uppu, resulting in an initial version of JForlan. Subsequently, Kenton Born carried out a major redevelopment of JForlan, resulting in JForlan Version 1.0. A further revision, by the author, led to JForlan Version 2.0.

It is a pleasure to acknowledge helpful discussions relating to the Forlan project with Eli Fox-Epstein, Brian Howard, Rod Howell, John Hughes, Nathan James, Patrik Jansson, Jace Kohlmeier, Dexter Kozen, Matthew Miller, Aarne Ranta, Ryan Stejskal, Colin Stirling, Lucio Torrico and Lyn Turbak. Much of this work was done while I was employed by Kansas State University, and some of the work was done while I was on sabbatical at the Department of Computing Science of Chalmers University of Technology.

Chapter 1

Mathematical Background

This chapter consists of the material on set theory, induction, trees, inductive definitions and recursion that is required in the remaining chapters.

1.1 Basic Set Theory

In this section, we will cover the material on logic, sets, relations, functions and data structures that will be needed in what follows. Much of this material should be at least partly familiar.

1.1.1 Review of Classical Logic

We begin by very briefly reviewing the rules of classical logic. We can build formulas using basic predicates (like “x = y”), plus the following quantifiers and connectives:

• For all x, P (x) (universal quantification). The most typical way of proving such a formula is to add the variable x to the variable context (each variable of which stands for some element of the universe of values)¹, and then go about proving P (x) (possibly making use of any of the current assumptions).

• There exists an x such that P (x) (existential quantification). The most typical way of proving such a formula is to prove that P (t) holds, where t is some term (expression), all of whose variables appear in the variable context.

• P and Q (conjunction). The most typical way of proving such a formula is to prove both P and Q, individually.

¹We rename x first, if x is already in the variable context.


• P or Q (disjunction). The most typical way of proving such a formula is to prove one of P or Q.

• P implies Q (also written if P , then Q) (implication). The most typical way to prove such a formula is to add P to our current assumptions, and then use these assumptions to establish the truth of Q.

• Not P (P does not hold; negation). The most typical way to prove such a formula is to show that, if we add P to our current assumptions, we can prove a contradiction—something obviously false.

In formulas involving the connectives not, and and or, we give “not” the highest precedence, “and” the next highest precedence, and “or” the lowest precedence. Because (P and Q) and R is equivalent to P and (Q and R), we don’t need to use parentheses when combining multiple occurrences of and. The same is true for or. When we write P iff Q (“if and only if”), this means that (P implies Q) and (Q implies P ). P implies Q is equivalent to not P or Q. Universal and existential quantification have lower precedence than the connectives, and extend as far as possible. E.g., for all x, for all y, P (x) and Q(y) means for all x, for all y, (P (x) and Q(y)). We can rename quantified variables in formulas, maintaining equivalence, e.g., turning for all x, for all y, P (x) and Q(y) into for all y, for all z, P (y) and Q(z). We often abbreviate nested universal or existential quantifications, writing, e.g., for all x, y, P (x) and Q(y) instead of for all x, for all y, P (x) and Q(y). We have that (the propositional equivalences are checked experimentally in the sketch after this list):

• Not for all x, P (x), is equivalent to there exists an x such that not P (x);

• Not there exists an x such that P (x) is equivalent to for all x, not P (x);

• Not(P and Q) is equivalent to not P or not Q;

• Not(P or Q) is equivalent to not P and not Q;

• Not(P implies Q) is equivalent to P and not Q;

• Not not P is equivalent to P .
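The equivalences above that do not involve quantifiers range only over the two truth values, so they can be verified exhaustively by machine. Here is a minimal sketch in Standard ML, the language the Forlan toolset is built on (the helper names are ours):

    val bools = [true, false]
    fun implies (p, q) = not p orelse q
    fun forAllPairs pred =
      List.all (fn p => List.all (fn q => pred (p, q)) bools) bools

    val deMorganAnd =
      forAllPairs (fn (p, q) => not (p andalso q) = (not p orelse not q))
    val deMorganOr =
      forAllPairs (fn (p, q) => not (p orelse q) = (not p andalso not q))
    val negImplies =
      forAllPairs (fn (p, q) => not (implies (p, q)) = (p andalso not q))
    val doubleNeg = List.all (fn p => not (not p) = p) bools
    (* all four values bind to true *)

Such a finite check is possible exactly because the connectives are truth-functional; the quantifier equivalences, by contrast, range over an infinite universe and call for proof.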

Furthermore:

• If the current assumptions are contradictory, we can prove any conclusion, Q.

• If we have the assumptions P implies Q and P , we may conclude Q.

• If we have the assumption for all x, P (x), and t is a term whose variables come from the variable context, then we may conclude P (t).

• If one of our assumptions is a disjunction P or Q, and we are trying to prove R, then it suffices to:

– show that R follows from P plus whatever other assumptions we have; and – show that R follows from Q plus whatever other assumptions we have.

This rule is called disjunction elimination. It generalizes to when there are more than two disjuncts.

• If one of our assumptions is there exists an x such that P (x), and we are trying to prove R, it suffices to introduce the assumption that P (x) holds, where the variable x is added to our variable context (and so stands for some element of the value universe)², and then establish that R holds. This rule is called existential elimination.

• When trying to prove a conclusion P from some set of assumptions, we can suspend this proof, and go about proving a formula Q from the same set of assumptions. If this sub-proof succeeds, we can add Q to our original set of assumptions, and then go on with our proof of P .

We have that P implies Q is equivalent to not P or Q. It is also equivalent to its contrapositive, not Q implies not P . We have that P iff Q is equivalent to not P iff not Q. By the Law of the Excluded Middle, we have that P or not P , i.e., that either P or its negation holds. When we do a case analysis, we first prove (this may be obvious, in which case we may leave the proof implicit) that a disjunction P₁ or · · · or Pₙ holds, adding it to our assumptions. We then use disjunction elimination, showing our conclusion follows from each of P₁, ..., Pₙ. Case analysis is often done in conjunction with the Law of the Excluded Middle. Finally, when proving P using proof by contradiction, we temporarily add not P to our assumptions and then try to derive a contradiction (i.e., prove something that’s obviously false). If we succeed, then P follows from our original assumptions. We write for all sets X, P (X) as an abbreviation for

for all X, X is a set implies P (X).

And we write there exists a set X such that P (X)

²We rename x first, if x is already in the variable context.

as an abbreviation for

there exists an X such that X is a set and P (X).

Thus we have that: not (for all sets X, P (X)) is equivalent to there exists a set X such that not P (X); and not (there exists a set X such that P (X)) is equivalent to for all sets X, not P (X).

When writing proofs involving these abbreviated formulas, we compress some proof steps into single steps in obvious ways. For example, when proving for all sets X, P (X), we simply introduce the assumption that X is a set, and then go about proving P (X), instead of first adding X to our variable context, and then adding the fact that X is a set to our current assumptions. We make use of similar formula abbreviations without comment.

1.1.2 Describing Sets by Listing Their Elements

We write ∅ for the empty set—the set with no elements. Finite sets can be described by listing their elements inside set braces: {x₁, ..., xₙ}. E.g., {3} is the singleton set whose only element is 3, and {1, 3, 5, 7} is the set consisting of the first four odd numbers.

1.1.3 Sets of Numbers

We write:

• N for the set {0, 1,...} of all natural numbers;

• Z for the set {..., −1, 0, 1,...} of all integers;

• R for the set of all real numbers.

Note that, for us, 0 is a natural number. This has many pleasant consequences, e.g., that the size of a finite set or the length of a list will always be a natural number.

1.1.4 Relationships between Sets

As usual, we write x ∈ Y to mean that x is one of the elements (members) of the set Y . Sets A and B are equal (A = B) iff (if and only if) they have the same elements, i.e., for all x, x ∈ A iff x ∈ B. We write

for all x ∈ X, P (x) as an abbreviation for

for all x,x ∈ X implies P (x).

And we write there exists an x ∈ X such that P (x) as an abbreviation for

there exists an x such that x ∈ X and P (x).

Thus, we have that: Not for all x ∈ X, P (x) is equivalent to

there exists an x ∈ X such that not P (x); and Not there exists an x ∈ X such that P (x) is equivalent to for all x ∈ X, not P (x). Suppose A and B are sets. We say that:

• A is a subset of B (A ⊆ B) iff, for all x ∈ A, x ∈ B;

• A is a proper subset of B (A ⊊ B) iff A ⊆ B but A ≠ B.

In other words: A is a subset of B iff everything in A is also in B, and A is a proper subset of B iff everything in A is in B, but there is at least one element of B that is not in A. For example, ∅ ⊊ N, N ⊆ N and N ⊊ Z. The definition of ⊆ gives us the most common way of showing that A ⊆ B: we suppose that x ∈ A, and show (with no additional assumptions about x) that x ∈ B. If we want to show that A = B, it will suffice to show that A ⊆ B and B ⊆ A, i.e., that everything in A is in B, and everything in B is in A. Of course, we can also use the usual properties of equality to show set equalities. E.g., if A = B and B = C, we have that A = C. Note that, for all sets A, B and C:

• if A ⊆ B ⊆ C, then A ⊆ C;

• if A ⊆ B ⊊ C, then A ⊊ C;

• if A ⊊ B ⊆ C, then A ⊊ C;

• if A ⊊ B ⊊ C, then A ⊊ C.

Given sets A and B, we say that:

• A is a superset of B (A ⊇ B) iff, for all x ∈ B, x ∈ A;

• A is a proper superset of B (A ⊋ B) iff A ⊇ B but A ≠ B.

Of course, for all sets A and B, we have that: A = B iff A ⊇ B ⊇ A; and A ⊆ B iff B ⊇ A. Furthermore, for all sets A, B and C:

• if A ⊇ B ⊇ C, then A ⊇ C;

• if A ⊇ B ⊋ C, then A ⊋ C;

• if A ⊋ B ⊇ C, then A ⊋ C;

• if A ⊋ B ⊋ C, then A ⊋ C.

We write for all X ⊆ Y, P (X) as an abbreviation for

for all sets X, X ⊆ Y implies P (X).

And we write there exists an X ⊆ Y such that P (X) as an abbreviation for

there exists a set X such that X ⊆ Y and P (X).

And we make use of similar abbreviations without comment.

1.1.5 Set Formation

We will make extensive use of the {· · · | · · ·} notation for forming sets. Let’s consider two representative examples of its use. For the first example, let

A = { n | n ∈ N and n² ≥ 20 } = { n ∈ N | n² ≥ 20 }.

(where the third of these expressions abbreviates the second one). Here, n is a bound variable and is universally quantified—changing it uniformly to m, for instance, wouldn’t change the meaning of A. By the definition of A, we have that, for all n,

n ∈ A iff n ∈ N and n² ≥ 20.

Thus, e.g.,

5 ∈ A iff 5 ∈ N and 5² ≥ 20.

Since 5 ∈ N and 5² = 25 ≥ 20, it follows that 5 ∈ A. On the other hand, 5.5 ∉ A, since 5.5 ∉ N, and 4 ∉ A, since 4² ≱ 20. For the second example, let

B = { n³ + m² | n, m ∈ N and n, m ≥ 1 }.

Note that n³ + m² is a term (expression), rather than a variable. The variables n and m are existentially quantified, rather than universally quantified, so that, for all l,

l ∈ B iff l = n³ + m², for some n, m such that n, m ∈ N and n, m ≥ 1 iff l = n³ + m², for some n, m ∈ N such that n, m ≥ 1.

Thus, to show that 9 ∈ B, we would have to show that

9 = n³ + m² and n, m ∈ N and n, m ≥ 1, for some values of n, m. And, this holds, since 9 = 2³ + 1² and 2, 1 ∈ N and 2, 1 ≥ 1. We use set formation in the following definition. Given n, m ∈ Z, we write [n : m] for { l ∈ Z | l ≥ n and l ≤ m }. Thus [n : m] is all of the integers that are at least n and no more than m. For example, [−2 : 1] is {−2, −1, 0, 1} and [3 : 2] is ∅. Some uses of the {· · · | · · ·} notation are too “big” to be sets, and instead are proper classes. A class is a collection of universe elements, and a class is proper iff it is not a set. E.g., { A | A is a set and A ∉ A } is a proper class: assuming that it is a set is inconsistent—makes it possible to prove anything. This is the so-called Russell’s Paradox. To know that a set formation is valid, it suffices to find a set that includes all the elements in the class being defined. E.g., the definitions of A and B are valid, because the classes being defined are subsets of N.
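Because these example sets are concrete, membership in them can be tested by machine. Below is a minimal sketch in Standard ML, the language the Forlan toolset is built on (the function names inA and inB are ours). Since B’s definition existentially quantifies n and m, the test for B searches candidate values, which terminates here because n³ ≤ l and m² ≤ l bound the search:

    (* membership test for A = { n ∈ N | n² ≥ 20 } *)
    fun inA (n : int) : bool = n >= 0 andalso n * n >= 20

    (* membership test for B = { n³ + m² | n, m ∈ N and n, m ≥ 1 } *)
    fun inB (l : int) : bool =
      let fun upto (i, j) = if i > j then [] else i :: upto (i + 1, j)
          val cands = upto (1, l)
      in List.exists
           (fn n => List.exists (fn m => n * n * n + m * m = l) cands)
           cands
      end

    val test1 = inA 5   (* true:  5² = 25 ≥ 20 *)
    val test2 = inA 4   (* false: 4² = 16 < 20 *)
    val test3 = inB 9   (* true:  9 = 2³ + 1² *)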

1.1.6 Operations on Sets Next, we consider some standard operations on sets. Recall the following oper- ations on sets A and B:

A ∪ B = { x | x ∈ A or x ∈ B } (union) A ∩ B = { x | x ∈ A and x ∈ B } (intersection) A − B = { x ∈ A | x 6∈ B } (difference) A × B = { (x,y) | x ∈ A and y ∈ B } (product) P A = { X | X ⊆ A } (). The of set theory assert that all of these set formations are valid (inter- section and difference are obviously valid). Of course, union and intersection are both commutative and associative (A∪ B = B∪A, (A∪B)∪C = A∪(B∪C), A∩B = B∩A and (A∩B)∩C = A∩(B∩C), for all sets A,B,C). Furthermore, we have that union is idempotent (A∪A = A, for all sets A), and that ∅ is the identity for union (∅∪ A = A = A ∪ ∅, for all sets A). Also, intersection is idempotent (A ∩ A = A, for all sets A), and ∅ is the zero for intersection (∅∩ A = ∅ = A ∩ ∅, for all sets A). It is easy to see that, for all sets X and Y , X ⊆ X ∪ Y and Y ⊆ X ∪ Y . We say that sets X and Y are disjoint iff X ∩ Y = ∅, i.e., iff X and Y have nothing in common. A − B is formed by removing the elements of B from A, if necessary. For example, {0, 1, 2} − {1, 4} = {0, 2}. A × B consists of all ordered pairs (x,y), where x comes from A and y comes from B. For example, {0, 1} × {1, 2} = {(0, 1), (0, 2), (1, 1), (1, 2)}. Remember that an (x,y) is different from {x,y}, the set containing just x and y. In particular, we have that, for all x, x′, y, y′, (x,y) = (x′,y′) iff x = x′ and y = y′, whereas {1, 2} = {2, 1}. If A and B have n and m elements, respectively, for n,m ∈ N, then A × B will have nm elements. Finally, P A consists of all of the subsets of A. For example, P {0, 1} = {∅, {0}, {1}, {0, 1}}. If A has n elements, for n ∈ N, then P A will have 2n elements. We let × associate to the right, so that, e.g., A×B ×C = A×(B ×C). And, we abbreviate (x1, (x2, · · · (xn−1,xn) · · · )) to (x1,x2,...,xn−1,xn), thinking of it as an ordered n-. For example (x, (y, z)) is abbreviated to (x,y,z), and we think of it as an ordered triple. As an example of a proof involving sets, let’s prove the following simple , which says that intersections may be distributed over unions: Proposition 1.1.1 Suppose A, B and C are sets.

(1) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

(2) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).

Proof. We show (1), the proof of (2) being similar. It will suffice to show that

A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).

(A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C)) Suppose x ∈ A ∩ (B ∪ C). We must show that x ∈ (A ∩ B) ∪ (A ∩ C). By our assumption, we have that x ∈ A and x ∈ B ∪ C. Since x ∈ B ∪ C, there are two cases to consider.

• Suppose x ∈ B. Then x ∈ A ∩ B ⊆ (A ∩ B) ∪ (A ∩ C), so that x ∈ (A ∩ B) ∪ (A ∩ C).

• Suppose x ∈ C. Then x ∈ A ∩ C ⊆ (A ∩ B) ∪ (A ∩ C), so that x ∈ (A ∩ B) ∪ (A ∩ C).

((A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C)) Suppose x ∈ (A ∩ B) ∪ (A ∩ C). We must show that x ∈ A ∩ (B ∪ C). There are two cases to consider.

• Suppose x ∈ A ∩ B. Then x ∈ A and x ∈ B ⊆ B ∪ C, so that x ∈ A ∩ (B ∪ C).

• Suppose x ∈ A ∩ C. Then x ∈ A and x ∈ C ⊆ B ∪ C, so that x ∈ A ∩ (B ∪ C). ✷

Exercise 1.1.2 Suppose A, B and C are sets. Prove that union distributes over intersection, i.e., for all sets A, B and C:

(1) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

(2) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).
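The set operations above can be experimented with by representing finite sets as duplicate-free lists. Here is a minimal sketch in SML (our own helper functions, not part of Forlan’s interface), ending with a spot check of Proposition 1.1.1(1) on sample sets:

    (* finite sets as duplicate-free lists *)
    fun member (x, ys) = List.exists (fn y => y = x) ys
    fun union (xs, ys) = xs @ List.filter (fn y => not (member (y, xs))) ys
    fun inter (xs, ys) = List.filter (fn x => member (x, ys)) xs
    fun diff (xs, ys) = List.filter (fn x => not (member (x, ys))) xs
    (* set equality is mutual inclusion, since order is irrelevant *)
    fun subset (xs, ys) = List.all (fn x => member (x, ys)) xs
    fun setEq (xs, ys) = subset (xs, ys) andalso subset (ys, xs)

    val d = diff ([0, 1, 2], [1, 4])   (* [0, 2], as in the example above *)

    (* spot check of Proposition 1.1.1(1) *)
    val a = [0, 1, 2]  val b = [1, 4]  val c = [2, 5]
    val check =
      setEq (inter (a, union (b, c)),
             union (inter (a, b), inter (a, c)))   (* true *)

Such checks don’t replace the proof, but they catch mis-stated identities quickly.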

Next, we consider generalized versions of union and intersection that work on sets of sets. If X is a set of sets, then the generalized union of X (⋃X) is

Thus, to show that a ∈ ⋃X, we must show that a is in at least one element A of X. For example,

⋃{{0, 1}, {1, 2}, {2, 3}} = {0, 1, 2, 3} = {0, 1} ∪ {1, 2} ∪ {2, 3},
⋃∅ = ∅.

(Again, we rely on set theory’s axioms to know this set formation is valid.) If X is a nonempty set of sets, then the generalized intersection of X (⋂X) is

{ a | a ∈ A, for all A ∈ X }.

Thus, to show that a ∈ ⋂X, we must show that a is in every element A of X. For example

⋂{{0, 1}, {1, 2}, {2, 3}} = ∅ = {0, 1} ∩ {1, 2} ∩ {2, 3}.

If we allowed ⋂∅, then it would contain all elements a of our universe that are in all of the nonexistent elements of ∅, i.e., it would contain all elements of our universe. But this collection is a proper class, not a set. Let’s consider the above reasoning in more detail. Suppose a is a universe element. To prove that a ∈ A, for all A ∈ ∅, suppose A ∈ ∅. But this is a logical contradiction, from which we may prove anything, in particular our desired conclusion, a ∈ A.
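For a finite set of finite sets, the generalized operations are folds of the binary ones. A small sketch under the same duplicate-free-list representation as before (again our own helpers; genInter is restricted to nonempty lists, mirroring the side condition above):

    fun member (x, ys) = List.exists (fn y => y = x) ys
    fun union (xs, ys) = xs @ List.filter (fn y => not (member (y, xs))) ys
    fun inter (xs, ys) = List.filter (fn x => member (x, ys)) xs

    (* generalized union: fold binary union over a list of sets *)
    fun genUnion sets = foldl union [] sets

    (* generalized intersection: defined only for nonempty lists *)
    fun genInter [] = raise Empty
      | genInter (s :: ss) = foldl inter s ss

    val u = genUnion [[0, 1], [1, 2], [2, 3]]
      (* [2, 3, 1, 0] — the set {0, 1, 2, 3} *)
    val i = genInter [[0, 1], [1, 2], [2, 3]]   (* [] *)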

1.1.7 Relations and Functions

Next, we consider relations and functions. A relation R is a set of ordered pairs. The domain of a relation R (domain R) is { x | (x, y) ∈ R, for some y }, and the range of R (range R) is { y | (x, y) ∈ R, for some x }. We say that R is a relation from a set X to a set Y iff domain R ⊆ X and range R ⊆ Y , and that R is a relation on a set A iff domain R ∪ range R ⊆ A. We often write xRy for (x, y) ∈ R. Consider the relation

R = {(0, 1), (1, 2), (0, 2)}.

Then, domain R = {0, 1}, range R = {1, 2}, R is a relation from {0, 1} to {1, 2}, and R is a relation on {0, 1, 2}. Of course, R is also, e.g., a relation between N and R, as well as a relation on R. We often form relations using set formation. For example, suppose φ(x, y) is a formula involving variables x and y. Then we can let R = { (x, y) | φ(x, y) }, and it is easy to show that, for all x, y, (x, y) ∈ R iff φ(x, y). Given a set A, the identity relation on A (idA) is { (x, x) | x ∈ A }. For example, id{1,3,5} is {(1, 1), (3, 3), (5, 5)}. Given relations R and S, the composition of S and R (S ◦ R) is { (x, z) | (x, y) ∈ R and (y, z) ∈ S, for some y }. Intuitively, it’s the relation formed by starting with a pair (x, y) ∈ R, following it with a pair (y, z) ∈ S (one that begins where the pair from R left off), and suppressing the intermediate value y, leaving us with (x, z). For example, if R = {(1, 1), (1, 2), (2, 3)} and S = {(2, 3), (2, 4), (3, 4)}, then from (1, 2) ∈ R and (2, 3) ∈ S, we can conclude (1, 3) ∈ S ◦ R. There are two more pairs in S ◦ R, giving us S ◦ R = {(1, 3), (1, 4), (2, 4)}. For all sets A, B and C, and relations R and S, if R is a relation from A to B, and S is a relation from B to C, then S ◦ R is a relation from A to C. It is easy to show that ◦ is associative and has the identity relations as its identities:

(1) For all sets A and B, and relations R from A to B, idB ◦ R = R = R ◦ idA.

(2) For all sets A, B, C and D, and relations R from A to B, S from B to C, and T from C to D, (T ◦ S) ◦ R = T ◦ (S ◦ R).

Because of (2), we can write T ◦ S ◦ R, without worrying about how it is parenthesized. The inverse of a relation R (R⁻¹) is the relation { (y, x) | (x, y) ∈ R }, i.e., it is the relation obtained by reversing each of the pairs in R. For example, if R = {(0, 1), (1, 2), (1, 3)}, then the inverse of R is {(1, 0), (2, 1), (3, 1)}. So for all sets A and B, and relations R, R is a relation from A to B iff R⁻¹ is a relation from B to A. We also have that, for all sets A and B and relations R from A to B, (R⁻¹)⁻¹ = R. Furthermore, for all sets A, B and C, relations R from A to B, and relations S from B to C, (S ◦ R)⁻¹ = R⁻¹ ◦ S⁻¹. A relation R is:

• reflexive on a set A iff, for all x ∈ A, (x,x) ∈ R;

• transitive iff, for all x,y,z, if (x,y) ∈ R and (y, z) ∈ R, then (x, z) ∈ R;

• symmetric iff, for all x,y, if (x,y) ∈ R, then (y,x) ∈ R;

• antisymmetric iff, for all x,y, if (x,y) ∈ R and (y,x) ∈ R, then x = y;

• total on a set A iff, for all x,y ∈ A, (x,y) ∈ R or (y,x) ∈ R;

• a iff, for all x,y,z, if (x,y) ∈ R and (x, z) ∈ R, then y = z.

Note that being antisymmetric is not the same as not being symmetric. Suppose, e.g., that R = {(0, 1), (1, 2), (0, 2)}. Then:

• R is not reflexive on {0, 1, 2}, since (0, 0) 6∈ R.

• R is transitive, since whenever (x,y) and (y, z) are in R, it follows that (x, z) ∈ R. Since (0, 1) and (1, 2) are in R, we must have that (0, 2) is in R, which is indeed true.

• R is not symmetric, since (0, 1) ∈ R, but (1, 0) 6∈ R.

• R is antisymmetric, since there are no x,y such that (x,y) and (y,x) are both in R. (If we added (1, 0) to R, then R would not be antisymmetric, since then R would contain (0, 1) and (1, 0), but 0 6= 1.)

• R is not total on {0, 1, 2}, since (0, 0) 6∈ R.

• R is not a function, since (0, 1) ∈ R and (0, 2) ∈ R. Intuitively, given an input of 0, it’s not clear whether R’s output is 1 or 2.
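For relations on small finite sets, composition, inverse, and the properties just listed are all directly computable, which makes claims like the ones above easy to check. A minimal sketch in SML (helper names ours), replaying the examples from this subsection:

    type relation = (int * int) list

    fun member (p, r : relation) = List.exists (fn q => q = p) r

    (* inverse: reverse every pair *)
    fun inverse (r : relation) : relation = map (fn (x, y) => (y, x)) r

    (* composition of s and r: chain (x, y) in r with (y, z) in s *)
    fun compose (s : relation, r : relation) : relation =
      List.concat
        (map (fn (x, y) =>
                List.mapPartial
                  (fn (y', z) => if y = y' then SOME (x, z) else NONE)
                  s)
             r)

    fun reflexiveOn (r : relation, a) =
      List.all (fn x => member ((x, x), r)) a
    fun transitive (r : relation) =
      List.all (fn (x, y) =>
                  List.all (fn (y', z) => y <> y' orelse member ((x, z), r)) r)
               r
    fun symmetric (r : relation) =
      List.all (fn (x, y) => member ((y, x), r)) r
    fun antisymmetric (r : relation) =
      List.all (fn (x, y) => x = y orelse not (member ((y, x), r))) r

    (* composition example: evaluates to [(1,3), (1,4), (2,4)] *)
    val sr = compose ([(2, 3), (2, 4), (3, 4)], [(1, 1), (1, 2), (2, 3)])

    (* the relation R = {(0,1), (1,2), (0,2)} discussed above *)
    val r = [(0, 1), (1, 2), (0, 2)]
    val t1 = reflexiveOn (r, [0, 1, 2])   (* false *)
    val t2 = transitive r                 (* true  *)
    val t3 = symmetric r                  (* false *)
    val t4 = antisymmetric r              (* true  *)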

We say that R is a total ordering on a set A iff R is a transitive, antisymmetric and total relation on A. It is easy to see that such an R will also be reflexive on A. Furthermore, if R is a total ordering on A, then R⁻¹ is also a total ordering on A. We often use a symbol like ≤ to stand for a total ordering on a set A, which lets us use its mirror image, ≥, for its inverse, as well as to write < for the relation on A defined by: for all x, y ∈ A, x < y iff x ≤ y but x ≠ y. And we write > for the inverse of <. We can also start with a strict total ordering < on A, and then define a total ordering ≤ on A by: x ≤ y iff x < y or x = y. For example,

f = {(0, 1), (1, 2), (2, 0)} is a function. We think of it as sending the input 0 to the output 1, the input 1 to the output 2, and the input 2 to the output 0. If f is a function and x ∈ domain f, we write f x for the application of f to x, i.e., the unique y such that (x, y) ∈ f. We say that f is a function from a set X to a set Y iff f is a function, domain f = X and range f ⊆ Y . Such an f is also a function from X to range f. We write X → Y for the set of all functions from X to Y . If A has n elements and B has m elements, for n, m ∈ N, then A → B will have mⁿ elements. For the f defined above, we have that f 0 = 1, f 1 = 2, f 2 = 0, f is a function from {0, 1, 2} to {0, 1, 2}, and f ∈ {0, 1, 2} → {0, 1, 2}. Of course, f is also in {0, 1, 2} → N, but it is not in N → {0, 1, 2}.

Exercise 1.1.3 Suppose X is a set, and x ∈ X. What are the elements of ∅ → X, X → ∅, {x} → X and X → {x}? Prove that your answers are correct.

We let → associate to the right and have lower precedence than ×, so that, e.g., A × B → C × D → E × F means (A × B) → ((C × D) → (E × F )). An element of this set takes in a pair (a, b) in A × B, and returns a function that takes in a pair (c, d) in C × D, and returns an element of E × F . Suppose f, g ∈ A → B. It is easy to show that f = g iff, for all x ∈ A, f x = g x. This is the most common way of showing the equality of functions. Given a set A, it is easy to see that idA, the identity relation on A, is a function from A to A, and we call it the identity function on A. It is the function that returns its input. Given sets A, B and C, if f is a function from A to B, and g is a function from B to C, then the composition g ◦ f of (the relations) g and f is the function h from A to C such that h x = g(f x), for all x ∈ A. In other words, g ◦ f is the function that runs f and then g, in sequence.

Because of how composition of relations works, we have that ◦ is associative and has the identity functions as its identities:

(1) For all sets A and B, and functions f from A to B, idB ◦ f = f = f ◦ idA.

(2) For all sets A, B, C and D, and functions f from A to B, g from B to C, and h from C to D, (h ◦ g) ◦ f = h ◦ (g ◦ f).

Because of (2), we can write h ◦ g ◦ f, without worrying about how it is paren- thesized. It is the function that runs f, then g, then h, in sequence. Given f ∈ X → Y and a subset A of X, we write f(A) for the image of A under f, { fx | x ∈ A }. And given f ∈ X → Y and a subset B of Y , we write f −1(B) for the inverse image of B under f, { x ∈ X | fx ∈ B }. For example, if f ∈ N → N is the function that doubles its , then f({3, 5, 7})= {6, 10, 14} and f −1({1, 2, 3, 4}) = {1, 2}. Given a function f and a set X ⊆ domain f, we write f|X for the restriction of f to X, { (x,y) ∈ f | x ∈ X }. Hence domain(f|X)= X. For example, if f is the function over Z that increases its argument by 2, then f|N is the function over N that increases its argument by 2. Given a function f and elements x,y of our universe, we write f[x 7→ y] for the updating of f to send x to y, (f|(domain f − {x})) ∪ {(x,y)}. This function is the same as f, except that it sends x to y. For example, if f = {(0, 1), (1, 2)}, then f[1 7→ 0] = {(0, 1), (1, 0)}, and f[2 7→ 3] = {(0, 1), (1, 2), (2, 3)}. We often define a function by saying how an element of its domain is trans- formed into an element of its range. E.g., we might say that f ∈ N → Z is the unique function such that, for all n ∈ N,

fn = −(n/2), if n is even,
fn = (n + 1)/2, if n is odd.

This is shorthand for saying that f is the set of all (n,m) such that

• n ∈ N,

• if n is even, then m = −(n/2), and

• if n is odd, then m = (n + 1)/2.

Then f 0 = 0, f 1 = 1, f 2 = −1, f 3 = 2, f 4 = −2, etc.
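This definition transcribes directly into SML, where the parity test selects between the two cases (a small sketch; ~ is SML’s unary minus, and div is integer division):

    (* the unique f ∈ N → Z with fn = −(n/2) for even n,
       fn = (n + 1)/2 for odd n *)
    fun f (n : int) : int =
      if n mod 2 = 0 then ~(n div 2) else (n + 1) div 2

    val results = map f [0, 1, 2, 3, 4]   (* [0, 1, ~1, 2, ~2] *)

As the results show, f enumerates Z by alternating between nonpositive and positive values, a fact used when showing that Z is countably infinite later in this section.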

Exercise 1.1.4 There are three things wrong with the following “definition”—what are they? Let f ∈ N → N be the unique function such that, for all n ∈ N,

fn = n − 2, if n ≥ 1 and n ≤ 10,
fn = n + 2, if n ≥ 10.

If X₁, ..., Xₙ are sets, for n ≥ 1, we write ♯ᵢ X₁, ..., Xₙ (or just ♯ᵢ, if n and the Xᵢ are clear from the context) for the i-th projection function from X₁ × · · · × Xₙ to Xᵢ, which selects the i-th component of a tuple, i.e., transforms an input (x₁, ..., xₙ) to xᵢ. One of the forms of set theory’s Axiom of Choice says that, for all sets X of nonempty sets, there is a function f with domain X such that, for all Y ∈ X, f Y ∈ Y . In other words, f is a choice function that, given an element Y of X, is able to pick an element of Y .

1.1.8 Set Cardinality

Next, we see how we can use functions to compare the sizes (or cardinalities) of sets. A bijection f from a set X to a set Y is a function from X to Y such that, for all y ∈ Y , there is a unique x ∈ X such that (x, y) ∈ f. For example,

f = {(0, 5.1), (1, 2.6), (2, 0.5)} is a bijection from {0, 1, 2} to {0.5, 2.6, 5.1}. We can visualize f as a one-to-one correspondence between these sets:

0 ↦ 5.1
1 ↦ 2.6
2 ↦ 0.5

A function f is an injection (or is injective) iff, for all x, y and z, if (x, z) ∈ f and (y, z) ∈ f, then x = y, i.e., for all x,y ∈ domain f, if fx = fy, then x = y. In other words, a function is injective iff it never sends two different elements of its domain to the same element of its range. For example, the function

{(0, 1), (1, 2), (2, 3), (3, 0)} is injective, but the function

{(0, 1), (1, 2), (2, 1)} is not injective (both 0 and 2 are sent to 1). We say that f is an injection from a set X to a set Y , iff f is a function from X to Y and f is injective. It is easy to see that:

• For all sets A, idA is injective.

• For all sets A, B and C, functions f from A to B, and functions g from B to C, if f and g are injective, then so is g ◦ f.

Exercise 1.1.5 Suppose A and B are sets. Show that for all f, f is a bijection from A to B iff

• f is a function from A to B;

• range f = B; and

• f is injective.

Consequently, if f is an injection from A to B, then f is a bijection from A to range f ⊆ B.

Exercise 1.1.6 Show that:

(1) For all sets A, idA is a bijection from A to A.

(2) For all sets A, B and C, bijections f from A to B, and bijections g from B to C, g ◦ f is a bijection from A to C.

(3) For all sets A and B, and bijections f from A to B, f⁻¹ is a bijection from B to A.

We say that a set X is equinumerous to a set Y (X ≅ Y ) iff there is a bijection from X to Y . By Exercise 1.1.6, we have that, for all sets A, B, C:

(1) A ≅ A;

(2) If A ≅ B ≅ C, then A ≅ C; and

(3) If A ≅ B, then B ≅ A.

We say that a set X is:

• finite iff X ≅ [1 : n], for some n ∈ N (recall that [1 : n] is all of the natural numbers that are at least 1 and no more than n, so that [1 : 0] = ∅);

• infinite iff it is not finite;

• countably infinite iff X ≅ N;

• countable iff X is either finite or countably infinite; and

• uncountable iff X is not countable.

Every set X has a size or cardinality (|X|) and we have that, for all sets X and Y , |X| = |Y | iff X ≅ Y . The sizes of finite sets are natural numbers. We have that:

• The sets ∅ and {0.5, 2.6, 5.1} are finite, and are thus also countable;

• The sets N, Z, R and P N are infinite;

• The set N is countably infinite, and is thus countable; and

• The set Z is countably infinite, and is thus countable, because of the existence of the following bijection:

··· −2 −1 0 1 2 ···
···  4  2 0 1 3 ···

(each integer in the top row is sent to the natural number below it)

• The sets R and P N are uncountable.

To prove that R and P N are uncountable, we can use an important technique called “diagonalization”, which we will see again in Chapter 5. Let’s consider the proof that P N is uncountable. We proceed using proof by contradiction. Suppose P N is countable. If we can obtain a contradiction, it will follow that P N is uncountable. Since P N is not finite, it follows that there is a bijection f from N to P N. Our plan is to define a subset X of N such that X ∉ range f, thus obtaining a contradiction, since this will show that f is not a bijection from N to P N. Consider the infinite table in which both the rows and the columns are indexed by the elements of N, listed in ascending order, and where a cell (m, n) (m is the row, n is the column) contains 1 iff m ∈ f n, and contains 0 iff m ∉ f n. Thus the nth column of this table represents the set f n of natural numbers. Figure 1.1 shows how part of this table might look, where i, j and k are sample elements of N. Because of the table’s data, we have, e.g., that i ∈ f i and j ∉ f i. To define our X ⊆ N, we work our way down the diagonal of the table, putting n into our set just when cell (n, n) of the table is 0, i.e., when n ∉ f n. This will ensure that, for all n ∈ N, f n ≠ X. With our example table:

• since i ∈ f i, but i ∉ X, we have that f i ≠ X;

• since j ∉ f j, but j ∈ X, we have that f j ≠ X; and

• since k ∈ f k, but k ∉ X, we have that f k ≠ X.

Next, we turn the above ideas into a shorter, but more opaque, proof that:

Proposition 1.1.7 P N is uncountable.

Proof. Suppose, toward a contradiction, that P N is countable. Because P N is not finite, there is a bijection f from N to P N. Define X = { n ∈ N | n ∉ f n }, so that X ∈ P N. By the definition of f, it follows that X = f n, for some n ∈ N.³ There are two cases to consider.

³Here we are doing an existential elimination, introducing n into our variable context, and introducing X = f n into our current assumptions. If we already had n in our variable context, we would have had to first rename it.

     ···  i  ···  j  ···  k  ···
 ⋮
 i        1       1       0
 ⋮
 j        0       0       1
 ⋮
 k        0       1       1
 ⋮

Figure 1.1: Example Diagonalization Table for Cardinality Proof

• Suppose n ∈ X. By the definition of X, it follows that n ∉ f n = X—contradiction.

• Suppose n ∉ X. Because X = f n, we have that n ∉ f n. Thus, since n ∈ N and n ∉ f n, it follows that n ∈ X—contradiction.

Since we obtained a contradiction in both cases, we have an overall contradiction. Thus P N is uncountable. ✷
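The diagonal construction at the heart of this proof is itself easy to express in SML, representing a set of natural numbers by its characteristic function. For any enumeration f whatsoever, the set diag f differs from f n at the argument n (a sketch of the construction, not of the full proof):

    (* a set of naturals, represented by its characteristic function *)
    type natSet = int -> bool

    (* the diagonal set: n ∈ diag f iff n ∉ f n *)
    fun diag (f : int -> natSet) : natSet = fn n => not (f n n)

    (* sample enumeration: f n is the set of multiples of n + 1 *)
    fun f n = fn m => m mod (n + 1) = 0

    (* diag f disagrees with f n at n; spot check the first few n *)
    val differs =
      List.all (fn n => diag f n <> f n n) [0, 1, 2, 3, 4]   (* true *)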

We have seen how bijections may be used to determine whether sets have the same size. But how can we compare the relative sizes of sets, i.e., say whether one set is smaller or larger than another? The answer is to make use of injective functions. We say that a set X is dominated by a set Y (X ⪯ Y ) iff there is an injection from X to Y , i.e., an injective function whose domain is X and whose range is a subset of Y . Because identity functions are injective, we have that X ⊆ Y implies X ⪯ Y ; e.g., N ⪯ R. This definition makes sense, because X is dominated by Y iff X is equinumerous to a subset of Y . We say that a set X is strictly dominated by a set Y (X ≺ Y ) iff X ⪯ Y but X ≇ Y . Using our observations about injections, we have that, for all sets A, B and C:

(1) A ⪯ A;

(2) If A ⪯ B ⪯ C, then A ⪯ C.

We can also characterize ⪯ using “surjectivity”. We say that f is a surjection from X to Y iff f is a function from X to Y and range f = Y . A consequence of the Axiom of Choice is that, for all sets X and Y , X ⪯ Y iff X = ∅ or there is a surjection from Y to X. Clearly, for all sets A and B, if A ≅ B, then A ⪯ B ⪯ A. And, a famous result of set theory, the Schröder-Bernstein Theorem, says that the converse holds, i.e., for all sets A and B, if A ⪯ B ⪯ A, then A ≅ B. This gives us a powerful method for proving that two sets have the same size.

Exercise 1.1.8 Use the Schröder-Bernstein Theorem to show that N ≅ N × N. Hint: use the following consequence of the Fundamental Theorem of Arithmetic: if two finite, ascending (each element is ≤ the next) sequences of prime numbers (natural numbers that are at least 2 and have no divisors other than 1 and themselves) have the same product (the product of the empty sequence is 1), then they are equal.

One of the forms of the Axiom of Choice says that, for all sets A and B, A ⪯ B or B ⪯ A, i.e., A is dominated by B, or B is dominated by A. The sizes of sets are ordered in such a way that, for all sets A and B: |A| ≤ |B| iff A ⪯ B; and |A| < |B| iff A ≺ B. ≤ and < on set sizes have all the properties of a total ordering and the corresponding strict total ordering, except that the collection of all set sizes is a proper class, and so ≤ and < are also proper classes. Given the above machinery, one can strengthen Proposition 1.1.7 into: for all sets X, X ≺ P X, and so |X| < |P X|.

1.1.9 Data Structures

We conclude this section by introducing some data structures that are built out of sets. We write Bool for the set of booleans, {true, false}. (We can actually let true = 1 and false = 0, although we’ll never make use of these equalities.) We define the negation function not ∈ Bool → Bool by:

not true = false, not false = true.

And the conjunction and disjunction operations on the booleans are defined by:

true and true = true, true and false = false and true = false and false = false, and

true or true = true or false = false or true = true, false or false = false. 1.1 Basic Set Theory 19

Given a set X, we write Option X for {none} ∪ { some x | x ∈ X }, where we define none = (0, 0) and some x = (1, x), which guarantees that none = some x can’t hold, and that some x = some y only holds when x = y. (We won’t make use of the particular way we’ve defined none and some x.) For example, Option Bool = {none, some true, some false}. The idea is that an element of Option X is an optional element of X. E.g., when a function needs to either return an element of X or indicate that an error has occurred, it could return an element of Option X, using none to indicate an error, and returning some x to indicate success with value x. E.g., we could define a function f ∈ N × N → Option Bool by:

f(n, m) = none, if m = 0,
f(n, m) = some true, if m ≠ 0 and n = ml for some l ∈ N,
f(n, m) = some false, if m ≠ 0 and n ≠ ml for all l ∈ N.

Finally, we consider lists. A list is a function with domain [1 : n], for some n ∈ N. (Recall that [1 : n] is all of the natural numbers that are at least 1 and no more than n.) For example, ∅ is a list, as it is a function with domain ∅ = [1 : 0]. And {(1, 3), (2, 5), (3, 7)} is a list, as it is a function with domain [1 : 3]. Note that, if x is a list, then |x|, the size of the set x, doubles as the length of x. We abbreviate a list {(1, x₁), (2, x₂), ..., (n, xₙ)} to [x₁, x₂, ..., xₙ]. Thus ∅ and {(1, 3), (2, 5), (3, 7)} are abbreviated to [ ] and [3, 5, 7], respectively. If [x₁, x₂, ..., xₙ] = [y₁, y₂, ..., yₘ], it is easy to see that n = m and xᵢ = yᵢ, for all i ∈ [1 : n]. Given lists f and g, the concatenation of f and g (f @ g) is the list

f ∪ { (m + |f|, y) | (m, y) ∈ g }.

For example,

[2, 3] @ [4, 5, 6]
= {(1, 2), (2, 3)} @ {(1, 4), (2, 5), (3, 6)}
= {(1, 2), (2, 3)} ∪ { (m + 2, y) | (m, y) ∈ {(1, 4), (2, 5), (3, 6)} }
= {(1, 2), (2, 3)} ∪ {(1 + 2, 4), (2 + 2, 5), (3 + 2, 6)}
= {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)}
= [2, 3, 4, 5, 6].

Given lists f and g, it is easy to see that |f @ g| = |f| + |g|. And, given n ∈ [1 : |f @ g|], we have that

(f @ g) n = f n, if n ∈ [1 : |f|],
(f @ g) n = g(n − |f|), if n > |f|.

Using this fact, it is easy to prove that:

• [ ] is the identity for concatenation: for all lists f, [ ] @ f = f = f @ [ ].

• Concatenation is associative: for all lists f, g, h,

(f @ g)@ h = f @ (g @ h).

Because concatenation is associative, we can write f @ g @ h without worrying where the parentheses go. Given a set X, we write List X for the set of all X-lists, i.e., lists whose ranges are subsets of X, i.e., all of whose elements come from X. E.g., [ ] and [3, 5, 7] are elements of List N, the set of all lists of natural numbers. It is easy to see that [ ] ∈ List X, for all sets X, and that, for all sets X and f, g ∈ List X, f @ g ∈ List X.
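SML provides these data structures natively—bool with not, andalso and orelse; 'a option with NONE and SOME; and 'a list with @ for concatenation—which is one reason Forlan can stay notationally close to the book. A short sketch mirroring the definitions above (the function f is our SML rendering of the book’s Option-valued example, testing whether m divides n):

    (* the book's f ∈ N × N → Option Bool *)
    fun f (n, m) : bool option =
      if m = 0 then NONE else SOME (n mod m = 0)

    val r1 = f (6, 3)   (* SOME true:  6 = 3 · 2 *)
    val r2 = f (7, 3)   (* SOME false *)
    val r3 = f (7, 0)   (* NONE *)

    (* concatenation is primitive, and [] is its identity *)
    val xs = [2, 3] @ [4, 5, 6]                      (* [2, 3, 4, 5, 6] *)
    val ok = ([] @ xs = xs) andalso (xs @ [] = xs)   (* true *)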

1.1.10 Notes

In set theory (see, e.g., [End77]), the natural numbers, integers, real numbers and ordered pairs (x, y) are encoded as sets. But for our purposes, it is clearer to suppress this detail. Furthermore, in set theory, N is not a subset of Z, and Z is not a subset of R. On the other hand, there are proper subsets of R corresponding to the natural numbers and the integers, and these are what we take N and Z to be, so that N ⊊ Z ⊊ R. Thus, for us, the sizes of finite sets are also elements of the subset of R that we treat as the natural numbers.

1.2 Induction

In this section, we consider several induction principles, i.e., methods for proving that every element x of some set A has a property P (x).

1.2.1 Mathematical Induction

We begin with the familiar principle of mathematical induction, which is a basic result of set theory.

Theorem 1.2.1 (Principle of Mathematical Induction) Suppose P (n) is a property of a natural number n. If

(basis step)

P (0) and

(inductive step)

for all n ∈ N, if (†) P (n), then P (n + 1),

then,

for all n ∈ N, P (n).

We refer to the formula (†) as the inductive hypothesis. To use the principle to show that every natural number has property P, we carry out two steps. In the basis step, we show that 0 has property P. In the inductive step, we assume that n is a natural number with property P. We then show that n + 1 has property P, without making any more assumptions about n. Next we give an example of a mathematical induction.

Proposition 1.2.2 For all n ∈ N, 3n2 + 3n + 6 is divisible by 6.

Proof. We proceed by mathematical induction.

(Basis Step) We have that 3 · 0² + 3 · 0 + 6 = 0 + 0 + 6 = 6 = 6 · 1. Thus 3 · 0² + 3 · 0 + 6 is divisible by 6.

(Inductive Step) Suppose n ∈ N, and assume the inductive hypothesis: 3n² + 3n + 6 is divisible by 6. Hence 3n² + 3n + 6 = 6m, for some m ∈ N. Thus, we have that

3(n + 1)² + 3(n + 1) + 6 = 3(n² + 2n + 1) + 3n + 3 + 6
                         = 3n² + 6n + 3 + 3n + 3 + 6
                         = (3n² + 3n + 6) + (6n + 6)
                         = 6m + 6(n + 1)
                         = 6(m + (n + 1)),

showing that 3(n + 1)² + 3(n + 1) + 6 is divisible by 6. ✷

Exercise 1.2.3 Use Proposition 1.2.2 to prove, by mathematical induction, that, for all n ∈ N, n(n² + 5) is divisible by 6.
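In keeping with the book's theme of combining experimentation with proof, such facts are easy to test in SML before (or after) proving them. This check is our own code, not part of Forlan:

(* test that 3n^2 + 3n + 6 is divisible by 6, for n = 0, 1, ..., 99 *)
fun ok n = (3 * n * n + 3 * n + 6) mod 6 = 0
val allOK = List.all ok (List.tabulate (100, fn i => i))   (* true *)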

1.2.2 Strong Induction Next, we consider the principle of strong induction.

Theorem 1.2.4 (Principle of Strong Induction) Suppose P (n) is a property of a natural number n. If

for all n ∈ N, if (†) for all m ∈ N, if m < n, then P (m), then P (n),

then,

for all n ∈ N, P (n).

We refer to the formula (†) as the inductive hypothesis. To use the principle to show that every natural number has property P, we assume that n is a natural number, and that every natural number that is strictly smaller than n has property P. We then show that n has property P, without making any more assumptions about n. It turns out that we can use mathematical induction to prove the validity of the principle of strong induction, by using a property Q(n) derived from P (n).

Proof. Suppose P (n) is a property, and assume

(‡) for all n ∈ N, if for all m ∈ N, if m < n, then P (m), then P (n).

Let the property Q(n) be

for all m ∈ N, if m < n, then P (m).

First, we use mathematical induction to show that, for all n ∈ N, Q(n).

(Basis Step) We must show Q(0). Suppose m ∈ N and m < 0. We must show that P (m). Since m< 0 is a contradiction, we are allowed to conclude anything. So, we conclude P (m).

(Inductive Step) Suppose n ∈ N, and assume the inductive hypothesis: Q(n). We must show that Q(n + 1). Suppose m ∈ N and m < n + 1. We must show that P (m). There are two cases to consider.

• Suppose m < n. Then P (m) follows from the inductive hypothesis, Q(n).

• Suppose m = n. By (‡) and the inductive hypothesis, Q(n), we have that P (n), i.e., P (m).

Now, we use the result of our mathematical induction to show that, for all n ∈ N, P (n). Suppose n ∈ N. By our mathematical induction, we have Q(n). By Property (‡), it will suffice to show that

for all m ∈ N, if m < n, then P (m).

But this formula is exactly Q(n), and so we are done. ✷ 1.2 Induction 23

As an example use of the principle of strong induction, we will prove a proposition that we would normally take for granted:

Proposition 1.2.5 Every nonempty set of natural numbers has a least element.

Proof. Let X be a nonempty set of natural numbers. We begin by using strong induction to show that, for all n ∈ N,

if n ∈ X, then X has a least element.

Suppose n ∈ N, and assume the inductive hypothesis: for all m ∈ N, if m < n, then

if m ∈ X, then X has a least element.

We must show that

if n ∈ X, then X has a least element.

Suppose n ∈ X. It remains to show that X has a least element. If n is less-than-or-equal-to every element of X, then we are done. Otherwise, there is an m ∈ X such that m < n. By the inductive hypothesis, we have that

if m ∈ X, then X has a least element.

But m ∈ X, and thus X has a least element. This completes our strong induction.

Now we use the result of our strong induction to prove that X has a least element. Since X is a nonempty subset of N, there is an n ∈ N such that n ∈ X. By the result of our induction, we can conclude that

if n ∈ X, then X has a least element.

But n ∈ X, and thus X has a least element. ✷

We conclude this subsection with one more proof using strong induction. Recall that a natural number is prime iff it is at least 2 and has no divisors other than 1 and itself.

Proposition 1.2.6 For all n ∈ N,

if n ≥ 2, then there are m, l ∈ N such that n = ml and m is prime.

Proof. We proceed by strong induction. Suppose n ∈ N, and assume the inductive hypothesis: for all i ∈ N, if i < n, then

if i ≥ 2, then there are m, l ∈ N such that i = ml and m is prime.

We must show that

if n ≥ 2, then there are m, l ∈ N such that n = ml and m is prime.

Suppose n ≥ 2. We must show that

there are m, l ∈ N such that n = ml and m is prime.

There are two cases to consider.

• Suppose n is prime. Then n, 1 ∈ N, n = n · 1 and n is prime.

• Suppose n is not prime. Since n ≥ 2, it follows that n = ij for some i, j ∈ N such that 1 < i < n and 1 < j < n. Since i < n, the inductive hypothesis tells us that

if i ≥ 2, then there are m, l ∈ N such that i = ml and m is prime.

But i ≥ 2, and thus there are m, l ∈ N such that i = ml and m is prime. Thus m, lj ∈ N, n = ij = (ml)j = m(lj) and m is prime. ✷
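Proposition 1.2.6 can likewise be explored experimentally. The following SML sketch (our own helper functions, not part of Forlan; it assumes n ≥ 2) finds a prime m and an l with n = ml, by searching for the least divisor of n that is at least 2; any such least divisor is necessarily prime:

(* least divisor of n that is at least 2; a proper divisor of the result
   would also divide n and be smaller, so the result is prime *)
fun leastPrimeFactor n =
  let fun try m = if n mod m = 0 then m else try (m + 1)
  in try 2 end

fun factor n = let val m = leastPrimeFactor n in (m, n div m) end
(* e.g., factor 91 = (7, 13), since 91 = 7 * 13 *)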

Exercise 1.2.7 Use strong induction to prove that, for all n ∈ N, if n ≥ 1, then there are i, j ∈ N such that n = 2^i (2j + 1).

1.2.3 Well-founded Induction

We can also do induction over a well-founded relation. A relation R on a set A is well-founded iff every nonempty subset X of A has an R-minimal element, where an element x ∈ X is R-minimal in X iff there is no y ∈ X such that y R x. Given a relation R on a set A, and x, y ∈ A, we say that y is a predecessor of x in R iff y R x. Thus x ∈ X is R-minimal in X iff none of x's predecessors in R (there may be none) are in X.

For example, in Proposition 1.2.5, we proved that the strict total ordering < on N is well-founded. On the other hand, the strict total ordering < on Z is not well-founded, as Z itself lacks a <-minimal element.

Here's another negative example, showing that even if the underlying set is finite, the relation need not be well-founded. Let A = {0, 1}, and R = {(0, 1), (1, 0)}. Then 0 is the only predecessor of 1 in R, and 1 is the only predecessor of 0 in R. Of the nonempty subsets of A, we have that {0} and {1} have R-minimal elements. But consider A itself. Then 0 is not R-minimal in A, because 1 ∈ A and 1 R 0. And 1 is not R-minimal in A, because 0 ∈ A and 0 R 1. Hence R is not well-founded.

Theorem 1.2.8 (Principle of Well-founded Induction) Suppose A is a set, R is a well-founded relation on A, and P (x) is a property of an element x ∈ A. If

for all x ∈ A, if (†) for all y ∈ A, if y R x, then P (y), then P (x),

then

for all x ∈ A, P (x).

We refer to the formula (†) as the inductive hypothesis. When A = N and R = <, this is the same as the principle of strong induction. But it’s much more generally applicable than strong induction. Furthermore, the proof of this theorem isn’t by induction.

Proof. Suppose A is a set, R is a well-founded relation on A, P (x) is a property of an element x ∈ A, and

(‡) for all x ∈ A, if for all y ∈ A, if y R x, then P (y), then P (x).

We must show that, for all x ∈ A, P (x). Suppose, toward a contradiction, that it is not the case that, for all x ∈ A, P (x). Hence there is an x ∈ A such that P (x) is false. Let X = { x ∈ A | P (x) is false }. Thus x ∈ X, showing that X is non-empty. Because R is well-founded on A, it follows that there is a z ∈ X that is R-minimal in X, i.e., such that there is no y ∈ X such that y R z. By (‡) and since z ∈ X ⊆ A, we have that if for all y ∈ A, if y R z, then P (y), then P (z). Because z ∈ X, we have that P (z) is false. Thus, to obtain a contradiction, it will suffice to show that

for all y ∈ A, if y R z, then P (y).

Suppose y ∈ A, and y R z. We must show that P (y). Because z is R-minimal in X, it follows that y ∉ X. Thus P (y). Because we obtained our contradiction, we have that, for all x ∈ A, P (x), as required. ✷

We conclude this subsection by considering three ways of building well- founded relations. The first is by taking a subset of a well-founded relation:

Proposition 1.2.9 Suppose R is a well-founded relation on a set A. If S ⊆ R, then S is also a well-founded relation on A.

Proof. Suppose R is a well-founded relation on A, and S ⊆ R. Let X be a nonempty subset of A. Let x ∈ X be R-minimal in X. Suppose, toward a contradiction, that x is not S-minimal in X. Thus there is a y ∈ X such that ySx. But S ⊆ R, and thus yRx—contradiction. Thus x is S-minimal in X, as required. ✷

Let the predecessor relation predN on N be { (n, n + 1) | n ∈ N }. Thus, for all n, m ∈ N, m predN n iff m is exactly one less than n. Because predN ⊆ <, and < is well-founded on N, Proposition 1.2.9 tells us that predN is well-founded on N. 0 has no predecessors in predN, and, for all n ∈ N, n is the only predecessor of n + 1 in predN. Consequently, if a zero/non-zero case analysis is used, a proof by well-founded induction on predN will look like a proof by mathematical induction.

Suppose A and B are sets, S is a relation on B, and f ∈ A → B. Then the inverse image of the relation S under f, f⁻¹(S), is the relation R on A defined by: for all x, y ∈ A, x R y iff f x S f y.

Proposition 1.2.10 Suppose A and B are sets, S is a well-founded relation on B, and f ∈ A → B. Then f⁻¹(S) is a well-founded relation on A.

Proof. Let R = f⁻¹(S). To see that R is well-founded on A, suppose X is a nonempty subset of A. We must show that there is an R-minimal element of X. Let Y = f(X) = { f x | x ∈ X }. Thus Y is a nonempty subset of B. Because S is well-founded on B, it follows that there is an S-minimal element y of Y. Hence y = f x for some x ∈ X. Suppose, toward a contradiction, that x is not R-minimal in X. Thus there is an x′ ∈ X such that x′ R x. Hence f x′ ∈ Y and f x′ S f x = y, contradicting the S-minimality of y in Y. Thus x is R-minimal in X. ✷

For example, let R be the relation on Z such that, for all n, m ∈ Z, n R m iff |n| < |m| (where we're writing | · | for the absolute value of an integer). Since < is well-founded on N, Proposition 1.2.10 tells us that R is well-founded on Z. If we do a well-founded induction on R, when proving P (n), for n ∈ Z, we can make use of P (m) for any m ∈ Z whose absolute value is strictly less than the absolute value of n. E.g., when proving P (−10), we could make use of P (5) or P (−9).

If R and S are relations on sets A and B, respectively, then the lexicographic relation of R and then S, R ⊲ S, is the relation on A × B defined by: (x, y) R ⊲ S (x′, y′) iff

• x R x′, or

• x = x′ and y S y′.

Proposition 1.2.11 Suppose R and S are well-founded relations on sets A and B, respectively. Then R ⊲ S is a well-founded relation on A × B.

Proof. Suppose T is a nonempty subset of A × B. We must show that there is an R ⊲ S-minimal element of T. By our assumption, T is a relation from A to B. Because T is nonempty, it follows that domain T is a nonempty subset of A. Since R is a well-founded relation on A, it follows that there is an R-minimal x ∈ domain T. Let Y = { y ∈ B | (x, y) ∈ T }. Because Y is a nonempty subset of B, and S is well-founded on B, there exists an S-minimal y ∈ Y. Thus (x, y) ∈ T. Suppose, toward a contradiction, that (x, y) is not R ⊲ S-minimal in T. Thus there are x′ ∈ A and y′ ∈ B such that (x′, y′) ∈ T and (x′, y′) R ⊲ S (x, y). Hence, there are two cases to consider.

• Suppose x′ R x. Because x′ ∈ domain T, this contradicts the R-minimality of x in domain T.

• Suppose x′ = x and y′ S y. Thus (x, y′) = (x′, y′) ∈ T, so that y′ ∈ Y. But this contradicts the S-minimality of y in Y.

Because we obtained a contradiction in both cases, we have an overall contradiction. Thus (x, y) is R ⊲ S-minimal in T. ✷

For example, if we consider the strict total ordering < on N, then < ⊲ < is a well-founded relation on N × N. If we do a well-founded induction on < ⊲ <, when proving that P ((x, y)) holds, we can use P ((x′, y′)) whenever x′ < x, or x = x′ but y′ < y.
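A classic illustration of recursion justified by < ⊲ < on N × N is Ackermann's function: in each recursive call, either the first component decreases, or it stays the same and the second decreases. Here is a standard SML rendering (our own code, not Forlan's):

(* each call strictly decreases (m, n) in the lexicographic order *)
fun ack (0, n) = n + 1
  | ack (m, 0) = ack (m - 1, 1)
  | ack (m, n) = ack (m - 1, ack (m, n - 1))

val v = ack (2, 3)   (* 9 *)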

1.2.4 Notes

A typical book on formal language theory doesn't introduce well-founded relations and induction. But this material, and our treatment of well-founded recursion in the next section, will prove to be useful in subsequent chapters.

1.3 Inductive Definitions and Recursion

In this section, we will introduce and study ordered trees of arbitrary (finite) arity, whose nodes are labeled by elements of some set. In later chapters, we will define regular expressions (in Chapter 3), parse trees (in Chapter 4) and programs (in Chapter 5) as restrictions of the trees we consider here. The definition of the set of all trees over a set of labels is our first example of an inductive definition—a definition in which we collect together all of the values that can be constructed using a set of rules. We will see many examples of inductive definitions in the book. In this section, we will also see how to define functions by recursion.

1.3.1 Inductive Definition of Trees

Suppose X is a set. The set Tree X of X-trees is the least set such that, for all x ∈ X and trs ∈ List(Tree X), (x, trs) ∈ Tree X. Recall that saying trs ∈ List(Tree X) simply means that trs is a list all of whose elements come from Tree X.

Ignoring the adjective “least” for the moment, some example elements of Tree N (the case when the set X of tree labels is N) are:

• Since 3 ∈ N and [ ] ∈ List(Tree N), we have that (3, [ ]) ∈ Tree N. For similar reasons, (1, [ ]), (6, [ ]) and all pairs of the form (n, [ ]), for n ∈ N, are in Tree N.

• Because 4 ∈ N, and [(3, [ ]), (1, [ ]), (6, [ ])] ∈ List(Tree N), we have that (4, [(3, [ ]), (1, [ ]), (6, [ ])]) ∈ Tree N.

• And we can continue like this forever.

Trees are often easier to comprehend if they are drawn. We draw the X-tree (x, [tr 1, ..., tr n]) as

x

tr 1 ··· tr n

For example,

4

3 1 6

is the drawing of the N-tree (4, [(3, [ ]), (1, [ ]), (6, [ ])]). And

2

4 9

3 1 6

is the N-tree (2, [(4, [(3, [ ]), (1, [ ]), (6, [ ])]), (9, [ ])]). Consider the tree

x

tr 1 ··· tr n

again. We say that the root label of this tree is x, and tr 1 is the tree's first child, etc. We write rootLabel tr for the root label of tr. We often write a tree (x, [tr 1, ..., tr n]) in a more compact, linear syntax:

• x(tr 1,..., tr n), when n ≥ 1, and

• x, when n = 0.

Thus (2, [(4, [(3, [ ]), (1, [ ]), (6, [ ])]), (9, [ ])]) can be written as 2(4(3, 1, 6), 9).

Consider the definition of Tree X again: the set Tree X of X-trees is the least set such that,

(†) for all x ∈ X and trs ∈ List(Tree X), (x, trs) ∈ Tree X.

Let's call a set U X-closed iff it satisfies property (†), where we have replaced Tree X by U: for all x ∈ X and trs ∈ List U, (x, trs) ∈ U. Thus the definition says that Tree X is the least X-closed set, where we've yet to say just what “least” means.

First we address the concern that there might not be any X-closed sets. An easy result of set theory says that:

Lemma 1.3.1 For all sets X, there is a set U such that X ⊆ U, U × U ⊆ U and List U ⊆ U.

In other words, the lemma says that, given any set X, there exists a superset U of X such that every pair of elements of U is already an element of U, and every list of elements of U is already an element of U. Now, suppose X is a set, and let U be as in the lemma. To see that U is X-closed, suppose x ∈ X and trs ∈ List U. Thus x ∈ U and trs ∈ U, so that (x, trs) ∈ U, as required.

E.g., we know that there is a Z-closed set U. But N ⊆ Z, and thus U is also N-closed. But if Tree N turned out to be U, it would have elements like (−5, [ ]), which are not wanted. To keep Tree X from having junk, we say that Tree X is the set U such that:

• U is X-closed, and

• U is X-closed, and 30 Mathematical Background

• for all X-closed sets V , U ⊆ V .

This is what we mean by saying that Tree X is the least X-closed set. It is our first example of an inductive definition, the least (relative to ⊆) set satisfying a given set of rules saying that if some elements are already in the set, then some other elements are also in the set.

To see that there is a unique least X-closed set, we first prove the following lemma.

Lemma 1.3.2 Suppose X is a set and 𝒲 is a nonempty set of X-closed sets. Then ∩𝒲 is an X-closed set.

Proof. Suppose X is a set and 𝒲 is a nonempty set of X-closed sets. Because 𝒲 is nonempty, ∩𝒲 is well-defined. To see that ∩𝒲 is X-closed, suppose x ∈ X and trs ∈ List(∩𝒲). We must show that (x, trs) ∈ ∩𝒲, i.e., that (x, trs) ∈ W, for all W ∈ 𝒲. Suppose W ∈ 𝒲. We must show that (x, trs) ∈ W. Because W ∈ 𝒲, we have that ∩𝒲 ⊆ W, so that List(∩𝒲) ⊆ List W. Thus, since trs ∈ List(∩𝒲), it follows that trs ∈ List W. But W is X-closed, and thus (x, trs) ∈ W, as required. ✷

As explained above, we have that there is an X-closed set, V. Let 𝒲 be the set of all subsets of V that are X-closed. Thus 𝒲 is a nonempty set of X-closed sets, since V ∈ 𝒲. Let U = ∩𝒲. By Lemma 1.3.2, we have that U is X-closed. To see that U ⊆ T for all X-closed sets T, suppose T is X-closed. By Lemma 1.3.2, we have that V ∩ T = ∩{V, T} is X-closed. And V ∩ T ⊆ V, so that V ∩ T ∈ 𝒲. Hence U = ∩𝒲 ⊆ V ∩ T ⊆ T, as required. Finally, suppose that U′ is also an X-closed set such that, for all X-closed sets T, U′ ⊆ T. Then U ⊆ U′ ⊆ U, showing that U′ = U. Thus U is the least X-closed set.

Suppose X is a set, x, x′ ∈ X and trs, trs′ ∈ List(Tree X). It is easy to see that (x, trs) = (x′, trs′) iff x = x′, |trs| = |trs′| and, for all i ∈ [1 : |trs|], trs i = trs′ i.

Because trees are defined via an inductive definition, we get an induction principle for trees almost for free:

Theorem 1.3.3 (Principle of Induction on Trees) Suppose X is a set and P (tr) is a property of an element tr ∈ Tree X. If

for all x ∈ X and trs ∈ List(Tree X), if (†) for all i ∈ [1 : |trs|], P (trs i), then P ((x, trs)), then

for all tr ∈ Tree X, P (tr).

We refer to (†) as the inductive hypothesis.

Proof. Suppose X is a set, P (tr) is a property of an element tr ∈ Tree X, and (‡) for all x ∈ X and trs ∈ List(Tree X), if for all i ∈ [1 : |trs|], P (trs i), then P ((x, trs)). We must show that

for all tr ∈ Tree X, P (tr).

Let U = { tr ∈ Tree X | P (tr) }. We will show that U is X-closed. Suppose x ∈ X and trs ∈ List U. We must show that (x, trs) ∈ U. It will suffice to show that P ((x, trs)). By (‡), it will suffice to show that, for all i ∈ [1 : |trs|], P (trs i). Suppose i ∈ [1 : |trs|]. We must show that P (trs i). Because trs ∈ List U, we have that trs i ∈ U. Hence P (trs i), as required. Because U is X-closed, we have that Tree X ⊆ U, as Tree X is the least X-closed set. Hence, for all tr ∈ Tree X, tr ∈ U, so that, for all tr ∈ Tree X, P (tr). ✷

Using our induction principle, we can now prove that every X-tree can be “destructed” into an element of X paired with a list of X-trees: Proposition 1.3.4 Suppose X is a set. For all tr ∈ Tree X, there are x ∈ X and trs ∈ List(Tree X) such that tr = (x, trs).

Proof. Suppose X is a set. We use induction on trees to prove that, for all tr ∈ Tree X, there are x ∈ X and trs ∈ List(Tree X) such that tr = (x, trs). Suppose x ∈ X, trs ∈ List(Tree X), and assume the inductive hypothesis: for all i ∈ [1 : |trs|], there are x′ ∈ X and trs ′ ∈ List(Tree X) such that trs i = (x′, trs ′). We must show that there are x′ ∈ X and trs′ ∈ List(Tree X) such that (x, trs) = (x′, trs′). And this holds, since x ∈ X, trs ∈ List(Tree X) and (x, trs) = (x, trs). ✷

Note that the preceding induction makes no use of its inductive hypothesis, and yet the induction is still necessary.

Suppose X is a set. Let the predecessor relation predTree X on Tree X be the set of all pairs of X-trees (tr, tr′) such that there are x ∈ X and trs′ ∈ List(Tree X) such that tr′ = (x, trs′) and trs′ i = tr for some i ∈ [1 : |trs′|], i.e., such that tr is one of the children of tr′. Thus the predecessors of a tree (x, [tr 1, ..., tr n]) are its children tr 1, ..., tr n.

Proposition 1.3.5 If X is a set, then predTree X is a well-founded relation on Tree X.

Proof. Suppose X is a set and Y is a nonempty subset of Tree X. Mimicking Proposition 1.2.5, we can use the principle of induction on trees to prove that, for all tr ∈ Tree X, if tr ∈ Y, then Y has a predTree X-minimal element. Because Y is nonempty, we can conclude that Y has a predTree X-minimal element. ✷

Exercise 1.3.6 Do the induction on trees used by the preceding proof.

1.3.2 Recursion

Suppose R is a well-founded relation on a set A. We can define a function f from A to a set B by well-founded recursion on R. The idea is simple: when f is called with an element x ∈ A, it may call itself recursively on as many of the predecessors of x in R as it wants. Typically, such a definition will be concrete enough that we can regard it as defining an algorithm as well as a function.

If R is a well-founded relation on a set A, and B is a set, then we write RecA,R,B for

{ (x,f) | x ∈ A and f ∈ { y ∈ A | yRx }→ B }→ B.

An element F of RecA,R,B may only be called with a pair (x, f) such that x ∈ A and f is a function from the predecessors of x in R to B. Intuitively, F's job is to transform x into an element of B, using f to carry out recursive calls.

Theorem 1.3.7 (Well-Founded Recursion) Suppose R is a well-founded relation on a set A, B is a set, and F ∈ RecA,R,B. There is a unique f ∈ A → B such that, for all x ∈ A,

fx = F (x,f|{ y ∈ A | yRx }).

The second argument to F in the definition of f is the restriction of f to the predecessors of x in R, i.e., it's the subset of f whose domain is { y ∈ A | yRx }. If we can understand F as an algorithm, then we can understand the definition of f as an algorithm. If we call f with an x ∈ A, then F may return an element of B without consulting its second argument. Alternatively, it may call this second argument with a predecessor of x in R. This is a recursive call of f, which must complete before F continues. Once it does complete, F may make more recursive calls, but must eventually return an element of B.

Proof. We start with an inductive definition: let f be the least subset of A×B such that, for all x ∈ A and g ∈ { y ∈ A | yRx }→ B,

if g ⊆ f, then (x, F (x, g)) ∈ f.

We say that a subset U of A × B is closed iff, for all x ∈ A and g ∈ { y ∈ A | yRx }→ B,

if g ⊆ U, then (x, F (x, g)) ∈ U.

Thus, we are saying that f is the least closed subset of A × B. This definition is well-defined because A × B is closed, and if 𝒲 is a nonempty set of closed subsets of A × B, then ∩𝒲 is also closed. Thus we can let f be the intersection of all closed subsets of A × B. Thus f is a relation from A to B.

An easy well-founded induction on R suffices to show that, for all x ∈ A, x ∈ domain f. Suppose x ∈ A, and assume the inductive hypothesis: for all y ∈ A, if y R x, then y ∈ domain f. We must show that x ∈ domain f. By the inductive hypothesis, we have that for all y ∈ { y ∈ A | y R x }, there is a z ∈ B such that (y, z) ∈ f. Thus there is a subset g of f such that g ∈ { y ∈ A | y R x } → B. (Since we don't know at this point that f is a function, we are using the Axiom of Choice in this last step.) Hence (x, F (x, g)) ∈ f, showing that x ∈ domain f. Thus domain f = A.

Next we show that f is a function. Let h be

{ (x,y) ∈ f | for all y′ ∈ B, if (x,y′) ∈ f, then y = y′ }.

If we can show that h is closed, then we will have that f ⊆ h, because f is the least closed set, and thus we’ll be able to conclude that f is a function. To show that h is closed, suppose x ∈ A, g ∈ { y ∈ A | yRx }→ B and g ⊆ h. We must show that (x, F (x, g)) ∈ h. Because g ⊆ h ⊆ f and f is closed, we have that (x, F (x, g)) ∈ f. It remains to show that, for all y′ ∈ B, if (x,y′) ∈ f, then F (x, g) = y′. Suppose y′ ∈ B and (x,y′) ∈ f. We must show that F (x, g)= y′. Because (x,y′) ∈ f, and f is the least closed subset of A × B, there must be a g′ ∈ { y ∈ A | yRx }→B such that g′ ⊆ f and y′ = F (x, g′). Thus it will suffice to show that F (x, g)= F (x, g′), which will follow from showing that g = g′, i.e., for all z ∈ { y ∈ A | yRx }, g z = g′ z. Suppose z ∈ { y ∈ A | yRx }. We must show that g z = g′ z. Since z ∈ { y ∈ A | yRx }, we have that z ∈ A and zRx. Because (z,gz) ∈ g ⊆ h, we have that (z,gz) ∈ h. Since (z, g′ z) ∈ g′ ⊆ f, we have that (z, g′ z) ∈ f. Hence, by the definition of h, we have that g z = g′ z, as required. Summarizing, we know that f ∈ A → B. Next, we must show that, for all x ∈ A,

fx = F (x,f|{ y ∈ A | yRx }).

Suppose x ∈ A. Because f|{ y ∈ A | yRx } ∈ { y ∈ A | yRx }→ B, we have that (x, F (x,f|{ y ∈ A | yRx })) ∈ f, so that fx = F (x,f|{ y ∈ A | yRx }). Finally, suppose that f ′ ∈ A → B and for all x ∈ A,

f ′ x = F (x, f ′|{ y ∈ A | yRx }).

To see that f = f ′, it will suffice to show that, for all x ∈ A, fx = f ′ x. We proceed by well-founded induction on R. Suppose x ∈ A and assume the inductive hypothesis: for all y ∈ A, if yRx, then fy = f ′ y. We must show that fx = f ′ x. By the inductive hypothesis, we have that f|{ y ∈ A | yRx } = f ′|{ y ∈ A | yRx }. Thus

fx = F (x,f|{ y ∈ A | yRx }) = F (x,f ′|{ y ∈ A | yRx }) = f ′ x, as required. ✷

Here are some examples of well-founded recursion:

• If we define f ∈ N → B by well-founded recursion on <, then, when f is called with n ∈ N, it may call itself recursively on any strictly smaller natural numbers. In the case n = 0, it can’t make any recursive calls.

• If we define f ∈ N → B by well-founded recursion on the predecessor relation predN, then when f is called with n ∈ N, it may call itself recursively on n − 1, in the case when n ≥ 1, and may make no recursive calls, when n = 0. Thus, if such a definition case-splits according to whether its input is 0 or not, it can be split into two parts:

– f 0 = · · · ;

– for all n ∈ N, f(n + 1) = · · · f n · · · .

We say that such a definition is by recursion on N.

• If we define f ∈ Tree X → B by well-founded recursion on the predecessor relation predTree X , then when f is called on an X-tree (x, [tr 1,..., tr n]), it may call itself recursively on any of tr 1, ..., tr n. When n = 0, it may make no recursive calls. We say that such a definition is by structural recursion.

• We may define the size of an X-tree (x, [tr 1,..., tr n]) by summing the recursively computed sizes of tr 1, ..., tr n, and then adding 1. (When n = 0, the sum of no sizes is 0, and so we get 1.) Then, e.g., the size of

2

4 9

3 1 6

is 6. This defines a function size ∈ Tree X → N.

• We may define the number of leaves of an X-tree (x, [tr 1,..., tr n]) as – 1, when n = 0, and

– the sum of the recursively computed numbers of leaves of tr 1, ..., tr n, when n ≥ 1. Then, e.g., the number of leaves of

2

4 9

3 1 6

is 4. This defines a function numLeaves ∈ Tree X → N.

• We may define the height of an X-tree (x, [tr 1,..., tr n]) as – 0, when n = 0, and

– 1 plus the maximum of the recursively computed heights of tr 1, ..., tr n, when n ≥ 1. E.g., the height of

2

4 9

3 1 6

is 2. This defines a function height ∈ Tree X → N. (An SML transcription of size, numLeaves and height appears after this list.)

• Given a set X, we can define a well-founded relation sizeTree X on Tree X by: for all tr, tr′ ∈ Tree X, tr sizeTree X tr′ iff size tr < size tr′. (This is an application of Proposition 1.2.10.) If we define a function f ∈ Tree X → B by well-founded recursion on sizeTree X, when f is called with an X-tree tr, it may call itself recursively on any X-trees with strictly smaller sizes.

• Given a set X, we can define a well-founded relation heightTree X on Tree X by: for all tr, tr′ ∈ Tree X, tr heightTree X tr′ iff height tr < height tr′. If we define a function f ∈ Tree X → B by well-founded recursion on heightTree X, when f is called with an X-tree tr, it may call itself recursively on any X-trees with strictly smaller heights.

• Given a set X, we can define a well-founded relation lengthList X on List X by: for all xs, ys ∈ List X, xs lengthList X ys iff |xs| < |ys|. If we define a function f ∈ List X → B by well-founded recursion on lengthList X , when f is called with an X-list xs, it may call itself recur- sively on any X-lists with strictly smaller lengths.
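These recursion patterns are easy to transcribe into SML. The sketch below (our own datatype and functions, not Forlan's representation of trees) gives recursion on N via the predecessor relation, and the structurally recursive size, numLeaves and height functions promised above:

(* recursion on N, following pred_N: one case for 0, one for n + 1 *)
fun fact 0 = 1
  | fact n = n * fact (n - 1)

(* an X-tree is a label paired with a list of children *)
datatype 'a tree = Node of 'a * 'a tree list

fun size (Node (_, trs)) = 1 + foldl op+ 0 (map size trs)

fun numLeaves (Node (_, []))  = 1
  | numLeaves (Node (_, trs)) = foldl op+ 0 (map numLeaves trs)

fun height (Node (_, []))  = 0
  | height (Node (_, trs)) = 1 + foldl Int.max 0 (map height trs)

(* the example tree 2(4(3, 1, 6), 9) *)
val tr = Node (2, [Node (4, [Node (3, []), Node (1, []), Node (6, [])]),
                   Node (9, [])])
(* size tr = 6, numLeaves tr = 4, height tr = 2 *)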

1.3.3 Paths in Trees

We can think of an N − {0}-list [n1, n2, ..., nm] as a path through an X-tree tr: one starts with tr itself, goes to the n1-th child of tr, selects the n2-th child of that tree, etc., stopping when the list is exhausted. Consider the N-tree

2

4 9

3 1 6

Then: • [ ] takes us to the whole tree.

• [1] takes us to the tree 4(3, 1, 6).

• [1, 3] takes us to the tree 6.

• [1, 4] takes us to no tree.

We define the valid paths of an X-tree via structural recursion. For a set X, we define validPathsX ∈ Tree X → P(List(N − {0})) (we often drop the subscript X, when it's clear from the context) by: for all x ∈ X and trs ∈ List(Tree X), validPaths(x, trs) is

{[ ]} ∪ { [i]@ xs | i ∈ [1 : |trs|] and xs ∈ validPaths(trs i) }.

We say that xs ∈ List(N − {0}) is a valid path for an X-tree tr iff xs ∈ validPaths tr. For example, validPaths(3(4, 1(7), 6)) = {[ ], [1], [2], [2, 1], [3]}. Thus, e.g., [2, 1] is a valid path for 3(4, 1(7), 6), whereas [2, 2] and [4] are not valid paths for this tree. Now, we define a function select that takes in an X-tree tr and a valid path xs for tr, and returns the subtree of tr pointed to by xs. Let YX be

{ (tr, xs) ∈ Tree X × List(N − {0}) | xs is a valid path for tr }.

Let the relation R on YX be

{ ((tr, xs), (tr′, xs′)) ∈ YX × YX | |xs| < |xs′| }.

By Proposition 1.2.10, we have that R is well-founded, and so we can use well-founded recursion on R to define a function selectX (we often drop the subscript X) from YX to Tree X. Suppose we are given an input ((x, trs), xs) ∈ Y. If xs is [ ], then we return (x, trs). Otherwise, xs = [i] @ xs′, for some i ∈ N − {0} and xs′ ∈ List(N − {0}). Because ((x, trs), xs) ∈ Y, it follows that i ∈ [1 : |trs|] and xs′ is a valid path for trs i. Thus (trs i, xs′) is in Y, and is a predecessor of ((x, trs), xs) in R, so that we can call ourselves recursively on (trs i, xs′) and return the resulting tree. For example, select(4(3(2, 1(7)), 6), [ ]) = 4(3(2, 1(7)), 6) and select(4(3(2, 1(7)), 6), [1, 2]) = 1(7).

We say that an X-tree tr′ is a subtree of an X-tree tr iff there is a valid path xs for tr such that tr′ = select(tr, xs). And tr′ is a leaf of tr iff tr′ is a subtree of tr and tr′ has no children.

Finally, we can define a function that takes in an X-tree tr, a valid path xs for tr, and an X-tree tr′, and returns the result of replacing the subtree of tr pointed to by xs with tr′. Let YX be

{ (tr, xs, tr ′) ∈ Tree X × List(N − {0}) × Tree X | xs is a valid path for tr }.

We use well-founded recursion on the size of the second components (the paths) of the elements of YX to define a function updateX (we often drop the subscript) from YX to Tree X. Suppose we are given an input ((x, trs), xs, tr′) ∈ Y. If xs is [ ], then we return tr′. Otherwise, xs = [i] @ xs′, for some i ∈ N − {0} and xs′ ∈ List(N − {0}). Because ((x, trs), xs, tr′) ∈ Y, it follows that i ∈ [1 : |trs|] and xs′ is a valid path for trs i. Thus (trs i, xs′, tr′) is in Y, and |xs′| < |xs|. Hence, we can let tr′′ be the result of calling ourselves recursively on (trs i, xs′, tr′). Finally, we can return (x, trs′), where trs′ = trs[i ↦ tr′′] (which is the same as trs, except that its ith element is tr′′). For example, update(4(3(2, 1(7)), 6), [ ], 3(7, 8)) = 3(7, 8) and update(4(3(2, 1(7)), 6), [1, 2], 3(7, 8)) = 4(3(2, 3(7, 8)), 6).
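Continuing the tree sketch from Section 1.3.2, select and update can be programmed as follows (our own code, not Forlan's; rather than restricting the domain to YX, these versions raise an exception on invalid paths):

exception InvalidPath

(* select (tr, xs): the subtree of tr pointed to by the path xs *)
fun select (tr, []) = tr
  | select (Node (_, trs), i :: xs) =
      if 1 <= i andalso i <= length trs
      then select (List.nth (trs, i - 1), xs)
      else raise InvalidPath

(* update (tr, xs, tr'): replace the subtree of tr at path xs by tr' *)
fun update (_, [], tr') = tr'
  | update (Node (x, trs), i :: xs, tr') =
      if 1 <= i andalso i <= length trs
      then Node (x,
                 List.take (trs, i - 1) @
                 [update (List.nth (trs, i - 1), xs, tr')] @
                 List.drop (trs, i))
      else raise InvalidPath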

1.3.4 Notes

Our treatment of trees, inductive definition, and well-founded recursion is more formal than what one finds in typical books on formal language theory. But those with a background in set theory will find nothing surprising in this section, and our foundational approach will serve the reader well in later chapters.

Chapter 2

Formal Languages

In this chapter we say what symbols, strings, alphabets and (formal) languages are, show how to use various induction principles to prove language equalities, and give an introduction to the Forlan toolset. In subsequent chapters, we will study four more restricted kinds of languages: the regular (Chapter 3), context-free (Chapter 4), recursive and recursively enumerable (Chapter 5) languages.

2.1 Symbols, Strings, Alphabets and (Formal) Lan- guages

In this section, we define the basic notions of the subject: symbols, strings, alphabets and (formal) languages. In most presentations of formal language theory, the “symbols” that make up strings are allowed to be arbitrary elements of the mathematical universe. This is convenient in some ways, but it means that, e.g., the collection of all strings is too “big” to be a set. Furthermore, if we were to adopt this convention, we wouldn’t be able to have notation in Forlan for all strings and symbols. These considerations lead us to the following definition.

2.1.1 Symbols

The set Char of symbol characters consists of the following 65 elements:

• the comma (“,”);

• the digits 0–9;

• the letters a–z and A–Z; and

• the angle brackets (“⟨” and “⟩”).

We order Char as follows:

, < 0 < · · · < 9 < a < · · · < z < A < · · · < Z < ⟨ < ⟩.


The set Sym of symbols is the least subset of List Char such that:

• for all digits and letters c, [c] ∈ Sym; and

• for all n ∈ N and x1, ..., xn ∈ {[,]} ∪ Sym,

[⟨] @ x1 @ · · · @ xn @ [⟩] ∈ Sym.

This is an inductive definition (see Section 1.3). Sym consists of just those lists of symbol characters that can be built using the above two rules. For example, [9], [⟨, ⟩], [⟨, i, d, ⟩] and [⟨, ⟨, a, ,, ⟩, b, ⟩] are symbols. On the other hand, [⟨, ⟩, ⟩] is not a symbol.

We can prove by induction that, for all z ∈ Sym, for all x, y ∈ List Char, if z = x @ y, then:

• if x ∈ Sym, then y = [ ];

• if y ∈ Sym, then x = [ ].

Thus a symbol never starts or ends with another symbol.

We normally abbreviate a symbol [c1, ..., cn] to c1 · · · cn, so that 9, ⟨⟩, ⟨id⟩ and ⟨⟨a,⟩b⟩ are symbols. And if x and y are elements of List Char, we typically abbreviate x @ y to xy. Whenever possible, we will use the mathematical variables a, b and c to name symbols.

We order Sym first by length, and then lexicographically (in dictionary order). So, we have that

0 < · · · < 9 < A < · · · < Z < a < · · · < z, and, e.g.,

z < ⟨be⟩ < ⟨by⟩ < ⟨on⟩ < ⟨can⟩ < ⟨con⟩.

Obviously, Sym is infinite, but is it countably infinite? The answer is “yes”, because we can enumerate the symbols in order.

2.1.2 Strings

A string is a list of symbols. Whenever possible, we will use the mathematical variables u, v, w, x, y and z to name strings. We typically abbreviate the empty list [ ] to %, and abbreviate [a1, ..., an] to a1 · · · an, when n ≥ 1. For example, [0, ⟨0⟩, 1, ⟨⟨,⟩⟩] is abbreviated to 0⟨0⟩1⟨⟨,⟩⟩. We name the empty string by %, instead of following convention and using ǫ, since this symbol can also be used in Forlan. We write Str for List Sym, the set of all strings. We order Str first by length and then lexicographically, using our order on Sym. Thus, e.g.,

% < ab < a⟨be⟩ < a⟨by⟩ < ⟨can⟩⟨be⟩ < abc.

Since every string can be written as a finite sequence of ASCII characters, it follows that Str is countably infinite. Because strings are lists, we have that |x| is the length of a string x, and that x @ y is the concatenation of strings x and y. We typically abbreviate x @ y to xy. For example:

• |%| = |[ ]| = 0;

• |0⟨0⟩1⟨⟨,⟩⟩| = |[0, ⟨0⟩, 1, ⟨⟨,⟩⟩]| = 4; and

• (01)(00) = [0, 1] @ [0, 0] = [0, 1, 0, 0]= 0100.

From our study of lists, we know that:

• Concatenation is associative: for all x,y,z ∈ Str,

(xy)z = x(yz).

• % is the identity for concatenation: for all x ∈ Str,

%x = x = x%.

It is easy to see that, for all x,y,x′,y′ ∈ Str:

• xy = xy′ iff y = y′; and

• xy = x′y iff x = x′.

On the other hand:

Exercise 2.1.1 Disprove the following statement: for all x, y, x′, y′ ∈ Str, xy = x′y′ iff x = x′ and y = y′.

We define the string x^n formed by raising a string x to the power n ∈ N by recursion on n:

x^0 = %, for all x ∈ Str; and
x^(n+1) = x x^n, for all x ∈ Str and n ∈ N.

We assign this operation higher precedence than concatenation, so that x x^n means x(x^n) in the above definition. For example, we have that

(ab)^2 = (ab)(ab)^1 = (ab)(ab)(ab)^0 = (ab)(ab)% = abab.
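With strings represented as SML lists (here lists of characters, purely for illustration; Forlan has its own representation of strings), the power operation transcribes directly from its recursive definition (our own code):

(* power (x, 0) = %, power (x, n + 1) = x @ power (x, n) *)
fun power (x, 0) = []
  | power (x, n) = x @ power (x, n - 1)

val abab = power ([#"a", #"b"], 2)   (* [#"a", #"b", #"a", #"b"] *)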

Proposition 2.1.2 For all x ∈ Str and n, m ∈ N, x^(n+m) = x^n x^m.

Proof. Suppose x ∈ Str and m ∈ N. We use mathematical induction to show that, for all n ∈ N, x^(n+m) = x^n x^m.

(Basis Step) We have that x^(0+m) = x^m = % x^m = x^0 x^m.

(Inductive Step) Suppose n ∈ N, and assume the inductive hypothesis: x^(n+m) = x^n x^m. We must show that x^((n+1)+m) = x^(n+1) x^m. We have that

x^((n+1)+m) = x^((n+m)+1)
            = x x^(n+m)       (definition of x^((n+m)+1))
            = x (x^n x^m)     (inductive hypothesis)
            = (x x^n) x^m
            = x^(n+1) x^m     (definition of x^(n+1)). ✷

Thus, if x ∈ Str and n ∈ N, then

x^(n+1) = x x^n (definition), and

x^(n+1) = x^n x^1 = x^n x (Proposition 2.1.2).

Next, we consider the prefix, suffix and substring relations on strings. Sup- pose x and y are strings. We say that:

• x is a prefix of y iff y = xv for some v ∈ Str;

• x is a suffix of y iff y = ux for some u ∈ Str; and

• x is a substring of y iff y = uxv for some u, v ∈ Str. In other words, x is a prefix of y iff x is an initial part of y, x is a suffix of y iff x is a trailing part of y, and x is a substring of y iff x appears in the middle of y. But note that the strings u and v can be empty in these definitions. Thus, e.g., a string x is always a prefix of itself, since x = x%. A prefix, suffix or substring of a string other than the string itself is called proper. For example:

• % is a proper prefix, suffix and substring of ab;

• a is a proper prefix and substring of ab;

• b is a proper suffix and substring of ab; and

• ab is a (non-proper) prefix, suffix and substring of ab.

Proposition 2.1.3 For all x,y,x′,y′ ∈ Str, xy = x′y′ iff

• xu = x′ and y = uy′, for some u ∈ Str, or

• x′u = x and y′ = uy, for some u ∈ Str.

Proof. Straightforward. ✷

As a consequence of this proposition, we have that:

• For all x,x′,y′ ∈ Str, x is a prefix of x′y′ iff

– x is a prefix of x′, or – x = x′u, for some prefix u of y′.

• For all x,x′,y′ ∈ Str, x is a suffix of x′y′ iff

– x is a suffix of y′, or – x = uy′, for some suffix u of x′.

• For all x,x′,y′ ∈ Str, x is a substring of x′y′ iff

– x is a substring of x′, or – x = uv, for some u, v ∈ Str such that u is a suffix of x′ and v is a prefix of y′, or – x is a substring of y′.

Exercise 2.1.4 Suppose a ∈ Sym, x,y ∈ Str. Prove that:

(1) If a^i x = a^j y and neither x nor y begins with a, then i = j and x = y;

(2) If x a^i = y a^j and neither x nor y ends with a, then x = y and i = j.

Exercise 2.1.5 Suppose a, b ∈ Sym and a ≠ b. Disprove the following statement: for all i, i′, j, j′, k, k′ ∈ N, a^i b^j a^k = a^(i′) b^(j′) a^(k′) iff i = i′, j = j′ and k = k′.

2.1.3 Alphabets

Having said what symbols and strings are, we now come to alphabets. An alphabet is a finite subset of Sym. We use Σ (upper case Greek letter sigma) to name alphabets. For example, ∅, {0} and {0, 1} are alphabets. We write Alp for the set of all alphabets. Alp is countably infinite.

We define alphabet ∈ Str → Alp by recursion on (the length of) strings:

alphabet %= ∅, alphabet(ax)= {a}∪ alphabet x, for all a ∈ Sym and x ∈ Str.

I.e., alphabet w consists of all of the symbols occurring in the string w. E.g., alphabet(01101) = {0, 1}. Because the string x appears on the right side of ax in the rule alphabet(ax) = {a}∪ alphabet x, we call this right recursion. (Since ∪ is associative and commutative, it would have been equivalent to use left recursion, alphabet(xa)= {a}∪ alphabet x.) We say that alphabet x is the alphabet of x. If Σ is an alphabet, then we write Σ∗ for List Σ. I.e., Σ∗ consists of all of the strings that can be built using the symbols of Σ. For example, the elements of {0, 1}∗ = List {0, 1} are:

%, 0, 1, 00, 01, 10, 11, 000, ...
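Returning to the alphabet function, its right-recursive definition is easy to transcribe into SML (our own code, with symbols represented as characters), taking care to record each symbol only once:

(* alphabet % = {}, alphabet (a x) = {a} ∪ alphabet x *)
fun alphabet [] = []
  | alphabet (a :: x) =
      let val rest = alphabet x
      in if List.exists (fn b => b = a) rest then rest else a :: rest end

val ab = alphabet [#"0", #"1", #"1", #"0", #"1"]   (* [#"0", #"1"] *)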

2.1.4 Languages

We say that L is a formal language (or just language) iff L ⊆ Σ∗, for some Σ ∈ Alp. In other words, a language is a set of strings over some alphabet. If Σ ∈ Alp, then we say that L is a Σ-language iff L ⊆ Σ∗. Here are some example languages (all are {0, 1}-languages):

• ∅;

• {0, 1}∗;

• {010, 1001, 1101};

• { 0^n 1^n | n ∈ N } = {0^0 1^0, 0^1 1^1, 0^2 1^2, ...} = {%, 01, 0011, ...}; and

• { w ∈ {0, 1}∗ | w is a palindrome }.

(A palindrome is a string that reads the same backwards and forwards, i.e., that is equal to its own reversal.) On the other hand, the set of strings X = {⟨⟩, ⟨0⟩, ⟨00⟩, ...} is not a language, since it involves infinitely many symbols, i.e., since there is no alphabet Σ such that X ⊆ Σ∗. Since Str is countably infinite and every language is a subset of Str, it follows that every language is countable. Furthermore, Σ∗ is countably infinite, as long as the alphabet Σ is nonempty (∅∗ = {%}).

We write Lan for the set of all languages. It turns out that Lan is uncountable. In fact even P({0}∗), the set of all {0}-languages, has the same size as P(N), and is thus uncountable.

Exercise 2.1.6 Show that P(N) has the same size as P({0}∗).

Given a language L, we write alphabet L for the alphabet

∪{ alphabet w | w ∈ L }

of L. I.e., alphabet L consists of all of the symbols occurring in the strings of L. For example,

alphabet {011, 112} = ∪{alphabet(011), alphabet(112)} = ∪{{0, 1}, {1, 2}} = {0, 1, 2}.

Note that, for all languages L, L ⊆ (alphabet L)∗. If A is an infinite subset of Sym (and so is not an alphabet), we allow ourselves to write A∗ for List A. I.e., A∗ consists of all of the strings that can be built using the symbols of A. For example, Sym∗ = Str.

2.1.5 Notes

In a traditional approach to the subject, symbols may be anything: real numbers, sets, etc. But such a choice would mean that not all symbols could be expressed in Forlan's syntax, and would needlessly complicate the set theoretic foundations of the subject. By working with a fixed, countably infinite set of symbols, all symbols can be expressed in Forlan, and we have that strings, regular expressions, etc., are sets, not set-indexed families of sets. Representing strings as lists of symbols, which in turn are represented as functions, is nontraditional, but should seem a natural approach to those with a background in set theory or functional programming.

2.2 Using Induction to Prove Language Equalities

In this section, we introduce three string induction principles, ways of showing that every w ∈ A∗ has property P (w), where A is some set of symbols. Typically, A will be an alphabet, i.e., a finite set of symbols. But when we want to prove that all strings have some property, we can let A = Sym, so that A∗ = Str. Each of these principles corresponds to an instance of well-founded induction. We also look at how different kinds of induction can be used to show that two languages are equal.

2.2.1 String Induction Principles

Suppose A is a set of symbols. We define well-founded relations rightA, leftA and strongA on A∗ by:

• x rightA y iff y = ax for some a ∈ A (it’s called right because the string x is on the right side of ax);

• x leftA y iff y = xa for some a ∈ A (it’s called left because the string x is on the left side of xa);

• x strongA y iff x is a proper substring of y.

Thus, for all a ∈ A and x ∈ A∗, the only predecessor of ax in rightA is x, and the only predecessor of xa in leftA is x. And, for all y ∈ A∗, the predecessors of y in strongA are the proper substrings of y. The empty string, %, has no predecessors in any of these relations.

The well-foundedness of rightA, leftA and strongA follows by Proposition 1.2.9, since each of these relations is a subset of lengthList A, which is a well-founded relation on List A. We can do well-founded induction and recursion on these relations. In fact, what we called right and left recursion on strings in Section 2.1 correspond to recursion on rightSym and leftSym. We now introduce string induction principles corresponding to well-founded induction on each of the above relations.

Theorem 2.2.1 (Principle of Right String Induction) Suppose A ⊆ Sym and P (w) is a property of a string w. If

(basis step)

P (%) and

(inductive step)

for all a ∈ A and w ∈ A∗, if (†) P (w), then P (aw), then,

for all w ∈ A∗, P (w).

We refer to the formula (†) as the inductive hypothesis. According to the induction principle, to show that every w ∈ A∗ has property P , we show that the empty string has property P , and then assume that a ∈ A, w ∈ A∗ and that (the inductive hypothesis) w has property P , and show that aw has property P .

Proof. Equivalent to well-founded induction on rightA. ✷

By switching aw to wa in the inductive step, we get the principle of left string induction. 2.2 Using Induction to Prove Language Equalities 47

Theorem 2.2.2 (Principle of Left String Induction) Suppose A ⊆ Sym and P (w) is a property of a string w. If (basis step) P (%) and

(inductive step) for all a ∈ A and w ∈ A∗, if (†) P (w), then P (wa), then, for all w ∈ A∗, P (w). We refer to the formula (†) as the inductive hypothesis.

Proof. Equivalent to well-founded induction on leftA. ✷

Theorem 2.2.3 (Principle of Strong String Induction) Suppose A ⊆ Sym and P (w) is a property of a string w. If

for all w ∈ A∗, if (†) for all x ∈ A∗, if x is a proper substring of w, then P (x), then P (w),

then,

for all w ∈ A∗, P (w).

We refer to (†) as the inductive hypothesis. It says that all the proper substrings of w have property P. According to the induction principle, to show that every w ∈ A∗ has property P, we let w ∈ A∗, and assume (the inductive hypothesis) that every proper substring of w has property P. Then we must show that w has property P.

Proof. Equivalent to well-founded induction on strongA. ✷

The next subsection, on proving language equalities, contains two examples of proofs by strong string induction. Before moving on to that subsection, we give an example proof by right string induction. We define the reversal x^R ∈ Str of a string x by right recursion on strings:

%^R = %;
(ax)^R = x^R a, for all a ∈ Sym and x ∈ Str.

E.g., we have that (021)^R = 120. And, an easy calculation shows that, for all a ∈ Sym, a^R = a. We let the reversal operation have higher precedence than string concatenation, so that, e.g., x x^R = x(x^R).
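The right-recursive definition of reversal also transcribes directly into SML (our own code; we call the function reversal to avoid shadowing SML's built-in rev, which computes the same thing):

(* %^R = %, (a x)^R = x^R a *)
fun reversal [] = []
  | reversal (a :: x) = reversal x @ [a]

val r = reversal [#"0", #"2", #"1"]   (* [#"1", #"2", #"0"], cf. (021)^R = 120 *)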

Proposition 2.2.4 For all x, y ∈ Str, (xy)^R = y^R x^R.

As usual, we must start by figuring out which of x and y to do induction on, as well as what sort of induction to use. Because we defined string reversal using right recursion, it turns out that we should do right string induction on x.

Proof. Suppose y ∈ Str. Since Sym∗ = Str, it will suffice to show that, for all x ∈ Sym∗, (xy)^R = y^R x^R. We proceed by right string induction.

(Basis Step) We have that (%y)^R = y^R = y^R % = y^R %^R.

(Inductive Step) Suppose a ∈ Sym and x ∈ Sym∗. Assume the inductive hypothesis: (xy)^R = y^R x^R. Then,

((ax)y)^R = (a(xy))^R
          = (xy)^R a       (definition of (a(xy))^R)
          = (y^R x^R) a    (inductive hypothesis)
          = y^R (x^R a)
          = y^R (ax)^R     (definition of (ax)^R). ✷

Exercise 2.2.5 Use right string induction and Proposition 2.2.4 to prove that, for all x ∈ Str, (x^R)^R = x.

Exercise 2.2.6 In Section 2.1, we used right recursion to define the function alphabet ∈ Str → Alp. Use right string induction to show that, for all x,y ∈ Str, alphabet(xy)= alphabet x ∪ alphabet y.

2.2.2 Proving Language Equalities

In this subsection, we show two examples of how strong string induction and induction over inductively defined languages can be used to show that two languages are equal.

For the first example, let X be the least subset of {0, 1}∗ such that:

(1) % ∈ X; and

(2) for all a ∈ {0, 1} and x ∈ X, axa ∈ X.

This is another example of an inductive definition: X consists of just those strings of 0's and 1's that can be constructed using (1) and (2). For example, by (1) and (2), we have that 00 = 0%0 ∈ X. Thus, by (2), we have that 1001 = 1(00)1 ∈ X. In general, we have that X contains the elements:

%, 00, 11, 0000, 0110, 1001, 1111, ...

We will show that X = Y , where Y = { w ∈ {0, 1}∗ | w is a palindrome and |w| is even }.
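Membership in Y can be tested experimentally before the equality X = Y is proved. A quick SML sketch (our own code, with strings as character lists):

(* w is in Y iff w is a palindrome of even length *)
fun inY w = (w = List.rev w) andalso (length w mod 2 = 0)

val t1 = inY [#"1", #"0", #"0", #"1"]   (* true  *)
val t2 = inY [#"0", #"1"]               (* false *)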

Lemma 2.2.7 Y ⊆ X.

Proof. Since Y ⊆ {0, 1}∗, it will suffice to show that, for all w ∈ {0, 1}∗,

if w ∈ Y, then w ∈ X.

We proceed by strong string induction. Suppose w ∈ {0, 1}∗, and assume the inductive hypothesis: for all x ∈ {0, 1}∗, if x is a proper substring of w, then

if x ∈ Y, then x ∈ X.

We must show that

if w ∈ Y, then w ∈ X.

Suppose w ∈ Y , so that w ∈ {0, 1}∗, w is a palindrome and |w| is even. It remains to show that w ∈ X. If w = %, then w = % ∈ X, by Part (1) of the definition of X. So, suppose w 6= %. Since |w| is even, we have that |w|≥ 2, and thus that w = axb for some a, b ∈ {0, 1} and x ∈ {0, 1}∗. Because |w| is even, it follows that |x| is even. Furthermore, because w is a palindrome, it follows that a = b and x is a palindrome. Thus w = axa and x ∈ Y . Since x is a proper substring of w, the inductive hypothesis tells us that

if x ∈ Y, then x ∈ X.

But x ∈ Y , and thus x ∈ X. Thus, by Part (2) of the definition of X, we have that w = axa ∈ X. ✷

We could also prove X ⊆ Y by strong string induction. But an alternative approach is more elegant and generally applicable: we use the induction principle that comes from the inductive definition of X. Proposition 2.2.8 (Principle of Induction on X) Suppose P (w) is a property of a string w. If (1)

P (%), and 50 Formal Languages

(2)

for all a ∈ {0, 1} and x ∈ X, if (†) P (x), then P (axa), then,

for all w ∈ X, P (w).

We refer to (†) as the inductive hypothesis of Part (2). By Part (1) of the definition of X, % ∈ X. Thus Part (1) of the induction principle requires us to show P (%). By Part (2) of the definition of X, if a ∈ {0, 1} and x ∈ X, then axa ∈ X. Thus in Part (2) of the induction principle, when proving that the “new” element axa has property P, we're allowed to assume that the “old” element x has property P.

Lemma 2.2.9 X ⊆ Y .

Proof. We use induction on X to show that, for all w ∈ X, w ∈ Y . There are two steps to show.

(1) Since % ∈ {0, 1}∗, % is a palindrome and |%| = 0 is even, we have that % ∈ Y .

(2) Let a ∈ {0, 1} and x ∈ X. Assume the inductive hypothesis: x ∈ Y . We must show that axa ∈ Y . Since x ∈ Y , we have that x ∈ {0, 1}∗, x is a palindrome and |x| is even. Because a ∈ {0, 1} and x ∈ {0, 1}∗, it follows that axa ∈ {0, 1}∗. Since x is a palindrome, we have that axa is also a palindrome. And, because |axa| = |x| +2 and |x| is even, it follows that |axa| is even. Thus axa ∈ Y , as required. ✷

Proposition 2.2.10 X = Y .

Proof. Follows immediately from Lemmas 2.2.7 and 2.2.9. ✷

We end this subsection by proving a more complex language equality. One of the languages is defined using a “difference” function on strings, which we will use a number of times in later chapters. Define diff ∈ {0, 1}∗ → Z by: for all w ∈ {0, 1}∗,

diff w = the number of 1’s in w − the number of 0’s in w.

Then: 2.2 Using Induction to Prove Language Equalities 51

• diff % = 0;

• diff 1 = 1;

• diff 0 = −1; and

• for all x,y ∈ {0, 1}∗, diff(xy)= diff x + diff y.
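The diff function transcribes into SML as a simple fold (our own code, with strings of 0's and 1's as character lists):

(* diff w = (number of 1's in w) - (number of 0's in w) *)
fun diff w = foldl (fn (c, d) => if c = #"1" then d + 1 else d - 1) 0 w

val d0 = diff [#"1", #"0", #"0", #"1"]   (* 0 *)
val d1 = diff [#"0", #"1", #"1"]         (* 1 *)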

Note that, for all w ∈ {0, 1}∗, diff w = 0 iff w has an equal number of 0's and 1's. If we think of a 1 as representing the production of one unit of some resource, and of a 0 as representing the consumption of one unit of that resource, then a string will have a diff of 0 iff it is balanced in terms of production and consumption. Note that such a string may have prefixes with negative diff's, i.e., it may temporarily go “into the red”. Let X (forget the previous definition of X) be the least subset of {0, 1}∗ such that:

(1) % ∈ X;

(2) for all x,y ∈ X, xy ∈ X;

(3) for all x ∈ X, 0x1 ∈ X; and

(4) for all x ∈ X, 1x0 ∈ X.

Let Y = { w ∈ {0, 1}∗ | diff w = 0 }. For example, since % ∈ X, it follows, by (3) and (4) that 01 = 0%1 ∈ X and 10 = 1%0 ∈ X. Thus, by (2), we have that 0110 = (01)(10) ∈ X. And, Y consists of all strings of 0’s and 1’s with an equal number of 0’s and 1’s. Our goal is to prove that X = Y , i.e., that: (the easy direction) every string that can be constructed using X’s rules has an equal number of 0’s and 1’s; and (the hard direction) that every string of 0’s and 1’s with an equal number of 0’s and 1’s can be constructed using X’s rules. Because X was defined inductively, it gives rise to an induction principle, which we will use to prove the following lemma. (Because of Part (2) of the definition of X, we wouldn’t be able to prove this lemma using strong string induction.)

Lemma 2.2.11 X ⊆ Y .

Proof. We use induction on X to show that, for all w ∈ X, w ∈ Y . There are four steps to show, corresponding to the four rules of X’s definition.

(1) We must show % ∈ Y. Since % ∈ {0, 1}∗ and diff % = 0, we have that % ∈ Y.

(2) Suppose x,y ∈ X, and assume the inductive hypothesis: x,y ∈ Y . We must show that xy ∈ Y . Since x,y ∈ Y , we have that xy ∈ {0, 1}∗ and diff(xy)= diff x + diff y =0+0=0. Thus xy ∈ Y .

(3) Suppose x ∈ X, and assume the inductive hypothesis: x ∈ Y . We must show that 0x1 ∈ Y . Since x ∈ Y , we have that 0x1 ∈ {0, 1}∗ and diff(0x1)= diff 0 + diff x + diff 1 = −1+0+1=0. Thus 0x1 ∈ Y .

(4) Suppose x ∈ X, and assume the inductive hypothesis: x ∈ Y . We must show that 1x0 ∈ Y . Since x ∈ Y , we have that 1x0 ∈ {0, 1}∗ and diff(1x0)= diff 1 + diff x + diff 0 =1+0+ −1 = 0. Thus 1x0 ∈ Y . ✷

Lemma 2.2.12 Y ⊆ X.

Proof. Since Y ⊆ {0, 1}∗, it will suffice to show that, for all w ∈ {0, 1}∗,

if w ∈ Y, then w ∈ X.

We proceed by strong string induction. Suppose w ∈ {0, 1}∗, and assume the inductive hypothesis: for all x ∈ {0, 1}∗, if x is a proper substring of w, then

if x ∈ Y, then x ∈ X.

We must show that if w ∈ Y, then w ∈ X. Suppose w ∈ Y . We must show that w ∈ X. There are three cases to consider.

• Suppose w = %. Then w =% ∈ X, by Part (1) of the definition of X.

• Suppose w = 0t for some t ∈ {0, 1}∗. Since w ∈ Y, we have that −1 + diff t = diff 0 + diff t = diff(0t) = diff w = 0, and thus that diff t = 1. Let u be the shortest prefix of t such that diff u ≥ 1. (Since t is a prefix of itself and diff t = 1 ≥ 1, it follows that u is well-defined.) Let z ∈ {0, 1}∗ be such that t = uz. Clearly, u ≠ %, and thus u = yb for some y ∈ {0, 1}∗ and b ∈ {0, 1}. Hence t = uz = ybz. Since y is a shorter prefix of t than u, we have that diff y ≤ 0. Suppose, toward a contradiction, that b = 0. Then diff y + −1 = diff y + diff 0 = diff y + diff b = diff(yb) = diff u ≥ 1, so that diff y ≥ 2. But diff y ≤ 0—contradiction. Hence b = 1. Summarizing, we have that u = yb = y1, t = uz = y1z and w = 0t = 0y1z. Since diff y + 1 = diff y + diff 1 = diff(y1) = diff u ≥ 1, it follows that diff y ≥ 0. But diff y ≤ 0, and thus diff y = 0. Thus y ∈ Y. Since

1+diff z = 0+1+diff z = diff y +diff 1+diff z = diff(y1z)= diff t = 1, it follows that diff z = 0. Thus z ∈ Y . Because y and z are proper substrings of w, and y, z ∈ Y , the inductive hypothesis tells us that y, z ∈ X. Thus, by Part (3) of the definition of X, we have that 0y1 ∈ X. Hence, Part (2) of the definition of X tells us that w = 0y1z = (0y1)z ∈ X.

• Suppose w = 1t for some t ∈ {0, 1}∗. Since w ∈ Y, we have that 1 + diff t = diff 1 + diff t = diff(1t) = diff w = 0, and thus that diff t = −1. Let u be the shortest prefix of t such that diff u ≤ −1. (Since t is a prefix of itself and diff t = −1 ≤ −1, it follows that u is well-defined.) Let z ∈ {0, 1}∗ be such that t = uz. Clearly, u ≠ %, and thus u = yb for some y ∈ {0, 1}∗ and b ∈ {0, 1}. Hence t = uz = ybz. Since y is a shorter prefix of t than u, we have that diff y ≥ 0. Suppose, toward a contradiction, that b = 1. Then diff y + 1 = diff y + diff 1 = diff y + diff b = diff(yb) = diff u ≤ −1, so that diff y ≤ −2. But diff y ≥ 0—contradiction. Hence b = 0. Summarizing, we have that u = yb = y0, t = uz = y0z and w = 1t = 1y0z. Since diff y + −1 = diff y + diff 0 = diff(y0) = diff u ≤ −1, it follows that diff y ≤ 0. But diff y ≥ 0, and thus diff y = 0. Thus y ∈ Y. Since −1 + diff z = 0 + −1 + diff z = diff y + diff 0 + diff z = diff(y0z) = diff t = −1, it follows that diff z = 0. Thus z ∈ Y. Because y and z are proper substrings of w, and y, z ∈ Y, the inductive hypothesis tells us that y, z ∈ X. Thus, by Part (4) of the definition of X, we have that 1y0 ∈ X. Hence, Part (2) of the definition of X tells us that w = 1y0z = (1y0)z ∈ X. ✷

In the proof of the preceding lemma we made use of all four rules of X’s definition. If this had not been the case, we would have known that the unused rules were redundant (or that we had made a mistake in our proof!).

Proposition 2.2.13 X = Y .

Proof. Follows immediately from Lemmas 2.2.11 and 2.2.12. ✷
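The membership condition defining Y is easy to test by program, which makes it convenient to experiment with such languages before attempting a proof. The following is a small sketch in SML, using the Forlan types and modules introduced in Section 2.3 below (str is Forlan's type of strings, i.e., lists of symbols); the function names diff and inY are our own:

(* a sketch: diff w = (number of 1's in w) - (number of 0's in w),
   for a string w over the alphabet {0, 1} *)
fun diff (w : str) : int =
  foldl (fn (a, n) => if Sym.toString a = "1" then n + 1 else n - 1) 0 w

(* membership in Y = { w in {0, 1}* | diff w = 0 } *)
fun inY (w : str) : bool = diff w = 0

For instance, inY(Str.fromString "0110") evaluates to true, and inY(Str.fromString "011") to false.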

Exercise 2.2.14 Define the function diff ∈ {0, 1}∗ → Z as in the above example. Let X be the least subset of {0, 1}∗ such that:

(1) % ∈ X;

(2) for all x ∈ X, 0x1 ∈ X; and

(3) for all x,y ∈ X, xy ∈ X.

Let Y = { w ∈ {0, 1}∗ | diff w = 0 and, for all prefixes v of w, diff v ≤ 0 }. Prove that X = Y .

Exercise 2.2.15 Define a function diff ∈ {0, 1}∗ → Z by: for all w ∈ {0, 1}∗,

diff w = the number of 1’s in w − 2(the number of 0’s in w).

Thus diff % = 0, diff 0 = −2, diff 1 = 1, and for all x,y ∈ {0, 1}∗, diff(xy) = diff x + diff y. Furthermore, for all w ∈ {0, 1}∗, diff w = 0 iff w has twice as many 1’s as 0’s. Let X be the least subset of {0, 1}∗ such that:

(1) % ∈ X;

(2) 1 ∈ X;

(3) for all x,y ∈ X, 1x1y0 ∈ X; and

(4) for all x,y ∈ X, xy ∈ X.

Let Y = { w ∈ {0, 1}∗ | for all prefixes v of w, diff v ≥ 0 }. Prove that X = Y .
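As with the previous example, it can be helpful to test candidate strings before starting the proof. Here is a sketch in SML, under the same assumptions as the sketch after Proposition 2.2.13 (Forlan's str type and Sym module; the names diff2 and inY2 are ours), which checks the prefix condition by tracking the running value of diff:

(* a sketch: diff w = (number of 1's in w) - 2 * (number of 0's in w) *)
fun diff2 (w : str) : int =
  foldl (fn (a, n) => if Sym.toString a = "1" then n + 1 else n - 2) 0 w

(* test whether, for all prefixes v of w, diff v >= 0; we fold over w,
   keeping the running difference, and switch to NONE on a bad prefix *)
fun inY2 (w : str) : bool =
  case foldl
       (fn (_, NONE) => NONE
         | (a, SOME n) =>
             let val n' = if Sym.toString a = "1" then n + 1 else n - 2
             in if n' >= 0 then SOME n' else NONE end)
       (SOME 0) w of
    SOME _ => true
  | NONE => false

E.g., inY2(Str.fromString "110") evaluates to true, but inY2(Str.fromString "101") to false, since the prefix 10 has diff 10 = −1.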

2.2.3 Notes

A novel feature of this book is the introduction and use of explicit string induction principles, as an alternative to doing proofs by induction (mathematical or strong) on the length of strings. Also novel is our focus on languages defined using "difference" functions.

2.3 Introduction to Forlan

The Forlan toolset is an extension of the Standard ML of New Jersey (SML/NJ) implementation of Standard ML (SML). It is implemented as a set of SML modules. It is used interactively, and users can extend Forlan by defining SML functions. Instructions for installing and running Forlan on machines running Linux, macOS and Windows can be found on the Forlan website:

http://alleystoughton.us/forlan.

A manual for Forlan is available on the Forlan website, and describes Forlan's modules in considerably more detail than does this book. See the manual for instructions for setting system parameters controlling such things as the search path used for loading files, the line length used by Forlan's pretty printer, and the number of elements of a list that the Forlan top-level displays.

In the concrete syntax for describing Forlan objects—automata, grammars, etc.—comments begin with a "#", and run through the end of the line. Comments and whitespace may be arbitrarily inserted into the descriptions of Forlan objects without changing how the objects will be lexically analyzed and parsed. For instance,

ab cd # this is a comment
efg h

describes the Forlan string abcdefgh. Forlan's input functions prompt with "@" when reading from the standard input, in which case the user signifies end-of-file by typing a line consisting of a single dot (".").

We begin this section by showing how to invoke Forlan, and giving a quick introduction to the SML core of Forlan. We then show how symbols, strings, finite sets of symbols and strings, and finite relations on symbols can be manipulated using Forlan.

2.3.1 Invoking Forlan

To invoke Forlan, type the command forlan to your shell (command processor):

% forlan
Forlan Version m (based on Standard ML of New Jersey Version n)
val it = () : unit
-

(m and n will be the Forlan and SML/NJ versions, respectively.) The identifier it is normally bound to the value of the most recently evaluated expression. Initially, though, its value is the empty tuple (), the single element of the type unit. The value () is used in circumstances when a value is required, but it makes no difference what that value is. Forlan's primary prompt is "-". To exit Forlan, type CTRL-d under Linux and macOS, and CTRL-z under Windows. To interrupt back to the Forlan top-level, type CTRL-c.

On Windows, you may find it more convenient to invoke Forlan by double-clicking on the Forlan icon. On all platforms, a much more flexible and satisfying way of running Forlan is as a subprocess of the Emacs text editor. See the Forlan website for information about how to do this.

2.3.2 The SML Core of Forlan

This subsection gives a quick introduction to the SML core of Forlan. Let's begin by using Forlan as a calculator:

- 4 + 5;

val it = 9 : int
- it * it;
val it = 81 : int
- it - 1;
val it = 80 : int
- 5 div 2;
val it = 2 : int
- 5 mod 2;
val it = 1 : int
- ~4 + 2;
val it = ~2 : int

Forlan responds to each expression by printing its value and type (int is the type of integers), and noting that the expression's value has been bound to the identifier it. Expressions must be terminated with semicolons. The operators div and mod compute integer division and remainder, respectively, and negative numbers begin with ~.

In addition to the type int of integers, SML has types string and bool, product types t1 * · · · * tn, and list types t list.

- "hello" ^ " " ^ "there";
val it = "hello there" : string
- not true;
val it = false : bool
- true andalso (false orelse true);
val it = true : bool
- if 5 < 7 then "hello" else "bye";
val it = "hello" : string
- (3 + 1, 4 = 4, "a" ^ "b");
val it = (4,true,"ab") : int * bool * string
- #2 it;
val it = true : bool
- [1, 3, 5] @ [7, 9, 11];
val it = [1,3,5,7,9,11] : int list
- rev it;
val it = [11,9,7,5,3,1] : int list
- length it;
val it = 6 : int
- null[];
val it = true : bool
- null[1, 2];
val it = false : bool
- hd[1, 2, 3];
val it = 1 : int
- tl[1, 2, 3];
val it = [2,3] : int list

The operator ^ is string concatenation. The conjunction andalso evaluates its left-hand side first, and yields false without evaluating its right-hand side, if the value of the left-hand side is false. Similarly, the disjunction orelse evaluates its left-hand side first, and yields true without evaluating its right-hand side, if the value of the left-hand side is true. A conditional (if-then-else) is evaluated by first evaluating its boolean expression, and then evaluating its then-part, if the boolean expression's value is true, and evaluating its else-part, if its value is false. Tuples are evaluated from left to right, and the function #n selects the nth (starting from 1) element of a tuple. The operator @ appends lists. The function rev reverses a list, the function length computes the length of a list, and the function null tests whether a list is empty. Finally, the functions hd and tl return the head (first element) and tail (all but the first element) of a list.

nil and :: (pronounced "cons", for "constructor"), which have types 'a list and 'a * 'a list -> 'a list, respectively, are the constructors for type 'a list. These constructors are polymorphic, having all of the types that can be formed by instantiating the type variable 'a with a type. E.g., nil has type int list, bool list, (int * bool)list, etc. :: is an infix operator, i.e., one writes x :: xs for the list whose first element is x and remaining elements are those in the list xs.

- nil;
val it = [] : 'a list
- 1 :: nil;
val it = [1] : int list
- 1 :: 2 :: nil;
val it = [1,2] : int list
- 3 :: [5, 7, 9];
val it = [3,5,7,9] : int list

Lists are implemented as linked lists, so that doing a cons involves the creation of a single list node. SML also has option types t option, whose values are built using the type's two constructors: NONE of type 'a option, and SOME of type 'a -> 'a option. This is a predefined datatype, declared by

datatype ’a option = NONE | SOME of ’a

E.g., NONE, SOME 1 and SOME ~6 are three of the values of type int option, and NONE, SOME true and SOME false are the only values of type bool option.

- NONE;
val it = NONE : 'a option
- SOME 3;
val it = SOME 3 : int option
- SOME true;
val it = SOME true : bool option
- valOf it;
val it = true : bool

In addition to the usual operators <, <=, > and >= for comparing integers, SML offers a function Int.compare of type int * int -> order, where the order type contains three elements: LESS, EQUAL and GREATER.

- Int.compare(3, 4);
val it = LESS : order
- Int.compare(4, 4);
val it = EQUAL : order
- Int.compare(4, 3);
val it = GREATER : order

It is possible to bind the value of an expression to an identifier using a value declaration:

- val x = 3 + 4;
val x = 7 : int
- val y = x + 1;
val y = 8 : int
- val x = 5 * x;
val x = 35 : int
- y;
val it = 8 : int

In the first declaration of x, its right-hand side is first evaluated, resulting in 7, and then x is bound to this value. Note that the redeclaration of x doesn't change the value of the previous declaration of x, it just makes that declaration inaccessible. One can use a value declaration to give names to the components of a tuple, or give a name to the data of a non-NONE optional value:

- val (x, y, z) = (3 + 1, 4 = 4, "a" ^ "b");
val x = 4 : int
val y = true : bool
val z = "ab" : string
- val SOME n = SOME(4 * 25);
stdIn:41.5-41.26 Warning: binding not exhaustive
          SOME n = ...
val n = 100 : int

This last declaration uses pattern matching: SOME(4 * 25) is evaluated to SOME 100, and is then matched against the pattern SOME n. Because the constructors match, the pattern matching succeeds, and n becomes bound to 100. The warning is because the SML typechecker doesn't know the expression won't evaluate to NONE.

One can use a let expression to carry out some declarations in a local environment, evaluate an expression in that environment, and yield the result of that evaluation:

- val x = 3;

val x = 3 : int
- val z = 10;
val z = 10 : int
- let val x = 4 * 5
=     val y = x * z
= in (x, y, x + y) end;
val it = (20,200,220) : int * int * int
- x;
val it = 3 : int

When a declaration or expression spans more than one line, Forlan uses its secondary prompt, =, on all of the lines except for the first one. Forlan doesn't process a declaration or expression until it is terminated with a semicolon. One can declare functions, and apply those functions to arguments:

- fun f n = n * 2;
val f = fn : int -> int
- f 3;
val it = 6 : int
- f(4 + 5);
val it = 18 : int
- f 4 + 5;
val it = 13 : int
- fun g(x, y) = (x ^ y, y ^ x);
val g = fn : string * string -> string * string
- val (u, v) = g("a", "b");
val u = "ab" : string
val v = "ba" : string

The function f doubles its argument. All function values are printed as fn. A type t1 -> t2 is the type of all functions taking arguments of type t1 and producing results (if they terminate without raising exceptions) of type t2. SML infers the types of functions. The function application f(4 + 5) is evaluated as follows. First, the argument 4 + 5 is evaluated, resulting in 9. Then a local environment is created in which n is bound to 9, and f's body is evaluated in that environment, producing 18. Function application has higher precedence than operators like +. Technically, the function g matches its single argument, which must be a pair, against the pair pattern (x, y), binding x and y to the left and right sides of this argument, and then evaluates its body. But we can think of such a function as having multiple arguments. The type operator * has higher precedence than the operator ->. Except for basic entities like integers and booleans, all values in SML are represented by pointers, so that passing such a value to a function, or putting it in a data structure, only involves copying a pointer.

Given functions f and g of types t1 -> t2 and t2 -> t3, respectively, g o f is the composition of g and f, the function of type t1 -> t3 that, when given an argument x of type t1, evaluates the expression g(f x). For example, we have that:

- fun f x = x >= 1 andalso x <= 10;
val f = fn : int -> bool
- fun g x = if x then "inside" else "outside";
val g = fn : bool -> string
- val h = g o f;
val h = fn : int -> string
- h ~5;
val it = "outside" : string
- h 6;
val it = "inside" : string
- h 14;
val it = "outside" : string

SML also has anonymous functions, which may also be given names using value declarations:

- (fn x => x + 1)(3 + 4);
val it = 8 : int
- val f = fn x => x + 1;
val f = fn : int -> int
- f(3 + 4);
val it = 8 : int

The anonymous function fn x => x + 1 has type int -> int and adds one to its argument. Functions are data: they may be passed to functions, returned from functions (a function that returns a function is called curried), be components of tuples or lists, etc. For example,

val map : ('a -> 'b) -> 'a list -> 'b list

is a polymorphic, curried function. The type operator -> associates to the right, so that map's type is

val map : ('a -> 'b) -> ('a list -> 'b list)

map takes in a function f of type 'a -> 'b, and returns a function that, when called with a list of elements of type 'a, transforms each element using f, forming a list of elements of type 'b.

- val f = map(fn x => x + 1);
val f = fn : int list -> int list
- f[2, 4, 6];
val it = [3,5,7] : int list
- f[~2, ~1, 0];
val it = [~1,0,1] : int list
- map (fn x => x mod 2 = 1) [3, 4, 5, 6, 7];

val it = [true,false,true,false,true] : bool list

In the last use of map, we are using the fact that function application associates to the left, so that f x y means (f x) y, i.e., apply f to x, and then apply the resulting function to y. The following example shows that local environments are kept alive for as long as there are accessible function values referring to them:

- val f =
=   let val x = 2 * 10
=   in fn y => y * x end;
val f = fn : int -> int
- f 4;
val it = 80 : int
- f 7;
val it = 140 : int

If the local environment containing the binding of x were discarded, then calling f would fail. It's also possible to declare recursive functions, like the factorial function:

- fun fact n =
=   if n = 0
=   then 1
=   else n * fact(n - 1);
val fact = fn : int -> int
- fact 4;
val it = 24 : int

One can load the contents of a file into Forlan using the function

val use : string -> unit

For example, if the file fact.sml contains the declaration of the factorial function, then this declaration can be loaded into the system as follows:

- use "fact.sml";
[opening fact.sml]
val fact = fn : int -> int
val it = () : unit
- fact 4;
val it = 24 : int

The factorial function can also be defined using pattern matching, either by using a case expression in the body of the function, or by using multiple clauses in the function’s definition:

- fun fact n =
=   case n of
=     0 => 1
=   | n => n * fact(n - 1);
val fact = fn : int -> int
- fact 3;
val it = 6 : int
- fun fact 0 = 1
=   | fact n = n * fact(n - 1);
val fact = fn : int -> int
- fact 4;
val it = 24 : int

The order of the clauses of a case expression or function definition is significant. If the clauses of either the case expression or the function definition were reversed, the function being defined would never return.

Pattern matching is especially useful when doing list processing. E.g., we could (inefficiently) define the list reversal function like this:

- fun rev nil = nil
=   | rev (x :: xs) = rev xs @ [x];
val rev = fn : 'a list -> 'a list

Calling rev with the empty list will result in the empty list being returned. And calling it with a nonempty list will temporarily bind x to the list's head, bind xs to its tail, and then evaluate the expression rev xs @ [x], recursively calling rev, and then returning the result of appending the result of this recursive call and [x]. Unfortunately, this definition of rev is slow for long lists, as @ rebuilds its left argument. Instead, the official definition of rev is much more efficient:

- fun rev xs =
=   let fun rv(nil, vs) = vs
=         | rv(u :: us, vs) = rv(us, u :: vs)
=   in rv(xs, nil) end;
val rev = fn : 'a list -> 'a list
- rev [1, 2, 3, 4];
val it = [4,3,2,1] : int list

When rev is called with [1, 2, 3, 4], it calls the auxiliary function rv with the pair ([1, 2, 3, 4], []). The recursive calls that rv makes to itself are a special kind of recursion called tail recursion. The SML/NJ compiler generates code for such calls that simply jumps back to the beginning of rv, only changing its arguments. We have the following sequence of calls: rv([1, 2, 3, 4], []) calls rv([2, 3, 4], [1]) calls rv([3, 4], [2, 1]) calls rv([4], [3, 2, 1]) calls rv([], [4, 3, 2, 1]), which returns [4, 3, 2, 1], which is returned as the result of rev.

Finally, we can define recursive datatypes, and define functions by structural recursion on recursive datatypes. E.g., here's how we can define the datatype of labeled binary trees (both leaves and nodes (non-leaves) can have labels):

- datatype ('a, 'b) tree =
=     Leaf of 'b
=   | Node of 'a * ('a, 'b) tree * ('a, 'b) tree;
datatype ('a,'b) tree = Leaf of 'b | Node of 'a * ('a,'b) tree * ('a,'b) tree
- Leaf;
val it = fn : 'a -> ('b,'a) tree
- Node;
val it = fn : 'a * ('a,'b) tree * ('a,'b) tree -> ('a,'b) tree
- val tr = Node(true, Node(false, Leaf 7, Leaf ~1), Leaf 8);
val tr = Node (true,Node (false,Leaf 7,Leaf ~1),Leaf 8) : (bool,int) tree

Then we can define a function for reversing a tree, and apply it to tr:

- fun revTree (Leaf n) = Leaf n
=   | revTree (Node(m, tr1, tr2)) =
=       Node(m, revTree tr2, revTree tr1);
val revTree = fn : ('a,'b) tree -> ('a,'b) tree
- revTree tr;
val it = Node (true,Leaf 8,Node (false,Leaf ~1,Leaf 7)) : (bool,int) tree
- revTree it;
val it = Node (true,Node (false,Leaf 7,Leaf ~1),Leaf 8) : (bool,int) tree

And here is how we can define mutually recursive datatypes, along with mutually recursive functions over them:

- datatype a =
=     A1 of int
=   | A2 of b * b
= and b =
=     B1 of bool
=   | B2 of a * a;
datatype a = A1 of int | A2 of b * b
datatype b = B1 of bool | B2 of a * a
- fun rev_a (A1 n) = A1 n
=   | rev_a (A2(x, y)) = A2(rev_b y, rev_b x)
= and rev_b (B1 b) = B1 b
=   | rev_b (B2(x, y)) = B2(rev_a y, rev_a x);
val rev_a = fn : a -> a
val rev_b = fn : b -> b
- val a = A2(B2(A1 3, A1 4), B1 true);
val a = A2 (B2 (A1 3,A1 4),B1 true) : a
- rev_a a;
val it = A2 (B1 true,B2 (A1 4,A1 3)) : a

2.3.3 Symbols

The Forlan module Sym defines the abstract type sym of Forlan symbols, as well as some functions for processing symbols, including:

val input : string -> sym
val output : string * sym -> unit
val compare : sym * sym -> order
val equal : sym * sym -> bool

Symbols are expressed in Forlan's syntax as sequences of symbol characters, i.e., as a or ⟨id⟩, rather than [a] or [⟨, i, d, ⟩]. The above functions behave as follows:

• input fil reads a symbol from file fil; if fil = "", then the symbol is read from the standard input;

• output(fil, a) writes the symbol a to the file fil; if fil = "", then the symbol is written to the standard output;

• compare implements our total ordering on symbols; and

• equal tests whether two symbols are equal.

All of Forlan’s input functions read from the standard input when called with "" instead of a file, and all of Forlan’s output functions write to the standard output when given "" instead of a file. The type sym is bound in the top-level environment. On the other hand, one must write Sym.f to select the function f of module Sym. As described above, interactive input is terminated by a line consisting of a single “.” (dot), and Forlan’s input prompt is “@”. The module Sym also provides the functions

val fromString : string -> sym
val toString : sym -> string

where fromString is like input, except that it takes its input from a string, and toString is like output, except that it writes its output to a string. These functions are especially useful when defining SML functions. In the sequel, whenever a module/type has input and output functions, you may assume that it also has fromString and toString functions. Here are some example uses of the functions of Sym:

- val a = Sym.input "";
@
@ .
val a = - : sym
- val b = Sym.fromString "";
val b = - : sym

- Sym.output("", a);
val it = () : unit
- Sym.compare(a, b);
val it = LESS : order
- Sym.equal(a, b);
val it = false : bool
- Sym.equal(a, Sym.fromString "");
val it = true : bool

Values of abstract types (like sym) are printed as "-".
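Since fromString parses the same concrete syntax that input reads, composing toString with fromString round-trips a symbol through its description. For instance, under the syntax just described one would expect (a sketch, with an arbitrarily chosen symbol):

- Sym.toString(Sym.fromString "a");
val it = "a" : string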

2.3.4 Sets

The module Set defines the abstract type 'a set of finite sets of elements of type 'a. It is bound in the top-level environment. E.g., sym set is the type of sets of symbols. Each set has an associated total ordering, and some of the functions of Set take total orderings as arguments. See the Forlan manual for the details. In the book, we won't have to work with such functions explicitly.

Set provides various constants and functions for processing sets, but we will only make direct use of a few of them:

val toList : 'a set -> 'a list
val size : 'a set -> int
val empty : 'a set
val isEmpty : 'a set -> bool
val sing : 'a -> 'a set
val filter : ('a -> bool) -> 'a set -> 'a set
val all : ('a -> bool) -> 'a set -> bool
val exists : ('a -> bool) -> 'a set -> bool

These values are polymorphic: 'a can be int, sym, etc. The function toList returns the elements of a set, listing them in ascending order, according to the set's total ordering. The function size returns the size of a set. The value empty is the empty set, and the function isEmpty checks whether a set is empty. The function sing makes a value x into the singleton set {x}. The function filter goes through the elements of a set, keeping those elements on which the supplied predicate function returns true. The function all checks whether all elements of a set satisfy a predicate, whereas the function exists checks whether at least one element of a set satisfies the predicate.
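To illustrate, here is roughly how a few of these values behave on a small set of symbols (a sketch; it uses the SymSet module of the next subsection to build the set):

- val bs = SymSet.fromString "a, b, c";
val bs = - : sym set
- Set.size bs;
val it = 3 : int
- map Sym.toString (Set.toList bs);
val it = ["a","b","c"] : string list
- Set.isEmpty bs;
val it = false : bool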

2.3.5 Sets of Symbols

The module SymSet defines various functions for processing finite sets of symbols (elements of type sym set; alphabets), including:

val input : string -> sym set
val output : string * sym set -> unit
val fromList : sym list -> sym set
val memb : sym * sym set -> bool
val subset : sym set * sym set -> bool
val equal : sym set * sym set -> bool
val union : sym set * sym set -> sym set
val inter : sym set * sym set -> sym set
val minus : sym set * sym set -> sym set
val genUnion : sym set list -> sym set
val genInter : sym set list -> sym set

The total ordering associated with sets of symbols is our total ordering on symbols. Sets of symbols are expressed in Forlan's syntax as sequences of symbols, separated by commas. The function fromList returns a set with the same elements as the list of symbols it is called with. The function memb tests whether a symbol is a member (element) of a set of symbols, subset tests whether a first set of symbols is a subset of a second one, and equal tests whether two sets of symbols are equal. The functions union, inter and minus compute the union, intersection and difference of two sets of symbols. The function genUnion computes the generalized union of a list of sets of symbols xss, returning the set of all symbols appearing in at least one element of xss. And, the function genInter computes the generalized intersection of a nonempty list of sets of symbols xss, returning the set of all symbols appearing in all elements of xss. Here are some example uses of the functions of SymSet:

- val bs = SymSet.input "";
@ a, , 0,
@ .
val bs = - : sym set
- SymSet.output("", bs);
0, a, ,
val it = () : unit
- val cs = SymSet.input "";
@ a,
@ .
val cs = - : sym set
- SymSet.subset(cs, bs);
val it = false : bool
- SymSet.output("", SymSet.union(bs, cs));
0, a, , ,
val it = () : unit
- SymSet.output("", SymSet.inter(bs, cs));
a
val it = () : unit
- SymSet.output("", SymSet.minus(bs, cs));
0, ,
val it = () : unit
- val ds = SymSet.fromString ", <>";
val ds = - : sym set
- SymSet.output("", SymSet.genUnion[bs, cs, ds]);
0, a, <>, , ,
val it = () : unit
- SymSet.output("", SymSet.genInter[bs, cs, ds]);

val it = () : unit

2.3.6 Strings

We will be working with two kinds of strings:

• SML strings, i.e., elements of type string;

• The strings of formal language theory, which we call “formal language strings”, when necessary.

The module Str defines the type str of formal language strings, which is bound in the top-level environment, and is equal to sym list, the type of lists of symbols. Because strings are lists, we can use SML’s list processing functions on them. Strings are expressed in Forlan’s syntax as either a single % or a nonempty sequence of symbols. The module Str also defines some functions for processing strings, including:

val input : string -> str
val output : string * str -> unit
val alphabet : str -> sym set
val compare : str * str -> order
val equal : str * str -> bool
val prefix : str * str -> bool
val suffix : str * str -> bool
val substr : str * str -> bool
val power : str * int -> str
val last : str -> sym
val allButLast : str -> str

The function alphabet returns the alphabet of a string, and compare implements our total ordering on strings. prefix(x, y) tests whether x is a prefix of y, and suffix and substr work similarly. power(x, n) raises x to the power n. And last and allButLast return the last symbol and all but the last symbol of a string, respectively. Here are some example uses of the functions of Str:

- val x = Str.input "";
@ hello
@ .

val x = [-,-,-,-,-,-] : str
- length x;
val it = 6 : int
- Str.output("", x);
hello
val it = () : unit
- SymSet.output("", Str.alphabet x);
e, h, l, o
val it = () : unit
- Str.output("", Str.power(x, 3));
hellohellohello
val it = () : unit
- val y = Str.fromString "ello";
val y = [-,-,-,-] : str
- Str.compare(y, x);
val it = LESS : order
- Str.equal(y, x);
val it = false : bool
- Str.prefix(y, x);
val it = false : bool
- Str.substr(y, x);
val it = true : bool
- val z = Str.fromString "h" @ y;
val z = [-,-,-,-,-] : sym list
- Str.prefix(z, x);
val it = true : bool
- val x = Str.fromString "hellothere";
val x = [-,-,-,-,-,-,-,-,-,-] : str
- null x;
val it = false : bool
- Sym.output("", hd x);
h
val it = () : unit
- Str.output("", tl x);
ellothere
val it = () : unit
- Sym.output("", Str.last x);
e
val it = () : unit
- Str.output("", Str.allButLast x);
hellother
val it = () : unit

2.3.7 Sets of Strings

The module StrSet defines various functions for processing finite sets of strings (elements of type str set; finite languages), including:

val input : string -> str set 2.3 Introduction to Forlan 69

val output : string * str set -> unit
val fromList : str list -> str set
val memb : str * str set -> bool
val subset : str set * str set -> bool
val equal : str set * str set -> bool
val union : str set * str set -> str set
val inter : str set * str set -> str set
val minus : str set * str set -> str set
val genUnion : str set list -> str set
val genInter : str set list -> str set
val alphabet : str set -> sym set
val prefixes : str -> str set
val suffixes : str -> str set
val substrings : str -> str set

The total ordering associated with sets of strings is our total ordering on strings. Sets of strings are expressed in Forlan's syntax as sequences of strings, separated by commas. The function alphabet returns the alphabet of a finite language. And the functions prefixes, suffixes and substrings return the sets of all prefixes, suffixes and substrings of a string. Here are some example uses of the functions of StrSet:

- val xs = StrSet.input "";
@ hello, , %
@ .
val xs = - : str set
- val ys = StrSet.input "";
@ , another
@ .
val ys = - : str set
- val zs = StrSet.union(xs, ys);
val zs = - : str set
- Set.size zs;
val it = 4 : int
- StrSet.output("", zs);
%, , hello, another
val it = () : unit
- val us = Set.filter (fn x => length x mod 2 = 0) zs;
val us = - : sym list set
- StrSet.output("", us);
%,
val it = () : unit
- SymSet.output("", StrSet.alphabet zs);
a, e, h, l, n, o, r, t, ,
val it = () : unit
- val x = Str.fromString "0123";
val x = [-,-,-,-] : str
- StrSet.output("", StrSet.prefixes x);
%, 0, 01, 012, 0123
val it = () : unit
- StrSet.output("", StrSet.suffixes x);
%, 3, 23, 123, 0123
val it = () : unit
- StrSet.output("", StrSet.substrings x);
%, 0, 1, 2, 3, 01, 12, 23, 012, 123, 0123
val it = () : unit

In this transcript, us was declared to be all the even-length elements of zs.

2.3.8 Relations on Symbols

The module SymRel defines the type sym_rel of finite relations on symbols. It is bound in the top-level environment, and is equal to (sym * sym)set, i.e., its elements are finite sets of pairs of symbols. The total ordering associated with relations on symbols orders pairs of symbols first according to their left-hand sides (using the total ordering on symbols), and then according to their right-hand sides. Relations on symbols are expressed in Forlan's syntax as sequences of ordered pairs (a, b) of symbols, separated by commas.

SymRel also defines various functions for processing finite relations on symbols, including:

val input : string -> sym_rel
val output : string * sym_rel -> unit
val fromList : (sym * sym)list -> sym_rel
val memb : (sym * sym) * sym_rel -> bool
val subset : sym_rel * sym_rel -> bool
val equal : sym_rel * sym_rel -> bool
val union : sym_rel * sym_rel -> sym_rel
val inter : sym_rel * sym_rel -> sym_rel
val minus : sym_rel * sym_rel -> sym_rel
val genUnion : sym_rel list -> sym_rel
val genInter : sym_rel list -> sym_rel
val domain : sym_rel -> sym set
val range : sym_rel -> sym set
val relationFromTo : sym_rel * sym set * sym set -> bool
val reflexive : sym_rel * sym set -> bool
val symmetric : sym_rel -> bool
val antisymmetric : sym_rel -> bool
val transitive : sym_rel -> bool
val total : sym_rel * sym set -> bool
val inverse : sym_rel -> sym_rel
val compose : sym_rel * sym_rel -> sym_rel
val function : sym_rel -> bool
val functionFromTo : sym_rel * sym set * sym set -> bool
val injection : sym_rel -> bool
val bijectionFromTo : sym_rel * sym set * sym set -> bool
val applyFunction : sym_rel -> sym -> sym

val restrictFunction : sym_rel * sym set -> sym_rel
val updateFunction : sym_rel * sym * sym -> sym_rel

The functions domain and range return the domain and range, respectively, of a relation. relationFromTo(rel, bs, cs) tests whether rel is a relation from bs to cs. reflexive(rel, bs) tests whether rel is reflexive on bs. The functions symmetric, antisymmetric and transitive test whether a relation is symmetric, antisymmetric or transitive, respectively. total(rel, bs) tests whether rel is total on bs.

The function inverse computes the inverse of a relation, and compose composes two relations. The function function tests whether a relation is a function. The function applyFunction is curried. Given a relation rel, applyFunction checks that rel is a function, issuing an error message, and raising an exception, otherwise. If it is a function, it returns a function of type sym -> sym that, when called with a symbol a, will apply the function rel to a, issuing an error message if a is not in the domain of rel. functionFromTo(rel, bs, cs) tests whether rel is a function from bs to cs. The function injection tests whether a relation is an injective function. bijectionFromTo(rel, bs, cs) tests whether rel is a bijection from bs to cs. restrictFunction(rel, bs) restricts the function rel to bs; it issues an error message if rel is not a function, or bs is not a subset of the domain of rel. And, updateFunction(rel, a, b) returns the updating of the function rel to send a to b; it issues an error message if rel isn't a function. Here is how we can work with total orderings using functions from SymRel:

- val rel = SymRel.input "";
@ (0, 1), (1, 2), (0, 2), (0, 0), (1, 1), (2, 2)
@ .
val rel = - : sym_rel
- SymRel.output("", rel);
(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)
val it = () : unit
- SymSet.output("", SymRel.domain rel);
0, 1, 2
val it = () : unit
- SymSet.output("", SymRel.range rel);
0, 1, 2
val it = () : unit
- SymRel.relationFromTo
= (rel, SymSet.fromString "0, 1, 2", SymSet.fromString "0, 1, 2");
val it = true : bool
- SymRel.relationOn(rel, SymSet.fromString "0, 1, 2");
val it = true : bool

- SymRel.reflexive(rel, SymSet.fromString "0, 1, 2");
val it = true : bool
- SymRel.symmetric rel;
val it = false : bool
- SymRel.antisymmetric rel;
val it = true : bool
- SymRel.transitive rel;
val it = true : bool
- SymRel.total(rel, SymSet.fromString "0, 1, 2");
val it = true : bool
- val rel' = SymRel.inverse rel;
val rel' = - : sym_rel
- SymRel.output("", rel');
(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)
val it = () : unit
- val rel'' = SymRel.compose(rel', rel);
val rel'' = - : sym_rel
- SymRel.output("", rel'');
(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)
val it = () : unit

And here is how we can work with relations that are functions:

- val rel = SymRel.input "";
@ (1, 2), (2, 3), (3, 4)
@ .
val rel = - : sym_rel
- SymRel.output("", rel);
(1, 2), (2, 3), (3, 4)
val it = () : unit
- SymSet.output("", SymRel.domain rel);
1, 2, 3
val it = () : unit
- SymSet.output("", SymRel.range rel);
2, 3, 4
val it = () : unit
- SymRel.function rel;
val it = true : bool
- SymRel.functionFromTo
= (rel, SymSet.fromString "1, 2, 3", SymSet.fromString "2, 3, 4");
val it = true : bool
- SymRel.injection rel;
val it = true : bool
- SymRel.bijectionFromTo
= (rel, SymSet.fromString "1, 2, 3", SymSet.fromString "2, 3, 4");
val it = true : bool
- val f = SymRel.applyFunction rel;
val f = fn : sym -> sym
- Sym.output("", f(Sym.fromString "1"));

2
val it = () : unit
- Sym.output("", f(Sym.fromString "2"));
3
val it = () : unit
- Sym.output("", f(Sym.fromString "3"));
4
val it = () : unit
- Sym.output("", f(Sym.fromString "4"));
argument not in domain

uncaught exception Error
- val rel' = SymRel.input "";
@ (4, 3), (3, 2), (2, 1)
@ .
val rel' = - : sym_rel
- val rel'' = SymRel.compose(rel', rel);
val rel'' = - : sym_rel
- SymRel.functionFromTo
= (rel'', SymSet.fromString "1, 2, 3",
=  SymSet.fromString "1, 2, 3");
val it = true : bool
- SymRel.output("", rel'');
(1, 1), (2, 2), (3, 3)
val it = () : unit

2.3.9 Notes

The book and toolset were designed and developed together, which made it possible to minimize the notational and conceptual distance between the two.

Chapter 3

Regular Languages

In this chapter, we study our most restrictive set of languages, the regular languages. We begin by introducing regular expressions, and saying that a language is regular iff it is generated by a regular expression. We study regular expression equivalence, look at how regular expressions can be synthesized and proved correct, and study several algorithms for regular expression simplification.

We go on to study five kinds of finite automata, culminating in finite automata whose transitions are labeled by regular expressions. We introduce methods for synthesizing and proving the correctness of finite automata, and study numerous algorithms for processing and converting between regular expressions and finite automata. Because of these conversions, the set of languages accepted by the finite automata is exactly the regular languages. The chapter concludes by considering the application of regular expressions and finite automata to searching in text files, lexical analysis, and the design of finite state systems.

3.1 Regular Expressions and Languages

In this section, we define several operations on languages, say what regular expressions are, what they mean, and what regular languages are, and begin to show how regular expressions can be processed by Forlan.

3.1.1 Operations on Languages

The union, intersection and set-difference operations on sets are also operations on languages, i.e., if L1, L2 ∈ Lan, then L1 ∪ L2, L1 ∩ L2 and L1 − L2 are all languages. (Since L1, L2 ∈ Lan, we have that L1 ⊆ Σ1∗ and L2 ⊆ Σ2∗, for alphabets Σ1 and Σ2. Let Σ = Σ1 ∪ Σ2, so that Σ is an alphabet, L1 ⊆ Σ∗ and L2 ⊆ Σ∗. Thus L1 ∪ L2, L1 ∩ L2 and L1 − L2 are all subsets of Σ∗, and so are all languages.)


The first new operation on languages is language concatenation. The concatenation of languages L1 and L2 (L1 @ L2) is the language

{ x1 @ x2 | x1 ∈ L1 and x2 ∈ L2 }.

I.e., L1 @ L2 consists of all strings that can be formed by concatenating an element of L1 with an element of L2. For example,

{ab, abc} @ {cd, d} = {(ab)(cd), (ab)(d), (abc)(cd), (abc)(d)} = {abcd, abd, abccd}.

Note that, if L1, L2 ⊆ Σ∗, for an alphabet Σ, then L1 @ L2 ⊆ Σ∗. Concatenation of languages is associative: for all L1, L2, L3 ∈ Lan,

(L1 @ L2) @ L3 = L1 @ (L2 @ L3).

And, {%} is the identity for concatenation: for all L ∈ Lan,

{%} @ L = L @ {%} = L.

Furthermore, ∅ is the zero for concatenation: for all L ∈ Lan,

∅ @ L = L @ ∅ = ∅.

We often abbreviate L1 @ L2 to L1L2. Now that we know what language concatenation is, we can say what it means to raise a language to a power. We define the language Ln formed by raising a language L to the power n ∈ N by recursion on n:

L0 = {%}, for all L ∈ Lan; and
Ln+1 = LLn, for all L ∈ Lan and n ∈ N.

We assign this exponentiation operation higher precedence than concatenation, so that LLn means L(Ln) in the above definition. Note that, if L ⊆ Σ∗, for an alphabet Σ, then Ln ⊆ Σ∗, for all n ∈ N. For example, we have that

{a, b}2 = {a, b}{a, b}1 = {a, b}{a, b}{a, b}0 = {a, b}{a, b}{%} = {a, b}{a, b} = {aa, ab, ba, bb}.

Proposition 3.1.1 For all L ∈ Lan and n,m ∈ N, Ln+m = LnLm.

Proof. An easy mathematical induction on n. The language L and the natural number m can be fixed at the beginning of the proof. ✷

Thus, if L ∈ Lan and n ∈ N, then

Ln+1 = LLn (definition), and

Ln+1 = LnL1 = LnL (Proposition 3.1.1).

Another useful fact about language exponentiation is:

Proposition 3.1.2 For all w ∈ Str and n ∈ N, {w}n = {wn}.

Proof. By mathematical induction on n. ✷

For example, we have that {01}4 = {(01)4} = {01010101}.

Now we consider a language operation that is named after Stephen Cole Kleene, one of the founders of formal language theory. The Kleene closure (or just closure) of a language L (L∗) is the language

⋃ { Ln | n ∈ N }.

Thus, for all w,

w ∈ L∗ iff w ∈ A, for some A ∈ { Ln | n ∈ N } iff w ∈ Ln for some n ∈ N.

Or, in other words:

• L∗ = L0 ∪ L1 ∪ L2 ∪ · · · ; and

• L∗ consists of all strings that can be formed by concatenating together some number (maybe none) of elements of L (the same element of L can be used as many times as is desired).

For example,

{a, ba}∗ = {a, ba}0 ∪ {a, ba}1 ∪ {a, ba}2 ∪ · · ·
         = {%} ∪ {a, ba} ∪ {aa, aba, baa, baba} ∪ · · ·

If L is a language, then L ⊆ Σ∗ for some alphabet Σ, and thus L∗ is also a subset of Σ∗—showing that L∗ is a language, not just a set of strings. Suppose w ∈ Str. By Proposition 3.1.2, we have that, for all x,

x ∈ {w}∗ iff x ∈ {w}n, for some n ∈ N, iff x ∈ {wn}, for some n ∈ N, iff x = wn, for some n ∈ N.
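This characterization suggests a simple, if naive, way of testing whether a string x is an element of {w}∗: compare x against successive powers of w, stopping as soon as a power is at least as long as x. Below is a sketch using the Str module's power and equal functions; the name isPowerOf is ours:

(* a sketch: test whether x = w^n for some n in N; when w = %, the
   only power of w is % itself *)
fun isPowerOf (w : str, x : str) : bool =
  if null w
  then null x
  else
    let fun go n =
          let val p = Str.power(w, n)
          in if length p >= length x
             then Str.equal(p, x)
             else go(n + 1)
          end
    in go 0 end

E.g., isPowerOf(Str.fromString "01", Str.fromString "010101") evaluates to true.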

If we write {0, 1}∗, then this could mean:

• all strings over the alphabet {0, 1} (Section 2.1); or

• the closure of the language {0, 1}.

Fortunately, these languages are equal (both are all strings of 0's and 1's), and this kind of ambiguity is harmless. We assign our operations on languages relative precedences as follows:

Highest: closure ((·)∗) and raising to a power ((·)n);

Intermediate: concatenation (@, or just juxtapositioning); and

Lowest: union (∪), intersection (∩) and difference (−).

For example, if n ∈ N and A, B, C ∈ Lan, then A∗BCn ∪ B abbreviates ((A∗)B(Cn)) ∪ B. The language ((A ∪ B)C)∗ can't be abbreviated, since removing either pair of parentheses will change its meaning. If we removed the outer pair, then we would have (A ∪ B)(C∗), and removing the inner pair would yield (A ∪ (BC))∗.

Suppose L, L1 and L2 are languages, and n ∈ N. It is easy to see that alphabet(L1 ∪ L2) = alphabet(L1) ∪ alphabet(L2). And, if L1 and L2 are both nonempty, then alphabet(L1L2) = alphabet(L1) ∪ alphabet(L2), and otherwise, alphabet(L1L2) = ∅. Furthermore, if n ≥ 1, then alphabet(Ln) = alphabet(L); otherwise, alphabet(Ln) = ∅. Finally, we have that alphabet(L∗) = alphabet(L).

In Section 2.3, we introduced the Forlan module StrSet, which defines various functions for processing finite sets of strings, i.e., finite languages. This module also defines the functions

val concat : str set * str set -> str set
val power : str set * int -> str set

which implement our concatenation and exponentiation operations on finite languages. Here are some examples of how these functions can be used:

- val xs = StrSet.fromString "ab, cd";
val xs = - : str set
- val ys = StrSet.fromString "uv, wx";
val ys = - : str set
- StrSet.output("", StrSet.concat(xs, ys));
abuv, abwx, cduv, cdwx
val it = () : unit
- StrSet.output("", StrSet.power(xs, 0));
%
val it = () : unit
- StrSet.output("", StrSet.power(xs, 1));
ab, cd
val it = () : unit

- StrSet.output("", StrSet.power(xs, 2));
abab, abcd, cdab, cdcd
val it = () : unit
- StrSet.output("", StrSet.power(xs, 3));
ababab, ababcd, abcdab, abcdcd, cdabab, cdabcd, cdcdab, cdcdcd
val it = () : unit

3.1.2 Regular Expressions

Next, we define the set of all regular expressions. Let the set RegLab of regular expression labels be

Sym ∪ {%, $, ∗, @, +}.

Let the set Reg of regular expressions be the least subset of Tree_RegLab such that:

(empty string) % ∈ Reg;

(empty set) $ ∈ Reg;

(symbol) for all a ∈ Sym, a ∈ Reg;

(closure) for all α ∈ Reg, ∗(α) ∈ Reg;

(concatenation) for all α, β ∈ Reg, @(α, β) ∈ Reg; and

(union) for all α, β ∈ Reg, +(α, β) ∈ Reg.

This is yet another example of an inductive definition. The elements of Reg are precisely those RegLab-trees (trees (see Section 1.3) whose labels come from RegLab) that can be built using these six rules. Whenever possible, we will use the mathematical variables α, β and γ to name regular expressions. Since regular expressions are RegLab-trees, we may talk of their sizes and heights. For example,

+(@(∗(0), @(1, ∗(0))), %),

i.e., the tree

      +
     / \
    @   %
   / \
  ∗   @
  |  / \
  0 1   ∗
        |
        0

is a regular expression. On the other hand, the RegLab-tree ∗(∗, ∗) is not a regular expression, since it can't be built using our six rules. We order the elements of RegLab as follows:

% < $ < symbols in order < ∗ < @ < +.

It is important that + be the greatest element of RegLab; if this were not so, then the definition of weakly simplified regular expressions (see Section 3.3) would have to be altered. We order regular expressions first by their root labels, and then, recursively, by their children, working from left to right. For example, we have that

% < ∗(%) < ∗(@($, ∗($))) < ∗(@(a, %)) < @(%, $).
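Readers who worked through the SML of Section 2.3 may find it illuminating to see the six rules transcribed as a recursive datatype. The following is only a sketch for illustration: Forlan's actual reg type is abstract, and here we simplify symbols to SML strings:

(* a sketch of Reg as a recursive SML datatype *)
datatype reg =
    EmptyStr               (* % *)
  | EmptySet               (* $ *)
  | Sym of string          (* a, for a symbol a *)
  | Closure of reg         (* closure of a regular expression *)
  | Concat of reg * reg    (* concatenation of two regular expressions *)
  | Union of reg * reg     (* union of two regular expressions *)

(* the example regular expression 0*10* + % from above *)
val example =
  Union (Concat (Closure (Sym "0"),
                 Concat (Sym "1", Closure (Sym "0"))),
         EmptyStr)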

Because Reg is defined inductively, it gives rise to an induction principle.

Theorem 3.1.3 (Principle of Induction on Regular Expressions) Suppose P (α) is a property of a regular expression α. If

• P (%),

• P ($),

• for all a ∈ Sym, P (a),

• for all α ∈ Reg, if P (α), then P (∗(α)),

• for all α, β ∈ Reg, if P (α) and P (β), then P (@(α, β)), and

• for all α, β ∈ Reg, if P (α) and P (β), then P (+(α, β)), then

for all α ∈ Reg, P (α).

To increase readability, we use infix and postfix notation, abbreviating:

• ∗(α) to α∗;

• @(α, β) to α @ β; and

• +(α, β) to α + β.

We assign the operators (·)∗, @ and + the following precedences and associativities:

Highest: (·)∗;

Intermediate: @ (right associative); and

Lowest: + (right associative).

We parenthesize regular expressions when we need to override the default precedences and associativities, and for reasons of clarity. Furthermore, we often abbreviate α @ β to αβ. For example, we can abbreviate the regular expression

+(@(∗(0), @(1, ∗(0))), %) to 0∗ @ 1 @ 0∗ + % or 0∗10∗ + %. On the other hand, the regular expression ((0 + 1)2)∗ can't be further abbreviated, since removing either pair of parentheses would result in a different regular expression. Removing the outer pair would result in (0 + 1)(2∗) = (0 + 1)2∗, and removing the inner pair would yield (0 + (12))∗ = (0 + 12)∗.

Now we can say what regular expressions mean, using some of our language operations. The language generated by a regular expression α (L(α)) is defined by recursion:

L(%) = {%};
L($) = ∅;
L(a) = {[a]} = {a}, for all a ∈ Sym;
L(∗(α)) = L(α)∗, for all α ∈ Reg;
L(@(α, β)) = L(α) @ L(β), for all α, β ∈ Reg; and
L(+(α, β)) = L(α) ∪ L(β), for all α, β ∈ Reg.

This is a good definition since, if L is a language, then so is L∗, and, if L1 and L2 are languages, then so are L1L2 and L1 ∪ L2. We say that w is generated by α iff w ∈ L(α). For example,

L(0∗10∗ + %) = L(+(@(∗(0), @(1, ∗(0))), %))
             = L(@(∗(0), @(1, ∗(0)))) ∪ L(%)
             = L(∗(0))L(@(1, ∗(0))) ∪ {%}
             = L(0)∗L(1)L(∗(0)) ∪ {%}
             = {0}∗{1}L(0)∗ ∪ {%}
             = {0}∗{1}{0}∗ ∪ {%}
             = { 0n10m | n, m ∈ N } ∪ {%}.

E.g., 0001000, 10, 001 and % are generated by 0∗10∗ + %.

E.g., 0001000, 10, 001 and % are generated by 0∗10∗ + %. We define functions symToReg ∈ Sym→Reg and strToReg ∈ Str→Reg, as follows. Given a symbol a ∈ Sym, symToReg a = a. And, given a string x, strToReg x is the canonical regular expression for x: %, if x = %, and @(a1, @(a2, . . . an . . .)) = a1a2 . . . an, if x = a1a2 . . . an, for symbols a1, a2, . . . , an 82 Regular Languages and n ≥ 1. It is easy to see that, for all a ∈ Sym, L(symToReg a)= {a}, and, for all x ∈ Str, L(strToReg x)= {x}. We define the regular expression αn formed by raising a regular expression α to the power n ∈ N by recursion on n:

α0 = %, for all α ∈ Reg;
α1 = α, for all α ∈ Reg; and
αn+1 = ααn, for all α ∈ Reg and n ∈ N − {0}.

We assign this operation the same precedence as closure, so that ααn means α(αn) in the above definition. Note that, in contrast to the definitions of xn and Ln, we have split the case n+1 into two subcases, depending upon whether n = 0 or n ≥ 1. Thus α1 is α, not α%. For example, (0 + 1)3 = (0 + 1)(0 + 1)(0 + 1).
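In SML, this definition can be transcribed as recursion on n over the datatype sketch from earlier in this subsection; note how the two subcases of the successor case become two clauses (a sketch, with our own function name):

(* a sketch: raise a reg (from the earlier datatype sketch) to a power;
   assumes n >= 0, and mirrors the definition's two successor subcases *)
fun power (_ : reg, 0) = EmptyStr
  | power (r, 1) = r
  | power (r, n) = Concat (r, power (r, n - 1))

E.g., power(Sym "a", 3) yields Concat (Sym "a", Concat (Sym "a", Sym "a")), i.e., aaa.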

Proposition 3.1.4 For all α ∈ Reg and n ∈ N, L(αn)= L(α)n.

Proof. By mathematical induction on n, case-splitting in the inductive step. α may be fixed at the beginning of the proof. ✷

An example consequence of the proposition is that L((0 + 1)3) = L(0 + 1)3 = {0, 1}3, the set of all strings of 0's and 1's of length 3. We define alphabet ∈ Reg → Alp by recursion:

alphabet % = ∅;
alphabet $ = ∅;
alphabet a = {a}, for all a ∈ Sym;
alphabet(∗(α)) = alphabet α, for all α ∈ Reg;
alphabet(@(α, β)) = alphabet α ∪ alphabet β, for all α, β ∈ Reg; and
alphabet(+(α, β)) = alphabet α ∪ alphabet β, for all α, β ∈ Reg.

This is a good definition, since the union of two alphabets is an alphabet. For example, alphabet(0∗10∗ + %) = {0, 1}. We say that alphabet α is the alphabet of a regular expression α.
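This recursion is structural, and so is directly expressible over the datatype sketch given earlier in this subsection. Here alphabets are represented as sorted, duplicate-free lists of symbol names, with a hand-rolled merge; everything below is our own illustrative code, not Forlan's (Forlan's Reg.alphabet, described in the next subsection, returns a sym set):

(* a sketch: merge two sorted, duplicate-free string lists *)
fun merge (xs : string list, ys : string list) : string list =
  case (xs, ys) of
    (_, nil) => xs
  | (nil, _) => ys
  | (x :: xs', y :: ys') =>
      if x = y then x :: merge (xs', ys')
      else if x < y then x :: merge (xs', ys)
      else y :: merge (xs, ys')

(* a sketch: the alphabet of a reg, by structural recursion *)
fun alphabet EmptyStr = nil
  | alphabet EmptySet = nil
  | alphabet (Sym a) = [a]
  | alphabet (Closure r) = alphabet r
  | alphabet (Concat (r1, r2)) = merge (alphabet r1, alphabet r2)
  | alphabet (Union (r1, r2)) = merge (alphabet r1, alphabet r2)

E.g., alphabet example yields ["0", "1"], matching alphabet(0∗10∗ + %) = {0, 1}.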

Proposition 3.1.5 For all α ∈ Reg, alphabet(L(α)) ⊆ alphabet α.

In other words, the proposition says that every symbol of every string in L(α) comes from alphabet α.

Proof. An easy induction on regular expressions. ✷

For example, since L(1$) = {1}∅ = ∅, we have that

alphabet(L(0∗ + 1$)) = alphabet({0}∗) = {0} ⊆ {0, 1} = alphabet(0∗ + 1$).

Next, we define some useful auxiliary functions on regular expressions. The generalized concatenation function genConcat ∈ List Reg → Reg is defined by right recursion:

genConcat [] = %,
genConcat [α] = α, and
genConcat([α] @ ᾱ) = @(α, genConcat ᾱ), if ᾱ ≠ [].

And the generalized union function genUnion ∈ List Reg → Reg is defined by right recursion:

genUnion [] = $,
genUnion [α] = α, and
genUnion([α] @ ᾱ) = +(α, genUnion ᾱ), if ᾱ ≠ [].

E.g., genConcat[1, 0, 12, 3 + 4] = 10(12)(3 + 4) and genUnion[1, 0, 12, 3 + 4] = 1 + 0 + (12) + 3 + 4.

rightConcat ∈ Reg × Reg → Reg is defined by structural recursion on its first argument:

rightConcat(@(α1, α2), β) = @(α1, rightConcat(α2, β)), and
rightConcat(α, β) = @(α, β), if α is not a concatenation.

And rightUnion ∈ Reg × Reg → Reg is defined by structural recursion on its first argument:

rightUnion(+(α1, α2), β) = +(α1, rightUnion(α2, β)), and
rightUnion(α, β) = +(α, β), if α is not a union.

E.g., rightConcat(012, 345) = 012345 and rightUnion(0 + 1 + 2, 1 + 2 + 3) = 0 + 1 + 2 + 1 + 2 + 3.

concatsToList ∈ Reg → List Reg is defined by structural recursion:

concatsToList(@(α, β)) = [α] @ concatsToList β, and
concatsToList α = [α], if α is not a concatenation.

And unionsToList ∈ Reg → List Reg is defined by structural recursion:

unionsToList(+(α, β)) = [α] @ unionsToList β, and
unionsToList α = [α], if α is not a union.

E.g., concatsToList((12)34) = [12, 3, 4] and unionsToList((0 + 1) + 2 + 3) = [0 + 1, 2, 3].

Finally, sortUnions ∈ Reg → Reg is defined by:

sortUnions α = genUnion β̄,

where β̄ is the result of sorting the elements of unionsToList α into strictly ascending order (without duplicates), according to our total ordering on regular expressions. E.g., sortUnions(1 + 0 + 23 + 1) = 0 + 1 + 23.

allSym {0, 1, 2} = 0 + 1 + 2, and allStr {0, 1, 2} = (0 + 1 + 2)∗.

Thus, for all Σ ∈ Alp,

L(allSym Σ) = { [a] | a ∈ Σ } = { a | a ∈ Σ }, and
L(allStr Σ) = Σ∗.

Now we are able to say what it means for a language to be regular: a language L is regular iff L = L(α) for some α ∈ Reg. We define

RegLan = { L(α) | α ∈ Reg } = { L ∈ Lan | L is regular }.

Since every regular expression can be described, e.g., in fully parenthesized form, by a finite sequence of ASCII characters, we can enumerate the regular expressions, and consequently we have that Reg is countably infinite. Since {0}0, {0}1, {0}2, . . . , are all regular languages, we have that RegLan is infinite. Furthermore, we can establish an injection h from RegLan to Reg: hL is the first (in our enumeration of regular expressions) α such that L(α) = L. Because Reg is countably infinite, it follows that there is an injection from Reg to N. Composing these injections gives us an injection from RegLan to N. And when observing that RegLan is infinite, we implicitly gave an injection from N to RegLan. Thus, by the Schröder-Bernstein Theorem, we have that RegLan and N have the same size, so that RegLan is countably infinite. Since RegLan is countably infinite but Lan is uncountable, it follows that RegLan ⊊ Lan, i.e., there are non-regular languages. In Section 3.14, we will see a concrete example of a non-regular language.

3.1.3 Processing Regular Expressions in Forlan

Now, we turn to the Forlan implementation of regular expressions. The Forlan module Reg defines the abstract type reg (in the top-level environment) of regular expressions, as well as various functions and constants for processing regular expressions, including:

val input : string -> reg
val output : string * reg -> unit
val size : reg -> int
val numLeaves : reg -> int
val height : reg -> int
val emptyStr : reg
val emptySet : reg
val fromSym : sym -> reg
val closure : reg -> reg
val concat : reg * reg -> reg
val union : reg * reg -> reg
val compare : reg * reg -> order
val equal : reg * reg -> bool
val fromStr : str -> reg
val power : reg * int -> reg
val alphabet : reg -> sym set
val genConcat : reg list -> reg
val genUnion : reg list -> reg
val rightConcat : reg * reg -> reg
val rightUnion : reg * reg -> reg
val concatsToList : reg -> reg list
val unionsToList : reg -> reg list
val sortUnions : reg -> reg
val allSym : sym set -> reg
val allStr : sym set -> reg
val fromStrSet : str set -> reg

The Forlan syntax for regular expressions is the infix/postfix one introduced in the preceding subsection, where α @ β is always written as αβ, and we use parentheses to override default precedences/associativities, or simply for clarity. For example, 0∗10∗ + % and (0∗(1(0∗))) + % are the same regular expression. And, ((0∗)1)0∗ + % is a different regular expression, but one with the same meaning. Furthermore, 0∗1(0∗ + %) is not only different from the two preceding regular expressions, but it has a different meaning (it fails to generate %). When regular expressions are outputted, as few parentheses as possible are used.

The functions size, numLeaves and height return the size, number of leaves and height, respectively, of a regular expression. The values emptyStr and emptySet are % and $, respectively. The function fromSym takes in a symbol a and returns the regular expression a. It is available in the top-level environment as symToReg. The function closure takes in a regular expression α and returns ∗(α). The function concat takes a pair (α, β) of regular expressions and returns @(α, β). The function union takes a pair (α, β) of regular expressions and returns +(α, β). The function compare implements our total ordering on regular expressions, and equal tests whether two regular expressions are equal. The function fromStr implements the function strToReg, and is also available in the top-level environment as strToReg. The function power raises a regular expression to a power, and the function alphabet returns the alphabet of a regular expression. Finally, the functions genConcat, genUnion, rightConcat, rightUnion, concatsToList, unionsToList, sortUnions, allSym and allStr implement the functions with the same names. The function fromStrSet returns $, if called with the empty set. Otherwise, it returns fromStr x1 + · · · + fromStr xn, where x1, . . . , xn are the elements of its argument, listed in strictly ascending order. Here are some example uses of the functions of Reg:

- val reg = Reg.input "";
@ 0*10* + %
@ .
val reg = - : reg
- Reg.size reg;
val it = 9 : int
- Reg.numLeaves reg;
val it = 4 : int
- val reg' = Reg.fromStr(Str.power(Str.input "", 3));
@ 01
@ .
val reg' = - : reg
- Reg.output("", reg');
010101
val it = () : unit
- Reg.size reg';
val it = 11 : int
- Reg.numLeaves reg';
val it = 6 : int
- Reg.compare(reg, reg');
val it = GREATER : order
- val reg'' = Reg.concat(Reg.closure reg, reg');
val reg'' = - : reg
- Reg.output("", reg'');
(0*10* + %)*010101
val it = () : unit
- SymSet.output("", Reg.alphabet reg'');

0, 1
val it = () : unit
- val reg''' = Reg.power(reg, 3);
val reg''' = - : reg
- Reg.output("", reg''');
(0*10* + %)(0*10* + %)(0*10* + %)
val it = () : unit
- Reg.size reg''';
val it = 29 : int
- Reg.numLeaves reg''';
val it = 12 : int
- Reg.output("", Reg.fromString "(0*(1(0*))) + %");
0*10* + %
val it = () : unit
- Reg.output("", Reg.fromString "(0*1)0* + %");
(0*1)0* + %
val it = () : unit
- Reg.output("", Reg.fromString "0*1(0* + %)");
0*1(0* + %)
val it = () : unit
- Reg.equal
= (Reg.fromString "0*10* + %",
=  Reg.fromString "0*1(0* + %)");
val it = false : bool

We can use the functions genConcat, genUnion, rightConcat, rightUnion, concatsToList, unionsToList and sortUnions as follows:

- Reg.output("", Reg.genConcat nil);
%
val it = () : unit
- Reg.output("", Reg.genUnion nil);
$
val it = () : unit
- val regs =
= [Reg.fromString "01", Reg.fromString "01 + 12",
=  Reg.fromString "(1 + 2)*", Reg.fromString "3 + 4"];
val regs = [-,-,-,-] : reg list
- Reg.output("", Reg.genConcat regs);
(01)(01 + 12)(1 + 2)*(3 + 4)
val it = () : unit
- Reg.output("", Reg.genUnion regs);
01 + (01 + 12) + (1 + 2)* + 3 + 4
val it = () : unit
- Reg.output
= ("",
=  Reg.rightConcat
=  (Reg.fromString "0123", Reg.fromString "4567"));
01234567

val it = () : unit
- Reg.output
= ("",
=  Reg.rightUnion
=  (Reg.fromString "0 + 1 + 2", Reg.fromString "1 + 2 + 3"));
0 + 1 + 2 + 1 + 2 + 3
val it = () : unit
- map
= Reg.toString
= (Reg.concatsToList(Reg.fromString "0(12)34"));
val it = ["0","12","3","4"] : string list
- map
= Reg.toString
= (Reg.unionsToList(Reg.fromString "0 + (1 + 2) + 3 + 0"));
val it = ["0","1 + 2","3","0"] : string list
- Reg.output
= ("", Reg.sortUnions(Reg.fromString "12 + 0 + 3 + 0"));
0 + 3 + 12
val it = () : unit

We can use the functions allSym, allStr and fromStrSet like this:

- Reg.output("", Reg.allSym(SymSet.fromString ""));
$
val it = () : unit
- Reg.output("", Reg.allSym(SymSet.fromString "0"));
0
val it = () : unit
- Reg.output("", Reg.allSym(SymSet.fromString "0, 1, 2"));
0 + 1 + 2
val it = () : unit
- Reg.output("", Reg.allStr(SymSet.fromString ""));
$*
val it = () : unit
- Reg.output("", Reg.allStr(SymSet.fromString "0"));
0*
val it = () : unit
- Reg.output("", Reg.allStr(SymSet.fromString "2, 1, 0"));
(0 + 1 + 2)*
val it = () : unit
- Reg.output
= ("",
=  Reg.fromStrSet(StrSet.fromString "one, two, three, four"));
one + two + four + three
val it = () : unit

3.1.4 JForlan

The Java program JForlan can be used to view and edit regular expression trees. It can be invoked directly, or run via Forlan. See the Forlan website for more information.

3.1.5 Notes

A novel feature of this book is that regular expressions are trees, so that our linear syntax for regular expressions is derived rather than primary. Thus regular expression equality is just tree equality, and it's easy to explain when parentheses are necessary in a linear description of a regular expression. Furthermore, tree-oriented concepts, notation and operations automatically apply to regular expressions, letting us, e.g., give definitions by structural recursion.

3.2 Equivalence and Correctness of Regular Expressions

In this section, we say what it means for regular expressions to be equivalent, show a series of results about regular expression equivalence, and consider how regular expressions may be designed and proved correct.

3.2.1 Equivalence of Regular Expressions

Regular expressions α and β are equivalent iff L(α) = L(β). In other words, α and β are equivalent iff α and β generate the same language. We define a relation ≈ on Reg by: α ≈ β iff α and β are equivalent. For example, L((00)∗ + %) = L((00)∗), and thus (00)∗ + % ≈ (00)∗. One approach to showing that α ≈ β is to show that L(α) ⊆ L(β) and L(β) ⊆ L(α). The following proposition is useful for showing language inclusions, not just ones involving regular languages.

Proposition 3.2.1 (1) For all A1, A2, B1, B2 ∈ Lan, if A1 ⊆ B1 and A2 ⊆ B2, then A1 ∪ A2 ⊆ B1 ∪ B2.

(2) For all A1, A2, B1, B2 ∈ Lan, if A1 ⊆ B1 and A2 ⊆ B2, then A1 ∩ A2 ⊆ B1 ∩ B2.

(3) For all A1, A2, B1, B2 ∈ Lan, if A1 ⊆ B1 and B2 ⊆ A2, then A1 − A2 ⊆ B1 − B2.

(4) For all A1, A2, B1, B2 ∈ Lan, if A1 ⊆ B1 and A2 ⊆ B2, then A1A2 ⊆ B1B2.

(5) For all A, B ∈ Lan and n ∈ N, if A ⊆ B, then An ⊆ Bn.

(6) For all A, B ∈ Lan, if A ⊆ B, then A∗ ⊆ B∗.

In Part (3), note that the second part of the sufficient condition for knowing A1 − A2 ⊆ B1 − B2 is B2 ⊆ A2, not A2 ⊆ B2.

Proof. (1) and (2) are straightforward. We show (3) as an example, below. (4) is easy. (5) is proved by mathematical induction, using (4). (6) is proved using (5).

For (3), suppose that A1, A2, B1, B2 ∈ Lan, A1 ⊆ B1 and B2 ⊆ A2. To show that A1 − A2 ⊆ B1 − B2, suppose w ∈ A1 − A2. We must show that w ∈ B1 − B2. It will suffice to show that w ∈ B1 and w ∉ B2. Since w ∈ A1 − A2, we have that w ∈ A1 and w ∉ A2. Since A1 ⊆ B1, it follows that w ∈ B1. Thus, it remains to show that w ∉ B2. Suppose, toward a contradiction, that w ∈ B2. Since B2 ⊆ A2, it follows that w ∈ A2—contradiction. Thus we have that w ∉ B2. ✷

Next we show that our relation ≈ has some of the familiar properties of equality.

Proposition 3.2.2 (1) ≈ is reflexive on Reg, symmetric and transitive.

(2) For all α, β ∈ Reg, if α ≈ β, then α∗ ≈ β∗.

(3) For all α1, α2, β1, β2 ∈ Reg, if α1 ≈ β1 and α2 ≈ β2, then α1α2 ≈ β1β2.

(4) For all α1, α2, β1, β2 ∈ Reg, if α1 ≈ β1 and α2 ≈ β2, then α1 + α2 ≈ β1 + β2.

Proof. Follows from the properties of =. As an example, we show Part (4). Suppose α1, α2, β1, β2 ∈ Reg, and assume that α1 ≈ β1 and α2 ≈ β2. Then L(α1) = L(β1) and L(α2) = L(β2), so that

L(α1 + α2) = L(α1) ∪ L(α2) = L(β1) ∪ L(β2) = L(β1 + β2).

Thus α1 + α2 ≈ β1 + β2. ✷

A consequence of Proposition 3.2.2 is the following proposition, which says that, if we replace a subtree of a regular expression α by an equivalent regular expression, then the resulting regular expression is equivalent to α.

Proposition 3.2.3 Suppose α, β, β′ ∈ Reg, β ≈ β′, pat ∈ Path is valid for α, and β is the subtree of α at position pat. Let α′ be the result of replacing the subtree at position pat in α by β′. Then α ≈ α′.

Proof. By induction on α. ✷

Next, we state and prove some equivalences involving union.

Proposition 3.2.4 (1) For all α, β ∈ Reg, α + β ≈ β + α.

(2) For all α, β, γ ∈ Reg, (α + β)+ γ ≈ α + (β + γ).

(3) For all α ∈ Reg, $ + α ≈ α.

(4) For all α ∈ Reg, α + α ≈ α.

(5) If L(α) ⊆ L(β), then α + β ≈ β.

Proof.

(1) Follows from the commutativity of ∪.

(2) Follows from the associativity of ∪.

(3) Follows since ∅ is the identity for ∪.

(4) Follows since ∪ is idempotent: A ∪ A = A, for all sets A.

(5) Follows since, if L1 ⊆ L2, then L1 ∪ L2 = L2. ✷

Next, we consider equivalences for concatenation.

Proposition 3.2.5 (1) For all α, β, γ ∈ Reg, (αβ)γ ≈ α(βγ).

(2) For all α ∈ Reg, %α ≈ α ≈ α%.

(3) For all α ∈ Reg, $α ≈ $ ≈ α$.

Proof.

(1) Follows from the associativity of language concatenation.

(2) Follows since {%} is the identity for language concatenation.

(3) Follows since ∅ is the zero for language concatenation. ✷

Next we consider the distributivity of concatenation over union. First, we prove a proposition concerning languages. Then, we use this proposition to show the corresponding proposition for regular expressions.

Proposition 3.2.6 (1) For all L1, L2, L3 ∈ Lan, L1(L2 ∪ L3) = L1L2 ∪ L1L3.

(2) For all L1, L2, L3 ∈ Lan, (L1 ∪ L2)L3 = L1L3 ∪ L2L3.

Proof. We show the proof of Part (1); the proof of the other part is similar. Suppose L1,L2,L3 ∈ Lan. It will suffice to show that

L1(L2 ∪ L3) ⊆ L1L2 ∪ L1L3 ⊆ L1(L2 ∪ L3).

To see that L1(L2 ∪ L3) ⊆ L1L2 ∪ L1L3, suppose w ∈ L1(L2 ∪ L3). We must show that w ∈ L1L2 ∪ L1L3. By our assumption, w = xy for some x ∈ L1 and y ∈ L2 ∪ L3. There are two cases to consider.

• Suppose y ∈ L2. Then w = xy ∈ L1L2 ⊆ L1L2 ∪ L1L3.

• Suppose y ∈ L3. Then w = xy ∈ L1L3 ⊆ L1L2 ∪ L1L3.

To see that L1L2 ∪ L1L3 ⊆ L1(L2 ∪ L3), suppose w ∈ L1L2 ∪ L1L3. We must show that w ∈ L1(L2 ∪ L3). There are two cases to consider.

• Suppose w ∈ L1L2. Then w = xy for some x ∈ L1 and y ∈ L2. Thus y ∈ L2 ∪ L3, so that w = xy ∈ L1(L2 ∪ L3).

• Suppose w ∈ L1L3. Then w = xy for some x ∈ L1 and y ∈ L3. Thus y ∈ L2 ∪ L3, so that w = xy ∈ L1(L2 ∪ L3). ✷

Proposition 3.2.7 (1) For all α, β, γ ∈ Reg, α(β + γ) ≈ αβ + αγ.

(2) For all α, β, γ ∈ Reg, (α + β)γ ≈ αγ + βγ.

Proof. Follows from Proposition 3.2.6. Consider, e.g., the proof of Part (1). By Proposition 3.2.6(1), we have that

L(α(β + γ)) = L(α)L(β + γ) = L(α)(L(β) ∪ L(γ)) = L(α)L(β) ∪ L(α)L(γ) = L(αβ) ∪ L(αγ) = L(αβ + αγ)

Thus α(β + γ) ≈ αβ + αγ. ✷

Finally, we turn our attention to equivalences for Kleene closure, first stating and proving some results for languages, and then stating and proving the corresponding results for regular expressions.

Proposition 3.2.8 • For all L ∈ Lan, LL∗ ⊆ L∗.

• For all L ∈ Lan, L∗L ⊆ L∗.

Proof. E.g., to see that LL∗ ⊆ L∗, suppose w ∈ LL∗. Then w = xy for some x ∈ L and y ∈ L∗. Hence y ∈ Ln for some n ∈ N. Thus w = xy ∈ LLn = Ln+1 ⊆ L∗. ✷

Proposition 3.2.9 (1) ∅∗ = {%}.

(2) {%}∗ = {%}.

(3) For all L ∈ Lan, L∗L = LL∗.

(4) For all L ∈ Lan, L∗L∗ = L∗.

(5) For all L ∈ Lan, (L∗)∗ = L∗.

(6) For all L1, L2 ∈ Lan, (L1L2)∗L1 = L1(L2L1)∗.

Proof. The six parts can be proven in order using Proposition 3.2.1. All parts but (2), (5) and (6) can be proved without using induction. As an example, we show the proof of (5). To show that (L∗)∗ = L∗, it will suffice to show that (L∗)∗ ⊆ L∗ ⊆ (L∗)∗. To see that (L∗)∗ ⊆ L∗, we use mathematical induction to show that, for all n ∈ N, (L∗)n ⊆ L∗.

(Basis Step) We have that (L∗)0 = {%} = L0 ⊆ L∗.

(Inductive Step) Suppose n ∈ N, and assume the inductive hypothesis: (L∗)n ⊆ L∗. We must show that (L∗)n+1 ⊆ L∗. By the inductive hypothe- sis, Proposition 3.2.1(4) and Part (4), we have that (L∗)n+1 = L∗(L∗)n ⊆ L∗L∗ = L∗.

Now, we use the result of the induction to prove that (L∗)∗ ⊆ L∗. Suppose w ∈ (L∗)∗. We must show that w ∈ L∗. Since w ∈ (L∗)∗, we have that w ∈ (L∗)n for some n ∈ N. Thus, by the result of the induction, w ∈ (L∗)n ⊆ L∗. Finally, for the other inclusion, we have that L∗ = (L∗)1 ⊆ (L∗)∗. ✷

Exercise 3.2.10 Prove Proposition 3.2.9(6), i.e., that for all L1, L2 ∈ Lan, (L1L2)∗L1 = L1(L2L1)∗.

Proposition 3.2.11 (1) $∗ ≈ %.

(2) %∗ ≈ %.

(3) For all α ∈ Reg, α∗α ≈ α α∗.

(4) For all α ∈ Reg, α∗α∗ ≈ α∗.

(5) For all α ∈ Reg, (α∗)∗ ≈ α∗.

(6) For all α, β ∈ Reg, (αβ)∗α ≈ α(βα)∗.

Proof. Follows from Proposition 3.2.9. Consider, e.g., the proof of Part (5). By Proposition 3.2.9(5), we have that L((α∗)∗)= L(α∗)∗ = (L(α)∗)∗ = L(α)∗ = L(α∗). Thus (α∗)∗ ≈ α∗. ✷

3.2.2 Proving the Correctness of Regular Expressions

In this subsection, we use two examples to show how regular expressions can be designed and proved correct.

For our first example, define a function zeros ∈ {0, 1}∗ → N by recursion:

zeros % = 0,
zeros(0w) = zeros w + 1, for all w ∈ {0, 1}∗, and
zeros(1w) = zeros w, for all w ∈ {0, 1}∗.

Thus zeros w is the number of occurrences of 0 in w. It is easy to show that:

• zeros 0 = 1;

• zeros 1 = 0;

• for all x, y ∈ {0, 1}∗, zeros(xy) = zeros x + zeros y;

• for all n ∈ N, zeros(0n) = n; and

• for all n ∈ N, zeros(1n) = 0.

Let X = { w ∈ {0, 1}∗ | zeros w is even }, so that X is all strings of 0's and 1's with an even number of 0's. Clearly, % ∈ X and {1}∗ ⊆ X.

Let's consider the problem of finding a regular expression that generates X. A string with this property would begin with some number of 1's (possibly none). After this, the string would have some number of parts (possibly none), each consisting of a 0, followed by some number of 1's, followed by a 0, followed by some number of 1's. The above considerations lead us to the regular expression

α = 1∗(01∗01∗)∗.

To prove L(α) = X, it's helpful to give names to the meanings of two parts of α. Let Y = {0}{1}∗{0}{1}∗ and Z = {1}∗Y ∗, so that L(01∗01∗) = Y and L(α) = Z. Thus it will suffice to prove that Z = X, and we do this by showing Z ⊆ X ⊆ Z. We begin by showing Z ⊆ X.
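As a brief aside before the proof: because zeros is defined by structural recursion on strings, it's easy to transcribe into SML and experiment with. The following sketch is not part of Forlan; it represents strings over {0, 1} as character lists, mirroring the recursive definition above:

(* zeros w = the number of occurrences of 0 in w *)
fun zeros (nil : char list) : int = 0
  | zeros (#"0" :: rest)          = 1 + zeros rest
  | zeros (_ :: rest)             = zeros rest

For instance, zeros (explode "10011") evaluates to 2, which is even, so 10011 ∈ X; membership in X can be tested with fn w => zeros w mod 2 = 0.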

Lemma 3.2.12 (1) Y ⊆ X.

(2) XX ⊆ X.

Proof. (1) Suppose w ∈ Y , so that w = 0x0y for some x, y ∈ {1}∗. Thus zeros w = zeros(0x0y) = zeros 0 + zeros x + zeros 0 + zeros y = 1 + 0 + 1 + 0 = 2 is even, so that w ∈ X.

(2) Suppose w ∈ XX, so that w = xy for some x, y ∈ X. Then zeros x and zeros y are even, so that zeros w = zeros x + zeros y is even. Hence w ∈ X. ✷

Lemma 3.2.13 Y ∗ ⊆ X.

Proof. It will suffice to show that, for all n ∈ N, Y n ⊆ X, and we show this using mathematical induction.

(Basis Step) We have that Y 0 = {%} ⊆ X.

(Inductive Step) Suppose n ∈ N, and assume the inductive hypothesis: Y n ⊆ X. Then Y n+1 = YY n ⊆ XX ⊆ X, by Lemma 3.2.12. ✷

Lemma 3.2.14 Z ⊆ X.

Proof. By Lemmas 3.2.12 and 3.2.13, we have that Z = {1}∗Y ∗ ⊆ XX ⊆ X. ✷

To prove X ⊆ Z, it’s helpful to define another language:

U = { w ∈ X | 1 is not a prefix of w }.

Lemma 3.2.15 U ⊆ Y ∗.

Proof. Because U ⊆ {0, 1}∗, it will suffice to show that, for all w ∈ {0, 1}∗,

if w ∈ U, then w ∈ Y ∗.

We proceed by strong string induction. Suppose w ∈ {0, 1}∗, and assume the inductive hypothesis: for all x ∈ {0, 1}∗, if x is a proper substring of w, then

if x ∈ U, then x ∈ Y ∗.

We must show that if w ∈ U, then w ∈ Y ∗. Suppose w ∈ U, so that zeros w is even and 1 is not a prefix of w. We must show that w ∈ Y ∗. If w = %, then w ∈ Y ∗. Otherwise, w = 0x for some x ∈ {0, 1}∗. Since 1 + zeros x = zeros 0 + zeros x = zeros(0x) = zeros w is even, we have that zeros x is odd. Let n be the largest element of N such that 1n is a prefix of x. (n is well-defined, since 10 = % is a prefix of x.) Thus x = 1ny for some y ∈ {0, 1}∗. Since zeros y = 0 + zeros y = zeros 1n + zeros y = zeros(1ny) = zeros x is odd, we have that y ≠ %. And, by the definition of n, 1 is not a prefix of y. Hence y = 0z for some z ∈ {0, 1}∗. Since 1 + zeros z = zeros 0 + zeros z = zeros(0z) = zeros y is odd, we have that zeros z is even. Let m be the largest element of N such that 1m is a prefix of z. Thus z = 1mu for some u ∈ {0, 1}∗, and 1 is not a prefix of u. Since zeros u = 0 + zeros u = zeros 1m + zeros u = zeros(1mu) = zeros z is even, it follows that u ∈ U. Summarizing, we have that w = 0x = 01ny = 01n0z = 01n01mu and u ∈ U. Since u is a proper substring of w, the inductive hypothesis tells us that u ∈ Y ∗. Hence w = 01n01mu = (01n01m)u ∈ YY ∗ ⊆ Y ∗. ✷

Lemma 3.2.16 X ⊆ Z.

Proof. Suppose w ∈ X. Let n be the largest element of N such that 1n is a prefix of w. Thus w = 1nx for some x ∈ {0, 1}∗. Since w ∈ X, we have that zeros x = 0 + zeros x = zeros 1n + zeros x = zeros w is even, so that x ∈ X. By the definition of n, we have that 1 is not a prefix of x, and thus x ∈ U. Hence w = 1nx ∈ {1}∗U ⊆ {1}∗Y ∗ = Z, by Lemma 3.2.15. ✷

By Lemmas 3.2.14 and 3.2.16, we have that Z ⊆ X ⊆ Z, completing our proof that α is correct.

Our second example of regular expression design and proof of correctness involves the languages

A = {001, 011, 101, 111}, and
B = { w ∈ {0, 1}∗ | for all x, y ∈ {0, 1}∗, if w = x0y, then there is a z ∈ A such that z is a prefix of y }.

The elements of A can be thought of as the odd numbers between 1 and 7, expressed in binary, and B consists of those strings of 0's and 1's in which every occurrence of 0 is immediately followed by an element of A. E.g., % ∈ B, since the empty string has no occurrences of 0, and 00111 is in B, since its first 0 is followed by 011 and its second 0 is followed by 111. But 0000111 is not in B, since its first 0 is followed by 000, which is not in A. And 011 is not in B, since |11| < 3.
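Like zeros above, the defining condition of B is easy to test computationally. The following SML sketch (again hypothetical, not part of Forlan) exploits the fact that A is exactly the set of length-3 strings over {0, 1} whose last symbol is 1, so a 0 is followed by an element of A just when at least three symbols follow it, the third being 1:

(* inB w: is every occurrence of 0 in w immediately followed by
   three symbols, the third of which is 1 (i.e., an element of A)? *)
fun inB (nil : char list) = true
  | inB (#"0" :: rest)    =
      (case rest of
         _ :: _ :: #"1" :: _ => inB rest
       | _                   => false)
  | inB (_ :: rest)       = inB rest

E.g., inB (explode "00111") evaluates to true and inB (explode "0000111") to false, matching the examples above.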

Note that, for all x, y ∈ B, xy ∈ B, i.e., BB ⊆ B. This holds since: each occurrence of 0 in x is followed by an element of A in x, and is thus followed by the same element of A in xy; and each occurrence of 0 in y is followed by an element of A in y, and is thus followed by the same element of A in xy. Furthermore, for all strings x, y, if xy ∈ B, then y ∈ B, i.e., every suffix of an element of B is also in B. This holds since, if there were an occurrence of 0 in y that wasn't followed by an element of A, then this same occurrence of 0 in the suffix y of xy would also not be followed by an element of A, contradicting xy ∈ B.

How should we go about finding a regular expression α such that L(α) = B? Because % ∈ B, for all x, y ∈ B, xy ∈ B, and for all strings x, y, if xy ∈ B then y ∈ B, our regular expression can have the form β∗, where β generates all the strings that are basic in the sense that they are nonempty elements of B with no nonempty proper prefixes that are in B.

Let's try to understand what the basic strings look like. Clearly, 1 is basic, so there will be no more basic strings that begin with 1. But what about the basic strings beginning with 0? No sequence of 0's is basic, and any string that begins with four or more 0's will not be basic. It is easy to see that 000111 is basic. In fact, it is the only basic string of the form 000u. (The first 0 forces u to begin with 1, the second 0 forces u to continue with 1, and the third forces u to continue with 1. And, if |u| > 3, then the overall string would have a nonempty, proper prefix in B, and so wouldn't be basic.) Similarly, 00111 is the only basic string beginning with 001. But what about the basic strings beginning with 01? It's not hard to see that there are infinitely many such strings: 0111, 010111, 01010111, 0101010111, etc. Fortunately, there is a simple pattern here: we have all strings of the form 0(10)n111 for n ∈ N.

By the above considerations, it seems that we can let our regular expression be (1 + 0(10)∗111 + 00111 + 000111)∗. But, using some of the equivalences we learned about above, we can turn this regular expression into

(1 + 0(0 + 00 + (10)∗)111)∗,

which we take as our α. Now, we prove that L(α) = B. Let X = {0} ∪ {00} ∪ {10}∗ and Y = {1} ∪ {0}X{111}. Then, we have that

X = L(0 + 00 + (10)∗),
Y = L(1 + 0(0 + 00 + (10)∗)111), and
Y ∗ = L((1 + 0(0 + 00 + (10)∗)111)∗) = L(α).

Thus, it will suffice to show that Y ∗ = B. We will show that Y ∗ ⊆ B ⊆ Y ∗. To begin with, we would like to use mathematical induction to prove that, for all n ∈ N, {0}{10}n{111} ⊆ B. But in order for the inductive step to succeed, we must prove something stronger. Let

C = { w ∈ B | 01 is a prefix of w }.

Lemma 3.2.17 For all n ∈ N, {0}{10}n{111}⊆ C.

Proof. We proceed by mathematical induction.

(Basis Step) Since 01 is a prefix of 0111, and 0111 ∈ B, we have that 0111 ∈ C. Hence {0}{10}0{111} = {0}{%}{111} = {0}{111} = {0111}⊆ C.

(Inductive Step) Suppose n ∈ N, and assume the inductive hypothesis: {0}{10}n{111}⊆ C. We must show that {0}{10}n+1{111}⊆ C. Since

{0}{10}n+1{111} = {0}{10}{10}n{111} = {01}{0}{10}n{111} ⊆ {01}C (inductive hypothesis),

it will suffice to show that {01}C ⊆ C. Suppose w ∈ {01}C. We must show that w ∈ C. We have that w = 01x for some x ∈ C. Thus w begins with 01. It remains to show that w ∈ B. Since x ∈ C, we have that x begins with 01. Thus the first occurrence of 0 in w = 01x is followed by 101 ∈ A. Furthermore, any other occurrence of 0 in w = 01x is within x, and so is followed by an element of A because x ∈ C ⊆ B. Thus w ∈ B. ✷

Lemma 3.2.18 Y ⊆ B.

Proof. Suppose w ∈ Y . We must show that w ∈ B. If w = 1, then w ∈ B. Otherwise, we have that w = 0x111 for some x ∈ X. There are three cases to consider.

• Suppose x = 0. Then w = 00111 is in B.

• Suppose x = 00. Then w = 000111 is in B.

• Suppose x ∈ {10}∗. Then x ∈ {10}n for some n ∈ N. By Lemma 3.2.17, we have that w = 0x111 ∈ {0}{10}n{111} ⊆ C ⊆ B. ✷

Lemma 3.2.19 Y ∗ ⊆ B.

Proof. It will suffice to show that, for all n ∈ N, Y n ⊆ B, and we proceed by mathematical induction.

(Basis Step) Since % ∈ B, we have that Y 0 = {%}⊆ B.

(Inductive Step) Suppose n ∈ N, and assume the inductive hypothesis: Y n ⊆ B. Then Y n+1 = YY n ⊆ BB ⊆ B, by Lemma 3.2.18 and the inductive hypothesis. ✷

Lemma 3.2.20 B ⊆ Y ∗.

Proof. Since B ⊆ {0, 1}∗, it will suffice to show that, for all w ∈ {0, 1}∗,

if w ∈ B, then w ∈ Y ∗.

We proceed by strong string induction. Suppose w ∈ {0, 1}∗, and assume the inductive hypothesis: for all x ∈ {0, 1}∗, if x is a proper substring of w, then

if x ∈ B, then x ∈ Y ∗.

We must show that if w ∈ B, then w ∈ Y ∗. Suppose w ∈ B. We must show that w ∈ Y ∗. There are three main cases to consider.

• Suppose w = %. Then w ∈ {%} = Y 0 ⊆ Y ∗.

• Suppose w = 0x for some x ∈ {0, 1}∗. Since w ∈ B, the first 0 of w must be followed by an element of A. Hence x ≠ %, so that there are two cases to consider.

– Suppose x = 0y for some y ∈ {0, 1}∗. Thus w = 0x = 00y. Since 00y = w ∈ B, we have that y ≠ %. Thus, there are two cases to consider.

∗ Suppose y = 0z for some z ∈ {0, 1}∗. Thus w = 00y = 000z. Since the first 0 in 000z = w is followed by an element of A, and the only element of A that begins with 00 is 001, we have that z = 1u for some u ∈ {0, 1}∗. Thus w = 0001u. Since the second 0 in 0001u = w is followed by an element of A, and 011 is the only element of A that begins with 01, we have that u = 1v for some v ∈ {0, 1}∗. Thus w = 00011v. Since the third 0 in 00011v = w is followed by an element of A, and 111 is the only element of A that begins with 11, we have that v = 1t for some t ∈ {0, 1}∗. Thus w = 000111t. Since 00 ∈ X, we have that 000111 = (0)(00)(111) ∈ {0}X{111} ⊆ Y . Because t is a suffix of w, it follows that t ∈ B. Thus the inductive hypothesis tells us that t ∈ Y ∗. Hence w = (000111)t ∈ YY ∗ ⊆ Y ∗.

∗ Suppose y = 1z for some z ∈ {0, 1}∗. Thus w = 00y = 001z. Since the first 0 in 001z = w is followed by an element of A, and the only element of A that begins with 01 is 011, we have that z = 1u for some u ∈ {0, 1}∗. Thus w = 0011u. Since the second 0 in 0011u = w is followed by an element of A, and 111 is the only element of A that begins with 11, we have that u = 1v for some v ∈ {0, 1}∗. Thus w = 00111v. Since 0 ∈ X, we have that 00111 = (0)(0)(111) ∈ {0}X{111} ⊆ Y . Because v is a suffix of w, it follows that v ∈ B. Thus the inductive hypothesis tells us that v ∈ Y ∗. Hence w = (00111)v ∈ YY ∗ ⊆ Y ∗.

– Suppose x = 1y for some y ∈ {0, 1}∗. Thus w = 0x = 01y. Since w ∈ B, we have that y ≠ %. Thus, there are two cases to consider.

∗ Suppose y = 0z for some z ∈ {0, 1}∗. Thus w = 010z. Let u be the longest prefix of z that is in {10}∗. (Since % is a prefix of z and is in {10}∗, it follows that u is well-defined.) Let v ∈ {0, 1}∗ be such that z = uv. Thus w = 010z = 010uv. Suppose, toward a contradiction, that v begins with 10. Then u10 is a prefix of z = uv that is longer than u. Furthermore u10 ∈ {10}∗{10} ⊆ {10}∗, contradicting the definition of u. Thus we have that v does not begin with 10. Next, we show that 010u ends with 010. Since u ∈ {10}∗, we have that u ∈ {10}n for some n ∈ N. There are three cases to consider.

· Suppose n = 0. Since u ∈ {10}0 = {%}, we have that u = %. Thus 010u = 010 ends with 010.

· Suppose n = 1. Since u ∈ {10}1 = {10}, we have that u = 10. Hence 010u = 01010 ends with 010.

· Suppose n ≥ 2. Then n − 2 ≥ 0, so that u ∈ {10}(n−2)+2 = {10}n−2{10}2. Hence u ends with 1010, showing that 010u ends with 010.

Summarizing, we have that w = 010uv, u ∈ {10}∗, 010u ends with 010, and v does not begin with 10. Since the second-to-last 0 in 010u is followed in w by an element of A, and 101 is the only element of A that begins with 10, we have that v = 1s for some s ∈ {0, 1}∗. Thus w = 010u1s, and 010u1 ends with 0101. Since the second-to-last symbol of 010u1 is a 0, we have that s ≠ %. Furthermore, s does not begin with 0, since, if it did, then v = 1s would begin with 10. Thus we have that s = 1t for some t ∈ {0, 1}∗. Hence w = 010u11t. Since 010u11 ends with 011, it follows that the last 0 in 010u11 must be followed in w by an element of A. Because 111 is the only element of A that begins with 11, we have that t = 1r for some r ∈ {0, 1}∗. Thus w = 010u111r. Since (10)u ∈ {10}{10}∗ ⊆ {10}∗ ⊆ X, we have that 010u111 = (0)((10)u)111 ∈ {0}X{111} ⊆ Y . Since r is a suffix of w, it follows that r ∈ B. Thus, the inductive hypothesis tells us that r ∈ Y ∗. Hence w = (010u111)r ∈ YY ∗ ⊆ Y ∗.

∗ Suppose y = 1z for some z ∈ {0, 1}∗. Thus w = 011z. Since the first 0 of w is followed by an element of A, and 111 is the only element of A that begins with 11, we have that z = 1u for some u ∈ {0, 1}∗. Thus w = 0111u. Since % ∈ {10}∗ ⊆ X, we have that 0111 = (0)(%)(111) ∈ {0}X{111} ⊆ Y . Because u is a suffix of w, it follows that u ∈ B. Thus, since u is a proper substring of w, the inductive hypothesis tells us that u ∈ Y ∗. Hence w = (0111)u ∈ YY ∗ ⊆ Y ∗.

• Suppose w = 1x for some x ∈ {0, 1}∗. Since x is a suffix of w, we have that x ∈ B. Because x is a proper substring of w, the inductive hypothesis tells us that x ∈ Y ∗. Thus w = 1x ∈ YY ∗ ⊆ Y ∗. ✷

By Lemmas 3.2.19 and 3.2.20, we have that Y ∗ ⊆ B ⊆ Y ∗, so that Y ∗ = B. This completes our regular expression design and proof of correctness example.

Exercise 3.2.21 Let X = { w ∈ {0, 1}∗ | 010 is not a substring of w }. Find a regular expression α such that L(α)= X, and prove that your answer is correct.

Exercise 3.2.22 Define diff ∈ {0, 1}∗ → Z as in Section 2.2, so that, for all w ∈ {0, 1}∗,

diff w = the number of 1’s in w − the number of 0’s in w.

Thus diff % = 0, diff 0 = −1, diff 1 = 1, and for all x,y ∈ {0, 1}∗, diff(xy) = diff x + diff y. Let X = { w ∈ {0, 1}∗ | diff w = 3m, for some m ∈ Z }. Find a regular expression α such that L(α)= X, and prove that your answer is correct.

Exercise 3.2.23 Define a function diff ∈ {0, 1}∗ → Z by: for all w ∈ {0, 1}∗,

diff w = the number of 0's in w − 2(the number of 1's in w).

Thus diff w = 0 iff w has twice as many 0's as 1's. Furthermore diff % = 0, diff 0 = 1, diff 1 = −2, and, for all x, y ∈ {0, 1}∗, diff(xy) = diff x + diff y. Let X = { w ∈ {0, 1}∗ | diff w = 0 and, for all prefixes v of w, 0 ≤ diff v ≤ 3 }. Find a regular expression α such that L(α) = X, and prove that your answer is correct.

3.2.3 Notes

Our approach in this section is somewhat more formal than is common, but is otherwise standard.

3.3 Simplification of Regular Expressions

In this section, we give three algorithms—of increasing power, but decreasing efficiency—for regular expression simplification. The first algorithm—weak simplification—is defined via a straightforward structural recursion, and is sufficient for many purposes. The remaining two algorithms—local simplification and global simplification—are based on a set of simplification rules that is still incomplete and evolving.

3.3.1 Regular Expression Complexity

To begin with, let's consider how we might measure the complexity/simplicity of regular expressions. The most obvious criterion is size (remember that regular expressions are trees). But consider this pair of equivalent regular expressions:

α = (00∗11∗)∗, and
β = % + 0(0 + 11∗0)∗11∗.

Although the size of β (18) is strictly greater than the size of α (10), β has only one closure inside another closure, whereas α has two closures inside its outer closure, and thus there is a sense in which β is easier to understand than α.

The standard measure of the closure-related complexity of a regular expression is its star-height: the maximum number n ∈ N such that there is a path from the root of the regular expression to one of its leaves that passes through n closures. But α and β both have star-heights of 2. Furthermore, star-height isn't respected by the ways of forming regular expressions. E.g., if γ1 has strictly smaller star-height than γ2, we can't conclude that γ1γ′ has strictly smaller star-height than γ2γ′, as the star-height of γ′ may be greater than the star-height of γ2.

So, we need a better measure of the closure-related complexity of regular expressions than star-height. Toward that end, let's define a closure complexity to be a nonempty list ns of natural numbers that is (not-necessarily strictly) descending: for all i ∈ [1 : |ns| − 1], ns i ≥ ns(i + 1). This is a way of representing nonempty multisets of natural numbers that makes it easy to define the usual ordering on multisets. We write CC for the set of all closure complexities. E.g., [3, 2, 2, 1] is a closure complexity, but [3, 2, 3] and [ ] are not. For all n ∈ N, [n] is a singleton closure complexity. The union of closure complexities ns and ms (ns ∪ ms) is the closure complexity that results from putting ns @ ms in descending order, keeping any duplicate elements. (Here we are overloading the term union and the operation ∪, but the set-theoretic union isn't an operation on closure complexities, and so no confusion should result.) E.g., [3, 2, 2, 1] ∪ [4, 2, 1, 0] = [4, 3, 2, 2, 2, 1, 1, 0]. The successor succ ns of a closure complexity ns is the closure complexity formed by adding one to each element of ns, maintaining the order of the elements. E.g., succ [3, 2, 2, 1] = [4, 3, 3, 2]. It is easy to see that ∪ is commutative and associative on CC, and that the successor operation on CC preserves union:

Proposition 3.3.1 (1) For all ns, ms ∈ CC, ns ∪ ms = ms ∪ ns.

(2) For all ns, ms, ls ∈ CC, (ns ∪ ms) ∪ ls = ns ∪ (ms ∪ ls).

(3) For all ns, ms ∈ CC, succ(ns ∪ ms) = succ ns ∪ succ ms.

Proposition 3.3.2 (1) For all ns, ms ∈ CC, succ ns = succ ms iff ns = ms.

(2) For all ns, ms, ls ∈ CC, ns ∪ ls = ms ∪ ls iff ns = ms.

We define a relation &lt;cc on CC as follows: for all ns, ms ∈ CC, ns &lt;cc ms iff either:

• ms = ns @ ls for some ls ∈ CC; or

• there is an i ∈ N − {0} such that

– i ≤ |ns| and i ≤ |ms|,
– for all j ∈ [1 : i − 1], ns j = ms j, and
– ns i &lt; ms i.

In other words, ns &lt;cc ms iff either ms is a proper extension of ns, or, at the first position at which ns and ms differ, the element of ns is strictly smaller than the corresponding element of ms. E.g., [2, 1] &lt;cc [2, 1, 0] and [1, 1, 0] &lt;cc [2].

Proposition 3.3.3 (1) For all ns, ms ∈ CC, ns &lt;cc ms iff succ ns &lt;cc succ ms.

(2) For all ns, ms, ls ∈ CC, ns ∪ ls &lt;cc ms ∪ ls iff ns &lt;cc ms.

(3) For all ns, ms ∈ CC, ns &lt;cc ns ∪ ms.

Proposition 3.3.4 For all ns, ms ∈ CC, exactly one of the following holds: ns &lt;cc ms, ns = ms, ms &lt;cc ns.

Proposition 3.3.5 &lt;cc is a well-founded relation on CC.

The function cc ∈ Reg → CC is defined by structural recursion:

cc % = [0];
cc $ = [0];
cc a = [0], for all a ∈ Sym;
cc(∗(α)) = succ(cc α), for all α ∈ Reg;
cc(@(α, β)) = cc α ∪ cc β, for all α, β ∈ Reg; and
cc(+(α, β)) = cc α ∪ cc β, for all α, β ∈ Reg.

We say that cc α is the closure complexity of α. E.g.,

cc((12∗)∗) = succ(cc(12∗)) = succ(cc 1 ∪ cc(2∗)) = succ([0] ∪ succ(cc 2)) = succ([0] ∪ succ [0]) = succ([0] ∪ [1]) = succ [1, 0] = [2, 1].

In other words, cc α can be computed by first collecting together all the paths through α that terminate in leaves, then counting the number of closures visited when following each of these paths, and finally putting those counts in descending order. Returning to our initial examples, we have that cc((00∗11∗)∗) = [2, 2, 1, 1] and cc(% + 0(0 + 11∗0)∗11∗) = [2, 1, 1, 1, 1, 0, 0, 0]. Since [2, 1, 1, 1, 1, 0, 0, 0] &lt;cc [2, 2, 1, 1], the closure complexity of β is strictly smaller than that of α, matching our intuition that β is easier to understand than α.
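These definitions transcribe directly into SML. The following sketch is illustrative only (Forlan's cc is an abstract type, and Forlan's regular expressions are not literally an SML datatype like the hypothetical reg below); it represents closure complexities as descending integer lists:

(* insert n into a descending list, keeping it descending *)
fun insertDesc (n, nil)     = [n]
  | insertDesc (n, m :: ms) =
      if n >= m then n :: m :: ms else m :: insertDesc (n, ms)

(* union of closure complexities: merge, keeping duplicates *)
fun unionCC (ns, ms) = foldl insertDesc ms ns

(* successor: add one to every element *)
fun succCC ns = map (fn n => n + 1) ns

(* the ordering <cc: a proper prefix is smaller, and otherwise the
   first differing elements decide *)
fun compareCC (nil, nil)         = EQUAL
  | compareCC (nil, _ :: _)      = LESS
  | compareCC (_ :: _, nil)      = GREATER
  | compareCC (n :: ns, m :: ms) =
      (case Int.compare (n, m) of
         EQUAL => compareCC (ns, ms)
       | ord   => ord)

(* a hypothetical datatype of regular expressions *)
datatype reg =
  EmptyStr                 (* % *)
| EmptySet                 (* $ *)
| Sym of string
| Closure of reg
| Concat of reg * reg
| Union of reg * reg

(* closure complexity *)
fun cc EmptyStr        = [0]
  | cc EmptySet        = [0]
  | cc (Sym _)         = [0]
  | cc (Closure r)     = succCC (cc r)
  | cc (Concat (r, s)) = unionCC (cc r, cc s)
  | cc (Union (r, s))  = unionCC (cc r, cc s)

(* e.g., cc (Closure (Concat (Sym "1", Closure (Sym "2")))) = [2, 1] *)

E.g., unionCC ([3, 2, 2, 1], [4, 2, 1, 0]) evaluates to [4, 3, 2, 2, 2, 1, 1, 0], matching the example above.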

Proposition 3.3.6 For all α ∈ Reg, |cc α| = numLeaves α.

Proof. An easy induction on regular expressions. ✷

Exercise 3.3.7 Find regular expressions α and β such that cc α = cc β but size α ≠ size β.

In contrast to star-height, closure complexity is compatible with the ways of forming regular expressions. In fact, we can prove even stronger results.

Proposition 3.3.8 (1) For all α, β ∈ Reg, cc α = cc β iff cc(α∗) = cc(β∗).

(2) For all α, β, γ ∈ Reg, cc α = cc β iff cc(αγ)= cc(βγ).

(3) For all α, β, γ ∈ Reg, cc α = cc β iff cc(γα)= cc(γβ).

(4) For all α, β, γ ∈ Reg, cc α = cc β iff cc(α + γ)= cc(β + γ).

(5) For all α, β, γ ∈ Reg, cc α = cc β iff cc(γ + α)= cc(γ + β).

Proof. Follows by Proposition 3.3.2. ✷

The following proposition says that if we replace a subtree of a regular ex- pression by a regular expression with the same closure complexity, then the closure complexity of the resulting, whole regular expression will be unchanged.

Proposition 3.3.9 Suppose α, β, β′ ∈ Reg, cc β = cc β′, pat ∈ Path is valid for α, and β is the subtree of α at position pat. Let α′ be the result of replacing the subtree at position pat in α by β′. Then cc α = cc α′.

Proof. By induction on α using Proposition 3.3.8. ✷

Proposition 3.3.10 (1) For all α, β ∈ Reg, cc α &lt;cc cc β iff cc(α∗) &lt;cc cc(β∗).

(2) For all α, β, γ ∈ Reg, cc α &lt;cc cc β iff cc(αγ) &lt;cc cc(βγ).

(3) For all α, β, γ ∈ Reg, cc α &lt;cc cc β iff cc(γα) &lt;cc cc(γβ).

(4) For all α, β, γ ∈ Reg, cc α &lt;cc cc β iff cc(α + γ) &lt;cc cc(β + γ).

(5) For all α, β, γ ∈ Reg, cc α &lt;cc cc β iff cc(γ + α) &lt;cc cc(γ + β).

Proof. Follows by Proposition 3.3.3. ✷

The following proposition says that if we replace a subtree of a regular ex- pression by a regular expression with strictly smaller closure complexity, that the resulting, whole regular expression will have strictly smaller closure complexity than the original regular expression.

Proposition 3.3.11 Suppose α, β, β′ ∈ Reg, cc β′ &lt;cc cc β, pat ∈ Path is valid for α, and β is the subtree of α at position pat. Let α′ be the result of replacing the subtree at position pat in α by β′. Then cc α′ &lt;cc cc α.

Proof. By induction on α, using Proposition 3.3.10. ✷

When judging the relative complexity of regular expressions α and β, we will first look at how their closure complexities are related. And, when their closure complexities are equal, we will look at how their sizes are related. To finish explaining how we will judge the relative complexity of regular expressions, we need three definitions. The function

numConcats ∈ Reg → N is defined by recursion:

numConcats % = 0;
numConcats $ = 0;
numConcats a = 0, for all a ∈ Sym;
numConcats(α∗) = numConcats α, for all α ∈ Reg;
numConcats(αβ) = 1 + numConcats α + numConcats β; and
numConcats(α + β) = numConcats α + numConcats β.

Thus numConcats α is the number of concatenations in α, i.e., the number of subtrees of α that are concatenations, where a given concatenation may occur (and will be counted) multiple times. E.g., numConcats(((01)∗(01))∗) = 3. The function

numSyms ∈ Reg → N is defined by structural recursion:

numSyms % = 0;
numSyms $ = 0;
numSyms a = 1, for all a ∈ Sym;
numSyms(α∗) = numSyms α, for all α ∈ Reg;
numSyms(αβ) = numSyms α + numSyms β; and
numSyms(α + β) = numSyms α + numSyms β.

Thus numSyms α is the number of occurrences of symbols in α, where a given symbol may occur (and will be counted) more than once. E.g., numSyms((0∗1) + 0) = 3.

Finally, we say that a regular expression α is standardized iff none of α's subtrees have any of the following forms:

• (β1 + β2) + β3 (we can avoid needing parentheses, and make a regular expression easier to understand/process from left-to-right, by grouping unions to the right);

• β1 + β2, where β1 > β2, or β1 + (β2 + β3), where β1 > β2 (it’s pleasing if the regular expressions appear in order (recall that unions are greater than all other kinds of regular expressions));

• (β1β2)β3 (we can avoid needing parentheses, and make a regular expression easier to understand/process from left-to-right, by grouping concatenations to the right); and

• β∗β, β∗(βγ), (β1β2)∗β1 or (β1β2)∗β1γ (moving closures to the right makes a regular expression easier to understand/process from left-to-right).

Thus every subtree of a standardized regular expression will be standardized.

Returning to our assessment of regular expression complexity, suppose that α and β are regular expressions generating %. Then (αβ)∗ and (α + β)∗ are equivalent, but we will prefer the latter over the former, because unions are generally more amenable to understanding and processing than concatenations. Consequently, when two regular expressions have the same closure complexity and size, we will judge their relative complexity according to their numbers of concatenations.

Next, consider the regular expressions 0 + 01 and 0(% + 1). These regular expressions have the same closure complexity [0, 0, 0], size (5) and number of concatenations (1). We would like to consider the latter to be simpler than the former, since in general we would like to prefer α(% + β) over α + αβ. And we can base this preference on the fact that the number of symbols of 0(% + 1) (2) is one less than the number of symbols of 0 + 01 (3). When regular expressions have the same closure complexity, size and number of concatenations, the one with fewer symbols is likely to be easier to understand and process. Thus, when regular expressions have identical closure complexity, size and number of concatenations, we will use their relative numbers of symbols to judge their relative complexity.

Finally, when regular expressions have the same closure complexity, size, number of concatenations, and number of symbols, we will judge their relative complexity according to whether they are standardized, thinking that a standardized regular expression is simpler than one that is not standardized.

We define a relation &lt;simp on Reg as follows: for all α, β ∈ Reg, α &lt;simp β iff either:

• cc α &lt;cc cc β; or

• cc α = cc β, but size α &lt; size β; or

• cc α = cc β and size α = size β, but numConcats α &lt; numConcats β; or

• cc α = cc β, size α = size β and numConcats α = numConcats β, but numSyms α &lt; numSyms β; or

• cc α = cc β, size α = size β, numConcats α = numConcats β and numSyms α = numSyms β, but α is standardized and β is not standardized.

We read α &lt;simp β as saying that α is simpler than β. And we define a relation ≡simp on Reg by: for all α, β ∈ Reg, α ≡simp β iff neither α &lt;simp β nor β &lt;simp α, i.e., iff α and β have the same closure complexity, size, number of concatenations and number of symbols, and are either both standardized or both not standardized. We also write α ≤simp β to mean that α &lt;simp β or α ≡simp β. For example, the following two regular expressions are equally simple:

1(01 + 10) + (% + 01)1 and 011 + 1(% + 01 + 10).

Proposition 3.3.12 (1) &lt;simp is transitive.

(2) ≡simp is reflexive on Reg, transitive and symmetric.

(3) For all α, β ∈ Reg, exactly one of the following holds: α &lt;simp β, α ≡simp β, or β &lt;simp α.

(4) ≤simp is transitive, and, for all α, β ∈ Reg, α ≡simp β iff α ≤simp β and β ≤simp α.

The Forlan module Reg defines the abstract type cc of closure complexities, along with these functions:

val ccToList : cc -> int list
val singCC : int -> cc
val unionCC : cc * cc -> cc
val succCC : cc -> cc
val cc : reg -> cc
val compareCC : cc * cc -> order

The function ccToList is the identity function on closure complexities: all that changes is the type. singCC n returns the singleton closure complexity [n], if n is nonnegative; otherwise it issues an error message. The functions unionCC and succCC implement the union and successor operations on closure complexities. The function cc corresponds to cc, and compareCC implements the comparison of closure complexities corresponding to &lt;cc and =, returning LESS, EQUAL or GREATER. Here are some examples of how these functions can be used:

- val ns = Reg.succCC(Reg.unionCC(Reg.singCC 1, Reg.singCC 1));
val ns = - : Reg.cc
- Reg.ccToList ns;
val it = [2,2] : int list

- val ms = Reg.unionCC(ns, Reg.succCC ns);
val ms = - : Reg.cc
- Reg.ccToList ms;
val it = [3,3,2,2] : int list
- Reg.ccToList(Reg.cc(Reg.fromString "(00*11*)*"));
val it = [2,2,1,1] : int list
- Reg.ccToList(Reg.cc(Reg.fromString "% + 0(0 + 11*0)*11*"));
val it = [2,1,1,1,1,0,0,0] : int list
- Reg.compareCC
= (Reg.cc(Reg.fromString "(00*11*)*"),
= Reg.cc(Reg.fromString "% + 0(0 + 11*0)*11*"));
val it = GREATER : order
- Reg.compareCC
= (Reg.cc(Reg.fromString "(00*11*)*"),
= Reg.cc(Reg.fromString "(1*10*0)*"));
val it = EQUAL : order

The module Reg also includes these functions:

val numConcats : reg -> int
val numSyms : reg -> int
val standardized : reg -> bool
val compareComplexity : reg * reg -> order
val compareComplexityTotal : reg * reg -> order

The first two functions implement the functions with the same names. The function standardized tests whether a regular expression is standardized, and the function compareComplexity implements ≤simp/≡simp. Finally, compareComplexityTotal is like compareComplexity, but falls back on Reg.compare (our total ordering on regular expressions) to order regular expressions with the same complexity. Thus compareComplexityTotal is a total ordering. Here are some examples of how these functions can be used:

- Reg.numConcats(Reg.fromString "(01)*(10)*");
val it = 3 : int
- Reg.numSyms(Reg.fromString "(01)*(10)*");
val it = 4 : int
- Reg.standardized(Reg.fromString "00*1");
val it = true : bool
- Reg.standardized(Reg.fromString "00*0");
val it = false : bool
- Reg.compareComplexity
= (Reg.fromString "(00*11*)*",
= Reg.fromString "% + 0(0 + 11*0)*11*");
val it = GREATER : order
- Reg.compareComplexity
= (Reg.fromString "0**1**", Reg.fromString "(01)**");
val it = GREATER : order

- Reg.compareComplexity
= (Reg.fromString "(0*1*)*", Reg.fromString "(0*+1*)*");
val it = GREATER : order
- Reg.compareComplexity
= (Reg.fromString "0+01", Reg.fromString "0(%+1)");
val it = GREATER : order
- Reg.compareComplexity
= (Reg.fromString "(01)2", Reg.fromString "012");
val it = GREATER : order
- Reg.compareComplexity
= (Reg.fromString "1(01+10)+(%+01)1",
= Reg.fromString "011+1(%+01+10)");
val it = EQUAL : order

3.3.2 Weak Simplification

In this subsection, we give our first simplification algorithm: weak simplification. We say that a regular expression α is weakly simplified iff α is standardized and none of α's subtrees have any of the following forms:

• $ + β or β + $ (the $ is redundant);

• β + β or β + (β + γ) (the duplicate occurrence of β is redundant);

• %β or β% (the % is redundant);

• $β or β$ (both are equivalent to $); and

• %∗ or $∗ or (β∗)∗ (the first two can be replaced by %, and the extra closure can be omitted in the third case).

Thus, if a regular expression α is weakly simplified, then each of its subtrees will also be weakly simplified. Weakly simplified regular expressions have some pleasing properties.

Proposition 3.3.13 (1) For all α ∈ Reg, if α is weakly simplified and L(α)= ∅, then α = $.

(2) For all α ∈ Reg, if α is weakly simplified and L(α) = {%}, then α = %.

(3) For all α ∈ Reg, for all a ∈ Sym, if α is weakly simplified and L(α) = {a}, then α = a.

E.g., part (2) of the proposition says that, if α is weakly simplified and L(α) is the language whose only string is %, then α is %.

Proof. The three parts are proved in order, using induction on regular expressions.

We will show the concatenation case of part (3). Suppose α, β ∈ Reg and assume the inductive hypothesis: for all a ∈ Sym, if α is weakly simplified and L(α) = {a}, then α = a, and for all a ∈ Sym, if β is weakly simplified and L(β) = {a}, then β = a. Suppose a ∈ Sym, and assume that αβ is weakly simplified and L(αβ) = {a}. We must show that αβ = a. Because αβ is weakly simplified, so are α and β. Since L(α)L(β) = L(αβ) = {a}, there are two cases to consider.

• Suppose L(α) = {a} and L(β) = {%}. Since β is weakly simplified and L(β) = {%}, part (2) tells us that β = %. But this means that αβ = α% is not weakly simplified after all—contradiction. Thus we can conclude that αβ = a.

• Suppose L(α) = {%} and L(β) = {a}. The proof of this case is similar to that of the other one. (Note that we didn't use the inductive hypothesis on either α or β.)

We use both parts of the inductive hypothesis when proving the union case. If L(α) ∪ L(β) = L(α + β) = {a}, then one possibility is that one of L(α) or L(β) is ∅, in which case we use part (1) to get our contradiction. Otherwise, L(α) = {a} = L(β), and so the inductive hypothesis tells us α = a = β, so that α + β = a + a, giving us the contradiction. ✷

Proposition 3.3.14 For all α ∈ Reg, if α is weakly simplified, then alphabet(L(α)) = alphabet α.

Proof. By Proposition 3.1.5, it suffices to show that, for all α ∈ Reg, if α is weakly simplified, then alphabet α ⊆ alphabet(L(α)). And this follows by an easy induction on α, using Proposition 3.3.13(2). ✷

The next proposition says that $ need only be used at the top-level of a regular expression.

Proposition 3.3.15 For all α ∈ Reg, if α is weakly simplified and α has one or more occurrences of $, then α = $.

Proof. An easy induction on regular expressions. ✷

Finally, we have that weakly simplified regular expressions with closures generate infinite languages:

Proposition 3.3.16 For all α ∈ Reg, if α is weakly simplified and α has one or more closures, then L(α) is infinite.

Proof. An easy induction on regular expressions. ✷

Next, we see how we can test whether a regular expression is weakly simplified via a simple structural recursion. Define weaklySimplified ∈ Reg → Bool by structural recursion. Given a regular expression α, it proceeds as follows:

• Suppose α is %, $ or a symbol. Then it returns true.

• Suppose α has the form β∗. Then it checks that:

– β is weakly simplified (this is done using recursion); and
– β is neither %, nor $, nor a closure.

• Suppose α has the form α1 α2. Then it checks that:

– α1 and α2 are weakly simplified; and

– α1 is neither % nor $ nor a concatenation; and

– α2 is neither % nor $; and
– α has none of the following forms: β∗β, β∗(βγ), (β1β2)∗β1 or (β1β2)∗β1γ.

• Suppose α has the form α1 + α2. Then it checks that:

– α1 and α2 are weakly simplified; and

– α1 is neither $ nor a union; and

– α2 is not $;

– if α2 has the form β1 + β2, then α1 < β1; and

– if α2 is not a union, then α1 &lt; α2.

Proposition 3.3.17 For all α ∈ Reg, α is weakly simplified iff weaklySimplified α = true.

Proof. By induction on regular expressions. ✷

In preparation for giving our weak simplification algorithm, we need to define some auxiliary functions. We say that a regular expression α is almost weakly simplified iff either:

• α ∈ {%, $}; or

• all elements of concatsToList α are weakly simplified, and are not %, $ or concatenations.

For example, 0∗0(1 + 2)∗(1 + 2) = 0∗(0((1 + 2)∗(1 + 2))) is almost weakly simplified, even though it's not weakly simplified. On the other hand: ($ + 1)1 isn't almost weakly simplified, because $ + 1 isn't weakly simplified; 1% isn't almost weakly simplified, because of the location of %; and (01)1 isn't almost weakly simplified, because of the location of the concatenation 01. Let

WS = { α ∈ Reg | α is weakly simplified }, and AWS = { α ∈ Reg | α is almost weakly simplified }.

We define a function shiftClosuresRight ∈ AWS → WS by recursion. Given α ∈ AWS, shiftClosuresRight proceeds as follows. If α is not a concatenation, then it returns α. Otherwise, α = α1α2 for some α1, α2 ∈ Reg. Since α is almost weakly simplified, so is α2. So it lets α2′ ∈ WS be the result of calling shiftClosuresRight on α2.

• If α1α2′ has the form β∗β, for some β ∈ Reg, then shiftClosuresRight returns shiftClosuresRight(rightConcat(β, β∗)).

• Otherwise, if α1α2′ has the form β∗βγ, for some β, γ ∈ Reg, then shiftClosuresRight returns shiftClosuresRight(ββ∗γ).

• Otherwise, if α1α2′ has the form (β1β2)∗β1, for some β1, β2 ∈ Reg, then shiftClosuresRight returns shiftClosuresRight(β1(rightConcat(β2, β1))∗).

• Otherwise, if α1α2′ has the form (β1β2)∗β1γ, for some β1, β2, γ ∈ Reg, then shiftClosuresRight returns shiftClosuresRight(β1(rightConcat(β2, β1))∗γ).

• Otherwise, shiftClosuresRight returns α1α2′.

(The work needed to justify the kind of well-founded recursion used in shiftClosuresRight's definition will be added in a subsequent revision.)

Proposition 3.3.18 For all α ∈ AWS, shiftClosuresRight α is equivalent to α and has the same closure complexity, size, number of concatenations and number of symbols as α.

Define a function deepClosure ∈ WS → WS as follows. For all α ∈ WS:

deepClosure % = %,
deepClosure $ = %,
deepClosure(∗(α)) = α∗, and
deepClosure α = α∗, if α ∉ {%, $} and α is not a closure.

Lemma 3.3.19 For all α ∈ WS, deepClosure α is equivalent to α∗, has the same alphabet as α∗, has a closure complexity that is no bigger than that of α∗, has a size that is no bigger than that of α∗, has the same number of concatenations as α∗, and has the same number of symbols as α∗.

Define a function deepConcat ∈ WS × WS → WS as follows. For all α, β ∈ WS:

deepConcat(α, $) = $,
deepConcat($, α) = $, if α ≠ $,
deepConcat(α, %) = α, if α ≠ $,
deepConcat(%, α) = α, if α ∉ {$, %}, and
deepConcat(α, β) = shiftClosuresRight(rightConcat(α, β)), if α, β ∉ {$, %}.

To see that the last clause of this definition is proper, suppose that α, β ∈ WS − {%, $}. Thus all the elements of concatsToList α and concatsToList β are weakly simplified, and are not %, $ or concatenations. Hence

concatsToList(rightConcat(α, β)) = concatsToList α @ concatsToList β also has this property, showing that rightConcat(α, β) is almost weakly simplified, which is what shiftClosuresRight needs to deliver a weakly simplified result.

Lemma 3.3.20 For all α, β ∈ WS, deepConcat(α, β) is equivalent to αβ, has an alphabet that is a subset of the alphabet of αβ, has a closure complexity that is no bigger than that of αβ, has a size that is no bigger than that of αβ, has no more concatenations than αβ, and has no more symbols than αβ.

Define a function deepUnion ∈ WS × WS → WS as follows. For all α, β ∈ WS:

deepUnion(α, $) = α,
deepUnion($, α) = α, if α ≠ $, and
deepUnion(α, β) = sortUnions(rightUnion(α, β)), if α ≠ $ and β ≠ $.

To see that the last clause of this definition is proper, suppose α, β ∈ WS − {$}. Then all the elements of unionsToList(rightUnion(α, β)) will be weakly simplified, and won’t be $ or unions. Consequently, sortUnions will deliver a weakly simplified result.

Lemma 3.3.21 For all α, β ∈ WS, deepUnion(α, β) is equivalent to α + β, has an alphabet that is a subset of the alphabet of α + β, has a closure complexity that is no bigger than that of α + β, has a size that is no bigger than that of α + β, has no more concatenations than α + β, and has no more symbols than α + β.

Now, we can define our weak simplification function/algorithm. Define weaklySimplify ∈ Reg → WS by structural recursion:

• weaklySimplify % = %;

• weaklySimplify $ = $;

• weaklySimplify a = a, for all a ∈ Sym;

• weaklySimplify(∗(α)) = deepClosure(weaklySimplify α), for all α ∈ Reg;

• weaklySimplify(@(α, β)) = deepConcat(weaklySimplify α, weaklySimplify β), for all α, β ∈ Reg; and

• weaklySimplify(+(α, β)) = deepUnion(weaklySimplify α, weaklySimplify β), for all α, β ∈ Reg.

Proposition 3.3.22 For all α ∈ Reg:

(1) weaklySimplify α ≈ α;

(2) alphabet(weaklySimplify α) ⊆ alphabet α;

(3) cc(weaklySimplify α) ≤cc cc α;

(4) size(weaklySimplify α) ≤ size α;

(5) numSyms(weaklySimplify α) ≤ numSyms α; and

(6) numConcats(weaklySimplify α) ≤ numConcats α.

Proof. By induction on regular expressions. ✷

Exercise 3.3.23 Prove Proposition 3.3.22.

Proposition 3.3.24 For all α ∈ Reg, if α is weakly simplified, then weaklySimplify α = α.

Proof. By induction on regular expressions. ✷

Using our weak simplification algorithm, we can define an algorithm for calculating the language generated by a regular expression, when this language is finite, and for announcing that this language is infinite, otherwise. First, we weakly simplify our regular expression, α, and call the resulting regular expression β. If β contains no closures, then we compute its meaning in the usual way. But, if β contains one or more closures, then its language will be infinite, and thus we can output a message saying that L(α) is infinite.

The Forlan module Reg defines the following functions relating to weak simplification:

val weaklySimplified : reg -> bool
val weaklySimplify : reg -> reg
val toStrSet : reg -> str set

weaklySimplified and weaklySimplify implement weaklySimplified and weaklySimplify, respectively. Finally, the function toStrSet implements our algorithm for calculating the language generated by a regular expression, if that language is finite, and for announcing that this language is infinite, otherwise. Here are some examples of how these functions can be used:

- val reg = Reg.input "";
@ (% + $0)(% + 00*0 + 0**)*
@ .
val reg = - : reg
- Reg.output("", Reg.weaklySimplify reg);
(% + 0* + 000*)*
val it = () : unit
- Reg.toStrSet reg;
language is infinite

uncaught exception Error
- val reg' = Reg.input "";
@ (1 + %)(2 + $)(3 + %*)(4 + $*)
@ .
val reg' = - : reg
- StrSet.output("", Reg.toStrSet reg');

2, 12, 23, 24, 123, 124, 234, 1234
val it = () : unit
- Reg.output("", Reg.weaklySimplify reg');
(% + 1)2(% + 3)(% + 4)
val it = () : unit
- Reg.output
= ("",
= Reg.weaklySimplify(Reg.fromString "(00*11*)*"));
(00*11*)*
val it = () : unit
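The finiteness test at the heart of this algorithm can be sketched in SML as follows (a minimal sketch, reusing the hypothetical reg datatype from the closure-complexity sketch of Section 3.3.1, and taking an implementation of weaklySimplify as a parameter rather than defining one):

datatype reg =
  EmptyStr | EmptySet | Sym of string
| Closure of reg | Concat of reg * reg | Union of reg * reg

(* does r contain one or more closures? *)
fun hasClosure (Closure _)     = true
  | hasClosure (Concat (r, s)) = hasClosure r orelse hasClosure s
  | hasClosure (Union (r, s))  = hasClosure r orelse hasClosure s
  | hasClosure _               = false

(* by Proposition 3.3.16, given an implementation weaklySimplify of
   weak simplification, L(r) is finite iff the weakly simplified
   form of r has no closures *)
fun languageIsFinite weaklySimplify r =
    not (hasClosure (weaklySimplify r))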

3.3.3 Local and Global Simplification

In preparation for the definition of our local and global simplification algorithms, we must define some auxiliary functions. First, we show how we can recursively test whether % ∈ L(α), for a regular expression α. We define a function

hasEmp ∈ Reg → Bool by recursion:

hasEmp % = true;
hasEmp $ = false;
hasEmp a = false, for all a ∈ Sym;
hasEmp(α∗) = true, for all α ∈ Reg;
hasEmp(αβ) = hasEmp α and hasEmp β, for all α, β ∈ Reg; and
hasEmp(α + β) = hasEmp α or hasEmp β, for all α, β ∈ Reg.

Proposition 3.3.25 For all α ∈ Reg, % ∈ L(α) iff hasEmp α = true.

Proof. By induction on regular expressions. ✷

Next, we show how we can recursively test whether a ∈ L(α), for a symbol a and a regular expression α. We define a function

hasSym ∈ Sym × Reg → Bool

by recursion:

hasSym(a, %) = false, for all a ∈ Sym;
hasSym(a, $) = false, for all a ∈ Sym;
hasSym(a, b) = (a = b), for all a, b ∈ Sym;
hasSym(a, α∗) = hasSym(a, α), for all a ∈ Sym and α ∈ Reg;
hasSym(a, αβ) = (hasSym(a, α) and hasEmp β) or (hasEmp α and hasSym(a, β)), for all a ∈ Sym and α, β ∈ Reg; and
hasSym(a, α + β) = hasSym(a, α) or hasSym(a, β), for all a ∈ Sym and α, β ∈ Reg.

Proposition 3.3.26 For all a ∈ Sym and α ∈ Reg, a ∈ L(α) iff hasSym(a, α)= true.

Proof. By induction on regular expressions, using Proposition 3.3.25. ✷
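For reference, hasEmp and hasSym transcribe directly into SML over the hypothetical reg datatype used in the earlier sketches (Forlan's actual reg type is abstract, so this is illustrative only):

datatype reg =
  EmptyStr | EmptySet | Sym of string
| Closure of reg | Concat of reg * reg | Union of reg * reg

(* hasEmp r: is % a member of L(r)? *)
fun hasEmp EmptyStr        = true
  | hasEmp EmptySet        = false
  | hasEmp (Sym _)         = false
  | hasEmp (Closure _)     = true
  | hasEmp (Concat (r, s)) = hasEmp r andalso hasEmp s
  | hasEmp (Union (r, s))  = hasEmp r orelse hasEmp s

(* hasSym (a, r): is the length-one string a a member of L(r)? *)
fun hasSym (_, EmptyStr)      = false
  | hasSym (_, EmptySet)      = false
  | hasSym (a, Sym b)         = a = b
  | hasSym (a, Closure r)     = hasSym (a, r)
  | hasSym (a, Concat (r, s)) =
      (hasSym (a, r) andalso hasEmp s) orelse
      (hasEmp r andalso hasSym (a, s))
  | hasSym (a, Union (r, s))  = hasSym (a, r) orelse hasSym (a, s)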

Finally, we define a function/algorithm

obviousSubset ∈ Reg × Reg → {true, false} meeting the following specification: for all α, β ∈ Reg,

if obviousSubset(α, β)= true, then L(α) ⊆ L(β).

I.e., this function is a conservative approximation to subset testing. The function that always returns false would meet this specification, but our function will do much better than this, and will be reasonably efficient. In Section 3.13, we will learn of a less efficient algorithm that will provide a complete test for L(α) ⊆ L(β).

Given α, β ∈ Reg, obviousSubset(α, β) proceeds as follows. First, it lets α′ = weaklySimplify α and β′ = weaklySimplify β. Then it returns obviSub(α′, β′), where

obviSub ∈ WS × WS → Bool is the function defined below. obviSub is defined by well-founded recursion on the sum of the sizes of its arguments. If α = β, then it returns true; otherwise, it considers the possible forms of α.

• Suppose α = %. It returns hasEmp β.

• Suppose α = $. It returns true.

• Suppose α = a, for some a ∈ Sym. It returns hasSym(a, β).

• Suppose α = α1∗, for some α1 ∈ Reg. Here it looks at the form of β.

– Suppose β = %. It returns false. (Because α will be weakly simplified, and so α won't generate {%}.)
– Suppose β = $. It returns false.
– Suppose β = a, for some a ∈ Sym. It returns false.

– Suppose β is a closure. It returns obviSub(α1, β).

– Suppose β = β1β2, for some β1, β2 ∈ Reg. If hasEmp β1 = true and obviSub(α, β2), then it returns true. Otherwise, if hasEmp β2 = true and obviSub(α, β1), then it returns true. Otherwise, it returns false (even though the answer sometimes should be true).

– Suppose β = β1 + β2, for some β1, β2 ∈ Reg. It returns

obviSub(α, β1) or obviSub(α, β2)

(even though this is false too often).

• Suppose α = α1α2, for some α1, α2 ∈ Reg. Here it looks at the form of β.

– Suppose β = %. It returns false. (Because α is weakly simplified, α won't generate {%}.)
– Suppose β = $. It returns false. (Because α is weakly simplified, α won't generate ∅.)
– Suppose β = a, for some a ∈ Sym. It returns false. (Because α is weakly simplified, α won't generate {a}.)
– Suppose β = β1∗, for some β1 ∈ Reg. It returns

obviSub(α, β1) or

(obviSub(α1, β) and obviSub(α2, β))

(even though this returns false too often).

– Suppose β = β1β2, for some β1, β2 ∈ Reg. If obviSub(α1, β1) = true and obviSub(α2, β2)= true, then it returns true. Otherwise, if hasEmp β1 = true and obviSub(α, β2) = true, then it returns true. Otherwise, if hasEmp β2 = true and obviSub(α, β1)= true, then it returns true. Otherwise, if β1 is a closure but β2 is not a closure, then it returns

obviSub(α1, β1) and obviSub(α2, β)

(even though this returns false too often). Otherwise, if β2 is a closure but β1 is not a closure, then it returns

obviSub(α1, β) and obviSub(α2, β2)

(even though this returns false too often). Otherwise, if β1 and β2 are closures, then it returns

(obviSub(α1, β1) and obviSub(α2, β)) or

(obviSub(α1, β) and obviSub(α2, β2))

(even though this returns false too often). Otherwise, it returns false, even though sometimes we would like the answer to be true.

– Suppose β = β1 + β2, for some β1, β2 ∈ Reg. It returns

obviSub(α, β1) or obviSub(α, β2)

(even though this is false too often).

• Suppose α = α1 + α2. It returns

obviSub(α1, β) and obviSub(α2, β).

We say that α is obviously a subset of β iff obviousSubset(α, β) = true. On the positive side, we have that, e.g., obviousSubset(0∗011∗1, 0∗1∗)= true. On the other hand, obviousSubset((01)∗, (% + 0)(10)∗(% + 1)) = false, even though L((01)∗) ⊆ L((% + 0)(10)∗(% + 1)).

Proposition 3.3.27 For all α, β ∈ Reg, if obviousSubset(α, β)= true, then L(α) ⊆ L(β).

Proof. First, we use induction on the sum of the sizes of α and β to show that, for all α, β ∈ Reg, if obviSub(α, β) = true, then L(α) ⊆ L(β). The result then follows by Proposition 3.3.22. ✷

The Forlan module Reg provides the following functions corresponding to the auxiliary functions hasEmp, hasSym and obviousSubset:

val hasEmp : reg -> bool
val hasSym : sym * reg -> bool
val obviousSubset : reg * reg -> bool

Here are some examples of how they can be used:

- Reg.hasEmp(Reg.fromString "0*1*"); val it = true : bool 3.3 Simplification of Regular Expressions 121

- Reg.hasEmp(Reg.fromString "01*"); val it = false : bool - Reg.hasSym(Sym.fromString "0", Reg.fromString "0*1*"); val it = true : bool - Reg.hasSym(Sym.fromString "1", Reg.fromString "0*1*"); val it = true : bool - Reg.hasSym(Sym.fromString "0", Reg.fromString "0*$"); val it = false : bool - Reg.obviousSubset = (Reg.fromString "(0 + 1)*", = Reg.fromString "0*(0 + 1)*1*"); val it = true : bool - Reg.obviousSubset = (Reg.fromString "0*(0 + 1)*1*", = Reg.fromString "(0 + 1)*"); val it = true : bool - Reg.obviousSubset = (Reg.fromString "0*011*1", = Reg.fromString "0*1*"); val it = true : bool - Reg.obviousSubset = (Reg.fromString "(01 + 011)1*", = Reg.fromString "01*"); val it = true : bool - Reg.obviousSubset = (Reg.fromString "(01)*", = Reg.fromString "(% + 0)(10)*(% + 1)"); val it = false : bool Our local and global simplification algorithms make use of simplification rules, which may be applied to arbitrary subtrees of regular expressions. There are three kinds of rules: structural rules, distributive rules and reduction rules. Given some set of simplification rules and a regular expression α, when we generate the set of all regular expressions X that can be formed using these simplification rules starting from α, we add regular expressions to X in a series of stages. At stage 0, we have {α}. At some stage n, we start with the regular expressions that we added at that stage (i.e., that were not added at any earlier stage). For each of these regular expressions, β, we add at stage n + 1 all the regular expressions γ that can be formed by applying an allowed simplification rule to one of the subtrees of β, subject to the restriction that γ has not already been added at a previous stage. There are nine structural rules, which preserve the alphabet, closure com- plexity, size, number of concatenations and number of symbols of a regular expression: (1) (α + β)+ γ → α + (β + γ). (2) α + (β + γ) → (α + β)+ γ. 122 Regular Languages

(3) α(βγ) → (αβ)γ.

(4) (αβ)γ → α(βγ).

(5) α + β → β + α.

(6) α∗α → αα∗.

(7) αα∗ → α∗α.

(8) α(βα)∗ → (αβ)∗α.

(9) (αβ)∗α → α(βα)∗.

For even small regular expressions, there may be a very large number of ways to reorganize them using the structural rules. E.g., suppose α1, . . . , αn are distinct regular expressions, for n ≥ 1. There are n! ways of ordering the αi. And there are (2(n − 1))!/((n − 1)!n!) (the Catalan numbers) binary trees with exactly n leaves. Consequently, using structural rules (1), (2) and (5) (without making changes inside the αi), we can reorganize α1 + · · · + αn into n!(2(n − 1))!/((n − 1)!n!) = (2(n − 1))!/(n − 1)! regular expressions. For n = 16, this is about 2 ∗ 10^20.
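A quick arithmetic check of this count, using SML's arbitrary-precision integers (a throwaway calculation, not part of Forlan):

fun fact (n : IntInf.int) : IntInf.int =
  if n <= 1 then 1 else n * fact (n - 1)

(* catalan k = the kth Catalan number, (2k)!/(k!(k+1)!) *)
fun catalan (k : IntInf.int) : IntInf.int =
  fact (2 * k) div (fact k * fact (k + 1))

(* orderings of 16 regular expressions times binary tree shapes with
   16 leaves: 16! * catalan 15, which is roughly 2 * 10^20 *)
val count = fact 16 * catalan 15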

(1) α(β1 + β2) → αβ1 + αβ2.

(2) (α1 + α2)β → α1β + α2β.

Finally, there are 26 reduction rules, some of which make use of a conservative approximation sub to subset testing. When α → β because of a reduction rule, we have that alphabet β ⊆ alphabet α and β <_simp α, where <_simp is the well-founded relation on Reg that is defined below.

We define the relation <_simp on Reg by: for all α, β ∈ Reg, α <_simp β iff either:

• cc α < cc β; or

• cc α = cc β, but size α < size β; or

• cc α = cc β and size α = size β, but numConcats α < numConcats β; or

• cc α = cc β, size α = size β and numConcats α = numConcats β, but numSyms α < numSyms β.

Note that this is almost the same as the definition of the "simpler than" relation on regular expressions from Section 3.3.1.

Proposition 3.3.28 The relation <_simp is well-founded on Reg.

Proof. Follows by 3.3.5, 1.2.11 and 1.2.10, plus the fact that < is well-founded on N (Proposition 1.2.5). ✷

The following proposition says that if we replace a subtree of a regular expression by a regular expression that is a <_simp-predecessor of that subtree, then the resulting whole regular expression will be a <_simp-predecessor of the original whole regular expression.

Proposition 3.3.29 Suppose α, β, β′ ∈ Reg, β′ <_simp β, pat ∈ Path is valid for α, and β is the subtree of α at position pat. Let α′ be the result of replacing the subtree at position pat in α by β′. Then α′ <_simp α.

Our reduction rules follow. In the rules, we abbreviate hasEmp α = true and sub(α, β) = true to hasEmp α and sub(α, β), respectively. Most of the rules strictly decrease a regular expression's closure complexity and size. The exceptions are labeled "cc" (for when the closure complexity strictly decreases, but the size strictly increases), "concatenations" (for when the closure complexity and size are preserved, but the number of concatenations strictly decreases) or "symbols" (for when the closure complexity and size normally strictly decrease, but occasionally they and the number of concatenations stay the same, but the number of symbols strictly decreases).

(1) If sub(α, β), then α + β → β.

(2) αβ1 + αβ2 → α(β1 + β2).

(3) α1β + α2β → (α1 + α2)β.

(4) If hasEmp α and sub(α, β∗), then αβ∗ → β∗.

(5) If hasEmp β and sub(β, α∗), then α∗β → α∗.

(6) If sub(α, β∗), then (α + β)∗ → β∗.

(7) (α∗ + β)∗ → (α + β)∗.

(8) (concatenations) If hasEmp α and hasEmp β, then (αβ)∗ → (α + β)∗.

(9) (concatenations) If hasEmp α and hasEmp β, then (αβ + γ)∗ → (α + β + γ)∗.

(10) If hasEmp α and sub(α, β∗), then (αβ)∗ → β∗.

(11) If hasEmp β and sub(β, α∗), then (αβ)∗ → α∗.

(12) If hasEmp α and sub(α, (β + γ)∗), then (αβ + γ)∗ → (β + γ)∗.

(13) If hasEmp β and sub(β, (α + γ)∗), then (αβ + γ)∗ → (α + γ)∗.

(14) (cc) If not(hasEmp α), and the precondition involving cc α ∪ cc β holds (see the discussion following these rules), then (αβ∗)∗ → % + α(α + β)∗.

(15) (cc) If not(hasEmp β), and the precondition involving cc α ∪ cc β holds (see the discussion following these rules), then (α∗β)∗ → % + (α + β)∗β.

(16) (cc) If not(hasEmp α) or not(hasEmp γ), and the precondition involving cc α ∪ cc β ∪ cc γ holds (see the discussion following these rules), then (αβ∗γ)∗ → % + α(β + γα)∗γ.

(17) If sub(αα∗, β), then α∗ + β → % + β.

(18) If hasEmp β and sub(ααα∗, β), then α∗ + β → α + β.

(19) (symbols) If α ∉ {%, $} and sub(α^n, β), then α^(n+1)α∗ + β → α^n α∗ + β.

(20) If n ≥ 2, l ≥ 0 and 2n − 1 < m_1 < · · · < m_l, then (α^n + α^(n+1) + · · · + α^(2n−1) + α^(m_1) + · · · + α^(m_l))∗ → % + α^n α∗. (E.g., with n = 2 and l = 0, this gives (αα + ααα)∗ → % + ααα∗.)

(21) (symbols) If α ∉ {%, $}, then α + αβ → α(% + β).

(22) (symbols) If α ∉ {%, $}, then α + βα → (% + β)α.

(23) α∗(% + β(α + β)∗) → (α + β)∗.

(24) (% + (α + β)∗α)β∗ → (α + β)∗.

(25) If sub(α, β∗) and sub(β, α), then % + αβ∗ → β∗.

(26) If sub(β, α∗) and sub(α, β), then % + α∗β → α∗.

In rules (14)–(16), the preconditions involving cc are necessary and sufficient conditions for the right-hand side to have strictly smaller closure complexity than the left-hand side.

Consider, e.g., reduction rule (4). Suppose hasEmp α = true and sub(α, β∗) = true, so that % ∈ L(α) and L(α) ⊆ L(β∗). We need that αβ∗ ≈ β∗, alphabet(β∗) ⊆ alphabet(αβ∗) and β∗ <_simp αβ∗. The alphabet of β∗ is clearly a subset of that of αβ∗. To obtain αβ∗ ≈ β∗, it will suffice to show that, for all A, B ∈ Lan, if % ∈ A and A ⊆ B∗, then AB∗ = B∗. Suppose A, B ∈ Lan, % ∈ A and A ⊆ B∗. We show that AB∗ ⊆ B∗ ⊆ AB∗. Suppose w ∈ AB∗, so that w = xy, for some x ∈ A and y ∈ B∗. Since A ⊆ B∗, it follows that w = xy ∈ B∗B∗ = B∗. Suppose w ∈ B∗. Then w = %w ∈ AB∗. And, to see that β∗ <_simp αβ∗, note that cc(β∗) ⊆ cc(αβ∗) and size(β∗) < size(αβ∗): either the closure complexity strictly decreases, or it stays the same and the size strictly decreases.

Because the structural rules preserve the size and alphabet of regular expressions, if we start with a regular expression α, there are only finitely many regular expressions that we can transform α into using structural rules (we can apply one of the rules to some subtree of α, giving us β1, apply a rule to one of the subtrees of β1, giving us β2, etc.).

Suppose sub is a conservative approximation to subset testing. We say that a regular expression α is locally simplified with respect to sub iff

• α is weakly simplified, and

• α can’t be transformed by our structural rules into a regular expression to which one of our reduction rules applies.

The local simplification of a regular expression α with respect to a conservative approximation to subset testing sub proceeds as follows. It calls its main function with the weak simplification β of α. The closure complexity, size, number of concatenations, and number of symbols of β are no bigger than those of α, and alphabet β ⊆ alphabet α. The main function is defined by well-founded recursion on <_simp. It works as follows, when called with a weakly simplified argument, α.

• It generates the set X of all regular expressions weaklySimplify γ, such that α can be reorganized using the structural rules into a regular expression β, which can be transformed by a single application of one of our reduction rules into γ.

• If X is empty, then it returns α.

• Otherwise, it calls itself recursively on the simplest element γ of X (when X doesn't have a unique simplest element, the smallest of the simplest elements—in our total ordering on regular expressions—is selected). Because

– the structural rules preserve closure complexity, size, number of concatenations, and number of symbols,

– the reduction rules produce <_simp-predecessors, and

– weak simplification doesn't increase closure complexity, size, number of concatenations, or number of symbols,

we have that γ <_simp α, so that this recursive call is legal. Furthermore, weak simplification, and all of the rules, either preserve or decrease (via ⊆) the alphabet of regular expressions. Thus alphabet γ ⊆ alphabet α.
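To fix ideas, here is a minimal sketch in SML of the main function's loop. It is a sketch of the structure of the algorithm, not Forlan's implementation: the four helper functions are taken as parameters, where weaklySimplify performs weak simplification, reorganizations produces the closure of its argument under the structural rules, reductions produces the results of single reduction-rule applications, and simpler compares two regular expressions (breaking ties of simplicity using the total ordering on regular expressions).

(* mainLoop weaklySimplify reorganizations reductions simpler alpha
   returns a locally simplified version of alpha, which is assumed
   to be weakly simplified already *)
fun mainLoop weaklySimplify reorganizations reductions simpler alpha =
    let (* the set X: weak simplifications of single reduction-rule
           applications to structural reorganizations of alpha *)
        val candidates =
            List.concat
              (map (fn beta => map weaklySimplify (reductions beta))
                   (reorganizations alpha))
    in case candidates of
           [] => alpha (* alpha is locally simplified *)
         | c :: cs =>
             (* recurse on the simplest candidate; this terminates
                because each candidate is a <_simp-predecessor of
                alpha *)
             mainLoop weaklySimplify reorganizations reductions simpler
                      (foldl (fn (x, best) =>
                                 if simpler (x, best) then x else best)
                             c cs)
    end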

The algorithm is referred to as "local", because at each recursive call of its main function, γ is chosen using the best local knowledge. This strategy is reasonably efficient, but there is no guarantee that another local choice wouldn't result in a simpler global answer.

We define a function/algorithm

locallySimplify ∈ (Reg × Reg → Bool) → Reg → Reg

by: for all conservative approximations to subset testing sub, and α ∈ Reg, locallySimplify sub α is the result of running our local simplification algorithm on α, using sub as the conservative approximation to subset testing.

Theorem 3.3.30 For all conservative approximations to subset testing sub, and α ∈ Reg:

• locallySimplify sub α is locally simplified with respect to sub;

• locallySimplify sub α is equivalent to α;

• alphabet(locallySimplify sub α) ⊆ alphabet α; and

• locallySimplify sub α ≤_simp α.

The Forlan module Reg provides the following functions relating to local simplification:

val locallySimplified : (reg * reg -> bool) -> reg -> bool
val locallySimplify : int option * (reg * reg -> bool) -> reg -> bool * reg
val locallySimplifyTrace : int option * (reg * reg -> bool) -> reg -> bool * reg

The function locallySimplified takes in a conservative approximation to subset testing sub and returns a function that tests whether a regular expression is locally simplified with respect to sub. The function locallySimplifyTrace implements locallySimplify. It emits tracing messages explaining its operation, takes in an extra argument of type int option, and produces an extra result of type bool. If this extra argument is NONE, then it runs as does locallySimplify, and its boolean result is always true. But if it is SOME n, for n ≥ 1, then at each recursive call of the algorithm's main function, no more than n ways of reorganizing the function's argument will be considered, and the boolean part of the result will be false iff, in the final recursive call, n was not sufficient to explore all structural reorganizations, so that the regular expression returned may not be locally simplified with respect to sub. The function locallySimplify works identically, except it doesn't issue tracing messages. Here are some examples of how these functions can be used.

- val locSimped = Reg.locallySimplified Reg.obviousSubset;
val locSimped = fn : reg -> bool

- locSimped(Reg.fromString "(1 + 00*1)*00*");
val it = false : bool
- locSimped(Reg.fromString "(0 + 1)*0");
val it = true : bool
- fun locSimp nOpt =
= Reg.locallySimplify(nOpt, Reg.obviousSubset);
val locSimp = fn : int option -> reg -> bool * reg
- locSimp NONE (Reg.fromString "% + 0*0(0 + 1)* + 1*1(0 + 1)*");
val it = (true,-) : bool * reg
- Reg.output("", #2 it);
(0 + 1)*
val it = () : unit
- locSimp NONE (Reg.fromString "% + 1*0(0 + 1)* + 0*1(0 + 1)*");
val it = (true,-) : bool * reg
- Reg.output("", #2 it);
(0 + 1)*
val it = () : unit
- locSimp NONE (Reg.fromString "(1 + 00*1)*00*");
val it = (true,-) : bool * reg
- Reg.output("", #2 it);
(0 + 1)*0
val it = () : unit
- Reg.locallySimplifyTrace
= (NONE, Reg.obviousSubset)
= (Reg.fromString "1*(01*01*)*");
considered all 10 structural reorganizations of 1*(01*01*)*
1*(01*01*)* transformed by structural rule 4 at position [2, 1] to
1*((01*)01*)* transformed by structural rule 4 at position [2, 1] to
1*(((01*)0)1*)* transformed by reduction rule 14 at position [2] to
1*(% + ((01*)0)((01*)0 + 1)*) weakly simplifies to
1*(% + 01*0(1 + 01*0)*)
considered all 40 structural reorganizations of 1*(% + 01*0(1 + 01*0)*)
1*(% + 01*0(1 + 01*0)*) transformed by structural rule 4 at position
[2, 2, 2] to 1*(% + 0(1*0)(1 + 01*0)*) transformed by structural rule 4
at position [2, 2] to 1*(% + (01*0)(1 + 01*0)*) transformed by reduction
rule 23 at position [] to (1 + 01*0)*
considered all 4 structural reorganizations of (1 + 01*0)*
(1 + 01*0)* is locally simplified
val it = (true,-) : bool * reg

For even fairly small regular expressions, running through all the structural reorganizations can take prohibitively long. So, one often has to bound the number of such reorganizations, as in:

- val reg = Reg.input "";
@ 1 + (% + 0 + 2)(% + 0 + 2)*1 +
@ (1 + (% + 0 + 2)(% + 0 + 2)*1)
@ (% + 0 + 2 + 1(% + 0 + 2)*1)

@ (% + 0 + 2 + 1(% + 0 + 2)*1)*
@ .
val reg = - : reg
- Reg.equal(Reg.weaklySimplify reg, reg);
val it = true : bool
- val (b', reg') = locSimp (SOME 10) reg;
val b' = false : bool
val reg' = - : reg
- Reg.output("", reg');
(0 + 2)*1(0 + 2 + 1(0 + 2)*1)*
val it = () : unit
- val (b'', reg'') = locSimp (SOME 1000) reg';
val b'' = true : bool
val reg'' = - : reg
- Reg.output("", reg'');
(0 + 2)*1(0 + 2 + 1(0 + 2)*1)*
val it = () : unit

Note that, in this transcript, reg' turns out to be locally simplified, despite the fact that b' is false.

Our global simplification algorithm comes in two variants, a non-distributive one, which doesn't use the distributive rules, and a distributive one, which does. Given a boolean b, a conservative approximation to subset testing sub, and a regular expression α, we say that α is globally simplified with respect to b and sub iff no strictly simpler regular expression can be found by an arbitrary number of applications of weak simplification, structural rules, reduction rules and—if b = true—distributive rules.

The global simplification of a regular expression α with respect to a boolean b and conservative approximation to subset testing sub consists of generating the set X of all regular expressions β that can be formed from α by an arbitrary number of applications of weak simplification, the structural rules, reduction rules, and—in the case of the distributive variant—the distributive ones. The simplest element of X is then selected (when there isn't a unique simplest element, the smallest of the simplest elements—in our total ordering on regular expressions—is selected). (A proof that the generation of X terminates even in the distributive case is not yet complete.) Of course, this algorithm is much less efficient than the local one, but by revisiting choices, it is capable of producing simpler answers.

We define a function/algorithm

globallySimplify ∈ Bool × (Reg × Reg → Bool) → Reg → Reg

by: for all b ∈ Bool, conservative approximations to subset testing sub, and α ∈ Reg, globallySimplify (b, sub) α is the result of running our global simplification algorithm on α, including the distributive rules iff b = true, and using sub as our conservative approximation to subset testing.

Theorem 3.3.31 For all b ∈ Bool, conservative approximations to subset testing sub, and α ∈ Reg:

• globallySimplify (b, sub) α is globally simplified with respect to b and sub;

• globallySimplify (b, sub) α is equivalent to α;

• alphabet(globallySimplify (b, sub) α) ⊆ alphabet α; and

• globallySimplify (b, sub) α ≤_simp α.

The Forlan module Reg provides the following functions relating to global simplification:

val globallySimplified : bool * (reg * reg -> bool) -> reg -> bool
val globallySimplifyTrace : int option * bool * (reg * reg -> bool) -> reg -> bool * reg
val globallySimplify : int option * bool * (reg * reg -> bool) -> reg -> bool * reg

The function globallySimplified takes in a boolean b and a conservative approximation to subset testing sub, and returns a function that tests whether a regular expression is globally simplified with respect to b and sub. The function globallySimplifyTrace implements globallySimplify. It emits tracing messages explaining its operation, takes in an extra argument of type int option, and produces an extra result of type bool. If this argument is NONE, then it runs as does globallySimplify, and the boolean result is always true. But if it is SOME n, for n ≥ 1, then at most n elements of the set X are generated, before picking the simplest one, and the boolean result is false if this n isn't enough to generate all of X. The function globallySimplify works identically, except it doesn't issue tracing messages.

For even quite small regular expressions, globallySimplified will fail to run to completion in an acceptable time-frame, and one will have to bound the size of the set X in order for globallySimplify and globallySimplifyTrace to run to completion. Here are some examples of how these functions can be used.

- Reg.globallySimplifyTrace
= (NONE, false, Reg.obviousSubset)
= (Reg.fromString "(00*1)*");
considering candidates with explanations of length 0
simplest result now: (00*1)*
considering candidates with explanations of length 1
simplest result now: (00*1)* transformed by reduction rule 16 at
position [] to % + 0(0 + 10)*1

considering candidates with explanations of length 2
simplest result now: (00*1)* transformed by reduction rule 16 at
position [] to % + 0(0 + 10)*1 transformed by reduction rule 22 at
position [2, 2, 1, 1] to % + 0((% + 1)0)*1
considering candidates with explanations of length 3
considering candidates with explanations of length 4
considering candidates with explanations of length 5
considering candidates with explanations of length 6
considering candidates with explanations of length 7
search completed after considering 36 candidates with maximum size 12
(00*1)* transformed by reduction rule 16 at position [] to
% + 0(0 + 10)*1 transformed by reduction rule 22 at position
[2, 2, 1, 1] to % + 0((% + 1)0)*1 is globally simplified
val it = (true,-) : bool * reg
- locSimp NONE (Reg.fromString "(00*11*)*");
val it = (true,-) : bool * reg
- Reg.output("", #2 it);
% + 00*1(% + (0 + 1)*1)
val it = () : unit
- fun globSimp(nOpt, b) =
= Reg.globallySimplify(nOpt, b, Reg.obviousSubset);
val globSimp = fn : int option * bool -> reg -> bool * reg
- globSimp (NONE, false) (Reg.fromString "(00*11*)*");
val it = (true,-) : bool * reg
- Reg.output("", #2 it);
% + 0(0 + 1)*1
val it = () : unit

Finally, here are two examples showing how using the distributive rules can make a difference:

- globSimp (NONE, false) (Reg.fromString "% + 0*(0 + 1)");
val it = (true,-) : bool * reg
- Reg.output("", #2 it);
% + 0*(0 + 1)
val it = () : unit
- Reg.globallySimplifyTrace
= (NONE, true, Reg.obviousSubset)
= (Reg.fromString "% + 0*(0 + 1)");
considering candidates with explanations of length 0
simplest result now: % + 0*(0 + 1)
considering candidates with explanations of length 1
considering candidates with explanations of length 2
considering candidates with explanations of length 3
considering candidates with explanations of length 4
simplest result now: % + 0*(0 + 1) transformed by distributive rule 1
at position [2] to % + 0*0 + 0*1 transformed by structural rule 2 at
position [] to (% + 0*0) + 0*1 transformed by reduction rule 26 at
position [1] to 0* + 0*1 transformed by reduction rule 21 at position
[] to 0*(% + 1)
considering candidates with explanations of length 5
considering candidates with explanations of length 6
considering candidates with explanations of length 7
considering candidates with explanations of length 8
considering candidates with explanations of length 9
considering candidates with explanations of length 10
considering candidates with explanations of length 11
search completed after considering 76 candidates with maximum size 11
% + 0*(0 + 1) transformed by distributive rule 1 at position [2] to
% + 0*0 + 0*1 transformed by structural rule 2 at position [] to
(% + 0*0) + 0*1 transformed by reduction rule 26 at position [1] to
0* + 0*1 transformed by reduction rule 21 at position [] to 0*(% + 1)
is globally simplified
val it = (true,-) : bool * reg
- globSimp (NONE, false) (Reg.fromString "(0(0(0 + 1))*)*");
val it = (true,-) : bool * reg
- Reg.output("", #2 it);
% + 0(0(% + 0 + 1))*
val it = () : unit
- globSimp (NONE, true) (Reg.fromString "(0(0(0 + 1))*)*");
val it = (true,-) : bool * reg
- Reg.output("", #2 it);
% + 0(0(% + 1))*
val it = () : unit

3.3.4 Notes

Although books on formal language theory usually study various regular expression equivalences, we have gone much further, giving three at least partly novel algorithms for regular expression simplification. Although many of the simplification and structural rules used in the simplification algorithms are well-known, some were invented, as was the concept of closure complexity.

3.4 Finite Automata and Labeled Paths

In this section, we: say what finite automata (FA) are, and show how they can be processed using Forlan; say what labeled paths are, and show how they can be processed using Forlan; and use the notion of labeled path to say what finite automata mean.

3.4.1 Finite Automata

A finite automaton (FA) M consists of:

• a finite set QM of symbols (we call the elements of QM the states of M);

• an element sM of QM (we call sM the start state of M);

• a subset AM of QM (we call the elements of AM the accepting states of M);

• a finite subset TM of { (q, x, r) | q, r ∈ QM and x ∈ Str } (we call the elements of TM the transitions of M, and we often write (q, x, r) as q –x→ r or q, x → r).

We order transitions first by their left-hand sides, then by their middles, and then by their right-hand sides, using our total orderings on symbols and strings. This gives us a total ordering on transitions. We often abbreviate QM , sM , AM and TM to Q, s, A and T , when it’s clear which FA we are working with. Whenever possible, we will use the mathematical variables p, q and r to name states. We write FA for the set of all finite automata, which is a countably infinite set. As an example, we can define an FA M as follows:

• QM = {A, B, C};

• sM = A;

• AM = {A, C}; and

• TM = {(A, 1, A), (B, 11, B), (C, 111, C), (A, 0, B), (A, 2, B), (A, 0, C), (A, 2, C), (B, 0, C), (B, 2, C)}.
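Because all four components are finite, nothing abstract is needed to represent such a machine. Here is a minimal SML sketch, deliberately independent of Forlan's abstract type fa, with states and transition labels represented as strings, encoding this example M:

(* a record-based sketch of FAs; Forlan instead provides the
   abstract type fa *)
type fa =
  { states : string list, start : string, accepting : string list,
    trans : (string * string * string) list }

(* the example FA M above *)
val m : fa =
  { states = ["A", "B", "C"], start = "A", accepting = ["A", "C"],
    trans = [("A","1","A"), ("B","11","B"), ("C","111","C"),
             ("A","0","B"), ("A","2","B"), ("A","0","C"),
             ("A","2","C"), ("B","0","C"), ("B","2","C")] }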

Finite automata are nondeterministic machines that take strings as inputs. When a machine is run on a given input, it begins in its start state. If, after some number of steps, the machine is in state p, the machine's remaining input begins with x, and one of the machine's transitions is p, x → q, then the machine may read x from its input and switch to state q. If p, y → r is also a transition, and the remaining input begins with y, then consuming y and switching to state r will also be possible, etc. The case when x = %, i.e., when we have a %-transition, is interesting: a state switch can happen without reading anything. If at least one execution sequence consumes all of the machine's input and takes it to one of its accepting states, then we say that the input is accepted by the machine; otherwise, we say that the input is rejected. The meaning of a machine is the language consisting of all strings that it accepts.

Here is how our example FA M can be expressed in Forlan's syntax:

{states}
A,B,C
{start state}
A
{accepting states}
A,C
{transitions}
A, 1 -> A; B, 11 -> B; C, 111 -> C;
A, 0 -> B; A, 2 -> B; A, 0 -> C;
A, 2 -> C; B, 0 -> C; B, 2 -> C

Since whitespace characters are ignored by Forlan's input routines, the preceding description of M could have been formatted in many other ways. States are separated by commas, and transitions are separated by semicolons. The order of states and transitions is irrelevant. Transitions that only differ in their right-hand states can be merged into single transition families. E.g., we can merge A, 0 -> B and A, 0 -> C into the transition family

A, 0 -> B | C

The Forlan module FA defines an abstract type fa (in the top-level environment) of finite automata, as well as a large number of functions and constants for processing FAs, including:

val input : string -> fa
val output : string * fa -> unit

Remember that it’s possible to read input from a file, and to write output to a file. During printing, Forlan merges transitions into transition families whenever possible. Suppose that our example FA is in the file 3.4-fa. We can input this FA into Forlan, and then output it to the standard output, as follows:

- val fa = FA.input "3.4-fa";
val fa = - : fa
- FA.output("", fa);
{states}
A, B, C
{start state}
A
{accepting states}
A, C
{transitions}
A, 0 -> B | C; A, 1 -> A; A, 2 -> B | C; B, 0 -> C; B, 2 -> C;
B, 11 -> B; C, 111 -> C
val it = () : unit

We also make use of graphical notation for finite automata. Each of the states of a machine is circled, and its accepting states are double-circled. The machine's start state is pointed to by an arrow coming from "Start", and each transition p, x → q is drawn as an arrow from state p to state q that is labeled by the string x. Multiple labeled arrows from one state to another can be abbreviated to a single arrow, whose label consists of the comma-separated list of the labels of the original arrows. Here is how our FA M can be described graphically:

[diagram of M: Start → A; A and C double-circled; loops A –1→ A, B –11→ B and C –111→ C; arrows A –0, 2→ B, B –0, 2→ C and A –0, 2→ C]

The Java program JForlan can be used to view and edit finite automata. It can be invoked directly, or run via Forlan. See the Forlan website for more information.

We define a function alphabet ∈ FA → Alp by: for all M ∈ FA, alphabet M is { a ∈ Sym | there are q, x, r such that q, x → r ∈ TM and a ∈ alphabet x }. I.e., alphabet M is all of the symbols appearing in the strings of M's transitions. We say that alphabet M is the alphabet of M. For example, the alphabet of our example FA M is {0, 1, 2}.

We say that an FA M is a sub-FA of an FA N iff:

• QM ⊆ QN ;

• sM = sN ;

• AM ⊆ AN ; and

• TM ⊆ TN .

Thus M = N iff M is a sub-FA of N and N is a sub-FA of M.

The Forlan module FA contains the functions

val equal : fa * fa -> bool
val numStates : fa -> int
val numTransitions : fa -> int
val alphabet : fa -> sym set
val sub : fa * fa -> bool

The function equal tests whether two FAs are equal, i.e., whether they have the same states, start states, accepting states and transitions. The functions numStates and numTransitions return the numbers of states and transitions, respectively, of an FA. The function alphabet returns the alphabet of an FA. And the function sub tests whether a first FA is a sub-FA of a second FA. For example, we can continue our Forlan session as follows:

- val fa' = FA.input "";
@ {states} A, B, C
@ {start state} A
@ {accepting states} C
@ {transitions}
@ A, 0 -> B; A, 2 -> B; A, 0 -> C; A, 2 -> C;
@ B, 0 -> C; B, 2 -> C
@ .
val fa' = - : fa
- FA.equal(fa', fa);
val it = false : bool
- FA.sub(fa', fa);
val it = true : bool
- FA.sub(fa, fa');
val it = false : bool
- FA.numStates fa;
val it = 3 : int
- FA.numTransitions fa;
val it = 9 : int
- SymSet.output("", FA.alphabet fa);
0, 1, 2
val it = () : unit

3.4.2 Labeled Paths and FA Meaning

We will formally explain when strings are accepted by finite automata using the notion of a labeled path. A labeled path consists of a pair (xs, q), where xs ∈ List(Sym × Str) and q ∈ Sym, and the set LP of labeled paths is List(Sym × Str) × Sym. Clearly, LP is countably infinite. We typically write ([(q1, x1), (q2, x2), ..., (qn, xn)], qn+1) ∈ LP as:

q1 –x1⇒ q2 –x2⇒ · · · qn –xn⇒ qn+1

or

q1,x1 ⇒ q2,x2 ⇒··· qn,xn ⇒ qn+1.

This path describes a way of getting from state q1 to state qn+1 in some unspecified machine, by reading the strings x1, ..., xn from the machine's input. We start out in state q1, make use of the transition q1, x1 → q2 to read x1 from the input and switch to state q2, etc.

Let lp = (xs, q) ∈ LP. We say that:

• the start state of lp (startState lp) is the left-hand side of the first element of xs, if xs is nonempty, and is q, if xs is empty;

• the end state of lp (endState lp) is q; 136 Regular Languages

• the length of lp (|lp|) is |xs|; and

• the label of lp (label lp) is the result of concatenating the right-hand sides of xs (%, if xs is empty).

This defines functions startState ∈ LP → Sym, endState ∈ LP → Sym and label ∈ LP → Str. For example, A = ([ ], A) is a labeled path whose start and end states are both A, whose length is 0, and whose label is %. And

A, 0 ⇒ B, 11 ⇒ B, 2 ⇒ C

is a labeled path whose start state is A, end state is C, length is 3, and label is (0)(11)(2) = 0112.

We can join compatible paths together. Let Join = { (lp1, lp2) ∈ LP × LP | endState lp1 = startState lp2 }, and define join ∈ Join → LP by: for all xs, ys ∈ List(Sym × Str) and q, r ∈ Sym, if ((xs, q), (ys, r)) ∈ Join, then join((xs, q), (ys, r)) = (xs @ ys, r). E.g., if lp1 and lp2 are defined by

lp1 = A, 0 ⇒ B, 11 ⇒ B, and lp2 = B, 11 ⇒ B, 2 ⇒ C,

then join(lp1, lp2) is

A, 0 ⇒ B, 11 ⇒ B, 11 ⇒ B, 2 ⇒ C.
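Labeled paths are just as concrete as FAs. The following minimal SML sketch, again using our own string-based representation rather than Forlan's abstract type lp, implements the functions just defined; note that this join omits the compatibility check, so it should only be applied to pairs in Join:

(* a labeled path (xs, q): the steps xs paired with the final state q *)
type lp = (string * string) list * string

fun startState ((((q, _) :: _), _) : lp) = q
  | startState (([], q) : lp) = q

fun endState ((_, q) : lp) = q

(* the label concatenates the right-hand sides of the steps *)
fun label ((xs, _) : lp) = String.concat (map #2 xs)

fun length ((xs, _) : lp) = List.length xs

(* defined only when endState lp1 = startState lp2 *)
fun join ((xs, _) : lp, (ys, r) : lp) : lp = (xs @ ys, r)

For instance, with lp1 = ([("A", "0"), ("B", "11")], "B") and lp2 = ([("B", "11"), ("B", "2")], "C"), label (join (lp1, lp2)) evaluates to "011112", matching the joined path above.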

A labeled path (xs,q) ∈ LP is valid for an FA M iff

• for all i ∈ [1 : |xs| − 1], ♯1(xs i), ♯2(xs i) → ♯1(xs (i + 1)) ∈ TM;

• if xs is nonempty, then ♯1(xs |xs|), ♯2(xs |xs|) → q ∈ TM; and

• q ∈ QM .

(The last of these conditions is redundant whenever xs is nonempty.) Recall our example FA M:

[diagram of M, as before: Start → A; A and C accepting; loops A –1→ A, B –11→ B and C –111→ C; arrows A –0, 2→ B, B –0, 2→ C and A –0, 2→ C]

The labeled path A = ([ ], A) is valid for M, since A ∈ QM . And

A, 0 ⇒ B, 11 ⇒ B, 2 ⇒ C

is valid for M, since A, 0 → B, B, 11 → B and B, 2 → C are in TM (and C ∈ QM). But the labeled path

A, % ⇒ A

is not valid for M, since A, % → A ∉ TM.

A string w is accepted by a finite automaton M iff there is a labeled path lp such that:

• lp is valid for M;

• the label of lp is w;

• the start state of lp is the start state of M; and

• the end state of lp is an accepting state of M.

For example, 0112 is accepted by M because of the labeled path

A, 0 ⇒ B, 11 ⇒ B, 2 ⇒ C,

since this labeled path is valid for M, is labeled by 0112 = (0)(11)(2), has a start state (A) that is M's start state, and has an end state (C) that is one of M's accepting states.

Clearly, if w is accepted by M, then alphabet w ⊆ alphabet M. Thus { w ∈ Str | w is accepted by M } ⊆ (alphabet M)∗, so we may define the language accepted by a finite automaton M (L(M)) to be

{ w ∈ Str | w is accepted by M }.

Furthermore:

Proposition 3.4.1 Suppose M is a finite automaton. Then alphabet(L(M)) ⊆ alphabet M.

In other words, every symbol of every string that is accepted by M comes from the alphabet of M, i.e., appears in the label of one of M's transitions. Going back to our example, we have that

L(M) = {1}∗ ∪ {1}∗{0, 2}{11}∗{0, 2}{111}∗ ∪ {1}∗{0, 2}{111}∗.

For example, %, 11, 110112111 and 2111111 are accepted by M. But 21112 and 2211 are not accepted by M.

Suppose that M is a sub-FA of N. Then any labeled path that is valid for M will also be valid for N. Furthermore, we have that:

Proposition 3.4.2 If M is a sub-FA of N, then L(M) ⊆ L(N).

We say that finite automata M and N are equivalent iff L(M) = L(N). In other words, M and N are equivalent iff M and N accept the same language. We define a relation ≈ on FA by: M ≈ N iff M and N are equivalent. It is easy to see that ≈ is reflexive on FA, symmetric and transitive.

The Forlan module LP defines an abstract type lp (in the top-level environment) of labeled paths, as well as various functions for processing labeled paths, including:

val input : string -> lp
val output : string * lp -> unit
val equal : lp * lp -> bool
val startState : lp -> sym
val endState : lp -> sym
val label : lp -> str
val length : lp -> int
val join : lp * lp -> lp

The function equal tests whether two labeled paths are equal. The functions startState, endState, label and length return the start state, end state, label and length, respectively, of a labeled path. And the function join joins two compatible paths, and issues an error message when given paths that are incompatible. The module FA also defines the functions

val checkLP : fa -> lp -> unit
val validLP : fa -> lp -> bool

for checking whether a labeled path is valid in a finite automaton. These are curried functions—functions that return functions as their results. The function checkLP takes in an FA M and returns a function that checks whether a labeled path lp is valid for M. When lp is not valid for M, the function explains why it isn't; otherwise, it prints nothing. And, the function validLP takes in an FA M and returns a function that tests whether a labeled path lp is valid for M, silently returning true, if it is, and silently returning false, otherwise.

Here are some examples of labeled path and FA processing (fa is still our example FA):

- val lp = LP.input "";
@ A, 1 => A, 0 => B, 11 => B, 2 => C, 111 => C
@ .
val lp = - : lp
- Sym.output("", LP.startState lp);
A
val it = () : unit
- Sym.output("", LP.endState lp);

C
val it = () : unit
- LP.length lp;
val it = 5 : int
- Str.output("", LP.label lp);
10112111
val it = () : unit
- val checkLP = FA.checkLP fa;
val checkLP = fn : lp -> unit
- checkLP lp;
val it = () : unit
- val lp' = LP.fromString "A";
val lp' = - : lp
- LP.length lp';
val it = 0 : int
- Str.output("", LP.label lp');
%
val it = () : unit
- checkLP lp';
val it = () : unit
- val lp'' = LP.input "";
@ A, % => A, 1 => B
@ .
val lp'' = - : lp
- checkLP lp'';
invalid transition: "A, % -> A"

uncaught exception Error
- val lp''' = LP.input "";
@ B, 2 => C, 34 => D
@ .
val lp''' = - : lp
- LP.output("", LP.join(lp'', lp'''));
A, % => A, 1 => B, 2 => C, 34 => D
val it = () : unit
- LP.output("", LP.join(lp''', lp''));
incompatible labeled paths

uncaught exception Error

3.4.3 Design of Finite Automata

In this subsection, we give two examples of finite automata design.

First, let's find a finite automaton that accepts the set of all strings of 0's and 1's with an even number of 0's. Thus, we should be looking for an FA whose alphabet is {0, 1}. It seems reasonable that our machine have two states: an accepting state A corresponding to the strings of 0's and 1's with an even number of zeros, and a state B corresponding to the strings of 0's and 1's with an odd number of zeros. Processing a 1 in either state should cause us to stay in that state, but processing a 0 in one of the states should cause us to switch to the other state. The above considerations lead us to the FA

[diagram: Start → A; A accepting; loops A –1→ A and B –1→ B; arrows A –0→ B and B –0→ A]
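In Forlan's syntax (see Subsection 3.4.1), this machine can be transcribed from the diagram as follows:

{states}
A, B
{start state}
A
{accepting states}
A
{transitions}
A, 1 -> A; B, 1 -> B; A, 0 -> B; B, 0 -> A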

For the second example, let's find an FA that accepts the language X = { w ∈ {0, 1, 2}∗ | for all substrings x of w, if |x| = 2, then x ∈ {01, 12, 20} }. We have that 0, 01, 012, 0120, etc., are in X, and so are 1, 12, 120, 1201, etc., and 2, 20, 201, 2012, etc. On the other hand, no string containing 00, 02, 10, 11, 21 or 22 is in X. The above observations suggest that part of our machine should look like:

[diagram: B –0→ C; C –1→ D; D –2→ B]

But how should the machine get started? The simplest approach is to make use of %-transitions from the start state, giving us the FA

[diagram: Start → A; %-transitions from A to each of B, C and D; B –0→ C; C –1→ D; D –2→ B; all four states accepting]
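Here is one Forlan transcription of this machine. Taking every state to be accepting is forced by the language itself, since % and every prefix of a string in X are again in X; that reading of the diagram is ours:

{states}
A, B, C, D
{start state}
A
{accepting states}
A, B, C, D
{transitions}
A, % -> B | C | D; B, 0 -> C; C, 1 -> D; D, 2 -> B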

Exercise 3.4.3 Let X = { w ∈ {0, 1}∗ | 010 is not a substring of w }. Find a finite automaton M such that L(M)= X.

Exercise 3.4.4 Let

A = {001, 011, 101, 111}, and B = { w ∈ {0, 1}∗ | for all x,y ∈ {0, 1}∗, if w = x0y, then there is a z ∈ A such that z is a prefix of y }.

Find a finite automaton M such that L(M) = B. Hint: see the second example of Subsection 3.2.2.

3.4.4 Notes

Finite automata are normally defined via transition functions, δ, which is simple to do for deterministic finite automata, but increasingly complicated as one adds degrees of nondeterminism. Furthermore, this approach means that a deterministic finite automaton (DFA) is not a nondeterministic finite automaton (NFA), and that an NFA is not an ε-NFA, because the transition function of a DFA is not one for an NFA, and the transition function for an NFA is not one of an ε-NFA. And formalizing our FAs, whose transition labels can be strings of length greater than one, is very messy if done via transition functions, which probably accounts for why such machines are not normally considered. Furthermore, in the standard approach, to say when strings are accepted by finite automata, one must first extend the different kinds of transition functions to work on strings.

In contrast, our approach is very simple. Instead of transition functions, we work with finite sets of transitions, enabling us to define deterministic finite automata, nondeterministic finite automata, and nondeterministic finite automata with %-moves (which we call empty-string finite automata) as restrictions on finite automata. Furthermore, using labeled paths to say when—and how—strings are accepted by finite automata is simple, natural and systematic. It's the analogue of using parse trees to say when—and how—strings are generated by grammars.

3.5 Isomorphism of Finite Automata

3.5.1 Definition and Algorithm

Let M and N be the finite automata

[diagram (M): Start → A; A, B and C all accepting; loop A –0→ A; arrows A –0→ B, A –1→ C and C –1→ B]

[diagram (N): Start → A; A, B and C all accepting; loop A –0→ A; arrows A –1→ B, A –0→ C and B –1→ C]

How are M and N related? Although they are not equal, they do have the same "structure", in that M can be turned into N by replacing A, B and C by A, C and B, respectively. When FAs have the same structure, we will say they are "isomorphic". In order to say more formally what it means for two FAs to be isomorphic, we define the notion of an isomorphism from one FA to another.

An isomorphism h from an FA M to an FA N is a bijection from QM to QN such that:

• h sM = sN ;

• { hq | q ∈ AM } = AN ; and

• { (h q), x → (h r) | q, x → r ∈ TM } = TN .

We define a relation iso on FA by: M iso N iff there is an isomorphism from M to N. We say that M and N are isomorphic iff M iso N. Consider our example FAs M and N, and let h be the function

{(A, A), (B, C), (C, B)}.

Then it is easy to check that h is an isomorphism from M to N. Hence M iso N. Clearly, if M and N are isomorphic, then they have the same alphabet.

Proposition 3.5.1 The relation iso is reflexive on FA, symmetric and transitive.

Proof. If M is an FA, then idM is an isomorphism from M to M. If M, N are FAs, and h is an isomorphism from M to N, then h⁻¹ is an isomorphism from N to M. If M1, M2, M3 are FAs, f is an isomorphism from M1 to M2, and g is an isomorphism from M2 to M3, then g ◦ f is an isomorphism from M1 to M3. ✷

Next, we see that, if M and N are isomorphic, then every string accepted by M is also accepted by N.

Proposition 3.5.2 Suppose M and N are isomorphic FAs. Then L(M) ⊆ L(N).

Proof. Let h be an isomorphism from M to N. Suppose w ∈ L(M). Then, there is a labeled path

lp = q1, x1 ⇒ q2, x2 ⇒ · · · qn, xn ⇒ qn+1,

such that w = x1x2 · · · xn, lp is valid for M, q1 = sM and qn+1 ∈ AM . Let

lp′ = (h q1), x1 ⇒ (h q2), x2 ⇒ · · · (h qn), xn ⇒ (h qn+1).

Then the label of lp′ is w, lp′ is valid for N, h q1 = h sM = sN and h qn+1 ∈ AN , showing that w ∈ L(N). ✷

A consequence of the two preceding propositions is that isomorphic FAs are equivalent. Of course, the converse is not true, in general, since there are many FAs that accept the same language and yet don’t have the same structure. 3.5 Isomorphism of Finite Automata 143

Proposition 3.5.3 Suppose M and N are isomorphic FAs. Then M ≈ N.

Proof. Since M iso N, we have that N iso M, by Proposition 3.5.1. Thus, by Proposition 3.5.2, we have that L(M) ⊆ L(N) ⊆ L(M). Hence L(M) = L(N), i.e., M ≈ N. ✷

The function renameStates takes in a pair (M,f), where M ∈ FA and f is a bijection from QM to some set of symbols, and returns the FA produced from M by renaming M’s states using the bijection f.

Proposition 3.5.4 Suppose M is an FA and f is a bijection from QM to some set of symbols. Then renameStates(M,f) iso M. The following function is a special case of renameStates. The function renameStatesCanonically ∈ FA → FA renames the states of an FA M to:

• A, B, etc., when the automaton has no more than 26 states (the smallest state of M will be renamed to A, the next smallest one to B, etc.); or

• ⟨1⟩, ⟨2⟩, etc., otherwise.

Of course, the resulting automaton will always be isomorphic to the original one.

Next, we consider an algorithm that finds an isomorphism from an FA M to an FA N, if one exists, and that indicates that no such isomorphism exists, otherwise. Our algorithm is based on the following lemma.

Lemma 3.5.5 Suppose that h is a bijection from QM to QN . Then

{ (h q), x → (h r) | q, x → r ∈ TM } = TN

iff, for all (q, r) ∈ h and x ∈ Str, there is a subset of h that is a bijection from

{ p ∈ QM | q, x → p ∈ TM }

to

{ p ∈ QN | r, x → p ∈ TN }.

If any of the following conditions are true, then the algorithm reports that there is no isomorphism from M to N:

• |QM | ≠ |QN |;

• |AM | ≠ |AN |;

• |TM | ≠ |TN |;

• sM ∈ AM , but sN ∉ AN ; and

• sN ∈ AN , but sM ∉ AM .

Otherwise, it calls its main function, findIso, which is defined by well-founded recursion. The function findIso is called with an argument (f, [C1,...,Cn]), where:

• f is a bijection from a subset of QM to a subset of QN , and

• the Ci are constraints of the form (X,Y ), where X ⊆ QM , Y ⊆ QN and |X| = |Y |.

It returns an element of Option X, where X is the set of bijections from QM to QN : none is returned to indicate failure, and some h is returned when it has produced the bijection h. We say that a bijection satisfies a constraint (X, Y ) iff it has a subset that is a bijection from X to Y . We say that the weight of a constraint (X, Y ) is 3^|X|. Thus, we have the following facts:

• If (X, Y ) is a constraint, then its weight is at least 3^0 = 1.

• If ({p} ∪ X, {q} ∪ Y ) is a constraint, p ∉ X, q ∉ Y and |X| ≥ 1, then the weight of ({p} ∪ X, {q} ∪ Y ) is 3^(1+|X|) = 3 · 3^|X|, the weight of ({p}, {q}) is 3^1 = 3, and the weight of (X, Y ) is 3^|X|. Because |X| ≥ 1, it follows that the sum of the weights of ({p}, {q}) and (X, Y ) (3 + 3^|X|) is strictly less than the weight of ({p} ∪ X, {q} ∪ Y ).

Each argument to a recursive call of findIso will be strictly smaller than the argument to the original call in the well-founded termination relation in which argument (f, [C1, ..., Cn]) is less than argument (f′, [C′1, ..., C′m]) iff either:

• |f| > |f′| (remember that |f| ≤ |QM | = |QN |); or

• |f| = |f′|, but the sum of the weights of the constraints C1, ..., Cn is strictly less than the sum of the weights of the constraints C′1, ..., C′m.

When findIso is called with argument (f, [C1,...,Cn]), the following prop- erty, which we call (†), will hold: for all bijections h from a subset of QM to a subset of QN , if h ⊇ f and h satisfies all of the Ci’s, then:

• h is a bijection from QM to QN ;

• h sM = sN ;

• { hq | q ∈ AM } = AN ; and

• for all (q, r) ∈ f and x ∈ Str, there is a subset of h that is a bijection from { p ∈ QM | q, x → p ∈ TM } to { p ∈ QN | r, x → p ∈ TN }.

Thus, if findIso is called with a bijection f and an empty list of constraints, it will follow, by Lemma 3.5.5, that f is an isomorphism from M to N.

Initially, the algorithm calls findIso with the initial argument:

(∅, [({sM }, {sN }), (U, V ), (X, Y )]),

where U = AM − {sM }, V = AN − {sN }, X = (QM − AM ) − {sM } and Y = (QN − AN ) − {sN }. Clearly the initial argument satisfies (†).

If findIso is called with argument (f, [ ]), then it returns some f.

Otherwise, if findIso is called with argument (f, [(∅, ∅), C2, ..., Cn]), then it calls itself recursively with argument (f, [C2, ..., Cn]). (The size of the bijection has been preserved, but the sum of the weights of the constraints has gone down by one.)

Otherwise, if findIso is called with argument (f, [({q}, {r}), C2, ..., Cn]), then it proceeds as follows:

• If (q, r) ∈ f, then it calls itself recursively with argument (f, [C2,...,Cn]) and returns what the recursive call returns. (The size of the bijection has been preserved, but the sum of the weights of the constraints has gone down by three.)

• Otherwise, if q ∈ domain f or r ∈ range f, then findIso returns none.

• Otherwise, it works its way through the strings appearing in the transitions of M and N, forming a list of new constraints, C′1, ..., C′m. Given such a string, x, it lets Ax = { p ∈ QM | q, x → p ∈ TM } and Bx = { p ∈ QN | r, x → p ∈ TN }. If |Ax| ≠ |Bx|, then it returns none. Otherwise, it adds the constraint (Ax, Bx) to our list of new constraints. When all such strings have been exhausted, it calls itself recursively with argument (f ∪ {(q, r)}, [C′1, ..., C′m, C2, ..., Cn]) and returns what this recursive call returns. (The size of the bijection has been increased by one.)

Otherwise, findIso has been called with argument (f, [(A, A′), C2, ..., Cn]), where |A| > 1, and it proceeds as follows. It picks the smallest symbol q ∈ A, and lets B = A − {q}. Then, it works its way through the elements of A′. Given r ∈ A′, it lets B′ = A′ − {r}. Then, it tries calling itself recursively with argument (f, [({q}, {r}), (B, B′), C2, ..., Cn]). (The size of the bijection has been preserved, but the sum of the weights of the constraints has gone down by 2 · 3^|B| − 3 ≥ 3.) If this call returns a result of form some h, then it returns this to its caller. But if the call returns none, it tries the next element of A′. If it exhausts the elements of A′, then it returns none.

Lemma 3.5.6 If findIso is called with an argument (f, [C1, ..., Cn]) satisfying property (†), then it returns none, if there is no isomorphism from M to N that is a superset of f and satisfies the constraints C1, ..., Cn, and returns some h where h is such an isomorphism, otherwise.

Proof. By well-founded induction on our termination relation. I.e., when proving the result for (f, [C1, ..., Cn]), we may assume that the result holds for all arguments (f′, [C′1, ..., C′m]) that are strictly smaller in our termination ordering. ✷

Theorem 3.5.7 If findIso is called with its initial argument, then it returns none, if there is no isomorphism from M to N, and returns some h where h is an isomorphism from M to N, otherwise.

Proof. Follows easily from Lemma 3.5.6. ✷

3.5.2 Isomorphism Finding/Checking in Forlan

The Forlan module FA also defines the functions

val isomorphism : fa * fa * sym_rel -> bool
val findIsomorphism : fa * fa -> sym_rel
val isomorphic : fa * fa -> bool
val renameStates : fa * sym_rel -> fa
val renameStatesCanonically : fa -> fa

The function isomorphism checks whether a relation on symbols is an isomorphism from one FA to another. The function findIsomorphism tries to find an isomorphism from one FA to another; it issues an error message if there isn't one. The function isomorphic checks whether two FAs are isomorphic. The function renameStates issues an error message if the supplied relation isn't a bijection from the set of states of the supplied FA to some set; otherwise, it returns the result of renameStates. And the function renameStatesCanonically acts like renameStatesCanonically.

Suppose fa1 and fa2 have been bound to our example finite automata M and N:

[diagram (M): Start → A; A, B and C all accepting; loop A –0→ A; arrows A –0→ B, A –1→ C and C –1→ B]

[diagram (N): Start → A; A, B and C all accepting; loop A –0→ A; arrows A –1→ B, A –0→ C and B –1→ C]

Then, here are some example uses of the above functions:

- val rel = FA.findIsomorphism(fa1, fa2);
val rel = - : sym_rel
- SymRel.output("", rel);
(A,A),(B,C),(C,B)
val it = () : unit
- FA.isomorphism(fa1, fa2, rel);
val it = true : bool
- FA.isomorphic(fa1, fa2);
val it = true : bool
- val rel' = FA.findIsomorphism(fa1, fa1);
val rel' = - : sym_rel
- SymRel.output("", rel');
(A,A),(B,B),(C,C)
val it = () : unit
- FA.isomorphism(fa1, fa1, rel');
val it = true : bool
- FA.isomorphism(fa1, fa2, rel');
val it = false : bool
- val rel'' = SymRel.input "";
@ (A, 2), (B, 1), (C, 0)
@ .
val rel'' = - : sym_rel
- val fa3 = FA.renameStates(fa1, rel'');
val fa3 = - : fa
- FA.output("", fa3);
{states}
0, 1, 2
{start state}
2
{accepting states}
0, 1, 2
{transitions}
0, 1 -> 1; 2, 0 -> 1 | 2; 2, 1 -> 0
val it = () : unit
- val fa4 = FA.renameStatesCanonically fa3;
val fa4 = - : fa
- FA.output("", fa4);
{states}
A, B, C
{start state}
C
{accepting states}
A, B, C
{transitions}
A, 1 -> B; C, 0 -> B | C; C, 1 -> A
val it = () : unit
- FA.equal(fa4, fa1);
val it = false : bool
- FA.isomorphic(fa4, fa1);
val it = true : bool

3.5.3 Notes

Books on formal language theory rarely formalize the isomorphism of finite automata, although most note or prove that the minimization of deterministic finite automata yields a result that is unique up to the renaming of states. Our algorithm for trying to find an isomorphism between finite automata will be unsurprising to those familiar with graph algorithms.

3.6 Checking Acceptance and Finding Accepting Paths

In this section we study algorithms for checking whether a string is accepted by a finite automaton, and for finding a labeled path that explains why a string is accepted by a finite automaton.

3.6.1 Processing a String from a Set of States

Suppose M is a finite automaton. We define a function ∆M ∈ P QM × Str → P QM by: ∆M (P, w) is the set of all r ∈ QM such that there is an lp ∈ LP such that:

• w is the label of lp;

• lp is valid for M;

• the start state of lp is in P ; and

• r is the end state of lp.

In other words, ∆M (P, w) consists of all of the states that can be reached from elements of P by labeled paths that are labeled by w and valid for M. When the FA M is clear from the context, we sometimes abbreviate ∆M to ∆. Suppose M is the finite automaton

[diagram of M, as before: Start → A; A and C accepting; loops A –1→ A, B –11→ B and C –111→ C; arrows A –0, 2→ B, B –0, 2→ C and A –0, 2→ C]

Then, ∆M ({A}, 12111111) = {B, C}, since

A, 1 ⇒ A, 2 ⇒ B, 11 ⇒ B, 11 ⇒ B, 11 ⇒ B and A, 1 ⇒ A, 2 ⇒ C, 111 ⇒ C, 111 ⇒ C

are all of the labeled paths that are labeled by 12111111, valid for M and whose start states are A. Furthermore, ∆M ({A, B, C}, 11) = {A, B}, since

A, 1 ⇒ A, 1 ⇒ A and B, 11 ⇒ B

are all of the labeled paths that are labeled by 11 and valid for M.

Suppose M is a finite automaton, P ⊆ QM and w ∈ Str. We can calculate ∆M (P, w) as follows. Let S be the set of all suffixes of w. Given y ∈ S, we write pre y for the unique x such that w = xy. First, we generate the least subset X of QM × S such that:

(1) for all p ∈ P , (p, w) ∈ X; and

(2) for all q, r ∈ QM and x,y ∈ Str, if (q,xy) ∈ X and q,x → r ∈ TM , then (r, y) ∈ X.

We start by using rule (1), adding (p, w) to X, whenever p ∈ P . Then X (and any superset of X) will satisfy property (1). Then, rule (2) is used repeatedly to add more pairs to X. Since QM × S is a finite set, eventually X will satisfy property (2). If M is our example finite automaton, then here are the elements of X, when P = {A} and w = 2111:

• (A, 2111);

• (B, 111), because of (A, 2111) and the transition A, 2 → B;

• (C, 111), because of (A, 2111) and the transition A, 2 → C (now, we’re done with (A, 2111));

• (B, 1), because of (B, 111) and the transition B, 11 → B (now, we’re done with (B, 111));

• (C, %), because of (C, 111) and the transition C, 111 → C (now, we’re done with (C, 111)); and

• nothing can be added using (B, 1) and (C, %), and so we’ve found all the elements of X.

The following lemma explains when pairs show up in X.

Lemma 3.6.1 For all q ∈ QM and y ∈ S,

(q,y) ∈ X iff q ∈ ∆M (P, pre y).

Proof. The “only if” (left-to-right) direction is by induction on X: we show that, for all (q,y) ∈ X, q ∈ ∆M (P, pre y).

(1) Suppose p ∈ P . Then p ∈ ∆M (P, %). But pre w = %, so that p ∈ ∆M (P, pre w).

(2) Suppose q, r ∈ QM , x, y ∈ Str, (q, xy) ∈ X and (q, x, r) ∈ TM . Assume the inductive hypothesis: q ∈ ∆M (P, pre(xy)). Thus there is an lp ∈ LP such that pre(xy) is the label of lp, lp is valid for M, the start state of lp is in P , and q is the end state of lp. Let lp′ ∈ LP be the result of adding the step q, x ⇒ r at the end of lp. Thus pre y is the label of lp′, lp′ is valid for M, the start state of lp′ is in P , and r is the end state of lp′, showing that r ∈ ∆M (P, pre y).

For the "if" (right-to-left) direction, we have that there is a labeled path

q1, x1 ⇒ q2, x2 ⇒ · · · qn−1, xn−1 ⇒ qn,

that is valid for M and where pre y = x1x2 · · · xn−1, q1 ∈ P and qn = q. Since q1 ∈ P and w = (pre y)y = x1x2 · · · xn−1y, we have that (q1, x1x2 · · · xn−1y) = (q1, w) ∈ X, by (1). But (q1, x1, q2) ∈ TM , and thus (q2, x2 · · · xn−1y) ∈ X, by (2). Continuing on in this way (we could do this by mathematical induction), we finally get that (q, y) = (qn, y) ∈ X. ✷

Lemma 3.6.2 For all q ∈ QM , (q, %) ∈ X iff q ∈ ∆M (P, w).

Proof. Suppose (q, %) ∈ X. Lemma 3.6.1 tells us that q ∈ ∆M (P, pre %). But pre % = w, and thus q ∈ ∆M (P, w). Suppose q ∈ ∆M (P, w). Since w = pre %, we have that q ∈ ∆M (P, pre %). Lemma 3.6.1 tells us that (q, %) ∈ X. ✷

By Lemma 3.6.2, we have that

∆M (P, w)= { q ∈ QM | (q, %) ∈ X }.

Thus, we return the set of all states q that are paired with % in X.
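The whole procedure fits in a few lines of SML. The sketch below is not Forlan's processStr; it follows the construction of X literally, with states as strings, input strings as character lists, and transitions as (state, label, state) triples:

type trans = string * char list * string

(* stripPrefix (x, u) = SOME v if u = x @ v, and NONE otherwise *)
fun stripPrefix ([], u) = SOME u
  | stripPrefix (_, []) = NONE
  | stripPrefix (a :: x, b :: u) =
      if a = b then stripPrefix (x, u) else NONE

(* rule (2): all pairs reachable from (q, u) by a single transition *)
fun step (trans : trans list) (q, u) =
    List.mapPartial
      (fn (p, x, r) =>
           if p = q
           then Option.map (fn v => (r, v)) (stripPrefix (x, u))
           else NONE)
      trans

(* worklist closure: each pair enters seen exactly once *)
fun close trans seen [] = seen
  | close trans seen (pair :: todo) =
      if List.exists (fn p => p = pair) seen
      then close trans seen todo
      else close trans (pair :: seen) (step trans pair @ todo)

(* deltaSet trans (ps, w): the states paired with the empty suffix
   in the generated set X *)
fun deltaSet (trans : trans list) (ps, w) =
    let val xs = close trans [] (map (fn p => (p, w)) ps)
    in List.map #1 (List.filter (null o #2) xs) end

(* and acceptance, anticipating Proposition 3.6.3 below *)
fun accepted trans (start, accepting) w =
    List.exists (fn q => List.exists (fn a => a = q) accepting)
                (deltaSet trans ([start], w))

With trans holding the nine transitions of our example M, deltaSet trans (["A"], explode "2111") evaluates to ["C"], matching the computation of X just carried out.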

3.6.2 Checking String Acceptance and Finding Accepting Paths

Proposition 3.6.3 Suppose M is a finite automaton. Then

L(M) = { w ∈ Str | ∆M ({sM }, w) ∩ AM ≠ ∅ }.

Proof. Suppose w ∈ L(M). Then w is the label of a labeled path lp such that lp is valid for M, the start state of lp is sM and the end state of lp is in AM . Let q be the end state of lp. Thus q ∈ ∆M ({sM }, w) and q ∈ AM , showing that ∆M ({sM }, w) ∩ AM ≠ ∅.

Suppose ∆M ({sM }, w) ∩ AM ≠ ∅, so that there is a q such that q ∈ ∆M ({sM }, w) and q ∈ AM . Thus w is the label of a labeled path lp such that lp is valid for M, the start state of lp is sM , and the end state of lp is q ∈ AM . Thus w ∈ L(M). ✷

According to Proposition 3.6.3, to check if a string w is accepted by a finite automaton M, we simply use our algorithm to generate ∆M ({sM }, w), and then check if this set contains at least one accepting state.

Given a finite automaton M, subsets P, R of QM and a string w, how do we search for a labeled path that is labeled by w, valid for M, starts from an element of P , and ends with an element of R? What we need to do is associate with each pair

(q, y) of the set X that we generate when computing ∆M (P, w) a labeled path lp such that lp is labeled by pre y, lp is valid for M, the start state of lp is an element of P , and the end state of lp is q. If we process the elements of X in a breadth-first (rather than depth-first) manner, this will ensure that these labeled paths are as short as possible. As we generate the elements of X, we look for a pair of the form (q, %), where q ∈ R. Our answer will then be the labeled path associated with this pair.

The Forlan module FA also contains the following functions for processing strings and checking string acceptance:

val processStr : fa -> sym set * str -> sym set
val accepted : fa -> str -> bool

The function processStr takes in a finite automaton M, and returns a function that takes in a pair (P, w) and returns ∆M (P, w). And, the function accepted takes in a finite automaton M, and returns a function that checks whether a string x is accepted by M. The Forlan module FA also contains the following functions for finding labeled paths:

val findLP : fa -> sym set * str * sym set -> lp
val findAcceptingLP : fa -> str -> lp

The function findLP takes in a finite automaton M, and returns a function that takes in a triple (P, w, R) and tries to find a labeled path lp that is labeled by w, valid for M, starts out with an element of P , and ends up at an element of R. It issues an error message when there is no such labeled path. The function findAcceptingLP takes in a finite automaton M, and returns a function that looks for a labeled path lp that explains why a string w is accepted by M. It issues an error message when there is no such labeled path. The labeled paths returned by these functions are always of minimal length. Suppose fa is the finite automaton

[diagram of M, as before: Start → A; A and C accepting; loops A –1→ A, B –11→ B and C –111→ C; arrows A –0, 2→ B, B –0, 2→ C and A –0, 2→ C]

We begin by applying our four functions to fa, and giving names to the resulting functions:

- val processStr = FA.processStr fa;
val processStr = fn : sym set * str -> sym set
- val accepted = FA.accepted fa;
val accepted = fn : str -> bool
- val findLP = FA.findLP fa;
val findLP = fn : sym set * str * sym set -> lp
- val findAcceptingLP = FA.findAcceptingLP fa;
val findAcceptingLP = fn : str -> lp

Next, we’ll define a set of states and a string to use later:

- val bs = SymSet.input "";
@ A,B,C
@ .
val bs = - : sym set
- val x = Str.input "";
@ 11
@ .
val x = [-,-] : str

Here are some example uses of our functions:

- SymSet.output("", processStr(bs, x));
A,B
val it = () : unit
- accepted(Str.input "");
@ 12111111
@ .
val it = true : bool
- accepted(Str.input "");
@ 1211
@ .
val it = false : bool
- LP.output("", findLP(bs, x, bs));
B, 11 => B
val it = () : unit
- LP.output("", findAcceptingLP(Str.input ""));
@ 12111111
@ .
A, 1 => A, 2 => C, 111 => C, 111 => C
val it = () : unit
- LP.output("", findAcceptingLP(Str.input ""));
@ 222
@ .
no such labeled path exists

uncaught exception Error

3.6.3 Notes

The material in this section is original. Our definition of the meaning of FAs via labeled paths allows us not simply to test whether an FA accepts a string w, but to ask for evidence—in the form of a labeled path—for why the FA accepts w.

3.7 Simplification of Finite Automata

In this section, we: say what it means for a finite automaton to be simplified; study an algorithm for simplifying finite automata; and see how finite automata can be simplified in Forlan.

Suppose M is the finite automaton

[diagram of M: states A, B, C, D and E; Start → A; among the transitions are %-transitions A → B and B → C, loops labeled 0, 1 and 2, and 0-transitions connecting D and E]

M is odd for two distinct reasons. First, there are no valid labeled paths from the start state to D and E, and so these states are redundant. Second, there are no valid labeled paths from C to an accepting state, and so it is also redundant. We will say that C is not "live" (C is "dead"), and that D and E are not "reachable".

Suppose M is a finite automaton. We say that a state q ∈ QM is:

• reachable in M iff there is a labeled path lp such that lp is valid for M, the start state of lp is sM , and the end state of lp is q;

• live in M iff there is a labeled path lp such that lp is valid for M, the start state of lp is q, and the end state of lp is in AM ;

• dead in M iff q is not live in M; and

• useful in M iff q is both reachable and live in M.

Let M be our example finite automaton. The reachable states of M are: A, B and C. The live states of M are: A, B, D and E. And, the useful states of M are: A and B.

There is a simple algorithm for generating the set of reachable states of a finite automaton M. We generate the least subset X of QM such that:

• sM ∈ X; and

• for all q, r ∈ QM and x ∈ Str, if q ∈ X and (q, x, r) ∈ TM , then r ∈ X.

The start state of M is added to X, since sM is always reachable, by the zero-length labeled path sM . Then, if q is reachable, and (q, x, r) is a transition of M, then r is clearly reachable. Thus all of the elements of X are indeed reachable. And, it's not hard to show that every reachable state will be added to X.

Similarly, there is a simple algorithm for generating the set of live states of a finite automaton M. We generate the least subset Y of QM such that:

• AM ⊆ Y ; and

• for all q, r ∈ QM and x ∈ Str, if r ∈ Y and (q, x, r) ∈ TM , then q ∈ Y .
This time it's the accepting states of M that are initially added to our set, since each accepting state is trivially live. Then, if r is live, and (q, x, r) is a transition of M, then q is clearly live. Thus, we can generate the set of useful states of an FA by generating the set of reachable states, generating the set of live states, and intersecting those sets of states; a small SML sketch of the underlying closure computation appears below.
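Both generating procedures are instances of a single least-fixed-point computation. The following is a minimal SML sketch of that computation, with states represented as strings; it is an illustration only, not Forlan's actual implementation:

fun member y xs = List.exists (fn x => x = y) xs

fun addNew ([], seen) = seen
  | addNew (y :: ys, seen) =
      addNew (ys, if member y seen then seen else seen @ [y])

(* iterate step until no new states are produced *)
fun close step seen =
  let val seen' = addNew (step seen, seen)
  in if length seen' = length seen then seen else close step seen' end

(* reachable: follow transitions (q, x, r) forwards from the start state *)
fun reachable start trans =
  close (fn xs =>
           List.mapPartial
             (fn (q, _, r) => if member q xs then SOME r else NONE)
             trans)
        [start]

(* live: follow transitions backwards from the accepting states *)
fun live accepting trans =
  close (fn xs =>
           List.mapPartial
             (fn (q, _, r) => if member r xs then SOME q else NONE)
             trans)
        accepting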

Now, suppose N is the FA
[diagram: N: states A and B, start state A, accepting state B; loops A, 0 → A and B, 1 → B, plus transitions from A to B labeled %, 0 and 1]

Here, the transitions (A, 0, B) and (A, 1, B) are redundant, in the sense that if N ′ is the result of removing these transitions from N, we still have that B ∈ ∆N ′ ({A}, 0) and B ∈ ∆N ′ ({A}, 1). Given an FA M and a finite subset U of { (q, x, r) | q, r ∈ QM and x ∈ Str }, we write M/U for the FA that is identical to M except that its set of transitions is U. If M is an FA and (p,x,q) ∈ TM , we say that:

• (p,x,q) is redundant in M iff q ∈ ∆N ({p},x), where N = M/(TM − {(p,x,q)}); and

• (p,x,q) is irredundant in M iff (p,x,q) is not redundant in M. We say that a finite automaton M is simplified iff either

• every state of M is useful, and every transition of M is irredundant; or

• |QM | = 1 and AM = TM = ∅.
Thus the FA
[diagram: a single non-accepting start state A, with no transitions]
is simplified, even though its start state is not live, and is thus not useful.

Proposition 3.7.1
If M is a simplified finite automaton, then alphabet(L(M)) = alphabet M.

Proof. We always have that alphabet(L(M)) ⊆ alphabet M. But, because M is simplified, we also have that alphabet M ⊆ alphabet(L(M)), i.e., that every symbol appearing in a string of one of M’s transitions also appears in one of the strings accepted by M. This is because given any transition (p,x,q) of a simplified finite automaton, p is reachable and q is live. ✷

To give our simplification algorithm for finite automata, we need an auxiliary function for removing redundant transitions from an FA. Given an FA M, p, q ∈ QM and x ∈ Str, we say that (p, x, q) is implicit in M iff q ∈ ∆M ({p}, x). Given an FA M, we define a function remRedunM ∈ P(TM ) × P(TM ) → P(TM ) (we often drop the M when it's clear from the context) by well-founded recursion on the size of its second argument. For U, V ⊆ TM , remRedun(U, V ) proceeds as follows:

• If V = ∅, then it returns U.

• Otherwise, let v be the greatest element of V , and V ′ = V − {v}. If v is implicit in M/(U ∪ V ′), then remRedun returns the result of evaluating remRedun(U, V ′). Otherwise, it returns the result of evaluating remRedun(U ∪ {v}, V ′).

In general, there are multiple—incompatible—ways of removing redundant transitions from an FA. remRedun is defined so as to favor removing transitions that are larger in our total ordering on transitions.
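The following is a minimal SML sketch of remRedun over sorted transition lists; implicitIn, which tests whether a transition is implicit in the FA whose transition set is the given list, is a hypothetical parameter, and the sketch is not Forlan's actual implementation:

fun remRedun implicitIn (us, []) = us
  | remRedun implicitIn (us, vs) =
      let val v = List.last vs                      (* greatest element of V *)
          val vs' = List.take (vs, length vs - 1)   (* V - {v} *)
      in
        if implicitIn (us @ vs') v
        then remRedun implicitIn (us, vs')          (* v is redundant: drop it *)
        else remRedun implicitIn (us @ [v], vs')    (* v must be kept *)
      end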

Proposition 3.7.2
Suppose M is a finite automaton. For all U, V ⊆ TM , if all the elements of U are irredundant in M/(U ∪ V ), and, for all p, q ∈ QM and x ∈ Str, (p, x, q) is implicit in M iff (p, x, q) is implicit in M/(U ∪ V ), then all the elements of remRedun(U, V ) are irredundant in M/remRedun(U, V ), and, for all p, q ∈ QM and x ∈ Str, (p, x, q) is implicit in M iff (p, x, q) is implicit in M/remRedun(U, V ).

Proof. By well-founded induction on the size of the second argument to remRedun. ✷

Now we can give an algorithm for simplifying finite automata. We define a function simplify ∈ FA → FA by: simplify M is the finite automaton N produced by the following process.

• First, the useful states of M are determined.

• If sM is not useful in M, then N is defined by:

– QN = {sM };

– sN = sM ;

– AN = ∅; and

– TN = ∅.

• And, if sM is useful in M, then N is defined by:

– QN = { q ∈ QM | q is useful in M };

– sN = sM ;

– AN = AM ∩ QN = { q ∈ AM | q ∈ QN }; and

– TN = remRedun(∅, { (q, x, r) ∈ TM | q, r ∈ QN }).

Proposition 3.7.3
Suppose M is a finite automaton. Then:
(1) simplify M is simplified;
(2) simplify M ≈ M;
(3) Qsimplify M ⊆ QM and Tsimplify M ⊆ TM ; and
(4) alphabet(simplify M) = alphabet(L(M)) ⊆ alphabet M.

Proof. Follows easily using Propositions 3.7.1 and 3.7.2. ✷

If M is the finite automaton
[diagram: the five-state FA M from the start of this section]
then simplify M is the finite automaton
[diagram: states A and B, start state A, accepting state B; transitions A, % → B, A, 0 → A and B, 1 → B]

And if N is the finite automaton
[diagram: the two-state FA N from above, with loops A, 0 → A and B, 1 → B and transitions from A to B labeled %, 0 and 1]
then simplify N is the finite automaton
[diagram: states A and B, start state A, accepting state B; transitions A, % → B, A, 0 → A and B, 1 → B]

Our simplification function/algorithm simplify gives us an algorithm for testing whether an FA is simplified: we apply simplify to it, and check that the resulting FA is equal to the original one.
Our simplification algorithm also gives us an algorithm for testing whether the language accepted by an FA M is empty. We first simplify M, calling the result N. We then test whether AN = ∅. If the answer is "yes", clearly L(M) = L(N) = ∅. And if the answer is "no", then sN is useful, and so N (and thus M) accepts at least one string. The Forlan module FA includes the following functions relating to the simplification of finite automata:

val simplify : fa -> fa val simplified : fa -> bool

The function simplify corresponds to simplify, and simplified tests whether an FA is simplified. In the following, suppose fa1 is the finite automaton

[diagram: fa1: the five-state FA M shown at the start of this section]
fa2 is the finite automaton
[diagram: fa2: the two-state FA N shown above]
and fa3 is the finite automaton
[diagram: fa3: states A, B and C, start state A, accepting state A, with %-transitions connecting the states in both directions]

Here are some example uses of simplify and simplified:

- FA.simplified fa1;
val it = false : bool
- val fa1’ = FA.simplify fa1;
val fa1’ = - : fa
- FA.output("", fa1’);
{states} A, B
{start state} A
{accepting states} B
{transitions} A, % -> B; A, 0 -> A; B, 1 -> B

val it = () : unit
- FA.simplified fa1’;
val it = true : bool
- val fa2’ = FA.simplify fa2;
val fa2’ = - : fa
- FA.output("", fa2’);
{states} A, B
{start state} A
{accepting states} B
{transitions} A, % -> B; A, 0 -> A; B, 1 -> B
val it = () : unit
- val fa3’ = FA.simplify fa3;
val fa3’ = - : fa
- FA.output("", fa3’);
{states} A, B, C
{start state} A
{accepting states} A
{transitions} A, % -> B | C; B, % -> A; C, % -> A
val it = () : unit

Thus the simplification of fa3 resulted in the removal of the %-transitions between B and C.
Exercise 3.7.4
In the simplification of fa3, if transitions had been considered for removal due to being redundant in other orders, what FAs could have resulted?
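Returning to the simplification and emptiness tests described earlier in this section, both can be carried out at the Forlan prompt using only functions already introduced; the following is a sketch, assuming that Str.fromString "%" yields the empty string:

- fun isSimplified fa = FA.equal (FA.simplify fa, fa);
val isSimplified = fn : fa -> bool
- fun langIsEmpty fa =
=   let val fa' = FA.simplify fa
=   in FA.numStates fa' = 1 andalso FA.numTransitions fa' = 0 andalso
=      not (FA.accepted fa' (Str.fromString "%"))
=   end;
val langIsEmpty = fn : fa -> bool

The emptiness test works because a simplified FA with a single state and no transitions accepts either ∅ or {%}, and the two cases are distinguished by whether % is accepted.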

3.7.1 Notes
The removal of useless states is analogous to the standard approach to ridding grammars of useless variables. The idea of removing redundant transitions, though, seems to be novel.

3.8 Proving the Correctness of Finite Automata

In this section, we consider techniques for proving the correctness of finite au- tomata, i.e., for proving that finite automata accept the languages we want them to. We begin by defining an indexed family of languages, Λ.

3.8.1 Definition of Λ
Proposition 3.8.1
Suppose M is a finite automaton.

(1) For all q ∈ QM , q ∈ ∆M ({q}, %).

(2) For all q, r ∈ QM and w ∈ Str, if q, w → r ∈ TM , then r ∈ ∆M ({q}, w).

(3) For all p, q, r ∈ QM and x, y ∈ Str, if q ∈ ∆M ({p}, x) and r ∈ ∆M ({q}, y), then r ∈ ∆M ({p}, xy).

Suppose M is a finite automaton and q ∈ QM . Then we define

ΛM,q = { w ∈ Str | q ∈ ∆M ({sM }, w) }.

In other words, ΛM,q is the labels of all of the valid labeled paths for M that start at state sM and end at q, i.e., it's all strings that can take us to state q when processed by M. Clearly, ΛM,q ⊆ (alphabet M)∗, for all FAs M and q ∈ QM . If it's clear which FA we are talking about, we sometimes abbreviate ΛM,q to Λq. Let our example FA, M, be

[diagram: M: states A and B, start state A, accepting state A; loops A, 1 → A and B, 1 → B, and transitions A, 0 → B and B, 0 → A]

Then:

• 01101 ∈ ΛA, because of the labeled path
A, 0 ⇒ B, 1 ⇒ B, 1 ⇒ B, 0 ⇒ A, 1 ⇒ A;
• 01100 ∈ ΛB, because of the labeled path
A, 0 ⇒ B, 1 ⇒ B, 1 ⇒ B, 0 ⇒ A, 0 ⇒ B.
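We can check memberships like these in Forlan, assuming m is bound to the FA M above: since w ∈ ΛM,q iff q ∈ ∆M ({sM }, w), it suffices to apply FA.processStr:

- val bs = SymSet.input "";
@ A
@ .
val bs = - : sym set
- val w = Str.input "";
@ 01101
@ .
val w = [-,-,-,-,-] : str
- SymSet.output("", FA.processStr m (bs, w));
A
val it = () : unit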

Proposition 3.8.2
Suppose M is an FA. Then L(M) = ⋃{ ΛM,q | q ∈ AM }, i.e., for all w, w ∈ L(M) iff w ∈ ΛM,q for some q ∈ AM .

Proof.

(only if) Suppose w ∈ L(M). By Proposition 3.5.3, we have that ∆M ({sM }, w) ∩ AM ≠ ∅, so that there is a q ∈ AM such that q ∈ ∆M ({sM }, w). Thus w ∈ ΛM,q.

(if) Suppose w ∈ ΛM,q for some q ∈ AM . Thus q ∈ ∆M ({sM }, w) and q ∈ AM , so that ∆M ({sM }, w) ∩ AM ≠ ∅. Hence w ∈ L(M), by Proposition 3.5.3. ✷

Proposition 3.8.3 Suppose M is a finite automaton.

(1) % ∈ ΛM,sM .

(2) For all q, r ∈ QM and w, x ∈ Str, if w ∈ ΛM,q and q, x → r ∈ TM , then wx ∈ ΛM,r.

Proof.

(1) By Proposition 3.8.1(1), we have that sM ∈ ∆({sM }, %), so that % ∈ ΛM,sM .

(2) Suppose q, r ∈ QM , w,x ∈ Str, w ∈ ΛM,q and q,x → r ∈ TM . Thus q ∈ ∆({sM }, w). Because q,x → r ∈ TM , Proposition 3.8.1(2) tells us that r ∈ ∆({q},x). Hence by Proposition 3.8.1(3), we have that r ∈ ∆({sM },wx), so that wx ∈ ΛM,r. ✷

Our main example will be the FA, M:

[diagram: the same two-state FA M as above: loops labeled 1 on A and B, and 0-transitions between A and B; A is the start and accepting state]

Let

X = { w ∈ {0, 1}∗ | w has an even number of 0’s }, and Y = { w ∈ {0, 1}∗ | w has an odd number of 0’s }.

We want to prove that L(M) = X. Because AM = {A}, Proposition 3.8.2 tells us that L(M) = ΛM,A. Thus it will suffice to show that ΛM,A = X. But our approach will also involve showing ΛM,B = Y . We would cope with more states analogously, having one language per state.

3.8.2 Proving that Enough is Accepted
First, we study techniques for showing that everything we want an automaton to accept is really accepted. Since X, Y ⊆ {0, 1}∗, to prove that X ⊆ ΛM,A and Y ⊆ ΛM,B, it will suffice to use strong string induction to show that, for all w ∈ {0, 1}∗:

(A) if w ∈ X, then w ∈ ΛM,A; and

(B) if w ∈ Y , then w ∈ ΛM,B. We proceed by strong string induction. Suppose w ∈ {0, 1}∗, and assume the inductive hypothesis: for all x ∈ {0, 1}∗, if x is a proper substring of w, then:

(A) if x ∈ X, then x ∈ ΛA; and

(B) if x ∈ Y , then x ∈ ΛB. 3.8 Proving the Correctness of Finite Automata 161

We must prove that:

(A) if w ∈ X, then w ∈ ΛA; and

(B) if w ∈ Y , then w ∈ ΛB.
There are two parts to show.
(A) Suppose w ∈ X, so that w has an even number of 0's. We must show that w ∈ ΛA. There are three cases to consider.
• Suppose w = %. By Proposition 3.8.3(1), we have that w = % ∈ ΛA.
• Suppose w = x0, for some x ∈ {0, 1}∗. Thus x has an odd number of 0's, so that x ∈ Y . Because x is a proper substring of w, part (B) of the inductive hypothesis tells us that x ∈ ΛB. Furthermore, B, 0 → A ∈ T , so that w = x0 ∈ ΛA, by Proposition 3.8.3(2).
• Suppose w = x1, for some x ∈ {0, 1}∗. Thus x has an even number of 0's, so that x ∈ X. Because x is a proper substring of w, part (A) of the inductive hypothesis tells us that x ∈ ΛA. Furthermore, A, 1 → A ∈ T , so that w = x1 ∈ ΛA, by Proposition 3.8.3(2).
(B) Suppose w ∈ Y , so that w has an odd number of 0's. We must show that w ∈ ΛB. There are three cases to consider.
• Suppose w = %. But the number of 0's in % is 0, which is even, a contradiction. Thus w ∈ ΛB.
• Suppose w = x0, for some x ∈ {0, 1}∗. Thus x has an even number of 0's, so that x ∈ X. Because x is a proper substring of w, part (A) of the inductive hypothesis tells us that x ∈ ΛA. Furthermore, A, 0 → B ∈ T , so that w = x0 ∈ ΛB, by Proposition 3.8.3(2).
• Suppose w = x1, for some x ∈ {0, 1}∗. Thus x has an odd number of 0's, so that x ∈ Y . Because x is a proper substring of w, part (B) of the inductive hypothesis tells us that x ∈ ΛB. Furthermore, B, 1 → B ∈ T , so that w = x1 ∈ ΛB, by Proposition 3.8.3(2).
Let N be the finite automaton
[diagram: N: states A and B, start state A, accepting state B; loop A, 0 → A, loop B, 11 → B, and %-transition A, % → B]

Here we hope that ΛN,A = {0}∗ and L(N) = ΛN,B = {0}∗{11}∗, but if we try to prove that
{0}∗ ⊆ ΛN,A, and
{0}∗{11}∗ ⊆ ΛN,B
using our standard technique, there is a complication related to the %-transition. We use strong string induction to show that, for all w ∈ {0, 1}∗:

(A) if w ∈ {0}∗, then w ∈ ΛA; and

(B) if w ∈ {0}∗{11}∗, then w ∈ ΛB.

In part (B), we assume that w ∈ {0}∗{11}∗, so that w = 0^n(11)^m for some n, m ∈ N. We must show that w ∈ ΛB. We consider two cases: m = 0 and m ≥ 1. The second of these is straightforward, so let's focus on the first. Then w = 0^n ∈ {0}∗. We want to use part (A) of the inductive hypothesis to conclude that 0^n ∈ ΛA, but there is a problem: 0^n is not a proper substring of 0^n = w. So, we must consider two subcases, when n = 0 and n ≥ 1. In the first subcase, because % ∈ ΛA and A, % → B ∈ T , we have that w = % = %% ∈ ΛB. In the second subcase, we have that w = 0^{n−1}0. By part (A) of the inductive hypothesis, we have that 0^{n−1} ∈ ΛA. Thus, because A, 0 → A ∈ T and A, % → B ∈ T , we can conclude w = 0^n = 0^{n−1}0% ∈ ΛB. Because there are no transitions from B back to A, we could first prove that, for all w ∈ {0, 1}∗,

(A) if w ∈ {0}∗, then w ∈ ΛA,
and then use (A) to prove that for all w ∈ {0, 1}∗,
(B) if w ∈ {0}∗{11}∗, then w ∈ ΛB.
This works whenever one part of a machine has transitions to another part, but there are no transitions from that second part back to the first part, i.e., when the two parts are not mutually recursive. In the case of N, we could use mathematical induction instead of strong string induction:
(A) for all n ∈ N, 0^n ∈ ΛA, and
(B) for all n, m ∈ N, 0^n(11)^m ∈ ΛB (do induction on m, fixing n).

3.8.3 Proving that Everything Accepted is Wanted
It's tempting to try to prove that everything accepted by a finite automaton is wanted using strong string induction, with implications like

(A) if w ∈ ΛA, then w ∈ X.

Unfortunately, this doesn't work when a finite automaton contains %-transitions. Instead, we do such proofs using a new induction principle that we call induction on Λ.

Theorem 3.8.4 (Principle of Induction on Λ) Suppose M is a finite automaton, and Pq(w) is a property of a w ∈ ΛM,q, for all q ∈ QM . If

• PsM (%) and

• for all q, r ∈ QM , x ∈ Str and w ∈ ΛM,q, if q,x → r ∈ TM and (†) Pq(w), then Pr(wx), then

for all q ∈ QM , for all w ∈ ΛM,q, Pq(w).

We refer to (†) as the inductive hypothesis.

Proof. It suffices to show that, for all lp ∈ LP, for all q ∈ QM , if lp is valid for M, startState lp = sM and endState lp = q, then Pq(label lp). We prove this by well-founded induction on the length of lp. ✷

In the case of our example FA, M, we can let PA(w) and PB(w) be w ∈ X and w ∈ Y , respectively, where, as before,

X = { w ∈ {0, 1}∗ | w has an even number of 0’s }, and Y = { w ∈ {0, 1}∗ | w has an odd number of 0’s }.

Then the principle of induction on Λ tells us that

(A) for all w ∈ ΛA, w ∈ X, and

(B) for all w ∈ ΛB, w ∈ Y , follows from showing

(empty string) % ∈ X;

(A, 0 → B) for all w ∈ ΛA, if (†) w ∈ X, then w0 ∈ Y ;

(A, 1 → A) for all w ∈ ΛA, if (†) w ∈ X, then w1 ∈ X;

(B, 0 → A) for all w ∈ ΛB, if (†) w ∈ Y , then w0 ∈ X; and

(B, 1 → B) for all w ∈ ΛB, if (†) w ∈ Y , then w1 ∈ Y . We refer to (†) as the inductive hypothesis. In fact, when setting this proof up, instead of explicitly mentioning PA and PB, we can simply say that we are proving

(A) for all w ∈ ΛA, w ∈ X, and

(B) for all w ∈ ΛB, w ∈ Y ,
by induction on Λ. There are five steps to show.
(empty string) Because % ∈ {0, 1}∗ and % has no 0's, we have that % ∈ X.

(A, 0 → B) Suppose w ∈ ΛA, and assume the inductive hypothesis: w ∈ X. Hence w ∈ {0, 1}∗ and w has an even number of 0’s. Thus w0 ∈ {0, 1}∗ and w0 has an odd number of 0’s, so that w0 ∈ Y .

(A, 1 → A) Suppose w ∈ ΛA, and assume the inductive hypothesis: w ∈ X. Then w1 ∈ X.

(B, 0 → A) Suppose w ∈ ΛB, and assume the inductive hypothesis: w ∈ Y . Then w0 ∈ X.

(B, 1 → B) Suppose w ∈ ΛB, and assume the inductive hypothesis: w ∈ Y . Then w1 ∈ Y .

Because of

(A) for all w ∈ ΛA, w ∈ X, and

(B) for all w ∈ ΛB, w ∈ Y , we have that ΛA ⊆ X and ΛB ⊆ Y . Because X ⊆ ΛA and Y ⊆ ΛB, we can conclude that L(M) = ΛA = X and ΛB = Y . Consider our second example, N, again: 0 11

% Start A B

We can use induction on Λ to prove that

(A) for all w ∈ ΛA, w ∈ {0}∗; and

(B) for all w ∈ ΛB, w ∈ {0}∗{11}∗.
Thus ΛA ⊆ {0}∗ and ΛB ⊆ {0}∗{11}∗. Because {0}∗ ⊆ ΛA and {0}∗{11}∗ ⊆ ΛB, we can conclude that ΛA = {0}∗ and L(N) = ΛB = {0}∗{11}∗.

3.8.4 Notes
Books on formal language theory typically give short shrift to the proof of correctness of finite automata, carrying out one or two correctness proofs using induction on the length of strings. In contrast, we have introduced and applied elegant techniques for proving the correctness of FAs. Of particular note is our principle of induction on Λ.

3.9 Empty-string Finite Automata

In this and the following two sections, we will study three progressively more restricted kinds of finite automata:
• empty-string finite automata (EFAs);
• nondeterministic finite automata (NFAs); and
• deterministic finite automata (DFAs).
Every DFA will be an NFA; every NFA will be an EFA; and every EFA will be an FA. Thus, L(M) will be well-defined, if M is a DFA, NFA or EFA. The more restricted kinds of automata will be easier to process on the computer than the more general kinds; they will also have nicer reasoning principles than the more general kinds. We will give algorithms for converting the more general kinds of automata into the more restricted kinds. Thus even the deterministic finite automata will accept the same set of languages as the finite automata. On the other hand, it will sometimes be easier to find one of the more general kinds of automata that accepts a given language rather than one of the more restricted kinds accepting the language. And, there are languages where the smallest DFA accepting the language is exponentially bigger than the smallest FA accepting the language.

3.9.1 Definition of EFAs An empty-string finite automaton (EFA) M is a finite automaton such that

TM ⊆ { q,x → r | q, r ∈ Sym and x ∈ Str and |x|≤ 1 }.

In other words, an FA is an EFA iff every string of every transition of the FA is either % or has a single symbol. For example, A, % → B and A, 1 → B are legal EFA transitions, but A, 11 → B is not legal. We write EFA for the set of all empty-string finite automata. Thus EFA ⊊ FA. The following proposition obviously holds.

Proposition 3.9.1 Suppose M is an EFA. • For all N ∈ FA, if M iso N, then N is an EFA.

• For all bijections f from QM to some set of symbols, renameStates(M,f) is an EFA.

• renameStatesCanonically M is an EFA.

• simplify M is an EFA.

3.9.2 Converting FAs to EFAs
If we want to convert an FA into an equivalent EFA, we can proceed as follows. Every state of the FA will be a state of the EFA, the start and accepting states are unchanged, and every transition of the FA that is a legal EFA transition will be a transition of the EFA. If our FA has a transition

p, b1b2 · · · bn → r, where n ≥ 2 and the bi are symbols, then we replace this transition with the n transitions

p, b1 → q1, q1, b2 → q2, . . . , qn−1, bn → r,
where q1, . . . , qn−1 are n − 1 new, non-accepting states. For example, we can convert the FA

[diagram: an FA with states A and B, start state A, accepting state B; loop A, 0 → A, transition A, 12 → B and loop B, 345 → B]
into the EFA
[diagram: states A, B, C, D and E, start state A, accepting state B; loop A, 0 → A, transitions A, 1 → C and C, 2 → B, and the loop path B, 3 → D, D, 4 → E, E, 5 → B]

We have to be careful how we choose our new states. The symbols we choose can’t be states of the original machine, and we can’t choose the same symbol twice. Instead of making a series of random choices, we will use structured symbols in such a way that one will be able to look at a resulting EFA and tell what the original FA was. First, the algorithm renames each old state q to h1,qi. Then it can replace a transition

p, b1b2 · · · bn → r,
where n ≥ 2 and the bi are symbols, with the transitions
h1, pi, b1 → h2, hp, b1, b2 · · · bn, rii,
h2, hp, b1, b2 · · · bn, rii, b2 → h2, hp, b1b2, b3 · · · bn, rii,
. . . ,
h2, hp, b1b2 · · · bn−1, bn, rii, bn → h1, ri.

We define a function faToEFA ∈ FA → EFA that converts FAs into EFAs by saying that faToEFA M is the result of running the above algorithm on input M.
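The following is a minimal SML sketch of the transition-splitting step, not Forlan's implementation; fresh is a hypothetical supplier of new state names, whereas Forlan actually uses the structured names described above:

fun splitTransition fresh (p, bs, r) =
  let fun chain q [b]        = [(q, b, r)]
        | chain q (b :: bs') =
            let val q' = fresh ()
            in (q, b, q') :: chain q' bs' end
        | chain _ []         = []
  in chain p bs end

Applied to (A, [1, 2], B), for instance, this yields the chain A, 1 → q1 and q1, 2 → B, for a fresh state q1.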

Theorem 3.9.2 For all M ∈ FA: • faToEFA M ≈ M; and

• alphabet(faToEFA M)= alphabet M.

3.9.3 Processing EFAs in Forlan The Forlan module EFA defines an abstract type efa (in the top-level environ- ment) of empty-string finite automata, along with various functions for process- ing EFAs. Values of type efa are implemented as values of type fa, and the module EFA provides functions val injToFA : efa -> fa val projFromFA : fa -> efa for making a value of type efa have type fa, i.e., “injecting” an efa into type fa, and for making a value of type fa that is an EFA have type efa, i.e., “projecting” an fa that is an EFA to type efa. If one tries to project an fa that is not an EFA to type efa, an error is signaled. The functions injToFA and projFromFA are available in the top-level environment as injEFAToFA and projFAToEFA, respectively. The module EFA also defines the functions: val input : string -> efa val fromFA : fa -> efa

The function input is used to input an EFA, i.e., to input a value of type fa using FA.input, and then attempt to project it to type efa. The function fromFA corresponds to our conversion function faToEFA, and is available in the top-level environment with that name:
val faToEFA : fa -> efa

Finally, most of the functions for processing FAs that were introduced in previous sections are inherited by EFA:
val output : string * efa -> unit
val numStates : efa -> int
val numTransitions : efa -> int
val equal : efa * efa -> bool
val alphabet : efa -> sym set
val checkLP : efa -> lp -> unit
val validLP : efa -> lp -> bool
val isomorphism : efa * efa * sym_rel -> bool
val findIsomorphism : efa * efa -> sym_rel
val isomorphic : efa * efa -> bool
val renameStates : efa * sym_rel -> efa
val renameStatesCanonically : efa -> efa
val processStr : efa -> sym set * str -> sym set
val accepted : efa -> str -> bool
val findLP : efa -> sym set * str * sym set -> lp
val findAcceptingLP : efa -> str -> lp
val simplified : efa -> bool
val simplify : efa -> efa
Suppose that fa is the finite automaton

[diagram: the FA with loop A, 0 → A, transition A, 12 → B and loop B, 345 → B, as above]

Here are some example uses of a few of the above functions:
- projFAToEFA fa;
invalid label in transition: "12"

uncaught exception Error
- val efa = faToEFA fa;
val efa = - : efa
- EFA.output("", efa);
{states} <1,A>, <1,B>, <2,<A,1,2,B>>, <2,<B,3,45,B>>, <2,<B,34,5,B>>
{start state} <1,A>
{accepting states} <1,B>
{transitions} <1,A>, 0 -> <1,A>; <1,A>, 1 -> <2,<A,1,2,B>>;
<1,B>, 3 -> <2,<B,3,45,B>>; <2,<A,1,2,B>>, 2 -> <1,B>;
<2,<B,3,45,B>>, 4 -> <2,<B,34,5,B>>; <2,<B,34,5,B>>, 5 -> <1,B>
val it = () : unit
- val efa’ = EFA.renameStatesCanonically efa;
val efa’ = - : efa
- EFA.output("", efa’);
{states} A, B, C, D, E
{start state} A
{accepting states} B
{transitions} A, 0 -> A; A, 1 -> C; B, 3 -> D; C, 2 -> B; D, 4 -> E; E, 5 -> B
val it = () : unit
- val rel = EFA.findIsomorphism(efa, efa’);
val rel = - : sym_rel
- SymRel.output("", rel);
(<1,A>, A), (<1,B>, B), (<2,<A,1,2,B>>, C), (<2,<B,3,45,B>>, D),
(<2,<B,34,5,B>>, E)
val it = () : unit
- LP.output("", FA.findAcceptingLP fa (Str.input ""));
@ 012345
@ .
A, 0 => A, 12 => B, 345 => B
val it = () : unit
- LP.output("", EFA.findAcceptingLP efa’ (Str.input ""));
@ 012345
@ .
A, 0 => A, 1 => C, 2 => B, 3 => D, 4 => E, 5 => B
val it = () : unit

3.9.4 Notes
The algorithm for converting FAs to EFAs is obvious, but our use of structured state names so as to make the resulting EFAs self-documenting is novel.

3.10 Nondeterministic Finite Automata

In this section, we study the second of our more restricted kinds of finite automata: nondeterministic finite automata.

3.10.1 Definition of NFAs
A nondeterministic finite automaton (NFA) M is a finite automaton such that

TM ⊆ { q,x → r | q, r ∈ Sym and x ∈ Str and |x| = 1 }.

In other words, an FA is an NFA iff every string of every transition of the FA has a single symbol. For example, A, 1 → B is a legal NFA transition, but A, % → B and A, 11 → B are not legal. We write NFA for the set of all nondeterministic finite automata. Thus NFA ⊊ EFA ⊊ FA. The following proposition obviously holds.

Proposition 3.10.1 Suppose M is an NFA.

• For all N ∈ FA, if M iso N, then N is an NFA.

• For all bijections f from QM to some set of symbols, renameStates(M,f) is an NFA.

• renameStatesCanonically M is an NFA.

• simplify M is an NFA.

3.10.2 Converting EFAs to NFAs
Suppose M is the EFA
[diagram: states A, B and C, start state A, accepting state C; loops A, 0 → A, B, 1 → B and C, 2 → C, and %-transitions A, % → B and B, % → C]

To convert M into an equivalent NFA, we will have to:

• replace the transitions A, % → B and B, % → C with legal transitions (for example, because of the valid labeled path
A, % ⇒ B, 1 ⇒ B, % ⇒ C,
we will add the transition A, 1 → C);

• make (at least) A be an accepting state (so that % is accepted by the NFA).

Before defining a general procedure for converting EFAs to NFAs, we first say what we mean by the empty-closure of a set of states. Suppose M is a finite automaton and P ⊆ QM . The empty-closure of P (emptyCloseM P ) is the least subset X of QM such that
• P ⊆ X; and

• for all q, r ∈ QM , if q ∈ X and q, % → r ∈ TM , then r ∈ X.

We sometimes abbreviate emptyCloseM P to emptyClose P , when M is clear from the context. For example, if M is our example EFA and P = {A}, then:

• A ∈ X;

• B ∈ X, since A ∈ X and A, % → B ∈ TM ;

• C ∈ X, since B ∈ X and B, % → C ∈ TM .
Thus emptyClose P = {A, B, C}. Suppose M is a finite automaton and P ⊆ QM . The backwards empty-closure of P (emptyCloseBackwardsM P ) is the least subset X of QM such that

• P ⊆ X; and

• for all q, r ∈ QM , if r ∈ X and q, % → r ∈ TM , then q ∈ X.

We sometimes drop the M from emptyCloseBackwardsM , when it’s clear from the context. For example, if M is our example EFA and P = {C}, then:

• C ∈ X;

• B ∈ X, since C ∈ X and B, % → C ∈ TM ;

• A ∈ X, since B ∈ X and A, % → B ∈ TM .

Thus emptyCloseBackwards P = {A, B, C}.
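Both closures are the same least-fixed-point computation as in the sketch of Section 3.7, differing only in the direction in which the %-transitions are followed. Reusing the close and member helpers from that sketch, and representing % by the string "%", a sketch is:

fun emptyClose trans ps =
  close (fn xs =>
           List.mapPartial
             (fn (q, x, r) =>
                if x = "%" andalso member q xs then SOME r else NONE)
             trans)
        ps

fun emptyCloseBackwards trans ps =
  close (fn xs =>
           List.mapPartial
             (fn (q, x, r) =>
                if x = "%" andalso member r xs then SOME q else NONE)
             trans)
        ps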

Proposition 3.10.2 Suppose M is a finite automaton. For all P ⊆ QM ,

emptyCloseM P =∆M (P, %).

In other words, emptyCloseM P is all of the states that can be reached from elements of P by sequences of %-transitions.

Proposition 3.10.3 Suppose M is a finite automaton. For all P ⊆ QM ,

emptyCloseBackwardsM P = { q ∈ QM | ∆M ({q}, %) ∩ P 6= ∅ }.

In other words, emptyCloseBackwardsM P is all of the states from which it is possible to reach elements of P by sequences of %-transitions. We define a function/algorithm efaToNFA ∈ EFA → NFA that converts EFAs into NFAs by saying that efaToNFA M is the NFA N such that:

• QN = QM ;

• sN = sM ;

• AN = emptyCloseBackwards AM ; and

• TN is the set of all transitions q′, a → r′ such that q′, r′ ∈ QM , a ∈ Sym, and there are q, r ∈ QM such that:
– q, a → r ∈ TM ;
– q′ ∈ emptyCloseBackwards {q}; and
– r′ ∈ emptyClose {r}.

To compute the set TN , we process each transition q,x → r of M as follows. If x = %, then we generate no transitions. Otherwise, our transition is q, a → r for some symbol a. We then compute the backwards empty-closure of {q}, and call the result X, and compute the (forwards) empty-closure of {r}, and call the result Y . We then add all of the elements of

{ q′, a → r′ | q′ ∈ X and r′ ∈ Y } to TN . Because the algorithm defines AN to be emptyCloseBackwards AM , it could let TN be the set of all transitions q′, a → r such that q′, r ∈ QM , a ∈ Sym, and there is a q ∈ QM such that:

• q, a → r ∈ TM ; and
• q′ ∈ emptyCloseBackwards {q}.
This would mean that N would have fewer transitions. However, for esthetic reasons, we'll stick with the symmetric definition of TN .
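The following is a minimal SML sketch of this computation of TN , not Forlan's implementation; ecb and ec are hypothetical parameters computing the backwards and forwards empty-closures of a single state, returned as lists, and duplicate transitions would be removed in a set-based implementation:

fun newTransitions ecb ec trans =
  List.concat
    (List.mapPartial
       (fn (q, x, r) =>
          if x = "%"
          then NONE   (* %-transitions generate nothing *)
          else SOME (List.concat
                       (map (fn q' => map (fn r' => (q', x, r')) (ec r))
                            (ecb q))))
       trans)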

Let M be our example EFA
[diagram: the EFA with loops 0, 1 and 2 on A, B and C and %-transitions A, % → B and B, % → C]
and let N = efaToNFA M. Then

• QN = QM = {A, B, C};

• sN = sM = A;

• AN = emptyCloseBackwards AM = emptyCloseBackwards {C} = {A, B, C}.

Now, let’s work out what TN is, by processing each of M’s transitions.

• From the transitions A, % → B and B, % → C, we get no elements of TN .
• Consider the transition A, 0 → A. Since emptyCloseBackwards {A} = {A} and emptyClose {A} = {A, B, C}, we add A, 0 → A, A, 0 → B and A, 0 → C to TN .
• Consider the transition B, 1 → B. Since emptyCloseBackwards {B} = {A, B} and emptyClose {B} = {B, C}, we add A, 1 → B, A, 1 → C, B, 1 → B and B, 1 → C to TN .
• Consider the transition C, 2 → C. Since emptyCloseBackwards {C} = {A, B, C} and emptyClose {C} = {C}, we add A, 2 → C, B, 2 → C and C, 2 → C to TN .

Thus our NFA N is

[diagram: N: states A, B and C, start state A, all three states accepting; loops A, 0 → A, B, 1 → B and C, 2 → C; transitions from A to B labeled 0, 1; from B to C labeled 1, 2; and from A to C labeled 0, 1, 2]

Theorem 3.10.4 For all M ∈ EFA:

• efaToNFA M ≈ M; and

• alphabet(efaToNFA M)= alphabet M.

3.10.3 Converting EFAs to NFAs, and Processing NFAs in Forlan
The Forlan module FA defines the following functions for computing forwards and backwards empty-closures:

val emptyClose : fa -> sym set -> sym set val emptyCloseBackwards : fa -> sym set -> sym set

For example, if fa is bound to the finite automaton

[diagram: the EFA with loops 0, 1 and 2 on A, B and C and %-transitions A, % → B and B, % → C]
then we can compute the empty-closure of {A} as follows:

- SymSet.output("", FA.emptyClose fa (SymSet.input "")); @ A @ . A,B,C val it = () : unit

The Forlan module NFA defines an abstract type nfa (in the top-level envi- ronment) of nondeterministic finite automata, along with various functions for processing NFAs. Values of type nfa are implemented as values of type fa, and the module NFA provides the following injection and projection functions:

val injToFA : nfa -> fa val injToEFA : nfa -> efa val projFromFA : fa -> nfa val projFromEFA : efa -> nfa 174 Regular Languages

The functions injToFA, injToEFA, projFromFA and projFromEFA are available in the top-level environment as injNFAToFA, injNFAToEFA, projFAToNFA and projEFAToNFA, respectively. The module NFA also defines the functions:

val input : string -> nfa val fromEFA : efa -> nfa

The function input is used to input an NFA, and the function fromEFA corre- sponds to our conversion function efaToNFA, and is available in the top-level environment with that name:

val efaToNFA : efa -> nfa

Most of the functions for processing FAs that were introduced in previous sections are inherited by NFA:

valoutput :string*nfa->unit valnumStates :nfa->int val numTransitions : nfa -> int valalphabet :nfa->symset valequal :nfa*nfa->bool valcheckLP :nfa->lp->unit valvalidLP :nfa->lp->bool val isomorphism : nfa * nfa * sym_rel -> bool val findIsomorphism : nfa * nfa -> sym_rel valisomorphic :nfa*nfa->bool val renameStates : nfa * sym_rel -> nfa val renameStatesCanonically : nfa -> nfa val processStr : nfa-> symset * str-> symset valaccepted :nfa->str->bool valfindLP :nfa->symset*str*symset->lp val findAcceptingLP : nfa -> str -> lp valsimplified :nfa->bool valsimplify :nfa->nfa

Finally, the functions for computing forwards and backwards empty-closures are inherited by the EFA module

val emptyClose : efa -> sym set -> sym set val emptyCloseBackwards : efa -> sym set -> sym set

Suppose that efa is the EFA
[diagram: loops 0, 1 and 2 on A, B and C, and %-transitions A, % → B and B, % → C]

Here are some example uses of a few of the above functions:
- projEFAToNFA efa;
invalid label in transition: "%"

uncaught exception Error
- val nfa = efaToNFA efa;
val nfa = - : nfa
- NFA.output("", nfa);
{states} A, B, C
{start state} A
{accepting states} A, B, C
{transitions} A, 0 -> A | B | C; A, 1 -> B | C; A, 2 -> C;
B, 1 -> B | C; B, 2 -> C; C, 2 -> C
val it = () : unit
- LP.output("", EFA.findAcceptingLP efa (Str.input ""));
@ 012
@ .
A, 0 => A, % => B, 1 => B, % => C, 2 => C
val it = () : unit
- LP.output("", NFA.findAcceptingLP nfa (Str.input ""));
@ 012
@ .
A, 0 => A, 1 => B, 2 => C
val it = () : unit

3.10.4 Notes
Because we have defined the meaning of finite automata via labeled paths instead of transition functions, our EFA to NFA conversion algorithm is easy to understand and prove correct.

3.11 Deterministic Finite Automata

In this section, we study the third of our more restricted kinds of finite automata: deterministic finite automata.

3.11.1 Definition of DFAs
A deterministic finite automaton (DFA) M is a finite automaton such that:

• TM ⊆ { q,x → r | q, r ∈ Sym and x ∈ Str and |x| = 1 }; and

• for all q ∈ QM and a ∈ alphabet M, there is a unique r ∈ QM such that q, a → r ∈ TM .

In other words, an FA is a DFA iff it is an NFA and, for every state q of the automaton and every symbol a of the automaton’s alphabet, there is exactly one state that can be entered from state q by reading a from the automaton’s 176 Regular Languages input. We write DFA for the set of all deterministic finite automata. Thus DFA ( NFA ( EFA ( FA. Let M be the finite automaton 1

0 0 Start A B C 1

1

It turns out that L(M)= { w ∈ {0, 1}∗ | 000 is not a substring of w }. Although M is an NFA, it’s not a DFA, since 0 ∈ alphabet M but there is no transition of the form C, 0 → r. However, we can make M into a DFA by adding a dead state D:

[diagram: the DFA obtained from M by adding the dead state D: as M, plus C, 0 → D and a loop on D labeled 0, 1]

We will never need more than one dead state in a DFA. The following proposition obviously holds.

Proposition 3.11.1 Suppose M is a DFA.

• For all N ∈ FA, if M iso N, then N is a DFA.

• For all bijections f from QM to some set of symbols, renameStates(M,f) is a DFA.

• renameStatesCanonically M is a DFA.

Now, we prove a proposition that doesn't hold for arbitrary NFAs.

Proposition 3.11.2
Suppose M is a DFA. For all q ∈ QM and w ∈ (alphabet M)∗, |∆M ({q}, w)| = 1.

Proof. An easy left string induction on w. ✷

Suppose M is a DFA. Because of Proposition 3.11.2, we can define the transition function δM for M, δM ∈ QM × (alphabet M)∗ → QM , by:

δM (q, w) = the unique r ∈ QM such that r ∈ ∆M ({q}, w).

In other words, δM (q, w) is the unique state r of M that is the end of a valid labeled path for M that starts at q and is labeled by w. Thus, for all q, r ∈ QM and w ∈ (alphabet M)∗,

δM (q, w)= r iff r ∈ ∆M ({q}, w).

We sometimes abbreviate δM (q, w) to δ(q, w). For example, if M is the DFA

[diagram: the four-state DFA with dead state D, as above]
then δ(A, %) = A, δ(A, 0100) = C and δ(B, 000100) = D. Having defined the δ function, we can study its properties.

Proposition 3.11.3 Suppose M is a DFA.

(1) For all q ∈ QM , δM (q, %) = q.

(2) For all q ∈ QM and a ∈ alphabet M, δM (q, a) = the unique r ∈ QM such that q, a → r ∈ TM .

(3) For all q ∈ QM and x, y ∈ (alphabet M)∗, δM (q, xy) = δM (δM (q, x), y).
Suppose M is a DFA. By part (2) of the proposition, we have that, for all q, r ∈ QM and a ∈ alphabet M,

δM (q, a)= r iff q, a → r ∈ TM .

Now we can use the δ function to explain when a string is accepted by a DFA.
Proposition 3.11.4
Suppose M is a DFA. L(M) = { w ∈ (alphabet M)∗ | δM (sM , w) ∈ AM }.

Proof. To see that the left-hand side is a subset of the right-hand side, suppose w ∈ L(M). Then w ∈ (alphabet M)∗ and there is a q ∈ AM such that q ∈ ∆M ({sM }, w). Thus δM (sM , w) = q ∈ AM . To see that the right-hand side is a subset of the left-hand side, suppose w ∈ (alphabet M)∗ and δM (sM , w) ∈ AM . Then δM (sM , w) ∈ ∆M ({sM }, w), and thus w ∈ L(M). ✷

The preceding propositions give us an efficient algorithm for checking whether a string is accepted by a DFA. For example, suppose M is the DFA
[diagram: the four-state DFA with dead state D, as above]

To check whether 0100 is accepted by M, we need to determine whether δ(A, 0100) ∈ {A, B, C}. For instance, we have that:

δ(A, 0100)= δ(δ(A, 0), 100) = δ(B, 100) = δ(δ(B, 1), 00) = δ(A, 00) = δ(δ(A, 0), 0) = δ(B, 0) = C ∈ {A, B, C}.

Thus 0100 is accepted by M. It is easy to see that, for all DFAs M and q ∈ QM :

• q is reachable in M iff there is a w ∈ (alphabet M)∗ with δM (sM , w) = q; and

• q is live in M iff there is a w ∈ (alphabet M)∗ with δM (q, w) = r, for some r ∈ AM .

3.11.2 Proving the Correctness of DFAs
Since every DFA is an FA, we could prove the correctness of DFAs using the techniques that we have already studied. But it turns out that giving a separate proof that enough is accepted by a DFA is unnecessary—it will follow from the proof that everything accepted is wanted.

Proposition 3.11.5
Suppose M is a DFA. Then, for all w ∈ (alphabet M)∗ and q ∈ QM ,

w ∈ ΛM,q iff δM (sM , w) = q.

Proof. Suppose w ∈ (alphabet M)∗ and q ∈ QM .

(only if) Suppose w ∈ Λq. Then q ∈ ∆({s}, w). Thus δ(s, w)= q.

(if) Suppose δ(s, w) = q. Then q ∈ ∆({s}, w), so that w ∈ Λq. ✷

We already know that, if M is an FA, then L(M) = ⋃{ Λq | q ∈ AM }.
Proposition 3.11.6
Suppose M is a DFA.
(1) (alphabet M)∗ = ⋃{ Λq | q ∈ QM }.
(2) For all q, r ∈ QM , if q ≠ r, then Λq ∩ Λr = ∅.
Suppose M is the DFA

[diagram: the four-state DFA with dead state D, as above]
and let X = { w ∈ {0, 1}∗ | 000 is not a substring of w }. We will show that L(M) = X. Note that, for all w ∈ {0, 1}∗:
• w ∈ X iff 000 is not a substring of w.

• w ∉ X iff 000 is a substring of w.
First, we use induction on Λ to prove that:

(A) for all w ∈ ΛA, w ∈ X and 0 is not a suffix of w;

(B) for all w ∈ ΛB, w ∈ X and 0, but not 00, is a suffix of w;

(C) for all w ∈ ΛC, w ∈ X and 00 is a suffix of w; and

(D) for all w ∈ ΛD, w ∉ X.
There are nine steps (1 + the number of transitions) to show.
(empty string) We must show that % ∈ X and 0 is not a suffix of %. This follows since % has no 0's.

(A, 0 → B) Suppose w ∈ ΛA, and assume the inductive hypothesis: w ∈ X and 0 is not a suffix of w. We must show that w0 ∈ X and 0, but not 00, is a suffix of w0. Because w ∈ X and 0 is not a suffix of w, we have that w0 ∈ X. Clearly, 0 is a suffix of w0. And, since 0 is not a suffix of w, we have that 00 is not a suffix of w0.

(A, 1 → A) Suppose w ∈ ΛA, and assume the inductive hypothesis: w ∈ X and 0 is not a suffix of w. We must show that w1 ∈ X and 0 is not a suffix of w1. Since w ∈ X, we have that w1 ∈ X. And, 0 is not a suffix of w1. 180 Regular Languages

(B, 0 → C) Suppose w ∈ ΛB, and assume the inductive hypothesis: w ∈ X and 0, but not 00, is a suffix of w. We must show that w0 ∈ X and 00 is a suffix of w0. Because w ∈ X and 00 is not suffix of w, we have that w0 ∈ X. And since 0 is a suffix of w, it follows that 00 is a suffix of w0.

(B, 1 → A) Suppose w ∈ ΛB, and assume the inductive hypothesis: w ∈ X and 0, but not 00, is a suffix of w. We must show that w1 ∈ X and 0 is not a suffix of w1. Because w ∈ X, we have that w1 ∈ X. And, 0 is not a suffix of w1.

(C, 0 → D) Suppose w ∈ ΛC, and assume the inductive hypothesis: w ∈ X and 00 is a suffix of w. We must show that w0 ∉ X. Because 00 is a suffix of w, we have that 000 is a suffix of w0. Thus w0 ∉ X.

(C, 1 → A) Suppose w ∈ ΛC, and assume the inductive hypothesis: w ∈ X and 00 is a suffix of w. We must show that w1 ∈ X and 0 is not a suffix of w1. Because w ∈ X, we have that w1 ∈ X. And, 0 is not a suffix of w1.

(D, 0 → D) Suppose w ∈ ΛD, and assume the inductive hypothesis: w ∉ X. We must show that w0 ∉ X. Because w ∉ X, we have that w0 ∉ X.

(D, 1 → D) Suppose w ∈ ΛD, and assume the inductive hypothesis: w ∉ X. We must show that w1 ∉ X. Because w ∉ X, we have that w1 ∉ X.

Now, we use the result of our induction on Λ to show that L(M)= X.

(L(M) ⊆ X) Suppose w ∈ L(M). Because AM = {A, B, C}, we have that w ∈ L(M) = ΛA ∪ ΛB ∪ ΛC. Thus, by parts (A)–(C), we have that w ∈ X.

(X ⊆ L(M)) Suppose w ∈ X. Since X ⊆ {0, 1}∗, we have that w ∈ {0, 1}∗. Suppose, toward a contradiction, that w ∉ L(M). Because w ∉ L(M) = ΛA ∪ ΛB ∪ ΛC and w ∈ {0, 1}∗ = (alphabet M)∗ = ΛA ∪ ΛB ∪ ΛC ∪ ΛD, we must have that w ∈ ΛD. But then part (D) tells us that w ∉ X, a contradiction. Thus w ∈ L(M).

For the above approach to work, when proving L(M)= X for a DFA M and language X, we simply need that:

• the property associated with each accepting state implies being in X; and

• the property associated with each non-accepting state implies not being in X.
Exercise 3.11.7
Define diff ∈ {0, 1}∗ → Z by: for all w ∈ {0, 1}∗,

diff w = the number of 1's in w − the number of 0's in w.

Define AllPrefixGood to be the language { w ∈ {0, 1}∗ | for all prefixes v of w, |diff v| ≤ 2 }. Find a DFA allPrefixGoodDFA such that L(allPrefixGoodDFA) = AllPrefixGood, and prove that your solution is correct.

3.11.3 Simplification of DFAs
Let M be our example DFA
[diagram: the four-state DFA with dead state D, as above]

Then M is not simplified, since the state D is dead. But if we get rid of D, then we won’t have a DFA anymore. Thus, we will need:

• a notion of when a DFA is simplified that is more liberal than our standard notion; and

• a corresponding simplification procedure for DFAs. We say that a DFA M is deterministically simplified iff

• every element of QM is reachable; and

• at most one element of QM is dead.
For example, both of the following DFAs, which accept ∅, are deterministically simplified:
[diagram: M1: a single non-accepting start state A with no transitions; M2: a single non-accepting start state A with a loop labeled 0]

We define a simplification algorithm for DFAs that takes in

• a DFA M and

• an alphabet Σ and returns a DFA N such that

• N is deterministically simplified,

• N ≈ M,

• alphabet N = alphabet(L(M)) ∪ Σ, and

• if Σ ⊆ alphabet(L(M)), then |QN | ≤ |QM |.

Thus, the alphabet of N will consist of all symbols that either appear in strings that are accepted by M or are in Σ. The algorithm begins by letting the FA M ′ be simplify M, i.e., the result of running our simplification algorithm for FAs on M. M ′ will have the following properties:

• QM ′ ⊆ QM and TM ′ ⊆ TM ;

• M ′ is simplified;

• M ′ ≈ M;

• alphabet M ′ = alphabet(L(M ′)) = alphabet(L(M)); and

• for all q ∈ QM ′ and a ∈ alphabet M ′, there is at most one r ∈ QM ′ such that q, a → r ∈ TM ′ (this property holds since M is a DFA, QM ′ ⊆ QM and TM ′ ⊆ TM ).
Let Σ′ = alphabet M ′ ∪ Σ = alphabet(L(M)) ∪ Σ. If M ′ is a DFA and alphabet M ′ = Σ′, the algorithm returns M ′ as its DFA, N. Because M ′ is simplified, all states of M ′ are reachable, and either M ′ has no dead states, or it consists of a single dead state (the start state). In either case, M ′ is deterministically simplified. Because QM ′ ⊆ QM , we have |QN | ≤ |QM |. Otherwise, it must turn M ′ into a DFA whose alphabet is Σ′. We have that

• alphabet M ′ ⊆ Σ′; and

• for all q ∈ QM ′ and a ∈ Σ′, there is at most one r ∈ QM ′ such that q, a → r ∈ TM ′ .

Since M ′ is simplified, there are two cases to consider. If M ′ has no accepting states, then sM ′ is the only state of M ′ and M ′ has no transitions. Thus the algorithm can return the DFA N defined by:

• QN = QM ′ = {sM ′ };

• sN = sM ′ ;

• AN = AM ′ = ∅; and

• TN = { sM ′ , a → sM ′ | a ∈ Σ′ }.

In this case, we have that |QN | ≤ |QM |. Alternatively, M ′ has at least one accepting state, so that M ′ has no dead states. (Consider the case when Σ ⊆ alphabet(L(M)), so that Σ′ = alphabet(L(M)) = alphabet M ′. Suppose, toward a contradiction, that QM ′ = QM , so that all elements of QM are useful. Then sM ′ = sM and AM ′ = AM . And TM ′ = TM , since no transitions of a DFA are redundant. Hence M ′ = M, so that M ′ is a DFA with alphabet Σ′, a contradiction. Thus QM ′ ⊊ QM .) Thus the DFA N returned by the algorithm is defined by:

• QN = QM ′ ∪ {hdeadi} (we put enough brackets around hdeadi so that it’s not in QM ′ );

• sN = sM ′ ;

• AN = AM ′ ; and

• TN = TM ′ ∪ T ′, where T ′ is the set of all transitions q, a → hdeadi such that either
– q ∈ QM ′ and a ∈ Σ′, but there is no r ∈ QM ′ such that q, a → r ∈ TM ′ ; or
– q = hdeadi and a ∈ Σ′.

(If Σ ⊆ alphabet(L(M)), then |QN | ≤ |QM |.) We define a function determSimplify ∈ DFA × Alp → DFA by: determSimplify(M, Σ) is the result of running the above algorithm on M and Σ.
Theorem 3.11.8
For all M ∈ DFA and Σ ∈ Alp:

• determSimplify(M, Σ) is deterministically simplified;

• determSimplify(M, Σ) ≈ M;

• alphabet(determSimplify(M, Σ)) = alphabet(L(M)) ∪ Σ; and

• if Σ ⊆ alphabet(L(M)), then |QdetermSimplify(M,Σ)| ≤ |QM |.
For example, suppose M is the DFA
[diagram: the four-state DFA with dead state D, as above]

Then determSimplify(M, {2}) is the DFA
[diagram: states A, B, C and hdeadi, start state A, accepting states A, B and C; transitions A, 0 → B, B, 0 → C, 1-transitions from A, B and C back to A, 2-transitions from A, B and C to hdeadi, C, 0 → hdeadi, and a loop on hdeadi labeled 0, 1, 2]

Exercise 3.11.9
Find a DFA M and alphabet Σ such that determSimplify(M, Σ) has one more state than does M.

3.11.4 Converting NFAs to DFAs
Suppose M is the NFA
[diagram: M: states A, B and C, start state A, accepting state C; loops A, 1 → A and C, 0 → C, and transitions A, 1 → B and B, 1 → C]
Our algorithm for converting NFAs to DFAs will convert M into a DFA N whose states represent the elements of the set
{ ∆M ({A}, w) | w ∈ {0, 1}∗ }.

For example, one of the states of N will be hA, Bi, which represents {A, B} = ∆M ({A}, 1). This is the state that our DFA will be in after processing 1 from the start state. Before describing our conversion algorithm, we first state a proposition concerning the ∆ function for NFAs, and say how we will represent finite sets of symbols as symbols.

Proposition 3.11.10 Suppose M is an NFA.

(1) For all P ⊆ QM , ∆M (P, %) = P .

(2) For all P ⊆ QM and a ∈ alphabet M,

∆M (P, a)= { r ∈ QM | p, a → r ∈ TM , for some p ∈ P }. 3.11 Deterministic Finite Automata 185

∗ (3) For all P ⊆ QM and x,y ∈ (alphabet M) ,

∆M (P,xy)=∆M (∆M (P,x),y).

Given a finite set of symbols P , we write P̄ for the symbol

ha1, . . . , ani,

where a1, . . . , an are all of the elements of P , in order according to our total ordering on Sym, and without repetition. For example, the symbol for {B, A} is hA, Bi, and the symbol for ∅ is hi. It is easy to see that, if P and R are finite sets of symbols, then P̄ = R̄ iff P = R. We convert an NFA M into a DFA N as follows. First, we generate the least subset X of P(QM ) such that:

• {sM }∈ X; and

• for all P ∈ X and a ∈ alphabet M, ∆M (P, a) ∈ X.

Thus |X| ≤ 2^|QM |. Then we define the DFA N as follows:

• QN = { P̄ | P ∈ X };

• sN = hsM i, the symbol for {sM };

• AN = { P̄ | P ∈ X and P ∩ AM ≠ ∅ }; and

• TN = { (P̄ , a, Q̄) | P ∈ X, a ∈ alphabet M and Q = ∆M (P, a) }.

Then N is a DFA with alphabet alphabet M and, for all P ∈ X and a ∈ alphabet M, δN (P̄ , a) = Q̄, where Q = ∆M (P, a).
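Before working through an example, here is a minimal SML sketch of the generation of X, not Forlan's implementation; deltaSym is a hypothetical parameter computing ∆M (P, a), and it must return canonically sorted, duplicate-free lists, so that list equality coincides with set equality:

fun genSubsets deltaSym alpha start =
  let fun member y xs = List.exists (fn x => x = y) xs
      fun go ([], found) = found
        | go (p :: todo, found) =
            let fun add (a, (td, fd)) =
                  let val s = deltaSym p a
                  in if member s fd then (td, fd)
                     else (td @ [s], fd @ [s])
                  end
                val (todo', found') = List.foldl add (todo, found) alpha
            in go (todo', found') end
      val init = [[start]]
  in go (init, init) end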

Suppose M is the NFA
[diagram: the NFA M above, with loops A, 1 → A and C, 0 → C and transitions A, 1 → B and B, 1 → C]

Let’s work out what the DFA N is.

• To begin with, {A}∈ X, so that hAi∈ QN . And hAi is the start state of N. It is not an accepting state, since A 6∈ AM .

• Since {A} ∈ X, and ∆({A}, 0) = ∅, we add ∅ to X, hi to QN and hAi, 0 → hi to TN . Since {A} ∈ X, and ∆({A}, 1) = {A, B}, we add {A, B} to X, hA, Bi to QN and hAi, 1 → hA, Bi to TN .

• Since ∅∈ X, ∆(∅, 0)= ∅ and ∅∈ X, we don’t have to add anything to X or QN , but we add hi, 0 → hi to TN . Since ∅∈ X, ∆(∅, 1)= ∅ and ∅∈ X, we don’t have to add anything to X or QN , but we add hi, 1 → hi to TN .

• Since {A, B} ∈ X, ∆({A, B}, 0) = ∅ and ∅ ∈ X, we don’t have to add anything to X or QN , but we add hA, Bi, 0 → hi to TN . Since {A, B} ∈ X, ∆({A, B}, 1) = {A, B} ∪ {C} = {A, B, C}, we add {A, B, C} to X, hA, B, Ci to QN , and hA, Bi, 1 → hA, B, Ci to TN . Since {A, B, C} contains (the only) one of M’s accepting states, we add hA, B, Ci to AN .

• Since {A, B, C}∈ X and ∆({A, B, C}, 0)= ∅∪∅∪{C} = {C}, we add {C} to X, hCi to QN and hA, B, Ci, 0 → hCi to TN . Since {C} contains one of M’s accepting states, we add hCi to AN . Since {A, B, C} ∈ X, ∆({A, B, C}, 1) = {A, B} ∪ {C} ∪ ∅ = {A, B, C} and {A, B, C} ∈ X, we don’t have to add anything to X or QN , but we add hA, B, Ci, 1 → hA, B, Ci to TN .

• Since {C} ∈ X, ∆({C}, 0) = {C} and {C} ∈ X, we don’t have to add anything to X or QN , but we add hCi, 0 → hCi to TN . Since {C}∈ X, ∆({C}, 1) = ∅ and ∅∈ X, we don’t have to add anything to X or QN , but we add hCi, 1 → hi to TN .

Since there are no more elements to add to X, we are done. Thus, the DFA N is
[diagram: N: states hAi, hA, Bi, hA, B, Ci, hCi and hi, start state hAi, accepting states hA, B, Ci and hCi; transitions hAi, 1 → hA, Bi, hA, Bi, 1 → hA, B, Ci, hA, B, Ci, 1 → hA, B, Ci, hA, B, Ci, 0 → hCi, hCi, 0 → hCi, hAi, 0 → hi, hA, Bi, 0 → hi, hCi, 1 → hi, and a loop on hi labeled 0, 1]

The following two lemmas show why our conversion process is correct.

Lemma 3.11.11
For all w ∈ (alphabet M)∗:

• ∆M ({sM }, w) ∈ X; and

• δN (sN , w) is the symbol for ∆M ({sM }, w).

Proof. By left string induction.

(Basis Step) We have that ∆M ({sM }, %) = {sM } ∈ X and δN (sN , %) = sN = hsM i, the symbol for {sM } = ∆M ({sM }, %).

(Inductive Step) Suppose a ∈ alphabet M and w ∈ (alphabet M)∗. Assume the inductive hypothesis: ∆M ({sM }, w) ∈ X and δN (sN , w) = P̄ , where P = ∆M ({sM }, w). Since P ∈ X and a ∈ alphabet M, we have that ∆M ({sM }, wa) = ∆M (P, a) ∈ X. Thus

δN (sN , wa) = δN (δN (sN , w), a)
= δN (P̄ , a) (ind. hyp.)
= Q̄, where Q = ∆M (P, a) = ∆M ({sM }, wa).

Lemma 3.11.12 L(N)= L(M).

Proof.

(L(M) ⊆ L(N)) Suppose w ∈ L(M), so that w ∈ (alphabet M)∗ = (alphabet N)∗ and ∆M ({sM }, w) ∩ AM ≠ ∅. By Lemma 3.11.11, we have that ∆M ({sM }, w) ∈ X and δN (sN , w) is the symbol for ∆M ({sM }, w). Since ∆M ({sM }, w) ∈ X and ∆M ({sM }, w) ∩ AM ≠ ∅, it follows that δN (sN , w) ∈ AN . Thus w ∈ L(N).

(L(N) ⊆ L(M)) Suppose w ∈ L(N), so that w ∈ (alphabet N)∗ = (alphabet M)∗ and δN (sN , w) ∈ AN . By Lemma 3.11.11, δN (sN , w) is the symbol for ∆M ({sM }, w). Thus ∆M ({sM }, w) ∩ AM ≠ ∅, so that w ∈ L(M). ✷

We define a function nfaToDFA ∈ NFA → DFA by: nfaToDFA M is the result of running the preceding algorithm with input M.

Theorem 3.11.13 For all M ∈ NFA:

• nfaToDFA M ≈ M; and

• alphabet(nfaToDFA M) = alphabet M.

3.11.5 Processing DFAs in Forlan
The Forlan module DFA defines an abstract type dfa (in the top-level environment) of deterministic finite automata, along with various functions for processing DFAs. Values of type dfa are implemented as values of type fa, and the module DFA provides the following injection and projection functions

val injToFA : dfa -> fa val injToEFA : dfa -> efa val injToNFA : dfa -> nfa val projFromFA : fa -> dfa val projFromEFA : efa -> dfa val projFromNFA : nfa -> dfa

These functions are available in the top-level environment with the names injDFAToFA, injDFAToEFA, injDFAToNFA, projFAToDFA, projEFAToDFA and projNFAToDFA. The module DFA also defines the functions:

valinput :string->dfa val determProcStr : dfa -> sym * str -> sym val determAccepted : dfa -> str -> bool val determSimplified : dfa -> bool val determSimplify : dfa * sym set -> dfa valfromNFA :nfa->dfa

The function input is used to input a DFA. The function determProcStr is used to compute δM (q, w) for a DFA M, using the properties of δM . The function determAccepted uses determProcStr to check whether a string is accepted by a DFA. The function determSimplified tests whether a DFA is deterministically simplified, and the function determSimplify corresponds to determSimplify. The function fromNFA corresponds to our conversion function nfaToDFA, and is available in the top-level environment with that name:

val nfaToDFA : nfa -> dfa

Most of the functions for processing FAs that were introduced in previous sections are inherited by DFA:

valoutput :string*dfa->unit valnumStates :dfa->int val numTransitions : dfa -> int valalphabet :dfa->symset valequal :dfa*dfa->bool valcheckLP :dfa->lp->unit valvalidLP :dfa->lp->bool val isomorphism : dfa * dfa * sym_rel -> bool val findIsomorphism : dfa * dfa -> sym_rel valisomorphic :dfa*dfa->bool 3.11 Deterministic Finite Automata 189

val renameStates : dfa * sym_rel -> dfa val renameStatesCanonically : dfa -> dfa val processStr : dfa-> symset * str-> symset valaccepted :dfa->str->bool valfindLP :dfa->symset*str*symset->lp val findAcceptingLP : dfa -> str -> lp Suppose dfa is the DFA 1

0 00 Start A B DC 0, 1 1

1 We can turn dfa into an equivalent deterministically simplified DFA whose al- phabet is the union of the alphabet of the language of dfa and {2}, i.e., whose alphabet is {0, 1, 2}, as follows: - val dfa’ = DFA.determSimplify(dfa, SymSet.input ""); @ 2 @ . val dfa’ = - : dfa - DFA.output("", dfa’); {states} A, B, C, {start state} A {accepting states} A, B, C {transitions} A, 0 -> B; A, 1 -> A; A, 2 -> ; B, 0 -> C; B, 1 -> A; B, 2 -> ; C, 0 -> ; C, 1 -> A; C, 2 -> ; , 0 -> ; , 1 -> ; , 2 -> val it = () : unit Thus dfa’ is 0, 1, 2

[diagram: dfa’: states A, B, C and <dead>, with the transitions listed above]
Suppose that nfa is the NFA
[diagram: the three-state NFA from Section 3.11.4, with loops A, 1 → A and C, 0 → C and transitions A, 1 → B and B, 1 → C]

We can convert nfa to a DFA as follows:
- val dfa = nfaToDFA nfa;
val dfa = - : dfa
- DFA.output("", dfa);
{states} <>, <A>, <C>, <A,B>, <A,B,C>
{start state} <A>
{accepting states} <C>, <A,B,C>
{transitions} <>, 0 -> <>; <>, 1 -> <>; <A>, 0 -> <>; <A>, 1 -> <A,B>;
<C>, 0 -> <C>; <C>, 1 -> <>; <A,B>, 0 -> <>; <A,B>, 1 -> <A,B,C>;
<A,B,C>, 0 -> <C>; <A,B,C>, 1 -> <A,B,C>
val it = () : unit
Thus dfa is
[diagram: the five-state DFA N constructed earlier in this section]

And we can see why nfa and dfa accept 111100, as follows:
- LP.output
= ("",
= NFA.findAcceptingLP nfa (Str.fromString "111100"));
A, 1 => A, 1 => A, 1 => B, 1 => C, 0 => C, 0 => C
val it = () : unit
- LP.output
= ("",
= DFA.findAcceptingLP dfa (Str.fromString "111100"));
<A>, 1 => <A,B>, 1 => <A,B,C>, 1 => <A,B,C>, 1 => <A,B,C>, 0 => <C>, 0 => <C>
val it = () : unit
Finally, we see an example in which an NFA with 4 states is converted to a DFA with 2^4 = 16 states; there is a state of the DFA corresponding to every element of the power set of the set of states of the NFA. Suppose nfa’ is the NFA
[diagram: nfa’: a four-state NFA over the alphabet {0, 1, 2}, with states A, B, C and D and start state A]

Then we can convert nfa’ into a DFA, as follows:

- val dfa’ = nfaToDFA nfa’;
val dfa’ = - : dfa
- DFA.numStates dfa’;
val it = 16 : int
- DFA.output("", dfa’);
[output: the 16 states of dfa’, one per element of the power set of the states of nfa’, together with its start state, accepting states and transitions]
val it = () : unit

In Section 3.13, we will use Forlan to show that there is no DFA with fewer than 16 states that accepts the language accepted by nfa’ and dfa’.

3.11.6 Notes
In contrast to the standard approach, the transition function δ for a DFA M is not part of the definition of M, but is derived from the definition. Our approach to proving the correctness of DFAs, using induction on Λ plus proof by contradiction, is novel, simple and elegant. The material on deterministic simplification is original, but straightforward. And the algorithm for converting NFAs to DFAs is standard.

3.12 Closure Properties of Regular Languages

In this section, we show how to convert regular expressions to finite automata, as well as how to convert finite automata to regular expressions. As a result, we will be able to conclude that the following statements about a language L are equivalent:
• L is regular;
• L is generated by a regular expression;
• L is accepted by a finite automaton;
• L is accepted by an EFA;
• L is accepted by an NFA; and
• L is accepted by a DFA.
Also, we will introduce:
• operations on FAs corresponding to union, concatenation and closure;
• an operation on EFAs corresponding to intersection; and
• an operation on DFAs corresponding to set difference.
As a result, we will have that the set RegLan of regular languages is closed under union, concatenation, closure, intersection and set difference. I.e., we will have that, if L, L1, L2 ∈ RegLan, then L1 ∪ L2, L1L2, L∗, L1 ∩ L2 and L1 − L2 are in RegLan. We will also show several additional closure properties of regular languages, in addition to giving the corresponding operations on regular expressions and automata.

3.12.1 Converting Regular Expressions to FAs
In order to give an algorithm for converting regular expressions to finite automata, we must first define several constants and operations on FAs. We write emptyStr for the canonical finite automaton for %,
[diagram: a single accepting start state A, with no transitions]
And we write emptySet for the canonical finite automaton for ∅,
[diagram: a single non-accepting start state A, with no transitions]

Thus, we have that L(emptyStr) = {%} and L(emptySet) = ∅. Furthermore, both emptyStr and emptySet are DFAs, so that they are also NFAs and EFAs. Thus, we also refer to emptyStr as the canonical DFA/NFA/EFA/FA for %, and emptySet as the canonical DFA/NFA/EFA/FA for ∅. Next, we define a function strToFA ∈ Str → FA by: strToFA x is the canonical finite automaton for x,

[diagram: states A and B, start state A, accepting state B, and a single transition A, x → B]

Thus, for all x ∈ Str, L(strToFA x) = {x}. It is also convenient to define a function symToNFA ∈ Sym → NFA by: symToNFA a = strToFA a. Then, for all a ∈ Sym, L(symToNFA a) = {a}. Of course, symToNFA is also an element of Sym → EFA and Sym → FA, and we say that symToNFA a is the canonical NFA/EFA/FA for a. Next, we define a function/algorithm union ∈ FA × FA → FA such that L(union(M1, M2)) = L(M1) ∪ L(M2), for all M1, M2 ∈ FA. If M1, M2 ∈ FA, then union(M1, M2), the union of M1 and M2, is the FA N such that:

• QN = {A}∪{h1,qi | q ∈ QM1 }∪{h2,qi | q ∈ QM2 };

• sN = A;

• AN = { h1,qi | q ∈ AM1 }∪{h2,qi | q ∈ AM2 }; and

• TN =

{A, % → h1,sM1 i}

∪ {A, % → h2,sM2 i}

∪ { h1,qi, a → h1, ri | q, a → r ∈ TM1 }

∪ { h2,qi, a → h2, ri | q, a → r ∈ TM2 }.
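To make the tagging concrete, here is a minimal SML sketch of the accepting-state and transition parts of the union construction, not Forlan's implementation; an FA is represented as a triple of start state, accepting states and transitions, with states as strings:

fun tag i q = "<" ^ i ^ "," ^ q ^ ">"

fun unionFA (s1, a1, t1) (s2, a2, t2) =
  let val trans =
        ("A", "%", tag "1" s1) ::
        ("A", "%", tag "2" s2) ::
        map (fn (q, x, r) => (tag "1" q, x, tag "1" r)) t1 @
        map (fn (q, x, r) => (tag "2" q, x, tag "2" r)) t2
      val accepting = map (tag "1") a1 @ map (tag "2") a2
  in ("A", accepting, trans) end

The concat and closure constructions below can be sketched analogously, with %-transitions linking the tagged copies.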

For example, if M1 and M2 are the FAs
[diagram: M1 and M2: each has states A and B, start state A, accepting state B, a loop A, 0 → A and a transition A, 11 → B]
then union(M1, M2) is the FA
[diagram: a new start state A with %-transitions to h1, Ai and h2, Ai, followed by tagged copies of M1 and M2: loops h1, Ai, 0 → h1, Ai and h2, Ai, 0 → h2, Ai, and transitions h1, Ai, 11 → h1, Bi and h2, Ai, 11 → h2, Bi; h1, Bi and h2, Bi are accepting]

Proposition 3.12.1 For all M1, M2 ∈ FA:

• L(union(M1, M2)) = L(M1) ∪ L(M2); and

• alphabet(union(M1, M2)) = alphabet M1 ∪ alphabet M2.

Proposition 3.12.2
For all M1, M2 ∈ EFA, union(M1, M2) ∈ EFA.
Next, we define a function/algorithm concat ∈ FA × FA → FA such that L(concat(M1, M2)) = L(M1)L(M2), for all M1, M2 ∈ FA. If M1, M2 ∈ FA, then concat(M1, M2), the concatenation of M1 and M2, is the FA N such that:

• QN = { ⟨1, q⟩ | q ∈ QM1 } ∪ { ⟨2, q⟩ | q ∈ QM2 };

• sN = ⟨1, sM1⟩;

• AN = { ⟨2, q⟩ | q ∈ AM2 }; and

• TN =
    { ⟨1, q⟩, % → ⟨2, sM2⟩ | q ∈ AM1 }
    ∪ { ⟨1, q⟩, a → ⟨1, r⟩ | q, a → r ∈ TM1 }
    ∪ { ⟨2, q⟩, a → ⟨2, r⟩ | q, a → r ∈ TM2 }.

For example, if M1 and M2 are the FAs

[Diagram: M1 and M2 as in the union example above.]

then concat(M1, M2) is the FA

[Diagram: the states of M1 renamed to ⟨1, A⟩, ⟨1, B⟩ and those of M2 to ⟨2, A⟩, ⟨2, B⟩, with a %-transition from each accepting state of M1 to ⟨2, A⟩; the start state is ⟨1, A⟩ and the accepting state is ⟨2, B⟩.]

Proposition 3.12.3 For all M1, M2 ∈ FA:

• L(concat(M1, M2)) = L(M1)L(M2); and

• alphabet(concat(M1, M2)) = alphabet M1 ∪ alphabet M2.

Proposition 3.12.4 For all M1, M2 ∈ EFA, concat(M1, M2) ∈ EFA.

Next, we define a function/algorithm closure ∈ FA → FA such that L(closure M) = L(M)∗, for all M ∈ FA. If M ∈ FA, then closure M, the closure of M, is the FA N such that:

• QN = {A} ∪ { ⟨q⟩ | q ∈ QM };

• sN = A;

• AN = {A}; and

• TN =
    {A, % → ⟨sM⟩}
    ∪ { ⟨q⟩, % → A | q ∈ AM }
    ∪ { ⟨q⟩, a → ⟨r⟩ | q, a → r ∈ TM }.

For example, if M is the FA

[Diagram: an FA with states A (start), B and C, and transitions labeled 0, 1, 11 and 0.]

then closure M is the FA

[Diagram: closure M: a new accepting start state A with a %-transition to ⟨A⟩, %-transitions from the images of M's accepting states back to A, and M's transitions on ⟨A⟩, ⟨B⟩, ⟨C⟩.]

Proposition 3.12.5 For all M ∈ FA:

• L(closure M) = L(M)∗; and
• alphabet(closure M) = alphabet M.

Proposition 3.12.6 For all M ∈ EFA, closure M ∈ EFA.

We define a function/algorithm regToFA ∈ Reg → FA by well-founded recursion on the height of regular expressions, as follows. The goal is for L(regToFA α) to be equal to L(α), for all regular expressions α.

• regToFA % = emptyStr;

• regToFA $ = emptySet;

• for all α ∈ Reg, regToFA(α∗) = closure(regToFA α);

• for all α, β ∈ Reg, regToFA(α + β) = union(regToFA α, regToFA β);

• for all n ∈ N − {0} and a1, . . . , an ∈ Sym, regToFA(a1 · · · an) = strToFA(a1 · · · an);

• for all n ∈ N − {0}, a1, . . . , an ∈ Sym and α ∈ Reg, if α doesn't consist of a single symbol, and doesn't have the form b β for some b ∈ Sym and β ∈ Reg, then regToFA(a1 · · · an α) = concat(strToFA(a1 · · · an), regToFA α); and

• for all α, β ∈ Reg, if α doesn't consist of a single symbol, then regToFA(αβ) = concat(regToFA α, regToFA β).

For example, regToFA(0101∗)= concat(strToFA(010), regToFA(1∗)).

Theorem 3.12.7 For all α ∈ Reg:

• L(regToFA α)= L(α); and

• alphabet(regToFA α)= alphabet α.

Proof. Because of the form of recursion used, the proof uses well-founded induction on the height of α. ✷

For example, regToFA(0∗11 + 001∗) is isomorphic to the FA

[Diagram: an eleven-state FA with states A (start), B, C, D, E, F, G, H, I, J, K and accepting states D and G; its transitions are A, % → B | E; B, % → C | H; C, 11 → D; E, 00 → F; F, % → G; G, % → J; H, 0 → I; I, % → B; J, 1 → K; K, % → G.]

The Forlan module FA includes these constants and functions for building finite automata and converting regular expressions to finite automata:

val emptyStr : fa
val emptySet : fa
val fromStr : str -> fa
val fromSym : sym -> fa
val union : fa * fa -> fa
val concat : fa * fa -> fa
val closure : fa -> fa
val fromReg : reg -> fa

emptyStr and emptySet correspond to emptyStr and emptySet, respectively. The functions fromStr and fromSym correspond to strToFA and symToNFA, and are also available in the top-level environment with the names

val strToFA : str -> fa
val symToFA : sym -> fa

union, concat and closure correspond to union, concat and closure, respectively. The function fromReg corresponds to regToFA and is available in the top-level environment with that name:

val regToFA : reg -> fa

The constants emptyStr and emptySet are inherited by the modules DFA, NFA and EFA. The function fromSym is inherited by the modules NFA and EFA, and is available in the top-level environment with the names

val symToNFA : sym -> nfa
val symToEFA : sym -> efa

The functions union, concat and closure are inherited by the module EFA. Here is how the regular expression 0∗11 + 001∗ can be converted to an FA in Forlan:

- val reg = Reg.input "";
@ 0*11 + 001*
@ .
val reg = - : reg
- val fa = regToFA reg;
val fa = - : fa
- FA.output("", fa);
{states}
A, <1,<1,A>>, <1,<2,A>>, <1,<2,B>>, <2,<1,A>>, <2,<1,B>>, <2,<2,A>>,
<1,<1,<A>>>, <1,<1,<B>>>, <2,<2,<A>>>, <2,<2,<B>>>
{start state} A
{accepting states} <1,<2,B>>, <2,<2,A>>
{transitions}
A, % -> <1,<1,A>> | <2,<1,A>>; <1,<1,A>>, % -> <1,<2,A>> | <1,<1,<A>>>;
<1,<2,A>>, 11 -> <1,<2,B>>; <2,<1,A>>, 00 -> <2,<1,B>>;
<2,<1,B>>, % -> <2,<2,A>>; <2,<2,A>>, % -> <2,<2,<A>>>;
<1,<1,<A>>>, 0 -> <1,<1,<B>>>; <1,<1,<B>>>, % -> <1,<1,A>>;
<2,<2,<A>>>, 1 -> <2,<2,<B>>>; <2,<2,<B>>>, % -> <2,<2,A>>
val it = () : unit
- val fa' = FA.renameStatesCanonically fa;
val fa' = - : fa
- FA.output("", fa');
{states} A, B, C, D, E, F, G, H, I, J, K {start state} A
{accepting states} D, G
{transitions}
A, % -> B | E; B, % -> C | H; C, 11 -> D; E, 00 -> F; F, % -> G;
G, % -> J; H, 0 -> I; I, % -> B; J, 1 -> K; K, % -> G
val it = () : unit

Thus fa' is the finite automaton

[Diagram: the eleven-state FA with states A–K, accepting states D and G, whose transitions are listed above.]

Putting together our algorithm for converting regular expressions to finite automata with our algorithm for checking whether strings are accepted by finite automata, we are now able to check whether strings are generated by regular expressions:

- fun generated reg =
=   let val fa = FA.renameStatesCanonically(regToFA reg)
=   in FA.accepted fa end;
val generated = fn : reg -> str -> bool
- val generated = generated reg;
val generated = fn : str -> bool
- generated(Str.fromString "000011");
val it = true : bool
- generated(Str.fromString "001111");
val it = true : bool
- generated(Str.fromString "000111");
val it = false : bool

3.12.2 Converting FAs to Regular Expressions

Our algorithm for converting FAs to regular expressions makes use of a more general kind of finite automata that we call regular expression finite automata. A regular expression finite automaton (RFA) M consists of:

• a finite set QM of symbols;

• an element sM of QM ;

• a subset AM of QM ; and

• a finite subset TM of { (q, α, r) | q, r ∈ QM and α ∈ Reg } such that, for all q, r ∈ QM , there is at most one α ∈ Reg such that (q, α, r) ∈ TM .

As usual QM consists of M’s states, sM is M’s start state, AM consists of M’s accepting states, and TM consists of M’s transitions. We often write a transition (q, α, r) as

q →α r    or    q, α → r.

We write RFA for the set of all RFAs, which is a countably infinite set. RFAs are drawn analogously to FAs, and the Forlan syntax for RFAs is analogous to that of FAs. For example, the RFA M whose states are A and B, whose start state is A, whose only accepting state is B, and whose transitions are (A, 2, A), (A, 00∗, B), (B, 3, B) and (B, 11∗, A) can be drawn as

[Diagram: RFA M: self-loops 2 on A and 3 on B, a transition A, 00∗ → B, and a transition B, 11∗ → A; start state A, accepting state B.]

and expressed in Forlan as

{states} A, B {start state} A {accepting states} B {transitions} A, 2 -> A; A, 00* -> B; B, 3 -> B; B, 11* -> A

We define a function alphabet ∈ RFA → Alp by: for all M ∈ RFA, alphabet M is { a ∈ Sym | there are q, α, r such that q, α → r ∈ TM and a ∈ alphabet α }. I.e., alphabet M is the union of the alphabets of all of the regular expressions appearing in M's transitions. We say that alphabet M is the alphabet of M. For example, the alphabet of our example RFA M is {0, 1, 2, 3}. The Forlan module RFA defines an abstract type rfa (in the top-level environment) of regular expression finite automata, as well as some functions for processing RFAs including:

val input : string -> rfa
val output : string * rfa -> unit
val alphabet : rfa -> sym set
val numStates : rfa -> int
val numTransitions : rfa -> int
val equal : rfa * rfa -> bool

JForlan can be used to view and edit regular expression finite automata. It can be invoked directly, or run via Forlan. See the Forlan website for more information. The isomorphism relation between RFAs is defined in an analogous way to this relation for FAs. And the functions renameStates and renameStatesCanonically are also defined analogously, and have analogous properties. The RFA module has the functions

val renameStates : rfa * sym_rel -> rfa
val renameStatesCanonically : rfa -> rfa

A labeled path

q1 ⇒x1 q2 ⇒x2 · · · ⇒xn qn+1

is valid for an RFA M iff qn+1 ∈ QM and, for all i ∈ [1 : n], xi ∈ L(α) for some α ∈ Reg such that qi, α → qi+1 ∈ TM. For example, the labeled path

A ⇒000 B ⇒3 B

is valid for our example RFA M, because

• 000 ∈ L(00∗) and A, 00∗ → B ∈ TM, and

• 3 ∈ L(3) and B, 3 → B ∈ TM.

The RFA module contains the functions

val checkLP : (str * reg -> bool) * rfa -> lp -> unit
val validLP : (str * reg -> bool) * rfa -> lp -> bool

which are analogous to the identically named functions provided by FA, except that they take a first argument whose job is to test whether a string is generated by a regular expression. For example, we can proceed as follows:

- val rfa = RFA.input "";
@ {states} A, B {start state} A {accepting states} B
@ {transitions} A, 2 -> A; A, 00* -> B; B, 3 -> B; B, 11* -> A
@ .
val rfa = - : rfa
- fun memb(x, reg) = FA.accepted (regToFA reg) x;
val memb = fn : str * reg -> bool
- val lp = LP.input "";
@ A, 000 => B, 3 => B
@ .
val lp = - : lp
- RFA.validLP (memb, rfa) lp;
val it = true : bool
- val lp' = LP.input "";
@ A, 0 => B, 0 => A
@ .
val lp' = - : lp
- RFA.checkLP (memb, rfa) lp';
transition from "B" to "A" has regular expression that
doesn't generate "0"

uncaught exception Error

A string w is accepted by an RFA M iff there is a labeled path lp such that

• the label of lp is w;

• lp is valid for M;

• the start state of lp is the start state of M; and

• the end state of lp is an accepting state of M.

We have that, if w is accepted by M, then alphabet w ⊆ alphabet M. The language accepted by an RFA M (L(M)) is

{ w ∈ Str | w is accepted by M }.

Consider our example RFA M:

[Diagram: RFA M as above: self-loops 2 on A and 3 on B, A, 00∗ → B, and B, 11∗ → A.]

We have that 20 and 0000111103 are accepted by M, but that 23 and 122 are not accepted by M. We define a function combineTrans that takes in a pair (simp, U) such that

• simp ∈ Reg → Reg and

• U is a finite subset of { p, α → q | p, q ∈ Sym and α ∈ Reg },

and returns a finite subset V of { p, α → q | p, q ∈ Sym and α ∈ Reg } with the property that, for all p, q ∈ Sym, there is at most one β such that p, β → q ∈ V. Given such a pair (simp, U), combineTrans returns the set of all transitions p, α → q such that { β | p, β → q ∈ U } is nonempty, and α = simp(β1 + · · · + βn), where β1, . . . , βn are all of the elements of this set, listed in increasing order and without repetition. Now, we define a function/algorithm

faToRFA ∈ (Reg → Reg) → FA → RFA. faToRFA takes in simp ∈ Reg → Reg, and returns a function that takes in M ∈ FA, and returns the RFA N such that:

• QN = QM ;

• sN = sM ;

• AN = AM ; and

• TN = combineTrans(simp, { p, strToReg x → q | p, x → q ∈ TM }).

For example, if the FA M is

[Diagram: states A (start) and B (accepting), with a self-loop 0 on A, transitions A, 1 → B and A, 2 → B, and self-loops 3 and 34 on B.]

and simp is locallySimplify obviousSubset, then faToRFA simp M is the RFA

[Diagram: a self-loop 0 on A, a transition A, 1 + 2 → B, and a self-loop 3(% + 4) on B.]

Proposition 3.12.8 Suppose simp ∈ Reg → Reg and M ∈ FA. If, for all α ∈ Reg, L(simp α) = L(α) and alphabet(simp α) ⊆ alphabet α, then

(1) L(faToRFA simp M)= L(M), and

(2) alphabet(faToRFA simp M)= alphabet M.

The RFA module has a function

val fromFA : (reg -> reg) -> fa -> rfa

that corresponds to faToRFA. Here is how our conversion example can be carried out in Forlan:

- val simp = #2 o Reg.locallySimplify(NONE, Reg.obviousSubset);
val simp = fn : reg -> reg
- val fa = FA.input "";
@ {states} A, B {start state} A {accepting states} B
@ {transitions}
@ A, 0 -> A; A, 1 -> B; A, 2 -> B;
@ B, 3 -> B; B, 34 -> B
@ .
val fa = - : fa
- val rfa = RFA.fromFA simp fa;
val rfa = - : rfa
- RFA.output("", rfa);
{states} A, B {start state} A {accepting states} B
{transitions} A, 0 -> A; A, 1 + 2 -> B; B, 3(% + 4) -> B
val it = () : unit

We say that an RFA M is standard iff

• M’s start state is not an accepting state, and there are no transitions into M’s start state (even from sM to itself); and

• M has a single accepting state, and there are no transitions from that state (even from the accepting state to itself).

Proposition 3.12.9 Suppose M is a standard RFA with only two states, and that q is M's accepting state.

(1) For all α ∈ Reg, if sM, α → q ∈ TM, then L(M) = L(α).

(2) If there is no α ∈ Reg such that sM, α → q ∈ TM, then L(M) = ∅.

We define a function standardize ∈ RFA → RFA that standardizes an RFA, as follows. Given an argument M, it returns the RFA N such that:

• QN = { ⟨q⟩ | q ∈ QM } ∪ {A, B};

• sN = A;

• AN = {B}; and

• TN =
    {A, % → ⟨sM⟩}
    ∪ { ⟨q⟩, % → B | q ∈ AM }
    ∪ { ⟨q⟩, α → ⟨r⟩ | q, α → r ∈ TM }.

For example, if M is the RFA

[Diagram: the RFA from the previous example: self-loop 0 on A, A, 1 + 2 → B, self-loop 3(% + 4) on B.]

then standardize M is the RFA

[Diagram: a new start state A with a %-transition to ⟨A⟩, the original transitions on ⟨A⟩ and ⟨B⟩, and a %-transition from ⟨B⟩ to the new accepting state B.]

Proposition 3.12.10 Suppose M is an RFA. Then:

• standardize M is standard;
• L(standardize M) = L(M); and
• alphabet(standardize M) = alphabet M.

The RFA module has functions

val standard : rfa -> bool
val standardize : rfa -> rfa

The function standard tests whether an RFA is standard, and the function standardize corresponds to standardize. Here is how the above example can be carried out in Forlan:

- RFA.standard rfa;
val it = false : bool
- val rfa' = RFA.standardize rfa;
val rfa' = - : rfa
- RFA.output("", rfa');
{states} A, B, <A>, <B> {start state} A {accepting states} B
{transitions} A, % -> <A>; <A>, 0 -> <A>; <A>, 1 + 2 -> <B>;
<B>, % -> B; <B>, 3(% + 4) -> <B>
val it = () : unit
- RFA.standard rfa';
val it = true : bool

Next, we define a function eliminateState that takes in a function simp ∈ Reg → Reg, and returns a function that takes in a pair (M,q), where M is an RFA and q ∈ QM − ({sM }∪ AM ), and returns an RFA. When called with such a simp and (M,q), eliminateState returns the RFA N such that:

• QN = QM − {q};

• sN = sM;

• AN = AM; and

• TN = combineTrans(simp, U ∪ V), where

– U = { p, α → r ∈ TM | p ≠ q and r ≠ q },
– V = { p, simp(αβ∗γ) → r | p ≠ q, r ≠ q, p, α → q ∈ TM and q, γ → r ∈ TM }, and
– β is the unique α ∈ Reg such that q, α → q ∈ TM, if such an α exists, and is %, otherwise.

Suppose simp is locallySimplify obviousSubset and M is the RFA

[Diagram: start state A, accepting state D; transitions A, 0 → B; B, 1 → C; C, 2 → B; C, 3 → C; C, 4 → D.]

Then eliminateState simp (M, B) is

[Diagram: A, 01 → C; C, 3 + 21 → C; C, 4 → D.]

And, we can eliminate C from this RFA, yielding

[Diagram: A, 01(3 + 21)∗4 → D.]

Alternatively, we could eliminate C from M itself, yielding

[Diagram: A, 0 → B; B, 13∗2 → B; B, 13∗4 → D.]

And we could then eliminate B from this RFA, yielding

[Diagram: A, 01(3 + 21)∗4 → D.]

(Note that simp(0(13∗2)∗(13∗4)) = 01(3 + 21)∗4.) If we had an efficient regular expression simplifier that produced optimal results, then the order in which we eliminated states would be irrelevant. But using our existing simplifiers, it turns out that eliminating states in some orders produces much better results than doing so in other orders. Instead of eliminating first C and then B, we could have renamed M's states using the bijection

{(A, A), (B, C), (C, B), (D, D)}

and then have eliminated states in ascending order, according to our usual ordering on symbols: first B and then C. This is the approach we'll use when looking for alternative answers.

Proposition 3.12.11 Suppose simp ∈ Reg → Reg, M is an RFA and q ∈ QM − ({sM} ∪ AM). Then:

(1) eliminateState simp (M,q) has one less state than M.

(2) If M is standard, then eliminateState simp (M,q) is standard.

(3) If, for all α ∈ Reg, L(simp α)= L(α), then

L(eliminateState simp (M,q)) = L(M).

(4) If, for all α ∈ Reg, alphabet(simp α) ⊆ alphabet α, then

alphabet(eliminateState simp (M,q)) ⊆ alphabet M.

The RFA module has a function

val eliminateState : (reg -> reg) -> rfa * sym -> rfa

that corresponds to eliminateState. Here is how our state-elimination examples can be carried out in Forlan:

- val rfa = RFA.input "";
@ {states} A, B, C, D {start state} A {accepting states} D
@ {transitions}
@ A, 0 -> B; B, 1 -> C; C, 2 -> B; C, 3 -> C; C, 4 -> D
@ .
val rfa = - : rfa
- val eliminateState = RFA.eliminateState simp;
val eliminateState = fn : rfa * sym -> rfa
- val rfa' = eliminateState(rfa, Sym.fromString "B");
val rfa' = - : rfa
- RFA.output("", rfa');
{states} A, C, D {start state} A {accepting states} D
{transitions} A, 01 -> C; C, 4 -> D; C, 3 + 21 -> C
val it = () : unit
- val rfa'' = eliminateState(rfa', Sym.fromString "C");
val rfa'' = - : rfa
- RFA.output("", rfa'');
{states} A, D {start state} A {accepting states} D
{transitions} A, 01(3 + 21)*4 -> D
val it = () : unit
- val rfa''' = eliminateState(rfa, Sym.fromString "C");
val rfa''' = - : rfa
- RFA.output("", rfa''');
{states} A, B, D {start state} A {accepting states} D
{transitions} A, 0 -> B; B, 13*2 -> B; B, 13*4 -> D
val it = () : unit
- val rfa'''' = eliminateState(rfa''', Sym.fromString "B");
val rfa'''' = - : rfa
- RFA.output("", rfa'''');
{states} A, D {start state} A {accepting states} D
{transitions} A, 01(3 + 21)*4 -> D
val it = () : unit

And eliminateState stops us from eliminating a start state or an accepting state:

- eliminateState(rfa, Sym.fromString "A");
cannot eliminate start state: "A"

uncaught exception Error
- eliminateState(rfa, Sym.fromString "D");
cannot eliminate accepting state: "D"

uncaught exception Error

Now, we use eliminateState to define a function/algorithm

rfaToReg ∈ (Reg → Reg) → RFA → Reg.

It takes elements simp ∈ Reg → Reg and M ∈ RFA, and returns

f(standardize M), where f is the function from standard RFAs to regular expressions that is defined by well-founded recursion on the number of states of its input, M, as follows:

• If M has only two states, then f returns the label of the transition from sM to M’s accepting state, if such a transition exists, and returns $, otherwise.

• Otherwise, f calls itself recursively on eliminateState simp (M, q), where q is the least element (in the standard ordering on symbols) of QM − ({sM} ∪ AM).
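As a quick illustration, here is a sketch of running this algorithm through Forlan's RFA.toReg (listed below), assuming rfa and simp are still bound as in the state-elimination transcripts above. The exact printed form of the answer depends on the simplifier, but it should agree with the elimination sequence shown earlier:

- val reg = RFA.toReg simp rfa;
val reg = - : reg
- Reg.output("", reg);
01(3 + 21)*4
val it = () : unit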

Proposition 3.12.12 Suppose M is an RFA and simp ∈ Reg → Reg has the property that, for all α ∈ Reg, L(simp α)= L(α) and alphabet(simp α) ⊆ alphabet α. Then:

(1) L(rfaToReg simp M)= L(M); and

(2) alphabet(rfaToReg simp M) ⊆ alphabet M.

Finally, we define our FA to regular expression conversion algorithm/function:

faToReg ∈ (Reg → Reg) → FA → Reg.

faToReg takes in simp ∈ Reg → Reg, and returns

rfaToReg simp ◦ faToRFA simp.

Proposition 3.12.13 Suppose M is an FA and simp ∈ Reg → Reg has the property that, for all α ∈ Reg, L(simp α)= L(α) and alphabet(simp α) ⊆ alphabet α. Then:

(1) L(faToReg simp M)= L(M); and

(2) alphabet(faToReg simp M) ⊆ alphabet M.

The Forlan module RFA includes functions

val toReg : (reg -> reg) -> rfa -> reg
val faToReg : (reg -> reg) -> fa -> reg
val faToRegPerms : int option * (reg -> reg) -> fa -> reg
val faToRegPermsTrace : int option * (reg -> reg) -> fa -> reg

The function toReg corresponds to rfaToReg. The function faToReg is the implementation of faToReg. If simp is a simplification function and M is an FA, then faToRegPerms (NONE, simp) M applies faToReg to all of the FAs N that can be formed by renaming M's states using bijections (i.e., permutations) from QM to QM, and returns the simplest answer found (ties in complexity are broken by selecting the smallest regular expression in our total ordering on regular expressions). faToRegPerms (SOME n, simp) M, for n ≥ 1, works similarly, except that only n ways of renaming M's states are considered. And faToRegPermsTrace is like faToRegPerms except that it explains what it's doing. The functions faToReg, faToRegPerms and faToRegPermsTrace are also available in the top-level environment with those names:

val faToReg : (reg -> reg) -> fa -> reg
val faToRegPerms : int option * (reg -> reg) -> fa -> reg
val faToRegPermsTrace : int option * (reg -> reg) -> fa -> reg

Suppose fa is the FA

[Diagram: a four-state DFA over {0, 1} with states A (start, accepting), B, C and D.]

which accepts { w ∈ {0, 1}∗ | w has an even number of 0's and an even number of 1's }. Converting fa into a regular expression using faToReg and weaklySimplify yields a fairly complicated answer:

- val reg = faToReg Reg.weaklySimplify fa;
val reg = - : reg
- Reg.output("", reg);
% + 00(00)* + (1 + 00(00)*1)(11 + 100(00)*1)*(1 + 100(00)*) +
(0(00)*1 + (1 + 00(00)*1)(11 + 100(00)*1)*(0 + 10(00)*1))
(1(00)*1 + (0 + 10(00)*1)(11 + 100(00)*1)*(0 + 10(00)*1))*
(10(00)* + (0 + 10(00)*1)(11 + 100(00)*1)*(1 + 100(00)*))
val it = () : unit

But by using faToRegPerms, we can do much better:

- val reg' = faToRegPerms (NONE, Reg.weaklySimplify) fa;
val reg' = - : reg
- Reg.output("", reg');
(00 + 11 + (01 + 10)(00 + 11)*(01 + 10))*
val it = () : unit

By using faToRegPermsTrace, we can learn that this answer was found using the renaming

(A, D), (B, A), (C, B), (D, C) of M’s states. That is, it was found by making M into a standard RFA, with new start and accepting states, and then eliminating the states corresponding to B, C, D and A, in that order.

3.12.3 Characterization of Regular Languages

Since we have algorithms for converting back and forth between regular expressions and finite automata, as well as algorithms for converting FAs to RFAs, RFAs to regular expressions, FAs to EFAs, EFAs to NFAs, and NFAs to DFAs, we have the following theorem:

Theorem 3.12.14 Suppose L is a language. The following statements are equivalent:

• L is regular;

• L is generated by a regular expression;

• L is accepted by a regular expression finite automaton;

• L is accepted by a finite automaton;

• L is accepted by an EFA;

• L is accepted by an NFA; and

• L is accepted by a DFA.

3.12.4 More Closure Properties/Algorithms

Consider the EFAs M1 and M2:

[Diagram: M1: states A (start) and B (accepting), a self-loop 0 on A, a %-transition A → B, and a self-loop 1 on B. M2: the same, but with self-loop 1 on A and self-loop 0 on B.]

How can we construct an EFA N such that L(N)= L(M1) ∩ L(M2)? The idea is to make the states of N represent pairs of the form (q, r), where q ∈ QM1 and r ∈ QM2 . In order to define our intersection operation on EFAs, we first need to define two auxiliary functions. Suppose M1 and M2 are EFAs. We define a function

nextSymM1,M2 ∈ (QM1 × QM2) × Sym → P(QM1 × QM2)

by

nextSymM1,M2((q, r), a) = { (q′, r′) | q, a → q′ ∈ TM1 and r, a → r′ ∈ TM2 }.

We often abbreviate nextSymM1,M2 to nextSym. If M1 and M2 are our example EFAs, then nextSym((A, A), 0) = ∅ and nextSym((A, B), 0) = {(A, B)}. Suppose M1 and M2 are EFAs. We define a function

nextEmpM1,M2 ∈ (QM1 × QM2) → P(QM1 × QM2)

by

nextEmpM1,M2(q, r) = { (q′, r) | q, % → q′ ∈ TM1 } ∪ { (q, r′) | r, % → r′ ∈ TM2 }.

We often abbreviate nextEmpM1,M2 to nextEmp. If M1 and M2 are our example EFAs, then nextEmp(A, A) = {(B, A), (A, B)}, nextEmp(A, B) = {(B, B)}, nextEmp(B, A) = {(B, B)} and nextEmp(B, B) = ∅.

Now, we define a function/algorithm inter ∈ EFA × EFA → EFA such that L(inter(M1, M2)) = L(M1) ∩ L(M2), for all M1, M2 ∈ EFA. Given EFAs M1 and M2, inter(M1, M2), the intersection of M1 and M2, is the EFA N that is constructed as follows. First, we let Σ = alphabet M1 ∩ alphabet M2. Next, we generate the least subset X of QM1 × QM2 such that

• (sM1 ,sM2 ) ∈ X;

• for all q ∈ QM1 , r ∈ QM2 and a ∈ Σ, if (q, r) ∈ X, then nextSym((q, r), a) ⊆ X; and

• for all q ∈ QM1 and r ∈ QM2 , if (q, r) ∈ X, then nextEmp(q, r) ⊆ X.

Then, the EFA N is defined by:

• QN = { ⟨q, r⟩ | (q, r) ∈ X };

• sN = ⟨sM1, sM2⟩;

• AN = { ⟨q, r⟩ | (q, r) ∈ X and q ∈ AM1 and r ∈ AM2 }; and

• TN =
    { ⟨q, r⟩, a → ⟨q′, r′⟩ | (q, r) ∈ X and a ∈ Σ and (q′, r′) ∈ nextSym((q, r), a) }
    ∪ { ⟨q, r⟩, % → ⟨q′, r′⟩ | (q, r) ∈ X and (q′, r′) ∈ nextEmp(q, r) }.
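The least subset X above—like the similar least subsets used for the subset construction and for equivalence testing—can be computed by a standard worklist iteration. The following self-contained Standard ML sketch is not Forlan code (inter computes X internally); leastClosed and succs are hypothetical names:

(* Least set containing roots and closed under succs: repeatedly take
   an element off the worklist and add its unseen successors. *)
fun leastClosed (succs : ''a -> ''a list) (roots : ''a list) : ''a list =
  let
    fun go ([], seen) = seen
      | go (x :: work, seen) =
          if List.exists (fn y => y = x) seen
          then go (work, seen)
          else go (succs x @ work, x :: seen)
  in
    go (roots, [])
  end

For inter, the single root would be the pair (sM1, sM2), and succs would combine nextSym((q, r), a) for each a ∈ Σ with nextEmp(q, r).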

Suppose M1 and M2 are our example EFAs. Then inter(M1, M2) is

[Diagram: start state ⟨A, A⟩ with %-transitions to ⟨B, A⟩ (self-loop 1) and ⟨A, B⟩ (self-loop 0), and %-transitions from ⟨B, A⟩ and ⟨A, B⟩ to the accepting state ⟨B, B⟩.]

Theorem 3.12.15 For all M1, M2 ∈ EFA:

• L(inter(M1, M2)) = L(M1) ∩ L(M2); and

• alphabet(inter(M1, M2)) ⊆ alphabet M1 ∩ alphabet M2.

Proposition 3.12.16 For all M1, M2 ∈ NFA, inter(M1, M2) ∈ NFA.

Proposition 3.12.17 For all M1, M2 ∈ DFA:

(1) inter(M1, M2) ∈ DFA.

(2) alphabet(inter(M1, M2)) = alphabet M1 ∩ alphabet M2.

Next, we define a function complement ∈ DFA × Alp → DFA such that, for all M ∈ DFA and Σ ∈ Alp,

L(complement(M, Σ)) = (alphabet(L(M)) ∪ Σ)∗ − L(M).

In the common case when L(M) ⊆ Σ∗, we will have that alphabet(L(M)) ⊆ Σ, and thus that (alphabet(L(M)) ∪ Σ)∗ =Σ∗. Hence, it will be the case that

L(complement(M, Σ)) = Σ∗ − L(M).

Given a DFA M and an alphabet Σ, complement(M, Σ), the complement of M with respect to Σ, is the DFA N that is produced as follows. First, we let the DFA M′ = determSimplify(M, Σ). Thus:

• M′ is equivalent to M; and

• alphabet M′ = alphabet(L(M)) ∪ Σ.

Then, we define N by:

• QN = QM′;

• sN = sM′;

• AN = QM′ − AM′; and

• TN = TM′.

Then, for all w ∈ (alphabet M′)∗ = (alphabet N)∗ = (alphabet(L(M)) ∪ Σ)∗,

    w ∈ L(N) iff δN(sN, w) ∈ AN
             iff δN(sN, w) ∈ QM′ − AM′
             iff δM′(sM′, w) ∉ AM′
             iff w ∉ L(M′)
             iff w ∉ L(M).

Hence:

Theorem 3.12.18 For all M ∈ DFA and Σ ∈ Alp:

• L(complement(M, Σ)) = (alphabet(L(M)) ∪ Σ)∗ − L(M); and
• alphabet(complement(M, Σ)) = alphabet(L(M)) ∪ Σ.

For example, suppose the DFA M is

[Diagram: states A (start), B, C, D, with A, B and C accepting; transitions A, 1 → A; A, 0 → B; B, 1 → A; B, 0 → C; C, 1 → A; C, 0 → D; and a self-loop 0, 1 on the dead state D.]

Then determSimplify(M, {2}) is the DFA

[Diagram: states A (start), B, C and ⟨dead⟩, with A, B and C accepting; the transitions of M restricted to A, B, C, plus 2-transitions from A, B and C to ⟨dead⟩, the transition C, 0 → ⟨dead⟩, and a self-loop 0, 1, 2 on ⟨dead⟩.]

Thus complement(M, {2}) is

[Diagram: the same automaton, but with ⟨dead⟩ as its only accepting state.]

Let X = { w ∈ {0, 1}∗ | 000 is not a substring of w }. Then L(complement(M, {2})) is

(alphabet(L(M)) ∪ {2})∗ − L(M)
  = ({0, 1} ∪ {2})∗ − X
  = { w ∈ {0, 1, 2}∗ | w ∉ X }
  = { w ∈ {0, 1, 2}∗ | 2 ∈ alphabet w or 000 is a substring of w }.

We define a function/algorithm minus ∈ DFA × DFA → DFA by: minus(M1, M2), the difference of M1 and M2, is

inter(M1, complement(M2, alphabet M1)).

Theorem 3.12.19 For all M1, M2 ∈ DFA:

(1) L(minus(M1, M2)) = L(M1) − L(M2); and

(2) alphabet(minus(M1, M2)) = alphabet M1.

Proof. Suppose w ∈ Str. Then:

w ∈ L(minus(M1, M2))
  iff w ∈ L(inter(M1, complement(M2, alphabet M1)))
  iff w ∈ L(M1) and w ∈ L(complement(M2, alphabet M1))
  iff w ∈ L(M1) and w ∈ (alphabet(L(M2)) ∪ alphabet M1)∗ and w ∉ L(M2)
  iff w ∈ L(M1) and w ∉ L(M2)
  iff w ∈ L(M1) − L(M2). ✷

To see why the second argument to complement is alphabet M1 in the definition of minus(M1, M2), look at the "if" direction of the second-to-last step of the preceding proof: since w ∈ L(M1), we have that w ∈ (alphabet M1)∗, so that w ∈ (alphabet(L(M2)) ∪ alphabet M1)∗. For example, let M1 and M2 be the EFAs

[Diagram: the example EFAs M1 and M2 from the beginning of this subsection.]

Since L(M1) = {0}∗{1}∗ and L(M2) = {1}∗{0}∗, we have that

L(M1) − L(M2) = {0}∗{1}∗ − {1}∗{0}∗ = {0}{0}∗{1}{1}∗.

Define DFAs N1 and N2 by:

N1 = nfaToDFA(efaToNFA M1), and

N2 = nfaToDFA(efaToNFA M2).

Thus we have that

L(N1)= L(nfaToDFA(efaToNFA(M1)))

= L(efaToNFA(M1)) (Theorem 3.11.13)

= L(M1) (Theorem 3.10.4)

L(N2)= L(nfaToDFA(efaToNFA(M2)))

= L(efaToNFA(M2)) (Theorem 3.11.13)

= L(M2) (Theorem 3.10.4).

Let the DFA N = minus(N1,N2). Then

L(N)= L(minus(N1,N2))

= L(N1) − L(N2) (Theorem 3.12.19)

= L(M1) − L(M2) = {0}{0}∗{1}{1}∗.

Next, we consider the reversal of languages, regular expressions, finite automata and empty-string finite automata. The reversal of a language L (LR ∈ Lan) is { w | wR ∈ L } = { wR | w ∈ L }. I.e., LR is formed by reversing all of the elements of L. For example, {011, 1011}R = {110, 1101}. The module StrSet defines the function

val rev : str set -> str set

that implements language reversal. E.g., we can proceed as follows:

- val xs = StrSet.input "";
@ 123, 234, 16
@ .
val xs = - : str set
- StrSet.output("", StrSet.rev xs);
61, 321, 432
val it = () : unit

Define rev ∈ Reg → Reg by recursion:

rev % = %;
rev $ = $;
rev a = a, for all a ∈ Sym;
rev(α∗) = (rev α)∗, for all α ∈ Reg;
rev(αβ) = rev β rev α, for all α, β ∈ Reg; and
rev(α + β) = rev α + rev β, for all α, β ∈ Reg.

We say that rev α is the reversal of α. For example, rev(01 + (10)∗) = 10 + (01)∗.
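The six equations translate directly into a structurally recursive function. Here is a minimal Standard ML sketch over a hypothetical concrete regex datatype—Forlan's own reg type is abstract, and its Reg.rev, introduced later in this section, plays this role in Forlan itself:

(* A hypothetical datatype mirroring the six cases of rev. *)
datatype regex =
    EmptyStr                    (* %     *)
  | EmptySet                    (* $     *)
  | Sym of char                 (* a     *)
  | Star of regex               (* α*    *)
  | Concat of regex * regex     (* αβ    *)
  | Union of regex * regex      (* α + β *)

(* Structural recursion: only the concatenation case swaps its arguments.
   (This rev shadows the Basis library's list rev at top level.) *)
fun rev EmptyStr          = EmptyStr
  | rev EmptySet          = EmptySet
  | rev (Sym a)           = Sym a
  | rev (Star r)          = Star (rev r)
  | rev (Concat (r1, r2)) = Concat (rev r2, rev r1)
  | rev (Union (r1, r2))  = Union (rev r1, rev r2)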

Theorem 3.12.20 For all α ∈ Reg:

• L(rev α)= L(α)R; and

• alphabet(rev α)= alphabet α.

Proof. By induction on α. ✷

We can also define a reversal operation on FAs and EFAs. The idea is to reverse all of the transitions, add a new start state, with %-transitions to all of the original accepting states, and make the original start state be the unique accepting state. Formally, we define a function rev ∈ FA → FA as follows. Given an FA M, rev M, the reversal of M, is the FA N such that

• QN = {A} ∪ { ⟨q⟩ | q ∈ QM };

• sN = A;

• AN = {⟨sM⟩}; and

• TN = { (⟨r⟩, xR, ⟨q⟩) | (q, x, r) ∈ TM } ∪ { (A, %, ⟨q⟩) | q ∈ AM }.

It is easy to see that, for all M ∈ EFA, rev M ∈ EFA. For example, if M is the FA

[Diagram: states A (start, accepting), B and C (accepting); transitions A, 01 → B; B, 23 → C; C, 45 → A.]

then rev M is

[Diagram: a new start state A with %-transitions to ⟨A⟩ and ⟨C⟩; transitions ⟨B⟩, 10 → ⟨A⟩; ⟨C⟩, 32 → ⟨B⟩; ⟨A⟩, 54 → ⟨C⟩; accepting state ⟨A⟩.]

We have that 012345 and 0123450123 are in L(M), and 543210 and 3210543210 are in L(rev M) = L(M)R.

Theorem 3.12.21 For all M ∈ FA:

• L(rev M)= L(M)R; and

• alphabet(rev M) = alphabet M.

Next, we consider the prefix-, suffix- and substring-closures of languages, as well as the associated operations on automata. Suppose L is a language. Then:

• The prefix-closure of L (LP) is { x | xy ∈ L, for some y ∈ Str }. I.e., LP is all of the prefixes of elements of L. E.g., {012, 3}P = {%, 0, 01, 012, 3}.

• The suffix-closure of L (LS) is { y | xy ∈ L, for some x ∈ Str }. I.e., LS is all of the suffixes of elements of L. E.g., {012, 3}S = {%, 2, 12, 012, 3}.

• The substring-closure of L (LSS) is { y | xyz ∈ L, for some x, z ∈ Str }. I.e., LSS is all of the substrings of elements of L. E.g., {012, 3}SS = {%, 0, 1, 2, 01, 12, 012, 3}.

The following proposition shows that we can express suffix- and substring-closure in terms of prefix-closure and language reversal.

Proposition 3.12.22 For all languages L:

• LS = ((LR)P)R; and
• LSS = (LP)S.

The module StrSet defines the functions

val prefix : str set -> str set
val suffix : str set -> str set
val substring : str set -> str set

that implement prefix-, suffix- and substring-closure. E.g., we can proceed as follows:

- val xs = StrSet.input "";
@ 123, 0234, 16
@ .
val xs = - : str set
- StrSet.output("", StrSet.prefix xs);
%, 0, 1, 02, 12, 16, 023, 123, 0234
val it = () : unit
- StrSet.output("", StrSet.suffix xs);
%, 3, 4, 6, 16, 23, 34, 123, 234, 0234
val it = () : unit
- StrSet.output("", StrSet.substring xs);
%, 0, 1, 2, 3, 4, 6, 02, 12, 16, 23, 34, 023, 123, 234, 0234
val it = () : unit

Now, we define a function prefix ∈ EFA → EFA such that L(prefix M) = L(M)P, for all M ∈ EFA. Given an EFA M, prefix M, the prefix-closure of M, is the EFA N that is constructed as follows. First, we simplify M, producing an EFA M′ that is equivalent to M and either has no useless states, or consists of a single dead state and no transitions. If M′ has no useless states, then we let N be the same as M′ except that AN = QN = QM′, i.e., all states of N are accepting states. If M′ consists of a single dead state and no transitions, then we let N = M′. For example, suppose M is the EFA

[Diagram: states A (start, accepting), B, C, D; transitions A, 0 → B; B, 0 → C; C, 1 → A; C, 0 → D.]

so that L(M) = {001}∗. Then prefix M is the EFA

[Diagram: states A, B, C, all accepting; transitions A, 0 → B; B, 0 → C; C, 1 → A.]

which accepts {001}∗{%, 0, 00}.

Theorem 3.12.23 For all M ∈ EFA:

• L(prefix M)= L(M)P ; and

• alphabet(prefix M)= alphabet(L(M)).

Proposition 3.12.24 For all M ∈ NFA, prefix M ∈ NFA.

Now we can define suffix-closure and substring-closure operations on EFAs as follows. The functions suffix, substring ∈ EFA → EFA are defined by:

suffix M = rev(prefix(rev M)), and
substring M = suffix(prefix M).

Theorem 3.12.25 For all M ∈ EFA:

• L(suffix M)= L(M)S; and

• L(substring M)= L(M)SS .

Next, we consider the renaming of regular expressions and finite automata using bijections on symbols. If x is a string and f is a bijection from a set of symbols that is a superset of alphabet x (maybe alphabet x itself, i.e., the symbols appearing in x) to some set of symbols, then the renaming of x using f (xf ∈ Str) is the result of applying f to each symbol of x. For example, if f = {(0, 1), (1, 2), (2, 3)}, then %f = % and (01102)f = 12213. If L is a language, and f is a bijection from a set of symbols that is a superset of alphabet L (maybe alphabet L itself) to some set of symbols, then the renaming of L using f (Lf ∈ Lan) is formed by applying f to every symbol of every string of L. For example, if L = {012, 12} and f = {(0, 1), (1, 2), (2, 3)}, then Lf = {123, 23}. The module Str defines the function

val renameAlphabet : str * sym_rel -> str

that implements alphabet renaming for strings. And the module StrSet defines the function

val renameAlphabet : str set * sym_rel -> str set

that implements alphabet renaming for languages. E.g., we can proceed as follows:

- val f = SymRel.input "";
@ (0, a), (1, b), (2, c), (3, d), (4, e), (6, f)
@ .
val f = - : sym_rel
- val x = Str.fromString "012364";
val x = [-,-,-,-,-,-] : str
- Str.toString(Str.renameAlphabet(x, f));
val it = "abcdfe" : string
- val y = Str.fromString "4271";
val y = [-,-,-,-] : str
- Str.toString(Str.renameAlphabet(y, f));
invalid alphabet renaming for string

uncaught exception Error
- val xs = StrSet.input "";
@ 123, 0234, 16
@ .
val xs = - : str set
- StrSet.output("", StrSet.renameAlphabet(xs, f));
bf, bcd, acde
val it = () : unit
- val ys = StrSet.fromString "123, 0271";
val ys = - : str set
- StrSet.output("", StrSet.renameAlphabet(ys, f));
invalid alphabet renaming for string set

uncaught exception Error

Let X = { (α, f) | α ∈ Reg and f is a bijection from a set of symbols that is a superset of alphabet α to some set of symbols }. Then, the function renameAlphabet ∈ X → Reg takes in a pair (α, f) and returns the regular expression produced from α by renaming each sub-tree of the form a, for a ∈ Sym, to f(a). For example, renameAlphabet(012 + 12, {(0, 1), (1, 2), (2, 3)}) = 123 + 23.

Theorem 3.12.26 For all α ∈ Reg and bijections f from sets of symbols that are supersets of alphabet α to sets of symbols:

• L(renameAlphabet(α, f)) = L(α)f ; and

• alphabet(renameAlphabet(α, f)) = { f a | a ∈ alphabet α }.

For example, if f = {(0, 1), (1, 2), (2, 3)}, then

L(renameAlphabet(012 + 12,f)) = L(012 + 12)f = {012, 12}f = {123, 23}.

Let X = { (M, f) | M ∈ FA and f is a bijection from a set of symbols that is a superset of alphabet M to some set of symbols }. Then, the function renameAlphabet ∈ X → FA takes in a pair (M, f) and returns the FA produced from M by renaming each symbol of each label of each transition using f. For example, if M is the FA

[Diagram: an FA with states A (start), B, C and transitions labeled 0, 11 and 101.]

and f = {(0, 1), (1, 2)}, then renameAlphabet(M, f) is the FA

[Diagram: the same FA with its transition labels renamed to 1, 22 and 212.]

Theorem 3.12.27 For all M ∈ FA and bijections f from sets of symbols that are supersets of alphabet M to sets of symbols:

• L(renameAlphabet(M,f)) = L(M)f ;

• alphabet(renameAlphabet(M,f)) = { f a | a ∈ alphabet M };

• if M is an EFA, then renameAlphabet(M,f) is an EFA;

• if M is an NFA, then renameAlphabet(M,f) is an NFA; and

• if M is a DFA, then renameAlphabet(M, f) is a DFA.

Theorem 3.12.28 Suppose L,L1,L2 ∈ RegLan. Then:

(1) L1 ∪ L2 ∈ RegLan;

(2) L1L2 ∈ RegLan;

(3) L∗ ∈ RegLan;

(4) L1 ∩ L2 ∈ RegLan;

(5) L1 − L2 ∈ RegLan;

(6) LR ∈ RegLan;

(7) LP ∈ RegLan;

(8) LS ∈ RegLan;

(9) LSS ∈ RegLan; and

(10) Lf ∈ RegLan, where f is a bijection from a set of symbols that is a superset of alphabet L to some set of symbols.

Proof. Parts (1)–(10) hold because of the operations union, concat and closure on FAs, the operation inter on EFAs, the operation minus on DFAs, the operation rev on regular expressions, the operations prefix, suffix and substring on EFAs, the operation renameAlphabet on regular expressions, and Theorem 3.12.14. ✷

The Forlan module EFA defines the function/algorithm

val inter : efa * efa -> efa

which corresponds to inter. It is also inherited by the modules DFA and NFA. The Forlan module DFA defines the functions

val complement : dfa * sym set -> dfa
val minus : dfa * dfa -> dfa

which correspond to complement and minus. Suppose the identifiers efa1 and efa2 of type efa are bound to our example EFAs M1 and M2:

[Diagram: the example EFAs M1 and M2 from the beginning of this subsection.]

Then, we can construct inter(M1, M2) as follows:

- val efa = EFA.inter(efa1, efa2);
val efa = - : efa
- EFA.output("", efa);
{states} <A,A>, <A,B>, <B,A>, <B,B> {start state} <A,A>
{accepting states} <B,B>
{transitions}
<A,A>, % -> <A,B> | <B,A>; <A,B>, % -> <B,B>; <A,B>, 0 -> <A,B>;
<B,A>, % -> <B,B>; <B,A>, 1 -> <B,A>
val it = () : unit

Thus efa is bound to the EFA

[Diagram: the EFA inter(M1, M2) drawn earlier: start state ⟨A, A⟩, %-transitions to ⟨B, A⟩ (self-loop 1) and ⟨A, B⟩ (self-loop 0), and %-transitions from both to the accepting state ⟨B, B⟩.]

Suppose dfa is bound to our example DFA M

[Diagram: the DFA M from the complement example above (no occurrence of 000).]

Then we can construct the DFA complement(M, {2}) as follows:

- val dfa' = DFA.complement(dfa, SymSet.input "");
@ 2
@ .
val dfa' = - : dfa
- DFA.output("", dfa');
{states} A, B, C, <dead> {start state} A
{accepting states} <dead>
{transitions}
A, 0 -> B; A, 1 -> A; A, 2 -> <dead>; B, 0 -> C; B, 1 -> A;
B, 2 -> <dead>; C, 0 -> <dead>; C, 1 -> A; C, 2 -> <dead>;
<dead>, 0 -> <dead>; <dead>, 1 -> <dead>; <dead>, 2 -> <dead>
val it = () : unit

Thus dfa' is bound to the DFA

[Diagram: the DFA complement(M, {2}) drawn earlier, with ⟨dead⟩ as its only accepting state.]

Suppose, as before, that the identifiers efa1 and efa2 of type efa are bound to our example EFAs M1 and M2.

We can construct an EFA that accepts L(M1) − L(M2) as follows:

- val dfa1 = nfaToDFA(efaToNFA efa1);
val dfa1 = - : dfa
- val dfa2 = nfaToDFA(efaToNFA efa2);
val dfa2 = - : dfa
- val dfa = DFA.minus(dfa1, dfa2);
val dfa = - : dfa
- val efa = injDFAToEFA dfa;
val efa = - : efa
- EFA.accepted efa (Str.input "");
@ 01
@ .
val it = true : bool
- EFA.accepted efa (Str.input "");
@ 0
@ .
val it = false : bool

Next, we see how we can carry out the reversal and alphabet-renaming of regular expressions in Forlan. The Forlan module Reg defines the functions

val rev : reg -> reg
val renameAlphabet : reg * sym_rel -> reg

which correspond to rev and renameAlphabet (renameAlphabet issues an error message and raises an exception if its second argument isn't legal). Here is an example of how these functions can be used:

- val reg = Reg.fromString "(012)*(21)";
val reg = - : reg
- val rel = SymRel.fromString "(0, 1), (1, 2), (2, 3)";
val rel = - : sym_rel
- Reg.output("", Reg.rev reg);
(12)((21)0)*
val it = () : unit
- Reg.output("", Reg.renameAlphabet(reg, rel));
(123)*32
val it = () : unit

Next, we see how we can carry out the reversal of FAs and EFAs in Forlan. The Forlan module FA defines the function

val rev : fa -> fa

which corresponds to rev. It is also inherited by the module EFA. Here is an example of how this function can be used:

- val fa = FA.input "";
@ {states}
@ A,B,C
@ {start state}
@ A
@ {accepting states}
@ A,C
@ {transitions}
@ A, 01 -> B; B, 23 -> C; C, 45 -> A
@ .
val fa = - : fa
- val fa' = FA.rev fa;
val fa' = - : fa
- FA.output("", fa');
{states} A, <A>, <B>, <C> {start state} A
{accepting states} <A>
{transitions} A, % -> <A> | <C>; <A>, 54 -> <C>;
<B>, 10 -> <A>; <C>, 32 -> <B>
val it = () : unit

Next, we see how we can carry out the prefix-closure of EFAs and NFAs in Forlan. The Forlan module EFA defines the function

val prefix : efa -> efa

which corresponds to prefix. It is also inherited by the module NFA. Here is an example of how one of these functions can be used:

- val nfa = NFA.input "";
@ {states}
@ A,B,C,D
@ {start state}
@ A
@ {accepting states}
@ A
@ {transitions}
@ A, 0 -> B; B, 0 -> C; C, 1 -> A; C, 0 -> D
@ .
val nfa = - : nfa
- val nfa' = NFA.prefix nfa;
val nfa' = - : nfa
- NFA.output("", nfa');
{states} A, B, C {start state} A {accepting states} A, B, C
{transitions} A, 0 -> B; B, 0 -> C; C, 1 -> A
val it = () : unit

Finally, we see how we can carry out alphabet-renaming of finite automata using Forlan. The Forlan module FA defines the function

val renameAlphabet : fa * sym_rel -> fa

which corresponds to renameAlphabet (it issues an error message and raises an exception if its second argument isn't legal). This function is also inherited by the modules DFA, NFA and EFA. Here is an example of how one of these functions can be used:

- val dfa = DFA.input "";
@ {states}
@ A,B
@ {start state}
@ A
@ {accepting states}
@ A
@ {transitions}
@ A, 0 -> B; B, 0 -> A;
@ A, 1 -> A; B, 1 -> B
@ .
val dfa = - : dfa
- val rel = SymRel.fromString "(0, a), (1, b)";
val rel = - : sym_rel
- val dfa' = DFA.renameAlphabet(dfa, rel);
val dfa' = - : dfa
- DFA.output("", dfa');
{states} A, B {start state} A {accepting states} A
{transitions} A, a -> B; A, b -> A; B, a -> A; B, b -> B
val it = () : unit

3.12.5 Notes

The material in this section is mostly standard, but is worked out in more detail than is common. We have given an intersection algorithm on EFAs, not just on DFAs. By giving a complementation algorithm on DFAs that allows us to partially control the alphabet of the resulting DFA, we are able to give an algorithm for computing the difference of DFAs that works even when the DFAs have different alphabets.

3.13 Equivalence-testing and Minimization of DFAs

In this section, we give algorithms for testing whether two DFAs are equivalent, and for minimizing the alphabet size and number of states of a DFA. We also see how these functions can be used in Forlan.

3.13.1 Testing the Equivalence of DFAs

Suppose M and N are DFAs. Our algorithm for checking whether they are equivalent proceeds as follows. First, it converts M and N into DFAs with identical alphabets. Let Σ = alphabet M ∪ alphabet N, and define the DFAs M′ and N′ by:

M′ = determSimplify(M, Σ), and
N′ = determSimplify(N, Σ).

Since alphabet(L(M)) ⊆ alphabet M ⊆ Σ, we have that alphabet M′ = alphabet(L(M)) ∪ Σ = Σ. Similarly, alphabet N′ = Σ. Furthermore, M′ ≈ M and N′ ≈ N, so that it will suffice to determine whether M′ and N′ are equivalent. For example, if M and N are the DFAs

[Diagram: two DFAs over {0, 1}: M with states A (start, accepting) and B, and N with states A (start, accepting), B and C (accepting).]

then Σ = {0, 1}, M′ = M and N′ = N. Next, the algorithm generates the least subset X of QM′ × QN′ such that

• (sM ′ ,sN ′ ) ∈ X; and

• for all q ∈ QM′, r ∈ QN′ and a ∈ Σ, if (q, r) ∈ X, then (δM′(q, a), δN′(r, a)) ∈ X.

With our example DFAs M′ and N′, we have that

• (A, A) ∈ X;

• since (A, A) ∈ X, we have that (B, B) ∈ X and (A, C) ∈ X;

• since (B, B) ∈ X, we have that (again) (A, C) ∈ X and (again) (B, B) ∈ X; and

• since (A, C) ∈ X, we have that (again) (B, B) ∈ X and (again) (A, A) ∈ X.

Back in the general case, we have the following lemmas.

Lemma 3.13.1 For all w ∈ Σ∗, (δM′(sM′, w), δN′(sN′, w)) ∈ X.

Proof. By left string induction on w. ✷

Lemma 3.13.2 For all q ∈ QM′ and r ∈ QN′, if (q, r) ∈ X, then there is a w ∈ Σ∗ such that q = δM′(sM′, w) and r = δN′(sN′, w).

Proof. By induction on X. ✷

Finally, the algorithm checks that, for all (q, r) ∈ X,

q ∈ AM′ iff r ∈ AN′.

If this is true, it says that the machines are equivalent; otherwise it says they are not equivalent. We can easily prove the correctness of our algorithm:

• Suppose every pair (q, r) ∈ X consists of two accepting states or of two non-accepting states. Suppose, toward a contradiction, that L(M′) ≠ L(N′). Then there is a string w that is accepted by one of the machines but is not accepted by the other. Since both machines have alphabet Σ, we have that w ∈ Σ∗. Thus Lemma 3.13.1 tells us that (δM′(sM′, w), δN′(sN′, w)) ∈ X. But one side of this pair is an accepting state and the other is a non-accepting one—contradiction. Thus L(M′) = L(N′).

• Suppose we find a pair (q, r) ∈ X such that one of q and r is an accepting state but the other is not. By Lemma 3.13.2, it will follow that there is a w ∈ Σ∗ such that q = δM′(sM′, w) and r = δN′(sN′, w). Thus w is accepted by one machine but not the other, so that L(M′) ≠ L(N′).
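The generation of X is another instance of the worklist pattern sketched in Section 3.12.4, and the whole test fits in a few lines of Standard ML. The following self-contained sketch is not Forlan code: d1, d2, acc1, acc2 and sigma are hypothetical total transition functions, acceptance predicates and a shared alphabet:

(* Explore all state pairs reachable from the pair of start states,
   then check that every reachable pair agrees on acceptance. *)
fun equivalentDFAs (d1 : ''a * char -> ''a, acc1 : ''a -> bool,
                    d2 : ''b * char -> ''b, acc2 : ''b -> bool,
                    sigma : char list, s1 : ''a, s2 : ''b) : bool =
  let
    fun go ([], seen) = seen
      | go (p :: work, seen) =
          if List.exists (fn q => q = p) seen
          then go (work, seen)
          else
            let val (q, r) = p
                val nexts = map (fn a => (d1 (q, a), d2 (r, a))) sigma
            in go (nexts @ work, p :: seen) end
    val pairs = go ([(s1, s2)], [])
  in
    List.all (fn (q, r) => acc1 q = acc2 r) pairs
  end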

In the case of our example, we have that X = {(A, A), (B, B), (A, C)}. Since (A, A) and (A, C) are pairs of accepting states, and (B, B) is a pair of non-accepting states, it follows that L(M′) = L(N′). Hence L(M) = L(N). By annotating each element (q, r) ∈ X with a string w such that q = δM′(sM′, w) and r = δN′(sN′, w), instead of just reporting that M′ and N′ are not equivalent, we can explain why they are not equivalent,

• giving a string that is accepted by the first machine but not by the second; and/or
• giving a string that is accepted by the second machine but not by the first.

We can even arrange for these strings to be of minimum length. The Forlan implementation of our algorithm always produces minimum-length counterexamples. The Forlan module DFA defines the functions:

val relationship : dfa * dfa -> unit
val subset : dfa * dfa -> bool
val equivalent : dfa * dfa -> bool

The function relationship figures out the relationship between the languages accepted by two DFAs (are they equal, is one a proper subset of the other, is neither a subset of the other), and supplies minimum-length counterexamples to justify negative answers. The function subset tests whether its first argument's language is a subset of its second argument's language. The function equivalent tests whether two DFAs are equivalent. Note that subset (when turned into a function of type reg * reg -> bool—see below) can be used in conjunction with the local and global simplification algorithms of Section 3.3. For example, suppose dfa1 and dfa2 of type dfa are bound to our example DFAs M and N, respectively:

[Diagram: the DFAs M and N shown at the start of this subsection.]

We can verify that these machines are equivalent as follows:

- DFA.relationship(dfa1, dfa2);
languages are equal
val it = () : unit

On the other hand, suppose that dfa3 and dfa4 of type dfa are bound to the DFAs:

[Diagram: two DFAs over {0, 1}, variants of M and N above that are not equivalent.]

We can find out why these machines are not equivalent as follows:

- DFA.relationship(dfa3, dfa4);
neither language is a subset of the other language:
"11" is in first language but is not in second language;
"110" is in second language but is not in first language
val it = () : unit

We can find the relationship between the languages generated by regular expressions reg1 and reg2 by:

• converting reg1 and reg2 to DFAs dfa1 and dfa2, and then

• running DFA.relationship(dfa1, dfa2) to find the relationship between those DFAs.

Of course, we can define an ML/Forlan function that carries out these actions:

- fun regToDFA reg =
=   nfaToDFA(efaToNFA(faToEFA(regToFA reg)));
val regToDFA = fn : reg -> dfa
- fun relationshipReg(reg1, reg2) =
=   DFA.relationship(regToDFA reg1, regToDFA reg2);
val relationshipReg = fn : reg * reg -> unit
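For instance, assuming these definitions have been entered, we can compare two small regular expressions directly (the expressions here are illustrative; since $ generates nothing, the two denote the same language):

- relationshipReg(Reg.fromString "0*1", Reg.fromString "0*1 + $");
languages are equal
val it = () : unit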

3.13.2 Minimization of DFAs

Now, we consider an algorithm for minimizing the sizes of the alphabet and set of states of a DFA M. First, the algorithm minimizes the size of M's alphabet, and makes the automaton be deterministically simplified, by letting M′ = determSimplify(M, ∅). Thus M′ ≈ M, alphabet M′ = alphabet(L(M)) and |QM′| ≤ |QM|. For example, if M is the DFA

[Diagram: a six-state DFA with states A (start), B, C, D, E, F and accepting states E and F; transitions A, 0 → B; A, 1 → C; B, 0 → D; B, 1 → E; C, 0 → D; C, 1 → D; D, 0 → B; D, 1 → E; E, 0 → F; E, 1 → F; F, 0 → F; F, 1 → E.]

then M′ = M. Next, the algorithm generates the least subset X of QM′ × QM′ such that:

(1) AM′ × (QM′ − AM′) ⊆ X;

(2) (QM′ − AM′) × AM′ ⊆ X; and

(3) for all q, q′, r, r′ ∈ QM′ and a ∈ alphabet M′, if (q, r) ∈ X, (q′, a, q) ∈ TM′ and (r′, a, r) ∈ TM′, then (q′, r′) ∈ X.

We read "(q, r) ∈ X" as "q and r cannot be merged". The idea of (1) and (2) is that an accepting state can never be merged with a non-accepting state. And (3) says that if q and r can't be merged, and we can get from q′ to q by processing an a, and from r′ to r by processing an a, then q′ and r′ also can't be merged—since if we merged q′ and r′, there would have to be an a-transition from the merged state to the merging of q and r. In the case of our example M′, (1) tells us to add the pairs (E, A), (E, B), (E, C), (E, D), (F, A), (F, B), (F, C) and (F, D) to X. And, (2) tells us to add the pairs (A, E), (B, E), (C, E), (D, E), (A, F), (B, F), (C, F) and (D, F) to X. Now we use rule (3) to compute the rest of X's elements. To begin with, we must handle each pair that has already been added to X.

• Since there are no transitions leading into A, no pairs can be added using (E, A), (A, E), (F, A) and (A, F).

• Since there are no 0-transitions leading into E, and there are no 1- transitions leading into B, no pairs can be added using (E, B) and (B, E).

• Since (E, C), (C, E) ∈ X and (B, 1, E), (D, 1, E), (F, 1, E) and (A, 1, C) are the 1-transitions leading into E and C, we add (B, A) and (A, B), and (D, A) and (A, D) to X; we would also have added (F, A) and (A, F) to X if they hadn’t been previously added. Since there are no 0-transitions into E, nothing can be added to X using (E, C) and (C, E) and 0-transitions.

• Since (E, D), (D, E) ∈ X and (B, 1, E), (D, 1, E), (F, 1, E) and (C, 1, D) are the 1-transitions leading into E and D, we add (B, C) and (C, B), and (D, C) and (C, D) to X; we would also have added (F, C) and (C, F) to X if they hadn’t been previously added. Since there are no 0-transitions into E, nothing can be added to X using (E, D) and (D, E) and 0-transitions.

• Since (F, B), (B, F) ∈ X and (E, 0, F), (F, 0, F), (A, 0, B), and (D, 0, B) are the 0-transitions leading into F and B, we would have to add the following pairs to X, if they were not already present: (E, A), (A, E), (E, D), (D, E), (F, A), (A, F), (F, D), (D, F). Since there are no 1-transitions leading into B, no pairs can be added using (F, B) and (B, F) and 1-transitions.

• Since (F, C), (C, F) ∈ X and (E, 1, F) and (A, 1, C) are the 1-transitions leading into F and C, we would have to add (E, A) and (A, E) to X if these pairs weren’t already present. Since there are no 0-transitions leading into C, no pairs can be added using (F, C) and (C, F) and 0-transitions.

• Since (F, D), (D, F) ∈ X and (E, 0, F), (F, 0, F), (B, 0, D) and (C, 0, D) are the 0-transitions leading into F and D, we would add (E, B), (B, E), (E, C),

(C, E), (F, B), (B, F), (F, C), and (C, F) to X, if these pairs weren’t already present. Since (F, D), (D, F) ∈ X and (E, 1, F) and (C, 1, D) are the 1- transitions leading into F and D, we would add (E, C) and (C, E) to X, if these pairs weren’t already in X.

We’ve now handled all of the elements of X that were added using rules (1) and (2). We must now handle the pairs that were subsequently added: (A, B), (B, A), (A, D), (D, A), (B, C), (C, B), (C, D), (D, C).

• Since there are no transitions leading into A, no pairs can be added using (A, B), (B, A), (A, D) and (D, A).

• Since there are no 1-transitions leading into B, and there are no 0- transitions leading into C, no pairs can be added using (B, C) and (C, B).

• Since (C, D), (D, C) ∈ X and (A, 1, C) and (C, 1, D) are the 1-transitions leading into C and D, we add the pairs (A, C) and (C, A) to X. Since there are no 0-transitions leading into C, no pairs can be added to X using (C, D) and (D, C) and 0-transitions.

Now, we must handle the pairs that were added in the last phase: (A, C) and (C, A).

• Since there are no transitions leading into A, no pairs can be added using (A, C) and (C, A).

Since we have handled all the pairs we added to X, we are now done. Here are the 26 elements of X: (A, B), (A, C), (A, D), (A, E), (A, F), (B, A), (B, C), (B, E), (B, F), (C, A), (C, B), (C, D), (C, E), (C, F), (D, A), (D, C), (D, E), (D, F), (E, A), (E, B), (E, C), (E, D), (F, A), (F, B), (F, C), (F, D). Back in the general case, we have the following lemmas.

Lemma 3.13.3 For all (q, r) ∈ X, there is a w ∈ (alphabet M′)∗ such that exactly one of δM′(q, w) and δM′(r, w) is in AM′.

Proof. By induction on X. ✷

Lemma 3.13.4 For all w ∈ (alphabet M′)∗ and q, r ∈ QM′, if exactly one of δM′(q, w) and δM′(r, w) is in AM′, then (q, r) ∈ X.

Proof. By right string induction. ✷
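The computation of X is once again a least-fixed-point computation. The following self-contained Standard ML sketch illustrates it; it is not Forlan code, and states, isAcc and trans are hypothetical descriptions of M′, with trans listing transitions as triples (q′, a, q):

(* "Cannot be merged" pairs: start from accepting/non-accepting pairs
   (rules (1) and (2)), then propagate backwards along equally labelled
   transitions (rule (3)) until no new pairs appear. *)
fun distinguishable (states : ''a list, isAcc : ''a -> bool,
                     trans : (''a * char * ''a) list) : (''a * ''a) list =
  let
    fun mem (x, xs) = List.exists (fn y => y = x) xs
    val base =
      List.concat
        (map (fn q =>
                map (fn r => (q, r))
                    (List.filter (fn r => isAcc q <> isAcc r) states))
             states)
    (* pairs (q', r') with q' -a-> q and r' -a-> r, for a known pair (q, r) *)
    fun back (q, r) =
      List.concat
        (map (fn (q', a, q0) =>
                if q0 = q
                then List.mapPartial
                       (fn (r', b, r0) =>
                          if b = a andalso r0 = r then SOME (q', r') else NONE)
                       trans
                else [])
             trans)
    fun grow x =
      let val new = List.filter (fn p => not (mem (p, x)))
                                (List.concat (map back x))
      in if null new then x else grow (new @ x) end
  in
    grow base
  end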

Next, the algorithm lets the relation Y = (QM′ × QM′) − X. We read "(q, r) ∈ Y" as "q and r can be merged". Back with our example, we have that Y is

{(A, A), (B, B), (C, C), (D, D), (E, E), (F, F)} ∪ {(B, D), (D, B), (F, E), (E, F)}.

Lemma 3.13.5
(1) For all q, r ∈ QM′, (q, r) ∈ Y iff, for all w ∈ (alphabet M′)∗, δM′(q, w) ∈ AM′ iff δM′(r, w) ∈ AM′.

(2) For all q, r ∈ QM′, if (q, r) ∈ Y, then q ∈ AM′ iff r ∈ AM′.

(3) For all q, r ∈ QM′ and a ∈ alphabet M′, if (q, r) ∈ Y, then (δM′(q, a), δM′(r, a)) ∈ Y.

Proof. (1) Follows using Lemmas 3.13.3 and 3.13.4.

(2) Follows by Part (1), when w = %.

(3) Follows by Part (1). ✷

The following lemma says that Y is an equivalence relation on QM′.

Lemma 3.13.6 Y is reflexive on QM′, symmetric and transitive.

Proof. Follows from Lemma 3.13.5(1). ✷

In order to define the DFA N that is the result of our minimization algorithm, we need a bit more notation. As in Section 3.11, we write P̄ for the result of coding a finite set of symbols P as a symbol; e.g., the coding of {B, A} is ⟨A, B⟩. If q ∈ QM′, we write [q] for { p ∈ QM′ | (p, q) ∈ Y }, which is called the equivalence class of q. Using Lemma 3.13.6, it is easy to show that q ∈ [q], for all q ∈ QM′, and that [q] = [r] iff (q, r) ∈ Y, for all q, r ∈ QM′. If P is a nonempty, finite set of symbols, then we write min P for the least element of P, according to our standard ordering on symbols. The algorithm lets Z = { [q] | q ∈ QM′ }, which is finite since QM′ is finite. In the case of our example, Z is

{{A}, {B, D}, {C}, {E, F}}.

Finally, the algorithm defines the DFA N as follows:

• QN = { P̄ | P ∈ Z };

• sN = [sM′];

• AN = { P̄ | P ∈ Z and min P ∈ AM′ }; and

• TN = { (P̄, a, [δM′(min P, a)]) | P ∈ Z and a ∈ alphabet M′ }.

Then N is a DFA with alphabet alphabet M′ = alphabet(L(M)), |QN| ≤ |QM′| ≤ |QM|, and, for all P ∈ Z and a ∈ alphabet M′, δN(P̄, a) = [δM′(min P, a)]. In the case of our example, we have that

• QN = {⟨A⟩, ⟨B, D⟩, ⟨C⟩, ⟨E, F⟩};

• sN = ⟨A⟩; and

• AN = {⟨E, F⟩}.

We compute the elements of TN as follows.

• Since {A} ∈ Z and [δM′(A, 0)] = [B] = {B, D}, we have that (⟨A⟩, 0, ⟨B, D⟩) ∈ TN. Since {A} ∈ Z and [δM′(A, 1)] = [C] = {C}, we have that (⟨A⟩, 1, ⟨C⟩) ∈ TN.

• Since {C} ∈ Z and [δM′(C, 0)] = [D] = {B, D}, we have that (⟨C⟩, 0, ⟨B, D⟩) ∈ TN. Since {C} ∈ Z and [δM′(C, 1)] = [D] = {B, D}, we have that (⟨C⟩, 1, ⟨B, D⟩) ∈ TN.

• Since {B, D} ∈ Z and [δM′(B, 0)] = [D] = {B, D}, we have that (⟨B, D⟩, 0, ⟨B, D⟩) ∈ TN. Since {B, D} ∈ Z and [δM′(B, 1)] = [E] = {E, F}, we have that (⟨B, D⟩, 1, ⟨E, F⟩) ∈ TN.

• Since {E, F} ∈ Z and [δM′(E, 0)] = [F] = {E, F}, we have that (⟨E, F⟩, 0, ⟨E, F⟩) ∈ TN. Since {E, F} ∈ Z and [δM′(E, 1)] = [F] = {E, F}, we have that (⟨E, F⟩, 1, ⟨E, F⟩) ∈ TN.

Thus our DFA N is:

[Diagram: states ⟨A⟩ (start), ⟨B, D⟩, ⟨C⟩ and ⟨E, F⟩ (accepting); transitions ⟨A⟩, 0 → ⟨B, D⟩; ⟨A⟩, 1 → ⟨C⟩; ⟨C⟩, 0 → ⟨B, D⟩; ⟨C⟩, 1 → ⟨B, D⟩; ⟨B, D⟩, 0 → ⟨B, D⟩; ⟨B, D⟩, 1 → ⟨E, F⟩; ⟨E, F⟩, 0 → ⟨E, F⟩; ⟨E, F⟩, 1 → ⟨E, F⟩.]

Back in the general case, we have the following lemmas.

Lemma 3.13.7
(1) For all q ∈ QM′, [q] ∈ AN iff q ∈ AM′.

(2) For all q ∈ QM′ and a ∈ alphabet M′, δN([q], a) = [δM′(q, a)].

(3) For all q ∈ QM′ and w ∈ (alphabet M′)∗, δN([q], w) = [δM′(q, w)].

(4) For all w ∈ (alphabet M′)∗, δN(sN, w) = [δM′(sM′, w)].

Proof. (1) and (2) follow easily by Lemma 3.13.5(2)–(3). Part (3) follows from Part (2) by left string induction. For Part (4), suppose w ∈ (alphabet M′)∗. By Part (3), we have

δN(sN, w) = δN([sM′], w) = [δM′(sM′, w)]. ✷

Lemma 3.13.8 L(N) = L(M′).

Proof. Suppose w ∈ L(N). Then w ∈ (alphabet N)∗ = (alphabet M′)∗ and δN(sN, w) ∈ AN. By Lemma 3.13.7(4), we have that

[δM′(sM′, w)] = δN(sN, w) ∈ AN,

so that δM′(sM′, w) ∈ AM′, by Lemma 3.13.7(1). Thus w ∈ L(M′). Suppose w ∈ L(M′). Then w ∈ (alphabet M′)∗ = (alphabet N)∗ and δM′(sM′, w) ∈ AM′. By Lemma 3.13.7(1) and (4), we have that

δN(sN, w) = [δM′(sM′, w)] ∈ AN.

Hence w ∈ L(N). ✷

Lemma 3.13.9 N is deterministically simplified.

Proof. To see that all elements of QN are reachable, suppose q ∈ QM′. Because M′ is deterministically simplified, there is a w ∈ (alphabet M′)∗ such that q = δM′(sM′, w). Thus δN(sN, w) = [δM′(sM′, w)] = [q]. Next, we show that, for all q ∈ QM′, if q is live, then [q] is live. Suppose q ∈ QM′ is live, so there is a w ∈ (alphabet M′)∗ such that δM′(q, w) ∈ AM′. Thus δN([q], w) = [δM′(q, w)] ∈ AN, showing that [q] is live. Thus, we have that, for all q ∈ QM′, if [q] is dead, then q is dead. But M′ has at most one dead state, and thus we have that N has at most one dead state. ✷

Lemma 3.13.10 Suppose N′ is a DFA such that N′ ≈ M′, alphabet N′ = alphabet M′ and |QN′| ≤ |QN|. Then N′ is isomorphic to N.

Proof. We have that L(N′) = L(M′) = L(N). And the states of M′ and N are all reachable. Let the relation h between QN′ and QN be

{ (δN′(sN′, w), δN(sN, w)) | w ∈ (alphabet M′)∗ }.

Since every state of N is reachable, it follows that range h = QN. To see that h is a function, suppose x, y ∈ (alphabet M′)∗ and δN′(sN′, x) = δN′(sN′, y). We must show that δN(sN, x) = δN(sN, y). Since δN(sN, x) = [δM′(sM′, x)] and δN(sN, y) = [δM′(sM′, y)], it will suffice to show that (δM′(sM′, x), δM′(sM′, y)) ∈ Y. By Lemma 3.13.5(1), it will suffice to show that δM′(δM′(sM′, x), z) ∈ AM′ iff δM′(δM′(sM′, y), z) ∈ AM′, for all z ∈ (alphabet M′)∗. Suppose z ∈ (alphabet M′)∗. We will show the "only if" direction, the other direction being similar. Suppose δM′(δM′(sM′, x), z) ∈ AM′. We must show that δM′(δM′(sM′, y), z) ∈ AM′. Because δM′(sM′, xz) = δM′(δM′(sM′, x), z) ∈ AM′, we have that xz ∈ L(M′) = L(N′). Since xz ∈ L(N′) and δN′(sN′, x) = δN′(sN′, y), we have that

δN′(sN′, yz) = δN′(δN′(sN′, y), z) = δN′(δN′(sN′, x), z) = δN′(sN′, xz) ∈ AN′,

so that yz ∈ L(N′) = L(M′). Hence δM′(δM′(sM′, y), z) = δM′(sM′, yz) ∈ AM′. Because h is a function and range h = QN, we have that |QN| ≤ |domain h| ≤ |QN′|. But |QN′| ≤ |QN|, and thus |QN′| = |QN|. Because QN′ and QN are finite, it follows that domain h = QN′ and h is injective, so that h is a bijection from QN′ to QN. Thus, every state of N′ is reachable, and, for all w ∈ (alphabet M′)∗ = (alphabet N)∗ = (alphabet N′)∗, h(δN′(sN′, w)) = δN(sN, w). The remainder of the proof that h is an isomorphism from N′ to N is fairly straightforward. ✷

The following exercise formalizes the last sentence of the proof of Lemma 3.13.10.

Exercise 3.13.11 Suppose N′ and N are equivalent DFAs with alphabet Σ, all of whose states are reachable (for all q ∈ QN′, there is a w ∈ Σ∗ such that δN′(sN′, w) = q; and for all q ∈ QN, there is a w ∈ Σ∗ such that δN(sN, w) = q). Suppose h is a bijection from QN′ to QN such that, for all w ∈ Σ∗, h(δN′(sN′, w)) = δN(sN, w). Prove that h is an isomorphism from N′ to N.

We define a function minimize ∈ DFA → DFA by: minimize M is the result of running the above algorithm on input M. Putting the above results together, we have the following theorem:

Theorem 3.13.12 For all M ∈ DFA:

• minimize M ≈ M;

• alphabet(minimize M)= alphabet(L(M));

• |Qminimize M | ≤ |QM |;

• minimize M is deterministically simplified; and

• for all N ∈ DFA, if N ≈ M, alphabet N = alphabet(L(M)) and |QN| ≤ |Qminimize M|, then N is isomorphic to minimize M.

Thus the DFA N pictured above is, up to isomorphism, the only DFA with four or fewer states and alphabet {0, 1} that is equivalent to M.

Exercise 3.13.13 Use Theorem 3.13.12 to show that, for all DFAs M and N, if M ≈ N, then minimize M is isomorphic to minimize N. The Forlan module DFA includes the function

val minimize : dfa -> dfa

for minimizing DFAs. For example, if dfa of type dfa is bound to our example DFA

[transition diagram of the example DFA M, with states A–F]

then we can minimize the alphabet size and number of states of dfa as follows.

- val dfa' = DFA.minimize dfa;
val dfa' = - : dfa
- DFA.output("", dfa');
{states} <A>, <B,D>, <C>, <E,F>
{start state} <A>
{accepting states} <E,F>
{transitions}
<A>, 0 -> <B,D>; <A>, 1 -> <C>; <B,D>, 0 -> <B,D>; <B,D>, 1 -> <E,F>;
<C>, 0 -> <B,D>; <C>, 1 -> <B,D>; <E,F>, 0 -> <E,F>; <E,F>, 1 -> <E,F>
val it = () : unit

Finally, let's revisit an example from Section 3.11. Suppose nfa is the 4-state NFA

[transition diagram of the 4-state NFA from Section 3.11, with states A–D]

As we saw, our NFA-to-DFA conversion algorithm converts nfa to a DFA dfa with 16 states:

- val dfa = nfaToDFA nfa;
val dfa = - : dfa
- DFA.numStates dfa;
val it = 16 : int

We can now use Forlan to verify that there is no DFA with fewer than 16 states that accepts the same language as nfa:

- val dfa' = DFA.minimize dfa;
val dfa' = - : dfa
- DFA.isomorphic(dfa', dfa);
val it = true : bool
- DFA.numStates dfa';
val it = 16 : int

Thus we have an example where the smallest DFA accepting a language requires exponentially more states than the smallest NFA accepting that language. (This is true even though we haven't proven that an NFA must have at least 4 states to accept the same language as nfa.)

3.13.3 Notes

Our algorithm for testing whether two DFAs are equivalent can be found in the literature, but I don't know of other textbooks that present it. As described above, a simple extension of the algorithm provides counterexamples to justify non-equivalence. The material on DFA minimization is completely standard.

3.14 The Pumping Lemma for Regular Languages

In this section we consider techniques for showing that particular languages are not regular. Consider the language

L = { 0^n 1^n | n ∈ N } = {%, 01, 0011, 000111, . . .}.

Intuitively, an automaton would have to have infinitely many states to accept L. A finite automaton won’t be able to keep track of how many 0’s it has seen so far, and thus won’t be able to insist that the correct number of 1’s follow. We could turn the preceding ideas into a direct proof that L is not regular. Instead, we will first state a general result, called the Pumping Lemma for regular languages, for proving that languages are non-regular. Next, we will show how the Pumping Lemma can be used to prove that L is non-regular. Finally, we will prove the Pumping Lemma.

Lemma 3.14.1 (Pumping Lemma for Regular Languages) For all regular languages L, there is an n ∈ N − {0} such that, for all z ∈ Str, if z ∈ L and |z| ≥ n, then there are u, v, w ∈ Str such that z = uvw and

(1) |uv| ≤ n;

(2) v ≠ %; and

(3) uv^i w ∈ L, for all i ∈ N.

When we use the Pumping Lemma, we can imagine that we are interacting with it. We can give the Pumping Lemma a regular language L, and the lemma will give us back a non-zero natural number n such that the property of the lemma holds. We have no control over the value of n; all we know is that n ≥ 1 and the property of the lemma holds. We can then give the lemma a string z that is in L and has at least n symbols. We'll have to choose z as a function of n, because we don't know what n is. The lemma will then break z up into parts u, v and w in such a way that (1)–(3) hold. We have no control over how z is broken up into these parts; all we know is that z = uvw and (1)–(3) hold. (1) says that uv has no more than n symbols. (2) says that v is nonempty. And (3) says that, if we "pump" (duplicate) v as many times as we like, the resulting string will still be in L. Before proving the Pumping Lemma, let's see how it can be used to prove that L = { 0^n 1^n | n ∈ N } is non-regular.
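Schematically, the alternation of quantifiers in the lemma can be displayed as follows (this is just a restatement of Lemma 3.14.1 in LaTeX notation, not a new result):

\forall L \in \mathsf{RegLan}.\;
\exists n \in \mathbb{N} - \{0\}.\;
\forall z \in L \text{ with } |z| \geq n.\;
\exists u, v, w \in \mathsf{Str}.\;
  z = uvw \,\wedge\, |uv| \leq n \,\wedge\, v \neq \% \,\wedge\,
  (\forall i \in \mathbb{N}.\; uv^i w \in L).

The existentials are the lemma's choices, over which we have no control; the universals are ours.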

Proposition 3.14.2 L is not regular.

Proof. Suppose, toward a contradiction, that L is regular. Thus there is an n ∈ N − {0} with the property of the Pumping Lemma. Suppose z = 0^n 1^n.

Since z ∈ L and |z| = 2n ≥ n, it follows that there are u, v, w ∈ Str such that z = uvw and properties (1)–(3) of the lemma hold. Since 0^n 1^n = z = uvw, (1) tells us that there are i, j, k ∈ N such that

u = 0^i, v = 0^j, w = 0^k 1^n, i + j + k = n.

By (2), we have that j ≥ 1, and thus that i + k = n − j < n. By (3), we have that

0^{i+k} 1^n = 0^i 0^k 1^n = uw = u%w = uv^0 w ∈ L.

Thus i + k = n—contradiction. Thus L is not regular. ✷

In the preceding proof, we obtained a contradiction by pumping zero times (uv^0 w), but pumping two or more times (uv^2 w, . . .) would also have worked. For a case when pumping zero times is insufficient, consider A = { 0^n 1^m | n < m }: given z = 0^n 1^{n+1}, v will consist of one or more 0's, so that pumping down (uv^0 w) yields a string that still has fewer 0's than 1's, and so is still in A, whereas pumping two or more times yields a string with at least as many 0's as 1's, which is not in A. We now prove the Pumping Lemma.

Proof. Suppose L is a regular language. Thus there is an NFA M such that L(M) = L. Let n = |QM|. Since QM ≠ ∅, we have n ≥ 1, so that n ∈ N − {0}. Suppose z ∈ Str, z ∈ L and |z| ≥ n. Let m = |z|. Thus 1 ≤ n ≤ |z| = m. Since z ∈ L = L(M), there is a valid labeled path for M

q1 ⇒a1 q2 ⇒a2 · · · qm ⇒am qm+1,

that is labeled by z and where q1 = sM, qm+1 ∈ AM and ai ∈ Sym for all 1 ≤ i ≤ m. Since |QM| = n, not all of the states q1, . . . , qn+1 are distinct. Thus, there are 1 ≤ i < j ≤ n + 1 such that qi = qj. Hence, our path looks like:

q1 ⇒a1 · · · qi−1 ⇒ai−1 qi ⇒ai · · · qj−1 ⇒aj−1 qj ⇒aj · · · qm ⇒am qm+1.

Let

u = a1 · · · ai−1, v = ai · · · aj−1, w = aj · · · am.

Then z = uvw. Since |uv| = j − 1 and j ≤ n + 1, we have that |uv| ≤ n. Since i < j, we have that i ≤ j − 1, and thus that v ≠ %. Finally, since

qi ∈ ∆({q1}, u), qj ∈ ∆({qi}, v), qm+1 ∈ ∆({qj}, w)

and qi = qj, we have that

qj ∈ ∆({q1}, u), qj ∈ ∆({qj}, v), qm+1 ∈ ∆({qj}, w).

Thus, we have that qm+1 ∈ ∆({q1}, uv^i w) for all i ∈ N. But q1 = sM and qm+1 ∈ AM, and thus uv^i w ∈ L(M) = L for all i ∈ N. ✷

Suppose L′ = { w ∈ {0, 1}∗ | w has an equal number of 0's and 1's }. We could show that L′ is non-regular using the Pumping Lemma. But we can also prove this result by using some of the closure properties of Section 3.12 plus the fact that L = { 0^n 1^n | n ∈ N } is non-regular. Suppose, toward a contradiction, that L′ is regular. It is easy to see that {0} and {1} are regular (e.g., they are generated by the regular expressions 0 and 1). Thus, by Theorem 3.12.28, we have that {0}∗{1}∗ is regular. Hence, by Theorem 3.12.28 again, it follows that L = L′ ∩ {0}∗{1}∗ is regular—contradiction. Thus L′ is non-regular. As a final example, let X be the least subset of {0, 1}∗ such that

(1) % ∈ X; and

(2) For all x, y ∈ X, 0x1y ∈ X.

Let's try to prove that X is non-regular, using the Pumping Lemma. We suppose, toward a contradiction, that X is regular, and give it to the Pumping Lemma, getting back the n ∈ N − {0} with the property of the lemma, where X has been substituted for L. But then, how do we go about choosing the z ∈ Str such that z ∈ X and |z| ≥ n? We need to find a string expression exp involving the variable n, such that, for all n ∈ N, exp ∈ X and |exp| ≥ n. Because % ∈ X, we have that 01 = 0%1% ∈ X. Thus 0101 = 0%1(01) ∈ X. Generalizing, we can easily prove that, for all n ∈ N, (01)^n ∈ X. Thus we could let z = (01)^n. Unfortunately, this won't lead to the needed contradiction, since the Pumping Lemma may have chosen n = 2, and may break z up into u = %, v = 01 and w = (01)^{n−1}—with the consequence that (1)–(3) hold. Trying again, we have that % ∈ X, 01 ∈ X and 0(01)1% = 0011 ∈ X. Generalizing, it's easy to prove that, for all n ∈ N, 0^n 1^n ∈ X. Thus, we can let z = 0^n 1^n, so that z ∈ X and |z| ≥ n. We can then proceed as in the proof that { 0^n 1^n | n ∈ N } is non-regular, getting to the point where we learn that 0^{i+k} 1^n ∈ X and i + k < n. Since it is easy to show that every element of X has an equal number of 0's and 1's, this gives us our contradiction, and thus X is non-regular.

3.14.1 Experimenting with the Pumping Lemma Using Forlan

The Forlan module LP (see Section 3.4) defines a type and several functions that implement the idea behind the pumping lemma:

type pumping_division = lp * lp * lp

val checkPumpingDivision : pumping_division -> unit
val validPumpingDivision : pumping_division -> bool
val strsOfValidPumpingDivision : pumping_division -> str * str * str
val pumpValidPumpingDivision : pumping_division * int -> lp
val findValidPumpingDivision : lp -> pumping_division

A pumping division is a triple (lp1, lp2, lp3), where lp1, lp2, lp3 ∈ LP. We say that a pumping division (lp1, lp2, lp3) is valid iff

• the end state of lp1 is equal to the start state of lp2;

• the start state of lp2 is equal to the end state of lp2;

• the end state of lp2 is equal to the start state of lp3; and

• the label of lp2 is nonempty.

The function checkPumpingDivision checks whether a pumping division is valid, silently returning (), if it is, and issuing an error message explaining why it isn't, if it isn't. The function validPumpingDivision tests whether a pumping division is valid. The function strsOfValidPumpingDivision returns the triple consisting of the labels of the three components of a pumping division, in order. It issues an error message if the pumping division isn't valid. The function pumpValidPumpingDivision expects a pair (pd, n), where pd is a valid pumping division and n ≥ 0. It issues an error message if pd isn't valid, or n is negative. Otherwise, it returns

join(#1 pd, join(lp′, #3 pd)),

where lp′ is the result of joining #2 pd with itself n times (the empty labeled path whose single state is #2 pd's start/end state, if n = 0). Finally, the function findValidPumpingDivision takes in a labeled path lp, and tries to find a pumping division (lp1, lp2, lp3) such that:

• (lp1, lp2, lp3) is valid;

• pumpValidPumpingDivision((lp1, lp2, lp3), 1) = lp; and

• there is no repetition of states in the result of joining lp1 and the result of removing the last step of lp2.

findValidPumpingDivision issues an error message if no such pumping division exists. For example, suppose the DFA dfa is bound to the DFA

[transition diagram of a DFA with states A, B and C]

Then we can proceed as follows:

- val lp = DFA.findAcceptingLP dfa (Str.input "");
@ 001010
@ .
val lp = - : lp
- LP.output("", lp);
A, 0 => B, 0 => C, 1 => A, 0 => B, 1 => B, 0 => C
val it = () : unit
- val pd = LP.findValidPumpingDivision lp;
val pd = (-,-,-) : LP.pumping_division
- val (lp1, lp2, lp3) = pd;
val lp1 = - : lp
val lp2 = - : lp
val lp3 = - : lp
- LP.output("", lp1);
A
val it = () : unit
- LP.output("", lp2);
A, 0 => B, 0 => C, 1 => A
val it = () : unit
- LP.output("", lp3);
A, 0 => B, 1 => B, 0 => C
val it = () : unit
- val (u, v, w) = LP.strsOfValidPumpingDivision pd;
val u = [] : str
val v = [-,-,-] : str
val w = [-,-,-] : str
- (Str.toString u, Str.toString v, Str.toString w);
val it = ("%","001","010") : string * string * string
- val lp' = LP.pumpValidPumpingDivision(pd, 2);
val lp' = - : lp
- LP.output("", lp');
A, 0 => B, 0 => C, 1 => A, 0 => B, 0 => C, 1 => A, 0 => B, 1 => B, 0 => C
val it = () : unit
- Str.output("", LP.label lp');
001001010
val it = () : unit

3.14.2 Notes

The Pumping Lemma is usually proved using a DFA accepting the given regular language. But because we have described the meaning of automata via labeled paths, we can do the proof with an NFA, as it has nothing to do with determinacy. Forlan's support for experimenting with the Pumping Lemma is novel.

3.15 Applications of Finite Automata and Regular Expressions

In this section we consider three applications of the material from Chapter 3: searching for regular expressions in files; lexical analysis; and the design of finite state systems.

3.15.1 Representing Character Sets and Files

Our first two applications involve processing files whose characters come from some character set, e.g., the ASCII character set. Although not every character in a typical character set will be an element of our set Sym of symbols, we can represent all the characters of a character set by elements of Sym. E.g., we might represent the ASCII characters newline and space by the symbols ⟨newline⟩ and ⟨space⟩, respectively. In the following two subsections, we will work with a mostly unspecified alphabet Σ representing some character set. We assume that the symbols 0–9, a–z, A–Z, ⟨space⟩ and ⟨newline⟩ are elements of Σ. A line is an element of (Σ − {⟨newline⟩})∗; and a file consists of the concatenation of some number of lines, separated by occurrences of ⟨newline⟩. E.g., 0a⟨newline⟩⟨newline⟩6 is a file with three lines (0a, % and 6), and ⟨newline⟩ is a file with two lines, both consisting of %. In what follows, we write:

• [any] for the regular expression a1 + a2 + · · · + an, where a1, a2, . . . , an are all of the elements of Σ except ⟨newline⟩, listed in strictly ascending order;

• [letter] for the regular expression

a + b + · · · + z + A + B + · · · + Z;

• [digit] for the regular expression

0 + 1 + · · · + 9.

3.15.2 Searching for Regular Expressions in Files

Given a file and a regular expression α whose alphabet is a subset of Σ − {⟨newline⟩}, how can we find all lines of the file with substrings in L(α)? (E.g., α might be a(b + c)∗a; then we want to find all lines containing two a's, separated by some number of b's and c's.) It will be sufficient to find all lines in the file that are elements of L(β), where β = [any]∗ α [any]∗. To do this, we can first translate β to a DFA M with alphabet Σ − {⟨newline⟩}. For each line w, we simply check whether δM(sM, w) ∈ AM, selecting the line if it is. If the file is short, however, it may be more efficient to convert β to an FA (or EFA or NFA) N, and use the algorithm from Section 3.6 to find all lines that are accepted by N.
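To make this concrete, here is a minimal sketch of the DFA-based approach in SML, using Forlan. It is our own illustration, not from the text: we shrink the "character set" to {a, b, c} to keep the regular expressions short, the helper name linesMatching is ours, and we assume the Forlan functions Reg.closure, Reg.concat and DFA.accepted : dfa -> str -> bool exist with the evident meanings (if a given Forlan version lacks accepted, the δ-based check described above can be coded instead).

val regToDFA = nfaToDFA o efaToNFA o faToEFA o regToFA;

(* [any] for the tiny character set {a, b, c} *)
val any   = Reg.fromString "a + b + c";
val alpha = Reg.fromString "a(b + c)*a";

(* beta = [any]* alpha [any]* *)
val beta  = Reg.concat(Reg.closure any, Reg.concat(alpha, Reg.closure any));
val m     = regToDFA beta;

(* keep the lines (as Forlan strs) with a substring in L(alpha) *)
fun linesMatching (lines : str list) : str list =
  List.filter (DFA.accepted m) lines;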

3.15.3 Lexical Analysis

A lexical analyzer is the part of a compiler that groups the characters of a program into lexical items or tokens. The modern approach to specifying a lexical analyzer for a programming language uses regular expressions. E.g., this is the approach taken by the lexical analyzer generator Lex. A lexical analyzer specification consists of a list of regular expressions α1, α2, . . . , αn, together with a corresponding list of code fragments (in some programming language) code1, code2, . . . , coden that process elements of Σ∗. For example, we might have

α1 = ⟨space⟩ + ⟨newline⟩,
α2 = [letter] ([letter] + [digit])∗,
α3 = [digit] [digit]∗ (% + E [digit] [digit]∗),

α4 = [any].
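In Forlan, such a specification can be written down directly. The following sketch is our own illustration, not from the text: we shrink [letter], [digit] and [any] to tiny stand-ins (just enough symbols for the examples below), and we assume the Reg functions union, concat, closure and emptyStr exist with the evident types.

val letter = Reg.fromString "E + a + s + y";  (* stands in for [letter] *)
val digit  = Reg.fromString "0 + 1 + 2 + 3";  (* stands in for [digit] *)
val any    = Reg.fromString "E + a + s + y + 0 + 1 + 2 + 3 + <space>";
val alpha1 = Reg.fromString "<space> + <newline>";
val alpha2 = Reg.concat(letter, Reg.closure(Reg.union(letter, digit)));
val alpha3 =
      Reg.concat
      (digit,
       Reg.concat
       (Reg.closure digit,
        Reg.union
        (Reg.emptyStr,
         Reg.concat
         (Reg.fromString "E",
          Reg.concat(digit, Reg.closure digit)))));
val alpha4 = any;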

The elements of L(α1), L(α2) and L(α3) are whitespace characters, identifiers and numerals, respectively. The code associated with α4 will probably indicate that an error has occurred. A lexical analyzer meets such a specification iff it behaves as follows. At each stage of processing its file, the lexical analyzer should consume the longest prefix of the remaining input that is in the language generated by one of the regular expressions. It should then supply the prefix to the code associated with the earliest regular expression whose language contains the prefix. However, if there is no such prefix, or if the prefix is %, then the lexical analyzer should indicate that an error has occurred. What happens when we process the file 123Easy⟨space⟩1E2⟨newline⟩ using a lexical analyzer meeting our example specification?

• The longest prefix of 123Easy⟨space⟩1E2⟨newline⟩ that is in one of our regular expressions is 123. Since this prefix is only in α3, it is consumed from the input and supplied to code3.

• The remaining input is now Easy⟨space⟩1E2⟨newline⟩. The longest prefix of the remaining input that is in one of our regular expressions is Easy. Since this prefix is only in α2, it is consumed and supplied to code2.

• The remaining input is then ⟨space⟩1E2⟨newline⟩. The longest prefix of the remaining input that is in one of our regular expressions is ⟨space⟩. Since this prefix is only in α1 and α4, we consume it from the input and supply it to the code associated with the earlier of these regular expressions: code1.

• The remaining input is then 1E2⟨newline⟩. The longest prefix of the remaining input that is in one of our regular expressions is 1E2. Since this prefix is only in α3, we consume it from the input and supply it to code3.

• The remaining input is then ⟨newline⟩. The longest prefix of the remaining input that is in one of our regular expressions is ⟨newline⟩. Since this prefix is only in α1, we consume it from the input and supply it to the code associated with this expression: code1.

• The remaining input is now empty, and so the lexical analyzer terminates.

Now, we consider a simple method for generating a lexical analyzer that meets a given specification. More sophisticated methods are described in compilers courses. First, we convert the regular expressions α1, . . . , αn into DFAs M1, . . . , Mn. Next we determine which of the states of the DFAs are dead/live. Given its remaining input x, the lexical analyzer consumes the next token from x and supplies the token to the appropriate code, as follows. First, it initializes the following variables to error values:

• a string variable acc, which records the longest prefix of the prefix of x that has been processed so far that is accepted by one of the DFAs;

• an integer variable mach, which records the smallest i such that acc ∈ L(Mi);

• a string variable aft, consisting of the suffix of x that one gets by removing acc.

Then, the lexical analyzer enters its main loop, in which it processes x, symbol by symbol, in each of the DFAs, keeping track of what symbols have been processed so far, and what symbols remain to be processed. (A code sketch of this loop appears after the trace below.)

• If, after processing a symbol, at least one of the DFAs is in an accepting state, then the lexical analyzer stores the string that has been processed so far in the variable acc, stores the index of the first machine to accept this string in the integer variable mach, and stores the remaining input in the string variable aft. If there is no remaining input, then the lexical analyzer supplies acc to codemach, and returns; otherwise it continues.

• If, after processing a symbol, none of the DFAs are in accepting states, but at least one automaton is in a live state (so that, without knowing anything about the remaining input, it's possible that an automaton will again enter an accepting state), then the lexical analyzer leaves acc, mach and aft unchanged. If there is no remaining input, the lexical analyzer supplies acc to codemach (it signals an error if acc is still set to the error value), resets the remaining input to aft, and returns; otherwise, it continues.

• If, after processing a symbol, all of the automata are in dead states (and so could never enter accepting states again, no matter what the remaining input was), the lexical analyzer supplies acc to codemach (it signals an error if acc is still set to the error value), resets the remaining input to aft, and returns.

Let's see what happens when the file 123Easy⟨newline⟩ is processed by the lexical analyzer generated from our example specification.

• After processing 1, M3 and M4 are in accepting states, and so the lexical analyzer sets acc to 1, mach to 3, and aft to 23Easy⟨newline⟩. It then continues.

• After processing 2, so that 12 has been processed so far, only M3 is in an accepting state, and so the lexical analyzer sets acc to 12, mach to 3, and aft to 3Easy⟨newline⟩. It then continues.

• After processing 3, so that 123 has been processed so far, only M3 is in an accepting state, and so the lexical analyzer sets acc to 123, mach to 3, and aft to Easy⟨newline⟩. It then continues.

• After processing E, so that 123E has been processed so far, none of the DFAs are in accepting states, but M3 is in a live state, since 123E is a prefix of a string that is accepted by M3. Thus the lexical analyzer continues, but doesn't change acc, mach or aft.

• After processing a, so that 123Ea has been processed so far, all of the machines are in dead states, since 123Ea isn't a prefix of a string that is accepted by one of the DFAs. Thus the lexical analyzer supplies acc = 123 to codemach = code3, and sets the remaining input to aft = Easy⟨newline⟩.

• In subsequent steps, the lexical analyzer extracts Easy from the remaining input, and supplies this string to code2, and extracts ⟨newline⟩ from the remaining input, and supplies this string to code1.
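Here is a minimal sketch of the promised main loop in SML. It is our own illustration, not Forlan code: each DFA is modeled by a current state, a step function and accepting/dead tests on states, symbols are modeled as characters, and all of the names below (machine, stepMach, nextToken) are hypothetical. nextToken returning NONE corresponds to the error case.

type machine =
  {state : int,
   step : int * char -> int,   (* transition function *)
   isAccepting : int -> bool,
   isDead : int -> bool}

fun stepMach ({state, step, isAccepting, isDead} : machine) (c : char) =
  {state = step (state, c), step = step,
   isAccepting = isAccepting, isDead = isDead}

(* consume the longest accepted prefix of input, returning
   SOME (acc, mach, aft) on success *)
fun nextToken (machs : machine list) (input : char list)
      : (char list * int * char list) option =
  let
    (* index (from 1) of the first machine whose state is accepting *)
    fun firstAcc (_, []) = NONE
      | firstAcc (i, (m : machine) :: ms) =
          if #isAccepting m (#state m) then SOME i else firstAcc (i + 1, ms)
    fun loop (ms, sofar, rest, best) =
      case rest of
        [] => best   (* input exhausted: return the best match, if any *)
      | c :: rest' =>
          let
            val ms' = map (fn m => stepMach m c) ms
            val sofar' = sofar @ [c]
            val best' =
              case firstAcc (1, ms') of
                SOME i => SOME (sofar', i, rest')
              | NONE => best
          in
            if List.all (fn (m : machine) => #isDead m (#state m)) ms'
            then best'   (* every machine dead: stop with the best match *)
            else loop (ms', sofar', rest', best')
          end
  in
    loop (machs, [], input, NONE)
  end

A driver would call nextToken repeatedly: each SOME (acc, mach, aft) result means acc should be supplied to codemach, after which the loop restarts on aft.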

3.15.4 Design of Finite State Systems

Deterministic finite automata give us a means to efficiently—both in terms of time and space—check membership in a regular language. In terms of time, a single left-to-right scan of the string is needed. And we only need enough space to encode the DFA, and to keep track of what state we are in at each point, as well as what part of the string remains to be processed. But if the string to be checked is supplied, symbol-by-symbol, from our environment, we don't need to store the string at all. Consequently, DFAs may be easily and efficiently implemented in both hardware and software. One can design DFAs by hand, and test them using Forlan. But DFA minimization plus the operations on automata and regular expressions of Section 3.12 give us an alternative—and very powerful—way of designing finite state systems, which we will illustrate with two examples. As the first example, suppose we wish to find a DFA M such that L(M) = X, where

X = { w ∈ {0, 1}∗ | w has an even number of 0’s or an odd number of 1’s }.

First, we can note that X = Y1 ∪ Y2, where

Y1 = { w ∈ {0, 1}∗ | w has an even number of 0's }, and
Y2 = { w ∈ {0, 1}∗ | w has an odd number of 1's }.

Since we have a union operation on EFAs (Forlan doesn't provide a union operation on DFAs), if we can find EFAs accepting Y1 and Y2, we can combine them into an EFA that accepts X. Then we can convert this EFA to a DFA, and then minimize the DFA. Let N1 and N2 be the DFAs

[transition diagrams of N1 and N2, each with states A and B]

It is easy to prove that L(N1)= Y1 and L(N2)= Y2. Let M be the DFA

renameStatesCanonically(minimize(N)), where N is the DFA

nfaToDFA(efaToNFA(union(N1, N2))).

Then

L(M)= L(renameStatesCanonically(minimize N)) = L(minimize N) = L(N)

= L(nfaToDFA(efaToNFA(union(N1,N2))))

= L(efaToNFA(union(N1,N2)))

= L(union(N1,N2))

= L(N1) ∪ L(N2)

= Y1 ∪ Y2 = X,

showing that M is correct. Suppose M′ is a DFA that accepts X. Since M′ ≈ N, we have that minimize(N), and thus M, has no more states than M′. Thus M has as few states as is possible. But how do we figure out what the components of M are, so that, e.g., we can draw M? In a simple case like this, we could apply the definitions of union, efaToNFA, nfaToDFA, minimize and renameStatesCanonically, and work out the answer. But, for more complex examples, there would be far too much detail involved for this to be a practical approach. Instead, we can use Forlan to compute the answer. Suppose dfa1 and dfa2 of type dfa are N1 and N2, respectively. Then we can proceed as follows:

- val efa = EFA.union(injDFAToEFA dfa1, injDFAToEFA dfa2);
val efa = - : efa
- val dfa' = nfaToDFA(efaToNFA efa);
val dfa' = - : dfa
- DFA.numStates dfa';
val it = 5 : int
- val dfa = DFA.renameStatesCanonically(DFA.minimize dfa');
val dfa = - : dfa
- DFA.numStates dfa;
val it = 4 : int
- DFA.output("", dfa);
{states} A, B, C, D
{start state} D
{accepting states} A, C, D
{transitions}
A, 0 -> C; A, 1 -> D; B, 0 -> D; B, 1 -> C; C, 0 -> A; C, 1 -> B;
D, 0 -> B; D, 1 -> A
val it = () : unit

Thus M is:

[transition diagram of the four-state DFA output above]

Of course, this claim assumes that Forlan is correctly implemented. We conclude this subsection by considering a second, more involved example of DFA design. Given a string w ∈ {0, 1}∗, we say that:

• w stutters iff aa is a substring of w, for some a ∈ {0, 1};

• w is long iff |w| ≥ 5.

(We can generalize what follows to work with two parameters: an alphabet Σ (instead of {0, 1}) and a length n ∈ N (instead of 5—the length above which we consider a string long).) So, e.g., 1001 and 10110 both stutter, but 01010 and 101 don't. Saying that strings of length 5 or more are "long" is arbitrary; what follows can be repeated with different choices of when strings are long. Let the language AllLongStutter be

{ w ∈ {0, 1}∗ | for all substrings v of w, if v is long, then v stutters }.

In other words, a string of 0's and 1's is in AllLongStutter iff every long substring of this string stutters. Since every substring of 0010110 of length five stutters, every long substring of this string stutters, and thus the string is in AllLongStutter. On the other hand, 0010100 is not in AllLongStutter, because 01010 is a long, non-stuttering substring of this string. Let's consider the problem of finding a DFA that accepts this language. One possibility is to reduce this problem to that of finding a DFA that accepts the complement of AllLongStutter. Then we'll be able to use our set difference operation on DFAs to build a DFA that accepts AllLongStutter, which we can then minimize. (We'll also need a DFA accepting {0, 1}∗.) To form the complement of AllLongStutter, we negate the formula in AllLongStutter's expression. Let SomeLongNotStutter be the language

{ w ∈ {0, 1}∗ | there is a substring v of w such that v is long and doesn’t stutter }.

Lemma 3.15.1 AllLongStutter = {0, 1}∗ − SomeLongNotStutter.

Proof. Suppose w ∈ AllLongStutter, so that w ∈ {0, 1}∗ and, for all substrings v of w, if v is long, then v stutters. Suppose, toward a contradiction, that w ∈ SomeLongNotStutter. Then there is a substring v of w such that v is long and doesn't stutter—contradiction. Thus w ∉ SomeLongNotStutter, completing the proof that w ∈ {0, 1}∗ − SomeLongNotStutter. Suppose w ∈ {0, 1}∗ − SomeLongNotStutter, so that w ∈ {0, 1}∗ and w ∉ SomeLongNotStutter. To see that w ∈ AllLongStutter, suppose v is a substring of w and v is long. Suppose, toward a contradiction, that v doesn't stutter. Then w ∈ SomeLongNotStutter—contradiction. Hence v stutters. ✷

Next, it’s convenient to work bottom-up for a bit. Let

Long = { w ∈ {0, 1}∗ | w is long }, Stutter = { w ∈ {0, 1}∗ | w stutters }, NotStutter = { w ∈ {0, 1}∗ | w doesn’t stutter }, and LongAndNotStutter = { w ∈ {0, 1}∗ | w is long and doesn’t stutter }.

The following lemma is easy to prove:

Lemma 3.15.2 (1) NotStutter = {0, 1}∗ − Stutter.

(2) LongAndNotStutter = Long ∩ NotStutter.

Clearly, we'll be able to find DFAs accepting Long and Stutter, respectively. Thus, we'll be able to use our set difference operation on DFAs to come up with a DFA that accepts NotStutter. Then, we'll be able to use our intersection operation on DFAs to come up with a DFA that accepts LongAndNotStutter. What remains is to find a way of converting LongAndNotStutter to SomeLongNotStutter. Clearly, the former language is a subset of the latter one. But the two languages are not equal, since an element of the latter language may have the form xvy, where x, y ∈ {0, 1}∗ and v ∈ LongAndNotStutter. This suggests the following lemma:

Lemma 3.15.3 SomeLongNotStutter = {0, 1}∗ LongAndNotStutter {0, 1}∗.

Proof. Suppose w ∈ SomeLongNotStutter, so that w ∈ {0, 1}∗ and there is a substring v of w such that v is long and doesn't stutter. Thus v ∈ LongAndNotStutter, and w = xvy for some x, y ∈ {0, 1}∗. Hence w = xvy ∈ {0, 1}∗ LongAndNotStutter {0, 1}∗. Suppose w ∈ {0, 1}∗ LongAndNotStutter {0, 1}∗, so that w = xvy for some x, y ∈ {0, 1}∗ and v ∈ LongAndNotStutter. Hence v is long and doesn't stutter. Thus v is a long substring of w that doesn't stutter, showing that w ∈ SomeLongNotStutter. ✷

Because of the preceding lemma, we can construct an EFA accepting SomeLongNotStutter from a DFA accepting {0, 1}∗ and our DFA accepting LongAndNotStutter, using our concatenation operation on EFAs. (We haven't given a concatenation operation on DFAs.) We can then convert this EFA to a DFA. Now, let's take the preceding ideas and turn them into reality. First, we define functions regToEFA ∈ Reg → EFA, efaToDFA ∈ EFA → DFA, regToDFA ∈ Reg → DFA and minAndRen ∈ DFA → DFA by:

regToEFA = faToEFA ◦ regToFA,
efaToDFA = nfaToDFA ◦ efaToNFA,
regToDFA = efaToDFA ◦ regToEFA, and
minAndRen = renameStatesCanonically ◦ minimize.

Lemma 3.15.4 (1) For all α ∈ Reg, L(regToEFA(α)) = L(α).

(2) For all M ∈ EFA, L(efaToDFA(M)) = L(M).

(3) For all α ∈ Reg, L(regToDFA(α)) = L(α).

(4) For all M ∈ DFA, L(minAndRen(M)) = L(M) and, for all N ∈ DFA, if L(N)= L(M), then minAndRen(M) has no more states than N.

Proof. We show the proof of Part (4), the proofs of the other parts being even easier. Suppose M ∈ DFA. By Theorem 3.13.12(1), we have that

L(minAndRen(M)) = L(renameStatesCanonically(minimize M)) = L(minimize M) = L(M).

Suppose N ∈ DFA and L(N)= L(M). By Theorem 3.13.12(4), minimize(M) has no more states than N. Hence renameStatesCanonically(minimize(M)) has no more states than N, showing that minAndRen(M) has no more states than N. ✷

Let the regular expression allStrReg be (0 + 1)∗. Clearly L(allStrReg)= {0, 1}∗. Let the DFA allStrDFA be

minAndRen(regToDFA allStrReg).

Lemma 3.15.5 L(allStrDFA) = {0, 1}∗.

Proof. By Lemma 3.15.4, we have that

L(allStrDFA) = L(minAndRen(regToDFA allStrReg)) = L(regToDFA allStrReg) = L(allStrReg) = {0, 1}∗. ✷

(Not surprisingly, allStrDFA will have a single state.) Let the EFA allStrEFA be the DFA allStrDFA. Thus L(allStrEFA) = {0, 1}∗. Let the regular expression longReg be (0 + 1)^5 (0 + 1)∗.

Lemma 3.15.6 L(longReg) = Long.

Proof. Since L(longReg) = {0, 1}^5 {0, 1}∗, it will suffice to show that {0, 1}^5 {0, 1}∗ = Long. Suppose w ∈ {0, 1}^5 {0, 1}∗, so that w = xy, for some x ∈ {0, 1}^5 and y ∈ {0, 1}∗. Thus w = xy ∈ {0, 1}∗ and |w| ≥ |x| = 5, showing that w ∈ Long. Suppose w ∈ Long, so that w ∈ {0, 1}∗ and |w| ≥ 5. Then w = abcdex, for some a, b, c, d, e ∈ {0, 1} and x ∈ {0, 1}∗. Hence w = (abcde)x ∈ {0, 1}^5 {0, 1}∗. ✷

Let the DFA longDFA be minAndRen(regToDFA(longReg)). An easy calculation shows that L(longDFA) = Long. Let stutterReg be the regular expression (0 + 1)∗(00 + 11)(0 + 1)∗.

Lemma 3.15.7 L(stutterReg) = Stutter.

Proof. Since L(stutterReg) = {0, 1}∗{00, 11}{0, 1}∗, it will suffice to show that {0, 1}∗{00, 11}{0, 1}∗ = Stutter, and this is easy. ✷

Let stutterDFA be the DFA minAndRen(regToDFA(stutterReg)). An easy calculation shows that L(stutterDFA) = Stutter. Let notStutterDFA be the DFA minAndRen(minus(allStrDFA, stutterDFA)).

Lemma 3.15.8 L(notStutterDFA) = NotStutter.

Proof. Let M be minAndRen(minus(allStrDFA, stutterDFA)). By Lemma 3.15.2(1), we have that

L(notStutterDFA) = L(M) = L(minus(allStrDFA, stutterDFA)) = L(allStrDFA) − L(stutterDFA) = {0, 1}∗ − Stutter = NotStutter. ✷

Let longAndNotStutterDFA be the DFA minAndRen(inter(longDFA, notStutterDFA)).

Lemma 3.15.9 L(longAndNotStutterDFA) = LongAndNotStutter.

Proof. Let M be minAndRen(inter(longDFA, notStutterDFA)). By Lemma 3.15.2(2), we have that

L(longAndNotStutterDFA) = L(M) = L(inter(longDFA, notStutterDFA)) = L(longDFA) ∩ L(notStutterDFA) = Long ∩ NotStutter = LongAndNotStutter. ✷

Because longAndNotStutterDFA is an EFA, we can simply let the EFA longAndNotStutterEFA be longAndNotStutterDFA. Thus we have that L(longAndNotStutterEFA)= LongAndNotStutter. Let someLongNotStutterEFA be the EFA

renameStatesCanonically(concat(allStrEFA, concat(longAndNotStutterEFA, allStrEFA))).

Lemma 3.15.10 L(someLongNotStutterEFA) = SomeLongNotStutter.

Proof. We have that L(someLongNotStutterEFA) = L(renameStatesCanonically M) = L(M), where M is concat(allStrEFA, concat(longAndNotStutterEFA, allStrEFA)). And, by Lemma 3.15.3, we have that

L(M) = L(allStrEFA) L(longAndNotStutterEFA) L(allStrEFA) = {0, 1}∗ LongAndNotStutter {0, 1}∗ = SomeLongNotStutter. ✷

Let someLongNotStutterDFA be the DFA minAndRen(efaToDFA someLongNotStutterEFA).

Lemma 3.15.11 L(someLongNotStutterDFA) = SomeLongNotStutter.

Proof. Follows by an easy calculation. ✷

Finally, let allLongStutterDFA be the DFA minAndRen(minus(allStrDFA, someLongNotStutterDFA)).

Lemma 3.15.12 L(allLongStutterDFA) = AllLongStutter and, for all N ∈ DFA, if L(N) = AllLongStutter, then allLongStutterDFA has no more states than N.

Proof. We have that

L(allLongStutterDFA) = L(minAndRen(M)) = L(M),

where M is minus(allStrDFA, someLongNotStutterDFA). Then, by Lemma 3.15.1, we have that

L(M) = L(allStrDFA) − L(someLongNotStutterDFA) = {0, 1}∗ − SomeLongNotStutter = AllLongStutter.

Suppose N ∈ DFA and L(N) = AllLongStutter. Thus L(N) = L(M), so that allLongStutterDFA has no more states than N, by Lemma 3.15.4(4). ✷

The preceding lemma tells us that the DFA allLongStutterDFA is correct and has as few states as is possible. To find out what it looks like, though, we'll have to use Forlan. First we put the text

val regToEFA = faToEFA o regToFA;
val efaToDFA = nfaToDFA o efaToNFA;
val regToDFA = efaToDFA o regToEFA;
val minAndRen = DFA.renameStatesCanonically o DFA.minimize;

val allStrReg = Reg.fromString "(0 + 1)*";
val allStrDFA = minAndRen(regToDFA allStrReg);
val allStrEFA = injDFAToEFA allStrDFA;

val longReg =
      Reg.concat
      (Reg.power(Reg.fromString "0 + 1", 5),
       Reg.fromString "(0 + 1)*");
val longDFA = minAndRen(regToDFA longReg);

val stutterReg = Reg.fromString "(0 + 1)*(00 + 11)(0 + 1)*";
val stutterDFA = minAndRen(regToDFA stutterReg);

val notStutterDFA = minAndRen(DFA.minus(allStrDFA, stutterDFA));

val longAndNotStutterDFA = minAndRen(DFA.inter(longDFA, notStutterDFA));
val longAndNotStutterEFA = injDFAToEFA longAndNotStutterDFA;

val someLongNotStutterEFA' =
      EFA.concat
      (allStrEFA,
       EFA.concat(longAndNotStutterEFA, allStrEFA));
val someLongNotStutterEFA =
      EFA.renameStatesCanonically someLongNotStutterEFA';

val someLongNotStutterDFA = minAndRen(efaToDFA someLongNotStutterEFA);
val allLongStutterDFA =
      minAndRen(DFA.minus(allStrDFA, someLongNotStutterDFA));

in the file stutter.sml. Then, we proceed as follows:

- use "stutter.sml";
[opening stutter.sml]
val regToEFA = fn : reg -> efa
val efaToDFA = fn : efa -> dfa

Figure 3.1: DFA Accepting AllLongStutter [transition diagram of the ten-state DFA, states A–J, whose full description is printed below]

val regToDFA = fn : reg -> dfa
val minAndRen = fn : dfa -> dfa
val allStrReg = - : reg
val allStrDFA = - : dfa
val allStrEFA = - : efa
val longReg = - : reg
val longDFA = - : dfa
val stutterReg = - : reg
val stutterDFA = - : dfa
val notStutterDFA = - : dfa
val longAndNotStutterDFA = - : dfa
val longAndNotStutterEFA = - : efa
val someLongNotStutterEFA' = - : efa
val someLongNotStutterEFA = - : efa
val someLongNotStutterDFA = - : dfa
val allLongStutterDFA = - : dfa
val it = () : unit
- DFA.output("", allLongStutterDFA);
{states} A, B, C, D, E, F, G, H, I, J
{start state} A
{accepting states} A, B, C, D, E, F, G, H, I
{transitions}
A, 0 -> B; A, 1 -> C; B, 0 -> B; B, 1 -> E; C, 0 -> D; C, 1 -> C;
D, 0 -> B; D, 1 -> G; E, 0 -> F; E, 1 -> C; F, 0 -> B; F, 1 -> I;
G, 0 -> H; G, 1 -> C; H, 0 -> B; H, 1 -> J; I, 0 -> J; I, 1 -> C;
J, 0 -> J; J, 1 -> J
val it = () : unit

Thus, allLongStutterDFA is the DFA of Figure 3.1.

Exercise 3.15.13 Generalize the preceding example to work with two parameters: an alphabet Σ (instead of {0, 1}) and a length n ∈ N (instead of 5—the length above which we consider a string long).

Exercise 3.15.14 Define diff ∈ {0, 1}∗ → Z by: for all w ∈ {0, 1}∗,

diff w = the number of 1’s in w − the number of 0’s in w.

Let AllPrefixGood = { w ∈ {0, 1}∗ | for all prefixes v of w, |diff v| ≤ 2 }. Suppose we have a DFA allPrefixGoodDFA, and we have already proved (see Exercise 3.11.7) that L(allPrefixGoodDFA) = AllPrefixGood. Let AllSubstringGood = { w ∈ {0, 1}∗ | for all substrings v of w, |diff v| ≤ 2 }. Show how we can transform allPrefixGoodDFA into a minimized DFA allSubstringGoodDFA such that L(allSubstringGoodDFA) = AllSubstringGood. Prove that your transformation is correct. Finally, realize your transformation in Forlan, and draw the resulting DFA.

3.15.5 Notes

Our treatment of searching for regular expressions in text files is standard, as is that of lexical analysis. But our approach to designing finite state systems depends upon having access to a toolset, like Forlan, that is embedded in a programming language and implements our algorithms for manipulating finite automata and regular expressions.

Chapter 4

Context-free Languages

In this chapter, we study context-free grammars and languages. Context-free grammars are used to describe the syntax of programming languages, i.e., to specify parsers of programming languages. A language is called context-free iff it is generated by a context-free grammar. It will turn out that the set of all context-free languages is a proper superset of the set of all regular languages. On the other hand, the context-free languages have weaker closure properties than the regular languages, and we won’t be able to give algorithms for checking grammar equivalence or minimizing the size of grammars.

4.1 Grammars, Parse Trees and Context-free Languages

In this section, we: say what (context-free) grammars are; use the notion of a parse tree to say what grammars mean; say what it means for a language to be context-free; and begin to show how grammars can be processed using Forlan.

4.1.1 Grammars

A context-free grammar (or just grammar) G consists of:

• a finite set QG of symbols (we call the elements of QG the variables of G);

• an element sG of QG (we call sG the start variable of G); and

• a finite subset PG of { (q,x) | q ∈ QG and x ∈ Str } (we call the elements of PG the productions of G, and we often write (q,x) as q → x).

In a context where we are only referring to a single grammar, G, we sometimes abbreviate QG, sG and PG to Q, s and P, respectively. Whenever possible, we will use the mathematical variables p, q and r to name variables. We write

Gram for the set of all grammars. Since every grammar can be described by a finite sequence of ASCII characters, we have that Gram is countably infinite. As an example, we can define a grammar G (of arithmetic expressions) as follows:

• QG = {E};

• sG = E; and

• PG = {E → E⟨plus⟩E, E → E⟨times⟩E, E → ⟨openPar⟩E⟨closPar⟩, E → ⟨id⟩}.

E.g., we can read the production E → E⟨plus⟩E as "an expression can consist of an expression, followed by a ⟨plus⟩ symbol, followed by an expression". We typically describe a grammar by listing its productions, and grouping productions with identical left-sides into production families. Unless we say otherwise, the grammar's variables are the left-sides of all of its productions, and its start variable is the left-side of its first production. Thus, our grammar G is

E → E⟨plus⟩E, E → E⟨times⟩E, E → ⟨openPar⟩E⟨closPar⟩, E → ⟨id⟩,

or

E → E⟨plus⟩E | E⟨times⟩E | ⟨openPar⟩E⟨closPar⟩ | ⟨id⟩.

The Forlan syntax for grammars is very similar to the notation used above. E.g., our example grammar can be described in Forlan’s syntax by

{variables} E {start variable} E {productions}
E -> E<plus>E; E -> E<times>E; E -> <openPar>E<closPar>; E -> <id>

or

{variables} E {start variable} E {productions}
E -> E<plus>E | E<times>E | <openPar>E<closPar> | <id>

Production families are separated by semicolons. The Forlan module Gram defines an abstract type gram (in the top-level environment) of grammars as well as a number of functions and constants for processing grammars, including:

val input : string -> gram
val output : string * gram -> unit
val numVariables : gram -> int
val numProductions : gram -> int
val equal : gram * gram -> bool

The functions numVariables and numProductions return the numbers of variables and productions, respectively, of a grammar. And the function equal tests whether two grammars are equal, i.e., whether they have the same sets of variables, start variables, and sets of productions. During printing, Forlan merges productions into production families whenever possible. The alphabet of a grammar G (alphabet G) is

{ a ∈ Sym | there are q,x such that q → x ∈ PG and a ∈ alphabet x }

− QG.

I.e., alphabet G is all of the symbols appearing in the strings of G's productions that aren't variables. For example, the alphabet of our example grammar G is {⟨plus⟩, ⟨times⟩, ⟨openPar⟩, ⟨closPar⟩, ⟨id⟩}. The Forlan module Gram defines a function

val alphabet : gram -> sym set

for calculating the alphabet of a grammar. E.g., if gram of type gram is bound to our example grammar G, then Forlan will behave as follows:

- val bs = Gram.alphabet gram;
val bs = - : sym set
- SymSet.output("", bs);
<closPar>, <id>, <openPar>, <plus>, <times>
val it = () : unit
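Continuing the session, the counting functions introduced above behave as we'd expect on this grammar (a sketch of a session, with the outputs we'd anticipate):

- Gram.numVariables gram;
val it = 1 : int
- Gram.numProductions gram;
val it = 4 : int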

4.1.2 Parse Trees and Grammar Meaning

We will explain when strings are generated by grammars using the notion of a parse tree. The set PT of parse trees is the least subset of Tree(Sym ∪ {%}) (the set of all (Sym ∪ {%})-trees; see Section 1.3) such that:

(1) for all a ∈ Sym and pts ∈ List PT, (a, pts) ∈ PT; and

(2) for all a ∈ Sym, (a, [(%, [ ])]) = a(%) ∈ PT.

Note that the (Sym ∪ {%})-tree % = (%, [ ]) is not a parse tree. It is easy to see that PT is countably infinite. For example, A(B, A(%), B(0)), i.e.,

A

B A B

% 0 is a parse tree. On the other hand, although A(B, %, B), i.e.,

A

B % B is a (Sym ∪ {%})-tree, it’s not a parse tree, since it can’t be formed using rules (1) and (2). Since the set PT of parse trees is defined inductively, it gives rise to an induction principle.

Theorem 4.1.1 (Principle of Induction on Parse Trees) Suppose P (pt) is a property of an element pt ∈ PT. If

(1) for all a ∈ Sym and trs ∈ List PT, if (†) for all i ∈ [1 : |trs|], P (trs i), then P ((a, trs)), and

(2) for all a ∈ Sym, P (a(%)), then

for all pt ∈ PT, P (pt).

We refer to (†) as the inductive hypothesis. We define the yield of a parse tree, as follows. The function yield ∈ PT → Str is defined by structural recursion:

• for all a ∈ Sym, yield a = a;

• for all q ∈ Sym, n ∈ N − {0} and pt1, . . . , ptn ∈ PT, yield(q(pt1, . . . , ptn)) = yield pt1 · · · yield ptn; and

• for all q ∈ Sym, yield(q(%)) = %.

We say that w is the yield of pt iff w = yield pt. For example, the yield of

A

B A B

% 0

is

yield B yield(A(%)) yield(B(0)) = B % yield 0 = B%0 = B0.
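For concreteness, here is the same structural recursion in SML, over a hypothetical concrete datatype of parse trees (Forlan's own pt type is abstract, and its yield returns a str rather than a string; everything below is our own illustration, with symbols modeled as strings and % as the empty string):

datatype pt =
    Node of string * pt list   (* (a, pts); Node (a, []) is the leaf a *)
  | EmptyChild of string       (* a(%), the tree whose only child is % *)

fun yield (Node (a, [])) = a                          (* yield a = a *)
  | yield (Node (_, pts)) = String.concat (map yield pts)
  | yield (EmptyChild _) = ""                         (* yield (q(%)) = % *)

(* the example tree A(B, A(%), B(0)); yield exampleTree = "B0" *)
val exampleTree =
  Node ("A", [Node ("B", []), EmptyChild "A", Node ("B", [Node ("0", [])])])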

We say when a parse tree is valid for a grammar G as follows. Define a function validG ∈ PT → Bool by structural recursion:

• for all a ∈ Sym, validG a = a ∈ alphabet G or a ∈ QG;

• for all q ∈ Sym, n ∈ N − {0} and pt1, . . . , ptn ∈ PT,

validG(q(pt1, . . . , ptn))
= q → rootLabel pt1 · · · rootLabel ptn ∈ PG and
validG pt1 and · · · and validG ptn; and

• for all q ∈ Sym, validG(q(%)) = q → % ∈ PG.

We say that pt is valid for G iff validG pt = true. We often abbreviate validG to valid. Suppose G is the grammar

A → BAB | %, B → 0

(by convention, its variables are A and B and its start variable is A). Let’s see why the parse tree A(B, A(%), B(0)) is valid for G:

• Since A → BAB ∈ PG and the concatenation of the root labels of the sub-trees B, A(%) and B(0) is BAB, the overall tree will be valid for G if these sub-trees are valid for G.

• The parse tree B is valid for G since B ∈ QG.

• Since A → % ∈ PG, the parse tree A(%) is valid for G.

• Since B → 0 ∈ PG and the root label of the sub-tree 0 is 0, the parse tree B(0) will be valid for G if the sub-tree 0 is valid for G.

• The sub-tree 0 is valid for G since 0 ∈ alphabet G.

Thus, we have that

A

B A B

% 0

is valid for G. And, if G is our grammar of arithmetic expressions,

E → E⟨plus⟩E | E⟨times⟩E | ⟨openPar⟩E⟨closPar⟩ | ⟨id⟩,

then the parse tree

E

E ⟨plus⟩ E

E ⟨times⟩ E ⟨id⟩

⟨id⟩ ⟨id⟩

is valid for G.

Proposition 4.1.2 Suppose G is a grammar. For all parse trees pt, if pt is valid for G, then rootLabel pt ∈ QG ∪ alphabet G and yield pt ∈ (QG ∪ alphabet G)∗.

Proof. By induction on pt. ✷

Suppose G is a grammar, w ∈ Str and a ∈ Sym. We say that w is parsable from a using G iff there is a parse tree pt such that:

• pt is valid for G;

• a is the root label of pt; and

• the yield of pt is w.

Thus we will have that w ∈ (QG ∪ alphabet G)∗, and either a ∈ QG or [a] = w. We say that a string w is generated from a variable q ∈ QG using G iff w ∈ (alphabet G)∗ and w is parsable from q. And, we say that a string w is generated by a grammar G iff w is generated from sG using G. The language generated by a grammar G (L(G)) is

{ w ∈ Str | w is generated by G }.

Proposition 4.1.3 For all grammars G, alphabet(L(G)) ⊆ alphabet G.

For example, if G is the grammar

A → BAB | %, B → 0,

then 00 is generated by G, since 00 ∈ {0}∗ = (alphabet G)∗ and the parse tree

A

B A B

0 % 0

is valid for G, has sG = A as its root label, and has 00 as its yield. And, if G is our grammar of arithmetic expressions,

E → E⟨plus⟩E | E⟨times⟩E | ⟨openPar⟩E⟨closPar⟩ | ⟨id⟩,

then ⟨id⟩⟨times⟩⟨id⟩⟨plus⟩⟨id⟩ is generated by G, since ⟨id⟩⟨times⟩⟨id⟩⟨plus⟩⟨id⟩ ∈ (alphabet G)∗ and the parse tree

E

E ⟨plus⟩ E

E ⟨times⟩ E ⟨id⟩

⟨id⟩ ⟨id⟩

is valid for G, has sG = E as its root label, and has ⟨id⟩⟨times⟩⟨id⟩⟨plus⟩⟨id⟩ as its yield. A language L is context-free iff L = L(G) for some G ∈ Gram. We define

CFLan = { L(G) | G ∈ Gram } = { L ∈ Lan | L is context-free }.

Since {0^0}, {0^1}, {0^2}, . . . are all context-free languages, we have that CFLan is infinite. But, since Gram is countably infinite, it follows that CFLan is also countably infinite. Since Lan is uncountable, it follows that CFLan ⊊ Lan, i.e., there are non-context-free languages. Later, we will see that RegLan ⊊ CFLan. We say that grammars G and H are equivalent iff L(G) = L(H). In other words, G and H are equivalent iff G and H generate the same language. We define a relation ≈ on Gram by: G ≈ H iff G and H are equivalent. It is easy to see that ≈ is reflexive on Gram, symmetric and transitive. The Forlan module PT defines an abstract type pt of parse trees (in the top-level environment) along with some functions for processing parse trees:

val input : string -> pt
val output : string * pt -> unit
val height : pt -> int
val size : pt -> int
val equal : pt * pt -> bool
val rootLabel : pt -> sym
val yield : pt -> str

The Forlan syntax for parse trees is simply the linear syntax that we've been using in this section. The Java program JForlan can be used to view and edit parse trees. It can be invoked directly, or run via Forlan. See the Forlan website for more information. The Forlan module Gram also defines the functions

val checkPT : gram -> pt -> unit
val validPT : gram -> pt -> bool

The function checkPT is used to check whether a parse tree is valid for a grammar; if the answer is "no", it explains why not and raises an exception; otherwise it simply returns (). The function validPT checks whether a parse tree is valid for a grammar, silently returning true if it is, and silently returning false if it isn't. Suppose the identifier gram of type gram is bound to the grammar

A → BAB | %, B → 0.

And, suppose that the identifier gram’ of type gram is bound to our grammar of arithmetic expressions,

E → E⟨plus⟩E | E⟨times⟩E | ⟨openPar⟩E⟨closPar⟩ | ⟨id⟩.

Here are some examples of how we can process parse trees using Forlan:

- val pt = PT.input "";
@ A(B, A(%), B(0))
@ .
val pt = - : pt
- Sym.output("", PT.rootLabel pt);
A
val it = () : unit
- Str.output("", PT.yield pt);
B0
val it = () : unit
- Gram.validPT gram pt;
val it = true : bool
- val pt' = PT.input "";
@ E(E(E(<id>), <times>, E(<id>)), <plus>, E(<id>))
@ .
val pt' = - : pt
- Sym.output("", PT.rootLabel pt');
E
val it = () : unit
- Str.output("", PT.yield pt');
<id><times><id><plus><id>
val it = () : unit
- Gram.validPT gram' pt';
val it = true : bool
- Gram.checkPT gram pt';
invalid production: "E -> E<plus>E"

uncaught exception Error
- Gram.checkPT gram' pt;
invalid production: "A -> BAB"

uncaught exception Error
- PT.input "";
@ A(B,%,B)
@ .
line 1: "%" unexpected

uncaught exception Error

4.1.3 Grammar Synthesis

We conclude this section with a grammar synthesis example. Suppose X = { 0^n 1^m 2^m 3^n | n, m ∈ N }. The key to finding a grammar G that generates X is to think of generating the strings of X from the outside in, in two phases. In the first phase, one generates pairs of 0's and 3's, and, in the second phase, one generates pairs of 1's and 2's. E.g., a string could be formed in the following stages:

0 3, 00 33, 001233.

This analysis leads us to the grammar

A → 0A3, A → B, B → 1B2, B → %,

where A corresponds to the first phase, and B to the second phase. For example, here is how the string 001233 may be parsed using G:

A

0 A 3

0 A 3

B

1 B 2

%
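As a check, the grammar and this parse tree can be entered into Forlan, using only Gram and PT functions introduced earlier; the following session is a sketch, with the outputs we'd expect:

- val gram = Gram.input "";
@ {variables} A, B
@ {start variable} A
@ {productions} A -> 0A3 | B; B -> 1B2 | %
@ .
val gram = - : gram
- val pt = PT.input "";
@ A(0, A(0, A(B(1, B(%), 2)), 3), 3)
@ .
val pt = - : pt
- Gram.validPT gram pt;
val it = true : bool
- Str.output("", PT.yield pt);
001233
val it = () : unit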

Exercise 4.1.4 Let X = { 0^i 1^j 2^k | i, j, k ∈ N and (i ≠ j or j ≠ k) }. Find a grammar G such that L(G) = X.

4.1.4 Notes

Traditionally, the meaning of grammars is defined using derivations, with parse trees being introduced subsequently. In contrast, we have no need for derivations, and find parse trees a much more intuitive way to define the meaning of grammars. Pleasingly, parse trees are an instance of the trees introduced in Section 1.3. Thus the terminology, techniques and results of that section are applicable to them.

4.2 Isomorphism of Grammars

In this section, we study grammar isomorphism, i.e., the way in which grammars can have the same structure, even though they may have different variables.

4.2.1 Definition and Algorithm

Suppose G is the grammar with variables A and B, start variable A and productions:

A → 0A1 | B, B → % | 2A.

And, suppose H is the grammar with variables B and A, start variable B and productions:

B → 0B1 | A, A → % | 2B. 4.2 Isomorphism of Grammars 269

H can be formed from G by renaming the variables A and B of G to B and A, respectively. As a result, we say that G and H are isomorphic. Suppose G is as before, but that H is the grammar with variables 2 and A, start variable 2 and productions:

2 → 021 | A, A → % | 22.

Then H can be formed from G by renaming the variables A and B to 2 and A, respectively. But we shouldn’t consider G and H to be isomorphic, since the symbol 2 is in both alphabet G and QH . In fact, G and H generate different languages. A grammar’s variables (e.g., A) can’t be renamed to elements of the grammar’s alphabet (e.g., 2). An isomorphism h from a grammar G to a grammar H is a bijection from QG to QH such that:

• h turns G into H; and

• alphabet G ∩ QH = ∅, i.e., none of the symbols in G’s alphabet are variables of H.

We say that G and H are isomorphic iff there is an isomorphism between G and H. As expected, we have that the relation of being isomorphic is reflexive on Gram, symmetric and transitive, and that isomorphism implies having the same alphabet and equivalence. There is an algorithm for finding an isomorphism from one grammar to another, if one exists, or reporting that there is no such isomorphism. It’s similar to the algorithm for finding an isomorphism between finite automata. The function renameVariables takes in a pair (G, f), where G is a grammar and f is a bijection from QG to a set of symbols with the property that range f ∩ alphabet G = ∅, and returns the grammar produced from G by renaming G’s variables using the bijection f. The resulting grammar will be isomorphic to G. The following function is a special case of renameVariables. The function renameVariablesCanonically ∈ Gram → Gram renames the variables of a grammar G to:

• A, B, etc., when the grammar has no more than 26 variables (the smallest variable of G will be renamed to A, the next smallest one to B, etc.); or

• h1i, h2i, etc., otherwise.

These variables will actually be surrounded by a uniform number of extra brackets, if this is needed to make the new grammar's variables and the original grammar's alphabet be disjoint.

4.2.2 Isomorphism Finding/Checking in Forlan

The Forlan module Gram contains the following functions for finding and processing isomorphisms in Forlan:

val isomorphism : gram * gram * sym_rel -> bool
val findIsomorphism : gram * gram -> sym_rel
val isomorphic : gram * gram -> bool
val renameVariables : gram * sym_rel -> gram
val renameVariablesCanonically : gram -> gram

The function isomorphism checks whether a relation on symbols is an isomorphism from one grammar to another. The function findIsomorphism tries to find an isomorphism from one grammar to another; it issues an error message if there isn't one. The function isomorphic checks whether two grammars are isomorphic. The function renameVariables issues an error message if the supplied relation isn't a bijection from the set of variables of the supplied grammar to some set; otherwise, it returns the result of the mathematical function renameVariables. And the function renameVariablesCanonically acts like the mathematical function renameVariablesCanonically.

A → 0A1 | B, B → % | 2A.

Suppose the identifier gram’ of type gram is bound to the grammar with variables B and A, start variable B and productions:

B → 0B1 | A, A → % | 2B.

And, suppose the identifier gram’’ of type gram is bound to the grammar with variables 2 and A, start variable 2 and productions:

2 → 021 | A, A → % | 22.

Here are some examples of how the above functions can be used:

- val rel = Gram.findIsomorphism(gram, gram');
val rel = - : sym_rel
- SymRel.output("", rel);
(A,B),(B,A)
val it = () : unit
- Gram.isomorphism(gram, gram', rel);
val it = true : bool
- Gram.isomorphic(gram, gram'');
val it = false : bool
- Gram.isomorphic(gram', gram'');
val it = false : bool
- val gram = Gram.input "";
@ {variables} B, C
@ {start variable} B
@ {productions} B -> AC; C -> <a>
@ .
val gram = - : gram
- SymSet.output("", Gram.alphabet gram);
A, <a>
val it = () : unit
- Gram.output("", Gram.renameVariablesCanonically gram);
{variables} <A>, <B>
{start variable} <A>
{productions} <A> -> A<B>; <B> -> <a>
val it = () : unit

4.2.3 Notes

Considering grammar isomorphism is non-traditional, but relatively straightforward. We were led to doing so mostly because of the need to support variable renaming.

4.3 A Parsing Algorithm

In this section, we consider a simple, fairly inefficient parsing algorithm that works for all context-free grammars. In Section 4.6, we consider an efficient parsing method that works for grammars for languages of operators of varying precedences and associativities. Compilers courses cover efficient algorithms that work for various subsets of the context-free grammars.

4.3.1 Algorithm

Suppose G is a grammar, w ∈ Str and a ∈ Sym. We consider an algorithm for testing whether w is parsable from a using G. If w ∉ (QG ∪ alphabet G)∗ or a ∉ QG ∪ alphabet w, then the algorithm returns false. Otherwise, it proceeds as follows. Let A = QG ∪ alphabet w and B = { x ∈ Str | x is a substring of w }. The algorithm generates the least subset X of A × B such that:

(1) For all a ∈ alphabet w, (a, a) ∈ X;

(2) For all q ∈ QG, if q → % ∈ PG, then (q, %) ∈ X; and

(3) For all q ∈ QG, n ∈ N − {0}, a1, . . . , an ∈ A and x1,...,xn ∈ B, if

• q → a1 · · · an ∈ PG,

• for all i ∈ [1 : n], (ai,xi) ∈ X, and

• x1 · · · xn ∈ B,

then (q, x1 · · · xn) ∈ X.

Since A × B is finite, this process terminates. For example, let G be the grammar

A → BC | CD, B → 0 | CB, C → 1 | DD, D → 0 | BC,

and let w = 0010 and a = A = sG. We have that:

• (0, 0) ∈ X;

• (1, 1) ∈ X;

• (B, 0) ∈ X, since B → 0 ∈ PG, (0, 0) ∈ X and 0 ∈ B;

• (C, 1) ∈ X, since C → 1 ∈ PG, (1, 1) ∈ X and 1 ∈ B;

• (D, 0) ∈ X, since D → 0 ∈ PG, (0, 0) ∈ X and 0 ∈ B;

• (A, 01) ∈ X, since A → BC ∈ PG, (B, 0) ∈ X, (C, 1) ∈ X and 01 ∈ B;

• (A, 10) ∈ X, since A → CD ∈ PG, (C, 1) ∈ X, (D, 0) ∈ X and 10 ∈ B;

• (B, 10) ∈ X, since B → CB ∈ PG, (C, 1) ∈ X, (B, 0) ∈ X and 10 ∈ B;

• (C, 00) ∈ X, since C → DD ∈ PG, (D, 0) ∈ X, (D, 0) ∈ X and 00 ∈ B;

• (D, 01) ∈ X, since D → BC ∈ PG, (B, 0) ∈ X, (C, 1) ∈ X and 01 ∈ B;

• (C, 001) ∈ X, since C → DD ∈ PG, (D, 0) ∈ X, (D, 01) ∈ X and 0(01) ∈ B;

• (C, 010) ∈ X, since C → DD ∈ PG, (D, 01) ∈ X, (D, 0) ∈ X and (01)0 ∈ B;

• (A, 0010) ∈ X, since A → BC ∈ PG, (B, 0) ∈ X, (C, 010) ∈ X and 0(010) ∈ B;

• (B, 0010) ∈ X, since B→CB ∈ PG, (C, 00) ∈ X, (B, 10) ∈ X and (00)(10) ∈ B;

• (D, 0010) ∈ X, since D → BC ∈ PG, (B, 0) ∈ X, (C, 010) ∈ X and 0(010) ∈ B;

• Nothing more can be added to X. To verify this, one must check that nothing new can be added to X using rule (3).
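This fixed-point computation is easy to prototype. Below is a minimal Standard ML sketch that computes X for the example grammar and w = 0010. The representations (symbols as strings, strings as symbol lists, a grammar as a list of productions) are assumptions made for this illustration, and the code computes the least X naively rather than in the staged fashion described later in this section; it is not the Forlan implementation.

(* Sketch of the parsability algorithm, with assumed representations:
   a symbol is a string, a string is a symbol list, and a grammar is a
   list of productions (q, rhs). Not the Forlan implementation. *)
type sym = string
type str = sym list

(* example grammar: A -> BC | CD, B -> 0 | CB, C -> 1 | DD, D -> 0 | BC *)
val prods : (sym * str) list =
  [("A", ["B","C"]), ("A", ["C","D"]), ("B", ["0"]), ("B", ["C","B"]),
   ("C", ["1"]), ("C", ["D","D"]), ("D", ["0"]), ("D", ["B","C"])]

val w : str = ["0","0","1","0"]

fun mem x ys = List.exists (fn y => y = x) ys

(* the set B: all substrings of w (repetitions are harmless here) *)
fun prefixes xs = List.tabulate (length xs + 1, fn i => List.take (xs, i))
fun suffixes nil = [nil]
  | suffixes (xs as _ :: rest) = xs :: suffixes rest
val bs = List.concat (map prefixes (suffixes w))

(* all ways of realizing a right-hand side a1 ... an as a concatenation
   x1 ... xn with each (ai, xi) in x *)
fun splits _ nil = [nil]
  | splits x (a :: rest) =
      List.concat
        (map (fn tl =>
                List.mapPartial
                  (fn (b, y) => if b = a then SOME (y @ tl) else NONE)
                  x)
             (splits x rest))

(* close x under rule (3) until nothing new can be added *)
fun close x =
  let
    val cands =
      List.concat
        (map (fn (q, rhs) =>
                List.mapPartial
                  (fn y => if mem y bs then SOME (q, y) else NONE)
                  (splits x rhs))
             prods)
    fun dedup nil = nil
      | dedup (p :: ps) = p :: dedup (List.filter (fn p' => p' <> p) ps)
    val new = dedup (List.filter (fn p => not (mem p x)) cands)
  in
    if null new then x else close (x @ new)
  end

(* rules (1) and (2): pairs (a, a) for the symbols of w, plus the
   %-productions (of which this grammar has none) *)
val base =
  map (fn a => (a, [a])) ["0", "1"] @
  List.mapPartial (fn (q, rhs) => if null rhs then SOME (q, nil) else NONE)
                  prods

val bigX = close base
val parsable0010 = mem ("A", w) bigX   (* evaluates to true *)

Evaluating parsable0010 yields true, matching the hand computation above.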

Back in the general case, we have these lemmas:

Lemma 4.3.1 For all (b, x) ∈ X, there is a pt ∈ PT such that

• pt is valid for G,

• rootLabel pt = b, and

• yield pt = x.

Lemma 4.3.2 For all pt ∈ PT, if

• pt is valid for G,

• rootLabel pt ∈ A, and

• yield pt ∈ B,

then (rootLabel pt, yield pt) ∈ X.

Because of our lemmas, to determine if w is parsable from a, we just have to check whether (a, w) ∈ X. In the case of our example grammar, we have that w = 0010 is parsable from a = A, since (A, 0010) ∈ X. Hence 0010 ∈ L(G).

Note that any production whose right-hand side contains an element of alphabet G − alphabet w won't affect the generation of X. Thus our algorithm ignores such productions. Furthermore, our parsability algorithm actually generates X in a sequence of stages. At each point, it has subsets U and V of A × B. U consists of the older elements of X, whereas V consists of the most recently added elements.

• First, it lets U = ∅ and sets V to be the union of { (a, a) | a ∈ alphabet w } and { (q, %) | (q, %) ∈ PG }. It then enters its main loop.

• At a stage of the loop's iteration, it lets Y be U ∪ V , and then lets Z be the set of all (q, x1 · · · xn) such that n ≥ 1 and there are a1, . . . , an ∈ A, x1, . . . , xn ∈ B and i ∈ [1 : n] such that

– q → a1 · · · an ∈ PG,

– (ai,xi) ∈ V ,

– for all k ∈ [1 : n] − {i}, (ak,xk) ∈ Y ,

– x1 · · · xn ∈ B, and

– (q, x1 · · · xn) ∉ Y .

If Z ≠ ∅, then it sets U to Y , and V to Z, and repeats; otherwise, the result is Y .

We say that a parse tree pt is a minimal parse of a string w from a symbol a using a grammar G iff pt is valid for G, rootLabel pt = a and yield pt = w, and there is no strictly smaller pt′ ∈ PT such that pt′ is valid for G, rootLabel pt′ = a and yield pt′ = w.

We can convert our parsability algorithm into a parsing algorithm as follows. Given w ∈ (QG ∪ alphabet G)∗ and a ∈ (QG ∪ alphabet w), we generate our set X as before, but we annotate each element (b, x) of X with a parse tree pt such that

• pt is valid for G,

• rootLabel pt = b, and

• yield pt = x.

Thus we can return the parse tree labeling (a, w), if this pair is in X, and indicate failure otherwise. With a little more work, we can arrange that the parse trees returned by our parsing algorithm are minimally-sized, and this is what the official version of our parsing algorithm guarantees. This goal is a little tricky to achieve, since some pairs will first be labeled by parse trees that aren’t minimally sized. But we keep going as long as either new pairs are found, or smaller parse trees are found for existing pairs.

4.3.2 Parsing in Forlan

The Forlan module Gram defines the functions

val parsable : gram -> sym * str -> bool
val generatedFromVariable : gram -> sym * str -> bool
val generated : gram -> str -> bool

The function parsable tests whether a string w is parsable from a symbol a using a grammar G. The function generatedFromVariable tests whether a string w is generated from a variable q using a grammar G; it issues an error message if q isn’t a variable of G. And the function generated tests whether a string w is generated by a grammar G. Gram also includes:

val parse : gram -> sym * str -> pt
val parseAlphabetFromVariable : gram -> sym * str -> pt
val parseAlphabet : gram -> str -> pt

The function parse tries to find a minimal parse of a string w from a symbol a using a grammar G; it issues an error message if w ∉ (QG ∪ alphabet G)∗, or a ∉ QG ∪ alphabet w, or such a parse doesn't exist. The function parseAlphabetFromVariable tries to find a minimal parse of a string w ∈ (alphabet G)∗ from a variable q using a grammar G; it issues an error message if q ∉ QG, or w ∉ (alphabet G)∗, or such a parse doesn't exist. And the function parseAlphabet tries to find a minimal parse of a string w ∈ (alphabet G)∗ from sG using a grammar G; it issues an error message if w ∉ (alphabet G)∗, or such a parse doesn't exist.

Suppose that gram of type gram is bound to the grammar

A → BC | CD, B → 0 | CB, C → 1 | DD, D → 0 | BC.

We can check whether some strings are generated by this grammar as follows:

- Gram.generated gram (Str.fromString "0010");
val it = true : bool
- Gram.generated gram (Str.fromString "0100");
val it = true : bool
- Gram.generated gram (Str.fromString "0101");
val it = false : bool

And we can try to find parses of some strings as follows:

- fun test s =
=     PT.output
=       ("",
=        Gram.parseAlphabet gram (Str.fromString s));
val test = fn : string -> unit
- test "0010";
A(C(D(0), D(B(0), C(1))), D(0))
val it = () : unit
- test "0100";
A(C(D(B(0), C(1)), D(0)), D(0))
val it = () : unit
- test "0101";
no such parse exists

uncaught exception Error

But we can also check parsability of strings containing variables, as well as try to find parses of such strings:

- Gram.parsable gram
=     (Sym.fromString "A", Str.fromString "0D0C");
val it = true : bool
- PT.output
=     ("",
=      Gram.parse gram
=        (Sym.fromString "A", Str.fromString "0D0C"));
A(C(D(0), D), D(B(0), C))
val it = () : unit

4.3.3 Notes

Our parsability and parsing algorithms are straightforward generalizations of the familiar algorithm for checking whether a grammar in Chomsky Normal Form generates a string.

4.4 Simplification of Grammars

In this section, we say what it means for a grammar to be simplified, give a simplification algorithm for grammars, and see how to use this algorithm in Forlan.

4.4.1 Definition and Algorithm

Suppose G is the grammar

A → BB1, B → 0 | A | CD, C → 12, D → 1D2.

This grammar is odd for two, related, reasons. First, D doesn't generate anything, i.e., there is no parse tree pt such that pt is valid for G, the root label of pt is D, and the yield of pt is in (alphabet G)∗ = {0, 1, 2}∗. Second, there is no valid parse tree that starts at G's start variable, A, has a yield that is in {0, 1, 2}∗ and makes use of C. But if we first removed D, and all productions involving it, then our objection to C could be simpler: there would be no parse tree pt such that pt is valid for G, the root label of pt is A, and C appears in the yield of pt.

This example leads us to the following definitions. Suppose G is a grammar. We say that a variable q of G is:

• reachable in G iff there is a w ∈ Str such that w is parsable from sG using G, and q ∈ alphabet w;

• generating in G iff there is a w ∈ Str such that q generates w using G, i.e., w is parsable from q using G, and w ∈ (alphabet G)∗;

• useful in G iff q is both reachable and generating in G.

The reader should compare these definitions with the definitions given in Section 3.7 of reachable, live and useful states. Also note that the standard definition of being useful is stronger than our definition: there is a parse tree pt such that pt is valid for G, rootLabel pt = sG, yield pt ∈ (alphabet G)∗, and q appears in pt. For example, the variable C of our example grammar G is useful in our sense, but not useful in the standard sense. But as we observed above, C will no longer be reachable (and thus useful) if all productions involving D are removed. In general, we have that, if all variables of a grammar are useful in our sense, then all variables of the grammar are useful in the standard sense.

Now, suppose H is the grammar

A → % | 0 | AA | AAA.

Here, we have that the productions A→AA and A→AAA are redundant, although only one of them can be removed:

A(A(%), A, A)    A(A(A, A), A)

Thus any use of A → AA in a parse tree can be replaced by uses of A → % and A → AAA, and any use of A → AAA in a parse tree can be replaced by two uses of A → AA.

This example leads us to the following definitions. Given a grammar G and a finite subset U of { (q, x) | q ∈ QG and x ∈ Str }, we write G/U for the grammar that is identical to G except that its set of productions is U. If G is a grammar and (q, x) ∈ PG, we say that:

• (q,x) is redundant in G iff x is parsable from q using H, where H = G/(PG − {(q,x)}); and

• (q,x) is irredundant in G iff (q,x) is not redundant in G.

Now we are able to say when a grammar is simplified. The reader should compare this definition with the definition in Section 3.7 of when a finite automaton is simplified. A grammar G is simplified iff either

• every variable of G is useful, and every production of G is irredundant; or

• |QG| = 1 and PG = ∅.

The second case is necessary, because otherwise there would be no simplified grammar generating ∅.

Proposition 4.4.1 If G is a simplified grammar, then alphabet G = alphabet(L(G)).

Proof. Suppose a ∈ alphabet G. We must show that a ∈ alphabet w for some w ∈ L(G). We have that every variable of G is useful, and there are q ∈ QG and x ∈ Str such that (q, x) ∈ PG and a ∈ alphabet x. Thus x is parsable from q. Since every variable occurring in x is generating, we have that q generates a string x′ containing a. Since q is reachable, there is a string y such that y is parsable from sG, and q ∈ alphabet y. Since every variable occurring in y is generating, there is a string y′ such that y′ is parsable from sG, and q is the only variable of alphabet y′. Putting these facts together, we have that sG generates a string w such that a ∈ alphabet w, i.e., a ∈ alphabet w for some w ∈ L(G). ✷

Next, we give an algorithm for removing redundant productions. Given a grammar G, q ∈ QG and x ∈ Str, we say that (q, x) is implicit in G iff x is parsable from q using G. Given a grammar G, we define a function remRedunG ∈ P PG × P PG → P PG by well-founded recursion on the size of its second argument. For U, V ⊆ PG, remRedunG(U, V ) proceeds as follows:

• If V = ∅, then it returns U.

• Otherwise, let v be the greatest element of { (q, x) ∈ V | there are no p ∈ Sym and y ∈ Str such that (p, y) ∈ V and |y| > |x| }, and V′ = V − {v}. If v is implicit in G/(U ∪ V′), then remRedunG returns the result of evaluating remRedunG(U, V′). Otherwise, it returns the result of evaluating remRedunG(U ∪ {v}, V′).

In general, there are multiple (incompatible) ways of removing redundant productions from a grammar. remRedunG is defined so as to favor removing productions whose right-hand sides are longer; and among productions whose right-hand sides have equal length, to favor removing productions that are larger in our total ordering on productions. Our algorithm for removing redundant productions of a grammar G returns G/(remRedunG(∅, PG)).

For example, if we run our algorithm for removing redundant productions on

A → % | 0 | AA | AAA, we obtain

A → % | 0 | AA.

Our simplification algorithm for grammars proceeds as follows, given a grammar G.

• First, it determines which variables of G are generating. If sG isn’t one of these variables, then it returns the grammar with variable sG and no productions.

• Next, it turns G into a grammar G′ by deleting all non-generating vari- ables, and deleting all productions involving such variables.

• Then, it determines which variables of G′ are reachable.

• Next, it turns G′ into a grammar G′′ by deleting all non-reachable variables, and deleting all productions involving such variables.

• Finally, it removes redundant productions from G′′.

Suppose G, once again, is the grammar

A → BB1, B → 0 | A | CD, C → 12, D → 1D2.

Here is what happens if we apply our simplification algorithm to G.

• First, we determine which variables are generating. Clearly B and C are. And, since B is, it follows that A is, because of the production A → BB1. (If this production had been A → BD1, we wouldn't have added A to our set.)

• Thus, we form G′ from G by deleting the variable D, yielding the grammar

A → BB1, B → 0 | A, C → 12.

• Next, we determine which variables of G′ are reachable. Clearly A is, and thus B is, because of the production A → BB1. Note that, if we carried out the two stages of our simplification algorithm in the other order, then C and its production would never be deleted.

• Next, we form G′′ from G′ by deleting the variable C, yielding the grammar

A → BB1, B → 0 | A.

• Finally, we would remove redundant productions from G′′. But G′′ has no redundant productions, and so we are done.
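The marking loop used in the first step (finding the generating variables) is also easy to prototype. Here is a minimal SML sketch under assumed representations (variables and alphabet symbols as strings, productions as pairs); it is illustrative only, not the Forlan implementation:

(* find the generating variables of a grammar by iterated marking *)
fun generating (vars : string list)
               (prods : (string * string list) list) : string list =
  let
    fun isVar a = List.exists (fn q => q = a) vars
    fun known g a = List.exists (fn p => p = a) g
    (* one pass: add q when some production q -> rhs has every symbol of
       rhs either an alphabet symbol or an already-marked variable *)
    fun pass g =
      List.foldl
        (fn ((q, rhs), g') =>
           if known g' q then g'
           else if List.all (fn a => not (isVar a) orelse known g' a) rhs
           then q :: g'
           else g')
        g prods
    fun loop g = let val g' = pass g
                 in if length g' = length g then g else loop g' end
  in
    loop nil
  end

For the example grammar, generating ["A","B","C","D"] [("A",["B","B","1"]), ("B",["0"]), ("B",["A"]), ("B",["C","D"]), ("C",["1","2"]), ("D",["1","D","2"])] marks B, C and then A, but never D, matching the computation above.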

We define a function simplify ∈ Gram → Gram by: for all G ∈ Gram, simplify G is the result of running the above algorithm on G.

Theorem 4.4.2 For all G ∈ Gram:

(1) simplify G is simplified;

(2) simplify G ≈ G; and

(3) alphabet(simplify G)= alphabet(L(G)) ⊆ alphabet G.

Our simplification function/algorithm simplify gives us an algorithm for testing whether a grammar is simplified: we apply simplify to it, and check that the resulting grammar is equal to the original one.

Our simplification algorithm also gives us an algorithm for testing whether the language generated by a grammar G is empty. We first simplify G, calling the result H. We then test whether PH = ∅. If the answer is "yes", clearly L(G) = L(H) = ∅. And if the answer is "no", then sH is useful, and so H (and thus G) generates at least one string.
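For instance, combining Gram.simplify with the Gram.numProductions function used later in this chapter, the emptiness test can be phrased as a one-line check (a sketch; the output shown assumes gram is bound to a grammar generating at least one string):

- Gram.numProductions (Gram.simplify gram) = 0;
val it = false : bool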

4.4.2 Simplification in Forlan

The Forlan module Gram defines the functions

val simplify : gram -> gram
val simplified : gram -> bool

The function simplify corresponds to the mathematical function simplify, and simplified tests whether a grammar is simplified. Suppose gram of type gram is bound to the grammar

A → BB1, B → 0 | A | CD, C → 12, D → 1D2.

We can simplify our grammar as follows:

- val gram' = Gram.simplify gram;
val gram' = - : gram
- Gram.output("", gram');
{variables} A, B
{start variable} A
{productions} A -> BB1; B -> 0 | A
val it = () : unit

And suppose gram'' of type gram is bound to the grammar

A → % | 0 | AA | AAA | AAAA.

We can simplify our grammar as follows:

- val gram''' = Gram.simplify gram'';
val gram''' = - : gram
- Gram.output("", gram''');
{variables} A
{start variable} A
{productions} A -> % | 0 | AA
val it = () : unit

4.4.3 Hand-simplification Operations

Given a simplified grammar G, there are often ways we can hand-simplify the grammar further. Below are two examples:

• Suppose G has a variable q that is not sG, and where no production having q as its left-hand side is self-recursive, i.e., has q as one of the symbols of its right-hand side. Let x1,...,xn be the right-hand sides of all of q’s productions. (n ≥ 1, as G is simplified.) Then we can form an equivalent grammar G′ by deleting q and its productions from G, and transforming each remaining production p → y of G into all the productions from p that can be formed by substituting for each occurrence of q in y some choice of xi. We refer to this operation as eliminating q from G.

• Suppose there is exactly one production of G involving sG, where that production has the form sG → q, for some variable q of G. Then we can form an equivalent grammar G′ by deleting sG and sG → q from G, and making q be the start variable of G′. We refer to this operation as restarting G.

The Forlan module Gram has functions corresponding to these two operations:

val eliminateVariable : gram * sym -> gram
val restart : gram -> gram

Both begin by simplifying the supplied grammar. For instance, suppose gram is the grammar

A → B, B → 0 | C3C, C → 1B2 | 2B1.

Then we can proceed as follows:

- val gram' = Gram.eliminateVariable(gram, Sym.fromString "C");
val gram' = - : gram
- Gram.output("", gram');
{variables} A, B
{start variable} A
{productions} A -> B; B -> 0 | 1B231B2 | 1B232B1 | 2B131B2 | 2B132B1
val it = () : unit
- val gram'' = Gram.restart gram;
val gram'' = - : gram
- Gram.output("", gram'');
{variables} B, C
{start variable} B
{productions} B -> 0 | C3C; C -> 1B2 | 2B1
val it = () : unit

4.4.4 Notes

As described above, our definition of useless variable is weaker than the standard one. However, whenever every variable of a grammar is useful in our sense, it follows that every variable of the grammar is useful in the standard sense. Furthermore, our algorithm for removing useless variables is the standard one. Requiring that simplified grammars have no redundant productions is natural, although non-standard, and our algorithm for removing redundant productions is straightforward. The algorithms for restarting and eliminating variables from grammars are non-standard but obvious.

4.5 Proving the Correctness of Grammars

In this section, we consider techniques for proving the correctness of grammars, i.e., for proving that grammars generate the languages we want them to.

4.5.1 Preliminaries

Suppose G is a grammar and a ∈ QG ∪ alphabet G. Then

ΠG,a = { w ∈ (alphabet G)∗ | w is parsable from a using G }.

I.e., ΠG,a is all strings over alphabet G that are the yields of valid parse trees for G whose root labels are a. If it's clear which grammar we are talking about, we often abbreviate ΠG,a to Πa. Clearly, ΠG,sG = L(G). For example, if G is the grammar

A → % | 0A1

then Π0 = {0}, Π1 = {1} and ΠA = { 0n1n | n ∈ N } = L(G).

Proposition 4.5.1 Suppose G is a grammar.

(1) For all a ∈ alphabet G, ΠG,a = {a}.

(2) For all q ∈ QG, if q → % ∈ PG, then % ∈ ΠG,q.

(3) For all q ∈ QG, n ∈ N − {0}, a1, . . . , an ∈ Sym and w1, . . . , wn ∈ Str, if

q → a1 · · · an ∈ PG and w1 ∈ ΠG,a1, . . . , wn ∈ ΠG,an, then w1 · · · wn ∈ ΠG,q.

Our main example will be the grammar G:

A → % | 0BA | 1CA,
B → 1 | 0BB,
C → 0 | 1CC.

Define diff ∈ {0, 1}∗ → Z by: for all w ∈ {0, 1}∗,

diff w = the number of 1's in w − the number of 0's in w.

Then: diff % = 0, diff 1 = 1, diff 0 = −1, and, for all x, y ∈ {0, 1}∗, diff(xy) = diff x + diff y. Let

X = { w ∈ {0, 1}∗ | diff w = 0 },
Y = { w ∈ {0, 1}∗ | diff w = 1 and, for all proper prefixes v of w, diff v ≤ 0 }, and
Z = { w ∈ {0, 1}∗ | diff w = −1 and, for all proper prefixes v of w, diff v ≥ 0 }.
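As an aside, diff is easy to compute for experimentation. Here is a one-line SML sketch, representing a string over {0, 1} as a char list (an assumption made for this illustration):

(* diff w = (number of 1's in w) - (number of 0's in w) *)
fun diff (w : char list) : int =
  List.foldl (fn (c, d) => if c = #"1" then d + 1 else d - 1) 0 w

(* e.g., diff (explode "0011") = 0 and diff (explode "001") = ~1 *)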

We will prove that L(G) = ΠG,A = X, ΠG,B = Y and ΠG,C = Z.

Lemma 4.5.2 Suppose x ∈ {0, 1}∗.

(1) If diff x ≥ 1, then x = yz for some y, z ∈ {0, 1}∗ such that y ∈ Y and diff z = diff x − 1.

(2) If diff x ≤ −1, then x = yz for some y, z ∈ {0, 1}∗ such that y ∈ Z and diff z = diff x + 1.

Proof. We show the proof of (1), the proof of (2) being similar. Let y ∈ {0, 1}∗ be the shortest prefix of x such that diff y ≥ 1, and let z ∈ {0, 1}∗ be such that x = yz. Because diff y ≥ 1, we have that y ≠ %. Thus y = y′a for some y′ ∈ {0, 1}∗ and a ∈ {0, 1}. By the definition of y, we have that diff y′ ≤ 0. Suppose, toward a contradiction, that a = 0. Since diff y′ + −1 = diff y ≥ 1, we have that diff y′ ≥ 2, contradicting the definition of y. Thus a = 1, so that y = y′1. Because diff y′ + 1 = diff y ≥ 1, we have that diff y′ ≥ 0. Thus diff y′ = 0, so that diff y = diff y′ + 1 = 1. By the definition of y, every prefix of y′ has a diff that is ≤ 0. Thus y ∈ Y . Finally, because diff x = diff y + diff z = 1 + diff z, we have that diff z = diff x − 1. ✷

4.5.2 Proving that Enough is Generated

First we study techniques for showing that everything we want a grammar to generate is really generated. Since X, Y, Z ⊆ {0, 1}∗, to prove that X ⊆ ΠG,A, Y ⊆ ΠG,B and Z ⊆ ΠG,C, it will suffice to use strong string induction to show that, for all w ∈ {0, 1}∗:

(A) if w ∈ X, then w ∈ ΠG,A;

(B) if w ∈ Y , then w ∈ ΠG,B; and

(C) if w ∈ Z, then w ∈ ΠG,C.

We proceed by strong string induction. Suppose w ∈ {0, 1}∗, and assume the inductive hypothesis: for all x ∈ {0, 1}∗, if x is a proper substring of w, then:

(A) if x ∈ X, then x ∈ ΠA;

(B) if x ∈ Y , then x ∈ ΠB; and

(C) if x ∈ Z, then x ∈ ΠC.

We must prove that:

(A) if w ∈ X, then w ∈ ΠA;

(B) if w ∈ Y , then w ∈ ΠB; and

(C) if w ∈ Z, then w ∈ ΠC.

(A) Suppose w ∈ X. We must show that w ∈ ΠA. There are three cases to consider.

• Suppose w = %. Because A → % ∈ P , Proposition 4.5.1(2) tells us that w = % ∈ ΠA.

• Suppose w = 0x, for some x ∈ {0, 1}∗. Because −1 + diff x = diff w = 0, we have that diff x = 1. Thus, by Lemma 4.5.2(1), we have that x = yz, for some y, z ∈ {0, 1}∗ such that y ∈ Y and diff z = diff x − 1 = 1 − 1 = 0. Thus w = 0yz, y ∈ Y and z ∈ X. By Proposition 4.5.1(1), we have 0 ∈ Π0. Because y ∈ Y and z ∈ X are proper substrings of w, parts (B) and (A) of the inductive hypothesis tell us that y ∈ ΠB and z ∈ ΠA. Thus, because A → 0BA ∈ P , Proposition 4.5.1(3) tells us that w = 0yz ∈ ΠA.

• Suppose w = 1x, for some x ∈ {0, 1}∗. The proof is analogous to the preceding case.

(B) Suppose w ∈ Y . We must show that w ∈ ΠB. Because diff w = 1, there are two cases to consider.

• Suppose w = 1x, for some x ∈ {0, 1}∗. Because all proper prefixes of w have diffs ≤ 0, we have that x = %, so that w = 1. Since B → 1 ∈ P , we have that w = 1 ∈ ΠB.

• Suppose w = 0x, for some x ∈ {0, 1}∗. Thus diff x = 2. Because diff x ≥ 1, by Lemma 4.5.2(1), we have that x = yz, for some y, z ∈ {0, 1}∗ such that y ∈ Y and diff z = diff x − 1 = 2 − 1 = 1. Hence w = 0yz. To finish the proof that z ∈ Y , suppose v is a proper prefix of z. Thus 0yv is a proper prefix of w. Since w ∈ Y , it follows that diff v = diff(0yv) ≤ 0, as required. Since y, z ∈ Y , part (B) of the inductive hypothesis tells us that y, z ∈ ΠB. Thus, because B → 0BB ∈ P , we have that w = 0yz ∈ ΠB.

(C) Suppose w ∈ Z. We must show that w ∈ ΠC. The proof is analogous to the proof of part (B).

Suppose H is the grammar

A → B | 0A3, B → % | 1B2, and let

X = { 0n1m2m3n | n,m ∈ N }, Y = { 1m2m | m ∈ N }.

We can prove that X ⊆ ΠH,A = L(H) and Y ⊆ ΠH,B using the above technique, but the production A → B, which is called a unit production because its right side is a single variable, makes part (A) tricky. In Section 3.8, when considering techniques for showing the correctness of finite automata, we ran into a similar problem having to do with %-transitions. If w = 1m2m ∈ Y (the n = 0 case), we would like to use part (B) of the inductive hypothesis to conclude w ∈ ΠB, and then use the fact that A → B ∈ P to conclude that w ∈ ΠA. But w is not a proper substring of itself, and so the inductive hypothesis is not applicable. Instead, we must split into cases m = 0 and m ≥ 1, using A → B and B → %, in the first case, and A → B and B → 1B2, as well as the inductive hypothesis on 1m−12m−1 ∈ Y , in the second case.

Because there are no productions from B back to A, we could also first use strong string induction to prove that, for all w ∈ {0, 1, 2, 3}∗,

(B) if w ∈ Y , then w ∈ ΠB,

and then use the result of this induction along with strong string induction to prove that, for all w ∈ {0, 1, 2, 3}∗,

(A) if w ∈ X, then w ∈ ΠA.

This works whenever two parts of a grammar are not mutually recursive.

With this grammar, we could also first use mathematical induction to prove that, for all m ∈ N, 1m2m ∈ ΠB, and then use the result of this induction to prove, by mathematical induction on n, that for all n, m ∈ N, 0n1m2m3n ∈ ΠA.

Note that %-productions, i.e., productions of the form q → %, can cause similar problems to those caused by unit productions. E.g., if we have the productions A → BC and B → %, then A → BC behaves like a unit production.

4.5.3 Proving that Everything Generated is Wanted

When proving that everything generated by a grammar is wanted, we could sometimes use strong string induction, simply reversing the implications used when proving that enough is generated. But this approach fails for grammars with unit productions, where we would have to resort to an induction on parse trees. In Section 3.8, when considering techniques for showing the correctness of finite automata, we ran into a similar problem having to do with %-transitions, and this led to our introducing the Principle of Induction on Λ. Here, we introduce an induction principle called Induction on Π.

Theorem 4.5.3 (Principle of Induction on Π) Suppose G is a grammar, Pq(w) is a property of a string w ∈ ΠG,q, for all q ∈ QG, and Pa(w), for a ∈ alphabet G, says “w = a”. If

(1) for all q ∈ QG, if q → % ∈ PG, then Pq(%), and

(2) for all q ∈ QG, n ∈ N − {0}, a1, . . . , an ∈ QG ∪ alphabet G, and w1 ∈ ΠG,a1, . . . , wn ∈ ΠG,an, if q → a1 · · · an ∈ PG and (†) Pa1(w1), . . . , Pan(wn), then Pq(w1 · · · wn),

then

for all q ∈ QG, for all w ∈ ΠG,q, Pq(w).

We refer to (†) as the inductive hypothesis.

Proof. It suffices to show that, for all pt ∈ PT, for all q ∈ QG and w ∈ (alphabet G)∗, if pt is valid for G, rootLabel pt = q and yield pt = w, then Pq(w). We prove this using the principle of induction on parse trees. ✷

When proving part (2), we can make use of the fact that, for ai ∈ alphabet G, Πai = {ai}, so that wi ∈ Πai will be ai. Hence it will be unnecessary to assume that Pai(ai), since this says "ai = ai", and so is always true.

Consider, again, our main example grammar G:

A → % | 0BA | 1CA,
B → 1 | 0BB,
C → 0 | 1CC.

Let

X = { w ∈ {0, 1}∗ | diff w = 0 },
Y = { w ∈ {0, 1}∗ | diff w = 1 and, for all proper prefixes v of w, diff v ≤ 0 },
Z = { w ∈ {0, 1}∗ | diff w = −1 and, for all proper prefixes v of w, diff v ≥ 0 }.

We have already proven that X ⊆ ΠA = L(G), Y ⊆ ΠB and Z ⊆ ΠC. To complete the proof that L(G) = ΠA = X, ΠB = Y and ΠC = Z, we will use induction on Π to prove that ΠA ⊆ X, ΠB ⊆ Y and ΠC ⊆ Z. We use induction on Π to show that:

(A) for all w ∈ ΠA, w ∈ X;

(B) for all w ∈ ΠB, w ∈ Y ; and

(C) for all w ∈ ΠC, w ∈ Z.

Formally, this means that we let the properties PA(w), PB(w) and PC(w) be "w ∈ X", "w ∈ Y " and "w ∈ Z", respectively, and then use the induction principle to prove that, for all q ∈ QG, for all w ∈ Πq, Pq(w). But we will actually work more informally. There are seven productions to consider.

(A → %) We must show that % ∈ X (as "w ∈ X" is the property of part (A)). And this holds since diff % = 0.

(A → 0BA) Suppose w1 ∈ ΠB and w2 ∈ ΠA (as 0BA is the right-side of the production, and 0 is in G’s alphabet), and assume the inductive hypothesis, w1 ∈ Y (as this is the property of part (B)) and w2 ∈ X (as this is the property of part (A)). We must show that 0w1w2 ∈ X, as the production shows that 0w1w2 ∈ ΠA. Because w1 ∈ Y and w2 ∈ X, we have that diff w1 = 1 and diff w2 = 0. Thus diff(0w1w2)= −1+1+0 = 0, showing that 0w1w2 ∈ X.

(B → 0BB) Suppose w1, w2 ∈ ΠB, and assume the inductive hypothesis, w1, w2 ∈ Y . Thus w1 and w2 are nonempty. We must show that 0w1w2 ∈ Y . Clearly, diff(0w1w2) = −1 + 1 + 1 = 1. So, suppose v is a proper prefix of 0w1w2. We must show that diff v ≤ 0. There are three cases to consider.

• Suppose v = %. Then diff v = 0 ≤ 0.

• Suppose v = 0u, for a proper prefix u of w1. Because w1 ∈ Y , we have that diff u ≤ 0. Thus diff v = −1+ diff u ≤−1 + 0 ≤ 0.

• Suppose v = 0w1u, for a proper prefix u of w2. Because w2 ∈ Y , we have that diff u ≤ 0. Thus diff v = −1+1+ diff u = diff u ≤ 0.

The remaining productions are handled similarly.

Exercise 4.5.4 Let X = { 0i1j2k3l | i, j, k, l ∈ N and i + j = k + l }. Find a grammar G such that L(G)= X, and prove that your answer is correct.

Exercise 4.5.5 Let X = { 0i1j2k | i, j, k ∈ N and (i ≠ j or j ≠ k) }. Find a grammar G such that L(G) = X, and prove that your answer is correct.

4.5.4 Notes

Books on formal language theory typically give short shrift to the proof of correctness of grammars, carrying out one or two correctness proofs using induction on the length of strings. In contrast, we have introduced and applied elegant techniques for proving the correctness of grammars. Of particular note is our principle of induction on Π.

4.6 Ambiguity of Grammars

In this section, we say what it means for a grammar to be ambiguous. We also give a straightforward method for disambiguating grammars for languages with operators of various precedences and associativities, and consider an efficient parsing algorithm for such disambiguated grammars.

4.6.1 Definition

Suppose G is our grammar of arithmetic expressions:

E → E⟨plus⟩E | E⟨times⟩E | ⟨openPar⟩E⟨closPar⟩ | ⟨id⟩.

Unfortunately, there are multiple ways of parsing ⟨id⟩⟨times⟩⟨id⟩⟨plus⟩⟨id⟩ according to this grammar:

pt1 = E(E(E(⟨id⟩), ⟨times⟩, E(⟨id⟩)), ⟨plus⟩, E(⟨id⟩)),
pt2 = E(E(⟨id⟩), ⟨times⟩, E(E(⟨id⟩), ⟨plus⟩, E(⟨id⟩)))

(shown here in our linear notation for parse trees).

In pt1, multiplication has higher precedence than addition; in pt2, the situation is reversed. Because there are multiple ways of parsing this string, we say that our grammar is "ambiguous". A grammar G is ambiguous iff there is a w ∈ (alphabet G)∗ such that w is the yield of multiple valid parse trees for G whose root labels are sG; otherwise, G is unambiguous.

Let G be the grammar

A → % | 0A1A | 1A0A,

which generates all elements of {0, 1}∗ with a diff of 0, for the diff function such that diff 0 = −1 and diff 1 = 1. It is ambiguous as, e.g., 0101 can be parsed as 0%1(01) or 0(10)1%. But in Section 4.5, we saw another grammar, H, for this language:

A → % | 0BA | 1CA,
B → 1 | 0BB,
C → 0 | 1CC,

which turns out to be unambiguous. The reason is that ΠB is all elements of {0, 1}∗ with a diff of 1, but with no proper prefixes with positive diffs, and ΠC has the corresponding property for 0/negative.

Exercise 4.6.1 Prove that L(G) = L(H).

Exercise 4.6.2 Prove that H is unambiguous.

4.6.2 Disambiguating Grammars of Operators

Not every ambiguous grammar can be turned into an equivalent unambiguous one. However, we can use a simple technique to disambiguate our grammar of arithmetic expressions, and this technique works for many commonly occurring grammars involving operators of various precedences and associativities. Since there are two binary operators in our language of arithmetic expressions, we have to decide:

• whether multiplication has higher or lower precedence than addition; and

• whether multiplication and addition are left or right associative.

As usual, we’ll make multiplication have higher precedence than addition, and let addition and multiplication be left associative. As a first step towards disambiguating our grammar, we can form a new grammar with the three variables: E (expressions), T (terms) and F (factors), start variable E and productions:

E → T | E⟨plus⟩E,
T → F | T⟨times⟩T,
F → ⟨id⟩ | ⟨openPar⟩E⟨closPar⟩.

The idea is that the lowest precedence operator "lives" at the highest level of the grammar, that the highest precedence operator lives at the middle level of the grammar, and that the basic expressions, including the parenthesized expressions, live at the lowest level of the grammar. Now, there is only one way to parse the string ⟨id⟩⟨times⟩⟨id⟩⟨plus⟩⟨id⟩, since, if we begin by using the production E → T, our yield will only include a ⟨plus⟩ if this symbol occurs within parentheses. If we had more levels of precedence in our language, we would simply add more levels to our grammar.

On the other hand, there are still two ways of parsing the string ⟨id⟩⟨plus⟩⟨id⟩⟨plus⟩⟨id⟩: with left associativity or right associativity. To finish disambiguating our grammar, we must break the symmetry of the right-sides of the productions

E → E⟨plus⟩E,
T → T⟨times⟩T,

turning one of the E's into T, and one of the T's into F. To make our operators be left associative, we must use left recursion, changing the second E to T, and the second T to F; right associativity would result from making the opposite choices, i.e., using right recursion. Thus, our unambiguous grammar of arithmetic expressions is

E → T | E⟨plus⟩T,
T → F | T⟨times⟩F,
F → ⟨id⟩ | ⟨openPar⟩E⟨closPar⟩.

It can be proved that this grammar is indeed unambiguous, and that it is equivalent to the original grammar. Now, the only parse of ⟨id⟩⟨times⟩⟨id⟩⟨plus⟩⟨id⟩ is

E(E(T(T(F(⟨id⟩)), ⟨times⟩, F(⟨id⟩))), ⟨plus⟩, T(F(⟨id⟩)))

And, the only parse of ⟨id⟩⟨plus⟩⟨id⟩⟨plus⟩⟨id⟩ is

E(E(E(T(F(⟨id⟩))), ⟨plus⟩, T(F(⟨id⟩))), ⟨plus⟩, T(F(⟨id⟩)))

4.6.3 Top-down Parsing

Top-down parsing is a simple and efficient parsing method for unambiguous grammars of operators like

E → T | E⟨plus⟩T,
T → F | T⟨times⟩F,
F → ⟨id⟩ | ⟨openPar⟩E⟨closPar⟩.

Let ℰ, 𝒯 and ℱ be all of the parse trees that are valid for our grammar, have yields containing no variables, and whose root labels are E, T and F, respectively. Because this grammar has three mutually recursive variables, we will need three mutually recursive parsing functions,

parE ∈ Str → Option(ℰ × Str),
parT ∈ Str → Option(𝒯 × Str),
parF ∈ Str → Option(ℱ × Str),

which attempt to parse an element pt of ℰ, 𝒯 or ℱ out of a string w, returning none to indicate failure, and some(pt, y), where y is the remainder of w, otherwise.

Although most programming languages support mutual recursion, in this book, we haven't formally justified well-founded mutual recursion. Instead, we can work with a single recursive function with domain {0, 1, 2} × Str, where the 0, 1 or 2 indicates whether it's parE, parT or parF, respectively, that is being called. The range of this function will be Option(U × Str), where U is the union of three disjoint sets: {0} × ℰ, {1} × 𝒯 and {2} × ℱ. E.g., when the function is called with (0, w), it will either return none or some((0, pt), z), where pt ∈ ℰ and z ∈ Str. But to keep the notation simple, below, we'll assume the parsing functions can call each other. The well-founded ordering we are using allows:

• parE to call parT with strings that are no longer than its argument;

• parT to call parF with strings that are no longer than its argument; and

• parF to call parE with strings that are strictly shorter than its argument.

When called with a string w, parE is supposed to determine whether there is a prefix x of w that is the yield of an element of ℰ. If there is such an x, then it finds the longest prefix x of w with this property, and returns some(pt, y), where pt is the element of ℰ whose yield is x, and y is such that w = xy. Otherwise, it returns none. parT and parF have similar specifications.

Given a string w, parE operates as follows. Because all elements of ℰ have yields beginning with the yield of an element of 𝒯, it starts by evaluating parT w. If this results in none, it returns none. Otherwise, it results in some(pt, x), for some pt ∈ 𝒯 and x ∈ Str, in which case parE returns parELoop(E(pt), x), where parELoop ∈ ℰ × Str → Option(ℰ × Str) is defined recursively, as follows. Given (pt, x) ∈ ℰ × Str, parELoop proceeds as follows.

• If x = ⟨plus⟩y for some y, then parELoop evaluates parT y.

– If this results in none, then parELoop returns none.

– Otherwise, it results in some(pt′, z) for some pt′ ∈ 𝒯 and z ∈ Str, and parELoop returns parELoop(E(pt, ⟨plus⟩, pt′), z).

• Otherwise, parELoop returns some(pt, x).

The function parT operates analogously. Given a string w, parF proceeds as follows.

• If w = ⟨id⟩x for some x, then it returns some(F(⟨id⟩), x).

• Otherwise, if w = ⟨openPar⟩x, then parF evaluates parE x.

– If this results in none, it returns none.

– Otherwise, this results in some(pt, y) for some pt ∈ ℰ and y ∈ Str.

∗ If y = ⟨closPar⟩z for some z, then parF returns

some(F(⟨openPar⟩, pt, ⟨closPar⟩), z).

∗ Otherwise, parF returns none.

• Otherwise parF returns none.

Given a string w to parse, the algorithm evaluates parE w. If the result of this evaluation is:

• none, then the algorithm reports failure;

• some(pt, %), then the algorithm returns pt;

• some(pt, y), where y ≠ %, then the algorithm reports failure, because not all of the input could be parsed.
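The three parsing functions and their loops translate almost directly into SML. The sketch below uses assumed representations (a datatype of tokens standing for ⟨plus⟩, ⟨times⟩, ⟨openPar⟩, ⟨closPar⟩ and ⟨id⟩, and a simple rose-tree type for parse trees); it follows the specification above, but it is not Forlan code.

datatype tok = Plus | Times | OpenPar | ClosPar | Id
datatype pt = Node of string * pt list   (* root label, children *)

fun leaf l = Node (l, nil)

(* parE : tok list -> (pt * tok list) option, and similarly parT, parF *)
fun parE w =
      (case parT w of
           NONE => NONE
         | SOME (t, x) => parELoop (Node ("E", [t]), x))
and parELoop (e, Plus :: y) =
      (case parT y of
           NONE => NONE
         | SOME (t, z) => parELoop (Node ("E", [e, leaf "<plus>", t]), z))
  | parELoop (e, x) = SOME (e, x)
and parT w =
      (case parF w of
           NONE => NONE
         | SOME (f, x) => parTLoop (Node ("T", [f]), x))
and parTLoop (t, Times :: y) =
      (case parF y of
           NONE => NONE
         | SOME (f, z) => parTLoop (Node ("T", [t, leaf "<times>", f]), z))
  | parTLoop (t, x) = SOME (t, x)
and parF (Id :: x) = SOME (Node ("F", [leaf "<id>"]), x)
  | parF (OpenPar :: x) =
      (case parE x of
           NONE => NONE
         | SOME (e, ClosPar :: z) =>
             SOME (Node ("F", [leaf "<openPar>", e, leaf "<closPar>"]), z)
         | SOME _ => NONE)
  | parF _ = NONE

(* the overall parser: succeed only if all of the input is consumed *)
fun parse (w : tok list) : pt option =
  case parE w of
      SOME (t, nil) => SOME t
    | _ => NONE

E.g., parse [Id, Times, Id, Plus, Id] returns SOME of the unique parse of ⟨id⟩⟨times⟩⟨id⟩⟨plus⟩⟨id⟩ shown above, and parse [Id, Plus, Id, ClosPar] returns NONE.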

4.6.4 Notes

The standard approach to doing top-down parsing in the presence of left recursive productions is to first translate the left recursion to right recursion, and then restructure the parse trees produced by the parser. In contrast, we showed a direct approach to handling left recursion that works for grammars of operators.

4.7 Closure Properties of Context-free Languages

In this section, we define union, concatenation, closure, reversal, alphabet-renaming and prefix-closure operations/algorithms on grammars. As a result, we will have that the context-free languages are closed under union, concatenation, closure, reversal, alphabet-renaming and prefix-, suffix- and substring-closure. In Section 4.10, we will see that the context-free languages aren't closed under intersection, complementation and set difference. But we are able to define operations/algorithms for:

• intersecting a grammar and an empty-string finite automaton; and

• subtracting a deterministic finite automaton from a grammar.

Thus, if L1 is a context-free language, and L2 is a regular language, we will have that L1 ∩ L2 and L1 − L2 are context-free.

4.7.1 Operations on Grammars

First, we consider some basic grammars and operations on grammars. The grammar, emptyStr, with variable A and production A → % generates the language {%}. The grammar, emptySet, with variable A and no productions generates the language ∅. If w is a string, then the grammar with variable A and production A → w generates the language {w}. Actually, we must be careful to choose a variable that doesn't occur in w. We can do that by adding as many nested < and > around A as needed (A, <A>, <<A>>, etc.). This defines functions strToGram ∈ Str → Gram and symToGram ∈ Sym → Gram.

Suppose G1 and G2 are grammars. We can define a grammar H such that L(H) = L(G1) ∪ L(G2) by unioning together the variables and productions of G1 and G2, and adding a new start variable q, along with productions

q → sG1 | sG2 .

For the above to be valid, we need to know that:

• QG1 ∩ QG2 = ∅ and q ∉ QG1 ∪ QG2; and

• alphabet G1 ∩ QG2 = ∅, alphabet G2 ∩ QG1 = ∅ and q ∉ alphabet G1 ∪ alphabet G2.

Our official union operation for grammars renames the variables of G1 and G2, and chooses the start variable q, in a uniform way that makes the preceding properties hold. This gives us a function union ∈ Gram × Gram → Gram. We do something similar when defining the other closure operations. In what follows, though, we'll ignore this issue, so as to keep things simple.

Suppose G1 and G2 are grammars. We can define a grammar H such that L(H) = L(G1)L(G2) by unioning together the variables and productions of G1 and G2, and adding a new start variable q, along with production

q → sG1 sG2 .

This gives us a function concat ∈ Gram × Gram → Gram.

Suppose G is a grammar. We can define a grammar H such that L(H) = L(G)∗ by adding to the variables and productions of G a new start variable q, along with productions

q → % | sGq.

This gives us a function closure ∈ Gram → Gram.

Next, we consider reversal and alphabet renaming operations on grammars. Given a grammar G, we can define a grammar H such that L(H) = L(G)R by simply reversing the right-sides of G's productions. This gives a function rev ∈ Gram → Gram.
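For example, applying rev to the grammar A → % | 0A1 produces the grammar A → % | 1A0: the language { 0n1n | n ∈ N } is turned into its reversal { 1n0n | n ∈ N }.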

Given a grammar G and a bijection f from a set of symbols that is a superset of alphabet G to some set of symbols, we can define a grammar H such that L(H) = L(G)f by renaming the elements of alphabet G in the right-sides of G's productions using f. Actually, we may have to rename the variables of G to avoid clashes with the elements of the renamed alphabet. Let X = { (G, f) | G ∈ Gram and f is a bijection from a set of symbols that is a superset of alphabet G to some set of symbols }. Then the above definition gives us a function renameAlphabet ∈ X → Gram.

From Section 3.12, we know that if we can define a prefix-closure operation on grammars, then we can obtain suffix-closure and substring-closure operations on grammars from the prefix-closure and grammar reversal operations. So how can we turn a grammar G into a grammar H such that L(H) = L(G)P? We begin by simplifying G, producing grammar G′. Thus all of the variables of G′ will be useful, unless G′ has a single variable and no productions. Now, we form the grammar H from G′, as follows. We make a copy of G′, renaming each variable q to ⟨1, q⟩. (Actually, we may have to rename variables to avoid clashes with alphabet symbols.) Next, for each alphabet symbol a, we introduce a new variable ⟨2, a⟩, along with productions ⟨2, a⟩ → % | a. Next, for each variable q of G′, we add a new variable ⟨2, q⟩ that generates all prefixes of what q generated in G′. Suppose we are given a production q → a1a2 · · · an of G′. If n = 0, then we replace it with the production ⟨2, q⟩ → %. Otherwise, we replace it with the productions

⟨2, q⟩ → ⟨2, a1⟩ | f(a1)⟨2, a2⟩ | · · · | f(a1) f(a2) · · · ⟨2, an⟩,

where f(a) = a, if a ∈ alphabet G′, and f(a) = ⟨1, a⟩, if a ∈ QG′. It's crucial that G′ is simplified; otherwise productions with useless symbols would be turned into productions that generated strings. Finally, the start variable of H is ⟨2, sG′⟩. The above definition gives a function prefix ∈ Gram → Gram. For example, the grammar

A → % | 0A1

is turned into the grammar

⟨2, A⟩ → % | ⟨2, 0⟩ | 0⟨2, A⟩ | 0⟨1, A⟩⟨2, 1⟩,
⟨1, A⟩ → % | 0⟨1, A⟩1,
⟨2, 0⟩ → % | 0,
⟨2, 1⟩ → % | 1.

We now consider an algorithm for intersecting a grammar G with an EFA M, resulting in simplify H, where the grammar H is defined as follows. For all p ∈ QG and q, r ∈ QM, H has a variable ⟨p, q, r⟩ that generates

{ w ∈ (alphabet G)∗ | w ∈ ΠG,p and r ∈ ∆M({q}, w) }.

The remaining variable of H is A, which is its start variable. For each r ∈ AM , H has a production

A → ⟨sG, sM, r⟩.

And for each %-production p → % of G and q, r ∈ QM, if r ∈ ∆M({q}, %), then H will have the production ⟨p, q, r⟩ → %. To say what the remaining productions of H are, define a function

f ∈ (alphabet G ∪ QG) × QM × QM → alphabet G ∪ QH

by: for all a ∈ alphabet G ∪ QG and q, r ∈ QM,

f(a, q, r) = a, if a ∈ alphabet G, and
f(a, q, r) = ⟨a, q, r⟩, if a ∈ QG.

Then, for all p ∈ QG, n ∈ N − {0}, a1, . . . , an ∈ Sym and q1,...,qn+1 ∈ QM , if

• p → a1 · · · an ∈ PG, and

• for all i ∈ [1 : n], if ai ∈ alphabet G, then qi+1 ∈ ∆M({qi}, ai),

then we let

⟨p, q1, qn+1⟩ → f(a1, q1, q2) · · · f(an, qn, qn+1)

be a production of H. The above definition gives us a function inter ∈ Gram × EFA → Gram.

For example, let G be the grammar

A → % | 0A1A | 1A0A,

and M be the EFA with states A and B, start state A, accepting state B, a 0-labeled self-loop on A, a 1-labeled self-loop on B, and a %-transition from A to B, so that G generates all elements of {0, 1}∗ with an equal number of 0's and 1's, and M accepts {0}∗{1}∗. Then simplify H is

A → ⟨A, A, B⟩,
⟨A, A, A⟩ → %,
⟨A, A, B⟩ → %,
⟨A, A, B⟩ → 0⟨A, A, A⟩1⟨A, B, B⟩,
⟨A, A, B⟩ → 0⟨A, A, B⟩1⟨A, B, B⟩,
⟨A, A, B⟩ → 0⟨A, B, B⟩1⟨A, B, B⟩,
⟨A, B, B⟩ → %.

Note that simplification eliminated the variable ⟨A, B, A⟩. If we hand-simplify further, we can turn this into:

A → ⟨A, A, B⟩,
⟨A, A, B⟩ → % | 0⟨A, A, B⟩1.

To prove that our intersection algorithm is correct, we’ll need two lemmas.

Lemma 4.7.1 For p ∈ QG, let the property Pp(w), for w ∈ ΠG,p, be:

for all q, r ∈ QM, if r ∈ ∆M({q}, w), then w ∈ ΠH,⟨p,q,r⟩.

Then, for all p ∈ QG, for all w ∈ ΠG,p, Pp(w).

Proof. By induction on Π. In (2), we use the fact that, if n ∈ N − {0}, q1,qn+1 ∈ QM , w1, . . . , wn ∈ Str and qn+1 ∈ ∆M ({q1}, w1 · · · wn), then there are q2,...,qn ∈ QM such that qi+1 ∈ ∆M ({qi}, wi), for all i ∈ [1 : n]. (This is true because M is an EFA; if M were an FA, we wouldn’t be able to conclude this.) ✷

Lemma 4.7.2 Let the property PA(w), for w ∈ ΠH,A, be

w ∈ L(G) and w ∈ L(M).

For p ∈ QG and q, r ∈ QM, let the property P⟨p,q,r⟩(w), for w ∈ ΠH,⟨p,q,r⟩, be

w ∈ ΠG,p and r ∈ ∆M ({q}, w).

Then:

(1) For all w ∈ ΠH,A, PA(w).

(2) For all p ∈ QG and q, r ∈ QM, for all w ∈ ΠH,⟨p,q,r⟩, P⟨p,q,r⟩(w).

Proof. We proceed by induction on Π. ✷

Lemma 4.7.3 L(H)= L(G) ∩ L(M).

Proof. L(H) ⊆ L(G) ∩ L(M) follows by Lemma 4.7.2(1).

For the other inclusion, suppose w ∈ L(G) ∩ L(M), so that w ∈ ΠG,sG and r ∈ ∆M({sM}, w), for some r ∈ AM. By Lemma 4.7.1, it follows that w ∈ ΠH,⟨sG,sM,r⟩. But because r ∈ AM, we have that A → ⟨sG, sM, r⟩ is a production of H. Thus w ∈ ΠH,A = L(H). ✷

Finally, we consider a difference operation/algorithm. Given a grammar G and a DFA M, we can define the difference of G and M to be inter(G, complement(M, alphabet G)). This is analogous to what we did when defining the difference of DFAs (see Section 3.12). This definition gives us a function minus ∈ Gram × DFA → Gram.

The following theorem summarizes the closure properties for context-free languages.

Theorem 4.7.4 Suppose L, L1, L2 ∈ CFLan and L′ ∈ RegLan. Then:

(1) L1 ∪ L2 ∈ CFLan;

(2) L1L2 ∈ CFLan;

(3) L∗ ∈ CFLan;

(4) LR ∈ CFLan;

(5) Lf ∈ CFLan, where f is a bijection from a set of symbols that is a superset of alphabet L to some set of symbols;

(6) LP ∈ CFLan;

(7) LS ∈ CFLan;

(8) LSS ∈ CFLan;

(9) L ∩ L′ ∈ CFLan; and

(10) L − L′ ∈ CFLan.

4.7.2 Operations on Grammars in Forlan

The Forlan module Gram defines the following constants and operations on grammars:

val emptyStr : gram
val emptySet : gram
val fromStr : str -> gram
val fromSym : sym -> gram
val union : gram * gram -> gram
val concat : gram * gram -> gram
val closure : gram -> gram
val rev : gram -> gram
val renameAlphabet : gram * sym_rel -> gram
val prefix : gram -> gram
val inter : gram * efa -> gram
val minus : gram * dfa -> gram

Most of these functions implement the mathematical functions with the same names. The function renameAlphabet raises an exception if its second argu- ment isn’t a bijection whose domain is a superset of the alphabet of its first argument. The functions fromStr and fromSym correspond to strToGram and symToGram, and are also available in the top-level environment with those names

val strToGram : str -> gram
val symToGram : sym -> gram

For example, we can construct a grammar G such that L(G) = {01} ∪ {10}{11}∗, as follows.

- val gram1 = strToGram(Str.fromString "01");
val gram1 = - : gram
- val gram2 = strToGram(Str.fromString "10");
val gram2 = - : gram
- val gram3 = strToGram(Str.fromString "11");
val gram3 = - : gram
- val gram =
=     Gram.union(gram1,
=       Gram.concat(gram2,
=         Gram.closure gram3));
val gram = - : gram
- Gram.output("", gram);
{variables} A, <1,A>, <2,A>, <2,<1,A>>, <2,<2,A>>, <2,<2,<A>>>
{start variable} A
{productions} A -> <1,A> | <2,A>; <1,A> -> 01;
<2,A> -> <2,<1,A>><2,<2,A>>; <2,<1,A>> -> 10;
<2,<2,A>> -> % | <2,<2,<A>>><2,<2,A>>; <2,<2,<A>>> -> 11
val it = () : unit
- val gram' = Gram.renameVariablesCanonically gram;
val gram' = - : gram
- Gram.output("", gram');
{variables} A, B, C, D, E, F
{start variable} A
{productions} A -> B | C; B -> 01; C -> DE; D -> 10;
E -> % | FE; F -> 11
val it = () : unit

Continuing our Forlan session, the grammar reversal and alphabet renaming operations can be used as follows:

- val gram'' = Gram.rev gram';
val gram'' = - : gram
- Gram.output("", gram'');
{variables} A, B, C, D, E, F
{start variable} A
{productions} A -> B | C; B -> 10; C -> ED; D -> 01;
E -> % | EF; F -> 11
val it = () : unit
- val rel = SymRel.fromString "(0, A), (1, B)";
val rel = - : sym_rel
- val gram''' = Gram.renameAlphabet(gram'', rel);
val gram''' = - : gram
- Gram.output("", gram''');
{variables} <A>, <B>, <C>, <D>, <E>, <F>
{start variable} <A>
{productions} <A> -> <B> | <C>; <B> -> BA; <C> -> <E><D>;
<D> -> AB; <E> -> % | <E><F>; <F> -> BB
val it = () : unit

And here is an example use of the prefix-closure operation:

- val gram = Gram.input "";
@ {variables}
@ A
@ {start variable}
@ A
@ {productions}
@ A -> % | 0A1
@ .
val gram = - : gram
- val gram' = Gram.prefix gram;
val gram' = - : gram
- Gram.output("", gram');
{variables} <1,A>, <2,0>, <2,1>, <2,A>
{start variable} <2,A>
{productions} <1,A> -> % | 0<1,A>1; <2,0> -> % | 0;
<2,1> -> % | 1; <2,A> -> % | <2,0> | 0<2,A> | 0<1,A><2,1>
val it = () : unit
- fun test s = Gram.generated gram' (Str.fromString s);
val test = fn : string -> bool
- test "000111";
val it = true : bool
- test "0001";
val it = true : bool
- test "0001111";
val it = false : bool

To see how we can use Gram.inter and Gram.minus, let gram be the grammar

A → % | 0A1A | 1A0A,

and efa be the EFA from the intersection example above (start state A with a 0-labeled self-loop, a %-transition from A to accepting state B, and a 1-labeled self-loop on B).

- val gram' = Gram.inter(gram, efa);
val gram' = - : gram
- Gram.output("", gram');
{variables} A, <A,A,A>, <A,A,B>, <A,B,B>
{start variable} A
{productions} A -> <A,A,B>; <A,A,A> -> %;
<A,A,B> -> % | 0<A,A,A>1<A,B,B> | 0<A,A,B>1<A,B,B> | 0<A,B,B>1<A,B,B>;
<A,B,B> -> %
val it = () : unit
- fun elimVars(gram, nil) = gram
=     | elimVars(gram, q :: qs) =
=         elimVars(Gram.eliminateVariable(gram, q), qs);
val elimVars = fn : gram * sym list -> gram
- val gram'' =
=     elimVars
=       (gram',
=        [Sym.fromString "<A,A,A>",
=         Sym.fromString "<A,B,B>"]);
val gram'' = - : gram
- val gram''' =
=     Gram.renameVariablesCanonically
=       (Gram.restart(Gram.simplify gram''));
val gram''' = - : gram
- Gram.output("", gram''');
{variables} A
{start variable} A
{productions} A -> % | 0A1
val it = () : unit
- Gram.generated gram''' (Str.fromString "0011");
val it = true : bool
- Gram.generated gram''' (Str.fromString "0101");
val it = false : bool
- Gram.generated gram''' (Str.fromString "0001");
val it = false : bool
- val dfa =
=     DFA.renameStatesCanonically
=       (DFA.minimize(nfaToDFA(efaToNFA efa)));
val dfa = - : dfa
- val gram'' = Gram.minus(gram, dfa);
val gram'' = - : gram
- Gram.generated gram'' (Str.fromString "0101");
val it = true : bool
- Gram.generated gram'' (Str.fromString "0011");
val it = false : bool

We’ll end this section with a more sophisticated example. Define a language X by:

X = { 0i1j2k3l | i, j, k, l ∈ N and i < l and j > k and i + j is even and k + l is odd }.

Let’s see how Forlan can help us come up with a grammar generating X. First, we put the text

{variables} A, B, <1>, <3>
{start variable} A
{productions} A -> 0A3 | B<3>; B -> 1B2 | <1>;
<1> -> 1 | 1<1>; <3> -> 3 | 3<3>

for a grammar generating { 0i1j2k3l | i, j, k, l ∈ N and i < l and j > k } in the file seq0123-where-0lt3-and-1gt2-gram.txt. Next we put the text

{states} A, B
{start state} A
{accepting states} A
{transitions} A, 0 -> B; A, 1 -> B; A, 2 -> A; A, 3 -> A;
B, 0 -> A; B, 1 -> A; B, 2 -> B; B, 3 -> B

for a DFA accepting all elements of {0, 1, 2, 3}∗ in which the sum of the numbers of 0's and 1's is even in the file even0plus1-alp23-dfa.txt. Then we load the grammar and DFA into Forlan:

- val seq0123Where0lt3And1gt2Gram =
=     Gram.input "seq0123-where-0lt3-and-1gt2-gram.txt";
val seq0123Where0lt3And1gt2Gram = - : gram
- val even0plus1Alp23DFA = DFA.input "even0plus1-alp23-dfa.txt";
val even0plus1Alp23DFA = - : dfa

Next, we put the Forlan program

val regToDFA = nfaToDFA o efaToNFA o faToEFA o regToFA;
val minAndRen = DFA.renameStatesCanonically o DFA.minimize;
val syms0123 = SymSet.fromString "0, 1, 2, 3";
val allStrReg = Reg.closure(Reg.allSym syms0123);
val allStrDFA = minAndRen(regToDFA allStrReg);

(* symbolic relation swapping 2/3 for 0/1 *)

val swap23for01 = SymRel.fromString "(0, 2), (1, 3), (2, 0), (3, 1)";

(* DFA accepting all elements of {0, 1, 2, 3}* in which the sum of the numbers of 2s and 3s is odd *)

val odd2plus3Alp01DFA =
  minAndRen
    (DFA.minus
       (allStrDFA,
        DFA.renameAlphabet(even0plus1Alp23DFA, swap23for01)));

(* DFA accepting all elements of {0, 1, 2, 3}* in which the sum of the
   numbers of 0s and 1s is even, and the sum of the numbers of 2s and
   3s is odd *)

val even0plus1AndOdd2plus3DFA = minAndRen(DFA.inter(even0plus1Alp23DFA, odd2plus3Alp01DFA));

(* grammar generating X *)

val gram0 =
  Gram.renameVariablesCanonically
    (Gram.inter
       (seq0123Where0lt3And1gt2Gram,
        injDFAToEFA even0plus1AndOdd2plus3DFA));

in the file find-gram.sml, and proceed as follows:

- use "find-gram.sml";
[opening find-gram.sml]
val regToDFA = fn : reg -> dfa
val minAndRen = fn : dfa -> dfa
val syms0123 = - : sym set
val allStrReg = - : reg
val allStrDFA = - : dfa
val swap23for01 = - : sym_rel
val odd2plus3Alp01DFA = - : dfa
val even0plus1AndOdd2plus3DFA = - : dfa
val gram0 = - : gram
val it = () : unit
- Gram.output("", gram0);
{variables} A, B, C, D, E, F, G, H, I, J, K, L, M
{start variable} A
{productions} A -> B; B -> DK | EM | 0C3; C -> FJ | GL | 0B3;
D -> H | 1G2; E -> 1F2; F -> I | 1E2; G -> 1D2; H -> 1I;
I -> 1 | 1H; J -> 3L; K -> 3 | 3M; L -> 3 | 3J; M -> 3K
val it = () : unit
- Gram.numVariables gram0;
val it = 13 : int
- Gram.numProductions gram0;
val it = 22 : int

In the grammar gram0, there are opportunities for hand-simplification using Forlan. We put the text

fun elimVars(gram, nil) = gram
  | elimVars(gram, q :: qs) =
      elimVars(Gram.eliminateVariable(gram, Sym.fromString q), qs);

val gram1 =
  Gram.renameVariablesCanonically
    (elimVars(Gram.restart gram0,
              ["E", "G", "H", "J", "M", "C"]));

in the file elim-vars.sml, and proceed as follows:

- use "elim-vars.sml";
[opening elim-vars.sml]
val elimVars = fn : gram * string list -> gram
val gram1 = - : gram
val it = () : unit
- Gram.output("", gram1);
{variables} A, B, C, D, E, F
{start variable} A
{productions} A -> BE | 00A33 | 0C3F3 | 1C23E | 01B2F3;
B -> 1D | 11B22; C -> D | 11C22; D -> 1 | 11D;
E -> 3 | 33E; F -> 3 | 33F
val it = () : unit
- Gram.numVariables gram1;
val it = 6 : int
- Gram.numProductions gram1;
val it = 15 : int

We finish up by making some final improvements to gram1, working outside of Forlan. First, note that in gram1, E and its productions, and F and its productions, have the same form. There is no reason to have both of them, and so we can remove F and its productions, replacing all occurrences of F in the remaining productions by E. This gives us the grammar:

A → BE | 00A33 | 0C3E3 | 1C23E | 01B2E3,
B → 1D | 11B22,
C → D | 11C22,
D → 1 | 11D,
E → 3 | 33E.

Next, because E generates only strings of 3’s, we can replace the occurrences of E3 on the right-hand sides of A’s productions by 3E, yielding:

A → BE | 00A33 | 0C33E | 1C23E | 01B23E,
B → 1D | 11B22,
C → D | 11C22,
D → 1 | 11D,
E → 3 | 33E.

Next, we note that

ΠD = { 1^m | m ∈ N and m is odd },
ΠC = { 1^{2n} 1^m 2^{2n} | n, m ∈ N and m is odd },
ΠB = { 1^{2n} 1 1^m 2^{2n} | n, m ∈ N and m is odd }
   = { 1 1^{2n} 1^m 2^{2n} | n, m ∈ N and m is odd } = {1}ΠC.

Thus we can remove B and its productions, replacing all occurrences of B by 1C:

A → 1CE | 00A33 | 0C33E | 1C23E | 011C23E
C → D | 11C22
D → 1 | 11D
E → 3 | 33E.

Since D is only used in a production of C, we can combine the productions of C and D, yielding

C → 1 | 11C | 11C22.

Our whole grammar is now:

A → 1CE | 00A33 | 0C33E | 1C23E | 011C23E
C → 1 | 11C | 11C22
E → 3 | 33E,

or (renaming the variables and reordering A's productions for clarity)

A → 1BC | 1B23C | 0B33C | 011B23C | 00A33
B → 1 | 11B | 11B22
C → 3 | 33C.

4.7.3 Notes

The algorithm for intersecting a grammar with an EFA would normally be given only indirectly, using pushdown automata (PDAs): one could convert a grammar to a PDA, do the intersection there, and convert back to a grammar. Our direct algorithm is motivated by this process, but produces grammars that are more intelligible.

4.8 Converting Regular Expressions and FA to Grammars

In this section, we give simple algorithms for converting regular expressions and finite automata to grammars. Since we have algorithms for converting between regular expressions and finite automata, it is tempting to define only one of these algorithms. But better results can be obtained by defining direct conversions.

4.8.1 Converting Regular Expressions to Grammars

Regular expressions are converted to grammars using a recursive algorithm that makes use of some of the operations on grammars that were defined in Section 4.7. The structure of the algorithm is very similar to the structure of our algorithm for converting regular expressions to finite automata (see Section 3.12). This gives us a function regToGram ∈ Reg → Gram. The algorithm is implemented in Forlan by the function

val fromReg : reg -> gram

of the Gram module. It's available in the top-level environment with the name regToGram. Here is how we can convert the regular expression 01 + 10(11)* to a grammar using Forlan:

- val gram = regToGram(Reg.input "");
@ 01 + 10(11)*
@ .
val gram = - : gram
- Gram.output("", Gram.renameVariablesCanonically gram);
{variables} A, B, C, D, E, F
{start variable} A
{productions} A -> B | C; B -> 01; C -> DE; D -> 10; E -> % | FE;
F -> 11
val it = () : unit

4.8.2 Converting Finite Automata to Grammars

Suppose M is an FA. We define a function/algorithm faToGram ∈ FA → Gram by: for all FAs M, faToGram M is the grammar G defined below. If Q_M ∩ alphabet M = ∅, then G is defined by

• Q_G = Q_M;

• s_G = s_M;

• P_G = { q → xr | q, x → r ∈ T_M } ∪ { q → % | q ∈ A_M }.

Otherwise, we first rename the states of M using a uniform number of ⟨ and ⟩ pairs, so as to avoid conflicts with the elements of M's alphabet. For example, suppose M is the DFA

[diagram: DFA with start and accepting state A and state B; A loops to itself on 1 and goes to B on 0, and B loops to itself on 1 and goes back to A on 0]

Our algorithm converts M into the grammar

A → % | 0B | 1A, B → 0A | 1B.

Consider, e.g., the valid labeled path for M

A =1⇒ A =0⇒ B =0⇒ A,

which explains why 100 ∈ L(M). It corresponds to the valid parse tree for G

A(1, A(0, B(0, A(%)))),

which explains why 100 ∈ L(G). If we have converted an FA M to a grammar G, we can prove L(M) ⊆ L(G) by induction on the lengths of labeled paths, and we can prove L(G) ⊆ L(M) by induction on parse trees. Thus we have L(G) = L(M).
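The production construction itself is tiny once the transitions and accepting states are in hand. Here is a minimal sketch in SML (not Forlan's implementation; the triple representation of transitions and the string encoding of states and labels are assumptions made purely for illustration):

(* sketch: grammar productions from an FA, with states and labels as
   plain strings; each transition q, x -> r yields q -> xr, and each
   accepting state q yields q -> % *)
type prod = string * string   (* left-hand side, right-hand side *)

fun faToProds (trans : (string * string * string) list,
               accepting : string list) : prod list =
  map (fn (q, x, r) => (q, x ^ r)) trans @
  map (fn q => (q, "%")) accepting

For the DFA above, faToProds([("A","1","A"), ("A","0","B"), ("B","0","A"), ("B","1","B")], ["A"]) produces exactly the five productions of G.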

Exercise 4.8.1
State and prove a pair of lemmas that can be used to prove the inclusions L(M) ⊆ L(G) and L(G) ⊆ L(M).

The Forlan module Gram contains the function

val fromFA : fa -> gram

which implements our algorithm for converting finite automata to grammars. It's available in the top-level environment with the name faToGram. Suppose fa of type fa is bound to M. Here is how we can convert M to a grammar using Forlan:

- val gram = faToGram fa;
val gram = - : gram
- Gram.output("", gram);
{variables} A, B
{start variable} A
{productions} A -> % | 0B | 1A; B -> 0A | 1B
val it = () : unit

4.8.3 Consequences of Conversion Functions

Because of the existence of our conversion functions, we have that every regular language is a context-free language. On the other hand, the language { 0^n 1^n | n ∈ N } is context-free, because of the grammar

A → % | 0A1,

but is not regular, as we proved in Section 3.14. Summarizing, we have:

Theorem 4.8.2
The regular languages are a proper subset of the context-free languages: RegLan ⊊ CFLan.

4.8.4 Notes

The material in this section is standard.

4.9 Chomsky Normal Form

In this section, we study a special form of grammars called Chomsky Normal Form (CNF), which was named after the linguist Noam Chomsky. Grammars in CNF have very nice formal properties. In particular, valid parse trees for grammars in CNF are very close to being binary trees. Any grammar that doesn't generate % can be put in CNF. And, if G is a grammar that does generate %, it can be turned into a grammar in CNF that generates L(G) − {%}. In the next section, we will use this fact when proving the pumping lemma for context-free languages, a method for showing that certain languages are not context-free. We will begin by giving an algorithm for turning a grammar G into a simplified grammar with no productions of the forms q → % and q → r (where r is a variable). This will enable us to give an algorithm that takes in a grammar G, calculates L(G) when it is finite, and reports that it is infinite otherwise.

4.9.1 Removing %-Productions

A %-production is a production of the form q → %. We will show by example how to turn a grammar G into a simplified grammar with no %-productions that generates L(G) − {%}. Suppose G is the grammar

A → 0A1 | BB,
B → % | 2B.

First, we determine which variables q are nullable in the sense that % ∈ Π_q, i.e., that % is the yield of a valid parse tree for G whose root label is q.

• Clearly, B is nullable.

• Since A → BB ∈ P_G, it follows that A is nullable.

Now we use this information to compute the productions of our new grammar.

• Since A is nullable, we replace the production A → 0A1 with the productions A → 0A1 and A → 01. The idea is that this second production will make up for the fact that A won't be nullable in the new grammar.

• Since B is nullable, we replace the production A → BB with the productions A → BB and A → B (the result of deleting either one of the B's).

• The production B → % is deleted.

• Since B is nullable, we replace the production B → 2B with the productions B → 2B and B → 2. (If a production has n occurrences of nullable variables in its right side, then there will be 2^n new right sides, corresponding to all ways of deleting or not deleting those n variable occurrences. But if a right side of % would result, we don't include it.)

This gives us the grammar

A → 0A1 | 01 | BB | B,
B → 2B | 2.

In general, we finish by simplifying our new grammar. The new grammar of our example is already simplified, however.
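The first phase, finding the nullable variables, is a least fixed-point computation. Here is a minimal SML sketch, assuming a hypothetical representation of a production as a variable paired with its right-hand side (a list of variable/terminal strings); a %-production has an empty list as its right-hand side:

(* sketch: nullable variables as a least fixed point; start with the
   variables having %-productions, and keep adding q whenever some
   production q -> rhs has every rhs element already nullable *)
fun member (x, ys) = List.exists (fn y => y = x) ys

fun nullables (prods : (string * string list) list) : string list =
  let fun step nulls =
        let val nulls' =
              List.foldl
                (fn ((q, rhs), acc) =>
                   if List.all (fn s => member (s, nulls)) rhs
                      andalso not (member (q, acc))
                   then q :: acc
                   else acc)
                nulls prods
        in if length nulls' = length nulls then nulls' else step nulls' end
  in step [] end

For the grammar above, nullables [("A", ["0","A","1"]), ("A", ["B","B"]), ("B", []), ("B", ["2","B"])] first adds B and then A, matching the hand computation.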

4.9.2 Removing Unit Productions

A unit production for a grammar G is a production of the form q → r, where r is a variable (possibly equal to q). We now show by example how to turn a grammar G into a simplified grammar with no %-productions or unit productions that generates L(G) − {%}. Suppose G is the grammar

A → 0A1 | 01 | BB | B,
B → 2B | 2.

We begin by applying our algorithm for removing %-productions to our grammar; the algorithm has no effect in this case. Our new grammar will have the same variables and start variable as G. Its set of productions is the set of all q → w such that q is a variable of G, w ∈ Str doesn't consist of a single variable of G, and there is a variable r such that

• r is parsable from q, and

• r → w is a production of G.

(Determining whether r is parsable from q is easy, since we are working with a grammar with no %-productions.) This process results in the grammar

A → 0A1 | 01 | BB | 2B | 2,
B → 2B | 2.

Finally, we simplify our grammar, which gets rid of the production A → 2B, giving us the grammar

A → 0A1 | 01 | BB | 2,
B → 2B | 2.
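The "parsable from" test used in this construction is just the reflexive-transitive closure of the unit-production relation. A minimal SML sketch, assuming unit productions are given as (q, r) pairs of strings:

(* sketch: all variables parsable from q via unit productions, i.e.,
   the reflexive-transitive closure of { (q, r) | q -> r is a unit
   production }, computed by a worklist search *)
fun parsableFrom (units : (string * string) list, q : string)
    : string list =
  let fun go (seen, [])        = seen
        | go (seen, v :: rest) =
            if List.exists (fn s => s = v) seen
            then go (seen, rest)
            else go (v :: seen,
                     List.mapPartial
                       (fn (a, b) => if a = v then SOME b else NONE)
                       units
                     @ rest)
  in go ([], [q]) end

For the grammar above, parsableFrom([("A","B")], "A") is ["B","A"] (in some order), which is why B's productions are copied to A.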

Removing % and Unit Productions in Forlan

The Forlan module Gram defines the following functions:

val eliminateEmptyProductions : gram -> gram
val eliminateEmptyAndUnitProductions : gram -> gram

For example, if gram is the grammar

A → 0A1 | BB,
B → % | 2B,

then we can proceed as follows.

- val gram' = Gram.eliminateEmptyProductions gram;
val gram' = - : gram
- Gram.output("", gram');
{variables} A, B
{start variable} A
{productions} A -> B | 01 | BB | 0A1; B -> 2 | 2B
val it = () : unit
- val gram'' = Gram.eliminateEmptyAndUnitProductions gram;
val gram'' = - : gram
- Gram.output("", gram'');
{variables} A, B
{start variable} A
{productions} A -> 2 | 01 | BB | 0A1; B -> 2 | 2B
val it = () : unit

4.9.3 Generating a Grammar's Language When Finite

We can now give an algorithm that takes in a grammar G and generates L(G), when it is finite, and reports that L(G) is infinite, otherwise. The algorithm begins by letting G′ be the result of eliminating %-productions and unit productions from G. Thus G′ is simplified and generates L(G) − {%}.

If there is recursion in the productions of G′ (either direct or mutual), then there is a variable q of G′ and a valid parse tree pt for G′, such that the height of pt is at least one, q is the root label of pt, and the yield of pt has the form xqy, for strings x and y, each of whose symbols is in alphabet G′ ∪ Q_{G′}. (In particular, q may appear in x or y.) Because G′ lacks %- and unit-productions, it follows that x ≠ % or y ≠ %. Because each variable of G′ is generating, we can turn pt into a valid parse tree pt′ whose root label is q, and whose yield has the form uqv, for u, v ∈ (alphabet G′)*, where u ≠ % or v ≠ %. Thus we have that uqv is parsable from q in G′, and an easy mathematical induction shows that u^n q v^n is parsable from q in G′, for all n ∈ N. Because u ≠ % or v ≠ %, and q is generating, it follows that there are infinitely many strings generated from q in G′. And, since q is reachable, and every variable of G′ is generating, it follows that L(G′), and thus L(G), is infinite.

Consequently, our algorithm can continue as follows. If the productions of G′ have recursion, then it reports that L(G) is infinite. Otherwise, it calculates L(G′) from the bottom up, and adds % iff G generates %.

The Forlan module Gram defines the following function:

val toStrSet : gram -> str set

Suppose gram is the grammar

A → BB, B → CC, C → % | 0 | 1,

and gram' is the grammar

A → BB, B → CC, C → % | 0 | 1 | A.

Then we can proceed as follows:

- StrSet.output("", Gram.toStrSet gram);
%, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111,
0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010,
1011, 1100, 1101, 1110, 1111
val it = () : unit
- StrSet.output("", Gram.toStrSet gram');

language is infinite

uncaught exception Error
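The infiniteness test in the algorithm comes down to detecting recursion, i.e., a cycle in the relation "some right-hand side of q mentions r". A minimal SML sketch, assuming that relation is given as a hypothetical edge list over variable names:

(* sketch: cycle detection in the variable-dependency relation;
   an edge (q, r) means r occurs in a right-hand side of q *)
fun succs (edges : (string * string) list, u : string) : string list =
  List.mapPartial (fn (a, b) => if a = u then SOME b else NONE) edges

fun reach (edges, seen, [])        = seen
  | reach (edges, seen, u :: rest) =
      if List.exists (fn s => s = u) seen
      then reach (edges, seen, rest)
      else reach (edges, u :: seen, succs (edges, u) @ rest)

(* a cycle exists iff some variable reaches itself via >= 1 edge *)
fun hasCycle (vars : string list, edges) : bool =
  List.exists
    (fn v =>
       List.exists (fn u => u = v) (reach (edges, [], succs (edges, v))))
    vars

For gram' above, the edge (C, A) coming from C → A, together with (A, B) and (B, C), forms a cycle, which is why the language is reported infinite.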

Suppose we have a grammar G and a natural number n, and we wish to generate the set of all elements of L(G) of length n. We can start by creating an EFA M accepting all strings over the alphabet of G with length n. Then, we can intersect G with M, and apply Gram.toStrSet to the resulting grammar.
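Here is a sketch of this recipe in Forlan terms, reusing the conversions regToFA and faToEFA from earlier in the chapter; Gram.alphabet and Reg.power are assumptions about the API (any way of building a regular expression denoting all strings of length n over the alphabet would do):

(* sketch: all strings of L(G) of length n, via intersection with an
   EFA for (all symbols)^n; Gram.alphabet and Reg.power are assumed *)
fun strsOfLen (gram, n) =
  let val allSym = Reg.allSym (Gram.alphabet gram) (* assumed function *)
      val lenN   = Reg.power (allSym, n)           (* assumed function *)
      val efa    = faToEFA (regToFA lenN)
  in Gram.toStrSet (Gram.inter (gram, efa)) end;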

4.9.4 Chomsky Normal Form

A grammar G is in Chomsky Normal Form (CNF) iff each of its productions has one of the following forms:

• q → a, where a is not a variable; and

• q → pr, where p and r are variables.

We explain by example how a grammar G can be turned into a simplified grammar in CNF that generates L(G) − {%}. Suppose G is the grammar

A → 0A1 | 01 | BB | 2, B → 2B | 2.

• We begin by applying our algorithm for removing %-productions and unit productions to this grammar. In this case, it has no effect.

• Since the productions A→BB, A→2 and B→2 are legal CNF productions, we simply transfer them to our new grammar.

• Next we add the variables ⟨0⟩, ⟨1⟩ and ⟨2⟩ to our grammar, along with the productions

⟨0⟩ → 0, ⟨1⟩ → 1, ⟨2⟩ → 2.

• Now, we can replace the production A → 01 with A → ⟨0⟩⟨1⟩. And, we can replace the production B → 2B with the production B → ⟨2⟩B.

• Finally, we replace the production A → 0A1 with the productions

A → ⟨0⟩C, C → A⟨1⟩,

and add C to the set of variables of our new grammar. 4.9 Chomsky Normal Form 313

Summarizing, our new grammar is

A → BB | 2 | ⟨0⟩⟨1⟩ | ⟨0⟩C,
B → 2 | ⟨2⟩B,
⟨0⟩ → 0, ⟨1⟩ → 1, ⟨2⟩ → 2,
C → A⟨1⟩.

The official version of our algorithm names variables in a different way.
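The only step that manufactures new variables is the splitting of a long right-hand side into binary productions. A minimal SML sketch over a hypothetical list representation of right-hand sides, with fresh variables simulated by a counter (Forlan's actual naming scheme, shown below, is different):

(* sketch: split q -> X1 X2 ... Xk (k >= 3) into binary productions,
   inventing fresh variables C0, C1, ...; returns the new productions
   together with the next unused counter value *)
fun binarize (q : string, rhs : string list, fresh : int)
    : (string * string list) list * int =
  case rhs of
      [x, y] => ([(q, [x, y])], fresh)
    | x :: y :: rest =>
        let val c = "C" ^ Int.toString fresh
            val (ps, fresh') = binarize (c, y :: rest, fresh + 1)
        in ((q, [x, c]) :: ps, fresh') end
    | _ => ([(q, rhs)], fresh)   (* length < 2: nothing to split *)

For instance, binarize("A", ["<0>", "A", "<1>"], 0) yields the two productions A → ⟨0⟩C0 and C0 → A⟨1⟩, mirroring the step above.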

Converting to Chomsky Normal Form in Forlan

The Forlan module Gram defines the following function:

val chomskyNormalForm : gram -> gram

Suppose gram of type gram is bound to the grammar with variables A and B, start variable A, and productions

A → 0A1 | BB, B → % | 2B.

Here is how Forlan can be used to turn this grammar into a CNF grammar that generates the nonempty strings that are generated by gram:

- val gram' = Gram.chomskyNormalForm gram;
val gram' = - : gram
- Gram.output("", gram');
{variables} <1,A>, <1,B>, <2,0>, <2,1>, <2,2>, <3,A1>
{start variable} <1,A>
{productions} <1,A> -> 2 | <1,B><1,B> | <2,0><2,1> | <2,0><3,A1>;
<1,B> -> 2 | <2,2><1,B>; <2,0> -> 0; <2,1> -> 1; <2,2> -> 2;
<3,A1> -> <1,A><2,1>
val it = () : unit
- val gram'' = Gram.renameVariablesCanonically gram';
val gram'' = - : gram
- Gram.output("", gram'');
{variables} A, B, C, D, E, F
{start variable} A
{productions} A -> 2 | BB | CD | CF; B -> 2 | EB; C -> 0; D -> 1;
E -> 2; F -> AD
val it = () : unit

4.9.5 Notes

The material in this section is standard.

4.10 The Pumping Lemma for Context-free Languages

Consider the language L = { 0^n 1^n 2^n | n ∈ N }. Is L context-free, i.e., is there a grammar that generates L? Although it's easy to find a grammar that keeps the 0's and 1's matched, or one that keeps the 1's and 2's matched, or one that keeps the 0's and 2's matched, it seems that there is no way to keep all three symbols matched simultaneously. In this section, we will study the pumping lemma for context-free languages, which can be used to show that many languages are not context-free. We will use the pumping lemma to prove that L is not context-free, and then we will prove the lemma. Building on this result, we'll be able to show that the context-free languages are not closed under intersection, complementation or set-difference.

4.10.1 Statement, Application and Proof of Pumping Lemma

Lemma 4.10.1 (Pumping Lemma for Context-free Languages)
For all context-free languages L, there is an n ∈ N − {0} such that, for all z ∈ Str, if z ∈ L and |z| ≥ n, then there are u, v, w, x, y ∈ Str such that z = uvwxy and

(1) |vwx| ≤ n;

(2) vx ≠ %; and

(3) u v^i w x^i y ∈ L, for all i ∈ N.

Before proving the pumping lemma, let's see how it can be used to show that L = { 0^n 1^n 2^n | n ∈ N } is not context-free. Suppose, toward a contradiction, that L is context-free. Thus there is an n ∈ N − {0} with the property of the lemma. Let z = 0^n 1^n 2^n. Since z ∈ L and |z| = 3n ≥ n, we have that there are u, v, w, x, y ∈ Str such that z = uvwxy and

(1) |vwx| ≤ n;

(2) vx ≠ %; and

(3) u v^i w x^i y ∈ L, for all i ∈ N.

Since 0^n 1^n 2^n = z = uvwxy, (1) tells us that vwx doesn't contain both a 0 and a 2. Thus, vwx has no 0's or vwx has no 2's, so that there are two cases to consider. Suppose vwx has no 0's. Thus vx has no 0's. By (2), we have that vx contains a 1 or a 2. Thus uwy:

• has n 0’s;

• has less than n 1's or has less than n 2's.

But (3) tells us that uwy = u v^0 w x^0 y ∈ L, so that uwy has an equal number of 0's, 1's and 2's, a contradiction. The case where vwx has no 2's is similar. Since we obtained a contradiction in both cases, we have an overall contradiction. Thus L is not context-free.

When we prove the pumping lemma for context-free languages, we will make use of a fact about grammars in Chomsky Normal Form. Suppose G is a grammar in CNF and that w ∈ (alphabet G)* is the yield of a valid parse tree pt for G whose root label is a variable. For instance, if G is the grammar with variable A and productions A → AA and A → 0, then w could be 0000 and pt could be the following tree of height 3:

A(A(A(0), A(0)), A(A(0), A(0)))

Generalizing from this example, we can see that if pt has height 3, |w| will never be greater than 4 = 2^2 = 2^{3−1}. Generalizing still more, we can prove that, for all parse trees pt of height k, for all strings w, if w is the yield of pt, then |w| ≤ 2^{k−1}. This can be proved by induction on pt: a tree of height k whose root is expanded with a binary CNF production has two subtrees of height at most k − 1, so its yield has length at most 2^{k−2} + 2^{k−2} = 2^{k−1}.

Proof. Suppose L is a context-free language. By the results of the preceding section, there is a grammar G in Chomsky Normal Form such that L(G) = L − {%}. Let k = |Q_G| and n = 2^k. Thus n ∈ N − {0}. Suppose z ∈ Str, z ∈ L and |z| ≥ n. Since n ≥ 1, we have that z ≠ %. Thus z ∈ L − {%} = L(G), so that there is a parse tree pt such that pt is valid for G, rootLabel pt = s_G and yield pt = z. By our fact about CNF grammars, we have that the height of pt is at least k + 1. (If pt's height were only k, then we would have |z| ≤ 2^{k−1} < 2^k = n, contradicting |z| ≥ n.)

[figure: the parse tree pt, with root label s_G; the path pat passes through an upper occurrence of the repeated variable q (the root of subtree pt′) and a lower occurrence of q (the root of subtree pt′′), dividing the yield z into u v w x y]

Let pat be a valid path for pt whose length is equal to the height of pt. Thus the length of pat is at least k + 1, so that the path visits at least k + 1 variables, with the consequence that at least one variable must be visited twice. Working from the last variable visited upwards, we look for the first repetition of variables. Suppose q is this repeated variable, and let pat′ and pat′′ be the initial parts of pat that take us to the upper and lower occurrences of q, respectively. Let pt′ and pt′′ be the subtrees of pt at positions pat′ and pat′′, i.e., the positions of the upper and lower occurrences of q, respectively.

Consider the tree formed from pt by replacing the subtree at position pat′ by q. This tree has yield uqy, for unique strings u and y. Consider the tree formed from pt′ by replacing the subtree pt′′ by q. More precisely, form the path pat′′′ by removing pat′ from the beginning of pat′′. Then replace the subtree of pt′ at position pat′′′ by q. This tree has yield vqx, for unique strings v and x. Furthermore, since |pat| is the height of pt, the length of the path formed by removing pat′ from pat will be the height of pt′. But we know that this length is at most k + 1, because, when working upwards through the variables visited by pat, we stopped as soon as we found a repetition of variables. Thus the height of pt′ is at most k + 1.

Let w be the yield of pt′′. Thus vwx is the yield of pt′, so that z = uvwxy is the yield of pt. Because the height of pt′ is at most k + 1, our fact about valid parse trees of grammars in CNF tells us that |vwx| ≤ 2^{(k+1)−1} = 2^k = n, showing that Part (1) holds.

Because G is in CNF, pt′, which has q as its root label, has two children. The child whose root node isn't visited by pat′′′ will have a non-empty yield, and this yield will be a prefix of v, if this child is the left child, and will be a suffix of x, if this child is the right child. Thus vx ≠ %, showing that Part (2) holds.

It remains to show Part (3), i.e., that u v^i w x^i y ∈ L(G) ⊆ L, for all i ∈ N. We define a valid parse tree pt_i for G, with root label q and yield v^i w x^i, by recursion on i ∈ N. We let pt_0 be pt′′. Then, if i ∈ N, we form pt_{i+1} from pt′ by replacing the subtree at position pat′′′ by pt_i. Suppose i ∈ N. Then the parse tree formed from pt by replacing the subtree at position pat′ by pt_i is valid for G, has root label s_G, and has yield u v^i w x^i y, showing that u v^i w x^i y ∈ L(G). ✷

4.10.2 Experimenting with the Pumping Lemma Using Forlan

The Forlan module PT defines a type and several functions that implement the idea behind the pumping lemma:

type pumping_division = (pt * int list) * (pt * int list) * pt

val checkPumpingDivision : pumping_division -> unit
val validPumpingDivision : pumping_division -> bool
val strsOfValidPumpingDivision :
      pumping_division -> str * str * str * str * str
val pumpValidPumpingDivision : pumping_division * int -> pt
val findValidPumpingDivision : pt -> pumping_division

A pumping division is a triple ((pt1, pat1), (pt2, pat2), pt3), where pt1, pt2, pt3 ∈ PT and pat1, pat2 ∈ List Z. We say that a pumping division ((pt1, pat1), (pt2, pat2), pt3) is valid iff

• pat1 is a valid path for pt1, pointing to a leaf whose label isn't %;

• pat2 is a valid path for pt2, pointing to a leaf whose label isn't %;

• the label of the leaf of pt1 pointed to by pat1 is equal to the root label of pt2;

• the label of the leaf of pt2 pointed to by pat2 is equal to the root label of pt2;

• the root label of pt3 is equal to the root label of pt2;

• the yield of pt2 has at least two symbols;

• the yield of pt1 has only one occurrence of the root label of pt2;

• the yield of pt2 has only one occurrence of the root label of pt2; and

• the yield of pt3 does not contain the root label of pt2.

The function checkPumpingDivision checks whether a pumping division is valid, silently returning () if it is, and explaining why it isn't, otherwise. The function validPumpingDivision tests whether a pumping division is valid. When the function strsOfValidPumpingDivision is applied to a valid pumping division ((pt1, pat1), (pt2, pat2), pt3), it returns (u, v, w, x, y), where:

• u is the prefix of yield pt1 that precedes the unique occurrence of the root label of pt2;

• v is the prefix of yield pt2 that precedes the unique occurrence of the root label of pt2;

• w = yield pt3;

• x is the suffix of yield pt2 that follows the unique occurrence of the root label of pt2; and

• y is the suffix of yield pt1 that follows the unique occurrence of the root label of pt2.

The function issues an error message if the supplied pumping division isn't valid.

When the function pumpValidPumpingDivision is applied to the pair of a valid pumping division ((pt1, pat1), (pt2, pat2), pt3) and a natural number n, it returns update(pt1, pat1, pow n), where the function pow ∈ N → PT is defined by:

pow 0 = pt3,
pow(n + 1) = update(pt2, pat2, pow n), for all n ∈ N.

The function issues an error message if its first argument isn’t valid, or its second argument is negative. When the function findValidPumpingDivision is called with a parse tree pt, it tries to find a valid pumping division pd such that

pumpValidPumpingDivision(pd, 1) = pt.

It works as follows. First, the leftmost, maximum length path pat through pt is found. If this path points to %, then an error message is issued. Otherwise, findValidPumpingDivision generates the following list of variables paired with prefixes of pat:

• the root label of the subtree pointed to by the path consisting of all but the last element of pat, paired with that path;

• the root label of the subtree pointed to by the path consisting of all but the last two elements of pat, paired with that path;

• ...;

• the root label of the subtree pointed to by the path consisting of the first element of pat, paired with that path; and

• the root label of the subtree pointed to by [ ], paired with [ ].

(Of course, the left-hand side of the last of these pairs is the root label of pt.) As it works through these pairs, it looks for the first repetition of variables. If there is no such repetition, it issues an error message. Otherwise, suppose that:

• q was the first repeated variable;

• pat1 was the path paired with q at the point of the first repetition; and

• pat′ was the path paired with q when it was first seen.

Now, it lets:

• pat2 be the result of dropping pat1 from the beginning of pat′;

• pt1 be update(pt, pat1, q);

• pt′ be the subtree of pt pointed to by pat1;

• pt2 be update(pt′, pat2, q);

• pt3 be the subtree of pt′ pointed to by pat2; and

• pd = ((pt1, pat1), (pt2, pat2), pt3).

If pd is a valid pumping division (only the last four conditions of the definition of validity remain to be checked), it is returned by findValidPumpingDivision. Otherwise, an error message is issued. For example, suppose that gram is bound to the grammar

A → % | 0BA | 1CA, B → 1 | 0BB, C → 0 | 1CC.

Then we can proceed as follows:

- val pt = Gram.parseAlphabet gram (Str.input "");
@ 1110010010
@ .
val pt = - : pt
- PT.output("", pt);
A(1, C(1, C(1, C(0), C(0)), C(1, C(0), C(0))), A(1, C(0), A(%)))
val it = () : unit
- val pd = PT.findValidPumpingDivision pt;
val pd = ((-,[2,2]),(-,[2]),-) : PT.pumping_division
- val ((pt1, pat1), (pt2, pat2), pt3) = pd;
val pt1 = - : pt
val pat1 = [2,2] : int list
val pt2 = - : pt
val pat2 = [2] : int list
val pt3 = - : pt
- PT.output("", pt1);
A(1, C(1, C, C(1, C(0), C(0))), A(1, C(0), A(%)))
val it = () : unit
- PT.output("", pt2);
C(1, C, C(0))
val it = () : unit
- PT.output("", pt3);
C(0)
val it = () : unit
- val (u, v, w, x, y) = PT.strsOfValidPumpingDivision pd;
val u = [-,-] : str

val v = [-] : str
val w = [-] : str
val x = [-] : str
val y = [-,-,-,-,-] : str
- (Str.toString u, Str.toString v, Str.toString w,
=  Str.toString x, Str.toString y);
val it = ("11","1","0","0","10010")
  : string * string * string * string * string
- val pt' = PT.pumpValidPumpingDivision(pd, 2);
val pt' = - : pt
- PT.output("", pt');
A(1, C(1, C(1, C(1, C(0), C(0)), C(0)), C(1, C(0), C(0))),
A(1, C(0), A(%)))
val it = () : unit
- Str.output("", PT.yield pt');
111100010010
val it = () : unit

4.10.3 Consequences of Pumping Lemma

We are now in a position to show that the context-free languages are not closed under either intersection or set difference. Suppose

L = { 0^n 1^n 2^n | n ∈ N },
A = { 0^n 1^n 2^m | n, m ∈ N }, and
B = { 0^n 1^m 2^m | n, m ∈ N }.

As we proved above, L is not context-free. In contrast, it's easy to find grammars generating A and B, showing that A and B are context-free. But A ∩ B = L, and thus we have a counterexample to the context-free languages being closed under intersection. Now, we have that {0, 1, 2}∗ − A is context-free, since it is the union of the context-free languages

{0, 1, 2}∗ − {0}∗{1}∗{2}∗

and

{ 0^{n1} 1^{n2} 2^m | n1, n2, m ∈ N and n1 ≠ n2 }

(the first of these languages is regular), and the context-free languages are closed under union. Similarly, we have that {0, 1, 2}∗ − B is context-free. Let

C = ({0, 1, 2}∗ − A) ∪ ({0, 1, 2}∗ − B).

Thus C is a context-free subset of {0, 1, 2}∗. Since A, B ⊆ {0, 1, 2}∗, it is easy to show that

A ∩ B = {0, 1, 2}∗ − (({0, 1, 2}∗ − A) ∪ ({0, 1, 2}∗ − B)) = {0, 1, 2}∗ − C.

Thus

{0, 1, 2}∗ − C = A ∩ B = L is not context-free, giving us a counterexample to the context-free languages being closed under set difference. Of course, this is also a counterexample to the context-free languages being closed under complementation.

4.10.4 Notes

Apart from the subsection on Forlan's support for experimenting with the pumping lemma, the material in this section is completely standard.

Chapter 5

Recursive and Recursively Enumerable Languages

In this chapter, we will study a universal programming language, which we will use to define the recursive and recursively enumerable languages. We will see that the context-free languages are a proper subset of the recursive languages, that the recursive languages are a proper subset of the recursively enumerable languages, and that there are languages that are not recursively enumerable. Furthermore, we will learn that there are problems, like the halting problem (the problem of determining whether a program halts when run on a given input), or the problem of determining if two grammars generate the same language, that can’t be solved by programs. Traditionally, one uses Turing machines for the universal programming lan- guage. Turing machines are finite automata that manipulate infinite tapes. Although Turing machines are very appealing in some ways, they are rather far-removed from conventional programming languages, and are hard to build and reason about. Instead, we will work with a simple functional programming language. This language will have the same power as Turing machines, but will be much easier to program in and reason about than Turing machines. An “implementation” of our language (or of Turing machines) on a real computer will run out of resources on some programs.

5.1 Programs and Recursive and RE Languages

In this section, we introduce our functional programming language, and then use it to define the recursive and recursively enumerable languages.


5.1.1 Programs

Our programming language is statically scoped, i.e., nonlocal names used by functions are interpreted relative to the environments in which they are declared. It is dynamically typed, in that all type checking happens at runtime. It is deterministic, in the sense that every evaluation of a program has the same result. It is functional, not imperative (assignment-oriented), i.e., there is no mutable state. And it is eager, not lazy, in the sense that a function's argument must be completely evaluated before the function is called. To say what programs are, we need some preliminary definitions:

• A variable is a nonempty string of letters (a, b,..., z, A, B,..., Z) and digits (0, 1,..., 9) that begins with a letter. We write Var for the set of all variables, and we order variables using the restriction of our total ordering on strings to Var.

• A constant is one of the strings true, false and nil, and we write Const for the set of all constants.

• A program operator is one of the strings isNil, isInt, isNeg, isZero, isPos, isSym, isStr, isPair, isLam, plus, minus, compare, fst, snd, consSym, deconsSym, symListToStr and strToSymList, and we write Oper for the set of all operators.

Programs are trees (see Section 1.3) whose labels come from the set ProgLab of program labels, which consists of the union of:

(variable) { var(v) | v ∈ Var };

(constant) { const(con) | con ∈ Const };

(integer) { int(n) | n ∈ Z };

(symbol) { sym(a) | a ∈ Sym };

(string) { str(x) | x ∈ Str };

(pair) {pair};

(calculation) { calc(oper ) | oper ∈ Oper };

(conditional) {cond};

(function application) {app};

(anonymous function) { lam(v) | v ∈ Var } (lam stands for “λ”, recalling the notation for anonymous functions in the λ-calculus);

(simple let) { letSimp(v) | v ∈ Var }; and

(recursive let) { letRec(v1, v2) | v1, v2 ∈ Var }.

Let the set Prog of programs be the least subset of Tree ProgLab such that:

(variable) for all v ∈ Var, var(v) ∈ Prog (a leaf);

(constant) for all con ∈ Const,

const(con) ∈ Prog

(a leaf);

(integer) for all n ∈ Z, int(n) ∈ Prog (a leaf);

(symbol) for all a ∈ Sym,

sym(a) ∈ Prog

(a leaf);

(string) for all x ∈ Str, str(x) ∈ Prog (a leaf);

(pair) for all pr 1, pr 2 ∈ Prog,

pair(pr 1, pr 2) ∈ Prog

(a node labeled pair with two children);

(calculation) for all oper ∈ Oper and pr ∈ Prog,

calc(oper )(pr) ∈ Prog

(a node labeled calc(oper ), with one child; we abbreviate it as calc(oper , pr));

(conditional) for all pr 1, pr 2, pr 3 ∈ Prog,

cond(pr 1, pr 2, pr 3) ∈ Prog

(a node labeled cond, with three children);

(function application) for all pr 1, pr 2 ∈ Prog,

app(pr 1, pr 2) ∈ Prog

(a node labeled app, with two children);

(anonymous function) for all v ∈ Var and pr ∈ Prog,

lam(v)(pr ) ∈ Prog

(a node labeled lam(v), with one child; we abbreviate it as lam(v, pr));

(simple let) for all v ∈ Var and pr 1, pr 2 ∈ Prog,

letSimp(v)(pr 1, pr 2) ∈ Prog

(a node labeled letSimp(v), with two children; we abbreviate it as letSimp(v, pr 1, pr 2)); and

(recursive let) for all v1, v2 ∈ Var and pr 1, pr 2 ∈ Prog,

letRec(v1, v2)(pr 1, pr 2) ∈ Prog

(a node labeled letRec(v1, v2), with two children; we abbreviate it as letRec(v1, v2, pr1, pr2)).

Prog is countably infinite. Informally:

• A pair pair(pr1, pr2) is evaluated by evaluating pr1 and then pr2.

• A calculation calc(oper, pr) applies the operator oper to the result of evaluating pr.

• A conditional cond(pr 1, pr 2, pr 3) first evaluates pr 1 to a boolean constant; if this constant is const(true), it evaluates pr 2; and if this constant is const(false), it evaluates pr 3.

• A function application app(pr1, pr2) evaluates pr1 to an anonymous function, and then applies this function to the result of evaluating pr2.

• If an anonymous function lam(v, pr) is applied to a fully evaluated argument, then pr is evaluated in an environment in which v is bound to the argument.

• A simple let letSimp(v, pr1, pr2) is evaluated by evaluating pr2 in an environment in which v is bound to the result of evaluating pr1.

• A recursive let letRec(v1, v2, pr1, pr2) is evaluated by evaluating pr2 in an environment in which v1 has been recursively declared by:

v1 = lam(v2, pr 1).

Lists are represented by pairs:

pair(pr1, pair(pr2, ... pair(prn, const(nil)))) represents the n-length list whose elements are pr1, pr2, ..., prn (so const(nil) represents the empty list).

In the Forlan syntax for programs, the elements of Prog are written using our standard syntax for trees, except that:

• The integer arguments to int are written as base-10 numerals, preceded by ~ in the case of negative integers. The integer 0 is written as 0; extra zeros aren't used/allowed. Positive integers are written without leading zeros. And negative integers are written as ~ followed by a nonempty string of digits with no leading zeros.

• The symbol arguments to sym, and the string arguments to str are written in abbreviated form, as usual.

• All nodes are abbreviated. E.g.,

letSimp(x)(int(−12), var(x)) is written as

letSimp(x, int(~12), var(x))

Programs can also be described as strings over the alphabet consisting of the letters and digits, plus the elements of

{⟨comma⟩, ⟨perc⟩, ⟨tilde⟩, ⟨openPar⟩, ⟨closPar⟩, ⟨less⟩, ⟨great⟩}.

A program pr is described by the string formed by writing pr in Forlan's syntax, using no whitespace, and then performing the following substitutions:

• , is replaced by ⟨comma⟩;

• % is replaced by ⟨perc⟩;

• ~ is replaced by ⟨tilde⟩;

• ( is replaced by ⟨openPar⟩;

• ) is replaced by ⟨closPar⟩;

• < is replaced by ⟨less⟩; and

• > is replaced by ⟨great⟩.

For example, the program

calc(plus, pair(int(4), int(−5))) is described by the string

calc⟨openPar⟩plus⟨comma⟩pair⟨openPar⟩int⟨openPar⟩4⟨closPar⟩
⟨comma⟩int⟨openPar⟩⟨tilde⟩5⟨closPar⟩⟨closPar⟩⟨closPar⟩.
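A minimal SML sketch of the seven substitutions, writing the symbol ⟨comma⟩ as the literal text "<comma>" (and similarly for the others) purely for illustration:

(* sketch: per-character substitution over the plain string form of a
   program; "<comma>" etc. stand in for the symbols of the alphabet *)
fun encodeChar c =
  case c of
      #"," => "<comma>"   | #"%" => "<perc>"
    | #"~" => "<tilde>"   | #"(" => "<openPar>"
    | #")" => "<closPar>" | #"<" => "<less>"
    | #">" => "<great>"
    | _    => String.str c   (* letters and digits pass through *)

val encode = String.translate encodeChar

For example, encode "calc(plus,pair(int(4),int(~5)))" yields the stand-in form of the string displayed above.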

Every program is described by a unique string, and every string describes at most one program. (E.g., the string ⟨comma⟩⟨closPar⟩ doesn't describe a program.)

The Forlan module Var defines the abstract type (in the top-level environment) var of variables, along with functions, including:

val input : string -> var
val output : string * var -> unit
val compare : var * var -> order
val equal : var * var -> bool

The function compare implements our total ordering on variables, and equal tests whether two variables are equal. The module VarSet defines various functions for processing finite sets of variables (elements of type var set),

val input : string -> var set
val output : string * var set -> unit
val fromList : var list -> var set
val memb : var * var set -> bool
val subset : var set * var set -> bool
val equal : var set * var set -> bool
val union : var set * var set -> var set
val inter : var set * var set -> var set
val minus : var set * var set -> var set
val genUnion : var set list -> var set
val genInter : var set list -> var set

The total ordering associated with sets of variables is our total ordering on variables. Sets of variables are expressed in Forlan's syntax as sequences of variables, separated by commas.

The function fromList returns a set with the same elements as the list of variables it is called with. The function memb tests whether a variable is a member (element) of a set of variables, subset tests whether a first set of variables is a subset of a second one, and equal tests whether two sets of variables are equal. The functions union, inter and minus compute the union, intersection and difference of two sets of variables. The function genUnion computes the generalized union of a list of sets of variables xss, returning the set of all variables appearing in at least one element of xss. And, the function genInter computes the generalized intersection of a nonempty list of sets of variables xss, returning the set of all variables appearing in all elements of xss.

The Forlan module Prog defines the abstract type (in the top-level environment) prog of programs, along with a number of types and functions, including:

datatype const = True | False | Nil
datatype oper =
           IsNil | IsInt | IsNeg | IsZero | IsPos | IsSym | IsStr
         | IsPair | IsLam | Plus | Minus | Compare | Fst | Snd
         | ConsSym | DeconsSym | SymListToStr | StrToSymList
val var : Var.var -> prog
val const : const -> prog
val int : IntInf.int -> prog
val sym : sym -> prog
val str : str -> prog
val pair : prog * prog -> prog
val calc : oper * prog -> prog
val cond : prog * prog * prog -> prog
val app : prog * prog -> prog
val lam : var * prog -> prog
val letSimp : var * prog * prog -> prog
val letRec : var * var * prog * prog -> prog
val input : string -> prog
val output : string * prog -> unit
val equal : prog * prog -> bool
val height : prog -> int
val size : prog -> int
val fromStr : str -> prog
val toStr : prog -> str

const and oper are the datatypes of program constants and operators. The function var takes in a program variable v, and returns the program var(v). The function const takes in a program constant con, and returns the program const(con). The function int takes in an (infinite-precision) integer n, and returns the program int(n). The function sym takes in a symbol a, and returns the program sym(a). The function str takes in a string x, and returns the program str(x). The function pair takes in a pair (pr1, pr2) of programs, and returns the program pair(pr1, pr2). The function calc takes in a pair (oper, pr) of a program operator and a program, and returns the program calc(oper, pr). The function cond takes in a triple (pr1, pr2, pr3) of programs, and returns the program cond(pr1, pr2, pr3). The function app takes in a pair (pr1, pr2) of programs, and returns the program app(pr1, pr2). The function lam takes in a pair (v, pr) of a variable v and a program pr, and returns the program lam(v, pr). The function letSimp takes in a triple (v, pr1, pr2) of a variable

v and programs pr1 and pr2, and returns the program letSimp(v, pr1, pr2). The function letRec takes in a quadruple (v1, v2, pr1, pr2) of variables v1 and v2 and programs pr1 and pr2, and returns the program letRec(v1, v2, pr1, pr2). The function equal tests two programs for equality. The functions height and size return the height and size, respectively, of a program. The function fromStr issues an error message if its argument isn't a string over the alphabet (given above) for describing programs, or if it is a string over this alphabet, but doesn't describe a program; otherwise it returns the unique program described by its argument. And the function toStr converts a program to the unique string describing it. For example, we can proceed as follows:

- val pr = Prog.input "";
@ calc(plus, pair(int(~6), int(5)))
@ .
val pr = - : prog
- Prog.height pr;
val it = 2 : int
- Prog.size pr;
val it = 4 : int
- val x = Prog.toStr pr;
val x =
  [-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-]
  : str
- Str.output("", x);
calc⟨openPar⟩plus⟨comma⟩pair⟨openPar⟩int⟨openPar⟩⟨tilde⟩6⟨closPar⟩
⟨comma⟩int⟨openPar⟩5⟨closPar⟩⟨closPar⟩⟨closPar⟩
val it = () : unit
- val pr' = Prog.fromStr x;
val pr' = - : prog
- Prog.equal(pr', pr);
val it = true : bool

The Java program JForlan can be used to view and edit program trees. It can be invoked directly, or run via Forlan. See the Forlan website for more information.

5.1.2 Program Meaning

Variables are bound (declared) in programs, as follows:

• lam(v, pr ) binds v in pr;

• letSimp(v, pr 1, pr 2) binds v in pr 2; and

• letRec(v1, v2, pr1, pr2) binds v1 in pr1 and pr2, and binds v2 in pr1.

We say that an occurrence of a variable in a program is free iff it isn't bound.

Formally, we define a function free ∈ Prog → { X ⊆ Var | X is finite } by structural recursion:

• free(var(v)) = ∅, for all v ∈ Var;

• free(const(con)) = ∅, for all con ∈ Const;

• free(int(n)) = ∅, for all n ∈ Z;

• free(sym(a)) = ∅, for all a ∈ Sym;

• free(str(x)) = ∅, for all x ∈ Str;

• free(pair(pr1, pr2)) = free pr1 ∪ free pr2, for all pr1, pr2 ∈ Prog;

• free(calc(oper, pr)) = free pr, for all oper ∈ Oper and pr ∈ Prog;

• free(cond(pr1, pr2, pr3)) = free pr1 ∪ free pr2 ∪ free pr3, for all pr1, pr2, pr3 ∈ Prog;

• free(app(pr1, pr2)) = free pr1 ∪ free pr2, for all pr1, pr2 ∈ Prog;

• free(lam(v, pr)) = free pr − {v}, for all v ∈ Var and pr ∈ Prog;

• free(letSimp(v, pr1, pr2)) = free pr1 ∪ (free pr2 − {v}), for all v ∈ Var and pr1, pr2 ∈ Prog; and

• free(letRec(v1, v2, pr1, pr2)) = (free pr1 − {v1, v2}) ∪ (free pr2 − {v1}), for all v1, v2 ∈ Var and pr1, pr2 ∈ Prog.

If v ∈ Var and pr ∈ Prog, we say that v is free in pr iff v ∈ free pr. A program pr is closed iff it has no free variables, i.e., free pr = ∅. We write CP for the set of all closed programs.
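The clauses above translate directly into structural recursion. A minimal SML sketch over a cut-down program datatype (the constructor names are hypothetical, and lists with possible duplicates stand in for sets):

(* sketch: free variables for a cut-down program type *)
datatype mini
  = MVar of string
  | MApp of mini * mini
  | MLam of string * mini
  | MLetSimp of string * mini * mini

(* remove all occurrences of v from a list of variables *)
fun remove (v, xs) = List.filter (fn x => x <> v) xs

fun free (MVar v)               = [v]
  | free (MApp (p1, p2))        = free p1 @ free p2
  | free (MLam (v, p))          = remove (v, free p)
  | free (MLetSimp (v, p1, p2)) = free p1 @ remove (v, free p2)

For instance, free (MLam ("x", MApp (MVar "x", MVar "y"))) is ["y"].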

val free : prog -> var set
type cp
val toClosed : prog -> cp
val fromClosed : cp -> prog

The function free returns the free variables of its argument. The type cp (in the top-level environment) is the type of closed programs. The function toClosed issues an error message if its argument isn’t closed; otherwise, it returns its argument. And the function fromClosed simply returns its argument. For example, we can proceed as follows:

- val pr1 = Prog.input "";
@ lam(x, letSimp(y, var(x), app(var(z), var(w))))
@ .
val pr1 = - : prog

- VarSet.output("", Prog.free pr1);
w, z
val it = () : unit
- val pr2 = Prog.input "";
@ letRec
@ (x, y,
@  app(var(x), app(var(y), var(z))),
@  app(var(x), var(w)))
@ .
val pr2 = - : prog
- VarSet.output("", Prog.free pr2);
w, z
val it = () : unit
- Prog.toClosed pr2;
program has free variables: "w, z"

uncaught exception Error
- val pr' = Prog.input "";
@ lam(x, letSimp(y, var(x), app(var(x), var(y))))
@ .
val pr' = - : prog
- val cp = Prog.toClosed pr';
val cp = - : cp
- val pr'' = Prog.fromClosed cp;
val pr'' = - : prog
- Prog.equal(pr'', pr');
val it = true : bool

Next, we define a function subst ∈ CP × Var × Prog → Prog for substituting a closed program for all of the free occurrences of a variable in a program. subst is defined by structural recursion on its third argument, as follows:

• for all pr′ ∈ CP, v′ ∈ Var and v ∈ Var,

subst(pr′, v′, var(v)) =
    pr′,      if v′ = v,
    var(v),   if v′ ≠ v;

• for all pr ′ ∈ CP, v′ ∈ Var and con ∈ Const,

subst(pr ′, v′, const(con)) = const(con);

• for all pr ′ ∈ CP, v′ ∈ Var and n ∈ Z,

subst(pr ′, v′, int(n)) = int(n);

• for all pr ′ ∈ CP, v′ ∈ Var and a ∈ Sym,

subst(pr′, v′, sym(a)) = sym(a);

• for all pr ′ ∈ CP, v′ ∈ Var and x ∈ Str,

subst(pr ′, v′, str(x)) = str(x);

• for all pr′ ∈ CP, v′ ∈ Var and pr1, pr2 ∈ Prog,

subst(pr′, v′, pair(pr1, pr2)) = pair(subst(pr′, v′, pr1), subst(pr′, v′, pr2));

• for all pr′ ∈ CP, v′ ∈ Var, oper ∈ Oper and pr ∈ Prog,

subst(pr′, v′, calc(oper, pr)) = calc(oper, subst(pr′, v′, pr));

• for all pr′ ∈ CP, v′ ∈ Var and pr1, pr2, pr3 ∈ Prog,

subst(pr′, v′, cond(pr1, pr2, pr3)) = cond(subst(pr′, v′, pr1), subst(pr′, v′, pr2), subst(pr′, v′, pr3));

• for all pr′ ∈ CP, v′ ∈ Var and pr1, pr2 ∈ Prog,

subst(pr′, v′, app(pr1, pr2)) = app(subst(pr′, v′, pr1), subst(pr′, v′, pr2));

• for all pr′ ∈ CP, v′ ∈ Var, v ∈ Var and pr ∈ Prog,

subst(pr′, v′, lam(v, pr)) =
    lam(v, pr),                   if v′ = v,
    lam(v, subst(pr′, v′, pr)),   if v′ ≠ v;

• for all pr′ ∈ CP, v′ ∈ Var, v ∈ Var and pr1, pr2 ∈ Prog,

subst(pr′, v′, letSimp(v, pr1, pr2)) =
    letSimp(v, subst(pr′, v′, pr1), pr2),                  if v′ = v,
    letSimp(v, subst(pr′, v′, pr1), subst(pr′, v′, pr2)),  if v′ ≠ v;

• for all pr′ ∈ CP, v′ ∈ Var, v1, v2 ∈ Var and pr1, pr2 ∈ Prog,

subst(pr′, v′, letRec(v1, v2, pr1, pr2)) =
    letRec(v1, v2, pr1, pr2),
        if v′ = v1,
    letRec(v1, v2, pr1, subst(pr′, v′, pr2)),
        if v′ ≠ v1 and v′ = v2,
    letRec(v1, v2, subst(pr′, v′, pr1), subst(pr′, v′, pr2)),
        if v′ ≠ v1 and v′ ≠ v2.

The module Prog also defines the function

val subst : cp * var * prog -> prog

corresponding to subst. Here are some examples of its use:

- val cp = Prog.toClosed(Prog.fromString "const(true)");
val cp = - : cp
- val pr1 = Prog.input "";
@ cond(var(x), var(y), int(4))
@ .
val pr1 = - : prog
- val pr1' = Prog.subst(cp, Var.fromString "y", pr1);
val pr1' = - : prog
- Prog.output("", pr1');
cond(var(x), const(true), int(4))
val it = () : unit
- val pr2 = Prog.input "";
@ letSimp(x, var(x), pair(var(x), var(y)))
@ .
val pr2 = - : prog
- val pr2' = Prog.subst(cp, Var.fromString "x", pr2);
val pr2' = - : prog
- Prog.output("", pr2');
letSimp(x, const(true), pair(var(x), var(y)))
val it = () : unit
- val pr3 = Prog.input "";
@ letSimp(x, var(y), pair(var(x), var(y)))
@ .
val pr3 = - : prog
- val pr3' = Prog.subst(cp, Var.fromString "y", pr3);
val pr3' = - : prog
- Prog.output("", pr3');
letSimp(x, const(true), pair(var(x), const(true)))
val it = () : unit
- val pr4 = Prog.input "";
@ letRec
@ (x, y,
@  app(var(x), app(var(y), var(z))),
@  app(var(x), app(var(y), var(z))))
@ .
val pr4 = - : prog
- val pr4' = Prog.subst(cp, Var.fromString "x", pr4);
val pr4' = - : prog
- Prog.output("", pr4');
letRec(x, y, app(var(x), app(var(y), var(z))),
app(var(x), app(var(y), var(z))))
val it = () : unit
- val pr4'' = Prog.subst(cp, Var.fromString "y", pr4);
val pr4'' = - : prog

- Prog.output("", pr4'');
letRec(x, y, app(var(x), app(var(y), var(z))),
app(var(x), app(const(true), var(z))))
val it = () : unit
- val pr4''' = Prog.subst(cp, Var.fromString "z", pr4);
val pr4''' = - : prog
- Prog.output("", pr4''');
letRec(x, y, app(var(x), app(var(y), const(true))),
app(var(x), app(var(y), const(true))))
val it = () : unit

Next, we single out certain closed programs as completely evaluated, or values. Let the set Val of values be the least subset of CP such that:

(constant) for all con ∈ Const, const(con) ∈ Val;

(integer) for all n ∈ Z, int(n) ∈ Val;

(symbol) for all a ∈ Sym, sym(a) ∈ Val;

(string) for all x ∈ Str, str(x) ∈ Val;

(pair) for all pr1, pr2 ∈ Val, pair(pr1, pr2) ∈ Val; and

(anonymous function) for all v ∈ Var and pr ∈ Prog, if free pr ⊆ {v}, then lam(v, pr) ∈ Val.

E.g.,

pair(int(4), pair(int(4), str(0110)))

is a value. The module Prog also defines a function

val isValue : cp -> bool

that tests whether a closed program is a value. Here are some examples of its use:

- val pr1 = Prog.input "";
@ pair(int(4), pair(lam(x, var(x)), const(true)))
@ .
val pr1 = - : prog
- Prog.isValue(Prog.toClosed pr1);
val it = true : bool
- val pr2 = Prog.input "";
@ app(lam(x, var(x)), lam(x, var(x)))
@ .
val pr2 = - : prog
- Prog.isValue(Prog.toClosed pr2);

val it = false : bool

To explain the meaning of program operators, we define a function calculate ∈ Oper × Val → Option Val, which returns none to indicate an error, and returns some pr when the application of the operator produced the value pr. We proceed as follows, using a case analysis on the form of the argument value:

• ((isNil, const(nil))) Return some(const(true)).

• ((isNil, pr), where pr ∉ {const(nil)}) Return some(const(false)).

• ((isInt, int(n)), where n ∈ Z) Return some(const(true)).

• ((isInt, pr), where pr ∉ { int(n) | n ∈ Z }) Return some(const(false)).

• ((isNeg, int(n)), where n ∈ Z) Return: some(const(true)), if n < 0; and some(const(false)), if n ≥ 0.

• ((isZero, int(n)), where n ∈ Z) Return: some(const(true)), if n = 0; and some(const(false)), if n 6= 0.

• ((isPos, int(n)), where n ∈ Z) Return: some(const(true)), if n > 0; and some(const(false)), if n ≤ 0.

• ((isSym, sym(a)), where a ∈ Sym) Return some(const(true)).

• ((isSym, pr), where pr ∉ { sym(a) | a ∈ Sym }) Return some(const(false)).

• ((isStr, str(x)), where x ∈ Str) Return some(const(true)).

• ((isStr, pr), where pr ∉ { str(x) | x ∈ Str }) Return some(const(false)).

• ((isPair, pair(pr 1, pr 2)), where pr 1, pr 2 ∈ Val) Return some(const(true)).

• ((isPair, pr), where pr ∉ { pair(pr1, pr2) | pr1, pr2 ∈ Val }) Return some(const(false)).

• ((isLam, lam(v, pr )), where v ∈ Var, pr ∈ Prog and free pr ⊆ {v}) Return some(const(true)).

• ((isLam, pr), where pr ∉ { lam(v, pr) | v ∈ Var and pr ∈ Prog and free pr ⊆ {v} }) Return some(const(false)).

• ((plus, pair(int(m), int(n))), where m,n ∈ Z) Return some(int(m + n)).

• ((minus, pair(int(m), int(n))), where m, n ∈ Z) Return some(int(m − n)).

• ((compare, pair(int(m), int(n))), where m, n ∈ Z) Return: some(int(−1)), if m < n; some(int(0)), if m = n; and some(int(1)), if m > n.

• ((compare, pair(sym(a), sym(b))), where a, b ∈ Sym) Return: some (int(−1)), if a < b; some(int(0)), if a = b; and some(int(1)), if a > b.

• ((compare, pair(str(x), str(y))), where x, y ∈ Str) Return: some(int(−1)), if x < y; some(int(0)), if x = y; and some(int(1)), if x > y.

• ((fst, pair(pr1, pr2)), where pr1, pr2 ∈ Val) Return some pr1.

• ((snd, pair(pr1, pr2)), where pr1, pr2 ∈ Val) Return some pr2.

• ((consSym, int(n)), where n ∈ [1 : 62]) Return some(sym(a)), where a is the nth (counting from 1) element of the following sequence of symbols: 0, ..., 9, a, ..., z, A, ..., Z.

• ((consSym, pair(pr1, ... pair(prn, const(nil)) ... )), where n ∈ N and pr1, ..., prn ∈ {const(nil)} ∪ { sym(a) | a ∈ Sym }) Return

some(sym([⟨] @ f pr1 @ ··· @ f prn @ [⟩])),

where f is the function from {const(nil)} ∪ { sym(a) | a ∈ Sym } to {[,]} ∪ Sym defined by:

f(const(nil)) = [,],
f(sym(a)) = a, for all a ∈ Sym.

(Remember that symbols are lists; see Section 2.1. If n = 0, then some(sym(⟨⟩)) is returned.)

• ((deconsSym, sym(a)), where a ∈ {0, ..., 9, a, ..., z, A, ..., Z}) Return some(int(n)), where n is the position (counting from 1) of a in the following sequence of symbols: 0, ..., 9, a, ..., z, A, ..., Z.

• ((deconsSym, sym([⟨] @ x1 @ ··· @ xn @ [⟩])), where n ∈ N and x1, ..., xn ∈ {[,]} ∪ Sym) Return

some(pair(f x1, ... pair(f xn, const(nil)) ... )),

where f is the function from {[,]} ∪ Sym to {const(nil)} ∪ { sym(a) | a ∈ Sym } defined by

f [,] = const(nil),
f a = sym(a), for all a ∈ Sym.

(If n = 0, then some(const(nil)) is returned.)

• ((symListToStr, const(nil))) Return some(str(%)).

• ((symListToStr, pair(sym(a1), ... pair(sym(an), const(nil)) ... )), where n ∈ N − {0} and a1, ..., an ∈ Sym) Return some(str(a1 ··· an)).

• ((strToSymList, str(%))) Return some(const(nil)).

• ((strToSymList, str(a1 · · · an)), where n ∈ N − {0} and a1, . . . , an ∈ Sym) Return

some(pair(sym(a1), ... pair(sym(an), const(nil)) . . . )).

• (otherwise) Return none.

Now we are able to say how closed programs evaluate. Let

Step = {value, error} ∪ { next pr | pr ∈ CP }.

We define a function step ∈ CP → Step using a case analysis on the form of its argument, pr. It returns: value, if pr is a value; error, if pr isn't a value, but can't be run a single step; and next pr′, for pr′ ∈ CP, if pr′ is the result of running pr for one step. We proceed as follows:

• (const(con), where con ∈ Const) Return value.

• (int(n), where n ∈ Z) Return value.

• (sym(a), where a ∈ Sym) Return value.

• (str(x), where x ∈ Str) Return value.

• (pair(pr1, pr2), where pr1, pr2 ∈ CP) Use case analysis on the form of step pr1:

– (error) Return error.

– (value) Use case analysis on the form of step pr2:

∗ (error) Return error.
∗ (value) Return value.
∗ (next pr2′, where pr2′ ∈ CP) Return next(pair(pr1, pr2′)).

– (next pr1′, where pr1′ ∈ CP) Return next(pair(pr1′, pr2)).

• (calc(oper, pr), where oper ∈ Oper and pr ∈ CP) Use case analysis on the form of step pr:

– (error) Return error.

– (value) Use case analysis on the form of calculate(oper, pr):

∗ (none) Return error.
∗ (some pr′, where pr′ ∈ Val) Return next pr′.

– (next pr′, where pr′ ∈ CP) Return next(calc(oper, pr′)).

• (cond(pr1, pr2, pr3), where pr1, pr2, pr3 ∈ CP) Use case analysis on the form of step pr1:

– (error) Return error.

– (value) Use case analysis on the form of pr1 (which is a value):

∗ (const(true)) Return next pr2.
∗ (const(false)) Return next pr3.
∗ (anything else) Return error.

– (next pr1′, where pr1′ ∈ CP) Return next(cond(pr1′, pr2, pr3)).

• (lam(v, pr), where v ∈ Var, pr ∈ Prog and free pr ⊆ {v}) Return value.

• (letSimp(v, pr1, pr2), where v ∈ Var, pr1 ∈ CP and free pr2 ⊆ {v}) Use case analysis on the form of step pr1:

– (error) Return error.

– (value) Return next(subst(pr1, v, pr2)).

– (next pr1′) Return next(letSimp(v, pr1′, pr2)).

• (letRec(v1, v2, pr1, pr2), where v1, v2 ∈ Var, pr1, pr2 ∈ Prog, free pr1 ⊆ {v1, v2} and free pr2 ⊆ {v1}) Let pr ∈ CP be

subst(letRec(v1, v2, pr1, var(v1)), v1, lam(v2, pr1)).

Return next(subst(pr, v1, pr2)).

The module Prog defines the following datatype and function, which correspond to the set Step and function step:

datatype step = Value | Error | Next of cp
val step : cp -> step

Here is an example of how a nonterminating recursive function can be evaluated:

- val pr = Prog.input "";
@ letRec(x, y,
@   app(var(x), calc(plus, pair(var(y), int(1)))),
@   app(var(x), int(0)))
@ .
val pr = - : prog

- val pr1 = Prog.toClosed pr;
val pr1 = - : cp
- Prog.step pr1;
val it = Next - : Prog.step
- val Prog.Next pr2 = it;
val pr2 = - : cp
- Prog.output("", Prog.fromClosed pr2);
app(lam(y,
app(letRec(x, y, app(var(x), calc(plus, pair(var(y), int(1)))),
var(x)),
calc(plus, pair(var(y), int(1))))),
int(0))
val it = () : unit
- Prog.step pr2;
val it = Next - : Prog.step
- val Prog.Next pr3 = it;
val pr3 = - : cp
- Prog.output("", Prog.fromClosed pr3);
app(letRec(x, y, app(var(x), calc(plus, pair(var(y), int(1)))),
var(x)),
calc(plus, pair(int(0), int(1))))
val it = () : unit
- Prog.step pr3;
val it = Next - : Prog.step
- val Prog.Next pr4 = it;
val pr4 = - : cp
- Prog.output("", Prog.fromClosed pr4);
app(lam(y,
app(letRec(x, y, app(var(x), calc(plus, pair(var(y), int(1)))),
var(x)),
calc(plus, pair(var(y), int(1))))),
calc(plus, pair(int(0), int(1))))
val it = () : unit
- Prog.step pr4;
val it = Next - : Prog.step
- val Prog.Next pr5 = it;
val pr5 = - : cp
- Prog.output("", Prog.fromClosed pr5);
app(lam(y,
app(letRec(x, y, app(var(x), calc(plus, pair(var(y), int(1)))),
var(x)),
calc(plus, pair(var(y), int(1))))),
int(1))
val it = () : unit

Proposition 5.1.1
For all pr ∈ CP, step pr = value iff pr ∈ Val.

Let

Run = { ans pr | pr ∈ Val } ∪ { fail pr | pr ∈ CP } ∪ { intermed pr | pr ∈ CP }, and define a function run ∈ CP × N → Run by structural recursion on its second argument. Given pr ∈ CP and n ∈ N, run(pr,n) is computed using a case analysis of the form of n:

• (0) Return intermed pr.

• (n+1, where n ∈ N) We proceed by a case analysis of the form of step pr:

– (error) Return fail pr.
– (value) Return ans pr.
– (next pr′, where pr′ ∈ CP) Return run(pr′, n).
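This definition is ordinary fuel-indexed iteration of step, and can be written directly on top of Forlan's Prog.step, using the step and run datatypes of the Prog module (the latter is introduced just below):

(* sketch: run via repeated stepping; with fuel 0 the program is
   reported as still intermediate, and otherwise one step is taken
   and the remaining fuel is spent on the result *)
fun run (pr, 0) = Prog.Intermed pr
  | run (pr, n) =
      case Prog.step pr of
          Prog.Error    => Prog.Fail pr
        | Prog.Value    => Prog.Ans pr
        | Prog.Next pr' => run (pr', n - 1)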

The module Prog defines the following datatype and functions:

datatype run = Ans of cp | Fail of cp | Intermed of cp
val run : cp * int -> run
val evaluate : prog * int -> unit

The function run corresponds to run, except that it issues an error message when its second argument is negative. The function evaluate issues an error message if its first argument, pr, isn’t closed, or its second argument, n, is negative. Otherwise, evaluate explains what results from running

run(Prog.toClosed pr, n).

Here are some examples of how run can be used:

- val pr = Prog.toClosed(Prog.input "");
@ app(lam(x, calc(plus, pair(var(x), int(3)))),
@     int(4))
@ .
val pr = - : cp
- Prog.run(pr, 0);
val it = Intermed - : Prog.run
- val Prog.Intermed pr' = it;
val pr' = - : cp
- Prog.output("", Prog.fromClosed pr');

app(lam(x, calc(plus, pair(var(x), int(3)))), int(4))
val it = () : unit
- Prog.run(pr, 1);
val it = Intermed - : Prog.run
- val Prog.Intermed pr' = it;
val pr' = - : cp
- Prog.output("", Prog.fromClosed pr');
calc(plus, pair(int(4), int(3)))
val it = () : unit
- Prog.run(pr, 2);
val it = Intermed - : Prog.run
- val Prog.Intermed pr' = it;
val pr' = - : cp
- Prog.output("", Prog.fromClosed pr');
int(7)
val it = () : unit
- Prog.run(pr, 3);
val it = Ans - : Prog.run
- val Prog.Ans pr' = it;
val pr' = - : cp
- Prog.output("", Prog.fromClosed pr');
int(7)
val it = () : unit
- val pr = Prog.toClosed(Prog.fromString "calc(plus, int(4))");
val pr = - : cp
- Prog.run(pr, 1);
val it = Fail - : Prog.run
- val Prog.Fail pr' = it;
val pr' = - : cp
- Prog.output("", Prog.fromClosed pr');
calc(plus, int(4))
val it = () : unit

Suppose that the file even-prog contains the following definition of a function for testing whether a natural number is even:

letRec(even, n,
       cond(calc(isZero, var(n)),
            const(true),
            letSimp(m, calc(minus, pair(var(n), int(1))),
                    cond(calc(isZero, var(m)),
                         const(false),
                         app(var(even),
                             calc(minus, pair(var(m), int(1))))))),
       var(even))

Here are some examples of how we can test this function using evaluate:

- val even = Prog.input "even-prog";
val even = - : prog
- fun test(n, m) =
=     let val pr = Prog.app(even, Prog.int n)
=     in Prog.evaluate(pr, m) end;
val test = fn : IntInf.int * int -> unit
- test(3, 100);
terminated with value "const(false)"
val it = () : unit
- test(4, 100);
terminated with value "const(true)"
val it = () : unit
- test(5, 100);
terminated with value "const(false)"
val it = () : unit
- test(6, 100);
terminated with value "const(true)"
val it = () : unit

And here are some other uses of evaluate:

- val pr1 = Prog.input "";
@ calc(consSym, int(12))
@ .
val pr1 = - : prog
- Prog.evaluate(pr1, 2);
terminated with value "sym(b)"
val it = () : unit
- val pr2 = Prog.input "";
@ calc(consSym,
@      pair(sym(9),
@           pair(const(nil),
@                pair(sym(), const(nil)))))
@ .
val pr2 = - : prog
- Prog.evaluate(pr2, 2);
terminated with value "sym(<9,>)"
val it = () : unit
- val pr3 = Prog.input "";
@ calc(deconsSym, sym(c))
@ .
val pr3 = - : prog
- Prog.evaluate(pr3, 2);
terminated with value "int(13)"
val it = () : unit
- val pr4 = Prog.input "";
@ calc(deconsSym, sym())
@ .
val pr4 = - : prog
- Prog.evaluate(pr4, 2);
terminated with value "pair(sym(o), pair(sym(n), pair(const(nil), pair(const(nil), pair(sym(t), pair(sym(o), const(nil)))))))"
val it = () : unit
- val pr5 = Prog.input "";
@ calc(symListToStr,
@      pair(sym(9),
@           pair(sym(),
@                const(nil))))
@ .
val pr5 = - : prog
- Prog.evaluate(pr5, 2);
terminated with value "str(9)"
val it = () : unit
- val pr6 = Prog.input "";
@ calc(strToSymList, str(ab<>))
@ .
val pr6 = - : prog
- Prog.evaluate(pr6, 2);
terminated with value "pair(sym(a), pair(sym(b), pair(sym(), pair(sym(<>), const(nil)))))"
val it = () : unit

Proposition 5.1.2

1. For all pr ∈ CP, n ∈ N and pr′ ∈ Val, if run(pr, n) = ans pr′, then, for all m ∈ N, if m ≥ n, then run(pr, m) = ans pr′.

2. For all pr ∈ CP, n ∈ N and pr′ ∈ CP, if run(pr, n) = fail pr′, then, for all m ∈ N, if m ≥ n, then run(pr, m) = fail pr′.

Now we can define the mathematical meaning of closed programs. Let

Eval = {nonterm, error} ∪ { norm pr | pr ∈ Val }.

We define a mathematical function—not an algorithm—eval ∈ CP → Eval, as follows. Suppose pr ∈ CP. There are two main cases to consider:

• Suppose, for all n ∈ N, there is a pr′ ∈ CP such that run(pr, n) = intermed pr′. Then eval pr = nonterm.

• Suppose there is an n ∈ N such that there is no pr′ ∈ CP such that run(pr, n) = intermed pr′. Let n be the smallest such natural number. There are two subcases to consider:

– Suppose run(pr, n) = ans pr′ for some pr′ ∈ Val. Then eval pr = norm pr′.

– Suppose run(pr, n) = fail pr′ for some pr′ ∈ CP. Then eval pr = error.

For example (an SML sketch of this search appears after these examples):

• Let pr be

letRec(x, y, app(var(x), var(y)), app(var(x), int(0))).

Then eval pr = nonterm.

• eval(app(int(0), int(1))) = error.

• eval(calc(plus, pair(int(1), int(2)))) = norm(int(3)).

5.1.3 Programs as Data

In Section 5.3, we will be concerned with programs that process programs. Because programs are described by strings, these program-processing programs could work on strings. But that would be complicated and cumbersome. It's far better to represent programs using pairs. The set Rep of program representations is the least subset of Val such that:

(variable) for all v ∈ Var, pair(str(var), str(v)) ∈ Rep;

(constant) for all con ∈ Const, pair(str(const), const(con)) ∈ Rep;

(integer) for all n ∈ Z, pair(str(int), int(n)) ∈ Rep;

(symbol) for all a ∈ Sym, pair(str(sym), sym(a)) ∈ Rep;

(string) for all x ∈ Str, pair(str(str), str(x)) ∈ Rep;

(pair) for all pr₁, pr₂ ∈ Rep,

  pair(str(pair), pair(pr₁, pr₂)) ∈ Rep;

(calculation) for all oper ∈ Oper and pr ∈ Rep,

  pair(str(calc), pair(str(oper), pr)) ∈ Rep;

(conditional) for all pr₁, pr₂, pr₃ ∈ Rep,

  pair(str(cond), pair(pr₁, pair(pr₂, pr₃))) ∈ Rep;

(function application) for all pr₁, pr₂ ∈ Rep,

  pair(str(app), pair(pr₁, pr₂)) ∈ Rep;

(anonymous function) for all v ∈ Var and pr ∈ Rep,

  pair(str(lam), pair(str(v), pr)) ∈ Rep;

(simple let) for all v ∈ Var and pr₁, pr₂ ∈ Rep,

  pair(str(letSimp), pair(str(v), pair(pr₁, pr₂))) ∈ Rep;

(recursive let) for all v₁, v₂ ∈ Var and pr₁, pr₂ ∈ Rep,

  pair(str(letRec), pair(str(v₁), pair(str(v₂), pair(pr₁, pr₂)))) ∈ Rep.

We define a function ⌜·⌝ ∈ Prog → Rep by structural recursion:

• for all v ∈ Var, ⌜var(v)⌝ = pair(str(var), str(v));

• for all con ∈ Const, ⌜const(con)⌝ = pair(str(const), const(con));

• for all n ∈ Z, ⌜int(n)⌝ = pair(str(int), int(n));

• for all a ∈ Sym, ⌜sym(a)⌝ = pair(str(sym), sym(a));

• for all x ∈ Str, ⌜str(x)⌝ = pair(str(str), str(x));

• for all pr₁, pr₂ ∈ Prog,

  ⌜pair(pr₁, pr₂)⌝ = pair(str(pair), pair(⌜pr₁⌝, ⌜pr₂⌝));

• for all oper ∈ Oper and pr ∈ Prog,

  ⌜calc(oper, pr)⌝ = pair(str(calc), pair(str(oper), ⌜pr⌝));

• for all pr₁, pr₂, pr₃ ∈ Prog,

  ⌜cond(pr₁, pr₂, pr₃)⌝ = pair(str(cond), pair(⌜pr₁⌝, pair(⌜pr₂⌝, ⌜pr₃⌝)));

• for all pr₁, pr₂ ∈ Prog,

  ⌜app(pr₁, pr₂)⌝ = pair(str(app), pair(⌜pr₁⌝, ⌜pr₂⌝));

• for all v ∈ Var and pr ∈ Prog,

  ⌜lam(v, pr)⌝ = pair(str(lam), pair(str(v), ⌜pr⌝));

• for all v ∈ Var and pr₁, pr₂ ∈ Prog,

  ⌜letSimp(v, pr₁, pr₂)⌝ = pair(str(letSimp), pair(str(v), pair(⌜pr₁⌝, ⌜pr₂⌝)));

• for all v₁, v₂ ∈ Var and pr₁, pr₂ ∈ Prog,

  ⌜letRec(v₁, v₂, pr₁, pr₂)⌝
    = pair(str(letRec), pair(str(v₁), pair(str(v₂), pair(⌜pr₁⌝, ⌜pr₂⌝)))).

If pr ∈ Prog, we say that ⌜pr⌝ represents pr, and that pr is represented by ⌜pr⌝. Each program is represented by a unique program representation, and every program representation represents a unique program:

Proposition 5.1.3
⌜·⌝ is a bijection from Prog to Rep.

The module Prog also defines the functions:

val toRep   : prog -> prog
val fromRep : prog -> prog
val isRep   : prog -> bool

The function toRep converts a program to the program representation that represents it. The function fromRep issues an error message if its argument isn’t a program representation; otherwise it returns the program represented by its argument. And the function isRep tests whether a program is a program representation. For example:

- val pr = Prog.input "";
@ app(lam(x, app(var(x), int(1))),
@     lam(y, var(y)))
@ .
val pr = - : prog
- val pr' = Prog.toRep pr;
val pr' = - : prog
- Prog.output("", pr');
pair(str(app),
     pair(pair(str(lam),
               pair(str(x),
                    pair(str(app),
                         pair(pair(str(var), str(x)),
                              pair(str(int), int(1)))))),
          pair(str(lam), pair(str(y), pair(str(var), str(y))))))
val it = () : unit
- val pr'' = Prog.fromRep pr';
val pr'' = - : prog
- Prog.equal(pr'', pr);
val it = true : bool
- Prog.isRep pr;
val it = false : bool
- Prog.isRep pr';
val it = true : bool
- Prog.isRep(Prog.fromString "pair(str(x), str(y))");
val it = false : bool

It is easy to write a program that tests whether a program representation represents a closed program, as well as to write a program that tests whether a program representation represents a value. It is possible to write a function in our programming language that acts as an interpreter:

• It takes in a value ⌜pr⌝, representing a closed program pr.

• It begins evaluating pr, using the representation ⌜pr⌝.

• If this evaluation results in an error, then the interpreter returns const(nil).

• Otherwise, if it results in a value ⌜pr′⌝ representing a value pr′, then it returns ⌜pr′⌝.

• Otherwise, it runs forever.

E.g., cond(const(true), const(false), const(nil)) evaluates to const(false). We can also write a function in our programming language that acts as an incremental interpreter (see the SML sketch after this list):

• At each stage of its evaluation of a closed program, it carries out some fixed number of steps of the evaluation.

• If, during the execution of those steps, an error is detected, then it returns const(nil).

• Otherwise, if a value ⌜pr′⌝ representing a value pr′ has been produced, then it returns this value.

• But otherwise, it returns an anonymous function that, when called, will continue this process.

5.1.4 Recursive and Recursively Enumerable Languages

A string predicate program pr is a closed program such that, for all strings w, eval(app(pr, str(w))) ∈ {norm(const(true)), norm(const(false))}. A string w is accepted by a closed program pr iff eval(app(pr, str(w))) = norm(const(true)). We write L(pr) for the set of all strings accepted by a closed program pr. When this set is a language, we refer to L(pr) as the language accepted by pr. (E.g., if pr = lam(x, const(true)), then L(pr) = Str, which is not a language.) The Prog module also includes:

val accepted : prog -> str * int -> unit

The function accepted takes in a program pr, and issues an error message if pr is not closed. Otherwise, it returns a function f that behaves as follows, when called with a pair (x, n). If n is negative, it issues an error message. Otherwise, it proceeds by a case analysis of the result of Prog.run(pr′, n), where pr′ = Prog.app(pr, Prog.str x):

• (ans(const(true))) It explains that x was accepted by pr.

• (ans(const(false))) It explains that x was rejected by pr, because the application of pr to str(x) resulted in const(false).

• (ans pr′′, where pr′′ ∉ {const(true), const(false)}) It explains that x was rejected by pr, since the application of pr to str(x) resulted in some value other than const(true) or const(false).

• (fail pr′′, where pr′′ ∈ CP) It explains that x was rejected by pr, since the application of pr to str(x) resulted in failure.

• (intermed pr′′, where pr′′ ∈ CP) It explains that (based on running pr′ for n steps) it is unknown whether x is accepted by pr.

Suppose the file equal-prog contains the text:

lam(p, calc(isZero, calc(compare, var(p))))

Suppose the file succ-prog contains the text:

lam(n, calc(plus, pair(var(n), int(1))))

Suppose the file count-prog contains the text:

lam(equal,
    lam(succ,
        lam(a,
            letRec(f, xs,
                   cond(calc(isNil, var(xs)),
                        pair(int(0), const(nil)),
                        cond(app(var(equal),
                                 pair(calc(fst, var(xs)), var(a))),
                             letSimp(res,
                                     app(var(f), calc(snd, var(xs))),
                                     pair(app(var(succ),
                                              calc(fst, var(res))),
                                          calc(snd, var(res)))),
                             pair(int(0), var(xs)))),
                   var(f)))))

And suppose the file zeros-ones-twos-prog contains the text:

lam(equal,
    lam(count,
        lam(x,
            letSimp(xs, calc(strToSymList, var(x)),
                    letSimp(zeros,
                            app(app(var(count), sym(0)), var(xs)),
                            letSimp(ones,
                                    app(app(var(count), sym(1)),
                                        calc(snd, var(zeros))),
                                    letSimp(twos,
                                            app(app(var(count), sym(2)),
                                                calc(snd, var(ones))),
                                            cond(calc(isNil, calc(snd, var(twos))),
                                                 cond(app(var(equal),
                                                          pair(calc(fst, var(zeros)),
                                                               calc(fst, var(ones)))),
                                                      app(var(equal),
                                                          pair(calc(fst, var(ones)),
                                                               calc(fst, var(twos)))),
                                                      const(false)),
                                                 const(false)))))))))

We can construct—and experiment with—a program for testing whether a string is an element of { 0ⁿ1ⁿ2ⁿ | n ∈ N }, as follows:

- val equal = Prog.input "equal-prog";
val equal = - : prog
- val succ = Prog.input "succ-prog";
val succ = - : prog
- val count =
=     Prog.app
=     (Prog.app(Prog.input "count-prog", equal),
=      succ);
val count = - : prog
- val zerosOnesTwos =
=     Prog.app
=     (Prog.app(Prog.input "zeros-ones-twos-prog", equal),
=      count);
val zerosOnesTwos = - : prog
- val accepted = Prog.accepted zerosOnesTwos;
val accepted = fn : str * int -> unit
- accepted(Str.fromString "%", 10);
unknown if accepted or rejected
val it = () : unit
- accepted(Str.fromString "%", 100);
accepted
val it = () : unit
- accepted(Str.fromString "012", 100);
accepted
val it = () : unit
- accepted(Str.fromString "000111222", 1000);
accepted
val it = () : unit
- accepted(Str.fromString "00011122", 1000);
rejected with false
val it = () : unit
- accepted(Str.fromString "00111222", 1000);
rejected with false
val it = () : unit
- accepted(Str.fromString "00011222", 1000);
rejected with false
val it = () : unit
- accepted(Str.fromString "021", 100);
rejected with false
val it = () : unit
- accepted(Str.fromString "112200", 1000);
rejected with false
val it = () : unit

We say that a language L is:

• recursive iff L = L(pr), for some string predicate program pr; and

• recursively enumerable (r.e.) iff L = L(pr), for some closed program pr.

We define

RecLan = { L ∈ Lan | L is recursive }, and RELan = { L ∈ Lan | L is recursively enumerable }.

Hence RecLan ⊆ RELan. Because CP is countably infinite, RecLan and RELan are countably infinite; since Lan is uncountable, it follows that RELan ⊊ Lan. Later we will see that RecLan ⊊ RELan.

Proposition 5.1.4 For all L ∈ Lan, L is recursive iff there is a closed program pr such that, for all w ∈ Str:

• if w ∈ L, then eval(app(pr, str(w))) = norm(const(true)); and

• if w ∉ L, then eval(app(pr, str(w))) = norm(const(false)).

Proof.

(“only if”) Since L is recursive, L = L(pr) for some string predicate program pr. Suppose w ∈ Str. There are two cases to show.

• Suppose w ∈ L. Since L = L(pr), we have that eval(app(pr, str(w))) = norm(const(true)).

• Suppose w ∉ L. Since L = L(pr), we have that eval(app(pr, str(w))) ≠ norm(const(true)). But pr is a string predicate program, and thus eval(app(pr, str(w))) = norm(const(false)).

(“if”) Suppose pr is a closed program with the stated properties. To see that pr is a string predicate program, suppose w ∈ Str. Since w ∈ L or w ∉ L, we have that eval(app(pr, str(w))) ∈ {norm(const(true)), norm(const(false))}. We will show that L = L(pr).

• Suppose w ∈ L. Then eval(app(pr, str(w))) = norm(const(true)), so that w ∈ L(pr).

• Suppose w ∈ L(pr), so that eval(app(pr, str(w))) = norm(const(true)). If w ∉ L, then eval(app(pr, str(w))) = norm(const(false))—contradiction. Thus w ∈ L. ✷

Proposition 5.1.5 For all L ∈ Lan, L is recursively enumerable iff there is a closed program pr such that, for all w ∈ Str,

w ∈ L iff eval(app(pr, str(w))) = norm(const(true)).

Proof.

(“only if”) Since L is recursively enumerable, L = L(pr) for some closed program pr. Suppose w ∈ Str.

• Suppose w ∈ L. Since L = L(pr), we have that eval(app(pr, str(w))) = norm(const(true)).

• Suppose eval(app(pr, str(w))) = norm(const(true)). Thus w ∈ L(pr) = L.

(“if”) Suppose pr is a closed program with the stated property. It suffices to show that L = L(pr).

• Suppose w ∈ L. Then eval(app(pr, str(w))) = norm(const(true)), so that w ∈ L(pr).

• Suppose w ∈ L(pr). Then eval(app(pr, str(w))) = norm(const(true)), so that w ∈ L. ✷

Theorem 5.1.6
The context-free languages are a proper subset of the recursive languages: CFLan ⊊ RecLan.

Proof. To see that every context-free language is recursive, let L be a context-free language. Thus there is a grammar G such that L = L(G). With some work, we can write and prove the correctness of a string predicate program pr that implements our algorithm (see Section 4.3) for checking whether a string is generated by a grammar. Thus L is recursive.
To see that not every recursive language is context-free, let L = { 0ⁿ1ⁿ2ⁿ | n ∈ N }. In Section 4.10, we learned that L is not context-free. And in the preceding subsection, we wrote a string predicate program pr that tests whether a string is in L. Thus L is recursive. ✷

5.1.5 Notes

Neil Jones [Jon97] pioneered the use of a programming language with structured data as an alternative to Turing machines for studying the limits of what is computable. In contrast to Jones's approach, however, our programming language is functional, not imperative (assignment-oriented), and it has explicit support for the symbols and strings of formal language theory.

5.2 Closure Properties of Recursive and R.E. Languages

In this section, we will see that the recursive and recursively enumerable lan- guages are closed under union, concatenation, closure and intersection. The recursive languages are also closed under set difference and complementation. In the next section, we will see that the recursively enumerable languages are not closed under complementation or set difference. On the other hand, we will see in this section that, if a language and its complement are both r.e., then the language is recursive.

5.2.1 Closure Properties of Recursive Languages

Theorem 5.2.1
If L, L₁ and L₂ are recursive languages, then so are L₁ ∪ L₂, L₁L₂, L∗, L₁ ∩ L₂ and L₁ − L₂.

Proof. Let's consider the concatenation case as an example. Since L₁ and L₂ are recursive languages, there are string predicate programs pr₁ and pr₂ that test whether strings are in L₁ and L₂, respectively. We write a program pr of the form

  lam(w,
      letSimp(f1, pr₁,
              letSimp(f2, pr₂,
                      · · · ))),

which tests whether its input is an element of L₁L₂. The elided part of pr generates all of the pairs of strings (x₁, x₂) such that x₁x₂ is equal to the value of w. Then it works through these pairs, one by one. Given such a pair (x₁, x₂), pr calls f1 with x₁ to check whether x₁ ∈ L₁. If the answer is const(false), then it goes on to the next pair. Otherwise, it calls f2 with x₂ to check whether x₂ ∈ L₂. If the answer is const(false), then it goes on to the next pair. Otherwise, it returns const(true). If pr runs out of pairs to check, then it returns const(false). We can check that pr is a string predicate program testing whether w ∈ L₁L₂. Thus L₁L₂ is recursive. ✷

Corollary 5.2.2
If Σ is an alphabet and L ⊆ Σ∗ is recursive, then so is Σ∗ − L.

Proof. Follows from Theorem 5.2.1, since Σ∗ is recursive. ✷

5.2.2 Closure Properties of Recursively Enumerable Languages

Theorem 5.2.3
If L, L₁ and L₂ are recursively enumerable languages, then so are L₁ ∪ L₂, L₁L₂, L∗ and L₁ ∩ L₂.

Proof. We consider the concatenation case as an example. Since L₁ and L₂ are recursively enumerable, there are closed programs pr₁ and pr₂ such that, for all w ∈ Str, w ∈ L₁ iff eval(app(pr₁, str(w))) = norm(const(true)), and for all w ∈ Str, w ∈ L₂ iff eval(app(pr₂, str(w))) = norm(const(true)). (Remember that pr₁ and pr₂ may fail to terminate on some inputs.) To show that L₁L₂ is recursively enumerable, we will construct a closed program pr such that, for all w ∈ Str, w ∈ L₁L₂ iff eval(app(pr, str(w))) = norm(const(true)). When pr is called with str(w), for some w ∈ Str, it behaves as follows. First, it generates all the pairs of strings (x₁, x₂) such that w = x₁x₂. Let these pairs be (x₁,₁, x₂,₁), ..., (x₁,ₙ, x₂,ₙ). Now, pr uses our incremental interpreter to run a fixed number of steps of app(pr₁, str(x₁,ᵢ)) and app(pr₂, str(x₂,ᵢ)) (working with ⌜app(pr₁, str(x₁,ᵢ))⌝ and ⌜app(pr₂, str(x₂,ᵢ))⌝), for all i ∈ [1 : n], and then repeats this over and over again.

• If, at some stage, the incremental interpretation of app(pr₁, str(x₁,ᵢ)) returns const(true), then x₁,ᵢ is marked as being in L₁.

• If, at some stage, the incremental interpretation of app(pr₂, str(x₂,ᵢ)) returns const(true), then x₂,ᵢ is marked as being in L₂.

• If, at some stage, the incremental interpretation of app(pr₁, str(x₁,ᵢ)) returns something other than const(true), then the i'th pair is marked as discarded.

• If, at some stage, the incremental interpretation of app(pr₂, str(x₂,ᵢ)) returns something other than const(true), then the i'th pair is marked as discarded.

• If, at some stage, x₁,ᵢ is marked as in L₁ and x₂,ᵢ is marked as in L₂, then pr returns const(true).

• If, at some stage, there are no remaining pairs, then pr returns const(false). ✷

Theorem 5.2.4
If Σ is an alphabet, L ⊆ Σ∗ is a recursively enumerable language, and Σ∗ − L is recursively enumerable, then L is recursive.

Proof. Since L and Σ∗ − L are recursively enumerable languages, there are closed programs pr₁ and pr₂ such that, for all w ∈ Str, w ∈ L iff eval(app(pr₁, str(w))) = norm(const(true)), and for all w ∈ Str, w ∈ Σ∗ − L iff eval(app(pr₂, str(w))) = norm(const(true)). We construct a string predicate program pr that tests whether its input is in L. Given str(w), for w ∈ Str, pr proceeds as follows. If w ∉ Σ∗, then pr returns const(false). Otherwise, pr alternates between incrementally interpreting app(pr₁, str(w)) (working with ⌜app(pr₁, str(w))⌝) and app(pr₂, str(w)) (working with ⌜app(pr₂, str(w))⌝), as sketched in SML below.

• If, at some stage, the first incremental interpretation returns const(true), then pr returns const(true).

• If, at some stage, the second incremental interpretation returns const(true), then pr returns const(false).

• If, at some stage, the first incremental interpretation returns anything other than const(true), then pr returns const(false).

• If, at some stage, the second incremental interpretation returns anything other than const(true), then pr returns const(true).

Because w ∈ L or w ∈ Σ∗ − L, at least one of the two interpretations eventually terminates, so pr always terminates, and is thus a string predicate program. ✷

5.2.3 Notes

Our approach to this section is standard, except for our using programs instead of Turing machines.

5.3 Diagonalization and Undecidable Problems

In this section, we will use a technique called diagonalization to find a natural language that isn’t recursively enumerable. This will lead us to a language that is recursively enumerable but is not recursive. It will also enable us to prove the undecidability of the halting problem.

5.3.1 Diagonalization

To find a non-r.e. language, we can use diagonalization, a technique we used in Section 1.1 to show the uncountability of P N. Let Σ be the alphabet used to describe programs: the digits and letters, plus the elements of {⟨comma⟩, ⟨perc⟩, ⟨tilde⟩, ⟨openPar⟩, ⟨closPar⟩, ⟨less⟩, ⟨great⟩}. Every element of Σ∗ either describes a unique closed program, or describes no closed programs. Given w ∈ Σ∗, we write L(w) for:

• ∅, if w doesn’t describe a closed program; and

• L(pr), where pr is the unique closed program described by w, if w does describe a closed program.

Thus L(w) will always be a set of strings, even though it won't always be a language. Consider the infinite table of 0's and 1's in which both the rows and the columns are indexed by the elements of Σ∗, listed in ascending order according to our standard total ordering, and where a cell (wₙ, wₘ) contains 1 iff wₙ ∈ L(wₘ), and contains 0 iff wₙ ∉ L(wₘ). Each recursively enumerable language is L(wₘ) for some (non-unique) m, but not all the L(wₘ) are languages. Figure 5.1 shows how part of this table might look, where wᵢ, wⱼ and wₖ are sample elements of Σ∗. Because of the table's data, we have that wᵢ ∈ L(wⱼ) and wⱼ ∉ L(wᵢ). To define a non-r.e. Σ-language, we work our way down the diagonal of the table, putting wₙ into our language just when cell (wₙ, wₙ) of the table is 0, i.e., when wₙ ∉ L(wₙ). With our example table:

• L(wᵢ) is not our language, since wᵢ ∈ L(wᵢ), but wᵢ is not in our language;

• L(wⱼ) is not our language, since wⱼ ∉ L(wⱼ), but wⱼ is in our language; and

• L(wₖ) is not our language, since wₖ ∈ L(wₖ), but wₖ is not in our language.

        ···   wᵢ   ···   wⱼ   ···   wₖ   ···
  ⋮
  wᵢ           1          1          0
  ⋮
  wⱼ           0          0          1
  ⋮
  wₖ           0          1          1
  ⋮

Figure 5.1: Example Diagonalization Table

In general, there is no n ∈ N such that L(wₙ) is our language. Consequently our language is not recursively enumerable. We formalize the above ideas as follows. Define languages Ld (“d” for “diagonal”) and La (“a” for “accepted”) by:

Ld = { w ∈ Σ∗ | w ∉ L(w) }, and
La = { w ∈ Σ∗ | w ∈ L(w) }.

Thus Ld = Σ∗ − La. We have that, for all w ∈ Σ∗, w ∈ La iff w ∈ L(pr), for some closed program (which will be unique) described by w. (When w doesn't describe a closed program, L(w) = ∅.)

Theorem 5.3.1 Ld is not recursively enumerable.

Proof. Suppose, toward a contradiction, that Ld is recursively enumerable. Thus, there is a closed program pr such that Ld = L(pr). Let w ∈ Σ∗ be the string describing pr. Thus L(w) = L(pr) = Ld. There are two cases to consider.

• Suppose w ∈ Ld. Then w ∉ L(w) = Ld—contradiction.

• Suppose w ∉ Ld. Since w ∈ Σ∗, we have that w ∈ L(w) = Ld—contradiction.

Since we obtained a contradiction in both cases, we have an overall contradiction. Thus Ld is not recursively enumerable. ✷

Theorem 5.3.2 La is recursively enumerable.

Proof. Let acc be the closed program that, when given str(w), for some w ∈ Str, acts as follows. First, it attempts to parse str(w) as a program pr, represented as the value ⌜pr⌝. If this attempt fails, acc returns const(false). If pr is not closed, then acc returns const(false). Otherwise, it uses our interpreter function to evaluate app(pr, str(w)), using ⌜app(pr, str(w))⌝. If this interpretation returns ⌜const(true)⌝, then acc returns const(true). If it returns anything other than ⌜const(true)⌝, then acc returns const(false). (Thus, if the interpretation never returns, then acc never terminates.) We can check that, for all w ∈ Str, w ∈ La iff eval(app(acc, str(w))) = norm(const(true)). Thus La is recursively enumerable. ✷

Corollary 5.3.3 There is an alphabet Σ and a recursively enumerable language L ⊆ Σ∗ such that Σ∗ − L is not recursively enumerable.

Proof. La ⊆ Σ∗ is recursively enumerable, but Σ∗ − La = Ld is not recursively enumerable. ✷

Corollary 5.3.4
There are recursively enumerable languages L₁ and L₂ such that L₁ − L₂ is not recursively enumerable.

Proof. Follows from Corollary 5.3.3, since Σ∗ is recursively enumerable: take L₁ = Σ∗ and L₂ = La, so that L₁ − L₂ = Ld. ✷

Corollary 5.3.5 La is not recursive.

Proof. Suppose, toward a contradiction, that La is recursive. Since the recursive languages are closed under complementation, and La ⊆ Σ∗, we have that Ld = Σ∗ − La is recursive—contradiction. Thus La is not recursive. ✷

Since La ∈ RELan and La ∉ RecLan, we have:

Theorem 5.3.6
The recursive languages are a proper subset of the recursively enumerable languages: RecLan ⊊ RELan.

Combining this result with results from Sections 4.8 and 5.1, we have that

RegLan ⊊ CFLan ⊊ RecLan ⊊ RELan ⊊ Lan.

5.3.2 Undecidability of the Halting Problem

We say that a closed program pr halts iff eval pr ≠ nonterm.

Theorem 5.3.7
There is no value halts such that, for all closed programs pr,

• if pr halts, then eval(app(halts, ⌜pr⌝)) = norm(const(true)); and

• if pr does not halt, then eval(app(halts, ⌜pr⌝)) = norm(const(false)).

Proof. Suppose, toward a contradiction, that such a halts does exist. We use halts to construct a closed program acc that behaves as follows when run on str(w), for some w ∈ Str. First, it attempts to parse str(w) as a program pr, represented as the value ⌜pr⌝. If this attempt fails, it returns const(false). If pr is not closed, then it returns const(false). Otherwise, it calls halts with argument ⌜app(pr, str(w))⌝.

• If halts returns const(true) (so we know that app(pr, str(w)) halts), then acc applies the interpreter function to ⌜app(pr, str(w))⌝, using it to evaluate app(pr, str(w)). If the interpreter returns ⌜const(true)⌝, then acc returns const(true). Otherwise, the interpreter returns some other value, and acc returns const(false).

• Otherwise, halts returns const(false) (so we know that app(pr, str(w)) does not halt), in which case acc returns const(false).

Now, we prove that acc is a string predicate program testing whether a string is in La.

• Suppose w ∈ La. Thus w ∈ L(pr), where pr is the unique closed program described by w. Hence eval(app(pr, str(w))) = norm(const(true)). It is easy to show that eval(app(acc, str(w))) = norm(const(true)).

• Suppose w ∉ La. If w ∉ Σ∗, or w ∈ Σ∗ but w does not describe a program, or w describes a program that isn't closed, then eval(app(acc, str(w))) = norm(const(false)). So, suppose w describes the closed program pr. Then w ∉ L(pr), i.e., eval(app(pr, str(w))) ≠ norm(const(true)). It is easy to show that eval(app(acc, str(w))) = norm(const(false)).

Thus La is recursive—contradiction. Thus there is no such halts. ✷

We say that a value pr halts on a value pr′ iff eval(app(pr, pr′)) ≠ nonterm.

Corollary 5.3.8 (Undecidability of the Halting Problem)
There is no value haltsOn such that, for all values pr and pr′:

• if pr halts on pr′, then

eval(app(haltsOn, pair(⌜pr⌝, ⌜pr′⌝))) = norm(const(true)); and

• if pr does not halt on pr′, then

eval(app(haltsOn, pair(⌜pr⌝, ⌜pr′⌝))) = norm(const(false)).

Proof. Suppose, toward a contradiction, that such a haltsOn exists. Let halts be the value that takes in a value ⌜pr⌝ representing a closed program pr, and then returns the result of calling haltsOn with pair(⌜lam(x, pr)⌝, ⌜const(nil)⌝). Then this value satisfies the property of Theorem 5.3.7—contradiction. Thus such a haltsOn does not exist. ✷

5.3.3 Other Undecidable Problems

Here are two other undecidable problems:

• Determining whether two grammars generate the same language. (In con- trast, we gave an algorithm for checking whether two FAs are equivalent, and this algorithm can be implemented as a program.)

• Determining whether a grammar is ambiguous.

5.3.4 Notes

Because a closed program can be evaluated by itself, i.e., without being supplied an input, our treatment of the undecidability of the halting problem is nonstandard. First, we prove that there is no program that takes in a program as data and tests whether that program halts. Our version of the undecidability of the halting problem then follows as a corollary.

Bibliography

[AM91] A. W. Appel and D. B. MacQueen. Standard ML of New Jersey. In Programming Language Implementation and Logic Programming, volume 528 of Lecture Notes in Computer Science, pages 1–26. Springer-Verlag, 1991.

[BE93] J. Barwise and J. Etchemendy. Turing's World 3.0 for Mac: An Introduction to Computability Theory. Cambridge University Press, 1993.

[BLP+97] A. O. Bilska, K. H. Leider, M. Procopiuc, O. Procopiuc, S. H. Rodger, J. R. Salemme, and E. Tsang. A collection of tools for making automata theory and formal languages come alive. In Twenty-eighth ACM SIGCSE Technical Symposium on Computer Science Education, pages 15–19. ACM Press, 1997.

[End77] H. B. Enderton. Elements of Set Theory. Academic Press, 1977.

[HMU01] J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, second edition, 2001.

[HR00] T. Hung and S. H. Rodger. Increasing visualization and interaction in the automata theory course. In Thirty-first ACM SIGCSE Technical Symposium on Computer Science Education, pages 6–10. ACM Press, 2000.

[Jon97] N. D. Jones. Computability and Complexity: From a Programming Perspective. The MIT Press, 1997.

[Koz97] D. C. Kozen. Automata and Computability. Springer-Verlag, 1997.

[Lei10] H. Leiß. The Automata Library. http://www.cis.uni-muenchen.de/~leiss/sml-automata.html, 2010.

[Lin01] P. Linz. An Introduction to Formal Languages and Automata. Jones and Bartlett Publishers, 2001.


[LP98] H. R. Lewis and C. H. Papadimitriou. Elements of the Theory of Computation. Prentice Hall, second edition, 1998.

[LRGS04] S. Lombardy, Y. Régis-Gianas, and J. Sakarovitch. Introducing Vaucanson. Theoretical Computer Science, 328:77–96, 2004.

[Mar91] J. C. Martin. Introduction to Languages and the Theory of Computation. McGraw Hill, second edition, 1991.

[MTHM97] R. Milner, M. Tofte, R. Harper, and D. MacQueen. The Definition of Standard ML (Revised). The MIT Press, 1997.

[Pau96] L. C. Paulson. ML for the Working Programmer. Cambridge University Press, second edition, 1996.

[RHND99] M. B. Robinson, J. A. Hamshar, J. E. Novillo, and A. T. Duchowski. A Java-based tool for reasoning about models of computation through simulating finite automata and Turing machines. In Thirtieth ACM SIGCSE Technical Symposium on Computer Science Education, pages 105–109. ACM Press, 1999.

[Rod06] S. H. Rodger. JFLAP: An Interactive Formal Languages and Automata Package. Jones and Bartlett Publishers, 2006.

[RW94] D. Raymond and D. Wood. Grail: A C++ library for automata and expressions. Journal of Symbolic Computation, 17:341–350, 1994.

[RWYC17] D. Raymond, D. Wood, S. Yu, and C. Câmpeanu. Grail+: A symbolic computation environment for finite-state machines, regular expressions, and finite languages. http://www.csit.upei.ca/~ccampeanu/Grail/, 2017.

[Sar02] J. Saraiva. HaLeX: A Haskell library to model, manipulate and animate regular languages. In ACM Workshop on Functional and Declarative Programming in Education (FDPE/PLI'02), Pittsburgh, October 2002.

[Sto05] A. Stoughton. Experimenting with formal languages. In Thirty-sixth ACM SIGCSE Technical Symposium on Computer Science Education, page 566. ACM Press, 2005.

[Sto08] A. Stoughton. Experimenting with formal languages using Forlan. In FDPE '08: Proceedings of the 2008 International Workshop on Functional and Declarative Programming in Education, pages 41–50, New York, NY, USA, 2008. ACM.

[Sut92] K. Sutner. Implementing finite state machines. In N. Dean and G. E. Shannon, editors, Computational Support for Discrete Mathematics, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, volume 15, pages 347–363. American Mathematical Society, 1992.

[Ull98] J. D. Ullman. Elements of ML Programming: ML97 Edition. Prentice Hall, 1998.
