Compiler Design: Theory, Tools, and Examples

Seth D. Bergmann, Rowan University
Rowan Digital Works, Open Educational Resources, University Libraries
5-1-2017

Part of the Computer Sciences Commons
DOI: 10.31986/issn.2689-0690_rdw.oer.1001

Recommended Citation
Bergmann, Seth D., "Compiler Design: Theory, Tools, and Examples" (2017). Open Educational Resources. 1.
https://rdw.rowan.edu/oer/1

This Book is brought to you for free and open access by the University Libraries at Rowan Digital Works. It has been accepted for inclusion in Open Educational Resources by an authorized administrator of Rowan Digital Works.

Seth D. Bergmann
February 12, 2016

Contents

Preface

1 Introduction
  1.1 What is a Compiler?
    1.1.1 Exercises
  1.2 The Phases of a Compiler
    1.2.1 Lexical Analysis (Scanner) - Finding the Word Boundaries
    1.2.2 Syntax Analysis Phase
    1.2.3 Global Optimization
    1.2.4 Code Generation
    1.2.5 Local Optimization
    1.2.6 Exercises
  1.3 Implementation Techniques
    1.3.1 Bootstrapping
    1.3.2 Cross Compiling
    1.3.3 Compiling To Intermediate Form
    1.3.4 Compiler-Compilers
    1.3.5 Exercises
  1.4 Case Study: Decaf
  1.5 Chapter Summary

2 Lexical Analysis
  2.0 Formal Languages
    2.0.1 Language Elements
    2.0.2 Finite State Machines
    2.0.3 Regular Expressions
    2.0.4 Exercises
  2.1 Lexical Tokens
    2.1.1 Exercises
  2.2 Implementation with Finite State Machines
    2.2.1 Examples of Finite State Machines for Lexical Analysis
    2.2.2 Actions for Finite State Machines
    2.2.3 Exercises
  2.3 Lexical Tables
    2.3.1 Sequential Search
    2.3.2 Binary Search Tree
    2.3.3 Hash Table
    2.3.4 Exercises
  2.4 Lexical Analysis with SableCC
    2.4.1 SableCC Input File
    2.4.2 Running SableCC
    2.4.3 Exercises
  2.5 Case Study: Lexical Analysis for Decaf
    2.5.1 Exercises
  2.6 Chapter Summary

3 Syntax Analysis
  3.0 Grammars, Languages, and Pushdown Machines
    3.0.1 Grammars
    3.0.2 Classes of Grammars
    3.0.3 Context-Free Grammars
    3.0.4 Pushdown Machines
    3.0.5 Correspondence Between Machines and Classes of Languages
    3.0.6 Exercises
  3.1 Ambiguities in Programming Languages
    3.1.1 Exercises
  3.2 The Parsing Problem
  3.3 Summary

4 Top Down Parsing
  4.0 Relations and Closure
    4.0.1 Exercises
  4.1 Simple Grammars
    4.1.1 Parsing Simple Languages with Pushdown Machines
    4.1.2 Recursive Descent Parsers for Simple Grammars
    4.1.3 Exercises
  4.2 Quasi-Simple Grammars
    4.2.1 Pushdown Machines for Quasi-Simple Grammars
    4.2.2 Recursive Descent for Quasi-Simple Grammars
    4.2.3 A Final Remark on ε Rules
    4.2.4 Exercises
  4.3 LL(1) Grammars
    4.3.1 Pushdown Machines for LL(1) Grammars
    4.3.2 Recursive Descent for LL(1) Grammars
    4.3.3 Exercises
  4.4 Parsing Arithmetic Expressions Top Down
    4.4.1 Exercises
  4.5 Syntax-Directed Translation
    4.5.1 Implementing Translation Grammars with Pushdown Translators
    4.5.2 Implementing Translation Grammars with Recursive Descent
    4.5.3 Exercises
  4.6 Attributed Grammars
    4.6.1 Implementing Attributed Grammars with Recursive Descent
    4.6.2 Exercises
  4.7 An Attributed Translation Grammar for Expressions
    4.7.1 Translating Expressions with Recursive Descent
    4.7.2 Exercises
  4.8 Decaf Expressions
    4.8.1 LBL, JMP, TST, and MOV atoms
    4.8.2 Boolean expressions
    4.8.3 Assignment
    4.8.4 Exercises
  4.9 Translating Control Structures
    4.9.1 Exercises
  4.10 Case Study: A Top Down Parser for Decaf
    4.10.1 Exercises
  4.11 Chapter Summary

5 Bottom Up Parsing
  5.1 Shift Reduce Parsing
    5.1.1 Exercises
  5.2 LR Parsing With Tables
    5.2.1 Exercises
  5.3 SableCC
    5.3.1 Overview of SableCC
    5.3.2 Structure of the SableCC Source Files
    5.3.3 An Example Using SableCC
    5.3.4 Exercises
  5.4 Arrays
    5.4.1 Exercises
  5.5 Case Study: Syntax Analysis for Decaf
    5.5.1 Exercises
  5.6 Chapter Summary

6 Code Generation
  6.1 Introduction to Code Generation
    6.1.1 Exercises
  6.2 Converting Atoms to Instructions
    6.2.1 Exercises
  6.3 Single Pass vs. Multiple Passes
    6.3.1 Exercises
  6.4 Register Allocation
    6.4.1 Exercises
  6.5 Case Study: A Code Generator for the Mini Architecture
    6.5.1 Mini: The Simulated Architecture
    6.5.2 The Input to the Code Generator
    6.5.3 The Code Generator for Mini
    6.5.4 Exercises
  6.6 Chapter Summary

7 Optimization
  7.1 Introduction and View of Optimization
    7.1.1 Exercises
  7.2 Global Optimization
    7.2.1 Basic Blocks and DAGs
    7.2.2 Other Global Optimization Techniques
    7.2.3 Exercises
  7.3 Local Optimization
    7.3.1 Exercises
  7.4 Chapter Summary

Glossary

Appendix A - Decaf Grammar

Appendix B - Decaf Compiler
  B.1 Installing Decaf
  B.2 Source Code for Decaf
  B.3 Code Generator

Appendix C - Mini Simulator

Bibliography

Index

Preface

Compiler design is a subject which many believe to be fundamental and vital to computer science. It is a subject which has been studied intensively since the early 1950’s and continues to be an important research field today. Compiler design is an important part of the undergraduate curriculum for many reasons: (1) It provides students with a better understanding of and appreciation for programming languages. (2) The techniques used in compilers can be used in other applications with command languages. (3) It provides motivation for the study of theoretic topics. (4) It is a good vehicle for an extended programming project.

There are several compiler design textbooks available today, but most have been written for graduate students. Here at Rowan University, our students have had difficulty reading these books. However, I felt it was not the subject matter that was the problem, but the way it was presented. I was sure that if concepts were presented at a slower pace, with sample problems and diagrams to illustrate the concepts, our students would be able to master the concepts. This is what I have attempted to do in writing this book.

This book is a revision of earlier editions that were written for Pascal and C++ based curricula. As many computer science departments have moved to Java as the primary language in the undergraduate curriculum, I have produced this edition to accommodate those departments. This book is not intended to be strictly an object-oriented approach to compiler design. Though most Java compilers compile to an intermediate form known as Byte Code, the approach taken here is a more traditional one in which we compile to native code for a particular machine.

The most essential prerequisites for this book are courses in Java application programming, Data Structures, Assembly Language or Computer Architecture, and possibly Programming Languages. If the student has not studied formal languages and automata, this book includes introductory sections on these theoretic topics, but in this case it is not likely that all seven chapters will be covered in a one semester course. Students who have studied the theory will be able to skip the preliminary sections (2.0, 3.0, 4.0) without loss of continuity.

The concepts of compiler design are applied to a case study which is an implementation of a subset of Java which I call Decaf. Chapters 2, 4, 5, and 6 include