Measuring and Extending LR(1) Parser Generation


MEASURING AND EXTENDING LR(1) PARSER GENERATION

A DISSERTATION SUBMITTED TO THE GRADUATE DIVISION OF THE UNIVERSITY OF HAWAI‘I IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE

AUGUST 2009

By Xin Chen

Dissertation Committee: David Pager, Chairperson; YingFei Dong; David Chin; David Streveler; Scott Robertson

We certify that we have read this dissertation and that, in our opinion, it is satisfactory in scope and quality as a dissertation for the degree of Doctor of Philosophy in Computer Science.

Copyright 2009 by Xin Chen

To my family

Acknowledgements

I wish to thank my family for their love and support.

I thank my advisor, Professor David Pager, for his guidance and supervision. Without him this work would not have been possible. Whenever I was stuck on an issue, it was the discussion with him that led me through the difficulty.

I thank my PhD committee members for their feedback on my work, as well as for occasional talks and suggestions. I regret the unfortunate passing of Professor Art Lew and miss his passion for and dedication to research.

I would like to thank the many people in the compiler and parser generator field with whom I have communicated in person, by email or via online discussion. From them I received invaluable suggestions and learned a lot: Francois Pottier, Akim Demaille, Paul Hilfinger, Joel Denny, Paul Mann, Chris Clark, Hans Aberg, Hans-Peter Diettrich, Terence Parr, Sean O’Connor, Vladimir Makarov, Aho Alfred, Heng Yuan, Felipe Angriman, Pete Jinks and more.

I would like to thank my GA supervisor Shi-Jen He for allowing me more freedom to concentrate on research in the final phase.

Thanks also go to people not mentioned here who have nevertheless supported my work in one way or another.
ABSTRACT

Commonly used parser generation algorithms such as LALR, LL and SLR all have their restrictions. The canonical LR(1) algorithm proposed by Knuth in 1965 is regarded as the most powerful parser generation algorithm for context-free languages, but it is very expensive in time and space and has long been considered impractical by the community. There are LR(1) algorithms that improve on the time and space efficiency of the canonical LR(1) algorithm, but there has been no systematic study of them. Good LR(1) parser generator implementations are also rare. LR(k) parser generation is even more expensive and complicated than LR(1), but it can be an alternative to GLR in natural language processing and other applications.

To solve these problems, this work explored ways of improving data structures and algorithms, and implemented Hyacc, an efficient, practical and Yacc-compatible LR(0)/LALR(1)/LR(1)/LR(k) parser generator, which has been released to the open source community. An empirical study was conducted comparing different LR(1) parser generation algorithms with one another and with LALR(1) algorithms. The results show that LR(1) parser generation based upon improved algorithms and carefully selected data structures can be sufficiently efficient to be of practical use with modern computing facilities.

An extension was made to the unit production elimination algorithm to remove redundant states. A LALR(1) implementation based on the first phase of lane-tracing was done, which provides another alternative LALR(1) algorithm. The second phase of the lane-tracing algorithm, which has not been discussed in detail before, was analyzed and implemented. A new LR(k) algorithm called the edge-pushing algorithm, which is based on recursively applying the lane-tracing process, was designed and implemented. Finally, a latex2gDPS compiler was created using Hyacc to demonstrate its usage.
Contents

Acknowledgements
Abstract
List of Tables
List of Figures

1 Introduction

2 Background and Related Work
  2.1 Parsing Theory
    2.1.1 History of Research on Parsing Algorithms
    2.1.2 Classification of Parsing Algorithms
  2.2 LR(1) Parsing Theory
    2.2.1 The Canonical LR(k) Algorithm of Knuth (1965)
    2.2.2 The Partitioning Algorithm of Korenjak (1969)
    2.2.3 The Lane-tracing Algorithm of Pager (1977)
    2.2.4 The Practical General Method of Pager (1977)
    2.2.5 The Splitting Algorithm of Spector (1981, 1988)
    2.2.6 The Honalee Algorithm of Tribble (2002)
    2.2.7 Other LR(1) Algorithms
  2.3 LR(1) Parser Generators
    2.3.1 LR(1) Parser Generators Based on the Practical General Method
    2.3.2 LR(1) Parser Generators Based on the Lane-Tracing Algorithm
    2.3.3 LR(1) Parser Generators Based on Spector’s Splitting Algorithm
    2.3.4 Other LR(1) Parser Generators
  2.4 The Need For Revisiting LR(1) Parser Generation
    2.4.1 The Problems of Other Parsing Algorithms
    2.4.2 The Obsolete Misconception of LR(1) versus LALR(1)
    2.4.3 The Current Status of LR(1) Parser Generators
  2.5 LR(k) Parsing
  2.6 Conclusion

3 The Hyacc Parser Generator
  3.1 Overview
  3.2 Architecture of the Hyacc parser generator
  3.3 Architecture of the LR(1) Parse Engine
    3.3.1 Architecture
    3.3.2 Storing the Parsing Table
    3.3.3 Handling Precedence and Associativity
    3.3.4 Error Handling
  3.4 Data Structures

4 LR(1) Parser Generation
  4.1 Overview
  4.2 Knuth’s Canonical Algorithm
    4.2.1 The Algorithm
    4.2.2 Implementation Issues
  4.3 Pager’s Practical General Method
    4.3.1 The Algorithm
    4.3.2 Implementation Issues
  4.4 Pager’s Unit Production Elimination Algorithm
    4.4.1 The Algorithm
    4.4.2 Implementation Issues
  4.5 Extension To The Unit Production Elimination Algorithm
    4.5.1 Introduction and the Algorithm
    4.5.2 Implementation Issues
  4.6 Pager’s Lane-tracing Algorithm
    4.6.1 The Algorithm
    4.6.2 Lane-tracing Phase 1
    4.6.3 Lane-tracing Phase 2
    4.6.4 Lane-tracing Phase 2 First Step: Get Lanehead State List
    4.6.5 Lane-tracing Phase 2 Based on PGM
    4.6.6 Lane-tracing Phase 2 Based on A Lane-tracing Table
  4.7 Framework of Reduced-Space LR(1) Parser Generation
  4.8 Conclusion

5 Measurements and Evaluations of LR(1) Parser Generation
  5.1 About the Measurement
    5.1.1 The Environment and Metrics Collection
    5.1.2 The Algorithms
    5.1.3 The Grammars
  5.2 LR(1), LALR(1) and LR(0) Algorithms
    5.2.1 Parsing Table Size Comparison
    5.2.2 Parsing Table Conflict Comparison
    5.2.3 Running Time Comparison
    5.2.4 Memory Usage Comparison
  5.3 Extension Algorithm to Unit Production Elimination
    5.3.1 Parsing Table Size Comparison
    5.3.2 Parsing Table Conflict Comparison
    5.3.3 Running Time Comparison
    5.3.4 Memory Usage Comparison
  5.4 Comparison with Other Parser Generators
    5.4.1 Comparison to Dragon and Parsing
    5.4.2 Comparison to Menhir and MSTA
  5.5 Conclusion
    5.5.1 LR(1) and LALR(1) Algorithms
    5.5.2 The Unit Production Elimination Algorithm and Its Extension Algorithm
    5.5.3 Hyacc and Other Parser Generators

6 LR(k) Parser Generation
  6.1 LR(k) Algorithm
    6.1.1 LR(k) Parser Generation Based on Recursive Lane-tracing
    6.1.2 Edge-pushing Algorithm: A Conceptual Example
    6.1.3 The Edge-pushing Algorithm
    6.1.4 Edge-pushing Algorithm on Cycle Condition
  6.2 Computation of theads(α; k)
    6.2.1 The Problem
    6.2.2 Literature Review on theads(α; k) Calculation
    6.2.3 The theads(α; k) Algorithm Used in Hyacc
  6.3 Storage of LR(k) Parsing Table
  6.4 LR(k) Parse Engine
  6.5 Performance
  6.6 Examples
  6.7 Lane-tracing at Compile Time
  6.8 More Issues

7 The Latex2gDPS compiler
  7.1 Introduction
  7.2 Design of the Latex2gDPS Compiler
    7.2.1 Overall Design
    7.2.2 Data Structures
    7.2.3 Use of Special Declarations
  7.3 Current Status
  7.4 An Example

8 Conclusion

9 Future Work
  9.1 Study of More LR(1) Algorithms
  9.2 Issues in LR(k) Parser Generation ...