A Language Transformation System Capable of Generalized Context-Dependent Parsing

by

Adrian D. Thurston

A thesis submitted to the

School of Computing

in conformity with the requirements for

the degree of Doctor of Philosophy

Queen’s University

Kingston, Ontario, Canada

December 2008

Copyright © Adrian D. Thurston, 2008

ISBN 978-0-494-46208-9

Abstract

Source transformation systems are special-purpose programming languages, or in some cases suites of languages, that are designed for the analysis and transformation of computer languages. They enable rapid prototyping of programming languages, source code renovation, language-to-language translation, design recovery, and other custom analysis techniques.

With the emergence of these systems a serious problem is evident: expressing a parser for common computer languages is sometimes very difficult. Source transformation systems employ generalized parsing algorithms, and while these are well suited for the kind of agile parsing techniques in use by transformation practitioners, they are not well suited for parsing languages that are context-dependent. Traditional deterministic parser generators do not stumble in this area, but they sacrifice the generalized parsing abilities that transformation systems depend on. When it is hard to get the input into the system as a correct and accurate parse tree the utility of the unified transformation environment is degraded and more ad hoc approaches become attractive for processing input.

This thesis is about the design of a new computer language transformation system with a focus on enhancing the parsing system to support generalized context-dependent parsing. We argue for the use of backtracking LR as the generalized parsing algorithm. We present an enhancement to backtracking LR that allows us to control the parsing of an ambiguous grammar by ordering the productions of the grammar definitions. We add a grammar-dependent lexical solution and integrate it with our ordered choice parsing strategy. We design a transformation language that is closer to general-purpose programming languages, yet enables common transformation techniques. We add semantic actions to our backtracking LR parsing engine and encourage the modification of global state in support of context-dependent parsing. We introduce semantic undo actions for reverting changes to global state during backtracking, thereby enabling generalized context-dependent parsing.

Finally, we free the user from having to write undo actions by employing automatic reverse execution. The resulting system allows a wider variety of computer languages to be analyzed. By focusing on improving parsing abilities and moving to a transformation language that resembles general-purpose languages, we aim to extend the transformation paradigm to allow greater use by practitioners who face an immediate need to parse, analyze and transform computer languages.

Acknowledgements

I would like to thank my supervisor, James R. Cordy, for guidance during the production of this work. Jim was always available for discussions that left me with a fresh perspective.

He has an amazing ability to relate new ideas to the huge and rather intimidating body of research that we have behind us. His ability to articulate my ideas in a manner that inspires confidence in my own work proved invaluable to me.

I would like to thank lab members Jeremy S. Bradbury and Chanchal K. Roy for many productive discussions about research and grad school life.

I am aware that in a lifetime it is possible to pursue interests that are easily shared with those you love. Like many grad students, this was not the case with me. My interests lie far from those of my family, yet my family always supported my personal obsessions and distractions. I would especially like to thank them for their love and support in my final year. Nobody is forced to write a thesis, and completing one takes a lot of hard work. Persistence in this kind of situation requires support from people who are willing to sacrifice while you chase after your own goals. My family did just that, and it is one of the many reasons why I am forever indebted to them.

Statement of Originality

I hereby certify that the research presented in this dissertation is my own, conducted under the supervision of James R. Cordy. Ideas and techniques that are not a product of my own work are cited, or, in cases where citations are not available, are presented using language that indicates they existed prior to this work.

Table of Contents

Abstract i

Acknowledgements iii

Statement of Originality iv

Table of Contents v

List of Figures xi

Chapter 1: Introduction 1

1.1 Computer Language Transformation ...... 1

1.2 Motivation ...... 2

1.2.1 Parsing ...... 2

1.2.2 A New Transformation Language ...... 4

1.3 Thesis Outline ...... 5

Chapter 2: Background 7

2.1 Introduction ...... 7

2.2 Anatomy of a Transformation System ...... 8

2.2.1 Tokenization ...... 9

2.2.2 Generalized Parsing ...... 10

2.2.3 Representing Trees ...... 11

2.2.4 Tree Traversal ...... 12

2.2.5 Tree Pattern Matching ...... 13

2.2.6 Tree Construction ...... 13

2.2.7 Unparsing ...... 13

2.3 Generalized Parsing ...... 14

2.3.1 Classification of Parsing Strategies ...... 15

2.3.2 Unger’s Method ...... 20

2.3.3 CYK Method ...... 24

2.3.4 Earley Parsing ...... 26

2.3.5 Generalized Top-Down ...... 30

2.3.6 Backtracking LR ...... 36

2.3.7 Generalized LR ...... 37

2.3.8 Limitations on Grammars Imposed by the Algorithms ...... 41

2.3.9 Time Complexities ...... 43

2.3.10 Controlling the Parse of Ambiguous Grammars ...... 45

2.3.11 Error Recovery ...... 46

2.3.12 Summary ...... 47

2.4 Context-Dependent Parsing ...... 48

2.4.1 Global State ...... 49

2.4.2 Nondeterminism and the Global State ...... 49

2.4.3 Requirements ...... 51

2.5 Source Transformation Systems ...... 54

2.5.1 ASF+SDF ...... 54

2.5.2 ELAN ...... 55

2.5.3 Stratego Language ...... 56

2.5.4 TXL ...... 57

2.5.5 Integration with General-Purpose Languages ...... 58

2.6 Summary ...... 60

Chapter 3: Generalized Parsing for Tree Transformation 62

3.1 Introduction ...... 62

3.2 Backtracking LR ...... 63

3.3 User-Controlled Parsing Strategy ...... 64

3.3.1 Ordering Example ...... 66

3.3.2 Out-of-Order Parse Correction ...... 67

3.3.3 Ordered Choice and Left Recursion ...... 70

3.3.4 Sequences ...... 72

3.3.5 Longest/Shortest Match ...... 73

3.4 Discarding Alternatives ...... 74

3.5 Error Recovery ...... 75

3.6 Grammar-Dependent Lexical Rules ...... 76

3.7 Summary ...... 78

Chapter 4: COLM: A Tree-Based Transformation Language 80

4.1 Designing a New Transformation System ...... 81

4.1.1 Implementing a New System ...... 81

4.1.2 Designing a New Language ...... 82

4.2 Language Overview ...... 83

4.2.1 Value Semantics ...... 84

4.2.2 Root Language Elements ...... 84

4.2.3 Tokens ...... 85

4.3 General-Purpose Constructs ...... 85

4.3.1 Statements ...... 85

4.3.2 Expressions ...... 87

4.3.3 Functions ...... 87

4.3.4 Global Variables ...... 89

4.3.5 Dynamic Variables ...... 90

4.4 Type Definitions ...... 91

4.4.1 Token Definitions ...... 92

4.4.2 Context-Free Language Definitions ...... 95

4.4.3 Extended Types ...... 96

4.5 Matching Patterns ...... 98

4.6 Synthesizing Trees ...... 101

4.7 Traversing and Manipulating Trees ...... 103

4.7.1 References ...... 104

4.7.2 Iterators ...... 105

4.7.3 User-Defined Iterators ...... 107

4.8 Generic Transformation Idioms ...... 107

4.8.1 Top-Down, Left-Right, One-Pass ...... 107

4.8.2 Fixed-Point Iteration ...... 109

4.8.3 Bottom-Up Traversals ...... 110

4.8.4 Right-Left Traversals ...... 110

4.9 Transformation Example: Goto Elimination ...... 110

4.10 Implementation Notes ...... 111

4.10.1 Tree Sharing ...... 112

4.10.2 The Tree-Kid Pair ...... 114

4.10.3 Reference Chains ...... 114

4.10.4 Implementation-Caused Language Restrictions ...... 116

4.10.5 The Virtual Machine ...... 118

4.11 Summary ...... 119

Chapter 5: Context-Dependent Parsing 120

5.1 Token-Generation Actions ...... 121

5.2 Reduction Actions ...... 122

5.3 Semantic Predicates ...... 124

5.4 Undoing Generation and Reduction Actions ...... 125

5.5 Automating Undo Actions with Reverse Execution ...... 127

5.5.1 Reverse Execution ...... 127

5.5.2 An Instruction-Logging Reverse Execution System ...... 130

5.5.3 Specializing Virtual Machine Instructions ...... 134

5.5.4 Reverse Execution Example ...... 137

5.5.5 A Necessary Language Restriction ...... 137

5.6 Context-Free Pattern and Constructor Parsing ...... 140

5.7 Summary ...... 141

Chapter 6: Examples 143

6.1 HTTP ...... 144

6.1.1 Varying Lexical Rules ...... 145

6.1.2 Nested Comments ...... 145

6.2 DNS Protocol ...... 147

6.2.1 Parsing with One Token Type ...... 148

6.2.2 Length and Cardinality ...... 148

6.3 HTML ...... 151

6.3.1 Matching Tags ...... 152

6.3.2 Transforming Unclosed Tags ...... 154

6.4 Ruby ...... 155

6.4.1 Grammar-Dependent Newline Interpretation ...... 157

6.4.2 Here Documents ...... 161

6.5 Python ...... 162

6.5.1 Off-Side Rules ...... 162

6.5.2 Resolving Ambiguities ...... 167

6.6 C++ ...... 170

6.6.1 Maintaining Symbol Information ...... 171

6.6.2 Generating Tokens ...... 173

6.6.3 Resolving Ambiguities ...... 173

6.6.4 C++ Class Method Bodies ...... 177

6.7 Summary ...... 178

Chapter 7: Summary and Conclusion 181

7.1 Contributions ...... 182

7.2 Limitations and Future Work ...... 184

7.2.1 Out-of-Order Parse Correction ...... 184

7.2.2 Context-Dependent Parsing with GLR ...... 185

7.2.3 Grammar Modularity and Reuse ...... 186

7.3 Implementation ...... 186

7.4 Conclusion ...... 186

Bibliography 188

List of Figures

2.1 Visualization of top-down parsing ...... 18

2.2 Visualization of bottom-up parsing ...... 19

2.3 Partitioning in Unger’s Method ...... 21

2.4 Visualization of Unger’s Method ...... 23

2.5 CYK parsing algorithm ...... 25

2.6 Visualization of the CYK algorithm ...... 25

2.7 Visualization of Earley parsing ...... 29

2.8 Visualization of GLR ...... 32

2.9 Example LALR(1) parse tables ...... 37

2.10 Visualization of backtracking LR ...... 38

2.11 Visualization of GLR ...... 40

2.12 Asymptotic time complexities of generalized parsing methods ...... 44

2.13 Summary of algorithms ...... 48

2.14 Common strategies in Stratego ...... 57

2.15 Binary search in Stratego ...... 59

3.1 Action ordering algorithm ...... 65

3.2 LALR(1) parse tables of Grammar 2.3 ...... 67

3.3 Parse tables with unique empty productions ...... 69

3.4 Trace of action ordering algorithm ...... 71

3.5 Ordering of lexical regions ...... 77

4.1 List of root language elements ...... 84

4.2 Syntax of meta-language tokens ...... 85

4.3 Syntax of statements ...... 86

4.4 Example of declaration, expression, while and if statements ...... 87

4.5 Elided syntax of expressions ...... 88

4.6 Example of qualified names and expressions ...... 88

4.7 Syntax of function declarations and function calls ...... 89

4.8 Example of user-defined and inbuilt functions ...... 90

4.9 Syntax of global variable declarations ...... 90

4.10 Syntax of dynamic variable declarations and references ...... 91

4.11 Syntax of token and lexical region definitions ...... 93

4.12 Example of token definitions and lexical regions ...... 94

4.13 Syntax of context-free definitions ...... 96

4.14 Example of context-free definitions with attributes and reduction actions . . 97

4.15 Syntax of extended types ...... 97

4.16 Example of defining a symbol table using generic types ...... 99

4.17 Syntax of patterns ...... 100

4.18 Example of patterns ...... 101

4.19 Syntax of tree synthesis patterns ...... 102

4.20 Example of tree synthesis using construct ...... 104

4.21 Syntax of reference parameters ...... 105

4.22 Syntax of iterator control structures ...... 105

4.23 Example of using an iterator to traverse and manipulate a tree ...... 106

4.24 Syntax of user-defined iterators ...... 107

4.25 The inbuilt top-down, left-right, one-pass iterator ...... 108

4.26 Generic fixed-point iterator ...... 109

4.27 Generic bottom-up, left-right iterator ...... 110

4.28 Generic top-down, right-left iterator ...... 111

4.29 Section of the Tiny Imperative Language grammar ...... 112

4.30 Example of transformation: goto elimination ...... 113

4.31 Tree sharing using the Tree and Kid data records ...... 115

4.32 Reference chain ...... 116

4.33 Writing to a reference chain ...... 117

4.34 Implementation of two virtual machine instructions ...... 118

5.1 Token-generation example ...... 123

5.2 Reduction action with a corresponding undo action ...... 126

5.3 Reduction action that is difficult to revert correctly ...... 128

5.4 Complex manually implemented undo action ...... 129

5.5 Reverse instruction writing, grouping and execution ...... 131

5.6 Three instructions with reverse instruction generation ...... 132

5.7 Reverse instruction example ...... 133

5.8 Load and store instructions for tree attributes ...... 136

5.9 Reduction action trace ...... 138

5.10 Motivation for preventing iteration over globally accessible values ...... 139

6.1 Grammar for the high-level components of HTTP ...... 146

6.2 Parsing nested comments ...... 147

6.3 Parsing DNS with mostly just an octet token ...... 149

6.4 Left-recursive counting ...... 150

6.5 Generic right-recursive counting ...... 151

6.6 Lexical region and token-generation action for close tag identifiers . . . . . 153

6.7 Robust HTML grammar ...... 154

6.8 Closing unclosed tags ...... 155

6.9 Moving unclosed tag content to after the tag ...... 156

6.10 Grammar-dependent newline interpretation ...... 159

6.11 Forbidding newlines in specific places ...... 160

6.12 Here document parsing ...... 163

6.13 Execution trace of here document parsing ...... 164

6.14 Python grammar fragment showing statements and indentation ...... 165

6.15 Implementing off-side rules ...... 166

6.16 Execution trace of off-side rule parsing ...... 168

6.17 Python grammar fragment showing subscriptions and slicings ...... 169

6.18 Resolving ambiguities in Python ...... 170

6.19 Maintaining the C++ name hierarchy ...... 172

6.20 Posting name qualifications in preparation for lookup ...... 172

6.21 C++ name lookup ...... 174

6.22 Ambiguity between declarations and expressions ...... 175

6.23 Ambiguity between function declaration and object declaration ...... 176

6.24 Ambiguity between type-id and expression ...... 177

6.25 Ambiguity between anonymous function pointer and variable ...... 178

6.26 Two-stage parsing of C++ class method bodies ...... 179

Chapter 1

Introduction

1.1 Computer Language Transformation

A source transformation system is a programming language, or a suite of languages, designed for the analysis and manipulation of source code. A transformation system is characterized by language constructs for defining a grammar, and language constructs for defining a set of transformation rules. The user defines a grammar for the input language and the system derives a parser from it, then it uses the parser to build a syntax tree from the input. The user-defined transformation rules specify how the tree is to be manipulated. Rules contain a structured pattern and a structured replacement that both adhere to the grammar. The system also provides mechanisms for searching trees and applying the transformation rules.

When the transformation is complete the result can be unparsed to the output stream.

Transformation systems are useful in core computer science activities and in the field of software engineering. Uses include programming language prototyping, programming language extension, program optimization, design recovery, program evolution, and language-to-language migration. Many other uses have been found over the years. Examples of transformation systems include TXL [25], ASF+SDF [97, 99], Stratego [107] and DMS [11].

A key requirement of a transformation system is that it automatically derive a parser from the input language grammar. The parser is used for two purposes: to parse the input and to parse the structured patterns and replacements that make up the transformation rules. Using a single parser that has been derived from the grammar allows the structured patterns to be compared to fragments of the input. It also guarantees that the result of the transformation is in the language generated by the grammar, ensuring that the transformation is type-safe.

Though originally used for analysis and manipulation of source code, the utility of structured transformation is not exclusive to source code. Theoretically, a transformation system can be used to analyze or transform any data that has a structured tree representation. This includes data formats and communication protocols such as XML [17], HTML [81], DNS [70], SMTP [56], IMAP [26], HTTP [31], Internet Mail Messages [83] and log messages of various kinds. With the advent of the ubiquitous networked computer, the body of domain-specific languages and communication protocols that are needed to support our modern communication methods is growing. The Internet is built on protocols and data formats, where data is exchanged and stored in streams of bytes, but processed in structured representations. The need for tools that analyze and manipulate these data formats and communication protocols is as important as the need for tools that manipulate source code.

1.2 Motivation

1.2.1 Parsing

Transformation systems are not without problems, and perhaps the most serious of these is parsing ability. The difficulty of writing a grammar for some input languages can sometimes outweigh the benefits of using the transformation system. For example, C++ is very difficult to accurately parse using grammar-based techniques because its grammatical definition is simultaneously context-dependent and ambiguous. Since it is difficult to get a correct parse tree on which the transformations will be based, the value of the transformation system is reduced.

In order to be useful for general-purpose transformation tasks, a parsing system must meet a number of requirements. The most important requirement is that the parsing algorithm be generalized. The system should not reject any grammar due to limitations of the parsing algorithm. In transformation tasks, it is common to use ambiguous grammars directly from language specifications, or to combine existing grammars when languages are mixed. These activities are possible when the parsing algorithm is generalized.

In some parsing problems, it is necessary to choose which lexical rules to apply based on surrounding grammatical context. For example, from within the value of a particular HTML tag attribute, perhaps the onClick attribute of a button, it may be desirable to parse JavaScript. In this specific context the JavaScript lexical rules must be applied, while using basic string parsing for other attribute values. One way to solve this problem is to use a scannerless generalized parsing system [84, 105]. Such a system includes lexical definitions at the leaf nodes of the grammar. In this capacity, the generated parser utilizes surrounding grammar context to decide which lexical rules to apply. The Stratego source transformation system employs this type of parser. As a result, languages that contain numerous regions with distinct lexical rules can be parsed naturally.

The final requirement, and that which causes transformation systems the most trouble, is that of context-dependent parsing. This refers to the ability to make decisions about the syntactic structure of the text being parsed based on the semantic interpretation of the text previously parsed. This typically involves analyzing the input text using logic custom to the input language and then using this information to assign extra meaning to the tokens being passed from the scanner to the parser. The extra information is used by the parser to direct its choices. This is a task that deterministic parsing systems such as the traditional Lex [61] plus Yacc [46] approach handle very well, but the generalized source transformation systems do not handle at all. The problem is that context-dependent parsing requires the maintenance of global information, which is at odds with generalized parsing. The common approach when using a generalized parser is to defer the context-dependency restrictions to a post-parse disambiguation phase.

No existing grammar-based parsing system meets all three of these requirements. Worse, no system is capable of handling just the two requirements of context-dependent parsing and generalized parsing. The traditional Lex+Yacc approach can support grammar-dependent scanning and context-dependent parsing, but not generalized parsing. Source transformation systems easily support grammar-dependent scanning and generalized parsing, but lack support for context-dependent parsing. The primary goal of this work is to design a transformation system that meets all three of these requirements.

1.2.2 A New Transformation Language

In existing systems the transformation language depends on the parsing system. In our system this is true, but the reverse is also true. As our technique unfolds, we come to rely on the transformation language to enhance the parsing engine. Due to our design we must implement a new transformation system. We take this opportunity to also design a new transformation language and explore an approach to transformation language design that has not yet been taken.

Existing systems employ a rule-based paradigm that is different from general-purpose languages. Working in a transformation system requires shifting to this paradigm for all code that is written, including support libraries. While the transformation aspect is well taken care of, common programming tasks such as control flow and elementary algorithms sometimes have nontrivial implementations. One must take the time to figure out how to express solutions that seem easy to solve in general-purpose languages.

The problem becomes apparent when specialized programming problems creep into a transformation program as support code. Examples include the computation of a checksum specific to the application domain, the encryption/decryption of a sensitive data field, the traversal of a directory structure, or the displaying of a dialog box. The more time a programmer must spend expressing otherwise straightforward algorithms as transformations, the more attractive it becomes to spend the time expressing the transformations in a general-purpose language.

The secondary goal of this work is to design a transformation language that is closer to general-purpose languages. We do not wish to give up the rewriting paradigm, but rather integrate core transformation language concepts with general-purpose constructs to produce a transformation language that appears familiar to programmers. Examples of essential transformation features are parsing at both run-time and compile-time, pattern matching, variable binding, tree construction, and tree traversal. By doing this we hope to design a system that appeals to programmers who are not familiar with rewriting systems.

1.3 Thesis Outline

In Chapter 2 we review the components of a transformation system, with a strong focus on generalized parsing techniques. The capabilities of our system will depend upon the choice of parsing method, therefore it is important to consider the characteristics of these methods in some detail. We then survey some existing transformation systems. Chapter 2 will give us the background necessary to identify the limitations of current systems and to develop a new system that advances the state of transformation systems.

In Chapter 3 we settle on the backtracking LR algorithm as our parsing method. Before we can employ this method in a source transformation system we must solve the issue of controlling the parse of ambiguous grammars. We develop a parsing strategy for backtracking LR that approximates the ordered choice strategy found in generalized recursive-descent parsers. We then discuss its limitations and show how the algorithm affects common grammar forms such as left-recursion and sequences. Finally, we describe our solution for defining languages that have varying lexical rules.

In Chapter 4 we give the design of our new transformation language, which we call COLM. We first outline the features taken from general-purpose languages, then move to the transformation-related features, such as pattern matching, tree synthesis, and the generic tree traversal constructs. We show our language is suitable for source transformation tasks by giving examples of generic traversals and an example of a goto-elimination program. Following that we give notes on some of the implementation choices that we have made.

In Chapter 5 we describe what we have done to allow our system to parse context-dependent languages. Deterministic parsing systems such as Lex and Yacc allow the user to write code that programmatically generates tokens and manipulates global data structures during parsing. The global data can then be used for feedback to the token-generation code. This is the approach we have taken, but rather than use some host programming language such as C for semantic actions and token-generation functions, we use the transformation language. To make this work in the context of backtracking LR, we address the issue of undoing changes to global state during backtracking. We introduce undo actions and make use of reverse execution to automatically generate them. This completes our transformation system and gives us a generalized context-dependent parsing system.

In Chapter 6 we demonstrate the ability of our system to parse language features that are difficult for other systems. We give examples of parsing problems that illustrate the grammar-dependent scanning, generalized parsing and context-dependent parsing abilities of our system. In some cases, these languages cause other transformation systems difficulty because of grammar-dependent scanning or context-dependent parsing requirements. In other cases, these languages cause deterministic parsing systems difficulty due to inherent ambiguities in the language specifications. The most notable example of a single common language that is difficult to parse using grammar-based techniques is C++.

Chapter 2

Background

2.1 Introduction

In this chapter we give a background for source transformation systems and generalized parsing. We begin by breaking the transformation system down into the essential components. We look at lexical analysis, context-free parsing, tree representation, tree traversal, structured pattern matching, tree synthesis and unparsing. We then survey existing generalized parsing algorithms, which will help us choose a parsing method to move forward with, in pursuit of our goals. Following the survey we analyze the methods with respect to the properties that we deem important in a transformation system. We cover the limitations imposed on the accepted grammars, the run-time complexity, the ability to control the parse of ambiguous grammars and the ability to recover from errors. We then move the discussion to context-dependent parsing. It is in this section that the primary motivation for this work becomes clear: we find that the current systems are insufficient for handling context-dependent languages. In the final section of this chapter we survey existing source transformation systems and give an example of a common algorithm that appears unfamiliar when expressed in a transformation system. We close by looking at the recent trend of extending general-purpose languages with select transformation features.


2.2 Anatomy of a Transformation System

A transformation system can be described as a programming language with a few particular characteristics. Where a general-purpose programming language might have integers, records, arrays and classes as the building blocks of its type system, a transformation system employs context-free languages for data-structure definition. The system then provides a parsing algorithm that is used to automatically derive the context-free structure of input text from the type definitions.

With parse trees in memory, the user can analyze and manipulate the input by traversing and rewriting trees. The transformation system provides language constructs for expressing tree searching, pattern matching, and tree synthesis. These constructs are explicitly designed for the traversal and manipulation of tree-based data structures. The user combines these basic operations to yield rewrite rules. Following a successful transformation the parse tree can be serialized.

In a transformation system the grammar has two purposes. It serves as the description of the language that the system’s parsing algorithm will use to guide parsing. It also serves as the description of the program’s data structures, against which the transformation program is type-checked. This dual use is what makes a user of a transformation system incredibly productive when writing programs that analyze and manipulate computer languages. The grammar is exploited for generating parsing code and for guiding programmers in their transformations.

In more ad hoc approaches, such as the use of Lex [61] plus Yacc [46], these two processes are separate. The user defines a grammar but it is used for parsing only. During the course of parsing (in semantic actions) the user builds up data structures using general-purpose data structure constructs such as classes, and then manually traverses these data structures with hand-written code. Once the input is in the general-purpose data structures, there is no opportunity for exploiting grammar structure for traversal function generation, deep pattern matching, or transformation correctness checking.

The first purpose of the grammar, the declarative parser description, causes this research to move towards the study of algorithms for parsing context-free languages. It is desirable to be able to parse any reasonably useful computer language using a transformation system, and for the grammar-based parsing to be completely automatic. Unfortunately, these two desired features are in conflict because existing formalisms are often inadequate for writing declarative solutions to complex parsing problems. This highlights a need for parsing-related research.

There are many systems that implement some transformation concepts, but there are only a handful of systems that target the full parse - transform - unparse cycle. These are TXL [25], Stratego [107], ASF+SDF [97, 99] and DMS [11]. It could be useful to consider the features of the many other transformation-related tools; however we focus on the systems designed for the full cycle.

2.2.1 Tokenization

The traditional approach to tokenization is to treat the input as a fixed sequence of tokens by matching it against a collection of regular expressions. This approach is fast and simple. Lexical analysis can be performed ahead of time or on demand. Computing a fixed sequence of tokens is the approach of TXL.

Within the generalized LR (GLR) community there has been interest in scannerless GLR parsing [84, 105]. This method is used by the Stratego language. Each character is treated as a terminal and the tokens are defined using the context-free grammar. In many cases this creates ambiguities that must be resolved by disambiguation rules. These are declarative descriptions of illegal syntax forms that are allowed by the syntax definition, but should not be accepted by the parser. For example, a follow restriction on an identifier definition would specify that the characters [a-zA-Z_] are not allowed to follow an identifier, forcing the longest possible identifier token to be found. The downside of this approach is that it is costly to treat each individual character as a node in the parse tree.

Van Wyk describes a deterministic technique called context-aware scanning [103]. When a new token is needed the parser passes to the scanner a list of tokens that are legal in the context of the current parser state. This set is derived from the grammar.

Automatically choosing scanning rules based on the grammar context is useful in a number of situations. Scripting languages often have distinct lexical regions in which very different lexical rules apply. The Ruby programming language [33] is an example of a scripting language that requires scanning rules that are dependent on the grammar rules active in the parser. For example, this is valid Ruby code: v = v / 2 if ’foo’ =~ /[foo]b*r/;

A forward slash is used to denote both division and the start of a regular expression.

The naive scanner-based approach of entering a regular expression when the first slash is seen will not work. We instead need to instruct the scanner to recognize regular expressions only when the parser is able to accept them.
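To make the idea concrete, the following Python sketch shows one way a context-aware scanner could resolve the slash ambiguity. It is an illustration only, not the scheme used by Ruby or by the system described in this thesis: the parser supplies the set of token kinds it can currently accept, and the scanner consults that set before deciding how to read a leading slash.

import re

def scan(src, acceptable):
    """Return (kind, lexeme, rest); 'acceptable' is the set of token kinds
    the parser can shift in its current state."""
    src = src.lstrip()
    if src.startswith('/'):
        if 'REGEX' in acceptable:
            m = re.match(r'/(?:[^/\\]|\\.)*/', src)   # a whole /.../ literal
            if m:
                return 'REGEX', m.group(0), src[m.end():]
        return 'DIV', '/', src[1:]
    m = re.match(r'[A-Za-z_]\w*|\d+|=~|.', src)
    return 'OTHER', m.group(0), src[m.end():]

# After "v =" the parser can accept an operand, so a slash starts a regular
# expression; after "v = v" it expects an operator, so a slash is division.
print(scan("/ 2 if ...", {'DIV'}))       # ('DIV', '/', ' 2 if ...')
print(scan("/[foo]b*r/;", {'REGEX'}))    # ('REGEX', '/[foo]b*r/', ';')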

Multi-lingual parsing tasks also require an ability to handle multiple lexical regions, as does the parsing of many text-based Internet protocols. For example, Internet Mail Messages [83], HTTP [31] and IMAP [26] all require a tokenizer that can vary the active set of lexical rules.

2.2.2 Generalized Parsing

Transformation systems normally use a generalized parsing method in order to be useful for general-purpose transformation tasks. The advantage of a generalized parser is that it does not reject any grammar, whether it be for common computer languages that have ambiguous grammar definitions, a combination of grammars that arises from multi-lingual parsing, or a grammar modification that arises from agile parsing [27]. The need for generalized parsers in software engineering is argued by Brand et al. [102] and Aycock [3].

The C++ programming language is an example of a common language that has an ambiguous grammar that causes trouble for non-generalized parsing algorithms. The C++ standard [44] describes four ambiguities in the definition of the language.

In other cases, a generalized parser is needed because several grammars must be mixed for multi-lingual parsing. There are many scenarios when it is useful to embed one language in another. Examples include SQL in Java and Javascript in HTML. Mixing grammars can easily cause the limitations of deterministic parsing methods to be reached, but mixing is not a problem for a generalized parser.

The island grammar paradigm [71, 92] is another application of generalized parsers.

Island grammars are useful for robust parsing and for multi-lingual parsing. The idea is that many parsing problems can be modeled as an island of interesting structure in a sea of uninteresting text.

The multi-lingual and island grammar paradigms stress the need for all features of the parsing method to work in the presence of ambiguities. For example, any facilities for handling grammar-dependent scanning rules or context-dependent name lookups must be usable in the presence of ambiguities, otherwise we may not be able to freely combine existing grammars without some potentially drastic modifications to the grammars.

There are a number of generalized parsing algorithms available. The choice of parsing algorithm has a direct impact on the type of parsing problem that can be specified, and by extension the number of languages the transformation system can be applied to. Since we aim to improve the parsing capabilities of existing transformation systems, we study generalized parsing methods in depth. A large section of this chapter (2.3) deals with this topic.

2.2.3 Representing Trees

Transformation languages have type systems built upon context-free grammars. A type is defined as a terminal with an associated text pattern, or a nonterminal with an associated set of productions. In the term-rewriting community these are known as sorts.

Most systems use a value semantics for the representation of trees at run-time. This raises the question of how to efficiently store trees in memory. It is common to use some form of tree sharing. Trees can be shared maximally using hash functions on trees [98], or they can be shared when it is convenient to do so and results in a measurable gain in performance. TXL uses the latter approach.
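A minimal Python sketch of the maximal-sharing idea, assumed here for illustration rather than taken from any of the systems cited: a table keyed by a node's label and children guarantees that structurally equal trees are built only once, so equality tests and copies reduce to pointer operations.

class Node:
    _pool = {}
    def __new__(cls, label, *kids):
        key = (label, kids)               # kids must already be shared Nodes
        node = cls._pool.get(key)
        if node is None:
            node = super().__new__(cls)
            node.label, node.kids = label, kids
            cls._pool[key] = node
        return node

a = Node('id', Node('x'))
b = Node('id', Node('x'))
print(a is b)    # True: one shared copy serves both values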

Another design decision is whether or not to store and operate on concrete syntax.

The TXL programming language does this, whereas the languages based on term-rewriting generally operate on abstract syntax. Languages that operate on abstract syntax often have facilities for decorating trees with concrete syntax information such as layout and comments.

There is also the possibility of storing attributes in trees. A pure and simple tree-based transformation language would have just context-free right-hand sides and no attributes since only the context-free data is visible in the serialized output. It is convenient, however, to be able to decorate tree nodes with attributes and these may be added to the type definitions.

2.2.4 Tree Traversal

Tree traversal is a very important aspect of a transformation system. The Stratego language makes use of strategies for describing how trees are to be traversed. It allows one to specify traversals without mentioning the types that are traversed and therefore achieves a very high level of traversal abstraction. In other systems trees are explicitly traversed using program logic. In TXL there is some choice between the two. A functional style can be adopted and trees can be explicitly traversed. Alternatively, rules with a searching behaviour can be used to abstract traversal.

Some method for moving about trees must be devised and it must be made to work with the value semantics. Ideally, it allows the programmer to avoid duplicating common traversals for each type that is searched.
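The following Python sketch, assumed for illustration rather than drawn from any particular system, shows the kind of type-independent traversal such abstractions provide: a rewrite function is applied at every node of a tuple-encoded tree in one top-down, left-to-right pass, without the caller naming the types being traversed.

def topdown(rewrite, tree):
    # apply the rewrite at this node, then descend into the (possibly new) children
    tree = rewrite(tree)
    if isinstance(tree, tuple):
        return tuple(topdown(rewrite, kid) for kid in tree)
    return tree

def double_nums(t):
    # rewrite only ('num', n) nodes; leave everything else untouched
    if isinstance(t, tuple) and t and t[0] == 'num':
        return ('num', str(int(t[1]) * 2))
    return t

print(topdown(double_nums, ('add', ('num', '1'), ('num', '2'))))
# ('add', ('num', '2'), ('num', '4'))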

2.2.5 Tree Pattern Matching

In addition to being apt for searching for interesting types of trees, a transformation system must be suitable for testing the forms of trees.

The system should be able to take some string composed of types and concrete syntax and parse it into a partial parse tree, treating the literal types as terminal nodes. It should be able to test the parsed form against some input parse tree. In the process of testing it should capture the input trees that are matched to the types given in the pattern. These captured trees should become available as local variables.

Tree pattern matching gives us the ability to perform two tasks at once: test for particular forms of trees and capture the constituents of a tree without explicitly navigating parse trees. All existing source transformation systems are capable of this.
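A small Python sketch of matching with capture, using tuple-encoded trees and a hypothetical ('?', name) notation for pattern variables; it illustrates the idea only and does not reflect any particular system's pattern syntax.

def match(pattern, tree, bindings=None):
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, tuple) and pattern[:1] == ('?',):
        bindings[pattern[1]] = tree              # capture the matched subtree
        return bindings
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        if len(pattern) != len(tree):
            return None
        for p, t in zip(pattern, tree):
            if match(p, t, bindings) is None:
                return None
        return bindings
    return bindings if pattern == tree else None

tree = ('assign', ('var', 'x'), ('add', ('var', 'x'), ('num', '1')))
pat  = ('assign', ('?', 'lhs'), ('add', ('?', 'a'), ('?', 'b')))
print(match(pat, tree))
# {'lhs': ('var', 'x'), 'a': ('var', 'x'), 'b': ('num', '1')}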

2.2.6 Tree Construction

Tree construction refers to the ability to synthesize new types from constituent trees. A transformation system should not require that the user explicitly specify the syntax of the trees constructed. Just as it can do with the parsing of patterns, it should be able to infer the structure by parsing a string composed of tree variables and concrete syntax, and use this partial parse tree to guide the construction of trees at run-time. This allows many intermediate steps of the tree building process to be conveniently skipped by the user because they are performed automatically, following the guidance of the system’s parser.
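The reverse operation can be sketched in the same style: a skeleton with the same hypothetical ('?', name) leaves is filled in from previously captured bindings, so the user never assembles the intermediate nodes by hand. Again, this is an illustration and not any system's construction syntax.

def build(skeleton, bindings):
    if isinstance(skeleton, tuple) and skeleton[:1] == ('?',):
        return bindings[skeleton[1]]             # splice in a captured subtree
    if isinstance(skeleton, tuple):
        return tuple(build(kid, bindings) for kid in skeleton)
    return skeleton

bindings = {'a': ('var', 'x'), 'b': ('num', '1')}
print(build(('add', ('?', 'b'), ('?', 'a')), bindings))
# ('add', ('num', '1'), ('var', 'x'))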

2.2.7 Unparsing

The final step in the transformation process is unparsing. This would be a trivial task, save for the issue of laying out the tree with whitespace. If one wishes to preserve whitespace from the original program then the problem becomes harder.

A straightforward solution is to impose a computed layout on the entire output, thus normalizing it and guaranteeing a self-consistent layout. However, the user can easily become unhappy about the fact that portions of the program that were untouched by the transformation system suddenly look different. A more ambitious approach is to attempt to preserve input formatting and integrate the layout of synthesized trees with the layout of the untouched trees [65, 59, 109].

2.3 Generalized Parsing

Generalized parsing algorithms allow the specification of parsers using arbitrary context-free grammars. In contrast with parsing algorithms that restrict the permitted language class, such as LALR(1) [28] and LL(1) [62], generalized parsing algorithms take ambiguous language definitions in stride. There has been a recent resurgence in the study of generalized parsing algorithms. This is partly due to the fact that modern hardware can support super-linear parsing methods. It is also due to the fact that current applications warrant their use.

Generalized parsing algorithms are important in the field of natural language processing, where the languages to be parsed are heavily ambiguous. They are also important for the parsing of computer languages. For example, many modern programming languages such as C++ contain ambiguities, albeit usually fewer than the natural languages. It is the computer language application of generalized parsing that we are concerned with.

The problem of generalized parsing has been solved by a number of parsing algorithms. Backtracking recursive descent, backtracking LR, generalized LR (GLR) [94] and the Earley algorithm [29] are the most prevalent approaches. Many of these methods have numerous variations, especially GLR, and some of these methods have disguised incarnations. For example, many parsing methods are, in fact, an implementation of generalized recursive descent. It is also possible to take the extreme position that there are just two types of parsing methods, top-down and bottom-up, and all other variations are just implementation details.

This section will examine the research in generalized parsing techniques. First we introduce the parsing methods with a discussion about the two main types, top-down and bottom-up. In Sections 2.3.2 through 2.3.7 we describe the research in generalized parsing as we work through all the major algorithms. In Sections 2.3.8 through 2.3.11 we evaluate the techniques with respect to practical considerations. In the end we should have a better understanding of the space of generalized parsing methods as well as have gained some insight into which parsing methods suit the needs of the transformation system.

2.3.1 Classification of Parsing Strategies

Different parsing methods often have characteristics in common. One of those characteristics is the method’s approach in constructing the parse tree. In the study of parsing methods it is useful to consider whether a method is top-down or bottom-up. A top-down method attempts to match the input to the grammar by considering the derivations of a grammar nonterminal. It asks the question which production right-hand sides can be derived from this left-hand side? It starts with the goal symbol and moves down to the leaf nodes. A bottom-up method starts with the leaf nodes and considers reductions to nonterminals.

It asks the question which nonterminals do these right-hand-side elements reduce to? It proceeds by moving from leaf nodes upwards to the goal symbol.

Of all the parsing methods discussed in this chapter, every one except Earley parsing can be considered strictly top-down or bottom-up. Earley parsers are unusual in that they contain properties of both strategies.

It is also useful to consider how the method consumes the input. The most common case is left-to-right online parsing. This is due to the fact that in many parsing tasks the input is not available all at once, and it usually arrives in sequence from left to right. Though this is the most common case, other methods of consuming the input are discussed in the literature. Non-directional methods, which consider the input as a whole, and right-to-left parsers are also possible.

Top-Down Parsing

In top-down parsing the possible grammar derivations are recursively explored beginning at the root production. Possible productions of a nonterminal are selected and the right-hand sides are explored. As the right-hand sides are traversed, any nonterminals that are found are recursively explored. Leaf nodes are compared to the tokens in the input stream. When a match is made the parser shifts over that input item and proceeds to match further on down a production’s right-hand side.

Top-down parsers often require that grammars be free of left-recursion because it causes the parser to enter into an infinitely recursive exploration of the grammar that does not consume any input, only going deeper and deeper into exploring the grammar. Consider the following example grammar for arithmetic expressions.

E → T + E | T - E | T
T → F * T | F / T | F
F → ( E ) | id | num          (Grammar 2.1)

Right-recursion is used instead of left recursion. Use of right recursion solves the infinite exploration problem; however, it means that if the resulting parse trees are recursively evaluated with a bottom-up evaluation then sequences of addition and subtraction operations or sequences of multiplication and division operations will be evaluated from right to left.

Though inconvenient, this is not a fatality. It is quite simple to rewrite the resulting parse trees into a left-recursive form or to adapt the evaluation.
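As a concrete illustration of parsing with the right-recursive form, here is a minimal recursive-descent sketch for Grammar 2.1 in Python. It is not the parser of any system discussed here, and it factors the common T prefix of E's alternatives so that no backtracking is needed; the printed tree shows how the right recursion groups the + and - operations to the right.

import re

def tokenize(s):
    return re.findall(r'\d+|[A-Za-z_]\w*|[-+*/()]', s)

def parse_E(toks, i):
    left, i = parse_T(toks, i)
    if i < len(toks) and toks[i] in '+-':        # E -> T + E | T - E
        op = toks[i]
        right, i = parse_E(toks, i + 1)          # right recursion
        return (op, left, right), i
    return left, i                               # E -> T

def parse_T(toks, i):
    left, i = parse_F(toks, i)
    if i < len(toks) and toks[i] in '*/':        # T -> F * T | F / T
        op = toks[i]
        right, i = parse_T(toks, i + 1)
        return (op, left, right), i
    return left, i                               # T -> F

def parse_F(toks, i):
    if toks[i] == '(':                           # F -> ( E )
        tree, i = parse_E(toks, i + 1)
        assert toks[i] == ')'
        return tree, i + 1
    return toks[i], i + 1                        # F -> id | num

tree, _ = parse_E(tokenize("x+(3+4)*y-2"), 0)
print(tree)    # ('+', 'x', ('-', ('*', ('+', '3', '4'), 'y'), '2'))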

As an alternative approach, it is possible to modify a top-down parsing method to support left-recursion [37]. The TXL programming language employs a recursive-descent parser. The run-time system of the parser is able to recognize when it is attempting to parse a left-recursive construct and it modifies its behaviour accordingly. In simplified terms, it temporarily emulates a bottom-up parser.

It is also possible to transform a grammar to factor out left recursion, leaving a grammar that a top-down system can use naturally. Lohmann et al. [63] give a modern treatment of this technique. This is less desirable in the context of transformation systems because it modifies the user’s types.

Figure 2.1 illustrates a top-down parsing approach. In the left column the stack of explored productions is shown, beginning with the root. A level of indentation is used to indicate that a group of lines have a common parent. The dot symbol inserted in the right-hand side of a production is used to indicate how far into the production the parser has explored. If the dot immediately precedes a nonterminal, then the next level explores this nonterminal.

In the right column the unparsed stream of input tokens is shown. As the parser consumes an input token the dot in the production on the top of the stack is advanced.

When a production is fully matched it is popped from the top of the stack. Square brackets are used to indicate the input symbols that have been matched to a fully parsed nonterminal.

Top-down parsing methods differ by how they decide which production to attempt to parse and by the action they take when a parse error is encountered. For example, LL(1) [62] parsers, which are not general, look one token ahead to try to decide which production to attempt to parse. Non-general methods will usually give up when an error is encountered or implement some limited form of backtracking. They attempt to parse some subset of all possible derivations of the grammar, whereas a generalized top-down method will try all possible derivations.
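The difference can be seen in a short Python sketch of generalized recursive descent that yields every parse instead of committing to one. The grammar below is a small ambiguous example assumed purely for illustration (S → A A, A → a | a a); it is not one of the grammars used in this chapter.

GRAMMAR = {
    'S': [['A', 'A']],
    'A': [['a'], ['a', 'a']],
}

def derive(symbol, toks, i):
    # yield (tree, next_position) for every way 'symbol' can match at toks[i:]
    if symbol not in GRAMMAR:                    # terminal symbol
        if i < len(toks) and toks[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:                  # try every production
        for kids, j in derive_seq(rhs, toks, i):
            yield (symbol, kids), j

def derive_seq(rhs, toks, i):
    # match the symbols of a right-hand side one after another
    if not rhs:
        yield [], i
        return
    for first, j in derive(rhs[0], toks, i):
        for rest, k in derive_seq(rhs[1:], toks, j):
            yield [first] + rest, k

toks = list('aaa')
for tree, end in derive('S', toks, 0):
    if end == len(toks):                         # complete parses only
        print(tree)
# ('S', [('A', ['a']), ('A', ['a', 'a'])])
# ('S', [('A', ['a', 'a']), ('A', ['a'])])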

Bottom-Up Parsing

In bottom-up parsing the stream of input is analyzed for possible instances of production right-hand sides. When a right-hand side is found it is replaced with its corresponding nonterminal. The parser then uses the found nonterminal in its search for more matches of production right-hand sides.

[Figure 2.1 trace omitted: the original table lists the stack of explored productions beside the unparsed input at each step.]

Figure 2.1: A visualization of a top-down parsing approach used to parse the string x+(3+4)*y-2 according to Grammar 2.1. The left column contains the stack of explored productions. The right column contains the unparsed input.

Consider the following grammar, which is identical to Grammar 2.1 except the productions have been made left-recursive and the resulting parse trees can therefore be recursively evaluated in their natural form.

E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id | num          (Grammar 2.2)

Figure 2.2 illustrates how a bottom-up parser would proceed in parsing the string x+(3+4)*y-2. The left column contains elements that have been parsed. This particular bottom-up approach is called LR [58]. It is looking for rightmost derivations among the matched input symbols and nonterminals in the left column.

[Figure 2.2 trace omitted: the original table lists the partially reduced symbols on the stack beside the remaining input at each step.]

Figure 2.2: A visualization of a bottom-up parsing approach used to parse the string x+(3+4)*y-2 according to Grammar 2.2.

LR style bottom-up parsing has enjoyed a considerable amount of exposure over the years. Notably, the Yacc [46] program and its clones are in wide use today. Bottom-up parsing permits one to construct a deterministic automaton for the purpose of driving the parse. The exact nature of the automaton depends on the variant of the table construction method that is used. Some possibilities are SLR(1) [28], LALR(1) [28], LR(1) [58]. These can vary in table size and in the amount of lookahead encoded in the parse tables, but for the most part they all have the same basic structure. They are driven by two types of actions encoded in the transitions of the automaton: shift and reduce actions.

If the grammar meets the criteria of the automaton construction method (for example LALR(1)) then the actions will be free of conflicts. However, if the grammar does not, which is always true for ambiguous grammars, then the automaton will contain shift-reduce or reduce-reduce conflicts. Some parser generators may choose one action by default and produce code that runs, but does not necessarily work as intended. Other generators may simply fail to proceed. Some allow one to manually resolve conflicts by assigning priorities to grammar constructs, or explicitly stating the correct action to take in the event of a particular conflict. The general methods, on the other hand, will emit a parse table regardless of conflicts. The problem is handled at run-time either by backtracking, or by concurrently pursuing multiple possible parses.
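The driver loop that interprets such a table is small. The Python sketch below uses hand-written tables for the tiny grammar S → ( S ) | x in place of generated LALR(1) tables; it illustrates only the generic shift/reduce machinery and is not the engine developed later in this thesis.

ACTION = {   # (state, lookahead) -> ('s', next_state) | ('r', rule) | ('acc',)
    (0, '('): ('s', 2), (0, 'x'): ('s', 3),
    (1, '$'): ('acc',),
    (2, '('): ('s', 2), (2, 'x'): ('s', 3),
    (3, ')'): ('r', 1), (3, '$'): ('r', 1),
    (4, ')'): ('s', 5),
    (5, ')'): ('r', 0), (5, '$'): ('r', 0),
}
GOTO = {(0, 'S'): 1, (2, 'S'): 4}
RULES = [('S', 3), ('S', 1)]                     # 0: S -> ( S )   1: S -> x

def parse(tokens):
    tokens = tokens + ['$']
    states, trees, i = [0], [], 0
    while True:
        act = ACTION.get((states[-1], tokens[i]))
        if act is None:
            raise SyntaxError('unexpected ' + repr(tokens[i]))
        if act[0] == 'acc':
            return trees[0]
        if act[0] == 's':                        # shift: consume one token
            trees.append(tokens[i]); states.append(act[1]); i += 1
        else:                                    # reduce by RULES[act[1]]
            lhs, length = RULES[act[1]]
            kids = trees[len(trees) - length:]
            del trees[-length:]; del states[-length:]
            trees.append((lhs, kids))
            states.append(GOTO[(states[-1], lhs)])

print(parse(list("((x))")))
# ('S', ['(', ('S', ['(', ('S', ['x']), ')']), ')'])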

In the next section we begin discussing the generalized parsing methods, starting with one of the earliest top-down methods, Unger’s Method.

2.3.2 Unger’s Method

The first parsing method we discuss is Unger’s Method [95]. It is a greedy, non-directional, top-down parsing method. It works by searching for partitionings of the input that match the right-hand side of production rules. When a plausible partitioning is found it recurses on the nonterminals and the corresponding matched text. If all nonterminals return a match then the partitioning succeeds. The parsing problem then becomes an issue of matching the entire input string against the root production.

For example, the input string a b a b has three partitionings (not including those with empty bins) when checked against the production S → AB b. The last of these succeeds. To verify that the production matches the string, the method must now recurse on the string a b a, checking all partitionings of it against all productions of AB.
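The core operation is easy to sketch in Python: enumerate the ways of splitting the input into one bin per right-hand-side symbol, and keep the partitionings whose terminal bins match. This is an illustration of that single step only, not a complete Unger parser.

from itertools import combinations

def partitions(tokens, nbins):
    # yield all ways to split 'tokens' into 'nbins' non-empty consecutive bins
    n = len(tokens)
    for cuts in combinations(range(1, n), nbins - 1):
        bounds = (0,) + cuts + (n,)
        yield [tokens[bounds[i]:bounds[i + 1]] for i in range(nbins)]

def plausible(rhs, part, terminals):
    # a partitioning is plausible if every terminal symbol matches its bin exactly
    return all(sym not in terminals or chunk == [sym]
               for sym, chunk in zip(rhs, part))

tokens = 'a b a b'.split()
rhs = ['AB', 'b']                        # production S -> AB b of Grammar 2.3
for part in partitions(tokens, len(rhs)):
    print(part, '*' if plausible(rhs, part, {'a', 'b'}) else '')
# [['a'], ['b', 'a', 'b']]
# [['a', 'b'], ['a', 'b']]
# [['a', 'b', 'a'], ['b']] *    <- only this one warrants recursing on AB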

For small examples this method works quite well. However, the amount of required processing time quickly grows for larger, more practical parsing problems. Consider the partitionings shown in Figure 2.3. The input string x + (3+4)*y - 2 has 165 partitionings when checked against the production E → E-T and it is the last one that succeeds as a possible match (indicated by an asterisk in the last column). The method then must recurse on the text matched to E. It is only 2 elements shorter and the check has 84 partitionings.

[Figure 2.3 table omitted: the original lists the partitionings of x + (3+4)*y - 2 against the expression productions, marking the plausible ones with an asterisk.]

Figure 2.3: The partitioning of an arithmetic expression for matching against an expression grammar in Unger’s method.

With this example it is easy to see that Unger’s method quickly becomes a very impractical parsing method.

To improve the situation Unger proposed a number of optimizations. These are in the form of checks that rule out partitions that can never match. An obvious optimization is to pursue a partition only when the terminals in the partition match the terminals in the production being compared against. It is interesting to note that Unger’s method can be expressed as a search problem. The search can be attempted in a depth-first manner or a breadth-first manner. Unger suggests depth-first. Improving performance can then be viewed as pruning the search space.

The description of the algorithm so far does not include support for productions that derive the empty string. Unfortunately, a naive implementation of Unger’s method has problems with such epsilon productions. The difficulty is that if a production derives the empty string, the entire string may be passed to a neighbouring production. If that neighbouring production recurses on the original production without ever shortening the string, an infinite exploration occurs. This is similar to what happens when a top-down method explores a left-recursive grammar, only the problem is more general. It can occur with other forms of recursion. The solution is to add in checks that detect infinite recursion and cut off the search. This is described in detail by Grune [40].

The following ambiguous grammar will serve as our running example through all the parsing methods. It does contain epsilon productions; however it does not give Unger’s Method any trouble. In future sections it will sometimes be expressed as right recursive to aid in the discussion of other top-down methods.

S → AB b | a A a AB
AB → AB a | AB b | ε
A → A a | ε          (Grammar 2.3)

The input string in the running example will be: a a b a. Though the grammar is ambiguous, there is only one parse of this input string. There are two benefits of employing generalized parsing methods. The first is that it allows us to work with ambiguous grammars. The second is that it allows us to parse strings that require infinite lookahead, irrespective of whether or not we are dealing with an ambiguous grammar. This input string illustrates the latter case. It is an example of the type of input that requires infinite lookahead to parse properly. To grammar programmers this represents freedom from parse table conflicts and is a huge gain whether or not they are dealing with an ambiguous language definition.

Figure 2.4 illustrates Unger’s Method using our example grammar and input string. Again, an asterisk in the last column is used to indicate a possible match of a partition. Following these matches, the algorithm then explores the nonterminals in the partition and the corresponding matched text to determine if the partitioning is a real match.


Figure 2.4: A visualization of Unger’s Method parsing a a b a using Grammar 2.3.

In the next section we discuss another historically significant algorithm, the CYK method. It is significant because it gives us a polynomial algorithm for parsing arbitrary context-free grammars. Where Unger’s Method has an exponential upper bound, CYK gives us a cubic upper bound.

2.3.3 CYK Method

The CYK parsing method is a bottom-up parsing method discovered independently by Cocke [22], Younger [112], and Kasami [55]. Grune [40] provides a modern treatment of CYK parsing. CYK parsers consider which nonterminals can be used to derive substrings of the input, beginning with shorter strings and moving up to longer strings.

The algorithm starts with strings of length one, matching the single characters in the input string against unit productions in the grammar. It then considers all substrings of length two, looking for productions with right-hand-side elements that match the two characters of the substring. This process continues up to longer strings. At each length of substring, it searches through all the levels below, that is, all the matches of the substrings of the substring.

The CYK algorithm assumes the grammar to be in Chomsky Normal Form (CNF), which simplifies things. What follows is Grammar 2.3 transformed into CNF.

S → AB BL | ALA ALAB | ALAL AB | ALA AL | AL AL | b
AB → AB AL | AB BL | a | b
A → A AL | a
ALAL → AL AL
ALA → AL A
ALAB → AL AB
AL → a
BL → b
(Grammar 2.4)

CNF restricts each production to either having exactly two nonterminal elements on its right-hand side, or to being a unit production deriving a terminal. Epsilon productions are not allowed, except when derived immediately from the start symbol. The unit productions are used in the first round of matching where substrings are of length one. The two-element productions are then used in all following rounds. Because each production has only two elements we need only consider partitionings of the substrings into two further substrings.

The complete CYK algorithm is given in Figure 2.5. As the algorithm proceeds, it enters values into a two-dimensional array. Each element in the array contains sets of nonterminals. These sets indicate which nonterminals can derive the substring of the input described by the location in the array. The column indicates the position of the substring in the input and the row indicates the length of the substring. In the example that follows in Figure 2.6, the shortest substring is positioned at the bottom row.

The body of the algorithm consists of three levels of loops. The outer loop iterates over the lengths of substrings. This takes us up in the array. The next loop iterates over offsets of substrings. This takes us across the array. The innermost loop iterates over the partitions of the substrings. Each substring is partitioned into two parts. The value used in the partitioning loop gives the length of the first half of the partition. Then within this innermost loop, the CYK algorithm considers all productions of the form N → T1 T2. If the current partitioning of the current substring has T1 in its first half and T2 in its second half then N is added to the set in array cell P [o, l].
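As a concrete illustration, this loop structure can be transcribed into Python as follows. This is our own sketch, assuming a CNF grammar supplied as separate lists of unit productions and two-symbol productions; the function and variable names are not from the thesis.

    def cyk(tokens, unit_rules, pair_rules, start):
        # unit_rules: (N, terminal) pairs; pair_rules: (N, T1, T2) triples.
        n = len(tokens)
        # P[o][l] is the set of nonterminals deriving the substring of length l
        # beginning at offset o (both indices are 1-based, as in Figure 2.5).
        P = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for o in range(1, n + 1):
            for N, t in unit_rules:
                if t == tokens[o - 1]:
                    P[o][1].add(N)
        for l in range(2, n + 1):              # length of the substring
            for o in range(1, n - l + 2):      # offset of the substring
                for p in range(1, l):          # length of the first half
                    for N, T1, T2 in pair_rules:
                        if T1 in P[o][p] and T2 in P[o + p][l - p]:
                            P[o][l].add(N)
        return start in P[1][n]

With the productions of Grammar 2.4 encoded this way (for example ("AL", "a") as a unit rule and ("S", "AB", "BL") as a pair rule), the call cyk(list("aaba"), unit_rules, pair_rules, "S") answers the membership question for the running example.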

The algorithm given in Figure 2.5 is the greedy, non-directional version. It starts by considering the entire input string at shorter substrings, working up to longer substrings.

It is possible to implement CYK as an online version, where instead of iterating over the columns beginning on the bottom row (the shorter substrings) the traversal runs diagonally beginning in the bottom left corner and going up to the left, progressively moving more to the right (and hence consuming input in an online manner).

We now begin to discuss Earley parsing. Earley parsing has stood the test of time. It is still in use today and modern implementations can be found. An advantage that Earley parsing has over CYK is that it does not require the grammar to be in CNF.

σ1...σn is the input string.
P[n, n] is an array of sets of nonterminal items. These are initially empty.

for each o = 1 .. n
    for every nonterminal N such that N -> σo
        add N to P[o, 1]

for l = 2 .. n
    for o = 1 .. n-l+1
        for p = 1 .. l-1
            for all N -> T1 T2 where T1 member of P[o, p] && T2 member of P[o+p, l-p]
                add N to P[o, l]

if P[1, n] contains the start symbol then parsing succeeds

Figure 2.5: The CYK parsing algorithm.


Figure 2.6: The CYK algorithm applied to Grammar 2.4 and the input string a a b a.

2.3.4 Earley Parsing

Earley parsing [29] is a bottom-up method that bears a strong resemblance to LR parsing.

Earley parsers are somewhat more dynamic, however. Rather than represent the state of the parser with a finite automaton and a stack, the entire state is represented by a list of sets of parser state items.

Earley parsers can also be thought of as a top-down method due to the fact that during parsing they consider which production right-hand sides to attempt to match by expanding nonterminals in the current item set. In a sense an Earley parser uses a top-down method to predict what should be matched in a bottom-up way. The Earley parsing process is very similar to the process of constructing an LR automaton and then running it, only the Earley parser constructs the automaton as states are needed, whereas an LR parser generator constructs the entire automaton ahead of time, then it is used to drive the parse.

Earley parsers work with what are called Earley items. An Earley item is a production augmented with a marker inserted at some point in the production’s right-hand side and a number to indicate where in the input the matching of the production began. The marker indicates how much of the production has been recognized already. To the right of the marker are the terminals and nonterminals that must still be recognized in order for the production to be accepted.

P → α • β @n

At run-time, an Earley parser creates a list of Earley item sets. Each set in the list corresponds to a character in the input stream. A movement from one set to the next represents a movement over an item in the input stream.

Earley item sets are constructed by applying three operations to the current list of Earley item sets. In the simple version of the algorithm, these operators will only ever need to modify the set at the head of the list, but will need to consult sets further back in the list when adding to the set at the head. We call the set at the head Sp and the next set back Sp−1. The first of these operators is the scanner. It is responsible for advancing the parser over literal tokens in the input. We refer to the token in the input stream that corresponds to the item set at the head of the list as σp. The scanner searches the items in Sp−1 and for any item with the dot immediately preceding the current token P → α•σpβ @n, we advance the marker over σp and add the item to Sp.

The second operator is the predictor. It searches Sp for items where the marker is immediately ahead of a nonterminal P → α•Nβ. The predictor then adds all productions N → •α @p to Sp. The predictor is responsible for setting the pointer p.

Where the scanner traverses the grammar from left to right, and the predictor traverses the grammar top-down, the completor is the operation that traverses the grammar bottom-up. For every item in Sp with the marker at the end, indicating the production has been fully matched, P → α • @n, the completor searches the item set Sn for items with the marker immediately preceding an instance of the nonterminal, Q → α•Pβ @m. It advances the marker over the nonterminal and adds the item to Sp.

Figure 2.7 shows the item sets an Earley parser would construct as it parses the string a a b a using Grammar 2.3. In the final set the item S’ → S• @0 is present, indicating a successful parse.
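These three operations translate almost directly into code. The following Python sketch is our own illustration (the item representation and names are assumptions, no lookahead is used, and no parse-tree links are kept); it runs the predictor and completor to a fixed point within each set and then scans into the next one.

    def earley(grammar, start, tokens):
        # grammar maps a nonterminal to a list of right-hand sides (tuples of symbols).
        # An Earley item is (lhs, rhs, dot, origin).
        n = len(tokens)
        sets = [set() for _ in range(n + 1)]
        sets[0].add(("S'", (start,), 0, 0))
        for p in range(n + 1):
            changed = True
            while changed:                                       # predictor/completor fixpoint
                changed = False
                for lhs, rhs, dot, origin in list(sets[p]):
                    if dot < len(rhs) and rhs[dot] in grammar:   # predictor
                        for alt in grammar[rhs[dot]]:
                            changed |= add(sets[p], (rhs[dot], alt, 0, p))
                    elif dot == len(rhs):                        # completor
                        for l2, r2, d2, o2 in list(sets[origin]):
                            if d2 < len(r2) and r2[d2] == lhs:
                                changed |= add(sets[p], (l2, r2, d2 + 1, o2))
            if p < n:                                            # scanner
                for lhs, rhs, dot, origin in sets[p]:
                    if dot < len(rhs) and rhs[dot] not in grammar and rhs[dot] == tokens[p]:
                        sets[p + 1].add((lhs, rhs, dot + 1, origin))
        return ("S'", (start,), 1, 0) in sets[n]

    def add(item_set, item):
        if item in item_set:
            return False
        item_set.add(item)
        return True

    # Grammar 2.3:
    g23 = {"S": [("AB", "b"), ("a", "A", "a", "AB")],
           "AB": [("AB", "a"), ("AB", "b"), ()],
           "A": [("A", "a"), ()]}
    print(earley(g23, "S", list("aaba")))   # True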

The version of Earley’s algorithm described thus far does not include any lookahead. The performance of Earley parsers can be improved by adding lookahead to the items. This reduces the number of items added by the completor. Lookahead requires that we be able to modify item sets further back from the head of the list and revisit the computation of those sets.

Like Unger’s Method and the CYK algorithm, the simple version of Earley’s algorithm is only a recognizer. It answers the question of whether or not a particular string is in the language. Further work must be done in order to produce a parse tree. Pointers can be maintained while items are added to keep track of which items derive which. These links can then be used to yield a traversal of the parse tree, with predictors going down, scanners going across and completors going up.

Earley parsing can be made fast with a number of compile-time optimizations. McLean and Horspool [67] fuse an Earley parser with an LR parser to obtain what is called LRE parsing.


Figure 2.7: A trace of an Earley parser as it parses the string a a b a using Grammar 2.3.

This uses the states of an LR table in place of the production and marker component of Earley items. The backpointer is retained as a dynamic component. Each state in the automaton represents a collection of dot items from an Earley parser. Using these LR-style tables eliminates the predictor component from the dynamic computation. Predictions are instead pre-computed as they are inherent in the tables. LRE does not use a single stack and can therefore still be used to parse ambiguous grammars.

Aycock and Horspool [6] give a technique for producing directly executable Earley parsers. Backpointers remain as dynamic entities. However, the actions to take on each state are static and can be generated as directly executable code rather than as table-based data and a driver. Earley items can then be represented as a pointer to a block of code that implements the action for the state, and a corresponding backpointer into the stream of input. Earley sets are encoded as lists of these pairs. This technique relies on the target language supporting first-class labels. That is, it must be possible to assign a label to a variable, then later jump to the destination pointed to by the variable using a goto statement, the same way a function pointer can be invoked.

Aycock and Horspool [5] give a further refinement of the LRE parsing method. It is observed that some states in the LR automaton constructed for Earley parsing will always appear together due to epsilon productions requiring immediate completion. The states that always appear together can be merged, producing a new type of automaton that yields very high performance for Earley parsers. The extra run-time work that epsilon productions cause for a standard Earley parser is eliminated by enhancing the predictor with an instantaneous complete for epsilon productions. With this new type of automaton as much work as possible is moved to the static computation. This work can therefore be considered the last word on improving the speed of Earley parsers.

We now turn our focus to generalized top-down parsing and its many variants and associated implementations.

2.3.5 Generalized Top-Down

The generalized top-down parsing method consists of a recursive interpretation of a grammar coupled with some method for handling nondeterminism. The most common approach is to search the possible parses using backtracking. This yields a single parse. It is also possible to implement the list-of-successes approach, which yields all parses.

Generalized Top-Down with Full Backtracking

Generalized top-down parsing with full backtracking is a very flexible parsing method because it makes it easy to control the parsing of ambiguous grammars. When provisions are made for handling left recursion a wide range of parsing tasks can be implemented. The parsing algorithm used in TXL is a generalized top-down parsing method with full backtracking and support for left-recursion. The success of TXL shows that this method is very useful in language design and software renovation tasks.

In this parsing method there are two modes. The parser will attempt to parse in the forward direction until a parse error occurs. At this point it will locate an alternative decision to make and begin to unparse until it reaches that choice point. It will then resume forward parsing.

Forward parsing consists of recursively interpreting the grammar in the manner described in Section 2.3.1. When a nonterminal is found, the first associated production is fetched and the parser begins to move over its right-hand-side elements. This portion of the algorithm is somewhat straightforward and easy to implement. The most important part of the algorithm is in deciding what to do when an error occurs and it is time to backtrack.

A proper implementation will yield an ordered choice interpretation of the grammar.

When it is time to backtrack the parser will search the current parse tree for the rightmost and innermost choice point with respect to the parse tree obtained thus far. The parser must then undo parsing until the choice point is at the head of the search space. It then chooses the next alternative and resumes forward parsing.
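The ordered choice behaviour is easiest to see in code. The following Python sketch is our own illustration (it does not handle left recursion, which is why the right-recursive form of the running example, Grammar 2.5 below, is used). It interprets a grammar recursively and uses generators so that alternatives are explored lazily, in the order they are written, with backtracking happening implicitly whenever a caller asks for the next possibility.

    def parse_symbol(grammar, sym, s, pos):
        # Yield (tree, next_pos) for every way sym can match s starting at pos,
        # in ordered-choice order.
        if sym not in grammar:                       # terminal
            if pos < len(s) and s[pos] == sym:
                yield sym, pos + 1
            return
        for rhs in grammar[sym]:                     # alternatives tried in order
            for children, end in parse_seq(grammar, rhs, s, pos):
                yield (sym, children), end

    def parse_seq(grammar, rhs, s, pos):
        if not rhs:
            yield [], pos
            return
        for first, mid in parse_symbol(grammar, rhs[0], s, pos):
            for rest, end in parse_seq(grammar, rhs[1:], s, mid):
                yield [first] + rest, end

    def parse(grammar, start, s):
        # Return the first full parse found, i.e. the ordered-choice result.
        for tree, end in parse_symbol(grammar, start, s, 0):
            if end == len(s):
                return tree
        return None

    # Grammar 2.5 (right-recursive form of the running example):
    g25 = {"S": [("AB", "b"), ("a", "A", "a", "AB")],
           "AB": [("a", "AB"), ("b", "AB"), ()],
           "A": [("a", "A"), ()]}
    print(parse(g25, "S", "aaba") is not None)   # True: the single parse is found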

Grammar 2.5 shows our running example grammar in right-recursive form. Figure 2.8 shows how a generalized top-down parser would proceed to parse the string a a b a. It first exhausts all the possibilities in the first production of S then afterwards attempts the second production of S.


Figure 2.8: Generalized recursive-descent parsing of the string a a b a, according to Grammar 2.5.

S → AB b | a A a AB
AB → a AB | b AB | ε
A → a A | ε
(Grammar 2.5)

A key advantage of this method is that the ordered choice interpretation of the grammar puts the user in control of the parsing strategy when the grammar is ambiguous. The preferred order in which to attempt to parse mutually ambiguous alternatives can be specified locally. This is useful for grammar composition and specialization tasks in source code analysis [27]. The innermost backtracking strategy makes it easy for the user to predict the result of the parse. The primary disadvantage of this parsing approach is that it can result in very long parse times. Full backtracking often induces redundant reparsing when grammar alternatives contain common prefixes [18].

Definite Clause Grammars

A definite clause grammar (DCG) [78] is a syntactic shorthand for producing parsers with Prolog clauses. Prolog-based parsing is a very expressive parsing technique that can be considered a generalized top-down parsing method. In the DCG paradigm, the Prolog parsing clauses represent the input with difference lists: two lists with the first containing the input to parse (a suffix of the entire input string) and the second containing the string remaining after a successful parse. Removing the suffix indicated by the second list from the first list yields the tokens that a clause successfully parsed. These two lists correspond to the input and output variables of the clauses. Each clause corresponds to a nonterminal in the grammar. Prolog’s backtracking guarantees that all possible grammar derivations are tested. Prolog clauses may be embedded in grammar productions acting as syntactic or semantic predicates.

Combinatorial Parsing and the List of Successes

Combinatorial parsing is a method of producing generalized recursive-descent parsers in functional languages. The idea was originally discussed by Burge [19] as an interesting example of a functional programming application. The method employs functions in a recursive-descent style and gets its name from the fact that function combinators are used to compose parsers. The two basic function combinators are the sequencing operator and the choice operator.

The list of successes method from Wadler [110] is used. Each parsing function, again corresponding to a nonterminal in the grammar, returns a list of pairs. Each pair represents a successful parse. It consists of a parse tree and a pointer into the input indicating where the remaining input begins. Applied in a recursive manner, this returns all possible parses without any explicit backtracking. If done in a lazy functional language, and at the end of the parse the head of the list is requested, then the lazy functional language will take a backtracking approach to evaluating the first parse on-demand. Combinatorial parsing closely resembles generalized top-down parsing with full backtracking.
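In a strict language the list of successes can be built eagerly. The following tiny Python sketch (with invented names, not taken from any library) shows the two basic combinators and a parser for the language a*b; in a lazy language only the head of the returned list would actually be computed.

    def lit(t):
        # Parser for a single literal token t.
        return lambda s, i: [(t, i + 1)] if i < len(s) and s[i] == t else []

    def seq(p, q):
        # Sequencing combinator: run p, then q on each remainder.
        return lambda s, i: [((a, b), k) for a, j in p(s, i) for b, k in q(s, j)]

    def alt(p, q):
        # Choice combinator: all successes of p followed by all successes of q.
        return lambda s, i: p(s, i) + q(s, i)

    # A parser for a*b, returning every (tree, next position) pair.
    def a_star_b(s, i):
        return alt(lit("b"), seq(lit("a"), a_star_b))(s, i)

    print(a_star_b("aab", 0))   # [(('a', ('a', 'b')), 3)]: the single success, ending at 3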

Hutton [43] gives an accessible description of the method. It is shown that combinatorial parsing is a very powerful method because parsing is tightly integrated into the host functional language and it is therefore easy to extend parsing with arbitrary program logic. This is a feature that combinatorial parsing shares with Prolog-based parsing.

Combinatorial grammars have also been integrated with other backend parsing methods. The method described above has exponential behaviour and in some applications this is crippling. Vijay-Shanker and Weir [104] give a polynomial-time parsing algorithm for combinatorial grammars based on the CYK parsing algorithm. As a reminder, CYK parsers ask the question: which nonterminals accept this substring? They answer it by considering which nonterminals accept the substrings of the substring. This connection between functional-style parsing and a bottom-up parsing method illustrates an interesting point: bottom-up methods can be considered top-down methods with the recursion statically eliminated.

Getting back to the list of successes approach, it is interesting to note that it has also been applied outside of a functional programming setting. In the GRDP method described by Johnstone [47], all possible parses are returned by an invocation of a parsing function. This eliminates the need to backtrack. Johnstone cites this method as appropriate for prototyping parsers. He supports the argument that generalized parsing methods are ideal for the parsing of computer languages. However, in practical terms work is needed to improve performance. When it comes time to produce a production parser, the GRDP algorithm can be optimized by taking advantage of a property of some grammars called follow-determinism. When a grammar exhibits this property, improved performance is attained by a pruning of the search space. However, it is at the expense of generality.

Recursive Descent with Memoization

Packrat parsing [35] attempts to solve the problem of redundant reparsing that arises in generalized top-down parsers with full backtracking. Packrat parsing can be classified as generalized top-down parsing with memoization and ordered choice in a lazy functional language setting.

As the algorithm parses in the forward direction, each time a nonterminal is successfully parsed the parse tree is recorded along with the position in the input where the parse occurred. Should the backtracking system be invoked and the question of whether or not nonterminal N can be parsed at position p be asked again, then the answer is readily available and reparsing is avoided.
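The memo table is keyed by a (nonterminal, position) pair. A minimal Python sketch of the idea follows (our own illustration; real Packrat parsers are formulated over PEGs and handle far more, and left recursion is not supported, as with PEGs generally).

    def packrat(grammar, start, s):
        memo = {}
        def parse_nt(nt, pos):
            # Ordered choice over nt's alternatives, memoized on (nt, pos).
            key = (nt, pos)
            if key in memo:
                return memo[key]              # reuse the earlier answer; no reparse
            result = None
            for rhs in grammar[nt]:
                end = parse_seq(rhs, pos)
                if end is not None:
                    result = end
                    break                     # ordered choice: first success wins
            memo[key] = result
            return result
        def parse_seq(rhs, pos):
            for sym in rhs:
                if sym in grammar:
                    pos = parse_nt(sym, pos)
                    if pos is None:
                        return None
                elif pos < len(s) and s[pos] == sym:
                    pos += 1
                else:
                    return None
            return pos
        return parse_nt(start, 0) == len(s)

Note that each (nonterminal, position) pair stores at most one outcome, which is exactly the single-result, ordered choice behaviour discussed next.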

In the Packrat literature there are claims that the underlying formalism on which Packrat parsing is based, the parsing expression grammar (PEG) [36], is a distinct formalism for representing languages. The PEG is, however, very similar to a sugared CFG with an ordered choice interpretation strategy. Since applying ordered choice to yield a single parse instead of acquiring all parses is a property of the parsing algorithm and not of the language formalism, it is difficult to agree that PEGs are a new kind of formalism.

The Packrat parsing literature also claims that linear time parsing is achieved for any grammar. This is quite a strong claim, and it relies on large amounts of memory usage. The memory usage is so large in fact that it can quickly defeat the speed claim in small practical cases. For example, a 40 KB Java source file requires upwards of 20 MB of heap space to parse. Experimental research suggests that memoization actually hurts measured performance [12].

ANTLR

ANTLR [77] is a parsing and rewriting framework based on top-down parsing technology. It is a notable tool because it has a long history and a large user base. Like other top-down tools, it also focuses on managing the tradeoff between parsing power and performance in generalized top-down parsing. Parsers generated by ANTLR normally have the LL(k) property, with k > 1. An upper bound on the amount of lookahead can be explicitly set or k can be allowed to roam. When k roams this is referred to as LL(*) parsing. This allows many grammars that arise in practice to be parsed and is often sufficient. For cases in which it is not, ANTLR is able to revert to a generalized top-down method. This is a full backtracking mode with memoization.

This concludes our discussion of generalized top-down parsing. We now shift our attention to bottom-up parsing that is generalized due to backtracking.

2.3.6 Backtracking LR

Backtracking LR parsing is a bottom-up method that extends deterministic linear time LR-based parsers with the ability to handle any context-free grammar. While a deterministic LR parser will encounter conflicting shift and reduce actions and emit an error when a grammar does not meet the LR criterion, a backtracking LR parser will generate a functioning parser regardless. Upon encountering a conflict, the run-time system will try an action, remember the choice that was made, and continue parsing in a single thread. Later on, should a parse error be encountered it will undo its parsing up to the most recent choice point, then try the next possibility. Such a strategy will eventually exhaust all possible parses.

While a standard top-down parser with full backtracking will revert to the rightmost, innermost choice point with respect to the parse tree acquired so far, a backtracking LR parser will revert to the rightmost, topmost choice point with respect to the LR stack. This algorithm on its own cannot guarantee an ordered choice as is the case with generalized top-down. In Chapter 3 we present a solution to this problem.

The primary advantage of backtracking LR parsers is that they retain the speed of deterministic LR parsers in places where the grammar is deterministic. Furthermore, if backtracking can be kept to a minimum, the cost of processing nondeterministic grammars


Figure 2.9: LALR(1) parse tables generated for Grammar 2.3. These are used in our backtracking LR and GLR parsing examples.

need not be prohibitive. Merrill [69] describes changes made to Yacc to support backtracking for the purpose of parsing C++. Power and Malloy [79] also suggest using a backtracking LR parser for the initial deployment of a C++ grammar due to the language’s incompatibility with standard LR parsing.

Figure 2.9 shows the LALR(1) parse tables generated for Grammar 2.3. There are two conflict points. Figure 2.10 shows the behaviour of the backtracking LR parser on the string a a b a. It first pursues the possibility of matching the first production of S, eventually fails, then backs up to the beginning of the string and moves forward with the second production of S, which it accepts.


Figure 2.10: The complete parsing of the string a a b a, according to Grammar 2.3 using a backtracking LR approach.

In the next section we address the final parsing method covered in this chapter, generalized LR parsing. This large and interesting topic has been the subject of a considerable amount of research in both natural language processing and computer language processing.

2.3.7 Generalized LR

Like backtracking LR parser generators, a generalized LR (GLR) parser generator will accept any grammar and will always emit a working parser. The generated parser will handle conflicts at run-time by forking its state and pursuing all parses in lockstep. Since parsing in lockstep requires multiple stack instances, much research has gone into managing a shared stack that conserves space and computation time, making the approach much more practical.

The underpinnings of generalized bottom-up parsing were given by Lang [60]. Theory was developed, though it was not shown to work in a practical sense. Tomita introduced GLR as we know it today [94]. In this work the concept of a graph-structured stack (GSS) was described and an algorithm that can be implemented with a reasonable amount of effort was given.

The critical innovation in GLR parsing is the GSS. Without it, the forking of parsing threads leads to exponential behaviour. The GSS is constructed and managed at run-time.

As parsing proceeds, parse table conflicts induce splits in the stack. This causes the stack to branch out into a tree form. Each unique path that starts at the head of the stack and leads back to the root constitutes a unique stack. Later on, paths may converge to the same LR state. When the paths converge to the same state, they can be physically merged at their head. This reduces the number of unique stack heads, even though distinct paths through the GSS still exist and therefore distinct parse threads still exist.
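A sketch of the data structure helps to fix ideas. The following Python fragment is our own illustration (names and representation are assumptions, and the driver that consults the parse tables is omitted); it shows how heads are merged by state and how a reduction must enumerate every distinct path beneath a merged head.

    class GssNode:
        # A graph-structured stack node: an LR state plus links to the nodes
        # beneath it.  Distinct stacks share structure by sharing predecessors.
        def __init__(self, state, preds=()):
            self.state = state
            self.preds = list(preds)        # edges toward the stack bottom

    def push(heads, state, pred):
        # Push `state` on top of `pred`.  If a head with the same state already
        # exists, merge: add another edge instead of creating a new head.
        for node in heads:
            if node.state == state:
                node.preds.append(pred)     # physical merge of stack tops
                return node
        node = GssNode(state, [pred])
        heads.append(node)
        return node

    def pop_paths(node, length):
        # Enumerate the nodes reachable by popping `length` elements below `node`;
        # a reduction is performed once per distinct path, which is how a merge
        # effectively gets undone.
        if length == 0:
            return [node]
        return [n for p in node.preds for n in pop_paths(p, length - 1)]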

Merging reduces the number of live stack heads and therefore improves performance, though the gain may be temporary. Merging sometimes needs to be undone. This is because each reduction in LR parsing requires that a number of elements be popped from the stack.


Figure 2.11: Visualization of a GLR parser parsing a a b a using Grammar 2.3.

If the elements popped include a merge point, then the stacks must first be split and the reduction performed once for each distinct sequence of elements that need to be reduced.

For our GLR parsing example we again use the LALR(1) parsing tables in Figure 2.9, which are built for Grammar 2.3. Figure 2.11 shows the behaviour of a GLR parser on the string a a b a. In this example a number of operations are combined into one. This helps to illustrate the lockstep nature of the algorithm. Both the state numbers and language elements are shown in each stack. The stack is initially split at the first shift/reduce conflict on the first character a. It is merged once when both stacks arrive in state 8. However, the merge must be separated in the following step when AB is reduced.

Tomita parsers can fail to terminate on grammars that contain epsilon productions. Hidden left-recursion and hidden right-recursion both cause Tomita’s algorithm problems, which is why Tomita specified his parser as only for epsilon-free grammars. This situation was improved by Farshi [74], who provided an algorithm that works for all context-free grammars. Nederhof [73] also solves the problem of hidden recursion by removing epsilon productions as the parser progresses. Two alternate and faster solutions are provided by Scott et al., called Right Nulled GLR (RNGLR) [89] and Binary Right Nulled GLR (BRNGLR) [90] parsing.

GLR parsers have special requirements when parse trees need to be acquired. Since GLR parsers acquire all possible parses it is useful to be able to build a parse tree forest (consisting of all the possible parses) in a manner that uses as little memory as possible. Billot and Lang [13] show that there is a shared forest structure with at most cubic size for any parse of a context-free grammar. This result was applied by Rekers [82] to Farshi’s algorithm to yield what is known as the shared packed parse forest (SPPF). This is the method used by the parsing and rewriting languages ASF+SDF and Stratego.

Recently there has been interest in improving the performance of GLR parsing by attempting to reduce the amount of stack activity, since this is where much of the parsing time goes. This was first explored by Aycock and Horspool [4], where the approach is to ‘flatten’ a context-free language into a regular language, to the extent permitted by the amount of true recursion inherent in the grammar. True recursion refers to recursion that cannot be rewritten as iteration. Finding the minimal number of locations in a grammar where use of the stack is unavoidable is an NP-hard problem and thus heuristics are used. The disadvantage of this parsing method is the impractical state table sizes produced by interpreting large portions of a context-free grammar as a regular language. Despite this impracticality, a number of research papers have explored the concept [8, 87, 48, 49], with the most recent by Scott [88] resulting in a completely general algorithm that applies to all grammars. Speedups are achieved (a factor of 2-10 depending on the source); however table sizes are measured in the millions of rows as opposed to the hundreds of rows for standard LALR(1)-based GLR parsing. Practical applications of this method are not found.

In the GLR field, there has been recent work on making empirical comparisons of GLR algorithms. The Grammar Toolbox [51, 53, 52] has been applied to the problem of pitting the many variants of GLR parsing against one another.

This concludes our description of the generalized parsing methods. In the next section we begin our critical analysis of the methods. Our ultimate goal is the design of a new transformation system; therefore our analysis is from the viewpoint of a system designer in search of an appropriate general method.

2.3.8 Limitations on Grammars Imposed by the Algorithms

Some of the methods discussed in this chapter are not completely general, but are instead near-general. In this section we discuss the limitations imposed by the parsing method on the grammars.

A naive implementation of Unger’s method does not deal well with epsilon productions.

Some grammars induce infinite recursion. The problem can be solved by adding a data structure that records which partitions are currently under consideration. An explicit check of this data structure will detect when fruitless infinite exploration of the grammar is about to happen and the search can be cut off. With this enhancement Unger’s Method is able to handle all grammars and is therefore completely general.

The CYK algorithm requires that grammars be in Chomsky Normal Form. Any grammar in CNF is supported and there is a transformation from any context-free grammar into CNF. Therefore the CYK algorithm is said to be completely general. The user must be prepared, however, to deal with a parse tree returned in CNF form, or must map it back to the original grammar.

The Earley algorithm is a completely general parsing algorithm that works with any grammar in its natural form. Epsilon productions only cause problems of efficiency, not of termination. Recent work on Earley’s algorithm [5] addresses the issue of efficiency caused by allowing epsilon productions.

Generalized top-down methods do not support left-recursive grammars. In practice this is either handled by avoiding such grammars and opting for right-recursive versions, by grammar transformations that eliminate left-recursion, or by ad hoc modifications to the parsing algorithm to support left recursion. A more serious problem with generalized top-down methods is that some grammar forms cause exponential parsing behaviour. In practice these grammars must be avoided, otherwise parse times will be unacceptable.

The backtracking LR approach cannot handle hidden left recursion. In practice this is not normally a problem. Hidden left recursion can usually be factored out of a grammar in a minimally disruptive way. There is nothing explicitly desirable about handling hidden left recursion, other than handling it when it inadvertently arises. As is the case with generalized top-down methods, some grammar forms cause exponential parsing behaviour.

Again, it is up to the user to avoid these grammars.

The original GLR algorithm given by Tomita exhibits problems with hidden left and hidden right recursion. The variations of Tomita’s algorithm given by Farshi [74], Nederhof [73] and Scott [89] are all completely general. There are no restrictions on the grammars.

While using a generalized parsing algorithm allows us to cope with ambiguous grammars and we consider this to be a good thing, there is a downside to this situation. Sometimes we wish to design a grammar free from ambiguities and the deterministic parsing methods imply a conservative ambiguity detection. By using a generalized method we give up that ability. Schmitz [85] argues that we should explicitly add ambiguity detection back to our generalized parsing methods. Ambiguity detection is an undecidable problem [41]. However, a conservative approximation that returns false positives but no false negatives is possible [86].

2.3.9 Time Complexities

The asymptotically fastest general algorithm is the Valiant parsing technique [96]. It achieves a worst case of O(n^2.376) by applying Boolean matrix multiplication to the parsing problem. Unfortunately, there is a large constant factor that is quite prohibitive. This fact is characteristic of analyzing the speed of parsing algorithms. Though a particular algorithm may be worse than another asymptotically, on real grammars it may perform much better. Though both Earley parsing and GLR parsing have the same asymptotic

Method                  Upper Bound
Unger                   O(d^n)
CYK                     O(n^3)
Earley                  O(n^3)
Generalized Top-Down    O(d^n)
Packrat                 O(n) + O(M)
Backtracking LR         O(d^n)
GLR Tomita              O(n^3)
GLR Farshi/RNGLR        O(n^(p+1))
BRNGLR                  O(n^3)

Figure 2.12: The asymptotic time complexities of the generalized parsing algorithms. The variable n is the length of the input, p is the length of the longest right-hand side, d is some value > 1, and M is the cost of managing n × |P| cached parse trees, with |P| giving the number of productions.

performance, when deciding between the two it is advisable to use GLR to parse grammars with low ambiguity and Earley to parse grammars with high ambiguity [102].

Asymptotically, backtracking parsers can exhibit exponential behaviour. This is the primary reason that the authors supporting GLR recommend avoiding backtracking. Johnstone [50] shows that it is easy to write a grammar that exhibits exponential behaviour when given to a backtracking LR parser generator. Users must themselves guard against producing poorly performing grammars. In practice this is viable both with generalized top-down parsing and backtracking LR. Also, Prolog-style cuts can often be used to improve performance by pruning alternative parses before they are attempted. With small amounts of localized ambiguity and careful grammar construction, backtracking parsers can perform very well. The table in Figure 2.12 gives the asymptotic behaviour of most of the parsing methods described so far.

When making practical speed comparisons among parsers there are many factors to consider, the most significant being the nature of the grammar. Others include engineering requirements such as whether or not a parse tree must be built. A grammar with productions that have few alternatives and a lot of right recursion can be parsed very quickly using a generalized top-down method. If the number of alternatives increases and more common prefixes are introduced, LR-based parsers behave better. When ambiguity is low and backtracking is kept to a minimum, a backtracking LR parser is a suitable choice. As ambiguity increases a GLR parser will do better due to the lower asymptotic bound. As ambiguity increases further, Earley parsers outperform GLR parsers.

2.3.10 Controlling the Parse of Ambiguous Grammars

Generalized parsers provide the user with an ability to specify ambiguous grammars. When this facility is available, the user may specify any grammar desired without being concerned about the dreaded shift-reduce conflict or implementing some lookahead trick. However, it introduces a new problem, that of deciding which possible parse to accept when a string has several possible parses. Simply accepting them all is one way to approach the problem.

However, it is not a terribly useful approach considering that yielding a single parse is the usual requirement of programming language processors.

In generalized top-down parsing methods with full backtracking a single parse is always produced and predicting the outcome of the parse is easy for the user. Given any two mutually ambiguous productions, the first will be returned if a parse exists, otherwise the second will be returned. The user controls the parse by adjusting the order of productions. Though Unger’s Method is not applied in practice and the issue of controlling the parse is not discussed in the literature surveyed, it too could be made to implement an ordered choice strategy.

A backtracking LR parser will also always yield a single parse tree; however there is no grammar-based facility for controlling the parse. We address this problem in the following chapter.

The effects of ordered choice can be achieved in a GLR setting. The method depends on the GLR implementation. In the case of tools like Stratego, which produce a complete parse forest, this can be done by post-processing of the returned parse forest. This has performance implications. Acquiring all parses only to discard all but one is less than optimal.

The Elkhound GLR parser generator [68] does not build a parse forest by default. Instead, it executes semantic actions that the user can optionally use to build a parse tree. It also invokes callback functions that can be used to resolve ambiguities. The merge function, which is written by the user, is executed in the case of nonterminals that have more than one parse. This can be used to choose one production over another. Unlike previous GLR systems, Elkhound guarantees a bottom-up execution order for semantic actions and merges, allowing the callback technique to work properly.

CYK and Earley parsing both require that a parse tree be constructed following a successful parse. As in GLR parsers, the algorithm that traverses the data structures built by the recognizer can be made to choose the parse tree that an ordered choice strategy would have produced.

2.3.11 Error Recovery

Related to the problem of controlling the parse is that of recovering from errors to resume parsing. Since generalized parsers will accept any grammar, a particular kind of error handling becomes very convenient. If we build into the grammar productions that consume input, with the intent that they will be applied when no other production can be, then we have an error recovery technique that fits naturally into the parsing algorithm.

This approach works well for generalized parsers based on backtracking with ordered choice. We can define productions that accept any token except the one which defines where error recovery is to stop. With the ordered choice guarantee in place we know that the error production will be accepted only when all others fail. The following error handler consumes input until the input stream and parser go back into a stable state where correct parsing may resume.

statement → for_block
          | while_block
          | ...
          | not_semi_list ;

This technique does not work very well with parsing methods that attempt to acquire all possible parses simultaneously, because error productions will be needlessly accepted at every statement (or other unit of parse on which error productions are defined). These parse trees will be constantly acquired only to be later discarded. In the case of non-backtracking methods, error recovery is a technique that must be implemented outside the parsing algorithm when the last potential parse dies off. Many approaches are possible, and some appropriate external method can be implemented. An example is the method by Corchuelo [24] for LR parsers.

2.3.12 Summary

In Figure 2.13 we give a summary of the parsing methods discussed in Section 2.3. The methods are clustered into the major algorithm categories and correspond to the topics in Sections 2.3.2 through 2.3.7.

Each row of the table summarizes a characteristic of the methods. First the top-down/bottom-up classification is given. The next row indicates which methods are directional. Non-directional methods require the entire input to be in memory at once. Directional methods scan through the input, usually left to right. The next row indicates which methods are completely general and the one following that indicates which methods can be applied to a grammar in its natural form. These topics are discussed in Section 2.3.8. The run-time complexities of the algorithms are given on the following row. These are discussed in Section 2.3.9.

[Summary table not reproduced. Its columns are Unger’s Method, CYK, Earley, Gen Top-Down, Backtracking LR, and Generalized LR; its rows cover strategy (top-down/bottom-up), directionality, complete generality, whether the grammar can be used unchanged, time complexity, tree construction, run-time space, facilities for controlling the parse, error recovery, and modern applications. Footnotes: ∗ left-recursion, † hidden left-recursion, ‡ building shared packed parse forest (SPPF).]

Figure 2.13: Summary of algorithm characteristics.

The next row indicates whether or not the algorithms implicitly construct a parse tree. In the case of GLR, the SPPF fulfills this purpose, though it is optional. In the next row we give the run-time space requirements of the algorithms. In the case of GLR, this refers to the space requirements of the SPPF. Next the facilities for controlling the parse and recovering from errors are given. These are described in Sections 2.3.10 and 2.3.11. Finally, the last row indicates if modern applications of the methods can be found.

2.4 Context-Dependent Parsing

Source transformation systems employ generalized context-free parsers to allow grammars to be easily composed. If computer languages encountered in practice were within the boundaries of the context-free languages, then the story would end here. We would select a generalized method that appeals to us, possibly GLR, and build our transformation system around it.

Unfortunately, many computer languages used in practice are not context-free. The reality is that languages are implemented with hand-written parsers, or with grammar-based parsers that have been augmented with hand-written code. The language features enabled by these ad hoc parsing methods creep into the design of languages. The result is that many languages we use today require the ad hoc techniques in order to be parsed properly, leaving the pure context-free parser unable to handle real languages in practice.

2.4.1 Global State

Parsing common programming languages such as C, Java, C#, and C++ requires that our parsing method support context dependencies. For example, we often need to look up the meaning of an identifier in a symbol table before we can proceed to parse it. This is handled in deterministic parsing settings by using lexical feedback. While the input is parsed, semantic actions build and maintain a symbol table that is read by the scanner and used to alter the types of tokens.

This feedback model can be generalized to other kinds of context dependencies. The idea is for the semantic actions to feed information back to the earlier lexical stage by way of maintaining global data structures. The earlier stage of the parse uses the information it finds in global data structures to manipulate the tokens sent to the parser.
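A classic instance is the typedef problem in C, where the scanner must know which identifiers have been declared as type names. A minimal Python sketch of the feedback loop (our own illustration; the token names are invented for the example):

    # Global state shared between the semantic actions and the scanner.
    symbol_table = set()          # names declared as types so far

    def classify(identifier):
        # Scanner-side half of the feedback loop: the token type of an
        # identifier depends on what earlier semantic actions recorded.
        return "TYPE_NAME" if identifier in symbol_table else "IDENTIFIER"

    def on_typedef_reduced(declared_name):
        # Parser-side half: a semantic action run when a typedef declaration
        # is reduced feeds the new name back to the scanner.
        symbol_table.add(declared_name)

    # After parsing "typedef unsigned long size_t;" the action fires ...
    on_typedef_reduced("size_t")
    # ... and the very next occurrence of the identifier is scanned differently.
    print(classify("size_t"))     # TYPE_NAME
    print(classify("counter"))    # IDENTIFIER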

2.4.2 Nondeterminism and the Global State

Since transformation systems employ generalized parsers, we must consider the issue of access to the global state in the presence of nondeterminism. The central question is: can we have a global state that we use for lexical feedback or other purposes, and at the same time gain all the benefits of a generalized parser?

Reading from a single global state in the presence of nondeterminism is not a problem. This generalizes to the problem of supporting concurrent readers. If we can periodically (perhaps on statement boundaries) reduce to a single possible parse and restrict modifications to the global state to these times, then we don’t have a problem. GLR parsing systems can be configured to do this. However, as is the case in concurrent and distributed algorithms, the real difficulty arises when we need to write to the global state in the presence of nondeterminism. This is a problem that we find arises in practice and for which there is no solution, short of resorting to hand-written parsers.

C++ is an example of a commonly used programming language that is very difficult to parse with grammar-based tools. This is because C++ has an ambiguous grammar definition and relies on a context-dependent interpretation of identifiers. This situation creates a need to manipulate global information during the parse of ambiguous constructs.

The following example illustrates one area of the language where this is true.

    namespace ns { struct A { }; };

    template <class T> struct C
    {
        struct D {};
        static const int i;
    };

    template <class T> const int C<T>::i = 1;

In this program fragment two classes are defined. Class C is a template type that provides some generic code. Class A will be used as an argument type. Now consider the following statement.

    int j = sizeof( int( C<ns::A>::i ) );

This C++ code sets the variable j to the size of an expression. The expression is an anonymous integer initialized using the static integer variable i. Assigning the integer the value in i is not necessary. However, according to the C++ standard this is legal code and it must be accepted regardless. Now consider this next statement.

    int i = sizeof( int( C<ns::A>::D ) );

This is the size of an anonymous function declaration, not the size of an expression. Anonymous functions are indeed types in the C++ language. The standard states that, in the case of this ambiguity, any program text that can be a type (determined by looking up identifiers) is a type, otherwise it is an expression. Therefore this program text must be parsed as the size of a type. These two code blocks are covered by the following ambiguous productions in the C++ grammar.

unary_expression → ’sizeof’ ( type_id )
                 | ’sizeof’ unary_expression

Before successfully parsing either of these statements the parser must determine the type of language object that the identifiers D and i represent. This is where the difficulty arises. In order to do this we must parse C<ns::A>:: and store the class it resolves to in some global data structure that the scanner can reference when it looks up D and i. Now we have to accommodate ambiguities and the manipulation of global data structures.

The problem only gets worse the deeper one goes into the language. Since template arguments may be arbitrary value expressions, and templates may be overloaded with the appropriate template chosen by the compiler using the list of type arguments (called template specializations), we need a reasonable implementation of the type system before a correct parse tree can be acquired.

2.4.3 Requirements

There are three requirements that a generalized parsing method must meet in order to effectively utilize the feedback mechanism for parsing context-dependent languages.

• Semantic actions must be executed immediately following a reduction. When a token is passed to the parser, any reductions caused by the token should also induce semantic actions. This allows updates to the global state to take effect immediately.

• Distinct parses must be able to manipulate distinct instances of the global data. Different interpretations of an input string must not be forced to utilize the same global data instance for posting feedback data.

• The tokenization routines must be aware of the nondeterministic nature of the global data structure and manipulate tokens accordingly. For example, if two parses are pursued simultaneously and each has led to a distinct global data structure value, then the subsequent token manipulations must be performed twice, once using the first value as a source and again using the second.

In a backtracking setting, meeting the above three requirements is natural, given the assumption that it is possible to revert changes made to the global state when backtracking.

Since backtracking parsers are always pursuing a single parse at any given time, all that is needed is a single state that can be updated and reverted as the parser goes forward and backward.
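One way to satisfy that assumption is to journal an inverse operation for every change, and to roll the journal back when the parser retreats past a choice point. A small Python sketch (ours, illustrative) of such an undo log for a symbol-table change:

    class UndoLog:
        # Records an inverse operation for every change to the global state so
        # that the state can be rolled back when the parser backtracks.
        def __init__(self):
            self.undo_actions = []
        def mark(self):
            return len(self.undo_actions)          # a choice point
        def record(self, undo_fn):
            self.undo_actions.append(undo_fn)
        def revert_to(self, mark):
            while len(self.undo_actions) > mark:
                self.undo_actions.pop()()          # run inverse ops, newest first

    symbols = set()
    log = UndoLog()

    def declare(name):                              # a forward semantic action
        symbols.add(name)
        log.record(lambda: symbols.discard(name))   # its undo action

    cp = log.mark()          # parser records a choice point
    declare("T")             # speculative parse declares a type
    assert "T" in symbols
    log.revert_to(cp)        # parse error: backtrack past the action
    assert "T" not in symbols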

In parsing methods that simultaneously pursue multiple parses, meeting these three criteria is considerably more difficult. Not only are there high costs associated with keeping multiple copies of the global state, the idea conflicts with the natural sharing of data that gives the non-backtracking generalized parsers their advantage. For example, suppose a GLR parser diverges and because of different semantic actions it yields two different global states. Later the stacks may converge if they end up in the same parse state. If this happens then we have a converged parsing thread but two distinct global states that are used to translate tokens. This situation is problematic. We must instead prevent stacks from converging when they are associated with distinct global states, thereby losing the polynomial-time advantage that GLR has over backtracking methods.

The difficulty of this situation has given rise to an alternate solution in a GLR setting. The common approach to parsing ambiguous context-dependent languages with a GLR parser is to attempt all possible parses irrespective of context dependencies. The tokenizer yields tokens that simultaneously carry multiple types [7] and the GLR parser will acquire all possible interpretations. Following the parse, or at strategic points, disambiguation rules [57, 101, 100] eliminate the parse trees that are illegal according to context dependency rules. This approach has been shown to work when applied to C++ [68].

An argument against this approach is that it requires one to acquire parse trees that will later be thrown away. This is true not only of valid parse trees that are lower in priority than others, it is also true of invalid parse trees that are created due to the fact that tokens cannot be immediately disambiguated. Tokens are sent to the parser as impossible types and are used to construct impossible trees despite the fact that information that can prevent this is available. It is encoded in the previously parsed text.

Similar problems exist with Packrat parsers. These parsers do not acquire all possible parses so there is no option to defer to a post-parse disambiguation. Instead, a global state must be maintained. If a semantic action needs to manipulate global state then any memoized results that have been stored for positions at or ahead of the current parsing position must be thrown out since they were created using a global data structure that is different from the current global state. Forward memoized results must also be thrown out when the parser backtracks over a semantic action that has modified the global state.
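In code, the discarding amounts to sweeping the memo table whenever a semantic action touches the global state; a sketch, assuming memo keys are (nonterminal, position) pairs as in the memoized parser sketched earlier:

    def invalidate_forward(memo, current_pos):
        # Drop memoized results at or ahead of the position where the global
        # state was just modified; they were computed against stale state.
        for key in [k for k in memo if k[1] >= current_pos]:
            del memo[key]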

In the Rats! packrat parsing system the approach is to use lightweight, nested transac- tions [39] that are implemented using a manually programmed stack of global data struc- tures. This technique is not general. The stack-based approach to storing global data CHAPTER 2. BACKGROUND 54

requires that alternative parses at the same input position modify the global state in the same way.
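A rough sketch of the stack-of-states idea follows (Python, illustrative only; the actual Rats! transaction interface differs). Entering an alternative pushes a copy of the global data structure, a successful alternative commits the copy, and a failed alternative aborts back to the saved original.

    import copy

    class TransactionStack:
        """Nested, stack-based transactions over a global data structure."""

        def __init__(self, initial_state):
            self.stack = [initial_state]

        def begin(self):
            # Semantic actions operate on a copy pushed onto the stack.
            self.stack.append(copy.deepcopy(self.stack[-1]))

        def commit(self):
            # Keep the modified copy in place of the saved original.
            top = self.stack.pop()
            self.stack[-1] = top

        def abort(self):
            # Discard the modified copy, restoring the saved original.
            self.stack.pop()

        @property
        def current(self):
            return self.stack[-1]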

2.5 Source Transformation Systems

In this section we give an overview of source transformation systems. Many programming languages and systems have been used for transforming source code. We restrict ourselves to languages with a type system derived from context-free grammars, an integrated pars- ing method for acquiring parse trees, and a programming language designed primarily for rewriting these trees.

Common among all systems is that a transformation program is divided into a grammar specification and a set of transformation rules. The grammar specifies syntactic forms of the language and the rules specify how the input is to be manipulated. The processing of input is divided into three phases: parsing of the input text into a structured tree, transformation of the parse tree, and unparsing of the new tree to the output text.

2.5.1 ASF+SDF

ASF+SDF [97, 99] is a source transformation system derived from term rewriting systems.

ASF is the algebraic specification formalism, which provides the term rewriting facilities.

SDF is the syntax definition component of the system. It includes a scannerless generalized

LR parsing engine. SDF allows the user to resolve ambiguities using priority and associa- tivity attributes in grammar definitions. Any remaining ambiguity must be resolved by disambiguation filters following the parse.

The rewrite rules specify how to make reductions, also known as simplifications. In term rewriting this refers to matching a pattern and making a replacement. Each rule is composed of a simple equation. On the left-hand side is the pattern to match and on the right-hand side is the replacement. The left-hand side may bind variables that can be used

on either side of the equation. A variable used on the left-hand side will test for equality between the variable and the other input tree fragment. A variable used on the right-hand side will copy the bound variable to the replacement tree. Once a set of term rewriting rules have been declared, program transformation becomes a problem of simplifying terms until no more rules can succeed and a normal form is achieved.

While common in other rewriting applications, the repeated application of all rules to a parse tree until a normal form is achieved is not always the appropriate strategy for program transformation. An ability to specify to which sections of a tree certain rules should be applied is needed. This can be achieved in ASF+SDF by first defining a new type that encapsulates the types we wish to modify. We then target our rewrite rules to this new type. When we wish to cause the rules to be applied at a specific location, we construct the new type in that location with the value to be modified as a child. By placing new types in the tree we can invoke rules by name.

To further control the rewrite process, ASF+SDF allows conditions to be attached to rules. Conditions can be used to refine a pattern or to construct values that are used in the replacement. These rule labels and conditions are useful for controlling the rewrite process, however they are not completely general. Traversals other than the inbuilt top-down and bottom-up traversals must be explicitly programmed by naming the types they operate on.

Systems such as ELAN and Stratego fix this deficiency.

2.5.2 ELAN

ELAN [16] is a rule-based programming language designed for developing theorem provers, logic-based programming languages, constraint solvers and decision procedures. ELAN was not specifically designed for writing program transformations. However, it is useful in the problem domain. One of the first applications of ELAN was to prototype a functional programming language.

ELAN is notable to us because it introduced the concept of a rewrite strategy. This

language construct was motivated by a need to manage rule applications in large rewrite programs with many stages and subtasks. Strategies allow programmers to specify order and hierarchy dependencies between rules that would otherwise be applied nondeterministically to the entire input tree. They are used to declare explicitly how trees are walked and which rules are applied during a walk.

Strategies have since made their way into rewrite systems that are specifically designed for program transformation. We discuss strategies further in the next section on the Stratego

Language.

2.5.3 Stratego Language

Recognizing the need for better control of the application of rewrite rules, the Stratego

Language [107] introduced the concept of strategies to rule-based program transformation.

Stratego is a direct descendant of ASF+SDF and ELAN, combining the program definition and transformation facilities of ASF+SDF with the separation of traversal from rewriting that was introduced in ELAN.

A strategy [106] in the Stratego Language is a declarative construct that allows careful control over the dependency relationships between rules. A strategy takes transformation tasks as parameters and specifies how they are to be applied. These tasks may be rules or other strategies. Strategies look very much like rules. However, the distinguishing characteristic is that they do not explicitly mention the terms that they modify. Instead, they reference other strategies and rules and implicitly operate on terms.

Complex strategies are constructed using operators that define relationships between the transformation tasks. For example, the sequence operator s1; s2 causes the application of one strategy, then the next. The s1 + s2 operator attempts to apply one of two strategies.

A set of inbuilt strategies define the basic traversal behaviour. The rec x(s) strategy defines a symbol, x, whose reference in s represents a recursive application of the strategy. The all(s) strategy applies the strategy s to all subterms of the current term being processed.

module traversals
imports lists
strategies
  try(s)          = s <+ id
  repeat(s)       = rec x(try(s; x))
  topdown(s)      = rec x(s; all(x))
  bottomup(s)     = rec x(all(x); s)
  downup(s)       = rec x(s; all(x); s)
  downup2(s1, s2) = rec x(s1; all(x); s2)

Figure 2.14: Some common rule application strategies. Strategies do not explicitly mention the terms that they operate on. They take other strategies (a rule is a simple form of a strategy) as parameters and implicitly traverse terms. For example, the all(s) strategy applies s to all subterms of the current term.

These operators and basic strategies allow the programmer to specify a traversal without giving the specific forms involved.

Figure 2.14 shows some common strategies that may be programmed in Stratego. The try(s) strategy first applies strategy s; if that fails it applies the identity strategy. The repeat(s) strategy applies s until it fails. The topdown(s) strategy applies s to the current term, then recurses. The bottomup(s) strategy recurses first, then applies s. The downup strategies combine top-down and bottom-up applications.
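The behaviour of these strategies can be mimicked outside Stratego. The following Python sketch is illustrative only: terms are modelled as nested tuples of the form (label, child, ...), a strategy is a function that returns the rewritten term or None on failure, and the combinator names are assumptions rather than Stratego's own.

    def seq(s1, s2):                 # s1 ; s2
        def apply(t):
            r = s1(t)
            return None if r is None else s2(r)
        return apply

    def choice(s1, s2):              # s1 <+ s2: try s1, fall back to s2
        def apply(t):
            r = s1(t)
            return s2(t) if r is None else r
        return apply

    def identity(t):                 # id
        return t

    def all_children(s):             # all(s): apply s to every child of the term
        def apply(t):
            if not isinstance(t, tuple):     # leaf term: nothing to visit
                return t
            new_children = []
            for child in t[1:]:
                r = s(child)
                if r is None:                # all(s) fails if s fails anywhere
                    return None
                new_children.append(r)
            return (t[0],) + tuple(new_children)
        return apply

    def try_(s):
        return choice(s, identity)

    def repeat(s):                   # rec x(try(s; x))
        def apply(t):
            r = s(t)
            return t if r is None else apply(r)
        return apply

    def topdown(s):                  # rec x(s; all(x))
        return lambda t: seq(s, all_children(topdown(s)))(t)

    def bottomup(s):                 # rec x(all(x); s)
        return lambda t: seq(all_children(bottomup(s)), s)(t)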

Stratego has many other interesting features; for example dynamic rules allow a program to define rules at run-time. Overlays permit patterns to be abstracted so language syntax need not be directly specified in a rule. Research into how to better express transformations through control of the rewrite process is ongoing in the Stratego community.

2.5.4 TXL

TXL [25] is a hybrid functional and rule-based program transformation system. It uses a generalized recursive-descent parsing engine with extensions for accommodating left-recursive grammars. A TXL grammar is normally divided into a base grammar and grammar overrides. The base grammar defines the lexical and syntactic forms of the language to

be processed. The lexical forms are given in a regular expression notation. The syntactic forms are defined using an extended BNF-like notation. Grammar overrides can be used to extend or modify the base grammar, allowing new language forms to be defined in a modular fashion. In this way, TXL promotes reuse of grammars.

Program transformation behaviour is defined by a set of transformation rules, which are able to search parse trees for by-example style patterns and provide replacements. Rule application may be controlled by further pattern matching on pattern components, as well as by conditions programmed in a functional style.

TXL provides an inbuilt tree traversal mechanism. If the default tree searching is inadequate, arbitrary tree traversals may be programmed by explicitly coding the traversal in a functional style. Similarly, the order of rule application is controlled by the explicit invocation of rules using TXL’s function application mechanism.

To fully take advantage of TXL’s abilities, understanding the hybrid nature of rules is essential. A rule exhibits characteristics of term rewriting rules in the sense that a rule repeatedly searches its scope for instances of its pattern and makes a replacement. On the other hand, a rule is a functional style program that looks and behaves like a statically typed lisp function with implicit iteration. On some occasions it is advantageous to program using the rewriting paradigm. At other times it is necessary to be programming in a functional style with the iteration disabled.

A good meta-language system must make considerations towards program readability by clearly separating the syntax of the meta-language from the syntax of the input lan- guage. The boundary between the two must be unambiguous and visually distinct. This is something that TXL does very well.

2.5.5 Integration with General-Purpose Languages

Transformation systems in the style of rule-based systems provide very nice language constructs for manipulating trees. The systems are heavily tuned for this use. During the

// Find definition. Parameterized by comparison
// strategy. Should be applied to a (key, tree) pair.
find(cmp) =
  // First deconstruct into key and tree, then
  // replace with tree.
  ?(k, t) ; !t ;
  (
    // Check if tree is empty, in which case
    // we return the empty.
    ( ?Empty )
    <+
    (
      // Tree is not empty. Must be a node. Take
      // the node apart.
      ?Node( n, v, t1, t2 );
      (
        // Check for "less than". Recurse on success.
        ( test((k, n)) ; (k,t1) )
        <+
        // Check for "greater than". Recurse on success.
        ( test((n, k)) ; (k,t2) )
        <+
        // Else, must have a match. Return the value.
        !v
      )
    )
  )

Figure 2.15: Binary search in Stratego.

course of constructing large transformation applications, the need for arbitrary computer algorithms can creep into many applications. If the transformation language is not amenable to general-purpose programming then the benefits gained by exploiting transformation fea- tures become lost to the difficulties of implementing custom algorithms that go alongside the transformations. This is a problem that rule-based systems suffer from.

Consider the need for binary search, a task so common it is one of the first concepts taught in computer science curricula. The code in Figure 2.15 implements binary search in the Stratego Language. Asking a programmer to recognize binary search in this comment-aided form is reasonable. However, expecting a time-strapped, results-driven programmer to implement binary search in Stratego as a first exercise in learning the language may not be so successful. It requires an understanding of the semantics of strategies, the following

operators ;, !, ? and <+, and the program fragments embedded within the strategy.

On the theme of combining transformation languages with general-purpose languages, there has been work on adapting high-level transformation infrastructure to language- specific compiler front-ends [54]. This work makes it possible to write domain-specific optimizations as transformations on the host-language AST. The JastAdd [30] compiler system is an example of doing this. It adds transformation and attribute grammar features to Java, allowing users to program extensions to the language. The CodeBoost [9] system brings Stratego rewrites to the C++ AST, with high-performance computing applications in mind.

Tom [72] is a term-based pattern matching library that has been used to extend Java and other languages [10]. Tom brings term definition, pattern matching and term construction to general-purpose languages. Huang presents a programming technique that uses first-class patterns to support the analysis of tree-structured data in general-purpose programming languages [42].

We find that there has been work on integrating rewriting systems into specific general- purpose languages for the purpose of extending that language, or for term rewriting in general. However, there is no full transformation system, complete with parsing, transfor- mation and unparsing facilities, that uses language constructs common in general-purpose programming languages.

2.6 Summary

Transformation systems are built up from a number of essential features. In Section 2.2 we look at these features to get an idea of what goes into the construction of a transformation system. We then take a closer look at parsing, since there are problems in this area that we wish to address. In Section 2.3 we survey the research into generalized parsing and analyze the methods with respect to criteria that are of interest to us. In Section 2.4 we look at

the issues and research surrounding context-dependent parsing. It is in this section that we expose the difficulty caused by languages with ambiguities and context dependencies.

From these problems we are motivated to design a new transformation system that does not stumble on such practical issues. In Section 2.5 we survey the existing source transformation systems, giving brief overviews of the transformation languages they employ. In this section we also motivate our desire to design a transformation language that resembles general- purpose languages. This motivation comes from the difficulty of implementing support algorithms in transformation languages, as well as recent research into extending general- purpose languages with transformation language features.

In the next chapter we take the first step in designing a new transformation system by choosing a generalized parsing method from the methods described in Section 2.3. In choosing a method we must face trade-offs. Most of the issues we must live with; however, we find that we are able to address the issue of controlling the parse in our method of choice, and we can do so without compromising any benefit that the algorithm brings.

Chapter 3

Generalized Parsing for Tree Transformation

3.1 Introduction

Generic source transformation systems require a generalized parser. Having explored generalized parsing algorithms in the previous chapter, we now turn to the problem of choosing a generalized parsing algorithm on which to base this research. After discussing the tradeoffs we settle on backtracking LR. This algorithm is not perfect, however, and there are downsides to choosing it. One problem we can do something about: existing backtracking LR systems do not support grammar-based control of the parse. We find that if we carefully order conflicting shift and reduce actions we can approximate ordered choice. This allows us to control the parse as we can in generalized recursive-descent systems such as TXL [25], but we still retain all the benefits of a bottom-up parsing system. After describing our ordering algorithm and its limitations, we discuss how it impacts left-recursive and right-recursive lists. We then give a solution to the problem of pruning the search space, and show how error handling can be implemented in our system. In the final section, we describe how we have integrated grammar-dependent scanning into our system, bringing it under the control

of our ordered choice algorithm.

3.2 Backtracking LR

Choosing a generalized parsing algorithm for source transformation requires considering many factors. For example, in what ways does the method limit the grammars that can be specified? How can context-dependencies be handled? What error recovery mechanisms are supported? What kind of run-time performance can be expected? How much memory will be required at run-time? How much grammar ambiguity can the method handle while remaining practical? How can ambiguities be resolved?

Much of the current parsing research is focused on GLR [94] techniques. Many parsing papers suggest ways to apply the GLR algorithm to real programming languages and there are several examples of real systems that employ GLR. However, we cannot ignore the fact that GLR parsing is incompatible with the feedback technique. Parsing modern pro- gramming languages requires the maintenance of complex context-dependency information during parsing. Pursuing multiple parses concurrently, as is done in GLR, requires that we also maintain multiple copies of the global state. This unfortunately prevents us from merging GLR parse stacks and therefore inhibits the GLR algorithm.

Other non-backtracking algorithms, such as Earley [29], CYK [22, 112, 55] and Packrat [35], also suffer from the problem of managing concurrent writes to global information.

Maintaining per-parse global information is at odds with the sharing of parse tree results, degrading the parsing algorithms to naive nondeterministic parsing with exponential running time.

The backtracking algorithms are easier to apply to context-dependent parsing problems because at any given time there is only a single parse being pursued. They allow us to maintain a global state and use it for lexical feedback or semantic conditions, provided that during backtracking we are able to revert modifications to the global state.

Conceding the importance of context-dependent parsing, we base this work on backtracking. Backtracking has other advantages too: it is easy to understand, it can be controlled for the purpose of resolving ambiguities, and it subsumes an error recovery technique. The caveat is to focus one’s effort on limiting the amount of backtracking to make parsing fast in practical cases.

The next choice we must make is between top-down and bottom-up parsing algorithms.

The main advantage of choosing a top-down parsing algorithm is that top-down algorithms give us the very convenient ordered choice parsing strategy. This allows us to specify which ambiguous grammar forms are to be preferred over others simply by ordering grammar productions.

On the other hand, if we choose a bottom-up approach then the system will inherit the speed of deterministic bottom-up parsers for deterministic portions of the grammar. This will result in much faster parsing than generalized top-down can give us. Another reason to choose bottom-up is that we can utilize the well-known bottom-up semantic action model of Yacc [46].

Rather than choose between ordered choice and speed we aim for both by looking for a way to control the parsing of backtracking LR. In the next section we focus on backtrack- ing LR and present an enhancement to it that gives us an ability to control the parse of ambiguous constructs.

3.3 User-Controlled Parsing Strategy

The first contribution of this thesis addresses the desire to use ordered choice to control the parsing of nondeterministic language definitions in the context of backtracking LR parsing.

Ordered choice gives users localized, grammar-based control of the order in which the parser attempts to match grammar productions to the input. In the case of ambiguous grammar definitions, this allows the user to resolve the ambiguities. In the case of nondeterminism

orderState( tabState, prodState, time ):
    if not tabState.dotSet.find( prodState.dotID )
        tabState.dotSet.insert( prodState.dotID )
        tabTrans = tabState.findMatchingTransition( prodState.getTransition() )

        if tabTrans is NonTerminal:
            for production in tabTrans.nonTerm.prodList:
                orderState( tabState, production.startState, time )

                expandToState = traverseProdInTable( tabState, production )
                for all followTrans in expandToState.transList
                    reduceAction = findAction( production.reduction )
                    if reduceAction.time is unset:
                        reduceAction.time = time++
                    end
                end
            end
        end

        shiftAction = tabTrans.findAction( shift )
        if shiftAction.time is unset:
            shiftAction.time = time++
        end

        orderState( tabTrans.toState, prodTrans.toState, time )
    end
end

orderState( parseTable.startState, startProduction.startState, 1 )

Figure 3.1: Algorithm for ordering shifts and reduces to emulate a generalized top-down strategy in backtracking LR

without ambiguity, this lets the user optimize the parser by reducing backtracking.

Implementing a backtracking LR parser requires that some changes be made to the standard deterministic LR run-time driver. When the parser encounters a parse table conflict it must choose an action to take, then store the alternatives for later consideration.

We are interested in statically ordering these alternatives in such a way that an ordered choice parsing strategy is approximated. To accomplish this we traverse the parse tables and assign an order to the shift/reduce actions using the algorithm shown in Figure 3.1.

The order of the traversal is guided by a top-down interpretation of the grammar. We start with a nonterminal as our goal and consider each production of the nonterminal in succession. As we move along a production’s right-hand side we recurse on nonterminals before we proceed past them. When we visit a shift transition we assign a time to it if one has not already been given. Following the traversal down the right-hand side of a production, we find the transitions that contain the reduction actions of the production instance and assign a time to each reduction action if one has not already been given.

To limit the traversal and guarantee termination we visit a parse table and production state pair only once. This is accomplished by inserting the dot item of the production state into a set in the parse table state and proceeding with the pair only when the dot item did not previously exist in the set. Note that this yields dot sets identical to those computed during the LR parse table construction.

3.3.1 Ordering Example

In this section we give an example of our ordering algorithm as applied to the parse tables of

Grammar 2.3. These tables were first shown in Figure 2.9 without any ordering information.

A trace of the backtracking LR algorithm as it parses the input string a a b a was shown in Figure 2.10.

S  → AB b
   | a A a AB

AB → AB a
   | AB b
   | ε

A  → A a
   | ε

(Grammar 2.3 repeated)

In Figure 3.2, we give the parse tables of Grammar 2.3 with the ordering information displayed. The timestamps assigned by our action ordering algorithm are shown in each transition action. There are two conflict points. The first is in the transition leaving the start state on the input character a. This transition will first induce a reduction of AB, then a shift. This represents the pursuit of the first production of S, then the second. The second conflict represents the choice between extending the A sequence and beginning the

[Figure 3.2 is a state-transition diagram of the parse tables; its transition labels (shift, reduce and shift-reduce actions annotated with their assigned timestamps) cannot be meaningfully reproduced as text.]

Figure 3.2: LALR(1) parse tables of Grammar 2.3. The unique timestamp assigned to each action is shown. The resulting action ordering emulates a top-down parsing strategy.

AB sequence when we are matching the second production of S. In this case the parser first shifts and reduces A to pursue a longest match of A.

3.3.2 Out-of-Order Parse Correction

Unfortunately, it is possible to find grammars whose mutually ambiguous productions will not be parsed in order. As it turns out, the parse strategy we aim for is reserved only for true top-down parsers. LR parsers attempt to parse common production prefixes in parallel. This allows parsers to run very fast, but it can inhibit us from achieving a top-down strategy because it shuffles the order of backtracking decisions by delaying the branching of productions. For example, consider the following grammar.

S → F b x
  | F
  | F b c

F → a
  | a b c

(Grammar 3.1)

When given the input string a b c, a generalized top-down parser will attempt the following derivations. Note that it branches on the possible derivations of S first, then branches on the possible derivations of F.

S → F[ a ] b x        fail
S → F[ a b c ] b x    fail
S → F[ a ]            fail
S → F[ a b c ]        accept

Our backtracking LR algorithm with the above ordering applied does not yield the same parse. Since all three S productions have a common prefix F, the prefix will be parsed once for all productions. The parser will branch on the possible derivations of F first, then later branch on the derivations of S. This out-of-order branching causes an out-of-order parse.

When we trace the parser’s behaviour, we find that it first reduces F → a, then succeeds in matching S → F b c. The offending LR state tables are shown in the first part of Figure 3.3.

S → F[ a ] b x    fail
S → F[ a ]        fail
S → F[ a ] b c    accept

Fortunately we are able to solve this problem easily. When we find our input to be parsed out of order with respect to our grammar, we can force a correct order by introducing unique empty productions at the beginning of the productions that are parsed out of order. The unique empty productions will cause an immediate reduce conflict before any inner productions are reduced, effectively allowing us to force the slower top-down parsing approach in a localized manner. We can change Grammar 3.1 to the following and achieve the same parsing strategy as a top-down parser.

[Figure 3.3 consists of two state-transition diagrams (the parse tables before and after the transformation); their transition labels cannot be meaningfully reproduced as text.]

Figure 3.3: LALR(1) parse tables before and after adding unique empty productions that force the parser to select on the possible derivations of S before selecting on the possible derivations of F.

S → U1 F b x
  | U2 F
  | U3 F b c

F → a
  | a b c

U1 → ε
U2 → ε
U3 → ε

(Grammar 3.2)

The second part of Figure 3.3 shows the LR tables after forcing an initial selection on S.

An ability to force a branch point is very useful when unioning grammars because it frees us from analyzing how the LR state tables interact. The cost of forcing a branch point lies in increasing the number of states and lengthening parse times. However, we do so only locally, and only when required.

3.3.3 Ordered Choice and Left Recursion

Using our action ordering algorithm we attempt to emulate a top-down parsing approach.

This suggests that we may have problems with controlling nondeterminism in left-recursive grammars. This intuition is indeed correct; our emulation of ordered choice cannot be applied to left-recursive grammars with predictable results. Consider the following left- recursive grammar fragment.

E → E + F
  | E - F
  | F

The ordering algorithm processes the first production and recurses inside each nonter- minal that it contains. This results in the recursive exploration of the first production. The guard protecting against the recursive processing of productions prevents the endless recur- sion that would result from this. The algorithm then moves on to the second production and succeeds at ordering it. The contents of the first production is not ordered until the recursive exploration of E returns. The result is that the second production gets a higher priority than the first, and ordered choice is not achieved.

Evidence of this problem can be seen in a trace of our algorithm as it orders the state tables of Grammar 2.3. The trace is shown in Figure 3.4. The ordering of AB-2 occurs ahead of AB-1.

It would be convenient if we could apply the above strategy of inserting unique empty productions to order these productions. However, a unique empty production at the head of a left-recursive production creates hidden left recursion, a grammar structure that backtracking LR cannot process without going into infinite recursion. We must settle on the fact that ordered choice cannot be achieved for left-recursive grammars. Therefore it is recommended that right recursion be used when recursive productions need to be ordered.

Traversal                 State    Stopped by Guard
S-1 → . AB b              1
AB-1 → . AB a             1
AB-1 → . AB a             1        X
AB-2 → . AB b             1
AB-1 → . AB a             1        X
AB-2 → . AB b             1        X
AB-3 → .                  1
AB-2 → AB . b             2
AB-2 → AB b .             5
AB-3 → .                  1        X
AB-1 → AB . a             2
AB-1 → AB a .             6
AB-2 → . AB b             1        X
AB-3 → .                  1        X
S-1 → AB . b              2
S-1 → AB b .              5
S-2 → . a A a AB          1
S-2 → a . A a AB          3
A-1 → . A a               3
A-1 → . A a               3        X
A-2 → .                   3
A-1 → A . a               4
A-1 → A a .               7
A-2 → .                   3        X
S-2 → a A . a AB          4
S-2 → a A a . AB          7
AB-1 → . AB a             7
AB-1 → . AB a             7        X
AB-2 → . AB b             7
AB-1 → . AB a             7        X
AB-2 → . AB b             7        X
AB-3 → .                  7
AB-2 → AB . b             8
AB-2 → AB b .             9
AB-3 → .                  7        X
AB-1 → AB . a             8
AB-1 → AB a .             6        X
AB-2 → . AB b             7        X
AB-3 → .                  7        X
S-2 → a A a AB .          8

Figure 3.4: Trace of our action ordering algorithm as it computes the orderings in Figure 3.2 from Grammar 2.3. Note that state 9 exists here but does not exist in the state tables because it has been eliminated by the creation of shift-reduce actions.

3.3.4 Sequences

In the previous section we show that our approximation of ordered choice does not work for ambiguous left-recursive grammars, and we recommend using right recursion when am- biguities need to be resolved. Our parsing system is bottom-up and recommending right recursion for use with a bottom-up parsing algorithm is indeed a bit odd. Bottom up parsers normally use left-recursion to implement sequences of items. In a left-recursive grammar reductions happen after each item in the list. The constant alternation between shifting and reducing saves memory since reduced tree nodes can be discarded after reduction.

Right recursion, on the other hand, causes a long sequence of shifts for each item, then a long sequence of reductions. If we use right recursion, all items in the list must remain in memory until the end of the list is reached. However, since we must keep all trees around for the transformation phase, we feel no pressure to use left-recursive lists to save memory.

Aside from controlling the parse there is another reason to recommend right recursion for our transformation system and it is unrelated to parsing. A top-down tree traversal of a right-recursive sequence yields a left-to-right walk of the sequence. This makes it easy to process lists in a functional style. For example, a function can easily deconstruct a list into a head item and the tail of the list, process the head, then recurse on the tail and the result is a left-to-right traversal. If we were to use left-recursion to implement lists this would not be the case. A less natural bottom-up traversal is required to traverse the left-recursive list items in order.
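As an illustration of this point, the following sketch processes a right-recursive list top-down, in Python; the ('list', item, tail) tuple encoding and the helper names are assumptions for the example, not COLM's actual representation.

    def process_list(node, visit):
        """Visit the items of a right-recursive list in left-to-right order."""
        if node is None:                 # the empty production: end of the list
            return
        _, head, tail = node             # deconstruct into head item and tail
        visit(head)                      # process the head ...
        process_list(tail, visit)        # ... then recurse on the tail

    # The list a, b, c encoded right-recursively:
    items = ('list', 'a', ('list', 'b', ('list', 'c', None)))
    process_list(items, print)           # prints a, b, c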

Since we cannot save memory by using left recursion for sequences we have made the in-built sequence operator (Kleene star) syntactic sugar for a right-recursive list. Such lists are more convenient to process following the parse and the right-recursion is compatible with our action ordering algorithm. The expression item* expands to the following grammar fragment.

list → item list
     | ε

3.3.5 Longest/Shortest Match

In addition to controlling ambiguities that involve the relative ordering of productions, we also need an ability to control ambiguities that involve sequence lengths.

Since right recursion affords us an ability to control parsing using ordered choice, it becomes easy to decide between longest and shortest match by choosing the ordering of the productions of a right-recursive list. This is a property of generalized top-down parsers that we have inherited by approximating ordered choice. If we would like to attempt to

find the longest possible sequence of items we specify the recursive production first and the empty production second. In this form we declare that given the opportunity to match an additional item or to terminate the list, match an additional item.

list → item list
     | ε

If on the other hand we would prefer to find shorter lists and extend them only when necessary we can specify the empty production first and recursive production second. In this form we declare that given the opportunity to match another item or to terminate the list, choose to terminate. If that strategy turns out to fail then the backtracker revisits the decision and attempts to match another item.

list → ε
     | item list

It should be remembered that, on list termination, there is a cost proportional to the length of the list. In the previous section we noted that right recursion involves a long sequence of shifts followed by a long sequence of reductions when the list is terminated. If the parser is repeatedly closing off a list only to fail and backtrack in order to try a list of a different length then this cost will be incurred each time. Left-recursive lists do not suffer from this problem since each item is reduced after it is shifted.

In the light of this cost it is worthwhile to explore opportunities for specifying shortest or longest match of left-recursive lists. We don’t have ordered choice at our disposal but

other techniques are possible. For example, it is possible to get a longest match by wrapping a left-recursive production in a fresh nonterminal. This means that there is one exit point from the list and it is always attempted after any shifts or reduces inside. The result is a forced longest match. This is guaranteed to work for direct left-recursion only.

list_wrapper → list

list → list item
     | ε

If we want a shortest match with a left-recursive grammar we can provide a hint to the compiler that alters the ordering algorithm on a per-nonterminal basis. Normally, our ordering algorithm causes the shift/reduce actions inside a nonterminal to take precedence over the action that reduces the nonterminal. This gives us longest match by default. If we reverse this order for a particular nonterminal, causing the reduction to happen before any shifts/reduces inside we get a shortest match for that nonterminal. Again, this is guaranteed to work for direct left-recursion only.

3.4 Discarding Alternatives

The performance of a backtracking algorithm is often improved by pruning alternatives from the search space. In our backtracking LR algorithm, discarding alternatives drasti- cally improves parser performance by eliminating fruitless reparsing, reducing the parser’s memory usage and enabling it to report erroneous input in a prompt and accurate manner.

We take advantage of the fact that programming language ambiguities are often localized

[111]. For example, once the parsing of a C++ statement completes, any alternative parses of the statement’s tokens can be discarded.

We allow the user to associate with a nonterminal a commit flag that indicates that, following the reduction of the nonterminal, all alternative parses should be discarded. This is a global commit point that causes the entire parse stack to be pruned of alternatives.

This operation guarantees the parser will never backtrack over the previously parsed input in search of another possible parse. Using commit points, the user can declare parsing milestones at important constructs such as statements or declarations. This type of commit point can be found in a number of backtracking LR systems.

There are cases where global commits are too strong. It may be desirable to eliminate alternative parses of a particular nonterminal, but allow alternatives ahead of the nonterminal to exist and be considered if the parser encounters an error. For example, if parsing a sublanguage inside a host language at the statement boundary, it may be desirable to commit a fragment of the sublanguage, but still consider that parsing of the sublanguage may fail, requiring that the parser back up and attempt to parse another statement of the host language.

host_stmt → U1 sub_language
          | U2 real_host_stmt

To accommodate this we add a second type of commit flag. A reduction-based commit point is associated with a grammar production. When the production is reduced, all al- ternative parses of the text within the nonterminal are deleted. Alternatives ahead of the nonterminal are preserved.

Note that it can be difficult to cover the entire grammar using these localized commit points. It is therefore recommended that they be used in combination with global commit points so that at periodic intervals it can be guaranteed that all unnecessary alternatives are removed.
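The effect of the two kinds of commit point on the parser's stored alternatives can be sketched as follows (Python, illustrative only; real backtracking LR implementations keep alternatives inside the parse stack rather than as the flat list of (position, branch) pairs assumed here).

    def global_commit(alternatives):
        """Global commit: discard every stored alternative, so the parser can
        never backtrack over the input parsed so far."""
        alternatives.clear()

    def reduction_commit(alternatives, start_pos, end_pos):
        """Reduction-based commit: discard only the alternatives recorded inside
        the span of the reduced nonterminal; alternatives recorded before it are
        preserved and may still be revisited on error."""
        alternatives[:] = [alt for alt in alternatives
                           if not (start_pos <= alt[0] < end_pos)]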

3.5 Error Recovery

Ordered choice gives us a convenient error handling mechanism. Unique empty productions can be used to require that error handling productions that consume input be attempted only after all normal productions fail. The following error handler consumes input until the

input stream and parser go back into a stable state where correct parsing may resume.

statement → U1 for_block
          | U1 while_block
          ...
          | U2 not_semi ’;’

3.6 Grammar-Dependent Lexical Rules

To complete our parsing system we must devise a scanning approach. Treating the input as a sequence of tokens that come from a single pool of token patterns is too restrictive.

The user must be able to specify multiple lexical regions in which different lexical rules apply. Allowing the user to manually choose these regions would be too informal and would present a challenge when composing grammars. The choice of active lexical region should instead depend on the grammar.

The user assigns each token to a single lexical region. There can be many tokens in a region, or just one. Each region is then made into a scanner, in the usual sense, using the token definitions inside the region. When the parser needs to retrieve a token it consults the parser state tables to determine which tokens can be accepted. From this list it determines the list of regions that the accepted tokens can come from. The identity of the first region in this list is used for requesting a token. The identity of the next region is stored for possible use if the parser backtracks.

The list of regions is determined from the grammar and the user is able to control scanning using the grammar, yet at the same time scanning can be controlled from the lexical rules when needed. The parser requests a token from a particular region, but the scanner still has the final say as to which token is sent. The usual prioritization of token patterns can take effect, or hand-written code can be used to generate tokens programmatically.

This minor decoupling makes it possible to drive scanning from the grammar, yet still support whitespace and comment tokens. We refer to these tokens as ignore tokens because

statement → . SQL_statement
          | . C_statement

Accepted Tokens    From Region    Attempted Regions
SQL_word           SQL            SQL
SQL_number         SQL
SQL_string         SQL
...
C_word             C              C
C_number           C
C_string           C
...

Figure 3.5: The ordering of tokens at the first state of a union grammar containing C and SQL statements. The token ordering results in the parser first attempting the SQL region, then the C region.

they are never mentioned in the grammar. Like real tokens, they must be assigned to a region. It becomes possible to match an ignore token when the parser can accept a real token from the same region. Ignore tokens are always associated with the next real token, so the scanner can simply accumulate ignore tokens until it matches a real token. The real token is sent to the parser with the preceding ignore tokens attached.
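A scanner loop following this convention might look like the sketch below (Python, illustrative only; scan_one stands in for whatever pattern matcher a lexical region compiles to, and the token fields are assumed names).

    def next_real_token(scan_one, region):
        """Return the next real token from the region, with any preceding
        ignore tokens (whitespace, comments) attached to it."""
        ignored = []
        while True:
            tok = scan_one(region)
            if tok.is_ignore:
                ignored.append(tok)           # accumulate until a real token arrives
            else:
                tok.leading_ignore = ignored  # attach the run of ignore tokens
                return tok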

As alluded to above, when a token is sent back by the backtracker the next possible region is attempted. Since we enable the scanner to backtrack through the regions of all possible tokens accepted in the current parser state, we must order the list of regions. For this we rely on the ordering algorithm described in Section 3.3. Following the execution of the ordering algorithm, we take the ordered list of tokens that can be accepted in a state, map these to regions, then remove duplicates from the ordered list.
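Deriving the region order for a state is then a small computation over the state's ordered token list; the sketch below (Python, illustrative only) assumes region_of maps each token type to the lexical region it was assigned to.

    def attempted_regions(ordered_tokens, region_of):
        """Map a state's ordered list of acceptable tokens to an ordered,
        duplicate-free list of lexical regions to attempt."""
        regions = []
        for tok in ordered_tokens:
            region = region_of(tok)
            if region not in regions:        # keep only the first occurrence
                regions.append(region)
        return regions

    # e.g. ['SQL_word', 'SQL_number', 'C_word', 'C_number'] -> ['SQL', 'C'],
    # assuming region_of maps SQL_* tokens to 'SQL' and C_* tokens to 'C'.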

Consider the union grammar in Figure 3.5. It accepts either SQL or C statements at the statement level. A parser state at the head of a statement would accept a list of SQL tokens followed by a list of C tokens, resulting in the parser first attempting the lexical region of the SQL tokens, followed by the lexical region of the C tokens.

3.7 Summary

In this chapter we choose a generalized parsing method and provide an enhancement that addresses one shortcoming of the method. Our method of choice is backtracking LR and in

Section 3.2 we give our reasons for this choice. We recognize that existing implementations of this technique do not allow the user to control the parse of ambiguous grammars by ordering productions. This is a facility provided by generalized recursive-descent systems such as

TXL, and is one that we deem important in transformation systems. In Section 3.3 we solve this problem by providing an algorithm that orders conflicting shift and reduce actions to approximate ordered choice. True ordered choice is reserved for generalized recursive- descent systems and therefore our technique is limited in the sense that it is not completely automatic. In some cases the user must intervene and assist the algorithm. However, the required intervention is usually minimal, and if the user is unsure if intervention is required the only cost of a conservative approach is a minimal degradation in performance.

All backtracking systems have the potential to blow up to exponential running time.

Other generalized context-free parsing algorithms such as GLR and Earley do not face this problem, and this is by the strength of their design. Exponential running times can be avoided in practice, but regardless it is important to provide a mechanism for reducing the search space. Pruning is useful for optimizing a backtracking system. We address this issue in Section 3.4 with global and local commit points. Also related to backtracking is the issue of error recovery. In Section 3.5 we point out how our backtracking system subsumes an error recovery technique. In Section 3.6 we introduce grammar-dependent scanning into our system. We recognize that an ability to backtrack over lexical choices increases the generality of our system. Therefore we put the grammar-dependent lexical analysis under the control of our action ordering algorithm.

This concludes our discussion of generalized context-free parsing. We will come back to the topic of parsing and make further enhancements to the parsing engine. However,

we must first present the design of our transformation language because we will need it to make these further enhancements.

Chapter 4

COLM: A Tree-Based Programming Language

We now divert from the topic of parsing and move to the topic of tree transformation languages. We give an informal definition of COLM, the tree transformation language that we have designed. While it is true that we wish to advance the state of tree transformation languages, this experiment in transformation language design is only secondary in this work.

We wish to exploit this language and its implementation to advance the parsing capabilities of our system. With this language defined we will move back to the topic of parsing in the next chapter to show how we have enabled context-dependent generalized parsing.

We begin this chapter by talking more about our reasons for developing a new language and implementation. Such a task is a large undertaking and we wish to be clear about our motivations. We then work through the language design. We first describe the features borrowed from general-purpose languages, then cover the transformation features. Topics in these sections include input language definition, matching patterns, constructing trees, traversing trees, and generic tree transformation idioms. Our language definition technique is informal. We give the grammar fragment pertaining to the features we want to describe,


a brief description of the semantics and then a small example to illustrate the idea.

We then give the reader an idea of what a real transformation looks like in our language by giving a goto-elimination example. Finally, we discuss some of our implementation choices. We have several implementation options and some of our choices will have an impact in the following chapter when we continue to improve upon our parsing engine.

4.1 Designing a New Transformation System

Before we present the design of our transformation language we must ask the question, why design a new transformation system at all? Why not adapt an existing transformation system to demonstrate our technique? There are a number of reasons why designing a new language and implementing it from scratch is the better approach for this research. Some reasons are technical, others have to do with our need to extend the transformation language with features that allow us to demonstrate context-dependent parsing. We also believe that designing a new transformation language can lead to improved adoption of transformation systems if we focus on designing one that is closer to general-purpose languages.

4.1.1 Implementing a New System

In this work we have chosen to implement a new transformation system from scratch instead of extending an existing system. Extending an existing system presents several challenges for us.

Since we have a new parsing method we must implement a new parsing engine. Our parsing engine must generate trees that the transformation system can manipulate. If we were to base this work on an existing system such as TXL [25], we would first have to remove the parser from the system, then ensure that the trees generated by our parsing engine can be used with the tree manipulation system that we have separated out. Transformation systems are often tightly integrated with the parsing engine. This technical challenge could

easily inhibit the progress of this work.

The second technical challenge in adapting an existing system lies in adapting the code generated by the transformation language’s compiler. As described in the following chap- ter, we need to extend the implementation of the transformation language with automatic reverse execution. This introduces some unusual requirements into the language’s compiler and virtual machine implementation. If we were to utilize an existing system we would need to understand and adapt it to meet our requirements for reverse execution, possibly working around conflicting requirements. Existing implementations are usually optimized and thus not designed to be easily understood and extended in such a way.

Rather than attempt to adapt an existing implementation, which might have resulted in time-consuming implementation challenges that were unforeseeable from the start, we choose to start from scratch and build an entirely new system from the ground up, while keeping our language implementation requirements in mind. This may be the longer path in the end, given that all of the transformation language’s features must be written from scratch, but it is a straightforward approach and it is more likely to result in success.

4.1.2 Designing a New Language

Implementing a new transformation system from scratch gives us a good reason to also design a new transformation language. Starting with a new language frees us from having to meet existing implementation requirements and allows us to evolve the language and implementation together, free of external constraints.

To support our parsing technique our transformation language must support the building of common data structures. The goal is to enable parse-time modification of global state, which will support the recognition of context-dependent language constructs. To show our parsing method is capable we need to add to the language any features that are needed to represent the global state of real target languages. This includes explicitly allocated objects, pointers, lists and maps. Designing a new language allows us to include the needed

features without being concerned about the usual language extension problems, such as limited room for syntax expansion and the inadvertent creation of ambiguities.

Designing a new language from scratch also allows us to experiment with transformation language design. As we have seen in Section 2.5.5, researchers are interested in merging transformation language features with general-purpose languages. However, there is no complete transformation system with a language that resembles any general-purpose lan- guage. It is our hope that by designing a language with imperative features that are oriented towards the manipulation of state, we can widen the adoption of transformation systems by appealing to the majority of programmers who are already familiar with this programming paradigm.

4.2 Language Overview

Our focus is on designing a language that enables language definition, supports the parsing method described in Chapter 3, supports common transformation techniques, and allows programming in an imperative style. For the purpose of this research we need a small language with a simple semantics. In the following chapter we will use the language to explore context-dependent parsing and we will need to modify the implementation with automatic reverse execution. Due to this extra complexity, a simple language with a simple implementation will be a great asset to us.

Aside from meeting the goals stated above, we leave more focused language design for future work. Examples of issues that we have not concerned ourselves with include standard library design and program modularity. We have skipped many implementation issues as well. We have only a rudimentary reference-counting garbage collector that doesn’t break cycles and we have done no exploration of relevant optimization techniques. We expect further evolution of the language and development of the implementation to take place.

Also in future work, it would be worthwhile to explore the formal semantics of this

start → root_item*

root_item → statement        (§4.3.1)
          | function_def     (§4.3.3)
          | global_def       (§4.3.4)
          | token_def        (§4.4.1)
          | literal_def      (§4.4.1)
          | region_def       (§4.4.1)
          | cfl_def          (§4.4.2)
          | generic_def      (§4.4.3)
          | iter_def         (§4.7.3)

Figure 4.1: List of root language elements.

language. A formal semantics for the TXL language has been provided by Malton [64]. A formal semantics for our language would be useful not only for precisely communicating the meaning of the language, but also for exposing inconsistencies and other issues with the language.

4.2.1 Value Semantics

When working in a transformation language, many tree variables are used. Tree values are frequently passed around and used to build other trees. In such an environment, a value semantics is essential. With many variables present it becomes difficult for a programmer to maintain a mental model of the dependencies between variables. Using a value semantics eliminates the need to track dependencies. For this reason, each variable in our language represents a distinct value that, when manipulated, does not affect the value in any other variable. Variables may be set to some tree value, or may be set to nil.

4.2.2 Root Language Elements

The grammar listing in Figure 4.1 gives the root grammar items of our language. At this level of the language it is possible to define types, functions, global variables and to give code. Code at the root level is collected and treated as the program’s entry procedure, as

identifier      /[a-zA-Z_][a-zA-Z0-9_]*/
number          /[0-9]+/
single_literal  /’([^’\\]|\\.)*’/
pattern_char    /[^\["]|\\./
comment         /#[^\n]*\n/

Figure 4.2: Syntax of the meta-language tokens.

is commonly done in scripting languages. Variables that are declared at the root level are treated as local to the root-level code. Global variables must be explicitly marked as such

(Section 4.3.4).

4.2.3 Tokens

In Figure 4.2 we list the tokens used in the grammar that defines our language. We make use of identifiers, numbers, literal strings delimited with single quotes (’), and comments that begin with the hash symbol (#). Double quotes (") are used to delimit literal text in patterns and constructors. These are defined in Sections 4.5 and 4.6.

4.3 General-Purpose Constructs

In this section we describe the features of the language that are borrowed from general- purpose languages. The concepts in this section should be familiar to most programmers since they are often found in imperative languages. It can be assumed that the general- purpose constructs take on their usual popular meaning.

4.3.1 Statements

The statements of the programming language can be divided into two groups: statements commonly found in imperative programming languages (such as expressions, conditions and loops), and parsing and transformation-related statements. The general-purpose statements

stmt_list → statement*

statement → var_def ( ’=’ expr )?
          | expr
          | ’if’ expr block_or_single elsif_list
          | ’while’ expr block_or_single
          | ’break’
          | ’print’ ’(’ expr_list ’)’
          | ’print_xml’ ’(’ expr_list ’)’
          | return_stmt       (§4.3.3)
          | reject_stmt       (§4.4.2)
          | for_iter_stmt     (§4.7.2)
          | if_iter_stmt      (§4.7.2)
          | yield_stmt        (§4.7.3)

var_def → type_ref identifier

type_ref → identifier repeat_op?
         | pointer_type_ref      (§4.3.5)
         | extended_type_ref     (§4.4.3)

repeat_op → ’?’ | ’*’ | ’+’

block_or_single → ’{’ stmt_list ’}’
                | statement

elsif_list → elsif_clause* else_clause?

elsif_clause → ’elsif’ expr block_or_single

else_clause → ’else’ block_or_single

Figure 4.3: Syntax of statements.

are defined here in Figure 4.3 and the statements concerning parsing and transformation are defined in the sections that follow.

The if and while statements have a semantics similar to the if and while statements found in most general-purpose languages. The test in an if statement is evaluated and if it evaluates to a boolean true, integer nonzero, or a non-nil tree then the body of the if statement is executed. If statements may have else-if clauses and/or an else clause. The

int i = 0
while ( i < 10 ) {
    if ( i * (10-i) > 20 )
        print( i, ’ ’, i * (10-i), ’\n’ )
    i = i + 1
}

Figure 4.4: An example of declaration, expression, while, and if statements.

while statement continues to execute the body as long as the test evaluates to true, as determined by the same test as in the if statement. An example of these general-purpose constructs is given in Figure 4.4.

4.3.2 Expressions

In this section we give an elided definition of expressions. The syntax is shown in Figure 4.5.

The productions in between expr_list and primary are not shown. The missing productions make up the usual set of boolean, relational and arithmetic operators commonly found in imperative languages. The primary definition contains the usual entities found at the root of expressions: literal types, variable references, function calls and a parenthesized expression.

It also contains several tree construction forms that are defined later on in this chapter.

In Section 4.4 we give the constructs for defining types. A type may contain named attributes so we must provide a method of accessing attributes. We follow object-oriented languages and allow variable references to be preceded by a sequence of qualifications. In the example in Figure 4.6 we demonstrate the use of expressions and qualified variable references.

4.3.3 Functions

We provide functions as they are commonly known in most imperative languages. Functions may be passed arguments and they may return a value. Functions may also have side

expr_list → expr ( ',' expr )*
    .
    .
primary → 'nil'
        | 'true'
        | 'false'
        | number
        | single_literal
        | var_ref
        | '(' expr ')'
        | 'typeid' type_ref
        | function_call (§4.3.3)
        | dynamic_primary (§4.3.5)
        | match_primary (§4.5)
        | construct_primary (§4.6)

var_ref → qual* identifier

qual → identifier '.'
     | pointer_qual (§4.3.5)

Figure 4.5: Elided syntax of expressions.

def some_type
    another_type AT
    []

def another_type
    int i
    []

int a = 4
some_type ST = construct some_type []
ST.AT = construct another_type []
ST.AT.i = a * 10 + 2

Figure 4.6: An example of qualified names and expressions.

effects. Function calls are part of expressions and they are invoked by naming the function and giving a list of arguments in parentheses.

The function definition is composed of a return type, the function’s name, a list of parameters and the body of the function. Parameter passing defaults to a call-by-value

function_def → type_ref identifier '(' param_list? ')' '{' stmt_list '}'

param_list → param_var_def ( ',' param_var_def )*

param_var_def → var_def | ref_param_var_def (§4.7.1)

return_stmt → 'return' expr

function_call → var_ref '(' opt_expr_list ')'

Figure 4.7: Syntax of function declarations and function calls.

semantics. However, as is shown in the syntax definition in Figure 4.7, a parameter can be explicitly declared as a reference parameter. These are discussed in Section 4.7.1.

A function can have any type as its return type. If a function has a return statement it must specify a tree of that type. If a function fails to explicitly return a value it returns nil.

Function calls use the var_ref definition from the previous section and therefore may be qualified. Function call qualification is used for inbuilt methods that are contained within the scope of a particular type. These functions take an object of that type as an implicit parameter. The append method on the inbuilt generic list (Section 4.4.3) is an example of this. In the future we may wish to pursue the object-oriented style further and allow user-defined methods to be added to types.

In the example in Figure 4.8 we show a simple function that adds two integers and returns the result. Following that, we demonstrate a qualified function call to an inbuilt method contained within the scope of the generic list.

4.3.4 Global Variables

Variables declared in functions, user-defined iterators (Section 4.7.3), reduction actions and token-generation blocks are declared as local variables. Within the code blocks of these constructs there is only a single variable declaration scope (in the future this may change;

# User-defined function.
int add( int i, int j )
{
    return i + j
}

int i = add( 1, 2 )

# Qualified call to inbuilt function.
list int_list [int]
int_list IntList = construct int_list []
IntList.append( 42 )

Figure 4.8: An example of a user-defined function with an associated call, and a call to an inbuilt method contained within the scope of the generic list.

global_def → ’global’ var_def opt_def_init

Figure 4.9: Syntax of global variable declarations.

modern languages often confine variables to their enclosing blocks).

Code and variable declarations at the root program level are collected and treated as if they have been defined in the program’s entry procedure. Therefore, root-level variables are not global variables by default. A variable must be explicitly made globally accessible by prefixing it with the global keyword. Global variables become accessible in all code blocks throughout the program. The syntax of global variable declarations is given in Figure 4.9.
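
As a brief illustration (ours, not from the thesis; the names and the use of an initializer are assumptions based on the opt_def_init form in Figure 4.9), a counter made global at the root level can be read and updated from inside a function:

# Hypothetical example: a global counter visible in all code blocks.
global int Count = 0

int bump()
{
    Count = Count + 1
    return Count
}

bump()
print( Count, '\n' )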

4.3.5 Dynamic Variables

Semantic analysis of computer languages often requires more than just tree data structures.

Graph data structures are indeed necessary if we are to enable the processing of languages beyond rudimentary syntactic analysis. For this reason we add dynamic variables to the language. In Figure 4.10 we show the syntactic forms related to dynamic variables. Our dynamic variables can be considered global variables whose names are first class values. A

pointer_type_ref → 'ptr' identifier repeat_op?

dynamic_primary → 'new' primary | 'deref' expr

pointer_qual → identifier '->'

Figure 4.10: Syntax of dynamic variable declarations and references.

dynamic variable is created from a value using the new expression and can then be assigned to a pointer variable. Pointer variables may be dereferenced to yield the dynamic variable’s value. No arithmetic is allowed on pointers.

An additional qualifying operator, ->, has been added to allow access to a dynamic variable’s attributes directly from a pointer. When a dynamic variable is no longer referenced by any pointer variables it is subject to garbage collection. A pointer type has little meaning outside of a program’s run-time context; therefore we do not assign it a textual pattern. It cannot appear in a production of a type definition, but it can appear anywhere else.
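
The following sketch is our own illustration of these constructs; the point type and all variable names are hypothetical, and the construct expression it relies on is described in Section 4.6:

# A hypothetical type with two attributes and a single empty production.
def point
    int x
    int y
    []

point P = construct point []
ptr point Loc = new P          # create a dynamic variable holding P
Loc->x = 3                     # access attributes through the pointer
Loc->y = 4
point Copy = deref Loc         # dereference to recover the value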

4.4 Type Definitions

The type system of our language is based on context-free grammars with named attributes. Tree types are divided into terminal and nonterminal types. Terminals are defined by a regular expression pattern, whereas nonterminals are defined by an ordered list of productions. Both terminal and nonterminal types may have named attributes. Attributes can be accessed in the standard object-oriented fashion using the dot or arrow operators. In the case of nonterminals the attributes are associated with the type, rather than with a specific production of the type; therefore there is only one collection of attributes per type.

4.4.1 Token Definitions

In this section we describe token definitions and the associated lexical region definition.

Tokens define the terminal types of the input language. Regions are used to group tokens that may appear together in a single token stream.

There are two types of token definitions, those that result in significant tokens that are sent to the parser, and those that are ignored by the parser when matched. Ignored tokens are accumulated by the scanner and attached to the next significant token that is sent to the parser. Ignored tokens always define the allowable whitespace and comments that can precede the significant tokens in the same region.

Among the significant tokens there are two forms: those with names and the literals.

Literals are merely tokens that are defined and referenced in the grammar using the unique literal text that they match. Other than the way they are defined and accessed, they are treated in a manner identical to named token definitions.

A lexical region is used to group tokens. Tokens that appear in the same region are made into a single DFA for scanning, which then becomes the region entity that the parser calls upon when a token is required. If a token is not explicitly defined inside a lexical region, then it is assumed to be in a unique region of its own. The grammar-dependent scanning approach is motivated in Section 2.2.1 and described in Section 3.6.

When a token is requested from the active lexical region, the region’s scanner finds the longest prefix of the remaining input that matches a pattern. If more than one pattern matches the longest prefix, then the first of the matching patterns is used. The choice of token to return is a final decision as far as the backtracking parser is concerned. Other possible matches from the same region are not considered if the token does not yield a successful parse. Instead, the text of the token is pushed back to the input stream and another region is called upon to yield a token.

Token definitions may have attributes. These are named values that are associated with

token_def → token_kw name? attr_list '/' re_expr? '/' generate_block?

token_kw → 'rl' | 'token' | 'ignore'

name → identifier

attr_list → var_def*

generate_block → '{' stmt_list '}'

literal_def → 'literal' single_literal ( ',' single_literal )*

region_def → 'lex' identifier '{' root_item_list '}'

Figure 4.11: Syntax of token and lexical region definitions.

the token value. They may be set during traversal or when the token is created and sent to the parser by a token-generation block. These blocks of code may be attached to a token definition and when they exist they become responsible for creating tokens and sending them to the parser. They may be used to programmatically generate tokens. Token-generation blocks are described in more detail in Section 5.1.

In addition to token definitions the user may also give regular language definitions. This is done using the rl keyword. A regular language definition does not become a member of its containing lexical region. It is a named regular expression that can be used in the definition of other tokens, allowing regular expression patterns to be reused.

The listing in Figure 4.11 shows the section of the grammar concerning token definitions.

Note that we have omitted the definition of regular expressions. For our purposes, common regular expressions suffice.
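
As a small sketch of a regular language definition (ours; the region and token names are invented, and we assume an rl name is referenced directly by name inside other patterns):

lex example
{
    # A reusable character-class definition; it is not itself a token.
    rl ident_char /[a-zA-Z0-9_]/

    token ident /[a-zA-Z_] ident_char*/
    token label /ident_char+ ':'/
}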

The example in Figure 4.12 contains the terminal nodes of an expression grammar. In it we make use of two lexical regions. The primary lexical region contains the expression tokens and a secondary region is used just for handling strings. The regions are automatically

lex start
{
    token ident /[a-zA-Z_]+/
    token number /[0-9]+/
    literal '+', '*', '"', '(', ')'
    ignore /[ \t\n]+/
}

lex string
{
    token escape /'\\' any/
    token chr /[^\\"]+/
}

def str_item
    [escape]
|   [chr]

def string
    ['"' str_item* '"']

def factor
    [number]
|   [ident]
|   [string]
|   ['(' expr ')']

Figure 4.12: The terminal nodes of a simple expression grammar. The tokens in the factor definition come from two lexical regions. The start lexical region contains identifiers, numbers, operators and the string delimiter character. The string lexical region contains the tokens that are permissible inside strings.

selected according to the parser state.

In the primary lexical region there are two named tokens ident and number, literal definitions for the operators, and a single ignore token for consuming whitespace. This ignore definition matches whitespace ahead of the ident, number and operator tokens. This region is the only active lexical region until a double quotation character is seen in an attempt to parse the only production of the string type definition.

The second lexical region is used only for the contents of strings. The string grammar captures only escape characters and plain text. However, more elaborate parsing is possible.

When the parser shifts a double quotation character in an attempt to parse a string, the string region becomes active, along with the main region. The primary region is active for the purpose of matching the closing double quotation delimiter.

4.4.2 Context-Free Language Definitions

In this section we give the syntax of the context-free language definitions. A context-free definition is composed of an optional list of attributes and a sequence of one or more productions.

Our context-free notation is equivalent to BNF with a few common regular-expression enhancements. We have added the repetition operators *, + and ? for convenience. The presence of a star operator following a reference to a type causes the automatic definition of a right-recursive list that accepts zero or more items of that type. The reference to the type and the star operator are then replaced with a reference to the list. The plus operator causes the automatic definition and reference of a right-recursive list that accepts one or more items of that type. The question operator produces a new definition that contains the type unioned with an epsilon production. Reasons for using right-recursive lists as opposed to left-recursive lists are given in Section 3.3.4.
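
To make the rewriting concrete, the sketch below is our own illustration; the generated list is internal to the system and the name _repeat_item is purely hypothetical:

# A production written with the star operator:
def tag
    [open_tag item* close_tag]

# is treated roughly as if item* were replaced by a reference to an
# automatically defined right-recursive list:
def _repeat_item
    [item _repeat_item]
|   []

def tag
    [open_tag _repeat_item close_tag]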

Like other source transformation systems, our system supports attributes. In the TXL language the attributes are embedded into productions and are therefore dependent on the form of the value. In the Stratego [107] language, attribute definitions (called annotations) are a list of terms that follow the definition of a production. In our system we instead make attributes a property of the nonterminal type and give them names. Doing so allows access to the attributes without deconstructing the type. The attributes can be accessed using the dot operator, following the convention of object-oriented languages.

In Section 3.4 we describe commit points. They show up, here in the language definition, as keyword attributes on productions. The commit keyword signifies that all alternative parses should be deleted following a reduction of the production. The lcommit keyword signifies that any alternative parses of the text of the production should be deleted.
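
A minimal hypothetical use of a commit point, following the def_prod form of Figure 4.13 (the object-language definitions themselves are ours):

# Once a complete assignment statement has been reduced, discard the
# alternative parses; lcommit prunes only alternatives for the text of
# the production itself.
def stmt
    [ident '=' expr ';'] commit
|   [expr ';'] lcommit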

cfl_def → 'def' name attr_list cfl_prod_list

cfl_prod_list → def_prod ( '|' def_prod )*

def_prod → '[' prod_el* ']' commit? reduction_block?

prod_el → identifier repeat_op?
        | single_literal repeat_op?

commit → 'commit' | 'lcommit'

reduction_block → '{' stmt_list '}'

reject_stmt → 'reject'

Figure 4.13: Syntax of context-free definitions.

The syntax of context-free language definitions is given in Figure 4.13. Reduction actions and the reject statement are covered in the next chapter, which deals with context-dependent parsing. These topics are in Sections 5.2 and 5.3, respectively.

The example in Figure 4.14 shows a lean expression grammar. Attributes are used to store the value that an expression evaluates to and reduction actions are used to propagate the expression value up the tree during parsing. This technique is commonly used with deterministic parser generators such as Yacc.

4.4.3 Extended Types

If we hope to perform semantic analysis of input, we need types that hold more than just strings. We need types that can store numerical data, and trees that can be used as dictionaries and doubly-linked lists. As shown in Figure 4.15, we have added some extended terminal types: int, bool and str. The int and bool types hold integer and boolean values instead of strings and have no regular expression pattern. The str type holds a string and has no pattern.

lex start
{
    token number /[0-9]+/
    literal '+', '*', '"', '(', ')'
    ignore /[ \t\n]+/
}

def factor
    int value
    [number]
    {
        lhs.value = r1.data.atoi()
    }
|   ['(' expr ')']
    {
        lhs.value = r2.value
    }

def term
    int value
    [term '*' factor]
    {
        lhs.value = r1.value * r3.value
    }
|   [factor]
    {
        lhs.value = r1.value
    }

def expr
    int value
    [expr '+' term]
    {
        lhs.value = r1.value + r3.value
    }
|   [term]
    {
        lhs.value = r1.value
    }

Figure 4.14: An example of context-free grammar definitions with attributes and reduction actions. The expression value is computed during parsing by passing results up the tree.

extended_type_ref → 'int' | 'str' | 'bool' | 'any'

generic_def → 'map' identifier '[' type_ref type_ref ']'
            | 'list' identifier '[' type_ref ']'

Figure 4.15: Syntax of extended types.

We have also added two additional type definition statements, list and map. The list definition statement can be used to define a variable-length list of any single type. The map definition statement can be used to define a dictionary that maps keys of any single type to values of any single type. The syntax of these type definition statements is a subset of the context-free type definition statement, only using different keywords in place of def.

Finally, we add one more extended type to our language to assist with the handling of trees. The any keyword can be used in place of real types to declare variables that hold a

tree of any type. This is useful for writing generic transformations (Section 4.8). Note that none of the extended types have a textual pattern and therefore they cannot appear in a production of a context-free type definition.

We have extended the language with extra types. However, we do not deviate from the tree-based value semantics. We have made every type of data a tree-based value. For example, an integer is a non-parsable terminal with an integral value instead of a string value. The primary advantage is that it allows dynamic typing over all values. Inbuilt algorithms such as search and print can be applied to any value, including integers, strings, list elements and map elements. Very few exceptions need be made in the semantics and the implementation.

The downside to using a tree type to represent all data is that it incurs a base storage cost much higher than systems based on machine-level words. On most computer hardware integers can be represented using only four or eight bytes. By encapsulating integers and other small ordinal types inside trees we drive the cost up. In the future it may be worthwhile to add primitive non-tree types to the language, for the purpose of optimization.

In Figure 4.16 we show how these generic types can be used to create a symbol table.

This symbol table is for a fictitious language that allows a single name to represent different language objects. For example, the identifier C may represent both a class and a function that constructs a class of that type. The symbol table maps a name to a list of symbol table entries. This data structure is known as a multimap. In the example we create the symbol table and initialize it with two inbuilt types.

4.5 Matching Patterns

Patterns are an essential part of every transformation system. They are used for testing forms of trees and capturing variables. In Figure 4.17 we give the syntax of the pattern-related portion of our language. The match statement is a primary expression that tests a

# An entry describing a symbol.
def symbol_entry
    str type_class
    []

# A symbol may resolve to more than
# one symbol entry.
list symbol_list [symbol_entry]

# Map names to a list of entries.
map symbol_table [str symbol_list]

# Initialize the symbol table.
symbol_table SymbolTable = construct symbol_table []

# A list of entries containing only inbuilt.
symbol_list InbuiltList = construct symbol_list []
InbuiltList.append(
    construct symbol_entry( type_class: 'inbuilt' ) [] )

# Add inbuilt types int and bool.
SymbolTable.insert( 'int', InbuiltList )
SymbolTable.insert( 'bool', InbuiltList )

Figure 4.16: A symbol table for a language that allows names to have more than one meaning. Names map to the list of entities that have been defined using the name.

variable against a pattern.

Patterns must have a type associated with them. In the case of the match statement the type is taken from the variable that is being tested. The pattern content consists of a sequence of literal text strings, named types and variable references. The named types may have a label, which indicates that following a match a capture is to be made. Variable references in the pattern allow the user to require that a fragment of the input tree exactly matches a known value.

At compile-time the system’s context-free parsing algorithm is applied to the entire list of items, with the literal text scanned on demand and the variables resolved to types. The

match_primary → 'match' var_ref pattern_list

pattern_list → pattern*

pattern → '"' litpat_el* '"'
        | '[' pattern_el* ']'

litpat_el → '[' pattern_el* ']'
          | pattern_char (§4.2.3)

pattern_el → '"' litpat_el* '"'
           | label? type_or_lit
           | '?=' identifier

type_or_lit → identifier repeat_op?
            | single_literal repeat_op?

label → identifier ':'

Figure 4.17: Syntax of patterns.

named types and the types that have been resolved from the variable references are treated as terminals, but they retain their nonterminal identity. For example, an occurrence of expr is sent to the parser as a token that represents the expr nonterminal. It is accepted by the parser as an instance of the expr nonterminal with the child list unspecified.

At run-time the system tests the parsed form of the pattern against the input tree. The forms of the two trees must be identical, excluding the content underneath any subtrees of the input tree that match named types in the pattern. If a match occurs the next step is to test the form of any variables in the pattern against the subtree that the variable’s type matched. In these comparisons the subtree of the input tree must exactly match the value given in the pattern. If this succeeds then the pattern match is considered a success. The labelled types in the pattern are bound to new variables and the match expression returns the input tree. If the match fails it returns nil.
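
A small illustration of label captures (ours; it assumes the expression grammar of Figure 4.14 is in scope and that the input contains a top-level addition):

expr E = parse expr( stdin )

# Bind the left and right operands of the outermost addition.
if match E [L: expr '+' R: term] {
    print( L, '\n' )
    print( R, '\n' )
}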

In Figure 4.18 we show two semantically identical patterns that are expressed in different styles. The pattern matches an XML-style tag with id person and an attribute list that

lex start
{
    token id /[a-zA-Z_][a-zA-Z0-9_]*/
    literal '=', '<', '>', '/'
    ignore /[ \t\n\r\v]+/
}

def attr
    [id '=' id]

def open_tag
    ['<' id attr* '>']

def close_tag
    ['<' '/' id '>']

def tag
    [open_tag item* close_tag]

def item
    [tag]
|   [id]

tag Tag = parse tag( stdin )

# Style: list of literal text and types.
match Tag ["" item* ""]

# Style: literal text with embedded lists of types.
match Tag "[item*]"

Figure 4.18: Two semantically identical patterns expressed in different syntactic styles. The first pattern is a sequence of literal text strings and named items. The second is a literal text string with embedded sequences of named types.

begins with the name attribute. The value of the attribute is bound to a variable. In the first match statement, a list of literal strings and types is used. This style is appropriate if the pattern consists of mostly types with a few literal components. In the second match statement, a literal pattern with embedded lists of types is used. This style is appropriate if the pattern consists of mostly literal text and few named types.

4.6 Synthesizing Trees

While patterns are used for testing forms of trees and binding subtrees to variables, constructors are used for the opposite purpose, the synthesis of trees. Like patterns, constructors are

construct_primary → 'construct' type_ref attr_init? syn_list
                  | 'make_tree' '(' expr_list? ')'
                  | 'make_token' '(' expr_list? ')'
                  | 'parse' type_ref '(' expr_list? ')'
                  | 'parse_stop' type_ref '(' expr_list? ')'
                  | 'reparse' type_ref '(' expr_list? ')'

attr_init → '(' attr_assign_list? ')'

attr_assign_list → attr_assign ( ',' attr_assign )*

attr_assign → identifier ':' expr

syn_list → synthesis*

synthesis → '"' litsyn_el* '"'
          | '[' syn_el* ']'

litsyn_el → '[' syn_el* ']'
          | pattern_char (§4.2.3)

syn_el → '"' litsyn_el* '"'
       | single_literal
       | var_ref

Figure 4.19: Syntax of tree synthesis patterns.

scanned and parsed and therefore must have a type associated with them. Unlike patterns, constructors must specify complete trees. They cannot have any named types without an associated value. They are comprised of a list of literal text strings and variable references.

The syntax is given in Figure 4.19.

The construct expression requires a type and a list of literal text strings and variables to make up the constructor pattern. A list of attribute assignments is optional. At compile-time the list of literal text and variables is parsed. The variables are resolved to type names and they are sent to the parser as tokens that represent nonterminals. The literal text is scanned on demand. At run-time a tree is allocated in the form that was produced by parsing the constructor. The nodes that represent the type names that were derived from variable references are initially empty. The values of the variables that were given in the

constructor are then copied into the tree.

The optional list of attribute assignments can be used to give a set of name-value pairs to initialize attributes of the tree. There is no requirement on the order of assignments or on the presence of any particular attribute initialization. Uninitialized attributes are given the value nil.

The make_token and make_tree statements can be used to construct terminals and nonterminals using an integer value to represent the type, rather than requiring that it be specified as a literal type. The integer value corresponding to a type can be extracted from the literal type name using the typeid expression (Section 4.3.2). The remainder of the arguments are used for the children/text of the tree, then any attributes.
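
A hypothetical sketch of this mechanism (ours), assuming the number token of Figure 4.14 is in scope and that a plain string is an acceptable text argument:

# Obtain the implementation-defined type number of the number token,
# then build a new token of that type from a string.
int NumId = typeid number
any N = make_token( NumId, '123' )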

The parse, parse_stop and reparse expressions are used to invoke the parsing engine.

These statements require a type and an argument list. The parse expression consumes all remaining input from the stream. It requires one argument, the stream to pull input from.

The parse_stop expression parses until a reduction of the parse type can be achieved. At this point it stops parsing, leaving the remaining input on the input stream. The reparse expression parses the literal text of some tree as another type. It requires one argument: the tree from which the literal text should be taken.
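
A hypothetical sketch (ours) of parse_stop and reparse, again assuming the expression grammar of Figure 4.14:

# Parse only until an expr can be reduced; the rest of the input
# remains on the stream.
expr E = parse_stop expr( stdin )

# Re-read the literal text of E as a factor. This succeeds only if
# that text also forms a factor.
factor F = reparse factor( E )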

In Figure 4.20 we give an example of tree construction, using the XML-like grammar from Figure 4.18. It parses the input, takes the resulting tree apart, and constructs a new tree using a component of the input. A more robust example might check the result of the parse expression and the pattern match for errors. Like patterns, tree construction patterns can be expressed in different styles.

4.7 Traversing and Manipulating Trees

Constructing tree values in a bottom-up fashion using the construct expression is straightforward. What we need now is an ability to traverse and manipulate trees that have already

tag PersonTag = parse_stop tag( stdin )
match PersonTag ["" item* ""]
tag NameTag1 = construct tag ["" Val ""]
tag NameTag2 = construct tag "[Val]"

Figure 4.20: An example of tree synthesis using construct. The input is parsed and decon- structed, then two new trees are constructed using a constituent of the input tree. The two construct expressions demonstrate the different synthesis pattern styles that are possible.

been constructed, or otherwise produced by the parser. This requires language constructs that can give us the same effect as a cursor: we need a read/write head that we can move about a tree and use to analyze or manipulate trees.

4.7.1 References

The first construct that we add in support of traversing and manipulating trees is the reference parameter. Function arguments are normally passed by value and this is consistent with variable assignment. This is the right choice for our tree manipulation language; however, it does not support the creation of procedures whose purpose is to manipulate trees. We therefore allow for arguments to be explicitly passed by reference. This is done by placing the ref keyword in front of the type of a function parameter. Manipulating a parameter that has been passed by reference results in changes to the value of the variable given at the call point. Figure 4.21 shows the section of the grammar relevant to reference parameters.
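
A minimal sketch of a reference parameter in use (ours; the function is hypothetical and borrows the factor type from Figure 4.14):

# Replace the tree referred to by the caller's variable.
int zero_factor( ref factor F )
{
    F = construct factor ["0"]
    return 0
}

factor X = construct factor ["42"]
zero_factor( X )
# X now holds the tree constructed inside the function.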

Reference parameters allow us to define a procedure that encapsulates the manipulation of a tree. Next we need a tool for moving about a tree.

ref_param_var_def → ref_type_ref identifier

ref_type_ref → 'ref' identifier

Figure 4.21: Syntax of reference parameters.

for_iter_stmt → 'for' identifier ':' type_ref 'in' iter_call block_or_single

if_iter_stmt → 'if' identifier ':' type_ref 'in' iter_call block_or_single

iter_call → var_ref '(' expr_list? ')' | identifier

Figure 4.22: Syntax of iterator control structures.

4.7.2 Iterators

For the purpose of traversing and manipulating trees we add iterators. An iterator is a coroutine that yields a sequence of references to trees. Each time the iterator is suspended it yields the next reference in the sequence or it yields nil. Manipulating a reference returned from an iterator results in a change to the variable that was yielded by the iterator.

Iterators may be inbuilt or defined by the user (Section 4.7.3). The inbuilt iterators provide the basic tree traversal algorithms such as topdown, child and rev_child. These iterators yield references to the subtrees of some root tree. The user-defined iterators provide more freedom, leaving the specification of what is yielded up to the user. They enable the writing of reusable search algorithms.

Iterators can be used only by the designated loop and if-statement control structures.

These control structures, shown in Figure 4.22, instantiate the iterator context and repeatedly invoke it until it yields nil, at which point they break out of the iteration loop. Using iterators in this way we can statically guarantee that the value semantics is not violated.

For example, by analyzing the control structure we can prevent a single variable from simultaneously being used as the root of more than one iterator. This is discussed more in

# The default iterator, internally defined.
iter topdown( ref any Tree )

for I: type in topdown(Tree) {
    if match I [some pattern] {
        I = construct type [Some Replacement]
    }
}

Figure 4.23: Using an iterator to traverse and manipulate a tree. The topdown iterator is the default iterator, which is used when no iterator name is given.

the implementation section (4.10).

The for iterator and the if iterator statements are comprised of a new variable, a type, and an iterator call. The new variable provides access to the yielded references. The given type is enforced on each iteration. The iterator may yield any type, but the body of the control structure is executed only when the type of the tree yielded matches the type given.

The iterator call specifies what to iterate and how. If only a single name is given then it is treated as the first argument to the default iterator. If a name and expression list is given then the explicitly named iterator is invoked with the expressions passed as arguments.

Presumably, the root of the iteration is one of these arguments. If the current element of the iterator is modified in the control structure body, then the root parameter should be a reference parameter.

In the example in Figure 4.23 the topdown iterator takes a reference argument and yields references to trees contained in that argument. The body of the topdown iterator is not shown (see Figure 4.25). A top-down, left-right traversal is the default iteration strategy, and the effect of giving only a tree expression instead of a full iterator call is a call to this iterator.

iter_def → 'iter' identifier '(' param_list? ')' '{' stmt_list '}'

yield_stmt → 'yield' var_ref

Figure 4.24: Syntax of user-defined iterators.

4.7.3 User-Defined Iterators

User-defined iterators allow custom traversals to be written. A user-defined iterator looks like a function, only it does not have any return type, nor does it support the return statement. Instead, it allows the yield statement. This statement suspends the iterator and provides a reference to a variable to the calling context. The loop executing the iterator is responsible for enforcing the type of the tree yielded. When the iterator yields nil it is terminated. If control reaches the end of the iterator, nil is automatically yielded. The syntax of user-defined iterators is shown in Figure 4.24.

4.8 Generic Transformation Idioms

By putting together reference parameters, iterators, iterator loops and the any type, we can program arbitrary generic tree traversals. For example, a reference argument that was passed to a user-defined iterator can then be passed to another iterator as its root. A reference yielded from that iterator can then be passed to a function as a reference argument, used as the root of another iterator, or yielded from the original user-defined iterator. This allows us freedom to compose traversals in stages and abstract them. Furthermore, the use of references allows these abstract traversals to be used to write to trees.

4.8.1 Top-Down, Left-Right, One-Pass

The default iteration strategy is to make a single top-down, left-right pass over a tree. If the current node of the tree is replaced during the course of iteration, then the replacement’s

int do_topdown_leftright( ref any T )
{
    for C: any in child(T) {
        yield C
        do_topdown_leftright( C )
    }
}

iter topdown_leftright( ref any T )
{
    do_topdown_leftright( T )
}

for T: tree in topdown_leftright(S) {
    # Do something.
    T = construct tree [Some Replacement]
    # Iterator proceeds inside SomeReplacement.
}

Figure 4.25: A user-defined iterator that implements the top-down, left-right, one-pass search strategy of the default iterator. This is the search strategy of TXL functions.

children are immediately traversed. The implication of this is that it is possible to construct an infinitely deep tree. Such an iteration will not terminate on its own.

The top-down, left-right, one-pass iterator is inbuilt. In Figure 4.25 we show its implementation using the child iterator. The child iterator yields the immediate children of a tree from left to right. When combined with a recursive call, a top-down, left-right traversal is achieved.

As shown in Figure 4.25, the yield statement does not need to be used directly in the body of a user-defined iterator. It can be used in a subroutine that is called by an iterator. The yield then suspends the calling iterator. Judicious language designers might note that we have a function that has meaning in the context of an iterator. However, there is no syntactic link between the function and the iterator. This is an issue that we acknowledge and hope to address in future work.

iter fixed_point( ref any T )
{
    bool modified = true
    while ( modified ) {
        modified = false
        for C: any in T {
            yield C

            if ( this_iter_modified() ) {
                modified = true
                break
            }
        }
    }
}

for C: tree in fixed_point( T ) {
    if match C [pattern]
        C = construct tree [Some Replacement]
}

Figure 4.26: A user-defined iterator that implements fixed-point iteration, the default rule application strategy in many transformation systems.

4.8.2 Fixed-Point Iteration

Many transformation systems employ fixed-point iteration as the default search strategy.

We can construct a fixed-point iterator using the default one-pass iterator and an inbuilt function that provides some insight into how the iterator has been utilized during iteration.

The this_iter_modified() inbuilt function tells us if the user replaced the current node of the iterator in between the yield and the iterator resume. In Figure 4.26 we show how we can use this function to implement a fixed-point iterator. The fixed-point iterator traverses the tree using a one-pass iterator and when a replacement occurs the one-pass iterator is restarted. If a full pass of the tree happens without any modification then the iterator terminates.

int do_bottomup_leftright( ref any T )
{
    for C: any in child(T) {
        do_bottomup_leftright( C )
        yield C
    }
}

iter bottomup_leftright( ref any T )
{
    do_bottomup_leftright( T )
}

Figure 4.27: A user-defined iterator that implements a bottom-up, left-right traversal.

4.8.3 Bottom-Up Traversals

The default iterator is a top-down iterator. Sometimes bottom-up traversals are required and these can be programmed in the usual recursive way: first recursing on children of a tree, then visiting the tree. In Figure 4.27 we show the implementation of a generic bottom-up traversal.

4.8.4 Right-Left Traversals

By using an iterator with a recursive call we are also able to implement right-left traversals.

We just need access to the immediate children of a tree in the reverse direction. We use the rev_child inbuilt iterator for this. Right-left traversals can be written going top-down or bottom-up. In Figure 4.28 we show a top-down, right-left traversal.

4.9 Transformation Example: Goto Elimination

In this section we give an example of a real transformation on the Tiny Imperative Language (TIL) [108]. This small language has been designed for benchmarking transformation languages. A section of the TIL grammar relevant to the transformation example is shown

int do_topdown_rightleft( ref any T )
{
    for C: any in rev_child(T) {
        yield C
        do_topdown_rightleft( C )
    }
}

iter topdown_rightleft( ref any T )
{
    do_topdown_rightleft( T )
}

Figure 4.28: A user-defined iterator that implements a top-down, right-left traversal.

in Figure 4.29.

Our transformation example shows just one case of goto elimination. The transformation looks for a label and then looks through the statements that follow for a jump back up to the label that is guarded by a condition. The statements in between are put inside a do/while block. The TIL language does not contain a do/while block. However, we have added it for the purpose of this example. The transformation is shown in Figure 4.30.

4.10 Implementation Notes

In this section we give some notes on our implementation choices. We first discuss our approach to internal tree sharing for the purpose of reducing memory consumption. We then discuss the data records that we use to implement shared trees. We explain how shared trees are copied when they are written to during iteration, as required by the value semantics. We then describe a language restriction that simplifies our implementation.

Finally, we give a brief overview of the virtual machine.

def statement
    [declaration]
|   [assignment_statement]
|   [if_statement]
|   [while_statement]
|   [do_statement]
|   [for_statement]
|   [read_statement]
|   [write_statement]
|   [labelled_statement]
|   [goto_statement]

def if_statement
    ['if' expression 'then' statement* else_statement? 'end']

def else_statement
    ['else' statement*]

def do_statement
    ['do' statement* 'while' expression ';']

def labelled_statement
    [id ':' statement]

def goto_statement
    ['goto' id ';']

Figure 4.29: A section of the Tiny Imperative Language (TIL) grammar relevant to the goto-elimination example in Figure 4.30.

4.10.1 Tree Sharing

The naive approach to implementing value semantics for trees is to fully copy a tree whenever a value must be copied. This would result in excessive use of memory. Instead, some technique for reducing memory usage should be employed. The ATerm library, which is used by Stratego [107] and ASF+SDF [97, 99], employs maximal subtree sharing. It guarantees maximal sharing by hashing unique trees; before constructing a new tree it determines if an identical tree already exists and uses it instead [98].

The TXL [25] engine uses partial tree sharing. It statically determines when it is safe to share trees and does so without using any run-time reference counts or hash consing. The functional nature of the TXL programming language allows such optimizations to be used.

Our approach is to use run-time reference counting to implement multiple-readers with a copy-on-write policy. Trees are always shared when a copy of a value is taken and the

# Lists are right recursive.
for S: statement* in Program {
    if match S [Label: id ':' First: statement Rest: statement*] {
        expression Expr = nil
        statement* Following = nil

        # Look through the remaining statements for a goto back to the label.
        # The repeat iterator yields only top-level statement lists. It
        # restricts our search to the same nesting depth as the label.
        for Check: statement* in repeat(Rest) {
            if match Check ['if' E: expression 'then' 'goto' ?=Label ';' 'end' SL: statement*] {
                Expr = E
                Following = SL

                # Check iterates over tails of Rest. Assigning an empty list
                # to Check truncates the Rest list. What we cut off is saved in
                # Following (excluding the if statement).
                Check = construct statement* []
            }
        }

        # If a goto was found, then perform the rewrite.
        if ( Expr != nil ) {
            # Replace the labelled statement through to the goto
            # with a do ... while.
            S = construct statement* ['do' First Rest 'while' Expr ';' Following]
        }
    }
}

Figure 4.30: A transformation example that performs one case of goto elimination. The transformation looks for labels, scans the following statements for a conditional jump back to the label, then replaces the enclosed block with a do ... while. CHAPTER 4. COLM: A TREE-BASED PROGRAMMING LANGUAGE 114

actual copying is delayed until absolutely necessary.

4.10.2 The Tree-Kid Pair

We use a combination of two data records to build a shared tree structure. The Tree node represents the data of a particular tree node. It is reference counted and multiple pointers may point to it, provided that the reference count is maintained. The Kid node is used to build a linked list of children underneath a tree. The Kid node may not be shared. Its size is kept to a minimum: just a pointer to a Tree and a pointer to the next kid. Using the tree-kid pair we can easily share trees by turning the copy operation into an upref operation and enforcing copy-on-write rules.

Figure 4.31 shows an example of tree sharing using trees and kids. There are two tree values. Invisible to the user is the fact that three of the Tree records involved in the representation of these two trees are shared. A tree such as this could come into existence by a number of different bottom-up tree constructions.

The parsing algorithm produces trees in this format at every step. When parsing is complete the tree need not be transformed to be made accessible to the transformation language. This also allows us to do transformation during parsing and incorporate the result of a transformation into a tree that is produced by the parser.

4.10.3 Reference Chains

The language features that we have for manipulating trees require that we take references to trees. These references are used as a location for read and write operations. Since trees may be shared we must first copy a tree fragment that is in use by other values before we modify it.

We cannot use only reference counting to determine if any given tree is shared. A tree record may have only one parent, but some tree records higher up in the ancestry may be shared. For this reason, a reference to a tree record must be a chain of references, recording


Figure 4.31: An example of tree sharing using the Tree and Kid data records. There are two values and three of the subtrees are shared.

the history of the traversal all the way up to the root variable where it was started.

Figure 4.32 gives an example of a reference chain. The chain was originally rooted at the local variable T1. The traversal that yielded this chain went down to the second child T2 and then down to the first child T3 from there. When the end of the chain is written to, the chain is traversed upwards to its root.

From the root, the chain is traversed back down and the reference count of each tree record along the path is examined. When a tree record that has a reference count greater than one is found it is copied. The copy-on-write procedure then keeps moving down the chain copying each record until it arrives back at the end of the chain where we need to make a modification. When finished, the path from the root of the reference chain to the leaf will contain only trees that are referenced once. The example in Figure 4.33 shows the result of the copy-on-write procedure after it has made it safe to update T3.


Figure 4.32: An example of a reference chain resulting from a traversal of a shared tree. Reference chains record the traversal history of iterators and the source of reference parameters.

Note that reference chains are restricted for use by the implementation; they are not exposed as first-class values. The compiler is then able to ensure that they are used safely and it can efficiently implement them on the stack.

The idea of making a distinction between structures that represent data and pointers that are equipped with an ability to move about the data is not new. It has been used as an implementation technique for functional languages. This idea is described in more depth by O’Keefe [75].

4.10.4 Implementation-Caused Language Restrictions

Maintaining value semantics requires that we place a restriction on how reference chains are used. Once a reference to a variable is taken, and for as long as that reference remains active, the variable must not be written to. The reference must activate a write lock and


Figure 4.33: The result of the copy-on-write procedure after it has made it safe to write to the end of the reference chain in Figure 4.32.

become the primary method by which the variable is written to. If we allow trees to be modified when a reference is active, then the copy-on-write procedure could easily cause two references that have a parent-child relationship to point to tree records that do not have a parent-child relationship, thus rendering the reference chain invalid.

Write-locking local variables that have active reference chains is straightforward. The compiler can easily use static analysis to prevent referenced local variables from being accessed for writing. This is not so easy for globally accessible variables, however. Because of the added complexity of this kind of static analysis, we do not allow references of globals or dynamic variables to be taken. In Section 5.5.5 we describe an additional need for this restriction.

case IN_TST_EQL: {
    Tree *o2 = pop();
    Tree *o1 = pop();
    long r = cmp_tree( o1, o2 );
    Tree *val = r ? program->falseVal : program->trueVal;
    tree_upref( val );
    push( val );
    tree_downref( program, o1 );
    tree_downref( program, o2 );
    break;
}
case IN_JMP_FALSE: {
    short dist;
    read_half( dist );

    Tree *tree = pop();
    if ( test_false( program, tree ) )
        instr += dist;
    tree_downref( program, tree );
    break;
}

Figure 4.34: Implementation of two virtual machine instructions. Trees stored on the stack are considered referenced by the stack. Reference counts must be incremented before a push operation and decremented after a pop operation and subsequent use by the instruction.

4.10.5 The Virtual Machine

We employ a stack-based virtual machine, allowing for a simple compiler and VM implementation. The instruction set is variable width, with the size of each instruction dependent on the number and size of the required arguments. Instruction arguments are encoded into the bytecode following the instruction identifier code. Instructions generally operate on trees and therefore they pop and then push pointers to the Tree data structure. When a pointer to a Tree is loaded onto the stack, the reference count must first be increased. After it is popped and used by an instruction, the reference count can be decreased.

In Figure 4.34 we give an example of two VM bytecodes. The IN_TST_EQL instruction pops two trees from the top of the stack and compares them. If they are equal it leaves the value true on the top of the stack, otherwise it leaves false on the top of the stack. The

instruction does not have any arguments. The IN_JMP_FALSE instruction takes the jump distance as a two-byte argument. It pops the value to test from the stack and adjusts the instruction pointer if it is false.

4.11 Summary

In this chapter we present the design of a new transformation language. In the next chapter we will see that we need a new transformation language implementation. We choose to also design a new language, with the goal of making a language that contains constructs common in general-purpose languages and transformation languages. We hope the new language will make transformation languages more accessible, as well as ease the writing of program components that are not well suited to the rewriting paradigm.

The language definition begins in Section 4.2, where we give a general overview. In Section 4.3 we define the language features that we have taken from general-purpose languages. To this base we add the essential transformation language features: tree pattern matching in Section 4.5, tree synthesis in Section 4.6, and tree searching in Section 4.7. In Section 4.8 we give examples of generic tree traversals that we can express using our tree traversal constructs. In Section 4.9 we give an example of a transformation.

In the last section of this chapter, Section 4.10, we briefly discuss some aspects of our implementation. This is for the reader’s interest, but it also provides some background for what is to come. At this point in the work our transformation system is not complete.

We have a generalized parsing engine and we have a transformation language, but we do not yet have a generalized context-dependent parsing engine. Completing the system will require making some modifications to the implementation, and we do this in the following chapter.

Chapter 5

Context-Dependent Parsing

In Chapter 2 we described a generalized context-free parsing algorithm known as backtracking LR and in Chapter 3 we enhanced it to support ordered choice and grammar-dependent lexical analysis. In Chapter 4 we described a new transformation language that merges transformation constructs with a base language inspired by general-purpose programming languages. While interesting and different, so far our system is not any more capable than what exists in the current suite of transformation systems. In this chapter we push ahead of existing systems by extending our generalized parsing engine with an ability to handle languages that are both ambiguous and context-dependent.

Our approach is to start by adding token-generation and semantic actions to the parsing engine. These actions are familiar to the users of the deterministic parsing systems Lex [61] and Yacc [46]. While it is possible to use these actions to make local modifications to parse trees while the trees are acquired, that is not the purpose. We wish to encourage users to modify global data structures, then use these structures to influence the parse either through feedback to the lexical phase or semantic conditions in the reduction actions.

Encouraging the modification of global data in a generalized parser creates a data consistency problem that we must solve. We add undo actions that are executed by the parse engine as it backtracks over failed parse attempts. In these undo actions the modifications to

global data can be reverted in order to ensure that the global state remains consistent. We then automate the process with reverse execution. During forward execution, changes that are visible globally are logged. During backtracking they can be automatically reverted.

5.1 Token-Generation Actions

Many programming languages require a scanner that can look up the meaning of identifiers to determine their type before they are sent to the parser. The established practice for dealing with this problem is to use lexical feedback. During forward parsing, semantic actions are responsible for maintaining lookup tables. The lexical analyzer is then responsible for querying the type of an identifier before sending it to the parser. We have many tools that support this method of parsing.

The Haskell programming language allows users to define operators, their precedence, and their associativity. Parsing user-defined operators can be accomplished by applying the lexical feedback technique to sequences of symbols and generating operator tokens with the appropriate precedence level and associativity.

Other modern programming languages rely on an ability to programmatically generate tokens. The off-side rules of Python are an example of this. The whitespace at the beginning of a line must be analyzed, and based on how it compares to the previous line, some number of indent or dedent tokens must be sent to the parser. Off-side rules require an ability to use language-specific program logic to decide how many tokens to send. We need to be able to send none at all, or some arbitrarily large finite number.

Other parsing problems require that we generate tokens with a size previously specified in the input. For example, many network protocols have fragments that carry data payloads.

The length of the payload appears somewhere ahead of the payload. This requires that we be able to programmatically decide how much of the input is used to construct a token. It should be possible to consume no characters, or some arbitrarily large number of characters.

Some programming languages allow for nested comments. Normally, nested comments are parsed by augmenting the scanner with hand-written code that keeps track of the nesting depth. Another possibility is to manually invoke a context-free parser from the scanner.

This relies on an ability to recursively invoke the parsing engine.

In other cases, we may want to treat specific characters as significant only in certain semantic contexts. We should be able to match a pattern, then decide, using language-specific program logic, whether or not the matched token is to be ignored or sent as a token.

To address all of these demands we allow blocks of code to be associated with token patterns. When a token pattern with a generation block matches the input, the block of code is executed and it becomes responsible for sending tokens to the parser. It may send any number of tokens. Tokens may be constructed by pulling text from the input stream, parsing the input stream, or using immediate strings. Token types may be given literally or they may be specified as an integer value. Attributes may be used to pass additional information with the token.

In Section 4.4.1 we give the syntax of token-generation actions and in Figure 5.1 we give an example. A symbol is looked up in the symbol table and a token of the corresponding type is sent to the parser. The text of the match is manually removed from the token stream and used as an argument to the token construction function make_token.

5.2 Reduction Actions

Reduction actions are blocks of code that are associated with a production of a type definition. Following the recognition of the right-hand side of the production, the elements are popped from the stack and replaced with the nonterminal. In the process the reduction action is executed. Reduction actions are common in bottom-up parser generators such as Yacc. They are used for building abstract syntax trees, performing semantic analysis and

# Types to translate to.
token declaration_ref //
token type_ref //
token function_ref //

# The symbol table maps identifiers to numerical values representing
# the above terminal types.
map symbol_table [str int]
symbol_table SymbolTable

token id /( [a-zA-Z_] [a-zA-Z0-9_]* )/
{
    # The match_text inbuilt variable contains the text that was matched by
    # the pattern. We assume that the symbol table has been populated with
    # entries that each link an identifier to a numerical value that
    # represents the type of the identifier. These numerical values are
    # implementation-defined and can be derived from the literal types
    # using the 'typeid' primary expression.
    int TypeId = SymbolTable.find( match_text )

    # The make_token function constructs a token from a
    # type number and a string.
    any Token = make_token( TypeId, pull(stdin, match_length) )
    send( Token )
}

Figure 5.1: An example of a token-generation block. This action looks up the type of an identifier in a symbol table, then creates and sends a new token of that type.


In this thesis we are interested in the use of semantic actions to maintain global state.

When reduction actions are used only for this purpose, they tend to be brief and infrequent, but critical for acquiring a correct parse. Examples of the tasks that these reduction actions perform include inserting or deleting dictionary items, attaching or detaching list items, and pushing or popping stack items.

Since we may need to unparse a reduced node, we always preserve the children of a reduction. This results in a stack of parse trees. Since we are parsing for transformation and will probably manipulate the resulting tree, we do not worry about the waste that keeping all subtrees implies. We treat it as a necessary cost.

Reduction actions have access to both the left-hand side of the reduction (lhs) and the right-hand-side elements (r1, r2, r3 ... rN). The lhs variable may be replaced or modified, which affects the result of the parse. Right-hand-side elements are distinct values; modifying them will not result in the modification of the lhs value or the returned parse tree. To avoid confusion the right-hand-side elements are made constant.

In Section 4.4.2 the syntax of reduction actions is shown. Below we give an example of a reduction action. It is used to insert a symbol into a symbol table, which supports the lookup example in the previous section.

def declaration
    ['var' identifier ';']
    {
        SymbolTable.insert( r2, typeid declaration_ref )
    }

5.3 Semantic Predicates

In Section 4.4.2 the reject statement is defined. This statement causes the parser to act as if a parse error has occurred and begin backtracking. This statement can be used to implement semantic predicates [38], which are tests that query the state of the parse and use the result of the test to disambiguate the parse. In our system this is accomplished by rejecting the current reduction and invoking the backtracker.

def something
    ['a' production '!']
    {
        if ( !someCondition() )
            reject
    }

5.4 Undoing Generation and Reduction Actions

So far we have described a system that supports generalized parsing and the building of global data structures during scanning and parsing. But in the capacity described thus far, the features enabling the manipulation of global data can work only with deterministic grammars. Our system needs to be augmented with support for nondeterministic grammars if we are to encourage the use of generation and reduction actions for the manipulation of global data.

When the parser backtracks over some action that has modified the global state we need the changes to the global state to be undone. To enable this we add undo actions to the parsing algorithm. When the parser backtracks it replaces a nonterminal with the child nodes it previously reduced from. At this time an undo action can be executed. When the parser sends a token back to the input stream, an undo action for the token-generation action that was used to construct the token can be executed. If undo actions exactly revert any changes to the global state that the forward reduction actions made, we are free to have a generalized parser with token-generation and reduction actions that manipulate global data structures.

The backtracking portion of our algorithm now behaves as follows. On each iteration of the unparsing loop, one item is popped from the top of the parse stack. If the node is a token it is pushed back to the input stream. If the token was generated by a token-generation action, the undo action for the generation action is executed. If the popped node is a nonterminal, the production’s undo action is executed, the node is discarded, and the children of the node are pushed onto the parse stack. In both cases, if the popped node contained an alternate action then unparsing terminates and forward parsing resumes, with the initial action to take determined by the next alternate parsing decision that was stored in the popped node.
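The following C++ sketch restates that unparsing loop in code. The node structure and the engine hooks (stubbed here) are illustrative stand-ins for the actual parser internals, not our implementation.

    #include <stack>
    #include <vector>

    struct Node {
        bool isToken;                 // token vs. nonterminal
        bool hasGenAction;            // token came from a token-generation action
        bool hasAlternate;            // an alternate parsing decision is stored here
        std::vector<Node*> children;  // children preserved when this node was reduced
    };

    // Engine hooks, stubbed for the sketch.
    void execUndoGenerationAction( Node* ) {}
    void execUndoReductionAction( Node* ) {}
    void pushBackToInput( Node* ) {}
    void resumeForwardParsing( Node* ) {}

    // One round of backtracking: unparse until a node carrying an alternate
    // parsing decision is found, then resume forward parsing from it.
    void unparse( std::stack<Node*> &parseStack )
    {
        while ( !parseStack.empty() ) {
            Node *node = parseStack.top();
            parseStack.pop();

            if ( node->isToken ) {
                if ( node->hasGenAction )
                    execUndoGenerationAction( node );   // revert generation-time changes
                pushBackToInput( node );                // send the token back to the input
            }
            else {
                execUndoReductionAction( node );        // revert reduction-time changes
                // Restore the children the nonterminal was reduced from; pushing
                // left to right leaves the rightmost child on top of the stack.
                for ( Node *child : node->children )
                    parseStack.push( child );
            }

            if ( node->hasAlternate ) {
                resumeForwardParsing( node );           // take the next alternate decision
                return;
            }
        }
    }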

If a forward action sets a global variable to a new value, the reverse action must restore the old value.

def declaration
    str toRemove

    ['var' identifier ';']
    {
        bool wasInserted = SymbolTable.insert( r2, typeid declaration_ref )
        if ( wasInserted )
            lhs.toRemove = r2
    }
    __undo {
        # If a symbol was inserted, remove it. Note that attributes are
        # initialized to nil.
        if ( lhs.toRemove != nil )
            SymbolTable.remove( lhs.toRemove )
    }

Figure 5.2: A reduction action that inserts a symbol into a symbol table. The corresponding undo action removes the symbol if it was successfully inserted.

If a forward parsing action adds some value to a dictionary, the reverse action must remove it. If a forward action pops some value from a stack, the reverse action must push it back. Provided that proper housekeeping is done, the addition of undo actions can permit the parser to backtrack over productions that have modified the global state in arbitrary ways.

In Figure 5.2 we show how an undo action can be used to revert changes to the global state during backtracking. Note that the __undo block is not part of our language. We show it here to illustrate what must take place in order for global-state modifications to work in the presence of backtracking. In the next section we describe our final solution, which is to automate the process.

User-written reduction actions are normally very short. The amount of code that modifies global variables during parsing for the purpose of feedback usually amounts to just a few instructions. If a global variable is set or a data structure is modified, it is often easy to implement the reverse action. But even though reverse actions are normally short, it is easy for mistakes to creep in. It is critical that the reverse code be correct; otherwise the global state becomes incorrect with respect to the context-free state of the parse. This results in spurious parse errors later on.

Most undo actions are trivial. However, in some cases undo actions are more complicated, especially when the forward actions contain conditional branches. The task gets more complicated still when these become nested within each other. In other cases, there are dependencies between variables that must be respected by reverting the forward changes in the opposite order. Consider the reduction action in Figure 5.3. This reduction action is part of a C++ grammar and it is the largest reduction action in the entire grammar. The declarator_id defines the new name in most declarations, including variables, functions, typedefs, and template variables and functions.

In Figure 5.4 the undo action for declarator_id is given. The action to take depends on what type of object was declared in the forward reduction action. The logic of the forward action must therefore be reproduced in enough detail to ensure that for every case of the action, the correct reverse modifications are made during backtracking. In this example the reverse modifications are performed in the opposite order, though this is not required here because there are no dependencies between the modifications of global state.

5.5 Automating Undo Actions with Reverse Execution

Instead of asking the user to write undo actions, we wish to automatically revert changes to the global state during backtracking. The advantages of this are that it guarantees that undo actions are free of errors and it saves the user methodical work that is amenable to automation.

5.5.1 Reverse Execution

Forward program execution takes a program through a sequence of states. A program that is able to execute in reverse is one that can stop and move backwards through the sequence of states that it has previously passed through, perhaps even backing up all the way to the initial state.

# Namespaces: declaration and lookup.
global object_list declNs
global object_list lookupNs

# Declaration and declarator data accumulated during parsing.
global declaration_data_list declarationData
global declarator_data_list declaratorData

def declarator_id
    lang_object obj

    [declarator_id_forms]
    {
        str name = r1.lookupId.data
        ptr lang_object qualObj = r1.lookupId.qualObj

        ptr lang_object parentObj = declNs.top
        if ( qualObj )
            parentObj = qualObj

        # Decide if we are declaring a constructor or a destructor.
        bool isConstructor
        if ( parentObj == r1.lookupId.obj )
            isConstructor = true

        if ( parentObj->specializationOf &&
                parentObj->specializationOf == r1.lookupId.obj )
            isConstructor = true

        ptr lang_object obj = nil
        if ( name && !isConstructor && declarationData.top.isFriend == 0 ) {
            if ( declarationData.top.isTypedef ) {
                obj = createLangObject( TypedefType, name, lookupNs.top )
                obj->typedefOf = declarationData.top.typeObj
                insertObject( parentObj, name, obj )
            }
            else if ( !qualObj ) {
                if ( declarationData.top.isTemplate )
                    obj = createLangObject( TemplateIdType, name, lookupNs.top )
                else
                    obj = createLangObject( IdType, name, lookupNs.top )
                insertObject( declNs.top, name, obj )
            }
        }

        # Store the new object for use by the undo action.
        lhs.obj = obj

        declaratorData.push( construct declarator_data
            ( qualObj: qualObj, lookupObj: lookupNs.top ) [] )

        # If the declarator is qualified, push the qualification to the lookup
        # stack. Also save it in the declarator data so it can be passed to a
        # function body if needed.
        if ( qualObj ) {
            lookupNs.push( qualObj )
            declaratorData.top.lookupObj = qualObj
        }
    }

Figure 5.3: The declarator_id reduction action from a C++ grammar. This reduction action contains a number of global data manipulations that are dependent on local and global data. This makes writing a correct undo action difficult.

__undo {
    ptr lang_object qualObj = r1->lookupId.qualObj
    if ( qualObj )
        lookupNs.pop()

    declaratorData.pop()

    # If we made a new object, we must remove it.
    ptr lang_object namedObj = lhs.obj
    if ( namedObj ) {
        if ( namedObj->type == TypedefType ) {
            ptr lang_object parentObj = declNs.top
            if ( qualObj )
                parentObj = qualObj

            parentObj->remove( namedObj )
        }
        elsif ( namedObj->type == IdType || namedObj->type == TemplateIdType )
            declNs.top->remove( namedObj )
    }
}

Figure 5.4: Undo action for declarator_id. Though the number of fixes to the global state is small, the logic required to arrive at the correct changes is not simple.

Reverse execution is a convenient (but not always practical) method for implementing nondeterministic algorithms [34].

Research into reverse execution has been carried out primarily for the purpose of enhancing debuggers [1, 91, 15, 23, 2]. The motivation is to develop a debugger that can step backwards as well as forwards, allowing the user more freedom in locating the source of an error. Other uses include fault-tolerant programming [21], speculative programming [45] and program animation [14]. The Dynalab program animation environment allows students to visualize program execution in the forward and reverse directions. Reverse execution has also been suggested as a convenient solution to the problem of programming undo/redo features in interactive applications [20].

Ideally, a reverse execution system would perform the inverse function of each statement to regenerate old states during backtracking. But due to destructive updates of memory, inverse functions do not always exist. A reverse execution system must be prepared to save old states during forward execution and restore them when needed.

The computation of inverse functions and the logging of states can be approached at different levels of program abstraction. Inverse functions can be computed on source-level statements and logging can be carried out on source-level variables [1, 15]. Inverse function computation and state saving and restoration can also happen at lower levels. Destructive updates to the memory locations visible to bytecode can be logged and later restored [23, 2].

There has also been research into adding hardware support for reverse execution [91].

5.5.2 An Instruction-Logging Reverse Execution System

We have modified the compiler and virtual machine for our tree transformation language so that they automatically generate undo actions. We have taken a simple logging approach. During forward execution any virtual machine instructions that must be reverted log the changes that they make. The name and old value of variables are encoded by the virtual machine in reverse bytecodes, which can later be executed when the action must be undone. This approach works well because there are normally only a few instructions in token-generation and reduction actions that must be undone.

As the virtual machine moves forward through bytecodes it records the reverse instructions that must be executed. The list of statements that were executed in the forward direction must be executed in the opposite order during the undo action. The virtual machine uses a list of reverse instructions and always appends the latest reverse instruction to the end of the list. The reverse code is executed by traversing the list backwards.

Most source-level statements are implemented using a group of virtual machine instructions. For example, one instruction will load the tree representing the global state, the next instruction will load an attribute from this tree, and the final instruction will modify that tree. To revert the effect of a single statement, the reverse instructions for a statement must follow the same sequence of loads and then, at the end, undo the operation.

Forward Statements    Instructions Executed    Recorded Instructions and Length
statement1            s1a, s1b, s1c            r1a, r1b, r1c, 3
statement2            s2a                      r2a, 1
statement3            s3a, s3b                 r3a, r3b, 2

Reverse Instructions Executed
r3a, r3b, r2a, r1a, r1b, r1c

Figure 5.5: These tables show the order in which reverse instructions are written, grouped and executed for a sequence of statements.

A reverse traversal of the reverse instruction list will not suffice for loading then reverting. The instructions that load a variable and revert it must be executed in the same order in which they were written in the reverse execution buffer.

We accomplish the grouping of instructions by writing group lengths at the end of each instruction group. This length can then be used by the reverse virtual machine to execute the instructions in one group in the forward direction, while overall execution moves backwards. The reverse virtual machine starts at the end of the list, reads a length l, jumps backwards by l bytes, executes in the forward direction until the l bytes are consumed, and then jumps back again by l + 1 bytes and repeats.
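A minimal C++ sketch of this driver loop follows. The buffer layout matches the description above, while the names and the stubbed group executor are illustrative assumptions rather than our actual implementation.

    #include <cstdint>
    #include <cstddef>
    #include <vector>

    // Dispatch the reverse opcodes of one group; stubbed for the sketch.
    void execReverseGroup( const uint8_t *group, std::size_t len ) { /* ... */ }

    // Walk the reverse-instruction buffer from the end. Each group of reverse
    // instructions is followed by a one-byte group length; groups are taken in
    // reverse order, but the instructions inside a group run forward so that
    // the loads precede the store they set up.
    void executeReverseCode( const std::vector<uint8_t> &reverseCode )
    {
        std::size_t pos = reverseCode.size();
        while ( pos > 0 ) {
            uint8_t len = reverseCode[pos - 1];        // read the group length
            std::size_t groupStart = pos - 1 - len;    // jump back over the group
            execReverseGroup( reverseCode.data() + groupStart, len );
            pos = groupStart;                          // step back over the group and its length byte
        }
    }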

The tables in Figure 5.5 demonstrate the order in which reverse instructions are logged and executed. In the forward direction, shown in the first table, the reverse instructions are written in the same order as the corresponding forward instructions are executed. The group length is written at the end of each group. In the reverse direction, shown in the next table, the grouped instructions are executed in the forward direction. However, the groups are executed in reverse.

The position of an instruction within a group is predetermined. Instructions can either be at the beginning of a group, the middle, or the end. Each instruction that sets a reverse instruction knows whether to initialize the group length, increment it, or increment it and then write it out.

case IN_LOAD_GLOBAL_WV: {
    tree_upref( program->global );
    push( program->global );

    /* Write the reverse instruction. */
    reverseCode.append( IN_LOAD_GLOBAL_BKT );
    rcodeUnitLen = 1;
    break;
}
case IN_GET_FIELD_WV: {
    short field;
    read_half( field );

    Tree *obj = pop();
    tree_downref( program, obj );

    Tree *notShared = get_field_copy( program, obj, field );
    tree_upref( notShared );
    push( notShared );

    /* Write the reverse instruction. */
    reverseCode.append( IN_GET_FIELD_BKT );
    reverseCode.appendHalf( field );
    rcodeUnitLen += 3;
    break;
}
case IN_SET_TOKEN_DATA_WV: {
    Tree *tree = pop();
    Tree *val = pop();

    Head *oldval = tree->tokdata;
    Head *head = ((Str*)val)->value;
    head_upref( head );
    tree->tokdata = head;

    /* Write the reverse instruction. */
    reverseCode.append( IN_SET_TOKEN_DATA_BKT );
    reverseCode.appendWord( (Word)oldval );
    rcodeUnitLen += 5;
    reverseCode.append( rcodeUnitLen );

    tree_downref( program, tree );
    tree_downref( program, val );
    break;
}

Figure 5.6: Three instructions that modify global state and therefore must log changes by generating reverse instructions. These instructions are always at the beginning, middle and the end of a group, respectively.

For example, the load of the global object and the load of a pointer are both always at the beginning of a group. The load of a field for writing is always in the middle. The setting of a field is always at the end.

In Figure 5.6 the implementation of three virtual machine instructions is shown. The IN_LOAD_GLOBAL_WV instruction is always at the beginning of a group. After loading the global tree it writes the reverse instruction and then initializes the current group length to one byte. The IN_GET_FIELD_WV instruction is always preceded by some tree load. There are several variants of this instruction and this particular one is always used when the value that is loaded is going to be written to.

def two_numbers
    int i       # Field number 0
    int j       # Field number 1
    []

global two_numbers TN =
    construct two_numbers []

def start
    [ident*]
    {
        TN.i = 0
        TN.j = 1
    }

Forward:
    IN_LOAD_INT 0
    IN_LOAD_GLOBAL_WV
    IN_GET_FIELD_WV 0       # Load TN
    IN_SET_FIELD_WV 0       # Set i
    IN_LOAD_INT 1
    IN_LOAD_GLOBAL_WV
    IN_GET_FIELD_WV 0       # Load TN
    IN_SET_FIELD_WV 1       # Set j

Reverse:
    IN_LOAD_GLOBAL_BKT
    IN_GET_FIELD_BKT 0      # Load TN
    IN_SET_FIELD_BKT 1      # Restore j
    IN_LOAD_GLOBAL_BKT
    IN_GET_FIELD_BKT 0      # Load TN
    IN_SET_FIELD_BKT 0      # Restore i

Figure 5.7: A small example of automatically generated reverse instructions. Reverse instructions that come from the same statement are executed in the order that they were logged, but at the statement level there are groups of instructions that are executed in the reverse order.

Therefore this instruction is never at the end of a group. It writes the reverse instruction then increments the group length. The last instruction shown, IN_SET_TOKEN_DATA_WV, modifies the value on the top of the stack and then discards it. Therefore it is always at the end of a group. It appends the reverse instruction, increments the group length, then writes the group length to the reverse instruction buffer.

In the example in Figure 5.7 we give a small reduction action and a corresponding trace of bytecodes in the forward and reverse directions. The reduction action modifies two attributes, i and j, of a global variable TN. Following the example code, we show the virtual machine instructions that are executed in the forward direction and then the instructions that are executed in the reverse direction. Each group contains two loads: first the global tree, which contains all global variables, then the TN variable from the global tree. The third instruction in each group modifies the appropriate attribute of TN.

5.5.3 Specializing Virtual Machine Instructions

Reverse execution of arbitrary programs is a costly activity. To reduce work we are selective about which state-change operations we generate reverse instructions for. Since we always revert entire token-generation and reduction actions, we only need to revert changes to global and dynamic variables. We never need to revert changes to local variables because we do not backtrack into the middle of a token-generation or reduction action.

Being selective about which state-change operations are undone requires that we specialize bytecode instructions. The compiler analyzes statements and determines which modified variables are globally visible and therefore must be reverted during backtracking. From the static analysis the compiler decides which load and store bytecodes to use when generating the code.

The load bytecodes are categorized by the purpose of the load. If the purpose is to eventually read from the loaded value, then the suffix is R, for read. If the purpose is to eventually write, but the write will not need to be reverted, then the suffix is WC, for write commit. If the value is loaded for writing, and the write will need to be reverted, then the suffix is WV, for write reversible. The reverse load instructions have the suffix BKT.

The store bytecodes have three forms. If the store does not need to be undone during backtracking, as is the case with storing values into trees that are local to the current scope, then the suffix is WC. If the store will need to be undone during backtracking, then the suffix is WV. The reverse store instructions have the suffix BKT.

Below we show the classification of load and store bytecode instructions with information about the additional tasks they must perform. These tasks for load and store bytecodes apply to the instructions for most concepts in the language, including dynamic variables, tree attributes, list functions and map functions. Not all instruction variants are necessary.

For example, there is an IN_GET_LOCAL_WC instruction but no IN_GET_LOCAL_WV instruction because changes to local variables do not need to be reverted.

Load Instructions

• load_R – Load a value for reading.

Do not copy the tree. Do not generate a reverse instruction.

• load_WC – Load a value for writing (with commit).

Copy the tree if shared. Do not generate a reverse instruction.

• load_WV – Load a value for writing (with reverting).

Copy the tree if shared. Generate a reverse instruction.

• load_BKT – Load a value for reverting.

Copy the tree if shared.

Store Instructions

• store_WC – Store (with commit).

Copy the tree if shared. Do not generate a reverse instruction.

• store_WV – Store (with reverting).

Copy the tree if shared. Generate a reverse instruction.

• store_BKT – Revert a store.

Copy the tree if shared.
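To connect this classification to code generation, a compiler pass might choose the instruction variant along the following lines. This C++ fragment is a hypothetical illustration of the decision only; the enum, the function name, and its parameters are assumptions, not the actual implementation.

    // Load-instruction variants, following the classification above.
    enum class LoadVariant { R, WC, WV };

    // Decide which load variant to emit for a variable reference, given
    // whether the statement will write through it and whether the variable
    // is globally visible (global or dynamic, as opposed to local).
    LoadVariant chooseLoadVariant( bool willWrite, bool globallyVisible )
    {
        if ( !willWrite )
            return LoadVariant::R;    // read only: no copy, no reverse instruction
        if ( !globallyVisible )
            return LoadVariant::WC;   // local write: commit, never reverted
        return LoadVariant::WV;       // reversible write: log a reverse instruction
    }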

In Figure 5.8 we show all variants of the attribute load and store bytecodes. The bytecodes with the R suffix or the WC suffix do not need to be reverted. The bytecodes with the WV suffix do, so they write to the reverse instruction log. The instructions with the BKT suffix are the instructions that are executed in the reverse direction.

case IN_GET_FIELD_R: {
    short field;
    read_half( field );

    Tree *obj = pop();
    tree_downref( program, obj );

    Tree *val = get_field( obj, field );
    tree_upref( val );
    push( val );
    break;
}
case IN_GET_FIELD_WC: {
    short field;
    read_half( field );

    Tree *obj = pop();
    tree_downref( program, obj );

    Tree *notShared = get_field_copy( program, obj, field );
    tree_upref( notShared );
    push( notShared );
    break;
}
case IN_GET_FIELD_WV: {
    short field;
    read_half( field );

    Tree *obj = pop();
    tree_downref( program, obj );

    Tree *notShared = get_field_copy( program, obj, field );
    tree_upref( notShared );
    push( notShared );

    /* Write the reverse instruction. */
    reverseCode.append( IN_GET_FIELD_BKT );
    reverseCode.appendHalf( field );
    rcodeUnitLen += 3;
    break;
}
case IN_GET_FIELD_BKT: {
    short field;
    read_half( field );

    Tree *obj = pop();
    tree_downref( program, obj );

    Tree *notShared = get_field_copy( program, obj, field );
    tree_upref( notShared );
    push( notShared );
    break;
}
case IN_SET_FIELD_WC: {
    short field;
    read_half( field );

    Tree *obj = pop();
    Tree *val = pop();
    tree_downref( program, obj );

    /* Downref the old value. */
    Tree *prev = get_field( obj, field );
    tree_downref( program, prev );

    set_field( program, obj, field, val );
    break;
}
case IN_SET_FIELD_WV: {
    short field;
    read_half( field );

    Tree *obj = pop();
    Tree *val = pop();
    tree_downref( program, obj );

    /* Save the old value, then set
     * the field. */
    Tree *prev = get_field( obj, field );
    set_field( program, obj, field, val );

    /* Write the reverse instruction. */
    reverseCode.append( IN_SET_FIELD_BKT );
    reverseCode.appendHalf( field );
    reverseCode.appendWord( (Word)prev );
    rcodeUnitLen += 7;
    reverseCode.append( rcodeUnitLen );
    break;
}
case IN_SET_FIELD_BKT: {
    short field;
    Tree *val;
    read_half( field );
    read_tree( val );

    Tree *obj = pop();
    tree_downref( program, obj );

    /* Downref the old value. */
    Tree *prev = get_field( obj, field );
    tree_downref( program, prev );

    set_field( program, obj, field, val );
    break;
}

Figure 5.8: Load and store instructions for tree attributes. Only the instructions with the WV suffix need to be reverted. The instructions with the WC suffix must copy shared trees, as do the instructions with the WV and BKT suffixes.

5.5.4 Reverse Execution Example

In Section 5.4 we noted that reduction actions are often short and simple, with similar undo actions to complement them. Sometimes they are longer and more complex, making the manual writing of undo actions difficult. This is illustrated by the declarator_id reduction and undo actions for C++, shown in Section 5.4. Nevertheless, the amount of reverse code that needs to be executed is still small because the global state normally contains only a few data structures. Modifications to these data structures are minor. Most of the work is actually handled by the context-free parsing system; we just need to provide it with a little assistance by supplying some additional state for handling the context-dependent portions.

In Figure 5.9 we give an example trace of the forward execution of the declarator_id reduction action that is shown in Figure 5.3. The instructions that log reverse bytecodes are marked with asterisks. A trace of the reverse instructions that were automatically generated is also shown. This trace was generated using the declarator id C::f. This type of declarator is common in the definition of functions that belong to a class.

5.5.5 A Necessary Language Restriction

Reverting changes to values that are directly named by a global variable is a matter of recording the name and the old value. Reverting changes to values that are accessed via an iterator is not as simple. Reverting such a change requires first loading the root of the iterator, then reproducing the traversal that led to the tree that was modified. We cannot simply store a reference to the tree since automatic tree copying can cause references to become invalid.

Consider the example in Figure 5.10. Suppose we need to revert the action. In the forward direction we log the modifications made by the iterators by taking references to the trees that are visited by the iterator. In the first iterator loop over Start, a number of references would be recorded into reverse instructions. The next statement takes a copy of the global variable and causes the value to become shared.

Forward:
    (full forward trace not reproduced here; the six instructions marked with
    asterisks in the original trace log reverse bytecodes: IN_PTR_DEREF_WV,
    IN_GET_FIELD_WV 3, IN_MAP_STORE_WV, IN_LOAD_GLOBAL_WV,
    IN_GET_FIELD_WV 12, IN_LIST_APPEND_WV)

Reverse:
    IN_LOAD_GLOBAL_BKT
    IN_GET_FIELD_BKT 12
    IN_LIST_APPEND_BKT
    IN_PTR_DEREF_BKT
    IN_GET_FIELD_BKT 3
    IN_MAP_STORE_BKT

Figure 5.9: A forward and reverse trace of the declarator_id reduction action used in our C++ grammar, as run on the input C::f in the context of a class member function definition. The reverse instructions are few compared to the forward instructions.

def name
    int i
    [ident]

global Start

def start
    [name*]
    {
        Start = lhs

        for N1: name in Start       # The Start tree is modified.
            N1.i = 1

        start Copy = Start          # The value in Start becomes shared
                                    # by two variables.

        for N2: name in Start       # Modification causes the Start
            N2.i = 1                # tree to be copied.
    }

Figure 5.10: Code that demonstrates why we must prevent iteration over globally accessible values. Doing so would violate the value semantics.

The second iterator loop must now copy the tree because it is shared. This time the references stored in the reverse instructions will be different from those stored during the first iterator loop. During reverse execution, the undoing of the second iterator loop will work fine. However, the undoing of the first iterator loop will not affect Start, but instead the trees that make up the Copy variable.

Since we cannot store references to values during iteration, and the alternate strategy of storing tree traversal directions for reproduction during backtracking is impractical, we disallow iteration over global and dynamic variables. Iteration over globally visible data causes other problems for our tree transformation language. These problems are described in Section 4.10.4.

5.6 Context-Free Pattern and Constructor Parsing

The primary purpose of our parsing engine is to parse user input, though we also use it to parse transformation rule patterns and replacements (Sections 4.5 and 4.6). Token-generation and reduction actions are executed when user input is parsed, but should we also be executing them when we parse patterns and constructors? If we do not execute them, then we are not using the same algorithm to parse input as we are to parse patterns and constructors. This discrepancy could lead to confusion. However, if we do execute them, then there are a number of other problems to consider.

Context-dependent parsing requires an instance of global data. If we execute actions for pattern and constructor parsing then we have to decide if global data is shared among all patterns, or if a new version is instantiated for each pattern. We also need to decide if initialization routines should be executed. But what happens when the language becomes more sophisticated and it supports external libraries such as connection libraries?

Should it be required that all external dependencies function when programs are compiled?

More seriously, what happens when a reduction action contains a pattern that must be parsed using the reduction action?

Rather than pursue solutions to all of these problems, we choose to support only context-free parsing of patterns and constructors. We note that context-dependent parsing is most often used for resolving grammar ambiguities, and therefore we feel it is sufficient to perform only context-free parsing on patterns and constructors. The user can resolve ambiguities in these situations by refining patterns with more information. For example, in the code that follows, only literal C++ program text is given in the pattern. Context-dependent parsing features would be required at compile time to look up the type of C to determine that it is a class name.

match statement "C v(1);"

Instead of relying on context-dependent parsing features to resolve the identifier C to a class name, the class_name type can be used directly in the pattern and afterwards the text that it matched can be tested against a literal string to enforce the correct name.

match statement [ClassName: class_name " v(1)"]
    && ClassName == 'C'

5.7 Summary

In this chapter we complete the parsing method of our source transformation system by making parse-time modifications to global data structures compatible with generalized parsing. This enables generalized context-dependent parsing.

In Sections 5.1 and 5.2 we describe semantic actions and token-generation actions that can be used to maintain global information and programmatically generate tokens, possibly with the assistance of global data. Global data can also be used to influence the parse with semantic predicates, as we describe in Section 5.3.

In Section 5.4 we add undo actions. These actions are a class of semantic actions that are executed by the parse engine as it undoes parsing. A forward semantic action is executed when a sequence of nonterminals and terminals on the top of the stack is replaced with a nonterminal that derives the sequence (a reduction). An undo action is executed when the engine replaces a nonterminal with the sequence of items it previously reduced from. If the undo action reverts any changes to the global state that were made by the forward action, then context-dependent parsing can be made generalized.

Asking the user to write undo actions is possible. However, writing these actions is somewhat tedious and it requires that a correct undo action be derived from the forward action. This is a situation that is amenable to automation. In Section 5.5 we make use of reverse execution to free the user from having to write undo actions and provide assurance that the global state cannot become corrupt due to mismanagement of global data.

In the next chapter we demonstrate the parsing capabilities of our system. We provide example solutions to a selection of parsing problems taken from data languages, network protocols and programming languages.

Chapter 6

Examples

Our transformation system can be used to parse and transform computer languages that normally cause grammar-based parsing systems trouble. Since our parsing algorithm is generalized we don’t have to contend with parsing algorithm restrictions, as is the case with LALR(1) [28] and LL(1) [62] systems. We can use grammars found in language definitions as they are given and perform stepwise refinement on them while continually generating and testing the parser. Since our system supports the modification of global state during parsing we can use established techniques for supplementing the grammar with any data structures that are necessary to parse context-dependent language features. Since our lexical analysis system is grammar-dependent we can parse languages with lexical rules that vary by grammar context. Since both the lexical and context-dependent features work in combination with the generalized parser we are not restricted in our ability to evolve or combine grammars.

We demonstrate the capabilities of our system by solving parsing problems that exist in common computer languages. We do not present entire grammars, but instead focus on challenging characteristics of these languages and show how to parse the language fragments using our system. We sample problems from the communication protocols HTTP [31] and DNS [70], the document markup language HTML [81], and the programming languages Ruby [33], Python [80] and C++ [44]. In the course of this we will demonstrate the lexical analysis capabilities of our system, the use of ordered choice, and the context-dependent capabilities of our system.

The lexical features of our system will be demonstrated by showing how to parse languages that do not have uniform lexical rules, how to parse nested comments, how to handle grammar-dependent newline interpretation and how to parse binary protocols. We will demonstrate ordered choice by using our system to resolve ambiguities in language definition grammars. We will demonstrate the context-dependent parsing abilities of our system by showing how to parse markup languages (including broken HTML markup), how to parse here documents, how to implement off-side rules, how to do symbol table maintenance, how to translate token types according to a symbol table, and how to handle unusual parsing requirements such as the partially delayed parsing of C++ class method bodies.

6.1 HTTP

There are two characteristics of HTTP that we give solutions for. At the top level of the language there are no contiguous streams of tokens. There is a request line, a list of headers and the content of the request. We present a small grammar that parses a request into the request line components and the headers. Though it is not very useful on its own, this grammar demonstrates how to define languages without contiguous token streams. Each token in the grammar is defined in a lexical region of its own.

The second parsing problem that we explore is encountered when we drill down further into the language. To extract additional details of a request the header fields must be parsed. Some of these fields allow nested comments. We present a solution to the problem of nested comments. Though this problem is not exclusive to HTTP, we take this opportunity to demonstrate it.

6.1.1 Varying Lexical Rules

At a high level, HTTP is a simple protocol. It consists of a request method (for example GET or POST), the URI of the request, a protocol version string, a sequence of headers, then the free-form content of the request. The headers are simply name-value pairs, with the name and value separated by a colon. Most header field content is on a single line; however, field content may be extended onto multiple lines by preceding each extension line with whitespace. HTTP is different from programming languages in that it is not composed of a sequence of tokens taken from the same pool of lexical definitions. The request line components, the header names and the header fields each require different lexical rules.

We demonstrate rudimentary HTTP request parsing for the purpose of showing how languages that do not have a uniform token stream can be parsed. We simply break up a request into the request line components and the list of headers. Inside headers there may be more complex syntax and semantics; however, we do not consider header field content in this example.

In the HTTP grammar in Figure 6.1 no token is specified in an explicitly defined lexical region, and therefore each token is assumed to be defined in a lexical region of its own (Section 4.4.1). This grammar also makes use of regular language definitions for defining character classes. These definitions may be used in token definitions and result in a tidier specification.

6.1.2 Nested Comments

In the HTTP grammar of the previous section we do not delve into the syntax of header field content. When we explore the syntax of the headers we find that some HTTP headers permit comments that are delimited by the single parentheses symbols ( and ). These comments can be nested.

The HTTP definition includes comments in the grammar and does not stipulate that they be removed during the lexical phase.

#
# Character classes
#
rl CTL /0..31 | 127/
rl CR /13/
rl LF /10/
rl SP /32/
rl HT /9/
rl CHAR /0..127/
rl separators / '(' | ')' | '<' | '>' | '@' | ',' | ';' | ':' | '\\'
              | '"' | '/' | '[' | ']' | '?' | '=' | '{' | '}' | SP | HT /
rl token_char /CHAR - CTL - separators/

#
# Literal tokens
#
literal 'HTTP/', ' ', ':'
token CRLF /CR LF/

#
# Request Line
#
token method /token_char+/
token request_uri /(^SP)+/
token http_number /digit+ '.' digit+/

def http_version
    [ 'HTTP/' http_number ]

def request_line
    [method ' ' request_uri ' ' http_version CRLF]

#
# Header
#
token field_name /token_char+/
token field_value
    /(^(CR|LF) | CR LF (SP|HT))* CR LF/

def header
    [field_name ':' field_value]

#
# Request
#
def request
    [request_line header* CRLF]

Figure 6.1: A rudimentary grammar for the high-level components of HTTP. Each token is defined in a lexical region of its own. Because of this, the parser attempts to scan each token using an independent scanner invocation.

This can be handled using most parsing systems by defining the comment as a nonterminal and referencing it in the grammar where needed.

This is fairly easy to do. However, to demonstrate the abilities of our parsing system we parse a superset of the specification and allow nested comments anywhere by removing comments during the lexical phase. This represents a challenge for many other systems.

The syntax of nested comments cannot be captured using regular expressions; therefore most parsing systems require the user to augment the scanner with manually written code to parse nested comments.

# Primary lexical region.
lex start
{
    ignore /[ \t\n\r\v]+/
    token id /[a-zA-Z_][a-zA-Z0-9_]*/

    token open_paren /'('/
    {
        send_ignore( parse_stop nested_comment( stdin ) )
    }
}

# Lexical region for nested comments.
lex nc_scan
{
    literal '(', ')'
    token nc_data /[^()]+/
}

def nc_item
    [nc_data]
|   [nested_comment]

def nested_comment
    ['(' nc_item* ')']

Figure 6.2: Parsing nested comments by invoking a context-free parse from a token-generation action. The nested comment grammar uses a lexical region of its own.


In the example in Figure 6.2 a token pattern recognizes the opening parenthesis of a comment. A token-generation action associated with this pattern is then used to invoke context-free parsing of the comment. The returned tree is sent to the parser as an ignored tree so that it can be reproduced during unparsing.

6.2 DNS Protocol

The domain name system (DNS) is the name resolution service used on most computer networks, including the Internet. The DNS protocol specifies how messages are passed between hosts that need to look up names in the DNS. Generic transformation systems are not normally used for parsing and rewriting protocols such as DNS. Normally, ad hoc techniques are used, or parser generators that are designed for binary network protocols [66, 32, 76].

The DNS protocol makes heavy use of blocks of data with a pre-specified cardinality, or length in bytes. These length restrictions are the reason other transformation systems are not capable of parsing DNS. The length fields do not cause a problem for us because of the context-dependent parsing abilities of our system. We employ a grammar that uses mostly one token type and controls parsing by enforcement of semantic predicates.

6.2.1 Parsing with One Token Type

In our DNS grammar we use mostly just a single token type, octet. There are a few additional tokens that we have defined for convenience, but they are not necessary. It is possible to use only one token type and enforce all syntax and semantics using a grammar augmented with context-dependent predicates. In Figure 6.3 we show a section of such a grammar. In the case of the rr_type, rr_class, and rdlength definitions, the two-octet values that are accepted are converted to integer identifiers and lengths. These are referenced elsewhere in the grammar, in the enforcement of semantic predicates.

6.2.2 Length and Cardinality

Many low-level communication protocols make use of length and cardinality fields for specifying the number of bytes or the number of units of something in a message. In this situation the length or cardinality is the only clue about when to stop parsing a list of items. Without counting the items the input is ambiguous. This is a major problem for existing transformation systems because they do not provide the tools to deal with the context-dependent nature of these restrictions.

Requiring a particular list length can be achieved by counting items during parsing and rejecting parses in which the length restriction is violated.

# Used for most of the grammar.
token octet /any/

def resource_record
    [name rr_type rr_class ttl rdlength rdata]

def rr_type
    [octet octet]
    {
        rr_type_value = network_uord16( r1, r2 )
    }

def rr_class
    int value

    [octet octet]
    {
        rr_class_value = network_uord16( r1, r2 )
    }

def ttl
    [octet octet octet octet]

def rdlength
    [octet octet]
    {
        rdata_length = network_uord16( r1, r2 )
    }

Figure 6.3: A section of our DNS grammar. We use mostly a single token type and control parsing with semantic predicates. In many cases these are length and cardinality restrictions, but we also need to decode identifiers that indicate payload types.

There are several approaches to this. We show a left-recursive approach and a right-recursive approach. They both make use of a lower bound and an upper bound to squeeze out the correct list length. The left-recursive version is shown in Figure 6.4. The global variable target stores the number of items to accept and it is assumed that this variable is set somewhere ahead of the list. The current number of items is passed up the tree as the list is parsed.

If the target number were immediately ahead of the list the target could be parsed at the root of the list and passed up the tree as well, eliminating the need for a global variable.

global int target

def count_items
    int count

    # Left-recursive list production gives the list an upper bound.
    [count_items item]
    {
        lhs.count = r1.count + 1
        if ( lhs.count > target )
            reject
    }

    # List base case.
|   []
    {
        lhs.count = 0
    }

# A wrapper around the list gives it a lower bound.
def counted_list
    [count_items]
    {
        if ( r1.count < target )
            reject
    }

Figure 6.4: An example of left-recursive counting, with the target length previously stored in a global variable and the current list length passed up the tree. Two conditions are used to squeeze out the correct length.

This would allow a counted list to appear nested inside a counted list of the same type.

If the number of items were not immediately ahead and nesting was required then using a stack of target lengths would suffice.

Items can also be counted using right-recursive lists. In this form the counting tools can be made generic. In Figure 6.5 two empty definitions count_inc and count_end enforce the semantic predicates. The count_inc definition is placed at the head of a recursive list production and prevents a list from getting too long. The count_end definition is placed in the terminator production of a list and prevents a short list from being accepted.

global int target
global int count

def count_items
    [count_inc item count_items]
|   [count_end]

def count_inc
    []
    {
        if ( count < target )
            count = count + 1
        else
            reject
    }

def count_end
    []
    {
        if ( count < target )
            reject
    }

Figure 6.5: Generic right-recursive counting definitions. The count_inc definition is placed at the head of the recursive production of the list, and the count_end definition is placed in the terminator production.

Unfortunately, globals must be used for both the target and the counter variables in this case. If these generic semantic predicates must be used to make a nested list then both global integer values can be replaced with stacks of integer values.

6.3 HTML

As a member of the family of markup languages that includes SGML and XML, HTML requires a parser that is capable of matching opening and closing markup tags. This is a problem for transformation systems that employ purely context-free parsing systems; however, we are able to give a solution for tag matching. Further difficulty is caused by the fact that it is very common for HTML pages to be invalid. Our system is also able to handle broken HTML.

Another interesting property of markup languages is that they do not have uniform token streams. The document data that is marked up can have unique lexical properties and the scanning of the document data must not be dictated by the lexical rules of the tags. This is a problem we can handle by using multiple lexical regions.

6.3.1 Matching Tags

Tags must be matched in order to parse markup languages. HTML presents an additional challenge because over the years the developers of popular web browsers have crafted very forgiving implementations. The result of this is that quality assurance testing of websites that uses only standard web browsers will fail to reveal many errors. Validation must be checked using a parser other than the browser’s, a step which appears to not be carried out very often.

Since broken HTML is pervasive, it is very difficult to make a grammar-based HTML parser and apply it to HTML found in a random sampling of the Web. Undoubtedly, a very robust parser is required. Robust parsing can be implemented using ambiguous island-style grammars and existing systems are capable of this. However, when using the existing tools, island parsing conflicts with the context-dependent requirement of matching HTML tags.

HTML does not pose a problem for our system since we are able to craft a robust parser that takes advantage of what generalized parsing has to offer, yet we can also implement tag matching.

In Figure 6.6 we demonstrate the first step of tag matching in our HTML grammar. Closing tag identifiers are translated into one of two types. A close_id indicates that the closing tag id matches some tag in the stack of currently open tags. A stray_close_id indicates that the closing tag id does not match anything in the stack of currently open tags.

# This lexical region is only for the id in close tags. The id needs to be
# looked up in the tag stack so we can determine if it is a stray.
lex close_id
{
    # Ignore whitespace.
    ignore /space+/

    token stray_close_id //
    token close_id /def_name/
    {
        # If it is in the tag stack then it is a close_id. If not then it is a
        # stray_close_id.
        int send_id = typeid stray_close_id

        tag_stack LocalTagStack = TagStack
        for Tag: tag_id in LocalTagStack {
            tag_id T = Tag
            if ( match_text == T.data ) {
                send_id = typeid close_id
                break
            }
        }

        send( make_token( send_id, pull(stdin, match_length) ) )
    }
}

Figure 6.6: The lexical region for handling close tag identifiers. We translate the id into either a close id that matches a tag in the tag stack, or a stray close id.

Once closing tag ids are translated into either a legitimate closing tag id or a stray, we are able to write the grammar, which we give in Figure 6.7. We define four types of tags: the regular tag that has a matching close id, the unclosed tag that has the nonterminal opt_close_tag parsed as the empty string, the explicitly empty tag, and the stray close tag.

The item definition alternates between these tags and we can rely on the generalized parser to sort out the details. We just need to make sure that when we accept a regular tag, the close_id matches the tag on the top of the stack and that we properly maintain the tag stack by pushing the open tags and popping the close tags.

def tag
    [open_tag item* opt_close_tag]

def open_tag
    ['<' tag_id attr* '>']
    {
        TagStack = construct tag_stack
            [r2 TagStack]
    }

def opt_close_tag
    ['</' close_id '>']
    {
        match TagStack [Top:tag_id Rest:tag_stack]
        if ( r2.data == Top.data )
            TagStack = Rest
        else
            reject
    }
|   []
    {
        match TagStack [Top:tag_id Rest:tag_stack]
        TagStack = Rest
    }

def empty_tag
    ['<' tag_id attr* '/>']

def stray_close
    ['</' stray_close_id '>']

def item
    [DOCTYPE]
|   [tag]
|   [empty_tag]
|   [stray_close]
|   [doc_data]
|   [comment]

def start
    [item*]

Figure 6.7: A robust HTML grammar capable of handling unclosed open tags and stray close tags. This grammar relies on the translation of close tag identifiers in Figure 6.6.

6.3.2 Transforming Unclosed Tags

By designing our HTML grammar to use an optional close tag we make the assumption that tags are normally closed, and an unclosed tag is the exception. Tags that follow an open tag are contained within the open tag. When it is discovered that a close tag is missing, rather than backtrack, the parser simply skips the close. Therefore tags that follow an unclosed tag become contained in it. After a parse we may wish to add in missing close tags. The transformation in Figure 6.8 does just that, thereby fixing broken HTML.

On the other hand, for the purpose of transformation it may be desirable to treat tags with missing close tags as standalone items, rather than items that capture the items that follow.

# Finds unclosed tags that should be closed and adds the missing close tag.
int close( ref start Start )
{
    for TL: item in Start {
        require TL
            [OpenTag: open_tag Inside: item*]

        match OpenTag
            ['<' TagId: tag_id attr* '>']

        if ( should_close( TagId ) ) {
            close_id CloseId = construct close_id
                [TagId.data]

            opt_close_tag CloseTag = construct opt_close_tag
                ['</' CloseId '>']

            # Close the tag.
            TL = construct item
                [OpenTag Inside CloseTag]
        }
    }
}

Figure 6.8: A transformation that closes unclosed tags. The should_close function determines which tags should be transformed.

We can employ a transformation that removes data from inside an unclosed tag and places it after the tag. This transformation is shown in Figure 6.9.

6.4 Ruby

The primary implementation of the Ruby language uses a handwritten lexical analyzer coupled with a Yacc grammar. Given the context-dependent parsing and scanning trickery that easily creeps into a Yacc-based parser with a handwritten lexer, it is no surprise that transformation systems stumble on the Ruby language.

# Finds unclosed tags that should be flattened and puts the content after the
# tag. Afterwards all flattened tags will be empty inside.
int flatten( ref start Start )
{
    for TL: item* in Start {
        require TL
            [OT: open_tag Inside: item* Trailing: item*]

        match OT
            ['<' TagId: tag_id attr* '>']

        if ( should_flatten( TagId ) ) {
            require Inside
                [item item*]

            # Put Trailing at the end of Inside.
            for END: item* in Inside {
                if match END [] {
                    END = Trailing
                    break
                }
            }

            opt_close_tag EmptyCloseTag = construct opt_close_tag []

            # Close the tag and put Inside after it.
            TL = construct item*
                [OT EmptyCloseTag Inside]
        }
    }
}

Figure 6.9: A transformation that moves content from inside an unclosed tag to after the tag. This kind of transformation is useful for fixing the parse of empty tags such as <br>.

In fact, all known Ruby implementations use a derivative of the original handwritten lexer and the Yacc grammar, and build the parser with some Yacc-like parser generator.

There are two difficulties associated with parsing Ruby for which we present solutions in this section. The first is the interpretation of newlines. In some contexts newlines should be ignored, in others they should be treated as significant. The second problem we give a solution for is here documents.

6.4.1 Grammar-Dependent Newline Interpretation

The Ruby language makes use of the semi-colon for terminating statements, but makes the semi-colon optional. Instead, a newline can be used to terminate a statement. This requires that newlines be significant and not ignored by the scanner. However, Ruby also allows lines to be continued when it seems like the reasonable thing to do. If the last lexical token on a line is an operator then the line that follows it continues the expression. Therefore newlines need to be ignored some of the time. In the following code the first newline must be treated as significant because it terminates the assignment statement.

a = b
c

In this next example the newline must be ignored because it follows a binary operator. Other examples of operators that require following newlines to be ignored include comma operators in argument lists and access operators such as . and ::.

a = b +
c

Another interesting aspect of the Ruby language is that a large part of the language can appear at the leaf nodes of an expression. The primary nonterminal contains the function definition, module definition, if statement, while statement, generic code block, and many others. These constructs are normally considered top-level or statement-level definitions in other languages.

a = [ def f(a) return a end, f(1) ]

While these features are interesting, they complicate the matter of lexical analysis, in particular what to do when a newline is seen. The question of ignoring it or making it significant depends on the state of the parser. The Ruby interpreter handles this situation by maintaining state variables in the semantic actions of the parser. These state changes occur throughout the grammar and are difficult to understand. The scanner queries these variables to decide if a newline should be sent to the parser to be accepted as part of the language, or if it should be ignored.

We can solve this problem by using lexical regions instead of explicitly maintaining state. In the example in Figure 6.10 the primary region contains the tokens that may begin an expression. It ignores preceding newlines and therefore the tokens in it can be used to continue an expression onto another line. A secondary region named expr_cont_ops contains the binary operators. In this region, preceding newlines are not accepted and therefore tokens in it cannot be used to continue an expression. The term region contains the significant newline that terminates an expression. We then rely on our system to automatically select the appropriate lexical region according to the current parser state.
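To make the mechanism concrete, the following Python sketch approximates the selection of a lexical region from the parser's current state. The region table, the token names and the acceptable-token query are assumptions made for illustration; they are not the COLM implementation.

    import re

    REGIONS = {
        # start region: newlines are insignificant in front of expression starters
        "start":         {"skip": r"[ \t\n]+", "tokens": [("IDENT", r"[a-z][a-zA-Z_]*"),
                                                          ("INT",   r"[0-9]+")]},
        # operators that continue an expression: newline must not be skipped
        "expr_cont_ops": {"skip": r"[ \t]+",   "tokens": [("PLUS", r"\+"), ("EQ", r"=")]},
        # statement terminators: newline is itself a token
        "terms":         {"skip": r"[ \t]+",   "tokens": [("TERM", r";|\n")]},
    }
    REGION_ORDER = ["start", "expr_cont_ops", "terms"]

    def next_token(text, pos, acceptable):
        """Pick the first region that can produce an acceptable token, then scan."""
        for name in REGION_ORDER:
            region = REGIONS[name]
            if not any(t for t, _ in region["tokens"] if t in acceptable):
                continue
            m = re.match(region["skip"], text[pos:])
            if m:
                pos += m.end()
            for tok, pat in region["tokens"]:
                m = re.match(pat, text[pos:])
                if m and tok in acceptable:
                    return tok, m.group(0), pos + m.end()
        return None

    # After "a =" the parser accepts an expression: IDENT is acceptable, so the
    # start region is chosen and the newline before "b" is skipped.
    print(next_token("a =\n  b\n", 3, {"IDENT", "INT"}))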

Optional Argument Parenthesis

In the previous section we show the lexical regions that we use for Ruby. In some gram- matical contexts the newline is ignored, such as in front of whole expressions. In other contexts it is significant and allowed, such as after whole expressions. These rules allow an expression to continue to the next line after some operator, but before the next whole expression. They also allow newline to terminate a statement when no operator is present at the end of a line.

Unfortunately, there is also a grammatical context in which newline is significant, but not allowed. Ruby allows optional parentheses around function arguments and parameters.

In the following example the expression print a is a call to the print function with one argument, a. However, when a newline separates the words print and a, this becomes two function calls, print and a.

def f a
  print a
  print
  a
end

# The items in this scanner may have an insignificant newline in front.
lex start
{
    # Reserved Words.
    literal '__LINE__', '__FILE__', '__ENCODING__', 'BEGIN', 'END', 'alias',
        'and', 'begin', 'break', 'case', 'class', 'def', 'defined?', 'do',
        'else', 'elsif', 'end', 'ensure', 'false', 'for', 'in', 'module',
        'next', 'nil', 'not', 'or', 'redo', 'rescue', 'retry', 'return',
        'self', 'super', 'then', 'true', 'undef', 'when', 'yield', 'if',
        'unless', 'while', 'until'

    token tNTH_REF /'$' [0-9]+/
    token tBACK_REF /'$' ( '&' | '`' | '\'' | '+' )/
    token tUPLUS /'+'/
    token tUMINUS /'-'/
    literal ')', ',', ']', '{', '}', ':', '.', '::', '->', '!', '~'

    # The text of these items also appears as expression operators. In this
    # lexical region whitespace is insignificant, but as an operator it is
    # significant. We let the backtracker sort it out.
    token tLBRACK /'['/
    token tGVAR /'$' [a-zA-Z_]+/
    token tLPAREN /'('/
    token tIVAR /'@' [a-zA-Z_]+/
    token tSTAR /'*'/
    token tCVAR /'@@' [a-zA-Z_]+/
    token tBAR /'|'/
    token tAMPER /'&'/
    token tINTEGER /[0-9]+/
    token tFLOAT /[0-9]+ '.' [0-9]+/
    token tIDENTIFIER /[a-z][a-zA-Z_]*/
    token tFID /[a-z][a-zA-Z_]* ('!'|'?')/
    token tDSTRING_BEG /'"'/
    token tCONSTANT /[A-Z][a-zA-Z_]*/
    token tSSTRING_BEG /'\''/
    token tXSTRING_BEG /'`'/

    ignore /[ \t\n]+/
    ignore comment /'#' [^\n]* '\n'/
}

# This region does not consume preceding newlines as whitespace and
# therefore the items in it cannot appear at the beginning of a line. Newlines
# are not accepted at all in this lexical region.
lex expr_cont_ops
{
    ignore /[\t ]+/

    literal '+', '-', '*', '**', '/', '%', '^', '|', '&', '||', '&&', '[', '(',
        '=', '<<', '>>', '?', '<=>', '=>', '[]', '[]=', '=~', '!~',
        '<', '>', '>=', '<=', '!=', '==', '===', '..', '...'
}

# The terms region contains the semi-colon and treats newline as a significant token.
lex terms
{
    ignore /[\t ]+/
    ignore /'#' [^\n]*/
    literal ';'
    literal '\n'
}

Figure 6.10: Using lexical regions to selectively ignore newlines. Tokens that begin an expression may have insignificant newlines in front of them. Binary operators and parentheses that open arrays and function calls cannot have newlines in front of them.

# To the right of ':' is required trailing lexical context that is
# not included in the pattern match.
token ws_no_nl /[ \t]+ : [^ \t\n]/

def method_call
    [operation '(' call_args ')']
|   [operation ws_no_nl call_args]

Figure 6.11: Parsing Ruby requires that we make exceptions to the grammar-dependent newline rules and forbid newlines in particular places where they would otherwise be allowed. Here we prevent newlines from occurring in front of method call arguments that do not have enclosing parentheses.


In the lexical context ahead of a we are expecting an expression and by the above rules this dictates that newlines are ignored. But if we ignore newlines, then two words that are each on their own line will be parsed as a function call with a single argument that is not wrapped in parentheses, which is wrong. Instead, we need to maintain the rules from the previous section, but we need to introduce a special exception for this case.

This could be solved by duplicating the start lexical region, making newline significant in this region, referencing the terminals from this duplicate region in a duplicate of the expression nonterminal, then using this special-purpose expression as the first argument of a function call. Unfortunately, this would entail large amounts of grammar duplication. A simpler solution is much more desirable.

A better solution, shown in Figure 6.11, is to add a whitespace-consuming terminal in between the function call and the argument list. This terminal comes from a lexical region of its own and forcefully consumes the whitespace between the function name and the argument, but in the process disallows newlines. In this way the default lexical rule of ignoring newlines is overridden for a single grammar form only. If the ws_no_nl token fails to match because of a newline the backtracker is invoked and the parser moves on to a different possibility.
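The required trailing context written after the ':' in the ws_no_nl pattern can be approximated in a conventional regex engine with a lookahead, which consumes the whitespace but leaves the following character in the input. The snippet below is only an illustration of the pattern's effect, not of COLM's scanner.

    import re

    # Consume spaces and tabs, but only when a non-whitespace, non-newline
    # character follows; the lookahead plays the role of the trailing context.
    ws_no_nl = re.compile(r"[ \t]+(?=[^ \t\n])")

    print(bool(ws_no_nl.match("   a")))    # True:  "print a" on one line
    print(bool(ws_no_nl.match("  \n a")))  # False: the newline forces the
                                           # backtracker to try two statements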

6.4.2 Here Documents

The here document is a feature of many scripting and high-level languages including Ruby, Bash, Perl, and PHP. A here document allows a programmer to specify a block of text that is to be interpreted as a literal string. At the beginning of the here document the user gives an identifier that marks the end of the here document when it appears at the beginning of a line. The string EOF is often used by convention, but it is important to allow an arbitrary word in order to not restrict the text of the here document.

Here documents can be handled easily using handwritten lexical analyzers. However, generalized transformation systems have trouble recognizing a user-defined identifier. Another problem with parsing the here document is that the text of the document normally starts on the line following the opening here document identifier. Any text on the same line as the opening identifier must be parsed normally. It is not possible to begin collecting text immediately following the opening of the here document. Consider the following Ruby code. Two here documents open on the same line.

print( <<DATA1, <<DATA2 )
contents of the first here document
DATA1
contents of the second here document
DATA2

The here document text begins at the first newline that follows the appearance of a here document identifier. One strategy for handling this is to check if here document text must be consumed immediately after a newline is parsed, with the consumption dependent on a global variable that contains a queue of here document identifiers. This could get difficult because newlines can appear in multiple places in the grammar: significant statement terminators, insignificant whitespace, strings, and comments. The here document data could get associated with any of these.

Another strategy is to manipulate the input stream during parsing. When a here document identifier is seen, the text up to the end of the current line is removed from the input stream and stored. The here document data is then parsed. When the parsing of the here document data is complete the rest of the line that contained the identifier is put back to the input stream and parsing can resume. The advantage of this technique is that the here document data can be associated with the here document identifier. The example in Figure 6.12 implements this technique.

The trace in Figure 6.13 shows the sequence of instructions that are executed when the grammar in Figure 6.12 is used to parse a single here document that contains two lines of content, then the closing id. Comments mark the major steps in the process. The reverse of this execution is included.

6.5 Python

There are two Python parsing problems for which we show a solution. A well-publicised characteristic of Python is that it contains off-side rules as an alternative to block delimiters such as begin and end. The other characteristic that we focus on is an ambiguity in the language definition surrounding access to collections such as dictionaries and arrays.

6.5.1 Off-Side Rules

Off-side rules are used by some programming languages for the purpose of grouping code into blocks without using explicit block delimiters. The technique involves analyzing the indentation levels of code and determining the block structure from relative indentation.

Since programmers often represent block structure with indentation to make programs readable, the block structure can be taken from the layout. Languages other than Python that support off-side rules include ABC, Miranda and Haskell.

global str HereId

token rest_of_line /[^\n]* '\n'/

lex here_start
{
    ignore /[ \t\n]+/

    token here_id
        here_data HereData
        /ident_pattern/
    {
        # Take the text of the here_id from the input stream.
        HereId = pull( stdin, match_length )

        # Get the data up to the rest of the line.
        rest_of_line ROL = parse_stop rest_of_line( stdin )

        # Parse the here document data.
        here_data HereData = parse_stop here_data( stdin )

        # Push the rest-of-line data back to the input stream.
        push( stdin, ROL )

        # Send here_id and attach the here document data.
        send( make_token( typeid here_id, HereId, HereData ) )
    }
}

lex here_data
{
    token here_close_id
        / ident_pattern '\n' /
    {
        if ( match_text == HereId + '\n' ) {
            send( make_token( typeid here_close_id, pull(stdin, match_length) ) )
        }
        else {
            send( make_token( typeid here_line, pull(stdin, match_length) ) )
        }
    }

    token here_line / [^\n]* '\n' /
}

def here_data
    [here_line* here_close_id]

def heredoc
    ['<<' here_id]

Figure 6.12: Here document parsing. Since the here document data starts only on the line following the opening identifier, we pull the remaining data from the line, parse the here document, then push the remainder of the line back to the input stream to allow it to be parsed naturally.

(Forward and reverse execution traces not reproduced.)

Figure 6.13: An execution trace of here document parsing in the forward and reverse directions. The here document contains two lines of content and then the closing identifier.

simple_stmt   → expression_stmt
compound_stmt → if_stmt
if_stmt       → 'if' expression ':' suite
                ( 'elif' expression ':' suite )*
                ( 'else' ':' suite )?
stmt_list     → simple_stmt ( ';' simple_stmt )* ';'?
suite         → stmt_list NEWLINE
              | NEWLINE INDENT statement* DEDENT
statement     → stmt_list NEWLINE
              | compound_stmt

Figure 6.14: A section of the Python grammar that shows how generated INDENT and DEDENT tokens are used.


Off-side rules pose a problem for parsing systems that do not support the generation of tokens using handwritten code. To parse languages with off-side rules the tokenizer must look at the amount of whitespace at the beginning of each line and compare it to the amount of whitespace at the beginning of the previous line. If there is less whitespace it generates some number of DEDENT tokens, indicating that the current indentation level has dropped down. If there is more it generates an INDENT token, indicating an increase in the indentation level. Doing this is straightforward when it is possible to add arbitrary code to a scanner, as is the case with a Lex-based scanner, but it is a problem when working in scanner and parser environments that employ a fixed token stream, or when there is no opportunity to programmatically generate tokens.
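For comparison, the token-generation logic that a Lex-style scanner would attach to its newline rule can be sketched in a few lines of Python; the token names mirror those in Figures 6.14 and 6.15, and the function is an illustration rather than part of our system.

    def indentation_tokens(new_indent, indent_stack):
        """Yield the NEWLINE/INDENT/DEDENT tokens implied by a new line's indentation."""
        tokens = ["NEWLINE"]
        if new_indent > indent_stack[-1]:
            indent_stack.append(new_indent)       # deeper: one INDENT
            tokens.append("INDENT")
        else:
            while new_indent < indent_stack[-1]:  # shallower: one DEDENT per level closed
                indent_stack.pop()
                tokens.append("DEDENT")
            if new_indent != indent_stack[-1]:
                raise SyntaxError("invalid DEDENT")
        return tokens

    stack = [0]
    print(indentation_tokens(4, stack))  # ['NEWLINE', 'INDENT']
    print(indentation_tokens(0, stack))  # ['NEWLINE', 'DEDENT']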

In Figure 6.14 a section of the Python grammar concerning statements is given. It shows how the generated INDENT and DEDENT tokens are used in the grammar in place of explicit delimiters. The suite type is the only grammar definition that uses INDENTs and DEDENTs.

token INDENTATION /'\n' [ \t]*/
{
    # First send the newline.
    send( make_token( typeid NEWLINE, '' ) )

    # Compute the indentation level. Note: a more sophisticated
    # implementation would account for TABs.
    int data_length = match_length - 1

    if ( data_length > IndentStack.top ) {
        # The indentation level is more than the level on the top of
        # the stack. Send an INDENT token and record the new indentation
        # level.
        send( make_token( typeid INDENT, '' ) )
        IndentStack.push( data_length )
    }
    else {
        while ( data_length < IndentStack.top ) {
            # The indentation level is less than the level on the top of
            # the stack. Pop the level and send a DEDENT token.
            IndentStack.pop()
            send( make_token( typeid DEDENT, '' ) )
        }

        # Make sure the DEDENT lines up with an existing indentation level.
        if ( data_length != IndentStack.top )
            raise_error( 'invalid DEDENT' )
    }

    # We have now finished resolving the block structure implied by the
    # indentation. Send a whitespace token containing the text of the match.
    send_ignore( make_token( typeid WS, pull(stdin, match_length) ) )
}

Figure 6.15: An implementation of Python off-side rules. The token pattern matches a newline and all following whitespace. The associated token-generation action compares the amount of whitespace to the previous indentation level, sending DEDENT tokens if there is less or an INDENT if there is more.

In Figure 6.15 we give an implementation of off-side rules. We define a token pattern that matches newlines and all following whitespace. A token-generation action then computes the current indentation level and compares it to the previous indentation level. From this comparison it decides if it should send some number of DEDENT tokens or an INDENT token. In the process it updates the global variable that tracks the currently open indentation levels so that the indentation at the beginning of the next line can be properly analyzed.

This token-generation action may send more than one token to the parser. Since we have resigned ourselves to reverting only entire token-generation and reduction actions to avoid reverting changes to local variables, the system must queue up tokens and send them following the execution of the generation action. This allows the reverse code for the reduction action to exist as a single sequence of instructions. The entire block of reverse code is associated with the first token sent by the generation action. For example, consider the following block of Python code.

def fun():
    print "fun"

The token-generation action for the INDENTATION pattern is executed when the newline and space characters following the colon are matched. The indentation stack begins with zero on top, therefore an indentation level of four causes an INDENT token to be sent. Following that a whitespace token consumes the newline and four spaces, ensuring that the indentation pattern does not repeatedly match. A trace of the forward and reverse code is shown in Figure 6.16.

6.5.2 Resolving Ambiguities

There is an ambiguity in the Python grammar surrounding access to ordered collections. This is detailed in the Python Reference Manual version 2.5, Section 5.3.3. The ambiguity is between subscriptions and different forms of slicings. Subscriptions are used to take an object out of a collection of objects, returning a single object. A slicing is used to select a range of objects out of a sequence, returning a sequence of objects. The subscription and slicing definitions both use the square bracket symbols as delimiters on the selection expression. The Python grammar fragment in Figure 6.17 details slicings and subscriptions.

(Forward and reverse execution traces not reproduced.)

Figure 6.16: An execution trace of off-side rule parsing in the forward and reverse directions.

The Python language definition states that, to resolve these ambiguities, a subscription takes precedence over a slicing and a simple slicing takes precedence over an extended slicing.

It is possible to unify these language constructs, then resolve the ambiguity following the parse. However, we can also resolve these ambiguities at parse-time, which saves writing additional code. To do this we need the subscription form to take precedence over slicing and simple_slicing to take precedence over extended_slicing. We already have these ordered properly. However, our ordering algorithm does not work on this grammar in its natural form. When we attempt to force ordered choice with a unique empty production, as prescribed in Section 3.3.2, we are foiled by the left recursion. This problem is outlined in Section 3.3.3. The following example derivations show the left recursion.

primary → subscription → primary '[' expression_list ']'
primary → slicing → simple_slicing → primary '[' short_slice ']'
primary → slicing → extended_slicing → primary '[' slice_list ']'

primary          → atom | attributeref | subscription | slicing | call
subscription     → primary '[' expression_list ']'
expression_list  → expression ( ',' expression )* ','?
slicing          → simple_slicing | extended_slicing
simple_slicing   → primary '[' short_slice ']'
extended_slicing → primary '[' slice_list ']'
slice_list       → slice_item ( ',' slice_item )* ','?
slice_item       → expression | proper_slice | ellipsis
proper_slice     → short_slice | long_slice
short_slice      → expression? ':' expression?
long_slice       → short_slice ':' expression?

Ambiguity 1: Subscription or Slicing

subscription → primary '[' expression_list ']'
             → primary '[' expression ']'

slicing → extended_slicing
        → primary '[' slice_list ']'
        → primary '[' expression ']'

Ambiguity 2: Simple Slicing or Extended Slicing

simple_slicing → primary '[' short_slice ']'

extended_slicing → primary '[' slice_list ']'
                 → primary '[' slice_item ']'
                 → primary '[' proper_slice ']'
                 → primary '[' short_slice ']'

Figure 6.17: A section of the Python grammar showing the ambiguity between subscriptions and slicings. The derivations illustrate the ambiguities in the grammar. Subscription and slicing can both derive identical forms, as can simple slicing and extended slicing.

def primary
    [atom primary_ext*]

def primary_ext
    [attributeref]
|   [subscription]
|   [slicing]
|   [call]

def subscription
    ['[' expression_list ']']

def slicing
    [simple_slicing]
|   [extended_slicing]

def simple_slicing
    ['[' short_slice ']']

def extended_slicing
    ['[' slice_list ']']

def slice_list
    [slice_item slice_list_ext* ','?]

def slice_list_ext
    [',' slice_item]

def slice_item
    [expression]
|   [proper_slice]
|   [ellipsis]

Figure 6.18: A grammar fragment for Python subscriptions and slicings. To resolve ambiguities we must express the grammar as right-recursive.


Solving this problem requires that we transform the left recursion into right recursion. We are then able to take advantage of ordered choice to resolve the ambiguity. We give the corrected Python grammar fragment in Figure 6.18.

6.6 C++

C++ is an example of a language that has an ambiguous grammar definition and context-dependent language features. This property of C++ is shown in Section 2.4.2. C++ parsers are very rarely generated from a grammar. Many compilers use a handwritten recursive-descent parser, including GCC, OpenC++ and OpenWatcom. Since our system is generalized and supports context-dependent parsing, the ambiguous, context-dependent nature of the language does not cause a problem. We are free to implement symbol table maintenance and lookup using global data structures of our design, and we are able to resolve ambiguities by ordered choice.

In this section we give examples of symbol table maintenance during parsing and symbol table lookup during lexical analysis. We then give four examples of ambiguities in the language and show how each one can be solved with ordered choice.

6.6.1 Maintaining Symbol Information

We use semantic actions to manipulate a representation of the C++ name hierarchy. We also use them to prepare for name lookups by posting name qualifications. In the grammar fragment in Figure 6.19, we show some of the actions used to maintain the name hierarchy. These empty nonterminals open and close a C++ declaration. They are used to initialize and free a data structure that we use to collect information about a declaration. The information will be used when it is time to record the declaration in the C++ name hierarchy.

When a declaration is opened, a fresh instance of the structure is pushed onto the declaration-data stack. The instance is popped when the declaration has been completely parsed. Declarations are often unparsed during backtracking and automatic reverse execution performs the necessary unpop and unpush when these nonterminals are backtracked over. The actual insertion into the name hierarchy is shown in the example in Figure 5.4, which contains the declarator_id reduction action.
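A rough picture of what the automatic reversal has to accomplish for this stack is given by the following Python sketch, in which every change to the shared structure records its own inverse operation; the class and the names are illustrative only, not COLM internals.

    class RevertibleStack:
        def __init__(self, undo_log):
            self.items, self.undo_log = [], undo_log
        def push(self, item):
            self.items.append(item)
            self.undo_log.append(self.items.pop)                    # inverse of push
        def pop(self):
            item = self.items.pop()
            self.undo_log.append(lambda: self.items.append(item))   # inverse of pop

    undo_log = []
    declaration_data = RevertibleStack(undo_log)

    mark = len(undo_log)                         # remember the log position
    declaration_data.push({"isTypedef": 0})      # declaration_start (speculative parse)
    print(declaration_data.items)                # [{'isTypedef': 0}]

    # The parse of this declaration fails; replay the log back to the mark.
    while len(undo_log) > mark:
        undo_log.pop()()
    print(declaration_data.items)                # [] -- global state restored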

The grammar fragment in Figure 6.20 shows the definition for qualifying_name. The reduction actions for this definition set the top of the qualNs stack, which gives the containing object to be used in the next name lookup. Since the qualifying name is often used in grammar fragments that are ambiguous, this definition is backtracked over very frequently. Automatic reverse execution handles the necessary revert to the original value of qualNs.top.

# Since declarations may be nested in function parameters we use stacks for
# accumulating declaration information.
global declaration_data_stack declarationData;
global int_stack templDecl;

# Open a declaration.
def declaration_start
    []
    {
        declarationData.push( construct declaration_data (
                isTypedef: 0, isFriend: 0, isTemplate: 0 ) [] )

        # Transfer the template flag and reset it.
        declarationData.top.isTemplate = templDecl.top
        templDecl.push( 0 )
    }

# Close a declaration.
def declaration_end
    []
    {
        declarationData.pop()
        templDecl.pop()
    }

Figure 6.19: Maintaining the C++ name hierarchy. These actions initialize and free data structures into which declaration data is collected. Since declarations are sometimes nested (e.g., function parameters) we must use a stack for this.

def qualifying_name
    [templ_class_id '<' template_argument_list_opt '>']
    { qualNs.top = lookupSpecialization( r1.obj, r3.argList ) }
|   [class_id]
    { qualNs.top = r1.obj }
|   [templ_class_id]
    { qualNs.top = r1.obj }
|   [namespace_id]
    { qualNs.top = r1.obj }
|   [typedef_id]
    { qualNs.top = r1->obj->typedefOf }

Figure 6.20: The qualifying name definition of C++ must post the qualifying object in preparation for name lookup.


6.6.2 Generating Tokens

Name lookup is a large part of producing a correct C++ parse tree. Names must be looked up in the symbol table so the parser can correctly distinguish different forms of the language.

In Figure 6.21 we give the name lookup routine with some supporting functions omitted.

In the token-generation example of Section 5.1 we send a terminal to the parser. We also have the option to send a nonterminal type. Sending a nonterminal allows us to preserve the original type of the token as a child of the nonterminal. Later we can search for the identifier terminal type and find all names regardless of how they have been specialized. It also allows us to extract the common type from the specialized identifier so we can have a uniform interface to identifiers. This is a form of polymorphism.

The result of the name lookup function is an integer that represents the nonterminal we are translating the identifier to. First a token of type lookup_id is created. Then a tree with the returned nonterminal type is created, with the lookup_id token as the only child.

Note that the global variable qualNs is modified by the lookup. Again, automatic reverse execution reverts this change when an identifier must be sent back to the input stream during backtracking.

6.6.3 Resolving Ambiguities

C++ has a number of ambiguities documented in the language standard [44]. These ambiguities can be resolved according to the standard by utilizing ordered choice. In the remainder of this section we describe how we have implemented the resolution of each ambiguity.

def class_id [lookup_id]
def namespace_id [lookup_id]
  .
  .
def template_id [lookup_id]

token lookup_id
    ptr lang_object obj
    ptr lang_object qualObj
    /( [a-zA-Z_] [a-zA-Z0-9_]* )/
{
    str name = match_text
    ptr lang_object found = nil
    ptr lang_object qualObj = nil

    if ( qualNs.top ) {
        # Transfer the qualification to the token and reset it.
        qualObj = qualNs.top
        qualNs.top = nil

        # Lookup using the qualification.
        found = lookupWithInheritance( qualObj, name )
    }
    else {
        # No qualification, full search.
        found = unqualifiedLookup( name )
    }

    # If no match, return an unknown_id.
    int id = typeid unknown_id
    if ( found )
        id = found->typeId

    any LookupId = make_token( typeid lookup_id,
            pull(stdin, match_length), found, qualObj )
    send( make_tree( id, LookupId ) )
}

Figure 6.21: The token-generation action that performs C++ name lookup. We send the translated token as a tree, with the original type lookup_id sent as a child of that tree.

Ambiguity 1: Declaration or Expression

As shown in Figure 6.22, there is an ambiguity between declaration statements and expression statements. To resolve this ambiguity, we follow the rule that any statement that can be interpreted as a declaration is a declaration. We program this by specifying the declaration-statement production ahead of the expression-statement production, which is also shown in Figure 6.22.

struct C {};

void f(int a)
{
    C(a)[5];    // declaration (ambiguous)
    C(a)[a=1];  // expression
}

def statement
    [declaration_statement]
|   [expression_statement]
|   ...

Figure 6.22: An example showing the ambiguity between declarations and expressions. It is resolved by placing the declaration statement ahead of the expression statement.


Ambiguity 2: Function or Object Declaration

There is an ambiguity between a function declaration with a redundant set of parentheses around a parameter-declaration identifier, and an object declaration with an initialization using a function-style cast expression. An example is shown in Figure 6.23. Again, we apply the rule that any program text that can be a declaration is a declaration. Therefore we must prefer the function declaration. The resolution of this ambiguity is handled automatically by the ordered choice parsing strategy, because parameter specifications are innermost relative to object initializations. The relevant grammar fragment is also shown in Figure 6.23.

Ambiguity 3: Type-Id or Expression

In contexts where we can accept either a type-id or an expression, there is an ambiguity between an abstract function declaration with no parameters and a function-style cast. Two examples are shown in Figure 6.24. The resolution is that any program text that can be a type-id is a type-id. We program this by specifying the productions that contain type-ids ahead of the productions that contain expressions, as shown in the second half of Figure 6.24.

struct C {};

int f(int a)
{
    C x(int(a));  // function declaration (ambiguous)
    C y(int(1));  // object declaration
}

def init_declarator
    [declarator initializer_opt]

def declarator
    [ptr_operator_seq_opt declarator_id array_or_param_seq_opt]

def array_or_param_seq_opt
    [array_or_param_seq_opt array_or_param]
|   []

def array_or_param
    ['[' constant_expression_opt ']']
|   ['(' parameter_declaration_clause ')' cv_qualifier_seq_opt
        exception_specification_opt]

def initializer_opt
    ['=' initializer_clause]
|   ['(' expression ')']
|   []

Figure 6.23: An example showing the ambiguity between a function declaration with a single parameter that has parentheses around the name, and an object declaration that is initialized with a temporary object. The effect of ordered choice is that innermost constructs take precedence, which correctly resolves the ambiguity.


Ambiguity 4: Anonymous Function Pointer or Variable

In contexts that accept both abstract declarators and named declarators, there is an ambiguity between an abstract function declaration with a single abstract parameter, and an object declaration with a set of parentheses around its name. This arises in function parameter lists. The resolution is to consider the text as an abstract function declaration with a single abstract parameter. We program this by specifying abstract declarators ahead of named declarators. An example of the ambiguity is shown in Figure 6.25. The resolution of the ambiguity is shown in the second half of the figure.

template <class T> class D {};

int f()
{
    sizeof(int());     // sizeof type-id (ambiguous)
    sizeof(int(1));    // sizeof expression

    D<int()> l1;       // type-id argument
    D<int(1)> l2;      // expression argument
}

def unary_expression
    ['sizeof' '(' type_id ')']
|   ['sizeof' unary_expression]

def template_argument
    [type_id]
|   [assignment_expression]

Figure 6.24: An example showing the ambiguity between type-id and expression. The resolution is to place productions that contain type-id ahead of productions that contain expressions.


6.6.4 C++ Class Method Bodies

In most of the C++ language, types, classes and identifiers need to be declared before they are used. This means that most parsing can be done in a single pass. There is one area of the language where this is not true, however. Class member functions are allowed to reference class members that are further down in the class. This requires delaying the parsing of class member functions until after a class has been fully parsed. Note that we need to delay parsing, but we do not want to delay parsing until the end of the input has been reached. This would allow globally defined names that appear after a class to be visible to the body of a class member function.

struct C {};

void f(int (C));   // anonymous function pointer parameter (ambiguous)
void f(int (x));   // variable parameter

def parameter_declaration
    [decl_specifier_seq param_declarator_opt parameter_init_opt]

def param_declarator_opt
    [abstract_declarator]
|   [declarator]
|   []

Figure 6.25: An example showing the ambiguity between an anonymous function pointer and an object declaration with parentheses around the name. The resolution is to place abstract declarators ahead of named declarators.


In Figure 6.26 we demonstrate how we can perform this two-stage parsing. We first capture class member functions as text. When the parsing of a class completes we can iterate through the class member functions and reparse the captured text as C++ program code. At this time all class members will have been inserted into the symbol table. We use the reduction action of the class definition to perform this task.
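The two-stage idea itself is independent of our system and can be sketched in a few lines of Python over a toy class syntax; the regular expressions and the name-check standing in for reparsing are assumptions made for illustration only.

    import re

    # Stage 1 collects member names and keeps method bodies as raw text;
    # stage 2 "parses" the bodies (here: checks name references) only after
    # every member of the class is known.
    toy_class = """
    class C {
        int f() { return g() + x; }     // uses members declared later
        int g() { return x; }
        int x;
    }
    """

    members, bodies = set(), []
    for m in re.finditer(r"int (\w+)(\(\))?\s*(\{[^}]*\})?", toy_class):
        members.add(m.group(1))
        if m.group(3):
            bodies.append((m.group(1), m.group(3)))   # stage 1: body kept as text

    for name, body in bodies:                          # stage 2: reparse the bodies
        used = set(re.findall(r"\b([a-z]\w*)\b", body)) - {"return", "int"}
        assert used <= members, (name, used - members)
    print("all member references resolved:", members)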

6.7 Summary

In this chapter we give examples of difficult parsing problems that can be handled easily in our system. These parsing problems are taken from a variety of common computer languages, including HTTP (6.1), DNS (6.2), HTML (6.3), Ruby (6.4), Python (6.5) and C++ (6.6). In Chapter 1 we identified three requirements for a transformation system's parsing engine. The parsing system must be generalized, it must be able to handle languages with varying lexical rules, and it must be able to handle languages with context-dependent features. The solutions to the parsing problems in this chapter demonstrate our ability to meet all three of these requirements.

def class_specifier
    [class_head base_clause_opt '{' class_member* class_body_end '}']
    {
        # At this point the class has been popped from the declaration and
        # lookup stacks. Add them back.
        lookupNs.push( r1.class )
        declNs.push( r1.class )

        # Visit class function bodies, but skip nested classes.
        for CFB: class_function_body in
                topdown_skipping( lhs, typeid class_specifier )
        {
            # Reparse the text of the class function body as a function body.
            function_body FB = reparse function_body( CFB )

            # Replace the class function body with the parsed function body.
            CFB = construct class_function_body [FB]
        }

        lookupNs.pop()
        declNs.pop()
    }

def class_member
    [member_declaration]
|   [access_specifier ':']

def member_declaration
    [declaration_start member_declaration_forms declaration_end ';']
|   [class_function_definition]
|   [using_declaration]
|   [template_declaration]

def class_function_definition
    [function_def_declaration ctor_initializer_opt
        class_function_body function_def_end]

def class_function_body
    ['{' cfb_conts]
|   [function_body]

def function_body
    [function_body_begin '{' statement_rep function_body_end '}']

def cfb_conts
    [cfb_item* cfb_close]

def cfb_item
    [cfb_data]
|   [cfb_string]
|   [cfb_comment]
|   [cfb_open cfb_item* cfb_close]

lex cfb_conts
{
    token cfb_open /'{'/
    token cfb_close /'}'/
    token cfb_string /
        '\'' ( [^'\\\n] | '\\' any )* '\'' |
        '"'  ( [^"\\\n] | '\\' any )* '"' /
    token cfb_comment /'//' [^\n]* '\n'/
    token cfb_data /[^{}'"/]+ | '/'/
}

Figure 6.26: Delayed parsing of C++ class method bodies. Initial parsing collects the bodies as text, interpreting comments and strings, and balancing curly brackets. When the class parsing is complete the method bodies are parsed fully.


In the next chapter we conclude with a brief summary, an overview of our contributions, and a discussion of future research directions for improving upon this work.

Chapter 7

Summary and Conclusion

In this work we set out to design a new transformation system with a parsing engine that is more capable than what exists in the current systems. Our secondary goal is to make transformation languages more accessible by designing a language that is closer to general-purpose languages than the existing transformation languages.

In Chapter 2 we surveyed existing source transformation systems, focusing on generalized parsing techniques and the problems associated with context-dependent parsing. In Chapter 3 we chose backtracking LR as the parsing algorithm and addressed the issues of controlling the parse of ambiguous grammars, pruning the search space, handling errors, and handling languages with varying lexical requirements. In Chapter 4 we presented the design of a new source transformation language. We first presented the features borrowed from general-purpose languages, then moved on to describe the transformation-related features. Following the definition of our language we showed how it is useful for analyzing and rewriting trees, then gave some implementation details. In Chapter 5 we addressed the issue of generalized context-dependent parsing. We showed how the parsing engine and virtual machine can be extended to allow the automatic reversal of global-state modifications. In Chapter 6 we showed how our system can solve parsing problems that cause other systems trouble. These problems came from a selection of document formats, network protocols and programming languages. In the remainder of this chapter we detail the six contributions that we have made, we discuss limitations and future work, then we conclude the thesis.

7.1 Contributions

1. Ordered Choice for Backtracking LR.

We assign an ordering to conflicting shift and reduce actions that causes the parser to emulate the parsing strategy of a generalized top-down parser with ordered choice. In some cases, common prefixes inhibit the desired top-down strategy. Unique empty productions can be inserted at the beginning of productions to force a localized top-down approach. This guarantees that the parser attempts to parse mutually ambiguous productions in the order in which they are given. Using our method, we can apply a top-down backtracking strategy where needed for resolving ambiguities, while retaining the speed of LR parsing for sections of the grammar that are deterministic and amenable to bottom-up parsing.

2. Local Commit Points for Backtracking LR.

Declarative commit points can be used to eliminate fruitless backtracking and improve performance in a localized manner. We have added two types of commit points. A strong commit clears all alternative parses, thereby preventing any backtracking over previous decisions. A localized commit, which is unique to this work, removes alternatives underneath a single nonterminal, while preserving earlier alternatives.

3. Generalized Grammar-Dependent Tokenization.

We employ grammar-dependent tokenization and make it generalized by incorporating it into the backtracking algorithm. When a token is requested by the parser the current parse state is used to determine which lexical regions can generate tokens that can be accepted. The list of regions is ordered using our approximate ordered choice algorithm and the first one is selected. The next possible region is encoded in the token as an alternative. This approach makes it possible to mix grammar fragments that have different lexical regions, and to allow the backtracker to switch between the lexical regions as necessary.

4. A New Tree Transformation Language.

We have designed a new transformation language that we hope will appeal to developers. Recognizing that no existing transformation language includes the imperative language paradigm, we have designed a transformation language that borrows from general-purpose languages. We identify the essential transformation language features such as a formal-language type system, pattern matching, tree construction and tree traversal. We merge these features with a rudimentary imperative-style programming language foundation. We hope that this approach will make transformation systems more accessible and ease the writing of any ancillary algorithms that are not suited to the rewrite paradigm.

5. Undo Actions for Reverting Side Effects.

We introduce a new class of semantic actions for reverting changes made to the global state. We call these undo actions. When the backtracking LR parser performs the reverse of a reduction by replacing a nonterminal with its children, the undo action is executed. These actions permit the parser to backtrack over areas of input text that require global data modification in preparation for handling context-dependent language features.

6. Automatic Reverse Execution for Reverting Side Effects.

To free the user from having to write undo actions, which in some cases can be difficult to write, we employ reverse execution. As the virtual machine of our transformation language moves over instructions it logs the reverse instructions that are necessary to revert the effects of the forward execution. This is only necessary for instructions that modify globally accessible data. Changes to local variables do not need to be reverted. These recorded undo actions can then be executed during backtracking. This allows the user to write semantic actions that modify global state without being concerned about correctly reverting the changes during backtracking to avoid corruption. A sketch of this idea appears after this list.
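As an illustration of the last contribution, the following Python sketch shows the shape of reverse-execution logging for global writes: each forward write records an inverse operation, and backtracking replays the log in reverse. The miniature virtual machine and its names are hypothetical, not COLM's instruction set.

    class TinyVM:
        def __init__(self):
            self.globals, self.reverse_log = {}, []

        def set_global(self, name, value):
            # Log the inverse instruction before performing the forward one.
            old_present, old_value = name in self.globals, self.globals.get(name)
            def undo():
                if old_present:
                    self.globals[name] = old_value
                else:
                    self.globals.pop(name, None)
            self.reverse_log.append(undo)
            self.globals[name] = value

        def backtrack_to(self, mark):
            while len(self.reverse_log) > mark:
                self.reverse_log.pop()()       # run logged instructions in reverse

    vm = TinyVM()
    vm.set_global("qualNs", "std")
    mark = len(vm.reverse_log)
    vm.set_global("qualNs", "C")               # writes made during a parse attempt
    vm.set_global("typedefSeen", True)         # that is later abandoned
    vm.backtrack_to(mark)
    print(vm.globals)                          # {'qualNs': 'std'}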

7.2 Limitations and Future Work

7.2.1 Out-of-Order Parse Correction

In Section 3.3.2 we show that our ordering algorithm does not always order productions in the sequence that they are given. In some cases it is necessary to force the sequential ordering by inserting unique empty productions. This only works when the affected nonterminal is free of left recursion.

The problem of detecting out-of-order parses and eliminating them by inserting unique empty productions is a task that we leave up to the user. It would be desirable to have a static analysis that is able to detect out-of-order parses and automatically correct the problem by inserting unique empty productions where appropriate. In initial investigations we found this to be a difficult problem, closely related to the detection of ambiguities in context-free grammars. Detecting ambiguities has been shown to be an undecidable problem [41].

An alternate strategy for guaranteeing that no out-of-order parses are possible might be to begin by inserting unique empty productions at the beginning of every production, then later eliminate those that are not necessary. This approach of maintaining in-order parses might be easier than the approach of detecting out-of-order parses.

When we insert unique empty productions at the beginning of every production we guarantee that no input is parsed out of order. This causes a backtracking LR parser to behave like a generalized top-down parser and only works when we have a grammar free of left recursion. This idea was tested with a C++ grammar. We automatically inserted unique empty productions at the beginning of every production that did not contain left recursion, whether direct, indirect or hidden. The resulting parser produced the same output, but performance slowed by a factor of 20 and the number of states doubled.

If automatic detection of out-of-order parses proves too difficult or is found to be unnecessary in practice, it may be worthwhile to pursue methods for analyzing an explicitly specified pair of ambiguous productions for potential out-of-order parses. This would allow a user to mark known ambiguities and rely on the parsing engine to decide if unique empty productions are necessary.

7.2.2 Context-Dependent Parsing with GLR

In Section 2.4.3 we show that proper GLR [94] parsing is incompatible with the maintenance of global state. The problem is that the diverging global state can prevent the merging of parse stacks. This merging is what sets GLR apart from a naive nondeterministic implementation of generalized parsing.

However, even though the maintenance of global data prevents us from doing proper GLR, we may be able to support context-dependent parsing using an approximation of GLR. We could merge according to the GLR algorithm only when doing so did not require that we also merge distinct global contexts. Determining the value of such a system would then become a matter of empirical assessment. We may find that in practice, when used with real grammars, a near-GLR algorithm that supports context-dependent parsing is worthwhile.
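One way to picture the proposed approximation is a merge step that unifies stack tops only when their global contexts agree; the data layout below is hypothetical and serves only to show the extra condition.

    def try_merge(parsers):
        merged = []
        for p in parsers:
            for q in merged:
                if q["lr_state"] == p["lr_state"] and q["global_ctx"] == p["global_ctx"]:
                    q["links"].extend(p["links"])   # ordinary GLR merge
                    break
            else:
                merged.append(p)                    # contexts differ: keep both alive
        return merged

    parsers = [
        {"lr_state": 7, "global_ctx": {"typedefs": ("T",)}, "links": ["a"]},
        {"lr_state": 7, "global_ctx": {"typedefs": ("T",)}, "links": ["b"]},
        {"lr_state": 7, "global_ctx": {"typedefs": ()},     "links": ["c"]},
    ]
    print([p["links"] for p in try_merge(parsers)])  # [['a', 'b'], ['c']]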

Since we already have a value-based tree semantics and an implementation with copy-on-write tree sharing, we may not be far from an efficient implementation of this idea. Furthermore, if we were able to design and build such a system using the same transformation language that we have designed in this work then we would be in a good position to perform a comparison of the two methods.

7.2.3 Grammar Modularity and Reuse

From the perspective of grammar programming, our system has a number of missing features that we have left for future work. File inclusions allow a program's definitions to be partitioned into files. The program can then be explicitly assembled from the files by including them into a master program file. Name spaces allow the identifiers that represent program entities to be organized into a hierarchy, which reduces the potential for name collisions when program fragments are combined. Grammar overrides [27] allow the user to make modifications to a reusable base grammar by overriding particular type definitions. An override can specify an entirely new set of productions for a type, or it can extend the existing set of productions with additional cases. All of these features facilitate grammar modularity and reuse, and we must provide language constructs for them. The language constructs for grammar overrides must accommodate attribute definitions and commit points, which will both be affected by overrides.

7.3 Implementation

We have implemented our system and named it COLM, for COmputer Language Manipulation. We have licensed it as free software and the source code is available from the project's web page [93]. All features have been implemented and tested. However, the system is constantly being improved and extended, with frequent rewrites of major subsystems taking place. A large change can render specific language features non-functional. At the time of writing, a stable release is not available.

7.4 Conclusion

Transformation systems are very useful for the analysis and transformation of computer languages. They are powerful tools for parsing input, and traversing and rewriting trees.

When applied in practice a serious problem is evident. Some languages are difficult to express using the pure context-free language facilities that transformation systems provide. Context-dependent languages cause these systems trouble because the manipulation of global data in support of tasks such as identifier lookup or tag matching is incompatible with generalized parsing. We set out to solve this problem by designing a new transformation system with a generalized parsing engine that supports the manipulation of global data structures at parse time, thus enabling generalized context-dependent parsing. In the process we design a new transformation language and take the opportunity to create a language that is closer to general-purpose languages. The result of this work is COLM: a new transformation system that is useful for the analysis and transformation of a larger group of computer languages.

Bibliography

[1] Hiralal Agrawal, Richard A. DeMillo, and Eugene H. Spafford. An execution-backtracking approach to debugging. IEEE Software, 8(3):21–26, May 1991.

[2] Tankut Akgul and Vincent J. Mooney III. Assembly instruction level reverse execution for debugging. ACM Transactions on Software Engineering and Methodology, 13(2):149–198, April 2004.

[3] John Aycock. Why Bison is becoming extinct. ACM Crossroads, Xrds-7.5, 2002.

[4] John Aycock and Nigel Horspool. Faster generalized LR parsing. In 8th International Conference on Compiler Construction (CC'99), volume 1575 of Lecture Notes in Computer Science, pages 32–46, 1999.

[5] John Aycock and Nigel Horspool. Practical Earley parsing. The Computer Journal, 45(6):620–630, November 2002.

[6] John Aycock and R. Nigel Horspool. Directly-executable Earley parsing. In 10th International Conference on Compiler Construction (CC'01), volume 2027 of Lecture Notes in Computer Science, pages 229–243, 2001.

[7] John Aycock and R. Nigel Horspool. Schrödinger's token. Software: Practice and Experience, 31(8):803–814, July 2001.

[8] John Aycock, R. Nigel Horspool, Jan Janousek, and Borivoj Melichar. Even faster generalised LR parsing. Acta Informatica, 37(9):633–651, June 2001.

[9] Otto Skrove Bagge, Karl Trygve Kalleberg, Magne Haveraaen, and Eelco Visser. Design of the CodeBoost transformation system for domain-specific optimisation of C++ programs. In 3rd International Workshop on Source Code Analysis and Manipulation (SCAM'03), pages 65–75. IEEE Computer Society Press, 2003.

[10] Emilie Balland, Paul Brauner, Radu Kopetz, Pierre-Etienne Moreau, and Antoine Reilles. Tom: Piggybacking rewriting on Java. In 18th Conference on Rewriting Techniques and Applications (RTA'07), Lecture Notes in Computer Science, pages 36–47, 2007.

[11] Ira D. Baxter, Christopher Pidgeon, and Michael Mehlich. DMS: Program transformations for practical scalable software evolution. In 26th International Conference on Software Engineering (ICSE'04), pages 625–634. IEEE Computer Society, 2004.

[12] Ralph Becket and Zoltan Somogyi. DCGs + memoing = packrat parsing but is it worth it? In 10th International Symposium on Practical Aspects of Declarative Languages (PADL'08), volume 4902 of Lecture Notes in Computer Science, pages 182–196, 2008.

[13] Sylvie Billot and Bernard Lang. The structure of shared forests in ambiguous parsing. In 27th Annual Meeting of the Association for Computational Linguistics (ACL'89), pages 143–151. Association for Computational Linguistics, 1989.

[14] Michael R. Birch, Christopher M. Boroni, Frances W. Goosey, Samuel D. Patton, David K. Poole, Craig M. Pratt, and Rockford J. Ross. DYNALAB: a dynamic computer science laboratory infrastructure featuring program animation. In 26th SIGCSE Technical Symposium on Computer Science Education (SIGCSE'95), pages 29–33. ACM Press, 1995.

[15] Bitan Biswas and R. Mall. Reverse execution of programs. SIGPLAN Notices, 34(4):61–69, April 1999.

[16] Peter Borovanský, Claude Kirchner, Hélène Kirchner, Pierre-Etienne Moreau, and Christophe Ringeissen. An overview of ELAN. In 2nd International Workshop on Rewriting Logic and its Applications (WRLA'98), volume 15 of Electronic Notes in Theoretical Computer Science, pages 55–70, 1998.

[17] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau. Extensible markup language (XML) 1.0, 2006. http://www.w3.org/TR/2006/REC-xml-20060816/.

[18] R. A. Brooker. A note: Top-to-bottom parsing rehabilitated? Communications of the ACM, 10(4):223–224, April 1967.

[19] William H. Burge. Recursive Programming Techniques. Addison-Wesley, 1975.

[20] Ruknet Cezzar. The design of a processor architecture capable of forward and reverse execution. IEEE Southeastcon '91, 2:885–890, 1991.

[21] K. Mani Chandy and Chittoor V. Ramamoorthy. Rollback and recovery strategies for computer programs. IEEE Transactions on Computers, 21(6):546–556, June 1972.

[22] John Cocke. Programming languages and their compilers: Preliminary notes. Courant Institute of Mathematical Sciences, New York University, 1969.

[23] Jonathan J. Cook. Reverse execution of Java bytecode. The Computer Journal, 45(6):608–619, November 2002.

[24] Rafael Corchuelo, José A. Pérez, Antonio Ruiz, and Miguel Toro. Repairing syntax errors in LR parsers. ACM Transactions on Programming Languages and Systems, 24(6):698–710, November 2002.

[25] James R. Cordy. The TXL source transformation language. Science of Computer Programming, 61(3):190–210, August 2006.

[26] Mark Crispin. RFC 3501: Internet message access protocol - version 4rev1, 2003.

[27] Thomas R. Dean, James R. Cordy, Andrew J. Malton, and Kevin A. Schneider. Agile parsing in TXL. Journal of Automated Software Engineering, 10(4):311–336, October 2003.

[28] Franklin L. DeRemer. Practical translators for LR(k) languages. PhD thesis, Massachusetts Institute of Technology, Department of Electrical Engineering, 1969.

[29] Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, February 1970.

[30] Torbjörn Ekman and Görel Hedin. The JastAdd extensible Java compiler. In 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'07), volume 42(10) of SIGPLAN Notices, pages 1–18, 2007.

[31] Roy T. Fielding, James Gettys, Jeffrey C. Mogul, Henrik Frystyk, Larry Masinter, Paul J. Leach, and Tim Berners-Lee. RFC 2616: Hypertext transfer protocol – HTTP/1.1, 1999.

[32] Kathleen Fisher and Robert Gruber. PADS: A domain-specific language for processing ad hoc data. In 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'05), volume 40(6) of SIGPLAN Notices, pages 295–304, 2005.

[33] David Flanagan and Yukihiro Matsumoto. The Ruby Programming Language. O'Reilly Media, 2008.

[34] Robert W. Floyd. Nondeterministic algorithms. Journal of the ACM, 14(4):636–644,

October 1967.

[35] Bryan Ford. Packrat parsing: simple, powerful, lazy, linear time. In Proceedings

of the seventh ACM SIGPLAN international conference on Functional programming

(ICFP’02), pages 36–47. ACM Press, 2002.

[36] Bryan Ford. Parsing expression grammars: a recognition-based syntactic foundation.

In 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages

(POPL’04), pages 111–122. ACM Press, 2004.

[37] Richard A. Frost. Guarded attribute grammars: top down parsing and left recursive

productions. SIGPLAN Not., 27(6):72–75, 1992.

[38] Mahadevan Ganapathi. Semantic predicates in parser generators. Computer Languages, 14(1):25–33, 1989.

[39] Robert Grimm. Better extensibility through modular syntax. In 2006 ACM SIGPLAN

conference on Programming language design and implementation (PLDI’06), pages

38–51. ACM Press, 2006.

[40] Dick Grune and Ceriel J. H. Jacobs. Parsing Techniques - A Practical Guide. Ellis

Horwood, Chichester, England, 1990.

[41] John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages

and Computation. Addison-Wesley, 1979.

[42] Freeman Y. Huang. Type-safe Computation with Heterogeneous Data. PhD thesis,

Queen’s University, School of Computing, 2007.

[43] Graham Hutton. Higher-order functions for parsing. Journal of Functional Programming, 2(3):323–343, July 1992.

[44] The International Organization for Standardization. ISO/IEC 14882:1998: Programming languages — C++. American National Standards Institute, First edition, September 1998.

[45] David R. Jefferson. Virtual time. ACM Transactions on Programming Languages and

Systems, 7(3):404–425, July 1985.

[46] Steven C. Johnson. YACC: Yet another compiler compiler. In UNIX Programmer’s

Manual, volume 2, pages 353–387. Holt, Rinehart, and Winston, 1979.

[47] Adrian Johnstone and Elizabeth Scott. Generalised recursive descent parsing and follow determinism. In 7th International Conference on Compiler Construction (CC’98), volume 1383 of Lecture Notes in Computer Science, pages 16–30, 1998.

[48] Adrian Johnstone and Elizabeth Scott. Generalised regular parsing. In 12th International Conference on Compiler Construction (CC’03), volume 2622 of Lecture Notes in Computer Science, pages 232–246. Springer-Verlag, 2003.

[49] Adrian Johnstone and Elizabeth Scott. Recursion engineering for reduction incorporated parsers. In 5th Workshop on Language Descriptions, Tools, and Applications (LDTA’05), volume 141(4) of Electronic Notes in Theoretical Computer Science, pages 143–160, 2005.

[50] Adrian Johnstone, Elizabeth Scott, and Giorgios Economopoulos. Generalised parsing: Some costs. In 13th International Conference on Compiler Construction (CC’04), volume 2985 of Lecture Notes in Computer Science, pages 89–103, 2004.

[51] Adrian Johnstone, Elizabeth Scott, and Giorgios Economopoulos. The grammar tool box: A case study comparing GLR parsing algorithms. In 4th Workshop on Language Descriptions, Tools, and Applications (LDTA’04), volume 110 of Electronic Notes in Theoretical Computer Science, pages 97–113, 2004.

[52] Adrian Johnstone, Elizabeth Scott, and Giorgios Economopoulos. Evaluating GLR

parsing algorithms. Science of Computer Programming, 61(3):228–244, August 2006.

[53] Adrian Johnstone, Elizabeth Scott, and Giorgios Economopoulos. The GTB and PAT tools. In 4th Workshop on Language Descriptions, Tools, and Applications (LDTA’04), volume 110 of Electronic Notes in Theoretical Computer Science, pages 173–175, 2004.

[54] Karl Trygve Kalleberg and Eelco Visser. Fusing a transformation language with an

open compiler. In 7th Workshop on Language Descriptions, Tools, and Applications

(LDTA’07), Electronic Notes in Theoretical Computer Science, pages 18–31, 2007.

[55] Tadao Kasami. An efficient recognition and syntax analysis algorithm for context-free languages. Technical Report AFCRL-65-758, Air Force Cambridge Research Laboratory, Bedford, Mass., 1965.

[56] John C. Klensin. RFC 2821: Simple mail transfer protocol, 2001.

[57] Paul Klint and Eelco Visser. Using filters for the disambiguation of context-free

grammars. In G. Pighizzini and P. San Pietro, editors, 1st ASMICS Workshop on

Parsing Theory, pages 1–20, 1994.

[58] Donald E. Knuth. On the translation of languages from left to right. Information

and Control, 8(6):607–639, December 1965.

[59] Jan Kort and Ralf Lämmel. Parse-tree annotations meet re-engineering concerns. In 3rd International Workshop on Source Code Analysis and Manipulation (SCAM’03), pages 161–170. IEEE Computer Society, 2003.

[60] Bernard Lang. Deterministic techniques for efficient non-deterministic parsers. In 2nd Colloquium on Automata, Languages and Programming (ICALP’74), volume 14 of Lecture Notes in Computer Science, pages 255–269, 1974.

[61] Michael E. Lesk and Eric Schmidt. Lex – A lexical analyzer generator. Technical

report, Bell Laboratories, 1975. CS Technical Report No. 39.

[62] Philip M. Lewis II and Richard E. Stearns. Syntax-directed transduction. Journal of

the ACM, 15(3):465–488, July 1968.

[63] Wolfgang Lohmann, Günter Riedewald, and Markus Stoy. Semantics-preserving migration of semantic rules during left recursion removal in attribute grammars. In 4th Workshop on Language Descriptions, Tools, and Applications (LDTA’04), volume 110 of Electronic Notes in Theoretical Computer Science, pages 133–148, 2004.

[64] Andrew J. Malton. The denotational semantics of a functional tree-manipulation

language. Computer Languages, 19(3):157–168, July 1993.

[65] Andrew J. Malton, Kevin A. Schneider, James R. Cordy, Thomas R. Dean, Darren Cousineau, and Jason Reynolds. Processing software source text in automated design recovery and transformation. In 9th International Workshop on Program Comprehension (IWPC’01), pages 127–134. IEEE Computer Society, 2001.

[66] Sylvain Marquis, Thomas R. Dean, and Scott Knight. Packet decoding using context sensitive parsing. In 2006 Conference of the Centre for Advanced Studies on Collaborative Research (CASCON’06), pages 263–274. IBM Press, 2006.

[67] Philippe McLean and R. Nigel Horspool. A faster Earley parser. In 6th International Conference on Compiler Construction (CC’96), volume 1060 of Lecture Notes in Computer Science, pages 281–293, 1996.

[68] Scott McPeak and George C. Necula. Elkhound: A fast, practical GLR parser generator. In 13th International Conference on Compiler Construction (CC’04), volume 2985 of Lecture Notes in Computer Science, pages 73–88, 2004.

[69] Gary H. Merrill. Parsing non-LR(k) grammars with Yacc. Software, Practice and

Experience, 23(8):829–850, August 1993.

[70] Paul V. Mockapetris. RFC 1035: Domain names - Implementation and specification,

1987.

[71] Leon Moonen. Generating robust parsers using island grammars. In 8th Working Conference on Reverse Engineering (WCRE’01), pages 13–22. IEEE Computer Society, 2001.

[72] Pierre-Étienne Moreau, Christophe Ringeissen, and Marian Vittek. A pattern matching compiler for multiple target languages. In 12th International Conference on Compiler Construction (CC’03), volume 2622 of Lecture Notes in Computer Science, pages 61–76, 2003.

[73] Mark-Jan Nederhof and Janos J. Sarbo. Increasing the applicability of LR parsing. In H. Bunt and M. Tomita, editors, Recent advances in parsing technology, pages 37–57. Kluwer Academic Publishers, Netherlands, 1996.

[74] Rahman Nozohoor-Farshi. GLR parsing for ε-grammars. In Masaru Tomita, editor, Generalized LR Parsing, pages 61–75. Kluwer Academic Publishers, Netherlands, 1991.

[75] Richard A. O’Keefe. O(1) reversible tree navigation without cycles. Theory and

Practice of Logic Programming, 1(5):617–630, September 2001.

[76] Ruoming Pang, Vern Paxson, Robin Sommer, and Larry Peterson. Binpac: a yacc for

writing application protocol parsers. In 6th ACM SIGCOMM conference on Internet

measurement (IMC ’06), pages 289–300. ACM Press, 2006.

[77] Terence J. Parr and Russell W. Quong. ANTLR: A predicated LL(k) parser generator. Software, Practice and Experience, 25(7):789–810, July 1995.

[78] Fernando C. N. Pereira and David H. D. Warren. Definite clause grammars for language analysis - A survey of the formalism and a comparison with augmented transition networks. Artificial Intelligence, 13(3):231–278, May 1980.

[79] James F. Power and Brian A. Malloy. Exploiting metrics to facilitate grammar transformation into LALR format. In 2001 ACM Symposium on Applied Computing (SAC’01), pages 636–640. ACM Press, 2001.

[80] The Python Software Foundation. The Python programming language.

http://www.python.org/.

[81] Dave Raggett, Arnaud Le Hors, and Ian Jacobs. HTML 4.01 specification, 1999.

http://www.w3.org/TR/1999/REC-html401-19991224.

[82] Jan G. Rekers. Parser Generation for Interactive Environments. PhD thesis, University of Amsterdam, Wiskunde en Informatica, 1992.

[83] Pete Resnick. RFC 2822: Internet message format, 2001.

[84] Daniel J. Salomon and Gordon V. Cormack. Scannerless NSLR(1) parsing of programming languages. In ACM SIGPLAN 1989 Conference on Programming language design and implementation (PLDI’89), pages 170–178. ACM Press, 1989.

[85] Sylvain Schmitz. Approximating Context-Free Grammars for Parsing and Verification. PhD thesis, Université de Nice - Sophia Antipolis, Sciences et Technologies de l’Information et de la Communication, 2007.

[86] Sylvain Schmitz. An experimental ambiguity detection tool. In 7th Workshop on Language Descriptions, Tools and Applications (LDTA’07), volume 203(2) of Electronic Notes in Theoretical Computer Science, pages 69–84, 2008.

[87] Elizabeth Scott and Adrian Johnstone. Table based parsers with reduced stack activity. Technical Report CSD-TR-02-08, Royal Holloway, University of London, Department of Computer Science, May 2003.

[88] Elizabeth Scott and Adrian Johnstone. Generalized bottom up parsers with reduced

stack activity. The Computer Journal, 48(5):565–587, September 2005.

[89] Elizabeth Scott and Adrian Johnstone. Right nulled GLR parsers. ACM Transactions

on Programming Languages and Systems, 28(4):577–618, July 2006.

[90] Elizabeth Scott, Adrian Johnstone, and Rob Economopoulos. BRNGLR: A cubic

Tomita-style GLR parsing algorithm. Acta Informatica, 44(6):427–461, October 2007.

[91] Rok Sosič. History cache: hardware support for reverse execution. SIGARCH Computer Architecture News, 22(5):11–18, December 1994.

[92] Nikita Synytskyy, James R. Cordy, and Thomas R. Dean. Robust multilingual parsing

using island grammars. In 2003 Conference of the Centre for Advanced Studies on

Collaborative Research (CASCON’03), pages 149–161. IBM Press, 2003.

[93] Adrian Thurston. The Colm programming language.

http://www.complang.org/colm/.

[94] Masaru Tomita. Efficient parsing for natural language. Kluwer Academic Publishers,

Boston, 1986.

[95] Stephen H. Unger. A global parser for context-free phrase structure grammars. Communications of the ACM, 11(4):240–247, 1968.

[96] Leslie G. Valiant. General context-free recognition in less than cubic time. Journal of Computer and System Sciences, 10(2):308–315, April 1975.

[97] Mark G. J. van den Brand, J. Heering, P. Klint, and P. A. Olivier. Compiling language

definitions: The ASF+SDF compiler. ACM Transactions on Programming Languages

and Systems, 24(4):334–368, July 2002.

[98] Mark G. J. van den Brand and Paul Klint. ATerms for manipulation and exchange

of structured data: It’s all about sharing. Information and Software Technology,

49(1):55–64, 2007.

[99] Mark G. J. van den Brand, Paul Klint, and Jurgen J. Vinju. Term rewriting with

traversal functions. ACM Transactions on Software Engineering and Methodology,

12(2):152–190, April 2003.

[100] Mark G. J. van den Brand, Steven Klusener, Leon Moonen, and Jurgen J. Vinju. Generalized parsing and term rewriting - semantics directed disambiguation. In 3rd Workshop on Language Descriptions, Tools and Applications (LDTA’03), volume 82(3) of Electronic Notes in Theoretical Computer Science, pages 575–591, 2003.

[101] Mark G. J. van den Brand, Jeroen Scheerder, Jurgen Vinju, and Eelco Visser. Disambiguation filters for scannerless generalized LR parsers. In 11th International Conference on Compiler Construction (CC’02), volume 2304 of Lecture Notes in Computer Science, pages 143–158, 2002.

[102] Mark G. J. van den Brand, Alex Sellink, and Chris Verhoef. Current parsing techniques in software renovation considered harmful. In 6th International Workshop on Program Comprehension (IWPC’98), pages 108–117. IEEE Computer Society, 1998.

[103] Eric R. Van Wyk and August C. Schwerdfeger. Context-aware scanning for parsing extensible languages. In 6th International Conference on Generative Programming and Component Engineering (GPCE’07), pages 63–72. ACM Press, 2007.

[104] Krishnamurti Vijay-Shanker and David J. Weir. Polynomial time parsing of combinatory categorial grammars. In 28th International Meeting of the Association for Computational Linguistics (ACL’90), pages 1–8. Association for Computational Linguistics, 1990.

[105] Eelco Visser. Syntax definition for language prototyping. PhD thesis, University of

Amsterdam, Wiskunde en Informatica, 1997.

[106] Eelco Visser. Strategic pattern matching. In Rewriting Techniques and Applications

(RTA’99), volume 1631 of Lecture Notes in Computer Science, pages 30–44, 1999.

[107] Eelco Visser. Program transformation with Stratego/XT: Rules, strategies, tools, and

systems in StrategoXT-0.9. In Domain-Specific Program Generation, volume 3016 of

Lecture Notes in Computer Science, pages 216–238, 2004.

[108] Eelco Visser and James R. Cordy. Tiny imperative language. http://www.program-transformation.org/Sts/TinyImperativeLanguage.

[109] Daniel Waddington and Bin Yao. High-fidelity C/C++ code transformation. Science

of Computer Programming, 68(2):64–78, 2007.

[110] Philip Wadler. How to replace failure by a list of successes. In Functional Programming

Languages and Computer Architecture (FPCA’85), pages 113–128. Springer-Verlag,

1985.

[111] Tim A. Wagner and Susan L. Graham. Incremental analysis of real programming

languages. In ACM SIGPLAN 1997 conference on Programming language design and

implementation (PLDI’97), pages 31–43. ACM Press, 1997.

[112] Daniel H. Younger. Recognition and parsing of context-free languages in time n³. Information and Control, 10(2):189–208, 1967.