Parsing for Agile Modeling
Total Page:16
File Type:pdf, Size:1020Kb
Parsing For Agile Modeling Inauguraldissertation der Philosophisch-naturwissenschaftlichen Fakultat¨ der Universitat¨ Bern vorgelegt von Jan Kursˇ von Tschechien | downloaded: 7.10.2021 Leiter der Arbeit: Prof. Dr. Oscar Nierstrasz Institut fur¨ Informatik https://doi.org/10.24442/boristheses.891 source: Parsing For Agile Modeling Inauguraldissertation der Philosophisch-naturwissenschaftlichen Fakultat¨ der Universitat¨ Bern vorgelegt von Jan Kursˇ von Tschechien Leiter der Arbeit: Prof. Dr. Oscar Nierstrasz Institut fur¨ Informatik Von der Philosophisch-naturwissenschaftlichen Fakultat¨ angenommen. Bern, 25.10.2016 Der Dekan: Prof. Dr. Gilberto Colangelo This dissertation can be downloaded from scg.unibe.ch. Copyright c 2016 by Jan Kursˇ This work is licensed under a Creative Commons Attribution-Non-Commercial-No derivative works 2.5 Switzerland license. To see the license go to http://creativecommons.org/licenses/by-sa/2.5/ch/ Attribution–ShareAlike Abstract Agile modeling refers to a set of methods that allow for a quick initial development of an importer and its further refinement. These requirements are not met simultaneously by the current parsing technology. Problems with parsing became a bottleneck in our research of agile modeling. In this thesis we introduce a novel approach to specify and build parsers. Our approach allows for expressive, tolerant and composable parsers with- out sacrificing performance. The approach is based on a context-sensitive extension of parsing expression grammars that allows a grammar engineer to specify complex language restrictions. To insure high parsing perfor- mance we automatically analyze a grammar definition and choose differ- ent parsing strategies for different parts of the grammar. We show that context-sensitive parsing expression grammars allow for highly composable, tolerant and variable-grained parsers that can be easily refined. Different parsing strategies significantly insure high-performance of parsers without sacrificing expressiveness of the underlying grammars. 2 Contents 1 Introduction 8 1.1 Agile Modeling . .9 1.2 Parsing Obstacles of Agile Modeling . 12 1.3 Thesis . 13 1.4 Our Contribution . 13 2 Overview of Parsing Technologies 15 2.1 Parsing in the Wild . 15 2.1.1 Expressive Power . 15 2.1.2 Composability . 17 2.1.3 Tolerant Grammars and Semi-Parsing . 18 2.1.4 Performance . 19 2.1.5 Parsing Frameworks . 20 2.2 Existing Limitations . 22 2.3 Our Solution . 23 3 Parsing Expression Grammars and PetitParser 24 3.1 Parsing Expression Grammars . 24 3.1.1 PEG Analysis . 25 3.1.2 Parser Combinators . 26 3.2 PetitParser . 29 4 Context Sensitivity in Parsing Expression Grammars 32 4.1 Motivating Example . 33 4.2 Parsing Contexts . 34 4.2.1 Context-Sensitive Extension . 35 4.2.2 Indentation Stack . 35 4.3 Parsing Contexts in Parsing Expression Grammars . 37 4.3.1 Parser Combinators . 40 4.3.2 CS-PEG analysis . 40 4.4 Implementation . 43 4.4.1 Performance . 44 4.5 Case Studies . 47 4.5.1 Python . 47 4.5.2 Markdown . 49 4.6 Related Work . 52 4.7 Conclusion . 54 4 CONTENTS 5 5 Semi-Parsing with Bounded Seas 55 5.1 Motivating Example . 56 5.1.1 Why not use Regular Expressions? . 56 5.1.2 A Na¨ıve Island Grammar . 57 5.1.3 An Advanced Island Grammar . 57 5.2 Bounded Seas . 59 5.2.1 The Sea Boundary . 60 5.2.2 The Context Sensitivity of Bounded Seas . 61 5.3 Bounded Seas in Parsing Expression Grammars . 62 5.3.1 The Water Operator . 63 5.3.2 The NEXT function . 67 5.3.3 BS-PEG analysis . 71 5.4 Implementation . 72 5.4.1 Performance . 73 5.5 Java Parser Case Study . 73 5.5.1 Without Nested Classes . 75 5.5.2 With Nested Classes . 75 5.5.3 With Return Types . 76 5.5.4 Performance . 77 5.6 Related Work . 79 5.7 Conclusion . 82 6 Adaptable Parsing Strategies 83 6.1 Motivating Example . 84 6.1.1 Composition Overhead . 86 6.1.2 Superfluous Intermediate Objects . 86 6.1.3 Backtracking Overhead . 87 6.1.4 Context-Sensitivity Overhead . 87 6.2 A Parser Combinator Compiler . 88 6.2.1 Adaptable Strategies . 88 6.3 Parser Optimizations . 90 6.3.1 Regular Optimizations . 92 6.3.2 Context-Free Optimizations . 94 6.3.3 Context-Sensitive Optimizations . 96 6.4 Performance analysis . 100 6.4.1 PetitParser compiler . 100 6.4.2 Benchmarks . 101 6.4.3 Parsing Strategies Impact . 103 6.4.4 Scanner Impact . 105 6.4.5 Memoization Impact . 107 6.4.6 Java Parsers Comparison . 108 6.4.7 Smalltalk Parsers Comparison . 108 6.5 Related Work . 109 6.6 Conclusion . 110 7 Ruby Case study 112 7.1 Ruby Structure . 112 7.1.1 The Dangling End Problem . 113 7.1.2 Measurements . 115 7.2 Ruby Method Calls . 116 CONTENTS 6 7.2.1 Measurements . 117 7.3 Performance . 119 7.4 Conclusion . 121 8 Conclusion 123 A Formal development of PEGs 136 B Bounded Seas Examples 141 B.1 Example of Dynamic NEXT computation . 141 B.2 Example of Static NEXT computation . 142 B.3 Overlapping Seas Example . 147 C Implementation 152 C.1 Bounded seas . 152 D Layout Sensitivity in the Wild 159 D.1 Haskell . 159 D.2 Python . 160 D.3 F#.................................... 161 D.4 YAML . 162 D.5 OCaml . 162 D.6 CoffeeScript . 163 D.7 Grace . 163 D.8 SRFI 49 — Indentation-Sensitive Scheme . 164 D.9 Elastic Tabstops . 164 E Scanner 166 E.1 Scanners in PEG-based parsers . 166 E.1.1 Tokens and Scannable Parsing Expressions . 166 E.1.2 Scannable Choices . 167 E.1.3 Scanner . 167 E.2 Regular Parsing Expressions . 170 E.3 Regular Parsing Expression Languages . 172 E.4 Finite State Automata . 175 E.4.1 Construction of finite state automata from regular parsing ex- pressions (FSA)........................ 178 E.4.2 Determinization of the automata with epsilons and priorities (D) 181 F Measurements 183 F.1 Summary . 183 F.2 Strategies Details . 187 F.2.1 Expressions . 190 F.2.2 IS Expressions . 191 F.2.3 CF Python . 192 F.2.4 Python . 193 F.2.5 Smalltalk . 194 F.2.6 Java . 195 F.2.7 Java Sea . 196 F.3 Scanner Impact . 197 F.3.1 Expressions . 199 CONTENTS 7 F.3.2 Smalltalk . 200 F.4 Memoization Details . 201 F.5 Smalltalk Parsers . 202 F.6 Java Parsers . ..