Chapter 5 Disambiguation Strategies
Eindhoven University of Technology
MASTER
Disambiguation mechanisms and disambiguation strategies
ten Brink, A.P.
Award date: 2013

Department of Mathematics and Computer Science
Den Dolech 2, 5612 AZ Eindhoven
P.O. Box 513, 5600 MB Eindhoven
The Netherlands
www.tue.nl

Master’s Thesis
Author: Alex P. ten Brink
Supervisor: prof. dr. Mark van den Brand (TU/e)
Section: Model Driven Software Engineering
Date: August 23, 2013

Abstract

It is common practice for engineers to express artifacts such as domain specific languages using ambiguous context-free grammars along with separate disambiguation rules, rather than to use equivalent unambiguous context-free grammars, as these ambiguous grammars often capture the intent of the engineer more directly. In this thesis we give an overview of the landscape of these disambiguation rules as described in the literature.
In order to make proper comparisons between publications, we define a parser-technology independent framework for describing work done on disambiguation, using the notions of a disambiguation mechanism and a disambiguation strategy.

We investigate for a number of disambiguation mechanisms what grammars they can disambiguate, both relative to each other and in absolute terms: for example, we prove that the LR shift-reduce conflict resolution mechanism (used in the well-known tools Yacc and Bison) correctly disambiguates expression grammars with unary and binary operators. To the best of our knowledge, previous work only considered the correctness of this mechanism on specific grammars.

We define six quality measures on disambiguation strategies and compare the strategies introduced in the literature to each other on these measures. Finally, we introduce our own strategy based on regular rewriting rules. Our strategy scores well on all but one of our quality measures. Its weakness is that it may not succeed on all grammars. Our strategy is very extensible, however, and we give multiple options to improve its applicability in future work.

Acknowledgements

I would first like to thank my parents for their unconditional support, not only while I was working on my thesis but throughout my academic career (and before that); I would not have gotten to where I am without them. I would also like to thank the friends I made over the past five years for making my student life fun rather than a chore. In particular I thank Sander and Quirijn, who made even the contents of courses fun to work through, and Ana, for making the time I spent at the TU working on this thesis that much more enjoyable.

I would like to thank Mark van den Brand for his guidance, feedback and insights, both on this thesis and on related work.
The other major influence on this thesis and related work is Elisabeth Scott, whom I’d like to thank for her feedback, ideas and time, and in particular her remarks whenever I was not being precise enough, though I fear I may get a few more still. I would also like to thank Adrian Johnstone for reviewing part of this thesis. Finally, I would like to thank Tom Verhoeff and Kevin Buchin for serving on my examination committee.

Alex P. ten Brink
Eindhoven
August 23, 2013

Contents

Abstract
Acknowledgements
1 Introduction
2 Preliminaries
3 Disambiguation Mechanisms
  3.1 Introduction
  3.2 Disambiguation Mechanisms Introduced in the Literature
  3.3 Comparison of Disambiguation Mechanisms
4 Applying Disambiguation Mechanisms to Expression Grammars
  4.1 Real-World Example of a Hard Disambiguation Problem
  4.2 Expression Grammars
  4.3 Defining Precedences and Associativities
  4.4 Disambiguating Expression Grammars with Binary Operators
  4.5 Adding Unary Operators
5 Disambiguation Strategies
  5.1 Introduction
  5.2 Disambiguation Strategies Introduced in the Literature
6 A rewriting-based disambiguation strategy
  6.1 Introduction
  6.2 Systems of Regular Expressions
  6.3 Rewriting Rules for Regular Expressions
    6.3.1 Defining Rewriters
    6.3.2 Our Rewriters
  6.4 Structured Regular Expressions
  6.5 Rewriting Rules for Structured Regular Expressions
    6.5.1 Defining Rewriters
    6.5.2 Enforcing Filters
    6.5.3 Our Rewriters
7 Implementation
8 Conclusion

Chapter 1 Introduction

Context-free grammars are widely-used tools for describing the syntactic structure of programming languages. They are often used to describe the second step in the analysis of an input program, after (optional) tokenization and before semantic analysis. An advantage of this description is that for any context-free grammar it is possible to automatically generate a program, called a parser, that can compute this syntactic structure from an input, represented as a parse tree. A disadvantage is that some context-free grammars contain ambiguities: a single input may be parsed to multiple parse trees.

For programming languages it is desirable that a program has only one interpretation. Therefore one would like to only use unambiguous context-free grammars. However, in some cases the most natural context-free description of a programming language is an ambiguous one. Arithmetic expressions are an example: the most natural grammar for them leaves open whether 1 + 2 ∗ 3 should be parsed as (1 + 2) ∗ 3 or as 1 + (2 ∗ 3). Some additional form of disambiguation is then needed: in our example we would need to specify whether ∗ has priority over +. Often, an unambiguous grammar capturing this additional disambiguation can be found, but such grammars tend to be much larger, less intuitive and less maintainable, as they may not capture the intent of the language designer.

Mechanisms and Strategies

In practice it is therefore often preferable to use a syntactic description with two parts: an ambiguous context-free grammar along with disambiguation rules. There are many ways to achieve this separation, and many such methods have been described in the literature.
We call everything involved in such a method (the allowed disambiguation rules, the way these rules are applied in the parser, and the way the language designer is helped with choosing these rules and with understanding their effects) a disambiguation strategy. Most strategies have at their core a specific mechanism that specifies how inputs should be parsed (or in some cases, multiple mechanisms). Strategies (usually) detect ambiguities in the grammar and then find the right parameters for the mechanisms they employ so the inputs are parsed as specified. As a number of such disambiguation mechanisms are very common, it is useful to analyze them separately from the strategies that use them. When investigating a strategy, we can then use this knowledge of the mechanisms it uses to immediately say something about the strategy.

As an example of a strategy, consider the disambiguation strategy typical for users of Yacc or Bison. First, they create an initial grammar, generate the LR table and analyze the resulting conflicts by hand. They then specify precedence and associativity rules for their grammar that Yacc or Bison then uses to resolve the conflicts in the table. The disambiguation mechanism involved in this strategy is the resolution of LR table conflicts. Note that this is not really a good strategy: the user needs to do most of the work by hand, and the strategy may fail because not all ambiguities can be resolved through LR table conflict resolution.

While many disambiguation mechanisms and strategies have been proposed, not much work has gone into their evaluation and comparison. For example, to the best of our knowledge the correctness of the mechanism employed by Yacc and Bison has only been considered on examples.
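
To make the conflict-resolution mechanism mentioned above more concrete, the following is a minimal sketch in Python, not the actual Yacc or Bison implementation. It parses the ambiguous grammar E -> E op E | num with a hand-written shift-reduce loop instead of a generated LR table, and whenever both a shift and a reduce are possible it decides the way Yacc-style precedence declarations do: the higher precedence wins, and on a tie the associativity decides. The table PRECEDENCE, the functions resolve and parse, and the chosen operators are all hypothetical names introduced for illustration only.

```python
# Hypothetical precedence table (an assumption for this sketch):
# operator -> (level, associativity); a higher level binds tighter.
PRECEDENCE = {
    "+": (1, "left"),
    "-": (1, "left"),
    "*": (2, "left"),
    "^": (3, "right"),
}


def resolve(stack_op, lookahead_op):
    """Resolve a shift/reduce conflict between reducing 'E stack_op E'
    and shifting lookahead_op, in the style of precedence declarations."""
    rule_level, _ = PRECEDENCE[stack_op]
    token_level, token_assoc = PRECEDENCE[lookahead_op]
    if rule_level > token_level:
        return "reduce"
    if rule_level < token_level:
        return "shift"
    # Equal precedence: left-associative operators reduce, right-associative shift.
    return "reduce" if token_assoc == "left" else "shift"


def parse(tokens):
    """Shift-reduce parse of E -> E op E | num.

    tokens is a list of ints (operands) and strings (operators).
    Returns a parse tree as nested tuples (op, left, right).
    """
    stack = []
    pos = 0
    while True:
        lookahead = tokens[pos] if pos < len(tokens) else None
        # A reduction is possible when the stack ends in: operand, operator, operand.
        can_reduce = (
            len(stack) >= 3
            and isinstance(stack[-2], str)
            and not isinstance(stack[-1], str)
            and not isinstance(stack[-3], str)
        )
        if can_reduce and (
            lookahead is None
            or (lookahead in PRECEDENCE and resolve(stack[-2], lookahead) == "reduce")
        ):
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.append((op, left, right))
        elif lookahead is None:
            break
        else:
            stack.append(lookahead)  # shift
            pos += 1
    if len(stack) != 1 or isinstance(stack[0], str):
        raise ValueError("input is not a well-formed expression")
    return stack[0]
```

Under these assumed declarations, parse([1, "+", 2, "*", 3]) yields ("+", 1, ("*", 2, 3)), which corresponds to the reading 1 + (2 ∗ 3), while parse([1, "-", 2, "-", 3]) yields the left-associative tree ("-", ("-", 1, 2), 3). The sketch covers binary operators only and says nothing about whether such resolution is correct in general.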
This thesis takes some initial steps in this area by defining our parser-technology independent framework, consisting of the notions of disambiguation mechanisms and disambiguation strategies, and by addressing the following questions:
• What are relevant quality criteria for disambiguation mechanisms?
• Which disambiguation