A PEG-Based Macro System for Java

Université Catholique de Louvain Ecole Polytechnique de Louvain INGI Department A PEG-Based Macro System for Java A thesis submitted to the Université Catholique de Louvain in accordance with the requirements of the degree of Master in Computer Science (120 credits). Author: Nicolas Laurent Supervisor: Pr. Peter Van Roy Readers: Pr. Kim Mens & Yves Jaradin Louvain-la-Neuve, academic year 2012 - 2013 - 2 - Abstract Programming is abstracting. As such, programming languages supply us with abstraction facilities such as functions and classes. One such facility is the macro. A macro defines new language constructs, to be translated at compile-time into existing constructs. Macros permit the abstraction of patterns that cannot be abstracted by the usual abstraction facilities. We introduce caxap, a framework that adds a Lisp-like macro system on top of Java, a programming language with a complex grammar and lacking the property of homoiconicity. Our aim is to be as general as possible, allowing arbitrary syntax to be added to the language, and arbitrary computation to be done to translate the new syntax to the old. To that end, we use parsing expression grammars (PEGs) as tools to reason about and manipulate syntax. - 2 - Acknowledgements I would first like to thank my friends, which have helped me keep my sanity and my motivation during all these months. I’m thinking of Arnaud Beelen, Adrien Bibal, Etienne Martel, Anthony Popolo Cagnisi, Renaud Proye, Valentin Vansteenberghe and Romain Van Vooren. My thanks also extend to my family, which has always supported me. I would also like to thank my promoter, Peter Van Roy, for the freedom with which he has entrusted me. I hope I used it well. I’m also grateful to all the professors and teaching assistants which imparted their knowledge to me during these five years. - 2 - It is impossible to dissociate language from science or science from language, because every natural science always involves three things: the sequence of phenomena on which the science is based, the abstract concepts which call these phenomena to mind, and the words in which the concepts are expressed. To call forth a concept, a word is needed; to portray a phenomenon, a concept is needed. All three mirror one and the same reality. —Antoine Lavoisier, 1789 - 4 - Contents 1 Macros 9 1.1 Generalities......................................9 1.2 Examples....................................... 10 1.3 Matching Input and Defining Context....................... 11 1.4 Mapping an Input Sequence to its Replacement.................. 12 2 Motivation 13 2.1 Macros in the Wild.................................. 13 2.2 The Expressiveness of Macros............................ 14 2.3 Abstracting More................................... 14 2.4 Meta-Programming and Source Rewriting..................... 16 2.5 caxap: Source Rewriting Made Practical...................... 16 2.6 Targeting the Java Programming Language.................... 17 2.7 Meta-Abstraction................................... 17 2.8 Examples....................................... 18 3 Parsing Expression Grammars 19 3.1 Context Free Grammars (CFGs).......................... 19 3.1.1 CFG Basics.................................. 19 3.1.2 Extended Notations............................. 20 3.1.3 Generated Language & Nonterminal Expansion.............. 21 3.1.4 Parsing CFGs................................. 21 3.2 PEG as a CFG Extension.............................. 22 3.2.1 Ordered Choice and Single Parse Rule................... 23 3.2.2 Lookahead Operators............................ 23 3.2.3 Philosophical Differences........................... 23 3.3 Practical Implication of PEGs’ Specificities.................... 24 3.3.1 Ambiguity & Language Hiding....................... 24 3.3.2 Left-Recursion................................ 24 3.3.3 Backtracking & Greed............................ 24 3.3.4 Scannerless Parsing............................. 25 3.3.5 Memoization................................. 26 3.4 Further Comparison of PEGs and CFGs...................... 26 3.4.1 Reasoning about Grammars......................... 26 3.4.2 Expressivity.................................. 27 3.4.3 Grammar Composition............................ 28 3.5 Definition of the PEG Formalism.......................... 28 5 3.6 Extended Notation.................................. 30 3.6.1 Matching Characters............................. 30 3.6.2 New Operators................................ 31 3.6.3 Recapitulation & Precedence........................ 32 3.7 Parsing PEGs..................................... 33 3.7.1 Naive Parser................................. 33 3.7.2 Packrat Parsing................................ 33 3.7.3 Left-Recursion................................ 35 3.8 Why PEGs...................................... 36 4 User Manual 39 4.1 Introduction...................................... 39 4.1.1 What is caxap ?............................... 39 4.1.2 What is a macro?.............................. 40 4.1.3 How does caxap deal with files?....................... 40 4.1.4 How does caxap recognize macros?..................... 40 4.1.5 How does caxap expand macros?...................... 41 4.1.6 Where can macro calls appear?....................... 41 4.2 Installation...................................... 41 4.3 Invocation....................................... 42 4.4 A Simple Example.................................. 42 4.5 The require Statement: Importing Macros and Specifying Dependencies.... 45 4.5.1 import and require ............................. 45 4.5.2 General Form................................. 46 4.5.3 Package-Wide and Class-Wide Requires.................. 46 4.5.4 Syntactic Sugars............................... 46 4.5.5 Static Requires................................ 47 4.5.6 Macro Requires................................ 47 4.5.7 When to use import or require?...................... 48 4.6 The Match Class: Representing the Syntax Tree.................. 48 4.7 Macro Definition................................... 49 4.7.1 Spacing.................................... 49 4.7.2 Compile-Time Environement........................ 50 4.7.3 Captures................................... 51 4.7.4 Expansion Strategies............................. 51 4.7.5 Prioritary Macros.............................. 52 4.8 Working with Matches................................ 53 4.8.1 Match Information.............................. 53 4.8.2 MatchSpec ................................... 54 4.8.3 Finding Submatches............................. 55 4.8.4 Checking for the Presence of Submatches................. 57 4.8.5 Examples................................... 57 4.9 (Quasi)quotation................................... 57 4.9.1 Quotation Basics............................... 58 4.9.2 Nested Quotations.............................. 60 4.9.3 Non-Recursive Unquotation......................... 61 4.9.4 Splicing.................................... 61 - 6 - 4.9.5 Quotation & Escaped Sequences...................... 62 4.10 Macro Composition.................................. 63 4.10.1 Composing Unrelated Macros........................ 63 4.10.2 Macro Expansion Order & Raw Macros.................. 64 4.10.3 Macros Referencing Macros......................... 64 4.11 Error Reporting.................................... 64 4.12 Miscellaneous Concerns............................... 65 5 Implementation & Internals 67 5.1 Overview of the Architecture............................ 67 5.1.1 driver..................................... 67 5.1.2 files...................................... 68 5.1.3 grammar................................... 68 5.1.4 source..................................... 69 5.1.5 grammar.java................................. 69 5.1.6 parser..................................... 70 5.1.7 trees...................................... 71 5.1.8 compiler.................................... 71 5.1.9 compiler.java................................. 72 5.1.10 compiler.util................................. 72 5.1.11 compiler.macros............................... 72 5.1.12 util and util.apache.............................. 73 5.2 The require Statement............................... 73 5.2.1 Dependencies and the Java Compiler.................... 73 5.2.2 Problems with the import Statements................... 74 5.2.3 Why do we need compile time dependencies?............... 75 5.3 Finding Submatches................................. 76 5.4 Quotations...................................... 76 5.5 The Java Grammar.................................. 77 6 Evaluation 79 6.1 Testing........................................ 79 6.2 Performance...................................... 80 6.2.1 Setup & Preliminary Considerations.................... 80 6.2.2 Profiling.................................... 80 6.2.3 Memory Consumption............................ 81 6.2.4 Parser Performance............................. 81 6.2.5 Parser vs Back-End............................. 83 6.2.6 Complexity Bounds............................. 83 6.2.7 Improving Performances........................... 84 7 Related Work 85 7.1 More Macro Concepts................................ 85 7.1.1 Homoiconicity................................ 85 7.1.2 Hygiene.................................... 86 7.2 Java Macro Frameworks............................... 86 7.2.1 java-macros.................................. 87 7.2.2 Jatha..................................... 87 - 7 - 7.2.3 The Jakarta Toolset (JTS)......................... 88 7.2.4 The Java Syntactic Extender (JSE).................... 89 7.3

Load more