High-Fidelity Metaprogramming with Separator Syntax Trees

High-Fidelity Metaprogramming with Separator Syntax Trees Rodin T. A. Aarssen Tijs van der Storm Centrum Wiskunde & Informatica Centrum Wiskunde & Informatica Amsterdam, The Netherlands Amsterdam, The Netherlands Eindhoven University of Technology University of Groningen Eindhoven, The Netherlands Groningen, The Netherlands [email protected] [email protected] Abstract Manipulation (PEPM ’20), January 20, 2020, New Orleans, LA, USA. Many metaprogramming tasks, such as refactorings, auto- ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3372884. mated bug fixing, or large-scale software renovation, require 3373162 high-fidelity source code transformations – transformations which preserve comments and layout as much as possible. 1 Introduction Abstract syntax trees (ASTs) typically abstract from such Many metaprogramming tasks require high-fidelity source details, and hence would require pretty printing, destroying code transformations: transformations that preserve as much the original program layout. Concrete syntax trees (CSTs) layout (comments, indentation, whitespace, etc.) of the input preserve all layout information, but transformation systems as possible. Typical examples include: or parsers that support CSTs are rare and can be cumbersome to use. • Automated refactorings [12]; In this paper we present separator syntax trees (SSTs), a • Mass maintenance scenarios [14]; lightweight syntax tree format, that sits between AST and • Automated bug fixing [10]. CSTs, in terms of the amount of information they preserve. High-fidelity metaprogramming further promotes end- SSTs extend ASTs by recording textual layout information user scripting of source code transformations, where pro- separating AST nodes. This information can be used to re- grammers not only apply standard refactorings and restruc- construct the textual code after parsing, but can largely be turings offered by mainstream IDEs or dedicated tools (such ignored when implementing high-fidelity transformations. as Coccinelle), but are able to script (one-off) transformations We have implemented SSTs in Rascal, and show how it themselves, using knowledge about their code bases. enables the concise definition of high-fidelity source code Unfortunately, high-fidelity is a high bar to reach. Most transformations using a simple refactoring for C++. program transformation systems employ ASTs to represent CCS Concepts • Software and its engineering → Trans- source code. While valuable in the sense that they abstract lator writing systems and compiler generators; Source from “non-essential detail”, simplifying metaprogramming code generation; Software maintenance tools. across the board, ASTs throw the baby out with the bath water from the perspective of high-fidelity transformation Keywords metaprogramming, high-fidelity code transfor- scenarios. Concrete Syntax Trees (CSTs), on the other hand, mations represent source code in exactly the way it was parsed, con- ACM Reference Format: taining all syntactical information. While “unparsing” such Rodin T. A. Aarssen and Tijs van der Storm. 2020. High-Fidelity trees gives back the original source code, CSTs usually reflect Metaprogramming with Separator Syntax Trees. In Proceedings of productions from some context-free grammar defining the the 2020 ACM SIGPLAN Workshop on Partial Evaluation and Program source language, which may not be available. In this paper we present Separator Syntax Trees (SSTs), Permission to make digital or hard copies of all or part of this work for a syntax tree representation that conveniently sits between personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the abstraction provided by ASTs, and the complexity of this notice and the full citation on the first page. Copyrights for components heavy-weight, full-blown CSTs (Section 2). We motivate SSTs of this work owned by others than ACM must be honored. Abstracting with using an example transformation, define their structure, and credit is permitted. To copy otherwise, or republish, to post on servers or to how they can be mapped back to source code. The use of redistribute to lists, requires prior specific permission and/or a fee. Request SSTs is further illustrated using a simple term rewriting permissions from [email protected]. language (HifiTRS), detailing pattern matching, substitution, PEPM ’20, January 20, 2020, New Orleans, LA, USA © 2020 Association for Computing Machinery. and how to deal with (separated) lists (Section 3). Finally, we ACM ISBN 978-1-4503-7096-7/20/01...$15.00 discuss our implementation in the Rascal metaprogramming https://doi.org/10.1145/3372884.3373162 system [8], evaluating SSTs on the definition of a simple C++ 27 PEPM ’20, January 20, 2020, New Orleans, LA, USA Rodin T. A. Aarssen and Tijs van der Storm refactoring (Section 4). Throughout the paper, we use Rascal Concrete syntax trees solve a large part of the problem of in the meta program code examples. high-fidelity metaprogramming, yet suffer from a number of drawbacks: 1.1 Overview and Motivating Example • Transformation systems that support CSTs are scarce, At the heart of many programming transformations, are and even if CSTs are supported, adapting existing rewrite rules such as the following: parsers to produce CSTs is non-trivial. • ifThen(not(e), s1, s2) ) ifThen(e, s2, s1) Writing a grammar for a language from scratch in the formalism of systems that do support CSTs is a This example uses pattern-matching against ASTs, matching complex task [7], and hardly feasible for complex lan- conditional statements with a negated conditional. The rule guages such as C++ and Cobol. replaces matching subtrees somewhere in the program with • Without support for concrete pattern matching, pro- the pattern on the right-hand side where the then and else cessing CSTs can be cumbersome. branches are swapped. Applying such rules to a program’s Separator syntax trees represent a middle-ground between AST produces another AST, which has to be rendered to the fidelity of CSTs and the simplicity of ASTs. SSTs are like actual source code again. ASTs, except each node is annotated with a list of strings, rep- The example illustrates two problems for high-fidelity resenting the literal source code that separates the children metaprogramming: of the node. SSTs support unparsing, like CSTs, to obtain the • Source code layout of the source is lost in translation; original textual source code. • The new term constructed on the right-hand side has In terms of metaprogramming, SSTs can be the basis of no layout at all. rule-based rewriting systems that support rules like the fol- As a result, such metaprograms require post-transformation lowing: pretty printing, inventing new layout for the transformed ifThen(not(e), s1, s2) ) program. In many cases this is unacceptable, because it loses ( Stmt )ìf (<Exp e >) <Stmt s2 > else <Stmt s1 >` comments of the input source, and might make the transformed program look foreign to the developers of the code, In this case the left-hand side consists of an abstract pattern, because of coding conventions the pretty-printer is unaware which will be matched against an SST (modulo the separa- of. tors). As a result, e, s1, and s2 preserve their separator lists Systems such as ASF+SDF [3], TXL [4], and Rascal [8] sup- when inserted into the right-hand side. The right-hand side port CSTs, which preserve all layout of the source program; pattern is itself parsed into an SST, and hence will get the additionally, such systems support concrete syntax patterns layout as specified in the rule itself. to allow rewrite rules to be expressed directly in the object SSTs are easy to construct, either during parsing, or after- language. For instance, the above rule could be expressed in wards if accurate source location information is available. Rascal as follows: This opens the way to high-fidelity metaprogramming in systems like Rascal or TXL, when parsing is realized by reusing ( Stmt )ìf (!< Exp e >) <Stmt s1 > else <Stmt s2 >` ) external, black box parsers (cf. [1]). ( Stmt )ìf (<Exp e >) <Stmt s2 > else <Stmt s1 >` In this case the left-hand side matches against CSTs (ignoring 2 Separator Syntax Trees layout), which means that e, s1, and s2 will carry over all 2.1 Introduction internal layout information, when substituted in the right- As a compromise between ASTs and CSTs, we introduce hand side. Conversely, the concrete pattern on the right-hand Separator Syntax Trees (SSTs). SSTs contain the structure side defines the layout of the rewritten conditional, aspart of abstract syntax, augmented with the separator strings of the metaprogram. that are present in source code, filling the gaps around and In the previous example, the conditional on the right-hand between abstract syntax nodes. Compared to CSTs, where side is written (and parsed) as a one-liner. The metaprogram- such separator strings might come in different forms (e.g., 1 mer could have specified a different layout, for instance : whitespace, or a literal representing some operator), in SSTs ( Stmt )ìf (!< Exp e >) <Stmt s1 > else <Stmt s2 >` ) this distinction is not made; the “non-essential” information ( Stmt )ìf (<Exp e >) in between AST nodes is recorded simply as text. ' <Stmt s2 > Listing 1 shows the meta-definition of SSTs in the form 'else of the algebraic data type Term. The leaves of an SST are ' <Stmt s1 >` represented by (literal) token nodes, corresponding

High-Fidelity Metaprogramming with Separator Syntax Trees

Source-To-Source Translation and Software Engineering

Extending Fortran with Meta-Programming

ROSE Tutorial: a Tool for Building Source-To-Source Translators Draft Tutorial (Version 0.9.11.115)

Towards Large-Scale Refactoring for Ocaml

Combining Source and Target Level Cost Analyses for Ocaml Programs

A Bundle of Program Transformation Tools System Description

When Polyhedral Transformations Meet SIMD Code Generation

Formalizing the SSA-Based Compiler for Verified Advanced Program Transformations Jianzhou Zhao University of Pennsylvania, [email protected]

Compiler-Based Code-Improvement Techniques

Design and Implementation of a PHP Compiler Front-End

Tail Modulo Cons

Structured Program Generation Techniques