Spoon: A Library for Implementing Analyses and Transformations of Java Source Code

Renaud Pawlak, Martin Monperrus, Nicolas Petitprez, Carlos Noguera, Lionel Seinturier

To cite this version: Renaud Pawlak, Martin Monperrus, Nicolas Petitprez, Carlos Noguera, Lionel Seinturier. Spoon: A Library for Implementing Analyses and Transformations of Java Source Code. Software: Practice and Experience, Wiley, 2015, 46, pp. 1155-1179. DOI: 10.1002/spe.2346. HAL Id: hal-01078532 (https://hal.inria.fr/hal-01078532v2), submitted on 12 Sep 2015. Distributed under a Creative Commons Attribution - ShareAlike 4.0 International License.

Abstract

This article presents Spoon, a library for the analysis and transformation of Java source code. Spoon enables Java developers to write a large range of domain-specific analyses and transformations in an easy and concise manner. Spoon analyses and transformations are written in plain Java. With Spoon, developers do not need to dive into parsing, to hack a compiler infrastructure, or to master a new formalism.

1 Introduction

Compilers and interpreters analyze source code. But source code analysis is used in many more places [6]: to compute metrics [17], to detect bad smells [18], and to detect code clones [20].
Companies and open-source projects set up their own metrics and coding conventions [16]. This motivates a library for source code analysis that is usable by the mass of developers, rather than one dedicated to compiler hackers.

Beyond source code analysis, there is source code transformation. Source code transformation is program transformation at the source code level, as opposed to program transformation done on binary code [8]. Program transformation has many usages: profiling [41], security [11], optimization [28], and refactoring [24]. As with source code analysis, some source code transformations are written by ordinary Java developers, for instance when the transformation uses domain-specific knowledge [5].

This article presents Spoon, a library for the analysis and transformation of Java source code. Spoon enables Java developers to write a large range of domain-specific analyses and transformations in an easy and concise manner. Spoon analyses and transformations are written in plain Java. With Spoon, developers do not need to dive into parsing, to hack a compiler infrastructure, or to master a new formalism. The main features of Spoon are:
1. a Java metamodel for representing Java abstract syntax trees (AST) which is both easy to understand and easy to manipulate;
2. a first-class intercession programming interface (intercession API) to modify and generate Java source code;
3. the use of generic typing for static checking of the analyses and transformations;
4. the native and seamless integration and processing of Java annotations;
5. a pure Java statically-checked templating engine.

Taken together, these features make Spoon unique for the following reasons. Features #1 and #2 are not the focus of compiler infrastructures. We are not aware of other work leveraging generics for AST manipulation (Feature #3). Java annotations have been extensively discussed [14], but generic annotation processors are scarce (Feature #4).
While many templating engines exist (e.g. Apache Velocity¹), none of them provides the static checking that our plain Java templates provide. The related work section (Section 5) discusses these points in more depth.

This paper supersedes INRIA technical report #5901 [36], which has been completely rewritten. It contains a better explanation of the concepts using appropriate illustrative examples; the evaluation contains a section on the correctness of Spoon as well as three new case studies; and the related work is thoroughly analyzed (including the most recent papers).

The rest of this article is organized as follows. Section 2 discusses the foundations of source code analysis in Spoon (the metamodel and the queries). Section 3 presents our mechanisms for transforming source code. Section 4 presents case studies of fruitful usages of Spoon. Section 5 discusses the related work. Section 6 concludes and discusses future work.

2 Source Code Analysis with Spoon

The first goal of Spoon is to enable standard developers to write their own domain-specific analyses on source code. This requires, first, an intuitive metamodel understandable by the mass of Java developers (presented in Section 2.2) and, second, mechanisms to analyze source code elements. The latter are embodied by queries (Section 2.3) and by processors for traversing the program under analysis (Section 2.4). But let us first give an overview of the library before going into the details of the Java metamodel of Spoon.

Spoon is a meta-analysis tool: it provides software engineers with the primitives to write their own analyses. As such, Spoon does not provide any specific analysis such as dataflow analysis.

2.1 Overview of Spoon

Figure 1 gives the overview of our approach. A Java program is given as input. It is parsed with an off-the-shelf compiler in order to produce a first abstract syntax tree (AST). Then, Spoon simplifies the AST (deleting and creating nodes) in order to provide users with an intuitive and easy-to-manipulate model of their program.
This compile-time (CT) model is an instance of the Spoon metamodel. The analyses and transformations of programs are written as “program processors” and “templates”. A user-defined processor performs a specific action, such as a transformation on a kind of node, under a well-defined processing condition. The processing and templating engine takes processors and templates as input and applies them to the Java model as long as elements remain to be processed. Eventually, the Spoon model is translated back to source code with a pretty-printer. This pretty-printer preserves API comments but removes in-body comments, because the latter are not part of the metamodel.

[Figure 1: Overview of Spoon: Java programs are transformed and analyzed as models of the Spoon Java metamodel. Pipeline: Java program (source code) → parsing → low-level abstract syntax tree → AST simplification → Spoon Java model (instance of the Spoon metamodel) → processing and templating engine (applying processors for analysis and transformation) → Java syntax printing → transformed Java program.]

¹ http://velocity.apache.org

2.2 The Spoon Metamodel of Java

A programming language can have different metamodels. An abstract syntax tree (AST), or model, is an instance of a metamodel. Each metamodel, and consequently each AST, is more or less appropriate depending on the task at hand. In this paper, we focus on Java and consequently on Java metamodels. For instance, the Java metamodel of Sun’s compiler (javac) has been designed and optimized for compilation to bytecode, while the main purpose of the Java metamodel of the Eclipse IDE (JDT) is to support different tasks of software development in an integrated manner (code completion, quick fix of compilation errors, debugging, etc.). Unlike a compiler-based AST (e.g. from javac), the Spoon metamodel of Java is designed to be easily understandable by normal Java developers, so that they can write their own program analyses and transformations.
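To make the notion of a processor from Section 2.1 concrete, the following Java sketch models a processor as an object that targets one kind of node, filters candidates with a processing condition, and acts on those that pass, while an engine walks the model with a worklist until no element remains to be processed. It is a self-contained toy, not Spoon's API: ToyNode, ToyMethodNode, ToyProcessor, ToyProcessingEngine and PublicMethodCollector are hypothetical names introduced for illustration. The split between isToBeProcessed (the condition) and process (the action) follows the shape of Spoon's processors, but the traversal strategy below is our own assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model (not Spoon's API): every node has an ordered list of children.
abstract class ToyNode {
    final List<ToyNode> children = new ArrayList<>();

    ToyNode add(ToyNode child) {
        children.add(child);
        return this;
    }
}

class ToyClassNode extends ToyNode {
    final String name;
    ToyClassNode(String name) { this.name = name; }
}

class ToyMethodNode extends ToyNode {
    final String name;
    final boolean isPublic;

    ToyMethodNode(String name, boolean isPublic) {
        this.name = name;
        this.isPublic = isPublic;
    }
}

// A processor targets one kind of node E: isToBeProcessed is the processing
// condition, process is the action (an analysis or a transformation).
abstract class ToyProcessor<E extends ToyNode> {
    private final Class<E> kind;

    ToyProcessor(Class<E> kind) { this.kind = kind; }

    boolean isToBeProcessed(E candidate) { return true; }

    abstract void process(E element);

    // Only nodes of kind E that satisfy the condition reach process().
    final void visit(ToyNode node) {
        if (kind.isInstance(node)) {
            E element = kind.cast(node);
            if (isToBeProcessed(element)) {
                process(element);
            }
        }
    }
}

// Hypothetical engine: pre-order traversal driven by a worklist; it stops
// when no element remains to be processed.
final class ToyProcessingEngine {
    static void run(ToyNode root, List<? extends ToyProcessor<?>> processors) {
        Deque<ToyNode> worklist = new ArrayDeque<>();
        worklist.push(root);
        while (!worklist.isEmpty()) {
            ToyNode node = worklist.pop();
            for (ToyProcessor<?> p : processors) {
                p.visit(node);
            }
            // Push children in reverse so they are popped in source order.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                worklist.push(node.children.get(i));
            }
        }
    }
}

// Example analysis processor: collects the names of public methods.
class PublicMethodCollector extends ToyProcessor<ToyMethodNode> {
    final List<String> found = new ArrayList<>();

    PublicMethodCollector() { super(ToyMethodNode.class); }

    @Override
    boolean isToBeProcessed(ToyMethodNode m) { return m.isPublic; }

    @Override
    void process(ToyMethodNode m) { found.add(m.name); }
}

class ProcessorDemo {
    public static void main(String[] args) {
        ToyNode model = new ToyClassNode("Account")
                .add(new ToyMethodNode("deposit", true))
                .add(new ToyMethodNode("audit", false))
                .add(new ToyMethodNode("withdraw", true));
        PublicMethodCollector collector = new PublicMethodCollector();
        ToyProcessingEngine.run(model, List.of(collector));
        System.out.println(collector.found); // prints [deposit, withdraw]
    }
}
```

Because the engine pushes a node's children only after all processors have visited that node, children added by a transforming processor are visited too; this is one simple reading of "as long as elements remain to be processed".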
The Spoon metamodel is complete in the sense that it contains all the information required to derive compilable and executable Java programs (hence it contains annotations, generics, and method bodies).

The Spoon metamodel can be split into three parts. The structural part (Figure 2) contains the declarations of the program elements, such as interface, class, variable, method, annotation, and enum declarations. The code part (Figure 3) contains the executable Java code, such as the code found in method bodies. The reference part models the references to program elements (for instance a reference to a type).

As shown in Figure 2, all elements inherit from CtElement, which declares a parent element denoting the containment relation in the source file. For instance, the parent of a method node is a class node. All names are prefixed by “Ct”, which means “compile-time”.

[Figure 2: Excerpt of the structural part of the Spoon Java 5 metamodel.]

Figure 3 shows the metamodel for Java executable code. Because of the complexity of the Java language, it contains only an excerpt of all classes. There are two main kinds of code elements. First, the statements (CtStatement) are untyped top-level instructions that can be used directly in a block of code. Second, the expressions (CtExpression) are used inside the statements (for the sake of readability, this is not shown in the figure). For instance, a CtLoop (which is a statement) points to a CtExpression which expresses its boolean condition. Some code elements, such as invocations and assignments, are both statements and expressions (multiple inheritance links). Concretely, this translates into an interface CtInvocation inheriting from both interfaces CtStatement and CtExpression. The generic type of CtExpression is used to add static type checking when transforming programs. This is explained in detail in Section 3.2.
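The distinction between statements and expressions, and the role of the generic parameter of CtExpression, can be illustrated with a small Java sketch. The interfaces below (MiniElement, MiniStatement, MiniExpression<T>, MiniInvocation<T>) and the classes implementing them are hypothetical, simplified stand-ins that only mirror the shape described above; they are not Spoon's actual interfaces. The sketch shows three things: every element records its parent (the containment relation), an invocation is both a statement and an expression through multiple interface inheritance, and a loop condition typed as MiniExpression<Boolean> lets the Java compiler reject a non-boolean condition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the metamodel shape described in the
// text; these are NOT Spoon's actual interfaces.
interface MiniElement {
    MiniElement getParent();
    void setParent(MiniElement parent);
}

// Statements are untyped and can be used directly in a block of code.
interface MiniStatement extends MiniElement {}

// Expressions carry the static type T of the value they evaluate to.
interface MiniExpression<T> extends MiniElement {}

// An invocation is both a statement and an expression (multiple inheritance).
interface MiniInvocation<T> extends MiniStatement, MiniExpression<T> {
    String getMethodName();
}

abstract class MiniElementImpl implements MiniElement {
    private MiniElement parent;
    public MiniElement getParent() { return parent; }
    public void setParent(MiniElement parent) { this.parent = parent; }
}

class MiniLiteral<T> extends MiniElementImpl implements MiniExpression<T> {
    final T value;
    MiniLiteral(T value) { this.value = value; }
}

class MiniInvocationImpl<T> extends MiniElementImpl implements MiniInvocation<T> {
    private final String methodName;
    MiniInvocationImpl(String methodName) { this.methodName = methodName; }
    public String getMethodName() { return methodName; }
}

// A while loop is a statement whose condition must be a boolean expression.
class MiniWhile extends MiniElementImpl implements MiniStatement {
    private MiniExpression<Boolean> condition;
    private final List<MiniStatement> body = new ArrayList<>();

    void setCondition(MiniExpression<Boolean> condition) {
        condition.setParent(this); // containment: the loop is the parent
        this.condition = condition;
    }

    void addStatement(MiniStatement statement) {
        statement.setParent(this);
        body.add(statement);
    }

    MiniExpression<Boolean> getCondition() { return condition; }
    List<MiniStatement> getBody() { return body; }
}

class MetamodelDemo {
    public static void main(String[] args) {
        MiniWhile loop = new MiniWhile();
        MiniInvocationImpl<Boolean> hasNext = new MiniInvocationImpl<>("hasNext");
        loop.setCondition(hasNext);                              // invocation used as an expression
        loop.addStatement(new MiniInvocationImpl<Void>("next")); // invocation used as a statement
        // loop.setCondition(new MiniLiteral<>("yes"));          // rejected by javac: not a boolean expression
        System.out.println(hasNext.getParent() == loop);         // prints true
    }
}
```

The commented-out line shows the intended benefit of the generic parameter: a transformation that tries to plug a String-valued expression into a boolean slot fails at compile time rather than producing an ill-typed program.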
The reference part of the metamodel expresses the fact that a program references elements that are not necessarily reified in the metamodel (they may belong to third-party libraries). For instance, an expression node returning a String is bound to a type reference to String, and not to the compile-time model of String.java, since the source code of String is (usually) not part of the application code under analysis. In other words, metamodel elements use references to point to other elements in a weak way: the referenced element does not need to be part of the model.
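The weak nature of references can be sketched as follows: a reference stores only the qualified name of its target and resolves it against the model on demand, so a reference to a library type such as java.lang.String remains valid even though no declaration for it exists in the model. RefModel, RefTypeDecl and RefTypeReference are hypothetical illustrations, and representing "not in the model" as an empty Optional is our own assumption rather than Spoon's design.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical declaration of a type whose source code is under analysis.
final class RefTypeDecl {
    final String qualifiedName;
    RefTypeDecl(String qualifiedName) { this.qualifiedName = qualifiedName; }
}

// Hypothetical model: it only contains declarations of application code.
final class RefModel {
    private final Map<String, RefTypeDecl> declarations = new HashMap<>();

    RefTypeDecl declare(String qualifiedName) {
        RefTypeDecl decl = new RefTypeDecl(qualifiedName);
        declarations.put(qualifiedName, decl);
        return decl;
    }

    Optional<RefTypeDecl> lookup(String qualifiedName) {
        return Optional.ofNullable(declarations.get(qualifiedName));
    }
}

// A weak reference stores only the target's qualified name and resolves it on
// demand, so it stays valid when the target (e.g. a JDK type) is not in the model.
final class RefTypeReference {
    final String qualifiedName;
    RefTypeReference(String qualifiedName) { this.qualifiedName = qualifiedName; }

    Optional<RefTypeDecl> resolve(RefModel model) {
        return model.lookup(qualifiedName);
    }
}

class ReferenceDemo {
    public static void main(String[] args) {
        RefModel model = new RefModel();
        model.declare("com.example.Account"); // hypothetical application class
        RefTypeReference accountRef = new RefTypeReference("com.example.Account");
        RefTypeReference stringRef = new RefTypeReference("java.lang.String");
        System.out.println(accountRef.resolve(model).isPresent()); // prints true
        System.out.println(stringRef.resolve(model).isPresent());  // prints false
    }
}
```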