Design and Implementation of Semantic Patch Support for the Spoon Java Transformation Engine

DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2021

MIKAEL FORSBERG
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master in Computer Science
Date: January 26, 2021
Supervisor: Nicolas Yves Maurice Harrand
Examiner: Martin Monperrus
Swedish title: Design och implementering av stöd för semantiska patchar för Javatransformeringsmotorn Spoon

Abstract

Software development is more often than not a collaborative process, creating a need for tools and file formats that enable developers to create and share succinct representations of changes to source code in order to facilitate efficient communication. Standard POSIX diffs and patches have long been important parts of the toolkit, but their lack of support for the syntax and semantics of specific programming languages results in limited expressiveness. The Semantic Patch Language (SmPL), introduced in 2006 together with the tool Coccinelle, increases the expressiveness of POSIX-style patches for the C programming language by leveraging support for the syntax and semantics of C. For example, an SmPL patch can specify changes to source code using metavariables that bind arbitrary program variable names, allowing for the specification of transformations involving variable references regardless of what specific variable names appear in programs targeted by the patch. A recent development is Coccinelle4J, a prototype modification of Coccinelle targeting the Java programming language. Coccinelle4J remains based on a toolkit designed for the parsing and modeling of C, adapted to operate on Java source code. The language mismatch of the base toolkit gives rise to limitations.
Despite this, Coccinelle4J remains the state of the art for an SmPL targeting Java. In this thesis we lay the foundations for an SmPL for Java based on Spoon, a robust Java metaprogramming toolkit. We qualitatively investigate to what extent the features of SmPL and Coccinelle are generalizable to a Java context, and we implement and evaluate SPOON-SMPL, a prototype SmPL tool for Java based on Spoon. We base the core design of SPOON-SMPL on temporal logic and model checking, heavily inspired by the design of Coccinelle. We find that the majority of identified SmPL features generalize to Java. We quantitatively evaluate SPOON-SMPL by comparing its running time performance to that of Coccinelle4J over a set of six semantic patches with associated real-world project code bases used in an API migration case study originally performed by the authors of Coccinelle4J. Additionally, we compare the running times of SPOON-SMPL to the average build time of each associated project. We find that SPOON-SMPL performs worse than Coccinelle4J, but that the performance remains in a range acceptable for a single developer using inexpensive hardware. Finally, we provide two proposed designs for extensions to SPOON-SMPL along with a set of suggestions for future work. The proposals show that our prototype offers a strong potential to leverage the capabilities of the Spoon library, particularly in providing improved and robust support for certain aspects of Java for which Coccinelle4J provides only limited support.

Sammanfattning (Swedish abstract)

Software development is often a collaborative process that requires efficient communication. A central element of this communication is the ability of developers to create and share with one another concise summaries of source code changes. The POSIX-standardized tools diff and patch have long been an important part of the toolkit, but their lack of support for the syntax and semantics of specific programming languages limits their expressiveness. The Semantic Patch Language (SmPL), introduced in 2006 together with the tool Coccinelle, offers increased expressiveness in POSIX-like patches for the C programming language. Among other things, an SmPL patch can use metavariables, logical variable names that bind arbitrary program variables, to specify transformations involving variable references regardless of which variable names occur in the target program. Coccinelle4J, a modification of Coccinelle, is a recently developed prototype of an SmPL tool for the Java programming language. Coccinelle4J is based on a technical foundation designed for parsing and processing C that has been adapted to process Java. Language differences make a comprehensive adaptation difficult, which leads to limited support for some features of Java. Despite this, Coccinelle4J is currently the leading solution for SmPL for Java. In this thesis we take the first steps toward an SmPL for Java based on Spoon, a robust metaprogramming library for Java. We qualitatively investigate which features of SmPL and Coccinelle can be generalized to Java, and we implement and evaluate SPOON-SMPL, a prototype SmPL tool for Java based on Spoon. The design of SPOON-SMPL is heavily inspired by Coccinelle and is based on temporal logic and model checking. We find that a clear majority of the features we identified in SmPL and Coccinelle can be generalized to Java. We quantitatively evaluate SPOON-SMPL by comparing its running time performance to that of Coccinelle4J over six semantic patches with associated project code bases that were originally used in an API migration case study performed by the team behind Coccinelle4J. We also compare the running time performance to the build time of each project. We find that the running time performance of SPOON-SMPL is worse than that of Coccinelle4J, but that it nevertheless lies within a range acceptable for an individual software developer with a simple personal computer. Finally, we present two detailed proposals for extensions to SPOON-SMPL together with a set of suggestions for future work. Through this we show that our prototype has strong potential for extensions that take advantage of the features available in Spoon, in particular regarding improved and robust support for certain features of Java for which Coccinelle4J offers only limited support.

Acknowledgements

I would like to thank:

• Prof. Martin Monperrus, my examiner. Martin suggested the project and gave me the opportunity to pursue it. Martin also helped establish the research methodology and formulate the formal research questions, gave regular feedback on the structure of the thesis and my approaches to various aspects of the work, suggested many papers on related works, and provided tips on the use of Spoon.

• Nicolas Yves Maurice Harrand, my supervisor. Like Martin, Nicolas provided feedback on the methodology and the structure of the thesis, and also helped me with a couple of difficult choices in the implementation. Nicolas also provided detailed feedback on the full text, introduced me to a set of useful tools and ideas for improving the text, provided papers on the subtleties involved in benchmarking the performance of Java programs, and helped eliminate a formal research question for which the results were overly speculative.

• Ann Bengtsson, degree project coordinator at KTH EECS. Ann greatly helped me solve the complications surrounding my formal admittance to the degree project course.

Finally, I would like to jointly thank Martin and Nicolas for their sympathy and patience throughout the project in general, and in particular surrounding the passing of my father.
To my father. I’m sorry I took too long. Thank you for everything.

Contents

1 Introduction
    1.1 Problem statement
    1.2 Research questions
    1.3 Contributions
    1.4 Intended audience
    1.5 Outline of the thesis
2 Background
    2.1 Text file differencing
        2.1.1 diff
        2.1.2 patch
    2.2 Formal logics for the modeling of computer programs
        2.2.1 Computation Tree Logic
        2.2.2 CTL with free variables
        2.2.3 CTL with quantified variables
        2.2.4 CTL with variables and witnesses
    2.3 Program analysis and transformation
        2.3.1 Spoon
        2.3.2 Semantic Patch Language
3 Related work
    3.1 Semantic patching
        3.1.1 Coccinelle
        3.1.2 Coccinelle4J
    3.2 Program transformation using temporal logic
    3.3 Other approaches to Java source code transformation
    3.4 API migration
4 Design of spoon-smpl
    4.1 Design goals
    4.2 Core engine
    4.3 Parsing SmPL
    4.4 Formula language
    4.5 Formula compilation
    4.6 Batch processing
    4.7 Use of Spoon
5 Evaluation methodology
    5.1 Analytical methodology
        5.1.1 RQ1: Generalizable features
        5.1.2 RQ2: Non-generalizable features
    5.2 Experimental methodology
        5.2.1 RQ3: Patch application performance
        5.2.2 RQ4: Project build times
6 Evaluation results
    6.1 Analytical results
        6.1.1 Coccinelle feature catalog
        6.1.2 RQ1: Generalizable features
        6.1.3 RQ2: Non-generalizable features
    6.2 Experimental results
        6.2.1 Hardware and software
        6.2.2 RQ3: Patch application performance
        6.2.3 RQ4: Project build times
7 Discussion
    7.1 Limitations
    7.2 Threats to validity
    7.3 Extension proposals
        7.3.1 Improving name resolution
        7.3.2 Improving sub-typing
    7.4 Future work
        7.4.1 Support for more simple Java constructs
        7.4.2 Support for looping constructs
        7.4.3 Support for isomorphisms
        7.4.4 Model checker optimizations
        7.4.5 Using Spoon sniper mode
        7.4.6 Using spoon.pattern
        7.4.7 Target-embedded parsing of the semantic patch
    7.5 Ethical considerations
8 Conclusions
Bibliography
A Full semantic patches
    A.1 Semantic patch 4: should_vibrate
        A.1.1 Original version
        A.1.2 Modified version

