
SEMANTIC PATCH INFERENCE
Andersen, Jesper

Publication date: 2009
Document version: Publisher's PDF, also known as Version of record
Citation for published version (APA): Andersen, J. (2009). SEMANTIC PATCH INFERENCE.
Download date: 01. Oct. 2021

SEMANTIC PATCH INFERENCE
Jesper Andersen

Computer Science Department (DIKU)
The Graduate School of Science
Faculty of Science
University of Copenhagen
Copenhagen
November 2009

Supervisor: Julia L. Lawall

[ November 13, 2009 at 10:43 ]

Dedicated to my loving wife and son. You are the sunshine that lights up my day.

ABSTRACT

Collateral evolution is the problem of updating several library-using programs in response to API changes in the used library. In this dissertation we address the issue of understanding collateral evolutions by automatically inferring a high-level specification of the changes evident in a given set of updated programs. We have formalized a concept of transformation parts that serve as an indication of when a change specification is evident in a set of changes. Based on the transformation parts concept, we state a subsumption relation on change specifications. The subsumption relation makes it possible to decide when a change specification captures a maximal amount of the evident changes in a set of changes. We state two algorithms that find high-level change specifications evident in a set of changes. Both algorithms have been implemented in a tool we call spdiff. Finally, a few examples of change specifications inferred by spdiff in Linux are shown. We find that the inferred specifications concisely capture the actual collateral evolution performed in the examples.

SUMMARY (SAMMENFATNING)

"Collateral evolution" concerns the need to update several programs as a consequence of changes in a library used by all of the programs. In this dissertation we address the topic of understanding such collateral evolutions by automatically deriving a high-level specification of the changes seen in a given set of updated programs. We have formalized a concept that we call "transformation parts". Transformation parts show when a given change specification can be seen in a set of updated programs. Based on this, we have defined a relation that describes when one change specification is part of another change specification. This can further be used to decide whether a change specification is maximal. Finally, we have described two algorithms for finding high-level change specifications in a given set of updated programs. Both algorithms are implemented in a tool that we call spdiff. We show the results of a few applications of spdiff to changes in Linux. The derived changes concisely capture the collateral evolutions that had been performed.

ACKNOWLEDGMENTS

There are numerous people whom I would like to thank for being helpful to me in some way during my Ph.D. studies. In particular I would like to thank my supervisor, Julia Lawall, for consistent, quick, and helpful advice on just about any of the (more or less silly) questions I have had. Your guidance changed the way I think about research, for the better. I would also like to thank Professor Siau-Cheng Khoo from the National University of Singapore for being my host when I visited NUS during winter 2008. I have so many fond memories of Singapore and I really enjoyed the collaboration with you. Finally, I would like to thank David Lo, who is now working as an assistant professor at the Singapore Management University. Your energy with respect to research and your general helpfulness are an inspiration to me. I am grateful that you took the time to visit me and my family in Copenhagen.

CONTENTS

Part I: Semantic Patch Inference
1 Introduction
  1.1 Example-based change inference
    1.1.1 Transformation parts
    1.1.2 Algorithms and implementations
  1.2 Structure of the dissertation
2 Related work
  2.1 Change vocabulary
  2.2 Program transformation systems
  2.3 Program pattern discovery
    2.3.1 Inference of program behavior
    2.3.2 Clone detection
  2.4 Change detection
    2.4.1 Text based differencing
    2.4.2 Tree differencing
    2.4.3 Higher level approaches
3 Setup
  3.1 The language of Terms
    3.1.1 Constructing Terms
  3.2 Term patterns
    3.2.1 Abstracting terms
4 Transformation parts
  4.1 Properties of common change descriptions
    4.1.1 Towards a definition
  4.2 Tree distance based transformation parts
    4.2.1 Work-function
    4.2.2 Term-distance
  4.3 Subsumption of program transformations
  4.4 Extending to changesets
  4.5 Non-global common changes

Part II: Algorithms and Implementation
5 Context-free patch inference
  5.1 Motivating example
  5.2 Context-free patches
    5.2.1 Application function
  5.3 Algorithm
    5.3.1 A simple algorithm
    5.3.2 Towards a refined algorithm
    5.3.3 The refined spfind algorithm
6 Context-sensitive patch inference
  6.1 Motivating example
  6.2 Semantic patches
  6.3 Semantic patterns
  6.4 Finding semantic patterns
    6.4.1 Occurrences & Pruning Properties
    6.4.2 Algorithm
    6.4.3 Constructing semantic patches
  6.5 Implementation

Part III: Real-World Application
7 Experiments
  7.1 Examples of context-free patches
  7.2 Examples of context-sensitive patches
8 Conclusion
  8.1 Summary
  8.2 Future work
    8.2.1 Evaluation and engineering
    8.2.2 Exploration of other transformation languages
Bibliography

Part I
SEMANTIC PATCH INFERENCE

Chapter 1
INTRODUCTION

In the case of open-source software, such as Linux, where the
developers are widely distributed, it must be possible to exchange, distribute, and reason about source code changes. One common medium for such exchange is the patch [43]. When making a change in the source code, a developer makes a copy of the code, modifies this copy, and then uses diff to create a file describing the line-by-line differences between the original code and the new version. The developer then distributes this file, known as a patch, to subsystem maintainers and mailing lists for discussion. Once the patch has been approved, other developers can apply it to their own copy of the code, to update it to the new version.

Patches have been undeniably useful in the development of Linux and other open-source systems. However, it has been found that they are not very well adapted for one kind of change, the collateral evolution [48]. A collateral evolution is a change entailed by an evolution that affects the interface of a library, and comprises the modifications that are required to bring the library clients up to date with this evolution. Collateral evolutions range from simply replacing the name of a called library function to more complex changes that involve multiple parts of each affected file. Such changes may have to be replicated across an entire directory, subsystem implementation, or even across the entire source code. In the case of Linux, it has been shown that collateral evolutions particularly affect device drivers, where hundreds of files may depend on a single library [48].

The volume and repetitiveness of collateral evolutions strain the patch-based development model in two ways. First, the original developer has to make the changes in every file, which is tedious and error-prone.
Second, developers who need to read the resulting patch, either to check its correctness or to understand what it will do to their own code, may have to study hundreds of lines of patch code, which are typically all very similar, but which may contain some subtle differences.

An alternative is provided by the transformation system Coccinelle, which raises the level of abstraction of patches to semantic patches [49]. A semantic patch describes a change at the source code level, like an ordinary patch, but is applied in terms of the syntactic and semantic structure of the source language, rather than on a line-by-line basis. Semantic patches include only the code relevant to the change, can be abstracted over irrelevant subterms using meta-variables, and are independent of the spacing and line breaks of the code to which they are applied. The level of abstraction of semantic patches furthermore implies that they can be applied to files not known to the original developer: in the case of Linux, the many drivers that are maintained outside the Linux source tree.

Despite the many advantages of semantic patches, it may not be reasonable to expect developers to simply drop the patch-based development model when performing collateral evolutions. For the developer who makes the collateral evolution, there can be a gap between the details of an evolution within a library and the collateral evolution it entails. Therefore, the developer may still find it natural to make the required changes by hand in a few typical files, to better understand the range and scope of the collateral evolution that is required. Furthermore, the standard patch application process is very simple, involving only replacing one line by another, which may increase confidence in the result. Thus, developers may find it desirable to continue to distribute standard patches, with or without an associated semantic patch.
What is then needed is a means of mediating between standard patches and semantic patches, by inferring semantic patches from standard patches.
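
To make the contrast between line-based patches and a more abstract change description concrete, the following is a minimal Python sketch. Everything in it is assumed for illustration: the function names old_api and new_api, the two driver files and their contents are hypothetical, and the regular-expression rule is only a toy stand-in for a rule with one meta-variable. It is neither Coccinelle's semantic patch language nor the inference performed by spdiff. The sketch applies the same change to two client files and prints the resulting standard patches, which differ line by line even though the underlying change is the same.

```python
import difflib
import re


def line_patch(old: str, new: str, name: str) -> str:
    """Return a unified diff, i.e. a standard line-based patch."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    ))


# Toy rule with one meta-variable E:  old_api(E)  ->  new_api(E, 0)
# Assumption: a regular expression stands in for real term matching.
# It tolerates spacing differences but cannot handle nested parentheses.
RULE = re.compile(r"\bold_api\s*\(\s*([^()]*?)\s*\)")


def apply_rule(src: str) -> str:
    """Rewrite every call matched by RULE, keeping the bound argument."""
    return RULE.sub(lambda m: f"new_api({m.group(1)}, 0)", src)


# Two hypothetical library clients that need the same collateral evolution.
driver_a = "int init(void)\n{\n    return old_api(dev);\n}\n"
driver_b = "void probe(void)\n{\n    old_api( card->dev );\n}\n"

patch_a = line_patch(driver_a, apply_rule(driver_a), "driver_a.c")
patch_b = line_patch(driver_b, apply_rule(driver_b), "driver_b.c")

print(patch_a)
print(patch_b)
```

A reader of the two printed patches has to compare them line by line to see that they express one change, whereas the single rule states that change once. Going in the opposite direction, from such patches back to a single rule, is the mediation problem described above.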