JOHANNES KEPLER UNIVERSITÄT LINZ

Technisch-Naturwissenschaftliche Fakultät

Detecting Invalid Choices in Merged Code and Software Models

MASTERARBEIT

zur Erlangung des akademischen Grades Diplom-Ingenieur

im Masterstudium Software Engineering

Eingereicht von: Matthias Braun B.Sc.

Angefertigt am: Institute for Software Systems Engineering

Beurteilung: Univ.-Prof. Dr. Alexander Egyed M.Sc.

Linz, August 2015

Contents

Zusammenfassung
  0.1 Objective
  0.2 Method
  0.3 Results

Abstract
  0.4 Objective
  0.5 Method
  0.6 Results and Conclusions

1 Introduction
  1.1 Terminology
  1.2 Motivation and Goals
  1.3 Method
  1.4 Related Work
  1.5 Thesis Structure

2 Background
  2.1 Code Merging
    2.1.1 Two-Way Merging
    2.1.2 Three-Way Merging
  2.2 Model Merging
  2.3 Invalid Artifacts in Code
    2.3.1 Code Clones Cause Invalid Artifacts
    2.3.2 Invalidity Categorization
    2.3.3 Detection Approaches
      2.3.3.1 Code Inspection
      2.3.3.2 Testing
      2.3.3.3 Static Code Analysis
  2.4 Invalid Artifacts in Models
    2.4.1 Multiple Views Cause Invalid Artifacts
    2.4.2 Invalidity Categorization
    2.4.3 Detection Approaches
  2.5 Software Product Lines

3 Approach
  3.1 Rules
  3.2 Limitations

4 Commonalities of Code and Model Rules
  4.1 Common Rule Categories
    4.1.1 Invalid Artifact Combination
    4.1.2 Dispensable Artifact
  4.2 Merging Causes Invalidities

5 ECCO Case Study
  5.1 ECCO Platform Background
    5.1.1 Functionality
      5.1.1.1 Feature to Code Mapping
      5.1.1.2 Composing New Products
    5.1.2 Architecture
  5.2 Parsing the ECCO Code Tree
  5.3 Case Study Approach
    5.3.1 Related Work
  5.4 Case Study Motivation
  5.5 Rules
    5.5.1 Add Listener Equivalence Rule
    5.5.2 Multiple Variable Assignment Rule
    5.5.3 Multiple Setter Call Rule
    5.5.4 Uninitialized Read Rule
  5.6 Empirical Results on Rules Performance
    5.6.1 Effectiveness
    5.6.2 Validity
    5.6.3 Efficiency

6 ArchStudio Case Study
  6.1 ArchStudio 3 Platform Background
    6.1.1 Functionality
      6.1.1.1 Model Creation and Editing
      6.1.1.2 Diffing and Merging
      6.1.1.3 Detection of Invalid Model Artifacts
    6.1.2 Architecture
  6.2 Architecture Representation and Parsing
  6.3 Rule Engine for ArchStudio 3
    6.3.1 Rule Language
  6.4 Case Study Approach
    6.4.1 Related Work
  6.5 Case Study Motivation
  6.6 Rules
    6.6.1 Circular Dependency
    6.6.2 Connector Has Incoming Interface
    6.6.3 Connector Has Outgoing Interface
    6.6.4 Mandatory Component Has Mandatory Interface
    6.6.5 Model Has Mandatory Components
    6.6.6 No Mandatory Link on Optional Interface
  6.7 Empirical Results

7 Conclusion
  7.1 Threats to Validity

Bibliography

A Source Code
  A.1 Code for ECCO
  A.2 Code for ArchStudio
    A.2.1 ArchStudio Parsing and Modeling
    A.2.2 ArchStudio Rules

B Performance Measurements Raw Data
  B.1 Efficiency
  B.2 Effectiveness

C ArchStudio Model Merge & Repair
  C.1 First Repair Leads to Cyclic Dependency
  C.2 First Repair Removes Last Mandatory Interface from Component
  C.3 First Repair Removes Last Incoming Interface from Connector. Second Repair Creates Mandatory Link on Optional Interface

D Curriculum Vitae

E Erklärung

For Theresa

Zusammenfassung

0.1 Objective

This master's thesis describes how, in the area of software development, choosing from the multiple choices that result from a software merge can be simplified. These choices arise because, when merging software, its artifacts can be combined in different ways. The software artifacts we deal with in this thesis are source code on the one hand and software models on the other. Our goal is to ease the decision for software engineers when they have to select from a set of choices that arose from a software merge.

0.2 Method

To achieve this goal, we apply rules to the available choices. Based on these rules, we remove choices that are invalid. We consider a choice invalid if it either contains a flaw (for example, a null pointer dereference or a cyclic dependency) or if the choice is equivalent to another choice. We define two choices as equivalent if they behave identically. Our rules determine whether a choice is invalid.

0.3 Results

We test this method by designing rules for source code and software models. These rules are applied and evaluated in two case studies. We show how we use these rules to identify invalid choices in source code and software models and thereby reduce the number of remaining choices.

Abstract

0.4 Objective

The work this master’s thesis describes aims to reduce the effort of selecting one of multiple choices that are the result of a software merge. Such choices occur since a merge can combine software artifacts in different ways. The software artifacts this thesis is concerned with are source code and software architecture models. Our goal is to facilitate decision-making for a software engineer who has to choose from a range of choices that were produced by a software merge.

0.5 Method

To meet this objective, we apply rules to the available choices from which a software engineer has to choose. Using these rules we eliminate invalid choices. We consider a choice invalid if it either contains a serious flaw (e.g., a null pointer dereference or a circular dependency) or if the choice is equivalent to another one. We define two choices as equivalent if their behavior is the same. It is our rules’ task to determine whether a choice is invalid.

0.6 Results and Conclusions

We test this method by creating rules for source code and software models. The developed rules are tested on two case studies by applying them to code and models, respectively. We demonstrate how we can use these rules to detect invalid choices in source code and models in order to reduce the number of available choices.

Chapter 1

Introduction

In this chapter we outline our motivation for the work presented in this thesis, as well as its goals and methods. To facilitate discussing these topics, the next section defines a few terms used throughout the thesis.

1.1 Terminology

Since we will be working with software models and source code alike in the course of this thesis, we shall introduce the umbrella term artifact. This word shall denote an element of either architecture models (e.g., component, link, and interface) or source code (e.g., code block, method call, and variable assignment).

This thesis deals with eliminating and repairing choices. Generally speaking, these choices can stem from various sources: a software engineer could create multiple, slightly differing methods to compare their execution time, readability, or memory consumption and then choose the most efficient method. In this thesis, however, we concentrate on choices which were created automatically through source code and model merges, respectively. As we will see in Chapter 2, artifacts can be combined in different ways during a merge, which creates choices in the sense we defined above.

Similar to the term artifact, the word choice is intended to bridge the areas of source code and architecture models in software engineering. For source code, choices are single methods that vary in some respect (e.g., the order of their statements might differ). Like source code choices, architecture choices vary, too (e.g., in the way their components are linked). Building on the definition of artifacts, we want to make it clear that a choice is made of artifacts, i.e., a choice contains artifacts.


We prefer the term choices, as opposed to variants, options, or alternatives, to express the fact that a software engineer has to choose from them when designing software.

The term invalid in this thesis is often applied to choices which are

1. inconsistent because they contain a serious flaw, or

2. equivalent to another choice, thus rendering them redundant.

Both kinds of invalidity lead to the removal or repair of the choice. We apply the adjective invalid not only to choices but also to artifacts within a choice: if an artifact causes the choice it is a part of to be invalid, we label this artifact as invalid as well.

1.2 Motivation and Goals

People designing and creating software are confronted with a multitude of choices in their daily work. These choices involve picking the right tools for software development, choosing from a plethora of programming languages, and deciding how to architect a software system in general. This thesis focuses on one particular source of such choices: the merging of source code and software models. During merging, artifacts can be combined in different ways, giving the software engineer choices. See Chapter 2 for explanations of why software merging creates multiple choices to choose from.

Since all of these choices can overwhelm a software engineer, decreasing their number reduces the burden on the person dealing with them in software development, e.g., the software engineer. This is the motivation for the effort we undertook in this master's thesis. Facilitating decision-making by removing invalid choices should reduce the effort in software engineering; the saved effort allows the engineer to focus more on the remaining choices and their quality. This is a worthwhile goal, since the significance of software quality has increased in recent history and will likely continue to do so in years to come [137].

As software systems become integrated in ever more appliances in our daily lives [76, 79], the need for stable, high-quality software becomes a matter of personal convenience and also safety for a lot of people. Moreover, software is becoming an indispensable business asset for an ever-growing number of industries [3], making high-quality software economically important as well. We hope that the findings of this thesis help software engineers create higher-quality software by eliminating choices which are clearly invalid, letting engineers focus on the few choices which are meaningful.

1.3 Method

In order to relieve the software engineer from having to manually sort out invalid choices, we apply rules to the possible choices to eliminate or repair them. We analyze choices and the artifacts of which they consist (as defined in Section 1.1) using these rules. If an artifact within a choice violates one or more of these rules, i.e., the choice is invalid, we decide that the choice should be repaired or eliminated. Depending on the rule that was violated, we try to repair the choice to make it conform to the rule. If there is no fix for the choice, it is removed, thus reducing the effort a software engineer has to invest to select one of the choices.

We acknowledge that we cannot automatically repair all invalid choices but focus instead on the cases where the repair is straightforward (see Section 5.5 for details).

Our rules detect flaws in software artifacts such as potential null pointer dereferences, redundant assignments, or circular dependencies among components. Furthermore, we have created a domain-specific rule that determines whether two choices are equivalent. We call two choices equivalent if they behave identically. Figure 3.1 illustrates our approach in broad strokes, whereas Chapter 3 explains it in more detail.
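To make the elimination-and-repair loop concrete, the following sketch shows one minimal way such rule application could look. It is an illustration only, not the implementation used in the case studies; the names (`Rule`, `filter_choices`) and the two toy rules are our own assumptions, with a choice simplified to a list of statement strings.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A choice is simplified to a list of statement strings.
Choice = List[str]

@dataclass
class Rule:
    name: str
    violates: Callable[[Choice], bool]                   # does the choice break this rule?
    repair: Optional[Callable[[Choice], Choice]] = None  # a fix, if one is known

def filter_choices(choices, rules):
    """Apply every rule to every choice: repair a violating choice where a
    fix is known, eliminate it where none is."""
    surviving = []
    for choice in choices:
        valid = True
        for rule in rules:
            if rule.violates(choice):
                if rule.repair is not None:
                    choice = rule.repair(choice)  # straightforward repair
                else:
                    valid = False                 # no fix known: eliminate
                    break
        if valid:
            surviving.append(choice)
    return surviving

# Two toy rules: a repairable "redundant consecutive assignment" and a
# fatal "statement reads x before any assignment of x".
dedup = Rule(
    name="redundant assignment",
    violates=lambda c: any(a == b for a, b in zip(c, c[1:])),
    repair=lambda c: [s for i, s in enumerate(c) if i == 0 or s != c[i - 1]],
)
uninitialized = Rule(
    name="uninitialized read",
    violates=lambda c: not c[0].startswith("x ="),
)
```

Run on three toy choices, the first is repaired (its duplicated assignment is dropped), the second is eliminated (it reads x before assigning it), and the third survives unchanged.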

1.4 Related Work

In this section we give a brief outline of papers related to our approach, either because the portrayed approach is similar to ours or because the goal of their work is similar; in each case we describe how their approach compares to ours.

With respect to the diagram that outlines our generic approach in Figure 3.1, we argue that the abstract Choice Generator in the diagram can be substituted with a Software Product Line (SPL)1. This stands to reason when we consider that the SPL takes multiple features as input (these are the Artifacts in Figure 3.1) and generates a new product from them. It offers Choices (as we defined the term in Section 1.1) to the software engineer with regard to which features to combine into the resulting product. Therefore, work that proposes ways to analyze SPLs with the goal of eliminating invalid choices which an SPL might generate is considered related, since it shares the goal of our thesis.

1 For a short introduction to SPLs, see the background information in Section 2.5.

Consequently, “Type Checking Annotation-Based Product Lines” by Kästner et al. [115] is related to our work as they aim to detect invalid software choices, too. Yet their approach differs from ours: they employ a type system to find type errors in product lines. In the same vein, Acher et al. [1, Section B] showed in their work how to detect invalid choices in SPLs, such as features that can never be part of a product, by employing formal methods. Also, Bontemps et al. [18] analyze SPLs using formal methods with the intention of “assisting stakeholders in selecting features”, which is quite similar to the goal pursued by our work stated in Section 1.2.

Since our approach aims to facilitate decision-making with regard to software choices, we consider the work of Schmid and John on variability management [164] as related to our thesis. The scope of their work is admittedly broader than ours as they provide guidance in transitioning an organization's software development practice to using a software product line. But in accordance with our goal, they aim to reduce the effort of decision-making in software engineering when it comes to choosing between software variants, a concept which is similar to the Choices we defined in Section 1.1. Their approach differs from ours insofar as they apply a meta model to all artifacts of an existing software project which allows software engineers to create new software variants within the constraints of this meta model [164, Subsection 4.1].

The Model/Analyzer by Reder and Egyed [155] uses rules to analyze models with the goal of giving feedback on model changes instantaneously, similar to the feedback of integrated development environments for code. Also, their tool lets users create their own rules, which are in the style of predicate logic.
While their work, like ours, uses rules to detect invalid artifacts, it focuses on incrementally checking models, whereas we analyze models and source code in their entirety.

1.5 Thesis Structure

This thesis is structured as follows:

• In Chapter 2, after this introduction, we provide background information on topics relevant to this thesis, such as software merging and invalidity detection. This background knowledge will help the reader understand the following chapters more easily.

• Afterwards, Chapter 3 explains our approach for eliminating and repairing choices. With regard to the rules, we describe overarching principles that apply to source code as well as to architecture rules.

• Drawing from the observations made during the work for this thesis, Chapter 4 shows commonalities between rules for source code and rules for software models. Examples of these commonalities will be highlighted throughout the subsequent case studies.

• In the next two chapters, we test our approach in two case studies: the first for source code rules in Chapter 5, the second for software model rules in Chapter 6. We discuss the effectiveness and efficiency of our approach at the end of each case study.

• We conclude with Chapter 7, which recapitulates the contents of this thesis and accounts for threats to its validity.

Chapter 2

Background

This chapter gives short introductions to topics pertaining to our work. We provide these introductions hoping that they will make it easier to understand our approach described in Chapter 3 as well as the case studies later on in Chapter 5 and Chapter 6.

2.1 Code Merging

As our work focuses on analyzing merged software, this section shall give an overview of how code merging works, what kinds of artifacts are merged, and what the motivation for merging code is.

One of the motivations for merging software occurs when two or more software engineers work on the same software artifacts in parallel. Assuming that these artifacts are kept under optimistic version control, that is, the concurrent modification of files is allowed1, the files have to be merged in order to include the changes of both engineers. This practice is called parallel development [153]. As we will see in Chapter 5, parallel development is not the only reason for code merging, but it is practically unavoidable in larger software projects [12, Section 1] and thus a common reason for code merges.

The stage in software development where software engineers merge the code that they have edited individually is called the integration phase. This phase used to recur in intervals of weeks to months [153, Section 2.3] and took the authors of the code-to-be-merged days to weeks to finish successfully [68]. It was difficult to create a merged code base that had no obvious issues due to changes made between the current integration and the last one, such as restructurings in the code, modifications of interfaces, and the altering of database schemas. Perry et al. [153, Section 5.1] confirm this when they show that software artifacts that are edited by multiple engineers tend to contain more invalid artifacts.

The complexity of the merging task and the resulting effort led to ideas within the domain of software engineering, like continuous integration, that emphasize the importance of merging code daily [50, 96]. Continuous integration regards delaying the integration phase to a point where a lot of code changes have to be integrated as too risky and overly complex. Proponents of continuous integration instead recommend frequently merging fewer and thus simpler sets of changes, following an adage of extreme programming: “if it hurts, do it more often” [69][91, Chapter 10]. The high frequency of integrations ensures that the working copies of the individual engineers do not drift apart too much, so that merging remains feasible.

Continuously merging code also has a positive impact on software engineers' attitude towards refactoring other people's code: when code changes are integrated within the working day, it is less risky for one engineer to change the code of another engineer. Should the refactoring have caused any unintended side effects (detected either by the automated test suite or personally by the engineer whose code was affected), the recent change is far easier to undo than one that is a month old, for example. This is attributed to the fact that comparably few other changes have occurred since then and the memory of the engineer who was responsible for the refactoring is still fresh [65]. From the above we can see that code merging is a widespread—and encouraged—practice in the software industry.

1 The opposite of optimistic version control is pessimistic version control, where each file is associated with a corresponding lock which engineers must obtain before they can edit the file [136].
Since we are concerned with analyzing merged code, our thesis deals with a very common kind of software engineering artifact, which is motivating for us. Assuming that the software artifacts described before are treated as plain text files, there are two merging strategies available: two-way merging and three-way merging. They are described in the rest of this section.

2.1.1 Two-Way Merging

If a text file was edited by two engineers in parallel, the two-way merging strategy can compare the two versions of the file line by line. An implementation of the two-way merge such as the one used in the Unix diff utility [136, Chapter 2.1] is essentially a program that solves the longest common subsequence problem [94], where the subsequences are the matching lines between the files. As its output, the algorithm “creates a list of what lines of one file have to be changed to bring it into agreement with a second file or vice versa” [93].

To illustrate the algorithm's results, Table 2.1 demonstrates how two files (represented by vertically aligned characters for brevity's sake) are compared using the diff algorithm and shows which actions (i.e., delete, add, and change) are necessary to make their content identical. The work this algorithm performs is called diffing, a term that we will use throughout this thesis, especially in our case studies.

The inherent drawback of two-way merging is that the algorithm can only point out the differences between the two files but is unaware of the changes that the engineers performed with regard to the versions' common ancestor (i.e., the file both engineers started with): it is impossible to distinguish whether a line was added by one engineer or deleted by the other. This is why all modern merge tools use the three-way merge approach [136].
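The connection between diffing and the longest common subsequence problem can be sketched as follows: compute the LCS of the two line sequences, keep the matching lines, and turn everything else into add and delete steps. This is a minimal illustration in the spirit of Hunt and McIlroy's algorithm [93], not the optimized implementation the diff utility actually uses; the function name and the keep/add/delete labels are our own.

```python
def lcs_edit_script(a, b):
    """Derive an edit script from the longest common subsequence of the
    line sequences `a` and `b`: keep/add/delete steps that turn a into b."""
    m, n = len(a), len(b)
    # L[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk the table: matching lines are kept, everything else is added or deleted.
    script, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            script.append(("keep", a[i]))
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            script.append(("delete", a[i]))
            i += 1
        else:
            script.append(("add", b[j]))
            j += 1
    script += [("delete", line) for line in a[i:]]
    script += [("add", line) for line in b[j:]]
    return script
```

Applied to the two files of Table 2.1 (lines a–g versus w, a, b, x, y, z, e), the kept lines are the common subsequence a, b, e, while w, x, y, z are added and c, d, f, g are deleted — matching the steps described in the table's caption.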

2.1.2 Three-Way Merging

As demonstrated in Subsection 2.1.1, in order to detect not only the differences between two files but also to properly merge them, it is necessary to consider the files' base version, i.e., the common ancestor. The diff3 utility from the GNU diffutils package is commonly used to perform a three-way merge. A rigorous analysis of diff3's algorithm can be found in Khanna's pertinent paper [110]. Consider Figure 2.1 for an exemplary three-way merge building on the previous two-way merge example from Table 2.1.

Although the three-way merge is used by many current merge tools [136], the investigation of diff3 by Khanna et al. [110] as well as Ritcher's analysis [158] show that diff3's behavior is counterintuitive and unpredictable at times. Still, both SVN and Git use the three-way merge as their default merge strategy [77, 36].
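The idea behind a three-way merge can be sketched as follows: compute each side's changes relative to the common ancestor, apply every change that only one side made, and flag overlapping changes from both sides as a conflict. This is a deliberately simplified illustration, not diff3's actual algorithm (for instance, it applies insertions that both sides made at the same point instead of reporting a conflict), and all names in it are our own.

```python
from difflib import SequenceMatcher

def edit_regions(ancestor, version):
    """One side's changes against the ancestor: (start, end, replacement)
    triples over ancestor line indices, one per non-matching region."""
    opcodes = SequenceMatcher(None, ancestor, version).get_opcodes()
    return [(i1, i2, version[j1:j2])
            for tag, i1, i2, j1, j2 in opcodes if tag != "equal"]

def three_way_merge(ancestor, ours, theirs):
    """Apply the non-overlapping changes of both sides to the ancestor;
    raise ValueError on overlapping changes from different sides."""
    regions = sorted(
        [(region, "ours") for region in edit_regions(ancestor, ours)]
        + [(region, "theirs") for region in edit_regions(ancestor, theirs)],
        key=lambda item: (item[0][0], item[0][1], item[1]))
    merged, pos, prior = [], 0, None
    for (start, end, replacement), side in regions:
        if prior is not None and start < prior[0] and side != prior[1]:
            raise ValueError("conflict: both sides changed the same lines")
        merged.extend(ancestor[pos:start])  # unchanged lines before the edit
        merged.extend(replacement)          # the edited lines from one side
        pos = max(pos, end)
        prior = (end, side)
    merged.extend(ancestor[pos:])
    return merged
```

A change on one side combined with a deletion on the other merges cleanly, while two different edits to the same ancestor line raise a conflict.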

2.2 Model Merging

The case study described in Chapter 6 analyzes merged software models and detects invalid artifacts using specialized rules. To provide a better understanding and also to outline the theoretical background for our case study, this section gives a brief introduction to model merging and illustrates the implications it has on invalid artifacts in software models.

As explained by Brunet et al. [22, Chapter 1], model merging becomes necessary when multiple engineers, working in model-based development, have to recombine their individual versions into a single model. Another quite common [10] reason for model merging arises when a project's stakeholders—who observe different aspects of the system through their models, some of which are requirements, domain analysis, system architecture, and system behavior—make changes and want to integrate those modifications into the model [172, Chapter 1]. As will be further discussed in Subsection 2.4.1, merging these aspects often causes invalid artifacts in software models.

Extensive research has been conducted on algorithms for diffing and merging different kinds of models that occur in the domain of software engineering. These models include database models [142, 27], software architectures based on modeling languages like UML [181], language-independent approaches [2], product line architectures [31], and requirements models [156]. Because models can also be expressed in XML notation, as demonstrated by Dashofy's ArchStudio [42], the three-way merge algorithm for XML documents described in Lindholm's paper [123] is suitable for merging such models, too. Similar to the merges for source code described in Section 2.1, a three-way approach to model merging offers the benefit of not only pinpointing the differences between two models but also being able to determine which changes the models underwent with regard to their common ancestor. An exemplary three-way merge of models is illustrated and explained in Figure 2.2.

Ancestor              First File            Second File              Merged File
1 ancLine1            1 ancLine2            1 ancLine1               1 changedInSecondFile
2 ancLine2            2 addedInFirstFile    2 changedInSecondFile    2 addedInFirstFile
                                            3 addedInSecondFile      3 addedInSecondFile

Figure 2.1: Minimal three-way merge scenario: The common ancestor file is modified by two software engineers in parallel. In the first file, line ‘ancLine1’ was deleted, whereas the line ‘addedInFirstFile’ was added with regard to the ancestor. In the second file, line ‘ancLine2’ was changed to ‘changedInSecondFile’ and line ‘addedInSecondFile’ was added. Combining the changes of the first and the second file results in the merged file.

first file    second file
a             w
b             a
c             b
d             x
e             y
f             z
g             e

Table 2.1: Example reproduced from Hunt and McIlroy [93] that illustrates how the diff algorithm aligns the first file to match the second. The single characters represent lines to keep the example short. The algorithm produces a list of steps that have to be performed to bring the first file into agreement with the second one. These steps are: first, prepend line ‘w’, then change lines ‘c’ and ‘d’ to ‘x’, ‘y’, and ‘z’. Finally, delete lines ‘f’ and ‘g’.
Since the work of this thesis is concerned with the detection of invalid artifacts in merged code and software models, we will proceed with a short discussion of this topic. Section 2.3 gives an overview of existing work on analyzing, classifying, and detecting invalid code artifacts, whereas Section 2.4 provides background information on the same issues for invalid artifacts in software models. With the following sections, we aim to establish a bigger picture for the reader and provide perspective on where our approach fits in with respect to established approaches.

Figure 2.2: A three-way model merge: Designer 1 removed class B including its subtype relation to class A. Designer 2 added class D as a subtype of class C. The merged model “Final Model” is the result of applying all the actions performed by both designers. Merge example from Alanen and Porres [2].
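The intuition of Figure 2.2 can be captured by a set-based sketch: an element of the merged model survives unless one designer removed it, and additions from both designers are kept. This ignores relations between model elements and conflict handling entirely, so it only illustrates the three-way idea for models; the function name and the encoding of elements as strings are our own assumptions.

```python
def merge_model_elements(ancestor, ours, theirs):
    """Set-based three-way merge of model elements: an element survives
    unless one side removed it; additions from both sides are kept."""
    removed = (ancestor - ours) | (ancestor - theirs)
    added = (ours - ancestor) | (theirs - ancestor)
    return (ancestor - removed) | added
```

With the classes of Figure 2.2 as plain elements (ancestor A, B, C; Designer 1 removed B; Designer 2 added D), the merged element set is A, C, D, matching the classes of the final model.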

2.3 Invalid Artifacts in Code

In this section, we present current approaches, categorizations, and tools for finding invalid artifacts in code, including the motivation for doing so. Where appropriate, we will relate them to our own approach and the case study for code described in Chapter 5.

2.3.1 Code Clones Cause Invalid Artifacts

Considering that duplicate code in software projects increases their complexity [119], causes maintenance problems [7, 9, 23, 98], and is a source of invalid artifacts in code [33, Section 6.4][101], significant effort has been invested by the software engineering community to detect and avoid code clones. Research suggests that copied code is quite common in large-scale software projects, such as the one analyzed by Laguë [23, Chapter 4], where the share of copied functions ranges between 6.4% and 7.5%. Baker's system under study contained about 19% code duplication [7, Section 4.1].

Such code clones can either be exact, meaning that two sets of lines (above a chosen threshold of line numbers) are completely identical, or they can be parameterized, which occurs for example when the copied code has its variable names changed [7, Chapter 2]. An example of a parameterized code clone is illustrated in Figure 2.3.

Given this definition of code clones, our rule described in Subsection 5.5.1 is an example of a parameterized clone detector, considering that it deems two code blocks equivalent if they only differ in the order in which event listeners are added. Instead of taking the implicit position of trying to avoid cloning parts of software, Gabel et al. [73] have focused on finding invalid artifacts by extracting and analyzing sets of code pieces that were first copied and then altered, because those cloned parts might be “thoughtful solutions to difficult engineering problems” [73, Section 1].
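The notion of a parameterized clone can be illustrated with a small sketch: strip indentation and replace every identifier with a positional token in order of first appearance, so that two fragments differing only in a consistent renaming get the same canonical form. This is a toy version of the idea behind Baker's dup [7]; unlike a real clone detector it also renames keywords and function names, and all names in the sketch are our own.

```python
import re

def canonical_form(code):
    """Strip indentation and rename every identifier to a positional
    token (p0, p1, ...) in order of first appearance."""
    mapping = {}

    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = "p%d" % len(mapping)
        return mapping[name]

    lines = [line.strip() for line in code.strip().splitlines()]
    return [re.sub(r"[A-Za-z_]\w*", rename, line) for line in lines]

def is_parameterized_clone(a, b):
    """Two fragments are parameterized clones if they are identical up to
    a consistent renaming of identifiers (and differing indentation)."""
    return canonical_form(a) == canonical_form(b)
```

In the spirit of Figure 2.3, two statements over lbearing/left are recognized as a clone of the same statements with every variable consistently renamed and different indentation, whereas an inconsistent renaming is rejected.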

2.3.2 Invalidity Categorization

The purpose of this subsection is to provide an overview of existing categories for invalid code artifacts. We establish context with our approach by determining into which of these categories the invalid artifacts fit that our code rules detect (see Section 5.5).

Various classifications for invalid artifacts in code (often called defects [141, Subsection 1.2][60, Chapter 5]) have been proposed [49, 56, 178]. One of the most influential frameworks for classifying invalid artifacts in code is the Orthogonal Defect Classification (ODC) [90, Section 1]. An excerpt of ODC's defect types [32, Section ‘The Defect Type Attribute’] is given below to convey what kinds of defects the ODC covers.

Algorithm: Defects caused by incorrectly or inefficiently implementing an algorithm or a data structure.

Assignment: Defects related to variable assignments and initialization.

Function: Program misbehavior or erroneous user interfaces belong to this defect type.

Interface: Defects that occur due to the interaction with other components within or outside the developed software system, including device drivers. Problems in relation to parameter lists used for calling functions also pertain to this defect type.

Timing/Serialization: Defects caused by parallelism and shared resources.

Figure 2.3: Example for a parameterized code clone detected by Baker's dup clone detection algorithm [7]. The algorithm was able to recognize the parameterization of the clone in the form of the variables lbearing and left as well as rbearing and right. The non-matching indentation is not considered a discriminating feature. Example from [7, Chapter 1].

Considering the different defect types in the ODC, the Assignment type listed above suits our code rules described in Subsections 5.5.2, 5.5.3, and 5.5.4 very well, as they indeed detect assignment issues in code. We argue that our rule dealing with equivalent code blocks (see Subsection 5.5.1), on the other hand, does not find any invalid artifacts that can be assigned to one of the types listed in the ODC. Rather, the invalid artifacts found by that rule are parameterized code clones [7, 104].

2.3.3 Detection Approaches

The following subsection provides a brief outline of common approaches to finding invalid code artifacts. We will also use this outline to point out to which of these approaches our approach for detecting invalid artifacts in code belongs and why.

2.3.3.1 Code Inspection

The value of programmers reading and reviewing other programmers' code in a systematic fashion was recognized in the seventies [59]. Code inspection is generally described as “the process of finding defects in the [software] artifact” [87, Subsection 2.2] employing a disciplined approach that commonly involves checklists pertaining to coding conventions, discussing design alternatives, and catching programming mistakes [151][59, Section ‘The Inspection Process’][140, Chapter 3]. The goal of such inspections is to “identify defects within the work product and to provide confidence in its correctness” [150, Subsection 1.3.4].

As a positive side effect, code inspections can contribute to an improvement of the participants' programming skills through knowledge exchange and foster team spirit [140, Subsection ‘Side Benefits of the Inspection Process’]. Siy and Votta [166] have shown that code inspections remain a useful practice for improving the readability and maintainability of code even after the advent of languages with automatic memory management and powerful type systems; mechanisms that avoid many defects that would otherwise occur during runtime [166, Section 1].

2.3.3.2 Testing

Software testing is the activity of automatically or manually executing software in order to find invalid code artifacts [140, Chapter 2]. Myers et al. [140] emphasize in “The Art of Software Testing” that the aim of testing cannot be to prove that the software under test is without invalid artifacts but rather to find invalid artifacts in the code. They further distinguish between unit tests, which test individual components of the software (e.g., classes and functions) and can be automated2 [140, Chapter 5], and acceptance testing, which examines the whole software system from a user perspective and commonly entails manual testing [140, Chapter 7].
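To illustrate the unit-testing style just described, the following sketch shows a small test in plain Java. The `WordCounter` class and its expected values are hypothetical examples of ours; in practice an xUnit framework such as JUnit would supply the assertion and test-runner machinery.

```java
// A hypothetical unit under test.
class WordCounter {
    static int count(String text) {
        if (text == null || text.trim().isEmpty()) return 0;
        return text.trim().split("\\s+").length;
    }
}

public class WordCounterTest {
    // In JUnit this would be an @Test method using assertEquals;
    // plain checks keep the sketch free of external dependencies.
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        check(WordCounter.count("to be or not to be") == 6, "normal input");
        check(WordCounter.count("   ") == 0, "blank input");
        check(WordCounter.count(null) == 0, "null input");
        System.out.println("all tests passed");
    }
}
```

Note how the test exercises boundary cases (blank and null input) in addition to the normal case; per Myers et al., the goal is to find invalid artifacts, not to prove their absence.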

2.3.3.3 Static Code Analysis

Static code analysis tools search code—without executing it—for security vulnerabilities, style violations, and logical errors such as unreachable code or null pointer dereferences [5, Chapter 1]. The code rules we created for this thesis, outlined in Section 5.5, constitute a form of static code analysis since they analyze code artifacts without executing the code itself. We subsequently give a short introduction to the topic and present similarities between the approach of established static analysis tools and ours.

Hangal et al. [84] state that the practice of automatically analyzing source code can be seen as a form of automated debugging, apt to reveal invalid artifacts that are difficult to find even for experienced engineers. They also emphasize the need for better practices to increase code quality by contrasting the rapid progress and success in the field of hardware with the still lacking reliability of software, whose failures have caused catastrophes at times [126, Section 2.1][118]. In spite of their potential usefulness in this context, static code analysis practices are not widely adopted [100, Chapter 7].

There exists a variety of tools for static code analysis; to limit the scope of this section somewhat, we will focus on open-source products for analyzing

2The xUnit frameworks are commonly used to write and execute unit tests, with NUnit for .NET and JUnit for Java as the most prominent members [83, Chapter 3][138, Chapter 6].

Java programs since we are detecting invalid artifacts in Java code in our case study described in Chapter 5. In this domain, popular projects include SonarQube [30]3, FindBugs [89], and PMD4. All these tools try to detect patterns that point to an invalid artifact in software, yet they take different approaches to detecting these patterns: For example, PMD exclusively analyzes source code [40, Section ‘Source Code Checking with PMD’], whereas FindBugs focuses on Java’s bytecode [5, Chapter 3]. SonarQube’s rule engine pursues a dual approach, i.e., depending on the rule it analyzes source code or bytecode5. In order to define and detect potential security threats, SonarQube uses the definitions of institutions like OWASP6 or CWE7. What all these tools and our approach portrayed in Chapter 3 have in common, however, is that they apply a set of rules to software with the goal of finding invalid artifacts.

Taking SonarQube’s set of Java rules as an example, the scope and approach of the rules are quite diverse. Some are very generic and could be applied to other programming languages as well, such as the rule making sure that all parameters passed to a function are used within that function8. Other rules are tailored more directly to the language or its API. Examples are prohibiting a call to run() on Java’s Runnable, since start() should be called instead9, or detecting a return statement within a finally block, which prevents the propagation of thrown exceptions to the caller10.

Like our rules, the rules of SonarQube, PMD11, and FindBugs [5, Chapter 3] for Java are written in Java. SonarQube additionally uses annotations to provide information about a rule such as its description, category, and severity in case of a violation12.
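The finally-block defect just mentioned can be reproduced in a few lines. In the following sketch (class and method names are ours, not taken from any tool), the exception thrown in the try block never reaches the caller because the return statement in the finally block discards it:

```java
public class FinallyReturnDemo {
    // Anti-pattern: a return inside finally silently discards
    // the exception raised in the try block.
    static int readConfig() {
        try {
            throw new IllegalStateException("config file missing");
        } finally {
            return -1; // the IllegalStateException is lost here
        }
    }

    public static void main(String[] args) {
        // The caller observes only the value -1; no exception propagates.
        System.out.println(readConfig());
    }
}
```

A static analysis rule can flag this pattern purely from the syntax tree (a return node inside a finally block), without ever running the code.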
Regarding data on how effective static code analysis is, a review at Google conducted by Ayewah and Pugh concluded that 77% of the problems detected by FindBugs were considered real and deemed worth fixing by the engineers [6, Section 4.1]. None of the issues flagged by FindBugs as severe, however, were “associated with any serious incorrect behaviors in Google’s production systems” [6, Chapter 1]. The reviewers ascribe this finding to the rigorous manual testing that is performed before code is pushed to production. FindBugs had been in place at Google, but its reports were not easily accessible to engineers, which caused FindBugs’ warnings to be ignored [6, Section 2.1].

One of the review’s main conclusions is that although static code analysis is generally not able to discover code deficiencies that would have been impossible to find using other techniques, such as acceptance testing, it does have the advantage of detecting problems earlier13. This is beneficial because finding invalid artifacts early makes them cheap to fix [6, Chapter 6]; furthermore, static analysis points to the root cause of software misbehavior, whereas testing merely observes its symptoms [6, Section 4.1].

3A subset of SonarQube’s features is the static code analysis tool Squid.
4http://pmd.sourceforge.net/
5For analyzing Java bytecode, SonarQube uses the ASM framework: http://asm.ow2.org/
6https://www.owasp.org
7http://cwe.mitre.org/
8http://jira.sonarsource.com/browse/RSPEC-1172
9https://sonar.spring.io/rules/show/squid:S1217?layout=false
10https://sonar.spring.io/rules/show/squid:S1143?layout=false
11PMD’s rules for Java, which are written in Java themselves, can be found here: https://github.com/pmd/pmd/tree/master/pmd-java
12https://github.com/SonarSource/sonar-java

2.4 Invalid Artifacts in Models

Complementing the previous section on invalid code artifacts, this section outlines existing techniques for detecting and classifying invalid artifacts in software models. Since the rules for our model case study presented in Chapter 6 detect invalid artifacts in models, we will relate our rules to these techniques during the course of this section.

2.4.1 Multiple Views Cause Invalid Artifacts

Mirroring Subsection 2.3.1, which discussed a common cause of invalid code artifacts, this subsection portrays a frequent reason for invalid artifacts in software models. We aim to give a short overview of current research and to provide background knowledge concerning this area of software engineering.

Invalid artifacts in models often—but not always—exist because the system under development is represented using different specifications that each portray an individual part or aspect of the system [10][145, Chapter 2]. Modeling these system aspects is intended to provide a simplified view of the software product and to make it easier to review. Model designers draw from other engineering disciplines like civil engineering, where multiple blueprints representing various aspects of the same building are used to facilitate constructing the building and to improve communication between stakeholders, who are likely to be interested in different systems of the structure (e.g., electrical wiring, plumbing, heating) [10, Chapter 1].

13Integrating static code analysis into the automated build process is a practicable way to catch regressions early on [91, Chapter 7].

Analogously, a software system can be modeled using multiple viewpoints that express its different characteristics [135]. These viewpoints—also called “partial specifications” [147, Section 1.A]—are often developed by multiple groups of development participants (e.g., business analysts, programmers, product owners, designers) [172, Chapter 1]. In his seminal paper [113], Kruchten described four possible views (Logical, Process, Development, and Physical) for describing a software system, plus their combination as a fifth view. Scenarios are used for this fifth view “to show that the elements of the four views work together seamlessly” [113, Section ‘Scenarios’]. The models of these different types of viewpoints can be formulated using a variety of design notations including text, diagrams, and mathematical notation [26, Section 5.3]. Note that the types of viewpoints mentioned above are not the only possibility; a different categorization was proposed, for example, by the Object Management Group [149, Subsection 2.2.5].

At some point, though, the analogy between models of software and models of real-life structures starts to break down. This is due to the fact that software is invisible: It has no “geometric representation in the way that land has maps, silicon chips have diagrams, computers have connectivity schematics” [21, Chapter 16]. This makes creating visual abstractions, i.e., software models, hard because “the lack of any visual properties for software does create a conceptual gap between representation and implementation” [26, Section 5.1].

Nevertheless, the abstract nature of software architectures leads to the establishment of a high-level vocabulary that simplifies discussing design decisions [26, p. 118]. This is useful and necessary considering that a software architecture can be defined as consisting of those system parts of which the lead engineers need to have a shared understanding [67].

2.4.2 Invalidity Categorization

To facilitate reasoning about invalid artifacts in models, we now present existing categories for invalid model artifacts. We will put the rules we created for detecting invalid model artifacts (depicted in Section 6.6) into context by describing the category to which the invalid artifacts that our rules can detect belong. Liu et al. [127, Chapter 2] classify invalid artifacts in models as follows:

Redundancy
If a system specification contains the same piece of information multiple times, some part of the specification is redundant. This redundancy

is not problematic per se because “[s]uch redundancy may be desirable since it can provide additional information to a requirement specification from different perspectives, and describe the behavior of a design unit under various scenarios” [127, Chapter 2]. Only when the information in those parts contradicts each other does an invalid artifact within the model become visible. Liu et al. [127, Chapter 2] distinguish between design redundancy and data redundancy: Design artifacts such as UML classes, objects in a sequence diagram, or use cases are redundant if they model the same system element multiple times. Figure 2.4 illustrates this using two sequence diagrams that depict the same use case on different levels of granularity. According to Liu et al., data redundancy occurs through the often complex relationships between data objects. Sometimes the graphs formed by data objects and their connections to each other can be simplified, thus removing data redundancy and reducing unnecessary complexity [127, p. 4].

Conformance to Constraints and Standards
A model must conform to certain constraints regarding its structure and well-formedness. In the case of UML, for example, this can be ensured using the Object Constraint Language (OCL) [29, 157, 171], which allows model designers to impose invariants, preconditions, and postconditions on class diagrams. A model constraint according to Warmer and Kleppe is “[a] restriction on one or more values of (part of) an object-oriented model or system” [179]. For instance, these restrictions include that every model element must have a name or that every connecting element must connect exactly two model elements.

Standards, similar to constraints, define limitations that a model has to respect in order to be considered standard-conforming. Such standards include the Law of Demeter, originally proposed by Lieberherr et al. [121], which states that components of a system should communicate with as few other components as possible and avoid calling methods on one component via another. If this law is honored, coupling between objects is reduced, which facilitates the evolution of individual system modules and the entire system in general [92, Chapter 5][120, Section 3.3]. It also tends to reduce the average number of errors within a given program, as Basili et al. [8, Subsection 3.2.2] have shown.

Following the definitions given here, the rules created for this master’s thesis that are applied to models (the relevant case study is demonstrated in Chapter 6) can be categorized partly as constraints (e.g.,

Figure 2.4: Two sequence diagrams with overlapping and thus redundant information. They describe the process of requesting a new meeting, once with location and time, and once without that information. Diagram source: Liu et al. [127, Chapter 2].

the rule ensuring that a model has at least one mandatory component, shown in Subsection 6.6.5) and partly as standards (the rule outlawing circular dependencies, presented in Subsection 6.6.1).

Change
As a system is developed, requirements and the context in which the system is going to operate may change, thus necessitating a modification of the model. These changes harbor the risk of introducing invalid artifacts, especially when the modifications are incomplete, i.e., some required steps were unintentionally left out. These kinds of invalid artifacts may also be introduced when the model is transformed from one notation to another (e.g., from UML to the architectural description language Rapide [128]).
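The Law of Demeter mentioned under “Conformance to Constraints and Standards” can be made concrete in code. The following sketch uses hypothetical classes of ours: the chained call reaches through one object to manipulate another, whereas the conforming version lets each object talk only to its direct collaborators.

```java
class Wallet {
    private int cents = 500;

    boolean pay(int amount) {
        if (amount > cents) return false;
        cents -= amount;
        return true;
    }
}

class Customer {
    private final Wallet wallet = new Wallet();

    // Violates the Law of Demeter: callers reach through the
    // customer and manipulate the wallet directly.
    Wallet getWallet() { return wallet; }

    // Conforms: the customer delegates to its own collaborator.
    boolean pay(int amount) { return wallet.pay(amount); }
}

public class DemeterDemo {
    public static void main(String[] args) {
        Customer c = new Customer();
        // Violation: the caller is now coupled to Wallet's interface.
        boolean paid1 = c.getWallet().pay(200);
        // Demeter-conforming call: only Customer's interface is used.
        boolean paid2 = c.pay(200);
        System.out.println(paid1 + " " + paid2);
    }
}
```

Both calls succeed, but only the second keeps the caller independent of Wallet, which is exactly the reduced coupling the standard aims for.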

Alternatively, the framework ConMan presented by Schwanke and Kaiser distinguishes between six types of invalid artifacts that focus more on syntactic correctness (including type-checking) and the absence of versioning problems [165, Chapter 3]. Schwanke and Kaiser emphasize (just as Spanoudakis in [169, Chapter 1]) that the existence and detection of invalid artifacts within a system should not hinder engineers from working on it [165, Chapter 2].

Considering the classification of invalid artifacts by Liu et al. [127], we conclude that the rules we devised for our model case study enforce adherence to standards and constraints, thus belonging to the category explained in Item 2.4.2.

2.4.3 Detection Approaches

Now that we have outlined the various ways of categorizing invalid artifacts and put our rules into this context, the following subsection addresses the different ways invalid model artifacts can be tracked down within a software model. Spanoudakis et al. [172, Chapter 4], for example, have grouped the approaches for detecting these kinds of invalid artifacts as follows:

Logic-based detection
Logic-based detection is a special kind of formal method in software development, which can be defined as a “process for developing software that exploits the power of mathematical notation and mathematical proofs” [180].

The logic-based approach takes a software model described in a formal modeling language (such as first-order logic used by Easterbrook and Nuseibeh [52], temporal logic [176], the Object Constraint Language [171], or VDM++ [64, Section 2.5]) as input and employs logical transformations, inferences, and theorem provers to detect invalid artifacts. Using formal logic as a foundation to reason about models and the invalid artifacts they might contain has the benefit that this kind of analysis is well-studied, the semantics are sound [172, Section 4.1], and the model’s mathematical rigor allows design verification to be performed [26, Section 18.1]. Moreover, logic can be applied not only to find invalid model artifacts but also to decide whether a program produces a set of specified outcomes by analyzing its model [88].

While being a powerful analysis tool, classical logic makes it difficult to reason in the presence of contradictions within the software system specification, the domain, or the requirements, because every arbitrary fact follows from conflicting information (ex falso quodlibet) [95, Subsection 2.2.2]. To overcome these difficulties, Besnard and Hunter have proposed a weaker form of classical logic—quasi-classical logic—which is able to tolerate such contradictions in software specifications [14].

Mathematical proofs about software models and the concomitant certainty about system properties make them attractive for aiding software development in general and specifically for designing safety-critical parts of a program [80].
Still, these formal descriptions suffer from drawbacks: They cannot represent certain aspects of computing such as “Human-Computer Interaction (HCI), some features of parallelism, [and] the non-functional

elements of real-time systems” [26, Section 18.1]. Furthermore, these kinds of descriptions tend not to scale very well because they become difficult for humans to manage as they grow [26]. Additionally, “theorem proving is computationally inefficient” [172, Section 4.1].

Glass maintains that formal methods in general may have limited use in planning and creating software because “the needs of the customers evolve over time, as the customer comes to learn more about solution possibilities, and that what is really needed is not a rigorous/rigid specification, but one that encompasses the problem evolution that inevitably occurs” [78].

Also belonging to this category of logic-based detection according to Spanoudakis et al. [172, Section 4.1] is the approach of Emmerich et al. [57], who make use of “consistency rules which determine relationships that should hold between [structured requirement documents]”. Their rules are formulated with an extension of Common Lisp. Similarly, rules expressed in the Object Constraint Language that are applied to elements of a UML model are part of this category [170, Section 5].

Although our model rules do not take the rigorous approach of formal methods, we nevertheless argue that they belong to this category since our rules are defined as predicates that must hold for each model element (see Section 6.6), just as the rules in [57, Subsubsection 5.2.2] that fall into this category according to Spanoudakis et al. [172, Section 4.1].

Model checking detection
This technique of detecting invalid artifacts has proven very effective in checking digital system designs [15, Chapter 1] using, for example, Binary Decision Diagrams, which are directed acyclic graphs that represent the states of hardware [24, 25].

Transferring this analysis approach from hardware to software has been shown to be less successful because software employs a considerably greater number of types, which are much more complex than Booleans. The increased—often infinite—number of states that software can be in aggravates the problem significantly. Abstraction and thus simplification of the software model has to be performed either by manually abstracting the original model (which is prone to error) [48, Chapter 1] or by automatically deriving an abstraction [15].

Another model checking approach is to define the system with respect to its requirements using the Software Cost Reduction (SCR) [86] notation. Gargantini and Heitmeyer [75] have developed an automated way to generate tests from a requirement specification written in SCR in order to determine whether a software implementation satisfies its requirements. Using SCR, the specified system can be viewed and analyzed like a state machine. Well-formedness can be checked, as can application-domain rules in the form of system invariants such as “the absence of circular definitions and undesired non determinism” [172, Subsection 4.2].

Specialized model analysis detection
Section 2.4 mentioned that software can be modeled using different viewpoints to represent the system under development from all perspectives that are relevant to the software’s stakeholders. If two or more viewpoints describe the same element, an “ontological overlap” has occurred among the viewpoints, something that happens in all software projects of significant size.

As an inevitable result of such an intersection of perspectives, invalid artifacts in the form of inconsistencies between models come to light [61, Chapter 2]. It may not be immediately obvious that two pieces of specification contradict each other because the domain knowledge may be expressed in different ways and on varying levels of abstraction. These contradictions within the viewpoints might indicate that a misunderstanding during requirements elicitation has occurred. Then again, they might reflect the fact that there are contradictions in the real world, something the stakeholders either have to live with or try to settle by negotiating.

Tolerating inconsistencies among viewpoints might be a necessity in order “to support innovative thinking, deferment of commitments and exploration of alternatives” [169, Chapter 1]. Van Lamsweerde et al. [177] also see the beneficial aspects of inconsistencies as they “allow further elicitation of requirements descriptions being acquired from multiple stakeholders”. Finkelstein et al. [61] have expressed similar opinions, adding that an absence of inconsistencies in a non-trivial model is hard or even impossible to achieve.

Blanc et al. [16] have also made efforts to detect invalid model artifacts in viewpoints. They created a meta-model independent approach to model checking, focusing on methodological rules. This means the rules apply not only to the final model but also to the way it was produced.

Delugach’s approach [47] uses semantic networks to represent the viewpoints of a software system’s specification as a graph, with the goal of finding conflicting requirements. As viewpoints may be described in distinct requirement languages, Delugach’s framework first translates those different notations into a common form. A common requirement language reduces the number of translations that are necessary between the supported languages and makes analyzing the heterogeneous specification easier because the analysis engine only needs to know one language [47, Chapter 2]. This is similar to the way we create an abstraction of the model in our case study described in Chapter 6 to facilitate analysis. Delugach’s resulting representation uses Sowa’s conceptual graphs [168] as the form of visualization. Figure 2.5 shows a simple example of such a conceptual graph.

Whereas the previously discussed model analyses are static, leading to a complete analysis of the whole model (or set of models), there also exist approaches that check a model for invalid artifacts each time it is modified, giving immediate feedback to engineers [55].

Human-centered collaborative exploration detection
The detection approaches mentioned so far have analyzed formal system specifications. Kotonya and Sommerville, in contrast, use requirement models from multiple viewpoints defined in varying kinds of languages, which can include informal (i.e., natural) languages but also physical formulas [112, Chapter 4], and apply analysis to those viewpoints. They aim to detect conflicts among those viewpoints [112, Section 5.2]. This term is broader than our invalid artifacts defined in Section 1.1 since they

Figure 2.5: This conceptual graph shows the fact that “Doctor Jones is 45 years of age and he is the agent of an act of surgery performed on a patient” [47, Chapter 2].

not only analyze software artifacts but also the interpersonal process of developing these artifacts.

Easterbrook has developed a method called Synoptic with the goal of enabling computer-supported negotiation for detecting and handling conflicts in the software development process as well as in the software specification [51]. His approach to conflict is a rather positive one (similar to the stance taken by Liu et al. [127, Chapter 1] and more directly by Nuseibeh et al. [146, Chapter 3]), stating that conflicts have been recognized in other fields like sociology [163] and also logical reasoning [71] to be an important source of information, and should thus not be suppressed, ignored, or avoided, but seen as something that commands action. Synoptic aims to provide a framework that guides the user in identifying, describing, and resolving conflicts. To mitigate the emotional aspect that is often associated with conflicts, Synoptic tries to “separate the people from the problem, in order to avoid the polarising nature of arguing from entrenched positions” [51, Chapter 3].

Likewise, in software engineering, Curtis et al. [38] stress the significance of being able to identify and reconcile the conflicts that inevitably occur during human collaboration, because the “areas of knowledge do not fit together like a jigsaw, but instead overlap in some places, conflict in others, and often leave gaps” [51, Section 2.5]. They declare this ability to be a central trait of an exceptional systems designer, whom they call the “intellectual core of the project (i.e., the keeper of the project vision)” [38, Section ‘Individual Level’].

In conclusion, we determine that according to this categorization given by Spanoudakis et al. [172], our model rules portrayed in Section 6.6 fit best into the logic-based detection approach outlined in Item 2.4.3.

2.5 Software Product Lines

Since some of the rules portrayed in Section 6.6 analyze Software Product Lines (SPLs), this section gives a short introduction to SPLs and also explains some deliberations regarding detecting invalid artifacts in SPLs.

In software architecture, an SPL lets us model a family of software products that differ only in some parts while the base structure remains the same [103, 106, 122]. The variability of an SPL is often visualized using a feature model containing Boolean guards that define whether a feature or component should be present in the resulting model that the SPL instantiates [105]. Engineers can create new variants of the model by setting these Boolean guards instead of having to model the software architecture themselves. Figure 2.6 shows a small example of such a feature model. As we will see in Chapter 5, SPLs can be used to create not only models of software but also new software products.

In order to detect invalid artifacts in software product lines, we could check every possible model the product line is able to instantiate based on its feature model. As this soon becomes unwieldy and even impossible as the number of potentially included features grows [132, Subsection 2.5], we have to analyze the product line itself instead of all its conceivable instantiations. This analysis results in identifying SPLs that carry the risk of creating models that are not valid. Section 6.6 presents a number of rules that analyze SPLs this way and point out invalid model variants that can but should not be instantiated from the SPL, thus enabling the elimination of choices.
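The Boolean guards of a feature model can be sketched directly in code. The following Java fragment is a simplified illustration of ours (feature names loosely follow the mobile-phone example; the two constraints stand in for a mandatory feature and a feature dependency, not for the full feature-model semantics): a configuration of guards is valid only if it satisfies all constraints of the model.

```java
import java.util.Map;

public class FeatureModelDemo {
    // Valid iff: the mandatory feature "Calls" is selected, and the
    // optional feature "Games", if selected, requires "Screen".
    static boolean isValid(Map<String, Boolean> guards) {
        boolean calls  = guards.getOrDefault("Calls", false);
        boolean games  = guards.getOrDefault("Games", false);
        boolean screen = guards.getOrDefault("Screen", false);
        return calls && (!games || screen);
    }

    public static void main(String[] args) {
        // A valid variant: all constraints hold.
        System.out.println(isValid(Map.of("Calls", true, "Screen", true, "Games", true)));
        // Invalid: Games selected without Screen.
        System.out.println(isValid(Map.of("Calls", true, "Games", true)));
        // Invalid: the mandatory feature Calls is missing.
        System.out.println(isValid(Map.of("Screen", true, "Games", true)));
    }
}
```

Checking every concrete configuration this way scales exponentially with the number of features, which is precisely why the rules in Section 6.6 analyze the product line itself rather than all of its instantiations.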

Figure 2.6: An exemplary feature model depicting the variability in features and components of a mobile phone. Filled dots signify mandatory features, empty dots mean that the feature is optional, and the filled triangle below the Games feature states that at least one of its features must be included in the product. Graphic from [107].

Chapter 3

Approach

In this chapter, we outline our approach to reaching the goal described in Section 1.2. We also refer to work related to our approach in Section 1.4. To make this chapter easier to read, we remind the reader of the definitions of the terms choice and artifact given in Section 1.1.

In order to reduce the number of choices produced by a software merge that a software engineer has to choose from, we apply rules to the set of choices. With these rules we analyze a choice and determine whether it is invalid or not. We say that a choice is invalid if it violates one of the applied rules. Depending on the violated rule, we either try to repair the invalid choice to make it valid, or remove the choice from the set of choices.

In general, multiple choices exist after a software merge because artifacts can be merged in different ways1. We generically termed this source of choices the Choice Generator since it generates choices. Figure 3.1 depicts our approach, including the software engineer’s role, diagrammatically.

3.1 Rules

At its core, our approach to improving the set of choices that results from a software merge is to apply rules to those choices. Because the rules are a crucial part of our approach, we use this section to explain the deliberations behind them.

Following Nuseibeh’s definition of what constitutes an invalid code artifact (Nuseibeh uses the term inconsistency for this), namely that “[a]n inconsistency occurs if and only if a (consistency) rule has been broken” [145,

1This is explained in more detail in Subsection 2.1.2.


[Figure 3.1 diagram: Artifacts are processed by the Choice Generator, which produces Choices A through D; the Rules remove or repair some of these choices; the Software Engineer chooses from the remainder.]

Figure 3.1: A high-level illustration of our approach and the motivating effects of its application: A software engineer has to select from a set of choices that stem from an abstract source labeled Choice Generator. The rules developed for this thesis aim to eliminate or repair some of the choices from which the software engineer can choose (in this diagram, Choice B is removed whereas Choice D is repaired), thus easing decision-making in software engineering, which is a goal of the work described in this thesis. The Artifacts are the elements of which choices are made. In our thesis, these are—as defined in Section 1.1—source code blocks and software models. These artifacts are processed by the Choice Generator. It stands for various sources that create the need to choose from a set of choices. Software merging is an example of such a source, as it causes artifacts to be combined in various ways.

Chapter 2], we have defined multiple rules, once for code and once for software models, in order to detect and remove invalid choices. According to the classification given by Liu et al. [127, Chapter 2], the rules developed for this master’s thesis fall into the category “Conformance to Constraints and Standards” described in Item 2.4.2.

Considering that our rules are realized as software, the goal in designing the rules was to keep them conceptually generic and applicable to a wide range of models and code bases. Furthermore, to increase flexibility and maintainability, we aimed to design our rules to be legible and thus easier to change if need be.

Schematically, a rule receives a choice as input2, judges that input according to the rule’s knowledge of what constitutes a rule violation, and produces a judgment as output. The judgments of all applied rules are aggregated into a total judgment, which describes whether the processed choice is in violation of any rule. Figure 3.2 illustrates in a simplified manner how the choice, the rules, and the total judgment relate to each other in our case studies.

The scope of the rules presented in Section 5.5 and Section 6.6 varies to a certain degree: A rule either focuses on a single artifact of the analyzed choice or on the whole choice, analyzing the interrelations among its artifacts. For certain rules the latter approach is necessary: For example, in order to screen for circular dependencies in a software model, the dependency tree of each component within the examined model must be checked. A description of this more extensive rule detecting circular dependencies in software models can be found in Subsection 6.6.1.

The rules do not analyze a choice directly in its original form (be it source code or a model). Rather, before the rules are applied to the choice, it is first parsed and converted into a representation which is more suited for rule analysis.
These conversions and representations are described in detail in Section 5.2 and in Section 6.2. It is noteworthy that applying our rules is free from side effects. Thus, the order in which the rules are applied does not change the outcome in any way.
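To make the scheme concrete, the rule-and-judgment mechanism described above can be sketched as follows. The types and names are our own illustration, not the implementation used in the case studies:

```java
import java.util.List;
import java.util.function.Predicate;

// Minimal sketch of the rule scheme: each rule judges a choice
// independently and without side effects, and the per-rule judgments
// are aggregated into a total judgment. Because rules are side-effect
// free, the order of application does not influence the result.
public class RuleEngine {
    // A rule maps a choice (here simply a list of artifact strings)
    // to true if the choice violates the rule.
    interface Rule extends Predicate<List<String>> {}

    // Total judgment: the choice is invalid if any rule is violated.
    static boolean violatesAnyRule(List<String> choice, List<Rule> rules) {
        return rules.stream().anyMatch(r -> r.test(choice));
    }
}
```

In this sketch, aggregation is a simple disjunction of the individual judgments; a real engine would additionally record which rule was violated and whether a repair is possible.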

3.2 Limitations

As the rules of our approach do not use formal methods, we recognize that they cannot offer the rigorous mathematical proofs of an approach that em-

2A choice consists, in agreement with our definition given in Section 1.1, of either model artifacts or code artifacts.

[Figure 3.2 diagram: a Choice Input is analyzed by a Rule Set (Rule A to Rule D); each rule produces a judgment, and these judgments are aggregated into a Total Judgment.]

Figure 3.2: The relationship between the analyzed choice, the rule set, and the total judgment.

ploys, for example, first-order logic like the one portrayed in [18]. Our rules were created with specific code and model issues in mind, as illustrated in the respective case studies in Chapter 5 and Chapter 6. Further research on additional models and code bases is needed to ascertain the effectiveness and efficiency of our rules. Furthermore, we limit our automated repairs to invalid artifacts where the repair action is simple. We realize that more involved repairs that necessitate, for example, the initialization of an object, require human judgment.

Chapter 4

Commonalities Of Code and Model Rules

Before we present the two case studies for source code and models that demonstrate the practical application of our approach, we want to introduce the reader to overarching principles shared by source code rules and model rules. These principles are referenced in the case studies whenever a practical example of one of them is found. Note that we use the expressions commonalities and overarching principles interchangeably in this chapter and beyond, as both convey the theoretical commonalities between code and model rules adequately.

Although rules for models and code operate on different abstraction levels (cf. Krueger [114, Section 1] or Atkinson and Kuehne [4]), during our work we were able to identify two areas of commonality that link the rules. These overarching principles are discussed in this chapter from a theoretical point of view, whereas the case studies in Chapter 5 and Chapter 6 demonstrate their practical application.

In order to facilitate discussing the parallels between code and model rules, we remind the reader of our definition given in Section 1.1 for the term artifact, which abstracts over the different constituents of source code and software models. In the following subsections, we will use this catchall term to denote code statements, code blocks, architecture components, and whole architecture models.

4.1 Common Rule Categories

As the first commonality between code and model rules, we propose two distinct categories that contain the rules of both abstraction levels. While model and code rules analyze very different artifacts1, two overarching rule categories emerged during our research. Rules for both source code and models can be associated with these categories; Figure 4.1 illustrates the categories encompassing the rules for both software models and source code.

4.1.1 Invalid Artifact Combination

The first category we identify concerns rules detecting invalid combinations of artifacts. These rules analyze the relations between artifacts and how they interact with each other, trying to find combinations that are invalid. For instance, artifacts whose dependency relationship forms a cycle represent an invalid artifact combination (a rule to detect such a combination is shown in Subsection 6.6.1).

On the code level, we can define a rule which demands that only initialized objects may be combined with a method call on them. Acting contrary to this, i.e., not initializing an object and trying to call one of its methods, constitutes an invalid combination of artifacts, as dereferencing an uninitialized object is not advisable and leads, in Java for instance, to a null pointer exception. The rule described in Subsection 5.5.4 finds such accesses to uninitialized objects.

Once an invalid combination of artifacts is detected, resolving it is not a task easily automated. In the case of circular dependencies, a possible solution is to cut the links between the dependent components. Yet, we deem such automatic repairs too risky since they might cause more harm than good when trying to repair the invalid artifact combination. We maintain that the artifacts should be edited manually, taking into account the specific context of the artifacts to avoid introducing further invalid artifacts.
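The code-level case can be illustrated with a small sketch. The class name, the statement format, and the matching logic below are our own simplified illustration, not the rule implementation from Subsection 5.5.4:

```java
import java.util.*;

// Simplified sketch of an uninitialized-read check: statements are given
// as strings; we track declared-but-uninitialized variables and flag any
// method call on a variable before it was assigned a value.
// Names and the statement format are ours, not ECCO's.
public class UninitializedReadCheck {
    public static List<String> findViolations(List<String> statements) {
        Set<String> uninitialized = new HashSet<>();
        List<String> violations = new ArrayList<>();
        for (String stmt : statements) {
            stmt = stmt.trim();
            if (stmt.matches("\\w+ \\w+;")) {            // e.g. "Button b;"
                uninitialized.add(stmt.split(" ")[1].replace(";", ""));
            } else if (stmt.matches("\\w+ = .+")) {      // e.g. "b = new Button();"
                uninitialized.remove(stmt.split(" ")[0]);
            } else if (stmt.matches("\\w+\\..+")) {      // e.g. "b.addListener(l);"
                String receiver = stmt.substring(0, stmt.indexOf('.'));
                if (uninitialized.contains(receiver)) {
                    violations.add(stmt);
                }
            }
        }
        return violations;
    }
}
```

A statement such as `b.addActionListener(l);` is reported as a violation when it appears before `b` has been assigned a value, reflecting the invalid combination of an uninitialized object and a method call on it.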

4.1.2 Dispensable Artifact

In addition to invalid artifact combinations, we define a category of rules that find dispensable artifacts in source code or architecture models. If a choice contains a dispensable artifact it means that the choice could do well

1At the abstraction level of architectures we analyze components, links, and connectors whereas at the source code level we are concerned with method calls, statements, and assignments.

Figure 4.1: The rule categories described in Section 4.1 span both the model and the code rules.

without that artifact. Examples of such dispensable artifacts are the redundant assignment statements which the rules portrayed in Subsection 5.5.2 and Subsection 5.5.3 detect.

We also consider rules finding equivalent artifacts to fall under this category. The rationale for this classification is that if two artifacts fulfill the exact same purpose, with no difference in behavior, only one of them is needed, rendering the other artifact dispensable. We are aware that proving whether two pieces of software behave exactly the same is theoretically impossible in general [174]. Still, mature heuristics exist that can tell with a certain degree of confidence whether code artifacts are functionally equivalent [99, 129] and that scale to large code bases as well [72]. Accordingly, our equivalence rule outlined in Subsection 5.5.1 is not a mathematically rigorous proof but rather a practical means for finding artifacts which behave identically.

4.2 Merging Causes Invalidities

In our thesis we analyze the outcome of merges of source code and of software architecture models. In the case of a source code merge, the artifacts to be combined are two blocks of code. As Section 2.1 explains in further detail, no textual conflict occurs as long as the code blocks were not edited on the same lines. But even if no textual conflict occurs, there are situations where combining the change sets incurs semantic problems. Assuming that two engineers called Alice and Bob work from the same base version and make changes that, seen in isolation, are valid, these unproblematic changes may still lead to a merged code block that is invalid. Such a situation, using a three-way merge, is illustrated in Figure 4.2, where Alice and Bob commit harmless change sets that, when merged, cause a rule violation.

With model merging, the situation is quite similar: There are scenarios in which changes, applied in isolation to a model, are valid and cause no invalid artifacts in the changed model. Yet when the changes are combined, which is the case when the modified models are merged, invalidities can arise. This has been exemplified and studied by Dam et al. [39, Section 3]. Figure 4.3 shows such a merge scenario where validly changed models are combined into a model that contains invalid artifacts.

From the above we see that both model and code merging are affected by the same problem: Combining valid artifacts can result in an invalid merged outcome. Consequently, rules cannot rely on the validity of the merge inputs but have to analyze the merged product in order to detect invalid artifacts in models and code. These rules must not stop at checking the syntax of the artifacts but must also consider the relations of the individual artifacts to each other, which might have changed due to the merge. This analysis of the way artifacts are combined with each other after a merge is fundamental to our rules that detect invalid artifacts.
Thus, we classified it as one of the rule categories that both model and code rules fall into; we discuss this rule type, detecting invalid artifact combinations, in Subsection 4.1.1. Although examining the combination of artifacts after a merge is vital for architecture models and source code alike, the intrinsic reliance of imperative-style coding2 on statement order makes this aspect especially relevant for merged code. While in general it does not matter in which order components are added to a model, it quite often makes a significant difference

2Explicitly mentioning the imperative coding style is due to the fact that the problem of statement order becomes irrelevant in functional programming via referential transparency [167].

when the order of code statements is changed. For example, the rule for detecting null pointer dereferences portrayed in Subsection 5.5.4 reflects this distinguishing issue of code.

Alice

1 int i = 5;
2 String s = doSomething(i);
3 i = 10;
4 System.out.println(i);

Merged:

1 int i = 5;
2 //String s = doSomething(i);
3 String s = "";
4 i = 10;
5 System.out.println(i);

Original:

1 int i = 5;
2 String s = doSomething(i);
3 i = s.length();

Bob

1 int i = 5;
2 //String s = doSomething(i);
3 String s = "";

Figure 4.2: Three-way merge showing that the changes made by Alice and Bob are innocuous in isolation, yet cause an invalid artifact when merged: The variable i is assigned the value 5 in line 1, which is never read until i is set to the value 10 in line 4. Thus one of the assignments is redundant and constitutes a rule violation of type “Dispensable Artifact” as defined in Subsection 4.1.2.

Figure 4.3: A model merge combining two change sets that are valid individually but create an invalid merged model. The common ancestor of the two model versions is shown on the far left. Graphic from Dam et al. [39, Section 3].

Chapter 5

ECCO Case Study

In this chapter, we present the case study for our code rules. We demonstrate rules that detect invalid code artifacts and show how they are applied to code produced by the ECCO tool, which is introduced in the following section. In Section 5.6 we show how our code rules reduce the number of choices a software engineer working with ECCO has to choose from, or improve those choices.

5.1 ECCO Platform Background

In this section, we give a brief introduction to the ECCO tool, which is central to our case study for source code rules. We will be using the terms product variants or simply variants to denote an adapted version of a software product (a UML editor, for example). Software engineers can create such a product variant by copying a product and then modifying the copy, or they can generate a variant using a software product line; software product lines are described in Section 2.5.

ECCO (Extraction and Composition for Clone-and-Own) is a software tool created by Lukas Linsbauer [124] and Stefan Fischer [62] for obtaining software product lines from an existing code base. The authors recognize the widespread practice of copying and adapting large-scale, industrial software to provide support for a new set of hardware or to satisfy the specific needs of a customer. ECCO is based on the notion that this practice creates considerable maintenance problems because the product variants share large amounts of duplicated code [63, Chapter 1].

This duplication can incur problematic costs since the variants exist side by side after they were cloned. Security patches and other code changes have to be applied separately, which increases the associated complexity and effort accordingly [124, Chapter 1] (Subsection 2.3.1 discusses the issue of code

clones in general). To alleviate this problem, ECCO provides the functionality to create a product line by parsing an existing array of product variants and also offers a semi-automatic way to compose new product variants [63, Chapter 3] from that product line.

5.1.1 Functionality

Below we outline the distinctive core features of ECCO and point out how these features are relevant to our case study.

5.1.1.1 Feature to Code Mapping

As input, ECCO takes product variants that have been cloned and modified to suit a special need. Additionally, for each variant ECCO requires a text file that lists all features the variant implements. This file only contains the names of the implemented features and their descriptions. It is kept intentionally simple, without specifying, even approximately, which parts of the product variant implement a feature, because “[companies] lack precise knowledge where in the code these features are implemented” [124, Chapter 1]. Also, this information might get out of date as the variant evolves.

ECCO is then able to extract the variants' “commonalities and differences, and maps them to their features” [63, Chapter 1] by analyzing the input source code. This mapping is also referred to as traces. The analysis and extraction described by Linsbauer [124] works on the assumption that products that share features (as specified in the text file) also share source code. Given enough product variants that differ sufficiently in their features, the algorithm can find overlapping feature sets and code sets, respectively. Figure 5.1 illustrates this principle with a diagram.
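The overlap premise can be illustrated with a small sketch. The types and the set operations below are our own simplification, not ECCO's actual extraction algorithm described by Linsbauer [124]:

```java
import java.util.*;

// Illustrative sketch of the extraction premise: code that appears in
// every variant implementing a feature, but in no variant lacking it,
// is traced to that feature. Variant and trace types are invented for
// this example; ECCO's real analysis is considerably more involved.
public class FeatureTracing {
    // A product variant with its declared features and its code units.
    static class Variant {
        final Set<String> features;
        final Set<String> code;
        Variant(Set<String> features, Set<String> code) {
            this.features = features;
            this.code = code;
        }
    }

    static Set<String> trace(String feature, List<Variant> variants) {
        Set<String> common = null;          // intersection over implementers
        Set<String> excluded = new HashSet<>(); // union over non-implementers
        for (Variant v : variants) {
            if (v.features.contains(feature)) {
                if (common == null) common = new HashSet<>(v.code);
                else common.retainAll(v.code);
            } else {
                excluded.addAll(v.code);
            }
        }
        if (common == null) return Set.of();
        common.removeAll(excluded);
        return common;
    }
}
```

With only a few variants the intersection stays coarse; this mirrors the text's point that the algorithm needs enough variants differing sufficiently in their features to separate the code sets.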

5.1.1.2 Composing New Products

Building on this feature-to-code mapping, ECCO offers to create new compositions of these features, synthesizing a new product variant. We will call this part of ECCO the ECCO Composer for the remainder of this thesis. The composition of new product variants is semi-automatic, which means in the context of ECCO that the software engineer is guided through the creation of a new variant, receiving hints and warnings from ECCO [63, Chapter 4][62, Section 3.1].

The reason preventing the ECCO Composer from performing a fully automatic generation is that ECCO cannot generate code for an interaction of features if they never occurred together before in a product variant. “Glue

Figure 5.1: The extraction algorithm's premise is that features (called modulesets in this figure) and code (called codesets) overlap between product variants. The δ denotes the derivative code that is needed to make the feature interaction between two features (here the features “color” and “line”) work. Example from [125, Section 3.2].

code” is necessary to make the combination of two features run; such code is also called derivative modules in [63, Chapter 2]. By the nature of its approach of comparing the source code of variants to extract distinct features, ECCO is also not able to separate features that always appeared together in each product variant, even if those features fulfill completely dissimilar requirements.

When composing new features, ECCO might create multiple choices of the same code block. This is due to the fact that the products used as the composition's input have implemented the code block in varying ways. We will show in our case study in this chapter how filtering out invalid code blocks is the driving motivation for the source code rules we have created.

5.1.2 Architecture

ECCO's architecture can be roughly separated into feature extraction and the composition of new product variants. Figure 5.2 shows a simplified view of ECCO's architecture and the flow of information within the tool.

Figure 5.2: High-level overview of ECCO from [62, Section 3.1].

5.2 Parsing the ECCO Code Tree

The ECCO Composer combines existing code from products into new variants by building up a code tree. The tree's nodes are statements, and the order of a node's children determines the order of statements in the resulting code block. A node having siblings means that the corresponding statement has alternatives, stemming from the different implementations of the products being combined. These alternative statements make for choices in the sense defined in Section 1.1. Figure 5.3 illustrates this principle with a small code tree.

In order to detect invalid code blocks, we check the code tree while it is expanding its nodes. This approach has the particular advantage that, in case a rule detects an invalid artifact within a block, it can react immediately by removing the problematic statement and thus pruning the code tree.

The nodes of the tree contain the individual code statements as strings of code similar to Java. We say similar because ECCO's representation of Java statements differs slightly from Java, which necessitates parsing these statements and accounting for their peculiarities. Listing 5.1 shows an example of the way ECCO represents Java code, which we dubbed “ECCO Java”. The parsing of ECCO Java from the nodes was implemented using a dedicated utility class listed at A.1. The main tasks this class fulfills are the extraction of variables that were accessed in some way, either by having their value read or reassigned, as well as determining whether and which methods were called inside a node.
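As a simplified illustration (our own types, not ECCO's classes), such a code tree can be modeled as nodes holding one statement each, where several children of the same node represent alternative continuations and every root-to-leaf path yields one candidate code block:

```java
import java.util.*;

// Simplified sketch of ECCO's code tree: each node holds one statement,
// children are ordered, and multiple children of one parent represent
// alternative next statements. Enumerating all root-to-leaf paths yields
// the candidate code blocks (choices). Types are ours, not ECCO's.
public class CodeNode {
    final String statement;
    final List<CodeNode> children = new ArrayList<>();

    CodeNode(String statement) { this.statement = statement; }

    CodeNode add(CodeNode child) { children.add(child); return child; }

    // Collect every statement sequence from this node down to a leaf.
    List<List<String>> choices() {
        List<List<String>> result = new ArrayList<>();
        if (children.isEmpty()) {
            result.add(new ArrayList<>(List.of(statement)));
            return result;
        }
        for (CodeNode child : children) {
            for (List<String> tail : child.choices()) {
                List<String> path = new ArrayList<>();
                path.add(statement);
                path.addAll(tail);
                result.add(path);
            }
        }
        return result;
    }
}
```

Pruning, in this picture, amounts to removing a child whose subtree a rule has judged invalid, so that none of the choices passing through it is ever enumerated.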

[public ,
 ServerReq serverReq,
 VODClient vODClient,
 bevelPanel1 = new BorderPanel(1, new Color(220, 220, 220), new Color(50, 50, 50)),
 bevelPanel2 = new BorderPanel(1, new Color(220, 220, 220), new Color(50, 50, 50)),
 listControl1 = new List(),
 buttonControl1 = new Button(),
 buttonControl2 = new Button(),
 buttonControl3 = new Button(),
 buttonControl4 = new Button(),
 label1 = new Label(),
 movielist = null,
 server = serverReq,
 parent = vODClient,
 TRY: TRYBLOCK,
 CATCH (Exception e),
 ,
 hostreset(),
 serverselect = new ServerSelect(this),
 detail = new Detail()]

Listing 5.1: ECCO's representation of Java statements inside the code tree.

int i = 5;

String s = doSomething(i);

i = 10; String s = "";

Figure 5.3: When combining multiple code blocks from different product variants to generate a new variant, ECCO creates a code tree where the order of nodes determines the order of statements. If a node has more than one child, as seen here with the second node containing String s = doSomething(i), there are multiple possibilities for the next statement.

5.3 Case Study Approach

In Figure 5.6 we see how our generic approach described in Chapter 3 is adapted for the ECCO case study. As the diagram shows, we specialize the generic Choice Generator to be the ECCO Composer and the generic Artifacts to be ECCO Java blocks from product variants, respectively. The remaining elements of the diagram did not need to change with respect to the generic diagram in Figure 3.1 because a Software Engineer still has to choose from a set of Choices (which are ECCO Java blocks) that are eliminated or repaired by rules. These rules for ECCO are portrayed in Section 5.5.

5.3.1 Related Work

In the following subsection we give a brief overview of existing approaches for detecting invalid artifacts in source code to show how other researchers have tackled the problem.

In the context of detecting invalid artifacts in merged software, Berzins was one of the first to present a language-independent approach [13]. He analyzes programs and digital circuits modeled in Boolean algebra. Using his approach, which, in contrast to ours, relies heavily on formal methods, he is also able to detect semantic conflicts within the merged software.

Bush et al. [28] analyze source code by first transforming it to so-called models, whose syntax is inspired by Lisp, and then performing control flow analysis on them as well as watching values in memory. These models are enriched with rules that primarily aim to find problems with memory access, memory leaks, and pointers [28, Chapter ‘Appendix: The Modeling Language’]. Figure 5.4 depicts an example of transforming C code to Bush's Lisp-like model and enriching it with checks for problems related to memory management. Their approach of watching values is similar to that of our rules portrayed in Subsection 5.5.2 and Subsection 5.5.3, which employ a watch list of variables.

Engler et al. [58] have contributed substantial research on how to extend compilers in order to find invalid artifacts in C and C++ code. This includes detecting invalid artifacts that might pertain to a certain kind of software system, such as operating systems or embedded software. Compiler extensions are written in the metal language, which is a superset of C++ and “provides the state machine (SM) as a fundamental abstraction” [81, Chapter 1]. Figure 5.5 shows an example of how a metal rule can be defined.
Regarding analysis of Java code using rules, a topic closely related to this case study, the tool FindBugs [35] is a prominent example that is presented along with other static code analysis tools in Subsection 2.3.3.

5.4 Case Study Motivation

This section outlines why it became necessary to reduce the number of choices (in the sense defined in Section 1.1) that the ECCO Composer produces. We mentioned in Section 5.1 that the ECCO tool not only traces features to code but also lets engineers create new products by combining features in novel ways. In the latter step, code merging has to be done when there are two or more variants of a code block. These choices occur because the code blocks pertaining to the features are taken from existing product variants that might have subtle but also significant differences in implementation. Consequently, a software engineer has to choose from them to create a new product.

ECCO's capabilities have been tested using various software products [125], all written in Java: Video On Demand, Draw Product Line, and ArgoUML (which was used by Couto et al. [37] to study product lines). After merging code blocks to create new variants, certain invalid artifacts were observed in the choices the ECCO Composer generated. This formed the motivation for this case study: Filter out the invalid choices automatically

(a) An example function written in C that dereferences a pointer and returns its data.

(b) The dereferencing function from 5.4a after being transformed into a model. The interspersed rules in the form of (constraint ...) check for memory- and pointer-related issues.

Figure 5.4: The static analyzer of Bush et al. [28] analyzes source code after translating it into a Lisp-like model which is annotated with rules. Figure 5.4a shows the function before its translation into the model seen in Figure 5.4b. If the data-flow analysis detects that the rules are violated, an error message is issued. Example code and model are from [28, Section ‘Model’].

Figure 5.5: A rule written in metal that checks whether a pointer that was already freed is dereferenced or freed again. In line three, {kfree(v)} is a pattern that matches all deallocations of variables. Source: [81, Chapter 1]

so a software engineer does not have to. The inspected blocks not only contain problematic code (e.g., a potential null pointer exception); a considerable number of merged blocks were also found upon human inspection to behave identically. They only differ in the order of their statements, which does not change the behavior of the blocks at all. Accordingly, we devised a rule that detects equivalence between blocks of code, which is presented in Subsection 5.5.1. Following the definition given in Section 1.1, two or more blocks that are equivalent represent invalid artifacts.

5.5 Rules

This section describes the rules we have created for the ECCO case study to find invalid blocks of ECCO Java. These rules are an integral part of the approach outlined in Section 5.3.

We initially attempted to create rules for detecting invalid blocks of ECCO Java by adopting and adapting established invalidity detection tools. We created a wrapper called CodeChecker for analyzing Java source code using PMD, FindBugs (both of which were discussed in Subsubsection 2.3.3.3), and the Java compiler javac with lint enabled1. The implemented prototype of CodeChecker calls the three tools, which read Java source files from disk, and aggregates their reports. Listing A.2 shows the implementation of this wrapper.

The invalid code produced by the ECCO Composer involved dispensable artifacts (cf. Subsection 4.1.2) in the form of redundant assignments to fields. Since PMD did not cover this invalidity in its predefined rules, we created two

1See https://docs.oracle.com/javase/8/docs/technotes/tools/windows/javac.html for more information on javac and lint.


Figure 5.6: Our approach for the ECCO case study: The ECCO Composer merges ECCO Java code blocks that stem from product variants. This merging generates choices that a software engineer has to choose from. We facilitate the engineer's choice by removing or repairing the generated choices using the rules we have devised for this case study. This approach is a specialization of the generic approach for our thesis shown in Figure 3.1.

custom PMD rules for the purpose of finding redundant field assignments. Those rules, written in Java, have the same goal as the rules later devised for our own rule engine described in Subsection 5.5.2 and Subsection 5.5.3, but differ in implementation since the former rules make use of PMD's API. These early PMD rules for detecting redundant field assignments can be found at Listing A.3 and at Listing A.4. To get a better understanding of how these rules are embedded in and interact with the PMD framework, including its abstract syntax tree, we point the reader to Listing A.5, containing the utility class used, for example, to query the code's abstract syntax tree as generated by PMD. Two main reasons led us to abandon CodeChecker in favor of a custom rule checking engine:

1. Speed: Because one of our goals was to prune the ECCO code tree by not expanding invalid nodes, we had to write every (unfinished) block to disk while the tree was building up, as this is the only way the tools wrapped by CodeChecker accept input. After preliminary test runs it became evident that the performance of analyzing even small code trees was untenable and would clearly not scale to larger trees. Since memory access is faster than disk access by orders of magnitude [97], we opted for a rule engine that could read code from memory instead of having to read code from disk.

2. Translation issue: Because FindBugs, PMD, and javac accept standard Java as their input, we would have had to create a translation layer for ECCO Java. Also, because ECCO's Java representation might change in the future, we estimated the effort and risk of creating a translation layer from ECCO Java to regular Java to be quite significant, possibly outweighing the advantages gained from reusing these existing detectors of invalid code.

Taking into account the above considerations, we created our own rules written in Java for examining ECCO Java, which are applied to ECCO's code tree while it is expanding its nodes.

The following subsections provide rationale as well as explanations regarding the rules created for ECCO's code composition. We describe how the rules react to a spotted invalid artifact and what repairs they perform to make the problematic code valid. In addition, we relate these concrete rules to the overarching concepts of detecting invalid artifacts in merged code and models as explained in Chapter 4.

5.5.1 Add Listener Equivalence Rule

One of the reasons a considerable number of choices were produced while synthesizing a new product in ECCO stems from the fact that input products add listeners to GUI elements, such as buttons2, in different orders in the implementing code block. When merged, these blocks, which are valid when viewed in isolation, cause problems since an excess of equivalent choices is created. This issue caused by merging constitutes a specific example of the general principle outlined in Section 4.2, which describes how both model and code merges can cause invalid artifacts although their merge inputs were valid.

As a small example to illustrate this situation, Figure 5.7 shows three choices of a code block that differ only in the order in which listeners are added to their buttons. By inspecting the different choices it becomes clear that each code block exhibits the same behavior. Because it suffices to have only a single one of these choices, all but one of these equivalent blocks can be deleted.

Furthermore, the Add Listener Equivalence Rule is an example of a rule tailored to a special domain, in this case GUI applications that favor message passing using listeners. This contrasts with the following rules, which apply to all Java programs equally. Listing A.6 shows the documented implementation of this rule, demonstrating how blocks of code can be assessed regarding their potentially equivalent behavior.

As we have seen, the Add Listener Equivalence Rule decides whether the members of a group of code blocks are equivalent in order to remove equivalent blocks. Following the definition of artifact given in Section 1.1, this rule is a prime example of rules dealing with the detection and elimination of dispensable artifacts. Consequently, we can assign it to the respective overarching rule category described in Subsection 4.1.2.
In terms of automatic repairs, this rule will react to a detected equivalence among multiple code blocks by deleting all blocks but the first one.
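The special case of blocks differing only in listener order can be sketched as follows. This is our own simplified heuristic, not the implementation from Listing A.6:

```java
import java.util.*;

// Heuristic sketch: two code blocks are treated as equivalent if they
// contain exactly the same statements (as multisets) and differ only in
// the order of addActionListener calls. This captures the special case
// from the Add Listener Equivalence Rule; it is not a general
// program-equivalence check, which is undecidable.
public class ListenerEquivalence {
    static boolean onlyListenerOrderDiffers(List<String> blockA, List<String> blockB) {
        if (blockA.size() != blockB.size()) return false;
        // Same statements overall, as multisets?
        Map<String, Integer> count = new HashMap<>();
        for (String s : blockA) count.merge(s, 1, Integer::sum);
        for (String s : blockB) {
            Integer c = count.get(s);
            if (c == null || c == 0) return false;
            count.put(s, c - 1);
        }
        // Statements that are not listener registrations must appear
        // at the same positions in both blocks.
        for (int i = 0; i < blockA.size(); i++) {
            boolean aListener = blockA.get(i).contains(".addActionListener(");
            boolean bListener = blockB.get(i).contains(".addActionListener(");
            if (!aListener || !bListener) {
                if (!blockA.get(i).equals(blockB.get(i))) return false;
            }
        }
        return true;
    }
}
```

Applied to the three choices of Figure 5.7, this check would report all of them as equivalent, so that all but the first could be deleted.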

5.5.2 Multiple Variable Assignment Rule

A recurring issue in the code generated by ECCO's code composition feature was that values assigned to variables (including fields) were not read before

2GUI buttons and the listeners attached to them are documented, for instance, in the AWT framework: https://docs.oracle.com/javase/8/docs/api/java/awt/Button.html#method.summary

1 String s = "";
2 Button button1 = new Button();
3 Button button2 = new Button();
4 Button button3 = new Button();
5
6 button1.addActionListener(listener);
7 button2.addActionListener(listener);
8 button3.addActionListener(listener);

(a) First choice.

1 String s = "";
2 Button button1 = new Button();
3 Button button2 = new Button();
4 Button button3 = new Button();
5
6 button2.addActionListener(listener);
7 button1.addActionListener(listener);
8 button3.addActionListener(listener);

(b) Second choice.

1 String s = "";
2 Button button1 = new Button();
3 Button button2 = new Button();
4 Button button3 = new Button();
5
6 button2.addActionListener(listener);
7 button3.addActionListener(listener);
8 button1.addActionListener(listener);

(c) Third choice.

Figure 5.7: The code shown in 5.7a, 5.7b, and 5.7c differs only in the order the listeners are added to the buttons. Choices like these occurred while creating new products from existing ones using ECCO's Composer. The rule described in Subsection 5.5.1 decides that these three choices are equal in behavior and will delete all choices but the first one.

they received a new value: The previous value was never used by the program which is a symptom of low code quality. To see an example of this rule violation, consider listing 5.2 which shows a snippet of the synthesized Java code exhibiting multiple assignments to the same variable.

    newLine = new Line(start);
    newLine = new Line(color, start);

Listing 5.2: The variable newLine is sequentially assigned two values. As the first value is not read before the second one is assigned, one of the assignments is redundant.

When analyzing code blocks like these containing multiple assignments, it becomes obvious that only one of them makes sense. The other assignment does not and is therefore dispensable, making the Multiple Variable Assignment Rule another instance of a rule detecting dispensable artifacts (cf. Subsection 4.1.2).

Considering the implementation of the Multiple Variable Assignment Rule shown in Listing A.7, we see that the rule is violated when there is no access to the variable before the next time the variable is assigned a value. This is realized by putting variables on a watch list when they are assigned a value. When a variable from the list is accessed in the code or a method is called3, the variable is removed from the list. If a variable on the watch list is assigned a value again, we can be sure that the variable's value was not read between the previous and the current assignment. We then declare a rule violation because the first assignment was rendered useless by the second one.

In order to determine whether the value of a variable was read, we use EccoJavaParser#getReadVars. As shown in Listing A.1, this method returns those variables that had their values read either in an assignment (e.g., a = b), in a comparison (e.g., a <= b), or in a binary operation (e.g., a + b or a >> b). The current rule implementation does not yet include support for detecting access via unary operations such as i++.

We consulted the Java Language Specification version 7, specifically sections 17, 18, 19, 20, 21, and 26 of chapter 154, to compile a list of operators that read the value of a variable. Listing 5.3 shows an example where a variable is accessed using a comparison operation before it is assigned another value, which makes the code block conform to the rule.

3Because called methods may use any or all of the fields in the watch list, we choose to be on the safe side and clear the list upon a method call in order to avoid false positives.
4See https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html

    int variable = 1;
    boolean b = variable <= 5;
    variable = 2;

Listing 5.3: A block of code that conforms to the Multiple Variable Assignment Rule because the variable is read in between assignments via a comparison.

The predecessor of the Multiple Variable Assignment Rule is the rule shown in Listing A.3, implemented using the PMD framework. The two rules follow the same intent, and the implementations are also not unlike each other, barring the fact that the PMD rule only considers assignments to fields, whereas the Multiple Variable Assignment Rule also detects redundant reassignments to local variables.

When finding a pair of assignments to the same variable with no intervening access to the variable, this rule recommends repairing the block by deleting the second assignment, so that the first assignment keeps its usefulness.
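The watch-list procedure described above can be sketched as follows. The Stmt record and its fields are a deliberate simplification for illustration, not ECCO's actual statement model from Listing A.7; whether a constructor invocation counts as a method call is left out of this sketch.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MultipleAssignmentCheck {

    /** Drastically simplified statement: at most one assigned variable,
     *  the variables whose values it reads, and whether it invokes a
     *  method (which might read any watched variable). */
    record Stmt(String assignedVar, Set<String> readVars, boolean callsMethod) {}

    /** Returns variables assigned twice with no read in between. */
    static Set<String> redundantlyAssigned(List<Stmt> statements) {
        Set<String> watchList = new HashSet<>();
        Set<String> violations = new LinkedHashSet<>();
        for (Stmt stmt : statements) {
            if (stmt.callsMethod()) {
                watchList.clear(); // be safe: the callee might read anything
            }
            watchList.removeAll(stmt.readVars()); // a read ends the watch
            if (stmt.assignedVar() != null && !watchList.add(stmt.assignedVar())) {
                violations.add(stmt.assignedVar()); // reassigned while watched
            }
        }
        return violations;
    }
}
```

Run on the two assignments of Listing 5.2 this flags newLine, whereas the comparison in Listing 5.3 removes the variable from the watch list before its reassignment.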

5.5.3 Multiple Setter Call Rule

The conceptual sibling of the Multiple Variable Assignment Rule from Subsection 5.5.2 is the Multiple Setter Call Rule. Because Java does not follow the uniform access principle [139, Section 3.3], the convention of getters and setters has emerged as a means to hide the implementation of fields [70]. Provided this convention is not violated, we can assume that a method call such as myVar.setField(aVal) indeed sets a field to a certain value, just as if a more direct assignment like myVar.field = aVal had occurred. Thus, the same considerations regarding redundant assignments apply as those portrayed in Subsection 5.5.2.

The implementation of this rule, listed at A.8, is also roughly similar to that of the Multiple Variable Assignment Rule: again we pursue the approach of a watch list to detect repeated setter calls to objects. We remove watched objects when a method is called, as that method might have used one or all of the objects on the list. A rule with the same purpose was developed leveraging PMD's API to analyze code; it can be browsed at Listing A.4.

The second setter call renders the first one redundant, creating a dispensable artifact. Just as the Multiple Variable Assignment Rule, we can count this rule to the category of rules detecting dispensable artifacts described in Subsection 4.1.2.

This rule repairs the detected invalid artifact the same way as the previous rule: delete the second setter call to give the first one its utility back.
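The same watch-list idea, keyed on receiver and setter name, can be sketched compactly. The string-based call model below is our simplification for illustration, not the implementation from Listing A.8.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultipleSetterCheck {

    /** Statements are modeled as call strings like "obj.setX(value)";
     *  returns earlier setter calls that were repeated on the same
     *  receiver with no other method call in between. */
    static List<String> redundantSetterCalls(List<String> calls) {
        Map<String, String> watch = new HashMap<>(); // receiver#setter -> earlier call
        List<String> redundant = new ArrayList<>();
        for (String call : calls) {
            String receiver = call.substring(0, call.indexOf('.'));
            String method = call.substring(call.indexOf('.') + 1, call.indexOf('('));
            if (method.startsWith("set")) {
                String earlier = watch.put(receiver + "#" + method, call);
                if (earlier != null) {
                    redundant.add(earlier); // its effect was overwritten unused
                }
            } else {
                // Any other call might read the fields set so far:
                // clear the watch list to stay on the safe side.
                watch.clear();
            }
        }
        return redundant;
    }
}
```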

5.5.4 Uninitialized Read Rule

In contrast to the previous rules, which detected dispensable artifacts in code, the Uninitialized Read Rule belongs to the category of finding invalid combinations of artifacts. This rule category, described in Subsection 4.1.1, is part of the common concepts in detecting invalid artifacts in code and models (cf. Chapter 4). In the case of this rule, the problematic combination is an uninitialized object and a method call on it, which, in languages like Java, causes a null pointer exception. Listing 5.4 shows an example of a code block violating the Uninitialized Read Rule.

    1 String s = null;
    2 someMethod();
    3 s.charAt(0);

Listing 5.4: Uninitialized Read Rule detects null dereferences like the one in line three.

While inspecting ECCO's generated code, we found quite a few occurrences of such potential null pointer exceptions. These do not exclusively stem from ECCO's composition of code but were to some extent already present in the code in the first place, presumably added to the code base through human error. The potential null dereferences detected by this rule may be introduced, apart from the developer oversight mentioned above, through merging two pieces of code that are valid individually but lead to an invalid artifact when they are combined in a merge. This issue of valid artifacts causing invalidity when merged is further explained in Section 4.2.

As with the previous rules detecting multiple assignments and setter calls, we employ the mental model of a watch list containing "suspicious" variables: variables that were declared but not initialized to something other than null get on that list. If a method is invoked on one of the watched variables, the rule has successfully detected an invalid artifact. The implementation of the Uninitialized Read Rule is listed at A.9.

This rule will not try to repair a potential null pointer exception. Generally, there are two options for a repair: either initialize the uninitialized variable or remove the method call on it. We think that this has to be decided on a case-by-case basis by a human, and therefore we did not automate that step5. This rule will just indicate that there exists a null dereference in the code.
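The watch-list idea behind this rule can be sketched as a small event-based checker. The declare/assign/invokeOn API is our illustration, not the implementation of Listing A.9, and it ignores, for brevity, the question of whether an intervening method call could initialize a watched variable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UninitializedReadCheck {
    private final Set<String> maybeNull = new HashSet<>();
    private final List<String> violations = new ArrayList<>();

    /** Declaration without initializer, or initialization to null. */
    void declare(String var) { maybeNull.add(var); }

    /** Assignment; only a non-null value removes the variable from watch. */
    void assign(String var, boolean valueIsNull) {
        if (valueIsNull) maybeNull.add(var); else maybeNull.remove(var);
    }

    /** Method invocation on a variable: a watched receiver is a violation. */
    void invokeOn(String var, String method) {
        if (maybeNull.contains(var)) violations.add(var + "." + method + "()");
    }

    List<String> violations() { return violations; }
}
```

Feeding the statements of Listing 5.4 as events flags the call in line three, because s is still on the watch list when charAt is invoked on it.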

5.6 Empirical Results on Rules Performance

The following subsections present the effectiveness, validity, and efficiency of the rules applied to ECCO's generated code. For further detail on the rules' performance, please consult the raw performance data in appendix B.

5.6.1 Effectiveness

In order to determine the effectiveness of our ECCO rules, we measured how many invalid choices the rules could eliminate. We chose a number of input products and configurations where ECCO was known to produce problematic code blocks, which constituted the initial reason code rules had to be applied to them.

As Figure 5.8 shows, from all those code blocks where multiple choices existed, a significant number of choices could be eliminated, especially from the code of Video On Demand, which we trace back to the fact that a lot of blocks within the GUI code could be removed by the Add Listener Equivalence Rule. For a more in-depth view of the measured effectiveness, we point the reader to the raw performance data in Section B.2.

5.6.2 Validity

The rules' validity was assessed by doing multiple manual reviews of the detected invalid artifacts to ensure the rules would not yield false positives. False negatives were precluded by code inspection of the products which ECCO synthesized. The reviews were conducted by Stefan Fischer [62], Lukas Linsbauer [124]6, and the author of this thesis. We reviewed the composed code and the repairs created by the rules thoroughly, and to the best of our knowledge there are neither false positives nor false negatives regarding the rules.

5Automatically initializing an object to something other than null is not a task easily automated: What if there are multiple constructors? Which value should be chosen for the constructor's individual parameters? When there is no public constructor, how do we find the appropriate factory method?
6Fischer and Linsbauer are the creators of and experts on ECCO as well as its code composition feature.

Figure 5.8: The average number of choices per code block (of those code blocks that have different choices), once with rules filtering out invalid choices and once with no rules active. The variants that were created for this measurement, including the detailed performance data, can be found in Listing B.10 for ArgoUML, Listing B.14 for Draw Product Line, and Listing B.18 for Video on Demand.

5.6.3 Efficiency

Efficiency was measured by contrasting test runs: one set of test runs had rule application turned on and the other had it turned off for performance comparison. The computer used for measuring the rules' efficiency was a Lenovo X220 Tablet with four Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz processors and eight gigabytes of random access memory. The operating system used at the time of measuring was Windows 7 Professional 64 bit. This hardware is in no way special and can be regarded as an off-the-shelf laptop.

Figure 5.9 shows the additional time the rules incur on the generation of new product variants with ECCO. Although the overhead is not negligible, it can be argued that the rules' execution time is tolerable considering that the creation of a new variant of the industrial-sized ArgoUML takes about a minute on an off-the-shelf laptop with the rules activated. With smaller code bases like Draw Product Line the added time is especially low, increasing the time merely by 5%, which corresponds to less than a tenth of a second in absolute terms. The detailed numbers from measuring the rules' efficiency can be found in appendix B.1.

Figure 5.9: Performance results from using ECCO to generate new product variants for Video on Demand, Draw Product Line, and ArgoUML, comparing the time in milliseconds of applying rules and creating products without the use of rules. The table also shows how many code statements the rules could remove. Each generated product has a different number of input products and implements a different number of features, which are listed in appendix B.1.

Chapter 6

ArchStudio Case Study

This chapter describes our case study for detecting invalid artifacts in models that were created using ArchStudio 3. The approach behind this case study is described in Section 6.4, whereas Section 6.5 explains the underlying motivation, the reason we created rules for ArchStudio models in the first place. To get a better understanding of the case study at hand, we provide a brief introduction to ArchStudio 3 in the following section.

6.1 ArchStudio 3 Platform Background

ArchStudio 3 is an open-source tool authored by Dr. Eric Dashofy [42] for creating and modifying software architecture models and software product lines. ArchStudio uses xADL 2 (highly-extensible architecture description language) [43, 45] as the notation for its models. As also discussed in the work by Dashofy et al. [46, Subsection 3.2], xADL 2 describes architectures using XML, contributing a set of XML schemas "that provide a basic framework for modeling product family architectures, its extensibility to allow future additions to (and modifications of) elements in the representation, and its associated tool support to automatically generate APIs for manipulating specific instances of product family architectures" [44, Section 1].

The possibility to define software product lines is supported by declaring model artifacts as being optional or mandatory. These artifacts possess a Boolean guard whose value determines whether the artifact will be present in the instantiated model [42, Subsubsection 4.1.4.2]. While xADL is not limited to modeling a single architectural approach [41, Section 2], in ArchStudio 3 it is used to express the Chiron-2 style (also known as C2) [173].

ArchStudio 3 is superseded by ArchStudio 4 and 5, which are both implemented in the form of plugins [42, Subsection 4.8.4]. We opted for the significantly older version 3 because of its built-in architectural diffing and merging capabilities, which the newer versions lack (see Subsubsection 6.1.1.2 for more on ArchStudio's diffing and merging capabilities). We explain later on in the same subsubsection that we need these features for our three-way merge scenarios (shown in appendix C) that are the motivation for the rules of this case study.

ArchStudio makes use of C2's artifacts, which are components, connectors, links, and interfaces—both in optional and mandatory variants—to represent its models. Components function as the artifacts of computation, whereas connectors enable communication among those components. Links and interfaces are means to model how these artifacts are connected. Figure 6.1 shows a model that contains and names each of the C2 artifacts as they are used in ArchStudio 3.

6.1.1 Functionality

For the remainder of this section, we outline features of ArchStudio 3 that are essential to our case study presented in this chapter.

6.1.1.1 Model Creation and Editing

ArchStudio allows the creation and modification of xADL 2 models using a graphical user interface. This GUI is implemented as a component within ArchStudio named Archipelago [42, Section 4.9] that lets users open existing or create new xADL 2 models. Using mouse features like drag-and-drop, artifacts of an ArchStudio model, such as components or links, can be edited and added. The model in Figure 6.1 is visualized using Archipelago.

In addition to this graphical editor that represents models using boxes and arrows, ArchStudio also offers the component ArchEdit [42, Subsection 4.2.5], a textual editor for xADL 2 that displays the model in its XML form and lets users manipulate it in a more low-level fashion.

6.1.1.2 Diffing and Merging

ArchStudio 3 also contains two basic components for diffing and merging models [42, Subsection 6.5.3]. They are contributions to ArchStudio's code base by Chen et al. [31], which also include support for product line architectures in order to facilitate propagating changes from one model to another model.

Figure 6.1: This model shows all the artifacts available in ArchStudio 3. The dashed line around an element signifies that it is optional rather than mandatory. Archipelago is ArchStudio’s graphical model editor that renders this model.

In hindsight, these existing components were not sufficient for our purposes, as they only allow creating a diff between two models, thus merely providing support for two-way diffing, not the three-way diffing required for our model merge scenarios shown in appendix C. Through modification and adaptation, we implemented a custom three-way diffing and merging feature for ArchStudio 3.
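As a hedged illustration of what a three-way merge does (this is neither the component by Chen et al. nor our actual ArchStudio implementation), consider merging the sets of model element identifiers of two revisions against their common ancestor:

```java
import java.util.HashSet;
import java.util.Set;

public class ThreeWayMerge {

    /** Minimal three-way merge over sets of model element ids: an element
     *  survives if both branches keep it, or if one branch added it; an
     *  element deleted in either branch relative to the base is dropped. */
    static Set<String> merge(Set<String> base, Set<String> left, Set<String> right) {
        Set<String> all = new HashSet<>(base);
        all.addAll(left);
        all.addAll(right);
        Set<String> result = new HashSet<>();
        for (String element : all) {
            boolean inBase = base.contains(element);
            boolean inLeft = left.contains(element);
            boolean inRight = right.contains(element);
            if (inLeft && inRight) {
                result.add(element);           // kept (or added) in both branches
            } else if (!inBase && (inLeft || inRight)) {
                result.add(element);           // newly added in exactly one branch
            }
            // in base but missing from a branch: deleted there, so dropped
        }
        return result;
    }
}
```

Real model merging must of course also reconcile attributes, links, and conflicting edits to the same element, which is where the invalid combinations discussed in Section 4.2 arise.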

6.1.1.3 Detection of Invalid Model Artifacts

ArchStudio 3 ships with an integrated rule checking engine that uses Schematron [175, 11] (which in turn employs XSLT [108, 34] and Xalan [117, Chapter 2]1) to apply rules to its models. An example of such a rule written with Schematron is listed at 6.1. The use of an XML processing and evaluation tool is natural considering that ArchStudio models themselves are persisted as xADL files that are in turn written in XML.

Yet, we encountered severe problems when trying to create more evolved rules, since support for loops and functions is missing from ArchStudio 3's rule checking engine. This prohibits the implementation of rules that need to take the correlation among components in the entire model into account, such as the rule described in Subsection 6.6.1. We could not overcome these limitations even after contacting Dr. Dashofy

1https://xml.apache.org/xalan-j/

who offered his help and expertise via the ArchStudio mailing list, which eventually led us to the decision to create a custom rule checking engine for ArchStudio 3, demonstrated in Section 6.6.

    id0= |*|
    iddesc0=Link |*|
    text=Link point missing anchor-on-interface |*|
    detail=Link must have an anchor-on-interface for every endpoint

Listing 6.1: Schematron rule ensuring that each link is connected at both ends. (Only the rule's message text is shown here; the surrounding Schematron XML markup did not survive text extraction.)

6.1.2 Architecture

ArchStudio 3 is a Java application [54, 182] that is itself modeled using a C2 architecture defined in xADL 2 [41]. Compiling ArchStudio 3 is performed via a bootstrap process which assembles and builds the individual components of its C2 model [42, Subsection 4.8.3]. Individual features, including the diffing mechanism and the graphical model editor of ArchStudio, are implemented as components defined in xADL 2.2

6.2 Architecture Representation and Parsing

After this background on the tool employed in our case study, we continue by describing how to read ArchStudio's models in order to find invalid artifacts in them. A challenge for both case studies, which is outlined in Section 3.1, is the need to process the specific format of artifacts and convert them into objects that can be analyzed by our rules. This is equally true for our parsing of ECCO's Java representation and—with respect to this case study—for

2Inspecting ArchStudio's source code makes this evident: http://www.isr.uci.edu/projects/software/archstudio.zip

creating a type-safe abstraction of the XML structure that ArchStudio uses to persist its models.

In order to have suitable objects that can be analyzed by our rules described in Section 6.6, we have to parse the ArchStudio models defined in XML. Listing 6.2 gives an impression of the form of ArchStudio models, showing a very simple model with only two components and one connection between them.

The code responsible for parsing and encapsulating the different artifacts of an ArchStudio model can be found in appendix A.2.1. The Java 8 classes we created possess the logic to parse data from the models stored as XML in order to create a type-safe representation of an ArchStudio model. They do so in combination with ArchUtil, listed at appendix A.10, which contains logic common to all ArchStudio artifacts. Having parsed and encapsulated the XML artifacts in Java in a type-safe manner, we are thereafter in the position to analyze the ArchStudio model and to apply our custom rules. These rules are described in Section 6.6.

    BtoA
    Component A
    (New Interface)
    in
    Component B
    (New Interface)
    out
    (New Link)

Listing 6.2: ArchStudio's representation of an elementary model containing two components and a single connection between them. The rendering hints used by the ArchStudio GUI were removed from this listing. (Only the element text is shown here; the surrounding xADL XML markup did not survive text extraction.)
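A minimal sketch of such parsing is shown below, assuming hypothetical local element names component and description; the real tag names are dictated by the xADL 2 schemas, and our actual classes in appendix A.2.1 use ArchStudio's own API instead of raw DOM. The record syntax is modern Java used here for brevity.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ArchModelParser {

    /** Minimal type-safe wrapper for a parsed model element. */
    record Component(String id, String description) {}

    /** Collects all component elements, in any namespace, from an
     *  xADL-like XML document. */
    static List<Component> parseComponents(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xml);
        List<Component> components = new ArrayList<>();
        // "component" and "description" are assumed local names.
        NodeList nodes = doc.getElementsByTagNameNS("*", "component");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            NodeList descriptions = element.getElementsByTagNameNS("*", "description");
            String text = descriptions.getLength() > 0
                    ? descriptions.item(0).getTextContent().trim() : "";
            components.add(new Component(element.getAttribute("id"), text));
        }
        return components;
    }
}
```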

6.3 Rule Engine for ArchStudio 3

We mentioned in Subsubsection 6.1.1.3 that the limitations of Schematron were quite severe, and after consulting Dr. Dashofy via the ArchStudio mailing list, we concluded that it was advisable to create a custom rule checking engine for ArchStudio using a more powerful rule language. The considerations and rationale for choosing this language are described in the following subsection.

6.3.1 Rule Language

This subsection outlines which languages we considered for the rules of this case study and what led us to the decision to select Scala for this task. The following deliberations are written assuming that the reader is somewhat familiar with languages running on the Java Virtual Machine and the paradigms of object-oriented as well as functional programming.

Given the limited capabilities and expressiveness we encountered when trying to devise new rules using ArchStudio's Schematron-based rule engine, we decided to look for a suitable language in which we could express our rules for ArchStudio's models. Since we implemented the parsing of models in Java in order to use ArchStudio's API for reading, creating, and editing the models, it stood to reason that we choose a JVM language because the entire model was available as a compound Java object. Our requirements for this language are as follows:

• Easy integration with Java

• High conciseness and readability to increase the rules’ maintainability

• Since we have created type-safe representations of the ArchStudio model (as described in Section 6.2), the language should support static type checking in order for us to profit from type safety at compile time

Although Jython3 and Groovy [111]4 seemed like good options, as we have worked with both languages (Jython via Python), we opted for a statically typed language because it appeared—and indeed turned out to be—beneficial to leverage the types we designed with the Java classes to encapsulate ArchStudio's model artifacts (as listed in appendix A.2.1).

The rules for ArchStudio should be both readable and concise. Clojure [82] cannot satisfy this admittedly subjective criterion for us because we lack the practice of reading and writing Lisps. The aptitude of employing Clojure as a domain specific language is undisputed [109]5, and the pass on

3http://www.jython.org/
4http://www.groovy-lang.org/
5Currently maintained DSLs built with Clojure include

Clojure speaks more to the inexperience of the author than against Clojure's flexibility and power as a language for our rules.

With regard to choosing Java as the language for creating the rules, which may appear obvious since we modeled all of ArchStudio's artifacts with Java objects, we judged Scala to be strictly more expressive than Java and thus more suitable for writing and reading rules. Additionally, Scala is purely object-oriented6 and stricter regarding types than Java7. Moreover, it offers syntax for expressing functional constructs like higher-order functions more tersely than Java 8. Higher-order functions play a critical role in the formulation of our rules: we ensure rule compliance by applying predicates to ArchStudio artifacts, thus making use of this language feature extensively. Although Java offers syntax for higher-order functions since version 8 as well, Scala accomplishes the same arguably more concisely8, which makes for more legible rules; a prime goal when designing them, which we also stated in Section 3.1.

Listings 6.3 and 6.4 show a small comparison between the syntax Scala and Java 8 provide for applying predicates to lists using higher-order functions, a functional construct that is used in every rule we created for checking ArchStudio models.

• https://github.com/technomancy/leiningen • https://github.com/r0man/sqlingvo • https://github.com/yieldbot/marceline • http://clojurequartz.info/ • https://github.com/seancorfield/jsql

6Scala does not have primitives.
7Scala discourages using null in general, and particularly when returning null from a method, by favoring the type-safe Option [148, Section 15.6] instead. See Fowler's "Patterns of Enterprise Application Architecture" [66, Chapter 18, Section 'Special Case'] for a short discussion on how null poses a problem for type safety and polymorphism, as well as Martin's "Clean Code" [130, Chapter 7, 'Don't return null'] regarding the issue of returning null. Furthermore, Scala ensures type safety when using covariant collections, for example, by disallowing covariance of mutable collections. Conversely, Java does allow this in the form of the covariant type of Array, which breaks type safety [53, Chapter 'Generics', Section 'Wildcards'] [17, Chapter 5, Item 25].
8Apart from not having to call stream() each time we want to use map and filter operations on collections, in Scala we can forgo the pleonastic style of naming the type of an object multiple times such as in List<Integer> numbers = new ArrayList<>();. Not to mention the pre-Java 7 way of List<Integer> numbers = new ArrayList<Integer>(); that repeats both the type constructor List and the type parameter Integer. Scala has stronger type inference and can hence offer val numbers = List() for this.

    List<String> names = Arrays.asList("Java", "Groovy", "Scala", "Jython");
    Predicate<String> myPredicate = str -> str.startsWith("J") && str.endsWith("n");
    List<String> filteredList = names.stream().filter(myPredicate).map(String::toUpperCase).collect(toList());

Listing 6.3: Applying a predicate to a list in Java. Note that java.util.stream.Collectors.toList is imported statically.

    val names = List("Java", "Groovy", "Scala", "Jython")
    val myPredicate = (str:String) => str.startsWith("J") && str.endsWith("n")
    val filteredList = names.filter(myPredicate).map(_.toUpperCase)

Listing 6.4: Applying a predicate to a list in Scala.

Scala's propensity for minimizing boilerplate code9 is to our advantage when reading and writing rules for ArchStudio. While Scala code is of comparatively small size, as hinted at in the previous short examples, this alone may not justify the integration gap between the ArchStudio artifacts modeled in Java and the rule logic implemented in Scala. Yet, using Scala's implicit and explicit conversions [148, Section 24.18] of Java collections (which are used heavily for the representation of the ArchStudio models and whose conversions can be seen in appendix A.2.2), we were able to achieve all of this integration from Scala to Java and vice versa.

6.4 Case Study Approach

Analogous to our previous case study presented in Chapter 5, we use a specialization of our generic approach for finding invalid model artifacts. Our approach for the ArchStudio case study is depicted in Figure 6.2.

The Choice Generator from our generic approach is the three-way merger that we created for ArchStudio. This merger takes ArchStudio models as input (they are the Artifacts in the generic approach). Specifically, the merger merges those models which are part of the merge scenarios that form the

9Scala provides syntactic sugar for creating hashCode(), equals(), toString() as well as getters and setters for fields, i.e., boilerplate that is very common in Java classes. For example, the following line provides all of these methods in Scala: case class myClass(myField:Int).

motivation for the rules of this case study (see Section 6.5 for more on this case study's motivation). Just like the abstract Choice Generator, our merge mechanism creates choices for the Software Engineer to choose from. This approach is structurally quite similar to the one outlined in Section 5.3. There is a difference, though, with regard to the rules that aim to help the engineers in their decision: although we remove an available choice if it violates a rule, for this case study the automated repairing of an invalid ArchStudio model was not implemented due to time constraints and is left as future work.

6.4.1 Related Work

In this subsection we highlight a selection of papers that are related to our approach of detecting invalid model artifacts using rules. We show other rule-based approaches in order to contrast them with our approach.

In the area of analyzing software architectures that are modeled using XML, such as ArchStudio's models, Nentwich et al. [143] have created a rule-based approach using xlinkit, focusing on detecting invalid artifacts in web content. Their rules are defined in first-order logic with the goal to "[return] hyperlinks between inconsistent elements instead of boolean values" [143, Chapter 1]. To exemplify xlinkit's rule language, which uses XPath for selecting XML elements [20], consider a rule that answers the following question in the context of a web shop: "Are all the product names in the advertisement the same as in the catalog?". This query can be formulated in xlinkit with the rule shown at 6.1 using universal and existential quantifiers10. To decrease the time it takes to assess whether a change in the model has introduced an invalid artifact, xlinkit supports the concept of incremental checking [144, Chapter 7].

∀a ∈ “/Advert”(∃p ∈ “/*/Product”(“$a/ProductName” = “$p/Name”)) (6.1) Comparing the rule shown at 6.1 with our own rules of Section 6.6 we can see a conceptual similarity when we consider that they both make use of exis- tential quantifiers to assure certain predicates hold. Here is a relevant extract of our Circular Dependency rule from Subsection 6.6.1 for comparison:

10A rule less intuitive than the question but phrased closer to the syntax of the rule shown at 6.1 is: "For all Advert elements, there exists a Product element in the Catalog element where the ProductName subelement of the former equals the Name subelement of the latter". The question, the rule, and the rules are from [143, Chapter 2].


Figure 6.2: Our approach for the ArchStudio case study: our custom ArchStudio three-way merger merges ArchStudio models. This merging generates choices that a software engineer has to choose from. We facilitate the engineer's choice by removing or repairing the generated choices using the rules we have devised for this case study. Note that the dashed line from Rules to Choices signifies that the current version of our ArchStudio rules only eliminates choices; they do not try to repair existing ones like the rules for ECCO do (compare Section 5.5). Automatic repairs of models are future work. This approach is a specialization of the generic approach for our thesis shown in Figure 3.1.

model.getComponents.exists(dependsOnItself) (6.2)

Granting that the notation is different—our rule uses Scala's exists whereas xlinkit expresses the same using ∃—we can nevertheless see that the principle of using predicates11 with existential quantifiers is the same as for the rule of Nentwich et al. [143].

The critics12 in the architectural design tool Argo by Robbins et al. [160] ascertain that a model (or a part of it) specified in UML [159, Section 1] or C2 notation13 satisfies predefined properties, which can stem from, for example, rules imposed by the modeling language or guidelines and patterns for object-oriented design [159, Subsection 3.1][74]. Critics are implemented as Java predicates [162, Section 6] and are regarded as being pessimistic because they consider an unspecified design attribute as a reason to issue a warning [161, Subsection 6.2]. Where previous efforts on architecture checking have only evaluated the model after a design decision has been made, analyzing the model using critics is possible "while architects are considering individual design decisions and modifying the architecture" [160, Chapter 'Introduction']. Thus the critics provide assistance in the evolution of the architectural design.

As our approach to detecting invalid model choices is rule-based, we can draw from the work of Liu, Easterbrook, and Mylopoulos, whose "goal is to develop a software design environment that automates the detection and resolution of design inconsistencies in design models" [127]. Their method of analyzing UML models using custom production rules [19, Chapter 7]14 involves automatically fixing the invalid artifact.

Medvidovic et al. [133] tackled the problem of connecting the different views on a software system, involving the analysis of C2 models, which is the type of model ArchStudio produces. Their work was significantly extended in "Using Object-Oriented Typing to Support Architectural Design in the C2 Style" [134] by applying type theory.

11In our rule's case the predicate is dependsOnItself that determines if a component within the model depends on itself. 12“Critics are active agents that support decision making by continuously and pessimistically analyzing a partially specified design. Each critic checks for the presence of a certain condition in the design. Critics are embedded in a design environment where they have access to the architecture as it is being modified.” [161, Chapter 3] 13The fact that Argo creates and analyzes architectures in C2 notation is stated here: http://isr.uci.edu/architecture/prior-software.html 14Figure 6.3 illustrates the syntax of such a production rule by explaining one of the rules from [127].

Figure 6.3: This production rule from Liu et al. [127, Chapter 3] states that a debit to a bank account must be lower than the balance of the account.

6.5 Case Study Motivation

The rules for the ArchStudio case study were primarily motivated by the work done in [39]. To give a simplified summary of the process relevant to our work, this is the abstract rule scenario described in the paper for which we designed our ArchStudio rules:

1. Rules are applied to a merged software model with the goal of finding invalid artifacts.

2. Detected invalid artifacts are repaired automatically. These repairs are associated with the rules.

3. Although the repairs fix the original invalid artifact, they introduce new invalid artifacts to the model.

4. More rules are applied, which eventually bring the model into a state with a minimal number of invalid artifacts.

An algorithm in pseudocode for this procedure of applying repairs is presented by Dam et al. [39, Section 7]. Note that steps three and four can repeat multiple times until a final fix is found for the model. We designed merge scenarios and specific rules that lead to these repair iterations. The scenarios we created are concrete examples of the abstract procedure outlined above and are listed in appendix C. They demonstrate how individually valid changes cause invalid models when combined; this applies to model as well as source code merges, as described in Section 4.2 as an overarching principle. The presented scenarios form the motivation and the raison d'être for the model rules described in the subsequent section.
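The iterative procedure above can be sketched in Scala. This is a minimal, hypothetical model of the detect–repair cycle, not code from ECCO, ArchStudio, or the implementation of Dam et al.; Model, Rule, Violation, and the two toy rules are illustrative stand-ins we introduce here:

```scala
// Hypothetical sketch of the repair cycle; all names are illustrative.
case class Model(artifacts: Set[String])
case class Violation(artifact: String)

trait Rule {
  def violations(m: Model): Seq[Violation]
  def repair(m: Model, v: Violation): Model
}

// Repeatedly apply the first rule that reports a violation and let it repair
// the model, until no rule fires anymore (or a step budget runs out).
@annotation.tailrec
def stabilize(model: Model, rules: Seq[Rule], budget: Int = 100): Model = {
  val hit = rules.iterator
    .map(r => (r, r.violations(model)))
    .collectFirst { case (r, vs) if vs.nonEmpty => (r, vs.head) }
  hit match {
    case Some((rule, v)) if budget > 0 => stabilize(rule.repair(model, v), rules, budget - 1)
    case _                             => model
  }
}

// Toy rule whose repair removes the offending artifact but introduces a new
// one -- mirroring step 3 of the scenario above.
object RemoveBad extends Rule {
  def violations(m: Model) = if (m.artifacts("bad")) Seq(Violation("bad")) else Seq.empty
  def repair(m: Model, v: Violation) = Model(m.artifacts - "bad" + "patch")
}

// A second rule cleans up the artifact introduced by the first repair (step 4).
object RemovePatch extends Rule {
  def violations(m: Model) = if (m.artifacts("patch")) Seq(Violation("patch")) else Seq.empty
  def repair(m: Model, v: Violation) = Model(m.artifacts - "patch")
}

val repaired = stabilize(Model(Set("core", "bad")), Seq(RemoveBad, RemovePatch))
```

In this toy run the model stabilizes after two repair steps. In the real setting, termination has to be argued for each rule set, which is why the sketch carries a step budget.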

6.6 Rules

Analogous to our case study for source code rules (see Chapter 5), we present in this chapter the rules we created for ArchStudio software models, their rationale, and their correlation to the overarching principles of detecting invalid choices in code and models as described in Chapter 4.

6.6.1 Circular Dependency

Circular dependencies in modules of software systems are a sign of bad design: they reduce overall modularity, hamper refactoring, impede testability [102, 116, Chapter 4], and run contrary to the tenet of dividing "the system into independently callable subprograms" [152, Subsection 4.D]. Figure 6.4 shows a basic ArchStudio model containing a circular dependency. In order to track down circular dependencies in an ArchStudio model, we employ the following method:

• We start with the first component of a model and visit all its dependencies, the dependencies' dependencies and so forth (i.e., we visit those components on which the first component transitively depends).

• If we encounter the original component again among the dependencies, the one we started with, we know that there is a circular dependency in the model and notify the system that we have detected a rule violation.

• Otherwise, after we have visited all the component’s dependencies, we can continue analyzing the next component of the model and visit all its transitive dependencies.

• This way, we either find a circular dependency while consecutively examining the model's components or, after we have checked the last component, conclude that the model under review does not contain any circular dependencies.
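The traversal described by the steps above can be sketched as a tail-recursive function. This is a minimal, self-contained sketch assuming components are identified by name and dependencies are given as an adjacency map; the actual rule operates on ArchStudio's ArchModel and ArchComponent types (see appendix A.18):

```scala
// Minimal stand-in: component names mapped to the names they depend on.
type Deps = Map[String, Set[String]]

def dependsOnItself(start: String, deps: Deps): Boolean = {
  // Tail-recursive visit of everything `start` transitively depends on.
  @annotation.tailrec
  def visit(frontier: List[String], seen: Set[String]): Boolean =
    frontier match {
      case Nil => false // all transitive dependencies visited, no cycle via `start`
      case c :: rest =>
        val next = deps.getOrElse(c, Set.empty)
        if (next.contains(start)) true // reached the start again: circular dependency
        else visit((next -- seen).toList ++ rest, seen + c)
    }
  visit(deps.getOrElse(start, Set.empty).toList, Set(start))
}

// Check every component in turn, as described in the last bullet point.
def hasCircularDependency(deps: Deps): Boolean =
  deps.keys.exists(dependsOnItself(_, deps))
```

The explicit `seen` set guards against revisiting components, so the check also terminates on models whose cycles do not pass through the start component.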

The components forming a circular dependency within a model are tantamount to an invalid artifact combination. Therefore, this rule belongs to the category described in Subsection 4.1.1, which encompasses both model and source code rules.

Regarding the rule's implementation, we followed the principle outlined in Subsection 6.3.1 of defining a predicate—in this case dependsOnItself—that is applied to each component of the model. If one of the components satisfies this predicate, i.e., it does depend on itself, we have detected a circular dependency.

Thus, the method for checking an ArchStudio model for circular dependencies and returning an ArchRuleResult (see Listing A.17 for its implementation) is as follows:

Figure 6.4: ArchStudio model whose components are dependent on each other in a cyclic fashion.

/** @return whether there is a circular dependency in the model */
def check(model: ArchModel) =
  // A component indirectly depending on itself constitutes a circular dependency in the model
  if (model.getComponents.exists(dependsOnItself))
    ArchRuleResult("Circular dependency detected", TestFailed)
  else
    ArchRuleResult("No circular dependencies", TestPassed)

The entire code for the rule can be found in appendix A.18. Since we implemented dependsOnItself as a tail-recursive function [85], the rule can analyze models containing components with extensive dependency networks. Note that the application of the predicate to the model's components, i.e., model.getComponents.exists(dependsOnItself), which is the core of this rule, also demonstrates the interoperability between Java and Scala code mentioned in Subsection 6.3.1: model.getComponents returns a Java list that has no notion of a method called exists. Yet the Scala compiler implicitly converts this Java list to a Scala buffer that provides the method exists [148, Section 24.18][154, Chapter 11]15. These implicit conversions occur in all of the following rules.

15The Scala compiler only performs implicit conversions from Java to Scala collections and vice versa if the respective Scala package is imported using import scala.collection.JavaConversions.

6.6.2 Connector Has Incoming Interface

As outlined in Section 6.1, connectors in the C2 model employed by ArchStudio serve the purpose of receiving and relaying data from one component to another. We argue that a connector which does not meet this requirement has no use within the model and is consequently a dispensable artifact in the sense of the common rule category formulated in Subsection 4.1.2. Figure 6.5 exemplifies a model that has a connector void of any incoming interfaces.

The predicate central to this model rule is hasIncomingIface, which is applied to each connector inside the model. As with the previous rule, an implicit conversion of the Java collection returned by the method ArchConnector#getInterfaces to a Scala collection allows us to use the existential operator exists on the ArchStudio interfaces. This way, we can check whether there is at least one incoming interface on the connector using the predicate shown in Listing 6.5:

val hasIncomingIface = (_:ArchConnector).getInterfaces.exists(_.getDirection == IN)

Listing 6.5: Predicate accepting an ArchConnector and returning a Boolean. It tells us if a connector has an incoming interface.

Having defined the predicate above, we can then write another simple function, shown in Listing 6.6, to get all the connectors that do not satisfy the predicate, i.e., those connectors having no incoming interface.

val connectorsWithNoIncomingIfaces = (_:ArchModel).getConnectors.filterNot(hasIncomingIface)

Listing 6.6: Function that finds all the connectors of an ArchModel that have no incoming interface.

This code is the essence of the rule at hand. The rest of it contains the relevant imports and the creation of the ArchRuleResult; this can be inspected in appendix A.19.

Figure 6.5: One of the connectors in this model cannot receive data at all, so we deem that connector dispensable.

6.6.3 Connector Has Outgoing Interface

The same convention for connectors having to relay data stated in Subsection 6.6.2 also pertains to outgoing interfaces. A connector's task is to connect components; it should not be a dead end for information. Figure 6.6 shows an array of components and connectors where one of the connectors violates the rule on outgoing interfaces. The implementation of this rule is—apart from the interface's direction—identical to the rule ensuring connectors have incoming interfaces and is listed at A.20.
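Since the thesis states that only the direction changes, the predicate presumably mirrors Listing 6.5 with OUT in place of IN. The following is a self-contained sketch using hypothetical stand-in types (Direction, Iface, Connector are our illustrative substitutes for ArchStudio's ArchConnector API, which we do not reproduce here); the authoritative rule is in appendix A.20:

```scala
// Hypothetical stand-ins for ArchStudio's connector API.
sealed trait Direction
case object IN extends Direction
case object OUT extends Direction

case class Iface(getDirection: Direction)
case class Connector(getInterfaces: List[Iface])

// Same shape as Listing 6.5, with OUT instead of IN.
val hasOutgoingIface = (c: Connector) => c.getInterfaces.exists(_.getDirection == OUT)

// Connectors that are "dead ends" for information.
val connectorsWithNoOutgoingIfaces =
  (connectors: List[Connector]) => connectors.filterNot(hasOutgoingIface)

// A dead-end connector only receives; a relay both receives and forwards.
val deadEnd = Connector(List(Iface(IN)))
val relay   = Connector(List(Iface(IN), Iface(OUT)))
```

Applied to the two sample connectors, the function singles out the dead end, just as the rule flags the last connector in Figure 6.6.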

6.6.4 Mandatory Component Has Mandatory Interface

While the previous rules had conventional software models as their target, this rule and the following ones are specific to software product lines (SPLs) created with ArchStudio. This rule prohibits the creation of undesirable variants the SPL can produce, i.e., choices as defined in Section 1.1.

This rule follows the approach mentioned in Section 2.5 and tries to find invalid artifacts by analyzing the SPL with regard to the models that it could possibly instantiate. This stands in contrast to the approach of instantiating all models and analyzing each model subsequently. The Scala code implementing this rule is depicted in appendix A.21, where we can see that the relevant predicate is hasMandatoryIface, which is shown in Listing 6.7.

val hasMandatoryIface = (_:ArchComponent).getInterfaces.exists(_.isMandatory)

Listing 6.7: Predicate that determines whether an ArchComponent has at least one mandatory interface.

Figure 6.6: The last connector of this model does not forward the received data, which makes it a dead end and therefore dispensable.

Using this predicate we can now retrieve all mandatory components that do not possess a mandatory interface and hence are at risk of being unconnected when the product line is instantiated. Querying the model via the predicate is illustrated in Listing 6.8.

val mandComps = model.getComponents.filter(_.isMandatory)
val mandCompsWithoutMandIface = mandComps.filterNot(hasMandatoryIface)

Listing 6.8: Get all the components that are mandatory and lack a mandatory interface.

A model with a component that does not comply with this rule is shown in Figure 6.7. Because Mandatory Component Has Mandatory Interface ensures that a model does not contain unconnected and thus useless components, it is a member of the rule category for detecting dispensable artifacts. See Subsection 4.1.2 for more information on that category.

6.6.5 Model Has Mandatory Components

To avoid instantiating models from software product lines that are empty, i.e., contain not a single component, we devised this rule. The rule Model Has Mandatory Components detects when a model consists entirely of optional components and reports a rule violation to make sure that there is no possibility of a blank model being instantiated. Considering that a model devoid of any components is arguably useless, this rule belongs to the category of rules prohibiting dispensable artifacts outlined in Subsection 4.1.2.

As an example, Figure 6.8 shows an ArchStudio model not conforming to this rule since all its components and connectors are optional.

Regarding implementation, this rule is the simplest among the model rules. Scala's support for higher-order functions and the implicit conversion of Java collections to Scala collections enable this rule's brevity. Its complete check method is shown in Listing 6.9. See appendix A.22 for the entire rule.

/** @return whether there exists a mandatory component within the model */
def check(model: ArchModel) =
  if (model.getComponents.exists(_.isMandatory))
    ArchRuleResult("Model has a mandatory component", TestPassed)
  else
    ArchRuleResult("Model doesn't have any mandatory components", TestFailed)

Listing 6.9: The check method of the rule Model Has Mandatory Components.

Figure 6.7: The optional interface on the mandatory component Comp A is all that connects it to the rest of the model. The dashed lines around interfaces and links signify that they are optional.

Figure 6.8: This ArchStudio model has not a single mandatory component and is thus in violation of the rule Model Has Mandatory Components.

6.6.6 No Mandatory Link On Optional Interface

In software product lines, care has to be taken when it comes to combining optional and mandatory artifacts. In a scenario like the one pictured in Figure 6.9, it is possible that a model gets instantiated that does not include the outgoing interface on component Comp B. In this case, the link leading to component Comp B has a loose end and does not connect anything. Such a combination of artifacts is undesirable. Accordingly, our rule No Mandatory Link On Optional Interface belongs to the kind of rules detecting invalid artifact combinations. This rule category, spanning both model and source code rules, is portrayed in Subsection 4.1.1.

The predicate for this rule is a bit more sophisticated than those of the previous model rules for ArchStudio. In the implementation of the rule displayed in Listing A.23, it is called hasMandatoryLinkOnOptionalIface and uses Scala's for comprehension as a way to iterate through and filter a component's interfaces and links. For comparison, we developed an alternative implementation of the predicate using map and filter operations directly16. It is up to the reader to decide which version of the predicate is more readable by comparing the two implementations in Listing 6.10 and Listing 6.11.

/** @return whether there is a mandatory link on one of this component's optional interfaces */
def hasMandatoryLinkOnOptionalIface(comp: ArchComponent) = {
  val mandLinksOnOptionalIfaces = for {
    optIface <- comp.getInterfaces if optIface.isOptional
    if optIface.getLink.isPresent
    link = optIface.getLink.get
    if link.isMandatory
  } yield link

  mandLinksOnOptionalIfaces.nonEmpty
}

Listing 6.10: Predicate finding components having mandatory links on one of their optional interfaces. Uses Scala's for comprehension.

16We say directly because the Scala compiler translates for comprehensions to calls of map, filter, and withFilter [148, Chapter 23].

/** @return whether there is a mandatory link on one of this component's optional interfaces */
def hasMandatoryLinkOnOptionalIfaceAlt(comp: ArchComponent) = {
  val optIfaces = comp.getInterfaces.filter(_.isOptional)
  val links = optIfaces.map(_.getLink).filter(_.isPresent)
  val mandLinksOnOptionalIfaces = links.filter(_.get().isMandatory)

  mandLinksOnOptionalIfaces.nonEmpty
}

Listing 6.11: Predicate finding components having mandatory links on one of their optional interfaces. Alternative implementation using calls to map and filter.
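To see that both formulations agree, the following self-contained sketch rebuilds them over hypothetical stand-in types (Link, Iface, Comp are our illustrative substitutes for ArchStudio's types, and a Scala Option replaces the Java Optional used in the real code). Both predicates return the same verdict on the same component:

```scala
// Hypothetical stand-ins for ArchStudio's types.
case class Link(isMandatory: Boolean)
case class Iface(isOptional: Boolean, getLink: Option[Link])
case class Comp(getInterfaces: List[Iface])

// Variant 1: for comprehension (the shape of Listing 6.10).
def viaForComprehension(comp: Comp): Boolean = {
  val hits = for {
    iface <- comp.getInterfaces if iface.isOptional
    link  <- iface.getLink if link.isMandatory
  } yield link
  hits.nonEmpty
}

// Variant 2: explicit filter/flatMap/exists pipeline (the shape of Listing 6.11).
def viaMapFilter(comp: Comp): Boolean =
  comp.getInterfaces
    .filter(_.isOptional)
    .flatMap(_.getLink)
    .exists(_.isMandatory)

// A component with a mandatory link on an optional interface (violating),
// and one whose mandatory link sits on a mandatory interface (conforming).
val violating  = Comp(List(Iface(isOptional = true,  getLink = Some(Link(isMandatory = true)))))
val conforming = Comp(List(Iface(isOptional = true,  getLink = Some(Link(isMandatory = false))),
                           Iface(isOptional = false, getLink = Some(Link(isMandatory = true)))))
```

Since for comprehensions desugar to map, flatMap, and withFilter, the two variants are two spellings of the same pipeline, which is exactly why their readability can be compared on equal footing.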

6.7 Empirical Results

Evaluation of the rules was performed on the models of the merge scenarios (see appendix C) since they constitute the motivation for the individual ArchStudio rules. Whereas these merge scenarios were conceptually abstract and not bound to any modeling tool, we created the concrete models in ArchStudio and applied the rules described in Section 6.6 to them. In order to do so, we integrated a GUI for selecting rules and starting a test run with them to analyze an ArchStudio model. Naturally, we also included the rules' results in our GUI. An example of the user interface displaying the results after applying rules to a model can be seen in Figure 6.10. This also demonstrates the integration of Scala code inside Java components such as our user interface written in Java Swing. The Scala code that is executed via our Java GUI is the analysis of the models using our Scala rules.

Figure 6.9: If the optional interface on Comp B is not present in the instantiated model, the link to Comp C will have no origin and connect nothing.

The rules' effectiveness was measured by the results they yielded, i.e., whether they are capable of finding the invalid artifacts they were designed to detect. We examined the rules' results and the models to which they were applied together with Dr. Hoa Dam, a co-author of the paper [39] that forms the foundation of the merge scenarios shown in appendix C, which in turn are the motivation for the model rules we have developed. The evaluation's outcome was that our rules can reliably detect the invalid artifacts in the models we have created. After examining the model rules and the output they produced, we concluded that there were no cases of either false positives or false negatives. Consequently, we can say that our rules fulfill their purpose entirely.

With regard to efficiency, we did not measure the execution time of the rules as we did in our other case study (reported in Subsection 5.6.3). There are two reasons for this: Firstly, there were no large-scale C2 models at our disposal that could embody the merge scenarios and the models therein, which would have tested our rules on an industrial-sized level. Secondly, and related to the first reason, our rule checking feature delivered and presented its results without perceptible delay when analyzing the models from appendix C. Since this feature we have developed and integrated into ArchStudio is user-facing, we regard this perceived performance as significant enough. We note that the hardware our rules were executed on is the same off-the-shelf computer described in Subsection 5.6.3.
When we examine the rules' design as described in Section 6.6, we can argue that they are quite efficient in another sense: comparing the essential complexity [21, Chapter 17][131, Section 6] of the conceptual rules with the accidental complexity added in their actual implementation, it becomes apparent that this additionally incurred complexity is quite slim due to the expressiveness and terseness of the rule language (cf. Subsection 6.3.1).

Figure 6.10: The model rules written in Scala were integrated into ArchStudio's Swing GUI. The user interface shows that the model under test did not comply with every selected rule: the component Comp A causes the model to be in violation of the two rules described in Subsection 6.6.4 and Subsection 6.6.6.

Chapter 7

Conclusion

In this thesis, we have shown how rules applied to software artifacts can eliminate and improve choices that are the result of a software merge. We tested our approach using two case studies, one for source code (Chapter 5) and another for software models (Chapter 6). The results presented in Section 5.5 and in Section 6.7 show that our rules can detect invalid artifacts reliably and reduce the number of choices a software engineer has to deal with.

One of the theoretical insights gained from assessing the case studies is that rules applied to models and source code differ fundamentally. This is accounted for by the different abstraction levels (in the sense of Krueger [114, Section 1] or Atkinson and Kuehne [4]) to which the analyzed elements of code and architecture pertain: when evaluating source code, we analyze statements, variables, methods, and their order relative to each other. Contrastingly, when examining software models with our rules, we inspect components, interfaces, and their interrelations. This semantic gap makes for distinct rules.

On the other hand, we have observed and demonstrated that although they operate on dissimilar software abstractions, model and code rules share common categories (Section 4.1), and merging poses similar difficulties for source code as well as software models, which the particular rules have to address (Section 4.2). Furthermore, on a more technical note, we have shown that it is feasible to express rules for both abstraction levels using languages from the same platform, in our case the JVM.

7.1 Threats to Validity

Considering that we built our rules and tested our approach in the context of two tools (i.e., ArchStudio and ECCO), we acknowledge that the results presented in this thesis reflect the insights gained from working with these tools and the notations they use, C2 models and Java code. To test our approach on a broader basis, we need to conduct more case studies employing other tools or different architecture styles and programming languages, respectively. Also, the portrayed rules were created to fix specific issues in the merged source code and models, which we described in the relevant case studies. Further research is needed on code and model rules in additional contexts to draw more general conclusions.

Bibliography

[1] Mathieu Acher, Philippe Collet, Philippe Lahire, and Robert B. France. Slicing feature models. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, ASE ’11, pages 424–427, Washington, DC, USA, 2011. IEEE Computer Society.

[2] Marcus Alanen and Ivan Porres. Difference and Union of Models. In Perdita Stevens, Jon Whittle, and Grady Booch, editors, UML, volume 2863 of Lecture Notes in Computer Science, pages 2–17, Berlin, Heidelberg, 2003. Springer-Verlag.

[3] Marc Andreessen. Why Software Is Eating The World. Wall Street Journal (Online), August 2011.

[4] Colin Atkinson and Thomas Kühne. Model-Driven Development: A Metamodeling Foundation. IEEE Software Magazine, 20(5):36–41, September 2003.

[5] Nathaniel Ayewah, David Hovemeyer, J. David Morgenthaler, John Penix, and William Pugh. Experiences Using Static Analysis to Find Bugs. Software, IEEE, 25(5):22–29, 2008.

[6] Nathaniel Ayewah and William Pugh. The Google FindBugs Fixit. In Proceedings of the 19th International Symposium on Software Testing and Analysis, pages 241–252. ACM, 2010.

[7] Brenda S. Baker. On Finding Duplication and Near-duplication in Large Software Systems. In Proceedings of the Second Working Conference on Reverse Engineering, WCRE ’95, pages 86–, Washington, DC, USA, 1995. IEEE Computer Society.

[8] Victor R. Basili, Lionel C. Briand, and Walcélio L. Melo. A Validation of Object-Oriented Design Metrics As Quality Indicators. IEEE Transactions on Software Engineering, 22(10):751–761, October 1996.


[9] Ira D. Baxter, Andrew Yahin, Leonardo Moura, Marcelo Sant’Anna, and Lorraine Bier. Clone Detection Using Abstract Syntax Trees. In Proceedings of the International Conference on Software Maintenance, ICSM ’98, pages 368–, Washington, DC, USA, 1998. IEEE Computer Society.

[10] Boumediene Belkhouche and Cuauhtémoc Lemus Olalde. Multiple View Analysis of Designs. In Joint Proceedings of the Second International Software Architecture Workshop (ISAW-2) and International Workshop on Multiple Perspectives in Software Development (Viewpoints ’96) on SIGSOFT ’96 Workshops, ISAW ’96, pages 159–161, New York, NY, USA, 1996. ACM.

[11] Soběslav Benda, Jakub Klímek, and Martin Nečaský. Using Schematron As Schema Language in Conceptual Modeling for XML. In Proceedings of the Ninth Asia-Pacific Conference on Conceptual Modelling - Volume 143, APCCM ’13, pages 31–40, Darlinghurst, Australia, Australia, 2013. Australian Computer Society, Inc.

[12] Brian Berliner. CVS II: Parallelizing Software Development. In Proceedings of the Winter 1990 USENIX Conference, pages 341–352. USENIX Association, 1990.

[13] Valdis Berzins. Software Merge: Semantics of Combining Changes to Programs. ACM Transactions on Programming Languages and Systems (TOPLAS), 16(6):1875–1903, 1994.

[14] Philippe Besnard and Anthony Hunter. Quasi-classical Logic: Nontrivializable Classical Reasoning from Inconsistent Information. In Christine Froidevaux and Jürg Kohlas, editors, ECSQARU, volume 946 of Lecture Notes in Computer Science, pages 44–51, Berlin, Heidelberg, 1995. Springer-Verlag.

[15] Ramesh Bharadwaj and Constance L. Heitmeyer. Model Checking Complete Requirements Specifications Using Abstraction. Automated Software Engineering, 6(1):37–68, 1999.

[16] Xavier Blanc, Isabelle Mounier, Alix Mougenot, and Tom Mens. Detecting Model Inconsistency Through Operation-based Model Construction. In Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, pages 511–520, New York, NY, USA, 2008. ACM.

[17] Joshua Bloch. Effective Java. Prentice Hall PTR, Upper Saddle River, NJ, USA, second edition, 2008.

[18] Yves Bontemps, Patrick Heymans, Pierre-Yves Schobbens, and Jean-Christophe Trigaux. Generic Semantics of Feature Diagrams Variants. In Feature Interactions and Software Systems ’05, pages 58–77, 2005.

[19] Ronald Brachman and Hector Levesque. Knowledge Representation and Reasoning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.

[20] Tim Bray, Jean Paoli, C. Michael Sperberg-McQueen, Eve Maler, and François Yergeau. Extensible Markup Language (XML). World Wide Web Consortium Recommendation REC-xml-19980210, 16, 1998.

[21] Frederick P. Brooks, Jr. The Mythical Man-month (Anniversary Ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.

[22] Greg Brunet, Marsha Chechik, Steve Easterbrook, Shiva Nejati, Nan Niu, and Mehrdad Sabetzadeh. A Manifesto for Model Merging. In Proceedings of the 2006 International Workshop on Global Integrated Model Management, pages 5–12. ACM, 2006.

[23] Bruno Laguë, Daniel Proulx, Ettore M. Merlo, Jean Mayrand, and John Hudepohl. Assessing the Benefits of Incorporating Function Clone Detection in a Development Process. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 314–321. IEEE Computer Society Press, 1997.

[24] Randal E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, 35(8):677–691, August 1986.

[25] Randal E. Bryant. Symbolic Boolean Manipulation with Ordered Binary-decision Diagrams. ACM Computing Surveys, 24(3):293–318, September 1992.

[26] David Budgen. Software Design. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, second edition, 2003.

[27] Peter Buneman, Susan Davidson, and Anthony Kosky. Theoretical Aspects of Schema Merging. In Advances in Database Technology–EDBT’92, pages 152–167, Berlin, Heidelberg, 1992. Springer-Verlag.

[28] William R. Bush, Jonathan D. Pincus, and David J. Sielaff. A Static Analyzer for Finding Dynamic Programming Errors. Software: Practice and Experience, 30(7):775–802, June 2000.

[29] Jordi Cabot and Martin Gogolla. Object Constraint Language (OCL): A Definitive Guide. In Proceedings of the 12th International Conference on Formal Methods for the Design of Computer, Communication, and Software Systems: Formal Methods for Model-driven Engineering, SFM’12, pages 58–90, Berlin, Heidelberg, 2012. Springer-Verlag.

[30] G. Ann Campbell and Patroklos P. Papapetrou. SonarQube in Action. Manning Publications Co., 2013.

[31] Ping Chen, Matt Critchlow, Akash Garg, Christopher Van der Westhuizen, and André van der Hoek. Differencing and Merging within an Evolving Product Line Architecture. In Software Product-Family Engineering, 5th International Workshop, PFE 2003, November 4-6, Siena, Italy, Revised Papers, pages 269–281, Berlin, Heidelberg, 2003. Springer-Verlag.

[32] Ram Chillarege, Inderpal S. Bhandari, Jarir K. Chaar, Michael J. Halliday, Diane S. Moebus, Bonnie K. Ray, and Man-Yuen Wong. Orthogonal Defect Classification – A Concept for In-Process Measurements. IEEE Transactions on Software Engineering, 18(11):943–956, 1992.

[33] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. An Empirical Study of Operating Systems Errors. In Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, SOSP ’01, pages 73–88, New York, NY, USA, 2001. ACM.

[34] James Clark et al. XSL Transformations (XSLT). World Wide Web Consortium (W3C). URL http://www.w3.org/TR/xslt, 1999.

[35] Brian Cole, Daniel Hakim, David Hovemeyer, Reuven , William Pugh, and Kristin Stephens. Improving Your Software Using Static Analysis to Find Bugs. In Companion to the 21st ACM SIGPLAN Symposium on Object-oriented Programming Systems, Languages, and Applications, OOPSLA ’06, pages 673–674, New York, NY, USA, 2006. ACM.

[36] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2011.

[37] Marcus Vinicius Couto, Marco Tulio Valente, and Eduardo Figueiredo. Extracting Software Product Lines: A Case Study Using Conditional Compilation. In Tom Mens, Yiannis Kanellopoulos, and Andreas Winter, editors, CSMR, pages 191–200. IEEE Computer Society, 2011.

[38] Bill Curtis, Herb Krasner, and Neil Iscoe. A Field Study of the Software Design Process for Large Systems. Communications of the ACM, 31(11):1268–1287, November 1988.

[39] Hoa Khanh Dam, Alexander Reder, and Alexander Egyed. Inconsis- tency Resolution in Merging Versions of Architectural Models. In 11th Working IEEE/IFIP Conference on Software Architecture (WICSA), Sydney, Australia, pages 153–162, 2014.

[40] Ian F. Darwin. Checking Java Programs. O’Reilly Media, Inc., first edition, 2007.

[41] Eric Matthew Dashofy. xADL 2.0 Distilled: A Guide for Users of the xADL 2.0 Language, January 2003.

[42] Eric Matthew Dashofy. Supporting Stakeholder-driven, Multi-view Software Architecture Modeling. PhD thesis, University of California, Irvine, 2007.

[43] Eric Matthew Dashofy, André van der Hoek, and Richard N. Taylor. A Comprehensive Approach for the Development of Modular Software Architecture Description Languages. ACM Transactions on Software Engineering and Methodology, 14(2):199–245, April 2005.

[44] Eric Matthew Dashofy and André van der Hoek. Representing Product Family Architectures in an Extensible Architecture Description Language. In International Workshop on Product Family Engineering (PFE-4), pages 330–341, October 2001.

[45] Eric Matthew Dashofy, André van der Hoek, and Richard N. Taylor. A Highly-Extensible, XML-Based Architecture Description Language. In Working IEEE/IFIP Conference on Software Architecture (WICSA 2001), Amsterdam, The Netherlands, August 28-31 2001.

[46] Eric Matthew Dashofy, André van der Hoek, and Richard N. Taylor. An Infrastructure for the Rapid Development of XML-based Architecture Description Languages. In Proceedings of the 24th International Conference on Software Engineering, ICSE ’02, pages 266–276, New York, NY, USA, 2002. ACM.

[47] Harry S. Delugach. Specifying Multiple-viewed Software Requirements with Conceptual Graphs. Journal of Systems and Software, 19:207–224, 1992.

[48] David L. Dill, Andreas J. Drexler, Alan J. Hu, and Chan-Ho Yang. Pro- tocol verification as a hardware design aid. In Computer Design: VLSI in Computers and Processors, 1992. ICCD ’92. Proceedings., IEEE 1992 International Conference on, pages 522–525, 1992.

[49] R. Geoff Dromey. A Model for Software Product Quality. IEEE Trans- actions on Software Engineering, 21(2):146–162, February 1995.

[50] Paul Duvall, Stephen M. Matyas, and Andrew Glover. Continuous Inte- gration: Improving Software Quality and Reducing Risk (The Addison- Wesley Signature Series). Addison-Wesley Professional, 2007.

[51] Steve Easterbrook. Handling Conflict Between Domain Descriptions With Computer-Supported Negotiation. Knowledge Acquisition, 3:255– 289, 1991.

[52] Steve Easterbrook and Bashar Nuseibeh. Using ViewPoints for In- consistency Management. Software Engineering Journal, 11(1):31–43, January 1996.

[53] Bruce Eckel. Thinking in Java. Prentice Hall Professional Technical Reference, third edition, 2002.

[54] Robert Eckstein, Marc Loy, and Dave Wood. Java Swing. O’Reilly & Associates, Inc., Sebastopol, CA, USA, second edition, 1998.

[55] Alexander Egyed. Automatically Detecting and Tracking Inconsistencies in Software Design Models. IEEE Transactions on Software Engineering, 37(2):188–204, 2011.

[56] Khaled El Emam and Isabella Wieczorek. The Repeatability of Code Defect Classifications. In Ninth International Symposium on Software Reliability Engineering, ISSRE 1998, Paderborn, Germany, November 4-7, 1998, pages 322–333, 1998.

[57] Wolfgang Emmerich, Anthony Finkelstein, Carlo Montangero, Stefano Antonelli, Stephen Armitage, and Richard Stevens. Managing Standards Compliance. IEEE Transactions on Software Engineering, 25(6):836–851, 1999.

[58] Dawson Engler, Benjamin Chelf, Andy Chou, and Seth Hallem. Checking System Rules Using System-specific, Programmer-written Compiler Extensions. In Proceedings of the 4th Conference on Symposium on Operating System Design & Implementation - Volume 4, OSDI’00, pages 1–1, Berkeley, CA, USA, 2000. USENIX Association.

[59] Michael E. Fagan. Design and code inspections to reduce errors in program development. IBM Systems Journal, 15(3):182–211, September 1976.

[60] Norman E. Fenton and Shari Lawrence Pfleeger. Software Metrics: A Rigorous and Practical Approach. PWS Publishing Co., Boston, MA, USA, third edition, 2015.

[61] Anthony C. W. Finkelstein, Dov Gabbay, Anthony Hunter, Jeff Kramer, and Bashar Nuseibeh. Inconsistency Handling in Multiperspective Specifications. IEEE Transactions on Software Engineering, 20(8):569–578, October 1994.

[62] Stefan Fischer. Feature-Based Composition of Software-Systems. Master’s thesis, Johannes Kepler University Linz, February 2014.

[63] Stefan Fischer, Lukas Linsbauer, Roberto Erick Lopez-Herrejon, and Alexander Egyed. Enhancing Clone-and-Own with Systematic Reuse for Developing Software Variants. In 30th IEEE International Conference on Software Maintenance and Evolution, Victoria, BC, Canada, September 29 - October 3, 2014, pages 391–400, 2014.

[64] John Fitzgerald, Peter Gorm Larsen, Paul Mukherjee, Nico Plat, and Marcel Verhoef. Validated Designs For Object-oriented Systems. Springer-Verlag TELOS, Santa Clara, CA, USA, 2005.

[65] Martin Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[66] Martin Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.

[67] Martin Fowler. Who Needs an Architect? IEEE Software Magazine, 20(5):11–13, September 2003.

[68] Martin Fowler. Continuous Integration, May 2006. martinfowler.com [Online; posted 01-May-2006].

[69] Martin Fowler. Frequency Reduces Difficulty, July 2011. martinfowler.com [Online; posted 28-July-2011].

[70] Martin Fowler. Uniform Access Principle, April 2011. martinfowler.com [Online; posted 20-April-2011].

[71] Dov M. Gabbay and Anthony Hunter. Making Inconsistency Respectable: A Logical Framework for Inconsistency in Reasoning. In Philippe Jorrand and Jozef Kelemen, editors, FAIR, volume 535 of Lecture Notes in Computer Science, pages 19–32. Springer, 1991.

[72] Mark Gabel, Lingxiao Jiang, and Zhendong Su. Scalable Detection of Semantic Clones. In Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, pages 321–330, New York, NY, USA, 2008. ACM.

[73] Mark Gabel, Junfeng Yang, Yuan Yu, Moisés Goldszmidt, and Zhendong Su. Scalable and Systematic Detection of Buggy Inconsistencies in Source Code. In William R. Cook, Siobhán Clarke, and Martin C. Rinard, editors, OOPSLA, pages 175–190. ACM, 2010.

[74] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-oriented Software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.

[75] Angelo Gargantini and Constance L. Heitmeyer. Using Model Checking to Generate Tests from Requirements Specifications. In Proceedings of the 7th European Software Engineering Conference Held Jointly with the 7th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-7, pages 146–162, London, UK, 1999. Springer-Verlag.

[76] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), CVPR ’12, pages 3354–3361, Washington, DC, USA, 2012. IEEE Computer Society.

[77] git-merge online documentation. git-scm.com [Online; retrieved on 11-February-2015].

[78] Robert L. Glass. The Mystery of Formal Methods Disuse. Communications of the ACM, 47(8):15–17, August 2004.

[79] Fangfang Guo, Yu Li, Mohan S. Kankanhalli, and Michael S. Brown. An Evaluation of Wearable Activity Monitoring Devices. In Proceedings of the 1st ACM International Workshop on Personal Data Meets Distributed Multimedia, PDM ’13, pages 31–34, New York, NY, USA, 2013. ACM.

[80] Anthony Hall. Seven Myths of Formal Methods. IEEE Software Magazine, 7(5):11–19, September 1990.

[81] Seth Hallem, Benjamin Chelf, Yichen Xie, and Dawson Engler. A System and Language for Building System-specific, Static Analyses. ACM SIGPLAN Notices, 37(5):69–82, May 2002.

[82] Stuart Halloway. Programming Clojure. Pragmatic Bookshelf, first edition, 2009.

[83] Paul Hamill. Unit Test Frameworks. O’Reilly, first edition, 2004.

[84] Sudheendra Hangal and Monica S. Lam. Tracking Down Software Bugs Using Automatic Anomaly Detection. In Proceedings of the 24th International Conference on Software Engineering, ICSE ’02, pages 291–301, New York, NY, USA, 2002. ACM.

[85] Chris Hanson. Efficient Stack Allocation for Tail-recursive Languages. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming, LFP ’90, pages 106–118, New York, NY, USA, 1990. ACM.

[86] Constance L. Heitmeyer. Software Cost Reduction. Wiley Online Library, 2002.

[87] Tim Heyer. Semantic Inspection of Software Artifacts: From Theory to Practice. PhD thesis, Linköping University Electronic Press, 2001.

[88] Kerry Hinge, Aditya K. Ghose, and George Koliadis. Process SEER: A Tool for Semantic Effect Annotation of Business Process Models. In EDOC, pages 54–63. IEEE Computer Society, 2009.

[89] David Hovemeyer and William Pugh. Finding Bugs is Easy. ACM SIGPLAN Notices, 39(12):92–106, 2004.

[90] LiGuo Huang, Vincent Ng, Isaac Persing, Ruili Geng, Xu Bai, and Jeff Tian. AutoODC: Automated generation of orthogonal defect classifications. In 26th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 412–415. IEEE, 2011.

[91] Jez Humble and David Farley. Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation. Addison-Wesley Professional, first edition, 2010.

[92] Andrew Hunt and David Thomas. The Pragmatic Programmer: From Journeyman to Master. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[93] James W. Hunt and M. Douglas McIlroy. An Algorithm for Differential File Comparison. Technical Report 41, Bell Laboratories Computing Science, July 1976.

[94] James W. Hunt and Thomas G. Szymanski. A Fast Algorithm for Computing Longest Common Subsequences. Communications of the ACM, 20(5):350–353, May 1977.

[95] Anthony Hunter and Bashar Nuseibeh. Managing Inconsistent Specifications: Reasoning, Analysis, and Action. ACM Transactions on Software Engineering and Methodology, 7(4):335–367, October 1998.

[96] Michael Hüttermann. DevOps for Developers. Apress, first edition, 2012.

[97] Adam Jacobs. The Pathologies of Big Data. Communications of the ACM, 52(8):36–44, August 2009.

[98] Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, and Stephane Glondu. DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones. In Proceedings of the 29th International Conference on Software Engineering, ICSE ’07, pages 96–105, Washington, DC, USA, 2007. IEEE Computer Society.

[99] Lingxiao Jiang and Zhendong Su. Automatic Mining of Functionally Equivalent Code Fragments via Random Testing. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ISSTA ’09, pages 81–92, New York, NY, USA, 2009. ACM.

[100] Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge. Why Don’t Software Developers Use Static Analysis Tools to Find Bugs? In Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pages 672–681, Piscataway, NJ, USA, 2013. IEEE Press.

[101] Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, and Stefan Wagner. Do Code Clones Matter? In Proceedings of the 31st International Conference on Software Engineering, ICSE ’09, pages 485–495, Washington, DC, USA, 2009. IEEE Computer Society.

[102] Stefan Jungmayr. Testability Measurement and Software Dependencies. In Proceedings of 12th International Workshop on Software Measurement, pages 179–202, Magdeburg, Germany, October 2002.

[103] Jean-Marc Jézéquel. Model-Driven Engineering for Software Product Lines.

[104] Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue. CCFinder: A Multilinguistic Token-based Code Clone Detection System for Large Scale Source Code. IEEE Transactions on Software Engineering, 28(7):654–670, July 2002.

[105] Kyo C. Kang, Sholom G. Cohen, James A. Hess, William E. Novak, and A. Spencer Peterson. Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical report, Carnegie-Mellon University Software Engineering Institute, November 1990.

[106] Kyo C. Kang, Vijayan Sugumaran, and Sooyong Park. Applied Software Product Line Engineering. Auerbach Publications, Boston, MA, USA, first edition, 2009.

[107] Ahmet Serkan Karatas, Halit Oguztüzün, and Ali H. Dogru. Mapping Extended Feature Models to Constraint Logic Programming over Finite Domains. In Jan Bosch and Jaejoon Lee, editors, Software Product Lines: Going Beyond - 14th International Conference, SPLC 2010, Jeju Island, South Korea, September 13-17, 2010. Proceedings, pages 286–299, 2010.

[108] Michael Kay. XSLT Programmer’s Reference. Wrox Press Ltd., Birmingham, UK, 2000.

[109] Ryan D. Kelker. Clojure for Domain-specific Languages. Packt Publishing, 2013.

[110] Sanjeev Khanna, Keshav Kunal, and Benjamin C. Pierce. A Formal Investigation of Diff3. In Proceedings of the 27th International Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS’07, pages 485–496, Berlin, Heidelberg, 2007. Springer-Verlag.

[111] Dierk Koenig, Andrew Glover, Paul King, Guillaume Laforge, and Jon Skeet. Groovy in Action. Manning Publications Co., Greenwich, CT, USA, 2007.

[112] Gerald Kotonya and Ian Sommerville. Requirements engineering with viewpoints. Software Engineering Journal, 11(1):5–18, January 1996.

[113] Philippe Kruchten. The 4+1 View Model of Architecture. IEEE Software Magazine, 12(6):42–50, November 1995.

[114] Charles W. Krueger. Software Reuse. ACM Computing Surveys, 24(2):131–183, June 1992.

[115] Christian Kästner, Sven Apel, Thomas Thüm, and Gunter Saake. Type Checking Annotation-based Product Lines. ACM Transactions on Software Engineering and Methodology, 21(3):14:1–14:39, July 2012.

[116] John Lakos. Large-scale C++ Software Design. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 1996.

[117] Theodore W. Leung. Professional XML Development with Apache Tools: Xerces, Xalan, FOP, Cocoon, Axis, Xindice. John Wiley & Sons, 2004.

[118] Nancy G. Leveson and Clay S. Turner. An Investigation of the Therac-25 Accidents. Computer, 26(7):18–41, July 1993.

[119] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, OSDI’04, pages 289–302, Berkeley, CA, USA, 2004. USENIX Association.

[120] Karl J. Lieberherr. Formulations and Benefits of the Law of Demeter. ACM SIGPLAN Notices, 24(3):67–78, March 1989.

[121] Karl J. Lieberherr, Ian M. Holland, and Arthur J. Riel. Object-oriented Programming: An Objective Sense of Style. In Conference Proceedings on Object-oriented Programming Systems, Languages and Applications, OOPSLA ’88, pages 323–334, New York, NY, USA, 1988. ACM.

[122] Frank J. van der Linden, Klaus Schmid, and Eelco Rommes. Software Product Lines in Action: The Best Industrial Practice in Product Line Engineering. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007.

[123] Tancred Lindholm. A three-way merge for XML documents. In Proceedings of the 2004 ACM Symposium on Document Engineering, pages 1–10. ACM, 2004.

[124] Lukas Linsbauer. Reverse Engineering Variability from Product Variants. Master’s thesis, Johannes Kepler University Linz, November 2013.

[125] Lukas Linsbauer, Roberto E. Lopez-Herrejon, and Alexander Egyed. Recovering Traceability between Features and Code in Product Variants. In 17th International Software Product Line Conference (SPLC), Tokyo, Japan, pages 131–140, 2013.

[126] Jacques-Louis Lions et al. Ariane 5 flight 501 failure, 1996. di.unito.it [Online; retrieved on 19-February-2015].

[127] Wenqian Liu, Steve Easterbrook, and John Mylopoulos. Rule-Based Detection of Inconsistency in UML Models. In Proc. UML Workshop on Consistency Problems in UML-Based Software Development, pages 106–123. Blekinge Institute of Technology, 2002.

[128] David C. Luckham. Rapide: A Language and Toolset for Simulation of Distributed Systems by Partial Orderings of Events. Technical report, Stanford University, Stanford, CA, USA, 1996.

[129] Andrian Marcus and Jonathan I. Maletic. Identification of High-Level Concept Clones in Source Code. In Proceedings of the 16th IEEE International Conference on Automated Software Engineering, ASE ’01, pages 107–, Washington, DC, USA, 2001. IEEE Computer Society.

[130] Robert C. Martin. Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall PTR, Upper Saddle River, NJ, USA, first edition, 2008.

[131] Thomas J. McCabe. A Complexity Measure. In Proceedings of the second International Conference on Software Engineering, ICSE ’76, pages 407–, Los Alamitos, CA, USA, 1976. IEEE Computer Society Press.

[132] John D. McGregor. Testing a Software Product Line. Technical Report CMU/SEI-2001-TR-022, Carnegie-Mellon University Software Engineering Institute, 2001.

[133] Nenad Medvidovic, Paul Grünbacher, Alexander Egyed, and Barry W. Boehm. Software Model Connectors: Bridging Models across the Software Lifecycle. In Proceedings Thirteenth International Conference on Software Engineering & Knowledge Engineering, SEKE 2001, pages 387–396, 2001.

[134] Nenad Medvidovic, Peyman Oreizy, Jason E. Robbins, and Richard N. Taylor. Using Object-oriented Typing to Support Architectural Design in the C2 Style. In Proceedings of the 4th ACM SIGSOFT Symposium on Foundations of Software Engineering, SIGSOFT ’96, pages 24–32, New York, NY, USA, 1996. ACM.

[135] Stephen J. Mellor, Tony Clark, and Takao Futagami. Model-driven Development: Guest Editors’ Introduction. IEEE Software, 20(5):14–18, 2003.

[136] Tom Mens. A State-of-the-Art Survey on Software Merging. IEEE Transactions on Software Engineering, 28(5):449–462, May 2002.

[137] Tom Mens, Michel Wermelinger, Stéphane Ducasse, Serge Demeyer, Robert Hirschfeld, and Mehdi Jazayeri. Challenges in software evolution. In 8th International Workshop on Principles of Software Evolution (IWPSE 2005), 5-7 September 2005, Lisbon, Portugal, pages 13–22. IEEE Press, 2005.

[138] Gerard Meszaros. xUnit Test Patterns: Refactoring Test Code. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2006.

[139] Bertrand Meyer. Object-oriented Software Construction (2nd ed.). Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1997.

[140] Glenford J. Myers, Corey Sandler, and Tom Badgett. The Art of Software Testing. John Wiley & Sons, third edition, 2004.

[141] Naresh Kumar Nagwani and Shrish Verma. BugML: Software Bug Markup Language. International Journal of Computer Applications, 26(2), 2011.

[142] Shamkant Navathe, Ramez Elmasri, and James Larson. Integrating User Views in Database Design. Computer, 19(1):50–62, 1986.

[143] Christian Nentwich, Licia Capra, Wolfgang Emmerich, and Anthony Finkelstein. Xlinkit: A Consistency Checking and Smart Link Generation Service. ACM Transactions on Software Engineering and Methodology, 2(2):151–185, May 2002.

[144] Christian Nentwich, Wolfgang Emmerich, Anthony Finkelstein, and Ernst Ellmer. Flexible Consistency Checking. ACM Transactions on Software Engineering and Methodology, 12(1):28–63, January 2003.

[145] Bashar Nuseibeh. To Be and Not to Be: On Managing Inconsistency in Software Development. In Proceedings of the 8th International Workshop on Software Specification and Design, IWSSD ’96, pages 164–, Washington, DC, USA, 1996. IEEE Computer Society.

[146] Bashar Nuseibeh, Steve M. Easterbrook, and Alessandra Russo. Making Inconsistency Respectable in Software Development. Journal of Systems and Software, 58(2):171–180, 2001.

[147] Bashar Nuseibeh, Jeff Kramer, and Anthony Finkelstein. A Framework for Expressing the Relationships Between Multiple Views in Requirements Specification. IEEE Transactions on Software Engineering, 20(10):760–773, October 1994.

[148] Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala: A Comprehensive Step-by-Step Guide. Artima Incorporation, USA, second edition, 2011.

[149] OMG. MDA Guide Version 1.0.1. http://www.omg.org/cgi-bin/doc?omg/03-06-01.pdf, June 2003.

[150] Gerard O’Regan. A Practical Approach to Software Quality. Springer Publishing Company, Incorporated, 2011.

[151] Raymond R. Panko. Applying Code Inspection to Spreadsheet Testing. Journal of Management Information Systems, 16(2):159–176, 1999.

[152] David Lorge Parnas. Designing Software for Ease of Extension and Contraction. In Proceedings of the 3rd International Conference on Software Engineering, ICSE ’78, pages 264–277, Piscataway, NJ, USA, 1978. IEEE Press.

[153] Dewayne E. Perry, Harvey P. Siy, and Lawrence G. Votta. Parallel Changes in Large-scale Software Development: An Observational Case Study. ACM Transactions on Software Engineering and Methodology, 10(3):308–337, July 2001.

[154] Nilanjan Raychaudhuri. Scala in Action. Manning Publications Co., Greenwich, CT, USA, 2013.

[155] Alexander Reder and Alexander Egyed. Model/Analyzer: A Tool for Detecting, Visualizing and Fixing Design Errors in UML. In 25th International Conference on Automated Software Engineering (ASE), Antwerp, Belgium, pages 347–348, 2010.

[156] Debbie Richards. Merging Individual Conceptual Models of Require- ments. Requirements Engineering, 8(4):195–205, 2003.

[157] Mark Richters and Martin Gogolla. On Formalizing the UML Object Constraint Language OCL. In Tok Wang Ling, Sudha Ram, and Mong-Li Lee, editors, Proceedings of the 17th International Conference on Conceptual Modeling, volume 1507 of Lecture Notes in Computer Science, pages 449–464, Berlin, Heidelberg, 1998. Springer-Verlag.

[158] Bill Ritcher. Guiffy SureMerge – A Trustworthy 3-Way Merge. Guiffy Software, 2011.

[159] Jason E. Robbins and David F. Redmiles. Cognitive Support, UML Adherence, and XMI Interchange in Argo/UML. In Conference on Construction of Software Engineering Tools (CoSET’99), Los Angeles, CA, May 17-18 1999.

[160] Jason E. Robbins, David M. Hilbert, and David F. Redmiles. Argo: A Design Environment for Evolving Software Architectures. International Conference on Software Engineering, 0:600, 1997.

[161] Jason E. Robbins and David F. Redmiles. Software Architecture Critics in the Argo Design Environment. Knowledge-Based Systems, 11(1):47–60, 1998.

[162] Jason E. Robbins and David F. Redmiles. Software Architecture Critics in the Argo Design Environment. Knowledge-Based Systems, 11(1):47–60, 1998.

[163] Stephen P. Robbins. Organizational Behavior: Concepts, Controversies, Applications. Prentice Hall, seventh edition, 1996.

[164] Klaus Schmid and Isabel John. A Customizable Approach to Full Lifecycle Variability Management. Science of Computer Programming, 53(3):259–284, December 2004.

[165] Robert W. Schwanke and Gail E. Kaiser. Living With Inconsistency in Large Systems. In Jürgen F. H. Winkler, editor, SCM, volume 30 of Berichte des German Chapter of the ACM, pages 98–118. Teubner, 1988.

[166] Harvey Siy and Lawrence Votta. Does The Modern Code Inspection Have Value? In Proceedings of the IEEE International Conference on Software Maintenance (ICSM’01), ICSM ’01, pages 281–, Washington, DC, USA, 2001. IEEE Computer Society.

[167] Harald Sondergaard and Peter Sestoft. Referential Transparency, Definiteness and Unfoldability. Acta Informatica, 27(6):505–517, January 1990.

[168] John F. Sowa. Conceptual Graphs for Representing Conceptual Structures. Conceptual Structures in Practice, pages 101–136, 2009.

[169] George Spanoudakis and Anthony Finkelstein. Reconciling Requirements: A Method for Managing Interference, Inconsistency and Conflict. Annals of Software Engineering, 3:433–457, 1997.

[170] George Spanoudakis and Kuriakos Kasis. Significance of Inconsistencies in UML Models. In Proceedings of the International Conference on Software: Theory and Practice, World Computer Congress, pages 152–163, Beijing, China, 2000.

[171] George Spanoudakis and Hyoseob Kim. Diagnosis of the Significance of Inconsistencies in Object-oriented Designs: A Framework and Its Experimental Evaluation. Journal of Systems and Software, 64(1):3–22, 2002.

[172] George Spanoudakis and Andrea Zisman. Inconsistency Management in Software Engineering: Survey and Open Research Issues. In Handbook of Software Engineering and Knowledge Engineering, pages 329–380. World Scientific, 2001.

[173] Richard N. Taylor, Nenad Medvidovic, Kenneth M. Anderson, E. James Whitehead, Jr., and Jason E. Robbins. A Component- and Message-based Architectural Style for GUI Software. In Proceedings of the 17th International Conference on Software Engineering, ICSE ’95, pages 295–304, New York, NY, USA, 1995. ACM.

[174] Alan Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42(2):230–265, 1936.

[175] Eric van der Vlist. Schematron. O’Reilly, first edition, 2007.

[176] Axel van Lamsweerde and Emmanuel Letier. Handling Obstacles in Goal-Oriented Requirements Engineering. IEEE Transactions on Software Engineering, 26(10):978–1005, October 2000.

[177] Axel van Lamsweerde, Emmanuel Letier, and Christophe Ponsard. Leaving Inconsistency. In Workshop on Living with Inconsistency, ICSE ’97, May 1997.

[178] Stefan Wagner. Defect Classification and Defect Types Revisited. In Proceedings of the 2008 Workshop on Defects in Large Software Systems, DEFECTS ’08, pages 39–40, New York, NY, USA, 2008. ACM.

[179] Jos B. Warmer and Anneke G. Kleppe. The Object Constraint Language: Precise Modeling With UML (Addison-Wesley Object Technology Series). Addison-Wesley Professional, October 1998.

[180] John B. Wordsworth. Getting the Best from Formal Methods. Information and Software Technology, 41(14):1027–1032, November 1999.

[181] Zhenchang Xing and Eleni Stroulia. UMLDiff: An Algorithm for Object-oriented Design Differencing. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, pages 54–65. ACM, 2005.

[182] John Zukowski. The Definitive Guide to Java Swing. Apress, Berkeley, CA, USA, third edition, 2005.

Appendix A

Source Code

A.1 Code for ECCO
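The EccoJavaParser class listed below extracts information from ECCO code fragments with regular expressions. As a self-contained sketch of that technique (not part of the ECCO code base; the class and method names here are illustrative), the following demo finds the variable that a statement sets to null, using a simplified form of the listing's variable regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative demo (not ECCO code): extracts the first variable that a
 * Java statement assigns null, mirroring the VAR_SET_TO_NULL_REGEX idea
 * from the listing below.
 */
public class NullAssignmentDemo {

    // Simplified variable regex: first character is a letter, underscore,
    // or dollar sign (but not a digit), followed by word characters or
    // dots (so qualified names like "this.var" also match).
    private static final String VAR_REGEX = "[$\\w&&[^0-9]][$.\\w]*";

    // Captures the variable name on the left of "= null".
    private static final Pattern VAR_SET_TO_NULL = Pattern
            .compile("(" + VAR_REGEX + ")\\s*=\\s*null");

    /** Returns the first variable set to null in {@code stmt}, or null. */
    public static String firstVarSetToNull(String stmt) {
        Matcher m = VAR_SET_TO_NULL.matcher(stmt);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints "str"
        System.out.println(firstVarSetToNull("String str = null;"));
    }
}
```

For {@code "String str = null;"} the pattern skips the type name (no equals sign follows it) and captures {@code str}; a statement without a null assignment yields no match.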

package at.jku.sea.plt.core.compose.rules;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import at.jku.sea.plt.core.artifact.Artifact;
import at.jku.sea.plt.core.artifact.Node;
import at.jku.sea.plt.core.compose.rules.data.MethodCall;
import at.jku.sea.plt.core.compose.rules.data.Variable;
import at.jku.sea.utils.StringUtil;

import com.google.common.collect.Sets;

/**
 * Parses strings that represent code from the ECCO code tree.
 *
 * @author Matthias Braun
 */
public class EccoJavaParser {

    /**
     * ECCO nodes also carry the type of the code they contain. This string is
     * used to identify expressions.
     */
    private static final String EXPRESSION_TYPE = "EXPRESSION";

    /**
     * ECCO nodes also carry the type of the code they contain. This string is
     * used to identify assignments.
     */
    private static final String OTHER_TYPE = "OTHER";

    /**
     * The number returned when the position of something couldn't be found.
     */
    private static final int POS_NOT_FOUND = -1;

    /**
     * Matches strings like {@code obj.myMethod(params)} and
     * {@code if (something)}. Remembers what's outside the parentheses and
     * what's inside.
     *
     * {@code (?U)} lets {@code \\w} also match non-ASCII letters.
     */
    public static final Pattern PARENTHESES_REGEX = Pattern
            .compile("(?U)([^(]+)\\s*\\((.*)\\)");

    /**
     * Attempts to match strings that are variable identifiers. Currently this
     * also matches reserved Java words like {@code while} or {@code catch}.
     *
     * Explanation of the regex:
     * <ol>
     * <li>{@code (?U)} enables unicode support. This causes \w to also match
     * non-ASCII letters.</li>
     * <li>The first character can be any letter (including the dollar sign)
     * except a number.</li>
     * <li>After that any or no letter is allowed (including the dollar sign and
     * the dot in case the variable looks like this: {@code this.var} or
     * {@code otherObj.var}).</li>
     * </ol>
     *
     * Caveat: This regex does not match some exotic (but legal) Java
     * identifiers like the unicode character U+0BF9.
     */
    private static final String VAR_REGEX = "(?U)[$\\w&&[^0-9]][$.\\w]*";

    private static final Logger LOG = LoggerFactory
            .getLogger(EccoJavaParser.class);

    /**
     * Matches and remembers variables that are compared using the boolean and
     * the relational operators. These are <, <=, >, >=, ==, !=, &&, ||, &,
     * and |. Compare JLS 7 §15.20 and §15.21.
     */
    private static final String COMPARE_OP_REGEX = "(?:<|<=|>|>=|==|!=|&&|&|\\|\\||\\|)";

    /**
     * Matches for example {@code if(this == that)} and {@code if(this != that)}
     * and remembers {@code this}.
     */
    private static Pattern LEFT_COMPARED_VAR_REGEX = Pattern.compile("("
            + VAR_REGEX + ")\\s*" + COMPARE_OP_REGEX);

    /**
     * Matches for example {@code if(this && that)} and {@code if(this | that)}
     * and remembers {@code that}.
     */
    private static Pattern RIGHT_COMPARED_VAR_REGEX = Pattern
            .compile(COMPARE_OP_REGEX + "\\s*(" + VAR_REGEX + ")");

    /**
     * Matches variables that are read during an assignment like
     * {@code int assignee = readVar} or {@code int assignee *= readVar} and
     * remembers {@code readVar}.
     *
     * Does not match variables used in comparisons like
     * {@code if(this == that)} or {@code if(this != that)}.
     */
    private static final Pattern READ_VAR_VIA_ASSIGNMENT_REGEX = Pattern
            .compile("[^=!]=\\s*(" + VAR_REGEX + ")");

    /**
     * Matches variable assignments like {@code int var = 23;} and remembers the
     * name of the variable that is written to.
     */
    private static final Pattern SIMPLE_ASSIGNMENT_REGEX = Pattern.compile("("
            + VAR_REGEX + ")\\s*=");

    /**
     * Matches Java operators that involve two or more operands.
     *
     * Those operators are in words: addition, subtraction, multiplication,
     * division, modulo, left-shift, signed right-shift and unsigned
     * right-shift. From JLS §15.17, §15.18, and §15.19.
     */
    private static final String BINARY_OP_REGEX = "(?:[+/%*\\-]|<<|>>|>>>)";

    /**
     * Matches strings where a variable is used as the left-hand part of a
     * binary operation in Java. The left variable is remembered. Example:
     * {@code x + y}
     */
    private static final Pattern LEFT_VAR_OF_BINARY_OP_REGEX = Pattern
            .compile("(" + VAR_REGEX + ")\\s*" + BINARY_OP_REGEX);

    /**
     * Matches strings where a variable is used as the right-hand part of a
     * binary operation in Java. The right variable is remembered. Example:
     * {@code a - b}
     */
    private static final Pattern RIGHT_VAR_OF_BINARY_OP_REGEX = Pattern
            .compile(BINARY_OP_REGEX + "\\s*(" + VAR_REGEX + ")");

    /**
     * Matches simple variable declarations like {@code int x}.
     *
     * The ECCO Java nodes don't end with semicolons, hence the $ at the end of
     * the regex.
     *
     * Does not match multiple declarations like {@code int x, y}.
     */
    private static final Pattern VAR_DECLARATION_REGEX = Pattern
            .compile("[^=]+?\\s+(" + VAR_REGEX + ")\\s*$");

    /**
     * Matches strings where a variable is set to null. Remembers the variable.
     * Example: {@code String str = null;}
     */
    private static final Pattern VAR_SET_TO_NULL_REGEX = Pattern.compile("("
            + VAR_REGEX + ")\\s*=\\s*null");

    /**
     * After these keywords an opening parenthesis may follow. Some are in
     * uppercase because that's how they occur in the code fragment strings of
     * ECCO.
     */
    private static HashSet<String> keyWordsBeforeParens = Sets.newHashSet(
            "while", "for", "IF", "if", "try", "catch", "SWITCH", "FOREACH");

    /**
     * Determines if a call causes a null pointer exception in a node that
     * consists of multiple code statements.
     *
     * @param call
     *            the call that might cause a NPE
     * @param node
     *            the node containing multiple statements, among them the
     *            method call
     * @return whether the method call causes a NPE
     */
    public static boolean callCausesNPE(final MethodCall call, final Node node) {
        boolean causesNPE = false;
        final Variable receiver = call.getReceiver();
        final int posWhereReceiverIsSetToNull = getPosWhereVarIsSetToNull(
                receiver, node);
        final int posWhereReceiverIsCalled = getPosWhereReceiverIsCalled(call,
                node);
        if (posWhereReceiverIsSetToNull != POS_NOT_FOUND
                && posWhereReceiverIsCalled != POS_NOT_FOUND) {
            // true if the call happened after setting the receiver to null
            causesNPE = posWhereReceiverIsCalled > posWhereReceiverIsSetToNull;
        }
        return causesNPE;
    }

    /**
     * Checks whether {@code node} contains a method call.
     *
     * @param node
     *            check whether this node contains a method call
     * @return whether a method call happens inside the code of {@code node}
     */
    public static boolean containsMethodCall(final Node node) {
        final String identifier = node.getArtifact().getIdentifier();
        return containsMethodCall(identifier);
    }

    /**
     * A node can contain multiple statements, for example when it contains an
     * anonymous inner class.
     *
     * @param node
     *            check whether this node contains multiple statements
     * @return whether {@code node} contains multiple statements
     */
    public static boolean containsMultipleStmts(final Node node) {
        final String nodeAsStr = node.getArtifact().toString();
        // Remove all string content from the node
        String nodeWithOutStringLiterals = StringUtil.deleteBetween(nodeAsStr,
                "\"");
        nodeWithOutStringLiterals = StringUtil.deleteBetween(
                nodeWithOutStringLiterals, "'");
        return nodeWithOutStringLiterals.contains(";");
    }

    /**
     * Gets all the {@link Variable}s declared in the {@code node}.
     *
     * @param node
     *            node that might declare variables
     * @return all declared {@link Variable}s inside {@code node}
     */
    public static Set<Variable> getDeclaredVars(final Node node) {
        return getDeclaredVars(node.getArtifact().toString());
    }

    /**
     * Gets all the {@link MethodCall}s that occur in the {@code node}.
     *
     * @param node
     *            node that might contain method calls
     * @return all {@link MethodCall}s inside {@code node}
     */
    public static List<MethodCall> getMethodCalls(final Node node) {
        return getMethodCalls(node.getArtifact().toString());
    }

    /**
     * Gets all the {@link MethodCall}s that occur in the code statement
     * {@code stmt}.
     *
     * @param stmt
     *            statement that might contain method calls
     * @return all {@link MethodCall}s inside {@code stmt}
     */
    public static List<MethodCall> getMethodCalls(final String stmt) {
        final List<MethodCall> calls = new ArrayList<>();
        getMethodCalls(stmt, calls);
        return calls;
    }

    /**
     * Gets the variables that were read during a Java code statement.
     *
     * @param stmt
     *            the code statement that might contain read variables
     * @return a {@link Set} of {@link Variable}s that were read in the
     *         statement
     */
    public static Set<Variable> getReadVars(final Node stmt) {
        final String identifier = stmt.getArtifact().getIdentifier();
        return getReadVars(identifier);
    }

    /**
     * Gets the {@link Variable}s that were set to null inside a statement.
     *
     * @param stmt
     *            the statement as a {@link Node} that might contain variables
     * @return a set of {@link Variable}s that were set to null in the
     *         statement
     */
    public static Set<Variable> getVarsSetToNull(final Node stmt) {
        final String stmtAsString = stmt.getArtifact().toString();
        final Set<Variable> vars = new HashSet<>();
        final Matcher matcher = VAR_SET_TO_NULL_REGEX.matcher(stmtAsString);
        while (matcher.find()) {
            final String setToNullVar = matcher.group(1);
            final Variable var = new Variable(setToNullVar);
            vars.add(var);
        }
        return vars;
    }

    /**
     * Parses a {@link Node} and gets the {@link Variable}s that were written to
     * in that node.
     *
     * @param node
     *            the node that may contain variables that are changed
     * @return a set of {@link Variable}s that were written to
     */
    public static Set<Variable> getWrittenToVars(final Node node) {
        final Set<Variable> changedVars = new HashSet<>();
        final Artifact artifact = node.getArtifact();
        final String type = artifact.getType();
        // This type may contain assignments
        if (type.equals(OTHER_TYPE) || type.equals(EXPRESSION_TYPE)) {
            final String identifier = artifact.getIdentifier();
            changedVars.addAll(getSimplyAssignedVars(identifier));
        }
        return changedVars;
    }

    private static boolean containsMethodCall(final String s) {
        return getMethodCalls(s).size() > 0;
    }

    /**
     * Gets {@link Variable}s inside a statement that were declared but not
     * initialized.
     *

312 * Example: {@code int i;}. 313 * 314 * @param stmt APPENDIX A. SOURCE CODE 110

315 * the statement as a string that might contain variables 316 * @return a set of declared {@link Variable}s 317 */ 318 private static Set getDeclaredVars(final String stmt) { 319 final Set declaredVars = new HashSet<>(); 320 final Matcher matcher = VAR_DECLARATION_REGEX.matcher(stmt); 321 while (matcher.find()) { 322 final String varName = matcher.group(1); 323 declaredVars.add(new Variable(varName)); 324 } 325 return declaredVars; 326 } 327 328 /** 329 * Gets all the method calls of a Java statement. 330 * 331 * @param stmt 332 * the code statement that might contain a method call 333 * @param calls 334 * a {@link List} of currently found method calls 335 */ 336 private static void getMethodCalls(final String stmt, 337 final List calls) { 338 final Matcher matcher = PARENTHESES_REGEX.matcher(stmt); 339 while (matcher.find()) { 340 final String beforeParens = matcher.group("outer"); 341 final String insideParens = matcher.group("inner"); 342 if (!keyWordsBeforeParens.contains(beforeParens) 343 && !isConstructor(beforeParens)) { 344 345 final MethodCall call = MethodCall.create(beforeParens, 346 insideParens); 347 calls.add(call); 348 } 349 getMethodCalls(insideParens, calls); 350 } 351 } 352 353 private static int getPosWhereReceiverIsCalled(final MethodCall call, 354 final Node node) { APPENDIX A. SOURCE CODE 111

355 int pos = POS_NOT_FOUND; 356 // A node can consist of multiple statements if it’s an anonymous class 357 final String[] stmts = node.getArtifact().getIdentifier().split(";"); 358 for (int i = 0; i < stmts.length; i++) { 359 final String stmt = stmts[i]; 360 if (methodIsCalled(stmt, call)) { 361 pos = i; 362 } 363 } 364 return pos; 365 } 366 367 private static int getPosWhereVarIsSetToNull(final Variable var, 368 final Node node) { 369 int pos = POS_NOT_FOUND; 370 // A node can consist of multiple statements if it’s an anonymous class 371 final String[] stmts = node.getArtifact().getIdentifier().split(";"); 372 for (int i = 0; i < stmts.length; i++) { 373 final String stmt = stmts[i]; 374 if (varIsSetToNull(stmt, var)) { 375 pos = i; 376 } 377 } 378 return pos; 379 } 380 381 private static Set getReadVars(final String stmt) { 382 final Set readVars = new HashSet<>(); 383 readVars.addAll(getReadVarsFromAssignment(stmt)); 384 readVars.addAll(getReadVarsFromComparison(stmt)); 385 readVars.addAll(getReadVarsFromBinaryOps(stmt)); 386 return readVars; 387 388 } 389 390 /** 391 * Gets the variables that are read in an assignment. For example in 392 * {@code int i = readVar;} the read variable is {@code readVar}. APPENDIX A. SOURCE CODE 112

393 *

394 * According to JLS 7 §15.26 these are the twelve assignment operators: = *= 395 * /= %= += -= <<= >>= >>>= &= ^= |= 396 * 397 * @param s 398 * The string that might contain assignments 399 * @return A {@link Collection} of variables that were read in the 400 * assignment 401 */ 402 private static Collection getReadVarsFromAssignment(final String s) { 403 final List readVars = new ArrayList<>(); 404 405 final Matcher matcher = READ_VAR_VIA_ASSIGNMENT_REGEX.matcher(s); 406 while (matcher.find()) { 407 final String readVarName = matcher.group(1); 408 readVars.add(new Variable(readVarName)); 409 } 410 return readVars; 411 } 412 413 /** 414 * Gets all variables whose value was read during a binary operation such as {@code a + b} or {@code x << y}. 415 * @param s 416 * The string that might contain variables read via binary operations 417 * @return A {@link Collection} of variables whose value was read during a binary operation 418 * */ 419 private static Collection getReadVarsFromBinaryOps(final String s) { 420 final Collection readVars = new HashSet<>(); 421 final Matcher leftMatcher = LEFT_VAR_OF_BINARY_OP_REGEX.matcher(s); 422 while (leftMatcher.find()) { 423 final String leftReadVarName = leftMatcher.group(1); 424 readVars.add(new Variable(leftReadVarName)); 425 } 426 final Matcher rightMatcher = APPENDIX A. SOURCE CODE 113

RIGHT_VAR_OF_BINARY_OP_REGEX.matcher(s); 427 while (rightMatcher.find()) { 428 final String rightReadVarName = rightMatcher.group(1); 429 readVars.add(new Variable(rightReadVarName)); 430 } 431 return readVars; 432 433 } 434 435 private static Collection getReadVarsFromComparison(final String s) { 436 437 final Collection comparedVars = new HashSet<>(); 438 final Matcher leftMatcher = LEFT_COMPARED_VAR_REGEX.matcher(s); 439 while (leftMatcher.find()) { 440 final String leftReadVarName = leftMatcher.group(1); 441 comparedVars.add(new Variable(leftReadVarName)); 442 } 443 final Matcher rightMatcher = RIGHT_COMPARED_VAR_REGEX.matcher(s); 444 while (rightMatcher.find()) { 445 final String rightReadVarName = rightMatcher.group(1); 446 comparedVars.add(new Variable(rightReadVarName)); 447 } 448 return comparedVars; 449 } 450 451 private static List getSimplyAssignedVars(final String identifier) { 452 final List assignedVars = new ArrayList<>(); 453 final Matcher matcher = SIMPLE_ASSIGNMENT_REGEX.matcher(identifier); 454 while (matcher.find()) { 455 final String changedVar = matcher.group(1); 456 final Variable var = new Variable(changedVar); 457 assignedVars.add(var); 458 } 459 return assignedVars; 460 } 461 462 private static boolean isConstructor(final String beforeParens) { 463 return beforeParens.contains("new "); APPENDIX A. SOURCE CODE 114

464 } 465 466 private static boolean methodIsCalled(final String stmt, 467 final MethodCall call) { 468 final String callAsString = call.toString(); 469 return stmt.contains(callAsString); 470 } 471 472 private static boolean varIsSetToNull(final String stmt, final Variable var) { 473 final String varName = var.getName(); 474 return stmt.matches(".*\\s*" + varName + "\\s*=\\s*null.*"); 475 } 476 477 } Listing A.1: Parser for ECCO’s Java representation.
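The detection logic of Listing A.1 can be illustrated in isolation. The following sketch reimplements its null-receiver heuristic on plain strings, without ECCO's Node, MethodCall, and Variable types: split a node into statements at semicolons, remember the last position where the variable is assigned null and the last position where a method is called on it, and report a potential NPE when the call comes after the null assignment. The class name NpeHeuristic and the regular expressions are illustrative assumptions, not part of the thesis code.

```java
import java.util.regex.Pattern;

/**
 * Minimal, self-contained sketch of the null-receiver heuristic from
 * Listing A.1, operating on plain strings instead of ECCO nodes.
 */
public class NpeHeuristic {

    /**
     * Returns true if {@code var} is set to null in some statement of
     * {@code code} and a method is called on it in a later statement.
     */
    public static boolean callAfterNullAssignment(String code, String var) {
        // A node can consist of multiple statements separated by semicolons
        String[] stmts = code.split(";");
        Pattern nullAssign = Pattern.compile(".*\\b" + Pattern.quote(var) + "\\s*=\\s*null.*");
        Pattern methodCall = Pattern.compile(".*\\b" + Pattern.quote(var) + "\\s*\\.\\s*\\w+\\s*\\(.*");
        int setToNullAt = -1;
        int calledAt = -1;
        for (int i = 0; i < stmts.length; i++) {
            if (nullAssign.matcher(stmts[i]).matches()) {
                setToNullAt = i;
            }
            if (methodCall.matcher(stmts[i]).matches()) {
                calledAt = i;
            }
        }
        // Suspicious only if both events occur and the call comes later
        return setToNullAt != -1 && calledAt != -1 && calledAt > setToNullAt;
    }
}
```

As in the thesis code, this is a purely positional heuristic: it ignores control flow, so a call guarded by a null check would still be flagged.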

package eu.matthiasbraun.codeAnalysis;

import java.io.File;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Joiner;
import com.google.common.base.Optional;

import eu.matthiasbraun.codeAnalysis.config.EccoConf;
import eu.matthiasbraun.pmd.rules.helpers.FileUtil;
import eu.matthiasbraun.pmd.rules.helpers.KeyNotFoundException;
import eu.matthiasbraun.pmd.rules.helpers.StringUtil;
import eu.matthiasbraun.pmd.rules.helpers.SysUtil;

/**
 * Executes different code analysis tools and collects their output.
 *
 * @author Matthias Braun
 */
public class CodeChecker {

    private static final Logger log = LoggerFactory.getLogger(CodeChecker.class);
    /*
     * All Java files end with '.java'. Use this regex to find all Java source
     * files in a project.
     */
    private static final String JAVA_FILES_REGEX = ".*\\.java$";

    // javac reads the Java files to compile from this file
    private static final File FILES_TO_COMPILE_TXT = new File("filesToCompile.txt");

    public static Optional<String> callPmd(final File file) {
        Optional<String> outputMaybe = Optional.absent();
        try {
            /*
             * Path to the PMD executable relative to the directory of this
             * project
             */
            final String pmdPath = EccoConf.get(EccoConf.ABS_PATH_TO_PMDF);
            // The XML file defining which rules should be applied
            final String ruleSet = EccoConf.get(EccoConf.PMD_RULE_SET);
            // The custom PMD rules have to be in a Jar so PMD can use them
            putPmdRulesInJar();
            final List<String> pmdCommand = new ArrayList<>();
            // Executable batch file
            pmdCommand.add(pmdPath);
            // Specify the directory with the code to analyze
            pmdCommand.add("-dir");
            pmdCommand.add(file.getAbsolutePath());
            // These rules are used for checking
            pmdCommand.add("-rulesets");
            pmdCommand.add(ruleSet);
            // Omit the directory path of the class where bugs are found
            pmdCommand.add("-shortnames");
            // This prints the loaded rules and detailed error messages
            // pmdCommand.add("-debug");

            outputMaybe = SysUtil.call(pmdCommand);
        } catch (final KeyNotFoundException e) {
            log.warn(e.getMessage(), e);
        }

        return outputMaybe;
    }

    public static void main(final String[] args) {
        try {
            // Check this project for bugs
            final String checkThisProject = EccoConf.get(EccoConf.CHECK_THIS_PROJECT);
            callFindBugs(checkThisProject);
            callJavac(checkThisProject);
            final Optional<String> pmdOutput = callPmd(new File(checkThisProject));
            if (pmdOutput.isPresent()) {
                log.info(pmdOutput.get());
            } else {
                log.warn("No PMD output");
            }
        } catch (final KeyNotFoundException e) {
            log.error(e.getMessage(), e);
        }
    }

    /**
     * Calls javac, the Java compiler, to get errors and warnings about the
     * code to analyze.
     *
     * @param checkThisProject
     *            The absolute path to the project whose code should be
     *            analyzed.
     * @throws KeyNotFoundException
     *             Thrown when a key in a property file is not found.
     */
    private static void callJavac(final String checkThisProject)
            throws KeyNotFoundException {
        final String javacPath = EccoConf.get(EccoConf.ABSOLUTE_PATH_TO_JAVAC);

        final Path projectPath = FileUtil.toPath(checkThisProject);
        /*
         * javac can't look for files recursively -> create a file containing
         * the paths to all Java files that javac should compile and pass that
         * file as a parameter to javac
         */
        final boolean recursive = true;
        final List<Path> filePaths = FileUtil.listFiles(projectPath, recursive,
                JAVA_FILES_REGEX);
        final String filePathsStr = Joiner.on(FileUtil.EOL).join(filePaths);
        FileUtil.write(filePathsStr, FILES_TO_COMPILE_TXT);

        final List<String> javacCommand = new ArrayList<>();
        javacCommand.add(javacPath);
        // Get more warnings
        javacCommand.add("-Xlint");
        javacCommand.add("@" + FILES_TO_COMPILE_TXT);
        final Optional<String> outputMaybe = SysUtil.call(javacCommand);
        if (outputMaybe.isPresent()) {
            log.info(outputMaybe.get());
        } else {
            log.warn("Unsuccessful call: {}", javacCommand);
        }
    }

    /**
     * The custom rules for PMD must be put in a Jar file in the lib folder of
     * PMD. Otherwise PMD won't find the rules.
     *
     * @throws KeyNotFoundException
     */
    private static void putPmdRulesInJar() throws KeyNotFoundException {
        // The folder containing the .class files
        final String binDir = "bin/";
        final String pmdRulesJar = EccoConf.get(EccoConf.CUSTOM_PMD_RULES_JAR);
        final String pmdRules = EccoConf.get(EccoConf.CUSTOM_PMD_RULES_DIR);
        final List<String> jarCommand = new ArrayList<>();
        // Create a new Jar archive and specify its file name
        jarCommand.add("jar");
        jarCommand.add("-cf");
        // Put the Jar into PMD's lib directory so PMD finds it
        jarCommand.add(pmdRulesJar);
        // Change to the bin directory
        jarCommand.add("-C");
        jarCommand.add(binDir);
        // Include the class files from this directory
        jarCommand.add(pmdRules);
        SysUtil.call(jarCommand);
    }

    private static void callFindBugs(final String checkThisProject)
            throws KeyNotFoundException {
        /*
         * Path to the FindBugs executable relative to the directory of this
         * project
         */
        final String findBugsPath = EccoConf.get(EccoConf.RELATIVE_PATH_TO_FIND_BUGS);

        // Add the parameters
        final String[] findBugsCommand = { findBugsPath, "-textui", checkThisProject };

        final Optional<String> output = SysUtil.call(findBugsCommand);
        if (output.isPresent()) {
            log.info(output.get());
        } else {
            log.warn("Unsuccessful call: {}", StringUtil.asString(findBugsCommand));
        }
    }
}

Listing A.2: CodeChecker wrapper that uses FindBugs, PMD, and javac with lint to detect invalid artifacts.

package eu.matthiasbraun.pmd.rules;

import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import net.sourceforge.pmd.lang.java.ast.ASTBlock;
import net.sourceforge.pmd.lang.java.ast.ASTBlockStatement;
import net.sourceforge.pmd.lang.java.ast.ASTPrimaryExpression;
import net.sourceforge.pmd.lang.java.ast.ASTStatementExpression;
import net.sourceforge.pmd.lang.java.rule.AbstractJavaRule;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Optional;

import eu.matthiasbraun.pmd.rules.helpers.FileUtil;
import eu.matthiasbraun.pmd.rules.helpers.PmdUtil;
import eu.matthiasbraun.pmd.rules.helpers.VarAssignment.VarAssignment;
import eu.matthiasbraun.pmd.rules.helpers.VarAssignment.VarAssignmentFactory;
import eu.matthiasbraun.pmd.rules.helpers.VarAssignment.Variable;

/**
 * Every field must be read before being assigned a new value.
 *
 * For every block of Java code this rule keeps a list of fields. If a field
 * receives a new value multiple times without its value being read or a method
 * being called in between, this rule is violated.
 *
 * @author Matthias Braun
 */
public class MultipleFieldAssignment extends AbstractJavaRule {

    private static final Logger log = LoggerFactory
            .getLogger(MultipleFieldAssignment.class);

    public void createRuleViolation(final ASTBlockStatement blockStmt,
            final Variable assignee) {
        final String message = "Repeated assignment to field "
                + assignee.getName() + " in " + PmdUtil.getLocation(blockStmt)
                + FileUtil.EOL;
        System.err.println(message);
        final File outputFile = new File("ruleViolations.txt");
        FileUtil.append(message, outputFile);
        log.warn("Repeated assignment to field {} in {}", assignee.getName(),
                PmdUtil.getLocation(blockStmt));
    }

    @Override
    public Object visit(final ASTBlock node, final Object data) {
        /*
         * Get all block statements which contain the assignments and method
         * calls from the current block of Java code
         */
        final List<ASTBlockStatement> blockStatements = node
                .findChildrenOfType(ASTBlockStatement.class);

        checkForMultipleFieldAssignments(blockStatements);

        return super.visit(node, data);
    }

    /**
     * Sees if a field is assigned multiple times without its value being read
     * in between those assignments.
     *
     * @param blockStatements
     *            The statements of a block of Java code. They comprise method
     *            calls and variable assignments.
     */
    private void checkForMultipleFieldAssignments(
            final List<ASTBlockStatement> blockStatements) {
        /*
         * These fields are suspicious: they were set to a value other than
         * null and have not been read so far. If they get assigned a new
         * value, their previous assignment was superfluous and we'll create a
         * rule violation.
         */
        final List<Variable> watchedVars = new ArrayList<>();

        /*
         * A block statement can contain a variable assignment and/or a method
         * call
         */
        for (final ASTBlockStatement blockStmt : blockStatements) {
            /*
             * Remove all watched variables that are used in this block
             * statement
             */
            removeUsedFields(watchedVars, blockStmt);
            /*
             * Create a warning if this is an assignment to a field that wasn't
             * used since its last assignment
             */
            final Collection<VarAssignment> assignments = VarAssignmentFactory
                    .fromBlock(blockStmt);

            for (final VarAssignment assignment : assignments) {

                final Variable assignee = assignment.getAssignee();
                if (assignee == null) {
                    log.warn(
                            "Block statement is a variable assignment but could not get variable name: {}",
                            PmdUtil.getLocation(blockStmt));
                    continue;
                }
                // This rule is only interested in fields
                if (assignee.isField()) {
                    /*
                     * Field was assigned a new value without its old one being
                     * read previously -> create a warning
                     */
                    if (watchedVars.contains(assignee)) {
                        createRuleViolation(blockStmt, assignee);
                    }
                    watchedVars.add(assignee);
                }
            }
        }
    }

    /**
     * A field is considered used when it is assigned to another variable or a
     * method is called.
     *
     * @param watchedFields
     *            List of currently suspicious fields; they weren't used yet.
     * @param blockStmt
     *            An {@link ASTBlockStatement} that can be a variable
     *            assignment or a method call.
     */
    private void removeUsedFields(final List<Variable> watchedFields,
            final ASTBlockStatement blockStmt) {
        final Optional<ASTStatementExpression> stmtExpMaybe = PmdUtil
                .getStatementExpression(blockStmt);

        /*
         * First check if a method is called: this resets the list of watched
         * fields because the method might have accessed them
         */
        if (stmtExpMaybe.isPresent()) {

            final ASTStatementExpression stmtExp = stmtExpMaybe.get();
            for (final ASTPrimaryExpression primeExp : stmtExp
                    .findChildrenOfType(ASTPrimaryExpression.class)) {
                if (PmdUtil.containsMethodCall(primeExp)) {
                    watchedFields.clear();
                }
            }
        }
        /*
         * A field was read during an assignment -> remove it from the watch
         * list
         */
        final Collection<VarAssignment> assignments = VarAssignmentFactory
                .fromBlock(blockStmt);
        for (final VarAssignment assignment : assignments) {
            watchedFields.removeAll(assignment.getRightSideVars());
        }
    }
}

Listing A.3: Custom rule for the PMD rule-checking engine. It detects redundant assignments to fields in Java code.
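The core of Listing A.3 is a watch-list algorithm: a field enters the list when it is assigned, leaves the list when it is read, and the whole list is cleared on a method call. The following self-contained sketch applies the same idea to plain statement strings instead of PMD's AST; the class name, regex, and the simplified notions of "read" and "call" are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Self-contained sketch of the watch-list algorithm behind Listing A.3,
 * operating on plain statement strings instead of PMD's AST.
 */
public class RepeatedAssignmentSketch {

    // Matches a simple assignment "var = rightSide"
    private static final Pattern ASSIGNMENT = Pattern.compile("^\\s*(\\w+)\\s*=\\s*(.+)$");

    /**
     * Returns the variables that are assigned twice with neither a read of
     * their value nor a method call in between.
     */
    public static List<String> findRedundantAssignments(List<String> stmts) {
        List<String> watched = new ArrayList<>();
        List<String> violations = new ArrayList<>();
        for (String stmt : stmts) {
            Matcher m = ASSIGNMENT.matcher(stmt);
            if (m.matches()) {
                String assignee = m.group(1);
                final String rightSide = m.group(2);
                // Reading a watched variable on the right side clears it
                watched.removeIf(v -> rightSide.matches(".*\\b" + v + "\\b.*"));
                if (watched.contains(assignee)) {
                    violations.add(assignee);
                }
                watched.add(assignee);
            } else if (stmt.contains("(")) {
                // A method call may read any field -> reset the watch list
                watched.clear();
            }
        }
        return violations;
    }
}
```

Clearing the whole list on any call is deliberately conservative, mirroring the thesis rule: the called method might read any of the watched fields, so none of them can still be flagged safely.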

package eu.matthiasbraun.pmd.rules;

import java.io.File;

import net.sourceforge.pmd.lang.java.ast.ASTPrimaryExpression;
import net.sourceforge.pmd.lang.java.rule.AbstractJavaRule;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Optional;

import eu.matthiasbraun.pmd.rules.helpers.FileUtil;
import eu.matthiasbraun.pmd.rules.helpers.MethodCall;
import eu.matthiasbraun.pmd.rules.helpers.PmdUtil;

/**
 * This rule is violated when the same setter method is called multiple times
 * on the same object without other method calls happening in between.
 *
 * Example of a violation:
 *
 * myObj.setFoo(5);
 * int x = 1;
 * myObj.setFoo(9);
 *
 * This is OK:
 *
 * myObj.setFoo(5);
 * bar(); // The value of foo might get used in bar()
 * myObj.setFoo(9);
 *
 * @author Matthias Braun
 */
public class MultipleSetterCall extends AbstractJavaRule {

    private static final Logger logger = LoggerFactory
            .getLogger(MultipleSetterCall.class);
    /**
     * The previously called setter method. It should not be called again
     * until another method is invoked.
     */
    private static MethodCall previousSetterMethod;

    @Override
    public Object visit(final ASTPrimaryExpression node, final Object data) {
        final Optional<String> methodNameMaybe = PmdUtil.getMethodName(node);
        // A method was called
        if (methodNameMaybe.isPresent()) {
            final String methodName = methodNameMaybe.get();
            final MethodCall meth = new MethodCall(methodName, node);

            if (meth.isSetter()) {
                if (meth.equals(previousSetterMethod)) {
                    final String message = "Repeated call of setter " + meth
                            + " in " + PmdUtil.getLocation(node) + FileUtil.EOL;
                    System.err.println(message);
                    final File outputFile = new File("ruleViolations.txt");
                    FileUtil.append(message, outputFile);
                } else {
                    /*
                     * The current setter method might have used variables that
                     * were set in the previous setter call -> this setter call
                     * might have been useful and is not suspicious anymore
                     */
                    previousSetterMethod = meth;
                }
            }
            /*
             * This method call was not a setter. It might have used the value
             * set in the previous setter method -> the previous setter method
             * is not suspicious anymore
             */
            else {
                previousSetterMethod = null;
            }
        }
        return super.visit(node, data);
    }
}

Listing A.4: Custom rule for the PMD rule-checking engine. Similar to Listing A.3, it checks whether there are redundant setter calls in code.
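Stripped of the PMD visitor machinery, Listing A.4 reduces to tracking the previously seen setter across a sequence of calls. The sketch below runs that reduced logic on plain call strings; the class name RepeatedSetterSketch, the "receiver.setXyz(...)" regex, and the string-based receiver comparison are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of the repeated-setter heuristic from Listing A.4,
 * applied to a plain sequence of call strings such as "myObj.setFoo(5)".
 */
public class RepeatedSetterSketch {

    // Treat "receiver.setXyz(...)" as a setter call
    private static boolean isSetter(String call) {
        return call.matches("\\w+\\.set\\w+\\(.*\\)");
    }

    /**
     * Returns the calls that repeat the previous setter (same receiver and
     * setter name) with no other method call in between.
     */
    public static List<String> findRepeatedSetters(List<String> calls) {
        List<String> violations = new ArrayList<>();
        String previousSetter = null;
        for (String call : calls) {
            if (isSetter(call)) {
                // Compare receiver and setter name, ignoring the arguments
                String receiverAndName = call.substring(0, call.indexOf('('));
                if (receiverAndName.equals(previousSetter)) {
                    violations.add(call);
                } else {
                    previousSetter = receiverAndName;
                }
            } else {
                // Any other call might read the value set before
                previousSetter = null;
            }
        }
        return violations;
    }
}
```

As in the thesis rule, a statement that is not a method call (such as an assignment) does not reset the suspicion, which is why the "int x = 1;" example in the Javadoc of Listing A.4 still counts as a violation.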

package eu.matthiasbraun.pmd.rules.helpers;

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import net.sourceforge.pmd.lang.java.ast.ASTArguments;
import net.sourceforge.pmd.lang.java.ast.ASTBlock;
import net.sourceforge.pmd.lang.java.ast.ASTBlockStatement;
import net.sourceforge.pmd.lang.java.ast.ASTExpression;
import net.sourceforge.pmd.lang.java.ast.ASTLocalVariableDeclaration;
import net.sourceforge.pmd.lang.java.ast.ASTName;
import net.sourceforge.pmd.lang.java.ast.ASTPrimaryExpression;
import net.sourceforge.pmd.lang.java.ast.ASTPrimaryPrefix;
import net.sourceforge.pmd.lang.java.ast.ASTPrimarySuffix;
import net.sourceforge.pmd.lang.java.ast.ASTStatement;
import net.sourceforge.pmd.lang.java.ast.ASTStatementExpression;
import net.sourceforge.pmd.lang.java.ast.ASTType;
import net.sourceforge.pmd.lang.java.ast.ASTVariableDeclarator;
import net.sourceforge.pmd.lang.java.ast.AbstractJavaNode;
import net.sourceforge.pmd.lang.java.symboltable.NameDeclaration;
import net.sourceforge.pmd.lang.java.symboltable.VariableNameDeclaration;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Optional;

import eu.matthiasbraun.pmd.rules.helpers.VarAssignment.Variable;

/**
 * Utility functions for PMD rules.
 *
 * @author Matthias Braun
 */
public class PmdUtil {

    private static final Logger log = LoggerFactory.getLogger(PmdUtil.class);

    /**
     * Does this {@link ASTPrimaryExpression} contain a method call?
     *
     * Note that this node might be an assignment using a method call like
     * {@code String s = getString();}
     *
     * @param node
     *            The {@link ASTPrimaryExpression} that might contain a method
     *            call.
     * @return Whether the {@code node} contains a method call.
     */
    public static boolean containsMethodCall(final ASTPrimaryExpression node) {
        boolean isMethodCall = false;
        final List<ASTPrimarySuffix> suffixes = node
                .findDescendantsOfType(ASTPrimarySuffix.class);
        /*
         * "this.foo(bar)" has two suffixes: "foo" and the argument list "bar"
         */
        for (final ASTPrimarySuffix suffix : suffixes) {
            if (suffix.isArguments()) {
                isMethodCall = true;
                break;
            }
        }
        return isMethodCall;
    }

    public static String getFullClassName(final AbstractJavaNode node) {
        final String packageName = node.getScope()
                .getEnclosingSourceFileScope().getPackageName();
        final String className = node.getScope().getEnclosingClassScope()
                .getClassName();
        final String fullClassName = packageName + "." + className;
        return fullClassName;
    }

    public static int getLineNr(final AbstractJavaNode node) {
        return node.getBeginLine();
    }

    /**
     * Gets the location of a node, consisting of its full class name and the
     * line number.
     *
     * @param node
     *            Node in whose location we are interested.
     * @return The location of the node (class name + line number).
     */
    public static String getLocation(final AbstractJavaNode node) {
        final String className = getFullClassName(node);
        final int line = getLineNr(node);
        final String location = className + " at " + line;
        return location;
    }

    /**
     * Gets the name of a method from a primary expression in the syntax tree.
     *
     * @param node
     *            The primary expression node within the syntax tree which
     *            might have a method name among its children.
     * @return The method name, wrapped in an {@link Optional} in case the
     *         primary expression has no method name.
     */
    public static Optional<String> getMethodName(final ASTPrimaryExpression node) {
        String methodName = null;
        if (containsMethodCall(node)) {
            // Works for method calls like "obj.foo()"
            final ASTPrimaryPrefix astPrefix = node
                    .getFirstChildOfType(ASTPrimaryPrefix.class);
            final ASTName astName = astPrefix.getFirstChildOfType(ASTName.class);
            if (astName != null) {
                methodName = astName.getImage();
            }
            // Works for method calls like "this.obj.foo()"
            else {
                final List<ASTPrimarySuffix> suffixes = node
                        .findChildrenOfType(ASTPrimarySuffix.class);

                if (suffixes.size() > 1) {
                    /*
                     * The penultimate suffix contains the method name (the
                     * ultimate contains the method arguments)
                     */
                    final ASTPrimarySuffix suffixWithMethodName = suffixes
                            .get(suffixes.size() - 2);
                    methodName = suffixWithMethodName.getImage();
                }
            }
        }

        return Optional.fromNullable(methodName);
    }

    public static int getNrOfMethodArgs(final ASTPrimaryExpression node) {
        int nrOfMethodArgs = 0;
        final List<ASTArguments> argumentsNodes = node
                .findDescendantsOfType(ASTArguments.class);
        if (argumentsNodes.size() == 1) {
            nrOfMethodArgs = argumentsNodes.get(0).getArgumentCount();
        }

        return nrOfMethodArgs;
    }

    /**
     * Gets the variables whose value is assigned to a newly created local
     * variable.
     *
     * There may be more than one variable involved if the variable
     * initialization uses a ternary expression:
     *
     * {@code int x = foo() ? firstName : secondName}.
     *
     * @param locVarDeclaration
     *            Node describing the local variable declaration in the
     *            abstract syntax tree.
     * @return The variables whose value is read. This may be none in case
     *         there is no variable used on the right-hand side of the
     *         assignment; the new local variable could get its value from a
     *         literal:
     *
     *         {@code int x = 1;}
     */
    public static List<Variable> getRightHandVars(
            final ASTLocalVariableDeclaration locVarDeclaration) {
        final List<Variable> rightHandVars = new ArrayList<>();
        /*
         * The type of the right-hand vars is assumed to be the type of the
         * assignee
         */
        final Class<?> type = getVarType(locVarDeclaration);

        final List<ASTVariableDeclarator> declarators = locVarDeclaration
                .findDescendantsOfType(ASTVariableDeclarator.class);

        /*
         * Multiple variables of the same type can be initialized in one
         * expression: int x = 1, y = 2;
         */
        for (final ASTVariableDeclarator declarator : declarators) {
            final List<ASTPrimaryExpression> primeExps = declarator
                    .findDescendantsOfType(ASTPrimaryExpression.class);
            for (final ASTPrimaryExpression primeExp : primeExps) {
                final Optional<String> varNameMaybe = PmdUtil.getVarName(primeExp);
                if (varNameMaybe.isPresent()) {
                    /*
                     * A field of a variable might have been used in the
                     * variable declaration -> turn 'myVar.something' into
                     * 'myVar'
                     */
                    final String varName = StringUtil.subBefore(
                            varNameMaybe.get(), ".");

                    final Variable rightHandVar = new Variable(varName, type);
                    rightHandVars.add(rightHandVar);
                }
            }

        }
        return rightHandVars;
    }

    /**
     * Parses the right-hand variables whose value is used in an assignment
     * from an {@link ASTStatementExpression}.
     *
     * @param stmtExp
     *            The statement expression that might hold the data about a
     *            variable assignment.
     * @return List of {@link Variable}s used on the right side of an
     *         assignment.
     */
    public static List<Variable> getRightHandVars(
            final ASTStatementExpression stmtExp) {

        final List<Variable> rightHandVars = new ArrayList<>();

        final List<ASTExpression> expressions = stmtExp
                .findChildrenOfType(ASTExpression.class);
        if (expressions.size() == 1) {
            final ASTExpression exp = expressions.get(0);
            final List<ASTPrimaryExpression> primeExps = exp
                    .findDescendantsOfType(ASTPrimaryExpression.class);
            for (final ASTPrimaryExpression primeExp : primeExps) {
                /*
                 * Parse the variable from the primary expression if it's not a
                 * method call
                 */
                if (!containsMethodCall(primeExp)) {
                    final Variable var = new Variable(primeExp);
                    rightHandVars.add(var);
                }
            }

        } else if (expressions.size() > 1) {
            log.warn("More than one ASTExpression in {}", getLocation(stmtExp));
        }

        return rightHandVars;
    }

    /**
     * Gets the {@link ASTStatementExpression} from an {@link ASTBlockStatement}.
     *
     * @param blockStmt
     *            Block statement that might contain the statement expression.
     * @return An {@link ASTStatementExpression} wrapped in an {@link Optional}
     *         in case the {@code blockStmt} did not contain a statement
     *         expression.
     */
    public static Optional<ASTStatementExpression> getStatementExpression(
            final ASTBlockStatement blockStmt) {
        return Optional.fromNullable(blockStmt
                .getFirstDescendantOfType(ASTStatementExpression.class));
    }

    /**
     * Gets the top-level statement expressions of this {@code block}.
     *
     * Doesn't look for statements in nested blocks within {@code block}.
     *
     * @param block
     *            The {@link ASTBlock} from which the list of
     *            {@link ASTStatementExpression}s is extracted.
     *
     * @return A list of {@link ASTStatementExpression}s
     */
    public static List<ASTStatementExpression> getStatementExpressions(final ASTBlock block) {
        final List<ASTStatementExpression> stmtExpressions = new ArrayList<>();

        final List<ASTBlockStatement> blockStmts = block
                .findChildrenOfType(ASTBlockStatement.class);
        for (final ASTBlockStatement blockStmt : blockStmts) {

            final String fullClassName = getFullClassName(blockStmt);
            final int currLine = getLineNr(blockStmt);

            final List<ASTStatement> astStmts = blockStmt
                    .findChildrenOfType(ASTStatement.class);
            if (astStmts.size() == 1) {
                final List<ASTStatementExpression> stmtExps = astStmts.get(0)
                        .findChildrenOfType(ASTStatementExpression.class);
                if (stmtExps.size() == 1) {
                    final ASTStatementExpression stmtExp = stmtExps.get(0);
                    stmtExpressions.add(stmtExp);

                } else if (stmtExps.size() > 1) {
                    log.warn("Nr of ASTStatementExpressions in {} at {}: {}",
                            fullClassName, currLine, stmtExps.size());
                }
            } else if (astStmts.size() > 1) {
                log.warn("Nr of ASTStatements in {} at {}: {}", fullClassName,
                        currLine, astStmts.size());
            }
        }
        return stmtExpressions;
    }

    /**
     * Gets the {@link Class type} of a variable from its
     * {@link ASTLocalVariableDeclaration}.
     *
     * @param declaration
     *            The {@link ASTLocalVariableDeclaration} containing the type
     *            of the declared variable.
     * @return Type of the variable.
     */
    public static Class<?> getVarType(final ASTLocalVariableDeclaration declaration) {
        final ASTType astType = declaration.getTypeNode();
        final Class<?> varType = astType.getType();
        return varType;
    }

    /**
     * Checks if this {@link ASTName} refers to a field of the current class.
     *
     * @param var
     *            The {@link ASTName} containing the information about the
     *            variable.
     * @return Whether the variable is a field.
     */
    public static boolean isFieldFromThisClass(final ASTName var) {
        boolean isField = false;
        // Get the line where the variable was declared
        final NameDeclaration nameDec = var.getNameDeclaration();
        /*
         * The variable must be declared in the current class; otherwise its
         * name declaration is null
         */
        if (nameDec != null) {
            final int varDeclarationLine = nameDec.getNode().getBeginLine();
            // Name of the variable
            final String varName = var.getImage();

            // Get the variable declarations of the class
            final Set<VariableNameDeclaration> varDecs = var.getScope()
                    .getEnclosingClassScope().getVariableDeclarations()
                    .keySet();

            for (final VariableNameDeclaration varDec : varDecs) {
                final int fieldDeclarationLine = varDec.getNode().getBeginLine();
                final String fieldName = varDec.getDeclaratorId().getImage();
                /*
                 * There is a field declared on the same line which has the
                 * same name as the variable -> the variable is a field
                 */
                if ((fieldDeclarationLine == varDeclarationLine)
                        && fieldName.equals(varName)) {
                    isField = true;
                    break;
                }
            }
        }
        return isField;
    }

    /**
     * Gets the variable name from an {@link ASTPrimaryExpression}.
     *
     * @param primeExp
     *            The {@link ASTPrimaryExpression} that might contain a

variable 363 * name. 364 * @return The variable name wrapped in an {@code Optional} in case 365 * {@code primeExp} contained the name of a method or a literal. 366 */ 367 private static Optional getVarName( 368 final ASTPrimaryExpression primeExp) { 369 String varName = null; 370 371 if (!containsMethodCall(primeExp)) { 372 /* 373 * If the primary expression is not a method call it might contain a 374 * variable 375 */ 376 final ASTPrimaryPrefix primePrefix = primeExp 377 .getFirstChildOfType(ASTPrimaryPrefix.class); 378 if (primePrefix != null){ 379 final ASTName astName = primePrefix 380 .getFirstChildOfType(ASTName.class); 381 if (astName != null){ 382 varName = astName.getImage(); 383 }// else the primary expression contains a literal 384 } else { 385 log.warn( 386 "Could not find primary prefix in primary expression: {}", 387 getLocation(primeExp)); 388 } 389 } 390 return Optional.fromNullable(varName); 391 } 392 } Listing A.5: Utility class used for interacting with the PMD framework and working with its abstract syntax tree. Used by the rules described in A.3 and A.4.

1 package at.jku.sea.plt.core.compose.rules.equivalence; 2 3 import java.util.ArrayList; 4 import java.util.Collection; 5 import java.util.List;

6 import java.util.Set; 7 8 import org.slf4j.Logger; 9 import org.slf4j.LoggerFactory; 10 11 import at.jku.sea.plt.core.artifact.Artifact; 12 import at.jku.sea.plt.core.artifact.Node; 13 import at.jku.sea.plt.core.compose.rules.data.CodeBlock; 14 15 /** 16 * If multiple nodes differ only in the order of adding listeners to GUI objects, they are considered to behave the same. 17 * This rule detects blocks that do the same in this respect. 18 */ 19 public class AddListenerEquivalenceRule { 20 21 /** 22 * Matches strings like {@code obj.addActionListener(listener)} or 23 * {@code addlistener(obj)}. Used to detect add listener calls whose order 24 * shouldn’t matter. 25 *

26 * The {@code (?s)} makes the dot match linebreaks -> statements containing 27 * anonymous listener classes stretching over multiple lines are matched 28 * too. 29 */ 30 private static final String ADD_LISTENER_REGEX = "(?s).*\\.add.*[Ll]istener.*"; 31 final static Logger log = LoggerFactory.getLogger(AddListenerEquivalenceRule.class); 32 33 /** 34 * Check if a {@code groupOfBlocks} are semantically equivalent. 35 * 36 * @param groupOfBlocks 37 * Multiple blocks consisting of code statements. 38 * @return Whether the {@code groupOfBlocks} are semantically equivalent. 39 */ 40 public static boolean blocksAreEquivalent( APPENDIX A. SOURCE CODE 137

41 final List groupOfBlocks) { 42 boolean blocksAreEquivalent = false; 43 final List ignoreTheseNodes = getStatementsThatAreTheSameInAllBlocks(groupOfBlocks); 44 blocksAreEquivalent = statementsAreAllAddListenerCalls(groupOfBlocks, 45 ignoreTheseNodes); 46 47 return blocksAreEquivalent; 48 } 49 50 /** 51 * Compare the single block ({@code block}) against all blocks for equality. 52 * @param blocks the {@link CodeBlock}s that might be equal to {@code block} 53 * @param block the {@link CodeBlock} that might be equal to the other {@code blocks} 54 * @return whether the blocks are equivalent according to this rule 55 */ 56 public static boolean blocksAreEquivalent(final Set blocks, 57 final CodeBlock block) { 58 final List blocksAsList = new ArrayList<>(blocks); 59 blocksAsList.add(block); 60 return blocksAreEquivalent(blocksAsList); 61 } 62 63 /** 64 * Compare the single block ({@code block}) against all blocks for equality. 65 * @param blocks the code blocks, which are a set of grouped {@link Node}s, that might be equal to {@code block} 66 * @param block the {@link CodeBlock} that might be equal to the other {@code blocks} 67 * @return whether the blocks are equivalent according to this rule 68 */ 69 public static boolean blocksAreEquivalent(final Set> blocks, 70 final List currBlock) { 71 APPENDIX A. SOURCE CODE 138

72 // Make defensive copies 73 final CodeBlock codeBlock = CodeBlock.fromNodes(currBlock); 74 final List codeBlocks = asCodeBlocks(blocks); 75 codeBlocks.add(codeBlock); 76 return blocksAreEquivalent(codeBlocks); 77 } 78 79 /** 80 * Converts a set of {@link Node}s that are grouped inside multiple lists into 81 * the more convenient to work with and easier to understand abstraction of {@link CodeBlock}s. 82 * @param blocks which are to be converted 83 * @return the blocks converted to the {@link CodeBlock} type 84 */ 85 private static List asCodeBlocks(final Set> blocks) { 86 87 final List codeBlocks = new ArrayList<>(); 88 for (final Collection codeStmts : blocks) { 89 codeBlocks.add(CodeBlock.fromNodes(codeStmts)); 90 } 91 return codeBlocks; 92 } 93 94 /** 95 * When deciding whether code blocks only differ in their way of adding listeners, 96 * we have to filter out the statements that are exactly the same in each code block. 97 */ 98 private static List getStatementsThatAreTheSameInAllBlocks( 99 final List groupOfBlocks) { 100 final List sameInAllBlocks = new ArrayList<>(); 101 final CodeBlock firstBlock = groupOfBlocks.get(0); 102 for (int i = 0; i < firstBlock.size(); i++) { 103 final Node statement = firstBlock.getStmt(i); 104 if (isSameInAllBlocks(statement, i, groupOfBlocks)) { 105 sameInAllBlocks.add(statement); 106 } 107 } 108 return sameInAllBlocks; 109 }

110 111 /** 112 * Determines whether this statement represented as a {@link Node} is an add listener call. 113 */ 114 private static boolean isAddListenerCall(final Node statement) { 115 final Artifact artifact = statement.getArtifact(); 116 final String statementStr = artifact.getIdentifier(); 117 118 return statementStr.matches(ADD_LISTENER_REGEX); 119 } 120 121 /** 122 * See if one node of a block is at the same position across every other 123 * block. 124 * 125 * @param node 126 * The node which might be at the same position in every block. 127 * @param pos 128 * The position of the node within its block. 129 * @param groupOfBlocks 130 * All blocks whose node at {@code pos} is compared with 131 * {@code node}. 132 * @return Whether the node is at the same position across {@code allBlocks} 133 * . 134 * 135 */ 136 private static boolean isSameInAllBlocks(final Node node, final int pos, 137 final List groupOfBlocks) { 138 boolean isTheSameInEveryBlock = true; 139 final Artifact statement = node.getArtifact(); 140 for (final CodeBlock otherBlock : groupOfBlocks) { 141 // The other block must have a node at this position to compare with 142 if (pos >= otherBlock.size()) { 143 isTheSameInEveryBlock = false; 144 break; 145 } else { APPENDIX A. SOURCE CODE 140

146 /* 147 * Compare the artifact of the node which is a modifier, a 148 * statement, etc 149 */ 150 final Artifact statementInOtherBlock = otherBlock.getStmt(pos) 151 .getArtifact(); 152 if (!statementInOtherBlock.equals(statement)) { 153 isTheSameInEveryBlock = false; 154 break; 155 } 156 } 157 } 158 return isTheSameInEveryBlock; 159 } 160 161 /** 162 * Helper method that checks if {@code statements} are add listener calls exclusively. 163 * @return true if all the {@code statements} are add listener calls 164 */ 165 private static boolean nothingButAddListenerMethods( 166 final List statements) { 167 boolean statementsAreAllAddListenerCalls = true; 168 for (final Node statement : statements) { 169 // We found one statement that is not an addListener call 170 if (!isAddListenerCall(statement)) { 171 statementsAreAllAddListenerCalls = false; 172 break; 173 } 174 } 175 return statementsAreAllAddListenerCalls; 176 } 177 178 /** 179 * Checks if the {@code groupOfBlocks} is equivalent according to the add listener equivalence rule. 180 * @return true if all the statements in the {@code groupOfBlocks} are add listener calls while ignoring {@code ignoreTheseNodes}

181 */ 182 private static boolean statementsAreAllAddListenerCalls( 183 final List groupOfBlocks, 184 final List ignoreTheseNodes) { 185 boolean allStatementsAreAddListenerCalls = true; 186 for (final CodeBlock block : groupOfBlocks) { 187 final List defensiveCopy = new ArrayList<>( 188 block.getCodeStmts()); 189 // Don’t modify the original block 190 defensiveCopy.removeAll(ignoreTheseNodes); 191 if (!nothingButAddListenerMethods(defensiveCopy)) { 192 allStatementsAreAddListenerCalls = false; 193 break; 194 } 195 } 196 return allStatementsAreAddListenerCalls; 197 } 198 } Listing A.6: Implementation of the Add Listener Equivalence Rule described in 5.5.1.
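The behavior of the `ADD_LISTENER_REGEX` used in Listing A.6 can be illustrated with a small, self-contained sketch. It is not part of the thesis code base; the class name and the sample statements are invented for illustration, but the pattern string is the one from `AddListenerEquivalenceRule`, including the `(?s)` flag that lets the dot match line breaks so that anonymous listener classes spanning several lines are matched too.

```java
/**
 * Standalone sketch (not part of the thesis code base) showing how the
 * ADD_LISTENER_REGEX from Listing A.6 classifies statements.
 */
public class AddListenerRegexDemo {

    // Same pattern as in AddListenerEquivalenceRule; (?s) makes '.' match line breaks
    static final String ADD_LISTENER_REGEX = "(?s).*\\.add.*[Ll]istener.*";

    static boolean isAddListenerCall(final String statement) {
        return statement.matches(ADD_LISTENER_REGEX);
    }

    public static void main(final String[] args) {
        // A simple call on one line -> matched
        System.out.println(isAddListenerCall("button.addActionListener(listener)"));
        // An anonymous listener class spanning several lines -> matched thanks to (?s)
        System.out.println(isAddListenerCall(
                "button.addMouseListener(new MouseAdapter() {\n"
                        + "    public void mouseClicked(MouseEvent e) { }\n"
                        + "});"));
        // Not an add listener call -> not matched
        System.out.println(isAddListenerCall("int x = 1;"));
    }
}
```

Without the `(?s)` flag the second example would not match, since the default `.` stops at line breaks.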

1 package at.jku.sea.plt.core.compose.rules.consistency; 2 3 import java.util.ArrayList; 4 import java.util.Collections; 5 import java.util.List; 6 import java.util.Set; 7 8 import at.jku.sea.plt.core.artifact.Node; 9 import at.jku.sea.plt.core.compose.rules.EccoJavaParser; 10 import at.jku.sea.plt.core.compose.rules.RuleUtil; 11 import at.jku.sea.plt.core.compose.rules.data.RuleJudgement; 12 import at.jku.sea.plt.core.compose.rules.data.BlockRepair; 13 import at.jku.sea.plt.core.compose.rules.data.CodeBlock; 14 import at.jku.sea.plt.core.compose.rules.data.Variable; 15 16 import com.google.common.base.Optional; 17 18 /** 19 * Every variable must be read before being assigned a new value. 20 *

21 * For every block of Java code this rule keeps a list of variables.

If a 22 * variable receives a new value without its old value being read or a method 23 * being called in between, this rule is violated. 24 *

25 * If this rule finds violations it generates {@link BlockRepair}s which suggest 26 * removing the superfluous variable assignment from the inconsistent 27 * {@link CodeBlock}. 28 * 29 * @author Matthias Braun 30 * 31 */ 32 public class MultipleVarAssignmentRule implements ConsistencyRule { 33 34 /** 35 * The minimum number of code statements in a block necessary to get 36 * analyzed. 37 */ 38 public static final int MIN_NR_OF_STMTS = 2; 39 /** The score a block gets if it has unnecessary variable assignments */ 40 private static final int UNNECESSARY_ASSIGNMENTS_SCORE = RuleJudgement.MIN_SCORE; 41 private static final int NO_UNNECESSARY_ASSIGNMENTS_SCORE = RuleJudgement.MAX_SCORE; 42 43 @Override 44 public String getName() { 45 return getClass().getSimpleName(); 46 } 47 48 @Override 49 public Optional judge(final CodeBlock block) { 50 RuleJudgement judgment; 51 /* Only with enough statements can there be a superfluous one */ 52 if (block.size() >= MIN_NR_OF_STMTS) { 53 judgment = checkForMultipleVarAssignments(block); 54 } else { 55 judgment = null; 56 } APPENDIX A. SOURCE CODE 143

57 58 return Optional.fromNullable(judgment); 59 } 60 61 @Override 62 public String toString() { 63 return getName(); 64 } 65 66 /** 67 * See if a variable is changed without its value being read before. 68 * 69 * @param block 70 * The statements of a block of Java code. They constitute method 71 * calls and variable assignments. 72 */ 73 private RuleJudgement checkForMultipleVarAssignments(final CodeBlock block) { 74 /* 75 * These variables are suspicious: They were set to a value and have not 76 * been read so far. If they get assigned a new value their previous 77 * assignment was superfluous. 78 */ 79 final List watchedVars = new ArrayList<>(); 80 81 // Assume this block is ok at first and needs no repairing 82 int score = NO_UNNECESSARY_ASSIGNMENTS_SCORE; 83 BlockRepair repair = null; 84 85 for (int stmtNr = 0; stmtNr < block.size(); stmtNr++) { 86 final Node stmt = block.getStmt(stmtNr); 87 88 final Set writtenToVars = EccoJavaParser 89 .getWrittenToVars(stmt); 90 91 if (!Collections.disjoint(watchedVars, writtenToVars)) { 92 /* 93 * A variable was assigned a value without its old one being APPENDIX A. SOURCE CODE 144

94 * read previously 95 */ 96 score = UNNECESSARY_ASSIGNMENTS_SCORE; 97 /* 98 * Suggest to remove the repeated assignment from the block. 99 */ 100 repair = RuleUtil.createRepairByRemoval(block, stmtNr, this); 101 } 102 103 watchedVars.addAll(writtenToVars); 104 /* 105 * If a variable was written to and read in the same statement it is 106 * also removed from the watch list 107 */ 108 if (!watchedVars.isEmpty()) { 109 removeUsedVars(watchedVars, stmt); 110 } 111 } 112 113 return RuleJudgement.create(block, score, repair); 114 } 115 116 /** 117 * When a variable is read during a comparison or an assignment it is 118 * removed from the list of watched variables. 119 * 120 * @param watchedVars 121 * List of currently suspicious variables -> They weren’t read 122 * yet. 123 * @param writtenToVars 124 * @param blockStmt 125 * An {@link ASTBlockStatement} that can be a variable assignment 126 * or a method call. 127 */ 128 private void removeUsedVars(final List watchedVars, 129 final Node blockStmt) { 130 APPENDIX A. SOURCE CODE 145

131 /* 132 * Check if a method is called: This resets the list of watched 133 * variables because the method might have read them. 134 */ 135 if (EccoJavaParser.containsMethodCall(blockStmt)) { 136 watchedVars.clear(); 137 } else { 138 // Get the read variables and remove them from the watch list 139 final Set readVars = EccoJavaParser 140 .getReadVars(blockStmt); 141 watchedVars.removeAll(readVars); 142 } 143 } 144 } Listing A.7: Implementation of the Multiple Variable Assignment Rule described in 5.5.2.
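The watch-list idea behind Listing A.7 can be sketched without the surrounding ECCO types. In the following self-contained sketch, `Statement` is a made-up stand-in for the thesis's `Node` plus the `EccoJavaParser` queries: each statement just carries the sets of variables it writes and reads and whether it calls a method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Simplified sketch of the Multiple Variable Assignment Rule's watch list.
 * Statement is an invented stand-in for the thesis's Node/EccoJavaParser types.
 */
public class MultipleAssignmentSketch {

    record Statement(Set<String> writes, Set<String> reads, boolean callsMethod) { }

    /** True if some variable is written twice without a read or method call in between. */
    static boolean hasSuperfluousAssignment(final List<Statement> block) {
        final Set<String> watched = new HashSet<>();
        for (final Statement stmt : block) {
            // A watched variable written again -> its earlier assignment was superfluous
            if (!Collections.disjoint(watched, stmt.writes())) {
                return true;
            }
            watched.addAll(stmt.writes());
            if (stmt.callsMethod()) {
                // A method call might read any variable, so clear the watch list
                watched.clear();
            } else {
                // Variables read in this statement are no longer suspicious
                watched.removeAll(stmt.reads());
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        // x = 1; x = 2;  -> violation: the first assignment was never read
        final List<Statement> bad = List.of(
                new Statement(Set.of("x"), Set.of(), false),
                new Statement(Set.of("x"), Set.of(), false));
        // x = 1; y = x; x = 2;  -> fine: x was read in between
        final List<Statement> ok = List.of(
                new Statement(Set.of("x"), Set.of(), false),
                new Statement(Set.of("y"), Set.of("x"), false),
                new Statement(Set.of("x"), Set.of(), false));
        System.out.println(hasSuperfluousAssignment(bad)); // true
        System.out.println(hasSuperfluousAssignment(ok));  // false
    }
}
```

Clearing the watch list on every method call makes the rule deliberately conservative, mirroring `removeUsedVars` in Listing A.7: a called method might read any of the watched variables.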

1 package at.jku.sea.plt.core.compose.rules.consistency; 2 3 import java.util.ArrayList; 4 import java.util.List; 5 6 import at.jku.sea.plt.core.artifact.Node; 7 import at.jku.sea.plt.core.compose.rules.EccoJavaParser; 8 import at.jku.sea.plt.core.compose.rules.RuleUtil; 9 import at.jku.sea.plt.core.compose.rules.data.BlockRepair; 10 import at.jku.sea.plt.core.compose.rules.data.CodeBlock; 11 import at.jku.sea.plt.core.compose.rules.data.MethodCall; 12 import at.jku.sea.plt.core.compose.rules.data.RuleJudgement; 13 14 import com.google.common.base.Optional; 15 16 /** 17 * This rule is violated when the same setter method is called multiple times 18 * without other method calls happening in between. 19 *

20 * Example for a violation:
21 * myObj.setFoo(5);
22 * int x = 1;

23 * myObj.setFoo(9);

24 *
This is ok:
25 * myObj.setFoo(5);
26 * bar(); // The value of foo might get used in bar()
27 * myObj.setFoo(9); 28 *
29 * 30 * @author Matthias Braun 31 * 32 */ 33 public class MultipleSetterCallRule implements ConsistencyRule { 34 35 private static final int MIN_NR_OF_STMTS = 2; 36 37 @Override 38 public String getName() { 39 return getClass().getSimpleName(); 40 } 41 42 @Override 43 public Optional judge(final CodeBlock block) { 44 RuleJudgement judgment; 45 /* Only with enough statements can there be a superfluous one */ 46 if (block.size() >= MIN_NR_OF_STMTS) { 47 judgment = checkForSuperfluousSetterCalls(block); 48 } else { 49 judgment = null; 50 } 51 return Optional.fromNullable(judgment); 52 } 53 54 @Override 55 public String toString() { 56 return getName(); 57 } 58 59 /** 60 * Go through every statement of the {@code block}. It’s a rule violation if 61 * the same setter method is called twice on an object, without the object 62 * being used in between those calls. APPENDIX A. SOURCE CODE 147

63 * 64 * @param block 65 * the {@link CodeBlock} that might contain a superfluous setter 66 * call 67 * @return a {@link RuleJudgement} that tells us whether a setter was called 68 * superfluously. It also contains a {@link BlockRepair} if a rule 69 * violation occurred in the {@code block}. 70 */ 71 private RuleJudgement checkForSuperfluousSetterCalls(final CodeBlock block) { 72 int score = RuleJudgement.MAX_SCORE; 73 BlockRepair repair = null; 74 75 final List watchedCalls = new ArrayList<>(); 76 for (final Node stmt : block.getCodeStmts()) { 77 final String identifier = stmt.getArtifact().getIdentifier(); 78 79 // Get the method calls within this single code statement 80 final List calls = EccoJavaParser 81 .getMethodCalls(identifier); 82 for (final MethodCall call : calls) { 83 if (watchedCalls.contains(call)) { 84 // The setter was called again on the same object 85 repair = RuleUtil.createRepairByRemoval(block, stmt, this); 86 score = RuleJudgement.MIN_SCORE; 87 } else { 88 /* 89 * The call was not a setter on the watch list. Reset the 90 * list of watched setter calls because their receiving 91 * object might have been used in this call. 92 */ 93 watchedCalls.clear(); 94 if (call.isSetter()) { 95 /* 96 * This setter call was not among the watched calls -> APPENDIX A. SOURCE CODE 148

97 * Watch it now 98 */ 99 watchedCalls.add(call); 100 101 } 102 } 103 } 104 } 105 return RuleJudgement.create(block, score, repair); 106 } 107 } Listing A.8: Implementation of the Multiple Setter Call Rule described in 5.5.3.
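The core of Listing A.8 can likewise be sketched in isolation. In this invented, self-contained example a statement is reduced to a `"receiver.method"` string, whereas the thesis code works on parsed `MethodCall` objects; the reset-on-other-call behavior is the same.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch of the Multiple Setter Call Rule's watch list, with calls
 * reduced to "receiver.method" strings (invented for illustration).
 */
public class MultipleSetterSketch {

    /** True if the same setter is called twice with no other call in between. */
    static boolean hasSuperfluousSetterCall(final List<String> calls) {
        final List<String> watched = new ArrayList<>();
        for (final String call : calls) {
            if (watched.contains(call)) {
                // The same setter was called again on the same object
                return true;
            }
            // Any other call may use the previously set value -> reset the watch list
            watched.clear();
            final boolean isSetter = call.substring(call.indexOf('.') + 1).startsWith("set");
            if (isSetter) {
                watched.add(call);
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        // myObj.setFoo(5); myObj.setFoo(9); -> violation, no call in between
        System.out.println(hasSuperfluousSetterCall(
                List.of("myObj.setFoo", "myObj.setFoo")));
        // myObj.setFoo(5); bar(); myObj.setFoo(9); -> ok, bar() may read foo
        System.out.println(hasSuperfluousSetterCall(
                List.of("myObj.setFoo", "this.bar", "myObj.setFoo")));
    }
}
```

Note that, as in the real rule, a plain variable assignment between the two setter calls (which produces no method call) would not clear the watch list, so the violation in the thesis's `int x = 1;` example is still found.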

1 package at.jku.sea.plt.core.compose.rules.consistency; 2 3 import java.util.ArrayList; 4 import java.util.List; 5 6 import org.slf4j.Logger; 7 import org.slf4j.LoggerFactory; 8 9 import at.jku.sea.plt.core.artifact.Node; 10 import at.jku.sea.plt.core.compose.rules.EccoJavaParser; 11 import at.jku.sea.plt.core.compose.rules.data.RuleJudgement; 12 import at.jku.sea.plt.core.compose.rules.data.CodeBlock; 13 import at.jku.sea.plt.core.compose.rules.data.MethodCall; 14 import at.jku.sea.plt.core.compose.rules.data.Variable; 15 16 import com.google.common.base.Optional; 17 18 /** 19 * Detects null dereferences causing null pointer exceptions. 20 * 21 * @author Matthias Braun 22 * 23 */ 24 public class UninitializedReadRule implements ConsistencyRule { 25 26 private static final Logger LOG = LoggerFactory 27 .getLogger(UninitializedReadRule.class); 28 private static final int MIN_NR_OF_STMTS = 2; APPENDIX A. SOURCE CODE 149

29 30 @Override 31 public String getName() { 32 return getClass().getSimpleName(); 33 } 34 35 @Override 36 public Optional judge(final CodeBlock block) { 37 RuleJudgement judgment; 38 /* Only with enough statements can there be a superfluous one */ 39 if (block.size() >= MIN_NR_OF_STMTS) { 40 judgment = checkForReadsOfUninitializedVars(block); 41 } else { 42 judgment = null; 43 } 44 45 return Optional.fromNullable(judgment); 46 } 47 48 @Override 49 public String toString() { 50 return getName(); 51 } 52 53 private RuleJudgement checkForReadsOfUninitializedVars(final CodeBlock block) { 54 55 /* 56 * These variables are suspicious: They were not initialized or set to 57 * null. If they are read this constitutes a rule violation. 58 */ 59 final List watchedVars = new ArrayList<>(); 60 /* 61 * Assume at first that no null pointer exception could occur in the 62 * block 63 */ 64 int score = RuleJudgement.MAX_SCORE; 65 for (final Node node : block.getCodeStmts()) { 66 // Is there a potential null pointer exception in this

block? 67 boolean npe = false; 68 watchedVars.addAll(EccoJavaParser.getDeclaredVars(node)); 69 // It’s fine to use variables that were written to... 70 watchedVars.removeAll(EccoJavaParser.getWrittenToVars(node)); 71 // ...except if they were set to null 72 watchedVars.addAll(EccoJavaParser.getVarsSetToNull(node)); 73 // Check if receivers of method calls are on the watch list 74 final List calls = EccoJavaParser.getMethodCalls(node); 75 for (final MethodCall call : calls) { 76 77 if (watchedVars.contains(call.getReceiver())) { 78 /* 79 * When a node contains multiple statements it’s necessary 80 * to check if the called method would really cause a null 81 * pointer exception 82 */ 83 if (EccoJavaParser.containsMultipleStmts(node)) { 84 if (EccoJavaParser.callCausesNPE(call, node)) { 85 86 /* 87 * After checking the individual statements of the 88 * node, it’s clear that a null pointer exception 89 * must occur. 90 */ 91 npe = true; 92 } 93 } else { 94 /* 95 * An object that is null was called and the node does 96 * not contain multiple statements. 97 */ 98 npe = true; 99 } 100 } 101 if (npe) { APPENDIX A. SOURCE CODE 151

102 score = RuleJudgement.MIN_SCORE; 103 LOG.info("NPE in {}", block); 104 break; 105 } 106 } 107 } 108 return RuleJudgement.create(block, score); 109 } 110 } Listing A.9: Implementation of the Uninitialized Read Rule described in 5.5.4.

A.2 Code for ArchStudio

A.2.1 ArchStudio Parsing and Modeling

1 package archstudio.comp.tron.tools.schematron.repairSuggestions; 2 3 import static java.util.stream.Collectors.toList; 4 5 import java.util.ArrayList; 6 import java.util.Arrays; 7 import java.util.List; 8 import java.util.Optional; 9 10 import org.slf4j.Logger; 11 import org.slf4j.LoggerFactory; 12 13 import archstudio.comp.tron.tools.schematron.repairSuggestions.diffs.ArchElement; 14 import archstudio.comp.tron.tools.schematron.repairSuggestions.diffs.ArchInterface; 15 import archstudio.comp.tron.tools.schematron.repairSuggestions.diffs.ArchLink; 16 import archstudio.comp.xarchtrans.XArchFlatTransactionsInterface; 17 import edu.uci.ics.xarchutils.ObjRef; 18 import edu.uci.ics.xarchutils.XArchFlatInterface; 19 20 /** 21 * Helps reading and manipulating ArchStudio models. 22 */ 23 public final class ArchUtil { APPENDIX A. SOURCE CODE 152

24 private static final Logger LOG = LoggerFactory.getLogger(ArchUtil.class); 25 /** 26 * The number of model copies. This is used to assign unique names. 27 */ 28 private static int copyCounter; 29 30 /** 31 * Adds a component to its {@code parent} object. 32 *

33 * Note: Interfaces of the component are not supported yet. 34 * 35 * @param parent 36 * the parent object inside of which the resulting component 37 * should live. An archstructure for example. 38 * @param id 39 * the ID of the component 40 * @param description 41 * the description of the component 42 * @param xarch 43 * the {@link XArchFlatInterface} used to parse and manipulate the model 44 * @return the {@link ObjRef} of the added component wrapped in an 45 * {@link Optional} in case the adding went wrong 46 */ 47 public static Optional addComponentToModel(final ObjRef parent, 48 final String id, final String description, 49 final XArchFlatTransactionsInterface xarch) { 50 ObjRef newComponentRef; 51 52 try { 53 final ObjRef xArchRef = xarch.getXArch(parent); 54 final ObjRef typesContextRef = xarch.createContext(xArchRef, 55 "types"); 56 newComponentRef = xarch.create(typesContextRef, "component"); 57 xarch.set(newComponentRef, "id", id); 58 final ObjRef newDescriptionRef = xarch.create(typesContextRef, 59 "description"); 60 xarch.set(newDescriptionRef, "Value", description); 61 xarch.set(newComponentRef, "Description", newDescriptionRef); 62 APPENDIX A. SOURCE CODE 153

63 xarch.add(parent, "component", newComponentRef); 64 } catch (final Exception ex) { 65 newComponentRef = null; 66 LOG.warn("Could not add component to model", ex); 67 } 68 return Optional.ofNullable(newComponentRef); 69 } 70 71 /** 72 * Gets all elements of type {@code elementType} from the {@code model}. 73 * 74 * @param elementType the {@link ElementType} of the objects that we want from the {@code model} 75 * @param model 76 * the model that contains the elements 77 * @param xarch the ArchStudio interface allowing interaction with the model and its context 78 * @return the element references that conform to the {@code elementType} inside the {@code model} 79 */ 80 public static List getAll( 81 final ArchElement.ElementType elementType, final ObjRef model, 82 final XArchFlatInterface xarch) { 83 84 final ObjRef typesContextRef = xarch.createContext(model, "Types"); 85 86 // One model can contain multiple ArchStructures 87 final ObjRef[] archStructures = xarch.getAllElements(typesContextRef, 88 "ArchStructure", model); 89 90 final List components = new ArrayList<>(); 91 92 for (final ObjRef archStructure : archStructures) { 93 final ObjRef[] curComponents = xarch.getAll(archStructure, 94 elementType.toString()); 95 components.addAll(Arrays.asList(curComponents)); 96 } 97 return components; 98 } 99 APPENDIX A. SOURCE CODE 154

100 /** 101 * Gets an ArchStructure from a model. 102 * @param xarch 103 * the {@link XArchFlatInterface} used to parse and manipulate the model 104 * @return a reference to the ArchStructure 105 */ 106 public static ObjRef getArchStructureFromModel(final ObjRef model, 107 final XArchFlatInterface xarch) { 108 109 final ObjRef typesContextRef = xarch.createContext(model, "types"); 110 final ObjRef archStructure = xarch.getElement(typesContextRef, 111 "archStructure", model); 112 return archStructure; 113 } 114 115 /** 116 * Gets the component references from a {@code model}. 117 * 118 * @param model 119 * the model that contains the components 120 * @param xarch the ArchStudio interface allowing interaction with the model and its context 121 * @return the component references of the {@code model} 122 */ 123 public static List getComponents(final ObjRef model, 124 final XArchFlatInterface xarch) { 125 return getAll(ArchElement.ElementType.COMPONENT, model, xarch); 126 } 127 128 public static List getConnectors(final ObjRef model, 129 final XArchFlatInterface xarch) { 130 return getAll(ArchElement.ElementType.CONNECTOR, model, xarch); 131 } 132 133 /** 134 * Makes a copy of {@code origModel}. 135 *

136 * As a side effect the cloned model will show up as an entry in the 137 * Archstudio file manager 138 * APPENDIX A. SOURCE CODE 155

139 * @param origModel 140 * copy this model 141 * @param xarch the ArchStudio interface allowing interaction with the model and its context 142 * @return a {@link ObjRef} to the copied model 143 */ 144 public static ObjRef getCopyOfModel(final ObjRef origModel, 145 final XArchFlatInterface xarch) { 146 final String uri = "copyNr" + copyCounter; 147 copyCounter++; 148 final ObjRef copy = xarch.cloneXArch(origModel, uri); 149 return copy; 150 } 151 152 /** 153 * Gets an {@link ArchInterface} from the model by its {@code id}. 154 * 155 * @param id 156 * the interface’s ID 157 * @param xarch 158 * the {@link XArchFlatInterface} used to parse and manipulate the model 159 * @return an {@link ArchInterface} 160 */ 161 public static ArchInterface getInterface(final String id, 162 final XArchFlatInterface xarch) { 163 164 final ObjRef iFaceRef = xarch.getByID(id); 165 166 return ArchInterface.fromRef(iFaceRef, xarch); 167 } 168 169 /** 170 * Parse the interface IDs that a {@code linkRef} contains. 171 *

172 * Example output: 173 *

174 * interface.82828891.14739bdfbbf.4c9cf6b9a3dbbf9d.55 175 * 176 * @param linkRef 177 * the reference to a linkRef in the ArchStudio model 178 * @param xarch 179 * the {@link XArchFlatInterface} used to parse and APPENDIX A. SOURCE CODE 156

manipulate the model 180 * @return the list of interfaces this {@code linkRef} contains 181 */ 182 public static List getInterfaceIds(final ObjRef linkRef, 183 final XArchFlatInterface xarch) { 184 185 // Get the points inside the linkRef element 186 final ObjRef[] pointRefs = xarch.getAll(linkRef, "point"); 187 // Get the ids of the interface from inside the point references 188 final List ids = Arrays 189 .stream(pointRefs) 190 .map(pointRef -> (ObjRef) xarch.get(pointRef, 191 "anchorOnInterface")) 192 .map(anchor -> (String) xarch.get(anchor, "href")) 193 // Remove the leading pound sign of the anchor 194 .map(anchor -> { 195 if (anchor != null && anchor.startsWith("#")) { 196 return anchor.substring(1); 197 } else 198 return anchor; 199 }).collect(toList()); 200 return ids; 201 } 202 203 /** 204 * Gets an {@link ArchLink} by its {@code linkId}. 205 * 206 * @param linkId 207 * the ID of the link in the ArchStudio model 208 * @param xarch 209 * an {@link XArchFlatInterface} used to find the reference to 210 * the link 211 * @return an initialized {@link ArchLink} 212 */ 213 public static ArchLink getLink(final String linkId, 214 final XArchFlatInterface xarch) { 215 final ObjRef linkRef = xarch.getByID(linkId); 216 return ArchLink.fromRef(linkRef, xarch); 217 } 218 219 /** 220 * Gets the link references from a {@code model}. APPENDIX A. SOURCE CODE 157

221 * 222 * @param model 223 * the model that contains the links 224 * @param xarch 225 * the {@link XArchFlatInterface} used to parse and manipulate the model 226 * @return the link references of the {@code model} 227 */ 228 public static List getLinks(final ObjRef model, 229 final XArchFlatInterface xarch) { 230 231 final ObjRef typesContextRef = xarch.createContext(model, "Types"); 232 233 // One model can contain multiple ArchStructures 234 final ObjRef[] archStructures = xarch.getAllElements(typesContextRef, 235 "ArchStructure", model); 236 237 final List links = new ArrayList<>(); 238 239 for (final ObjRef archStructure : archStructures) { 240 final ObjRef[] curComponents = xarch.getAll(archStructure, "Link"); 241 links.addAll(Arrays.asList(curComponents)); 242 } 243 return links; 244 } 245 246 /** 247 * Parses the value of a {@code property} from an element. 248 * @param property we want to get the value from this property 249 * @param elemRef the reference to the element that should contain the {@code property} 250 * @param xarch the ArchStudio interface allowing interaction with the model and its context 251 * @return the value of the element wrapped in an {@link Optional} in case the property could not be parsed 252 */ 253 public static Optional parseValFromElem(final String property, 254 final ObjRef elemRef, final XArchFlatInterface xarch) { 255 String val = null; APPENDIX A. SOURCE CODE 158

256 try { 257 final Object valRef = xarch.get(elemRef, property); 258 259 if (valRef instanceof ObjRef) { 260 val = (String) xarch.get((ObjRef) valRef, "Value"); 261 } else { 262 val = (String) valRef; 263 } 264 265 } catch (final Exception e) { 266 LOG.warn("Could not parse value of property {} from element reference {}", 267 property, elemRef, e); 268 } 269 return Optional.ofNullable(val); 270 } 271 272 /** 273 * Puts a number of ArchStudio elements identified by {@code refsToRecontextualize} in 274 * a {@code newContext}. 275 * @param newContext the new ArchStudio context to put the objects in 276 * @param refsToRecontextualize these references identify the objects that need recontextualization 277 * @param typeOfThing the type of the objects to recontextualize 278 * @param xarch the ArchStudio interface allowing interaction with the model and its context 279 * @return the references to the recontextualized objects 280 */ 281 public static ObjRef[] recontextualize(final ObjRef newContext, 282 final ObjRef[] refsToRecontextualize, final String typeOfThing, 283 final XArchFlatInterface xarch) { 284 final ObjRef[] recontextualizedRefs = new ObjRef[refsToRecontextualize.length]; 285 for (int i = 0; i < refsToRecontextualize.length; i++) { 286 final ObjRef ref = refsToRecontextualize[i]; 287 final ObjRef newRef = xarch.recontextualize(newContext, 288 typeOfThing, ref); 289 recontextualizedRefs[i] = newRef; 290 } 291 return recontextualizedRefs; 292 } APPENDIX A. SOURCE CODE 159

    // This is a utility class not meant to be instantiated
    private ArchUtil() {
    }

    /**
     * Sets the description of an {@code element}.
     *
     * @param element
     *            the element whose description is set
     * @param description
     *            the description the {@code element} will have
     * @param xarch
     *            the {@link XArchFlatInterface} used to parse and manipulate the model
     */
    public static void setDescription(final ObjRef element, final String description,
            final XArchFlatInterface xarch) {
        final ObjRef currDescription = (ObjRef) xarch.get(element,
                "Description");
        xarch.set(currDescription, "Value", description);
    }
}

Listing A.10: Shared utility methods to parse and manipulate ArchStudio's models.
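The null-handling pattern of parseValFromElem in Listing A.10 can be shown in isolation. The following is a minimal, self-contained sketch, not the thesis code itself: the hypothetical PropertyStore map stands in for ArchStudio's XArchFlatInterface. A lookup that may fail or yield null is wrapped with Optional.ofNullable so callers never receive a raw null.

```java
import java.util.Map;
import java.util.Optional;

public class OptionalParseSketch {
    // Hypothetical stand-in for the ArchStudio model's properties
    static final Map<String, String> PROPS = Map.of("Description", "A sample component");

    static Optional<String> parseVal(final String property) {
        String val = null;
        try {
            // May return null if the property is absent
            val = PROPS.get(property);
        } catch (final Exception e) {
            // On any lookup failure, fall through with val == null
        }
        // Wrap so the caller decides on a fallback via orElse
        return Optional.ofNullable(val);
    }

    public static void main(final String[] args) {
        System.out.println(parseVal("Description").orElse("Unknown"));
        System.out.println(parseVal("Direction").orElse("Unknown"));
    }
}
```

This mirrors how parseDescriptionFromElement in Listing A.12 falls back to a default with orElse(UNKNOWN_VAL).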

package au.uow.archelements;

import static java.util.stream.Collectors.toList;

import java.util.ArrayList;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import archstudio.comp.tron.tools.schematron.repairSuggestions.ArchUtil;

import edu.uci.ics.xarchutils.ObjRef;
import edu.uci.ics.xarchutils.XArchFlatInterface;

/**
 * Represents an entire ArchStudio model including all connectors and components.
 * Contains factory methods to parse the XML representation of the ArchStudio model.
 */
public class ArchModel {
    private static final Logger LOG = LoggerFactory.getLogger(ArchModel.class);
    private List<ArchComponent> components = new ArrayList<>();
    private List<ArchConnector> connectors = new ArrayList<>();
    private String name = "Unnamed model";

    public static ArchModel empty() {
        return new ArchModel();
    }

    /**
     * Creates a Java representation of an ArchStudio model.
     * @param modelRef the reference to the model used in ArchStudio
     * @param xarch is used as a context and to parse the model from XML
     * @return the Java representation of the model
     */
    public static ArchModel from(final ObjRef modelRef,
            final XArchFlatInterface xarch) {
        final List<ObjRef> compRefs = ArchUtil.getComponents(modelRef, xarch);
        final List<ObjRef> connectorRefs = ArchUtil.getConnectors(modelRef,
                xarch);

        final List<ArchConnector> connectors = connectorRefs.stream()
                .map(connRef -> ArchConnector.fromRef(connRef, xarch))
                .collect(toList());

        final List<ArchComponent> components = compRefs.stream()
                .map(compRef -> ArchComponent.fromRef(compRef, xarch))
                .collect(toList());

        final ArchModel model = new ArchModel();
        model.setComponents(components);
        model.setConnectors(connectors);

        return model;
    }

    /**
     * Creates a Java representation of an ArchStudio model.
     * @param modelUrl URL to the ArchStudio model as XML we want to have a Java representation of
     * @param xarch is used as a context and to parse the model from XML
     * @return the Java representation of the model
     */
    public static ArchModel from(final String modelUrl,
            final XArchFlatInterface xarch) {
        final ObjRef modelRef = xarch.getOpenXArch(modelUrl);
        return from(modelRef, xarch);
    }

    /**
     * Only allow clients to get an instance of this class via the static factory methods.
     */
    private ArchModel() {
    }

    public List<ArchComponent> getComponents() {
        return this.components;
    }

    public List<ArchConnector> getConnectors() {
        return this.connectors;
    }

    public String getName() {
        return name;
    }

    public void setConnectors(final List<ArchConnector> connectors) {
        this.connectors = connectors;
    }

    public void setName(final String name) {
        this.name = name;
    }

    private void setComponents(final List<ArchComponent> components) {
        this.components = components;
    }

}

Listing A.11: Representation of an ArchStudio model in Java. It knows how to create itself by parsing the underlying XML ArchStudio model.
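ArchModel hides its constructor and exposes only static factory methods (empty and the two from overloads). The pattern can be sketched on its own; the class and field names below are illustrative, not from the thesis code.

```java
import java.util.List;

// Static-factory sketch: the constructor is private, so every instance is
// obtained through a named factory method that performs the initialization.
public class ModelSketch {
    private List<String> components = List.of();

    private ModelSketch() {
        // Only the factory methods below may instantiate this class
    }

    public static ModelSketch empty() {
        return new ModelSketch();
    }

    // Hypothetical stand-in for ArchModel.from(modelRef, xarch)
    public static ModelSketch from(final List<String> parsedComponents) {
        final ModelSketch model = new ModelSketch();
        model.components = List.copyOf(parsedComponents);
        return model;
    }

    public static void main(final String[] args) {
        System.out.println(ModelSketch.empty().components.size());
        System.out.println(ModelSketch.from(List.of("ClockComp", "AlarmComp")).components.size());
    }
}
```

Named factories make the two construction paths (empty model vs. parsed model) explicit at the call site, which a public constructor could not.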

package au.uow.archelements;

import archstudio.comp.tron.tools.schematron.repairSuggestions.ArchUtil;
import edu.uci.ics.xarchutils.ObjRef;
import edu.uci.ics.xarchutils.XArchFlatInterface;
import edu.uci.isr.xarch.XArchPropertyMetadata;
import edu.uci.isr.xarch.XArchTypeMetadata;
import eu.bges.jutils.strings.ParseUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Locale;

/**
 * Links, interfaces, components, and connectors are elements in ArchStudio.
 *
 * @author Matthias Braun
 */
public abstract class ArchElement {
    /**
     * Whether this element is optional in the context of a software product line.
     */
    public boolean isOptional() {
        return isOptional;
    }

    private boolean isOptional = false;
    private XArchFlatInterface xarch;

    protected XArchFlatInterface getXarch() {
        return this.xarch;
    }

    /**
     * The different kinds of elements that exist in an ArchStudio architecture.
     *
     * @author Matthias Braun
     */
    public enum ElementType {
        COMPONENT("Component"), CONNECTOR("Connector"), LINK("Link"), INTERFACE(
                "Interface"), UNKNOWN(UNKNOWN_VAL);

        private final String elementType;

        /**
         * Creates an {@link au.uow.archelements.ArchElement.ElementType} from a {@code string}.
         *

         * The {@code string} must be either "connector", "component", or
         * "link". Case doesn't matter.
         *
         * @param string the string to construct the {@link au.uow.archelements.ArchElement.ElementType} from
         * @return the corresponding {@link au.uow.archelements.ArchElement.ElementType} or
         *         {@link au.uow.archelements.ArchElement.ElementType#UNKNOWN} if the {@code string} didn't
         *         match any known {@link au.uow.archelements.ArchElement.ElementType}
         */
        public static ElementType from(final String string) {
            final ElementType type;
            switch (string.toLowerCase(Locale.ROOT)) {
            case "component":
                type = COMPONENT;
                break;
            case "link":

                type = LINK;
                break;
            case "connector":
                type = CONNECTOR;
                break;
            default:
                type = UNKNOWN;
            }
            return type;
        }

        /**
         * Creates an instance of ElementType.
         *
         * @param elementKind the string of the element kind as it also appears in the
         *                    ArchStudio models
         */
        ElementType(final String elementKind) {
            this.elementType = elementKind;
        }

        @Override
        public String toString() {
            return this.elementType;
        }
    }

    private static final Logger LOG = LoggerFactory
            .getLogger(ArchElement.class);

    private static final String UNKNOWN_VAL = "Unknown";
    private static final String UNKNOWN_ELEM_ID = "";
    private static final String UNKNOWN_ELEM_DESCRIPTION = "";

    private String id = UNKNOWN_ELEM_ID;
    private String description = UNKNOWN_ELEM_DESCRIPTION;

    private ObjRef elementRef = new ObjRef();

    protected void setOptional(final boolean optional) {
        this.isOptional = optional;
    }

    /**
     * Creates a concrete {@link au.uow.archelements.ArchElement} from the "add" DiffPart XML of
     * ArchStudio. For example an {@link ArchConnector}.
     *

     * If no {@link au.uow.archelements.ArchElement} could be successfully created, return an
     * {@link EmptyArchElement}.
     *
     * @param diffPartRef the reference to the element in the ArchStudio architecture
     * @param xarch is used to parse data from ArchStudio
     * @return an initialized {@link au.uow.archelements.ArchElement}, never null
     */
    public static ArchElement createFromAddDiffPart(final ObjRef diffPartRef,
            final XArchFlatInterface xarch) {

        final ElementType type = getElementType(diffPartRef, xarch);
        final ObjRef elemRef = (ObjRef) xarch.get(diffPartRef, type.toString());
        ArchElement element;
        switch (type) {
        case COMPONENT:
            element = new ArchComponent();
            break;
        case LINK:
            element = new ArchLink();
            break;
        case CONNECTOR:
            element = new ArchConnector();
            break;
        default:
            element = new EmptyArchElement();
        }
        element.elementRef = elemRef;
        element.init(elemRef, xarch);
        element.xarch = xarch;
        return element;
    }

    /**
     * Creates a concrete {@link au.uow.archelements.ArchElement} from the "remove" DiffPart XML of
     * ArchStudio. For example an {@link ArchConnector}.
     *

     * If no {@link au.uow.archelements.ArchElement} could be successfully created, return an
     * {@link EmptyArchElement}.
     *
     * @param removeDiffPart the reference to the removed element in the ArchStudio architecture
     * @param xarch is used to parse data from ArchStudio
     * @return an initialized {@link au.uow.archelements.ArchElement}, never null
     */
    public static ArchElement createFromRemoveDiffPart(
            final ObjRef removeDiffPart, final XArchFlatInterface xarch) {
        // Get the ID of the removed element
        final String removeId = (String) xarch.get(removeDiffPart, "removeId");
        final String elementType = ParseUtil.subBefore(removeId, ".");
        final ArchElement removedElement;
        switch (ElementType.from(elementType)) {
        case COMPONENT:
            removedElement = new ArchComponent();
            break;
        case LINK:
            removedElement = new ArchLink();
            break;
        // ElementType.from never yields INTERFACE, so match CONNECTOR here
        case CONNECTOR:
            removedElement = new ArchConnector();
            break;
        default:
            removedElement = new EmptyArchElement();
        }
        removedElement.setId(removeId);
        removedElement.xarch = xarch;
        return removedElement;
    }

    /**
     * Gets the description of an element such as a component, connector or link.

     */
    public static String parseDescriptionFromElement(final ObjRef elemRef,
            final XArchFlatInterface xarch) {
        return ArchUtil.parseValFromElem("Description", elemRef, xarch).orElse(
                UNKNOWN_VAL);
    }

    /**
     * Parses the kind of element contained in this {@code diffPart}.
     *

     * This only works for Add diffParts because Remove diffParts contain no
     * information about their removed element, just its ID.
     *
     * @param diffPartRef contains an element of a certain {@link au.uow.archelements.ArchElement.ElementType}
     * @param xarch the {@link XArchFlatInterface} used to access the element
     * @return the {@link au.uow.archelements.ArchElement.ElementType} from the {@code diffPart}
     */
    private static ElementType getElementType(final ObjRef diffPartRef,
            final XArchFlatInterface xarch) {

        ElementType elementKind = ElementType.UNKNOWN;

        for (final ElementType curElementKind : ElementType.values()) {
            final String kind = curElementKind.toString();
            if (xarch.get(diffPartRef, kind) != null) {
                elementKind = curElementKind;
                break;
            }
        }
        return elementKind;
    }

    protected ArchElement() {
    }

    public boolean isMandatory() {
        return !isOptional;

    }

    /**
     * Adds an object to the model.
     * @param typesContextRef the context that is needed to add this object
     * @param addObjRef add the object that this reference identifies to the model
     * @param xarch the {@link XArchFlatInterface} used to parse and change the model
     */
    public abstract void addToModel(ObjRef typesContextRef,
            ObjRef addObjRef, XArchFlatInterface xarch);

    @Override
    public boolean equals(final Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        final ArchElement other = (ArchElement) obj;
        /*
         * Don't consider the element reference for checking equality because
         * two elements can semantically be completely the same but have
         * different references that would make them unequal
         */
        // if (elementRef == null) {
        //     if (other.elementRef != null)
        //         return false;
        // } else if (!elementRef.equals(other.elementRef))
        //     return false;
        if (description == null) {
            if (other.description != null)
                return false;
        } else if (!description.equals(other.description))
            return false;
        if (id == null) {
            if (other.id != null)
                return false;
        } else if (!id.equals(other.id))

            return false;
        return true;
    }

    public String getDescription() {
        return this.description;
    }

    public String getId() {
        return this.id;
    }

    public ObjRef getRef() {
        return this.elementRef;
    }

    /**
     * Parses the ArchStudio model to find out whether this element is
     * optional or mandatory.
     *

     * An element is considered optional if it has the property "optional" and
     * contains an "options:Optional" block.
     *
     * @param elemRef the reference to this element in the ArchStudio model
     * @param xarch the {@link XArchFlatInterface} used to parse the model
     * @return true if this element is optional, false if it's mandatory
     */
    protected boolean parseWhetherOptional(final ObjRef elemRef,
            final XArchFlatInterface xarch) {

        boolean isOptional = false;

        /*
         * If an element has the "optional" property it means that it can be
         * optional because it was at one time "promoted to an Optional" via
         * Archipelago. But it doesn't have to be optional because if it's made

         * mandatory via Archipelago it still has that property "optional".
         */
        final XArchTypeMetadata metaData = xarch.getTypeMetadata(elemRef);
        final XArchPropertyMetadata optionalProp = metaData.getProperty("optional");

        if (optionalProp != null) {
            // If optionalProp were null, this would be an invalid operation
            final ObjRef optional = (ObjRef) xarch.get(elemRef, "Optional");
            isOptional = optional != null;
        }
        return isOptional;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        /*
         * Don't include the element reference because it's an ArchStudio
         * implementation detail
         */
        // result = prime * result
        //         + ((elementRef == null) ? 0 : elementRef.hashCode());
        result = prime * result
                + ((description == null) ? 0 : description.hashCode());
        result = prime * result + ((id == null) ? 0 : id.hashCode());
        return result;
    }

    public abstract ArchElement merge(final ArchElement otherElem);

    @Override
    public String toString() {
        return "ArchElement [id=" + id + ", description=" + description + "]";
    }

    /**

     * Parses the ID from an element.
     * @param elemRef we want to parse the ID of the element that this reference identifies
     * @param xarch the {@link XArchFlatInterface} used to parse the model
     * @return the ID of the element
     */
    protected String parseIdFromElement(final ObjRef elemRef,
            final XArchFlatInterface xarch) {
        final String id = (String) xarch.get(elemRef, "id");
        return id;
    }

    protected void setDescription(final String elemDescription) {
        this.description = elemDescription;
    }

    protected void setId(final String id) {
        this.id = id;
    }

    protected void setXarch(final XArchFlatInterface xarch) {
        this.xarch = xarch;
    }

    protected void setRef(final ObjRef ref) {
        this.elementRef = ref;
    }

    /**
     * Initializes this object by parsing the element from the ArchStudio model.
     * @param elemRef we want to parse the data from the element that this
     *                reference identifies and initialize the element with its data
     * @param xarch the {@link XArchFlatInterface} used to parse the model
     */
    abstract void init(ObjRef elemRef, XArchFlatInterface xarch);

}

Listing A.12: Abstract representation of an ArchStudio element in Java. Links, interfaces, connectors, and components count as elements in ArchStudio and are subtypes of this abstract class.
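ArchElement deliberately leaves the ArchStudio-internal element reference out of both equals and hashCode, so two elements with the same ID and description compare equal even when ArchStudio handed out different references. The contract this relies on can be shown in a minimal, self-contained sketch; the class and field names here are illustrative, not from the thesis code.

```java
import java.util.Objects;

public class ElementSketch {
    private final String id;
    private final String description;
    // Implementation detail, intentionally not part of equality
    private final Object internalRef;

    ElementSketch(final String id, final String description, final Object internalRef) {
        this.id = id;
        this.description = description;
        this.internalRef = internalRef;
    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        final ElementSketch other = (ElementSketch) obj;
        // internalRef is ignored, matching the commented-out block in ArchElement
        return Objects.equals(id, other.id)
                && Objects.equals(description, other.description);
    }

    @Override
    public int hashCode() {
        // Must ignore the same fields as equals to keep the equals/hashCode contract
        return Objects.hash(id, description);
    }

    public static void main(final String[] args) {
        final ElementSketch a = new ElementSketch("comp.1", "Clock", new Object());
        final ElementSketch b = new ElementSketch("comp.1", "Clock", new Object());
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Keeping both methods over the same field set matters: a HashSet, as used later in ArchComponent.mergeInterfaces, deduplicates only if equal objects also share a hash code.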

package au.uow.archelements;

import archstudio.comp.tron.tools.schematron.repairSuggestions.ArchUtil;
import archstudio.comp.tron.tools.schematron.repairSuggestions.diffs.ArchInterface.Direction;
import edu.uci.ics.xarchutils.ObjRef;
import edu.uci.ics.xarchutils.XArchFlatInterface;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.*;
import java.util.stream.Collectors;

import static java.util.stream.Collectors.toList;

/**
 * Represents a component in an ArchStudio model.
 *
 * @author Matthias Braun
 */
public class ArchComponent extends ArchElement {

    private static final Logger LOG = LoggerFactory
            .getLogger(ArchComponent.class);

    private List<ArchInterface> interfaces = new ArrayList<>();

    /**
     * Creates a Java representation of an ArchStudio component.
     * @param compRef the reference to the component used in ArchStudio
     * @param xarch is used as a context and to parse the model from XML
     */
    public static ArchComponent fromRef(final ObjRef compRef,
            final XArchFlatInterface xarch) {
        final ArchComponent comp = new ArchComponent();
        comp.init(compRef, xarch);
        return comp;
    }

    /**
     * Adds an {@link ArchInterface} to this component.
     * @param iface the interface to add
     */
    public void addInterface(final ArchInterface iface) {
        interfaces.add(iface);
    }

    @Override
    public void addToModel(final ObjRef typesContextRef,
            final ObjRef parentObjRef, final XArchFlatInterface xarch) {
        final ObjRef newComponentRef = xarch.create(typesContextRef,
                "component");
        // Add ID and description
        xarch.set(newComponentRef, "id", this.getId());
        final ObjRef newDescriptionRef = xarch.create(typesContextRef,
                "description");
        xarch.set(newDescriptionRef, "value", this.getDescription());
        xarch.set(newComponentRef, "description", newDescriptionRef);

        // Add the interfaces to the model
        interfaces.forEach(iface -> {
            /*
             * Interfaces need to know their containing component to add
             * themselves to the XML model
             */
            iface.setComponentRef(newComponentRef);
            iface.addToModel(typesContextRef, newComponentRef, xarch);
        });

        xarch.set(parentObjRef, "Component", newComponentRef);

    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj)
            return true;
        if (!super.equals(obj))
            return false;
        if (getClass() != obj.getClass())
            return false;
        final ArchComponent other = (ArchComponent) obj;
        if (interfaces == null) {
            if (other.interfaces != null)
                return false;
        } else if (!interfaces.equals(other.interfaces))
            return false;
        return true;
    }

    /**
     * Returns the dependencies of this component.
     *

     * A dependency is defined as a component that is connected to this
     * component via an outgoing interface and a link.
     *
     * @return the components this component depends on
     */
    public List<ArchComponent> getDependencies() {
        final List<String> outgoingLinkIds = interfaces.stream()
                .filter(iFace -> iFace.getDirection().equals(Direction.OUT))
                .map(iFace -> iFace.getLinkId())
                // If an interface has no attached link, the ID is unknown
                .filter(linkId -> !linkId.equals(ArchLink.UNKNOWN_LINK_ID))
                .collect(toList());
        final List<ArchLink> outgoingLinks = outgoingLinkIds.stream()
                .map(linkId -> ArchUtil.getLink(linkId, getXarch()))
                .collect(toList());
        final List<ArchComponent> dependencies = outgoingLinks.stream()
                .map(link -> link.getOther(this))
                // Filter out the links that are only attached to one component
                .filter(Optional::isPresent).map(Optional::get)
                .collect(toList());
        LOG.info("Dependencies of {}: {}", getDescription(), dependencies);
        return dependencies;

    }

    /**
     * Gets all the interfaces that are part of this component.
     * @return this component's interfaces
     */
    public List<ArchInterface> getInterfaces() {
        return this.interfaces;
    }

    /**
     * Gets the IDs of all the links that are attached to interfaces of this component.
     * @return the IDs of the links connected to this component
     */
    public List<String> getLinkIds() {
        return getInterfaces().stream().map(iFace -> iFace.getLinkId())
                .collect(Collectors.toList());
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = super.hashCode();
        result = prime * result
                + ((interfaces == null) ? 0 : interfaces.hashCode());
        return result;
    }

    @Override
    public ArchElement merge(final ArchElement otherElem) {
        final ArchComponent mergedComp;
        if (otherElem instanceof ArchComponent) {
            final ArchComponent otherComp = (ArchComponent) otherElem;
            // otherElem is an ArchComponent or a subtype thereof and is not null
            mergedComp = new ArchComponent();
            mergedComp
                    .setInterfaces(mergeInterfaces(otherComp.getInterfaces()));
            if (this.getDescription().equals(otherElem.getDescription())) {
                mergedComp.setDescription(this.getDescription());
            } else {

                LOG.info("Can't merge differing descriptions of " + this
                        + " and " + otherElem);
            }
            if (this.getId().equals(otherElem.getId())) {
                mergedComp.setId(this.getId());
            } else {
                LOG.info("Can't merge differing IDs of " + this + " and "
                        + otherElem);
            }
            if (this.getRef().equals(otherElem.getRef())) {
                mergedComp.setRef(this.getRef());
            } else {
                LOG.info("Can't merge differing references of " + this + " and " + otherElem);
            }
        } else {
            LOG.warn("Can't merge elements because {} is not an ArchComponent",
                    otherElem);
            mergedComp = this;
        }
        return mergedComp;
    }

    public void setInterfaces(final List<ArchInterface> interfaces) {
        this.interfaces = interfaces;
    }

    @Override
    public String toString() {
        return "ArchComponent [" + super.toString() + ", interfaces="
                + interfaces + "]";
    }

    /**
     * Merges all {@code otherInterfaces} with this component's interfaces by adding
     * them together and removing duplicates.
     * @return the merged interfaces as a list
     */
    private List<ArchInterface> mergeInterfaces(
            final List<ArchInterface> otherInterfaces) {

        // Use a set to remove duplicates
        final Set<ArchInterface> mergedInterfaces = new HashSet<>();
        mergedInterfaces.addAll(this.interfaces);
        mergedInterfaces.addAll(otherInterfaces);
        mergedInterfaces.forEach(iface -> LOG.info(
                "Descr: {} | Hashcode: {}", iface.getDescription(),
                iface.hashCode()));

        return new ArrayList<>(mergedInterfaces);
    }

    /**
     * Parses the interfaces from an ArchStudio component.
     * @param compRef we parse the interfaces from the component that this reference identifies
     * @param xarch the {@link XArchFlatInterface} used to parse and manipulate the model
     * @return a list of the parsed interfaces as {@link ArchInterface}s
     */
    protected List<ArchInterface> parseInterfaces(final ObjRef compRef,
            final XArchFlatInterface xarch) {

        final ObjRef[] interfaceRefs = xarch.getAll(compRef, "Interface");
        final List<ArchInterface> interfaces = Arrays
                .stream(interfaceRefs)
                .map(interfaceRef -> ArchInterface.fromRef(interfaceRef, xarch))
                .collect(toList());

        return interfaces;
    }

    @Override
    void init(final ObjRef compRef, final XArchFlatInterface xarch) {
        final String compId = parseIdFromElement(compRef, xarch);
        setId(compId);
        final String compDescr = parseDescriptionFromElement(compRef,
                xarch);
        setDescription(compDescr);
        final List<ArchInterface> interfaces = parseInterfaces(compRef, xarch);
        this.interfaces = interfaces;
        this.setXarch(xarch);

        setOptional(parseWhetherOptional(compRef, xarch));
    }
}

Listing A.13: Representation of an ArchStudio component in Java. It knows how to create itself by parsing the underlying XML ArchStudio model.
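The merge strategy of mergeInterfaces in Listing A.13 is a set-based union: duplicates, as defined by equals/hashCode, collapse into a single entry. A minimal, self-contained sketch of the same idea, with plain strings standing in for ArchInterface objects:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeSketch {
    static List<String> merge(final List<String> ours, final List<String> theirs) {
        // Use a set to remove duplicates, as mergeInterfaces does
        final Set<String> merged = new HashSet<>();
        merged.addAll(ours);
        merged.addAll(theirs);
        return new ArrayList<>(merged);
    }

    public static void main(final String[] args) {
        // "out.2" appears on both sides and survives only once
        final List<String> merged = merge(List.of("in.1", "out.2"), List.of("out.2", "in.3"));
        System.out.println(merged.size());
    }
}
```

Note that a HashSet does not preserve insertion order; if the merged interfaces needed a stable order, a LinkedHashSet would be the drop-in choice.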

package au.uow.archelements;

import java.util.List;
import java.util.Locale;
import java.util.Optional;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import archstudio.comp.tron.tools.schematron.repairSuggestions.ArchUtil;
import edu.uci.ics.xarchutils.ObjRef;
import edu.uci.ics.xarchutils.XArchFlatInterface;

/**
 * Represents an interface that is part of an {@link ArchComponent} in the ArchStudio model.
 */
public class ArchInterface extends ArchElement {

    /**
     * The different directions an interface can have.
     */
    public enum Direction {
        IN, OUT, INOUT, NONE, UNKNOWN;

        /**
         * ArchStudio expects the directions to be lower case.
         */

        @Override
        public String toString() {
            return this.name().toLowerCase(Locale.ROOT);
        }
    }

    private static final Logger LOG = LoggerFactory
            .getLogger(ArchInterface.class);

    private Direction direction;

    /**
     * The reference to the component that contains this interface
     */
    private ObjRef componentRef;
    private String connectedLinkId;

    /**
     * Creates an instance of this class.
     * @param description the description of this interface
     * @param id the ID of this interface
     * @param direction the direction of this interface
     * @return an instance of this class
     */
    public static ArchInterface create(final String description,
            final String id, final Direction direction) {
        final ArchInterface iface = new ArchInterface();
        iface.setDescription(description);
        iface.setDirection(direction);
        iface.setId(id);
        return iface;
    }

    /**
     * Parses the data from the ArchStudio model to create an instance of this class.
     * @param interfaceRef the interface identified by this reference in the
     *                     ArchStudio model provides the data for the created instance
     * @param xarch is used as a context and to parse the model from XML
     * @return an instance of this class
     */

    public static ArchInterface fromRef(final ObjRef interfaceRef,
            final XArchFlatInterface xarch) {
        final ArchInterface iFace = new ArchInterface();
        iFace.init(interfaceRef, xarch);
        iFace.componentRef = xarch.getParent(interfaceRef);
        final ObjRef modelRef = xarch.getXArch(interfaceRef);
        iFace.connectedLinkId = parseConnectedLinkId(modelRef, iFace.getId(),
                xarch);
        iFace.setXarch(xarch);

        return iFace;
    }

    /**
     * Finds the ID of the link the interface with {@code ifaceId} is attached to.
     * @param modelRef the reference to the model which contains this interface
     * @param ifaceId we want to find the link connected to the interface with this ID
     * @param xarch is used as a context and to parse the model from XML
     * @return the ID of the link
     */
    private static String parseConnectedLinkId(final ObjRef modelRef,
            final String ifaceId, final XArchFlatInterface xarch) {
        String connectedLinkId = ArchLink.UNKNOWN_LINK_ID;
        final List<ObjRef> linkRefs = ArchUtil.getLinks(modelRef, xarch);
        for (final ObjRef linkRef : linkRefs) {
            final List<String> interfaces = ArchUtil.getInterfaceIds(linkRef,
                    xarch);
            if (interfaces.contains(ifaceId)) {
                connectedLinkId = ArchUtil.parseValFromElem("id", linkRef,
                        xarch).orElse(ArchLink.UNKNOWN_LINK_ID);
            }
        }
        return connectedLinkId;
    }

    private ArchInterface() {

    }

    @Override
    public void addToModel(final ObjRef typesContextRef,
            final ObjRef parentObjRef, final XArchFlatInterface xarch) {
        final ObjRef ifaceRef = xarch.create(typesContextRef, "interface");
        xarch.add(componentRef, "interface", ifaceRef);

        // Set the interface's ID
        xarch.set(ifaceRef, "id", this.getId());

        // Set the interface's description
        final ObjRef ifaceDescRef = xarch
                .create(typesContextRef, "description");
        xarch.set(ifaceDescRef, "value", this.getDescription());
        xarch.set(ifaceRef, "description", ifaceDescRef);

        // Set the interface's direction
        final ObjRef ifaceDirectionRef = xarch.create(typesContextRef,
                "direction");
        xarch.set(ifaceDirectionRef, "value", this.getDirection().toString());
        xarch.set(ifaceRef, "direction", ifaceDirectionRef);

    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj)
            return true;
        if (!super.equals(obj))
            return false;
        if (getClass() != obj.getClass())
            return false;
        final ArchInterface other = (ArchInterface) obj;
        /*
         * Don't consider the component reference for checking equality because
         * two interfaces can semantically be completely the same but their
         * components may have different references that would make them unequal

         */
        // if (componentRef == null) {
        //     if (other.componentRef != null)
        //         return false;
        // } else if (!componentRef.equals(other.componentRef))
        //     return false;
        return direction == other.direction;
    }

    /**
     * @return the component this interface is a part of
     */
    public ArchComponent getContainingComp() {
        final ArchComponent component = ArchComponent.fromRef(componentRef,
                getXarch());
        return component;
    }

    public Direction getDirection() {
        return direction;
    }

    /**
     * Gets the link attached to this interface.
     *
     * @return the link attached to this interface
     */
    public Optional<ArchLink> getLink() {

        if (this.connectedLinkId == null
                || this.connectedLinkId.equals(ArchLink.UNKNOWN_LINK_ID)) {
            return Optional.empty();
        } else {
            return Optional.of(ArchUtil.getLink(this.connectedLinkId,
                    getXarch()));
        }
    }

    public String getLinkId() {
        return connectedLinkId;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = super.hashCode();
        result = prime * result
                + ((direction == null) ? 0 : direction.hashCode());
        return result;
    }

    @Override
    public ArchElement merge(final ArchElement otherElem) {
        throw new UnsupportedOperationException("Implement me");
    }

    public void setComponentRef(final ObjRef componentRef) {
        this.componentRef = componentRef;
    }

    @Override
    public String toString() {
        return "ArchInterface [direction=" + direction + ", getDescription()="
                + getDescription() + ", getId()=" + getId() + "]";
    }

    /**
     * Parses the direction from the {@code elemRef}.
     * @param elemRef the reference to the element that might contain the direction of the interface
     * @param xarch is used as a context and to parse the model from XML
     * @return the {@link Direction} of the interface
     */
    private Direction parseDirection(final ObjRef elemRef,
            final XArchFlatInterface xarch) {
        final Optional<String> directionStrMaybe = ArchUtil.parseValFromElem(
                "Direction", elemRef, xarch);

        final Direction direction;
        if (directionStrMaybe.isPresent()) {
            switch (directionStrMaybe.get()) {

            case "in":
                direction = Direction.IN;
                break;
            case "out":
                direction = Direction.OUT;
                break;
            case "inout":
                direction = Direction.INOUT;
                break;
            case "none":
                direction = Direction.NONE;
                break;
            default:
                direction = Direction.UNKNOWN;
            }
        } else {
            direction = Direction.UNKNOWN;
        }
        return direction;
    }

    private void setDirection(final Direction direction) {
        this.direction = direction;
    }

    @Override
    void init(final ObjRef elemRef, final XArchFlatInterface xarch) {
        final String interfaceDescr = parseDescriptionFromElement(elemRef,
                xarch);
        setDescription(interfaceDescr);
        final String id = parseIdFromElement(elemRef, xarch);

        setId(id);
        final Direction direction = parseDirection(elemRef, xarch);
        this.setDirection(direction);
        setOptional(parseWhetherOptional(elemRef, xarch));
    }

}

Listing A.14: Representation of an ArchStudio interface in Java. It knows how to create itself by parsing the underlying XML ArchStudio model.
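Both ElementType.from in Listing A.12 and parseDirection in Listing A.14 parse strings defensively: unrecognized input maps to an explicit UNKNOWN constant instead of throwing, so model parsing can continue past malformed data. A minimal, self-contained sketch of that pattern (the enum mirrors ArchInterface.Direction; the class name is illustrative):

```java
import java.util.Locale;

public class DirectionSketch {
    enum Direction { IN, OUT, INOUT, NONE, UNKNOWN }

    static Direction parse(final String raw) {
        // Lower-case first so the match is case-insensitive, as in ElementType.from
        switch (raw.toLowerCase(Locale.ROOT)) {
        case "in":
            return Direction.IN;
        case "out":
            return Direction.OUT;
        case "inout":
            return Direction.INOUT;
        case "none":
            return Direction.NONE;
        default:
            // Anything else, including malformed model data, is UNKNOWN
            return Direction.UNKNOWN;
        }
    }

    public static void main(final String[] args) {
        System.out.println(parse("OUT"));
        System.out.println(parse("sideways"));
    }
}
```

Compared with Enum.valueOf, which throws IllegalArgumentException on unknown input, the explicit default keeps the parser total over arbitrary strings.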

package au.uow.archelements;

import static java.util.stream.Collectors.toList;

import java.util.List;
import java.util.Optional;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import archstudio.comp.tron.tools.schematron.repairSuggestions.ArchUtil;
import edu.uci.ics.xarchutils.ObjRef;
import edu.uci.ics.xarchutils.XArchFlatInterface;

/**
 * Represents a link that is potentially attached to an {@link ArchInterface} in the ArchStudio model.
 */
public class ArchLink extends ArchElement {
    public static final String UNKNOWN_LINK_ID = "UNKNOWN LINK ID";
    private List<String> interfaceIds;

    private static final Logger LOG = LoggerFactory.getLogger(ArchLink.class);

    /**
     * Parses the data from the ArchStudio model to create an instance of this class.
     * @param linkRef the data for the link we create is parsed from the
     *        link identified by this reference in the ArchStudio model
     * @param xarch is used as a context and to parse the model from XML
     * @return an instance of this class
     */
    public static ArchLink fromRef(final ObjRef linkRef,
            final XArchFlatInterface xarch) {
        final ArchLink link = new ArchLink();
        link.init(linkRef, xarch);
        return link;
    }

    @Override
    public void addToModel(final ObjRef typesContextRef,
            final ObjRef parentObjRef, final XArchFlatInterface xarch) {
        final ObjRef linkRef = xarch.create(typesContextRef, "link");
        // Add ID and description
        xarch.set(linkRef, "id", this.getId());
        final ObjRef newDescriptionRef = xarch.create(typesContextRef,
                "description");
        xarch.set(newDescriptionRef, "value", this.getDescription());
        xarch.set(linkRef, "description", newDescriptionRef);

        // Add the points the link is attached to
        interfaceIds.forEach(id -> addEndPointToLink(id, typesContextRef,
                xarch, linkRef));

        // Add the link to its parent element in the XML
        xarch.set(parentObjRef, "Link", linkRef);
    }

    /**
     * Finds the component that is connected to {@code comp} via this link.
     *
     * @param comp we are interested in the component that is connected to
     *        {@code comp} via this link
     * @return the component that is connected to {@code comp} via this link
     */
    public Optional<ArchComponent> getOther(final ArchComponent comp) {

        // The interfaces of this link
        final List<ArchInterface> interfaces = interfaceIds.stream()
                .map(iFaceId -> ArchUtil.getInterface(iFaceId, getXarch()))
                .collect(toList());

        // The components connected to comp (should be zero or one)
        final List<ArchComponent> otherComps = interfaces.stream()
                .map(iFace -> iFace.getContainingComp())
                .filter(otherComp -> !comp.equals(otherComp)).collect(toList());

        ArchComponent resComp;
        if (otherComps.size() == 1) {
            resComp = otherComps.get(0);
        } else {
            LOG.info("There are {} components connected to component {}",
                    otherComps.size(), comp.getId());
            resComp = null;
        }
        return Optional.ofNullable(resComp);
    }

    @Override
    public ArchElement merge(final ArchElement otherElem) {
        throw new UnsupportedOperationException("Implement me");
    }

    @Override
    public String toString() {
        return "ArchLink{" + "interfaceIds=" + interfaceIds + "} "
                + super.toString();
    }

    /**
     * Adds an endpoint to a link in order to connect it with an interface.
     * @param interfaceId we want the link to be connected to the interface with this ID
     * @param typesContextRef needed to manipulate the model
     * @param xarch is used as a context and to manipulate the model in XML
     * @param linkRef the reference to the link in the ArchStudio model we want to connect
     */
    private void addEndPointToLink(final String interfaceId,
            final ObjRef typesContextRef, final XArchFlatInterface xarch,
            final ObjRef linkRef) {

        /*
         * Create a link whether or not the interface ID is null. Null means
         * that the link's end point is not attached to a component's or
         * connector's interface. This is the same way ArchStudio handles
         * interface IDs that are null
         */
        final ObjRef newPointRef = xarch.create(typesContextRef, "point");
        final ObjRef newAnchorRef = xarch.create(typesContextRef, "XMLLink");
        xarch.set(newPointRef, "anchorOnInterface", newAnchorRef);

        xarch.add(linkRef, "point", newPointRef);

        if (interfaceId != null) {
            xarch.set(newAnchorRef, "type", "simple");

            // ArchStudio uses the # to find the interfaces
            String hashOrNothing;
            if (interfaceId.startsWith("#")) {
                hashOrNothing = "";
            } else {
                hashOrNothing = "#";
            }
            xarch.set(newAnchorRef, "href", hashOrNothing + interfaceId);
        }
    }

    private void setInterfaceIds(final List<String> interfaceIds) {
        this.interfaceIds = interfaceIds;
    }

    @Override
    void init(final ObjRef elemRef, final XArchFlatInterface xarch) {
        final String id = parseIdFromElement(elemRef, xarch);
        final String descr = parseDescriptionFromElement(elemRef, xarch);
        setId(id);
        setDescription(descr);
        final List<String> interfaceIds = ArchUtil.getInterfaceIds(elemRef,
                xarch);
        setInterfaceIds(interfaceIds);
        setXarch(xarch);
        setOptional(parseWhetherOptional(elemRef, xarch));
    }
}

Listing A.15: Representation of an ArchStudio link in Java. It knows how to create itself by parsing the underlying XML ArchStudio model.

A.2.2 ArchStudio Rules

package au.uow.rules

import _root_.au.uow.archelements.ArchModel

/**
 * Common trait of ArchRules that check [[ArchModel]]s for inconsistencies.
 * Created by Matthias Braun on 11/5/14.
 */
trait ArchRule {

  /**
   * Checks whether the given ArchModel conforms to this rule.
   * @param model the ArchModel that is checked by the rule
   * @return an ArchRuleResult that contains the outcome of the rule checking
   */
  def check(model: ArchModel): ArchRuleResult

  /** Returns the textual description of this rule */
  def description: RuleDescription
}

case class RuleDescription(text: String)

Listing A.16: The trait defining the operations that a rule for ArchStudio must support.

package au.uow.rules

/**
 * Contains the judgment of an ArchRule about an ArchModel.
 * @param text what the rule has to say about the checked model in plain English
 * @param resultType whether the ArchModel conforms to the rule
 * @param descr the description of the rule that was applied to the model
 */
case class ArchRuleResult(text: String, resultType: ArchRuleResultType)(implicit descr: RuleDescription) {

  override def toString: String = {
    val ruleResult = resultType match {
      case TestPassed => "Passed"
      case TestFailed => "Failed"
    }
    s"Result of rule '${descr.text}': $text --> $ruleResult"
  }
}

/**
 * Indicates whether an ArchModel has passed the assessment of an [[au.uow.rules.ArchRule]].
 */
sealed trait ArchRuleResultType

case object TestPassed extends ArchRuleResultType

case object TestFailed extends ArchRuleResultType

Listing A.17: The outcome of a rule evaluation, telling us whether the model conforms to the rule.

package au.uow.rules

import _root_.au.uow.archelements.{ArchComponent, ArchModel}

import scala.collection.JavaConversions._
import scala.collection.JavaConverters._

/**
 * This rule verifies that there are no circular dependencies in an ArchStudio model.
 * Created by Matthias Braun on 10/17/14.
 */
object CircularDependency extends ArchRule {

  implicit def description = RuleDescription("Circular dependencies between components are forbidden")

  /** @return whether there is a circular dependency in the `model` */
  def check(model: ArchModel) =
    // A component indirectly depending on itself constitutes a circular dependency
    if (model.getComponents.exists(dependsOnItself))
      ArchRuleResult("Circular dependency detected", TestFailed)
    else
      ArchRuleResult("No circular dependencies", TestPassed)

  /**
   * Returns true if an [[ArchComponent]] `comp` depends on itself.
   * @param comp the ArchComponent that might depend on itself
   * @return true if `comp` has a dependency on itself, directly or indirectly
   */
  def dependsOnItself(comp: ArchComponent) = {
    val dependencies = comp.getDependencies.asScala.toList
    compDependsOnItself(comp, dependencies, List())
  }

  /**
   * Returns true if an [[ArchComponent]] `comp` has a direct or indirect dependency on itself.
   * @param comp the ArchComponent that might depend on itself
   * @param curDeps if these dependencies contain `comp` then there is a circular dependency in the model
   * @param checkedDeps these components were already checked whether they contain the original component
   * @return true if comp has a dependency on itself, directly or indirectly
   */
  def compDependsOnItself(comp: ArchComponent, curDeps: List[ArchComponent], checkedDeps: List[ArchComponent]): Boolean = {

    if (curDeps.contains(comp)) {
      /* The component under test has itself directly or indirectly as a dependency
       * -> Circular dependency detected */
      true
    } else {
      curDeps.foreach(dependency => {
        // Check the dependencies of the dependency if we haven't done so yet
        if (!checkedDeps.contains(dependency)) {
          val newDependencies = dependency.getDependencies.asScala.toList
          return compDependsOnItself(comp, newDependencies, checkedDeps ++ curDeps)
        }
      })
      /* We've visited all dependencies without meeting the original component again -> No circular dependency */
      false
    }
  }
}

Listing A.18: ArchStudio rule for detecting circular dependencies in a model.

package au.uow.rules

import _root_.au.uow.archelements.ArchInterface.Direction._
import _root_.au.uow.archelements.{ArchConnector, ArchModel}
import scala.collection.JavaConversions._

/**
 * Rule finding connectors that have no incoming interfaces.
 * Created by Matthias Braun on 4/1/15.
 */
object ConnectorHasIncomingIface extends ArchRule {

  implicit def description = RuleDescription("Connectors must have an incoming interface")

  val hasIncomingIface = (_: ArchConnector).getInterfaces.exists(_.getDirection == IN)
  val connectorsWithNoIncomingIfaces = (_: ArchModel).getConnectors.filterNot(hasIncomingIface)

  /** @return whether all connectors in the model have an incoming interface */
  def check(model: ArchModel) =
    if (connectorsWithNoIncomingIfaces(model).isEmpty)
      ArchRuleResult("All connectors have at least one incoming interface", TestPassed)
    else
      ArchRuleResult("These connectors don't have an incoming interface: " + connectorsWithNoIncomingIfaces(model), TestFailed)
}

Listing A.19: ArchStudio rule for finding connectors inside a model that have no incoming interface.

package au.uow.rules

import _root_.au.uow.archelements.ArchInterface.Direction._
import _root_.au.uow.archelements.{ArchConnector, ArchModel}
import scala.collection.JavaConversions._

/**
 * This rule verifies that every connector in an ArchModel has at least one outgoing interface.
 * Created by Matthias Braun on 11/5/14.
 */
object ConnectorHasOutgoingIface extends ArchRule {

  implicit def description = RuleDescription("Connectors must have an outgoing interface")

  val hasOutgoingIface = (_: ArchConnector).getInterfaces.exists(_.getDirection == OUT)
  val connectorsWithNoOutgoingIfaces = (_: ArchModel).getConnectors.filterNot(hasOutgoingIface)

  /** @return whether all connectors in the model have an outgoing interface */
  def check(model: ArchModel) =
    if (connectorsWithNoOutgoingIfaces(model).isEmpty)
      ArchRuleResult("All connectors have at least one outgoing interface", TestPassed)
    else
      ArchRuleResult("These connectors don't have an outgoing interface: " + connectorsWithNoOutgoingIfaces(model), TestFailed)
}

Listing A.20: ArchStudio rule for finding connectors inside a model that have no outgoing interface.

package au.uow.rules

import _root_.au.uow.archelements.{ArchComponent, ArchModel}
import scala.collection.JavaConversions._

/**
 * This rule verifies that every mandatory component has at least one mandatory interface.
 * Created by Matthias Braun on 11/11/14.
 */
object MandatoryComponentHasMandatoryIface extends ArchRule {

  implicit def description = RuleDescription("Mandatory components must have at least one mandatory interface")

  /** @return whether this model contains a mandatory component that only has optional interfaces */
  def check(model: ArchModel) = {

    // Filter out those mandatory components that have a mandatory interface.
    // This leaves us with the potentially unconnected components
    val mandComps = model.getComponents.filter(_.isMandatory)
    val hasMandatoryIface = (_: ArchComponent).getInterfaces.exists(_.isMandatory)
    val mandCompsWithoutMandIface = mandComps.filterNot(hasMandatoryIface)

    if (mandCompsWithoutMandIface.isEmpty)
      ArchRuleResult("Each mandatory component has a mandatory interface", TestPassed)
    else {
      val affectedCompsDescription = mandCompsWithoutMandIface.map(_.getDescription).mkString(", ")
      ArchRuleResult("Mandatory components without a mandatory interface: " + affectedCompsDescription, TestFailed)
    }
  }
}

Listing A.21: ArchStudio rule for finding mandatory components inside a model that have only optional interfaces.

package au.uow.rules

import _root_.au.uow.archelements.ArchModel
import scala.collection.JavaConversions._

/**
 * This rule verifies that there is at least one mandatory component in the model.
 * Created by Matthias Braun on 11/11/14.
 */
object ModelHasMandatoryComponents extends ArchRule {

  implicit def description = RuleDescription("Model must have at least one mandatory component")

  /** @return whether there are mandatory components within the model */
  def check(model: ArchModel) =
    if (model.getComponents.exists(_.isMandatory))
      ArchRuleResult("Model has a mandatory component", TestPassed)
    else
      ArchRuleResult("Model doesn't have any mandatory components", TestFailed)
}

Listing A.22: ArchStudio rule for detecting whether a model has mandatory components.

package au.uow.rules

import _root_.au.uow.archelements.{ArchComponent, ArchModel}
import scala.collection.JavaConversions._

/**
 * This rule verifies that every mandatory component has at least one mandatory interface.
 * Created by Matthias Braun on 11/11/14.
 */
object MandatoryComponentHasMandatoryIface extends ArchRule {

  implicit def description = RuleDescription("Mandatory components must have at least one mandatory interface")

  /** @return whether this model contains a mandatory component that only has optional interfaces */
  def check(model: ArchModel) = {

    // Filter out those mandatory components that have a mandatory interface.
    // This leaves us with the potentially unconnected components
    val mandComps = model.getComponents.filter(_.isMandatory)
    val hasMandatoryIface = (_: ArchComponent).getInterfaces.exists(_.isMandatory)
    val mandCompsWithoutMandIface = mandComps.filterNot(hasMandatoryIface)

    if (mandCompsWithoutMandIface.isEmpty)
      ArchRuleResult("Each mandatory component has a mandatory interface", TestPassed)
    else {
      val affectedCompsDescription = mandCompsWithoutMandIface.map(_.getDescription).mkString(", ")
      ArchRuleResult("Mandatory components without a mandatory interface: " + affectedCompsDescription, TestFailed)
    }
  }
}

Listing A.23: ArchStudio rule for finding components that have mandatory links on their optional interfaces.

Appendix B

Performance Measurements Raw Data

B.1 Efficiency

INFO: 6 input products: P0, P1, P12, P16, P2, P8, (Main.java:200) Main.start
INFO: Tree pruning enabled: false (Main.java:201) Main.start
INFO: Nr of removed nodes: 0 (Main.java:202) Main.start
INFO: Total number of code blocks created: 20146 (Main.java:203) Main.start
INFO: 5 desired features: Diagrams, ArgoUML, Class, COGNITIVE, LOGGING, (Main.java:71) Main.main
INFO: Runtime: 39811 ms (Main.java:73) Main.main

Listing B.1: Efficiency measurement: generate a new ArgoUML product with five features from six input products. Tree pruning via rule application is turned off.

INFO: 6 input products: P0, P1, P12, P16, P2, P8, (Main.java:200) Main.start
INFO: Tree pruning enabled: true (Main.java:201) Main.start
INFO: Nr of removed nodes: 6594 (Main.java:202) Main.start
INFO: Total number of code blocks created: 20146 (Main.java:203) Main.start
INFO: 5 desired features: Diagrams, ArgoUML, Class, COGNITIVE, LOGGING, (Main.java:71) Main.main
INFO: Runtime: 63424 ms (Main.java:73) Main.main

Listing B.2: Efficiency measurement: generate a new ArgoUML product with five features from six input products. Tree pruning via rule application is turned on.

INFO: 3 input products: P10, P4, P5, (Main.java:200) Main.start
INFO: Tree pruning enabled: false (Main.java:201) Main.start
INFO: Nr of removed nodes: 0 (Main.java:202) Main.start
INFO: Total number of code blocks created: 74 (Main.java:203) Main.start
INFO: 5 desired features: drawRect, drawLine, wipe, DPL, color, (Main.java:71) Main.main
INFO: Runtime: 1530 ms (Main.java:73) Main.main

Listing B.3: Efficiency measurement: generate a new Draw Product Line product with five features from three input products. Tree pruning via rule application is turned off.

INFO: 3 input products: P10, P4, P5, (Main.java:200) Main.start
INFO: Tree pruning enabled: true (Main.java:201) Main.start
INFO: Nr of removed nodes: 16 (Main.java:202) Main.start
INFO: Total number of code blocks created: 72 (Main.java:203) Main.start
INFO: 5 desired features: drawRect, drawLine, wipe, DPL, color, (Main.java:71) Main.main
INFO: Runtime: 1609 ms (Main.java:73) Main.main

Listing B.4: Efficiency measurement: generate a new Draw Product Line product with five features from three input products. Tree pruning via rule application is turned on.

INFO: 4 input products: P11, P17, P29, P3, (Main.java:200) Main.start
INFO: Tree pruning enabled: false (Main.java:201) Main.start
INFO: Nr of removed nodes: 0 (Main.java:202) Main.start
INFO: Total number of code blocks created: 525 (Main.java:203) Main.start
INFO: 11 desired features: StopMovie, QuitPlayer, Pause, StartPlayer, VRCInterface, VOD, ChangeServer, Detail, StartMovie, SelectMovie, PlayImm, (Main.java:71) Main.main
INFO: Runtime: 3247 ms (Main.java:73) Main.main

Listing B.5: Efficiency measurement: generate a new Video On Demand product with eleven features from four input products. Tree pruning via rule application is turned off.

INFO: 4 input products: P11, P17, P29, P3, (Main.java:200) Main.start
INFO: Tree pruning enabled: true (Main.java:201) Main.start
INFO: Nr of removed nodes: 6 (Main.java:202) Main.start
INFO: Total number of code blocks created: 525 (Main.java:203) Main.start
INFO: 11 desired features: StopMovie, QuitPlayer, Pause, StartPlayer, VRCInterface, VOD, ChangeServer, Detail, StartMovie, SelectMovie, PlayImm, (Main.java:71) Main.main
INFO: Runtime: 5121 ms (Main.java:73) Main.main

Listing B.6: Efficiency measurement: generate a new Video On Demand product with eleven features from four input products. Tree pruning via rule application is turned on.
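The pruning numbers above can be put in relative terms. The following sketch derives from the raw values in Listings B.1 and B.2 how many of the ArgoUML code blocks the rules pruned and how much runtime the rule application added. The class PruningStats and its percent helper are illustrative and not part of ECCO.

```java
// Ratios derived from the raw numbers in Listings B.1 and B.2 (ArgoUML, six input products).
final class PruningStats {

    /** Returns which percentage of {@code whole} the value {@code part} is. */
    static double percent(double part, double whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        // 6594 of the 20146 created code blocks were removed by tree pruning (Listing B.2)
        System.out.printf("pruned blocks: %.1f%%%n", percent(6594, 20146));
        // The runtime grew from 39811 ms (pruning off, Listing B.1) to 63424 ms (pruning on, Listing B.2)
        System.out.printf("runtime overhead: %.1f%%%n", percent(63424 - 39811, 39811));
    }
}
```

With these inputs the sketch reports that roughly a third of the code blocks were pruned while the runtime grew by roughly 60 percent.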

B.2 Effectiveness

Input products: [P1, P3, P11]
Target product: P2 (LOGGING, Diagrams, Class, ArgoUML)
Nr of code blocks (excluding alternate variants): 20506
Total nr of code blocks (including alternate variants) before filtering: 20534
Total nr of code blocks (including alternate variants) after filtering: 20498
Average nr of variants per code block before filtering: 1.001365454
Average nr of variants per code block after filtering: 0.9996098703
Nr of code blocks that were corrected (but not filtered out): 9
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 2.2173913043
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 1.2173913043

Listing B.7: Effectiveness measurement: generate a new ArgoUML product with four features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P4, P5, P10]
Target product: P12 (ACTIVITYDIAGRAM, STATEDIAGRAM, Diagrams, Class, ArgoUML)
Nr of code blocks (excluding alternate variants): 19145
Total nr of code blocks (including alternate variants) before filtering: 19282
Total nr of code blocks (including alternate variants) after filtering: 19260
Average nr of variants per code block before filtering: 1.0071559154
Average nr of variants per code block after filtering: 1.0060067903
Nr of code blocks that were corrected (but not filtered out): 13
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 4.6052631579
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 4.3684210526

Listing B.8: Effectiveness measurement: generate a new ArgoUML product with five features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P5, P8, P10]
Target product: P7 (Diagrams, ArgoUML, ACTIVITYDIAGRAM, Class, COGNITIVE, LOGGING)
Nr of code blocks (excluding alternate variants): 20759
Total nr of code blocks (including alternate variants) before filtering: 20891
Total nr of code blocks (including alternate variants) after filtering: 20861
Average nr of variants per code block before filtering: 1.0063586878
Average nr of variants per code block after filtering: 1.0049135315
Nr of code blocks that were corrected (but not filtered out): 15
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 4.5675675676
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 4.1081081081

Listing B.9: Effectiveness measurement: generate a new ArgoUML product with six features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products --> target product

[P4, P5, P10] --> P12
[P5, P8, P10] --> P7
[P1, P3, P11] --> P2

Average number of variants per code block before filtering: 1.0049164046
Average number of variants per code block after filtering: 1.0034596921
Total nr of variants that were corrected (but not filtered out): 37
Average number of variants per code block before filtering (excluding those code blocks that had only one variant): 4.0306122449
Average number of variants per code block after filtering (excluding those code blocks that had only one variant): 3.1326530612

Listing B.10: Average numbers for the results from the last three ArgoUML effectiveness measurements (B.7, B.8, and B.9).
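One way to read the averages in Listing B.10 is as the share of surplus variants that the rules eliminate: every code block needs exactly one variant, so anything above an average of 1.0 is excess. The sketch below computes that share from the ArgoUML averages; the class EffectivenessStats and its excessRemoved helper are illustrative and not part of ECCO.

```java
// Interprets the ArgoUML averages from Listing B.10.
final class EffectivenessStats {

    /**
     * Fraction of the excess variants (those beyond the single variant every
     * code block needs) that the rule-based filtering removed.
     */
    static double excessRemoved(double avgBefore, double avgAfter) {
        return (avgBefore - avgAfter) / (avgBefore - 1.0);
    }

    public static void main(String[] args) {
        // Averages over all code blocks, before and after filtering (Listing B.10)
        System.out.printf("excess variants removed: %.1f%%%n",
                100.0 * excessRemoved(1.0049164046, 1.0034596921));
    }
}
```

With these inputs the sketch reports that close to 30 percent of the surplus variants were filtered out.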

Input products: [P1, P3, P11]
Target product: P2 (DPL, drawRect)
Nr of code blocks (excluding alternate variants): 58
Total nr of code blocks (including alternate variants) before filtering: 58
Total nr of code blocks (including alternate variants) after filtering: 58
Average nr of variants per code block before filtering: 1
Average nr of variants per code block after filtering: 1
Nr of code blocks that were corrected (but not filtered out): 0
There are no code blocks with multiple variants

Listing B.11: Effectiveness measurement: generate a new Draw Product Line product with two features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P4, P5, P10]
Target product: P12 (DPL, drawLine, drawRect, color, wipe)
Nr of code blocks (excluding alternate variants): 60
Total nr of code blocks (including alternate variants) before filtering: 74
Total nr of code blocks (including alternate variants) after filtering: 72
Average nr of variants per code block before filtering: 1.2333333333
Average nr of variants per code block after filtering: 1.2
Nr of code blocks that were corrected (but not filtered out): 4
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 3.8
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 3.4

Listing B.12: Effectiveness measurement: generate a new Draw Product Line product with five features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P5, P8, P10]
Target product: P7
Nr of code blocks (excluding alternate variants): 61
Total nr of code blocks (including alternate variants) before filtering: 73
Total nr of code blocks (including alternate variants) after filtering: 72
Average nr of variants per code block before filtering: 1.1967213115
Average nr of variants per code block after filtering: 1.1803278689
Nr of code blocks that were corrected (but not filtered out): 6
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 2
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 1.9166666667

Listing B.13: Effectiveness measurement: generate a new Draw Product Line product with three features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products --> target product

[P4, P5, P10] --> P12
[P5, P8, P10] --> P7
[P1, P3, P11] --> P2

Average number of variants per code block before filtering: 1.1452513966
Average number of variants per code block after filtering: 1.1284916201
Total nr of variants that were corrected (but not filtered out): 10
Average number of variants per code block before filtering (excluding those code blocks that had only one variant): 2.5294117647
Average number of variants per code block after filtering (excluding those code blocks that had only one variant): 2.3529411765

Listing B.14: Average numbers for the results from the last three Draw Product Line effectiveness measurements (B.11, B.12, and B.13).

Input products: [P1, P3, P11]
Target product: P2 (SelectMovie, PlayImm, VRCInterface, StopMovie, StartMovie, StartPlayer, VOD)
Nr of code blocks (excluding alternate variants): 433
Total nr of code blocks (including alternate variants) before filtering: 433
Total nr of code blocks (including alternate variants) after filtering: 433
Average nr of variants per code block before filtering: 1
Average nr of variants per code block after filtering: 1
Nr of code blocks that were corrected (but not filtered out): 3
There are no code blocks with multiple variants

Listing B.15: Effectiveness measurement: generate a new Video On Demand product with seven features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P4, P5, P10]
Target product: P12
Nr of code blocks (excluding alternate variants): 440
Total nr of code blocks (including alternate variants) before filtering: 510
Total nr of code blocks (including alternate variants) after filtering: 510
Average nr of variants per code block before filtering: 1.1590909091
Average nr of variants per code block after filtering: 1.1590909091
Nr of code blocks that were corrected (but not filtered out): 3
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 36
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 36

Listing B.16: Effectiveness measurement: generate a new Video On Demand product with eight features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P5, P8, P10]
Target product: P7 (SelectMovie, PlayImm, Pause, VRCInterface, StopMovie, StartMovie, QuitPlayer, StartPlayer, VOD)
Nr of code blocks (excluding alternate variants): 438
Total nr of code blocks (including alternate variants) before filtering: 508
Total nr of code blocks (including alternate variants) after filtering: 438
Average nr of variants per code block before filtering: 1.1598173516
Average nr of variants per code block after filtering: 1
Nr of code blocks that were corrected (but not filtered out): 3
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 36
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 1

Listing B.17: Effectiveness measurement: generate a new Video On Demand product with nine features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products --> target product

[P4, P5, P10] --> P12
[P5, P8, P10] --> P7
[P1, P3, P11] --> P2

Average number of variants per code block before filtering: 1.1067887109
Average number of variants per code block after filtering: 1.0533943555
Total nr of variants that were corrected (but not filtered out): 9
Average number of variants per code block before filtering (excluding those code blocks that had only one variant): 36
Average number of variants per code block after filtering (excluding those code blocks that had only one variant): 18.5

Listing B.18: Average numbers for the results from the last three Video On Demand effectiveness measurements (B.15, B.16, and B.17).

Appendix C

ArchStudio Model Merge & Repair

This chapter depicts the merge scenarios that motivate the model rules described in Section 6.6. The basic structure of the merge models is the same for all three scenarios: based on an Ancestor software model, two change sets, Alice and Bob, are combined into a merged model titled Merge. Although both Alice's and Bob's changes are benign and valid when seen in isolation, combining them causes an invalidity in the merged model. Section 4.2 gives more information about this overarching principle, which also affects source code merging and its respective rules. Optional components and links are drawn with dotted lines to distinguish them from the mandatory elements of a software product line. Elements highlighted in orange were changed with respect to the previous model version. Each of the invalid artifacts occurring in these model versions, and its rationale, is discussed in Section 6.6.

C.1 First Repair Leads to Cyclic Dependency

[Figure panels: Ancestor, Alice, Bob, Merge, Fix #1, and Fix #2 — architecture diagrams showing components Comp A through Comp G and Connector B. The Merge panel is annotated "Mandatory interface must be connected to a mandatory component"; the Fix #1 panel is annotated "Cyclic dependency".]

C.2 First Repair Removes Last Mandatory Interface from Component

[Figure panels: Ancestor, Alice, Bob, Merge, Fix #1, and Fix #2 — architecture diagrams showing components Comp A through Comp G. The Merge panel is annotated "No mandatory components"; the Fix #1 panel is annotated "No mandatory interfaces on Comp A".]

C.3 First Repair Removes Last Incoming Interface from Connector. Second Repair Creates Mandatory Link on Optional Interface

[Figure panels: Ancestor, Alice, Bob, Merge, Fix #1, Fix #2, and Fix #3 — architecture diagrams showing components Comp A through Comp E and a Connector. The Merge panel is annotated "Cyclic dependency"; the Fix #1 panel is annotated "No incoming interface for Connector"; the Fix #2 panel is annotated "Mandatory link on optional interface".]

Appendix D

Curriculum Vitae

Matthias Braun, BSc
Figulystraße 35, 4020 Linz, Austria
Phone: 06763092253
Email: [email protected]
Stack Overflow: Matthias Braun

Professional Experience

Since April '15 — Self-employed, for GMR Fotografen GmbH, Austria. Design and development of photo ordering software. Used technologies: Java 8, Gradle

Summer term '15 — Tutor, Institute for Software Systems Engineering, Johannes Kepler University. Course: Software Processes and Tools. Used technologies: Git, Gerrit, EPF Composer, Java 8

April – Nov. '14 — Overseas internship, Decision Systems Lab, University of Wollongong, Australia. Research and software development for master's thesis with ArchStudio 3. Used technologies: Java 8, Scala

Summer term '14 — Tutor, Institute for Systems Engineering and Automation, Johannes Kepler University. Course: Software Processes and Tools; held a talk about continuous delivery in the context of the course. Used technologies: Git, Gerrit, EPF Composer, JavaFX

March – Oct. '13 — Software development, Christian Doppler Labor „Monitoring and Evolution of Very-Large-Scale Software Systems", Johannes Kepler University. Real-time visualization of communication among Siemens VAI blast furnace software components. Used technologies: Eclipse RCP (Java), d3.js (JavaScript)

Summer term '13 — Tutor, Institute for Systems Engineering and Automation, Johannes Kepler University. Course: Software Processes and Tools. Used technologies: SVN, IBM Rational Rhapsody

Sept. '12 — Internship, RACON Software GmbH. Automation of website load tests via a JMeter plugin. Used technologies: Swing (Java), Ant, Maven

August '12 — Internship, Institute for Systems Engineering and Automation, Johannes Kepler University. Creo plugin for checking consistency between model and documentation. Used technologies: Java 6, Creo

March – June '11 — Co-op program, Qualcomm Research Center Vienna. Benchmark automation and visualization for the Vuforia SDK. Used technologies: Python, JavaScript, Bash, Android Debug Bridge, ASP.NET (C#)

Sept. '10 — Summer internship, RACON Software GmbH. Java tool for code maintenance. Used technologies: Swing (Java)

Education

Since March '12 — Johannes Kepler University, Linz. Master's degree: Software Engineering

Oct. '11 – March '12 — Graz University of Technology. Master's: Computer Science

Oct. '11 — Hagenberg Campus of the University of Applied Sciences Upper Austria. Bachelor's degree: Mobile Computing; graduation: Bachelor of Science in Engineering. Bachelor's thesis: "Augmented Reality Interface for Home Automation and Control" (YouTube video: Augmented Reality meets Home Automation). Used technologies: Android, Vuforia SDK, digitalSTROM

June '07 — Akademisches Gymnasium Linz. Graduation: matriculation

Appendix E

Erklärung (Statutory Declaration)

I declare under oath that I wrote this master's thesis independently and without outside assistance, that I used no sources or aids other than those indicated, and that I have marked all passages taken verbatim or in substance from other sources as such. This master's thesis is identical to the electronically submitted text document.
