JOHANNES KEPLER UNIVERSITÄT LINZ

Technisch-Naturwissenschaftliche Fakultät

Detecting Invalid Choices in Merged Code and Software Models

MASTERARBEIT

zur Erlangung des akademischen Grades Diplom-Ingenieur

im Masterstudium Software Engineering

Eingereicht von: Matthias Braun B.Sc.

Angefertigt am: Institute for Software Systems Engineering

Beurteilung: Univ.-Prof. Dr. Alexander Egyed M.Sc.

Linz, August 2015

Contents

Zusammenfassung
  0.1 Objective
  0.2 Method
  0.3 Results

Abstract
  0.4 Objective
  0.5 Method
  0.6 Results and Conclusions

1 Introduction
  1.1 Terminology
  1.2 Motivation and Goals
  1.3 Method
  1.4 Related Work
  1.5 Thesis Structure

2 Background
  2.1 Code Merging
    2.1.1 Two-Way Merging
    2.1.2 Three-Way Merging
  2.2 Model Merging
  2.3 Invalid Artifacts in Code
    2.3.1 Code Clones Cause Invalid Artifacts
    2.3.2 Invalidity Categorization
    2.3.3 Detection Approaches
      2.3.3.1 Code Inspection
      2.3.3.2 Testing
      2.3.3.3 Static Code Analysis
  2.4 Invalid Artifacts in Models
    2.4.1 Multiple Views Cause Invalid Artifacts
    2.4.2 Invalidity Categorization
    2.4.3 Detection Approaches
  2.5 Software Product Lines

3 Approach
  3.1 Rules
  3.2 Limitations

4 Commonalities of Code and Model Rules
  4.1 Common Rule Categories
    4.1.1 Invalid Artifact Combination
    4.1.2 Dispensable Artifact
  4.2 Merging Causes Invalidities

5 ECCO Case Study
  5.1 ECCO Platform Background
    5.1.1 Functionality
      5.1.1.1 Feature to Code Mapping
      5.1.1.2 Composing New Products
    5.1.2 Architecture
  5.2 Parsing the ECCO Code Tree
  5.3 Case Study Approach
    5.3.1 Related Work
  5.4 Case Study Motivation
  5.5 Rules
    5.5.1 Add Listener Equivalence Rule
    5.5.2 Multiple Variable Assignment Rule
    5.5.3 Multiple Setter Call Rule
    5.5.4 Uninitialized Read Rule
  5.6 Empirical Results on Rules Performance
    5.6.1 Effectiveness
    5.6.2 Validity
    5.6.3 Efficiency

6 ArchStudio Case Study
  6.1 ArchStudio 3 Platform Background
    6.1.1 Functionality
      6.1.1.1 Model Creation and Editing
      6.1.1.2 Diffing and Merging
      6.1.1.3 Detection of Invalid Model Artifacts
    6.1.2 Architecture
  6.2 Architecture Representation and Parsing
  6.3 Rule Engine for ArchStudio 3
    6.3.1 Rule Language
  6.4 Case Study Approach
    6.4.1 Related Work
  6.5 Case Study Motivation
  6.6 Rules
    6.6.1 Circular Dependency
    6.6.2 Connector Has Incoming Interface
    6.6.3 Connector Has Outgoing Interface
    6.6.4 Mandatory Component Has Mandatory Interface
    6.6.5 Model Has Mandatory Components
    6.6.6 No Mandatory Link on Optional Interface
  6.7 Empirical Results

7 Conclusion
  7.1 Threats to Validity

Bibliography

A Source Code
  A.1 Code for ECCO
  A.2 Code for ArchStudio
    A.2.1 ArchStudio Parsing and Modeling
    A.2.2 ArchStudio Rules

B Performance Measurements Raw Data
  B.1 Efficiency
  B.2 Effectiveness

C ArchStudio Model Merge & Repair
  C.1 First Repair Leads to Cyclic Dependency
  C.2 First Repair Removes Last Mandatory Interface from Component
  C.3 First Repair Removes Last Incoming Interface from Connector. Second Repair Creates Mandatory Link on Optional Interface

D Curriculum Vitae

E Erklärung

For Theresa

Zusammenfassung

0.1 Objective

This master's thesis describes how, in the area of software development, choosing from the multiple choices that result from a software merge can be simplified. These choices arise because, when merging software, its artifacts can be combined in different ways. The software artifacts we deal with in this thesis are source code on the one hand and software models on the other. Our goal is to ease the decision for software engineers when they have to select from a set of choices that arose from a software merge.

0.2 Method

To achieve this goal, we apply rules to the available choices. Based on these rules, we remove choices that are invalid. We consider a choice invalid if it either contains a flaw (for example, a null pointer dereference or a cyclic dependency) or if the choice is equivalent to another choice. We define two choices as equivalent if they behave identically. Our rules determine whether a choice is invalid.

0.3 Results

We test this method by designing rules for source code and software models. These rules are applied and evaluated in two case studies. We show how we use these rules to identify invalid choices in source code and software models and thereby reduce the number of remaining choices.

Abstract

0.4 Objective

The work this master’s thesis describes aims to reduce the effort of selecting one of multiple choices that are the result of a software merge. Such choices occur since a merge can combine software artifacts in different ways. The software artifacts this thesis is concerned with are source code and software architecture models. Our goal is to facilitate decision-making for a software engineer who has to choose from a range of choices that were produced by a software merge.

0.5 Method

To meet this objective, we apply rules to the available choices from which a software engineer has to choose. Using these rules we eliminate invalid choices. We consider a choice invalid if it either contains a serious flaw (e.g., a null pointer dereference or a circular dependency) or if the choice is equivalent to another one. We define two choices as equivalent if their behavior is the same. It is our rules’ task to determine whether a choice is invalid.

0.6 Results and Conclusions

We test this method by creating rules for source code and software models. The developed rules are tested on two case studies by applying them to code and models, respectively. We demonstrate how we can use these rules to detect invalid choices in source code and models in order to reduce the number of available choices.

Chapter 1

Introduction

In this chapter we outline our motivation for the work presented in this thesis, as well as its goals and methods. To facilitate discussing these topics, the next section defines a few terms used throughout the thesis.

1.1 Terminology

Since we will be working with software models and source code alike in the course of this thesis, we shall introduce the umbrella term artifact. This word shall denote an element of either architecture models (e.g., component, link, and interface) or source code (e.g., code block, method call, and variable assignment).

This thesis deals with eliminating and repairing choices. Generally speaking, these choices can stem from various sources: a software engineer could create multiple, slightly differing methods to compare their execution time, readability, or memory consumption and then choose the most efficient method. In this thesis, however, we concentrate on choices which were created automatically through source code and model merges, respectively. As we will see in Chapter 2, artifacts can be combined in different ways during a merge, which creates choices in the sense we defined above.

Similar to the term artifact, the word choice is intended to bridge the areas of source code and architecture models in software engineering. For source code, choices are single methods that vary in some respect (e.g., the order of their statements might differ). Like source code choices, architecture choices vary, too (e.g., in the way their components are linked). Building on the definition of artifacts, we want to make it clear that a choice is made of artifacts, i.e., a choice contains artifacts.


We prefer the term choices, as opposed to variants, options, or alternatives, to express the fact that a software engineer has to choose from them when designing software.

The term invalid in this thesis is often applied to choices which are

1. inconsistent because they contain a serious flaw, or

2. equivalent to another choice, thus rendering them redundant.

Both kinds of invalidity lead to the removal or repair of the choice. We apply the adjective invalid not only to choices but also to artifacts within a choice: if an artifact causes the choice it is a part of to be invalid, we label this artifact as invalid as well.

1.2 Motivation and Goals

People designing and creating software are confronted with a multitude of choices in their daily work. These choices involve picking the right tools for software development, choosing from a plethora of programming languages, and deciding how to architect a software system in general. This thesis focuses on one particular source of such choices: the merging of source code and software models. During merging, artifacts can be combined in different ways, giving the software engineer choices. See Chapter 2 for explanations of why software merging creates multiple choices to choose from.

Since all of these choices can overwhelm a software engineer, decreasing their number reduces the burden on the person dealing with them in software development, e.g., the software engineer. This is the motivation for the effort we undertook in this master's thesis. Facilitating decision-making by removing invalid choices should reduce the effort in software engineering; the saved effort allows the engineer to focus more on the remaining choices and their quality. This is a worthwhile goal, since the significance of software quality has increased in recent history and will likely continue to do so in years to come [137].

As software systems become integrated in ever more appliances in our daily lives [76, 79], the need for stable, high-quality software becomes a matter of personal convenience and also safety for a lot of people. Moreover, software is becoming an indispensable business asset for an ever-growing number of industries [3], making high-quality software economically important as well. We hope that the findings of this thesis help software engineers create higher-quality software by eliminating choices which are clearly invalid, letting engineers focus on the few choices which are meaningful.

1.3 Method

In order to relieve the software engineer from having to manually sort out invalid choices, we apply rules to the possible choices to eliminate or repair them. We analyze choices and the artifacts of which they consist (as defined in Section 1.1) using these rules. If an artifact within a choice violates one or more of these rules, i.e., the choice is invalid, we decide that the choice should be repaired or eliminated. Depending on the rule that was violated, we try to repair the choice to make it conform to the rule. If there is no fix for the choice, it is removed, thus reducing the effort a software engineer has to invest to select one of the choices.

We acknowledge that we cannot automatically repair all invalid choices but focus instead on the cases where the repair is straightforward (see Section 5.5 for details).

Our rules detect flaws in software artifacts such as potential null pointer dereferences, redundant assignments, or circular dependencies among components. Furthermore, we have created a domain-specific rule that determines whether two choices are equivalent. We call two choices equivalent if they behave identically. Figure 3.1 illustrates our approach in broad strokes, whereas Chapter 3 explains it in more detail.
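To make the elimination-and-repair loop concrete, the following sketch shows one minimal way such rule application could look. It is an illustration only, not the implementation used in the case studies; the names (`Rule`, `filter_choices`) and the two toy rules are our own assumptions, with a choice simplified to a list of statement strings.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A choice is simplified to a list of statement strings.
Choice = List[str]

@dataclass
class Rule:
    name: str
    violates: Callable[[Choice], bool]                   # does the choice break this rule?
    repair: Optional[Callable[[Choice], Choice]] = None  # a fix, if one is known

def filter_choices(choices, rules):
    """Apply every rule to every choice: repair a violating choice where a
    fix is known, eliminate it where none is."""
    surviving = []
    for choice in choices:
        valid = True
        for rule in rules:
            if rule.violates(choice):
                if rule.repair is not None:
                    choice = rule.repair(choice)  # straightforward repair
                else:
                    valid = False                 # no fix known: eliminate
                    break
        if valid:
            surviving.append(choice)
    return surviving

# Two toy rules: a repairable "redundant consecutive assignment" and a
# fatal "statement reads x before any assignment of x".
dedup = Rule(
    name="redundant assignment",
    violates=lambda c: any(a == b for a, b in zip(c, c[1:])),
    repair=lambda c: [s for i, s in enumerate(c) if i == 0 or s != c[i - 1]],
)
uninitialized = Rule(
    name="uninitialized read",
    violates=lambda c: not c[0].startswith("x ="),
)
```

Run on three toy choices, the first is repaired (its duplicated assignment is dropped), the second is eliminated (it reads x before assigning it), and the third survives unchanged.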

1.4 Related Work

In this section we give a brief outline of papers related to our approach, either because the portrayed approach is similar to ours or because the goal of their work is similar; in each case we describe how their approach compares to ours.

With respect to the diagram that outlines our generic approach in Figure 3.1, we argue that the abstract Choice Generator in the diagram can be substituted with a Software Product Line (SPL)1. This stands to reason when we consider that the SPL takes multiple features as input (these are the Artifacts in Figure 3.1) and generates a new product from them. It offers Choices (as we defined the term in Section 1.1) to the software engineer with regard to which features to combine into the resulting product. Therefore, work that proposes ways to analyze SPLs with the goal of eliminating invalid choices which an SPL might generate is considered related, since it shares the goal of our thesis.

1 For a short introduction to SPLs, see the background information in Section 2.5.

Consequently, “Type Checking Annotation-Based Product Lines” by Kästner et al. [115] is related to our work as they aim to detect invalid software choices, too. Yet their approach differs from ours: they employ a type system to find type errors in product lines. In the same vein, Acher et al. [1, Section B] showed in their work how to detect invalid choices in SPLs, such as features that can never be part of a product, by employing formal methods. Also, Bontemps et al. [18] analyze SPLs using formal methods with the intention of “assisting stakeholders in selecting features”, which is quite similar to the goal pursued by our work stated in Section 1.2.

Since our approach aims to facilitate decision-making with regard to software choices, we consider the work of Schmid and John on variability management [164] as related to our thesis. The scope of their work is admittedly broader than ours as they provide guidance in transitioning an organization's software development practice to using a software product line. But in accordance with our goal, they aim to reduce the effort of decision-making in software engineering when it comes to choosing between software variants, a concept which is similar to the Choices we defined in Section 1.1. Their approach differs from ours insofar as they apply a meta model to all artifacts of an existing software project which allows software engineers to create new software variants within the constraints of this meta model [164, Subsection 4.1].

The Model/Analyzer by Reder and Egyed [155] uses rules to analyze models with the goal of giving feedback on model changes instantaneously, similar to the feedback of integrated development environments for code. Also, their tool lets users create their own rules, which are in the style of predicate logic.
While their work, like ours, uses rules to detect invalid artifacts, it focuses on incrementally checking models, whereas we analyze models and source code in their entirety.

1.5 Thesis Structure

This thesis is structured as follows:

• In Chapter 2, after this introduction, we provide background information on topics relevant to this thesis, such as software merging and invalidity detection. This background knowledge will help the reader understand the following chapters more easily.

• Afterwards, Chapter 3 explains our approach for eliminating and repairing choices. With regard to the rules, we describe overarching principles that apply to source code as well as to architecture rules.

• Drawing from the observations made during the work for this thesis, Chapter 4 shows commonalities between rules for source code and rules for software models. Examples of these commonalities will be highlighted throughout the subsequent case studies.

• In the next two chapters, we test our approach in two case studies: the first for source code rules in Chapter 5, the second for software model rules in Chapter 6. We discuss the effectiveness and efficiency of our approach at the end of each case study.

• We conclude with Chapter 7, which recapitulates the contents of this thesis and accounts for threats to its validity.

Chapter 2

Background

This chapter gives short introductions to topics pertaining to our work. We provide these introductions hoping that they will make it easier to understand our approach described in Chapter 3 as well as the case studies later on in Chapter 5 and Chapter 6.

2.1 Code Merging

As our work focuses on analyzing merged software, this section shall give an overview of how code merging works, what kinds of artifacts are merged, and what the motivation for merging code is.

One of the motivations for merging software occurs when two or more software engineers work on the same software artifacts in parallel. Assuming that these artifacts are kept under optimistic version control, that is, the concurrent modification of files is allowed1, the files have to be merged in order to include the changes of both engineers. This practice is called parallel development [153]. As we will see in Chapter 5, parallel development is not the only reason for code merging, but it is practically unavoidable in larger software projects [12, Section 1] and thus a common reason for code merges.

The stage in software development where software engineers merge the code that they have edited individually is called the integration phase. This phase used to recur in intervals of weeks to months [153, Section 2.3] and took the authors of the code-to-be-merged days to weeks to finish successfully [68]. It was difficult to create a merged code base that had no obvious issues due to changes made between the current integration and the last one, such as restructurings in the code, modifications of interfaces, and the altering of database schemas. Perry et al. [153, Section 5.1] confirm this when they show that software artifacts that are edited by multiple engineers tend to contain more invalid artifacts.

The complexity of the merging task and the resulting effort led to ideas within the domain of software engineering, like continuous integration, that emphasize the importance of merging code daily [50, 96]. Continuous integration regards delaying the integration phase to a point where a lot of code changes have to be integrated as too risky and overly complex. Proponents of continuous integration instead recommend frequently merging fewer and thus simpler sets of changes, following an adage of extreme programming: “if it hurts, do it more often” [69][91, Chapter 10]. The high frequency of integrations ensures that the working copies of the individual engineers do not drift apart too much, so that merging remains feasible.

Continuously merging code also has a positive impact on software engineers' attitude towards refactoring other people's code: when code changes are integrated within the working day, it is less risky for one engineer to change the code of another engineer. Should the refactoring have caused any unintended side effects (detected either by the automated test suite or personally by the engineer whose code was affected), the recent change is far easier to undo than one that is a month old, for example. This is attributed to the fact that comparably few other changes have occurred since then and the memory of the engineer who was responsible for the refactoring is still fresh [65]. From the above we can see that code merging is a widespread—and encouraged—practice in the software industry.

1 The opposite of optimistic version control is pessimistic version control, where each file is associated with a corresponding lock which engineers must obtain before they can edit the file [136].
Since we are concerned with analyzing merged code, our thesis deals with a very common kind of software engineering artifact, which is motivating for us. Assuming that the software artifacts described before are treated as plain text files, there are two merging strategies available: two-way merging and three-way merging. They are described in the rest of this section.

2.1.1 Two-Way Merging

If a text file was edited by two engineers in parallel, the two-way merging strategy can compare the two versions of the file line by line. An implementation of the two-way merge such as the one used in the Unix diff utility [136, Chapter 2.1] is essentially a program that solves the longest common subsequence problem [94], where the subsequences are the matching lines between the files. As its output, the algorithm “creates a list of what lines of one file have to be changed to bring it into agreement with a second file or vice versa” [93].

To illustrate the algorithm's results, Table 2.1 demonstrates how two files (represented by vertically aligned characters for brevity's sake) are compared using the diff algorithm and shows which actions (i.e., delete, add, and change) are necessary to make their content identical. The work this algorithm performs is called diffing, a term that we will use throughout this thesis, especially in our case studies.

The inherent drawback of two-way merging is that the algorithm can only point out the differences between the two files but is unaware of the changes that the engineers performed with regard to the versions' common ancestor (i.e., the file both engineers started with): it is impossible to distinguish whether a line was added by one engineer or deleted by the other. This is why all modern merge tools use the three-way merge approach [136].
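The connection between diffing and the longest common subsequence problem can be sketched as follows: compute the LCS of the two line sequences, keep the matching lines, and turn everything else into add and delete steps. This is a minimal illustration in the spirit of Hunt and McIlroy's algorithm [93], not the optimized implementation the diff utility actually uses; the function name and the keep/add/delete labels are our own.

```python
def lcs_edit_script(a, b):
    """Derive an edit script from the longest common subsequence of the
    line sequences `a` and `b`: keep/add/delete steps that turn a into b."""
    m, n = len(a), len(b)
    # L[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk the table: matching lines are kept, everything else is added or deleted.
    script, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            script.append(("keep", a[i]))
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            script.append(("delete", a[i]))
            i += 1
        else:
            script.append(("add", b[j]))
            j += 1
    script += [("delete", line) for line in a[i:]]
    script += [("add", line) for line in b[j:]]
    return script
```

Applied to the two files of Table 2.1 (lines a–g versus w, a, b, x, y, z, e), the kept lines are the common subsequence a, b, e, while w, x, y, z are added and c, d, f, g are deleted — matching the steps described in the table's caption.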

2.1.2 Three-Way Merging

As demonstrated in Subsection 2.1.1, in order to detect not only the differences between two files but also to properly merge them, it is necessary to consider the files' base version, i.e., the common ancestor. The diff3 utility from the GNU diffutils package is commonly used to perform a three-way merge. A rigorous analysis of diff3's algorithm can be found in Khanna's pertinent paper [110]. Consider Figure 2.1 for an exemplary three-way merge building on the previous two-way merge example from Table 2.1.

Although the three-way merge is used by many current merge tools [136], the investigation of diff3 by Khanna et al. [110] as well as Ritcher's analysis [158] show that diff3's behavior is counterintuitive and unpredictable at times. Still, both SVN and Git use the three-way merge as their default merge strategy [77, 36].
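The idea behind a three-way merge can be sketched as follows: compute each side's changes relative to the common ancestor, apply every change that only one side made, and flag overlapping changes from both sides as a conflict. This is a deliberately simplified illustration, not diff3's actual algorithm (for instance, it applies insertions that both sides made at the same point instead of reporting a conflict), and all names in it are our own.

```python
from difflib import SequenceMatcher

def edit_regions(ancestor, version):
    """One side's changes against the ancestor: (start, end, replacement)
    triples over ancestor line indices, one per non-matching region."""
    opcodes = SequenceMatcher(None, ancestor, version).get_opcodes()
    return [(i1, i2, version[j1:j2])
            for tag, i1, i2, j1, j2 in opcodes if tag != "equal"]

def three_way_merge(ancestor, ours, theirs):
    """Apply the non-overlapping changes of both sides to the ancestor;
    raise ValueError on overlapping changes from different sides."""
    regions = sorted(
        [(region, "ours") for region in edit_regions(ancestor, ours)]
        + [(region, "theirs") for region in edit_regions(ancestor, theirs)],
        key=lambda item: (item[0][0], item[0][1], item[1]))
    merged, pos, prior = [], 0, None
    for (start, end, replacement), side in regions:
        if prior is not None and start < prior[0] and side != prior[1]:
            raise ValueError("conflict: both sides changed the same lines")
        merged.extend(ancestor[pos:start])  # unchanged lines before the edit
        merged.extend(replacement)          # the edited lines from one side
        pos = max(pos, end)
        prior = (end, side)
    merged.extend(ancestor[pos:])
    return merged
```

A change on one side combined with a deletion on the other merges cleanly, while two different edits to the same ancestor line raise a conflict.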

2.2 Model Merging

The case study described in Chapter 6 analyzes merged software models and detects invalid artifacts using specialized rules. To provide a better understanding and also to outline the theoretical background for our case study, this section gives a brief introduction to model merging and illustrates the implications it has on invalid artifacts in software models.

As explained by Brunet et al. [22, Chapter 1], model merging becomes necessary when multiple engineers, working in model-based development, have to recombine their individual versions into a single model. Another quite common [10] reason for model merging arises when a project's stakeholders—who observe different aspects of the system through their models, some of which are requirements, domain analysis, system architecture, and system behavior—make changes and want to integrate those modifications into the model [172, Chapter 1]. As will be further discussed in Subsection 2.4.1, merging these aspects often causes invalid artifacts in software models.

Extensive research has been conducted on algorithms for diffing and merging different kinds of models that occur in the domain of software engineering. These models include database models [142, 27], software architectures based on modeling languages like UML [181], language-independent approaches [2], product line architectures [31], and requirements models [156]. Because models can also be expressed in XML notation, as demonstrated by Dashofy's ArchStudio [42], the three-way merge algorithm for XML documents described in Lindholm's paper [123] is suitable for merging such models, too. Similar to the merges for source code described in Section 2.1, a three-way approach to model merging offers the benefit of not only pinpointing the differences between two models but also being able to determine which changes the models underwent with regard to their common ancestor. An exemplary three-way merge of models is illustrated and explained in Figure 2.2.

Ancestor              First File            Second File              Merged File
1 ancLine1            1 ancLine2            1 ancLine1               1 changedInSecondFile
2 ancLine2            2 addedInFirstFile    2 changedInSecondFile    2 addedInFirstFile
                                            3 addedInSecondFile      3 addedInSecondFile

Figure 2.1: Minimal three-way merge scenario: The common ancestor file is modified by two software engineers in parallel. In the first file, line ‘ancLine1’ was deleted, whereas the line ‘addedInFirstFile’ was added with regard to the ancestor. In the second file, line ‘ancLine2’ was changed to ‘changedInSecondFile’ and line ‘addedInSecondFile’ was added. Combining the changes of the first and the second file results in the merged file.

first file    second file
a             w
b             a
c             b
d             x
e             y
f             z
g             e

Table 2.1: Example reproduced from Hunt and McIlroy [93] that illustrates how the diff algorithm aligns the first file to match the second. The single characters represent lines to keep the example short. The algorithm produces a list of steps that have to be performed to bring the first file into agreement with the second one. These steps are: first, prepend line ‘w’, then change lines ‘c’ and ‘d’ to ‘x’, ‘y’, and ‘z’. Finally, delete lines ‘f’ and ‘g’.
Since the work of this thesis is concerned with the detection of invalid artifacts in merged code and software models, we will proceed with a short discussion of this topic. Section 2.3 gives an overview of existing work on analyzing, classifying, and detecting invalid code artifacts, whereas Section 2.4 provides background information on the same issues for invalid artifacts in software models. With the following sections, we aim to establish a bigger picture for the reader and provide perspective on where our approach fits in with respect to established approaches.

Figure 2.2: A three-way model merge: Designer 1 removed class B including its subtype relation to class A. Designer 2 added class D as a subtype of class C. The merged model “Final Model” is the result of applying all the actions performed by both designers. Merge example from Alanen and Porres [2].
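The intuition of Figure 2.2 can be captured by a set-based sketch: an element of the merged model survives unless one designer removed it, and additions from both designers are kept. This ignores relations between model elements and conflict handling entirely, so it only illustrates the three-way idea for models; the function name and the encoding of elements as strings are our own assumptions.

```python
def merge_model_elements(ancestor, ours, theirs):
    """Set-based three-way merge of model elements: an element survives
    unless one side removed it; additions from both sides are kept."""
    removed = (ancestor - ours) | (ancestor - theirs)
    added = (ours - ancestor) | (theirs - ancestor)
    return (ancestor - removed) | added
```

With the classes of Figure 2.2 as plain elements (ancestor A, B, C; Designer 1 removed B; Designer 2 added D), the merged element set is A, C, D, matching the classes of the final model.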

2.3 Invalid Artifacts in Code

In this section, we present current approaches, categorizations, and tools for finding invalid artifacts in code, including the motivation for doing so. Where appropriate, we will relate them to our own approach and the case study for code described in Chapter 5.

2.3.1 Code Clones Cause Invalid Artifacts

Considering that duplicate code in software projects increases their complexity [119], causes maintenance problems [7, 9, 23, 98], and is a source of invalid artifacts in code [33, Section 6.4][101], significant effort has been invested by the software engineering community to detect and avoid code clones. Research suggests that copied code is quite common in large-scale software projects, such as the one analyzed by Laguë [23, Chapter 4], where the share of copied functions ranges between 6.4% and 7.5%. Baker's system under study contained about 19% code duplication [7, Section 4.1].

Such code clones can either be exact, meaning that two sets of lines (above a chosen threshold of line numbers) are completely identical, or they can be parameterized, which occurs for example when the copied code has its variable names changed [7, Chapter 2]. An example of a parameterized code clone is illustrated in Figure 2.3.

Given this definition of code clones, our rule described in Subsection 5.5.1 is an example of a parameterized clone detector, considering that it deems two code blocks equivalent if they only differ in the order in which event listeners are added. Instead of taking the implicit position of trying to avoid cloning parts of software, Gabel et al. [73] have focused on finding invalid artifacts by extracting and analyzing sets of code pieces that were first copied and then altered, because those cloned parts might be “thoughtful solutions to difficult engineering problems” [73, Section 1].
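The notion of a parameterized clone can be illustrated with a small sketch: strip indentation and replace every identifier with a positional token in order of first appearance, so that two fragments differing only in a consistent renaming get the same canonical form. This is a toy version of the idea behind Baker's dup [7]; unlike a real clone detector it also renames keywords and function names, and all names in the sketch are our own.

```python
import re

def canonical_form(code):
    """Strip indentation and rename every identifier to a positional
    token (p0, p1, ...) in order of first appearance."""
    mapping = {}

    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = "p%d" % len(mapping)
        return mapping[name]

    lines = [line.strip() for line in code.strip().splitlines()]
    return [re.sub(r"[A-Za-z_]\w*", rename, line) for line in lines]

def is_parameterized_clone(a, b):
    """Two fragments are parameterized clones if they are identical up to
    a consistent renaming of identifiers (and differing indentation)."""
    return canonical_form(a) == canonical_form(b)
```

In the spirit of Figure 2.3, two statements over lbearing/left are recognized as a clone of the same statements with every variable consistently renamed and different indentation, whereas an inconsistent renaming is rejected.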

2.3.2 Invalidity Categorization

The purpose of this subsection is to provide an overview of existing categories for invalid code artifacts. We establish context with our approach by determining into which of these categories the invalid artifacts fit that our code rules detect (see Section 5.5).

Various classifications for invalid artifacts in code (often called defects [141, Subsection 1.2][60, Chapter 5]) have been proposed [49, 56, 178]. One of the most influential frameworks for classifying invalid artifacts in code is the Orthogonal Defect Classification (ODC) [90, Section 1]. An excerpt of ODC's defect types [32, Section ‘The Defect Type Attribute’] is given below to convey what kinds of defects the ODC covers.

Algorithm: Defects caused by incorrectly or inefficiently implementing an algorithm or a data structure.

Assignment: Defects related to variable assignments and initialization.

Function: Program misbehavior or erroneous user interfaces belong to this defect type.

Interface: Defects that occur due to the interaction with other components within or outside the developed software system, including device drivers. Problems in relation to parameter lists used for calling functions also pertain to this defect type.

Timing/Serialization: Defects caused by parallelism and shared resources.

Figure 2.3: Example for a parameterized code clone detected by Baker's dup clone detection algorithm [7]. The algorithm was able to recognize the parameterization of the clone in the form of the variables lbearing and left as well as rbearing and right. The non-matching indentation is not considered a discriminating feature. Example from [7, Chapter 1].

Considering the different defect types in the ODC, the Assignment type listed above suits our code rules described in Subsections 5.5.2, 5.5.3, and 5.5.4 very well, as they indeed detect assignment issues in code. We argue that our rule dealing with equivalent code blocks (see Subsection 5.5.1), on the other hand, does not find any invalid artifacts that can be assigned to one of the types listed in the ODC. Rather, the invalid artifacts found by that rule are parameterized code clones [7, 104].

2.3.3 Detection Approaches

The following subsection provides a brief outline of common approaches to finding invalid code artifacts. We will also use this outline to point out to which of these approaches our approach for detecting invalid artifacts in code belongs and why.

2.3.3.1 Code Inspection

The value of programmers reading and reviewing other programmers' code in a systematic fashion was recognized in the seventies [59]. Code inspection is generally described as “the process of finding defects in the [software] artifact” [87, Subsection 2.2] employing a disciplined approach that commonly involves checklists pertaining to coding conventions, discussing design alternatives, and catching programming mistakes [151][59, Section ‘The Inspection Process’][140, Chapter 3]. The goal of such inspections is to “identify defects within the work product and to provide confidence in its correctness” [150, Subsection 1.3.4].

As a positive side effect, code inspections can contribute to an improvement of the participants' programming skills through knowledge exchange and foster team spirit [140, Subsection ‘Side Benefits of the Inspection Process’]. Siy and Votta [166] have shown that code inspections remain a useful practice for improving the readability and maintainability of code even after the advent of languages with automatic memory management and powerful type systems; mechanisms that avoid many defects that would otherwise occur during runtime [166, Section 1].

2.3.3.2 Testing

Software testing is the activity of automatically or manually executing software in order to find invalid code artifacts [140, Chapter 2]. Myers et al. [140] emphasize in “The Art of Software Testing” that the aim of testing cannot be to prove that the software under test is without invalid artifacts but rather to find invalid artifacts in the code. They further distinguish between unit tests, which test individual components of the software (e.g., classes and functions) and can be automated2 [140, Chapter 5], and acceptance testing, which examines the whole software system from a user perspective and commonly entails manual testing [140, Chapter 7].
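To illustrate the unit-testing style just described, the following sketch shows a small test in plain Java. The `WordCounter` class and its expected values are hypothetical examples of ours; in practice an xUnit framework such as JUnit would supply the assertion and test-runner machinery.

```java
// A hypothetical unit under test.
class WordCounter {
    static int count(String text) {
        if (text == null || text.trim().isEmpty()) return 0;
        return text.trim().split("\\s+").length;
    }
}

public class WordCounterTest {
    // In JUnit this would be an @Test method using assertEquals;
    // plain checks keep the sketch free of external dependencies.
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        check(WordCounter.count("to be or not to be") == 6, "normal input");
        check(WordCounter.count("   ") == 0, "blank input");
        check(WordCounter.count(null) == 0, "null input");
        System.out.println("all tests passed");
    }
}
```

Note how the test exercises boundary cases (blank and null input) in addition to the normal case; per Myers et al., the goal is to find invalid artifacts, not to prove their absence.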

2.3.3.3 Static Code Analysis

Static code analysis tools search code—without executing it—for security vulnerabilities, style violations, and logical errors such as unreachable code or null pointer dereferences [5, Chapter 1]. The code rules we created for this thesis, outlined in Section 5.5, constitute a form of static code analysis since they analyze code artifacts without executing the code itself. We subsequently give a short introduction to the topic and present similarities between the approach of established static analysis tools and ours.

Hangal et al. [84] state that the practice of automatically analyzing source code can be seen as a form of automated debugging, apt to reveal invalid artifacts that are difficult to find even for experienced engineers. They also emphasize the need for better practices to increase code quality by contrasting the rapid progress and success in the field of hardware with the still lacking reliability of software, whose failures have caused catastrophes at times [126, Section 2.1][118]. In spite of their potential usefulness in this context, static code analysis practices are not widely adopted [100, Chapter 7].

There exists a variety of tools for static code analysis; to limit the scope of this section somewhat, we will focus on open-source products for analyzing

2The xUnit frameworks are commonly used to write and execute unit tests, with NUnit for .NET and JUnit for Java as the most prominent members [83, Chapter 3][138, Chapter 6].

Java programs since we are detecting invalid artifacts in Java code in our case study described in Chapter 5. In this domain, popular projects include SonarQube [30]3, FindBugs [89], and PMD4. All these tools try to detect patterns that point to an invalid artifact in software, yet they take different approaches to detecting these patterns: For example, PMD exclusively analyzes source code [40, Section ‘Source Code Checking with PMD’], whereas FindBugs focuses on Java’s bytecode [5, Chapter 3]. SonarQube’s rule engine pursues a dual approach, i.e., depending on the rule it analyzes source code or bytecode5. In order to define and detect potential security threats, SonarQube uses the definitions of institutions like OWASP6 or CWE7. What all these tools and our approach portrayed in Chapter 3 have in common, however, is that they apply a set of rules to software with the goal of finding invalid artifacts.

Taking SonarQube’s set of Java rules as an example, the scope and approach of the rules are quite diverse. Some are very generic and could be applied to other programming languages as well, such as the rule making sure that all parameters passed to a function are used within that function8. Other rules are tailored more directly to the language or its API. Examples are prohibiting a call to run() on Java’s Runnable, since start() should be called instead9, or detecting a return statement within a finally block, which prevents the propagation of thrown exceptions to the caller10.

Like our rules, the rules of SonarQube, PMD11, and FindBugs [5, Chapter 3] for Java are written in Java. SonarQube additionally uses annotations to provide information about a rule such as its description, category, and severity in case of a violation12.
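The finally-block defect just mentioned can be reproduced in a few lines. In the following sketch (class and method names are ours, not taken from any tool), the exception thrown in the try block never reaches the caller because the return statement in the finally block discards it:

```java
public class FinallyReturnDemo {
    // Anti-pattern: a return inside finally silently discards
    // the exception raised in the try block.
    static int readConfig() {
        try {
            throw new IllegalStateException("config file missing");
        } finally {
            return -1; // the IllegalStateException is lost here
        }
    }

    public static void main(String[] args) {
        // The caller observes only the value -1; no exception propagates.
        System.out.println(readConfig());
    }
}
```

A static analysis rule can flag this pattern purely from the syntax tree (a return node inside a finally block), without ever running the code.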
Regarding data on how effective static code analysis is, a review at Google conducted by Ayewah and Pugh concluded that 77% of the problems detected by FindBugs were considered real and deemed worth fixing by the engineers [6, Section 4.1]. None of the issues flagged by FindBugs as severe, however, were “associated with any serious incorrect behaviors in Google’s production systems” [6, Chapter 1]. The reviewers ascribe this finding to the rigorous manual testing that is performed before code is pushed to production. FindBugs had been in place at Google, but its reports were not easily accessible to engineers, which caused FindBugs’ warnings to be ignored [6, Section 2.1].

One of the review’s main conclusions is that although static code analysis is generally not able to discover code deficiencies that would have been impossible to find using other techniques, such as acceptance testing, it does have the advantage of detecting problems earlier13. This is beneficial because finding invalid artifacts early makes them cheap to fix [6, Chapter 6]; furthermore, static analysis points to the root cause of software misbehavior, whereas testing merely observes its symptoms [6, Section 4.1].

3A subset of SonarQube’s features is the static code analysis tool Squid.
4http://pmd.sourceforge.net/
5For analyzing Java bytecode, SonarQube uses the ASM framework: http://asm.ow2.org/
6https://www.owasp.org
7http://cwe.mitre.org/
8http://jira.sonarsource.com/browse/RSPEC-1172
9https://sonar.spring.io/rules/show/squid:S1217?layout=false
10https://sonar.spring.io/rules/show/squid:S1143?layout=false
11PMD’s rules for Java, which are written in Java themselves, can be found here: https://github.com/pmd/pmd/tree/master/pmd-java
12https://github.com/SonarSource/sonar-java

2.4 Invalid Artifacts in Models

Complementing the previous section on invalid code artifacts, this section outlines existing techniques for detecting and classifying invalid artifacts in software models. Since the rules for our model case study presented in Chapter 6 detect invalid artifacts in models, we will relate our rules to these techniques during the course of this section.

2.4.1 Multiple Views Cause Invalid Artifacts

Mirroring Subsection 2.3.1, which discussed a common cause of invalid code artifacts, this subsection portrays a frequent reason for invalid artifacts in software models. We aim to give a short overview of current research and to provide background knowledge concerning this area of software engineering.

Invalid artifacts in models often—but not always—exist because the system under development is represented using different specifications that each portray an individual part or aspect of the system [10][145, Chapter 2]. Modeling these system aspects is intended to provide a simplified view of the software product and to make it easier to review. Model designers draw from other engineering disciplines like civil engineering, where multiple blueprints representing various aspects of the same building are used to facilitate constructing the building and to improve communication between stakeholders, who are likely to be interested in different systems of the structure (e.g., electrical wiring, plumbing, heating) [10, Chapter 1].

13Integrating static code analysis into the automated build process is a practicable way to catch regressions early on [91, Chapter 7].

Analogously, a software system can be modeled using multiple viewpoints that express its different characteristics [135]. These viewpoints—also called “partial specifications” [147, Section 1.A]—are often developed by multiple groups of development participants (e.g., business analysts, programmers, product owners, designers) [172, Chapter 1]. In his seminal paper [113], Kruchten described four possible views (Logical, Process, Development, and Physical) for describing a software system, plus their combination as a fifth view. Scenarios are used for this fifth view “to show that the elements of the four views work together seamlessly” [113, Section ‘Scenarios’]. The models of these different types of viewpoints can be formulated using a variety of design notations including text, diagrams, and mathematical notation [26, Section 5.3]. Note that the types of viewpoints mentioned above are not the only possibility; a different categorization was proposed, for example, by the Object Management Group [149, Subsection 2.2.5].

At some point, though, the analogy between models of software and models of real-life structures starts to break down. This is due to the fact that software is invisible: It has no “geometric representation in the way that land has maps, silicon chips have diagrams, computers have connectivity schematics” [21, Chapter 16]. This makes creating visual abstractions, i.e., software models, hard because “the lack of any visual properties for software does create a conceptual gap between representation and implementation” [26, Section 5.1].

Nevertheless, the abstract nature of software architectures leads to the establishment of a high-level vocabulary that simplifies discussing design decisions [26, p. 118]. This is useful and necessary considering that a software architecture can be defined as consisting of those system parts of which the lead engineers need to have a shared understanding [67].

2.4.2 Invalidity Categorization

To facilitate reasoning about invalid artifacts in models, we now present existing categories for invalid model artifacts. We will put the rules we created for detecting invalid model artifacts (depicted in Section 6.6) into context by describing the category to which the invalid artifacts that our rules can detect belong. Liu et al. [127, Chapter 2] classify invalid artifacts in models as follows:

Redundancy
If a system specification contains the same piece of information multiple times, some part of the specification is redundant. This redundancy

is not problematic per se because “[s]uch redundancy may be desirable since it can provide additional information to a requirement specification from different perspectives, and describe the behavior of a design unit under various scenarios” [127, Chapter 2]. Only when the information in those parts contradicts each other does an invalid artifact within the model become visible. Liu et al. [127, Chapter 2] distinguish between design redundancy and data redundancy: Design artifacts such as UML classes, objects in a sequence diagram, or use cases are redundant if they model the same system element multiple times. Figure 2.4 illustrates this using two sequence diagrams that depict the same use case on different levels of granularity. According to Liu et al., data redundancy occurs through the often complex relationships between data objects. Sometimes the graphs formed by data objects and their connections to each other can be simplified, thus removing data redundancy and reducing unnecessary complexity [127, p. 4].

Conformance to Constraints and Standards
A model must conform to certain constraints regarding its structure and well-formedness. In the case of UML, for example, this can be ensured using the Object Constraint Language (OCL) [29, 157, 171], which allows model designers to impose invariants, preconditions, and postconditions on class diagrams. A model constraint according to Warmer and Kleppe is “[a] restriction on one or more values of (part of) an object-oriented model or system” [179]. For instance, these restrictions include that every model element must have a name or that every connecting element must connect exactly two model elements.

Standards, similar to constraints, define limitations that a model has to respect in order to be considered standard-conforming. Such standards include the Law of Demeter, originally proposed by Lieberherr et al. [121], which states that components of a system should communicate with as few other components as possible and avoid calling methods on one component via another. If this law is honored, coupling between objects is reduced, which facilitates the evolution of individual system modules and the entire system in general [92, Chapter 5][120, Section 3.3]. It also tends to reduce the average number of errors within a given program, as Basili et al. [8, Subsection 3.2.2] have shown.

Following the definitions given here, the rules created for this master’s thesis that are applied to models (the relevant case study is demonstrated in Chapter 6) can be categorized partly as constraints (e.g.,

Figure 2.4: Two sequence diagrams with overlapping and thus redundant information. They describe the process of requesting a new meeting, once with location and time, and once without that information. Diagram source: Liu et al. [127, Chapter 2].

the rule ensuring that a model has at least one mandatory component, shown in Subsection 6.6.5) and partly as standards (the rule outlawing circular dependencies, presented in Subsection 6.6.1).

Change
As a system is developed, requirements and the context in which the system is going to operate may change, thus necessitating a modification of the model. These changes harbor the risk of introducing invalid artifacts, especially when the modifications are incomplete, i.e., some required steps were unintentionally left out. These kinds of invalid artifacts may also be introduced when the model is transformed from one notation to another (e.g., from UML to the architectural description language Rapide [128]).
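The Law of Demeter mentioned under “Conformance to Constraints and Standards” can be made concrete in code. The following sketch uses hypothetical classes of ours: the chained call reaches through one object to manipulate another, whereas the conforming version lets each object talk only to its direct collaborators.

```java
class Wallet {
    private int cents = 500;

    boolean pay(int amount) {
        if (amount > cents) return false;
        cents -= amount;
        return true;
    }
}

class Customer {
    private final Wallet wallet = new Wallet();

    // Violates the Law of Demeter: callers reach through the
    // customer and manipulate the wallet directly.
    Wallet getWallet() { return wallet; }

    // Conforms: the customer delegates to its own collaborator.
    boolean pay(int amount) { return wallet.pay(amount); }
}

public class DemeterDemo {
    public static void main(String[] args) {
        Customer c = new Customer();
        // Violation: the caller is now coupled to Wallet's interface.
        boolean paid1 = c.getWallet().pay(200);
        // Demeter-conforming call: only Customer's interface is used.
        boolean paid2 = c.pay(200);
        System.out.println(paid1 + " " + paid2);
    }
}
```

Both calls succeed, but only the second keeps the caller independent of Wallet, which is exactly the reduced coupling the standard aims for.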

Alternatively, the framework ConMan presented by Schwanke and Kaiser distinguishes between six types of invalid artifacts that focus more on syntactic correctness (including type-checking) and the absence of versioning problems [165, Chapter 3]. Schwanke and Kaiser emphasize (just as Spanoudakis in [169, Chapter 1]) that the existence and detection of invalid artifacts within a system should not hinder engineers from working on it [165, Chapter 2].

Considering the classification of invalid artifacts by Liu et al. [127], we conclude that the rules we devised for our model case study enforce adherence to standards and constraints, thus belonging to the category explained in Item 2.4.2.

2.4.3 Detection Approaches

Now that we have outlined the various ways of categorizing invalid artifacts and put our rules into this context, the following subsection addresses the different ways invalid model artifacts can be tracked down within a software model. Spanoudakis et al. [172, Chapter 4], for example, have grouped the approaches for detecting these kinds of invalid artifacts as follows:

Logic-based detection
Logic-based detection is a special kind of formal method in software development, which can be defined as a “process for developing software that exploits the power of mathematical notation and mathematical proofs” [180].

The logic-based approach takes a software model described in a formal modeling language (such as first-order logic used by Easterbrook and Nuseibeh [52], temporal logic [176], the Object Constraint Language [171], or VDM++ [64, Section 2.5]) as input and employs logical transformations, inferences, and theorem provers to detect invalid artifacts. Using formal logic as a foundation to reason about models and the invalid artifacts they might contain has the benefit that this kind of analysis is well-studied, the semantics are sound [172, Section 4.1], and the model’s mathematical rigor allows design verification to be performed [26, Section 18.1]. Moreover, logic can be applied not only to find invalid model artifacts but also to decide whether a program produces a set of specified outcomes by analyzing its model [88].

While being a powerful analysis tool, classical logic makes it difficult to reason in the presence of contradictions within the software system specification, the domain, or the requirements, because every arbitrary fact follows from conflicting information (ex falso quodlibet) [95, Subsection 2.2.2]. To overcome these difficulties, Besnard and Hunter have proposed a weaker form of classical logic—quasi-classical logic—which is able to tolerate such contradictions in software specifications [14].

Mathematical proofs about software models and the concomitant certainty about system properties make them attractive for aiding software development in general and specifically for designing safety-critical parts of a program [80].
Still, these formal descriptions suffer from drawbacks: They cannot represent certain aspects of computing such as “Human-Computer Interaction (HCI), some features of parallelism, [and] the non-functional

elements of real-time systems” [26, Section 18.1]. Furthermore, these kinds of descriptions tend not to scale very well because they become difficult for humans to manage as they grow [26]. Additionally, “theorem proving is computationally inefficient” [172, Section 4.1].

Glass maintains that formal methods in general may have limited use in planning and creating software because “the needs of the customers evolve over time, as the customer comes to learn more about solution possibilities, and that what is really needed is not a rigorous/rigid specification, but one that encompasses the problem evolution that inevitably occurs” [78].

Also belonging to this category of logic-based detection according to Spanoudakis et al. [172, Section 4.1] is the approach of Emmerich et al. [57], who make use of “consistency rules which determine relationships that should hold between [structured requirement documents]”. Their rules are formulated with an extension of Common Lisp. Similarly, rules expressed in the Object Constraint Language that are applied to elements of a UML model are part of this category [170, Section 5].

Although our model rules do not take the rigorous approach of formal methods, we nevertheless argue that they belong to this category since our rules are defined as predicates that must hold for each model element (see Section 6.6), just as the rules in [57, Subsubsection 5.2.2] that fall into this category according to Spanoudakis et al. [172, Section 4.1].

Model checking detection
This technique of detecting invalid artifacts has proven very effective in checking digital system designs [15, Chapter 1] using, for example, Binary Decision Diagrams, which are directed acyclic graphs that represent the states of hardware [24, 25].

Transferring this analysis approach from hardware to software has been shown to be less successful because software employs a considerably greater number of types, which are much more complex than Booleans. The increased—often infinite—number of states that software can be in aggravates the problem significantly. Abstraction and thus simplification of the software model has to be performed either by manually abstracting the original model (which is prone to error) [48, Chapter 1] or by automatically deriving an abstraction [15].

Another model checking approach is to define the system with respect to its requirements using the Software Cost Reduction (SCR) [86] notation. Gargantini and Heitmeyer [75] have developed an automated way to generate tests from a requirement specification written in SCR in order to determine whether a software implementation satisfies its requirements. Using SCR, the specified system can be viewed and analyzed like a state machine. Well-formedness can be checked, as can application-domain rules in the form of system invariants such as “the absence of circular definitions and undesired non determinism” [172, Subsection 4.2].

Specialized model analysis detection
Section 2.4 mentioned that software can be modeled using different viewpoints to represent the system under development from all perspectives that are relevant to the software’s stakeholders. If two or more viewpoints describe the same element, an “ontological overlap” has occurred among the viewpoints, something that happens in all software projects of significant size.

As an inevitable result of such an intersection of perspectives, invalid artifacts in the form of inconsistencies between models come to light [61, Chapter 2]. It may not be immediately obvious that two pieces of specification contradict each other because the domain knowledge may be expressed in different ways and on varying levels of abstraction. These contradictions within the viewpoints might indicate that a misunderstanding during requirements elicitation has occurred. Then again, they might reflect the fact that there are contradictions in the real world, something the stakeholders either have to live with or try to settle by negotiating.

Tolerating inconsistencies among viewpoints might be a necessity in order “to support innovative thinking, deferment of commitments and exploration of alternatives” [169, Chapter 1]. Van Lamsweerde et al. [177] also see the beneficial aspects of inconsistencies as they “allow further elicitation of requirements descriptions being acquired from multiple stakeholders”. Finkelstein et al. [61] have expressed similar opinions, adding that an absence of inconsistencies in a non-trivial model is hard or even impossible to achieve.

Blanc et al. [16] have also made efforts to detect invalid model artifacts in viewpoints. They created a meta-model independent approach to model checking, focusing on methodological rules. This means the rules apply not only to the final model but also to the way it was produced.

Delugach’s approach [47] uses semantic networks to represent the viewpoints of a software system’s specification as a graph, with the goal of finding conflicting requirements. As viewpoints may be described in distinct requirement languages, Delugach’s framework first translates those different notations into a common form. A common requirement language reduces the number of translations that are necessary between the supported languages and makes analyzing the heterogeneous specification easier because the analysis engine only needs to know one language [47, Chapter 2]. This is similar to the way we create an abstraction of the model in our case study described in Chapter 6 to facilitate analysis. Delugach’s resulting representation uses Sowa’s conceptual graphs [168] as the form of visualization. Figure 2.5 shows a simple example of such a conceptual graph.

Whereas the previously discussed model analyses are static, leading to a complete analysis of the whole model (or set of models), there also exist approaches that check a model for invalid artifacts each time it is modified, giving immediate feedback to engineers [55].

Human-centered collaborative exploration detection
The detection approaches mentioned so far have analyzed formal system specifications. Kotonya and Sommerville, in contrast, use requirement models from multiple viewpoints defined in varying kinds of languages, which can include informal (i.e., natural) languages but also physical formulas [112, Chapter 4], and apply analysis to those viewpoints. They aim to detect conflicts among those viewpoints [112, Section 5.2]. This term is broader than our invalid artifacts defined in Section 1.1 since they

Figure 2.5: This conceptual graph shows the fact that “Doctor Jones is 45 years of age and he is the agent of an act of surgery performed on a patient” [47, Chapter 2].

not only analyze software artifacts but also the interpersonal process of developing these artifacts.

Easterbrook has developed a method called Synoptic with the goal of enabling computer-supported negotiation for detecting and handling conflicts in the software development process as well as in the software specification [51]. His approach to conflict is a rather positive one (similar to the stance taken by Liu et al. [127, Chapter 1] and more directly by Nuseibeh et al. [146, Chapter 3]), stating that conflicts have been recognized in other fields like sociology [163] and also logical reasoning [71] to be an important source of information, and should thus not be suppressed, ignored, or avoided, but seen as something that commands action. Synoptic aims to provide a framework that guides the user in identifying, describing, and resolving conflicts. To mitigate the emotional aspect that is often associated with conflicts, Synoptic tries to “separate the people from the problem, in order to avoid the polarising nature of arguing from entrenched positions” [51, Chapter 3].

Likewise, in software engineering, Curtis et al. [38] stress the significance of being able to identify and reconcile the conflicts that inevitably occur during human collaboration, because the “areas of knowledge do not fit together like a jigsaw, but instead overlap in some places, conflict in others, and often leave gaps” [51, Section 2.5]. They declare this ability to be a central trait of an exceptional systems designer, whom they call the “intellectual core of the project (i.e., the keeper of the project vision)” [38, Section ‘Individual Level’].

In conclusion, we determine that according to this categorization given by Spanoudakis et al. [172], our model rules portrayed in Section 6.6 fit best into the logic-based detection approach outlined in Item 2.4.3.

2.5 Software Product Lines

Since some of the rules portrayed in Section 6.6 analyze Software Product Lines (SPLs), this section gives a short introduction to SPLs and also explains some deliberations regarding detecting invalid artifacts in SPLs.

In software architecture, an SPL lets us model a family of software products that differ only in some parts while the base structure remains the same [103, 106, 122]. The variability of an SPL is often visualized using a feature model containing Boolean guards that define whether a feature or component should be present in the resulting model that the SPL instantiates [105]. Engineers can create new variants of the model by setting these Boolean guards instead of having to model the software architecture themselves. Figure 2.6 shows a small example of such a feature model. As we will see in Chapter 5, SPLs can be used to create not only models of software but also new software products.

In order to detect invalid artifacts in software product lines, we could check every possible model the product line is able to instantiate based on its feature model. As this soon becomes unwieldy and even impossible as the number of potentially included features grows [132, Subsection 2.5], we have to analyze the product line itself instead of all its conceivable instantiations. This analysis results in identifying SPLs that carry the risk of creating models that are not valid. Section 6.6 presents a number of rules that analyze SPLs this way and point out invalid model variants that can but should not be instantiated from the SPL, thus enabling the elimination of choices.
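The Boolean guards of a feature model can be sketched directly in code. The following Java fragment is a simplified illustration of ours (feature names loosely follow the mobile-phone example; the two constraints stand in for a mandatory feature and a feature dependency, not for the full feature-model semantics): a configuration of guards is valid only if it satisfies all constraints of the model.

```java
import java.util.Map;

public class FeatureModelDemo {
    // Valid iff: the mandatory feature "Calls" is selected, and the
    // optional feature "Games", if selected, requires "Screen".
    static boolean isValid(Map<String, Boolean> guards) {
        boolean calls  = guards.getOrDefault("Calls", false);
        boolean games  = guards.getOrDefault("Games", false);
        boolean screen = guards.getOrDefault("Screen", false);
        return calls && (!games || screen);
    }

    public static void main(String[] args) {
        // A valid variant: all constraints hold.
        System.out.println(isValid(Map.of("Calls", true, "Screen", true, "Games", true)));
        // Invalid: Games selected without Screen.
        System.out.println(isValid(Map.of("Calls", true, "Games", true)));
        // Invalid: the mandatory feature Calls is missing.
        System.out.println(isValid(Map.of("Screen", true, "Games", true)));
    }
}
```

Checking every concrete configuration this way scales exponentially with the number of features, which is precisely why the rules in Section 6.6 analyze the product line itself rather than all of its instantiations.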

Figure 2.6: An exemplary feature model depicting the variability in features and components of a mobile phone. Filled dots signify mandatory features, empty dots mean that the feature is optional, and the filled triangle below the Games feature states that at least one of its features must be included in the product. Graphic from [107].

Chapter 3

Approach

In this chapter, we outline our approach to reaching the goal described in Section 1.2. We also refer to work related to our approach in Section 1.4. To make this chapter easier to read, we remind the reader of the definitions of the terms choice and artifact given in Section 1.1.

In order to reduce the number of choices produced by a software merge that a software engineer has to choose from, we apply rules to the set of choices. With these rules we analyze a choice and determine whether it is invalid or not. We say that a choice is invalid if it violates one of the applied rules. Depending on the violated rule, we either try to repair the invalid choice to make it valid, or remove the choice from the set of choices.

In general, multiple choices exist after a software merge because artifacts can be merged in different ways1. We generically termed this source of choices the Choice Generator since it generates choices. Figure 3.1 depicts our approach, including the software engineer’s role, diagrammatically.

3.1 Rules

At its core, our approach to improving the set of choices that results from a software merge is to apply rules to those choices. Because the rules are a crucial part of our approach, we use this section to explain the deliberations behind them.

Following Nuseibeh’s definition of what constitutes an invalid code artifact (Nuseibeh uses the term inconsistency for this), namely that “[a]n inconsistency occurs if and only if a (consistency) rule has been broken” [145,

1This is explained in more detail in Subsection 2.1.2.


[Figure 3.1 diagram: Artifacts are processed by the Choice Generator, which produces Choices A through D; the Rules remove or repair some of these choices; the Software Engineer chooses from the remainder.]

Figure 3.1: A high-level illustration of our approach and the motivating effects of its application: A software engineer has to select from a set of choices that stem from an abstract source labeled Choice Generator. The rules developed for this thesis aim to eliminate or repair some of the choices from which the software engineer can choose (in this diagram, Choice B is removed whereas Choice D is repaired), thus easing decision-making in software engineering, which is a goal of the work described in this thesis. The Artifacts are the elements of which choices are made. In our thesis, these are—as defined in Section 1.1—source code blocks and software models. These artifacts are processed by the Choice Generator. It stands for various sources that create the need to choose from a set of choices. Software merging is an example of such a source, as it causes artifacts to be combined in various ways.

Chapter 2], we have defined multiple rules, once for code and once for software models, in order to detect and remove invalid choices. According to the classification given by Liu et al. [127, Chapter 2], the rules developed for this master’s thesis fall into the category “Conformance to Constraints and Standards” described in Item 2.4.2.

Considering that our rules are realized as software, the goal in designing the rules was to keep them conceptually generic and applicable to a wide range of models and code bases. Furthermore, to increase flexibility and maintainability, we aimed to design our rules to be legible and thus easier to change if need be.

Schematically, a rule receives a choice as input2, judges that input according to the rule’s knowledge of what constitutes a rule violation, and produces a judgment as output. The judgments of all applied rules are aggregated into a total judgment, which describes whether the processed choice is in violation of any rule. Figure 3.2 illustrates in a simplified manner how the choice, the rules, and the total judgment relate to each other in our case studies.

The scope of the rules presented in Section 5.5 and Section 6.6 varies to a certain degree: A rule either focuses on a single artifact of the analyzed choice or on the whole choice, analyzing the interrelations among its artifacts. For certain rules the latter approach is necessary: For example, in order to screen for circular dependencies in a software model, the dependency tree of each component within the examined model must be checked. A description of this more extensive rule detecting circular dependencies in software models can be found in Subsection 6.6.1.

The rules do not analyze a choice directly in its original form (be it source code or a model). Rather, before the rules are applied to the choice, it is first parsed and converted into a representation which is more suited for rule analysis.
These conversions and representations are described in detail in Section 5.2 and in Section 6.2. It is noteworthy that applying our rules is free from side effects. Thus, the order in which the rules are applied does not change the outcome in any way.
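To make the scheme concrete, the rule-and-judgment mechanism described above can be sketched as follows. The types and names are our own illustration, not the implementation used in the case studies:

```java
import java.util.List;
import java.util.function.Predicate;

// Minimal sketch of the rule scheme: each rule judges a choice
// independently and without side effects, and the per-rule judgments
// are aggregated into a total judgment. Because rules are side-effect
// free, the order of application does not influence the result.
public class RuleEngine {
    // A rule maps a choice (here simply a list of artifact strings)
    // to true if the choice violates the rule.
    interface Rule extends Predicate<List<String>> {}

    // Total judgment: the choice is invalid if any rule is violated.
    static boolean violatesAnyRule(List<String> choice, List<Rule> rules) {
        return rules.stream().anyMatch(r -> r.test(choice));
    }
}
```

In this sketch, aggregation is a simple disjunction of the individual judgments; a real engine would additionally record which rule was violated and whether a repair is possible.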

3.2 Limitations

As the rules of our approach do not use formal methods, we recognize that they cannot offer the rigorous mathematical proofs of an approach that em-

2A choice consists, in agreement with our definition given in Section 1.1, of either model artifacts or code artifacts.

[Figure 3.2 diagram: a Choice Input is analyzed by a Rule Set (Rule A to Rule D); each rule produces a judgment, and these judgments are aggregated into a Total Judgment.]

Figure 3.2: The relationship between the analyzed choice, the rule set, and the total judgment.

ploys, for example, first-order logic like the one portrayed in [18]. Our rules were created with specific code and model issues in mind, as illustrated in the respective case studies in Chapter 5 and Chapter 6. Further research on additional models and code bases is needed to ascertain the effectiveness and efficiency of our rules. Furthermore, we limit our automated repairs to invalid artifacts where the repair action is simple. We realize that more involved repairs that necessitate, for example, the initialization of an object, require human judgment.

Chapter 4

Commonalities Of Code and Model Rules

Before we present the two case studies for source code and models that demonstrate the practical application of our approach, we want to introduce the reader to overarching principles shared by source code rules and model rules. These principles are referenced in the case studies whenever a practical example of one of them is found. Note that we use the expressions commonalities and overarching principles interchangeably in this chapter and beyond, as both convey the theoretical commonalities between code and model rules adequately.

Although rules for models and code operate on different abstraction levels (cf. Krueger [114, Section 1] or Atkinson and Kuehne [4]), during our work we were able to identify two areas of commonality that link the rules. These overarching principles are discussed in this chapter from a theoretical point of view, whereas the case studies in Chapter 5 and Chapter 6 demonstrate their practical application.

In order to facilitate discussing the parallels between code and model rules, we remind the reader of our definition given in Section 1.1 for the term artifact, which abstracts over the different constituents of source code and software models. In the following subsections, we will use this catchall term to denote code statements, code blocks, architecture components, and whole architecture models.

4.1 Common Rule Categories

As the first commonality between code and model rules, we propose two distinct categories that contain the rules of both abstraction levels. While model and code rules analyze very different artifacts1, two overarching rule categories emerged during our research. Rules for both source code and models can be associated with these categories; Figure 4.1 illustrates the categories encompassing the rules for both software models and source code.

4.1.1 Invalid Artifact Combination

The first category we identify concerns rules detecting invalid combinations of artifacts. These rules analyze the relations between artifacts and how they interact with each other, trying to find combinations that are invalid. For instance, artifacts whose dependency relationship forms a cycle represent an invalid artifact combination (a rule to detect such a combination is shown in Subsection 6.6.1).

On the code level, we can define a rule which demands that only initialized objects may be combined with a method call on them. Acting contrary to this, i.e., not initializing an object and trying to call one of its methods, constitutes an invalid combination of artifacts, as dereferencing an uninitialized object is not advisable and leads, in Java for instance, to a null pointer exception. The rule described in Subsection 5.5.4 finds such accesses to uninitialized objects.

Once an invalid combination of artifacts is detected, resolving it is not a task easily automated. In the case of circular dependencies, a possible solution is to cut the links between the dependent components. Yet, we deem such automatic repairs too risky since they might cause more harm than good when trying to repair the invalid artifact combination. We maintain that the artifacts should be edited manually, taking into account the specific context of the artifacts to avoid introducing further invalid artifacts.
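The code-level case can be illustrated with a small sketch. The class name, the statement format, and the matching logic below are our own simplified illustration, not the rule implementation from Subsection 5.5.4:

```java
import java.util.*;

// Simplified sketch of an uninitialized-read check: statements are given
// as strings; we track declared-but-uninitialized variables and flag any
// method call on a variable before it was assigned a value.
// Names and the statement format are ours, not ECCO's.
public class UninitializedReadCheck {
    public static List<String> findViolations(List<String> statements) {
        Set<String> uninitialized = new HashSet<>();
        List<String> violations = new ArrayList<>();
        for (String stmt : statements) {
            stmt = stmt.trim();
            if (stmt.matches("\\w+ \\w+;")) {            // e.g. "Button b;"
                uninitialized.add(stmt.split(" ")[1].replace(";", ""));
            } else if (stmt.matches("\\w+ = .+")) {      // e.g. "b = new Button();"
                uninitialized.remove(stmt.split(" ")[0]);
            } else if (stmt.matches("\\w+\\..+")) {      // e.g. "b.addListener(l);"
                String receiver = stmt.substring(0, stmt.indexOf('.'));
                if (uninitialized.contains(receiver)) {
                    violations.add(stmt);
                }
            }
        }
        return violations;
    }
}
```

A statement such as `b.addActionListener(l);` is reported as a violation when it appears before `b` has been assigned a value, reflecting the invalid combination of an uninitialized object and a method call on it.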

4.1.2 Dispensable Artifact

In addition to invalid artifact combinations, we define a category of rules that find dispensable artifacts in source code or architecture models. If a choice contains a dispensable artifact it means that the choice could do well

1At the abstraction level of architectures we analyze components, links, and connectors whereas at the source code level we are concerned with method calls, statements, and assignments.

Figure 4.1: The rule categories described in Section 4.1 span both the model and the code rules.

without that artifact. Examples of such dispensable artifacts are the redundant assignment statements which the rules portrayed in Subsection 5.5.2 and Subsection 5.5.3 detect.

We also consider rules finding equivalent artifacts to fall under this category. The rationale for this classification is that if two artifacts fulfill the exact same purpose, with no difference in behavior, only one of them is needed, rendering the other artifact dispensable. We are aware that proving whether two pieces of software behave exactly the same is theoretically impossible in general [174]. Still, mature heuristics exist that can tell with a certain degree of confidence whether code artifacts are functionally equivalent [99, 129] and that scale to large code bases as well [72]. Accordingly, our equivalence rule outlined in Subsection 5.5.1 is not a mathematically rigorous proof but rather a practical means for finding artifacts which behave identically.

4.2 Merging Causes Invalidities

In our thesis we analyze the outcome of merges of source code and of software architecture models. In the case of a source code merge, the artifacts to be combined are two blocks of code. As Section 2.1 explains in further detail, no textual conflict occurs as long as the code blocks were not edited on the same lines. But even if no textual conflict occurs, there are situations where combining the change sets incurs semantic problems. Assuming that two engineers called Alice and Bob work from the same base version and make changes that, seen in isolation, are valid, these unproblematic changes may still lead to a merged code block that is invalid. Such a situation, using a three-way merge, is illustrated in Figure 4.2, where Alice and Bob commit harmless change sets that, when merged, cause a rule violation.

With model merging, the situation is quite similar: There are scenarios in which changes, applied in isolation to a model, are valid and cause no invalid artifacts in the changed model. Yet when the changes are combined, which is the case when the modified models are merged, invalidities can arise. This has been exemplified and studied by Dam et al. [39, Section 3]. Figure 4.3 shows such a merge scenario where validly changed models are combined into a model that contains invalid artifacts.

From the above we see that both model and code merging are affected by the same problem: Combining valid artifacts can result in an invalid merged outcome. Consequently, rules cannot rely on the validity of the merge inputs but have to analyze the merged product in order to detect invalid artifacts in models and code. These rules must not stop at checking the syntax of the artifacts but must also consider the relations of the individual artifacts to each other, which might have changed due to the merge. This analysis of the way artifacts are combined with each other after a merge is fundamental to our rules that detect invalid artifacts.
Thus, we classified it as one of the rule categories that both model and code rules fall into; we discuss this rule type, detecting invalid artifact combinations, in Subsection 4.1.1. Although examining the combination of artifacts after a merge is vital for architecture models and source code alike, the intrinsic reliance of imperative-style coding2 on statement order makes this aspect especially relevant for merged code. While in general it does not matter in which order components are added to a model, it quite often makes a significant difference

2Explicitly mentioning the imperative coding style is due to the fact that the problem of statement order becomes irrelevant in functional programming via referential transparency [167].

when the order of code statements is changed. For example, the rule for detecting null pointer dereferences portrayed in Subsection 5.5.4 reflects this distinguishing issue of code.

Alice

1 int i = 5;
2 String s = doSomething(i);
3 i = 10;
4 System.out.println(i);

Merged:

1 int i = 5;
2 //String s = doSomething(i);
3 String s = "";
4 i = 10;
5 System.out.println(i);

Original:

1 int i = 5;
2 String s = doSomething(i);
3 i = s.length();

Bob

1 int i = 5;
2 //String s = doSomething(i);
3 String s = "";

Figure 4.2: Three-way merge showing that the changes made by Alice and Bob are innocuous in isolation, yet cause an invalid artifact when merged: The variable i is assigned the value 5 in line 1, which is never read until i is set to the value 10 in line 4. Thus one of the assignments is redundant and constitutes a rule violation of type “Dispensable Artifact” as defined in Subsection 4.1.2.

Figure 4.3: A model merge combining two change sets that are valid individually but create an invalid merged model. The common ancestor of the two model versions is shown on the far left. Graphic from Dam et al. [39, Section 3].

Chapter 5

ECCO Case Study

In this chapter, we present the case study for our code rules. We demonstrate rules that detect invalid code artifacts and show how they are applied to code produced by the ECCO tool, which is introduced in the following section. In Section 5.6 we show how our code rules reduce the number of choices a software engineer working with ECCO has to choose from, or improve those choices.

5.1 ECCO Platform Background

In this section, we give a brief introduction to the ECCO tool, which is central to our case study for source code rules. We will be using the terms product variants or simply variants to denote an adapted version of a software product (a UML editor, for example). Software engineers can create such a product variant by copying a product and then modifying the copy, or they can generate a variant using a software product line; software product lines are described in Section 2.5.

ECCO (Extraction and Composition for Clone-and-Own) is a software tool created by Lukas Linsbauer [124] and Stefan Fischer [62] for obtaining software product lines from an existing code base. The authors recognize the widespread practice of copying and adapting large-scale, industrial software to provide support for a new set of hardware or to satisfy the specific needs of a customer. ECCO is based on the notion that this practice creates considerable maintenance problems because the product variants share large amounts of duplicated code [63, Chapter 1].

This duplication can incur problematic costs since the variants exist side by side after they were cloned. Security patches and other code changes have to be applied separately, which increases the associated complexity and effort accordingly [124, Chapter 1] (Subsection 2.3.1 discusses the issue of code

clones in general). To alleviate this problem, ECCO provides the functionality to create a product line by parsing an existing array of product variants and also offers a semi-automatic way to compose new product variants [63, Chapter 3] from that product line.

5.1.1 Functionality

Below we outline the distinctive core features of ECCO and point out how these features are relevant to our case study.

5.1.1.1 Feature to Code Mapping

As input, ECCO takes product variants that have been cloned and modified to suit a special need. Additionally, for each variant ECCO requires a text file that lists all features the variant implements. This file only contains the names of the implemented features and their descriptions. It is kept intentionally simple, without specifying, even approximately, which parts of the product variant implement a feature, because “[companies] lack precise knowledge where in the code these features are implemented” [124, Chapter 1]. Also, this information might get out of date as the variant evolves.

ECCO is then able to extract the variants' “commonalities and differences, and maps them to their features” [63, Chapter 1] by analyzing the input source code. This mapping is also referred to as traces. The analysis and extraction described by Linsbauer [124] works on the assumption that products that share features (as specified in the text file) also share source code. Given enough product variants that differ sufficiently in their features, the algorithm can find overlapping feature sets and code sets, respectively. Figure 5.1 illustrates this principle with a diagram.
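The overlap premise can be illustrated with a small sketch. The types and the set operations below are our own simplification, not ECCO's actual extraction algorithm described by Linsbauer [124]:

```java
import java.util.*;

// Illustrative sketch of the extraction premise: code that appears in
// every variant implementing a feature, but in no variant lacking it,
// is traced to that feature. Variant and trace types are invented for
// this example; ECCO's real analysis is considerably more involved.
public class FeatureTracing {
    // A product variant with its declared features and its code units.
    static class Variant {
        final Set<String> features;
        final Set<String> code;
        Variant(Set<String> features, Set<String> code) {
            this.features = features;
            this.code = code;
        }
    }

    static Set<String> trace(String feature, List<Variant> variants) {
        Set<String> common = null;          // intersection over implementers
        Set<String> excluded = new HashSet<>(); // union over non-implementers
        for (Variant v : variants) {
            if (v.features.contains(feature)) {
                if (common == null) common = new HashSet<>(v.code);
                else common.retainAll(v.code);
            } else {
                excluded.addAll(v.code);
            }
        }
        if (common == null) return Set.of();
        common.removeAll(excluded);
        return common;
    }
}
```

With only a few variants the intersection stays coarse; this mirrors the text's point that the algorithm needs enough variants differing sufficiently in their features to separate the code sets.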

5.1.1.2 Composing New Products

Building on this feature-to-code mapping, ECCO offers to create new compositions of these features, synthesizing a new product variant. We will call this part of ECCO the ECCO Composer for the remainder of this thesis. The composition of new product variants is semi-automatic, which means in the context of ECCO that the software engineer is guided through the creation of a new variant, receiving hints and warnings from ECCO [63, Chapter 4][62, Section 3.1].

The reason preventing the ECCO Composer from performing a fully automatic generation is that ECCO cannot generate code for an interaction of features if they never occurred together before in a product variant. “Glue

Figure 5.1: The extraction algorithm's premise is that features (called modulesets in this figure) and code (called codesets) overlap between product variants. The δ denotes the derivative code that is needed to make the feature interaction between two features (here the features “color” and “line”) work. Example from [125, Section 3.2].

code” is necessary to make the combination of two features run; such code is also called derivative modules in [63, Chapter 2]. By the nature of its approach of comparing the source code of variants to extract distinct features, ECCO is also not able to separate features that always appeared together in each product variant, even if those features fulfill completely dissimilar requirements.

When composing new features, ECCO might create multiple choices of the same code block. This is due to the fact that the products used as the composition's input have implemented the code block in varying ways. We will show in our case study in this chapter how filtering out invalid code blocks is the driving motivation for the source code rules we have created.

5.1.2 Architecture

ECCO's architecture can be roughly separated into feature extraction and the composition of new product variants. Figure 5.2 shows a simplified view of ECCO's architecture and the flow of information within the tool.

Figure 5.2: High-level overview of ECCO from [62, Section 3.1].

5.2 Parsing the ECCO Code Tree

The ECCO Composer combines existing code from products into new variants by building up a code tree. The tree's nodes are statements, and the order of a node's children determines the order of statements in the resulting code block. A node having siblings means that the corresponding statement has alternatives, stemming from the different implementations of the products being combined. These alternative statements make for choices in the sense defined in Section 1.1. Figure 5.3 illustrates this principle with a small code tree.

In order to detect invalid code blocks, we check the code tree while it is expanding its nodes. This approach has the particular advantage that, in case a rule detects an invalid artifact within a block, it can react immediately by removing the problematic statement and thus pruning the code tree.

The nodes of the tree contain the individual code statements as strings of code similar to Java. We say similar because ECCO's representation of Java statements differs slightly from Java, which necessitates parsing these statements and accounting for their peculiarities. Listing 5.1 shows an example of the way ECCO represents Java code, which we dubbed “ECCO Java”. The parsing of ECCO Java from the nodes was implemented using a dedicated utility class listed at A.1. The main tasks this class fulfills are the extraction of variables that were accessed in some way, either by having their value read or reassigned, as well as determining whether and which methods were called inside a node.
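As a simplified illustration (our own types, not ECCO's classes), such a code tree can be modeled as nodes holding one statement each, where several children of the same node represent alternative continuations and every root-to-leaf path yields one candidate code block:

```java
import java.util.*;

// Simplified sketch of ECCO's code tree: each node holds one statement,
// children are ordered, and multiple children of one parent represent
// alternative next statements. Enumerating all root-to-leaf paths yields
// the candidate code blocks (choices). Types are ours, not ECCO's.
public class CodeNode {
    final String statement;
    final List<CodeNode> children = new ArrayList<>();

    CodeNode(String statement) { this.statement = statement; }

    CodeNode add(CodeNode child) { children.add(child); return child; }

    // Collect every statement sequence from this node down to a leaf.
    List<List<String>> choices() {
        List<List<String>> result = new ArrayList<>();
        if (children.isEmpty()) {
            result.add(new ArrayList<>(List.of(statement)));
            return result;
        }
        for (CodeNode child : children) {
            for (List<String> tail : child.choices()) {
                List<String> path = new ArrayList<>();
                path.add(statement);
                path.addAll(tail);
                result.add(path);
            }
        }
        return result;
    }
}
```

Pruning, in this picture, amounts to removing a child whose subtree a rule has judged invalid, so that none of the choices passing through it is ever enumerated.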

[public ,
 ServerReq serverReq,
 VODClient vODClient,
 bevelPanel1 = new BorderPanel(1, new Color(220, 220, 220), new Color(50, 50, 50)),
 bevelPanel2 = new BorderPanel(1, new Color(220, 220, 220), new Color(50, 50, 50)),
 listControl1 = new List(),
 buttonControl1 = new Button(),
 buttonControl2 = new Button(),
 buttonControl3 = new Button(),
 buttonControl4 = new Button(),
 label1 = new Label(),
 movielist = null,
 server = serverReq,
 parent = vODClient,
 TRY: TRYBLOCK,
 CATCH (Exception e),
 ,
 hostreset(),
 serverselect = new ServerSelect(this),
 detail = new Detail()]

Listing 5.1: ECCO's representation of Java statements inside the code tree.

int i = 5;

String s = doSomething(i);

i = 10; String s = "";

Figure 5.3: When combining multiple code blocks from different product variants to generate a new variant, ECCO creates a code tree where the order of nodes determines the order of statements. If a node has more than one child, as seen here with the second node containing String s = doSomething(i), there are multiple possibilities for the next statement.

5.3 Case Study Approach

In Figure 5.6 we see how our generic approach described in Chapter 3 is adapted for the ECCO case study. As the diagram shows, we specialize the generic Choice Generator to be the ECCO Composer and the generic Artifacts to be ECCO Java blocks from product variants, respectively. The remaining elements of the diagram did not need to change with respect to the generic diagram in Figure 3.1 because a Software Engineer still has to choose from a set of Choices (which are ECCO Java blocks) that are eliminated or repaired by rules. These rules for ECCO are portrayed in Section 5.5.

5.3.1 Related Work

In the following subsection we give a brief overview of existing approaches for detecting invalid artifacts in source code to show how other researchers have tackled the problem.

In the context of detecting invalid artifacts in merged software, Berzins was one of the first to present a language-independent approach [13]. He analyzes programs and digital circuits modeled in Boolean algebra. Using his approach, which, in contrast to ours, relies heavily on formal methods, he is also able to detect semantic conflicts within the merged software.

Bush et al. [28] analyze source code by first transforming it to so-called models, whose syntax is inspired by Lisp, and then performing control flow analysis on them as well as watching values in memory. These models are enriched with rules that primarily aim to find problems with memory access, memory leaks, and pointers [28, Chapter ‘Appendix: The Modeling Language’]. Figure 5.4 depicts an example of transforming C code to Bush's Lisp-like model and enriching it with checks for problems related to memory management. Their approach of watching values is similar to that of our rules portrayed in Subsection 5.5.2 and Subsection 5.5.3, which employ a watch list of variables.

Engler et al. [58] have contributed substantial research on how to extend compilers in order to find invalid artifacts in C and C++ code. This includes detecting invalid artifacts that might pertain to a certain kind of software system, such as operating systems or embedded software. Compiler extensions are written in the metal language, which is a superset of C++ and “provides the state machine (SM) as a fundamental abstraction” [81, Chapter 1]. Figure 5.5 shows an example of how a metal rule can be defined.
Regarding analysis of Java code using rules, a topic closely related to this case study, the tool FindBugs [35] is a prominent example that is presented along with other static code analysis tools in Subsection 2.3.3.

5.4 Case Study Motivation

This section outlines why it became necessary to reduce the number of choices (in the sense defined in Section 1.1) that the ECCO Composer produces. We mentioned in Section 5.1 that the ECCO tool not only traces features to code but also lets engineers create new products by combining features in novel ways. In the latter step, code merging has to be done when there are two or more variants of a code block. These choices occur because the code blocks pertaining to the features are taken from existing product variants that might have subtle but also significant differences in implementation. Consequently, a software engineer has to choose from them to create a new product.

ECCO's capabilities have been tested using various software products [125], all written in Java: Video On Demand, Draw Product Line, and ArgoUML (which was used by Couto et al. [37] to study product lines). After merging code blocks to create new variants, certain invalid artifacts were observed in the choices the ECCO Composer generated. This formed the motivation for this case study: Filter out the invalid choices automatically

(a) An example function written in C that dereferences a pointer and returns its data.

(b) The dereferencing function from 5.4a after being transformed into a model. The interspersed rules in the form of (constraint ...) check for memory- and pointer-related issues.

Figure 5.4: The static analyzer of Bush et al. [28] analyzes source code after translating it into a Lisp-like model which is annotated with rules. Figure 5.4a shows the function before its translation into the model seen in Figure 5.4b. If the data-flow analysis detects that the rules are violated, an error message is issued. Example code and model are from [28, Section ‘Model’].

Figure 5.5: A rule written in metal that checks whether a pointer that was already freed is dereferenced or freed again. In line three, {kfree(v)} is a pattern that matches all deallocations of variables. Source: [81, Chapter 1]

so a software engineer does not have to. The inspected blocks not only contain problematic code (e.g., a potential null pointer exception); a considerable number of merged blocks were also found upon human inspection to behave identically. They only differ in the order of their statements, which does not change the behavior of the blocks at all. Accordingly, we devised a rule that detects equivalence between blocks of code, which is presented in Subsection 5.5.1. Following the definition given in Section 1.1, two or more blocks that are equivalent represent invalid artifacts.

5.5 Rules

This section describes the rules we have created for the ECCO case study to find invalid blocks of ECCO Java. These rules are an integral part of the approach outlined in Section 5.3.

We initially attempted to create rules for detecting invalid blocks of ECCO Java by adopting and adapting established invalidity detection tools. We created a wrapper called CodeChecker for analyzing Java source code using PMD, FindBugs (both of which were discussed in Subsubsection 2.3.3.3), and the Java compiler javac with lint enabled1. The implemented prototype of CodeChecker calls the three tools, which read Java source files from disk, and aggregates their reports. Listing A.2 shows the implementation of this wrapper.

The invalid code produced by the ECCO Composer involved dispensable artifacts (cf. Subsection 4.1.2) in the form of redundant assignments to fields. Since PMD did not cover this invalidity in its predefined rules, we created two

1See https://docs.oracle.com/javase/8/docs/technotes/tools/windows/javac.html for more information on javac and lint.


Figure 5.6: Our approach for the ECCO case study: The ECCO Composer merges ECCO Java code blocks that stem from product variants. This merging generates choices that a software engineer has to choose from. We facilitate the engineer's choice by removing or repairing the generated choices using the rules we have devised for this case study. This approach is a specialization of the generic approach for our thesis shown in Figure 3.1.

custom PMD rules for the purpose of finding redundant field assignments. Those rules, written in Java, have the same goal as the rules later devised for our own rule engine described in Subsection 5.5.2 and Subsection 5.5.3, but differ in implementation since the former rules make use of PMD's API. These early PMD rules for detecting redundant field assignments can be found at Listing A.3 and at Listing A.4. To get a better understanding of how these rules are embedded in and interact with the PMD framework, including its abstract syntax tree, we point the reader to Listing A.5, containing the utility class used, for example, to query the code's abstract syntax tree as generated by PMD. Two main reasons led us to abandon CodeChecker in favor of a custom rule checking engine:

1. Speed: Because one of our goals was to prune the ECCO code tree by not expanding invalid nodes, we had to write every (unfinished) block to disk while the tree was building up, as this is the only way the tools wrapped by CodeChecker accept input. After preliminary test runs it became evident that the performance of analyzing even small code trees was untenable and would clearly not scale to larger trees. Since memory access is faster than disk access by orders of magnitude [97], we opted for a rule engine that could read code from memory instead of having to read code from disk.

2. Translation issue: Because FindBugs, PMD, and javac accept standard Java as their input, we would have had to create a translation layer for ECCO Java. Also, because ECCO's Java representation might change in the future, we estimated the effort and risk of creating a translation layer from ECCO Java to regular Java to be quite significant, possibly outweighing the advantages gained from reusing these existing detectors of invalid code.

Taking into account the above considerations, we created our own rules written in Java for examining ECCO Java, which are applied to ECCO's code tree while it is expanding its nodes.

The following subsections provide rationale as well as explanations regarding the rules created for ECCO's code composition. We describe how the rules react to a spotted invalid artifact and what repairs they perform to make the problematic code valid. In addition, we relate these concrete rules to the overarching concepts of detecting invalid artifacts in merged code and models as explained in Chapter 4.

5.5.1 Add Listener Equivalence Rule

One of the reasons a considerable number of choices were produced while synthesizing a new product in ECCO stems from the fact that input products add listeners to GUI elements, such as buttons2, in different orders in the implementing code block. When merged, these blocks, which are valid when viewed in isolation, cause problems since an excess of equivalent choices is created. This issue caused by merging constitutes a specific example of the general principle outlined in Section 4.2, which describes how both model and code merges can cause invalid artifacts although their merge inputs were valid.

As a small example to illustrate this situation, Figure 5.7 shows three choices of a code block that differ only in the order in which listeners are added to their buttons. By inspecting the different choices it becomes clear that each code block exhibits the same behavior. Because it suffices to have only a single one of these choices, all but one of these equivalent blocks can be deleted.

Furthermore, the Add Listener Equivalence Rule is an example of a rule tailored to a special domain, in this case GUI applications that favor message passing using listeners. This contrasts with the following rules, which apply to all Java programs equally. Listing A.6 shows the documented implementation of this rule, demonstrating how blocks of code can be assessed regarding their potentially equivalent behavior.

As we have seen, the Add Listener Equivalence Rule decides whether the members of a group of code blocks are equivalent in order to remove equivalent blocks. Following the definition of artifact given in Section 1.1, this rule is a prime example of rules dealing with the detection and elimination of dispensable artifacts. Consequently, we can assign it to the respective overarching rule category described in Subsection 4.1.2.
In terms of automatic repairs, this rule will react to a detected equivalence among multiple code blocks by deleting all blocks but the first one.
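The special case of blocks differing only in listener order can be sketched as follows. This is our own simplified heuristic, not the implementation from Listing A.6:

```java
import java.util.*;

// Heuristic sketch: two code blocks are treated as equivalent if they
// contain exactly the same statements (as multisets) and differ only in
// the order of addActionListener calls. This captures the special case
// from the Add Listener Equivalence Rule; it is not a general
// program-equivalence check, which is undecidable.
public class ListenerEquivalence {
    static boolean onlyListenerOrderDiffers(List<String> blockA, List<String> blockB) {
        if (blockA.size() != blockB.size()) return false;
        // Same statements overall, as multisets?
        Map<String, Integer> count = new HashMap<>();
        for (String s : blockA) count.merge(s, 1, Integer::sum);
        for (String s : blockB) {
            Integer c = count.get(s);
            if (c == null || c == 0) return false;
            count.put(s, c - 1);
        }
        // Statements that are not listener registrations must appear
        // at the same positions in both blocks.
        for (int i = 0; i < blockA.size(); i++) {
            boolean aListener = blockA.get(i).contains(".addActionListener(");
            boolean bListener = blockB.get(i).contains(".addActionListener(");
            if (!aListener || !bListener) {
                if (!blockA.get(i).equals(blockB.get(i))) return false;
            }
        }
        return true;
    }
}
```

Applied to the three choices of Figure 5.7, this check would report all of them as equivalent, so that all but the first could be deleted.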

5.5.2 Multiple Variable Assignment Rule

A recurring issue in the code generated by ECCO's code composition feature was that values assigned to variables (including fields) were not read before

2GUI buttons and the listeners attached to them are documented, for instance, in the AWT framework: https://docs.oracle.com/javase/8/docs/api/java/awt/Button.html#method.summary

1 String s = "";
2 Button button1 = new Button();
3 Button button2 = new Button();
4 Button button3 = new Button();
5
6 button1.addActionListener(listener);
7 button2.addActionListener(listener);
8 button3.addActionListener(listener);

(a) First choice.

1 String s = "";
2 Button button1 = new Button();
3 Button button2 = new Button();
4 Button button3 = new Button();
5
6 button2.addActionListener(listener);
7 button1.addActionListener(listener);
8 button3.addActionListener(listener);

(b) Second choice.

1 String s = "";
2 Button button1 = new Button();
3 Button button2 = new Button();
4 Button button3 = new Button();
5
6 button2.addActionListener(listener);
7 button3.addActionListener(listener);
8 button1.addActionListener(listener);

(c) Third choice.

Figure 5.7: The code shown in 5.7a, 5.7b, and 5.7c differs only in the order the listeners are added to the buttons. Choices like these occurred while creating new products from existing ones using ECCO's Composer. The rule described in Subsection 5.5.1 decides that these three choices are equal in behavior and will delete all choices but the first one.

they received a new value: The previous value was never used by the program which is a symptom of low code quality. To see an example of this rule violation, consider listing 5.2 which shows a snippet of the synthesized Java code exhibiting multiple assignments to the same variable.

    newLine = new Line(start);
    newLine = new Line(color, start);

Listing 5.2: The variable newLine is sequentially assigned two values. As the first value is not read before the second one is assigned, one of the assignments is redundant.

When analyzing code blocks like these containing multiple assignments, it becomes obvious that only one of them makes sense. The other assignment does not and is therefore dispensable, making the Multiple Variable Assignment Rule another instance of a rule detecting dispensable artifacts (cf. Subsection 4.1.2).

Considering the implementation of the Multiple Variable Assignment Rule shown in Listing A.7, we see that the rule is violated when there is no access to the variable before the next time the variable is assigned a value. This is realized by putting variables on a watch list when they are assigned a value. When a variable from the list is accessed in the code or a method is called3, the variable is removed from the list. If a variable on the watch list is assigned a value again, we can be sure that the variable's value was not read between the previous and the current assignment. We then declare a rule violation because the first assignment was rendered useless by the second one.

In order to determine whether the value of a variable was read, we use EccoJavaParser#getReadVars. As shown in Listing A.1, this method returns those variables that had their values read either in an assignment (e.g., a = b), in a comparison (e.g., a <= b), or in a binary operation (e.g., a + b or a >> b). The current rule implementation does not yet include support for detecting access via unary operations such as i++.

We consulted the Java Language Specification version 7, specifically sections 17, 18, 19, 20, 21, and 26 of chapter 154, to compile a list of operators that read the value of a variable. Listing 5.3 shows an example where a variable is accessed using a comparison operation before it is assigned another value, which makes the code block conform to the rule.

3Because called methods may use any or all of the fields in the watch list, we choose to be on the safe side and clear the list upon a method call in order to avoid false positives.
4See https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html

    int variable = 1;
    boolean b = variable <= 5;
    variable = 2;

Listing 5.3: A block of code that conforms to the Multiple Variable Assignment Rule because the variable is read in between assignments via a comparison.

The predecessor of the Multiple Variable Assignment Rule is the rule shown in Listing A.3, implemented using the PMD framework. The two rules follow the same intent, and the implementations are also not unlike each other, barring the fact that the PMD rule only considers assignments to fields, whereas the Multiple Variable Assignment Rule also detects redundant reassignments to local variables.

When finding a pair of assignments to the same variable with no intervening access to the variable, this rule recommends repairing the block by deleting the second assignment, so that the first assignment keeps its usefulness.
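The watch-list procedure described above can be sketched as follows. The Stmt record and its fields are a deliberate simplification for illustration, not ECCO's actual statement model from Listing A.7; whether a constructor invocation counts as a method call is left out of this sketch.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MultipleAssignmentCheck {

    /** Drastically simplified statement: at most one assigned variable,
     *  the variables whose values it reads, and whether it invokes a
     *  method (which might read any watched variable). */
    record Stmt(String assignedVar, Set<String> readVars, boolean callsMethod) {}

    /** Returns variables assigned twice with no read in between. */
    static Set<String> redundantlyAssigned(List<Stmt> statements) {
        Set<String> watchList = new HashSet<>();
        Set<String> violations = new LinkedHashSet<>();
        for (Stmt stmt : statements) {
            if (stmt.callsMethod()) {
                watchList.clear(); // be safe: the callee might read anything
            }
            watchList.removeAll(stmt.readVars()); // a read ends the watch
            if (stmt.assignedVar() != null && !watchList.add(stmt.assignedVar())) {
                violations.add(stmt.assignedVar()); // reassigned while watched
            }
        }
        return violations;
    }
}
```

Run on the two assignments of Listing 5.2 this flags newLine, whereas the comparison in Listing 5.3 removes the variable from the watch list before its reassignment.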

5.5.3 Multiple Setter Call Rule

The conceptual sibling of the Multiple Variable Assignment Rule from Subsection 5.5.2 is the Multiple Setter Call Rule. Because Java does not follow the uniform access principle [139, Section 3.3], the convention of getters and setters has emerged as a means to hide the implementation of fields [70]. Provided this convention is not violated, we can assume that a method call such as myVar.setField(aVal) indeed sets a field to a certain value, just as if a more direct assignment like myVar.field = aVal had occurred. Thus, the same considerations regarding redundant assignments apply as those portrayed in Subsection 5.5.2.

The implementation of this rule, listed at A.8, is also roughly similar to that of the Multiple Variable Assignment Rule: again we pursue the approach of a watch list to detect repeated setter calls to objects. We remove watched objects when a method is called, as that method might have used one or all of the objects on the list. A rule with the same purpose was developed leveraging PMD's API to analyze code; it can be browsed at Listing A.4.

The second setter call renders the first one redundant, creating a dispensable artifact. Just as the Multiple Variable Assignment Rule, we can count this rule to the category of rules detecting dispensable artifacts described in Subsection 4.1.2.

This rule repairs the detected invalid artifact the same way as the previous rule: delete the second setter call to give the first one its utility back.
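The same watch-list idea, keyed on receiver and setter name, can be sketched compactly. The string-based call model below is our simplification for illustration, not the implementation from Listing A.8.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultipleSetterCheck {

    /** Statements are modeled as call strings like "obj.setX(value)";
     *  returns earlier setter calls that were repeated on the same
     *  receiver with no other method call in between. */
    static List<String> redundantSetterCalls(List<String> calls) {
        Map<String, String> watch = new HashMap<>(); // receiver#setter -> earlier call
        List<String> redundant = new ArrayList<>();
        for (String call : calls) {
            String receiver = call.substring(0, call.indexOf('.'));
            String method = call.substring(call.indexOf('.') + 1, call.indexOf('('));
            if (method.startsWith("set")) {
                String earlier = watch.put(receiver + "#" + method, call);
                if (earlier != null) {
                    redundant.add(earlier); // its effect was overwritten unused
                }
            } else {
                // Any other call might read the fields set so far:
                // clear the watch list to stay on the safe side.
                watch.clear();
            }
        }
        return redundant;
    }
}
```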

5.5.4 Uninitialized Read Rule

In contrast to the previous rules, which detected dispensable artifacts in code, the Uninitialized Read Rule belongs to the category of finding invalid combinations of artifacts. This rule category, described in Subsection 4.1.1, is part of the common concepts in detecting invalid artifacts in code and models (cf. Chapter 4). In the case of this rule, the problematic combination is an uninitialized object and a method call on it, which, in languages like Java, causes a null pointer exception. Listing 5.4 shows an example of a code block violating the Uninitialized Read Rule.

    1 String s = null;
    2 someMethod();
    3 s.charAt(0);

Listing 5.4: Uninitialized Read Rule detects null dereferences like the one in line three.

While inspecting ECCO's generated code, we found quite a few occurrences of such potential null pointer exceptions. These do not exclusively stem from ECCO's composition of code but were to some extent already present in the code in the first place, presumably added to the code base through human error. The potential null dereferences detected by this rule may be introduced, apart from the developer oversight mentioned above, through merging two pieces of code that are valid individually but lead to an invalid artifact when they are combined in a merge. This issue of valid artifacts causing invalidity when merged is further explained in Section 4.2.

As with the previous rules detecting multiple assignments and setter calls, we employ the mental model of a watch list containing "suspicious" variables: variables that were declared but not initialized to something other than null get on that list. If a method is invoked on one of the watched variables, the rule has successfully detected an invalid artifact. The implementation of the Uninitialized Read Rule is listed at A.9.

This rule will not try to repair a potential null pointer exception. Generally, there are two options for a repair: either initialize the uninitialized variable or remove the method call on it. We think that this has to be decided on a case-by-case basis by a human, and therefore we did not automate that step5. This rule will just indicate that there exists a null dereference in the code.
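The watch-list idea behind this rule can be sketched as a small event-based checker. The declare/assign/invokeOn API is our illustration, not the implementation of Listing A.9, and it ignores, for brevity, the question of whether an intervening method call could initialize a watched variable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UninitializedReadCheck {
    private final Set<String> maybeNull = new HashSet<>();
    private final List<String> violations = new ArrayList<>();

    /** Declaration without initializer, or initialization to null. */
    void declare(String var) { maybeNull.add(var); }

    /** Assignment; only a non-null value removes the variable from watch. */
    void assign(String var, boolean valueIsNull) {
        if (valueIsNull) maybeNull.add(var); else maybeNull.remove(var);
    }

    /** Method invocation on a variable: a watched receiver is a violation. */
    void invokeOn(String var, String method) {
        if (maybeNull.contains(var)) violations.add(var + "." + method + "()");
    }

    List<String> violations() { return violations; }
}
```

Feeding the statements of Listing 5.4 as events flags the call in line three, because s is still on the watch list when charAt is invoked on it.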

5.6 Empirical Results on Rules Performance

The following subsections present the effectiveness, validity, and efficiency of the rules applied to ECCO's generated code. For further detail on the rules' performance, please consult the raw performance data in appendix B.

5.6.1 Effectiveness

In order to determine the effectiveness of our ECCO rules, we measured how many invalid choices the rules could eliminate. We chose a number of input products and configurations where ECCO was known to produce problematic code blocks, which constituted the initial reason code rules had to be applied to them.

As Figure 5.8 shows, from all those code blocks where multiple choices existed, a significant number of choices could be eliminated, especially from the code of Video On Demand, which we trace back to the fact that a lot of blocks within the GUI code could be removed by the Add Listener Equivalence Rule. For a more in-depth view of the measured effectiveness, we point the reader to the raw performance data in Section B.2.

5.6.2 Validity

The rules' validity was assessed by doing multiple manual reviews of the detected invalid artifacts to ensure the rules would not yield false positives. False negatives were precluded by code inspection of the products which ECCO synthesized. The reviews were conducted by Stefan Fischer [62], Lukas Linsbauer [124]6, and the author of this thesis. We reviewed the composed code and the repairs created by the rules thoroughly, and to the best of our knowledge there are neither false positives nor false negatives regarding the rules.

5Automatically initializing an object to something other than null is not a task easily automated: What if there are multiple constructors? Which value should be chosen for the constructor's individual parameters? When there is no public constructor, how do we find the appropriate factory method?
6Fischer and Linsbauer are the creators of and experts on ECCO as well as its code composition feature.

Figure 5.8: The average number of choices per code block (of those code blocks that have different choices), once with rules filtering out invalid choices and once with no rules active. The variants that were created for this measurement, including the detailed performance data, can be found in Listing B.10 for ArgoUML, Listing B.14 for Draw Product Line, and Listing B.18 for Video on Demand.

5.6.3 Efficiency

Efficiency was measured by contrasting test runs: one set of test runs had rule application turned on and the other had it turned off for performance comparison. The computer used for measuring the rules' efficiency was a Lenovo X220 Tablet with four Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz processors and eight gigabytes of random access memory. The operating system used at the time of measuring was Windows 7 Professional 64 bit. This hardware is in no way special and can be regarded as an off-the-shelf laptop.

Figure 5.9 shows the additional time the rules incur on the generation of new product variants with ECCO. Although the overhead is not negligible, it can be argued that the rules' execution time is tolerable considering that the creation of a new variant of the industrial-sized ArgoUML takes about a minute on an off-the-shelf laptop with the rules activated. With smaller code bases like Draw Product Line the added time is especially low, increasing the time merely by 5%, which corresponds to less than a tenth of a second in absolute terms. The detailed numbers from measuring the rules' efficiency can be found in appendix B.1.

Figure 5.9: Performance results from using ECCO to generate new product variants for Video on Demand, Draw Product Line, and ArgoUML, comparing the time in milliseconds of applying rules and creating products without the use of rules. The table also shows how many code statements the rules could remove. Each generated product has a different number of input products and implements a different number of features, which are listed in appendix B.1.

Chapter 6

ArchStudio Case Study

This chapter describes our case study for detecting invalid artifacts in models that were created using ArchStudio 3. The approach behind this case study is described in Section 6.4, whereas Section 6.5 explains the underlying motivation, the reason we created rules for ArchStudio models in the first place. To get a better understanding of the case study at hand, we provide a brief introduction to ArchStudio 3 in the following section.

6.1 ArchStudio 3 Platform Background

ArchStudio 3 is an open-source tool authored by Dr. Eric Dashofy [42] for creating and modifying software architecture models and software product lines. ArchStudio uses xADL 2 (highly-extensible architecture description language) [43, 45] as the notation for its models. As also discussed in the work by Dashofy et al. [46, Subsection 3.2], xADL 2 describes architectures using XML, contributing a set of XML schemas "that provide a basic framework for modeling product family architectures, its extensibility to allow future additions to (and modifications of) elements in the representation, and its associated tool support to automatically generate APIs for manipulating specific instances of product family architectures" [44, Section 1].

The possibility to define software product lines is supported by declaring model artifacts as being optional or mandatory. These artifacts possess a Boolean guard whose value determines whether the artifact will be present in the instantiated model [42, Subsubsection 4.1.4.2]. While xADL is not limited to modeling a single architectural approach [41, Section 2], in ArchStudio 3 it is used to express the Chiron-2 style (also known as C2) [173].

ArchStudio 3 is superseded by ArchStudio 4 and 5, which are both implemented in the form of plugins [42, Subsection 4.8.4]. We opted for the significantly older version 3 because of its built-in architectural diffing and merging capabilities, which the newer versions lack (see Subsubsection 6.1.1.2 for more on ArchStudio's diffing and merging capabilities). We explain later on in the same subsubsection that we need these features for our three-way merge scenarios (shown in appendix C) that are the motivation for the rules of this case study.

ArchStudio makes use of C2's artifacts, which are components, connectors, links, and interfaces—both in optional and mandatory variants—to represent its models. Components function as the artifacts of computation, whereas connectors enable communication among those components. Links and interfaces are means to model how these artifacts are connected. Figure 6.1 shows a model that contains and names each of the C2 artifacts as they are used in ArchStudio 3.

6.1.1 Functionality

For the remainder of this section, we outline features of ArchStudio 3 that are essential to our case study presented in this chapter.

6.1.1.1 Model Creation and Editing

ArchStudio allows the creation and modification of xADL 2 models using a graphical user interface. This GUI is implemented as a component within ArchStudio named Archipelago [42, Section 4.9] that lets users open existing or create new xADL 2 models. Using mouse features like drag-and-drop, artifacts of an ArchStudio model, such as components or links, can be edited and added. The model in Figure 6.1 is visualized using Archipelago.

In addition to this graphical editor that represents models using boxes and arrows, ArchStudio also offers the component ArchEdit [42, Subsection 4.2.5], a textual editor for xADL 2 that displays the model in its XML form and lets users manipulate it in a more low-level fashion.

6.1.1.2 Diffing and Merging

ArchStudio 3 also contains two basic components for diffing and merging models [42, Subsection 6.5.3]. They are contributions to ArchStudio's code base by Chen et al. [31], which also include support for product line architectures in order to facilitate propagating changes from one model to another model.

Figure 6.1: This model shows all the artifacts available in ArchStudio 3. The dashed line around an element signifies that it is optional rather than mandatory. Archipelago is ArchStudio’s graphical model editor that renders this model.

In hindsight, these existing components were not sufficient for our purposes, as they only allow creating a diff between two models, thus merely providing support for two-way diffing, not the three-way diffing required for our model merge scenarios shown in appendix C. Through modification and adaptation, we implemented a custom three-way diffing and merging feature for ArchStudio 3.
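As a hedged illustration of what a three-way merge does (this is neither the component by Chen et al. nor our actual ArchStudio implementation), consider merging the sets of model element identifiers of two revisions against their common ancestor:

```java
import java.util.HashSet;
import java.util.Set;

public class ThreeWayMerge {

    /** Minimal three-way merge over sets of model element ids: an element
     *  survives if both branches keep it, or if one branch added it; an
     *  element deleted in either branch relative to the base is dropped. */
    static Set<String> merge(Set<String> base, Set<String> left, Set<String> right) {
        Set<String> all = new HashSet<>(base);
        all.addAll(left);
        all.addAll(right);
        Set<String> result = new HashSet<>();
        for (String element : all) {
            boolean inBase = base.contains(element);
            boolean inLeft = left.contains(element);
            boolean inRight = right.contains(element);
            if (inLeft && inRight) {
                result.add(element);           // kept (or added) in both branches
            } else if (!inBase && (inLeft || inRight)) {
                result.add(element);           // newly added in exactly one branch
            }
            // in base but missing from a branch: deleted there, so dropped
        }
        return result;
    }
}
```

Real model merging must of course also reconcile attributes, links, and conflicting edits to the same element, which is where the invalid combinations discussed in Section 4.2 arise.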

6.1.1.3 Detection of Invalid Model Artifacts

ArchStudio 3 ships with an integrated rule checking engine that uses Schematron [175, 11] (which in turn employs XSLT [108, 34] and Xalan [117, Chapter 2]1) to apply rules to its models. An example of such a rule written with Schematron is listed at 6.1. The use of an XML processing and evaluation tool is natural considering that ArchStudio models themselves are persisted as xADL files that are in turn written in XML.

Yet, we encountered severe problems when trying to create more evolved rules, since support for loops and functions is missing from ArchStudio 3's rule checking engine. This prohibits the implementation of rules that need to take the correlation among components in the entire model into account, such as the rule described in Subsection 6.6.1. We could not overcome these limitations even after contacting Dr. Dashofy

1https://xml.apache.org/xalan-j/

who offered his help and expertise via the ArchStudio mailing list, which eventually led us to the decision to create a custom rule checking engine for ArchStudio 3, demonstrated in Section 6.6.

    id0= |*|
    iddesc0=Link |*|
    text=Link point missing anchor-on-interface |*|
    detail=Link must have an anchor-on-interface for every endpoint

Listing 6.1: Schematron rule ensuring that each link is connected at both ends. (Only the rule's message text is shown here; the surrounding Schematron XML markup did not survive text extraction.)

6.1.2 Architecture

ArchStudio 3 is a Java application [54, 182] that is itself modeled using a C2 architecture defined in xADL 2 [41]. Compiling ArchStudio 3 is performed via a bootstrap process which assembles and builds the individual components of its C2 model [42, Subsection 4.8.3]. Individual features, including the diffing mechanism and the graphical model editor of ArchStudio, are implemented as components defined in xADL 2.2

6.2 Architecture Representation and Parsing

After this background on the tool employed in our case study, we continue by describing how to read ArchStudio's models in order to find invalid artifacts in them. A challenge for both case studies, which is outlined in Section 3.1, is the need to process the specific format of artifacts and convert them into objects that can be analyzed by our rules. This is equally true for our parsing of ECCO's Java representation and—with respect to this case study—for

2Inspecting ArchStudio's source code makes this evident: http://www.isr.uci.edu/projects/software/archstudio.zip

creating a type-safe abstraction of the XML structure that ArchStudio uses to persist its models.

In order to have suitable objects that can be analyzed by our rules described in Section 6.6, we have to parse the ArchStudio models defined in XML. Listing 6.2 gives an impression of the form of ArchStudio models, showing a very simple model with only two components and one connection between them.

The code responsible for parsing and encapsulating the different artifacts of an ArchStudio model can be found in appendix A.2.1. The Java 8 classes we created possess the logic to parse data from the models stored as XML in order to create a type-safe representation of an ArchStudio model. They do so in combination with ArchUtil, listed at appendix A.10, which contains logic common to all ArchStudio artifacts. Having parsed and encapsulated the XML artifacts in Java in a type-safe manner, we are thereafter in the position to analyze the ArchStudio model and to apply our custom rules. These rules are described in Section 6.6.

    BtoA
    Component A
    (New Interface)
    in
    Component B
    (New Interface)
    out
    (New Link)

Listing 6.2: ArchStudio's representation of an elementary model containing two components and a single connection between them. The rendering hints used by the ArchStudio GUI were removed from this listing. (Only the element text is shown here; the surrounding xADL XML markup did not survive text extraction.)
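A minimal sketch of such parsing is shown below, assuming hypothetical local element names component and description; the real tag names are dictated by the xADL 2 schemas, and our actual classes in appendix A.2.1 use ArchStudio's own API instead of raw DOM. The record syntax is modern Java used here for brevity.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ArchModelParser {

    /** Minimal type-safe wrapper for a parsed model element. */
    record Component(String id, String description) {}

    /** Collects all component elements, in any namespace, from an
     *  xADL-like XML document. */
    static List<Component> parseComponents(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xml);
        List<Component> components = new ArrayList<>();
        // "component" and "description" are assumed local names.
        NodeList nodes = doc.getElementsByTagNameNS("*", "component");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            NodeList descriptions = element.getElementsByTagNameNS("*", "description");
            String text = descriptions.getLength() > 0
                    ? descriptions.item(0).getTextContent().trim() : "";
            components.add(new Component(element.getAttribute("id"), text));
        }
        return components;
    }
}
```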

6.3 Rule Engine for ArchStudio 3

We mentioned in Subsubsection 6.1.1.3 that the limitations of Schematron were quite severe, and after consulting Dr. Dashofy via the ArchStudio mailing list, we concluded that it was advisable to create a custom rule checking engine for ArchStudio using a more powerful rule language. The considerations and rationale for choosing this language are described in the following subsection.

6.3.1 Rule Language

This subsection outlines which languages we considered for the rules of this case study and what led us to the decision to select Scala for this task. The following deliberations are written assuming that the reader is somewhat familiar with languages running on the Java Virtual Machine and the paradigms of object-oriented as well as functional programming.

Given the limited capabilities and expressiveness we encountered when trying to devise new rules using ArchStudio's Schematron-based rule engine, we decided to look for a suitable language in which we could express our rules for ArchStudio's models. Since we implemented the parsing of models in Java in order to use ArchStudio's API for reading, creating, and editing the models, it stood to reason that we choose a JVM language because the entire model was available as a compound Java object. Our requirements for this language are as follows:

• Easy integration with Java

• High conciseness and readability to increase the rules’ maintainability

• Since we have created type-safe representations of the ArchStudio model (as described in Section 6.2), the language should support static type checking in order for us to profit from type safety at compile time

Although Jython3 and Groovy [111]4 seemed like good options, as we have worked with both languages (Jython via Python), we opted for a statically typed language because it appeared—and indeed turned out to be—beneficial to leverage the types we designed with the Java classes to encapsulate ArchStudio's model artifacts (as listed in appendix A.2.1).

The rules for ArchStudio should be both readable and concise. Clojure [82] cannot satisfy this admittedly subjective criterion for us because we lack the practice of reading and writing Lisps. The aptitude of employing Clojure as a domain specific language is undisputed [109]5, and the pass on

3http://www.jython.org/
4http://www.groovy-lang.org/
5Currently maintained DSLs built with Clojure include

Clojure speaks more to the inexperience of the author than against Clojure's flexibility and power as a language for our rules.

With regard to choosing Java as the language for creating the rules, which may appear obvious since we modeled all of ArchStudio's artifacts with Java objects, we judged Scala to be strictly more expressive than Java and thus more suitable for writing and reading rules. Additionally, Scala is purely object-oriented6 and stricter regarding types than Java7. Moreover, it offers syntax for expressing functional constructs like higher-order functions more tersely than Java 8. Higher-order functions play a critical role in the formulation of our rules: we ensure rule compliance by applying predicates to ArchStudio artifacts, thus making use of this language feature extensively. Although Java offers syntax for higher-order functions since version 8 as well, Scala accomplishes the same arguably more concisely8, which makes for more legible rules; a prime goal when designing them, which we also stated in Section 3.1.

Listings 6.3 and 6.4 show a small comparison between the syntax Scala and Java 8 provide for applying predicates to lists using higher-order functions, a functional construct that is used in every rule we created for checking ArchStudio models.

• https://github.com/technomancy/leiningen • https://github.com/r0man/sqlingvo • https://github.com/yieldbot/marceline • http://clojurequartz.info/ • https://github.com/seancorfield/jsql

6Scala does not have primitives.
7Scala discourages using null in general, and particularly when returning null from a method, by favoring the type-safe Option [148, Section 15.6] instead. See Fowler's "Patterns of Enterprise Application Architecture" [66, Chapter 18, Section 'Special Case'] for a short discussion on how null poses a problem for type safety and polymorphism, as well as Martin's "Clean Code" [130, Chapter 7, 'Don't return null'] regarding the issue of returning null. Furthermore, Scala ensures type safety when using covariant collections, for example, by disallowing covariance of mutable collections. Conversely, Java does allow this in the form of the covariant type of Array, which breaks type safety [53, Chapter 'Generics', Section 'Wildcards'] [17, Chapter 5, Item 25].
8Apart from not having to call stream() each time we want to use map and filter operations on collections, in Scala we can forgo the pleonastic style of naming the type of an object multiple times such as in List<Integer> numbers = new ArrayList<>();. Not to mention the pre-Java 7 way of List<Integer> numbers = new ArrayList<Integer>(); that repeats both the type constructor List and the type parameter Integer. Scala has stronger type inference and can hence offer val numbers = List() for this.

    List<String> names = Arrays.asList("Java", "Groovy", "Scala", "Jython");
    Predicate<String> myPredicate = str -> str.startsWith("J") && str.endsWith("n");
    List<String> filteredList = names.stream().filter(myPredicate).map(String::toUpperCase).collect(toList());

Listing 6.3: Applying a predicate to a list in Java. Note that java.util.stream.Collectors.toList is imported statically.

    val names = List("Java", "Groovy", "Scala", "Jython")
    val myPredicate = (str:String) => str.startsWith("J") && str.endsWith("n")
    val filteredList = names.filter(myPredicate).map(_.toUpperCase)

Listing 6.4: Applying a predicate to a list in Scala.

Scala's propensity for minimizing boilerplate code9 is to our advantage when reading and writing rules for ArchStudio. While Scala code is of comparatively small size, as hinted at in the previous short examples, this alone may not justify the integration gap between the ArchStudio artifacts modeled in Java and the rule logic implemented in Scala. Yet, using Scala's implicit and explicit conversions [148, Section 24.18] of Java collections (which are used heavily for the representation of the ArchStudio models and whose conversions can be seen in appendix A.2.2), we were able to achieve all of this integration from Scala to Java and vice versa.

6.4 Case Study Approach

Analogous to our previous case study presented in Chapter 5, we use a specialization of our generic approach for finding invalid model artifacts. Our approach for the ArchStudio case study is depicted in Figure 6.2.

The Choice Generator from our generic approach is the three-way merger that we created for ArchStudio. This merger takes ArchStudio models as input (they are the Artifacts in the generic approach). Specifically, the merger merges those models which are part of the merge scenarios that form the

9Scala provides syntactic sugar for creating hashCode(), equals(), toString() as well as getters and setters for fields, i.e., boilerplate that is very common in Java classes. For example, the following line provides all of these methods in Scala: case class myClass(myField:Int).

motivation for the rules of this case study (see Section 6.5 for more on this case study's motivation). Just like the abstract Choice Generator, our merge mechanism creates choices for the Software Engineer to choose from. This approach is structurally quite similar to the one outlined in Section 5.3. There is a difference, though, with regard to the rules that aim to help the engineers in their decision: although we remove an available choice if it violates a rule, for this case study the automated repairing of an invalid ArchStudio model was not implemented due to time constraints and is left as future work.

6.4.1 Related Work

In this subsection we highlight a selection of papers that are related to our approach of detecting invalid model artifacts using rules. We show other rule-based approaches in order to contrast them with our approach.

In the area of analyzing software architectures that are modeled using XML, such as ArchStudio's models, Nentwich et al. [143] have created a rule-based approach using xlinkit, focusing on detecting invalid artifacts in web content. Their rules are defined in first-order logic with the goal to "[return] hyperlinks between inconsistent elements instead of boolean values" [143, Chapter 1]. To exemplify xlinkit's rule language, which uses XPath for selecting XML elements [20], consider a rule that answers the following question in the context of a web shop: "Are all the product names in the advertisement the same as in the catalog?". This query can be formulated in xlinkit with the rule shown at 6.1 using universal and existential quantifiers10. To decrease the time it takes to assess whether a change in the model has introduced an invalid artifact, xlinkit supports the concept of incremental checking [144, Chapter 7].

∀a ∈ “/Advert”(∃p ∈ “/*/Product”(“$a/ProductName” = “$p/Name”)) (6.1) Comparing the rule shown at 6.1 with our own rules of Section 6.6 we can see a conceptual similarity when we consider that they both make use of exis- tential quantifiers to assure certain predicates hold. Here is a relevant extract of our Circular Dependency rule from Subsection 6.6.1 for comparison:

10A rule less intuitive than the question but phrased closer to the syntax of the rule shown at 6.1 is: "For all Advert elements, there exists a Product element in the Catalog element where the ProductName subelement of the former equals the Name subelement of the latter". The question, the rule, and the rules are from [143, Chapter 2].


Figure 6.2: Our approach for the ArchStudio case study: our custom ArchStudio three-way merger merges ArchStudio models. This merging generates choices that a software engineer has to choose from. We facilitate the engineer's choice by removing or repairing the generated choices using the rules we have devised for this case study. Note that the dashed line from Rules to Choices signifies that the current version of our ArchStudio rules only eliminates choices; they do not try to repair existing ones like the rules for ECCO do (compare Section 5.5). Automatic repairs of models are future work. This approach is a specialization of the generic approach for our thesis shown in Figure 3.1.

model.getComponents.exists(dependsOnItself) (6.2)

Granting that the notation is different—our rule uses Scala's exists whereas xlinkit expresses the same using ∃—we can nevertheless see that the principle of using predicates11 with existential quantifiers is the same as for the rule of Nentwich et al. [143].

The critics12 in the architectural design tool Argo by Robbins et al. [160] ascertain that a model (or a part of it) specified in UML [159, Section 1] or C2 notation13 satisfies predefined properties, which can stem from, for example, rules imposed by the modeling language or guidelines and patterns for object-oriented design [159, Subsection 3.1][74]. Critics are implemented as Java predicates [162, Section 6] and are regarded as being pessimistic because they consider an unspecified design attribute as a reason to issue a warning [161, Subsection 6.2]. Where previous efforts on architecture checking have only evaluated the model after a design decision has been made, analyzing the model using critics is possible "while architects are considering individual design decisions and modifying the architecture" [160, Chapter 'Introduction']. Thus the critics provide assistance in the evolution of the architectural design.

As our approach to detecting invalid model choices is rule-based, we can draw from the work of Liu, Easterbrook, and Mylopoulos, whose "goal is to develop a software design environment that automates the detection and resolution of design inconsistencies in design models" [127]. Their method of analyzing UML models using custom production rules [19, Chapter 7]14 involves automatically fixing the invalid artifact.

Medvidovic et al. [133] tackled the problem of connecting the different views on a software system, involving the analysis of C2 models, which is the type of model ArchStudio produces. Their work was significantly extended in "Using Object-Oriented Typing to Support Architectural Design in the C2 Style" [134] by applying type theory.

11In our rule's case the predicate is dependsOnItself that determines if a component within the model depends on itself. 12“Critics are active agents that support decision making by continuously and pessimistically analyzing a partially specified design. Each critic checks for the presence of a certain condition in the design. Critics are embedded in a design environment where they have access to the architecture as it is being modified.” [161, Chapter 3] 13The fact that Argo creates and analyzes architectures in C2 notation is stated here: http://isr.uci.edu/architecture/prior-software.html 14Figure 6.3 illustrates the syntax of such a production rule by explaining one of the rules from [127].

Figure 6.3: This production rule from Liu et al. [127, Chapter 3] states that a debit to a bank account must be lower than the balance of the account.

6.5 Case Study Motivation

The rules for the ArchStudio case study were primarily motivated by the work done in [39]. To give a simplified summary of the process relevant to our work, this is the abstract rule scenario described in the paper for which we designed our ArchStudio rules:

1. Rules are applied to a merged software model with the goal of finding invalid artifacts.

2. Detected invalid artifacts are repaired automatically. These repairs are associated with the rules.

3. Although the repairs fix the original invalid artifact, they introduce new invalid artifacts to the model.

4. More rules are applied, which eventually bring the model into a state with a minimal number of invalid artifacts.

An algorithm in pseudocode for this procedure of applying repairs is presented by Dam et al. [39, Section 7]. Note that steps three and four can repeat multiple times until a final fix is found for the model. We designed merge scenarios and specific rules that lead to these repair iterations. The scenarios we created are concrete examples of the abstract procedure outlined above and are listed in appendix C. They demonstrate how individually valid changes cause invalid models when combined; this applies to model as well as source code merges, as described in Section 4.2 as an overarching principle. The presented scenarios form the motivation and the raison d'être for the model rules described in the subsequent section.
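The iterative procedure above can be sketched in Scala. This is a minimal, hypothetical model of the detect–repair cycle, not code from ECCO, ArchStudio, or the implementation of Dam et al.; Model, Rule, Violation, and the two toy rules are illustrative stand-ins we introduce here:

```scala
// Hypothetical sketch of the repair cycle; all names are illustrative.
case class Model(artifacts: Set[String])
case class Violation(artifact: String)

trait Rule {
  def violations(m: Model): Seq[Violation]
  def repair(m: Model, v: Violation): Model
}

// Repeatedly apply the first rule that reports a violation and let it repair
// the model, until no rule fires anymore (or a step budget runs out).
@annotation.tailrec
def stabilize(model: Model, rules: Seq[Rule], budget: Int = 100): Model = {
  val hit = rules.iterator
    .map(r => (r, r.violations(model)))
    .collectFirst { case (r, vs) if vs.nonEmpty => (r, vs.head) }
  hit match {
    case Some((rule, v)) if budget > 0 => stabilize(rule.repair(model, v), rules, budget - 1)
    case _                             => model
  }
}

// Toy rule whose repair removes the offending artifact but introduces a new
// one -- mirroring step 3 of the scenario above.
object RemoveBad extends Rule {
  def violations(m: Model) = if (m.artifacts("bad")) Seq(Violation("bad")) else Seq.empty
  def repair(m: Model, v: Violation) = Model(m.artifacts - "bad" + "patch")
}

// A second rule cleans up the artifact introduced by the first repair (step 4).
object RemovePatch extends Rule {
  def violations(m: Model) = if (m.artifacts("patch")) Seq(Violation("patch")) else Seq.empty
  def repair(m: Model, v: Violation) = Model(m.artifacts - "patch")
}

val repaired = stabilize(Model(Set("core", "bad")), Seq(RemoveBad, RemovePatch))
```

In this toy run the model stabilizes after two repair steps. In the real setting, termination has to be argued for each rule set, which is why the sketch carries a step budget.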

6.6 Rules

Analogous to our case study for source code rules (see Chapter 5), we present in this chapter the rules we created for ArchStudio software models, their rationale, and their correlation to the overarching principles of detecting invalid choices in code and models as described in Chapter 4.

6.6.1 Circular Dependency

Circular dependencies in modules of software systems are a sign of bad design: they reduce overall modularity, hamper refactoring, impede testability [102, 116, Chapter 4], and run contrary to the tenet of dividing "the system into independently callable subprograms" [152, Subsection 4.D]. Figure 6.4 shows a basic ArchStudio model containing a circular dependency. In order to track down circular dependencies in an ArchStudio model, we employ the following method:

• We start with the first component of a model and visit all its dependencies, the dependencies' dependencies and so forth (i.e., we visit those components on which the first component transitively depends).

• If we encounter the original component again among the dependencies, the one we started with, we know that there is a circular dependency in the model and notify the system that we have detected a rule violation.

• Otherwise, after we have visited all the component’s dependencies, we can continue analyzing the next component of the model and visit all its transitive dependencies.

• This way, we either find a circular dependency while consecutively examining the model's components or, after we have checked the last component, conclude that the model under review does not contain any circular dependencies.
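The traversal described by the steps above can be sketched as a tail-recursive function. This is a minimal, self-contained sketch assuming components are identified by name and dependencies are given as an adjacency map; the actual rule operates on ArchStudio's ArchModel and ArchComponent types (see appendix A.18):

```scala
// Minimal stand-in: component names mapped to the names they depend on.
type Deps = Map[String, Set[String]]

def dependsOnItself(start: String, deps: Deps): Boolean = {
  // Tail-recursive visit of everything `start` transitively depends on.
  @annotation.tailrec
  def visit(frontier: List[String], seen: Set[String]): Boolean =
    frontier match {
      case Nil => false // all transitive dependencies visited, no cycle via `start`
      case c :: rest =>
        val next = deps.getOrElse(c, Set.empty)
        if (next.contains(start)) true // reached the start again: circular dependency
        else visit((next -- seen).toList ++ rest, seen + c)
    }
  visit(deps.getOrElse(start, Set.empty).toList, Set(start))
}

// Check every component in turn, as described in the last bullet point.
def hasCircularDependency(deps: Deps): Boolean =
  deps.keys.exists(dependsOnItself(_, deps))
```

The explicit `seen` set guards against revisiting components, so the check also terminates on models whose cycles do not pass through the start component.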

The components forming a circular dependency within a model are tantamount to an invalid artifact combination. Therefore, this rule belongs to the category described in Subsection 4.1.1, which encompasses both model and source code rules.

Regarding the rule's implementation, we followed the principle outlined in Subsection 6.3.1 of defining a predicate—in this case dependsOnItself—that is applied to each component of the model. If one of the components satisfies this predicate, i.e., it does depend on itself, we have detected a circular dependency.

Thus, the method for checking an ArchStudio model for circular dependencies and returning an ArchRuleResult (see Listing A.17 for its implementation) is as follows:

Figure 6.4: ArchStudio model whose components are dependent on each other in a cyclic fashion.

/** @return whether there is a circular dependency in the model */
def check(model: ArchModel) =
  // A component indirectly depending on itself constitutes a circular dependency in the model
  if (model.getComponents.exists(dependsOnItself))
    ArchRuleResult("Circular dependency detected", TestFailed)
  else
    ArchRuleResult("No circular dependencies", TestPassed)

The entire code for the rule can be found in appendix A.18. Since we implemented dependsOnItself as a tail-recursive function [85], the rule can analyze models containing components with extensive dependency networks. Note that the application of the predicate to the model's components, i.e., model.getComponents.exists(dependsOnItself), which is the core of this rule, also demonstrates the interoperability between Java and Scala code mentioned in Subsection 6.3.1: model.getComponents returns a Java list that has no notion of a method called exists. Yet the Scala compiler implicitly converts this Java list to a Scala buffer that provides the method exists [148, Section 24.18][154, Chapter 11]15. These implicit conversions occur in all of the following rules.

15The Scala compiler only performs implicit conversions from Java to Scala collections and vice versa if the respective Scala package is imported using import scala.collection.JavaConversions.

6.6.2 Connector Has Incoming Interface

As outlined in Section 6.1, connectors in the C2 model employed by ArchStudio serve the purpose of receiving and relaying data from one component to another. We argue that a connector which does not meet this requirement has no use within the model and is consequently a dispensable artifact in the sense of the common rule category formulated in Subsection 4.1.2. Figure 6.5 exemplifies a model that has a connector void of any incoming interfaces.

The predicate central to this model rule is hasIncomingIface, which is applied to each connector inside the model. As with the previous rule, an implicit conversion of the Java collection returned by the method ArchConnector#getInterfaces to a Scala collection allows us to use the existential operator exists on the ArchStudio interfaces. This way, we can check whether there is at least one incoming interface on the connector using the predicate shown in Listing 6.5:

val hasIncomingIface = (_:ArchConnector).getInterfaces.exists(_.getDirection == IN)

Listing 6.5: Predicate accepting an ArchConnector and returning a Boolean. It tells us if a connector has an incoming interface.

Having defined the predicate above, we can then write another simple function, shown in Listing 6.6, to get all the connectors that do not satisfy the predicate, i.e., those connectors having no incoming interface.

val connectorsWithNoIncomingIfaces = (_:ArchModel).getConnectors.filterNot(hasIncomingIface)

Listing 6.6: Function that finds all the connectors of an ArchModel that have no incoming interface.

This code is the essence of the rule at hand. The rest of it contains the relevant imports and the creation of the ArchRuleResult; this can be inspected in appendix A.19.

Figure 6.5: One of the connectors in this model cannot receive data at all, so we deem that connector dispensable.

6.6.3 Connector Has Outgoing Interface

The same convention for connectors having to relay data stated in Subsection 6.6.2 also pertains to outgoing interfaces. A connector's task is to connect components; it should not be a dead end for information. Figure 6.6 shows an array of components and connectors where one of the connectors violates the rule on outgoing interfaces. The implementation of this rule is—apart from the interface's direction—identical to the rule ensuring connectors have incoming interfaces and is listed at A.20.
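Since the thesis states that only the direction changes, the predicate presumably mirrors Listing 6.5 with OUT in place of IN. The following is a self-contained sketch using hypothetical stand-in types (Direction, Iface, Connector are our illustrative substitutes for ArchStudio's ArchConnector API, which we do not reproduce here); the authoritative rule is in appendix A.20:

```scala
// Hypothetical stand-ins for ArchStudio's connector API.
sealed trait Direction
case object IN extends Direction
case object OUT extends Direction

case class Iface(getDirection: Direction)
case class Connector(getInterfaces: List[Iface])

// Same shape as Listing 6.5, with OUT instead of IN.
val hasOutgoingIface = (c: Connector) => c.getInterfaces.exists(_.getDirection == OUT)

// Connectors that are "dead ends" for information.
val connectorsWithNoOutgoingIfaces =
  (connectors: List[Connector]) => connectors.filterNot(hasOutgoingIface)

// A dead-end connector only receives; a relay both receives and forwards.
val deadEnd = Connector(List(Iface(IN)))
val relay   = Connector(List(Iface(IN), Iface(OUT)))
```

Applied to the two sample connectors, the function singles out the dead end, just as the rule flags the last connector in Figure 6.6.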

6.6.4 Mandatory Component Has Mandatory Interface

While the previous rules had conventional software models as their target, this rule and the following ones are specific to software product lines (SPLs) created with ArchStudio. This rule prohibits the creation of undesirable variants the SPL can produce, i.e., choices as defined in Section 1.1.

This rule follows the approach mentioned in Section 2.5 and tries to find invalid artifacts by analyzing the SPL with regard to the models that it could possibly instantiate. This stands in contrast to the approach of instantiating all models and analyzing each model subsequently. The Scala code implementing this rule is depicted in appendix A.21, where we can see that the relevant predicate is hasMandatoryIface, which is shown in Listing 6.7.

val hasMandatoryIface = (_:ArchComponent).getInterfaces.exists(_.isMandatory)

Listing 6.7: Predicate that determines whether an ArchComponent has at least one mandatory interface.

Figure 6.6: The last connector of this model does not forward the received data, which makes it a dead end and therefore dispensable.

Using this predicate we can now retrieve all mandatory components that do not possess a mandatory interface and hence are at risk of being unconnected when the product line is instantiated. Querying the model via the predicate is illustrated in Listing 6.8.

val mandComps = model.getComponents.filter(_.isMandatory)
val mandCompsWithoutMandIface = mandComps.filterNot(hasMandatoryIface)

Listing 6.8: Get all the components that are mandatory and lack a mandatory interface.

A model with a component that does not comply with this rule is shown in Figure 6.7. Because Mandatory Component Has Mandatory Interface ensures that a model does not contain unconnected and thus useless components, it is a member of the rule category for detecting dispensable artifacts. See Subsection 4.1.2 for more information on that category.

6.6.5 Model Has Mandatory Components

To avoid instantiating models from software product lines that are empty, i.e., contain not a single component, we devised this rule. The rule Model Has Mandatory Components detects when a model consists entirely of optional components and reports a rule violation to make sure that there is no possibility of a blank model being instantiated. Considering that a model devoid of any components is arguably useless, this rule belongs to the category of rules prohibiting dispensable artifacts outlined in Subsection 4.1.2.

As an example, Figure 6.8 shows an ArchStudio model not conforming to this rule since all its components and connectors are optional.

Regarding implementation, this rule is the simplest among the model rules. Scala's support for higher-order functions and the implicit conversion of Java collections to Scala collections enable this rule's brevity. Its complete check method is shown in Listing 6.9. See appendix A.22 for the entire rule.

/** @return whether there exists a mandatory component within the model */
def check(model: ArchModel) =
  if (model.getComponents.exists(_.isMandatory))
    ArchRuleResult("Model has a mandatory component", TestPassed)
  else
    ArchRuleResult("Model doesn't have any mandatory components", TestFailed)

Listing 6.9: The check method of the rule Model Has Mandatory Components.

Figure 6.7: The optional interface on the mandatory component Comp A is all that connects it to the rest of the model. The dashed lines around interfaces and links signify that they are optional.

Figure 6.8: This ArchStudio model has not a single mandatory component and is thus in violation of the rule Model Has Mandatory Components.

6.6.6 No Mandatory Link On Optional Interface

In software product lines, care has to be taken when it comes to combining optional and mandatory artifacts. In a scenario like the one pictured in Figure 6.9, it is possible that a model gets instantiated that does not include the outgoing interface on component Comp B. In this case, the link leading to component Comp B has a loose end and does not connect anything. Such a combination of artifacts is undesirable. Accordingly, our rule No Mandatory Link On Optional Interface belongs to the kind of rules detecting invalid artifact combinations. This rule category, spanning both model and source code rules, is portrayed in Subsection 4.1.1.

The predicate for this rule is a bit more sophisticated than those of the previous model rules for ArchStudio. In the implementation of the rule displayed in Listing A.23, it is called hasMandatoryLinkOnOptionalIface and uses Scala's for comprehension as a way to iterate through and filter a component's interfaces and links. For comparison, we developed an alternative implementation of the predicate using map and filter operations directly16. It is up to the reader to decide which version of the predicate is more readable by comparing the two implementations in Listing 6.10 and Listing 6.11.

/** @return whether there is a mandatory link on one of this component's optional interfaces */
def hasMandatoryLinkOnOptionalIface(comp: ArchComponent) = {
  val mandLinksOnOptionalIfaces = for {
    optIface <- comp.getInterfaces if optIface.isOptional
    if optIface.getLink.isPresent
    link = optIface.getLink.get
    if link.isMandatory
  } yield link

  mandLinksOnOptionalIfaces.nonEmpty
}

Listing 6.10: Predicate finding components having mandatory links on one of their optional interfaces. Uses Scala's for comprehension.

16We say directly because the Scala compiler translates for comprehensions to calls of map, filter, and withFilter [148, Chapter 23].

/** @return whether there is a mandatory link on one of this component's optional interfaces */
def hasMandatoryLinkOnOptionalIfaceAlt(comp: ArchComponent) = {
  val optIfaces = comp.getInterfaces.filter(_.isOptional)
  val links = optIfaces.map(_.getLink).filter(_.isPresent)
  val mandLinksOnOptionalIfaces = links.filter(_.get().isMandatory)

  mandLinksOnOptionalIfaces.nonEmpty
}

Listing 6.11: Predicate finding components having mandatory links on one of their optional interfaces. Alternative implementation using calls to map and filter.
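To see that both formulations agree, the following self-contained sketch rebuilds them over hypothetical stand-in types (Link, Iface, Comp are our illustrative substitutes for ArchStudio's types, and a Scala Option replaces the Java Optional used in the real code). Both predicates return the same verdict on the same component:

```scala
// Hypothetical stand-ins for ArchStudio's types.
case class Link(isMandatory: Boolean)
case class Iface(isOptional: Boolean, getLink: Option[Link])
case class Comp(getInterfaces: List[Iface])

// Variant 1: for comprehension (the shape of Listing 6.10).
def viaForComprehension(comp: Comp): Boolean = {
  val hits = for {
    iface <- comp.getInterfaces if iface.isOptional
    link  <- iface.getLink if link.isMandatory
  } yield link
  hits.nonEmpty
}

// Variant 2: explicit filter/flatMap/exists pipeline (the shape of Listing 6.11).
def viaMapFilter(comp: Comp): Boolean =
  comp.getInterfaces
    .filter(_.isOptional)
    .flatMap(_.getLink)
    .exists(_.isMandatory)

// A component with a mandatory link on an optional interface (violating),
// and one whose mandatory link sits on a mandatory interface (conforming).
val violating  = Comp(List(Iface(isOptional = true,  getLink = Some(Link(isMandatory = true)))))
val conforming = Comp(List(Iface(isOptional = true,  getLink = Some(Link(isMandatory = false))),
                           Iface(isOptional = false, getLink = Some(Link(isMandatory = true)))))
```

Since for comprehensions desugar to map, flatMap, and withFilter, the two variants are two spellings of the same pipeline, which is exactly why their readability can be compared on equal footing.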

6.7 Empirical Results

Evaluation of the rules was performed on the models of the merge scenarios (see appendix C) since they constitute the motivation for the individual ArchStudio rules. Whereas these merge scenarios were conceptually abstract and not bound to any modeling tool, we created the concrete models in ArchStudio and applied the rules described in Section 6.6 to them. In order to do so, we integrated a GUI for selecting rules and starting a test run with them to analyze an ArchStudio model. Naturally, we also included the rules' results in our GUI. An example of the user interface displaying the results after applying rules to a model can be seen in Figure 6.10. This also demonstrates the integration of Scala code inside Java components such as our user interface written in Java Swing. The Scala code that is executed via our Java GUI is the analysis of the models using our Scala rules.

Figure 6.9: If the optional interface on Comp B is not present in the instantiated model, the link to Comp C will have no origin and connect nothing.

The rules' effectiveness was measured by the results they yielded, i.e., whether they are capable of finding the invalid artifacts they were designed to detect. We examined the rules' results and the models to which they were applied together with Dr. Hoa Dam, a co-author of the paper [39] that forms the foundation of the merge scenarios shown in appendix C, which in turn are the motivation for the model rules we have developed. The evaluation's outcome was that our rules can reliably detect the invalid artifacts in the models we have created. After examining the model rules and the output they produced, we concluded that there were no cases of either false positives or false negatives. Consequently, we can say that our rules fulfill their purpose entirely.

With regard to efficiency, we did not measure the execution time of the rules as we did in our other case study (reported in Subsection 5.6.3). There are two reasons for this: Firstly, there were no large-scale C2 models at our disposal that could embody the merge scenarios and the models therein, which would have tested our rules on an industrial-sized level. Secondly, and related to the first reason, our rule checking feature delivered and presented its results without perceptible delay when analyzing the models from appendix C. Since this feature we have developed and integrated into ArchStudio is user-facing, we regard this perceived performance as significant enough. We note that the hardware our rules were executed on is the same off-the-shelf computer described in Subsection 5.6.3.
When we examine the rules' design as described in Section 6.6, we can argue that they are quite efficient in another sense: comparing the essential complexity [21, Chapter 17][131, Section 6] of the conceptual rules with the accidental complexity added in their actual implementation, it becomes apparent that this additionally incurred complexity is quite slim due to the expressiveness and terseness of the rule language (cf. Subsection 6.3.1).

Figure 6.10: The model rules written in Scala were integrated into ArchStudio's Swing GUI. The user interface shows that the model under test did not comply with every selected rule: the component Comp A causes the model to be in violation of the two rules described in Subsection 6.6.4 and Subsection 6.6.6.

Chapter 7

Conclusion

In this thesis, we have shown how rules applied to software artifacts can eliminate and improve choices that are the result of a software merge. We tested our approach using two case studies, one for source code (Chapter 5) and another for software models (Chapter 6). The results presented in Section 5.5 and in Section 6.7 show that our rules can detect invalid artifacts reliably and reduce the number of choices a software engineer has to deal with.

One of the theoretical insights gained from assessing the case studies is that rules applied to models and source code differ fundamentally. This is accounted for by the different abstraction levels (in the sense of Krueger [114, Section 1] or Atkinson and Kuehne [4]) to which the analyzed elements of code and architecture pertain: when evaluating source code, we analyze statements, variables, methods, and their order relative to each other. Contrastingly, when examining software models with our rules, we inspect components, interfaces, and their interrelations. This semantic gap makes for distinct rules.

On the other hand, we have observed and demonstrated that although they operate on dissimilar software abstractions, model and code rules share common categories (Section 4.1), and merging poses similar difficulties for source code as well as software models, which the particular rules have to address (Section 4.2). Furthermore, on a more technical note, we have shown that it is feasible to express rules for both abstraction levels using languages from the same platform, in our case the JVM.

7.1 Threats to Validity

Considering that we built our rules and tested our approach in the context of two tools (i.e., ArchStudio and ECCO), we acknowledge that the results presented in this thesis reflect the insights gained from working with these tools and the notations they use, C2 models and Java code. To test our approach on a broader basis, we need to conduct more case studies employing other tools or different architecture styles and programming languages, respectively. Also, the portrayed rules were created to fix specific issues in the merged source code and models, which we described in the relevant case studies. Further research is needed on code and model rules in additional contexts to draw more general conclusions.

Bibliography

[1] Mathieu Acher, Philippe Collet, Philippe Lahire, and Robert B. France. Slicing feature models. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, ASE ’11, pages 424–427, Washington, DC, USA, 2011. IEEE Computer Society.

[2] Marcus Alanen and Ivan Porres. Difference and Union of Models. In Perdita Stevens, Jon Whittle, and Grady Booch, editors, UML, volume 2863 of Lecture Notes in Computer Science, pages 2–17, Berlin, Heidelberg, 2003. Springer-Verlag.

[3] Marc Andreessen. Why Software Is Eating The World. Wall Street Journal (Online), August 2011.

[4] Colin Atkinson and Thomas Kühne. Model-Driven Development: A Metamodeling Foundation. IEEE Software Magazine, 20(5):36–41, September 2003.

[5] Nathaniel Ayewah, David Hovemeyer, J. David Morgenthaler, John Penix, and William Pugh. Experiences Using Static Analysis to Find Bugs. Software, IEEE, 25(5):22–29, 2008.

[6] Nathaniel Ayewah and William Pugh. The Google FindBugs Fixit. In Proceedings of the 19th International Symposium on Software Testing and Analysis, pages 241–252. ACM, 2010.

[7] Brenda S. Baker. On Finding Duplication and Near-duplication in Large Software Systems. In Proceedings of the Second Working Conference on Reverse Engineering, WCRE ’95, pages 86–, Washington, DC, USA, 1995. IEEE Computer Society.

[8] Victor R. Basili, Lionel C. Briand, and Walcélio L. Melo. A Validation of Object-Oriented Design Metrics As Quality Indicators. IEEE Transactions on Software Engineering, 22(10):751–761, October 1996.


[9] Ira D. Baxter, Andrew Yahin, Leonardo Moura, Marcelo Sant’Anna, and Lorraine Bier. Clone Detection Using Abstract Syntax Trees. In Proceedings of the International Conference on Software Maintenance, ICSM ’98, pages 368–, Washington, DC, USA, 1998. IEEE Computer Society.

[10] Boumediene Belkhouche and Cuauhtémoc Lemus Olalde. Multiple View Analysis of Designs. In Joint Proceedings of the Second International Software Architecture Workshop (ISAW-2) and International Workshop on Multiple Perspectives in Software Development (Viewpoints ’96) on SIGSOFT ’96 Workshops, ISAW ’96, pages 159–161, New York, NY, USA, 1996. ACM.

[11] Soběslav Benda, Jakub Klímek, and Martin Nečaský. Using Schematron As Schema Language in Conceptual Modeling for XML. In Proceedings of the Ninth Asia-Pacific Conference on Conceptual Modelling - Volume 143, APCCM ’13, pages 31–40, Darlinghurst, Australia, Australia, 2013. Australian Computer Society, Inc.

[12] Brian Berliner. CVS II: Parallelizing Software Development. In Proceedings of the Winter 1990 USENIX Conference, pages 341–352. USENIX Association, 1990.

[13] Valdis Berzins. Software Merge: Semantics of Combining Changes to Programs. ACM Transactions on Programming Languages and Systems (TOPLAS), 16(6):1875–1903, 1994.

[14] Philippe Besnard and Anthony Hunter. Quasi-classical Logic: Nontrivializable Classical Reasoning from Inconsistent Information. In Christine Froidevaux and Jürg Kohlas, editors, ECSQARU, volume 946 of Lecture Notes in Computer Science, pages 44–51, Berlin, Heidelberg, 1995. Springer-Verlag.

[15] Ramesh Bharadwaj and Constance L. Heitmeyer. Model Checking Complete Requirements Specifications Using Abstraction. Automated Software Engineering, 6(1):37–68, 1999.

[16] Xavier Blanc, Isabelle Mounier, Alix Mougenot, and Tom Mens. Detecting Model Inconsistency Through Operation-based Model Construction. In Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, pages 511–520, New York, NY, USA, 2008. ACM.

[17] Joshua Bloch. Effective Java. Prentice Hall PTR, Upper Saddle River, NJ, USA, second edition, 2008.

[18] Yves Bontemps, Patrick Heymans, Pierre-Yves Schobbens, and Jean-Christophe Trigaux. Generic Semantics of Feature Diagrams Variants. In Feature Interactions and Software Systems ’05, pages 58–77, 2005.

[19] Ronald Brachman and Hector Levesque. Knowledge Representation and Reasoning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.

[20] Tim Bray, Jean Paoli, C. Michael Sperberg-McQueen, Eve Maler, and François Yergeau. Extensible Markup Language (XML). World Wide Web Consortium Recommendation REC-xml-19980210, 16, 1998.

[21] Frederick P. Brooks, Jr. The Mythical Man-month (Anniversary Ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.

[22] Greg Brunet, Marsha Chechik, Steve Easterbrook, Shiva Nejati, Nan Niu, and Mehrdad Sabetzadeh. A Manifesto for Model Merging. In Proceedings of the 2006 International Workshop on Global Integrated Model Management, pages 5–12. ACM, 2006.

[23] Bruno Laguë, Daniel Proulx, Ettore M. Merlo, Jean Mayrand, and John Hudepohl. Assessing the Benefits of Incorporating Function Clone Detection in a Development Process. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 314–321. IEEE Computer Society Press, 1997.

[24] Randal E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, 35(8):677–691, August 1986.

[25] Randal E. Bryant. Symbolic Boolean Manipulation with Ordered Binary-decision Diagrams. ACM Computing Surveys, 24(3):293–318, September 1992.

[26] David Budgen. Software Design. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, second edition, 2003.

[27] Peter Buneman, Susan Davidson, and Anthony Kosky. Theoretical Aspects of Schema Merging. In Advances in Database Technology–EDBT’92, pages 152–167, Berlin, Heidelberg, 1992. Springer-Verlag.

[28] William R. Bush, Jonathan D. Pincus, and David J. Sielaff. A Static Analyzer for Finding Dynamic Programming Errors. Software: Practice and Experience, 30(7):775–802, June 2000.

[29] Jordi Cabot and Martin Gogolla. Object Constraint Language (OCL): A Definitive Guide. In Proceedings of the 12th International Conference on Formal Methods for the Design of Computer, Communication, and Software Systems: Formal Methods for Model-driven Engineering, SFM’12, pages 58–90, Berlin, Heidelberg, 2012. Springer-Verlag.

[30] G. Ann Campbell and Patroklos P. Papapetrou. SonarQube in Action. Manning Publications Co., 2013.

[31] Ping Chen, Matt Critchlow, Akash Garg, Christopher Van der Westhuizen, and André van der Hoek. Differencing and Merging within an Evolving Product Line Architecture. In Software Product-Family Engineering, 5th International Workshop, PFE 2003, November 4-6, Siena, Italy, Revised Papers, pages 269–281, Berlin, Heidelberg, 2003. Springer-Verlag.

[32] Ram Chillarege, Inderpal S. Bhandari, Jarir K. Chaar, Michael J. Halliday, Diane S. Moebus, Bonnie K. Ray, and Man-Yuen Wong. Orthogonal Defect Classification – A Concept for In-Process Measurements. IEEE Transactions on Software Engineering, 18(11):943–956, 1992.

[33] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. An Empirical Study of Operating Systems Errors. In Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, SOSP ’01, pages 73–88, New York, NY, USA, 2001. ACM.

[34] James Clark et al. XSL Transformations (XSLT). World Wide Web Consortium (W3C). URL http://www.w3.org/TR/xslt, 1999.

[35] Brian Cole, Daniel Hakim, David Hovemeyer, Reuven , William Pugh, and Kristin Stephens. Improving Your Software Using Static Analysis to Find Bugs. In Companion to the 21st ACM SIGPLAN Symposium on Object-oriented Programming Systems, Languages, and Applications, OOPSLA ’06, pages 673–674, New York, NY, USA, 2006. ACM.

[36] Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato. Version Control with Subversion. O’Reilly, 2011.

[37] Marcus Vinicius Couto, Marco Tulio Valente, and Eduardo Figueiredo. Extracting Software Product Lines: A Case Study Using Conditional Compilation. In Tom Mens, Yiannis Kanellopoulos, and Andreas Winter, editors, CSMR, pages 191–200. IEEE Computer Society, 2011.

[38] Bill Curtis, Herb Krasner, and Neil Iscoe. A Field Study of the Software Design Process for Large Systems. Communications of the ACM, 31(11):1268–1287, November 1988.

[39] Hoa Khanh Dam, Alexander Reder, and Alexander Egyed. Inconsis- tency Resolution in Merging Versions of Architectural Models. In 11th Working IEEE/IFIP Conference on Software Architecture (WICSA), Sydney, Australia, pages 153–162, 2014.

[40] Ian F. Darwin. Checking Java Programs. O’Reilly Media, Inc., first edition, 2007.

[41] Eric Matthew Dashofy. xADL 2.0 Distilled: A Guide for Users of the xADL 2.0 Language, January 2003.

[42] Eric Matthew Dashofy. Supporting Stakeholder-driven, Multi-view Software Architecture Modeling. PhD thesis, University of California, Irvine, 2007.

[43] Eric Matthew Dashofy, André van der Hoek, and Richard N. Taylor. A Comprehensive Approach for the Development of Modular Software Architecture Description Languages. ACM Transactions on Software Engineering and Methodology, 14(2):199–245, April 2005.

[44] Eric Matthew Dashofy and André van der Hoek. Representing Product Family Architectures in an Extensible Architecture Description Language. In International Workshop on Product Family Engineering (PFE-4), pages 330–341, October 2001.

[45] Eric Matthew Dashofy, André van der Hoek, and Richard N. Taylor. A Highly-Extensible, XML-Based Architecture Description Language. In Working IEEE/IFIP Conference on Software Architecture (WICSA 2001), Amsterdam, The Netherlands, August 28-31 2001.

[46] Eric Matthew Dashofy, André van der Hoek, and Richard N. Taylor. An Infrastructure for the Rapid Development of XML-based Architecture Description Languages. In Proceedings of the 24th International Conference on Software Engineering, ICSE ’02, pages 266–276, New York, NY, USA, 2002. ACM.

[47] Harry S. Delugach. Specifying Multiple-viewed Software Requirements with Conceptual Graphs. Journal of Systems and Software, 19:207–224, 1992.

[48] David L. Dill, Andreas J. Drexler, Alan J. Hu, and Chan-Ho Yang. Pro- tocol verification as a hardware design aid. In Computer Design: VLSI in Computers and Processors, 1992. ICCD ’92. Proceedings., IEEE 1992 International Conference on, pages 522–525, 1992.

[49] R. Geoff Dromey. A Model for Software Product Quality. IEEE Trans- actions on Software Engineering, 21(2):146–162, February 1995.

[50] Paul Duvall, Stephen M. Matyas, and Andrew Glover. Continuous Inte- gration: Improving Software Quality and Reducing Risk (The Addison- Wesley Signature Series). Addison-Wesley Professional, 2007.

[51] Steve Easterbrook. Handling Conflict Between Domain Descriptions With Computer-Supported Negotiation. Knowledge Acquisition, 3:255– 289, 1991.

[52] Steve Easterbrook and Bashar Nuseibeh. Using ViewPoints for In- consistency Management. Software Engineering Journal, 11(1):31–43, January 1996.

[53] Bruce Eckel. Thinking in Java. Prentice Hall Professional Technical Reference, third edition, 2002.

[54] Robert Eckstein, Marc Loy, and Dave Wood. Java Swing. O’Reilly & Associates, Inc., Sebastopol, CA, USA, second edition, 1998.

[55] Alexander Egyed. Automatically Detecting and Tracking Inconsistencies in Software Design Models. IEEE Transactions on Software Engineering, 37(2):188–204, 2011.

[56] Khaled El Emam and Isabella Wieczorek. The Repeatability of Code Defect Classifications. In Ninth International Symposium on Software Reliability Engineering, ISSRE 1998, Paderborn, Germany, November 4-7, 1998, pages 322–333, 1998.

[57] Wolfgang Emmerich, Anthony Finkelstein, Carlo Montangero, Stefano Antonelli, Stephen Armitage, and Richard Stevens. Managing Standards Compliance. IEEE Transactions on Software Engineering, 25(6):836–851, 1999.

[58] Dawson Engler, Benjamin Chelf, Andy Chou, and Seth Hallem. Checking System Rules Using System-specific, Programmer-written Compiler Extensions. In Proceedings of the 4th Conference on Symposium on Operating System Design & Implementation - Volume 4, OSDI’00, pages 1–1, Berkeley, CA, USA, 2000. USENIX Association.

[59] Michael E. Fagan. Design and code inspections to reduce errors in program development. IBM Systems Journal, 15(3):182–211, September 1976.

[60] Norman E. Fenton and Shari Lawrence Pfleeger. Software Metrics: A Rigorous and Practical Approach. PWS Publishing Co., Boston, MA, USA, third edition, 2015.

[61] Anthony C. W. Finkelstein, Dov Gabbay, Anthony Hunter, Jeff Kramer, and Bashar Nuseibeh. Inconsistency Handling in Multiperspective Specifications. IEEE Transactions on Software Engineering, 20(8):569–578, October 1994.

[62] Stefan Fischer. Feature-Based Composition of Software-Systems. Master’s thesis, Johannes Kepler University Linz, February 2014.

[63] Stefan Fischer, Lukas Linsbauer, Roberto Erick Lopez-Herrejon, and Alexander Egyed. Enhancing Clone-and-Own with Systematic Reuse for Developing Software Variants. In 30th IEEE International Conference on Software Maintenance and Evolution, Victoria, BC, Canada, September 29 - October 3, 2014, pages 391–400, 2014.

[64] John Fitzgerald, Peter Gorm Larsen, Paul Mukherjee, Nico Plat, and Marcel Verhoef. Validated Designs For Object-oriented Systems. Springer-Verlag TELOS, Santa Clara, CA, USA, 2005.

[65] Martin Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[66] Martin Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.

[67] Martin Fowler. Who Needs an Architect? IEEE Software Magazine, 20(5):11–13, September 2003.

[68] Martin Fowler. Continuous Integration, May 2006. martinfowler.com [Online; posted 01-May-2006].

[69] Martin Fowler. Frequency Reduces Difficulty, July 2011. martinfowler.com [Online; posted 28-July-2011].

[70] Martin Fowler. Uniform Access Principle, April 2011. martinfowler.com [Online; posted 20-April-2011].

[71] Dov M. Gabbay and Anthony Hunter. Making Inconsistency Respectable: A Logical Framework for Inconsistency in Reasoning. In Philippe Jorrand and Jozef Kelemen, editors, FAIR, volume 535 of Lecture Notes in Computer Science, pages 19–32. Springer, 1991.

[72] Mark Gabel, Lingxiao Jiang, and Zhendong Su. Scalable Detection of Semantic Clones. In Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, pages 321–330, New York, NY, USA, 2008. ACM.

[73] Mark Gabel, Junfeng Yang, Yuan Yu, Moisés Goldszmidt, and Zhendong Su. Scalable and Systematic Detection of Buggy Inconsistencies in Source Code. In William R. Cook, Siobhán Clarke, and Martin C. Rinard, editors, OOPSLA, pages 175–190. ACM, 2010.

[74] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-oriented Software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.

[75] Angelo Gargantini and Constance L. Heitmeyer. Using Model Checking to Generate Tests from Requirements Specifications. In Proceedings of the 7th European Software Engineering Conference Held Jointly with the 7th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-7, pages 146–162, London, UK, 1999. Springer-Verlag.

[76] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), CVPR ’12, pages 3354–3361, Washington, DC, USA, 2012. IEEE Computer Society.

[77] git-merge online documentation. git-scm.com [Online; retrieved on 11-February-2015].

[78] Robert L. Glass. The Mystery of Formal Methods Disuse. Communications of the ACM, 47(8):15–17, August 2004.

[79] Fangfang Guo, Yu Li, Mohan S. Kankanhalli, and Michael S. Brown. An Evaluation of Wearable Activity Monitoring Devices. In Proceedings of the 1st ACM International Workshop on Personal Data Meets Distributed Multimedia, PDM ’13, pages 31–34, New York, NY, USA, 2013. ACM.

[80] Anthony Hall. Seven Myths of Formal Methods. IEEE Software Magazine, 7(5):11–19, September 1990.

[81] Seth Hallem, Benjamin Chelf, Yichen Xie, and Dawson Engler. A System and Language for Building System-specific, Static Analyses. ACM SIGPLAN Notices, 37(5):69–82, May 2002.

[82] Stuart Halloway. Programming Clojure. Pragmatic Bookshelf, first edition, 2009.

[83] Paul Hamill. Unit Test Frameworks. O’Reilly, first edition, 2004.

[84] Sudheendra Hangal and Monica S. Lam. Tracking Down Software Bugs Using Automatic Anomaly Detection. In Proceedings of the 24th International Conference on Software Engineering, ICSE ’02, pages 291–301, New York, NY, USA, 2002. ACM.

[85] Chris Hanson. Efficient Stack Allocation for Tail-recursive Languages. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming, LFP ’90, pages 106–118, New York, NY, USA, 1990. ACM.

[86] Constance L. Heitmeyer. Software Cost Reduction. Wiley Online Library, 2002.

[87] Tim Heyer. Semantic Inspection of Software Artifacts: From Theory to Practice. PhD thesis, Linköping University Electronic Press, 2001.

[88] Kerry Hinge, Aditya K. Ghose, and George Koliadis. Process SEER: A Tool for Semantic Effect Annotation of Business Process Models. In EDOC, pages 54–63. IEEE Computer Society, 2009.

[89] David Hovemeyer and William Pugh. Finding Bugs is Easy. ACM SIGPLAN Notices, 39(12):92–106, 2004.

[90] LiGuo Huang, Vincent Ng, Isaac Persing, Ruili Geng, Xu Bai, and Jeff Tian. AutoODC: Automated generation of orthogonal defect classifications. In 26th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 412–415. IEEE, 2011.

[91] Jez Humble and David Farley. Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation. Addison-Wesley Professional, first edition, 2010.

[92] Andrew Hunt and David Thomas. The Pragmatic Programmer: From Journeyman to Master. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[93] James W. Hunt and M. Douglas McIlroy. An Algorithm for Differential File Comparison. Technical Report 41, Bell Laboratories Computing Science, July 1976.

[94] James W. Hunt and Thomas G. Szymanski. A Fast Algorithm for Computing Longest Common Subsequences. Communications of the ACM, 20(5):350–353, May 1977.

[95] Anthony Hunter and Bashar Nuseibeh. Managing Inconsistent Specifications: Reasoning, Analysis, and Action. ACM Transactions on Software Engineering and Methodology, 7(4):335–367, October 1998.

[96] Michael Hüttermann. DevOps for Developers. Apress, first edition, 2012.

[97] Adam Jacobs. The Pathologies of Big Data. Communications of the ACM, 52(8):36–44, August 2009.

[98] Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, and Stephane Glondu. DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones. In Proceedings of the 29th International Conference on Software Engineering, ICSE ’07, pages 96–105, Washington, DC, USA, 2007. IEEE Computer Society.

[99] Lingxiao Jiang and Zhendong Su. Automatic Mining of Functionally Equivalent Code Fragments via Random Testing. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ISSTA ’09, pages 81–92, New York, NY, USA, 2009. ACM.

[100] Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge. Why Don’t Software Developers Use Static Analysis Tools to Find Bugs? In Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pages 672–681, Piscataway, NJ, USA, 2013. IEEE Press.

[101] Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, and Stefan Wagner. Do Code Clones Matter? In Proceedings of the 31st International Conference on Software Engineering, ICSE ’09, pages 485–495, Washington, DC, USA, 2009. IEEE Computer Society.

[102] Stefan Jungmayr. Testability Measurement and Software Dependencies. In Proceedings of 12th International Workshop on Software Measurement, pages 179–202, Magdeburg, Germany, October 2002.

[103] Jean-Marc Jézéquel. Model-Driven Engineering for Software Product Lines.

[104] Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue. CCFinder: A Multilinguistic Token-based Code Clone Detection System for Large Scale Source Code. IEEE Transactions on Software Engineering, 28(7):654–670, July 2002.

[105] Kyo C. Kang, Sholom G. Cohen, James A. Hess, William E. Novak, and A. Spencer Peterson. Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical report, Carnegie-Mellon University Software Engineering Institute, November 1990.

[106] Kyo C. Kang, Vijayan Sugumaran, and Sooyong Park. Applied Software Product Line Engineering. Auerbach Publications, Boston, MA, USA, first edition, 2009.

[107] Ahmet Serkan Karatas, Halit Oguztüzün, and Ali H. Dogru. Mapping Extended Feature Models to Constraint Logic Programming over Finite Domains. In Jan Bosch and Jaejoon Lee, editors, Software Product Lines: Going Beyond - 14th International Conference, SPLC 2010, Jeju Island, South Korea, September 13-17, 2010. Proceedings, pages 286–299, 2010.

[108] Michael Kay. XSLT Programmer’s Reference. Wrox Press Ltd., Birmingham, UK, 2000.

[109] Ryan D. Kelker. Clojure for Domain-specific Languages. Packt Publishing, 2013.

[110] Sanjeev Khanna, Keshav Kunal, and Benjamin C. Pierce. A Formal Investigation of Diff3. In Proceedings of the 27th International Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS’07, pages 485–496, Berlin, Heidelberg, 2007. Springer-Verlag.

[111] Dierk Koenig, Andrew Glover, Paul King, Guillaume Laforge, and Jon Skeet. Groovy in Action. Manning Publications Co., Greenwich, CT, USA, 2007.

[112] Gerald Kotonya and Ian Sommerville. Requirements engineering with viewpoints. Software Engineering Journal, 11(1):5–18, January 1996.

[113] Philippe Kruchten. The 4+1 View Model of Architecture. IEEE Software Magazine, 12(6):42–50, November 1995.

[114] Charles W. Krueger. Software Reuse. ACM Computing Surveys, 24(2):131–183, June 1992.

[115] Christian Kästner, Sven Apel, Thomas Thüm, and Gunter Saake. Type Checking Annotation-based Product Lines. ACM Transactions on Software Engineering and Methodology, 21(3):14:1–14:39, July 2012.

[116] John Lakos. Large-scale C++ Software Design. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 1996.

[117] Theodore W. Leung. Professional XML Development with Apache Tools: Xerces, Xalan, FOP, Cocoon, Axis, Xindice. John Wiley & Sons, 2004.

[118] Nancy G. Leveson and Clay S. Turner. An Investigation of the Therac-25 Accidents. Computer, 26(7):18–41, July 1993.

[119] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, OSDI’04, pages 289–302, Berkeley, CA, USA, 2004. USENIX Association.

[120] Karl J. Lieberherr. Formulations and Benefits of the Law of Demeter. ACM SIGPLAN Notices, 24(3):67–78, March 1989.

[121] Karl J. Lieberherr, Ian M. Holland, and Arthur J. Riel. Object-oriented Programming: An Objective Sense of Style. In Conference Proceedings on Object-oriented Programming Systems, Languages and Applications, OOPSLA ’88, pages 323–334, New York, NY, USA, 1988. ACM.

[122] Frank J. van der Linden, Klaus Schmid, and Eelco Rommes. Software Product Lines in Action: The Best Industrial Practice in Product Line Engineering. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007.

[123] Tancred Lindholm. A three-way merge for XML documents. In Proceedings of the 2004 ACM Symposium on Document Engineering, pages 1–10. ACM, 2004.

[124] Lukas Linsbauer. Reverse Engineering Variability from Product Variants. Master’s thesis, Johannes Kepler University Linz, November 2013.

[125] Lukas Linsbauer, Roberto E. Lopez-Herrejon, and Alexander Egyed. Recovering Traceability between Features and Code in Product Variants. In 17th International Software Product Line Conference (SPLC), Tokyo, Japan, pages 131–140, 2013.

[126] Jacques-Louis Lions et al. Ariane 5 flight 501 failure, 1996. di.unito.it [Online; retrieved on 19-February-2015].

[127] Wenqian Liu, Steve Easterbrook, and John Mylopoulos. Rule-Based Detection of Inconsistency in UML Models. In Proc. UML Workshop on Consistency Problems in UML-Based Software Development, pages 106–123. Blekinge Institute of Technology, 2002.

[128] David C. Luckham. Rapide: A Language and Toolset for Simulation of Distributed Systems by Partial Orderings of Events. Technical report, Stanford University, Stanford, CA, USA, 1996.

[129] Andrian Marcus and Jonathan I. Maletic. Identification of High-Level Concept Clones in Source Code. In Proceedings of the 16th IEEE International Conference on Automated Software Engineering, ASE ’01, pages 107–, Washington, DC, USA, 2001. IEEE Computer Society.

[130] Robert C. Martin. Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall PTR, Upper Saddle River, NJ, USA, first edition, 2008.

[131] Thomas J. McCabe. A Complexity Measure. In Proceedings of the second International Conference on Software Engineering, ICSE ’76, pages 407–, Los Alamitos, CA, USA, 1976. IEEE Computer Society Press.

[132] John D. McGregor. Testing a Software Product Line. Technical Report CMU/SEI-2001-TR-022, Carnegie-Mellon University Software Engineering Institute, 2001.

[133] Nenad Medvidovic, Paul Grünbacher, Alexander Egyed, and Barry W. Boehm. Software Model Connectors: Bridging Models across the Software Lifecycle. In Proceedings Thirteenth International Conference on Software Engineering & Knowledge Engineering, SEKE 2001, pages 387–396, 2001.

[134] Nenad Medvidovic, Peyman Oreizy, Jason E. Robbins, and Richard N. Taylor. Using Object-oriented Typing to Support Architectural Design in the C2 Style. In Proceedings of the 4th ACM SIGSOFT Symposium on Foundations of Software Engineering, SIGSOFT ’96, pages 24–32, New York, NY, USA, 1996. ACM.

[135] Stephen J. Mellor, Tony Clark, and Takao Futagami. Model-driven Development: Guest Editors’ Introduction. IEEE Software, 20(5):14–18, 2003.

[136] Tom Mens. A State-of-the-Art Survey on Software Merging. IEEE Transactions on Software Engineering, 28(5):449–462, May 2002.

[137] Tom Mens, Michel Wermelinger, Stéphane Ducasse, Serge Demeyer, Robert Hirschfeld, and Mehdi Jazayeri. Challenges in software evolution. In 8th International Workshop on Principles of Software Evolution (IWPSE 2005), 5-7 September 2005, Lisbon, Portugal, pages 13–22. IEEE Press, 2005.

[138] Gerard Meszaros. xUnit Test Patterns: Refactoring Test Code. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2006.

[139] Bertrand Meyer. Object-oriented Software Construction (2nd ed.). Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1997.

[140] Glenford J. Myers, Corey Sandler, and Tom Badgett. The Art of Software Testing. John Wiley & Sons, third edition, 2004.

[141] Naresh Kumar Nagwani and Shrish Verma. BugML: Software Bug Markup Language. International Journal of Computer Applications, 26(2), 2011.

[142] Shamkant Navathe, Ramez Elmasri, and James Larson. Integrating User Views in Database Design. Computer, 19(1):50–62, 1986.

[143] Christian Nentwich, Licia Capra, Wolfgang Emmerich, and Anthony Finkelstein. Xlinkit: A Consistency Checking and Smart Link Generation Service. ACM Transactions on Software Engineering and Methodology, 2(2):151–185, May 2002.

[144] Christian Nentwich, Wolfgang Emmerich, Anthony Finkelstein, and Ernst Ellmer. Flexible Consistency Checking. ACM Transactions on Software Engineering and Methodology, 12(1):28–63, January 2003.

[145] Bashar Nuseibeh. To Be and Not to Be: On Managing Inconsistency in Software Development. In Proceedings of the 8th International Workshop on Software Specification and Design, IWSSD ’96, pages 164–, Washington, DC, USA, 1996. IEEE Computer Society.

[146] Bashar Nuseibeh, Steve M. Easterbrook, and Alessandra Russo. Making Inconsistency Respectable in Software Development. Journal of Systems and Software, 58(2):171–180, 2001.

[147] Bashar Nuseibeh, Jeff Kramer, and Anthony Finkelstein. A Framework for Expressing the Relationships Between Multiple Views in Requirements Specification. IEEE Transactions on Software Engineering, 20(10):760–773, October 1994.

[148] Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala: A Comprehensive Step-by-Step Guide. Artima Incorporation, USA, second edition, 2011.

[149] OMG. MDA Guide Version 1.0.1. http://www.omg.org/cgi-bin/doc?omg/03-06-01.pdf, June 2003.

[150] Gerard O’Regan. A Practical Approach to Software Quality. Springer Publishing Company, Incorporated, 2011.

[151] Raymond R. Panko. Applying Code Inspection to Spreadsheet Testing. Journal of Management Information Systems, 16(2):159–176, 1999.

[152] David Lorge Parnas. Designing Software for Ease of Extension and Contraction. In Proceedings of the 3rd International Conference on Software Engineering, ICSE ’78, pages 264–277, Piscataway, NJ, USA, 1978. IEEE Press.

[153] Dewayne E. Perry, Harvey P. Siy, and Lawrence G. Votta. Parallel Changes in Large-scale Software Development: An Observational Case Study. ACM Transactions on Software Engineering and Methodology, 10(3):308–337, July 2001.

[154] Nilanjan Raychaudhuri. Scala in Action. Manning Publications Co., Greenwich, CT, USA, 2013.

[155] Alexander Reder and Alexander Egyed. Model/Analyzer: A Tool for Detecting, Visualizing and Fixing Design Errors in UML. In 25th International Conference on Automated Software Engineering (ASE), Antwerp, Belgium, pages 347–348, 2010.

[156] Debbie Richards. Merging Individual Conceptual Models of Require- ments. Requirements Engineering, 8(4):195–205, 2003.

[157] Mark Richters and Martin Gogolla. On Formalizing the UML Object Constraint Language OCL. In Tok Wang Ling, Sudha Ram, and Mong-Li Lee, editors, Proceedings of the 17th International Conference on Conceptual Modeling, volume 1507 of Lecture Notes in Computer Science, pages 449–464, Berlin, Heidelberg, 1998. Springer-Verlag.

[158] Bill Ritcher. Guiffy SureMerge – A Trustworthy 3-Way Merge. Guiffy Software, 2011.

[159] Jason E. Robbins and David F. Redmiles. Cognitive Support, UML Adherence, and XMI Interchange in Argo/UML. In Conference on Construction of Software Engineering Tools (CoSET’99), Los Angeles, CA, May 17-18 1999.

[160] Jason E. Robbins, David M. Hilbert, and David F. Redmiles. Argo: A Design Environment for Evolving Software Architectures. International Conference on Software Engineering, 0:600, 1997.

[161] Jason E. Robbins and David F. Redmiles. Software Architecture Critics in the Argo Design Environment. Knowledge-Based Systems, 11(1):47–60, 1998.

[162] Jason E. Robbins and David F. Redmiles. Software Architecture Critics in the Argo Design Environment. Knowledge-Based Systems, 11(1):47–60, 1998.

[163] Stephen P. Robbins. Organizational Behavior: Concepts, Controversies, Applications. Prentice Hall, seventh edition, 1996.

[164] Klaus Schmid and Isabel John. A Customizable Approach to Full Lifecycle Variability Management. Science of Computer Programming, 53(3):259–284, December 2004.

[165] Robert W. Schwanke and Gail E. Kaiser. Living With Inconsistency in Large Systems. In Jürgen F. H. Winkler, editor, SCM, volume 30 of Berichte des German Chapter of the ACM, pages 98–118. Teubner, 1988.

[166] Harvey Siy and Lawrence Votta. Does The Modern Code Inspection Have Value? In Proceedings of the IEEE International Conference on Software Maintenance (ICSM’01), ICSM ’01, pages 281–, Washington, DC, USA, 2001. IEEE Computer Society.

[167] Harald Sondergaard and Peter Sestoft. Referential Transparency, Definiteness and Unfoldability. Acta Informatica, 27(6):505–517, January 1990.

[168] John F. Sowa. Conceptual Graphs for Representing Conceptual Structures. Conceptual Structures in Practice, pages 101–136, 2009.

[169] George Spanoudakis and Anthony Finkelstein. Reconciling Requirements: A Method for Managing Interference, Inconsistency and Conflict. Annals of Software Engineering, 3:433–457, 1997.

[170] George Spanoudakis and Kuriakos Kasis. Significance of Inconsistencies in UML Models. In Proceedings of the International Conference on Software: Theory and Practice, World Computer Congress, pages 152–163, Beijing, China, 2000.

[171] George Spanoudakis and Hyoseob Kim. Diagnosis of the Significance of Inconsistencies in Object-oriented Designs: A Framework and Its Experimental Evaluation. Journal of Systems and Software, 64(1):3–22, 2002.

[172] George Spanoudakis and Andrea Zisman. Inconsistency Management in Software Engineering: Survey and Open Research Issues. In Handbook of Software Engineering and Knowledge Engineering, pages 329–380. World Scientific, 2001.

[173] Richard N. Taylor, Nenad Medvidovic, Kenneth M. Anderson, E. James Whitehead, Jr., and Jason E. Robbins. A Component- and Message-based Architectural Style for GUI Software. In Proceedings of the 17th International Conference on Software Engineering, ICSE ’95, pages 295–304, New York, NY, USA, 1995. ACM.

[174] Alan Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42(2):230–265, 1936.

[175] Eric van der Vlist. Schematron. O’Reilly, first edition, 2007.

[176] Axel van Lamsweerde and Emmanuel Letier. Handling Obstacles in Goal-Oriented Requirements Engineering. IEEE Transactions on Software Engineering, 26(10):978–1005, October 2000.

[177] Axel van Lamsweerde, Emmanuel Letier, and Christophe Ponsard. Leaving Inconsistency. In Workshop on Living with Inconsistency, ICSE ’97, May 1997.

[178] Stefan Wagner. Defect Classification and Defect Types Revisited. In Proceedings of the 2008 Workshop on Defects in Large Software Systems, DEFECTS ’08, pages 39–40, New York, NY, USA, 2008. ACM.

[179] Jos B. Warmer and Anneke G. Kleppe. The Object Constraint Language: Precise Modeling With UML (Addison-Wesley Object Technology Series). Addison-Wesley Professional, October 1998.

[180] John B. Wordsworth. Getting the Best from Formal Methods. Information and Software Technology, 41(14):1027–1032, November 1999.

[181] Zhenchang Xing and Eleni Stroulia. UMLDiff: An Algorithm for Object-oriented Design Differencing. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, pages 54–65. ACM, 2005.

[182] John Zukowski. The Definitive Guide to Java Swing. Apress, Berkeley, CA, USA, third edition, 2005.

Appendix A

Source Code

A.1 Code for ECCO
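The EccoJavaParser class listed below extracts information from ECCO code fragments with regular expressions. As a self-contained sketch of that technique (not part of the ECCO code base; the class and method names here are illustrative), the following demo finds the variable that a statement sets to null, using a simplified form of the listing's variable regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative demo (not ECCO code): extracts the first variable that a
 * Java statement assigns null, mirroring the VAR_SET_TO_NULL_REGEX idea
 * from the listing below.
 */
public class NullAssignmentDemo {

    // Simplified variable regex: first character is a letter, underscore,
    // or dollar sign (but not a digit), followed by word characters or
    // dots (so qualified names like "this.var" also match).
    private static final String VAR_REGEX = "[$\\w&&[^0-9]][$.\\w]*";

    // Captures the variable name on the left of "= null".
    private static final Pattern VAR_SET_TO_NULL = Pattern
            .compile("(" + VAR_REGEX + ")\\s*=\\s*null");

    /** Returns the first variable set to null in {@code stmt}, or null. */
    public static String firstVarSetToNull(String stmt) {
        Matcher m = VAR_SET_TO_NULL.matcher(stmt);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints "str"
        System.out.println(firstVarSetToNull("String str = null;"));
    }
}
```

For {@code "String str = null;"} the pattern skips the type name (no equals sign follows it) and captures {@code str}; a statement without a null assignment yields no match.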

package at.jku.sea.plt.core.compose.rules;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import at.jku.sea.plt.core.artifact.Artifact;
import at.jku.sea.plt.core.artifact.Node;
import at.jku.sea.plt.core.compose.rules.data.MethodCall;
import at.jku.sea.plt.core.compose.rules.data.Variable;
import at.jku.sea.utils.StringUtil;

import com.google.common.collect.Sets;

/**
 * Parses strings that represent code from the ECCO code tree.
 *
 * @author Matthias Braun
 */
public class EccoJavaParser {

    /**
     * ECCO nodes also carry the type of the code they contain. This string is
     * used to identify expressions.
     */
    private static final String EXPRESSION_TYPE = "EXPRESSION";

    /**
     * ECCO nodes also carry the type of the code they contain. This string is
     * used to identify assignments.
     */
    private static final String OTHER_TYPE = "OTHER";

    /**
     * The number returned when the position of something couldn't be found.
     */
    private static final int POS_NOT_FOUND = -1;

    /**
     * Matches strings like {@code obj.myMethod(params)} and
     * {@code if (something)}. Remembers what's outside the parentheses and
     * what's inside.
     *
     * {@code (?U)} lets {@code \\w} also match non-ASCII letters.
     */
    public static final Pattern PARENTHESES_REGEX = Pattern
            .compile("(?U)([^(]+)\\s*\\((.*)\\)");

    /**
     * Attempts to match strings that are variable identifiers. Currently this
     * also matches reserved Java words like {@code while} or {@code catch}.
     *
     * Explanation of the regex:
     * <ol>
     * <li>{@code (?U)} enables unicode support. This causes \w to also match
     * non-ASCII letters.</li>
     * <li>The first character can be any letter (including the dollar sign)
     * except a number.</li>
     * <li>After that any or no letter is allowed (including the dollar sign and
     * the dot in case the variable looks like this: {@code this.var} or
     * {@code otherObj.var}).</li>
     * </ol>
     *
     * Caveat: This regex does not match some exotic (but legal) Java
     * identifiers like the unicode character U+0BF9.
     */
    private static final String VAR_REGEX = "(?U)[$\\w&&[^0-9]][$.\\w]*";

    private static final Logger LOG = LoggerFactory
            .getLogger(EccoJavaParser.class);

    /**
     * Matches and remembers variables that are compared using the boolean and
     * the relational operators. These are <, <=, >, >=, ==, !=, &&, ||, &,
     * and |. Compare JLS 7 §15.20 and §15.21.
     */
    private static final String COMPARE_OP_REGEX = "(?:<|<=|>|>=|==|!=|&&|&|\\|\\||\\|)";

    /**
     * Matches for example {@code if(this == that)} and {@code if(this != that)}
     * and remembers {@code this}.
     */
    private static Pattern LEFT_COMPARED_VAR_REGEX = Pattern.compile("("
            + VAR_REGEX + ")\\s*" + COMPARE_OP_REGEX);

    /**
     * Matches for example {@code if(this && that)} and {@code if(this | that)}
     * and remembers {@code that}.
     */
    private static Pattern RIGHT_COMPARED_VAR_REGEX = Pattern
            .compile(COMPARE_OP_REGEX + "\\s*(" + VAR_REGEX + ")");

    /**
     * Matches variables that are read during an assignment like
     * {@code int assignee = readVar} or {@code int assignee *= readVar} and
     * remembers {@code readVar}.
     *
     * Does not match variables used in comparisons like
     * {@code if(this == that)} or {@code if(this != that)}.
     */
    private static final Pattern READ_VAR_VIA_ASSIGNMENT_REGEX = Pattern
            .compile("[^=!]=\\s*(" + VAR_REGEX + ")");

    /**
     * Matches variable assignments like {@code int var = 23;} and remembers the
     * name of the variable that is written to.
     */
    private static final Pattern SIMPLE_ASSIGNMENT_REGEX = Pattern.compile("("
            + VAR_REGEX + ")\\s*=");

    /**
     * Matches Java operators that involve two or more operands.
     *
     * Those operators are in words: addition, subtraction, multiplication,
     * division, modulo, left-shift, signed right-shift and unsigned
     * right-shift. From JLS §15.17, §15.18, and §15.19.
     */
    private static final String BINARY_OP_REGEX = "(?:[+/%*\\-]|<<|>>|>>>)";

    /**
     * Matches strings where a variable is used as the left-hand part of a
     * binary operation in Java. The left variable is remembered. Example:
     * {@code x + y}
     */
    private static final Pattern LEFT_VAR_OF_BINARY_OP_REGEX = Pattern
            .compile("(" + VAR_REGEX + ")\\s*" + BINARY_OP_REGEX);

    /**
     * Matches strings where a variable is used as the right-hand part of a
     * binary operation in Java. The right variable is remembered. Example:
     * {@code a - b}
     */
    private static final Pattern RIGHT_VAR_OF_BINARY_OP_REGEX = Pattern
            .compile(BINARY_OP_REGEX + "\\s*(" + VAR_REGEX + ")");

    /**
     * Matches simple variable declarations like {@code int x}.
     *
     * The ECCO Java nodes don't end with semicolons, hence the $ at the end of
     * the regex.
     *
     * Does not match multiple declarations like {@code int x, y}.
     */
    private static final Pattern VAR_DECLARATION_REGEX = Pattern
            .compile("[^=]+?\\s+(" + VAR_REGEX + ")\\s*$");

    /**
     * Matches strings where a variable is set to null. Remembers the variable.
     * Example: {@code String str = null;}
     */
    private static final Pattern VAR_SET_TO_NULL_REGEX = Pattern.compile("("
            + VAR_REGEX + ")\\s*=\\s*null");

    /**
     * After these keywords an opening parenthesis may follow. Some are in
     * uppercase because that's how they occur in the code fragment strings of
     * ECCO.
     */
    private static HashSet<String> keyWordsBeforeParens = Sets.newHashSet(
            "while", "for", "IF", "if", "try", "catch", "SWITCH", "FOREACH");

    /**
     * Determines if a call causes a null pointer exception in a node that
     * consists of multiple code statements.
     *
     * @param call
     *            the call that might cause a NPE
     * @param node
     *            the node containing multiple statements, among them the
     *            method call
     * @return whether the method call causes a NPE
     */
    public static boolean callCausesNPE(final MethodCall call, final Node node) {
        boolean causesNPE = false;
        final Variable receiver = call.getReceiver();
        final int posWhereReceiverIsSetToNull = getPosWhereVarIsSetToNull(
                receiver, node);
        final int posWhereReceiverIsCalled = getPosWhereReceiverIsCalled(call,
                node);
        if (posWhereReceiverIsSetToNull != POS_NOT_FOUND
                && posWhereReceiverIsCalled != POS_NOT_FOUND) {
            // true if the call happened after setting the receiver to null
            causesNPE = posWhereReceiverIsCalled > posWhereReceiverIsSetToNull;
        }
        return causesNPE;
    }

    /**
     * Checks whether {@code node} contains a method call.
     *
     * @param node
     *            check whether this node contains a method call
     * @return whether a method call happens inside the code of {@code node}
     */
    public static boolean containsMethodCall(final Node node) {
        final String identifier = node.getArtifact().getIdentifier();
        return containsMethodCall(identifier);
    }

    /**
     * A node can contain multiple statements, for example when it contains an
     * anonymous inner class.
     *
     * @param node
     *            check whether this node contains multiple statements
     * @return whether {@code node} contains multiple statements
     */
    public static boolean containsMultipleStmts(final Node node) {
        final String nodeAsStr = node.getArtifact().toString();
        // Remove all string content from the node
        String nodeWithOutStringLiterals = StringUtil.deleteBetween(nodeAsStr,
                "\"");
        nodeWithOutStringLiterals = StringUtil.deleteBetween(
                nodeWithOutStringLiterals, "'");
        return nodeWithOutStringLiterals.contains(";");
    }

    /**
     * Gets all the {@link Variable}s declared in the {@code node}.
     *
     * @param node
     *            node that might declare variables
     * @return all declared {@link Variable}s inside {@code node}
     */
    public static Set<Variable> getDeclaredVars(final Node node) {
        return getDeclaredVars(node.getArtifact().toString());
    }

    /**
     * Gets all the {@link MethodCall}s that occur in the {@code node}.
     *
     * @param node
     *            node that might contain method calls
     * @return all {@link MethodCall}s inside {@code node}
     */
    public static List<MethodCall> getMethodCalls(final Node node) {
        return getMethodCalls(node.getArtifact().toString());
    }

    /**
     * Gets all the {@link MethodCall}s that occur in the code statement
     * {@code stmt}.
     *
     * @param stmt
     *            statement that might contain method calls
     * @return all {@link MethodCall}s inside {@code stmt}
     */
    public static List<MethodCall> getMethodCalls(final String stmt) {
        final List<MethodCall> calls = new ArrayList<>();
        getMethodCalls(stmt, calls);
        return calls;
    }

    /**
     * Gets the variables that were read during a Java code statement.
     *
     * @param stmt
     *            the code statement that might contain read variables
     * @return a {@link Set} of {@link Variable}s that were read in the
     *         statement
     */
    public static Set<Variable> getReadVars(final Node stmt) {
        final String identifier = stmt.getArtifact().getIdentifier();
        return getReadVars(identifier);
    }

    /**
     * Gets the {@link Variable}s that were set to null inside a statement.
     *
     * @param stmt
     *            the statement as a {@link Node} that might contain variables
     * @return a set of {@link Variable}s that were set to null in the
     *         statement
     */
    public static Set<Variable> getVarsSetToNull(final Node stmt) {
        final String stmtAsString = stmt.getArtifact().toString();
        final Set<Variable> vars = new HashSet<>();
        final Matcher matcher = VAR_SET_TO_NULL_REGEX.matcher(stmtAsString);
        while (matcher.find()) {
            final String setToNullVar = matcher.group(1);
            final Variable var = new Variable(setToNullVar);
            vars.add(var);
        }
        return vars;
    }

    /**
     * Parses a {@link Node} and gets the {@link Variable}s that were written to
     * in that node.
     *
     * @param node
     *            the node that may contain variables that are changed
     * @return a set of {@link Variable}s that were written to
     */
    public static Set<Variable> getWrittenToVars(final Node node) {
        final Set<Variable> changedVars = new HashSet<>();
        final Artifact artifact = node.getArtifact();
        final String type = artifact.getType();
        // This type may contain assignments
        if (type.equals(OTHER_TYPE) || type.equals(EXPRESSION_TYPE)) {
            final String identifier = artifact.getIdentifier();
            changedVars.addAll(getSimplyAssignedVars(identifier));
        }
        return changedVars;
    }

    private static boolean containsMethodCall(final String s) {
        return getMethodCalls(s).size() > 0;
    }

    /**
     * Gets {@link Variable}s inside a statement that were declared but not
     * initialized.
     *

312 * Example: {@code int i;}. 313 * 314 * @param stmt APPENDIX A. SOURCE CODE 110

315 * the statement as a string that might contain variables 316 * @return a set of declared {@link Variable}s 317 */ 318 private static Set getDeclaredVars(final String stmt) { 319 final Set declaredVars = new HashSet<>(); 320 final Matcher matcher = VAR_DECLARATION_REGEX.matcher(stmt); 321 while (matcher.find()) { 322 final String varName = matcher.group(1); 323 declaredVars.add(new Variable(varName)); 324 } 325 return declaredVars; 326 } 327 328 /** 329 * Gets all the method calls of a Java statement. 330 * 331 * @param stmt 332 * the code statement that might contain a method call 333 * @param calls 334 * a {@link List} of currently found method calls 335 */ 336 private static void getMethodCalls(final String stmt, 337 final List calls) { 338 final Matcher matcher = PARENTHESES_REGEX.matcher(stmt); 339 while (matcher.find()) { 340 final String beforeParens = matcher.group("outer"); 341 final String insideParens = matcher.group("inner"); 342 if (!keyWordsBeforeParens.contains(beforeParens) 343 && !isConstructor(beforeParens)) { 344 345 final MethodCall call = MethodCall.create(beforeParens, 346 insideParens); 347 calls.add(call); 348 } 349 getMethodCalls(insideParens, calls); 350 } 351 } 352 353 private static int getPosWhereReceiverIsCalled(final MethodCall call, 354 final Node node) { APPENDIX A. SOURCE CODE 111

355 int pos = POS_NOT_FOUND; 356 // A node can consist of multiple statements if it’s an anonymous class 357 final String[] stmts = node.getArtifact().getIdentifier().split(";"); 358 for (int i = 0; i < stmts.length; i++) { 359 final String stmt = stmts[i]; 360 if (methodIsCalled(stmt, call)) { 361 pos = i; 362 } 363 } 364 return pos; 365 } 366 367 private static int getPosWhereVarIsSetToNull(final Variable var, 368 final Node node) { 369 int pos = POS_NOT_FOUND; 370 // A node can consist of multiple statements if it’s an anonymous class 371 final String[] stmts = node.getArtifact().getIdentifier().split(";"); 372 for (int i = 0; i < stmts.length; i++) { 373 final String stmt = stmts[i]; 374 if (varIsSetToNull(stmt, var)) { 375 pos = i; 376 } 377 } 378 return pos; 379 } 380 381 private static Set getReadVars(final String stmt) { 382 final Set readVars = new HashSet<>(); 383 readVars.addAll(getReadVarsFromAssignment(stmt)); 384 readVars.addAll(getReadVarsFromComparison(stmt)); 385 readVars.addAll(getReadVarsFromBinaryOps(stmt)); 386 return readVars; 387 388 } 389 390 /** 391 * Gets the variables that are read in an assignment. For example in 392 * {@code int i = readVar;} the read variable is {@code readVar}. APPENDIX A. SOURCE CODE 112

393 *

394 * According to JLS 7 §15.26 these are the twelve assignment operators: = *= 395 * /= %= += -= <<= >>= >>>= &= ^= |= 396 * 397 * @param s 398 * The string that might contain assignments 399 * @return A {@link Collection} of variables that were read in the 400 * assignment 401 */ 402 private static Collection getReadVarsFromAssignment(final String s) { 403 final List readVars = new ArrayList<>(); 404 405 final Matcher matcher = READ_VAR_VIA_ASSIGNMENT_REGEX.matcher(s); 406 while (matcher.find()) { 407 final String readVarName = matcher.group(1); 408 readVars.add(new Variable(readVarName)); 409 } 410 return readVars; 411 } 412 413 /** 414 * Gets all variables whose value was read during a binary operation such as {@code a + b} or {@code x << y}. 415 * @param s 416 * The string that might contain variables read via binary operations 417 * @return A {@link Collection} of variables whose value was read during a binary operation 418 * */ 419 private static Collection getReadVarsFromBinaryOps(final String s) { 420 final Collection readVars = new HashSet<>(); 421 final Matcher leftMatcher = LEFT_VAR_OF_BINARY_OP_REGEX.matcher(s); 422 while (leftMatcher.find()) { 423 final String leftReadVarName = leftMatcher.group(1); 424 readVars.add(new Variable(leftReadVarName)); 425 } 426 final Matcher rightMatcher = APPENDIX A. SOURCE CODE 113

RIGHT_VAR_OF_BINARY_OP_REGEX.matcher(s); 427 while (rightMatcher.find()) { 428 final String rightReadVarName = rightMatcher.group(1); 429 readVars.add(new Variable(rightReadVarName)); 430 } 431 return readVars; 432 433 } 434 435 private static Collection getReadVarsFromComparison(final String s) { 436 437 final Collection comparedVars = new HashSet<>(); 438 final Matcher leftMatcher = LEFT_COMPARED_VAR_REGEX.matcher(s); 439 while (leftMatcher.find()) { 440 final String leftReadVarName = leftMatcher.group(1); 441 comparedVars.add(new Variable(leftReadVarName)); 442 } 443 final Matcher rightMatcher = RIGHT_COMPARED_VAR_REGEX.matcher(s); 444 while (rightMatcher.find()) { 445 final String rightReadVarName = rightMatcher.group(1); 446 comparedVars.add(new Variable(rightReadVarName)); 447 } 448 return comparedVars; 449 } 450 451 private static List getSimplyAssignedVars(final String identifier) { 452 final List assignedVars = new ArrayList<>(); 453 final Matcher matcher = SIMPLE_ASSIGNMENT_REGEX.matcher(identifier); 454 while (matcher.find()) { 455 final String changedVar = matcher.group(1); 456 final Variable var = new Variable(changedVar); 457 assignedVars.add(var); 458 } 459 return assignedVars; 460 } 461 462 private static boolean isConstructor(final String beforeParens) { 463 return beforeParens.contains("new "); APPENDIX A. SOURCE CODE 114

464 } 465 466 private static boolean methodIsCalled(final String stmt, 467 final MethodCall call) { 468 final String callAsString = call.toString(); 469 return stmt.contains(callAsString); 470 } 471 472 private static boolean varIsSetToNull(final String stmt, final Variable var) { 473 final String varName = var.getName(); 474 return stmt.matches(".*\\s*" + varName + "\\s*=\\s*null.*"); 475 } 476 477 } Listing A.1: Parser for ECCO’s Java representation.
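The detection logic of Listing A.1 can be illustrated in isolation. The following sketch reimplements its null-receiver heuristic on plain strings, without ECCO's Node, MethodCall, and Variable types: split a node into statements at semicolons, remember the last position where the variable is assigned null and the last position where a method is called on it, and report a potential NPE when the call comes after the null assignment. The class name NpeHeuristic and the regular expressions are illustrative assumptions, not part of the thesis code.

```java
import java.util.regex.Pattern;

/**
 * Minimal, self-contained sketch of the null-receiver heuristic from
 * Listing A.1, operating on plain strings instead of ECCO nodes.
 */
public class NpeHeuristic {

    /**
     * Returns true if {@code var} is set to null in some statement of
     * {@code code} and a method is called on it in a later statement.
     */
    public static boolean callAfterNullAssignment(String code, String var) {
        // A node can consist of multiple statements separated by semicolons
        String[] stmts = code.split(";");
        Pattern nullAssign = Pattern.compile(".*\\b" + Pattern.quote(var) + "\\s*=\\s*null.*");
        Pattern methodCall = Pattern.compile(".*\\b" + Pattern.quote(var) + "\\s*\\.\\s*\\w+\\s*\\(.*");
        int setToNullAt = -1;
        int calledAt = -1;
        for (int i = 0; i < stmts.length; i++) {
            if (nullAssign.matcher(stmts[i]).matches()) {
                setToNullAt = i;
            }
            if (methodCall.matcher(stmts[i]).matches()) {
                calledAt = i;
            }
        }
        // Suspicious only if both events occur and the call comes later
        return setToNullAt != -1 && calledAt != -1 && calledAt > setToNullAt;
    }
}
```

As in the thesis code, this is a purely positional heuristic: it ignores control flow, so a call guarded by a null check would still be flagged.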

package eu.matthiasbraun.codeAnalysis;

import java.io.File;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Joiner;
import com.google.common.base.Optional;

import eu.matthiasbraun.codeAnalysis.config.EccoConf;
import eu.matthiasbraun.pmd.rules.helpers.FileUtil;
import eu.matthiasbraun.pmd.rules.helpers.KeyNotFoundException;
import eu.matthiasbraun.pmd.rules.helpers.StringUtil;
import eu.matthiasbraun.pmd.rules.helpers.SysUtil;

/**
 * Executes different code analysis tools and collects their output.
 *
 * @author Matthias Braun
 */
public class CodeChecker {

    private static final Logger log = LoggerFactory.getLogger(CodeChecker.class);
    /*
     * All Java files end with '.java'. Use this regex to find all Java source
     * files in a project.
     */
    private static final String JAVA_FILES_REGEX = ".*\\.java$";

    // javac reads the Java files to compile from this file
    private static final File FILES_TO_COMPILE_TXT = new File("filesToCompile.txt");

    public static Optional<String> callPmd(final File file) {
        Optional<String> outputMaybe = Optional.absent();
        try {
            /*
             * Path to the PMD executable relative to the directory of this
             * project
             */
            final String pmdPath = EccoConf.get(EccoConf.ABS_PATH_TO_PMDF);
            // The XML file defining which rules should be applied
            final String ruleSet = EccoConf.get(EccoConf.PMD_RULE_SET);
            // The custom PMD rules have to be in a Jar so PMD can use them
            putPmdRulesInJar();
            final List<String> pmdCommand = new ArrayList<>();
            // Executable batch file
            pmdCommand.add(pmdPath);
            // Specify the directory with the code to analyze
            pmdCommand.add("-dir");
            pmdCommand.add(file.getAbsolutePath());
            // These rules are used for checking
            pmdCommand.add("-rulesets");
            pmdCommand.add(ruleSet);
            // Omit the directory path of the class where bugs are found
            pmdCommand.add("-shortnames");
            // This prints the loaded rules and detailed error messages
            // pmdCommand.add("-debug");

            outputMaybe = SysUtil.call(pmdCommand);
        } catch (final KeyNotFoundException e) {
            log.warn(e.getMessage(), e);
        }

        return outputMaybe;
    }

    public static void main(final String[] args) {
        try {
            // Check this project for bugs
            final String checkThisProject = EccoConf.get(EccoConf.CHECK_THIS_PROJECT);
            callFindBugs(checkThisProject);
            callJavac(checkThisProject);
            final Optional<String> pmdOutput = callPmd(new File(checkThisProject));
            if (pmdOutput.isPresent()) {
                log.info(pmdOutput.get());
            } else {
                log.warn("No PMD output");
            }
        } catch (final KeyNotFoundException e) {
            log.error(e.getMessage(), e);
        }
    }

    /**
     * Calls javac, the Java compiler, to get errors and warnings about the
     * code to analyze.
     *
     * @param checkThisProject
     *            The absolute path to the project whose code should be
     *            analyzed.
     * @throws KeyNotFoundException
     *             Thrown when a key in a property file is not found.
     */
    private static void callJavac(final String checkThisProject)
            throws KeyNotFoundException {
        final String javacPath = EccoConf.get(EccoConf.ABSOLUTE_PATH_TO_JAVAC);

        final Path projectPath = FileUtil.toPath(checkThisProject);
        /*
         * javac can't look for files recursively -> create a file containing
         * the paths to all Java files that javac should compile and pass that
         * file as a parameter to javac
         */
        final boolean recursive = true;
        final List<Path> filePaths = FileUtil.listFiles(projectPath, recursive,
                JAVA_FILES_REGEX);
        final String filePathsStr = Joiner.on(FileUtil.EOL).join(filePaths);
        FileUtil.write(filePathsStr, FILES_TO_COMPILE_TXT);

        final List<String> javacCommand = new ArrayList<>();
        javacCommand.add(javacPath);
        // Get more warnings
        javacCommand.add("-Xlint");
        javacCommand.add("@" + FILES_TO_COMPILE_TXT);
        final Optional<String> outputMaybe = SysUtil.call(javacCommand);
        if (outputMaybe.isPresent()) {
            log.info(outputMaybe.get());
        } else {
            log.warn("Unsuccessful call: {}", javacCommand);
        }
    }

    /**
     * The custom rules for PMD must be put in a Jar file in the lib folder of
     * PMD. Otherwise PMD won't find the rules.
     *
     * @throws KeyNotFoundException
     */
    private static void putPmdRulesInJar() throws KeyNotFoundException {
        // The folder containing the .class files
        final String binDir = "bin/";
        final String pmdRulesJar = EccoConf.get(EccoConf.CUSTOM_PMD_RULES_JAR);
        final String pmdRules = EccoConf.get(EccoConf.CUSTOM_PMD_RULES_DIR);
        final List<String> jarCommand = new ArrayList<>();
        // Create a new Jar archive and specify its file name
        jarCommand.add("jar");
        jarCommand.add("-cf");
        // Put the Jar into PMD's lib directory so PMD finds it
        jarCommand.add(pmdRulesJar);
        // Change to the bin directory
        jarCommand.add("-C");
        jarCommand.add(binDir);
        // Include the class files from this directory
        jarCommand.add(pmdRules);
        SysUtil.call(jarCommand);
    }

    private static void callFindBugs(final String checkThisProject)
            throws KeyNotFoundException {
        /*
         * Path to the FindBugs executable relative to the directory of this
         * project
         */
        final String findBugsPath = EccoConf.get(EccoConf.RELATIVE_PATH_TO_FIND_BUGS);

        // Add the parameters
        final String[] findBugsCommand = { findBugsPath, "-textui", checkThisProject };

        final Optional<String> output = SysUtil.call(findBugsCommand);
        if (output.isPresent()) {
            log.info(output.get());
        } else {
            log.warn("Unsuccessful call: {}", StringUtil.asString(findBugsCommand));
        }
    }
}

Listing A.2: CodeChecker wrapper that uses FindBugs, PMD, and javac with lint to detect invalid artifacts.

package eu.matthiasbraun.pmd.rules;

import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import net.sourceforge.pmd.lang.java.ast.ASTBlock;
import net.sourceforge.pmd.lang.java.ast.ASTBlockStatement;
import net.sourceforge.pmd.lang.java.ast.ASTPrimaryExpression;
import net.sourceforge.pmd.lang.java.ast.ASTStatementExpression;
import net.sourceforge.pmd.lang.java.rule.AbstractJavaRule;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Optional;

import eu.matthiasbraun.pmd.rules.helpers.FileUtil;
import eu.matthiasbraun.pmd.rules.helpers.PmdUtil;
import eu.matthiasbraun.pmd.rules.helpers.VarAssignment.VarAssignment;
import eu.matthiasbraun.pmd.rules.helpers.VarAssignment.VarAssignmentFactory;
import eu.matthiasbraun.pmd.rules.helpers.VarAssignment.Variable;

/**
 * Every field must be read before being assigned a new value.
 *
 * For every block of Java code this rule keeps a list of fields. If a field
 * receives a new value multiple times without its value being read or a method
 * being called in between, this rule is violated.
 *
 * @author Matthias Braun
 */
public class MultipleFieldAssignment extends AbstractJavaRule {

    private static final Logger log = LoggerFactory
            .getLogger(MultipleFieldAssignment.class);

    public void createRuleViolation(final ASTBlockStatement blockStmt,
            final Variable assignee) {
        final String message = "Repeated assignment to field "
                + assignee.getName() + " in " + PmdUtil.getLocation(blockStmt)
                + FileUtil.EOL;
        System.err.println(message);
        final File outputFile = new File("ruleViolations.txt");
        FileUtil.append(message, outputFile);
        log.warn("Repeated assignment to field {} in {}", assignee.getName(),
                PmdUtil.getLocation(blockStmt));
    }

    @Override
    public Object visit(final ASTBlock node, final Object data) {
        /*
         * Get all block statements which contain the assignments and method
         * calls from the current block of Java code
         */
        final List<ASTBlockStatement> blockStatements = node
                .findChildrenOfType(ASTBlockStatement.class);

        checkForMultipleFieldAssignments(blockStatements);

        return super.visit(node, data);
    }

    /**
     * Sees if a field is assigned multiple times without its value being read
     * in between those assignments.
     *
     * @param blockStatements
     *            The statements of a block of Java code. They comprise method
     *            calls and variable assignments.
     */
    private void checkForMultipleFieldAssignments(
            final List<ASTBlockStatement> blockStatements) {
        /*
         * These fields are suspicious: they were set to a value other than
         * null and have not been read so far. If they get assigned a new
         * value, their previous assignment was superfluous and we'll create a
         * rule violation.
         */
        final List<Variable> watchedVars = new ArrayList<>();

        /*
         * A block statement can contain a variable assignment and/or a method
         * call
         */
        for (final ASTBlockStatement blockStmt : blockStatements) {
            /*
             * Remove all watched variables that are used in this block
             * statement
             */
            removeUsedFields(watchedVars, blockStmt);
            /*
             * Create a warning if this is an assignment to a field that wasn't
             * used since its last assignment
             */
            final Collection<VarAssignment> assignments = VarAssignmentFactory
                    .fromBlock(blockStmt);

            for (final VarAssignment assignment : assignments) {

                final Variable assignee = assignment.getAssignee();
                if (assignee == null) {
                    log.warn(
                            "Block statement is a variable assignment but could not get variable name: {}",
                            PmdUtil.getLocation(blockStmt));
                    continue;
                }
                // This rule is only interested in fields
                if (assignee.isField()) {
                    /*
                     * Field was assigned a new value without its old one being
                     * read previously -> create a warning
                     */
                    if (watchedVars.contains(assignee)) {
                        createRuleViolation(blockStmt, assignee);
                    }
                    watchedVars.add(assignee);
                }
            }
        }
    }

    /**
     * A field is considered used when it is assigned to another variable or a
     * method is called.
     *
     * @param watchedFields
     *            List of currently suspicious fields; they weren't used yet.
     * @param blockStmt
     *            An {@link ASTBlockStatement} that can be a variable
     *            assignment or a method call.
     */
    private void removeUsedFields(final List<Variable> watchedFields,
            final ASTBlockStatement blockStmt) {
        final Optional<ASTStatementExpression> stmtExpMaybe = PmdUtil
                .getStatementExpression(blockStmt);

        /*
         * First check if a method is called: this resets the list of watched
         * fields because the method might have accessed them
         */
        if (stmtExpMaybe.isPresent()) {

            final ASTStatementExpression stmtExp = stmtExpMaybe.get();
            for (final ASTPrimaryExpression primeExp : stmtExp
                    .findChildrenOfType(ASTPrimaryExpression.class)) {
                if (PmdUtil.containsMethodCall(primeExp)) {
                    watchedFields.clear();
                }
            }
        }
        /*
         * A field was read during an assignment -> remove it from the watch
         * list
         */
        final Collection<VarAssignment> assignments = VarAssignmentFactory
                .fromBlock(blockStmt);
        for (final VarAssignment assignment : assignments) {
            watchedFields.removeAll(assignment.getRightSideVars());
        }
    }
}

Listing A.3: Custom rule for the PMD rule-checking engine. It detects redundant assignments to fields in Java code.
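The core of Listing A.3 is a watch-list algorithm: a field enters the list when it is assigned, leaves the list when it is read, and the whole list is cleared on a method call. The following self-contained sketch applies the same idea to plain statement strings instead of PMD's AST; the class name, regex, and the simplified notions of "read" and "call" are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Self-contained sketch of the watch-list algorithm behind Listing A.3,
 * operating on plain statement strings instead of PMD's AST.
 */
public class RepeatedAssignmentSketch {

    // Matches a simple assignment "var = rightSide"
    private static final Pattern ASSIGNMENT = Pattern.compile("^\\s*(\\w+)\\s*=\\s*(.+)$");

    /**
     * Returns the variables that are assigned twice with neither a read of
     * their value nor a method call in between.
     */
    public static List<String> findRedundantAssignments(List<String> stmts) {
        List<String> watched = new ArrayList<>();
        List<String> violations = new ArrayList<>();
        for (String stmt : stmts) {
            Matcher m = ASSIGNMENT.matcher(stmt);
            if (m.matches()) {
                String assignee = m.group(1);
                final String rightSide = m.group(2);
                // Reading a watched variable on the right side clears it
                watched.removeIf(v -> rightSide.matches(".*\\b" + v + "\\b.*"));
                if (watched.contains(assignee)) {
                    violations.add(assignee);
                }
                watched.add(assignee);
            } else if (stmt.contains("(")) {
                // A method call may read any field -> reset the watch list
                watched.clear();
            }
        }
        return violations;
    }
}
```

Clearing the whole list on any call is deliberately conservative, mirroring the thesis rule: the called method might read any of the watched fields, so none of them can still be flagged safely.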

package eu.matthiasbraun.pmd.rules;

import java.io.File;

import net.sourceforge.pmd.lang.java.ast.ASTPrimaryExpression;
import net.sourceforge.pmd.lang.java.rule.AbstractJavaRule;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Optional;

import eu.matthiasbraun.pmd.rules.helpers.FileUtil;
import eu.matthiasbraun.pmd.rules.helpers.MethodCall;
import eu.matthiasbraun.pmd.rules.helpers.PmdUtil;

/**
 * This rule is violated when the same setter method is called multiple times
 * on the same object without other method calls happening in between.
 *
 * Example of a violation:
 *
 * myObj.setFoo(5);
 * int x = 1;
 * myObj.setFoo(9);
 *
 * This is OK:
 *
 * myObj.setFoo(5);
 * bar(); // The value of foo might get used in bar()
 * myObj.setFoo(9);
 *
 * @author Matthias Braun
 */
public class MultipleSetterCall extends AbstractJavaRule {

    private static final Logger logger = LoggerFactory
            .getLogger(MultipleSetterCall.class);
    /**
     * The previously called setter method. It should not be called again
     * until another method is invoked.
     */
    private static MethodCall previousSetterMethod;

    @Override
    public Object visit(final ASTPrimaryExpression node, final Object data) {
        final Optional<String> methodNameMaybe = PmdUtil.getMethodName(node);
        // A method was called
        if (methodNameMaybe.isPresent()) {
            final String methodName = methodNameMaybe.get();
            final MethodCall meth = new MethodCall(methodName, node);

            if (meth.isSetter()) {
                if (meth.equals(previousSetterMethod)) {
                    final String message = "Repeated call of setter " + meth
                            + " in " + PmdUtil.getLocation(node) + FileUtil.EOL;
                    System.err.println(message);
                    final File outputFile = new File("ruleViolations.txt");
                    FileUtil.append(message, outputFile);
                } else {
                    /*
                     * The current setter method might have used variables that
                     * were set in the previous setter call -> this setter call
                     * might have been useful and is not suspicious anymore
                     */
                    previousSetterMethod = meth;
                }
            }
            /*
             * This method call was not a setter. It might have used the value
             * set in the previous setter method -> the previous setter method
             * is not suspicious anymore
             */
            else {
                previousSetterMethod = null;
            }
        }
        return super.visit(node, data);
    }
}

Listing A.4: Custom rule for the PMD rule-checking engine. Similar to Listing A.3, it checks whether there are redundant setter calls in code.
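Stripped of the PMD visitor machinery, Listing A.4 reduces to tracking the previously seen setter across a sequence of calls. The sketch below runs that reduced logic on plain call strings; the class name RepeatedSetterSketch, the "receiver.setXyz(...)" regex, and the string-based receiver comparison are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of the repeated-setter heuristic from Listing A.4,
 * applied to a plain sequence of call strings such as "myObj.setFoo(5)".
 */
public class RepeatedSetterSketch {

    // Treat "receiver.setXyz(...)" as a setter call
    private static boolean isSetter(String call) {
        return call.matches("\\w+\\.set\\w+\\(.*\\)");
    }

    /**
     * Returns the calls that repeat the previous setter (same receiver and
     * setter name) with no other method call in between.
     */
    public static List<String> findRepeatedSetters(List<String> calls) {
        List<String> violations = new ArrayList<>();
        String previousSetter = null;
        for (String call : calls) {
            if (isSetter(call)) {
                // Compare receiver and setter name, ignoring the arguments
                String receiverAndName = call.substring(0, call.indexOf('('));
                if (receiverAndName.equals(previousSetter)) {
                    violations.add(call);
                } else {
                    previousSetter = receiverAndName;
                }
            } else {
                // Any other call might read the value set before
                previousSetter = null;
            }
        }
        return violations;
    }
}
```

As in the thesis rule, a statement that is not a method call (such as an assignment) does not reset the suspicion, which is why the "int x = 1;" example in the Javadoc of Listing A.4 still counts as a violation.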

package eu.matthiasbraun.pmd.rules.helpers;

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import net.sourceforge.pmd.lang.java.ast.ASTArguments;
import net.sourceforge.pmd.lang.java.ast.ASTBlock;
import net.sourceforge.pmd.lang.java.ast.ASTBlockStatement;
import net.sourceforge.pmd.lang.java.ast.ASTExpression;
import net.sourceforge.pmd.lang.java.ast.ASTLocalVariableDeclaration;
import net.sourceforge.pmd.lang.java.ast.ASTName;
import net.sourceforge.pmd.lang.java.ast.ASTPrimaryExpression;
import net.sourceforge.pmd.lang.java.ast.ASTPrimaryPrefix;
import net.sourceforge.pmd.lang.java.ast.ASTPrimarySuffix;
import net.sourceforge.pmd.lang.java.ast.ASTStatement;
import net.sourceforge.pmd.lang.java.ast.ASTStatementExpression;
import net.sourceforge.pmd.lang.java.ast.ASTType;
import net.sourceforge.pmd.lang.java.ast.ASTVariableDeclarator;
import net.sourceforge.pmd.lang.java.ast.AbstractJavaNode;
import net.sourceforge.pmd.lang.java.symboltable.NameDeclaration;
import net.sourceforge.pmd.lang.java.symboltable.VariableNameDeclaration;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Optional;

import eu.matthiasbraun.pmd.rules.helpers.VarAssignment.Variable;

/**
 * Utility functions for PMD rules.
 *
 * @author Matthias Braun
 */
public class PmdUtil {

    private static final Logger log = LoggerFactory.getLogger(PmdUtil.class);

    /**
     * Does this {@link ASTPrimaryExpression} contain a method call?
     *
     * Note that this node might be an assignment using a method call like
     * {@code String s = getString();}
     *
     * @param node
     *            The {@link ASTPrimaryExpression} that might contain a method
     *            call.
     * @return Whether the {@code node} contains a method call.
     */
    public static boolean containsMethodCall(final ASTPrimaryExpression node) {
        boolean isMethodCall = false;
        final List<ASTPrimarySuffix> suffixes = node
                .findDescendantsOfType(ASTPrimarySuffix.class);
        /*
         * "this.foo(bar)" has two suffixes: "foo" and the argument list "bar"
         */
        for (final ASTPrimarySuffix suffix : suffixes) {
            if (suffix.isArguments()) {
                isMethodCall = true;
                break;
            }
        }
        return isMethodCall;
    }

    public static String getFullClassName(final AbstractJavaNode node) {
        final String packageName = node.getScope()
                .getEnclosingSourceFileScope().getPackageName();
        final String className = node.getScope().getEnclosingClassScope()
                .getClassName();
        final String fullClassName = packageName + "." + className;
        return fullClassName;
    }

    public static int getLineNr(final AbstractJavaNode node) {
        return node.getBeginLine();
    }

    /**
     * Gets the location of a node, consisting of its full class name and the
     * line number.
     *
     * @param node
     *            Node in whose location we are interested.
     * @return The location of the node (class name + line number).
     */
    public static String getLocation(final AbstractJavaNode node) {
        final String className = getFullClassName(node);
        final int line = getLineNr(node);
        final String location = className + " at " + line;
        return location;
    }

    /**
     * Gets the name of a method from a primary expression in the syntax tree.
     *
     * @param node
     *            The primary expression node within the syntax tree which
     *            might have a method name among its children.
     * @return The method name, wrapped in an {@link Optional} in case the
     *         primary expression has no method name.
     */
    public static Optional<String> getMethodName(final ASTPrimaryExpression node) {
        String methodName = null;
        if (containsMethodCall(node)) {
            // Works for method calls like "obj.foo()"
            final ASTPrimaryPrefix astPrefix = node
                    .getFirstChildOfType(ASTPrimaryPrefix.class);
            final ASTName astName = astPrefix.getFirstChildOfType(ASTName.class);
            if (astName != null) {
                methodName = astName.getImage();
            }
            // Works for method calls like "this.obj.foo()"
            else {
                final List<ASTPrimarySuffix> suffixes = node
                        .findChildrenOfType(ASTPrimarySuffix.class);

                if (suffixes.size() > 1) {
                    /*
                     * The penultimate suffix contains the method name (the
                     * ultimate contains the method arguments)
                     */
                    final ASTPrimarySuffix suffixWithMethodName = suffixes
                            .get(suffixes.size() - 2);
                    methodName = suffixWithMethodName.getImage();
                }
            }
        }

        return Optional.fromNullable(methodName);
    }

    public static int getNrOfMethodArgs(final ASTPrimaryExpression node) {
        int nrOfMethodArgs = 0;
        final List<ASTArguments> argumentsNodes = node
                .findDescendantsOfType(ASTArguments.class);
        if (argumentsNodes.size() == 1) {
            nrOfMethodArgs = argumentsNodes.get(0).getArgumentCount();
        }

        return nrOfMethodArgs;
    }

    /**
     * Gets the variables whose value is assigned to a newly created local
     * variable.
     *
     * There may be more than one variable involved if the variable
     * initialization uses a ternary expression:
     *
     * {@code int x = foo() ? firstName : secondName}.
     *
     * @param locVarDeclaration
     *            Node describing the local variable declaration in the
     *            abstract syntax tree.
     * @return The variables whose value is read. This may be none in case
     *         there is no variable used on the right-hand side of the
     *         assignment; the new local variable could get its value from a
     *         literal:
     *
     *         {@code int x = 1;}
     */
    public static List<Variable> getRightHandVars(
            final ASTLocalVariableDeclaration locVarDeclaration) {
        final List<Variable> rightHandVars = new ArrayList<>();
        /*
         * The type of the right-hand vars is assumed to be the type of the
         * assignee
         */
        final Class<?> type = getVarType(locVarDeclaration);

        final List<ASTVariableDeclarator> declarators = locVarDeclaration
                .findDescendantsOfType(ASTVariableDeclarator.class);

        /*
         * Multiple variables of the same type can be initialized in one
         * expression: int x = 1, y = 2;
         */
        for (final ASTVariableDeclarator declarator : declarators) {
            final List<ASTPrimaryExpression> primeExps = declarator
                    .findDescendantsOfType(ASTPrimaryExpression.class);
            for (final ASTPrimaryExpression primeExp : primeExps) {
                final Optional<String> varNameMaybe = PmdUtil.getVarName(primeExp);
                if (varNameMaybe.isPresent()) {
                    /*
                     * A field of a variable might have been used in the
                     * variable declaration -> turn 'myVar.something' into
                     * 'myVar'
                     */
                    final String varName = StringUtil.subBefore(
                            varNameMaybe.get(), ".");

                    final Variable rightHandVar = new Variable(varName, type);
                    rightHandVars.add(rightHandVar);
                }
            }

        }
        return rightHandVars;
    }

    /**
     * Parses the right-hand variables whose value is used in an assignment
     * from an {@link ASTStatementExpression}.
     *
     * @param stmtExp
     *            The statement expression that might hold the data about a
     *            variable assignment.
     * @return List of {@link Variable}s used on the right side of an
     *         assignment.
     */
    public static List<Variable> getRightHandVars(
            final ASTStatementExpression stmtExp) {

        final List<Variable> rightHandVars = new ArrayList<>();

        final List<ASTExpression> expressions = stmtExp
                .findChildrenOfType(ASTExpression.class);
        if (expressions.size() == 1) {
            final ASTExpression exp = expressions.get(0);
            final List<ASTPrimaryExpression> primeExps = exp
                    .findDescendantsOfType(ASTPrimaryExpression.class);
            for (final ASTPrimaryExpression primeExp : primeExps) {
                /*
                 * Parse the variable from the primary expression if it's not a
                 * method call
                 */
                if (!containsMethodCall(primeExp)) {
                    final Variable var = new Variable(primeExp);
                    rightHandVars.add(var);
                }
            }

        } else if (expressions.size() > 1) {
            log.warn("More than one ASTExpression in {}", getLocation(stmtExp));
        }

        return rightHandVars;
    }

    /**
     * Gets the {@link ASTStatementExpression} from an {@link ASTBlockStatement}.
     *
     * @param blockStmt
     *            Block statement that might contain the statement expression.
     * @return An {@link ASTStatementExpression} wrapped in an {@link Optional}
     *         in case the {@code blockStmt} did not contain a statement
     *         expression.
     */
    public static Optional<ASTStatementExpression> getStatementExpression(
            final ASTBlockStatement blockStmt) {
        return Optional.fromNullable(blockStmt
                .getFirstDescendantOfType(ASTStatementExpression.class));
    }

    /**
     * Gets the top-level statement expressions of this {@code block}.
     *
     * Doesn't look for statements in nested blocks within {@code block}.
     *
     * @param block
     *            The {@link ASTBlock} from which the list of
     *            {@link ASTStatementExpression}s is extracted.
     *
     * @return A list of {@link ASTStatementExpression}s
     */
    public static List<ASTStatementExpression> getStatementExpressions(final ASTBlock block) {
        final List<ASTStatementExpression> stmtExpressions = new ArrayList<>();

        final List<ASTBlockStatement> blockStmts = block
                .findChildrenOfType(ASTBlockStatement.class);
        for (final ASTBlockStatement blockStmt : blockStmts) {

            final String fullClassName = getFullClassName(blockStmt);
            final int currLine = getLineNr(blockStmt);

            final List<ASTStatement> astStmts = blockStmt
                    .findChildrenOfType(ASTStatement.class);
            if (astStmts.size() == 1) {
                final List<ASTStatementExpression> stmtExps = astStmts.get(0)
                        .findChildrenOfType(ASTStatementExpression.class);
                if (stmtExps.size() == 1) {
                    final ASTStatementExpression stmtExp = stmtExps.get(0);
                    stmtExpressions.add(stmtExp);

                } else if (stmtExps.size() > 1) {
                    log.warn("Nr of ASTStatementExpressions in {} at {}: {}",
                            fullClassName, currLine, stmtExps.size());
                }
            } else if (astStmts.size() > 1) {
                log.warn("Nr of ASTStatements in {} at {}: {}", fullClassName,
                        currLine, astStmts.size());
            }
        }
        return stmtExpressions;
    }

    /**
     * Gets the {@link Class type} of a variable from its
     * {@link ASTLocalVariableDeclaration}.
     *
     * @param declaration
     *            The {@link ASTLocalVariableDeclaration} containing the type
     *            of the declared variable.
     * @return Type of the variable.
     */
    public static Class<?> getVarType(final ASTLocalVariableDeclaration declaration) {
        final ASTType astType = declaration.getTypeNode();
        final Class<?> varType = astType.getType();
        return varType;
    }

    /**
     * Checks if this {@link ASTName} refers to a field of the current class.
     *
     * @param var
     *            The {@link ASTName} containing the information about the
     *            variable.
     * @return Whether the variable is a field.
     */
    public static boolean isFieldFromThisClass(final ASTName var) {
        boolean isField = false;
        // Get the line where the variable was declared
        final NameDeclaration nameDec = var.getNameDeclaration();
        /*
         * The variable must be declared in the current class; otherwise its
         * name declaration is null
         */
        if (nameDec != null) {
            final int varDeclarationLine = nameDec.getNode().getBeginLine();
            // Name of the variable
            final String varName = var.getImage();

            // Get the variable declarations of the class
            final Set<VariableNameDeclaration> varDecs = var.getScope()
                    .getEnclosingClassScope().getVariableDeclarations()
                    .keySet();

            for (final VariableNameDeclaration varDec : varDecs) {
                final int fieldDeclarationLine = varDec.getNode().getBeginLine();
                final String fieldName = varDec.getDeclaratorId().getImage();
                /*
                 * There is a field declared on the same line which has the
                 * same name as the variable -> the variable is a field
                 */
                if ((fieldDeclarationLine == varDeclarationLine)
                        && fieldName.equals(varName)) {
                    isField = true;
                    break;
                }
            }
        }
        return isField;
    }

    /**
     * Gets the variable name from an {@link ASTPrimaryExpression}.
     *
     * @param primeExp
     *            The {@link ASTPrimaryExpression} that might contain a

variable 363 * name. 364 * @return The variable name wrapped in an {@code Optional} in case 365 * {@code primeExp} contained the name of a method or a literal. 366 */ 367 private static Optional getVarName( 368 final ASTPrimaryExpression primeExp) { 369 String varName = null; 370 371 if (!containsMethodCall(primeExp)) { 372 /* 373 * If the primary expression is not a method call it might contain a 374 * variable 375 */ 376 final ASTPrimaryPrefix primePrefix = primeExp 377 .getFirstChildOfType(ASTPrimaryPrefix.class); 378 if (primePrefix != null){ 379 final ASTName astName = primePrefix 380 .getFirstChildOfType(ASTName.class); 381 if (astName != null){ 382 varName = astName.getImage(); 383 }// else the primary expression contains a literal 384 } else { 385 log.warn( 386 "Could not find primary prefix in primary expression: {}", 387 getLocation(primeExp)); 388 } 389 } 390 return Optional.fromNullable(varName); 391 } 392 } Listing A.5: Utility class used for interacting with the PMD framework and working with its abstract syntax tree. Used by the rules described in A.3 and A.4.

1 package at.jku.sea.plt.core.compose.rules.equivalence; 2 3 import java.util.ArrayList; 4 import java.util.Collection; 5 import java.util.List;

6 import java.util.Set; 7 8 import org.slf4j.Logger; 9 import org.slf4j.LoggerFactory; 10 11 import at.jku.sea.plt.core.artifact.Artifact; 12 import at.jku.sea.plt.core.artifact.Node; 13 import at.jku.sea.plt.core.compose.rules.data.CodeBlock; 14 15 /** 16 * If multiple nodes differ only in the order of adding listeners to GUI objects, they are considered to behave the same. 17 * This rule detects blocks that do the same in this respect. 18 */ 19 public class AddListenerEquivalenceRule { 20 21 /** 22 * Matches strings like {@code obj.addActionListener(listener)} or 23 * {@code addlistener(obj)}. Used to detect add listener calls whose order 24 * shouldn’t matter. 25 *

26 * The {@code (?s)} makes the dot match linebreaks -> statements containing 27 * anonymous listener classes stretching over multiple lines are matched 28 * too. 29 */ 30 private static final String ADD_LISTENER_REGEX = "(?s).*\\.add.*[Ll]istener.*"; 31 final static Logger log = LoggerFactory.getLogger(AddListenerEquivalenceRule.class); 32 33 /** 34 * Check if a {@code groupOfBlocks} are semantically equivalent. 35 * 36 * @param groupOfBlocks 37 * Multiple blocks consisting of code statements. 38 * @return Whether the {@code groupOfBlocks} are semantically equivalent. 39 */ 40 public static boolean blocksAreEquivalent( APPENDIX A. SOURCE CODE 137

41 final List groupOfBlocks) { 42 boolean blocksAreEquivalent = false; 43 final List ignoreTheseNodes = getStatementsThatAreTheSameInAllBlocks(groupOfBlocks); 44 blocksAreEquivalent = statementsAreAllAddListenerCalls(groupOfBlocks, 45 ignoreTheseNodes); 46 47 return blocksAreEquivalent; 48 } 49 50 /** 51 * Compare the single block ({@code block}) against all blocks for equality. 52 * @param blocks the {@link CodeBlock}s that might be equal to {@code block} 53 * @param block the {@link CodeBlock} that might be equal to the other {@code blocks} 54 * @return whether the blocks are equivalent according to this rule 55 */ 56 public static boolean blocksAreEquivalent(final Set blocks, 57 final CodeBlock block) { 58 final List blocksAsList = new ArrayList<>(blocks); 59 blocksAsList.add(block); 60 return blocksAreEquivalent(blocksAsList); 61 } 62 63 /** 64 * Compare the single block ({@code block}) against all blocks for equality. 65 * @param blocks the code blocks, which are a set of grouped {@link Node}s, that might be equal to {@code block} 66 * @param block the {@link CodeBlock} that might be equal to the other {@code blocks} 67 * @return whether the blocks are equivalent according to this rule 68 */ 69 public static boolean blocksAreEquivalent(final Set> blocks, 70 final List currBlock) { 71 APPENDIX A. SOURCE CODE 138

72 // Make defensive copies 73 final CodeBlock codeBlock = CodeBlock.fromNodes(currBlock); 74 final List codeBlocks = asCodeBlocks(blocks); 75 codeBlocks.add(codeBlock); 76 return blocksAreEquivalent(codeBlocks); 77 } 78 79 /** 80 * Converts a set of {@link Node}s that are grouped inside multiple lists into 81 * the more convenient to work with and easier to understand abstraction of {@link CodeBlock}s. 82 * @param blocks which are to be converted 83 * @return the blocks converted to the {@link CodeBlock} type 84 */ 85 private static List asCodeBlocks(final Set> blocks) { 86 87 final List codeBlocks = new ArrayList<>(); 88 for (final Collection codeStmts : blocks) { 89 codeBlocks.add(CodeBlock.fromNodes(codeStmts)); 90 } 91 return codeBlocks; 92 } 93 94 /** 95 * When deciding whether code blocks only differ in their way of adding listeners, 96 * we have to filter out the statements that are exactly the same in each code block. 97 */ 98 private static List getStatementsThatAreTheSameInAllBlocks( 99 final List groupOfBlocks) { 100 final List sameInAllBlocks = new ArrayList<>(); 101 final CodeBlock firstBlock = groupOfBlocks.get(0); 102 for (int i = 0; i < firstBlock.size(); i++) { 103 final Node statement = firstBlock.getStmt(i); 104 if (isSameInAllBlocks(statement, i, groupOfBlocks)) { 105 sameInAllBlocks.add(statement); 106 } 107 } 108 return sameInAllBlocks; 109 }

110 111 /** 112 * Determines whether this statement represented as a {@link Node} is an add listener call. 113 */ 114 private static boolean isAddListenerCall(final Node statement) { 115 final Artifact artifact = statement.getArtifact(); 116 final String statementStr = artifact.getIdentifier(); 117 118 return statementStr.matches(ADD_LISTENER_REGEX); 119 } 120 121 /** 122 * See if one node of a block is at the same position across every other 123 * block. 124 * 125 * @param node 126 * The node which might be at the same position in every block. 127 * @param pos 128 * The position of the node within its block. 129 * @param groupOfBlocks 130 * All blocks whose node at {@code pos} is compared with 131 * {@code node}. 132 * @return Whether the node is at the same position across {@code allBlocks} 133 * . 134 * 135 */ 136 private static boolean isSameInAllBlocks(final Node node, final int pos, 137 final List groupOfBlocks) { 138 boolean isTheSameInEveryBlock = true; 139 final Artifact statement = node.getArtifact(); 140 for (final CodeBlock otherBlock : groupOfBlocks) { 141 // The other block must have a node at this position to compare with 142 if (pos >= otherBlock.size()) { 143 isTheSameInEveryBlock = false; 144 break; 145 } else { APPENDIX A. SOURCE CODE 140

146 /* 147 * Compare the artifact of the node which is a modifier, a 148 * statement, etc 149 */ 150 final Artifact statementInOtherBlock = otherBlock.getStmt(pos) 151 .getArtifact(); 152 if (!statementInOtherBlock.equals(statement)) { 153 isTheSameInEveryBlock = false; 154 break; 155 } 156 } 157 } 158 return isTheSameInEveryBlock; 159 } 160 161 /** 162 * Helper method that checks if {@code statements} are add listener calls exclusively. 163 * @return true if all the {@code statements} are add listener calls 164 */ 165 private static boolean nothingButAddListenerMethods( 166 final List statements) { 167 boolean statementsAreAllAddListenerCalls = true; 168 for (final Node statement : statements) { 169 // We found one statement that is not an addListener call 170 if (!isAddListenerCall(statement)) { 171 statementsAreAllAddListenerCalls = false; 172 break; 173 } 174 } 175 return statementsAreAllAddListenerCalls; 176 } 177 178 /** 179 * Checks if the {@code groupOfBlocks} is equivalent according to the add listener equivalence rule. 180 * @return true if all the statements in the {@code groupOfBlocks} are add listener calls while ignoring {@code ignoreTheseNodes}

181 */ 182 private static boolean statementsAreAllAddListenerCalls( 183 final List groupOfBlocks, 184 final List ignoreTheseNodes) { 185 boolean allStatementsAreAddListenerCalls = true; 186 for (final CodeBlock block : groupOfBlocks) { 187 final List defensiveCopy = new ArrayList<>( 188 block.getCodeStmts()); 189 // Don’t modify the original block 190 defensiveCopy.removeAll(ignoreTheseNodes); 191 if (!nothingButAddListenerMethods(defensiveCopy)) { 192 allStatementsAreAddListenerCalls = false; 193 break; 194 } 195 } 196 return allStatementsAreAddListenerCalls; 197 } 198 } Listing A.6: Implementation of the Add Listener Equivalence Rule described in 5.5.1.
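The behavior of the `ADD_LISTENER_REGEX` used in Listing A.6 can be illustrated with a small, self-contained sketch. It is not part of the thesis code base; the class name and the sample statements are invented for illustration, but the pattern string is the one from `AddListenerEquivalenceRule`, including the `(?s)` flag that lets the dot match line breaks so that anonymous listener classes spanning several lines are matched too.

```java
/**
 * Standalone sketch (not part of the thesis code base) showing how the
 * ADD_LISTENER_REGEX from Listing A.6 classifies statements.
 */
public class AddListenerRegexDemo {

    // Same pattern as in AddListenerEquivalenceRule; (?s) makes '.' match line breaks
    static final String ADD_LISTENER_REGEX = "(?s).*\\.add.*[Ll]istener.*";

    static boolean isAddListenerCall(final String statement) {
        return statement.matches(ADD_LISTENER_REGEX);
    }

    public static void main(final String[] args) {
        // A simple call on one line -> matched
        System.out.println(isAddListenerCall("button.addActionListener(listener)"));
        // An anonymous listener class spanning several lines -> matched thanks to (?s)
        System.out.println(isAddListenerCall(
                "button.addMouseListener(new MouseAdapter() {\n"
                        + "    public void mouseClicked(MouseEvent e) { }\n"
                        + "});"));
        // Not an add listener call -> not matched
        System.out.println(isAddListenerCall("int x = 1;"));
    }
}
```

Without the `(?s)` flag the second example would not match, since the default `.` stops at line breaks.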

1 package at.jku.sea.plt.core.compose.rules.consistency; 2 3 import java.util.ArrayList; 4 import java.util.Collections; 5 import java.util.List; 6 import java.util.Set; 7 8 import at.jku.sea.plt.core.artifact.Node; 9 import at.jku.sea.plt.core.compose.rules.EccoJavaParser; 10 import at.jku.sea.plt.core.compose.rules.RuleUtil; 11 import at.jku.sea.plt.core.compose.rules.data.RuleJudgement; 12 import at.jku.sea.plt.core.compose.rules.data.BlockRepair; 13 import at.jku.sea.plt.core.compose.rules.data.CodeBlock; 14 import at.jku.sea.plt.core.compose.rules.data.Variable; 15 16 import com.google.common.base.Optional; 17 18 /** 19 * Every variable must be read before being assigned a new value. 20 *

21 * For every block of Java code this rule keeps a list of variables.

If a 22 * variable receives a new value without its old value being read or a method 23 * being called in between, this rule is violated. 24 *

25 * If this rule finds violations it generates {@link BlockRepair}s which suggest 26 * removing the superfluous variable assignment from the inconsistent 27 * {@link CodeBlock}. 28 * 29 * @author Matthias Braun 30 * 31 */ 32 public class MultipleVarAssignmentRule implements ConsistencyRule { 33 34 /** 35 * The minimum number of code statements in a block necessary to get 36 * analyzed. 37 */ 38 public static final int MIN_NR_OF_STMTS = 2; 39 /** The score a block gets if it has unnecessary variable assignments */ 40 private static final int UNNECESSARY_ASSIGNMENTS_SCORE = RuleJudgement.MIN_SCORE; 41 private static final int NO_UNNECESSARY_ASSIGNMENTS_SCORE = RuleJudgement.MAX_SCORE; 42 43 @Override 44 public String getName() { 45 return getClass().getSimpleName(); 46 } 47 48 @Override 49 public Optional judge(final CodeBlock block) { 50 RuleJudgement judgment; 51 /* Only with enough statements can there be a superfluous one */ 52 if (block.size() >= MIN_NR_OF_STMTS) { 53 judgment = checkForMultipleVarAssignments(block); 54 } else { 55 judgment = null; 56 } APPENDIX A. SOURCE CODE 143

57 58 return Optional.fromNullable(judgment); 59 } 60 61 @Override 62 public String toString() { 63 return getName(); 64 } 65 66 /** 67 * See if a variable is changed without its value being read before. 68 * 69 * @param block 70 * The statements of a block of Java code. They constitute method 71 * calls and variable assignments. 72 */ 73 private RuleJudgement checkForMultipleVarAssignments(final CodeBlock block) { 74 /* 75 * These variables are suspicious: They were set to a value and have not 76 * been read so far. If they get assigned a new value their previous 77 * assignment was superfluous. 78 */ 79 final List watchedVars = new ArrayList<>(); 80 81 // Assume this block is ok at first and needs no repairing 82 int score = NO_UNNECESSARY_ASSIGNMENTS_SCORE; 83 BlockRepair repair = null; 84 85 for (int stmtNr = 0; stmtNr < block.size(); stmtNr++) { 86 final Node stmt = block.getStmt(stmtNr); 87 88 final Set writtenToVars = EccoJavaParser 89 .getWrittenToVars(stmt); 90 91 if (!Collections.disjoint(watchedVars, writtenToVars)) { 92 /* 93 * A variable was assigned a value without its old one being APPENDIX A. SOURCE CODE 144

94 * read previously 95 */ 96 score = UNNECESSARY_ASSIGNMENTS_SCORE; 97 /* 98 * Suggest to remove the repeated assignment from the block. 99 */ 100 repair = RuleUtil.createRepairByRemoval(block, stmtNr, this); 101 } 102 103 watchedVars.addAll(writtenToVars); 104 /* 105 * If a variable was written to and read in the same statement it is 106 * also removed from the watch list 107 */ 108 if (!watchedVars.isEmpty()) { 109 removeUsedVars(watchedVars, stmt); 110 } 111 } 112 113 return RuleJudgement.create(block, score, repair); 114 } 115 116 /** 117 * When a variable is read during a comparison or an assignment it is 118 * removed from the list of watched variables. 119 * 120 * @param watchedVars 121 * List of currently suspicious variables -> They weren’t read 122 * yet. 123 * @param writtenToVars 124 * @param blockStmt 125 * An {@link ASTBlockStatement} that can be a variable assignment 126 * or a method call. 127 */ 128 private void removeUsedVars(final List watchedVars, 129 final Node blockStmt) { 130 APPENDIX A. SOURCE CODE 145

131 /* 132 * Check if a method is called: This resets the list of watched 133 * variables because the method might have read them. 134 */ 135 if (EccoJavaParser.containsMethodCall(blockStmt)) { 136 watchedVars.clear(); 137 } else { 138 // Get the read variables and remove them from the watch list 139 final Set readVars = EccoJavaParser 140 .getReadVars(blockStmt); 141 watchedVars.removeAll(readVars); 142 } 143 } 144 } Listing A.7: Implementation of the Multiple Variable Assignment Rule described in 5.5.2.
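The watch-list idea behind Listing A.7 can be sketched without the surrounding ECCO types. In the following self-contained sketch, `Statement` is a made-up stand-in for the thesis's `Node` plus the `EccoJavaParser` queries: each statement just carries the sets of variables it writes and reads and whether it calls a method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Simplified sketch of the Multiple Variable Assignment Rule's watch list.
 * Statement is an invented stand-in for the thesis's Node/EccoJavaParser types.
 */
public class MultipleAssignmentSketch {

    record Statement(Set<String> writes, Set<String> reads, boolean callsMethod) { }

    /** True if some variable is written twice without a read or method call in between. */
    static boolean hasSuperfluousAssignment(final List<Statement> block) {
        final Set<String> watched = new HashSet<>();
        for (final Statement stmt : block) {
            // A watched variable written again -> its earlier assignment was superfluous
            if (!Collections.disjoint(watched, stmt.writes())) {
                return true;
            }
            watched.addAll(stmt.writes());
            if (stmt.callsMethod()) {
                // A method call might read any variable, so clear the watch list
                watched.clear();
            } else {
                // Variables read in this statement are no longer suspicious
                watched.removeAll(stmt.reads());
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        // x = 1; x = 2;  -> violation: the first assignment was never read
        final List<Statement> bad = List.of(
                new Statement(Set.of("x"), Set.of(), false),
                new Statement(Set.of("x"), Set.of(), false));
        // x = 1; y = x; x = 2;  -> fine: x was read in between
        final List<Statement> ok = List.of(
                new Statement(Set.of("x"), Set.of(), false),
                new Statement(Set.of("y"), Set.of("x"), false),
                new Statement(Set.of("x"), Set.of(), false));
        System.out.println(hasSuperfluousAssignment(bad)); // true
        System.out.println(hasSuperfluousAssignment(ok));  // false
    }
}
```

Clearing the watch list on every method call makes the rule deliberately conservative, mirroring `removeUsedVars` in Listing A.7: a called method might read any of the watched variables.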

1 package at.jku.sea.plt.core.compose.rules.consistency; 2 3 import java.util.ArrayList; 4 import java.util.List; 5 6 import at.jku.sea.plt.core.artifact.Node; 7 import at.jku.sea.plt.core.compose.rules.EccoJavaParser; 8 import at.jku.sea.plt.core.compose.rules.RuleUtil; 9 import at.jku.sea.plt.core.compose.rules.data.BlockRepair; 10 import at.jku.sea.plt.core.compose.rules.data.CodeBlock; 11 import at.jku.sea.plt.core.compose.rules.data.MethodCall; 12 import at.jku.sea.plt.core.compose.rules.data.RuleJudgement; 13 14 import com.google.common.base.Optional; 15 16 /** 17 * This rule is violated when the same setter method is called multiple times 18 * without other method calls happening in between. 19 *

20 * Example for a violation:
21 * myObj.setFoo(5);
22 * int x = 1;

23 * myObj.setFoo(9);

24 *
This is ok:
25 * myObj.setFoo(5);
26 * bar(); // The value of foo might get used in bar()
27 * myObj.setFoo(9); 28 *
29 * 30 * @author Matthias Braun 31 * 32 */ 33 public class MultipleSetterCallRule implements ConsistencyRule { 34 35 private static final int MIN_NR_OF_STMTS = 2; 36 37 @Override 38 public String getName() { 39 return getClass().getSimpleName(); 40 } 41 42 @Override 43 public Optional judge(final CodeBlock block) { 44 RuleJudgement judgment; 45 /* Only with enough statements can there be a superfluous one */ 46 if (block.size() >= MIN_NR_OF_STMTS) { 47 judgment = checkForSuperfluousSetterCalls(block); 48 } else { 49 judgment = null; 50 } 51 return Optional.fromNullable(judgment); 52 } 53 54 @Override 55 public String toString() { 56 return getName(); 57 } 58 59 /** 60 * Go through every statement of the {@code block}. It’s a rule violation if 61 * the same setter method is called twice on an object, without the object 62 * being used in between those calls. APPENDIX A. SOURCE CODE 147

63 * 64 * @param block 65 * the {@link CodeBlock} that might contain a superfluous setter 66 * call 67 * @return a {@link RuleJudgement} that tells us whether a setter was called 68 * superfluously. It also contains a {@link BlockRepair} if a rule 69 * violation occurred in the {@code block}. 70 */ 71 private RuleJudgement checkForSuperfluousSetterCalls(final CodeBlock block) { 72 int score = RuleJudgement.MAX_SCORE; 73 BlockRepair repair = null; 74 75 final List watchedCalls = new ArrayList<>(); 76 for (final Node stmt : block.getCodeStmts()) { 77 final String identifier = stmt.getArtifact().getIdentifier(); 78 79 // Get the method calls within this single code statement 80 final List calls = EccoJavaParser 81 .getMethodCalls(identifier); 82 for (final MethodCall call : calls) { 83 if (watchedCalls.contains(call)) { 84 // The setter was called again on the same object 85 repair = RuleUtil.createRepairByRemoval(block, stmt, this); 86 score = RuleJudgement.MIN_SCORE; 87 } else { 88 /* 89 * The call was not a setter on the watch list. Reset the 90 * list of watched setter calls because their receiving 91 * object might have been used in this call. 92 */ 93 watchedCalls.clear(); 94 if (call.isSetter()) { 95 /* 96 * This setter call was not among the watched calls -> APPENDIX A. SOURCE CODE 148

97 * Watch it now 98 */ 99 watchedCalls.add(call); 100 101 } 102 } 103 } 104 } 105 return RuleJudgement.create(block, score, repair); 106 } 107 } Listing A.8: Implementation of the Multiple Setter Call Rule described in 5.5.3.
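The core of Listing A.8 can likewise be sketched in isolation. In this invented, self-contained example a statement is reduced to a `"receiver.method"` string, whereas the thesis code works on parsed `MethodCall` objects; the reset-on-other-call behavior is the same.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch of the Multiple Setter Call Rule's watch list, with calls
 * reduced to "receiver.method" strings (invented for illustration).
 */
public class MultipleSetterSketch {

    /** True if the same setter is called twice with no other call in between. */
    static boolean hasSuperfluousSetterCall(final List<String> calls) {
        final List<String> watched = new ArrayList<>();
        for (final String call : calls) {
            if (watched.contains(call)) {
                // The same setter was called again on the same object
                return true;
            }
            // Any other call may use the previously set value -> reset the watch list
            watched.clear();
            final boolean isSetter = call.substring(call.indexOf('.') + 1).startsWith("set");
            if (isSetter) {
                watched.add(call);
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        // myObj.setFoo(5); myObj.setFoo(9); -> violation, no call in between
        System.out.println(hasSuperfluousSetterCall(
                List.of("myObj.setFoo", "myObj.setFoo")));
        // myObj.setFoo(5); bar(); myObj.setFoo(9); -> ok, bar() may read foo
        System.out.println(hasSuperfluousSetterCall(
                List.of("myObj.setFoo", "this.bar", "myObj.setFoo")));
    }
}
```

Note that, as in the real rule, a plain variable assignment between the two setter calls (which produces no method call) would not clear the watch list, so the violation in the thesis's `int x = 1;` example is still found.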

1 package at.jku.sea.plt.core.compose.rules.consistency; 2 3 import java.util.ArrayList; 4 import java.util.List; 5 6 import org.slf4j.Logger; 7 import org.slf4j.LoggerFactory; 8 9 import at.jku.sea.plt.core.artifact.Node; 10 import at.jku.sea.plt.core.compose.rules.EccoJavaParser; 11 import at.jku.sea.plt.core.compose.rules.data.RuleJudgement; 12 import at.jku.sea.plt.core.compose.rules.data.CodeBlock; 13 import at.jku.sea.plt.core.compose.rules.data.MethodCall; 14 import at.jku.sea.plt.core.compose.rules.data.Variable; 15 16 import com.google.common.base.Optional; 17 18 /** 19 * Detects null dereferences causing null pointer exceptions. 20 * 21 * @author Matthias Braun 22 * 23 */ 24 public class UninitializedReadRule implements ConsistencyRule { 25 26 private static final Logger LOG = LoggerFactory 27 .getLogger(UninitializedReadRule.class); 28 private static final int MIN_NR_OF_STMTS = 2; APPENDIX A. SOURCE CODE 149

29 30 @Override 31 public String getName() { 32 return getClass().getSimpleName(); 33 } 34 35 @Override 36 public Optional judge(final CodeBlock block) { 37 RuleJudgement judgment; 38 /* Only with enough statements can there be a superfluous one */ 39 if (block.size() >= MIN_NR_OF_STMTS) { 40 judgment = checkForReadsOfUninitializedVars(block); 41 } else { 42 judgment = null; 43 } 44 45 return Optional.fromNullable(judgment); 46 } 47 48 @Override 49 public String toString() { 50 return getName(); 51 } 52 53 private RuleJudgement checkForReadsOfUninitializedVars(final CodeBlock block) { 54 55 /* 56 * These variables are suspicious: They were not initialized or set to 57 * null. If they are read this constitutes a rule violation. 58 */ 59 final List watchedVars = new ArrayList<>(); 60 /* 61 * Assume at first that no null pointer exception could occur in the 62 * block 63 */ 64 int score = RuleJudgement.MAX_SCORE; 65 for (final Node node : block.getCodeStmts()) { 66 // Is there a potential null pointer exception in this

block? 67 boolean npe = false; 68 watchedVars.addAll(EccoJavaParser.getDeclaredVars(node)); 69 // It’s fine to use variables that were written to... 70 watchedVars.removeAll(EccoJavaParser.getWrittenToVars(node)); 71 // ...except if they were set to null 72 watchedVars.addAll(EccoJavaParser.getVarsSetToNull(node)); 73 // Check if receivers of method calls are on the watch list 74 final List calls = EccoJavaParser.getMethodCalls(node); 75 for (final MethodCall call : calls) { 76 77 if (watchedVars.contains(call.getReceiver())) { 78 /* 79 * When a node contains multiple statements it’s necessary 80 * to check if the called method would really cause a null 81 * pointer exception 82 */ 83 if (EccoJavaParser.containsMultipleStmts(node)) { 84 if (EccoJavaParser.callCausesNPE(call, node)) { 85 86 /* 87 * After checking the individual statements of the 88 * node, it’s clear that a null pointer exception 89 * must occur. 90 */ 91 npe = true; 92 } 93 } else { 94 /* 95 * An object that is null was called and the node does 96 * not contain multiple statements. 97 */ 98 npe = true; 99 } 100 } 101 if (npe) { APPENDIX A. SOURCE CODE 151

102 score = RuleJudgement.MIN_SCORE; 103 LOG.info("NPE in {}", block); 104 break; 105 } 106 } 107 } 108 return RuleJudgement.create(block, score); 109 } 110 } Listing A.9: Implementation of the Uninitialized Read Rule described in 5.5.4.

A.2 Code for ArchStudio

A.2.1 ArchStudio Parsing and Modeling

1 package archstudio.comp.tron.tools.schematron.repairSuggestions; 2 3 import static java.util.stream.Collectors.toList; 4 5 import java.util.ArrayList; 6 import java.util.Arrays; 7 import java.util.List; 8 import java.util.Optional; 9 10 import org.slf4j.Logger; 11 import org.slf4j.LoggerFactory; 12 13 import archstudio.comp.tron.tools.schematron.repairSuggestions.diffs.ArchElement; 14 import archstudio.comp.tron.tools.schematron.repairSuggestions.diffs.ArchInterface; 15 import archstudio.comp.tron.tools.schematron.repairSuggestions.diffs.ArchLink; 16 import archstudio.comp.xarchtrans.XArchFlatTransactionsInterface; 17 import edu.uci.ics.xarchutils.ObjRef; 18 import edu.uci.ics.xarchutils.XArchFlatInterface; 19 20 /** 21 * Helps reading and manipulating ArchStudio models. 22 */ 23 public final class ArchUtil { APPENDIX A. SOURCE CODE 152

24 private static final Logger LOG = LoggerFactory.getLogger(ArchUtil.class); 25 /** 26 * The number of model copies. This is used to assign unique names. 27 */ 28 private static int copyCounter; 29 30 /** 31 * Adds a component to its {@code parent} object. 32 *

33 * Note: Interfaces of the component are not supported yet. 34 * 35 * @param parent 36 * the parent object inside of which the resulting component 37 * should live. An archstructure for example. 38 * @param id 39 * the ID of the component 40 * @param description 41 * the description of the component 42 * @param xarch 43 * the {@link XArchFlatInterface} used to parse and manipulate the model 44 * @return the {@link ObjRef} of the added component wrapped in an 45 * {@link Optional} in case the adding went wrong 46 */ 47 public static Optional addComponentToModel(final ObjRef parent, 48 final String id, final String description, 49 final XArchFlatTransactionsInterface xarch) { 50 ObjRef newComponentRef; 51 52 try { 53 final ObjRef xArchRef = xarch.getXArch(parent); 54 final ObjRef typesContextRef = xarch.createContext(xArchRef, 55 "types"); 56 newComponentRef = xarch.create(typesContextRef, "component"); 57 xarch.set(newComponentRef, "id", id); 58 final ObjRef newDescriptionRef = xarch.create(typesContextRef, 59 "description"); 60 xarch.set(newDescriptionRef, "Value", description); 61 xarch.set(newComponentRef, "Description", newDescriptionRef); 62 APPENDIX A. SOURCE CODE 153

63 xarch.add(parent, "component", newComponentRef); 64 } catch (final Exception ex) { 65 newComponentRef = null; 66 LOG.warn("Could not add component to model", ex); 67 } 68 return Optional.ofNullable(newComponentRef); 69 } 70 71 /** 72 * Gets all elements of type {@code elementType} from the {@code model}. 73 * 74 * @param elementType the {@link ElementType} of the objects that we want from the {@code model} 75 * @param model 76 * the model that contains the elements 77 * @param xarch the ArchStudio interface allowing interaction with the model and its context 78 * @return the element references that conform to the {@code elementType} inside the {@code model} 79 */ 80 public static List getAll( 81 final ArchElement.ElementType elementType, final ObjRef model, 82 final XArchFlatInterface xarch) { 83 84 final ObjRef typesContextRef = xarch.createContext(model, "Types"); 85 86 // One model can contain multiple ArchStructures 87 final ObjRef[] archStructures = xarch.getAllElements(typesContextRef, 88 "ArchStructure", model); 89 90 final List components = new ArrayList<>(); 91 92 for (final ObjRef archStructure : archStructures) { 93 final ObjRef[] curComponents = xarch.getAll(archStructure, 94 elementType.toString()); 95 components.addAll(Arrays.asList(curComponents)); 96 } 97 return components; 98 } 99 APPENDIX A. SOURCE CODE 154

100 /** 101 * Gets an ArchStructure from a model. 102 * @param xarch 103 * the {@link XArchFlatInterface} used to parse and manipulate the model 104 * @return a reference to the ArchStructure 105 */ 106 public static ObjRef getArchStructureFromModel(final ObjRef model, 107 final XArchFlatInterface xarch) { 108 109 final ObjRef typesContextRef = xarch.createContext(model, "types"); 110 final ObjRef archStructure = xarch.getElement(typesContextRef, 111 "archStructure", model); 112 return archStructure; 113 } 114 115 /** 116 * Gets the component references from a {@code model}. 117 * 118 * @param model 119 * the model that contains the components 120 * @param xarch the ArchStudio interface allowing interaction with the model and its context 121 * @return the component references of the {@code model} 122 */ 123 public static List getComponents(final ObjRef model, 124 final XArchFlatInterface xarch) { 125 return getAll(ArchElement.ElementType.COMPONENT, model, xarch); 126 } 127 128 public static List getConnectors(final ObjRef model, 129 final XArchFlatInterface xarch) { 130 return getAll(ArchElement.ElementType.CONNECTOR, model, xarch); 131 } 132 133 /** 134 * Makes a copy of {@code origModel}. 135 *

136 * As a side effect the cloned model will show up as an entry in the 137 * Archstudio file manager 138 * APPENDIX A. SOURCE CODE 155

139 * @param origModel 140 * copy this model 141 * @param xarch the ArchStudio interface allowing interaction with the model and its context 142 * @return a {@link ObjRef} to the copied model 143 */ 144 public static ObjRef getCopyOfModel(final ObjRef origModel, 145 final XArchFlatInterface xarch) { 146 final String uri = "copyNr" + copyCounter; 147 copyCounter++; 148 final ObjRef copy = xarch.cloneXArch(origModel, uri); 149 return copy; 150 } 151 152 /** 153 * Gets an {@link ArchInterface} from the model by its {@code id}. 154 * 155 * @param id 156 * the interface’s ID 157 * @param xarch 158 * the {@link XArchFlatInterface} used to parse and manipulate the model 159 * @return an {@link ArchInterface} 160 */ 161 public static ArchInterface getInterface(final String id, 162 final XArchFlatInterface xarch) { 163 164 final ObjRef iFaceRef = xarch.getByID(id); 165 166 return ArchInterface.fromRef(iFaceRef, xarch); 167 } 168 169 /** 170 * Parse the interface IDs that a {@code linkRef} contains. 171 *

172 * Example output: 173 *

174 * interface.82828891.14739bdfbbf.4c9cf6b9a3dbbf9d.55 175 * 176 * @param linkRef 177 * the reference to a linkRef in the ArchStudio model 178 * @param xarch 179 * the {@link XArchFlatInterface} used to parse and APPENDIX A. SOURCE CODE 156

manipulate the model 180 * @return the list of interfaces this {@code linkRef} contains 181 */ 182 public static List getInterfaceIds(final ObjRef linkRef, 183 final XArchFlatInterface xarch) { 184 185 // Get the points inside the linkRef element 186 final ObjRef[] pointRefs = xarch.getAll(linkRef, "point"); 187 // Get the ids of the interface from inside the point references 188 final List ids = Arrays 189 .stream(pointRefs) 190 .map(pointRef -> (ObjRef) xarch.get(pointRef, 191 "anchorOnInterface")) 192 .map(anchor -> (String) xarch.get(anchor, "href")) 193 // Remove the leading pound sign of the anchor 194 .map(anchor -> { 195 if (anchor != null && anchor.startsWith("#")) { 196 return anchor.substring(1); 197 } else 198 return anchor; 199 }).collect(toList()); 200 return ids; 201 } 202 203 /** 204 * Gets an {@link ArchLink} by its {@code linkId}. 205 * 206 * @param linkId 207 * the ID of the link in the ArchStudio model 208 * @param xarch 209 * an {@link XArchFlatInterface} used to find the reference to 210 * the link 211 * @return an initialized {@link ArchLink} 212 */ 213 public static ArchLink getLink(final String linkId, 214 final XArchFlatInterface xarch) { 215 final ObjRef linkRef = xarch.getByID(linkId); 216 return ArchLink.fromRef(linkRef, xarch); 217 } 218 219 /** 220 * Gets the link references from a {@code model}. APPENDIX A. SOURCE CODE 157

221 * 222 * @param model 223 * the model that contains the links 224 * @param xarch 225 * the {@link XArchFlatInterface} used to parse and manipulate the model 226 * @return the link references of the {@code model} 227 */ 228 public static List getLinks(final ObjRef model, 229 final XArchFlatInterface xarch) { 230 231 final ObjRef typesContextRef = xarch.createContext(model, "Types"); 232 233 // One model can contain multiple ArchStructures 234 final ObjRef[] archStructures = xarch.getAllElements(typesContextRef, 235 "ArchStructure", model); 236 237 final List links = new ArrayList<>(); 238 239 for (final ObjRef archStructure : archStructures) { 240 final ObjRef[] curComponents = xarch.getAll(archStructure, "Link"); 241 links.addAll(Arrays.asList(curComponents)); 242 } 243 return links; 244 } 245 246 /** 247 * Parses the value of a {@code property} from an element. 248 * @param property we want to get the value from this property 249 * @param elemRef the reference to the element that should contain the {@code property} 250 * @param xarch the ArchStudio interface allowing interaction with the model and its context 251 * @return the value of the element wrapped in an {@link Optional} in case the property could not be parsed 252 */ 253 public static Optional parseValFromElem(final String property, 254 final ObjRef elemRef, final XArchFlatInterface xarch) { 255 String val = null; APPENDIX A. SOURCE CODE 158

256 try { 257 final Object valRef = xarch.get(elemRef, property); 258 259 if (valRef instanceof ObjRef) { 260 val = (String) xarch.get((ObjRef) valRef, "Value"); 261 } else { 262 val = (String) valRef; 263 } 264 265 } catch (final Exception e) { 266 LOG.warn("Could not parse value of property {} from element reference {}", 267 property, elemRef, e); 268 } 269 return Optional.ofNullable(val); 270 } 271 272 /** 273 * Puts a number of ArchStudio elements identified by {@code refsToRecontextualize} in 274 * a {@code newContext}. 275 * @param newContext the new ArchStudio context to put the objects in 276 * @param refsToRecontextualize these references identify the objects that need recontextualization 277 * @param typeOfThing the type of the objects to recontextualize 278 * @param xarch the ArchStudio interface allowing interaction with the model and its context 279 * @return the references to the recontextualized objects 280 */ 281 public static ObjRef[] recontextualize(final ObjRef newContext, 282 final ObjRef[] refsToRecontextualize, final String typeOfThing, 283 final XArchFlatInterface xarch) { 284 final ObjRef[] recontextualizedRefs = new ObjRef[refsToRecontextualize.length]; 285 for (int i = 0; i < refsToRecontextualize.length; i++) { 286 final ObjRef ref = refsToRecontextualize[i]; 287 final ObjRef newRef = xarch.recontextualize(newContext, 288 typeOfThing, ref); 289 recontextualizedRefs[i] = newRef; 290 } 291 return recontextualizedRefs; 292 } APPENDIX A. SOURCE CODE 159

    // This is a utility class not meant to be instantiated
    private ArchUtil() {
    }

    /**
     * Sets the description of an {@code element}.
     *
     * @param element
     *            the element whose description is set
     * @param description
     *            the description the {@code element} will have
     * @param xarch
     *            the {@link XArchFlatInterface} used to parse and manipulate the model
     */
    public static void setDescription(final ObjRef element, final String description,
            final XArchFlatInterface xarch) {
        final ObjRef currDescription = (ObjRef) xarch.get(element,
                "Description");
        xarch.set(currDescription, "Value", description);
    }
}

Listing A.10: Shared utility methods to parse and manipulate ArchStudio's models.
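The null-handling pattern of parseValFromElem in Listing A.10 can be shown in isolation. The following is a minimal, self-contained sketch, not the thesis code itself: the hypothetical PropertyStore map stands in for ArchStudio's XArchFlatInterface. A lookup that may fail or yield null is wrapped with Optional.ofNullable so callers never receive a raw null.

```java
import java.util.Map;
import java.util.Optional;

public class OptionalParseSketch {
    // Hypothetical stand-in for the ArchStudio model's properties
    static final Map<String, String> PROPS = Map.of("Description", "A sample component");

    static Optional<String> parseVal(final String property) {
        String val = null;
        try {
            // May return null if the property is absent
            val = PROPS.get(property);
        } catch (final Exception e) {
            // On any lookup failure, fall through with val == null
        }
        // Wrap so the caller decides on a fallback via orElse
        return Optional.ofNullable(val);
    }

    public static void main(final String[] args) {
        System.out.println(parseVal("Description").orElse("Unknown"));
        System.out.println(parseVal("Direction").orElse("Unknown"));
    }
}
```

This mirrors how parseDescriptionFromElement in Listing A.12 falls back to a default with orElse(UNKNOWN_VAL).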

package au.uow.archelements;

import static java.util.stream.Collectors.toList;

import java.util.ArrayList;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import archstudio.comp.tron.tools.schematron.repairSuggestions.ArchUtil;

import edu.uci.ics.xarchutils.ObjRef;
import edu.uci.ics.xarchutils.XArchFlatInterface;

/**
 * Represents an entire ArchStudio model including all connectors and components.
 * Contains factory methods to parse the XML representation of the ArchStudio model.
 */
public class ArchModel {
    private static final Logger LOG = LoggerFactory.getLogger(ArchModel.class);
    private List<ArchComponent> components = new ArrayList<>();
    private List<ArchConnector> connectors = new ArrayList<>();
    private String name = "Unnamed model";

    public static ArchModel empty() {
        return new ArchModel();
    }

    /**
     * Creates a Java representation of an ArchStudio model.
     * @param modelRef the reference to the model used in ArchStudio
     * @param xarch is used as a context and to parse the model from XML
     * @return the Java representation of the model
     */
    public static ArchModel from(final ObjRef modelRef,
            final XArchFlatInterface xarch) {
        final List<ObjRef> compRefs = ArchUtil.getComponents(modelRef, xarch);
        final List<ObjRef> connectorRefs = ArchUtil.getConnectors(modelRef,
                xarch);

        final List<ArchConnector> connectors = connectorRefs.stream()
                .map(connRef -> ArchConnector.fromRef(connRef, xarch))
                .collect(toList());

        final List<ArchComponent> components = compRefs.stream()
                .map(compRef -> ArchComponent.fromRef(compRef, xarch))
                .collect(toList());

        final ArchModel model = new ArchModel();
        model.setComponents(components);
        model.setConnectors(connectors);

        return model;
    }

    /**
     * Creates a Java representation of an ArchStudio model.
     * @param modelUrl URL to the ArchStudio model as XML we want to have a Java representation of
     * @param xarch is used as a context and to parse the model from XML
     * @return the Java representation of the model
     */
    public static ArchModel from(final String modelUrl,
            final XArchFlatInterface xarch) {
        final ObjRef modelRef = xarch.getOpenXArch(modelUrl);
        return from(modelRef, xarch);
    }

    /**
     * Only allow clients to get an instance of this class via the static factory methods.
     */
    private ArchModel() {
    }

    public List<ArchComponent> getComponents() {
        return this.components;
    }

    public List<ArchConnector> getConnectors() {
        return this.connectors;
    }

    public String getName() {
        return name;
    }

    public void setConnectors(final List<ArchConnector> connectors) {
        this.connectors = connectors;
    }

    public void setName(final String name) {
        this.name = name;
    }

    private void setComponents(final List<ArchComponent> components) {
        this.components = components;
    }

}

Listing A.11: Representation of an ArchStudio model in Java. It knows how to create itself by parsing the underlying XML ArchStudio model.
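ArchModel hides its constructor and exposes only static factory methods (empty and the two from overloads). The pattern can be sketched on its own; the class and field names below are illustrative, not from the thesis code.

```java
import java.util.List;

// Static-factory sketch: the constructor is private, so every instance is
// obtained through a named factory method that performs the initialization.
public class ModelSketch {
    private List<String> components = List.of();

    private ModelSketch() {
        // Only the factory methods below may instantiate this class
    }

    public static ModelSketch empty() {
        return new ModelSketch();
    }

    // Hypothetical stand-in for ArchModel.from(modelRef, xarch)
    public static ModelSketch from(final List<String> parsedComponents) {
        final ModelSketch model = new ModelSketch();
        model.components = List.copyOf(parsedComponents);
        return model;
    }

    public static void main(final String[] args) {
        System.out.println(ModelSketch.empty().components.size());
        System.out.println(ModelSketch.from(List.of("ClockComp", "AlarmComp")).components.size());
    }
}
```

Named factories make the two construction paths (empty model vs. parsed model) explicit at the call site, which a public constructor could not.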

package au.uow.archelements;

import archstudio.comp.tron.tools.schematron.repairSuggestions.ArchUtil;
import edu.uci.ics.xarchutils.ObjRef;
import edu.uci.ics.xarchutils.XArchFlatInterface;
import edu.uci.isr.xarch.XArchPropertyMetadata;
import edu.uci.isr.xarch.XArchTypeMetadata;
import eu.bges.jutils.strings.ParseUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Locale;

/**
 * Links, interfaces, components, and connectors are elements in ArchStudio.
 *
 * @author Matthias Braun
 */
public abstract class ArchElement {
    /**
     * Whether this element is optional in the context of a software product line.
     */
    public boolean isOptional() {
        return isOptional;
    }

    private boolean isOptional = false;
    private XArchFlatInterface xarch;

    protected XArchFlatInterface getXarch() {
        return this.xarch;
    }

    /**
     * The different kinds of elements that exist in an ArchStudio architecture.
     *
     * @author Matthias Braun
     */
    public enum ElementType {
        COMPONENT("Component"), CONNECTOR("Connector"), LINK("Link"), INTERFACE(
                "Interface"), UNKNOWN(UNKNOWN_VAL);

        private final String elementType;

        /**
         * Creates an {@link au.uow.archelements.ArchElement.ElementType} from a {@code string}.
         *

         * The {@code string} must be either "connector", "component", or
         * "link". Case doesn't matter.
         *
         * @param string the string to construct the {@link au.uow.archelements.ArchElement.ElementType} from
         * @return the corresponding {@link au.uow.archelements.ArchElement.ElementType} or
         *         {@link au.uow.archelements.ArchElement.ElementType#UNKNOWN} if the {@code string} didn't
         *         match any known {@link au.uow.archelements.ArchElement.ElementType}
         */
        public static ElementType from(final String string) {
            final ElementType type;
            switch (string.toLowerCase(Locale.ROOT)) {
            case "component":
                type = COMPONENT;
                break;
            case "link":

                type = LINK;
                break;
            case "connector":
                type = CONNECTOR;
                break;
            default:
                type = UNKNOWN;
            }
            return type;
        }

        /**
         * Creates an instance of ElementType.
         *
         * @param elementKind the string of the element kind as it also appears in the
         *                    ArchStudio models
         */
        ElementType(final String elementKind) {
            this.elementType = elementKind;
        }

        @Override
        public String toString() {
            return this.elementType;
        }
    }

    private static final Logger LOG = LoggerFactory
            .getLogger(ArchElement.class);

    private static final String UNKNOWN_VAL = "Unknown";
    private static final String UNKNOWN_ELEM_ID = "";
    private static final String UNKNOWN_ELEM_DESCRIPTION = "";

    private String id = UNKNOWN_ELEM_ID;
    private String description = UNKNOWN_ELEM_DESCRIPTION;

    private ObjRef elementRef = new ObjRef();

    protected void setOptional(final boolean optional) {
        this.isOptional = optional;
    }

    /**
     * Creates a concrete {@link au.uow.archelements.ArchElement} from the "add" DiffPart XML of
     * ArchStudio. For example an {@link ArchConnector}.
     *

     * If no {@link au.uow.archelements.ArchElement} could be successfully created, return an
     * {@link EmptyArchElement}.
     *
     * @param diffPartRef the reference to the element in the ArchStudio architecture
     * @param xarch is used to parse data from ArchStudio
     * @return an initialized {@link au.uow.archelements.ArchElement}, never null
     */
    public static ArchElement createFromAddDiffPart(final ObjRef diffPartRef,
            final XArchFlatInterface xarch) {

        final ElementType type = getElementType(diffPartRef, xarch);
        final ObjRef elemRef = (ObjRef) xarch.get(diffPartRef, type.toString());
        ArchElement element;
        switch (type) {
        case COMPONENT:
            element = new ArchComponent();
            break;
        case LINK:
            element = new ArchLink();
            break;
        case CONNECTOR:
            element = new ArchConnector();
            break;
        default:
            element = new EmptyArchElement();
        }
        element.elementRef = elemRef;
        element.init(elemRef, xarch);
        element.xarch = xarch;
        return element;
    }

    /**
     * Creates a concrete {@link au.uow.archelements.ArchElement} from the "remove" DiffPart XML of
     * ArchStudio. For example an {@link ArchConnector}.
     *

     * If no {@link au.uow.archelements.ArchElement} could be successfully created, return an
     * {@link EmptyArchElement}.
     *
     * @param removeDiffPart the reference to the removed element in the ArchStudio architecture
     * @param xarch is used to parse data from ArchStudio
     * @return an initialized {@link au.uow.archelements.ArchElement}, never null
     */
    public static ArchElement createFromRemoveDiffPart(
            final ObjRef removeDiffPart, final XArchFlatInterface xarch) {
        // Get the ID of the removed element
        final String removeId = (String) xarch.get(removeDiffPart, "removeId");
        final String elementType = ParseUtil.subBefore(removeId, ".");
        final ArchElement removedElement;
        switch (ElementType.from(elementType)) {
        case COMPONENT:
            removedElement = new ArchComponent();
            break;
        case LINK:
            removedElement = new ArchLink();
            break;
        // ElementType.from never yields INTERFACE, so match CONNECTOR here
        case CONNECTOR:
            removedElement = new ArchConnector();
            break;
        default:
            removedElement = new EmptyArchElement();
        }
        removedElement.setId(removeId);
        removedElement.xarch = xarch;
        return removedElement;
    }

    /**
     * Gets the description of an element such as a component, connector or link.

     */
    public static String parseDescriptionFromElement(final ObjRef elemRef,
            final XArchFlatInterface xarch) {
        return ArchUtil.parseValFromElem("Description", elemRef, xarch).orElse(
                UNKNOWN_VAL);
    }

    /**
     * Parses the kind of element contained in this {@code diffPart}.
     *

     * This only works for Add diffParts because Remove diffParts contain no
     * information about their removed element, just its ID.
     *
     * @param diffPartRef contains an element of a certain {@link au.uow.archelements.ArchElement.ElementType}
     * @param xarch the {@link XArchFlatInterface} used to access the element
     * @return the {@link au.uow.archelements.ArchElement.ElementType} from the {@code diffPart}
     */
    private static ElementType getElementType(final ObjRef diffPartRef,
            final XArchFlatInterface xarch) {

        ElementType elementKind = ElementType.UNKNOWN;

        for (final ElementType curElementKind : ElementType.values()) {
            final String kind = curElementKind.toString();
            if (xarch.get(diffPartRef, kind) != null) {
                elementKind = curElementKind;
                break;
            }
        }
        return elementKind;
    }

    protected ArchElement() {
    }

    public boolean isMandatory() {
        return !isOptional;

    }

    /**
     * Adds an object to the model.
     * @param typesContextRef the context that is needed to add this object
     * @param addObjRef add the object that this reference identifies to the model
     * @param xarch the {@link XArchFlatInterface} used to parse and change the model
     */
    public abstract void addToModel(ObjRef typesContextRef,
            ObjRef addObjRef, XArchFlatInterface xarch);

    @Override
    public boolean equals(final Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        final ArchElement other = (ArchElement) obj;
        /*
         * Don't consider the element reference for checking equality because
         * two elements can semantically be completely the same but have
         * different references that would make them unequal
         */
        // if (elementRef == null) {
        //     if (other.elementRef != null)
        //         return false;
        // } else if (!elementRef.equals(other.elementRef))
        //     return false;
        if (description == null) {
            if (other.description != null)
                return false;
        } else if (!description.equals(other.description))
            return false;
        if (id == null) {
            if (other.id != null)
                return false;
        } else if (!id.equals(other.id))

            return false;
        return true;
    }

    public String getDescription() {
        return this.description;
    }

    public String getId() {
        return this.id;
    }

    public ObjRef getRef() {
        return this.elementRef;
    }

    /**
     * Parses the ArchStudio model to find out whether this element is
     * optional or mandatory.
     *

     * An element is considered optional if it has the property "optional" and
     * contains an "options:Optional" block.
     *
     * @param elemRef the reference to this element in the ArchStudio model
     * @param xarch the {@link XArchFlatInterface} used to parse the model
     * @return true if this element is optional, false if it's mandatory
     */
    protected boolean parseWhetherOptional(final ObjRef elemRef,
            final XArchFlatInterface xarch) {

        boolean isOptional = false;

        /*
         * If an element has the "optional" property it means that it can be
         * optional because it was at one time "promoted to an Optional" via
         * Archipelago. But it doesn't have to be optional because if it's made

         * mandatory via Archipelago it still has that property "optional".
         */
        final XArchTypeMetadata metaData = xarch.getTypeMetadata(elemRef);
        final XArchPropertyMetadata optionalProp = metaData.getProperty("optional");

        if (optionalProp != null) {
            // If optionalProp were null, this would be an invalid operation
            final ObjRef optional = (ObjRef) xarch.get(elemRef, "Optional");
            isOptional = optional != null;
        }
        return isOptional;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        /*
         * Don't include the element reference because it's an ArchStudio
         * implementation detail
         */
        // result = prime * result
        //         + ((elementRef == null) ? 0 : elementRef.hashCode());
        result = prime * result
                + ((description == null) ? 0 : description.hashCode());
        result = prime * result + ((id == null) ? 0 : id.hashCode());
        return result;
    }

    public abstract ArchElement merge(final ArchElement otherElem);

    @Override
    public String toString() {
        return "ArchElement [id=" + id + ", description=" + description + "]";
    }

    /**

     * Parses the ID from an element.
     * @param elemRef we want to parse the ID of the element that this reference identifies
     * @param xarch the {@link XArchFlatInterface} used to parse the model
     * @return the ID of the element
     */
    protected String parseIdFromElement(final ObjRef elemRef,
            final XArchFlatInterface xarch) {
        final String id = (String) xarch.get(elemRef, "id");
        return id;
    }

    protected void setDescription(final String elemDescription) {
        this.description = elemDescription;
    }

    protected void setId(final String id) {
        this.id = id;
    }

    protected void setXarch(final XArchFlatInterface xarch) {
        this.xarch = xarch;
    }

    protected void setRef(final ObjRef ref) {
        this.elementRef = ref;
    }

    /**
     * Initializes this object by parsing the element from the ArchStudio model.
     * @param elemRef we want to parse the data from the element that this
     *                reference identifies and initialize the element with its data
     * @param xarch the {@link XArchFlatInterface} used to parse the model
     */
    abstract void init(ObjRef elemRef, XArchFlatInterface xarch);

}

Listing A.12: Abstract representation of an ArchStudio element in Java. Links, interfaces, connectors, and components count as elements in ArchStudio and are subtypes of this abstract class.
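ArchElement deliberately leaves the ArchStudio-internal element reference out of both equals and hashCode, so two elements with the same ID and description compare equal even when ArchStudio handed out different references. The contract this relies on can be shown in a minimal, self-contained sketch; the class and field names here are illustrative, not from the thesis code.

```java
import java.util.Objects;

public class ElementSketch {
    private final String id;
    private final String description;
    // Implementation detail, intentionally not part of equality
    private final Object internalRef;

    ElementSketch(final String id, final String description, final Object internalRef) {
        this.id = id;
        this.description = description;
        this.internalRef = internalRef;
    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        final ElementSketch other = (ElementSketch) obj;
        // internalRef is ignored, matching the commented-out block in ArchElement
        return Objects.equals(id, other.id)
                && Objects.equals(description, other.description);
    }

    @Override
    public int hashCode() {
        // Must ignore the same fields as equals to keep the equals/hashCode contract
        return Objects.hash(id, description);
    }

    public static void main(final String[] args) {
        final ElementSketch a = new ElementSketch("comp.1", "Clock", new Object());
        final ElementSketch b = new ElementSketch("comp.1", "Clock", new Object());
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Keeping both methods over the same field set matters: a HashSet, as used later in ArchComponent.mergeInterfaces, deduplicates only if equal objects also share a hash code.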

package au.uow.archelements;

import archstudio.comp.tron.tools.schematron.repairSuggestions.ArchUtil;
import archstudio.comp.tron.tools.schematron.repairSuggestions.diffs.ArchInterface.Direction;
import edu.uci.ics.xarchutils.ObjRef;
import edu.uci.ics.xarchutils.XArchFlatInterface;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.*;
import java.util.stream.Collectors;

import static java.util.stream.Collectors.toList;

/**
 * Represents a component in an ArchStudio model.
 *
 * @author Matthias Braun
 */
public class ArchComponent extends ArchElement {

    private static final Logger LOG = LoggerFactory
            .getLogger(ArchComponent.class);

    private List<ArchInterface> interfaces = new ArrayList<>();

    /**
     * Creates a Java representation of an ArchStudio component.
     * @param compRef the reference to the component used in ArchStudio
     * @param xarch is used as a context and to parse the model from XML
     */
    public static ArchComponent fromRef(final ObjRef compRef,
            final XArchFlatInterface xarch) {
        final ArchComponent comp = new ArchComponent();
        comp.init(compRef, xarch);
        return comp;
    }

    /**
     * Adds an {@link ArchInterface} to this component.
     * @param iface the interface to add
     */
    public void addInterface(final ArchInterface iface) {
        interfaces.add(iface);
    }

    @Override
    public void addToModel(final ObjRef typesContextRef,
            final ObjRef parentObjRef, final XArchFlatInterface xarch) {
        final ObjRef newComponentRef = xarch.create(typesContextRef,
                "component");
        // Add ID and description
        xarch.set(newComponentRef, "id", this.getId());
        final ObjRef newDescriptionRef = xarch.create(typesContextRef,
                "description");
        xarch.set(newDescriptionRef, "value", this.getDescription());
        xarch.set(newComponentRef, "description", newDescriptionRef);

        // Add the interfaces to the model
        interfaces.forEach(iface -> {
            /*
             * Interfaces need to know their containing component to add
             * themselves to the XML model
             */
            iface.setComponentRef(newComponentRef);
            iface.addToModel(typesContextRef, newComponentRef, xarch);
        });

        xarch.set(parentObjRef, "Component", newComponentRef);

    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj)
            return true;
        if (!super.equals(obj))
            return false;
        if (getClass() != obj.getClass())
            return false;
        final ArchComponent other = (ArchComponent) obj;
        if (interfaces == null) {
            if (other.interfaces != null)
                return false;
        } else if (!interfaces.equals(other.interfaces))
            return false;
        return true;
    }

    /**
     * Returns the dependencies of this component.
     *

     * A dependency is defined as a component that is connected to this
     * component via an outgoing interface and a link.
     *
     * @return the components this component depends on
     */
    public List<ArchComponent> getDependencies() {
        final List<String> outgoingLinkIds = interfaces.stream()
                .filter(iFace -> iFace.getDirection().equals(Direction.OUT))
                .map(iFace -> iFace.getLinkId())
                // If an interface has no attached link, the ID is unknown
                .filter(linkId -> !linkId.equals(ArchLink.UNKNOWN_LINK_ID))
                .collect(toList());
        final List<ArchLink> outgoingLinks = outgoingLinkIds.stream()
                .map(linkId -> ArchUtil.getLink(linkId, getXarch()))
                .collect(toList());
        final List<ArchComponent> dependencies = outgoingLinks.stream()
                .map(link -> link.getOther(this))
                // Filter out the links that are only attached to one component
                .filter(Optional::isPresent).map(Optional::get)
                .collect(toList());
        LOG.info("Dependencies of {}: {}", getDescription(), dependencies);
        return dependencies;

    }

    /**
     * Gets all the interfaces that are part of this component.
     * @return this component's interfaces
     */
    public List<ArchInterface> getInterfaces() {
        return this.interfaces;
    }

    /**
     * Gets the IDs of all the links that are attached to interfaces of this component.
     * @return the IDs of the links connected to this component
     */
    public List<String> getLinkIds() {
        return getInterfaces().stream().map(iFace -> iFace.getLinkId())
                .collect(Collectors.toList());
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = super.hashCode();
        result = prime * result
                + ((interfaces == null) ? 0 : interfaces.hashCode());
        return result;
    }

    @Override
    public ArchElement merge(final ArchElement otherElem) {
        final ArchComponent mergedComp;
        if (otherElem instanceof ArchComponent) {
            final ArchComponent otherComp = (ArchComponent) otherElem;
            // otherElem is an ArchComponent or a subtype thereof and is not null
            mergedComp = new ArchComponent();
            mergedComp
                    .setInterfaces(mergeInterfaces(otherComp.getInterfaces()));
            if (this.getDescription().equals(otherElem.getDescription())) {
                mergedComp.setDescription(this.getDescription());
            } else {

                LOG.info("Can't merge differing descriptions of " + this
                        + " and " + otherElem);
            }
            if (this.getId().equals(otherElem.getId())) {
                mergedComp.setId(this.getId());
            } else {
                LOG.info("Can't merge differing IDs of " + this + " and "
                        + otherElem);
            }
            if (this.getRef().equals(otherElem.getRef())) {
                mergedComp.setRef(this.getRef());
            } else {
                LOG.info("Can't merge differing references of " + this + " and " + otherElem);
            }
        } else {
            LOG.warn("Can't merge elements because {} is not an ArchComponent",
                    otherElem);
            mergedComp = this;
        }
        return mergedComp;
    }

    public void setInterfaces(final List<ArchInterface> interfaces) {
        this.interfaces = interfaces;
    }

    @Override
    public String toString() {
        return "ArchComponent [" + super.toString() + ", interfaces="
                + interfaces + "]";
    }

    /**
     * Merges all {@code otherInterfaces} with this component's interfaces by adding
     * them together and removing duplicates.
     * @return the merged interfaces as a list
     */
    private List<ArchInterface> mergeInterfaces(
            final List<ArchInterface> otherInterfaces) {

        // Use a set to remove duplicates
        final Set<ArchInterface> mergedInterfaces = new HashSet<>();
        mergedInterfaces.addAll(this.interfaces);
        mergedInterfaces.addAll(otherInterfaces);
        mergedInterfaces.forEach(iface -> LOG.info(
                "Descr: {} | Hashcode: {}", iface.getDescription(),
                iface.hashCode()));

        return new ArrayList<>(mergedInterfaces);
    }

    /**
     * Parses the interfaces from an ArchStudio component.
     * @param compRef we parse the interfaces from the component that this reference identifies
     * @param xarch the {@link XArchFlatInterface} used to parse and manipulate the model
     * @return a list of the parsed interfaces as {@link ArchInterface}s
     */
    protected List<ArchInterface> parseInterfaces(final ObjRef compRef,
            final XArchFlatInterface xarch) {

        final ObjRef[] interfaceRefs = xarch.getAll(compRef, "Interface");
        final List<ArchInterface> interfaces = Arrays
                .stream(interfaceRefs)
                .map(interfaceRef -> ArchInterface.fromRef(interfaceRef, xarch))
                .collect(toList());

        return interfaces;
    }

    @Override
    void init(final ObjRef compRef, final XArchFlatInterface xarch) {
        final String compId = parseIdFromElement(compRef, xarch);
        setId(compId);
        final String compDescr = parseDescriptionFromElement(compRef,
                xarch);
        setDescription(compDescr);
        final List<ArchInterface> interfaces = parseInterfaces(compRef, xarch);
        this.interfaces = interfaces;
        this.setXarch(xarch);

        setOptional(parseWhetherOptional(compRef, xarch));
    }
}

Listing A.13: Representation of an ArchStudio component in Java. It knows how to create itself by parsing the underlying XML ArchStudio model.
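The merge strategy of mergeInterfaces in Listing A.13 is a set-based union: duplicates, as defined by equals/hashCode, collapse into a single entry. A minimal, self-contained sketch of the same idea, with plain strings standing in for ArchInterface objects:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeSketch {
    static List<String> merge(final List<String> ours, final List<String> theirs) {
        // Use a set to remove duplicates, as mergeInterfaces does
        final Set<String> merged = new HashSet<>();
        merged.addAll(ours);
        merged.addAll(theirs);
        return new ArrayList<>(merged);
    }

    public static void main(final String[] args) {
        // "out.2" appears on both sides and survives only once
        final List<String> merged = merge(List.of("in.1", "out.2"), List.of("out.2", "in.3"));
        System.out.println(merged.size());
    }
}
```

Note that a HashSet does not preserve insertion order; if the merged interfaces needed a stable order, a LinkedHashSet would be the drop-in choice.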

package au.uow.archelements;

import java.util.List;
import java.util.Locale;
import java.util.Optional;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import archstudio.comp.tron.tools.schematron.repairSuggestions.ArchUtil;
import edu.uci.ics.xarchutils.ObjRef;
import edu.uci.ics.xarchutils.XArchFlatInterface;

/**
 * Represents an interface that is part of an {@link ArchComponent} in the ArchStudio model.
 */
public class ArchInterface extends ArchElement {

    /**
     * The different directions an interface can have.
     */
    public enum Direction {
        IN, OUT, INOUT, NONE, UNKNOWN;

        /**
         * ArchStudio expects the directions to be lower case.
         */

        @Override
        public String toString() {
            return this.name().toLowerCase(Locale.ROOT);
        }
    }

    private static final Logger LOG = LoggerFactory
            .getLogger(ArchInterface.class);

    private Direction direction;

    /**
     * The reference to the component that contains this interface
     */
    private ObjRef componentRef;
    private String connectedLinkId;

    /**
     * Creates an instance of this class.
     * @param description the description of this interface
     * @param id the ID of this interface
     * @param direction the direction of this interface
     * @return an instance of this class
     */
    public static ArchInterface create(final String description,
            final String id, final Direction direction) {
        final ArchInterface iface = new ArchInterface();
        iface.setDescription(description);
        iface.setDirection(direction);
        iface.setId(id);
        return iface;
    }

    /**
     * Parses the data from the ArchStudio model to create an instance of this class.
     * @param interfaceRef the interface identified by this reference in the
     *                     ArchStudio model provides the data for the created instance
     * @param xarch is used as a context and to parse the model from XML
     * @return an instance of this class
     */

    public static ArchInterface fromRef(final ObjRef interfaceRef,
            final XArchFlatInterface xarch) {
        final ArchInterface iFace = new ArchInterface();
        iFace.init(interfaceRef, xarch);
        iFace.componentRef = xarch.getParent(interfaceRef);
        final ObjRef modelRef = xarch.getXArch(interfaceRef);
        iFace.connectedLinkId = parseConnectedLinkId(modelRef, iFace.getId(),
                xarch);
        iFace.setXarch(xarch);

        return iFace;
    }

    /**
     * Finds the ID of the link the interface with {@code ifaceId} is attached to.
     * @param modelRef the reference to the model which contains this interface
     * @param ifaceId we want to find the link connected to the interface with this ID
     * @param xarch is used as a context and to parse the model from XML
     * @return the ID of the link
     */
    private static String parseConnectedLinkId(final ObjRef modelRef,
            final String ifaceId, final XArchFlatInterface xarch) {
        String connectedLinkId = ArchLink.UNKNOWN_LINK_ID;
        final List<ObjRef> linkRefs = ArchUtil.getLinks(modelRef, xarch);
        for (final ObjRef linkRef : linkRefs) {
            final List<String> interfaces = ArchUtil.getInterfaceIds(linkRef,
                    xarch);
            if (interfaces.contains(ifaceId)) {
                connectedLinkId = ArchUtil.parseValFromElem("id", linkRef,
                        xarch).orElse(ArchLink.UNKNOWN_LINK_ID);
            }
        }
        return connectedLinkId;
    }

    private ArchInterface() {

    }

    @Override
    public void addToModel(final ObjRef typesContextRef,
            final ObjRef parentObjRef, final XArchFlatInterface xarch) {
        final ObjRef ifaceRef = xarch.create(typesContextRef, "interface");
        xarch.add(componentRef, "interface", ifaceRef);

        // Set the interface's ID
        xarch.set(ifaceRef, "id", this.getId());

        // Set the interface's description
        final ObjRef ifaceDescRef = xarch
                .create(typesContextRef, "description");
        xarch.set(ifaceDescRef, "value", this.getDescription());
        xarch.set(ifaceRef, "description", ifaceDescRef);

        // Set the interface's direction
        final ObjRef ifaceDirectionRef = xarch.create(typesContextRef,
                "direction");
        xarch.set(ifaceDirectionRef, "value", this.getDirection().toString());
        xarch.set(ifaceRef, "direction", ifaceDirectionRef);

    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj)
            return true;
        if (!super.equals(obj))
            return false;
        if (getClass() != obj.getClass())
            return false;
        final ArchInterface other = (ArchInterface) obj;
        /*
         * Don't consider the component reference for checking equality because
         * two interfaces can semantically be completely the same but their
         * components may have different references that would make them unequal

         */
        // if (componentRef == null) {
        //     if (other.componentRef != null)
        //         return false;
        // } else if (!componentRef.equals(other.componentRef))
        //     return false;
        return direction == other.direction;
    }

    /**
     * @return the component this interface is a part of
     */
    public ArchComponent getContainingComp() {
        final ArchComponent component = ArchComponent.fromRef(componentRef,
                getXarch());
        return component;
    }

    public Direction getDirection() {
        return direction;
    }

    /**
     * Gets the link attached to this interface.
     *
     * @return the link attached to this interface
     */
    public Optional<ArchLink> getLink() {

        if (this.connectedLinkId == null
                || this.connectedLinkId.equals(ArchLink.UNKNOWN_LINK_ID)) {
            return Optional.empty();
        } else {
            return Optional.of(ArchUtil.getLink(this.connectedLinkId,
                    getXarch()));
        }
    }

    public String getLinkId() {
        return connectedLinkId;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = super.hashCode();
        result = prime * result
                + ((direction == null) ? 0 : direction.hashCode());
        return result;
    }

    @Override
    public ArchElement merge(final ArchElement otherElem) {
        throw new UnsupportedOperationException("Implement me");
    }

    public void setComponentRef(final ObjRef componentRef) {
        this.componentRef = componentRef;
    }

    @Override
    public String toString() {
        return "ArchInterface [direction=" + direction + ", getDescription()="
                + getDescription() + ", getId()=" + getId() + "]";
    }

    /**
     * Parses the direction from the {@code elemRef}.
     * @param elemRef the reference to the element that might contain the direction of the interface
     * @param xarch is used as a context and to parse the model from XML
     * @return the {@link Direction} of the interface
     */
    private Direction parseDirection(final ObjRef elemRef,
            final XArchFlatInterface xarch) {
        final Optional<String> directionStrMaybe = ArchUtil.parseValFromElem(
                "Direction", elemRef, xarch);

        final Direction direction;
        if (directionStrMaybe.isPresent()) {
            switch (directionStrMaybe.get()) {

            case "in":
                direction = Direction.IN;
                break;
            case "out":
                direction = Direction.OUT;
                break;
            case "inout":
                direction = Direction.INOUT;
                break;
            case "none":
                direction = Direction.NONE;
                break;
            default:
                direction = Direction.UNKNOWN;
            }
        } else {
            direction = Direction.UNKNOWN;
        }
        return direction;
    }

    private void setDirection(final Direction direction) {
        this.direction = direction;
    }

    @Override
    void init(final ObjRef elemRef, final XArchFlatInterface xarch) {
        final String interfaceDescr = parseDescriptionFromElement(elemRef,
                xarch);
        setDescription(interfaceDescr);
        final String id = parseIdFromElement(elemRef, xarch);

        setId(id);
        final Direction direction = parseDirection(elemRef, xarch);
        this.setDirection(direction);
        setOptional(parseWhetherOptional(elemRef, xarch));
    }

}

Listing A.14: Representation of an ArchStudio interface in Java. It knows how to create itself by parsing the underlying XML ArchStudio model.
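Both ElementType.from in Listing A.12 and parseDirection in Listing A.14 parse strings defensively: unrecognized input maps to an explicit UNKNOWN constant instead of throwing, so model parsing can continue past malformed data. A minimal, self-contained sketch of that pattern (the enum mirrors ArchInterface.Direction; the class name is illustrative):

```java
import java.util.Locale;

public class DirectionSketch {
    enum Direction { IN, OUT, INOUT, NONE, UNKNOWN }

    static Direction parse(final String raw) {
        // Lower-case first so the match is case-insensitive, as in ElementType.from
        switch (raw.toLowerCase(Locale.ROOT)) {
        case "in":
            return Direction.IN;
        case "out":
            return Direction.OUT;
        case "inout":
            return Direction.INOUT;
        case "none":
            return Direction.NONE;
        default:
            // Anything else, including malformed model data, is UNKNOWN
            return Direction.UNKNOWN;
        }
    }

    public static void main(final String[] args) {
        System.out.println(parse("OUT"));
        System.out.println(parse("sideways"));
    }
}
```

Compared with Enum.valueOf, which throws IllegalArgumentException on unknown input, the explicit default keeps the parser total over arbitrary strings.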

package au.uow.archelements;

import static java.util.stream.Collectors.toList;

import java.util.List;
import java.util.Optional;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import archstudio.comp.tron.tools.schematron.repairSuggestions.ArchUtil;
import edu.uci.ics.xarchutils.ObjRef;
import edu.uci.ics.xarchutils.XArchFlatInterface;

/**
 * Represents a link that is potentially attached to an {@link ArchInterface} in the ArchStudio model.
 */
public class ArchLink extends ArchElement {
    public static final String UNKNOWN_LINK_ID = "UNKNOWN LINK ID";
    private List<String> interfaceIds;

    private static final Logger LOG = LoggerFactory.getLogger(ArchLink.class);

    /**
     * Parses the data from the ArchStudio model to create an instance of this class.
     * @param linkRef the data for the link we create is parsed from the
     *        link identified by this reference in the ArchStudio model
     * @param xarch is used as a context and to parse the model from XML
     * @return an instance of this class
     */
    public static ArchLink fromRef(final ObjRef linkRef,
            final XArchFlatInterface xarch) {
        final ArchLink link = new ArchLink();
        link.init(linkRef, xarch);
        return link;
    }

    @Override
    public void addToModel(final ObjRef typesContextRef,
            final ObjRef parentObjRef, final XArchFlatInterface xarch) {
        final ObjRef linkRef = xarch.create(typesContextRef, "link");
        // Add ID and description
        xarch.set(linkRef, "id", this.getId());
        final ObjRef newDescriptionRef = xarch.create(typesContextRef,
                "description");
        xarch.set(newDescriptionRef, "value", this.getDescription());
        xarch.set(linkRef, "description", newDescriptionRef);

        // Add the points the link is attached to
        interfaceIds.forEach(id -> addEndPointToLink(id, typesContextRef,
                xarch, linkRef));

        // Add the link to its parent element in the XML
        xarch.set(parentObjRef, "Link", linkRef);
    }

    /**
     * Finds the component that is connected to {@code comp} via this link.
     *
     * @param comp we are interested in the component that is connected to
     *        {@code comp} via this link
     * @return the component that is connected to {@code comp} via this link
     */
    public Optional<ArchComponent> getOther(final ArchComponent comp) {

        // The interfaces of this link
        final List<ArchInterface> interfaces = interfaceIds.stream()
                .map(iFaceId -> ArchUtil.getInterface(iFaceId, getXarch()))
                .collect(toList());

        // The components connected to comp (should be zero or one)
        final List<ArchComponent> otherComps = interfaces.stream()
                .map(iFace -> iFace.getContainingComp())
                .filter(otherComp -> !comp.equals(otherComp)).collect(toList());

        ArchComponent resComp;
        if (otherComps.size() == 1) {
            resComp = otherComps.get(0);
        } else {
            LOG.info("There are {} components connected to component {}",
                    otherComps.size(), comp.getId());
            resComp = null;
        }
        return Optional.ofNullable(resComp);
    }

    @Override
    public ArchElement merge(final ArchElement otherElem) {
        throw new UnsupportedOperationException("Implement me");
    }

    @Override
    public String toString() {
        return "ArchLink{" + "interfaceIds=" + interfaceIds + "} "
                + super.toString();
    }

    /**
     * Adds an endpoint to a link in order to connect it with an interface.
     * @param interfaceId we want the link to be connected to the interface with this ID
     * @param typesContextRef needed to manipulate the model
     * @param xarch is used as a context and to manipulate the model in XML
     * @param linkRef the reference to the link in the ArchStudio model we want to connect
     */
    private void addEndPointToLink(final String interfaceId,
            final ObjRef typesContextRef, final XArchFlatInterface xarch,
            final ObjRef linkRef) {

        /*
         * Create a link whether or not the interface ID is null. Null means
         * that the link's end point is not attached to a component's or
         * connector's interface. This is the same way ArchStudio handles
         * interface IDs that are null
         */
        final ObjRef newPointRef = xarch.create(typesContextRef, "point");
        final ObjRef newAnchorRef = xarch.create(typesContextRef, "XMLLink");
        xarch.set(newPointRef, "anchorOnInterface", newAnchorRef);

        xarch.add(linkRef, "point", newPointRef);

        if (interfaceId != null) {
            xarch.set(newAnchorRef, "type", "simple");

            // ArchStudio uses the # to find the interfaces
            String hashOrNothing;
            if (interfaceId.startsWith("#")) {
                hashOrNothing = "";
            } else {
                hashOrNothing = "#";
            }
            xarch.set(newAnchorRef, "href", hashOrNothing + interfaceId);
        }
    }

    private void setInterfaceIds(final List<String> interfaceIds) {
        this.interfaceIds = interfaceIds;
    }

    @Override
    void init(final ObjRef elemRef, final XArchFlatInterface xarch) {
        final String id = parseIdFromElement(elemRef, xarch);
        final String descr = parseDescriptionFromElement(elemRef, xarch);
        setId(id);
        setDescription(descr);
        final List<String> interfaceIds = ArchUtil.getInterfaceIds(elemRef,
                xarch);
        setInterfaceIds(interfaceIds);
        setXarch(xarch);
        setOptional(parseWhetherOptional(elemRef, xarch));
    }
}

Listing A.15: Representation of an ArchStudio link in Java. It knows how to create itself by parsing the underlying XML ArchStudio model.

A.2.2 ArchStudio Rules

package au.uow.rules

import _root_.au.uow.archelements.ArchModel

/**
 * Common trait of ArchRules that check [[ArchModel]]s for inconsistencies.
 * Created by Matthias Braun on 11/5/14.
 */
trait ArchRule {

  /**
   * Checks whether the given ArchModel conforms to this rule.
   * @param model the ArchModel that is checked by the rule
   * @return an ArchRuleResult that contains the outcome of the rule checking
   */
  def check(model: ArchModel): ArchRuleResult

  /** Returns the textual description of this rule */
  def description: RuleDescription
}

case class RuleDescription(text: String)

Listing A.16: The trait defining the operations that a rule for ArchStudio must support.

package au.uow.rules

/**
 * Contains the judgment of an ArchRule about an ArchModel.
 * @param text what the rule has to say about the checked model in plain English
 * @param resultType whether the ArchModel conforms to the rule
 * @param descr the description of the rule that was applied to the model
 */
case class ArchRuleResult(text: String, resultType: ArchRuleResultType)(implicit descr: RuleDescription) {

  override def toString: String = {
    val ruleResult = resultType match {
      case TestPassed => "Passed"
      case TestFailed => "Failed"
    }
    s"Result of rule '${descr.text}': $text --> $ruleResult"
  }
}

/**
 * Indicates whether an ArchModel has passed the assessment of an [[au.uow.rules.ArchRule]].
 */
sealed trait ArchRuleResultType

case object TestPassed extends ArchRuleResultType

case object TestFailed extends ArchRuleResultType

Listing A.17: The outcome of a rule evaluation, telling us whether the model conforms to the rule.

package au.uow.rules

import _root_.au.uow.archelements.{ArchComponent, ArchModel}

import scala.collection.JavaConversions._
import scala.collection.JavaConverters._

/**
 * This rule verifies that there are no circular dependencies in an ArchStudio model.
 * Created by Matthias Braun on 10/17/14.
 */
object CircularDependency extends ArchRule {

  implicit def description = RuleDescription("Circular dependencies between components are forbidden")

  /** @return whether there is a circular dependency in the `model` */
  def check(model: ArchModel) =
    // A component indirectly depending on itself constitutes a circular dependency
    if (model.getComponents.exists(dependsOnItself))
      ArchRuleResult("Circular dependency detected", TestFailed)
    else
      ArchRuleResult("No circular dependencies", TestPassed)

  /**
   * Returns true if an [[ArchComponent]] `comp` depends on itself.
   * @param comp the ArchComponent that might depend on itself
   * @return true if `comp` has a dependency on itself, directly or indirectly
   */
  def dependsOnItself(comp: ArchComponent) = {
    val dependencies = comp.getDependencies.asScala.toList
    compDependsOnItself(comp, dependencies, List())
  }

  /**
   * Returns true if an [[ArchComponent]] `comp` has a direct or indirect dependency on itself.
   * @param comp the ArchComponent that might depend on itself
   * @param curDeps if these dependencies contain `comp` then there is a circular dependency in the model
   * @param checkedDeps these components were already checked whether they contain the original component
   * @return true if comp has a dependency on itself, directly or indirectly
   */
  def compDependsOnItself(comp: ArchComponent, curDeps: List[ArchComponent], checkedDeps: List[ArchComponent]): Boolean = {

    if (curDeps.contains(comp)) {
      /* The component under test has itself directly or indirectly as a dependency
       * -> Circular dependency detected */
      true
    } else {
      curDeps.foreach(dependency => {
        // Check the dependencies of the dependency if we haven't done so yet
        if (!checkedDeps.contains(dependency)) {
          val newDependencies = dependency.getDependencies.asScala.toList
          return compDependsOnItself(comp, newDependencies, checkedDeps ++ curDeps)
        }
      })
      /* We've visited all dependencies without meeting the original component again -> No circular dependency */
      false
    }
  }
}

Listing A.18: ArchStudio rule for detecting circular dependencies in a model.

package au.uow.rules

import _root_.au.uow.archelements.ArchInterface.Direction._
import _root_.au.uow.archelements.{ArchConnector, ArchModel}
import scala.collection.JavaConversions._

/**
 * Rule finding connectors that have no incoming interfaces.
 * Created by Matthias Braun on 4/1/15.
 */
object ConnectorHasIncomingIface extends ArchRule {

  implicit def description = RuleDescription("Connectors must have an incoming interface")

  val hasIncomingIface = (_: ArchConnector).getInterfaces.exists(_.getDirection == IN)
  val connectorsWithNoIncomingIfaces = (_: ArchModel).getConnectors.filterNot(hasIncomingIface)

  /** @return whether all connectors in the model have an incoming interface */
  def check(model: ArchModel) =
    if (connectorsWithNoIncomingIfaces(model).isEmpty)
      ArchRuleResult("All connectors have at least one incoming interface", TestPassed)
    else
      ArchRuleResult("These connectors don't have an incoming interface: " + connectorsWithNoIncomingIfaces(model), TestFailed)
}

Listing A.19: ArchStudio rule for finding connectors inside a model that have no incoming interface.

package au.uow.rules

import _root_.au.uow.archelements.ArchInterface.Direction._
import _root_.au.uow.archelements.{ArchConnector, ArchModel}
import scala.collection.JavaConversions._

/**
 * This rule verifies that every connector in an ArchModel has at least one outgoing interface.
 * Created by Matthias Braun on 11/5/14.
 */
object ConnectorHasOutgoingIface extends ArchRule {

  implicit def description = RuleDescription("Connectors must have an outgoing interface")

  val hasOutgoingIface = (_: ArchConnector).getInterfaces.exists(_.getDirection == OUT)
  val connectorsWithNoOutgoingIfaces = (_: ArchModel).getConnectors.filterNot(hasOutgoingIface)

  /** @return whether all connectors in the model have an outgoing interface */
  def check(model: ArchModel) =
    if (connectorsWithNoOutgoingIfaces(model).isEmpty)
      ArchRuleResult("All connectors have at least one outgoing interface", TestPassed)
    else
      ArchRuleResult("These connectors don't have an outgoing interface: " + connectorsWithNoOutgoingIfaces(model), TestFailed)
}

Listing A.20: ArchStudio rule for finding connectors inside a model that have no outgoing interface.

package au.uow.rules

import _root_.au.uow.archelements.{ArchComponent, ArchModel}
import scala.collection.JavaConversions._

/**
 * This rule verifies that every mandatory component has at least one mandatory interface.
 * Created by Matthias Braun on 11/11/14.
 */
object MandatoryComponentHasMandatoryIface extends ArchRule {

  implicit def description = RuleDescription("Mandatory components must have at least one mandatory interface")

  /** @return whether this model contains a mandatory component that only has optional interfaces */
  def check(model: ArchModel) = {

    // Filter out those mandatory components that have a mandatory interface.
    // This leaves us with the potentially unconnected components
    val mandComps = model.getComponents.filter(_.isMandatory)
    val hasMandatoryIface = (_: ArchComponent).getInterfaces.exists(_.isMandatory)
    val mandCompsWithoutMandIface = mandComps.filterNot(hasMandatoryIface)

    if (mandCompsWithoutMandIface.isEmpty)
      ArchRuleResult("Each mandatory component has a mandatory interface", TestPassed)
    else {
      val affectedCompsDescription = mandCompsWithoutMandIface.map(_.getDescription).mkString(", ")
      ArchRuleResult("Mandatory components without a mandatory interface: " + affectedCompsDescription, TestFailed)
    }
  }
}

Listing A.21: ArchStudio rule for finding mandatory components inside a model that have only optional interfaces.

package au.uow.rules

import _root_.au.uow.archelements.ArchModel
import scala.collection.JavaConversions._

/**
 * This rule verifies that there is at least one mandatory component in the model.
 * Created by Matthias Braun on 11/11/14.
 */
object ModelHasMandatoryComponents extends ArchRule {

  implicit def description = RuleDescription("Model must have at least one mandatory component")

  /** @return whether there are mandatory components within the model */
  def check(model: ArchModel) =
    if (model.getComponents.exists(_.isMandatory))
      ArchRuleResult("Model has a mandatory component", TestPassed)
    else
      ArchRuleResult("Model doesn't have any mandatory components", TestFailed)
}

Listing A.22: ArchStudio rule for detecting whether a model has mandatory components.

package au.uow.rules

import _root_.au.uow.archelements.{ArchComponent, ArchModel}
import scala.collection.JavaConversions._

/**
 * This rule verifies that every mandatory component has at least one mandatory interface.
 * Created by Matthias Braun on 11/11/14.
 */
object MandatoryComponentHasMandatoryIface extends ArchRule {

  implicit def description = RuleDescription("Mandatory components must have at least one mandatory interface")

  /** @return whether this model contains a mandatory component that only has optional interfaces */
  def check(model: ArchModel) = {

    // Filter out those mandatory components that have a mandatory interface.
    // This leaves us with the potentially unconnected components
    val mandComps = model.getComponents.filter(_.isMandatory)
    val hasMandatoryIface = (_: ArchComponent).getInterfaces.exists(_.isMandatory)
    val mandCompsWithoutMandIface = mandComps.filterNot(hasMandatoryIface)

    if (mandCompsWithoutMandIface.isEmpty)
      ArchRuleResult("Each mandatory component has a mandatory interface", TestPassed)
    else {
      val affectedCompsDescription = mandCompsWithoutMandIface.map(_.getDescription).mkString(", ")
      ArchRuleResult("Mandatory components without a mandatory interface: " + affectedCompsDescription, TestFailed)
    }
  }
}

Listing A.23: ArchStudio rule for finding components that have mandatory links on their optional interfaces.

Appendix B

Performance Measurements Raw Data

B.1 Efficiency

INFO: 6 input products: P0, P1, P12, P16, P2, P8, (Main.java:200) Main.start
INFO: Tree pruning enabled: false (Main.java:201) Main.start
INFO: Nr of removed nodes: 0 (Main.java:202) Main.start
INFO: Total number of code blocks created: 20146 (Main.java:203) Main.start
INFO: 5 desired features: Diagrams, ArgoUML, Class, COGNITIVE, LOGGING, (Main.java:71) Main.main
INFO: Runtime: 39811 ms (Main.java:73) Main.main

Listing B.1: Efficiency measurement: generate a new ArgoUML product with five features from six input products. Tree pruning via rule application is turned off.

INFO: 6 input products: P0, P1, P12, P16, P2, P8, (Main.java:200) Main.start
INFO: Tree pruning enabled: true (Main.java:201) Main.start
INFO: Nr of removed nodes: 6594 (Main.java:202) Main.start
INFO: Total number of code blocks created: 20146 (Main.java:203) Main.start
INFO: 5 desired features: Diagrams, ArgoUML, Class, COGNITIVE, LOGGING, (Main.java:71) Main.main
INFO: Runtime: 63424 ms (Main.java:73) Main.main

Listing B.2: Efficiency measurement: generate a new ArgoUML product with five features from six input products. Tree pruning via rule application is turned on.

INFO: 3 input products: P10, P4, P5, (Main.java:200) Main.start
INFO: Tree pruning enabled: false (Main.java:201) Main.start
INFO: Nr of removed nodes: 0 (Main.java:202) Main.start
INFO: Total number of code blocks created: 74 (Main.java:203) Main.start
INFO: 5 desired features: drawRect, drawLine, wipe, DPL, color, (Main.java:71) Main.main
INFO: Runtime: 1530 ms (Main.java:73) Main.main

Listing B.3: Efficiency measurement: generate a new Draw Product Line product with five features from three input products. Tree pruning via rule application is turned off.

INFO: 3 input products: P10, P4, P5, (Main.java:200) Main.start
INFO: Tree pruning enabled: true (Main.java:201) Main.start
INFO: Nr of removed nodes: 16 (Main.java:202) Main.start
INFO: Total number of code blocks created: 72 (Main.java:203) Main.start
INFO: 5 desired features: drawRect, drawLine, wipe, DPL, color, (Main.java:71) Main.main
INFO: Runtime: 1609 ms (Main.java:73) Main.main

Listing B.4: Efficiency measurement: generate a new Draw Product Line product with five features from three input products. Tree pruning via rule application is turned on.

INFO: 4 input products: P11, P17, P29, P3, (Main.java:200) Main.start
INFO: Tree pruning enabled: false (Main.java:201) Main.start
INFO: Nr of removed nodes: 0 (Main.java:202) Main.start
INFO: Total number of code blocks created: 525 (Main.java:203) Main.start
INFO: 11 desired features: StopMovie, QuitPlayer, Pause, StartPlayer, VRCInterface, VOD, ChangeServer, Detail, StartMovie, SelectMovie, PlayImm, (Main.java:71) Main.main
INFO: Runtime: 3247 ms (Main.java:73) Main.main

Listing B.5: Efficiency measurement: generate a new Video On Demand product with eleven features from four input products. Tree pruning via rule application is turned off.

INFO: 4 input products: P11, P17, P29, P3, (Main.java:200) Main.start
INFO: Tree pruning enabled: true (Main.java:201) Main.start
INFO: Nr of removed nodes: 6 (Main.java:202) Main.start
INFO: Total number of code blocks created: 525 (Main.java:203) Main.start
INFO: 11 desired features: StopMovie, QuitPlayer, Pause, StartPlayer, VRCInterface, VOD, ChangeServer, Detail, StartMovie, SelectMovie, PlayImm, (Main.java:71) Main.main
INFO: Runtime: 5121 ms (Main.java:73) Main.main

Listing B.6: Efficiency measurement: generate a new Video On Demand product with eleven features from four input products. Tree pruning via rule application is turned on.
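The pruning numbers above can be put in relative terms. The following sketch derives from the raw values in Listings B.1 and B.2 how many of the ArgoUML code blocks the rules pruned and how much runtime the rule application added. The class PruningStats and its percent helper are illustrative and not part of ECCO.

```java
// Ratios derived from the raw numbers in Listings B.1 and B.2 (ArgoUML, six input products).
final class PruningStats {

    /** Returns which percentage of {@code whole} the value {@code part} is. */
    static double percent(double part, double whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        // 6594 of the 20146 created code blocks were removed by tree pruning (Listing B.2)
        System.out.printf("pruned blocks: %.1f%%%n", percent(6594, 20146));
        // The runtime grew from 39811 ms (pruning off, Listing B.1) to 63424 ms (pruning on, Listing B.2)
        System.out.printf("runtime overhead: %.1f%%%n", percent(63424 - 39811, 39811));
    }
}
```

With these inputs the sketch reports that roughly a third of the code blocks were pruned while the runtime grew by roughly 60 percent.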

B.2 Effectiveness

Input products: [P1, P3, P11]
Target product: P2 (LOGGING, Diagrams, Class, ArgoUML)
Nr of code blocks (excluding alternate variants): 20506
Total nr of code blocks (including alternate variants) before filtering: 20534
Total nr of code blocks (including alternate variants) after filtering: 20498
Average nr of variants per code block before filtering: 1.001365454
Average nr of variants per code block after filtering: 0.9996098703
Nr of code blocks that were corrected (but not filtered out): 9
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 2.2173913043
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 1.2173913043

Listing B.7: Effectiveness measurement: generate a new ArgoUML product with four features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P4, P5, P10]
Target product: P12 (ACTIVITYDIAGRAM, STATEDIAGRAM, Diagrams, Class, ArgoUML)
Nr of code blocks (excluding alternate variants): 19145
Total nr of code blocks (including alternate variants) before filtering: 19282
Total nr of code blocks (including alternate variants) after filtering: 19260
Average nr of variants per code block before filtering: 1.0071559154
Average nr of variants per code block after filtering: 1.0060067903
Nr of code blocks that were corrected (but not filtered out): 13
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 4.6052631579
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 4.3684210526

Listing B.8: Effectiveness measurement: generate a new ArgoUML product with five features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P5, P8, P10]
Target product: P7 (Diagrams, ArgoUML, ACTIVITYDIAGRAM, Class, COGNITIVE, LOGGING)
Nr of code blocks (excluding alternate variants): 20759
Total nr of code blocks (including alternate variants) before filtering: 20891
Total nr of code blocks (including alternate variants) after filtering: 20861
Average nr of variants per code block before filtering: 1.0063586878
Average nr of variants per code block after filtering: 1.0049135315
Nr of code blocks that were corrected (but not filtered out): 15
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 4.5675675676
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 4.1081081081

Listing B.9: Effectiveness measurement: generate a new ArgoUML product with six features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products --> target product

[P4, P5, P10] --> P12
[P5, P8, P10] --> P7
[P1, P3, P11] --> P2

Average number of variants per code block before filtering: 1.0049164046
Average number of variants per code block after filtering: 1.0034596921
Total nr of variants that were corrected (but not filtered out): 37
Average number of variants per code block before filtering (excluding those code blocks that had only one variant): 4.0306122449
Average number of variants per code block after filtering (excluding those code blocks that had only one variant): 3.1326530612

Listing B.10: Average numbers for the results from the last three ArgoUML effectiveness measurements (B.7, B.8, and B.9).
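One way to read the averages in Listing B.10 is as the share of surplus variants that the rules eliminate: every code block needs exactly one variant, so anything above an average of 1.0 is excess. The sketch below computes that share from the ArgoUML averages; the class EffectivenessStats and its excessRemoved helper are illustrative and not part of ECCO.

```java
// Interprets the ArgoUML averages from Listing B.10.
final class EffectivenessStats {

    /**
     * Fraction of the excess variants (those beyond the single variant every
     * code block needs) that the rule-based filtering removed.
     */
    static double excessRemoved(double avgBefore, double avgAfter) {
        return (avgBefore - avgAfter) / (avgBefore - 1.0);
    }

    public static void main(String[] args) {
        // Averages over all code blocks, before and after filtering (Listing B.10)
        System.out.printf("excess variants removed: %.1f%%%n",
                100.0 * excessRemoved(1.0049164046, 1.0034596921));
    }
}
```

With these inputs the sketch reports that close to 30 percent of the surplus variants were filtered out.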

Input products: [P1, P3, P11]
Target product: P2 (DPL, drawRect)
Nr of code blocks (excluding alternate variants): 58
Total nr of code blocks (including alternate variants) before filtering: 58
Total nr of code blocks (including alternate variants) after filtering: 58
Average nr of variants per code block before filtering: 1
Average nr of variants per code block after filtering: 1
Nr of code blocks that were corrected (but not filtered out): 0
There are no code blocks with multiple variants

Listing B.11: Effectiveness measurement: generate a new Draw Product Line product with two features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P4, P5, P10]
Target product: P12 (DPL, drawLine, drawRect, color, wipe)
Nr of code blocks (excluding alternate variants): 60
Total nr of code blocks (including alternate variants) before filtering: 74
Total nr of code blocks (including alternate variants) after filtering: 72
Average nr of variants per code block before filtering: 1.2333333333
Average nr of variants per code block after filtering: 1.2
Nr of code blocks that were corrected (but not filtered out): 4
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 3.8
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 3.4

Listing B.12: Effectiveness measurement: generate a new Draw Product Line product with five features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P5, P8, P10]
Target product: P7
Nr of code blocks (excluding alternate variants): 61
Total nr of code blocks (including alternate variants) before filtering: 73
Total nr of code blocks (including alternate variants) after filtering: 72
Average nr of variants per code block before filtering: 1.1967213115
Average nr of variants per code block after filtering: 1.1803278689
Nr of code blocks that were corrected (but not filtered out): 6
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 2
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 1.9166666667

Listing B.13: Effectiveness measurement: generate a new Draw Product Line product with three features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products --> target product

[P4, P5, P10] --> P12
[P5, P8, P10] --> P7
[P1, P3, P11] --> P2

Average number of variants per code block before filtering: 1.1452513966
Average number of variants per code block after filtering: 1.1284916201
Total nr of variants that were corrected (but not filtered out): 10
Average number of variants per code block before filtering (excluding those code blocks that had only one variant): 2.5294117647
Average number of variants per code block after filtering (excluding those code blocks that had only one variant): 2.3529411765

Listing B.14: Average numbers for the results from the last three Draw Product Line effectiveness measurements (B.11, B.12, and B.13).

Input products: [P1, P3, P11]
Target product: P2 (SelectMovie, PlayImm, VRCInterface, StopMovie, StartMovie, StartPlayer, VOD)
Nr of code blocks (excluding alternate variants): 433
Total nr of code blocks (including alternate variants) before filtering: 433
Total nr of code blocks (including alternate variants) after filtering: 433
Average nr of variants per code block before filtering: 1
Average nr of variants per code block after filtering: 1
Nr of code blocks that were corrected (but not filtered out): 3
There are no code blocks with multiple variants

Listing B.15: Effectiveness measurement: generate a new Video On Demand product with seven features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P4, P5, P10]
Target product: P12
Nr of code blocks (excluding alternate variants): 440
Total nr of code blocks (including alternate variants) before filtering: 510
Total nr of code blocks (including alternate variants) after filtering: 510
Average nr of variants per code block before filtering: 1.1590909091
Average nr of variants per code block after filtering: 1.1590909091
Nr of code blocks that were corrected (but not filtered out): 3
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 36
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 36

Listing B.16: Effectiveness measurement: generate a new Video On Demand product with eight features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products: [P5, P8, P10]
Target product: P7 (SelectMovie, PlayImm, Pause, VRCInterface, StopMovie, StartMovie, QuitPlayer, StartPlayer, VOD)
Nr of code blocks (excluding alternate variants): 438
Total nr of code blocks (including alternate variants) before filtering: 508
Total nr of code blocks (including alternate variants) after filtering: 438
Average nr of variants per code block before filtering: 1.1598173516
Average nr of variants per code block after filtering: 1
Nr of code blocks that were corrected (but not filtered out): 3
Average number of variants per code block before filtering (not considering code blocks that had only one variant before filtering): 36
Average number of variants per code block after filtering (not considering code blocks that had only one variant before filtering): 1

Listing B.17: Effectiveness measurement: generate a new Video On Demand product with nine features from three input products. Compare the number of variants per code block before and after the rules have filtered the code.

Input products --> target product

[P4, P5, P10] --> P12
[P5, P8, P10] --> P7
[P1, P3, P11] --> P2

Average number of variants per code block before filtering: 1.1067887109
Average number of variants per code block after filtering: 1.0533943555
Total nr of variants that were corrected (but not filtered out): 9
Average number of variants per code block before filtering (excluding those code blocks that had only one variant): 36
Average number of variants per code block after filtering (excluding those code blocks that had only one variant): 18.5

Listing B.18: Average numbers for the results from the last three Video On Demand effectiveness measurements (B.15, B.16, and B.17).

Appendix C

ArchStudio Model Merge & Repair

This chapter depicts the merge scenarios that motivate the model rules described in Section 6.6. The basic structure of the merge models is the same for all three scenarios: based on an Ancestor software model, two change sets, Alice and Bob, are combined into a merged model titled Merge. Although both Alice's and Bob's changes are benign and valid when seen in isolation, combining them causes an invalidity in the merged model. Section 4.2 gives more information about this overarching principle, which also affects source code merging and its respective rules. Optional components and links are drawn with dotted lines to distinguish them from the mandatory elements of a software product line. Elements highlighted in orange were changed with respect to the previous model version. Each of the invalid artifacts occurring in these model versions, and its rationale, is discussed in Section 6.6.

C.1 First Repair Leads to Cyclic Dependency

[Figure panels: Ancestor, Alice, Bob, Merge, Fix #1, and Fix #2 — architecture diagrams showing components Comp A through Comp G and Connector B. The Merge panel is annotated "Mandatory interface must be connected to a mandatory component"; the Fix #1 panel is annotated "Cyclic dependency".]

C.2 First Repair Removes Last Mandatory Interface from Component

[Figure panels: Ancestor, Alice, Bob, Merge, Fix #1, and Fix #2 — architecture diagrams showing components Comp A through Comp G. The Merge panel is annotated "No mandatory components"; the Fix #1 panel is annotated "No mandatory interfaces on Comp A".]

C.3 First Repair Removes Last Incoming Interface from Connector. Second Repair Creates Mandatory Link on Optional Interface

[Figure panels: Ancestor, Alice, Bob, Merge, Fix #1, Fix #2, and Fix #3 — architecture diagrams showing components Comp A through Comp E and a Connector. The Merge panel is annotated "Cyclic dependency"; the Fix #1 panel is annotated "No incoming interface for Connector"; the Fix #2 panel is annotated "Mandatory link on optional interface".]

Appendix D

Curriculum Vitae

Matthias Braun, BSc
Figulystraße 35, 4020 Linz, Austria
Phone: 06763092253
Email: [email protected]
Stack Overflow: Matthias Braun

Professional Experience

Since April '15 — Self-employed, for GMR Fotografen GmbH, Austria. Design and development of photo ordering software. Used technologies: Java 8, Gradle

Summer term '15 — Tutor, Institute for Software Systems Engineering, Johannes Kepler University. Course: Software Processes and Tools. Used technologies: Git, Gerrit, EPF Composer, Java 8

April – Nov. '14 — Overseas internship, Decision Systems Lab, University of Wollongong, Australia. Research and software development for master's thesis with ArchStudio 3. Used technologies: Java 8, Scala

Summer term '14 — Tutor, Institute for Systems Engineering and Automation, Johannes Kepler University. Course: Software Processes and Tools; held a talk about continuous delivery in the context of the course. Used technologies: Git, Gerrit, EPF Composer, JavaFX

March – Oct. '13 — Software development, Christian Doppler Labor „Monitoring and Evolution of Very-Large-Scale Software Systems", Johannes Kepler University. Real-time visualization of communication among Siemens VAI blast furnace software components. Used technologies: Eclipse RCP (Java), d3.js (JavaScript)

Summer term '13 — Tutor, Institute for Systems Engineering and Automation, Johannes Kepler University. Course: Software Processes and Tools. Used technologies: SVN, IBM Rational Rhapsody

Sept. '12 — Internship, RACON Software GmbH. Automation of website load tests via a JMeter plugin. Used technologies: Swing (Java), Ant, Maven

August '12 — Internship, Institute for Systems Engineering and Automation, Johannes Kepler University. Creo plugin for checking consistency between model and documentation. Used technologies: Java 6, Creo

March – June '11 — Co-op program, Qualcomm Research Center Vienna. Benchmark automation and visualization for the Vuforia SDK. Used technologies: Python, JavaScript, Bash, Android Debug Bridge, ASP.NET (C#)

Sept. '10 — Summer internship, RACON Software GmbH. Java tool for code maintenance. Used technologies: Swing (Java)

Education

Since March '12 — Johannes Kepler University, Linz. Master's degree: Software Engineering

Oct. '11 – March '12 — Graz University of Technology. Master's: Computer Science

Oct. '11 — Hagenberg Campus of the University of Applied Sciences Upper Austria. Bachelor's degree: Mobile Computing; graduation: Bachelor of Science in Engineering. Bachelor's thesis: "Augmented Reality Interface for Home Automation and Control" (YouTube video: Augmented Reality meets Home Automation). Used technologies: Android, Vuforia SDK, digitalSTROM

June '07 — Akademisches Gymnasium Linz. Graduation: matriculation

Appendix E

Erklärung (Statutory Declaration)

I declare under oath that I wrote this master's thesis independently and without outside assistance, that I used no sources or aids other than those indicated, and that I have marked all passages taken verbatim or in substance from other sources as such. This master's thesis is identical to the electronically submitted text document.
