Eindhoven University of Technology

MASTER

Re-engineering the re-engineering process

Yankov, A.G.

Award date: 2018


Disclaimer This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.

Department of Mathematics and Computer Science
Model Driven Software Engineering Research Group

Re-engineering the re-engineering process

Master Thesis

Adrian Yankov

Supervisors:
prof.dr. M.G.J. (Mark) van den Brand
dr. Y. (Yaping) Luo
ir. M. (Marc) Hamilton
dr. N. (Natalia) Sidorova

Eindhoven, Jan 2018

Abstract

The number of legacy systems around the world that still function is high. Their maintenance has become problematic because the original developers have left or proper documentation is lacking. Re-engineering can be one of the solutions to these problems. It applies reverse engineering to recover the missing artifacts and then forward engineering to generate a new and more modern software product. Additionally, the re-engineering process can be backed up by models, turning it into model-based re-engineering. Nevertheless, the existing model-based re-engineering standards are not precise enough and do not address verification and validation of the new process and product. In this thesis we give an overview of the current model-based re-engineering standards. Then we suggest a new, more detailed approach depicted in Business Process Model and Notation (BPMN) with sample activities and process inputs and outputs. We also investigate reverse engineering as an essential part of the main re-engineering process by scanning the available literature and interviewing some experts from industry. We conclude with a case study on an Internet of Things (IoT) project, where our model-based re-engineering approach is applied.

Acknowledgements

Completing this thesis has given me a great deal of satisfaction. This important event in my life would not have been possible without a few people, whom I would like to acknowledge. First of all, I would like to express special gratitude to prof. Mark van den Brand for supervising the overall graduation process. He also helped me greatly with my personal development. Secondly, I would like to thank dr. Yaping Luo for agreeing to be my daily supervisor and always supporting me in word and deed. She was a huge help to me in mediating between the academic and industrial worlds. In addition to being my daily supervisor, dr. Yaping Luo is also a good friend. Thirdly, I would like to express gratitude to Marc Hamilton for providing irreplaceable feedback from an industrial point of view and for being my main contact at Altran. His many years of work experience contributed greatly to my research. Last, but not least, my gratitude also goes to dr. Natalia Sidorova for playing an essential role as a committee member and assessing my thesis.

This last paragraph I dedicate to my loving parents, girlfriend and friends. They are the main drive behind my motivation. I could not have gone through this journey without their moral support.

List of Acronyms

ADL   architecture description language
API   Application Programming Interface
AST   Abstract Syntax Tree
BPMN  Business Process Model and Notation
CMMI  Capability Maturity Model Integration
CRUD  Create Read Update Delete
DSL   Domain Specific Language
EMF   Eclipse Modeling Framework
GUI   Graphical User Interface
IDE   Integrated Development Environment
IoT   Internet of Things
IRE   integrated reverse-engineering environment
KLOC  Kilo Lines of Code
KPI   Key Performance Indicators
MDE   Model Driven Engineering
MDSE  Model Driven Software Engineering
MOF   Meta-Object-Facility
MVC   Model-View-Controller
OMG   Object Management Group
QVT   Query/View/Transformation
UML   Unified Modeling Language

Contents

List of Figures
List of Tables

1 Introduction
    1.1 General Introduction
    1.2 Problem Definition
    1.3 Research Questions
    1.4 Structure of Thesis

2 Background
    2.1 Legacy Systems
    2.2 Re-engineering
    2.3 Reverse Engineering
    2.4 Forward Engineering
    2.5 Model-driven software engineering

3 The typical process of model-based re-engineering
    3.1 Introduction
    3.2 Related work
    3.3 Re-engineering scenarios
        3.3.1 Scenario I
        3.3.2 Scenario II
    3.4 A model-based re-engineering process
        3.4.1 Detailed re-engineering process in BPMN
    3.5 Re-engineering process inputs and outputs
    3.6 Conclusion

4 Status of reverse engineering
    4.1 Introduction
    4.2 Part I: Literature Study
    4.3 Pretty printers and code visualization
    4.4 Static and Dynamic Analyses
        4.4.1 Static Analysis
        4.4.2 Dynamic Analysis
    4.5 Reverse Engineering Challenges
    4.6 Reverse Engineering Tools
    4.7 Part II: Interviews
        4.7.1 Planning
        4.7.2 Design
        4.7.3 Performing the interview
        4.7.4 Results
        4.7.5 Threats to validity
    4.8 Conclusion

5 Validation of the re-engineering process in an industrial environment
    5.1 Introduction
    5.2 The YouKnowWatt Project
        5.2.1 Project introduction
    5.3 Challenges with IoT projects
    5.4 Chimera (A-B-C-D)
    5.5 Lotte
    5.6 Validating the model-based re-engineering process
        5.6.1 Overview
        5.6.2 Mapping to the proposed BPMN model
        5.6.3 Gathering all available project information
        5.6.4 Reverse Engineering Plan
        5.6.5 Reverse Engineering Tools: Old YouKnowWatt → models
        5.6.6 Perform early validation
        5.6.7 Applying model-driven engineering
        5.6.8 Analyzing the results
    5.7 Extended BPMN after validation
    5.8 Conclusion

6 Conclusions
    6.1 Research questions and conclusions
    6.2 Future work

Bibliography

Appendix

A Interview materials

B Reverse Engineering Plan
    B.1 Introduction
    B.2 Definition & Justification
        B.2.1 Phase 1.1: Analyze the project goals
        B.2.2 Phase 1.2: Inventory of available existing components
        B.2.3 Phase 1.3: Determine reverse engineering strategy
    B.3 Execution

List of Figures

3.1 The SEI Horseshoe Model [1]
3.2 OMG ADM Standards and the ADM Horseshoe [2]
3.3 Scenario I - same process, two similar products
3.4 Scenario II - two different processes, two similar products
3.5 Detailed Re-engineering Process expressed in BPMN

4.1 Collected papers’ overview
4.2 Reverse Engineering Requirements results from interview

5.1 YouKnowWatt customer example
5.2 A-B-C-D Framework
5.3 A-B-C-D Framework evolution
5.4 Lotte and YouKnowWatt relation
5.5 Lotte illustration
5.6 Re-engineering of new YouKnowWatt project
5.7 Case study mapped to Re-engineering BPMN
5.8 Architectural view of YouKnowWatt components from the available documentation
5.9 YouKnowWatt Cloc [3] Result
5.10 Understand input and output
5.11 HTML5 dependency graph created by Understand
5.12 Android connectivity class diagram created by Understand
5.13 Rubrowser input and output
5.14 Admin backend Module and class dependency graph generated by Rubrowser
5.15 Zoomed-in Admin:WebController
5.16 Lotte specification for the new YouKnowWatt
5.17 New YouKnowWatt Directory Structure
5.18 New YouKnowWatt architecture
5.19 Improved model-based re-engineering BPMN model

A.1 Interview questions from 1 to 7
A.2 Interview questions from 8 to 12

List of Tables

5.1 Complete result from Phase 1.2 and partial result from Phase 1.3
5.2 Types of objects for each project
5.3 Information per project overview
5.4 Component A
5.5 Component B
5.6 Component C

A.1 Requirements

Chapter 1

Introduction

1.1 General Introduction

This thesis has been written with the intention to introduce the topic of re-engineering and the process behind it. Additionally, this document focuses on the current state of reverse engineering. We study how re-engineering and in particular reverse engineering are used in a model driven approach to modernize legacy systems. Thus, we propose a process for performing model-based re-engineering. To validate our approach, we apply our research to a case study on an IoT project. A definition of the terms used in this thesis is available in the next chapter.

1.2 Problem Definition

It is well known that many legacy systems remain operational and that companies face serious issues with their maintenance. Because developing a new system from scratch might require a large amount of funds and personnel, re-engineering an old system is often chosen as a solution. One of the steps in the re-engineering process is reverse engineering. It plays a crucial role by locating the level of abstraction at which the problem lies and by determining what the input model for the new re-engineering process could be. The re-engineering process usually has an output, for instance a product with new source code. The new product has features similar to those of the previous one, but is implemented using a more modern technology, which is easier to maintain. However, it can still be difficult to validate the similarity between the old product and the new, re-engineered one. Moreover, we want to make sure that no unexpected behaviour has been introduced and that all expected behaviour is still present. Model Driven Software Engineering (MDSE) has previously addressed problems such as validation and dealing with unexpected behaviour with the help of models. Furthermore, these models can be used for code generation.

Another issue that might arise is verifying the correctness of the process by which a new product is re-engineered. Even though there are existing standards on how to apply Model Driven Engineering (MDE) to re-engineering, also known as model-based re-engineering, they are outdated and not detailed enough. An example of such a standard is the Horseshoe model [1]. Making model-based re-engineering standards more detailed is not easy, because it is difficult to find relevant case studies for validation. Furthermore, the number of experts available in this field is limited. Even if the people knowledgeable in re-engineering have completed many projects, sharing information is often troublesome due to non-disclosure agreements. Thus, the experience gained on how exactly the experts applied model-based re-engineering remains obscured. The company Altran [4] also wanted a better understanding and description of the model-based re-engineering process. This thesis gives a more detailed process model and validates it on an IoT application.


The model-based re-engineering process is not a new topic. Tools have been developed to assist in this process, one of them being MoDisco [5]. Additionally, in a case study Barbier et al. [6] have demonstrated the power of MoDisco by successfully generating models to understand a large, complex geological system. Unfortunately, support for the tool has been discontinued. In another recent paper, Aakash et al. [7] describe how to apply model-based re-engineering to migrate a legacy system to the cloud. Nevertheless, their publication does not discuss verification and validation; the focus is on the generation of the new product. Our work contributes by defining a new model-based re-engineering process that consists of four phases: problem analysis, reverse engineering, forward engineering, and verification and validation. The four phases are divided into eight steps. We provide further details about our re-engineering process by specifying possible inputs and outputs of the sample activities for the eight steps. One of the phases, reverse engineering, is also investigated in more depth. We conclude by applying the model-based re-engineering process to a case study.

1.3 Research Questions

Three research questions have been proposed in order to help tackle the problem described in Section 1.2.

RQ1: What is the process of model-based re-engineering?
The main purpose is to study the notion of re-engineering in more detail and to define a re-engineering process. This is necessary because there is no commonly accepted definition available, while there is a need for a reference to frame the elements of this study and future work.

RQ2: What is the current state of reverse engineering?
In this question, we narrow our focus to the application of reverse engineering in the context of the re-engineering process. Reverse engineering is not new, so we need to investigate the different reverse engineering technologies and tools. Additionally, four experts have been interviewed to summarize their state of practice on this topic.

RQ3: How to validate the re-engineering process in an industrial environment?
To gain confidence in the process definition of RQ1 and to evaluate the results of RQ2, we apply the process to an industrial case study. We apply some of the steps from our model-based re-engineering process to an IoT project to generate a new, similar product. Our focus is on the reverse engineering part.

1.4 Structure of Thesis

This thesis consists of six chapters. Chapter 2 provides background information required to acquire a better understanding of the terms used in this thesis. In Chapter 3 we address RQ1 about suggesting a detailed model-based re-engineering process. Chapter 4 discusses the current status of reverse engineering. In the next chapter, Chapter 5, it is explained how we applied the newly suggested re-engineering process to an industrial case study. The document ends with Chapter 6, where the results of this thesis are summarized and future work is recommended.

Chapter 2

Background

This chapter introduces some terms used throughout the thesis. Topics such as legacy systems, re-engineering, reverse engineering, forward engineering and model-driven software engineering are briefly explained.

2.1 Legacy Systems

In 1999, Warren stated that a legacy system is “an old system which remains in operation within an organization” [8]. In the same year, Alderson et al. [9] discussed two more definitions for legacy systems: “any code that has left development” and “a system whose security has been compromised”. We will use Alderson’s first definition, because security is not of interest to us and the word “old” in Warren’s interpretation can be ambiguous.

Traditionally, legacy systems were developed using old programming languages such as COBOL, Fortran [10] and Ada [11], and have undergone many changes in their source code. Nowadays, ever faster changing market needs, rapidly changing technology and higher demands on system development performance make it more likely that even a young, modern system is already legacy. The migration of legacy systems is often postponed for a long period. One of the possible reasons might be the high cost. Moreover, organizations may not want to risk losing crucial business logic or introducing unwanted side effects in the process of re-engineering. Warren [8] identified ten attributes of legacy systems:

1. High maintenance costs
2. Complex software
3. Obsolete support software
4. Obsolete hardware
5. Lacking technical expertise
6. Business critical
7. Backlog of change requests
8. Poor documentation
9. Embedded business knowledge
10. Poorly understood by maintainers


The high maintenance cost might be explained by the growing number of Kilo Lines of Code (KLOC), which results in more staff being hired. In 1996, Pigoski stated that every 35 KLOC require one developer [12]. The high complexity is covered by Lehman’s second law of software evolution [13]. The hardware also cannot simply be replaced with new hardware, because the change might result in unexpected behaviour in the legacy system. Modern legacy systems are often poorly documented due to high pressure on development and the limited possibilities to keep documentation up to date with system changes. These problems are mentioned because they are probably key indicators for starting a re-engineering process.
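Pigoski’s rule of thumb [12] can be turned into a back-of-the-envelope staffing estimate. The sketch below is only an illustration: the function name `estimated_maintainers` and the choice to round up to whole developers are assumptions, not part of the cited source.

```python
import math

# Rule of thumb cited above [12]: one developer per 35 KLOC.
KLOC_PER_DEVELOPER = 35


def estimated_maintainers(kloc: float) -> int:
    """Return a rough number of developers needed to maintain `kloc` KLOC.

    Rounding up is an assumption made here: a fraction of a developer
    is counted as a whole one.
    """
    if kloc < 0:
        raise ValueError("kloc must be non-negative")
    return math.ceil(kloc / KLOC_PER_DEVELOPER)


if __name__ == "__main__":
    for size in (35, 100, 700):
        print(size, "KLOC ->", estimated_maintainers(size), "developer(s)")
```

Under this rule, a system that grows from 100 to 700 KLOC would go from roughly three to twenty maintainers, which illustrates why growing code size drives maintenance cost.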

2.2 Re-engineering

In 1990, Chikofsky and Cross II [14] declared that re-engineering was part of system renovation. System renovation is usually initiated due to issues identified in the current process or product. It is often postponed, because companies fear the loss of irreplaceable business domain knowledge.

The process of re-engineering usually consists of reverse engineering, restructuring, re-documentation and forward engineering [15]. The term “re-engineering” is also well known in the business process management domain. In 1993, Hammer [16] interpreted the term as “Fundamental rethinking and radical redesign of business processes to achieve dramatic improvements in critical contemporary measures of performance such as cost, quality, service and speed”. Note that this interpretation clearly focuses on the development process aspects of the rationale for re-engineering. In 2003, Seacord et al. [17] gave a very brief definition for re-engineering, stating that “it was an engineering process aiming to generate evolvable system”. Later, in 2004, re-engineering was explained as “a way to achieve software reuse and to understand the concepts underlying the application domain” [18]. The definition that we use in this thesis, introduced by Jain et al. [19], states that “re-engineering is the process of creating an abstract description of a system, reason about a change at the higher abstraction level, and then re-implement the system”. We have chosen this definition because it is the most recent one.

2.3 Reverse Engineering

Reverse engineering originates from military personnel and companies trying to reproduce how certain hardware parts functioned for their own advantage. Later, the concept was transferred to the software world [14]. As mentioned before, companies often have to deal with legacy code. When new features need to be added to these systems or issues need to be fixed, companies may face a number of challenges because the original developers or up-to-date documents are missing. Reverse engineering helps to solve this issue by extracting useful information that provides an understanding of the problem. Chikofsky and Cross II in 1990 [14] stated that reverse engineering is “the process of analysing a subject system to identify the system’s components and their interrelationships and create representations of the system in another form or a higher level of abstractions”. The main goals of reverse engineering are to restore lost information, to visualize the system in various views, and to encourage the reuse of information. Additionally, reverse engineering does not alter the system under investigation, but only gathers information, which can later be used as input for forward engineering [20].

The reverse engineering process consists of two sub-processes: re-documentation and design recovery. Re-documentation deals with finding lost information. Design recovery focuses on the recovery of design artifacts. When performing design recovery, domain knowledge and external information are required [14].


2.4 Forward Engineering

In addition to reverse engineering, in 1990, Chikofsky [14] also explained the term forward engineering. He defines forward engineering as “The traditional process of moving from high level abstractions and logical implementation-independent designs to the physical implementation of a system”. The physical implementation of a system is usually understood to mean its low-level implementation. In 2015, Bowen [21] also describes the forward engineering process as a “models before implementation” approach.

2.5 Model-driven software engineering

During the 1960s, Dijkstra [22] introduced the term “structured programming”. This is a programming paradigm in which a program is built by combining different control structures. Computer scientists thought for a long period that structured programming would be “the next big thing”. It served its purpose until the 1980s, when a new software engineering principle, “Everything is an object”, was introduced. This method also proved efficient until the object trend faded and the idea of “Everything is an object” was almost forgotten. As a result, a new principle, “Everything is a model”, has been developed. This motto is the main force driving MDSE. The goal is to establish a formal basis for the concepts of software modelling. Note that different definitions of a ‘model’ exist. One of them is etymological: ‘model’ is a reworked version of “modulus”, which originates from Latin and stands for measure, rule, example, or pattern to be followed. In 1971, Stachowiak [23] gave another definition of a model, which states that an item must satisfy three criteria:

• “Mapping criterion - there is an original object or phenomenon that is mapped to the model
• Reduction criterion - not all the properties of the original are mapped to the model, but it is somehow reduced
• Pragmatic criterion - the model can replace the original for some purpose”

In 2006, Kühne [24] gives another definition for a model - “an abstraction of a (real or language based) system allowing predictions or inferences to be made”. In MDSE, models are used for complicated tasks such as creating domain-specific languages, running code-generators and performing model-transformations.
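To make the code-generation task mentioned above concrete, the following minimal sketch performs a model-to-text transformation from a tiny entity model to Python source text. The model format (a dictionary from entity names to attribute names) and the functions `generate_class` and `generate_module` are hypothetical and chosen purely for illustration; real MDSE tooling is far richer.

```python
from typing import Dict, List

# Hypothetical model format: entity name -> list of attribute names.
Model = Dict[str, List[str]]


def generate_class(name: str, attributes: List[str]) -> str:
    """Generate Python source text for one entity of the model."""
    params = "".join(f", {attr}" for attr in attributes)
    lines = [f"class {name}:", f"    def __init__(self{params}):"]
    if attributes:
        lines += [f"        self.{attr} = {attr}" for attr in attributes]
    else:
        lines.append("        pass")  # entity without attributes
    return "\n".join(lines) + "\n"


def generate_module(model: Model) -> str:
    """Model-to-text transformation: one generated class per entity."""
    return "\n".join(generate_class(n, attrs) for n, attrs in model.items())


if __name__ == "__main__":
    model: Model = {"Sensor": ["identifier", "unit"], "Reading": ["value"]}
    print(generate_module(model))
```

The model plays the role of the abstraction in Kühne’s sense: it omits everything about the entities except their names and attributes, yet it is sufficient for the purpose of generating class skeletons.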

Chapter 3

The typical process of model-based re-engineering

3.1 Introduction

Re-engineering is becoming more relevant due to the growth of legacy systems. There are two main reasons for performing re-engineering: improvement of product performance and migration. The performance improvement comes from understanding the specification, design and implementation of a system. Engineers use this knowledge to remodel the system and improve the product’s functionality or implementation. Migration of legacy systems to new platforms or frameworks might be necessary because the technologies used become outdated. Other reasons for migration are that implementing new features takes too much time and that the time-to-market needs to be reduced. To resolve these issues, model-based techniques can be applied. By doing so, engineers can create more maintainable software, thus saving time and cost. In recent years, modelling techniques have also been applied to support reverse engineering. In combination with a modernization of the forward engineering process to MDSE, we can truly talk about model-based re-engineering. However, this technique requires the reconstruction of legacy information into models in the MDSE process.

3.2 Related work

The first descriptive approach to performing re-engineering is the “Horseshoe model”. In 1999, Bergey et al. [1] first defined the Horseshoe model as “analysis of an existing system, logical transformation, and development of a new system”. The authors also stated that the power of this model comes from its three levels of abstraction, which can be used for various descriptions of logic, for instance in the source code or architecture layer. These three levels are shown in Figure 3.1. The lowest layer, “Source Text Representation”, is not regarded as a level, because it includes the source code of the old and the new system. The three levels are described as follows:


Figure 3.1: The SEI Horseshoe Model [1]

• Level 1: “Code-Structure Representation” - consisting of source code and artifacts, such as an Abstract Syntax Tree (AST), analytical operations, and flow graphs found via parsing.
• Level 2: “Function-level Requirements” - contains a description of how the functions of the programs, data and files are related.
• Level 3: “Concept” - parts of source code that build components, later combined into subsystems.
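The Level 1 artifacts listed above can be illustrated with a short sketch. Assuming, purely for illustration, that the legacy code is written in Python, the standard `ast` module parses the source text into an AST, from which a simple call graph is derived. The function `call_graph` and the sample source `LEGACY` are hypothetical and not taken from [1].

```python
import ast
from typing import Dict, Set


def call_graph(source: str) -> Dict[str, Set[str]]:
    """Map each top-level function to the plain names it calls.

    Only calls of the form `name(...)` are recorded; method calls such
    as `obj.method(...)` are ignored to keep the sketch small.
    """
    tree = ast.parse(source)  # source text -> abstract syntax tree
    graph: Dict[str, Set[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees: Set[str] = set()
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = callees
    return graph


# Hypothetical legacy fragment used as input.
LEGACY = '''
def read_meter():
    return 42

def report():
    value = read_meter()
    print(value)
'''

if __name__ == "__main__":
    for function, callees in call_graph(LEGACY).items():
        print(function, "->", sorted(callees))
```

The resulting graph is a code-structure representation: it still mirrors the source closely, and higher levels of the horseshoe would abstract from it further.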

In 2003, the Object Management Group (OMG) [25] introduced an improved horseshoe model, named “ADM Horseshoe” [2], based on the original one from SEI. Figure 3.2 illustrates the model and the included standards [26]. Although the ADM horseshoe serves as a guide for performing re-engineering, it is not detailed enough. Companies that want to apply it in practice may interpret the standards in their own way, which may lead to different results every time. Moreover, verification and validation are mentioned in neither the SEI nor the ADM Horseshoe, although they are essential for the new process and the re-engineering results. Furthermore, a combination of a few standards into a model does not automatically guarantee correctness.

Figure 3.2: OMG ADM Standards and the ADM Horseshoe [2]


3.3 Re-engineering scenarios

In the previous section, the issues with the current definitions of the re-engineering process have been presented. To address these problems, we propose a detailed process for performing model-based re-engineering. To do so, we need to discover common characteristics between re-engineering projects. We identified two re-engineering scenarios through communication with experts. In this section both scenarios are explained in detail.

3.3.1 Scenario I

Figure 3.3 shows the first scenario in the form of a pyramid. The pyramid is used as an illustration in this thesis and has four main levels. Rosenberg [27] discusses these levels as part of the software development levels of abstraction. The figure also shows development process A, which results in Product A. In this scenario, the product owner decides to use the same development process to create a new product B. Product B can be a new item or one similar to Product A. An example of this could be reusing an existing system and adding some new features. Scenario I might also be classified as a typical case of forward engineering. This scenario is usually executed to boost product performance or to migrate to a newer technology.

Figure 3.3: Scenario I - same process, two similar products


Figure 3.4: Scenario II - two different processes, two similar products

3.3.2 Scenario II

Figure 3.4 depicts the second scenario. As can be seen, it resembles Scenario I with its four levels. In this scenario, however, the stakeholders have probably noticed that many products created using Process A experience the same issues. Example problems are that it takes too long to implement a new feature, that the time-to-market is slow, that the product is entangled with a specific technology, or that the domain expert is gone. The issues in Process A also affect the end product, Product A. Consequently, an investigation needs to be carried out to find the issues in the original software engineering process. After the issues are located, a new, improved process can be created. In Figure 3.4 this is Process B. This process is then used to create Product B.

3.4 A model-based re-engineering process

In Scenario II, the new process B defines a new entry point, which needs to be reconstructed from information from the old process A and its resulting product. In the vision of Altran, the new process B will apply MDE forward engineering, requiring reconstructing models as the new input. Therefore, this results in a model-based re-engineering process.

3.4.1 Detailed re-engineering process in BPMN

To explain the model-based re-engineering process in detail, we create a model expressed in BPMN [28]. BPMN has been chosen because it is suitable for expressing procedures in a standardized manner. In this thesis, the suggested re-engineering steps from the BPMN model focus only on Scenario II. The model is based on the horseshoe models, but is refined with insights obtained from the paper by Aakash et al. [7]. Figure 3.5 presents the detailed re-engineering process. The process is divided into four phases: problem analysis, reverse engineering, forward engineering, and verification and validation. To initiate a re-engineering process, issues must be present. Thus, we introduce the problem analysis phase to locate the different problems on the process and product level. Reverse engineering and forward engineering are included because they are essential parts of the re-engineering process, as mentioned in Chapter 2. The second phase is necessary to gather information about the old product or process, and the third phase to create a new product as described in Scenario II. Finally, we add verification and validation as the fourth phase, because it is missing from the other standards. Note that the first three phases are connected to verification and validation. We introduce these links to reduce the risk of having wrong information as input in any of our re-engineering phases.


Figure 3.5: Detailed Re-engineering Process expressed in BPMN

Each of the phases is composed of specific tasks. The numbered steps highlighted in the model are explained below. We consider these steps to be the key tasks when performing model-based re-engineering.

Step 1 (Identify problem on process level): Example problems might be that the personnel are not well trained, the software development methodology is not suitable for the projects, or the tooling is incorrect. The first issue is not solvable by switching to MDSE. Thus, another approach outside of this scope needs to be selected. Examples of issues that MDSE might be able to solve are a number of features per iteration that is too low or too many defects in the resulting code.

Step 2 (Locate problematic level): In this step, the engineers need to identify at which of the levels the issue lies. For example, the requirements can be badly formulated in the requirements phase, due to communication issues between stakeholders. According to Fabbrini et al. [29], such requirements usually suffer from the following issues: vagueness, subjectivity, optionality, weakness, underspecification, multiplicity and implicity. These example problems can indicate to the engineers what to be cautious about. Moreover, finding problems with requirements engineering might affect the new process definition for performing that activity in the next step. In the design phase, the issues can be separated into two categories: non-technical and technical. In 2016, Ansar and Khan [30] published a paper that investigated the non-technical issues in the software design phase and found that the most significant problems are linked to understandability and communication. Hence, as mentioned before, the new process definition may need to contain a better communication process. In the implementation and testing phase, the problems might be experienced in several ways. In 2000, Fowler and Beck [31] described 22 of the most common flaws when writing code and coined the term “bad smells” for them. The top five are duplicated code, long method, large class, long parameter list and divergent change. Thus, the new process definition might have to make code reviews obligatory. Although not writing tests is a big omission, there are still cases where this happens. Another problem is that automatic testing is not always possible, in which case manual testing is required. Thus, the time-to-market and labour costs increase. Additionally, the customer’s tests might not cover the correct parts of the source code, resulting in undiscovered bugs in the system under test. Moreover, according to Heimdahl [32], selecting the correct validation of the test model and the right model coverage criteria are still challenges for testing.

Step 3 (Adapt new process definition): During this step one must adapt or create a new process definition, eliminating the issues in the old process. There are certain existing process improvement guidelines such as the Capability Maturity Model Integration (CMMI) [33] and the ISO/IEC 33014 [34] standard. We will not go deeper into process improvement, as it is a separate field of study in Computer Science. Nevertheless, it is important to mention that in the vision of Altran, the new process definition should involve models as input.

Step 4 (Reconstruct artifacts): The artifacts, in our case models, that serve as the input for the new improved process are reconstructed from information about the old product and process. Examples of such information are the whole codebase, regression tests, a specification document and the software repository history. In this step, the reverse engineering takes place. Inputs from all levels can be used if available. The professional can choose which reverse engineering method to apply, for instance static analysis techniques, dynamic analysis techniques or a combination. For more information, see Chapter 4.

Step 5 (Perform early validation): Early validation is performed to make sure that the reconstructed artifacts are correct. In our model-based re-engineering process, we can use the models to perform validation by, for instance, expert review of generated documentation, static analysis, simulation or formal verification of derived formal models.

Step 6 (Apply model driven engineering): After the validation step, the reconstructed artifacts are used as inputs for the MDSE to make a new product, Product B. Sample activities that are carried out might be code generation, model transformation or the use of a domain-specific language.

Step 7 (Analyze the results): One of the ways to compare Product A and B is by running the original test suite from Product A. Another possible method is to extract behavior models from both products and perform a model comparison. Model-based testing can also be an option. An alternative could be to ask domain experts to perform a manual validation. Last but not least, the engineer could even replay the traces or event logs on the new product to check the fitness. The collected results determine the satisfaction level of the stakeholders on product level.

Step 8 (Validate solved problems): We try to give evidence that the process issues are solved. The stakeholders and engineers use Key Performance Indicators (KPIs) to deliver the verdict. If they are not satisfied with the result, the process returns to Step 1 to identify another problem on process level.
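The last option in Step 7, replaying recorded traces to check the fitness, can be sketched as follows. This is a deliberately simplified illustration and not the procedure used in this thesis: the behaviour model is assumed to be a small deterministic state machine, fitness is taken to be the fraction of recorded traces that can be replayed completely, and all names (`MODEL_B`, `LOG_A`, `accepts`, `replay_fitness`) as well as the example events are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical behaviour model: (state, event) -> next state.
Transitions = Dict[Tuple[str, str], str]


def accepts(transitions: Transitions, start: str, trace: List[str]) -> bool:
    """Return True if every event in the trace has a matching transition."""
    state = start
    for event in trace:
        key = (state, event)
        if key not in transitions:
            return False
        state = transitions[key]
    return True


def replay_fitness(transitions: Transitions, start: str,
                   log: List[List[str]]) -> float:
    """Fraction of recorded traces that the model can replay completely."""
    if not log:
        return 1.0
    replayed = sum(accepts(transitions, start, trace) for trace in log)
    return replayed / len(log)


# Invented behaviour model of the new product (Product B).
MODEL_B: Transitions = {
    ("idle", "start"): "measuring",
    ("measuring", "sample"): "measuring",
    ("measuring", "stop"): "idle",
}

# Invented event log recorded from the old product (Product A).
LOG_A = [
    ["start", "sample", "sample", "stop"],
    ["start", "stop"],
    ["sample"],  # cannot be replayed: sampling before start
]

print(f"fitness: {replay_fitness(MODEL_B, 'idle', LOG_A):.2f}")
```

A fitness below 1.0 points to behaviour of Product A that Product B does not reproduce; whether that is a defect or an intended change still has to be judged by the stakeholders.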

3.5 Re-engineering process inputs and outputs

As mentioned before, the available re-engineering process descriptions do not provide enough guidance. The BPMN model makes the process more tangible, but we can add more detail to the process steps. Table 3.5 contains further explanation. We constructed this table by thinking of various sample activities for each re-engineering step. In addition, we received valuable feedback from Altran experts to formulate the final version. The table is divided into six columns: stage, steps, sample activities, process input, process output and automation support. Note that engineers do not need to execute all the sample activities or have all the process inputs available. The person executing the re-engineering can select the most appropriate activities for the case at hand.

Table 3.5: Re-engineering process inputs and outputs (entries listed per column)

Stage: Problem Analysis; Reverse Eng.; Forward Eng.; V&V

Steps: Identify problem on process level; Locate problematic level; Adopt new process definition; Reconstruct artifacts code; Perform early validation; Perform MDE; Analyze the results; Validate solved problems

Sample Activities: Data-mining; Run diff tool; Assess process; Static analysis; Run test suites; Machine learning; Dynamic analysis; Sell new approach; Define new process; Assess new process; Translate to process; Create business case; Define relevant KPIs; Negotiate expected results; Perform root-cause analysis; Define product qualification; Get feedback from developers; Software engineering according to new process

Process Input: Goals; Metrics; Codebase; KPI targets; New codebase; Questionnaires; Runtime traces; Documentation; Issue databases; Process metrics; Process context; Previous output; Assessment results; Process description; Specification models; New process definition; Reconstructed artifacts; Documentation (for manual work)

Process Output: Metrics; KPI targets; V&V report; New codebase; V&V criteria; Process metrics; SWOT analysis; New documents; Process overview; Dependency graph; Control-flow graph; Performance results; Comparison to KPIs; KPIs and their status; Finite state automata; New process definition; Problem description(s); Elaborated process view identifying problematic contributions; Specification models according to new product qualification process

Automation Support: Manual; Manual; Semi-automated; Fully-automated


3.6 Conclusion

In this chapter we investigated existing re-engineering scenarios. It appears that there are few usable process descriptions available, of which the two variants of the Horseshoe model appear to be the most mature. We argued that these descriptions do not contain sufficient detail to be of use for guiding studies or organizations applying them in practice. Furthermore, the descriptions lack the necessary validation steps to provide evidence in the various stages of the process. To address these shortcomings, we have defined a variant of the re-engineering process that applies an MDSE forward engineering process. Since this requires the reconstruction of a model as an entry point in the improved development process, we call it the model-based re-engineering process. This model-based re-engineering process is defined in a BPMN model, where the steps are described in an accompanying table in terms of inputs, outputs and typical activities. This definition of the model-based re-engineering process provides sufficient detail to guide this stage of the study.

Chapter 4

Status of reverse engineering

4.1 Introduction

Step 4 in our BPMN model from Chapter 3 is reconstructing artifacts. This is done with the help of reverse engineering. Moreover, reverse engineering is an essential step of re-engineering, as mentioned in Chapter 2. This chapter is divided into two parts. The first part dives deeper into reverse engineering by presenting the results of our literature study. The second part presents the results of interviews with experts in the field of reverse engineering.

4.2 Part I: Literature Study

In order to get a better understanding of the terms and to grasp the current state of reverse engineering and re-engineering, a literature study has been conducted. Another reason is to find related work with real-life applications of these terms. During the literature study we apply a database keyword search and select the ACM [35] and IEEE Xplore [36] digital libraries as information sources. The following keywords are identified and used for our research: Software reengineering, Software rejuvenation, Architecture Driven Modernization, Knowledge Discovery Metamodel, Architecture driven reverse engineering, Reverse Engineering, Reverse engineering approach, Reverse engineering tool and Model-driven reverse engineering. More than a hundred articles about re-engineering and reverse engineering were collected and read. The gathered publications range from the year 1973 to 2017. Because the number of keywords is high, we put the papers in three categories: Re-engineering, Reverse engineering and Combined. The Re-engineering category includes publications that contain the first four keywords. Papers with the rest of the keywords are placed into the Reverse engineering category. The Combined category holds the publications that cover both topics.

Figure 4.1 summarizes how many papers were collected. As can be noted, reverse engineering has the highest number of publications. This result was expected and can be explained by the large number of keywords related to that topic. Moreover, the subject of reverse engineering is not new. The number of gathered publications for re-engineering is half of that. Nevertheless, the number is large enough to get an understanding of the topic. Additionally, we are more interested in the reverse engineering part of the re-engineering process. The combined publications give insights into both topics and are only a small fraction. Fortunately, they are mostly books and hence provide large volumes of information.

Figure 4.1: Collected papers’ overview

From the obtained information about the two topics, we first gained knowledge about the different reverse engineering technologies, such as static and dynamic analysis. We continued by studying the challenges of reverse engineering, such as finding suitable industrial cases and coping with the scalability of projects. Finally, we discovered the various tools, for example Understand [37] and Obeo Agility [38]. We present our findings in the following paragraphs.

In 2011, Canfora [39] stated that four types of reverse engineering technologies exist: pretty printers, static and dynamic analysers, tools for code visualization and exploration, and software repository mining tools. The next two sections provide further information about pretty printers, code visualization, static and dynamic analysis. We do not cover software repository mining tools, because they are out of scope.
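The categorisation rule described above can be written down as a small sketch. It assumes that each paper is represented by the list of search keywords it matched and that "Combined" means a match in both keyword groups; both are assumptions made for illustration, and the helper `categorize` and the sample data are hypothetical.

```python
from collections import Counter

# The first four keywords of the search (Re-engineering group).
REENGINEERING_KEYWORDS = {
    "software reengineering",
    "software rejuvenation",
    "architecture driven modernization",
    "knowledge discovery metamodel",
}

# The remaining keywords (Reverse engineering group).
REVERSE_ENGINEERING_KEYWORDS = {
    "architecture driven reverse engineering",
    "reverse engineering",
    "reverse engineering approach",
    "reverse engineering tool",
    "model-driven reverse engineering",
}


def categorize(matched_keywords):
    """Assign a paper to a category based on the keywords it matched."""
    matched = {keyword.lower() for keyword in matched_keywords}
    is_reengineering = bool(matched & REENGINEERING_KEYWORDS)
    is_reverse = bool(matched & REVERSE_ENGINEERING_KEYWORDS)
    if is_reengineering and is_reverse:
        return "Combined"
    if is_reengineering:
        return "Re-engineering"
    if is_reverse:
        return "Reverse engineering"
    return "Uncategorized"


# Invented sample: each entry is the keyword list of one paper.
papers = [
    ["Software rejuvenation"],
    ["Reverse engineering tool"],
    ["Knowledge Discovery Metamodel", "Reverse Engineering"],
]
print(Counter(categorize(paper) for paper in papers))
```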

4.3 Pretty printers and code visualization

Redocumentation can be one of the subprocesses of reverse engineering. Pretty printers are developed to help in this respect. Pretty-printing has existed for more than fifty years. It became popular as a feature of the Integrated Development Environment (IDE) for the programming language Lisp [40], which is now Common Lisp [41]. The idea of pretty printers is to transform code into a readable format, thus helping with program understanding. Additionally, de Jonge [42] states in his paper that pretty-printing can be used together with other program understanding techniques to help analyse and examine applications in order to discover where and how re-engineering is required. Almost every integrated development environment has pretty-printing integrated by default, and the technology is used on a regular basis by software engineers.

Another essential reverse engineering technology is code visualization. Canfora [39] states that its importance comes from being able to pinpoint appropriate information at the right level of detail. Additionally, code visualization focuses on illustrating the source code or providing an understanding of the architecture through images, diagrams or animations. Some of the most common diagrams are class diagrams, sequence diagrams and call dependency diagrams. Finally, code visualization can also illustrate software metrics. Software metrics can, for instance, give knowledge about the number of lines of code, the current maintainability of a program, its complexity and the amount of code cloning.
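To illustrate what a pretty printer does, the following minimal sketch reformats a badly laid out Python fragment by parsing it and regenerating the source text from the syntax tree. It relies on Python's standard `ast` module (`ast.unparse` requires Python 3.9 or later); the `MESSY` fragment and the `pretty_print` helper are invented for this example and are unrelated to the tools cited above.

```python
import ast

# Invented input: valid but badly formatted source code.
MESSY = "def  area(w,h) :\n        return(w*h)\nprint( area(2 ,3) )"


def pretty_print(source: str) -> str:
    """Parse the source and regenerate it in a uniform layout."""
    tree = ast.parse(source)
    return ast.unparse(tree)


print(pretty_print(MESSY))
```

Because the layout is regenerated from the syntax tree, the result has the same structure as the input. Note that this simple approach discards comments, which a production-quality pretty printer would have to preserve.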

4.4 Static and Dynamic Analyses

The main goal of reverse engineering is to help understand a particular program. This understanding can be achieved in the design recovery phase. This phase usually consists of static and dynamic analysis. Jerding [43] explains these two in a bit more detail: “static, where the program code itself is analyzed, and dynamic where the program is executed to learn how it behaves”. The next subsections discuss these two terms, in particular their advantages, disadvantages and how they are applied in industry. In addition, there is hybrid analysis, which combines aspects of both dynamic and static analysis.

4.4.1 Static Analysis

Static analysis is performed with the help of parsers. At the start of the activity, the grammar of the target programming language needs to be defined. Then the parsers can read the source code. When scanning the codebase, the system under analysis does not necessarily need to be complete. This is a major advantage, because it has been proven to save time and all possible executions are considered. Canfora [39] also declares that static analysis is “reasonably fast, precise and cheap”.

Static analysis is not limited to code parsing. Telea [44] states that static analysis also performs other operations such as fact extraction, fact aggregation, querying and presentation. In his paper he proposes the integrated reverse-engineering environment (IRE) SolidFX. SolidFX has a fact extractor designed for C and C++, based on the ELSA [45] parser. Telea uses the fact extractor to store all discovered facts in a fact database. The stored information can later be visualized in an IRE.

Unfortunately, static analysis also has disadvantages, one of which is conservatism. Tonella [46] explains conservatism by the fact that static analysis can falsely report a non-executable path as executable. He further states that it can produce misleading results for which no input values exist. Additionally, static analysis does not cope well with some programming language features such as polymorphism, dynamic class loading and pointers. These features are hard to analyse statically and can therefore lead to incorrect analysis results. To avoid this issue, dynamic analysis has to be chosen.

Despite having drawbacks, static analysis has been used extensively and has proven useful in industry. There are numerous examples. For instance, in 2008 Telea et al. [44] used SolidFX, a static analysis tool for C++ code, to study the well-known C++ library wxWidgets. Another tool used in industry is Bauhaus [47]. In 2011 Berger et al. [48] successfully applied static analysis with Bauhaus with security as the main goal, not program understanding. They discovered security flaws in the mobile phone platform Android. Finally, in 2017 Ludwig et al. [49] developed a plug-in for the static analysis tool Understand [37] to generate metrics for architectural complexity. The goal of the plug-in was to assist project managers in their work and help reduce technical debt.
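Fact extraction can be illustrated with a minimal sketch that parses source code without executing it and records which functions call which. The example uses Python's standard `ast` module on an invented code fragment; it only illustrates the idea and is not how SolidFX, Bauhaus or Understand work internally. The names `SOURCE` and `extract_call_facts` are hypothetical.

```python
import ast
from collections import defaultdict

# Invented code under analysis; it is parsed, never executed.
SOURCE = '''
def read_sensor():
    return 42

def aggregate(values):
    return sum(values)

def report():
    value = read_sensor()
    print(aggregate([value]))
'''


def extract_call_facts(source):
    """Map each top-level function to the names it calls directly."""
    tree = ast.parse(source)
    facts = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            for child in ast.walk(node):
                # Only plain calls such as f(x) are recorded here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    facts[node.name].add(child.func.id)
    return dict(facts)


for caller, callees in sorted(extract_call_facts(SOURCE).items()):
    print(caller, "->", sorted(callees))
```

The sketch also hints at the limitations mentioned above: calls made through object attributes or resolved only at run time are not recorded, so the extracted facts can be incomplete for code that relies on polymorphism or dynamic loading.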

4.4.2 Dynamic Analysis

As mentioned in the previous subsection, static analysis is not a universal solution for all program understanding problems. This is where dynamic analysis helps. In 1999 Ball defined dynamic analysis as “the analysis of the properties of a running software system” [50]. Consequently, to find the system’s dynamic properties, the system needs to be ready for execution. One of the ways dynamic analysis works is by logging all function calls of the system in order to study the behaviour later. The data is preprocessed and written to a trace file. Tonella [46] explains that the trace file should contain logs with identifiers for each class and method call. Additionally, time stamps are included. After the information has been written to a trace file, a visual model is extracted. Examples of such a model are dynamic object diagrams, interaction diagrams and sequence diagrams. The diagrams are usually used to study the behaviour of a system. Wagner [51] also states that the model is used by software engineers to discover information about how resources are consumed, how functions are related, how data is used, or which areas run concurrently in parallel processing.

Cornelissen [52] claims that dynamic analysis also suffers from a few issues. Firstly, even when filtering is performed, the trace files might become very large and hard to comprehend. Secondly, the execution path might be limited, because it could be associated with a particular input. Thirdly, one cannot be certain that full code coverage has been achieved. Additionally, the scalability of the analysis might become problematic if large amounts of data are processed. Finally, one must take into consideration the observer effect, the tendency of software to behave differently when being monitored, especially when dealing with timing requirements.

Even though dynamic analysis suffers from flaws, engineers think it is an essential task. For example, in 2002 Xie and Notkin [53] compared nine different tools for dynamic call graph extraction. According to them, dynamic call graph extraction is essential for optimizing compilers, analyzing performance and program understanding. Later, in 2010, von Detten et al. [54] used the dynamic analysis tool Reclipse [55] to detect design patterns in JUnit 3.8.2, Java’s Abstract Window Toolkit, the Java Generic Library and Eclipse’s Standard Widget Toolkit. The authors concluded that the Composite, Decorator and Template Method patterns occurred.

Dynamic and static analysis can also be combined. This technique is known as hybrid analysis. In the literature, researchers usually say that they first applied static analysis and then dynamic analysis, but rarely use the term hybrid analysis. In their paper from 2012, Labiche et al. [56] claimed that hybrid analysis can be applied to diminish execution overhead. They successfully applied hybrid analysis to an industry-sized software system by using control flows and execution traces with the aim of extracting scenario diagrams. The authors also performed a systematic literature review and gathered 176 papers about dynamic analysis. Only two of the publications described work similar to theirs. In 2013, Silva et al. [57] also used hybrid analysis. They successfully applied this technique by performing static analysis during the dynamic analysis phase. The goal of the project was to reverse engineer the user interface of a web application. The web application was written in HTML, CSS, JavaScript and Ajax. In the end, the authors successfully extracted a state diagram that correctly illustrated the behaviour of the Graphical User Interface (GUI).
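The logging of function calls with time stamps described at the start of this subsection can be sketched in a few lines. The example below uses Python's standard `sys.setprofile` hook to record every Python-level function call made while a function runs; the traced functions and the helper `record_calls` are invented for illustration, and real tracing tools additionally record class identifiers and write the events to a trace file.

```python
import sys
import time


def record_calls(func, *args):
    """Run func and return a list of (timestamp, function name) call events."""
    trace = []

    def profiler(frame, event, arg):
        # "call" is reported for Python functions; calls into C code
        # arrive as "c_call" and are ignored in this sketch.
        if event == "call":
            trace.append((time.perf_counter(), frame.f_code.co_name))

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return trace


# Invented system under analysis.
def read_sensor():
    return 42


def aggregate(values):
    return sum(values)


def report():
    return aggregate([read_sensor(), read_sensor()])


for timestamp, name in record_calls(report):
    print(f"{timestamp:.6f} {name}")
```

The recorded sequence depends on the input that drives the execution, and the profiling hook itself slows the program down, which is a small-scale instance of the limited execution path and the observer effect discussed above.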

4.5 Reverse Engineering Challenges

Reverse engineering also poses different challenges. Firstly, in 1993 Selfridge et al. [58] declared that there was an absence of industrial case studies. Even though there are now public software development platforms, the size of the available projects might not be enough to test the stability of tools. Reverse engineering tools often cannot handle projects with a very large number of lines of code. Secondly, in 1997 another reverse engineering challenge was discussed by van den Brand et al. [59]. They stated that defining formal syntax definitions of (old) languages is a troublesome task. A syntax definition is mandatory for the parser to know how to build an abstract syntax tree (AST). Thirdly, Bruneliere et al. [60] warn engineers about possible information loss concerning the system under analysis. This issue could appear due to the heterogeneity of the system. In their paper, they also discuss the difficulty of managing scalability. Lastly, according to Tonella et al. [46], distinguishing valuable domain-specific knowledge from implementation details still remains a challenge.

4.6 Reverse Engineering Tools

After discussing the different reverse engineering technologies and challenges, it is important to mention the available tools too. Engineers need to be aware of their existence to fully use the potential of reverse engineering. According to Skramstad and Khan [61], reverse engineering tools must support three categories of processes:

1. “Analyzing the source code format, and recognizing the program structure and semantics of underlying components”

2. “Save the information in a repository”

3. “Present the information to the user”


The authors further claimed that reverse engineering tools must also present information to the user in the form of a graphical layout such as a structure chart, call graph or control structure. Moreover, a textual representation of data structures from the source code should also be supported. Finally, the MECCA model in their paper suggests that metrics collection and metrics analysis must also be features.

During the literature study we collected 19 reverse engineering tools that we found the most relevant. Table 4.6 lists all of them. The tools can be categorized as either academic or commercial. A few more evaluation criteria were added: the input programming languages and the technical support available. As we are interested in reverse engineering as part of the re-engineering process, we also check which tools support re-engineering. Six of them support that option. Unfortunately, some of these tools do not come as standalone software packages, but require the services of a software vendor as well. Examples are Obeo Agility and CodeCase. In addition, there are tools that are no longer supported, but can still prove to be useful.

Table 4.6: Reverse engineering tools collected during the literature study

Tool | Languages | Visualization | Analysis | Description | Re-engineering
CodeCase tools [62] | .NET, C++, Delphi, Java, COBOL | Unknown | Unknown | Commercial | Yes
Understand [37] | Ada, Assembly, .NET, COBOL, Java, C/C++, Jovial, FORTRAN, Delphi/Pascal, PL/M, Python, VHDL, HTML, CSS, Javascript | Yes | Static | Commercial | No
Obeo Agility [38] | Java, COBOL, Ada, C/C++, Oracle forms, FORTE, .NET | Unknown | Unknown | Commercial | Yes
Rigi [63] | C, C++, COBOL | Yes | Static | No longer supported | No
Moose [64] | SQL, PHP, Java, Delphi | Yes | Combined | Open source | Unknown
Bauhaus [47] | C, C++ | Yes | Static | Commercial | Yes
JaMoPP [65] | Java | Yes | Static | No longer supported | Yes
MoDisco [5] | Java, XML, JEE | Yes | Static | No longer supported | Yes
Reclipse [55] | Java | Yes | Static | No longer supported | No
SoMoX (based on MoDisco) software | C/C++, Delphi, and Java | Yes | Static | Commercial | No
MaintainJ [66] | Java | Yes | Dynamic | Commercial | No
EclipseJIVE [67] | Java | Yes | Dynamic | Free to use | No
EclipseDiver [68] | Java | Yes | Dynamic | Free to use | No
Imagix [69] | C/C++, Java | Yes | Static | Commercial | No
LearnLib [70] | Java | No | Dynamic | Academic | No
Columbus (now Sourcemeter) [71] | Java, C++, C#, Python, RPG | No | Static | Commercial | No
CodeSurfer [72] | C/C++ | Yes | Static | Commercial | No
DMS [73] | Ada, C, COBOL, C++, C#, FORTRAN, Javascript, Natural, PHP, PLSQL, Python, SystemC, Verilog, VHDL, Visual Basic, XML | Yes | Static | No longer supported | Yes
CodeCrawler (based on Moose) [74] | Language independent | Yes | Static | Academic | No


4.7 Part II: Interviews

To understand how reverse engineering as part of re-engineering is applied in practice, we have conducted interviews. We follow the interview approach of Kendall et al. [75], which consists of five phases: planning, design, performing the interview, results and discussion. These phases are explained in the next subsections.

4.7.1 Planning

In this phase, five steps were performed. We started by familiarizing ourselves with the topic of re-engineering and reverse engineering, reusing the information collected in Section 4.2. The next step was to make the interviewing goals clear. Step three was to select the interviewees. We opted for four experts on re-engineering, because of time constraints. Three of them were available at Altran, and another one was highly recommended. They are all industrial software engineers. Next, we prepared the interviewees by sending them an invitation via email explaining what the meeting would be about. The planning finished with deciding on the questions and their structure.

4.7.2 Design

The second phase is the design of the interview. There are plenty of articles available on how to do this correctly. Examples of interview structures described in the aforementioned book by Kendall et al. [75] are the diamond model, the pyramid model and the funnel model. We chose the funnel model, which starts from a very wide topic and narrows it down by asking more detailed questions, because we want to start with a general topic and then reach a concrete conclusion. We apply the funnel model twice, the first time for the re-engineering process and the second time for reverse engineering tools.

We have also asked the interviewees to rate requirements for reverse engineering tools on a scale from 1 to 5, based on their importance, 1 being the lowest and 5 the highest. The option to mark a requirement as not applicable is also available. In addition to all the listed requirements, an extra field “other” was added in case the participants felt that something was missing. The selected requirements for reverse engineering tools are a combination of non-functional and functional requirements. Besides static and dynamic analysis, we have also added modern techniques such as machine learning, data mining and code generation. The importance of the programming language that the tool is written in is also rated, since the user might want to modify the tool's source code. We also ask how crucial the input format is, meaning the number of programming languages that can be parsed by the tool. For the concrete questions and the scheme used to rate the requirements, please refer to Appendix B.

4.7.3 Performing the interview

Before the start, we familiarized each participant with the goal of the study. Then we interviewed the four experts. These people have many years of experience in re-engineering projects in well-known high-tech companies such as Philips [76], NXP [77] and FEI [78]. All of the selected engineers have worked in the medical domain. The duration of each interview was approximately one hour.

4.7.4 Results

General results

This subsection describes the results of the interviews. It did not come as a surprise that the interviewees mostly gave different answers. This can be explained by engineers not following a certain standard or having their own interpretation. Nevertheless, they all agreed on the most common reasons for re-engineering: migration, architecture ageing, technical debt and quality evaluation. Three out of the four people interviewed said that validation had been performed through testing, while only one suggested formal system analysis. Another piece of advice that all the experts agreed on was that an incremental approach must be used. Mooi et al. explain this approach in their paper [79]. They state that it is important to demonstrate progress rapidly to all the people involved, such as developers, system architects and management. Moreover, the authors split the approach into three phases: identifying the general patterns, identifying the variation points, and finding what remains.

Figure 4.2: Reverse Engineering Requirements results from interview

Requirements results

As mentioned above, we also asked the participants which requirements a reverse engineering tool should fulfil. This subsection summarizes these results. Figure 4.2 shows the calculated average for each requirement. As can be seen, the input format seems to be the most important requirement. Input format means the programming language in which the project has been written. Second in importance are support, platform, scalability and performance. The interviewees were really concerned about having nobody to help them when struggling with a certain tool. Finally, the figure also demonstrates that for these four people, machine learning and data mining are not essential for reverse engineering and re-engineering. One explanation might be that they have not applied them previously. Moreover, machine learning and data mining might not meet the needs of their industrial cases.
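The average per requirement can be computed as in the sketch below. The ratings are invented for illustration and do not reproduce the interview data; treating a "not applicable" answer as missing and leaving it out of the average is an assumption, as are the names `RATINGS` and `average_ratings`.

```python
from statistics import mean
from typing import Dict, List, Optional


def average_ratings(
    ratings: Dict[str, List[Optional[int]]]
) -> Dict[str, Optional[float]]:
    """Average the 1-5 ratings per requirement, skipping 'not applicable' (None)."""
    averages = {}
    for requirement, scores in ratings.items():
        valid = [score for score in scores if score is not None]
        averages[requirement] = mean(valid) if valid else None
    return averages


# Invented ratings from four interviewees; None marks "not applicable".
RATINGS = {
    "input format": [5, 5, 4, 5],
    "scalability": [4, None, 4, 5],
    "machine learning": [2, 1, None, None],
}

averages = average_ratings(RATINGS)
ranked = sorted(
    (name for name, value in averages.items() if value is not None),
    key=lambda name: averages[name],
    reverse=True,
)
for name in ranked:
    print(f"{name}: {averages[name]:.2f}")
```

With only four participants, a single "not applicable" answer changes the number of ratings behind an average considerably, which is worth keeping in mind when comparing requirements.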

4.7.5 Threats to validity

This section presents the threats to the validity of our study of re-engineering through interviews. We consider two types of validity that might have affected our research procedures and results: external and construct.

External threats

An external threat that might have influenced our research is that a majority of the participants are Altran employees and that they have all worked in the medical domain, a situation which might have led to a particular bias. Nevertheless, this bias should have a minimal impact on the interviews, because the interviewees’ re-engineering experience comes from projects in companies specialized in different fields, as previously mentioned.

Another possible external threat is the low number of participants. Because of this low number, the results may not be sufficient to generalize the findings to all re-engineering projects that involve reverse engineering. Nonetheless, the answers of the participants were similar. Because this is a small sample pool, we cannot make strong claims, but it is a starting point for future development. We would also like to carry out an expanded continuation of this study in the future, because of the potential value of the input. Interviewing more experts will help us see whether their opinions match the results that we obtained.

Construct validity

The main threat to validity that we identified in this process was that the questions asked may have been limited. For example, we asked what the reasons for re-engineering a project were, but knowing who was responsible for initiating the process might have yielded more relevant input. However, this threat has been mitigated by first showing the questions to another expert in the field of re-engineering for inspection. Moreover, we followed the funnel model and the participants in the interview showed no trouble in following the flow. Additionally, the interviewees had the opportunity to add more information at the end in the form of comments.

4.8 Conclusion

This chapter discussed the different types of reverse engineering technologies, particularly static and dynamic analysis, and how they can be combined. We presented some successful examples of their application. Additionally, we mentioned which keywords were used to conduct the literature study and which digital libraries were chosen for obtaining information. Moreover, the entire process of interviewing four re-engineering experts was described in detail. The obtained results showed various reasons for re-engineering. Furthermore, all of the experts agreed that an incremental approach must be used. We also drew the conclusion from the interviews that the input language is the most important requirement for a reverse engineering tool. Although machine learning and data mining are hot topics in 2017, they did not appear to be relevant for reverse engineering according to the results. Finally, we listed the reverse engineering tools discovered during the literature study, with some information about them, in Table 4.6.

Chapter 5

Validation of the re-engineering process in an industrial environment

5.1 Introduction

A case study has been carried out to validate our approach. Altran provided us with the smart metering IoT project YouKnowWatt. The goal is to apply MDE techniques to make a new YouKnowWatt project. The process is executed following five out of the eight steps from the BPMN model documented in Chapter 3. Not all steps are performed, because identifying the process or product issues was not relevant. The reverse engineering part is of most interest to us. We perform analysis of the legacy code of the YouKnowWatt project, which was stored in a repository. Our aim is to extract relevant information such as the components of each subsystem to create a model in the Domain Specific Language (DSL) Lotte. We use the reverse engineering tools Understand [37] and Rubrowser [80] to generate graphs and diagrams to extract the required relevant information. We further analyze the names of the modules from the reconstructed artifacts to gain insight into which components to include in the Lotte model. This model is used to construct a new YouKnowWatt project. We finish by performing a validation by comparing the old project with the new one.

5.2 The YouKnowWatt Project

5.2.1 Project introduction

The YouKnowWatt project was initiated to calculate energy consumption in people’s homes. This goal is achieved with an intelligent energy measuring device that can also present the aggregated information to its users. Altran developed the software for the project and owns the final source code and documentation. Figure 5.1 gives a general idea of the YouKnowWatt project. The figure also shows where the proprietary YouKnowWatt sensor is placed and gives a general idea of the project’s architecture. The YouKnowWatt sensor is responsible for recording the voltage and current waveforms. YouKnowWatt consists of five sub-projects. A total of five developers worked on the YouKnowWatt codebase following the Scrum methodology [81]. The final size of YouKnowWatt is 155 KLOC. The main technologies used for development are HTML, CSS, JavaScript, Java, Ruby on Rails and C.


Figure 5.1: YouKnowWatt customer example

5.3 Challenges with IoT projects

The engineers at Altran have identified a number of obstacles in practice when dealing with IoT projects. An example of such a hurdle is the high dependency between components, which are often deployed in different environments. Morin et al. [82] confirm this by stating that deploying large numbers of dissimilar types of nodes is problematic. They also warn software engineers to be extra careful when developing applications designed to run on shared hardware. Morin et al. further state that IoT development is complicated and time-consuming due to resource constraints, the independent development of things and services, and interoperability. Another issue in IoT projects is the difficulty of finding a multidisciplinary team with expertise in mobile applications, cloud software and embedded systems. In addition, reuse of code for multiple customers is difficult or sometimes impossible to achieve, because junior developers have trouble reading or understanding the code of other people. Frustration is also experienced when integrating the different components that are deployed over different environments. Automation of build and deployment, and end-to-end system testing, are also not straightforward, because the systems are distributed and contain many dynamic behaviours.

5.4 Chimera (A-B-C-D)

After completing numerous IoT projects, one of the system architects at Altran noticed similarities between all of them. Thus, the A-B-C-D framework with codename “Chimera” was introduced. Figure 5.2 displays the framework and explains what each letter from A-B-C-D stands for.


Figure 5.2: A-B-C-D Framework

The Application (A) is used to present the data from the Backend (B) and offer a desirable end-user experience. The visualization is achieved using different subsystems such as Android, iOS, HTML5 or Windows applications, depending on the customer needs. The Backend takes care of scalability and security, and offers generic RESTful APIs. Additionally, component B can be deployed on any cloud platform, for instance Google, Amazon, Microsoft, Altran or private. Next is the Connectivity (C), which functions as a bridge between the device and the backend in a secure, reliable and real-time manner. Example platforms are ESP8266, Raspberry Pi, industrial Linux, Brillo or Android. Last but not least, the Device (D) is the secure, power-managed, well-designed embedded software for the device. Examples of communication protocols that can run on the Device (D) are CAN, Modbus, I2C, serial, USB, BTLE, RF434/868, Wi-Fi, TCP or UDP.

YouKnowWatt is the first project to use Chimera. First, the software engineers developed new subsystems for the YouKnowWatt project. Then they merged the subsystems’ source code back to the Chimera repository for future reuse. This trend continued with other customer-specific IoT projects. Unfortunately, the merging was not performed under strict supervision, meaning that no special guidelines were followed. Examples of such guidelines are that project-specific code is removed and that the quality of the code is up to standard, meaning that it passes unit testing and static code analysis. Additionally, all code must be documented and be part of continuous integration. The evolution of Chimera continued with developers completing even more customer-specific projects. Additionally, the customers guided the development process by advising which features to create next. The source code for some of these features was also merged back. Figure 5.3 shows the framework history.

Figure 5.3: A-B-C-D Framework evolution

At some point in time, the decision to implement a DSL for Chimera was made. According to engineers from Altran, the initial reason was to introduce MDE techniques to IoT projects internally. The DSL also serves as an experiment to see whether the application of MDE to IoT projects is possible. Altran engineers further claim that by working with a DSL, professionals may have an improved user experience. This user experience is provided by the IDE’s features. For instance, when using a DSL in Eclipse [83], customization, auto-complete and hints are available. These helpful features are the initial ideas behind Lotte.

5.5 Lotte

The DSL Lotte has been created to improve the development process of IoT framework solutions by reusing core projects from the Chimera repository. The purpose of the language is to combine different subsystems from repositories and build end-to-end products from them. Altran has developed Lotte as an internal project. Figure 5.4 uses a Venn diagram to display how Lotte and YouKnowWatt are related. As can be noted, there are overlapping repositories between both items. Lotte uses a combination of five key technologies:

1. Eclipse Modeling Framework (EMF) [84]: used for creating the metamodel of Lotte
2. Xtext [85]: used for defining the grammar
3. Query/View/Transformation (QVT) [86]: used for model-to-model transformations
4. Freemarker [87]: used as a template engine
5. JGit [88]: used for cloning repositories

Lotte first clones one or more projects from the Chimera repositories. These projects are then marked as subsystems or components. Finally, additional modules can be added to the subsystems as Git submodules and Gradle dependencies. Using Gradle for handling dependencies helps engineers to automatically obtain dependencies, find conflicts between them and make the build process easier. For more information on Gradle dependencies, see the official website [89]. Lotte also helps with overriding values in code using templates. This task is executed by the feature that generates a Docker Compose [90] file. In the Docker Compose file, developers describe the different environments that they need for their project, and the tool then integrates them. Figure 5.5 shows an example of a Lotte specification.

Figure 5.4: Lotte and YouKnowWatt relation


Figure 5.5: Lotte illustration


Figure 5.6: Re-engineering of new YouKnowWatt project

5.6 Validating the model-based re-engineering process

In this section we explain the steps followed while performing the case study. We apply the re-engineering process defined in Chapter 3.

5.6.1 Overview

Figure 5.6 illustrates an overview of our plan for performing the re-engineering process for the YouKnowWatt project. As can be noted, the process starts with performing reverse engineering on the old YouKnowWatt project. The idea behind this step is to extract the current components from each sub-project in the form of models. A list of preselected reverse engineering tools assists in this process. After successfully extracting and investigating the models, we write a Lotte specification to generate the new YouKnowWatt project via forward engineering. This step is carried out more than once, each time measuring how similar the new YouKnowWatt project is to the old one by running the original test suite.
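The iterative measurement described above can be pictured with a small Python sketch. It assumes, hypothetically, that each forward-engineering iteration produces a mapping from test name to pass or fail for the original test suite. The function names and the sample test names are illustrative only and are not part of Lotte or YouKnowWatt.

```python
from typing import Dict, List


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of the original test suite that passes (0.0 for an empty suite)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def best_iteration(history: List[Dict[str, bool]]) -> int:
    """Index of the iteration whose generated project passes the most tests.

    Expects a non-empty history; ties resolve to the earliest iteration.
    """
    rates = [pass_rate(results) for results in history]
    return max(range(len(rates)), key=rates.__getitem__)


# Hypothetical outcomes of two iterations over the same three legacy tests.
history = [
    {"test_login": True, "test_chart": False, "test_api": False},
    {"test_login": True, "test_chart": True, "test_api": False},
]
for index, results in enumerate(history):
    print(f"iteration {index}: {pass_rate(results):.2f}")
print("closest to the old project:", best_iteration(history))
```

Using the pass rate as the similarity measure is an assumption of this sketch; other measures, such as structural comparison of the generated code, could be substituted without changing the loop.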

5.6.2 Mapping to the proposed BPMN model

Our case study starts with examining the BPMN re-engineering process and then mapping the relevant steps in each phase. Figure 5.7 illustrates the mapping. Above each step there is a coloured rectangle with one of three labels: “Not Applicable”, “Given” and “Done”. The label indicates the status of the step within the scope of the project. Steps 1, 2 and 8 are marked as not applicable, because the goal of the project is not to address issues on process or product level in YouKnowWatt. Thus, we start with Step 3, because we want to use Lotte as a new process to generate a YouKnowWatt product. All the steps from 4 to 7 are executed.


Figure 5.7: Case study mapped to Re-engineering BPMN

5.6.3 Gathering all available project information

We started by gaining access to the project files. The software repository consists of five sub-projects and a folder containing all the documentation. The names of the five sub-projects are HTML5 App, Admin Backend, Android Connectivity, Reasoning Backend and Simulator Device. The documentation is divided into three folders: Product, Project and Wiki. The Product folder includes information about the design, model, requirements, specification, testing and user manuals. The Project folder explains the process behind the development of YouKnowWatt. The Wiki folder contains images of data and information models.

Figure 5.8 illustrates the initial YouKnowWatt architecture. As can be seen, there are three main subsystems: presentation engine, reasoning engine and acquisition engine. The components corresponding to each subsystem are visualized below each engine. Firstly, the presentation engine visualizes the correct information to the end users of YouKnowWatt. This subsystem is responsible for, among other things, the power usage and data monitor visualization. From the components, we can deduce that it is a web application. Secondly, the reasoning engine provides valuable logic for reasoning in the backend. This logic includes functionality such as communication with other parts of YouKnowWatt, retrieving sensor data from a relational database, performing calculations, and storing the results in binary Matlab [91] files or a NoSQL database. Finally, the acquisition engine stores and forwards the data to the cloud, and replaces the real sensor with a simulator. This subsystem also has the responsibility of communicating with the rest of the YouKnowWatt software and simulating the sensor data for it. The connection is established via Wi-Fi or 3G.


Figure 5.8: Architectural view of YouKnowWatt components from the available documentation

5.6.4 Reverse Engineering Plan

The reverse engineering plan is available in Appendix B. The motivation is drawn from [92]. It is essential to make an inventory of the available resources of the project and to decide which reverse engineering tools to use, based on the re-engineering goal. In our case, the goal is to reconstruct the old process inputs for a new Lotte model. The new model can use the old YouKnowWatt architecture components as a starting reference.

Phase 1.2 of the reverse engineering plan is to make an inventory of the available resources. Subsequently, phase 1.3 is responsible for selecting a strategy, categorizing the quality attributes and choosing the tools. Table 5.1 lists the identified items, together with their quality attributes, structure and relevance. Both attributes use a scale from one to five. For structure, one means that the artifact has bad or no structure and can only be analyzed manually. Five means the opposite: the artifact is well-structured and its analysis can be fully automated. The numbers in between express that the item requires manual work, with a possibility to partially automate the process with a tool. For relevance, one stands for non-relevant and outdated, and five for up-to-date. Note that the Android device X5 Mini [93] does not have a number filled in for structure, because it is out of scope. We have selected five for relevance, because all items from the final version of YouKnowWatt should be up-to-date.

Table 5.1: Complete result from Phase 1.2 and partial result from Phase 1.3

Item | Structure | Relevance
Codebase of all five sub-projects with repository history of YouKnowWatt | 4 | 5
Documentation such as requirements and specification | 5 | 5
Management information regarding the process such as sprint goals and sprint reviews | 1 | 5
Validation and acceptance items such as test plan, test cases and test results | 1 | 5
Data files and databases such as MongoDB and PostgreSQL schemas | 2 | 5
Hardware used for the project - an X5 Mini | - | 5
Three out of five original developers | 1 | 5
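The structure scale of Table 5.1 translates directly into the amount of automation an item allows. The Python sketch below encodes that reading: the scores are copied from the table, the item names are shortened, and the class and function names are illustrative assumptions rather than part of the reverse engineering plan in Appendix B.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InventoryItem:
    name: str
    structure: Optional[int]  # 1..5, or None when the item is out of scope
    relevance: int            # 1..5


def analysis_mode(item: InventoryItem) -> str:
    """Translate the structure score into the kind of analysis it implies."""
    if item.structure is None:
        return "out of scope"
    if not 1 <= item.structure <= 5:
        raise ValueError(f"structure score must be 1..5, got {item.structure}")
    if item.structure == 1:
        return "manual"
    if item.structure == 5:
        return "fully automated"
    return "manual with partial tool support"


# Scores copied from Table 5.1; item names are shortened.
inventory = [
    InventoryItem("Codebase with repository history", 4, 5),
    InventoryItem("Documentation", 5, 5),
    InventoryItem("Management information", 1, 5),
    InventoryItem("Validation and acceptance items", 1, 5),
    InventoryItem("Data files and databases", 2, 5),
    InventoryItem("Hardware (X5 Mini)", None, 5),
    InventoryItem("Original developers", 1, 5),
]

for item in inventory:
    print(f"{item.name}: {analysis_mode(item)}")
```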


Table 5.2: Types of objects for each project

Project | Type of object | Engine
Power simulator | 2 | Acquisition Engine
Admin backend | 3 | Reasoning Engine
Html5 App | 3 | Presentation Engine
Reasoning backend | 2 | Reasoning Engine
Android Connectivity | 3 | Acquisition Engine

In Chapter 4, we concluded from the interviews that an incremental approach must be used for reverse engineering. Therefore, we have chosen it as our strategy. Table 5.2 shows the projects from YouKnowWatt, the type of object they belong to and their corresponding engine. As described in Appendix B, there are three types of objects:

1. Not utilizable - these objects are discarded;
2. Object already satisfies reverse engineering goals - these objects are reused;
3. Object contains reusable parts - these objects are analyzed.
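The three object types and the classification in Table 5.2 determine what happens to each sub-project. A minimal Python sketch of that decision is shown below. The enum and function names are illustrative assumptions, while the type numbers and project names are taken from Table 5.2.

```python
from enum import IntEnum
from typing import List


class ObjectType(IntEnum):
    NOT_UTILIZABLE = 1
    SATISFIES_GOALS = 2
    HAS_REUSABLE_PARTS = 3


# Action attached to each object type, as listed above.
ACTION = {
    ObjectType.NOT_UTILIZABLE: "discard",
    ObjectType.SATISFIES_GOALS: "reuse",
    ObjectType.HAS_REUSABLE_PARTS: "analyze",
}

# Project classification copied from Table 5.2.
PROJECTS = {
    "Power simulator": ObjectType.SATISFIES_GOALS,
    "Admin backend": ObjectType.HAS_REUSABLE_PARTS,
    "Html5 App": ObjectType.HAS_REUSABLE_PARTS,
    "Reasoning backend": ObjectType.SATISFIES_GOALS,
    "Android Connectivity": ObjectType.HAS_REUSABLE_PARTS,
}


def projects_to(action: str) -> List[str]:
    """Names of the projects whose object type leads to the given action."""
    return sorted(name for name, kind in PROJECTS.items() if ACTION[kind] == action)


print("analyze:", projects_to("analyze"))
print("reuse:", projects_to("reuse"))
print("discard:", projects_to("discard"))
```

In this case study no project falls into the first category, so the "discard" list is empty.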

5.6.5 Reverse Engineering Tools: Old YouKnowWatt → models

We have applied the following three criteria to select the reverse engineering tools needed to carry out the case study:

1. Programming Language: It is determined by the technologies used in the YouKnowWatt project.
2. Availability: We will focus on open-source or commercial tools, to which Altran has access.
3. Support: The tool must have well-written documentation, video tutorials and a helpful community.

Overview

Figure 5.9 illustrates a general overview of the technologies used in the YouKnowWatt project. We gathered this information by running the lines-of-code counter cloc [3]. As can be seen, JavaScript has the largest codebase, followed by Ruby and then Java. Therefore, the selected tools need to support at least one of these three programming languages. The JavaScript source code also has the largest number of commented lines. This fact should assist us when trying to understand more about the included components. In the end, we selected Understand as our first reverse engineering tool, because it can parse Java and JavaScript. As a second choice we turned to Rubrowser [80], because none of the tools identified in our literature study supported Ruby on Rails.
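The language criterion can be read as a small coverage problem: pick tools until the dominant languages are covered. The Python sketch below illustrates this with a greedy selection. It is stricter than the criterion stated above, since it asks for all three languages to be covered rather than at least one. The language sets contain only what this section states about the two tools, and the function name is an illustrative assumption.

```python
from typing import Dict, List, Set

# Language support as stated in this section; nothing beyond it is claimed.
TOOL_LANGUAGES: Dict[str, Set[str]] = {
    "Understand": {"Java", "JavaScript"},
    "Rubrowser": {"Ruby"},
}


def select_tools(required: List[str], tools: Dict[str, Set[str]]) -> List[str]:
    """Greedy cover: repeatedly pick the tool that handles most uncovered languages."""
    uncovered = set(required)
    chosen: List[str] = []
    while uncovered:
        best = max(tools, key=lambda tool: len(tools[tool] & uncovered))
        gained = tools[best] & uncovered
        if not gained:
            raise LookupError(f"no tool supports: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# The three largest languages reported by cloc for YouKnowWatt.
print(select_tools(["JavaScript", "Ruby", "Java"], TOOL_LANGUAGES))
```

The availability and support criteria are not modelled here; they would act as filters on the candidate tools before the coverage step.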

Understand and its results

Another reason for selecting the static analysis tool Understand is the availability of a webinar that introduces newcomers to the tool. Moreover, the support documentation is abundant and easy to comprehend. Furthermore, large corporations such as Adobe, IBM, NASA, SIEMENS, BMW and TOYOTA use Understand. The tool can also easily visualize parsed source code in different diagrams and generate code metrics. Finally, as a bonus, Altran already had a license for it. In our case study, Understand is used to gain further insight into the HTML5 application and Android connectivity projects by creating a dependency graph and a class diagram.


Figure 5.9: YouKnowWatt Cloc [3] Result

Figure 5.10: Understand input and output

Figure 5.11 depicts the dependency graph. It was generated by parsing HTML, CSS and JavaScript. As can be noted, the picture zooms in on the various “components” used in the presentation engine. We ignore the other views as they are of no interest to us. There are currently five visible components: jQuery, Bootstrap, MomentJS, HighCharts and Angular. Understanding which components are included in the presentation engine is necessary to write a correct Lotte specification. This specification is used to generate a new YouKnowWatt project in a later step.

Figure 5.11: HTML5 dependency graph created by Understand

Figure 5.12 illustrates a class diagram generated by Understand for the Android connectivity project. We examine all the class names in order to get a basic idea of the functionality behind this project. As can be noted, we have zoomed in on two places, because we found two class names particularly interesting: RabbitMqDataHandler and MySQLiteHelper. These two names suggest RabbitMQ and SQLite as two components of the project. This information is crucial when writing the Lotte specification for the new YouKnowWatt project, because we will need to include them.

Figure 5.12: Android connectivity class diagram created by Understand

All in all, we find Understand very helpful, especially its reverse engineering features such as generating dependency graphs and class diagrams. Other tools would probably also have coped with extracting a class diagram, as it is a very general task. Yet, we cannot be certain about how well they would perform for dependency graphs. We also found the ability to generate software metrics and to navigate easily through the source code useful, especially when engineers work with large codebases, for instance more than 20 KLOC. It is also possible to develop plug-ins. For a list of all the functionality of Understand, see the official website [37]. The only possible disadvantage is that the tool is not free. Moreover, we experienced that without a connection to an internal Understand license server, the tool cannot be used.
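Reading component names off class names, as done for RabbitMqDataHandler and MySQLiteHelper, can be approximated with a keyword scan. The Python sketch below is one possible way to do this and is not a feature of Understand. The keyword table is an assumption to be extended per project, and MainActivity is a hypothetical class name added only to show a non-match.

```python
from typing import Dict, List, Set

# Assumed keyword-to-component hints; extend this table per project.
HINTS: Dict[str, str] = {
    "rabbitmq": "RabbitMQ",
    "sqlite": "SQLite",
}


def components_from_names(class_names: List[str]) -> Dict[str, Set[str]]:
    """Map each hinted component to the class names that mention it."""
    found: Dict[str, Set[str]] = {}
    for name in class_names:
        lowered = name.lower()
        for keyword, component in HINTS.items():
            if keyword in lowered:
                found.setdefault(component, set()).add(name)
    return found


# The first two names come from the class diagram; the third is hypothetical.
names = ["RabbitMqDataHandler", "MySQLiteHelper", "MainActivity"]
for component, evidence in components_from_names(names).items():
    print(component, "<-", sorted(evidence))
```

Such hints remain suggestions: MySQLiteHelper, for instance, also contains the substring "mysql", so a careless keyword table could point to the wrong database. A manual check of configuration files, as performed in this case study, is still needed.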

Rubrowser and its results

Our aim is to extract the different modules from the Admin Backend by studying its source code with Rubrowser. According to its creator, Rubrowser is a visualization tool that parses Ruby code, performs analysis and collects the module definitions found in classes and modules. The visualized model is presented in the shape of an interactive graph in the browser. The dependants and dependencies are made available for each module. Figure 5.13 demonstrates how we used this tool in the context of the YouKnowWatt project. Ruby on Rails uses the Model-View-Controller (MVC) software architectural pattern, so we have chosen to investigate only the models and controllers. The models might provide us with information about where the data is stored and the business logic behind the application. Additionally, the controllers reveal more details about how the data is used.

Figure 5.13: Rubrowser input and output

Figure 5.14 illustrates the dependency graph extracted from the Admin backend project. As can be noted, the model is too complex to fit on one page. Thus, we zoom in on particular areas.

Figure 5.14: Admin backend module and class dependency graph generated by Rubrowser

A few nodes have many dependencies, such as Doorkeeper::AccessToken, Admin::WebController, ActiveRecord::Base, ActiveScaffold::Config and ActiveScaffold::DataStructures. ActiveScaffold and Doorkeeper are Ruby on Rails gems. Gems are dependencies that help simplify the work of the developer. ActiveScaffold is responsible for Create Read Update Delete (CRUD) user interfaces, while the main functionality of Doorkeeper is to provide OAuth2 support. According to the official description on the GitHub page of ActiveScaffold, the gem requires a database to function. Therefore, Admin backend must contain at least one database. This information is not directly visible by just examining dependencies. One of the Doorkeeper modules, Doorkeeper::AccessToken, lists Doorkeeper::Models::Mongoid2::Scopes, Doorkeeper::Models::Mongoid3::Scopes, Doorkeeper::Models::Mongoid4::Scope as dependencies. Mongoid is an object-document mapper for MongoDB, so this project most likely runs MongoDB in the background. A manual double check of the application’s configuration file confirms this statement. Connection settings that list PostgreSQL were also identified, meaning that the admin backend project makes use of two databases.
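The two observations made on the Rubrowser graph, which modules have many dependants and which dependency names hint at a database, can be reproduced on an edge list. The Python sketch below assumes the graph is available as (dependant, dependency) pairs, which is an assumption about the data and not a documented Rubrowser export format. The first three edges are stated in the text above; the controller names in the last two are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (dependant, dependency)


def fan_in(edges: Iterable[Edge]) -> Counter:
    """Count distinct dependants per module, ignoring duplicate edges."""
    unique_edges = dict.fromkeys(edges)  # de-duplicates while keeping order
    return Counter(dependency for _, dependency in unique_edges)


def dependencies_matching(edges: Iterable[Edge], keyword: str) -> List[str]:
    """Sorted names of dependencies whose module path mentions the keyword."""
    return sorted({dep for _, dep in edges if keyword.lower() in dep.lower()})


edges = [
    # Stated in the text above:
    ("Doorkeeper::AccessToken", "Doorkeeper::Models::Mongoid2::Scopes"),
    ("Doorkeeper::AccessToken", "Doorkeeper::Models::Mongoid3::Scopes"),
    ("Doorkeeper::AccessToken", "Doorkeeper::Models::Mongoid4::Scope"),
    # Hypothetical edges, added only to make the ranking non-trivial:
    ("Admin::DevicesController", "Admin::WebController"),
    ("Admin::SessionsController", "Admin::WebController"),
]

print(fan_in(edges).most_common(1))
print(dependencies_matching(edges, "Mongoid"))
```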


Figure 5.15: Zoomed-in Admin::WebController

Next, we investigate Admin::WebController, as it appears to have the second most dependants. Figure 5.15 displays a general overview of the module’s dependants and dependencies. As can be noted from the names of the dependencies, this controller deals with users’ roles, permissions and views. Additionally, it handles information for devices, sessions, contracts and channels. After examining the source code of the dependants, we observed that all of the previously mentioned information of Admin::WebController is stored in a PostgreSQL database. To sum up, we are happy with the performance of Rubrowser. We recommend this simple tool for Ruby projects, because it is easy to set up, its usage is straightforward with no complicated commands, and it runs in a browser. We would not have been able to reverse engineer the YouKnowWatt project without this tool, because we were not able to find an alternative.

5.6.6 Perform early validation

Table 5.3 summarizes the results from the aforementioned extracted models. Some items appeared both in the models and in the documentation. These items were marked as being discovered by the tool. We now have a better understanding of the components that are part of the three projects. No validation report was produced, because Table 5.3 was validated by the original developers and writing such a report was out of scope.

5.6.7 Applying model-driven engineering

Models → Lotte

We specify a Lotte model by reusing the information extracted via reverse engineering from the models in Section 5.6.5. This part cannot be automated, as the specification requires manual work. Note that Lotte does not generate all the source code, so there will be missing parts. Furthermore, no documentation is generated. Hence, the engineers will need to write the specification documents.

Figure 5.16 shows the final Lotte specification. We use “YKWv2” as the name of the re-engineered YouKnowWatt. As can be noted, the four Chimera components A, B, C and D are present. The parameter CreateGit has been set to false, because creating a Git repository is not necessary.


Figure 5.16: Lotte specification for the new YouKnowWatt

Software architects advised us to choose a project named HTML5 app for component A. This project is part of the general repositories with sample IoT projects. Component B includes two projects: admin backend and reasoning. We reuse the reasoning project from the old YouKnowWatt, because rewriting the reasoning engine is not the aim of this research. Admin backend is another base project from the general IoT repository. The extra commands such as “environment”, “command” and “ports” are used for the deployment configuration of Docker. Docker is a container platform that runs in clouds and data centers. It ensures independence between applications. Docker is used to deploy different YouKnowWatt components in containers. We added the line platform backend compose in order to use the built-in Docker functionality of Lotte. Consequently, we include five Docker containers: Bitnami, PostgreSQL, MongoDB, RabbitMQ and SQLite. This is done because MongoDB was identified in the admin backend dependency graph from Figure 5.14 and PostgreSQL with manual work. RabbitMQ and SQLite are also added, because their presence was recognized in the Android connectivity class diagram in Figure 5.12. Furthermore, the manual inspection of configuration files confirmed these facts and also found Bitnami as a component. Analogously, component C is also a base Android IoT project from the general repository, responsible for connecting devices. We chose this project because of advisory input from the initial developers. Component D reuses the original power simulator for the same reasons as the reasoning engine.

Table 5.3: Information per project overview

Project | Information | How discovered
HTML5 App | Plain JavaScript | Discovered in Model
HTML5 App | Make calls to OAuth service | Discovered in Model
HTML5 App | No extra docker containers | Discovered in documentation
HTML5 App | Client needs an API key | Discovered with manual work
Admin backend | Uses MongoDB | Discovered in Model
Admin backend | Uses PostgreSQL | Discovered with manual work
Admin backend | Uses Apache2 | Discovered with manual work
Admin backend | Uses Rubystack | Discovered with manual work
Admin backend | Uses RabbitMQ | Discovered with manual work
Android Connectivity | Runs on X5 Mini | Discovered in documentation
Android Connectivity | Stores data in SQLite | Discovered in Model
Android Connectivity | Uses RabbitMQ | Discovered in Model
Android Connectivity | Sends or receives data using RTP | Discovered in Model

Figure 5.17: New YouKnowWatt Directory Structure

Lotte → New YouKnowWatt

Figure 5.17 depicts the folder tree structure of the new YouKnowWatt. As can be seen, each of the four components has its own project folder, cloned from the general IoT repository. Additionally, there is the folder backend compose containing the Docker configuration file for all the services using containers.

Figure 5.18 shows the new architecture. The red cross means that an item is absent, and the green tick means that the item exists in both the old and the new architecture. “New” means that the component was not present in the old YouKnowWatt. Each component is also mapped to Lotte’s A-B-C-D. As can be noted, the Bluetooth for the acquisition engine is new. Moreover, Bitnami, MongoDB, PostgreSQL and SQLite are now also part of the application, but as Docker containers. Thus, the management of components is simplified and security is increased. During the reverse engineering, Hadoop was not found as a component. The initial developers confirmed that they had planned to use it, but never included it. This explains why it is still in the old documentation. The rest of the components are the same. To sum up, the architecture level of the old YouKnowWatt has been reached.


Figure 5.18: New YouKnowWatt architecture

5.6.8 Analyzing the results

During this step, we put HTML5 App, Admin Backend and Android connectivity from the new YouKnowWatt project under the microscope. We examine only these three components, as indicated in Table 5.2, because the reasoning backend and power simulator are reused. We manually compared the folder structures and read all files to get a better understanding of what has changed compared to the old YouKnowWatt.

For convenience, we have summarized the information that we collected in Table 5.4, Table 5.5 and Table 5.6. These tables highlight similarities, differences, the technology used and functionality. We have chosen these categories because of their significance for comprehending the end result of the re-engineering. We are also interested in how the new components answer the following three questions:

• Is the component complete? If not, what is missing?
• Is the component compilable?
• Does the new component pass the tests from the old product?

The similarities from Table 5.4 show that both old and new components A have a very analogous project structure in terms of folders and files. However, the two A components use different build tools. We are uncertain if this difference causes any major changes in the behaviour. The difference between the two build tools is that Bower is used as a package manager for web applications, and the engineers need to state from where to get the dependencies. On the other hand, Grunt is a task runner and makes use of the package.json to download the dependencies. The old component A has also been delivered without any tests. The new component A does not lack tests, because it has continuous integration. Additionally, the lines of code for both projects seems to be similar. However, it is noticeable that the new component A has 31 more files. The majority of these files are part of continuous integration. The main technologies behind both projects are Javascript, CSS, and HTML. The new component A also contains Bourne Shell files with scripts for continuous integration, hence increasing the file count. Table 5.5 displays almost no similarity between the old and new Component B. Even though the new component B is used as base, the differences are rather striking. For example, compared to the old B, the new project B is 22 times smaller in lines of code. Moreover, the new B also contains approximately six times less files. This finding suggests that the new component B is not

38 Re-engineering the re-engineering process CHAPTER 5. VALIDATION OF THE RE-ENGINEERING PROCESS IN AN INDUSTRIAL ENVIRONMENT

Table 5.4: Component A

Similarities Have features, src folders Have package.json Some main folder structure in src/app Have folder containing the same components New A Old A 7 folders 3 folders Built with Bower Built with Grunt Has tests No tests Has scripts folder for continuos integration No scripts folder Angular App No server component 129 files 98 files CLOC LOC: 86401 CLOC LOC: 85046 Understand LOC: 72855 Understand LOC: 71881 Technology Javascript Javascript CSS CSS HTML HTML Ruby YAML Bourne Shell Functionality OAuth 2 OAuth 2 Chart visualization Chart visualization Login Login Dutch Language Support

Table 5.5: Component B

Similarities Standard Ruby on Rails project folder structure New B Old B Contains main directory with Docker configuration file Contains active scaffold folder Contains scripts folder to assist in CI Contains doorkeeper folder Folder Bin contains 3 more files: setup, console and update App folder also has helpers, mailers, assets folders Config folder contains extra files spring.rb, puma.rb, cable.yml Config folder contains extra files test mongo.yml, test db.yml, staging mongo.yml, stagind db.yml CLOC LOC: 2052 Build.gradle file in main directory Number of Files: 114 .metrics file CLOC LOC: 86401 CLOC LOC: 85046 Technology Ruby Ruby Yaml Javascript Bourne Shell YAML CSS SASS HTML Bourne Again Shell Functionality Exposes API endpoint Exposes API endpoint Device support Supports OAuth 2 Subject support Web pages done with CRUD for user interface User management support Device support Channels support Permissions support Transducer support User management support

complete and that additional manual work is required to reach the behaviour of the old component. To determine the exact amount of additional manual work, the two components need to be compared at source code level, line by line. We did not perform this task due to time restrictions. Even though the new project B compiles and runs, its tests still fail. A possible explanation is that a full end-to-end system, with the corresponding hardware, needs to be present for the tests to succeed. Nevertheless, unlike the old B, the new B includes continuous integration. There also seems to be a major difference in the technology used: the new component B appears simpler, being implemented in Ruby and taking advantage of YAML and Bourne Shell. There is also a major difference in functionality. The old component B


Table 5.6: Component C

Similarities Standard Android projects with Gradle as build tool Use of JUnit for test cases Use of Mockito Use of rabbitmq-amqp-client Use of nonsenseapps:filepicker New C Old C Jenkins integration Uses dexmaker:mockito Uses pipeline module Uses common-net 3.3 Uses colibri module Uses appcompat Uses base protocols Uses acra Uses gson CLOC LOC: 17923 Uses beanlib Number of files: 282 Uses Conan CLOC LOC: 1324 Number of files: 36 Technology Java Java XML XML Bourne Again Shell Bourne Again Shell DOS Batch DOS Batch YAML YAML Ruby Ruby IDL Functionality Bluetooth support Upload files through USB SSH support Crash logger to a file Communication Service Database connection support Colibri service Graph visualization JSON, RTP, TCP, Random input Tuning of graph view, usb, and server preferences JSON, REST, SSH output Handles data from RTP, AMQP, databases and files Temperature Info HTTP, WebSocket, Temperature pipelines

contains more features, as listed in the table. Table 5.6 lists five similarities between the two C components, such as being an Android project, having tests, and using three common components. As for differences, the new C has six modules, whereas the old C has four. This inequality results in different functionality. Additionally, the codebase of the new C is 16 times smaller than that of the old one. As a result of these two findings, and as with component B, manual work will be required. The required work can be determined by studying the differences between the various components; a general picture can be obtained by looking at both class diagrams. We made this comparison and noticed that the new component C includes continuous integration as well as Bluetooth and SSH communication. There is also a pipeline for temperature. Almost none of the old component C’s functionality seems to be included in the new C. For instance, uploading files through USB, crash logging and graph visualization are missing. 
The difference in functionality probably comes from the new Android connectivity project from the IoT projects. The evolution of Chimera might also explain the new functionality as customers might have added new features to the original project. Nevertheless, the missing features of the old C component will still have to be added manually by developers to the new YouKnowWatt.
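The size comparison reported for component A can be reproduced with a few lines of Python. The following is only an illustrative sketch: the names `SizeMetrics` and `compare` are hypothetical and not part of any tool used in the case study, and the figures are the ones listed in Table 5.4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeMetrics:
    """Size figures for one component (hypothetical helper type)."""
    files: int
    cloc_loc: int        # lines of code as counted by CLOC
    understand_loc: int  # lines of code as counted by Understand


def compare(new: SizeMetrics, old: SizeMetrics) -> dict:
    """Return the file-count difference and the new/old LOC ratios."""
    return {
        "extra_files": new.files - old.files,
        "cloc_ratio": new.cloc_loc / old.cloc_loc,
        "understand_ratio": new.understand_loc / old.understand_loc,
    }


# Figures taken from Table 5.4 (component A).
new_a = SizeMetrics(files=129, cloc_loc=86401, understand_loc=72855)
old_a = SizeMetrics(files=98, cloc_loc=85046, understand_loc=71881)

result = compare(new_a, old_a)
print(result["extra_files"])           # 31
print(round(result["cloc_ratio"], 3))  # 1.016
```

A ratio close to one, as for component A, indicates codebases of similar size, whereas the much smaller new components B and C would yield ratios far below one.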

5.7 Extended BPMN after validation

After completing the case study, we found that our initially proposed model-based re-engineering BPMN model could be extended to provide better results. Figure 5.19 illustrates the improved version with the new extensions. An extra connection from Step 7 back to Step 6 has been added, because parts might be missing from the models used in the MDE process, in which case the newly generated product will not be complete. To resolve this issue, the engineer must go back to Step 6 and fill in the missing information. It is also possible that not all the necessary artifacts were previously reconstructed. Hence, a connection from Step 7 back to Step 4 has also been added.
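The effect of the two added connections can be illustrated by treating the process as a small directed graph. The sketch below rests on an assumption that is not stated in this section: the seven steps are modelled as a strictly sequential forward flow, identified only by their numbers. The function names are hypothetical.

```python
from __future__ import annotations

# Assumed forward flow: Step 1 -> Step 2 -> ... -> Step 7 (a simplification).
FORWARD = {step: [step + 1] for step in range(1, 7)}
FORWARD[7] = []

# Feedback connections added after the case study (Step 7 -> 6 and 7 -> 4).
FEEDBACK = {7: [6, 4]}


def successors(step: int) -> list[int]:
    """Steps reachable in one move: forward edges first, then feedback."""
    return FORWARD.get(step, []) + FEEDBACK.get(step, [])


def can_reach(start: int, target: int) -> bool:
    """Depth-first search: is `target` reachable from `start`?"""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for nxt in successors(current):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


print(successors(7))    # [6, 4]
print(can_reach(7, 5))  # True: Step 7 -> Step 4 -> Step 5
print(can_reach(7, 3))  # False: no connection leads back before Step 4
```

Under this simplification, the extension lets the engineer revisit Steps 4 to 6 from Step 7, while the earlier steps remain outside the loop.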


Figure 5.19: Improved model-based re-engineering BPMN model

5.8 Conclusion

In this chapter we explained how a new YouKnowWatt project was re-engineered using the DSL Lotte. The aim of this case study was to validate and follow the steps of the process suggested in Chapter 3, while focusing on reverse engineering to recover artifacts as input for the new process. The feedback from the interviews described in Chapter 4, to use the incremental approach, was also taken into consideration. We successfully extracted dependency graphs and a class diagram from three of the YouKnowWatt projects using two static analysis tools, and we examined the existing documentation. By doing so we determined which components to include in the new Lotte specification. Subsequently, we compared the new and old YouKnowWatt to get a better understanding of the differences between the two projects.

Our results show that the new components B and C are incomplete; the missing parts need to be developed manually. Nevertheless, the new components are still compilable and runnable. Additionally, there are differences in the architecture of the two YouKnowWatt projects: for instance, Docker containers and Bluetooth functionality are new features of the new YouKnowWatt project. We recommend that Altran use its in-house knowledge and experts to investigate, at code level, why the tests for the new YouKnowWatt fail, so that this obstacle can be avoided when Lotte is used again to re-engineer a product.

Chapter 6

Conclusions

6.1 Research questions and conclusions

Even in the 21st century, the number of legacy systems in production is still very high. This fact is worrisome, because maintaining them and introducing new features is difficult. Moreover, simply replacing the legacy systems is not an option, because valuable business logic might be lost. Re-engineering can offer a solution by studying the old legacy system with the help of reverse engineering, extracting models and using them as input for forward engineering techniques to create a new modern system. This is the basic process behind model-based re-engineering. There are even standards, such as the horseshoe model, that describe how to perform this as a migration. Nevertheless, these standards are not detailed and lack verification and validation of the newly generated product and the new process.

RQ1: What is the process of Model-based re-engineering?

Our answer to RQ1 addresses these problems in detail by presenting the different variations of the horseshoe model for model-based re-engineering and their flaws. We proposed a new, detailed model-based re-engineering process expressed in four phases, divided into seven steps. The answer to RQ1 also included sample activities, inputs and outputs for each step.

RQ2: What is the current state of Reverse Engineering?

RQ2 deals with performing reverse engineering as a sub-phase of the model-based re-engineering process. We investigated the different types of technology, focusing on static and dynamic analysis, and mentioned case studies where this technology was applied. Additionally, we summarized some of the available reverse engineering tools along categories such as support for re-engineering, static and/or dynamic analysis, programming languages and visualization. To answer RQ2 fully, we also interviewed four experts. Most of them agreed that testing was the main way to validate the results of reverse engineering and re-engineering. They further stated that the incremental approach fitted the re-engineering process best. Finally, none of them was interested in machine learning or process mining for reverse engineering.

RQ3: How to validate the re-engineering process in an industrial environment?

RQ3 showed how we applied our findings in a case study provided by Altran as a proof of concept of the proposed model-based re-engineering process. We extracted and analysed dependency graphs and a class diagram from three of the existing sub-projects. The results determined which components to include in the DSL Lotte. Lotte was used to generate a new YKW project. Finally, we verified the similarity between the old and new product by answering three subquestions:

• Is the product complete? If not, what is missing?
• Is the product compilable?
• Does the new product pass the tests from the old product?

The results show that two of the components are incomplete, have rather different functionality, and need manual work. One of the reasons might be the undocumented evolution of the Chimera repository. Nonetheless, engineers need to implement support for OAuth2, transducers, permissions, devices and a CRUD for the user interface in Component B. The new Component C is missing most of the old C’s features, such as uploading files through USB, crash logging and graph visualization. We can now be certain about the minimum amount of manual work required from the developers.
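The three subquestions can be read as a simple checklist per component. The sketch below is a hypothetical illustration, not the procedure used in the case study: the class `VerificationResult` and its fields are invented here, and the values for Component B restate the findings reported in this thesis (it compiles, its tests fail, and five features are missing).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VerificationResult:
    """Answers to the three subquestions for one component (hypothetical)."""
    component: str
    compiles: bool
    passes_old_tests: bool
    missing_features: list[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        # The product is complete when no feature is missing.
        return not self.missing_features

    def needs_manual_work(self) -> bool:
        # Manual work is needed if any of the three checks fails.
        return not (self.complete and self.compiles and self.passes_old_tests)


component_b = VerificationResult(
    component="B",
    compiles=True,
    passes_old_tests=False,
    missing_features=[
        "OAuth2", "transducer", "permissions", "devices", "CRUD user interface",
    ],
)

print(component_b.complete)             # False
print(component_b.needs_manual_work())  # True
```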

6.2 Future work

After summarizing our findings in this thesis, we see plenty of work that can be done as a continuation of this project. This section describes what can be improved in the future.

First, our proposed model-based re-engineering process needs to be verified by more experts. As mentioned before, the low number of experts interviewed might not be enough to make general claims. However, it can serve as a starting point for an expanded continuation. It would also be beneficial to find industrial case studies in which all the steps from our proposed model-based re-engineering process can be executed. These industrial cases should also further validate the proposed table of sample activities, inputs and outputs. This validation may be achievable through trial and error.

Another possible step is to build a new reverse engineering tool based on the interview results. According to these results, developers of a new tool need to focus on the supported programming languages. For instance, we advise the engineers to start by adding support for less common programming languages such as Ruby and Pascal, because they are still used but few tools exist for them. Future versions can include support for more modern programming languages like Swift and Kotlin. Furthermore, it is highly recommended that people who develop these tools and include static analysis as a technique have background knowledge of parsers.

Next, it would be interesting to see what would happen if a DSL that defines behaviour were used in our model-based re-engineering process. Unfortunately, at this point it is not possible to define behaviour in Lotte, because the metamodel does not support it. This feature might become possible in the future if engineers add a behavioural model to Lotte. Another possibility is to use a different language that can define behaviour in combination with Lotte.

Lotte also assembles components into one final product; hence this DSL might be classified as an architecture description language (ADL). The article [94] lists the requirements for an ADL. After using Lotte to complete the reconstruction of a new YouKnowWatt project, we know that the language lacks typical ADL features such as graphical syntax and architecture validation. These features can be implemented in the future.

Finally, a good continuation could be to carry out a case study focusing on process improvement. We recommend this topic because it is related to the first two steps in the BPMN model, which we discussed only briefly. A possible case study is to analyze the logs of a development process with the help of process mining in order to determine the issues of that process. Engineers could then apply our model-based re-engineering process as a solution and further validate our approach.

Bibliography

[1] J. Bergey, D. Smith, N. Weiderman, and S. G. Woods, “Options Analysis for Reengineering (OAR): Issues and Conceptual Approach,” Technical Report CMU/SEI-1999-TN-014, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, no. September, 1999. [Online]. Available: http://www.sei.cmu.edu/library/abstracts/reports/99tn014.cfm
[2] V. Khusidman and W. Ulrich, “Architecture-driven modernization: Transforming the enterprise,” Seminar Software Analyse and Trasformation, pp. 1–7, 2007.
[3] AlDanial, “Aldanial/cloc,” Aug 2017. [Online]. Available: https://github.com/AlDanial/cloc
[4] “Discover altran world leader in engineering solutions and r & d.” [Online]. Available: https://www.altran.com/nl/en/
[5] G. D. H. Bruneliere, “Modisco.” [Online]. Available: http://www.eclipse.org/MoDisco/
[6] G. Barbier, H. Bruneliere, F. Jouault, Y. Lennon, and F. Madiot, “MoDisco, a Model-Driven Platform to Support Real Legacy Modernization Use Cases,” Information Systems Transformation, pp. 365–400, 2010.
[7] A. Ahmad and M. A. Babar, “A Framework for Architecture-driven Migration of Legacy Systems to Cloud-enabled Software,” 2014.
[8] Ian Warren, The Renaissance of Legacy Systems. Springer London, 1999.
[9] A. Alderson and H. Shah, “Viewpoints on Legacy Systems,” vol. 42, no. 3, pp. 115–116, 1999.
[10] J. Backus, J. Lee, and G. Ryckman, “FORTRAN I, II and III,” History of Programming Languages, p. 25, 1981.
[11] “Ada programming language.” [Online]. Available: http://www.adaic.org/
[12] T. M. Pigoski, Practical Software Maintenance: Best Practices for Managing Your Software Investment. Wiley, 1996. [Online]. Available: https://www.amazon.com/Practical-Software-Maintenance-Practices-Investment-ebook/dp/B000UD5RMS?SubscriptionId=0JYN1NVW651KCA56C102&tag=techkie-20&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B000UD5RMS
[13] M. M. Lehman, “On understanding laws, evolution, and conservation in the large-program life cycle,” J. Syst. Softw., vol. 1, pp. 213–221, Sep. 1984. [Online]. Available: http://dx.doi.org/10.1016/0164-1212(79)90022-0
[14] E. J. Chikofsky and J. H. Cross, “Reverse engineering and design recovery: a taxonomy,” IEEE Software, vol. 7, no. 1, pp. 13–17, Jan 1990.


[15] N. Ajlouni and F. Hani, “Redesigning Legacy Systems using Hybrid Re-engineering,” 2006 2nd International Conference on Information & Communication Technologies, vol. 2, no. 1, pp. 2784–2785, 2006.
[16] M. Hammer and J. Champy, Reengineering the Corporation: A Manifesto for Business Revolution. HarperBusiness, 1993.
[17] R. C. Seacord, D. Plakosh, and G. A. Lewis, Modernizing Legacy Systems: Software Technologies, Engineering Process and Business Practices. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2003.
[18] V. Garcia, D. Lucredio, a. D. Prado, a. Alvaro, E. D. Almeida, Z. Zhang, S. Wang, V. Garcia, D. Lucredio, a. D. Prado, a. Alvaro, and E. D. Almeida, “Towards an effective approach for reverse engineering,” 11th Working Conference on Reverse Engineering, pp. 4–5, 2004.
[19] A. Jain, S. Soner, and A. Gadwal, “Reverse engineering: Journey from code to design,” ICECT 2011 - 2011 3rd International Conference on Electronics Computer Technology, vol. 5, pp. 102–106, 2011.
[20] R. K. Keller, R. Schauer, S. Robitaille, and P. Pagé, “Pattern-based reverse-engineering of design components,” pp. 226–235, 1999. [Online]. Available: http://doi.acm.org/10.1145/302405.302622
[21] J. Bowen, “Creating Models of Interactive Systems with the Support of Lightweight Reverse-Engineering Tools,” pp. 110–119, 2015.
[22] E. W. Dijkstra, “Letters to the editor: Go to statement considered harmful,” Commun. ACM, vol. 11, no. 3, pp. 147–148, Mar. 1968. [Online]. Available: http://doi.acm.org/10.1145/362929.362947
[23] H. Stachowiak, Allgemeine Modelltheorie. Springer-Verlag, 1973. [Online]. Available: https://books.google.nl/books?id=DK-EAAAAIAAJ
[24] T. Kühne, “Matters of (meta-) modeling,” pp. 369–385, 2006.
[25] “Object management group.” [Online]. Available: http://www.omg.org/
[26] A.-d. M. S. Roadmap, “February 2009,” no. February, p. 2009, 2009.
[27] R. H. Linda, “Software re-engineering,” Software Assurance Technology Center, Goddard Space Flight Center, NASA, Tech. Rep. 301-286-0087.
[28] “Object management group business process model and notation.” [Online]. Available: http://www.bpmn.org/
[29] F. Fabbrini, M. Fusani, S. Gnesi, and G. Lami, “The linguistic approach to the natural language requirements quality: Benefit of the use of an automatic tool,” pp. 97–, 2001. [Online]. Available: http://dl.acm.org/citation.cfm?id=829503.830081
[30] M. Ansar and T. A. Khan, “Non-technical issues in software designing phase,” pp. 179–184, Aug 2016.
[31] Refactoring: Improving the Design of Existing Code. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999.
[32] M. P. E. Heimdahl, “Model-Based Testing: Challenges Ahead,” no. March 2004, p. 3157, 2005.
[33] C. P. Team, “Cmmi for development, version 1.3,” Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU/SEI-2010-TR-033, 2010. [Online]. Available: http://resources.sei.cmu.edu/library/asset-view.cfm?AssetID=9661


[34] “Information technology – Process assessment – Guide for process improvement,” International Organization for Standardization, Geneva, CH, Standard, Dec. 2013.
[35] “Acm digital library.” [Online]. Available: https://dl.acm.org/
[36] “Ieee xplore digital library.” [Online]. Available: http://ieeexplore.ieee.org/Xplore/home.jsp
[37] “Understand static code analysis tool.” [Online]. Available: https://scitools.com/
[38] “Obeo agility.” [Online]. Available: https://www.obeo.fr/en/products/obeo-agility
[39] G. Canfora, M. Di Penta, and L. Cerulo, “Achievements and challenges in software reverse engineering,” Commun. ACM, vol. 54, no. 4, pp. 142–151, Apr. 2011. [Online]. Available: http://doi.acm.org/10.1145/1924421.1924451
[40] I. Goldstein, “Pretty Printing: Converting List to Linear Structure,” 1973.
[41] “Welcome to common-lisp.net!” [Online]. Available: https://common-lisp.net/
[42] M. de Jonge, “Pretty-printing for software reengineering,” International Conference on Software Maintenance, 2002. Proceedings., pp. 550–559, 2002. [Online]. Available: https://www.scopus.com/inward/record.url?eid=2-s2.0-0036439969&partnerID=40&md5=05a2ed392758648985777f424f04ea8c, http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1167816
[43] D. Jerding and S. Rugaber, “Using visualization for architectural localization and extraction,” Science of Computer Programming, vol. 36, no. 2, pp. 267–284, 2000.
[44] A. Telea and L. Voinea, “An interactive reverse engineering environment for large-scale C++ code,” Proceedings of the 4th ACM symposium on Software visualization, pp. 67–76, 2008. [Online]. Available: http://dl.acm.org/citation.cfm?id=1409720.1409732
[45] McPeak, “Elkhound and elsa,” 2006. [Online]. Available: http://scottmcpeak.com/elkhound/
[46] P. Tonella and A. Potrich, Reverse Engineering of Object Oriented Code (Monographs in Computer Science). Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2004.
[47] “Axivion bauhaus suite: the complete solution for software erosion protection.” [Online]. Available: https://www.axivion.com/en/produkte-60
[48] B. J. Berger, M. Bunke, and K. Sohr, “An Android Security Case Study with Bauhaus,” 2011.
[49] J. Ludwig, S. Xu, and F. Webber, “Compiling Static Software Metrics for Reliability and Maintainability from GitHub Repositories,” pp. 5–9, 2017.
[50] T. Ball, “The concept of dynamic analysis,” SIGSOFT Softw. Eng. Notes, vol. 24, no. 6, pp. 216–234, Oct. 1999. [Online]. Available: http://doi.acm.org/10.1145/318774.318944
[51] C. Wagner, Model-driven software migration: A methodology: Reengineering, recovery and modernization of legacy systems, 2014.
[52] B. Cornelissen, A. Zaidman, A. van Deursen, L. Moonen, and R. Koschke, “A systematic survey of program comprehension through dynamic analysis,” IEEE Transactions on Software Engineering, vol. 35, no. 5, pp. 684–702, Sept 2009.
[53] D. Notkin, “An Empirical Study of Java Dynamic Call Graph Extractors,” no. December, 2002.


[54] M. von Detten, M. Meyer, and D. Travkin, “Reverse engineering with the reclipse tool suite,” Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - ICSE ’10, vol. 2, p. 3, 2010. [Online]. Available: http://www.scopus.com/inward/record.url?eid=2-s2.0-77954706960&partnerID=40&md5=e7861067ab4ceb633dd20e9fc2bdaee6, http://portal.acm.org/citation.cfm?doid=1810295.1810360
[55] S. Becker, “Reclipse reverse engineering tool,” Feb 2015. [Online]. Available: https://github.com/CloudScale-Project/StaticSpotter
[56] Y. Labiche, B. Kolbah, and H. Mehrfard, “Combining static and dynamic analyses to reverse-engineer scenario diagrams,” IEEE International Conference on Software Maintenance, ICSM, pp. 130–139, 2013.
[57] C. E. Silva and J. C. Campos, “Combining static and dynamic analysis for the reverse engineering of web applications,” Proceedings of the 5th ACM SIGCHI symposium on Engineering interactive computing systems - EICS ’13, p. 107, 2013. [Online]. Available: http://dl.acm.org/citation.cfm?id=2494603.2480324
[58] P. Selfridge, R. Waters, and E. Chikofsky, “Challenges to the field of reverse engineering,” [1993] Proceedings Working Conference on Reverse Engineering, pp. 144–150, 1993.
[59] M. van den Brand, P. Klint, and C. Verhoef, “Re-engineering Needs Generic Programming Language Technology,” SIGPLAN Not., vol. 32, no. 2, pp. 54–61, 1997. [Online]. Available: http://doi.acm.org/10.1145/251621.251633
[60] H. Bruneliere, J. Cabot, G. Dupé, and F. Madiot, “Modisco: A model driven reverse engineering framework,” Information and Software Technology, vol. 56, no. 8, pp. 1012–1032, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0950584914000883, https://hal.inria.fr/hal-00972632/document
[61] T. Skramstad and M. K. Khan, “Assessment of reverse engineering tools: A MECCA approach,” pp. 120–126, 1992. [Online]. Available: http://ieeexplore.ieee.org/ielx2/389/5275/00205845.pdf?tp=&arnumber=205845&isnumber=5275
[62] L. Graphoir, “Codecase software is an innovating solution to map, transform or build complex and multi-technologies information systems.” [Online]. Available: http://codecasesoftware.com/
[63] “Rigi software.” [Online]. Available: http://www.rigi.cs.uvic.ca/Pages/download.html
[64] A. Bergel, “Moose enables humane assessment.” [Online]. Available: http://www.moosetechnology.org/
[65] “What is jamopp?” [Online]. Available: http://www.jamopp.org/index.php/JaMoPP
[66] “We simplify the complexity of maintaining java code.” [Online]. Available: https://www.maintainj.com/
[67] “Jive: Java interactive visualization environment.” [Online]. Available: http://www.cse.buffalo.edu/jive/
[68] “Eclipse diver, dynamic interactive views for reverse engineering.” [Online]. Available: https://eclipsediver.wordpress.com/
[69] “Imagix, reverse engineer and analyze your source code.” [Online]. Available: https://www.imagix.com/
[70] “Learnlib is a free, open-source (apache license 2.0) java library for active automata learning.” [Online]. Available: https://learnlib.de/


[71] “Sourcemeter advanced source code analysis suite 2016,” Dec 2016. [Online]. Available: https://www.sourcemeter.com/
[72] Rfleming, “Codesurfer,” Aug 2017. [Online]. Available: https://www.grammatech.com/products/codesurfer
[73] I. S. Designs, “Design maintenance system.” [Online]. Available: http://semanticdesigns.com/Products/DMS/
[74] M. Lanza, “Codecrawler-lessons learned in building a software visualization tool,” in Seventh European Conference on Software Maintenance and Reengineering, 2003. Proceedings., March 2003, pp. 409–418.
[75] K. E. Kendall and J. E. Kendall, Systems analysis and design. Pearson Prentice Hall, 2011, vol. 8.
[76] “Nederland.” [Online]. Available: https://www.philips.nl/
[77] “Ahead of the curve with autonomous driving.” [Online]. Available: http://www.nxp.com/
[78] M. Anderson, “Fei, high-performance microscopy workflow solutions,” Jun 2017. [Online]. Available: https://www.fei.com/
[79] D. Kolovos and M. Wimmer, “Theory and practice of model transformations: 8th International Conference, ICMT 2015 held as part of STAF 2015 L’Aquila, Italy, July 20-21, 2015 proceedings,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9152, pp. 66–81, 2015.
[80] emad elsaid, “emad-elsaid/rubrowser,” Oct 2017. [Online]. Available: https://github.com/emad-elsaid/rubrowser
[81] K. Schwaber, Agile project management with Scrum. Microsoft Press, 2004.
[82] B. Morin, N. Harrand, and F. Fleurey, “Model-Based Software Engineering to Tame the IoT Jungle,” 2017.
[83] E. Foundation, “Eclipse oxygen.” [Online]. Available: https://www.eclipse.org/
[84] R. Gronback, “Eclipse modeling framework (emf).” [Online]. Available: https://www.eclipse.org/modeling/emf/
[85] S. Efftinge and M. Spoenemann, “Why xtext?” [Online]. Available: https://www.eclipse.org/Xtext/
[86] C. Guindon, “Eclipse qvt operational,” Jan 2017. [Online]. Available: https://projects.eclipse.org/projects/modeling.mmt.qvt-oml
[87] B. Geer and M. Bayer, “What is apache freemarker?” Sep 2017. [Online]. Available: http://freemarker.org/
[88] C. Aniszczyk, “Jgit.” [Online]. Available: https://www.eclipse.org/jgit/
[89] “Chapter 25. dependency management.” [Online]. Available: https://docs.gradle.org/current/userguide/dependency_management.html
[90] “Docker compose,” Oct 2017. [Online]. Available: https://docs.docker.com/compose/
[91] “Matlab.” [Online]. Available: https://nl.mathworks.com/products/matlab.html
[92] X. A. Debest, R. Knoop, and J. Wagner, “Reveng: a Cost-Effective Approach To Reverse-Engineering,” vol. 17, no. 4, pp. 60–67, 1992.


[93] “X5-i.” [Online]. Available: http://minix.com.hk/en/products/x5-i-1
[94] “Architecture description language,” Dec 2017. [Online]. Available: https://en.wikipedia.org/wiki/Architecture_description_language

Appendix A

Interview materials

Figure A.1: Interview questions from 1 to 7

Figure A.2: Interview questions from 8 to 12


Table A.1: Requirements

Columns: Requirement, 1, 2, 3, 4, 5, Not Applicable

• Visualization (Visualize source code, graphics, control flows, etc.)
• Support for Data-mining techniques
• Support for machine/model learning
• Language (The programming language the tool was developed in)
• Platform (The platform the tool runs on)
• Supports code generation for reengineering
• Input format (Input source code can be in different programming languages)
• Perform static analysis (for instance metric analysis)
• Perform dynamic analysis (Collect traces and make a model)
• Usability (GUI, Documentation support, need for domain knowledge)
• Extendibility (number of plugins available, Level of difficulty to develop a new plugin)
• Performance (Size of the installation package, speed of analysis process)
• Scalability (size of the input project)
• Support of tool (tool can be academic, open-source, or commercial)
• Other:

Appendix B

Reverse Engineering Plan

B.1 Introduction

This document describes how we intend to perform reverse engineering specifically for the YouKnowWatt project. The process will be carried out in two phases: (1) definition & justification and (2) execution.

B.2 Definition & Justification

B.2.1 Phase 1.1: Analyze the project goals

In this subphase we need to determine the current problems with the system. Through reverse engineering we aim to identify the A-B-C-D components in the YouKnowWatt project and to find the differences between the new situation using Lotte and the existing project.

B.2.2 Phase 1.2: Inventory of available existing components

The task in this subphase is to make an inventory of the available items in categories such as source code, documentation, management information, validation and acceptance, data files and databases, and hardware.

B.2.3 Phase 1.3: Determine reverse engineering strategy

This subphase is responsible for multiple tasks: determining the approach, defining quality and selection criteria, and selecting the tools. The quality attributes are divided into two categories, structure and relevance, both rated on a scale from one to five. For structure, one means the object must be analyzed manually, while five means the analysis can be fully automated. For relevance, one stands for irrelevant and outdated, and five for up-to-date. The tools are chosen based on the types of objects the project contains. Currently, there are three types of objects:

1. Not utilizable - these types of objects are discarded;
2. Object already satisfies reverse engineering goals - these types of objects are reused;
3. Object contains reusable parts - these types of objects are analyzed.
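The classification above maps each object type to one action and rates each object on two one-to-five scales. A minimal sketch of that mapping is given below; the names `ObjectType`, `ACTION`, `check_score` and `plan` are hypothetical and only illustrate the rules stated in this subphase, not an existing tool.

```python
from enum import Enum


class ObjectType(Enum):
    """The three object types distinguished in Phase 1.3."""
    NOT_UTILIZABLE = "not utilizable"
    SATISFIES_GOALS = "already satisfies reverse engineering goals"
    HAS_REUSABLE_PARTS = "contains reusable parts"


# Action associated with each object type, as listed in the plan.
ACTION = {
    ObjectType.NOT_UTILIZABLE: "discard",
    ObjectType.SATISFIES_GOALS: "reuse",
    ObjectType.HAS_REUSABLE_PARTS: "analyze",
}


def check_score(score: int) -> int:
    """Validate a quality score on the one-to-five scale."""
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    return score


def plan(object_type: ObjectType, structure: int, relevance: int) -> dict:
    """Combine the action for an object with its two quality scores."""
    return {
        "action": ACTION[object_type],
        "structure": check_score(structure),  # 1 = manual, 5 = fully automated
        "relevance": check_score(relevance),  # 1 = outdated, 5 = up-to-date
    }


print(plan(ObjectType.HAS_REUSABLE_PARTS, structure=4, relevance=5)["action"])  # analyze
```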


B.3 Execution

In this phase we finally perform the reverse engineering. Two steps need to be completed beforehand:

1. Verification and actualization of the object-oriented programming contents, to make sure that they are coherent with reality;
2. Setting up the IDE and projects.
