THE UNIVERSITY OF NEW SOUTH WALES

Building a Systematic Modernization Approach

Meena Jha

School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy

February, 2014


ABSTRACT

A systematic legacy system modernization approach represents a new way of modernizing legacy systems, one in which reuse is an integral part of modernization. We have developed a modernization approach that uses software architecture reconstruction to find reusable components within the legacy system. The practice of software development and modernization continues to shift towards the reuse of components from legacy systems to handle the complexities of software development. Modernization of a legacy system requires the reuse of software artefacts from the legacy system to conserve its business rules and improve its quality attributes. Software reuse is an integral part of our systematic legacy modernization approach. Software should be considered an asset, and reuse of these assets is essential to increase the return on development costs. Software reuse ranges from the reuse of ideas and algorithms to the reuse of any document created during the software development life cycle. Software reuse has many potential benefits which include increased , and decreased software development cost and time. Demands for lower software production and maintenance costs, faster delivery of systems and increased quality can only be met by widespread and systematic software reuse. In spite of all these benefits, software reuse is not widely adopted in the software development communities. Software reuse cannot become an engineering discipline as long as its issues and concerns are not clearly understood and dealt with. We have conducted two surveys to understand the issues and concerns of software reuse in the Conventional Software Engineering (CSE) Community and in the Software Product Line (SPL) Community, where reuse is an integral part of product development. The quantitative and qualitative analysis of our surveys identified the critical factors that affect software engineers and developers and inhibit them from adopting software reuse.
Software reuse has been discussed in generic terms in software product lines. Although software reuse is a core concept in SPL, it has failed to become a standardized practice. The survey conducted on the SPL Community investigates how software reuse is adopted in SPL, so as to provide the necessary degree of support for engineering software product line applications, and identifies some of the issues and concerns in software reuse. The identified issues and concerns have helped us to understand the difference between software reuse in the CSE and SPL Communities. They have also given us an indication of how both communities can learn good software reuse practices from each other in order to develop a common software reuse process. Based on the outcome of our surveys we have developed a systematic software reuse process, called the Knowledge Based Software Reuse (KBSR) Process, which incorporates a Repository of reusable software assets to build a systematic legacy system modernization approach. Being able to reuse software artefacts, be it a software requirement specification, a design, or code, would greatly enhance software productivity and reliability. All of these software artefacts can go in the Knowledge Based Software Reuse Repository and be candidates for reuse. The contributions of our research are:
• We have developed a systematic legacy system modernization approach which incorporates software reuse as a key activity by using software architecture reconstruction to find reusable components;
• We conducted a survey on software reuse in the CSE Community to identify issues and concerns in software reuse in that community;
• We conducted a survey on software reuse in the SPL Community which helped us to understand the difference between software reuse in the CSE Community and the SPL Community. The survey gave us an indication of how both communities can learn good software reuse practices from each other;
• To address the issues and concerns identified in both communities we propose a systematic software reuse process called the Knowledge Based Software Reuse (KBSR) Process, which incorporates a Repository of reusable software assets. The KBSR Process targets the main concern of software engineers: locating the desired high-quality reusable software artefacts with sufficient documentation. Software engineers can search the KBSR Repository to find reusable software artefacts.


PREFACE

Some of the results presented in the thesis have already been published in various articles. These are listed below and will be referenced in the text.

I. Jha Meena, Maheshwari Piyush, Phan Thi Koi Anh, Technical Report on the “Comparison of Four Architecture Reconstruction Toolkits”, UNSW TR-0435, 2004.
II. Jha Meena, Maheshwari Piyush, “Reusing Code for Modernization of Legacy Systems”, Proceedings of IEEE Conference on Software Technology and Engineering Practice (STEP) 2005, Budapest, Hungary, 24-25 September, 2005.
III. Jha Meena, O’Brien Liam, Maheshwari Piyush, “Identify Issues and Concerns in Software Reuse”, International Conference on Information Processing (ICIP), Bangalore, India, 8-10 August, 2008.
IV. Jha Meena, O’Brien Liam, Maheshwari Piyush, “Identify Issues and Concerns in Software Reuse”, International Journal of Information Processing, Volume 3, Number 2, 2009.
V. Jha Meena, O’Brien Liam, “Identifying Issues and Concerns in Software Reuse in Software Product Lines”, 11th International Conference on Software Reuse (ICSR 2009), Falls Church, VA, USA, 27-30 September, 2009.
VI. Jha Meena, O’Brien Liam, “A Comparison of Software Reuse in Software Development Communities”, The 5th International Malaysian Conference on Software Engineering, Johor Bahru, Malaysia, 12-14 December, 2011.
VII. Jha Meena, O’Brien Liam, “Re-engineering Legacy Systems for Modernization: The Role of Software Reuse”, The Second International Conference on Advances in Computer Science and Electronics Engineering, New Delhi, India, 23-24 February, 2013.
VIII. Jha Meena, O’Brien Liam, “Comparison of Modernization Approaches: With and Without the Knowledge Based Software Reuse Process”, The Second International Conference on Advances in Computer Science and Engineering (CSE 2013), Los Angeles, CA, USA, 1-2 July, 2013.


ACKNOWLEDGEMENTS

I would like to acknowledge all the people who have assisted me throughout my studies at the University of New South Wales. Firstly, I am extremely grateful to my supervisor Dr. Liam O’Brien for his continuous guidance and encouragement. I would also like to thank him for his invaluable insights and the time that he spent with me. Whenever I had research or general issues, I came to him and received more support than I expected. Thanks for everything, Liam. Without his support and encouragement this thesis would never have been completed. I would like to thank Professor Ross Jeffery for his encouragement and support. I would also like to express my thanks to Dr. Piyush Maheshwari for his invaluable time and discussions at the beginning of my study. I am grateful to him for directing me to Dr. Liam O’Brien, without whom this thesis would not have been possible. I would like to express my thanks to the professionals in the Conventional Software Engineering (CSE) Community and the Software Product Line (SPL) Community who supported me by giving their time to complete the survey. Finally, I want to express my deepest thanks to my beloved husband Sanjay and my son Manish for their love, understanding, patience and moral support during the development of this thesis.


I would like to dedicate this thesis to my loving parents ...


CONTENTS

ABSTRACT
PREFACE
ACKNOWLEDGEMENTS
PART 1
CHAPTER 1: INTRODUCTION
1.1 Introduction
1.2 Problem Outline
1.3 Research Context and Research Questions
1.4 Research Design
1.5 Contributions
1.5.1 Contributions on Modernization Approaches
1.5.2 Contributions on Empirical Work: Contributions in Issues and Concerns in Software Reuse in Software Engineering Communities
1.6 Thesis Structure
1.7 Chapter Summary
CHAPTER 2: LITERATURE REVIEW
2.1 Introduction
2.2 Legacy System Definitions and Challenges
2.3 Legacy System Evolution
2.4 Legacy System Modernization
2.5 Approaches for Modernization of Legacy Systems
2.5.1 Black-box Modernization Approach
2.5.2 White-Box Modernization Approach
2.5.3 Wrapping
2.5.4 Migration
2.5.5 Screen Scraping
2.5.6 Data Wrapping
2.5.7 Legacy Integration using CGI
2.5.8 Data Contextualization
2.5.9 Architecture-Driven Modernization (ADM)
2.5.10 COTS Based Modernization
2.6 Software Reuse
2.6.1 Software Reuse Definitions
2.6.2 Software Reuse Challenges
2.6.3 Software Reuse Benefits
2.6.4 Example Approach to Software Reuse as Web Services
2.6.5 Categories of Software Reuse
2.6.6 Software Reuse Process Models
2.7 Product Families and Software Reuse
2.8 Software Architecture as a Catalyst for Software Reuse
2.8.1 Software Architecture Reconstruction
2.9 Knowledge Based Software Reuse
2.10 Summary of Modernization Approaches
2.11 Chapter Summary
CHAPTER 3: MODERNIZATION
3.1 Introduction
3.2 Software Evolution Dynamics
3.3 Software Modernization
3.4 Challenges with Modernization
3.5 Classification of Existing Modernization Approaches
3.6 Shortfalls of the Existing Approaches
3.7 Software Architecture and its Effect on Modernization
3.8 Software Architecture Reconstruction
3.9 Chapter Summary
CHAPTER 4: SOFTWARE ARCHITECTURE AND ITS ROLE IN SOFTWARE REUSE
4.1 Software Architecture?
4.2 Software Architecture and Maintenance Attribute
4.3 Software Architecture Reconstruction
4.5 Documenting Software Architecture
4.6 Software Architecture and Software Reuse
4.7 Chapter Summary
CHAPTER 5: ARCHITECTURE RECONSTRUCTION TOOLKITS
5.1 Introduction
5.2 Reconstruction Toolkits Overview
5.2.1 Dali Workbench
5.2.2 The PBS and SWAGKIT Toolkits
5.2.3 Bauhaus Toolkit
5.3 Toolkits Functionalities
5.3.1 Overview of ‘Concepts’ Program: A Case Study
5.3.2 Dali Workbench
5.3.3 PBS and SWAGKIT Toolkits
5.3.4 Bauhaus Toolkit
5.4 Analysis of Architecture Reconstruction Toolkits
5.4.1 Extraction Capabilities
5.4.2 Abstraction Capabilities
5.4.3 Visualization Capabilities
5.4.4 Other Attributes
5.5 Chapter Summary
CHAPTER 6: REUSING CODE FOR MODERNIZATION OF LEGACY SYSTEMS
6.1 Introduction
6.2 Overview of Our Approach to Legacy System Modernization
6.2.1 Activity 1: Reverse Engineer an Analysis Model
6.2.2 Activity 2: Re-structure into an Object Model
6.2.3 Activity 3: Forward Engineer using Object-oriented Methods
6.3 Chapter Summary
PART 2
CHAPTER 7: ORGANIZATION OF SOFTWARE REUSE SURVEY
7.1 Introduction
7.2 Survey Framework
7.2.1 Step 1: Gather Good Candidate Questions
7.2.2 Step 2: Select Questions of Interest
7.2.3 Step 3: Select Companies
7.2.4 Step 4: Analyze Strategies for Conducting Survey
7.2.5 Step 5: Analyze Survey Results
7.3 Chapter Summary
CHAPTER 8: CONCERNS IN SOFTWARE REUSE IN CONVENTIONAL SOFTWARE ENGINEERING
8.1 Introduction
8.2 Results and Analysis
8.2.1 Section 1: General Questions
8.2.2 Section 2: Software Reuse Measurement
8.2.3 Section 3: Software Reuse Technical Aspects
8.2.4 Section 4: Testing and the Reliability of Reused Software
8.2.5 Section 5: Development Environment for Reuse
8.3 Chapter Summary
CHAPTER 9: CONCERNS IN SOFTWARE REUSE IN SOFTWARE PRODUCT LINES
9.1 Introduction
9.2 Presentation
9.3 Results and Analysis
9.3.1 Section 1: General Questions
9.3.2 Section 2: Software Reuse Measurement
9.3.3 Section 3: Software Reuse Technical Aspects in SPL
9.3.4 Section 4: Testing and Reliability of Reused Software
9.3.5 Section 5: Development Environment for Reuse
9.4 Chapter Summary
CHAPTER 10: A COMPARISON OF SOFTWARE REUSE IN BOTH SOFTWARE DEVELOPMENT COMMUNITIES
10.1 Introduction
10.2 Comparison of Software Reuse in CSE and SPL Communities
10.2.1 Software Reuse Management and Measurement
10.2.2 Disadvantages of Software Reuse in the SPL and CSE Communities
10.2.3 Is Reuse Affected by Not-Invented-Here Syndrome
10.2.4 Reuse Planning
10.2.5 Reuse and Software Quality
10.2.6 Software Reuse: Is Software Reuse Domain-Based
10.2.7 Software Reuse and Technical Aspects
10.3 Possible Reuse Issues and Concerns
10.4 Chapter Summary
PART 3
CHAPTER 11: KNOWLEDGE BASED SOFTWARE REUSE PROCESS AND REPOSITORY
11.1 Introduction
11.2 An Overview of the KBSR Process
11.2.1 Phase 1: Develop the KBSR Repository
11.2.2 Phase 2: Use the KBSR Repository in the modernization of a system
11.3 Chapter Summary
CHAPTER 12: EVALUATION OF MODERNIZATION APPROACHES
12.1 Introduction
12.2 A Case Study – The ACRSS System
12.3 Reusing Code for Modernization
12.3.1 Activity 1: Reverse Engineer an Analysis Model
12.3.2 Activity 2: Re-Structure into an Object-Model
12.3.3 Activity 3: Forward Engineer using Object-Oriented Methods
12.4 Modernization Using Knowledge Based Software Reuse Process and Repository
12.4.1 Phase 1: Develop the KBSR Repository
12.4.2 Phase 2: Use the KBSR Repository in the Modernization of a System
12.5 Comparing Software Modernization Approaches based on Software Reuse
12.6 Comparison of different Modernization Approaches to our Approach
12.7 Discussion and Limitation
12.8 Chapter Summary
CHAPTER 13: CONCLUSION
13.1 Introduction
13.2 Research Questions
13.3 Contributions and Significance of the Research
13.3.1 Reusing Code for Our Legacy Modernization Approach
13.3.2 Issues and Concerns about Software Reuse in the Software Engineering Communities
13.3.3 A Comparison of Software Reuse in Software Development Communities
13.3.4 Systematic Legacy System Modernization Approach
13.4 Future Work
REFERENCES
APPENDIX A
APPENDIX B

LIST OF FIGURES

Figure 1.1: Research Design and Different Phases of Research Conducted
Figure 2.1: Software System Life Cycle
Figure 2.2: Technical Quality and Business Value
Figure 2.3: Model of ADM Modernization
Figure 3.1: Modernization driving factors for market drive
Figure 3.2: Modernization driving factors for business drivers
Figure 3.3: Modernization driving factors for technology drivers
Figure 3.4: Challenges with modernization of legacy system
Figure 3.5: Legacy system modernization at different levels
Figure 3.6: Classification of legacy system modernization approaches
Figure 3.7: Classification of legacy system wrapping approaches
Figure 3.8: Classifications of legacy system Re-engineering approach
Figure 3.9: Classifications of legacy system migration approaches
Figure 3.10: Software Architecture Reconstruction Process
Figure 5.1: Dali Interface
Figure 5.2: Raw concrete module of ‘Concepts’
Figure 5.3: The ‘Concepts’ architecture, after function pattern are applied
Figure 5.4: Domain specific components
Figure 5.5: The content of sub systems in the domain specific components
Figure 5.6: Highest level Architecture of Concepts retrieved from SWAGKIT
Figure 5.7: One of the main sub-systems of ‘Concepts’ Architecture
Figure 5.8: Extraction of global variables
Figure 5.9: Nodes within two subsystems lib and src and dependencies between them
Figure 5.10: Concepts’ sub-system Node Information in Tree Layout
Figure 5.11: Hierarchical Call for ListInit Routine
Figure 6.1: Legacy System Modernization Approach
Figure 6.2: Steps in Reconstructing the Architecture of a Software System
Figure 6.3: Artefacts Analyzed from Legacy System
Figure 6.4: Re-structuring the components
Figure 6.5: Coincidental Decomposition
Figure 6.6: CIC Decomposition
Figure 6.7: Decomposition with Sequential Cohesion and Sequential Composition
Figure 6.8: Sequential Decomposition
Figure 6.9: Caller/Callee Composition
Figure 6.10: Hide and Reveal
Figure 6.11: Forward Engineering with Reuse of Legacy Artefacts
Figure 7.1: Framework of Our Survey Process
Figure 8.1: Classification of Component Reuse
Figure 8.2: Difference between Reuse and Porting
Figure 8.3: Moving Deadlines
Figure 8.4: Time Distribution over Software Development Phases
Figure 9.1: Respondents’ Role in SPL Development
Figure 9.2: Possible Range of Different Phases of SPL
Figure 11.1: Overview of the KBSR Process
Figure 11.2: Activities to Develop the KBSR Repository
Figure 11.3: Legacy System Modernization Approach using KBSR Repository
Figure 11.4: Components of the KBSR Repository
Figure 11.5: Activities using the KBSR Repository in the Modernization Approach
Figure 12.1: The Dependency view of methods inside OPTMDL
Figure 12.2: The Dependency view of methods and subroutine
Figure 12.3: NDEPTH Variable Dependency View
Figure 12.4: INSCOL Subroutine and Variable Dependency View
Figure 12.5: Re-structured subroutines SALE, PAY and PROFIT
Figure 12.6: Super type class and Subtype class of PAY Subroutine

LIST OF TABLES

Table 2.1: Comparison of Modernization Approaches
Table 3.1: Most Common Problems in a Legacy System
Table 5.1: Configurations of Tools
Table 5.2: Features Table Lists Visualization Capabilities of the Toolkits
Table 5.3: Assessments of the Software Architecture Reconstruction Toolkits
Table 6.1: Module Analysis Data
Table 10.1: Possible Reuse Issues Explored in SPL and CSE
Table 12.1: Three Subroutines SALE, PAY, and PROFIT
Table 12.2: Pseudo Code of the Re-structured Methods and Variables
Table 12.3: Software Artefacts Representation within the KBSR Repository
Table 12.4: Storage of Reusable Artefacts SALE in the KBSR Repository
Table 12.5: Reusable Software Artefacts and Components from the ACRSS System
Table 12.6: Comparisons of Modernization Approaches: With and Without KBSR Process and KBSR Repository
Table 12.7: Modernization Approach 1 and Modernization Approach 2 based on Strength, Weaknesses, Software Reuse and Software Architecture Reconstruction
Table 12.8: Issues and Concerns of Software Reuse and Incorporation of these in our KBSR Repository/ Process


PART 1


CHAPTER 1: INTRODUCTION

1.1 Introduction

Constant technological change often weakens the value of legacy systems, which have been developed over the years through significant investments. Even though legacy systems can be quite valuable, they are also problematic for several reasons. Legacy systems are built on obsolete (and usually slow) technology. These systems are often hard to maintain, improve, and expand because there is a general lack of understanding of the system. The designers and developers of the system may have left the organization, leaving no one who understands the system or is able to explain how it works. This lack of understanding is compounded by inadequate documentation and by manuals that have been lost or have become out of date over the years. What to do with such systems becomes a vital question for the organizations that have them, and one of the options is to modernize them. Modernizing legacy systems is difficult and poses a major challenge for industry and the research community. In this thesis we explore the issue of modernization of legacy systems. While modernizing legacy software, developers must try to respect the system’s functional and non-functional goals. The requirements or business rules of an organization are dynamic and keep changing. Perhaps the biggest problem in maintaining legacy software is these changing requirements. Any legacy software that lacks agility, openness, and flexibility needs modernization as part of its evolution. Software modernization attempts to evolve a legacy system, or elements of the system, when conventional evolutionary practices, such as maintenance and enhancement, can no longer achieve the desired system properties. Legacy systems contain significant and invaluable business logic of the organization. Organizations cannot afford to throw away or replace this business logic. These legacy systems are assets of the organization.
These invaluable assets of encoded ‘business logic’ represent many years of coding, development, real-life experience, enhancements, modifications, debugging, and so on. Unfortunately, they may also represent many years of bad documentation or no documentation at all. They are well known for their dominant characteristics of ‘resisting modification’ and ‘evolution’, and ‘running’ on obsolete hardware. These systems are often slow and expensive to maintain [1]. Even if a system’s documentation were perfectly up to date, redeveloping these systems would still be unaffordable in terms of time, cost, and the required human resources [2]. Since they are vitally important for the organization, they need to be evolved into new technology environments and run on modern platforms [1]. Hence, legacy software systems must evolve to survive. Legacy systems are often written in a variety of 3GL programming languages such as COBOL, RPG, PL/I, FORTRAN, BASIC, PASCAL, etc. [3], [4]. Others are written in 4GL programming languages like PACBAS, DELTA, etc. Some of these latter languages generate the aforementioned 3GL programs when compiled. All of these systems are called legacy systems. From one perspective, legacy systems are time-tested: their value has been proven by long use, they represent decades of effort and customization, and along the way they have become reliable parts of an overall IT strategy. These entrenched software systems often resist evolution because their ability to adapt has diminished through factors not exclusively related to their functionality. According to Lehman’s first law [5], software must be continually adapted or it will become progressively less satisfactory in ‘real-world’ environments. This is due to the continuous change of user requirements and of the technical environment. Many legacy systems have been very large investments for organizations, and they contain invaluable business logic and knowledge. Koskinen et al. conducted an empirical study of software modernization decision criteria and found that modernization improves the feasibility of evolving and maintaining a legacy system and integrating it with other systems [6]. The end of technological support and the expected system lifetime are factors that make system modernization mandatory [6].
Therefore, the modernization of legacy systems can be expected to be desirable. Generally, modernization means large changes to a system, typically due to major changes in the technical context or in the business processes of the organization using the software. Modernizations are generally the hardest maintenance situations, both technically and socially, and the most critical economically. According to Seacord, modernized systems have lower maintenance costs [7]. Software maintenance and related evolution activities have traditionally accounted for between 50% and 75% of software life-cycle costs in the case of successful systems with long lifetimes. Since this share keeps increasing as maintenance costs rise, the importance of modernization can hardly be overemphasized [7]. Rewriting the entire software with new design rules, or in new programming languages, is infeasible because the business rules may not be reproduced in the new system as they are in the legacy system [8]. There is a need to understand existing software systems for the purpose of modernization. Legacy systems have their business rules embedded in the existing software artefacts. The software artefacts developed during the development of the legacy system need to be reused in its modernization to maintain the functionality of the legacy system. Software reuse is perhaps the best strategy to maintain the functionality of legacy systems and handle complexities in software development [9]. According to Malan et al. [9], by maintaining the reusable components, managing their evolution, and propagating upgrades to new products, an organization can reduce duplication of effort. Software reuse gives the benefit of not having to duplicate corrective and evolutionary maintenance activities [9]. Wegner [10] has pointed out that “the view of maintainability as a form of reusability is novel and important.
It captures the idea of reusability in time within a dynamically evolving system. Evolutionary dynamic systems require reusability in time of unchanging parts of the system while other parts of the system evolve.” Therefore, a systematic modernization approach should take advantage of software reuse. This requires a systematic software reuse process embedded in the modernization approach, so that the modernization process benefits from software reuse. Software reuse is the process of using existing software artefacts rather than building from scratch. Typically, reuse involves the selection, specialization, and integration of artefacts [11]. The primary motivation to reuse software artefacts is to reduce the time and effort required to build software systems. Software reuse has been very successful in many areas: people have been using compilers, system libraries, numeric libraries and GUI libraries for a long time. In software, reuse intrinsically involves tailoring artefacts to one’s needs. Reuse emphasizes increasing productivity by learning from the experiences of others and applying them judiciously.

The quality of software systems is enhanced by reusing quality software artefacts, which also reduces the time and effort required to maintain software systems [12], [13]. Simple replacement of legacy systems may be infeasible or impractical because of the scope of previous investments [14]. It is also impractical because the new system may not be as functional as the legacy system: over the years, business rules have become embedded in legacy systems, and it is not feasible to reproduce all of them from scratch. The use of reusable software components increases the quality of the software and reduces both development-phase and maintenance-phase costs [10]. Software reuse is about reusing software artefacts. Reusable software artefacts include modules, classes, libraries, packaged components, test cases, documentation, system architecture, software architecture, software design, and the skill and knowledge applied at the time of software development. Any document, report, or source code developed during the development of software is a software artefact. Reusable software artefacts like modules and classes reduce implementation time, increase the likelihood that prior testing and use have eliminated bugs, and localize code modifications when a change in implementation is required [15]. Software reuse can occur at all stages of the development process, and reusable artefacts can take the several forms listed above. Software reuse has proved its advantages, yet it has not been incorporated systematically in the software development life cycle or as part of system modernization [16]. The traditional software development life cycle has five phases: Analysis, Design, Coding, Implementation, and Maintenance. One of the newer and more effective life-cycle models is Agile Development.
The basic philosophy of Agile Development is that neither the team members nor the users completely understand the problems and complexities of a new system, so the project plan and its execution must be responsive to unanticipated issues; the process must be agile and flexible [268]. Modernization uses many of the same phases as the software development life cycle. Industry needs a systematic, standardized approach for modernizing legacy systems that reuses already developed software artefacts. This means that already developed software artefacts need to be identified and saved for reuse in the modernization approach. Reusable software does exist within software applications but has not been discovered and saved for later reuse in software development. Software reuse is the use of software that was designed for reuse [17]. Software developed for reuse needs a systematic process within the modernization approach. This requires a systematic software reuse process based on the software artefacts already developed during software development. As discussed earlier, software reuse is widely believed to be one of the most promising techniques to improve software quality and productivity for legacy system modernization [18], [19], [20], [21], [22]. However, as seen from the literature [23], [11], [24], [25], several problems with software reuse remain. These range from the scarce availability of reusable components and other software artefacts to the difficulty of retrieving, understanding and adapting the required reusable software artefacts and components. Software engineers find it difficult to locate reusable software artefacts (both code-related and non-code-related). To make software reuse an integral phase of software development or of legacy system modernization, all reusable software artefacts should be made easily available to software engineers.
This is possible only if a reuse repository is developed to store the knowledge base of reusable software artefacts. The experiences and knowledge of the experts are represented in the software artefacts of the system. By identifying the reusable software artefacts and storing them in a repository, we capture the experience and knowledge developed during the development phase of the software. The knowledge used for software development can be categorized and saved in a repository for reuse. Likewise, the knowledge extracted from a legacy system can be categorized and saved in a repository for modernization with software reuse. The goal of our research is to develop a systematic modernization approach for legacy systems and a supporting software reuse process. The development of a supporting toolkit is planned for future work. In this chapter the background to the research and the research context are briefly presented. The chapter describes the research questions, the research design, and the contributions of this research. The structure of the thesis is also presented in this chapter.
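
The idea of a reuse repository in which artefacts extracted from a legacy system are categorized and later located by software engineers can be illustrated with a minimal Python sketch. It is not the KBSR Repository developed in this thesis: the names `Artefact` and `ReuseRepository`, the category labels, and the keyword-based search are hypothetical assumptions chosen only to show how categorized artefacts might be stored and retrieved.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Artefact:
    """A reusable software artefact (hypothetical model, not the KBSR schema)."""
    name: str
    category: str                 # assumed labels, e.g. "requirement", "design", "code"
    description: str
    keywords: set[str] = field(default_factory=set)
    source_system: str = ""       # legacy system the artefact was extracted from


class ReuseRepository:
    """Hypothetical reuse repository: artefacts indexed by category, found by keyword."""

    def __init__(self) -> None:
        self._by_category: dict[str, list[Artefact]] = defaultdict(list)

    def add(self, artefact: Artefact) -> None:
        # Categorize each artefact so engineers can browse by artefact type.
        self._by_category[artefact.category].append(artefact)

    def categories(self) -> list[str]:
        return sorted(self._by_category)

    def search(self, *terms: str, category: str | None = None) -> list[Artefact]:
        """Return artefacts whose keywords include every term (case-insensitive), sorted by name."""
        wanted = {t.lower() for t in terms}
        if category is None:
            pools = list(self._by_category.values())
        else:
            # .get() avoids creating an empty category as a side effect of searching.
            pools = [self._by_category.get(category, [])]
        hits = [a for pool in pools for a in pool
                if wanted <= {k.lower() for k in a.keywords}]
        return sorted(hits, key=lambda a: a.name)


if __name__ == "__main__":
    repo = ReuseRepository()
    repo.add(Artefact("InterestCalc", "code", "Computes monthly interest",
                      {"interest", "finance"}, "legacy-billing"))
    repo.add(Artefact("InterestRule", "requirement", "Business rule for interest",
                      {"interest", "rule"}, "legacy-billing"))
    repo.add(Artefact("ReportLayout", "design", "Layout of the monthly report", {"report"}))
    print([a.name for a in repo.search("interest")])                   # ['InterestCalc', 'InterestRule']
    print([a.name for a in repo.search("interest", category="code")])  # ['InterestCalc']
    print(repo.categories())                                           # ['code', 'design', 'requirement']
```

Keyword matching over a category index is about the simplest retrieval scheme that works and is used here only for illustration; a practical repository would likely need richer classification and documentation of each stored artefact.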


1.2 Problem Outline

Issues regarding modernization have been research and development topics for some time. Issues such as cost and quality have been investigated, and modernization approaches have been developed. The decision to modernize a legacy system can be driven by quality concerns such as maintainability and the cost associated with it. It has been recognized that the maintenance and evolution costs of legacy systems are normally between 50% and 75% of the total life-cycle cost of the system. It has also been observed that a software system deteriorates in quality as it ages and is maintained [26], [27], [28] and [29]. In Fred Brooks’s vocabulary, a system’s conceptual integrity [29] degrades as changes are made to it. Repeated maintenance supports the business need sufficiently for a time, but as the system becomes increasingly outdated, maintenance falls behind the business needs. A modernization effort is then required, representing a greater effort, both in time and in functionality, than maintenance. Retaining the value of the legacy system is a common objective of most modernization approaches. Many approaches to legacy modernization have been developed and reported; the state of the art may be found in recent publications [1], [5], [30] and [31]. Bisbal et al. [1] advocate migration as a sound modernization strategy, given the scale, complexity and risk of failure of Legacy Information System modernization. Lehman et al. [5] studied the evolution of several industrial software systems using evolution metrics. Comella-Dorda et al. [30] have described several techniques to support legacy system modernization: Screen Scraping, Database Gateway, XML Integration, CGI Integration, OO Wrapping, Component Wrapping and Database Replication.
The main problem with most of the listed approaches is that they do not offer an effective, long-term solution that preserves the data and business logic of a legacy system. This suggests the need for a systematic modernization approach that offers an effective, long-term solution. Many of the problems of legacy systems stem from the fact that the system’s architecture is often virtually undocumented [48]. Moreover, the architecture may have degraded over time through modifications or extensions that violate the initial architectural principles, resulting in overall chaos. Modernization of legacy systems must be able to address these issues. In our research we observed that the lack of architecture documentation in legacy systems has not been addressed so far, and we are not aware of any modernization approach to date that addresses the problems arising from undocumented system architecture. Our systematic legacy system modernization approach addresses undocumented system architecture and provides a long-term solution to problems of legacy systems such as lack of agility, openness and flexibility, all of which are required for software to evolve. Because modernization is here to stay, a systematic modernization approach is required so that the cost invested in modernization today benefits the organization in the long term. As mentioned above, the main objective when modernizing a legacy system is to protect the functional requirements while enhancing quality attributes. This means the artefacts of the legacy system need to be reused.
Software reuse results in improvements in quality, maintainability, productivity, performance, reliability and interoperability [32]. To accomplish legacy system modernization, we must rely on methods, processes and tools for understanding the system, identifying assets and then managing these assets so that they can be effectively reused [7]. The foundation of any software system is its architecture. As mentioned previously, the architecture documentation of the system may not exist or may be out of date, so the architecture needs to be documented as it is. Software architecture reconstruction serves as a means of documenting the software, identifying reusable components, understanding the existing system as a basis for analysis, and communicating among stakeholders [33]. Software documentation fulfils the vital function of retaining and transferring knowledge about various aspects of software systems. Whatever business needs drive organizations to modernize their software applications, most want to ensure that the business logic and functional design, which are core assets, are preserved [34]. Modernizing a legacy system means developing it further (enhancing its quality attributes) using whatever is known about it (documentation, architecture, artefacts, etc.) so that the organization’s core assets are preserved. Modernization can be mapped to the software development life cycle: most of the phases are similar, but software reuse brings improvements such as reduced analysis, implementation and testing time. There are several models of the software development life cycle (SDLC), each describing approaches to the tasks and activities that take place during software development and aiming to streamline the development process.
Each model has its pros and cons, and it is up to the development team to adopt the most appropriate one for the project; sometimes a combination of models is more suitable. Modernization uses many phases of the SDLC, and software reuse needs to be incorporated into the modernization approach for the reasons stated earlier. Some software development models are traditional, such as the waterfall approach [31], and some are newer, such as the Unified Process [35], [36]. Others combine traditional and newer ideas, such as the Chaos model [37], Extreme Programming [38], the Unified Process [36], ICONIX [39], Model-Driven Engineering, service-oriented modeling, software prototyping, Specification and Description Language, top-down and bottom-up design, User Experience, V-model software development, and verification and validation software development. These so-called “life cycle” models of software engineering reduce development complexity but do not address certain problems of software development, such as software reuse. Some of these problems lie in the analysis phase, where software requirements are identified and specified. Experience has shown that the effect and repair cost of errors left undetected in the analysis phase are magnified in each successive phase [40]. The phased “life cycle” paradigm also creates new problems. In a phased model such as the Waterfall approach, no phase may begin until the preceding phase has been completed. This results in a long lag between the start of software development and the appearance of the delivered product. Agile methods (e.g. XP, Scrum) iterate more frequently over requirements development and testing, but they have their own issues; for example, agile programming emphasizes programming over engineering. This results in software that lacks clean interfaces and is intertwined with other code.
Such code is, of course, difficult to maintain, debug and replace, and the consequence is expensive code bloat. The situation in the phased development life cycle gets even worse if the requirements change in the middle of the project: the project must be backtracked to the beginning and each phase repeated from there [41]. In the past few years, software development life cycle models have been strongly criticized [42], [43], [44]. Software engineering needs development methods that significantly decrease the effort of developing software products, increase their quality and decrease time-to-market [44]. This suggests that software reuse should be made an integral part of the SDLC, and hence of the modernization approach, since modernization uses many phases of the SDLC. Our literature review found no software development model that has systematically embraced software reuse [37], [40], [42], [43], [44]. One possible reason why none of the examined models includes software reuse as an activity is the set of issues related to software reuse itself. Software reuse has a number of issues [18], such as the not-invented-here syndrome, lack of awareness of which reusable artefacts exist, and compatibility problems with reusable artefacts. Unless the issues associated with software reuse are addressed, software reuse cannot fit systematically into any software development model or life cycle. Empirical studies on software reuse may answer questions about how software reuse is applied and how components are built for future reuse. Such studies may also suggest a systematic software reuse process to be incorporated into the modernization approach.

1.3 Research Context and Research Questions

Software systems have become increasingly large and complex [45], and the potential for redeveloping such systems, or their components, from scratch has decreased [46]. There is a need to understand and evolve existing software systems for the purpose of modernization. There are many modernization approaches, but none of them focuses on software reuse, and there is no methodical approach that can be repeated and standardized for the modernization of all legacy systems. A systematic legacy system modernization approach is methodical, repeatable and learnable through step-by-step phases and activities. Software reuse is an integral part of a systematic legacy system modernization approach.

Software developers look for opportunities to reuse existing functionality in the form of artefacts or components of existing (legacy) systems. Accordingly, the use of architecture reconstruction to support reuse has become a central research theme in software engineering [47]. Understanding the current system, especially its architecture, plays an important role in reusing artefacts from it. As discussed earlier, if architecture documentation is not available, tools are required to support reconstruction of the system’s architecture. The architecture of a system is a blueprint that identifies all the components and the dependencies between them. Software architecture reconstruction aims to recover information about a system’s architecture whose documentation was lost or has become outdated over the years. When documentation is scarce or outdated, the most reliable information can be derived from the source code. There is often an abstraction gap between the source code and the system documentation, and software architecture reconstruction is applied to bridge this gap. O’Brien et al. [48] found that using architecture reconstruction to understand the as-built system was an essential step in deciding how to migrate legacy components to services. By reusing components or artefacts that have been proven in use and enhanced over time, it would be possible to construct high-quality modernized systems more rapidly than ever [49]. Moreover, research is progressing towards the vision that system behavior can be predicted from component behavior [49], [50], which would make reuse even more attractive, as the consequences of selecting a particular component for reuse would be known in advance. In practice, however, software reuse is difficult and not entirely successful, for several reasons [50].
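
To make the idea of recovering components and their dependencies from source code more concrete, the following is a minimal sketch, not the reconstruction toolkit used in this thesis. For illustration only, it assumes a system whose modules are available as Python source text (ACRSS itself is written in a 3GL); the module names `scheduler`, `timetable` and `trains` are hypothetical, and treating modules with few outgoing dependencies as easier reuse candidates is an assumed heuristic rather than a rule from the cited literature.

```python
import ast


def extract_dependencies(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the set of internal modules it imports.

    `sources` maps a module name to its source text. Imports of modules that
    are not part of the system (e.g. the standard library) are ignored.
    """
    internal = set(sources)
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            # Keep only dependency edges to other modules of the same system.
            graph[name].update(t for t in targets if t in internal and t != name)
    return graph


def reuse_candidates(graph: dict[str, set[str]], max_fan_out: int = 0) -> list[str]:
    """Hypothetical heuristic: modules with few outgoing dependencies are
    easier to extract and reuse on their own."""
    return sorted(m for m, deps in graph.items() if len(deps) <= max_fan_out)


if __name__ == "__main__":
    # Hypothetical three-module system used only for illustration.
    demo = {
        "scheduler": "import timetable\nfrom trains import Train\n",
        "timetable": "import trains\n",
        "trains": "import math\n",
    }
    deps = extract_dependencies(demo)
    for module in sorted(deps):
        print(module, "->", sorted(deps[module]))
    print("reuse candidates:", reuse_candidates(deps))
```

In this toy system `trains` depends on no other internal module, so it is the first candidate to lift out for reuse. A real reconstruction would also weigh fan-in, data coupling and the grouping of modules into higher-level architectural components.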
In our research we have explored the issues and concerns that have inhibited software reuse. We found a number of previously unidentified issues and concerns in software reuse that have inhibited the development of a common, systematic software reuse process; these are discussed in subsequent chapters. We have also explored the impact, issues and concerns of software reuse in software development communities in order to develop a systematic software reuse process that overcomes these issues and concerns and can be used in the modernization of legacy systems. To explore this research we define several research questions around modernization and software reuse, which are:

Research Question 1 (RQ1): What role do software architecture and software reuse play in software system modernization?

Research Question 2 (RQ2): What are the issues and concerns about software reuse in the software engineering communities that will have an impact on software system modernization?

Research Question 3 (RQ3): What is a systematic legacy system modernization approach that incorporates software architecture reconstruction and software reuse?

Research Question 4 (RQ4): How do we address the issues and concerns of software reuse to make it an integral part of systematic legacy system modernization?

Research Question 1 allowed us to understand what software architecture is and what role it plays in modernization. We found that software architecture helps in identifying reusable software components. Hence software architecture plays a major role in modernization where software reuse is an inherent part, and a software architecture reconstruction approach also supports software reuse in the modernization of legacy systems. Research Question 2 allowed us to identify issues and concerns in software reuse in the Conventional Software Engineering (CSE) Community, where software reuse has not been adopted as a core activity in software development and software modernization, and in the Software Product Line (SPL) Community, where software reuse is a core activity. Research Question 3 allowed us to understand what a systematic legacy system modernization approach that incorporates software architecture reconstruction and software reuse is, and how our approach differs from other approaches. We used software architecture reconstruction as part of our modernization approach. While using it we could identify reusable software artefacts, and we wanted to explore why software reuse has not become an integral part of any modernization approach to date. Research Question 4 led us to compare the survey results on software reuse in order to identify the issues that must be addressed in developing a systematic software reuse process as an integral part of a systematic legacy system modernization approach. It also allowed us to develop a systematic software reuse process, the Knowledge Based Software Reuse (KBSR) Process, which can be incorporated into our modernization approach to resolve some of the issues and concerns around software reuse.
The research in this thesis uses the results of quantitative and qualitative empirical studies on software reuse. Surveys were carried out to identify issues and concerns in software reuse in different software development communities. For this purpose we divided the software development communities into two groups: the Conventional Software Engineering (CSE) Community and the Software Product Line (SPL) Community. As a means to improve software quality and productivity, software reuse involves building “software with reuse” as well as building “software for reuse” [51]. What makes certain software reuse practices successful is not currently well understood: in some cases software reuse succeeds and in others it fails [52]. Our surveys allowed us to understand and document the issues and concerns in software reuse in the two communities (CSE and SPL). A software reuse process, together with a reuse repository that manages reusable software artefacts, addresses several of the issues and concerns we identified. The reuse repository stores the reusable artefacts, and the reuse process draws on this repository when modernizing a legacy system. We also acquired a legacy system written in a 3GL programming language to serve as a case study, because we wanted to test our modernization approach on a system from industry. The Automatic Cane Railway Scheduling System (ACRSS) is used to illustrate our modernization approach. ACRSS is a computer-based system developed to schedule the operations involved in transporting sugar cane from the field to the factory; it is one of four programs developed by the Sugar Research Institute to study cane railway transport. Modernizing ACRSS allowed us to reuse the code, documentation and other artefacts provided to us.

Our modernization approach is based on software architecture reconstruction and software reuse. The goals of our research are to:
• Advance the state of the art of legacy system modernization through the development of a systematic software modernization approach that uses software architecture reconstruction and software reuse;
• Identify the issues and concerns of software reuse and the state of the practice of software reuse in two different software development communities;
• Propose a systematic legacy system modernization approach that incorporates reuse through the development of the Knowledge Based Software Reuse (KBSR) Process and the Knowledge Based Software Reuse (KBSR) Repository; and
• Show how the systematic legacy system modernization approach uses the KBSR Process and KBSR Repository for the modernization of legacy systems.

1.4 Research Design

This research has been carried out using a combination of quantitative and qualitative studies. This research was conducted in four phases which are:

1. Phase 1
   i. Study of Software Architecture
   ii. Study of Software Architecture Reconstruction Toolkits
2. Phase 2
   i. Study of Modernization Approaches
   ii. Reusing Code for Modernization of Legacy Systems
3. Phase 3
   i. Study of Software Reuse
   ii. Survey of Software Reuse in the Conventional Software Engineering Community
   iii. Survey of Software Reuse in the Software Product Line Community
4. Phase 4
   i. Combining results from Phases 1, 2 and 3
   ii. Developing a systematic legacy system modernization approach that incorporates the KBSR Process and KBSR Repository
   iii. Evaluation of Modernization Approaches

Phase 1, as shown in Figure 1.1, is dominated by literature surveys of Software Architecture and Software Architecture Reconstruction Toolkits. Four software architecture reconstruction toolkits were chosen and compared on different attributes. This phase was included to understand the importance of software architecture and to document the architecture so that it can be reused when required; a software architecture reconstruction toolkit is used to reconstruct the architecture of the software system. Phase 2 is also dominated by a literature survey, in this case of current modernization approaches. We explore the problems in current modernization approaches and outline an approach to overcome them, and we used a legacy system from industry to demonstrate our modernization approach. This phase was included to understand the existing modernization approaches in research and in industry, and their pitfalls, so that a new approach could be designed to overcome their shortfalls.


[Figure 1.1 content. Phase 1: Study of Software Architecture; Study of Software Architecture Reconstruction Toolkits. Phase 2: Study of Modernization Approaches; Reusing Code for Modernization of Legacy Systems. Phase 3: Study of Software Reuse; Survey of Software Reuse in the Conventional Software Engineering Community; Survey of Software Reuse in the Software Product Line Community. Phase 4: Combining results from Phases 1, 2 and 3; Systematic legacy system modernization approach that incorporates KBSR Process and KBSR Repository; Evaluation of Modernization Approaches.]

Figure 1.1: Research Design and Different Phases of Research Conducted

Phase 3 handles the study of software reuse and is dominated by a literature review of software reuse and by empirical studies. We carried out two surveys, one in each of the CSE and SPL communities, to identify the issues and concerns associated with software reuse in these communities. This phase was included to gain an understanding of software reuse and how it is practised in the Conventional Software Engineering Community and in the Software Product Line Community. Phase 4 is dominated by a quantitative and qualitative study that compares the results from the two surveys and identifies what each development community can learn from the other. The detailed comparison also allowed us to develop a systematic software reuse process, the Knowledge Based Software Reuse (KBSR) Process, which uses a KBSR Repository and is incorporated into our modernization approach. We applied the KBSR Process, with the KBSR Repository, to our ACRSS case study to confirm its validity. This phase was included to develop a systematic software reuse process. The research methods for each research question are:


Research Question 1 (RQ1) is examined by a literature survey of software architecture reconstruction research. We chose four architecture reconstruction toolkits and compared them on different attributes. We selected an architecture reconstruction approach to help us identify reusable software artefacts in the legacy system of our case study, and used the identified artefacts to modernize the legacy system.

Research Question 2 (RQ2) is examined by surveying software reuse in the Conventional Software Engineering Community and Software Product Line Community to identify issues and concerns in software reuse.

Research Question 3 (RQ3) is explored by a literature survey of the modernization approaches used in the software industry and the problems associated with each. We developed a modernization approach and used the Automatic Cane Railway Scheduling System (ACRSS) and its code to illustrate it. Our modernization approach uses software architecture reconstruction. After researching Research Questions 3 and 4, we developed a systematic legacy system modernization approach that incorporates software reuse and is methodical, repeatable and learnable through its phases and activities.

Research Question 4 (RQ4) is examined by combining the results of the research questions RQ1, RQ2 and RQ3, and by proposing a systematic legacy system modernization approach that incorporates software reuse through the development of KBSR Process and KBSR Repository for the modernization of a legacy system.

1.5 Contributions

The contributions of our research work are integrated in two observations:
• Several aspects of existing software modernization approaches must be revised so that software reuse can become part of legacy system modernization. This work is based on an investigation of existing modernization approaches and the development of a new modernization approach in which software architecture reconstruction is carried out to understand the system and identify reusable software artefacts, which are then reused in the modernization.
• Software reuse should be a built-in phase of the modernization approach so that the benefits of already developed software artefacts can be reaped. Issues and concerns related to software reuse are identified so that a systematic software reuse process can be developed, and empirical evidence is provided for them. To make software reuse an integral phase in the modernization of legacy systems, it is necessary to develop a software reuse process and a reuse repository that manages reusable software artefacts.

Our contributions are in the following areas:

• Contribution on Modernization Approaches: developing a systematic legacy system modernization approach through the development of the KBSR Process and the KBSR Repository for legacy system modernization; and
• Contribution on Empirical Work: identifying issues and concerns in software reuse in the Conventional Software Engineering Community and in the Software Product Line Community.

1.5.1 Contributions on Modernization Approaches

We have developed a systematic legacy system modernization approach which uses software architecture reconstruction to find reusable components and incorporates the KBSR Process and KBSR Repository. The practice of software development continues to shift towards the reuse of artefacts and components from legacy systems to handle the complexities of software development. Object-oriented programming enables code reuse by organizing the system into objects through information hiding, encapsulation, polymorphism and other techniques, which results in increased flexibility compared to non-object-oriented systems. We have proposed and tested a new four-phase process for the modernization of legacy systems based on software reuse, which reuses artefacts from a software reuse repository. A software reuse process and a reuse repository that manages reusable software artefacts had to be developed so that software reuse can be done systematically; we call these the KBSR Process and the KBSR Repository. They aim to give software engineers easy access to reusable software artefacts and components within a defined process that can be incorporated into a software development life cycle. The knowledge used for software development can be categorized and saved in the KBSR Repository for reuse, and the knowledge extracted from a legacy system can likewise be categorized and saved there for modernization with software reuse.
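
The idea of categorizing knowledge and saving it in a repository for later retrieval can be sketched as follows. This is only an illustrative data structure under stated assumptions, not the KBSR Repository design presented in Chapter 11: the class names `Artefact` and `ReuseRepository`, the category labels, and the `origin` field that distinguishes development-time knowledge from knowledge extracted from a legacy system are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Artefact:
    """A reusable software artefact (code or non-code)."""
    name: str
    category: str   # hypothetical labels, e.g. "code", "design", "requirement", "test"
    origin: str     # assumed values: "development" or "legacy"
    keywords: frozenset = field(default_factory=frozenset)


class ReuseRepository:
    """Stores artefacts grouped by category and retrieves them by keyword."""

    def __init__(self) -> None:
        self._by_category: dict[str, list[Artefact]] = {}

    def store(self, artefact: Artefact) -> None:
        """File the artefact under its category."""
        self._by_category.setdefault(artefact.category, []).append(artefact)

    def find(self, category: str, keyword: Optional[str] = None,
             origin: Optional[str] = None) -> list[Artefact]:
        """Return artefacts in `category`, optionally filtered by keyword and origin."""
        return [
            a for a in self._by_category.get(category, [])
            if (keyword is None or keyword in a.keywords)
            and (origin is None or a.origin == origin)
        ]


if __name__ == "__main__":
    # Hypothetical artefacts, named only for illustration.
    repo = ReuseRepository()
    repo.store(Artefact("schedule_trains", "code", "legacy",
                        frozenset({"scheduling", "railway"})))
    repo.store(Artefact("delivery_quota_rule", "requirement", "legacy",
                        frozenset({"business-rule"})))
    repo.store(Artefact("report_formatter", "code", "development",
                        frozenset({"reporting"})))
    print([a.name for a in repo.find("code", keyword="scheduling")])
    print([a.name for a in repo.find("code", origin="development")])
```

Grouping by category keeps each lookup confined to one kind of artefact, which mirrors the categorization described in the text; a fuller repository would presumably add richer classification, versioning and search, which are beyond this sketch.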

1.5.2 Contributions on Empirical Work: Contributions in Issues and Concerns in Software Reuse in Software Engineering Communities

The quantitative and qualitative analysis of our survey results identified the need for software reuse and the critical factors that affect and inhibit software engineers and developers in adopting software reuse. Software reuse cannot become an engineering discipline as long as its issues and concerns have not been clearly understood and dealt with. There is no clear consensus on the input and output artefacts, or on the requirements, that an effective reuse process must have. We completed surveys on software reuse in the conventional software engineering community and the software product line community, focused on identifying issues and concerns in software reuse; these helped us to develop a Knowledge Based Software Reuse (KBSR) Process that addresses many of the issues and concerns we identified. Software reuse has been discussed in generic terms in software product lines, and in the software product line field software reuse is evidently a core concept. Software reuse has, however, failed to become a standardized practice in SPL. To understand the obstacles to implementing software reuse in SPL, we conducted a survey investigating how software reuse is adopted in SPL, so as to provide the necessary degree of support for engineering software product line systems and to identify some of the issues and concerns in software reuse. The identified issues and concerns have helped us to understand the difference between software reuse in the CSE and SPL communities and contributed to the development of the KBSR Process. They have also indicated how both communities can learn good software reuse practices from each other to form a common software reuse process. This process can be used by any organization trying to reap the benefits of software reuse while overcoming many of its difficulties.

1.6 Thesis Structure

The rest of the thesis is organized as follows. Chapter 2 presents our literature review on modernization and software reuse. It discusses where the problem of modernization comes from, what is already known about the problem of legacy modernization, what the different modernization approaches are and what is missing in the current approaches. It also discusses legacy modernization definitions and challenges, and software reuse definitions and challenges. Chapter 3 discusses software evolution dynamics, the different drivers of software modernization, our classification of existing modernization approaches, the shortfalls of existing modernization approaches, and software architecture and its role in legacy modernization. Chapter 4 presents an overview of software architecture and its importance in the modernization of legacy systems. It describes quality attributes related to modernization and why software architecture and software architecture reconstruction are needed in a modernization approach. Chapter 5 discusses architecture reconstruction toolkits and the toolkit we used in our modernization approach. In this chapter we examine four software architecture reconstruction toolkits and assess and compare their capabilities in terms of extraction ability, abstraction ability, navigation, ease of use, views generated, language support, extensibility and completeness.


Chapter 6 describes our modernization approach that reuses code during the modernization of legacy systems. We demonstrate this approach on a number of subroutines of the case study, the Automatic Cane Railway Scheduling System (ACRSS). We modernized one independent module of the system. We also performed a few perfective maintenance tasks, such as adding security features to a few subroutines, and found that our modernized system is more capable of evolution than the legacy system, to which we could not add the security features. Chapter 7 outlines the structure of the software reuse surveys we conducted with the CSE and SPL communities. In these surveys we collected actual data about software reuse in software development to help us better understand the issues and concerns around reusing code for system modernization. The survey results were analyzed to identify issues and concerns in software reuse. Chapter 8 identifies and explores the issues and concerns in software reuse in the CSE community. We used the survey method to determine to what extent reuse was taking place in projects today, how developers use and reuse different types of code and other artefacts, how much testing they do on the code, whether or not the code is more reliable, and which issues and concerns are inhibiting software reuse in the software development life cycle. Chapter 9 describes issues in software reuse in the Software Product Line (SPL) Community. Software Product Line practitioners come across many software reuse problems. We surveyed the software product line practitioner community (managers, developers, architects and researchers) to help identify and document software reuse issues and concerns in SPLs; until now these issues and concerns had not been surveyed, compared or documented in a systematic way. Chapter 10 compares software reuse in the CSE Community and the SPL Community.
This chapter also describes differences in software reuse approaches between the two communities and identifies some insights that could flow from one community to the other. Chapter 11 describes our approach to building a Knowledge Based Software Reuse (KBSR) Process and Repository for systematic legacy system modernization. We validated our KBSR Process on the same case study, the Automatic Cane Railway Scheduling System (ACRSS), that we used for our modernization approach. The experiences and findings from using the KBSR Process and the KBSR Repository are reported in this chapter. Chapter 12 evaluates our modernization approaches. The modernization approaches we used to modernize the legacy system are Reusing Code for Modernization, and Modernization using the Knowledge Based Software Reuse Process and Repository. We demonstrated our modernization approach on a number of subroutines of the case study, ACRSS. We obtained the source code of ACRSS, modernized a few modules of the system and returned it to the organization for use. We also performed a number of perfective maintenance tasks, such as adding security features to some subroutines, and found that our modernized system is more capable of evolution than the legacy system, to which we could not add the security features. Chapter 13 concludes and summarizes our research work, revisits the research questions, and outlines our contribution to the software engineering community and body of knowledge.

1.7 Chapter Summary

Over the past few years there has been a great deal of interest in software reuse in the CSE and SPL communities. Our contribution includes a new approach for the modernization of legacy systems that reuses existing code. We discuss this modernization approach, of which software reuse is an inherent part, in later chapters. We identify some issues and challenges with existing modernization and software reuse approaches. We outline our research model, which uses surveys to identify, and then address, the issues and concerns of software reuse in the Conventional Software Engineering and Software Product Line Communities. Identifying these issues and concerns helps us to develop the Knowledge Based Software Reuse Process and Repository, which we will use for the modernization of legacy systems. This modernization approach will be applied to a case study of the Automatic Cane Railway Scheduling System. Software reuse is the prime focus of our modernization approach.


CHAPTER 2: LITERATURE REVIEW

2.1 Introduction

This chapter presents our literature review on modernization and software reuse. It discusses where the problem of modernization comes from, what is already known about the problem of legacy modernization, what the different modernization approaches are and what is missing in current approaches. It also discusses legacy modernization definitions and challenges, software reuse definitions and challenges, and the motivation behind software reuse. The literature is then classified into legacy modernization, software reuse, software reuse in different software engineering communities, and systematic software reuse processes for legacy modernization. The definitions of these subjects are discussed and research challenges are described for each of them. We briefly present the side effects of current modernization approaches that motivate us to propose an alternative modernization approach. Finally, the chapter is summarized to answer the questions: Where did the problem come from? What is already known about this problem? What other methods have been tried to solve it? What are the research challenges?

2.2 Legacy System Definitions and Challenges

There are many definitions of legacy systems given by different researchers and software practitioners. We have examined the existing definitions within the literature and on the web and these include:

1. A legacy system is defined as a system which was developed sometime in the past and is critical to the business in which the system operates [3].

Typically, legacy systems were developed before the widespread use of modern software engineering methods and have been maintained to accommodate changing requirements. These two factors result in systems which are often difficult to understand and expensive to maintain. Many legacy systems thus present a dilemma – such systems are critical to the business process, but maintaining them incurs unjustifiable expense.

2. A legacy system is an antiquated computer system or application program which continues to be used because the user (typically an organization) does not want to replace or redesign it. Legacy systems are considered to be potentially problematic by many software engineers for several reasons. Legacy systems often run on obsolete (and usually slow) hardware, and sometimes spare parts for such computers become increasingly hard to obtain. These systems are often hard to maintain, improve, and expand because there is a general lack of understanding of the system; the designers of the system have often left the organization, so there is no one left to explain how it works. Such a lack of understanding can be exacerbated by inadequate documentation, or manuals getting lost over the years. Integration with newer systems may also be difficult because new software may use completely different technologies [53].

3. A legacy system is an old method, technology, computer system, or application program that continues to be used, typically because it still functions for the users’ needs, even though newer technology or more efficient methods of performing a task are now available. A legacy system may include procedures or terminology which are no longer relevant in the current context, and may hinder or confuse understanding of the methods or technologies used. The term “legacy” may have little to do with the size or age of the system — mainframes run 64-bit Linux and Java alongside 1960s vintage code [54].

4. Legacy systems are not just application software systems. They are socio-technical systems, so they include layers of business processes, application software, support software and system hardware. Each layer depends on the layer immediately below it and interfaces with that layer. If interfaces are maintained, then one should be able to make changes within a layer without affecting either of the adjacent layers [53].

5. When software is new, it is very malleable; it can be formed to be whatever is wanted by the implementers. But as the software in a given project grows larger and larger, and develops a larger base of users with long experience with the software, it becomes less and less malleable. Like a metal that has been work-hardened, the software becomes a legacy system, brittle and unable to be easily maintained without fracturing the entire system [1].

6. A legacy system is any system that significantly resists modification and evolution to meet new and constantly changing requirements. Legacy systems share many negative characteristics. The most typical are that they consist of monolithic code, they are geriatric (often more than 10 years old), they are written in 3GL languages, and they consist of millions of lines of code [55].

We can define a legacy system as a useful system that an organization would like to keep, but which is difficult to understand and hence difficult to maintain, evolve and deploy, and which lacks the business functions needed to support the evolving organization. The cost of running these legacy systems is also often not justifiable. Organizations often describe their legacy systems as business assets. These same systems are also often referred to as business liabilities, sometimes even in the same sentence: "that system is our biggest business asset, and it's getting to be a real liability" [56]. Legacy systems must continually evolve to meet ever-changing needs [2], because business requirements change over time.


2.3 Legacy System Evolution

A legacy system may evolve in a number of ways and the system evolution activities can be divided into three categories [2]: maintenance, modernization, and replacement. System evolution is a broad term that covers a continuum from adding a field in a database to completely re-implementing a system. Figure 2.1 illustrates how different evolution activities are applied at different phases of the software system life cycle. The dotted line represents growing business needs while the solid line represents the functionality provided by the system. Repeated system maintenance supports the business needs sufficiently for a time, but as the system becomes increasingly outdated, maintenance falls behind the business needs. A modernization effort is required that represents a greater effort, both in time and functionality, than the maintenance activity. Finally, when the old system can no longer be evolved, it must be replaced [57].

Figure 2.1: Software System Life Cycle

System assessment is used to gain an understanding of a legacy system, which is fundamental to any system evolution exercise. To determine whether a legacy system needs maintenance, replacement or modernization, Ransom, et al. have described an assessment technique called RENAISSANCE [58]. Ransom, et al.'s assessment method is a product of the ESPRIT project RENAISSANCE [59]. The RENAISSANCE method offers guidance for system modeling, techniques for migrating legacy systems to distributed client/server technology, and advice for evolution planning. To better understand the extent of modernization, we briefly describe maintenance, replacement, and modernization.

Maintenance – Maintenance is the modification of software after it is released. It should not change the architecture of the system but should focus on bug fixes or small functional improvements. Maintenance is a continuous and repeated process. Over time it becomes difficult to find people who are familiar with the technology on which a legacy system is built, or with the system itself, so the maintenance cost of the system increases. Maintenance is required to support the evolution of any software system, but it has many limitations [13]. One limitation concerns adopting new technologies: the competitive advantage derived from new technologies is constrained because enhancements such as implementing a distributed architecture or a GUI are not typically considered maintenance activities. Another limitation concerns modifying a legacy system to adapt it to new business requirements. This becomes increasingly difficult, because small changes may have a large overall impact on legacy systems. Maintenance affects many quality attributes such as availability, reliability and maintainability.

Replacement – Replacement means wholly constructing or buying a new system to replace an outdated and non-extensible legacy system. Replacement is appropriate for legacy systems that cannot keep pace with business needs and for which modernization is not possible or cost effective [60]. Replacement requires the biggest investment of the three evolution activities (maintenance, modernization and replacement). It can involve the development of a completely new system, which follows the phases of the system development life cycle and requires complete testing of all implemented modules and units. Replacement can also involve buying a new system, which may not be able to fulfill all the business requirements of the organization and may require customization. Legacy systems are often bespoke systems, so buying a replacement may not be feasible, because a packaged system may not provide all of the required functionality.

Modernization – Legacy modernization is the restructuring of a legacy system into a modern language, such as an object-oriented programming language, by reusing the existing legacy assets and improving the reliability and maintainability of the legacy system for the future, thereby updating aging applications and systems to interact with newer technologies. Legacy modernization aims to retain and extend the value of the legacy investment through migration to new platforms. Modernization means gradually and partially improving the legacy system with new techniques. Modernization involves more extensive changes than maintenance, but conserves a significant portion of the existing system. These changes often include system restructuring, important functional enhancements, or improved quality attributes such as maintainability. Modernization is used when a legacy system requires more pervasive changes than those possible during maintenance, but the system still has business value that must be preserved. Software modernization attempts to evolve a legacy system, or elements of a system, when conventional evolutionary practices, such as maintenance, can no longer achieve the desired result [7]. In the long run business requirements change, and the legacy system cannot be maintained to conform to the changing needs of the organization. We might recall the busy period before the year 2000, when organizations were validating their systems for Y2K (Year 2000) compliance, and how stressful the midnight of December 31, 1999 was for IT staff while everyone else was ready to celebrate the New Year. Although many legacy systems survived Y2K, it is an open question how many can continue to survive, given the nature of software evolution [61]. Legacy system modernization is an inevitable consequence of software evolution.


2.4 Legacy System Modernization

Legacy modernization is the practice of updating aging applications and systems to interact with newer technologies. While the scope of a modernization effort is not always fixed or well defined, software engineers typically strive for greater application agility so they can rapidly respond to business requests for change [62]. When planning a modernization effort, organizations should carefully consider how best to leverage existing assets, and how best to support future initiatives about which they may yet know very little. Software modernization is more challenging than most software engineers suspect. Legacy software systems tend to expand with time, as efforts to remove unused code are seldom funded. The average Fortune 100 company, for example, maintains 35 million lines of code and adds 10 percent each year in enhancements, updates, and other maintenance alone. As a result, the amount of code maintained by these companies doubles in size every seven years [7]. More software means more software to evolve and maintain. For example, companies modify software in response to changing business practices, to correct errors, or to improve performance, maintainability, or other quality attributes. This is all in accordance with Lehman's first law: "A large program that is used undergoes continuing change or becomes progressively less useful" [61]. It is often not possible to redevelop business-critical legacy systems because of the risks involved. Some of these risks are:
• The legacy system may not be well documented, so specifications may need to be redeveloped, and this may introduce errors into the system;
• The legacy system may not conform to the running system, and any redevelopment based solely on the legacy system may create problems;
• Critical data and business logic may not be replicated;
• The size and complexity of the system may have grown beyond a level that can be comprehended and analysed.
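The doubling claim follows from compound growth: a code base of S lines growing 10 percent a year holds S × 1.1^n lines after n years, and 1.1^7 ≈ 1.95. The following Python sketch checks that arithmetic. The function names are our own, and the sketch assumes a constant growth rate, which real portfolios will not follow exactly.

```python
import math


def projected_loc(initial_loc: float, annual_growth: float, years: int) -> float:
    """Code size after `years` of compound growth at `annual_growth` per year."""
    return initial_loc * (1 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years needed for code size to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start = 35_000_000  # lines of code quoted for the average Fortune 100 company [7]
    print(f"{projected_loc(start, 0.10, 7) / 1e6:.1f} million lines after 7 years")
    print(f"doubling time at 10% a year: {doubling_time(0.10):.2f} years")
```

Run as a script, it reports about 68.2 million lines after seven years and a doubling time of about 7.27 years, consistent with the seven-year figure quoted above.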

The most suitable legacy systems for modernization are those that have low quality, are mission-critical, and provide a competitive advantage to the organization. The second most suitable candidates for modernization are high-quality systems with standard functionality. These systems are the most likely candidates for integration with e-business and other modern enterprise solutions, and in most cases should be modernized by extending their use to an Internet platform [64]. A practical approach to comparing technical quality with business value [65], based on plotting the results on a quadrant map, is shown in Figure 2.2. From our analysis of the literature, legacy systems fall into four categories:
• Low Priority Modernization: These systems have high business value and high quality. They have a low priority for modernization.
• No Modernization: These systems have high quality and low business value. They require no modernization effort, as they do not cost organizations much to keep running.
• Replace with Commercial Package: These systems have low quality and low business value. They should be replaced with commercial packages.
• Good Modernization Candidate: These systems have low quality and high business value. They are good candidates for modernization.

[Figure 2.2: quadrant map of technical quality against business value. High Quality/Low Value: No Modernization; High Quality/High Value: Low Priority Modernization; Low Quality/Low Value: Replace with Commercial Package; Low Quality/High Value: Good Modernization Candidate.]

Figure 2.2: Technical Quality and Business Value
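The quadrant map in Figure 2.2 amounts to a simple decision rule. The Python sketch below expresses it under two assumptions of our own: technical quality and business value have already been scored on a 0 to 1 scale, and 0.5 separates "low" from "high". Neither the scale nor the cut-off comes from [65].

```python
from enum import Enum


class Strategy(Enum):
    LOW_PRIORITY = "Low Priority Modernization"
    NO_MODERNIZATION = "No Modernization"
    REPLACE = "Replace with Commercial Package"
    GOOD_CANDIDATE = "Good Modernization Candidate"


def classify(technical_quality: float, business_value: float,
             threshold: float = 0.5) -> Strategy:
    """Place a system in one quadrant of the quality/value map.

    Scores are assumed to be normalised to [0, 1]; the threshold is an
    illustrative assumption, not part of the original method.
    """
    high_quality = technical_quality >= threshold
    high_value = business_value >= threshold
    if high_quality and high_value:
        return Strategy.LOW_PRIORITY
    if high_quality:
        return Strategy.NO_MODERNIZATION
    if high_value:
        return Strategy.GOOD_CANDIDATE
    return Strategy.REPLACE
```

For example, a system scored 0.3 for quality and 0.9 for value falls in the Good Modernization Candidate quadrant. In practice the scores would come from a system assessment rather than being chosen by hand.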

2.5 Approaches for Modernization of Legacy Systems

We identified several approaches to the modernization of legacy systems. Modernization can be done at different levels. At lower levels, modernization takes the form of transforming the code from one language into another. At higher levels, the structure of the system may also be changed, for instance to make it more object-oriented. At still higher levels, the global architecture of the system may be changed as part of the modernization process. Although design patterns and generic programming have been very successful in new software development, we are not aware of any work by other research groups that has studied the effectiveness of these techniques in restructuring legacy code and hence modernizing legacy systems. The following gives an overview of the various modernization approaches.

Errickson-Connor [66] proposed the steps of a software modernization process in which legacy code is transformed to new languages and new environments. She suggests that in the first stage legacy code needs to be cleaned up by removing program anomalies before it can be transformed. The second stage involves software restructuring tasks such as identifying business rules, isolating business rules, and extracting business rules as reusable services. Once the code corresponding to a business rule has been extracted, it is ready for transformation into components in stage three. The fourth stage manages these reusable components in a software environment. Zhang Li, et al., [67] have provided a modernization process called Tollgate Model Transformation, with which a legacy system can be adapted to a service-oriented architecture (SOA) system whose granularity is changeable. The Tollgate Model Transformation process is based on the wrapping technique. The Aberdeen Group [68] surveyed legacy application modernization and found that companies are looking to the SOA approach to create distributed applications, both to modernize their legacy applications and to make their composite applications more flexible, giving their businesses more agility. Some companies, however, are simply looking to retire their legacy applications on mainframes and UNIX servers and so get rid of the legacy problem. Fuhr et al. [267] have described an approach that uses model-driven techniques to extend IBM's SOMA method towards migrating legacy systems into Service-Oriented Architectures (SOA). The approach was applied to migrate the functionality of GanttProject towards a Service-Oriented Architecture. As a result, fully functional Web Services were generated whose business functionality was implemented by transforming legacy code. The approach addresses the semiautomatic migration of legacy software systems to Service-Oriented Architectures, based on model-driven techniques and code transformation.

Other legacy system modernization approaches developed by researchers and software practitioners include the black-box approach, such as wrapping; the white-box approach, which requires program understanding and reverse engineering; screen scraping; data wrapping; legacy integration using the Common Gateway Interface (CGI); data contextualization; architecture-driven modernization; and COTS-based modernization. Some of the concepts of one modernization approach overlap with those of another. For example, wrapping is also called the black-box approach, while the white-box approach requires program understanding. No clear and concise classification of these modernization approaches can be found in the existing literature. We give an overview of these modernization approaches in the following sections, and a detailed discussion along with our classification of these approaches is given in Chapter 3.

2.5.1 Black-box Modernization Approach

The black-box modernization approach provides a new interface to a legacy component. Black-box modernization does not require an understanding of the system's internals and treats a running system as a black box. A new interface is designed so that the functionality of the legacy system can be accessed through it. Black-box modernization includes techniques such as screen scraping, database gateways, XML integration, CGI integration, and object-oriented wrapping of legacy systems [30]. This approach can be used in the absence of legacy system knowledge. It can also be used where a system is already very stable and only needs to be interoperable with another external system. Security and reliability are other quality attributes that can be improved using black-box modernization. Reliability is the ability of a system to remain operational over time; it is measured as the probability that a system will not fail to perform its intended functions over a specified time interval. Maintainability and reusability are not objectives of this modernization approach.


2.5.2 White-Box Modernization Approach

The white-box modernization approach requires an understanding of legacy system internals. If this understanding is unavailable, some work needs to be done to understand the internals of the legacy system. White-box modernization includes source code restructuring, which keeps the external behavior of the system intact while improving its maintainability and performance [13]. Maintainability and reusability are the main objectives of this modernization approach. Interoperability is another quality attribute that can be improved by the white-box modernization approach. Interoperability is the ability of a system or different systems to operate successfully by communicating and exchanging information with other external systems written and run by external parties. An interoperable system makes it easier to exchange and reuse information internally as well as externally.
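Restructuring changes the internal form of the code while keeping its external behavior intact. In the hypothetical freight-charge routine below (the pricing rule is invented for illustration), the first version is written in a flag-driven legacy style and the second is restructured with named constants and an early return; a helper then compares the two over a range of inputs. Such a comparison is a regression check over the inputs tried, not a proof of equivalence.

```python
def legacy_freight_charge(weight, express, member):
    # Flag-driven style typical of code translated line by line from an older 3GL
    cost = 0
    flag = 0
    if weight <= 0:
        flag = 1
    if flag == 0:
        if weight > 10:
            cost = 10 * 2 + (weight - 10) * 1
        else:
            cost = weight * 2
        if express == 1:
            cost = cost + 5
        if member == 1:
            if cost > 20:
                cost = cost - 3
    return cost


# Restructured version: named constants, early return, boolean parameters
BASE_RATE, BULK_RATE, BULK_LIMIT = 2, 1, 10
EXPRESS_SURCHARGE, MEMBER_DISCOUNT, DISCOUNT_THRESHOLD = 5, 3, 20


def freight_charge(weight: int, express: bool, member: bool) -> int:
    if weight <= 0:
        return 0
    cost = min(weight, BULK_LIMIT) * BASE_RATE + max(weight - BULK_LIMIT, 0) * BULK_RATE
    if express:
        cost += EXPRESS_SURCHARGE
    if member and cost > DISCOUNT_THRESHOLD:
        cost -= MEMBER_DISCOUNT
    return cost


def behaviour_preserved(max_weight: int = 50) -> bool:
    """Check that restructuring left external behaviour unchanged on a range of inputs."""
    return all(
        legacy_freight_charge(w, e, m) == freight_charge(w, bool(e), bool(m))
        for w in range(-2, max_weight)
        for e in (0, 1)
        for m in (0, 1)
    )
```

Keeping the old routine alongside the new one during restructuring lets this kind of check be rerun after every change.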

2.5.3 Wrapping

The wrapping approach is also called black-box modernization. It provides a new interface to a legacy component. In other words, wrapping removes mismatches between the interface exported by a software artefact and the interfaces required by current integration practices. Wrapping involves surrounding existing data, individual programs, application systems, and interfaces to give a legacy system a ‘new and improved’ look or to improve operations [4], [69]. The wrapped component acts as a server, performing some function required by an external client, which does not need to know how the service is implemented [70]. Wrapping permits reusing components and leveraging the massive investment made in the legacy system over many years. This approach enhances security and interoperability but does not improve maintainability or reusability.
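To illustrate how a wrapper hides the legacy interface from its clients, the following Python sketch places a modern service class in front of an unchanged legacy routine. The routine `legacy_custlkp`, its status codes and its fixed-width record layout are hypothetical stand-ins for a real legacy program; only the wrapper knows about them, so the client sees a typed `Customer` object and an ordinary exception.

```python
from dataclasses import dataclass


def legacy_custlkp(cust_id: str) -> tuple[int, str]:
    """Hypothetical stand-in for an untouched legacy routine.

    Returns (status, record): status 0 means found; the record is fixed-width,
    6-char id + 20-char name + 8-char balance in cents.
    """
    table = {"000042": "000042" + "ACME SUGAR MILL".ljust(20) + "00012550"}
    record = table.get(cust_id.zfill(6))
    return (0, record) if record else (4, "")


@dataclass
class Customer:
    customer_id: int
    name: str
    balance: float


class CustomerService:
    """Wrapper: offers a modern interface and hides the legacy calling convention."""

    def get_customer(self, customer_id: int) -> Customer:
        status, record = legacy_custlkp(str(customer_id))
        if status != 0:
            raise KeyError(f"customer {customer_id} not found")
        return Customer(
            customer_id=int(record[0:6]),
            name=record[6:26].rstrip(),
            balance=int(record[26:34]) / 100,
        )
```

The legacy routine itself is untouched, which is why wrapping leaves the maintainability of the code behind the interface unchanged.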

2.5.4 Migration

Legacy migration allows legacy systems to be moved to new environments in which they can be easily maintained and adapted to new business requirements, while retaining the functionality and data of the original legacy systems without having to completely redevelop them. Ganti and Brayman [71] propose general guidelines for migrating legacy systems to a distributed environment. Using these guidelines, the business is first examined and the business processes found are re-engineered as required. Migration of legacy systems includes system migration and component migration. Component migration involves migrating small components to new platforms, whereas system migration moves the complete system to a new platform [72]. The few successful migration reports found in the literature [73], [74] describe ad-hoc solutions to the problem at hand. Migration requires a complete understanding of the legacy system, its interfaces and legacy data [75]. Migrating legacy systems to services enables both the reuse of already established and proven software components and the integration with new services, including their orchestration to support changing business needs. To gain the most benefit from a migration, a comprehensive approach supporting the migration process and enabling the reuse of legacy code is required [267]. One of the objectives of migrating a legacy system to a newer platform is to improve interoperability. Communication protocols, interfaces, and data formats are the key considerations for interoperability. Standardization is also an important aspect to consider when designing an interoperable system. Understanding the legacy system, its interfaces and its legacy data is thus a central migration challenge.

2.5.5 Screen Scraping

Carr [76] has suggested a technique for modernization called screen scraping. Screen scraping consists of wrapping old, text-based interfaces with new graphical interfaces. The old interface is often a set of text screens running on a dumb terminal. In contrast, the new interface can be a PC-based graphical user interface (GUI), or even a hypertext mark-up language (HTML) light client running in a Web browser. This technique can easily be extended, enabling one new user interface to wrap a number of legacy systems. From the perspective of the legacy system, the new graphical interface is indistinguishable from an end user entering text on a screen. From the end user’s point of view, the modernization has been successful, as the new system now provides a modern, usable graphical interface. However, from the IT department’s perspective, the new system is as inflexible and difficult to maintain as the legacy system. Screen scraping is basically a “makeover” for legacy systems. This kind of modernization can be effective for stable systems where the principal objective is to improve usability rather than maintainability.
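A screen scraper reads values from fixed positions on the legacy terminal screen and hands them to the new front end. The sketch below assumes the screen has already been captured as text rows; the screen layout, field names and column positions are invented for illustration, and a real tool would obtain the screen through a terminal emulator.

```python
# Hypothetical legacy screen, captured as plain text rows
SCREEN = [
    "SCHED01                 DAILY RUN SUMMARY",
    "RUN DATE: 14/07/2013    SHIFT: A",
    "LOCOS ACTIVE: 012       BINS DELIVERED: 0450",
]

# Field map: name -> (row, first column, column after last).
# These coordinates are assumptions tied to the made-up layout above.
FIELDS = {
    "run_date": (1, 10, 20),
    "shift": (1, 31, 32),
    "locos_active": (2, 14, 17),
    "bins_delivered": (2, 40, 44),
}


def scrape(screen: list[str]) -> dict[str, str]:
    """Read each field from its fixed screen position for display in a GUI."""
    return {
        name: screen[row][start:end].strip()
        for name, (row, start, end) in FIELDS.items()
    }
```

Because the field map is tied to exact row and column positions, any change to the legacy screen layout breaks the scraper, which reflects the inflexibility noted above.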

2.5.6 Data Wrapping

Data wrapping improves connectivity and allows the integration of legacy data into modern infrastructures. Legacy systems often exchange information developed on different systems, where sources and receivers have implicit, preconceived assumptions about the meaning of data. It is thus not uncommon for system A and system B to use different terms to define the same thing. However, to achieve a useful exchange of data, the individual systems must agree on the meanings of the exchanged data; in other words, the legacy systems must ensure interoperability. Altman, et al., [77] have suggested data wrapping, which enables legacy data to be accessed through a different interface or protocol from the one for which the data was originally designed. Data wrapping improves the interoperability of the system.
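Data wrapping can be pictured as a translation layer between the legacy vocabulary and the vocabulary the consuming systems have agreed on. In the Python sketch below, the legacy field names (`CUSNO`, `OPNDT`, `STAT`), their encodings and the target names are all hypothetical; the point is that the wrapper, not the legacy system, reconciles the different terms and formats.

```python
from datetime import date, datetime
from typing import Any, Callable

# Hypothetical legacy record: field names and encodings chosen by the original designers
LEGACY_ROW = {"CUSNO": "000731", "OPNDT": "19870315", "STAT": "A"}

# Legacy field -> (agreed field name, converter to the agreed representation)
MAPPING: dict[str, tuple[str, Callable[[str], Any]]] = {
    "CUSNO": ("customer_id", int),
    "OPNDT": ("opened_on", lambda s: datetime.strptime(s, "%Y%m%d").date()),
    "STAT": ("active", lambda s: s == "A"),  # legacy codes: A = active, I = inactive
}


def wrap_record(row: dict[str, str]) -> dict[str, Any]:
    """Present a legacy record in the vocabulary the consuming system expects."""
    wrapped = {}
    for legacy_name, raw in row.items():
        new_name, convert = MAPPING[legacy_name]
        wrapped[new_name] = convert(raw)
    return wrapped
```

Keeping the mapping in one table makes the agreed meanings explicit, which is the interoperability requirement described above.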

2.5.7 Legacy Integration using CGI

Eichmann [78] has suggested legacy integration using the Common Gateway Interface (CGI). The CGI is a standard for interfacing external applications with information servers, such as HTTP or Web servers. Legacy integration using CGI is often used to provide fast web access to existing assets, including legacy systems on mainframes and transaction monitors. The GUI communicates directly with the core business logic or data of the legacy system, instead of wrapping the legacy user interface as in screen scraping. This approach improves interoperability but does not improve the maintainability of the legacy system.
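The CGI mechanism is simple: the web server passes request data to an external program through environment variables such as `QUERY_STRING` and returns whatever the program writes to standard output, starting with a header block and a blank line. The Python sketch below follows that convention; `legacy_train_status` is a hypothetical stand-in for a call into existing legacy business logic. It reads `QUERY_STRING` directly rather than using Python's `cgi` module, which was deprecated and has been removed in Python 3.13.

```python
import html
import os
from urllib.parse import parse_qs


def legacy_train_status(train_no: str) -> str:
    """Hypothetical stand-in for a call into existing legacy business logic."""
    return {"17": "ON TIME", "23": "DELAYED 15 MIN"}.get(train_no, "UNKNOWN")


def handle_request(environ) -> str:
    """Build a CGI response: a header block, a blank line, then the body."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    train_no = query.get("train", [""])[0]
    status = legacy_train_status(train_no)
    body = (f"<html><body><p>Train {html.escape(train_no)}: "
            f"{html.escape(status)}</p></body></html>")
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server supplies the request in environment variables such as
    # QUERY_STRING and sends whatever the script writes to stdout to the browser.
    print(handle_request(os.environ), end="")
```

A request such as `?train=23` produces an HTML page reporting that status, with the legacy logic called directly rather than through its old user interface.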


2.5.8 Data Contextualization

Data contextualization is a technique that can be used during the reverse engineering stage of a modernization approach. Systems that attempt to integrate and analyze data from multiple data sources are greatly aided by the addition of semantic and metadata context that explicitly describes what a data value means. Ricardo, et al., [79] proposed the Data Contextualization Technique, which recovers the linkages between pieces of legacy source code and the fragments of database schema used by each piece. This places the data in context in order to provide detailed metadata to integration systems. The context of a piece of data includes its semantics (“To what specific concept does this piece of data refer?”), its syntax (“How is this piece of data structured?”), and other related metadata such as information about the quality of the data [80]. Data contextualization standardizes data formats, which helps in the exchange of information.
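One step of data contextualization, recovering which pieces of code use which parts of the schema, can be approximated by scanning embedded SQL for table references. The Python sketch below is a deliberately simplified illustration, not the technique of [79]: the subroutine names and SQL are hypothetical, and a regular expression will miss dynamic SQL and indirect access, which real tools handle by parsing the code.

```python
import re
from collections import defaultdict

# Hypothetical legacy subroutines with embedded SQL, keyed by subroutine name
SOURCES = {
    "SCHEDRUN": """
        EXEC SQL SELECT LOCO_ID, CAPACITY FROM LOCOMOTIVE WHERE ACTIVE = 'Y' END-EXEC
        EXEC SQL INSERT INTO RUN_PLAN (LOCO_ID, SIDING) VALUES (:L, :S) END-EXEC
    """,
    "BINCOUNT": """
        EXEC SQL SELECT COUNT(*) FROM BIN B JOIN SIDING S ON B.SIDING = S.ID END-EXEC
    """,
}

# A table name is taken to be the identifier following one of these keywords
TABLE_REF = re.compile(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Z_][A-Z0-9_]*)", re.IGNORECASE)


def link_code_to_schema(sources: dict[str, str]):
    """Map each code unit to the tables it touches, and each table back to its users."""
    code_to_tables = {
        name: sorted({t.upper() for t in TABLE_REF.findall(src)})
        for name, src in sources.items()
    }
    table_to_code: dict[str, list[str]] = defaultdict(list)
    for name, tables in code_to_tables.items():
        for table in tables:
            table_to_code[table].append(name)
    return code_to_tables, dict(table_to_code)
```

The two mappings give the kind of code-to-schema linkage that can then be annotated with semantic and quality metadata.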

2.5.9 Architecture-Driven Modernization (ADM)

In June 2003, the Object Management Group (OMG) formed a Task Force on modeling in the context of legacy software systems. Initially the group was called the Legacy Transformation Task Force, but its name was then unanimously changed to the Architecture-Driven Modernization (ADM) Task Force. Reengineering and MDA (Model-Driven Architecture) have converged on ADM. ADM is the concept of modernizing existing systems with a focus on all aspects of the current system architecture and the ability to transform the current architecture to a target architecture [81]. ADM is the process of understanding and evolving existing software assets for the purpose of software improvement; modifications; interoperability; re-factoring; restructuring; reuse; porting; migration; translation into another language; and enterprise application integration [82]. ADM usually involves one or more components of the IT architecture. Each component of an IT portfolio has its own trajectory of evolution from the as-is state to the to-be state (i.e. an element of the existing solution evolves into an element of the target solution). Figure 2.3 depicts various trajectories across the knowledge curve that reflect transformations within architectural perspectives. The increasing cost of maintaining legacy systems, together with the need to preserve business knowledge, has turned the modernization of legacy systems into an important research field. ADM provides several benefits, such as return on investment (ROI) improvements on existing information systems, reduced development and maintenance costs, an extended life cycle for legacy systems, and easy integration with other systems.

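[Figure 2.3: ADM modernization model showing transformation trajectories from the Existing Solution to the Target Solution across three architectural levels: Business Architecture, Application and Data Architecture, and Technical Architecture.]

Figure 2.3: Model of ADM Modernization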

The work of the ADM Task Force in OMG has led to the development of several standards. The cornerstone within this set of standards is the Knowledge Discovery Metamodel (KDM). KDM allows a standardized representation of knowledge extracted from legacy systems by means of reverse engineering [83]. KDM provides a common repository structure that makes it possible to exchange information about existing software assets in legacy systems.


2.5.10 COTS Based Modernization

Kotonya, et al., [84] have described a COTS-based modernization approach called COMPOSE, which is a component-based approach to extending legacy systems. The COMPOSE method embodies a cyclical development process that integrates verification into every part of the process to ensure that there is an acceptable match between components and the system being built. It also includes negotiation in each cycle as an explicit recognition of the need to trade off and accept compromise in successful component-based system development. This ensures that even the earliest stages of system development are carried out in the context of off-the-shelf component availability, system requirements and critical architectural concerns. The lack of practical methods and tools has hampered more widespread use of COTS in modernizing legacy systems [85, 86]. In all of the modernization approaches described above, the key is to use the existing legacy artefacts in some form, whether the approach is black-box or white-box. The legacy artefacts that are reused depend on the modernization approach used. For example, in the black-box modernization approach the complete legacy system is reused as it is, with a new wrapper around it. In the white-box approach, classes, modules, objects, etc. are reused for restructuring. In program understanding, legacy documents, Data Flow Diagrams (DFD), Entity Relationship Diagrams (ERD), sequence diagrams, etc. are reused. Hence, the key to modernization is software reuse. However, software reuse has its own challenges.

2.6 Software Reuse

Software reuse has been associated with software engineering since its early days: the NATO Software Engineering Conference in 1968 marked the birth of the idea of systematic software reuse as well as the formal birth of the field of software engineering. This section describes definitions of software reuse, software reuse challenges, software reuse benefits, software reuse as web services, categories of software reuse and software reuse process models.

2.6.1 Software Reuse Definitions

Software reuse is defined as the process of building or assembling software applications and systems from previously developed software [87]. McIlroy first introduced the idea of software reuse as the planned development and widespread use of software components in 1968 [87]. Morisio et al. defined reuse as a systematic practice of developing software from a stock of building blocks, so that similarities in requirements and/or architecture between applications can be exploited to achieve substantial benefits in productivity, quality and business performance [88]. Morisio et al. describe software reuse as an umbrella concept, encompassing a variety of approaches and situations, but exclude ad-hoc reuse, reuse of knowledge, and internal reuse within a project. Frakes et al. define software reuse as “the use of existing software knowledge or artefacts to build new software artefacts”, a definition that includes reuse of software knowledge [89]. Whether a component or asset can be reused depends on its reusability. Reusable components or assets can take several forms: a subroutine in a library, free-standing Commercial-Off-The-Shelf (COTS) or Open Source Software (OSS) components, modules in a domain-specific framework, or entire software architectures and their components forming a product line or a product family. Mili et al. define reusability as a combination of two characteristics [90]: I. usefulness, the extent to which an asset is often needed; and II. usability, the extent to which an asset is packaged for reuse.

Krueger [91] identified different types of reuse, which can be grouped as follows: I. code scavenging, that is, taking fragments from existing software systems and using them opportunistically as part of new software development; II. reuse of source code components, including functions, libraries, modules, packages, subsystems, and classes; III. reuse of software architecture, consisting of large-grain software frameworks and subsystems that capture the global structure of a software system design. Krueger describes software reuse as an approach to creating software systems from existing software rather than building them from scratch [91]. Software knowledge, such as domain knowledge or patterns, may be reused without reusing building blocks; such knowledge is captured in domain engineering. Krueger’s definition is closer to what is meant by “software reuse” in this thesis. Software can be developed with reuse, which means that existing software assets are reused during the development of a new software system. Software developed for reuse is new software built so that it can be reused in the future, as when objects in object-oriented programming are created for reuse in different applications. To understand how successful software reuse can be, we need to understand the challenges associated with it; these are described in the following section.

2.6.2 Software Reuse Challenges

Morisio et al. conducted structured interviews with the project managers of 32 Process Improvement Experiments funded by the European Commission to understand the attributes of software reuse [88]. The success of software reuse depends on many attributes. Some of the identified attributes are:

1. Top management commitment is a prerequisite for software reuse success; 2. Product family practice, common architecture and domain engineering increase reuse capability; 3. Size, development approach (object-oriented or not), rewards, repository and reuse measurements are not decisive factors, while training is; 4. Three other factors considered to be success factors are introducing a reuse process, modifying the non-reuse process, and human factors; 5. Successful cases tried to minimize change and retain their existing development approach, choosing reuse technology to fit it.

Morisio et al. [88] concluded that reuse practice varies and that it is important for software reuse to fit the specific context in which it is applied. Griss et al. wrote that software reuse needs [21]: 1. management support, since reuse involves more than one project; 2. domain stability and experienced software developers; 3. process understanding, so that object technologies or libraries can actually improve software reuse; and 4. incremental adoption of software reuse, so that reuse can grow module by module. These needs are related to the challenges of software reuse. As Morisio et al. state [88], software reuse needs to fit a specific context, and there has to be a development process in which reuse fits as a phase.

Frakes and Fox investigated 16 questions about software reuse using a survey of 29 organizations in 1991-1992 [89]. They report that most software engineers prefer to reuse rather than build from scratch. They did not find any evidence that the use of particular programming languages or CASE tools promotes reuse, and they found that the telecom industry has higher levels of reuse than some other fields.

Schmidt [92] found that systematic software reuse is most effective when the following prerequisites are met: 1. the market is competitive and time-to-market is crucial; 2. the application domain is complex, such as distributed real-time systems, where coding from scratch is too error-prone, costly and time-consuming; 3. the corporate culture, development processes and tools are supportive; 4. attractive “reuse magnets” exist, including technologies such as well-maintained frameworks and component repositories; and 5. there is strong leadership and empowerment of skilled architects and developers.

Software reuse is a long-term investment. In addition to saving operating costs, software reuse may support strategic options such as the realignment and refactoring of enterprise developments to meet critical business objectives, and it enables the outsourcing of certain development activities. Other strategic options are the opportunity to enter new markets and the flexibility to respond to competitive forces and changing market conditions. However, several challenges of software reuse must be overcome, such as identifying the preconditions for starting a reuse program, developing a software reuse process with its roles and steps, and adapting existing processes to make software reuse an integral part of the software development life cycle.

2.6.3 Software Reuse Benefits

The benefits of software reuse can be quite significant. Reuse programs within organizations like AT&T, GTE, Ericsson, NEC, Toshiba, and HP demonstrate that time to market can be improved by a factor of 1.5 to 2.5 or more. Software or firmware that used to take 24 months or longer to produce can take only 6 months or less to develop with reuse. Quality can be increased by 5-10 times, and the cost of development and maintenance can be reduced [21]. There are other benefits as well: reuse of standard modules can increase the interoperability and consistency of products. Reusable software artefacts are called reusable assets. Assets may be software designs, components, requirements, artefacts, test cases, architectures, models, design patterns, use cases, and business processes [93]. Appropriate components and supporting tools can enable a professional services organization to rapidly produce “almost custom solutions” cost-effectively [94]. Many software organizations around the world, such as IBM, Hewlett-Packard (HP), and Hitachi, have reported successful reuse programs [76]. This report shows that reuse works in practice, citing improved productivity, decreased time-to-market and/or reduced cost. For HP, a reuse program cut time to market for certain products by a factor of three or more, from 18 months to less than 5 months [77]. But reuse is not a silver bullet; this simple idea is quite complex in practice. It is not enough to gather interesting pieces of software into a library and offer them to people to reuse. Components have to be carefully designed and developed so that they are of high quality and work together. They have to be well documented so that those wanting to reuse them can understand them. Components have to be carefully chosen so that the extra investment will be repaid by significant reuse. This works best when reusing components between members of a product line or product family (or product domain).
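The time-to-market improvement factors quoted above are ratios of development time without reuse to development time with reuse. Applying that ratio to the two examples in this paragraph gives:

```latex
\text{improvement factor} = \frac{T_{\text{without reuse}}}{T_{\text{with reuse}}},\qquad
\frac{24\ \text{months}}{6\ \text{months}} = 4,\qquad
\frac{18\ \text{months}}{5\ \text{months}} = 3.6
```

Both examples therefore sit at the “or more” end of the 1.5 to 2.5 range, and because the HP products took less than 5 months, 3.6 is a lower bound for that case.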

Studies have shown that software reuse is a critical aspect for organizations interested in improving software development quality and productivity [95]. Quality could be improved by reusing all forms of proven experience, including products and processes, as well as quality and productivity models. Productivity could be increased by reusing existing experience rather than creating everything from scratch. Over the years, several research works, including company reports [96], [97], [98], [21], [99], informal research [100] and empirical studies [101], [25], [102], have shown that an effective way to obtain the benefits of software reuse is to adopt a reuse process. A likely reason why software reuse is more problematic than hardware reuse is that software assets are typically very information-rich, which makes it difficult to characterize them, match them, and capture their relevant properties.

An approach proposed by Krueger [103], called the extractive model, reuses one or more existing software products as a product line’s initial baseline. This approach can be effective for an organization that has accumulated development experience and artefacts in a domain but wants to transition quickly from conventional to software product line engineering. When accumulated expertise is used properly, this approach may not require a large capital investment. Software product lines refer to engineering techniques for creating a portfolio of similar software systems from a shared set of software assets using a common means of production. The characteristic that distinguishes software product lines from other forms of software reuse is predictive, rather than opportunistic, reuse. Rather than putting general software components into a library in the hope that opportunities for reuse will arise, software product lines call for software artefacts to be created only when reuse is predicted in one or more products in a well-defined product line.
However, even today, with the idea of software product lines, there is still no clear software reuse process. There is no clear consensus on the inputs, outputs, software artefacts and requirements that a software reuse process must have. The benefits of software reuse should be quantified and empirically assessed.

2.6.4 Example Approach to Software Reuse as Web Services

One area of software reuse, particularly the reuse of software functionality, is making existing software available as web services or using it in other web environments. This approach has been used in the modernization of legacy systems. Web services are an industry-wide effort for service description and discovery. Put simply, Web services define a “standardized mechanism to describe, locate, and communicate with online applications” [104]. Sneed and Sneed [105] outline an approach for creating web services by reusing existing systems. Connecting existing systems to the Web means linking client programs on the Web site with server programs. These server programs were not conceived to run in an Internet mode; they are either online transactions or batch steps. Zhou and Kontogiannis [106] outline an approach to migrating legacy applications by identifying the major legacy components, reusing these procedural components in an object-oriented design, specifying their interfaces, automatically generating wrappers, and making them interoperate seamlessly via HTTP using Simple Object Access Protocol (SOAP) messaging. Woodside, Zheng and Litoiu [107] outline issues, such as performance and scalability, related to the reuse and migration of legacy applications to web services. O’Brien and Smith [108] have emphasized the need for software architecture reconstruction as a decision-making tool to identify and document component dependencies and so better understand the potential for component reuse within a legacy system. A Service-Oriented Architecture (SOA) promises to increase the reuse of software components, thus increasing programmers’ and testers’ productivity. SOA is increasingly used in enterprise settings on a larger scale. The main focus of SOA lies in information integration, service provisioning to internal and external customers, and internal software reuse.
In a survey from November 2006 [109], 90% of all participants regarded reuse as the most important driver for investment in a service-oriented architecture. At the same time, more than 50% of all participants that had already started building a SOA were not experiencing software component reuse. This indicates a gap between the opportunities that service orientation is perceived to promise and the experience of actually implementing a SOA.

2.6.5 Categories of Software Reuse

Software reuse can occur at several levels of granularity, such as single lines of code, functions or procedures, modules, components, packages, subsystems, software architectures, and entire systems. According to Tomar et al. [110], software reuse can be categorized as follows: 1. Opportunistic reuse: a team realizes that there are existing components that they can reuse. Opportunistic reuse can be categorized further into: i. internal reuse, where a team reuses its own components, and ii. external reuse, where a team chooses to use third-party or Commercial-Off-The-Shelf components. 2. Planned systematic reuse: a team strategically designs components so that they can be reused. Systematic software reuse and the reuse of components influence almost the whole software engineering process (independent of what a component is). Software engineering process models were developed to provide guidance in the creation of high-quality software systems by development teams at predictable costs. The original models were based on the misconception that systems are built from scratch according to stable requirements. Software process models have since been adapted based on experience, and several changes and improvements have been suggested since the classic waterfall model. 3. Horizontal reuse: software components are used across a wide variety of applications. In terms of code assets, this includes the typically envisioned library of components, such as a linked list class, string manipulation routines, or graphical user interface (GUI) functions. Horizontal reuse can also refer to the use of commercial off-the-shelf (COTS) or third-party applications within a larger system, such as an email package or a word processing program. A variety of software libraries and repositories containing this type of code and documentation exist today at various locations on the Internet.

4. Vertical reuse: vertical reuse, largely untapped by the software community at large but potentially very useful, has far-reaching implications for current and future software development efforts. The basic idea is the reuse of system functional areas, or domains, that can be used by a family of systems with similar functionality [111]. The study and application of this idea has spawned another engineering discipline, called domain engineering. Domain engineering is a comprehensive, iterative, life-cycle process that an organization uses to pursue strategic business objectives. It increases the productivity of application engineering projects through the standardization of a product family and an associated production process [111].

Within the reuse community, several approaches to reuse have emerged. Software developers can select different policies, processes, and technologies to match a reuse program to the business goals and capabilities of an organization. Some of the lessons learned have been summarized in a variety of Reuse Maturity Models [112, 113], suggesting that the benefits, scope, and formality of a reuse program grow as more experience is gained. There are three basic modes in which software asset reuse can be practiced in an organization; there is also the question of how broadly across the organization a practice should be applied. 1. Facilitated reuse: the organization encourages and supports reuse with limited resources, infrastructure, and policies to make reuse easier. Tools and policies are established to make it easier to submit, publish, find and use reusable assets across the organization. Typical tools and technology are a self-use repository or Web site, to which software developers may add some metadata to improve search and evaluation. Some support is provided to track usage and to inform software developers of changes to the assets they have retrieved. Some incentives are employed to encourage participation, but very little extra effort is spent on validating that the assets are the best ones to reuse and that they are in fact reused. The typical reuse level for this mode is 5%-15% [22]. 2. Managed reuse: the organization enforces reuse practice through policies, resources, tools, and people. Tools and technology used for managed reuse include metadata, project source code, change notification, asset quality assurance, and utilization measurements. There is no standard approach for this type of reuse across organizations. The typical reuse level for managed reuse is 15%-50% [22]. 3. Designed reuse: the organization invests in carefully designing assets for reuse. Assets are architected or reengineered to fit together. In this mode, proactive measures are taken to ensure that assets are developed to meet specific needs. Architecture and domain engineering techniques are used to determine which assets should be developed to cover more of the application space. “Design for reuse” guidelines, standards, and additional reviews ensure that high-quality, compatible assets are produced. This strategy works best when developing for a specific application family or product line within a relatively stable domain. The typical reuse level is 40%-90% [22]. This type of reuse is adopted in product lines.
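The reuse levels above are percentages, so a reuse level is commonly measured as the share of a system that comes from reused assets. The sketch below assumes a single size measure (for example lines of code or component counts, used consistently for both arguments) and compares the result with the typical ranges reported in [22]. Because those ranges overlap (managed 15%-50%, designed 40%-90%), a given level can be consistent with more than one mode, so a reuse level alone cannot tell which mode an organization practices.

```python
# Typical reuse levels for the three modes, as reported in [22] (fractions).
TYPICAL_REUSE_LEVELS = {
    "facilitated": (0.05, 0.15),
    "managed": (0.15, 0.50),
    "designed": (0.40, 0.90),
}


def reuse_level(reused_size: float, total_size: float) -> float:
    """Fraction of a system's size that comes from reused assets.

    The size measure (lines of code, components, ...) is an assumption;
    both arguments must use the same one.
    """
    if total_size <= 0 or not 0 <= reused_size <= total_size:
        raise ValueError("need total_size > 0 and 0 <= reused_size <= total_size")
    return reused_size / total_size


def consistent_modes(level: float) -> list[str]:
    """Modes whose typical range contains `level`.

    The ranges overlap, so more than one mode can match, and a level
    outside every range matches none.
    """
    return [mode for mode, (low, high) in TYPICAL_REUSE_LEVELS.items()
            if low <= level <= high]


level = reuse_level(45_000, 100_000)  # e.g. 45,000 of 100,000 lines reused
print(f"{level:.0%}", consistent_modes(level))  # 45% ['managed', 'designed']
```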

2.6.6 Software Reuse Process Models

Component-based development (CBD) allows the reuse of large and highly abstract enterprise components, so fewer components are needed to build an application. This reduces the assembly and integration effort required when constructing large applications [71]. McClure identifies several properties a software component is expected to have to be reusable [71]. These properties include a set of common functionality, a well-defined interface that hides implementation details, the ability to inter-operate with other components, and the ability to be reused in several different software applications and systems. With increasing reuse of software, new models for software engineering are emerging, as discussed in Chapter 1. These new models are based on systematic reuse of well-defined components that have been developed in various projects [32]. Neighbors proposed a reuse process model called Draco [114]. The main ideas introduced by Draco include domain analysis, domain-specific languages, and components that are used as sets of transformations. Neighbors’ contribution to domain analysis and transformation is notable. However, his approach is very difficult to apply in an industrial environment due to the complexity of activities such as writing transformations and using the Draco machine [115]. Even with some advances related to his work [115], many of these problems remain unsolved. Thus, although Draco presented a potential way to develop reusable software, the software reuse community and industry needed major guidance to achieve software reuse in an effective way. The Software Technology for Adaptable, Reliable Systems (STARS) program [116] developed the Conceptual Framework for Reuse Processes (CFRP). CFRP applies a domain-specific, reuse-based software engineering paradigm. CFRP established a framework for considering reuse-related software engineering processes, but by itself it was a very generic framework.
CFRP was preceded by the Reuse Oriented Software Evolution (ROSE) process model [116]. The goal of the ROSE process model is to capture, organize and represent knowledge about a domain and produce reusable assets that can be applied to produce a family of systems encompassing that domain. ROSE focused on domain analysis. Jacobson, Griss and Jonsson created the Reuse-driven Software Engineering Business (RSEB) process model [117]. RSEB is a use-case-driven systematic reuse process based on UML notation. The method was designed to facilitate the development of reusable object-oriented software. Key ideas in RSEB are the explicit modelling of variability and the maintenance of traceability links connecting representations of variability throughout all models; that is, variability present in use cases can be traced to variability in the analysis, design, and implementation object models. However, the method does not describe a systematic way to perform the asset development it proposes. Griss et al. developed FeatuRSEB [118] to address limitations of RSEB. FeatuRSEB extends RSEB in two ways: first, the activity corresponding to domain analysis is extended with feature modelling and domain scoping; second, feature models are used as the main representation of commonality, variability, and dependencies. Although FeatuRSEB presented important considerations related to domain analysis, such as extracting functional features, limitations in its domain scoping and feature modelling were not resolved. Kang et al. developed the Feature-Oriented Reuse Method (FORM) [119]. The core of FORM lies in the analysis of domain features and the use of these features to develop reusable domain artefacts. The domain architecture, which is used as a reference model for creating architectures for different systems, is defined in terms of a set of models, each representing the architecture at a different level of abstraction.
Nevertheless, aspects such as the specification, design, implementation and packaging of components are little explored in FORM. Several reuse processes have been proposed, but none of them is part of the Software Development Life Cycle (SDLC). Developing software with reuse requires the software reuse process to be part of the SDLC. It also requires planning for reuse, developing for reuse and with reuse, and providing documentation for reuse. The priority of documentation in software projects has traditionally been low [83]. However, proper documentation is a necessity for the systematic reuse of components. If we continue to neglect documentation, we will not be able to increase productivity through the reuse of components. Detailed information about components is indispensable.

2.7 Product Families and Software Reuse

Many organizations are using a product family engineering approach for software development, exploiting commonalities between software systems and reusing a software architecture and a set of core assets. Product family engineering is reuse at the largest level of granularity [120]. The terms product family engineering, product line engineering, system family engineering and application family engineering are used for a wide range of approaches to developing a set of products with reuse. The main idea is to increase the granularity of the reused parts and define a common architecture for a family of systems. The use of product line terminology is sometimes confusing. van der Linden explains the confusion as follows: “certain European companies use product line to indicate a set of related, commercial products that appear similar to users but often are built with different technologies. For example, product lines in consumer electronics include mobile phones, televisions, VCRs, DVD players, audio receivers, CD players, audio amplifiers and so on. Often products in the same product line are in different product families and vice versa” [121]. In European usage, a product family is a set of software products built using the same technology, which corresponds to a product line in the USA.

Parnas wrote the first paper on the development of systems with common properties in 1976. He wrote: “We consider a set of programs to constitute a family, whenever it is worthwhile to study programs from the set by first studying the common properties of the set and then determining the special properties of the individual family members” [122]. The Carnegie Mellon Software Engineering Institute (SEI) has conducted research on product families for several years and has published technical reports, survey results and case studies of companies that have a product family [123]. The SEI defines a software product family/product line as “a set of software-intensive systems sharing a common, managed set of features that satisfy the specific needs of a particular market segment or mission, and that are developed from a common set of core assets in a prescribed way”. When an organization produces multiple similar systems and reuses their artefacts, it enjoys substantial benefits. The essence of a software product line is the disciplined, strategic reuse of assets in producing a family of products. The potential for reuse is broad and far-ranging, including requirements, architectural design, elements, modelling and analysis, testing, etc. Software product lines can be described in terms of four simple concepts [103], as follows:

• Software asset inputs: a collection of software assets, such as requirements, source code components, test cases, architecture, and documentation, that can be configured and composed in different ways to create all of the products in a product line. Each asset has a well-defined role within a common architecture for the product line. To accommodate variation among the products, some of the assets may be optional and some may have internal variation points that can be configured in different ways to provide different behavior [103].
• Decision model and product decisions: the decision model describes optional and variable features for the products in the product line. Each product in the product line is uniquely defined by its product decisions, that is, the choices made for each of the optional and variable features in the decision model.
• Production mechanism and process: the means for composing and configuring products from the software asset inputs. Product decisions are used during production to determine which software asset inputs to use and how to configure the variation points within those assets.
• Software product outputs: the collection of all products that can be produced for the product line. The scope of the product line is determined by the set of software product outputs that can be produced from the software assets and decision model.
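The four concepts can be illustrated with a small sketch. The asset and feature names are hypothetical, and tying each asset to at most one feature is a simplification made for this example; real variation mechanisms (build configuration, preprocessors, generators, and so on) are far richer. The decision model lists optional features (included or not) and variable features (one value from a set); a set of product decisions must make a valid choice for every feature; the production mechanism selects assets and configures their variation points; and enumerating every valid set of decisions yields the software product outputs, that is, the scope of the product line.

```python
from dataclasses import dataclass, field
from itertools import product as cartesian
from typing import Optional


@dataclass
class Asset:
    """A software asset input. `feature` ties it to the decision model (None = common)."""
    name: str
    feature: Optional[str] = None
    variants: dict[str, str] = field(default_factory=dict)  # variation point


@dataclass
class DecisionModel:
    optional: set[str]             # features that are either included or not
    variable: dict[str, set[str]]  # feature -> allowed values

    def check(self, decisions: dict) -> None:
        """Product decisions must make a valid choice for every feature."""
        expected = self.optional | set(self.variable)
        if set(decisions) != expected:
            raise ValueError(f"decisions must cover exactly {sorted(expected)}")
        for feature in self.optional:
            if not isinstance(decisions[feature], bool):
                raise ValueError(f"{feature} must be True or False")
        for feature, allowed in self.variable.items():
            if decisions[feature] not in allowed:
                raise ValueError(f"{decisions[feature]!r} is not allowed for {feature}")


def produce(assets: list[Asset], model: DecisionModel, decisions: dict) -> dict[str, str]:
    """Production mechanism: select assets and configure their variation points."""
    model.check(decisions)
    result: dict[str, str] = {}
    for asset in assets:
        if asset.feature in model.optional and not decisions[asset.feature]:
            continue  # optional asset not chosen for this product
        if asset.feature in model.variable:
            result[asset.name] = asset.variants[decisions[asset.feature]]
        else:
            result[asset.name] = "included"
    return result


def all_decisions(model: DecisionModel) -> list[dict]:
    """Every valid set of product decisions; each one defines a product output."""
    optional = sorted(model.optional)
    variable = sorted(model.variable)
    choices = [[False, True] for _ in optional] + [sorted(model.variable[f]) for f in variable]
    return [dict(zip(optional + variable, combo)) for combo in cartesian(*choices)]


# Hypothetical product line: a common core, an optional reporting asset,
# and a user interface with two variants.
assets = [
    Asset("core"),
    Asset("reporting", feature="reporting"),
    Asset("ui", feature="ui", variants={"web": "web UI", "desktop": "desktop UI"}),
]
model = DecisionModel(optional={"reporting"}, variable={"ui": {"web", "desktop"}})
print(produce(assets, model, {"reporting": False, "ui": "web"}))
# {'core': 'included', 'ui': 'web UI'}
print(len(all_decisions(model)))  # 4 products in scope
```

Commonality lives in the assets without a feature, and all variation is made explicit in `DecisionModel`, which mirrors the two objectives listed below: capitalizing on commonality and managing variation.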

These concepts illustrate the key objectives of software product lines: to capitalize on commonality and manage variation in order to reduce the time, effort, cost and complexity of creating and maintaining a product line of similar software systems.

• Capitalize on commonality through consolidation and sharing within the software asset inputs, thereby avoiding duplication and divergence.
• Manage variation by clearly defining the variation points and decision model, thereby making the location, rationale, and dependencies for variation explicit.

Software product line development approaches provide a shift in perspective, so that development organizations can engineer their entire portfolio as though it were a single system (the production line) rather than a multitude of products.

The objective of a software product line is to reduce the overall engineering effort required to produce a collection of similar systems by capitalizing on the commonality among the systems and by formally managing the variation among them. This is a classic software reuse problem because the types of artefacts that can be reused are not limited to source code fragments but may include design structures, module-level implementation structures, specifications, documentation, transformations, and so on [103]. The primary focus of software product line research has been on domain analysis and modeling, architecture modeling, software reuse repositories, generators, and process definition [124, 125]. There are several reasons why these have occupied the central focus in software product lines. First, since software product lines are harder to engineer than single software systems, the most rigorous and advanced software engineering techniques are the most likely candidates to apply to the problem. Second, domain and architecture models can represent abstractions for a software family, an essential feature of any software reuse technology [91]. Finally, the software engineering processes for building software product lines can be quite different from those for building single systems. A systematic software reuse process can succinctly address these differences.

2.8 Software Architecture as a Catalyst for Software Reuse

Software architecture lays the foundation for applications to adhere to their non-functional quality attributes [27]. Therefore, software architecture also has to be considered in all kinds of reuse activities. Reusable software artefacts need to be chosen not only for their functionality but also for their architectural compatibility with the new system being built or considered [126]. Software architecture is therefore a key to reuse, yet it has been missing from all the software reuse approaches discussed so far. The software architecture of a program or computing system is the structure or structures of the system, which comprise software elements, the externally visible properties of those elements, and the relationships among them [27]. An architecture has different stakeholders with different concerns. Architectural representations enable software developers to explicitly describe, access and manage the architecture of software systems. An architecture representation consists of structural and non-structural information about the software architecture. The structural information consists of the components and connectors describing the configuration of a system, and the non-structural information consists of architectural properties [127]. Architecture bridges the gap between the requirements and the implementation of the system. Software architecture is a very important artefact, as it is a key to the understanding, analysis, reusability, evolution and management of legacy systems. The software architecture is the result of a set of business and technical decisions. Developing a software system architecture represents a large investment of resources from the organization’s most talented software developers. The architecture can be used for understanding the requirements. The creation of an architecture supports the identification of possible reuse opportunities. For example, the identification of the architecturally significant components and their associated interfaces and qualities supports the selection of off-the-shelf components, existing systems, packaged applications, and so on, that may be used to implement these components. The architecture itself may also prove to be reusable as a reference architecture for subsequent systems. Even components within the architecture may be deemed potentially reusable in other contexts. Although the process of architecting can identify reuse opportunities within the current project, there is a much greater impact when reuse is considered across projects and across the enterprise.

An important benefit of architecting is that it allows us to reason about the impact of making a change before it is undertaken. An architecture identifies the major components and their interactions, the dependencies between components, and traceability from these components to the requirements that they realize. Given this information, a change to a requirement, for example, can be analyzed in terms of the impact on the components that collaborate to realize this requirement. Similarly, the impact of changing a component can be analyzed in terms of the other components that depend upon it. Such analyses can greatly assist in determining the cost of a change, the impact that a change has on the system, and the risk associated with making the change. This information is then used when prioritizing changes and negotiating the changes that are absolutely necessary. Because of the fundamental role of software architecture both for adhering to non-functional requirements and for software reuse, the architecture of a successful software system is itself an important subject of reuse.
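The change-impact reasoning described above can be sketched as a reachability computation over the component dependency graph: a component is potentially affected by a change if it depends, directly or transitively, on the changed component. The component and requirement names below are hypothetical, and the sketch assumes the dependencies and requirement-to-component traceability links are already known (for example, from a reconstructed architecture); it yields only the candidate impact set, not the cost or risk of the change.

```python
from collections import deque


def impact_set(depends_on: dict[str, set[str]], changed: str) -> set[str]:
    """Components that may be affected, directly or transitively, by a change.

    `depends_on[c]` is the set of components that component c uses; the
    result contains every component that reaches `changed` through uses.
    """
    # Invert the graph so that we can walk from a component to its dependents.
    dependents: dict[str, set[str]] = {}
    for component, used in depends_on.items():
        for target in used:
            dependents.setdefault(target, set()).add(component)

    affected: set[str] = set()
    queue = deque([changed])
    while queue:  # breadth-first search over reverse dependencies
        current = queue.popleft()
        for user in dependents.get(current, set()):
            if user != changed and user not in affected:
                affected.add(user)
                queue.append(user)
    return affected


def requirement_impact(realized_by: dict[str, set[str]],
                       depends_on: dict[str, set[str]],
                       requirement: str) -> set[str]:
    """Components realizing a requirement, plus everything that depends on them."""
    direct = set(realized_by.get(requirement, set()))
    result = set(direct)
    for component in direct:
        result |= impact_set(depends_on, component)
    return result


# Hypothetical recovered architecture and traceability links.
depends_on = {
    "ui": {"orders"},
    "orders": {"billing", "db"},
    "billing": {"db"},
    "reports": {"db"},
}
realized_by = {"R1": {"billing"}}
print(sorted(impact_set(depends_on, "db")))  # ['billing', 'orders', 'reports', 'ui']
print(sorted(requirement_impact(realized_by, depends_on, "R1")))  # ['billing', 'orders', 'ui']
```

Walking the inverted graph breadth-first visits each component at most once, so cyclic dependencies (common in eroded legacy systems) do not cause the analysis to loop.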
Software architecture is very important for the understanding, analysis, reusability, evolution and management of legacy systems. The software architecture of a legacy system needs to be reconstructed to determine whether the running system conforms to the existing documentation. Architecture reconstruction is the process of identifying and extracting higher-level abstractions from existing software systems. Architecture reconstruction is critical for handling the legacy code of large and complex systems. Architecture reconstruction deals with the issue of reconstructing the past design decisions that were taken by experts during the development of a system. These decisions have been lost for various reasons: they were never documented, the documentation went through revisions, the developers have left, or the reasons are simply unknown [27].

Software architecture reconstruction provides important leverage for the effective reuse of software assets. The ability to identify the architecture of an existing system that has successfully met its quality goals fosters reuse of that architecture in systems with similar goals; hence architectural reuse is at the cutting edge of software reuse. The full benefit of software reuse can only be achieved by systematic software reuse, which should be conducted formally as an integral part of the software development cycle by constructing and applying multi-use artefacts such as architectures, patterns, components, and frameworks. Although it is a potentially powerful technology, systematic software reuse is still uncommon in the corporate world. Even though several major corporations have adopted systematic reuse programs, and despite the advances in reuse-enabling technology, systematic reuse is still a difficult goal to accomplish in practice. Therefore, the search for tools and technologies to promote successful and productive reuse is still an active area of research.

2.8.1 Software Architecture Reconstruction

Software architecture reconstruction for legacy systems is motivated by the fact that these systems often do not have architecture documentation, and when they do, this documentation is in many cases out of date and out of synchronization with the implemented system. The recovery process can be assisted by the different tools available, such as Dali [48], PBS [128], Imagix4D [129], Bauhaus [130], and ARMin [108]. No single tool can perform all the tasks required for architecture recovery. Studies show that between 50% and 90% of software maintenance effort involves understanding the software being maintained [131]. One major reason for these high costs is architecture erosion, which causes the maintainability of the software system to deteriorate. Software architects have to understand, analyze, and reason about the as-built software architecture of a system in order to modernize it [7]. Software architecture reconstruction can support the understanding of existing systems. It allows software architects to form increasingly abstract models of a system, and the resulting artefacts from software architecture reconstruction can be used to analyze the quality-driven requirements that arise from the demands of software modernization.
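One core step of reconstruction, lifting low-level facts to higher-level abstractions and checking them against the documented architecture, can be sketched as follows. This is a simplified illustration in the spirit of the reflexion-model technique (convergences, divergences, absences), not the procedure used by Dali, PBS, Imagix4D, Bauhaus or ARMin. It assumes that file-level dependencies have already been extracted by some tool and that files are mapped to components by their top-level directory; the file names, components and documented dependencies are all hypothetical.

```python
def lift(file_deps: list[tuple[str, str]], file_to_component: dict[str, str]) -> set[tuple[str, str]]:
    """Aggregate file-level dependencies into component-level dependencies."""
    lifted = set()
    for src, dst in file_deps:
        a, b = file_to_component.get(src), file_to_component.get(dst)
        if a and b and a != b:  # ignore unmapped files and intra-component edges
            lifted.add((a, b))
    return lifted


def compare(extracted: set, documented: set) -> dict[str, set]:
    """Contrast the as-built (extracted) view with the documented architecture."""
    return {
        "convergences": extracted & documented,  # in the code and in the documentation
        "divergences": extracted - documented,   # in the code but not documented
        "absences": documented - extracted,      # documented but not found in the code
    }


# Hypothetical facts, as a source-code extractor might report them.
file_deps = [
    ("ui/order_form.c", "logic/orders.c"),
    ("ui/order_form.c", "db/customer.c"),   # the UI bypasses the logic layer
    ("logic/orders.c", "db/customer.c"),
    ("logic/orders.c", "logic/pricing.c"),  # intra-component, dropped when lifting
]
# Hypothetical mapping: the top-level directory names the component.
file_to_component = {f: f.split("/")[0] for edge in file_deps for f in edge}
documented = {("ui", "logic"), ("logic", "db"), ("logic", "audit")}

result = compare(lift(file_deps, file_to_component), documented)
for kind, edges in result.items():
    print(kind, sorted(edges))
# convergences [('logic', 'db'), ('ui', 'logic')]
# divergences [('ui', 'db')]
# absences [('logic', 'audit')]
```

Divergences such as the UI calling the database directly are typical symptoms of architecture erosion, while absences point to documentation that no longer matches the running system.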

2.9 Knowledge Based Software Reuse

Systematic reuse is generally recognized as a key technology for improving software productivity and quality [132]. It is widely accepted that software reuse is a major component of many software productivity improvement efforts, because it can result in higher quality software at a lower cost, delivered within a shorter time period [132]. Most software assets, including source code, test cases, and design models, are complex and are not plain text documents (for example, different files have different extensions and cannot be opened as text documents), which makes searching for these components in a repository harder than searching for text documents [133]. One way to make a software asset easier to find and retrieve from the repository is the use of metadata. Metadata is simply data describing data. In the case of software reuse, metadata is a representation that describes a software asset from various aspects, including how to use it and how it relates to other assets, which helps in locating an asset and determining whether it is suitable to be used. Effective reuse depends not only on finding and reusing assets, but also on the ways in which the reusable components are combined [134], [13]. This means reuse of the control structure of the application. Architecture-based reuse extends the definition of reusable assets to include these properties and relationships. Shaw [134] classifies software architectures into common architecture styles, where every style has four major elements: components; connectors that mediate interactions among components; a control structure that governs execution, together with rules about other properties of the system; and a system model that captures the intuition about how the previous elements are integrated. Some of the popular architecture styles are pipeline, data abstraction, implicit invocation, repository, and layered [134]. Applying a combination of architecture styles creates architectural patterns [24].
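To make Shaw's four elements concrete for the pipeline style, the sketch below uses Python generators. Each filter is a component, the generator hand-off plays the role of the pipe connector, the control structure is data-driven (each filter pulls records from its predecessor on demand), and the system model is a linear chain. The filter names and data are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Iterator

# A component in this style: consumes a stream of records, produces a stream.
Filter = Callable[[Iterable[str]], Iterator[str]]


# Components: each filter knows nothing about its neighbours.
def strip_blank(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        if line.strip():
            yield line


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        yield line.upper()


def number(lines: Iterable[str]) -> Iterator[str]:
    for i, line in enumerate(lines, start=1):
        yield f"{i}: {line}"


# Connector and control structure: chain the filters so that each one lazily
# pulls records from its predecessor (data-driven execution).
def pipeline(source: Iterable[str], *filters: Filter) -> Iterator[str]:
    stream: Iterable[str] = source
    for f in filters:
        stream = f(stream)
    return iter(stream)


records = ["alpha", "", "beta", "   ", "gamma"]
print(list(pipeline(records, strip_blank, to_upper, number)))
# ['1: ALPHA', '2: BETA', '3: GAMMA']
```

Because the filters share only the stream interface, they can be reordered or reused in other pipelines without modification, which is the property that makes a style like this attractive for architecture-based reuse.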
An architectural pattern is a high-level structure for software systems that contains a set of predefined sub-systems, defines the responsibilities of each sub-system, and details the relationships between sub-systems. Layers, Pipes and Filters, and Blackboard are some of the common patterns described by Buschmann et al. [135]. Software systems in the same domain have similar architectural patterns that can be described with a generic architecture (domain architecture). The domain architecture is then refined to fit an individual application in the domain, creating an application architecture [23], [24]. Domain engineering, the use of domain-specific knowledge and artefacts to support the development and evolution of systems, has become a major aspect of disciplined software reuse. Most organizations work in only a small number of domains. For each domain they build a family of systems that vary according to specific customer needs. Identifying the common features of existing systems within a particular domain, and using these features as a common base to build a new system in the same domain, may result in higher efficiency and productivity. Domain engineering has two stages: domain analysis and domain implementation. Domain analysis is the process of examining the related systems in a domain to identify their commonalities and variabilities. Domain implementation is the use of that information to develop reusable assets based on the domain commonalities, and the use of these assets to build new systems within that domain. There are several areas in which knowledge bases can be used to support the software development process. These areas include supporting the expert nature of software design and coding, facilitating the reuse of software components and artefacts, and providing domain knowledge to support software reuse and the implementation effort [136]. Our survey of the literature [137], [138], [139] shows that there are several software reuse processes. Bauhaus [137] is a knowledge-based software parts composition system shell.
It has a knowledge base of reusable software component descriptions, a catalogue for browsing and editing the knowledge base, a composition editor for component specification, and a code generator for composed or tailored components. LaSSIE [138] is a knowledge-based software information system. It has a knowledge representation for software objects and their relations, and it provides functions to query and browse software objects. The Software Components Catalogue [139] is another knowledge-based system for software reuse. It is an integrated component classification and retrieval system. It uses a conceptual dependency database describing software components and their relations, and matches users' requests for software components with descriptions of components that satisfy these requests. Finally, it provides a natural language interface for specifying user requests. In summary, the core of these knowledge-based software reuse systems consists of a knowledge representation of software components and a reasoning mechanism to search for and match user requests for software components. The advantages of using a knowledge-based approach include: aggregation of information about individual components, semantic retrieval, use of classification and inheritance to support updates, and use of the knowledge base as an index. The success of knowledge-based approaches to software reuse naturally leads to their use in software reuse processes. However, reuse of software artefacts is different from reuse of source code components only. Software artefacts include all the documentation, designs, design patterns, test cases, and everything else, including source code, produced during the development of the software. Reusable artefacts, the objects of reusability, can be any information which software engineers may need in the process of creating software. This includes any of the following:
• code fragments, which come in the form of source code, Program Description Language, or various charts;
• logical program structures, such as modules, interfaces, or data structures;
• functional structures, e.g. specifications of functions and their collections;
• domain knowledge, i.e. scientific laws, models of knowledge domains;
• knowledge of the development process, in the form of life-cycle models;
• environment-level information, e.g. experiential data or users' feedback;
• artefact transformations during the development process [140]; etc.
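The common core of the knowledge-based systems described above, a knowledge representation of components plus a reasoning mechanism that matches requests, can be illustrated with a small is-a taxonomy. This is a hypothetical sketch, not the design of Bauhaus, LaSSIE or the Software Components Catalogue: the concepts, properties and component names are invented, and the "reasoning" is limited to classification (a request for a concept also matches its sub-concepts) and property inheritance.

```python
# Knowledge representation: an is-a taxonomy of concepts (child -> parent).
IS_A = {
    "Sorter": "Algorithm",
    "MergeSort": "Sorter",
    "QuickSort": "Sorter",
    "Parser": "Algorithm",
}
# Properties attached to concepts; sub-concepts inherit them.
CONCEPT_PROPS = {
    "Algorithm": {"language": "C"},
    "MergeSort": {"stable": True},
    "QuickSort": {"stable": False},
}
# Component descriptions: each component is classified under one concept.
COMPONENTS = {
    "msort_v2": ("MergeSort", {"max_records": 10**6}),
    "qsort_fast": ("QuickSort", {"max_records": 10**7}),
    "csv_reader": ("Parser", {}),
}


def ancestors(concept):
    """The concept itself followed by its super-concepts up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = IS_A.get(concept)
    return chain


def properties(component):
    """Inherited concept properties, overridden by the component's own facts."""
    concept, own = COMPONENTS[component]
    props = {}
    for c in reversed(ancestors(concept)):  # root first, so specific concepts override
        props.update(CONCEPT_PROPS.get(c, {}))
    props.update(own)
    return props


def retrieve(requested_concept, **required):
    """Semantic retrieval: components under the requested concept whose properties match."""
    hits = []
    for name, (concept, _) in COMPONENTS.items():
        if requested_concept in ancestors(concept):
            props = properties(name)
            if all(props.get(k) == v for k, v in required.items()):
                hits.append(name)
    return sorted(hits)


print(retrieve("Sorter"))                   # ['msort_v2', 'qsort_fast']
print(retrieve("Sorter", stable=True))      # ['msort_v2']
print(retrieve("Algorithm", language="C"))  # ['csv_reader', 'msort_v2', 'qsort_fast']
```

Because properties are attached to concepts, updating a concept (for example, the language recorded for `Algorithm`) updates every component classified under it, which is the maintenance benefit of classification and inheritance mentioned above.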

A controlled collection of software reuse artefacts constitutes a reuse library. Such libraries must not only contain reusable software artefacts but are also expected to provide certain types of services to their users [141], e.g. storage, searching, inspecting and retrieval of artefacts from different application domains and of varying granularity and abstraction; loading, linking and invoking of stored artefacts; specifying artefact relationships; etc. The major problem in the utilisation of such reuse libraries is finding the appropriate reusable artefact. Once implemented, a software system undergoes maintenance and becomes a legacy system. Over the years the original architecture of the software is lost. Because maintenance is an ongoing phase, software engineers often have only the executable files and the source code with which to maintain the system. Predefined source code is stored in the library. During software development many other software artefacts are developed, such as test cases, data flow diagrams (DFDs), entity-relationship diagrams (ERDs), use cases, etc. This is the actual knowledge developed during software development, yet this knowledge is not stored in any library. If this knowledge cannot be recovered and reused, software reuse will not reach its objective. So there is a need to store all the explicit and implicit knowledge developed during software development, or to recover it from the system much later in its life cycle, for example when considering modernization. There is a need for a repository to store a large collection of designed-for-reuse software components and other designed-for-reuse software artefacts.
There should be mechanisms to locate reusable software artefacts and components in the repository, adapt them (if necessary) and even create new ones, making use of the information provided by other similar reusable software components and reusable software artefacts. With the changing paradigm of software development, software reuse is required both for software development and for the modernization of legacy systems. The risks involved in redeveloping a system suggest that there is a need to reuse existing software artefacts, components, software assets, application requirements, source code, etc. This can only be facilitated by a systematic software reuse process with a corresponding reuse repository. This requires a paradigm shift in software development, where software reuse is treated as one of the phases of software development, alongside analysis, design, coding, implementation, and maintenance. However, while the concept of reusing software artefacts is very clear at the code level (whether in source or binary form), the very same concept becomes fuzzier and more difficult to grasp when discussed in the context of reusing software artefacts such as specifications and designs (whether in textual or diagrammatic form), and quite incomprehensible when applied to informal software requirements, domain knowledge or human skills and expertise (expressed in natural language, in a knowledge representation formalism, or existing only in people's minds). This problem of dealing with reusable software artefacts resulting from the earliest stages of software development, in particular requirements specifications, is what attracted our particular interest in software reuse. To make software reuse an integral phase of software development or of legacy system modernization, all reusable software artefacts, components, assets, etc. should be made easily available to software engineers [13].
This can be made possible only if a reuse repository is developed to store all of the knowledge base of reusable software artefacts, reusable components, previous software development experiences, etc.
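A reuse repository that stores more than source code can be sketched as follows. The class and method names (`ReuseRepository`, `store`, `relate`, `search`, `related`, `retrieve`) are hypothetical. The sketch assumes each artefact, whether a use case, a source file or a test case, is described by keyword metadata and that relationships between artefacts are recorded explicitly, covering the storage, searching, retrieval and relationship services listed for reuse libraries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Artefact:
    artefact_id: str
    kind: str  # e.g. "use case", "ERD", "source", "test case"
    title: str
    keywords: set[str] = field(default_factory=set)


class ReuseRepository:
    """Hypothetical reuse repository: storage, metadata search, relationships, retrieval."""

    def __init__(self) -> None:
        self._artefacts: dict[str, Artefact] = {}
        # artefact id -> {(relation, other artefact id)}
        self._links: dict[str, set[tuple[str, str]]] = {}

    def store(self, artefact: Artefact) -> None:
        self._artefacts[artefact.artefact_id] = artefact
        self._links.setdefault(artefact.artefact_id, set())

    def relate(self, src: str, relation: str, dst: str) -> None:
        # Both artefacts must already be stored; the link is recorded in both
        # directions so it can be navigated from either end.
        self._links[src].add((relation, dst))
        self._links[dst].add((f"inverse of {relation}", src))

    def search(self, *terms: str, kind: Optional[str] = None) -> list[str]:
        """Ids of artefacts whose keywords include every term (case-insensitive)."""
        wanted = {t.lower() for t in terms}
        return sorted(
            a.artefact_id
            for a in self._artefacts.values()
            if wanted <= {k.lower() for k in a.keywords}
            and (kind is None or a.kind == kind)
        )

    def related(self, artefact_id: str) -> list[tuple[str, str]]:
        return sorted(self._links.get(artefact_id, set()))

    def retrieve(self, artefact_id: str) -> Artefact:
        return self._artefacts[artefact_id]


repo = ReuseRepository()
repo.store(Artefact("UC-7", "use case", "Apply customer discount", {"discount", "order"}))
repo.store(Artefact("SRC-12", "source", "discount.cbl", {"discount", "pricing"}))
repo.store(Artefact("TC-31", "test case", "Discount boundary tests", {"discount", "test"}))
repo.relate("SRC-12", "implements", "UC-7")
repo.relate("TC-31", "verifies", "UC-7")

print(repo.search("discount"))                    # ['SRC-12', 'TC-31', 'UC-7']
print(repo.search("discount", kind="test case"))  # ['TC-31']
print(repo.related("UC-7"))
# [('inverse of implements', 'SRC-12'), ('inverse of verifies', 'TC-31')]
```

Starting from a recovered use case, an engineer can reach the code that implements it and the tests that verify it, which is the kind of non-code knowledge the text argues is otherwise lost.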

2.10 Summary of Modernization Approaches

We have presented several approaches to support legacy system modernization. Each approach presented has strengths, weaknesses, and tradeoffs between software reuse, software architecture reconstruction, flexibility, data, business logic, presentation, integration, and impact of code changes. Table 2.1 summarizes the discussion of each modernization approach in terms of strengths, weaknesses, software reuse and software architecture reconstruction.

Modernization approach | Strengths | Weaknesses | Software reuse | Software architecture reconstruction
Black-box modernization | Cost; time to market | Flexibility; limited impact on maintainability | Yes, but no individual components/modules extracted for reuse | No
White-box modernization | Maintainability | Cost; time to market | Yes, but no individual components/modules extracted for reuse | No
Screen scraping | Cost; time to market; Internet support | Flexibility; limited impact on maintainability | Yes, but no individual components/modules extracted for reuse | No
Database gateway | Cost; tool support | Limited impact on maintainability | Yes, but no individual components/modules extracted for reuse | No
XML integration | Flexibility; tool support (future); B2B | Tool support (present); evolving technology | Yes, but no individual components/modules extracted for reuse | No
Database replication | Performance; reliability | Data coherence; applicable to a very specific problem | Yes, but no individual components/modules extracted for reuse | No
Legacy integration using CGI | Cost; Internet support | Flexibility; applicability | Yes, but no individual components/modules extracted for reuse | No
Architecture-driven modernization | Cost; Internet support | Flexibility; applicability | Yes, transform current architecture to target architecture | No
COTS-based modernization | Cost; Internet support | Lack of tool support; applicability | Yes, but no individual components/modules extracted for reuse | No
OO wrapping | Flexibility; Internet support | Cost | Yes, but no individual components/modules extracted for reuse | No
Data wrapping | Flexibility; integrated services | Cost | Yes, but no individual components/modules extracted for reuse | No

Table 2.1: Comparison of Modernization approaches.


2.11 Chapter Summary

This chapter has presented our literature review on modernization and software reuse. It has discussed where the problem of modernization comes from, what is already known about the problem of legacy modernization, what the different modernization approaches are, and what is missing in the current approaches. It has also discussed the definitions and challenges of legacy modernization and of software reuse. In this chapter we reviewed the literature on legacy systems, legacy system modernization, software reuse and knowledge-based software reuse processes. We discussed different approaches to modernization, different definitions of software reuse, software reuse challenges, software reuse benefits, and software reuse as web services. Legacy systems are invaluable, but in many cases, because of maintenance, legacy systems have lost their ability to evolve. The answer to making a legacy system evolvable again is modernization. We discussed many modernization approaches. Business rules are embedded in legacy systems. These business rules should be reused to obtain the same functionality from the modernized system, enhance maintainability and improve reliability. The modernization approaches we have discussed follow either a Black-box approach or a White-box approach. A Black-box approach requires wrapping the stable legacy system, and a White-box approach requires understanding the legacy system. For a White-box approach, software reuse is the key. Software reuse is the process of creating or modernizing software from previously developed artefacts. The biggest challenge with software reuse is understanding software artefacts so that they can be reused. Software architecture is a very important concern for the analysis, reusability, evolution and modernization of legacy systems. Software architecture promotes software understandability.
Many of the problems of these legacy systems are due to the fact that the systems' architectures are virtually undocumented. Moreover, these architectures have degraded over time due to modifications or extensions that violate the initial architectural principles, resulting in overall chaos. If no software architecture is available for a legacy system, it can be reconstructed using software architecture reconstruction tools. Reusable software artefacts identified during the architecture reconstruction of the system can be saved into a knowledge base, and the knowledge base can then support software reuse.

Many software reuse processes exist, but there is no systematic software reuse process that can be incorporated into the software development life cycle (SDLC) and into modernization approaches. There is a need to develop a systematic software reuse process that can be integrated into the SDLC and modernization approaches so that the benefits of software reuse can be exploited.


CHAPTER 3: MODERNIZATION

3.1 Introduction

This chapter discusses software evolution dynamics, the different drivers of software modernization, our classification of existing modernization approaches, the shortfalls of existing modernization approaches, and software architecture and its role in legacy modernization. Organizations invest a lot of money in software systems and, to get a return on that investment, the software must be usable for a number of years. The lifetime of a software system varies greatly. Some organizations, such as banks and governments, still rely on software systems that are more than 20 years old. Many of these legacy systems are still business critical. Business critical means that the business relies on the services provided by the software system, and any failure of these services would have a serious effect on the day-to-day running of the organization. These legacy systems represent past capital investment by an organization, yet the value of the legacy system investment tends to decline over time. According to Lehman's first law [142], software must be continually adapted or it will become progressively less satisfactory in "real-world" environments. This is due to the continuous change of user requirements and of the technical environment during the software maintenance phase. Software maintenance is the last phase of the software development life cycle, and it is also considered an expensive activity. After the software product has been released, the maintenance phase keeps the software up to date with environment changes and changing user requirements. Maintenance can only happen efficiently if the earlier phases of the software development life cycle produce structured, object-oriented or component-based code, up-to-date documentation, sufficient knowledge of the existing system, etc. There are a number of major problems that can slow down the software maintenance process.
Some of these include unstructured code, maintenance programmers having insufficient knowledge of the system, documentation that is absent, out of date or at best insufficient, and software maintenance having a bad image. The worse the source program (unstructured code, no documentation, etc.), the higher the cost of its maintenance. If the control flow of the program is unclear and different parts of the software system are strongly coupled, then even a small change could have major side effects. For example, [143] quotes the results of a study conducted in a large company providing maintenance services. This study shows that even a one-line correction of a program has a 55% chance of introducing another bug. Thus, sooner or later, the cost of new changes becomes too high and the system no longer serves its purpose. Irrespective of the problems mentioned above, a software system must change or evolve according to user requirements and changes in the technical environment. Software evolution is inevitable because new requirements emerge when the software is used, the business environment changes, errors must be repaired, new hardware needs to be added to the system, or quality attributes such as maintainability, performance or reliability need to be improved. Legacy modernization aims to retain and extend the value of the legacy system investment [30]. Legacy system modernization may also make it more feasible to evolve a legacy system and integrate it with other systems. Legacy system modernization can address the issues of software maintenance and evolution. Therefore, modernizing a legacy system, instead of completely rewriting or replacing it, can be expected to be a potentially desirable option in many cases. The suitability of legacy systems for modernization can be justified based on our discussion in Chapter 2. Legacy systems can be categorized into four different types:
1. High quality, low business value: no modernization is required;
2. Low quality, low value: replace with a commercial package;
3. High quality, high value: low priority for modernization; and
4. Low quality, high value: good modernization candidate.
For our research work, the legacy systems of interest are of the fourth type.
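The four-way categorization above can be expressed as a simple decision rule for screening a portfolio of systems. The sketch assumes that quality and business value have each been scored on a 0 to 1 scale and that 0.5 separates "low" from "high"; the thesis does not prescribe how either attribute is measured, so the scores, the threshold and the system names here are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    NO_MODERNIZATION = "no modernization is required"
    REPLACE_WITH_PACKAGE = "replace with commercial package"
    LOW_PRIORITY = "low priority for modernization"
    MODERNIZE = "good modernization candidate"


def classify(quality: float, business_value: float, threshold: float = 0.5) -> Decision:
    """Map (quality, business value) onto the four types; the threshold is an assumption."""
    high_q = quality >= threshold
    high_v = business_value >= threshold
    if high_q and not high_v:
        return Decision.NO_MODERNIZATION      # type 1: high quality, low value
    if not high_q and not high_v:
        return Decision.REPLACE_WITH_PACKAGE  # type 2: low quality, low value
    if high_q and high_v:
        return Decision.LOW_PRIORITY          # type 3: high quality, high value
    return Decision.MODERNIZE                 # type 4: low quality, high value


# Hypothetical portfolio: system -> (quality score, business value score).
portfolio = {
    "payroll": (0.3, 0.9),
    "intranet wiki": (0.2, 0.1),
    "billing": (0.8, 0.9),
    "old reports": (0.7, 0.2),
}
candidates = sorted(
    name for name, (q, v) in portfolio.items() if classify(q, v) is Decision.MODERNIZE
)
print(candidates)  # ['payroll']
```

Only systems that fall into the fourth type, the focus of this research, are selected as modernization candidates.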
A low quality, high value legacy system is a good candidate for modernization. For such systems, complete specifications and documentation are rarely available [27]. The original specifications and documentation may have been lost. If a specification exists, it is unlikely that it incorporates details of all of the system changes that have been made over the years. Therefore, there is no straightforward way of specifying a new system that is functionally identical to the legacy system. Keeping legacy systems in use avoids the risks of replacement, but making changes to existing software usually becomes more expensive as systems get older, for several reasons including inadequate or out-of-sync documentation, no consistent programming style, the use of old programming languages, data duplication, low performance, and a software structure that has become corrupted over the years due to maintenance. Scrapping legacy systems and replacing them with more modern software involves significant business risk. Replacing legacy systems is a risky business strategy, because there is no guarantee that all the business rules embedded in a legacy system will be carried over into more maintainable pieces of software. It is impossible to produce software systems of any size that do not need to be changed. Once software is put into use, new requirements emerge and existing requirements often need to change. Some of the changes have to be made because of advances in technology, to improve the quality attributes of the systems. As a result, constant technological change often weakens the business value of legacy systems, which have been developed over the years through large investments. The application software in a legacy system is not a single application program but usually includes a number of different programs. The system may have started as a single program, but over time changes may have been implemented by adding new programs that share the same data files or resources.
Over time these systems become old, run on old platforms and tend to become obsolete. Despite their obsolescence, legacy systems continue to provide a competitive advantage by supporting unique business processes and containing invaluable knowledge and historical data. International Data Corporation estimates that 200 billion lines of legacy code are still in use today on more than 10,000 large mainframe sites [144]. The difficulty of integrating legacy systems is reflected in a December 2001 study by the Hurwitz Group, which found that only 10% of enterprises had fully integrated their most mission-critical business processes [145]. Simply maintaining a legacy system is often not an option because of constant changes. Replacing the existing system by redeveloping a new system from scratch incurs several difficulties due to the inability of software engineers to understand the legacy system. Understanding a legacy system may be impossible because of inconsistencies in its programming languages, difficulty in finding staff who have adequate knowledge, inadequate system documentation, a system structure that has been corrupted by maintenance, etc. Redeveloping a legacy system from scratch may create a functionally equivalent information system based on modern software techniques and hardware, but the high risk of failure associated with any large software project lessens the chances of success [53]. A legacy system is an old system that still provides essential business services. These business services must be protected. Business services, or functionality, are the ability of the system to do the work for which it was intended. Functionality may be achieved through the use of any of a number of possible structures. In fact, if functionality were the only requirement, the system could exist as a single monolithic module with no internal structure at all. Instead, it is decomposed into modules to make it understandable and to support quality attributes.
Most legacy systems have been designed from a functional perspective and are composed of sets of interacting functions that communicate through parameters and global shared data areas. It is well known that a software system deteriorates as time passes and changes are made to it [142]. Legacy systems are likely to have an architecture that makes them difficult to evolve to address new requirements. Legacy systems are often incompatible with interfacing systems or difficult to integrate. They carry a high cost of ownership, are difficult to modify to meet ongoing business demands, require a legacy skill set that fewer and fewer people possess, and do not adequately meet today's compliance demands [53]. The designers of the system may have left the organization, leaving no one to understand or explain how it works. This lack of understanding is compounded by inadequate documentation or by manuals getting lost over the years. Documentation that is incomplete or out of date does not reflect the truth, does not obey its own rules for form and internal consistency, and is not used. Organizations that have a large number of business critical legacy systems therefore face a fundamental dilemma. If they continue using the legacy systems and making changes as required, their costs will inevitably increase. If they decide to replace their legacy systems with new systems, high risk is involved, and a new system may not be able to provide the same level of support as the legacy system did. To solve this dilemma, organizations need to modernize their legacy systems. Modernization of legacy systems delivers a new system that keeps the same business value and is more maintainable. Maintenance is a part of every system's life cycle. In fact, if a software system can be maintained within an acceptable budget, it is usually not considered a candidate legacy system for modernization [4].
The focus of legacy system modernization is to retain the business value of the legacy systems. Modernization is difficult because of the complex nature of these legacy systems. This chapter discusses software evolution dynamics, the different drivers of software modernization, our classification of existing modernization approaches, the shortfalls of existing modernization approaches, and software architecture and its role in legacy modernization. A summary of the chapter is presented at the end.

3.2 Software Evolution Dynamics

Software evolution is defined as the activity of adding new functionality to existing legacy software. Maintenance refers to the activity of modifying software after it has been put into use in order to maintain its usefulness. More generally, software evolution refers to the study and management of the process of making changes to software over time. Software evolution comprises development activities, maintenance activities and reengineering activities. Software evolution is important because organizations are completely dependent on their software systems and have invested heavily in these systems. Their systems are critical business assets. Organizations must invest in maintaining these systems to meet changing requirements and to maintain the value of these assets. The majority of the software budget in large companies is therefore devoted to maintaining their existing business critical legacy systems, and we should not be surprised by figures suggesting that 85% of software costs are evolution costs [47]. Maintaining legacy systems becomes difficult for several reasons: requirements and design documents may not be passed from one company to another (if the system was developed by another company), and companies may merge or reorganize and inherit software from other companies, only to find that the existing legacy system has to be changed.


According to the first law of Lehman and Belady [142], system maintenance is an inevitable process. All systems have a limited lifetime. Due to changing requirements, systems must be modified. A change of functionality comes from a change of business rules; thus modification of the business rules results in modification of the system. Their second law is about increasing complexity: as an evolving system changes, its structure tends to become more complex, and extra resources must be devoted to preserving and simplifying the structure. The only way to avoid increasing complexity is to invest in preventive maintenance without adding any functionality to the system. Preventive maintenance does not accommodate changing user requirements, and user requirements will change due to changes in the technological environment. Their third law suggests that large systems have a dynamic of their own, which is established at an early stage in the development process. This determines the gross trends of the system maintenance process and limits the number of possible system changes. Lehman and Belady suggest that this law is a consequence of the second law, under which the structure becomes complex and degraded. During the maintenance process, software documents are not kept up to date and the system becomes hard to understand. Even small changes cannot be made, and a large change will probably introduce many new faults and limit the useful services delivered by the system. As a result, modernization of the legacy system is required. In our context, modernization does not mean modifying the functionality of the legacy system; rather, it means improving the future maintainability of the system by enhancing its maintainability attribute.

3.3 Software Modernization

As discussed in the previous section, software systems evolve, and hence legacy systems need to be modernized. There are many reasons why a legacy system may need modernization; some of these are:
1. In many cases a legacy system may be written for mainframe hardware that is no longer available or is expensive to maintain.
2. The legacy system may not be compatible with new systems and may integrate poorly with the rest of the organization's application systems.
3. A legacy system may rely on a range of support software, from the utilities provided by the hardware manufacturer to the compilers used for system development. These may be obsolete and no longer supported by their original providers. The operating system utilities may also offer only limited user access and carry the risks associated with hardware and software that is no longer supported.
4. An immense volume of inconsistent and duplicated data may have accumulated over the system's lifetime.
5. Business processes may have been designed around the legacy system and constrained by the functionality it provides, and maintaining these business processes is very costly.
6. During system evolution the documentation may have been lost, not kept up to date, or not recorded properly, so the available documentation may not match the running system.
The most common reasons for modernizing legacy systems are thus increased maintenance cost, diminished productivity, limited user access, poor documentation, poor integration, and the risks associated with hardware and software that are no longer supported. The legacy system may no longer be supported by the vendor or may be incompatible with future environments, or the old system may be unable to run properly and cause significant disruption. For these reasons organizations often consider moving to new technologies and architectures, and hence investigate modernizing their legacy systems. Modernization of a legacy system should preserve its business value while achieving the desired maintainability quality attribute. Table 3.1 summarizes some of the most common problems in a legacy system. Modernization of a legacy system has crucial impacts on both the system and the organization, and all organizations want to realize the highest possible business value from their existing investments in legacy systems.
However, maintaining a system in a legacy environment can consume a disproportionate share of the organizational budget and human resources. The average company spends 60% to 85% of its software development budget on maintaining legacy systems that fail to meet the changing competitive needs of the business [147].


As a result, organizations are under increasing pressure to reduce costs and to react more nimbly to ongoing business demands. The organizational need for modernizing legacy systems can be categorized into market drivers, business drivers, and technology drivers.

Problem Area | Description
--- | ---
Software Architecture | Legacy systems are old systems that are incompatible with new hardware platforms and future environments, and they are difficult to integrate, change and maintain. Software architecture can be useful for maintaining legacy systems because it shows how the modules and elements of the software are interconnected. Most of the time, however, the architecture was never recorded, or its documentation has become out of date, so it has to be reconstructed: the as-is architecture of the implemented system is recovered from the existing system.
Programming Language | Different parts of the system have been implemented by different technical teams, which may mean there is no consistent programming style across the whole system. Part or all of the system may be implemented using an obsolete programming language. It may be difficult to find staff who know these languages, and expensive outsourcing of system maintenance may be required.
User Interface | The user interface (UI) is the most visible part of a system. Modernizing the UI improves usability and is greatly appreciated by end users. A common technique for UI modernization is screen scraping, which wraps old, text-based interfaces with new graphical interfaces.
Hardware Platforms | In many cases, legacy systems have been written for mainframe hardware which is no longer available, which is expensive to maintain and which may not be compatible with current organisational IT purchasing policies.
Database Systems | The data processed by the system may be maintained in different databases or files which have incompatible structures. There may be data duplication, and the data itself may be out of date, inaccurate and incomplete.
Architecture Pattern | Hofmeister et al. [146] discuss how architectural patterns force software engineers to consider key design aspects, and suggest that the software architecture can serve as a design plan. Large systems are always decomposed into sub-systems that provide a related set of services. A legacy system may be poorly layered (for example, the layered architecture pattern may have been applied improperly).
Code Quality Issues | The code of a legacy system is also a major concern. Typical problems are inconsistent naming conventions, inconsistent or missing internal documentation, highly coupled code, low cohesion and high complexity at subsystem and module level, dead code, definitions of identical constants scattered through the code, and code that is difficult to understand. Together these problems inhibit the ongoing maintenance of a legacy system, which is why it requires modernization. The system may also have been optimized for space utilization or execution speed rather than written for understandability. This causes particular difficulties for programmers who have learned modern software engineering approaches and have not been exposed to the programming tricks that were used. Monolithic architectures implemented on DOS and earlier Windows-based PCs often worked poorly with multiple users [53].

Table 3.1: Most Common Problems in a Legacy System

Figure 3.1 shows the modernization driving factors for market drivers: business innovation opportunities; merger, acquisition or integration; regulatory compliance requirements; competitive and customer pressures to add function; and performance and capacity.

[Figure 3.1: Modernization driving factors for market drivers]

[Figure 3.2: Modernization driving factors for business drivers]

Figure 3.2 shows the modernization driving factors for business drivers: budget constraints; increased customer focus; the need to better integrate with other business systems (internal, supplier or customer systems); time-to-market deadlines; and resource availability.

Organizations cannot provide good customer support if their systems are not integrated, and the focus on customers' needs is pushing these systems to be modernized. Time-to-market deadlines must be met to satisfy customers. Budget constraints are another business driver, since pushing back a deadline adds extra cost to the project.

[Figure 3.3: Modernization driving factors for technology drivers]

Figure 3.3 shows the modernization driving factors for technology drivers: software product and technology obsolescence and vendor product support; lengthy development cycles; lack of software engineering skills for software maintenance; high cost of vendor lock-in and maintenance; poor performance, reliability or availability; and security audit compliance and access issues. Software products and technologies become obsolete after a period of time. Software obsolescence is generally due to one of three main causes:
1. Functional Obsolescence: changes to the hardware, the requirements, or other software in the system make the functionality of the software obsolete.
2. Technological Obsolescence: the sales and/or support for the software products no longer exist. Licensing agreements can no longer be expanded or renewed (the software is legally unprocurable), and software maintenance terminates because neither the original supplier nor third parties support the software products any longer.
3. Logistical Obsolescence: digital media obsolescence, formatting, or degradation limits or terminates access to the software, and hence product support becomes an issue.
In addition, software engineers may lack the skills to maintain the system, the cost of maintenance becomes very high, the system's performance degrades, and the product may not be available when required. Modernization of legacy systems can address these issues; however, modernization has many challenges of its own.
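As an illustration of how technological obsolescence might be screened for during a modernization assessment, the following Python sketch flags components whose vendor support has ended by a given date. The `Component` record, the inventory entries and their end-of-support dates are hypothetical; a real assessment would draw them from vendor notices and licensing records.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class Component:
    name: str
    vendor: str
    end_of_support: Optional[date]  # None: no announced end of vendor support

def technologically_obsolete(components: List[Component], today: date) -> List[str]:
    """Names of components whose vendor support has ended on or before `today`."""
    return [c.name for c in components
            if c.end_of_support is not None and c.end_of_support <= today]

# Hypothetical inventory of a legacy system's support software
inventory = [
    Component("payroll-batch", "VendorA", date(2009, 6, 30)),
    Component("report-writer", "VendorB", None),
    Component("db-driver", "VendorC", date(2015, 1, 1)),
]

print(technologically_obsolete(inventory, date(2014, 2, 1)))  # ['payroll-batch']
```

Functional and logistical obsolescence are harder to detect mechanically and are not covered by this sketch.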

3.4 Challenges with Modernization

Figure 3.4 shows the challenges of modernizing legacy systems: unsatisfactory documentation, the monolithic architecture of legacy systems, the cost associated with redeveloping or re-engineering legacy systems, the integration of legacy systems with other systems, the consolidation of legacy systems, and the quality of legacy systems.


[Figure 3.4: Challenges with modernization of legacy systems]

The challenges with modernization of legacy systems are:
• Quality of the legacy systems: Many years of maintenance have usually corrupted the system structure, making the legacy system difficult to understand.
• Consolidation of legacy systems: New programs may have been added and interfaced with other parts of the system in an ad hoc way. The data processed by the system may be maintained in different files with incompatible structures. There may be data duplication, and the data itself may be out of date, inaccurate and incomplete.
• Integration of legacy systems with other systems: Different parts of the system have been implemented by different teams, so there is no consistent programming style across the whole system.
• Cost associated with redeveloping or re-engineering the legacy systems: Business processes and the ways in which legacy systems operate are often inextricably intertwined. These processes have been designed to take advantage of the software's services and to avoid its weaknesses. If the system is redeveloped or re-engineered, these processes will also have to change, with potentially unpredictable costs and consequences. New software development is itself risky, so there may be unexpected problems with the new system; it may not be delivered on time or for the expected price.
• Monolithic architecture: Monolithic architectures are those in which functionally distinguishable aspects (for example data input and output, data processing, error handling, and the user interface) are not architecturally separate components but are all interwoven. Understanding such a system is a very difficult task. Part or all of the system may be implemented using an obsolete programming language, and it may be difficult to find people who know these languages.
• Unsatisfactory documentation: System documentation is often inadequate and out of date. In some cases the only documentation is the system source code, and sometimes the source code has been lost and only the executable version of the system is available.
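To make the monolithic architecture challenge concrete, the following minimal Python sketch contrasts a routine in which input parsing, validation, computation and output are interwoven with the same behaviour split into separate components. The comma-separated record format and the function names `monolithic_report`, `parse`, `process`, `render` and `modular_report` are hypothetical.

```python
from typing import List, Tuple

# Monolithic style: parsing, validation, computation and output are interwoven.
def monolithic_report(lines: List[str]) -> str:
    out, total = [], 0
    for line in lines:
        parts = line.split(",")
        if len(parts) != 2:
            out.append("ERROR: bad record " + line)
            continue
        name, amount = parts
        total += int(amount)
        out.append(f"{name.strip()}: {int(amount)}")
    out.append(f"TOTAL: {total}")
    return "\n".join(out)

# Separated style: each concern is a component that can be understood and reused alone.
def parse(lines: List[str]) -> Tuple[List[Tuple[str, int]], List[str]]:
    records, errors = [], []
    for line in lines:
        parts = line.split(",")
        if len(parts) != 2:
            errors.append(line)
        else:
            records.append((parts[0].strip(), int(parts[1])))
    return records, errors

def process(records: List[Tuple[str, int]]) -> int:
    return sum(amount for _, amount in records)

def render(records: List[Tuple[str, int]], errors: List[str], total: int) -> str:
    out = [f"ERROR: bad record {e}" for e in errors]
    out += [f"{name}: {amount}" for name, amount in records]
    out.append(f"TOTAL: {total}")
    return "\n".join(out)

def modular_report(lines: List[str]) -> str:
    records, errors = parse(lines)
    return render(records, errors, process(records))
```

The modular version reports all malformed records before the valid ones, whereas the monolithic one interleaves them; when the input has no malformed records, or only leading ones, the two outputs coincide.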

Software modernization needs to go through some or many phases of the software development lifecycle, such as requirements gathering, design, development and deployment. In legacy modernization, requirements gathering is a challenge because of poor quality, monolithic architecture and unsatisfactory documentation. A number of factors contribute to this: lack of documentation about business use cases, lack of mature automation tools that extract business rules from code, non-availability of usage information for applications, a high risk of business rules being missed when requirements are analysed, and the cost associated with all these activities. During modernization, the cost of changing the information system must be combined with the cost incurred by change management. Integrating silos with the modernized systems is also an issue. Organizations with a large footprint of legacy systems cannot modernize all their systems at once, so developers need to consider how the remaining legacy systems will integrate with the modernized ones, and a phased modernization approach may be required. However, this brings its own set of challenges, such as providing complete business coverage with well-understood and correctly implemented overlapping functionality, data duplication, and throw-away systems needed during the interim to bridge legacy and modernized systems. All of this adds to the cost. Modernization can be done at different levels, as shown in Figure 3.5 [148]. At lower levels, modernization takes the form of transforming the code from one language into another. At higher levels, the structure of the system may also be changed, for instance to make it more object-oriented. At still higher levels, the global architecture of the system may be changed as part of the modernization process.

[Figure 3.5: Legacy system modernization at different levels. From legacy system to modernized system, in increasing order of level: transforming the code; changing the structure of the system; changing the architecture of the system]

3.5 Classification of Existing Modernization Approaches

Many modernization approaches exist in the software industry. Figure 3.6 presents our classification of existing legacy system modernization approaches. The first distinction is the platform change decision, because it is the most important decision to make when planning a legacy system modernization strategy [149]. Platform change means moving the legacy system from one environment or operating system to another, for example from a mainframe to a distributed environment. A no-platform-change decision means staying with an upgrade solution for the current environment. If no platform change is decided, the approaches that can be applied are limited, and they can all be categorized as wrapping. If a platform change is decided, there are two options: re-engineering and/or migration.

[Figure 3.6: Classification of legacy system modernization approaches. Legacy system modernization divides into no platform change (wrapping) and platform change (re-engineering, migration)]
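The top-level decision in this classification can be written down as a small function. The Python sketch below encodes only the first level of Figure 3.6; the enum and function names are illustrative and not part of any existing tool.

```python
from enum import Enum
from typing import FrozenSet

class Approach(Enum):
    WRAPPING = "wrapping"
    RE_ENGINEERING = "re-engineering"
    MIGRATION = "migration"

def candidate_approaches(platform_change: bool) -> FrozenSet[Approach]:
    """First level of the classification: the platform change decision
    determines which family of approaches is available."""
    if not platform_change:
        # Staying on the current platform leaves only wrapping approaches.
        return frozenset({Approach.WRAPPING})
    # A platform change opens re-engineering and/or migration.
    return frozenset({Approach.RE_ENGINEERING, Approach.MIGRATION})

print(sorted(a.value for a in candidate_approaches(True)))  # ['migration', 're-engineering']
```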

In Figure 3.7, the wrapping approach is further subdivided into three approaches [30]:
1. User Interface Wrapping: the screens of the legacy system are captured and mapped onto a modern graphical interface.
2. Function Wrapping: not only the data but also the business logic encoded in the legacy programming language is wrapped and accessed by other applications through an interface. Function Wrapping can be further subdivided into:
• CGI Integration,
• Object-Oriented Wrapping, and
• Component Wrapping.
3. Data Wrapping: new interfaces are developed for legacy data structures, so that the old data can be accessed by, or made available to, new applications. This category can be subdivided as follows:
• Database Gateways,
• XML Integration, and
• Data Replication methods.
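A minimal sketch of object-oriented function wrapping, in Python: a new class exposes a clean interface while the legacy routine underneath is called unchanged. The legacy routine, its return-code convention and all names here are hypothetical stand-ins for code that, in practice, would be reached through a language bridge or middleware.

```python
# Stand-in for an untouched legacy routine (hypothetical): procedural,
# driven by a global rate table and reporting errors through a return code.
_LEGACY_RATE_TABLE = {"A": 100, "B": 150}

def legacy_calc_premium(age, plan_code):
    if plan_code not in _LEGACY_RATE_TABLE:
        return 8, 0                      # non-zero return code signals failure
    premium = _LEGACY_RATE_TABLE[plan_code]
    if age > 30:
        premium += (age - 30) * 2
    return 0, premium

class UnknownPlanError(ValueError):
    """Raised by the wrapper instead of exposing the legacy return code."""

class PremiumService:
    """Object-oriented wrapper: new applications call this interface while the
    business logic inside legacy_calc_premium is reused unchanged."""

    def premium(self, age: int, plan: str) -> int:
        rc, value = legacy_calc_premium(age, plan)
        if rc != 0:
            raise UnknownPlanError(plan)
        return value

print(PremiumService().premium(40, "A"))  # 120
```

The design choice is that callers never see the legacy return-code convention; translating it into an exception happens in exactly one place.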

[Figure 3.7: Classification of legacy system wrapping approaches. User Interface Wrapping: Screen Scraping. Function Wrapping: CGI Integration, Object-Oriented Wrapping, Component Wrapping (DNA, CORBA3, EJB). Data Wrapping: Data Replication, XML Integration, Database Gateways (ODBC, JDBC, ODMG)]
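Data wrapping through XML integration can be illustrated by exposing fixed-width legacy records as XML that new applications can consume. The sketch below uses Python's standard `xml.etree.ElementTree`; the record layout (field names and column positions) is a hypothetical example, not taken from any particular legacy system.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Hypothetical fixed-width layout of a legacy customer file: (field, start, end)
LAYOUT = [("id", 0, 6), ("name", 6, 26), ("balance_cents", 26, 34)]

def record_to_xml(line: str) -> ET.Element:
    """Wrap one fixed-width legacy record as a <customer> element."""
    customer = ET.Element("customer")
    for field, start, end in LAYOUT:
        ET.SubElement(customer, field).text = line[start:end].strip()
    return customer

def file_to_xml(lines: Iterable[str]) -> str:
    """Expose a whole legacy file as an XML document for new applications."""
    root = ET.Element("customers")
    for line in lines:
        root.append(record_to_xml(line))
    return ET.tostring(root, encoding="unicode")

legacy_line = "000042" + "Jane Smith".ljust(20) + "00012550"
print(file_to_xml([legacy_line]))
```

The legacy file itself is left untouched; only a new read interface is added on top of it, which is what distinguishes data wrapping from data migration.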

Figure 3.8 shows the different categories of the Re-engineering approach, which can be divided into two main approaches:
• Scratch Re-engineering: Scratch Re-engineering is the most radical solution to legacy problems. This approach leaves the existing code behind and rewrites the whole system from scratch. It is only appropriate for legacy systems that cannot keep pace with business needs and for which modernization is not possible or cost-effective, and its risks should be evaluated carefully in advance [1, 30]. Scratch Re-engineering basically involves building a system from scratch and is very resource intensive. In addition, IT resources are typically fully allocated to maintenance tasks and may not be familiar with the new technologies that can be used in the new system. Scratch Re-engineering also requires extensive testing of the new system once it has been developed. Legacy systems are well tested and tuned, and encapsulate considerable business expertise; there is no guarantee that the new system will be as robust or functional as the old one.

• Reverse Engineering: The second class is Reverse Engineering, which can be further subdivided into a White-Box approach and a Black-Box approach [30]. In the White-Box approach, the business logic is extracted through analysis of the legacy system. This approach can be separated into two domains: Database Reverse Engineering (DBRE) and Procedure Reverse Engineering (PRE). Database Reverse Engineering is the part of system maintenance work that produces a sufficient understanding of an existing database system and its application domain to allow appropriate changes to be made. It deals with a subset of the problems addressed by software reverse engineering: it recovers the domain semantics of an existing database and represents them as a conceptual schema that corresponds to the most likely design specifications of the database. While DBRE seems mature enough to support the development of DBRE tools, PRE is still an unsolved problem [150, 151]. Procedure Reverse Engineering deals with analysing and understanding the old code, which is a difficult task. Some architecture reconstruction tools have been developed to aid in understanding such code, but these tools require human interaction and interpretation [152, 153].
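A very small part of Database Reverse Engineering, recovering the declared entities, keys and relationships of an existing database, can be sketched with Python's built-in `sqlite3` module, using an in-memory SQLite database as a stand-in for a legacy database. The schema is hypothetical. This only reads explicitly declared constraints; legacy databases frequently lack them, and real DBRE must also infer implicit keys and relationships from the data and the program code.

```python
import sqlite3
from typing import Dict, List, Tuple

def recover_schema(conn: sqlite3.Connection):
    """Recover entities (tables, columns, primary keys) and relationships
    (declared foreign keys) from an existing SQLite database."""
    entities: Dict[str, dict] = {}
    relationships: List[Tuple[str, str, str, str]] = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        entities[table] = {
            "columns": [c[1] for c in cols],
            "primary_key": [c[1] for c in sorted(cols, key=lambda c: c[5]) if c[5] > 0],
        }
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, fk[3], fk[2], fk[4]))
    return entities, relationships

# Hypothetical legacy schema, created in memory for the example
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (ord_no INTEGER PRIMARY KEY,
                     cust_no INTEGER REFERENCES customer(cust_no),
                     amount REAL);
""")
entities, rels = recover_schema(conn)
print(rels)  # [('orders', 'cust_no', 'customer', 'cust_no')]
```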


[Figure 3.8: Classifications of legacy system Re-engineering approach. Re-engineering divides into Scratch Re-engineering and Reverse Engineering; Reverse Engineering divides into White-Box (Database Reverse Engineering, Procedure Reverse Engineering) and Black-Box (Function Wrapping)]

In the Black-Box subclass of Reverse Engineering, the legacy system is kept intact and treated as one monolithic problem. This leads to the Function Wrapping techniques, but in the context of a new platform, which could be object-oriented or component based. The Object-Oriented and Component Wrapping approaches, both subcategories of Function Wrapping, are better suited to being used in conjunction with Reverse Engineering. With these approaches, the interfacing logic is extracted, where possible, through a shallow analysis of interconnected components. The objective is to transfer the legacy components to the new platform and to re-interface and restructure the links between them [154]. However, translating the monotonic and plain semantics of procedural legacy programs into the rich, hierarchical and structured semantics of independent components can be very difficult, if not impossible [30, 152, 155]. Wrapping the legacy business logic is considered to provide a roadmap for replacing the old system incrementally, without a ‘big bang’ replacement of the system [30].
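Incremental substitution behind a wrapper can be pictured as a facade that routes each operation either to the wrapped legacy implementation or, once that operation has been migrated, to its replacement. The following Python sketch is a minimal illustration under that assumption; the class, method and operation names are hypothetical.

```python
from typing import Callable, Dict, List

class MigrationFacade:
    """Single entry point for clients: each operation is served by the legacy
    implementation until a replacement has been registered for it."""

    def __init__(self, legacy: Dict[str, Callable]):
        self._legacy = dict(legacy)
        self._modern: Dict[str, Callable] = {}

    def migrate(self, op: str, impl: Callable) -> None:
        if op not in self._legacy:
            raise KeyError(f"unknown operation: {op}")
        self._modern[op] = impl

    def call(self, op: str, *args, **kwargs):
        if op in self._modern:
            return self._modern[op](*args, **kwargs)
        return self._legacy[op](*args, **kwargs)

    def remaining_legacy(self) -> List[str]:
        """Operations still served by the legacy system."""
        return sorted(set(self._legacy) - set(self._modern))

facade = MigrationFacade({"quote": lambda x: x * 2, "invoice": lambda x: f"INV-{x}"})
facade.migrate("quote", lambda x: 2 * x)   # replacement for one operation only
print(facade.remaining_legacy())           # ['invoice']
```

Because clients only ever talk to the facade, the old system can be retired operation by operation; the migration is complete when `remaining_legacy()` is empty.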

[Figure 3.9: Classifications of legacy system migration approaches. System migration: no-value-added features (platform emulation, straightforward transformation) and value-added features (user interface change, database change, program code change). Component migration: phased interoperability and parallel operation, with data shared by data replication, database gateway or data slicing]

Figure 3.9 shows two classes of legacy migration approaches. The first class is Component migration, in which a large legacy system is broken down into independent components and each component is migrated separately [13, 72]. There will be a transition period in which both the legacy platform and the new platform have to be online and work together. Two strategies arise: Phased interoperability, which migrates modules of different software on different operating systems at phased intervals, with the platforms agreeing on a common way to exchange data; and Parallel operation, which migrates all modules at once. Both strategies need the data to be shared via database gateways, replicated on the two platforms, or sliced into separate independent domains that are migrated gradually to the new platform [13, 72]. Data slicing is not easy to apply to legacy databases. The second class is the System migration approach, in which the whole legacy system and its data are transferred to a new platform in a single step. This approach has two subclasses. The first is the No-value-added features approach, which does not add any enhanced functionality to the legacy system and can be carried out by Platform emulation or Straightforward transformation. Platform emulation moves the whole legacy system to a virtual legacy machine emulated on a new platform. Such a solution does not improve the old system beyond moving it to newer hardware when the old hardware is no longer supported. Straightforward transformation, on the other hand, translates the legacy system line by line onto a new platform and is not applicable to all legacy systems. The second subclass is Value-added features modernization, which changes features of the legacy system such as the user interface, database, and program code. It may offer more flexibility, better understanding, easier maintenance, reduced costs and a smaller code base [30].

3.6 Shortfalls of the Existing Approaches

Many of the problems of legacy systems are due to the fact that the system's architecture is virtually undocumented. Moreover, the architecture may have degraded over time through modifications or extensions that violate the initial architectural principles, resulting in overall chaos. Modernization of legacy systems must be able to address these issues. As discussed earlier, there are different approaches to modernizing a legacy system, but it is an open question how far these approaches address undocumented architecture. We have not come across any modernization approach that incorporates architecture reconstruction for understanding the legacy system. Software architecture reconstruction must be viewed not as an effort on its own but as a contribution in a broader technical context, such as streamlining the modernization of legacy systems that have hit their architectural borders, that is, that require major restructuring [48]. There are success stories of legacy system modernization, but there is still a lack of literature on successful modernization processes. The scientific community has invested a vast number of man-years in FORTRAN-based application programs, and the amount of code in legacy systems is immense: in 1990 it was estimated [263] that there were 120 billion lines of source code in existence. The majority of these systems were written in COBOL, a programming language best suited to business data processing, or in FORTRAN, a language for scientific or mathematical programming. These languages have limited program structuring facilities and, in the case of FORTRAN, very limited support for data structuring. It is therefore surprising that there is very little published work dealing with the modernization of these systems. The only notable exception is the work of Decyk and Norton from NASA's Jet Propulsion Laboratory [156], who proposed a staged process for moving FORTRAN 77 code to FORTRAN 90/95. Latha and Thanamani [150] state that there are few comprehensive approaches to software evolution and that the current literature contains no successful, practical experience reports from projects using a comprehensive modernization approach. None of the approaches discussed in our classification of modernization approaches integrates software architecture reconstruction. Given the scale, complexity, and risk of failure of legacy system modernization projects, a well-defined, easily implemented, and detailed methodology is essential to modernization success. However, few comprehensive legacy migration methodologies are available, and a general approach has yet to be agreed on. Existing approaches are either too high level or have yet to be applied in practice [59], [157] and [158].
Although partial solutions such as wrapping are widely adopted, they are short term and can actually complicate legacy system maintenance and management over the long term [30]. Another aspect of modernization is that software artefacts must not be thrown away. Software artefacts are usually built up over time, and they should be reused in the modernization process. These assets are invaluable to an organization, and its day-to-day running is based on them. The main problem, however, is discovering these software assets, which could include the software architecture, design decisions made early in software development, documentation, source code, and other software artefacts. There is a lack of modernization approaches that are methodical, repeatable, and learnable through their different phases and activities. Different modernization approaches require different activities, as discussed, and these activities cannot be standardized for all legacy systems. We need a systematic legacy system modernization approach which is methodical, repeatable, and learnable through its own activities, so that experience gained on one modernization project can be reused on other modernization projects.

3.7 Software Architecture and its Effect on Modernization

As discussed in Section 3.2, software evolves, and during the evolution process software grows larger because of the modifications made to it. Understanding of the software also degrades because its documentation is not updated, and knowledge of the system may be lost through lack of documentation. One of the challenges organizations face is how to reduce the complexity associated with legacy systems: unless a legacy system is understood completely, it cannot be modernized. To address this issue, software architecture is seen as an important tool. A system's software architecture is generally regarded as having a large influence on the effort required to adapt the system [27]. The software architecture of a legacy system is as valuable as it is for systems whose construction has just been initiated (‘architecture-based software development’) [159]. The architecture of the software depends on the requirements levied on it. Bass, Clements, et al. [27] discuss different areas where software architecture is helpful; one of them is for maintainers of a software system, for whom the architecture helps reveal areas of prospective change. The software architecture of a legacy system may be useful in many situations, including the following:


• The functionality of a legacy system should be made available via different channels; the typical example is web-enabling the back-office systems of banks or insurance companies. The functional view of the software architecture addresses the techniques that can be used to decompose the problem domain into a set of architectural artefacts.
• Organization A and organization B consider a merger (or acquisition). Should the software systems of A or B be used in the new organization? What is the quality of these systems? Are the systems' architectures sufficiently flexible to leverage the new business opportunities created by the merger? Software architecture can be used as input to the decision about which software system to use as the basis of the merged organization [33].
• A software house specializing in outsourcing offers fixed-price maintenance services for legacy systems. When bidding for a new contract, it needs to determine a price based on an architectural review and an assessment of the legacy system's capability to deal with planned future changes. Software architecture provides the architectural views that can be used as a basis for determining the price.
• New people (aptly called “software immigrants” by [160]) join the team responsible for maintaining a legacy system. Before they can become productive, they need to acquire an understanding of the legacy system's software architecture.
• The software maintenance team finds maintenance of a particular legacy system too difficult and error-prone. To remedy this, a modernization of the legacy system is planned. A key element of the modernization is an architectural review, including an assessment of the architecture's capability to deal with anticipated future modification requests.
• Software modernization requires understanding of the legacy system. Modernization is required when a system stops evolving and no further modifications can be made because maintenance has become too difficult. The components, modules, procedures and functions in the legacy system need to be identified for modernization, and long-lost software artefacts need to be recovered for a complete understanding of the legacy system. These software artefacts and other assets, such as the business rules of the legacy system, are reused for modernization. Software architecture helps in recovering these software assets, artefacts and components.
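One way to make the maintainer's use of architecture (revealing areas of prospective change) operational is change impact analysis over a recovered module dependency graph. The Python sketch below assumes the dependencies are already known as a mapping from each module to the modules it uses; the module names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

def impacted_modules(depends_on: Dict[str, Set[str]], changed: str) -> List[str]:
    """Modules that directly or transitively depend on `changed`."""
    dependents = defaultdict(set)          # reverse edges: module -> its users
    for module, deps in depends_on.items():
        for d in deps:
            dependents[d].add(module)
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:                           # breadth-first walk over users
        current = queue.popleft()
        for user in dependents[current]:
            if user not in seen:
                seen.add(user)
                queue.append(user)
    seen.discard(changed)                  # in a cycle the module would reach itself
    return sorted(seen)

deps = {"ui": {"billing"}, "billing": {"db"}, "reports": {"db"}, "db": set()}
print(impacted_modules(deps, "db"))  # ['billing', 'reports', 'ui']
```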

Every software system has an architecture. A system may have very few components, or its subdivision into components may be far from optimal, but some architecture must exist. The existing architecture, however, may be unknown to the software engineers, i.e. it is not documented in any way. The original developers may have left, or the initial architecture may have degraded beyond recognition, with the effect that no one really understands the system's architecture. Software architecture helps to make forgotten architectural structures explicit. It also emphasizes that one system can comprise more than one structure, and no single structure holds an irrefutable claim to being the architecture. Example architectural structures listed by [27] include the module structure (for work assignments), the logical structure (how functions share data), the process structure (how programs run concurrently), and the call structure (how procedures invoke each other with parameters). The observation that a software architecture is a mix of structures and views, used for various purposes, is made explicit by the “4+1 View Model” of architecture proposed by Kruchten [161]. This model describes software architecture using five concurrent views: the logical view describes the design's object model or entity-relationship diagram; the process view deals with the design's concurrency and synchronization aspects; the physical view covers the mapping of the software onto hardware; and the development view describes the software's static organization in its development environment. The fifth view consists of a selected set of use cases (scenarios) that illustrate the other views. For the purpose of legacy system architecture, mainly the logical and development views are required [161].
The software architecture of a program or computing system shows the software elements, software components, their externally visible properties, design, functions and the relationships among them. Component connectivity in a software system can be viewed through its software architecture. Software architecture helps software engineers in understanding the system. All software components are potential reusable artefacts. However, it is difficult to find reusable components, designs and other software artefacts when dealing with a legacy system. To identify reusable artefacts we need to reconstruct the architecture, documentation, design, etc. As described earlier, the architecture is often not documented, so there is a need for software architecture reconstruction. Software architecture helps to identify reuse opportunities; as a result, architecture is a key to reuse [27].

3.8 Software Architecture Reconstruction

The source code holds the most current information about the system, and it needs to be analyzed in order to understand the legacy system. To perform an effective analysis, a suitable set of architecture documentation is required. Ideally, the architecture documentation needs to be concise but effective, flexible, interactive and dynamic. However, in many deployed applications it is difficult to have high-quality documentation available and synchronized with the application implementation [162]. The system architecture can be understood with the help of software architecture reconstruction (SAR). Much research has been done on SAR in the past several years [148], [163], [164], [165], [166], and many techniques and methods have been developed [48], [167], [168], [169]. The end result of the architecture reconstruction process is a set of architectural views of a system. These views help us in analyzing the source code and the legacy system. Figure 3.10 shows an overview of the software architecture reconstruction process. This process involves three steps: Extracting source information, Composition of architectural views, and finally Select architectural views. These steps are very helpful in the modernization of legacy systems because they help in understanding the legacy system, identifying reusable components, determining internal and external dependencies of components, etc.
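The first step, Extracting source information, can be sketched minimally with Python’s standard `ast` module. Python is used here only for illustration; legacy systems are more often written in C, COBOL or Fortran, which need language-specific extractors. The code turns a module’s source into (relation, subject, object) facts. The relation names `contain`, `call` and `import` and the sample module `billing` are hypothetical choices, not part of any particular reconstruction toolkit.

```python
import ast

def extract_facts(module_name, source):
    """Extract (relation, subject, object) triples from one Python module.

    Hypothetical relation names: 'contain', 'call', 'import'. Only top-level
    functions are examined (classes are ignored in this sketch), and call
    targets are recorded as written, not resolved to their defining module.
    """
    facts = set()
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                facts.add(("import", module_name, alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module:
            facts.add(("import", module_name, node.module))
        elif isinstance(node, ast.FunctionDef):
            func = f"{module_name}.{node.name}"
            facts.add(("contain", module_name, func))
            # Every call made inside the function body becomes a 'call' fact.
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, (ast.Name, ast.Attribute)):
                    facts.add(("call", func, ast.unparse(sub.func)))
    return facts

# Hypothetical legacy module used as input.
legacy_src = """
import billing_db

def post_invoice(inv):
    validate(inv)
    billing_db.save(inv)

def validate(inv):
    return inv is not None
"""

for relation, subject, obj in sorted(extract_facts("billing", legacy_src)):
    print(relation, subject, obj)
```

The facts are kept as flat triples so that the later composition and selection steps can group and filter them without re-reading the source.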


[Diagram: Extracting source information → Composition of architectural views → Select architectural views]

Figure 3.10: Software Architecture Reconstruction Process

During the Extracting source information step, the information needed for understanding the legacy system is extracted from the source code in the form of a set of elements and the relations among these elements. This understanding of the legacy system is required for modernization. Many program analysis tools can perform this type of analysis, depending on the language in which the system is implemented. In the Composition of architectural views and Select architectural views steps, the reconstruction operates on this information, generating abstractions and, eventually, a set of views. These architectural views reveal broad, coarse-grained insights into the architecture. Reconstruction consists of two primary activities: visualization and interaction, and pattern definition and recognition. Visualization and interaction provides a mechanism by which the user may interactively visualize, explore and manipulate views. Pattern definition and recognition provides facilities for architectural reconstruction: the definition and recognition of the code manifestation of architectural patterns [27]. An architectural pattern can be defined as “expressing a fundamental structural organization schema for software systems. It provides a set of predefined subsystems, specifies their responsibilities, and includes rules and guidelines for organizing the relationships between them” [27]. The visualization and architectural patterns of software can help in answering questions that can form the basis of legacy system modernization. Some of these questions are:
• What are the subsystems or components of the software system?
• How should the interfaces between components be structured?
• What are the characteristics of the components’ communication?
• Which system components can be reused or made reusable?
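The layered pattern can illustrate pattern definition and recognition. The sketch below makes two assumptions: an analyst has already assigned each module to a layer (the `LAYERS` mapping and module names are hypothetical), and module dependencies have already been extracted. It reports dependencies that point from a lower layer to a higher one, i.e. places where the code does not manifest the pattern. Interpretations of the layered pattern differ on whether same-layer or layer-skipping dependencies are allowed; this sketch allows both.

```python
# Hypothetical layer assignment made by the analyst: higher number = higher layer.
LAYERS = {"ui": 3, "billing": 2, "billing_db": 1}

def layer_violations(deps, layers):
    """Return dependencies that point upward, i.e. break the layered pattern.

    Dependencies involving modules without a layer assignment are ignored.
    """
    violations = []
    for src, dst in deps:
        if src in layers and dst in layers and layers[src] < layers[dst]:
            violations.append((src, dst))
    return violations

# Hypothetical module-level dependencies recovered during extraction.
deps = [("ui", "billing"), ("billing", "billing_db"), ("billing_db", "ui")]
print(layer_violations(deps, LAYERS))  # [('billing_db', 'ui')]
```

An upward dependency such as `billing_db -> ui` is a candidate for closer inspection before the layered view is accepted as a faithful description of the legacy system.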

The Software Engineering Institute (SEI) proposes an architecture-centric decision-making method, called Options Analysis for Reengineering (OAR) [170], to facilitate the systematic identification of the best candidate legacy components for reuse, based on the level of difficulty, cost, and effort required to mine and rehabilitate the selected components.

3.9 Chapter Summary

This chapter discussed the dynamics of software evolution, the different drivers of software modernization, our classification of existing modernization approaches, the shortfalls of those approaches, and software architecture and its role in legacy modernization. Software evolves, and evolution also degrades the software structure. Over time, software stops evolving and runs on obsolete hardware and operating systems that are slow and expensive to maintain. Software maintenance can also be expensive, because documentation and understanding of system details are often lacking and tracing faults is costly and time-consuming. The complexity associated with legacy systems makes them difficult to integrate with other systems. The modernization approaches discussed in this chapter fall generally into three categories: 1) re-engineering, which is a combination of reverse engineering and scratch re-engineering with some changes and modifications to the legacy system; 2) wrapping, which provides a new interface to a component, making it more easily accessible by other software components; and 3) migration, which moves the legacy system to a more flexible environment while retaining the original system’s data and functionality. All the modernization approaches discussed tend to offer short-term solutions to long-term problems, and none of them incorporates software architecture reconstruction. Software architecture gives the blueprint of the legacy system, and modernization decisions can be based on it. The approaches fail to recognize that the essence of modernization is reconstructing the lost software architecture, from which different views can be generated. They also fail to recognize that software architecture is the key to software reuse in modernization.


CHAPTER 4: SOFTWARE ARCHITECTURE AND ITS ROLE IN SOFTWARE REUSE

4.1 Software Architecture?

This chapter presents an overview of software architecture and its importance in the modernization of legacy systems. It describes the quality attributes related to modernization and explains why software architecture and software architecture reconstruction are needed in a modernization approach. Software architecture is the blueprint of a system. All functional and non-functional requirements can be well understood once the blueprint is in hand. A system’s software architecture is generally regarded as having a large influence on the effort required to modify the system [33]. Software architecture plays a pivotal role in allowing an organization to meet its business goals. The software architecture is an asset that holds tangible value to the developing organization beyond the project for which it was created. According to Eden and Kazman [171] and Garlan and Shaw [175], software architecture is concerned with issues beyond the algorithms and data structures of the computations. Architectural design is design at a higher level of abstraction [172]. Architecture is specifically not about details of implementation (e.g., algorithms and data structures). Architectural design involves a richer collection of abstractions than is typically provided by OOD (Object-Oriented Design) [173]. Perry and Wolf define software architecture as a configuration of architectural elements (components, connectors, and data) constrained in their relationships in order to achieve a desired set of architectural properties [174]. They present a model that defines software architecture as a set of architectural elements that have a particular form, explicated by a set of rationale. Architectural elements include processing, data, and connecting elements. Form is defined by the properties of the elements and the relationships among the elements, that is, the constraints on the elements. The rationale provides the underlying basis for the architecture by capturing the motivation for the choice of architectural style, the choice of elements, and the form. This definition suggests that every system has an architecture. It also emphasizes that one system can comprise more than one architectural structure. In practice, the term “architecture” conveys information about the existing software system. Architecture is high-level design. The study of software architecture has evolved through observation of the design principles that designers follow and the actions they take when working on real systems. Over the past 15 years many researchers have sought to define software architecture. The software architecture of a program or computing system is the structure or structures of the system, which comprise software elements, the externally visible properties of those elements, and the relationships among them [33]. Software architecture has emerged as a crucial part of the software system design process. Software architecture encompasses the structures of software systems. A system’s software architecture is the first artefact in the development process after the set of requirements, and it captures the very first design decisions for that system. Architecture is the overall structure of the system. Different structures may provide different quality attributes and decide the success or failure of the design. The multiplicity of structures in an architecture lies at the heart of different quality attribute concepts [27]. Architecture is the structure of the components of a program or system, their interrelationships, and the principles and guidelines governing their design and evolution over time. The architecture of a system can be discovered and analyzed independently of any knowledge of the process by which the architecture was designed or evolved [27]. Architecture is components and connectors, where connectors imply a runtime mechanism for transferring control and data around a system; this definition therefore concentrates on the runtime architectural structures [27].

Eden and Kazman have suggested a distinction between “architecture”, “design” and “implementation”. Architecture, design and implementation fall on the same continuum, with architecture at one end and implementation at the other [171]. Complete detail with physical design is called implementation, few details with logical design is called design, and the highest level of abstraction is architecture. The Software Engineering Institute (SEI) has also clarified the difference between architecture, design and implementation. It has suggested that architectural specifications are intensional and non-local. Intensional specifications are “abstract” in the sense that they can be formally characterized by the use of logic variables that range over an unbounded domain. Non-local specifications are “abstract” in the sense that they apply to all parts of the system (as opposed to being limited to some part thereof). Bass et al. [33] have divided architectural structures into three groups:
• Module structures: here the elements are modules. A module structure describes the functionality of each module and its relationships with other modules. Module-based structures include the Decomposition, Uses, Layered and Class structures.
• Component-and-connector structures: here the elements are runtime components (the principal units of computation) and connectors (the communication vehicles among components). These structures describe the major executing components, their interactions, the parts of the system that can run in parallel and the parts of the system that can be replicated.
• Allocation structures: these show the relationship between the software elements and other elements (hardware or communication) in one or more external environments in which the software is created and executed. Allocation structures describe the deployment, implementation or work assignment (to development teams).

4.2 Software Architecture and Maintenance Attribute

Quality attributes of large software systems are to a large extent determined by the system’s software architecture, i.e. qualities such as maintainability and modifiability depend at least as much on the overall architecture as on the code level implementation [5].


The domain of software architecture has received considerable attention in recent years. This is, to some extent, because quality requirements (QRs) are heavily influenced by the architecture of the system. Some QRs conflict, making it necessary to find an architecture that provides an appropriate compromise. Architectural design is a typical multiple-objective design activity in which the software architect has to balance the various requirements during the design of the architecture. Although there are methods for analyzing specific quality attributes, these analyses have typically been done in isolation [176], [177], [178], and for purposes other than supporting software reuse or system modernization. Kazman et al. [176] developed the Software Architecture Analysis Method (SAAM), an approach that uses scenarios to gain information about a system’s ability to meet desired quality attributes such as safety and portability. The Architecture Tradeoff Analysis Method (ATAM), a follow-on from the SAAM work, is a method for evaluating software architectures relative to quality attribute goals [179]. ATAM reveals how well an architecture satisfies particular quality goals, and it provides architects with a means of evaluating the technical tradeoffs faced while designing or maintaining a software system. Software architecture lays the foundations for applications to meet their non-functional quality attributes, and the same holds for legacy architecture. Before we make any change, we need to understand the architecture of the software and isolate the parts that can be safely changed. Therefore, software architecture also has to be considered in all kinds of modernization activities. The components to be reused for modernization have to be chosen not only on the basis of their functionality but also for their architectural compatibility with the respective application.
For almost every software system development there is at least an informal description of the software’s architecture. Yet in most cases the software architecture is not consistently considered in the later development stages, especially in reuse activities. Practical experience with software reuse, however, is often disillusioning. There are a number of hurdles to software reuse, one of which is the incompatibility of reusable software components with the existing system. On the one hand, black-box components cannot be modified; it is only possible to wrap them in appropriate glue code, with limited chances of success. White-box components, on the other hand, would have to be modified.

Since the software architecture lays the foundation for achieving important non-functional quality attributes such as scalability, interoperability, maintainability, performance and reliability, undermining the software architecture has direct consequences for the entire system. Quality attributes are the overall factors that affect run-time behavior and system design. We are reconstructing the software architecture to enhance the maintainability quality attribute of our legacy system. Maintainability is the ability of the system to undergo changes with a degree of ease. These changes could affect component services, features, and interfaces when adding or changing functionality, fixing errors, and meeting new business requirements. Software maintenance involves the identification or discovery of program requirements and/or design specifications that can aid in understanding, modifying, and adding features to the legacy system. When designing software to meet a quality attribute such as maintainability, it is necessary to consider the potential impact on other requirements such as interoperability. The maintainability of a system can be directly affected by excessive dependencies between components and modules, and by inappropriate coupling to concrete classes, which prevents easy replacements, updates and other changes. Software architecture can help the software architect make decisions about replacements, updates and other changes to components and modules.
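Fan-in and fan-out counts over the recovered dependency relations are one simple way to make excessive dependencies visible. The following Python sketch is illustrative only. The module names are hypothetical, and the threshold of three outgoing dependencies is an arbitrary choice, not a value taken from the literature.

```python
from collections import Counter

def fan_in_out(deps):
    """Return {module: (fan_in, fan_out)} from (source, target) dependency pairs."""
    unique = set(deps)  # duplicate facts should not inflate the counts
    fan_out = Counter(src for src, _ in unique)
    fan_in = Counter(dst for _, dst in unique)
    modules = set(fan_out) | set(fan_in)
    # Counter returns 0 for modules that never appear on one side.
    return {m: (fan_in[m], fan_out[m]) for m in sorted(modules)}

# Hypothetical module-level dependencies recovered from a legacy system.
deps = [("orders", "billing"), ("orders", "stock"), ("orders", "customer"),
        ("billing", "customer"), ("report", "orders"), ("orders", "billing")]
metrics = fan_in_out(deps)

# Arbitrary illustrative threshold for "excessive" outgoing dependencies.
hotspots = [m for m, (_, out) in metrics.items() if out >= 3]
print(metrics["orders"], hotspots)  # (1, 3) ['orders']
```

A module with high fan-out, such as `orders` here, depends on many others and is therefore likely to be affected by their replacement or change. Such modules are natural places to examine when reasoning about maintainability.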

4.3 Software Architecture Reconstruction

Software architecture reconstruction of an existing system unfolds the architecture, which was perhaps never recorded by the original developers. It may also be that the architecture was recorded but the documentation has been lost, or that it was recorded but the documentation is no longer synchronized with the system after a series of changes. How do we modernize systems whose architecture documentation is missing or out of date? Because our modernization approach is based on software architecture, we need software architecture reconstruction to recover the as-built architecture of the implemented system. More details of software architecture reconstruction will be given in Chapter 5. Software architecture reconstruction is an interpretive and iterative process. It requires the skills and attention of both the reverse engineering experts and the architect, largely because architectural constructs are not represented explicitly in the source code: there are no programming language constructs for architectural elements that can easily be picked out of a source code file. Software architecture design patterns can be constructed from the reconstructed architecture to show the architectural elements. Different software architecture views can be generated to support the modernization of a legacy system. A software architecture view is a representation of a coherent set of architectural elements, as written and read by system stakeholders. It consists of a representation of a set of elements and the relations among them. A software architecture structure, by contrast, is the set of elements itself, as they exist in software or hardware [33].

4.5 Documenting Software Architecture

Given the uses of software architecture, it is very important to document it [181]. The architecture documentation is the first artefact to help in system understanding. For nearly all legacy systems, quality attributes such as performance, interoperability, modifiability and maintainability are modernization goals, and software architecture is where these goals are met [181]. Software architecture documentation serves many roles in the modernization of a legacy system. Our goal is the modernization of a legacy system for maintainability. For maintainability, we need to be concerned with the following:
• The decomposition of the software system into cooperating processes;
• The management of inter-process communication;
• How data elements are defined and used;
• If maintenance is needed to support interoperability, the careful separation of concerns among the parts of the software system;
• If the legacy system’s previous architects are no longer with the organization, the architecture is the artefact that (if properly documented) preserves their knowledge and rationale;
• If modernization is required, architecture documentation is both prescriptive and descriptive. That is, it prescribes what should be true, and it describes what is true, about a system’s design;
• If we want to maintain the system incrementally, by maintaining successively larger subsets, we have to keep the dependency relationships among the pieces untangled in order to avoid the “nothing works until everything works” syndrome [181].
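The “nothing works until everything works” syndrome arises when the dependencies among pieces form cycles. The following is a sketch under one assumption: the dependencies between subsystems have already been recovered into a mapping from each subsystem to the subsystems it depends on (the names are hypothetical). With that input, Python’s standard `graphlib` module (Python 3.9+) either yields an order in which subsets can be maintained incrementally or reports a cycle that must first be untangled.

```python
from graphlib import TopologicalSorter, CycleError

def incremental_order(depends_on):
    """Return (order, None) if subsystems can be handled one at a time,
    dependencies first, or (None, cycle) if a dependency cycle blocks that."""
    try:
        return list(TopologicalSorter(depends_on).static_order()), None
    except CycleError as err:
        # graphlib stores the detected cycle as the second element of args.
        return None, err.args[1]

# Hypothetical recovered dependencies: each key depends on the subsystems in its set.
untangled = {"ui": {"billing"}, "billing": {"db"}, "db": set()}
print(incremental_order(untangled))  # (['db', 'billing', 'ui'], None)

tangled = {"billing": {"db"}, "db": {"billing"}}
order, cycle = incremental_order(tangled)
print(order, cycle)  # order is None; cycle lists 'billing' and 'db'
```

A reported cycle identifies exactly which pieces would have to be separated before they can be maintained and released as successively larger subsets.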

4.6 Software Architecture and Software Reuse

Software artefacts can be systematically reused across the entire development life-cycle, i.e. domain analysis, requirements specification, design and implementation, and testing. Early experiences with software reuse were limited to reuse of program code in source and binary form. Today’s development approaches, such as object-oriented methods or rapid application development, vigorously advocate reusing artefacts at the earliest possible stage of the software life-cycle. Traditionally, the primary focus of reuse research has been on the reuse of code-level entities, such as classes, subroutines, and data structures. While there have been significant improvements in code reuse technology and methods, code-level artefacts are not the only ones that can be profitably reused: the architecture of a software system can itself be reused. An architectural design is concerned with the gross decomposition of a system into a set of interacting components [175], [182]. At this level of abstraction, key issues include the assignment of functionality to design elements, protocols of interaction, system extensibility, and broad system properties such as throughput, maintainability, and overall performance. All architectural design components can be reused. From a legacy system, we need to reconstruct the architecture of the system. Monroe and Garlan [183] have proposed style-based reuse of software architecture designs. Architectural design reuse appears to be one of the most promising avenues for improving the prospects for software reuse. Their work integrates the concepts of software architecture and architectural style with traditional mechanisms for performing standard software reuse tasks. They describe an approach based on the reuse of architectural designs. Its essential ingredients are (a) the use of components, connectors, and configurations as the basic vocabulary of reusable assets, and (b) the exploitation of architectural style to aid in the classification, retrieval, and instantiation of those assets. A limitation of Monroe and Garlan’s approach is that “design reuse” can mean many things: reuse of DFDs, ERDs, and collaboration, sequence, activity and other UML diagrams, or even reuse of test cases. Moreover, legacy systems were not developed using Object Oriented Design (OOD) principles. Baum et al. [184] presented a development approach explicitly designed for extensive reuse, and showed the importance of software architecture as a catalyst for reuse. They adopted the technique of reusing a core system. The core system approach also seems promising for product lines: provided that the reusable assets exhibit sufficient flexibility to meet the varying demands of different products, the core system might serve as the common basis for a product line. The limitation of this approach is that no knowledge-based reuse repository is used to store reusable software artefacts. Su, Hoskins and Grundy [185] have built a prototype tool, KaitoroCap, to support finding relevant information in architecture documents for reuse. To facilitate the use of software architecture documents, the architectural information in them needs to be structured into, or presented as, chunks [186].
KaitoroCap is a semi-automated approach to capturing users’ exploration paths through architectural documents while they engage in information seeking and navigation tasks. It focuses mainly on recovering the exploration path. This approach likewise lacks a knowledge-based reuse repository in which software artefacts can be stored for future reuse. Software architecture can convey the quality attributes of a system. It also shows the interrelationships between components, classes, modules, etc. Software architecture represents the high-level design. A reusable software architecture allows the reuse of design and code, either as a whole or as different reusable software artefacts such as code fragments, logical program structures, functional structures, domain knowledge, knowledge of the development process, process and environment-level information, ERDs, DFDs, sequence, activity and collaboration diagrams, and artefact transformations during the development process.
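Monroe and Garlan’s vocabulary of components, connectors and configurations, classified by architectural style, suggests one way to organize a knowledge-based reuse repository of the kind the approaches above lack. The sketch below is a hypothetical in-memory design. It is not the mechanism of Monroe and Garlan, Baum et al. or KaitoroCap, and the `Asset` fields and sample assets are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class Asset:
    """Hypothetical reusable asset recovered from a legacy system."""
    name: str
    kind: str    # "component", "connector" or "configuration"
    style: str   # architectural style the asset belongs to
    origin: str  # legacy system it was reconstructed from

@dataclass
class ReuseRepository:
    """Hypothetical repository that classifies and retrieves assets by style."""
    assets: List[Asset] = field(default_factory=list)

    def add(self, asset: Asset) -> None:
        self.assets.append(asset)

    def find(self, style: str, kind: Optional[str] = None) -> List[Asset]:
        """Retrieve assets of one architectural style, optionally of one kind."""
        return [a for a in self.assets
                if a.style == style and (kind is None or a.kind == kind)]

repo = ReuseRepository()
repo.add(Asset("InvoiceValidator", "component", "layered", "legacy-billing"))
repo.add(Asset("DbGateway", "connector", "layered", "legacy-billing"))
repo.add(Asset("ReportFilter", "component", "pipe-and-filter", "legacy-reports"))
print([a.name for a in repo.find("layered", "component")])  # ['InvoiceValidator']
```

Style acts as the primary retrieval key here because, as noted above, a reused component must be architecturally compatible with the target application as well as functionally suitable.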

4.7 Chapter Summary

This chapter discussed software architecture for modernizing legacy systems and presented an overview of software architecture and its importance in legacy system modernization. It described the quality attributes related to modernization and explained why software architecture and software architecture reconstruction are needed in a modernization approach. The software architecture serves as the blueprint for legacy system modernization. It identifies the modules that require modification and the effects this modification will have on design and implementation. The architecture is the carrier of system qualities such as understandability, modifiability, interoperability and maintainability. Modernization requires understanding the legacy system, and this understanding can be achieved through the architecture of the system. Architecture is a vehicle for early analysis to make sure that the design approach will yield an acceptable system. Architecture is also the artefact that holds the key to post-deployment system understanding and mining efforts. Software architecture is the key to reuse. In short, software architecture is the conceptual glue that holds the key to the modernization of a legacy system.


CHAPTER 5: ARCHITECTURE RECONSTRUCTION TOOLKITS

5.1 Introduction

This chapter discusses different architecture reconstruction toolkits and the toolkit we used in our modernization approach. We have examined four software architecture reconstruction toolkits and assessed and compared their capabilities in terms of extraction ability, abstraction ability, navigation, ease of use, views generated, language support, extensibility and completeness. During maintenance, modifications may occur that have an impact on the software system architecture. Software architects have to understand, analyze, and reason about the as-built software architecture of a system to modernize it [13]. Architecture reconstruction is needed if the existing architecture is not well documented and understood. Most of the time, the documentation of a legacy system does not truly represent the existing system, or is out of sync with it, because of poor documentation during the maintenance process [187]. Because of the time and effort required to develop a business-critical software system, such systems usually have a long lifetime. For example, military systems are often designed for a 20-year lifetime, and much of the world’s air traffic control still relies on software and operational processes that were originally developed in the 1960s and 1970s. Changes to one part of such a system inevitably involve changes to other components. These systems continue to be maintained because it is too risky to replace them. Studies show that between 50% and 90% of software maintenance involves understanding the software being maintained [157]. A software architecture, together with its elements and the way they interact, constitutes a valuable asset for understanding the software system.
Understanding legacy systems is time- and effort-consuming for several reasons, including the system size (large systems consist of millions of lines of code), the lack of overall views of the system, and its previous evolutions (not necessarily documented). This motivates us to investigate reconstructing the architecture of the legacy system to support understanding of the system [188]. The software architecture of a system is the set of structures needed to reason about the system, which comprise software elements, relations among them, and the properties of both. Modernization of a legacy system requires an understanding of all the functional and non-functional attributes of the system. The functional attributes of a system describe what the system should do; they are embedded in the software components of the system and the behavior of those components. The quality attributes are those not directly concerned with the specific functions delivered by the system. They are rarely associated with individual system features; rather, they are properties of the system as a whole. Understanding the source code alone does not give sufficient aid in understanding the quality attributes of the system; to understand them, we need to reconstruct the architecture of the system. As already outlined, one reason for carrying out modernization is a business need to upgrade the quality attributes of a system. Therefore, recovering or reconstructing software architecture information is an essential process for software modernization and evolutionary development. Manual software architecture reconstruction is infeasible for medium or large software systems because of their complexity, so tools are necessary to automate the architecture reconstruction process. However, no tool is fully automatic, since human interaction is required at some stage to obtain application-specific knowledge about the system [189].

5.2 Reconstruction Toolkits Overview

There are a number of toolkits available that support parsing source code, extracting information from it and reconstructing the software system’s architecture. We have selected the following four tools:


1. Dali workbench [190]
2. PBS toolkit [191]
3. SWAGKIT toolkit [192]
4. Bauhaus toolkit [193]

Our criterion for selecting a tool was that it must be able not only to extract information from the source code of a system but also to abstract and visualize system components at a higher level. Other tools, such as Sniff+, Understand and Imagix 4D, are not categorized as software architecture reconstruction toolkits, since they do not support the abstraction and/or visualization of the software architecture [194], [195], [196]. Nevertheless, they can be used as external extractors to produce formatted output that can then serve as input to one of the examined software architecture reconstruction toolkits. Table 5.1 summarizes the languages and platforms supported, the fundamental components and some of the development history of the toolkits.

5.2.1 Dali Workbench

Dali¹ is an open, lightweight workbench which uses a PostgreSQL database, Perl scripts and the Rigi [195] interface. It accepts well-formatted data as input and loads it into tables in a database. By manipulating the database, the architectural information can be grouped to show different levels of abstraction of the system’s architecture, which are then presented in a graphical interface [190].
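Dali groups architectural information by manipulating a database. A comparable grouping can be approximated with a relational join that lifts function-level call facts to subsystem level through a containment table. The sketch below uses Python’s built-in `sqlite3` as a stand-in for PostgreSQL so that it is self-contained. The table names, columns and sample facts are hypothetical and do not reflect Dali’s actual schema or scripts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: raw call facts plus an analyst-defined containment map.
conn.executescript("""
    CREATE TABLE calls   (caller TEXT, callee TEXT);
    CREATE TABLE contain (parent TEXT, child  TEXT);
""")
conn.executemany("INSERT INTO calls VALUES (?, ?)", [
    ("post_invoice", "validate"), ("post_invoice", "save_row"),
    ("print_report", "load_rows"), ("load_rows", "save_row"),
])
conn.executemany("INSERT INTO contain VALUES (?, ?)", [
    ("Billing", "post_invoice"), ("Billing", "validate"),
    ("Reporting", "print_report"),
    ("Persistence", "save_row"), ("Persistence", "load_rows"),
])
# Lift calls to subsystem level; calls inside a single subsystem are dropped.
lifted = conn.execute("""
    SELECT DISTINCT a.parent, b.parent
    FROM calls c
    JOIN contain a ON a.child = c.caller
    JOIN contain b ON b.child = c.callee
    WHERE a.parent <> b.parent
    ORDER BY a.parent, b.parent
""").fetchall()
print(lifted)  # [('Billing', 'Persistence'), ('Reporting', 'Persistence')]
```

Changing only the rows of the containment table, and re-running the query, produces a view at a different level of abstraction. This is the kind of interactive regrouping the workbench supports.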

5.2.2 The PBS and SWAGKIT Toolkits

PBS² was developed as a Software Bookshelf, a web-based model for presenting and navigating the architectural information of software systems [197]. A newer reverse engineering tool, SWAGKIT³, has since been developed to extend the capabilities of the PBS toolkit. SWAGKIT generates software landscapes from source code in three phases: extraction by the cppx tool; manipulation by the prep, linkplus and layoutplus tools; and presentation by the lsedit tool [197].

¹ Dali is named after Salvador Dalí, the Spanish surrealist.
² PBS: Portable BookShelf.
³ SWAGKIT: SWAGKIT toolkit.

5.2.3 Bauhaus Toolkit

The Bauhaus⁴ Toolkit is used for architecture recovery as well as for program understanding and code auditing during the maintenance process [198]. It comprises a set of tools to extract, analyze, query and visualize information about existing software.

Features | Dali | PBS | SWAGKIT | Bauhaus Toolkit
Supported platforms | Linux | Windows and Linux | Linux | Linux and Windows
Installation platform | Linux Redhat 9 | Windows XP | Linux Redhat 9 | Linux Redhat 9
Developed by | Software Engineering Institute | University of Waterloo, Canada | University of Waterloo, Canada | University of Stuttgart, Germany
Space occupied | 7 MB | 15 MB | 98 MB | 48 MB
Supported languages | C, C++, Java and Fortran | C | C and C++ | C and C++
Components | PostgreSQL 7.0+; Rigi; Tcl/Tk scripts; csh shell; Perl scripts | Web server; csh shell; C language parser cfx; manipulation tool Grok; Java-based user interface lsedit | cppx; Grok; lsedit | C front end cafe; preprocessor cafeCPP; front end cafeCC; linker imllink; script iml2rfg
Required external software | - | Java Virtual Machine (JVM) | JVM | Emacs
First release | 1998 | - | Sep 2002 | 2000
Latest update | 2001 | 1997 | Feb 2003 | Dec 2003
Superseded by | ARMin⁵ | SWAGKIT | N/A | N/A
Input format | Rigi Standard Form (.rsf files) | Source code; contain.rsf, which specifies the architecture of the system | Source code; contain.rsf, which specifies the architecture of the system | Source code; RSF; GXL (optional)
Output | Hierarchical graphical layouts of the main software system and its subsystems; graphical representation of the conformance to the designed architecture | Hierarchical graphical layouts of the main software system and its subsystems | Hierarchical graphical representation of the highest-level architecture | Various types of views, including call graph, file view, logical view, type view and user-defined view
Previous big case study | Nokia [199] | Linux operating system [191] | Linux and VIM editor [192] | Web browsers Mosaic and Chimera [200]

Table 5.1: Configurations of tools

⁴ Bauhaus: a very influential German school of arts in the nineteen twenties.
⁵ ARMin: Architecture Reconstruction and MINing tool, which was developed by the Software Engineering Institute and Robert Bosch Corporation.

5.3 Toolkits Functionalities

To understand and compare the functionality supported by the tools, each of them was applied to a case study program called ‘Concepts’. The primary goal of the case study was to reconstruct the software architecture of ‘Concepts’ with each tool and thereby understand each toolkit's capabilities in software architecture reconstruction. ‘Concepts’ is a medium-size application with 3861 lines of code in 31 files. During reconstruction, the source code is the primary artefact used for identifying reusable software components. Reusable code and other reusable artefacts are then identified from the reconstructed software architecture. No system documentation for ‘Concepts’ was available, so we relied solely on its source code, which is written in C.

The secondary goal was to gain hands-on experience of the reconstruction process with a medium-size application. Reconstructing the architecture of a much larger system with each tool would have required considerably more effort and might have made a reasonable comparison harder. We also wanted to produce scalable, modular and flexible software architecture artefacts that could be reused in other modernization contexts. The architecture of ‘Concepts’ after the function pattern is applied is shown in Figure 5.3. Figure 5.4 shows the domain-specific components, and Figure 5.5 shows the content of the subsystems within those components, which we used in the architectural analysis to identify software architecture artefacts.

Concept analysis is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the set of objects sharing the same values for a certain set of properties, and each sub-concept in the hierarchy contains a subset of the objects in the concepts above it. Formal concept analysis finds practical application in fields including data mining, text mining, machine learning, knowledge management, the semantic web, software development and biology.

The original motivation of formal concept analysis was the concrete representation of complete lattices and their properties by means of formal contexts, data tables that represent binary relations between objects and attributes. In this theory, a formal concept is defined to be a pair consisting of a set of objects (the "extent") and a set of attributes (the "intent") such that the extent consists of all objects that share the given attributes, and the intent consists of all attributes shared by the given objects. In this way, formal concept analysis formalizes the notions of extension and intension.
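The definition above can be written compactly with the derivation operators of formal concept analysis. The notation below is standard textbook notation and is not used elsewhere in this thesis. G stands for the set of objects, M for the set of attributes, I for the incidence relation, and the prime denotes the derivation operator.

```latex
\begin{align*}
  &\text{Formal context: } (G, M, I), \qquad I \subseteq G \times M \\
  &A' = \{\, m \in M \mid (g, m) \in I \text{ for all } g \in A \,\}, \qquad A \subseteq G \\
  &B' = \{\, g \in G \mid (g, m) \in I \text{ for all } m \in B \,\}, \qquad B \subseteq M \\
  &(A, B) \text{ is a formal concept} \iff A' = B \text{ and } B' = A \\
  &(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \iff B_2 \subseteq B_1
\end{align*}
```

Applying the two operators in turn is a closure, so every extent has the form B' and every intent the form A'. Under the order in the last line, the set of all concepts forms a complete lattice.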

5.3.1 Overview of ‘Concepts’ Program: A Case Study

The ‘Concepts’ program, written in C, calculates a lattice of concepts from a binary relation and produces output in a format specified by the user. Concept lattices are also known as Galois lattices. Any set of objects shares a (possibly empty) set of common attributes. Likewise, any set of attributes is shared by a (possibly empty) set of common objects. Therefore, every set of objects determines a set of common attributes and every set of attributes determines a set of common objects. A pair (O, A) of such sets is called a concept if the set of attributes common to the objects in O is A and the set of objects that have all the attributes in A is O [193]. The only documentation available for the ‘Concepts’ program is an installation guide and a brief description of the software. No design or architectural documentation exists.
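To make the definition concrete, the sketch below enumerates every concept of a tiny binary relation by closing each subset of attributes. It is a naive, exponential illustration in Python. It is not the algorithm used by the ‘Concepts’ program, whose implementation is not described here. The functions f1 to f3 and the global variables g1 to g3 are hypothetical.

```python
from itertools import combinations

def extent_of(attrs, context):
    """B': the objects that have every attribute in attrs."""
    return frozenset(obj for obj, has in context.items() if attrs <= has)

def intent_of(objs, context, all_attrs):
    """A': the attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)

def all_concepts(context):
    """Enumerate every (extent, intent) concept; exponential, so for toy inputs only."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            extent = extent_of(frozenset(subset), context)            # B'
            concepts.add((extent, intent_of(extent, context, all_attrs)))  # (B', B'')
    return concepts

# Hypothetical context: functions (objects) x global variables they use (attributes).
example_context = {
    "f1": frozenset({"g1", "g2"}),
    "f2": frozenset({"g2", "g3"}),
    "f3": frozenset({"g2"}),
}
for extent, intent in sorted(all_concepts(example_context),
                             key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Each candidate attribute set B yields the pair (B', B''), which always satisfies the concept condition. Every concept is reached because its intent is itself one of the subsets tried. For this small context the loop prints four concepts, ranging from all three functions sharing g2 down to the empty extent with every attribute.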

Understand for C and Imagix 4D were used in this case study to extract and understand information from the source code. We used this case study to compare the selected architecture reconstruction toolkits. A metrics summary of ‘Concepts’, obtained with the Understand for C tool [264], is shown below:

Files: 31
Functions: 125
Lines: 9381
Lines of code: 3861
Lines of comments: 2603

5.3.2 Dali Workbench

View extraction: View extraction is the process of gathering and analysing existing design and implementation artefacts such as source code, architectural documentation or design documentation. The Dali workbench has no extraction tools of its own, so it is the user's responsibility to extract structural information from the source code. Several tools are available for this purpose, such as Imagix 4D [8], Understand and Sniff+ [20]. In this case study, Understand for C was used to extract information from the source code. Perl scripts were written to transform the reports generated by Understand for C into Rigi [195] Standard Format (RSF), the only format that Dali accepts. The RSF schema for the ‘Concepts’ program is:

function calls function
file defines_fn function
file defines_global global_variable
file defines_macro macro
file defines_ADT ADT⁶
file uses_macro macro
function uses_global global_variable
function uses_ADT ADT
function defines_var local_variable

⁶ ADT = Abstract Data Type, including struct and union.
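The conversion step can be sketched as follows. The thesis used Perl scripts over Understand for C reports. This Python stand-in instead assumes a hypothetical comma-separated report with one `relation,source,target` fact per row, which is not Understand's actual report format. It writes each fact as a whitespace-separated RSF triple with the relation name first, and rejects relations that are not in the schema above. The entity names in the example are invented.

```python
import csv
import io

# Relation names from the 'Concepts' RSF schema.
SCHEMA = {
    "calls", "defines_fn", "defines_global", "defines_macro", "defines_ADT",
    "uses_macro", "uses_global", "uses_ADT", "defines_var",
}

def report_to_rsf(report_text):
    """Turn 'relation,source,target' rows into RSF lines 'relation<TAB>source<TAB>target'."""
    out = []
    for row in csv.reader(io.StringIO(report_text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 3:
            raise ValueError(f"expected 3 fields, got {row!r}")
        relation, source, target = (field.strip() for field in row)
        if relation not in SCHEMA:
            raise ValueError(f"relation not in schema: {relation}")
        out.append(f"{relation}\t{source}\t{target}")
    return "\n".join(out) + "\n"

# Hypothetical report rows (not Understand's actual report format).
report = """# relation,source,target
calls,main,parse_args
defines_fn,main.c,main
uses_global,parse_args,verbose
"""
print(report_to_rsf(report), end="")
```

Validating against the schema at conversion time catches typos in relation names before the facts reach the database.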

The Understand software can only extract static information. Dynamic state information for the ‘Concepts’ program can be captured by using other tools or facilities like profiling or from the makefile.

Database Construction: After the views (sets of relations) of a system have been obtained, they are stored in a relational database for later use. In our case study we used PostgreSQL 7.0 to store the relations.

View Fusion: More than one view can be obtained from the view extraction stage, and those views can be combined to create a fused view [190]. Fusions are defined in SQL. An SQL file named fuse-types.sql was written to specify the entity types for the views extracted from ‘Concepts’. A static view can also be combined with a dynamic view to give an overall view.
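The following is a minimal sketch of database construction and view fusion. It uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL repository that Dali uses. The table names (static_calls, dynamic_calls) and the fused_calls view are illustrative assumptions, not Dali's schema or the contents of fuse-types.sql. The example fuses a static call view with a dynamic one and records where each call edge was observed.

```python
import sqlite3

def build_fused_view(static_calls, dynamic_calls):
    """Store two call views in an in-memory database and fuse them with SQL."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE static_calls (caller TEXT, callee TEXT)")
    db.execute("CREATE TABLE dynamic_calls (caller TEXT, callee TEXT)")
    db.executemany("INSERT INTO static_calls VALUES (?, ?)", static_calls)
    db.executemany("INSERT INTO dynamic_calls VALUES (?, ?)", dynamic_calls)
    # Fused view: one row per call edge seen in either view, flagged by origin.
    db.execute("""
        CREATE VIEW fused_calls AS
        SELECT caller, callee,
               MAX(in_static) AS in_static,
               MAX(in_dynamic) AS in_dynamic
        FROM (
            SELECT caller, callee, 1 AS in_static, 0 AS in_dynamic FROM static_calls
            UNION ALL
            SELECT caller, callee, 0, 1 FROM dynamic_calls
        )
        GROUP BY caller, callee
    """)
    return db

# Hypothetical facts: 'on_token' is only reached through a function pointer at run time.
db = build_fused_view(
    static_calls=[("main", "parse"), ("main", "report")],
    dynamic_calls=[("main", "parse"), ("parse", "on_token")],
)
for row in db.execute("SELECT * FROM fused_calls ORDER BY caller, callee"):
    print(row)
```

In this hypothetical example, the call from parse to on_token appears only in the dynamic view, as a call through a function pointer might. Static extraction alone can miss this kind of relation.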

Architecture Reconstruction: Dali uses Rigi as its interface. Figure 5.1 shows the screen presented when Rigiedit is started and loads a new database (the one saved in PostgreSQL during database construction) into the main Rigi interface window. Dali allows a user to construct more abstract views of a software system from more detailed ones by aggregating elements. This is achieved by applying aggregation patterns, each defined as a combination of SQL queries and Perl scripts. The SQL query identifies nodes in the repository, which are then grouped to form a new aggregate. The Perl commands transform names and perform further manipulation of the query results.

Figure 5.1: Dali Interface

There are three aggregation patterns applied in the ‘Concepts’ case study.

1. The first is function aggregation, which aggregates functions and their local variables.
2. The second is file aggregation, which accumulates all elements defined in each file, such as aggregated functions, global variables, macros and ADTs.
3. The third is concepts-architecture aggregation, which groups a number of related files into a module.

The first two patterns are application-independent, since they leverage architectural information common to many applications. The last one, concepts-architecture aggregation, depends on domain knowledge of the system and therefore requires human interaction during the reconstruction process. It was developed from the decomposition of subsystems specified in the conceptual architecture. Figures 5.2 and 5.3 display the graphical layouts of the ‘Concepts’ architecture as the patterns are applied in sequence.
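The file aggregation pattern can be illustrated with a single SQL query. The sketch below uses Python with sqlite3 as a stand-in for Dali's PostgreSQL plus Perl combination. It lifts function-level calls edges to file level through the defines_fn relation from the RSF schema. The table layout and the example functions and files are hypothetical.

```python
import sqlite3

def file_level_calls(defines_fn, calls):
    """Lift function-level 'calls' facts to file level via 'defines_fn' (file, fn) pairs."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE defines_fn (file TEXT, fn TEXT)")
    db.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
    db.executemany("INSERT INTO defines_fn VALUES (?, ?)", defines_fn)
    db.executemany("INSERT INTO calls VALUES (?, ?)", calls)
    rows = db.execute("""
        SELECT src.file, dst.file, COUNT(*) AS n_calls
        FROM calls AS c
        JOIN defines_fn AS src ON src.fn = c.caller
        JOIN defines_fn AS dst ON dst.fn = c.callee
        WHERE src.file <> dst.file      -- intra-file calls stay inside the aggregate
        GROUP BY src.file, dst.file
        ORDER BY src.file, dst.file
    """).fetchall()
    db.close()
    return rows

defines_fn = [("main.c", "main"), ("list.c", "list_init"), ("list.c", "list_add")]
calls = [("main", "list_init"), ("main", "list_add"),
         ("list_add", "list_init"), ("main", "printf")]
print(file_level_calls(defines_fn, calls))  # [('main.c', 'list.c', 2)]
```

Calls inside one file disappear into the aggregate. Calls to functions not defined in any analysed file (here printf) drop out of the inner joins. The concepts-architecture pattern could be expressed the same way by joining against a hand-written file-to-module table, which is where the domain knowledge mentioned above enters.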

Figure 5.2: Raw concrete module of ‘Concepts’

Figure 5.3: The ‘Concepts’ architecture after the function pattern is applied

Figure 5.4: Domain specific components

Figure 5.5: The content of sub systems in the domain specific components

Figure 5.4 shows the domain-specific components, and Figure 5.5 shows the content of the subsystems within those components, which were used for architectural analysis.

Architecture Analysis: With the help of the Rigiedit interface tool, the as-designed architecture of ‘Concepts’ is drawn. This gives us both the as-implemented and the as-designed architectures of the program, so a conformance analysis can be performed using the RMTool [201].

5.3.3 PBS and SWAGKIT Toolkits

The Portable Bookshelf (PBS) is a toolkit used for generating a ‘software bookshelf’ [197]. A software bookshelf for a large system can provide an easily accessible web-based structure for storing information about the system. The information provided in the bookshelf includes the source code as well as other documentation about the system.

Reconstruction using PBS: Like Dali, PBS uses the RSF format to describe facts. PBS has a C extractor that extracts facts directly from the source code. The extracted results can then be manipulated with a fact manipulator. The manipulator provided by PBS is Grok [197], a tool that operates on facts written in RSF. PBS also has a Layouter tool, which reads facts representing a graph and adds layout attributes to them. Finally, the graphs are viewed in a Landscape Viewer, which is a Java applet. To obtain the architecture of a software system, an RSF file specifying the containment structure of the system is needed; PBS can generate such a file automatically. At the time PBS was evaluated, its functionality was not fully supported, and the toolkit has since been superseded by SWAGKIT.

Reconstruction using SWAGKIT: Figure 5.6 shows the highest-level architecture of ‘Concepts’ retrieved from SWAGKIT. The SWAGKIT interface is similar to that of PBS; however, diagrams are not displayed on web pages. Figure 5.7 shows one of the main subsystems of the ‘Concepts’ architecture. The following steps were involved in reconstructing ‘Concepts’ [192]:

• Extract facts from the source code using the cppx tool, which produces *.ta files from the original source files;
• Prepare the facts using prep, which produces *.o.ta files from the extracted facts;
• Link the facts using linkplus, which produces out.ln.ta from the *.o.ta files;
• Lay out the facts using layoutplus, which produces out.ls.ta from out.ln.ta;
• Finally, visualise the graphs with lsedit.

Figure 5.6: Highest level Architecture of Concepts retrieved from SWAGKIT

Figure 5.7: One of the main sub-systems of ‘Concepts’ Architecture

5.3.4 Bauhaus Toolkit

The Bauhaus Toolkit provides a combination of tools to assist software maintenance and the recovery of a software system's architecture [198]. It uses three special tools to analyse source code:

1. cafe, a C analyser front end that generates an intermediate language (IML) representation from the source code;
2. imllink, a linker that performs global name resolution on IML files and generates a globally linked IML describing the whole system;
3. cafeCC, a script that acts as a front end to cafe and imllink.

The compiler used for the ‘Concepts’ program was cafeCC.

Reconstruction using Bauhaus Toolkit:

1. Extraction

The Bauhaus Toolkit extracts the following types of global declarations:
• routines: C functions defined in source code and included from the environment;
• user-defined types: such as structs, unions, enums and typedefs;
• objects: global variables and constants;
• members: record components of structs and unions;
• indirect calls: calls through function pointers.

Figure 5.8: Extraction of global variables

The Bauhaus Toolkit also captures the aggregation of global declarations into modules and the containment of modules in directories. Relations between global elements can be lifted to the module level, and the global elements can be grouped into directories or subsystems. This feature is enabled by setting the option _total_lift in the iml2rfg tool, which reads the IML file and creates a graph, called a resource flow graph, containing all global declarations. The Bauhaus Toolkit isolates global declarations; Figure 5.8 shows the extraction of global variables.

2. Visualisation of Dependencies

The Bauhaus Toolkit's interface has two main windows. The first is the workbench, which contains the main menu and displays status information. The second is the view box, which helps users manipulate various views of the graph. Components can be browsed from the top to the bottom of the hierarchical architecture in a view mode of the user's choice. There are three kinds of views:
1. Flat: all nodes and edges are shown;
2. Hierarchical: only the top-level nodes are shown;
3. Shrimp: level arcs (arcs connecting two nodes on different hierarchical levels) can be displayed.

Figure 5.9 shows the nodes and edges of two main ‘Concepts’ subsystems in a Shrimp view.

Figure 5.9: Nodes within two subsystems lib and src and dependencies between them

Figure 5.10: Concepts’ sub-system Node Information in Tree Layout

The information for a ‘Concepts’ subsystem node can be analysed by choosing the Imports, Exports, Clients or Details view. For a component C:
• Imports are the entities that C uses from other components;
• Exports are the entities of C that are used by other components;
• Clients are the components that use entities of C;
• Details are the entities of C that are not used by other components.

Figure 5.10 shows the information for a ‘Concepts’ subsystem node using a particular layout. Different layout algorithms can be applied to the graphs. Some common ones are:
• Circular layout: nodes are displayed in a circle;
• Grid: nodes are displayed in a grid;
• Stretch: nodes are moved apart along the X and/or Y axis;
• Spring layout;
• Tree layout;
• Sugiyama;
• Tighten: nodes are moved closer together along the X and/or Y axis.
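The four node views listed above follow directly from their definitions. The sketch below computes them for one component C from an entity-to-component map and a list of (user, used) dependency pairs. The function name, the data layout and the example entities are hypothetical. Only the subsystem names lib and src come from Figure 5.9.

```python
def component_views(component, owner, edges):
    """Compute Imports, Exports, Clients and Details for one component.

    owner: maps each entity to the component that contains it.
    edges: (user, used) pairs meaning 'user depends on used'.
    """
    members = {entity for entity, comp in owner.items() if comp == component}
    imports, exports, clients = set(), set(), set()
    for user, used in edges:
        if user in members and used not in members:
            imports.add(used)              # C uses an entity of another component
        elif used in members and user not in members:
            exports.add(used)              # another component uses an entity of C
            if user in owner:
                clients.add(owner[user])   # record which component is the client
    details = members - exports            # entities of C used by no other component
    return {"imports": imports, "exports": exports,
            "clients": clients, "details": details}

owner = {"main": "src", "parse": "src",
         "list_init": "lib", "list_add": "lib", "list_free": "lib"}
edges = [("main", "parse"), ("main", "list_init"),
         ("parse", "list_add"), ("list_add", "list_init")]
print(component_views("lib", owner, edges))
```

Dependencies that stay inside C, such as list_add using list_init, affect none of the four views. That is why list_init counts as an export here but list_free is a detail.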

3. Architecture Conformance Analysis

The Bauhaus Toolkit allows users to check the conformance of a system's implementation against a specification of the system's intended architecture, if one exists. Users specify the high-level architecture and map the concrete components onto it. The toolkit then compares the high-level architecture with the concrete components and their dependencies [198].

The output of the comparison is a reflexion model highlighting:
• Convergences: references in the hypothesized model that are also present in the concrete model;
• Divergences: references in the concrete model for which no reference exists in the hypothesized model;
• Absences: references in the hypothesized model that are not present in the concrete model.
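The comparison can be sketched as set operations on module-level dependencies, following the three definitions above. The function, the module names and the file-to-module mapping are hypothetical. Dependencies between entities mapped to the same module are ignored, which is an assumption of this sketch.

```python
def reflexion_model(hypothesized, mapping, source_edges):
    """Compare a hypothesized module-level model with concrete source dependencies.

    hypothesized: set of (module, module) dependencies the architect expects.
    mapping: maps concrete entities (e.g. files) to modules.
    source_edges: (entity, entity) dependencies extracted from the code.
    """
    # Lift each concrete dependency to the module level through the mapping.
    concrete = {
        (mapping[a], mapping[b])
        for a, b in source_edges
        if a in mapping and b in mapping and mapping[a] != mapping[b]
    }
    return {
        "convergences": hypothesized & concrete,   # expected and present
        "divergences": concrete - hypothesized,    # present but not expected
        "absences": hypothesized - concrete,       # expected but not present
    }

hypothesized = {("input", "lattice"), ("lattice", "output")}
mapping = {"read.c": "input", "lattice.c": "lattice", "print.c": "output"}
source_edges = [("read.c", "lattice.c"), ("print.c", "lattice.c")]
print(reflexion_model(hypothesized, mapping, source_edges))
```

In this invented example, the dependency runs from output to lattice rather than the other way round. It therefore shows up both as a divergence and as an absence.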

4. Transitive Dependencies among Components

The Bauhaus Toolkit also allows users to analyse one specific component and its dependencies on other components. Figure 5.11 depicts the result of this analysis for the ListInit routine: all routines that must be executed for the function ListInit can be identified.
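Collecting every routine that a given routine depends on amounts to computing the set of nodes reachable from it in the call graph. A breadth-first traversal is one straightforward way to do this. The sketch below uses a hypothetical call graph, not the real ‘Concepts’ call graph behind Figure 5.11.

```python
from collections import deque

def transitive_callees(call_graph, root):
    """Return every routine reachable from root through one or more calls (BFS)."""
    reached = set()
    queue = deque(call_graph.get(root, ()))
    while queue:
        routine = queue.popleft()
        if routine in reached:
            continue  # already visited; also guards against recursive cycles
        reached.add(routine)
        queue.extend(call_graph.get(routine, ()))
    return reached

# Hypothetical call graph: caller -> list of direct callees.
call_graph = {
    "list_init": ["alloc_node", "set_head"],
    "alloc_node": ["xmalloc"],
    "xmalloc": ["malloc"],
    "set_head": [],
}
print(sorted(transitive_callees(call_graph, "list_init")))
# ['alloc_node', 'malloc', 'set_head', 'xmalloc']
```

Because visited routines are skipped, the traversal terminates even when routines call each other recursively.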

Figure 5.11: Hierarchical Call for ListInit Routine

5.4 Analysis of Architecture Reconstruction Toolkits

Analyzing the results from the architecture reconstruction case study using the various toolkits is very important as it supports the choice of toolkit that will be used for architecture reconstruction to support modernization. We have analyzed the architecture reconstruction toolkits using several criteria including extraction capabilities, abstraction capabilities, visualization capabilities, language support and completeness. This analysis is discussed in the following sections.

5.4.1 Extraction Capabilities

Integrated extraction tools: PBS, SWAGKIT and the Bauhaus Toolkit each include an extractor. An extractor is not part of the Dali workbench, so the source code has to be processed by external tools such as Sniff+, Imagix 4D or Understand. Since extraction is done outside the Dali workbench, Dali can be applied to programs written in a wider variety of languages, as long as the input file is in RSF format. PBS can only parse C code, while SWAGKIT can parse both C and C++ through the cppx front-end analyser. The Bauhaus Toolkit's parsing capability is more extensive, since it has analysers for ANSI-conformant C, C++, Java and COBOL and also accepts input in RSF and Graph eXchange Language (GXL) format. Most non-strict ANSI C programs can also be processed by the Bauhaus Toolkit analysers by providing appropriate macros [198].

Hiding system library calls: In SWAGKIT, standard library files are not isolated automatically. Users can group those files into a sub-module by modifying the arch-contain.rsf file, which specifies how the system should be decomposed into a tree hierarchy. A similar step is needed in PBS; alternatively, a Grok script can be written to remove the library files from the results. The Bauhaus Toolkit has an option to specify which library calls are removed from the analysis. Dali is able to show the library calls.
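One generic way to hide library calls is to drop every call edge whose callee is declared in a system header. The sketch assumes a hypothetical map from entity names to their declaring files. It treats /usr/include and /usr/local/include as the system header directories, which is a typical Unix convention. It is not how any of the four toolkits implements this option.

```python
from pathlib import PurePosixPath

# Assumed system header locations (a common Unix convention).
SYSTEM_DIRS = (PurePosixPath("/usr/include"), PurePosixPath("/usr/local/include"))

def is_system_file(path):
    """True if path lies under one of the assumed system include directories."""
    p = PurePosixPath(path)
    return any(d in p.parents for d in SYSTEM_DIRS)

def hide_library_calls(calls, declared_in):
    """Drop (caller, callee) edges whose callee is declared in a system header."""
    return [
        (caller, callee)
        for caller, callee in calls
        if not (callee in declared_in and is_system_file(declared_in[callee]))
    ]

declared_in = {"printf": "/usr/include/stdio.h", "list_init": "src/list.h"}
calls = [("main", "printf"), ("main", "list_init"), ("main", "helper")]
print(hide_library_calls(calls, declared_in))
# [('main', 'list_init'), ('main', 'helper')]
```

Callees with no known declaration (helper in this example) are kept, so that nothing is hidden unless it is positively identified as library code.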

Parsing capability: The Bauhaus Toolkit allows users to define their own schema for the data, whereas in Dali the extraction depends on the external extraction tool chosen and on the architect who performs the task. Local variables cannot be extracted using the Bauhaus Toolkit. Dali can provide more flexibility, because users decide how deep an architectural level they want to obtain from the analysis. PBS's extractor requires the source code to compile correctly. SWAGKIT presents a scriptable environment for visualizing the structure of large software systems. Parsers are provided for multiple languages (C, C++ and COBOL). The Rigi C++ parser is based on IBM VisualAge for C++.

Ease of Use: All of the integrated extraction tools are easy to use since each provides straightforward scripts which require little human interaction.

5.4.2 Abstraction Capabilities

All four tools can lift low-level information to high-level representations through the composition of models. In the Bauhaus Toolkit, data can be abstracted automatically according to the files-modules-directories hierarchy, whereas abstraction is not automatic in Dali and PBS.

Abstraction in PBS and SWAGKIT is achieved with the Grok tool. PBS provides many scripts that automate the abstraction, but a predefined subsystem structure has to be written manually before the scripts can be run. Abstraction in PBS is fast and is performed before visualisation, which makes it time-efficient. Users can also write their own Grok scripts to manipulate the data for a specific purpose, such as analysing only a portion of a system or using an external extraction tool.

Dali uses Perl scripts to load low-level data extracted by an external tool into a PostgreSQL data repository, and it uses the Rigi tool to provide the interface and abstraction capability. Manual inspection of the nodes displayed in the resulting graph reveals that none of the system library calls are included; they were removed during the abstraction process. For each abstraction, users have to write a pattern query file that combines an SQL query and a Perl script command. In Dali, a number of sequential queries can be grouped into a query set that is executed in one go and can be reused later, saving users time. The first abstraction step is quite simple, but higher-level subsystem decomposition normally requires domain knowledge.

In the Bauhaus Toolkit, abstraction is automatic. During extraction, a resource flow graph is created to represent all global declarations (nodes) and their dependencies (edges). When this graph is visualised in the Bauhaus Toolkit graph editor, the components of the system can be browsed from top (directories) to bottom (source code).
In addition, logical abstraction can be performed through a component mining process. The Bauhaus Toolkit automatically performs the analyses, computes metrics for the proposed candidate components, presents the results and keeps a record of the user's decisions. The user selects an analysis to be applied. The analysis takes into account the components already confirmed by the user (in the first iteration there are none). The analysis is thus applied incrementally, clustering into new or existing components only those global declarations that have not been clustered before. The candidate components are then presented to the user, who validates them. Accepted components are registered in the component memory, that is, the user view. In each iteration the user selects and combines different analyses to find components that previous analyses could not find. The process ends when the components found are sufficient for the task at hand or no further components can be found. Several analyses can be selected and applied in parallel. After all components have been found, the user can investigate and validate them using intersection, union and difference analyses. These analyses run automatically [198].

5.4.3 Visualization Capabilities

Table 5.2 summarizes the visualisation capabilities of the toolkits. A check mark (✓) in a column indicates that the toolkit provides the feature in that row; a dash (-) indicates that it does not. All tools represent architectural information as graphs containing only nodes and edges. Each node represents a component in the system, and each arc represents a relationship between components. The Dali workbench and the PBS bookshelf use only colour coding (for example red and green) to distinguish different types of nodes and edges. The Bauhaus Toolkit and SWAGKIT use both colour and shape to differentiate node types, and colour only for edge types. These two tools also have a panel describing all node and edge types. Moreover, SWAGKIT has a map, which is a miniature of the content structure at each hierarchical level of the target node. The PBS and SWAGKIT graph viewers do not support scrolling, whereas Dali and the Bauhaus Toolkit do. In addition, the Bauhaus Toolkit provides zooming, and Dali offers a scale feature that is similar to zooming but less flexible. Only the Bauhaus Toolkit offers multiple views of graphs. Dali uses uni-directional arcs to represent the relations between any two components. Dali allows users to draw their own diagrams representing the architecture of a system according to the design documentation. Users can then use a conformance analysis tool to compare the as-designed and as-implemented architectures. A similar feature is implemented in the Bauhaus Toolkit. Both tools use specific colour codes to distinguish divergences, convergences and absences.

Feature | Dali | PBS | SWAGKIT | Bauhaus Toolkit
Node type in shape | - | - | ✓ | ✓
Node type in color | ✓ | ✓ | ✓ | ✓
Node type in text | - | - | - | ✓
Edge type in color | ✓ | ✓ | ✓ | ✓
Edge type in text | - | - | - | ✓
Move nodes and arcs | ✓ | ✓ | ✓ | ✓
Node hierarchies | ✓ | - | ✓ | ✓
Resize nodes | - | ✓ | ✓ | -
Bi-directional edges | - | ✓ | ✓ | -
Scroll | ✓ | - | - | ✓
Zoom | ✓ | - | - | ✓
Annotation | ✓ | ✓ | ✓ | ✓
Different layouts | ✓ | - | ✓ | ✓
Saving graph | ✓ | - | ✓ | ✓
Opening graph | ✓ | - | ✓ | ✓
Create user's own graphs | ✓ | - | - | ✓

Table 5.2: Visualisation capabilities of the toolkits

Navigation features: All of the tools except PBS support hierarchical browsing which reinforces the abstraction of components into modules. Multiple nodes can be collapsed together into one parent node. The children nodes can also be viewed. While Dali and Bauhaus Toolkit open a new window to display children nodes, SWAGKIT and PBS display a new graph in the same main window (the current one is cleared). Bauhaus Toolkit can either display a nested graph in the same window or display children nodes in a new window. Bauhaus Toolkit has a manager to control these windows.

Views and Layouts: The readability of a graph can be examined in terms of node shape, node size, graph layout, the number of arc crossings and the placement of nodes. In Dali, nodes are represented as square boxes, whereas in PBS they are rectangular boxes. SWAGKIT and the Bauhaus Toolkit supply a number of different node shapes, such as circle, square, hexagon and triangle. Dali and the Bauhaus Toolkit allow full editing of graphs, including adding, deleting, modifying and moving nodes. PBS provides only one layout option, which is invoked automatically when the user interface starts. Dali, SWAGKIT and the Bauhaus Toolkit offer more layout options, although Dali and SWAGKIT do not have as many as the Bauhaus Toolkit. In Dali, an appropriate graph layout is not invoked automatically; its horizontal and vertical layouts give a tight arrangement of nodes and their names. The Bauhaus Toolkit offers a number of layout options to choose from, depending on the type of graph to be presented. All the tools allow arcs to cross, and arcs can even pass through nodes to which they are not attached, which can make graphs look more complex.

5.4.4 Other Attributes

Table 5.3 shows the assessment of the Software Architecture Reconstruction Toolkits based on assessment criteria: Extraction capabilities, Abstraction capabilities, Visualization Capabilities, Language support and Completeness.

Assessment criteria | Dali | PBS | SWAGKIT | Bauhaus Toolkit
Extraction capability
Integrated extractor | - | ++ | ++ | ++
Hiding system library calls | + | 0 | 0 | ++
Parsing ability | 0 | ++ | + | +
Ease-of-use | + | + | + | ++
Combined views | + | - | - | +
Abstraction capability
Files-directory decomposition | ++ | 0 | ++ | ++
Logical sub-module decomposition | ++ | + | ++ | ++
Ease-of-use | + | 0 | + | +
Visualization
Node types | + | 0 | ++ | ++
Edge types | + | 0 | + | ++
Combined views | + | - | - | ++
Annotations | ++ | 0 | ++ | ++
Layouts | + | 0 | + | ++
Navigation | ++ | 0 | ++ | ++
Node attributes editable | + | 0 | + | +
View editable | ++ | ++ | - | ++
User interface | + | 0 | ++ | ++
Views storable | ++ | - | ++ | ++
History of browsed locations | - | - | 0 | 0
Language support | ++ | 0 | ++ | +
Completeness | + | 0 | + | +

Table 5.3: Assessment of the Software Architecture Reconstruction Toolkits (++: excellent, +: good, 0: minimal, -: not at all)

Language support: Dali supports C, C++, Java and FORTRAN. The Bauhaus Toolkit supports C, C++ and Java. SWAGKIT has a GCC-based fact extractor for C, C++, FORTRAN and PASCAL.

Completeness: A software architecture reconstruction tool should provide extraction, abstraction, views and navigation. The Bauhaus Toolkit has a combination of tools supporting all of these. Dali also has all of these features except extraction. PBS and SWAGKIT have extraction ability, but they cannot hide environment calls automatically and do not offer an easy way to perform architecture conformance analysis. The structures and utilities of both PBS and the Bauhaus Toolkit are useful for maintenance and reengineering of software systems, while Dali and SWAGKIT are mainly designed for reverse engineering.

For extraction capabilities, we assessed whether each tool has an integrated extractor and hides system library calls, as well as its parsing ability, ease of use and combined views. For abstraction capabilities, we assessed whether the tools support files-directory decomposition and logical sub-module decomposition, and how easy their abstraction capabilities are to use. For visualization capabilities, we assessed support for node types, edge types, combined views, annotations and layouts. We also assessed whether node attributes and views are editable, the quality of the user interface, whether views can be stored, and whether a history of browsed locations is kept.

5.5 Chapter Summary

This chapter discussed different architecture reconstruction toolkits and the toolkit we used in our modernization approach. We examined four software architecture reconstruction toolkits and assessed and compared their capabilities in terms of extraction ability, abstraction ability, navigation, ease of use, views generated, language support, extensibility and completeness. The toolkits evaluated are all quite different, with varying strengths and weaknesses. The capability of each toolkit was evaluated by applying it to the ‘Concepts’ program written in C. None of the tools is fully automatic, since each requires human interaction at some stage to obtain application-specific knowledge about the system. Dali provides high flexibility in supported languages, abstraction ability and integration of new tools into the workbench. It is also good at visualizing architectural information in various layouts, but it has no history mechanism. It supports multi-language information (that is, information extracted from a variety of programming languages). PBS follows the software bookshelf model, storing system documents on a web server so that many users can share them at the same time. It has its own extractor, and abstraction of the data requires a special file specifying the containment of the system's components. It does not support views and navigation efficiently, and it does not support multi-language information.


SWAGKIT provides good visualisation capabilities. It also has its own extractor and an abstraction ability similar to that of PBS. However, neither PBS nor SWAGKIT offers functionality for architecture conformance analysis. The Bauhaus Toolkit offers features such as layered views, SHriMP views and various layout algorithms. Beyond full support for software architecture recovery, it provides a number of other functions for software maintenance. Moreover, the Bauhaus Toolkit can parse a variety of languages through its analyzers for C, C++, Java and COBOL. The Bauhaus Toolkit is better at visualization, while Dali is more extensible. Although Dali, PBS, SWAGKIT and the Bauhaus Toolkit are prototype toolkits, they all provide concrete approaches for extraction, abstraction and visualization. However, none of them is complete. Their development should continue, to extend their scope and to improve their extraction capabilities and user interaction. In particular, the history mechanism should be better supported in all tools, to give users more flexibility in navigating the different abstraction levels of an architecture. A playback or ‘undo’ option should be developed to make it easier for the user to browse the hierarchical presentation of the architecture.


CHAPTER 6: REUSING CODE FOR MODERNIZATION OF LEGACY SYSTEMS

6.1 Introduction

This chapter describes our modernization approach, which reuses code during the modernization of legacy systems. As software systems become increasingly large and complex [45], the potential for developing systems from scratch has decreased [202]. Instead, software developers look for opportunities to reuse existing functionality (in the form of libraries, Application Programming Interfaces (APIs) or elements of existing systems). Accordingly, reuse has become an important area of Software Engineering [47]. Software reuse is identified as one of the best strategies for handling the complexities associated with the development and modernization of complex legacy software [2], [203] and [149]. Analyzing, identifying, extracting and reconstructing software components that implement abstractions within existing systems is a promising, cost-effective way to create reusable assets and modernize legacy systems. Changing technology is pushing the modernization of legacy systems in several ways, and design patterns and generic programming can be employed to take advantage of legacy systems. For current green-field projects, the situation with respect to software reuse has been somewhat alleviated by the emergence of component-based development (CBD) and the availability of deployment platforms that support it. In addition, the explicit segregation of process-oriented and data-oriented layers in the software architecture [204] and [205] of component-based systems has further increased the potential for reuse. Many approaches to modernizing legacy systems have been developed [2], [206]; they were discussed in Chapter 3. The current situation in legacy system modernization can be summarized as follows:

• Most of the database reverse engineering literature examines solutions for migration from existing databases to relational databases [207];


• Database reverse engineering is sufficiently mature to be applied in practice [207];
• Wrapping solutions are short-term solutions that can complicate legacy system maintenance and management over time [2];
• Redevelopment approaches are considered too risky for most organizations [2], [13];
• There is a lack of literature on successful modernization processes, and many modernization projects fail, as outlined by the Standish Group [13];
• The reverse engineering of procedural components of a large application remains an unsolved problem [208], [149].

According to Chikofsky and Cross [269], reverse engineering generally involves extracting design artefacts and building or synthesizing abstractions that are less implementation-dependent. While reverse engineering often takes an existing functional system as its subject, this is not a requirement: reverse engineering can start from any level of abstraction or at any stage of the life cycle. Reverse engineering in and of itself does not involve changing the subject system or creating a new system based on it; it is a process of examination, not a process of change or modernization. Forward engineering is the traditional process of moving from high-level abstractions and logical, implementation-independent designs to the physical implementation of a system, proceeding from requirements through design to implementation [269]. The existing approaches to reverse and forward engineering do not integrate software architecture reconstruction. Given the scale, complexity and risk of failure of legacy system modernization projects, a well-defined, easily implemented and detailed methodology is essential to modernization success. However, few comprehensive legacy migration methodologies are available, and a general approach has yet to be agreed on.
Existing approaches are either too high level or have yet to be applied in practice [59], [157] and [158]. Although partial solutions such as wrapping are widely adopted, such solutions are short term and can actually complicate legacy system maintenance and management over the long term [30].


Another aspect of modernization is that software artefacts must not be thrown away. Software artefacts are usually built up over time, and they should be reused in the modernization process. These assets are invaluable to an organization, whose day-to-day running is based on them. The main problem, however, is discovering these assets, which may include the software architecture, design decisions made early in development, documentation, source code and other software artefacts. There is a lack of modernization approaches that are methodical, repeatable and learnable through their different phases and activities. As discussed, different modernization approaches require different activities, and these activities cannot be standardized for all legacy systems. We need a systematic legacy system modernization approach that is methodical, repeatable and learnable through its own activities, so that experience gained on one modernization project can be reused on others. Software engineers and developers need to carry out modernization for maintenance [209]. Accelerating the modernization process involves reusing, as far as possible, the components and software artefacts that already exist in the organization, which brings two main advantages [210]. First, modernizations are cheaper and faster with software reuse. Second, the lifespan of the legacy system is extended and its maintainability improved [211]. In our context, modernizing a legacy system keeps its business value intact and improves quality attributes of the system such as maintainability. Over the years, continuous maintenance increases the complexity of legacy systems and decreases their quality. When modernizing, the existing software artefacts need to be reused to save time and effort.
The benefits of software reuse in modernization include reduced time and effort to modernize, improved maintainability of the legacy system and potentially improved reliability [265]. Many of the problems of legacy systems stem from the fact that the system’s architecture is virtually undocumented. Moreover, the architecture may have degraded over time through modifications or extensions that violate the initial architectural principles, resulting in overall chaos. Modernization of legacy systems must be able to address these issues. As discussed earlier, there are different approaches to modernizing a legacy system, but how far these approaches address the problem of undocumented architecture is an open question. We have not come across any modernization approach that incorporates architecture reconstruction to understand the legacy system. Software architecture reconstruction must be viewed not as an effort on its own but as a contribution to a broader technical context, such as streamlining the modernization of legacy systems that have hit their architectural borders, that is, that require major restructuring [48]. We assume that the business value of the legacy system is high and that the organization cannot afford to lose it. Our modernization approach is based on the reuse of code to improve the maintainability of the legacy system. This chapter describes our approach of reusing code for modernization of legacy systems.

6.2 Overview of Our Approach to Legacy System Modernization

For many applications within the science and engineering community, the root implementation language was and still is FORTRAN 77 and, for some, even FORTRAN 66. As software engineering has evolved, languages have grown, and FORTRAN 2003, C and C++ now provide the main modern vehicles for these applications. To maintain and continue to develop the science encapsulated in the code of these legacy applications, a process of modernization must be formalized and undertaken. Our strategy is based on the assumption that there are no changes to the functional requirements for the software. The legacy and the modernized systems will behave identically from a user’s viewpoint and will differ only in their design and implementation, which are improved for reusability and maintainability. If changes to the system requirements are to be implemented as part of a project, a different approach will be needed; our research does not focus on this issue. The modernization approach we have demonstrated transforms a procedural model into an object-oriented model. An object-oriented approach promises reduced maintenance, code reusability, real-world modeling, and improved reliability and flexibility [203]. The primary goal of object-oriented development is to ensure that the system enjoys a longer life with far smaller maintenance costs. Because most of the processes within the system are encapsulated, behaviors can be reused and incorporated into new behaviors [203]. Object orientation also supports code reuse through inheritance: an object created from a class has the data attributes and behaviors defined by that class, and a class inherits the data and behaviors of all of its superclasses.
In our context, modernizing a legacy system means improving the quality attributes of a large, monolithic, complex legacy system that has high business value. Transforming the procedural programming model into an object-oriented model is therefore part of our modernization approach. Figure 6.1 shows an overview of our modernization approach, which is divided into three major activities:

• Activity 1: Reverse Engineer an Analysis Model;
• Activity 2: Re-Structure into an Object Model; and
• Activity 3: Forward Engineer using Object-Oriented Methods.

[Diagram: Legacy System → Activity 1: Reverse Engineer an Analysis Model → Activity 2: Re-Structure into an Object Model → Activity 3: Forward Engineer using Object-Oriented Methods → Modernized System]

Figure 6.1: Legacy System Modernization Approach


6.2.1 Activity 1: Reverse Engineer an Analysis Model

An Analysis Model is a means to describe the architecture of the system. The reverse engineering activity recovers design information from the source code and any existing documentation. Reverse engineering begins with the system and works through the design process in the opposite direction. In doing so, it uncovers as much information as possible about the design ideas that were used to produce a particular product. The Reverse Engineer an Analysis Model activity consists of two phases, which are:

• Phase 1: Analyse the legacy system
• Phase 2: Reconstruction of legacy system

6.2.1.1 Phase 1: Analyse the legacy system

Most legacy systems were developed using programming paradigms and languages that lack adequate means for modularization. Consequently, there is little explicit structure to guide a software engineer, which makes modernizing or extending such a system difficult. Object orientation is advocated as a way to enhance a system’s correctness, robustness, extendibility, maintainability and reusability, the key factors affecting software quality [212]. Many organizations consider migration to object-oriented platforms in order to tackle maintenance problems. However, such migrations are themselves hindered by the lack of modularization in the legacy code. If the legacy code cannot be modularized, it cannot be reused in the transformation to an object-oriented platform and hence cannot be modernized for maintainability. It is therefore very important that, in the analysis phase, the software engineer considers carefully what to modularize. The purpose of analysing the legacy system depends on the objectives of the software engineer and what they want to achieve. As stated earlier, our objective is to modernize from a procedural programming model to an object-oriented programming model. This phase analyses the legacy system to capture its structure and to identify problems caused by its past development and evolution. Analysis of the legacy system is supported by a software architecture reconstruction tool; we used ARMin (Architecture Reconstruction and MINing) for the reconstruction of our legacy system. ARMin is a tool developed by the Software Engineering Institute and Robert Bosch Corporation, and is a successor of Dali. ARMin was not available at the time we compared the software architecture reconstruction toolkits. ARMin can incorporate architecture-specific elements and concepts into the model, and is not bound by the information in the source code. The legacy system is analysed and the results are represented in a form suitable for further analysis. This information is essential for modernizing a legacy system into an object-oriented system, because it identifies the candidate objects/components in the legacy system that can be reused. This task includes gathering all system artefacts, such as source code, copybooks and Job Control Language (JCL), together with statistics about size, complexity, and dead or unused code [213].

Information    Description
Module         Subroutine, procedure or function name
File           Name of the source code file where the module is defined
Metrics        Size, complexity, structuredness, etc.
Parameters     Explain the purpose of each input and output parameter
Purpose        Describe the purpose or the functionality of the module
Variables      Identify and explain each local variable. Global variables and elements from COMMON blocks are identified. Indicate whether global variables are used or set.

Table 6.1: Module Analysis Data [214]

The most important part of the analysis is creating a description of each module and each data item, as shown in Table 6.1 [214], which suggests the information to record for each module. The recorded information becomes a software artefact that will be reused in the modernization of the legacy system. The description of each module and each data item also suggests which components can be reused for modernization. In the absence of domain experts, this information helps the software architect to understand the source code. The software architect needs a higher-level view of the legacy system, which is often difficult for developers and project managers to obtain because they are focused on a specific project and on immediate needs. The software architect must therefore be able to extract this information from the software system itself. The knowledge gained during this task, amounting to a complete understanding of the legacy system, is invaluable during the modernization process.
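The per-module record of Table 6.1 maps naturally onto a small data structure. The sketch below is an illustration only, assuming Python 3.9+; the class `ModuleAnalysis`, its field names and the helper `reuse_candidates` are hypothetical and are not part of ARMin or of [214]. The helper applies one simple heuristic: modules that neither read nor write global or COMMON data are treated as the easiest to lift out for reuse.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleAnalysis:
    """One row of module analysis data, following the fields of Table 6.1."""
    module: str                                               # subroutine, procedure or function name
    file: str                                                 # source file defining the module
    metrics: dict[str, float] = field(default_factory=dict)   # e.g. {"size": 120, "complexity": 9}
    parameters: dict[str, str] = field(default_factory=dict)  # parameter -> purpose
    purpose: str = ""                                         # functionality of the module
    local_vars: dict[str, str] = field(default_factory=dict)  # local variable -> explanation
    globals_used: set[str] = field(default_factory=set)       # global/COMMON items read
    globals_set: set[str] = field(default_factory=set)        # global/COMMON items written

    def touches_globals(self) -> bool:
        """True if the module reads or writes any global or COMMON data."""
        return bool(self.globals_used or self.globals_set)


def reuse_candidates(records: list[ModuleAnalysis]) -> list[str]:
    """Hypothetical heuristic: modules with no global/COMMON access are easiest to reuse."""
    return sorted(r.module for r in records if not r.touches_globals())


if __name__ == "__main__":
    records = [
        ModuleAnalysis("AVG", "stats.f", purpose="mean of an array"),
        ModuleAnalysis("UPDTMP", "heat.f", globals_used={"DT"}, globals_set={"TEMP"}),
    ]
    print(reuse_candidates(records))  # ['AVG']
```

Recording globals used and set separately, as Table 6.1 asks, is what later makes it possible to derive the data dependencies between modules.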

6.2.1.2 Phase 2: Reconstruction of legacy system

This phase of the modernization process discovers the design of the legacy system. Techniques that help analysts extract design and structure information can support software reuse by helping to locate reusable software components in existing systems. Abstract models represent modules and the relationships between components. The source code holds the most current information about the system, and software architecture reconstruction (SAR) helps recover the software architecture from it. The importance of SAR in understanding legacy systems has been established [204], [205], [206] and [13]. The end result of the architecture reconstruction process is a set of architectural views. In order to modernize the legacy system into the object-oriented paradigm, we need to identify the dependencies that legacy components have. Dependencies include:

• Data dependencies, where global data is shared between a component and other parts of the system; and
• Functional dependencies, where a component uses other parts of the system in order to carry out its functionality, or other parts of the system use the component.

The visualization of the architectural views can help us in re-structuring by clarifying:

• What are the subsystems or components of the software system?
• How should the interfaces between components be structured?
• What are the characteristics of the component communication?
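The two kinds of dependency listed above can be computed from very simple extracted facts. The following sketch assumes, hypothetically, that an extractor has already produced (a) the set of global or COMMON items each module uses or sets and (b) the call pairs between modules. The function names `data_dependencies` and `functional_dependencies` are illustrative and do not correspond to ARMin's interface.

```python
from collections import defaultdict


def data_dependencies(globals_by_module):
    """Pairs of modules that share global/COMMON data.

    globals_by_module maps each module to the set of global items it uses or sets.
    Returns {frozenset({m1, m2}): sorted list of shared items} for every sharing pair.
    """
    deps = {}
    names = sorted(globals_by_module)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = globals_by_module[a] & globals_by_module[b]
            if shared:
                deps[frozenset((a, b))] = sorted(shared)
    return deps


def functional_dependencies(calls):
    """Which modules each module uses, and which modules use it.

    calls is an iterable of (caller, callee) pairs.
    Returns two dicts: uses[m] = callees of m, used_by[m] = callers of m.
    """
    uses, used_by = defaultdict(set), defaultdict(set)
    for caller, callee in calls:
        uses[caller].add(callee)
        used_by[callee].add(caller)
    return dict(uses), dict(used_by)


if __name__ == "__main__":
    commons = {"INIT": {"GRID"}, "SOLVE": {"GRID", "TEMP"}, "REPORT": {"TEMP"}}
    print(data_dependencies(commons))
    uses, used_by = functional_dependencies(
        [("MAIN", "INIT"), ("MAIN", "SOLVE"), ("SOLVE", "REPORT")])
    print(uses["MAIN"], used_by["REPORT"])
```

In this toy example, INIT and REPORT have no data dependency on each other, so only SOLVE links the two through shared COMMON data.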


The architecture reconstruction effort can produce useful architectural metrics and views. Figure 6.2 shows the steps involved in SAR that are supported by the ARMin tool.

[Diagram: Source Information Extraction → Architectural View Composition → Architectural Views Selection]

Figure 6.2: Steps in Reconstructing the Architecture of a Software System

During the reconstruction process, the relations between the elements are aggregated into high-level abstractions. Identifying all external dependencies of a component is important when considering modernization from one language paradigm to another; of particular importance are the dependencies between components. The major difficulty is understanding a large and complex system architecture from its source code alone, without documentation, and in particular understanding the semantic relationships between components that are far apart syntactically. Architecture reconstruction tools help software engineers to understand the source code and to extract complex patterns, dependencies between components, and relations between modules and elements, giving an understanding of the legacy system in context. When changing from a procedure-oriented to an object-oriented design, decomposing the legacy system is highly desirable for code reuse. How a program can be decomposed depends on the components of the legacy system: depending on how separated and well identified these components are, the architecture of a legacy system can be decomposable, semi-decomposable or non-decomposable [72]. In a decomposable system, the user interface, business logic and related databases can be considered as distinct components with well-defined interfaces. In a semi-decomposable system, only the interface and business logic are separate modules; the business logic and the databases cannot be separated because of their complex structure, and the interaction between them is not easy to understand. In a non-decomposable system, the system appears as a single unstructured, monolithic module. Such systems are complex and have to be treated as black boxes.
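Aggregating relations into high-level abstractions, as in the view composition step of Figure 6.2, is often done by "lifting": each relation between two low-level elements is re-attributed to the units that contain them. The sketch below assumes a hypothetical `container` mapping from functions to modules, which could for example be derived from the file or directory structure. It illustrates the idea only and is not ARMin's implementation.

```python
from collections import Counter


def lift_relations(relations, container):
    """Aggregate element-level relations (e.g. function calls) to unit level.

    relations: iterable of (source_element, target_element) pairs.
    container: maps each element to the unit that holds it (file, module, subsystem).
    Returns a Counter {(source_unit, target_unit): number of underlying relations}.
    Relations inside a single unit are dropped, since the abstraction hides them.
    """
    lifted = Counter()
    for src, dst in relations:
        src_unit, dst_unit = container[src], container[dst]
        if src_unit != dst_unit:
            lifted[(src_unit, dst_unit)] += 1
    return lifted


if __name__ == "__main__":
    calls = [("main", "solve"), ("solve", "lu"), ("solve", "norm"), ("lu", "norm")]
    owner = {"main": "driver", "solve": "solver", "lu": "linalg", "norm": "linalg"}
    print(dict(lift_relations(calls, owner)))
    # {('driver', 'solver'): 1, ('solver', 'linalg'): 2}
```

Keeping the count of underlying relations on each lifted edge gives a rough indication of how strongly two units are coupled, which is useful when judging whether the system is decomposable.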

6.2.2 Activity 2: Re-structure into an Object Model

The major purpose of the object model is to describe an object structure. The object structure defines the logical design of the system; when the logical design becomes specific to a technology, it becomes a physical design. Poor logical design leads to poor physical design and hence to complex software that is difficult to maintain and modernize. The logical structure is analyzed from the logical design. The logical design is not specific to an implementation but uses different modeling notations to model the software system, such as UML class diagrams, activity diagrams, sequence diagrams and collaboration diagrams. The logical design allows us to identify the responsibilities of each class, such as what the class is supposed to do and why it is included in the architecture. The responsibilities of an object of a specific class are described by the answers to a set of questions, which include the following:

• What information or knowledge is the object responsible for maintaining?
• What should the object be able to do?
• What other objects does the object need to know about to fulfil its responsibilities?
• What are the inheritance relations? Are the responsibilities of objects of this class similar to those of other classes?

We use the analysis model of the legacy system to create an object model, using UML class diagrams, so that we can identify the logical structure of the legacy system. The analysis model contains a description of each module and each data item, as shown in Table 6.1, and the dependencies between the modules of the legacy system have been detected using SAR. The object model is created from the description and the dependencies of each module. UML class diagrams of each module were generated and analysed to examine the data flow and to see whether any functional decomposition is possible. Based on the analysis model and the software architecture reconstruction of the legacy system, we re-structured the modules that showed low or no dependency. We also identified code that can be used as an object by itself. We identified objects based on classes and software patterns; software patterns are represented by the code itself [32]. We applied re-structuring techniques to make objects loosely coupled and highly cohesive. Software re-structuring is the process of re-organizing the logical structure of existing software systems. Objects are the structural units of the actual information system. After the objects (the entities of the information system) have been identified, the next step is to define the methods, or functionality, of these objects. The methods for the components are determined using both the invocation statements and the bodies of the components. Subroutines are in effect collections of methods that perform related tasks in the information system, and objects are created from the subroutines. The invocation statements are used to map formal parameters to actual parameters, while the bodies of the subroutines are examined line by line to define the actual methods. During this line-by-line analysis, each assignment statement is classified as one of four types: incrementing, decrementing, computing or redefining. An incrementing assignment statement takes the form X = X + c, where X is a variable and c is a constant; it results in the method Increment_X(c). Similarly, a decrementing assignment statement takes the form X = X - c and results in the method Decrement_X(c). When X appears only on the left-hand side, as in X = EXPR, where EXPR is any valid expression other than a function invocation (for example A + B), the resulting method is Compute_X(var_list), where var_list is the list of all variables in EXPR (here A and B); EXPR forms the body of the method. Finally, in all other cases, where X appears on both the left-hand and right-hand sides of the assignment statement, the resulting method is Redefine_X(var_list), where var_list is the list of all variables on the right-hand side (including X), and the right-hand side becomes the body of the method [18].

Figure 6.3 shows the artefacts identified as a result of re-structuring the legacy system. These artefacts are:

• Object Model: Data Flow Diagrams (DFD), Functional Decomposition Diagrams (FDD), Unified Modelling Language (UML) models, and any other artefacts that were used in developing the legacy software;
• Software Patterns: the interconnections of the modules in the system, that is, which patterns are used, where they occur in the system and which components are involved;
• Library: source code for modules, functions, units, systems or sub-systems;
• Legacy Documents: requirements, specifications, test cases, etc.

The Re-structure into an Object Model activity is speculative: it proposes a structure for the modernized system to support future evolution. When fixing bugs and upgrading software, the most important quality is code that is easy to understand, test and maintain. Re-structuring improves the quality of software, and re-structured code is easier to understand, test and maintain [28]. Since our modernization approach focuses on changing the programming paradigm from procedural to object-oriented while reusing legacy code, re-structuring is an important part of it.
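The assignment-classification rule described in this activity (Increment_X, Decrement_X, Compute_X, Redefine_X) can be sketched with Python's standard `ast` module. This is a minimal illustration under stated assumptions, not the tool of [18]. The sketch assumes simple scalar assignments written in an expression syntax that Python can parse, which simple FORTRAN arithmetic assignments such as `X = X + 1` or `X = (A + B) * C` satisfy. It treats any right-hand side containing a function call as excluded from the Compute rule, and it returns `None` for statements the four rules do not cover. The function names are hypothetical.

```python
import ast
from typing import Optional


def _variables(expr: ast.expr) -> list[str]:
    """Variable names in expr, in source order, without duplicates or function names."""
    func_ids = {id(n.func) for n in ast.walk(expr) if isinstance(n, ast.Call)}
    names = [n for n in ast.walk(expr)
             if isinstance(n, ast.Name) and id(n) not in func_ids]
    names.sort(key=lambda n: (n.lineno, n.col_offset))
    return list(dict.fromkeys(n.id for n in names))


def classify_assignment(stmt: str) -> Optional[str]:
    """Return the method an assignment 'X = ...' maps to, or None if no rule applies."""
    node = ast.parse(stmt).body[0]
    if not (isinstance(node, ast.Assign) and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)):
        return None  # only simple scalar assignments are handled in this sketch
    x, rhs = node.targets[0].id, node.value

    # Incrementing / decrementing: X = X + c or X = X - c, with c a numeric constant
    if (isinstance(rhs, ast.BinOp) and isinstance(rhs.op, (ast.Add, ast.Sub))
            and isinstance(rhs.left, ast.Name) and rhs.left.id == x
            and isinstance(rhs.right, ast.Constant)
            and isinstance(rhs.right.value, (int, float))):
        kind = "Increment" if isinstance(rhs.op, ast.Add) else "Decrement"
        return f"{kind}_{x}({rhs.right.value})"

    variables = _variables(rhs)
    if x not in variables:
        # Computing: X only on the left-hand side; function invocations are excluded
        if any(isinstance(n, ast.Call) for n in ast.walk(rhs)):
            return None
        return f"Compute_{x}({', '.join(variables)})"
    # Redefining: X on both sides; var_list includes X itself
    return f"Redefine_{x}({', '.join(variables)})"


if __name__ == "__main__":
    for s in ["X = X + 1", "X = X - 2", "X = A + B", "X = X * Y", "X = F(A)"]:
        print(s, "->", classify_assignment(s))
```

Names are sorted by column offset because `ast.walk` visits nodes breadth-first, which would otherwise scramble the order of `var_list` for nested expressions.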

Figure 6.3: Artefacts Analyzed from Legacy System


[Diagram: Architectural Views, Re-structuring the Code, Generating Architectural Patterns and the decision "Is Re-structuring done?" (Yes: an object model; No: continue re-structuring)]

Figure 6.4: Re-structuring the components

Figure 6.4 illustrates the process of re-structuring the components. It consists of the following reusable steps:

1. Re-structuring the Code: Architectural views are representations of the overall architecture. The software architect chooses and extracts a set of views that enables the architecture to be communicated to, and understood by, all the stakeholders, and enables them to verify that the system will address their concerns. Code re-structuring manipulates design information extracted from the architectural views. We re-structure components and manipulate design information by decomposing and composing the design (described later) in line with the targeted architectural patterns. The design of the legacy system is governed by the existing source code. The re-structured components provide a set of predefined subsystems, specify their responsibilities, and include rules and guidelines for organizing the relationships between them.


2. Is Re-structuring done?: The re-structuring process is a manual exercise. The software architect inspects the generated architectural patterns to decide what could be re-structured. Once the software architect is satisfied that re-structuring is complete, that is, no more independent components, elements or subsystems can be extracted for independent use and all components of the legacy system are loosely coupled and highly cohesive, this step generates the re-structured object model to be reused for modernization of the legacy system. Re-structuring allows the components to be reused. The re-structuring process consists of a series of function-preserving decompositions and compositions of ‘processing elements’. If the functions are in the same logical unit, a view that shows their logical connection can be generated through abstraction and grouping of the functions within the unit. If the functions are in different logical units but are related, for example because there are calls between them or they share data, a different abstraction can generate a view showing the connection between the logical units and the calls that make up that connection; an abstraction through a data view can likewise show the set of functions related to a logical grouping of data. Several models capture the relationships between modules. Four types of relationships are extracted using ARMin:

1. Common relation: a subroutine sends information to another through a global component;
2. Call relation: a subroutine calls another subroutine, importing its computation to execute its own functions;
3. Sequential relation: an output of a subroutine is used as an input of another subroutine; and
4. No relation: two subroutines have none of the above relations.
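The four relation types above can be decided pairwise once each subroutine's calls, COMMON-block accesses and data inputs/outputs are known. The following sketch assumes a hypothetical `Subroutine` record holding those facts; it is not ARMin's data model. Since more than one relation can hold between the same pair, the function returns the set of relations found, or `{"none"}` when none applies.

```python
from dataclasses import dataclass, field


@dataclass
class Subroutine:
    """Hypothetical extracted facts about one subroutine."""
    name: str
    calls: set[str] = field(default_factory=set)          # subroutines it calls
    common_reads: set[str] = field(default_factory=set)   # COMMON/global items read
    common_writes: set[str] = field(default_factory=set)  # COMMON/global items written
    inputs: set[str] = field(default_factory=set)         # data items it consumes
    outputs: set[str] = field(default_factory=set)        # data items it produces


def relations(a: Subroutine, b: Subroutine) -> set[str]:
    """Relations between a and b, checked in both directions."""
    found = set()
    # Common relation: one writes a global item that the other reads
    if a.common_writes & b.common_reads or b.common_writes & a.common_reads:
        found.add("common")
    # Call relation: one calls the other
    if b.name in a.calls or a.name in b.calls:
        found.add("call")
    # Sequential relation: an output of one is an input of the other
    if a.outputs & b.inputs or b.outputs & a.inputs:
        found.add("sequential")
    # No relation: none of the above holds
    return found or {"none"}


if __name__ == "__main__":
    readin = Subroutine("READIN", common_writes={"GRID"}, outputs={"grid"})
    solve = Subroutine("SOLVE", calls={"LU"}, common_reads={"GRID"}, inputs={"grid"})
    lu = Subroutine("LU")
    print(relations(readin, solve), relations(solve, lu), relations(readin, lu))
```

Pairs classified as "none" are natural starting points for decomposition, while "common" relations are the ones that re-structuring tries to eliminate by turning shared globals into explicit parameters or object attributes.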

Re-structuring is carried out by applying re-structuring operations until the subsystems are loosely coupled. The small pieces of the subsystems are then put back together to form the solution space from which the subsets of the subsystems are extracted. We use the following eight basic re-structuring operations [213], [215]:

1. Coincidental Decomposition: A subroutine exhibiting coincidental cohesion has disjoint components: one or more groups of objects linked to individual outputs without any dependence relation on another group. Such subroutines can easily be split by separating the disjoint groups. Subroutine S1 is decomposed into subroutines S2 and S3: S1 → (S2, S3). Splitting the subroutine does not change its functionality; it is only decomposed so that S2 and S3 can be reused independently as components. Figure 6.5 illustrates coincidental decomposition. I1 and I2 represent the inputs to the subroutine, and O1 and O2 represent its outputs. S1 has been decomposed into two subroutines, S2 and S3: S2 has I1 as input and O1 as output, and S3 has I2 as input and O2 as output.


Figure 6.5: Coincidental Decomposition

2. Conditional, Iterative or Communicational (CIC) Decomposition: Subroutines with conditional, iterative or communicational cohesion have one or more inputs tied to all of the outputs. These subroutines can be decomposed by copying all objects linked to more than one output. Subroutine S1 has I1 as input and generates outputs O1 and O2. It can be decomposed into subroutine S2 (I1 as input, O1 as output) and subroutine S3 (I1 as input, O2 as output): S1 → (S2, S3). Figure 6.6 illustrates this. Subroutines S2 and S3 can be reused because they are independent or loosely coupled components.


Figure 6.6: CIC Decomposition

3. Decomposition with Sequential Cohesion: In subroutines with sequential cohesion, one or more outputs depend on other outputs. Such subroutines can be decomposed by splitting them into two or more subroutines with sequential coupling, which is one of the most desirable forms of coupling: S1 → (S2, S3). In S1, output O2 depends on output O1, which depends on input I1. After decomposition, output O1 of S2 becomes input I2 of S3, so O2 depends on I2, which is O1. Figure 6.7 illustrates this.

4. Sequential Composition: Sequential composition combines two subroutines that have sequential coupling: (S2, S3) → S1. The output O1 depends on the input I1, and the output O2 depends on the input I2. If the input I2 and the output O1 are the same, as shown in Figure 6.7, the two subroutines can be composed.

5. Sequential Decomposition: An alternative to decomposition with sequential cohesion is to replace an output component with a subroutine call, and to move the output component and the components it depends on into a separate callee subroutine: S1: O1 → S2. Figure 6.8 describes this.



Figure 6.7: Decomposition with Sequential Cohesion and Sequential Composition


Figure 6.8: Sequential Decomposition

6. Caller/Callee Composition: Two subroutines can be composed if they have a call relation, exhibit either conditional or computational coupling, and the callee has only one caller. The call statement is replaced by the tokens of the callee, which reduces unnecessary coupling: (S1, S2) → S3. Figure 6.9 describes this.


Figure 6.9: Caller/Callee Composition

7. Hide: The Hide operation converts an exported output into a hidden local variable when the exported output is not actually used externally: S1 → S2, as shown in Figure 6.10, where O1 becomes a hidden variable.

8. Reveal: Reveal is the inverse of Hide. It exports a local variable by changing it into an output variable, and can be used to separate a hidden function from a large subroutine: S1 → S2, as shown in Figure 6.10, where O1 becomes an output variable.

Figure 6.10: Hide and Reveal
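To make the restructuring operations above more concrete, the following Python sketch applies them to a small hypothetical routine. The routine, its names (`s1_cic`, `s2_sort`, `s3_middle` and so on) and its computations are assumptions chosen purely for illustration, not code from the legacy system studied here. Each variant keeps the input/output behaviour of the routine it was derived from, which is the property these operations are meant to preserve.

```python
# Hypothetical legacy routines used only to illustrate the restructuring
# operations; the names and computations are assumptions, not thesis code.

# CIC cohesion: outputs O1 and O2 each depend only on the input I1.
def s1_cic(i1):
    o1 = sum(i1)
    o2 = max(i1)
    return o1, o2

# CIC decomposition, S1 -> (S2, S3): each part copies I1 and yields one output.
def s2_total(i1):
    return sum(i1)

def s3_largest(i1):
    return max(i1)

# Sequential cohesion: O2 depends on O1, which depends on I1.
def s1_seq(i1):
    o1 = sorted(i1)
    o2 = o1[len(o1) // 2]
    return o1, o2

# Decomposition with sequential cohesion: S2 produces O1, which S3 takes as I2.
def s2_sort(i1):
    return sorted(i1)

def s3_middle(i2):
    return i2[len(i2) // 2]

# Sequential composition: chain S2 and S3 so that O1 feeds I2. The call to
# s2_sort also shows sequential decomposition: the computation of an output
# has moved into a separate callee.
def s1_composed(i1):
    o1 = s2_sort(i1)
    return o1, s3_middle(o1)

# Caller/callee composition: if s3_middle had only this one caller, its body
# could replace the call statement, removing the coupling through i2.
def s1_inlined(i1):
    o1 = s2_sort(i1)
    return o1, o1[len(o1) // 2]

# Hide: O1 is not used by any external caller, so it becomes a local variable.
# Reveal is the inverse: returning o1 again exports it as an output.
def s1_hidden(i1):
    o1 = sorted(i1)
    return o1[len(o1) // 2]
```

On the input `[3, 1, 2]`, `s1_seq`, `s1_composed` and `s1_inlined` all return `([1, 2, 3], 2)`, while `s1_hidden` returns only `2` because O1 is no longer exported.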


The analysis of the component view shows the relationships between modules. These relationships are useful in applying restructuring operations.
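One way to put these relationships to work is to record, for each output of a subroutine, which inputs or other outputs it depends on; the choice between CIC decomposition and decomposition with sequential cohesion then follows from that map. The Python sketch below assumes such a dependency map has already been extracted (the names `deps`, `classify` and `decompose` are hypothetical, and this is not how ARMin represents dependencies). Its CIC test is a simplification: it only checks that no output depends on another output.

```python
from graphlib import TopologicalSorter


def classify(deps: dict[str, set[str]]) -> str:
    """Classify a subroutine from its output dependency map.

    deps maps each output name to the names (inputs or other outputs)
    it depends on. Returns 'sequential' if any output depends on another
    output, 'cic' if every output depends only on inputs, and 'single'
    if there is at most one output.
    """
    outputs = set(deps)
    if len(outputs) < 2:
        return "single"
    if any(needs & (outputs - {out}) for out, needs in deps.items()):
        return "sequential"
    return "cic"


def decompose(deps: dict[str, set[str]]) -> list[tuple[list[str], str]]:
    """Split a subroutine into one-output parts.

    Each part is (inputs, output). An input that is the output of another
    part expresses sequential coupling, so producers are ordered first.
    Raises graphlib.CycleError if outputs depend on each other cyclically.
    """
    outputs = set(deps)
    sorter = TopologicalSorter()
    for out in sorted(outputs):
        # Only output-to-output edges constrain the order of the parts.
        sorter.add(out, *sorted(deps[out] & outputs))
    return [(sorted(deps[out]), out) for out in sorter.static_order()]
```

For a routine whose map is `{"O1": {"I1"}, "O2": {"O1"}}`, `classify` reports sequential cohesion and `decompose` yields the part producing O1 from I1 followed by the part producing O2 from O1, so O1 plays the role of the input I2 of the second part.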

6.2.3 Activity 3: Forward Engineer using Object-oriented Methods

In this activity the goal is to implement the legacy system as an object-oriented system. The legacy components to be reused in the object-oriented implementation have already been identified in the previous activities. Figure 6.11 illustrates forward engineering with reuse of legacy artefacts.

Figure 6.11: Forward Engineering with Reuse of Legacy Artefacts

The re-structured subroutines or components can be viewed as implementing an abstract data type, encapsulating a set of data (i.e. attributes) and a corresponding set of permissible actions on the data (i.e. methods). Each component is an autonomous entity that interacts with other components during the execution of the system. The functionality of the subroutines can be viewed at many levels, the most important for component identification being:

• The functionality of the individual component; and

• The functionality of each line of code.

The core idea of object-oriented programming is that the objects of interest and the methods that operate on them are contained in a single class, which defines both the attributes and the methods of those objects. Once the methods and variables of the re-structured subroutines are identified, the subroutines can be programmed in any object-oriented programming language. In forward engineering we transformed the procedural code into an object-oriented programming language, making use of encapsulation, information hiding, class generation and inheritance. We packaged the data and the code that operates on the data (methods) into a single unit called an object. The use of object orientation and of languages such as C++ is becoming more common in scientific software.
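As a rough illustration of packaging re-structured subroutines into an abstract data type, the Python sketch below assumes a hypothetical set of one-output subroutines that compute the sum, the maximum and the middle element of a list of readings. The class names `Series` and `NonNegativeSeries` are illustrative assumptions, not part of the modernized system described in this thesis. The data that used to be passed around as an input becomes an encapsulated attribute, each subroutine becomes a method, and a subclass reuses those methods through inheritance.

```python
class Series:
    """Abstract data type built from hypothetical re-structured subroutines.

    The list formerly passed around as an input is now an encapsulated
    attribute; each one-output subroutine becomes a method on that data.
    """

    def __init__(self, values):
        # Leading underscore: hidden from clients by Python convention.
        self._values = list(values)

    def total(self):
        # Formerly a subroutine returning the sum of its input list.
        return sum(self._values)

    def largest(self):
        # Formerly a subroutine returning the maximum of its input list.
        return max(self._values)

    def middle(self):
        # Formerly a sort step followed by picking the middle element.
        ordered = sorted(self._values)
        return ordered[len(ordered) // 2]


class NonNegativeSeries(Series):
    """Inheritance: reuses every Series method and adds one invariant."""

    def __init__(self, values):
        values = list(values)
        if any(v < 0 for v in values):
            raise ValueError("values must be non-negative")
        super().__init__(values)
```

For example, `Series([3, 1, 2]).middle()` returns `2`, and `NonNegativeSeries([-1])` raises `ValueError` while otherwise behaving exactly like `Series`.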

6.3 Chapter Summary

This chapter described our new modernization approach, which reuses code during the modernization of legacy systems. Legacy system modernization should be effective and semantics-preserving. Our approach emphasizes modernizing legacy systems through reuse, and the modernized system is easier to maintain than the original legacy system. We believe that reusing software is very important if modernization is to retain the business value of the system. From an economic perspective, it has been reported that a reuse strategy could save more than 20% of the development cost [208]. If existing software is to benefit from advances in object-oriented methods, its artefacts must be reused. The resulting modernized system uses the object-oriented programming paradigm, and both documentation and source code are reused.


Reuse became possible only after reconstructing the software architecture, in which the dependencies between the subroutines are identified. Experience with applying the new approach suggests that the basic process works quite well for the subroutines to which we have applied it. Reconstruction and restructuring are the prime activities for identifying reusable code. Our modernization approach has three activities. Activity 1 is to Reverse Engineer an Analysis Model. The input to this activity is a legacy system and the output is an analysis model, which describes the architecture of the system. To recover the architecture we identified the system’s components and the interrelationships between them. Activity 1 consists of two phases: Phase 1 is the analysis of the legacy system and Phase 2 is the reconstruction of the legacy system. Analysis of the legacy system is important for creating a description of each module and each data item of the legacy system. We used ARMin to reconstruct the architecture in order to understand the interfaces between the components and the subsystems, and the characteristics of the components and the subsystems. Activity 2 is to Re-Structure into an Object-Model. Re-structuring gives the logical structure of the existing software. Re-structuring produced several artefacts: an object model consisting of DFDs, FDDs and UML diagrams, software patterns, a library and legacy documents. Activity 3, Forward Engineering Using Object-Oriented Methods, achieves the objective of modernizing the legacy system. The legacy components to be reused have already been identified in the previous activities for object-oriented implementation. The outcome of this activity is the modernized system.


PART 2


CHAPTER 7: ORGANIZATION OF SOFTWARE REUSE SURVEY

7.1 Introduction

This chapter outlines the structure of the software reuse surveys we conducted with the CSE and SPL communities. In these surveys we collected actual data about software reuse in software development that would help us to better understand the issues and concerns around reusing code for system modernization. The survey results were analyzed to identify issues and concerns in software reuse. Software reuse is the use of software artefacts from all stages of the software development process in the development of new applications. Given the high cost and difficulty of developing high-quality software, the idea of capitalizing on previous software investments is appealing. However, software reuse has not been as effective as expected and has not been very broadly or systematically used in industry [50]. The practice of software development continues to shift towards the reuse of legacy systems to handle the complexities of software development [218]. However, there are many issues and concerns around reusing software, and software reuse cannot possibly become an engineering discipline so long as these issues and concerns have not been clearly identified, understood and dealt with. The modernization approach we have built is based on software reuse and is discussed in Chapter 4. Modernization involves more extensive changes than maintenance, but conserves a significant portion of the existing system [13]. To conserve the investment made in developing a software system, software reuse has to become an integral part of legacy software modernization. Software reuse is also required in new software development to save cost and time to market [11]. Software artefacts that have already been developed can be reused. Software reuse has many advantages, such as reduced time to market, improved productivity and improved quality of software [266]. 
However, to reap the benefits of software reuse in modernization (reduced time and effort to modernize, improved maintainability of the legacy system, potentially improved reliability of the system, etc. [265]) and in software development, we need to understand the issues and concerns around software reuse. To understand them we conducted two surveys on software reuse practices, using a framework that helps identify and organize the many factors that must be considered to achieve the benefits of software reuse in practice and to identify the issues and concerns that practitioners have around software reuse. The first survey involved participants from the Conventional Software Engineering (CSE) community and the second involved participants from the Software Product Line (SPL) community. According to our definition, the CSE community consists of software engineers, developers, programmers, architects and other software professionals who are involved in software development, system analysis, software design, software testing and software maintenance, and who follow the traditional approach to software development, integration, maintenance and modernization. We chose this community to understand how much software reuse is done in it and what its issues and concerns about software reuse are. The SPL community consists of professionals working in the area of Software Product Lines, where software reuse is an inherent part of product development. They perform the same activities as the CSE community outlined above, but use product line approaches. We wanted to explore their issues and concerns about software reuse and how they tackle software reuse in their community. A survey is a systematic method of collecting data from a population of interest. It tends to be quantitative in nature and aims to collect information from a sample of the population such that the results are representative of the population within a certain degree of error. 
Surveys represent one of the most common types of quantitative and qualitative research [219], [220]. In our work, we are concerned with a concept of software reuse in which all artefacts of the software life cycle, such as specifications, requirements, data flow diagrams and documentation, are reused by developers within the same organization as well as across different organizations, on a broad spectrum of development tasks and across a variety of software application domains. The rest of this chapter outlines the structure of our surveys.

7.2 Survey Framework

To get a better understanding of software reuse in the CSE and SPL communities we conducted two surveys. Initial planning of the survey design and survey questions is extremely important in conducting survey research [221]. The framework we used for the development and administration of the questionnaire and the analysis of the results is shown in Figure 7.1. It consists of five steps which are as follows:

• Step 1: Gather Good Candidate Questions

• Step 2: Select Questions of Interest

• Step 3: Select Companies

• Step 4: Analyze Strategies for Conducting Survey

• Step 5: Analyze Survey Results

Because software is intangible, the information about software reuse can only be provided as experience reports, questionnaires and documents that describe the state of the software reuse, its issues and concerns in the software development community. Without the information about software reuse, it is impossible to assess how well software reuse is progressing in an organization. When planning this survey, we established a series of milestones, where a milestone is a recognizable end-point of a survey framework step. At each milestone we had a formal output to work on. At the first milestone we had a Questionnaire Report. The second milestone gave us a Survey Design Report. The third milestone gave us a Company Participation Report. The fourth milestone gave us a Strategy Requirement Report. Finally the fifth milestone was a Survey Result Report.


As we wanted to conduct a survey that would point out areas of software reuse needing more attention, we used empirical enquiries from different people involved in the different stages of software development to better examine software reuse and how it is done in the software industry. We used a mixture of quantitative and qualitative approaches to the questions and gave participants space to expand on their responses and add additional information if they wished.

STEPS → MILESTONES

Gather Good Candidate Questions → Questionnaire Report
Select Questions of Interest → Survey Design Report
Select Companies → Company Participation Report
Analyze Strategies for Conducting Survey → Strategy Requirement Report
Analyze Survey Results → Survey Result Report

Figure 7.1: Framework of Our Survey Process

Any survey requires a considerable amount of time [221]. It takes time to prepare the questions of interest and it takes time for software engineers to answer them. Gaining access to different companies and their employees proved to be the greatest obstacle. The survey was distributed partly by email and, in some cases, in person, by handing the questionnaires out to the participants.


Since this research aimed to explain to what extent and how software is being reused in the software industry, we chose several different organizations. The survey used closed and open-ended questions. When the surveys were collected, the answers were checked, and if any answers were ambiguous, the participant was contacted and asked additional questions in order to avoid misinterpretation. The following sections describe each step in more detail.

7.2.1 Step 1: Gather Good Candidate Questions

Questionnaires are a well-established technique for collecting demographic data and professional opinion. They are similar to interviews and can have open and closed-ended questions [221]. In selecting the questions we put a lot of effort into ensuring that they were clearly worded and that the data collected could be analysed efficiently. It was particularly important to encourage people to respond to the survey and to resolve any ambiguities or misunderstandings with respondents. We took into consideration that well-designed questionnaires are good at getting answers to specific questions from a large group of people, especially if that group is spread across a wide geographical area, making it infeasible for us to visit and interview them all. We developed a Questionnaire Report after this step.

7.2.2 Step 2: Select Questions of Interest

It can be harder to develop good questionnaire questions than structured interview questions, because no interviewer is available to explain the questions or to clarify any ambiguities. Because of this it is important that the questions are specific to software reuse. We chose to ask open and closed-ended questions, and for closed-ended questions a range of answers was offered, including ‘yes’, ‘no’ or ‘maybe’ where possible. We decided not to mix negative and positive statements in our questionnaire, because the questionnaire was already complex enough without forcing participants to pay attention to the direction of the argument. Our questionnaire starts by asking for basic demographic information, e.g. age range, educational level, details of relevant experience and level of expertise within the domain. This background information was useful for putting the questionnaire responses into context. For example, if two respondents give conflicting answers, their different perspectives may be due to their level of experience. We collected only contextual information that is relevant to our study of software reuse. We developed a Survey Design Report after this step.

7.2.3 Step 3: Select Companies

The main data set for this survey was collected from software engineers working in software development, software maintenance or software design. We chose professionals from different organizations working in the software industry as programmers, developers, analysts, software architects, managers, researchers, etc. Our respondents preferred to keep their organization’s name undisclosed for privacy reasons, and we respected their privacy. From each respondent’s job description and experience we could judge how meaningful a contribution they could make to our survey. The purpose of our survey is to see how reuse is being done in the CSE and SPL communities and to analyze, compare and aggregate the survey data in order to derive empirical evidence of key factors in software reuse. We developed a Company Participation Report at the end of this step.

7.2.4 Step 4: Analyze Strategies for Conducting Survey

We had a set of specific questions, which followed the more general questions, that contributed to the data gathering goal. Our questionnaire consisted of 51 questions in total. We decided to subdivide the questions into related topics to make the questionnaire easier and more logical to complete. We designed and subdivided the questionnaire into the following sections:

• Section 1: Section 1 has General Questions and captures answers about years of experience, the participant’s role in the software industry, and level of educational qualification. Some of the key questions are:
1. What is your educational Level?
2. How many years of experience do you have in Software Engineering?
3. What do you consider yourself?

• Section 2: Section 2 has Reuse Measurement Questions and captures answers about reuse measurement, advantages, disadvantages, and factors influencing reuse in the software development community. This section consists of 16 questions. Some of the key questions are:
1. What do you feel are the key benefits of reuse?
2. What do you feel is the main disadvantage with code reuse today?
3. In your opinion, does reuse education influence reuse?
4. What do you feel are the factors contributing to facilitating reuse?
5. Do you feel there is increased recognition towards reuse?

• Section 3: Section 3 has Reuse Technical Questions and captures technical answers about the project/team and its working environment that affect reuse in some way. This section consists of 10 questions. Some of the key questions are:
1. Do you find yourself having much freedom in your work as a developer?
2. How do you know if you have fulfilled the requirements, specifications or goals for particular software?
3. Do you have flexible time frames when working in projects?
4. How often do you move deadlines in projects?
5. How often do you spend time writing code and reuse a class/ component?

• Section 4: Section 4 has Testing Reused Code Questions and captures answers about how analysis, design, coding and testing of software affect code reuse. This section consists of 10 questions. Some of the key questions are:
1. Which part do you find most troublesome: Analysis, Design, Implementation, V&V/testing?
2. How often do you test your software?
3. Do you use any specific test framework? If yes, please specify which framework(s).
4. In case of time shortage during a project, which part do you find is being reduced firstly? Analysis, Design, Implementation, V&V/testing?

• Section 5: Section 5 has Development Environment for Reuse Questions and captures answers about the choice of frameworks, the use of software architecture, classes and libraries, and the development environment, where reuse can be a significant question. This section consists of 11 questions. Some of the key questions are:
1. How often do you search for re-usable code (i.e. libraries, components, classes) instead of doing it yourself?
2. What sort of development environment do you usually use?
3. Does the complexity of (a) component(s) seriously affect decisions on whether you develop it yourself?
4. What do you think should be improved in today’s components technologies? Do they miss a feature that a developer might need?
5. Do you feel the need for well documented software architecture of the system in order to reuse the software?

As our survey covered two communities, CSE and SPL, we developed the questionnaire for the CSE community first; when designing the questionnaire for the SPL community we used the same set of questions, with the addition of specific questions targeted at the SPL community. We developed a Strategy Requirement Report after this step. This report suggested an order for our questions that would encourage people to participate in the survey.


7.2.5 Step 5: Analyze Survey Results

We fed the collected data set into Microsoft Excel for analysis. The first step in analyzing the data was to clean up the raw data. Raw data from the questionnaire consisted of the respondents’ written answers to the questions. It was necessary to clean up the data by removing entries where the respondent had misunderstood a question. Our data cleansing strategy involved phoning the respondent or following up via email so that all ambiguities were cleared up. We cleansed the data and filtered it into different data sets according to the experience and professional level of the respondents. The second step was to divide the data for quantitative and qualitative analysis. Quantitative data is in the form of numbers, or can easily be translated into numbers, and quantitative analysis uses numerical methods to ascertain the magnitude, amount or size of something. Qualitative data, in contrast, is difficult to measure, count or express in numerical terms. The quantitative analysis techniques we used were averages and percentages. Percentages are useful for standardizing the data, particularly when comparing two or more large sets of responses. Averages and percentages are well-known numerical measures [221]. Before any analysis could take place, the data needed to be collated into analyzable data sets. We translated the quantitative data sets into rows and columns, where one row equals one record (e.g. one respondent); using Excel made simple manipulations and data set filtering easier. Initial analysis involved calculating averages and percentages and identifying any ‘outliers’, i.e. values that are significantly different from the rest. Producing a graphical representation of the data helped us to get an overall view of the data and any patterns it contains. Qualitative analysis is used to gain an overall impression of the data and to start looking for patterns in the data collected. 
Some answers to software reuse concerns emerged during the data gathering itself, and we had some idea of the kind of software reuse (opportunistic, vertical or horizontal) practised in each organization. However, it was important for us to confirm and re-confirm findings to make sure that our analysis was not biased by initial impressions. We kept clear and consistent records of what was found and what emerged from our findings. For qualitative analysis, we studied the collected data, focused on our software reuse goals, and kept clear records of the analysis as it progressed. We developed a Survey Result Report and published several papers based on it [18], [19], [20], [222].
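The quantitative part of Step 5 (cleaning, filtering by experience, percentages, averages and spotting outliers) was done in Microsoft Excel. As a hedged illustration of the same sequence of operations, the Python sketch below works on made-up toy records rather than survey data. The field names, the `misunderstood` cleaning flag and the two-standard-deviation outlier rule are assumptions; the text only says that outliers are values significantly different from the rest.

```python
from statistics import mean, stdev

# Made-up toy records for illustration only; they are NOT survey data.
# One dict per respondent mirrors "one row equals one record".
# Field names ("years", "reuses", "misunderstood") are hypothetical.
TOY_RECORDS = [
    {"id": 1, "years": 12, "reuses": "yes"},
    {"id": 2, "years": 15, "reuses": "no"},
    {"id": 3, "years": 11, "reuses": "yes", "misunderstood": True},
    {"id": 4, "years": 18, "reuses": "maybe"},
    {"id": 5, "years": 20, "reuses": "yes"},
]


def clean(records):
    """Drop entries flagged as answering a misunderstood question."""
    return [r for r in records if not r.get("misunderstood", False)]


def filter_by_experience(records, min_years):
    """Keep respondents with at least min_years of experience."""
    return [r for r in records if r["years"] >= min_years]


def percentage(records, question, answer):
    """Percentage of respondents to `question` who gave `answer`."""
    answered = [r for r in records if question in r]
    if not answered:
        return 0.0
    hits = sum(1 for r in answered if r[question] == answer)
    return 100.0 * hits / len(answered)


def average(records, field):
    """Arithmetic mean of a numeric field."""
    return mean(r[field] for r in records)


def outliers(values, k=2.0):
    """Values more than k sample standard deviations from the mean.

    The k = 2 threshold is an assumed rule, not one stated in the text.
    """
    if len(values) < 2:
        return []
    m, s = mean(values), stdev(values)
    if s == 0:
        return []
    return [v for v in values if abs(v - m) > k * s]
```

On the toy records, `clean` drops respondent 3, `percentage(clean(TOY_RECORDS), "reuses", "yes")` gives `50.0` and `average(clean(TOY_RECORDS), "years")` gives `16.25`. Reporting percentages rather than raw counts is what allows data sets of different sizes to be compared.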

7.3 Chapter Summary

This chapter presented the way we organized our survey and the framework we used to conduct it. It described in detail the planning and execution of each step of our framework and the milestone of each step. We incorporated five steps into our framework. The first step was Gather Good Candidate Questions, in which we put together a large set of clearly worded questions around software reuse. The second step was Select Questions of Interest, in which we selected from this large set a mixture of quantitative and qualitative questions that would help us to gather the information we were looking for. The third step was Select Companies, in which we selected the companies we wanted to participate in our survey so that the information collected would be useful to us. The fourth step was Analyze Strategies for Conducting Survey, in which we categorized our questions so that participants follow a logical sequence while answering them. The fifth and final step was Analyze Survey Results, in which we used Microsoft Excel to analyze the data we had collected. At the end of each step we developed a report to capture the output of that step. The most important output was the Survey Result Report from the last step, which detailed the issues and concerns in software reuse that we identified from the surveys; we published several papers from this report. In the following chapters we outline the results of conducting the survey in the Conventional Software Engineering community and in the Software Product Line community.


CHAPTER 8: CONCERNS IN SOFTWARE REUSE IN CONVENTIONAL SOFTWARE ENGINEERING

8.1 Introduction

The aim of this chapter is to identify and explore the issues and concerns in software reuse in the Conventional Software Engineering (CSE) community. Software development in the CSE community needs best practices and tools to help software engineers develop software that is as fault-free as possible. Software development cannot possibly become an engineering discipline so long as it has not perfected a technology for developing products from reusable assets in a routine manner on an industrial scale. As discussed in Chapter 2, software reuse has many advantages, including cost reduction, faster software development and low maintenance cost. Software maintenance is defined as the performance of all activities required to keep a software system operational and responsive after delivery. This includes adaptation to changing requirements and correction of faults. As discussed in Chapter 6, the focus of our modernization approach is to keep the functionality of the legacy system unchanged and to improve the quality attribute of maintainability. The core concept for keeping the functionality unaltered is to restructure and reuse the code and other artefacts such as class specifications, source documents, design patterns, algorithms or code modules. Maintenance is inherently reuse-oriented [223], so for modernization purposes the issues and concerns in software reuse must be identified. This chapter discusses in detail the software reuse issues and concerns identified in our survey of the CSE community, with a view to benefiting legacy system modernization. Although successful software reuse experiments are increasingly common [224], [225], success is not the norm. Software reuse is not a matter of routine practice and is mainly done in an ad-hoc manner. The promise of software reuse remains for the most part unfulfilled, and a number of issues, such as what to reuse, where reusable assets can be found and whether domain engineering is a crucial aspect of reuse, remain worthy of further research [24]. Further exploration of software reuse issues and concerns will lead to a better understanding of software reuse and can form the basis for additional work to help fulfill its promise. One aspect of software reuse is that it is different from software design. Good software design advocates designing software from reusable assets and producing software systems with the perspective that they might be reused. Software reuse deals with producing reusable assets and exploiting reusable assets, so as to make good software design a routine practice. Eighteen software companies and four universities participated in our survey, giving 60 respondents in total from research, academia and industry. The demographics of the respondents are summarized as follows:

• Total number of respondents: 60
• Experience (Postgraduate): 45
• Other IT Professional Experience: 15
• Domain (Industry): 40
• Academia and research: 20

We conducted this survey of software reuse to identify issues and concerns in software reuse, which may pave the way to developing a suitable software reuse process for software organizations and may be used to develop efficient methods of software reuse for the software community. The embedded knowledge about reuse in an organization could be uncovered using many different methods. We used the survey method so that we could determine to what extent reuse was taking place in projects today, how developers use and reuse different types of code, how much testing they do on the code, whether or not the code is more reliable, and what issues and concerns are inhibiting software reuse in the software development life cycle. The survey questionnaire consists of 51 questions, all of which are given in Appendix A. 
We followed the survey framework discussed in Chapter 7. The results and analyses are discussed in the following sections.


9.2 Presentation

A total of 51 questions were put together and organized into five subsections: general (2 questions), software reuse measurement in SPL (16 questions), software reuse technical aspects in SPL (10 questions), testing and reliability of reused software in SPL (10 questions), and development environment for reuse in SPL (13 questions). The survey questions are outlined in Appendix B. The general questions were asked to capture the educational and experience level of the respondents. Reuse related questions included, for example, identifying the level of reuse being done in SPL, measuring the reuse level, testing of reused code, development environment used, etc. The survey gathered information regarding product line activities, in particular the reuse of software, within industry. The survey was targeted specifically at the software product line community. This survey was both quantitative and qualitative so as to gather current information about software reuse from organizations and people active in software product line efforts.

8.2 Results and Analysis

8.2.1 Section 1: General Questions

Of the survey respondents, 75% had a postgraduate educational level in IT and only 25% had other professional qualifications in IT. We did not specifically ask the age of the software engineers, as some people may have perceived this as a violation of their privacy. However, we asked about their years of experience in the Software Engineering field. The survey results show that the respondents’ experience ranges from 10 to 20 years. Simply put, these numbers could mean that we have a mature population in terms of working experience and educational level. The responses to the general questions match those of other studies conducted in the area of software engineering [226]. This indicates that we have gathered a good sample population, so the results we analyze can address our software reuse issues and related questions. Our sample reflects the average software engineer and developer working in software development, software maintenance, software modernization, etc.
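As a worked check, the education percentages above follow directly from the respondent counts given in the demographic summary of Section 8.1 (60 respondents in total), and the same counts give the split by domain:

```latex
\[
\frac{45}{60} = 0.75 = 75\% \ (\text{postgraduate}), \qquad
\frac{15}{60} = 0.25 = 25\% \ (\text{other IT qualification}),
\]
\[
\frac{40}{60} \approx 66.7\% \ (\text{industry}), \qquad
\frac{20}{60} \approx 33.3\% \ (\text{academia and research}).
\]
```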

8.2.2 Section 2: Software Reuse Measurement

Reuse measurement is crucial for determining whether a software reuse program is succeeding [227], but we found that not many organizations measure reuse. Software engineers have many reusable assets, artefacts, code, etc. available to them, but do they actually use them and find them valuable? Our data says the answer is both yes and no: 50% of the respondents say yes and 50% say no. Software reuse depends on many factors, such as the complexity of reuse, the Not-Invented-Here (NIH) syndrome and the reliability of the reusable asset. Most of the time software engineers want to reuse, but they cannot find reliable reusable assets. People do believe that reuse has many benefits: 100% of the respondents say that it is beneficial to reuse software for higher productivity, quicker time to market, better use of resources and increased quality. However, a piece of code, class or core asset is seldom reused unless it is provided to the engineer. The issues and concerns of software reuse were the main focus of the survey. The survey results show that 88% of the software engineers do not feel comfortable reusing code. Some of the major concerns raised are lack of tool support, increased maintenance cost, NIH syndrome, maintaining a component library and finding reusable components. These numbers could mean that there is no standardized process for software reuse and that software engineers are not comfortable using code not written in their own projects. Software engineers were asked several questions about reuse in software engineering. Almost 58% of the respondents said that they usually had some elements of reuse in their projects, but only 8% of these respondents were from the development sector. Looking at this result, we can conclude that only 8% out of the 58% are actually doing reuse in industry. 
One of the main reasons given in the survey was that the code was not owned by the consultants engaged in the development of a system but by the customer who paid for the work, so the code could not be reused across consulting engagements. In general, the code is not open source and is proprietary to the development organization, which naturally makes it harder to reuse later for the development of other systems. Only 45% of the software engineers actively search for code to reuse. This low number is not surprising when one considers that people creating components almost never certify them in any way (either in-house or commercially): only 8% of the respondents use some sort of component certification on a regular basis. Some software components are available to software engineers as Commercial-off-the-shelf (COTS) components, which are available in the market for reuse, and software developers who are aware of COTS components are reusing them. If software engineers and developers are aware that code is available and know how to reuse it, they prefer to reuse it. Software engineers and developers see the greatest advantage of reuse for complex functionality. When deciding whether to build from scratch or to reuse, respondents were of the opinion that if components are available and reliable, they would prefer reusing them to building from scratch. Unfortunately, most respondents found that components performing complex tasks were hard to find. The reason for this, claimed by many software engineers, was that such components usually contain business logic made specifically for one company. They referred to this kind of software as bespoke software: the components are custom made and cannot be reused in another company. This raises an open question about how to classify reuse: reuse that is domain specific (within an organization or for a particular type of product) and reuse that spans organizations and can be used in different products. Moreover, many projects are one-off, and the experience and code developed on one project cannot be reused or transported to another project. This leads us to classify components into different types. Figure 8.1 shows the types of reusable components discussed above.


Figure 8.1: Classification of Reusable Components

Domain-Specific Components usually contain business logic made specifically for a particular type of system for one company and cannot be reused in other companies. Universal Components can be reused across companies. One-Off Components are made for one project only and cannot be reused anywhere else, even within the same organization.
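The three categories of Figure 8.1 can be read as a simple decision rule on the scope within which a component can be reused. The Python sketch below is illustrative only: the two boolean attributes (`reusable_within_org`, `reusable_across_orgs`) and the example components are hypothetical and were not measured in the survey.

```python
from dataclasses import dataclass
from enum import Enum


class ComponentKind(Enum):
    """The three categories of Figure 8.1."""
    ONE_OFF = "One-Off"
    DOMAIN_SPECIFIC = "Domain-Specific"
    UNIVERSAL = "Universal"


@dataclass(frozen=True)
class Component:
    name: str
    # Hypothetical attributes: can the component be reused in other systems
    # of the same organization, and in systems of other organizations?
    reusable_within_org: bool
    reusable_across_orgs: bool


def classify(component: Component) -> ComponentKind:
    """Map a component onto Figure 8.1, checking the broadest scope first."""
    if component.reusable_across_orgs:
        return ComponentKind.UNIVERSAL
    if component.reusable_within_org:
        return ComponentKind.DOMAIN_SPECIFIC
    return ComponentKind.ONE_OFF


if __name__ == "__main__":
    # Hypothetical example components.
    examples = [
        Component("date-parsing library", True, True),
        Component("company billing rules", True, False),
        Component("one project's data migration script", False, False),
    ]
    for c in examples:
        print(f"{c.name}: {classify(c).value}")
```

Checking the widest reuse scope first means a component is always placed in the broadest category it qualifies for.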

8.2.3 Section 3: Software Reuse Technical Aspects

Many software engineers believe that they need to change their programming language to promote reuse. However, opinion on the importance of the choice of programming language is divided. Some respondents think that language matters more for reuse and argue that features of languages such as Ada, C++, Smalltalk and Java provide better reuse support; 65% of our respondents believe that the choice of programming language supports reuse. Past analyses of the effects of languages on the various aspects of the software development process have shown, however, that the programming language may not be very important [228]. Respondents also talked about reuse and porting software to new environments: 80% of the respondents were in favour of porting the same system to a new environment, and 20% were in favour of reusing software components in a different environment for a different system. Software reuse, which is the use of existing software knowledge or artefacts to build new software artefacts, should not be confused with porting. The two are distinguished as follows: reuse is using an asset in different systems; porting is moving a system across environments or platforms. For example, in Figure 8.2, if components of System A developed for a particular environment, Environment 1 (shown as colored stars), are reused in System B for a different environment, Environment 2, then this is an example of reuse. If System A, developed for Environment 1, is moved into a new environment, Environment 2, then this is an example of porting.

Figure 8.2: Difference between Reuse and Porting (reuse: a reusable component of System A in Environment 1 is reused in System B in Environment 2; porting: System A itself is moved from Environment 1 to Environment 2)
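The distinction drawn above can be stated as a small check. In the following Python sketch, `Deployment` and `classify_change` are hypothetical names, and a system is reduced to a name, an environment and a set of component names. Porting is flagged when the same system changes environment; reuse is flagged when a different system shares at least one component, whatever its environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """A system running in an environment, built from named components."""
    system: str
    environment: str
    components: frozenset[str]


def classify_change(before: Deployment, after: Deployment) -> str:
    """Distinguish reuse from porting, following the definitions in the text.

    Porting: the same system is moved to a different environment.
    Reuse:   an asset (here, a component) is used in a different system.
    """
    if before.system == after.system and before.environment != after.environment:
        return "porting"
    if before.system != after.system and before.components & after.components:
        return "reuse"
    return "neither"


# The two scenarios of Figure 8.2 (component names are hypothetical).
a_env1 = Deployment("System A", "Environment 1", frozenset({"star-1", "star-2"}))
b_env2 = Deployment("System B", "Environment 2", frozenset({"star-1", "b-only"}))
a_env2 = Deployment("System A", "Environment 2", frozenset({"star-1", "star-2"}))

print(classify_change(a_env1, b_env2))  # reuse
print(classify_change(a_env1, a_env2))  # porting
```

The environment plays no part in the reuse test: under the definition in the text, using a component in a different system counts as reuse even if both systems run in the same environment.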

Meeting project deadlines is still an important issue in today’s software projects. Figure 8.3 shows the analysis of the question “How often do you move deadlines in projects?” We found that 45% of the respondents stated that project completion deadlines are moved often, 30% stated that they are moved rarely and 25% stated that they are never moved. These figures show that there is still room for improvement. According to the CHAOS report by The Standish Group [229], only 9% of the projects in larger software companies are on time and on budget; in our case, approximately 25% of projects are on time. Over 57% of the respondents claimed that the majority of the projects they took part in encompassed more than 100,000 bytes of source code. According to Ghosh et al. [230], the average project in the FLOSS study has a mean size of 346,403 bytes of source code.

Figure 8.3: Moving Deadlines (Often 45%, Rarely 30%, Never 25%)

The large volume of literature on reuse CASE tools and their growing market show that many organizations regard CASE tools as a way to improve reuse. To study this question, participants were asked whether they agreed with the statement, “CASE tools have promoted reuse across projects in their organization”. The data shows that respondents generally feel that CASE tools have not promoted reuse across projects in their organization: 65% of respondents do not agree, even somewhat, that CASE tools have promoted reuse. We conclude that CASE tools are not currently very effective in promoting reuse. There are at least three possible reasons for this: reuse CASE tools may not be used at all; they may not be used correctly; or they may not be effective in promoting reuse even when they are used correctly. However, SOA and some product line tools do promote reuse [231]. Many of the respondents believe that 25% of their software development time is spent on testing when they try to reuse a piece of code/class/component from another project. The size of the component does not affect their decision on whether to develop it themselves or reuse it. However, 25% believe that software engineers prefer to build their own software rather than reuse someone else’s. This is often referred to as the “Not Invented Here” (NIH) syndrome. These respondents agreed that it is more interesting to write their own software than to reuse software whose reliability they are not sure of. Most respondents (65%) do not have the NIH syndrome as long as the component is reliable. We can conclude that most software engineers prefer to reuse rather than build from scratch if they can find the component and the component is reliable. Most respondents believe that their workload increases when they reuse others’ code: 92% hold this view, while 8% believe that it will not increase their workload. Respondents also said that this depends on the complexity of the reuse. 94% of the respondents believe that reused code requires extra testing and that writing test cases for these components can be cumbersome. According to the definition of software reuse, anything can be reused, from an entire application to components, documents, objects, functions, structures, etc. [228]. The problem is that there is no process that suggests how to reuse components, documents or test cases. Hence, developing such a reuse process may promote reuse. Most of the respondents do believe that reuse saves cost and gives a quicker time to market, as reusing components is quicker than developing them from scratch. Reuse also gives improved confidence in a component if it is used in many systems. A systematic reuse process can save time and cost and can shorten software development time. Respondents also talked about the reuse philosophy and the need for it to be taught to software engineers and developers. Few respondents feel that the size of a company, division or project is predictive of organizational reuse.
However, they feel that the educational/professional background of the key people involved in technical decision making is predictive. For example, if these people have been taught about reuse, whether through product lines, SOA, etc., they are more likely to introduce a reuse philosophy within their organization. 90% of our respondents believe that reuse education helps reuse practice. Participants were asked “In your opinion, do quality concerns inhibit reuse? If so, please state what quality concerns are most common in your organization?” 78% of respondents believe that if one is reusing code/components developed by others, there may be concerns about reliability, security, performance, etc. However, 82% also believe that knowing that a component (perhaps a service in an SOA) has been used in the development of multiple systems can help build confidence in that component. Domain engineering is the activity of collecting, organizing and storing past experiences in building systems or parts of systems in a particular domain in the form of reusable assets, as well as providing an adequate means for reusing these assets when building new systems. All the respondents are of the opinion that domain knowledge is the key to reusing software. This is certainly the case when the context of software development is a product line [231]: one cannot decide what software to build as reusable software without understanding the domain and identifying common functionality/features that can be developed as reusable software. The main advantages of the Software Product Line (SPL) approach are:

• The product-line-wide engineering vision can be shared among the projects easily;

• Development knowledge and corporate expertise can be utilized efficiently across projects; and

• Assets can be managed systematically.

The SPL approach often requires a large upfront capital investment to create an organizational structure dedicated to implementing a reuse program, and it takes time to see a return on that investment [232]. Our respondents do not identify or try to understand critical components for reuse: 100% said that they reuse a component only if it is provided to them. Our respondents do not look for reusable components. They are also not sure how software architecture can be used for software reuse or what role software architecture plays in software reuse.


8.2.4 Section 4: Testing and the Reliability of Reused Software

Reuse is likely to occur only if potential users are confident of the quality of reusable assets. Reuse-in-the-small was a problem in the 1950s because not many components had been developed at that time; organizations were creating/developing functional requirements to fit their own organizational needs. Since then, the software community has built huge, useful collections of mathematical and data-processing library routines that are still commonly used today. Reuse-in-the-large is, and always has been, an unsolved problem because software assets/components were not developed for reuse. In spite of the enthusiasm for reusable components, finding the precise component in a library that will solve the problem at hand is nearly impossible. The problem is twofold. Firstly, we cannot find the perfect component in the library of reusable parts. Secondly, even if we find the perfect component, we do not know whether it is reliable enough to use. A documented software architecture makes it easier to understand the system and how system reliability depends on the reliability of its components and their interfaces, since the way in which the different modules of the software interact is defined by the software architecture. It is commonly thought that software engineers distrust assets developed outside their immediate environment and are thus less likely to reuse assets from outside their organization. Our survey revealed that if the software architecture is provided to software engineers, software reuse could become common in practice. Quality attributes such as reliability could be measured using architecture-based software reliability analysis, and software engineers would feel more comfortable reusing reliable components. We considered several responses in investigating this issue.
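One simple form of architecture-based reliability estimation treats component failures as independent and raises each component's reliability R_i to the expected number of times V_i it is executed per run, giving R_sys = ∏ R_i^{V_i}. The Python sketch below is an illustration of that textbook approximation, not the method used in this thesis. The component names, reliabilities and visit counts are hypothetical; in practice the visit counts would be derived from the documented architecture, for example from transition probabilities between modules.

```python
import math


def system_reliability(reliability: dict[str, float],
                       expected_visits: dict[str, float]) -> float:
    """Approximate system reliability as the product of R_i ** V_i.

    Assumes component failures are independent and that V_i, the expected
    number of executions of component i per run, comes from the architecture.
    """
    return math.prod(reliability[c] ** expected_visits[c] for c in reliability)


# Hypothetical three-component architecture.
visits = {"ui": 1, "business_rules": 2, "storage": 1}
in_house = {"ui": 0.999, "business_rules": 0.99, "storage": 0.995}
# Same architecture, but business_rules replaced by a (hypothetically) more
# reliable component that has already been reused in many systems.
reused = {**in_house, "business_rules": 0.999}

print(round(system_reliability(in_house, visits), 4))  # 0.9742
print(round(system_reliability(reused, visits), 4))    # 0.992
```

Because a frequently visited component contributes its reliability several times over, replacing it with a more reliable reused component has a disproportionate effect on the system estimate.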
Participants were asked to rate their agreement with the statement “Software developed elsewhere is reliable” and were also asked “Do you test a component in any way before you reuse it?” Our survey showed a strong correlation between these variables, suggesting that quality concerns are closely related to the amount of external reuse. Almost 63% of our respondents stated that they had some element of reuse in their code, but only 40% of these claimed that they tested the reused code in any way. The reason for this was primarily that they found writing test cases afterwards too tedious. 78% of the respondents expressed concern about the reliability of the components to be reused, and 22% were not sure. Participants were also asked whether they feel current software engineering practice is influencing reuse. The responses to this question were negative, which suggests that the way reuse is currently done should change. 88% of the respondents feel that reuse education may help them learn about recent reuse technology and make reuse possible in their organization. Only 1% of the respondents are from a group that uses a product line approach, which supports our analysis that reuse education is lacking in the software community. In some industries reuse is more common than in others. For example, the automotive industry uses product line approaches in which there is a lot of reuse. There is also a move across many sectors towards Service Oriented Architecture (SOA) approaches, which involve reuse of services [233]. However, many organizations do not really know how this is achieved. This leads us back to the need to educate software engineers and developers in the reuse philosophy. Reusing software components may also alter the time allocated to the software development phases. If a component is already available for reuse, the time spent in the analysis and design phases can be minimized. The results we collected show that 50% of the time is spent in the analysis and design phases. This 50% covers the analysis and design of all the components used in the software, so the more components are reused, the more of this 50% can potentially be saved. However, our survey could not determine how much time is saved when one or more components are reused. Time can also be saved during the testing phase if the reused components have already been used in the development of multiple systems.
On average, respondents said that they spend 30% of a project’s development time in the analysis phase, 20% in the design phase, 30% in the implementation phase and 20% in the verification and testing phase. Figure 8.4 shows the distribution of time across these phases. If components are reused, the analysis, design and testing time can be reduced.


Figure 8.4: Time Distribution over Software Development Phases (Analysis 30%, Design 20%, Implementation 30%, V&V/Testing 20%)
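The survey could not establish how much time reuse actually saves, but the phase distribution of Figure 8.4 shows how any such saving would propagate to the project total. In the Python sketch below, the phase shares are the survey averages, while the per-phase savings in `assumed_savings` are purely hypothetical placeholders, not survey results.

```python
# Average share of project time per phase, as reported by respondents.
PHASE_SHARE = {
    "analysis": 0.30,
    "design": 0.20,
    "implementation": 0.30,
    "verification_and_testing": 0.20,
}


def remaining_effort(phase_share: dict[str, float],
                     savings: dict[str, float]) -> float:
    """Fraction of the original project time left after reuse.

    `savings` maps a phase to the fraction of that phase's time saved by
    reuse (0.0 = no saving, 1.0 = phase eliminated); unlisted phases save nothing.
    """
    return sum(share * (1.0 - savings.get(phase, 0.0))
               for phase, share in phase_share.items())


# Hypothetical savings when some components are reused (NOT survey data).
assumed_savings = {"analysis": 0.2, "design": 0.2, "verification_and_testing": 0.25}
print(f"{remaining_effort(PHASE_SHARE, assumed_savings):.2f}")  # 0.85
```

Under these assumed figures the project would need about 85% of its original time. The model is linear, so the outcome is only as good as the per-phase savings fed into it.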

There are some very successful examples of the use of software product lines. One example is based on the real experiences of Nokia. During the analysis phase, the scope of the different mobile phone series is defined. During the requirements phase, requirements are defined both for the family and for individual types of phones, e.g., the Nokia 6100/8300 series; these software requirements serve as a base for the whole product family, which speeds up the overall software development process. The design phase is more focused on the individual types of phones: the requirements from the preceding phase are used to create the software for the particular type of phone being developed. The use of a product line approach gave Nokia the opportunity to increase its production of new mobile phone models from 5-10 to around 30 new models per year [234], [235], [236]. This is a strong example of how the reuse of software components can reduce development time.


8.2.5 Section 5: Development Environment for Reuse

Our survey respondents are not very aware of the new technologies being adopted for software reuse or of the environments in which software reuse is happening, such as Software Product Lines. Respondents were asked about different development platforms, such as .NET, Enterprise Java and CORBA. The majority of respondents preferred Java, and in some cases CORBA, to .NET. Software engineers prefer J2EE over other development platforms, which is also supported by other studies [230]. Software engineering aims for the systematic, principled design and deployment of applications that fulfill software’s original promise. 59% of our respondents believe that a software reuse process will promote reuse in their community, and these respondents also agreed that a software reuse repository will enhance software reuse. 41% of respondents were not sure how a reuse process and repository could improve software reuse. 78% of our respondents believe that domain knowledge is the key to software development and hence said yes to software reuse; 10% of the respondents did not share this view, and 12% said that domain knowledge may not be the key to reuse. A comprehensive, managed and controlled architectural approach to reuse is given in the Software Engineering Institute’s (SEI) Framework for Software Product Line Practice [237]. The SPL Practice Framework outlines 29 practice areas that organizations should address to undertake SPL. The SEI’s SPL patterns outlined in [238] address subsets of these practices that organizations need to establish to introduce an SPL approach. Such an approach is too cumbersome for small and medium-sized companies to use effectively, and in response, lightweight approaches to managed reuse have been developed, e.g., by Krueger [228]. These architectural approaches are prescriptive, providing a solution to manage reuse for companies with specific objectives in specific situations.
In the SPL context, software products are developed in a two-stage process consisting of domain engineering and application engineering [231], [239]. The domain engineering process is responsible for establishing the reusable platform [231]. The platform spans all software development phases and their existing artefacts, and should include artefacts such as the requirements specification, architecture documentation, design specification, implementation artefacts and test cases. Our survey suggests that if these elements of domain engineering were achievable, reuse would not be as much of an issue as it is today. The application engineering process is responsible for deriving product line applications or products from the platform established in domain engineering [231].
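The division of labour between the two processes can be sketched in a few lines of Python. This is a hypothetical, simplified model, not the SEI framework or any particular product line tool. Domain engineering is represented by a `Platform` that holds the common assets and the allowed variants at each variation point. Application engineering is represented by `derive_product`, which binds every variation point to exactly one allowed variant. All asset and variation-point names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """Output of domain engineering (hypothetical, simplified model)."""
    common_assets: set[str]
    # Variation point -> the set of variant assets allowed at that point.
    variation_points: dict[str, set[str]]


def derive_product(platform: Platform, choices: dict[str, str]) -> set[str]:
    """Application engineering: derive a product by binding variation points."""
    unbound = platform.variation_points.keys() - choices.keys()
    if unbound:
        raise ValueError(f"unbound variation points: {sorted(unbound)}")
    product = set(platform.common_assets)  # every product reuses the common assets
    for point, variant in choices.items():
        if variant not in platform.variation_points.get(point, set()):
            raise ValueError(f"{variant!r} is not an allowed variant of {point!r}")
        product.add(variant)
    return product


platform = Platform(
    common_assets={"requirements-spec", "reference-architecture", "call-handling"},
    variation_points={"display": {"mono-display", "colour-display"},
                      "camera": {"no-camera", "camera-driver"}},
)
print(sorted(derive_product(platform, {"display": "colour-display",
                                       "camera": "no-camera"})))
```

Rejecting unbound or unknown variants at derivation time reflects the idea that products are derived from the platform rather than assembled ad hoc.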

8.3 Chapter Summary

This chapter has presented a qualitative and quantitative analysis of information about software reuse gathered from respondents working in the conventional software engineering field at various levels in different organizations. The survey data has been analyzed and discussed. The analysis shows the need for software reuse and identifies the critical factors that affect and inhibit software engineers and developers in adopting software reuse in software development or in legacy modernization. As maintenance of legacy systems becomes difficult and these systems need to be modernized while keeping their functionality unchanged, software reuse has to be made a core concept of the modernization approach. Software reuse, maintenance and modernization are therefore closely related. Our modernization approach for a legacy software system, as discussed in Chapter 4, is based on software reuse. We found from our survey that software engineers and developers are interested in reusing available software components if the components are reliable. 97% of our respondents believe that common reusable components may be widely used if they know how to use the component. We also found that most software engineers would prefer to reuse software rather than build it from scratch, contradicting the common wisdom that software engineers prefer building things themselves rather than reusing. Our survey has also revealed that software engineers have trouble finding the perfect, reliable component to reuse because they do not know where to look for components to reuse. Education on software reuse is not very common in most organizations, and this hinders software reuse. Reuse and domain engineering need to be integrated and put into practice more commonly if the NIH syndrome, which inhibits common reuse practices among software engineers, is to be avoided. Better system reliability is one of the goals of software reuse.
It is argued that reusable components, because of their more careful design and testing and their broader and more extensive usage, can be more reliable. If so, then using these more reliable components in a system can increase the reliability of the system as a whole. Our survey has shown that education is a primary factor in better reuse, yet there has been little systematic study of how best to do reuse education. Today, software engineers working on the modernization of a legacy system must be knowledgeable enough to support software reuse for modernization. Certainly, both academia and industry could improve education practices around software reuse. One way to do this, and to facilitate better reuse technology transfer, would be closer joint work between industry and academia on modernization projects that incorporate software reuse. Software reuse is a core concept in software product lines, and descriptive studies have also been undertaken that describe how companies manage SPLs [240]. The use of a product line approach gave Nokia the opportunity to increase its production of new mobile phone models from 5-10 to around 30 per year [238].


CHAPTER 9: CONCERNS IN SOFTWARE REUSE IN SOFTWARE PRODUCT LINES

9.1 Introduction

The objective of this chapter is to identify and explore the issues and concerns in software reuse in the Software Product Line (SPL) community. We have seen in Chapter 8 that software reuse raises many concerns in the Conventional Software Engineering (CSE) community. We wanted to explore these concerns further in a community where software reuse is happening, i.e. the SPL community. As discussed in Chapter 2, one of the reasons for introducing software product lines is the reduction of costs through reusing common assets for different products. The SPL community is the second community of interest, one where software reuse is very prominent: it reuses common assets for different products of the same family. Even in this case, software reuse is not an easy task. Some of the reasons for this could be increasing complexity due to the multitude of different functions and their interactions, as well as a rising number of different product variants. The need to rapidly deploy high-quality families of software products has led researchers and practitioners to investigate how to integrate software reuse in software product lines. In SPL, software reuse is evidently a core concept; it has, however, failed to become a standardized practice. In conducting our survey, we wanted to find an answer to the question “What are the issues and concerns of software reuse in the software product line community?” This will allow us to compare the issues and concerns in software reuse in the Software Product Line community and the Conventional Software Engineering community. We also wanted to find out whether the issues and concerns in software reuse in the SPL community mean that systematic reuse does not happen. If so, the result would further motivate us to develop a systematic software reuse model that can be used by both the CSE and SPL communities.


The study of software product lines addresses the issues of engineering software system families, or collections of similar systems [238], [240]. The objective of a software product line is to reduce the overall engineering effort required to produce a collection of similar systems by capitalizing on the commonality among the systems and by formally managing the variation among them. This is a classic software reuse problem [228]. The primary focus of software product line research has been on issues such as domain analysis and modeling, architecture modeling, process definition, etc. [226]. Research in software product lines [232], [231] and [239] shows that there are success stories, but no strong rules about how software is reused can be derived from them. Systematic reuse of software still remains a classic problem. However, software reuse has been identified as an important topic in most of the literature on SPL. Systematic reuse is one of the goals of software product lines, meaning that components must not only be copied but actually shared among several subprojects. Moreover, a software product will not consist only of components shared across the whole product line; some of its components will be specific to a single product. This means that the ability to reuse software depends on the commonality and variability requirements, and the foundation of systematic reuse is the systematic analysis of requirements in software product lines. Requirements engineering is not the only phase in the SPL software development life cycle where there may be problems with reuse. There may be other unnoticed and undocumented problems associated with software reuse in other phases, such as analysis, design, implementation or V&V/testing. Problems with software reuse can also arise from the use of a particular Integrated Development Environment (IDE).
SPL practitioners come across many such software reuse problems. We surveyed the software product line practitioner community (managers, developers, architects and researchers) to help identify and document software reuse concerns in SPLs. Up to now, software reuse issues and concerns have not been surveyed, compared and documented in a systematic way. We used the survey method to identify constraints on software reuse and to determine to what extent reuse is taking place in SPL projects today, how developers and programmers use and reuse different types of code, how much testing they do on the code, whether or not the code is more reliable, and what issues and concerns with the use of Integrated Development Environments (IDEs) in today’s SPL community are inhibiting software reuse in SPLs. We came across many interesting outcomes when we analyzed the survey results to document the reuse issues and concerns associated with SPLs. Some results are consistent with the literature, while others contradict it.
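The commonality and variability analysis on which systematic reuse depends can be illustrated with plain set operations. Features shared by every product are candidates for the common platform, and each product's remaining features form its variable part. This Python sketch assumes that each product is already described by a set of feature names. The products and features are hypothetical, and real SPL requirements analysis is considerably richer, covering feature dependencies, constraints and binding times.

```python
def commonality_variability(
        products: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Split product features into common and product-specific parts."""
    if not products:
        return set(), {}
    common = set.intersection(*products.values())
    variable = {name: features - common for name, features in products.items()}
    return common, variable


# Hypothetical product family described by feature sets.
products = {
    "basic":  {"calls", "sms", "contacts"},
    "camera": {"calls", "sms", "contacts", "camera"},
    "music":  {"calls", "contacts", "music-player"},
}
common, variable = commonality_variability(products)
print(sorted(common))              # ['calls', 'contacts']
print(sorted(variable["camera"]))  # ['camera', 'sms']
```

In this example "sms" is shared by two of the three products but still lands in the variable part. A real analysis would usually distinguish such partial commonality from truly product-specific features.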

9.2 Presentation

Five companies and four universities participated in this survey. The survey questions were divided into five sections:

• Section 1: General Questions: This section captures answers about years of experience, the role of the participant in the software product line, and level of educational qualification. The past experience of the people working in software product lines should make their answers a realistic reflection of software reuse in SPL and of the kind of software reuse done in SPL. Responses from experienced software product line practitioners or managers at high-technology companies involved in software product lines are fairly representative, and this data can be treated as a benchmark of current software reuse practices in software product lines.

• Section 2: Reuse Measurement Questions: This section captures answers about reuse measurement, advantages, disadvantages, and factors influencing reuse in the software product line community. The data collected here represents the attitudes, beliefs and practices in reusing code and other lifecycle objects in the SPL community. Reuse of software products, processes and knowledge is the key to enabling the software development industry to achieve dramatic improvements in productivity and quality. But are these being reused in the SPL community?

• Section 3: Reuse Technical Questions: This section captures technical answers about the project/team and its working environment in SPL that affect reuse in some way. Why is reuse the core concept of SPL?

• Section 4: Testing Reused Code Questions: This section captures answers about how analysis, design, coding and testing of software affect code reuse in the SPL community. A software reuse technology allows broad and extensive use of reusable objects. How is this done in the SPL community?

• Section 5: Development Environment for Reuse Questions: This section captures answers about the choice of frameworks, use of software architecture, classes and libraries, and the development environment in the SPL community, where reuse is a core concept. Creating objects so that they are reusable in the future, and integrating the objects to build a new system, depends on the development environment used. What kind of development environment is used in the SPL community?

9.3 Results and Analysis

9.3.1 Section 1: General Questions

Of the survey respondents, 17% had less than 5 years of experience in SPL, 37% had 5-10 years of experience, 37% had 10-20 years and only 6% had more than 20 years. The years of experience of the majority of the respondents therefore range from 5 to 20 years. Figure 9.1 shows the role of the respondents in SPL development: 65% of the respondents were researchers and managers in SPL, 20% were product line architects, 6% were core asset architects and 9% were R&D project leaders. These numbers could mean that we have a mature population in terms of working experience in an SPL context. This could indicate that we have gathered a good sample population that could answer our reuse and reliability questions, thus reflecting the average software product line engineer and developer. In total there were 16 respondents: 9 from industry (domain) and 7 from research/academia.


Figure 9.1: Respondents’ Role in SPL Development (Researchers and Managers 65%, Product Line Architects 20%, Core Asset Architects 6%, R&D Project Leaders 9%)

9.3.2 Section 2: Software Reuse Measurement

The benefits of product line engineering have been extensively discussed in the literature and widely recognized in industry. This section of the survey captured answers about reuse measurement, its advantages and disadvantages, and the factors influencing reuse in the software product line community. The complexity of software reuse in SPL and the factors affecting it need to be evaluated. The key benefits of reuse in SPL have been widely accepted, and our survey results reflect this: 100% of the respondents believe that reuse in SPL will achieve increased quality, planned productivity, capture of domain knowledge and cost reduction, to name but a few. Our respondents believe that SPL necessitates effective strategic planning and product line road mapping; market analysis; a change from bespoke customer relationships and projects towards market, product and service orientation; and effective requirements, scope and release management of products and services. Moreover, the scope of reuse is broadened from software (code) reuse to the reuse of all domain artefacts.


Software product line engineers have many reusable assets available to them, but do they actually use them and find them valuable? Respondents believe that reuse has many key benefits, and the resulting advantage is that reuse is considered to be of strategic importance in organizations adopting SPL practices and tools. However, the disadvantages associated with reuse in SPL were maintenance cost and start-up cost. The major disadvantage respondents highlighted is complexity, the “gravity” of software engineering: reuse can add complexity by creating dependencies between previously autonomous organizational units. Some of the problems with these dependencies are:

• Web of dependencies: can lead to a “lockstep” evolution model in which everyone has to evolve synchronously;

• Coordination cost: dependencies require significant synchronization and alignment, diminishing the benefits of strategic reuse;

• Cost of offering integration: often the cost of offering integration is higher than expected due to the complexity of configuring and integrating the selected shared assets;

• Process and tool divergence: teams with diverging “external” interfaces (e.g. different release cycles and mechanisms, “creative” interface management, immature requirements management, or lacking quality management) cause significantly higher creation cost and jeopardize the product line effort.

Respondents believe that software reuse “is not sufficiently domain based”. Some of their responses include “Documentation may be missing, the trace between requirements and design artefacts may be missing, and the understanding of the interplay between different functions may be missing”, “There is no real way of translating practice into theory”, “Widespread reusable software with inappropriate design”, “People reuse software without solving architecture mismatches”, “There is too much reusable software that is large grained such as in SOA (Service Oriented Architecture). Loosely coupled software modules are hard to be developed due to the difficulty to define the right interfaces, since SOA usually erodes other quality attributes such as performance and scalability”, and “Software developers are interested in developing software for-reuse but not developing a product with reusable software components”. Reuse might also hinder new ideas and innovation, and the balance between innovation and reuse should be determined by the company’s strategy. The opinion that ‘copy and paste’ is THE reuse strategy should be abandoned in SPL. Respondents are of the opinion that, to maintain asset health, regular investments should be made to keep code healthy and the number of variations manageable. Participants were also asked whether they feel that current software product line engineering practice is influencing reuse. The responses to this question were negative, and respondents felt that software engineers should change the way they currently do reuse. 87% of the respondents feel that reuse education will definitely help them learn more about recent reuse technology and make reuse possible in their organization. Companies also need development policies that mandate reuse.
Companies that are serious about SPL must educate all levels (technical management, higher level management, and developers) about what SPL is, how the company plans to achieve it, and what their role will be in making the transition successful. More than half of the respondents (55%) feel that their work may increase when they reuse others’ code, 20% feel that reusing others’ code will not increase their work, and 26% believe that it will definitely increase their work. All respondents believe that there is increased recognition of reuse, but that reuse will not be achieved through education alone. The respondents listed the factors that facilitate reuse in SPL. They felt that SPL reuse has to be planned in advance. In their view, the central factor is to anticipate common and variable artefacts in the domain of the product line in order to provide useful reusable artefacts. The alignment of the four views of SPL (business, architecture, process and organization) is fundamental for achieving benefits from SPL-based product development. According to the respondents, reuse is part of SPL, and its facilitators are: high expertise in domain knowledge, technical skills in SPL techniques, tool support, the number of software units produced in an SPL, motivated personnel, and high management commitment and support for integrating domain and technical expertise. Long-term strategies and commitments are required. Standards should be established on how to communicate all properties of reusable components. The reuse policy should be tracked and managed by engineering managers, quality managers and/or configuration managers, and it should include planned variation and variation mechanisms, mechanisms for dealing with change control conflicts on assets, and rotation of product team members among different products and/or core asset teams. Respondents agreed that SPL is very challenging to institutionalize as long as the costs of implementing an organization-wide SPL program are perceived to outweigh the benefits. The most important factor is then to reach the break-even point between costs and benefits as quickly as possible. 85% of the respondents felt that software engineering practice influences reuse: the use of core assets, configuration management philosophies, the amount and type of documentation, requirements elicitation, and many other software engineering practices all influence reuse. 90% of the respondents felt that the increased recognition of reuse brought about by software product line success stories has an impact on companies. It was also felt that a common software process would promote reuse. There is some benefit in having a common process across products within an SPL, and a common process will help if there is an equal emphasis on the technology and the product. Responses on whether a reuse repository improves code reuse were very positive, although a clear picture of the reuse repository in SPL could not be gathered: no dedicated “reuse repository” is used in SPL. Nevertheless, all respondents believed that the use of a repository with the proper methods will help and enhance software reuse. The fundamental premise of software product lines is that code repositories without the context provided by a software product line do not work. All respondents believed that a software reuse repository will improve software reuse.
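Respondents named reaching the break-even point between costs and benefits quickly as the most important factor. The survey did not define a cost model, so the following is only a minimal sketch under an assumed linear model. The model has three costs: a one-off core asset investment, a fixed cost for each product built from the core assets, and a fixed cost for each product built stand-alone. The function name and all cost figures are hypothetical illustrations, not survey data.

```python
import math
from typing import Optional


def break_even_products(core_asset_cost: float,
                        cost_per_product_in_line: float,
                        cost_per_standalone_product: float) -> Optional[int]:
    """Return the smallest number of products n for which
    core_asset_cost + n * cost_per_product_in_line <= n * cost_per_standalone_product,
    under an assumed (hypothetical) linear cost model.

    Returns None if a product built from the core assets is not cheaper than a
    stand-alone product, because the upfront investment is then never recovered."""
    saving_per_product = cost_per_standalone_product - cost_per_product_in_line
    if saving_per_product <= 0:
        return None
    # Each product built from the core assets recovers `saving_per_product`
    # of the upfront investment; round up to a whole number of products.
    return math.ceil(core_asset_cost / saving_per_product)


if __name__ == "__main__":
    # Hypothetical figures in arbitrary cost units, not taken from the survey.
    print(break_even_products(300, 40, 100))
```

Under this assumed model, there are two ways to move the break-even point earlier: lower the upfront core asset investment, or increase the saving per product. This is one way to read the respondents' advice to reach break-even as quickly as possible.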
Reuse is more common in hardware-related industries, and there is increasing interest in software reuse and SPL in software-based industries such as Enterprise Resource Planning and Customer Relationship Management. Software reuse is also common in embedded systems for cars or electronic devices (e.g. mobile phones), air traffic control systems, aerospace and aeronautics. However, the telecom industry seems to be the leader, especially companies such as Samsung. Reuse is more common in industries where the domain is mature and regulatory requirements are imposed on certain aspects (e.g., function blocks in control systems, algorithms in medical systems). It is also more common in industries that have to produce variety-intensive systems (i.e., “new” systems in a short time frame), e.g., mobile phones, consumer electronics and automotive electronics. The majority of the respondents believe that the size of a company’s divisions or projects is not predictive of organizational reuse; in their opinion, size has nothing to do with reuse. This contradicts the established view that the complexity of reuse increases with the size of the project [231]. Respondents are also of the opinion that product line approaches must be tailored to organizations. Size is one of the customization factors, but far from the only one, and the size of an organization affects how reuse is established and made to work. A small company developing sophisticated and customized products does not benefit from reuse. Reuse and SPL are beneficial, however, for a small company developing products that share 50%-80% of their components, with 20%-50% customized software. Products targeted at mass markets have high potential for reusable software components. Business and target markets are therefore more important than the size of the organization.
Instead, organizational structures (e.g., a division decomposed into a platform team and several product teams) are predictive of organizational reuse. A smaller company may tend to reuse code more effectively because communication is easier. Reuse is much more a function of commonality and of the opportunity for an improved return on investment (ROI). Reuse across projects does not depend on size, so it can be concluded that size is not a good predictor of software reuse in the SPL community. The majority of the respondents felt that quality does not inhibit reuse. Reuse helps to maintain the quality of software if all of the company’s good software practices, such as documenting the software and updating it at regular intervals, are followed. In organizations that have difficulty managing their quality, there is resistance to trusting shared assets developed by other teams, as this is considered risky. Reuse means that most code in each product is already of high quality, so quality concerns should actually support SPLs. The quality drivers of a system or product line of systems are key to business success, and reuse is ‘just’ a technique that may help to achieve (some of) the required quality attributes.

But several of the respondents felt that conflicts on core assets (usually arising from the various stakeholders’ different quality concerns) can limit the modification and/or use of core assets within products. Reuse, and even modular software design, is inhibited especially in mass-marketed, deeply embedded systems, where memory and CPU power are very limited due to cost constraints. Reuse might also be inhibited in some real-time systems with specific performance requirements. Performance, in terms of efficiency and response time, is therefore the quality attribute that most inhibits reuse. Of course, all domain artefacts that are explicitly designed to be reused (typically in an SPL context) must be tested extensively to uncover both functional and non-functional flaws. In an environment where reuse is not explicitly and strategically planned, quality problems may seriously hamper reuse. It has been shown that domain knowledge is still the key to reuse of software [231], and all of our respondents agreed on this point. Without domain knowledge, common and variable artefacts cannot be anticipated, and respondents believed that without domain knowledge there is no hope of reuse. In fact, it is the domain knowledge, not the code, that is reused. However, not every application engineer needs that knowledge if the reuse infrastructure supports the production process adequately. Several of the respondents believe that domain knowledge is necessary but not sufficient. Organizations that have successfully adopted an SPL approach tend to report the solutions to their problems and underplay what they were already doing well. Experience shows that SPL efforts initiated by management, architects, and process engineers are ultimately necessary. Domain knowledge is thus a significant factor, but not the key one. Management commitment, and the process discipline to follow through on a reuse agenda driven by business goals, are more significant.

9.3.3 Section 3: Software Reuse Technical Aspects in SPL

The large volume of literature on reuse CASE tools and their growing market show that many organizations regard CASE tools as a way to improve reuse. To study this question, participants were asked whether they agreed with the statement “CASE tools have promoted reuse across projects in their organization”. The survey results show that 50% of the respondents feel that CASE tools have not promoted reuse across projects in their organization, while the other 50% agree that they have. Model-driven tools and environments, rather than traditional CASE tools, are seen as the way to go. In Model Driven Development (MDD) and Model Driven Architecture (MDA), reuse happens at the model level. CASE tools can help reuse design artefacts, and tools with round-trip engineering can help reverse engineer code to better understand the properties of software to enable reuse. We conclude that CASE tools are not currently very effective in promoting reuse. 50% of the respondents are using CASE tools and the other 50% said they are not using them. However, approaches such as SOA and some product line tools promote reuse [231]. When asked whether, given the opportunity, they would build from scratch or reuse, respondents said that in a product line organization reuse is built in and is clearly not optional. Arbitrarily reusing assets that have not been prepared and proactively planned for reuse in a given task is of course problematic, as we know from ad hoc reuse experiences. Respondents are of the opinion that if they find reusable software that fits the architecture, they would prefer to reuse it rather than build from scratch. Software architecture shows the interrelationships and dependencies between the software components, and it can also suggest which software components to restructure. Restructuring the software components can eliminate dead components and identify which components can be reused. Architecture is therefore the key issue that facilitates reuse, although it may be necessary to refactor the code first in order to reuse it successfully. If well-designed reusable software such as OSS (open source software) is available, the respondents prefer to reuse it. One very interesting answer was received: “Reuse can be done effectively if we know (1) which domain artefacts are available and (2) we can trust the domain artefacts to do the things we want them to do, why not reuse them? The issue then is do we know and how can we know these issues.” This is a very challenging question posed to the SPL community.
Domain engineering is the activity of collecting, organizing, and storing past experiences in building systems or parts of systems in a particular domain in the form of reusable assets, as well as providing an adequate means for reusing these assets when building new systems. All the participants are of the opinion that domain knowledge is the key to reuse of software. One cannot decide what software to build as reusable software without understanding the domain and identifying common functionality/features that can be developed as reusable software. According to our respondents the main advantages of the SPL approach are:

• The product line’s wide engineering vision can be shared among the projects easily;

• Development knowledge and corporate expertise can be utilized efficiently across projects; and

• Assets can be managed systematically.

However, the SPL approach often requires a large upfront capital investment to create an organizational structure dedicated to implementing a reuse program, and it takes time to see a return on investment [232].

9.3.4 Section 4: Testing and Reliability of Reused Software

We considered several questions in our survey to investigate this issue. We were interested in whether or not software engineers trust components built elsewhere. Participants were asked to rate their agreement with the statement “Software developed elsewhere is reliable” and were also asked “Do you test a component in any way before you reuse it?” All respondents felt that the core assets they reuse are already tested, and hence they do not feel the need to test them for reliability. All respondents also felt that people usually spend more time testing product-specific components. They also felt that the problems discovered are usually related to reusable components that were not properly adapted for a specific product. This was not an issue in this survey. When asked whether they “understand system and component reliability, their interactions and the process to identify critical components”, the respondents said that the software architecture of the system helps in understanding system behavior. Several pieces of research document some of the reusable analysis techniques typically used (and demanded by customers) in the automotive area. Examples include FMEA (Failure Mode and Effects Analysis) and FTA (Fault Tree Analysis). In some cases, scenario-based approaches such as ATAM (Architecture Tradeoff Analysis Method) [238] have been used to identify the critical components of a system. In most SPL teams, the identification of critical components is based on experience. The understanding of critical components is mainly derived from source code and documentation, or from actual testing of the components.

There is no systematic or formal process to understand system and component reliability and the components’ interactions. System reliability is determined by (field or internal) defect reports, and component reliability is determined by how many of those defect reports are caused by each component. Informal architectural reviews are used to analyze systems and component interactions, and 80% of the survey respondents stated that an architecture review is the answer to finding reliable components. When asked whether they “test a single component, class or core component in any way”, the respondents said that single classes are typically not tested. Most of the respondents said they do not test single components by themselves. Sometimes single components are tested, but most of the time some integration with other components has already been done and the integrated components are tested. Sometimes unit testing is performed on an assembly of components as the simplest “unit” of test. Unit testing is not widespread: one respondent estimated that perhaps 10% of the components have some sort of unit test. JUnit, WindowTester, httpUnit, and Rational TestManager are the main test frameworks used in SPL. Software is tested using a traditional structured approach involving unit and integration testing. Unit testing focuses on the internals of a component, while integration testing is carried out on the combined usage of components. The same testing is done at the model and code levels. Some respondents also said that they do not perform component testing because they trust the Eclipse process. Some of the respondents are of the view that, because a component is always developed from the requirements of a product, it must be tested to determine whether it fulfills those requirements. Reusing software components may also change the time allocated to the software development phases: if a component is already available for reuse, the time spent in the analysis and design phases can be minimized.
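Respondents said that, in the absence of a formal process, component reliability is judged by how many of the (field or internal) defect reports are caused by each component. The following is a minimal sketch of that bookkeeping. It assumes each defect report has already been attributed to a single component. The (report_id, component) pairs, the component names and the function name are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def defects_per_component(
        reports: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, float]]:
    """Rank components by the number of defect reports attributed to them.

    Each report is an assumed (report_id, component) pair. The result lists
    (component, defect_count, share_of_all_defects), most defects first, so the
    components near the top are, by this measure, the least reliable."""
    counts = Counter(component for _report_id, component in reports)
    total = sum(counts.values())
    return [(component, n, n / total) for component, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical defect reports, not survey data.
    reports = [("D1", "billing"), ("D2", "ui"), ("D3", "billing"),
               ("D4", "billing"), ("D5", "ui"), ("D6", "export")]
    for component, n, share in defects_per_component(reports):
        print(f"{component}: {n} defects ({share:.0%})")
```

A raw count like this ignores component size and usage, which is one reason respondents still relied on architecture reviews to judge which components are reliable.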
The results we collected show that 5%-20% of the time is spent in the analysis phase, 5%-60% in the design phase, 0%-50% in the implementation phase and 10%-30% in the testing phase. Some of the respondents said they spend 0% of their time in implementation and 50% in testing. Figure 9.2 shows the possible range of the analysis, design, implementation, and V&V/testing phases reported by our respondents. When asked which phase they find most troublesome (analysis, design, implementation, or V&V/testing), most of the respondents said that analysis and testing are the most difficult. We conclude that, with proper reuse, the time spent in analysis and testing can be reduced significantly. Test cases can be reused if they are documented properly and kept in sync with the current version of the core asset.

Figure 9.2: Possible Range of Different Phases of SPL (minimum and maximum range of time, on a 0%-70% scale, reported for the analysis, design, implementation, and V&V/testing phases)

9.3.5 Section 5: Development Environment for Reuse

Reuse will not be carried out if it is not economically feasible. Survey participants were asked what sort of development environment they usually use. The majority of the respondents preferred Java Eclipse, NetBeans or a simple programming IDE, e.g. JCreator. Some of the respondents use a model-based approach, with UML2 as a standard modelling language extended by ontology-oriented design models and tools. Nowadays, Eclipse is used as a tooling platform. Specific environments are used for embedded systems and microcontrollers, Eclipse for Java projects, and typically Visual Studio for PC-based applications. One response from a product designer was that the environment or programming language depends on the Programmable Logic Controller (PLC) used, as every PLC brand has its own environment for the software created for that PLC. Respondents were asked whether the choice of framework (e.g., .NET, EJB, CORBA, etc.) affects the possibility for the software to be easily upgradeable in the long term. The responses were mixed: one respondent was not familiar with these frameworks, some did not agree, and the rest believed that mainstream frameworks would probably be more upgradeable than others. Most of our respondents also agreed that the complexity of a component does not affect the decision to develop or reuse it: if a component has been previously tested, it is used regardless of its complexity. However, one respondent believed that complexity and domain knowledge could affect such decisions, although this really depends on the business strategy. Respondents were also asked whether open source or commercial projects usually enforce a certain technology. 37% think that such projects usually enforce a technology such as .NET, J2EE, or CORBA, 27% think that they usually do not, and 36% provided no answer to this question. Our respondents are of the view that code is reusable and reliable only if it has been reused several times. Components are designed for the purpose of reuse, and detailed documentation of a component to be reused should be produced. A component cannot be reused if it is not developed for the purpose of reuse. However, the respondents are also of the view that most of the time developers do not care whether or not they feel confident reusing a component. According to our respondents, proper requirements engineering is vital for the success of an SPL. When requirements elicitation does not work, there is no basis for building an SPL. Also, having requirements gathered only in documents is not sufficient for building an SPL.
It is very important to have a more formal requirements database from which requirements can be systematically grouped into a customized requirements document for each product. These requirements documents can be reused for application development. The challenge, however, is to analyze the existing documents and to detect the features and the variability, especially when masses of information are distributed among different types of documents. Therefore, it is necessary to have the right requirements engineering tools to support this type of requirements management and manipulation. We received a very positive response on the need for a well-documented software architecture of the system in order to reuse the components. Well-documented software/system architectures are very important to support decision making about reuse and to correctly integrate the different components, reducing the testing time. However, a few of the respondents believe that as long as domain expertise remains adequate it is not essential to have architectural details, since experts are good at storing domain knowledge. In their view, architecture helps a lot in reuse, but not in every case. This raises the question of what happens if the experts leave an organization. This is an important unsolved issue with our legacy systems, where experts have left an organization and the documentation is out of sync [241]. The quality attributes can be determined by the architecture, since many critical design decisions about software are made long before it is implemented. It is widely recognized that the linchpin of the software development process is software architecture [182]. The reliability attribute is aimed at reducing or eliminating failures in software systems. Our respondents agree that there is a strong correlation between software architecture and reliability. There has been some recent research in this area: Tekinerdogan et al. have proposed the Software Architecture Reliability Analysis (SARA) approach, which provides an early software reliability analysis at the architecture design level [242]. Software product line engineering clearly differentiates the activities of domain engineering and application engineering. The role of domain engineers is to create core software assets “for reuse”, and the role of application engineers is to use the core assets to create products “with reuse”.
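Respondents called for a formal requirements database from which a customized requirements document can be grouped for each product. The text also separates domain engineering (creating assets “for reuse”) from application engineering (creating products “with reuse”). The following is a minimal sketch of how the two fit together. It assumes each requirement is tagged either as common or with a single feature it belongs to. The `Requirement` record, the feature names and the requirement texts are hypothetical. Real SPL requirements tools model variability in much richer ways.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Requirement:
    req_id: str
    text: str
    feature: Optional[str] = None  # None marks a common requirement shared by all products


# Domain engineering ("for reuse"): one hypothetical requirements database for the whole line.
REQUIREMENTS_DB: List[Requirement] = [
    Requirement("R1", "The user shall log in with a password."),
    Requirement("R2", "The system shall send SMS alerts.", feature="sms_alerts"),
    Requirement("R3", "The system shall work without a network connection.", feature="offline_mode"),
    Requirement("R4", "The system shall record an audit log."),
]


def product_requirements(database: List[Requirement],
                         selected_features: Set[str]) -> List[Requirement]:
    """Application engineering ("with reuse"): group the requirements for one product,
    i.e. every common requirement plus the variable ones whose feature is selected."""
    known_features = {r.feature for r in database if r.feature is not None}
    unknown = selected_features - known_features
    if unknown:
        # A selected feature that no requirement covers signals a gap in the database.
        raise ValueError(f"features not in the requirements database: {sorted(unknown)}")
    return [r for r in database if r.feature is None or r.feature in selected_features]


if __name__ == "__main__":
    for req in product_requirements(REQUIREMENTS_DB, {"sms_alerts"}):
        print(req.req_id, req.text)
```

Keeping common and variable requirements in one database, rather than in separate product documents, is what lets the same requirement text be reused across products. Respondents identified a further challenge, detecting features in existing documents, and this sketch does not address it.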
All of our respondents believe that most software product lines must be maintained and must evolve over time. Core asset architect respondents believe that maintaining and evolving core assets, and integrating core assets in application engineering, are the biggest problems for SPL. The core asset architects also said that the product-specific software in each product is sometimes just for that individual product and cannot be reused in any other product. This requires conventional software engineering approaches such as clone-and-own, which gives rise to another problem: the lack of reuse across products. This situation also requires a team of engineers dedicated to the product to create it, understand it, maintain it, and evolve it, rather than reusing knowledge across multiple products. The benefits and advantages of SPL cannot be reaped in this case.

9.4 Chapter Summary

This chapter has summarized the issues and concerns of software reuse in the Software Product Line community. The result is a comprehensive analysis of software reuse in the SPL community. Many interesting issues and concerns have been identified. If solved, they can help in building a systematic software reuse process that can be used in the modernization of legacy systems. The objective of software reuse is to improve the software structure for maintenance and to reduce the development cost of the software, so the cost of future maintenance of a legacy system should be reduced if software reuse takes place. The focus of our modernization approach is that a legacy system can be made maintainable without any changes to its functionality. Software reuse is the core concept of our modernization approach, as it is for the SPL community. Identifying the issues and concerns in software reuse and addressing them will enable us to make software reuse an integral part of software modernization. One key finding is that domain knowledge is not the key to reuse in SPL. Another is that potentially no time may actually be spent in implementation, which means that either there is 100% reuse or products are generated automatically from the core assets. Organizations should keep component code, architecture, and the documentation of both in sync to enable systematic reuse in SPL. Software architecture can be used in a product line setting to support better identification, reuse, and integration of reliable components, although documenting software architecture requires a substantial amount of effort. Software architecture is therefore considered a key issue in software reuse and hence in our modernization approach. The identified issues and concerns in software reuse in the SPL community have raised many unanswered questions about software reuse.
As software reuse is a key concept in our modernization approach, the results from this survey will allow us to address issues related to software reuse in our modernization approach as outlined in Chapter 4. The issues discussed in this Chapter are addressed in Chapter 11 to enhance our modernization approach.


CHAPTER 10: A COMPARISON OF SOFTWARE REUSE IN BOTH SOFTWARE DEVELOPMENT COMMUNITIES

10.1 Introduction

The objective of this chapter is to compare the issues and concerns in software reuse in the Conventional Software Engineering community and the Software Product Line community, and to describe the differences in software reuse approaches between the two communities. Software reuse is the prime focus of our modernization approach, as discussed in Chapter 6. Over the past few years there has been a great deal of interest in software reuse in the Conventional Software Engineering and Software Product Line communities. Most of the literature on software reuse has focused on different types of software reuse to improve software product quality and, in particular, to reduce the number of defects in delivered software. As we are trying to make software reuse an integral part of our modernization approach, we need to examine the issues and concerns around software reuse and reuse processes in both communities. A software reuse process is inherently complex and involves a very large number of activities. It does not simply mean adopting particular methods or tools, or using some model of a software reuse process that has been used somewhere else, and a simple software reuse process is unlikely to be successful on a large scale [243]. Different organizations have adopted different types of reuse, and software development communities such as Conventional Software Engineering and Software Product Line have their own issues and concerns about software reuse. We have already identified these issues and concerns in Chapters 8 and 9. In this chapter we compare them in order to find answers, so that a systematic software reuse process can be developed. This process can then be used both in software development and in our approach to modernizing any legacy system. Once a comparison has been done, the lessons from one community that could benefit software reuse in the other community can be identified.

10.2 Comparison of Software Reuse in CSE and SPL Communities

Software reuse can be language specific, design specific, domain specific, product specific, architecture specific and organizational and managerial specific [244]. Based on a survey of the literature, it is evident that there is no standard definition of software reuse. In our study we adopt a scope of software reuse that encompasses concepts, tools, people, analysis, design, specifications, documentation, and domain knowledge in addition to component reuse [228], [10] and [245]. The goal of our comparison is to find important commonalities and differences between the issues and concerns of software reuse in the CSE and SPL communities. The collected data sets have been analyzed on several important issues, which are described in the following sections.

10.2.1 Software Reuse Management and Measurement

Before an organization fully commits itself to development for or with software reuse, the advantages, disadvantages and factors influencing reuse should be addressed. The key benefits of software reuse (reduced time to market, reduced development cost, and increased software quality) have been widely accepted within both communities, and our survey results reflect the same. The SPL community respondents also made positive statements about planned productivity, capture of domain knowledge, and cost reduction. The SPL community respondents believe that SPLs necessitate effective strategic planning and product line road mapping; market analysis; change from bespoke customer relationships and projects toward market, product and service orientation; and effective requirements, scope, and release management of products and services. In the CSE community, on the other hand, software engineers do not feel comfortable reusing code. Even today, the CSE community works with the traditional phases of the software development life cycle (SDLC) described in the waterfall, evolutionary or prototyping approaches. There is no planned software reuse phase in the traditional software development life cycle. The software development life cycle used by the CSE community has evolved over a number of years and is mature enough to help developers understand the requirements of the software system. The only phase missing from the SDLC to meet today’s requirements is a software reuse phase. Learning from one community to the other: The CSE community could learn from the SPL community by integrating a software reuse process into the mature SDLCs used in their community. However, the SPL community does not have a mature process like the SDLC and hence can learn from the CSE community how to make SPL development processes more mature.

10.2.2 Disadvantages of Software Reuse in the SPL and CSE Communities

The disadvantages associated with software reuse in the SPL community were start-up and maintenance costs. The major disadvantage respondents highlighted is complexity, the “gravity” of software engineering: reuse can add complexity by creating dependencies between previously autonomous organizational units. The problems with these dependencies identified by one of the respondents are a web of dependencies, coordination cost, the cost of offering integration, and process and tool divergence. In contrast, the disadvantages associated with software reuse in the CSE community are the lack of tool support, increased maintenance cost, the Not-Invented-Here (NIH) syndrome (unwillingness to reuse code built by another), maintaining a component library and finding reusable components. The CSE community lacks a systematic software reuse process. This is not to say that the SPL community has a systematic software reuse process; rather, the SPL community uses some kind of software reuse process, but these processes are non-standard across the community. Our findings are also supported by a study done at NASA, where ad-hoc reuse practices were seen as valid by 14 out of 15 respondents, compared to systematic reuse practices (1 out of 15 respondents). Ad-hoc reuse is 90% higher than systematic reuse [246]. Learning from one community to the other: The NIH syndrome of the CSE community can be overcome by adopting the capture of domain knowledge as is done in the SPL community. Both the SPL and CSE communities need to avoid ad-hoc software reuse, that is, reusing a component without a strategy.

10.2.3 Is Reuse Affected by the Not-Invented-Here Syndrome?

CSE community respondents have concerns about the Not-Invented-Here (NIH) syndrome. On a typical consulting project the code is usually owned by the customer rather than by the consultants. The code is not open source and becomes proprietary, which naturally makes it harder to reuse in other projects and consulting engagements. The SPL community, by contrast, can do software reuse in the absence of certain documentation, with inappropriate design, with architecture mismatches, and without a proper understanding of domain knowledge. In the conventional software engineering community, the NIH syndrome can be addressed by certifying reusable components. The SPL community, however, suggests that software reuse can be done even in the presence of the NIH syndrome. A component could be domain-based, but NIH may still get in the way if it is developed by another group. In the SPL community, reuse of the ideas and knowledge embedded in a common component is prevalent. Reuse practices that address the NIH syndrome need to be put more commonly into practice if the NIH syndrome identified by the CSE community respondents is to be avoided, since it is currently inhibiting common reuse practices in the CSE community. Learning from one community to the other: Certification of components is required in the CSE community. Certification is not normally done in the SPL community, although in some cases it is. However, because software components are used across multiple products, the NIH syndrome is not an issue in the SPL community. The CSE community can learn from the SPL community that the NIH syndrome need not inhibit software reuse.


10.2.4 Reuse Planning

The SPL community respondents believe that software reuse has to be planned in advance. The CSE community also requires planning for software reuse, because reuse there is currently done on an ad-hoc basis and should be more systematic. In addition, the CSE community requires an understanding of development for and with software reuse. This suggests that there is a need for a software reuse process. Planning for software reuse should target the concerns that have been identified as sources of resistance to software reuse, such as the NIH syndrome, the inability to find reusable components, and the lack of a reuse repository. Respondents in the CSE community found that reusable components performing complex tasks are hard to find. According to respondents from the SPL community, software reuse is a part of SPL, meaning that SPL is already concerned with software development for and with reuse. Learning from one community to the other: Software reuse has to be planned in advance in both communities. The CSE community needs to adopt the idea from the SPL community that reuse should be an integral part of conventional software engineering. If software engineers think in this way, reuse could become far more widespread and a standard part of engineering new software systems and modernizing existing ones.

10.2.5 Reuse and Software Quality

Our respondents from the SPL community feel that the quality of software components reused in an SPL is already very high, as components are normally reused across multiple products. In the CSE community, software reuse is not a core concept, and reusing a component requires test cases to assure the quality of the reused software. If the software architecture is documented in conventional software engineering projects, software engineers could make software reuse common practice. The CSE community respondents also felt that the software architecture may be used to measure quality attributes, such as reliability, of the component to be reused.


Learning from one community to the other: Within the CSE community if software reuse is adopted to a sufficient level and the quality of reusable components can be guaranteed to a certain level, then there will be less of a need to test components which are reused. If the hurdle of having to test components can be overcome it would make it easier for software reuse to become a core part of software development.

10.2.6 Software Reuse: Is Software Reuse Domain-Based?

The SPL community benefits from the domain-oriented reusability concept, in which domain analysis and modeling deal with identifying reusable abstractions and architectures for the development of a family of software systems, whereas the CSE community believes more in language-specific reuse. However, the concerns identified in the SPL community are “how to identify frequently reusable components”, “how assets should be managed systematically”, “what are the domain roles”, and how “to know which domain artefacts are available for reuse”. The respondents from the CSE community believe that language is of most importance for reuse, and advocate that features of languages such as Ada, C++, Smalltalk, and Java provide better reuse support. Most existing programming languages, including object-oriented languages, provide features that support reuse. However, simply writing code in those languages does not put the software reuse philosophy into practice; components must be designed for reusability using those features. Hence we can conclude that language is not an issue in the SPL community but domain is, whereas in the CSE community the domain in which reuse takes place is of little consequence and language is the focus of software reuse. Learning from one community to the other: Software reuse in the SPL community is domain specific, and hence components in one domain may not be reusable across products in other domains. Software reuse in the CSE community, however, focuses more on the programming language, so the CSE community can reuse software across different products that share the same language base. The SPL community could learn from this idea to move away from domain-specific reuse.


10.2.7 Software Reuse and Technical Aspects

The large volume of literature on reuse CASE tools and their growing market show that many organizations regard CASE tools as a way to improve reuse. The SPL community uses different types of CASE tools, although model-driven tools and environments, rather than traditional CASE tools, seem to be the way to go. In Model-Driven Development (MDD) and Model-Driven Architecture (MDA), reuse happens at the model level. By contrast, in the CSE community the data from our respondents shows that CASE tools have not promoted reuse across projects in organizations. The CSE community needs a reusable platform like that of the SPL community. The domain engineering process is responsible for establishing that reusable platform [239]. Learning from one community to the other: Tools and environments adopted in the SPL community could be adopted by the CSE community to better support reuse. Techniques such as MDD and MDA could significantly improve how software reuse is carried out in the CSE community.

10.3 Possible Reuse Issues and Concerns

Software product line engineering consists of domain engineering and application engineering. Domain assets and application assets should be loosely coupled to enhance reuse within the product line. There are a number of tools and methods to facilitate domain engineering. If the outcome of the domain engineering process can be saved in a repository of common and variable domain artefacts, it can save much of the development effort and complexity associated with product development. Table 10.1 shows the possible reuse issues and concerns that we explored in the SPL and CSE communities. There seems to be a conflict regarding reuse in the SPL community: some of our respondents say that a well-documented architecture is required, while others say it is not a must in order to have reliable software components. Entries 15 and 17 in Table 10.1 reflect this. 70% of the respondents use core assets and see them as reliable, and 30% of the respondents do not care about the reliability issue. Reliability, as a quality attribute of a software system, must be built into the architectural design [247]. We have therefore identified a major contradiction: whether or not a well-documented architecture is needed to enable reuse, so that reliable components can be identified as reusable artefacts. Our survey has identified that because SPL is, in principle, based on a reuse process, reuse of artefacts can happen even though components may not be reliable. A product line architecture has common components shared by all products of the same family. In addition, the architecture provides details of variant components exploited by a subset of the products. These common components are essential for reuse, because together they cover a large set of situations in which they can be reused.
Thus, we can conclude that an important benefit of a well-documented architecture is that it identifies reusable components which can be reused among the products of the same family and even between different product lines. Hence an SPL process should enforce a well-documented architecture. Reuse is one of the key attributes of SPL, and reuse of architecture documents, requirements, test cases and other artefacts should potentially be standard practice to save time and cost. The issue with reuse is identifying which assets can and cannot be reused and managing reusable assets properly. In SPL, core assets are reused. Variability and core assets are central concepts of product lines and need to be explicitly defined and managed. The difficulty is that variability should be described in the domain requirements artefacts, and adequate traceability of the artefacts within and between the interacting domain and application requirements engineering life cycles needs to be maintained. The complexity of variability grows with the complexity of the product line, which calls for a well-developed requirements management tool in SPL [248]. There are many case studies of successful SPLs, but no firm rules can be derived from the experiences of experts in this area that guarantee the success of an SPL. It is important for the SPL community that success stories are written up and best practice is codified and made available for others to use. As seen from our surveys, domain knowledge is required but is not the key to reuse in SPL. Domain knowledge is not an issue in the CSE community. Our respondents believe that successful SPL efforts result from good management, architects, and process engineering practices.
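The traceability concern above can be made concrete with a small data model. The following Python sketch is a hypothetical illustration, not a tool used in this thesis: domain requirements declare variation points and their allowed variants, application requirements trace to a domain requirement and bind variants, and a check reports broken traces or undeclared variants. All class names, function names and requirement identifiers are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DomainRequirement:
    """A product line requirement with optional variation points (hypothetical model)."""
    req_id: str
    text: str
    # Variation point name -> set of variants the product line allows.
    variation_points: Dict[str, Set[str]] = field(default_factory=dict)


@dataclass
class ApplicationRequirement:
    """A product-specific requirement that must trace back to a domain requirement."""
    req_id: str
    traces_to: str
    # Variation point name -> variant chosen for this product.
    bindings: Dict[str, str] = field(default_factory=dict)


def check_traceability(domain: List[DomainRequirement],
                       application: List[ApplicationRequirement]) -> List[str]:
    """Report application requirements whose trace or variant bindings are invalid."""
    by_id = {d.req_id: d for d in domain}
    problems = []
    for app in application:
        target = by_id.get(app.traces_to)
        if target is None:
            problems.append(f"{app.req_id}: traces to unknown {app.traces_to}")
            continue
        for point, variant in app.bindings.items():
            allowed = target.variation_points.get(point)
            if allowed is None:
                problems.append(f"{app.req_id}: unknown variation point {point}")
            elif variant not in allowed:
                problems.append(f"{app.req_id}: variant {variant} not allowed for {point}")
    return problems


# Example: one domain requirement with a "payment" variation point.
domain = [DomainRequirement("DR-1", "Accept payments", {"payment": {"card", "invoice"}})]
apps = [
    ApplicationRequirement("AR-1", "DR-1", {"payment": "card"}),   # valid binding
    ApplicationRequirement("AR-2", "DR-1", {"payment": "cash"}),   # undeclared variant
    ApplicationRequirement("AR-3", "DR-9"),                        # broken trace
]
for problem in check_traceability(domain, apps):
    print(problem)
```

In practice such checks would sit inside a requirements management tool; the sketch only shows why variability and traceability need to be recorded explicitly before they can be checked.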


| No. | Reuse Issues and Concerns Explored | Respondent Information from SPL | Respondent Information from CSE |
|---|---|---|---|
| 1 | Reuse education helps in reuse practice | 87% Yes; 13% Maybe | 90% Yes; 10% Maybe |
| 2 | There is an increase in work when reusing others' code | 56% Yes; 20% No; 24% Maybe | 92% Yes; 8% Maybe |
| 3 | A common software reuse process would promote reuse | 39% Yes; 34% No; 27% Maybe | 59% Yes; 41% Maybe |
| 4 | Would a reuse repository help in reuse | 48% Yes; 13% No; 39% Maybe | 59% Yes; 41% Maybe |
| 5 | Domain knowledge is the key to reuse | 31% Yes, the essential key; 69% Yes, but not an essential key | 78% Yes; 10% No; 12% Maybe |
| 6 | Does programming language affect reuse | 58% Yes; 42% No | 65% Yes; 45% No |
| 7 | The importance of CASE tools in reuse | 50% Yes; 50% No | 65% No; 45% Maybe |
| 8 | Moving deadlines in projects | 56% Often; 33% Rarely; 11% Sometimes | 45% Often; 30% Rarely; 25% Never |
| 9 | Frequency of reusing a piece of code/class/core asset | 43% Often; 35% Not often; 22% Sometimes | 100% believed that unless provided, they do not reuse it |
| 10 | Effect of component size on the ability to reuse it | 45% Does not affect; 40% Does affect | 68% Does not affect; 15% Maybe |
| 11 | Most troublesome phase of software development | 27% Analysis; 20% Design; 13% Implementation; 40% Testing | 30% Analysis; 20% Design; 20% Implementation; 30% Testing |
| 12 | Frequency of testing reused software components | Core asset components: seldom; application-specific components: often | They do not do it |
| 13 | Use of any specific framework for testing | 65% Use no framework; 35% Use a framework | Only manual test cases |
| 14 | Testing the component before reuse | 40% Test the component; 60% No testing done | Not done |
| 15 | Reliability of the reused component | 70% agreed that core components are reliable; 30% do not care about reliability as these components are already tested | 78% show concern about the reliability of the component to be reused; 22% were not sure |
| 16 | Integrated Development Environments (IDEs) used | 100% of SPL practitioners use IDEs, e.g. Java Beans, NetBeans, Eclipse, etc., and specific environments | 100% use IDEs, e.g. Java, ASP.NET, PHP, NetBeans, JCreator, Visual Studio |
| 17 | Need a well-documented architecture for reuse | 80% Yes; 20% Not a must | 100% not sure how architecture can be used for reuse |
| 18 | Identification and understanding of critical components for reuse | 45% Use architecture; 45% No answer; 10% No identification done. Efforts are focused on system use. Reliability evaluation at component level is one focus area of research. Reliability can be determined by software architecture, Eclipse Feature Analysis and ATAM. | 100% do not look for critical components for reuse |

Table 10.1: Possible Reuse Issues Explored in SPL and CSE

10.4 Chapter Summary

This chapter compares the issues and concerns around software reuse in the CSE and SPL communities. Our survey has identified that there is no systematic reuse process in the CSE community. The SPL community has non-standard reuse processes, which may vary from organisation to organisation, but these are not systematic and consistent. In the SPL community, reusable components are tested during domain engineering, but there is normally no certification of components. Product line engineers have a high level of trust that the components they reuse are reliable. Certification of code does not normally exist in the CSE community, and because of this, software engineers in the CSE community do not feel comfortable reusing components. Software engineers need a base of properly catalogued and documented reusable components, which suggests the need for a software reuse repository. With such a repository, all reliable and well-tested components can be stored in a knowledge based software reuse repository for reuse.

Reuse education has been identified as a definite influence on the way software reuse is performed in both the CSE and SPL communities. There are a few courses on software reuse, but they are all context specific, especially around software product lines [249]. Software engineers working on the modernization of existing systems and the development of new systems need a full understanding of what software reuse is, how it can be done, and how it can support modernization.


The NIH syndrome does not exist in the SPL community, even where there are cases of missing documentation, missing architecture and no traceability between requirements and design artefacts. On the other hand, the CSE community respondents have shown more concern about the NIH syndrome. If this is to be overcome, there is a need to raise the expectation in the CSE community that software built for other projects should be reused and that, when new components are needed, existing components should be investigated to see whether they can be reused to satisfy that need rather than building the component from scratch. Advance planning for and with software reuse must be done in both communities. Software architecture has been identified as one of the facilitators of software reuse in both communities. The objective of comparing software reuse in the CSE and SPL communities was to understand the differences in how software reuse is done in each. This comparison has revealed a basic need for a knowledge based software reuse process and repository which can be used by any software engineer involved in the modernization of a legacy system to improve the quality attributes of the system while keeping the business needs it serves unchanged.


PART 3


CHAPTER 11: KNOWLEDGE BASED SOFTWARE REUSE PROCESS AND REPOSITORY

11.1 Introduction

This chapter describes our approach to building a Knowledge Based Software Reuse (KBSR) Process and Repository for systematic legacy system modernization. Based on the findings from our survey and literature review [18], [20], [21], [22] and [23], we can state that software reuse is widely believed to be one of the most promising techniques to improve software quality and productivity for legacy system modernization. However, as seen from the literature [23], [11], [24] and [25] and from the surveys we have completed, several problems still limit software reuse. These range from the scarce availability of reusable components and other software artefacts to the difficulty of retrieving, understanding and adapting the required reusable software artefacts and components. Software engineers find it difficult to locate reusable software components (code related) and reusable software artefacts (non-code related), and our survey results support this finding. To overcome this problem, there is a need for a repository to store a large collection of designed-for-reuse software components and other designed-for-reuse software artefacts. However, legacy systems were not built with reusability in mind. Still, many software artefacts found in legacy systems could be reused, or could be modernized so that they become reusable. Hence, there should be a mechanism to locate reusable components in legacy systems while modernizing them so that they can be reused. There should also be a mechanism to locate reusable software artefacts and components in the repository, adapt them (if necessary) and even create new ones using the information provided by similar reusable software components and artefacts.
Our survey shows that software reuse has primarily been the result of opportunistic reuse, where the development of a software system took advantage of the development efforts of other software systems on an ad-hoc basis. It has also been identified that there is no systematic software reuse process. Both software practitioners and academics frown on opportunistic reuse [250]. Opportunistic reuse is also considered non-structured reuse, which has no guidelines on how to reuse software artefacts and components. In non-structured reuse, which software artefacts and components are reused and how they are reused depends entirely on the experience of the software engineer working on the development. The software product line (SPL) area has some guidelines on how reuse is done. The characteristic that distinguishes software product lines from earlier non-structured reuse is predictive rather than opportunistic software reuse. Rather than putting general software components into a library in the hope that opportunities for reuse will arise, software product lines only call for software artefacts or software components to be created when reuse is predicted in one or more products in a well-defined product line. This is called structured reuse. Large organizations report that structured reuse methods [251] and software product lines are often the way to go when it comes to efficient software reuse [252]. When software vendors develop an innovative product or service, its success depends largely on the time to market and the software development cost. If a component or service is reused instead of being developed, the cost of developing it and the time to market are both reduced. As a result, software reuse saves on development cost and time to market. Research [18], [19] shows that software reuse increases software dependability: reused software that has been tried and tested in working systems should be more dependable than new software, because its initial use may reveal design and implementation faults, which are then fixed, thus reducing the number of failures when the software is reused [32].
The growing size of software systems and the complexity of managing their development have led software engineers to devise methods that reduce the complexity of development by breaking it down into manageable subtasks. Reusable software artefacts gathered and stored in a repository can be an important aid to the effective and efficient reuse of software assets. Mi et al. [253] have suggested process-driven software development, a technique for software production in which a conceptual knowledge representation, called a software process, is used to represent and guide software development activities rather than software artefacts. Selby [254] has identified two categories of factors that characterize successful reuse-based software development of large-scale systems: module design factors and module implementation factors. His work relates more to characterizing software reuse at the project level, module design level, module implementation level, and module faults and changes level, and does not provide a systematic software reuse process. Our survey has also reported many benefits of software reuse, such as quicker time to market, better use of resources, increased quality, reduced software risk, and reduced development cost [20], [19], and [53]. There is a need for a systematic software reuse process that addresses the issues and concerns of software reuse. A systematic software reuse process facilitates increased productivity, quality and reliability, makes reusable assets easier to find, and reduces costs, implementation time and the Not-Invented-Here syndrome. The development of a systematic software reuse process and repository produces a base of knowledge [13]. Our literature survey [136], [32], [114], [115], [255], and [118], as discussed in Chapter 2, shows that a number of software reuse processes already exist.
However, to overcome the shortcomings of software reuse processes identified in the literature review, to tackle some of the issues and concerns from our survey results, to achieve the benefits of software reuse, and to address the shortcomings of current reuse practice (i.e. the lack of systematic reuse processes used in industry), we need a systematic software reuse process so that the required reusable software artefacts or components can be systematically reused. With the changing paradigm of software development, software reuse is required both for software development and for the modernization of legacy systems. As discussed in the previous chapters, business critical legacy systems cannot simply be redeveloped rather than modernized, due to the risks involved in doing so. Some of the major risks are:

• The current system may not be well documented, and its specifications may need to be redeveloped, which may introduce errors into the system;
• The current system may not conform to the running system, and any redevelopment may create problems;
• Critical data and business logic may not be replicated;
• The size and complexity of the legacy system may have grown beyond the level at which it can be understood and analysed.

These redevelopment risks suggest a need to reuse the existing software artefacts, components, software assets, application requirements, source code, etc., when modernizing the system. This can only be facilitated with a systematic software reuse process. It requires a paradigm shift in software development in which software reuse is treated as one of the phases of software development, in the same way as the other phases: analysis, design, coding, implementation, and maintenance. To make software reuse an integral phase of software development or legacy system modernization, all reusable software artefacts, components, assets, etc. should be made easily available to software engineers. As suggested previously, this is possible using a reuse repository. A reuse repository stores the knowledge base of reusable software artefacts, reusable components, previous software development experiences, etc. We propose such a repository and call it the Knowledge Based Software Reuse (KBSR) Repository. The KBSR Repository aims to give software engineers easy access to reusable software artefacts and reusable components. The knowledge used for software development can be categorized and saved in the KBSR Repository for reuse. The knowledge extracted from a legacy system can also be categorized and saved in the KBSR Repository for modernization with software reuse. The KBSR Repository can contain all categories of reusable software artefacts and reusable components, and hence it provides reusable software assets. Reuse repositories are one critical element of successful software reuse processes [32]. Software engineers and developers can access reusable software assets from this KBSR Repository. A reuse repository by itself, however, is of little use unless it is used as part of an overall software reuse process.
We propose the development of a KBSR Process with an associated KBSR Repository, which will systematize the software reuse process, provide the repository to store reusable components and reusable software artefacts, and capture current and past knowledge of software reuse and of the process for reusing these artefacts. There are several areas in which knowledge bases can be used to support the software development process [136]. These areas include supporting the expert nature of software design and coding, facilitating the reuse of software components and artefacts, and providing domain knowledge to support software reuse in implementation. Our approach to developing a KBSR Process is based on facilitating the reuse of software components and artefacts. This chapter describes our KBSR Process and KBSR Repository for use in legacy system modernization. We have discussed our modernization approach, of which software reuse is an inherent part, in Chapter 4. We developed the KBSR Repository and KBSR Process and used them in our modernization approach.

11.2 An Overview of the KBSR Process

The KBSR Process involves two necessary software reuse phases to help software engineers develop or modernize a software system with reuse. These phases are: i. Develop the KBSR Repository (for reuse), and ii. Use the KBSR Repository in the modernization of a system (with reuse).

Figure 11.1 shows an overview of the KBSR Process. It includes the KBSR Repository and a complete framework in which the KBSR Repository is used for the modernization of a legacy system, or for the development of a new software system, based on the reusable artefacts found in the KBSR Repository. Once developed, the KBSR Repository stores the reusable software artefacts and components. It can then be used in the KBSR Process for the development or modernization of software for and with reuse. We have used the KBSR Process to build a systematic legacy system modernization approach in which components extracted from legacy systems are stored in the KBSR Repository and then retrieved from it, as part of the KBSR Process, to modernize a legacy system. Modernization of a legacy system may also require reusable components from other sources, architectures, product line components, and other artefacts from external sources, etc. [13]. The KBSR Repository categorises and stores these different components and artefacts along with additional metadata to support their reuse [13]. Many legacy systems can be modernized using our KBSR Process, and over time the KBSR Repository will grow to hold more reusable software artefacts for systematic legacy system modernization. The KBSR Repository can serve as a single point to look for reusable software resources for all software engineers and developers within an organisation. Our approach of using a KBSR Repository may dramatically improve software reuse in the software industry. It is well understood that present software development and modernization approaches are not adequate for meeting the software reuse demand [136].
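As a rough illustration of a repository that stores artefacts with metadata and acts as a single point of search, the Python sketch below keeps entries in memory and ranks them by keyword overlap. The thesis does not prescribe this data model; the field names, the keyword-matching retrieval strategy and the example assets are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ReusableAsset:
    """One repository entry: the asset plus metadata that supports retrieval."""
    name: str
    category: str          # e.g. "legacy artefact", "internal component"
    source: str            # where the asset came from
    keywords: Set[str] = field(default_factory=set)
    description: str = ""


class ReuseRepository:
    """A minimal in-memory stand-in for a KBSR-style repository (hypothetical)."""

    def __init__(self) -> None:
        self._assets: Dict[str, ReusableAsset] = {}

    def store(self, asset: ReusableAsset) -> None:
        # Storing an asset under an existing name replaces the older entry.
        self._assets[asset.name] = asset

    def search(self, *terms: str, category: Optional[str] = None) -> List[ReusableAsset]:
        """Return assets sharing at least one keyword with the query terms,
        most matching keywords first, ties broken alphabetically by name."""
        wanted = {t.lower() for t in terms}
        scored = []
        for asset in self._assets.values():
            if category is not None and asset.category != category:
                continue
            score = len(wanted & {k.lower() for k in asset.keywords})
            if score > 0:
                scored.append((-score, asset.name, asset))
        # Sort on (negated score, name) only, so assets are never compared.
        scored.sort(key=lambda item: (item[0], item[1]))
        return [asset for _, _, asset in scored]


repo = ReuseRepository()
repo.store(ReusableAsset("InterestCalculator", "legacy artefact",
                         "legacy billing subroutine", {"billing", "interest"}))
repo.store(ReusableAsset("InvoiceFormatter", "legacy artefact",
                         "legacy billing subroutine", {"billing", "invoice", "report"}))
repo.store(ReusableAsset("AuditLogger", "internal component",
                         "internal component library", {"audit", "logging"}))
print([a.name for a in repo.search("billing", "invoice")])
# ['InvoiceFormatter', 'InterestCalculator']
```

Keyword overlap is the simplest possible retrieval strategy; a larger repository would more likely rely on a richer classification scheme or full-text search over the stored metadata.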

[Figure 11.1 labels: New Requirements/Legacy System; KBSR Repository; New System/Modernized System]

Figure 11.1: Overview of the KBSR Process

All the reusable artefacts extracted from our legacy systems are stored in the KBSR Repository for/with reuse. Below we discuss the activities within each of the KBSR Process phases in detail.


11.2.1 Phase 1: Develop the KBSR Repository

Figure 11.2 outlines the activities involved in developing the KBSR Repository. These activities are:

• Activity 1: Identify Reusable Artefacts;
• Activity 2: Classify Reusable Artefacts;
• Activity 3: Store Reusable Artefacts in the KBSR Repository.
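The three activities can be read as a small pipeline. The Python sketch below is one possible reading, not the procedure defined in this thesis: the identification criteria (documented and verified), the split into code-related components and non-code artefacts, and the dictionary used as the store are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """An item found while analysing a legacy system (hypothetical fields)."""
    name: str
    is_code: bool      # code-related component vs non-code artefact
    documented: bool
    verified: bool     # tested (code) or reviewed (documents)


def identify(candidates: List[Candidate]) -> List[Candidate]:
    """Activity 1: keep candidates meeting the assumed reusability criteria."""
    return [c for c in candidates if c.documented and c.verified]


def classify(items: List[Candidate]) -> Dict[str, List[Candidate]]:
    """Activity 2: separate code-related components from non-code artefacts."""
    classes: Dict[str, List[Candidate]] = {"component": [], "artefact": []}
    for item in items:
        classes["component" if item.is_code else "artefact"].append(item)
    return classes


def store(classes: Dict[str, List[Candidate]],
          repository: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Activity 3: record each classified item's name in the repository."""
    for kind, items in classes.items():
        repository.setdefault(kind, []).extend(item.name for item in items)
    return repository


candidates = [
    Candidate("ComputeTax", is_code=True, documented=True, verified=True),
    Candidate("TaxRulesSpec", is_code=False, documented=True, verified=True),
    Candidate("TempPatch", is_code=True, documented=False, verified=False),
]
repository = store(classify(identify(candidates)), {})
print(repository)  # {'component': ['ComputeTax'], 'artefact': ['TaxRulesSpec']}
```

Keeping the three activities as separate functions mirrors the separation in the text: the identification criteria can be tightened or the classification scheme refined without touching how items are stored.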

Figure 11.2: Activities to Develop the KBSR Repository

The modernization approach we demonstrated in Chapter 6 transforms the legacy system from a procedural model to an object-oriented model, and object orientation supports a high degree of code reuse. The development of the KBSR Repository is part of Activity 2 (Re-Structure into an Object Model) of our modernization approach. The integration of our legacy system modernization approach with the KBSR Repository is shown in Figure 11.3. The KBSR Repository thus underpins a systematic reuse process, the KBSR Process, and the developed KBSR Repository is then used for the modernization of our legacy system using the KBSR Process. As discussed in Chapter 6, we worked at the subroutine level to re-structure the code of the legacy system. We improved the maintainability of the legacy subroutines by modernizing them (changing the design paradigm from procedural to object-oriented) while reusing the existing code.
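The sketch below gives a generic, hypothetical picture of this kind of subroutine-level restructuring; it is not taken from the legacy systems studied, and Python stands in for both the legacy and the target language. The procedural version updates module-level state shared between routines; the object-oriented version reuses the same computation but encapsulates the data it operates on.

```python
# Procedural style (hypothetical): a subroutine that reads and updates
# module-level variables shared with other routines.
balance = 100.0
rate = 0.05


def apply_interest() -> None:
    global balance
    balance = balance + balance * rate


# Object-oriented restructuring: the same formula is reused unchanged,
# but the state it works on now belongs to an Account object.
class Account:
    def __init__(self, balance: float, rate: float) -> None:
        self.balance = balance
        self.rate = rate

    def apply_interest(self) -> float:
        self.balance = self.balance + self.balance * self.rate
        return self.balance


apply_interest()                    # mutates the shared global
account = Account(100.0, 0.05)
account.apply_interest()            # mutates only this object
print(balance, account.balance)
```

Because each Account carries its own state, several accounts can be processed independently, which the global version cannot do without extra bookkeeping; the business rule itself (the interest formula) is carried over as-is.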

Figure 11.3: Legacy System Modernization Approach using KBSR Repository (Legacy System; Reverse Engineer an Analysis Model; Re-Structure into an Object Model; Forward Engineer using Object-Oriented Methods; Modernized System)


Activity 1: Identify Reusable Artefacts

Figure 11.4 shows the artefacts and components of the KBSR Repository. Artefacts and components from any software organization can contribute to the KBSR Repository. Once artefacts and components are developed, they should be made available for/with reuse. Other organizations developing a new system or modernizing a legacy system can then reap the benefits of components that have already been developed.

Figure 11.4: Components of the KBSR Repository

The KBSR Repository consists of the following reusable software artefacts:
• Components from market;
• Components from other sources;
• Software patterns from other sources;
• Software patterns internal to organization;
• Internal Component libraries;
• Other reusable artefacts from legacy systems.

Components from market: Modern enterprise systems are composed of market components such as relational databases, transaction monitors and other middleware, web/intranet and security components, and a host of others. The Java 2 Platform, Enterprise Edition (J2EE) greatly simplifies enterprise application development. Off-the-shelf software solutions developed with Enterprise JavaBeans (EJB), JavaServer Pages (JSP), and Servlet technologies are designed to work within the component architecture of J2EE. These J2EE software components reduce the time it takes to deliver enterprise applications to market.

Components from other sources: Organizations may benefit from organizing an internal component market, so that software components can be reused across multiple projects, saving valuable resources. Access to components from other sources helps organizations put software reuse into practice. Software engineers then need only focus on the functionality specific to their project, and on locating and integrating available components.

Software patterns internal to organization: A software pattern is a general repeatable solution to a commonly occurring problem in software design. Software patterns can speed up the development process by providing tested, proven development paradigms. Effective software design requires considering issues that may not become visible until later in the implementation. Reusing software patterns helps to prevent subtle issues that can cause major problems. It also improves code readability for coders and architects familiar with the patterns. Software engineers only need to understand how to apply particular pattern techniques to particular problems. Some software patterns may be specific to the organization. A project can take advantage of patterns the organization has already developed for similar kinds of projects.

Software patterns from other sources: Organizations that develop software will often have already developed their own patterns. Adding these patterns to the KBSR Repository allows software engineers to access patterns developed by other organizations. Software design and patterns play a fundamental role both in meeting non-functional requirements and in software reuse. For this reason, the patterns of a successful software system are themselves important objects of reuse.


Internal Component libraries: Internal component libraries hold organization-specific components, tools, software documents, etc. They should provide automated support to software engineers for classifying, describing, and finding the reusable assets stored in them. These libraries are specific to the organization, and some are specific to particular products. Internal libraries only allow reuse of organization-specific components and artefacts, whereas we are aiming for software reuse across organizations.

Other reusable artefacts from legacy systems: An organization's business policies, processes, and procedures are often maintained in its legacy systems. While these systems are being modernized, reusable software artefacts can be extracted and kept in the KBSR Repository for future reuse. Ideas, concepts, architectural patterns, designs and other high-level constructs can be reused because such artefacts preserve previously acquired domain knowledge [256]. In this activity we create reusable artefacts so that they can be reused later, extracting all relevant information from the legacy system. We used the ARMin software architecture reconstruction tool to create reusable artefacts, because ARMin can show component dependencies and software patterns. Moore and Bailin [257] suggest a method for creating objects for reuse that involves encapsulation of local features, parameterization and enrichment, abstraction, restructuring, and testing. Software patterns can also be reused. Software patterns describe the interaction between groups of components. They are therefore a higher-level abstraction than classes or objects and are used to create reusable artefacts [258]. Creating and using patterns promotes software reuse because a pattern is designed once and used many times.

We chose a small number of modules of the legacy system to modernize, and identified reusable artefacts while modernizing them. To identify reusable artefacts, we reconstructed the architecture of the legacy system to find independent subroutines that could be reused in our modernized system. We identified the reusable artefacts by analysing the legacy architecture, the descriptions of the subroutines and modules, the documentation, the patterns, and the subroutine code. We also identified reusable test cases in the documentation. We used different dependency views of the subroutines to re-structure them for reusability. The detailed description of the re-structured subroutines can be found in Chapter 6. Output from this activity becomes the input to the next activity, Classify Reusable Artefacts. The description of each artefact should include the following information so that the artefact can be indexed and located easily in the KBSR Repository:

• Artefact name;
• Artefact type, as described above;
• A description of the function the artefact performs;
• Interface specifications;
• A description of the artefact’s inputs and where these come from;
• A description of the artefact’s outputs and where these go to;
• An indication of what other functions are required by the artefact (dependencies);
• A description of the side-effects (if any) of reusing the artefact;
• A pre-condition setting out what must be true before the artefact is reused;
• A post-condition setting out what is true after the artefact is reused;
• The name of the application from which the artefact has been extracted.
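The description fields above could be captured for indexing with a record like the one below, which mirrors the list one-to-one. This is a minimal sketch. The class name `ArtefactDescription`, its field names and the `index_terms` helper are hypothetical, because the thesis does not specify a storage format for the KBSR Repository.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ArtefactDescription:
    """Hypothetical record mirroring the artefact description fields."""
    name: str
    artefact_type: str          # one of the KBSR artefact types
    function: str               # what the artefact does
    interface: str              # interface specification
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)
    side_effects: str = "none"
    precondition: str = ""
    postcondition: str = ""
    source_application: str = ""

    def index_terms(self) -> Set[str]:
        """Lower-case terms a repository could index this artefact under."""
        words = self.function.lower().split()
        return {self.name.lower(), self.artefact_type.lower(), *words,
                *(v.lower() for v in self.inputs + self.outputs)}


# Example: describing the re-structured SALE subroutine.
sale_desc = ArtefactDescription(
    name="SALE",
    artefact_type="Other reusable artefacts from legacy systems",
    function="Calculates total sale over a number of days",
    interface="SALE(days, file, sale) -> total_sale",
    inputs=["days", "file", "sale"],
    outputs=["total_sale"],
    source_application="ACRSS",
)
print(sorted(sale_desc.index_terms()))
```

Making the interface, inputs and outputs explicit fields is what later allows retrieval by interface similarity rather than by name alone.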

Activity 2: Classify Reusable Artefacts

Classification of reusable objects [259] simplifies communication, encourages comparison of domains, and supports analysis by providing a basis for evaluation criteria. Reusable artefacts can be classified by product and process descriptors. Product descriptors identify what is reused by type (what types of artefacts are reused?), medium (how formal are the reused artefacts?) and maturity (how broadly is the artefact used?). Process descriptors define how the artefact is reused. They include activity (when do we reuse?), mechanism (how do we accomplish reuse?), extent of modification (how much change is needed?), and granularity (how much of the product is needed?) [260].


A natural question for software engineers is: what should be reused? Software engineers can reuse the following categories of software artefacts and components:

Software artefacts:
• A software asset: a software asset can be any artefact resulting from one stage of a software development task. It can also be any mapping from the artefacts produced by one development stage to those produced by the next. Examples are a domain/business analysis or model, application requirements, application architectures and generic architecture patterns, or an application design/model and generic design patterns.
• Documentation and metadata.
• Test suites, test generators and test cases.
• Any information, formalism or tool used during development: languages, CASE tools, IDEs, processes.
• ERDs, DFDs, sequence diagrams, collaboration diagrams, UML class diagrams, activity charts, event diagrams, etc.

Software components:
• Source code: libraries, modules, packages, classes, templates, idioms, objects, functions, subroutines, data.
• Executable code: executable components, byte code, binary code.
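The product and process descriptors of Activity 2, together with the artefact/component categories listed above, lend themselves to a faceted classification. The sketch below is one hypothetical encoding. The facet names follow the descriptors summarized from [260], but the enum values, the `Classification` class and the `shared_facets` helper are illustrative assumptions.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List


class Category(Enum):
    SOFTWARE_ARTEFACT = "software artefact"    # models, documents, tests, diagrams
    SOFTWARE_COMPONENT = "software component"  # source or executable code


@dataclass(frozen=True)
class Classification:
    category: Category
    # Product descriptors: what is reused
    kind: str          # type, e.g. "subroutine", "test case", "ERD"
    medium: str        # how formal, e.g. "code", "diagram", "text"
    maturity: str      # how broadly used, e.g. "single project"
    # Process descriptors: how it is reused
    activity: str      # when reused, e.g. "implementation", "testing"
    mechanism: str     # how reused, e.g. "as-is", "adapted"
    modification: str  # extent of change, e.g. "none", "minor"
    granularity: str   # how much, e.g. "method", "subroutine"


def shared_facets(a: Classification, b: Classification) -> List[str]:
    """Names of the facets on which two classified artefacts agree."""
    return [f.name for f in fields(a) if getattr(a, f.name) == getattr(b, f.name)]


sale_code = Classification(Category.SOFTWARE_COMPONENT, "subroutine", "code",
                           "single project", "implementation", "as-is", "none",
                           "subroutine")
sale_tests = Classification(Category.SOFTWARE_ARTEFACT, "test case", "text",
                            "single project", "testing", "as-is", "none",
                            "subroutine")
print(shared_facets(sale_code, sale_tests))
# ['maturity', 'mechanism', 'modification', 'granularity']
```

A faceted scheme is used here because the descriptors are independent questions, so an artefact can be found by any combination of them.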

Activity 3: Store Reusable Artefacts in the KBSR Repository

All of the reusable software artefacts and components are stored in the KBSR Repository, which is the collection of reusable software artefacts and reusable components. Components are stored in the KBSR Repository specifically for the purpose of reuse. Reusable software artefacts and components are refined over time so that they become increasingly reusable. We stored several artefacts from the legacy system in the KBSR Repository. These included the code of several subroutines, test cases, flow charts, internal libraries, software patterns and ERDs.
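The store step can be sketched as a minimal in-memory repository, assuming each artefact is filed under its kind and a few keywords. `KBSRRepository`, `Artefact`, `store` and `find` are hypothetical names. A real repository would persist its entries and hold the full description record.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Artefact:
    name: str
    kind: str            # e.g. "subroutine", "test case", "flow chart", "ERD"
    keywords: Set[str]
    source_application: str


class KBSRRepository:
    """Hypothetical in-memory store indexed by artefact kind and keyword."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Artefact] = {}
        self._by_kind: Dict[str, Set[str]] = defaultdict(set)
        self._by_keyword: Dict[str, Set[str]] = defaultdict(set)

    def store(self, artefact: Artefact) -> None:
        if artefact.name in self._by_name:
            raise ValueError(f"{artefact.name} is already stored")
        self._by_name[artefact.name] = artefact
        self._by_kind[artefact.kind].add(artefact.name)
        for word in artefact.keywords:
            self._by_keyword[word.lower()].add(artefact.name)

    def find(self, kind: Optional[str] = None,
             keyword: Optional[str] = None) -> List[Artefact]:
        """Return artefacts matching every criterion given, sorted by name."""
        names = set(self._by_name)
        if kind is not None:
            names &= self._by_kind.get(kind, set())
        if keyword is not None:
            names &= self._by_keyword.get(keyword.lower(), set())
        return [self._by_name[n] for n in sorted(names)]


repo = KBSRRepository()
repo.store(Artefact("SALE", "subroutine", {"sale", "total_sale"}, "ACRSS"))
repo.store(Artefact("PAY", "subroutine", {"pay", "sale"}, "ACRSS"))
repo.store(Artefact("SALE-tests", "test case", {"sale"}, "ACRSS"))
print([a.name for a in repo.find(kind="subroutine", keyword="sale")])
# ['PAY', 'SALE']
```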


11.2.2 Phase 2: Use the KBSR Repository in the modernization of a system

We used the KBSR Repository when we incorporated the KBSR Process into our modernization approach to build a systematic legacy system modernization approach. The activities involved in using the KBSR Repository in systematic legacy system modernization are as follows:
• Activity 1: Analyse Problem;
• Activity 2: Retrieve Reusable Artefacts;
• Activity 3: Adapt and Reuse Reusable Artefacts.

Figure 11.5 shows the activities in which the KBSR Repository is used in the systematic modernization of a legacy system.

Activity 1: Analyse Problem

In this activity we analyse the problems associated with the legacy system. The first question is why the existing system needs to be modernized: what problems are faced in keeping it running? There could be a number of quality issues, such as reliability, maintainability, security and dependability. There could also be a requirement to make the legacy system compatible with new technology. Whatever the reason for modernization, the problem needs to be analysed. The Analyse Problem activity provides a complete understanding of the legacy system, as discussed in Chapter 6. The analysis also considers new requirements that arose during maintenance. We analysed the legacy system with the aim of improving maintainability by re-structuring the subroutines to find independent subroutines.

Activity 2: Retrieve Reusable Artefacts

The process of finding reusable artefacts involves more than just locating an exact match. It includes locating highly similar components based on re-structuring, that is, based on the inputs and outputs of the component. Even if a target component must be partially redeveloped rather than reused in total, an example similar to the ideal component can reduce the effort and eliminate many defects [262]. Because exact matches are hard to find, artefacts become less reusable the more specific they become.
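Retrieval that tolerates inexact matches can be sketched by ranking stored artefacts on how closely their inputs and outputs overlap a requested interface. The scoring uses Jaccard similarity, which is our assumption rather than the method of [262]. The catalogue entries are the interfaces of the re-structured SALE, PAY and PROFIT subroutines from Chapter 12. `rank` and `CATALOGUE` are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Tuple

# Interface = (inputs, outputs) of a stored subroutine.
Interface = Tuple[FrozenSet[str], FrozenSet[str]]

CATALOGUE: Dict[str, Interface] = {
    "SALE": (frozenset({"days", "file", "sale"}), frozenset({"total_sale"})),
    "PAY": (frozenset({"days", "sale"}), frozenset({"total_pay"})),
    "PROFIT": (frozenset({"days", "sale", "cost"}), frozenset({"total_profit"})),
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """len(a & b) / len(a | b); defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank(query: Interface, catalogue: Dict[str, Interface]) -> List[Tuple[str, float]]:
    """Rank artefacts by the mean of input and output Jaccard similarity."""
    q_in, q_out = query
    scored = [
        (name, (jaccard(q_in, ins) + jaccard(q_out, outs)) / 2)
        for name, (ins, outs) in catalogue.items()
    ]
    # Highest score first; ties broken by name for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# A new requirement with no exact match: same inputs as PAY, different output.
query = (frozenset({"days", "sale"}), frozenset({"total_bonus"}))
print([name for name, _ in rank(query, CATALOGUE)])  # ['PAY', 'PROFIT', 'SALE']
```

The closest candidate, PAY, is not reusable as-is here, but it is the best starting point for adaptation.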

Figure 11.5: Activities using the KBSR Repository in the Modernization Approach

The artefacts retrieved from the legacy system were based on re-structuring, which is discussed in detail in Chapter 6. We identified subroutines with low coupling. For these subroutines we also checked whether any test cases were described in the documentation. We generated a few test cases and stored them as separate reusable artefacts. The descriptions of the reusable artefacts had already been saved in the KBSR Repository so that software engineers can understand the artefacts they retrieve. The descriptions were generated after executing the system and understanding its software architecture. Reusable artefacts have to be chosen not only for their functionality but also for their compatibility with the target application. The computer system should meet the minimum requirements for the software to run; we used an Intel Pentium III, 500 MHz machine to run our software application. Between subroutines, we treated interface design as the compatibility issue.

Activity 3: Adapt and Reuse Reusable Artefacts

Adaptation, the lifeblood of software reusability, is the customization of reusable artefacts to fit a new problem. It changes the perception of a reusability system from a static library of rock-like building blocks to a living system of subroutines. In such a system, subroutines spawn, change, and evolve into new subroutines as the requirements of their environment change [262]. The major steps involved in adaptation are figuring out what to adapt and then adapting it. During Activity 3 of developing the KBSR Repository, we stored the subroutines in the KBSR Repository, together with the test cases, flow charts and internal libraries of the existing system. The generated software patterns and documentation were also stored in the KBSR Repository. We used the Enterprise Architect tool to generate UML class diagrams from the existing source code. We also generated class diagrams from our modernized system to check class dependencies (loose coupling and high cohesion). In addition, we used Enterprise Architect to generate the logical model, which shows how classes are connected, in order to identify class inheritance.
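Adaptation often amounts to wrapping a retrieved artefact so that it satisfies a new interface without editing its body. The sketch below applies the Adapter design pattern. This is a swapped-in illustration, not the procedure used in this work. A legacy-style routine that takes a day count and a positional array (hypothetically named `legacy_total_sale`) is adapted to a hypothetical `SalesSource` interface expected by a modernized caller.

```python
from typing import List, Protocol


def legacy_total_sale(days: int, sale: List[float]) -> float:
    """Stand-in for a reused legacy routine: sums the first `days` entries."""
    total = 0.0
    for i in range(days):
        total += sale[i]
    return total


class SalesSource(Protocol):
    """Interface a modernized caller expects (hypothetical)."""
    def total_for(self, period: List[float]) -> float: ...


class LegacySaleAdapter:
    """Adapter: exposes legacy_total_sale through the SalesSource interface."""
    def total_for(self, period: List[float]) -> float:
        # The new interface passes only the data; the adapter derives `days`.
        return legacy_total_sale(len(period), period)


def report(source: SalesSource, period: List[float]) -> str:
    return f"total sale: {source.total_for(period):.2f}"


print(report(LegacySaleAdapter(), [1200.0, 800.0, 1000.0]))  # total sale: 3000.00
```

The legacy routine stays unchanged. Only the thin adapter is new, which keeps the reused code and its stored test cases valid.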


11.3 Chapter Summary

This chapter described our approach of building a Knowledge Based Software Reuse (KBSR) Process and Repository for systematic legacy system modernization. The ever-increasing demand for improvements in software maintainability and modernization cannot be met through traditional software development techniques. The traditional software development life cycle model must be replaced by a new paradigm that addresses the issues and concerns of software reuse. A KBSR Process can support such a paradigm by providing a knowledge base for/with reuse, and so facilitate software reuse. To develop such a paradigm and its supporting environment, it is important to identify the nature of the software reuse happening in the software industry. The KBSR Process developed in this chapter is based on an understanding of the issues and concerns of software reuse and is used to build a systematic legacy system modernization approach. The reusable artefacts and reusable components are stored in a repository called the KBSR Repository. The KBSR Process and KBSR Repository support and protect the long-term investment in a legacy system. Developing a KBSR Process with an associated KBSR Repository systematizes the software reuse process. It provides a repository to store reusable components and software artefacts, and captures current and past knowledge of software reuse to build a systematic legacy system modernization approach. The KBSR Repository also provides mechanisms to locate reusable software artefacts and components and to adapt them (if necessary). It even supports creating new ones using information from similar reusable components and artefacts. Software engineers now know where to look for reusable software artefacts. This addresses major issues and concerns that have prevented software reuse from becoming a systematic process.


CHAPTER 12: EVALUATION OF MODERNIZATION APPROACHES

12.1 Introduction

The aim of this chapter is to evaluate our modernization approaches. The approaches we have developed to modernize the legacy system are:
• Reusing Code for Modernization;
• Modernization using Knowledge Based Software Reuse Process and Repository.

We demonstrate our modernization approach on a number of subroutines of the case study, the Automatic Cane Railway Scheduling System (ACRSS). We obtained the source code of ACRSS, modernized a few modules of the system, and returned it to the organization for use. We also performed a number of perfective maintenance tasks, such as adding security features to some subroutines. We found that our modernized system is more capable of evolution than the legacy system, to which we could not add the security feature.

Encapsulation is a way of organizing data and methods into a structure that conceals how the object is implemented, i.e. that prevents access to data by any means other than those specified. Encapsulation therefore guarantees the integrity of the data contained in the object. The user of a class does not need to know how the data in an object is structured, that is, how the implementation takes place. The user is prevented from directly modifying attributes and is forced to modify them through defined functions (called interfaces). In this way data integrity is ensured; for example, one can check that a given value has the expected data type, or lies within the expected range. Encapsulation defines the access levels for the elements of a class. These access levels define the access rights to the data. They determine whether the data can be accessed by a method of that class itself, by a method of an inheriting class, or by any other class. There are three levels of access:
• public: functions of all classes may access data or methods defined with the public access level. This is the lowest level of data protection;
• protected: data access is restricted to member functions of the class and of all its sub-classes;
• private: data access is restricted to methods of that particular class only. This is the highest level of data protection.

The details of our “Reusing Code for Modernization” approach are discussed in Chapter 6. The details of our “Using Knowledge Based Software Reuse Process and Repository” modernization approach are discussed in Chapter 11. This chapter reports the experiences, findings and evaluation of modernizing the ACRSS legacy system, first using our initial approach, “Reusing Code for Modernization”, and then by generating the KBSR Repository and using the KBSR Process. We also compare our approaches with some existing modernization approaches and discuss both approaches, including some of their limitations.
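Python has no enforced access modifiers, so the three levels above can only be approximated. A leading underscore marks a member as protected by convention. A double underscore triggers name mangling, which discourages (but does not prevent) access from outside the class. The sketch below uses a property to force every update of a hypothetical `ProfitRecord.total_profit` attribute through a validating setter, which is the integrity guarantee described above.

```python
class ProfitRecord:
    """Hypothetical class: total_profit may only change through a check."""

    def __init__(self, total_sale: float) -> None:
        self._total_sale = total_sale   # "protected" by convention only
        self.__total_profit = 0.0       # "private": mangled to _ProfitRecord__total_profit

    @property
    def total_profit(self) -> float:
        """Public, read-only view of the encapsulated value."""
        return self.__total_profit

    @total_profit.setter
    def total_profit(self, value: float) -> None:
        # Type check, then a consistency check against the recorded sales.
        if not isinstance(value, (int, float)):
            raise TypeError("total_profit must be numeric")
        if self._total_sale <= 0 and value != 0:
            raise ValueError("no profit can be recorded without sales")
        self.__total_profit = float(value)


rec = ProfitRecord(total_sale=0.0)
try:
    rec.total_profit = 500.0
except ValueError as err:
    print(err)                           # no profit can be recorded without sales
print(hasattr(rec, "__total_profit"))    # False: the attribute name is mangled
```

In Java or C++, the same design would declare the field `private` and expose only the getter and the validating setter.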

12.2 A Case Study – The ACRSS System

The case study used is the Automatic Cane Railway Scheduling System (ACRSS) [216]. ACRSS is a computer-based system developed in 1987 to solve the cane railway scheduling problem. ACRSS was developed to schedule the operations involved in transporting sugar cane from field to factory. The vast majority of the cane is transported to the sugar mills by factory-owned railways. The basic purpose of a cane railway is to deliver empty bins from the mill to the growers, collect these bins when they are filled with cane, and return them to the mill. The railway operations are influenced both by the procedures at the mill for handling the bins and by the activities on the growers’ side. ACRSS was designed to satisfy the requirements of both the mill and the growers. ACRSS has been under investigation by the Sugar Research Institute. Pinkney [216] traced the historical development of the scheduling practices and developed ACRSS to assist with production scheduling.


This case study was chosen because ACRSS is a critical legacy system for the Sugar Research Institute. ACRSS was the best candidate for modernization as it was written in FORTRAN in 1983 and was creating maintenance problems. The transport of cane from field to factory is a major cost item in the production of raw sugar: it is estimated that between 30% and 40% of the costs of processing the cane are transport-related. The capital costs associated with the rail transport of cane are also significant. The Automatic Cane Railway Scheduling System enables the user to assess the impact of additional locomotive shifts on the transport system. The objectives of ACRSS are to minimize the average cost of transportation and to maximize profit. ACRSS uses data describing the cane railway layout, the harvesting patterns of the relevant growers and some operational parameters to produce a schedule. We obtained some details of ACRSS from its documentation and user’s guide. We also had access to the running program and its subroutines, written in FORTRAN 77. The source code showed bad programming practices: the method names chosen were meaningless, which aggravated the maintainability problem. The main modules of ACRSS are:
• Pre-Processor Phase: The pre-processor phase processes the data describing the tramway network, together with the operational parameters and the harvester details, which are altered only slightly.
• Runs Generation Phase (Routing Sub-problem): This phase generates a set of runs, iteratively refines the runs and allocates them to locomotives. The solution to the routing sub-problem is a set of locomotive runs that make all the required collections and deliveries.
• Runs Sequencing Phase (Scheduling Sub-problem): The runs sequencing phase places the runs generated in solving the routing sub-problem into a time sequence by defining, for each run, the time the locomotive leaves the mill yard.
• Constraint Review Phase: The constraint review phase takes the near-feasible schedule produced after the runs sequencing phase and removes any constraint violations so that a feasible schedule is produced.
• Iterative Refinement Phase: The feasible schedule produced by the constraint review program (known as the first feasible schedule) can be refined iteratively by modifying the existing schedule, altering activities such as the timing of a delivery or a collection.
• Post-processor Phase: Reports, including listings and graphs detailing the schedule obtained, are produced by this phase.

In the following sections we show how we modernized the ACRSS legacy system using our modernization approaches.

12.3 Reusing Code for Modernization

The legacy system modernization approach described in Chapter 6 was applied to several subroutines of the case study. The goal of the case study was to re-structure old, unstructured FORTRAN code into structured object-oriented code. The Reusing Code for Modernization approach involves three activities:
• Activity 1: Reverse Engineer an Analysis Model;
• Activity 2: Re-Structure into an Object-Model;
• Activity 3: Forward Engineer using Object-Oriented Methods.

12.3.1 Activity 1: Reverse Engineer an Analysis Model:

Activity 1 is composed of two phases.

Phase 1: Analyze the Legacy System: ACRSS consists of 194 subroutines and about 50,000 lines of code. We applied our modernization approach to a number of chosen subroutines:
• OPTMDL: Optimizes deliveries for all growers affected.
• ACCOL4: Adjusts collection sizes over a grower’s existing set of collections.
• ACCOLA: Checks that ACCOL4 does not introduce errors.
• NDEPTH: Finds the shortest path between a start node and all other nodes of a network.
• INSCOL: Creates a new collection for a grower.
• INITSH: Sets up an array after a new shift is added.


To demonstrate our modernization approach we chose the NDEPTH subroutine. We also identified bad programming practices: the names used for the subroutines were meaningless. Bad programming practices hinder understanding of the source code and its logic. We first analyzed the source code of ACRSS using the Understand for FORTRAN tool [217]. Understand for FORTRAN is an interactive tool providing reverse engineering, automatic documentation, metrics, detection of unused objects and cross-referencing of FORTRAN source code. It analyzes FORTRAN source code to create a repository of the relations and structures contained within it, and this repository is then used to learn about the source code. Since we were not domain experts, the documentation and the source code analysis helped us understand the system. From the source code analysis we identified the subroutines of interest. The results from the Understand for FORTRAN tool were then analyzed manually.

Phase 2: Reconstruction of Legacy System: The code analysis produced input for the architecture reconstruction tool. The software architecture reconstruction tool ARMin was used to identify the dependencies of the subroutines on each other. These dependencies showed us which subroutines were of most interest for re-structuring. If an identified subroutine has no dependencies, there is no need to re-structure it: because it is loosely coupled, it can be reused as it is.
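The "no dependencies, reuse as it is" check can be sketched over a call graph by computing fan-in and fan-out for each subroutine. Two points are assumptions. First, treating "no dependencies" as a fan-out of zero. Second, the graph itself: only the two edges into INSCOL come from the dependency views in Section 12.3.2. ARMin's actual output format is not modelled.

```python
from typing import Dict, Set, Tuple

# Hypothetical call graph: caller -> subroutines it depends on.
CALLS: Dict[str, Set[str]] = {
    "OPTMDL": {"INSCOL"},
    "NDEPTH": {"INSCOL"},
    "INSCOL": set(),
}


def coupling(calls: Dict[str, Set[str]]) -> Dict[str, Tuple[int, int]]:
    """Return {subroutine: (fan_in, fan_out)} for every subroutine in the graph."""
    names = set(calls) | {callee for callees in calls.values() for callee in callees}
    fan_in = {name: 0 for name in names}
    for callees in calls.values():
        for callee in callees:
            fan_in[callee] += 1
    return {name: (fan_in[name], len(calls.get(name, set()))) for name in names}


def reusable_as_is(calls: Dict[str, Set[str]]) -> Set[str]:
    """Subroutines with no outgoing dependencies (fan-out of zero)."""
    return {name for name, (_, fan_out) in coupling(calls).items() if fan_out == 0}


for name, (fan_in, fan_out) in sorted(coupling(CALLS).items()):
    print(f"{name}: fan-in={fan_in}, fan-out={fan_out}")
print(sorted(reusable_as_is(CALLS)))   # ['INSCOL']
```

A high fan-in, as INSCOL has here, also marks a subroutine whose changes ripple widely. That is why its coupling to NDEPTH is examined in detail below.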

12.3.2 Activity 2: Re-Structure into an Object-Model:

For reuse, if dependencies (patterns) are repeated, the subroutines are re-structured into different subroutines, and for this we need to understand the different types of dependencies. The re-structured subroutines become classes that encapsulate methods and data variables, and these classes are reusable. We worked at the subroutine and method level to re-structure the code of the ACRSS system. We improved the maintainability of the ACRSS subroutines by changing the design paradigm from procedural to object-oriented while reusing the existing code.


Figure 12.1: The Dependency view of methods inside OPTMDL (methods P1, P2, P3, P4, Pm, Ps, Pn and Px)

A dependency view can show dependencies between methods, between subroutines, or between methods and subroutines. Depending on the level of interest, the view can be drawn at the subroutine level or at the method level. Figure 12.1 shows the dependency view of the OPTMDL subroutine of the ACRSS system at the method level. P1, P2, P3, P4, Pm, Ps, Pn and Px are the methods within the OPTMDL subroutine, whose function is to optimize deliveries for all growers affected. We also constructed dependency views of all of the methods. Figure 12.2 shows the dependency view of method Px. It shows that subroutine OPTMDL is tightly coupled to subroutine INSCOL through method Px, which belongs to OPTMDL, and that subroutine NDEPTH is dependent on subroutine INSCOL. It also shows which other procedures depend on Px: P5 depends on Px, P6 is dependent on P5, and P7 is dependent on subroutine INSCOL. In Figure 12.2 a circle represents a procedure and an oval represents a subroutine. Methods perform a specific task, whereas subroutines are collections of methods that together fulfil system requirements.


Figure 12.2: The Dependency view of methods and subroutine (nodes Px, P5, P6, P7, INSCOL and NDEPTH)
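The dependency view in Figure 12.2 can also be read as an impact question: which methods and subroutines are affected, directly or transitively, if INSCOL changes? The sketch below answers it with a breadth-first search over the reversed graph. One direction is an assumption: the edge Px depends on INSCOL, since the text only says OPTMDL is coupled to INSCOL through Px. The other edges follow the description of the figure.

```python
from collections import deque
from typing import Dict, Set

# Edges read as "key depends on each value".
DEPENDS_ON: Dict[str, Set[str]] = {
    "P5": {"Px"},
    "P6": {"P5"},
    "P7": {"INSCOL"},
    "Px": {"INSCOL"},      # assumed direction
    "NDEPTH": {"INSCOL"},
    "INSCOL": set(),
}


def dependents_of(target: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return every node that depends on target, directly or transitively."""
    # Invert the graph: for each node, which nodes depend on it?
    reverse: Dict[str, Set[str]] = {}
    for node, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(node)
    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in reverse.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(dependents_of("INSCOL", DEPENDS_ON)))
# ['NDEPTH', 'P5', 'P6', 'P7', 'Px']
```

The wide impact set of INSCOL is consistent with the decision to break the coupling between INSCOL and NDEPTH through re-structuring.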

Subroutines NDEPTH and INSCOL have been used to demonstrate the re-structuring operations. The dependency view of methods and subroutines allowed us to derive a dependency view of variables; this variable dependency view had to be worked out manually. The variable dependency view (manual exercise) and the dependency view of methods and procedures (software architecture reconstruction process) are used for the re-structuring operations. Figure 12.3 shows the variable dependency view of NDEPTH. Figure 12.4 shows the subroutine and variable dependency view of INSCOL, since subroutine NDEPTH is dependent on subroutine INSCOL. We need to identify the variables responsible for the tight coupling between the INSCOL and NDEPTH subroutines. The variables days, sale, total_sale and total_pay are used in subroutine NDEPTH. The variables days, pay, file, sale, cost and profit are used in subroutine INSCOL.


Figure 12.3: NDEPTH Variable Dependency View (variables days, sale, total_sale and total_pay)

Figure 12.4: INSCOL Subroutine and Variable Dependency View (subroutine NDEPTH; variables days, file, cost, sale, pay and profit)

In Figure 12.4, removing NDEPTH from the dependency view lets us see which variables are used again. We applied the coincidental decomposition, decomposition with sequential cohesion, sequential composition, and sequential decomposition re-structuring operations to subroutines NDEPTH and INSCOL. This generated three further independent subroutines, called SALE, PAY, and PROFIT. The algorithms (as pseudo code) of these three subroutines are shown in Table 12.1. Take subroutine SALE as an example. SALE has four variables: days, file, sale and total_sale. Variable sale is calculated from the input values of file and days in subroutine INSCOL, as shown in Figure 12.4. The sale variable in turn produces the output total_sale, which also takes days as an input. Subroutine SALE is re-structured to fulfil this purpose, as shown in Figure 12.5, with the variables days, file and sale.

Figure 12.5: Re-structured subroutines SALE, PAY and PROFIT (SALE: input days, input file, input sale, output total_sale; PAY: input days, input sale, output total_pay; PROFIT: input days, input sale, input cost, output total_profit)

Similarly, PAY and PROFIT are re-structured as subroutines. These subroutines are independent and can be reused in the application. The SALE subroutine calculates total_sale on the basis of days and the data read from file, and so becomes a reusable subroutine: once its interface is defined, it will work. It takes days and the data read from file as inputs and calculates total_sale as its output. This is how the reusable software artefacts are identified. The variables used in the subroutines are SALE (input days, input file, input sale, output total_sale), PAY (input days, input sale, output total_pay), and PROFIT (input days, input sale, input cost, output total_profit). Examining these variables, we can see that days and sale are common to all three subroutines.
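Both variable analyses in this section reduce to set intersection. The first finds the variables that couple NDEPTH to INSCOL (Figures 12.3 and 12.4). The second finds the inputs common to the three re-structured subroutines (Figure 12.5). The variable sets below are copied from the text; the names `coupling_vars` and `common_inputs` are ours.

```python
from functools import reduce

# Variable sets read off Figures 12.3 and 12.4.
NDEPTH_VARS = {"days", "sale", "total_sale", "total_pay"}
INSCOL_VARS = {"days", "pay", "file", "sale", "cost", "profit"}

# Inputs of the re-structured subroutines (Figure 12.5).
INPUTS = {
    "SALE": {"days", "file", "sale"},
    "PAY": {"days", "sale"},
    "PROFIT": {"days", "sale", "cost"},
}

# Variables responsible for the coupling between NDEPTH and INSCOL.
coupling_vars = NDEPTH_VARS & INSCOL_VARS
# Inputs shared by every re-structured subroutine.
common_inputs = reduce(set.intersection, INPUTS.values())

print(sorted(coupling_vars))   # ['days', 'sale']
print(sorted(common_inputs))   # ['days', 'sale']
```

On larger subroutines, the same computation is a quick first step before working out the variable dependency view by hand.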

Subroutine SALE (input days, input file, input sale, output total_sale)
    integer days, i
    character file(50)
    dimension sale(14)
    do 20 i = 0, days
        i = i + 1
        read file sale(i)
20  continue
    return
    end

Subroutine PAY (input days, input sale, output total_pay)
    integer days, pay, total_pay, i
    dimension sale(14)
    total_pay = 0
    do 60 i = 0, days
        total_pay = pay + 0.1*sale(i)
        if sale(i) < 1000 go to 80
        total_pay = total_pay + 50
60  continue
    pay = total_pay/days + 100
    end

Subroutine PROFIT (input days, input sale, input cost, output total_profit)
    integer days, cost, profit, i
    dimension sale(14)
    total_sale = 0
    do 90 i = 0, days
        total_sale = total_sale + sale(i)
90  continue
80  profit = 0.9*total_sale - cost
    end

Table 12.1: Three subroutines SALE, PAY, and PROFIT
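For readers who want to execute the logic of Table 12.1, the following is one plausible Python reading of the three subroutines, not a translation endorsed by the original code. Several assumptions are made:
- File reading is replaced by an already-parsed list `daily_sales`.
- Each loop covers exactly `days` entries.
- In PAY, `go to 80` is read as "skip the flat bonus of 50 for days with a sale below 1000", and each day's pay is taken as 10% of that day's sale.
- PROFIT is read as `0.9*total_sale - cost`.

```python
from typing import Sequence


def _check_days(days: int, daily_sales: Sequence[float]) -> None:
    if days <= 0 or days > len(daily_sales):
        raise ValueError("days must be between 1 and the number of sale records")


def sale(days: int, daily_sales: Sequence[float]) -> float:
    """SALE: total of the first `days` daily sale figures."""
    _check_days(days, daily_sales)
    return sum(daily_sales[:days])


def pay(days: int, daily_sales: Sequence[float]) -> float:
    """PAY (assumed reading): 10% of each sale, plus 50 when a sale is >= 1000,
    averaged over `days`, plus a base of 100."""
    _check_days(days, daily_sales)
    total_pay = 0.0
    for s in daily_sales[:days]:
        total_pay += 0.1 * s
        if s >= 1000:           # the pseudocode skips the bonus when s < 1000
            total_pay += 50
    return total_pay / days + 100


def profit(days: int, daily_sales: Sequence[float], cost: float) -> float:
    """PROFIT: 90% of total sales minus cost."""
    _check_days(days, daily_sales)
    return 0.9 * sum(daily_sales[:days]) - cost


DAILY = [1200.0, 800.0, 1000.0]
print(sale(3, DAILY), round(pay(3, DAILY), 2), profit(3, DAILY, 500.0))
```

Each function depends only on its parameters, which is the independence that makes the re-structured subroutines reusable.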

12.3.3 Activity 3: Forward Engineer using Object-Oriented Methods:

Table 12.2 shows the pseudo code of the re-structured methods and variables reused from the ACRSS source code to modernize it. We wrote another subroutine, called SecSalePayProfit, and invoked it from SALE, PAY and PROFIT to add the security features. Application security assessment covers both web (browser-based) and non-web (client/server, compiled binaries, command line, etc.) applications, including front-end and back-end systems. Software defects, bugs and logic flaws introduced during maintenance are consistently the primary cause of commonly exploited application software vulnerabilities. These can lead to unauthorized access to application information. A security assessment identifies vulnerabilities inherent in the code of an application itself, regardless of the technology in which it is implemented. The ACRSS system had no security features in the PROFIT subroutine. We could identify from the architecture that the total_profit component had no data validation. Total profit should be based on sale records. If there are no sales, no profit should be displayed, and total_profit should not be accessible outside sale. In the legacy system, a user could access total_profit outside sale. The reason could be that this security feature was overlooked by software engineers during maintenance. We added the data validation security feature to the PROFIT subroutine. Adding it was quick and easy because, after modernization, the subroutines are loosely coupled. Adding the security features allowed the system to evolve.

SAMPLE PROGRAM

Program ACRSS
    integer m, n, days, i
    character file(50)
    dimension sale(14)
    dimension pay(14)
    dimension profit(14)
    Call SALE (input days, input file, input sale, output total_sale)
    Call PAY (input days, input sale, output total_pay)
    Call PROFIT (input days, input sale, input cost, output total_profit)
End

Subroutine PAY (input days, input sale, output total_pay)
    Methods
        Compute_tpay (pay, sale)
        Compute_pay (tpay, days)
        Redefine_tsale (tsale, sale)

Subroutine PROFIT (input days, input sale, input cost, output total_profit)
    Methods
        Compute_profit (tsale, cost)

Subroutine SALE (days, file, sale)
    Compute_tpay (pay, sale)
    Compute_pay (tpay, days)
    Redefine_tsale (tsale, sale)
    Compute_profit (tsale, cost)

Table 12.2: Pseudo Code of the Re-structured Methods and Variables
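The data validation added to PROFIT can be illustrated with a small sketch. Here a hypothetical SalesLedger class keeps the sale records private and exposes total_profit only through a method that first checks that sale records exist, so no profit is reported when there are no sales. The class name, the exception type and the profit formula are assumptions for illustration; the thesis does not give the code of SecSalePayProfit.

```python
from typing import List


class NoSalesError(ValueError):
    """Raised when profit is requested but no sale records exist."""


class SalesLedger:
    """Hypothetical encapsulation of SALE data with a validated PROFIT."""

    def __init__(self) -> None:
        self._sales: List[float] = []  # private by convention: use the methods

    def record_sale(self, amount: float) -> None:
        # Reject invalid input before it reaches the stored records.
        if amount < 0:
            raise ValueError("sale amount cannot be negative")
        self._sales.append(amount)

    def total_sale(self) -> float:
        return sum(self._sales)

    def total_profit(self, cost: float) -> float:
        # Validation step: no sale records means no profit is reported.
        if not self._sales:
            raise NoSalesError("no sale records; total_profit is unavailable")
        return 0.9 * self.total_sale() - cost  # assumed profit rule
```

A caller that asks for total_profit before any sale has been recorded receives a NoSalesError instead of a misleading value.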

12.4 Modernization Using Knowledge Based Software Reuse Process and Repository

This section describes how we applied our KBSR Process to the same case study that we used for our modernization approach “Reusing Code for Modernization”. All the reusable artefacts extracted from the ACRSS legacy system are stored in the KBSR Repository to support development for reuse and with reuse. Below we discuss how we applied the KBSR Process to our case study.

12.4.1 Phase 1: Develop the KBSR Repository

Phase 1, the Development of the KBSR Repository, is part of Activity 2 (Re-Structure into an Object Model) of our other modernization approach, “Reusing Code for Modernization”. The developed KBSR Repository will be used for the modernization of our case study using the KBSR Process. As discussed in the previous section, we worked at the subroutine level to re-structure the code of the ACRSS system. We improved the maintainability of the ACRSS subroutines by modernizing them (changing the design paradigm from procedural to object-oriented) while reusing the existing code. This phase involves three activities:

• Activity 1: Identify Reusable Artefacts
• Activity 2: Classify Reusable Artefacts
• Activity 3: Store Reusable Artefacts in the KBSR Repository

Activity 1: Identify Reusable Artefacts

In this activity we identify and prepare reusable artefacts so that they can be reused later. To find them, we reconstructed the architecture of the legacy system and identified independent subroutines that can be reused in our modernized system. The reusable artefacts were identified by analysing the legacy architecture, the descriptions of the subroutines and modules, the documentation, the patterns, and the code of the subroutines. We also identified reusable test cases in the documentation. We chose a small number of subroutines of the ACRSS system to modernize, and used different dependency views of the subroutines to re-structure them for reusability. SALE, PAY and PROFIT are the subroutines restructured by applying our modernization approach. The deliverable of this activity is a set of identified reusable artefacts: we identified the SALE, PAY and PROFIT subroutines, test cases, flow chart, and documentation of ACRSS to be reused. The output of this activity becomes the input to the next activity, Classify Reusable Artefacts. Using a formatted template allows us to index all the artefacts in the KBSR Repository. To illustrate how a software artefact, including the specification of its interface, would be represented within the KBSR Repository, we have chosen a software component called SALE from the ACRSS legacy system; its description is provided in Table 12.3.

| Attributes used for Description | Attribute Values |
|---|---|
| Artefact Name | SALE |
| Artefact Type | Subroutine at module level |
| A description of the function the artefact performs | Keeps the record of the sale in the file |
| Interface specifications | SALE (input days, input file, input sale, output total_sale) |
| A description of its inputs and where these come from | Input1: days, user-entered value; Input2: sale, user-entered value; Input3: file, data read from file |
| A description of its outputs and where these go to | Total Sale: total_sale data output on the display for the user to view |
| An indication of what other functions are required | None |
| A description of the side-effects (if any) of reusing the artefact | None |
| A description of the pre-condition setting out what must be true before the artefact is reused | User data to be saved in the file |
| A description of the post-condition and what is true after the artefact is reused | All sale data saved in the file |
| Application name from where the artefact has been extracted | ACRSS |

Table 12.3: Software Artefacts Representation within the KBSR Repository
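The description template in Table 12.3 maps naturally onto a record type. The Python sketch below is one possible encoding, assuming each attribute becomes a field; the field names are hypothetical and simply paraphrase the attribute descriptions in the table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArtefactDescription:
    """One entry of the Table 12.3 template (field names are illustrative)."""
    name: str
    artefact_type: str
    function: str
    interface: str
    inputs: Dict[str, str]   # input name -> where it comes from
    outputs: Dict[str, str]  # output name -> where it goes
    required_functions: List[str] = field(default_factory=list)
    side_effects: List[str] = field(default_factory=list)
    precondition: str = ""
    postcondition: str = ""
    source_application: str = ""


SALE_DESCRIPTION = ArtefactDescription(
    name="SALE",
    artefact_type="Subroutine at module level",
    function="Keeps the record of the sale in the file",
    interface="SALE (input days, input file, input sale, output total_sale)",
    inputs={"days": "user-entered value",
            "sale": "user-entered value",
            "file": "data read from file"},
    outputs={"total_sale": "shown on the display for the user to view"},
    precondition="User data to be saved in the file",
    postcondition="All sale data saved in the file",
    source_application="ACRSS",
)
```

Empty lists for required functions and side effects correspond to the "None" entries in the table.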

Activity 2: Classify Reusable Artefacts

The SALE, PAY and PROFIT subroutines, test cases and documentation of ACRSS identified for reuse were categorized according to the reusable artefact categories described earlier. The reusable components were the source code of the subroutines SALE, PAY and PROFIT, while the test cases, flow chart, and documentation of ACRSS were classified as reusable software artefacts. We represent the classifications in the repository using separate columns for the type of reusable asset, such as class, object, subroutine, test case, ERD and DFD. Table 12.4 shows the template we use to store reusable components in the KBSR Repository, filled in with the data of the ACRSS SALE subroutine. Table 12.3 and Table 12.4 are related through “Artefact Name”.

| Project Name | Category Name | Artefact Name | Description of the Reusable Artefacts | Attributes used for Description |
|---|---|---|---|---|
| Automatic Cane Railway Scheduling System | Subroutine | SALE | Takes days, file and sale as input data and calculates total_sale. | Refer to Table 12.3 under Artefact Name: SALE |

Table 12.4: Storage of Reusable Artefacts SALE in the KBSR Repository
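Because Table 12.3 and Table 12.4 share the Artefact Name key, looking up an artefact amounts to a join on that key. The sketch below shows this with plain Python dictionaries; the category names come from the text, while the data layout and the function names are assumptions.

```python
from typing import Dict, List

# Hypothetical storage rows in the shape of Table 12.4 (one per artefact).
CLASSIFICATION: List[Dict[str, str]] = [
    {"project": "Automatic Cane Railway Scheduling System",
     "category": "Subroutine", "artefact": "SALE",
     "summary": "Takes days, file and sale as input and calculates total_sale."},
    {"project": "Automatic Cane Railway Scheduling System",
     "category": "Software artefact", "artefact": "Test_Case_SALE",
     "summary": "Test cases used for SALE."},
]

# Detailed descriptions in the shape of Table 12.3, keyed by artefact name.
DESCRIPTIONS: Dict[str, Dict[str, str]] = {
    "SALE": {"interface": "SALE (input days, input file, input sale, output total_sale)",
             "precondition": "User data to be saved in the file"},
}


def by_category(category: str) -> List[str]:
    """Return the names of all artefacts classified under `category`."""
    return [row["artefact"] for row in CLASSIFICATION if row["category"] == category]


def describe(artefact: str) -> Dict[str, str]:
    """Join a Table 12.4 row with its Table 12.3 description via the name."""
    row = next((r for r in CLASSIFICATION if r["artefact"] == artefact), None)
    if row is None:
        raise KeyError(artefact)
    return {**row, **DESCRIPTIONS.get(artefact, {})}
```

An artefact without a detailed description still returns its classification row, so missing Table 12.3 entries are easy to spot.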

Activity 3: Store Reusable Artefacts in the KBSR Repository

All of the reusable software components are stored in the KBSR Repository. The KBSR Repository is the collection of reusable software artefacts and components, which are developed and stored for the purpose of reuse. We stored several artefacts from the ACRSS system in the KBSR Repository. These included the code of several subroutines (SALE, PAY and PROFIT), Test_Case_SALE, Flow_Charts_SALE, Internal_Libraries_SALE, Software_Patterns_SALE and ERD_SALE. An artefact name such as Test_Case_SALE identifies the information about the test cases of the SALE subroutine. The artefacts are shown in Table 12.5.

| Project Name | Category Name | Artefacts Name | Description of the Reusable Artefacts | Attributes used for Description |
|---|---|---|---|---|
| ACRSS | Subroutine | SALE | Calculates total sale | Refer to Table 12.3 |
| ACRSS | Subroutine | PAY | Calculates total pay | Refer to Table 12.3 |
| ACRSS | Subroutine | PROFIT | Calculates total profit | Refer to Table 12.3 |
| ACRSS | Software artefact | Test_Case_SALE | Test cases used for SALE | Refer to Table 12.3 |
| ACRSS | Software artefact | Flow_Charts_SALE | Flow chart of SALE | Refer to Table 12.3 |
| ACRSS | Software artefact | Internal_Libraries_SALE | Internal libraries used in SALE | Refer to Table 12.3 |
| ACRSS | Software artefact | Software_Patterns_SALE | Software patterns used in SALE | Refer to Table 12.3 |
| ACRSS | Software artefact | ERD_SALE | Entity relationship diagram of SALE | Refer to Table 12.3 |

Table 12.5: Reusable Software Artefacts and Components from the ACRSS system
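The thesis stored these entries in Microsoft Access (see Section 12.7). As a self-contained stand-in, the sketch below keeps the Table 12.5 columns in an in-memory SQLite database through Python's standard sqlite3 module. The table and column names are assumptions, and SQLite is a substitute for, not a reproduction of, the original repository.

```python
import sqlite3

# Rows of Table 12.5: (project, category, artefact name, description).
ROWS = [
    ("ACRSS", "Subroutine", "SALE", "Calculates total sale"),
    ("ACRSS", "Subroutine", "PAY", "Calculates total pay"),
    ("ACRSS", "Subroutine", "PROFIT", "Calculates total profit"),
    ("ACRSS", "Software artefact", "Test_Case_SALE", "Test cases used for SALE"),
    ("ACRSS", "Software artefact", "Flow_Charts_SALE", "Flow chart of SALE"),
    ("ACRSS", "Software artefact", "Internal_Libraries_SALE", "Internal libraries used in SALE"),
    ("ACRSS", "Software artefact", "Software_Patterns_SALE", "Software patterns used in SALE"),
    ("ACRSS", "Software artefact", "ERD_SALE", "Entity relationship diagram of SALE"),
]


def build_repository() -> sqlite3.Connection:
    """Create an in-memory repository table and load the Table 12.5 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE artefact (
               project     TEXT NOT NULL,
               category    TEXT NOT NULL,
               name        TEXT PRIMARY KEY,  -- links to the Table 12.3 description
               description TEXT NOT NULL)"""
    )
    conn.executemany("INSERT INTO artefact VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


if __name__ == "__main__":
    repo = build_repository()
    query = "SELECT name FROM artefact WHERE category = ? ORDER BY name"
    for (name,) in repo.execute(query, ("Subroutine",)):
        print(name)  # prints PAY, PROFIT, SALE on separate lines
```

Making the artefact name the primary key enforces the one-to-one link between a stored artefact and its description.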

12.4.2 Phase 2: Use the KBSR Repository in the Modernization of a System

The activities involved in using the KBSR Repository in systematic legacy system modernization are as follows:

• Activity 1: Analyse Problem
• Activity 2: Retrieve Reusable Artefacts
• Activity 3: Adapt and Reuse Reusable Artefacts

We illustrate the use of the KBSR Repository in the KBSR Process to modernize the ACRSS legacy system and to build a systematic legacy system modernization approach.

Activity 1: Analyse Problem

The ACRSS legacy system is written in an old procedural programming language and needed to be converted to a modern object-oriented programming language for better maintainability and reusability. Abel, et al. [261] used experimental techniques to examine the maintenance of Version 1 of the ACRSS. Their examination consisted of two sections: the first compared the generated schedules with the existing manually designed schedules, and the second conducted a number of parametric analyses to probe the system’s reliability. Pinkney [216] modified the source code using FORTRAN 77. Values such as total_sale, total_profit and total_pay from previous versions of the ACRSS system were analysed to assess the maintainability of the system.


We analysed the ACRSS system for modernization, aiming at better maintainability, by re-structuring the subroutines as detailed below to find the independent subroutines SALE, PROFIT and PAY. The existing problem descriptors, such as modifying total_sale, total_profit and total_pay, involved old monolithic legacy source code that had stopped evolving and was difficult to maintain. We chose a few subroutines in which SALE, PROFIT and PAY could be restructured as independent subroutines to modernize using the KBSR Process. The subroutines we chose were:

• OPTMDL: Optimizes deliveries for all growers affected.
• ACCOL4: Adjusts collection sizes for Growers 1 over his existing set of collections.
• ACCOLA: Checks that ACCOL4 does not introduce errors.
• NDEPTH: Shortest path between start node and all other nodes of a network.
• INSCOL: Creates a new collection for a grower.
• INITSH: Sets up array after new shift added.

Activity 2: Retrieve Reusable Artefacts

The artefacts retrieved from ACRSS were based on re-structuring, which is discussed in detail in Chapter 6. We identified the subroutines SALE, PAY and PROFIT based on low coupling. For these subroutines, we also checked whether any test cases were documented. We generated a few test cases and stored them as multiple reusable artefacts based on the SALE, PAY and PROFIT test cases. We updated the NDEPTH module to reflect the independent subroutines SALE, PAY and PROFIT. The reusable artefacts retrieved from ACRSS were subroutines, internal libraries, test cases and the application requirements. Their descriptions had already been saved in the KBSR Repository, as shown in Table 12.5, so that software engineers can understand the artefacts that have been retrieved. The descriptions were generated after executing the system and understanding its software architecture. Reusable artefacts have to be chosen not only for their functionality but also for their compatibility with the target application. The computer system should meet the minimum requirements for the software to run; we used an Intel Pentium III (500 MHz) machine to run our software application. Between subroutines, we treated interface design as the compatibility issue: in our case study, we checked that the interface of each subroutine was compatible with the subroutines that call it.
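The interface compatibility check described above can be sketched as a comparison of parameter lists. In this hypothetical Python sketch, an interface is parsed from a specification such as "SALE (input days, input file, input sale, output total_sale)", and a stored artefact is treated as compatible with a caller when the caller supplies every input and expects only outputs the artefact produces. The parsing format and the compatibility rule are assumptions, not the procedure used in the case study.

```python
import re
from typing import List, Tuple

Param = Tuple[str, str]  # (direction, name), e.g. ("input", "days")


def parse_interface(spec: str) -> Tuple[str, List[Param]]:
    """Parse 'NAME (input a, output b)' into (NAME, [(direction, name), ...])."""
    match = re.fullmatch(r"\s*(\w+)\s*\((.*)\)\s*", spec)
    if match is None:
        raise ValueError(f"not an interface specification: {spec!r}")
    name, body = match.groups()
    params: List[Param] = []
    for item in body.split(","):
        if not item.strip():
            continue  # tolerate empty parameter lists such as 'X ()'
        direction, param = item.split()
        if direction not in ("input", "output"):
            raise ValueError(f"unknown direction {direction!r}")
        params.append((direction, param))
    return name, params


def is_compatible(artefact_spec: str, caller_inputs: set, caller_outputs: set) -> bool:
    """True if the caller supplies every input and expects only produced outputs."""
    _, params = parse_interface(artefact_spec)
    needs = {p for d, p in params if d == "input"}
    gives = {p for d, p in params if d == "output"}
    return needs <= caller_inputs and caller_outputs <= gives
```

For example, a caller that passes days and sale but no file would be flagged as incompatible with SALE.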

Activity 3: Adapt and Reuse Reusable Artefacts

During Activity 3 of the Development of the KBSR Repository, we stored reusable artefacts in the KBSR Repository, including the SALE, PAY and PROFIT subroutines. We also generated class diagrams from our modernized system to examine the dependencies between classes (loose coupling and high cohesion). Based on the class diagram and a manual inspection of the class dependencies, we created the inheritance hierarchy (super type class and subtype classes) shown in Figure 12.6. We refactored the employee-type code into three further classes, HOURLY EMPLOYEE, SALARIED EMPLOYEE and CONSULTANT, all of which inherit from EMPLOYEE. Restructuring the code has made the PAY subroutine easier to maintain than it was in the legacy system: each employee type is cohesive within its own functionality and loosely coupled to the others, so a change to one employee type is independent of the other employee types. The restructured code conforms to object-oriented principles such as encapsulation, inheritance and information hiding. We also used the restructured classes in a different application to check whether reuse is domain based. For example, we used the EMPLOYEE class in an application called the “Theatre System” (written in C#) so that the Theatre System could support different types of employee, calculating pay according to employee type: HOURLY EMPLOYEE, SALARIED EMPLOYEE and CONSULTANT. The test cases we ran gave correct results, which shows that the restructured EMPLOYEE class worked in another application, the Theatre System, and added new functionality to it: pay can now be calculated according to employee type. This shows that reuse is not necessarily domain based.


Figure 12.6: Super type class and Subtype class of PAY Subroutine
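The inheritance in Figure 12.6 can be sketched as follows. The thesis does not give the pay rules for each employee type, so the rules below (hours times rate, annual salary divided into pay periods, a fee per consulting day) and all attribute names are hypothetical. The point is that each subtype overrides a single pay method, so changing one employee type's rule does not touch the others. The sketch is in Python rather than the C# used for the Theatre System.

```python
from abc import ABC, abstractmethod


class Employee(ABC):
    """Super type: data and behaviour shared by all employee types."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def pay(self) -> float:
        """Each subtype supplies its own pay rule."""


class HourlyEmployee(Employee):
    def __init__(self, name: str, hours: float, rate: float) -> None:
        super().__init__(name)
        self.hours, self.rate = hours, rate

    def pay(self) -> float:
        return self.hours * self.rate


class SalariedEmployee(Employee):
    def __init__(self, name: str, annual_salary: float, periods: int = 26) -> None:
        super().__init__(name)
        self.annual_salary, self.periods = annual_salary, periods

    def pay(self) -> float:
        return self.annual_salary / self.periods


class Consultant(Employee):
    def __init__(self, name: str, days: int, daily_fee: float) -> None:
        super().__init__(name)
        self.days, self.daily_fee = days, daily_fee

    def pay(self) -> float:
        return self.days * self.daily_fee


def payroll(staff: list) -> float:
    """Works for any mix of subtypes through the common pay() interface."""
    return sum(e.pay() for e in staff)
```

Code such as payroll depends only on the EMPLOYEE interface, which is what allows the hierarchy to be moved into another application unchanged.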

12.5 Comparing Software Modernization Approaches based on Software Reuse

In this section we compare the two modernization approaches we applied to our case study, the ACRSS legacy system. The first time, we modernized the legacy system without any KBSR Process or KBSR Repository; we then modernized the same case study again using the KBSR Process and KBSR Repository. We compare the two approaches on several attributes.

| Comparison Issues | Comparison Attributes | Approach 2: Modernization with KBSR Process and KBSR Repository | Approach 1: Modernization without KBSR Process and KBSR Repository |
|---|---|---|---|
| Software Reuse Management and Measurement | Integration of software reuse in modernization and SDLC process | Reuse was integrated as we had components identified for reuse. The modernization approach used software with reuse. | We required extra time and effort to find out what software artefacts are available to reuse. In the process we developed software for reuse. |
| Disadvantages of Software Reuse | Ad-hoc reuse, no strategy for software reuse | No ad-hoc reuse was done. A strategy was followed to modernize the system with reuse. | A strategy was followed to identify reusable artefacts. Again, it was time consuming and required human effort. |
| Is Software Reuse Domain Based and Language Specific? | Domain based and language specific | Reusable components such as the Employee class from the KBSR Repository were used in ACRSS and in another application, the Theatre System, to check the functionality. So software reuse is not necessarily domain based or language specific. | We used a legacy system written in FORTRAN to modernize. Reusable components were extracted to be reused in the same system. |
| Reuse Planning | Planning is required | We planned with reuse. | We planned for reuse. |
| Reuse and Software Quality | Quality attributes: maintainability, understandability | Maintainability and understandability of the code were enhanced. | Maintainability and understandability of the code were enhanced. |

Table 12.6: Comparison of Modernization Approaches: With and Without KBSR Process and KBSR Repository

We derived the set of comparison attributes from the outcome of our surveys, so that the issues and problems associated with reuse raised by the software development communities (the CSE community and the SPL community) could be addressed. Some of the major concerns raised were lack of tool support, the Not-Invented-Here (NIH) syndrome, CASE tools not promoting reuse, lack of reuse education, and the absence of both a reuse repository and a systematic reuse process. With the development of the KBSR Process we have addressed the last two concerns, no reuse repository and no systematic reuse process [13]. The comparison attributes map onto the survey issues and concerns as follows:

• “integration of software reuse in modernization and SDLC process” addresses “software reuse management and measurement”;
• “ad-hoc reuse, no strategy for software reuse” addresses “disadvantages of software reuse”;
• “domain based” addresses “is software reuse domain based?”;
• “language specific” addresses “is software reuse language specific?”;
• “planning required” addresses “reuse planning”;
• “quality attributes maintainability, understandability” addresses “reuse and software quality”.

Modernization with the KBSR Process and KBSR Repository has software reuse as an integral phase, because the reusable components had already been identified. This approach is based on development with software reuse, and it saved us time and cost because the reusable software artefacts were already in the repository. When modernizing without the KBSR Process and KBSR Repository, we required extra time and effort to find out what software artefacts were available to reuse, which is very resource intensive.
The comparison of modernization with KBSR Process and KBSR Repository and without KBSR Process and KBSR Repository is summarized in Table 12.6.

12.6 Comparison of different Modernization Approaches to our Approach

In this section we compare our modernization approaches, modernization with and without the KBSR Process and KBSR Repository, to the other modernization approaches outlined earlier in the thesis. In Chapter 2 we presented several approaches to support legacy system modernization. Each approach has strengths, weaknesses, and tradeoffs between software reuse, software architecture reconstruction, flexibility, data, business logic, presentation, integration, and impact of code changes (Table 2.1). Our modernization approach without the KBSR Process and KBSR Repository (approach 1) has four phases: Phase 1 is analysis of the legacy system, Phase 2 is reconstruction of the legacy system, Phase 3 is restructuring, and Phase 4 is transformation from procedural to object-oriented programming (OOP). In the analysis phase, the important part is to create a description of each module and each data item. In the absence of domain experts, this helps the software architect understand the source code, and the knowledge gained during this task is invaluable during the restructuring process. During Phase 2, reconstruction of the legacy system, the relations between the elements are aggregated into high-level abstractions. Identifying all external dependencies of a component is important when modernizing from one language paradigm to another; of particular importance are the dependencies between components that are candidates for restructuring. During Phase 3, restructuring is carried out by applying restructuring operations: Coincidental Decomposition, Conditional Iterative or Communicational Decomposition, Sequential Decomposition, Caller/Callee Composition, and Hide and Reveal. Once reconstruction and restructuring are complete, Phase 4 is applied, producing a Structured Object Model. An object can be viewed as an abstract data type, encapsulating a set of data (i.e. attributes) and a corresponding set of permissible actions on the data (i.e. methods). Each object is an autonomous entity that interacts with other objects during the execution of the system. The functionality of the program is viewed at many levels.

Our modernization approach with the KBSR Process and KBSR Repository (approach 2) includes the KBSR Repository and a complete framework in which the KBSR Repository is used for the modernization of a legacy system, or for the development of a new software system, based on the reusable artefacts it contains. Once developed, the KBSR Repository stores the reusable software artefacts and components, and it can then be used in the KBSR Process for development for reuse and with reuse, or for modernization. As more legacy systems are modernized using the KBSR Process, the KBSR Repository will grow, accumulating reusable software artefacts that support a systematic legacy system modernization approach. The KBSR Repository can serve as a single place where all software engineers and developers within an organisation look for reusable software resources. Our approach of using a KBSR Repository may substantially improve software reuse in the software industry. It is widely recognized that present software development and modernization approaches are not adequate for meeting the demand for software reuse [136].
Table 12.7 summarizes the discussion of modernization approach 1 and modernization approach 2 in terms of strengths, weaknesses, software reuse and software architecture reconstruction.


| Modernization Approaches | Strengths | Weaknesses | Software Reuse | Software Architecture Reconstruction |
|---|---|---|---|---|
| Approach 1: Modernization without KBSR Process and KBSR Repository | Maintainability, understandability | Limited support to do architecture reconstruction, cost, extra time and effort to find out software artefacts to reuse | Individual components/modules extracted for reuse | Yes |
| Approach 2: Modernization using Knowledge Based Software Reuse Process and Repository | All strengths of software reuse, maintainability, generating a software reuse repository | Cost, expertise may not be available, limited tool support, searching for reusable components | Yes. Modernization done with software reuse; individual components/modules were reused | Yes |

Table 12.7: Modernization Approach 1 and Modernization Approach 2 based on Strength, Weaknesses, Software Reuse and Software Architecture Reconstruction

There are many approaches to legacy system modernization. The black-box modernization approach is based on software reuse but has limited impact on maintainability, because no software architecture reconstruction is done to understand the components. We believe that reconstructing the architecture of the existing system is an important modernization activity and should be an integral part of a modernization approach. The white-box modernization approach involves understanding the internals of legacy systems, and we believe that architecture reconstruction can help with this activity. Although one of the objectives of white-box modernization is improving reusability, it does not identify individual components or modules for reuse. Architecture Driven Modernization (ADM) transforms the current (legacy) architecture into the target (modernized) architecture but does not involve software architecture reconstruction in the process. Software architects have to understand, analyze, and reason about the as-built software architecture of a system to modernize it [7]. Software architecture reconstruction can support the understanding of existing systems: it allows software architects to form increasingly abstract models of a system, and the resulting artefacts can be used to analyze the quality-driven requirements that arise from the demands of software modernization.

12.7 Discussion and Limitation

Based on the findings from our survey and literature review [18], [20], [21], [22] and [23], we can state that software reuse is widely believed to be one of the most promising techniques for improving software quality and productivity in legacy system modernization. However, as the literature [23], [11], [24] and [25] and our surveys show, several problems still limit software reuse. These range from the scarce availability of reusable components and other software artefacts to the difficulty of retrieving, understanding and adapting the required reusable software artefacts and components. Software engineers find it difficult to locate reusable software components (code related) and reusable software artefacts (non-code related), and our survey results support this finding. Table 12.8 summarizes these issues and concerns of software reuse and how they are incorporated in our KBSR Repository/Process.

| Software Reuse Issues | Concerns of Software Engineering Community | Incorporated in KBSR Repository/Process |
|---|---|---|
| Available software components | Difficult to find | Yes, save it in KBSR Repository |
| Reliable software components | Difficult to find | Yes, save it in KBSR Repository |
| Build from scratch | Build for reuse | Yes, save it in KBSR Repository |
| Finding software components | Where to look for software components | In the KBSR Repository |
| Education in software reuse | Not very common in most organizations | Not incorporated in the KBSR Process |
| NIH Syndrome | NIH syndrome is to be avoided | Yes. Reusable components, because of their more careful design, testing and more extensive usage, can be more reliable. |
| Reusing common assets for different products | Reusing common assets for different products to be a standardized practice | Yes, using KBSR Repository for different products. |
| Domain Knowledge | Domain knowledge is not the key to reuse in SPL | Yes. Software components from different domains can be stored in the KBSR Repository for reuse. |

Table 12.8: Issues and Concerns of Software Reuse and Incorporation of these in our KBSR Repository/Process

We have used the KBSR Process for the modernization of legacy systems to build a systematic legacy system modernization approach, in which components extracted from legacy systems are stored in the KBSR Repository and then retrieved from it as part of the KBSR Process to modernize a legacy system. Our modernization approach is based on software reuse. Modernization of a legacy system may also require reusable components from other sources, such as architectures, product line components and other artefacts from external sources [13]. These components may not be available, as other organizations may not be willing to share their valuable assets. One limitation of the KBSR Repository is the difficulty of finding the right component to reuse. A detailed search capability may be required to find the right component or set of components. The right component may also be absent altogether, and the search capability should be able to indicate this. There are further limitations because of the lack of tool support for understanding the behaviour of reusable software components: it is sometimes difficult to explain what a software component does without knowing how it does it. The approach needs to take advantage of database technologies to improve the storage and retrieval of reusable software components. Because we had only a limited number of software artefacts to store, we used Microsoft Access for our data. A two-tier database architecture could be employed, in which the client is responsible for I/O processing logic and some business-rule logic, while the server performs all data storage and access processing. A three-tier architecture, with data storage on a database server, is another option.
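The search requirement noted above, finding the right component or reporting that none exists, can be illustrated with a simple keyword ranking. This is a hypothetical sketch rather than part of the KBSR tooling: it scores each artefact description by the number of query words it shares, and it returns an empty list, rather than a weak guess, when nothing reaches a minimum score. The repository contents reuse the descriptions from Table 12.5; the function names and scoring rule are assumptions.

```python
import re
from typing import Dict, List, Tuple


def tokens(text: str) -> set:
    """Lower-case word set; underscores split identifiers like total_sale."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(repository: Dict[str, str], query: str,
           min_score: int = 1) -> List[Tuple[str, int]]:
    """Rank artefacts by shared query words; an empty result means 'not found'."""
    wanted = tokens(query)
    scored = [(name, len(wanted & tokens(desc))) for name, desc in repository.items()]
    hits = [(name, score) for name, score in scored if score >= min_score]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


REPO = {
    "SALE": "Calculates total sale from days, file and sale",
    "PAY": "Calculates total pay from days and sale",
    "PROFIT": "Calculates total profit from days, sale and cost",
}
```

Here a query such as "railway timetable" returns an empty list, signalling that no suitable component is stored, while raising min_score filters out weak partial matches.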

12.8 Chapter Summary

Modernization of a legacy system involves analyzing the system to identify its components and their interrelationships and to create a re-structured design. We modernized the ACRSS system, moving it to an object-oriented programming language, using our modernization approaches. We identified that a Knowledge Based Software Reuse (KBSR) Process can support and facilitate software reuse in the modernization process. In order to develop such a new paradigm and supporting environment, it is important to identify the nature of the software reuse happening in the software industry. Modernization with the KBSR Process and KBSR Repository has software reuse as an integral phase, because the reusable components had already been identified. This approach is based on development with software reuse, and it saved us time and cost because the reusable software artefacts were already in the repository. When modernizing without the KBSR Process and KBSR Repository, we required extra time and effort to find out what software artefacts were available to reuse, which is very resource intensive.


CHAPTER 13: CONCLUSION

13.1 Introduction

Modernizing legacy systems is difficult. Constant technological change often weakens the value of legacy systems, which have been developed over the years through huge investments. We have developed a new systematic modernization approach that incorporates software reuse, so that the investment in a legacy system can be utilized during the modernization process through the reuse of its software components and artefacts. Software development communities have expressed considerable concern about software reuse. We identified a number of issues and concerns that these communities have about software reuse, which we believe our systematic modernization approach overcomes. The purpose of this thesis is to build a systematic modernization approach that incorporates software reuse. To do so, we needed to understand the issues and concerns about software reuse in different software development communities and to build a modernization approach that addresses them. Because of the complexity of legacy systems and the difficulty of identifying their reusable components, we first developed a modernization approach in which the functionality of the legacy system is preserved and its maintainability is enhanced. Second, we developed a reuse repository to store reusable software components and artefacts, so that they can be reused during the modernization of the legacy system. To make software reuse an integral part of our modernization approach, we needed to understand why software reuse has not already become an integral part of software development and modernization. This required us to uncover the issues and concerns around software reuse in the different software development communities and to address them when developing our modernization approach.
Thus this thesis presents original work in five areas: reusing code for modernization of legacy systems [241], identifying issues and concerns in software reuse in the conventional software development community [18], identifying issues and concerns in software reuse in the software product line community [20], comparing software reuse issues and concerns in software development communities [263], and creating a knowledge based software reuse process with an associated repository and incorporating these into a systematic modernization approach for legacy systems. We verified our modernization approaches using the Automatic Cane Railway Scheduling System (ACRSS) case study (written in FORTRAN) and validated them on the Theatre System (written in C#). This concluding chapter comprises the following sections: Section 13.2 summarizes the research questions, Section 13.3 summarizes the contributions of our research, and Section 13.4 suggests future research directions.

13.2 Research Questions

We investigated the issues and concerns around software reuse and addressed them in order to develop a systematic software reuse process that, when incorporated into a modernization approach, makes that approach systematic. Towards the development of the systematic legacy system modernization approach, we explored the following research questions:

Research Question 1: What role do software architecture and software reuse play in software system modernization? Research Question 1 allowed us to understand the role software architecture plays in software reuse and in modernization. We examined this research question through a literature survey of software architecture, software reuse, software architecture reconstruction and the toolkits that support reconstruction. We chose four architecture reconstruction toolkits and compared them on different attributes to find out which one best supports software architecture reconstruction. Based on this comparison we chose ARMIN for the software architecture reconstruction of the case study system, ACRSS. Software architecture reconstruction gave us the dependencies among modules and subroutines, as well as design patterns. These suggested what could be modernized to make the system more flexible and agile.
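Reconstruction tools such as ARMIN recover module and subroutine dependencies from source code. The sketch below illustrates only that kind of output and does not show how ARMIN works. It is a minimal example, assuming fixed-form FORTRAN in which `SUBROUTINE` headers and `CALL` statements can be matched with regular expressions. It builds a subroutine call graph and lists the subroutines that call nothing else. The source fragment and its subroutine names are hypothetical, and treating leaf subroutines as reuse candidates is an illustrative heuristic.

```python
import re
from collections import defaultdict

# Hypothetical fixed-form FORTRAN fragment; subroutine names are illustrative
# and are not taken from ACRSS.
SOURCE = """
      SUBROUTINE SCHED
      CALL LOADRN
      CALL ASSIGN(N)
      END
      SUBROUTINE ASSIGN(N)
      CALL LOADRN
      END
      SUBROUTINE LOADRN
      END
"""

SUB_RE = re.compile(r"^\s*SUBROUTINE\s+(\w+)", re.IGNORECASE)
CALL_RE = re.compile(r"\bCALL\s+(\w+)", re.IGNORECASE)


def call_graph(source: str) -> dict[str, set[str]]:
    """Map each SUBROUTINE to the set of subroutines it CALLs."""
    graph: dict[str, set[str]] = defaultdict(set)
    current = None
    for line in source.splitlines():
        # In fixed-form FORTRAN a C or * in column 1 marks a comment line.
        if line[:1] in ("C", "c", "*"):
            continue
        header = SUB_RE.match(line)
        if header:
            current = header.group(1).upper()
            graph.setdefault(current, set())  # keep subroutines with no calls
            continue
        if current is not None:
            graph[current].update(c.upper() for c in CALL_RE.findall(line))
    return dict(graph)


def leaf_subroutines(graph: dict[str, set[str]]) -> list[str]:
    """Subroutines that call nothing else: a simple heuristic starting point
    when looking for self-contained, potentially reusable units."""
    return sorted(name for name, callees in graph.items() if not callees)


graph = call_graph(SOURCE)
print(leaf_subroutines(graph))  # ['LOADRN']
```

A real reconstruction also needs data dependencies (for example, shared COMMON blocks) and free-form source handling. This sketch covers call relationships only.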

Research Question 2: What are the issues and concerns about software reuse in the software engineering communities that will have an impact on software system modernization? To examine this research question, we carried out a number of surveys. Their aims were to better understand why software reuse has not been adopted and to identify the issues and concerns around reuse for system modernization and development in the software engineering community. We identified two communities. The first is the Conventional Software Engineering (CSE) Community, where software reuse has not been adopted as a core activity in software development. The second is the Software Product Line (SPL) Community, where software reuse is a core activity. We carried out similar surveys in both communities to identify the issues and concerns in software reuse.

Research Question 3: What is a systematic legacy system modernization approach that incorporates software architecture reconstruction and software reuse? Research Question 3 clarified what a systematic legacy system modernization approach is and how our proposed approach differs from other modernization approaches. We used software architecture reconstruction as part of our modernization approach. While applying the approach we could identify reusable software artefacts, which led us to explore why software reuse has not become an integral part of any modernization approach to date. We explored this research question through a literature review of the modernization approaches used in the software industry and the problems associated with each. We developed a modernization approach that incorporates software architecture reconstruction, and we used the Automatic Cane Railway Scheduling System (ACRSS) and code from it to illustrate the approach.

Research Question 4: How do we address the issues and concerns of software reuse to make it an integral part of a systematic legacy system modernization approach? We compared the survey results on software reuse from both communities and addressed the major issues and concerns when developing our systematic software reuse process. We called this process the Knowledge Based Software Reuse (KBSR) Process and made it an integral part of our systematic legacy system modernization approach. We examined this research question in two ways. First, we combined the results of research questions RQ1, RQ2 and RQ3. Second, we applied the proposed systematic legacy system modernization approach to the modernization of a legacy system. That approach incorporates software architecture reconstruction and software reuse through the KBSR Process and the KBSR Repository.

13.3 Contributions and Significance of the Research

This section presents our contributions to the research community and their significance. They can be grouped into four areas that address the research questions stated in the previous section, and are discussed below.

13.3.1 Reusing Code for Our Legacy Modernization Approach

Software architecture plays a major role in modernization because the system must be understood and its reusable components identified. Chapter 6 of this thesis presents our first modernization approach, which reuses code for the modernization of legacy systems based on software architecture reconstruction. In this approach the software architecture of the legacy system is reconstructed, and software artefacts for reuse are identified and restructured. We demonstrated the approach on a number of subroutines of the case study, the Automatic Cane Railway Scheduling System (ACRSS). We modernized one independent module of the system in an object-oriented programming language and returned it to the organization. We also performed a few perfective maintenance tasks, such as adding security features to a few subroutines. We found that our modernized system is more capable of evolution than the legacy system, to which we could not add the security feature.

Our modernization approach has three activities. Activity 1 is to reverse engineer an analysis model using software architecture reconstruction. The input to this activity is the legacy system and the output is an analysis model, which describes the architecture of the system. To find the architecture of the system we identified the system's components and the interrelationships between them. Activity 1 consists of two phases: Phase 1 analyzes the legacy system and Phase 2 reconstructs its architecture. Analysis of the legacy system is an important part of our modernization, because it creates a description of each module and each data item of the legacy system. We used ARMIN to reconstruct the architecture. This let us understand the interfaces between components and subsystems, and the characteristics of those components and subsystems. Activity 2, restructuring into an object model, consists of one phase called restructuring of the analyzed components.
Restructuring gives the logical structure of the existing software. As a result of restructuring we identified several kinds of artefacts: an object model consisting of DFDs, FDDs and UML diagrams, software patterns, libraries and legacy documents. Activity 3, forward engineering using object-oriented methods, achieves the objective of modernizing the legacy system. The components of the legacy system to be reused have already been identified for object-oriented implementation in the previous activities. This activity consists of only one phase, called Implementation (procedural to object-oriented programming), and it produces the modernized system. Reconstruction and restructuring are the prime activities for identifying reusable code. Modernization of the legacy system also involves analyzing the system to identify its components and their interrelationships and to create a restructured design. Our contribution is reusing code for the modernization of legacy systems using our software architecture reconstruction (SAR) based approach to modernization.
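The Implementation phase turns procedural code into object-oriented code. The thesis targeted an object-oriented language, and its ACRSS code is not reproduced here. The sketch below is therefore a hypothetical Python illustration of that step. A routine that reads shared global data, a style common in FORTRAN programs with COMMON blocks, is restructured into a class that encapsulates the data and its operations while keeping the same business rule. The names `LOADS`, `CAPACITY` and `Train`, and the overload rule itself, are invented for illustration.

```python
# Before: procedural style, with state held in module-level data
# (standing in for FORTRAN COMMON-block variables). Values are illustrative.
LOADS = [12.0, 8.5, 10.0]   # hypothetical wagon loads (tonnes)
CAPACITY = 30.0             # hypothetical train capacity (tonnes)


def total_load():
    """Procedural routine that reads shared global state."""
    total = 0.0
    for load in LOADS:
        total += load
    return total


def is_overloaded():
    return total_load() > CAPACITY


# After: the same logic encapsulated in a class. Each instance owns its data,
# and the business rule (the overload check) is preserved unchanged.
class Train:
    def __init__(self, loads, capacity):
        self._loads = list(loads)
        self._capacity = capacity

    def total_load(self):
        return sum(self._loads)

    def is_overloaded(self):
        return self.total_load() > self._capacity
```

Because the class preserves the same rule, checks written against the procedural version can be rerun against the object-oriented version to confirm that functionality is preserved.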

13.3.2 Issues and Concerns about Software Reuse in the Software Engineering Communities

Chapter 8 of this thesis presents the results of our first survey on the issues and concerns in software reuse for modernization. The contribution of this chapter is to identify and explore the issues and concerns in software reuse in the first community of interest, the Conventional Software Engineering (CSE) community. Because modernization is inherently reuse-oriented, the issues and concerns in software reuse must be identified for modernization purposes. We conducted a survey of software reuse whose focus was to identify these issues and concerns. The results may pave the way to a suitable software reuse process for software organizations and to more efficient methods of software reuse for the software community. We followed the survey framework discussed in Chapter 7. We found that software engineers and developers are interested in reusing available software components if the components are reliable. We also found that most software engineers would prefer to reuse software rather than build it from scratch. This contradicts the common wisdom that software engineers prefer building things themselves. Our survey also revealed that software engineers have trouble finding the right, reliable component to reuse because they do not know where to look for components. Education on software reuse is uncommon in most organizations, and this affects software reuse. Reuse and domain engineering need to be integrated and put into practice more commonly if the Not Invented Here (NIH) syndrome, which inhibits common reuse practices among software engineers, is to be avoided. It is argued that reusable components can be more reliable because of their more careful design and testing and their broader and more extensive usage. If so, using these more reliable components in a system can increase the reliability of the system as a whole.
Chapter 9 of this thesis presents the issues and concerns in software reuse in the Software Product Line (SPL) community. As discussed in Chapter 2, one of the reasons for introducing software product lines is to reduce costs by reusing common assets across different products. The SPL community is the second community of interest, one in which software reuse is very prominent and is a core concept. Software reuse has, however, failed to become a standardized practice in SPL. We wanted to answer the question: "What are the issues and concerns of software reuse in the software product line community?"

We used the survey method for several purposes: to identify constraints on software reuse; to determine the extent to which reuse takes place in SPL projects today; to find out how developers and programmers use and reuse different types of code, how much testing they perform on the code, and whether the code is more reliable; and to identify the issues and concerns with Integrated Development Environments (IDEs) that inhibit software reuse in today's SPL community. One key finding from this survey was that domain knowledge is not the key to reuse in SPL. Another was that potentially no time at all may be spent on implementation, which means that either there is 100% reuse or products are generated automatically from the core assets. Organizations should keep component code, the architecture, and the documentation of both in sync to enable systematic reuse in SPL. The issues and concerns identified in the SPL community have raised many unanswered questions about software reuse. Software reuse is the key concept in our modernization approach, and the results of this survey allowed us to address reuse-related issues in the modernization approach discussed in Chapter 6. Software architecture can be used in a product line setting to support better identification, reuse and integration of reliable components, although documenting a software architecture requires a substantial amount of effort. Software architecture is considered a key issue in software reuse and hence in our modernization approach. The list of issues and concerns given in this chapter can be used to implement a new software reuse process in the software development life cycle and in the modernization approach used in Chapter 6. Current product engineering and derivation processes in SPL can be analyzed for systematic reuse of software. This analysis suggests a systematic reuse process, which we have incorporated into our modernization approach.
Our contribution is the identification of issues and concerns in software reuse in the Conventional Software Engineering community and the Software Product Line community. This explains why software development communities do not make use of software reuse for software development and software modernization.

13.3.3 A Comparison of Software Reuse in Software Development Communities

Chapter 10 of this thesis compares the issues and concerns in software reuse in the Conventional Software Engineering community and the Software Product Line community, and describes how the two communities differ in their approach to software reuse. We used the survey results from both communities for the comparison. Our surveys found that there is no systematic reuse process in the CSE community. The SPL community has non-standard reuse processes, which may vary from organization to organization but are neither systematic nor consistent. In the SPL community, reusable components are tested during domain engineering, but there is normally no certification of components. Product line engineers nonetheless have a high level of trust that the components are reliable. Code certification is normally absent in the CSE community as well, and because of this, software engineers in the CSE community do not feel comfortable reusing components. Software engineers need a base of properly catalogued and documented reusable components. This suggests that all reliable and well-tested components can go into a knowledge based software reuse repository for reuse.

Reuse education has been identified as a definite influence on the way software reuse is performed in both the CSE and SPL communities. There are a few courses on general software reuse, but they are all context-specific, especially around software product lines. Software engineers working on modernization and software development need a full understanding of what can be reused and how. The NIH syndrome does not exist in the SPL community, even when documentation or architecture is missing and there is no traceability between requirements and design artefacts. By contrast, respondents from the CSE community showed more concern about the NIH syndrome. To overcome it, the CSE community needs to raise the expectation that software built for other projects should be reused. When a new component is needed, existing components should be investigated to see whether they can satisfy that need before the component is built from scratch. Advance planning for and with software reuse must be done in both communities. Software architecture has been identified as one of the facilitators of software reuse in both communities.

The objective of comparing software reuse in the CSE and SPL communities is to understand how the two communities differ in practising software reuse. This comparison has revealed a basic need for a knowledge based software reuse repository. Any software engineer involved in modernizing a legacy system can use such a repository to improve the system's quality attributes while keeping its business functionality the same. Our contribution is a comparison of software reuse in the two software development communities. It addresses why software reuse is not an integral part of the software development life cycle and software modernization, and what each community can learn from the other to make software reuse an integral part of both.

13.3.4 Systematic Legacy System Modernization Approach

Chapter 11 of this thesis presents our Knowledge Based Software Reuse (KBSR) Process, which is incorporated into our modernization approach to build a systematic legacy system modernization approach. The findings from our surveys and the literature show that software reuse is widely believed to be one of the most promising techniques for improving software quality and productivity in legacy system modernization. However, as the literature and our surveys show, several problems still limit software reuse. These range from the scarce availability of reusable components and other software artefacts to the difficulty of retrieving, understanding and adapting the required reusable software artefacts and components. Our KBSR Process involves several software reuse phases that help software engineers develop a software system with systematic reuse. We used the KBSR Repository in the KBSR Process to build a systematic legacy system modernization approach that incorporates systematic software reuse.

The case study for our systematic legacy system modernization approach was the ACRSS legacy system. We reused the EMPLOYEE class from ACRSS in another application, the Theatre System written in C#, to validate that components in the KBSR Repository can be reused in the modernization of other systems. Developing the KBSR Repository and the KBSR Process has shown that reuse can be systematized, which makes our modernization approach a systematic legacy system modernization approach. The KBSR Process for systematic software reuse is based on an understanding of the issues and concerns of software reuse in both communities. The reusable artefacts and components are stored in a repository called the Knowledge Based Software Reuse (KBSR) Repository. Together, the KBSR Process and the KBSR Repository help preserve the long-term investment in the legacy system. The KBSR Process systematizes software reuse. Its associated KBSR Repository stores reusable components and software artefacts and captures current and past knowledge of software reuse. Together they form a systematic legacy system modernization approach. Our contribution is this systematic legacy system modernization approach. It addresses the issue of undocumented system architecture and provides a long-term solution to problems of legacy systems such as lack of agility, openness and flexibility, qualities that software needs in order to evolve.
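This section does not specify the data model of the KBSR Repository. The sketch below is a minimal in-memory stand-in, assuming a hypothetical schema of name, kind, source system, description, keywords and a tested flag. It shows how catalogued, documented artefacts could be stored and retrieved by keyword, so that an artefact such as the EMPLOYEE class from ACRSS can be found again when another system is built. The class names, fields and the default filter that excludes untested artefacts are illustrative assumptions, not the thesis's design.

```python
from dataclasses import dataclass, field


@dataclass
class Artefact:
    """One catalogued reusable artefact (hypothetical schema)."""
    name: str
    kind: str             # e.g. "class", "subroutine", "design document"
    source_system: str
    description: str
    keywords: set[str] = field(default_factory=set)
    tested: bool = False  # whether the artefact has passed its tests


class ReuseRepository:
    """In-memory stand-in for a reuse repository; not the actual KBSR design."""

    def __init__(self):
        self._items: dict[str, Artefact] = {}

    def add(self, artefact: Artefact) -> None:
        if artefact.name in self._items:
            raise ValueError(f"duplicate artefact: {artefact.name}")
        # Normalize keywords so that lookups are case-insensitive.
        artefact.keywords = {k.lower() for k in artefact.keywords}
        self._items[artefact.name] = artefact

    def find(self, keyword: str, tested_only: bool = True) -> list[Artefact]:
        """Return artefacts tagged with keyword, by default only tested ones."""
        kw = keyword.lower()
        return sorted(
            (a for a in self._items.values()
             if kw in a.keywords and (a.tested or not tested_only)),
            key=lambda a: a.name,
        )


repo = ReuseRepository()
repo.add(Artefact("EMPLOYEE", "class", "ACRSS", "illustrative entry",
                  {"Employee", "staff"}, tested=True))
print([a.name for a in repo.find("employee")])  # ['EMPLOYEE']
```

Filtering on a tested flag by default reflects the survey finding that engineers reuse components they trust to be reliable. A production repository would persist entries and record richer certification data.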

13.4 Future Work

The systematic modernization approach should be applied to other legacy systems, possibly written in other languages, to further validate it. A number of research issues related to the modernization of legacy systems should be addressed to extend the KBSR Process. Software modernization should be based on the knowledge already created during the development of the software, which makes software reuse an integral part of legacy system modernization. While developing the KBSR Process we did not emphasise capturing the reasoning behind the design decisions made during software development. Once this rationale is captured, it will help to gather information about software reuse at the architecture and component levels. Gathering this information is one future direction for our work.

As the saying goes, "no pain, no gain", and software reuse is no exception: it requires substantial effort. Most programming languages, including object-oriented languages, provide features that support software reuse. However, simply writing code in those languages does not promote reusability; software artefacts must be designed for reusability using the features of object-oriented programming. Replicating an entire software program does not count as reuse. Reuse of assets depends on both the similarities and the differences between the applications in which a software artefact is used. The costs of using the KBSR Process and the KBSR Repository will include establishing and maintaining reusable software artefacts, searching for applicable software artefacts, and adapting software artefacts toward a proper implementation. As the number of software artefacts in the KBSR Repository grows, an algorithm will be needed to retrieve the most appropriate reusable components effectively and efficiently. Such an algorithm should combine techniques such as clustering algorithms, Singular Value Decomposition (SVD) and the Naive Bayes algorithm. A comparative study should be done to find out which algorithm is best for retrieving software artefacts from the KBSR Repository. An advanced indexing method should also be proposed for storing a large number of software artefacts in the KBSR Repository. Software reuse processes and procedures must be incorporated into existing software development processes, and the KBSR Process, when used in software development, will make software reuse possible.
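A comparative study of retrieval algorithms needs a baseline. The sketch below ranks repository artefacts against a free-text query using TF-IDF weighting with cosine similarity. This is a swapped-in technique, simpler than the clustering, SVD and Naive Bayes combination proposed above, and the thesis did not evaluate it. The artefact descriptions, the function names and the smoothed IDF formula are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical artefact descriptions, keyed by artefact name.
DOCS = {
    "EMPLOYEE": "employee record name roster wages",
    "SCHEDULE": "train schedule rail siding delivery",
    "SECURITY": "user login password access check",
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> tuple[dict[str, dict[str, float]], dict[str, float]]:
    """Build a TF-IDF vector for each document, plus the IDF table."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n = len(docs)
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    # Smoothed IDF: terms present in every document still get a small weight.
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    vectors = {
        name: {t: c * idf[t] for t, c in Counter(toks).items()}
        for name, toks in tokenized.items()
    }
    return vectors, idf


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query: str, docs: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (name, score) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unknown to the repository are ignored.
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scored = [(name, cosine(q, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


print(rank("employee wages", DOCS)[0][0])  # EMPLOYEE
```

A study comparing latent semantic (SVD-based) or classifier-based retrieval could use a ranking like this as its reference point.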
Using the KBSR Process, which incorporates the KBSR Repository, will reduce the cost of software development by reusing software artefacts instead of writing them from scratch. The KBSR Repository should be used in the modernization of different legacy systems to make them easier to evolve. A KBSR Repository can be built for each organization, and the repositories can be centralized at a later stage. Creating a centralized KBSR Repository raises a question: would an organization really want to expose its organization-specific software artefacts and components to other organizations? This question needs further research. Many database principles can be used to optimize the use of the KBSR Repository for the modernization of legacy systems. Horizontal or vertical partitioning rules can be applied to the KBSR Repository so that organizations can maintain the security of their software artefacts. Some software artefacts can be made free of charge, while others, if the organization requires, can be offered on a fee-for-use basis. Some software artefacts can be stored in the KBSR Repository for sharing within the same business unit. Others can be stored for use across different business units in the same organization, and others for use across different organizations. The choice depends on what the organizations would like to share and how. Depending on which reusable software artefacts are used, communication, reliability and cost models can be designed to suit the organizations. Software reuse education should be made an important part of the software engineering curriculum. Software engineers should be taught how to search for reusable software artefacts in the KBSR Repository and how to add and index reusable software artefacts in it. Software engineers must be trained in the skills of software reuse.
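The sharing levels described above amount to a horizontal partition of the repository's rows by visibility. The sketch below assumes a hypothetical three-level scope (business unit, organization, public) and a simple per-artefact fee field, neither of which is defined by the thesis. It filters the artefacts a requester may see, given the requester's organization and business unit.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    BUSINESS_UNIT = 1  # visible only within the owning business unit
    ORGANIZATION = 2   # visible to all business units of the owning organization
    PUBLIC = 3         # visible to other organizations as well


@dataclass(frozen=True)
class SharedArtefact:
    name: str
    owner_org: str
    owner_unit: str
    scope: Scope
    fee: float = 0.0   # 0.0 means free of charge; otherwise fee-for-use


def visible_to(artefact: SharedArtefact, org: str, unit: str) -> bool:
    """Apply the hypothetical sharing rules to one artefact."""
    if artefact.owner_org == org:
        if artefact.scope is Scope.BUSINESS_UNIT:
            return artefact.owner_unit == unit
        return True
    return artefact.scope is Scope.PUBLIC


def partition_for(org: str, unit: str, artefacts: list[SharedArtefact]) -> list[SharedArtefact]:
    """Horizontal partition: the subset of rows this requester may see."""
    return [a for a in artefacts if visible_to(a, org, unit)]
```

Vertical partitioning, such as hiding source code columns while exposing interface descriptions, and billing for fee-for-use artefacts would need further design beyond this sketch.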
Despite the initial overhead, software reuse offers high benefits if appropriate processes, such as the KBSR Process, and repositories, such as the KBSR Repository, are used. Future work could also develop tool support for the KBSR Process, deploy it, and evaluate the approach on larger or different systems. The KBSR Process can also be improved by studying and surveying current modernization efforts in industry and academia.

REFERENCES

[1] J. Bisbal, D. Lawless, B. Wu, and J. Grimson, "Legacy Information Systems: Issues and Directions", IEEE Software, vol. 16, no. 5, pp. 103-111, 1999.

[2] N. H. Weiderman, J. K. Bergy, D. B. Smith, and S. R. Tilley, "Approaches to Legacy System Evolution", Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 15213, December 1997.

[3] K. Bennett, "Legacy Systems", IEEE Software, vol. 12, no. 1, pp. 19-23, 1995.

[4] N. Weiderman, L. Northrop, D. Smith, S. Tilley, and K. Wallnau, "Implications of Distributed Object Technology for Reengineering", Technical Report CMU/SEI-97-TR-005, Carnegie Mellon University, Pittsburgh, June 1997.

[5] M. Lehman, D. Perry, and J. Ramil, "Implications of Evolution Metrics on Software Maintenance", in Proceedings of the International Conference on Software Maintenance, 1998, pp. 208-217.

[6] J. Koskinen, J. J. Ahonen, and H. Sivula, "Software Modernization Decision Criteria: An Empirical Study", in Proceedings of the Ninth European Conference on Software Maintenance and Reengineering (CSMR'05), Washington, D.C., USA, 2005.

[7] R. C. Seacord, D. Plakosh, and G. A. Lewis, Modernizing Legacy Systems: Addison-Wesley Professional, 2001.

[8] J. Kral and M. Zemlicka, "Software Architecture for Evolving Environment", in 13th IEEE International Workshop on Software Technology and Engineering Practice, Budapest, Hungary, 2005.

[9] R. Malan and K. Wentzel, "Economics of Software Reuse Revisited", Hewlett Packard, 1993.

[10] P. Wegner, "Varieties of Reusability", in Tutorial: Software Reusability, P. Freeman, Ed. Washington, D.C.: IEEE Computer Society Press, 1987.

[11] Y. Kim and E. A. Stohr, "Software Reuse: Survey and Research Directions", Journal of Management Information Systems, vol. 14, no. 4, pp. 113-145, Spring 1998.

[12] W. Schafer, R. Prieto-Diaz, and M. Masao, Historical Overview: Software Reusability: Ellis Horwood, 1994.

[13] M. Jha and L. O'Brien, "Re-engineering Legacy Systems for Modernization: The Role of Software Reuse", accepted in The Second International Conference on Advances in Computer Sciences and Electronics Engineering, New Delhi, India, 23-24 February 2013.

[14] M. Shaw and D. Garlan, Software Architecture: Perspectives on an Emerging Discipline: Prentice-Hall, 1996.

[15] P. Hsia, A. Davis, and D. Kung, "Status Report: Requirements Engineering", IEEE Software, vol. 1, no. 1, pp. 75-79, 1993.

[16] M. L. Griss, J. Favaro, and P. Walton, Managerial and Organizational Issues - Starting and Running a Software Reuse Program: Ellis Horwood, New York, 1994.

[17] W. Tracz, Confessions of a Used Program Salesman: Institutionalizing Software Reuse: Addison-Wesley, 1995.

[18] M. Jha, L. O’Brien, and P. Maheshwari, "Identify Issues and Concerns in Software Reuse", in International Conference on Information Processing (ICIP), Bangalore, India, 8-10 August 2008.

[19] M. Jha, L. O’Brien, and P. Maheshwari, "Identify Issues and Concerns in Software Reuse", Journal of Information Processing, 2008.

[20] M. Jha and L. O’Brien, "Identifying Issues and Concerns in Software Reuse in Software Product Lines", in 11th International Conference on Software Reuse (ICSR 2009), Falls Church, VA, USA, 27-30 September 2009.

[21] M. L. Griss and M. Wosser, "Making Reuse Work in Hewlett-Packard", IEEE Software, vol. 12, no. 1, pp. 105-107, January 1995.

[22] M. L. Griss, "Reuse Comes in Several Flavours", Flashline Software Development Productivity Council, Flashline white paper, 2003.

[23] C. McClure, Software Reuse: Wiley-IEEE Computer Society Press, New York, 2001.

[24] W. Frakes and K. Kang, "Software Reuse Research: Status and Future", IEEE Transactions on Software Engineering, vol. 31, no. 7, pp. 529-536, 2005.

[25] M. Morisio, M. Ezran, and C. Tully, "Success and Failures in Software Reuse", IEEE Transactions on Software Engineering, vol. 28, no. 4, pp. 340-357, April 2002.

[26] N. E. Fenton and M. Neil, "Software Metrics: Successes, Failures and New Directions", The Journal of Systems and Software, vol. 47, pp. 149-157, 1999.

[27] L. Bass, P. Clements, and R. Kazman, Software Architecture in Practice, 2nd ed.: Addison-Wesley, 2003.

[28] D. L. Parnas, "", in Proceedings of The 16th International Conference on Software Engineering, 1994.

[29] P. Clements, R. Kazman, and M. Klein, Evaluating Software Architectures: Methods and Case Studies: Addison-Wesley, 2000.

[30] S. Comella-Dorda, K. Wallnau, R. Seacord, and J. Robert, "A Survey of Legacy System Modernization Approaches", Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 15213, April 2000.

[31] W. W. Royce, "Managing the Development of Large Software Systems", in IEEE WESCON, August 1970.

[32] J. Sametinger, Software Engineering with Reusable Components: Springer-Verlag, 1997.

[33] L. Bass, P. Clements, and R. Kazman, Software Architecture in Practice: Addison-Wesley, 1998.

[34] R. Downey and S. Milligan, "Reengineering and Rebuild With X-Analysis", Databorough Limited, May 2009.

[35] P. Kruchten, The Rational Unified Process (RUP): An Introduction, 2nd ed.: Addison-Wesley, 2000.

[36] P. Kroll and P. Kruchten, The Rational Unified Process Made Easy: A Practitioner's Guide to the RUP: IBM Software Group, 2003.

[37] L. B. S. Raccoon, "The Chaos Model and the Chaos Life Cycle", ACM Software Engineering Notes, vol. 20, no. 1, pp. 55-66, January 1995.

[38] K. Beck, Extreme Programming Explained: Addison-Wesley, 2000.

[39] D. Rosenberg, M. Stephens, and M. Collins-Cope, Agile Development with the ICONIX Process: People, Process and Pragmatism: Apress, 2005.

[40] B. W. Boehm, J. R. Brown, and M. Lipow, "Quantitative Evaluation of Software Quality", in Second International Conference on Software Engineering, October 1976, pp. 592-605.

[41] D. D. McCracken and M. A. Jackson, "Life cycle concept considered harmful", ACM SIGSOFT, vol. 7, no. 2, pp. 29-32, April 1982.

[42] G. R. Gladden, "Stop the life-cycle, I want to get off", ACM SIGSOFT, vol. 7, no. 2, pp. 35-39, April 1982.

[43] M. D. Lubars and M. T. Harandi, "Addressing the software life cycle dilemma through knowledge-based design", Workshop Models Languages Software Specific Design, March 1984.

[44] D. C. Rine and N. Nada, "An empirical study of software reuse reference model", Information and Software Technology, vol. 42, no. 1, pp. 47-65, January 2000.

[45] T. A. Corbi, "Program Understanding Challenge for the 1990's", IBM Systems Journal, vol. 28, no. 2, pp. 294-306, 1990.

[46] B. Meyer and C. Mingins, "Component-Based Development: From Buzz to Spark", IEEE Software, vol. 32, no. 7, pp. 35-37, 1999.

[47] L. Erlikh, "Leveraging Legacy Systems Dollars for E-Business", IEEE Computer, vol. 2, pp. 17-23, 2000.

[48] L. O'Brien, D. Smith, and G. Lewis, "Supporting Migration to Services using Software Architecture Reconstruction", in IEEE International Workshop on Software Technology and Engineering Practice (STEP 2005), Budapest, Hungary, 2005.

[49] R. Land, J. Carlson, S. Larsson, and I. Crnkovic, "Towards Guidelines for a Development Process for Component-Based Embedded Systems", in Workshop on Software Engineering Processes and Application (SEPA), Yongin, Korea, 2009.

[50] K. R. Yeong and E. A. Stohr, "Software reuse: Survey and Research Directions", Journal of Management Information Systems, vol. 14, no. 4, pp. 113-149, 1998.

[51] H. Mili, F. Mili, and F. A. Mili, "Reusing Software: Issues and Research

Directions", IEEE Transaction on Software Engineering, vol. 21 no 6, pp.

528-561, 1995.

[52] N. E. Fenton and N. Ohlsson, "Quantitative Analysis of Faults and Failures in

Complex Software System", IEEE Transaction on Software Engineering, vol.

26 no 8, pp. 797-814, 2000.

[53] I. Sommerville, Software Engineering, Seventh ed.: Pearson Education, 2010.

[54] J. McGee "Legacy Systems: Why History Matters", Enterprise Systems

Journal, 10th November 2005.

[55] M. L. Brodie and M. Stonebraker, Migrating Legacy Systems: Gateways,

interfaces and the Incremental Approach: Morgan Kaufman 1995.

[56] A. Lauder and M. Lind, "Legacy Systems: Assets or Liabilities? A Language Action Perspective on Respecting and Reflecting Negotiated Business Relationships in Information Systems", Working Paper in Boras Studies of Information Systems, University College of Boras, Sweden, vol. 20, 2006.

[57] S. K. Mishra, D. S. Kushwaha, and A. K. Misra, "Creating Reusable Software Component from Object-Oriented Legacy System through Reverse Engineering", The Journal of Object Technology (ETH Zurich), vol. 8, no. 5, pp. 133-152, July-August 2009.

[58] J. Ransom, I. Sommerville, and I. Warren, "A Method for Assessing Legacy Systems for Evolution", in Proceedings of the Second Euromicro Conference on Software Maintenance and Reengineering (CSMR98), 1998.

[59] M. Battaglia, G. Savoia, and J. Favaro, "RENAISSANCE, A Method to Migrate from Legacy to Immortal Software Systems", presented at the 2nd Euromicro Conference on Software Maintenance and Reengineering, Florence, Italy, 1998.

[60] J. Bisbal, D. Lawless, B. Wu, J. Grimson, V. Wade, R. Richardson, and D. Garlan, "An Overview of Legacy Information System Migration", presented at the Proceedings of the 4th Asian-Pacific Software Engineering and International Computer Science Conference (APSEC 97, ICSC 97), Clear Water Bay, Hong Kong, 1997.

[61] M. M. Lehman and L. Belady, "Program Evolution: Processes of Software Change", Academic Press, London, 1985.

[62] "Legacy Modernization Survey", Attachmate September 11, 2009.

[63] The Standish Group, "Extreme Chaos", 2001.

[64] L. Erlikh, "Leveraging Legacy Systems In Modern Architectures", ZJournal, www.zjournal.com, 2003.

[65] H. M. Sneed, "Economics of Software Reengineering", Software Maintenance: Research and Practice, vol. 3, pp. 163-182, 1991.

[66] B. Erricson-Connor, "Truth and Consequences", ZJournal, pp. 38-43, August/September 2003.

[67] Z. Li, X. Anming, Z. Naiyue, H. Jianbin, and C. Zhong, "A SOA Modernization Method Based on Tollgate Model", in 2009 International Symposium on Information Engineering and Electronic Commerce, Ternopil, Ukraine, 16-17 May 2009.

[68] T. P. Morgan, "Legacy Application Modernization Strategies Hinge on SOA", The Four Hundred iSeries and AS/400 Insight, vol. 15, no. 40, October 6, 2006.

[69] P. Winsberg, "Legacy Code: Don't Bag It, Wrap It", Datamation, vol. 41, no. 9, pp. 36-41, 1995.


[70] H. M. Sneed, "Encapsulating Legacy Software for Use in Client/Server Systems", in 3rd Working Conference on Reverse Engineering, Monterey, California, USA, pp. 104-119, November 1996.

[71] N. Ganti and W. Brayman, Transition of Legacy Systems to a Distributed Architecture: John Wiley & Sons Inc., 1995.

[72] M. L. Brodie and M. Stonebraker, Migrating Legacy Systems: Gateways, Interfaces & the Incremental Approach: Morgan Kaufmann, 2007.

[73] D. Aebi, "Data Re-engineering - A Case Study", in 1st East-European Symposium on Advances in Database and Information Systems (ADBIS'97), St. Petersburg, Russia, September 1997.

[74] A. J. O'Callaghan, Practical Experiences of Object Technology. Cheltenham: Stanley Thornes in association with UNICOM, 1996.

[75] J. Bisbal, D. Lawless, B. Wu, and J. Grimson, "Legacy Information System Migration: A Brief Review of Problems, Solutions and Research Issues", Trinity College, Dublin, Ireland, 1999.

[76] D. F. Carr, "Web-Enabling Legacy Data When Resources Are Tight", Internet World, August 10, 1998.

[77] R. Altman, Y. Natis, J. Hill, J. Klein, B. Lheureux, M. Pezzini, R. Schulte, and S. Varma, "Middleware: The Glue for Modern Application", Gartner Group, Strategic Analysis Report, 26 July 1999.

[78] D. Eichmann, "Application Architectures for Web Based Data Access", Proceedings of the Workshop Web Access to Legacy Data, Fourth International WWW Conference, Boston, Massachusetts, USA, 11-14 December 1995.


[79] R. Perez-Castillo, I. Garcia-Rodriguez de Guzman, M. Piattini, and O. Avila-Garcia, "On the use of ADM to Contextualize Data on Legacy Source Code for Software Modernization", in 16th Working Conference on Reverse Engineering, Antwerp, Belgium, 15-18 October 2008.

[80] E. Sciore, M. Siegel, and A. Rosenthal, "Using Semantic Values to Facilitate Interoperability among Heterogeneous Information Systems", ACM Transactions on Database Systems, vol. 19, no. 2, pp. 254-290, 1994.

[81] OMG, "ADM Task Force by OMG", 2007 (accessed 09/06/2009).

[82] P. Newcomb, "Architecture-Driven Modernization (ADM)", in Proceedings of the 12th Working Conference on Reverse Engineering (WCRE'05), Washington, DC, USA, 2005.

[83] OMG, Architecture-Driven Modernization (ADM): Knowledge Discovery Meta-Model (KDM), v1.1, 2009. Available: http://www.omg.org/spec/KDM/1.1/PDF/

[84] G. Kotonya and J. Hutchinson, "A COTS-Based Approach for Evolving Legacy Systems", presented at the Sixth International Conference on Commercial-Off-the-Shelf (COTS)-Based Software Systems (ICCBSS'07), Alberta, Canada, February 26 - March 2, 2007.

[85] G. Kotonya and J. Hutchinson, "Managing Change in COTS-Based Systems", in Proceedings of 21st IEEE International Conference on Software Maintenance (ICSM), Budapest, Hungary, 25-30 September 2005.

[86] J. M. Voas, "The Challenges of Using COTS Software In Component-Based Development", Computer, vol. 44, pp. 31-37, 1998.

[87] D. McIlroy, "Mass-produced Software Components", in Proceedings of Software Engineering Concepts and Techniques, pp. 138-155, 1968.

[88] M. Morisio, M. Ezran, and C. Tully, "Success and Failures in Software Reuse", IEEE Transactions on Software Engineering, vol. 28, no. 4, pp. 340-357, April 2002.

[89] W. B. Frakes and C. J. Fox, "Sixteen Questions about Software Reuse", Communications of the ACM, vol. 38, no. 6, pp. 75-87, June 1995.

[90] H. Mili, A. Mili, S. Yacoub, and E. Addy, Reuse-based Software Engineering: Techniques, Organizations, and Controls: John Wiley & Sons, 2002.

[91] C. W. Krueger, "Software Reuse", ACM Computing Surveys (CSUR), vol. 24, no. 2, pp. 131-183, June 1992.

[92] D. C. Schmidt, "A Family of Design Patterns for Application-level Gateways", The Theory and Practice of Object Systems (Special Issue on Patterns and Pattern Languages), vol. 2, no. 1, 1996.

[93] M. Gualtieri and M. Gilpin, "Achieving Optimal Reuse of a Broader Range of Assets. Best Practices: A Pragmatic Approach to Software Reuse", Report, June 2008.

[94] M. L. Griss, "Systematic Software Reuse: Architecture, Process and Organization are Crucial", Software Technology Laboratory, HP Laboratories, October 1996.

[95] V. Basili, L. Briand, and W. Melo, "How Reuse Influences Productivity in Object-Oriented Systems", Communications of the ACM, vol. 39, no. 10, pp. 104-116, 1996.

[96] A. Endres, "Lessons Learned in an Industrial Software Lab", IEEE Software, vol. 10, no. 5, pp. 58-61, September 1993.


[97] D. Bauer, "A Reusable Parts Center", IBM Systems Journal, vol. 32, no. 4, pp. 620-624, 1993.

[98] M. L. Griss, "Software Reuse Experience at Hewlett-Packard", in 16th International Conference on Software Engineering, Sorrento, Italy, 1994.

[99] R. Joos, "Software Reuse at Motorola", IEEE Software, vol. 11, no. 5, pp. 42-47, September 1994.

[100] W. B. Frakes and S. Isoda, "Success Factors of Systematic Software Reuse", IEEE Software, vol. 12, no. 1, pp. 14-19, January 1995.

[101] D. C. Rine, "Success Factors for Software Reuse that are Applicable across Domains and Businesses", presented at the ACM Symposium on Applied Computing, San Jose, USA, 1997.

[102] M. A. Rothenberger, K. J. Dooley, U. R. Kulkarni, and N. Nada, "Strategies for Software Reuse: A Principal Component Analysis of Reuse Practices", IEEE Transactions on Software Engineering, vol. 29, no. 9, pp. 825-837, September 2003.

[103] C. W. Krueger, "Software Reuse", ACM Computing Surveys, vol. 24, no. 2, pp. 131-183, June 1992.

[104] R. Guha, R. McCool, and E. Miller, "Semantic Search", in Proceedings of the Twelfth International Conference on World Wide Web, Budapest, Hungary, 2003, pp. 700-709.

[105] H. M. Sneed and S. H. Sneed, "Creating web services from legacy host programs", presented at the 5th International Workshop on Web Site Evolution, Amsterdam, The Netherlands, 2003.


[106] Y. Zou and K. Kontogiannis, "Reengineering Legacy Systems Towards Web Environments", in Managing Corporate Information Systems Evolution and Maintenance, Idea Group Inc., 2005.

[107] M. Woodside, T. Zheng, and M. Litoiu, "Performance Model Estimation and Tracking using Optimal Filters", IEEE Transactions on Software Engineering, May 2008.

[108] L. O'Brien and C. Stoermer, "Architecture Reconstruction Case Study", Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 15213, April 2003.

[109] C. Baroudi and F. Halper, "Executive Survey: SOA Implementation Satisfaction Technical Report", Hurwitz and Associates, 2007.

[110] A. Tomer, L. Goldin, T. Kuflik, E. Kimchi, and S. R. Schach, "Evaluating Software Reuse Alternatives: A Model and its Application to an Industrial Case Study", IEEE Transactions on Software Engineering, vol. 30, no. 9, pp. 601-612, 2004.

[111] J. Sametinger, Software Engineering with Reusable Components: Springer-Verlag New York, Inc., 1997.

[112] I. Jacobson, M. Griss, and P. Jonsson, Software Reuse: Architecture, Process, and Organization for Business Success: Addison-Wesley, 1997.

[113] W. Lim, Managing Software Reuse: A Comprehensive Guide to Strategically Reengineering the Organization for Reusable Components: Prentice Hall Professional, 1998.

[114] J. M. Neighbors, "The Draco Approach to Constructing Software from Reusable Components", IEEE Transactions on Software Engineering, vol. 10, no. 5, pp. 564-574, 1984.

[115] J. C. S. do Prado Leite, "Draco-Puc: A Technology Assembly for Domain Oriented Software Development", in International Conference on Software Reuse, pp. 94-100, Rio de Janeiro, Brazil, 1994.

[116] C. D. Klingler and D. Creps, "Software Technology for Adaptable, Reliable Systems (STARS), The Reuse-Oriented Software Evolution (ROSE) Process Model", Paramax Systems Corporation, Reston, VA 22091, July 1993.

[117] I. Jacobson, M. L. Griss, and P. Jonsson, Reuse-driven Software Engineering Business (RSEB): Addison-Wesley, 1997.

[118] M. L. Griss, J. Favaro, and M. d'Alessandro, "Integrating Feature Modeling with the RSEB", in International Conference on Software Reuse, pp. 76-85, 1998.

[119] K. C. Kang, S. Kim, J. Lee, K. Kim, E. Shin, and M. Huh, "FORM: A Feature-Oriented Reuse Method with domain-specific reference architectures", Annals of Software Engineering, pp. 143-168, 1998.

[120] C. Atkinson, J. Bayer, C. Bunse, E. Kamsties, O. Laitenberger, R. Laqua, D. Muthig, B. Paech, J. Wust, and J. Zettel, Component-based Product Line Engineering with UML: Addison-Wesley, 2002.

[121] F. Van der Linden, "Software Product Families in Europe: The Esaps and Café Projects", IEEE Software, vol. 19, no. 4, pp. 41-49, July/August 2002.

[122] D. L. Parnas, "On the Design and Development of Program Families", IEEE Transactions on Software Engineering, vol. 2, no. 1, pp. 1-9, 1976.

[123] J. Bosch, Design and Use of Software Architectures: Adopting and Evolving a Product Line Approach, Pearson Education (Addison-Wesley & ACM Press), ISBN 0-201-67494-7, May 2000.


[124] R. Prieto-Diaz and G. Arango, "Domain Analysis and Software Systems Modeling", IEEE Computer Society Press, Los Alamitos, CA, 1996.

[125] "Development and Evolution of Software Architectures for Product Families", Second International Workshop on Development and Evolution of Software Architectures for Product Families, Las Palmas de Gran Canaria, Spain, February 26-27, 1998.

[126] L. Baum, M. Becker, L. Geyer, and G. Molter, "Using Software Architecture as a Catalyst for Reuse", presented at the European Reuse Workshop, Madrid, Spain, 1998.

[127] L. O'Brien, "Dali: A Software Architecture Reconstruction Workbench", Software Engineering Institute, Carnegie Mellon University, 2001.

[128] P. J. Finnigan, R. Holt, I. Kalas, S. Kerr, and K. Kontogiannis, "The software bookshelf", IBM Systems Journal, vol. 36, no. 4, pp. 564-593, November 1997.

[129] Imagix, Imagix4D. Available: http://www.imagix.com

[130] Bauhaus Group, Tour de Bauhaus, version 4.7.2, December 2003. Available: http://www.Bauhaus-stuttgart.de/demo/index.html

[131] S. Tilley and D. B. Smith, "Perspectives on Legacy System Reengineering", Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 15213-3890, 1995.

[132] N. Y. Lee and C. R. Litecky, "An Empirical Study of Software Reuse with Special Attention to Ada", IEEE Transactions on Software Engineering, vol. 23, no. 9, pp. 537-549, September 1997.

[133] H. J. Happel, A. Korthaus, S. Seedorft, and P. Tomczyk, "KOntoR: An Ontology-enabled Approach to Software Reuse", Software Engineering and Knowledge Engineering, pp. 349-354, 2006.

[134] M. Shaw, "Architectural Issues in Software Reuse: It's Not Just the Functionality, It's the Packaging", in IEEE Symposium on Software Reusability, pp. 3-6, Seattle, USA, April 1995.

[135] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal, Pattern-Oriented Software Architecture: A System of Patterns. Chichester, UK; New York: Wiley, 1996.

[136] M. T. Harandi, "Building a Knowledge Based Software Development Environment", IEEE Journal on Selected Areas in Communications, vol. 6, no. 5, pp. 862-868, 1988.

[137] B. P. Allen and S. D. Lee, "A Knowledge-based Environment for the Development of Software Parts Composition Systems", in 11th International Conference on Software Engineering, Pittsburgh, PA, May 1989, pp. 104-112.

[138] P. Devanbu, R. J. Brachman, P. G. Selfridge, and B. W. Ballard, "LaSSIE: A Knowledge-based Software Information System", in 12th International Conference on Software Engineering, Nice, France, March 1990, pp. 249-261.

[139] M. Wood and I. Sommerville, "A Knowledge-based Software Components Catalogue", in Software Engineering Environments, P. Brereton (ed.), Ellis Horwood Limited, Chichester, England, 1988.

[140] V. R. Basili, "Viewing maintenance as reuse-oriented software development", IEEE Software, vol. 7, no. 1, pp. 19-25, 1990.

[141] P. Wegner, "Capital-Intensive Software Technology", in Software Reusability, vol. 1: Concepts and Models, ACM Press/Addison-Wesley Publishing Company, 1989.

[142] M. M. Lehman and L. A. Belady, Program Evolution: Processes of Software Change, 1985.

[143] D. P. Freedman and G. M. Weinberg, Handbook of Walkthroughs, Inspections and Technical Reviews, 3rd ed.: Dorset House, 1990.

[144] F. Zoufaly, Issues and Challenges Facing Legacy Systems, November 1, 2002.

[145] R. Horowitz and P. Varaiya, "Control Design of an Automated Highway System", Proceedings of the IEEE: Special Issue on Hybrid Systems, vol. 88, no. 7, pp. 913-925, July 2000.

[146] C. Hofmeister, R. Nord, and D. Soni, Applied Software Architecture. Boston: Addison-Wesley, 2000.

[147] T. Guimaraes, "Managing Application Program Maintenance Expenditure", Communications of the ACM, vol. 26, no. 10, pp. 739-746, 1983.

[148] R. Kazman, S. G. Woods, and S. J. Carrière, "Requirements for Integrating Software Architecture and Reengineering Models: CORUM II", presented at the 5th Working Conference on Reverse Engineering (WCRE'98), Honolulu, Hawaii, 1998.

[149] M. Rahgozar and F. Oroumchian, "An Effective Strategy for Legacy System Evolution", Journal of Software Maintenance and Evolution: Research and Practice, vol. 15, no. 5, pp. 325-344, 2003.

[150] S. Latha and A. S. Thanamani, "Service Oriented Architecture - Technologies, Approaches for Integration and Automation of Legacy System in Heterogeneous Environment using Reusability technique", Journal of Computing, vol. 2, no. 12, pp. 64-70, December 2010.


[151] J. L. Hainaut, "Database Reverse Engineering", Doctoral Dissertation, University of Namur, Institut d'Informatique, Namur, Belgium, 1998.

[152] A. Cimitile, A. De Lucia, G. A. Di Lucca, and A. R. Fasolino, "Identifying Objects in Legacy Systems", in 5th International Workshop on Program Comprehension (WPC '97), Dearborn, MI, USA, May 1997, pp. 138-147.

[153] M. Rahgozar and F. Oroumchian, "A Practical Approach for Modernization of Legacy Systems", in First EuroAsian Conference on Advances in Information and Communication Technology, ICT 2002, Vienna, 2002, pp. 149-153.

[154] A. v. Deursen, B. Elsinga, P. Klint, and R. Tolido, "From Legacy to Component: Software Renovation in Three Steps", CAP Gemini Institute, CWI, P.O. Box 94079, 1090 GB Amsterdam, 2000.

[155] R. C. Seacord, K. Wallnau, J. Robert, S. Comella-Dorda, and S. A. Hissam, "Custom versus off-the-shelf architecture", Proceedings of Third International Enterprise Distributed Object Computing Conference (EDOC), Mannheim, Germany, pp. 270-278, 27-29 September 1999.

[156] V. K. Decyk, C. D. Norton, and B. K. Szymanski, "Modernizing FORTRAN 77 Legacy Codes", presented at the Conference on Computational Physics 2000 (CCP 2000), Gold Coast, Queensland, Australia, 2000.

[157] M. Talla and R. Valverde, "Data oriented and Process oriented Strategies for Legacy Information Systems Reengineering", International Journal on Information Technology (ACEEE), vol. 02, no. 01, pp. 47-51, March 2012.

[158] B. Wu, D. Lawless, J. Bisbal, R. Richardson, J. Grimson, V. Wade, and D. O'Sullivan, "The Butterfly Methodology: A Gateway-Free Approach for Migrating Legacy Information Systems", in International Conference on Engineering Complex Computer Systems (ICECCS '97), Los Alamitos, California, US, 1997, pp. 200-205.

[159] J. A. Stafford and A. L. Wolf, "Architecture-Based Software Engineering", Technical Report CU-CS-891-99, University of Colorado, November 1999.

[160] P. Klint and A. v. Deursen, "Techniques for Understanding Legacy Software Systems", Center for Mathematics and Computer Science (CWI), Amsterdam, 26 February 2002.

[161] P. B. Kruchten, "The 4+1 View Model of Architecture", IEEE Software, vol. 12, no. 6, pp. 42-50, November 1995.

[162] P. Clements, F. Bachmann, L. Bass, D. Garlan, J. Ivers, R. Little, R. Nord, and J. Stafford, Documenting Software Architectures: Views and Beyond, 1st ed.: Pearson Education, 2002.

[163] L. O'Brien, "Architecture Reconstruction to Support a Product Line Effort: Case Study", Technical Report CMU/SEI-2001-TN-015, Software Engineering Institute, Carnegie Mellon University, July 2001.

[164] L. O'Brien and V. Tamarree, "Architecture Reconstruction of J2EE Applications: Generating Views from the Module Viewtype", Technical Report CMU/SEI-2003-TN-028, Software Engineering Institute, Carnegie Mellon University, November 2003.

[165] G. Guo, J. Atlee, and R. Kazman, "A Software Architecture Reconstruction Method", in Proceedings of the First Working IFIP Conference on Software Architecture (WICSA1), San Antonio, Texas, US, February 22-24, 1999, pp. 225-243.


[166] T. Bowman, R. C. Holt, and N. V. Brewster, "Linux as a Case Study: Its Extracted Software Architecture", in International Conference on Software Engineering, Los Angeles, California, US, May 1999.

[167] P. J. Finnigan, R. Holt, I. Kalas, K. Kontogiannis, H. Mueller, J. Mylopoulos, S. Perelgut, M. Stanley, and K. Wong, "The portable Bookshelf", IBM Systems Journal, vol. 36, no. 4, pp. 564-593, November 1997.

[168] The Portable Bookshelf: http://swag.uwaterloo.ca/pbs/.

[169] Bauhaus Group, Tour de Bauhaus, http://www.Bauhaus-stuttgart.de/demo/index.html, Version 4.7.2, December 2005.

[170] D. Smith, L. O'Brien, and J. Bergey, "Using the Options Analysis for Reengineering (OAR) Method for Mining Components for a Product Line", in Second International Conference on Software Product Line (SPLC 2002), San Diego, California, USA, 2002, pp. 316-327.

[171] A. H. Eden and R. Kazman, "Architecture, Design, and Implementation", in 25th International Conference on Software Engineering, Los Alamitos, USA, 3-10 May 2003, pp. 149-159.

[172] R. Kazman, "A New Approach to Designing and Analyzing Object-Oriented Software Architecture", presented at the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), Denver, USA, 1-5 November 1999.

[173] R. T. Monroe, A. Kompanek, R. Melton, and D. Garlan, "Architectural Styles, Design Patterns, and Objects", IEEE Software, vol. 14, no. 1, pp. 43-52, January 1997.


[174] D. E. Perry and A. L. Wolf, "Foundations for the Study of Software Architecture", ACM SIGSOFT Software Engineering Notes, vol. 17, no. 4, pp. 40-52, October 1992.

[175] D. Garlan and M. Shaw, "An Introduction to Software Architectures", Advances in Software Engineering and Knowledge Engineering, V. Ambriola, G. Tortora (eds.), vol. 2, pp. 1-39, 1993.

[176] R. Kazman, G. Abowd, L. Bass, and P. Clements, "Scenario-Based Analysis of Software Architecture", IEEE Software, pp. 47-55, November 1996.

[177] M. Klein, T. Ralya, B. Pollak, R. Obenza, and M. G. Harbour, "A Practitioner's Handbook for Real-Time Analysis", Kluwer Academic, 1993.

[178] C. Smith and L. Williams, "Software Performance Engineering: A Case Study Including Performance Comparison with Design Alternatives", IEEE Transactions on Software Engineering, vol. 19, no. 7, pp. 720-741, July 1993.

[179] R. Kazman, M. Klein, M. Barbacci, T. Longstaff, H. Lipson, and J. Carriere, "The Architecture Tradeoff Analysis Method", Technical Report CMU/SEI-98-TR-008, July 1998.

[180] P. Merson, "Data Model as an Architectural View", Technical Note, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 15213-3890, October 2009.

[181] P. Kruchten, "What do software architects really do?", Journal of Systems and Software, vol. 81, no. 12, pp. 2413-2416, December 2008.

[182] D. E. Perry and A. L. Wolf, "Foundations for the Study of Software Architecture", ACM SIGSOFT Software Engineering Notes, vol. 17, no. 4, pp. 40-52, October 1992.


[183] R. T. Monroe and D. Garlan, "Style-Based Reuse for Software Architectures", in International Conference on Software Reuse, Orlando, FL, USA, 1996.

[184] L. Baum, M. Becker, L. Geyer, and G. Molter, "Using Software Architecture as a Catalyst for Reuse", in European Reuse Workshop, Madrid, Spain, 1998.

[185] M. T. Su, J. Hosking, and J. Grundy, "KaitoroCap: a document navigation capture and visualisation tool", in 9th Working IEEE/IFIP Conference on Software Architecture, Boulder, Colorado, USA, 20-24 June 2011.

[186] M. T. Su, "Capturing exploration to improve software architecture documentation", in 4th European Conference on Software Architecture, 2010, pp. 17-21.

[187] R. Kazman and J. Carriere, "Playing Detective: Reconstructing Software Architecture from Available Evidence", Journal of Automated Software Engineering, pp. 107-138, April 1999.

[188] A. Razavizadeh, S. Cîmpan, H. Verjus, and S. Ducasse, "Software System Understanding via Architectural Views Extraction According to Multiple Viewpoints", in Proceedings of the Confederated International Workshops and Posters, Heidelberg, Berlin, 2009.

[189] B. Bellay and H. Gall, "A Comparison of four Reverse Engineering Tools", in Fourth Working Conference on Reverse Engineering, Amsterdam, the Netherlands, pp. 2-11, 1997.

[190] L. O'Brien, "Dali: A Software Architecture Reconstruction Workbench", Software Engineering Institute, Carnegie Mellon University, May 2001.

[191] R. Holt, PBS: The Portable Bookshelf - Introduction, http://SWAG.uwaterloo.ca/PBS/.

[192] SWAG Group, Introduction to SWAGKIT: http://www.SWAG.uwaterloo.ca/SWAGKIT/#introduction.

[193] R. Koschke, Readme.txt, Bauhaus toolkit distribution, December 2003.

[194] Imagix, Imagix4D. Available: http://www.imagix.com

[195] Rigi, "http://www.rigi.csc.uvic.ca/index.html", 2004.

[196] A. Trevors, Software Architecture Toolkit, University of Waterloo: http://SWAG.uwaterloo.ca/SWAGKIT, February 2004.

[197] R. Holt, Software Bookshelf: Overview and Construction: http://SWAG.uwaterloo.ca/PBS/.

[198] Bauhaus Group, Tour de Bauhaus, version 4.7.2, December 2003. Available: http://www.Bauhaus-stuttgart.de/demo/index.html

[199] L. O'Brien, "Experiences in Architecture Reconstruction at Nokia", Software Engineering Institute, Technical Report CMU/SEI-2002-TN-004, August 2003.

[200] R. Koschke, T. Eisenbarth, and D. Simon, "Locating Features in Source Code", IEEE Transactions on Software Engineering, vol. 29, no. 3, pp. 210-224, March 2003.

[201] G. Murphy and D. Notkin, "Reengineering with Reflexion Models: A case study", IEEE Computer, vol. 30, no. 8, pp. 29-36, August 1997.

[202] B. Meyer and C. Mingins, "Component-Based Development: From Buzz to Spark", IEEE Computer, vol. 32, no. 7, pp. 35-37, 1999.

[203] Oracle White Paper, "Unlocking the Mainframe: Modernizing Legacy Systems to a Service-Oriented Architecture", June 2008.


[204] F. McGurren, "Supporting Component-Based Software Evolution through ConnX", Research MSc Thesis, Department of Computer Science and Information Systems, University of Limerick, June 2004.

[205] S. Galvin, "Enhancing the Role of Architectural Representations in Component-Based Development using Architectural Description Languages", Research MSc Thesis, Department of Computer Science and Information Systems, University of Limerick, June 2005.

[206] J. Williamson and T. Laszewski, Oracle Modernization Solutions: Approaches to Legacy Modernization, ISBN: 1847194648, Packt Publishing, 1st ed., September 2008.

[207] J. L. Hainaut, "Database Reverse Engineering", Doctoral Dissertation, University of Namur - Institut d'Informatique, 211B-5000 Namur, Belgium, 1998.

[208] A. v. Deursen, P. Klint, and C. Verhoef, "Research Issues in Software Renovation", in Proceedings of the Fundamental Approaches to Software Engineering (FASE99), Berlin, 1999.

[209] G. A. Di Lucca, A. R. Fasolino, and P. Tramontana, "Reverse Engineering Web Applications: the WARE Approach", Journal of Software Maintenance and Evolution: Research and Practice, vol. 16, pp. 71-110, 2004.

[210] H. M. Sneed, "Migrating to Web Services", in Emerging Methods, Technologies and Process Management in Software Engineering, Hoboken, New Jersey, 2008, pp. 151-176.

[211] R. Pérez-Castillo, I. García-Rodríguez de Guzmán, I. Caballero, and M. Piattini, "Software modernization by recovering Web services from legacy databases", Journal of Software Maintenance and Evolution: Research and Practice, vol. 25, no. 1, 28 March 2012.

[212] B. Meyer, Object-Oriented Software Construction, 2nd ed.: Prentice Hall, 1997.

[213] B. K. Kang and J. Bieman, "Using Design Abstractions to Visualize, Quantify, and Restructure Software", The Journal of Systems and Software, vol. 42, no. 2, pp. 172-187, 1998.

[214] G. V. Subramaniam and E. J. Byrne, "Deriving an Object Model from Legacy FORTRAN Code", in International Conference on Software Maintenance, Monterey, CA, USA, pp. 3-12, 4-8 November 1996.

[215] W. Stevens, G. Myers, and L. Constantine, "Structured Design", IBM Systems Journal, vol. 13, no. 2, pp. 115-139, June 1974.

[216] A. J. Pinkney, "An Automatic Cane Railway Scheduling System", MSc Thesis, Department of Mathematics, James Cook University of North Queensland, Australia, December 1987.

[217] Understand for FORTRAN, "FORTRAN 77, 90, 95 Reverse Engineering, Metrics and Cross Reference Tool", Scientific Toolworks Inc., www.scitools.com.

[218] H. A. Muller, O. A. Mehmet, S. R. Tilley, and J. S. Uhl, "A Reverse Engineering Approach to System Identification", Journal of Software Maintenance: Research and Practice, vol. 5, no. 4, pp. 181-204, December 1993.

[219] M. Ezran, M. Morisio, and C. Tully, "A Survey of European Reuse Experiences: Initial Results", in Proceedings of 24th Euromicro Conference, pp. 875-881, Pisa, Italy, 25-27 August 1998.

[220] J. Daly, J. Miller, A. Brooks, M. Roper, and M. Wood, "A Survey of Experiences amongst Object-Oriented Practitioners", presented at the Proceedings of Software Engineering Conference, Asia Pacific, Brisbane, Queensland, Australia, 6-9 December 1995.

[221] H. Sharp, Y. Rogers, and J. Preece, Interaction Design: Beyond Human-Computer Interaction, 2nd ed.: John Wiley & Sons Ltd, 2007.

[222] M. Jha and L. O'Brien, "A Comparison of Software Reuse in Software Development Communities", presented at The 5th International Malaysian Conference on Software Engineering, Johor Bahru, Malaysia, 12-14 December 2011.

[223] H. D. Rombach, "Software Reuse: A Key to the Maintenance Problem", Journal of Information and Software Technology, vol. 33, no. 1, pp. 86-92, 1991.

[224] W. B. Frakes and C. J. Fox, "Quality Improvement Using A Software Reuse Failure Modes Model", IEEE Transactions on Software Engineering, vol. 22, no. 4, pp. 274-279, April 1996.

[225] S. Rosenbaum and B. d. Castel, "Managing Software Reuse: An Experience Report", in 17th International Conference on Software Engineering (ICSE'95), Seattle, USA, 24-28 April 1995.

[226] R. Torkar and S. Mankefors, "A Survey on Testing and Reuse", in Proceedings of the IEEE International Conference on Software - Science, Technology & Engineering, Herzelia, Israel, 4-5 November 2003.

[227] M. Daneva, "Practical Reuse Measurement in ERP Requirements Engineering", in 12th International Conference on Advanced Information Systems Engineering, London, UK, May 2000, pp. 309-324.

[228] C. W. Krueger, "Software Reuse", ACM Computing Surveys, vol. 24, no. 2, pp. 131-183, June 1992.

[229] The Standish Group, "The CHAOS Report", 1994.

[230] A. R. Ghosh, B. Krieger, R. Glott, and G. Robles, "Free/Libre and Open Source Software: Survey and Study (FLOSS)", University of Maastricht, The Netherlands, June 2002.

[231] K. Pohl, G. Bockle, and F. van der Linden, Software Product Line Engineering: Foundations, Principles, and Techniques, First ed.: New York, NY: Springer, 2005.

[232] D. M. Weiss and C. R. Lai, Software Product Line Engineering: A Family-Based Software Development Process: Addison-Wesley, 1999.

[233] T. Erl, Service-Oriented Architecture: Concepts, Technology, and Design: Prentice Hall, 2009.

[234] J. Kuusela, "Architectural evolution: Nokia Mobile Phone Case Study", presented at the First Working IFIP Conference on Software Architecture (WICSA1), San Antonio, TX, USA, 22-24 February 1999.

[235] A. Maccari and C. Riva, "Architectural Evolution of Legacy Product Families", in Fourth International Workshop on Product Family Engineering, Bilbao, Spain, 3-5 October 2001.

[236] M. Jazayeri, A. Ran, and F. van der Linden, Software Architecture for Product Families: Addison Wesley, 2000.

[237] D. M. Hoffman and D. M. Weiss, Software Fundamentals: Collected Papers by David Parnas: Addison-Wesley, 2001.

[238] P. Clements and L. Northrop, Software Product Lines: Practices and Patterns: Addison-Wesley, Boston, MA, 2002.

[239] F. van der Linden, "Software Product Families in Europe: The ESAPS & CAFÉ Projects", IEEE Software, vol. 13, no. 3, pp. 41-49, 2002.

[240] A. Birk, G. Heller, I. John, K. Schmid, T. v. d. Maßen, and K. Müller, "Product Line Engineering: The State of the Practice", IEEE Software, vol. 20, no. 6, pp. 52-60, 2003.

[241] M. Jha and P. Maheshwari, "Reusing Code for Modernization of Legacy Systems", in IEEE International Workshop on Software Technology and Engineering Practice (STEP05), Budapest, Hungary, September 24-25, 2005.

[242] B. Tekinerdogan, H. Sozer, and M. Aksit, "Software Architecture Reliability Analysis using Failure Scenarios", The Journal of Systems and Software, vol. 81, pp. 558-575, 2008.

[243] J. L. Cybulski, "Introduction to Software Reuse", Technical Report TR 96/4, Department of Information Systems, The University of Melbourne, 1996.

[244] M. Ramachandran, "Software Reuse Guidelines", ACM SIGSOFT Software Engineering Notes, vol. 30, no. 3, pp. 1-8, May 2005.

[245] P. Freeman, "Reusable Software Engineering: Concepts and Research Directions", in Tutorial: Software Reusability, P. Freeman (ed.), Washington, D.C.: IEEE Computer Society Press, 1989.

[246] A. S. Orrego and G. E. Mundy, "A Study of Software Reuse in NASA Legacy Systems", Innovations in Systems and Software Engineering, vol. 3, pp. 167-180, 2007.

[247] R. Roshandel, S. Banerjee, and L. Cheung, "Estimating Software Component Reliability by Leveraging Architectural Models", in International Conference on Software Engineering (ICSE'06), Shanghai, China, 20-28 May 2006.

[248] D. Beuche, A. Birk, and H. Dreier, "Using Requirements Management Tools in Software Product Line Engineering: The State of the Practice", in Proceedings of 11th International Conference on Software Product Line (SPLC'07), 2007.

[249] The SEI Software Product Lines Certificate Programs, http://www.sei.cmu.edu/training/certificates/productlines/index.cfm.

[250] R. van Ommering, "Software Reuse in Product Populations", IEEE Transactions on Software Engineering, vol. 31, no. 7, pp. 537-550, 2005.

[251] M. F. Dunn and J. C. Knight, "Software Reuse in an Industrial Setting: A Case Study", in 13th International Conference on Software Engineering (ICSE'91), Austin, TX, USA, 13-17 May 1991, pp. 329-338.

[252] E. Henry and B. Faller, "Large-Scale Industrial Reuse to Reduce Cost and Cycle Time", IEEE Software, vol. 12, no. 5, pp. 47-53, 1995.

[253] P. Mi and W. Scacchi, "Modeling Articulation Work in Software Engineering Processes", in International Conference on the Software Process, California, USA, 1991, pp. 188-201.

[254] R. W. Selby, "Enabling Reuse-Based Software Development of Large-Scale Systems", IEEE Transactions on Software Engineering, vol. 31, no. 6, pp. 495-510, June 2005.

[255] I. Jacobson, M. Griss, and P. Jonsson, Software Reuse: Architecture, Process, and Organization for Business Success: Addison-Wesley, 1997.

[256] G. Booch, "Architectural Patterns", http://www.rational.com/products/, 1998.

[257] J. M. Moore and S. C. Bailin, "Domain Analysis Framework for Reuse", in Domain Analysis and Software Systems Modeling, IEEE CS Press, 1990.

[258] H. Bruyninckx, "Software Patterns", http://www.orocos.org/patterns.html, 2002.

[259] W. W. Agresti and F. E. McGarry, "Minnowbrook Workshop on Software Reuse: A Summary Report", in Software Reuse: Emerging Technology, W. Tracz, Ed., pp. 33-40, 1988.

[260] A. Sen, "The Role of Opportunism in the Software Design Reuse Process", IEEE Transactions on Software Engineering, vol. 23, no. 7, July 1997.

[261] D. J. Abel, K. P. Stark, C. R. Murry, and Y. M. Demoulin, "A Routing and Scheduling Problem for a Rail System: A Case Study", Journal of the Operations Research Society, 1981.

[262] T. Biggerstaff and C. Richter, Reusability Framework, Assessment, and Directions, vol. 1: ACM Press, 1989.

[263] W. M. Ulrich, "The Evolutionary Growth of Software Reengineering and the Decade Ahead", American Programmer, vol. 3, no. 10, pp. 14-20, 1990.

[264] Understand for C/C++, Understand: Source Code Analysis and Metrics, http://scitools.com.

[265] M. Helft and C. Peake, "An Automated Approach to Legacy Modernization", White Paper: Legacy Modernization, CA Mainframe Solutions, January 2010.

[266] B. Jalender, A. Govardhan, and P. Premchand, "Designing Code Level Reusable Software Components", International Journal of Software Engineering and Applications (IJSEA), vol. 3, no. 1, January 2012.

[267] A. Fuhr, T. Horn, V. Riediger, and A. Winter, "Model-Driven Software Migration into Service-Oriented Architectures", Computer Science - Research and Development, vol. 28, no. 1, pp. 65-84, 2013.

[268] J. W. Satzinger, R. B. Jackson, and S. D. Burd, Systems Analysis and Design in a Changing World, 6th ed.: Course Technology, Cengage Learning, 2011.

[269] E. J. Chikofsky and J. H. Cross, "Reverse Engineering and Design Recovery: A Taxonomy", IEEE Software, January 1990.


APPENDIX A

A Survey towards Software Reuse

Software reuse is the process of creating all or part of a software system from existing software rather than building it from scratch. This simple yet powerful vision was introduced in 1968. Software reuse has, however, failed to become a standard software engineering practice. Several processes for developing reusable software have been investigated, and research work, including company reports, informal research and empirical studies, has shown that an effective way to obtain the benefits of software reuse is to adopt a reuse process. However, existing reuse processes present crucial problems, such as gaps in important activities like development for and with reuse, and a stronger emphasis on some specific activities, such as analysis, design and implementation, than on others. Even today, with the idea of software product lines, there is still no clear consensus on the activities, inputs, outputs, processes and requirements that an effective reuse process must have. Many technical solutions for reuse have been proposed and examined, from subroutines to modules, objects and components, and on to current research into Commercial Off-the-Shelf (COTS) software and software product lines. And yet what is required for development for and with software reuse remains an open issue. This survey will inform our work towards an effective software reuse process, a question the research community has not yet completely answered. The results will benefit software engineers, managers, educators and others in the software development and research community by documenting practitioners’ beliefs and practices in reusing software, collecting requirements for and with reuse, and reusing other lifecycle objects. Answers to some of the questions in this survey (e.g. Questions 4, 5, 6 and 7) are often taken for granted in the software engineering community, but have not been verified empirically. Many organizations are implementing systematic reuse programs and need answers to practical questions about reuse.
This survey should not take more than 20 minutes. We appreciate your time and your response to this survey.


The survey has been divided into 5 sections. Please provide a more detailed answer than Yes or No where appropriate.

Section 1: General Questions (1-3)

1. What is your educational level?
• Research in IT
• Postgraduate in IT
• Bachelors in IT
• Any other professional qualification in IT

2. How many years of experience do you have in Software Engineering?
• Less than 5 years
• 5-10 years
• 10-20 years
• More than 20 years

3. What do you consider yourself?
• A business/industry developer
• An open source developer
• A manager
• A researcher

Section 2: Reuse Measurement Questions (4-19): This section captures answers about reuse measurement, advantages, disadvantages, and factors influencing reuse in the software development community.

4. What do you feel are the key benefits of reuse?

5. What do you feel is the main disadvantage with code reuse today?

6. In your opinion, does reuse education influence reuse?
• Yes
• No
• Maybe

7. What do you feel are the factors that facilitate reuse?

8. Do you feel work is increased when reusing others’ code?
• Yes
• No

9. Do you feel software engineering practice influences reuse?
• Yes
• No

10. Do you feel there is increased recognition towards reuse?

11. Do you feel a common software process would promote reuse?
• Yes
• No

12. In your opinion, does the use of a reuse repository improve code reuse?
• Yes
• No

13. Do you feel reuse is more common in certain industries?
• Yes
• No
If yes, please specify which types of industries.

14. Do you feel that a company’s, division’s, or project’s size is predictive of organizational reuse?

15. In your opinion, do quality concerns inhibit reuse? If so, please state which quality concerns are most common in your organization.

16. In your opinion, is domain knowledge the key to reuse of software artefacts?

17. Do you feel that programming language affects reuse?
• Yes
• No
• Other reason (please specify)

18. Do you feel that CASE tools promote reuse?
• Yes
• No
• Other reason (please specify)

19. If given an opportunity, would you prefer to build from scratch or to reuse?

Section 3: Reuse Technical Questions (20-29): This section captures technical answers about the project/team and its working environment, which affect reuse in some way.

20. How many developers do you usually have in one team?

21. Is there usually interaction between your team/project and other team(s)/project(s)?

22. Do you find yourself having much freedom in your work as a developer?

23. Are you involved in setting requirements or specifications for software?

24. How do you know if you’ve fulfilled the requirements, specifications or goals for particular software?

25. Do you have flexible time frames when working in projects?

26. How often do you move deadlines in projects?

27. How often do you spend time writing code and reuse a class/ component?

28. How often do you reuse a piece of code/class/component from another project?

29. Does the size of the component seriously affect decisions on whether you develop it yourself or buy it?

Section 4: Testing reused software Questions (30-39): This section captures answers about how analysis, design, coding and testing of software affect code reuse.

30. During a project that is proceeding according to plan, approximately what share of your total project time do you spend on:
• Analysis: __%
• Design: __%
• Implementation: __%
• V&V/testing: __%

31. Which part do you find most troublesome: Analysis, Design, Implementation, or V&V/testing?

32. How often do you test your software?

33. Do you test a single component or class in any way?

34. Do you test an assembly of components in any way?

35. Do you use any specific test framework? If yes, please specify which framework(s).

36. What type of structured approach do you use when testing software?

37. Do you test a component in any way before you reuse it?

38. Have the projects you’ve been taking part in stored and documented components/libraries in a systematic way for later reuse in other projects?

39. In case of time shortage during a project, which part do you find is reduced first: Analysis, Design, Implementation, or V&V/testing?

Section 5: General Technical Questions (40-51): This section captures answers about the choice of framework, use of software architecture, classes and libraries and the development environment where reuse can be a significant question.

40. How often do you search for re-usable code (i.e. libraries, components, classes) instead of doing it yourself?

41. How many classes does your average project encompass?

42. How many lines of code does your average project encompass?

43. What sort of development environment do you usually use?

44. Do you use some sort of certification when you have developed a component within your project/ company?

45. In your opinion, does the choice of framework (.NET, EJB, CORBA) affect the possibility for the software to be easily upgradeable in the long term?

46. Which framework do you use (e.g. Java, CORBA, .NET)? If any other, please specify.

47. Does the complexity of (a) component(s) seriously affect decisions on whether you develop it yourself?

48. Do open source or commercial projects usually enforce a certain technology (e.g. .NET, EJB, CORBA)?

49. What do you think should be improved in today’s component technologies? Do they lack a feature that a developer might need?

50. When you have written a piece of software or test cases, do you feel confident that your code/test cases can be reused?

51. Do you feel the need for a well-documented software architecture of the system in order to reuse the software?


APPENDIX B

A Survey towards Software Reuse in the SPL Community

Cost reduction, time to market, product quality improvement, and support for technological evolution are all critical issues that software companies must face to be competitive in today’s market. Software Product Lines (SPL) are an approach to address these issues. The core concept of SPL is reuse: to apply the SPL approach in practice, we must be able to reuse components when building similar applications. The benefits of product line engineering have been extensively discussed in the literature and widely recognized by industry. However, reuse is less widespread outside the SPL community, so it is important to investigate why this is so and to examine several issues related to software reuse in the SPL community. Existing reuse processes present crucial problems, such as gaps in important activities like development for and with reuse, and a stronger emphasis on some specific activities, such as analysis, design and implementation, than on others. Even today, with the idea of software product lines, there is still no clear consensus on the activities, inputs, outputs, processes and requirements that an effective reuse process must have. Many technical solutions for reuse have been proposed and examined, from subroutines to modules, objects and components, and on to current research into Commercial Off-the-Shelf (COTS) software and SPL. And yet what is required for development for and with software reuse remains an open issue. This survey will help us gain a better understanding of reuse within the SPL community and inform our work towards an effective software reuse process, a question the research community has not yet completely answered. The results will benefit software engineers, managers, educators and others in the software development and research community by documenting practitioners’ beliefs and practices in reusing software, collecting requirements for and with reuse, and reusing other lifecycle objects.
Answers to some of the questions in this survey (e.g. Questions 4, 5, 6, and 7) are often taken for granted in the software engineering community, but have not been verified empirically. Many organizations are implementing systematic reuse programs and need answers to practical questions about reuse. The survey should not take more than 20 minutes. We appreciate your time and your response to this survey.


The survey has been divided into 5 sections. Please provide a more detailed answer than Yes or No where appropriate.

Section 1: General Questions (1-2)

1. How many years of experience do you have in Software Product Lines?
• Less than 5 years
• 5-10 years
• 10-20 years
• More than 20 years

2. What do you consider yourself?
• A Product Architect
• A Core Asset Architect
• An open source developer
• A researcher
• A manager
• Other (please specify)

Section 2: Reuse Measurement Questions (3-18): This section captures answers about reuse measurement, advantages, disadvantages, and factors influencing reuse in the software development community.

3. What do you feel are the key benefits of reuse in SPL?

4. What do you feel is the main disadvantage with software reuse today?

5. In your opinion, does reuse education (especially in SPL) influence reuse?
• Yes
• No
• Maybe

6. What do you feel are the factors that facilitate reuse in SPL?


7. Do you feel work is increased when reusing others’ code?
• Yes
• No

8. Do you feel software engineering practice influences reuse?
• Yes
• No

9. Do you feel there is increased recognition towards reuse?

10. Do you feel a common software process would promote reuse (especially in SPL)?
• Yes
• No

11. In your opinion, does the use of a reuse repository improve code reuse?
• Yes
• No

12. Do you feel reuse is more common in certain industries?
• Yes
• No
If yes, please specify which types of industries.

13. Do you feel that a company’s, division’s, or project’s size is predictive of organizational reuse?

14. In your opinion, do quality concerns inhibit reuse? If so, please state which quality concerns are most common in your organization.

15. In your opinion, is domain knowledge the key to software reuse?

16. Do you feel that programming language affects reuse?
• Yes
• No
• Other reason (please specify)

17. Do you feel that CASE tools promote reuse?
• Yes
• No
• Other reason (please specify)

18. If given an opportunity, would you prefer to build from scratch or to reuse?

Section 3: Reuse Technical Questions (19-28): This section captures technical answers about the project/team and its working environment, which affect reuse in some way.

19. How many developers do you usually have in one team?

20. Is there usually interaction between your team/project and other team(s)/project(s)?

21. Do you find yourself having much freedom in your work as a developer/architect?

22. Are you involved in setting requirements or specifications for software?

23. How do you know if you’ve fulfilled the requirements, specifications or goals for the particular software you are developing/architecting?

24. Do you have flexible time frames when working in projects?

25. How often do you move deadlines in projects?

26. How often do you spend time writing code and reuse a class/ component?

27. How often do you reuse a piece of code/class/component from another project?


28. Does the size of the component seriously affect decisions on whether you develop it yourself or reuse it?

Section 4: Testing reused software Questions (29-38): This section captures answers about how analysis, design, coding and testing of software affect code reuse.

29. During a project that is proceeding according to plan, approximately what share of your total project time do you spend on:
• Analysis: __%
• Design: __%
• Implementation: __%
• V&V/testing: __%

30. Which part do you find most troublesome: Analysis, Design, Implementation, or V&V/testing?

31. How often do you test your software?
• Core asset components
• Product-specific components

32. Do you test a single component, class, or core component in any way?

33. Do you test an assembly of components in any way?

34. Do you use any specific test framework? If yes, please specify which framework(s).

35. What type of structured approach do you use when testing software?

36. Do you test a component in any way before you reuse it?

37. Have the projects you’ve been taking part in stored and documented components/ libraries in a systematic way for later reuse in other projects?


38. In case of time shortage during a project, which part do you find is reduced first: Analysis, Design, Implementation, or V&V/testing?

Section 5: Development environment Questions (39-51): This section captures answers about the choice of framework, use of software architecture, classes and libraries and the development environment where reuse can be a significant question.

39. How often do you search for re-usable code (i.e. libraries, components, classes) instead of doing it yourself? If you develop core components, do you search for reusable components?

40. How many classes does your average project encompass?

41. How many lines of code does your average project encompass?

42. What sort of development environment do you usually use?

43. Do you use some sort of certification when you have developed a component within your project/ company?

44. In your opinion, does the choice of framework (.NET, EJB, CORBA) affect the possibility for the software to be easily upgradeable in the long term?

45. Which framework do you use (e.g. Java, CORBA, .NET)? If any other, please specify.

46. Does the complexity of (a) component(s) seriously affect decisions on whether you develop it yourself?

47. Do open source or commercial projects usually enforce a certain technology (e.g. .NET, EJB, CORBA)?

48. What do you think should be improved in today’s component technologies? Do they lack a feature that a developer might need?

49. When you have written a piece of software or test cases, do you feel confident that your code/test cases can be reused?

50. Do you feel the need for a well-documented software architecture of the system in order to reuse the software?

51. How do you understand system and component reliability and their interactions? Do you use any process to identify critical components and interfaces?
