A Study on the Evolution of Software Quality and Technical Debt in Open Source Applications
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSITY OF MACEDONIA SCHOOL OF INFORMATION SCIENCES DEPARTMENT OF APPLIED INFORMATICS A study on the evolution of software quality and technical debt in open source applications Theodoros Amanatidis PhD Dissertation Supervisor: Prof. Alexander Chatzigeorgiou Advisors: Prof. Maria Satratzemi, Prof. Ioannis Stamelos Thessaloniki, 05/2020 “A study on the evolution of software quality and technical debt in open source applications” Theodoros Amanatidis BSc Financial Accounting 2011, MSc Applied Informatics 2014 PhD Dissertation A dissertation submitted in fulfillment of the requirements for the degree of Doctor of Philosophy University of Macedonia Thessaloniki, Greece 2020 This dissertation was submitted during the massive pandemic of the corona virus COVID-19 with over 4M cases and 280K deaths all over the world at the time of this submission Summary Almost all software systems around us evolve constantly to accommodate new requirements, to adapt in changing environments and to fix known issues. Software evolution analysis can reveal important information concerning maintenance practices followed by development teams. The goal of this dissertation is to study the evolution of open source web applications by investigating the evolution of software quality and maintenance cost as captured by the metaphor of Technical Debt (TD). Technical Debt is a concept in programming which reflects the extra maintenance effort that arises when “sub-optimal” code, that is easy to implement in the short run, is used instead of applying best practices. This research work investigates the evolution of a large number of open source web applications. The applications were analyzed in the context of Lehman’s eight (8) laws of software evolution and particularly, it has been examined whether the laws are confirmed in practice for open source web applications. Lehman’s Laws are, Continuing Change (Law I), Increasing Complexity (Law II), Self-Regulation (Law III), Conservation of Organizational Stability (Law IV), Conservation of Familiarity (Law V), Continuing Growth (Law VI) Declining Quality (VII) and Feedback System (VIII). The results provide evidence that the evolution of web applications comply with most of the laws. The TD metaphor has received increasing attention from the research community in the last years. Particularly, a large portion of research studies have focused on the notion of TD Interest, which reflects the additional maintenance effort that is incurred due to the existence of sub-optimal software (i.e., due to the existence of Technical Debt). Adopting the state-of-research on TD Management, this dissertation has aimed at investigating the impact of TD on corrective maintenance and particularly, to what extent does the presence of TD in software modules slows down development pace by increase the time and effort required for fixing bugs. Although TD is usually assessed on either the entire system or on individual software modules and most studies focus on the identified inefficiencies, it is the actual craftsmanship of developers that causes the accumulation of TD. Driven by the fact that TD is introduced by the developers themselves, this dissertation attempts to explore the relation between developers’ characteristics and the tendency to introduce inefficiencies that lead to TD. Despite the fact that TD is an established and recognized concept in the software engineering community, it also remains a metaphor and like all metaphors it is inherently abstract. This means that the way it is defined and interpreted by software engineering stakeholders constitutes a subjective matter. Each software professional has developed his/her own perception around TD prioritization and management. Thus, many voices question the validity of the way it is detected and calculated by automated static analysis tools. To this end, in the context of this research a survey has been sent to a large number of developers that contribute to open source web applications in order to gain insights into the factors that lead developers to adopt or reject fixes as suggested by automated static code analysis. By acknowledging the lack of a ground truth regarding the assessment of TD among the existing TD tools, this dissertation attempts to extract a set of classes with high TD, as detected by three (3) major TD tools (CAST AIP, Squore and SonarQube). These classes can serve as benchmark of validated high-TD classes and this way a basis can be established that can be used either for prioritization of maintenance activities or for training more sophisticated TD identification techniques. To this end, the current research work proposes a methodology to extract a benchmark set of validated high-TD classes while at the same time, it reveals three types of class profiles that successfully capture the spectrum of TD measurements provided by the three TD tools. Finally, the overall contribution of this dissertation comprises five (5) points; (a) it provides valuable undiscovered information on how open source web applications evolve over time since web-based systems had received limited attention in contrast to desktop ones, (b) considering the lack of empirical evidence on the relation between TD amount and TD Interest, it investigates to what extent the presence of TD in software modules slows down development pace by increasing the time and effort required for fixing bugs, (c) by acknowledging that it is the actual craftsmanship of the developers that causes the accumulation of TD, this work outlines the characteristics of the developers who tend to add TD in open source applications, (d) it sheds light into the reasons that drive developers to agree or disagree with automatically detected TD whose urgency is very often questionable by developers and (e) by acknowledging the lack of a basis regarding the detection and prioritization of TD among the existing TD tools, the current work proposes a methodology to extract a benchmark set of modules which are ranked as high-TD modules by three (3) TD tools altogether. Keywords: software evolution, Lehman’s laws, software quality, software maintenance, open-source applications, software repositories, technical debt, technical debt management, corrective maintenance, archetypal analysis, inter-rater agreement, ground truth, benchmark Contents Chapter I. INTRODUCTION....................................................................................................... 1 1. Software Evolution and Technical Debt ....................................................................................... 1 2. Dissertation Goals and Research Questions ................................................................................. 1 2.1. Evolution of Web Applications .............................................................................................. 4 2.2. Technical Debt and Corrective Maintenance ......................................................................... 4 2.3. Personalized Assessment of Technical Debt Principal ............................................................ 5 2.4. Factors Affecting Decision to Repay Technical Debt .............................................................. 5 2.5. Benchmark of Technical Debt Liabilities ................................................................................ 6 3. Dissertation outline .................................................................................................................... 6 Chapter II. BACKGROUND ON TECHNICAL DEBT ................................................................ 7 1. Foreword .................................................................................................................................... 7 2. Key Components of TD ................................................................................................................ 7 3. What TD really is ......................................................................................................................... 8 4. TD Types ..................................................................................................................................... 9 5. Activities and Strategies for Managing TD ................................................................................. 10 6. Tools assessing TD (TD tools)..................................................................................................... 11 7. A comment on Technical Debt Management ............................................................................. 13 Chapter III. EVOLUTION OF WEB APPLICATIONS ................................................................ 16 1. Introduction .............................................................................................................................. 16 2. Related Work ............................................................................................................................ 18 3. Case Study Design ..................................................................................................................... 19 3.1. Goal and Research Question ............................................................................................... 19 3.2. Selection of Cases ............................................................................................................... 20 3.3. Employed Process and Tools ............................................................................................... 22 3.4. Data Analysis .....................................................................................................................