A Study on the Evolution of Software Quality and Technical Debt in Open Source Applications

Total Page:16

File Type:pdf, Size:1020Kb

A Study on the Evolution of Software Quality and Technical Debt in Open Source Applications UNIVERSITY OF MACEDONIA SCHOOL OF INFORMATION SCIENCES DEPARTMENT OF APPLIED INFORMATICS A study on the evolution of software quality and technical debt in open source applications Theodoros Amanatidis PhD Dissertation Supervisor: Prof. Alexander Chatzigeorgiou Advisors: Prof. Maria Satratzemi, Prof. Ioannis Stamelos Thessaloniki, 05/2020 “A study on the evolution of software quality and technical debt in open source applications” Theodoros Amanatidis BSc Financial Accounting 2011, MSc Applied Informatics 2014 PhD Dissertation A dissertation submitted in fulfillment of the requirements for the degree of Doctor of Philosophy University of Macedonia Thessaloniki, Greece 2020 This dissertation was submitted during the massive pandemic of the corona virus COVID-19 with over 4M cases and 280K deaths all over the world at the time of this submission Summary Almost all software systems around us evolve constantly to accommodate new requirements, to adapt in changing environments and to fix known issues. Software evolution analysis can reveal important information concerning maintenance practices followed by development teams. The goal of this dissertation is to study the evolution of open source web applications by investigating the evolution of software quality and maintenance cost as captured by the metaphor of Technical Debt (TD). Technical Debt is a concept in programming which reflects the extra maintenance effort that arises when “sub-optimal” code, that is easy to implement in the short run, is used instead of applying best practices. This research work investigates the evolution of a large number of open source web applications. The applications were analyzed in the context of Lehman’s eight (8) laws of software evolution and particularly, it has been examined whether the laws are confirmed in practice for open source web applications. Lehman’s Laws are, Continuing Change (Law I), Increasing Complexity (Law II), Self-Regulation (Law III), Conservation of Organizational Stability (Law IV), Conservation of Familiarity (Law V), Continuing Growth (Law VI) Declining Quality (VII) and Feedback System (VIII). The results provide evidence that the evolution of web applications comply with most of the laws. The TD metaphor has received increasing attention from the research community in the last years. Particularly, a large portion of research studies have focused on the notion of TD Interest, which reflects the additional maintenance effort that is incurred due to the existence of sub-optimal software (i.e., due to the existence of Technical Debt). Adopting the state-of-research on TD Management, this dissertation has aimed at investigating the impact of TD on corrective maintenance and particularly, to what extent does the presence of TD in software modules slows down development pace by increase the time and effort required for fixing bugs. Although TD is usually assessed on either the entire system or on individual software modules and most studies focus on the identified inefficiencies, it is the actual craftsmanship of developers that causes the accumulation of TD. Driven by the fact that TD is introduced by the developers themselves, this dissertation attempts to explore the relation between developers’ characteristics and the tendency to introduce inefficiencies that lead to TD. Despite the fact that TD is an established and recognized concept in the software engineering community, it also remains a metaphor and like all metaphors it is inherently abstract. This means that the way it is defined and interpreted by software engineering stakeholders constitutes a subjective matter. Each software professional has developed his/her own perception around TD prioritization and management. Thus, many voices question the validity of the way it is detected and calculated by automated static analysis tools. To this end, in the context of this research a survey has been sent to a large number of developers that contribute to open source web applications in order to gain insights into the factors that lead developers to adopt or reject fixes as suggested by automated static code analysis. By acknowledging the lack of a ground truth regarding the assessment of TD among the existing TD tools, this dissertation attempts to extract a set of classes with high TD, as detected by three (3) major TD tools (CAST AIP, Squore and SonarQube). These classes can serve as benchmark of validated high-TD classes and this way a basis can be established that can be used either for prioritization of maintenance activities or for training more sophisticated TD identification techniques. To this end, the current research work proposes a methodology to extract a benchmark set of validated high-TD classes while at the same time, it reveals three types of class profiles that successfully capture the spectrum of TD measurements provided by the three TD tools. Finally, the overall contribution of this dissertation comprises five (5) points; (a) it provides valuable undiscovered information on how open source web applications evolve over time since web-based systems had received limited attention in contrast to desktop ones, (b) considering the lack of empirical evidence on the relation between TD amount and TD Interest, it investigates to what extent the presence of TD in software modules slows down development pace by increasing the time and effort required for fixing bugs, (c) by acknowledging that it is the actual craftsmanship of the developers that causes the accumulation of TD, this work outlines the characteristics of the developers who tend to add TD in open source applications, (d) it sheds light into the reasons that drive developers to agree or disagree with automatically detected TD whose urgency is very often questionable by developers and (e) by acknowledging the lack of a basis regarding the detection and prioritization of TD among the existing TD tools, the current work proposes a methodology to extract a benchmark set of modules which are ranked as high-TD modules by three (3) TD tools altogether. Keywords: software evolution, Lehman’s laws, software quality, software maintenance, open-source applications, software repositories, technical debt, technical debt management, corrective maintenance, archetypal analysis, inter-rater agreement, ground truth, benchmark Contents Chapter I. INTRODUCTION....................................................................................................... 1 1. Software Evolution and Technical Debt ....................................................................................... 1 2. Dissertation Goals and Research Questions ................................................................................. 1 2.1. Evolution of Web Applications .............................................................................................. 4 2.2. Technical Debt and Corrective Maintenance ......................................................................... 4 2.3. Personalized Assessment of Technical Debt Principal ............................................................ 5 2.4. Factors Affecting Decision to Repay Technical Debt .............................................................. 5 2.5. Benchmark of Technical Debt Liabilities ................................................................................ 6 3. Dissertation outline .................................................................................................................... 6 Chapter II. BACKGROUND ON TECHNICAL DEBT ................................................................ 7 1. Foreword .................................................................................................................................... 7 2. Key Components of TD ................................................................................................................ 7 3. What TD really is ......................................................................................................................... 8 4. TD Types ..................................................................................................................................... 9 5. Activities and Strategies for Managing TD ................................................................................. 10 6. Tools assessing TD (TD tools)..................................................................................................... 11 7. A comment on Technical Debt Management ............................................................................. 13 Chapter III. EVOLUTION OF WEB APPLICATIONS ................................................................ 16 1. Introduction .............................................................................................................................. 16 2. Related Work ............................................................................................................................ 18 3. Case Study Design ..................................................................................................................... 19 3.1. Goal and Research Question ............................................................................................... 19 3.2. Selection of Cases ............................................................................................................... 20 3.3. Employed Process and Tools ............................................................................................... 22 3.4. Data Analysis .....................................................................................................................
Recommended publications
  • D1.1.1 Public State of the Art Document
    D1.1.1 Public state of the art document Programme ITEA3 Challenge Smart Engineering Project number 17038 Project name Visual diagnosis for DevOps software development Project duration 1st October 2018 – 30st June 2022 Project website Project WP WP1 - Pre-studies and requirements Project Task Task 1.1 – Update of state-of-art and state-of-the practice analysis for visualization in software projects in DevOps context Deliverable type X Doc Textual deliverable SW Software deliverable Version V17 Delivered 18/11/2019 Access x Public Abstracts are public Confidential D1.1.1 Public state of the art document Document Contributors Partber Author Role EXPERIS Ester Sancho editor EXPERIS Miriam Moreno writer GRO Paris Avgeriou writer INVENCO Mika Koivuluoma writer OCE Lou Somers writer/reviewer OULU Markus Kelanti writer TAU Kari Systä writer/reviewer TAU Outi Sievi-Korte writer/reviewer TIOBE Paul Jansen writer TIOBE Marvin Wener writer UPC Lidia López writer UPC Xavier Franch writer VINCIT Veli-Pekka Eloranta writer Document History Date Version Editors Status 18/06/2019 ToC EXPERIS Table of Content 10/07/2019 V01 TAU Draft 30/09/2019 V02 OCE/GRO Draft 17/10/2019 V06 TIOBE Draft 22/10/2019 V08 OULU Draft 12/11/2019 V09 VINCIT 1ST Final Draft 20/11/2019 V10 OCE Reviewed version 26/11/2019 V12 UPC/EXPERIS/TAU 2nd Final Draft 10/12/2019 V16 EXPERIS Final Version 16/12/2019 V16.01 TAU Peer Review 18/12/2019 V17 EXPERIS Submission 2 D1.1.1 Public state of the art document Table of Contents Executive Summary ............................................................................................................................. 6 1.
    [Show full text]
  • Command Line Interface
    Command Line Interface Squore 21.0.2 Last updated 2021-08-19 Table of Contents Preface. 1 Foreword. 1 Licence. 1 Warranty . 1 Responsabilities . 2 Contacting Vector Informatik GmbH Product Support. 2 Getting the Latest Version of this Manual . 2 1. Introduction . 3 2. Installing Squore Agent . 4 Prerequisites . 4 Download . 4 Upgrade . 4 Uninstall . 5 3. Using Squore Agent . 6 Command Line Structure . 6 Command Line Reference . 6 Squore Agent Options. 6 Project Build Parameters . 7 Exit Codes. 13 4. Managing Credentials . 14 Saving Credentials . 14 Encrypting Credentials . 15 Migrating Old Credentials Format . 16 5. Advanced Configuration . 17 Defining Server Dependencies . 17 Adding config.xml File . 17 Using Java System Properties. 18 Setting up HTTPS . 18 Appendix A: Repository Connectors . 19 ClearCase . 19 CVS . 19 Folder Path . 20 Folder (use GNATHub). 21 Git. 21 Perforce . 23 PTC Integrity . 25 SVN . 26 Synergy. 28 TFS . 30 Zip Upload . 32 Using Multiple Nodes . 32 Appendix B: Data Providers . 34 AntiC . 34 Automotive Coverage Import . 34 Automotive Tag Import. 35 Axivion. 35 BullseyeCoverage Code Coverage Analyzer. 36 CANoe. 36 Cantata . 38 CheckStyle. ..
    [Show full text]
  • Information Needs for Software Development Analytics
    Information Needs for Software Development Analytics Raymond P.L. Buse Thomas Zimmermann The University of Virginia, USA Microsoft Research [email protected] [email protected] Abstract—Software development is a data rich activity with decision making will likely only become more difficult. many sophisticated metrics. Yet engineers often lack the tools Analytics describes application of analysis, data, and sys- and techniques necessary to leverage these potentially powerful tematic reasoning to make decisions. Analytics is especially information resources toward decision making. In this paper, we present the data and analysis needs of professional software useful for helping users move from only answering questions engineers, which we identified among 110 developers and of information like “What happened?” to also answering managers in a survey. We asked about their decision making questions of insight like “How did it happen and why?” process, their needs for artifacts and indicators, and scenarios Instead of just considering data or metrics directly, one can in which they would use analytics. gather more complete insights by layering different kinds of The survey responses lead us to propose several guidelines for analytics tools in software development including: Engi- analyses that allow for summarizing, filtering, modeling, and neers do not necessarily have much expertise in data analysis; experimenting; typically with the help of automated tools. thus tools should be easy to use, fast, and produce concise The key to applying analytics to a new domain is under- output. Engineers have diverse analysis needs and consider most standing the link between available data and the information indicators to be important; thus tools should at the same time needed to make good decisions, as well as the analyses support many different types of artifacts and many indicators.
    [Show full text]
  • Software Analytics for Incident Management of Online Services: An
    Software Analytics for Incident Management of Online Services: An Experience Report Jian-Guang Lou, Qingwei Lin, Rui Ding, Tao Xie Qiang Fu, Dongmei Zhang University of Illinois at Urbana-Champaign Microsoft Research Asia, Beijing, P. R. China Urbana, IL, USA {jlou, qlin, juding, qifu, dongmeiz}@microsoft.com [email protected] Abstract—As online services become more and more popular, incident management needs to be efficient and effective in incident management has become a critical task that aims to order to ensure high availability and reliability of the services. minimize the service downtime and to ensure high quality of the A typical procedure of incident management in practice provided services. In practice, incident management is conducted (e.g., at Microsoft and other service-provider companies) goes through analyzing a huge amount of monitoring data collected at as follow. When the service monitoring system detects a ser- runtime of a service. Such data-driven incident management faces several significant challenges such as the large data scale, vice violation, the system automatically sends out an alert and complex problem space, and incomplete knowledge. To address makes a phone call to a set of On-Call Engineers (OCEs) to these challenges, we carried out two-year software-analytics trigger the investigation on the incident in order to restore the research where we designed a set of novel data-driven techniques service as soon as possible. Given an incident, OCEs need to and developed an industrial system called the Service Analysis understand what the problem is and how to resolve it. In ideal Studio (SAS) targeting real scenarios in a large-scale online cases, OCEs can identify the root cause of the incident and fix service of Microsoft.
    [Show full text]
  • Avatud Lähtekoodiga Vahendite Kohandamine Microsoft Visual C++ Tarkvaralahenduste Kvaliteedi Analüüsiks Sonarqube Serveris
    TALLINNA TEHNIKAÜLIKOOL Infotehnoloogia teaduskond Tarkvarateaduse instituut Anton Ašot Roolaid 980774IAPB AVATUD LÄHTEKOODIGA VAHENDITE KOHANDAMINE MICROSOFT VISUAL C++ TARKVARALAHENDUSTE KVALITEEDI ANALÜÜSIKS SONARQUBE SERVERIS Bakalaureusetöö Juhendaja: Juhan-Peep Ernits PhD Tallinn 2019 Autorideklaratsioon Kinnitan, et olen koostanud antud lõputöö iseseisvalt ning seda ei ole kellegi teise poolt varem kaitsmisele esitatud. Kõik töö koostamisel kasutatud teiste autorite tööd, olulised seisukohad, kirjandusallikatest ja mujalt pärinevad andmed on töös viidatud. Autor: Anton Ašot Roolaid 21.05.2019 2 Annotatsioon Käesolevas bakalaureusetöös uuritakse, kuidas on võimalik saavutada suure hulga C++ lähtekoodi kvaliteedi paranemist, kui ettevõttes kasutatakse arenduseks Microsoft Visual Studiot ning koodikaetuse ja staatilise analüüsi ülevaate saamiseks SonarQube serverit (Community Edition). Seejuures SonarSource'i poolt pakutava tasulise SonarCFamily for C/C++ analüsaatori (mille eelduseks on SonarQube serveri Developer Edition) asemel kasutatakse tasuta ja vaba alternatiivi: SonarQube C++ Community pluginat. Analüüsivahenditena eelistatakse avatud lähtekoodiga vabu tarkvaravahendeid. Valituks osutuvad koodi kaetuse analüüsi utiliit OpenCppCoverage ja staatilise analüüsi utiliit Cppcheck. Siiski selgub, et nende utiliitide töö korraldamiseks ja väljundi sobitamiseks SonarQube Scanneri vajadustega tuleb kirjutada paar skripti: üks PowerShellis ja teine Windowsi pakkfailina. Regulaarselt ajastatud analüüside käivitamist tagab QuickBuild,
    [Show full text]
  • Patterns for Implementing Software Analytics in Development Teams
    Patterns for Implementing Software Analytics in Development Teams JOELMA CHOMA, National Institute for Space Research - INPE EDUARDO MARTINS GUERRA, National Institute for Space Research - INPE TIAGO SILVA DA SILVA, Federal University of São Paulo - UNIFESP The software development activities typically produce a large amount of data. Using a data-driven approach to decision making – such as Software Analytics – the software practitioners can achieve higher development process productivity and improve many aspects of the software quality based on the insightful and actionable information. This paper presents a set of patterns describing steps to encourage the integration of the analytics activities by development teams in order to support them to make better decisions, promoting the continuous improvement of software and its development process. Before any procedure to extract data for software analytics, the team needs to define, first of all, their questions about what will need to be measured, assess and monitored throughout the development process. Once defined the key issues which will be tracked, the team may select the most appropriate means for extracting data from software artifacts that will be useful in decision-making. The tasks to set up the development environment for software analytics should be added to the project planning along with the regular tasks. The software analytics activities should be distributed throughout the project in order to add information about the system in small portions. By defining reachable goals from the software analytics findings, the team turns insights from software analytics into actions to improve incrementally the software characteristics and/or its development process. Categories and Subject Descriptors: D.2.8 [Software and its engineering]: Software creation and management—Metrics General Terms: Software Analytics Additional Key Words and Phrases: Software Analytics, Decision Making, Agile Software Development, Patterns, Software Measurement, Development Teams.
    [Show full text]
  • Linkedin Learning Digital Framework
    Competency Courses Duration Level Course Objective Use ICT-based devices, applications, software and Learn what it takes to break through the clutter and sell in the telecommunications market. In this course, Meridith Elliott Powell helps sales professionals services understand and master the unique challenges and skills required to sell into this ever-changing industry. Meridith acquaints you with the trends and Selling into Industries: Telecommunications 00:33.0 Intermediate changes—including network security and over-the-top (OTT) services—that are currently shaping this industry, as well as what telecommunications clients Use basic productivity software, use email and expect from sales reps. Learn how to use a consultative selling approach to gain a deeper understanding of client needs, create urgency by recognizing and other digital communication solving those needs, and continue to expand your sales relationship after the deal is signed. The Systems Security Certified Practitioner (SSCP) certification is an excellent entry point to a career in IT security. To help you prepare for the SSCP exam, Use digital capture devices such as a camera instructor Mike Chapple has designed a series of courses covering each domain. In this installment, Mike covers the objectives of Networks and SSCP Cert Prep: 6 Networks and Communications Security, Domain 6, which comprises 16% of the questions on the exam. Topics include TCP/IP networking, configuring network security 10:09.0 Intermediate Use subject-specialist ICT devices and Communications Security devices, and identifying the different types of network attacks. Plus, learn how to secure your network with firewall rules, switch and router configuration, and applications confidently network monitoring; protect your telecommunications; and understand the unique features and vulnerabilities of wireless networks.
    [Show full text]
  • Write Your Own Rules and Enforce Them Continuously
    Ultimate Architecture Enforcement Write Your Own Rules and Enforce Them Continuously SATURN May 2017 Paulo Merson Brazilian Federal Court of Accounts Agenda Architecture conformance Custom checks lab Sonarqube Custom checks at TCU Lessons learned 2 Exercise 0 – setup Open www.dontpad.com/saturn17 Follow the steps for “Exercise 0” Pre-requisites for all exercises: • JDK 1.7+ • Java IDE of your choice • maven 3 Consequences of lack of conformance Lower maintainability, mainly because of undesired dependencies • Code becomes brittle, hard to understand and change Possible negative effect • on reliability, portability, performance, interoperability, security, and other qualities • caused by deviation from design decisions that addressed these quality requirements 4 Factors that influence architecture conformance How effective the architecture documentation is Turnover among developers Haste to fix bugs or implement features Size of the system Distributed teams (outsourcing, offshoring) Accountability for violating design constraints 5 How to avoid code and architecture disparity? 1) Communicate the architecture to developers • Create multiple views • Structural diagrams + behavior diagrams • Capture rationale Not the focus of this tutorial 6 How to avoid code and architecture disparity? 2) Automate architecture conformance analysis • Often done with static analysis tools 7 Built-in checks and custom checks Static analysis tools come with many built-in checks • They are useful to spot bugs and improve your overall code quality • But they’re
    [Show full text]
  • Arxiv:2106.03271V1 [Cs.SE] 6 Jun 2021
    Noname manuscript No. (will be inserted by the editor) Fixing Vulnerabilities Potentially Hinders Maintainability Sofia Reis · Rui Abreu · Luis Cruz Received: date / Accepted: date Abstract Security is a requirement of utmost importance to produce high- quality software. However, there is still a considerable amount of vulnerabil- ities being discovered and fixed almost weekly. We hypothesize that develop- ers affect the maintainability of their codebases when patching vulnerabili- ties. This paper evaluates the impact of patches to improve security on the maintainability of open-source software. Maintainability is measured based on the Better Code Hub’s model of 10 guidelines on a dataset, including 1300 security-related commits. Results show evidence of a trade-off between secu- rity and maintainability for 41:90% of the cases, i.e., developers may hinder software maintainability. Our analysis shows that 38:29% of patches increased software complexity and 37:87% of patches increased the percentage of LOCs per unit. The implications of our study are that changes to codebases while patching vulnerabilities need to be performed with extra care; tools for patch risk assessment should be integrate into the CI/CD pipeline; computer science curricula needs to be updated; and, more secure programming languages are necessary. Keywords Software Security Software Maintenance Open-Source Software · · Sofia Reis INESC ID and IST, University of Lisbon, Lisbon, Portugal E-mail: sofi[email protected] Rui Abreu INESC ID and FEUP, University of Porto, Porto, Portugal E-mail: [email protected] arXiv:2106.03271v2 [cs.SE] 12 Sep 2021 Luis Cruz Delft University of Technology, Delft, The Netherlands E-mail: [email protected] 2 Sofia Reis et al.
    [Show full text]
  • Ics-Fd-Qa Project • Objective • Present Scope • QA Calling Sequence • Present Status & Tools • Integration Into Projects & Kobe • Some Issues • Conclusion
    ics-fd-qa project • Objective • Present scope • QA calling sequence • Present status & tools • Integration into projects & kobe • Some issues • Conclusion • Your input, discussion Objective • Provide integrated code quality assessment • Standard analysis method, but ignore coding styles a.f.a.p. • High quality code scans, try to find “fishy” pieces automatically • Report only false-true, keep noise down • Graphical output with sonarqube, fancy, motivating • Support and encourage easy documentation writing (sphinx, …) • Provide clear code fix proposals if available • Provide q.a. history • Have minimal impact on projects • Existing build chains are sacred • Be able to digest lots of legacy code and give orientation for “hot spots” Present Scope • ICS-FD section internal • C/C++ code, no java, no control+ • Cmake build chain ( qmake/Qt/bear come later ) • Static code analysis: clang-tidy, cppcheck (more tools can be easily added, but clang-tidy is the upshot) • Push results to tmp-sonarqube, will have own fd instance later (dixit ics- in) • Integrated with docker, gitlab-CI and uses a bit of nexus • Therefore is kobe-compatible , see later • Can update existing sphinx-doc “almost for free”: tools are included in the docker image QA Sonarqube Ics-fd-qa (gitlab, nexus) Ics-fd qa webpage Sonar-scanner transmission Docker: clang 7.0.1 Calling • Code issues Python 3, Sphinx, • Proposed corrections Doxygen, Breathe, sequence • Statistics Trigger with payload Sonar-scanner, break, • Graphs • Artifact URL cmake 3, … • Project name stages: Push new sources • Control flags 1. Fetch sources URL & unpack 2. analysis: clang-tidy, cppcheck, aux results, output & logs artifacts C/C++ 3. Push to sonarqube Trigger project using sonar-scanner Pipeline Build: ADDED stage: • gitlab • all sources • Pull qa scripts from nexus • cmake are available • Analyze json-db for full • Produce a source dependencies build • Pack sources into artifact Nexus command (URL) Temp json-db • Trigger with payload repo QA calling sequence • works mostly blind.
    [Show full text]
  • A Graduate Certificate
    ADUS Technologies s.r.o. Fraňa Kráľa 2049 058 01 Poprad, Slovakia T: +421 (0) 949 407 310 | E: [email protected] www.adus-technologies.com Mykhailo Riabets [email protected] EDUCATION 2005 to The Torun secondary school (engineering) 2012 A graduate certificate 2015 to Uzhgorod National University – Faculty of Information Technologies 2019 Bachelor PROFESSIONAL EXPERIENCE 12/2018 to ADUS Technologies s.r.o., Poprad, Slovakia present ELVIS – a software product for a Swiss Customer that allows you to create powerful online applications, design services and systems for any kind of business. Utilized technologies: TypeScript, Angular 11, Sass, Git, IntelliJ IDEA, Sonarqube, JWT authentication, Rest Api, YouTrack Denta5 – an administrative system for managing clients and their orders, which allows clients to create orders containing a large number of attachments such as dental scans and dental aids. Utilized technologies: TypeScript, Angular 11, Sass, Git, IntelliJ IDEA, YouTrack Treves – internal system for organizing and synchronizing work in the company, planning the production of products and reporting on the products produced. Utilized technologies: TypeScript, Angular 11, Sass, Git, IntelliJ IDEA, YouTrack Reactoo Node.js backend – Node.js platfrom developer for UK Customer specializing in live streaming. Custom solution based on AWS Lambdas and heavy use of customized JavaScript solutions o Successful refactoring of code to use CI and Lambda layers o Backend and front-end tasks node.js + React Utilized technologies: JavaScript, AWS Lamda, Lambda Layers, ffmpeg Utimate solution – React & Angular Developer Developing internal platfrom system for corporate purposes Utilized technologies: TypeScript, Angular9, NgRx, Git, IntelliJ IDEA, React. Angular Common – Angular Developer It is an internal core library based on Angular 9 and Prime NG UI components.
    [Show full text]
  • Evaluating and Improving Software Quality Using Text Analysis Techniques - a Mapping Study
    Evaluating and Improving Software Quality Using Text Analysis Techniques - A Mapping Study Faiz Shah, Dietmar Pfahl Institute of Computer Science, University of Tartu J. Liivi 2, Tartu 50490, Estonia {shah, dietmar.pfahl}@ut.ee Abstract: Improvement and evaluation of software quality is a recurring and crucial activ- ity in the software development life-cycle. During software development, soft- ware artifacts such as requirement documents, comments in source code, design documents, and change requests are created containing natural language text. For analyzing natural text, specialized text analysis techniques are available. However, a consolidated body of knowledge about research using text analysis techniques to improve and evaluate software quality still needs to be estab- lished. To contribute to the establishment of such a body of knowledge, we aimed at extracting relevant information from the scientific literature about data sources, research contributions, and the usage of text analysis techniques for the im- provement and evaluation of software quality. We conducted a mapping study by performing the following steps: define re- search questions, prepare search string and find relevant literature, apply screening process, classify, and extract data. Bug classification and bug severity assignment activities within defect manage- ment are frequently improved using the text analysis techniques of classifica- tion and concept extraction. Non-functional requirements elicitation activity of requirements engineering is mostly improved using the technique of concept extraction. The quality characteristic which is frequently evaluated for the prod- uct quality model is operability. The most commonly used data sources are: bug report, requirement documents, and software reviews. The dominant type of re- search contributions are solution proposals and validation research.
    [Show full text]