The Role of Duplicated Code in Software Readability and Comprehension


Master of Science in Software Engineering
September 2020

Xuan Liao
Linyao Jiang

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Software Engineering. The thesis is equivalent to 20 weeks of full time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:
Author(s):
Xuan Liao, E-mail: [email protected]
Linyao Jiang, E-mail: [email protected]

University advisor:
DEEPIKA BADAMPUDI
Department of Software engineering

Faculty of Computing, Blekinge Institute of Technology, SE–371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

Abstract

Background. Readability and comprehension are critical to software development and maintenance. Many researchers have pointed out that duplicate code, as a code smell, affects software maintainability, but there is little research on how duplicate code affects software readability and comprehension, which are parts of maintainability.

Objectives. In this thesis, we aim to briefly summarize the impact of duplicate code and the typical types of duplicate code according to current work; our goal is then to explore whether duplicate code is a factor that influences readability and comprehension.

Methods. In our research, we conducted a background survey, asking forty-two subjects a set of background questions to help us classify them, and then conducted an experiment with the subjects to explore the role of duplicate code in perceived readability and comprehension. Perceived readability and comprehension are measured by a perceived readability scale, reading time, and the accuracy of a cloze test.

Results. The experimental data show that code with duplication has higher perceived readability and better comprehension; however, the differences are not significant. Code with duplication takes less reading time than code without duplication, and this difference is significant. Duplication type is strongly associated with perceived readability. Reading time is significantly associated with duplication type and code size. There is no significant correlation between the subjects' programming experience and perceived readability or comprehension, and, according to our data, no significant relation between perceived readability and comprehension, size, and CC.

Conclusions. Code with duplication has higher software readability according to the reading-time results, a difference that is significant. Code with duplication also has higher comprehension than code without duplication, but according to our experimental results the difference is not statistically significant. Longer code increases reading time, and duplication type also influences perceived readability; the three duplication types we discussed show these relationships clearly.

Keywords: Duplicate code, Software readability, Comprehension, Experiment, Survey

Acknowledgments

We are very grateful to our supervisor DEEPIKA BADAMPUDI. Her careful guidance of our master thesis significantly improved our understanding of academic writing and taught us a lot of specific research skills.
We faced many problems with the duplicate code type classification sections. She helped us check the related literature and provided alternative solutions that guided us in the right direction for the thesis. We also want to sincerely thank the participants who sacrificed their spare time to attend our experiment with excellent cooperation.

Contents

Abstract
Acknowledgments
1 Introduction
  1.1 Background
  1.2 Defining the scope of thesis
  1.3 Outline
2 Related Work
  2.1 Readability and comprehension
  2.2 Duplicated code
  2.3 Identification of gap
3 Method
  3.1 Research Question
  3.2 Alternative method
    3.2.1 Survey
    3.2.2 Case Study
  3.3 Experiment
    3.3.1 Subjects
    3.3.2 Experiment Materials
    3.3.3 Dependent and Independent Variables
    3.3.4 Tasks
    3.3.5 Experiment Design
    3.3.6 Piloting
    3.3.7 Experiment Execution
4 Results and Analysis
  4.1 Results
  4.2 Analysis
    4.2.1 Analysis about duplicate code overall
    4.2.2 Analysis about code snippet
    4.2.3 Analysis about subjects characters
5 Discussion
  5.1 Research question
  5.2 Discuss the experiment data
  5.3 Comparison of experimental results and related literature
  5.4 What a developer should do with duplicate code
  5.5 Validity Threat
6 Conclusions and Future Work
  6.1 Conclusion
  6.2 Limitation and Future Work
References
A Background/Task-specific Questions
B Code snippet
C Cloze test

Chapter 1
Introduction

1.1 Background

Maintainability plays a significant role in the product life cycle: the most costly part of software development is product maintenance [8][21]. The main reason is that programmers cannot understand the project accurately [33]. Code comprehension has been reported to take up more than half of the life-cycle cost in software maintenance [10], and readability is a critical point of software development and maintenance [3]; both the readability of the documentation and the readability of the source code are vital to the maintainability of a software project [6]. Knight and Myers indicated that the readability of source code should be checked in the initial phase of the software inspection stage [29], which benefits maintainability, reusability, and other quality attributes. Chanchal K. Roy et al. [25] also claimed that the most time-consuming part of all maintenance activities is reading and comprehending the code. Researchers are therefore committed to studying what can affect readability and comprehension.

Various programming styles and coding guidelines have been proposed to enhance the comprehension and readability of code [7][31], and other practices and design concerns can also help, such as identifier names [18][15], code refactoring [22], and code smells [11]. However, various factors, such as the education of reviewers, context [27], the length of code snippets, and people's coding experience, are intertwined and affect each other, so it can be quite hard to analyze them individually.

1.2 Defining the scope of thesis

In our study, we are interested in the impact of duplicated code on software readability and comprehension.
Duplicate code is defined as similar code found in more than two methods, or two snippets that perform the same function with different code, in the same class or in different classes [23]. It can generally be classified into four types; the classification standards are presented in Section 3.3.2. The snippet selection standards are presented in Table 3.1.

Duplicated code is regarded as a code smell [11] because it can make code longer and incurs additional maintenance cost if one of the duplicated fragments has defects [25]. On the other hand, it helps avoid repeating earlier mistakes and decouples components so that they are independent. Some studies also provide strong empirical evidence that duplicated code has some positive impact and should not be refactored: duplicate code can be more stable in general than code without duplication and is less stable only with respect to deletions, while large duplicate code is less stable than small duplicate code [12]. Duplicate code is also often used as a development practice, and places that could cause bugs are mostly handled correctly [1]. In addition, cloning is reasonable and helpful in some situations. For example, cloning a subsystem is one way to introduce experimental modifications to a core subsystem: the code can be improved by testing it in the cloned subsystem and then introduced into the stable code base [17].

The main treatments are extract method for duplicate code in the same class and extract class for duplicate code in different classes [11]. Clone tracking is another way to handle duplicated code when refactoring is impractical [23].

In this research, software readability is defined as an inherent property of the text, and comprehension is the reader's understanding of what the text describes. Readability is the precondition of comprehensibility. The motivation for these definitions is detailed in Section 2.1.
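As a small illustration of the duplicated code and the extract method treatment described in this section, the following Python sketch shows two functions that contain near-identical logic and a refactored variant in which the shared logic is moved into one helper. This is only a sketch under stated assumptions: the pricing scenario and all function names are hypothetical and are not taken from the thesis or its experiment materials, and module-level functions stand in for methods of a class to keep the example short.

```python
# Hypothetical example: names and scenario are illustrative only.

# --- Version with duplication -------------------------------------------
def total_price_duplicated(prices):
    # Validation and summation are written out in full here ...
    total = 0.0
    for p in prices:
        if p < 0:
            raise ValueError("negative price")
        total += p
    return total


def average_price_duplicated(prices):
    # ... and copied here; only the final step differs (a near-identical clone).
    total = 0.0
    for p in prices:
        if p < 0:
            raise ValueError("negative price")
        total += p
    return total / len(prices) if prices else 0.0


# --- Version after "extract method" -------------------------------------
def _checked_sum(prices):
    # The shared validation-and-summation logic now lives in one place,
    # so a defect in it needs to be fixed only once.
    total = 0.0
    for p in prices:
        if p < 0:
            raise ValueError("negative price")
        total += p
    return total


def total_price(prices):
    return _checked_sum(prices)


def average_price(prices):
    # Returns 0.0 for an empty list, matching the duplicated version.
    return _checked_sum(prices) / len(prices) if prices else 0.0
```

Both versions behave the same for callers; what changes is where a reader has to look. The duplicated version can be read top to bottom within each function, while the refactored version is shorter but requires following the call to `_checked_sum`, which is the kind of trade-off between duplication and readability that the thesis investigates.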
We list factors that affect software comprehension