Visualization and Analysis of Software Clones
Total Page:16
File Type:pdf, Size:1020Kb
Visualization and Analysis of Software Clones AThesisSubmittedtothe College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the degree of Master of Science in the Department of Computer Science University of Saskatchewan Saskatoon By Muhammad Asaduzzaman Muhammad Asaduzzaman, January 2012. All rights reserved. Permission to Use In presenting this thesis in partial fulfilment of the requirements for a Postgraduate degree from the University of Saskatchewan, I agree that the Libraries of this University may make it freely available for inspection. I further agree that permission for copying of this thesis in any manner, in whole or in part, for scholarly purposes may be granted by the professor or professors who supervised my thesis work or, in their absence, by the Head of the Department or the Dean of the College in which my thesis work was done. It is understood that any copying or publication or use of this thesis or parts thereof for financial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to me and to the University of Saskatchewan in any scholarly use which may be made of any material in my thesis. Requests for permission to copy or to make other use of material in this thesis in whole or part should be addressed to: Head of the Department of Computer Science 176 Thorvaldson Building 110 Science Place University of Saskatchewan Saskatoon, Saskatchewan Canada S7N 5C9 i Abstract Code clones are identical or similar fragments of code in a software system. Simple copy-paste pro- gramming practices of developers, reusing existing code fragments instead of implementing from the scratch, limitations of both programming languages and developers are the primary reasons behind code cloning. Despite the maintenance implications of clones, it is not possible to conclude that cloning is harmful because there are also benefits in using them (e.g. faster and independent development). As a result, researchers at least agree that clones need to be analyzed before aggressively refactoring them. Although a large number of state-of-the-art clone detectors are available today, handling raw clone data is challenging due to the tex- tual nature and large volume. To address this issue, we propose a framework for large-scale clone analysis and develop a maintenance support environment based on the framework called VisCad. To manage the large volume of clone data, VisCad employs the Visual Information Seeking Mantra: overview first, zoom and filter, then provide details-on-demand. With VisCad users can analyze and identify distinctive code clones through a set of visualization techniques, metrics covering different clone relations and data filtering operations. The loosely coupled architecture of VisCad allows users to work with any clone detection tool that reports source-coordinates of the found clones. This yields the opportunity to work with the clone detectors of choice, which is important because each clone detector has its own strengths and weaknesses. In addition, we extend the support for clone evolution analysis, which is important to understand the cause and effect of changes at the clone level during the evolution of a software system. Such information can be used to make software maintenance decisions like when to refactor clones. We propose and implement a set of visualizations that can allow users to analyze the evolution of clones from a coarse grain to a fine grain level. Finally, we use VisCad to extract both spatial and temporal clone data to predict changes to clones in a future release/revision of the software, which can be used to rank clone classes as another means of handling a large volume of clone data. We believe that VisCad makes clone comprehension easier and it can be used as a test-bed to further explore code cloning, necessary in building a successful clone management system. ii Acknowledgements First of all, I would like to express my heart-felt and most sincere gratitude to my respected super- visors Chanchal K. Roy and Kevin A. Schneider for their constant guidance, advice, encouragement and extraordinary patience during this thesis work. This thesis would not have been possible without them. I would like to thank Julita Vassileva, Christopher Dutchyn, and Shahedul A. Khan who have generously given their time and expertise to better my work. I am indebted to the members of the Software Research Lab for their support and inspiration. In particular, I would like to thank Minhaz Fahim Zibran, Ripon Kumar Saha, Mohammad Asif Ashraf Khan, Md. Sharif Uddin, Md. Saidur Rahman, Khalid Billah, Manishankar Mondal, and Avigit Saha. I am also grateful to Department of Computer Science, the University of Saskatchewan for their generous support through scholarship, awards and bursaries that helped me to concentrate more deeply on my thesis work. I would like to thank all of my friends and other staffmembers of the Department of Computer Science who have helped me in one way or another along the way. In particular, I would like to thank Janice Thompson, Gwen Lancaster, Maureen Desjardins, and Heather Webb. I express my gratefulness to my family members and relatives especially my mother Ayesha Akhtara Begum, my father Md. Abdul Jalil Miah, my brother Muhammad Ahasanuzzaman, my uncle Motahar Hossain, Atahar Hossain, and Mozahar Hossain. Last but no means least, I thank my friends in Bangladesh for their constant support and encouragement. In particular I would like to thank Matiur Rahman, Sazzad Hasan, Ikhtear Sharif, Ashraf Mohammed Iqbal, and Mizanur Rahman. For those that I have not listed explicitly, thank you for being a part of this thesis and helping me grow as a person and a researcher. iii This thesis is dedicated to my mother Ayesha Akhtara Begum, who always be the source of inspiration to me. iv Contents Permission to Use i Abstract ii Acknowledgements iii Contents v List of Tables viii List of Figures ix List of Abbreviations x 1 Introduction 1 1.1 Motivation................................................... 1 1.2 ProblemStatement ............................................. 2 1.3 ContributionsoftheThesis......................................... 3 1.4 Publications.................................................. 3 1.5 OutlineoftheThesis ............................................ 4 2 Background and Related Work 5 2.1 CloneTerminology.............................................. 5 2.1.1 CodeFragment ........................................... 5 2.1.2 CodeClone.............................................. 5 2.1.3 CloneTypes ............................................. 6 2.1.4 CloneRelations ........................................... 7 2.2 WhyarethereClonesinSoftwareSystems?............................... 8 2.3 Negative Impacts of Clones . 9 2.4 CloneDetection ............................................... 12 2.4.1 Text-based Approaches . 12 2.4.2 Lexical Approaches . 12 2.4.3 Graph-based Approaches . 13 2.4.4 Syntax-Tree based Approaches . 13 2.4.5 Metric-based Approaches . 13 2.4.6 Hybrids................................................ 13 2.5 Supporting Clone Analysis and Visualization . 15 2.6 CodeCloneEvolution............................................ 16 2.6.1 Clone Genealogy Model . 16 2.6.2 Clone Evolution Analysis . 18 2.6.3 Supporting Clone Evolution Comprehension . 20 2.7 Conclusion................................................... 23 3 Supporting Large Scale Clone Analysis: A Pragmatic Approach 24 3.1 Introduction.................................................. 24 3.2 A Framework for Clone Analysis . 25 3.2.1 Import Clone Candidates . 26 3.2.2 GlobalFiltering ........................................... 26 3.2.3 Categorization and Presentation of Clone Data . 27 3.2.4 Goal Selection . 27 v 3.2.5 ViewData .............................................. 27 3.2.6 Analyze and Annotate the Clone Data . 27 3.2.7 RefinetheGoal ........................................... 28 3.2.8 LocalFiltering............................................ 28 3.2.9 MakeDecision ............................................ 28 3.3 From Design to Implementation . 28 3.3.1 Supporting Multiple Clone Detectors . 30 3.3.2 Visualization of Code Clones . 31 3.3.3 Metrics Supported by VisCad . 34 3.3.4 Filtering................................................ 36 3.4 Analyzing Clones with VisCad . 38 3.4.1 Subject Systems and Research Questions . 38 3.4.2 CloneDetection ........................................... 39 3.4.3 Preparing for Analysis . 39 3.4.4 AnsweringQuestions ........................................ 40 3.5 Usability Evaluation . 44 3.6 RelatedWork................................................. 44 3.7 Conclusion................................................... 46 4 Improving Clone Evolution Comprehension: A Visual Endeavour 48 4.1 Motivation................................................... 48 4.2 Clone Genealogy Extractor . 49 4.2.1 Methodology . 49 4.2.2 TextSimilarity............................................ 51 4.2.3 CombineSimilarity ......................................... 51 4.2.4 Mapping Clone Classes . 51 4.2.5 Mapping Clone Fragments . 52 4.2.6 Automatic Change Pattern Classification . 52 4.3 RequirementIdentification ......................................... 53 4.4 Data Collection and Metrics Computed . 54 4.5 Mapping Data to Visual Entities . 54 4.5.1 Clone Class Evolution . 55 4.5.2 Clone Fragment Evolution . 56 4.5.3 Clone File Evolution . 57 4.5.4 Line-based Evolution . 58 4.5.5 Developers Cloning Profile