Analyzing repetitiveness in big code to support software maintenance and evolution

by

Hoan Anh Nguyen

A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Major: Computer Engineering

Program of Study Committee:
Tien N. Nguyen, Major Professor
Samik Basu
Manimaran Govindarasu
Suraj C. Kothari
Hridesh Rajan
Akhilesh Tyagi

Iowa State University
Ames, Iowa
2015
Copyright © Hoan Anh Nguyen, 2015. All rights reserved.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

ACKNOWLEDGMENTS

ABSTRACT

CHAPTER 1. INTRODUCTION

CHAPTER 2. CODE REPETITIVENESS DETECTION
  2.1 Code and Code Clones Representation
    2.1.1 Code Fragment and Clone
  2.2 Structural Similarity Measurement
    2.2.1 Structure-oriented Representation
    2.2.2 Structural Feature Selection
    2.2.3 Characteristic Vectors
    2.2.4 Vector Computing Algorithm
  2.3 Code Clone Detection
    2.3.1 Locality-sensitive Hashing
    2.3.2 Code Clone Detection Algorithm
    2.3.3 Code Clone Detection Algorithm Complexity
    2.3.4 Clone Reporting
  2.4 Empirical Evaluation
    2.4.1 Correctness
    2.4.2 Time Efficiency
    2.4.3 Scalability
    2.4.4 Clone Consistency Management
    2.4.5 Threats to Validity
  2.5 Related Work
  2.6 Conclusions

CHAPTER 3. TEMPORAL SPECIFICATION MINING
  3.1 Introduction
  3.2 Mining Multiple Object Usage Patterns
    3.2.1 Formulation
    3.2.2 Algorithm Design Strategy
    3.2.3 Detailed Algorithm
    3.2.4 Anomaly Detection
  3.3 Empirical Evaluation
    3.3.1 Pattern Mining Evaluation
    3.3.2 Anomaly Detection Evaluation
  3.4 Related Work
  3.5 Conclusions

CHAPTER 4. API PRECONDITION MINING
  4.1 Introduction
  4.2 Motivating Example
  4.3 Mining with Large Code Corpus
    4.3.1 Control Dependence and Preconditions
    4.3.2 Precondition Normalization
    4.3.3 Precondition Inference
    4.3.4 Precondition Filtering
    4.3.5 Precondition Ranking
  4.4 Empirical Evaluation
    4.4.1 Data Collection
    4.4.2 Ground-truth: Java Modeling Language (JML) Preconditions
    4.4.3 RQ1: Accuracy
    4.4.4 RQ2: Usefulness
    4.4.5 Threats to Validity
  4.5 Related Work
  4.6 Conclusions

CHAPTER 5. CHANGE REPETITIVENESS ANALYSIS
  5.1 Introduction
  5.2 Research Question and Methodology
    5.2.1 Research Question
    5.2.2 Data Collection
    5.2.3 Experimental Methodology
  5.3 Code Change Representation
    5.3.1 Illustration Example
    5.3.2 Representation
  5.4 Code Change Detection
    5.4.1 Coarse-grained Differencing
    5.4.2 Fine-grained Differencing
    5.4.3 Collecting Code Changes
  5.5 Change Database Building and Change Repetitiveness Computing
    5.5.1 Design Strategies
    5.5.2 Detailed Algorithm
  5.6 Analysis Results
    5.6.1 Boxplot Representation of General Change and Fix Repetitiveness
    5.6.2 Exponential Relationship of Repetitiveness and Size
    5.6.3 Within and Cross-project Repetitiveness Comparison
    5.6.4 Repetitiveness of Bug Fixes
    5.6.5 Repetitiveness on Individual Datasets in SourceForge and GitHub
    5.6.6 Repetitiveness on Representative Projects
    5.6.7 Repetitiveness and Change Type
    5.6.8 Threats to Validity
  5.7 Related Work
    5.7.1 Large-scale Studies on Uniqueness and Repetitiveness of Source Code
    5.7.2 Studies on Code Changes
    5.7.3 Code Clones
    5.7.4 Applications of Repetitiveness of Code Changes
  5.8 Conclusions

CHAPTER 6. CODE CHANGE SUGGESTION
  6.1 Introduction
  6.2 Problem Formulation
    6.2.1 Transactions and Tasks
    6.2.2 Fine-grained Code Change Representation
    6.2.3 Fine-grained Code Change Extraction
  6.3 Modeling Task Context with LDA
    6.3.1 Key design strategies
    6.3.2 Details on Modeling Task Context
  6.4 Change Suggestion Algorithm
    6.4.1 Task-based Similarity Measurement for Code Fragments
    6.4.2 Detailed Algorithm
  6.5 Empirical Evaluation
    6.5.1 Data Collection
    6.5.2 Evaluation Setup and Metric
    6.5.3 Parameter Sensitivity Analysis
    6.5.4 Locality of Task Context
    6.5.5 Change Suggestion Accuracy Comparison with Base Models
    6.5.6 Fix Suggestion Using Task Context
    6.5.7 Task Context versus Structural and Co-Change Contexts
    6.5.8 Case Studies
    6.5.9 Threats to Validity
  6.6 Related Work
  6.7 Conclusion

CHAPTER 7. GRAPH-BASED API ADAPTATION
  7.1 Introduction
  7.2 Motivating Examples
    7.2.1 Examples of API usage via method invocations
    7.2.2 Examples of API usage via inheritance
    7.2.3 Observations
  7.3 Client API Usage Extractor (CUE)
    7.3.1 API Usage via Invocation
    7.3.2 API Usage via Inheritance
  7.4 Usage Adaptation Miner (SAM)
    7.4.1 i-Usage Change Detection
    7.4.2 x-Usage Change Detection
    7.4.3 Usage Adaptation Pattern Mining
  7.5 Recommending Adaptations
    7.5.1 API i-Usage Adaptation Recommendation
    7.5.2 API x-Usage Adaptation Recommendation
  7.6 Evaluation
    7.6.1 Adaptation of i-Usage
    7.6.2 Adaptation of x-Usage
  7.7 Related Work
    7.7.1 Library Evolution and Client Adaptation
    7.7.2 Program Differencing and Origin Analysis
    7.7.3 API Usage Specification Extraction
    7.7.4 Empirical Studies of API Evolution
  7.8 Conclusions

CHAPTER 8. CONCLUSIONS AND FUTURE WORK
  8.1 Conclusions
  8.2 Future Work
    8.2.1 Mining Postconditions of Behavioral Interface Specifications
    8.2.2 Inferring API Specifications from Similar APIs
    8.2.3 Validating Mined Specifications
    8.2.4 Mining Bug Fixing Change Patterns

LIST OF TABLES

Table 2.1 Example of Patterns and Features: numbers are the indexes of nodes and texts are features
Table 2.2 Example of Feature Indexing. Based on the occurrence counts of features in fragment A, the vector for A is (2,1,1,1,1,2,1,1,1,2,1,1,1,1,1)
Table 2.3 Comparison on Bellon's Benchmark
Table 2.4 Comparison on Time Efficiency
Table 2.5 Time Efficiency and Scalability
Table 2.6 Clone Change Inconsistency Detection and Synchronization

Table 3.1 Pattern Mining Result (σ = 6, δ = 0.1)
Table 3.2 Anomaly Detection and Manual Checking

Table 4.1 Mined Preconditions for String.substring(int,int)
Table 4.2 Extracting preconditions for String.substring(int,int) from the usages in Figures 4.1 and 4.2
Table 4.3 Collected projects and API usages
Table 4.4 Specifications for JDK classes from JML Website
Table 4.5 JML preconditions of JDK methods in classes with full specifications
Table 4.6 Mining accuracy over preconditions
Table 4.7 Mining accuracy over methods
Table 4.8 Newly found preconditions in JML specifications
Table 4.9 Different types of incorrectly-mined preconditions
Table 4.10 Four Types of Missing Preconditions
Table 4.11 Suggesting preconditions

Table 5.1 Collected Projects and Code Changes

Table 6.1 Collected Projects and Code Changes

Table 6.2 Suggestion accuracy comparison between the model using task context and base models
Table 6.3 Change suggestion accuracy comparison between using task context and using other contexts
Table 6.4 Fix suggestion accuracy comparison between using task context and using other contexts

Table 7.1 Subject Systems
Table 7.2 Precision of API Usage Change Detection
Table 7.3 Accuracy of i-Usage Location Recommendation
Table 7.4 Accuracy of i-Usage Operations Recommendation
Table 7.5 Accuracy of x-Usage Recommendation

LIST OF FIGURES

Figure 2.1 An example of graph-based representation. Numbers are unique identities/indexes of nodes. Texts next to nodes are their labels
Figure 2.2 Three cases of adding an edge to a fragment
Figure 2.3 Code Clone Detection Algorithm and Example
Figure 2.4 Imprecise Clones Reported by Deckard
Figure 2.5 Change Inconsistency in Name in ArgoUML at revision 2453
Figure 2.6 Change Inconsistency in Structure in ArgoUML at revision 2761
Figure 2.7 Recommended Clone Synchronization for ZK at revision 9323

Figure 3.1 Pattern and Occurrences
Figure 3.2 Pattern Mining Algorithm - PattExplorer

Figure 3.3 A Violation Example. G1 and H1 are occurrences of P1. H1 violates P while G1 does not
Figure 3.4 A Common Usage Pattern Mined from AspectJ

Figure 4.1 Client code of API String.substring(int,int) in project SeMoA at revision 1929
Figure 4.2 Client code of API String.substring(int,int) in project Jmol at revision 18626
Figure 4.3 Approach Overview: We first find all methods that call each API and compute the control-flow graph for each. Then, we generate the control dependence graph to identify conditions leading to an API call. From that, we create an inverted index, then normalize each condition. We infer and merge conditions and then filter some out. The final list is ranked, giving us our result
Figure 4.4 Inferring non-strict inequality preconditions
Figure 4.5 Merging preconditions with implication
Figure 4.6 JML Specification for String.substring(int, int)
Figure 4.7 Mining accuracy as the size of dataset varies
Figure 4.8 Mining accuracy as technical components are successively added

Figure 5.1 An Example of Code Change
Figure 5.2 Tree-based Representation of Code Changes
Figure 5.3 Greedy Matching Algorithm
Figure 5.4 Tree-based Origin Analysis Algorithm
Figure 5.5 Initial Mapping and Top-Down Mapping
Figure 5.6 Extracted Code Changes for Different Heights for the Example in Figure 5.2
Figure 5.7 Algorithm for Extracting and Computing Repetitiveness
Figure 5.8 The Repetitiveness of Code Changes and Fixes over Change Size for all 5,682 Projects in the Corpus
Figure 5.9 R² of Fitted Exponential Curve to Repetitiveness over Change Size
Figure 5.10 Repetitiveness of Code Changes and Fixes over Change Size for Individual Datasets. (The horizontal and vertical axes are for the sizes of changes and the corresponding median values of repetitiveness over all projects in the dataset, respectively)
Figure 5.11 Cross-project Repetitiveness of General Changes and Fixing Changes. (The horizontal and vertical axes are for the sizes of changes and the corresponding repetitiveness values in each project, respectively)
Figure 5.12 Within-project Repetitiveness of General Changes and Fixing Changes. (The horizontal and vertical axes are for the sizes of changes and the corresponding repetitiveness values in each project, respectively)
Figure 5.13 Repetitiveness of Code Changes and Fixes over Change Type

Figure 6.1 An Example of Code Change
Figure 6.2 Tree-based Representation for the Code Change in Figure 6.1
Figure 6.3 Extracted Code Changes for the Example in Figure 6.2
Figure 6.4 LDA-based Task Context Modeling
Figure 6.5 Change Suggestion Algorithm
Figure 6.6 Parameters sensitivity analysis on the impact to the suggestion accuracy in project ONDEX
Figure 6.7 Analysis on the locality of task context
Figure 6.8 Suggestion accuracy comparison between fixing and general changes using task context
Figure 6.9 A case study in SWGAide

Figure 6.10 A case study in ONDEX

Figure 7.1 API usage adaptation in jBoss caused by the evolution of jFreeChart
Figure 7.2 API usage adaptation in jBoss caused by the evolution of OpenNMS
Figure 7.3 API usage adaptation in jBoss caused by the evolution of Axis
Figure 7.4 API usage adaptation in jBoss caused by the evolution of Axis
Figure 7.5 API i-Usage models in jBoss before and after migration to a new jFreeChart library version
Figure 7.6 API x-Usage models in jBoss before and after migration to a new Axis library version
Figure 7.7 API Usage Graph Alignment Algorithm
Figure 7.8 API Usage Changes in JasperReport
Figure 7.9 Adaptation Pattern Mining Algorithm
Figure 7.10 API Usage Change Patterns and Reference Models
Figure 7.11 Usage Adaptation Recommending Algorithm
Figure 7.12 Create NumberAxis in jFreeChart
Figure 7.13 API usage changes in Spring with respect to the evolution of Ruby

ACKNOWLEDGMENTS

First and foremost, I would like to thank my family, who have always been by my side in this eight-year-long journey. Dad, you taught me how to live my life. Mom, you showed me how to care for the ones I love. And brother, I am so proud of you, and you are an inspiration for me. I also would like to thank my friends who have always had faith in me and encouraged me even when I lost faith in myself. I would like to thank my friends in Ames and at Iowa State University. They brought me a new home with more brothers and sisters than I have ever had.

I would like to thank my advisor, Dr. Tien Nguyen. When I first came to this whole new country, he was the first one who welcomed me at the airport. He opened the door to my research career, and has continuously motivated and nurtured me. He gave me advice on all aspects of conducting research and inspired me to pursue research in my career.

I would like to thank Dr. Hridesh Rajan for his support and supervision. He has been like a second advisor to me for the last three years. His advice as a programming language expert was very valuable to the work in this dissertation.

I would like to thank my teammates and colleagues: Tung Thanh Nguyen, Nam Hoai Pham, Jafar Al-Kofahi, Anh Tuan Nguyen, Hung Viet Nguyen, and Robert Dyer. I have learned so much from them through our group discussions, brainstorming, and presentations. Their feedback was essential in forming and improving my research ideas. I will never forget the sleepless nights before paper deadlines and the moments of receiving notifications together. I am truly grateful for having had the opportunity to work and co-author papers with them.

Finally, I would like to thank the committee members for evaluating my work and giving advice on my future research.

ABSTRACT

Software systems inevitably contain a large amount of repeated artifacts at different levels of abstraction, from ideas, requirements, designs, and algorithms to implementation. This dissertation focuses on analyzing software repetitiveness at the implementation (source code) level and leveraging the derived knowledge to ease tasks in software maintenance and evolution such as program comprehension, API use, change understanding, API adaptation, and bug fixing. The guiding philosophy of this work is that, in a large corpus, code that conforms to specifications appears more frequently than code that does not, that similar code is changed similarly, and that similar code could have similar bugs that can be fixed similarly.

We have developed different representations for software artifacts at the source code level, and the corresponding algorithms for measuring code similarity and mining repeated code. Our mining techniques are based on the key insight that code conforming to programming patterns and specifications appears more frequently than code that does not; thus, correct patterns and specifications can be mined from a large code corpus. We have also built program differencing techniques for analyzing changes in software evolution. Our key insight is that similar code is likely changed in similar ways, and similar code likely has similar bugs which can be fixed similarly. Therefore, learning changes and fixes from the past can help automatically detect and suggest changes/fixes to the repeated code in software development.

Our empirical evaluation shows that our techniques can accurately and efficiently detect repeated code, mine useful programming patterns and API specifications, and recommend changes. They can also detect bugs, suggest fixes, and provide actionable insights to ease maintenance tasks. Specifically, our code clone detection tool detects more meaningful clones than other tools. Our mining tools recover high-quality programming patterns and API preconditions. The mined results have been used to successfully detect many bugs violating patterns and specifications in mature open-source systems. The mined API preconditions are shown to help API specification writers identify missing preconditions in already-specified APIs and start building preconditions for the not-yet-specified ones. The tools are scalable, analyzing large systems in reasonable time. Our study on repeated changes gives useful insights for program auto-repair tools. Our automated change suggestion approach achieves a top-1 accuracy of 45%–51%, a relative improvement of more than 200% over the base approach. For a special type of change suggestion, API adaptation, our tool is highly correct and useful.

CHAPTER 1. INTRODUCTION

Software nowadays is usually not developed from scratch. Developers tend to reuse artifacts from other components or systems in every step of software development, from ideas, requirements, designs, and algorithms to source code. This practice happens in both open-source and closed-source environments. It is due to the fact that multiple software programs could provide common functionalities or share common specifications, designs, and/or algorithms, or developers might use the same libraries/frameworks, leading to the same code usages or common programming idioms in their source code. This practice has the benefits of reducing development time and cost, shortening code, and improving the quality of the products, since the reused artifacts are usually well-studied, well-designed, well-tested, and compact. It also introduces repeated software artifacts both within a software project and across multiple software projects. When these artifacts evolve, to support new features, improve performance, fix bugs, adapt to changes in libraries/frameworks, improve readability, etc., they might also evolve in similar ways, leading to repeated changes.

This dissertation focuses on the repetitiveness of software artifacts at the source code level. Specifically, this work studies the repetitiveness of code and code changes in a large corpus. That knowledge is leveraged to recover code specifications, programming patterns, and code change patterns, which can be used to prevent bugs, suggest changes, and recommend fixes. The key ideas behind this work are: (1) in a large corpus, code that conforms to specifications and programming patterns appears more frequently than code that does not, and (2) similar code is changed similarly, and similar code could have similar bugs that can be fixed similarly.

This dissertation is organized into chapters, each of which contains a set of distinct contributions. The material in Chapters 2, 3, 4, 5, and 7 has been peer-reviewed and published in ACM- or IEEE-sponsored conferences and journals. The material in Chapter 6 is under review and has not been published yet.

Chapter 2: Code Repetitiveness Detection.¹ This chapter presents a syntax-based code clone detection approach, which is one component of our clone-aware configuration system, JSync. The core of this component is an efficient technique, called Exas, for extracting structural characteristic features from graph-based code artifacts. In Exas, a feature is either a node with its numbers of incoming and outgoing edges or a sequence of nodes along a path (of limited length). A graph is then approximated by a vector counting all those features in the graph, and graph similarity is approximated via vector distance. Using Exas vectors not only heuristically reduces the complexity of the graph comparison problem but also enables hashing in mining frequent sub-graphs, which drastically improves the running time while maintaining high accuracy and makes clone detection possible in a large code corpus.

Syntax-based code clone detection in JSync is one application of Exas where software artifacts have tree representations. Each source file is represented by its abstract syntax tree (AST), and each code fragment is represented as a sub-AST or a forest of consecutive sibling sub-ASTs. JSync first divides all fragments into small buckets by hashing their corresponding Exas vectors. Then, it compares all pairs in each bucket to find clone pairs. Finally, it links clone pairs to build clone groups and reports them. The combination of vectorizing code fragments using Exas features and hashing makes JSync achieve both correctness and time efficiency in detecting clones. In the evaluation on Bellon's benchmark, it outperforms two popular clone detection approaches, CCFinderX and Deckard. Our analysis of several open-source systems shows that when software evolves, code clones also evolve: clone pairs are changed similarly. However, they are not always changed consistently, causing delayed updates in later revisions. JSync raises the awareness of the clone relation in software evolution. It detects inconsistent changes to clone pairs and suggests synchronization.

¹ The material in this chapter has been peer-reviewed and published in our conference papers Nguyen et al. (2009a,b) and journal article Nguyen et al. (2012).

Chapter 3 and Chapter 4: Mining Code Specifications.² Specifications help developers and automated tools understand the intended behavior of software and thus enable developing high-quality and reliable software systems. Unfortunately, specifications are often missing or incomplete. In our study, only 7% of APIs in the Java Development Kit (JDK) have been specified so far. Missing specifications could lead to wrong usages of software, which would cause bugs or hinder the use of automated behavior verifiers to detect bugs early. The next two chapters introduce novel approaches for mining two kinds of code specifications.

Chapter 3 presents a graph-based approach, GrouMiner, for mining temporal specifications in object-oriented programs called object usage patterns. The temporal specifications specify which method calls, field accesses, and control structures can interact with each other, and in what order, to complete a task. GrouMiner uses graphs to represent programs and uses Exas to efficiently mine the specifications. The idea behind GrouMiner is that frequent usages (or usage patterns) are considered correct usages and thus candidates for specifications. GrouMiner found high-quality patterns in real-world open-source projects, which can be useful for program comprehension and documentation.

² The material in these chapters has been peer-reviewed and published in our conference papers Nguyen et al. (2009c, 2014b).

Chapter 4 presents a consensus-based approach for mining preconditions of APIs. Preconditions are the conditions that must hold before calling an API to guarantee its expected behavior. Preconditions are part of behavioral interface specifications. Our approach mines those conditions by analyzing the guard conditions before the call sites of the API in client code. Our approach is the first one that analyzes conditions across a large number of projects. If a guard condition is checked frequently in the majority of the API's call sites, it is considered a correct precondition. Project-specific conditions are eliminated because they are checked in only one or a small number of projects.

Using the mined specifications, we developed techniques for detecting buggy code that violates those specifications in mature open-source projects. Some bugs had not been identified by the developers. A research group led by Sunghun Kim at The Hong Kong University of Science and Technology used GrouMiner to analyze bug-fixing changes and came up with a set of bug-fixing templates for object-oriented programs.

Chapter 5 and Chapter 6: Code Change Repetitiveness. These two chapters aim to answer two research questions: (1) how repetitive code changes and bug fixes are in software evolution, and (2) how useful repeated and previously-seen changes and bug fixes are in suggesting future changes and fixes. Our study is the first to look at change repetitiveness at the syntax level and the first carried out on a large-scale code corpus of 5,682 open-source Java projects from the two largest hosting services, SourceForge and GitHub. In Chapter 5,³ we have developed a framework for differencing Java programs between revisions at both the coarse-grained (i.e., packages, classes/interfaces, and methods) and fine-grained (i.e., statements and expressions within methods) levels. Our findings provide useful insights and actionable implications for other research in fix recommendation and automatic program repair. In Chapter 6, we try several models using repeated and previously-seen changes and fixes to suggest future changes/fixes. This work is the first to apply topic modeling to code changes and to use topics to measure code change similarity and suggest changes/fixes. This model significantly improves the suggestion quality over other base models. It achieves an accuracy of 45%–51% for top-1 suggestions.

³ The material in this chapter has been peer-reviewed and published in our conference paper Nguyen et al. (2013b).

Chapter 7: LibSync, Graph-based API Adaptation.⁴ This chapter deals with a special case of software changes in which the code using APIs (client code) needs to be changed according to the change(s) in the code of the APIs' library. There are many reasons for changes in libraries, such as adding new features, fixing bugs, improving performance, refactoring, etc. The result could be adding new APIs, removing old APIs, changing APIs' signatures, deprecating APIs, etc., which could lead to changes in the usages of certain APIs. If a user updates the library, his/her existing code could be broken. Sometimes, the code still compiles but does not behave as expected. We approach this problem using the same principle stated in the previous chapters: learning from already-seen adaptations, since similar usages are adapted similarly. Different from existing approaches, LibSync represents usages as graphs and adaptations as graph transformation operations. This representation enables LibSync to capture not only 1-to-1 adaptations but also many-to-many adaptations. Using graphs, LibSync can also include the surrounding code of the usage adaptation as the context of the adaptation to improve the quality of recommendations. The evaluation of LibSync on real-world software systems shows that it is highly correct and useful, with a precision of 100% and a recall of 91%.

⁴ The material in this chapter has been peer-reviewed and published in our conference paper Nguyen et al. (2010a).

CHAPTER 2. CODE REPETITIVENESS DETECTION

Recent research has pointed out that software systems inevitably contain a large amount of repeated code, accounting for up to 30% of the total code Kamiya et al. (2002); Kim et al. (2005a), mostly due to the copy-and-paste programming practice, framework-based development, or design patterns. These repeated code fragments, called code clones, create several difficulties in software maintenance and affect software quality. For example, many bugs occur due to inconsistent modifications made to cloned code Jiang et al. (2007b). These bugs could go unnoticed for a long time, reducing the integrity and quality of the software. With such a large percentage of the code in software systems, code clones are a significant concern in software development and maintenance.

Conventional approaches assume clones to be harmful and thus emphasize clone detection and removal by unifying and refactoring them. However, recent research results have shown that unifying cloned code might be inefficient and inconvenient Rajapakse and Jarzabek (2007). For example, unifying cloned blocks into a function and replacing the cloned code with function calls could increase running time Aversano et al. (2007). Additionally, in some situations, developers maintain code clones and evolve them into distinct code over time Kim et al. (2005a).

In this chapter, we introduce JSync, a tool for Java code clone detection. JSync models source code in a software system in terms of tree-based, logical entities, such as classes, methods, and statements. A source code file is represented via an abstract syntax tree (AST), and a code fragment is represented as a sub-AST. Two code fragments are considered code clones if they are sufficiently similar. Their similarity is measured based on the distance between the characteristic vectors extracted from their corresponding tree structures. JSync maintains the clone relations among cloned fragments via a clone graph, in which each cloned fragment is represented as a node, and the nodes of two fragments that are clones of each other are connected by an edge. A clone group is a connected component in the clone graph.

We conducted empirical experiments on Bellon's clone benchmark Bellon et al. (2007) and several other open-source software systems to evaluate the correctness and performance of our prototype tool. Running on Bellon's benchmark Bellon et al. (2007), JSync achieves higher recall than CCFinderX CCFinderX and Deckard Jiang et al. (2007a), two popular clone detection tools. The tool is also scalable and efficient. For example, it is able to detect cloned code in systems of hundreds of thousands of LOC in less than one minute. We used JSync to track clone relations in software evolution. We found that code clones are changed similarly. However, they are not always changed consistently, which causes delayed updates in later revisions. JSync raises the awareness of the clone relation in software evolution. It detects inconsistent changes to clone pairs and suggests synchronization.

Section 2.1 describes the representation of code and clones. Section 2.2 presents our technique for extracting characteristic features that capture the structures of code fragments and measuring the similarity between code fragments. Section 2.3 describes the algorithm for detecting clones. Section 2.4 presents our empirical evaluation. Related work is in Section 2.5. Conclusions appear last.

2.1 Code and Code Clones Representation

In JSync, a software system is considered a collection of source files. Each source file corresponds to a logical entity called a compilation unit. Each compilation unit is represented as an Abstract Syntax Tree (AST), in which each node represents a program entity such as a class, method, statement, or expression. The parent-child relation of the nodes represents the containment relation between them. The attributes of those nodes represent the properties of the corresponding entities, such as Name and Type. JSync models a system as a forest of the ASTs of all the source files.

2.1.1 Code Fragment and Clone

In JSync, a fragment corresponds to one program entity or a collection of program entities (e.g., a statement, method, or class) that is of user interest in clone management. Since JSync views code as ASTs, it considers a fragment to be either a subtree of an AST or an ordered sequence of subtrees under the same parent node of type block in the AST.

for (int i = 0; i < n; i++) {
    total.nBytes += a[i].nBytes;
    total.more = a[i].more;
}

For example, the above code contains four fragments: two fragments for the single assignments inside the for loop (one subtree each), one fragment for the two consecutive assignments (an ordered sequence of two subtrees), and one fragment for the entire for statement (including its body). JSync does not merge subtrees from different blocks of code to form a fragment because they belong to different scopes and might not form a meaningful fragment. For example, JSync does not build a fragment that starts with the last statement inside a for loop and ends with the statement right after that loop.

Users can define the criteria for the fragments of interest. For example, users are generally not interested in very small cloned fragments. Thus, they can require fragments to have sizes larger than a chosen threshold, where the size of a fragment is defined as the number of AST nodes in the corresponding subtree(s) of the fragment. Users are also able to exclude generated source files or to annotate portions of code that are generated or boilerplate (e.g., getters/setters). JSync either ignores them entirely when building fragments (the same handling applies to comments and Javadoc) or skips building the corresponding fragment(s) but still uses the features extracted from them when building other fragments.

Fragments can be copied, pasted, and sometimes modified, thus producing code clones. Since it views code as subtrees of ASTs, JSync considers cloned code to have similar structures. It defines two fragments as a clone pair if their structural similarity, measured by a similarity measurement, exceeds a pre-defined threshold. Those fragments are called cloned fragments (or clones for short). Since a fragment might be cloned several times, managing and reporting related cloned fragments in groups is more beneficial than managing and reporting individual clone pairs. For example, if a clone is modified, it is better for a developer to check all other clones in its group for consistent modifications. To support grouping cloned fragments and modeling the changes to the clone relation between code fragments, JSync represents the clones and clone pairs in a software system as a clone graph, in which each node represents a cloned fragment and each edge represents a clone pair. JSync then considers a connected component of that graph as a clone group. Thus, the clone graph and clone groups represent the clone relation among the code fragments.

2.2 Structural Similarity Measurement

The similarity between two trees (or graphs) is commonly measured by their edit distance. Two fragments whose edit distance is small enough (i.e., smaller than a chosen threshold) can then be considered clones. However, calculating the tree/graph edit distance for all pairs of code fragments to detect clones is computationally expensive, making clone detection unscalable for large systems. This section presents Exas, an efficient structural characteristic feature extraction technique that approximately captures the structures within the graph-based representation of code artifacts.

2.2.1 Structure-oriented Representation

In our structure-oriented representation approach, a software artifact is modeled as a labeled, directed graph (a tree is a special case of a graph), denoted as G = (V, E, L). V is the set of nodes, in which a node represents an element within an artifact. E is the set of edges, in which each edge between two nodes models their relationship. L is a function that maps each node/edge to a label that describes its attributes. For example, for ASTs, node types could be used as node labels. For program dependence graphs (PDGs), program statements and control points could be used for labeling. Other attributes could also be encoded within labels. In existing clone detection approaches, labels for edges are rarely explored. However, for general applicability, Exas supports labels for both nodes and edges. The purpose of clone detection is to find cloned parts in software artifacts. Potential cloned parts in a software artifact are called fragments. In our approach, a fragment within a tree-based software artifact is considered a subtree of the representation tree. For a graph-based software artifact, a fragment is considered a weakly connected sub-graph of the corresponding representation graph. Figure 2.1 shows a graph-based representation example containing two cloned fragments A and B.

2.2.2 Structural Feature Selection

Exas focuses on two kinds of patterns capturing the structure of graphs, called (p,q)-nodes and n-paths. A (p,q)-node is a node having p incoming and q outgoing edges. Given a node, the values of p and q might differ depending on the fragment being examined. For example, node 9 in Figure 2.1 is a (3,1)-node if the entire graph is considered as a fragment, but is a (2,0)-node if fragment A is examined. An n-path is a directed path of n nodes, i.e., a sequence of n nodes in which any two consecutive nodes are connected by a directed edge in the graph. A 1-path contains only one node. The structural feature of a (p,q)-node is the label of the node along with the two numbers p and q. For example, node 6 in fragment A is a (2,1)-node and gives the feature M-2-1. The structural feature of an n-path is the sequence of labels of the nodes and edges on the path. For example, the 3-path 1-5-9 gives the feature I-G-S. Table 2.1 lists all patterns and features extracted from A and B. It shows that both fragments have the same feature set and the same number of occurrences of each feature. Later, we will show that this holds for all isomorphic fragments.

2.2.3 Characteristic Vectors

Intuitively, identical or similar fragments will have identical or similar feature sets, respectively. An efficient way to express this property is to use vectors. The characteristic vector of a fragment is the occurrence-count vector of its features. That is, each position in the vector is indexed for a feature, and the value at that position is the number of occurrences of that feature in the fragment. Table 2.2 shows the indexes of the features, which are global across all vectors, and their occurrence counts in fragment A.


Figure 2.1: An example of graph-based representation. Numbers are unique identities/indexes of nodes. Texts next to nodes are their labels.

Table 2.1: Example of Patterns and Features: numbers are the indexes of nodes and texts are features.

Pattern      Features of fragment A                                   Features of fragment B
1-path       nodes 1, 2, 5, 6, 9: I, I, G, M, S                       nodes 4, 3, 8, 7, 11: I, I, G, M, S
2-path       1-5, 1-6, 2-6, 6-9, 5-9: I-G, I-M, I-M, M-S, G-S         4-8, 4-7, 3-7, 7-11, 8-11: I-G, I-M, I-M, M-S, G-S
3-path       1-5-9, 1-6-9, 2-6-9: I-G-S, I-M-S, I-M-S                 4-8-11, 4-7-11, 3-7-11: I-G-S, I-M-S, I-M-S
(p,q)-node   nodes 1, 2, 5, 6, 9: I-0-2, I-0-1, G-1-1, M-2-1, S-2-0   nodes 4, 3, 8, 7, 11: I-0-2, I-0-1, G-1-1, M-2-1, S-2-0

Two fragments having the same feature sets and occurrence counts will have the same vectors and vice versa. The vector similarity can be measured via a chosen vector distance such as 1-norm distance.

sim(u, v) = 1 - \frac{\|u - v\|_1}{(\|u\|_1 + \|v\|_1)/2}

The normalization (division) is used to tolerate differences between fragments' sizes. Note that 1-paths are equivalent to the 1-level binary subtrees used in Deckard Jiang et al. (2007a). Therefore, the Deckard vector of a fragment is a part of the Exas vector for that fragment. For example, the Deckard vector for fragment A would be (2,1,1,1). In other words, Exas uses more features. This implies that the Exas vector distance has a better discriminative characteristic, i.e., it is more accurate in measuring the fragments' similarity. The same holds when applying Exas to the tree structures of source code ASTs. For example, Exas can better approximate the nesting and sequential structures of program elements.
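To make the vector and similarity definitions concrete, the following minimal Java sketch stores a characteristic vector as a feature-to-count map and computes the normalized 1-norm similarity defined above. The class and method names are illustrative only and do not reflect JSync's actual implementation.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch (not JSync's code): a characteristic vector as a
// feature -> occurrence-count map, with the normalized 1-norm similarity
// sim(u, v) = 1 - ||u - v||_1 / ((||u||_1 + ||v||_1) / 2).
class ExasVector {
    final Map<String, Integer> counts = new HashMap<>();

    void add(String feature) {
        counts.merge(feature, 1, Integer::sum);         // count one more occurrence
    }

    int size() {                                        // ||v||_1: total feature occurrences
        int s = 0;
        for (int c : counts.values()) s += c;
        return s;
    }

    static int l1Distance(ExasVector u, ExasVector v) { // ||u - v||_1
        Set<String> features = new HashSet<>(u.counts.keySet());
        features.addAll(v.counts.keySet());
        int d = 0;
        for (String f : features)
            d += Math.abs(u.counts.getOrDefault(f, 0) - v.counts.getOrDefault(f, 0));
        return d;
    }

    static double similarity(ExasVector u, ExasVector v) {
        return 1.0 - l1Distance(u, v) / ((u.size() + v.size()) / 2.0);
    }
}

For fragments A and B in Table 2.1, the two vectors are identical, so l1Distance would be 0 and similarity would be 1.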

2.2.3.1 Analytical Study

Consider two (sub)graphs G and G′. Let V and V′ be their vectors, respectively, d be the maximum degree over all nodes of all (sub)graphs, and N be the maximum size of all n-paths. It is easy to verify the following Lemma 2.1.

Table 2.2: Example of Feature Indexing. Based on the occurrence counts of features in fragment A, the vector for A is (2,1,1,1,1,2,1,1,1,2,1,1,1,1,1).

Feature  Index  Counts     Feature  Index  Counts     Feature  Index  Counts     Feature  Index  Counts
I        1      2          I-G      5      1          I-G-S    9      1          G-1-1    13     1
G        2      1          I-M      6      2          I-M-S    10     2          M-2-1    14     1
M        3      1          G-S      7      1          I-0-1    11     1          S-2-0    15     1
S        4      1          M-S      8      1          I-0-2    12     1

Lemma 2.1 The number of n-paths containing a given node is at most P = \sum_{n=1}^{N} n \cdot d^{\,n-1}, and the number of n-paths containing a given edge is at most Q = \sum_{n=2}^{N} n \cdot d^{\,n-2}.
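As a small illustration of these bounds (the values d = 2 and N = 4 are chosen arbitrarily for this example and are not taken from the chapter):

P = \sum_{n=1}^{4} n \cdot 2^{\,n-1} = 1 + 4 + 12 + 32 = 49, \qquad Q = \sum_{n=2}^{4} n \cdot 2^{\,n-2} = 2 + 6 + 16 = 24.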

For brevity, let us call n-paths and (p,q)-nodes instances. Let S and S′ be the sets of instances of G and G′, respectively. If G is edited to become G′, S′ is updated accordingly from S by removing and/or inserting some instances. Let us call those removed and inserted instances "affected instances".

Lemma 2.2 k graph editing operations affect at most (2P + 4)k instances.

Proof. We consider four types of graph editing operations: removing an edge, inserting an edge, relabeling a node, and relabeling an edge. Removing (inserting) an edge removes (inserts) all n-paths containing it and replaces two (p, q)-nodes at its two ends with two new (p, q)-nodes. This replacement affects four instances. Thus, the total number of affected instances is at most Q + 4. Relabeling a node replaces its corresponding (p, q)-node and all n-paths containing it with new instances, thus, affects at most 2 + 2P instances. Similarly, relabeling an edge affects at most 2Q instances since no (p, q)-node is affected. In all cases, the total number of affected instances is at most 2P + 4. Therefore, k editing operations affect at most (2P + 4)k instances.

Lemma 2.3 If there are M affected instances, then \|V - V′\|_1 \le M.

Proof: If an instance is removed (inserted), the occurrence count of its feature decreases (increases) by one. Since there are M affected instances, V′ is obtained from V by a total of M such unit increments and/or decrements. Since \|V - V′\|_1 is the total difference in occurrence counts between V and V′, it is at most M. Lemmas 2.2 and 2.3 imply the following theorem.

Theorem 2.4 If the graph edit distance of G and G′ is k, then \|V - V′\|_1 \le (2P + 4)k.

We could consider two isomorphic graphs as having an edit distance of zero. Therefore, applying Theorem 2.4, we have the following corollary.

Corollary 2.5 If G and G′ are isomorphic, they have the same vector, i.e., V = V′.

These results can be applied directly to (sub)trees. However, tree edit distance can be defined over a different set of operations. The following results are for the case in which G and G′ are (sub)trees and the tree editing operations include relabeling, inserting, and deleting a node.

Lemma 2.6 The number of n-paths containing a given node in a (sub)tree is at most R = \sum_{n=1}^{N} \sum_{i=1}^{n} d^{\,i-1}.

Lemma 2.6 can be verified similarly to Lemma 2.1, noting that a node in a tree has exactly one incoming edge, except for the root node, which has none.

Lemma 2.7 k tree editing operations affect at most (2R + 3)k instances.

Proof: Relabeling a node affects at most 2R + 2 instances (see the proof of Lemma 2.2). Removing a node u, i.e., connecting all its children to its parent v, removes all n-paths containing u, inserts some n-paths containing v, replaces the (p,q)-node at v with a new one, and removes the (p,q)-node at u. Thus, the total number of affected instances is at most 2R + 3. A similar argument applies to the case of inserting a node u. Therefore, a single tree editing operation affects at most 2R + 3 instances; thus, k operations affect at most (2R + 3)k instances. From Lemmas 2.6 and 2.7, we have the following theorem.

Theorem 2.8 If the tree edit distance of G and G′ is k, then \|V - V′\|_1 \le (2R + 3)k.

2.2.3.2 Implications in Clone Detection

The aforementioned properties of Exas characteristic vectors imply that they are very useful in problems involving graph isomorphism or tree/graph similarity, especially in structure-oriented clone detection. State-of-the-art graph-based clone detection approaches Deissenboeck et al. (2008); Komondoor and Horwitz (2001) require graph-based cloned fragments to be isomorphic. With Corollary 2.5, instead of checking the isomorphism of two (sub)graphs, we could compare their Exas characteristic vectors to find cloned (sub)graphs. The corollary guarantees that all clone pairs will be detected. However, it is not a sufficient condition for absolute precision; i.e., two (sub)graphs with the same vectors might be non-isomorphic, since nodes and edges that cannot be mapped between the two (sub)graphs can still make up n-paths or (p,q)-nodes with the same feature. Other criteria should be used along with Exas to increase the precision of the detected results.

(a) Adding an incoming edge. (b) Adding an outgoing edge. (c) Adding a connecting edge.

Figure 2.2: Three cases of adding an edge to a fragment.

Theorem 2.4 shows that our approach is also useful for problems involving graph edit distance, such as similarity-based clone detection in graph-based representations or graph similarity measurement. Theorem 2.8 is useful for clone detection approaches based on tree edit distance, i.e., approaches in which two tree-based fragments are considered clones if their edit distance is smaller than a chosen threshold k. For a set of fragments, we can always find a common value R for any two fragments. Then, the vector distance of any two cloned fragments will be less than (2R + 3)k. In other words, the 1-norm distance of Exas characteristic vectors could be used as a necessary condition: to be a clone pair, two fragments must have a vector distance smaller than the threshold δ = (2R + 3)k. Of course, a small vector distance does not imply a small tree edit distance, i.e., this condition is not sufficient.
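For illustration only, instantiating this threshold with d = 2 and N = 4 (values chosen for this example, not prescribed by the chapter):

R = \sum_{n=1}^{4} \sum_{i=1}^{n} 2^{\,i-1} = 1 + 3 + 7 + 15 = 26,

so for a chosen tree edit distance threshold of k = 2, the corresponding vector distance threshold would be \delta = (2R + 3)k = 55 \cdot 2 = 110.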

2.2.4 Vector Computing Algorithm

In this section, we describe an efficient algorithm to calculate Exas vectors from the structure-oriented representation. The key idea is that the characteristic vector of a fragment is calculated from the vectors of its sub-fragments. A node is the smallest fragment, and its vector is calculated directly.

2.2.4.1 Key Computation Operation: incrVector

The key operation in our algorithm, incrVector, is the computation of the vector for a fragment g = f + e (i.e., g is built from f by extending f with an edge e), given e and f (along with its vector) as inputs. In brief, the vector of g is derived from that of f by updating it with the occurrences of all new features of g created by the addition of the edge e to f. Since we consider only weakly connected components as fragments, at least one node of e must belong to f. Let e = (u, v). There are three cases:

Case 1: incoming edge (Figure 2.2a), i.e., u ∉ f and v ∈ f. In this case, u is a newly added node. New features are created from the 1-path u, the 2-path u-v, and the new (0,1)-node at u. The (x,y)-node at v is replaced by a new (x+1,y)-node because of the new incoming edge. All new n-paths of g have u as their first node and v as their second node. Therefore, they are generated by prepending u to all (n-1)-paths starting from v. These (n-1)-paths can be obtained by a depth-first search (DFS) within fragment g from node v to a depth of n-2.

Case 2: outgoing edge (Figure 2.2b), i.e., u ∈ f and v ∉ f. The situation is similar. However, new n-paths are generated from the (n-1)-paths ending at u, i.e., the DFS needs to expand in the backward direction. Furthermore, the (x,y)-node at u is replaced by a new (x,y+1)-node.

Case 3: connecting edge (Figure 2.2c), i.e., both u and v were already in f. In this case, new n-paths are generated by combining any i-path ending at u (DFS in the backward direction) with any j-path starting from v (DFS in the forward direction), for all i + j = n. Both (x,y)-nodes at u and v are replaced by new (x′,y′)-nodes.
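The following Java sketch illustrates the incoming-edge case (Case 1) on a small adjacency-list representation. The graph encoding, the feature strings, and the bound N = 4 are simplifying assumptions made for this example; they are not JSync's actual data structures, and the fragment's existing nodes are assumed to already have label and degree entries.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of incrVector, Case 1: adding an edge u -> v where u is new and v is in f.
// Features are encoded as strings: "label-p-q" for (p,q)-nodes and
// "l1-l2-...-ln" for n-paths; N is the maximum n-path length considered.
class IncrVectorSketch {
    static final int N = 4;

    Map<Integer, String> label = new HashMap<>();
    Map<Integer, List<Integer>> successors = new HashMap<>();
    Map<Integer, Integer> inDeg = new HashMap<>();
    Map<Integer, Integer> outDeg = new HashMap<>();
    Map<String, Integer> vector = new HashMap<>();

    void bump(String feature, int delta) {
        vector.merge(feature, delta, Integer::sum);
    }

    void addIncomingEdge(int u, int v, String uLabel) {
        // the (x,y)-node at v becomes an (x+1,y)-node
        bump(label.get(v) + "-" + inDeg.get(v) + "-" + outDeg.get(v), -1);
        inDeg.merge(v, 1, Integer::sum);
        bump(label.get(v) + "-" + inDeg.get(v) + "-" + outDeg.get(v), +1);

        // register the new node u, its 1-path feature, and its (0,1)-node feature
        label.put(u, uLabel);
        inDeg.put(u, 0);
        outDeg.put(u, 1);
        successors.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
        successors.computeIfAbsent(v, k -> new ArrayList<>());
        bump(uLabel, +1);
        bump(uLabel + "-0-1", +1);

        // every new n-path (n >= 2) starts with u then v: prepend u to each
        // (n-1)-path starting at v, enumerated by a DFS of depth at most N-2
        countPathsFrom(v, uLabel + "-" + label.get(v), N - 2);
    }

    void countPathsFrom(int node, String pathFeature, int remainingHops) {
        bump(pathFeature, +1);
        if (remainingHops == 0) return;
        for (int next : successors.getOrDefault(node, List.of()))
            countPathsFrom(next, pathFeature + "-" + label.get(next), remainingHops - 1);
    }
}

The outgoing-edge and connecting-edge cases would follow the same pattern, with the DFS running backward from u (Case 2) or in both directions from u and v (Case 3).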

2.2.4.2 Time Complexity and Improvement

Assume that d is the maximum degree of the nodes and N is the maximum length of the n-paths of interest. The number of n-paths searched by DFS is O(d^{N-2}) in all three cases (in the third case, two DFSs from u and v to level n-2 are sufficient to find all those i-paths and j-paths). This seems to be exponential. However, instead of extracting features from n-paths of all sizes, we extract features only from short n-paths, i.e., n-paths having at most N nodes. This gains much efficiency while losing little precision. In our experiments in Nguyen et al. (2009a), in most subject systems, N = 4 gives a precision of almost 100%. Moreover, in practice, representation graphs are generally not very dense, i.e., d is small. Thus, O(N · d^{N-2}) is indeed not very time-consuming.

2.2.4.3 Vector Computation for All Fragments in a Graph

Using the incrVector operation, Exas calculates the vector of any individual fragment by starting from one of its nodes, adding one of its edges, computing the vector, and so on. Thus, the time complexity of computing the vector for a fragment is O(m · N · d^{N-2}), with m being the fragment's size, i.e., its number of edges. For the clone detection problem, the goal is to calculate the vectors for all potential cloned fragments in a graph. Generating all of its sub-graphs and then calculating their vectors as for individual fragments would not take advantage of the incrVector operation. A more efficient approach is to generate the fragments in increasing order of size, adding one edge at a time, and then to calculate the vector for the larger fragment from the vectors of the smaller ones. However, the number of sub-graphs of a graph is exponential in its size. To increase efficiency, if graph isomorphism is used as the clone condition, we can take advantage of the following fact: to be a clone, a fragment should contain a smaller cloned fragment, i.e., two large isomorphic graphs should contain two smaller isomorphic sub-graphs.

Let C_k be the set of all cloned fragments of size k (i.e., with k edges). Observe that: (1) every fragment of size k can be generated from a fragment of size k-1 by adding a relevant edge; and (2) if two fragments of size k are a clone pair (isomorphic), there exist two cloned fragments of size k-1 within them, i.e., every clone pair of C_k can be generated from a clone pair of C_{k-1}.

Those facts imply that C_k can be generated from C_{k-1} by the following steps: (1) extending all cloned fragments in C_{k-1} by one edge to obtain a candidate set D_k, (2) calculating vectors for all fragments in D_k by the incrVector operation, (3) grouping D_k into clone groups by characteristic vectors (i.e., all fragments in a group must have the same characteristic vector), and (4) adding only the cloned fragments in D_k into C_k. By gradually generating C_0, C_1, C_2, ..., C_k, ..., we can find all cloned fragments precisely and completely. Note that this strategy significantly reduces the time complexity of fragment generation and vector computation for sparse graphs. In the worst case (such as for a complete graph), the time complexity is still exponential. More details can be found in Pham et al. (2009).

2.2.4.4 Vector Calculation for All Fragments in a Tree

For tree-based representations, a fragment is represented by a subtree. Since each node is the root of a subtree, i.e., each fragment corresponds to a node, the generation process is not needed. To compute the vectors for all subtrees in a tree, Exas traverses it in post-order. When the root p of a subtree T(p) is visited, the vector for T(p) is calculated as follows. Assume that c_1, c_2, ..., c_k are the children of p, connected to p by edges e_1, e_2, ..., e_k. Because of the post-order traversal, the vectors of the subtrees T(c_1), T(c_2), ..., T(c_k) have already been calculated. Adding edge e_i to subtree T(c_i) using the incrVector operation gives the vector V_i for each branch. Then, the vector of T(p) is derived from all the vectors V_1, V_2, ..., V_k and the (p,q)-node at p. With this strategy, the time to compute the vectors for all fragments of a tree is just O(m · N · d^{N-2}).
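A self-contained Java sketch of this post-order computation is shown below. It assumes directed parent-to-child edges without edge labels and encodes features as plain strings, which is a simplification of the scheme described above rather than JSync's implementation.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: post-order computation of Exas vectors for every subtree of an AST-like tree.
class TreeNode {
    String label;
    List<TreeNode> children = new ArrayList<>();
    Map<String, Integer> vector;    // characteristic vector of the subtree rooted here

    TreeNode(String label, TreeNode... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }
}

class TreeExas {
    static final int N = 4;         // maximum n-path length

    static void bump(Map<String, Integer> v, String f, int d) { v.merge(f, d, Integer::sum); }

    // post-order: compute children first, then derive the parent's vector from theirs
    static void computeVectors(TreeNode p) {
        for (TreeNode c : p.children) computeVectors(c);
        Map<String, Integer> v = new HashMap<>();
        for (TreeNode c : p.children) {
            c.vector.forEach((f, cnt) -> bump(v, f, cnt));       // reuse child subtree vectors
            bump(v, c.label + "-0-" + c.children.size(), -1);    // child was a root in its own subtree...
            bump(v, c.label + "-1-" + c.children.size(), +1);    // ...but has one incoming edge inside T(p)
        }
        bump(v, p.label + "-0-" + p.children.size(), +1);        // (p,q)-node feature of the root p
        addPathsFrom(p, p.label, N, v);                          // all downward paths starting at p
        p.vector = v;
    }

    // count the path features p, p-c, p-c-g, ... of up to 'remaining' nodes
    static void addPathsFrom(TreeNode node, String pathFeature, int remaining, Map<String, Integer> v) {
        bump(v, pathFeature, +1);
        if (remaining == 1) return;
        for (TreeNode c : node.children)
            addPathsFrom(c, pathFeature + "-" + c.label, remaining - 1, v);
    }
}

Calling TreeExas.computeVectors(root) fills in a vector for every subtree, so each fragment's vector is available for the subsequent clone detection step.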

2.2.4.5 Vector Indexing and Storing

The number of potential features is huge. For example, if the number of different node labels is L_v and that of edge labels is L_e, the total number of potential features generated from all n-paths of size no longer than N is \sum_{n=1}^{N} L_v^{n} \cdot L_e^{n-1}. However, in practice, the number of features actually encountered in a given graph model of a software artifact is much smaller, because not all combinations of nodes and edges make sense with respect to the semantics of the elements in the artifact.

Our experiments confirmed this fact. We conducted an experiment with WCD-MALIB, a real-world model-based system with 388 nodes and L_v = 107 labels (L_e = 0). With N = 4, the number of features actually encountered in the whole model is only 381, although the number of all possible features is more than 107^4. More importantly, most fragments do not contain all features, especially small fragments. Thus, the characteristic vectors are often sparse. To efficiently process sparse characteristic vectors, a hashmap H is used to map between features and their indexes (i.e., their positions in the vectors for storing their occurrence counts). H is global and used for all vectors. During vector calculation, if a feature has never been encountered before, it is added into H with the next available index. The vector of a fragment is also stored as a hashmap that maps the index of each feature to its corresponding count in the vector.
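A minimal sketch of this indexing scheme is given below, assuming String feature names; the class names are illustrative and not JSync's API.

import java.util.HashMap;
import java.util.Map;

// Global feature-to-index map: the first time a feature is seen, it receives
// the next available index. Each fragment's vector stores only the indexes of
// the features it actually contains, keeping the vectors sparse.
class FeatureIndexer {
    private final Map<String, Integer> featureToIndex = new HashMap<>();

    int indexOf(String feature) {
        return featureToIndex.computeIfAbsent(feature, f -> featureToIndex.size());
    }
}

class SparseVector {
    private final Map<Integer, Integer> counts = new HashMap<>(); // feature index -> occurrence count

    void add(FeatureIndexer indexer, String feature) {
        counts.merge(indexer.indexOf(feature), 1, Integer::sum);
    }

    int get(int featureIndex) {
        return counts.getOrDefault(featureIndex, 0);
    }
}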

2.3 Code Clone Detection

To find the clone pairs and construct the clone graph, one could perform pairwise comparisons on all fragments in the software system. Those pairs would then be used to form the clone graph and its clone groups. However, because a system usually has a large number of fragments, pairwise comparison is not efficient. Thus, JSync compares only the fragments that are likely to be clones, i.e., those having similar vectors. To find those fragments, it uses locality-sensitive hash functions Andoni and Indyk.

2.3.1 Locality-sensitive Hashing

A locality-sensitive hashing (LSH) function is a hash function for vectors such that the probability that two vectors have the same hashcode is a strictly decreasing function of their distance Andoni and Indyk. In other words, vectors at a smaller distance have a higher probability p of having the same hashcode, and vice versa. Then, if we use locality-sensitive hash functions to hash the fragments into buckets based on the hashcodes of their vectors, fragments having similar vectors tend to be hashed into the same buckets, and the other ones are less likely to be so. To increase the probability that two similar vectors u and v are hashed to the same buckets, JSync uses N independent hash functions and hashes each vector to N corresponding buckets. Then, if two vectors are missed by one hash function, they still have a chance to be hashed together by the others. Indeed, the probability that u and v are missed by all N functions, i.e., have all-different hashcodes, is (1 - p)^N. If N is sufficiently large, this probability approaches zero. That is, there is a high probability that u and v will be hashed into the same bucket.
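The chapter does not specify which LSH family JSync instantiates, so the sketch below uses one standard family for 1-norm distance (p-stable hashing with Cauchy-distributed projections); the parameter names and the way the function index is mixed into the hashcode are illustrative assumptions only.

import java.util.Random;

// Sketch of one LSH function for 1-norm distance: project the vector onto a
// Cauchy-distributed direction, shift by a random offset, and quantize into
// buckets of width w. Closer vectors fall into the same bucket more often.
class L1LshFunction {
    private final double[] a;      // Cauchy-distributed projection vector
    private final double b;        // random offset in [0, w)
    private final double w;        // bucket width
    private final int functionIndex;

    L1LshFunction(int dim, double w, int functionIndex, Random rnd) {
        this.w = w;
        this.functionIndex = functionIndex;
        this.b = rnd.nextDouble() * w;
        this.a = new double[dim];
        for (int i = 0; i < dim; i++) {
            // a standard Cauchy sample via the inverse CDF: tan(pi * (U - 0.5))
            a[i] = Math.tan(Math.PI * (rnd.nextDouble() - 0.5));
        }
    }

    long hash(double[] v) {
        double dot = 0;
        for (int i = 0; i < v.length; i++) dot += a[i] * v[i];
        long bucket = (long) Math.floor((dot + b) / w);
        // mix in the function index so buckets of different functions stay separate
        return bucket * 31L + functionIndex;
    }
}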

2.3.2 Code Clone Detection Algorithm

The pseudo code of the detection algorithm is presented in Figure 2.3a. B and G are two maps to store the buckets and the clone graph. B[i] denotes the bucket corresponding to the hashcode i. G(u) denotes the clone group that contains fragment u. The clone graph is stored as adjacency lists: G(u) contains the adjacent nodes of u, i.e. the clones of u, and u itself.

First, each fragment u in the software system is hashed into N buckets indexed by its hash codes produced from N independent hash functions (lines 2-4). (Note that hash codes produced from different hash functions are distinguished by encoding the index of each function as a part of the hash code.) Then, the fragments of each bucket are compared pairwise to detect the clone pairs (lines 5-8). Since a fragment can be enclosed in another one, a clone pair might have both fragments enclosed by the two fragments of some other pair, and thus becomes redundant. To filter those redundant pairs (u, v) (lines 11-16), JSync checks if any fragment enclosing u is cloned with a fragment containing v. If this is the case, (u, v) is removed from the clone relation. The most costly part of this task is to find the set E(u) of all fragments enclosing u. JSync handles this efficiently based on two observations. First, a fragment containing u must be in the same source file as u. Second, due to the bottom-up process of generating the fragments, i.e. building larger ones from smaller ones, a fragment enclosing u must either be u itself or be generated after u; hence its index must not be less than u's. Thus, JSync searches for E(u) only within the fragments from u to the end of the list of all fragments in its file. After filtering redundant clone pairs, JSync collects all fragments in each connected component of the clone graph G into a clone group (lines 17-20).

Figure 2.3b shows an illustrated example of this algorithm. The left part of Figure 2.3b shows the fragment set F containing six fragments from a to f, which form two clone groups {a, b} and {d, e, f}. They are hashed by two hash functions h1 and h2 into the buckets B[i], which are displayed in the middle of Figure 2.3b (buckets of h1 have odd indexes and those of h2 have even indexes). Then, by pairwise comparing the fragments in each bucket, the clone pairs are detected and added to the clone graph represented in the right part of Figure 2.3b. For example, the clone pair (a, b) is hashed to and detected from buckets B[1] and B[2], (d, e) is from B[3], and (d, f) is from B[4]. After all clone pairs are detected and filtered, clones in each connected component are unioned to form a clone group. For example, (d, e) and (d, f) form the clone group {d, e, f}. It should be noted that, due to the nature of locality-sensitive hashing, non-cloned fragments such as c and f might be hashed to the same bucket (B[5]), while cloned fragments such as e and f might not be hashed to the same bucket. Thus, such a pair might not be represented as a clone pair in the clone graph.

(a) Code Clone Detection Algorithm (pseudo-code). (b) Example of Code Clone Detection: Fragments, Buckets, Clone Groups.

Figure 2.3: Code Clone Detection Algorithm and Example.

If more hash functions were used, there would be a higher probability that such cloned fragments (e.g. e and f) would be hashed to the same bucket and detected. In this example, the clone pair (e, f) is still detected in the clone group {d, e, f} due to the connectivity of (d, e) and (d, f). This result implies a benefit of considering a connected component in the clone graph as a clone group: some cloned fragments might not be detected from the buckets, but are still detected via their clone relations to other fragments.
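The bucket-then-group step can be sketched as follows, with fragments reduced to integer ids, buckets given as hashcode-to-fragment-list maps, and an isClonePair predicate standing in for the vector-distance comparison; a union-find structure realizes the rule that each connected component of the clone graph becomes a clone group.

  import java.util.*;
  import java.util.function.BiPredicate;

  // Sketch: compare fragments within each bucket, union detected clone pairs,
  // and report each connected component as one clone group.
  class CloneGrouper {
      private final Map<Integer, Integer> parent = new HashMap<>();

      private int find(int x) {
          parent.putIfAbsent(x, x);
          int root = parent.get(x);
          if (root != x) { root = find(root); parent.put(x, root); }  // path compression
          return root;
      }

      private void union(int x, int y) { parent.put(find(x), find(y)); }

      Collection<List<Integer>> detect(Map<Integer, List<Integer>> buckets,
                                       BiPredicate<Integer, Integer> isClonePair) {
          for (List<Integer> bucket : buckets.values())
              for (int i = 0; i < bucket.size(); i++)
                  for (int j = i + 1; j < bucket.size(); j++)
                      if (isClonePair.test(bucket.get(i), bucket.get(j)))
                          union(bucket.get(i), bucket.get(j));
          // collect connected components (only fragments in at least one clone pair)
          Map<Integer, List<Integer>> groups = new HashMap<>();
          for (Integer frag : parent.keySet())
              groups.computeIfAbsent(find(frag), r -> new ArrayList<>()).add(frag);
          return groups.values();
      }
  }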

2.3.3 Code Clone Detection Algorithm Complexity

The first part of our clone detection algorithm, at lines 2-8, has the complexity of O(nNq), where n is the number of fragments, N is the number of hash functions, and q is the maximal size of a bucket. In practice, especially in large systems, q is usually very small in comparison to n. Due to the nature of locality-sensitive hashing, a bucket generally contains only the vectors of very similar fragments. Thus, this part could be considered linear in n. The function Filter basically checks each clone pair against all other pairs, so its complexity is O(p^2), where p is the number of clone pairs. The complexity of the function BuildCloneGroup is subsumed by that of function Filter. Overall, our clone detection has the complexity of O(nNq + p^2).

Table 2.3: Comparison on Bellon’s Benchmark

System             Files  KLOC   Reported (KLOC)                        Filtered (KLOC)   Matched pairs
                                 JS 0.9  JS 0.95  JS 1.0  CF     DK     CF      DK        Ref    JS 0.9  JS 0.95  JS 1.0  CF    DK
netbean-javadoc    101    19     3.59    3.32     2.94    3.16   4.80   1.7     2.54      55     25      23       21      10    8
eclipse-ant        178    35     2.33    1.65     1.34    2.82   8.36   1.37    3.73      30     8       8        8       7     7
eclipse-jdtcore    741    148    37.92   32.57    28.56   40.50  62.04  24.57   36.83     1345   468     451      423     264   450
j2sdk-javax-swing  538    204    27.35   21.54    19.48   32.91  80.93  23.73   37.12     777    427     376      372     346   369

2.3.4 Clone Reporting

A clone might be a part of another clone. For example, if a class is cloned from another class, its methods will also be clones of the methods of that class. Therefore, any clone group whose members all belong to the members of another clone group is considered redundant and is not reported.
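A sketch of this subsumption filter, generic over the fragment type and assuming an encloses(container, member) test supplied by the caller:

  import java.util.*;
  import java.util.function.BiPredicate;

  // Sketch: drop a clone group when every one of its members is enclosed by
  // some member of another group (e.g. the methods of a cloned class).
  class GroupFilter {
      static <F> List<List<F>> removeRedundant(List<List<F>> groups, BiPredicate<F, F> encloses) {
          List<List<F>> kept = new ArrayList<>();
          for (List<F> g : groups) {
              boolean redundant = groups.stream().anyMatch(other ->
                  other != g && g.stream().allMatch(member ->
                      other.stream().anyMatch(o -> encloses.test(o, member))));
              if (!redundant) kept.add(g);
          }
          return kept;
      }
  }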

2.4 Empirical Evaluation

Let us present our empirical evaluation of JSync's clone detection accuracy and efficiency. All experiments were run on a computer with both Windows Server 2008 and Linux, with an AMD Phenom II X4 3.41GHz CPU and 12GB RAM. JSync was configured to process code fragments with a minimum size of 50 nodes, 16 independent hash functions, and a similarity threshold σ of 0.95. We evaluated the performance of JSync on clone detection in comparison with Deckard Jiang et al.(2007a), a state-of-the-art tree-based clone detector, and with CCFinder Kamiya et al.(2002), a popular token-based clone detection tool. CCFinder was run with its default settings. Deckard was run with the similarity threshold of 0.95, which is also Deckard's default value and the one that gave higher recall than the compared tools as reported in their paper Jiang et al.(2007a). JSync was run with three different threshold values of 0.9, 0.95, and 1.0.

We used Bellon's benchmark Bellon et al.(2007) for our evaluation. This benchmark contains manually verified clone pairs, called reference pairs, from four Java projects and four C projects. As reported in Bellon et al.(2007), those pairs have been manually verified from 2 percent of all 325,935 candidate clone pairs submitted by several well-established clone detection tools, including CCFinder CCFinderX, CPMiner Li et al.(2006), and CloneDR Baxter et al.(1998), covering clone detection approaches ranging from text-, token-, tree-, and graph- to metric-based techniques. Because JSync focuses on Java code, we took only the four Java projects in the benchmark as the subject systems. Because the benchmark contains only 2 percent of all submitted clone pairs (which already took the author 77 hours to manually verify), we used it only for comparing the recall between tools, not for measuring the absolute recall of each individual tool.

Detailed information on the four subject systems is given in Table 2.3. The largest subject is Swing, a GUI API of Java SDK 2, with around 200 KLOCs and 777 reference clone pairs in the benchmark. Eclipse JDTcore has 1,345 pairs in the benchmark, although its size is a bit smaller, at around 150 KLOCs. In total, the four subject systems contain around 400 KLOCs and more than 2,200 reference clone pairs.

We ran the three tools on those subject systems. The total cloned LOCs reported by each tool are shown in Table 2.3. As seen, for Swing (204 KLOCs), JSync (JS) and CCFinder (CF) report around 20-30 KLOCs, while Deckard (DK) reports 81 KLOCs of clones. Similarly, for jdt-core, JSync and CCFinder report around 30-40 KLOCs, while Deckard reports more than 60 KLOCs of clones. In order to understand those large differences, we analyzed a small random sample of 100 clone pairs reported by Deckard and discovered several imprecise clone pairs. Here are some typical patterns of those clones:

W1. Many reported clones from Deckard contain all or mostly import statements. Even though they have similar structures, those fragments are of little interest for clone management (e.g. many program editors generate such code automatically). Deckard reports a large amount of such "import" code.

W2. Deckard reports many clones that are actually getters/setters of individual fields. Similar to W1, those fragments are often produced by modern program editors and do not need clone management. In many cases, they are identical in structure due to simple logic such as assignment and return statements. However, they access two unrelated fields, thus have no semantic relation, and would not require any clone consistency management. Thus, we also consider them as imprecise clones.

W3. There are some cases in which a cloned fragment reported by Deckard consists of a part of a method and another part of the next method, but does not contain the entirety of any method. Figure 2.4 shows an example of such clone pairs reported by Deckard. As seen, the return statement of the first fragment belongs to one method and the remaining part belongs to a different method. In addition, these fragments are unlikely to be copy-and-pasted and thus unlikely to be clones.

We reported this result to the corresponding author of Deckard. The author confirmed this imprecision and explained the existence of such clones by the following reasons. First, Deckard represents source code via parse trees and has no semantic information about the fragments corresponding to each sub-tree or sub-forest of a parse tree. Therefore, it considers the fragments containing import statements or getters/setters as clones, because they have similar structures as represented in the parse tree. Second, it uses a token-based sliding window to merge small cloned fragments into a larger one. Thus, it could accidentally merge the fragments of two different methods. CCFinder, due to its token-based approach, cannot recognize method boundaries or getters/setters. Therefore, like Deckard, it also reports code clones as described in W2 and W3. However, it recognizes import statements and thus does not report them.

src/ant/DirectoryScanner.java, lines 782-795:
    return directories;
  }
  // some Javadoc
  public String[] getNotIncludedDirectories() {
    slowScan();
    int count = dirsNotIncluded.size();
    String[] directories = new String[count];
    for (int i = 0; i < count; i++) {
      directories[i] = (String) dirsNotIncluded.elementAt(i);

src/ant/types/Path.java, lines 244-247:
    for (int j = 0; j

Figure 2.4: Imprecise Clones Reported by Deckard

Since JSync uses more sophisticated source code analysis, it does not report such kinds of imprecise clones. JSync represents source code as ASTs; thus, it can recognize import, package, and field declaration statements, as well as getters/setters. Currently, JSync disregards such kinds of code and considers only the fragments inside methods' boundaries. In addition, JSync builds a larger fragment from smaller ones by merging AST sub-trees. Thus, the fragments are always inside methods' boundaries or contain well-formed code structures. To be more precise in our evaluation, we built a post-processing tool to filter those three kinds of imprecise clones from the reported results of CCFinder and Deckard. As seen in Column Filtered of Table 2.3, after such processing, for Swing, Deckard detected 37 KLOCs of clones, while JSync detected 22 KLOCs and CCFinder detected 24 KLOCs. To evaluate the correctness and efficiency of JSync in comparison with the state-of-the-art tools, we checked their reported clone pairs against Bellon's benchmark. We use the good-value matching criterion as described in Bellon et al.(2007): a reported pair v-v′ is considered to match a reference pair u-u′ in the benchmark if their overlapping level in terms of lines of code exceeds 70%.

2.4.1 Correctness

Column Matched Pairs in Table 2.3 shows the matching result, i.e. the number of reported clone pairs of each tool that match the reference clone pairs in the benchmark. Sub-columns JS 0.9, JS 0.95 and JS 1.0 show the numbers of matched pairs for JSync at the different similarity thresholds, respectively. As seen, CCFinder has the lowest number of matches. JSync has the highest, except for the result with threshold 1.0 on jdtcore, while Deckard has slightly smaller numbers of matches than JSync. For example, on Eclipse JDTcore, which has the largest number of reference clone pairs (1,345), CCFinder matches only 264, while JSync matches 468, 451, and 423 for thresholds of 0.9, 0.95, and 1.0, respectively, and Deckard matches 450.

Both JSync and Deckard are tree-based approaches that use vector-based similarity to detect clones. However, Deckard counts only AST node types as features, while JSync uses additional features, e.g. vertical and horizontal sequences extracted from the paths and sibling nodes in an AST (Section 2.2.2). Thus, at the same level of similarity, Deckard has simpler clone criteria and would consider more fragments as cloned code, i.e. it reports more imprecise clones. That is the reason why, although reporting more clones, Deckard has slightly smaller numbers of matches to reference pairs in the benchmark than JSync.

Table 2.4: Comparison on Time Efficiency

System             Time-Windows (mm:ss)       Time-Linux (mm:ss)
                   JSync      CCFinderX       JSync      Deckard
netbean-javadoc    0:12       0:14            0:25       0:49
eclipse-ant        0:09       0:22            0:20       2:11
eclipse-jdtcore    1:54       2:11            2:45       20:35
j2sdk1.4.0-swing   1:08       2:14            3:41       20:29

2.4.2 Time Efficiency

Table 2.4 shows the running time of three tools. Since CCFinder runs on Windows while Deckard runs on Linux, we ran JSync in both environments for comparison. As seen, JSync is more time-efficient than both CCFinder on Windows and Deckard on Linux. For example, for jdt-core (140KLOCs) and Swing (200KLOCs), JSync took about 1-2 minutes while CCFinder took more than 2 minutes on Windows. For those two systems, it took about 20 minutes for Deckard and 3-4 minutes for JSync on Linux.

2.4.3 Scalability

To evaluate the scalability of JSync, we ran it on several subject systems. As shown in Table 2.5, for medium-sized systems of some hundred thousand LOCs, JSync took less than one minute in most cases, except for ZK, where the running time was almost three minutes. For the large systems of more than 4 MLOCs (Eclipse and JDK 6), JSync took less than half an hour. From this table, we also see that the running time is sensitive to the number of clone pairs. That is why ZK took more time than similarly sized medium systems, and why JDK 6 took more time than Eclipse. We attribute JSync's high level of efficiency to the following design and implementation features. First, it has an incremental extraction process for code fragments and characteristic vectors, which computes the vector of a fragment from those of its sub-fragments. Second, JSync uses the vector distance for code similarity instead of the tree edit distance, thus reducing the code similarity measurement computation by an order of magnitude.

Table 2.5: Time Efficiency and Scalability

System        GEclipse   jEdit   ZK     Columba   ArgoUML   Eclipse   JDK 6
LOC           157K       176K    182K   193K      474K      4,119K    4,288K
Fragments     43K        60K     32K    66K       73K       1,066K    672K
Clones        12K        4K      4K     8K        10K       129K      74K
Pairs         35K        7K      72K    12K       26K       520K      1,077K
Time (mm:ss)  0:40       0:31    2:50   0:51      0:50      10:56     27:54

And importantly, with the use of locality-sensitive hashing, JSync does not require pairwise comparison on all code fragments. Since cloned code fragments are mostly hashed to the same buckets, and a bucket tends to contain a small number of fragments, the number of comparisons is mostly proportional to the number of clone pairs in the system. Although both Deckard and JSync rely on locality-sensitive hashing, JSync is more efficient due to its specialized implementation of hashing and other operations for sparse vectors. That is, the characteristic vectors of fragments generally contain only a small number of characteristic features, since a piece of code rarely contains all syntactic structures. Taking advantage of this, we implemented our vector operations (e.g. addition, similarity, locality-sensitive hashing) based on a representation for sparse vectors, which further improved JSync's performance. For example, if the total number of features is 1,000 but a fragment has only 100 features, the calculation of its hashcode takes only 100 operations on those features, i.e. it is 10 times faster than hashing a full vector representation of all 1,000 features. In brief, this experiment shows that JSync can scale to large systems, is more efficient than CCFinder and Deckard, and detects more correct clones of interest than both of them.

2.4.4 Clone Consistency Management

This experiment shows our analysis on the existence of inconsistent changes to clone pairs in software evolution and the usefulness of JSync in clone change inconsistency detection and synchronization. 1

2.4.4.1 Clone Change Consistency Analysis

In this experiment, we performed a clone consistency analysis using JSync on the set of subject systems listed in Table 2.6. Since we were concerned with the clone changes related to bugs, we focused on analyzing only the bug fixing changes. We collected this data from several open-source repositories through a semi-automatic process. First, all revisions having commit logs containing the keywords bug, fix, and error were automatically extracted. Then, the authors manually examined them to decide which ones were really bug fixes.

1 Details on the techniques can be found in the journal article Nguyen et al.(2012).

Table 2.6: Clone Change Inconsistency Detection and Synchronization

System     Revisions                       Clone Pairs            Inconsistencies   Sync Pairs      Sync Tokens             2-Sync Tokens
           Range       Fix    Cloned       All   1    2    late   S    N    L       T    F   1-F    Rec.  Correct  Prec.    Rec.  Correct  Prec.
ArgoUML    2000-3000   58     14           78    15   54   9      36   7    50      63   15  15     689   572      83%      572   572      100%
Columba    100-350     94     10           44    6    32   3      35   4    7       35   6   6      355   303      85%      304   303      99%
GEclipse   3000-4000   33     12           104   11   93   0      98   6    5       92   12  11     702   624      89%      625   624      100%
jEdit      4000-5000   397    49           62    17   42   3      52   5    14      43   19  17     401   300      75%      316   300      95%
ZK         9000-10000  81     22           48    12   33   3      38   1    19      35   13  12     229   163      71%      166   163      98%

Columns Range and Fix show the range of analyzed revisions and the number of bug fixing revisions among them for each system, respectively. We ran JSync on those fixing revisions and found 107 clone-related fixing revisions out of 663 total ones, i.e. revisions that contain fixes to some clone pairs (Column Cloned). The detailed numbers of those clone pairs are in Column Clone Pairs. Column All displays the total number of clone pairs in the fixing revisions. Among them, some clone pairs have 1-side changes, i.e. only one member of the pair was changed while the other was not changed even in later revisions in the project's history (Column 1). Most of those pairs have 2-side changes, i.e. both members of each pair were changed/updated at the same revision (Column 2). Interestingly, there are 18 pairs that have late synchronizing fixes, in which one member of a pair was changed at that fixing revision and the other one was fixed accordingly at a later revision (Column late). Among those pairs, the reported numbers of clone pairs having inconsistencies in structural changes, identifier renaming, and value changes are shown in Columns S, N, and L, respectively. If a clone pair has two or more kinds of inconsistencies, they are counted accordingly.

This result shows that there exist inconsistent changes to cloned code with regard to all kinds of structure, naming, and values. JSync is able to detect them using its tree-based alignment and differencing. Importantly, there are 18 cases in which the developers did not fix clone-related bugs until a later revision. Thus, JSync would be useful in those cases, helping developers recognize and fix earlier the bugs caused by clone inconsistencies.

Figure 2.5 shows such a case in ArgoUML. Two methods getExtendingClasses (on the left) and getExtendedClasses (on the right) of class CoreHelper are clones. At revision 2453, only method getExtendingClasses was changed by a renaming of a class type from MClass to MClassifier in the if statement that checks the type of the object client. In this one-side change scenario, JSync recognized that the two clones had naming inconsistencies. Then, at revision 2474, a change was made to getExtendedClasses in the same way (not shown).

An example of structural inconsistency is shown in Figure 2.6. In ArgoUML, two methods getIndexOfChild of two classes GoCollaborationInteraction (on the left) and GoProjectMachine (on the right) are clones.

getExtendingClasses (left):

  public class CoreHelper {
    public Collection getExtendingClasses(MClassifier clazz) {
      if (clazz == null) return new ArrayList();
      Iterator it = clazz.getSpecializations().iterator();
      List list = new ArrayList();
      while (it.hasNext()) {
        MGeneralization gen = (MGeneralization) it.next();
        MGeneralizableElement client = gen.getChild();
        if (client instanceof MClassifier) {  // renamed from MClass at revision 2453
          list.add(client);
        }
      }
      return list;
    }
  }

getExtendedClasses (right):

  public class CoreHelper {
    public Collection getExtendedClasses(MClassifier clazz) {
      if (clazz == null) return new ArrayList();
      Iterator it = clazz.getGeneralizations().iterator();
      List list = new ArrayList();
      while (it.hasNext()) {
        MGeneralization gen = (MGeneralization) it.next();
        MGeneralizableElement parent = gen.getParent();
        if (parent instanceof MClass) {  // not renamed until revision 2474
          list.add(parent);
        }
      }
      return list;
    }
  }

Figure 2.5: Change Inconsistency in Name in ArgoUML at revision 2453

getIndexOfChild in GoCollaborationInteraction (left):

  public class GoCollaborationInteraction {
    public int getIndexOfChild(Object parent, Object child) {
      if (!(parent instanceof MCollaboration)) return -1;  // added at revision 2761
      Vector children = new Vector(getChildren(parent));
      if (children != null && children.contains(child))
        return children.indexOf(child);
      return -1;
    }
  }

getIndexOfChild in GoProjectMachine (right):

  public class GoProjectMachine {
    public int getIndexOfChild(Object parent, Object child) {
      Vector children = new Vector(getChildren(parent));
      if (children != null && children.contains(child))
        return children.indexOf(child);
      return -1;
    }
  }

Figure 2.6: Change Inconsistency in Structure in ArgoUML at revision 2761

At revision 2761, only the method in the former class was modified with an addition of the if statement checking the type of the object parent. The method in the latter class was not changed accordingly until revision 2858. JSync could detect that case and correctly recommend the synchronization with an addition of the corresponding if statement. In this example, the late synchronization was applied to cloned code in two different source files. Generally, developers could easily forget to synchronize cloned code in multiple source files. If JSync were used during development, those inconsistencies could have been fixed sooner, i.e. JSync could prevent those (late updating) bugs from occurring and thus improve software quality. Next, we will discuss its usefulness in supporting clone synchronization.

2.4.4.2 Clone Synchronization

The evaluation on synchronization was carried out on the revisions of the five subject systems in Table 2.6 as follows. First, for each pair of consecutive revisions where a fix occurred, we identified all clone pairs containing changed members. Second, from each pair, we picked one changed clone member and synchronized the other member according to its changes. Third, we compared each synchronized version with the one that was actually changed by the developers at that or a later revision. In each comparison, we checked whether that pair was synchronized correctly or not by counting the number of tokens related to the change and the number of correct tokens during synchronization.

The last nine columns in Table 2.6 show the result for this experiment. A pair is considered to be incorrectly synchronized if the result contains incorrect token(s). Columns T and F under Sync Pairs show the numbers of correctly and incorrectly synchronized pairs. Column 1-F shows the number of incorrect pairs having 1-side changes, for which JSync recommends a change to the other member of a clone pair even though that member was not actually changed. As Column F covers the cases in Column 1-F, we can see that there are only two incorrect pairs having 2-side or late-synchronization changes, in jEdit and ZK. Examining those incorrect ones, we found that they occurred in pairs containing code clones that have the same or similar behaviors/functionality but different concrete surrounding code. In those cases, JSync incorrectly mapped the locations of the changed program elements between clone members, leading to incorrect recommendations.

To measure the synchronization accuracy in terms of tokens, we computed the precision as the ratio between the number of correctly synchronized tokens and the number of all recommended ones. Column Sync Tokens shows the precision for all synchronized tokens, while Column 2-Sync Tokens shows the precision for only 2-side or late-synchronization clone pairs. For all tokens, the precision is between 70% and 90%. Again, most of the incorrect ones belong to 1-side changes. Thus, the precision for 2-Sync Tokens is very high, 95%-100%. This result implies that there is a small number of cases in which the inconsistent changes to clones are 1-side and do not require the suggested synchronization; however, for most other cases that require consistent fixing, JSync is able to recommend synchronizations with very high accuracy.

Let us now examine a case among the correctly synchronized pairs. Figure 2.7 shows two cloned methods taken from ZK revision 9322 (in two classes LabelImageElement and Audio). Later, at revision 9323, method setImage in class LabelImageElement was changed with two modifications to two if statements and one addition of an assignment statement (A to A' in Figure 2.7). We ran JSync to synchronize this change to method setSrc in class Audio (B to B* in Figure 2.7). First, it detected all the changes to setImage. Then, based on the clone matching and differencing between A and B, it identified the corresponding locations of the modifications and insertions in B, and recommended the changes to B. This synchronized version of setSrc was compared to its actual change in the ZK repository, and we found that they are the same. That is, JSync correctly recommended this synchronization.

Figure 2.7: Recommended Clone Synchronization for ZK at revision 9323

2.4.5 Threats to Validity

In our clone detection experiment, we used Bellon's benchmark. Thus, our results have inherent threats to validity as in their work. For example, only two percent of all submitted clones were checked to build the benchmark, so it might not be representative of all the clones in those systems. In addition, human errors can be a factor in the manual checking of clone fragments. Even though we ran JSync with different thresholds when comparing with other approaches, we used the default running parameters for CCFinder and Deckard; thus, their results might not be as good as those tuned for each system. In the clone change management experiment, the heuristic of searching for bug fixing revisions by certain keywords helps in semi-automating the process but might miss some fixing revisions. In addition, since we did not know all the actual clones, we could only evaluate precision but not recall of the synchronization. Finally, the experiments were conducted only on open-source systems.

2.5 Related Work

Several approaches for code clone detection have been proposed Bellon et al.(2007); Tairas; Roy et al.(2009). Generally, they can be classified based on their code representations. The typical categories are text-based Ducasse et al.(1999); Marcus and Maletic(2001), token-based Baker(1997); Kamiya et al.(2002); Li et al.(2006); Mende et al.(2009), tree-based Baxter et al.(1998); Jiang et al.(2007a), and graph-based Gabel et al.(2008); Komondoor and Horwitz(2001). The text-based and token-based approaches are usually efficient but cannot detect Type-3 clones. In contrast, graph-based approaches, though providing clones of a higher level of abstraction, are time-consuming in detecting similar subgraphs. Krinke's tool detects code clones via a program dependence graph (PDG) Krinke(2001). It finds the maximal isomorphic subgraphs in a PDG by calculating the maximal k-limited path induced subgraphs. Such induced subgraphs are defined as the maximal similar subgraphs that are induced by k-limited paths starting at two vertices. Their approach is more heavy-weight than our vector-based calculation after structural feature extraction in Exas.

Deckard Jiang et al.(2007a) introduced the use of vectors in clone detection. Our vector representation for tree-based fragments is more generalized. The Deckard tool counts only distinct AST node types in a subtree for a fragment, while JSync captures structural features via paths and sibling sets.

Recent approaches have been proposed to support code clone management. CloneTracker, a clone management tool Duala-Ekoko and Robillard(2008), is based on a clone tracking tool with the same name Duala-Ekoko and Robillard(2007). CloneTracker Duala-Ekoko and Robillard(2008) uses CRD, a light-weight clone region description scheme, to map clone groups from the previous version to the current groups. However, some detected clones could be missed due to the approximate nature of CRD mapping Duala-Ekoko and Robillard(2007). In contrast, JSync uses its tree mapping algorithm, which avoids losing detected clones in clone tracking. Moreover, from one version to another, CloneTracker re-runs the detection only on changed and currently tracked files. Thus, it could miss cross-revision clone pairs. Such a pair is the result of a copy of a fragment from an unchanged file into a changed one. In contrast to the off-line clone management strategy in JSync, CLONEBOARD de Wit et al.(2009) tracks live changes to cloned code within the Eclipse editor, analyzes them, and provides resolution strategies for inconsistent modifications to clones. CLONEBOARD dynamically infers the clone relations by monitoring clipboard activity from a developer during an editing session.

Other clone tracking tools include Bakota et al.(2007); Jablonski and Hou(2007); Mende et al.(2009). Clone Detection Toolbox Mitter(2006) uses Unix diff to get the changes to clones, tracks them in different versions, and updates its clone database. However, it requires the re-run of clone detection on the entire new version. Furthermore, its line-based tracking of clones does not adapt well to modifications Duala-Ekoko and Robillard(2007). Bakota et al.(2007) proposed the mapping of clones from one version to another based on a light-weight AST-based similarity measure. More specifically, to compare the syntactical structures of two ASTs in two versions, their tool unparses the code of those ASTs, and then compares the two resulting strings using textual similarity.
In contrast, JSync uses a novel tree edit script algorithm (Section 5.4.2), which better captures the code structure. In brief, the aforementioned clone tracking approaches might result in incompleteness in tracking/managing clones. None of them supports clone-aware synchronization.

DejaVu Gabel et al.(2010) is a tool to detect syntactic inconsistency bugs in cloned code. Given a code base, a parallel inconsistent clone analysis first enumerates all groups of non-identical cloned code fragments. Then, it analyzes the clone groups, separates each group of inconsistent fragments into a fine-grained set of inconsistent changes, and classifies each as benign or buggy Gabel et al.(2010). The key difference with JSync is that DejaVu utilizes a sequence alignment algorithm, while a novel tree-based change recovery algorithm is developed in JSync. More importantly, DejaVu does not aim to support clone synchronization. Juergens et al.(2009) conducted a large-scale case study on both commercial and open-source systems and found that inconsistent changes to code clones are very frequent and create a significant number of faults.

2.6 Conclusions

In this chapter, we introduce JSync, a tool for Java code clone detection. JSync models source code in a software system in terms of tree-based, logical entities, such as classes, methods, statements, and expressions. A source code file is represented via an abstract syntax tree (AST) and a code fragment is represented as a sub-AST or a forest of consecutive sibling sub-ASTs. Two code fragments are considered code clones if they are sufficiently similar. Their similarity is measured based on the distance of the characteristic vectors extracted from their corresponding tree structures. Our empirical experiments on Bellon's clone benchmark Bellon et al.(2007) and several other open-source software systems show that JSync outperforms the state-of-the-art clone detection approaches in terms of both correctness and time efficiency. JSync also raises the awareness of clone relations in software evolution by detecting inconsistent changes to clone pairs and suggesting synchronization.

CHAPTER 3. TEMPORAL SPECIFICATION MINING

The usage of multiple objects in object-oriented programming often follows certain specifications, i.e. specific temporal orders of method calls and/or control structures, to perform some intended programming task. Unfortunately, those specifications are not always documented. The absence of that information creates a long learning curve and, more importantly, leads to code smells and software bugs due to the misuse of objects. This chapter presents GrouMiner, a novel graph-based approach for mining the usage patterns of one or multiple objects. The GrouMiner approach includes a graph-based representation for multiple object usages, a pattern mining algorithm, and an anomaly detection technique. Our experiments on several real-world programs show that GrouMiner is able to find useful usage patterns with multiple objects and control structures, and to detect usage anomalies that caused yet undiscovered defects in those programs.

3.1 Introduction

In object-oriented programming, developers usually deal with multiple objects of the same or different classes. Objects interact with one another via their provided methods and fields. The interplay of several objects, which involves objects' fields/method calls and the control flow among them, often follows certain orders or control structure constraints that are part of the intended usages. In team development, newly introduced program-specific APIs by one or more team members often lack usage documentation due to busy schedules. Other developers have to look through new code to understand the programming usages. This is a very inefficient, confusing, and error-prone process. Developers often do not know where to start. Even worse, they sometimes do not properly use newly introduced classes, leading to errors. Moreover, specific orders and/or control flows of objects' method calls cannot be checked at compile time. As a consequence, errors might not be caught until testing and could even go unnoticed for a long time. These problems also occur often in the case of general API usages.

In this chapter, we propose GrouMiner, a new approach for mining the usage patterns of objects and classes using graph-based algorithms. In GrouMiner, the usage of a set of objects in a scenario is represented as a labeled, directed acyclic graph (DAG), whose nodes represent objects' constructor calls, method calls, field accesses, and branching points of control structures, and whose edges represent temporal usage orders and data dependencies among them. A usage pattern is considered as a subgraph that frequently "appears" in the object usage graphs extracted from all methods in the code base. Appearance means that it is label-isomorphic to an induced subgraph of each object usage graph, i.e. it satisfies all temporal orders and data dependencies between the corresponding nodes in that graph.

GrouMiner detects those patterns using a novel graph-based algorithm for mining the frequent induced subgraphs in a graph dataset. The patterns are generated increasingly by their sizes (in terms of the number of nodes). Each pattern Q of size k + 1 is discovered from a pattern P of size k by extending the occurrence(s) of P in every method's graph G in the dataset with relevant nodes of G. The generated subgraphs are then compared to find isomorphic ones. To avoid the computational cost of the graph isomorphism problem, we use Exas Nguyen et al.(2009a) (Section 2.2), our efficient structural feature extraction method, to extract a characteristic vector for each subgraph. An Exas vector is an occurrence count vector of sequences of the labels of nodes and edges. The generated subgraphs having the same vector are considered isomorphic and counted toward the frequency of the corresponding candidate. If the frequency exceeds a threshold, the candidate is considered as a pattern and is used to discover larger patterns.

The mined patterns could assist developers in object usages. Those that are confirmed by developers as typical usages could also be used to automatically detect the locations in programs that deviate from them. A portion of code is considered a violation of a pattern P if the corresponding object usage graph contains only an instance of a strict sub-pattern of P, i.e., not all properties of P are satisfied. These locations are often referred to as violations, and rare violations are considered as usage anomalies.

The departure points of GrouMiner from existing mining approaches for temporal object usages include two aspects.
First, the mined patterns provide more information to assist developers in the usage flows among objects, including control structures (e.g. conditions, loops, etc.). Existing object mining approaches are limited to patterns in the form of either (1) a set of pairs of method calls where, in each pair, one call occurs before another, or (2) a partial order among method calls. Their patterns do not contain control structures or conditions among them. In other words, their detected patterns correspond to a subset of edges in GrouMiner's pattern graphs. Second, GrouMiner's mined patterns cover both common and program-specific cases with multiple interplaying/interacting objects, without requiring external inputs except the program itself. Existing approaches discover patterns involving methods of a single object without control structures.

(Figure: a pattern of size 3 with nodes A, B, and C and its occurrences in Groums 1-4. Occurrences of a pattern are induced sub-graphs of groums which are label-isomorphic to it. Groum 3 contains non-overlapped occurrences and Groum 4 contains overlapped occurrences.)

Figure 3.1: Pattern and Occurrences

The main contributions of this chapter include

1. An efficient and scalable graph-based mining algorithm for object usage patterns,

2. An automated, graph-based technique for detecting and ranking the anomalies in object usages,

3. An empirical study on real-world systems shows the benefits of our approach. The evaluation shows that our tool could efficiently detect a number of high-quality object usage patterns in several open source projects. GrouMiner is able to detect yet undiscovered defects caused by the misuse of the objects even in mature software.

Section 3.2 presents the mining technique in detail. Section 3.3 describes our empirical evaluation of GrouMiner. Related work is given in Section 3.4. Conclusions appear in Section 3.5.

3.2 Mining Multiple Object Usage Patterns

This section describes our novel graph-based pattern mining algorithm for multiple object usages. Intuitively, an object usage is considered as a pattern if it frequently appears in source code. GrouMiner is interested only in the intra-procedural level of source code; therefore, the groums are extracted from all methods, i.e. the object usage in each method is represented by a groum. However, in many cases, the object usages involve only some, but not all, action and control nodes of the groum extracted from a method. In addition, the usages must include all temporal and data properties of those nodes, i.e. all the edges among them. Therefore, in a groum representing a method, an object usage is an induced subgraph of that groum, i.e. it involves some nodes and all the edges among such nodes. Note that any induced subgraph of a groum is also a groum.

Figure 3.1 shows an example. The usage pattern ABC of size 3 is used in four methods. The groum representing the pattern "appears" in the four corresponding methods' groums. In each of the last two methods, it appears twice. However, in groum 4, GrouMiner considers the pattern to be used only once because the two corresponding sub-graphs overlap.

3.2.1 Formulation

Definition 3.1 A groum is a directed acyclic graph (DAG). A node is either an action node or a control node. An action node represents an invocation of a constructor or a method, or an access to a field, of one object. The label of an action node is "C.m", where C is the class name of the corresponding object and m is the method's (or field's) name. A control node represents the branching point of a control structure. The label of a control node is the name of its corresponding control structure. An edge represents a usage order and a data dependency. An edge from node A to node B means that A is used before B, i.e. A is generated before B in executable code, and A and B have a data dependency. Edges have no label.

Details about how to extract groum from source code can be found in Nguyen et al.(2009c).
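For illustration, a minimal groum data structure following Definition 3.1 might look as follows; it records only node kinds, labels, and successor edges, and omits the data-dependency bookkeeping that the actual extraction performs. The label examples are illustrative only.

  import java.util.*;

  // A minimal sketch of the groum representation in Definition 3.1.
  class Groum {
      static class Node {
          enum Kind { ACTION, CONTROL }
          final Kind kind;
          final String label;                       // e.g. "Iterator.next" or "IF"
          final List<Node> successors = new ArrayList<>();
          Node(Kind kind, String label) { this.kind = kind; this.label = label; }
      }

      final List<Node> nodes = new ArrayList<>();

      Node addAction(String className, String methodOrField) {
          Node n = new Node(Node.Kind.ACTION, className + "." + methodOrField);
          nodes.add(n);
          return n;
      }

      Node addControl(String structure) {           // e.g. "WHILE", "IF"
          Node n = new Node(Node.Kind.CONTROL, structure);
          nodes.add(n);
          return n;
      }

      void addEdge(Node from, Node to) {            // "from" is used before "to"
          from.successors.add(to);
      }
  }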

Definition 3.2 A groum dataset is a set of all groums extracted from the code base, denoted by

D = {G1,G2, ..., Gn}.

Definition 3.3 An induced subgraph X of a groum Gi is called an occurrence of a groum P if X is equivalent, i.e. label-isomorphic, to P .

In Figure 3.1, the dataset contains four groums and the pattern ABC appears in all of them, with a total of six occurrences. We use Gi(P) to denote the occurrence set of P in Gi and D(P) =

G1(P )∪G2(P )∪...∪Gn(P ) to denote the set of all occurrences of P in the entire groum dataset. Gi(P ) is empty if P does not occur in Gi.

Definition 3.4 The frequency of P in Gi, denoted by fi(P ), is the maximum number of independent

(i.e. non-overlapping) occurrences of P in Gi. The frequency of P in the entire dataset, f(P ), is the sum of frequencies of P in all groums in the dataset.

This definition implies that, if P occurs many times, only the non-overlapping occurrences are considered as different or independent. For example, in groum 3 of Figure 3.1, the pattern has two non-overlapping occurrences. In groum 4, two occurrences share node C; thus, they are considered overlapping occurrences. Therefore, the frequency of that pattern in groum 3 is two, in groum 4 it is one, and in the whole dataset it is five.

Definition 3.5 (Pattern) A groum P is called a pattern if f(P ) ≥ σ, i.e. P has independently occurred at least σ times in the entire groum dataset. σ is a chosen threshold.

Definition 3.6 (Pattern Mining Problem) Given the groum dataset D and the chosen pattern threshold σ, find the list L of all patterns.

3.2.2 Algorithm Design Strategy

There have been many algorithms developed for mining frequent subgraphs on a graph dataset (i.e. multi-settings) or on a single graph. However, they are not applicable to this mining problem because (1) the existing mining algorithms for multi-settings count only one occurrence in each graph (i.e. the frequency of a candidate pattern is the number of graphs in which it occurs, which is different from our problem); and (2) mining algorithms on a single-graph setting are developed for edge-oriented subgraphs, i.e. a subgraph is defined as a set of edges that form a weakly connected component. They are efficient on sparse graphs, while our patterns are induced subgraphs of dense graphs Read and Corneil(1977). We developed a novel mining algorithm for our problem, named PattExplorer, which has the following three key design strategies: (1) incremental generation of candidates, (2) vector-based, approximated graph isomorphism, and (3) approximated frequency counting.

3.2.2.1 Incremental generation of candidates

The first design strategy of this algorithm is based on the following observation: isomorphic graphs also contain isomorphic (sub)graphs. Thus, sub-graphs of frequent (sub)graphs (i.e. patterns) are also frequent. In other words, larger patterns must contain smaller patterns. Therefore, the larger patterns could be discovered (i.e. generated) from the smaller patterns. Based on this insight, PattExplorer mines the patterns increasingly by size (i.e. the number of nodes): patterns of a larger size are recursively discovered by exploring the patterns of smaller sizes. During this process, the occurrences of candidate patterns of size k+1 are first generated from the occurrences of discovered patterns of size k and those of size one. Then, the generated occurrences are grouped into isomorphic groups, each of which represents a candidate pattern. The frequency of each candidate is evaluated, and if it is larger than a threshold, the candidate is considered as a pattern and is used to recursively discover larger patterns.

3.2.2.2 Vector-based, approximated graph isomorphism

The second design strategy comes from the fact that exact-match graph isomorphism checking is highly expensive for dense graphs Nguyen et al.(2009a).

1 function PattExplorer(D)
2   L ← {all patterns of size 1}
3   for each P ∈ L do Explore(P, L, D)
4   return L

6 function Explore(P, L, D)
7   for each pattern of size 1 U ∈ L do
8     C ← P ⊕ U
9     for each Q ∈ patterns(C)
10      if f(Q) ≥ σ then
11        L ← L ∪ {Q}
12        Explore(Q, L, D)

Figure 3.2: Pattern Mining Algorithm - PattExplorer

A state-of-the-art algorithm for checking graph isomorphism is canonical labeling Read and Corneil(1977), which works well with sparse graphs, but not with dense graphs. Our previous experiment Nguyen et al.(2009a) also confirmed this: it took 3,151 seconds to produce a unique canonical label for a graph with 388 nodes and 410 edges. Therefore, our algorithm employs an approximate, vector-based approach. For each (sub)graph, PattExplorer extracts an Exas characteristic vector Nguyen et al.(2009a), an occurrence-counting vector of sequences of node and edge labels. Graphs having the same vector are considered isomorphic. Exas was shown to be highly accurate, efficient, and scalable. For example, it took about 1 second to produce the vector for the aforementioned graph. It is about 100% accurate for graphs with sizes less than 10, and 94% accurate for sizes from 10 to 30. In our evaluation of GrouMiner, most patterns are of size less than 10. Details on Exas are in Nguyen et al.(2009a).
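A sketch of this vector-based grouping: generated occurrences are simply keyed by their characteristic vectors, so occurrences with identical vectors are treated as isomorphic and become the occurrence set of one candidate pattern. The Occurrence interface and its vector() method are assumptions of this sketch.

  import java.util.*;

  // Sketch of grouping generated subgraphs by their Exas vectors instead of
  // performing exact isomorphism checks.
  class CandidateGrouper {
      static Map<Map<String, Integer>, List<Occurrence>> groupByVector(List<Occurrence> occurrences) {
          Map<Map<String, Integer>, List<Occurrence>> candidates = new HashMap<>();
          for (Occurrence occ : occurrences)
              // occurrences with equal vectors fall into the same candidate group
              candidates.computeIfAbsent(occ.vector(), v -> new ArrayList<>()).add(occ);
          return candidates;
      }
  }

  interface Occurrence {
      Map<String, Integer> vector();   // Exas characteristic vector as feature -> count
  }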

3.2.2.3 Approximated frequency counting

The third design strategy is due to the expensive computation of finding the maximal set of non-overlapping sub-graphs in the calculation of frequencies of the pattern candidates. An example is in groum 4 of Figure 3.1. In fact, this is equivalent to the maximum independent set problem on graphs, since the overlapping relation could be represented as a graph in which the sub-graphs are considered as "nodes" and their overlapping relations are considered as "edges". Therefore, instead of exactly finding the maximal independent (i.e. non-overlapping) set of sub-graphs, PattExplorer does this approximately. That is, when a sub-graph is chosen for the independent set, its overlapping sub-graphs are removed from the remaining set, i.e. they will not be chosen.

3.2.3 Detailed Algorithm

The pseudo-code of PattExplorer is in Figure 3.2. First, the smallest patterns (i.e. patterns of size one) are collected into the list of patterns L (line 2). Then, each of such patterns is used as a starting point for PattExplorer to recursively discover larger patterns by function Explore (line 3). The main steps of exploring a pattern P (lines 6-12) are: 1) generating from P the occurrences of candidate patterns (line 8), 2) grouping those occurrences into isomorphic groups (i.e. function patterns) and considering each group to represent a candidate pattern (line 9); 3) evaluating the frequency of each candidate pattern to find the true patterns and recursively discovering larger patterns from them (lines 10-12).

3.2.3.1 Generating Occurrences of Candidate Patterns

In the algorithm, each pattern P is represented by D(P), the set of its occurrences in the whole graph dataset. Each such occurrence X is a subgraph that might be extended into a larger subgraph by adding a new node Y and all edges connecting Y and the nodes of X. Let us denote that graph X + Y. Since a large pattern must contain a smaller pattern, Y must be a frequent subgraph, i.e. an occurrence of a pattern U of size 1. This helps to avoid generating non-pattern subgraphs (i.e. subgraphs that cannot belong to any larger pattern). The operation ⊕ is used to denote the process of extending and generating all occurrences of candidate patterns from all occurrences of such two patterns P and U:

P ⊕ U = {X + Y |X ∈ Gi(P ),Y ∈ Gi(U), i = 1..n}.
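The following sketch illustrates the ⊕ operation with occurrences reduced to a groum id and a node-id set; duplicate and non-induced extensions, which the real algorithm handles, are not filtered here:

  import java.util.*;

  // Sketch of P ⊕ U: within each groum, every occurrence X of P is combined
  // with every occurrence Y of a size-1 pattern U to form the candidate X + Y.
  class Extender {
      static class Occ {
          final int groumId;
          final Set<Integer> nodes;
          Occ(int groumId, Set<Integer> nodes) { this.groumId = groumId; this.nodes = nodes; }
      }

      static List<Occ> extend(List<Occ> occurrencesOfP, List<Occ> occurrencesOfU) {
          List<Occ> candidates = new ArrayList<>();
          for (Occ x : occurrencesOfP)
              for (Occ y : occurrencesOfU)
                  if (x.groumId == y.groumId && !x.nodes.containsAll(y.nodes)) {
                      Set<Integer> merged = new HashSet<>(x.nodes);
                      merged.addAll(y.nodes);             // X + Y: X plus the new node of U
                      candidates.add(new Occ(x.groumId, merged));
                  }
          return candidates;
      }
  }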

3.2.3.2 Finding Candidate Patterns

To find candidate patterns, function patterns is applied to C, the set of all generated occurrences. It groups them into sets of isomorphic subgraphs. The grouping criterion is based on Exas vectors. All subgraphs having the same vector are considered isomorphic. Thus, they are the occurrences of the same candidate pattern and are collected into the same set. Then, for each such candidate Q, the corresponding subgraphs are grouped by the graph that they belong to, i.e. are grouped into

G1(Q),G2(Q), ...Gn(Q), to identify its occurrence set in the whole graph dataset D(Q).

3.2.3.3 Evaluating the Frequency

Function fi(Q) evaluates the frequency of Q in each graph Gi. In general, such an evaluation is equivalent to the maximum independent set problem because it needs to identify the maximal set of non-overlapping subgraphs in Gi(Q). However, for efficiency, we use a greedy technique to find a non-overlapping subset of Gi(Q) with a size as large as possible. PattExplorer sorts the occurrences in Gi(Q) in descending order by the number of nodes that could be added to them. As an occurrence is chosen in that order, its overlapping occurrences are removed. Thus, the resulting set contains only non-overlapping occurrences. Its size is assigned to fi(Q).

After all fi(Q) values are computed, the frequency of Q in the whole dataset is calculated: f(Q) = f1(Q) + f2(Q) + ... + fn(Q). If f(Q) ≥ σ, Q is considered as a pattern and is recursively extended to discover larger patterns.
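A sketch of the greedy, approximate frequency count for one groum; each occurrence is reduced to the set of node ids it covers, and the ordering criterion is a simplification of the extensibility-based sorting described above. The overall frequency f(Q) is then the sum of these per-graph counts.

  import java.util.*;

  // Sketch: keep an occurrence only if it shares no node with an already kept one.
  class FrequencyCounter {
      static int frequencyInGraph(List<Set<Integer>> occurrences) {
          // each occurrence is the set of node ids it covers in the groum
          List<Set<Integer>> sorted = new ArrayList<>(occurrences);
          sorted.sort(Comparator.comparingInt(occ -> occ.size()));   // placeholder ordering
          Set<Integer> covered = new HashSet<>();
          int count = 0;
          for (Set<Integer> occ : sorted) {
              if (Collections.disjoint(occ, covered)) {   // non-overlapping with chosen ones
                  covered.addAll(occ);
                  count++;
              }
          }
          return count;
      }
  }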

3.2.3.4 Disregarding Occurrences of Discovered Patterns

Since the discovery process is recursive, the occurrences of a discovered pattern could be generated more than once. (In fact, a sub-graph of size k + 1 might be generated at most k + 1 times from the sub-graphs of size k it contains.) To avoid this redundancy, when generating the occurrences of candidate patterns, Explore checks whether a sub-graph is an occurrence of an already discovered pattern. It does this by comparing the Exas vector of the sub-graph to those of the patterns stored in L. If so, the sub-graph is disregarded in P ⊕ U.

3.2.4 Anomaly Detection

The usage patterns can be used to automatically find anomalous usages, i.e. the locations in programs that deviate from the typical object usages. The definition of an anomalous usage is adapted from Wasylkowski et al.(2007) for our graph-based representation. Figure 3.3 shows an example where a BufferedReader is used without close(). P is a usage pattern with a BufferedReader. P1 is a sub-pattern of P, containing only two action nodes, the BufferedReader constructor call and readLine.

A groum G contains an occurrence of P and thus also contains an occurrence G1 of P1 as a subgraph of that occurrence of P. Another groum H contains an occurrence H1 of P1 but no occurrence of P.

Since P1 is a sub-pattern of P, H1 is called an inextensible occurrence of P1 (i.e. it cannot be extended to an occurrence of P) and thus is considered to violate P. Because it contains H1, H is also considered to violate P. In contrast, G1 is extensible; thus, G1 and G do not violate P. However, not all violations are considered defects. For example, there might exist occurrences of the usage consisting of the constructor call and close() (without readLine) that also violate P, but they are acceptable. A violation is considered an anomaly when it is too rare. The rareness of the violations can be measured by the ratio v(P1,P)/f(P1), where v(P1,P) is the number of inextensible occurrences of P1 corresponding to P in the whole dataset. If the rareness is smaller than a threshold, the corresponding occurrences are considered anomalies. The lower a rareness value is, the higher the anomaly is ranked.

(Figure: pattern P consists of the BufferedReader constructor call, readLine, and close; sub-pattern P1 consists of the constructor call and readLine; groums G and H contain the occurrences G1 and H1 of P1, respectively.)

Figure 3.3: A Violation Example. G1 and H1 are occurrences of P1. H1 violates P while G1 does not.

Definition 3.7 A groum H is considered as a usage anomaly of a pattern P if H has an inextensible occurrence H1 of a sub-pattern P1 of P and the ratio v(P1,P)/f(P1) < δ, where v(P1,P) is the number of such inextensible occurrences in the whole groum dataset and δ is a chosen threshold.
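In code, the rareness test of Definition 3.7 reduces to a simple ratio check; the counts v(P1, P) and f(P1) are assumed to have been computed over the whole groum dataset:

  // Sketch of the rareness test: a violation is reported as an anomaly when the
  // ratio of inextensible occurrences to the total frequency of the sub-pattern
  // falls below the threshold delta.
  class AnomalyDetector {
      static boolean isAnomaly(int inextensibleOccurrences /* v(P1, P) */,
                               int frequencyOfSubPattern   /* f(P1)   */,
                               double delta) {
          double rareness = (double) inextensibleOccurrences / frequencyOfSubPattern;
          return rareness < delta;
      }
  }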

GrouMiner provides anomaly detection in two cases: (1) detecting anomalies in the currently mined project (by using the mined groums) and (2) detecting anomalies when the project changes, i.e., in a new revision. In both cases, the main task of anomaly detection is to find the inextensible occurrences of all sub-patterns P1 corresponding to the detected patterns. In the first case, because the occurrence set D(P1) is stored, GrouMiner can check each occurrence of P1 in D(P1): if it is inextensible to any occurrence of a detected pattern P generated from P1, then it is a violation. Those violations are counted toward v(P1,P). After checking all occurrences of P1, the rareness value v(P1,P)/f(P1) is computed. If it is smaller than the threshold δ, such a violation is reported as an anomaly. In the second case, GrouMiner must update the occurrence sets of detected patterns before finding the anomalies in the new version.

3.3 Empirical Evaluation

We ran GrouMiner on several Java projects (Table 3.1) to evaluate its performance and effectiveness in detecting patterns. The experiments were carried out on a computer with Windows XP, an Intel Core 2 Duo 2GHz CPU, and 3GB RAM.

3.3.1 Pattern Mining Evaluation

Table 3.1 shows the results of running GrouMiner on nine different open-source projects, with a total of more than 3,840 patterns. The numbers of groums (Groum) and the maximum groum sizes (Gmax) are quite large.

Table 3.1: Pattern Mining Result (σ = 6, δ = 0.1)

Project         Class  Method  Groum  Gmax  Pattern  Pmax  Pattern by size              Time
                                                            2     3-5   6-10  11+       (h:mm:ss)
Ant 1.7.1       1123   12409   9573   153   697      17    317   315   62    3         22:14
Log4J 1.2.15    292    2479    1763   99    141      10    79    60    15    0         0:39
AspectJ 1.5.3   1500   14716   9818   332   1055     15    429   413   180   33        1:09:24
Axis 1.1        1127   7834    5355   425   614      16    251   258   100   5         12:23
Columba 1.4     799    5083    3024   185   219      7     118   94    7     0         0:33
Fluid 12.05     229    3506    2477   115   236      14    92    94    46    4         8:43
jEdit 3.0       204    2274    1757   244   238      10    119   77    42    0         1:18
Jigsaw 2.0.5    701    6528    5073   152   443      11    197   204   41    1         26:34
Struts 1.2.6    365    3209    2412   107   198      8     62    114   22    0         1:19

A Usage Example:
  StringBuffer sb = new StringBuffer();
  sb.append("{");
  for (Iterator iter = supportedTargets.iterator(); iter.hasNext();) {
    String evalue = (String) iter.next();
    sb.append(evalue);
    if (iter.hasNext()) sb.append(",");
  }
  sb.append("}");
  return sb.toString();

Pattern:
  StringBuffer aStringBuffer = new StringBuffer();
  aStringBuffer.append(String);
  for (Iterator aIterator = Set.iterator(); aIterator.hasNext();) {
    String aString = aIterator.next();
    aStringBuffer.append(aString);
    if (aIterator.hasNext()) aStringBuffer.append(String);
  }
  aStringBuffer.append(String);
  return aStringBuffer.toString();

Figure 3.4: A Common Usage Pattern Mined from AspectJ

The number of method groums is smaller than that of methods because many methods are abstract methods or in interfaces, and thus have no bodies, or have bodies that do not involve objects. Table 3.1 shows that GrouMiner is quite efficient and can scale up to large graphs. The total size of graphs for the AspectJ system is about 70,000 nodes. However, the pattern detection time is very reasonable (a few minutes for small systems, up to half an hour or an hour for large systems). The time depends more on the distribution of patterns and groums in each system than on its size. In Table 3.1, we counted the total number of distinct patterns and eliminated the patterns that are contained within others. The numbers of detected patterns with sizes of 3 or more are about 44%-69% of the total numbers. This is also an advantage of GrouMiner over existing approaches, which focus on patterns of pairs or sets of pairs of method calls. Moreover, many of GrouMiner's patterns are program-specific. The maximum size of patterns Pmax varies across projects and ranges from 7 to 17. Case Study. Figure 3.4 shows a pattern mined from AspectJ to illustrate a routine to convert a Set to a String using StringBuffer and Iterator. GrouMiner is able to detect this pattern with four interplaying objects and the control structures for and if among the method calls. For object iter, JADET Wasylkowski et al.(2007), a well-known object usage miner, would produce a pattern P = {hasNext() < hasNext(), hasNext() < next()} (< means "occurs before"), thus providing less information.

Table 3.2: Anomaly Detection and Manual Checking

Project         Anomaly   Check   Defect   False positive
Ant 1.7.1       145       15      1        14
Log4J 1.2.15    32        15      0        15
AspectJ 1.5.3   244       15      1        14
Axis 1.1        145       15      0        15
Columba 1.4     40        15      1        14
jEdit 3.0       47        15      1        14
Jigsaw 2.0.5    115       15      1        14
Struts 1.2.6    33        15      0        15
Fluid 12.05     64        64      5        49

3.3.2 Anomaly Detection Evaluation

We ran anomaly detection on all nine systems (Table 3.2). We chose to examine all 64 reported anomalies for the Fluid project, where we have the domain knowledge. For the other systems, we checked the top 15 anomalies and manually classified them. In Fluid, there are 3 defects among the top 10 anomalies and 5 defects among the top 15 anomalies. In addition to the 5 defects found in Fluid, GrouMiner revealed 5 more new defects in other mature software such as Ant, AspectJ, Columba, jEdit, and Jigsaw. The defects include both common and program-specific ones. Carefully examining those additional ones, we found that they take the form of missing necessary steps in using the objects and missing condition and control structures. For example, in PointcutRewriter.simplifyAnd() in AspectJ, the use of Iterator.next() was not preceded by an Iterator.hasNext(). Similarly, in the method MapEntry.parseRestNCSA() of Jigsaw 2.0.5, the call to StringTokenizer.nextToken() was not preceded by StringTokenizer.hasMoreTokens(). On the other hand, the usage of ICloseableIterator in the method AbstractMessageFolder.recreateMessageFolderInfo of Columba and that of BufferReader in the method Registers.toString of jEdit missed an ICloseableIterator.close() and a BufferReader.close(), respectively.

3.4 Related Work

There exist several methods for mining temporal program behaviors. The closest research to GrouMiner is JADET Wasylkowski et al.(2007). For each Java object in a method, JADET extracts a usage model in terms of a finite state automaton (FSA) with anonymous states and transitions labeled with feasible method calls. The role of JADET's object usage model is similar in spirit to our groum. However, its model is built for a single object and does not contain control structures. GrouMiner's graphs represent the usage of multiple objects including their interactions, control flow and condition nodes among method calls. Another key difference is that GrouMiner performs frequent subgraph mining on object usage graphs to find graph-based patterns. In contrast, JADET uses frequent itemset mining to extract a pattern in terms of a set of pairs of method calls. Dynamine Livshits and Zimmermann(2005) looks at the set of methods that were inserted between versions of software to mine usage patterns. Each pattern is a pair of method calls. Engler et al. Engler et al.(2001)'s approach is also limited to patterns of pairs of method calls. Thus, each such pattern corresponds to an edge in a GrouMiner pattern. Acharya et al. Acharya et al.(2007) mine API call patterns using a frequent closed partial order mining algorithm and express them in terms of partial orders of API method calls. Their patterns do not have controls and conditions and do not handle multiple object usages. Williams and Hollingsworth Williams and Hollingsworth(2005) mine method usage patterns in which one function is directly called before another. Chang et al. Chang et al.(2008) use a maximal frequent subgraph mining algorithm to find patterns on condition nodes in PDGs. Their approach considers a set of nodes surrounding the control points in PDGs. In Alattin Thummalapenta and Xie(2009), the authors formulate alternative pattern detection based on frequent itemset mining, and then use such patterns to reduce false positives in detecting neglected conditions. FindBugs Hovemeyer and Pugh(2004) also looks for specified bug patterns. LtRules Liu et al.(2006) builds possible API usage orders determined by a predefined template for given APIs. PR-Miner Li and Zhou(2005) uses the frequent itemset mining technique to find the functions, variables, and data types that frequently appear in the same methods. Unlike GrouMiner, no order of method calls is considered. CP-Miner Li et al.(2006) uses frequent subsequence mining to detect clone-related bugs. Some clone detection approaches apply graph-based techniques, but are limited in scalability Krinke (2001). BugMem Kim et al.(2006a) mines patterns of defects and fixes from the version history. Given an API sample, XSnippet Sahavechaphan and Claypool(2006) provides example code of that API. In contrast, GrouMiner does not require a sample as an input and it detects anomalies. Similar tools include Prospector Mandelin et al.(2005a) and MAPO Xie and Pei(2006). PARSEWeb Thummalapenta and Xie(2007) takes queries of the form "from source object type to destination object type" as input, and suggests relevant method-invocation sequences as potential solutions. CodeWeb Michail (2000) detects patterns in terms of association rules among classes. In Doc2Spec Zhong et al.(2009b), the authors analyze API documentation using natural language processing to infer resource specifications. Ammons et al. Ammons et al.(2002) observe execution traces and mine usage patterns in terms of probabilistic FSAs. Shoham et al. 
Shoham et al.(2007) apply static inter-procedural analysis for mining API specifications in terms of FSAs. Both approaches require the alphabet of an FSA specification to be known. Pradel and Gross Pradel and Gross(2009) also addressed multiple object usages but used dynamic analysis to recover FSAs from execution traces. In Tikanga Wasylkowski and Zeller(2009), static analysis is combined with model checking to mine temporal specifications in terms of Computation Tree Logic formulas. Gabel et al. Gabel and Su(2008) mine temporal properties between method calls in execution traces and express a specification as an automaton. However, their approach does not distinguish methods from different objects. Yang et al. Yang et al.(2006) find behavioral patterns that fit into user-provided templates. Chronicler Ramanathan et al.(2007a) uses inter-procedural analysis to find function precedence protocols and detect their violations. Kremenek et al. Kremenek et al.(2006a) use a factor graph, a probabilistic model, to mine API method calls. Lo et al. Lo and Maoz(2009) use object hierarchies over traces of inter-object method calls as a refinement mechanism to mine hierarchical scenario-based specifications. Other approaches take as input a single type and derive the valid usage patterns as an FSA using static analysis or model checking Alur et al.(2005); Henzinger et al.(2005); Liu et al.(2006). PACHIKA Dallmeier et al.(2009) automatically infers object behavior models from execution traces and determines differences between passing and failing runs. Then, it generates possible fixes and assesses them via the regression test suite. Dallmeier et al. Dallmeier et al.(2005) analyze method call sequences between successful and failing executions to detect defects. Similarly, Di Fatta et al. Di Fatta et al.(2006) find frequent subtrees in the graphs of calls in passing and failing runs. Dickinson et al. Dickinson et al.(2001) cluster bugs based on their profiles to find error patterns. Fugue DeLine and Fahndrich(2004) allows users to specify object typestates and then checks for code conformance. Weimer et al. Weimer and Necula(2005) mine method pairs from exception control paths. In brief, those runtime mining approaches can complement our GrouMiner well.

3.5 Conclusions

The information on specific protocols among method calls of multiple interplaying objects is not always documented. This chapter introduces GrouMiner, a novel graph-based approach for mining usage patterns for multiple objects. The mined patterns can be used to assist developers in learning new code usages and to detect both common and program-specific usage anomalies and violations. GrouMiner includes a graph-based representation for multiple object usage patterns, an efficient graph-based mining algorithm to discover such patterns from source code, and a graph-based technique to detect object usage anomalies. The advantages of GrouMiner include useful detected patterns with control and condition structures among the method calls of objects, and scalable pattern discovery and anomaly detection. Our empirical evaluation shows that GrouMiner is able to find interesting patterns and to detect yet undiscovered defects.

CHAPTER 4. API PRECONDITION MINING

Modern software relies on existing application programming interfaces (APIs) from libraries. Formal specifications for the APIs enable many software engineering tasks as well as help developers correctly use them. In this work, we mine large-scale repositories of existing open-source software to derive potential preconditions for API methods. Our key idea is that APIs' preconditions would appear frequently in an ultra-large code corpus with a large number of API usages, while project-specific conditions would occur less frequently. First, we find all client methods invoking APIs. We then compute a control dependence relation from each call site and mine the potential conditions used to reach those call sites. We use these guard conditions as a starting point to automatically infer the preconditions for each API. We analyzed almost 120 million lines of code from SourceForge and Apache projects to infer preconditions for the standard Java Development Kit (JDK) library. The results show that our technique can achieve high accuracy with recall from 75–80% and precision from 82–84%. We also found 5 preconditions missing from human-written specifications. They were all confirmed by a specification expert. In a user study, participants rated 82% of the mined preconditions as a good starting point for writing specifications. Using our mining result, we also built a benchmark of more than 4,000 precondition-related bugs.

4.1 Introduction

Software in our modern world is developed using frameworks and libraries, which provide application programming interfaces (APIs) via classes and their methods. To be able to correctly use these APIs, programmers must conform to their specifications. For example, in the standard Java Development Kit (JDK), a call to next() on a LinkedList's iterator needs to be preceded by a call to hasNext() to ensure the list still has elements. For each API method, there are conditions that must hold whenever it is invoked. These are called the preconditions of the API. For example, in the JDK String class, the condition 'beginIndex <= endIndex' must hold when the method substring(beginIndex, endIndex) is called. These conditions, as part of the API's specification, have been shown to be useful for many automated software engineering tasks including the formal verification of program correctness Ammons et al.(2002); Ball and Rajamani(2001); Xie and Aiken(2005), generation of test cases Godefroid et al.(2005), building test oracles Nguyen et al.(2013a), bug detection Engler et al.(2001); Li and Zhou(2005); Weimer and Necula(2005), design by contract Burdy et al.(2005); Wei et al.(2011), etc. Popular formal specification toolsets include ESC/Java Flanagan et al.(2002), Bandera Corbett et al.(2000), Java Path Finder jpf, JMLC Leavens, Kiasan Deng et al., Code Contracts cod, etc. Manually defining specifications for libraries is time-consuming. One must read the documentation of the APIs and even the source code and convert the conditions into formats suitable for verification tools. To ease defining specifications, several approaches have been proposed to automatically derive the specifications. Generally, there are two types of approaches that complement each other: program analysis-based and data mining-based approaches. Among program analysis approaches, dynamic approaches Ammons et al.(2002); Beschastnikh et al. (2011); Ernst et al.(1999); Mariani and Pastore(2008); Weimer and Necula(2005) could detect data and temporal invariants and recover program behaviors. However, they require a large number of test cases, and their results might be incomplete due to the incompleteness of the test suites. On the other hand, static analysis approaches do not require dynamic instrumentation but produce specifications with high false-positive rates Engler et al.(2001); Kremenek et al.(2006b); Ramanathan et al.(2007b); Wei et al. (2011). Importantly, those static techniques focus their analyses only on an individual project, which has the call sites for only a small number of APIs. In contrast to program analysis-based approaches, other techniques in the mining software repositories (MSR) area have applied mainly data mining to derive API specifications from code repositories Gabel and Su(2008); Li and Zhou(2005); Livshits and Zimmermann(2005); Nguyen et al.(2009c); Wasylkowski and Zeller(2009); Wasylkowski et al.(2007); Yang et al.(2006); Zhong et al.(2009a). The key difference of these mining approaches from the traditional program analysis-based approaches is that they consider the usages of the APIs at the call sites in the client programs of the APIs to derive the conditions regarding only the usage orders or temporal orders among the API calls. 
While some approaches detect such orders as pairs of method calls Wasylkowski et al.(2007); Gabel and Su(2008); Williams and Hollingsworth(2005) (e.g., p must be called before q), other approaches mine the sequences of calls Zhong et al.(2009a); Thummalapenta and Xie(2009) or even a graph or finite state diagram of method calls Nguyen et al.(2009c); Pradel and Gross(2009); Wasylkowski and Zeller(2009). Other mining approaches focus on associations of API entities Li and Zhou(2005); Livshits and Zimmermann (2005). Unfortunately, those mining approaches do not aim to recover pre- and post-conditions as part of specifications. Moreover, except for a few methods Ramanathan et al.(2007b), they mainly rely on mining techniques without in-depth analysis of the data and control properties in the mined code.

This chapter introduces an approach that puts forth the idea of mining API specifications. Our approach combines static analysis and source code mining from a very large code corpus of open-source repositories to derive the preconditions of APIs in libraries and frameworks. We expect that the APIs' preconditions would appear frequently in an ultra-large corpus of open-source repositories that contains a very large number of usages of those APIs, while project-specific conditions would occur less frequently. Importantly, we combine the strengths of both static analysis approaches (via control dependency analysis) and MSR approaches (via mining) to make the technique scale to a large corpus. Moreover, we can derive preconditions for a large number of APIs or an entire library at the same time. Specifically, we used a very large-scale data set from SourceForge consisting of 3,413 Java projects with 497,453 source files, 4,735,151 methods and 92,495,410 SLOCs, and from Apache consisting of 146 projects with 132,951 source files, 1,243,911 methods and 25,117,837 SLOCs. To analyze the APIs' client code in such a large data set, we did not choose a dynamic analysis approach since it would require the generation of a very large number of test cases and a great deal of execution time. Instead, we developed a light-weight, intra-procedural, static analysis technique to collect all predicates for every API method in the data set. Our technique first builds the control dependence relation for each method. It then analyzes the different paths and conditions that lead to each method call to recover all primitive predicates for all API methods in the data set. After that, it mines the preconditions by performing normalization, merging, filtering, and ranking on them. In our empirical evaluation, we compared the mined preconditions with the real-world JML specifications for several JDK APIs that are created and maintained by the JML team Leavens. The results show that our precondition mining technique can achieve high accuracy with recall from 75–80% and precision from 82–84% for the top-ranked results. We also found 5 new preconditions (for two JDK classes) that were not listed by the JML team. We reported them to the team and received their confirmation for all of them. Moreover, we submitted to the JML team the preconditions for 11 previously unspecified methods in 2 JDK classes, and they accepted all proposals. Importantly, our method is light-weight and scales to such a large amount of code, allowing us to derive preconditions for the entire JDK library. We also conducted a user study in which human subjects experienced with specifications assessed the usefulness and correctness of our mined preconditions. 82% of the participants found that our result is a good starting point for writing specifications for the APIs under study. In addition to supporting specification writing, we show the usefulness of our mined preconditions by using them to build a benchmark of more than 4,000 API call sites that might be buggy due to missing precondition checking. Such a benchmark is useful for tools that detect neglected conditions Chang et al.(2008).

The key contributions of this chapter include:

1. A novel approach that combines the strengths of both code mining in an ultra-large code corpus and program analysis to derive the preconditions of APIs in libraries and frameworks, and

2. An empirical evaluation on a very large-scale data set to mine preconditions of JDK APIs.

Section 4.2 explains an example that motivates our approach. Our key program analysis and mining technique is presented in Section 4.3. Section 4.4 presents our empirical evaluation. Related work is described in Section 4.5. Section 4.6 concludes the chapter.

4.2 Motivating Example

Let us consider a commonly-used API from the Java Development Kit (JDK): String.substring(int, int) in package java.lang. The method takes as input two integer values: beginIndex, the index of the starting character (inclusive), and endIndex, the index of the ending character (exclusive). The method returns a new string that is the substring of the original string, using the two indices. Examining this API, we could learn that there are three preconditions that must hold before it is called:

1. beginIndex ≥ 0,

2. endIndex ≤ this.length() and

3. beginIndex ≤ endIndex.

A precondition for an API to be used could involve the receiver object of the API and/or one or multiple of its arguments. Identifying the complete set of preconditions for an API is a difficult and time-consuming task. However, this particular API is extremely popular (one of the most frequently used APIs in the JDK) and there is another way to learn these preconditions, without having to even look at the documentation or source code for the method. Consider one example usage of this API as shown in Figure 4.1. The method Request.setPathFragmentation(...) in the SeMoA SeMoA project uses this API (lines 19–20). Examining the source, we can see several conditions that must be false in order for the control flow to reach the API calls. For example, the if statement on line 2 must be false, meaning that both indices servletPathStart and extraPathStart must be non-negative, the indices must not be greater than the length of the string completePath, and servletPathStart must not be greater than extraPathStart. These are the same conditions we saw in the documentation. This gives us our first observation:

Observation 1 Preconditions can be inferred by looking at the conditions that must be satisfied before calling the APIs, i.e., the guard conditions of the API call sites.

1  public boolean setPathFragmentation(int servletPathStart, int extraPathStart) {
2    if (servletPathStart < 0 || extraPathStart < 0 ||
3        servletPathStart > completePath.length() ||
4        extraPathStart > completePath.length() ||
5        servletPathStart > extraPathStart)
6      return false;
7    if (servletPathStart == completePath.length()) {
8      ...
9      return true;
10   }
11   if (completePath.charAt(servletPathStart) != '/')
12     return false;
13   if (extraPathStart == completePath.length()) {
14     ...
15     return true;
16   }
17   if (completePath.charAt(extraPathStart) != '/')
18     return false;
19   contextPath = completePath.substring(0, servletPathStart);
20   servletPath = completePath.substring(servletPathStart, extraPathStart);
21   ...
22   return true;
23 }

Figure 4.1: Client code of API String.substring(int,int) in project SeMoA at revision 1929.

Let us consider line 19 of Figure 4.1. It contains another call to the API. The only difference is that at this call site, instead of a variable, the constant value 0 is passed as the first argument. Thus, the conditions on an argument of an API can be derived from the properties of the value passed to the API. This gives the observation:

Observation 2 The mining tool should take into account the properties of the arguments passed as the APIs’ parameters.

This client code, however, contains other conditions checked before the API call. Some of these conditions are specific to the logic of the client (lines 11 and 17). This gives our next observation:

Observation 3 Call sites might contain client-specific conditions, which could cause noise when inferring preconditions. Thus, an approach that mines preconditions from call sites should attempt to minimize such noise.

This has been a challenge for the existing static program analysis-based approaches Ramanathan et al.(2007b), which examine the call sites of the APIs only within the code of a single program. One way to minimize noise is to mine preconditions from a large number of projects. The valid preconditions should appear more frequently, while client-specific conditions should appear infrequently.

1  private String getCommand(int pc, boolean allThisLine, boolean addSemi) {
2    if (pc >= lineIndices.length)
3      return "";
4    if (allThisLine) {
5      ...
6      return ...;
7    }
8    int ichBegin = lineIndices[pc][0];
9    int ichEnd = lineIndices[pc][1];
10   ...
11   String s = "";
12   if (ichBegin < 0 || ichEnd <= ichBegin || ichEnd > script.length())
13     return "";
14   try {
15     s = script.substring(ichBegin, ichEnd);
16     ...
17   }
18 }

Figure 4.2: Client code of API String.substring(int,int) in project Jmol at revision 18626.

Table 4.1: Mined Preconditions for String.substring(int,int)

Receiver Object (rcv)        beginIndex                   endIndex
rcv.length() > 0             rcv.length() > beginIndex    endIndex >= 0
rcv.length() >= endIndex     beginIndex <= endIndex       endIndex != -1
rcv.length() > beginIndex    beginIndex >= 0              rcv.length() >= endIndex

Figure 4.2 shows another client, Jmol Jmol, that uses the same API (line 15) in the method ScriptEvaluator.getCommand(...). The if statement on line 12 checks the three required preconditions. Note that, in this case, the checked condition is stronger than the required one: the beginning index ichBegin is strictly less than the ending index ichEnd. This gives our next observation:

Observation 4 The relationship between conditions should be considered when mining preconditions.

For example, a stronger condition should be counted as an instance of a weaker one. A mining tool must consider the relations among conditions to derive a precondition. Similar to the previous client code, this method also contains client-specific conditions (lines 2 and 4). Again, these conditions are project-specific and not actual preconditions for the API in question. However, these conditions do not appear in the first client code, which is evidence that such noise would appear less frequently.

Motivation. This example motivates us to mine the preconditions via the guard conditions of the call sites of the APIs under study in a very large number of projects in a large-scale corpus. That would help to minimize the project-specific conditions (as noise) because they appear less frequently in the large corpus. The true preconditions would occur more frequently.

[Figure 4.3 diagram: the pipeline finds the methods calling each API, builds their CFGs and control dependence relations to collect the guard conditions Ω, and then normalizes, infers, filters (confpr(p) ≥ σ and confm(p) ≥ σ), and ranks the mined preconditions.]

Figure 4.3: Approach Overview: We first find all methods that call each API and compute the control- flow graph for each. Then, we generate the control dependence graph to identify conditions leading to an API call. From that, we create an inverted index, then normalize each condition. We infer and merge conditions and then filter some out. The final list is ranked, giving us our result.

In this chapter, we introduce such an approach that mines the preconditions of the APIs. In fact, after running our mining tool on a very large data set from SourceForge (consisting of 3,413 Java projects with 497,453 source files, 600,274 classes, 4,735,151 methods, and 92,495,410 SLOCs), we are able to derive the preconditions for the String.substring method in JDK. The columns in Table 4.1 show the preconditions with highest frequencies in the corpus that we mined for the receiver String object, and the arguments beginIndex and endIndex, respectively. As seen, the aforementioned true preconditions have among the highest frequencies. Project-specific conditions did not make the top of the list.

4.3 Mining with Large Code Corpus

Let us outline our approach for mining the preconditions for API methods. Figure 4.3 gives an overview, which can be summarized as:

1. The input is the set of all API methods under analysis and client projects to mine.

2. For each method in the corpus that calls an API, we build the control dependence relation between each method call and the predicates in the method (from the control-flow graph) and identify all preconditions of API calls. (Section 4.3.1)

3. Next, we normalize the preconditions to identify and combine the equivalent ones. (Section 4.3.2)

4. We then analyze the preconditions to infer additional ones which are not directly present in the client code. (Section 4.3.3)

5. Finally, we filter out non-frequent preconditions (Section 4.3.4) and rank the remaining ones in our final result. (Section 4.3.5)

4.3.1 Control Dependence and Preconditions

In order to identify the preconditions of API calls, we need to identify all predicates that guard the evaluation of each method call in the program. This can be done by building the control dependence relation Ferrante et al.(1987), based on the control-flow graph (CFG). In a CFG, each predicate node has exactly two outgoing edges labeled TRUE and FALSE representing the two corresponding branches.

Definition 4.1 A method call C is control-dependent on a predicate expression p if and only if on the corresponding CFG, all directed paths from p to C go out of p on the same edge—TRUE or FALSE.

This means that C is control-dependent on p if C is executed in only one branch of p. If C could be called in both branches of p, then C's execution does not depend on p. For example, in Figure 4.2, String.substring on line 15 is called only in the FALSE branch of the predicate on line 12; thus, it is control-dependent on that predicate. Our definition is stricter than the traditional definition by Ferrante et al. Ferrante et al.(1987), which requires that C always be called in one branch of p and not called in at least one path in the other branch. According to that definition, a method could be called in both the TRUE and FALSE branches of the predicate on which it is control-dependent, so the value of the predicate would not control the execution of the method call. This is the reason we give an adaptation in Definition 4.1.
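As a small illustration of this adapted definition (a hypothetical client method, not taken from the studied projects), the substring call below is control-dependent on the predicate because it is reached only on the TRUE branch, whereas the trim call is not, because it executes regardless of the predicate's value:

    class ControlDependenceExample {
        static String clip(String s, int i) {
            if (i >= 0 && i <= s.length()) {  // predicate p
                s = s.substring(i);           // control-dependent on p: only on the TRUE branch
            }
            return s.trim();                  // not control-dependent on p: reached on both branches
        }
    }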

Definition 4.2 An API method M is control-dependent on a predicate expression p in a client method if and only if all call sites of M in the client method are control-dependent on exactly one branch of p (TRUE or FALSE).

When M is control-dependent on the FALSE branch of p, the predicate that guards M will be the negation of the predicate expression in p. We now define what we consider to be a precondition for calling a method.

Definition 4.3 A precondition of a method call is a single clause in the conjunctive normal form (CNF) of a predicate on which the method call is control-dependent.

In Figure 4.2, the API call on line 15 is control-dependent on the FALSE branch of the if statement on line 12, so the predicate is negated and gives us: !(ichBegin < 0 || ichEnd ≤ ichBegin || ichEnd > script.length()).

Table 4.2: Extracting preconditions for String.substring(int, int) from the usages in Figures 4.1 and 4.2.

Figure 4.1, line 19           Figure 4.1, line 20           Figure 4.2, line 15
arg0 == 0                     arg0 >= 0                     arg0 >= 0
arg1 >= 0                     arg0 <= rcv.length()          arg1 > arg0
arg1 <= rcv.length()          arg0 != rcv.length()          arg1 <= rcv.length()
arg1 != rcv.length()          rcv.charAt(arg0) == '/'
rcv.charAt(arg1) == '/'       arg1 >= 0
                              arg1 <= rcv.length()
                              arg1 != rcv.length()
                              rcv.charAt(arg1) == '/'
                              arg0 <= arg1

This predicate is represented in CNF as !(ichBegin < 0) && !(ichEnd ≤ ichBegin) && !(ichEnd > script.length()). Moving the negations inside, we have a set of three preconditions: ichBegin ≥ 0, ichEnd > ichBegin and ichEnd ≤ script.length(). For the goal of deriving general specifications, the context-specific names/expressions must be abstracted from the individual method call sites. Since each call contains a receiver and a list of arguments, we are interested in the preconditions on each of these components. We use rcv and argi as the symbolic names for the receiver and the i-th argument in the list of arguments, respectively. First, we match the expression of the receiver and that of each parameter of the method call against the expression of the precondition. Then, we try all possible substitutions of occurrences of the receiver and parameters with their corresponding symbolic names. If the condition contains a variable/field, its latest value will be used in the precondition. Its latest value is the expression on the right-hand side of its most recent assignment (if any). In the above example, processing the three preconditions ichBegin ≥ 0, ichEnd > ichBegin and ichEnd ≤ script.length() of the method call script.substring(ichBegin, ichEnd) produces the following abstracted preconditions: arg0 ≥ 0, arg1 > arg0 and arg1 ≤ rcv.length(). A condition that does not involve any component of the call (i.e., having no symbolic names) will be discarded. Finally, to follow Observation 2, for each expression e passed as argument argi to a method call, we create a precondition in three cases. First, if e is a constant c of a primitive type, we create a precondition argi == c. Second, if e is an expression that can be recognized via its syntax as returning a non-null object, e.g., an object instantiation or array initialization expression, we create a precondition argi != null. Third, if e involves any component of the call, i.e., having some symbolic names, we create a precondition argi == e', where e' is obtained from e by replacing identifiers with the corresponding symbolic names, e.g., arg1 == rcv.length(). These equality preconditions are used to support the inference of the non-strict inequality preconditions such as arg1 >= 0 or arg1 >= rcv.length().

1  for each precondition p = (a == b)
2    for each precondition q = (a > b) or q = (a < b)
3      if q == (a > b) then t = (a >= b)
4      else t = (a <= b)
5      if |Ω(p)| = |Ω(q)| then Ω(t) = Ω(t) ∪ Ω(p) ∪ Ω(q)
6      else if |Ω(p)| > |Ω(q)| then Ω(t) = Ω(t) ∪ Ω(q)
7      else Ω(t) = Ω(t) ∪ Ω(p)

Figure 4.4: Inferring non-strict inequality preconditions.

Table 4.2 shows the resulting preconditions mined from our example API using this process. For each API, the preconditions are stored in a map Ω, in which Ω(p) returns the set of calling methods containing precondition p before calling the API.
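A minimal sketch of such a map (hypothetical types and identifiers; the tool's internal representation may differ) could associate each normalized precondition text with the set of client methods in which it guards a call to the API:

    import java.util.*;

    // Sketch: Omega maps a precondition to the client methods that check it before an API call.
    class PreconditionIndex {
        private final Map<String, Set<String>> omega = new HashMap<>();

        void add(String precondition, String clientMethodId) {
            omega.computeIfAbsent(precondition, k -> new HashSet<>()).add(clientMethodId);
        }

        Set<String> occurrences(String precondition) {             // Ω(p)
            return omega.getOrDefault(precondition, Collections.emptySet());
        }

        Set<String> allPreconditions() { return omega.keySet(); }
    }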

4.3.2 Precondition Normalization

Since we collect preconditions from call sites in different methods and projects, there are conditions that are equivalent but expressed in different forms. For example, the conditions arg1 > arg0, arg0 < arg1, (arg0 - arg1) < 0, (arg1 - arg0) > 0 and arg0 - 1 < arg1 - 1 all express the same condition. Thus, we need to normalize the preconditions. The first step is to ensure every unary/binary expression is enclosed by exactly one pair of opening and closing parentheses. The next step is to order the operands in the binary operations (such as >, <, etc.) of the preconditions so that they are comparable between call sites. Whenever two operands of a binary operation are re-ordered, the operator is reversed correspondingly.

For any comparison expression E = E_l D E_r, where D is a comparison operator, we transform it into E' = E'_l D' E'_r, where E'_r contains only literals and E'_l contains all symbolic and other identifier names. If E'_r consists only of numeric literals, it is evaluated. The terms in E'_l are ordered in the ascending order of their names. For example, all 5 conditions above will be normalized into the same condition (arg0 - arg1) < 0. Finally, the map Ω is updated with the normalized preconditions for each API.
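The reordering step could look like the simplified sketch below (an assumption-laden illustration: it handles only a single binary comparison whose operands are single tokens, whereas the actual tool normalizes full expression ASTs):

    import java.util.Map;

    // Sketch: normalize a simple comparison "lhs op rhs" so that literals end up on the right-hand side.
    class Normalizer {
        private static final Map<String, String> REVERSED =
            Map.of("<", ">", ">", "<", "<=", ">=", ">=", "<=", "==", "==", "!=", "!=");

        static String normalize(String lhs, String op, String rhs) {
            boolean lhsIsLiteral = lhs.matches("-?\\d+");
            boolean rhsIsLiteral = rhs.matches("-?\\d+");
            if (lhsIsLiteral && !rhsIsLiteral) {       // e.g., "0 <= arg0" becomes "arg0 >= 0"
                return rhs + " " + REVERSED.get(op) + " " + lhs;
            }
            return lhs + " " + op + " " + rhs;
        }
    }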

4.3.3 Precondition Inference

Inferring non-strict inequality preconditions. In the client code, a non-strict inequality precondition (a >= b or a <= b) might be split into strict inequality (a > b or a < b) and equality (a == b) conditions, and checked at different call sites. Figure 4.4 shows our algorithm for inferring the non-strict inequality precondition. When the two preconditions p and q are checked equally often, all call sites for both of them are counted toward the inferred condition (line 5). Otherwise, only the call sites of the less-frequently used precondition are added (lines 6 and 7). This helps us avoid counting the occurrence frequencies of incorrect conditions toward the inferred one.
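A direct Java translation of Figure 4.4 might look as follows (hypothetical representation of Ω as a map from precondition text to the set of client methods containing it):

    import java.util.*;

    // Sketch of Figure 4.4: infer a >= b (resp. a <= b) from a == b and a > b (resp. a < b).
    class InequalityInference {
        static void infer(Map<String, Set<String>> omega, String a, String b) {
            Set<String> eq = omega.get(a + " == " + b);                        // Ω(p) with p = (a == b)
            if (eq == null) return;
            for (String strictOp : List.of(">", "<")) {
                Set<String> strict = omega.get(a + " " + strictOp + " " + b);  // Ω(q)
                if (strict == null) continue;
                String t = a + (strictOp.equals(">") ? " >= " : " <= ") + b;
                Set<String> inferred = omega.computeIfAbsent(t, k -> new HashSet<>());
                if (eq.size() == strict.size()) { inferred.addAll(eq); inferred.addAll(strict); }
                else if (eq.size() > strict.size()) inferred.addAll(strict);   // add only the rarer one
                else inferred.addAll(eq);
            }
        }
    }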

Merging strong and weak conditions. Among the preconditions, some imply others (Observation 4). If a stronger condition holds, the weaker condition holds too. This means that all call sites of the stronger condition could be merged to (counted toward) those of the weaker. However, merging can lead to inferring wrong preconditions if the weaker one is in buggy code or specific to a particular client (Observation 3). We avoid this noise by using the assumption that the more frequently a precondition is checked, the more likely it is correct. Thus, if the stronger condition is less frequently checked than the weaker one, its call sites will be merged to those of the weaker and it will be removed from the set of preconditions.

1  for each pair of preconditions (p, q)
2    if p → q ∧ |Ω(p)| ≤ |Ω(q)| then
3      Ω(q) = Ω(q) ∪ Ω(p)
4      remove p from Ω

Figure 4.5: Merging preconditions with implication.

The procedure is shown in Figure 4.5. Note that this merging will remove all equality and/or strict inequality preconditions composing the non-strict ones. For example, if two conditions p: (arg == 0) and q: (arg > 0) infer the condition t: (arg >= 0) and p is stronger and less frequently checked than t, then, as at line 7 in Figure 4.4, the call sites containing p will be added to those of t. Then, p is removed.
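A corresponding sketch of Figure 4.5 is shown below; the implies oracle is a stand-in (hypothetical) for the tool's implication check between comparison expressions:

    import java.util.*;
    import java.util.function.BiPredicate;

    // Sketch of Figure 4.5: merge a stronger, less frequent precondition into a weaker one it implies.
    class ImplicationMerging {
        static void merge(Map<String, Set<String>> omega, BiPredicate<String, String> implies) {
            List<String> conditions = new ArrayList<>(omega.keySet());
            for (String p : conditions) {
                if (!omega.containsKey(p)) continue;
                for (String q : conditions) {
                    if (p.equals(q) || !omega.containsKey(q)) continue;
                    if (implies.test(p, q) && omega.get(p).size() <= omega.get(q).size()) {
                        omega.get(q).addAll(omega.get(p));   // count p's call sites toward those of q
                        omega.remove(p);                     // drop the stronger, rarer condition
                        break;
                    }
                }
            }
        }
    }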

Dealing with dynamic dispatch. Since the data types cannot be precisely resolved statically, some actual API calls could be missed in our static analysis; thus, all their preconditions at those call sites could be missed too. For example, the method call obj.add(), which is resolved statically as List.add() because obj is declared as List, could actually be ArrayList.add() at runtime. We address this with a conservative solution: whenever a set of preconditions is extracted for a call of API m, that set is also considered as the preconditions of all APIs that override or implement m in the library. The rationale behind this is the assumption of behavioral subtyping, in which preconditions cannot be strengthened in a subtype Liskov and Wing(1994). Thus, this heuristic will enrich the set of extracted preconditions for a sub-type with those from the super-type, which are the same or stronger than the actual ones. Those preconditions could be merged to the actual ones and increase the confidence of the actual ones.
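For instance (an illustrative snippet, not taken from the corpus), the receiver below is statically typed as List, so the guard conditions are attributed to List.add(int, E) even though ArrayList.add(int, E) may run at runtime; the heuristic therefore also records them for the overriding method:

    import java.util.List;

    class DispatchExample {
        static void insertAt(List<String> list, int i, String s) {
            if (i >= 0 && i <= list.size()) {   // guard conditions extracted for List.add(int, E)
                list.add(i, s);                 // may dynamically dispatch to ArrayList.add(int, E)
            }
        }
    }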

4.3.4 Precondition Filtering

Since we mine preconditions from many projects/methods in a large-scale code corpus, there are conditions which are context-specific or might even be incorrect. These conditions are not useful for building the API specifications and should be filtered out. First, we remove all conditions which are checked only once in the whole code corpus. Then, for each API, we remove all conditions which have low confidence in being checked before calling the API. The confidence of a precondition for an API is measured as the ratio between the number of code locations checking the condition before calling the API over the total number of locations calling the API. We compute two values for confidence corresponding to two types of locations: one over client projects (confpr) and another over client methods (confm)

confpr(p) = |Ψ(p)| / |∪q Ψ(q)|        confm(p) = |Ω(p)| / |∪q Ω(q)|

where Ψ(p) is the set of projects with condition p before the API call. For each API, we keep only the preconditions that have both confidence values higher than or equal to a certain threshold σ. We use σ = 0.5 in our experiment.
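The two confidence values and the filtering test could be computed as in the sketch below (assumed representation: Ψ and Ω as maps from precondition text to the sets of client projects and client methods, respectively, for one API):

    import java.util.*;

    // Sketch: keep a precondition only if both conf_pr and conf_m are at least sigma (e.g., 0.5).
    class ConfidenceFilter {
        static double confidence(Map<String, Set<String>> sets, String p) {
            Set<String> allLocations = new HashSet<>();
            for (Set<String> s : sets.values()) allLocations.addAll(s);   // union over all q
            return allLocations.isEmpty() ? 0.0 : (double) sets.get(p).size() / allLocations.size();
        }

        static boolean keep(Map<String, Set<String>> psi,    // precondition -> projects, Ψ
                            Map<String, Set<String>> omega,  // precondition -> client methods, Ω
                            String p, double sigma) {
            return confidence(psi, p) >= sigma && confidence(omega, p) >= sigma;
        }
    }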

4.3.5 Precondition Ranking

For each API, we rank the preconditions based on their total confidence, which is computed as conf(p) = confpr(p) × confm(p). Using only confm(p) might favor the conditions used a lot but only in a small number of projects. In contrast, using only confpr(p) might favor the conditions which are accidentally repeated in many projects but not used frequently. Thus, our approach combines both confidence values for ranking. Different from the traditional ranking scheme that puts all items in one list, our approach uses different ranked lists for the receiver object, the arguments of an API, and any combinations of them. Only the top-1 precondition in each ranked list is kept in the final result.
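As a short sketch (hypothetical names), the combined score and the selection of the top-ranked precondition of one list could be written as:

    import java.util.*;

    // Sketch: pick the precondition with the highest conf_pr * conf_m from one ranked list.
    class PreconditionRanking {
        static Optional<String> top(List<String> candidates,
                                    Map<String, Double> confPr, Map<String, Double> confM) {
            return candidates.stream()
                    .max(Comparator.comparingDouble((String p) -> confPr.get(p) * confM.get(p)));
        }
    }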

4.4 Empirical Evaluation

In this section, we aim to answer two research questions: RQ1. How accurate are the preconditions mined by our approach? The answer to this question would tell whether our approach works in identifying the preconditions from usages in a large code corpus. RQ2. How useful are the mined preconditions as a starting point in writing API specifications?

4.4.1 Data Collection

We collected a large code corpus from two sources: SourceForge.net (SF) SourceForge and the Apache Software Foundation (ASF) Apache. SF is a free source code hosting service for managing open source software projects. ASF is an American non-profit corporation that manages the development of Apache open source projects.

Table 4.3: Collected projects and API usages.

                            SourceForge.net   Apache
Projects                    3,413             146
Total source files          497,453           132,951
Total classes               600,274           173,120
Total methods               4,735,151         1,243,911
Total SLOCs                 92,495,410        25,117,837
Total JDK public classes    1,275             1,275
Total JDK public methods    11,049            11,049
Total used JDK classes      806 (63%)         918 (72%)
Total used JDK methods      7,592 (63%)       6,109 (55%)
Total method calls          22,308,251        5,544,437
Total JDK method calls      5,588,487         1,271,210

For SF, we downloaded project metadata in JSON format from its website and collected information about all projects that are self-classified as written in Java. To get higher quality code for mining the preconditions, we filtered out the projects that might be experimental or toy programs based on the number of revisions in the history. We only kept projects with at least 100 revisions. We downloaded the last snapshot of each project. We eliminated from the snapshot of a project the duplicated code from different branches/versions of the project. For ASF, we checked the list of all Apache projects Apache and downloaded the source code of the latest stable releases of all projects written in Java. Table 4.3 shows the statistics on our datasets. SF has 3,413 projects satisfying the above criteria and ASF has 146 projects. Both have hundreds of thousands of source files. The total amount of code is almost 120 million lines of code (SLOCs), of which SF contributes about four times as much as ASF. The projects are written by thousands of developers and cover a variety of domains and topics. In this experiment, we focus on the APIs in the JDK library. Analyzing all APIs from the java packages, we found that there are 1,275 public classes and 11,049 public methods in the library. We also observed that many APIs have not been used at all in the studied projects. Only 63% and 72% of the accessible JDK classes have been used in SF and Apache, respectively. The corresponding numbers of JDK methods used are 63% and 55%, respectively. In both SF and ASF, about one-fourth of all method calls are calls to JDK methods. This shows that those open-source projects rely heavily on the JDK library.

4.4.2 Ground-truth: Java Modeling Language (JML) Preconditions

In order to evaluate the accuracy of our mined preconditions, we used a ground-truth of known-correct preconditions. The Java Modeling Language (JML) is a language for specifying the behavior of

1  /*@ public normal_behavior
2   @   requires 0 <= beginIndex
3   @     && beginIndex <= endIndex
4   @     && (endIndex <= length());
5   @   ensures \result != null && \result.stringSeq.equals(this.stringSeq.subsequence(beginIndex, endIndex));
6   @ also
7   @ public exceptional_behavior
8   @   requires 0 > beginIndex
9   @     || beginIndex > endIndex
10  @     || (endIndex > length());
11  @   signals_only StringIndexOutOfBoundsException;
12  @*/
13 public String substring(int beginIndex, int endIndex);

Figure 4.6: JML Specification for String.substring(int, int)

Java classes and methods. Specifications are defined using a custom syntax inside of special comments that start with '@'. Figure 4.6 shows part of the specification in JML for the substring(int, int) method discussed in Section 4.2. The specification defines both normal behavior (lines 1–5) and exceptional behavior (lines 7–10), and signals a certain exception when certain preconditions hold (line 11). The normal behavior for this method requires three conditions to hold prior to calling the method. These conditions are declared using requires statements and boolean expressions (lines 2–4). The specification also ensures that after finishing normal execution two conditions hold. These are declared using ensures statements and boolean expressions (line 5). A precondition is 1) a clause in the conjunctive normal form of the boolean expression following a requires keyword in a normal behavior, or 2) a clause in the conjunctive normal form of the negation of the boolean expression following a requires or signals keyword in an exceptional behavior. If a specification has multiple normal and/or exceptional behaviors, we combine them by taking the union set of the preconditions. For example, if preconditions i > 0 and i == 0 appear in two normal behaviors, they will be combined into a precondition i >= 0. The preconditions are then abstracted using the symbolic names. The authors and maintainers of JML have written specifications for several popular Java packages from the JDK and published them on their website JML(2013). We downloaded and analyzed these specifications. As shown in Table 4.4, there are specification files for 62 classes from 6 JDK packages. After analyzing them, we learned that, in 15 class files, there are no specifications for any method (column No Spec) and in 5 other files, there are specifications for some methods but not all (column Some Spec). We read the remaining 42 files, which contain specifications for all methods (column Full Spec), and extracted all preconditions for all of their methods. Table 4.5 summarizes the number of extracted preconditions of the methods in those 42 classes. We group the methods based on the numbers of extracted predicates in the preconditions.

Table 4.4: Specifications for JDK classes from JML Website

Package           Full Spec   Some Spec   No Spec   Total
java.io           3           0           7         10
java.lang         14          3           1         18
java.net          2           0           0         2
java.sql          0           0           5         5
java.util         23          2           0         25
java.util.regex   0           0           2         2
All               42          5           15        62

Table 4.5: JML preconditions of JDK methods in classes with full specifications

Number of preconditions   0    1     2     3    4    5   6   7
Number of methods         78   465   144   62   36   4   4   4
Total number of methods: 797. Total number of preconditions: 1,155.

In total, there are 1,155 preconditions for 797 methods, in which 78 methods have no preconditions, 465 have one precondition, and so on. As seen, most of them have from 0 to 3 preconditions. A much smaller percentage of methods has more than 3 preconditions.

4.4.3 RQ1: Accuracy

4.4.3.1 Result

We ran our tool on the two datasets, and compared the mined preconditions with those in the JML ground-truth. We used two metrics: precision and recall. Precision is measured as the ratio between the number of correctly-mined preconditions and the total number of mined preconditions. Recall is measured as the ratio between the number of correctly-mined preconditions and the total number of preconditions in the ground-truth. A mined condition is considered correct if it is exactly matched with one precondition of the same method in the ground-truth using syntactic checking. If a mined condition is not in the ground-truth, we manually verified it. If it is a not-yet-defined one, or semantically equivalent to a precondition (e.g., !rcv.isEmpty() and rcv.size() > 0), or implied by the preconditions of that method in the ground-truth (e.g., b > 0 is implied by a > 0 and a < b), it is counted as correct. Table 4.6 shows the accuracy for all mined preconditions. In both datasets, the tool achieved high accuracy with recall from 75–80% and precision from 82–84%. The accuracy for the two sources is comparable. The accuracy for SourceForge is a bit higher than that for Apache. When both datasets are combined, precision lies between those for the two datasets. However, recall is slightly improved since a few more API methods, which were seen in only one of the two datasets, are included in the result for the combined dataset.

Table 4.6: Mining accuracy over preconditions

Dataset       Mined   Precision   Recall   Time
SourceForge   1,098   84%         79%      17h35m
Apache        1,065   82%         75%      34m
Both          1,127   83%         80%      18h03m

Table 4.7: Mining accuracy over methods

Dataset   Fully-covered: Total   Perfect      1 Extra      >1 Extra     Total Inc.
SF        613 (77%)              492 (62%)    118 (15%)    3 (0.38%)    60 (8%)
Apache    593 (74%)              457 (57%)    126 (16%)    10 (1.25%)   78 (10%)
Both      628 (79%)              489 (61%)    131 (16%)    8 (1.00%)    47 (6%)

Table 4.7 shows more detailed numbers on the mining accuracy for all the API methods. As seen, with the SourceForge dataset, our tool can cover all of the preconditions for 613 out of 797 (77%) JDK methods in the ground-truth. That is, for 77% of the methods under investigation, specification writers would just have to verify the result and remove some incorrect preconditions. Among those 613, we derive the preconditions perfectly for 492 methods. That is, for 62% of the methods, specification writers could use the set of preconditions as is. There are 118 (15%) and 3 (0.38%) methods having 1 and more than 1 extra (incorrect) preconditions, respectively. Our tool cannot produce any correct preconditions for only 8% of the methods. The numbers are comparable for Apache and the combined dataset. Thanks to our light-weight analysis, the running time for Apache, which has more than 25M SLOCs and 1.2M JDK API calls, is just 34 minutes. The time for SourceForge and for both is much longer, mainly due to accessing the local SVN repositories.

4.4.3.2 Analysis

Incorrect Cases. We first analyzed the incorrect cases. Since the JML specifications were manually built by the JML team, it is possible that some preconditions are still missing from the current version of their specifications. Thus, for the mined preconditions that are not in the ground-truth from the JML team, we manually verified them to see if they are truly incorrect cases. We found 5 correctly mined preconditions that were missing in the ground-truth (Table 4.8). We sent them to the main author of JML. He kindly confirmed all five cases. This is evidence that our tool could help specification writers reduce their effort and mistakes.

Table 4.8: Newly found preconditions in JML specifications

Class          Method                          Precondition
String         getChars(int,int,char[],int)    arg3 >= 0
StringBuffer   append(char[])                  arg0 != null
BitSet         flip(int, int)                  arg0 <= arg1
               set(int, int)                   arg0 <= arg1
               set(int, int, boolean)          arg0 <= arg1

Table 4.9 shows the summary of the incorrectly-mined preconditions, which are classified into 3 types. For the majority of the incorrect cases, the mined preconditions are stronger than the actual ones. The reason is that our tool cannot distinguish between a precondition that is part of an API usage and one that is part of the API specification. For example, the API java.util.List.add(Object) accepts a null argument. However, in many usages of that API in the client code, developers often perform null checking for the argument before calling it. Thus, our tool reported the incorrect condition: arg0 != null. Another example is the API File.mkdir(), which does not require any preconditions in its specification. If the operation fails for some reason, it will return false. However, to avoid unnecessary operations on the file system and to control the reason of the failure, developers often check file existence with !exists() before calling mkdir(). Another example is the method valueOf(Object obj). Our tool detects the null checking on the argument, arg0 != null, in several client projects, but it is not part of its specification. These examples show an interesting gap between the actual API usages in client code and the intended usages from the API designers. This suggests a further investigation for API designers on how to adjust the APIs to better support developers in the APIs' client code. In the second type of incorrect cases, the conditions along the path to an API call are irrelevant to the preconditions of the API. For example, it is frequent that developers check if both arguments are positive before calling Math.min(). Those checks might make sense in terms of the logic of the program; however, they are not relevant as preconditions. For the third type, a few incorrect cases are caused by the imprecision in our light-weight static program analysis. An example is the incorrectly-mined precondition arg0 <= 0 of StringBuffer.ensureCapacity(int). In the code, the call to this API belongs to the branch satisfying arg0 <= 0; however, the sign of arg0 is reversed before the call. Our analysis did not keep track of the value change in the code leading to the call and thus extracted an incorrect condition. To track value changes, we could use dynamic symbolic execution.

Missing Cases. To better understand the missing cases, we examined all the preconditions which are in the ground-truth but were not mined by our tool. We classified the missing cases into four categories

Table 4.9: Different types of incorrectly-mined preconditions

Dataset       Total   Stronger   Irrelevant   Analysis Err.
SourceForge   173     118        53           2
Apache        187     121        65           1
Both          195     129        66           0

Table 4.10: Four Types of Missing Preconditions

Dataset   No-call   Private   No occur.   Low freq.
SF        4%        4%        9%          3%
Apache    5%        5%        12%         3%
Both      2%        5%        10%         4%

as shown in Table 4.10. Each cell of the table shows the ratio between the number of missing cases in the corresponding category and the total number of preconditions in the ground-truth. The first category (column No-call) consists of the preconditions of the API methods that have their JML specifications in the ground-truth, but have never been called in the client code in our datasets. For SourceForge, there are 46 such methods with 45 preconditions. For Apache, the corresponding numbers are 49 and 58. For those methods and preconditions, which contribute about 4% and 5% of the total numbers of preconditions, respectively, our tool cannot mine the preconditions. The second category (column Private) contains the preconditions involving the APIs' private and internal fields or methods, which are inaccessible from client code. Examples of this category are:
1. Precondition !changed of Observable.notifyObservers(): changed is a private field of the Observable class that represents the internal state of the object. The method notifyObservers is called only if the object's state was changed.
2. Precondition parseable(s) of Integer.parseInt(String s): this condition requires the string argument of parseInt to be parseable.
3. Precondition capacityIncrement >= 0 of Stack.push(Object): the stack can only be pushed if its internal capacity increment is non-negative.
The first two categories are due to the inherent limitation of mining approaches on client code; however, their percentages are small. The last two categories contain the preconditions which could occur in the client code but are not in our result, either due to the limitations of our static analysis that cannot detect the occurrences of the conditions (No occur.) or due to the cut-off thresholds (Low freq.).

[Figure 4.7: two line charts, (a) SourceForge and (b) Apache, plotting Precision, Recall, and Fscore (0–100%) against the data size B in projects (1, 2, 4, ..., up to the full dataset).]

Figure 4.7: Mining accuracy as the size of dataset varies

4.4.3.3 Accuracy by data size

When computing accuracy, we also analyzed the impact of the size of the dataset on our algorithm. We ran our tool on various data sizes. From each full dataset, SourceForge and Apache, we created datasets of size B by randomly partitioning the projects of the full dataset into bins containing B projects each. Using each bin as input, we ran our tool on it and recorded the accuracy (precision and recall) for that bin. Then, we computed the average accuracy over all bins, and used that accuracy for that size. In this experiment, we chose B = 2^i, meaning that we kept increasing the data sizes by a power of 2 until reaching the full dataset. To consider both precision and recall, we used Fscore, the harmonic mean of precision and recall, which is computed as Fscore = 2 × Precision × Recall / (Precision + Recall). Figure 4.7 shows the result. The values on the lines at B = 1 show the accuracy for the datasets containing individual projects, and those at B = Full show the accuracy for the full dataset as input. As the data size increases, precision decreases and recall increases. The gain in recall is much higher than the loss in precision, making their harmonic mean Fscore increase significantly: from 7% to 82% for SourceForge and from 21% to 79% for Apache.

4.4.3.4 Accuracy Sensitivity Analysis

In this experiment, we studied the impact of different components of our method on the accuracy. In Figure 4.8, the baseline (group Base) is the solution that extracts the preconditions by looking at only the guard conditions (e.g., the ones in if statements) on the path leading to the API calls. This baseline does not consider the properties of the passed arguments, normalization, or merging, nor does it deal with dynamic dispatch. Then, we successively add the other components one by one to the baseline solution

[Figure 4.8: bar charts for (a) SourceForge and (b) Apache showing Precision and Recall (0–90%) as the components Base, Arg, Norm, Merge, and Subtype are successively added.]

Figure 4.8: Mining accuracy as technical components are successively added

to see the changes in accuracy. The second solution (Arg) adds the preconditions that are obtained from the properties of the passed arguments, e.g., arg0 == 0, arg1 != null. The third one includes normalization of preconditions and the fourth one includes merging. The last one covers all components in our approach by adding the subtyping information, in which the preconditions of a method are also collected from the call sites of the methods it overrides, to deal with dynamic dispatch. As more components are added, recall increases significantly from 60% to 79% in SourceForge and from 55% to 75% in Apache, while precision is maintained. Among the components, adding the properties of arguments passed to APIs improves the recall by 6% in SourceForge and 8% in Apache. The respective improvements from adding merging of conditions are 7% and 5%. Adding subtyping contributes 4% and 6%. Normalization contributes 2% for both.

4.4.4 RQ2: Usefulness

We also studied how useful our automatically mined preconditions are for writing specifications via two experiments.

4.4.4.1 Suggesting preconditions in specifications

Our first experiment looks at the mined preconditions for API methods that currently do not have a JML specification provided. We ran our tool to automatically mine preconditions for the APIs and then manually transformed them into JML syntax. We then sent these JML-styled specifications to one of the original authors of JML. If he agreed that these specifications are correct, it lends evidence that our approach is useful as a tool for suggesting preconditions when writing the initial specification for APIs. Our results are summarized in Table 4.11. In total, we prepared specifications for 11 API methods from 2 JDK classes which previously had no JML specifications. Our tool generated a total of 29 mined preconditions (column M). For our approach, one author transformed the automatically generated preconditions into JML specifications. A second author, who has extensive experience with JML's syntax including designing and implementing the JML research compiler JAJML, then performed a manual validation of the results and removed 4 preconditions (column Rm) which are incorrect for the corresponding APIs. Five preconditions were deemed close (column Fix), but required modification of the comparison operator from strictly greater than (>) to greater than or equal to (>=). The remaining 20 were accepted exactly as the tool mined them. After this step, the specifications containing 25 preconditions (including the 5 modified) were sent to the JML team member.

Table 4.11: Suggesting preconditions

Class         Method                   M   Rm  Fix  Accept
StringBuffer  delete(int,int)          4   1   2    Y
              replace(int,int,String)  3   1   0    Y*
              setLength(int)           2   1   0    Y
              subSequence(int,int)     4   1   1    Y
              substring(int,int)       3   0   1    Y
LinkedList    add(int,Object)          2   0   0    Y
              addAll(int,Collection)   3   0   1    Y
              get(int)                 2   0   0    Y
              listIterator(int)        2   0   0    Y
              remove(int)              2   0   0    Y
              set(int,Object)          2   0   0    Y
Total                                  29  4   5

As seen, the JML team member agreed on the specifications of 10 out of 11 methods, i.e., the set of suggested preconditions is complete and precise (Y in column Accept). For only one method, StringBuffer.replace (Y* in column Accept), the suggested preconditions are correct, but two other preconditions are missing.

4.4.4.2 Web-based survey

In the second experiment, we created a web-based survey and asked human subjects who have experience with using the JDK library and/or formal specification languages such as JML to evaluate the resulting preconditions. We had a total of 15 respondents. Participants were asked to rate their experience with Java, JML, reading specifications, and writing specifications. Two thirds self-reported having more than 6 months of experience writing specifications, and many had experience with JML specifically. Participants were shown an example method (e.g., the substring method from Section 4.2) along with the set of proposed preconditions we mined for that method. We then pre-selected the correct answers (based on the JML ground-truth) for each condition and explained why it was "correct", "a good starting point", or "incorrect". "Correct" means that this precondition can be used as-is in the specification. "Good starting point" means that it might need small modifications to be used in a specification, such as changing a comparison operator from strict to non-strict. "Incorrect" means that the condition is irrelevant in building the specification. Next, users were shown 5 methods one at a time and the mined preconditions for them. They were asked to rate each individual precondition as described above. We also asked them to give an overall, more subjective, rating for the entire method on whether our mined preconditions are useful. After 5 methods, they were given an opportunity to write general feedback. They also had an opportunity to continue rating more preconditions for other methods. On average, each participant graded 20 preconditions. When randomly choosing methods for a user, we enforced that the first two were APIs that existed in the ground-truth and the last three were APIs that did not. Using the responses from the first two, we were able to grade the users on their expertise by computing the fraction of answers that matched the ground-truth. For this study, we kept only responses from users who scored 100% on this grading. In total, there were 9 users grading 75 methods with 104 preconditions. The following table shows the correctness of the preconditions as rated by participants. Excluding the 'Not Sure' responses, the participants rated 63% as Correct. What the results in Section 4.4.3 could not show, however, is the amount of almost-correct preconditions, which the participants rated at almost 19%.

Correct   Good Starting Point   Incorrect   Not Sure
64        19                    18          3
63%       19%                   18%         –

Overall, participants found that 82% of the mined preconditions are useful as a starting point for writing the specification. The following table shows the responses for rating the tool's usefulness:

Agree+   Agree   No Opinion   Disagree   Disagree+
23       33      6            9          4
33%      48%     –            13%        6%

Again, excluding the ’No Opinion’ responses, the participants rated the tool as useful for 81% of the methods shown!

4.4.4.3 A benchmark of precondition-related bugs

In this section, we show an application of our mined preconditions in building a benchmark of bugs caused by missing precondition checking. An example of this type of bug is that a developer does not check the condition beginIndex ≤ endIndex before calling String.substring(int, int) when the logic of the program does not ensure it. This type of benchmark is very useful for bug detection tools that look for neglected condition checking, such as Chang et al.'s tool Chang et al. (2008) and Alattin Thummalapenta and Xie (2009). It was reported that neglected conditions are an important but difficult-to-find class of defects Chang et al. (2008). To build the benchmark, we processed all 1,966,563 revisions with changed Java files for all 3,413 Java projects in the SourceForge dataset. For a project P, we first identified the fixing revisions by the popular method Zimmermann and Weißgerber (2004) that uses the heuristic of searching commit logs for phrases indicating fixing activities. For each fixing revision ri, we used our prior origin analysis tool to compare it with the previous revision ri−1. We detected the mapped methods and API calls between the two revisions. For each pair of mapped API calls in a method, we computed two sets of guard conditions. We compared each set with the mined preconditions of the API to find the set of preconditions that are implied by a guard condition. If there exists such a precondition in ri but not in ri−1, we add the API call sites and (ri−1, ri) to our benchmark. In total, there are 369,532 fixing revisions. Among them, 3,130 (0.85%) in 931 projects are detected as related to missing preconditions. The total number of call sites related to those fixes is 4,399. To check its quality, we manually checked a sample of 100 call sites in the benchmark and found that 80 of them are related to preconditions. We will manually check all of them and make our benchmark available. We found that null-pointer and index-out-of-bounds exceptions are the two most common sub-types among those bugs. Our result confirms the prevalence of this type of bug and calls for detection tools. This shows the usefulness of our mined preconditions in building the benchmark. Our mined preconditions can also be used in such detection tools.
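The per-call-site check described above can be sketched as follows. This is a simplified illustration, not the tool's actual implementation: guards and mined preconditions are reduced to normalized strings and the implication test to set membership, whereas the real tool compares normalized condition expressions.

import java.util.Set;

// Flags a mapped API call site as "missing-precondition related": a mined precondition
// is guarded in the fixing revision r_i but not in the previous revision r_{i-1}.
public final class PreconditionFixCheck {
    public static boolean addsPreconditionCheck(Set<String> guardsBefore,
                                                Set<String> guardsAfter,
                                                Set<String> minedPreconditions) {
        for (String p : minedPreconditions) {
            boolean heldBefore = guardsBefore.contains(p);    // simplified implication test
            boolean heldAfter  = guardsAfter.contains(p);
            if (heldAfter && !heldBefore) return true;        // the fix added the check
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(addsPreconditionCheck(
            Set.of(),                                         // r_{i-1}: no guard
            Set.of("beginIndex <= endIndex"),                 // r_i: guard added
            Set.of("beginIndex <= endIndex")));               // prints true
    }
}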

4.4.5 Threats to Validity

The two chosen datasets might not be representative. The criterion of 100 revisions might not have filtered out all experimental and toy projects. We conducted experiments only on JDK. The ground-truth was built by us; thus, human errors could occur. The two chosen classes in the usefulness study might not be representative. Our human study suffers from selection bias, as not all participants have the same level of expertise in formal specifications. There is possible construct bias as we chose the APIs in JDK. We did not compare our tool to a related one in Ramanathan et al. (2007b). Similar to ours, their tool is also based on both mining and program analysis. However, their tool is for C code, and re-implementing it for Java code is infeasible due to their algorithm's complexity as well as the differences between the two languages. Moreover, their approach operates on a single project while we rely on a large number of projects. Thus, the two approaches require inputs of a different nature.

Other mining-based approaches do not work for preconditions, while other static and dynamic analysis methods for specification inference do not have a mining component (Section 4.5).

4.5 Related Work

The condition mining work that is closest to our approach is from Ramanathan, Grama, and Jagannathan (RGJ) Ramanathan et al. (2007b). Similar to ours, the RGJ approach tightly integrates program analysis with data mining techniques. They proposed a static inference mechanism to identify the preconditions that must hold when a method is called. They first analyze the call sites of the method in its containing program and then use a path-sensitive inter-procedural static analysis to collect the predicates at each program point. To compute preconditions, RGJ collects a predicate set along each distinct path to each call site. The intersection of predicate sets is then constructed at the join points where distinct paths merge. Predicates computed within a procedure are memorized and used to compute preconditions that capture inter-procedural control- and data-flow information. RGJ then runs frequent itemset mining on data-flow predicate sets, and sub-sequence mining for control-flow conditions, to derive preconditions. They reported a precision level of 77.13%. Our approach has several key differences. First, it operates on a very large-scale corpus of client programs of the libraries that contain the call sites of APIs. In contrast, RGJ is designed to perform its inter-procedural analysis on only an individual client program containing the APIs' call sites. Thus, RGJ can be used to improve our analysis technique when running on each project. Second, their mining algorithm works on the data-flow predicate sets in an individual program, while our mining technique operates on the comparable preconditions across an ultra-large number of projects. In contrast, they find conditions using sophisticated data- and control-flow analyses on a single program; their mining algorithm does not consider the predicates across projects. Our work is also related to static approaches for mining specifications. Those static approaches rely more on data mining, while using more light-weight static analyses than our approach and RGJ. Gruska et al. Gruska et al. (2010) introduce the idea of the wisdom of the crowds, similar to our approach, on 6,000 Linux projects (about 200 MLOC). However, their technique mines only temporal properties in terms of pairs of method calls. They used 16 million mined temporal properties to check for anomalies in a new project. Our prior work, GrouMiner Nguyen et al. (2009c), performs frequent subgraph mining to find API programming patterns. JADET Wasylkowski et al. (2007), Dynamine Livshits and Zimmermann (2005), Williams and Hollingsworth Williams and Hollingsworth (2005), and CodeWeb Michail (2000) mine pairs of calls as patterns. MAPO Zhong et al. (2009a); Acharya et al. (2007) expresses API patterns in terms of partial orders of API calls. Tikanga Wasylkowski and Zeller (2009) mines temporal specifications in terms of Computation Tree Logic formulas. Shoham et al. Shoham et al. (2007) use inter-procedural analysis to mine API specifications in terms of FSAs. Other static approaches mine API specifications and then leverage them to detect bugs Engler et al. (2001); Kremenek et al. (2006b); Li et al. (2006); Li and Zhou (2005); Liu et al. (2006). FindBugs Hovemeyer and Pugh (2004) looks for specified bug patterns. Other tools suggest code examples related to specific APIs and types Mandelin et al. (2005b); Sahavechaphan and Claypool (2006); Thummalapenta and Xie (2007); Xie and Pei (2006). None of the above static approaches recover APIs' preconditions.
There are several dynamic approaches for mining specifications Ammons et al. (2002); Cousot et al. (2011); Dallmeier et al. (2005); Ernst et al. (1999); Gabel and Su (2008); Liu et al. (2006); Lo and Maoz (2009); Pradel and Gross (2009); Wei et al. (2011); Yang et al. (2006). Daikon Ernst et al. (1999) automatically detects invariants in a program via running test cases. Wei et al. Wei et al. (2011) infer complex post-conditions from simple contracts written in the code. Weimer et al. Weimer and Necula (2005) mine method pairs from exception control paths and identify temporal safety rules. In brief, our approach can complement dynamic approaches well. There are other approaches that require annotations of partial specifications on desired invariants, and then verify program properties and detect violations Alur et al. (2005); Fischer et al. (2005); Henzinger et al. (2005); our approach is automatic. Our work is also related to research on deriving the behavior model of a program or software component for verification de Caso et al. (2012); Lo et al. (2009); Lorenzoli et al. (2008). These approaches aim to recover the formal model of a program with pre/post-conditions of the states' transitions. In contrast, our approach focuses on the more fine-grained level of individual APIs.

4.6 Conclusions

In this chapter, we propose a novel approach to mine the preconditions of API methods using a large code corpus. Our key idea is that the true API preconditions appear frequently in the usages from a large code corpus with a large number of API usages, while project-specific conditions occur less frequently. We mined the preconditions for JDK methods on almost 120 million SLOC of SourceForge and Apache projects. Compared to the human-written preconditions in JML, our approach achieves high accuracy, with recall from 75–80% and precision from 82–84% for the top-ranked results. In our user study, participants found 82% of the mined preconditions to be a good starting point for writing specifications.

CHAPTER 5. CHANGE REPETITIVENESS ANALYSIS

In this chapter, we present a large-scale study of repetitiveness of code changes in software evolution. We collected a large data set of 5,682 Java projects, with 1.9 billion source lines of code (SLOC) at the latest revisions, 3.4 million code change revisions (0.9 million fixes), 12.9 million changed files, and 4.8 billion changed SLOCs. A change is considered repeated within or cross-project if it matches another change having occurred in the history of the project or another project, respectively. We developed techniques, OAT and Treedit, for detecting code changes at both coarse-grained and fine-grained levels. From the results of our study, we report the following important findings. First, repetitiveness of changes could be as high as 60–100% at small sizes and decreases exponentially as size increases. Second, repetitiveness is higher and more stable in the cross-project setting than in the within-project one. Third, fixing changes repeat similarly to general changes.

5.1 Introduction

In software development, software reuse is a pragmatic approach that engineers often follow to save development effort. Software reuse could occur at different levels of abstraction. Multiple software projects could share common specifications, designs, or algorithms. Engineers may reuse the same libraries and frameworks, resulting in patterns or common programming idioms in source code. Common programming tasks expressed in programming languages may lead to similarity in source code. Such similar code may lead to similar changes and repeated defects and fixes within or across multiple projects. Exploring that phenomenon, several mining software repositories (MSR) approaches have made advances in applying it to automate several software evolution and maintenance tasks. An example of such an application is automatic program repair Goues et al. (2012); Kim et al. (2013) based on previously seen fixing patterns in the same or different projects. PAR Kim et al. (2013) is an automatic pattern-based program repair method that learns common patterns from prior human-written patches. FixWizard Nguyen et al. (2010b) recommends fixes based on code clones and code with similar API usages. GenProg Goues et al. (2012) is a patch generation method that is based on genetic programming.

Other types of applications are automated library updates, language/library migration, etc. SemDiff Dagenais and Robillard (2008) is a method to learn from previous updating changes to a framework in order to update its client code. LibSync Nguyen et al. (2010a) learns adaptation change patterns from client code to update a given program to use the new library version. MAM Zhong et al. (2010) is an approach to mine common code transformations to support language migration. While those approaches have gained much success in MSR, they focus on their respective application domains and are often studied in small-scale settings with small sets of subject projects. There is no existing large-scale, systematic study on how repetitive software changes are across the histories of software projects, what the repetitiveness characteristics of software changes are, or whether fixing changes exhibit different levels of repetitiveness than general ones. To address them, we conducted a large-scale study with the following key research question: how do code changes repeat in software evolution? The answer to this question not only provides empirical evidence but also could enhance the aforementioned MSR approaches. For example, a genetic programming-based automatic program repair tool could avoid unnecessary mutations by considering the information on the popular types and sizes of program elements that have been used in fixes for certain program contexts, thus reducing its search space for possible fixes. Language migration or library update methods could benefit in similar manners when the repetitive characteristics of changes are considered. In our study, we collected from SourceForge and GitHub a large data set of 5,682 Java projects, with 1.9 billion source lines of code (SLOC) at the latest revisions, 3.4 million code change revisions (0.9 million fixing ones), 12.9 million changed files, and 4.8 billion changed SLOCs. We extracted consecutive revisions and compared the abstract syntax trees (ASTs). A change is modeled as a pair of subtrees (s, t) in the ASTs. A change (s, t) is considered as matching another change (s', t') if s and s', and t and t', structurally match when abstracting the literal and local variable nodes. The size of a change (s, t) is measured as the height of the subtree s in the source AST. The change type is defined as the AST node type of s. We perform the analysis in two settings: within-project and cross-project. In the within-project setting, a change in a project is considered as repeated if it matches another change previously occurring in the project's history. In the cross-project setting, it is considered as repeated if it matches another change occurring in another project. Change repetitiveness is computed as the number of repeated changes over the total number of changes. We studied repetitiveness in three dimensions: size, type, and general/fixing changes. Our key findings include the following:

1. Repetitiveness is very high for changes of small sizes, e.g., up to 60–100% for the changes of sizes 1 and 2. Changes of size 1 are on literals, identifiers, etc. However, it decreases exponentially as size increases. Repetitiveness of changes with sizes larger than 6 is small. Thus, the aforementioned automatic tools should consider change fragments of sizes from 2–6.

2. Repetitiveness also varies by syntactic type of changes. Changes involving simple structures (e.g., array accesses, method calls) are highly repetitive, while those with compound structures (e.g., control/loop statements) are less so. In addition, the most popular types of fixing changes include method calls, infix expressions, conditions (e.g., if) and loop statements (e.g., for, enhanced for). Thus, automatic program repair tools could focus on those types with small sizes in the search space and then combine them.

3. Cross-project repetitiveness is generally higher and more stable than within-project repetitiveness. While cross-project repetitiveness of fixing changes is as high as that of general changes, and even higher at small change sizes, within-project repetitiveness of fixing changes is low. This implies that program repair tools should not rely solely on the changes in a single project, but rather make use of repeated bug fixes across projects. Importantly, despite extensive project-specific naming, once concrete names are abstracted, there is high structural similarity among changes across projects.

Section 5.2 introduces our research question and methodology. Section 5.3 describes the representations for code and code changes. Section 5.4 and Section 5.5 present techniques for extracting code changes, building the change database, and computing repetitiveness. Section 5.6 presents the results and our analysis. Section 5.7 is for the related work. Section 5.8 concludes the chapter.

5.2 Research Question and Methodology

This section states the research question of this study, explains the process to collect the data, and gives an overview of how we extract code changes and build the change database.

5.2.1 Research Question

We are interested in studying the popularity of repeated code changes and fixes. Therefore, our research question is: How repetitive are code changes and bug fixes in software evolution? We are interested in repeated code changes in different dimensions. First, we want to know how large they are (i.e., size of change) and what kinds of program constructs they often occur on (i.e., type of change). Such information will help designers of development tools that use repeated changes to focus on the sizes and types of changes that are most likely to repeat. In addition, whether changes repeat within or across projects is also important. If they repeat frequently within a project, the historical changes/fixes of the project will be a useful source to predict and recommend future changes/fixes of that project. If they repeat frequently in the cross-project setting, we can learn changes/fixes from other projects to use for a project, especially when it is newly developed. Lastly, we want to study whether the repetitiveness of fixes, an important type of changes, is different from that of general changes.

Table 5.1: Collected Projects and Code Changes

                                    SourceForge   GitHub        Both
Projects                            2,841         2,841         5,682
Total source files                  16 million    1 million     17 million
Total SLOCs                         1.7 billion   0.2 billion   1.9 billion
Total revisions                     3.6 million   2.8 million   6.4 million
Revisions having code changes       1.8 million   1.6 million   3.4 million
Revisions having fixing changes     0.4 million   0.5 million   0.9 million
Total changed files                 6.2 million   6.7 million   12.9 million
Total SLOCs of changed files        2.5 billion   2.3 billion   4.8 billion
Total changed methods               8.6 million   7.7 million   16.3 million
Total AST nodes of changed methods  1.3 billion   1.2 billion   2.5 billion
Total changed AST nodes             89 million    84 million    173 million
Total detected changes              213 million   206 million   419 million

5.2.2 Data Collection

To answer the question, we conducted an empirical study on a large dataset of code changes. We collected data from SourceForge SourceForge and GitHub GitHub, two hosting services for open-source projects. We downloaded the Java projects using SVN on SourceForge and those on GitHub. Those repositories were stored and processed locally on our machine. We filtered out the projects with very short histories (having fewer than 100 revisions). Table 5.1 summarizes our final dataset. SourceForge has 2,841 projects meeting our criteria. Those projects have in total 16 million Java source files and 1.7 billion source lines of code (SLOC) in their last snapshots. For GitHub data, we filtered out not only the projects with short histories but also the ones that are forked from other projects. More than 20 thousand projects remained. To make it balanced with the data from SourceForge, we sampled from them 2,841 projects, the same number of projects as from SourceForge. These projects contain 1 million Java source files and 0.2 billion SLOC in their last snapshots. The amount of code in the last snapshots of SourceForge projects is much larger than that of GitHub projects because SourceForge projects contain more releases in their repositories than GitHub projects do. In total, we have 5,682 projects with 17 million Java source files and 1.9 billion SLOC. Those projects cover a variety of domains and topics, and have been written by thousands of developers. We downloaded their repositories to our local machine for faster processing. In terms of changes, the numbers are quite comparable between the two data sources. In total, the projects in our dataset have 6.4 million revisions. Among them, 3.4 million revisions have code changes and 0.9 million have fixing changes. To detect fixing changes, we used the popular keyword-based approach Zimmermann et al. (2007), in which, if the commit log message of a revision has keywords indicating fixing activities, the code changes in that revision are considered as fixing changes. We processed all 6.4 million revisions and parsed in total 12.9 million changed source files with a total size of 4.8 billion SLOC. Our change detection algorithm (Section 5.4) detected 16.3 million changed methods with a total size of 2.5 billion AST nodes. From those methods, it detected 419 million fine-grained code changes made from 173 million changed AST nodes.
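The keyword heuristic for flagging fixing revisions can be sketched as follows. This is a minimal illustration; the exact keyword list of the study is not reproduced here, so the regular expression below is only an assumed example.

import java.util.regex.Pattern;

// Sketch of keyword-based detection of fixing revisions from commit log messages
// (in the style of Zimmermann et al. 2007). The keyword list is illustrative only.
public final class FixingRevisionDetector {
    private static final Pattern FIX_KEYWORDS =
        Pattern.compile("\\b(fix(e[sd])?|bug|defect|patch)\\b", Pattern.CASE_INSENSITIVE);

    public static boolean isFixingRevision(String commitMessage) {
        return commitMessage != null && FIX_KEYWORDS.matcher(commitMessage).find();
    }

    public static void main(String[] args) {
        System.out.println(isFixingRevision("Fix NPE in parser"));   // true
        System.out.println(isFixingRevision("Add new feature"));     // false
    }
}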

5.2.3 Experimental Methodology

From the collected dataset, we extracted code changes/fixes to build our change database, search for repeated ones, and compute their repetitiveness. This process on each revision consists of three steps:

1. Detecting code changes for each revision: focusing on fine-grained changes, we collect only changes within the bodies of individual changed methods.

2. Updating detected changes to our database: the database is globally accessed for all projects to improve the performance in the study of cross-project repeated changes.

3. Computing the repetitiveness for all changes in both within- and cross-project settings for different dimensions: size, type, and fixing/non-fixing.

In the next sections, we will explain in detail how we represent code and code changes, and each step in the process.

5.3 Code Change Representation

5.3.1 Illustration Example

Let us start with an illustration example of code change and repetitiveness. Figure 5.1 shows two changes on two if statements. They are considered as fine-grained changes because they occur within individual methods. Both of them include a replacement of a literal (1 or 10) by a variable (b or y) and an addition of an else branch. The variables and literals in the pairs a and x, b and y, 1 and 10 have the same roles.

            Source fragment     Target fragment
Change 1    if (a > b)          if (a >= b)
              a = a - b;           a = a - 1;
                                else break;
Change 2    if (x > y)          if (x >= y)
              x = x - y;           x = x - 10;
                                else break;

Figure 5.1: An Example of Code Change


Figure 5.2: Tree-based Representation of Code Changes

That is, if we replace a, b, and 1 with x, y, and 10, respectively, we can derive the second change from the first. Therefore, we consider the second change a repeated change of the first one (and vice versa). We aim to study the characteristics of such repeated changes, e.g., how often they occur, how large they are, what the popular types are, etc. Next, we will formally define important concepts in our study.

5.3.2 Representation

When writing and modifying code, developers think of code in terms of program constructs such as functions, statements, or expressions rather than lines of code or sequences of lexical tokens. For example, in Figure 5.1, one would think of the code (before the change) as an if statement, and modify it by replacing an operand in an infix expression with another and adding an else branch. To reflect this, in this study, we model source code and code changes in terms of program constructs rather than lower levels of representation such as code tokens or lines of code. In a programming language, a program construct is often defined as a syntactic unit and represented as a subtree in an Abstract Syntax Tree (AST). For example, an if statement is represented as a subtree in an AST, in which the root node specifies its type (i.e., if statement) and the children nodes represent its sub-constructs, i.e., an expression for the predicate and two code blocks for the two branches.

Definition 5.1 (Code Fragment) A code fragment in a source file is a (syntactically correct) program construct and is represented as a subtree in the abstract syntax tree (AST) of the file.

We consider a code change as a replacement of a code fragment by a different code fragment. Since a code fragment is modeled via an AST, we formulate code change as follows:

Definition 5.2 (Code Change) A code change is represented as a pair of ASTs (s, t) where s and t are not label-isomorphic.

In this definition, s and t are called the source and target trees, respectively. Either of them (but not both) could be a null tree: s or t is a null tree when the change is an addition or a deletion of code, respectively. Since ASTs are labeled trees, the condition of not being label-isomorphic is needed to specify that the code fragments before and after the change are different. To check two code changes for repetitiveness, we could match their source and target trees. However, as seen in the illustration example, repeated changes might have different variable names or literal values. Therefore, we need to perform normalization to remove those differences before matching. An AST t is normalized by re-labeling the nodes for local variables and literals. For a node of a local variable, its new label is the node type (i.e., ID) concatenated with the name of that variable after alpha-renaming. For a literal node, its new label is the node type (i.e., LIT) concatenated with its data type. Figure 5.2 shows the AST subtrees for the code changes in the illustration example after normalization. The nodes for variables a and x are re-labeled as ID v1 while the ones for b and y have the label ID v2, since v1 and v2 are their respective names after alpha-renaming. The nodes for literals 1 and 10 have the same label LIT NUM. Thus, after normalization, the two changes have the same tree-based representation. We define repeated code changes as follows:
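The normalization step can be sketched as follows. This is a minimal sketch: the class is ours, and only the two re-labeling rules described above are shown (alpha-renaming of local variables in order of first occurrence, and re-labeling literals by data type).

import java.util.HashMap;
import java.util.Map;

// Produces the normalized labels used when matching changes: local variables become
// ID v1, ID v2, ... in order of first occurrence; literals become LIT <data type>.
final class Normalizer {
    private final Map<String, String> renaming = new HashMap<>();

    String normalizeLocalVariable(String name) {
        String label = renaming.get(name);
        if (label == null) {
            label = "ID v" + (renaming.size() + 1);   // alpha-renaming by first occurrence
            renaming.put(name, label);
        }
        return label;
    }

    String normalizeLiteral(String dataType) {        // e.g., "NUM", "STR"
        return "LIT " + dataType;
    }
}

With one Normalizer per change, a and b in the first change and x and y in the second both map to ID v1 and ID v2, and the literals 1 and 10 both map to LIT NUM, so the two changes become label-isomorphic.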

Definition 5.3 (Repeated Code Change) A code change (s, t) is a repeated change of another one (s', t') when s' and s, and t' and t, are label-isomorphic after normalization.

We want to study the repetitiveness of changes in a project in both scenarios: within its history, and across different projects. Therefore, we define:

Definition 5.4 A change in a project P is a repeated change within a project if it is a repeated change of another one occurring in an earlier revision of P . It is a cross-project repeated change if it is a repeated change of another in other project(s).

Since we want to study the repetitiveness of code changes by type and size, we need to define them. We use the AST type of the source tree as the type of the change. The size of a tree could be defined as the number of its nodes. However, for source code, the number of nodes of ASTs varies widely. For example, a method call might have only one child (e.g., no parameter) or many children (e.g., many parameters). In our experiment, some trees might have thousands of nodes. In contrast, tree height (i.e., the number of nodes along the longest path from the root to a leaf node) varies less (often from 1 to 10). Thus, we choose tree height as the measurement of change size.

Definition 5.5 (Change Type and Size) The type and size of a code change (s, t) are the AST type and the height of s (or of t if s is a null tree), respectively.

In the illustration example, two code changes have the type of if and size of 4.
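As a small, self-contained illustration of the size measure, tree height can be computed recursively over a toy node type (TreeNode is our stand-in for the actual AST node class):

import java.util.List;

// Height counted in nodes along the longest root-to-leaf path, so a leaf has height 1.
final class TreeNode {
    List<TreeNode> children = List.of();

    int height() {
        int max = 0;
        for (TreeNode c : children) max = Math.max(max, c.height());
        return 1 + max;
    }
}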

5.4 Code Change Detection

To detect code changes, we developed two program differencing techniques: one for coarse-grained code elements, i.e., packages, files/classes, and methods, and one for fine-grained code elements within method bodies, i.e., statements and expressions.

5.4.1 Coarse-grained Differencing

This section discusses the origin analysis technique (OAT) to map corresponding coarse-grained code elements (packages, classes, and methods) between two program versions. OAT views a program P as a project tree T (P ), where each node represents a package, class, interface, or method. Each node has the following set of attributes:

• Declaration (declare(u)): For a package node, it is a fully qualified name. For a class node, it is a simple name followed by the names of the classes and interfaces that the node extends or implements. For a method node, it is a simple name, a list of parameter types, all modifiers, a set of exceptions, a return type, and associated annotations such as deprecated.

• Parent (parent(u)): It refers to a node's containing code element.

• Content (content(u)): It represents a set of descendant nodes. For a method node, it represents the body of the method. When the source code is available, it represents the abstract syntax tree of the method body. Otherwise, it represents a sequence of byte code instructions extracted from a jar file.

Section 5.4.1.1 describes the types of transformations that OAT supports, Section 5.4.1.2 describes similarity measures that were used to derive one-to-one mapping between tree nodes, and Section 5.4.1.3 describes our tree alignment algorithm that maps nodes between two trees using a set of similarity measures and derives transformations from the alignment result.

5.4.1.1 Transformation Types

Suppose that two versions Pi and Pj of a program P are represented as two trees T(Pi) and T(Pj). The changes between the two versions are represented as the following types of transformations from one tree to another tree: add(u), delete(u), move(u) (changes to the location of node u), and update(u) (changes to u's attributes). An updated node could be moved as well. For a class, an update can be performed on its name, its superclass, or its interfaces. For a method, the update can be renaming, a change to its input signature, a change to its return type, a change to its visibility modifiers, etc. The above types of transformations are derived from an alignment result by considering unmapped nodes as added or deleted, and mapped nodes as moved or updated.

5.4.1.2 Similarity Measures

The similarity score between two nodes is computed by summing up their declaration attribute similarity, sd, and their content attribute similarity sc. OAT defines sd and sc for each type of nodes differently.

Method Level Similarity. sd is computed as a weighted sum of textual similarities of return types, method names, and lists of parameter types:

sd(u, u') = 0.25 × strSim(return, return') + 0.5 × seqSim(name, name') + 0.25 × seqSim(paras, paras')

in which seqSim computes the similarity between two word sequences using the longest common subsequence algorithm Hunt and Mcilroy (1976), and strSim computes a token-level similarity between two strings by borrowing the idea from our prior work (Kim et al.'s text-similarity measures used for API matching between 2 versions Kim et al. (2007)). For example, given the two methods getNewServiceObject(Context,String) and makeNewServiceObject(SOAP,Context,String), sd is 0.875:

sd(u, u') = 0.25 × strSim(Object, Object) + 0.5 × seqSim(getNewServiceObject, makeNewServiceObject) + 0.25 × seqSim([Context, String], [Context, String])
          = 0.25 × (1/1) + 0.5 × (3/4) + 0.25 × (2/2) = 0.875
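A minimal sketch of seqSim is shown below: names are split into words on camel case and compared with a longest-common-subsequence ratio. The splitting regex and the normalization by average length are our assumptions for illustration; strSim and the exact token-level measure of Kim et al. are not reproduced here.

import java.util.Arrays;
import java.util.List;

// Word-sequence similarity used inside s_d: LCS length over the average sequence length.
public final class SeqSim {
    static List<String> words(String name) {
        // "getNewServiceObject" -> [get, New, Service, Object]
        return Arrays.asList(name.split("(?<=[a-z])(?=[A-Z])"));
    }

    static double seqSim(List<String> a, List<String> b) {
        int[][] lcs = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++)
            for (int j = 1; j <= b.size(); j++)
                lcs[i][j] = a.get(i - 1).equalsIgnoreCase(b.get(j - 1))
                        ? lcs[i - 1][j - 1] + 1
                        : Math.max(lcs[i - 1][j], lcs[i][j - 1]);
        return 2.0 * lcs[a.size()][b.size()] / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // Prints 0.75, matching the 3/4 term in the worked example above.
        System.out.println(seqSim(words("getNewServiceObject"), words("makeNewServiceObject")));
    }
}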

If the content is represented as an AST, sc is computed by extracting a characteristic vector v(u) from a method u using our prior work Exas Nguyen et al. (2009a).

function MaxMatch(C, C', sim)   // find maximum weighted matching
  L = ∅, M = ∅
  for (u, u') ∈ C × C'
    if (sim(u, u') ≥ δ)
      L = L ∪ {(u, u')}
  sortDescendingly(L)
  while (L ≠ ∅)
    (u, u') = L.top()
    M = M ∪ {(u, u')}
    for (v, v') : (v, u') ∈ L ∨ (u, v') ∈ L
      L.remove((v, v'))
  return M

Figure 5.3: Greedy Matching Algorithm

If the content consists of bytecode instructions, the characteristic vector v(u) is an occurrence-counting vector of all the opcodes. Then, the similarity between two methods u and u' is computed using the following formula:

sc(u, u') = 2 × ||Common(v(u), v(u'))||1 / (||v(u)||1 + ||v(u')||1)

in which v(u) is the vector representation of the method content, and Common(V, V') is defined as Common(V, V')[i] = min(V[i], V'[i]). This formula measures similarity as the ratio of the common part over the average size of the two vectors.
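A minimal sketch of this content similarity over occurrence-counting vectors (the vectors here are plain int arrays standing in for Exas characteristic vectors):

// Content similarity s_c: common part (element-wise minimum) over the average L1 size.
public final class ContentSimilarity {
    public static double sc(int[] v, int[] w) {          // assumes equal-length vectors
        long common = 0, sizeV = 0, sizeW = 0;
        for (int i = 0; i < v.length; i++) {
            common += Math.min(v[i], w[i]);
            sizeV  += v[i];
            sizeW  += w[i];
        }
        return sizeV + sizeW == 0 ? 1.0 : 2.0 * common / (sizeV + sizeW);
    }

    public static void main(String[] args) {
        System.out.println(sc(new int[]{2, 1, 0}, new int[]{1, 1, 1}));   // ~0.67
    }
}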

Class and Package Level Similarity. The declaration similarity sd is defined similarly to that of methods. The content similarity sc is computed via the number of their mapped children:

sc(C, C') = 2 × |MaxMatch(content(C), content(C'), sim)| / (|content(C)| + |content(C')|)

The MaxMatch function (Figure 5.3) takes two sets of entities C and C' and returns a set of pairs such that sim(u, u') is greater than a chosen threshold and there exists no u'' such that sim(u, u'') > sim(u, u').

5.4.1.3 Mapping Algorithm

OAT takes two project trees as input, aligns them, and computes transformations from one tree to the other. It maps nodes in a top-down order, mapping parent nodes before their children. When method m is class C's child and C' is mapped to C, we first try to map m to the children of C'. If a match is not found, we assume that m is moved to another class. We adopted this strategy from UMLDiff Xing and Stroulia (2005) to reduce the number of candidate matches that need to be examined. Figure 5.4 summarizes our origin analysis algorithm. At any time, each node is placed in one of three sets: (1) AM if it is already mapped to another node, (2) PM if its parent node is mapped but it is not mapped to any node, and (3) UM if the node and its parent are not mapped. OAT maps nodes in UM first, and the children of mapped ones are put into PM for further consideration.

function Map(T, T')   // find mapped nodes and change operations
  UM.addAll(T, T')
  for packages p ∈ T, p' ∈ T'                        // map on exact location
    if location of p and p' is identical then SetMap(p, p')
  for packages p ∈ T ∩ UM, p' ∈ T' ∩ UM              // unmapped packages
    if sim(p, p') ≥ δ then SetMap(p, p')              // map on similarity
  for each mapped pair of packages (p, p') ∈ M
    MapSets(Children(p), Children(p'))                // map parent-mapped classes
  for classes C ∈ T ∩ UM, C' ∈ T' ∩ UM               // unmapped classes
    if (C and C' are in a text-based/LSH-based filtered subset
        and sim(C, C') ≥ δ) then SetMap(C, C')        // map on similarity
  for each mapped pair of classes (C, C') ∈ M
    MapSets(Children(C), Children(C'))                // map parent-mapped methods
  for methods m ∈ T ∩ UM, m' ∈ T' ∩ UM               // unmapped methods
    if (m and m' are in a text-based or LSH-based filtered subset
        and sim(m, m') ≥ δ and dsim(m, m') ≥ µ) then
      SetMap(m, m')                                   // map on similarity
  Op = ChangeOperation(M)
  return M, Op

function SetMap(u, u')   // map two nodes
  M.add((u, u'))
  UM.remove(u, u')
  PM.add(content(u), content(u'))

function MapSets(S, S')   // map two sets of nodes
  M2 = MaxMatch(S, S', sim)   // use greedy matching
  for each (u, u') ∈ M2
    SetMap(u, u')

Figure 5.4: Tree-based Origin Analysis Algorithm

For example, when a package is mapped, its sub-packages are put in PM. The mapped ones are moved to AM, and the remaining ones that were not mapped to their parent's children are put back into UM for later processing. When there are a large number of unmapped nodes, a pair-wise comparison of all nodes in UM would be inefficient. To overcome this problem, OAT uses the following hash-based optimizations. OAT first hashes the nodes in UM by their name and only compares the nodes with the same name to find the mapped nodes in UM. For the remaining nodes in UM, it then hashes those nodes in each set by their structural characteristic vectors using the LSH hashing scheme Andoni and Indyk. This LSH-based filtering helps OAT divide the remaining nodes in UM into subsets with the same hashcode, and apply the MaxMatch function (Figure 5.3) on the nodes in each subset. The characteristic vector of a class is summed up from the normalized vectors of its methods. We normalize the methods' vectors to length 1 before summing them up to build the vector of the containing class, to avoid the problem of unbalanced sizes between those methods. However, in other cases for comparing methods, their corresponding vectors are not normalized.

When mapping method nodes in UM, in addition to sub-dividing the nodes using their hash values, we also use sd to quickly identify methods with a similar declaration.

5.4.2 Fine-grained Differencing

The fine-grained changes within methods, e.g., changes to statements and expressions, are detected by a novel tree editing algorithm called Treedit. It determines one-to-one mappings of the nodes in two ASTs representing two versions of a file or method (i.e., tree alignment). Based on the alignment result, un-aligned nodes are considered as added/deleted, while aligned nodes with different attributes or locations are considered as updated or moved. Treedit is a heuristic algorithm for the tree alignment problem that avoids the complexity of optimal tree editing algorithms, which could be too computationally expensive on large-scale systems. The heuristics are based on the following observations:

1. If two nodes are aligned, their ancestors and descendants are likely to be aligned. Thus, already-aligned nodes can be used to find candidates for alignment in their ancestors and descendants.

2. The aligned leaf nodes often have similar textual attributes. Especially, the leaf nodes belonging to unchanged text regions are unchanged. This suggests the use of textual similarity as an alignment criterion to map leaf nodes in two versions.

3. Two versions of a compound entity generally have similar structures, which could be measured by a similarity function sim. If the sub-tree rooted at u is highly similar to the sub-tree rooted at u', then u and u' are likely to be aligned. In other words, this suggests using structural similarity as an alignment criterion for inner nodes.

Based on those observations, in Treedit, the alignment process has three phases: 1) initial mapping, 2) bottom-up mapping, and 3) top-down mapping. First, Treedit initially maps as many leaf nodes as possible. Then using such initial mapping, Treedit maps the nodes at higher levels in a bottom-up manner, and uses the above observations to find the candidates and to choose the mapped nodes based on their structural similarity. After going up to the root node, the algorithm goes top-down and attempts to align the unmapped descendants of the mapped nodes, based on the similarity of their structures (inner nodes) or textual attributes (leaf nodes).

5.4.2.1 Initial Mapping

The initial mapping step has two key phases. The first one aims to map the leaf nodes of unchanged text segments. Two ASTs are un-parsed into their text-based versions (Treedit uses the unparsed text instead of the original text to discard the differences in formatting and other cosmetic changes).

[Figure content: a) text-based alignment of unchanged and changed text regions between versions t and t'; b) mapping result: 1 ↔ 1', 2 ↔ 2', 3 ↔ 3', 4 ↔ 4' (moved), 5 deleted (no map)]

Figure 5.5: Initial Mapping and Top-Down Mapping

Then, Treedit performs text-based differencing using the longest common subsequence algorithm on the two sequences of text lines in order to detect and align the unchanged text lines. The alignment of unchanged text lines partitions the text lines into (un)changed segments as in Figure 5.5a). If the textual content of a leaf node (e.g., a string literal) belongs to more than one segment, those segments will be joined into one segment. The joined segment is considered as changed if it is joined from a changed segment. With this joining step, each leaf node belongs to only one segment. Then, Treedit traverses the two trees, exports their sequences of leaf nodes, and marks the leaf nodes belonging to unchanged segments as "unchanged". Such unchanged leaf nodes in the two sequences are mapped one-by-one in order. For example, in Figure 5.5, the two sequences of leaf nodes are [a, b, a, a, 1] and [a, b, a, a, b, ok, true]. Because the first two nodes of those sequences belong to an unchanged text line (if (a > b)), they are mapped one to one: a → a, b → b. In the second phase of initial mapping, Treedit aligns the leaf nodes belonging to the changed text segments. Such segments contain all changed leaf nodes and might contain unchanged leaf nodes. For example, the two sequences of nodes [a, a, 1] and [a, a, b, ok, true] correspond to changed text segments. However, the first node a is unchanged in the statement a = a + 1. To find the aligned nodes, for each pair of the aligned segments of changed lines in the two versions, Treedit finds the longest common subsequence (LCS) between the corresponding sequences of the leaf nodes. Two nodes are considered as matched if they have the same type and textual content. The matched nodes of the resulting subsequences are aligned to each other as unchanged leaf nodes. In the above sequences, Treedit finds the common subsequence [a, a] and maps its nodes to the corresponding nodes in the other sequence.

5.4.2.2 Bottom-Up Mapping

After matching all possible leaf nodes, Treedit aligns the inner nodes in a bottom-up fashion. If an inner node u ∈ T has a descendant u1 mapped to v1, u will be compared to every ancestor of v1. Then, u will be aligned to a candidate node v if the subtrees rooted at u and v are sufficiently similar in structure and type (measured via the Exas similarity measurement). If no such v exists, u is considered as unmapped. For example, in Figure 5.2, both E1 and the root if node contain nodes mapped to the nodes in E1'. However, because E1 is identical to E1', they are mapped to each other. Similarly, A is also mapped to A', even though they are not identical in structure.

5.4.2.3 Top-Down Mapping

The bottom-up process might be unable to align some nodes, such as the moved nodes and renamed identifier nodes. Thus, after bottom-up mapping, Treedit examines the two trees in a top-down fashion and maps those not-yet-aligned nodes based on the already-aligned nodes. Given a pair of mapped nodes u and v from the bottom-up pass, it proceeds with the mapping between their descendant nodes by the following mechanism. First, it performs a greedy algorithm to find additional mapped nodes between the children nodes of u and v. The already mapped nodes are kept. If an un-aligned child node is an inner one, its descendants are compared based on Exas structural similarity as in the previous step. If it is an un-aligned leaf node, Treedit computes the similarity based on its attributes. For the AST's leaf nodes, since their textual contents are generally identifiers and literals, such attributes are first split into sequences of words using well-known naming conventions such as Hungarian or Camel case. For example, "getFileName" is separated into "get", "file", and "name". The similarity of two words is computed via the Levenshtein distance, a string-based similarity measure. Then, after the children nodes are mapped, a longest common subsequence algorithm is run on those two sequences of mapped nodes. The aligned nodes not belonging to the resulting longest common subsequence are considered as moved. Figure 5.5b) illustrates the top-down mapping process with node 4 being moved.
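The word-level comparison for leaf nodes can be sketched as follows. This is a minimal sketch: the normalization of the Levenshtein distance into a similarity in [0, 1] is our assumption, not necessarily the exact formula used by Treedit.

// Levenshtein distance between two words, and a normalized similarity derived from it.
public final class WordSimilarity {
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    static double similarity(String a, String b) {     // 1 when identical, toward 0 otherwise
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    public static void main(String[] args) {
        System.out.println(similarity("name", "names"));   // 0.8
    }
}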

5.4.3 Collecting Code Changes

For each pair of trees T and T' of a changed method, we aim to collect all changes of different heights (sizes). Our tool traverses them in pre-order from the roots to get the changes. If a node n in T is mapped to a node n' in T' and they differ in either labels or children nodes, a code change represented by the pair of trees (T(n), T'(n')) is extracted, where T(n) and T'(n') are the trees rooted at n and n', respectively.


Figure 5.6: Extracted Code Changes for Different Heights for the Example in Figure 5.2

If a node n in T does not have any mapped node in T', a change (T(n), null) is extracted. Similarly, if a node n' in T' is un-mapped, a change (null, T'(n')) is extracted. Note that, if a tree is deleted or added, all of its sub-trees will also be collected into the change database because the changes of the sub-trees contribute to the change of that root tree. Figure 5.6 shows all collected changes with heights (sizes) from 1–3 for the illustration example in Figure 5.1. The change of height 4 is shown in Figure 5.2. Note that a change of small size is included in a larger one. We analyze change repetitiveness at different sizes of code fragments.
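The traversal described above can be sketched as follows. This is a schematic, self-contained illustration: Node is a toy stand-in for the tool's AST node type, and only the walk over the source tree is shown (additions, with a null source, are handled symmetrically by walking the target tree).

import java.util.ArrayList;
import java.util.List;

// Pre-order collection of change pairs from a mapped source tree.
final class Node {
    Node mapped;                       // mapped node in the other version, or null
    boolean changed;                   // label or children differ from the mapped node
    List<Node> children = new ArrayList<>();
}

final class ChangeCollector {
    // Each collected pair is {source node, target node (or null for a deletion)}.
    static void collect(Node n, List<Node[]> out) {
        if (n.mapped == null)      out.add(new Node[]{n, null});
        else if (n.changed)        out.add(new Node[]{n, n.mapped});
        for (Node child : n.children) collect(child, out);
    }
}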

5.5 Change Database Building and Change Repetitiveness Computing

5.5.1 Design Strategies

We design our data structure and algorithm with the key idea that a change and its repeated one have the same type and size, and the same pair of source and target ASTs after normalization. If we create a hashcode for each change by concatenating the hashcodes of its normalized source and target trees, repeated changes will have the same hashcode. Thus, if the changes are grouped based on hashcodes computed via that scheme, repeated changes will be hashed to the same group, which have the same size and type. We used those groups to compute the number of repeated changes by size and by type. Based on that idea, we extracted the changes in our dataset into a change database. This database is a dictionary of change groups indexed by hashcodes computed as explained above. Each change group contains a hash table to map a project’s id to the number of changes having the same hashcode in that project. This hash table is used to compute the repetitiveness levels in within and cross-project settings.

That is, if a project p has a count n_p, then p will have n_p − 1 changes repeated within p. If the hash table has another project, then all n_p changes of p are counted toward cross-project repetitiveness.
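The grouping scheme can be sketched as follows. This is a minimal sketch of the change-group dictionary: tree fingerprints are simplified to long values, and the group key is the concatenation of the source and target hashcodes as described above.

import java.util.HashMap;
import java.util.Map;

// Change database: group key -> (project id -> occurrence count of that change).
final class ChangeDatabase {
    final Map<String, Map<String, Integer>> groups = new HashMap<>();

    void add(String projectId, long normalizedSourceHash, long normalizedTargetHash) {
        String key = normalizedSourceHash + "_" + normalizedTargetHash;   // concatenated hashcodes
        groups.computeIfAbsent(key, k -> new HashMap<>())
              .merge(projectId, 1, Integer::sum);
    }
}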

1   function BuildDatabase(ProjectList L, ChangeDatabase D)
2     for each project p in L
3       for each revision r in RevisionList(p)
4         for each change c ∈ ChangeList(r)
5           h = HashCode(c)
6           if D not contain h
7             D[h] = new ChangeGroup(c)
8           D[h].Count[p]++
9   end

11  function Compute(ChangeDatabase D)
12    for each group c in D
13      h = HashCode(c), s = Size(c)
14      for each project p in D[h].Count
15        N[p, s] += D[h].Count[p]
16        Nw[p, s] += D[h].Count[p] − 1
17        if (D[h].Count.size > 1)
18          Nc[p, s] += D[h].Count[p]
19    for each project p and size s
20      Rw[p, s] = Nw[p, s] / N[p, s]
21      Rc[p, s] = Nc[p, s] / N[p, s]
22  end

Figure 5.7: Algorithm for Extracting and Computing Repetitiveness

5.5.2 Detailed Algorithm

Figure 5.7 lists the algorithm for building the change database (function BuildDatabase, lines 1–9) and computing the repetitiveness (function Compute, lines 11–22). To build the change database, the algorithm processes each change c in each project p. First, it computes the hashcode for c (line 5). If the database does not have a change group with that hashcode, a new change group is created for it (lines 6–7). Then, the count value for p is updated (line 8). Function Compute (line 11) computes repetitiveness by size. N[p,s] is the total number of changes of size s in project p. Nw[p,s] and Nc[p,s] are the numbers of changes repeated within and across projects, respectively. They are updated using the above strategy (lines 15–18). Then, the repetitiveness values Rw[p,s] and Rc[p,s] are computed as the ratios of Nw[p,s] and Nc[p,s] over N[p,s], respectively. The computation by type is done in a similar manner.

5.6 Analysis Results

5.6.1 Boxplot Representation of General Change and Fix Repetitiveness

Figure 5.8 shows repetitiveness results of general and fixing changes in both within- and cross-project settings. For each change size s from 1–10, we computed the repetitiveness R(s) for all corresponding changes of every project. Thus, for each size s, we have a distribution of 5,682 projects as data points.

[Boxplots over change sizes 1–10; panels: (a) Within-project changes, (b) Cross-project changes, (c) Within-project fixes, (d) Cross-project fixes]

Figure 5.8: The Repetitiveness of Code Changes and Fixes over Change Size for all 5,682 Projects in the Corpus

This distribution is plotted as a box plot, with five quantiles: 5% (the lower whisker), 25% (the lower edge of the box), 50% (the middle line), 75% (the upper edge of the box), and 95% (the upper whisker). There are 10 box plots for 10 sizes. Let us explain the topmost boxplot in Figure 5.8 for the within-project repetitiveness of general changes of size 1.

1. The 50% quantile, i.e., median, is at 73%. Since the median could be seen as the center of the distribution, one could say that on average, the projects in our dataset have 73% of their size 1 changes repeated within individual project.

2. The 25% quantile is at 68%, implying that more than 75% of the projects have at least 68% of those changes repeated within a project.

3. The 75% quantile is at 77%, meaning that at least 25% of projects have those changes repeated more than 77%.

4. The 95% quantile is 86%, meaning that at least 5% of projects have 86% of size 1 changes repeated within a project.

5. The inter-quartile range (the difference between the 25% and 75% lines) is 9%, indicating the spread of the distribution.

5.6.2 Exponential Relationship of Repetitiveness and Size

Comparing the box plots for different change sizes in both within- and cross-project settings, we see that repetitiveness is very high for small changes, but it significantly decreases when the change size increases, as expected. For example, in the cross-project setting, size 2 changes have a median repetitiveness of more than 60%, but that for size 6 changes drops below 5%. The repetitiveness of larger changes (sizes of 7–10) is small (less than 2% on average). We modeled the relationship between R(s) and s with several classes of simple curves, and found that the exponential curve R(s) = α·e^(βs) represents their relationship the best. We used the least squares method to compute the two parameters α and β for every project. The goodness of fit is measured by the coefficient of determination R². The closer R² is to 1, the better the fit is.
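One common way to obtain such a fit is ordinary least squares on the log-transformed data, since ln R(s) = ln α + βs is linear in s. The sketch below illustrates this; it is our assumption of how the parameters can be estimated, and the data points in main are illustrative values only (zero repetitiveness values would need to be excluded before taking logarithms).

// Fits R(s) = alpha * exp(beta * s) by least squares on ln R(s).
public final class ExponentialFit {
    public static double[] fit(double[] sizes, double[] repetitiveness) {   // returns {alpha, beta}
        int n = sizes.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double x = sizes[i], y = Math.log(repetitiveness[i]);
            sx += x; sy += y; sxx += x * x; sxy += x * y;
        }
        double beta = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double lnAlpha = (sy - beta * sx) / n;
        return new double[]{Math.exp(lnAlpha), beta};
    }

    public static void main(String[] args) {
        double[] s = {1, 2, 3, 4, 5, 6};
        double[] r = {0.73, 0.46, 0.25, 0.14, 0.08, 0.04};   // illustrative values
        double[] ab = fit(s, r);
        System.out.printf("alpha=%.3f beta=%.3f%n", ab[0], ab[1]);
    }
}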

[Boxplots of R² over all projects: Within-project changes (median 0.978), Within-project fixes (median 0.980), Cross-project changes (median 0.983), Cross-project fixes (median 0.973)]

Figure 5.9: R² of Fitted Exponential Curve to Repetitiveness over Change Size

Figure 5.9 shows the boxplots summarizing the distributions of the goodness of fit over all projects in our dataset. As seen, it is very high for both general and fixing changes in both within- and cross-project settings. For example, for general changes in the within-project setting, the median R² is 0.978. That means, for the median project, 97.8% of the variation of within-project repetitiveness over change size can be explained by the fitted exponential model. In all four cases, the medians are higher than 0.970. We can also see that 75% of the projects (lower edge of the boxes in the plots) have an R² of at least 0.90. We conclude that the repetitiveness of code changes decreases exponentially when change size increases. As an implication, automatic program repair tools should focus on change fragments that are syntactic units of height 2–6 to reduce the search space of solutions (size 1 changes on literals/variables have less value in suggestion).

5.6.3 Within and Cross-project Repetitiveness Comparison

In Figure 5.8, the box plots for sizes 1–5 in the cross-project setting are higher than those in the within-project setting. For example, for size 1 changes, the median cross-project repetitiveness is 86%, while the within-project one is 73%. For size 2 changes, the corresponding numbers are 64% and 46%. To statistically verify this observation, we used a paired Wilcoxon test to compare the distributions of R(s) in the within-project and cross-project settings. All the tests for sizes 1–5 conclude that cross-project repetitiveness is statistically higher than within-project repetitiveness. For large sizes, changes repeat equally or slightly less frequently in the cross-project setting. For all sizes except size 1, the inter-quartile ranges of the box plots in the cross-project setting are shorter than those in the within-project one. For example, the inter-quartile range for cross-project repetitiveness of size 2 changes is 14%, while that in the within-project setting is 15%. The difference tends to increase as the size increases. Only size 1 shows the opposite trend, with the corresponding numbers of 10% and 9%. Nevertheless, these results imply that repetitiveness in the cross-project setting is more stable. Thus, repeated changes are more likely to be found across projects.
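A paired Wilcoxon signed-rank test of this kind can be run, for example, with Apache Commons Math, as in the sketch below. The arrays are hypothetical per-project observations (one pair per project for a given change size), and the library reports a two-sided p-value, whereas the comparison in the study is one-sided.

import org.apache.commons.math3.stat.inference.WilcoxonSignedRankTest;

// Compares paired per-project repetitiveness values between the
// within-project and cross-project settings with a Wilcoxon signed-rank test.
public class RepetitivenessComparison {

    public static void main(String[] args) {
        // Hypothetical paired observations: one entry per project, size-2 changes.
        double[] withinProject = {0.46, 0.41, 0.50, 0.44, 0.48, 0.39, 0.52, 0.45};
        double[] crossProject  = {0.64, 0.58, 0.70, 0.61, 0.66, 0.55, 0.72, 0.63};

        WilcoxonSignedRankTest test = new WilcoxonSignedRankTest();
        double statistic = test.wilcoxonSignedRank(withinProject, crossProject);
        // Two-sided p-value; 'false' selects the normal approximation
        // rather than the exact distribution.
        double pValue = test.wilcoxonSignedRankTest(withinProject, crossProject, false);

        System.out.printf("W = %.1f, two-sided p = %.4f%n", statistic, pValue);
    }
}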

5.6.4 Repetitiveness of Bug Fixes

Figure 5.8 and Figure 5.9 show that the repetitiveness of fixes is similar to that of general changes. At small sizes (1–2), fixes repeat frequently, with repetitiveness usually higher than 60% in the cross-project setting. At larger sizes (6–10), fixes repeat less frequently, with repetitiveness often less than 5%. Thus, automatic program repair methods should focus on change fragments with the small sizes of 2–5. Importantly, we conducted a paired Wilcoxon test and found that at smaller sizes (from 1 to 5), the cross-project repetitiveness of fixing changes is statistically higher than that in the within-project setting. As an example, the median of the cross-project repetitiveness for size 2 fixing changes is 66%, in comparison with 22% in the within-project setting. The corresponding numbers for size 3 fixing changes are 42% and 8%. As seen, the within-project repetitiveness of fixing changes is low. Those results suggest that automatic program repairing tools should not rely solely on the changes in an individual project, but rather make use of repeated bug fixes across different projects. As seen in Figure 5.8, the repetitiveness of cross-project changes is comparable to that of cross-project fixes. However, paired Wilcoxon test results showed that at the small sizes (1–3), the repetitiveness of fixing changes is statistically higher than that of general changes. This suggests that bug fixes tend to have small sizes. Thus, automatic patching tools could start with small changes from other projects and gradually compose them. For larger sizes, they could use within-project changes.

5.6.5 Repetitiveness on Individual Datasets in SourceForge and GitHub

[Figure 5.10 (panels: Within-project changes, Cross-project changes, Within-project fixes, Cross-project fixes; one line per dataset: SourceForge, GitHub, Combined): line charts of median repetitiveness (0.0 to 1.0) over change size.]

Figure 5.10: Repetitiveness of Code Changes and Fixes over Change Size for Individual Datasets. (The horizontal and vertical axes are for the sizes of changes and the corresponding median values of repetitiveness over all projects in the dataset, respectively)

In the previous sections, we presented the repetitiveness on the full dataset combined from both SourceForge and GitHub. In this section, we show the results of the study on the individual datasets. The line charts in Figure 5.10 summarize the trends of repetitiveness on the three datasets: SourceForge, GitHub, and the combined one. The horizontal and vertical axes are for the sizes of changes and the corresponding median values of repetitiveness over all projects in a dataset, respectively. Each line shows the trend of repetitiveness of changes over size for a dataset. We ran experiments in both within- and cross-project settings for both general and fixing changes as in the previous study. The result shows that the repetitiveness on the individual datasets exhibits the same trend as on the combined one. First, the repetitiveness level is high for changes of small sizes (from 1 to 6) and decreases exponentially with size. We also observe that the cross-project repetitiveness tends to be higher than the within-project repetitiveness for both general and fixing changes. Finally, on all datasets, the repetitiveness of fixing changes is higher than that of general changes at small sizes (1–3). This result suggests that between those two datasets, the observations from one could be applied to the other.

[Figure 5.11 (panels: jmol, openkm, elfframework, crazyproger/kotlin, apache/harmony-classlib, hharrison/java3d-core; solid lines for general changes, dashed lines for fixing changes): line charts of repetitiveness (0.0 to 1.0) over change size.]

Figure 5.11: Cross-project Repetitiveness of General Changes and Fixing Changes. (The horizontal and vertical axes are for the sizes of changes and the corresponding repetitiveness values in each project, respectively)

[Figure 5.12 (same six projects and line styles as Figure 5.11): line charts of repetitiveness (0.0 to 1.0) over change size in the within-project setting.]

Figure 5.12: Within-project Repetitiveness of General Changes and Fixing Changes. (The horizontal and vertical axes are for the sizes of changes and the corresponding repetitiveness values in each project, respectively)

5.6.6 Repetitiveness on Representative Projects

While the previous sections present the results on all 5,682 projects in our full dataset, this section presents the results for a small set of six representative projects for further detailed analysis. Among them, three are selected from the SourceForge projects with different ranges of code revisions: one with tens of thousands, one with thousands, and one with hundreds of revisions. The three others are selected from the GitHub projects in the same way. Figure 5.11 plots the cross-project repetitiveness values of general changes (in solid lines) and fixing changes (in dashed lines) for those projects. The three plots in the upper row are for the projects from SourceForge and the lower ones are for the projects from GitHub. The projects in each row are ordered by the number of their revisions. As seen, although following the same trends, the curve for one project might look different from that of another. For example, the curve for general changes in openkm is higher than that in jmol (they have similar β parameters; however, α for the former is larger than that for the latter). Figure 5.11 also illustrates that, at smaller sizes, some projects such as jmol, elfframework and harmony-classlib have a repetitiveness of fixing changes higher than that of general changes. Figure 5.12 shows the plots for the same set of projects in the within-project setting. As seen, the repetitiveness of fixing changes is lower than that of general changes. In projects such as openkm and crazyproger/kotlin, the difference is significant.

5.6.7 Repetitiveness and Change Type

5.6.7.1 Change Type

We perform another analysis of the repetitiveness of changes classified based on the types of the corresponding code structures. Given a change as a pair of sub-trees (s, t) in the ASTs, its type is defined as the AST node type of s. If s is null, the AST node type of t is used. The repetitiveness of a change type is computed as the ratio of the number of repeated changes of that type over the total number of changes of that type in all projects. Based on the previous results, we focused on the repetitiveness in the cross-project setting and did not compute it separately for each project. In addition to general changes, we computed the repetitiveness of fixing changes. We chose the 30 most popular AST node types and divided them into 4 groups. The Array group contains the nodes representing program elements related to arrays, such as an array access or array declaration. The Call group contains nodes representing the elements related to method/constructor calls and field accesses. The Expression group is for expressions. The Statement group contains all statements such as if, while, try, throw, etc.
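A minimal sketch of this grouping is shown below. It assumes a hypothetical Change record (a Java record, so Java 16 or later) that already carries its AST node type and a flag indicating whether an identical change occurs elsewhere in the corpus; the actual implementation operates on the normalized AST pairs described in Section 5.3.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Computes, per AST node type, the ratio of repeated changes over all
// changes of that type (the repetitiveness of a change type).
public class ChangeTypeRepetitiveness {

    // Hypothetical representation: node type of the source tree (or of the
    // target tree when the source is null) and whether the change repeats.
    record Change(String nodeType, boolean repeated) {}

    static Map<String, Double> repetitivenessByType(List<Change> changes) {
        Map<String, int[]> counts = new HashMap<>(); // type -> {repeated, total}
        for (Change c : changes) {
            int[] cnt = counts.computeIfAbsent(c.nodeType(), k -> new int[2]);
            if (c.repeated()) cnt[0]++;
            cnt[1]++;
        }
        Map<String, Double> result = new HashMap<>();
        counts.forEach((type, cnt) -> result.put(type, (double) cnt[0] / cnt[1]));
        return result;
    }

    public static void main(String[] args) {
        List<Change> changes = List.of(
                new Change("MethodInvocation", true),
                new Change("MethodInvocation", true),
                new Change("MethodInvocation", false),
                new Change("IfStatement", false),
                new Change("IfStatement", true));
        repetitivenessByType(changes)
                .forEach((type, ratio) -> System.out.printf("%s: %.2f%n", type, ratio));
    }
}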

[Figure 5.13 (panels: Array, Call, Expression, and Statement groups; bars for Changes and Fixes): repetitiveness (0.0 to 1.0) per AST node type, e.g., ArrayAccess, ArrayCreation, ArrayInitializer, ArrayType, Method, Constructor, FieldAccess, InstanceCreation, SuperMethod, SuperConstructor, SuperFieldAccess, Infix, Prefix, Postfix, Cast, Conditional, Instanceof, If, While, For, EnhancedFor, Do, Try, Throw, Assert, Switch, SwitchCase, Synchronized.]

Figure 5.13: Repetitiveness of Code Changes and Fixes over Change Type

5.6.7.2 Repetitiveness

Figure 5.13 shows that the repetitiveness of changes varies with the change type. It is very high for changes related to arrays, expressions, and calls (often 50–80%), while it is much lower for common statements such as if or while (often less than 40%). It is interesting that changes to method calls are the most popular and frequently repeated (40% repetitiveness), while changes to if statements are also popular but repeat less frequently (only 30% repetitiveness). Change size is a possible explanation for this observation. Array accesses and method calls (especially super calls) are structurally simpler than the compound statements (e.g., if or while), thus they could repeat more. For example, more than 90% of changes to array accesses have sizes of 1–3, while only 3% of changes to if statements have such small sizes. In addition, among statements, the small ones such as case, throw, and assert statements also repeat more frequently than the larger ones (45–75%).

Importantly, as seen in Figure 5.13, the cross-project repetitiveness of bug fixes is high, especially for small program constructs. It is as high as that of general changes and much higher than that of fixing changes in the within-project setting. This result on change repetitiveness over change size and type suggests that the aforementioned automated program repair tools should focus on fixes with small sizes and of highly repetitive types such as array accesses, method calls, and if/case statements.

5.6.8 Threats to Validity

Although our dataset contains a large number of projects, they are all written in Java. Thus, the observations on the repetitiveness of changes over size and type might not generalize to projects written in other languages or paradigms. In addition, all subjects are open-source; thus, their repetitiveness characteristics, especially in the cross-project setting, might not be the same for commercial software. Our results inherit the threats from our prior techniques for program differencing Nguyen et al.(2012) and origin analysis Nguyen et al.(2010a). The results are also limited by the use of the keyword-based approach Zimmermann et al.(2007) to identify bug fixes among all revisions.

5.7 Related Work

5.7.1 Large-scale Studies on Uniqueness and Repetitiveness of Source Code

Our study is related to the large-scale study by Gabel and Su Gabel and Su(2010) on the uniqueness of source code, on more than 420 million LOCs in 6,000 software projects. They consider a file as a sequence of syntactic tokens with abstraction of variables' names. They reported syntactic redundancy at levels of granularity from 6–40 tokens. At the level of granularity of 6 tokens, 50–100% of each project is redundant. Later, in a study of about 20 projects, Hindle et al. Hindle et al.(2012) used an n-gram model to show that source code has high repetitiveness, and that the n-gram model has good predictability and could be useful in code suggestion. Another large-scale study on code reuse at the file level was from Mockus Mockus(2007, 2009) on 13.2 million source files in a continually growing set of 38.7 thousand unique projects. They reported that more than 50% of the files were used in multiple projects.

5.7.2 Studies on Code Changes

Our study is also related to the studies on code changes. Recently, Barr et al. Barr et al.(2014) investigated a history of 15,723 commits in a large number of open-source Java projects to determine the extent to which these commits can be composed/reconstituted from existing code. They reported a high degree of graftability, independent of the size and type of commits. They also found that changes are 43% graftable from the exact version of the software being changed Barr et al.(2014). Our study relied on changes to ASTs, while their study used lines of code. Moreover, we focused on the repetitiveness of changes and fixes, while they aimed to examine the degree of graftability. Both studies provide a foundation for automatic program repair tools on reusing existing code changes and bug fixes. Ray et al. Ray and Kim(2012) conducted a large case study on 18 years of the BSD product family and reported a large number of patches being ported between forked projects. In the study, they used Repertoire Ray et al.(2012), a tool that identifies ported edits by comparing the content of individual patches. Meng et al. Meng et al.(2013) propose LASE, a tool to automate similar changes from examples. It creates a context-aware edit script, identifies the locations, and transforms the code. Negara et al. Negara et al.(2014) aim to discover frequent code change patterns by using closed frequent itemset mining on a continuous sequence of code changes expressed as tree editing operations.

5.7.3 Code Clones

There is a large body of research and tools on clone detection, which is concerned with the detection of copy-and-paste fragments of code Bellon et al.(2007); Roy et al.(2009). Generally, they can be classified based on their code representations. The typical categories are text-based Ducasse et al.(1999); Marcus and Maletic(2001), token-based Baker(1997); Kamiya et al.(2002); Li et al.(2006); Mende et al.(2009), tree-based Baxter et al.(1998); Jiang et al.(2007a); Fluri et al.(2007), and graph-based Komondoor and Horwitz(2001). There have been several empirical studies on software changes Tao et al.(2012), non-essential changes Kawrykow and Robillard(2011), change-based bug prediction Shivaji et al.(2013); Giger et al.(2011), code clone changes Kim et al.(2005a), cloning across projects Al-Ekram et al.(2005), patch identification Tian et al.(2012), threats when using version histories to study software evolution Negara et al.(2012), etc. Giger et al. Giger et al.(2012) proposed an approach to predict the type of changes, such as condition changes, interface modifications, inserts or deletions of methods and attributes, or other kinds of statement changes. They use the change types and code churn for bug prediction Giger et al.(2011). Our prediction study does not distinguish different types of changes, but focuses on more exact, fine-grained changes.

5.7.4 Applications of Repetitiveness of Code Changes

There are advanced approaches to automatically generating/synthesizing program fixes based on the previously seen fixes in the projects' histories Kim et al.(2013). Weimer et al. Goues et al.(2012) proposed GenProg, a patch generation method based on genetic programming. Kim et al. Kim et al.(2013) introduced PAR, an automatic pattern-based program repair method that learns common patterns from prior human-written patches. Our study provides empirical evidence for such automatic patch generation approaches. Our prior study on FixWizard Nguyen et al.(2010b) and a study by Kim et al. Kim et al.(2006a) have confirmed the recurring nature of fixes. However, those studies were conducted at a much smaller scale than our study, with fewer than ten projects.

5.8 Conclusions

In this chapter, we present a study of the repetitiveness of code changes in software evolution. Repetitiveness is defined as the ratio of repeated changes over total changes. We model a change as a pair of old and new AST sub-trees within a method. First, we found that the repetitiveness of changes can be very high at small sizes and decreases exponentially as size increases. Second, repetitiveness is higher and more stable in the cross-project setting than in the within-project one. Third, fixing changes repeat similarly to general changes.

CHAPTER 6. CODE CHANGE SUGGESTION

Prior research has shown that source code and its changes are repetitive. Several approaches have leveraged this information to detect change/fix patterns and support software engineering tasks. An important task among them is suggesting relevant changes/fixes. In this chapter, we propose TasC, a novel statistical model that leverages the context of change tasks in software development history to suggest fine-grained code changes/fixes at the program statement level. We use latent Dirichlet allocation (LDA) to capture the task context, where topics are used to model change tasks. We also propose a novel technique for measuring the similarity of code fragments and code changes using the task context. We conducted an empirical evaluation on a large dataset of 88 open-source Java projects containing more than 200 thousand source files and 3.5 million source lines of code (SLOCs) in their last revisions. In terms of changes, our dataset contains 88 thousand code revisions, of which 20 thousand are fixing changes. We extracted almost 500 thousand statement-level changes and almost 100 thousand statement-level fixes from 423 thousand changed methods with a total of 55 million AST nodes. Our result shows that TasC relatively improves the suggestion accuracy by up to 130%–250% in comparison with the base models that do not use contextual information. Compared with other types of contexts, it outperforms the models using structural and co-change contexts.

6.1 Introduction

Prior studies have shown that code changes are repetitive Barr et al.(2014); Negara et al.(2014); Nguyen et al.(2013b). Due to the practice of software reuse, source code in software projects could contain code fragments with a certain degree of similarity. That could lead to repeated/similar changes and bug fixes to source code. Moreover, repeated/similar programming tasks often require repeated/similar changes as well. The research relevant to repeated changes can be broadly classified into two groups: 1) detecting change/fix patterns, and 2) leveraging change patterns to support software engineering (SE) tasks. For example, the frequent changes to adapt to the changes of a framework or a library's APIs are used to support framework/API adaptation Dagenais and Robillard(2008); Nguyen et al.(2010a). Fine-grained change patterns Negara et al.(2014) are detected, and the IDE designers can build automated support for such frequent editing changes. Migration patterns are learned to support code migration from one language to another Zhong et al.(2010); Nguyen et al.(2014a). Another important application is to suggest the relevant changes/fixes to some defects Nguyen et al.(2010b); Kim et al.(2013) by leveraging the detected fix patterns in the same project or other projects. The approaches to detect change/fix patterns have examined the changes/fixes in their context with relation to the other changes/fixes in the same tasks, commits, or in the change history of a project Negara et al.(2014); Livshits and Zimmermann(2005). However, the existing approaches to leverage change/fix patterns for change and bug-fix suggestion are limited in using such context. For example, Ying et al. Ying et al.(2004) and Zimmermann et al. Zimmermann et al.(2004) use history to suggest co-changes only at the coarse-grained levels of methods and files. FixWizard Nguyen et al.(2010b) leverages similar structural code context to adapt fixes from one place to another. The key limitation of such co-change context and structural context is that the relations and connections among changes for the same task in multiple transactions in the change history are not captured due to their local perspectives. For example, structural context considers only structures of code in the current file. Co-change context can consider the co-changed files or co-changed fragments in the same transaction. However, none of the state-of-the-art approaches to support change/fix suggestion takes into account a broader view of changes in the change task. In fact, the recent changes could help in suggesting the change to other fragments since those changes are parts of the same task that requires them to co-change. In this work, we leverage the context of change tasks in the project's recent history to develop TasC, a statistical model for suggesting the next change/fix at the fine-grained level for a given code statement in a program. The key idea is that the knowledge of the task(s) of the changes in the recent and current transactions will be useful to predict the next change/fix because the changes for the same task are related and might need to go together. Let us call such information the task context. In our work, the task context is modeled via latent Dirichlet allocation (LDA) Blei et al.(2003) where the topics are discovered to model the tasks for changes.
TasC can be used in an IDE, which processes the current editing changes/fixes and uses them as well as recent changes/fixes in the history to suggest a potential change/fix to a given code fragment. We conducted a large-scale empirical evaluation with a large data set of 88 open-source Java projects from SourceForge.net, with 3.6 million source lines of code (SLOC) at the latest revisions, 88 thousand code change revisions (20 thousand fixing revisions), 300 thousand changed files, and 116 million changed SLOCs. We extracted consecutive revisions from the code repositories of those projects and built the changes at the abstract syntax tree (AST) level. A change is modeled as a pair of subtrees (s, t) in the ASTs for all statements. The (sub)trees are normalized via alpha-renaming the local variables and abstracting the literals. Our empirical result shows that with the task context, TasC is able to make a significant relative improvement over an existing model that is based solely on the repeated changes and does not consider any context. Specifically, TasC relatively improves top-1 accuracy by up to 250%. This improvement suggests that topic modeling is a good way to capture the tasks in code changes. We also observed that using tasks derived from recent transactions, i.e., within a certain window of commits, achieves better accuracy than using tasks from the entire history. This shows the temporal locality of tasks/topics for code changes. We also observed the spatial locality of tasks/topics for code changes, where using task context within a project works better than using tasks across projects. The suggestion accuracy for bug fixes, a special type of changes, is lower than that for general changes in the within-project setting, and higher in the cross-project setting. We also compared TasC with the model using the co-change context, which consists of fine-grained changes that go together in the same transactions. Our result shows that TasC relatively improves top-1 accuracy by up to 130% over the model with the co-change context. It also outperforms a model that uses the structural context of the given code fragment that needs to be changed. In that model, if two code fragments have the same structural context in terms of their containing structural units, the change to one fragment is used to suggest a change to the other. The key contributions of this chapter include:
1. A novel technique using topic modeling to measure the similarity between code fragments and code changes,
2. A statistical model, TasC, to suggest the change/fix, considering the task context, and
3. A large-scale empirical evaluation on TasC's accuracy.
Section 6.2 formulates the change suggestion problem and concepts in our solution. Section 6.3 explains how tasks are modeled using LDA. Section 6.4 describes the suggestion algorithm. Section 6.5 presents the empirical evaluation on TasC. Section 6.6 is for related work. Conclusions appear last.

6.2 Problem Formulation

Let us first define the change suggestion problem and then introduce new concepts in our solution.

Definition 6.1 (Change suggestion) Given a fragment as a source, the change suggestion problem is to suggest the most likely change to the source fragment to produce the target fragment.

In this work, we develop a novel statistical model to suggest a change to a given code fragment with the following key ideas. First, we represent code fragments as subtrees in the Abstract Syntax Tree (AST) of the program. A change is modeled as a pair of subtrees (s, t) in the ASTs for all statements. Second, we leverage the context of change tasks in the project's recent history and extract features to represent the task context with latent Dirichlet allocation (LDA) Blei et al.(2003). Finally, our suggestion model learns the changes with the task context and recommends the most likely target fragment for the given source fragment in a program.

6.2.1 Transactions and Tasks

We are interested in the changes committed to a repository in the same transaction.

Definition 6.2 (Transaction) A transaction is a collection of the code changes that belong to a commit in a repository.

Definition 6.3 (Task Context) Task context of a change is the set of tasks being realized in a change transaction.

Developers change code to fulfill certain purposes/goals to complete one or more tasks such as reading and processing tokens with a text scanner (Figure 6.1) and/or fixing an IndexOutOfBounds exception. Those tasks are realized in source code via concrete code changes. We use topic modeling to recover this hidden information and use it as context for code changes. Details will be discussed in Section 6.3.2. Note that for the problem of suggesting changes in this chapter, the input is a statement that a developer wants to change and the output is a ranked list of most likely target statements for the change.

6.2.2 Fine-grained Code Change Representation

In this change suggestion problem, we also choose to represent code fragments at the syntax level in the same way as in our change repetitiveness study in Section 5.3. A code fragment in a source file is defined as a (syntactically correct) statement and is represented as a subtree in the Abstract Syntax Tree (AST) of the file. When a statement is changed, its AST is changed to another AST representing the new fragment. A code change is represented as a pair of ASTs corresponding to the fragments of the two statements before and after the change. Figure 6.1 illustrates an example of such changes. The source fragment shows the code that checks if there exist more tokens to be read and whether the number of tokens read is still smaller than the maximum value MAX. If the condition is satisfied, the tokens are processed and appended into a new line whenever the number of tokens is divisible by 10. The source fragment is changed into the target fragment. Comparing the source and target fragments, we can see that 1) the operator ‘<’ is replaced with ‘<=’, 2) the literal string “\r\n” is changed into System.lineSeparator() to support different OSs (since Linux does not use “\r\n”), and 3) the else part is newly added to insert a whitespace after each token. Figure 6.2 shows the two ASTs representing the source and target fragments of the change.¹

Source fragment:

    while (tokenSc.hasNext ()
           && n++ < MAX) {
      ...
      if (n % 10 == 0)
        fw.append ("\r\n");
    }

Target fragment:

    while (tokenSc.hasNext ()
           && n++ <= MAX) {
      ...
      if (n % 10 == 0)
        fw.append (System.lineSeparator ());
      else
        fw.append (" ");
    }

Figure 6.1: An Example of Code Change

Collapsing process. Since some statements can be compound statements, i.e., having other statement(s) in their bodies, when a statement is changed, all containing statements could automatically be considered as changed. For example, a single change to a literal in the code can cause the whole method to be considered as changed. This would lead to a huge number of changes with large sizes. We avoid this effect by replacing the body statement(s), if any, of compound statements with empty blocks. We call this process collapsing. For example, an if statement will be represented as an AST which roots at an if node and contains a child sub-tree for its condition expression, a block node for its then branch, and possibly another block node for its else branch. The tree (b) in Figure 6.3 shows such an example for the if statement represented by the lower tree in Figure 6.2. We represent code change as follows.

Definition 6.4 (Code Change) A code change at the statement level is represented as a pair of ASTs (s, t) where s and t are not label-isomorphic. The trees s or t can be a null tree or a tree representing a statement obtained from the original statement by replacing all sub-statements with empty block statements.

In this definition, s and t are called the source and target trees, respectively. Either of them (but not both) could be a null tree: s or t is a null tree when the change is an addition or a deletion of code, respectively. Since ASTs are labeled trees, the condition of not being label-isomorphic is needed to specify that the code fragments before and after the change are different. The fragments are normalized via alpha-renaming as in Section 5.3.

[Figure 6.2: tree-based representation of the source and target fragments. Node type legend: WHILE: while statement; IF: if statement; BLOCK: block statement; METHOD: method call; INFIX: infix expression; POSTFIX: postfix expression; ID: identifier; LIT: literal; NUM: number; STR: string.]

Figure 6.2: Tree-based Representation for the Code Change in Figure 6.1.

¹ For simplicity, we do not draw the nodes of type ExpressionStatement, which are the parents of the method call nodes under the if statements.

Figure 6.2 shows the AST subtrees for the code changes in the illustration example after normalization. The node for the variable tokenSc is labeled ID v1 while the one for n is labeled ID v2, since they are local variables and thus alpha-renamed into v1 and v2, respectively. The node for the literal value 10 is labeled LIT NUM. MAX is a constant of the class, not a local variable, so it is not alpha-renamed. During alpha-renaming, the same local variable name could be relabeled to different names in different code fragments, depending on its locations in the corresponding fragments. For example, the local variable n is renamed to v2 in the while fragment (Figure 6.3a) but to v1 in the if fragment (Figure 6.3b), since it is the second local variable in the former fragment and the first in the latter.
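To make the normalization step concrete, the following sketch alpha-renames local variables in the order of their first occurrence and abstracts literal values. It works on a simplified token list rather than a real AST, and the helpers (isLocalVariable, isLiteral, the LOCAL_VARIABLES set) are hypothetical stand-ins for checks that would use the binding information of the parsed program.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified alpha-renaming: local variables become v1, v2, ... in order of
// first occurrence within a fragment; literals are abstracted to LIT.
public class FragmentNormalizer {

    // Hypothetical stand-in: in the real tool this is decided from AST bindings.
    static final Set<String> LOCAL_VARIABLES = Set.of("tokenSc", "n");

    static boolean isLocalVariable(String token) { return LOCAL_VARIABLES.contains(token); }

    static boolean isLiteral(String token) {
        return token.matches("\\d+") || token.startsWith("\"");
    }

    static List<String> normalize(List<String> tokens) {
        Map<String, String> renaming = new HashMap<>();
        List<String> normalized = new ArrayList<>();
        for (String token : tokens) {
            if (isLocalVariable(token)) {
                renaming.putIfAbsent(token, "v" + (renaming.size() + 1));
                normalized.add(renaming.get(token));
            } else if (isLiteral(token)) {
                normalized.add("LIT");
            } else {
                normalized.add(token);
            }
        }
        return normalized;
    }

    public static void main(String[] args) {
        List<String> whileFragment = List.of("while", "tokenSc", "hasNext", "&&", "n", "++", "<", "MAX");
        List<String> ifFragment = List.of("if", "n", "%", "10", "==", "0");
        System.out.println(normalize(whileFragment)); // [while, v1, hasNext, &&, v2, ++, <, MAX]
        System.out.println(normalize(ifFragment));    // [if, v1, %, LIT, ==, LIT]
    }
}

Note how, as in the example above, the same variable n receives the name v2 in the while fragment but v1 in the if fragment, because the renaming depends only on the order of first occurrence within each fragment.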

[Figure 6.3 (a)–(d): the four extracted statement-level changes as pairs of source and target subtrees: (a) the while statement with the changed operator, (b) the modified if statement with collapsed branches, (c) the string literal changed to a method call, and (d) the added append method call.]

Figure 6.3: Extracted Code Changes for the Example in Figure 6.2

6.2.3 Fine-grained Code Change Extraction

This step derives the statement changes within the body of each changed method. We apply the same process for detecting and extracting code changes as in Section 5.4. The only difference is that we keep only the changes at the statement level. Figure 6.3 shows all collected changes for the illustration example in Figure 6.1. The second pair is for the modification to the if statement. Note that the statements in its body (the then and else branches) are replaced with empty block statements after collapsing. The first pair is for the change in the operator. The third one is for the change from a string literal to a method call. The last one is an addition of the method call append.

6.3 Modeling Task Context with LDA

6.3.1 Key design strategies

We model the task context for changes via an LDA-based topic model. The idea is that if the purpose(s)/task(s) of the current change transaction and those of the recent changes can be discovered, we can leverage such knowledge to predict the next change, since a task might require changes that go together as parts of the task. To find the tasks, we model the task context using LDA as follows. A change is considered as a sentence with multiple words involved in the changed fragments. In the context of the change suggestion problem, let us use the term token, instead of "word". A transaction/commit is a collection of changes (sentences), and thus also a collection of tokens, and can be viewed as a document in LDA. All tokens are collected in a vocabulary V. A topic in LDA is used to model a change task, which can be seen as the purpose of a change or a set of changes. A task is represented by a set of changes with an associated probability for each change. For example, for the task of fixing bug #01, the probability for change #1 to occur is 25%, while that for change #2 to occur is 35%, and so on. Since each change is viewed as a sentence with multiple tokens involved in the changed fragments, a task can be represented by a set of such tokens with associated probabilities (see the tasks in Figure 6.4). A transaction (document) of changes can be for multiple purposes/tasks (topics). A transaction t has a task proportion θt to represent the significance of each purpose in t. Assume that in the entire history, we have K tasks. Then, θt[k] with k in [1,K] represents the proportion of task k in t. Thus, if we use topic modeling on the set of transactions in a project, we will have the task proportion of the transaction t, i.e., the proportion of each task in the transaction t.

[Figure 6.4: the vocabulary V of all tokens in changes {w1, w2, w3, w4, ...}; the token-distribution vectors φ1..φK for tasks 1–K; and, for a transaction t, the task-assignment vector zt over all tokens of its changes and the task proportion θt.]

Figure 6.4: LDA-based Task Context Modeling

6.3.2 Details on Modeling Task Context

Figure 6.4 illustrates our modeling. For each change, we collected all syntactic code tokens in the AST after normalizing the source fragment of the change. If the source is null, i.e., the change is an addition, the target fragment will be used. In the illustration example, we would collect while, ID v1, hasNext, &&, ID v2, ++, <, MAX, etc. All tokens wi’s collected for all the changes in the recent history up to the current transaction are placed into the vocabulary V . To perform a task k among all K tasks, one might make different changes with different tokens from V . Moreover, a change c in V might contribute to multiple tasks. Thus, each token w in a change c has a probability to achieve a task k.

We use a token-distribution vector φk of size |V| for the task k, i.e., each element of φk represents the probability that a token w in a change c achieves the task k. Putting together all of those vectors for all K tasks, we have a matrix called the per-task token distribution φ. A task k is represented by a set of changes with the corresponding probabilities of the tokens in those changes. Those changes contribute to achieving that task. A change that does not contribute to achieving a task will have a probability of zero. The vocabulary, tasks, and per-task token-distribution matrix are meaningful for all transactions in the history.

A transaction t has several changes with Nt tokens. Each transaction has two associated parameters:

1. task proportion θt: A transaction t can be for multiple tasks. Thus, as in LDA, we associate each transaction t with a proportion/distribution to model the contribution of the transaction t to each task k. The higher the value θt[k], the more the changes in the transaction t contribute toward the task k. The total of all values θt[k] for all tasks k = 1...K is 100%. For example, if θt = [0.2, 0.3, 0.4, ...], 20% of the changes in transaction t contribute toward task 1, 30% toward task 2, etc.

2. task assignment vector zt: This vector for transaction t models the assignment of the tokens in all changes in t to the tasks.

To find the tasks of a transaction t, as in LDA, we assume that the transaction t is an "instance" generated by a "machine" with 3 variables θt, zt, and φ. Given a transaction t, the machine generates the vector zt assigning each position in t a task k based on the task proportion θt. For each position, it generates a token w for a change c based on the task k assigned to that position in t and the token-selection vector φk of that task k. The changes in all transactions in the history are observed from data. This LDA-based model can be trained to derive those 3 variables. For a new transaction t′, we can derive the task assignment zt′ and the proportion θt′ of the tasks in t′. Thus, we can derive the tasks for all transactions.
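The evaluation (Section 6.5.2) builds this model on top of MALLET. The sketch below shows one plausible way to train such a model with MALLET's ParallelTopicModel, treating each transaction as one document whose text is the space-separated list of its change tokens. The parameter choices mirror the ones reported later (K tasks, 1,000 iterations, α and β of 0.01), but the pipeline details and the per-task alpha assumption are illustrative rather than the exact code of TasC.

import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Trains an LDA model over change transactions: each transaction is a
// document, each syntactic token of its changes is a word, and each topic
// plays the role of a change task.
public class TaskContextModel {

    public static void main(String[] args) throws Exception {
        // Hypothetical transactions; in TasC these are the tokens collected
        // from the normalized source fragments of all changes in a commit.
        List<String> transactions = List.of(
                "while ID_v1 hasNext && ID_v2 ++ < MAX",
                "if ID_v1 % LIT_NUM == LIT_NUM ID_v1 append LIT_STR",
                "ID_v1 append System lineSeparator");

        List<Pipe> pipes = new ArrayList<>();
        pipes.add(new CharSequence2TokenSequence(Pattern.compile("\\S+")));
        pipes.add(new TokenSequence2FeatureSequence());
        InstanceList instances = new InstanceList(new SerialPipes(pipes));
        for (int i = 0; i < transactions.size(); i++) {
            instances.addThruPipe(new Instance(transactions.get(i), null, "commit-" + i, null));
        }

        int numTasks = 10;                 // K, varied in the sensitivity analysis
        double alphaSum = 0.01 * numTasks; // assumes alpha = 0.01 per task
        ParallelTopicModel lda = new ParallelTopicModel(numTasks, alphaSum, 0.01);
        lda.addInstances(instances);
        lda.setNumIterations(1000);
        lda.estimate();

        // Task proportion theta_t of the first transaction.
        double[] theta = lda.getTopicProbabilities(0);
        System.out.println("theta of commit-0: " + Arrays.toString(theta));
    }
}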

6.4 Change Suggestion Algorithm

Based on our modeling of task context via LDA, we develop a change suggestion algorithm for any given fragment of code. Our algorithm is developed with two key design ideas:

1. Source fragments that contribute similarly to the tasks in the change transactions would be changed in a similar manner. Thus, given a source fragment s for suggestion, the likely (candidate) target fragment could be found in the candidate changes in the past having source fragments similar to s in terms of their tasks.

2. The more frequently a target has been seen in the past, the more likely it is the actual target of a given source fragment.

Let us explain how we use the tasks inferred from topic modeling in Section 6.3 to measure the similarity between code fragments, and then explain the detailed algorithm.

6.4.1 Task-based Similarity Measurement for Code Fragments

The idea of this measurement is that the similarity between code fragments can be measured via their levels of contribution to the tasks. The task contributions of a fragment can be computed by combining the task contributions of the tokens in the fragment (which are computed by topic modeling). We realize that idea by using the per-task token distribution φ computed by topic modeling. Note that in Figure 6.4, φ is the matrix formed by putting together all vectors φk for k = 1..K. We first build a task vector for each token via φ. The size of the vector for a given token is the number of topics/tasks; each index corresponds to a topic/task, and the value at index k is the probability of that token contributing toward the task k. For example, in Figure 6.4, if the number of tasks K=3, the task vector for token w1 is v1 = [0.25, 0.0, 0.25] and that for token w2 is v2 = [0.2, 0.3, 0.03]. Since the tasks/topics in LDA Blei et al.(2003) are assumed to be uniformly distributed over all documents in the corpus, such a task vector represents the contributions of that token to the tasks. For example, among those two tokens, w1 contributes to task 1 more than w2 does. For each fragment, we first collect from its AST a sequence of syntactic tokens. This step is done after normalizing code fragments in the code change extraction process as mentioned in Section 6.2.3. The summation of those task vectors for all tokens of a code fragment represents the contributions of the corresponding fragment to the tasks. For example, if a fragment is composed of the two tokens w1 and w2 above, its combined task vector is v = [0.45, 0.3, 0.28], which means that it contributes the most to task 1. We normalize the combined task vector from all tokens so that the sum of all values is 1.

1  function Suggest(Fragment s, ChangeDatabase C)
2      PerTaskTokenDistribution φ = LDA(C)
3      Initialize a map T
4      for c = (u, v) in C
5          sim = Sim(u, s, φ)
6          if sim ≥ threshold
7              score = sim × c.frequency
8              T(v) = max(T(v), score)
9      return Sort(T)
10 end

Figure 6.5: Change Suggestion Algorithm

The normalized version of the above vector v is v̄ = [0.43, 0.30, 0.27]. Then, we use the normalized vector as the task vector for the corresponding fragment. Such a task vector represents the probability of the fragment contributing to a task. The task similarity between two code fragments is measured by their shared contributions to the tasks, normalized by the maximum of their contributions.

Sim(f1, f2, φ) = Sim(v1, v2) = Σ_{t=1..K} min(v1[t], v2[t]) / Σ_{t=1..K} max(v1[t], v2[t])    (6.1)
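A small sketch of this measurement is given below: a fragment's task vector is the normalized sum of the task vectors of its tokens (rows of φ restricted to the fragment's tokens), and two fragments are compared with the min/max ratio of formula (6.1). The token-to-vector map phi here is a hypothetical, pre-computed per-task token distribution.

import java.util.List;
import java.util.Map;

// Task-based similarity of code fragments: each fragment is mapped to a
// normalized task vector and two vectors are compared with formula (6.1).
public class TaskSimilarity {

    // Sums and normalizes the task vectors of a fragment's tokens.
    static double[] taskVector(List<String> tokens, Map<String, double[]> phi, int numTasks) {
        double[] v = new double[numTasks];
        for (String token : tokens) {
            double[] tokenVector = phi.getOrDefault(token, new double[numTasks]);
            for (int k = 0; k < numTasks; k++) v[k] += tokenVector[k];
        }
        double sum = 0;
        for (double x : v) sum += x;
        if (sum > 0) for (int k = 0; k < numTasks; k++) v[k] /= sum;
        return v;
    }

    // Formula (6.1): shared contribution over maximum contribution.
    static double similarity(double[] v1, double[] v2) {
        double minSum = 0, maxSum = 0;
        for (int k = 0; k < v1.length; k++) {
            minSum += Math.min(v1[k], v2[k]);
            maxSum += Math.max(v1[k], v2[k]);
        }
        return maxSum == 0 ? 0 : minSum / maxSum;
    }

    public static void main(String[] args) {
        // Hypothetical per-task token distribution phi for K = 3 tasks.
        Map<String, double[]> phi = Map.of(
                "w1", new double[]{0.25, 0.0, 0.25},
                "w2", new double[]{0.2, 0.3, 0.03});
        double[] v1 = taskVector(List.of("w1", "w2"), phi, 3);
        double[] v2 = taskVector(List.of("w2"), phi, 3);
        System.out.printf("Sim = %.2f%n", similarity(v1, v2));
    }
}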

6.4.2 Detailed Algorithm

Figure 6.5 shows the pseudo-code of the algorithm to suggest the target fragment. The input of the algorithm is a source fragment s to be changed and the database of all past changes. The algorithm outputs a ranked list of likely target fragments for s. To do that, the algorithm first builds the task model for the past changes by running LDA on the change transactions (line 2). The output of this step is the distribution of tokens for each task in the past. Then, we use those distributions to find the source fragments with similar tasks. The algorithm looks for all prior changes (u, v) whose source fragment u is similar to the given source s with respect to their tasks (lines 4–6). The similarity measurement is shown in formula (6.1) (Section 6.4.1). If it finds such a change c, it updates the target of c in the store T of all candidate target fragments. The algorithm gives higher scores to the targets that have both occurred more frequently in the past and belong to the changes whose sources are more similar to the given source s (line 7). Since a candidate target can belong to multiple changes (with similar sources), we use the best score from all those changes when updating the store T of candidate targets (line 8). Finally, all candidate targets in T are ranked based on their scores.
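For concreteness, a direct translation of Figure 6.5 into Java might look like the sketch below. The Change record, its frequency field, and the reuse of the TaskSimilarity helper from the previous sketch are assumptions about the representation, not the exact implementation of TasC.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Suggestion algorithm of Figure 6.5: rank candidate target fragments by the
// task similarity of their source fragments times their past frequency.
public class ChangeSuggester {

    record Change(List<String> sourceTokens, String target, int frequency) {}

    static List<String> suggest(List<String> source, List<Change> changeDatabase,
                                Map<String, double[]> phi, int numTasks, double threshold) {
        double[] sourceVector = TaskSimilarity.taskVector(source, phi, numTasks);
        Map<String, Double> scores = new HashMap<>();        // candidate target -> best score
        for (Change c : changeDatabase) {
            double[] candidateVector = TaskSimilarity.taskVector(c.sourceTokens(), phi, numTasks);
            double sim = TaskSimilarity.similarity(sourceVector, candidateVector);
            if (sim >= threshold) {
                double score = sim * c.frequency();
                scores.merge(c.target(), score, Math::max);   // keep the best score per target
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ranked;
    }
}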

6.5 Empirical Evaluation

We conducted empirical experiments to evaluate the quality of using task context to suggest code changes. We aim to answer two research questions:

Table 6.1: Collected Projects and Code Changes

Projects                                       88
Total source files                        204,468
Total SLOCs                             3,564,2147
Java code change revisions                 88,000
Java fixing change revisions               19,947
Total changed files                       290,688
Total SLOCs of changed files          116,481,205
Total changed methods                     423,229
Total AST nodes of changed methods     54,878,550
Total detected changes                    491,771
Total detected fixes                       97,018

1. Does our model TasC using task context improve the quality of code change suggestion over the base models using only repeated changes?
2. Does the model TasC using task context improve the quality of code change suggestion over the models using other types of context such as structure Nguyen et al.(2010b) and co-change relations Zimmermann et al.(2004)?
We evaluated the suggestion quality for both general changes and bug fixing changes (fixes). We also studied several characteristics of task context in code change suggestion.

6.5.1 Data Collection

We collected code change data from open-source projects in SourceForge.net SourceForge. We downloaded and processed all Subversion (SVN) repositories of the Java projects on our local machine. To filter out the toy projects among them, we kept only the projects that satisfy two criteria: 1) having standard trunks (i.e., the main line of development) in their SVN repositories, and 2) having at least 1,000 revisions of source code changes. Since the numbers of revisions greatly vary among these projects (from some thousands to some tens of thousands), we collected into our dataset only the first 1,000 revisions of Java code to the trunks of those projects. Table 6.1 summarizes our dataset. There are 88 projects satisfying the criteria. They contain more than 200 thousand Java source files and 3.5 million source lines of code (SLOCs) in their last snapshots. In terms of changes, our dataset contains 88 thousand revisions having source code changes. Among them, 20 thousand are fixing changes. To detect fixing changes, we used the keyword-based approach Zimmermann et al.(2007), in which, if the commit log message of a revision has keywords indicating fixing activities, the code changes in that revision are considered as fixing changes. We processed all revisions and parsed 290 thousand changed source files with 116 million SLOCs. Our tool detected 423 thousand changed methods with a total size of 55 million AST nodes. From those methods, it extracted almost 500 thousand statement-level changes and almost 100 thousand statement-level fixes.

6.5.2 Evaluation Setup and Metric

Since TasC uses LDA topic modeling to capture the task context, given a source fragment at a commit for suggestion, we need the data on the change history before that commit for training our model. We use a longitudinal setup. For each project, we divide the 1,000 revisions equally into 10 folds, each of which has 100 consecutive revisions. Folds are ordered by the commit time of their revisions. A testing change is picked from a testing fold i (i = 2..10). The changes in the previous folds (0 to i − 1) are used to compute the task context via topic modeling. We measure the quality of change suggestion via top-ranked accuracy. Given a source fragment of a testing change, our tool produces a ranked list of candidate target fragments. If the actual target matches the one at position k of the list, we count it as a hit for top-k suggestion. The accuracy of top-k suggestion is computed as the ratio between the number of top-k hits over the number of tests. We recorded both the accuracy for each project and that for the whole dataset (all projects). To evaluate the suggestion quality in the cross-project setting, given a testing change in a project, we use the changes from all previous folds of that project along with the changes from all folds of the other projects as the training data. For the topic modeling implementation, we built our model on top of the LDA library from the MAchine Learning for LanguagE Toolkit (MALLET McCallum(2002)). For the parameters of LDA, we experimented with different values of the number of tasks K to see its impact on the suggestion accuracy in Section 6.5.3. For the other parameters, we used the suggested values from MALLET, i.e., α=0.01, β=0.01, and the number of iterations is 1,000. In our empirical evaluation, we also performed a sensitivity analysis on the similarity threshold (used at line 6 in Figure 6.5); see Section 6.5.3.
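The accuracy metric itself is straightforward to compute. The sketch below assumes a hypothetical list of test cases, each holding the actual target and the ranked suggestion list produced by the model.

import java.util.List;

// Top-k accuracy: the fraction of tests whose actual target appears
// in the first k entries of the ranked suggestion list.
public class TopKAccuracy {

    record TestCase(String actualTarget, List<String> rankedSuggestions) {}

    static double topK(List<TestCase> tests, int k) {
        long hits = tests.stream()
                .filter(t -> t.rankedSuggestions().stream().limit(k)
                        .anyMatch(s -> s.equals(t.actualTarget())))
                .count();
        return (double) hits / tests.size();
    }

    public static void main(String[] args) {
        List<TestCase> tests = List.of(
                new TestCase("t1", List.of("t1", "t2")),
                new TestCase("t2", List.of("t3", "t2", "t1")),
                new TestCase("t3", List.of("t4", "t5")));
        System.out.printf("Top-1: %.2f, Top-2: %.2f%n", topK(tests, 1), topK(tests, 2));
    }
}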

6.5.3 Parameter Sensitivity Analysis

In this first experiment, we analyzed the impact of the similarity threshold and of the number of tasks K on the suggestion accuracy. We chose to use the project ONDEX. To analyze the threshold, we fixed the number of tasks at K = 10 and ran TasC with different values of the similarity threshold from 0.5 to 0.9. Figure 6.6a shows the accuracy results for different top-k suggestions. When the threshold is small, the number of candidates will be large; thus, one would expect that the accuracy is low. However, from the results, we can see that when the threshold is less than or equal to 0.8, varying it does not affect the accuracy. This happens for two reasons. First, we compute the ranking score by multiplying the similarity with the frequency (Figure 6.5, line 7). Second, the frequencies of candidate changes are usually small. Therefore, the candidates with low similarity have a low chance to be ranked high in the suggestion list. When the threshold is increased from 0.8 to 0.9, the number of candidates drops, leading to a decrease in accuracy. We use a threshold of 0.8 in the next experiments because it gives the best accuracy while finding the minimum set of candidates. To analyze the impact of the number of tasks K, we used the similarity threshold of 0.8 and varied the value of K. The accuracy results are shown in Figure 6.6b. From top-5 to top-50, the model is not sensitive to K because the numbers of candidates in the ranked lists are usually small. The best accuracy can be achieved at K = 10. When K is small, many code fragments are considered similar because the size of the topic vector is small and many fragments are grouped into the same LDA topics/tasks even though they are for different change tasks. When K is large, the task vectors of source fragments become distinct. Thus, many actual targets are not collected into the ranked list, resulting in a decrease in the accuracy.

[Figure 6.6 (panels: (a) Threshold, (b) Number of tasks/topics): top-1/2/5/10/20 accuracy (0.25 to 0.5) as the similarity threshold varies from 0.5 to 0.9 and the number of tasks/topics varies from 1 to 50.]

Figure 6.6: Parameters sensitivity analysis on the impact to the suggestion accuracy in project ONDEX.

[Figure 6.7 (panels: (a) Temporal locality of task context, Window vs. Full; (b) Spatial locality of task context, Within vs. Cross): top-1 to top-20 accuracy (0.40 to 0.55).]

Figure 6.7: Analysis on the locality of task context.

6.5.4 Locality of Task Context

In this experiment, we would like to study how the locality of training data for topic modeling affects change suggestion accuracy. We study two aspects of locality: time and space. For temporal locality, we investigated whether using recent transactions and entire change history would produce different accuracy, and if yes, which one would give better accuracy? For spatial locality, we performed the experiment to compare the accuracy in two cases: 1) the training data from within the histories of individual projects and 2) the training data from the current project as well as from the change histories of other projects.

6.5.4.1 Temporal locality of task context

We carried out this experiment in the within-project setting. For each testing change, we ran our tool with two different training datasets for LDA. The first one simulates the use of recent transactions by using only a window of a small number of revisions before the revision of the testing change. The second training dataset uses the full history prior to the revision of the testing change. In this experiment, we used the most recent fold as the window of recent transactions. The comparison result for suggestion accuracy over all projects is shown in Figure 6.7a. As seen, for all the top-k accuracy, the accuracy in the setting using a small window of prior revisions is higher than the accuracy in the setting using the full change history. Examining the results for each individual project, we observed the same trend consistently. We used a paired Wilcoxon test to compare the distributions of the accuracy over all projects in our dataset between using window of history and entire history settings. The test result shows that the accuracy for the former is significantly higher than that for the latter. This result suggests that using a window of recent changes would be more beneficial than using the full history in capturing the task context in the problem of change suggestion. Using recent data would not only increase accuracy but also reduce the running time when suggesting changes. The intuition behind this would be that task context is local in time, which means that a task is usually realized within a certain window of transactions, rather than spanning over many transactions in the whole development history. This result is consistent with the findings by Hindle et al. Hindle et al.(2009).

6.5.4.2 Spatial locality of task context

We studied this locality by comparing the accuracy between the within-project and cross-project settings. In this experiment, we used the training data from the windows of change histories. The process is similar to that of the experiment for temporal locality. The result is shown in Figure 6.7b. As seen, using training data from individual projects gives better accuracy for all top ranks than using data from other projects. We also observed this result consistently in all projects in our dataset. A paired Wilcoxon test to compare the distributions of accuracy over projects between the two settings shows that the difference is statistically significant. This result implies that the task context captured by topic modeling with LDA is local in space: tasks/topics are not shared among different projects. Adding data from different projects might not improve the suggestion quality. In contrast, it increases complexity and could add noise to the task inference, thus reducing accuracy.

Table 6.2: Suggestion accuracy comparison between the model using task context and base models.

(a) Within-project suggestion accuracy comparison

          Top-1   Top-2   Top-5   Top-10   Top-20
Exact      0.20    0.21    0.21     0.21     0.21
Similar    0.32    0.34    0.35     0.35     0.35
TasC       0.51    0.52    0.53     0.54     0.54

(b) Cross-project suggestion accuracy comparison

          Top-1   Top-2   Top-5   Top-10   Top-20
Exact      0.22    0.23    0.24     0.24     0.24
Similar    0.35    0.37    0.38     0.38     0.39
TasC       0.45    0.46    0.46     0.46     0.47

6.5.5 Change Suggestion Accuracy Comparison with Base Models

In this experiment, we aim to answer the question of whether our model using task context improves the suggestion quality over the base models that use only repeated changes and do not use context information. Those base models also use the suggestion algorithm in Figure 6.5. However, they do not use the topic modeling result to compute similarity when finding the candidate changes. Instead, the first base model, named Exact, uses all the changes whose source fragments exactly match the given source s (i.e., their normalized ASTs are isomorphic). In the second base model, named Similar, the similarity of two fragments is measured via the similarity between their respective syntactic code tokens (after normalization). Specifically, the similarity is measured as the ratio between the length of the longest common subsequence of the two code sequences and the maximum length of the sequences. The similarity threshold is set to 0.8, which is the same as that for task similarity. The result is shown in Table 6.2. The first base model misses many cases and achieves no more than 22% top-1 accuracy. The reason is that exact matching in finding candidate changes is too strict. That is why, when we use similar matching in the second base model, the accuracy relatively increases by more than 150%. Importantly, our model using task context improves considerably over both base models: relatively more than 250% over the Exact model and almost 130% over the Similar model. The large improvement is observed consistently for all top-k accuracy in both the within-project and cross-project settings. This improvement could be attributed to the use of topic modeling to capture a higher level of abstraction in the tasks of the code changes. We will show some examples to demonstrate this in Section 6.5.8.

Comparing the within- and cross-project settings, we can see that TasC achieves better accuracy in the former than in the latter. In contrast, the base models achieve better accuracy in the latter than in the former. While adding more change data from other projects introduces noise to task inference and reduces the accuracy of TasC, using more changes in the base models increases the chance that a test change has been seen in the past, thus reducing the number of missing cases and increasing the accuracy.

[Figure 6.8: top-1 to top-20 accuracy (0.40 to 0.55) for Change-within, Change-cross, Fix-within, and Fix-cross.]

Figure 6.8: Suggestion accuracy comparison between fixing and general changes using task context.

6.5.6 Fix Suggestion Using Task Context

We also performed experiments on bug-fixing changes to see how our model works for this special change type. The accuracy is shown in Figure 6.8. As with general changes, the fix suggestion accuracy is higher in the within-project setting than in the cross-project setting. Comparing fixes with general changes, fix suggestion accuracy is lower than change suggestion accuracy in the within-project setting, but higher in the cross-project setting. This result implies that fixing tasks are more likely to be repeated across projects than within a project, while general change tasks are more likely to be repeated within a project than across projects.

6.5.7 Task Context versus Structural and Co-Change Contexts

In this experiment, we compare the suggestion quality between our model with task context and models using two existing types of contexts: 1) structural context (e.g., used in FixWizard Nguyen et al.(2010b)), and 2) co-change context (e.g., used in Ying et al. Ying et al.(2004) and Zimmermann et al. Zimmermann et al.(2004)). Let us briefly explain the concepts and ideas of using those contexts and then show the comparison results.

6.5.7.1 Some concepts

Definition 6.5 (Structural Context) The structural context of a code fragment is the set of code fragments that contain it. The structural context of a code change is the structural context of the source fragment of the change.

The structural context captures the context of the surrounding code of a change. This context is a set due to the nesting structure of syntactic units, i.e., a fragment can be nested in more than one fragment. Since we extract only the changes at the statement level, the structural context of a change consists of the statements surrounding the source of the change. The context statements are also normalized and collapsed in the same manner as in code change extraction. In the example in Figure 6.1, the structural context of the method call is the containing if and while statements. The ASTs of their source fragments are shown in Figures 6.3a and 6.3b. In this work, we aim to compare our model with the co-change context at a finer granularity. Thus, we define the co-change context as follows.

Definition 6.6 (Co-change Context) The co-change context of a code change is the set of changes that occur in the same transaction with the change.

The idea of using this context is that changes often go together. Then, given a change co in the same transaction as the test change, candidate changes that have co-occurred with co in the past are more likely to be the actual suggested change.

6.5.7.2 Using other contexts

Using structural context. We add structural context to the base model Similar to build the model Structure as follows. If, among the candidate changes {c = (u, v), Sim(u, s) ≥ threshold}, there exist changes that share structural context with the given source s, we keep only those changes. That is, we skip all the changes that do not share structural context with s. Otherwise, the candidate changes are the same as in the model Similar. A change c = (u, v) shares structural context with s if the set of code fragments forming the structural context of u overlaps with that of s, i.e., at least one ancestor code fragment of u is exactly matched with some ancestor fragment of s. The scoring and ranking schemes are the same as in the model Similar.

Using co-change context. In the model Co-change, we assume that we are given all other changes in the same transaction as the change under suggestion. Then, if we find candidate changes that have co-occurred with a change in the same transaction, i.e., that share the co-change context with the change to be suggested, we keep only those changes as the candidates. Otherwise, the candidate changes are the same as in the model Similar. The scoring and ranking schemes are the same as in the model Similar.

We also investigated the combination of those two contexts and the task context. Our expectation is that adding structural and/or co-change contexts will push the actual target fragments up in the ranked list and thus could improve the accuracy. We combined the task and structural contexts to create the model named Task+Struct, and combined the task and co-change contexts to create the model named Task+Co. The method to add each context to our original task model is the same as the method to add each context to the model Similar described above. Finally, we combined all three contexts to create the model named All. If we find candidate changes that share either structural or co-change context with the change to be suggested, we keep only those changes as the candidates; otherwise, the candidate changes are the same as in the model TasC.
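The filtering described above can be pictured with the following sketch, assuming candidate changes carry the contexts they were observed with; the types and method names here are illustrative, not taken from the implementation.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Illustrative sketch: keep only candidates that share structural (or co-change)
// context with the change under suggestion; otherwise fall back to all candidates.
final class ContextFilter {

    // A candidate change together with the context sets it was observed with.
    record Candidate(String sourceFragment, String targetFragment,
                     Set<String> structuralContext, Set<String> coChangedFragments) {}

    static List<Candidate> filterByContext(List<Candidate> candidates,
                                           Set<String> queryStructuralContext,
                                           Set<String> queryTransactionChanges) {
        List<Candidate> sharing = new ArrayList<>();
        for (Candidate c : candidates) {
            boolean sharesStructure =
                    !Collections.disjoint(c.structuralContext(), queryStructuralContext);
            boolean sharesCoChange =
                    !Collections.disjoint(c.coChangedFragments(), queryTransactionChanges);
            if (sharesStructure || sharesCoChange) {
                sharing.add(c);
            }
        }
        // If no candidate shares any context, behave exactly like the base model.
        return sharing.isEmpty() ? candidates : sharing;
    }
}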

6.5.7.3 Comparison results

The results are shown in Table 6.3 for general changes and in Table 6.4 for fixes. For both types of changes and in both settings, our model TasC outperforms the structural and co-change models. At top-1, our model relatively improves the accuracy by almost 130%. This trend is consistent for all top-k accuracy. Some case studies where using the task context could correctly suggest the target while using other contexts could not will be shown in Section 6.5.8. Comparing the models with combined contexts and our model TasC, we see that adding the other contexts does not improve the accuracy. We investigated the reason for this by examining the sets of candidate changes from the different models. We observed that the number of candidates that share the structural or co-change context is much smaller than the number of those that do not. This means that, most of the time, those models behave the same as the TasC model (without other contexts). Among the candidates that share the other contexts, the number of choices for target fragments is small, mostly one, which means that most of them have been seen only once in the past. This makes most of the suggestions from those candidates very close to those from the task-only model.

6.5.8 Case Studies

This section shows some cases where TasC correctly suggests the target at top-1 of the ranked list while the other models could not.

Table 6.3: Change suggestion accuracy comparison between using task context and using other contexts

(a) Within-project suggestion for general changes

                  Top-1  Top-2  Top-5  Top-10  Top-20
  Single context
    TasC          0.51   0.52   0.53   0.54    0.54
    Structure     0.32   0.34   0.35   0.351   0.35
    Co-change     0.33   0.34   0.35   0.351   0.35
  Combined context
    Task+Struct   0.51   0.52   0.53   0.54    0.54
    Task+Co       0.50   0.52   0.53   0.53    0.54
    All           0.50   0.52   0.53   0.53    0.54

(b) Cross-project suggestion for general changes

                  Top-1  Top-2  Top-5  Top-10  Top-20
  Single context
    TasC          0.45   0.46   0.46   0.46    0.47
    Structure     0.35   0.37   0.38   0.38    0.39
    Co-change     0.36   0.37   0.38   0.38    0.39
  Combined context
    Task+Struct   0.45   0.46   0.46   0.46    0.47
    Task+Co       0.44   0.45   0.46   0.46    0.47
    All           0.44   0.45   0.46   0.46    0.47

Table 6.4: Fix suggestion accuracy comparison between using task context and using other contexts

(a) Within-project suggestion for fixing changes

                  Top-1  Top-2  Top-5  Top-10  Top-20
  Single context
    TasC          0.49   0.51   0.52   0.52    0.52
    Structure     0.23   0.24   0.26   0.26    0.27
    Co-change     0.24   0.24   0.27   0.27    0.27
  Combined context
    Task+Struct   0.49   0.51   0.52   0.52    0.52
    Task+Co       0.48   0.50   0.51   0.51    0.52
    All           0.48   0.50   0.51   0.51    0.52

(b) Cross-project suggestion for fixing changes

                  Top-1  Top-2  Top-5  Top-10  Top-20
  Single context
    TasC          0.48   0.48   0.48   0.49    0.49
    Structure     0.32   0.33   0.34   0.35    0.35
    Co-change     0.32   0.33   0.34   0.35    0.35
  Combined context
    Task+Struct   0.48   0.48   0.48   0.49    0.49
    Task+Co       0.47   0.48   0.48   0.48    0.49
    All           0.47   0.48   0.48   0.48    0.49

Figure 6.9 shows the first case, which is in project SWGAide, a utility for players of SWG. The test is a change at revision 802. For each change, the upper code fragment is the source and the lower one is the target. In this case, our task model found a candidate change at revision 728 that contains the correct target. The base model Exact could not suggest any target because this source fragment had never appeared before. The other base model, Similar, could find some candidates in the past changes, but none of them contained the correct target. It missed the candidate in Figure 6.9 because the two source fragments differ too much in terms of their code token sequences: one calls the check isEmpty and one checks size against 0. However, those two checks are actually alternative usages for checking whether the set (SWGResourceSet) is empty or not. Both of them are identified by LDA as contributing very similarly to the tasks in the past changes. The concrete values are (3 = 0.014, 7 = 0.007) for isEmpty and (3 = 0.015, 7 = 0.011) for size. In each pair of numbers, the left is the task and the right is the probability/contribution of the token in that task. Thus, even though two code fragments look quite different, they are still considered similar in terms of tasks. The second case is a test in project ONDEX, an open source framework for text mining, data integration and data analysis (Figure 6.10). Again, the base models could not find this candidate because

Test change:
  Source: return v1.isEmpty() ? SWGResourceSet.EMPTY : v1;
  Target: return v1;
Candidate change:
  Source: return v1.size() > 0 ? v1 : SWGResourceSet.EMPTY;
  Target: return v1;

Figure 6.9: A case study in SWGAide

Test change:
  Source: final int v1 = v2.readInt();
  Target: int v1 = v2.readInt();
Candidate change:
  Source: Integer v1 = new Integer(v2.readInt());
  Target: int v1 = v2.readInt();

Figure 6.10: A case study in ONDEX

the code of the two source fragments is different: one uses the modifier final and the primitive type int, and one uses the class Integer with the additional keyword new for class instantiation. However, the tokens final, int and Integer appear in many places across all the tasks; thus, their contributions to the tasks are very low. The concrete values for them are (1 = 0.008, 8 = 0.008, 10 = 0.057) for final, (8 = 0.002, 9 = 0.001, 10 = 0.002) for int, and (6 = 0.001, 9 = 0.001) for Integer. Thus, they do not affect the task similarity between the two sources. TasC could match the two sources and suggest the correct target.

6.5.9 Threats to Validity

We conducted our empirical evaluation with repositories of open-source Java projects. Thus, the results might not generalize to closed-source projects or projects written in other languages. There are also many datasets using other version control systems and/or hosted on other hosting services that we have not covered. We plan to extend our evaluation to include projects hosted on GitHub and written in C/C++ in future work. Our comparison also suffers from the threat that the methods we used to integrate the contexts might not be the most suitable ones.

6.6 Related Work

There are automated approaches to suggest a fix or a change for a given code fragment. Such suggestion is in the context of automated program repair. GenProg Goues et al.(2012) is a patch generation method based on genetic programming. To evolve a program variant through transformation, it reuses the existing program statements in the current program and creates combinations of them. However, it does not consider the change/fixing history in the process. PAR Kim et al.(2013), another program auto-repair technique, derives a patch by mining fixing patterns from prior human-written patches. To derive a patch, it does not use the task context in a fixing history. FixWizard Nguyen et al.

(2010b) suggests a fix for a given fragment based on the similarity of that fragment and other previously fixed code. The similarity is defined based on similar code structures and/or similar API usages. Ray et al. Ray and Kim(2012); Ray et al.(2012)'s Repertoire is a tool to identify ported edits between patches in forked projects. They compare the content of individual patches. LASE Meng et al.(2013) is a tool to automate similar changes from examples. It creates context-aware edit scripts, identifies the locations, and transforms the code. Similar to FixWizard, Repertoire and LASE are based on comparing the code and applying the changes from one place to another. In our prior work Nguyen et al.(2013b), we leverage repeated changes and fixes to suggest a change/fix for a given code fragment. In comparison, in that work, the suggestion tool relies solely on the repeated changes recorded in the past or in other projects. None of those existing approaches consider the task context in recent history. Ying et al. Ying et al.(2004) and Zimmermann et al. Zimmermann et al.(2004) propose approaches to suggest a co-change at the file and function levels. Ying et al. examine the co-changing files in the history and use an association rule mining algorithm to find frequently co-changed files. From such information, they predict possible files for changing when given a newly changed file. Zimmermann et al. support not only co-changes at the file level, but also at the function and field level. They also use an association rule mining algorithm with support and confidence. There are two key differences between TasC and those approaches. First, we aim to suggest at the finer-grained level of the AST. Second, we use a statistical approach, rather than the deterministic pattern mining algorithms in those approaches. Giger et al. Giger et al.(2012) predict the type of changes, e.g., condition changes, interface modifications, inserts or deletions of methods and attributes, or other kinds of statement changes. They use the types and code churn for bug prediction Giger et al.(2011). Other approaches detect the patterns of changes to support software maintenance. SemDiff Dagenais and Robillard(2008) mines the updating patterns to a framework to support automated updating for its client code. MAM Zhong et al.(2010) mines common graph transformations representing the code after migration to learn migration rules. Similarly, LibSync Nguyen et al.(2010a) learns adaptation patterns from client code and uses them to update a program to use a new version of a library. Negara et al. Negara et al.(2014) detect frequent change patterns by using closed frequent itemset mining on a sequence of changes expressed as tree editing operations. Our prior empirical study Nguyen et al.(2013b) on a large-scale dataset has shown that code changes are repetitive. Barr et al. Barr et al.(2014) reported a high degree of graftability, independent of size and type of commits. They also found that changes are 43% graftable from the exact version of the software being changed. Gabel and Su Gabel and Su(2010) found that code also contains much syntactic redundancy: at the level of granularity of 6 tokens, 50–100% of each project is redundant.

6.7 Conclusion

In this chapter, we propose TasC, a novel statistical model that uses the task context of changes in software development history to suggest fine-grained code changes/fixes at the program statement level. We use latent Dirichlet allocation (LDA) to capture the task context, where topics are used to model change tasks. We also propose a novel technique for measuring the similarity of code fragments and code changes using the task context. We conducted an empirical evaluation on a large dataset of 88 open-source Java projects containing more than 200 thousand source files and 3.5 million source lines of code (SLOC) in their last revisions. Our results show that TasC improves the suggestion accuracy relatively by up to 130%–250% in comparison with the base models that do not use contextual information. Compared with other types of contexts, it outperforms the models using structural and co-change contexts.

CHAPTER 7. GRAPH-BASED API ADAPTATION

Reusing existing library components is essential for reducing the cost of software development and maintenance. When library components evolve to accommodate new feature requests, to fix bugs, or to meet new standards, the clients of software libraries often need to make corresponding changes to correctly use the updated libraries. Existing API usage adaptation techniques such as CatchUp! or SemDiff support simple adaptations such as replacing the target of calls to a deprecated API, but cannot handle complex adaptations such as creating a new object to be passed to a different API method or adding exception handling logic that surrounds the updated API calls. This chapter presents LibSync, which guides developers in adapting API usage code by learning complex API usage adaptation patterns from other clients that have already migrated to a new library version (and also from the API usages within the library's test code). LibSync uses several graph-based techniques: (1) to identify changes to API declarations by comparing two library versions, (2) to extract associated API usage skeletons before and after library migration, and (3) to compare the extracted API usage skeletons to recover API usage adaptation patterns. Using the learned adaptation patterns, LibSync recommends the locations and edit operations for adapting API usages. The evaluation of LibSync on real-world software systems shows that it is highly correct and useful, with a precision of 100% and a recall of 91%.

7.1 Introduction

Reusing existing software components by accessing their implementations through their Application Programming Interfaces (APIs) can reduce the cost of software development and maintenance. When libraries provide their functionality through public interfaces (e.g., types, methods, and fields in Java), clients are expected to respect the contract assumed by the libraries by using the correct names of the APIs, passing the right arguments, following the intended temporal orders of API invocations, etc. When library components evolve to accommodate new feature requests, to fix bugs, and to meet new standards, changes in API declarations in libraries could cause existing clients to break. For example, when the names of APIs change through renaming and moving of the APIs, client code may not compile. When an API signature modification requires more input parameters or a different return type, clients need to pass additional input arguments or to process a returned object differently. Existing analysis techniques that can be used for adapting API usage code in client applications have the following limitations. First, existing research techniques such as CatchUp! Henkel and Diwan (2005) and MolhadoRef Dig et al.(2007) require library maintainers and client application developers to use the same development environment to record and replay refactorings. Other techniques require library developers to manually write expected adaptations in client code as rules Chow and Notkin (1996). Second, existing API usage modeling and extraction techniques Wasylkowski et al.(2007); Engler et al.(2001); Acharya et al.(2007); Williams and Hollingsworth(2005) are limited by simplified representations such as a sequence of method calls. Thus, they cannot capture the complex control and data dependencies surrounding the use of APIs. For example, SemDiff Dagenais and Robillard(2008) models API usages in terms of method calls, so it can support changing the target of calls to modified APIs but cannot add the control structure that surrounds the calls to a new replacement API. Hypothesizing that changes to API usage caused by the evolution of library components may involve complex changes, we developed a set of graph-based models and algorithms that can capture updates in evolving libraries and updates in client applications that are associated with changes in the libraries, and an algorithm that generalizes common edit operations from a set of API usage code fragments before and after library migration. Our API usage code adaptation framework takes as input the current version of a client application, both the old version and the new version of a library under focus, and a set of programs that already migrated to the new library version. Our framework consists of four main components: (1) an origin analysis tool (OAT) that maps corresponding code elements between two versions, (2) a client API usage extractor (CUE) that extracts API usage skeletons from client code and the use of APIs within the library, (3) an API usage adaptation miner (SAM) that automatically infers adaptation patterns from a set of API usage skeletons before and after the migration from the old library version to the new library version, and (4) LibSync, which recommends which API usage code needs to be adapted and how those code fragments need to be updated by locating API usage fragments in the client that need to be adapted and suggesting edit operations required for adaptation.
OAT is an origin analysis tool that maps corresponding code elements between two program versions. It is used for two different purposes: to detect changes to API declarations provided by libraries and to map corresponding API usage code fragments between two versions of client code. Details of OAT are described in Section 5.4.1.

CUE extracts the skeleton of API usage code in client systems and the test code of the APIs. We make the assumption that client systems use library components by accessing their APIs via invocation—directly calling API methods or instantiating objects from API classes—and via inheritance—declaring classes in the client by extending (subtyping) API classes. We call these two ways of using APIs API i-usage (invocation) and API x-usage (extension). We also use the term API usage to refer to both types of API usage. In particular, CUE extends Nguyen et al.'s graph-based object usage model (GROUM) Nguyen et al.(2009c) that represents both control and data dependencies among method calls and field accesses. In order to capture API usage, we extended this GROUM model to explicitly note the usages of external APIs via invocation and extension by modeling the types of objects passed to APIs as arguments and by modeling overriding and inheritance relationships between client methods and methods provided by external API types. In particular, CUE represents API i-usages by an invocation-based, graph-based object usage model, iGROUM, in which action nodes represent method invocations, control nodes represent surrounding control structures, and data nodes represent uses of types provided by external libraries. The edges represent dependencies between the nodes, for example, usage orders, input and output relations, etc. CUE represents API x-usages by another graph-based model called xGROUM, in which each node represents a method header and two kinds of edges represent overriding and inheritance relationships between a method header in the client and a method header in the library.

For example, if method mc in a client class overrides method ml in a library, then a call to ml would directly invoke mc or its overriding implementation mc' in its subtypes. Thus, when ml's declaration changes (e.g., changing a parameter type, adding a parameter, or throwing an exception), the declaration of mc should be changed accordingly to properly override ml. SAM uses an approximate graph alignment and differencing algorithm to map nodes between two usage models based on the similarity of their labels and neighborhood structures, and calculates the editing operations based on the alignment. For example, aligned nodes having different attributes are considered replaced or updated, while unaligned nodes are considered deleted or added. Since API usage changes are detected as change sets of editing operations, SAM mines the frequent subsets of such change sets using a frequent itemset mining algorithm Agrawal and Srikant(1994) and considers them as API usage adaptation patterns. LibSync has a knowledge base of API usage adaptation patterns for each library version. Given a client system and the desired library version to migrate to, LibSync identifies the locations of API usages in the client system that are associated with changed APIs. It then matches each usage with the best-matched API usage pattern in its knowledge base and derives the edit operations for adapting the API usages.

We have conducted an empirical evaluation of LibSync on three large open-source subject systems, each using up to 300 libraries. We have done several experiments to evaluate the correctness and usefulness of our tool in two usage scenarios: (a) API usage adaptation in different locations within a client program, and (b) adaptation in different branches of a client program (e.g., back-porting). The evaluation shows that CUE detects changes to API usages with a precision of 93%, and LibSync provides useful recommendations in most cases, even when API usage adaptation involves complex changes to the control logic surrounding API usages. The key contributions of the chapter include:

1. CUE, a graph-based representation that models the context of API usages by capturing control and data dependencies surrounding API usages; in particular, it supports API usages via method invocations and subtyping of the API types provided by an external library under focus.

2. SAM, a graph alignment algorithm that identifies API usage changes in client applications and an API usage adaptation pattern mining algorithm that generalizes common edit operations from multiple API usage skeletons before and after migration.

3. LibSync, a tool that takes as input a client system and a given library version to be migrated to, and recommends the locations and edit operations for adapting API usage code in the client.

4. An empirical evaluation for the correctness and usefulness of LibSync in adapting API usage code.

Section 7.2 presents motivating examples that require complex API usage adaptation in client applications caused by updates to libraries. Sections 7.3, 7.4, and 7.5 detail the individual models and algorithms that we have developed to build LibSync. Section 7.6 shows the empirical evaluation of LibSync. Section 7.7 describes related work, and Section 7.8 summarizes this chapter's contributions.

7.2 Motivating Examples

jBoss subversion(a) is a large project that has been under development for more than 6 years, with 47 releases. It has about 40,000 methods and uses up to 262 different libraries. Using Subversion subversion(b) and the code search functionality in Eclipse, we manually scanned the version history of jBoss and the external libraries used by jBoss. We examined more than 200 methods that changed due to the modification of external APIs, based on associated documentation, change logs, and bug reports. This section presents representative API usage adaptation examples that motivate our graph-based technique for supporting API usage adaptations.

XYSeries set = new XYSeries(attribute, false, false);             // third argument added: the two-argument constructor is deprecated
for (int i = 0; i < data.size(); i++)
    set.add(new Integer(i), (Number) data.get(i));
DefaultTableXYDataset dataset = new DefaultTableXYDataset(false); // the deleted argument "set" is replaced by the added boolean "false"
dataset.addSeries(set);                                           // added call
JFreeChart chart = ChartFactory.createXYLineChart(..., dataset, ...);

Figure 7.1: API usage adaptation in jBoss caused by the evolution of jFreeChart

7.2.1 Examples of API usage via method invocations

Figure 7.1 illustrates an API usage adaptation example in jBoss with respect to the use of the jFreeChart library; the added and deleted code is annotated with comments in the figure. The changes from jBoss version 3.2.7 to 3.2.8 were due to the modification of the external APIs XYSeries and DefaultTableXYDataset in jFreeChart from version 0.9.15 to 0.9.17. To enable a new auto-sorting feature, the XYSeries constructor with two input arguments is deprecated and a new constructor with three input arguments is provided instead. The DefaultTableXYDataset constructor that accepts an XYSeries as an input is also deprecated. The new constructor accepts a boolean input to activate the new auto-pruning feature for data points in DefaultTableXYDataset. It implies that the XYSeries object must be added after the initialization of the DefaultTableXYDataset object. Thus, the parameter set is replaced by the value false, and a call to DefaultTableXYDataset.addSeries is added. This example illustrates the following:

1. jBoss uses jFreeChart by creating objects from API classes (e.g., XYSeries, DefaultTableXYDataset) and calling API methods (e.g., DefaultTableXYDataset.addSeries, ChartFactory.createXYLineChart). Since an object instantiation is represented as a constructor call to an external API type, we consider this type of API usage as usage via invocation.

2. API usage must follow specific protocols due to the dependencies between API elements. For example, a DefaultTableXYDataset object needs to be created before any XYSeries object can be added to the set. A chart needs to be created with an object of Dataset.

3. As an API evolves, such usage protocols could change, requiring corresponding API usage adaptations. For example, calls to deprecated methods are replaced with newly provided ones, new method calls are added, etc. In this example, the following edit operations occurred in the client code: replacement (e.g., the constructor of XYSeries), addition (e.g., DefaultTableXYDataset.addSeries), and update of input/output dependencies (e.g., the object XYSeries no longer immediately depends on the DefaultTableXYDataset constructor but instead on DefaultTableXYDataset.addSeries).

This example shows that existing state-of-the-art adaptation approaches (e.g., SemDiff Dagenais and Robillard(2008), CatchUp Henkel and Diwan(2005)) could not support this API usage adaptation because

SnmpPeer peer = new SnmpPeer(this.address, this.port, this.localAddress, this.localPort); // constructor call replaced: new four-parameter constructor
peer.setPort(this.port);             // deleted
peer.setServerPort(this.localPort);  // deleted

Figure 7.2: API usage adaptation in jBoss caused by the evolution of OpenNMS

Change in Apache Axis API:

package org.apache.axis.encoding;
class Serializer ... {
    public abstract Element writeSchema(Class c, Types t) ...   // return type changed from boolean to Element; Class parameter added
    ...

Change in jBoss:

package org.jboss.net.jmx.adaptor;
class AttributeSerializer extends Serializer {
    public Element writeSchema(Class clazz, Types types) ...    // overriding method adapted accordingly
    ...

class ObjectNameSerializer extends Serializer {
    public Element writeSchema(Class clazz, Types types) ...    // overriding method adapted accordingly
    ...

Figure 7.3: API usage adaptation in jBoss caused by the evolution of Axis

they assume that the adaptation needed in client code is simply individual method-call replacements or type declarations. They do not consider the context of API usages, the dependencies between method calls, or the differences between the extension and invocation of API methods. Figure 7.2 shows another API usage adaptation example. From version 1.6.10 to 1.7.10 of the OpenNMS library, a new constructor with four parameters is added for initializing SnmpPeer. Such an API change requires adding a call to the new constructor and removing two subsequent calls to setter methods, as in Figure 7.2. This adaptation from jBoss version 3.2.5 to 3.2.6, although simple in meaning, is complex in terms of edit operations: it involves one constructor-call replacement and two method-call deletions. Importantly, all edited calls are dependent. SemDiff Dagenais and Robillard(2008) could suggest the replacement of the old constructor call; however, it does not suggest the setter method-call deletions because it does not consider the API usage context when recommending adaptations.

7.2.2 Examples of API usage via inheritance

Figure 7.3 shows an API usage example via inheritance. The API class Serializer in the org.apache.axis.encoding package provides the writeSchema method. The AttributeSerializer class in jBoss inherits from the Serializer class and overrides the writeSchema method. When the input signature of writeSchema is changed by requiring a Class type argument and returning Element instead of boolean, the signature of the overriding method needs to be updated accordingly to properly override the writeSchema method.

Change in Apache Axis API:

package org.apache.axis.providers.java;
class EJBProvider ... {
    protected Object makeNewServiceObject(...)   // renamed from getNewServiceObject
    ...

Change in jBoss:

package org.jboss.net.axis.server;
class EJBProvider extends org.apache.axis.providers.java.EJBProvider {
    protected Object makeNewServiceObject(...)   // overriding method renamed accordingly
    ...

Figure 7.4: API usage adaptation in jBoss caused by the evolution of Axis

Figure 7.4 shows another example. Class C=EJBProvider in jBoss inherits from the class with the same name, A=EJBProvider, in the Apache Axis library. The method m=getNewServiceObject of A was renamed to makeNewServiceObject. Thus, its overriding method C.m is also renamed accordingly.

7.2.3 Observations

We make the following observations based on the API usage adaptation examples. First, in object-oriented programming (OOP), there are two common ways to use API functionality: (1) via method invocation, directly calling API methods or creating objects of API classes; and (2) via inheritance, declaring classes in client code that inherit from the API classes and override their methods. Second, to use APIs correctly, client code must follow a specific order of method calls or override certain methods. Thus, an API usage model and an API usage adaptation model must capture the complex context surrounding API usages: (1) data and ordering dependencies among API usages, (2) control structures around API usages, and (3) the interaction among multiple objects of different types. Those observations imply the necessity of a recommendation tool that helps developers in complex API usage adaptation to cope with evolving libraries. The tool should provide recommendations regarding where and how to do API usage adaptation. That is, given a client program using libraries and the changes to external API declarations, the recommendation tool should identify the locations where API usage adaptations are required and suggest adaptation edit operations.

7.3 Client API Usage Extractor (CUE)

This section describes CUE, a client API usage extractor, that extracts the skeleton of API usage code from client, test, and demo code of the APIs. Section 7.3.1 presents the model and extraction algorithm for API usages via invocation. Section 7.3.2 presents the model and extraction algorithm for API usages via inheritance.

7.3.1 API Usage via Invocation

7.3.1.1 i-Usage Model

An API provides its functionality via its elements. Those elements provide computation (via methods) or storage of data (via objects). Thus, to use a function provided by an API via invocation, a client program could call the computational elements (e.g., invoking a method) or process the data elements (e.g., initializing an object, using it as an input/output parameter). When several API methods/objects are used, the relations, e.g., the orders and dependencies, between those elements are important because they must follow the intended API usage specifications. Such usages are often related to control structures (such as if, while) due to the branching or repetition of the computation and data processing. CUE represents the API i-usages in clients via a graph-based model called iGROUM (invocation-based, GRaph-based Object Usage Model). In general, each usage is represented by a labeled, directed, acyclic graph in which the usages of API elements are represented as nodes, while the dependencies are modeled by edges. An action node represents a method invocation (i.e., a usage of an API computation element). A data node represents an object (i.e., a usage of an API data element). The label of each node is the fully qualified name and the signature of the corresponding method or class. An edge from an action node to another node represents control and data dependencies. An edge from a data node to an action node shows that the corresponding object is used as an input of the corresponding call. Similarly, an edge with the opposite direction shows an output relation. Action nodes have some attributes to represent their input signature (e.g., a list of parameter types, modifiers, exceptions that could be thrown, a return type).

Definition 7.1 (iGROUM) An invocation-based, graph-based object usage model is a directed, labeled, acyclic graph in which:

1. Each action node represents a method call;
2. Each data node represents a variable;
3. Each control node represents the branching point of a control structure (e.g., if, for, while, switch);
4. An edge connecting two nodes x and y represents the control and data dependencies between x and y; and
5. The label of an action, data, control, or operator node is the name, data type, or expression of the corresponding method, variable, control structure, or operator, along with the type of the corresponding node.

[Graph figure omitted: panel (a) shows the API i-usage model before migration, with nodes such as the XYSeries constructor, ArrayList.size, a FOR control node, ArrayList.get, Integer, Number, XYSeries.add, and the DefaultTableXYDataset constructor; panel (b) shows the model after migration, with new boolean data nodes and the added DefaultTableXYDataset.addSeries node.]

Figure 7.5: API i-Usage models in jBoss before and after migration to a new jFreeChart library version

Using this model, CUE represents the usage of computation and data API elements (via action and data nodes), the use of control structures (via control nodes), and the control and data dependencies between them (e.g., orders and conditions among calls, inputs/outputs, use of shared data). Figure 7.5 shows two graph-based API usage models extracted from the code in Figure 7.1. The usage changes between the two models are illustrated by the gray nodes with bold edges. For simplicity, in the figure, a label is displayed with only class and method names, even though our model actually retains the fully qualified class name and the signature of a method. In Figure 7.5b, an edge from the action node y'=DefaultTableXYDataset. to the action node z'=DefaultTableXYDataset.addSeries represents that y' is used before z'. An edge from the action node x=XYSeries. to the data node s=XYSeries shows that s is used to store the output of x. An edge coming out of s changes its target from y to z'. That is, s' is now used as an input to z' instead of y'. Note that x and x' represent different API elements: x is a deprecated constructor with two parameters while x' is a new constructor with three parameters. The figure also shows a for loop related to the invocation of method XYSeries.add.
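As a concrete (hypothetical) illustration of Definition 7.1, an iGROUM could be represented with data structures along the lines below; the class and field names are ours, and the real model carries more attributes (e.g., signatures, modifiers, exceptions) than shown.

import java.util.ArrayList;
import java.util.List;

// Illustrative data structures for an iGROUM per Definition 7.1.
// Node kinds: ACTION (method call), DATA (variable), CONTROL (branching point).
final class IGroum {

    enum NodeKind { ACTION, DATA, CONTROL }

    static final class Node {
        final NodeKind kind;
        final String label;   // fully qualified name + signature, type name, or control keyword
        Node(NodeKind kind, String label) { this.kind = kind; this.label = label; }
    }

    // A directed edge captures a control/data dependency (e.g., usage order, input, output).
    record Edge(Node from, Node to) {}

    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Node addNode(NodeKind kind, String label) {
        Node n = new Node(kind, label);
        nodes.add(n);
        return n;
    }

    void addEdge(Node from, Node to) {
        edges.add(new Edge(from, to));
    }
}

For instance, the model of Figure 7.5b would contain an action node labeled with the DefaultTableXYDataset constructor, an action node for DefaultTableXYDataset.addSeries, and an edge between them recording the usage order.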

7.3.1.2 i-Usage Extraction

CUE extends our prior work (Nguyen et al.'s graph-based object usage model extraction Nguyen et al.(2009c)) to build API usage models from each method in client code. It parses the source code into Abstract Syntax Trees (ASTs), traverses the trees to analyze the AST nodes of interest such as method invocations, object declarations and initializations, and control statements (e.g., if, while, for) within a method, and builds the corresponding action, data, and control nodes along with the control and data dependencies between them. Static methods, type casting, and type checking operations of a class are considered special invocations of the corresponding objects. After extraction, CUE removes all action and data nodes and the edges that do not represent the usages of API elements or have no dependencies with those API elements. In other words, CUE determines a sub-graph of the original object usage model that is relevant to the usage of API elements by performing program slicing from the API usage nodes via control and data dependency edges. Moreover, since a particular API could be used by multiple methods in client code, CUE uses a set of iGROUM models to represent API i-usages. While building an iGROUM, CUE also takes into account subtyping information, which is described in detail in Section 7.3.2. CUE uses the inheritance information of the system to create nodes and labels more precisely. For example, if a method C.m is called in an iGROUM, CUE checks whether C.m is inherited from a method A.m, i.e., C.m is not explicitly declared in the body of the class C. If that is the case, the action node corresponding to the call would be a node with the label built from A.m, rather than from C.m. If C.m overrides A.m, the label is built from C.m. Furthermore, CUE also performs a simple intra-procedural analysis on object instantiation, assignment, and type casting statements to keep track of the types of variables used within a method. For example, if it encounters a method call o.m with o being an object declared with type C, and later finds that o is cast into an object of class C', then the label of the action node for o.m is built from C'.m, rather than C.m.
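The pruning step can be viewed as a reachability slice over the dependency edges, roughly as in the following sketch; the generic node type and neighbor function are our simplifications of the actual GROUM structures and slicing procedure.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Simplified view of the pruning: starting from the nodes that use API elements,
// keep every node reachable over control/data dependency edges; everything else
// is dropped from the usage model.
final class ApiSlice {

    static <N> Set<N> sliceToApiUsage(Set<N> apiUsageNodes, Function<N, Set<N>> neighbors) {
        Set<N> kept = new HashSet<>(apiUsageNodes);
        Deque<N> worklist = new ArrayDeque<>(apiUsageNodes);
        while (!worklist.isEmpty()) {
            N node = worklist.pop();
            for (N neighbor : neighbors.apply(node)) {
                if (kept.add(neighbor)) {   // not seen before: keep it and explore further
                    worklist.push(neighbor);
                }
            }
        }
        return kept;   // nodes outside this set, and their edges, are removed from the model
    }
}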

7.3.2 API Usage via Inheritance

This section presents our graph-based representation for API usages via inheritance and the corresponding extraction procedure.

7.3.2.1 Method Overriding and Inheritance

Assume that class C in client code directly inherits from an API class A. Method C.m overrides a non-static method A.m if C.m is declared in class C and has the same signature as A.m. In OOP, method A.m is not considered to be overridden in C when a method C.m with the same signature as A.m is not explicitly declared in C. However, because C.m could still be invoked, CUE still considers that C.m exists and inherits from A.m. If A.m and C.m are static, CUE does not consider that C.m overrides A.m because they are called based on their declaring types. If A.m is static and C.m is not explicitly declared in C, CUE does not consider the existence of C.m.
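These rules can be summarized as a small decision procedure, sketched below for a client class C that directly extends an API class A; the enum and parameter names are hypothetical and only restate the rules above.

// Illustrative encoding of CUE's overriding/inheritance rules.
final class XUsageRules {

    enum Relation { OVERRIDES, INHERITS, NONE }

    static Relation relate(boolean declaredInC,     // C declares a method with A.m's signature
                           boolean apiMethodStatic) // A.m is static
    {
        if (apiMethodStatic) {
            // Static methods are dispatched by declaring type: no override is recorded,
            // and an undeclared C.m is not considered to exist at all.
            return Relation.NONE;
        }
        if (declaredInC) {
            return Relation.OVERRIDES;  // non-static, same signature, declared in C
        }
        // Not declared in C, but C.m could still be invoked, so CUE treats it
        // as existing and inheriting A.m (an i-node in the xGROUM).
        return Relation.INHERITS;
    }
}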

[Graph figure omitted: panels L) Axis_old and L') Axis_new show the Axis class hierarchy under org.apache.axis.providers.java (BasicProvider, JavaProvider, BSFProvider, ComProvider, MsgProvider, EJBProvider, RPCProvider) with methods getNewServiceObject/makeNewServiceObject, getStrOption, getServiceClass, getEJBHome, and getContext, annotated with the edits change visibility, rename, and add parameter; panels C) jBoss_old and C') jBoss_new show the client class org.jboss.net.axis.server.EJBProvider with the corresponding overriding methods and generateWSDL(Context), annotated with rename and add parameter.]

Figure 7.6: API x-Usage models in jBoss before and after migration to a new Axis library version

1. If C.m inherits A.m, a call to C.m will be a call to A.m. Thus, if A.m is changed, not only do the calls to A.m need to be adapted in response to the change of A.m, but all the calls to C.m also need to be considered for adaptation. For example, if A.m has a newly added parameter, all method calls to A.m and C.m must be considered for the adaptation of adding a new parameter. Otherwise, the program might not compile, or such calls might be accidentally dispatched as calls to another method that has the same signature as A.m (e.g., a parent method Ao.m of A.m).

2. If C.m overrides A.m, they need to have the same signature. Thus, if A.m is changed, C.m needs to be considered for a corresponding change (see Figures 7.3 and 7.4).

3. A call to A.m might be a call to C.m at run time due to dynamic dispatch. Thus, if C.m is changed, not only are all the calls to C.m and C1.m, with C1 being a descendant class of C, considered for corresponding adaptation, but all calls to A.m must also be taken into consideration.

In CUE, the overriding and inheritance relationships are defined in the same way as above among the methods of two API classes A and A1 in which A1 inherits from A, and among the methods of two client classes C and C1 in which C1 inherits from C.

7.3.2.2 x-Usage Model

Now, let us describe the model and extraction algorithm for API usages via inheritance. CUE uses xGROUM (Extension-based, GRaph-based Object Usage Model) to represent all API x-usages in the client system and all its libraries, considering each library a sub-system of the client system under investigation. An xGROUM is a directed, labeled, acyclic graph in which each node represents a class or a method in the client system and its libraries. The label of a node is its fully qualified name and signature. Interfaces are considered special classes. Edges between class nodes represent subtyping relations. Edges from class nodes to method nodes represent containment relations. Between method nodes, there are two kinds of edges: o-edge (overriding) and i-edge (inheriting):

• An o-edge from a node C.m to A.m shows that C.m overrides A.m. This means that C inherits from A, and C.m is declared in C and has the same signature as A.m.

• An i-edge from a node C.m to A.m shows that C.m inherits from A.m. This means that C inherits from A, and C.m is not explicitly declared in C even though C.m could be invoked and has the same signature as A.m. C.m is called an i-node in xGROUM, and other method nodes are called o-nodes.

Figure 7.6 illustrates the xGROUM for Figure 7.4. Figures 7.6L and 7.6C show the API class A=EJBProvider in package org.apache.axis.providers.java and the client class C=EJBProvider in package org.jboss.net.axis.server. Figures 7.6L' and 7.6C' show those two classes in their new versions. The o-edges and o-nodes, such as C.getNewServiceObject, are illustrated with solid double lines, meaning that they are of interest. The i-nodes such as C.getEJBHome and the i-edges are shown in dashed lines, meaning that they are just placeholders and not really declared or created. Added nodes such as A.getContext(Properties) are painted in gray. Updated nodes are represented in double lines along with bi-directional arrows between them in the graphs of the two versions. For example, an arrow with the label rename from node A.getNewServiceObject (in Figure 7.6L) to A.makeNewServiceObject (in Figure 7.6L') shows that those two nodes represent a renamed method. An arrow from node A.getServiceClass(Context, String) in Figure 7.6L to node A.getServiceClass(String, SOAP, Context) in Figure 7.6L' signifies the change in the parameter list of the corresponding method. As shown in Figure 7.6C', class C in jBoss is adapted accordingly to the changes to class A in Axis. To build an xGROUM, CUE extends the inheritance hierarchy by adding o-edges between methods. The i-nodes and i-edges are not explicitly created, but are computed on demand. Note that only one xGROUM is built for the entire client system and its libraries.

1  function GroumDiff(U, U')                                  // align and diff two usage models
2    for all u ∈ U, v ∈ U'                                    // calculate similarity based on label and structure
3      sim(u, v) = α · lsim(u, v) + β · nsim(u, v)
4    M = MaximumWeightedMatching(U, U', sim)                  // matching
5    for each (u, v) ∈ M:
6      if sim(u, v) < λ then M.remove((u, v))                 // remove too-low matches
7      else switch                                            // derive change operations on nodes
8        case Attr(u) ≠ Attr(v): Op(u) = "replaced", Op(v) = "replaced"
9        case Attr(u) = Attr(v), nsim(u, v) < 1: Op(u) = "updated"
10       default: Op(u) = "unchanged"
11   for each u ∈ U, u ∉ M: Op(u) = "deleted"                 // un-aligned nodes are
12   for each v ∈ U', v ∉ M: Op(v) = "added"                  // deleted or added
13   Ed = EditScript(Op)
14   return M, Op, Ed

Figure 7.7: API Usage Graph Alignment Algorithm

7.4 Usage Adaptation Miner (SAM)

Our approach uses iGROUMs to represent API i-usages in any client code as well as in the library’s test and demo code. Thus, the adaptation of API usages could be modeled as a generalization of changes to the corresponding individual iGROUMs. This section describes our API usage adaptation miner, SAM, that uses a graph alignment algorithm to identify API i-usage changes that are caused by changes to APIs, and a mining algorithm that generalizes common edit operations from multiple API usage changes to find API usage adaptation patterns. LibSync uses such patterns to recommend the locations and edit operations.

7.4.1 i-Usage Change Detection

Using OAT (Section 5.4.1), given two versions i and i' of a client program P, LibSync derives the sets ∆L and ∆P containing the changed entities (including packages, classes, and methods) of the library and the client program, respectively. It is able to align such code entities between two versions as well. Thus, for any method m ∈ ∆P, LibSync builds two iGROUMs U and U' for m in the two corresponding versions. Then, it uses GroumDiff, our graph-based alignment and differencing algorithm, to find the changes between the corresponding usage models U and U'. Our graph alignment algorithm, GroumDiff, maps the nodes between two iGROUMs such that the aligned nodes represent the unchanged, updated, or replaced nodes while unmapped nodes represent the added/deleted nodes. Let us illustrate the pseudo-code of our GroumDiff algorithm (Figure 7.7) via the example in Figure 7.5. The mapped nodes in Figure 7.5 would be the ones with identical labels

(e.g., x and x', y and y'). New nodes like z', b1, and b2 would not be mapped. As we can see, mapped nodes tend to have identical or highly similar labels and structures. For example, unchanged API elements would have identical names; replaced ones tend to have similar names; and both types tend to have similar neighborhood structures with the others. The idea of the GroumDiff algorithm is to map the nodes between two graphs based on the similarity of their labels and of their neighborhood structures with other nodes. The similarity of node labels, lsim(u, v), is based on string-based Levenshtein distance Hunt and Szymanski(1977). It also takes into account renamed API elements: the similarity level of the labels representing renamed or moved API elements is set as high as that for unchanged ones. Neighborhood structures of nodes are approximated by Exas characteristic vectors Nguyen et al.(2009a); thus, their similarity measurement nsim(u, v) is based on the distance of such vectors. GroumDiff calculates and combines the similarity of labels and neighborhood structures of all pairs of nodes u and v between the two graphs as sim(u, v) = α · lsim(u, v) + β · nsim(u, v) (line 3). Since each node u in a graph should be mapped to only one node v that has the highest possible similarity, GroumDiff finds the maximum weighted matching on such nodes using the calculated similarity values as weights (line 4). The resulting pairs of matched nodes are the alignment results (lines 7-10). Matched nodes having little similarity are reported as unmapped nodes (lines 11-12). Then, based on the alignment results, SAM derives a sequence of graph edit operations. That is, the un-aligned (un-mapped) nodes are considered added/deleted (lines 11-12). Aligned nodes with different labels, or with the same labels but different structures or attributes, are considered updated/replaced (lines 8-9). Other nodes are considered unchanged (line 10). From this information, GroumDiff derives an edit script to describe the changes as a sequence of graph operations (line 13). This edit script is then used to mine the API usage adaptation patterns.
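The mapping step can be approximated as in the sketch below. Note the simplifications: the actual GroumDiff computes nsim from Exas characteristic vectors and solves a maximum weighted matching, whereas this sketch takes both similarity functions as inputs and uses a greedy best-pair approximation; all names here are ours.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Greedy approximation of the node alignment in Figure 7.7.
final class GroumDiffSketch {

    interface NodeSimilarity {
        double of(String u, String v);
    }

    static Map<String, String> matchNodes(List<String> oldNodes, List<String> newNodes,
                                          NodeSimilarity lsim, NodeSimilarity nsim,
                                          double alpha, double beta, double lambda) {
        record ScoredPair(String u, String v, double sim) {}
        List<ScoredPair> pairs = new ArrayList<>();
        for (String u : oldNodes) {
            for (String v : newNodes) {
                double sim = alpha * lsim.of(u, v) + beta * nsim.of(u, v);
                if (sim >= lambda) {               // drop too-low matches up front
                    pairs.add(new ScoredPair(u, v, sim));
                }
            }
        }
        pairs.sort(Comparator.comparingDouble(ScoredPair::sim).reversed());
        Map<String, String> mapping = new HashMap<>();
        Set<String> usedNew = new HashSet<>();
        for (ScoredPair p : pairs) {
            if (!mapping.containsKey(p.u()) && !usedNew.contains(p.v())) {
                mapping.put(p.u(), p.v());         // aligned: unchanged, updated, or replaced
                usedNew.add(p.v());
            }
        }
        return mapping;                            // unmapped old nodes -> deleted; unmapped new nodes -> added
    }
}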

Let us revisit Figure 7.5. GroumDiff aligns nodes with identical names in Figures 7.5a and 7.5b. Node z' = DefaultTableXYDataset.addSeries and the two boolean nodes b1 and b2 are not mapped; thus, they are considered added. The nodes x and x' (the XYSeries constructor calls) are replaced. The node s=XYSeries is updated because its neighboring nodes changed. Thus, the derived edit script is

Replace  XYSeries.(..., boolean)            →  XYSeries.(..., boolean, boolean)
Replace  DefaultTableXYDataset.(XYSeries)   →  DefaultTableXYDataset.(boolean)
Add      DefaultTableXYDataset.addSeries(XYSeries)

Improvement. To improve the alignment accuracy and to deal with renamed nodes, SAM uses OAT to find API methods and classes whose declarations changed. Then, it makes the action nodes representing the calls to them have the same labels in the two iGROUMs under comparison. That is, if m is updated into m' in the library through renaming, the label of an action node representing an invocation of m' is replaced by the label built from m. Note that those two nodes must also have similar neighborhoods.

protected JFreeChart createXyLineChart() throws JRException {
    ChartFactory.setChartTheme(StandardChartTheme.createLegacyTheme()); // added statement
    JFreeChart jfreeChart = ChartFactory.createXYLineChart(..., getDataset(), ...);
    ...
    return jfreeChart;
}

Figure 7.8: API Usage Changes in JasperReport

In brief, SAM uses the knowledge of the origin analysis result to improve the alignment of nodes in the corresponding iGROUMs.

7.4.2 x-Usage Change Detection

Changes to an xGROUM are detected by OAT and represented as editing operations: (1) Add/Delete nodes and edges, e.g., a new class is added, a method is deleted, or an overriding edge changes its target method; (2) Replace/Update nodes and edges, e.g., an edge is changed from an i-edge to an o-edge when a new method overrides a parent method, and it is changed from an o-edge to an i-edge when an overriding method is deleted. Note that when the signature of a method C.m is changed into C.m' that overrides some parent method A.m', SAM considers this change as the addition of a new o-node for C.m'; the old node C.m, having the same signature as A.m, will become an i-node.

7.4.3 Usage Adaptation Pattern Mining

Given a library L and a client system P, SAM identifies the locations and edit operations required to adapt API usages when migrating to version i of L. Since individual API usages can have different edit operations between two corresponding iGROUMs, our goal is to find a common subset of edit operations that occur frequently among multiple API usages; we call such a frequent set of edit operations an adaptation pattern. For example, JasperReport version 3.5.0 migrated to use jFreeChart API version 1.0.12. Analyzing JasperReport's code, we found that the addition of the invocation statement ChartFactory.setChartTheme(StandardChartTheme.createLegacyTheme()); before the call to ChartFactory.create*Chart occurs in 53 methods. That is, jFreeChart at version 1.0.12 has a new feature, which specifies the style or theme of a chart object. This new feature requires that the instantiation of a chart object create a ChartTheme object first. jFreeChart's ChartFactory, the factory class for creating chart objects, now has a new method ChartFactory.setChartTheme to set the theme for a chart object.

1  function ChangePattern(∆Pi, ∆Li)                 // mining usage change patterns
2    for each (U, U') ∈ UsageChange(∆Pi, ∆Li)       // compute usage changes
3      Add GroumDiff(U, U') into E                  // add to dataset of sets of operations
4    F = MaximalFrequentSet(E, σ)                   // mine maximal frequent subsets of edits
5    for each f ∈ F:
6      Find U, U' : f ⊂ GroumDiff(U, U')            // find the usages changed by f
7      Extract (Uo(f), U'o(f)) from (U, U')         // extract reference models
8      Add (Uo(f), U'o(f)) into Ref(f)              // add to the reference set for f
9    return F, Ref

Figure 7.9: Adaptation Pattern Mining Algorithm

jFreeChart also provides a class StandardChartTheme as the default implementation of ChartTheme, which has a method named StandardChartTheme.createLegacyTheme() to create and return a ChartTheme that does not apply any changes to the JFreeChart defaults.

Mining Algorithm. The algorithm to recover the API i-usage adaptation patterns is showed in

Figure 7.9. It receives two inputs: a set ∆Li of API elements changed before or at version i and a set

∆Pi of program entities in client code changed after migration to the version i of L. ∆Li is computed

by applying OAT to the version history of the library L backward from the version i. Similarly, ∆Pi is

computed by running OAT on two versions of the client program before and after migration to Li.

The first step is to determine all API i-usages that changed due to the changes ∆Li

(UsageChange(∆Pi,∆Li) in line 2). This step is necessary since some i-usage changes are irrelevant to the API changes. To do that, SAM determines all methods in both L and P that are affected by the

change in Li by using the information produced from the location detection algorithm (Section 7.5.2.1 will detail this algorithm). More specifically, it uses the output of that algorithm, i.e. the change set

IU(P, ∆Li) that contains the methods and classes in the client code and library that are affected by the API’s changes at version i via method overriding and inheritance relations. Then, SAM removes API

i-usage changes that have nothing to do with the set IU(P, ∆Li). Next, for each such usage change, SAM extracts from the corresponding surrounding code the pair of usage models (U, U′) before and after the change at version i (line 2). To do this, for each changed method m ∈ ∆Pi containing such usage changes, SAM builds the corresponding usage models (U, U′), and determines whether U and/or U′ have any action nodes that represent any method(s) in the change set IU(P, ∆Li). If such a pair exists, their changes are related to the API changes. SAM uses GroumDiff to find the changes between U and U′ in terms of a set of graph editing operations. That set of operations is added into the set E of API usage changes caused by the API's changes (line 3). Then, SAM mines the maximal frequent subsets of editing operations over all the sets in E, using the frequent itemset mining algorithm in Agrawal and Srikant (1994). This algorithm finds every set f that occurs in the sets in E with a relative frequency (i.e. confidence) of at least σ, where σ is a chosen threshold, and whose size is as large as possible (line 4). For each such f, SAM finds all pairs (U, U′) whose change operations include f (line 6). For each pair (U, U′), it extracts the usage skeletons Uo(f) and U′o(f) (line 7). This pair of usage skeletons is called the reference models, which provide the context of the change f (explained next). Then, it adds that pair into a set Ref(f) for each mined frequent subset f (line 8), which is now considered as an adaptation pattern of API usages (i.e. frequent changes on API usage models).
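As an illustration of the frequent-subset step (line 4), the sketch below mines subsets of edit operations whose support in E reaches a threshold; it is a simplified, levelwise (Apriori-style) variant that omits both maximality and the relative-frequency denominator NUsage(f) that SAM actually uses, so it should be read as an outline rather than SAM's implementation:

import java.util.*;

// Simplified sketch of mining frequent subsets of edit operations over the sets in E.
// Maximality and SAM's relative-frequency denominator are omitted for brevity.
class FrequentEditSets {
    static <T> Set<Set<T>> mine(List<Set<T>> E, double sigma) {
        int minSupport = (int) Math.ceil(sigma * E.size());
        Map<T, Integer> counts = new HashMap<>();
        for (Set<T> ops : E)
            for (T op : ops) counts.merge(op, 1, Integer::sum);

        Set<Set<T>> frequent = new HashSet<>();
        Set<Set<T>> level = new HashSet<>();
        for (Map.Entry<T, Integer> e : counts.entrySet())        // frequent single operations
            if (e.getValue() >= minSupport) level.add(Set.of(e.getKey()));

        while (!level.isEmpty()) {                               // levelwise (Apriori-style) extension
            frequent.addAll(level);
            Set<Set<T>> next = new HashSet<>();
            for (Set<T> itemset : level)
                for (T op : counts.keySet()) {
                    if (itemset.contains(op)) continue;
                    Set<T> candidate = new HashSet<>(itemset);
                    candidate.add(op);
                    long support = E.stream().filter(s -> s.containsAll(candidate)).count();
                    if (support >= minSupport) next.add(candidate);
                }
            level = next;
        }
        return frequent;
    }
}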

Relative frequency. The relative frequency of a set of change operations f is calculated as follows. Assume that f is a subset of edit operations from U to U′. Freq(f) denotes the frequency of f, i.e. the number of sets of change operations in E that contain f. NUsage(f) is the number of API usages of the nodes affected by f, i.e. the number of all iGROUMs containing U(f). Then, the relative frequency of f is defined as Freq(f)/NUsage(f).
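As a hypothetical illustration of this ratio: the pattern f above (the two added jFreeChart calls) occurs in 53 of the edit-operation sets in E; if the nodes it affects appeared in, say, 60 iGROUMs overall, its relative frequency would be 53/60 ≈ 0.88, well above a threshold of σ = 0.5. The count of 53 comes from the JasperReports example; the total of 60 is invented only for illustration.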

Reference model. Uo(f) is defined as the set of mapped nodes in U that are affected by f and their dependent nodes via control and data dependencies. U′o(f) is similarly defined. Uo(f) and U′o(f) provide the contextual information on the change f. Thus, they are called the reference models of f: if another usage V contains Uo(f), one could consider that V also has a context that could be adapted by the frequent adaptation f. Thus, Uo(f) and U′o(f) are used to model the usage skeletons corresponding to the adaptation pattern f.

Figure 7.10 shows an adaptation pattern and its reference models found in JasperReports with respect to the jFreeChart library migration. The pattern includes the addition of two method calls StandardChartTheme.createLegacyTheme and ChartFactory.setChartTheme, which lead to the addition of two new action nodes and one data node, along with the associated edges, and the addition of an edge from the data node ChartFactory. Since setChartTheme and createAreaChart use the same data node ChartFactory, SAM derives the reference models of this change as U0 and U′0 as in Figure 7.10. U0 contains not only the added sub-graph but also the nodes having dependencies with the changed nodes.

The reference model is also useful in the case of newly added API elements. Suppose that m is a newly added method in the new version of a library and a call to m is added in U′. In this case, no node in U can be mapped to the node for m. However, there might be some other existing nodes that depend on m, and they could be mapped back to U. Thus, SAM could use those nodes as reference nodes for mapping between U and U′ in the case of newly added nodes. In such cases, SAM will also add those dependent nodes into the reference model for later mapping.

[Figure 7.10 shows the JasperReports code before and after the adaptation: the fragment after the change adds ChartFactory.setChartTheme(StandardChartTheme.createLegacyTheme()); before JFreeChart jfreeChart = ChartFactory.createAreaChart(...); configureChart(jfreeChart);. Below the code, the figure depicts the usage models U and U′ and the reference models U0 and U′0, whose nodes include StandardChartTheme, StandardChartTheme.createLegacyTheme, ChartFactory.setChartTheme, ChartFactory, ChartFactory.createAreaChart, JFreeChart, and this.configureChart.]

Figure 7.10: API Usage Change Patterns and Reference Models

Improvements. To improve the accuracy of the mined patterns of usage changes, ∆Pi could contain code taken from different sources: client code from different systems, or test code and demo code provided inside the API's source code. The threshold σ should then be chosen differently for each source. For example, in test code and demo code, a usage pattern might be tested or demonstrated only once; therefore, a small σ could be chosen. Test code might also contain the initialization of test data and the assertion of test results, which might not be part of API usage specifications. To improve the quality of the mined protocols, SAM discards such initializations and assertions when building the iGROUMs on the test code.

7.5 Recommending Adaptations

Sections 7.5.1 and 7.5.2 discuss how LibSync suggests the code locations to be adapted and edit operations required for those API usage adaptations.

7.5.1 API i-Usage Adaptation Recommendation

After detecting API changes via OAT and mining usage adaptation patterns on relevant codebases via SAM, LibSync has a knowledge base of API usage skeletons and corresponding adaptation patterns for an API L of interest. For each version i of the library L, the knowledge base contains the set of usage adaptation patterns F at that version. Each pattern f has a set of reference usage models Ref(f) = (U0, U′0). The knowledge base also contains ∆Li, the set of changed entities of L between any two consecutive versions. With this knowledge, LibSync provides API usage adaptation recommendations for any given client code Q that needs to be adapted to a version i of L.

7.5.1.1 Location Recommendation

First, LibSync determines the code locations in the client system Q that potentially need adaptation to Li. Using ∆Li and the xGROUM model of Q at that version, LibSync computes two change sets of methods, XU(Q, ∆Li) and IU(Q, ∆Li), that are potentially affected by the changed entities in ∆Li.

Details of the method to derive those two change sets will be explained in Section 7.5.2.1. IU(Q, ∆Li) is the set of methods and classes in L and Q that are affected by the changed entities in ∆Li (including overridden and inherited methods). XU(Q, ∆Li) is the set of methods and classes in Q that are affected by the changed entities in ∆Li via method overriding and inheritance. Every code location that uses an entity in IU(Q, ∆Li) will be considered for adaptation to the changes ∆Li of L. We use AU(Q, ∆Li) to denote the set of the corresponding iGROUMs of such code locations.

To improve performance, LibSync uses some pre-processing techniques. Based on text-based filtering, it finds the source files that could involve usages of L. Each source file is tokenized; if a file does not contain any token similar to the names of classes/methods in IU(Q, ∆Li), it is discarded from further processing. In the next step, the remaining source files are parsed to build API i-usage models. For each model V, LibSync checks whether V contains some nodes representing the usages of any entity in IU(Q, ∆Li). If that is the case, it reports V as a location for consideration of adaptation, i.e. V is added to AU(Q, ∆Li).

Let us revisit the example in Figures 7.1 and 7.5: L = jFreeChart, i = 0.9.17, Q = jBoss 3.2.7.

Assume that jBoss is currently using jFreeChart 0.9.15. Using OAT, LibSync could detect IU(Q, ∆Li) = {A, B, x, x′, y, y′} with the following information:

Id   Label                                   Change
A    XYSeries                                modified class
B    DefaultTableXYDataset                   modified class
x    XYSeries.(String, boolean)              deprecated
x′   XYSeries.(String, boolean, boolean)     added
y    DefaultTableXYDataset.(XYSeries)        deprecated
y′   DefaultTableXYDataset.(boolean)         added

Using text-based filtering, LibSync detects that the source file ManageSnapshotServlet.java in Q = jBoss 3.2.7 has some tokens XYSeries and DefaultTableXYDataset. Extracting iGROUMs from this file

1 function Adapt(V, F) // adapt API usage based on change patterns
2 for each Uo ∈ Ref(F): // for each change pattern f: calculate similarity
3   Relevant(V, Uo) = sim(GroumDiff(V, Uo)) // to reference models
4 (f*, Uo*) = Max(Relevant) // find the most suitable
5 Ed = GroumDiff(Uo*, U′o*) // derive referenced change operations
6 Recommend(Ed, V) // and recommend

Figure 7.11: Usage Adaptation Recommending Algorithm

for further analysis, it finds that the iGROUM V of method doit has nodes whose labels appear in IU(Q, ∆Li) (Figure 7.5). Thus, it reports V as a code location that may need adaptation, and adds V to AU(Q, ∆Li) for further operation recommendation.
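The two-stage filtering illustrated by this example could be sketched as follows; the class LocationFilter and its methods are hypothetical and only outline the idea of a cheap token scan followed by an iGROUM node check:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of LibSync's location filtering: a token scan discards source files
// that cannot use any affected API entity; iGROUMs of the remaining files are then checked.
class LocationFilter {
    // affectedNames holds names of classes/methods in IU(Q, deltaLi),
    // e.g. "XYSeries" and "DefaultTableXYDataset".
    static boolean mayUseAffectedApi(Path sourceFile, Set<String> affectedNames) throws IOException {
        String text = Files.readString(sourceFile);
        return affectedNames.stream().anyMatch(text::contains);   // keep only files with a matching token
    }

    // igroumNodeLabels stands for the labels of the action/data nodes of an iGROUM V.
    static boolean needsAdaptation(List<String> igroumNodeLabels, Set<String> affectedNames) {
        return igroumNodeLabels.stream()
                .anyMatch(label -> affectedNames.stream().anyMatch(label::contains));
    }
}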

7.5.1.2 Operation Recommendation

LibSync uses the API i-usage change patterns in its knowledge base to derive the recommended operations for each iGROUM V in the set AU(Q, ∆Li) of usage models that are considered for adaptation. Figure 7.11 illustrates the algorithm for this task. First, LibSync determines the change pattern f* that is most suitable for V (lines 2-4). For each pair of reference models (Uo, U′o) in the set of all reference models in the knowledge base Ref(F), LibSync maps Uo and V using the GroumDiff algorithm (Figure 7.7), and computes the relevance degree between V and Uo as the number of matched nodes over the size of Uo (line 3). Next, it ranks them to find the reference model Uo* that is best matched to V, i.e. with the highest relevance (line 4). Finally, LibSync finds the changes of the best-matched reference model Uo* (line 5) and recommends such changes as edit operations on iGROUM V (line 6).
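A hedged sketch of this ranking step (lines 2-4 of Figure 7.11) is shown below; matchedNodes and size stand in for the iGROUM sizes and the node mapping produced by GroumDiff, and are not the actual LibSync types:

import java.util.Map;

// Illustrative sketch of selecting the best-matched reference model for a usage model V.
class PatternRanker {
    record Match(String pattern, Object referenceModel, double relevance) {}

    // referenceModels maps each adaptation pattern f to its reference model Uo(f).
    static Match best(Object v, Map<String, Object> referenceModels) {
        Match best = null;
        for (Map.Entry<String, Object> entry : referenceModels.entrySet()) {
            Object uo = entry.getValue();
            // relevance = matched nodes between V and Uo divided by the size of Uo (line 3)
            double relevance = (double) matchedNodes(v, uo) / size(uo);
            if (best == null || relevance > best.relevance()) {
                best = new Match(entry.getKey(), uo, relevance);   // keep the maximum (line 4)
            }
        }
        return best;
    }

    static int matchedNodes(Object v, Object uo) { return 0; }    // placeholder for the GroumDiff mapping
    static int size(Object uo) { return 1; }                      // placeholder for |Uo|
}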

Notes. Since a usage model could use many usage protocols, LibSync may find more than one usage change pattern f that could be mapped against V. Thus, it ranks them based on their similarity with V and their frequencies (the higher the frequency, the more likely the recommendation is correct). If no change pattern is suitable (e.g. the similarity is too low), V is considered as an API usage irrelevant to the API changes in ∆Li.

After processing all usage models, for each model V in the recommended list AU(Q, ∆Li), LibSync reports its location and its ranked usage adaptation patterns f (with similarity levels and frequencies). It also provides with each pattern a code skeleton that was collected during the usage pattern mining process. If users choose a code location and a usage change pattern for adaptation, LibSync provides the recommendation for adaptation at that location. Let us revisit the example in Figures 7.8 and 7.10 for L = jFreeChart, i = 1.0.12, Q = JasperReports 3.5.0. First, LibSync detects the changed set ∆Li of jFreeChart at that version as:

Id Label Change

A StandardChartTheme added class

B ChartFactory added class

a StandardChartTheme.createLegacyTheme added

b ChartFactory.setChartTheme added

Mining on the code base P = JasperReports, LibSync recovers the change pattern f = [Add a, Add b] with 53 pairs of reference models (one pair is the iGROUMs (U, U′) for the code fragments in Figure 7.8). In Q, LibSync determines that iGROUM V uses a method of class ChartFactory. Since ChartFactory is in ∆Li, it is put into IU(Q, ∆Li), and thus, V is put into AU(Q, ∆Li), meaning that it should be considered for adaptation. Matching V with the change patterns and reference models, LibSync finds U as the best match for V with the change pattern f. In the matching, it also finds the mapping between the action nodes for the two method calls c and d with the label ChartFactory.createXYLineChart in U and V. Differencing U and U′ gives the operations Ed = [Add a, Add b]. Thus, LibSync recommends adding those two method calls, a and b, into V, along with their dependencies: a is called before b and the output of a is the input of b; b is called before c due to such dependencies in U. To help developers make the adaptation easier, LibSync provides the reference code in JasperReports (see Figure 7.8).

7.5.2 API x-Usage Adaptation Recommendation

7.5.2.1 Location Recommendation

To find the changes of the xGROUM and recommend relevant adaptations, LibSync starts with the change set ∆L of the API and the change set ∆P of classes and methods in the client code. Those two change sets are obtained by running OAT on two versions of both the API and the client sides. The outputs of this location recommendation algorithm are two change sets XU(P, ∆L) and IU(P, ∆L) of classes and methods that would be affected by the changes in ∆L in the API, taking into account x-usages and i-usages respectively. Therefore, they are also the classes and methods that could need adaptation. This algorithm is carried out as follows (a minimal sketch of these rules follows the list):

• If A.m ∈ ∆L, any method C.m overriding A.m is considered to be adapted. Thus, as A.m changes, C.m is added into XU(P, ∆L). C.m is also added into IU(P, ∆L) for the consideration of usage adaptation later.

• If A.m ∈ ∆L, any method D.m inheriting A.m is also considered for adaptation for API usages via invocation, i.e. D.m is added into IU(P, ∆L), because a method call to D.m could be actually a call to A.m.

• If A.m ∈ ∆L, and if C.m ∈ ∆P and C.m overrides A.m, then A.m and any ancestor method

Ao.m of A.m (i.e. overridden or inherited) is also considered to be adapted (i.e. A.m and Ao.m

are added to IU(P, ∆L)), because a call to Ao.m or A.m might be dynamically dispatched as a call to C.m.
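The following is the minimal sketch announced above; method identities are plain strings (e.g. "A.m"), and overrides, inherits, and ancestorsOf are placeholder queries over the xGROUM's overriding and inheritance edges rather than LibSync's actual API:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the three x-usage location rules above.
class XUsageLocator {
    final Set<String> xu = new HashSet<>();   // XU(P, deltaL)
    final Set<String> iu = new HashSet<>();   // IU(P, deltaL)

    void locate(Set<String> deltaL, Set<String> deltaP, List<String> clientMethods) {
        for (String am : deltaL) {
            for (String cm : clientMethods) {
                if (overrides(cm, am)) {                 // rule 1: overriding client method
                    xu.add(cm);
                    iu.add(cm);
                }
                if (inherits(cm, am)) {                  // rule 2: a call to the inherited method may reach A.m
                    iu.add(cm);
                }
                if (deltaP.contains(cm) && overrides(cm, am)) {   // rule 3: changed overriding method
                    iu.add(am);
                    iu.addAll(ancestorsOf(am));          // calls to ancestors may dispatch to C.m
                }
            }
        }
    }

    boolean overrides(String child, String parent) { return false; }   // placeholder
    boolean inherits(String child, String parent) { return false; }    // placeholder
    Set<String> ancestorsOf(String method) { return Set.of(); }        // placeholder
}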

Let us take an example with P = jBoss, L = Axis. The changes are in Figures 7.3, 7.4, and 7.6. The set ∆L contains the following classes and methods:

Id Label Change

A EJBProvider modified class in Axis

B Serializer modified class in Axis

A.n EJBProvider.getNewServiceObject renamed

A.p EJBProvider.getContext added

A.q EJBProvider.getEJBHome changed in parameter

B.m Serializer.writeSchema changed in parameters, RetType

......

Then, based on the xGROUM, two methods D.m and E.m (overriding B.m) and the method C.n (overriding A.n) are considered to be adapted, i.e. added to XU(P, ∆L) (see the Table below for the ids). Their corresponding classes are also added to XU(P, ∆L). Thus, the set XU(P, ∆L) contains the following classes/methods:

Id Label Change

C EJBProvider extend modified class in jBoss

D AttributeSerializer extend modified class in jBoss

E ObjectNameSerializer extend modified class in jBoss

C.n EJBProvider.getNewServiceObject should be renamed

C.q EJBProvider.getEJBHome should be changed in paras

D.m AttributeSerializer.writeSchema change in paras, RetType

E.m ObjectNameSerializer.writeSchema change in paras, RetType

......

They are also added to IU(P, ∆L), along with A.p (newly added method) and other i-nodes, i.e. the placeholders such as C.p. The IU(P, ∆L) set contains the following:

Id Label Change

A EJBProvider modified class in Axis

B Serializer modified class

C EJBProvider modified class in jBoss

D AttributeSerializer modified class in jBoss

E ObjectNameSerializer modified class in jBoss

A.n EJBProvider.getNewServiceObject renamed

C.n EJBProvider.getNewServiceObject should be renamed

A.p EJBProvider.getContext added

C.p EJBProvider.getContext inherited from added method

C.q EJBProvider.getEJBHome should be changed in paras

D.m AttributeSerializer.writeSchema change in paras, RetType

E.m ObjectNameSerializer.writeSchema change in paras, RetType

......

The outputs XU(P, ∆L) and IU(P, ∆L) are used in the mining algorithm (Figure 7.9), in location/operation recommendation for API i-usages (Section 7.5.1.1), and in operation recommendation for x-usages (Section 7.5.2.2).

7.5.2.2 Operation Recommendation

After detecting XU(P, ∆L), LibSync recommends adaptations of API x-usages for the methods in XU(P, ∆L). Currently, the recommendation for x-usages is as follows:

• Pointing out the classes/methods that need API x-usage adaptation. For example, two methods AttributeSerializer.writeSchema and ObjectNameSerializer.writeSchema in Figure 7.3.

• Showing the changes to the API classes and methods in use. For example, it shows the changes to Serializer.writeSchema with two operations: Add a new parameter and Replace the return type.

• Suggesting the edit operations for the classes and methods in client code that need adaptation. For example, it suggests to Add a parameter of type Class and to Replace the return type with org.w3c.dom.Element. It recommends fully qualified names to help developers use the correct packages (a hedged sketch of such a signature change is given below).
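As an illustration, the recommended x-usage adaptation for a client serializer would amount to a signature change roughly like the one sketched below; the old signature and the second parameter's type are assumptions based only on the description above, not code taken from Axis or jBoss:

import org.w3c.dom.Element;

// Hedged sketch of the recommended adaptation for a writeSchema override in client code.
class AttributeSerializerSketch {
    // Before adaptation (assumed old shape of the override):
    // public boolean writeSchema(Object types) throws Exception { ... }

    // After adaptation: add a parameter of type Class and replace the return type
    // with org.w3c.dom.Element, as recommended above.
    public Element writeSchema(Class<?> javaType, Object types) throws Exception {
        return null;   // placeholder body
    }
}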

7.6 Evaluation

This section presents the evaluation of our framework. For OAT, the parameter setting of similarity thresholds δ = 0.75 and µ = 0.625 is used. For SAM, the parameter setting of coefficients α = 0.5,

Table 7.1: Subject Systems

Client             Life Cycle           Releases  Methods  Used APIs
jBoss (JB)         10/2003 - 05/2009    47        10-40K   45-262
JasperReport (JR)  01/2004 - 02/2010    56        1-11K    7-47
Spring (SP)        12/2005 - 06/2008    29        10-18K   45-262

Table 7.2: Precision of API Usage Change Detection

Client         Changes  Libs  Operations ✓  Operations ✗  API ✓  API ✗
JasperReports  30       5     30            0             27     3
JBoss          40       17    38            2             38     2
Spring         30       15    30            0             30     0

β = 0.5 and matching threshold λ = 0.5 is used in the GroumDiff algorithm, and the confidence threshold σ = 0.5 is used in ChangePattern.

7.6.1 Adaptation of i-Usage

We evaluated the quality of LibSync in recommending API i-usage adaptations. In order to recommend API i-usage adaptations, LibSync needs to detect API i-usage changes and derive adaptation patterns. The experiments were carried out on large-scale, real-world systems in different application domains with long histories of development. Table 7.1 shows the details about those subject systems. For example, jBoss is a middleware framework that has been developed for more than 6 years with more than 40 releases. It has about 40 thousand methods and uses hundreds of different libraries.

7.6.1.1 Detection of i-Usage Changes

In this experiment, our evaluation questions were (1) can CUE detect the API usage changes correctly? and (2) are the client-side API usage changes detected by CUE and SAM indeed caused by the evolution of libraries used by the client system? We ran our tool on those three subject client systems to report all API usage changes along with edit operations. For each client, we randomly picked 30 to 40 of the API usage changes. We manually checked the correctness of detected edit operations in API usage skeletons. In addition, we also examined whether the identified API usage changes are indeed caused by the changes to APIs. Table 7.2 shows the result of this investigation. Column Changes shows the number of checked cases in detected API usage changes. Column Libs shows the number of libraries involved in those reported usage changes. The next two columns (Operations) display the numbers of correctly (see column ✓)

NumberAxis yAxis = new NumberAxis(yTitle);
yAxis.setMinimumAxisValue(-0.2);
yAxis.setMaximumAxisValue(0.4);
yAxis.setRange(-0.2, 0.4);
DecimalFormat formatter = new DecimalFormat("0.##%");
yAxis.setTickUnit(new NumberTickUnit(0.05, formatter));

Figure 7.12: Create NumberAxis in jFreeChart

and incorrectly detected API i-usage changes (column ✗) respectively. Similarly, the last two columns (column API) show how correctly our tool relates an API usage change to the changes to API(s). In most cases, our tool correctly detected the edit operations and correctly related the client-side API usage changes to the library-side changes (see the two ✓ columns). In 93 cases out of 100 checked cases, our tool correctly detected API usage changes and related them to library-side API declaration changes.

Example 1. Let us discuss an interesting case in Figure 7.12. This usage of jFreeChart creates a NumberAxis object and sets up its range and ticking unit. In the versions before 0.9.12 of jFreeChart, setting up the range of a NumberAxis object is carried out by invoking two methods setMinimumAxisValue and setMaximumAxisValue. However, from version 0.9.12, those two methods are deprecated, and a new method setRange is added and should be used instead. SAM correctly identified the API usage skeletons but made some mistakes in deriving the edit operations for adaptation. Instead of reporting two deletions and one addition, it reported one replacement and one addition. Importantly, however, SAM is able to recognize that the API usage change is due to the change in the jFreeChart API. In some other cases, our tool wrongly related client-side updates to library-side updates even though the library-side updates did not affect the corresponding usage in the client code, such as a modification of a method's access visibility. Another case is when the API method changes the types of exceptions that could be thrown, but the client code always catches the general exception type, Exception. Another one is when the API method changes the type of one parameter into its super-type (e.g. from String to Comparable). In those cases, there were some changes to those API usages, but these changes were irrelevant to changes in the declaration of the API; our tool mistakenly related them. Let us explain another interesting case of API usage changes due to the evolution of a library.

Example 2. JRuby, a scripting language framework for Web applications, provides a new method parse in version 0.8.0. This method accepts two string inputs: one referring to the piece of code to compile and one referring to the compiling configuration. It returns a Node as the root node of the parse tree. Using this newly added feature of JRuby, the developers of Spring changed their

// Old version: direct evaluation of the script
...
IRubyObject rubyObject = ruby.evalScript(scriptSource); // direct evaluation
if (rubyObject instanceof RubyNil) {
    throw new ScriptCompilationException(...);
...

// New version: parse first, then evaluate the parse tree
...
Node scriptRootNode = ruby.parse(scriptSource, ""); // parse the script
IRubyObject rubyObject = ruby.eval(scriptRootNode); // eval the parse-tree
if (rubyObject instanceof RubyNil) { // if cannot eval the whole script
    String className = findClassName(scriptRootNode); // just find class name
    rubyObject = ruby.evalScript("\n" + className + ".new"); // to create an object
}
if (rubyObject instanceof RubyNil) {
    throw new ScriptCompilationException(...);
...

Figure 7.13: API usage changes in Spring with respect to the evolution of JRuby

implementation of the method createJRubyObject, which receives a string scriptSource as the input script and returns an Object created by that script. In the old version of this method, it calls the evalScript method directly on scriptSource. This direct evaluation has a disadvantage: the script might not be well-formed or, more severely, might be crafted as malicious code that exploits some vulnerabilities of the system. In the new version, the Spring code first calls parse to parse the scriptSource into a tree, and then calls the eval method to execute this parsed code. If the script is ill-formed or maliciously crafted, the parsing will not return a well-formed parse tree and the eval method simply does not execute, thus resolving the above vulnerability issue. LibSync was able to mine this API usage adaptation pattern based on the API usage changes in the client code of Spring at 2.0 (JRubyScriptUtils.java).

7.6.1.2 Recommendation of Locations for Adaptation

Table 7.3: Accuracy of i-Usage Location Recommendation

API - Client        Version          Rec.  ✓   Hint  ✗   Miss
JFree - Jasper      3.0.1 - 3.1.0    12    9   3     0   0
Mondrian - Jasper   1.3.4 - 2.0.0    3     3   0     0   0
Axis - JBoss        3.2.5 - 4.0.0    8     5   1     2   0
Hibernate - JBoss   4.2.0 - 4.2.1    29    25  0     3   1
JDO2 - Spring       2.0m1 - 2.0m2    8     8   0     0   0
JRuby - Spring      2.0.3 - 2.0.4    7     7   0     0   0

This section describes the evaluation of LibSync in recommending the code locations for adaptation

to a target library version. We chose six pairs of a library and its client. For each pair, let VC and

VA be the versions of the client system and the library respectively. For each VA, we selected another

version V′A of the library such that the client system had been changed in a later version than VC. We ran LibSync on VA and V′A to detect library-side changes and client-side API usage updates. LibSync was then run to recommend the locations for adaptation. Then, we manually checked the history of the client code after version VC to see whether the code at those locations had actually been updated to work with the new library version V′A.

Table 7.4: Accuracy of i-Usage Operations Recommendation

Mine on        Adapt to       Usages  Rec.  ✓   Miss
3.2.5-3.2.8    3.2.5-4.0.5    6       4     4   2
4.0.5-4.2.3    4.0.5-5.0.1    26      25    25  1

Table 7.3 shows the result. jFree and Jasper are used as abbreviations for jFreeChart and JasperReports, respectively. Column Version shows the pairs of versions of the library and the client system. Column Rec. shows the number of recommended locations. Columns ✓, ✗, and Miss show the correctly, incorrectly, and missed detected locations respectively. Column Hint represents the cases in JasperReports on the changes of jFreeChart in which the API methods are deprecated but developers have not yet updated the code. As we can see, LibSync provides highly correct locations. It missed only one case out of the total of 67 recommended locations.

7.6.1.3 Recommending Edit Operations for Adaptation

In this experiment, we ran LibSync on a development branch in jBoss' history to mine usage adaptation patterns for all libraries used by jBoss. We then ran LibSync for adaptation recommendation on another branch that derives from the same branching point as the first branch but was independently developed onward. We manually checked the recommended operations against the actual adaptations in the second branch. A recommendation is considered correct if it has at least one correct operation at a correct location. Table 7.4 shows the result. The first two columns show the development branches on which LibSync mined the adaptation patterns and applied adaptation recommendations respectively. Column Usages shows the number of usage adaptations. Column Rec shows the numbers of recommended adaptations. As we can see, LibSync provides highly correct recommendations. The recommended operations were correct, as developers changed all of them, except for three missing cases in which old usages were completely abandoned and totally new usages were used. LibSync was able to correctly recommend the adaptation for all examples in this chapter. For example, LibSync could recommend the correct adaptation for the case of jFreeChart in jBoss in Figure 7.1. This change happened in jBoss 3.2.8 in the branch from version 3.2.5 to 3.2.8 and was learned to adapt from version 4.0.1 to 4.0.2 in the branch from version 3.2.5 to 4.0.5. Those two changes were actually the patches to fix a bug of NullPointerException when using the deprecated constructor of DefaultTableXYDataset.

Table 7.5: Accuracy of x-Usage Recommendation

Change type                         Rec.  ✓   ✗
Name                                6     4   2
Class name                          1     1   0
Package name                        2     2   0
Deprecated                          3     3   0
Change parameter type               4     4   0
Del parameter                       7     7   0
Change return type                  6     6   0
Change exception                    1     1   0
Add parameter-Change Exception      1     1   0
Add parameter-Change Return type    2     2   0

7.6.2 Adaptation of x-Usage

This section describes our evaluation of LibSync in recommending the code locations for the adaptation of x-usages in jBoss. We used a wide range of versions of jBoss as described in Table 7.1. For each change in jBoss from version i to j, we used OAT to collect all changed APIs into the change set ∆L.

We identified a set XU(P, ∆Li), all methods in jBoss at version i that override some API’s methods.

Each method in XU(P, ∆Li) is considered for adaptation recommendation with the same operations as those detected on the overridden method in the API. A recommendation to a method at version i was considered correct if that method was really changed in the same way in version j; otherwise, it was marked incorrect. The result is shown in Table 7.5. Each row represents one particular type of change in the external API(s). For example, the row Name is only for the methods with changed names. The row Add parameter-Change Exception is for the methods changing in both a parameter and an exception that could be thrown. Therefore, the numbers in a column are exclusive from row to row. Column Rec shows the number of recommended locations. Columns ✓ and ✗ respectively show the correctly and incorrectly detected locations for x-usage adaptation. LibSync provides highly correct recommendations. It is incorrect in only two cases out of the total of 33 cases. These two wrong cases have the same nature: they are both caused by the incorrect mapping results from OAT when detecting the changes of the class PersistenceInfoImpl in the javax API that is used in jBoss from version 4.0.3SP1 to 4.0.4GA. Instead of reporting two deleted methods (getPersistenceXmlFileUrl and setPersistenceXmlFileUrl) and two added methods (getPersistenceUnitRootUrl and setPersistenceUnitRootUrl), OAT reported two renaming operations.

Therefore, the recommendation was two renaming operations while the correct adaptation should be two deletions and two additions. For other types of changes, the recommendations are all correct.

7.7 Related Work

7.7.1 Library Evolution and Client Adaptation

There are several existing approaches to support client adaptations to cope with evolving libraries. Chow and Notkin Chow and Notkin(1996) proposed a method for changing client applications in response to library changes—a library maintainer annotates changed functions with rules that are used to generate tools that will update client applications. Henkel and Diwan's CatchUp Henkel and Diwan (2005) records and stores refactorings in an XML file that can be replayed to update client code. However, its update support is limited to three kinds of refactorings: renaming operations (e.g. types, methods, fields), moving operations (e.g. classes to different packages, static members), or change operations (e.g. types, signatures). The key idea of CatchUp, record-and-replay, assumes that the adaptation changes in client code are exact or similar to the changes on the library side. Thus, it works well for replaying rename or move refactorings or supporting API usage adaptations via inheritance. However, CatchUp cannot suggest to programmers how to manipulate the context of API usages in client code, such as the surrounding control structure or the ordering between method calls, as in the example shown in Section 7.2. Furthermore, CatchUp requires that library and client application developers use the same development environment to record API-level refactorings, limiting its adoption in practice. SemDiff Dagenais and Robillard(2008) mines API usage changes from other client code or from the evolution of the library itself, similar to our work. The key difference of LibSync from SemDiff is that our work uses a graph-based representation to capture the context of an API usage, including the dependencies among method calls and with the surrounding control logic. In our work, an adaptation pattern is captured in terms of a frequent set of graph editing operations that are common to multiple API usage skeletons before and after library migration. On the other hand, SemDiff defines an adaptation pattern as a frequent replacement of a method invocation. That is, if a method call to A.m is changed to B.n in several adaptations, B.n is likely to be a correct replacement for the calls to A.m. As SemDiff models API usages in terms of method calls, it cannot support complex adaptations that involve multiple objects and method calls and that require the knowledge of the surrounding context of those calls. LibSync's key departure point is that when a library's API declarations are modified, such evolution often involves coordinating uses of multiple objects and multiple method calls under certain contexts.

Xing and Stroulia's Diff-CatchUp Xing and Stroulia(2007) automatically recognizes API changes of the reused framework and suggests plausible replacements to the obsolete APIs based on working examples of the framework codebase. Dig et al.'s MolhadoRef Dig et al.(2007) uses recorded API-level refactorings to resolve merge conflicts that stem from refactorings; this technique can be used for adapting client applications in the case of simple rename and move refactorings that occurred in a library. Tansey and Tilevich's approach Tansey and Tilevich(2008) infers generalized transformation rules from given examples so that application developers can use the inferred rules to refactor legacy applications. However, this approach focuses on annotation refactorings that replace the type and naming requirements with the annotation requirements of a target framework. Furthermore, this approach does not focus on updating client applications to cope with evolving libraries. Andersen and Lawall Andersen and Lawall(2008) proposed spdiff, which identifies common changes made in a set of files. API developers could use spdiff to extract a generic patch and apply it to other clients. Their approach models the changes at the level of text lines. On the other hand, LibSync uses a graph-based representation to capture more thorough syntactic and semantic information for adapting API usages. SmPL Padioleau et al.(2007); Lawall et al.(2009) is a domain-specific source transformation language that captures textual patches with a more semantic description of program changes. However, it does not explicitly distinguish API changes from their usage changes.

7.7.2 Program Differencing and Origin Analysis

Existing differencing techniques use similarities in names and structures to match code elements at a particular granularity: lines and tokens Tichy(1984), abstract syntax tree nodes Fluri et al. (2007); Neamtiu et al.(2005), control flow graph nodes Apiwattanapong et al.(2004), program dependence graph nodes Binkley et al.(1995), etc. Our API usage comparison algorithm is similar to program differencing algorithms in that it detects changes between two versions of an internal program representation using name-, content-, and structure-based similarities. Zou and Godfrey Zou and Godfrey (2005) first developed an origin analysis technique to support software evolution analyses by mapping corresponding code elements between two program versions. Several other techniques Dig and Johnson (2006); Kim et al.(2007, 2005b); Weissgerber and Diehl(2006); Xing and Stroulia(2005); Zou and Godfrey(2005) improved and extended prior origin analysis techniques; some of these derive refactoring transformations—move a method, rename a class, add an input parameter, etc.—based on the matching result between two versions. We developed our own analysis technique, OAT, to map corresponding API declarations and API usage code fragments by improving these existing analyses.

7.7.3 API Usage Specification Extraction

There exist several approaches for extracting API usage specifications. The forms of recovered API usage specifications and patterns include finite state automata Wasylkowski et al.(2007); Zhong et al. (2009b), pairs of method calls Livshits and Zimmermann(2005); Williams and Hollingsworth(2005), partial orders of calls Acharya et al.(2007); Thummalapenta and Xie(2009), and Computation Tree Logic formulas Wasylkowski and Zeller(2009). The API usage representations in those static approaches are still limited; for example, the patterns lack control structures and involve only individual objects belonging to one class. Our graph-based API usage representation captures multi-object API usage patterns with control structures. In contrast to those static approaches, dynamic approaches recover the specifications by investigating the execution traces of programs Gabel and Su(2008); Yang et al.(2006); Shoham et al.(2007); Ramanathan et al.(2007a); Pradel and Gross(2009). These dynamic approaches require a huge amount of execution traces. Our graph-based representation, iGROUM, captures API usage patterns with control and data dependencies among method calls, and surrounding control logic such as while loops and if statements. The API usage representations in this chapter extend our prior work on GRouMiner Nguyen et al.(2009c) to tailor the original multi-object usage representation in order to capture the relevant context surrounding the use of external APIs. In particular, iGROUM explicitly captures API types and methods that appear in action and data nodes, so that program slicing can isolate only a sub-graph that is relevant to the use of a particular library. On the other hand, xGROUM captures overriding and inheritance relationships between client methods and API methods.

7.7.4 Empirical Studies of API Evolution

Dig and Johnson Dig and Johnson(2005) manually investigated API changes using the change logs and release notes to study the types of library-side updates that break compatibility with existing client code, and discovered that 80% of such changes are refactorings. Xing and Stroulia Xing and Stroulia(2006) used UMLDiff to study API evolution in several systems, and found that about 70% of structural changes are refactorings. Kim et al.'s signature change pattern analysis Kim et al.(2006b) categorizes API signature changes in terms of data-flow invariants. Yokomori et al. Yokomori et al. (2009) investigated the impact of library evolution on client applications using component ranking measurements. Padioleau et al. Padioleau et al.(2006) found that API changes in the Linux kernel lead to subsequent changes to dependent drivers, and such collateral evolution could introduce bugs into previously mature code. These studies motivate the need for supporting complex client adaptations beyond replaying library-side refactorings in client code.

7.8 Conclusions

This chapter presents LibSync, which guides developers in adapting API usages in client code to cope with evolving libraries. LibSync uses several graph-based techniques to recover the changes of API usage skeletons from the codebases of other client systems, and recommends the locations and edit operations for adapting API usage code. The evaluation of LibSync on real-world software systems shows that it is highly correct and useful. In particular, LibSync can recover and recommend complex API usage adaptations, which current state-of-the-art approaches are hardly able to support. One limitation of our approach is that it requires a set of programs that have already migrated to the new library version under focus, or an adequate amount of API usages within the library itself. As it is not straightforward to identify which version of a library is used by client systems, we are currently in the process of developing a co-evolution analysis framework that can automatically extract the versioning information of libraries used by client systems, in order to build a large corpus of API usage skeletons and a repository of API usage adaptation patterns.

CHAPTER 8. CONCLUSIONS AND FUTURE WORK

8.1 Conclusions

Software systems inevitably contain a large amount of repeated artifacts at different levels of abstraction. This dissertation focuses on analyzing software repetitiveness at the implementation code level and leveraging the derived knowledge to ease tasks in software maintenance and evolution such as program comprehension, API use, change understanding, API adaptation, and bug fixing. We have developed different representations for software artifacts at the source code level, the corresponding algorithms for measuring code similarity, and mining techniques. Our mining techniques are based on the key insight that code that conforms to programming patterns and specifications appears more frequently than code that does not. Thus, correct patterns and specifications can be mined from a large code corpus. We also have built program differencing techniques for analyzing changes in software evolution. Our key insight is that similar code is likely changed in similar ways and similar code likely has similar bugs which can be fixed similarly. Therefore, learning changes and fixes from the past can help automatically detect and suggest changes and fixes to the repeated code in development. Our empirical evaluation shows that our techniques can accurately and efficiently detect repeated code, mine useful programming patterns and API specifications, and recommend changes. They can also detect bugs, suggest fixes, and provide actionable insights to ease maintenance tasks. Specifically, our code clone detection tool detects more meaningful clones than other tools. Our mining tools recover high-quality programming patterns and API preconditions. The mined results have been used to successfully detect many bugs violating patterns and specifications in mature open-source systems. The mined API preconditions are shown to help API specification writers identify missing preconditions in already-specified APIs and start building preconditions for the not-yet-specified ones. The tools are scalable, analyzing large systems in reasonable time. Our study on repeated changes gives useful insights for program auto-repair tools. Our automated change suggestion approach achieves a top-1 accuracy of 45%-51%, a relative improvement of more than 200% over the base approach. For a special type of changes, API adaptation, our tool is highly correct and useful, with a precision of 100% and a recall of 91%.

8.2 Future Work

8.2.1 Mining Postconditions of Behavioral Interface Specifications

Behavioral interface specifications contain not only preconditions but also postconditions. We will develop an approach based on ideas similar to those of the approach for mining preconditions: after calling a method, programmers commonly check its post-states, which are related to the postconditions in the formal specification of the API. For example, after calling method compare(Integer, Integer) to compare two Integer objects, developers usually check the return value against 0 to determine what to do next. We will mine the conditions that are frequently checked after calling an API. Let us call them post-state conditions. After calling method compare(Integer, Integer), the post-state conditions would be \result = 0, \result > 0 and \result < 0, where \result denotes the return value.

The post-state conditions might not be sufficient to derive the postconditions in the formal specification because they do not contain information about the relations between the states before (e.g. inputs) and after (e.g. return value) calling the API. For example, method compare(Integer, Integer) returns a positive value if the integer value of the first argument is greater than that of the second argument. We will apply a dynamic symbolic execution (DSE) technique Tillmann and Schulte(2005); Tillmann and Halleux(2008); Xie et al.(2009); Sen et al.(2005); Majumdar and Sen(2007) on the source code of the API to get this information. DSE is a testing technique that combines symbolic execution, path constraints on input values, and concrete execution to systematically explore and cover as many paths as possible in the program under test. After running DSE, we have a set of pairs of a path constraint and a heap. For example, if we run DSE on method compare(Integer, Integer) above, we would have a set of three pairs (arg0.value == arg1.value, \result = 0), (arg0.value > arg1.value, \result = 1) and (arg0.value < arg1.value, \result = -1). They are used together with the set of post-state conditions to infer the postconditions. For example, since \result = 1 (on the heap) satisfies the post-state condition \result > 0, we can infer a postcondition that \result > 0 if arg0.value > arg1.value.
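Written in JML, the postconditions that could be inferred for compare from those pairs and post-state conditions might look roughly as follows (an illustrative sketch, not tool output; the field access arg0.value follows the notation above):

// Sketch of inferred JML postconditions for compare(Integer, Integer).
public interface IntegerComparison {
    /*@ ensures arg0.value == arg1.value ==> \result == 0;
      @ ensures arg0.value >  arg1.value ==> \result > 0;
      @ ensures arg0.value <  arg1.value ==> \result < 0;
      @*/
    int compare(Integer arg0, Integer arg1);
}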

8.2.2 Inferring API Specifications from Similar APIs

The proposed specification mining techniques work well for APIs that are widely used. However, since they are driven by data, they cannot be applied to newly released APIs that are rarely used. Our future work will aim at developing similarity-based techniques to infer specifications for rarely used APIs. Our key insight is that similar code would likely have a similar or the same specification. First, we will build a database of pairs of an API and its corresponding specification from as many already-specified APIs as possible. Then, we will enrich this database with the specifications mined for the widely used APIs. The APIs' code in the database will be abstracted by a representation that can capture the semantics and behaviors of the APIs and that will be used in comparing APIs' implementations. We plan to use a PDG-like representation. We will use Exas Nguyen et al.(2009a) (Section 2.2) to measure the similarity, which reduces the complexity of the graph comparison problem and enables hashing, which makes our approach scalable. For a given new API, we will search its abstracted implementation in our database to find the most similar implementations and derive the most likely specifications. To accommodate the variation of the implementations, we will systematically study program transformations that preserve the specifications such as desugaring, normalization, if and loop optimization, and refactoring. The inferred specifications will be used as the starting point for further refinement and verification.
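A hypothetical sketch of this lookup follows; the vector-based abstraction stands in for Exas characteristic vectors (Section 2.2), and the database, distance measure, and naming are illustrative only:

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of similarity-based specification transfer: the specification of the
// most similar already-specified API is proposed as a starting point for a new API.
class SpecByAnalogy {
    record SpecEntry(int[] vector, String specification) {}      // Exas-style vector + known spec

    final Map<String, SpecEntry> database = new HashMap<>();     // API id -> entry

    String suggestSpec(int[] newApiVector) {
        String bestSpec = null;
        double bestDistance = Double.MAX_VALUE;
        for (SpecEntry entry : database.values()) {
            double d = distance(newApiVector, entry.vector());
            if (d < bestDistance) { bestDistance = d; bestSpec = entry.specification(); }
        }
        return bestSpec;                                         // to be refined and verified manually
    }

    // 1-norm distance between characteristic vectors (assumed to have the same dimension).
    static double distance(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }
}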

8.2.3 Validating Mined Specifications

When a specification is given to a developer or a checker tool, it is essential to know if the given specification is necessary for the API or not. We will develop automated techniques using different strategies for this validation process to reduce manual effort.

8.2.3.1 Finding counter-examples of specifications from the large codebase to refute the false positives

Given an API and its mined specification, e.g., List.add(Object) with the precondition argument != null, we will automatically extract executable client code that uses the API from a large-scale corpus. Then, for each API call location, we will try to execute the calling code to exercise the API call so that the program state either satisfies or violates the API's specification. If there exists an input to the program leading to a program state that violates the API's specification, e.g., a null object is passed to method List.add() in this example, and the program still produces the correct result, a counter-example is found, e.g., argument == null in this example. If there exists an input to the program leading to a program state that satisfies the API's specification but produces an unexpected result, a counter-example is also found. The counter-examples could be used to remove the false positives or, in certain cases, to correct (weaken or strengthen) the mined ones. For example, in this example with List.add(), the counter-example could be used to refine the specification to infer that no precondition is needed. We will also adapt prior work on executing arbitrary blocks of code Jiang and Su(2009), and steering execution paths towards program points of interest Flanagan and Godefroid(2005); Godefroid(2005); Sen et al.(2005); Majumdar and Sen(2007). In the case we have access to the source code of the API, we will adapt prior work that uses static program checkers Flanagan and Leino(2001); Flanagan et al. (2002) to verify or refute the mined specifications.
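For the List.add example above, such a counter-example can already be exhibited concretely, since java.util.ArrayList accepts null elements; a minimal sketch:

import java.util.ArrayList;
import java.util.List;

// Counter-example for the mined precondition "argument != null" of List.add(Object):
// passing null violates the mined precondition, yet ArrayList handles it correctly,
// suggesting the precondition is a false positive (or at least too strong for this implementation).
class ListAddCounterExample {
    public static void main(String[] args) {
        List<Object> list = new ArrayList<>();
        boolean added = list.add(null);                          // violates the mined precondition
        System.out.println(added && list.size() == 1
                ? "counter-example found: null is accepted and the result is correct"
                : "no counter-example");
    }
}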

8.2.3.2 Leveraging maturity/stability of the high fidelity code corpus to filter the false positives

It is conventional wisdom that the more mature/stable the code is, the more likely it contains correct usages of the API. Thus, the specifications appearing in many mature/stable code usages could be likely correct and the ones not appearing in mature/stable client code could be likely false positives. We will first develop a measure for the maturity/stability of client methods/projects in the code corpus based on their change history. Our assumption is that client code gets mature/stable as it evolves. Thus, we will adapt the techniques used in our prior study on software changes Nguyen et al.(2012, 2013b) to compute this measure that takes the number of changes and the amount of changed code in the history of the projects into consideration. Then, we will develop a technique to filter specifications, in which those mined from more mature projects are more favored.

8.2.3.3 Using bug-fixing change history to filter false positives

Bug fixes are changes that correct the behaviors of programs. If certain fixes related to the specifications of APIs occur frequently, they can be used to refine the mined specifications. For example, if a guard condition of an API call is removed in many bug-fixing changes and happens to be part of the mined preconditions, it could indicate that the corresponding precondition is not needed and should be filtered. We will leverage the syntactic and semantic changes extracted by the techniques in our prior work Nguyen et al.(2012, 2013b, 2010a,b) to derive changes related to API specifications.

8.2.4 Mining Bug Fixing Change Patterns

Existing studies on code change repetitiveness and patterns are limited to changes at the syntax level. On the other hand, approaches for mining adaptations to library API and framework changes focus only on method calls and thus miss changes to other kinds of computations in programs such as arithmetic and logical operations. Kim et al.(2013) pre-defined a set of templates for bug fixing and thus cannot deal with not-yet-seen fixes. In future work, we will systematically study the repetitiveness of changes for all kinds of expressions in programs, augmented with semantic information such as type information. More specifically, the study will focus on fixing changes rather than general changes. The mined change patterns will be categorized and mapped to high-level program changes. They will also be used to recommend fixes to buggy programs.

We will focus on changes at the expression level and divide expressions into two kinds: computation and condition. Computation expressions are identifiers, method invocations, infix/prefix/suffix expressions, etc. Condition expressions are the conditions of the branching and looping statements that control the flow of the program. The syntactic fine-grained changes to those expressions will be abstracted into higher-level changes. For each syntactic unit, there will be a specific set of pre-defined rules for abstraction. For example, a change to a method call could be recognized as deleting/adding/replacing or renaming the method name, adding/deleting/replacing an argument, reordering the argument list, or any combination of them. We will build those rules as exhaustively as possible. The changes to condition expressions will be associated with specific computation expressions. For example, adding a null check condition on the receiver object of a method call and adding a range check on the first argument of a method call are composite high-level changes.

A fixing change pattern will contain a set of expression changes that occur frequently together in many bug fixes. Therefore, in general, we will employ the idea of frequent itemset mining for mining those patterns. One key difference is that, in this problem, one change can be applied at multiple locations to fix a bug, making a pattern a bag of single changes rather than a set. The mining algorithm has to take this difference into consideration.
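To make the bag-versus-set distinction concrete, a fix could be represented roughly as in the sketch below; the abstracted change labels and class names are hypothetical:

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a bug fix as a bag (multiset) of abstracted expression changes,
// since the same change (e.g. "add null check on receiver") may be applied at several locations.
class FixAsBag {
    final Map<String, Integer> changes = new HashMap<>();        // abstracted change -> occurrences

    void record(String abstractedChange) {
        changes.merge(abstractedChange, 1, Integer::sum);
    }

    // A candidate pattern is supported by this fix only if every change occurs at least as
    // many times as the pattern requires (bag inclusion rather than set inclusion).
    boolean supports(Map<String, Integer> pattern) {
        return pattern.entrySet().stream()
                .allMatch(e -> changes.getOrDefault(e.getKey(), 0) >= e.getValue());
    }
}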

Bibliography

Code Contracts at Rise4Fun. http://rise4fun.com/CodeContracts.

Java Path Finder (JPF). http://babelfish.arc.nasa.gov/trac/jpf.

Acharya, M., Xie, T., Pei, J., and Xu, J. (2007). Mining API patterns as partial orders from source code: from usage scenarios to specifications. In ESEC-FSE’07: Proceedings of the foundations of software engineering, pages 25–34. ACM.

Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB ’94, pages 487–499, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Al-Ekram, R., Kapser, C., Holt, R. C., and Godfrey, M. W. (2005). Cloning by accident: an empirical study of source code cloning across software systems. In ISESE, pages 376–385.

Alur, R., Černý, P., Madhusudan, P., and Nam, W. (2005). Synthesis of interface specifications for Java classes. In POPL ’05: Proceedings of ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 98–109. ACM.

Ammons, G., Bodík, R., and Larus, J. R. (2002). Mining specifications. In POPL ’02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 4–16. ACM.

Andersen, J. and Lawall, J. (2008). Generic patch inference. In ASE ’ 08: Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering, 2008, pages 337–346.

Andoni, A. and Indyk, P. E2LSH 0.1 user manual. http://web.mit.edu/andoni/www/LSH/manual.pdf.

Apache. Apache Software Foundation. http://www.apache.org/.

Apiwattanapong, T., Orso, A., and Harrold, M. J. (2004). A differencing algorithm for object-oriented programs. In ASE ’04: Proceedings of the 19th IEEE International Conference on Automated Software Engineering, pages 2–13, Washington, DC, USA. IEEE Computer Society.

Aversano, L., Cerulo, L., and Di Penta, M. (2007). How clones are maintained: An empirical study. In Proceedings of the 11th European Conference on Software Maintenance and Reengineering, CSMR ’07, pages 81–90, Washington, DC, USA. IEEE Computer Society.

Baker, B. S. (1997). Parameterized duplication in strings: Algorithms and an application to software maintenance. SIAM J. Comput., 26(5):1343–1362.

Bakota, T., Ferenc, R., and Gyimothy, T. (2007). Clone smells in software evolution. In Software Maintenance, 2007. ICSM 2007. IEEE International Conference on, pages 24–33.

Ball, T. and Rajamani, S. K. (2001). Automatically validating temporal safety properties of interfaces. In Proceedings of the 8th international SPIN workshop on Model checking of software, SPIN ’01, pages 103–122. Springer-Verlag.

Barr, E. T., Brun, Y., Devanbu, P., Harman, M., and Sarro, F. (2014). The plastic surgery hypothesis. In Proceedings of the 22Nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 306–317. ACM.

Baxter, I. D., Yahin, A., Moura, L., Sant’Anna, M., and Bier, L. (1998). Clone detection using abstract syntax trees. In ICSM ’98: Proceedings of the International Conference on Software Maintenance, page 368. IEEE Computer Society.

Bellon, S., Koschke, R., Antoniol, G., Krinke, J., and Merlo, E. (2007). Comparison and evaluation of clone detection tools. IEEE Trans. Softw. Eng., 33(9):577–591.

Beschastnikh, I., Brun, Y., Schneider, S., Sloan, M., and Ernst, M. D. (2011). Leveraging existing instrumentation to automatically infer invariant-constrained models. In Foundations of Software Engineering (FSE), pages 267–277. ACM.

Binkley, D., Horwitz, S., and Reps, T. (1995). Program integration for languages with procedure calls. ACM Transactions on Software Engineering and Methodology, 4(1):3–35.

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022.

Burdy, L., Cheon, Y., Cok, D. R., Ernst, M. D., Kiniry, J. R., Leavens, G. T., Leino, K. R. M., and Poll, E. (2005). An overview of jml tools and applications. Int. J. Softw. Tools Technol. Transf., 7(3):212–232.

CCFinderX. http://www.ccfinder.net.

Chang, R.-Y., Podgurski, A., and Yang, J. (2008). Discovering neglected conditions in software by mining dependence graphs. IEEE Trans. Softw. Eng., 34(5):579–596.

Chow, K. and Notkin, D. (1996). Semi-automatic update of applications in response to library changes. In ICSM ’96: Proceedings of the 1996 International Conference on Software Maintenance, page 359, Washington, DC, USA. IEEE Computer Society.

Corbett, J. C., Dwyer, M. B., Hatcliff, J., Laubach, S., Păsăreanu, C. S., Robby, and Zheng, H. (2000). Bandera: Extracting finite-state models from Java source code. In Proceedings of the 22nd International Conference on Software Engineering, ICSE ’00, pages 439–448. ACM.

Cousot, P., Cousot, R., and Logozzo, F. (2011). Precondition inference from intermittent assertions and application to contracts on collections. In Proceedings of the 12th International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI’11, pages 150–168. Springer-Verlag.

Dagenais, B. and Robillard, M. P. (2008). Recommending adaptive changes for framework evolution. In ICSE ’08: Proceedings of the 30th International Conference on Software Engineering, pages 481–490. ACM.

Dallmeier, V., Lindig, C., and Zeller, A. (2005). Lightweight Defect Localization for Java. In ECOOP 2005. Springer.

Dallmeier, V., Zeller, A., and Meyer, B. (2009). Generating fixes from object behavior anomalies. In ASE’09: Proceedings of 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), pages 550–554. IEEE CS.

de Caso, G., Braberman, V., Garbervetsky, D., and Uchitel, S. (2012). Automated abstractions for contract validation. IEEE Trans. Softw. Eng., 38(1):141–162.

de Wit, M., Zaidman, A., and van Deursen, A. (2009). Managing code clones using dynamic change tracking and resolution. In Software Maintenance, 2009. ICSM 2009. IEEE International Conference on, pages 169–178.

Deissenboeck, F., Hummel, B., Jürgens, E., Schätz, B., Wagner, S., Girard, J.-F., and Teuchert, S. (2008). Clone detection in automotive model-based development. In ICSE ’08: Proceedings of the 30th international conference on Software engineering, pages 603–612. ACM.

DeLine, R. and Fahndrich, M. (2004). Typestates for objects. In ECOOP ’04: European Conference on Object-Oriented Programming (LNCS 3086). Springer.

Deng, X., Robby, and Hatcliff, J. Kiasan: A verification and test-case generation framework for java based on symbolic execution. In Symposium on Leveraging Applications of Formal Methods, Verification and Validation, ISOLA ’06.

Di Fatta, G., Leue, S., and Stegantova, E. (2006). Discriminative pattern mining in software fault detection. In SOQUA ’06: Proceedings of the 3rd Int. workshop on Software quality assurance, pages 62–69. ACM.

Dickinson, W., Leon, D., and Podgurski, A. (2001). Finding failures by cluster analysis of execution profiles. In ICSE ’01: Proceedings of the 23rd International Conference on Software Engineering, pages 339–348. IEEE CS.

Dig, D. and Johnson, R. (2005). The role of refactorings in api evolution. In ICSM ’05: Proceedings of the 21st IEEE International Conference on Software Maintenance, pages 389–398, Washington, DC, USA. IEEE Computer Society.

Dig, D. and Johnson, R. (2006). Automated detection of refactorings in evolving components. In ECOOP ’06: Proceedings of European Conference on Object-Oriented Programming, pages 404–428. Springer.

Dig, D., Manzoor, K., Johnson, R., and Nguyen, T. N. (2007). Refactoring-aware configuration man- agement for object-oriented programs. In ICSE ’07: Proceedings of the 29th International Conference on Software Engineering, pages 427–436, Washington, DC, USA. IEEE Computer Society.

Duala-Ekoko, E. and Robillard, M. P. (2007). Tracking code clones in evolving software. In ICSE ’07: Proceedings of the 29th International Conference on Software Engineering, pages 158–167. IEEE Computer Society.

Duala-Ekoko, E. and Robillard, M. P. (2008). Clonetracker: Tool support for code clone management. In ICSE08: Research Tool Demonstration. ACM.

Ducasse, S., Rieger, M., and Demeyer, S. (1999). A language independent approach for detecting duplicated code. In Proceedings of the IEEE International Conference on Software Maintenance, ICSM ’99, pages 109–118. IEEE CS.

Engler, D., Chen, D. Y., Hallem, S., Chou, A., and Chelf, B. (2001). Bugs as deviant behavior: a general approach to inferring errors in systems code. In SOSP ’01: Proceedings of the eighteenth ACM symposium on Operating systems principles, pages 57–72. ACM.

Ernst, M. D., Cockrell, J., Griswold, W. G., and Notkin, D. (1999). Dynamically discovering likely program invariants to support program evolution. In International conference on Software engineering, ICSE’99, pages 213–224. ACM.

Ferrante, J., Ottenstein, K. J., and Warren, J. D. (1987). The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319–349.

Fischer, J., Jhala, R., and Majumdar, R. (2005). Joining dataflow with predicates. In Symposium on Foundations of software engineering, ESEC/FSE-13, pages 227–236. ACM.

Flanagan, C. and Godefroid, P. (2005). Dynamic partial-order reduction for model checking software. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’05, pages 110–121, New York, NY, USA. ACM.

Flanagan, C. and Leino, K. R. M. (2001). Houdini, an annotation assistant for ESC/Java. In Oliveira, J. N. and Zave, P., editors, FME 2001: Formal Methods for Increasing Software Productivity, volume 2021 of Lecture Notes in Computer Science, pages 500–517. Springer.

Flanagan, C., Leino, K. R. M., Lillibridge, M., Nelson, G., Saxe, J. B., and Stata, R. (2002). Extended static checking for java. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 234–245. ACM.

Fluri, B., Wuersch, M., PInzger, M., and Gall, H. (2007). Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction. IEEE Trans. Softw. Eng., 33(11):725–743.

Gabel, M., Jiang, L., and Su, Z. (2008). Scalable detection of semantic clones. In Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, pages 321–330. ACM.

Gabel, M. and Su, Z. (2008). Javert: fully automatic mining of general temporal properties from dynamic traces. In SIGSOFT ’08/FSE-16: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering, pages 339–349. ACM.

Gabel, M. and Su, Z. (2010). A study of the uniqueness of source code. In Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering, FSE ’10, pages 147–156. ACM Press.

Gabel, M., Yang, J., Yu, Y., Goldszmidt, M., and Su, Z. (2010). Scalable and systematic detection of buggy inconsistencies in source code. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’10, pages 175–190, New York, NY, USA. ACM.

Giger, E., Pinzger, M., and Gall, H. (2011). Comparing fine-grained source code changes and code churn for bug prediction. In 8th Working Conference on Mining Software Repositories (MSR’11), pages 83–92. ACM.

Giger, E., Pinzger, M., and Gall, H. (2012). Can we predict types of code changes? an empirical analysis. In MSR’12, pages 217–226. IEEE CS.

GitHub. GitHub. https://github.com/.

Godefroid, P. (2005). Software model checking: The VeriSoft approach. Form. Methods Syst. Des., 26(2):77–101.

Godefroid, P., Klarlund, N., and Sen, K. (2005). Dart: directed automated random testing. In Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, PLDI ’05, pages 213–223. ACM.

Goues, C. L., Nguyen, T., Forrest, S., and Weimer, W. (2012). GenProg: A Generic Method for Automatic Software Repair. IEEE Trans. Software Eng., 38(1):54–72.

Gruska, N., Wasylkowski, A., and Zeller, A. (2010). Learning from 6,000 projects: lightweight cross-project anomaly detection. In International Symposium on Software testing and analysis, ISSTA ’10, pages 119–130. ACM.

Henkel, J. and Diwan, A. (2005). Catchup!: capturing and replaying refactorings to support api evolution. In ICSE ’05: Proceedings of the 27th International Conference on Software Engineering, pages 274–283, New York, NY, USA. ACM.

Henzinger, T. A., Jhala, R., and Majumdar, R. (2005). Permissive interfaces. In ESEC/FSE-13: Symposium on Foundations of software engineering, pages 31–40. ACM.

Hindle, A., Barr, E. T., Su, Z., Gabel, M., and Devanbu, P. (2012). On the naturalness of software. In Proceedings of the 2012 International Conference on Software Engineering, ICSE 2012, pages 837–847. IEEE Press.

Hindle, A., Godfrey, M., and Holt, R. (2009). What’s hot and what’s not: Windowed developer topic analysis. In Software Maintenance, 2009. ICSM 2009. IEEE International Conference on, pages 339–348. IEEE CS.

Hovemeyer, D. and Pugh, W. (2004). Finding bugs is easy. SIGPLAN Not., 39(12):92–106.

Hunt, J. W. and McIlroy, M. (1976). An algorithm for differential file comparison. Technical report.

Hunt, J. W. and Szymanski, T. G. (1977). A fast algorithm for computing longest common subsequences. Communications of the ACM, 20(5):350–353.

Jablonski, P. and Hou, D. (2007). Cren: a tool for tracking copy-and-paste code clones and renaming identifiers consistently in the ide. In ETX’07: Proceedings of the 2007 OOPSLA workshop on eclipse technology eXchange, pages 16–20. ACM.

Jiang, L., Misherghi, G., Su, Z., and Glondu, S. (2007a). Deckard: Scalable and accurate tree-based detection of code clones. In ICSE ’07: Proceedings of the 29th International Conference on Software Engineering, pages 96–105. IEEE Computer Society.

Jiang, L. and Su, Z. (2009). Automatic mining of functionally equivalent code fragments via random testing. In Proceedings of the eighteenth international symposium on Software testing and analysis, ISSTA ’09, pages 81–92. ACM.

Jiang, L., Su, Z., and Chiu, E. (2007b). Context-based detection of clone-related bugs. In Proceedings of the the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ESEC-FSE ’07, pages 55–64, New York, NY, USA. ACM.

JML (2013). Examples page. http://www.eecs.ucf.edu/~leavens/JML/examples.shtml.

Jmol. Jmol. http://sourceforge.net/projects/jmol/.

Juergens, E., Deissenboeck, F., Hummel, B., and Wagner, S. (2009). Do code clones matter? In Software Engineering, 2009. ICSE 2009. IEEE 31st International Conference on, pages 485–495.

Kamiya, T., Kusumoto, S., and Inoue, K. (2002). CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Trans. Softw. Eng., 28(7):654–670.

Kawrykow, D. and Robillard, M. P. (2011). Non-essential changes in version histories. In ICSE’11, pages 351–360.

Kim, D., Nam, J., Song, J., and Kim, S. (2013). Automatic patch generation learned from human-written patches. In Proceedings of the 2013 International Conference on Software Engineering, ICSE’13, pages 802–811. IEEE Press.

Kim, M., Notkin, D., and Grossman, D. (2007). Automatic inference of structural changes for matching across program versions. In ICSE ’07: Proceedings of the 29th International Conference on Software Engineering, pages 333–343, Washington, DC, USA. IEEE Computer Society.

Kim, M., Sazawal, V., Notkin, D., and Murphy, G. (2005a). An empirical study of code clone genealogies. SIGSOFT Softw. Eng. Notes, 30(5):187–196.

Kim, S., Pan, K., and E. James Whitehead, J. (2005b). When functions change their names: Automatic detection of origin relationships. In WCRE ’05: Proceedings of the 12th Working Conference on Reverse Engineering, pages 143–152, Washington, DC, USA. IEEE Computer Society.

Kim, S., Pan, K., and Whitehead, Jr., E. E. J. (2006a). Memories of bug fixes. In SIGSOFT ’06/FSE-14: Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering, pages 35–45. ACM.

Kim, S., Whitehead, Jr., E. J., and Bevan, J. (2006b). Properties of signature change patterns. In ICSM ’06: Proceedings of the 22nd IEEE International Conference on Software Maintenance, pages 4–13, Washington, DC, USA. IEEE Computer Society.

Komondoor, R. and Horwitz, S. (2001). Using slicing to identify duplication in source code. In SAS ’01: Proceedings of the 8th International Symposium on Static Analysis, pages 40–56. Springer-Verlag.

Kremenek, T., Twohey, P., Back, G., Ng, A., and Engler, D. (2006a). From uncertainty to belief: inferring the specification within. In OSDI ’06: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, pages 12–12. USENIX Association.

Kremenek, T., Twohey, P., Back, G., Ng, A., and Engler, D. (2006b). From uncertainty to belief: inferring the specification within. In Proceedings of the 7th symposium on Operating systems design and implementation, OSDI ’06, pages 161–176. USENIX Association.

Krinke, J. (2001). Identifying similar code with program dependence graphs. In WCRE ’01: Proceedings of the Eighth Working Conference on Reverse Engineering (WCRE’01), page 301. IEEE CS.

Lawall, J. L., Muller, G., and Palix, N. (2009). Enforcing the use of api functions in linux code. In ACP4IS ’09: Proceedings of the 8th workshop on Aspects, components, and patterns for infrastructure software, pages 7–12, New York, NY, USA. ACM.

Leavens, G. T. The Java Modeling Language (JML). http://www.eecs.ucf.edu/~leavens/JML/index.shtml.

Li, Z., Lu, S., Myagmar, S., and Zhou, Y. (2006). CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code. IEEE Trans. Softw. Eng., 32(3):176–192.

Li, Z. and Zhou, Y. (2005). PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code. In ESEC/FSE-13: Symposium on Foundations of software engineering, pages 306–315. ACM.

Liskov, B. H. and Wing, J. M. (1994). A behavioral notion of subtyping. ACM Trans. Program. Lang. Syst., 16(6):1811–1841.

Liu, C., Ye, E., and Richardson, D. J. (2006). Software library usage pattern extraction using a software model checker. In ASE ’06: International Conference on Automated Software Engineering. IEEE CS.

Livshits, B. and Zimmermann, T. (2005). Dynamine: finding common error patterns by mining software revision histories. SIGSOFT Softw. Eng. Notes, 30(5):296–305.

Lo, D. and Maoz, S. (2009). Mining hierarchical scenario-based specifications. In ASE’09: Conference on Automated Software Engineering, pages 359–370. IEEE CS.

Lo, D., Mariani, L., and Pezzè, M. (2009). Automatic steering of behavioral model inference. In Symposium on The Foundations of Software Engineering, ESEC/FSE ’09, pages 345–354. ACM.

Lorenzoli, D., Mariani, L., and Pezzè, M. (2008). Automatic generation of software behavioral models. In Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, pages 501–510. ACM.

Majumdar, R. and Sen, K. (2007). Hybrid concolic testing. In Proceedings of the 29th International Conference on Software Engineering, ICSE ’07, pages 416–426, Washington, DC, USA. IEEE Computer Society.

Mandelin, D., Xu, L., Bodík, R., and Kimelman, D. (2005a). Jungloid mining: helping to navigate the API jungle. In PLDI ’05: conference on Programming language design and implementation, pages 48–61. ACM.

Mandelin, D., Xu, L., Bodík, R., and Kimelman, D. (2005b). Jungloid mining: helping to navigate the api jungle. In Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, PLDI ’05, pages 48–61. ACM.

Marcus, A. and Maletic, J. I. (2001). Identification of high-level concept clones in source code. In Proceedings of the 16th IEEE international conference on Automated software engineering, ASE ’01. IEEE CS.

Mariani, L. and Pastore, F. (2008). Automated identification of failure causes in system logs. In Software Reliability Engineering, 2008. ISSRE 2008. 19th International Symposium on, pages 117–126. IEEE CS.

McCallum, A. K. (2002). Mallet: A machine learning for language toolkit. http://mallet.cs.umass.edu.

Mende, T., Koschke, R., and Beckwermert, F. (2009). An evaluation of code similarity identification for the grow-and-prune model. J. Softw. Maint. Evol., 21(2):143–169.

Meng, N., Kim, M., and McKinley, K. S. (2013). LASE: locating and applying systematic edits by learning from examples. In Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pages 502–511. IEEE Press.

Michail, A. (2000). Data mining library reuse patterns using generalized association rules. In ICSE ’00: Proceedings of the 22nd international conference on Software engineering, pages 167–176. ACM.

Mitter, F. (2006). Tracking source code propagation in software system via release history data and code clone detection. Master’s thesis, Technical University of Vienna.

Mockus, A. (2007). Large-scale code reuse in open source software. In Proceedings of the First International Workshop on Emerging Trends in FLOSS Research and Development, FLOSS ’07. IEEE CS.

Mockus, A. (2009). Amassing and indexing a large sample of version control systems: Towards the census of public source code history. In Proceedings of the 6th IEEE International Working Conference on Mining Software Repositories, MSR’09, pages 11–20. IEEE CS.

Neamtiu, I., Foster, J. S., and Hicks, M. (2005). Understanding source code evolution using abstract syntax tree matching. In MSR’05, pages 2–6.

Negara, S., Codoban, M., Dig, D., and Johnson, R. E. (2014). Mining fine-grained code changes to detect unknown change patterns. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 803–813, New York, NY, USA. ACM.

Negara, S., Vakilian, M., Chen, N., Johnson, R. E., and Dig, D. (2012). Is it dangerous to use version control histories to study source code evolution? In ECOOP’12, pages 79–103.

Nguyen, A. T., Nguyen, H. A., Nguyen, T. T., and Nguyen, T. N. (2014a). Statistical learning approach for mining api usage mappings for code migration. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, pages 457–468. ACM.

Nguyen, C. D., Marchetto, A., and Tonella, P. (2013a). Automated oracles: an empirical study on cost and effectiveness. In ESEC/SIGSOFT FSE’13, pages 136–146. ACM.

Nguyen, H. A., Dyer, R., Nguyen, T. N., and Rajan, H. (2014b). Mining preconditions of apis in large-scale code corpus. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 166–177, New York, NY, USA. ACM.

Nguyen, H. A., Nguyen, A. T., Nguyen, T. T., Nguyen, T., and Rajan, H. (2013b). A study of repetitiveness of code changes in software evolution. In Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on, pages 180–190.

Nguyen, H. A., Nguyen, T. T., Pham, N. H., Al-Kofahi, J., and Nguyen, T. N. (2012). Clone management for evolving software. IEEE Trans. Softw. Eng., 38(5):1008–1026.

Nguyen, H. A., Nguyen, T. T., Pham, N. H., Al-Kofahi, J. M., and Nguyen, T. N. (2009a). Accurate and efficient structural characteristic feature extraction for clone detection. In FASE ’09, pages 440–455. Springer Verlag.

Nguyen, H. A., Nguyen, T. T., Wilson, Jr., G., Nguyen, A. T., Kim, M., and Nguyen, T. N. (2010a). A graph-based approach to API usage adaptation. In Proceedings of the ACM international conference on Object oriented programming systems languages and applications, OOPSLA’10, pages 302–321. ACM.

Nguyen, T. T., Nguyen, H. A., Pham, N. H., Al-Kofahi, J., and Nguyen, T. N. (2010b). Recurring bug fixes in object-oriented programs. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE ’10, pages 315–324. ACM.

Nguyen, T. T., Nguyen, H. A., Pham, N. H., Al-Kofahi, J. M., and Nguyen, T. N. (2009b). Clone-aware configuration management. In Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, ASE ’09, pages 123–134, Washington, DC, USA. IEEE Computer Society.

Nguyen, T. T., Nguyen, H. A., Pham, N. H., Al-Kofahi, J. M., and Nguyen, T. N. (2009c). Graph-based mining of multiple object usage patterns. In Symposium on The foundations of software engineering, ESEC/FSE ’09, pages 383–392. ACM.

Padioleau, Y., Lawall, J. L., and Muller, G. (2006). Understanding collateral evolution in linux device drivers. SIGOPS Operating Systems Review, 40(4):59–71.

Padioleau, Y., Lawall, J. L., and Muller, G. (2007). SmPL: A domain-specific language for specifying collateral evolutions in linux device drivers. Electronic Notes in Theoretical Computer Science, 166:47–62.

Pham, N. H., Nguyen, H. A., Nguyen, T. T., Al-Kofahi, J. M., and Nguyen, T. N. (2009). Complete and accurate clone detection in graph-based models. In Proceedings of the 31st International Conference on Software Engineering, ICSE ’09, pages 276–286, Washington, DC, USA. IEEE Computer Society.

Pradel, M. and Gross, T. R. (2009). Automatic generation of object usage specifications from large method traces. In ASE’09: Conference on Automated Software Engineering, pages 371–382. IEEE CS.

Rajapakse, D. C. and Jarzabek, S. (2007). Using server pages to unify clones in web applications: A trade-off analysis. In ICSE ’07: Proceedings of the 29th International Conference on Software Engineering, pages 116–126. IEEE Computer Society.

Ramanathan, M. K., Grama, A., and Jagannathan, S. (2007a). Path-sensitive inference of function precedence protocols. In ICSE ’07: Proceedings of the 29th international conference on Software Engineering, pages 240–250. IEEE CS.

Ramanathan, M. K., Grama, A., and Jagannathan, S. (2007b). Static specification inference using predicate mining. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, PLDI ’07, pages 123–134. ACM.

Ray, B. and Kim, M. (2012). A case study of cross-system porting in forked projects. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, FSE ’12, pages 53:1–53:11. ACM.

Ray, B., Wiley, C., and Kim, M. (2012). REPERTOIRE: a cross-system porting analysis tool for forked software projects. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, FSE ’12, pages 8:1–8:4. ACM.

Read, R. and Corneil, D. (1977). The graph isomorphism disease. Journal of Graph Theory, 1:339–363.

Roy, C. K., Cordy, J. R., and Koschke, R. (2009). Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Sci. Comput. Program., 74(7):470–495.

Sahavechaphan, N. and Claypool, K. (2006). XSnippet: mining for sample code. In OOPSLA ’06: conference on Object-oriented programming systems, languages, and applications, pages 413–430. ACM.

SeMoA. SeMoA - Secure Mobile Agents. http://sourceforge.net/projects/semoa/.

Sen, K., Marinov, D., and Agha, G. (2005). Cute: A concolic unit testing engine for c. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-13, pages 263–272, New York, NY, USA. ACM.

Shivaji, S., Whitehead, Jr., E. J., Akella, R., and Kim, S. (2013). Reducing features to improve code change-based bug prediction. IEEE Trans. Software Eng., 39(4):552–569.

Shoham, S., Yahav, E., Fink, S., and Pistoia, M. (2007). Static specification mining using automata- based abstractions. In ISSTA ’07: Proceedings of the 2007 international symposium on Software testing and analysis, pages 174–184. ACM.

SourceForge. SourceForge.net. http://sourceforge.net/.

JBoss. JBoss: An open source, standards-compliant, J2EE based application server. http://sourceforge.net/projects/jboss/.

Subversion. Subversion.tigris.org. http://subversion.tigris.org/.

Tairas, R. Bibliography of code clone detection literature. http://www.cis.uab.edu/tairasr/clones/literature/.

Tansey, W. and Tilevich, E. (2008). Annotation refactoring: inferring upgrade transformations for legacy applications. In OOPSLA ’08: Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications, pages 295–312, New York, NY, USA. ACM.

Tao, Y., Dang, Y., Xie, T., Zhang, D., and Kim, S. (2012). How do software engineers understand code changes?: an exploratory study in industry. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, FSE ’12, pages 51:1–51:11. ACM.

Thummalapenta, S. and Xie, T. (2007). Parseweb: a programmer assistant for reusing open source code on the Web. In ASE’07: International conference on Automated software engineering, pages 204–213. ACM.

Thummalapenta, S. and Xie, T. (2009). Alattin: Mining alternative patterns for detecting neglected conditions. In ASE’09: Conference on Automated Software Engineering, pages 283–294. IEEE CS.

Tian, Y., Lawall, J. L., and Lo, D. (2012). Identifying Linux bug fixing patches. In ICSE’12, pages 386–396. IEEE CS.

Tichy, W. F. (1984). The string-to-string correction problem with block moves. ACM Transactions on Computer Systems, 2(4):309–321.

Tillmann, N. and Halleux, J. (2008). Pex–white box test generation for .NET. In Beckert, B. and Hähnle, R., editors, Tests and Proofs, volume 4966 of Lecture Notes in Computer Science, pages 134–153. Springer Berlin Heidelberg.

Tillmann, N. and Schulte, W. (2005). Parameterized unit tests. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-13, pages 253–262, New York, NY, USA. ACM.

Wasylkowski, A. and Zeller, A. (2009). Mining temporal specifications from object usage. In ASE’09: Conference on Automated Software Engineering, pages 295–306. IEEE CS.

Wasylkowski, A., Zeller, A., and Lindig, C. (2007). Detecting object usage anomalies. In ESEC-FSE ’07: Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering, pages 35–44. ACM.

Wei, Y., Furia, C. A., Kazmin, N., and Meyer, B. (2011). Inferring better contracts. In Proceedings of the 33rd International Conference on Software Engineering, ICSE ’11, pages 191–200. ACM.

Weimer, W. and Necula, G. C. (2005). Mining temporal specifications for error detection. In TACAS, pages 461–476. Springer-Verlag.

Weissgerber, P. and Diehl, S. (2006). Identifying refactorings from source-code changes. In ASE ’06: Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering, pages 231–240, Washington, DC, USA. IEEE Computer Society.

Williams, C. C. and Hollingsworth, J. K. (2005). Automatic mining of source code repositories to improve bug finding techniques. IEEE Trans. Softw. Eng., 31(6):466–480.

Xie, T. and Pei, J. (2006). MAPO: mining API usages from open source repositories. In MSR ’06: Proceedings of the 2006 international workshop on Mining software repositories, pages 54–57. ACM.

Xie, T., Tillmann, N., de Halleux, P., and Schulte, W. (2009). Fitness-guided path exploration in dynamic symbolic execution. In Proc. the 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2009), pages 359–368.

Xie, Y. and Aiken, A. (2005). Scalable error detection using boolean satisfiability. In Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL ’05, pages 351–363. ACM.

Xing, Z. and Stroulia, E. (2005). UMLDiff: an algorithm for object-oriented design differencing. In ASE ’05: Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering, pages 54–65, New York, NY, USA. ACM.

Xing, Z. and Stroulia, E. (2006). Refactoring practice: How it is and how it should be supported - an eclipse case study. In ICSM ’06: Proceedings of the 22nd IEEE International Conference on Software Maintenance, pages 458–468, Washington, DC, USA. IEEE Computer Society.

Xing, Z. and Stroulia, E. (2007). API-evolution support with Diff-CatchUp. IEEE Trans. Softw. Eng., 33(12):818–836.

Yang, J., Evans, D., Bhardwaj, D., Bhat, T., and Das, M. (2006). Perracotta: mining temporal API rules from imperfect traces. In ICSE ’06: Proceedings of the 28th international conference on Software engineering, pages 282–291. ACM.

Ying, A. T. T., Murphy, G. C., Ng, R., and Chu-Carroll, M. C. (2004). Predicting source code changes by mining change history. IEEE Trans. Softw. Eng., 30(9):574–586.

Yokomori, R., Siy, H. P., Noro, M., and Inoue, K. (2009). Assessing the impact of framework changes using component ranking. In ICSM, pages 189–198. IEEE.

Zhong, H., Thummalapenta, S., Xie, T., Zhang, L., and Wang, Q. (2010). Mining API mapping for language migration. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, ICSE’10, pages 195–204. ACM.

Zhong, H., Xie, T., Zhang, L., Pei, J., and Mei, H. (2009a). Mapo: Mining and recommending api usage patterns. In Proceedings of the 23rd European Conference on ECOOP 2009 — Object-Oriented Programming, pages 318–343. Springer-Verlag.

Zhong, H., Zhang, L., Xie, T., and Mei, H. (2009b). Inferring Resource Specifications from Natural Language API Documentation. In ASE’09: Proceedings of 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), pages 307–318. IEEE CS.

Zimmermann, T., Premraj, R., and Zeller, A. (2007). Predicting Defects for Eclipse. In Proceedings of the Third International Workshop on Predictor Models in Software Engineering, PROMISE ’07. IEEE CS.

Zimmermann, T., Weisgerber, P., Diehl, S., and Zeller, A. (2004). Mining version histories to guide software changes. In Proceedings of the 26th International Conference on Software Engineering, ICSE ’04, pages 563–572, Washington, DC, USA. IEEE Computer Society.

Zimmermann, T. and Weißgerber, P. (2004). Preprocessing cvs data for fine-grained analysis. In Proceedings of the First International Workshop on Mining Software Repositories, pages 2–6.

Zou, L. and Godfrey, M. W. (2005). Using origin analysis to detect merging and splitting of source code entities. IEEE Transactions on Software Engineering, 31(2):166–181.