Analyzing Repetitiveness in Big Code to Support Software Maintenance and Evolution Hoan Anh Nguyen Iowa State University

Iowa State University Capstones, Theses and Dissertations
Graduate Theses and Dissertations
2015

Follow this and additional works at: https://lib.dr.iastate.edu/etd
Part of the Computer Engineering Commons

Recommended Citation
Nguyen, Hoan Anh, "Analyzing repetitiveness in big code to support software maintenance and evolution" (2015). Graduate Theses and Dissertations. 14591. https://lib.dr.iastate.edu/etd/14591

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected].

Analyzing repetitiveness in big code to support software maintenance and evolution
by
Hoan Anh Nguyen

A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Major: Computer Engineering

Program of Study Committee:
Tien N. Nguyen, Major Professor
Samik Basu
Manimaran Govindarasu
Suraj C. Kothari
Hridesh Rajan
Akhilesh Tyagi

Iowa State University
Ames, Iowa
2015

Copyright © Hoan Anh Nguyen, 2015. All rights reserved.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGMENTS
ABSTRACT

CHAPTER 1. INTRODUCTION

CHAPTER 2. CODE REPETITIVENESS DETECTION
  2.1 Code and Code Clones Representation
    2.1.1 Code Fragment and Clone
  2.2 Structural Similarity Measurement
    2.2.1 Structure-oriented Representation
    2.2.2 Structural Feature Selection
    2.2.3 Characteristic Vectors
    2.2.4 Vector Computing Algorithm
  2.3 Code Clone Detection
    2.3.1 Locality-sensitive Hashing
    2.3.2 Code Clone Detection Algorithm
    2.3.3 Code Clone Detection Algorithm Complexity
    2.3.4 Clone Reporting
  2.4 Empirical Evaluation
    2.4.1 Correctness
    2.4.2 Time Efficiency
    2.4.3 Scalability
    2.4.4 Clone Consistency Management
    2.4.5 Threats to Validity
  2.5 Related Work
  2.6 Conclusions

CHAPTER 3. TEMPORAL SPECIFICATION MINING
  3.1 Introduction
  3.2 Mining Multiple Object Usage Patterns
    3.2.1 Formulation
    3.2.2 Algorithm Design Strategy
    3.2.3 Detailed Algorithm
    3.2.4 Anomaly Detection
  3.3 Empirical Evaluation
    3.3.1 Pattern Mining Evaluation
    3.3.2 Anomaly Detection Evaluation
  3.4 Related Work
  3.5 Conclusions

CHAPTER 4. API PRECONDITION MINING
  4.1 Introduction
  4.2 Motivating Example
  4.3 Mining with Large Code Corpus
    4.3.1 Control Dependence and Preconditions
    4.3.2 Precondition Normalization
    4.3.3 Precondition Inference
    4.3.4 Precondition Filtering
    4.3.5 Precondition Ranking
  4.4 Empirical Evaluation
    4.4.1 Data Collection
    4.4.2 Ground-truth: Java Modeling Language (JML) Preconditions
    4.4.3 RQ1: Accuracy
    4.4.4 RQ2: Usefulness
    4.4.5 Threats to Validity
  4.5 Related Work
  4.6 Conclusions

CHAPTER 5. CHANGE REPETITIVENESS ANALYSIS
  5.1 Introduction
  5.2 Research Question and Methodology
    5.2.1 Research Question
    5.2.2 Data Collection
    5.2.3 Experimental Methodology
  5.3 Code Change Representation
    5.3.1 Illustration Example
    5.3.2 Representation
  5.4 Code Change Detection
    5.4.1 Coarse-grained Differencing
    5.4.2 Fine-grained Differencing
    5.4.3 Collecting Code Changes
  5.5 Change Database Building and Change Repetitiveness Computing
    5.5.1 Design Strategies
    5.5.2 Detailed Algorithm
  5.6 Analysis Results
    5.6.1 Boxplot Representation of General Change and Fix Repetitiveness
    5.6.2 Exponential Relationship of Repetitiveness and Size
    5.6.3 Within and Cross-project Repetitiveness Comparison
    5.6.4 Repetitiveness of Bug Fixes
    5.6.5 Repetitiveness on Individual Datasets in SourceForge and GitHub
    5.6.6 Repetitiveness on Representative Projects
    5.6.7 Repetitiveness and Change Type
    5.6.8 Threats to Validity
  5.7 Related Work
    5.7.1 Large-scale Studies on Uniqueness and Repetitiveness of Source Code
    5.7.2 Studies on Code Changes
    5.7.3 Code Clones
    5.7.4 Applications of Repetitiveness of Code Changes
  5.8 Conclusions

CHAPTER 6. CODE CHANGE SUGGESTION
  6.1 Introduction
  6.2 Problem Formulation
    6.2.1 Transactions and Tasks
    6.2.2 Fine-grained Code Change Representation
    6.2.3 Fine-grained Code Change Extraction
  6.3 Modeling Task Context with LDA
    6.3.1 Key design strategies
    ...

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    182 pages
