IMPROVED METHODS FOR MINING SOFTWARE REPOSITORIES TO DETECT EVOLUTIONARY COUPLINGS

A dissertation submitted

to Kent State University in partial

fulfillment of the requirements for the

degree of Doctor of Philosophy

by

Abdulkareem Alali

August, 2014

Dissertation written by

Abdulkareem Alali

B.S., Yarmouk University, Jordan, 2002

M.S., Kent State University, USA, 2008

Ph.D., Kent State University, USA, 2014

Approved by

Dr. Jonathan I. Maletic, Chair, Doctoral Dissertation Committee

Dr. Feodor F. Dragan, Members, Doctoral Dissertation Committee

Dr. Hassan Peyravi

Dr. Michael L. Collard

Dr. Joseph Ortiz

Dr. Declan Keane

Accepted by

Dr. Javed Khan, Chair, Department of Computer Science

Dr. James Blank, Dean, College of Arts and Sciences

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
1.1 Motivation and Problem
1.2 Research Overview
1.3 Contributions
1.4 Organization
CHAPTER 2 BACKGROUND AND RELATED WORK
2.1 Impact Analysis
2.2 Evolutionary Couplings
2.3 Static Program Analysis
2.4 Changesets
2.5 Failure Prediction and Maintenance Effort Using Metrics
2.5.1 Code Metrics
2.5.2 Code Change Metrics
2.5.3 Previous Changes and Defects
2.5.4 Collaboration Metrics
CHAPTER 3 USING CHANGE METRICS TO IMPROVE THE DETECTION OF EVOLUTIONARY COUPLINGS
3.1 Introduction
3.2 Detecting Evolutionary Coupling
3.3 Change Metrics
3.4 Adding Change Metrics
3.4.1 Discrete Change Metrics
3.4.2 Change Metrics + Mining
3.4.3 Implementation
3.5 Evaluation
3.5.1 Evaluation Using Prediction
3.5.2 Interestingness Measures
3.5.3 Manual Validation of Patterns
3.6 Threats to Validity
3.7 Discussion
CHAPTER 4 CHANGE PATTERNS INTERACTIVE TOOL AND VISUALIZER
4.1 Introduction
4.2 Controls
4.3 Summary
CHAPTER 5 DISTRIBUTION AND CORRELATION OF CODE, CHANGE AND COLLABORATION METRICS
5.1 Introduction
5.2 Software Metrics
5.3 Data Collection
5.4 Time Window
5.5 Metrics Distribution
5.5.1 Frequency Histogram
5.5.2 Frequency Histogram on a Log-Log Plot
5.5.3 Complementary Cumulative Distribution Function
5.6 Metrics Correlation
5.7 Discussion
CHAPTER 6 USING AGE AND DISTANCE TO IMPROVE THE DETECTION OF EVOLUTIONARY COUPLINGS
6.1 Introduction
6.2 Frequent Pattern Mining
6.3 Data Collection and Patterns Generation
6.4 Pattern Distance
6.5 Pattern Age
6.6 Evaluation Using Interestingness Measures
6.7 Summary
CHAPTER 7 ASSESSING TIME WINDOW SIZE IN THE MINING OF SOFTWARE REPOSITORIES FOR EVOLUTIONARY COUPLINGS
7.1 Introduction
7.2 Evolutionary Couplings
7.3 Approach & Setup of the Study
7.3.1 Experimental Data
7.3.2 Patterns Generation
7.3.3 Design of the Evaluation
7.3.4 Evaluation Using Prediction
7.4 Empirical Study
7.4.1 Time Windows Comparison
7.4.2 Time Window Cross Prediction
7.4.3 Combining Time Windows
7.5 Threats to Validity
7.6 Discussion
CHAPTER 8 PREDICTION PARAMETERS ON THE DETECTION OF EVOLUTIONARY COUPLINGS
8.1 Introduction
8.2 Data Collection
8.3 Patterns and Association Rules Generation
8.4 Multiple Regression Model Setup
8.5 Experiment and Results
8.6 Summary
CHAPTER 9 CONCLUSIONS AND FUTURE WORK
9.1 Synergic Approaches
9.2 The Analysis of Data Mining Parameters
9.3 Future Work
REFERENCES

LIST OF FIGURES

Figure 1. Changed lines histogram for 2001-2010 KOffice, where the X-axis is bins of size two and the Y-axis is the frequency of lines changed in each bin.
Figure 2. Distributions of LOCC, HC, and FC metrics for the KOffice repository. The data are plotted on a log-log scale.
Figure 3. Distributions of LOCC, HC, and FC metrics for the KOffice repository. The same data as Figure 2, binned and plotted on log-log scales.
Figure 4. Size ratios (total change / no. of files / no. of years), where change is the collected L, F, and H for the studied projects. Revisions (rev).
Figure 5. The win-lose plot and score for the 10-point percentile distribution of confidence for rules generated from the patterns for KOffice. CEC patterns were generated by enforcing LOCC consistencies.
Figure 6. The final scores of the seven open source systems for confidence values of the association rules generated from the CEC, LOCC, and EC patterns.
Figure 7. The final scores of the seven open source systems for lift values of the association rules generated from the CEC, HC, and EC patterns.
Figure 8. KouplerVis2 snapshot; the system is KOffice [2000-2009], the minimum support is 2%, which is 5 weeks out of 269 weeks for the selected range 2000-2005. Itemset number 37 has four files, and this pattern consistently appeared 5 times, a total of 405 ECs.
Figure 9. Activity chart for the ten free and open source systems. Each bar represents the ratio of the number of commits over the number of years; all ratios are then normalized by dividing each by the highest ratio (the MAX).
Figure 10. Two KOffice baskets of six files, with their attributes and values stored in an XML file, encoding effort measurements for each file during an observed time unit of a week; the date is the first day of the week and the basket name.
Figure 11. Frequency distribution of three selected metrics from each category of code, change, and collaboration for KOffice [2001-2010], collected on weeklong time units.
Figure 12. A log-log plot of the probability distribution of three selected metrics from each category of code, change, and collaboration for KOffice [2001-2010] on weeklong time units. Same data as used in Figure 11.
Figure 13. CCDF shapes of lognormal, double Pareto, and Pareto distributions on a log-log plot [Mitzenmacher 2004].
Figure 14. A log-log plot of the complementary cumulative distribution function of three selected metrics from each category of code, change, and collaboration for KOffice [2001-2010] on weeklong time units. Same data as used in Figure 11 and Figure 12.
Figure 15. A log-log plot of the CDF and CCDF of LOC Churn for KOffice [2001-2010] on weeklong time units.
Figure 16. A log-log plot of the CCDF for the number of commits on bins of size 2 for KOffice [2001-2010] on weeklong time units.
Figure 17. A log-log plot of the CCDF for code metrics (LOC and CC) on bins of size 2 over five systems on weeklong time units.
Figure 18. A log-log plot of the CCDF for code metrics (LOC and CC) on bins of size 2 over the other five systems on weeklong time units; this completes Figure 17.
Figure 19. A log-log plot of the CCDF for change metrics (LOC and Functions Churn) on bins of size 2 over five systems on weeklong time units.
Figure 20. A log-log plot of the CCDF for change metrics (LOC and Functions Churn) on bins of size 2 for the five other systems on weeklong time units; this completes Figure 19.
Figure 21. A log-log plot of the CCDF for change metrics (Hunks Churn, CC Diff, LOC Diff) on bins of size 2 over ten systems on weeklong time units.
Figure 22. A log-log plot of the CCDF for change metrics (Hunks Churn, CC Diff, LOC Diff) on bins of size 2 over ten systems on weeklong time units; this completes Figure 21.
Figure 23. A log-log plot of the CCDF for collaboration metrics (Commits and Authors) on bins of size 2 over ten systems on weeklong time units.
Figure 24. A log-log plot of the CCDF for collaboration metrics (Commits and Authors) on bins of size 2 over ten systems on weeklong time units; this completes Figure 23.
Figure 25. Size ratios (total commits / years / MAX), where commits and years are as reported in Table 21. MAX is the highest commits/years ratio, which is for Chrome, it being relatively 100% active.
Figure 26. Distribution of distances between pattern pairs. Patterns generated for KOffice [2001-2010] for files.
Figure 27. Distribution of the differences, in days, between pattern reoccurrences for KOffice [2001-2010] over files.
Figure 28. 10-percentile distribution of pattern age and decile widths, in days, for KOffice [2001-2010] for files.
Figure 29. The win-lose plot and score for the 10-point percentile distribution of confidence for rules generated from the patterns for KOffice, comparing low-age patterns vs. high-age patterns.
Figure 30. The win-lose plot and score for the 10-point percentile distribution of confidence for rules generated from the patterns for KOffice, comparing low-age patterns vs. high-age patterns.
Figure 31. Activity plot based on the ratio commits/years divided by the maximum of all commits/years ratios for the thirteen studied systems.
Figure 32. Distribution of time window sortings over the thirteen systems. Commit, Hour, Day, and Week are the different labels.
Figure 33. Improvement range (MIN-MAX) of precision, recall, and F-measure values over a cross prediction where Tr = hdw and Te = c.
Figure 34. Improvement range (MIN-MAX) of precision, recall, and F-measure values over a cross prediction where Tr = c and Te = hdw.
Figure 35. Improvement range (MIN-MAX) of precision, recall, and F-measure values over a cross prediction where Tr = c and Te = hdw.
Figure 36. Precision coverage over all systems.
Figure 37. Recall coverage over all systems.
Figure 38. F-measure coverage over all systems.
Figure 39. Extent of the representation from the total variation of model analysis outcomes over all systems.
Figure 40. Distribution of the extent of the representation for the models over precision, recall, and F-measure for all systems.

LIST OF TABLES

Table 1. Characteristics of the seven open source systems used in the study, including number of years, baskets (revisions), files, and changed lines, functions, and hunks.
Table 2. Prediction accuracies and completeness over seven open source systems. P = Precision, R = Recall, Fm = F-Measure, LFH = LOCC  FC  HC.
Table 3. Seven open source systems and the number (#I) of generated CEC patterns using changed lines (L), functions (F), and hunks (H), the EC patterns, and their minimum supports (mS).
Table 4. Deciles of confidence values for KOffice and their win-lose scores.
Table 5. All variations (intersections and unions) of change metrics for CEC patterns and their precision, recall, and F-measure values.
Table 6. All metrics used in the study of code, change, and collaboration. Four columns give each metric's Category, Name, Abbreviation, and Description.
Table 7. Ten free and open source systems used in this study; all systems are written in C/C++ and cover a range from 3 to 12 years. A brief description of each system is included.
Table 8. Legend key for Table 9 through Table 18. Spearman rho values shown in white on a black background indicate a strong relationship, black on a yellow background a moderate relationship, and black on a white background a weak relationship between a pair of variables. The p-value is a very small number for all calculated rho values.
Table 9. Spearman's rho values between every possible pair of the metrics presented in this study, on the data collected for Chrome.
Table 10. Spearman's rho values between every possible pair of the metrics presented in this study, on the data collected for GCC.
Table 11. Spearman's rho values between every possible pair of the metrics presented in this study, on the data collected for KOffice.
Table 12. Spearman's rho values between every possible pair of the metrics presented in this study, on the data collected for KDElibs.
Table 13. Spearman's rho values between every possible pair of the metrics presented in this study, on the data collected for LLVM.
Table 14. Spearman's rho values between every possible pair of the metrics presented in this study, on the data collected for OpenMPI.
Table 15. Spearman's rho values between every possible pair of the metrics presented in this study, on the data collected for Python.
Table 16. Spearman's rho values between every possible pair of the metrics presented in this study, on the data collected for Quantlib.
Table 17. Spearman's rho values between every possible pair of the metrics presented in this study, on the data collected for Ruby.
Table 18. Spearman's rho values between every possible pair of the metrics presented in this study, on the data collected for Xapian.
Table 19. Three counters (frequencies) summarizing all Spearman's rho test results. Each number counts a strength level (strong, moderate, or weak) among all possible pairs of the studied set of metrics from different categories.
Table 20. Ratios and distributions for each Spearman's rho test for all systems. Each percentage represents the ratio of each strength level (strong, moderate, and weak) among all possible pairs of the studied set of metrics from different categories.
Table 21. Characteristics of the eleven open source systems used in the study, including years, commits, files, selected minimum supports, itemsets generated, and maximum itemset sizes (L).
Table 22. Typical distances for file granularity.
Table 23. Typical age distribution for files. The threshold represents the age at which most of the data is accounted for.
Table 24. Eleven open source systems, the number of generated itemsets, and their minimum supports; the number of association rules generated using a 10% minimum confidence threshold; and the number of files included in the evaluation, where the date range of all systems is from April 01, 2009 to August 01, 2009.
Table 25. Deciles of confidence values for KOffice and their win-lose scores comparing low-age patterns vs. high-age patterns.
Table 26. Deciles of confidence values for KOffice and their win-lose scores comparing zero-distant patterns vs. non-zero-distant patterns.
Table 27. The final confidence and lift comparison scores of the eleven open source systems on the association rules generated for low-age patterns vs. high-age patterns.
Table 28. The final confidence and lift comparison scores of the eleven open source systems on the association rules generated for zero-distant patterns vs. non-zero-distant patterns.
Table 29. Characteristics of the studied systems: for each system, the number of files, LOC, committed files, and years.
Table 30. The number of commits, hours, days, and weeks used in this study over the thirteen open source systems.
Table 31. Patterns uncovered for the training set (first 75% of transactions) over thirteen open source systems and their minimum support counts. ECs = Evolutionary Couplings, mS = minimum support count.
Table 32. Prediction accuracies and completeness over six open source systems. P = Precision, R = Recall, Fm = F-Measure, Tr = time window of the training set, Te = time window of the test set.
Table 33. Prediction accuracies and completeness over seven open source systems. P = Precision, R = Recall, Fm = F-Measure, Tr = time window of the training set, Te = time window of the test set. This completes Table 32.
Table 34. Cross prediction F-measure. Fm is F-Measure; Tr = hdw is the time window of the training set, where training is hour, day, or week; Te = c is the time window of the test set, where commit is the test set time window.
Table 35. Cross prediction F-measure values. Fm is F-Measure; Tr = c is the training set time window, where commit is the time window; Te = hdw is the test set time window, where the test set is hour, day, or week.
Table 36. Prediction precisions, recalls, and F-measures for the OSG open source system. Precision (P), Recall (R), F-Measure (Fm), training set time window (Tr), test set time window, where commit is the time window (Te = c).
Table 37. Prediction improvements in precisions, recalls, and F-measures for the OSG open source system. Precision (P), Recall (R), F-Measure (Fm), training set time window (Tr), test set time window, where commit is the time window (Te = c).
Table 38. Highest (MAX) and lowest (MIN) improvement percentages for the different time window pattern set conjectures over the thirteen open source systems, where Te = c.
Table 39. Characteristics of the eleven open source systems used in the study, including years, commits, and files.
Table 40. Prediction parameters (support count and training data ratio): min and max values, unit size, and trials.
Table 41. Prediction parameters (years and confidence): min and max values, unit size, and trials.
Table 42. Prediction parameters: total trials.
Table 43. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for KOffice.
Table 44. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for KDELibs.
Table 45. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for Httpd.
Table 46. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for Subversion.
Table 47. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for Ruby.
Table 48. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for Chrome.
Table 49. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for OpenMPI.
Table 50. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for Python.
Table 51. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for LLVM.
Table 52. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for GCC.
Table 53. Regression analysis results (R Square, Adjusted R Square, and Significance F), and a magnitude value and p-value for the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept), for the dependent variables Precision, Recall, and F-measure, for Xapian.

ACKNOWLEDGEMENTS

I want to acknowledge my "dissertation family" (in no particular order): SDML

(Research Laboratory), Wedad Alali (Mother), Jonathan I. Maletic (Supervisor), Qasem

Alali (Father), Walid Alali (Brother), Feodor F. Dragan (Committee Member), Sireen

Abu-Khafajah (Wife), Bacim Alali (Brother), Kent State University, Andrew Sutton

(Research Partner), City of Kent, Michael L. Collard (Committee Member), Carmela

Alali (Daughter), SDML members, Computer Science Department (at KSU), Hassan

Peyravi (Committee Member), Declan Keane (Committee Member), Huzefa Kagdi

(Research Partner), Brian Bartman (Research Partner), Saleh M. Alnaeli (Research

Partner), Joseph Ortiz (Committee Member), Christian D. Newman (Research Partner).

Also, thanks for the support of (in no particular order): Tharwa Alali (Sister), Daleh

Abu-Khafajah (Mother in Law), Feras Alali (Brother), Mysoon Alali (Sister), Ahmad

Abu-Khafajah (Father in law), Muad Abu-Ata (Friend), Ismaeel Alali (Brother), Aya

Alali (Sister), Mohammad Abu-Khafajah (Brother in Law), Shatha Abu-Khafajah (Sister in Law), Salem Othman (Friend), Jehad Rababah (Friend), Maen Hammad (Friend).

I am thankful for the limitless help of my work colleagues at Nichevision, Inc.: Luigi Armogida, Vic Meles, Tom Faris, and Tracy Bauer.

Lastly:

Thank you for keeping me in your thoughts and prayers, Mom and Dad.

Highest gratitude and appreciation to the best friend-like, father-like teacher and advisor, Dr. Maletic.

Thanks to the sweetest girl, eyes of glory, sliver of my heart, my angel, Carmela.

Thanks to the closest soul, second me, new mama, love of my life, Sireen.

Abdulkareem Alali

July, 2014, Kent, Ohio


CHAPTER 1

INTRODUCTION

During system evolution, modifications are made to fix bugs, add new features, adapt the system to new application programming interfaces (APIs), and so on. These changes can affect a broad range of the software system. Determining which parts of the system are potentially affected by a given modification is called software change impact analysis [Arnold, Bohner 1996]. The ideal impact analysis technique produces the exact set of functions (or files) that require modification due to a given change. Unfortunately, no such technique exists. Current technology gives us an impact set that can be missing important items or contain a large number of irrelevant items (items reported as impacted and in need of updating that in practice go unmodified). These latter cases are false positives and can be quite problematic. Large numbers of false positives make a tool difficult to use and decrease the likelihood that it will be adopted by software developers [Cordy 2003].

Current methods for impact analysis typically use traditional static and dynamic analysis methods as the basis for identifying impact sets. More recently, other approaches, such as information retrieval and data mining, have been used to address this problem.

These alternative approaches have focused on identifying relationships in the software that are difficult for traditional analysis methods to uncover. Here, we describe experiments combining traditional static information (i.e., change measures) with data mining techniques in order to identify evolutionary couplings uncovered in the version history of the system. Our goal is to improve the accuracy of the impact set by reducing the number of false positives.

Evolutionary couplings [Gall, Hajek, Jazayeri 1998; Gall, Jazayeri, Krajewski 2003] are patterns of co-changing items in software that are uncovered from the version history. That is, if two functions repeatedly change together during software evolution, then they are evolutionarily coupled for some reason. This coupling may exist for several reasons, and a static relationship is not necessarily one of them. The software items in a change pattern can be related through static dependencies [Fenton, Pfleeger 1996; Offen, Jeffery 1997], such as a function call. Others can be related through implicit or hidden dependencies (e.g., potentially misplaced related artifacts in a software system with no clear static relation [D'Ambros, Lanza 2006]).

Gall et al. [Gall, Hajek, Jazayeri 1998] introduced the concept of evolutionary coupling (aka logical coupling) to uncover and describe these implicit dependencies. Implicitly coupled artifacts typically require more time and effort to maintain because there is no explicit way to identify them. Evolutionary couplings are identified using a data mining technique called frequent pattern mining [Agrawal, Srikant 1994], a commonly applied method in Mining Software Repositories (MSR). Frequent pattern (aka itemset) mining is applied to sets of files committed within a given unit of time. Unfortunately, this approach typically suffers from high false-positive rates. For example, files may be committed together many times due to a code license change, branch merges, or, most commonly, developers making commits that cut across logical changesets (changes that can be grouped together as an atomic unit because they address a single task [Hassan, Holt 2004b]). Often, a pattern of three items is also detected only because one item is dependent on and coupled with the other two, even though those two items are completely decoupled from each other.
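To make the mining step concrete, the sketch below (in Python, with hypothetical file names, a hand-picked minimum support, and a brute-force enumeration rather than a full Apriori implementation) illustrates how frequent co-change patterns are extracted from commit transactions; it is only a minimal illustration of the general frequent itemset mining idea [Agrawal, Srikant 1994], not the implementation used in this work.

    from itertools import combinations
    from collections import Counter

    # Each transaction is the set of files committed together in one time window.
    # The file names are hypothetical placeholders.
    transactions = [
        {"ui/dialog.cpp", "ui/dialog.h", "core/model.cpp"},
        {"ui/dialog.cpp", "ui/dialog.h"},
        {"core/model.cpp", "core/model.h"},
        {"ui/dialog.cpp", "ui/dialog.h", "core/model.h"},
    ]

    def frequent_itemsets(transactions, min_support, max_size=3):
        """Return itemsets (size 2..max_size) that occur in at least
        min_support transactions -- the candidate evolutionary couplings."""
        frequent = {}
        for size in range(2, max_size + 1):
            counts = Counter()
            for t in transactions:
                for itemset in combinations(sorted(t), size):
                    counts[itemset] += 1
            frequent.update({i: s for i, s in counts.items() if s >= min_support})
        return frequent

    for itemset, support in frequent_itemsets(transactions, min_support=3).items():
        print(itemset, "support =", support)
    # ('ui/dialog.cpp', 'ui/dialog.h') appears in 3 of the 4 transactions.

The example also shows the false-positive problem in miniature: any file that rides along in enough commits becomes part of a pattern, whether or not a real dependency exists.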

1.1 Motivation and Problem

The identification of evolutionary couplings is important for a number of maintenance tasks. It allows developers to consider refactoring and reengineering solutions that minimize the impact of co-changes. These relationships can also be used to support the enforcement of evolution or maintenance constraints on artifacts.

Such patterns also help developers avoid introducing bugs into the system during maintenance activities performed unaware of these dependencies and their propagating effects. Software developers and engineers know many implicit and explicit dependencies through their mental models [LaToza, Venolia, DeLine 2006], but in large systems it becomes hard to keep track of all software elements and their relationships. Knowledge of systematic change patterns helps developers follow change impacts. Awareness of such patterns also supports, through impact analysis, smarter testing: a developer can focus on the smaller group of tests covering the range of predicted changes rather than dealing with the whole system.

Explicit static relationships (e.g., call and include graphs) are dependencies among the artifacts that compose the system. Static measures are based on software source code, which is usually very large, so the number of explicit links uncovered is enormous and hard to follow. Moreover, the existence of a physical dependency between artifacts does not guarantee that it contributes to a change pattern; filtering out the critical links requires the archived historical maintenance information. Static measures also do not reveal all dependencies, i.e., the implicit/hidden ones that are not documented anywhere [Gall, Jazayeri, Krajewski 2003]. As an example of such hidden dependencies, if two classes with no explicit physical link between them are both responsible for the implementation of some feature, a change to one class will probably propagate a change to the other.

Logically coupled artifacts typically require more time and effort to maintain because of the lack of explicit static relationships, and it is difficult to develop models and tools that identify implicitly coupled artifacts or enforce traceability constraints between them. Previous work on the detection of implicit logical coupling has most often applied frequent itemset mining to sets of files committed within a given unit of time. Unfortunately, these approaches typically suffer from high false-positive rates because they do not distinguish between coincidentally changing files and those that are actually logically coupled. Evolutionary coupling emerged with data mining methodologies: artifacts are considered co-changed if they were frequently committed to the revision repository together. The transaction here is derived through sliding-window approaches over commit metadata, such as author identity or log messages in a version repository (e.g., CVS), or, with more recent versioning systems, as a changeset (e.g., Subversion, Git). The detected change patterns uncover interesting dependencies that are unseen by classical static analysis approaches; these types of implicit dependencies are called logical couplings [Gall, Hajek, Jazayeri 1998]. Evolutionary couplings can produce high-quality patterns, but this requires high frequencies, which in turn require a long maintenance history that may not be available [Robbes, Pollet, Lanza 2008]. The work presented here addresses the problem of reducing the number of false positives by extending previous approaches for the identification of implicit and explicit couplings using synergic approaches and via the tuning of pattern mining parameters.

Both of the presented methodologies, static analysis and evolutionary couplings, suffer mainly from high false positives; in large-scale systems the false positives can be numerous and the patterns long, which makes them difficult for developers to investigate. Without following such critical dependencies, bugs may not be avoided and reengineering decisions may come too late.

1.2 Research Overview

Impact analysis and change propagation classically use the software engineering approaches of static and dynamic program analysis. More recently, evolutionary coupling via mining software maintenance history (repositories) has shown promising results [Li, Sun, Leung, Zhang 2013]. A number of mining software repositories techniques are borrowed from large-scale data mining [Agrawal, Imieliński, Swami 1993; Agrawal, Srikant 1994]. Static and dynamic analysis (program analysis) are heavyweight techniques that require a lot of time, processing, and analysis. They also produce a lot of information, such as dependencies, with high accuracy. Data mining approaches, on the other hand, are lighter in nature, but suffer from many false-positive dependencies. Such approaches are blind to the structural, syntactic, and semantic relations among software elements: a dependency among the items of a frequent pattern is claimed to exist solely because these items co-changed together more times than some arbitrary threshold (the minimum support). Each side (program analysis vs. mining software history) has drawbacks and limitations in accuracy and usability. In this work, we bridge the gap with a two-fold approach:

• Synergic Approaches. We use hybrid approaches to improve the data mining methods that detect evolutionary couplings, using program information from static analysis, metrics, and historical meta-data. We further study the collected data in more depth to understand its nature and value.

• The Analysis of Data Mining Parameters. We study the effect of changeset size variation. The changeset, time window, or transaction size (called a basket in data mining) is one of the main parameters used in data mining. We further study a collective set of these parameters and build a prediction model around them to uncover their magnitude and effect on the generated patterns and association rules.

The hypothesis is that the combination of these orthogonal approaches will produce more accurate results.

1.3 Contributions

We present several approaches that improve the data mining techniques used to uncover evolutionary couplings. Specifically, we make the following contributions:

• Combining static analysis with data mining to reduce false-positive dependencies.

• A tool to detect change patterns using a set of parameters, with a visual presentation of the results.

• Metrics distribution and correlation analysis for bug prediction and effort estimation.

• Historical meta-data on pattern age and the distance between a pattern's items, used to rank and filter false-positive evolutionary couplings.

• An assessment of the impact of different time window sizes on the detection of evolutionary couplings, followed by a study of cross prediction between patterns generated from different windows and of combinations (intersections and unions) of patterns generated from different window sizes to enhance the quality of the selected change patterns.

• A study of a collective set of data mining parameters and their effect on the prediction quality of the generated association rules, using multiple regression analysis to build a prediction model around this set of parameters.

1.4 Organization

The dissertation is organized as follows. Chapter 2 presents related work on impact analysis, evolutionary couplings, static program analysis, changesets, predicting failures, and effort estimation using metrics. Following that are two main components: 1) research on static analysis and meta-data plus data mining techniques (Chapters 3 to 6); and 2) examination and analysis of repository mining parameters (Chapters 7 and 8).

The work first focuses on combining software information with frequent pattern mining techniques. Chapter 3 uses change measures to improve the detection of evolutionary couplings (CEC) [Alali, Sutton, Maletic 2014a]. Chapter 4 continues this work and presents the CEC technique as a change pattern detection tool and visualizer [Alali, Maletic 2015]. Chapter 5 is a broader and deeper study of the change measures (metrics) that were used in Chapter 3 [Alali, Maletic 2014]. Chapter 6 uses meta-data collected from repositories and structural information, namely pattern age and distance, to improve the detection of evolutionary couplings [Alali, Bartman, Newman, Maletic 2013].

The next focus of the research is data mining parameter analysis. Chapter 7 is a study of time window size; an empirical validation assessing different time window sizes and their effect on detecting evolutionary couplings is presented [Alali, Sutton, Maletic 2014b]. Chapter 8 is a broader study relative to Chapter 7. It investigates a group of parameters using prediction and regression analysis, and assesses the collective effectiveness of all parameters [Alali, Bartman, Newman, Maletic 2015]. Conclusions and future work are given in Chapter 9.

CHAPTER 2

BACKGROUND AND RELATED WORK

We cover a range of topics in this chapter. We start at the highest level with the broad umbrella of impact analysis in Section 2.1. We then turn to our main focus and give a brief background on evolutionary couplings in Section 2.2. Section 2.3 presents related work and background on static program analysis. The study of changesets plays an important role in evolutionary couplings research: the better the changesets approximated from version repositories, the higher the quality of the uncovered evolutionary couplings (Section 2.4). Finally, Section 2.5 presents background and related work on the use of code, change, and collaboration metrics to build fault prediction and effort estimation models.

2.1 Impact Analysis

Dependency and traceability analysis are methodologies for performing impact analysis. Dependency analysis traces change impacts horizontally, among software artifacts at the same level of abstraction (function to function), while traceability analysis runs vertically, across software artifacts at different levels of abstraction (code to design) [Kagdi, Gethers, Poshyvanyk, Collard 2010]. The literature is rich with dependency analysis methods that uncover relations using call graphs [Ryder 1979], program slicing [Gallagher, Lyle 1991], hidden dependency analysis [Chen, Rajlich 2001; Rajlich 1997; Yu, Rajlich 2001], lightweight static analysis approaches [Moonen 2002; Petrenko, Rajlich 2009], concept analysis [Tonella 2003], dynamic analysis [Law, Rothermel 2003], unified modeling language (UML) models [Briand, Labiche, Soccar 2002], and information retrieval [Antoniol, Canfora, Casazza, De Lucia 2000].
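As a minimal illustration of dependency-based impact analysis, the sketch below (Python, over a small hypothetical call graph) computes the impact set of a changed function as everything that transitively calls it. Real dependency analyses, such as the slicing and concept analysis approaches cited above, are far more precise; this is only a reachability example.

    from collections import defaultdict, deque

    # Hypothetical call graph: caller -> set of callees.
    calls = {
        "main": {"parse", "render"},
        "parse": {"read_file"},
        "render": {"draw", "read_file"},
        "draw": set(),
        "read_file": set(),
    }

    # Invert the graph (callee -> callers), since impact propagates to callers.
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)

    def impact_set(changed):
        """All functions that transitively call the changed function."""
        impacted, queue = set(), deque([changed])
        while queue:
            fn = queue.popleft()
            for caller in callers[fn]:
                if caller not in impacted:
                    impacted.add(caller)
                    queue.append(caller)
        return impacted

    print(impact_set("read_file"))   # {'parse', 'render', 'main'}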

Programmers maintain software through source code changes, generally to fix a bug or add a feature to an existing system. A change to an artifact usually requires the developer to identify the other parts of the code that also require a change due to existing dependencies (structural, logical, conceptual, etc.). Bohner [Bohner 1996] characterized impact analysis as an activity that estimates all components to be changed. One impact analysis technique was proposed by Queille et al. [Queille, Voidrot, Wilde, Munro 1994], who suggested an interactive process in which the programmer, guided by dependencies among program components, inspects components and identifies the ones that are going to change; this process involves both searching and browsing activities. This interactive process was later supported by a formal model based on graph rewriting rules [Chen, Rajlich 2000].

Shawn, Robillard, and Hill et al. [Robillard 2005; Shawn, Gracanin 2003; Hill, Pollock, Vijay-Shanker 2007] proposed tools to navigate and prioritize system dependencies during various software maintenance tasks. Hill et al. [Hill, Pollock, Vijay-Shanker 2007] use lexical clues from the source code to identify related methods. Rountev et al. [Rountev, Milanova, Ryder 2001] estimated the impact of a change on tests. A comparison of different impact analysis algorithms is provided in [Orso et al. 2004]. Coupling measures have been used to support impact analysis in object-oriented systems [Briand, Wuest, Lounis 1999; Wilkie, Kitchenham 2000]. Briand et al. [Briand, Wuest, Lounis 1999] investigated the use of coupling measures and derived decision models for identifying classes likely to be changed during impact analysis. Their empirical investigation of structural coupling measures and their combinations showed that coupling measures can be used to focus the underlying dependency analysis and reduce impact analysis effort. Poshyvanyk et al. [Poshyvanyk, Marcus, Ferenc, Gyimóthy 2009] presented alternative sources of information (i.e., text in identifiers and comments) to capture dependencies that are not captured by the existing structural coupling measures.

2.2 Evolutionary Couplings

Evolutionary coupling is a dependency between software artifacts that have been observed to change together frequently, with probably identical changing behavior, during the evolution of a system [D'Ambros, Lanza 2006; Gall, Hajek, Jazayeri 1998]. Software elements are evolutionarily coupled if they changed together within a defined time window at least a minimum support number of times. Evolutionary coupling is a lightweight approach that uses software repositories to detect artifacts coupled as observed in history, versus the costly static analysis. It can find hidden dependencies and traceability links that are hard or impossible to find through conventional static analysis; it also uncovers explicit dependencies, as static analysis methodologies do, but without specific knowledge of the type(s) of those dependencies.


Evolutionary coupling methods based on history and maintenance activities are capable of uncovering dependencies that are part of change patterns. In addition, hidden dependencies among artifacts are hard to detect using static analysis because of the lack of direct static dependencies; such implicit dependencies are known as logical coupling [Gall, Hajek, Jazayeri 1998]. Due to the uncertainty on both sides, research has recently started to focus on hybrid approaches for better control in detecting change patterns and for higher accuracy. The following sections state the dimensions of our detection problem.

Ball et al. [Ball, Porter, Siy 1997] were among the first to introduce the idea and importance of co-changed artifacts. They introduced a visualization of co-changed classes to detect clusters of classes that frequently changed together during the evolution of a system. Gall et al. [Gall, Hajek, Jazayeri 1998] introduced the concept of evolutionary coupling and presented CAESAR, an approach to detect implicit relationships between modules using historical data extracted from the Concurrent Versions System (CVS) for a large telecommunications software system, and validated these potential dependencies by examining change reports that contain specific change information for a release. They provided insights concerning the architecture of the system; the uncovered dependencies identified modules that should be restructured and reengineered. Additionally [Gall, Jazayeri, Krajewski 2003], at a lower level of abstraction (classes), they were able to detect evolutionarily coupled items that indicate architectural weaknesses such as poorly designed interfaces and inheritance hierarchies.

Traditional evolutionary coupling techniques mostly use a small time window. The most commonly used versioning systems so far are CVS and Subversion (SVN). For CVS and similar systems, researchers use a sliding-window approach to define the mining basket, where two subsequent changes committed by the same author with the same log message are part of one transaction if they were at most 200 seconds apart [Fogel, O'Neill 2002; Zimmermann, Weisgerber, Diehl, Zeller 2004]. In Subversion, a revision is the time window. A revision, or commit, is a set of modified files submitted and merged together into the source tree of files and folders [Collins-Sussman, Fitzpatrick, Pilato 2004]. Such a unit is used to cluster software elements based on the hope, or the assumption, that a commit is logically atomic, which is an ideal that does not hold in practice. Even so, the committed files are to some degree related and responsible for a change or a task, and the frequent occurrence of a group of artifacts raises confidence that there is an actual dependency among its elements. We decided to use an arbitrary coarse-grained window, e.g., a revision, and then measure the change with counting metrics, based on such a window, over events that happened during maintenance, to support the coupling.
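The 200-second sliding-window heuristic mentioned above can be sketched roughly as follows (Python, with hypothetical check-in data): consecutive per-file CVS check-ins are folded into one transaction when they share the author and log message and each check-in follows the previous one by at most 200 seconds. This is only a sketch of the grouping idea, not the tooling used in this dissertation.

    WINDOW = 200  # seconds

    # Hypothetical per-file CVS check-ins: (timestamp, author, log message, file).
    checkins = [
        (1000, "alice", "fix crash", "a.c"),
        (1100, "alice", "fix crash", "a.h"),
        (1250, "alice", "fix crash", "main.c"),
        (5000, "bob",   "new feature", "b.c"),
    ]

    def sliding_window_transactions(checkins, window=WINDOW):
        """Group check-ins into transactions: same author and log message,
        each check-in at most `window` seconds after the previous one."""
        transactions, current, last = [], [], None
        for time, author, log, path in sorted(checkins):
            if (last is not None
                    and (author, log) == (last[1], last[2])
                    and time - last[0] <= window):
                current.append(path)
            else:
                if current:
                    transactions.append(set(current))
                current = [path]
            last = (time, author, log)
        if current:
            transactions.append(set(current))
        return transactions

    print(sliding_window_transactions(checkins))
    # [{'a.c', 'a.h', 'main.c'}, {'b.c'}]  (element order may vary)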

Evolutionary couplings support static analysis by empirically assessing the importance of dependencies that can be detected statically between software elements. Despite the ease and speed with which they discover change patterns, however, evolutionary couplings suffer mainly from high false positives [D'Ambros, Lanza 2006; Robbes, Pollet, Lanza 2008], and such inaccuracies are problematic. Evolutionary coupling techniques produce change patterns without any hint of the reasons behind their existence. To learn the reasons behind a detected pattern, a developer needs to investigate it manually or use static techniques to look for implicit or explicit dependencies.

Both evolutionary and static coupling are able to detect software change patterns. Detection is faster and easier with evolutionary coupling [Hassan 2008] but has low accuracy: on average, historically co-changed entities are correct 30% of the time (precision), and 44% of the entities that must co-change are correctly proposed (recall) [Hassan, Holt 2004a; Sayyad, Lethbridge 2001; Ying, Murphy, Ng, Chu-Carroll 2004; Zimmermann, Weisgerber, Diehl, Zeller 2004]. Static analysis, on the other hand, is a more time-consuming approach that produces an enormous number of coupling links among artifacts. Static analysis is code-based: it presents all dependencies and coupling links without indicating which of these dependencies are the reason behind the existence of change patterns.
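The precision and recall figures quoted above are computed in the usual way; the small sketch below (Python, with hypothetical file sets) shows how a predicted impact set is scored against the entities that actually co-changed, which is also how prediction quality is evaluated later in this dissertation.

    def score(predicted, actual):
        """Precision, recall, and F-measure of a predicted impact set
        against the set of entities that actually co-changed."""
        tp = len(predicted & actual)
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(actual) if actual else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        return precision, recall, f_measure

    predicted = {"a.cpp", "a.h", "b.cpp"}        # suggested by mined rules
    actual = {"a.cpp", "a.h", "c.cpp", "d.cpp"}  # what really changed together
    print(score(predicted, actual))  # (0.667, 0.5, 0.571), approximately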

A technique similar to that of Gall et al. was applied by Ratzinger et al. [Ratzinger, Fischer, Gall 2005] at the class level to identify code smells [Van Emden, Moonen 2002] based on the evolutionary coupling between classes of the system. At a finer granularity, Zimmermann et al. [Zimmermann, Weisgerber, Diehl, Zeller 2004] presented ROSE to find change patterns among entities (files, classes, functions, and lines of code) using a sliding-window approach. ROSE suggests further changes to be made and warns about missing changes. This led to work on automatic bug prediction (e.g., [Kim, Zimmermann, Pan, Whitehead 2006; Ying, Murphy, Ng, Chu-Carroll 2004]). Ying et al. [Ying, Murphy, Ng, Chu-Carroll 2004] proposed data mining techniques that recommend relevant source code to a developer performing a modification task.

D'Ambros et al. [D'Ambros, Lanza 2006; D'Ambros, Lanza, Lungu 2009] presented Evolution Radar, a visualization technique for analyzing evolutionarily coupled items to detect architecture decay and coupled components. Pinzger et al. [Pinzger, Gall, Fischer, Lanza 2005] proposed a visualization method to detect potential refactoring candidates; they present module change couplings as Kiviat diagrams with edges connecting the modules. Beyer and Hassan [Beyer, Hassan 2006] proposed a visualization technique called Evolution Storyboards, a sequence of animated panels showing CVS repository files in which the distance between two files is computed from their change coupling values to spot clusters of related files.

Robbes et al. [Robbes, Lanza 2005; Robbes, Pollet, Lanza 2008] presented a fine-grained logical/evolutionary coupler whose goal is to overcome the shortcomings of version control systems (VCSs). Most VCSs are file-based rather than entity-based, and snapshot-based rather than change-based, i.e., changes that happen between two subsequent snapshots are lost. As a result, all files in a revision have their coupling increased by the same amount, regardless of how and how much they changed.

Kagdi et al. [Kagdi, Gethers, Poshyvanyk, Collard 2010] presented a conceptual and evolutionary coupling technique to improve change impact analysis in software source code. They use evolutionary couplings and mine source code commits using information retrieval to build conceptual couplings from the source code of a release of a software system. The premise is that such combined methods improve the accuracy of impact sets. A limitation of mining techniques is that a long period of history is necessary to produce quality change patterns with high support and confidence. Canfora et al. [Canfora, Ceccarelli, Cerulo, Di Penta 2010] addressed this problem by combining a mining approach with a statistical learning algorithm. The method uses the Granger causality test to infer consequent changes from evolutionary dependencies between multiple time series. The precision achieved is between 25% and 60% over four systems, while recall is between 15% and 60%. Our work differs in that a simple static measure is used instead of a learning algorithm.

2.3 Static Program Analysis

Source code artifacts and their development are recorded carefully via commits to version control systems. Source code repositories allow access to the source code at any point in time; such detailed, fine-grained evolutionary steps are a rich source for high-quality research. This kind of analysis is used to extract information for comparing system versions. In [Williams 2005; Williams, Hollingsworth 2004], information was automatically mined from the source code repository to support locating and fixing bugs; the aim was to detect function return values that introduce bugs into the system. Another bug type is incomplete refactoring, e.g., adding or removing parameters and renaming methods at different levels of the class hierarchy [Görg, Weißgerber 2005]. Incomplete refactorings may not be detected by compilers (non-overridden methods) and may harm the system; locating such bugs early reduces later complications. Kim et al. [Kim, Whitehead, Bevan 2005] presented a tool to identify function signatures across versions. Function signatures were traced for changes across versions and classified based on the data flow between called/calling functions.

RefactoringCrawler [Taneja, Dig, Xie 2007] and UMLDiff [Xing, Stroulia 2005] analyze code relationship similarity between versions of programs, looking for refactorings at a high level of granularity. [Kim, Pan, Whitehead 2005] presented an automated implementation of the origin analysis approach for Java, but reduced the number of change types the technique detects to the renaming and moving of code artifacts. [Kim, Notkin, Grossman 2007] proposed a technique for detecting change patterns by leveraging the similarity of program element names. Such techniques make intensive use of static analysis methods, which fit the third dimension, static analysis information, of the presented multidimensional hybrid detector.

Dex [Raghavan et al. 2004] detects syntactic and semantic changes from version history. Each version of the C code (C programming language) is represented as an abstract semantic graph (ASG). A differencing algorithm is applied to identify the nodes that are added, deleted, modified, or moved to obtain one ASG from another. Each pair of contiguous ASGs is analyzed to answer questions about specific changes, e.g., how many functions and function calls are inserted, added, or modified. Collard et al. [Collard, Decker, Maletic 2011; Maletic, Collard 2004] presented a syntactic meta-differencing approach to highlight syntax differences after encoding the source code as ASTs in an XML format, namely srcML. Added, deleted, or modified sections are marked in an extended srcML format, namely srcDiff, and the types of syntactic changes are then computed; this allows queries to be performed as XPath expressions on the srcDiff format. Weißgerber et al. [Weißgerber, Diehl 2006] presented a technique for identifying changes that are refactorings.

Software metrics are used to quantitatively assess software quality. Such quantities can be the size, effort, and complexity [Kagdi, Collard, Maletic 2007b] of software artifacts. Bieman et al. [M. Bieman, Andrews, Yang 2003] presented a metrics-based approach for detecting change-prone classes, i.e., classes that co-change frequently and are likely to change again in the future; visualization was used to understand these clusters of classes.

Origin analysis by Tu et al. [Tu, Godfrey 2002] determines whether entities were added or deleted from one version to the next. Evolution metrics such as lines of code, S-complexity, cyclomatic complexity, and the number of function parameters were measured for each artifact in a version and stored in a vector of evolution metrics. The similarity between two artifacts in different versions was represented by the Euclidean distance between their vectors. The similarity values were used for origin analysis of a given entity, where a smaller distance means it is more probable that an artifact in the current version originated from the other artifact in a previous version.
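A minimal version of that similarity computation might look like the sketch below (Python, with hypothetical metric vectors): each entity is described by a vector of evolution metrics, and the closest entity in the previous version is proposed as its origin. The specific metric values and entity names are illustrative only.

    import math

    def euclidean(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    # Hypothetical evolution-metric vectors: (LOC, cyclomatic complexity, #parameters).
    previous_version = {
        "util.c::pack":   (120, 14, 3),
        "util.c::unpack": (95, 10, 2),
    }
    current_name, current_vector = "util.c::pack_v2", (118, 15, 3)

    # Smallest distance = most likely origin of the current entity.
    origin = min(previous_version,
                 key=lambda e: euclidean(previous_version[e], current_vector))
    print(current_name, "most likely originated from", origin)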

2.4 Changesets

In the literature, changesets have mixed and varied definitions, primarily based on the unit of time under which a set of files is related. The traditional time window is 200 seconds [Zimmermann, Weisgerber, Diehl, Zeller 2004; Zimmermann, Weissgerber 2004], with the related files having the same author and log message in CVS repositories. Later, with the emergence of transactional commits, e.g., in Subversion, the time window was reduced to a single transaction [Kagdi, Collard, Maletic 2007b; Kagdi, Maletic 2007; Kagdi, Yusuf, Maletic 2006; Moser, Pedrycz, Succi 2008]. Rastkar and Murphy [Rastkar, Murphy 2009] defined changesets as transactions in which the files committed together relate to the same task.

Estublier et al. [Estublier et al. 2005] define a changeset as the files changed and committed in association with a feature or task. McNair et al. [McNair, German, Weber-Jahnke 2007] defined a changeset as a logical commit, i.e., the files modified for a single modification request. Hassan and Holt [Hassan, Holt 2004a] introduced what would be a gold-standard definition of a changeset: the set of changes responsible for a single maintenance task. This could be a single commit, part of a commit, or spread over multiple commits. Unfortunately, this ideal set is hard to define without disciplined versioning policies, which would require developers to commit only changes to files that are related to a given task. As reported by Vanya et al. [Vanya, Premraj, Vliet 2011], developers often commit changed files together that relate to multiple tasks. Systems supporting task-oriented versioning would also help in this regard.

Based on our empirical evaluation of typical Subversion commit sizes, we found that the majority of commits are small, while a few are massive, thus exhibiting a power-law-like distribution [Alali, Kagdi, Maletic 2008]. Similar observations were reported by other studies [Arafat, Riehle 2009; Riehle, Kolassa, Salim 2012]. This observation follows the convention of committing changes incrementally. It encouraged us to study the quality of changesets generated by grouping transactional commits from Subversion repositories using different time windows. We use the traditional evolutionary dependencies and their ability to predict change impacts. We use fine- to coarse-grained time windows: a single commit, an hour, a day, and a week of transactional commits. We further study how effectively the different time windows cross predict change impact against each other, and then how the combined evolutionary dependencies generated by different time windows can improve change predictability measures.
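A rough sketch of how such fine- to coarse-grained windows can be formed is shown below (Python, with hypothetical commit data): commits are bucketed by commit id, hour, day, or week, and the union of files in each bucket becomes one mining transaction. This illustrates the grouping idea only, under assumed data, not the exact tooling used in the later chapters.

    from collections import defaultdict
    from datetime import datetime

    # Hypothetical commit log: (revision, ISO timestamp, files touched).
    commits = [
        (101, "2009-04-01T10:05:00", {"a.cpp", "a.h"}),
        (102, "2009-04-01T10:40:00", {"a.cpp", "b.cpp"}),
        (103, "2009-04-02T09:00:00", {"c.cpp"}),
    ]

    def transactions(commits, window="day"):
        """Merge commits into transactions keyed by commit, hour, day, or week."""
        buckets = defaultdict(set)
        for rev, stamp, files in commits:
            t = datetime.fromisoformat(stamp)
            key = {
                "commit": rev,
                "hour": (t.year, t.month, t.day, t.hour),
                "day": (t.year, t.month, t.day),
                "week": tuple(t.isocalendar()[:2]),  # (ISO year, ISO week)
            }[window]
            buckets[key] |= files
        return list(buckets.values())

    print(transactions(commits, "day"))
    # [{'a.cpp', 'a.h', 'b.cpp'}, {'c.cpp'}]  (element order may vary)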

2.5 Failure Prediction and Maintenance Effort Using Metrics

Metrics of code, change, and collaboration have recently been used for failure prediction and maintenance effort estimation. Many models have been proposed using a single metric or a mixture of metrics, targeting higher failure prediction accuracy and better effort estimation. Some models also learn from previous and recent changes and defects.

2.5.1 Code Metrics

Code metrics, such as size and complexity metrics, are widely used for defect prediction and effort estimation, based on the idea that more complicated code is more likely to produce bugs because it is more sensitive to modification. In the 1970s, McCabe [McCabe 1976] and Halstead [Halstead 1977] defined static code attributes that were shown to assess code complexity. Basili et al. [Basili, Briand, Melo 1996] used the Chidamber and Kemerer metrics [Chidamber, Kemerer 1994], and Ohlsson et al. used McCabe's cyclomatic complexity on an Ericsson telecom system, for defect prediction. El Emam et al. used cyclomatic complexity metrics and Briand's coupling metrics [Briand, Daly, K. Wüst 1999] to predict faults in a commercial Java system [El Emam, Melo, Machado 2001]. Subramanyam et al. used cyclomatic complexity metrics on a commercial C++/Java system [Subramanyam, Krishnan 2003].

Gyimothy et al. [Gyimóthy, Ferenc, Siket 2005] used the Mozilla open source code and investigated, with statistical and machine learning methods, how strongly code size and cyclomatic complexity metrics correlate with the fault-proneness of classes. Nagappan et al. [Nagappan, Ball 2005a; Nagappan, Ball, Zeller 2006] used static analysis tools to predict pre-release defect density and showed a strong correlation with the actual pre-release defect density. Later [Nagappan, Ball, Zeller 2006], they used code metrics to predict post-release defects in Microsoft systems; the main finding was that predictors do work on an individual project, but no single predictor fits all. Zimmermann et al. [Zimmermann, Premraj, Zeller 2007] found a significant correlation between code metrics and pre- and post-release defects over three releases of Eclipse; they used logistic regression models to predict defects at the package and file level with high accuracy. Menzies et al. [Menzies, Greenwald, Frank 2007] used the Naïve Bayes learner and information theory for defect prediction on the NASA MDP repository. They observed that no set of code metrics consistently outperforms the rest for defect prediction; the best choice depends on the characteristics of the individual data sets, the information-theoretic feature selection methods, and the machine learners.

In [Herraiz, Hassan 2010], Herraiz et al. used one open source system to study the relationships between different size and complexity metrics. The results show that for non-header files written in C, all the complexity metrics are highly correlated with lines of code, and therefore the more complex metrics provide no further


information that could not be measured simply with lines of code. More recently,

[Herraiz, German, Hassan 2011] studied the distribution of software source code size and showed that it follows a double Pareto distribution, while it had been reported to be lognormal in previous literature [Concas, Marchesi, Pinna, Serra 2007]. They note that a lognormal can underestimate the size of large files, as in GNU/, where large files add up to 40% of the whole source code. Such underestimation could impact the accuracy of size estimation of an overall system, as in [Zhang, Tan, Marchesi 2009].

2.5.2 Code Change Metrics

Recently, change-history-based metrics have featured more prominently in defect and effort prediction research. Code churn, which is the amount of changed code collected from archived repositories, was proposed by Munson et al. [Munson, Elbaum 1998].

Nagappan et al. [Nagappan, Ball 2005b] showed that a relative code churn metric (e.g., normalized by a file's LOC) is very effective for defect prediction and better than absolute code churn, with accuracy up to 80%. Hayes et al. [Hayes, Patel, Zhao 2004] derived a model for estimating adaptive maintenance effort. They found that code and change metrics are strongly correlated with maintenance effort. Hayes et al. [Jane Huffman, Zhao 2005] used maintenance effort to build a maintainability prediction model. The model possesses predictive accuracy of up to 83%.

Ramil et al. [Ramil, Lehman 2000] presented models to predict maintenance effort using metrics of software evolution. They found that models based on coarse-granularity measures (subsystem level) perform similarly to those based on finer ones (module level). Graves et al.

[Graves, Karr, Marron, Siy 2000] show that change data (number of modifications, age of


files, size of changes, etc.) are better defect predictors than code metrics such as

McCabe’s cyclomatic complexity. De Lucia et al. [De Lucia, Pompella, Stefanucci 2002;

2005] presented a corrective maintenance effort estimation model. In their studies, they used multiple linear regression analysis to construct the effort estimation models and validated them against real maintenance project data. They found that the performance of the maintenance effort model can be improved if the types of the different maintenance tasks are taken into account. Ostrand et al. [Ostrand, Weyuker, Bell 2005] obtained highly accurate results for predicting the most fault prone files in a large software system:

20% of the files with the highest predicted number of faults contained on average 83% of the faults that were actually detected. In contrast to the study of Bell et al., file size was found to be an important fault predictor.

Knab et al. [Knab, Pinzger, Bernstein 2006] used static code attributes, in particular software size, together with a set of metrics derived from the change history of the

Mozilla project in order to build classification trees and obtained promising results (up to

59% of correctly classified instances). Bell et al. [Bell, Ostrand, Weyuker 2006] successfully applied negative binomial regression models to identify the most fault-prone files (20% of the files, which contain on average 75% of the total number of faults) in an industrial software system. They used lines of code, file age, change history, and programming language type as independent variables. They found that change data significantly improved (by a factor of two) prediction accuracy compared to a model that uses only lines of code as a predictor variable.


Ratzinger et al. [Ratzinger, Pinzger, Gall 2007] used 63 predictors including various size metrics, measures of the change history, and other process-related metrics (problem difficulty, team structure, etc.) for short-term defect prediction. They found that size and complexity measures do not dominate defect-proneness prediction; many people-related issues are more important. Graves et al. [Graves, Karr, Marron, Siy 2000; Moser, Pedrycz, Succi 2008; Ratzinger, Pinzger, Gall 2007] confirmed that change data and process metrics contain more discriminatory and meaningful information about the defect distribution in software than the source code itself. Moser et al. [Moser, Pedrycz, Succi 2008] used the number of revisions, previous fixes, age of file, and authors as independent variables for a defect prediction model. Hassan [Hassan 2009] used the entropy of changes to measure the complexity of code changes. He found that entropy is often a better bug predictor than change and code metrics.

Arisholm et al. [Arisholm, Briand, Fuglerud 2007] examined several data mining techniques used for fault prediction and validated their work on a large telecommunications product. The authors also discuss techniques of data collection, model selection, and model validation. Some of the discussed data mining techniques include logistic regression, neural networks, and decision trees. D'Ambros et al. [D'Ambros, Lanza, Robbes 2010a] conducted an extensive comparison of existing bug prediction approaches using source code metrics, change history metrics, past defects, and entropy of change metrics. They also proposed two novel metrics: churn and entropy of source code metrics.


Weyuker et al. [Weyuker, Ostrand 2010] at AT&T studied large closed systems over extended periods of years. The goal was to build a tool and a statistical model to identify buggy files prior to the system testing phase. The study confirms previous findings in the literature regarding the Pareto distribution of faulty files, where 80% of bugs originate from 20% of the files. A negative binomial regression was used to build the prediction model, using metrics as independent variables. The variables were a combination of code and change metrics (LOC of each file, added files, changes and faults in recent releases, and the programming language used in development). Weyuker et al. [Weyuker, Ostrand, Bell 2007] found that developer information helped to improve their prediction model based on file size and change data. Kim et al. [Kim et al. 2011] trained a machine learner on the features of top crashes of past releases that was able to effectively predict the top crashes well before a new release comes out.

2.5.3 Previous Changes and Defects

Hassan and Holt [Hassan, Holt 2005] presented a top ten list approach that locates the most defect-prone entities among the most recently changed and fixed files, using the defect repositories of six open source systems. They found that recently modified and fixed entities were the most defect-prone. Ostrand et al. [Ostrand, Weyuker, Bell 2005] predicted faults on two industrial systems using change and defect data.

Kim et al. [Kim, Zimmermann, Whitehead, Zeller 2007] presented a bug cache approach that uses recent changes and defects, assuming that faults occur in bursts. The approach approximates faults with bug-introducing changes [Śliwerski, Zimmermann, Zeller 2005]. They processed different artifact granularities: at the file level, the cache covers about 73-95% of future faults; at the function/method level, it covers 46-72% of future faults with a cache size of only 10%. Seven open-source systems were used to validate the findings. Bernstein et al. [Bernstein, Ekanayake, Pinzger 2007] used bug and change information in non-linear prediction models, where six Eclipse plug-ins were used to validate the approach.

2.5.4 Collaboration Metrics

Zimmermann and Nagappan [Zimmermann, Nagappan 2008] predicted defects in Windows Server 2003 using network analysis among binaries. They applied network analysis to dependency graphs for predicting failures in files. By applying metrics of centrality and network motifs to the directed dependency graphs of source code, they found that central components were more failure-prone. Furthermore, network metrics identified 60% of the critical, failure-prone binaries, which was better than object-oriented complexity metrics that identified only 30%. In addition to using the centrality metrics of closeness and betweenness, Zimmermann and Nagappan used statistical regression techniques similar to the ones we use in our analysis.

Bacchelli et al. [Bacchelli, D’Ambros, Lanza 2010] proposed popularity metrics based on e-mail archives. They assumed the most discussed files are more defect-prone.

Meneely et al. [Meneely, Williams, Snipes, Osborne 2008] proposed developer social network based metrics to predict defects. These proposed metrics play an important role in defect prediction, and yield reasonable prediction accuracy. However, they do not capture developers' direct interactions.


Weyuker et al. [Weyuker, Ostrand, Bell 2007] examined various releases of a large industrial software system to predict which files are most likely to contain the largest number of faults. Inspection guidance and automated testing efforts are among the applications intended for their fault prediction model. Their model is based on the negative binomial distribution, and its variables, based on developer information, attempt to capture the number and type of developers who have worked on any given file. Validation of their model included a comparison with a working model based on static code metrics and churn information. Weyuker et al. reported finding 84.9% of the faults in 20% of the files with the developer information, whereas without the developer information 83.9% of the faults were found.

Mockus and Weiss [Mockus, Weiss, Zhang 2003] used metrics based on developer information for failure prediction to assess risk in a large industrial software system.

Developer metrics included counts of distinct developers and a quantitative measurement of developer experience in terms of recent changes in the current project, experience in the subsystem, and experience in the product overall. They used stepwise variable selection to construct a logistic regression model for estimating post-release failures.

Arisholm and Briand [Arisholm, Briand 2006] identified developer experience and skill level as fundamental factors affecting fault-proneness in an object-oriented system.

Since they had no data on skills and experience of developers, they did not consider developer information in their model. Nonetheless, they used a stepwise logistic regression model and a cross-validation classification analysis to validate their results.

Most of the variables in their model could be classified in the categories of object-


oriented metrics and code churn information. Their results from cross validation analysis showed less than 20% false positives and false negatives, with an estimated verification effort savings of 29%.

Lopez-Fernandez et al. [Lopez-Fernandez, Robles, Gonzalez-Barahona 2004] proposed the idea of creating developer networks from source repositories as a method of characterizing projects. Their main focus was to organize open source projects into various categories based on models of collaboration. The developer networks that Gonzalez-Barahona and Lopez-Fernandez propose are constructed in a manner similar to ours, except that the edges of the graph are weighted by the number of files the pair has collaborated on. The weights in their network introduce variations on the centrality and connectivity metrics, such as a “clustering coefficient”. In addition to a developer network, they used a module network, where two modules were connected if they were committed together.

Huang and Liu [Huang, Liu 2005] used SNA based on source repositories to examine the learning process in Open Source projects. Their primary analysis involved using Legitimate Peripheral Participants, a network-based theory proposed by Lave and

Wenger [Lave, Wenger 1991]. Huang and Liu concluded that developers could be divided into core and non-core groups, which loosely affected a “project’s vitality and popularity” [Huang, Liu 2005].

Hudepohl et al. [Hudepohl et al. 1996] used developer information in combination with various other metrics to create a risk assessment tool at Nortel called EMERALD.

The developer information was a measurement of experience similar to the variables used


by Mockus and Weiss [Mockus, Weiss, Zhang 2003]. EMERALD’s developer variables, however, incorporated developer experience in terms of Nortel career, as opposed to specific projects. For example, one of the experience measurements was the count of the number of developers who were within their first ten code updates while working at

Nortel as a way to identify inexperienced developers. EMERALD’s other variables included complexity metrics, customer usage metrics, churn information, and past failure counts from both testing and post-release phases. Hudepohl et al. reported that over half of the field failure patches were correctly identified as “red” (highest risk) in 20% of the files.

Based on a study of the correlation of simple code metrics and maintenance effort, Polo et al. [Polo, Piattini, Ruiz 2001] demonstrated that it is possible to estimate the maintenance effort in the initial stages of a maintenance project, when the maintenance contract is being prepared and there is very little information available on the software to be modified.

CHAPTER 3

USING CHANGE METRICS TO IMPROVE THE DETECTION OF

EVOLUTIONARY COUPLINGS

This chapter describes an approach to improve the accuracy of evolutionary couplings uncovered from version history. Change metrics are combined with traditional methods for computing evolutionary couplings with the goal of reducing the number of false positives (i.e., inaccurate or irrelevant claims of coupling). The standard method of computing evolutionary couplings using data mining techniques is compared with an approach that incorporates change metrics with data mining.

Three different methods are used for the evaluation. The predictive ability of each approach is compared over seven systems. Next, the data mining interest measures of lift and confidence are used to compare the quality of the resulting rules produced. And, finally, a manual examination of a small code base is done to identify relevant couplings.

These results are used to compare the other two approaches via precision and recall. The results demonstrate that the use of change measures reduces the number of false positives produced.

3.1 Introduction

The work presented here addresses the problem of reducing the number of false positives by using statically derived change metrics, which quantify the extent of change to an artifact over time, to help filter irrelevant items. In particular, we evaluate the


efficacy of change metrics of line of code (LOCC), hunk (HC), and function (FC) churns to reduce false positives in the automatic identification of explicit and implicit couplings.

The change metrics used to identify couplings are logarithmically scaled to eliminate skewness in the analysis of real-world datasets, since the observed changes follow a power-law distribution.

Our evaluation confirms our hypothesis: change metrics improve the quality of results when detecting evolutionary couplings. Items coupled by their extent of change consistencies have stronger relationships than coupled items found without incorporating change metrics.

For example, consider two artifacts foo and bar. We find that they historically change together very often. But we also find that the number of lines changed in the two artifacts varies greatly. That is, sometimes foo changes by 10 lines and other times it changes by 100 lines; likewise for bar. This implies that the particular changes taking place each time may be quite different and possibly unrelated at times.

However, if we take the size of the change into account when constructing itemsets, we find that the pattern of foo changing by 6 lines and bar changing by 7 lines happens quite often in proportion to other combinations. Since the magnitude of the change is similar each time, we have more confidence that it is the same type of related change occurring and, hence, it is less likely to be a false positive.

The chapter is organized as follows. Section 3.2 presents evolutionary coupling and our time-window choice. Section 3.3 defines the change metrics and presents an empirical survey of change metrics distributions in the context of software evolution.


The change measure distributions are then used in Section 3.4 as a basis for our approach to detect change patterns. An evaluation of the approach is presented in Section 3.5.

Three different approaches for validation are used: manual inspection, an automated comparison of the predictive quality of the generated rules, and the use of interestingness measures from association rule mining to assess quality. Discussion and conclusions follow.

3.2 Detecting Evolutionary Coupling

Evolutionary couplings are determined by examining the history of a software system [D'Ambros, Lanza 2006; Gall, Hajek, Jazayeri 1998]. It is an alternative approach to traditional static or dynamic analysis. The intent is to find hidden dependencies and traceability links that are difficult to identify through structural or physical dependencies in the code. Software elements are evolutionarily coupled if they are frequently co-changed within a defined time window, with at least a minimum support, over a given duration of history. The typical manner to uncover these couplings is to search for patterns of frequently co-changing items in the history. Thus, data mining techniques are commonly used to address this problem. We use the Apriori algorithm for frequent pattern (or itemset) mining to uncover evolutionary couplings [Agrawal, Srikant 1994].

The technique searches for patterns of co-changing items. The underlying idea is that if items change together on a very frequent basis then they must be related to one another in either an explicit or implicit manner.
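As an illustration of this mining step, the following minimal Python sketch counts co-changing file pairs across changesets and keeps those that meet a minimum support; it covers only itemsets of size two (full Apriori extends the idea to larger itemsets), and the file names and support value are hypothetical.

from itertools import combinations
from collections import Counter

# Each changeset is the set of files modified in one Subversion revision.
changesets = [
    {"Sheet.cpp", "Sheet.h", "Cell.cpp"},
    {"Sheet.cpp", "Sheet.h"},
    {"Cell.cpp", "CellView.cpp"},
    {"Sheet.cpp", "Sheet.h", "CellView.cpp"},
]

def frequent_pairs(changesets, min_support):
    """Count co-changing file pairs and keep those meeting the minimum support count."""
    counts = Counter()
    for changeset in changesets:
        for pair in combinations(sorted(changeset), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(changesets, min_support=2))
# {('Sheet.cpp', 'Sheet.h'): 3}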

The technique has a number of parameters that must be selected for the particular problem and data set. The first is the size of a transaction (or changeset). The


transaction size dictates which items “change together”. Since logical commits and physical commits are typically not mapped one-to-one, a time duration/window is needed to group physical commits into logical ones. Prior studies conducted on CVS repositories typically employ a sliding window approach in which two subsequent changes committed by the same author with the same log message are part of one transaction if they are at most 200 seconds apart [Zimmermann, Weisgerber, Diehl, Zeller 2004].

Studies targeting Subversion base their differences on a revision. A revision is a commit that can incorporate multiple files. Revisions in Subversion have been used to cluster software elements based on the assumption that a commit is logically atomic [Collins-

Sussman, Fitzpatrick, Pilato 2004]. As such, we choose to use the traditional time window of a revision since all the systems studied are in Subversion.

The next parameter that must be selected is the minimum support. This value regulates the lower bound on the frequency of the patterns (itemsets) produced. Low values of minimum support generate larger numbers of patterns, while higher values produce fewer patterns but with more support. Lastly, we must select the granularity of the changing item. For source code this could be a line of code, a function, a class, a file, a module, or a subsystem. In our case we use the file level. This is a common level of granularity to use and maps well to commits. Each granularity level has pros and cons. For example, module dependencies uncover architectural issues and likely would miss detailed low-level dependencies. Patterns of co-changing methods result in finer dependencies but miss the big picture. It would be interesting to research the effectiveness of synergic


patterns from different granularities. We do not attempt this because it is beyond the scope of this project.

3.3 Change Metrics

We now introduce the use of change metrics to help filter out coincidentally co- changing artifacts. Here, a change metric is defined to be the extent of change to an artifact during a unit of time (e.g., a revision). This definition attempts to capture the idea that evolutionarily coupled artifacts require similar levels of change over a given unit of time. Pattern mining techniques for detection of evolutionarily coupled artifacts often produce misleading results. For example, a developer who modifies a file in a system may forget to modify related files because they are placed in other subsystems or packages [D'Ambros, Lanza 2006]. This is largely due to the fact that the couplings are derived entirely from the temporal relationship of commits of the files involved.

The extent of change of an artifact can be measured in a number of different ways.

Specifically, we measure the cumulative changed lines (i.e., the number added, deleted, or modified), the changed hunks (modified), and the changed functions (added, deleted, and modified) over a given period of time. A hunk is a standard term from differencing and is a contiguous group of changed lines along with contextual unchanged lines. In order to develop a method for identifying change patterns that incorporates measures of change, we must first understand how changes are distributed across files. This motivates our decisions about the analysis techniques presented later. We collected data on the number of lines of code changed during the lifetime of a number of projects. This information is readily available from the version history of the project. We sorted each


transaction (revision) by changed lines to produce a histogram. The histogram for KDE’s

KOffice (years 2001-2010) is shown in Figure 1.

Figure 1. Changed lines histogram for 2001-2010 KOffice, where the X-axis is bins of size two, Y-axis is the frequency of lines changed in each bin

The extent of change is clearly a power-law like distribution [Newman 2005].

Likewise, these distributions were observed for the number of modified hunks and functions. Figure 2 and Figure 3 show the distribution of the lines, hunks, and functions change metrics for KOffice. All three measures follow power-law distributions. The raw data is shown in a log-log plot in Figure 2. The same data is re-plotted in Figure 3, where the x-axis is binned logarithmically [Newman 2005] to smooth the otherwise noisy tails. The change values (the y-axis) are also scaled logarithmically to produce a log-log plot. So the LOC churn frequency plot in Figure 2 (a) is the same as Figure 3 (a), but with a smoother line; the other metrics follow similarly. We also observed similar power-law distributions in the change metrics of the other systems in our study (Table 1).
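The logarithmic binning used for Figure 3 can be sketched as follows; the bin base and the sample values are assumptions for illustration, not the exact procedure used to produce the plots.

import math
from collections import Counter

def log_binned_histogram(values, base=2.0):
    """Group positive values into logarithmic bins and normalize each count by bin width."""
    bins = Counter()
    for v in values:
        bins[int(math.log(v, base))] += 1              # bin index grows with log(value)
    histogram = {}
    for index, count in sorted(bins.items()):
        low, high = base ** index, base ** (index + 1)
        histogram[(low, high)] = count / (high - low)  # density, so wide tail bins stay comparable
    return histogram

changed_lines = [1, 2, 2, 3, 5, 8, 13, 40, 120, 900]   # hypothetical per-revision LOC churn values
for (low, high), density in log_binned_histogram(changed_lines).items():
    print(f"[{low:g}, {high:g}): {density:.3f}")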


(Figure 2 panels: a) LOC Churn, b) Function Churn, c) Hunks Churn)

Figure 2. Distributions of LOCC, HC, and FC metrics for KOffice repository. The data plotted on a log-log scale.


(Figure 3 panels: a) LOC Churn, b) Functions Churn, c) Hunks Churn)

Figure 3. Distributions of LOCC, HC, and FC metrics for the KOffice repository. The same data as in Figure 2, binned and plotted on log-log scales.

From this data, it is clear that the largest changes are relegated to a small number of transactions. The majority of the actual work being done is found in smaller but vastly more frequent transactions. This is a fact that can be exploited when discretizing the data to the levels of change, as we will see in the next section.


Table 1. Characteristics of the seven open source systems used in the study including number of years, baskets (revisions), files, and changed lines, functions, and hunks.

System Years Revisions Files Lines Functions Hunks

KDELibs 04-10 2,191 290 78,577 6,122 7,860

KOffice 01-10 16,919 2367 1,040,674 87,549 98,255

Httpd 01-10 6,776 247 334,883 18,069 29,190

Subversion 01-10 14,983 448 938,332 63,601 88,613

Ruby 01-10 9,243 249 521,388 40,653 60,894

Python 01-10 7,049 337 452,345 31,048 38,421

Xapian 01-10 3,251 254 115,275 8,576 12,341

Previously [Alali, Kagdi, Maletic 2008], we studied the size of typical commits

(Subversion revisions) and the correlation between the extent of change for lines and hunks. We observed that 75% of commits modify between 2 and 4 files, and modify around 50 lines of code with approximately 8 different hunks. Again, power-law distributions were observed for these measures. There is a strong positive correlation (up to 0.75) between changed lines and hunks. Despite the high correlation between the two variables, measuring both changed lines and hunks as distinct change metrics remains an important component of the analysis. It shows not only the editorial extent of the change

(LOCC) but also gives an indication about the distribution of that change over the source file (HC).


Our previous work [Alali, Kagdi, Maletic 2008] did not examine measures on functions. We hypothesize a correlation similar to that of changed lines and hunks, thus inferring similar meaning. The extent of functional change is a measure of the degree to which an implementation has changed, and not simply a source file. No single measure can encode the extent of change to a file. However, a combination can yield more descriptive measures than individual measures.

Table 1 presents the open source projects used in this work; the number of files in each system, along with the changed lines, functions, and hunks over the duration of the study, are given. Using this data we generated the chart in Figure 4 to present the collected measures. The chart shows four groups of bars: the three change metrics and the revisions for each project, where the last group of bars is the same ratio computed for revisions. The data must be properly normalized for comparison because the systems have differing numbers of files and the observed total numbers of changes were generated over different time spans. We thus calculated the total number of changes per file per year, and present the results on a log scale.

The following observations can be made. We see that Ruby is the most active and changing project, followed by Subversion. KDELibs, Xapian, and KOffice are the least changing development projects. Also, we clearly see a size correlation between the three measures. KOffice has the highest change values (Table 1); however, it has the lowest committing rate and it is not that active. This is in part due to its large number of files.

Note that we use log10 for the Y-axis for a better normalization with respect to the other data values in the chart.


Figure 4. Size ratios (total change / no. of files / no. of years), where change is the collected LOCC, FC, and HC for the studied projects, plus the same ratio for revisions (rev).

3.4 Adding change metrics

In this section, we present our approach to using change metrics to support the detection of evolutionary change patterns between software artifacts. The use of change measures for filtering out irrelevant patterns differentiates this work from previous approaches. Recently, it has been shown [Arisholm, Briand, Johannessen 2010; Kamei et al. 2010; Menzies et al. 2010; Thilo, Koschke 2009] that bug prediction models perform significantly better when effort measures are taken into account. In these cases, a simple measure of LOC (of a file) was used to represent effort. However, we feel that our change metrics play a major role in determining evolutionary couplings, although we apply them here to a different problem.

Simply put, if two files change frequently together and the changes are similar in the extent of change each time, there is a higher likelihood that it is the same kind of change


each time. That is, if a small number of lines (say 5) change in one file and a medium number (say 15) change in another file frequently together, it is more likely to be the same change taking place each time. If the number of lines changing differs greatly each time, the likelihood of other types of changes or unrelated changes is higher. More succinctly, the core idea is: given a frequent pattern of co-changing items, we measure the extent of change for each instance of the co-changes. If all instances change with the same levels each time, we say that the pattern has consistency with respect to change.

This means that each time the changes occurred, they happened in about the same manner with respect to the level of change. One thing we can infer from this is that these changes are happening for the same reason each time. This consistency of the extent of change forms our heuristic to identify higher quality patterns and filter out false positives.

Co-changing artifacts with inconsistent levels of change are assumed to not represent an actual dependency (i.e., the co-change is coincidental).

As such, we need to discretize the change metrics into a set of levels or categories so general comparisons can be made between them. Because of the power-law distribution, we logarithmically scale the data into three discrete levels of change. In practice, any number of levels could be used; the more levels used, the finer the change granularity.

3.4.1 Discrete Change metrics

We define the extent of change on an artifact as a vector,

C = <LOCC, FC, HC>, where LOCC is the number of changed lines (modified, added or deleted), FC is the number of changed functions (modified, added or deleted), and HC is the number of changed hunks (modified) in an artifact during the observed time period.

Because of the observed distributions of these values reported in Section 3.3, simply constructing the vectors over raw data will skew subsequent analyses. In order to reduce the effect of skew caused by the power-law distributions, we logarithmically scale and discretize the observed data into the integer range [1, 3] such that 1 indicates the least amount of change and 3 indicates the greatest amount of change. The scaling is defined as:

s(x) = ⌊ q × ( log(x) / log(max) ) ⌋ + 1      (1)

From Equation 1, x is the actual change value and s(x) is the scaled change value, max is the maximum observed change measure value over the observed history of the project, and q is the number of division points between the minimum and the maximum observed values. This gives the required number of change levels: q = number of change levels − 1; here, we assign q = 2, which produces 3 change levels. For example, the maximum number of changed lines for a file in the observed history of KOffice was 6,085 source lines. In one revision, modifications to the file abiwordimport.cc changed 129 lines. For that file, its scaled change level is 2, a medium change.
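The scaling of Equation 1 translates directly into code. The sketch below assumes the natural logarithm (any base gives the same ratio) and clamps the raw value to at least 1; that clamp is our reading, not part of the original definition.

import math

def scaled_change_level(x, max_change, q=2):
    """Discretize a raw change value into q + 1 levels using Equation 1."""
    x = max(x, 1)  # guard: a touched artifact changes by at least one unit (our assumption)
    return math.floor(q * math.log(x) / math.log(max_change)) + 1

# KOffice example from the text: max changed lines = 6,085; abiwordimport.cc changed 129 lines.
print(scaled_change_level(129, 6085))   # 2, a medium change
print(scaled_change_level(6085, 6085))  # 3, the largest observed change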

3.4.2 Change metrics + Mining

We now extend traditional techniques for detecting evolutionary coupling with the change vectors described. The approach takes the change values into consideration when determining the frequently occurring patterns. The same Apriori algorithm for frequent


pattern (or itemset) mining [Agrawal, Srikant 1994] is used. The only difference is the input into that algorithm.

For each changing item (i.e., file) we compute the extent of change. The change measure value is 1, 2, or 3. The item (file) name is relabeled to reflect the change value.

For example, say a file named foo.cpp has a change level of 3 in a particular revision. For this particular transaction the file name is augmented with the change measure value and relabeled as foo.cpp3. This added information translates one item into possibly three specialized items, each corresponding to a file with a particular change level. The frequent pattern-mining algorithm is run on this augmented data, resulting in a set of new patterns that reflect the change measure values. This type of relabeling approach is a common practice in data mining [Witten, Frank, Hall 2011].

The patterns generated tell us two things: 1) what changed (a particular file) and 2) how it changed. Let us now look at a simple example. Say there are 40 occurrences of

(foo, bar) co-changing in our history. We use a minimum support of 10, so this represents a pattern. Adding the change data we have 10 occurrences of (foo1, bar3), 6 of

(foo2, bar1), 7 of (foo1, bar2), 16 of (foo2, bar3) and 1 of (foo1, bar1). No other combinations occur. With a minimum support of 10 we get two patterns (foo1, bar3) and

(foo2, bar3), but these two only cover 26 occurrences of the more general (foo, bar) pattern. The technique filters out occurrences of (foo, bar) that did not co-change with the same consistent change levels at the given minimum support. This heuristic attempts to identify only the files that co-change together consistently in the same (approximate) manner.
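A small sketch of the relabeling and re-mining step is given below; the history, change levels, and minimum support are hypothetical and simply mirror the foo/bar discussion.

from collections import Counter
from itertools import combinations

def relabel(changeset_with_levels):
    """Append the discretized change level (1-3) to each file name, e.g. foo.cpp -> foo.cpp3."""
    return {f"{name}{level}" for name, level in changeset_with_levels.items()}

# Hypothetical history: each entry maps a file to its change level in that revision.
history = [
    {"foo.cpp": 2, "bar.cpp": 3},
    {"foo.cpp": 2, "bar.cpp": 3},
    {"foo.cpp": 1, "bar.cpp": 3},
    {"foo.cpp": 1, "bar.cpp": 1},
]

counts = Counter()
for changeset in history:
    for pair in combinations(sorted(relabel(changeset)), 2):
        counts[pair] += 1

min_support = 2
print({p: n for p, n in counts.items() if n >= min_support})
# {('bar.cpp3', 'foo.cpp2'): 2} -- only the consistently co-changing combination survives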


We note that the itemsets covered by a pattern created using change measure information are a subset of those covered by a pattern created without change information

(when the minimum supports are the same). That is, the patterns of change-based evolutionary couplings are a subset of the patterns of evolutionary couplings.

3.4.3 Implementation

The implementation of the approach is achieved with two tools: Δshopper (pronounced “delta shopper”) and Koupler2 (read “coupler squared”). These tools target data available from the Subversion version control system. Δshopper is responsible for extracting metadata and differences from a Subversion repository. The program takes a unit of time (δ) and extracts information about the modification of artifacts over the course of the history. The program identifies modifications to files within each time δ and computes, for each file, its observed change measure values for LOCC, FC, and HC.

To collect LOCC, Δshopper uses Subversion’s unified diff output to count the added and deleted source lines by counting the ‘+’ and ‘–’ markers. Likewise, the HC is easily computed by counting the number of ‘@@’ hunk delimiters.
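A minimal sketch of this counting over a unified diff follows; it assumes Subversion's standard unified diff format and is not the actual Δshopper implementation.

def count_locc_and_hc(diff_text):
    """Count changed lines ('+'/'-') and hunks ('@@') in a unified diff for one file."""
    locc = hc = 0
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            hc += 1
        elif line.startswith("+") and not line.startswith("+++"):  # skip the +++ file header
            locc += 1
        elif line.startswith("-") and not line.startswith("---"):  # skip the --- file header
            locc += 1
    return locc, hc

diff = """\
--- foo.cpp (revision 41)
+++ foo.cpp (revision 42)
@@ -10,4 +10,5 @@
 context line
-old line
+new line
+added line
"""
print(count_locc_and_hc(diff))  # (3, 1)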

To collect FC, we first convert each versioned source code file to srcML [Maletic,

Collard 2004] and find each function’s region, the contiguous lexical range of lines in which the function is declared or defined. We map diff hunks (change regions) onto the function regions to find overlapping regions of modification. If a change region overlaps with a function’s region, then the function must have been changed by the commit. The data is


output as an XML file such that each time δ defines a transaction consisting of artifact/extent-of-change pairs.
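The overlap test between change regions and function regions amounts to interval intersection, sketched below; in the real tool chain the function regions come from srcML, but here they are hypothetical (start, end) line ranges.

def changed_functions(function_regions, change_regions):
    """Return the names of functions whose line range overlaps at least one changed region."""
    changed = set()
    for name, (f_start, f_end) in function_regions.items():
        for c_start, c_end in change_regions:
            if c_start <= f_end and f_start <= c_end:  # standard interval-overlap test
                changed.add(name)
                break
    return changed

# Hypothetical function regions (as srcML would yield) and hunk regions (from the diff).
functions = {"Sheet::recalc": (10, 40), "Sheet::paint": (50, 90)}
hunks = [(35, 38), (120, 125)]
print(changed_functions(functions, hunks))  # {'Sheet::recalc'}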

The Koupler2 program takes the generated XML file as input and computes scaled and discretized change vectors for each artifact in each time period as described in Section 3.4.1. Frequent pattern mining (via the Apriori algorithm [Agrawal, Srikant 1994]) is applied with an initial minimum support of s/T, where T is the number of observed time periods, here the number of revisions. Initially we start with a high value of s such that no patterns are produced. If the algorithm does not yield at least N frequent patterns, the minimum support is decremented by one and the algorithm is rerun. This continues until the minimum required number of patterns N has been found or the minimum support drops below 5/T. This is a standard manner of determining a minimum support [Kagdi,

Maletic, Sharif 2007; Zimmermann, Weisgerber, Diehl, Zeller 2004].
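The support-lowering loop described above can be summarized as follows; mine_patterns stands in for the Apriori call and is a placeholder, and the default constants simply mirror the values in the text.

def find_patterns(transactions, mine_patterns, n_required=1000, s_start=50, s_floor=5):
    """Lower the minimum support count until at least n_required patterns are found
    or the support floor is reached, as described for Koupler2."""
    total = len(transactions)   # T, the number of observed time periods (revisions)
    s = s_start
    patterns = []
    while s >= s_floor:
        patterns = mine_patterns(transactions, min_support=s / total)
        if len(patterns) >= n_required:
            break
        s -= 1                  # decrement the support count and rerun the mining
    return s, patterns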

The heuristics guiding the search were determined experimentally from the datasets; they may vary for different units of time δ. Koupler2 computes both evolutionary couplings and change-based evolutionary couplings for the specified time window δ. The tools generate results for both approaches. For change-based evolutionary coupling, any combination of change metrics can be used; namely LOCC, FC, HC, LOCC + FC, etc.

3.5 Evaluation

To validate the approach, we need to verify that Change-based Evolutionary

Coupling (CEC) patterns produce fewer false positives than Evolutionary Coupling (EC) patterns. We examine seven open source projects with version histories spanning 7 to 10 years. A list of these projects was given previously in Table 1. Notice that our approach


only considers files that existed on the final date of the time period. If a file did not exist at the end of the time period it is not relevant to future versions.

Evaluating the approach is difficult because of the lack of a gold standard. That is, we do not have a correct list of all the coupled artifacts. To overcome this limitation, we evaluate our claim using three different methods from three different angles to solidify our study. We hypothesize that patterns produced using change-based evolutionary coupling have fewer false positives and higher quality than patterns produced using the classical evolutionary couplings.

We first use the traditional method to evaluate the presented approach, as in

[Canfora, Ceccarelli, Cerulo, Di Penta 2010; Zimmermann, Weisgerber, Diehl, Zeller

2004]. We compare the two approaches using the predictive ability of the association rules produced, by training a model and using it to predict future changes. Precision and recall are used to compare the approaches [Buckland, Gey 1994].

Second, we use the standard data mining interestingness measures of confidence and lift to assess the quality of the rules that are generated from the change patterns. This is a more theoretical means to compare the two approaches. These measures give an assessment of the interest and quality of the association rules generated from the patterns.

Lastly, we evaluate our approach by manually determining the correct set of patterns for a small system. Doing this for a large system is not very practical due to the large amount of time necessary to inspect and learn a system. Specifically we examined a subsystem of KOffice to determine the correctness of the patterns produced by each approach. Precision and recall are used to compare the approaches.


3.5.1 Evaluation Using Prediction

A traditional means of validating the quality of association rules is to generate them from a part of the history (training set) and then see how well they predict future changes in a later part of the history (test set). In [Zimmermann, Weisgerber, Diehl, Zeller 2004], the authors used co-change information (evolutionary coupling) to predict entities

(classes, methods, fields, etc.) that are likely to be modified. A prediction model correlates with the quality of the association rules generated from the frequent patterns.

Here, for a given sequence of time-ordered transactions of a system, we divide the sequence into a training set and a test set. We use 75% of the transactions as our splitting point; the training set is the first 75% and the test set is the remainder. We use the training set to generate the patterns of change using Koupler2. After generating all EC and CEC patterns (with a selected change measure), we generate all possible association rules with at least 50% confidence.


Table 2. Prediction accuracy and completeness over seven open source systems. P = Precision, R = Recall, Fm = F-Measure, LFH = LOCC ∪ FC ∪ HC

OSS          EC Type     k = i+0                 k = i+5
                         P      R      Fm        Avg P   Avg R     Fm
Httpd        EC          0.46   0.04   0.07      0.10    0.001     0.003
             CEC, FC     0.42   0.03   0.05      0.25    0.003     0.007
             CEC, LFH    0.40   0.05   0.08      0.30    0.003     0.007
Python       EC          0.30   0.01   0.01      0.00    0.00003   0.0001
             CEC, FC     0.22   0.01   0.01      0.07    0.001     0.002
             CEC, LFH    0.13   0.02   0.03      0.07    0.001     0.002
Ruby         EC          0.28   0.03   0.06      0.45    0.005     0.01
             CEC, FC     0.33   0.03   0.0       0.34    0.003     0.005
             CEC, LFH    0.18   0.05   0.08      0.40    0.004     0.009
Subversion   EC          0.40   0.03   0.06      0.20    0.001     0.002
             CEC, FC     0.34   0.04   0.07      0.21    0.001     0.003
             CEC, LFH    0.28   0.05   0.09      0.22    0.002     0.003
KDELibs      EC          0.31   0.02   0.03      0.20    0.002     0.004
             CEC, FC     0.20   0.01   0.02      0.42    0.003     0.006
             CEC, LFH    0.15   0.01   0.03      0.39    0.003     0.006
KOffice      EC          0.30   0.03   0.05      0.15    0.0002    0.0005
             CEC, FC     0.23   0.02   0.04      0.30    0.0003    0.0006
             CEC, LFH    0.25   0.04   0.07      0.31    0.0004    0.0008
Xapian       EC          0.42   0.04   0.08      0.42    0.004     0.0078
             CEC, FC     0.30   0.02   0.05      0.63    0.004     0.009
             CEC, LFH    0.24   0.04   0.07      0.44    0.004     0.007

The patterns are generated as described in Section 3.4.3 (Implementation). The tool chain (Δshopper + Koupler2) was run on the studied systems. CEC is generated with FC as the change measure. Given the training set, we configured Koupler2 with an initial minimum support of 50/T and then lowered it until N patterns were produced. We selected N to be 1,000 as this value allows for the generation of enough patterns to compare the two approaches. We perform an evaluation similar to what Zimmermann et al. [Zimmermann, Weisgerber, Diehl, Zeller 2004] and Canfora et al. [Canfora,

Ceccarelli, Cerulo, Di Penta 2010] did in their work.

We compare the predictability (precision) of the association rules generated by both approaches and also investigate their completeness (recall) [Buckland, Gey 1994]. We examine the left-hand side (X) and right-hand side (Y) of a rule against the test set transactions, as in [Canfora, Ceccarelli, Cerulo, Di Penta 2010; Zimmermann, Weisgerber, Diehl, Zeller 2004]. Such cases are called the same subsequent (k = i+0), where i is the index of a transaction that contains X and k is the index of the subsequent transaction. In [Canfora, Ceccarelli, Cerulo, Di Penta 2010; Zimmermann, Weisgerber, Diehl, Zeller 2004] the authors also looked for a subsequent of k = i+5, that is, Y matches within five subsequent transactions from transaction i.
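Our reading of this matching procedure is sketched below; rules are (antecedent, consequent) pairs, the transactions are the time-ordered test set, and horizon = 0 and horizon = 5 correspond to k = i+0 and k = i+5. It is an illustration, not the exact evaluation harness.

def rule_hits(rules, transactions, horizon=0):
    """For each transaction containing a rule's antecedent, check whether the consequent
    appears in that transaction or within `horizon` subsequent transactions."""
    hits = misses = 0
    for i, transaction in enumerate(transactions):
        window = set().union(*transactions[i:i + horizon + 1])
        for antecedent, consequent in rules:
            if antecedent <= transaction:        # the rule applies to this transaction
                if consequent <= window:         # prediction confirmed within the window
                    hits += 1
                else:
                    misses += 1
    return hits, misses                          # precision of the rule set is hits / (hits + misses)

rules = [(frozenset({"Sheet.cpp"}), frozenset({"Sheet.h"}))]
test = [{"Sheet.cpp", "Sheet.h"}, {"Cell.cpp"}, {"Sheet.cpp"}, {"Sheet.h"}]
print(rule_hits(rules, test, horizon=0))  # (1, 1)
print(rule_hits(rules, test, horizon=5))  # (2, 0)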

Table 2 shows the results from running a prediction experiment on EC patterns and two variations of CEC patterns, using FC and the union of LOCC, FC, and HC. We used subsequents k = i+0 and k = i+5, and computed the precision, recall, and F-Measure (a weighted average of the precision and recall) [Blair 1979]. Our values here are not that high in comparison with the work in [Canfora, Ceccarelli, Cerulo, Di Penta 2010;

Zimmermann, Weisgerber, Diehl, Zeller 2004], and we have two main reasons for that.

First, unlike [Canfora, Ceccarelli, Cerulo, Di Penta 2010; Zimmermann, Weisgerber, Diehl, Zeller 2004], where only the top 10 (or top-N) rules ranked by confidence are considered for evaluation, we use all rules with at least 50% confidence, a large difference in the number of rules evaluated. Second, we are using a time period approximately 10 times longer (7 to 10 years). In [Canfora, Ceccarelli,

Cerulo, Di Penta 2010; Zimmermann, Weisgerber, Diehl, Zeller 2004] the study was conducted over only a few months. Regardless, our goal, here, is to compare EC vs. CEC and we see that CEC produces better quality patterns.

The main observation from Table 2 is that change-based evolutionary coupling consistently has higher precision for k = i+5, while the classical evolutionary coupling is better for k = i+0, except that it is the other way around for Ruby. Predicting subsequences farther ahead is more crucial for impact analysis.

3.5.2 Interestingness Measures

In our first evaluation we covered implicit and explicit dependencies by assessing the predictive ability of the association rules, training a model and using it to predict future changes over the seven systems. In the second evaluation, we use data mining techniques to assess the quality of a pattern in the context of change impact analysis [Arnold, Bohner

1996]. Again, we use an approach similar to the work in [Zimmermann, Weisgerber,

Diehl, Zeller 2004] on change prediction in the context of assessing association rules quality. Here, we assess the interestingness of the mined association rules.

It is quantitatively sufficient to measure the quality of the generated rules using a combination of Agrawal et al.’s [Agrawal, Imieliński, Swami 1993] support-confidence framework with lift or leverage [Damaševičius 2009; Soman, Diwakar, Ajay 2006].


Here, we generate association rules from our CEC and EC patterns, then compare the values of both confidence and lift.

confidence(X → Y) = P(Y | X) = supp(X ∪ Y) / supp(X)      (2)

Confidence and lift are measures of significance and interestingness for association rule mining. Given a rule X → Y, the confidence of the rule [Agrawal, Imieliński, Swami 1993] is defined as the probability of having the rule's consequent (Y) under the condition that the transactions also contain the antecedent (X). Confidence is evaluated using the conditional probability P(Y|X); lift is the ratio of the actual probability observed to that expected if X and Y were independent. Confidence is a commonly used measure for the quality of an association rule, see Equation 2.

lift(X → Y) = supp(X ∪ Y) / (supp(X) × supp(Y))      (3)

Again, the tool chain (Δshopper + Koupler2) was run on the studied systems and the results for both approaches, CEC and EC, are shown in Table 3. CEC is generated with LOCC as the change measure. We configured Koupler2 with an initial minimum support of 50/T and then lowered it until N patterns were produced. Again we selected N to be

1,000 as this value allows for the generation of enough patterns to compare the two approaches. This is somewhat of an arbitrary value and in practice almost any value could be used. However, as can be seen in Table 3, the value of 1000 is within the range of the number of patterns produced with minimum support values determined using the


standard data mining practice. This value appears to result in the identification of many

(possibly all) relevant patterns along with irrelevant ones. Had we selected a very low value (say 100), we would have missed a large number of relevant patterns. The patterns generated in Table 3 can be aligned with the activity plot in Figure 4. Ruby and Subversion are the most active and at the same time have the highest minimum supports and generate around 1,000 patterns.

Table 3. Seven open source systems and the number (#I) of generated CEC patterns using changed lines (L), functions (F), and hunks (H), EC patterns and their minimum supports (mS)

System        #revs     CEC, LOCC      CEC, FC        CEC, HC        EC
                        mS     #I      mS     #I      mS     #I      mS     #I
KDELibs       2,191     5      34      5      62      5      48      5      414
KOffice       16,919    5      550     5      8,949   5      695     9      1,187
Httpd         6,776     5      187     5      562     5      270     6      1,126
Subversion    14,983    7      5,397   10     1,359   8      1,064   20     1,036
Ruby          9,243     5      1,541   9      1,061   5      1,603   19     1,196
Python        7,049     5      569     7      1,034   5      598     15     1,213
Xapian        3,251     5      132     5      308     5      122     5      837

Using the patterns in Table 3, we generate all possible association rules by constructing all the combinations obtained by splitting each pattern into two subsets. This can produce an enormous number of rules, so we set constraints on the generation of rules.

Let A and B be disjoint subsets of a pattern I (EC or CEC) such that A ∪ B = I. From Equation 3, lift(A → B) is equal to lift(B → A), which is not the case for confidence, see Equation 2. We generated only the association rules A → B where |A| ≤ |B|, without duplications. To further reduce the problem, we only consider rules where confidence(A → B) ≥ 0.9.

We now address which approach (EC vs. CEC) produces change patterns with higher confidence values. For the patterns generated in Table 3, we computed confidence values using Equation 2. For each approach we sort the values and then break them down into 10 deciles (the 0th through 100th percentile points), where each part represents 1/10 of the observed values. Then we compare the first decile value (0th percentile), the minimum value from each side, then the second point (10th percentile), and so on until the 100th percentile (i.e., the maximum value). At each decile comparison we compute a win or lose point (a score).

To assign a score at each percentile, the following rule is used: assign 0 to the loser and 1 + 1 × k to the winner, where k is the percentile expressed as a fraction (0.0 to 1.0). If the values are equal we use 0 for both, as there is no winner. The reasoning behind such a rule is that a win at a higher percentile is more valuable than a win at a lower percentile (much like a weighted score). Notice the last two columns of Table 4. The final score (Table 4) is the total of the scores at each point. Figure 5 shows Table 4 as a win-lose plot, where high is a win and low is a loss.
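The scoring rule can be written compactly as below, using the KOffice confidence deciles of Table 4; the helper name and list form are ours, and both inputs are assumed to already be reduced to the eleven percentile points (0th through 100th).

def win_lose_scores(values_a, values_b):
    """Score two lists of eleven percentile values (0th..100th) with the weighted win-lose rule."""
    score_a = score_b = 0.0
    for i, (a, b) in enumerate(zip(values_a, values_b)):
        k = i / 10.0            # percentile expressed as a fraction, 0.0 .. 1.0
        if a > b:
            score_a += 1 + k
        elif b > a:
            score_b += 1 + k
        # equal values score zero for both sides
    return score_a, score_b

cec = [1.0] * 11
ec = [0.9, 0.9, 0.9, 0.9, 0.91, 0.94, 1.0, 1.0, 1.0, 1.0, 1.0]
print(win_lose_scores(cec, ec))  # (7.5, 0.0) up to floating-point rounding, matching Table 4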

When two points overlap it represents a match (a tie), where each point represents a percentile.

Applying the confidence measure of Equation 2, we use the win-lose plot to compare the confidence distributions of the two approaches. Figure 5 presents the win-lose chart of the ordered values using a 10-point percentile (decile) approach. Here, for KOffice 2001-2010, CEC has a better distribution, with higher values. This implies CEC has better


predictive rules with higher confidence. From Figure 6 we can see that the overall confidence clearly leans toward CEC. Xapian is the only win for EC (and only by a slight difference). Now we rerun the same procedure for lift.

Confidence is sensitive to high frequencies of the consequent Y; consequents that have higher support will automatically produce higher confidence values even if there exists no association between the items [Hahsler, Hornik, Reutterer 2006]. Lift overcomes this sensitivity and is a measure of how many times more often X and Y occur together than expected if they were statistically independent. Lift tells us how much better a rule is at predicting the result than just assuming the result.
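Both measures of Equations 2 and 3 can be computed directly from transaction counts, as in this generic sketch; the transactions and items are hypothetical and the code is not tied to the tool chain.

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Equation 2: supp(X U Y) / supp(X)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Equation 3: supp(X U Y) / (supp(X) * supp(Y)); symmetric in X and Y."""
    return (support(antecedent | consequent, transactions) /
            (support(antecedent, transactions) * support(consequent, transactions)))

transactions = [{"foo", "bar"}, {"foo", "bar"}, {"foo"}, {"baz"}]
x, y = frozenset({"foo"}), frozenset({"bar"})
print(confidence(x, y, transactions))  # 0.666...
print(lift(x, y, transactions))        # 1.333...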


Table 4. Deciles of confidence values for KOffice and their win-lose scores.

Kth Decile   CEC, LOCC   EC     CEC, LOCC score   EC score
0            1.0         0.9    1                 0.0
1            1.0         0.9    1.1               0.0
2            1.0         0.9    1.2               0.0
3            1.0         0.9    1.3               0.0
4            1.0         0.91   1.4               0.0
5            1.0         0.94   1.5               0.0
6            1.0         1.0    0.0               0.0
7            1.0         1.0    0.0               0.0
8            1.0         1.0    0.0               0.0
9            1.0         1.0    0.0               0.0
10           1.0         1.0    0.0               0.0
Total Final Score               7.5               0.0

Figure 5. The win-lose plot and score for the 10-point percentile distribution of confidence for rules generated from the patterns for KOffice. CEC patterns were generated by enforcing LOCC consistencies.


For a rule A → B, a higher lift implies that the existence of A and B co-occurring in a transaction is not a random coincidence but due to some relationship between them.

The purpose of such models is to identify a subgroup (target) from a larger population.

The target members selected are those likely to respond positively to a marketing offer.

A model is successful if the response within the target is much better than average for the population as a whole. Lift is computed as the ratio of these values: target response divided by average response, see Equation 3 for formal definition of lift [Agrawal,

Imieliński, Swami 1993]. As with confidence, we calculate the lift values for the generated rules; here, CEC is generated with HC as the change measure.

Figure 7 shows the final lift win/lose results for all studied systems. Again, with Xapian as the only EC win, CEC clearly produces association rules of higher interest and better predictability from the selected patterns.


Figure 6. The final confidence scores of the seven open source systems for the association rules generated from the CEC (LOCC) and EC patterns.

Figure 7. The final lift scores of the seven open source systems for the association rules generated from the CEC (HC) and EC patterns.


3.5.3 Manual Validation of Patterns

We compute precision and recall [Buckland, Gey 1994] of the patterns produced by both CEC and EC for KSpread, a subsystem of KOffice. KSpread (recently renamed to KCells) is a spreadsheet program with 409 stable C++ source files. The study covers 1,556 revisions between 2006 and 2010. To do this, we manually identify all the relevant evolutionarily coupled artifacts. This is a non-trivial task since there are 83,436 pairs of files to examine. We determined the set of evolutionarily coupled artifacts by first finding all EC patterns and then manually identifying the relevant evolutionarily coupled files, thereby determining the false positives.

The EC and CEC patterns are generated with the same minimum support. Again, we are using a time window of one revision. We consider the EC patterns generated to be the space of all possible patterns because the patterns produced by CEC are a subset of those. So we assume the recall for the EC experiment is 100% and determine the precision from the manual inspection. Clearly, the EC approach may miss some actual couplings, but in comparing the two approaches this makes little difference.

We determine a minimum support that generates a number of EC patterns that is reasonable for manually filtering the relevant evolutionarily coupled patterns and at the same time generates a non-empty set of CEC patterns. For KSpread, we used a minimum support count of 6 and generated 308 EC patterns of size two or more. This gave us a manageable number of patterns to examine for both approaches.

We identify the relevant EC patterns from the 308 generated to form our baseline.

Due to the downward closure property (anti-monotonicity), which states that all subsets of a frequent pattern are also frequent [Agrawal, Srikant 1994], we only need to filter the patterns of size two; any pattern that contains an irrelevant pair is then also irrelevant. This reduced the space to 148 patterns of size two. We manually checked the changes to determine which patterns reflect actual relevant couplings.
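The automatic filtering of larger patterns by their constituent pairs can be sketched as follows; relevant_pairs stands for the manually validated pairs and the file names are hypothetical.

from itertools import combinations

def filter_patterns(patterns, relevant_pairs):
    """Keep only patterns in which every pair of items is a validated coupling."""
    kept = []
    for pattern in patterns:
        pairs = {frozenset(p) for p in combinations(pattern, 2)}
        if pairs <= relevant_pairs:
            kept.append(pattern)
    return kept

relevant_pairs = {frozenset({"Sheet.cpp", "Sheet.h"}),
                  frozenset({"Sheet.cpp", "SheetView.cpp"}),
                  frozenset({"Sheet.h", "SheetView.cpp"})}
patterns = [{"Sheet.cpp", "Sheet.h", "SheetView.cpp"},
            {"Sheet.cpp", "Sheet.h", "Cell.cpp"}]
print(filter_patterns(patterns, relevant_pairs))
# [{'Sheet.cpp', 'Sheet.h', 'SheetView.cpp'}] -- the second pattern contains an unvalidated pair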

For each of the 148 patterns of size two we examined the code for physical dependencies between files, either direct or with up to two levels of indirection. If a link existed between a pair of files then it was deemed a relevant pattern (e.g., Sheet.cpp and Sheet.h). Additionally, we looked for files that have very similar code structure (clones). If two files were very similar we considered this a relevant pattern [Geiger, Fluri, Gall, Pinzger 2006]. Overall, this task took approximately 46 hours. We identified 96 relevantly coupled pairs from the 148 patterns of size two. These results were used to filter out the irrelevant patterns from the larger sized patterns; this was done automatically. Filtering out the clearly irrelevant ones resulted in 136 out of 308 potentially relevant EC patterns. Thus, the precision for the EC method is 0.44 and the recall is 1.0.

Now, we examine the recall and precision values of change-based evolutionary coupling (CEC). For the change measure, we use all the possible combinations of patterns to show how the different change metrics influence the quality of patterns. Note that for each change measure we generate a set of evolutionary couplings. Table 5 contains the number of CEC patterns for each variation (intersections and unions); it also contains the precision, recall, and F-Measure values. We take the value of 136 as the total number of potentially relevant patterns in the system.


Table 5. All variations (intersections and unions) of change metrics for CEC patterns and their precision, recall, and F-Measure values

Change, C             #CEC    Precision        Recall          F-Measure

LOCC                  26      22/26   0.85     22/136   0.16   0.27
FC                    54      33/54   0.61     33/136   0.24   0.35
HC                    33      31/33   0.93     31/136   0.23   0.37
LOCC ∩ FC             12      10/12   0.83     10/136   0.07   0.14
LOCC ∩ HC             14      14/14   1.0      14/136   0.10   0.19
FC ∩ HC               18      16/18   0.89     16/136   0.12   0.20
LOCC ∩ FC ∩ HC        8       8/8     1.0      8/136    0.06   0.11
LOCC ∪ FC             60      37/60   0.62     37/136   0.27   0.38
LOCC ∪ HC             43      37/43   0.86     37/136   0.27   0.41
FC ∪ HC               62      41/62   0.66     41/136   0.30   0.41
LOCC ∪ FC ∪ HC        66      43/66   0.65     43/136   0.32   0.43

We see a large jump in the precision values of the change-based evolutionary coupling method in Table 5. The top three rows are CEC patterns and clearly all have precision values in the range of 0.61 to 0.93, much higher than the precision of EC patterns (i.e., 136/308 = 0.44). We cannot compare CEC vs. EC recall values since the EC patterns are our baseline. But we can compare the CEC patterns generated by each variation of change metrics.

The bottom part of Table 5 is the variant unions of CEC patterns produced using the three change metrics. These rows show the highest recalls (0.27 to 0.32) and F-measures

(0.38 to 0.43) and medium-high precisions (0.62 to 0.86). The middle of the table has the


variant intersections of CEC patterns generated using the three change metrics. These rows show the highest precisions (0.83 to 1.0) and medium-low recalls (0.06 to 0.12) and F-measures (0.11 to 0.20). These results cover a range of precision and recall values, but change-based evolutionary couplings are favored over the classic evolutionary couplings. Particularly interesting is that high recall and precision values are produced through the combination of the different CEC patterns generated by the change metrics.

3.6 Threats to Validity

Selecting a time window is a challenging problem as a logical commit can be spread out over multiple physical commits across several days, or one physical commit could have multiple logical commits. Traditionally, the identification of evolutionary coupling uses a small time window or a discrete version as committed. This is an open problem and continued research is required for a complete answer.

We note that the manual inspection was done by the authors themselves and as such represents a threat to the validity of this evaluation. Alternatively, we could have hired an outside individual to conduct the work, but this was not practical at the time. Thus, to expand our evaluation to more systems, we apply another evaluation method.

3.7 Discussion

All three evaluation methods give strong evidence that our Change-based

Evolutionary Coupling (CEC) approach produces fewer false positives than using the traditional approach to compute Evolutionary Coupling (EC).

Comparing the two approaches using the prediction capabilities of the association rules, as in Table 2, change-based evolutionary coupling is better at predicting future subsequences.

Using the interestingness measures (lift and confidence) showed that the interestingness and the quality of the association rules generated from CEC patterns are better than those of EC patterns. Specifically, the change patterns identified using our CEC approach are more likely to represent actual evolutionary dependencies rather than coincidentally co-changing artifacts.

The manual experiment on a subsystem was able to assess both the actual implicit and explicit reasons behind the coupling of the artifacts examined. We observed high precision values, some above 90%. Of course, this evaluation approach was very time consuming and impractical to apply to all seven systems, as it would require deep knowledge of the systems and the change patterns of their artifacts. As Table 5 shows, the CEC measures gave better precision than EC alone, and the recall values were not that low. With the combinations of CEC patterns over the different change metrics (LOCC, FC, and HC) we tuned the recall and precision values to a better balance.

An intersection among the CEC patterns of the different change metrics produces very high precision (0.89 to 1.0) and low recall (0.06 to 0.12). A union among the CEC patterns of the different change metrics produces moderate recalls (0.27 to 0.32), moderate precisions (0.62 to 0.86), and well balanced F-measures (0.38 to 0.43).

In the first and third evaluations, the low recall values are of interest. The premise here is that it is better to uncover correct patterns and miss some, rather than uncovering all patterns but including many false positives. Therefore, we chose to give more weight to the prediction accuracy (precision) of the uncovered patterns rather than to their completeness (recall). Many approaches have low precision, which makes them unattractive for developers to adopt.

The main contribution of this work is that our approach can detect high quality patterns with low minimum supports (Table 3). This means a pattern can be of high quality without a large amount of maintenance activity or a large number of occurrences.

The evaluation demonstrates that the addition of easily extracted information, such as change metrics (effort), has a substantial impact on the quality of the resultant patterns.

The approach is straightforward and requires little additional overhead compared to the traditional approach. The same underlying data mining technique is used so the only variable in the evaluation is the addition of the effort values.

We can only say with certainty that our choice of parameters resulted in higher quality couplings than previous efforts. However, we make no claim that the parameters we have chosen are optimal. We plan to empirically identify work patterns from change patterns in order to find which time windows produce better results (as in the work of

Vanya et al. [Vanya, Premraj, Vliet 2011]). We also plan to further this work by investigating finer artifact granularities such as methods and classes.

CHAPTER 4

CHANGE PATTERNS INTERACTIVE TOOL AND VISUALIZER

In this chapter we present an interactive visualization tool called KouplerVis2.

The tool is an extension of Chapter 3's tool, Koupler2. Koupler2 is an enhanced frequent pattern mining technique based on the Apriori algorithm [Agrawal, Imieliński, Swami 1993; Agrawal, Srikant 1994]. The tool uses a SeeSoft-style display of the detected change patterns [Eick, Steffen, Eric E. Sumner 1992; Maletic et al. 2011].

4.1 Introduction

KouplerVis2 is a graphical, interactive tool written in C++. The idea behind a GUI application is to let developers easily benefit from using the tool. Developers can apply their own experience in choosing the right set of parameters. With only a few parameters the tool is able to produce high-quality change patterns. Manually tuning the tool based on the developers' familiarity with their system can help them understand large-system evolution, architecture, and maintenance activities. A user can interact with KouplerVis2 and read the mining results visually on the fly.

Users can assign different minimum support values, different time ranges, and different approaches. Each selection will likely generate a different set of itemsets.

The KouplerVis2 presentation is a matrix filled with black squares on white edges. Each square represents a unique file that appears at least once during the specified time period. Each itemset is a group of such files, and each square is colored according to its change behavior level. A spin control can traverse all generated itemsets. The itemsets are ranked based on a proposed weight measure; see Figure 8.

4.2 Controls

Here we will discuss each visual element of KouplerVis2. Each control represents a mining parameter. The visual sets of controls are:

File open: the file menu button in the tool bar opens a dialog and reads a baskets XML file of the format given in Figure 8.

Controls group: in the left panel of the screen shot in Figure 8, a set of input and output controls that help the user interact with KouplerVis2:

• Efforts: A set of check boxes that lets the user select which change metrics to include in the filtration. KouplerVis2 processes with no filtration by default.

• Minimum support: A spin control where a user can assign a percentage to be used as the frequency threshold, i.e., the minimum support ratio.

• Minimum support/Weeks: A read-only textbox that reflects the corresponding number of weeks out of the total number of weeks in the given period.

• Support: A read-only textbox that shows the actual frequency of the displayed itemset.

• Weight: A read-only textbox that holds the computed weight value of the displayed itemset. The weight is a quality measure calculated based on the filtration of the change metrics.

• Itemset Size (L): A read-only textbox that displays the number of files (the size) in the currently viewed itemset.

• From and To: Two combo boxes for selecting the range of years out of the whole period.

• Calculate: A button that starts the process.

• ELC and TLC: Two radio buttons that specify which algorithm to run: effort-based or time-based evolutionary coupling. When TLC is selected, the weight control shows no value, since weight is an ELC-only property, and the Efforts control is disabled. The files in a TLC itemset are colored in white, and the resulting itemsets are ranked only by itemset size in descending order.

• Runs: A read-only multiline textbox that gives more detailed results and the ongoing processing status for the different phases of the frequent itemset mining algorithm, including the itemsets generated for each size and the processing times.

Frequent itemsets group: in the right panel of the screen shot in Figure 8, a set of input and output controls:

• Itemset number: A spin control that loops over all the resulting itemsets; when its value changes, the displayed itemset changes accordingly. The number also represents the itemset's rank, which is based on the computed weight value.

• Total number of itemsets: A read-only textbox that shows the last itemset number, i.e., the maximum value the itemset number control can take.

The Black Matrix: The matrix of black boxes, where each box represents a unique file that appeared in the system during the selected period of time, ordered left to right. The few colored boxes display the detected pattern of change.

Figure 8. KouplerVis2 snapshot. The system is KOffice [2000-2009]; the minimum support is 2%, which is 5 weeks out of the 269 weeks of the selected range 2000-2005. Itemset number 37 has four files, and this pattern appeared 5 times, out of a total of 405 ECs.


4.3 Summary

KouplerVis2 is an interactive tool that follows the SeeSoft display approach. In the backend it uses the CEC algorithm from Chapter 3 to detect high quality frequent patterns. The tool can be adjusted to any set of parameter values, and the results are displayed in a single window of boxes. The user can iterate over all the detected patterns with a few simple controls.

CHAPTER 5

DISTRIBUTION AND CORRELATION OF CODE, CHANGE AND

COLLABORATION METRICS

In this chapter we present a large-scale investigation to understand the distribution of, and the correlation between, code, change, and collaboration metrics. Such metrics have shown promising results for fault prediction and maintenance estimation. The literature has used both single metrics and mixtures of metrics to build such models. A lack of understanding of these metrics has led to conflicting results based on incorrect assumptions and misestimation of the distributions of the different types of metrics. Such distributions help us understand the nature of systems and their evolution. We then study the correlations between all selected metrics. This information assists in choosing the right set of metrics for future prediction models or studies. We use two code metrics (size and complexity), four change metrics (a variety of churn metrics), and two collaboration metrics (authorship and commits).

5.1 Introduction

Over the years, software metrics have played, and still play, a crucial role in assessing software source code quality and complexity. Recently, focus has turned strongly toward predicting software fault locations and estimating maintenance effort. Several models have been proposed based on statistical techniques (e.g., logistic regression [Zimmermann, Nagappan 2008]), machine learning techniques (e.g., Naive Bayesian networks [Menzies, Greenwald, Frank 2007]), and information theory techniques (e.g., [Hassan 2009]). The approaches proposed to tackle the problem rely on diverse information, such as code metrics, change metrics, collaboration metrics, recent activity information, and previous defect data [D'Ambros, Lanza, Robbes 2010a].

Such metrics serve as independent variables in these models to predict fault-prone locations in a system at different artifact granularities. Previous work has shown that there is no perfect set of metrics that works best for all systems [Nagappan, Ball, Zeller 2006]. At the same time, hybrid approaches using multivariate metrics have shown better consistency and accuracy than individual metrics, where the latter suffer from inconsistent prediction behavior [D'Ambros, Lanza, Robbes 2010a]. Research in effort and bug prediction still shows conflicting results on which metrics are the best predictors overall and which mathematical prediction model is best.

Understanding the distribution of complexity code metrics plays a major role in effort estimation and failure prediction studies. The distributions presented in the literature focus on code metrics, while little has been done regarding change metrics. As we noted before, change metrics have proven to be better estimators than complexity metrics; at the same time, individual metrics are no better than hybrid metrics of different types, and there is no single best set of metrics that works well for all software systems [D'Ambros, Lanza, Robbes 2010a].

Stepping back, we are dealing with this data without knowing its statistical distribution or how the collected metrics correlate with each other. Such information would help us understand systems' maintenance activities and estimation models. It would also clarify the relation between the different sets of metrics.

We empirically investigate ten open source systems with up to ten years of recorded Subversion revisions. Using a week-long delta as the unit of code change, at the end of each week (Sundays) we collect the different types of metrics. From this data we obtain selected representative metrics of each category: code, change, and collaboration. Using simple plotting techniques, we show the distribution of this detailed information. Then we conduct a cross correlation between the collected metrics.

Code metrics have long been reported to be log-normal [Concas, Marchesi, Pinna, Serra 2007] or double Pareto [Herraiz Tabernero, German, Hassan 2011]. Those code metrics are collected from a single snapshot in time during the life of a system, while here we are more interested in all metrics reported over long histories. Such distributions are more reliable than studying a random point in time. We note that no change or collaboration metric distributions have been investigated.

The chapter is organized as follows. Section 5.2 presents an introduction to software metrics. Section 5.3 describes the approach used to collect the data from Subversion repositories for the analysis. Section 5.4 discusses the rationale behind picking a week as the time window size. Section 5.5 is the metrics distribution study, and Section 5.6 follows with the metrics correlation analysis. We end with a final discussion in Section 5.7.

5.2 Software Metrics

It is almost impossible to study all the kinds of code, change, and collaboration metrics in the literature, so we pick a few from across these main metric categories. We place some extra focus on change metrics, since they have been reported to be the best predictors in bug prediction [D'Ambros, Lanza, Robbes 2010a].

Table 6 summarizes the selected metrics. We use a week as our basket, so, starting sequentially at some date in a system's history, we report these metrics on a week-by-week basis.

Table 6. All metrics used in the study of code, change and collaboration. Four columns representing each Metric Category, Name, Abbreviation, and Description

Metric Category         Metric Name       Abbreviation   Description (file granularity)
Code Metrics            LOC               LOC            Total number of lines of code
                        CC                CC             McCabe Cyclomatic complexity
Change Metrics          LOC Churn         LOCC           Total number of lines that are modified, deleted, or added over a time unit
                        Functions Churn   FC             Total number of functions/methods that are modified, deleted, or added over a time unit
                        Hunks Churn       HC             Total number of hunks that are modified, deleted, or added over a time unit
                        CC Diff           CCD            Difference in McCabe Cyclomatic complexity over a time unit
                        LOC Diff          LOCD           Difference in total number of lines of code over a time unit
Collaboration Metrics   Commits           C              Number of developers' commits over a unit of time
                        Authors           A              Number of committers over a unit of time

5.3 Data Collection

We collect our data using an in-house tool called shopper (pronounced "delta shopper"). This tool targets data available from the Subversion version control system. The shopper tool is responsible for extracting metadata and differences from a Subversion repository. The program takes a unit of time (Δ) and extracts information about the modification of artifacts over the course of the history. The program identifies modifications to files within each time Δ and computes, for each file, its observed values for the metrics in Table 6.

We use McCabe's definition [McCabe 1976] of cyclomatic complexity for a piece of source code, which is the count of the number of linearly independent paths through the source code. At the end of every week, and for each file, we subtract the previous CC from the new CC to get CC Diff. Similarly for LOC Diff, we subtract a file's LOC at the end of the previous week from its LOC at the end of the current week.
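As an illustration only (not the shopper implementation), the following sketch computes LOC Diff and CC Diff from two consecutive weekly snapshots; the file name and values are hypothetical.

# Illustrative sketch of the week-over-week differencing (LOC Diff, CC Diff).
# `weekly` maps week -> {file: (LOC, CC)}; the data below is hypothetical.
weekly = {
    "2010-12-20": {"Sheet.cpp": (4300, 980)},
    "2010-12-27": {"Sheet.cpp": (4448, 1026)},
}

def weekly_diffs(prev, curr):
    """For each file present in both snapshots, return (LOC Diff, CC Diff)."""
    diffs = {}
    for path, (loc, cc) in curr.items():
        if path in prev:
            prev_loc, prev_cc = prev[path]
            diffs[path] = (loc - prev_loc, cc - prev_cc)
    return diffs

print(weekly_diffs(weekly["2010-12-20"], weekly["2010-12-27"]))
# -> {'Sheet.cpp': (148, 46)}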

To collect the LOC churn, shopper uses Subversion's unified diff output to count the added and deleted source lines by counting the '+' and '–' markers.

Likewise, the hunks churn is easily computed by counting the number of '@@' hunk delimiters. A hunk is a contiguous group of changed lines along with contextual unchanged lines [Alali, Kagdi, Maletic 2008].
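The counting rules above can be illustrated with a short sketch; this is not the shopper tool itself, and the sample diff text is a hypothetical placeholder.

# Sketch: count LOC churn and hunk churn per file from unified diff text.
from collections import defaultdict

def churn_from_unified_diff(diff_text):
    """Return {file: (loc_churn, hunk_churn)} for a unified diff string."""
    counts = defaultdict(lambda: [0, 0])
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):            # new-file header names the modified file
            current = line[4:].split("\t")[0]
        elif line.startswith("--- "):
            continue                           # old-file header, not a deleted line
        elif current is None:
            continue
        elif line.startswith("@@"):            # one '@@' delimiter per hunk
            counts[current][1] += 1
        elif line.startswith(("+", "-")):      # added or deleted source line
            counts[current][0] += 1
    return {f: tuple(c) for f, c in counts.items()}

sample = """--- Sheet.cpp\t(revision 100)
+++ Sheet.cpp\t(revision 101)
@@ -5,7 +5,8 @@
-old line
+new line
+another new line
"""
print(churn_from_unified_diff(sample))   # {'Sheet.cpp': (3, 1)}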

To collect functions churn, we first convert each versioned source code file to srcML

[Maletic, Collard 2004] and find each function’s region, the contiguous lexical range of lines in which the function is declared or defined. We map diff hunks (change regions) onto the function’s to find overlapping regions of modification. If a change region overlaps with a function’s region, the then function must have been changed by the commit. For example, if version n of a function f has the region f.c:5-10 (lines 5-10 in file f.c), and a commit for revision n + 1 modifies f.c:4-7, then function f has changed, and f.c has at least 1 function change during the time  in which the change occurred.
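A minimal sketch of this overlap test follows; in the dissertation the function regions come from srcML, whereas here they are hypothetical hard-coded line ranges.

# Sketch of the function-churn overlap test (function regions are hypothetical).
def overlaps(region_a, region_b):
    """True if two inclusive line ranges (start, end) share at least one line."""
    return region_a[0] <= region_b[1] and region_b[0] <= region_a[1]

# Function regions in f.c: name -> (first line, last line)
functions = {"f": (5, 10), "g": (12, 30)}

# Changed regions taken from the commit's diff hunks for f.c
changed_regions = [(4, 7)]

churned = {name for name, region in functions.items()
           if any(overlaps(region, ch) for ch in changed_regions)}
print(churned)   # {'f'}  -> f.c has at least 1 function change in this time unit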

Using metadata from the repository log, we collect the collaboration metrics, which are the number of commits and committers touching a file over a week. The data collected is output as an XML file such that each week defines a transaction consisting of artifact/metric and value pairs.

Table 7. Ten free and open source systems used in this study. All systems are written in C/C++ and cover histories ranging from 3 to 12 years. A brief description of each system is included.

Open Source System (C/C++)   Years            Description
Chrome                       2008-2011 (3)    Web Browser
GCC                          2001-2010 (10)   GNU Compiler Collection
KOffice                      2001-2010 (10)   KDE Office Suite
KDELibs                      2005-2010 (6)    Central KDE programs/libraries
LLVM                         2001-2011 (11)   Compiler and tool chain technologies
Open MPI                     2005-2011 (7)    Supercomputer Message Passing Interface library
Python                       2001-2010 (10)   Python Language Implementation
Quantlib                     2000-2011 (12)   Library for Quantitative Finance
Ruby                         2001-2010 (10)   Ruby Language Implementation
Xapian                       2001-2010 (10)   Search Engine Library

Table 7 lists the systems; we investigate three to twelve years of maintenance history for ten open source systems. The selected free and open source systems cover different types of applications with a broad range of designs, architectures, and functionalities.

shopper produces XML datasets; a sample is shown in Figure 10. The process of collecting such data is very time consuming, for each system it can take days to finish, which is expected since we are collecting an enormous amount of data. Figure 10 shows two sample baskets for two weeks, the weeks of December 20 and 27, 2010, of data collected from KOffice. Each basket contains files paired with their actual values for seven of the selected change and collaboration metrics. The LOC and CC metrics are collected at a snapshot.

Figure 9. Activity chart for the ten free and open source systems. Each bar represents the ratio of the number of commits to the number of years, normalized by dividing each ratio by the highest ratio (the MAX).

Table 7 presents the open source projects we use in this work; the number of files in each system and the total number of LOC, functions, and hunks that are modified, deleted, or added over the duration of the study are also collected. Using this data we generated the chart in Figure 9 to present the collected measures. The chart shows three groups of bars, one per effort measure, for each project. The systems are of different sizes (files) and the duration periods are not equal, so we use a ratio to draw a better comparison between the systems: we divide each cumulative effort measure for the whole range (years) by the number of files and by the number of years.

The following observations can be made. Chrome is the most active and changing project, followed by KDELibs, whereas Xapian is the least changing development project. We also clearly see a size correlation between the three measures. The chart in Figure 9 represents the ratio of the number of commits to the number of years, where all ratios are then normalized by dividing each by the maximum observed ratio (MAX). Chrome is at 100% relative to the others.
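The normalization behind Figure 9 is simple arithmetic; the following sketch illustrates it with hypothetical commit and year counts (not the measured values).

# Sketch of the Figure 9 normalization: commits-per-year ratio for each system,
# divided by the maximum ratio so the most active system reads as 100%.
systems = {"Chrome": (35650, 3), "Xapian": (4703, 10)}   # hypothetical counts

ratios = {name: commits / years for name, (commits, years) in systems.items()}
max_ratio = max(ratios.values())
normalized = {name: r / max_ratio for name, r in ratios.items()}
print(normalized)   # most active system -> 1.0 (100%), the rest relative to it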


Figure 10. Two KOffice baskets of six files, with their attributes and values, stored in an XML file encoding the effort measurements for each file during an observed time Δ = week; the date of the week's first day is the basket name.

5.4 Time Window

We choose a time unit of one week as the start and end points for collecting our metric values. A week is a fairly coarse-grained unit of time, and the choice is motivated by how developers work. A developer's work plans often revolve around the workweek and multiple commits within a week, where a commit (Subversion revision) can incorporate multiple files. Week-long work plans can include fixing a set of bugs, developing a new feature, or some non-trivial refactoring of a critical component. We feel that such behavior is intentionally created by a developer's schedule and to-do list. In their comparative study of the main bug prediction techniques, D'Ambros et al. [D'Ambros, Lanza, Robbes 2010a] used two-week work logs to collect their metrics.

Common process practices such as agile methods promote continuous integration and frequent releases, as do many open source development processes. It is also common to have iterations of one or two weeks in agile development cycles [Abbas 2010].

5.5 Metrics Distribution

Distribution studies of code complexity metrics serve as a basis for models of effort estimation and failure prediction. The distributions presented in the literature focus on code metrics, while little has been done regarding change metrics. Change metrics have proven to be better estimators than complexity metrics; at the same time, individual metrics are no better than hybrid metrics of different types, and individual metrics show inconsistencies [D'Ambros, Lanza, Robbes 2010b]. Adding to that, there is no best set of metrics that works well for all software systems.

These issues encouraged us to approach the problem from a different angle: where most of the literature tests metrics to predict failures and estimate maintenance costs, we take metrics of different types and study their distributions and the correlations among them. We study data that were collected in a weekly fashion over three to twelve years for ten open source systems.

Figure 10 shows a sample of our collected data. This data represents all the observed metric values, at the end of each week, for code, change, and collaboration. We are interested in how the distribution of this data compares against the literature, where the complexity (LOC) data is collected from one snapshot at some point in a system's life. What we have here are all the observed metrics from Table 6. Such data represents the actual maintenance process, and its shape tells us how maintenance activities, and their complexity, happen.

5.5.1 Frequency Histogram

We start our analysis with simple observations and then build on them. Figure 11 (a, c, e) (left side plots) shows a discrete frequency histogram (probability distribution) for one metric from each category in Table 6, collected for KOffice [2001-2010]. The X-axis holds bins of size two sorted in ascending order; the Y-axis is the frequency of each bin for the metric values, on a log base-10 scale. Note that the first bin range is [1, 3), the second is [3, 5), and so on.

Figure 11 (b, d, f) (right side plots) is more of a zoomed-out picture, breaking the data into four 25% quartiles (as in the boxplot five-point summary). Such a plot makes no assumptions about the statistical distribution. Each quartile's size helps uncover the dispersion, skewness, and outliers in the data [Hoaglin 1983].
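The following sketch (assuming numpy and matplotlib, with synthetic heavy-tailed data rather than the KOffice measurements) shows how such a bin-width-2 frequency histogram with a log-10 count axis can be produced.

# Sketch of a bin-width-2 frequency histogram (Figure 11 a, c, e style):
# bins [1,3), [3,5), ... with the count axis on a log-10 scale.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
loc_churn = rng.pareto(1.5, 10_000).astype(int) + 1     # synthetic stand-in data

bins = np.arange(1, loc_churn.max() + 2, 2)             # edges 1, 3, 5, ...
counts, edges = np.histogram(loc_churn, bins=bins)

plt.bar(edges[:-1], counts, width=2, align="edge")
plt.yscale("log")
plt.xlabel("LOC Churn (bins of size 2)")
plt.ylabel("Frequency (log10)")
plt.show()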

[Figure 11 panels: (a) LOC Churn on bins of size 2; (b) LOC Churn on five-point summary and ranges; (c) Number of Commits on bins of size 2; (d) Number of Commits on five-point summary and ranges; (e) LOC on bins of size 2; (f) LOC on five-point summary and ranges]

Figure 11. Frequency distribution of three selected metrics from each category of code, change and collaboration for KOffice [2001-2010] collected on weeklong time units

Simple histograms of the collected data tell a lot, though not everything. Figure 11 (a, b) shows the skewness of the LOC Churn values, where the majority, up to 75%, are between [1, 34] churned lines (modified, deleted, and added), and the top 25% are between [35, 11,217]. This clearly shows that KOffice developers' lines-of-code churns are of small size most of the time. Figure 11 (c, d) is the frequency histogram of the developers' activity metric, obtained by counting the number of commits that touched a file in week-long periods. Here, the majority (75%) of files changed only once or twice a week, while the top 25% are between [3, 56] commits. Figure 11 (e, f) shows a somewhat different distribution: Figure 11 (e) has a spike at the beginning before the sharp fall, where the two previous metrics show only a sharp fall right away. Figure 11 (f) fails to show that due to the coarse-grained clustering into 25% quartiles, but it shows that 75% of the source files appear in the history with small and medium sizes (LOC), between [1, 679], for KOffice [2001-2010].

We must note here that the other metrics of each category show frequency histograms similar to the selected metric. For example, cyclomatic complexity (CC) from the code metrics (Table 6) is very similar to the LOC frequency histograms. The same holds for the functions churn metric, which shows frequency histograms similar to LOC churn. Moreover, the remaining nine systems show, in general, a picture very close to KOffice.

5.5.2 Frequency Histogram on a Log-Log Plot

From Figure 11, due to the L-shaped curve of the probability distribution plots, we can sense the existence of power-law probability distributions. We can see that clearly for the change and collaboration metrics in Figure 11 (a, c), while it is not so clear for the code metric in Figure 11 (e). Code metrics have long been reported to be log-normal [Concas, Marchesi, Pinna, Serra 2007] or double Pareto [Herraiz Tabernero, German, Hassan 2011], but that was for a snapshot in time, while here we are more interested in all metrics reported over long histories. Such distributions are more reliable than studying a random point in time. To the best of our knowledge, no change or collaboration metric distributions have been investigated before.

[Figure 12 panels: (a) LOC Churn on bins of size 2 (log-log plot); (b) Number of Commits on bins of size 2 (log-log plot); (c) LOC on bins of size 2 (log-log plot)]

Figure 12. A log-log plot of probability distribution of three selected metrics from each category of code, change and collaboration for KOffice [2001-2010] on weeklong time units. Same data used in Figure 11.

A simple, graphical approach to detect distributions for such types of data uses log-log plots. We take the probability distribution function plots and move them to log-log axes by adjusting the X- and Y-axes to their base-10 logs. Then we try to find a linear fit for the resulting figure. There are two common ways to find a fit: we can use the complementary cumulative distribution function [Monti 1995] or logarithmic binning [Newman 2005].

Figure 12 shows a log-log plot of the probability density functions presented in Figure 11 (a, c, e). Here the distributions look linear (a first-degree polynomial) for the LOC churn and number of commits metrics. This is the characteristic signature of a power law [Newman 2005], while LOC gives more of a curvy, normal-like shape on log-log scales.

5.5.3 Complementary Cumulative Distribution Function

To estimate the shape of the statistical distributions of the code, change, and collaboration metrics, we plot the complementary cumulative distribution function (CCDF). The cumulative distribution function (CDF) is the accumulation of the probability density function (PDF); its range is [0, 1], and the probabilities add up to 1. The CCDF is 1 − CDF. A PDF, CDF, and CCDF hold the same information, but each has different characteristics and uses. On a log-log scale, a power law distribution appears as a straight line in a CCDF, while a lognormal appears as a curve. So the CCDF can be used to distinguish between a power law and other kinds of distributions.
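A minimal sketch of an empirical CCDF on log-log axes follows; it assumes numpy and matplotlib and uses synthetic heavy-tailed data rather than data from the studied systems.

# Sketch: empirical CCDF on log-log axes. A power law shows up as a straight
# line, a lognormal as a curve.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
values = np.sort(rng.pareto(1.6, 10_000) + 1)

# CCDF: fraction of observations greater than or equal to each sorted value.
ccdf = 1.0 - np.arange(len(values)) / len(values)

plt.loglog(values, ccdf, marker=".", linestyle="none")
plt.xlabel("metric value")
plt.ylabel("P(X >= x)   (CCDF)")
plt.show()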


In a CCDF, the double Pareto distribution appears as a curve with two straight segments, one at the low values side, and another one at the high values side. The difference between a lognormal and a power law at very low values is negligible, and therefore imperceptible in a plot. This means that in a CCDF plot the main difference between a lognormal and a double Pareto is only spotted at high values. In any case, for our purposes, it is more important to focus on the high values side.

Figure 13. CCDF Shapes of lognormal, double Pareto, and Pareto distributions on a log-log plot [Mitzenmacher 2004].

A key characteristic of the double Pareto distribution is that it has a power law at both tails. That is, if we look at the cumulative distribution function (CDF) on a log-log plot, it will also have a linear tail (for the small files). This provides a test for seeing whether a distribution is a double Pareto: look at both the CCDF and the CDF on log-log plots for linear tails.

Figure 13 is taken from an estimation presented by [Mitzenmacher 2004] where it shows how the double Pareto distribution falls between the lognormal distribution and the

Pareto distribution. Like the Pareto distribution, it is a power law distribution. But while the log-log plot of the density of the Pareto distribution is a single straight line, for the double Pareto distribution the log-log plot of the density consists of two straight line segments that meet at a transition point. This is similar to the lognormal distribution, which has a transition point around its median. Hence, an appropriate double Pareto distribution can closely match the body of a lognormal distribution and the tail of a

Pareto distribution.

For example, Figure 13 shows the complementary cumulative distribution function for a lognormal, double Pareto, and Pareto distribution. (These graphs have only been minimally tuned to give a reasonable pictorial match; they could be made to match more closely.) The lognormal and double Pareto distributions match quite well with a standard scale for probabilities, but on the log-log scale in Figure 13 one can see the difference in the tail behavior, where the double Pareto more closely matches the Pareto.

Due to the noise we see in the log-log plots, and especially in the tails, we need to smooth these lines or curves for better judgment. Based on the work of [Mitzenmacher 2004], and although that distribution analysis was conducted on snapshot data for file sizes, we expect our data to fit between a Pareto, a double Pareto, and a lognormal distribution. Simple graphical approaches have been formed to distinguish among such highly right-skewed data.

For a formal mathematical and detailed explanation of the Pareto, double Pareto, and lognormal distributions and their characteristics, the reader can refer to the references mentioned above. We now move from Figure 12 to Figure 14, where we use the complementary cumulative distribution function to clean up the plots of Figure 12.

[Figure 14 panels: (a) LOC Churn on bins of size 2 (log-log plot); (b) Number of Commits on bins of size 2 (log-log plot); (c) LOC on bins of size 2 (log-log plot)]

Figure 14. A log-log plot of complementary cumulative distribution function of three selected metrics from each category of code, change and collaboration for KOffice [2001-2010] on weeklong time units. Same data used in Figure 11 and Figure 12.

From Figure 14, and with such a small sample, we notice a trend that needs to be confirmed with the other metrics and systems. Based on the graphical estimation presented, Figure 14 (a) shows a lognormal body with a Pareto tail, which is a double Pareto.

If y is the frequency of occurrences and x is the end point of each bin, where each bin is of size 2, then in a power law we have

y = C x^{a}    (4)

so that log y = log C + a log x.

So a power law with exponent a is seen as a straight line with slope a on a log-log plot. As presented earlier and by [Mitzenmacher 2004], the key characteristic of a double Pareto distribution is that it has a power law at both tails. To test whether Figure 14 (a) is a double Pareto distribution, we need to look at both the CCDF and the CDF on log-log plots for linear tails. In Figure 15 (a, b) we split Figure 14 (a) around the median and fit the tails, on a CCDF for the large values and on a CDF for the small ones. Although the median of the KOffice [2001-2010] churn metric is 10, we can see that at around bin 100 the line deviates from the curvy fall and straightens to become linear.

[Figure 15 panels: (a) CDF of LOC Churn on bins of size 2, range [1-100], on a log-log plot; (b) CCDF of LOC Churn on bins of size 2, range [100-10000], on a log-log plot]

Figure 15. A log-log plot of CDF and CCDF of LOC Churn for KOffice [2001-2010] on weeklong time units.

Using standard least-squares line fitting [Bretscher 1997], Figure 15 (a) shows that the lognormal body of Figure 14 (a) appears as a straight line on the CDF with a slope of a = 0.15 (it is positive because the chart's X-axis is decreasing) and R² = 0.99 (the coefficient of determination for goodness of fit), which is almost a perfect fit. Figure 15 (b) is the other straight line, on a CCDF, with a = 1.64 and R² = 0.99. A lognormal body and a Pareto tail in Figure 14 (a) are enough to validate a double Pareto, and here we have it further confirmed; the number of commits follows in Figure 16.
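A sketch of such a tail fit is shown below: it takes the base-10 logs of the values and of the CCDF, fits a line by least squares, and reports the slope a and the determination coefficient R². The data is synthetic and numpy is assumed; it is not the exact fitting procedure used for Figure 15.

# Sketch of the log-log tail fit: least-squares line through (log x, log CCDF),
# reporting slope a and determination coefficient R^2. Data is synthetic.
import numpy as np

rng = np.random.default_rng(2)
values = np.sort(rng.pareto(1.64, 50_000) + 1)
ccdf = 1.0 - np.arange(len(values)) / len(values)

# Fit only the upper tail (values above the median), as done for Figure 15 (b).
tail = values > np.median(values)
log_x, log_y = np.log10(values[tail]), np.log10(ccdf[tail])

slope, intercept = np.polyfit(log_x, log_y, 1)
pred = slope * log_x + intercept
r2 = 1.0 - np.sum((log_y - pred) ** 2) / np.sum((log_y - np.mean(log_y)) ** 2)
print(f"a = {-slope:.2f}, R^2 = {r2:.2f}")   # power-law exponent and fit quality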

Figure 16. A log-log plot of CCDF for the number of commits on Bins of Size 2 for KOffice [2001-2010] on weeklong time units

In Figure 14 (b) the data points are few due to the nature of the metric. For KOffice [2001-2010], the maximum observed number of commits that touched a file during a week-long unit is 56. Fitting this plot gives a high R² = 0.95 for a line with slope a = -2.1, although the very first point deviates far from the line of fit. This point is the most important one, as it holds the highest frequency and representation.

What we see here is a key Pareto characteristic. An observation on a single system is not good enough for any generalization; we need to look at more systems and metrics to confirm our results. Figure 14 (c) shows a clear lognormal plot, where the curve bends throughout and then drops vertically with some noise along the way.

Now we use the same approach and plot the CCDFs for all metrics for all systems. Such a large-scale investigation helps us uncover the patterns needed to understand the distribution of each metric and each category. Figures 16 through 24 contain all of these plots.

[Figure 17 panels: rows Chrome, GCC, KOffice, KDELibs, LLVM; columns LOC, CC]

Figure 17. A log-log plot of CCDF for code metrics (LOC and CC) on bins of size 2 over five systems on weeklong time units.

[Figure 18 panels: rows OpenMPI, Python, Quantlib, Ruby, Xapian; columns LOC, CC]

Figure 18. A log-log plot of CCDF for code metrics (LOC and CC) on bins of size 2 over the other five systems on week-long time units; this completes Figure 17.

[Figure 19 panels: rows Chrome, GCC, KOffice, KDELibs, LLVM; columns LOC Churn, Functions Churn]

Figure 19. A log-log plot of CCDF for change metrics (LOC, Functions Churn) on bins of size 2 over five systems on weeklong time units.

[Figure 20 panels: rows OpenMPI, Python, Quantlib, Ruby, Xapian; columns LOC Churn, Functions Churn]

Figure 20. A log-log plot of CCDF for change metrics (LOC, Functions Churn) on bins of size 2 for the five other systems on weeklong time units, this completes Figure 19.

[Figure 21 panels: rows Chrome, GCC, KOffice, KDELibs, LLVM; columns Hunks Churn, CC Diff, LOC Diff]

Figure 21. A log-log plot of CCDF for change metrics (Hunks Churn, CC Diff, LOC Diff) on bins of size 2 over five systems on week-long time units.

[Figure 22 panels: rows OpenMPI, Python, Quantlib, Ruby, Xapian; columns Hunks Churn, CC Diff, LOC Diff]

Figure 22. A log-log plot of CCDF for change metrics (Hunks Churn, CC Diff, LOC Diff) on bins of size 2 over the other five systems on week-long time units; this completes Figure 21.

[Figure 23 panels: rows Chrome, GCC, KOffice, KDELibs, LLVM; columns Commits, Authors]

Figure 23. A log-log plot of CCDF for collaboration metrics (Commits and Authors) on bins of size 2 over five systems on week-long time units.

[Figure 24 panels: rows OpenMPI, Python, Quantlib, Ruby, Xapian; columns Commits, Authors]

Figure 24. A log-log plot of CCDF for collaboration metrics (Commits and Authors) on bins of size 2 over the other five systems on week-long time units; this completes Figure 23.

Figure 17 and Figure 18 show the code metrics (LOC and CC) distributions over all systems, which clearly appear to be double Pareto plots. Similar results have been reported previously by [Herraiz, German, Hassan 2011]; the difference here is that our code metrics data is not one snapshot at some point in time, it covers all reported sizes every week over a range of multiple years. Still, we have the same observation confirmed with far richer data. Other studies have reported a lognormal distribution for code metrics. We agree with [Herraiz, German, Hassan 2011] that such studies have underestimated the large file sizes. As we mentioned, our scale of work is much larger: historical LOC values are investigated here, not just one snapshot but as many as hundreds to thousands of snapshots.

Figures 19 through 22 and Figures 23 through 24 are for the change and collaboration metrics, respectively. All plots systematically show a curvy start and then an (approximately) straight line with some noise at the ending tail. Such behavior has been reported to represent a Pareto distribution, which agrees with the results of [Herraiz Tabernero, German, Hassan 2011] for change metrics. The distribution of collaboration metrics has never been reported before in the literature.

The reported distributions help better estimate maintenance effort and uncover possible flaws in all reported systems. For example, the change and collaboration metric distributions show that very few files take up most of the work and effort to be maintained. In Chrome, in the week of 2011-08-29, the file "/trunk/src/chrome/browser/ui/browser.cc" received a week-long churn of 201 lines, 50 functions, and 39 hunks through 24 commits and 18 authors, where the file size is 4448 lines with a cyclomatic complexity of 1026, while the total change (delta) is zero lines and zero cyclomatic complexity. Again, in one week, one very big file was hammered by eighteen developers, an average of 2.5 developers a day touching this file. Such changes require a lot of management around developers and careful testing.

5.6 Metrics Correlation

In previous work [Alali, Kagdi, Maletic 2008], we studied the size of typical Subversion commits and the correlation between the extents of change for LOC churn and hunks churn. We observed that 75% of commits modify between 2 and 4 files, churn around 50 LOC, and touch approximately 8 different hunks. There is a strong positive correlation (up to 0.75) between the LOC churn size measure and the number of hunks.

Despite the high correlation between the two variables, measuring both LOC Churn and hunks Churn as distinct effort units remains an important component of the analysis. It shows not only the editorial extent of the change (LOC Churn) but also gives an indication about the distribution of that change over the source file (hunks Churn).

Our previous work [Alali, Kagdi, Maletic 2008] did not examine the number of metrics that we have in this study. In that work we used the parametric Pearson's measure [Johnson, Wichern 1998], a linear correlation coefficient whose values range between -1 and +1 and which measures the strength and direction of a linear relationship between two variables. Before, we also used the simple boxplot five-point summary [Johnson, Wichern 1998] to study the distribution of such metrics; here, since we are dealing with non-normal distributions, we use the non-parametric Spearman rank correlation method. Spearman's rank correlation coefficient is a nonparametric (distribution-free) rank statistic proposed as a measure of the strength of the association between two variables. It is a measure of a monotone association that is used when the distribution of the data makes Pearson's correlation coefficient undesirable or misleading.

Spearman’s coefficient assesses the ability of an arbitrary monotonic function can describe the relationship between two variables, with no pre assumptions about the frequency distribution of the two variables involved. Unlike the commonly used

Pearson’s product-moment correlation coefficient, it does not require the pre assumption that the relationship between the variables is linear, nor does it require the variables to be measured on interval scales [Hauke, Tomasz 2011].

For a sample of size N, the N raw scores Xi, Yi are converted to ranks xi, yi and

Spearman’s coefficient rho is computed from these:

\rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}    (5)

The sign of the Spearman correlation indicates the direction of the association between an independent variable and a dependent variable. A positive coefficient implies that the dependent variable increases when the independent variable increases; a negative coefficient implies that the dependent variable decreases as the independent variable increases.

A Spearman correlation of zero indicates that the dependent variable neither increases nor decreases as the independent variable increases. The Spearman correlation increases in magnitude (closer to 1.0) as the two variables get closer to being perfect monotone functions of each other. When the two variables are perfectly monotonic, the Spearman correlation coefficient becomes 1.0. A perfect monotone increasing relationship implies that, for any two pairs of data values (xi, yi) and (xj, yj), the differences xi − xj and yi − yj always have the same sign. A perfect monotone decreasing relationship implies that these differences always have opposite signs [Lehman 2005].

The Spearman correlation coefficient is often described as being non-parametric. A perfect Spearman correlation results when two variables are related by any monotonic function, whereas with Pearson correlation the relationship has to be linear to yield a perfect correlation. Spearman correlation's exact sampling distribution can be calculated without having to know the joint probability distribution of its variables.

If there is no correlation, or only a weak one, rho (Equation 5) is close to 0. A value near zero means that the relationship between the two variables is essentially random. A correlation greater than 0.6 is generally described as strong to very strong, a correlation between 0.4 and 0.6 is moderate, and a correlation below 0.4 is generally described as weak (negative ranges follow the same scheme). By convention, we accept a p-value of 0.05 or less as statistically significant, in which case we reject the null hypothesis of no correlation between the two variables. All of the correlation values presented here are statistically significant, with p-values far smaller than 0.05.

Now let us look at the data in an exhaustive analysis and address the question: is there a correlation between any two metrics from a specific metric category? This is calculated as follows (a sketch of the computation follows the list):

• We take any two metrics.

• The first metric from the chosen category is x and the other is y.

• We then calculate rho and the p-value for x and y using the R function cor.test(a, b, method="spearman"), where a and b are two vectors of the raw aligned data [Best 1975; Hollander, Wolfe, Chicken 2013].

• We then do this for the remaining change categories, assigning new x and y, until we finish all metrics.

• We then repeat this for the other combinations.
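The dissertation uses R's cor.test for this step; the following equivalent sketch uses Python's scipy.stats.spearmanr over synthetic metric vectors and labels each pair with the strength levels of Table 8.

# Sketch: exhaustive pairwise Spearman correlation over a set of metric vectors.
from itertools import combinations
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n = 5_000
loc = rng.pareto(1.5, n)
metrics = {                                    # synthetic stand-in metric vectors
    "LOC":  loc,
    "CC":   loc * 0.2 + rng.normal(0.0, 0.1, n),   # made to correlate with LOC
    "LOCC": rng.pareto(1.6, n),
    "C":    rng.integers(1, 10, n).astype(float),
}

def strength(rho):
    rho = abs(rho)
    return "strong" if rho >= 0.6 else "moderate" if rho >= 0.4 else "weak"

for a, b in combinations(metrics, 2):
    rho, p = spearmanr(metrics[a], metrics[b])
    print(f"{a:>4} vs {b:<4}  rho = {rho:+.2f}  p = {p:.1e}  ({strength(rho)})")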

Table 8. Legend key for Table 9 through Table 18. Spearman rho values shown in white on a black background indicate a strong relationship, black on a yellow background a moderate relationship, and black on a white background a weak relationship between a pair of variables. The p-value is a very small number for all calculated rho values.

Legend:
[0.6, 1.0]    Strong
[0.4, 0.6)    Moderate
[-0.4, 0.4)   Weak
p-value is 2.2e-16 over all

Table 9. Spearman's rho values for every possible pair of the studied metrics, computed on the data collected for Chrome (LOC, CC: code metrics; LOCC, FC, LOCD, HC, CCD: change metrics; C, A: collaboration metrics).

Chrome  LOC    CC     LOCC   FC     LOCD   HC     CCD    C      A
LOC     1.0    0.9    0.2    0.2    0.1    0.3    0.0    0.2    0.2
CC             1.0    0.2    0.3    0.0    0.2    0.1    0.2    0.2
LOCC                  1.0    0.6    0.5    0.8    0.5    0.5    0.4
FC                           1.0    0.3    0.6    0.3    0.5    0.4
LOCD                                1.0    0.3    0.8    0.0    0.0
HC                                         1.0    0.2    0.6    0.5
CCD                                               1.0    0.0    0.0
C                                                        1.0    0.8
A                                                               1.0

Table 10. Spearman's rho values for every possible pair of the studied metrics, computed on the data collected for GCC.

GCC     LOC    CC     LOCC   FC     LOCD   HC     CCD    C      A
LOC     1.0    1.0    0.4    0.4    0.3    0.4    0.3    0.3    0.3
CC             1.0    0.4    0.4    0.3    0.4    0.3    0.3    0.3
LOCC                  1.0    0.6    0.6    0.8    0.5    0.5    0.4
FC                           1.0    0.4    0.7    0.3    0.5    0.4
LOCD                                1.0    0.3    0.8    0.1    0.0
HC                                         1.0    0.3    0.6    0.5
CCD                                               1.0    0.1    0.0
C                                                        1.0    0.8
A                                                               1.0

Table 11. Spearman's rho values for every possible pair of the studied metrics, computed on the data collected for KOffice.

KOffice LOC    CC     LOCC   FC     LOCD   HC     CCD    C      A
LOC     1.0    1.0    0.2    0.2    0.1    0.3    0.1    0.1    0.1
CC             1.0    0.2    0.2    0.1    0.3    0.1    0.1    0.1
LOCC                  1.0    0.7    0.5    0.8    0.5    0.5    0.2
FC                           1.0    0.3    0.8    0.3    0.5    0.2
LOCD                                1.0    0.3    0.8    0.1    0.0
HC                                         1.0    0.3    0.6    0.3
CCD                                               1.0    0.1    0.0
C                                                        1.0    0.5
A                                                               1.0

Table 12. Spearman's rho values for every possible pair of the studied metrics, computed on the data collected for KDElibs.

KDElibs LOC    CC     LOCC   FC     LOCD   HC     CCD    C      A
LOC     1.0    1.0    0.1    0.2    0.1    0.2    0.1    0.1    0.1
CC             1.0    0.1    0.2    0.1    0.2    0.1    0.1    0.1
LOCC                  1.0    0.7    0.5    0.8    0.5    0.5    0.3
FC                           1.0    0.3    0.8    0.3    0.5    0.3
LOCD                                1.0    0.3    0.8    0.0    0.0
HC                                         1.0    0.2    0.6    0.3
CCD                                               1.0    0.0    0.0
C                                                        1.0    0.6
A                                                               1.0

Table 13. Spearman's rho values for every possible pair of the studied metrics, computed on the data collected for LLVM.

LLVM    LOC    CC     LOCC   FC     LOCD   HC     CCD    C      A
LOC     1.0    1.0    0.3    0.3    0.2    0.3    0.1    0.2    0.3
CC             1.0    0.2    0.3    0.2    0.3    0.1    0.2    0.3
LOCC                  1.0    0.8    0.5    0.8    0.4    0.6    0.4
FC                           1.0    0.3    0.8    0.2    0.7    0.4
LOCD                                1.0    0.2    0.8    0.1    0.0
HC                                         1.0    0.2    0.7    0.5
CCD                                               1.0    0.1    0.0
C                                                        1.0    0.6
A                                                               1.0

Table 14. Spearman's rho values for every possible pair of the studied metrics, computed on the data collected for OpenMPI.

OpenMPI LOC    CC     LOCC   FC     LOCD   HC     CCD    C      A
LOC     1.0    1.0    0.2    0.2    0.2    0.3    0.1    0.2    0.1
CC             1.0    0.2    0.2    0.2    0.3    0.2    0.1    0.1
LOCC                  1.0    0.6    0.6    0.8    0.5    0.4    0.2
FC                           1.0    0.3    0.6    0.2    0.4    0.3
LOCD                                1.0    0.3    0.8    0.0    0.0
HC                                         1.0    0.3    0.5    0.3
CCD                                               1.0    0.0    0.0
C                                                        1.0    0.6
A                                                               1.0

Table 15. Spearman's rho values for every possible pair of the studied metrics, computed on the data collected for Python.

Python  LOC    CC     LOCC   FC     LOCD   HC     CCD    C      A
LOC     1.0    1.0    0.1    0.2    0.1    0.2    0.1    0.2    0.1
CC             1.0    0.1    0.2    0.1    0.2    0.1    0.2    0.1
LOCC                  1.0    0.7    0.5    0.8    0.4    0.5    0.3
FC                           1.0    0.3    0.8    0.3    0.5    0.4
LOCD                                1.0    0.3    0.8    0.0    0.0
HC                                         1.0    0.2    0.6    0.4
CCD                                               1.0    -0.1   0.0
C                                                        1.0    0.7
A                                                               1.0

Table 16. Spearman's rho values for every possible pair of the studied metrics, computed on the data collected for Quantlib.

Quantlib LOC   CC     LOCC   FC     LOCD   HC     CCD    C      A
LOC     1.0    0.8    0.2    0.3    0.0    0.3    -0.1   0.1    0.1
CC             1.0    0.2    0.3    0.0    0.3    0.0    0.1    0.1
LOCC                  1.0    0.6    0.4    0.7    0.3    0.4    0.3
FC                           1.0    0.2    0.7    0.1    0.4    0.3
LOCD                                1.0    0.3    0.8    0.0    0.0
HC                                         1.0    0.1    0.5    0.3
CCD                                               1.0    0.0    0.0
C                                                        1.0    0.6
A                                                               1.0

Table 17. Spearman's rho values for every possible pair of the studied metrics, computed on the data collected for Ruby.

Ruby    LOC    CC     LOCC   FC     LOCD   HC     CCD    C      A
LOC     1.0    1.0    0.2    0.2    0.2    0.2    0.2    0.2    0.2
CC             1.0    0.2    0.2    0.2    0.2    0.2    0.2    0.2
LOCC                  1.0    0.6    0.4    0.8    0.3    0.5    0.4
FC                           1.0    0.3    0.6    0.2    0.5    0.3
LOCD                                1.0    0.2    0.8    0.0    0.0
HC                                         1.0    0.2    0.6    0.4
CCD                                               1.0    0.0    0.0
C                                                        1.0    0.7
A                                                               1.0

Table 18. Spearman's rho values for every possible pair of the studied metrics, computed on the data collected for Xapian.

Xapian  LOC    CC     LOCC   FC     LOCD   HC     CCD    C      A
LOC     1.0    0.9    0.2    0.1    0.2    0.2    0.1    0.1    0.1
CC             1.0    0.2    0.1    0.1    0.2    0.1    0.1    0.1
LOCC                  1.0    0.6    0.5    0.8    0.4    0.5    0.2
FC                           1.0    0.2    0.7    0.2    0.5    0.2
LOCD                                1.0    0.2    0.8    0.0    0.0
HC                                         1.0    0.2    0.6    0.2
CCD                                               1.0    0.0    0.0
C                                                        1.0    0.4
A                                                               1.0

Table 9 through Table 18 give the Spearman rho correlation results from running the correlation test using the R function cor.test [Best 1975; Hollander, Wolfe, Chicken 2013]. Table 8 is the legend key that we use to describe the colorings and strength levels. Cells with a black background represent a strong to very strong correlation between the pair of tested metrics, cells with a yellow background a moderate correlation, and white cells a weak to very weak relationship. Besides running a correlation test among pairs of metrics from the same category, we also compute cross correlations between metrics from different groups. For example, we find the correlation between LOC and CC from the code metrics, then between LOC and LOCC across the code and change metrics, and so on. We compute the correlation exhaustively over all possible combinations of the metrics.

Table 19. A summary of all Spearman's rho test results. Each cell holds three counters (frequencies), counting how many systems fall into each strength level (strong, moderate, weak) for that pair of metrics.

(strong, moderate, weak)  CC       LOCC     FC       LOCD     HC       CCD      C        A
LOC                       10,0,0   0,1,9    0,1,9    0,0,10   0,1,9    0,0,10   0,0,10   0,0,10
CC                                 0,1,9    0,1,9    0,0,10   0,1,9    0,0,10   0,0,10   0,0,10
LOCC                                        10,0,0   2,6,0    10,0,0   0,8,2    1,9,0    0,4,6
FC                                                   0,1,9    10,0,0   0,0,10   1,9,0    0,4,6
LOCD                                                          0,0,10   10,0,0   0,0,10   0,0,10
HC                                                                     0,0,10   8,2,0    0,6,4
CCD                                                                             0,0,10   0,0,10
C                                                                                        9,1,0

Table 20. Ratios (percentages) for the Spearman's rho tests over all systems. Each cell gives the percentage of systems at each strength level (strong / moderate / weak) among all possible pairs of the studied metrics, i.e., the counts of Table 19 expressed as percentages.

(% strong/moderate/weak)  CC        LOCC      FC        LOCD      HC        CCD       C         A
LOC                       100/0/0   0/10/90   0/10/90   0/0/100   0/10/90   0/0/100   0/0/100   0/0/100
CC                                  0/10/90   0/10/90   0/0/100   0/10/90   0/0/100   0/0/100   0/0/100
LOCC                                          100/0/0   20/60/0   100/0/0   0/80/20   10/90/0   0/40/60
FC                                                      0/10/90   100/0/0   0/0/100   10/90/0   0/40/60
LOCD                                                              0/0/100   100/0/0   0/0/100   0/0/100
HC                                                                          0/0/100   80/20/0   0/60/40
CCD                                                                                   0/0/100   0/0/100
C                                                                                               90/10/0

A metric against itself is a perfect 1.0 (100%), which is the diagonal in the group of tables from Table 9 to Table 18. Take, for example, Table 9 for Chrome; we can see that rho for LOC vs. CC is a very strong correlation of up to 0.9 (90%). Such a highly correlated relationship tells us a lot: the more lines of code a file has, the more complicated it is and the higher its CC. CC is the Cyclomatic Complexity of a program, which represents the number of linearly independent paths through a program's source code. Such a high correlation tells us that LOC and CC behave almost like the same metric. So, among the code metrics, using LOC alone in a fault prediction model will not be affected or enhanced by adding the cyclomatic complexity metric to it.

Reading through Table 9 to Table 18, we can see a very interesting pattern recurring: metrics from the same category are highly correlated with each other, while cross-category correlations are largely weak. This main observation validates our intuition and the intention behind this study. A synergy of metrics from each category is extremely important when building effort and fault prediction models. Most of the literature, as reported in the related work in Chapter 2, relies on single-category metrics, either one metric or a variety from the same category. This work stands behind hybrid combinations of metrics across categories to build more accurate prediction models. Each type of metric has shown promising results, but no work yet has combined such a set of metrics.

To validate our claim regarding the correlation results, Table 19 and Table 20 summarize the empirical assessment of the relations among all metrics. Table 19 has the same cell structure as Tables 9 to 18, but now each cell holds a count tuple for each correlation level. So, LOC vs. CC is 10,0,0, meaning all ten systems agree that the correlation between LOC and CC is a strong to very strong relationship, and there are zero moderate or weak correlations between LOC and CC. The same reading applies to the rest of the table. Table 20 follows the same structure as Table 19, but this time each correlation level is given as a ratio (percentage).

5.7 Discussion

In this chapter we presented a large-scale investigation to understand the distribution of, and the correlation between, code, change, and collaboration metrics. Such metrics have shown promising results for fault prediction and maintenance estimation. The literature has used both single metrics and mixtures of metrics to build such models. The lack of understanding of these metrics has led to conflicting results based on wrong assumptions and underestimation of the distributions of the different metric categories. We then studied the correlation relationships among all the metrics. Such information helps in deciding on the right choice of metrics for any future prediction models. We used two code metrics (size and complexity), four change metrics (churn metrics), and two collaboration metrics (authorship and commits).

We can summarize our results in two points. First, code metrics follow a double Pareto distribution, while change and collaboration metrics follow a Pareto distribution. Earlier research [Herraiz Tabernero, German, Hassan 2011] has reported similar results, but not at this variety of metrics and scale. Second, metrics within the same group are highly correlated, while cross-category correlations are weak. For example, code metrics correlate weakly with change metrics, while LOC and CC from the code metrics are very strongly correlated, in the range of 0.9. That means either of these two metrics adds little information to the other if both are used in combination. This important observation recommends using a variety of categories rather than focusing on multiple metrics of the same category.


CHAPTER 6

USING AGE AND DISTANCE TO IMPROVE THE DETECTION OF

EVOLUTIONARY COUPLINGS

In this chapter we discuss an approach to improve the accuracy of evolutionary couplings uncovered from version history. Two measures, namely the age of a pattern and the distance among items within a pattern, are defined and used with the traditional methods for computing evolutionary couplings. The goal is to reduce the number of false positives (i.e., inaccurate or irrelevant claims of coupling). We first discuss the characteristics of these measures and highlight a few observations on applying them to the ranking and filtering of evolutionary couplings. Then we use eleven large open source systems to validate our claim. The results show that age is a decisive filter for false patterns, while distance is not.

6.1 Introduction

The work presented here investigates the problem of reducing the number of false positives by ranking the patterns. Two measures of a pattern, namely distance and age, are introduced. Our hypothesis is that these measures will help rank correct and important patterns higher than less meaningful or false-positive patterns. This has the potential to improve the usefulness of the patterns for developers.

We now present our observations, validation, and experiments on this topic in the following manner. Section 6.2 presents a brief discussion of evolutionary coupling and how we mine the patterns. Section 6.3 covers data collection, and the following sections cover the distance measure (Section 6.4) and the age measure (Section 6.5); each defines the measure and presents initial observations. Section 6.6 presents the experiment and validation. Following that are the conclusions and future work.

6.2 Frequent Pattern Mining

We use the Eclat algorithm for frequent pattern (or itemset) mining [Zaki,

Parthasarathy, Li 1997] to uncover evolutionary couplings. The technique searches for patterns of co-changing items. The underlying idea is that if items co-change together on a very frequent basis then they must be related to one another in either an explicit or implicit manner.

The technique has a number of parameters that must be selected for the particular problem and data set. The first is the size of a transaction (or change set). The transaction size dictates which items “change together”. Since logical commits and physical commits are typically not mapped one-to-one, a time duration/window is needed to group physical commits into logical ones. Selecting the time window is an open problem as a logical commit can be spread out over multiple physical commits across days.

The next parameter that must be selected is the minimum support. This value regulates the lower bound on the frequency of the patterns (itemsets) produced. Low values of minimum support generate larger numbers of patterns, while higher values produce fewer patterns but with more support. Lastly, we must select the granularity of the changing item. For source code this could be a line of code, a function, a file, or a subsystem. In


our case we use the file level. This is a common level of granularity to use and maps well to commits.
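As a concrete illustration of the mining step just described, the following is a minimal, brute-force sketch in Python of counting co-changing file itemsets over commit-level transactions with a minimum support threshold. It is not the Eclat implementation used by srcMiner; the function name and the toy data are illustrative only.

from itertools import combinations
from collections import Counter

def mine_cochange_patterns(transactions, min_support, max_size=3):
    """Count how often each set of files changes together.

    transactions : list of sets of file paths, one set per change set
                   (here, one Subversion commit per transaction).
    min_support  : minimum number of transactions a pattern must appear in.
    max_size     : largest itemset enumerated; a real miner (Eclat/Apriori)
                   prunes the search space instead of enumerating blindly.
    """
    counts = Counter()
    for files in transactions:
        for size in range(2, max_size + 1):
            for itemset in combinations(sorted(files), size):
                counts[itemset] += 1
    # Keep only itemsets that reach the minimum support.
    return {itemset: supp for itemset, supp in counts.items() if supp >= min_support}

# Toy usage: three commits at file-level granularity.
commits = [
    {"ui/panel.cpp", "ui/panel.h"},
    {"ui/panel.cpp", "ui/panel.h", "core/model.cpp"},
    {"ui/panel.cpp", "ui/panel.h"},
]
print(mine_cochange_patterns(commits, min_support=2))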

6.3 Data Collection and Patterns Generation

The implementation of the approach is achieved with two tools: ΔShopper (pronounced "delta shopper") and srcMiner (read "source miner"). These tools target data available from the Subversion version control system.

The ΔShopper tool is responsible for extracting metadata and differences from a Subversion repository. The program takes a unit of time as a parameter and extracts information about the modification of artifacts over the course of the history. It identifies modifications to files and functions within each time unit and generates datasets of buckets of co-changing files and functions; the time unit used here is a Subversion commit. To collect the files we use the archived logs of co-changing files.


Table 21. Characteristics of the eleven open source systems used in the study, including years, commits, files, selected minimum supports, itemsets generated, and maximum itemset sizes (L).

System      Years       Commits  Files   Min. Support  Itemsets  L
KDELibs     01-10 (10)  54,189   14,748  9             13,342    10
KOffice     01-10 (10)  55,651   21,857  12            47,444    13
Httpd       99-11 (13)  11,264   763     5             146,363   16
Subversion  00-11 (12)  23,420   1,485   10            93,701    16
Ruby        00-11 (12)  12,439   834     9             82,005    14
Chrome      08-11 (4)   35,650   16,358  12            315,986   18
QuantLib    00-10 (11)  7,791    5,340   9             266,424   18
OpenMPI     03-11 (9)   11,682   6,583   6             328,963   18
LLVM        01-10 (10)  50,327   4,266   12            55,435    13
GCC         01-10 (10)  50,145   26,154  12            50,050    11
Xapian      00-10 (11)  4,703    1,302   5             16,764    12


Figure 25. Size ratios (total commits/years/MAX), where commits and years are as reported in Table 21. MAX is the highest commits/years ratio, which belongs to Chrome, making it relatively 100% active.

Table 21 presents the open source projects we use in this work, the range of years, the number of files in each system, the commits, the minimum support counts, the generated itemsets, and the maximum observed itemset size. We use srcMiner to process our datasets and generate frequent patterns. We developed srcMiner based on the Eclat frequent itemset mining algorithm [Zaki, Parthasarathy, Li 1997].

6.4 Pattern Distance

Here, distance represents the position of multiple files relative to one another within the directory tree. Software projects normally split source files into a logical tree


structure that represents abstract modules. For example, all source code that deals with

UI management might be under the same sub-directory. We take advantage of this and use tree distance as our measure. A tree distance is the number of unmatched edges between any two leaf-nodes (files) on the file structure tree.

For example, the three files below represent an itemset with a support count of 12. Two of them differ by three folders, so the distance between them is 6, which is the maximum distance in this pattern:

/koffice//colors/gray_u16/kis_gray_u16_colorspace.h

/koffice/plugins/colors/gray_u8/kis_gray_colorspace.h

/koffice/plugins/colors/gray_u16/kis_gray_u16_colorspace.h

We use the definition of tree distance between any pair of files to define a distant pattern. A distant pattern is a pattern whose maximum tree distance between its items exceeds a given threshold. We empirically assign the threshold based on the distribution of tree distances over all collected patterns. Specifically, we choose the point where the distance becomes relatively non-typical, an outlier. For example, from Table 22 we can see distances of zero and two occurring for multiple systems, and one and four for a couple, which together represent the majority (above 75%) of the data. We consider distances above that to be outliers.
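The following sketch shows how the tree distance and the distant-pattern test described above could be computed; it counts only unmatched directory edges, so two files in the same folder have distance 0, as in the KOffice example. The helper names are illustrative and not part of our tooling.

from itertools import combinations

def tree_distance(path_a, path_b):
    """Unmatched directory edges between two files in the directory tree.

    Only the directory parts are compared, so two files that are children
    of the same folder have distance 0.
    """
    dirs_a = path_a.strip("/").split("/")[:-1]
    dirs_b = path_b.strip("/").split("/")[:-1]
    common = 0
    for x, y in zip(dirs_a, dirs_b):
        if x != y:
            break
        common += 1
    return (len(dirs_a) - common) + (len(dirs_b) - common)

def pattern_distance(pattern):
    """Maximum pairwise tree distance among the files of a pattern."""
    return max(tree_distance(a, b) for a, b in combinations(pattern, 2))

def is_distant(pattern, threshold):
    """A distant pattern exceeds the empirically chosen distance threshold."""
    return pattern_distance(pattern) > threshold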

Distant patterns represent changes occurring across a broader portion of the system.

Our hypothesis is that these patterns are more likely to represent hidden dependencies and may be otherwise difficult to uncover. Such patterns may be an indication of code


that is a candidate for refactoring or reengineering. The premise is that, since the majority of co-changes are localized, such outlier behavior is often problematic.

Figure 26. Distribution for Distances between Pattern Pairs. Patterns Generated for KOffice [2001-2010] for Files.

The distribution of distances between itemset pairs is presented using a histogram for the eleven systems. The data mined (Table 21) is split into deciles that are collapsed if two deciles have the same value. Figure 26 presents the histogram created using these deciles, with the file distances on the x-axis and the frequency of files at each decile on a log scale on the y-axis.

It can be seen that typical distances for KOffice are zero and two for files, where the zero distance is dominating the set (zero means the files are children of the same folder).

This implies that most evolutionary couplings occur within the same subfolder and remain localized. This isn’t entirely unexpected, as changes to modules should happen,


most of the time, in source files local to the module, as opposed to outside of the module.

With this in mind, outliers in this data (couplings with a distance higher than three or four) may represent cross-module couplings and require further analysis by software engineers.

For the other 10 systems we examined, most followed the same trend as KOffice with

0 dominating the set of distances. A distance of 2 took second place most of the time.

Table 22 gives a partial listing of distances that, when combined, represent more than

75% of the data collected for the systems.

6.5 Pattern Age

Age represents the duration of an evolutionary coupling. For impact analysis, the age of an evolutionary coupling may help to highlight both explicit and implicit couplings between files by showing that they have been co-changed extensively and persisted unbroken during maintenance activities for a long time.


Table 22. Typical Distances for File Granularity.

            Majority
OSS         Distance   %     Distance   %     Total %
KDELibs     0          86%                    86%
KOffice     0          97%                    97%
Httpd       0          84%                    84%
Subversion  0          99%                    99%
Ruby        0          99%                    99%
Chrome      0          32%   1          44%   76%
QuantLib    0          99%                    99%
OpenMPI     2          81%                    81%
LLVM        2          63%   4          13%   76%
GCC         0          92%                    92%
Xapian      0          27%   2          65%   91%

We define a pattern's age as the difference in days between the date on which a given pattern first appears and the date on which it last appears during the selected study period. We refer to this difference as age because patterns exist over an interval of time: they are created through maintenance tasks, persist for a period of time through further maintenance, and eventually, through inaction or maintenance, their appearance becomes infrequent, perhaps denoting their end. The age does not necessarily reflect the end, or death, of a pattern, because it may reoccur during maintenance outside our range of study.

To compute the age of an evolutionary coupling, we find the date on which it first appeared and subtract this from the date on which it last appeared. We did this for eleven systems in order to get an overview of what pattern age looks like for various software projects. For example, take a coupling that first appears on Oct 6th and last appears on Nov 1st of the same year, with multiple revisions containing the coupling between those dates (Oct 6th, Oct 10th, Oct 19th, Oct 30th, and Nov 1st). The age of this pattern is the difference between Oct 6th and Nov 1st, which is 26 days.
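A minimal sketch of this computation, assuming the appearance dates of a pattern have already been collected, reproduces the 26-day example (the year used below is arbitrary):

from datetime import date

def pattern_age(appearance_dates):
    # Age is the number of days between the first and last appearance of a
    # pattern in the studied history; intermediate occurrences do not matter.
    return (max(appearance_dates) - min(appearance_dates)).days

occurrences = [date(2010, 10, 6), date(2010, 10, 10), date(2010, 10, 19),
               date(2010, 10, 30), date(2010, 11, 1)]
print(pattern_age(occurrences))  # 26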

A pattern appears during maintenance due to dependencies that exist between multiple items. Often, when a pattern appears, there is a ripple effect that forces the same pattern to reappear within a short interval of time. This can be due to an unfinished task, a bug created or uncovered by the maintenance performed, a system-wide API change taking place, etc. The pattern then sleeps for a short time and reappears once again.

For example, Figure 27 shows deciles for the difference in time from one appearance of a pattern to the next; the bars are the sizes of these differences in days. We notice that there is a 40% chance that, for every appearance of a pattern, the next appearance will be within four days. We have made similar observations with the other systems used in our experiments.


Figure 27. Distribution of Patterns Reoccurring Differences in Days for KOffice [2001-2010] over Files.

In order to get an idea of what a typical age is, we computed the distribution of ages for the eleven systems. The data contained in the graphs are split into deciles that are collapsed if two deciles represent the same value.

The distributions indicate that the age of coupled files tends to be short in most systems. This is presented in Table 23 and holds true for more than half of the systems we examined. For example, QuantLib's maximum age is 3051 days, but 90% of the couplings have ages at or below 399 days.


Table 23. Typical age distribution For files. The threshold represents the age at which most of the data is accounted for.

System      Max Age (days)  Threshold (days)  Data %
KDELibs     2046            1027              60%
KOffice     3311            1456              60%
Httpd       4491            1030              90%
Subversion  4121            975               70%
Ruby        3591            3302              90%
Chrome      878             103               80%
QuantLib    3051            399               90%
OpenMPI     1990            929               90%
LLVM        3482            928               80%
GCC         3642            2810              60%
Xapian      3458            1206              60%

Here, a short age is defined as half of the age of the longest evolutionary coupling within the system. Out of the eleven systems we studied, only three did not show this trend at file granularity. In Figure 28, we show the distribution of age for ten years of the KOffice software package at file granularity. The x-axis represents the range of ages, the y-axis represents the frequency of couplings that fall within each range, and the second bar is the width of each decile.


Figure 28. 10-Percentile Distribution for Patterns Age and Decile Widths in days for KOffice [2001-2010] for Files

We notice a normal-like distribution for age, where the middle deciles are narrow and contain most of the age frequencies, while both tails cover long ranges of days and are less frequent. For example, from Figure 28, we see that 60% of the data collected has an age of 1456 days or less, and the largest age for an evolutionary coupling is 3311 days for files. This follows the trend for short ages, because 1456 is less than half of 3311 and 60% of the data have an age of 1456 days or less.

Several of the other systems had similar distributions. These include OpenMPI,

QuantLib, Httpd, Subversion, LLVM, Xapian, and Chrome and most of their evolutionary couplings were well below half of the maximum age. The only systems that did not show this trend are KDELibs, GCC, and Ruby.


6.6 Evaluation Using Interestingness Measures

We use data mining techniques to assess the quality of a pattern in the context of change impact analysis [Arnold, Bohner 1996] by assessing the interestingness of the mined association rules. It is quantitatively sufficient to measure the quality of the generated rules using a combination of the support-confidence framework of Agrawal et al. [Agrawal, Imieliński, Swami 1993] with lift or leverage [Damaševičius 2009; Soman, Diwakar, Ajay 2006]. Here, we generate association rules from our patterns, compare high ages against low ages and zero distances against higher distances, and then compare the values of both lift and confidence.

Confidence and lift are measures of significance and interestingness for association rule mining. Given a rule X → Y, the confidence of the rule [Agrawal, Imieliński, Swami 1993] is defined as the probability of the rule's consequent (Y) under the condition that the transaction also contains the antecedent (X). Confidence can be evaluated using the conditional probability P(Y|X); lift, in contrast, is the ratio of the actual probability observed to that expected if X and Y were independent. Confidence is a commonly used measure for the quality of an association rule, see Equation 1.
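Equations 1 through 3 are referenced here but not reproduced in this section; the standard support-confidence definitions, which those equations are assumed to follow, are, where supp(·) denotes the fraction of transactions containing an itemset:

\mathrm{confidence}(X \rightarrow Y) = P(Y \mid X) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}

\mathrm{lift}(X \rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)} = \frac{\mathrm{confidence}(X \rightarrow Y)}{\mathrm{supp}(Y)}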


Table 24. Eleven open source systems with the number of generated itemsets and their minimum supports, the number of association rules generated using a 10% minimum confidence threshold, and the number of files included in the evaluation. The date range for all systems is April 01, 2009 to August 01, 2009.

System      Min. Support  Number of Files  Itemsets  20% of Itemsets  Association Rules
KDELibs     9             635              13,342    2,002            169,210
KOffice     13            912              23,215    3,483            583,688
Httpd       7             189              8,379     1,257            132,755
Subversion  12            291              14,912    2,237            292,639
Ruby        11            85               17,668    2,651            372,225
Chrome      15            732              19,031    2,855            588,822
QuantLib    12            136              1,804     271              30,262
OpenMPI     8             372              12,757    1,914            587,797
LLVM        18            348              12,563    1,885            677,698
GCC         15            423              21,631    3,245            359,726
Xapian      6             179              6,378     957              257,278

Using the patterns in Table 24, we generate all possible association rules by constructing every combination obtained by splitting each pattern into two subsets. This can produce an enormous number of rules, so we set constraints on rule generation. Again, the tool chain (ΔShopper + srcMiner) was run on the studied systems, and the results are shown in Table 24. We use an arbitrary minimum support (see Table 24) and a minimum confidence of 10%, both selected with the intent of limiting the enormous number of association rules. These numbers come out of experience, given the lack of research on golden rules for selecting the right minimum support and other interest measures (e.g., confidence).


We collected the data in Table 24 using five months of history across all systems. Each system has its own nature of growth and size (Figure 25), which correlates with the number of patterns generated, as more active systems mean more commits and more changes with richer histories. GCC, LLVM, KDELibs, and KOffice, as shown in Figure 25, are the fastest evolving systems among our selected open source systems. This is clearly reflected in the fact that each is among the highest minimum supports while still producing the highest numbers of association rules.

Let A and B be disjoint subsets of a pattern I, with A ∪ B = I. From Equation 2, lift(A → B) is equal to lift(B → A), which is not the case for confidence, see Equation 1. We generated only the association rules A → B where |A| ≥ |B|, without duplications. Then, to filter out low quality rules and reduce the number of association rules, we only consider rules where confidence(A → B) ≥ 0.1.
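A sketch of this rule-generation step is shown below, assuming a support lookup table that contains the support count of every relevant sub-itemset; the size ordering used to avoid duplicate orientations is an assumption, and the names are illustrative.

from itertools import combinations

def rules_from_pattern(pattern, supp, min_confidence=0.1):
    """Enumerate association rules A -> B from one frequent pattern I.

    pattern : tuple of files forming the itemset I.
    supp    : dict mapping frozensets of files to their support counts
              (assumed to contain I and every antecedent considered).
    Splits I into two disjoint subsets A and B with A ∪ B = I, keeps a
    single orientation per split, and prunes rules below the threshold.
    """
    items = frozenset(pattern)
    rules = []
    for k in range(1, len(pattern)):
        for antecedent in combinations(sorted(pattern), k):
            a = frozenset(antecedent)
            b = items - a
            # Keep one orientation per split (assumed |A| >= |B|, with a
            # tie-break on equal sizes to avoid duplicates).
            if len(a) < len(b) or (len(a) == len(b) and min(a) > min(b)):
                continue
            confidence = supp[items] / supp[a]
            if confidence >= min_confidence:
                rules.append((a, b, confidence))
    return rules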


Table 25. Deciles of confidence values for KOffice and their win-lose scores comparing low-age-patterns vs. high-age-patterns

kth Decile  High-age-patterns  Low-age-patterns  High-age-patterns score  Low-age-patterns score
0           10                 5                 1                        0
1           5                  10                0                        1.1
2           5                  10                0                        1.2
3           10                 5                 1.3                      0
4           10                 5                 1.4                      0
5           10                 5                 1.5                      0
6           10                 5                 1.6                      0
7           10                 5                 1.7                      0
8           5                  5                 0                        0
9           5                  5                 0                        0
10          5                  5                 0                        0
Total Final Score                                8.5                      2.3


Figure 29. The win-lose plot and score for the 10-point percentile distribution of confidence for rules generated from the patterns for KOffice, comparing low-age-patterns vs. high-age-patterns.


Table 26. Deciles of confidence values for KOffice and their win-lose scores comparing zero-distant-patterns vs. non-zero-distant-patterns

kth Decile  Non-zero-distant-patterns  Zero-distant-patterns  Non-zero-distant-patterns score  Zero-distant-patterns score
0           10                         5                      1                                0
1           10                         5                      1.1                              0
2           10                         5                      1.2                              0
3           10                         5                      1.3                              0
4           10                         5                      1.4                              0
5           10                         5                      1.5                              0
6           10                         5                      1.6                              0
7           10                         5                      1.7                              0
8           10                         5                      1.8                              0
9           10                         5                      1.9                              0
10          5                          10                     0                                2
Total Final Score                                             14.5                             2


Figure 30. The win-lose plot and score for the 10-point percentile distribution of confidence for rules generated from the patterns for KOffice, comparing zero-distant-patterns vs. non-zero-distant-patterns.

Our initial aim is to validate that age and distance are two characteristics that can be used to filter out patterns, or to classify patterns as high or low quality, when used for impact analysis and code change prediction. We compare younger patterns (low-age-patterns) against older patterns (high-age-patterns). To do this, we take all generated association rules, sort them by age, and split the rules in half into two groups: one with the higher-aged patterns and the other with the shorter-aged patterns. For distance (which is the maximum distance observed for any pair of files in a pattern, based on our earlier definition of tree distance), and for the same rules from Table 24, we again split into two groups, but this time one group with zero distances only (zero-distant-patterns) and the other with the rest of the distances (non-zero-distant-patterns). We noticed that zero-distant patterns are the majority, which is not very surprising, since co-changing files are usually placed in the same folder.

We now address which group produces change patterns with higher confidence values. So we compare low-age-patterns against high-age-patterns and, for the second characteristic, distance, we compare zero-distant-patterns against non-zero-distant-patterns.

For the generated patterns in Table 24, we computed confidence values using Equation 1. For each approach we sort the values and then break them down into 10 deciles (11 percentile points, the 0th through the 100th), where each part represents 1/10 of the observed values. Then we compare the first point (the 0th percentile, i.e., the minimum value from each side), then the second point (the 10th percentile), and so on until the 100th percentile (i.e., the maximum value). We compare at each point and use the results to build a win-lose plot (a score).

To assign a score at each percentile, the following rule is used: assign 0 to the loser and 1 + 1 × k to the winner, where k is the percentile expressed as a fraction (so a win at the 10th percentile scores 1.1 and a win at the 100th percentile scores 2). If the values are equal we assign 0 to both, as there is no winner. The reasoning behind this rule is that a win at a higher percentile is more valuable than a win at a lower one (much like a weighted score). Notice the last two columns of Table 25; the final score is the total of the scores at each point.

Figure 29 shows a win-lose plot generated from Table 25; high is a win and low is a loss. When two points overlap it represents a tie. Each point represents a percentile.
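A small sketch of this scoring scheme, assuming the two groups' confidence values are available as plain lists (numpy is used only for the percentile points):

import numpy as np

def decile_points(values):
    """The 0th, 10th, ..., 100th percentile points of a list of confidences."""
    return [np.percentile(values, 10 * k) for k in range(11)]

def win_lose_score(values_a, values_b):
    """Weighted win-lose comparison of two confidence distributions.

    At percentile point k (k = 0..10) the winner receives 1 + k/10 points,
    the loser 0, and ties score 0 for both, as described above.
    """
    score_a = score_b = 0.0
    for k, (a, b) in enumerate(zip(decile_points(values_a),
                                   decile_points(values_b))):
        if a > b:
            score_a += 1 + k / 10.0
        elif b > a:
            score_b += 1 + k / 10.0
    return score_a, score_b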

Applying confidence as in Equation 1, we use the win-lose plot to compare the confidence distributions of the two approaches. Figure 29 and Figure 30 present the win-lose charts of the ordered values using a 10-point percentile (decile) approach. For KOffice, in Table 25 and Figure 29, high-age-patterns have a better distribution, with higher values than low-age-patterns. For KOffice, in Table 26 and Figure 30, non-zero-distant-patterns have a better distribution, with higher values than zero-distant-patterns.

It implies that high-age-patterns have better predictive rules with higher confidence than low-age-patterns. Also, non-zero-distant-patterns have better predictive rules with higher confidence than zero-distant-patterns.

Confidence is sensitive to high frequencies of the consequent Y; consequents that have higher support will automatically produce higher confidence values even if there exists no association between the items [Hahsler, Hornik, Reutterer 2006]. Lift overcomes such sensitivity and is a measure of how many times more often X and Y occur together than expected if they were statistically independent. Lift tells us how much better a rule is at predicting the result than just assuming the result.

For a rule A → B, a higher lift implies that A and B co-occurring in a transaction is not a random coincidence but is due to some relationship between them. Lift originates in predictive modeling, where the purpose is to identify a subgroup (target) from a larger population, for example members likely to respond positively to a marketing offer. A model is successful if the response within the target is much better than the average for the population as a whole; lift is the ratio of these values, target response divided by average response. See Equation 3 for the formal definition of lift [Agrawal, Imieliński, Swami 1993]. As with confidence, we calculate the lift values for the generated rules.


Table 27. The final confidence and lift comparison scores of the eleven open source systems on the generated association rules for the generated low-age-patterns vs. high-age-patterns

System      High-age-patterns Confidence  Low-age-patterns Confidence  High-age-patterns Lift  Low-age-patterns Lift
KDELibs     1.1                           13.4                         2.3                     14.2
KOffice     8.5                           2.3                          1                       15.5
Httpd       0                             10.9                         0                       16.5
Subversion  0                             14.5                         1                       15.5
Ruby        1                             9.8                          1                       15.5
Chrome      1.2                           11.4                         0                       16.5
QuantLib    1                             8.1                          2.1                     14.4
OpenMPI     1                             13.5                         1                       15.5
LLVM        6.6                           6                            0                       16.5
GCC         0                             14.5                         0                       16.5
Xapian      1.6                           7.8                          1                       15.5
Wins        3                             8                            0                       11
            27%                           73%                          0%                      100%


Table 28. The final confidence and lift comparison scores of the eleven open source systems on the generated association rules for the generated zero-distant-patterns vs. non-zero-distant-patterns.

System      Non-zero-distant-patterns Confidence  Zero-distant-patterns Confidence  Non-zero-distant-patterns Lift  Zero-distant-patterns Lift
KDELibs     1                                     7.1                               1                               15.5
KOffice     1                                     9.8                               1                               15.5
Httpd       5                                     4.2                               15.5                            1
Subversion  0                                     12.6                              0                               16.5
Ruby        1                                     15.5                              1                               15.5
Chrome      8                                     4.6                               15.5                            1
QuantLib    1                                     6.5                               2.3                             14.2
OpenMPI     6.8                                   2.3                               15.5                            1
LLVM        13.5                                  1                                 15.5                            1
GCC         10.9                                  2.3                               0                               16.5
Xapian      9.8                                   1                                 15.5                            1
Wins        6                                     5                                 5                               6
            55%                                   45%                               45%                             55%


From Table 27 we can see that confidence leans clearly toward low-age-patterns overall; KOffice and LLVM are the only wins for high-age-patterns. Rerunning the same procedure for lift, lift (also in Table 27) has eleven wins to zero for low-age-patterns, agreeing with confidence overall.

Table 28 has the comparison results for distance. Due to the inconsistency between the confidence and lift results, and the slight win on each side, the results do not favor either group over the other. For confidence, the score is six to five, slightly in favor of the non-zero-distant-patterns over the zero-distant-patterns across the studied systems. Lift (Table 28) has zero-distant-patterns slightly better, with six wins against five for the non-zero-distant-patterns, which is the reverse of the confidence result. This confirms that there is no decisive winner: confidence gives non-zero-distant-patterns a one-system advantage, and lift gives the same advantage to zero-distant-patterns. We can say that association rules generated from low-age-patterns are better at producing rules of higher interest, and therefore better predictability; we cannot say the same for non-zero-distant-patterns against zero-distant-patterns.

Our initial expectation was that patterns with larger distances, and older patterns, would produce higher quality rules. On the other hand, one could argue that patterns with shorter distances should have a clear edge in quality over patterns with longer distances, since files that are close together change together more often and tend to have physical dependencies among them.


Regarding the age characteristic, we initially thought that older patterns would produce higher quality association rules, since they have existed longer and have likely occurred more frequently. At the same time, older patterns suffer from having many broken paths, which makes them harder to trace, and developers usually have better awareness of new, fresh dependencies emerging from the code than of older ones. Such results were initially unexpected and of interest.

6.7 Summary

Our goal is to investigate how effective ranking the patterns using the measures of age and distance is in reducing the number of false positives. So far we have observed interesting characteristics for each measure. We noticed that patterns are most often localized changes (above 75%), with distances of 0 and 2 between co-evolving files. We believe the outliers are of great interest, as they may indicate hidden dependencies, for example. Ranking patterns based on distance can help direct developers to unusual couplings. With respect to the age of a pattern, we observed that there is a 40% chance that a pattern will occur again within 4 days or less (in the systems examined). Furthermore, it appears that the older a pattern is, the more rooted it is and the more frequently it has likely occurred.

We then validated the effect of rankings based on these measures using interestingness measures from data mining. The aim is to help raise engineers' awareness of important patterns and reduce the problem of too many false patterns being examined. We found several interesting and somewhat unexpected results.

We observed that distance is not a good measure or characteristic for ranking or filtering out false evolutionary couplings. On the other hand, age is: younger patterns are fresher and better predictors of change.

We believe our empirical validation scales well enough to address our research questions. Nevertheless, as future work, we plan to validate our results further using different experiments and validation approaches on a broader range of systems.

CHAPTER 7

ASSESSING TIME WINDOW SIZE IN THE MINING OF SOFTWARE

REPOSITORIES FOR EVOLUTIONARY COUPLINGS

We present an empirical study that assesses the effect time window size (i.e., commit, hour, day, and week) has on mining software repositories for evolutionary change patterns. The results of using different time windows in the detection of evolutionary couplings are compared to determine which is best at predicting future changes. Three assessment methods are used: the first compares each size against the others, the next uses the results of one time window size to predict another, and the last compares the intersection and union of all possible combinations to determine if any combination improves results. Thirteen open source systems are used in the assessment. The results show that the time window of a week appears to be the best for prediction, and that combining different time windows improves predictive ability.

7.1 Introduction

A classic problem in mining software repositories (MSR) is the mapping of physical commits (in version control systems) to logical change-sets. Physical commits are such things as atomic multi-change commits, change lists, check-ins, or patches. A logical changeset is a set of commits/modifications to one or more files that are related to one maintenance task. The task can be such things as a bug fix, the entire response to a


modification request [Mockus, Votta 2000], or a small adaptive API [Collard, Maletic,

Robinson 2010] change.

A logical changeset can be spread over many commits across days or weeks. The individual commits may also include changes that relate to other tasks and may be committed by different developers. The development process (e.g., weekly iterations, daily builds, etc.) can also impact how physical commits map to logical changesets.

To address this problem, researchers have used various heuristics, mainly based on a sliding time window approach, to group supposedly related commits into a logical changeset. We are particularly interested in the problem of uncovering evolutionary couplings [Gall, Hajek, Jazayeri 1998; Gall, Jazayeri, Krajewski 2003] and what impact the size of the time window has on the quality of the couplings produced. The definition of the changeset has a major effect on the detection of evolutionary couplings. That is, the better the mapping between physical commits and logical changesets, the higher the quality of the detected evolutionary couplings.

Traditionally, mining for evolutionary coupling uses a small time window or a discrete version as committed. Many prior studies employ a sliding window approach where two subsequent changes committed by the same author with the same log message are part of one transaction if they are at most 200 seconds apart [Zimmermann, Weisgerber, Diehl, Zeller 2004]. Other studies use a single commit as the changeset (i.e., a zero time window). However, the selection of the time window size is in general an open problem and has not been systematically studied.
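A minimal sketch of the 200-second sliding-window heuristic attributed to Zimmermann et al. is shown below; the commit tuple layout is illustrative and not the format of any specific tool.

def group_commits(commits, window_seconds=200):
    """Group physical commits into logical changesets with a sliding window.

    commits : list of (timestamp_seconds, author, log_message, files) tuples,
              ordered by time.
    Two subsequent commits by the same author with the same log message are
    merged into one transaction if they are at most window_seconds apart.
    """
    transactions = []
    for time, author, message, files in commits:
        if (transactions
                and author == transactions[-1]["author"]
                and message == transactions[-1]["message"]
                and time - transactions[-1]["last_time"] <= window_seconds):
            transactions[-1]["files"] |= set(files)
            transactions[-1]["last_time"] = time
        else:
            transactions.append({"author": author, "message": message,
                                 "last_time": time, "files": set(files)})
    return [t["files"] for t in transactions]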


In this work, we aim to find an optimal time window size for mining evolutionary couplings. We investigate four basic time window sizes, namely a single commit, one hour, one day, and one week of commits. In the context of detecting evolutionary couplings using frequent pattern mining, we investigate the following questions:

Q1 Which time window size (i.e., single commit, hour, day, and week of commits)

gives the best results for predicting future changes?

Q2 Can one time window size be predictive for another?

Q3 Can the combined use of different time window sizes improve the results?

In a recent survey by Li et al. [Li, Sun, Leung, Zhang 2013] of papers published between 1997 and 2010 concerning change impact analysis, the distribution of papers on static analysis was 70% and 30% on dynamic analysis. Co-change coupling to predict future changes is reported to be the most used technique under static analysis. These studies rely on some definition for a time size window to determine a changeset of co- changes. The size of 200 seconds for the sliding window (or SVN commit) is the most used figure. However, our results show that a time window size of a week appears to generate the most predictive evolutionary couplings for the systems we examined (Q1).

This chapter is organized as follows. Section 7.2 briefly revisits evolutionary couplings. The comparative analysis approach and the setup of the study are presented in Section 7.3. We empirically validate our approaches in Section 7.4, and we close with threats to validity, conclusions, and discussion in Sections 7.5 and 7.6.


7.2 Evolutionary Couplings

The intent of evolutionary coupling, as presented in Section 2.2, is to find hidden dependencies and traceability links that are difficult to identify through physical dependencies in the code. Software elements are evolutionarily coupled if they frequently co-change over a defined time window, with a minimum support, for a given duration of history. We use the frequent pattern mining algorithm presented in Section 6.2.

7.3 Approach & Setup of the Study

The goal of this work is to study different choices of time window size for producing evolutionary couplings. This section describes the approach we take, the data used, and the tools used to conduct the empirical study.

7.3.1 Experimental Data

Table 29 and Table 30 present the 13 open source projects used in this investigation along with their characteristics. The tables are sorted by number of commits. Note that all systems are primarily implemented in C or C++. Included are the number of years of collected history, the number of files and LOC of the most recent release (in the studied history), and the number of distinct files committed. Notice that the number of files in the most recent version may be more than the number of files that were committed in the studied period. For example, GCC has around 30K files in the most recent copy we used, while in the years 2007-2010 the number of files committed was around 12K. The hours/days/weeks columns represent the number of hours/days/weeks that have at least one commit.

Table 29. Characteristics of The Studied Systems. For each system there is number of Files, LOC, Committed Files, and Years.

System      Files(1)  LOC        Committed Files(2)  Years
QuantLib    1,940     449,139    3,257               07 - 10
Python      769       527,342    474                 07 - 10
OSG         1,982     509,088    2,235               07 - 10
Xapian      930       178,877    773                 07 - 10
Httpd       370       213,425    409                 07 - 10
Open MPI    3,814     887,728    2,840               07 - 10
GCC         30,794    3,783,430  12,313              07 - 10
LLVM        1,735     728,641    2,789               07 - 10
KDELibs     5,177     1,306,065  7,996               07 - 10
KOffice     6,043     1,264,067  13,076              07 - 10
Subversion  687       425,549    1,405               01 - 10
Ruby        396       351,062    829                 01 - 10
Chrome      13,324    2,682,856  27,473              08 - 11

(1) Number of files in the most recent release. (2) Total number of distinct files in the observed history.


Table 30. The Number of Commits, Hours, Days, and Weeks Used In This Study Over The Thirteen Open Source Systems.

System      Commits  Hours(1)  Days(1)  Weeks(1)
QuantLib    3,182    2,000     721      192
Python      2,145    1,788     797      180
OSG         4,033    2,569     902      199
Xapian      2,658    1,961     731      187
Httpd       2,396    1,763     781      204
Open MPI    4,902    3,502     1,128    208
GCC         18,128   11,574    1,452    208
LLVM        30,624   13,432    1,445    209
KDELibs     21,829   11,889    1,457    209
KOffice     24,206   11,569    1,441    208
Subversion  18,809   13,267    3,146    522
Ruby        12,439   9,085     2,897    517
Chrome      59,342   16,993    1,227    179

1 The number of hours/days/weeks having one or more commits.

Version data is extracted using our tool ΔShopper (pronounced "delta shopper"). ΔShopper extracts metadata from a Subversion repository, partitioning the data into configurable time windows. The program identifies the modified files within each time window, builds the transactions, and generates datasets as XML documents.


Figure 31. Activity Plot based on the ratio commits/years divided by the maximum of all commits/years ratios for the studied thirteen systems.

Figure 31 shows activity rates, defined as the number of commits over the number of years. The commits/years ratio is normalized by dividing it by the maximum observed ratio among the systems. Chrome is, relatively, 100% active since it has the highest activity ratio. This ratio draws a distinction among the presented systems in terms of how active each system has been. We have noticed, in comparison with Table 31, that the more a system is changed, the larger the number of patterns. That is, a large number of files are modified, so there tend to be more co-changing dependencies. We addressed this earlier in Section 5.3.


7.3.2 Patterns Generation

The Koupler2 tool takes the data generated by ΔShopper and computes scaled and discretized change vectors for each file in each time period. Frequent pattern mining using the Apriori algorithm [Agrawal, Srikant 1994] is applied with an initial minimum support of s/T, where T is the number of observed time periods and s is a parameter of the algorithm. Initially we start with a high value of s such that no patterns are produced. If the algorithm does not yield at least N frequent patterns, the minimum support is decremented by one and the algorithm is rerun. This continues until the minimum required number of patterns N has been found or the minimum support drops below 5/T. This is a standard technique for determining a minimum support [Kagdi, Maletic, Sharif 2007; Zimmermann, Weisgerber, Diehl, Zeller 2004]. The heuristics guiding the search were determined experimentally from the datasets; they may vary for different units of time. Koupler2 computes the evolutionary couplings for the specified time window.
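The support-lowering search can be sketched as follows, assuming a callable miner; the defaults mirror the 100/T starting point used in Section 7.3.4 and the 5/T floor described above.

def search_min_support(transactions, mine, n_patterns, T, start=100, floor=5):
    """Lower the minimum support until at least n_patterns patterns appear.

    mine(transactions, min_support) is assumed to run the frequent pattern
    miner and return the patterns it finds; T is the number of observed
    time periods.
    """
    patterns = []
    for s in range(start, floor - 1, -1):       # decrement the support count
        patterns = mine(transactions, min_support=s / T)
        if len(patterns) >= n_patterns:
            return patterns, s / T
    return patterns, floor / T                  # fewer than n_patterns even at the floor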

7.3.3 Design of the Evaluation

Evaluating the approach is difficult because of the lack of a gold standard; that is, we do not have a correct list of all the coupled artifacts. To evaluate the approach, we compare the prediction abilities of the patterns produced, using traditional association rules, over different time windows. This evaluation method can be applied automatically to the systems under examination to give a broad comparison.

High accuracy in predicting the impact of a change is crucial to building usable tools; the existence of large numbers of false positives discourages developers from using such tools. Here, we define the quality of patterns based on their prediction abilities. So, patterns with higher accuracy (precision) and coverage (recall) values are of better quality: since changes inside these patterns are more likely to be predictive, they contain fewer false positives.

7.3.4 Evaluation Using Prediction

A traditional means of validating the quality of association rules is to generate them from a part of the history (training set) and then see how well they predict future changes in a later part of the history (test set). In [Zimmermann, Weisgerber, Diehl, Zeller 2004], the authors used co-change information (evolutionary coupling) to predict entities

(classes, methods, fields, etc.) that are likely to be modified. A prediction model correlates with the quality of the association rules generated from frequent patterns.

Here, for a given sequence of transactions for a system, ordered by time, we divide the sequence into a training set (Tr) and a test set (Te). We use 75% of the transactions as the splitting point; the training set is the first 75% and the test set is the remainder. The training set is used to generate the patterns of change using Koupler2. After generating all evolutionary couplings using the different time windows, we generate all possible association rules with 50% confidence.

The patterns are generated as described in Section 7.3.2. The tool chain (ΔShopper + Koupler2) was run on the studied systems. Patterns are generated for each system and each time window. Given the training set, we configured Koupler2 with an initial minimum support of 100/T and then lowered it until the top N patterns were produced. We selected N to be 2,000, as this value allows for the generation of enough patterns to compare predictability using different time windows. We perform the same evaluation as Zimmermann et al. [Zimmermann, Weisgerber, Diehl, Zeller 2004] and Canfora et al. [Canfora, Ceccarelli, Cerulo, Di Penta 2010].

We examine the left hand side (LHS) and right hand side (RHS) of the test set transactions, as done in [Canfora, Ceccarelli, Cerulo, Di Penta 2010; Zimmermann, Weisgerber, Diehl, Zeller 2004]. The case where the RHS is sought in the same transaction is called the same subsequent (k = i+0), where i is the index of a transaction that contains the LHS and k is the index of the transaction in which the RHS is sought. We define a hit as finding a rule in a transaction in the test set. In [Canfora, Ceccarelli, Cerulo, Di Penta 2010; Zimmermann, Weisgerber, Diehl, Zeller 2004] the authors also looked for the subsequent case k = i+5, where the RHS matches within the five transactions following transaction i.
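The hit counting can be sketched as follows; this simplification checks the same transaction for k = i+0 and any of the five following transactions for k = i+5, while the per-offset averaging described later in this section is omitted. Names are illustrative.

def count_hits(rules, test_transactions, k_offset=0):
    """Count LHS hits and LHS-and-RHS hits of rules over a test set.

    rules             : list of (lhs, rhs) pairs of frozensets of files.
    test_transactions : time-ordered list of sets of co-changed files.
    k_offset          : 0 looks for the RHS in the same transaction (k = i+0);
                        5 accepts an RHS match in the five following ones.
    """
    lhs_hits = lhs_rhs_hits = 0
    for i, tx in enumerate(test_transactions):
        window = [tx] if k_offset == 0 else test_transactions[i + 1:i + k_offset + 1]
        for lhs, rhs in rules:
            if lhs <= tx:                                  # antecedent changed
                lhs_hits += 1
                if any(rhs <= later for later in window):  # consequent follows
                    lhs_rhs_hits += 1
    return lhs_hits, lhs_rhs_hits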


Table 31. Patterns Uncovered for the Training Set (first 75% of transactions) over Thirteen Open Sources and Their Minimum Support count, ECs = Evolutionary Couplings, mS = minimum support count.

            commit          hour            day             week
OSS         mS     ECs      mS     ECs      mS     ECs      mS     ECs
QuantLib    5      1,296    5      1,336    5      2,361    6      2,811
Python      5      84       5      87       5      167      5      1,996
OSG         5      468      5      677      5      809      6      2,608
Xapian      5      546      5      2,194    5      2,323    7      3,124
httpd       5      255      5      271      5      397      5      634
OpenMPI     7      3,628    7      2,971    8      3,365    11     2,338
GCC         11     3,225    11     3,254    17     2,348    45     2,011
LLVM        19     2,458    21     2,756    28     2,099    52     2,073
KDELibs     7      3,731    6      3,300    8      2,020    16     2,120
KOffice     10     2,734    10     3,121    11     2,062    16     2,013
Subversion  19     2,098    19     2,205    23     2,256    45     2,150
Ruby        12     2,007    16     2,443    22     2,251    38     2,128
Chrome      24     2,138    26     2,115    50     3,380    57     2,290


Precision = (number of LHS-and-RHS hits) / (number of LHS hits)                                  (6)

Recall = (number of files correctly predicted) / (number of changed files that occur in the training set)    (7)

F-measure = 2 × (Precision × Recall) / (Precision + Recall)                                      (8)
Precision, recall, and F-measure are then computed as in equations (6), (7), and (8).

Precision (Equation 6) is the ratio of the number of LHS-and-RHS hits to the number of LHS hits. Recall (Equation 7) is the number of files predicted over all files that occur within the training set. There may be files in the test set that do not occur in the training set because they were added afterwards; we cannot predict a change to a file that did not appear in the training set. Equation 8 is the F-measure (also F-score), which is an accuracy measure; it can be interpreted as the weighted average or harmonic mean of precision and recall. All these statistical measures range between [0, 1], where 0 is the worst and 1 (100%) is the best. Regarding precision for k = i+5, we compute precision on i+1, i+2, i+3, i+4, and i+5 and then divide by 5 to get the average precision; we use average precision for k = i+5 subsequent changes.

7.4 Empirical Study

Using this approach we generated four sets of evolutionary couplings for each system (Table 29 and Table 30) with the following time windows: 1) a single commit, 2) an hour, 3) a day, and 4) a week of commits. For each system, the different pattern sets


were compared against each other. We now compare the quality of patterns from each set via their future predictive abilities.

7.4.1 Time Windows Comparison

For each system and time window, the top 2,000-plus patterns were collected, or patterns were generated using a minimum support count of 5. Table 31 has the number of patterns generated using this configuration. The variations in Table 31 are due to the different nature of each project's development and maintenance process. For example, we see that the Chrome itemsets for the commit time window exceed our preconfigured minimum number (2,000) with a high minimum support count (24). This implies that Chrome has a very active development environment and that changes touch a large number of files in almost every single commit. LLVM also produces more than 2,000 itemsets with a high minimum support count (19). These observations are also clearly visible in Figure 31, where Chrome and LLVM have the most relative activity.

Additionally, the variation can be attributed to the fact that each system has a different architecture and design, which leads to different patterns of change. This could be interpreted as a sign that a large number of the co-changed files are highly coupled: many changes touch a number of different files, requiring more effort to evolve the system. From small to large time windows, the minimum support count increases monotonically, and the different time window sizes clearly have an effect on the results. There is no perfect, universally applicable time window for all project histories; different software architectures and designs and different work habits can result in dramatically varied results.


Table 32 and Table 33 present the results of running a prediction experiment on evolutionary coupling patterns and the four time windows. We used subsequent k = i+0 and k = i+5, and computed the precisions, recalls, and F-Measures. The time window column is the training (patterns) and test (transactions) set time windows. The highlighted cells are the best among the four time window sizes for each system.

Previous work of this type [Canfora, Ceccarelli, Cerulo, Di Penta 2010; Zimmermann, Weisgerber, Diehl, Zeller 2004] used only the top 10 rules (or top-N), ranked by confidence, while we use all rules with 50% confidence for pruning, a large difference in the number of rules evaluated. We are also using a time period approximately four times longer (3 to 10 years) than in [Canfora, Ceccarelli, Cerulo, Di Penta 2010; Zimmermann, Weisgerber, Diehl, Zeller 2004], which were conducted over a few months.

There are a number of interesting observations that can be seen in Table 32 and Table 33. For the week time window, Chrome, Ruby, Subversion, KOffice, KDELibs, LLVM, and GCC have very high predictability (near 100%). The others, OpenMPI, httpd, Xapian, OSG, Python, and QuantLib, range from 60% to 88%. Week is an appealing window size as it groups work activities that are likely semantically related, because it is one of the commonly used release durations. A developer's work plans often revolve around the workweek, and multiple commits within a week are often logically related (however, this is not universally the case by any means). Weeklong work plans can include fixing a set of bugs, developing a new feature, or some non-trivial refactoring of a critical component.


Table 32. Prediction Accuracies and Completeness Over Six Open Sources. P = Precision, R = Recall, Fm = F-Measure, Tr = Time window of Training Set, Te = Time window of Test Set.

Tr, k = i+0 k = i+5 OSS Te P% R% Fm% P% R% Fm% c 56.62 15.60 24.46 62.63 2.38 4.59 h 45.66 15.83 23.51 23.91 0.29 0.57 QuantLib d 56.32 27.22 36.70 74.23 4.80 9.02 w 67.65 48.94 56.79 66.21 1.35 2.65 c 50.00 3.92 7.27 56.10 0.93 1.82 h 52.17 5.38 9.76 52.78 0.91 1.78 Python d 42.00 10.55 16.87 52.80 0.88 1.73 w 69.77 68.18 68.97 56.55 1.07 2.10 c 40.43 5.65 9.92 10.08 0.08 0.16 h 47.79 8.41 14.30 30.08 0.31 0.62 OSG d 47.06 21.33 29.36 23.28 0.31 0.62 w 73.33 67.35 70.21 64.38 0.74 1.46 c 61.18 15.66 24.94 61.43 1.30 2.54 h 54.93 15.92 24.68 40.71 1.34 2.60 Xapian d 57.29 30.22 39.57 20.21 1.12 2.13 w 60.98 54.35 57.47 27.39 0.51 1.00 c 47.37 9.03 15.17 35.71 0.79 1.54 h 48.74 13.18 20.75 47.34 0.77 1.52 httpd d 57.55 31.28 40.53 45.67 0.73 1.44 w 79.17 76.00 77.55 71.76 1.36 2.66 c 42.20 9.71 15.79 81.86 2.32 4.51 h 45.10 13.14 20.35 84.19 6.06 11.30 OpenMPI d 55.71 27.76 37.05 18.58 0.27 0.53 w 88.24 88.24 88.24 84.54 1.50 2.95


Table 33. Prediction Accuracies and Completeness Over Seven Open Sources. P = Precision, R = Recall, Fm = F-Measure, Tr = Time window of Training Set, Te = Time window of Test Set. This Completes Table 32.

Tr, k = i+0 k = i+5 OSS Te P% R% Fm% P% R% Fm% c 55.93 13.11 21.24 7.91 0.03 0.06 h 59.73 20.26 30.25 11.77 0.04 0.08 GCC d 81.44 75.14 78.16 93.35 0.37 0.74 w 100 100 100 99.62 1.79 3.51 c 60.64 15.41 24.58 17.63 0.07 0.14 h 62.70 22.73 33.36 38.90 0.25 0.49 LLVM d 84.86 82.27 83.54 90.31 0.47 0.94 w 98.08 98.08 98.08 83.22 2.98 5.76 c 66.22 14.33 23.56 22.76 0.04 0.08 h 62.73 18.24 28.26 38.63 0.10 0.20 KDELibs d 76.78 68.13 72.20 67.41 0.23 0.45 w 100 100 100 94.50 0.51 1.01 c 61.51 14.00 22.81 41.88 0.09 0.17 h 55.25 18.19 27.37 36.03 0.10 0.19 KOffice d 80.45 69.72 74.70 70.30 0.27 0.55 w 100 100 100 92.41 0.41 0.82 c 62.66 13.95 22.82 45.28 0.38 0.75 h 59.79 20.81 30.87 56.52 0.47 0.92 Subversion d 72.02 59.29 65.04 91.25 0.64 1.27 w 97.69 97.69 97.69 90.30 1.47 2.89 c 78.59 25.02 37.96 81.98 1.48 2.90 h 28.83 6.38 10.45 23.22 0.35 0.69 Ruby d 89.06 80.94 84.80 87.52 1.38 2.72 w 99.22 99.22 99.22 99.38 4.23 8.11 c 68.96 21.86 33.19 65.12 0.11 0.22 h 75.80 46.39 57.55 55.26 0.12 0.23 Chrome d 90.06 85.56 87.75 74.15 0.63 1.25 w 100 100 100 82.57 8.45 15.34


To address our first question in this study, about which time window best predicts future changes, we summarize part of Table 32 and Table 33 in Figure 32. We sort the time windows from high to low using their F-measures and label each system with that ordering. For example, Chrome has the sorting wdhc, where week is best and commit is lowest in predictability. Over the thirteen systems we plot the distribution of the distinct observed sortings for the two prediction configurations k = i+0 and k = i+5. We observe that wdhc is the most prevalent, with 10 systems having it for k = i+0 and 6 for k = i+5. Week is the most predictive window 23 times, day once, and hour twice. Interestingly, the commit window size is never the top.

Figure 32. Time window sortings distribution over the thirteen systems. Commit, Hour, Day, Week are the different labels.


Examining Table 31, Table 32, and Table 33, we see that systems with high minimum support also have high predictive measures (100% in the case of a week time window). High minimum support is clearly an indication of high quality patterns. Such patterns with high support values are not common, as a long period of archived maintenance is required for them to exist [Robbes, Pollet, Lanza 2008]. Hence, relying on high support is not very practical. For example, from Table 31 we see that Chrome's top patterns with a week time window and a minimum support of 24 need at least 24 weeks of co-changes to exist (one change per week). In practice this would most likely require much more than 24 weeks of maintenance for that many co-changes to occur. We try to overcome this limitation in Section 7.4.3 by looking for combinations of different time window sizes.

While the week window size produces the patterns with the best predictability in most of the systems examined here, predicting what is going to happen next week is not always very useful. In conducting impact analysis, predicting what is going to happen in the next commit, the next hour, or maybe the next day is typically more practical. In the next two sections, we show how to use patterns generated with a week time window size to enhance the prediction of the other time windows.

7.4.2 Time Window Cross Prediction

Now, for each system, we use the evolutionary dependencies generated for one time window size to try to predict future changes for another time window size. This novel approach is called cross prediction and gives insight into which of the time window sizes subsumes, in terms of prediction, another. It is difficult to do all possible cross predictions, so we focus on one case and its opposite. Since the transactional commit is the most popular time window in recent studies, we examine how the other time window sizes (hour, day, and week) perform against the commit size, and then the commit size against the others.

Table 34. Cross Prediction F-Measure. Fm is F-Measure, Tr=hdw is Time window of Training Set, where training is hour, day, or week. Te = c is Time window of Test Set, where commit is the Test set Time window.

Tr, k = i+0 k = i+5 k = i+0 k = i+5 OSS OSS Te=c Fm% Fm% Fm% Fm% h 32.98 0.62 26.90 0.41 QuantLib d 29.54 0.59 17.93 0.60 LLVM w 22.16 0.82 5.63 5.28 h 7.89 1.81 33.14 1.19 Python d 7.71 1.66 27.89 0.25 KDELibs w 7.10 1.00 14.73 0.59 h 21.32 0.24 34.18 0.19 OSG d 20.56 0.40 29.81 0.29 KOffice w 13.97 0.33 17.55 0.54 h 34.09 3.28 23.54 1.18 Xapian d 31.99 2.30 17.48 1.20 Subversion w 26.33 2.36 14.06 2.61 h 17.23 1.50 44.13 3.01 httpd d 19.02 2.05 34.14 3.54 Ruby w 17.24 2.53 29.63 7.75 h 16.28 3.60 34.37 0.32 OpenMPI d 18.13 3.30 18.55 1.00 Chrome w 16.01 1.19 6.61 13.41 h 21.11 0.06 24 26 Tr = hdw GCC d 19.44 0.38 15 13 Te = c w 12.30 3.11


Figure 33. Improvements range (MIN-MAX) of precision, recall, and F-measure values over a cross prediction where Tr = hdw and Te = c.

To do this cross prediction, we use the evolutionary couplings generated from the training set of each system from Table 31. Then we generate the rules with 60% confidence and look for hits on cross time windows. Table 34 presents the cross prediction of the commit time window size as the test set (Te = c), with h, d, and w as the training sets (Tr). We use the evolutionary couplings generated for hour, day, and week time window sizes to predict change in the subsequent k = i+0 and k = i+5. In

Table 34 the bold cells are the improved F-measure values. The bottom right corner of the table summarizes the results.


We compared the F-measure values for commit as both training and test (from Table 32 and Table 33) to the F-measure values for hour, day, or week as training and commit as test. For example, the k = i+0 F-measure value for QuantLib is 24.46 in Table 32; in Table 34, under k = i+0 for QuantLib, we bolded the higher F-measure values, which are 32.98 for hour and 29.54 for day, but not 22.16 for week, and so on. The results show that the hour, day, and week time window sizes predict current and subsequent changes of the commit time window size better than the commit time window size does for itself.


Table 35. Cross Prediction F-Measure values. Fm is F-Measure. Tr = c is Training Set Time window, where commit is the Time window , Te=hdw is Test Set Time window, where test set is hour, day, or week.

Tr=c, k = i+0 k = i+5 k = i+0 k = i+5 OSS OSS Te Fm% Fm% Fm% Fm% h 48.72 1.12 39.06 0.31 QuantLib d 56.73 4.05 92.09 0.52 LLVM w 78.05 2.84 100 0.65 h 9.02 1.76 39.22 0.07 Python d 17.09 2.11 84.96 1.15 KDELibs w 45.71 1.99 100 0.94 h 31.91 0.89 50.15 0.41 OSG d 54.96 0.99 91.71 0.33 KOffice w 88.17 1.37 98.04 0.52 h 32.87 0.65 30.07 0.81 Xapian d 47.33 1.45 72.23 0.97 Subversion w 65.00 1.00 98.46 1.29 h 24.24 1.68 12.23 0.49 httpd d 40.14 2.60 43.29 1.60 Ruby w 78.72 2.38 96.50 1.85 h 19.91 13.02 65.09 0.17 OpenMPI d 35.73 0.48 94.62 0.29 Chrome w 70.71 1.54 100 0.29 h 30.44 0.09 38 25 Tr = c GCC d 82.61 2.97 1 14 Te = hdw w 100 0.44


Figure 34. Improvements range (MIN-MAX) of precision, recall, and F-measure values over a cross prediction where Tr = c and Te = hdw.

The k = i+0 prediction is improved in 67% of the cases, and the k = i+5 prediction in 33%. Figure 33 shows the minimum and maximum improvement ratios from this cross prediction for precision, recall, and F-measure values. For example, GCC has improvement ratios between 0% and 66%; the 66% comes from subtracting the k = i+5 precision of 7.91 (Table 32), with commit as training and test, from the k = i+5 precision of 74.89, with week as training and commit as test.

We now examine the reverse case and use the commit time window size to predict current and future changes for the hour, day, and week time windows. Table 35 presents this data. The commit time window size was able to improve the predictability by 15% for k = i+0 and 31% for k = i+5 relative to the other time windows (week, day, and hour). Figure 34 shows the improvement range of precision, recall, and F-measure values for this cross prediction.

Each system responds differently to cross prediction, and no generalization about the best cross prediction can be made here. However, we are able to say that time window cross prediction does produce better results (Q2). It appears that the base time window (commit) contributes little to the prediction in a cross prediction setting.

7.4.3 Combining Time Windows

The novelty of the idea behind combining the evolutionary coupling sets of different time windows is that an intersection will uncover consistent patterns that are frequent over multiple time windows; such an approach is able to generate patterns with higher accuracy. A union will bring in the interesting patterns that a single time window did not detect by itself; however, this greatly reduces accuracy. To overcome this problem we filter out patterns that generate association rules with low confidence.

To do this, we assess all possible combinations. For each system we have four pattern sets from the four time windows. We form all possible combinations involving the commit window (7): commit × hour, commit × day, …, commit × hour × day × week. Each has two cases, union and intersection, which results in 14 combinations. This is done for the thirteen open source systems. We present the precision, recall, and F-measure values in detail for a single system and, due to space limitations, summarize the remainder. For the patterns in Table 31, we experiment with all the possible combinations for each open source system over pattern sets generated from different time windows.


Table 36. Prediction Precisions, Recalls, and F-measures for the OSG Open Source System. Precision (P), Recall (R), F-Measure (Fm), Training Set Time Window (Tr), Test Set Time Window, where commit is the Time Window (Te = c).

OpenSceneGraph (OSG)
                                  k = i+0                     k = i+5
               Tr, Te = c         P%      R%      Fm%         P%      R%      Fm%
               c                  40.43   5.65    9.92        10.08   0.08    0.16
Unions         c h                44.58   14.29   21.64       19.31   0.12    0.24
               c d                34.00   15.08   20.89       29.82   0.22    0.44
               c w                22.67   11.61   15.35       25.48   0.24    0.48
               c h d              33.05   15.58   21.17       24.30   0.18    0.36
               c h w              27.92   15.18   19.67       22.88   0.17    0.34
               c d w              28.26   14.19   18.89       25.96   0.21    0.42
               c h d w            29.90   14.98   19.96       23.92   0.19    0.37
Intersections  c h                45.74   5.85    10.38       15.11   0.11    0.22
               c d                37.50   5.95    10.27       17.70   0.13    0.27
               c w                29.12   5.26    8.91        16.73   0.14    0.28
               c h d              33.53   5.75    9.82        21.94   0.17    0.33
               c h w              32.92   5.26    9.07        19.62   0.16    0.31
               c d w              30.91   5.06    8.70        14.68   0.12    0.24
               c h d w            32.08   5.06    8.74        19.00   0.16    0.31


Table 37. Prediction Improvements on Precisions, Recalls, and F-measures for the OSG Open Source System. Precision (P), Recall (R), F-Measure (Fm), Training Set Time Window (Tr), Test Set Time Window (Te), where commit is the test time window (Te = c)

OpenSceneGraph (OSG) — Improvements From Combinations
Tr (Te = c)     | k = i+0: P%, R%, Fm%   | k = i+5: P%, R%, Fm%
Unions:
c h             | 4.16, 8.63, 11.72      | 9.23, 0.04, 0.08
c d             | None, 9.42, 10.97      | 19.75, 0.14, 0.28
c w             | None, 5.95, 5.43       | 15.41, 0.16, 0.32
c h d           | None, 9.92, 11.25      | 14.22, 0.10, 0.20
c h w           | None, 9.52, 9.74       | 12.80, 0.09, 0.18
c d w           | None, 8.53, 8.97       | 15.88, 0.13, 0.26
c h d w         | None, 9.33, 10.04      | 13.84, 0.11, 0.22
Intersections:
c h             | 5.31, 0.20, 0.46       | 5.04, 0.03, 0.07
c d             | None, 0.30, 0.35       | 7.62, 0.05, 0.11
c w             | None, None, None       | 6.66, 0.06, 0.13
c h d           | None, 0.10, None       | 11.86, 0.09, 0.17
c h w           | None, None, None       | 9.55, 0.08, 0.15
c d w           | None, None, None       | 4.60, 0.04, 0.08
c h d w         | None, None, None       | 8.93, 0.08, 0.15

Clearly, with a union the generated set will be at least as large as any of its inputs; with an intersection it will be at most as large. When a pattern appears in sets from different time windows, its support value may differ, and we take the lowest. Taking the least support assures that such a frequent pattern exists in the combined set of patterns. For example, given the pattern sets for some system, where Xr, Xd, and Xw are the pattern sets generated for the commit, day, and week time windows, and the pairs are patterns and their support counts:

Pi = Set( filei1, filei2, filei3, … ), where Pi is a pattern

Xr = Set( (P9, 23), (P10, 44), (P11, 10) )
Xd = Set( (P1, 25), (P2, 12), (P3, 9) )
Xw = Set( (P10, 30), (P11, 7), (P12, 10) )

A union and an intersection are:

Xr ∪ Xd ∪ Xw = Set( (P1, 25), (P2, 12), (P3, 9), (P9, 23), (P10, 30), (P11, 7), (P12, 10) )
Xr ∩ Xd ∩ Xw = Set( (P10, 30), (P11, 7) )
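
The combination step itself can be sketched as follows. This is a minimal illustration (not the ΔShopper/srcMiner implementation), assuming each pattern set is stored as a dictionary mapping a pattern identifier to its support count; the keys here (P1, P9, ...) are just the labels from the example above, while in practice they would be frozensets of co-changed file paths.

```python
def combine(pattern_sets, mode='union'):
    """Combine pattern sets from different time windows.

    pattern_sets: list of dicts mapping a pattern to its support count.
    When a pattern occurs in more than one set, the lowest support is kept.
    """
    if mode == 'union':
        keys = set().union(*pattern_sets)                         # in any window
    else:
        keys = set.intersection(*(set(s) for s in pattern_sets))  # in all windows
    return {p: min(s[p] for s in pattern_sets if p in s) for p in keys}

# Toy data mirroring the example above.
Xr = {'P9': 23, 'P10': 44, 'P11': 10}
Xd = {'P1': 25, 'P2': 12, 'P3': 9}
Xw = {'P10': 30, 'P11': 7, 'P12': 10}

print(combine([Xr, Xd, Xw], 'union'))      # P10 -> 30 and P11 -> 7, plus the rest
print(combine([Xr, Xw], 'intersection'))   # {'P10': 30, 'P11': 7}
```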

Since commit is the traditional time window (or 200 seconds on CVS), we experiment with combinations of the four pattern sets (Tr = rhdw) while the test set is the commit time window (Te = c). We believe that such combinations will enhance change impact predictions. Table 36 and Table 37 show the results for one open source system only; we randomly picked OSG to show the detailed picture as a case study.

We use the evolutionary couplings generated from the commit, hour, day, and week time windows to predict changes in the subsequent commits. Under the combinations column, we have all the variations for unions and for intersections. For k = i+0 and k = i+5 in the non-shaded area, the precision, recall, and F-measure values are computed. For k = i+0 and k = i+5 in the shaded area, each cell shows the enhancement, that is, the difference between the combination's precision, recall, or F-measure value and the corresponding value from the c row (bold font). For example, the intersection of c and h achieves 45.74% precision; the corresponding value under the c row is 40.43%, so the enhancement is 5.31%. Note that cells in the shaded area with the value None mean the value is no better than the c prediction values; some are even worse. Note the bold cells in the shaded area: for k = i+0 the maximum enhancement is 11.72%, and for k = i+5 it is 19.75%.


Table 38. Highest (MAX) and Lowest (MIN) Improved percentages for different time window pattern sets conjectures over the thirteen open sources, where Te = c

OSS        | k = i+0 MIN% | k = i+0 MAX% | k = i+5 MIN% | k = i+5 MAX%
QuantLib   | 1.09         | 12.60        | 0.00         | 0.00
Python     | 0.17         | 4.55         | 0.01         | 14.08
OSG        | 0.10         | 11.72        | 0.03         | 19.75
Xapian     | 0.90         | 11.75        | 0.07         | 31.15
httpd      | 1.50         | 4.50         | 0.03         | 3.31
OpenMPI    | 0.03         | 2.45         | 2.04         | 5.99
GCC        | 0.79         | 0.79         | 0.002        | 59.32
LLVM       | 0.71         | 1.72         | 0.02         | 50.00
KDELibs    | 0.31         | 8.36         | 0.05         | 56.77
KOffice    | 0.98         | 10.05        | 0.01         | 14.68
Subversion | 0.40         | 1.70         | 0.02         | 40.23
Ruby       | 0.26         | 5.40         | 0.04         | 14.32
Chrome     | 0.33         | 17.10        | 0.002        | 19.11


Figure 35. Improvements range (MIN-MAX) of precision, recall, and F-measure values over a cross prediction where Tr = c and Te = hdw

Table 38 has the minimum and maximum percentage improvements that we obtained from the time window pattern combinations over our thirteen open source systems. From the highlighted cells, we can see that for k = i+0 the minimum improvement ratio is for OpenMPI with 0.03% and the maximum is for Chrome with 17.10%. For k = i+5, the minimum improvement ratio is for QuantLib with 0% and the maximum is for GCC with 59.32%. Such a scale of improvement is impressive, especially for the subsequent k = i+5 case, where it predicts change impact further into future changes.


7.5 Threats To Validity

Generating evolutionary couplings relies on the choice of the parameters involved. For example, files were our main artifact, which limits our study to that granularity. A study on different artifact granularities (file, class, function) would better support the generalization of any results.

The minimum support parameter has a large effect on detecting patterns of change, though we used a consistent configuration over the thirteen open source systems. Still, each system has different characteristics (architecture, design, size, growth pace, etc.), which makes it difficult to compare the results against each other.

7.6 Discussion

We presented an empirical experiment studying how different time windows impact the quality and predictability of evolutionary dependencies. We conclude by answering our initial questions:

Q1) Which time window size (i.e., single commit, hour, day, week of commits) gives the best results for predicting future changes?

We found that larger time windows have better prediction accuracies and completeness, with week-long windows giving the best results and individual commits, the worst. This is an important result for the MSR community since window size is one of the main parameters used to detect evolutionary couplings.

Note that we have not examined windows larger than a week, nor different kinds of units. Logical work units are of interest (e.g., releases or sub-releases). We believe broader studies evaluating time windows of this greater granularity would be of interest.


Q2) Can one time window size be predictive for another?

Based on our examination of cross time window prediction over different time windows, our answer is "yes". For example, in Table 32, QuantLib commit patterns were predicted with 56.62% precision on k = i + 0, while using a week-long time window the precision rose to 78.05% in Table 34. Over the thirteen systems we observed that the hour, day, and week time window sizes predict current and subsequent changes of the commit time window better than the commit time window predicts itself (k = i+0 improved by 67% and k = i+5 improved by 33%). For the reverse case, where the commit time window size predicts current and future changes for the hour, day, and week time windows, we observed improvements of 15% on k = i+0 and 31% on k = i+5. One reason behind the improvement is that each time window carries patterns undetected by the other time windows.

Q3) Can the combined use of different time window sizes improve the results?

Again the answer is "yes". After cross prediction, we studied pattern combinations and how such an approach affects the prediction measures. In Table 38, for k = i+0, the improvement ranged from 0.03% to 17.10%. For k = i+5, the improvement was higher, reaching 59.32% at its best.

One main observation across the different experiments is that future subsequent changes (k = i+5) are affected more by cross prediction and pattern combinations than the immediate changes (k = i+0). Such cases show up when predicting against the commit time window (commit is the testing set) using larger time windows.


In general, the study shows that larger time windows can more accurately predict evolutionary couplings. However, this will, to an extent, depend on the commit policies of developers. Also, due to different development practices and environments, the accuracy of our approach may vary. In the future, we will investigate further enhancements of our approach by studying additional commit parameters and looking for synergistic approaches to enhance the analysis methods.

CHAPTER 8

PREDICTION PARAMETERS ON THE DETECTION OF EVOLUTIONARY COUPLINGS

This chapter presents a study on the set of parameters that we use, and that are used in the literature, to control the detection of evolutionary couplings. We use a subset of the parameters that are known to have an effect on the outcomes of any frequent pattern mining algorithm; specifically, we study minimum support, training data length and ratio, and confidence. These variables are used to build a regression model to help us understand their effect on, and importance to, the generated patterns and association rules. We apply our procedure to eleven free and open source systems to study this phenomenon.

8.1 Introduction

The work presented here investigates a set of arbitrary parameters that have a minor to major effect on the quality of the detected patterns. Researchers and data miners in software engineering, and specifically in the mining software repositories (MSR) community, select arbitrary thresholds that help filter out evolutionary couplings [Zimmermann, Weisgerber, Diehl, Zeller 2004]. Here, we select a subset of these parameters, vary their values over short to long ranges, and then use the collected data to build a multiple regression model around these independent, manipulated variables. For the set of selected parameters (minimum support, data range, training to testing data ratio, and confidence), such a model will help us answer these two questions:

A) To what extent can the total variation of the outcomes be explained by a prediction model using the independent variables of minimum support, data range, training to testing data ratio, and confidence?

B) Which coefficient is the dominating factor in the quality of the generated rules?

We use eleven free and open source systems (FOSSs) to collect the data and build a model for each system. Such a study helps us select the right parameters for detecting patterns of change and avoid arbitrary picks. Such awareness helps reduce the number of false positives and rank correct/important patterns higher than less meaningful or false-positive patterns. This has the potential to improve the usefulness of the patterns for developers.

The remaining sections of this chapter are organized in the following manner. Section 8.2 presents the data collection approach. A discussion of evolutionary coupling, its parameters, and how we mine the change patterns using mining software repositories techniques is presented in Section 8.3. After that, we present the regression analysis methodology and the experiment setup in Section 8.4. The experiment and its results are presented in Section 8.5 and, finally, conclusions are presented in Section 8.6.


8.2 Data Collection

The implementation of the approach is achieved with two tools: ΔShopper (pronounced "delta shopper") and srcMiner (reads "source miner"). These tools target data available from the Subversion version control system.

Table 39. Characteristics of the eleven open source systems used in study including years, commits, files.

System     | Years      | Commits | Files
KDELibs    | 01-10 (10) | 54,189  | 14,748
KOffice    | 01-10 (10) | 55,651  | 21,857
Httpd      | 99-11 (13) | 11,264  | 763
Subversion | 00-11 (12) | 23,420  | 1,485
Ruby       | 00-11 (12) | 12,439  | 834
Chrome     | 08-11 (4)  | 35,650  | 16,358
OpenMPI    | 03-11 (9)  | 11,682  | 6,583
LLVM       | 01-10 (10) | 50,327  | 4,266
GCC        | 01-10 (10) | 50,145  | 26,154
Xapian     | 00-10 (11) | 4,703   | 1,302
Python     | 01-10 (10) | 13,401  | 867

The Shopper tool is responsible for extracting metadata and differences from a

Subversion repository. The program takes a unit of time () and extracts information about the modification of artifacts over the course of the history. The program identifies modifications to files and functions within each time  and generates the datasets of buckets of co-changing files and functions, our  here is a Subversion commit. To collect the files we use the archived logs for co-changing files.


Table 39 presents the open source projects we use in this work, the range of years, the number of commits, and the number of files in each system.

8.3 Patterns and Association Rules Generation

We use srcMiner to process our datasets and generate frequent patterns. We developed srcMiner based on the Eclat frequent itemset data mining algorithm [Zaki, Parthasarathy, Li 1997].
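
For illustration, the core Eclat idea (vertical tid-lists intersected depth-first) can be sketched in a few lines of Python. This is a simplified toy, not the srcMiner implementation; the file names and the minimum support value below are made up for the example.

```python
from collections import defaultdict

def eclat(transactions, min_support):
    """Toy Eclat-style miner: transactions are sets of file names; returns
    {frozenset(items): support} for every itemset meeting min_support."""
    tidlists = defaultdict(set)            # item -> ids of commits containing it
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)

    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            new_tids = (prefix_tids & tids) if prefix else tids
            if len(new_tids) >= min_support:
                itemset = prefix | {item}
                frequent[frozenset(itemset)] = len(new_tids)
                extend(itemset, new_tids, candidates[i + 1:])

    seeds = sorted(((it, tl) for it, tl in tidlists.items()
                    if len(tl) >= min_support), key=lambda x: len(x[1]))
    extend(set(), set(), seeds)
    return frequent

# toy commit history: each commit is a basket of co-changed files
commits = [{'a.cpp', 'a.h'}, {'a.cpp', 'a.h', 'b.cpp'}, {'a.cpp', 'b.cpp'}]
print(eclat(commits, min_support=2))
```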

We use the data in Table 39 to generate the patterns of change, and from each pattern we generate all possible non-redundant association rules. As in Section 2.2, we use a set of required parameters to generate the evolutionary couplings and then the association rules: time window, minimum support, artifact granularity, training set size, and an interestingness measure filter. We set an arbitrary value for some and iterate the values of the others, and build a regression model on top of this variation. The fixed ones are: the time window size is Subversion's revision (commit) and the artifact granularity is a file. The rest vary within a selected range.

The most common approach used to validate the predictability of rules is to split the data by some ratio, where the first portion becomes the training set and the rest is the test data. Then, we use precision, recall, and F-measure to assess the accuracy of the hits and the coverage. We add this ratio to the set of varied measures (independent variables). The varied parameters are support count, training data ratio, training set size (measured in years), and confidence (that is, minimum support, confidence, duration, and training size).


Trials = (Max − Min) / Unit + 1    (9)

Table 40 and Table 41 contain each parameter with its minimum value (start point) to the maximum value (end point), the unit size, and the number of trials for each parameter for each FOSS, as in Equation 9.

For example, the support count for KDELibs has Min = 11 and Max = 24, where the unit is an increase of 1 in the support count on every iteration. So the number of trials is (24 − 11) / 1 + 1 = 14. For KDELibs, the support count has 14 trials, the training data ratio has 5 trials, years has 8 trials, and confidence has 5 trials. Permuting over all possible combinations [Biggs 1979], we have 14 × 5 × 8 × 5 = 2800 different combinations. Table 42 has the final number of combinations for each FOSS, where the number of trials over KDELibs is 2799. As can be seen, the numbers in Table 42 are not exactly as just calculated using Equation 9. That is due to the cases where zero rules were generated for some parameter combinations.
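
As a small sketch (assuming the ranges from Table 40 and Table 41; the helper names are ours, not part of the tooling), the number of trials per parameter and the full parameter grid can be computed as follows.

```python
from itertools import product

def trials(lo, hi, unit):
    # Number of swept values for one parameter, as in Equation (9).
    return int(round((hi - lo) / unit)) + 1

def sweep(lo, hi, unit):
    return [round(lo + i * unit, 10) for i in range(trials(lo, hi, unit))]

# KDELibs ranges taken from Table 40 and Table 41.
support    = sweep(11, 24, 1)         # 14 trials
ratio      = sweep(0.75, 0.95, 0.05)  # 5 trials
years      = sweep(1, 8, 1)           # 8 trials
confidence = sweep(0.5, 0.9, 0.1)     # 5 trials

configs = list(product(support, ratio, years, confidence))
print(len(configs))                   # 14 * 5 * 8 * 5 = 2800 candidate configurations
```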


Table 40. Prediction Parameters (Support Count and Training Data Ratio) Min and Max Values, Unit Size, and Trials

FOSSs      | Support Count: Min, Max, Unit, Trials | Training Data Ratio: Min, Max, Unit, Trials
KDELibs    | 11, 24, 1, 14 | 0.75, 0.95, 0.05, 5
KOffice    | 14, 24, 1, 11 | 0.8, 0.95, 0.05, 5
Httpd      | 10, 24, 1, 15 | 0.8, 0.95, 0.05, 5
Subversion | 12, 24, 1, 13 | 0.8, 0.95, 0.05, 5
Ruby       | 9, 24, 1, 16  | 0.8, 0.95, 0.05, 5
Chrome     | 14, 24, 1, 11 | 0.8, 0.95, 0.05, 5
OpenMPI    | 17, 24, 1, 8  | 0.8, 0.95, 0.05, 5
LLVM       | 16, 24, 1, 9  | 0.8, 0.95, 0.05, 5
GCC        | 12, 24, 1, 13 | 0.8, 0.95, 0.05, 5
Xapian     | 5, 24, 1, 20  | 0.8, 0.95, 0.05, 5
Python     | 14, 24, 1, 11 | 0.8, 0.95, 0.05, 4


Table 41. Prediction Parameters (Years and Confidence) Min and Max Values, Unit Size, and Trials

FOSSs      | Years: Min, Max, Unit, Trials | Confidence: Min, Max, Unit, Trials
KDELibs    | 1, 8, 1, 8 | 0.5, 0.9, 0.1, 5
KOffice    | 1, 8, 1, 8 | 0.5, 0.9, 0.1, 5
Httpd      | 1, 8, 1, 8 | 0.8, 0.9, 0.1, 2
Subversion | 1, 8, 1, 8 | 0.5, 0.9, 0.1, 5
Ruby       | 1, 8, 1, 8 | 0.5, 0.9, 0.1, 5
Chrome     | 1, 3, 1, 3 | 0.5, 0.9, 0.1, 5
OpenMPI    | 1, 8, 1, 8 | 0.9, 0.9, 0.1, 1
LLVM       | 1, 8, 1, 8 | 0.5, 0.9, 0.1, 5
GCC        | 1, 8, 1, 8 | 0.5, 0.9, 0.1, 5
Xapian     | 1, 8, 1, 8 | 0.5, 0.9, 0.1, 5
Python     | 1, 8, 1, 8 | 0.5, 0.9, 0.1, 5

Table 42. Prediction Parameters Total Trials.

FOSSs      | Total Trials
KDELibs    | 2799
KOffice    | 2199
Httpd      | 869
Subversion | 2599
Ruby       | 2824
Chrome     | 824
OpenMPI    | 318
LLVM       | 1799
GCC        | 2599
Xapian     | 3401
Python     | 2199


8.4 Multiple Regression Model Setup

Regression analysis is a statistical process to estimate the relationship between one or more independent variables and a dependent variable. It produces an equation that predicts a dependent variable using one or more independent variables. This equation has the form of:

Y = Intercept + b1·X1 + b2·X2 + … + bn·Xn    (10)

where Y is the dependent variable, the predicted variable, and X1, X2, etc. are the independent variables. The model looks for b1, b2, etc., the coefficients that describe the size of the effect the independent variables have on the dependent variable Y. The intercept is the constant value predicted for Y when all the independent variables are zero. A prediction equation is useful if the independent variables have some kind of correlation with the dependent variable. The coefficients that are produced tell us the strength of the relationship between the independent variables and the dependent variable.

When running a regression analysis, it looks for coefficients for the independent variables that help us reject the claim that their effect is no different from zero (the null hypothesis), that is, that the independent variables have an actual effect on the dependent variable. The null hypothesis is always that each independent variable has absolutely no effect (has a coefficient of zero), and we are looking for a reason to reject this claim.

The p-value is the probability of seeing a result in a collection of random data in which the variable had no effect. A probability of 5% or less is the generally accepted point at which to reject the null hypothesis. With a p-value of 5% (or .05) there is only a 5% chance that the observed result would have come up in a random distribution, so one can say with 95% probability of being correct that the variable has some effect. The size of the p-value for a coefficient says nothing about the size of the effect that variable has on the dependent variable.

Regarding the coefficients, in a multiple linear regression the size of the coefficient for each independent variable gives the size of the effect that variable has on the dependent variable, and the sign of the coefficient (positive or negative) gives the direction of the effect. The coefficient tells us how much the dependent variable is expected to increase when that independent variable increases by one, holding all the other independent variables constant.

The R-squared of the regression is the fraction of the variation in the dependent variable that is accounted for (or predicted by) the set of independent variables. The p-value tells us how confident we can be that each individual variable has some correlation with the dependent variable, which is the important thing. Significance F is a p-value for the regression as a whole; the independent variables may be individually correlated but insignificant while the regression as a whole is significant [Armstrong 2011].
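
As an illustration of this setup (a minimal sketch under assumed data, not the exact analysis tool used in this chapter), such a multiple regression can be fit with an ordinary least squares package; the synthetic values below stand in for the per-configuration outcomes.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic stand-in data: one row per parameter configuration, columns are the
# independent variables (support count, training data ratio, years, confidence),
# and y is the observed precision for that configuration.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = 0.6 * X[:, 3] + 0.25 * X[:, 1] + 0.05 * rng.random(200)

X = sm.add_constant(X)              # adds the intercept column
model = sm.OLS(y, X).fit()

print(model.rsquared)               # R-squared
print(model.rsquared_adj)           # adjusted R-squared
print(model.f_pvalue)               # Significance F for the regression as a whole
print(model.params)                 # intercept and coefficients b1..b4
print(model.pvalues)                # per-coefficient p-values
```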

8.5 Experiment and Results

From Table 40 and Table 41, we use all possible combinations of the set of parameters and run srcMiner for each. For KDELibs, the first combination is (Support Count = 11, Training Data Ratio = 0.75, Years = 1, Confidence = 0.5), which is the start point in the range of each parameter. The next one is (Support Count = 11, Training Data Ratio = 0.75, Years = 1, Confidence = 0.6), and so on. For each combination, our tool generates the evolutionary couplings and then, from each coupling, all possible irredundant rules that satisfy the parameters.

From Table 39, using the commits as our data mining baskets, or transactions, with a minimum support, we generate the patterns. Then we generate all possible association rules by constructing the combinations obtained by halving each pattern into two subsets. Let A and B be disjoint subsets of a pattern I such that A ∪ B = I. We generate only the association rules A → B where |A| ≥ |B|, without duplications. Then we use the confidence measure (Equation 2) to further filter the rules, so we keep only the rules where confidence(A → B) ≥ the minimum confidence.
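
A minimal sketch of this rule generation (ours, for illustration; the confidence is computed directly from the transactions, and the min_conf threshold and toy file names are assumptions) is:

```python
from itertools import combinations

def rules_from_pattern(pattern, pattern_support, transactions, min_conf):
    """Split a frequent pattern I into disjoint antecedent A and consequent B
    (A -> B with |A| >= |B|) and keep rules with sufficient confidence,
    where confidence(A -> B) = support(A U B) / support(A)."""
    items = sorted(pattern)
    rules = []
    for size_a in range((len(items) + 1) // 2, len(items)):
        for A in map(frozenset, combinations(items, size_a)):
            B = frozenset(items) - A
            support_a = sum(1 for t in transactions if A <= t)
            confidence = pattern_support / support_a if support_a else 0.0
            if confidence >= min_conf:
                rules.append((A, B, confidence))
    return rules

# toy commit history and one frequent pattern with support 2
commits = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'b', 'c'}, {'a', 'c'}]
for A, B, conf in rules_from_pattern({'a', 'b', 'c'}, 2, commits, min_conf=0.5):
    print(set(A), '->', set(B), round(conf, 2))
```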

Precision, recall, and F-measure are then computed as in Equations (6), (7), and (8). Precision (Equation 6) is the ratio of the number of A and B hits over the number of A hits. Recall (Equation 7) is the number of files predicted over all files that occur within the training set. There may be files in the test set that do not occur in the training set because they were added afterwards; we cannot predict a change to a file that did not appear in the training set. Equation 8 is the F-measure (also F-score), which is an accuracy measure. The F-measure can be interpreted as the weighted average, or harmonic mean, of precision and recall. All these statistical measures range between [0, 1], where 0 is the worst and 1 (100%) is the best. Regarding precision for k = i+5, we compute the precision on i+1, i+2, i+3, i+4, and i+5, and then divide by 5 to get the average precision. We use average precision for the k = i+5 subsequent changes (commits).
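
The following is a minimal sketch of these measures using the standard set-based definitions of precision, recall, and the harmonic-mean F-measure, together with the averaging of precision over the next five commits; the toy file names are ours.

```python
def precision_recall_f(predicted, actual):
    """predicted and actual are sets of files; returns (precision, recall, F)."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

def average_precision_k5(predicted, next_commits):
    """Average the precision over the next five commits (k = i+1 .. i+5)."""
    ps = [precision_recall_f(predicted, commit)[0] for commit in next_commits[:5]]
    return sum(ps) / 5

predicted = {'a.cpp', 'a.h'}
future = [{'a.cpp'}, {'a.h', 'b.cpp'}, {'c.cpp'}, {'a.cpp', 'a.h'}, {'b.cpp'}]
print(average_precision_k5(predicted, future))   # 0.4 for this toy example
```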


For the generated rules we compute the hits and the misses on k = i + 5. We then compute precision, recall, and F-measure as in equations (6), (7), and (8), respectively.

Precision, recall, and F-measure are our dependent variables. As in Section 8.4, for each system we run a regression analysis [Freedman 2009] to find the best linear equation that fits the outcomes, making it possible to predict a dependent variable using the set of independent variables.

Table 43. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for KOffice

KOffice             | Precision      | Recall         | F-Measure
R Square            | 1.00           | 0.84           | 0.87
Adjusted R Square   | 1.00           | 0.84           | 0.86
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | 0.01, 6E-270   | 0.0, 2E-99     | 0.00, 8E-109
Training Data Ratio | 0.26, 1E-144   | 0.04, 3E-117   | 0.07, 5E-128
Years               | -0.02, 0       | 0.01, 0        | 0.01, 0
Confidence          | 0.63, 0        | -0.03, 2E-87   | -0.05, 4E-94
Intercept           | 0.00, #N/A     | 0.00, #N/A     | 0.00, #N/A

Table 43 shows the result of running the regression analysis tool on the 2799 outcomes for the dependent variables of precision, recall, and F-measure. As in Section 8.4, we interpret the table as follows. For precision, we have a 100% R-Square value, which means these four independent variables are able to explain 100% of the variation in the dependent variable precision, 84% of recall, and 85% of F-measure. The Significance F is 0, which is the p-value for the whole model; it is well below the predetermined significance levels of 0.05 and 0.01, which correspond to 95% and 99% confidence in the results, respectively. So we can say the model is statistically significant.

The independent variable coefficients follow as well; looking at the p-values of each, all yield statistically significant results. We can say that the null hypothesis, that there is no relationship between the independent variables and the dependent variable, is rejected. The coefficient values are of great interest; notice that under precision the coefficient value of Confidence is 0.63, the highest among the variables. Such a value means that each unit of change in that parameter has an effect of 0.63 on the outcome, the precision. From this observation on precision, and only for KOffice and for the collected data with the set of arbitrary ranges and the selected values for the non-variant parameters, confidence is the most important factor affecting the quality of the generated rules. Next is training data ratio with 0.26, then Support Count with 0.01, and Years with an impact of -0.02. The Years coefficient has a negative impact of -0.02 for every unit increase in years, meaning that by decreasing the number of training years we would be able to increase precision. Notice that we highlight the highest coefficient value under each dependent variable to isolate its importance. The intercept is 0.

Our model for precision for KOffice, as in Equation (10), would be:

Precision = 0 + 0.01·SupportCount + 0.26·TrainingDataRatio − 0.02·Years + 0.63·Confidence    (11)


Table 44. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for KDELibs

KDELibs             | Precision      | Recall         | F-Measure
R Square            | 1.00           | 0.84           | 0.85
Adjusted R Square   | 1.00           | 0.83           | 0.85
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | 0.01, 4E-186   | 0.0, 3E-233    | 0.00, 7E-243
Training Data Ratio | 0.65, 0        | 0.03, 2E-260   | 0.05, 2E-272
Years               | 0.003, 0.8     | 0.00, 0        | 0.01, 0
Confidence          | 0.33, 0        | -0.02, 5E-132  | -0.03, 5E-138
Intercept           | 0.00, #N/A     | 0.00, #N/A     | 0.00, #N/A

Table 45. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for Httpd

Httpd               | Precision      | Recall         | F-Measure
R Square            | 0.99           | 0.91           | 0.91
Adjusted R Square   | 0.99           | 0.91           | 0.91
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | 0.01, 2E-52    | 0.0, 3E-164    | 0.00, 9E-165
Training Data Ratio | -0.06, 0.06    | 0.05, 4E-118   | 0.10, 2E-118
Years               | 0.01, 1E-10    | 0.00, 3E-81    | 0.00, 3E-81
Confidence          | 0.90, 4E-131   | -0.02, 1E-28   | -0.04, 3E-28
Intercept           | 0.00, #N/A     | 0.00, #N/A     | 0.00, #N/A


Table 46. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for Subversion

Subversion          | Precision      | Recall         | F-Measure
R Square            | 0.61           | 0.91           | 0.93
Adjusted R Square   | 0.60           | 0.91           | 0.93
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | 0.00, 0.006    | 0.0, 0         | -0.01, 0
Training Data Ratio | 0.27, 7E-70    | -0.01, 0.0007  | -0.01, 0.015
Years               | -0.01, 3E-99   | 0.02, 0        | 0.03, 0
Confidence          | 0.41, 0        | -0.08, 0       | -0.13, 0
Intercept           | 0.40, 2E-149   | 0.14, 5E-241   | 0.23, 2E-288

Table 47. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for Ruby

Ruby                | Precision      | Recall         | F-Measure
R Square            | 0.55           | 0.76           | 0.79
Adjusted R Square   | 0.55           | 0.76           | 0.79
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | 0.02, 1E-201   | -0.01, 1E-305  | -0.01, 0
Training Data Ratio | -0.22, 8E-09   | -0.22, 8E-62   | -0.33, 1E-89
Years               | -0.06, 0       | 0.04, 0        | 0.05, 0
Confidence          | 0.53, 7E-145   | -0.19, 3E-164  | -0.23, 7E-165
Intercept           | 0.50, 2E-38    | 0.37, 5E-157   | 0.50, 6E-198


Table 48. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for Chrome

Chrome              | Precision      | Recall         | F-Measure
R Square            | 0.68           | 0.73           | 0.74
Adjusted R Square   | 0.67           | 0.73           | 0.74
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | 0.00, 7E-06    | 0.0, 7E-07     | -0.01, 3E-05
Training Data Ratio | -0.58, 8E-101  | 0.2, 0.02      | -0.13, 0.08
Years               | 0.01, 7E-08    | 0.36, 4E-236   | 0.31, 2E-241
Confidence          | 0.38, 5E-149   | -0.2, 0.0004   | 0.01, 0.84
Intercept           | 0.99, 1E-199   | 0.01, 0.87     | 0.24, 0.002

Table 49. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for OpenMPI

OpenMPI             | Precision      | Recall         | F-Measure
R Square            | 1.00           | 0.80           | 0.81
Adjusted R Square   | 0.99           | 0.79           | 0.81
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | 0.00, 0.06     | 0.0, 7E-06     | -0.01, 9E-05
Training Data Ratio | 0.23, 1E-07    | 0.04, 0.42     | 0.07, 0.39
Years               | -0.01, 2E-24   | 0.04, 2E-79    | 0.07, 2E-83
Confidence          | 0.95, 3E-52    | 0.02, 0.76     | -0.018, 0.85
Intercept           | 0.00, #N/A     | 0.00, #N/A     | 0.00, #N/A


Table 50. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for Python

Python              | Precision      | Recall         | F-Measure
R Square            | 0.39           | 0.51           | 0.48
Adjusted R Square   | 0.39           | 0.51           | 0.48
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | -0.10, 3E-99   | 0.00, 3E-148   | 0.00, 4E-138
Training Data Ratio | -1.56, 1E-13   | -0.01, 2E-23   | -0.01, 9E-32
Years               | 0.13, 2E-78    | 0.00, 2E-172   | 0.00, 1E-139
Confidence          | -2.24, 1E-92   | -0.01, 8E-99   | -0.01, 2E-82
Intercept           | 4.00, 3E-72    | 0.01, 2E-130   | 0.03, 4E-135

Table 51. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for LLVM

LLVM                | Precision      | Recall         | F-Measure
R Square            | 0.93           | 0.87           | 0.91
Adjusted R Square   | 0.93           | 0.87           | 0.91
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | 0.01, 2E-15    | 0.0, 8E-40     | 0.00, 1E-30
Training Data Ratio | 0.13, 1E-4     | 0.20, 1E-78    | 0.21, 7E-79
Years               | -0.06, 4E-192  | 0.0, 0         | 0.03, 0
Confidence          | 0.75, 2E-151   | -0.1, 7E-60    | -0.09, 8E-29
Intercept           | 0.00, #N/A     | 0.00, #N/A     | 0.00, #N/A


Table 52. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for GCC

GCC                 | Precision      | Recall         | F-Measure
R Square            | 0.81           | 0.72           | 0.76
Adjusted R Square   | 0.81           | 0.72           | 0.76
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | 0.01, 6E-177   | 0.0, 6E-98     | 0.00, 1E-105
Training Data Ratio | -0.68, 2E-238  | 0.0, 0.47      | 0.00, 0.99
Years               | -0.02, 6E-160  | 0.0, 0         | 0.02, 0
Confidence          | 0.82, 0        | -0.03, 3E-54   | -0.05, 3E-54
Intercept           | 0.54, 7E-167   | 0.03, 3E-13    | 0.05, 9E-15

Table 53. Regression analysis results (R Square, Adjusted R Square, and Significance F) and the magnitude and p-value of the coefficients (Support Count, Training Data Ratio, Years, Confidence, and Intercept) for the dependent variables Precision, Recall, and F-measure, for Xapian

Xapian              | Precision      | Recall         | F-Measure
R Square            | 0.47           | 0.69           | 0.73
Adjusted R Square   | 0.47           | 0.69           | 0.73
Significance F      | 0.00           | 0.00           | 0.00
Coefficients        | Value, P-value | Value, P-value | Value, P-value
Support Count       | 0.01, 5E-264   | 0.0, 0         | -0.01, 0
Training Data Ratio | 0.45, 7E-58    | -0.11, 5E-145  | -0.19, 9E-175
Years               | -0.03, 6E-148  | 0.01, 0        | 0.01, 0
Confidence          | 0.50, 4E-236   | -0.06, 8E-161  | -0.10, 1E-182
Intercept           | 0.08, 0.003    | 0.17, 0        | 0.29, 0


Figure 36. Precision coverage over all systems.

Table 44 through Table 53 give the regression analysis results for the rest of the systems, and the same interpretations follow for each model. Table 44 gives the model values for KDELibs; we notice that under precision, for the coefficient Years, we have a p-value of 0.8. Because this is far higher than the predetermined significance level of 0.05 (95% confidence in the results), we cannot reject the null hypothesis, and that coefficient value is insignificant. The whole model p-value (Significance F) is 0, which means the model is significant, but we cannot trust and use the Years coefficient in our model. If the p-value falls between 0.01 and 0.05, we say that we have only 95% confidence in our coefficient.


In Table 44 through Table 53, the highlighted numbers mark the highest factor in each model. To give an overall understanding and allow a comparison of the dominating factors over all models for precision, recall, and F-measure, Figure 36 presents this dominance visually using colors. Each circle represents a system, and each circle has four colors, each representing the coefficient of one of the independent variables. We can see that for precision, over all systems, confidence is the dominant factor, followed by training data ratio, then years, and, lastly, support count. Figure 37 and Figure 38 show recall and F-measure. Confidence, with less dominance, is the main factor and coefficient for recall and F-measure as well.

Figure 37. Recall coverage over all systems.


Figure 38. F-Measure coverage over all systems.

8.6 Summary

In this study we empirically examined the effect of the mining software repositories initial parameters of minimum support, data range, training to testing data ratio, and confidence. We built regression models for eleven systems, for each of the dependent variables of precision, recall, and F-measure. These measures are commonly used to assess the accuracy and coverage of detecting patterns of change from co-changing artifacts in software maintenance, via mining historical repositories. We presented an approach that uses data mining techniques to mine the repositories of eleven large-scale open source systems, and then builds multiple regression models around that data to study the effect of the initial parameters on the final outcome, the generated association rules. Now we can answer our questions from the introduction, Section 8.1.

A) To what extent can the total variation of the outcomes be explained by a prediction model using the independent variables of minimum support, data range, training to testing data ratio, and confidence?

Figure 39. Extent of the representation from the total variation of model analysis outcomes over all systems


Figure 40. Distribution of the extent of the representation for the models over precision, recall, and F-measure for all systems

Figure 39 and Figure 40 clearly answer the question. Figure 39 shows the R-square values over all systems for the dependent variables of precision, recall, and F-measure. Figure 40 is a more zoomed-out picture that shows the final distribution of how much of the total variation in the model analysis outcomes is represented. We can see that more than 90% of the models (27% + 64% = 91%) have a representation of 50% or above, while 64% of the models hold a representation of 75% or above.

B) Which coefficient is the dominating factor in the quality of the generated rules?

From Figure 36, Figure 37, and Figure 38 we can visually see the domination of confidence as the main factor among the coefficients. After confidence, we have training data ratio, then years, and, lastly, support count.


Our future work would be to repeat the study on wider ranges of parameters to include artifact granularity (sub-system, model, class, file, function, and line of code) and the time window, which is the transaction size. Also, we plan to widen the ranges for each parameter and add more open source systems for better representation.

CHAPTER 9

CONCLUSIONS AND FUTURE WORK

This dissertation focused on investigating the problem of uncovering evolutionary couplings in large software systems. New methods, using data mining approaches, to uncover evolutionary couplings were presented that reduce the number of false positives. The novelty of this work is the synergistic use of statically derived information in combination with data mining techniques. Moreover, an empirical approach was used to better understand the foundations of applying data mining techniques to software repositories in the context of uncovering evolutionary dependencies. While the field of mining software repositories (MSR) is relatively mature at this point (10+ years old), many basic assumptions have not been deeply studied. The empirical work done here has broad impacts on the field of MSR beyond the problem of uncovering evolutionary couplings. The contributions and results are summarized below.

9.1 Synergic Approaches

Change-based evolutionary coupling (Chapter 3) is a novel approach that gives more weight and importance to evolutionary couplings with consistent types of changes (i.e., line, function, hunk, and churn metrics) among co-changing files. Under three evaluation methods (prediction analysis, interestingness measures, and manual validation), our change-based evolutionary coupling approach produces fewer false positives and higher quality patterns than the traditional approach to computing evolutionary coupling. We observed high precision values, some above 90%, while recall values are generally low. The low recall values are of interest. The premise here is that it is better to uncover correct patterns and miss some, rather than uncover all patterns but include many false positives. Therefore, we chose to give more weight to the accuracy (precision) of the uncovered patterns rather than to the completeness (recall).

In Chapter 4, we present KouplerVis2, an interactive tool that uses a SeeSoft display metaphor. It uses the improved change-based evolutionary coupling techniques from Chapter 3 to detect high quality frequent patterns. The results are displayed to highlight intensity levels as an indication of change activity.

A large-scale investigation (Chapter 5) of the distribution of, and correlation between, code, change, and collaboration metrics was undertaken. We can summarize our results in two points. First, code metrics follow a double Pareto distribution, while both change and collaboration metrics follow a Pareto distribution. Earlier research [Herraiz Tabernero, German, Hassan 2011] reported similar results, but not on this variety of metrics and at such a scale. Second, metrics of the same category are highly correlated, while there is only a weak correlation across categories. For example, LOC and CC from the code metrics are very strongly correlated (90%). This is an important result and implies that we should use a variety of metric categories rather than focusing on multiple metrics of the same category. This is contrary to the dominant approach in the literature on fault prediction and effort estimation.

By collecting historical and structural information on pattern age and the distance between pattern items, we were able to experiment with a filtration approach to reduce false positives.


We validated the effect of rankings based on these measures using interestingness measures taken from the data mining literature. We found that distance is not a good measure to rank or filter out false evolutionary couplings. However, using age we observed that newly emerging patterns have higher accuracy and coverage in prediction propagation. The details of this are presented in Chapter 6.

9.2 The Analysis of Data Mining Parameters

In Chapter 7, we undertook the first systematic study of the effect of varying time window size on the detection of evolutionary coupling. We empirically studied how different time windows impact the quality and predictability of evolutionary dependencies. It was found that larger time windows have better prediction accuracy and completeness, with week-long windows giving the best results and individual commits being the worst. This is contrary to what is used in most studies. We further validated the idea of cross time window prediction over different time windows, and over the thirteen systems we observed that the hour, day, and week time window sizes predict current and subsequent changes of the commit time window better than the commit time window predicts itself, with F-measure improvements of 33% to 67%. Finally, we combined (via union and intersection) the patterns produced by different time window sizes and obtained higher F-measures, with improvements from 17.10% to 59.32%.

In Chapter 8, we empirically examined the effect of the mining software repositories initial parameters of minimum support, data range, training to testing data ratio, and confidence on the final generated patterns. Using multiple regression model analysis on eleven systems, we used the initial parameters as our independent variables and the quality measures of precision, recall, and F-measure as the dependent variables. Our main results, on the eleven open source systems, are that 91% of the regression models were able to represent 50% or more of the outcome data (dependent variables), while 64% of the models have a representation of 75% or more. It was found that the confidence parameter (independent variable) is the most dominant and effective parameter among the coefficients with respect to the final outcome results (precision, recall, and F-measure). After confidence, we rank training data ratio, then years, and, lastly, support count.

9.3 Future Work

Call-graph and conceptual dependencies were combined with evolutionary couplings to reduce false positives in the work of [Kagdi, Collard, Maletic 2007a] [Kagdi, Gethers, Poshyvanyk, Collard 2010]. We used metrics (Chapter 3), structural distance (Chapter 6), and historical meta-data (Chapter 6). Other lightweight approaches could be used that we feel would improve pattern detection. To mention a few: like age in Chapter 6, we could use authorship. We could also use cloning genealogies [Kim, Sazawal, Notkin, Murphy 2005] to trace dependencies and use them to enhance the pattern detection process. Such approaches could enhance the detection but also uncover different types of patterns. A dependency that is frequent and also part of a call graph is different from a dependency that is frequent and conceptually related. Each would serve a different problem inside the code.

One of the most important parameters in MSR is artifact granularity (line of code, function, file, class, model, or subsystem). No studies have yet compared which granularity yields the highest accuracy and coverage. Another open problem is detecting patterns with mixed granularities, where a pattern is a mix of classes, functions, and lines of code.

We have compared the quality of patterns generated using different time window sizes (a single Subversion commit, an hour, a day, and a week of commits), as in Chapter 7, using predictability validation. We need to repeat the study and validate our results using different approaches. Our results support wider time windows, whereas most work in the MSR community supports the use of smaller time windows (a Subversion revision or a 200-second sliding window [Zimmermann, Weisgerber, Diehl, Zeller 2004; Zimmermann, Weissgerber 2004]). Also, our results contradict other studies such as [Vanya, Premraj, Vliet 2011].

In Chapter 8, we used regression analysis to study the effect of the initial parameters on the quality of the generated association rules used in frequent pattern mining. Our future work would be to repeat the study on wider ranges for each parameter and include the missing ones, namely artifact granularity (sub-system, model, class, file, function, and line of code) and time window size. Also, we plan to widen the range of the collected data and add more open source systems. Such a study would help the MSR community pick its parameters more carefully.

Rather than using a time window size to slice and cluster history into co-changing groups of items, we need to investigate the logical unit of change. We need to find approaches that infer which code changes map together as addressing the same modification task. Given a history of commits, the technique needs to group together changes that appear to be related to the same modification task. Commits can be composed of changes in support of multiple modification tasks. A single modification task (e.g., a modification request) is typically implemented over a period of time and over multiple commits. The commits leading to a single modification task are typically not contiguous, and there are often commits dealing with unrelated modification tasks interlaced among them. The interest is to tease apart the related changes from the unrelated, and to produce a logical unit of change based on the inferred modification task. Why is this important? It is a known problem that using a single commit as a unit of change is flawed in the context of a single modification task. Current techniques use bug tracking ids or analysis of commit messages to approximate a logical unit. These techniques have not proven to be very accurate.

REFERENCES

[Abbas 2010] Abbas, N., (2010), "Using Factor Analysis To Generate Clusters Of Agile

Practices (A Guide For Agile Process Improvement)", In Proceedings Of Agile

Conference, Pp. 11-20.

[Agrawal, Imieliński, Swami 1993] Agrawal, R., Imieliński, T., And Swami, A., (1993),

"Mining Association Rules Between Sets Of Items In Large Databases", In

Proceedings Of The 1993 Acm Sigmod International Conference On Management

Of Data. Washington, D.C., United States: Acm.

[Agrawal, Srikant 1994] Agrawal, R. And Srikant, R., (1994), "Fast Algorithms For

Mining Association Rules In Large Databases", In Proceedings Of The 20th

International Conference On Very Large Data Bases: Morgan Kaufmann

Publishers Inc.


[Alali, Bartman, Newman, Maletic 2013] Alali, A., Bartman, B., Newman, C. D., And

Maletic, J. I., (2013), "A Preliminary Investigation Of Using Age And Distance

Measures In The Detection Of Evolutionary Couplings", In Proceedings Of

Mining Software Repositories (Msr), 2013 10th Ieee Working Conference On, Pp.

169-172.

[Alali, Bartman, Newman, Maletic 2015] Alali, A., Bartman, B., Newman, C. D., And

Maletic, J. I., (2015), "Prediction Parameters On The Detection Of Evolutionary

Couplings", In Proceedings Of International Conference On Software

Engineering.

[Alali, Kagdi, Maletic 2008] Alali, A., Kagdi, H., And Maletic, J. I., (2008), "What's A

Typical Commit? A Characterization Of Open Source Software Repositories", In

Proceedings Of 6th International Conference On Program Comprehension,

Amsterdam, The Netherlands, June 10-13, Pp. 182-191.

[Alali, Maletic 2014] Alali, A. And Maletic, J. I., (2014), "Distribution And Correlation

Of Code, Change And Collaboration Metrics", The Journal Of Empirical

Software Engineering, Vol. To Be Submitted.


[Alali, Maletic 2015] Alali, A. And Maletic, J. I., (2015), "Change Patterns Interactive

Tool And Visualizer", In Proceedings Of Working Conference On Software

Visualization.

[Alali, Sutton, Maletic 2014a] Alali, A., Sutton, A., And Maletic, J. I., (2014a), "Using

Change Measures To Improve The Detection Of Evolutionary Couplings", The

Journal Of Software Maintenance And Evolution, Vol. To Be Submitted.

[Alali, Sutton, Maletic 2014b] Alali, A., Sutton, A., And Maletic, J. I., (2014b), "Which

Time Window Size Is Best For Evolutionary Couplings?", The Journal Of

Software Maintenance And Evolution, Vol. To Be Submitted.

[Antoniol, Canfora, Casazza, De Lucia 2000] Antoniol, G., Canfora, G., Casazza, G.,

And De Lucia, A., (2000), "Identifying The Starting Impact Set Of A

Maintenance Request: A Case Study", In Proceedings Of The Conference On

Software Maintenance And Reengineering: Ieee Computer Society, Pp. 227.

[Arafat, Riehle 2009] Arafat, O. And Riehle, D., (2009), "The Commit Size Distribution

Of Open Source Software", In Proceedings, Pp. 1-8.


[Arisholm, Briand 2006] Arisholm, E. And Briand, L. C., (2006), "Predicting Fault-

Prone Components In A Java Legacy System", In Proceedings Of The 2006

Acm/Ieee International Symposium On Empirical Software Engineering. Rio De

Janeiro, Brazil: Acm, Pp. 8-17.

[Arisholm, Briand, Fuglerud 2007] Arisholm, E., Briand, L. C., And Fuglerud, M.,

(2007), "Data Mining Techniques For Building Fault-Proneness Models In

Telecom Java Software", In Proceedings Of The The 18th Ieee International

Symposium On Software Reliability: Ieee Computer Society, Pp. 215-224.

[Arisholm, Briand, Johannessen 2010] Arisholm, E., Briand, L. C., And Johannessen, E.

B., (2010), "A Systematic And Comprehensive Investigation Of Methods To

Build And Evaluate Fault Prediction Models", Journal Of Systems And Software,

Vol. 83, No. 1, Pp. 2-17.

[Armstrong 2011] Armstrong, J. S., (2011), "Illusions In Regression Analysis".

[Arnold, Bohner 1996] Arnold, R. And Bohner, S.,(1996),Software Change Impact

Analysis, Los Alamitos, Ca, Ieee Computer Society.


[Bacchelli, D’ambros, Lanza 2010] Bacchelli, A., D’ambros, M., And Lanza, M., (2010),

"Are Popular Classes More Defect Prone?", In Fundamental Approaches To Software

Engineering, D. Rosenblum And G. Taentzer, Eds., Springer Berlin / Heidelberg, Pp. 59-73.

[Ball, Porter, Siy 1997] Ball, T., Porter, J.-M. K. A. A., And Siy, H. P., (1997), "If Your

Version Control System Could Talk ...", In Proceedings Of Workshop On Process

Modeling And Empirical Studies Of Software Engineering, Boston, Ma.

[Basili, Briand, Melo 1996] Basili, V. R., Briand, L. C., And Melo, W. L., (1996), "A

Validation Of Object-Oriented Design Metrics As Quality Indicators", Ieee Trans.

Softw. Eng., Vol. 22, No. 10, Pp. 751-761.

[Bell, Ostrand, Weyuker 2006] Bell, R. M., Ostrand, T. J., And Weyuker, E. J., (2006),

"Looking For Bugs In All The Right Places", In Proceedings Of The 2006

International Symposium On Software Testing And Analysis. Portland, Maine,

Usa: Acm, Pp. 61-72.


[Bernstein, Ekanayake, Pinzger 2007] Bernstein, A., Ekanayake, J., And Pinzger, M.,

(2007), "Improving Defect Prediction Using Temporal Features And Non Linear

Models", In Ninth International Workshop On Principles Of Software Evolution:

In Conjunction With The 6th Esec/Fse Joint Meeting. Dubrovnik, Croatia: Acm,

Pp. 11-18.

[Best 1975] Best, D., (1975), "89: The Upper Tail Probabilities Of Spearman's Rho",

Appl. Stat., Vol. 24, Pp. 377-379.

[Beyer, Hassan 2006] Beyer, D. And Hassan, A. E., (2006), "Animated Visualization Of

Software History Using Evolution Storyboards", In Proceedings Of The 13th

Working Conference On Reverse Engineering: Ieee Computer Society.

[Biggs 1979] Biggs, N. L., (1979), "The Roots Of Combinatorics", Historia

Mathematica, Vol. 6, No. 2, Pp. 109-136.

[Blair 1979] Blair, D. C., (1979), "Information Retrieval, 2nd Ed. C.J. Van Rijsbergen.

London: Butterworths; 1979: 208 Pp. Price: $32.50", Journal Of The American

Society For Information Science, Vol. 30, No. 6, Pp. 374-375.


[Bohner 1996] Bohner, S. A., (1996), "Impact Analysis In The Software Change

Process: A Year 2000 Perspective", In Proceedings Of The 1996 International

Conference On Software Maintenance: Ieee Computer Society, Pp. 42-51.

[Bretscher 1997] Bretscher, O.,(1997),Linear Algebra With Applications, Prentice Hall.

[Briand, Daly, K. Wüst 1999] Briand, L. C., Daly, J. W., And K. Wüst, J., (1999), "A

Unified Framework For Coupling Measurement In Object-Oriented Systems",

Ieee Trans. Softw. Eng., Vol. 25, No. 1, Pp. 91-121.

[Briand, Labiche, Soccar 2002] Briand, L. C., Labiche, Y., And Soccar, G., (2002),

"Automating Impact Analysis And Regression Test Selection Based On Uml

Designs", In Proceedings Of The International Conference On Software

Maintenance (Icsm'02): Ieee Computer Society, Pp. 252.

[Briand, Wuest, Lounis 1999] Briand, L. C., Wuest, J., And Lounis, H., (1999), "Using

Coupling Measurement For Impact Analysis In Object-Oriented Systems", In

Proceedings Of The Ieee International Conference On Software Maintenance:

Ieee Computer Society, Pp. 475.


[Buckland, Gey 1994] Buckland, M. And Gey, F., (1994), "The Relationship Between

Recall And Precision", Journal Of The American Society For Information

Science, Vol. 45, No. 1, Pp. 12-19.

[Canfora, Ceccarelli, Cerulo, Di Penta 2010] Canfora, G., Ceccarelli, M., Cerulo, L.,

And Di Penta, M., (2010), "Using Multivariate Time Series And Association

Rules To Detect Logical Change Coupling: An Empirical Study", In Proceedings

Of The 2010 Ieee International Conference On Software Maintenance: Ieee

Computer Society, Pp. 1-10.

[Chen, Rajlich 2000] Chen, K. And Rajlich, V., (2000), "Case Study Of Feature

Location Using Dependence Graph", In Proceedings Of The 8th International

Workshop On Program Comprehension: Ieee Computer Society, Pp. 241.

[Chen, Rajlich 2001] Chen, K. And Rajlich, V., (2001), "Ripples: Tool For Change In

Legacy Software", In Proceedings Of The Ieee International Conference On

Software Maintenance (Icsm'01): Ieee Computer Society, Pp. 230.

[Chidamber, Kemerer 1994] Chidamber, S. R. And Kemerer, C. F., (1994), "A Metrics

Suite For Object Oriented Design", Ieee Transactions On Software Engineering,

Vol. 20, No. 6, Pp. 476-493.


[Collard, Decker, Maletic 2011] Collard, M. L., Decker, M. J., And Maletic, J. I., (2011),

"Lightweight Transformation And Fact Extraction With The Srcml Toolkit", In

International Working Conference On Source Code Analysis And Manipulation.

[Collard, Maletic, Robinson 2010] Collard, M. L., Maletic, J. I., And Robinson, B. P.,

(2010), "A Lightweight Transformational Approach To Support Large Scale

Adaptive Changes", In Proceedings Of The 2010 Ieee International Conference

On Software Maintenance: Ieee Computer Society, Pp. 1-10.

[Collins-Sussman, Fitzpatrick, Pilato 2004] Collins-Sussman, B., Fitzpatrick, B. W., And

Pilato, C. M.,(2004),Version Control With Subversion, O'reilly Media.

[Concas, Marchesi, Pinna, Serra 2007] Concas, G., Marchesi, M., Pinna, S., And Serra,

N., (2007), "Power-Laws In A Large Object-Oriented Software System", Ieee

Trans. Softw. Eng., Vol. 33, No. 10, Pp. 687-708.

[Cordy 2003] Cordy, J. R., (2003), "Comprehending Reality - Practical Barriers To

Industrial Adoption Of Software Maintenance Automation", In Proceedings Of

The 11th Ieee International Workshop On Program Comprehension: Ieee

Computer Society, Pp. 196.


[D'ambros, Lanza 2006] D'ambros, M. And Lanza, M., (2006), "Reverse Engineering

With Logical Coupling", In Proceedings Of The 13th Working Conference On

Reverse Engineering: Ieee Computer Society.

[D'ambros, Lanza, Lungu 2009] D'ambros, M., Lanza, M., And Lungu, M., (2009),

"Visualizing Co-Change Information With The Evolution Radar", Ieee Trans.

Softw. Eng., Vol. 35, No. 5, Pp. 720-735.

[D'ambros, Lanza, Robbes 2010a] D'ambros, M., Lanza, M., And Robbes, R., (2010a),

"An Extensive Comparison Of Bug Prediction Approaches", In Proceedings Of

Msr, Cape Town, South Africa, Pp. 31-41.

[D'ambros, Lanza, Robbes 2010b] D'ambros, M., Lanza, M., And Robbes, R., (2010b),

"An Extensive Comparison Of Bug Prediction Approaches", In Proceedings Of

Mining Software Repositories (Msr), 2010 7th Ieee Working Conference On, Pp.

31-41.

[Damaševičius 2009] Damaševičius, R., (2009), "Analysis Of Academic Results For

Informatics Course Improvement Using Association Rule Mining", In

Information Systems Development, Springer Us, Pp. 357-363.


[De Lucia, Pompella, Stefanucci 2002] De Lucia, A., Pompella, E., And Stefanucci, S.,

(2002), "Effort Estimation For Corrective Software Maintenance", In Proceedings

Of The 14th International Conference On Software Engineering And Knowledge

Engineering. Ischia, Italy: Acm, Pp. 409-416.

[De Lucia, Pompella, Stefanucci 2005] De Lucia, A., Pompella, E., And Stefanucci, S.,

(2005), "Assessing Effort Estimation Models For Corrective Maintenance

Through Empirical Studies", Information And Software Technology, Vol. 47, No.

1, Pp. 3-15.

[Eick, Steffen, Eric E. Sumner 1992] Eick, S. G., Steffen, J. L., And Eric E. Sumner, J.,

(1992), "Seesoft-A Tool For Visualizing Line Oriented Software Statistics", Ieee

Trans. Softw. Eng., Vol. 18, No. 11, Pp. 957-968.

[El Emam, Melo, Machado 2001] El Emam, K., Melo, W., And Machado, J. C., (2001),

"The Prediction Of Faulty Classes Using Object-Oriented Design Metrics",

Journal Of Systems And Software, Vol. 56, No. 1, Pp. 63-75.


[Estublier Et Al. 2005] Estublier, J., Leblang, D., Van Der Hoek, A., Conradi, R.,

Clemm, G., Tichy, W., And Wiborg-Weber, D., (2005), "Impact Of Software

Engineering Research On The Practice Of Software Configuration Management",

Acm Transactions On Software Engineering And Methodology, Vol. 14, No. 4,

Pp. 383-430.

[Fenton, Pfleeger 1996] Fenton, N. E. And Pfleeger, S. L.,(1996),Software Metrics: A

Rigorous And Practical Approach, International Thomson Computer Press.

[Fogel, O’neill 2002] Fogel, K. And O’neill, M., (2002), "Cvs2cl.Pl: A Script For

Converting Cvs Log Messages To Changelog Files",

Http://Www.Redbean.Com/Cvs2cl/.

[Freedman 2009] Freedman, D.,(2009),Statistical Models: Theory And Practice,

Cambridge University Press.

[Gall, Hajek, Jazayeri 1998] Gall, H., Hajek, K., And Jazayeri, M., (1998), "Detection

Of Logical Coupling Based On Product Release History", In Proceedings Of The

International Conference On Software Maintenance: Ieee Computer Society.

[Gall, Jazayeri, Krajewski 2003] Gall, H., Jazayeri, M., and Krajewski, J., (2003), "CVS Release History Data for Detecting Logical Couplings", in Proceedings of the 6th International Workshop on Principles of Software Evolution: IEEE Computer Society.

[Gallagher, Lyle 1991] Gallagher, K. B. and Lyle, J. R., (1991), "Using Program Slicing in Software Maintenance", IEEE Trans. Softw. Eng., vol. 17, no. 8, pp. 751-761.

[Geiger, Fluri, Gall, Pinzger 2006] Geiger, R., Fluri, B., Gall, H., and Pinzger, M., (2006), "Relation of Code Clones and Change Couplings", in Proceedings of the 9th International Conference on Fundamental Approaches to Software Engineering.

[Görg, Weißgerber 2005] Görg, C. and Weißgerber, P., (2005), "Error Detection by Refactoring Reconstruction", SIGSOFT Softw. Eng. Notes, vol. 30, no. 4, pp. 1-5.

[Graves, Karr, Marron, Siy 2000] Graves, T. L., Karr, A., Marron, J. S., and Siy, H., (2000), "Predicting Fault Incidence Using Software Change History", IEEE Transactions on Software Engineering, vol. 26, no. 7, pp. 653-661.

[Gyimóthy, Ferenc, Siket 2005] Gyimóthy, T., Ferenc, R., and Siket, I., (2005), "Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction", IEEE Transactions on Software Engineering, vol. 31, pp. 897-910.

[Hahsler, Hornik, Reutterer 2006] Hahsler, M., Hornik, K., and Reutterer, T., (2006), "Implications of Probabilistic Data Modeling for Mining Association Rules", in From Data and Information Analysis to Knowledge Engineering, Springer Berlin Heidelberg, pp. 598-605.

[Halstead 1977] Halstead, M. H., (1977), Elements of Software Science, Elsevier.

[Hassan 2008] Hassan, A. E., (2008), "The Road Ahead for Mining Software Repositories", in Proceedings of Frontiers of Software Maintenance (FoSM 2008), pp. 48-57.

[Hassan 2009] Hassan, A. E., (2009), "Predicting Faults Using the Complexity of Code Changes", in Proceedings of the 31st International Conference on Software Engineering: IEEE Computer Society, pp. 78-88.

[Hassan, Holt 2004a] Hassan, A. E. and Holt, R. C., (2004a), "Predicting Change Propagation in Software Systems", in Proceedings of the 20th IEEE International Conference on Software Maintenance: IEEE Computer Society, pp. 284-293.

[Hassan, Holt 2004b] Hassan, A. E. and Holt, R. C., (2004b), "Predicting Change Propagation in Software Systems", in Proceedings of Software Maintenance, 2004. Proceedings. 20th IEEE International Conference on, pp. 284-293.

[Hassan, Holt 2005] Hassan, A. E. and Holt, R. C., (2005), "The Top Ten List: Dynamic Fault Prediction", in Proceedings of the 21st IEEE International Conference on Software Maintenance: IEEE Computer Society, pp. 263-272.

[Hauke, Tomasz 2011] Hauke, J. and Tomasz, K., (2011), "Comparison of Values of Pearson's and Spearman's Correlation Coefficients on the Same Sets of Data", Quaestiones Geographicae, vol. 30, no. 2, p. 7.

[Hayes, Patel, Zhao 2004] Hayes, J. H., Patel, S. C., and Zhao, L., (2004), "A Metrics-Based Software Maintenance Effort Model", in Proceedings of the Eighth Euromicro Working Conference on Software Maintenance and Reengineering (CSMR'04): IEEE Computer Society, p. 254.

[Herraiz, German, Hassan 2011] Herraiz, I., German, D. M., and Hassan, A. E., (2011), "On the Distribution of Source Code File Sizes", in International Conference on Software and Data Technologies. Seville, Spain.

[Herraiz, Hassan 2010] Herraiz, I. and Hassan, A. E., (2010), "Beyond Lines of Code: Do We Need More Complexity Metrics?", in Making Software: What Really Works, and Why We Believe It, A. Oram and G. Wilson, Eds., Sebastopol, CA: O'Reilly Media, Inc., pp. 125-141.

[Herraiz Tabernero, German, Hassan 2011] Herraiz Tabernero, I., German, D. M., and Hassan, A. E., (2011), "On the Distribution of Source Code File Sizes".

[Hill, Pollock, Vijay-Shanker 2007] Hill, E., Pollock, L., and Vijay-Shanker, K., (2007), "Exploring the Neighborhood with Dora to Expedite Software Maintenance", in Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering. Atlanta, Georgia, USA: ACM, pp. 14-23.

[Hollander, Wolfe, Chicken 2013] Hollander, M., Wolfe, D. A., and Chicken, E., (2013), Nonparametric Statistical Methods, John Wiley & Sons.

[Huang, Liu 2005] Huang, S.-K. and Liu, K.-M., (2005), "Mining Version Histories to Verify the Learning Process of Legitimate Peripheral Participants", SIGSOFT Softw. Eng. Notes, vol. 30, no. 4, pp. 1-5.

[Hudepohl et al. 1996] Hudepohl, J. P., Aud, S. J., Khoshgoftaar, T. M., Allen, E. B., and Mayrand, J., (1996), "Emerald: Software Metrics and Models on the Desktop", IEEE Softw., vol. 13, no. 5, pp. 56-60.

[Jane Huffman, Zhao 2005] Jane Huffman, H. and Zhao, L., (2005), "Maintainability Prediction: A Regression Analysis of Measures of Evolving Systems", in Proceedings, pp. 601-604.

[Johnson, Wichern 1998] Johnson, R. A. and Wichern, D. W., (1998), Applied Multivariate Statistical Analysis, 4th ed., Prentice Hall.

[Kagdi, Collard, Maletic 2007a] Kagdi, H., Collard, M. L., and Maletic, J. I., (2007a), "Comparing Approaches to Mining Source Code for Call-Usage Patterns", in Proceedings of the Fourth International Workshop on Mining Software Repositories: IEEE Computer Society, p. 20.

[Kagdi, Collard, Maletic 2007b] Kagdi, H., Collard, M. L., and Maletic, J. I., (2007b), "A Survey and Taxonomy of Approaches for Mining Software Repositories in the Context of Software Evolution", J. Softw. Maint. Evol., vol. 19, no. 2, pp. 77-131.

[Kagdi, Gethers, Poshyvanyk, Collard 2010] Kagdi, H., Gethers, M., Poshyvanyk, D., and Collard, M. L., (2010), "Blending Conceptual and Evolutionary Couplings to Support Change Impact Analysis in Source Code", in Proceedings of the Working Conference on Reverse Engineering, pp. 119-128.

[Kagdi, Maletic 2007] Kagdi, H. and Maletic, J. I., (2007), "Combining Single-Version and Evolutionary Dependencies for Software-Change Prediction", in Proceedings of the Fourth International Workshop on Mining Software Repositories: IEEE Computer Society, p. 17.

[Kagdi, Maletic, Sharif 2007] Kagdi, H., Maletic, J. I., and Sharif, B., (2007), "Mining Software Repositories for Traceability Links", in Proceedings of the 15th IEEE International Conference on Program Comprehension: IEEE Computer Society, pp. 145-154.

[Kagdi, Yusuf, Maletic 2006] Kagdi, H., Yusuf, S., and Maletic, J. I., (2006), "Mining Sequences of Changed-Files from Version Histories", in Proceedings of the 2006 International Workshop on Mining Software Repositories. Shanghai, China: ACM, pp. 47-53.

[Kamei et al. 2010] Kamei, Y., Matsumoto, S., Monden, A., Matsumoto, K.-I., Adams, B., and Hassan, A. E., (2010), "Revisiting Common Bug Prediction Findings Using Effort-Aware Models", in Proceedings of the 26th IEEE International Conference on Software Maintenance: IEEE Computer Society, pp. 1-10.

[Kim et al. 2011] Kim, D., Wang, X., Kim, S., Zeller, A., Cheung, S. C., and Park, S., (2011), "Which Crashes Should I Fix First?: Predicting Top Crashes at an Early Stage to Prioritize Debugging Efforts", IEEE Transactions on Software Engineering, vol. 37, pp. 430-447.

[Kim, Notkin, Grossman 2007] Kim, M., Notkin, D., and Grossman, D., (2007), "Automatic Inference of Structural Changes for Matching Across Program Versions", in Proceedings of the 29th International Conference on Software Engineering: IEEE Computer Society, pp. 333-343.

[Kim, Sazawal, Notkin, Murphy 2005] Kim, M., Sazawal, V., Notkin, D., and Murphy, G., (2005), "An Empirical Study of Code Clone Genealogies", in Proceedings of ACM SIGSOFT Software Engineering Notes, pp. 187-196.

[Kim, Whitehead, Bevan 2005] Kim, M., Whitehead, E. J., and Bevan, J., (2005), "Analysis of Signature Change Patterns", in Proceedings of the 2005 International Workshop on Mining Software Repositories. St. Louis, Missouri: ACM, pp. 1-5.

[Kim, Pan, Whitehead 2005] Kim, S., Pan, K., and Whitehead, E. J., Jr., (2005), "When Functions Change Their Names: Automatic Detection of Origin Relationships", in Proceedings, pp. 143-152.

[Kim, Zimmermann, Pan, Whitehead 2006] Kim, S., Zimmermann, T., Pan, K., and Whitehead, E. J. J., (2006), "Automatic Identification of Bug-Introducing Changes", in Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering: IEEE Computer Society.

[Kim, Zimmermann, Whitehead, Zeller 2007] Kim, S., Zimmermann, T., Whitehead, E. J., Jr., and Zeller, A., (2007), "Predicting Faults from Cached History", in Proceedings of the 29th International Conference on Software Engineering: IEEE Computer Society, pp. 489-498.

[Knab, Pinzger, Bernstein 2006] Knab, P., Pinzger, M., and Bernstein, A., (2006), "Predicting Defect Densities in Source Code Files with Decision Tree Learners", in Proceedings of the 2006 International Workshop on Mining Software Repositories. Shanghai, China: ACM, pp. 119-125.

[LaToza, Venolia, DeLine 2006] LaToza, T. D., Venolia, G., and DeLine, R., (2006), "Maintaining Mental Models: A Study of Developer Work Habits", in Proceedings of the 28th International Conference on Software Engineering, pp. 492-501.

[Lave, Wenger 1991] Lave, J. and Wenger, E., (1991), Situated Learning: Legitimate Peripheral Participation, Cambridge University Press.

[Law, Rothermel 2003] Law, J. and Rothermel, G., (2003), "Whole Program Path-Based Dynamic Impact Analysis", in Proceedings of the 25th International Conference on Software Engineering. Portland, Oregon: IEEE Computer Society, pp. 308-318.

[Lehman 2005] Lehman, A., (2005), JMP for Basic Univariate and Multivariate Statistics: A Step-by-Step Guide, SAS Institute.

[Li, Sun, Leung, Zhang 2013] Li, B., Sun, X., Leung, H., and Zhang, S., (2013), "A Survey of Code-Based Change Impact Analysis Techniques", Software Testing, Verification and Reliability, vol. 23, no. 8, pp. 613-646.

[Lopez-Fernandez, Robles, Gonzalez-Barahona 2004] Lopez-Fernandez, L., Robles, G., and Gonzalez-Barahona, J. M., (2004), "Applying Social Network Analysis to the Information in CVS Repositories", in Proceedings of the Mining Software Repositories Workshop, 26th International Conference on Software Engineering, Edinburgh, Scotland.

[M. Bieman, Andrews, Yang 2003] Bieman, J. M., Andrews, A. A., and Yang, H. J., (2003), "Understanding Change-Proneness in OO Software through Visualization", in Proceedings of the 11th IEEE International Workshop on Program Comprehension: IEEE Computer Society, p. 44.

[Maletic, Collard 2004] Maletic, J. I. and Collard, M. L., (2004), "Supporting Source Code Difference Analysis", in Proceedings of the 20th IEEE International Conference on Software Maintenance: IEEE Computer Society.

[Maletic et al. 2011] Maletic, J. I., Mosora, D. J., Newman, C. D., Collard, M. L., Sutton, A., and Robinson, B. P., (2011), "MosaiCode: Visualizing Large Scale Software: A Tool Demonstration", in VISSOFT: IEEE, pp. 1-4.

[McCabe 1976] McCabe, T. J., (1976), "A Complexity Measure", in Proceedings of the 2nd International Conference on Software Engineering. San Francisco, California, United States: IEEE Computer Society Press, p. 407.

[McNair, German, Weber-Jahnke 2007] McNair, A., German, D. M., and Weber-Jahnke, J., (2007), "Visualizing Software Architecture Evolution Using Change-Sets", in Proceedings of the 14th Working Conference on Reverse Engineering: IEEE Computer Society, pp. 130-139.

[Meneely, Williams, Snipes, Osborne 2008] Meneely, A., Williams, L., Snipes, W., and Osborne, J., (2008), "Predicting Failures with Developer Networks and Social Network Analysis", in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Atlanta, Georgia: ACM, pp. 13-23.

[Menzies, Greenwald, Frank 2007] Menzies, T., Greenwald, J., and Frank, A., (2007), "Data Mining Static Code Attributes to Learn Defect Predictors", IEEE Transactions on Software Engineering, vol. 33, no. 1, pp. 2-13.

[Menzies et al. 2010] Menzies, T., Jalali, O., Hihn, J., Baker, D., and Lum, K., (2010), "Stable Rankings for Different Effort Models", Automated Software Engg., vol. 17, no. 4, pp. 409-437.

[Mitzenmacher 2004] Mitzenmacher, M., (2004), "A Brief History of Generative Models for Power Law and Lognormal Distributions", Internet Mathematics, vol. 1, no. 2, pp. 226-251.

[Mockus, Votta 2000] Mockus, A. and Votta, L. G., (2000), "Identifying Reasons for Software Changes Using Historic Databases", in Proceedings of the International Conference on Software Maintenance (ICSM'00): IEEE Computer Society, p. 120.

[Mockus, Weiss, Zhang 2003] Mockus, A., Weiss, D. M., and Zhang, P., (2003), "Understanding and Predicting Effort in Software Projects", in Proceedings of the 25th International Conference on Software Engineering. Portland, Oregon: IEEE Computer Society, pp. 274-284.

[Monti 1995] Monti, K. L., (1995), "Folded Empirical Distribution Function Curves - Mountain Plots", The American Statistician, vol. 49, no. 4, pp. 342-345.

[Moonen 2002] Moonen, L., (2002), "Lightweight Impact Analysis Using Island Grammars", in Proceedings of the 10th International Workshop on Program Comprehension: IEEE Computer Society, p. 219.

[Moser, Pedrycz, Succi 2008] Moser, R., Pedrycz, W., and Succi, G., (2008), "A Comparative Analysis of the Efficiency of Change Metrics and Static Code Attributes for Defect Prediction", in Proceedings of the 30th International Conference on Software Engineering. Leipzig, Germany: ACM, pp. 181-190.

[Munson, Elbaum 1998] Munson, J. C. and Elbaum, S. G., (1998), "Code Churn: A Measure for Estimating the Impact of Code Change", in Proceedings of the International Conference on Software Maintenance: IEEE Computer Society, p. 24.

[Nagappan, Ball 2005a] Nagappan, N. and Ball, T., (2005a), "Static Analysis Tools as Early Indicators of Pre-Release Defect Density", in Proceedings of the 27th International Conference on Software Engineering. St. Louis, MO, USA: ACM, pp. 580-586.

[Nagappan, Ball 2005b] Nagappan, N. and Ball, T., (2005b), "Use of Relative Code Churn Measures to Predict System Defect Density", in Proceedings of the 27th International Conference on Software Engineering. St. Louis, MO, USA: ACM, pp. 284-292.

[Nagappan, Ball, Zeller 2006] Nagappan, N., Ball, T., and Zeller, A., (2006), "Mining Metrics to Predict Component Failures", in Proceedings of the 28th International Conference on Software Engineering. Shanghai, China: ACM, pp. 452-461.

[Newman 2005] Newman, M. E. J., (2005), "Power Laws, Pareto Distributions and Zipf's Law", in Contemporary Physics, vol. 46: Taylor & Francis, pp. 323-351.

[Offen, Jeffery 1997] Offen, R. J. and Jeffery, R., (1997), "Establishing Software Measurement Programs", IEEE Softw., vol. 14, no. 2, pp. 45-53.

[Orso et al. 2004] Orso, A., Apiwattanapong, T., Law, J., Rothermel, G., and Harrold, M. J., (2004), "An Empirical Comparison of Dynamic Impact Analysis Algorithms", in Proceedings of the 26th International Conference on Software Engineering: IEEE Computer Society, pp. 491-500.

[Ostrand, Weyuker, Bell 2005] Ostrand, T. J., Weyuker, E. J., and Bell, R. M., (2005), "Predicting the Location and Number of Faults in Large Software Systems", IEEE Transactions on Software Engineering, vol. 31, no. 4, pp. 340-355.

[Petrenko, Rajlich 2009] Petrenko, M. and Rajlich, V., (2009), "Variable Granularity for Improving Precision of Impact Analysis", in Proceedings of Program Comprehension, 2009. ICPC '09. IEEE 17th International Conference on, pp. 10-19.

[Pinzger, Gall, Fischer, Lanza 2005] Pinzger, M., Gall, H., Fischer, M., and Lanza, M., (2005), "Visualizing Multiple Evolution Metrics", in Proceedings of the 2005 ACM Symposium on Software Visualization. St. Louis, Missouri: ACM.

[Polo, Piattini, Ruiz 2001] Polo, M., Piattini, M., and Ruiz, F., (2001), "Using Code Metrics to Predict Maintenance of Legacy Programs: A Case Study", in Proceedings of the IEEE International Conference on Software Maintenance (ICSM'01): IEEE Computer Society, p. 202.

[Poshyvanyk, Marcus, Ferenc, Gyimóthy 2009] Poshyvanyk, D., Marcus, A., Ferenc, R., and Gyimóthy, T., (2009), "Using Information Retrieval Based Coupling Measures for Impact Analysis", Empirical Softw. Engg., vol. 14, no. 1, pp. 5-32.

[Queille, Voidrot, Wilde, Munro 1994] Queille, J.-P., Voidrot, J.-F., Wilde, N., and Munro, M., (1994), "The Impact Analysis Task in Software Maintenance: A Model and a Case Study", in Proceedings of the International Conference on Software Maintenance: IEEE Computer Society, pp. 234-242.

[Raghavan et al. 2004] Raghavan, S., Rohana, R., Leon, D., Podgurski, A., and Augustine, V., (2004), "Dex: A Semantic-Graph Differencing Tool for Studying Changes in Large Code Bases", in Proceedings of Software Maintenance, 2004. Proceedings. 20th IEEE International Conference on, pp. 188-197.

[Rajlich 1997] Rajlich, V., (1997), "A Model for Change Propagation Based on Graph Rewriting", in Proceedings of the International Conference on Software Maintenance: IEEE Computer Society, pp. 84-91.

[Ramil, Lehman 2000] Ramil, J. F. and Lehman, M. M., (2000), "Metrics of Software Evolution as Effort Predictors - A Case Study", in IEEE International Conference on Software Maintenance, p. 163.

[Rastkar, Murphy 2009] Rastkar, S. and Murphy, G. C., (2009), "On What Basis to Recommend: Changesets or Interactions?", in Proceedings of the 2009 6th IEEE International Working Conference on Mining Software Repositories: IEEE Computer Society, pp. 155-158.

[Ratzinger, Fischer, Gall 2005] Ratzinger, J., Fischer, M., and Gall, H., (2005), "Improving Evolvability through Refactoring", SIGSOFT Software Engineering Notes, vol. 30, no. 4, pp. 1-5.

[Ratzinger, Pinzger, Gall 2007] Ratzinger, J., Pinzger, M., and Gall, H., (2007), "EQ-Mine: Predicting Short-Term Defects for Software Evolution", in Fundamental Approaches to Software Engineering, M. Dwyer and A. Lopes, Eds., Springer Berlin / Heidelberg, pp. 12-26.

[Riehle, Kolassa, Salim 2012] Riehle, D., Kolassa, C., and Salim, M. A., (2012), "Developer Belief vs. Reality: The Case of the Commit Size Distribution", in Proceedings of Software Engineering, pp. 59-70.

[Robbes, Lanza 2005] Robbes, R. and Lanza, M., (2005), "Versioning Systems for Evolution Research", in Proceedings of the Eighth International Workshop on Principles of Software Evolution: IEEE Computer Society.

[Robbes, Pollet, Lanza 2008] Robbes, R., Pollet, D., and Lanza, M., (2008), "Logical Coupling Based on Fine-Grained Change Information", in Proceedings of the 2008 15th Working Conference on Reverse Engineering: IEEE Computer Society.

[Robillard 2005] Robillard, M. P., (2005), "Automatic Generation of Suggestions for Program Investigation", in Proceedings of the 10th European Software Engineering Conference held jointly with the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Lisbon, Portugal: ACM, pp. 11-20.

[Rountev, Milanova, Ryder 2001] Rountev, A., Milanova, A., and Ryder, B. G., (2001), "Points-to Analysis for Java Using Annotated Constraints", in Proceedings of the 16th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. Tampa Bay, FL, USA: ACM, pp. 43-55.

[Ryder 1979] Ryder, B. G., (1979), "Constructing the Call Graph of a Program", IEEE Trans. Softw. Eng., vol. 5, no. 3, pp. 216-226.

[Sayyad, Lethbridge 2001] Sayyad, J. and Lethbridge, C., (2001), "Supporting Software Maintenance by Mining Software Update Records", in Proceedings of the IEEE International Conference on Software Maintenance (ICSM'01): IEEE Computer Society, p. 22.

[Shawn, Gracanin 2003] Shawn, A. B. and Gracanin, D., (2003), "Software Impact Analysis in a Virtual Environment", in Proceedings, p. 143.

[Śliwerski, Zimmermann, Zeller 2005] Śliwerski, J., Zimmermann, T., and Zeller, A., (2005), "When Do Changes Induce Fixes?", SIGSOFT Softw. Eng. Notes, vol. 30, no. 4, pp. 1-5.

[Soman, Diwakar, Ajay 2006] Soman, K. P., Diwakar, S., and Ajay, V., (2006), Insight into Data Mining: Theory and Practice, Prentice-Hall of India.

[Subramanyam, Krishnan 2003] Subramanyam, R. and Krishnan, M. S., (2003), "Empirical Analysis of CK Metrics for Object-Oriented Design Complexity: Implications for Software Defects", IEEE Transactions on Software Engineering, vol. 29, no. 4, pp. 297-310.

[Taneja, Dig, Xie 2007] Taneja, K., Dig, D., and Xie, T., (2007), "Automated Detection of API Refactorings in Libraries", in Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering. Atlanta, Georgia, USA: ACM, pp. 377-380.

[Thilo, Koschke 2009] Thilo, M. and Koschke, R., (2009), "Revisiting the Evaluation of Defect Prediction Models", in Proceedings of the 5th International Conference on Predictor Models in Software Engineering. Vancouver, British Columbia, Canada: ACM, pp. 1-10.

[Tonella 2003] Tonella, P., (2003), "Using a Concept Lattice of Decomposition Slices for Program Understanding and Impact Analysis", IEEE Trans. Softw. Eng., vol. 29, no. 6, pp. 495-509.

[Tu, Godfrey 2002] Tu, Q. and Godfrey, M. W., (2002), "An Integrated Approach for Studying Architectural Evolution", in Proceedings of the 10th International Workshop on Program Comprehension: IEEE Computer Society, p. 127.

[Van Emden, Moonen 2002] Van Emden, E. and Moonen, L., (2002), "Java Quality Assurance by Detecting Code Smells", in Proceedings of Reverse Engineering, 2002. Proceedings. Ninth Working Conference on, pp. 97-106.

[Vanya, Premraj, Vliet 2011] Vanya, A., Premraj, R., and Vliet, H. V., (2011), "Approximating Change Sets at Philips Healthcare: A Case Study", in Proceedings of the European Conference on Software Maintenance and Reengineering, pp. 121-130.

[Weißgerber, Diehl 2006] Weißgerber, P. and Diehl, S., (2006), "Are Refactorings Less Error-Prone Than Other Changes?", in Proceedings of the 2006 International Workshop on Mining Software Repositories. Shanghai, China: ACM, pp. 112-118.

[Weyuker, Ostrand 2010] Weyuker, E. J. and Ostrand, T. J., (2010), "An Automated Fault Prediction System", in Making Software: What Really Works, and Why We Believe It, A. Oram and G. Wilson, Eds., Sebastopol, CA: O'Reilly Media, Inc., pp. 145-160.

[Weyuker, Ostrand, Bell 2007] Weyuker, E. J., Ostrand, T. J., and Bell, R. M., (2007), "Using Developer Information as a Factor for Fault Prediction", in Proceedings of the Third International Workshop on Predictor Models in Software Engineering: IEEE Computer Society, p. 8.

[Wilkie, Kitchenham 2000] Wilkie, F. G. and Kitchenham, B. A., (2000), "Coupling Measures and Change Ripples in C++ Application Software", J. Syst. Softw., vol. 52, no. 2-3, pp. 157-164.

[Williams 2005] Williams, C. C., (2005), "Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques", IEEE Transactions on Software Engineering, vol. 31, pp. 466-480.

[Williams, Hollingsworth 2004] Williams, C. C. and Hollingsworth, J. K., (2004), "Bug Driven Bug Finders", in International Workshop on Mining Software Repositories. University of Waterloo, Waterloo, ON, pp. 70-74.

[Witten, Frank, Hall 2011] Witten, I. H., Frank, E., and Hall, M. A., (2011), Data Mining: Practical Machine Learning Tools and Techniques, Elsevier Science & Technology.

[Xing, Stroulia 2005] Xing, Z. and Stroulia, E., (2005), "UMLDiff: An Algorithm for Object-Oriented Design Differencing", in Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. Long Beach, CA, USA: ACM, pp. 54-65.

[Ying, Murphy, Ng, Chu-Carroll 2004] Ying, A. T. T., Murphy, G. C., Ng, R., and Chu-Carroll, M. C., (2004), "Predicting Source Code Changes by Mining Change History", IEEE Transactions on Software Engineering, vol. 30, no. 9, pp. 574-586.

[Yu, Rajlich 2001] Yu, Z. and Rajlich, V., (2001), "Hidden Dependencies in Program Comprehension and Change Propagation", in Proceedings of the 9th International Workshop on Program Comprehension: IEEE Computer Society, p. 293.

[Zaki, Parthasarathy, Li 1997] Zaki, M. J., Parthasarathy, S., and Li, W., (1997), "A Localized Algorithm for Parallel Association Mining", in Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures. Newport, Rhode Island, United States: ACM, pp. 321-330.

[Zhang, Tan, Marchesi 2009] Zhang, H., Tan, H. B. K., and Marchesi, M., (2009), "The Distribution of Program Sizes and Its Implications: An Eclipse Case Study", Computing Research Repository, vol. abs/0905.2.

[Zimmermann, Nagappan 2008] Zimmermann, T. and Nagappan, N., (2008), "Predicting Defects Using Network Analysis on Dependency Graphs", in Proceedings of the 30th International Conference on Software Engineering. Leipzig, Germany: ACM, pp. 531-540.

[Zimmermann, Premraj, Zeller 2007] Zimmermann, T., Premraj, R., and Zeller, A., (2007), "Predicting Defects for Eclipse", in Proceedings of the Third International Workshop on Predictor Models in Software Engineering: IEEE Computer Society, p. 9.

[Zimmermann, Weisgerber, Diehl, Zeller 2004] Zimmermann, T., Weisgerber, P., Diehl, S., and Zeller, A., (2004), "Mining Version Histories to Guide Software Changes", in Proceedings of the 26th International Conference on Software Engineering: IEEE Computer Society.

[Zimmermann, Weissgerber 2004] Zimmermann, T. and Weissgerber, P., (2004), "Preprocessing CVS Data for Fine-Grained Analysis", in Proceedings of the International Workshop on Mining Software Repositories (MSR).