Graduate Theses, Dissertations, and Problem Reports

2015

The Effects of Parameter Tuning on Machine Learning Performance in a Software Defect Prediction Context

Benjamin N. Province


Recommended Citation: Province, Benjamin N., "The Effects of Parameter Tuning on Machine Learning Performance in a Software Defect Prediction Context" (2015). Graduate Theses, Dissertations, and Problem Reports. 6457. https://researchrepository.wvu.edu/etd/6457

The Effects of Parameter Tuning on Machine Learning Performance in a Software Defect Prediction Context

Benjamin N. Province

Thesis submitted to the Statler College of Engineering and Mineral Resources at West Virginia University

in partial fulfillment of the requirements for the degree of

Master of Science in Computer Science with a concentration in Software and Knowledge Engineering

Tim Menzies, Ph.D., Chair
Katerina Goseva-Popstojanova, Ph.D.
Thirimachos Bourlai, Ph.D.

Lane Department of Computer Science and Electrical Engineering

Morgantown, West Virginia 2015

Keywords: Software Defect Prediction, Parameter Tuning, Machine Learning, Cross-Version Learning, Software Engineering

Copyright 2015 Benjamin Province

ABSTRACT

The Effects of Parameter Tuning on Machine Learning Performance in a Software Defect Prediction Context

Benjamin N. Province

Most machine learning techniques rely on a set of user-defined parameters. Changes in the values of these parameters can greatly affect the prediction performance of the learner. These parameters are typically either set to default values or tuned for best performance on a particular type of data. In this thesis, the parameter-space of four machine learners is explored in order to determine the efficacy of parameter tuning within the context of software defect prediction.

A distinction is made between the typical within-version learning scheme and forward learning, in which learners are trained on defect data from one software version and used to predict defects in the following version. The efficacy of selecting parameters based on within-version tuning and applying those parameters to forward learning is tested. This is done by means of a cross-validated parameter-space grid search with each tuning's performance being compared to the performance of the default tuning given the same data.

For the Bernoulli naive Bayes classifier and the random forest classifier, it is found that parameter tuning within-version is a viable strategy for increasing forward learning performance. For the logistic regression classifier, it is found that tuning can be effective within a single version, but parameters learned in this manner do not necessarily perform well in the forward learning case. For the multinomial naive Bayes classifier, no substantial evidence for the efficacy of parameter tuning is found.

Acknowledgments

First, I would like to thank my advisor, Dr. Tim Menzies, his colleague Dr. Susan Partington, the Lane Department of Computer Science and Electrical Engineering, and the Program of Human Nutrition and Foods for employing me as a GRA for the first year and a half of my graduate studies and thus providing me with an important source of much-needed funding.

I would also like to thank the West Virginia Space Grant Consortium, the Jet Propulsion Laboratory at the California Institute of Technology, and my JPL mentor Justin Lin for making possible the summer internship which inspired me to pursue graduate studies in the field of Computer Science as a bridge into the field of intelligent robotics. I would be remiss not to also thank Dimitris Vassiliadis, the Colorado Space Grant Consortium, and the WVU department of Physics and Astronomy for their support of RockSat, a program in which I participated as an undergraduate which put me on the path towards JPL and eventually graduate school.

I would like to thank my wife, Randy, who helps me to stay focused, and keep my nose to the grindstone. Without her support, this thesis would have almost certainly taken another semester to complete.

I would like to thank my parents and grandmother for encouraging my academic success through childhood and for supplementing my undergraduate scholarships so that I could avoid the burden of student loans.

I would like to thank my friend Andrew Duncan for his friendship and for sharing some of his technical expertise. Any time I face a software or electronics challenge that has me stumped, Andrew is the first person I ask for advice.

Last, but not least, I would like to thank all those who have contributed to Python, SciPy, NumPy, SciKit-Learn, Spyder, Matplotlib, Ubuntu, LinuxMint, MacPorts, LaTeX, and all the other free software I use. Without free and open-source software, this thesis could not have happened, and the world would truly be a much less wonderful place.

Contents

1 Introduction
1.1 Motivation
1.2 Research Questions
1.3 Statement of Thesis
1.4 Structure of Thesis

2 Background
2.1 Simple Methods Work Well for Defect Prediction
2.2 Parameter Tuning of Machine Learners
2.3 Parameter Tuning in Evolutionary Algorithms

3 Methods
3.1 Machine Learning Techniques
3.1.1 Bayesian Classification
3.1.2 Random Forest Classifier
3.1.3 Logistic Regression
3.2 Evaluating Machine Learning Performance
3.2.1 Confusion Matrices
3.2.2 Measures of Classification Performance
3.3 Data
3.3.1 Data Sources
3.3.2 CKJM Metrics
3.3.3 Data Selection
3.4 Experimental Design
3.4.1 Parameter Grid Search
3.4.2 Cross-Validation Setup
3.4.3 Current-Version Vs. Forward-Version Evaluation
3.4.4 Machine Learning Trials
3.5 Statistical Methods
3.5.1 Testing Significance with the Wilcoxon Signed Rank Test
3.5.2 Limiting False Discovery for Multiple Hypothesis Testing with the Benjamini-Hochberg Procedure
3.5.3 Comparing Current and Forward Learning Performance
3.5.4 Splitting Helpful Hairs with Effect Size

4 Results
4.1 Multinomial Naive Bayes: a Negative Outcome
4.2 Logistic Regression: a Mixed Outcome
4.3 Random Forest: a Positive Outcome
4.4 Bernoulli Naive Bayes: a Very Positive Outcome

5 Threats to Validity
5.1 Construct Validity
5.2 Internal Validity
5.3 Conclusion Validity
5.4 External Validity

6 Conclusions and Suggestions for Implementation
6.1 Current-Version Tuning
6.2 Forward Tuning
6.3 Pre-Trials: Does Tuning Work for My Learner?
6.4 Tuned Learner Pooling
6.5 Summary of Conclusions: Research Questions Revisited

A Multinomial Naive Bayes Forward Results By Project
B Logistic Regression Forward Results By Project
C Random Forest Forward Results By Project
D Bernoulli Naive Bayes Forward Results By Project
E Multinomial Naive Bayes Forward Results By Dataset
F Logistic Regression Forward Results By Dataset
G Random Forest Forward Results By Dataset
H Bernoulli Naive Bayes Forward Results By Dataset
I Multinomial Bayes Current Version Results By Dataset
J Logistic Regression Current Version Results By Dataset
K Random Forest Current Version Results By Dataset
L Bernoulli Bayes Current Version Results By Dataset

List of Figures

3.1 Confusion matrix examples and labels
3.2 g contours as a function of pD, pF
3.3 Confusion Matrix for m tests of significance
4.1 Multinomial Naive Bayes Mean g Box Plots
4.2 Histogram of Logistic Regression Current-Version Successful Tunings
4.3 Logistic Regression Mean g Box Plots
4.4 Histogram of Random Forest Current-Version Successful Tunings
4.5 Random Forest Mean g Box Plots
4.6 Histogram of Bernoulli Bayes Current-Version Successful Tunings
4.7 Bernoulli Naive Bayes Mean g Box Plots

List of Tables

3.1 CKJM Extended Metrics [17]
3.2 CKJM datasets of the PROMISE repository
3.3 Sci-kit Learn Multinomial Naive Bayes Parameters
3.4 Sci-kit Learn Bernoulli Naive Bayes Parameters
3.5 Sci-kit Learn Logistic Regression Parameters
3.6 Sci-kit Learn Random Forest Parameters
3.7 Version-Forward Learning Outcomes
3.8 Cohen '88 Effect Sizes
3.9 Version-Forward Learning Outcomes by Effect Size
4.1 Version-Forward Learning Outcomes: Multinomial Naive Bayes Total Counts
4.2 Version-Forward Learning Outcomes: Logistic Regression Total Counts
4.3 Version-Forward Learning Outcomes: Logistic Regression Xalan counts
4.4 Version-Forward Learning Outcomes: Random Forest Total Counts
4.5 Version-Forward Learning Outcomes: Random Forest Total Counts
4.6 Version-Forward Learning Outcomes: Bernoulli Bayes Total Counts
4.7 Version-Forward Learning Outcomes: Bernoulli Bayes Total Counts

Chapter 1

Introduction

1.1 Motivation

The ultimate end-goal of the work presented in this thesis is the betterment of the quality of future software. This is a difficult goal to approach, as no single solution to the software quality problem exists. As Fred Brooks famously stated, "We see no silver bullet. There is no single development, in either technology or management technique, which by itself promises even one order of magnitude improvement..." [7] Therefore, the only way to make progress towards the end-goal of higher quality software is through incremental improvement to one of a variety of issues with current development practices.

This thesis focuses on one of the major problems plaguing modern software: the prevalence of bugs. Although testing and validation are part of most software development cycles, a shocking number of bugs still make their way into production software to be discovered by the end-users. The simplistic solution to the bug problem is to increase the testing effort during development. If bugs are caught by developers before the software is released, they can be fixed before the software reaches end-users. There are several approaches by which bugs may be caught, including extensive unit testing, code walkthroughs, and more exhaustive testing of combinations of use cases and options.

The problem with this simplistic approach is that testing consumes resources. Any increase in software testing and validation effort will also lead to an increase in overall development costs and overall development time. Therefore, the trade-off between greater validation effort and lower development costs is a business decision. Software that has many bugs will be unappealing, but software that comes at great cost may be equally unappealing. For this reason, the holy grail of software validation is the development of processes by which more bugs can be caught without an increase in effort. The real question is how to find bugs by looking smarter rather than harder.

A data-mining approach is often used to work towards increasing software quality. In this approach, various quantifiable statistics or metrics are extracted from the source code of the software or from the development process. These statistics and metrics are usually collected for several subsets of a software development project along with the number of bugs reported for each subset. In this way, hypotheses about the relation between the software metrics and the likelihood of a bug in the software can be constructed and evaluated. After a correlation has been established, developers can close the loop by either altering their development practices to avoid "bad" characteristics or by applying more testing scrutiny to subsets of the software which exhibit "bad" characteristics.

Data-driven software defect prediction generally relies on machine learning techniques, most of which have several parameters which can be adjusted to vary the performance of the learning algorithm. Most machine learners have a default set of parameters which are chosen by the learner's creators to reflect the "best" settings for general performance. The default parameters, however, are not necessarily the top-performing set of parameters in all situations. The practice of choosing parameters that lead to increased performance within a particular domain or when applied to a particular type of data is known as parameter tuning.

The intent of this thesis is to explore the practice of parameter tuning, determine its effectiveness in a Software Engineering defect prediction context, and offer insight into its effective application. By furthering the understanding of effective parameter tuning, the practice of software defect prediction can be improved. Improved defect prediction can lead to more informed software testing, which can contribute to an increase in the overall quality of software.

1.2 Research Questions

1) Can within-version parameter tuning lead to increased classification performance on software defect data when compared to default parameters?

Each machine learner in Scikit-Learn has a default value for each of its parameters. The set of default parameter values has been chosen by the authors of Scikit-Learn or the original developers of the algorithms as a best guess for the appropriate values. It is possible and even probable that for a given combination of training set and test set, some other parameter tuning will produce better results than the default tuning. The real question is whether some parameter tunings perform significantly better than the default tuning under repeated re-sampling of a pool of training and testing data. To explore this question, a cross-validation study must be performed and a test of statistical significance must be applied to the performance results of each non-default tuning.

2) Is parameter tuning effective when the parameters are tuned and the classifier is trained on data from one version of a software project but tested on the next version?

The main goal of machine learning for defect prediction is that lessons learned from previously developed software may be applied to software that is currently in development. Given this goal, it is curious that most defect prediction studies are conducted by sampling both the training and testing data from the same dataset. This typically comes in the form of a cross-validation study where training and testing data are repeatedly resampled from data reflecting a single version of a single software project. This within-version approach to tuning evaluation doesn't fit the business use scenario, in which tuning cannot be conducted on as-of-yet untested versions because class labels have not yet been established. For example, it may be that the tunings that yield the best performance when training and testing on version 1.1 of a software project do not perform as well when the learner is trained on data from version 1.1 and tested on version 1.2.

3) Is the extent to which a parameter tuning out-performs the default tuning on data from one software version indicative of its likelihood of success in future versions?

This question extends RQ2 by considering effect size. Assuming that tunings that do well in one software version tend to yield helpful results when they are applied to a learner that is tested on a newer version, does the magnitude of their success matter? It may be that any tuning which performs better than the defaults when tested on one version is likely to do so on the next version as well. It may also be that there is some sort of effect size threshold below which tunings should be ignored. It is important to consider "setting the bar" of how much better than the defaults a particular tuning must do in one version to be trusted in a subsequent version.

4) Is there a set of parameter tunings that will increase learning performance ubiquitously in the domain of software defect prediction?

This question has the potential to undo RQs 1 through 3 if it can be answered in the affirmative. Another way of thinking about it is “If one tuning is consistently doing better than defaults, why don’t we just change the defaults?” If the tuning experiments keep finding a particular tuning which out-performs the defaults, then perhaps the defaults were poorly chosen, or were not set to a value which reflects the best performance within the domain of software defect prediction. If this is the case, then developers should be made aware of the “new defaults” and told that implementing parameter tuning in practice is a moot point.

1.3 Statement of Thesis

The results of the research presented in this thesis can be summarized as follows:

In the context of software defect prediction, parameter tuning is an effective means of boosting prediction performance of certain machine learning algorithms. For some of these learners, this holds true not only when they are trained and tested on data from the same software version, but also when they are tested on future versions not used in training.

This statement is conservatively worded to reflect the fact that parameter tuning does not produce positive results under all circumstances. Based on the evidence in this thesis, it is the position of the author that, on the whole, parameter tuning is a worthwhile endeavor. As parameter tuning is not ubiquitously effective for all learners and all data, trials should be conducted to judge its effectiveness before applying it in new situations.

1.4 Structure of Thesis

The remainder of this thesis is organized as follows:

Chapter 2: Background provides some background information on machine learning for software defect prediction and on parameter tuning of machine learners.

Chapter 3: Methods discusses the machine learning techniques which are used in this thesis and the means of testing and measuring their predictive performance. It also describes the data sources used and the method of data selection. Chapter 3 also contains a description of the design of the experimental trials and finally, descriptions of the statistical methods employed for comparing results.

Chapter 4: Results presents the results of this study starting with the learner that produces the weakest evidence for the efficacy of parameter tuning and progressing to the learner which produces the strongest evidence for the efficacy of parameter tuning.

Chapter 5: Threats to Validity presents some observations which may pose a threat to the validity of the results of Chapter 4 and the generalizability of the conclusions in Chapter 6.

Chapter 6: Conclusions and Suggestions for Implementation summarizes the findings of this thesis and provides some suggestions for the practical implementation of parameter tuning techniques within the context of software defect prediction.

Appendices list tabular results on a per-project and per-dataset basis.

Chapter 2

Background

2.1 Simple Methods Work Well for Defect Prediction

While training a classifier on previous examples, there may be overfitting problems for complex learning schemes. It is a repeated result in Software Engineering defect prediction that simplistic methods seem to work about as well as more complicated methods. This is why, in their 2012 review of fault prediction literature, Hall et al. concluded that "models which perform well tend to use simple, easy to use modeling techniques like Naive Bayes or Logistic Regression. More complex modeling techniques, such as Support Vector Machines, tend to be used by models which perform relatively less well." [12] In addition, Menzies et al. have shown that there is a ceiling effect of sorts when dealing with SE defect prediction. [21] This means that machine learners tend to get about as good results when training on a small subset of the instances in the training set as when training on all of them.

2.2 Parameter Tuning of Machine Learners

The tuning of parameters of machine learning algorithms is not a new concept. In the literature, the general process of parameter tuning is sometimes referred to as "hyperparameter optimization". The term "hyperparameter" is used in the context of model fitting to refer to a parameter governing the shape of the assumed prior distribution of the data. This distinguishes it from parameters which are internal to the model and which are scaled to achieve the best fit. In this thesis, some, but not all, of the parameters explored fit this strict definition of "hyperparameter", so the more general term "parameter tuning" is used instead of "hyperparameter optimization".

The most commonly discussed methods of parameter tuning are grid search and random search. Other more sophisticated methods have been proposed which intelligently search the parameter space, choosing new sets of test parameters based on the performance of previous trials. These methods have a few drawbacks when compared to random search and grid search. They generally assume some continuity of performance throughout the parameter space, which may not be true. [5] These iterative search methods are also often difficult to parallelize.

Grid search is a simpler method of parameter optimization that is resistant to discontinuities and nonlinearities provided that the grid is sufficiently fine. It is easily parallelizable and simple to understand, but is also generally regarded as the most computationally expensive method of parameter tuning. Interestingly, Bergstra and Bengio have shown that given the same computational time, a random search is likely to find a better-performing parameter tuning than a grid search with the same bounds. [4] This makes random search an attractive method of parameter tuning for practice, but for reasons of repeatability this thesis will focus on grid search for parameter optimization.
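As a concrete illustration of the two strategies, the sketch below runs a small grid search and a small random search using scikit-learn's built-in helpers. The estimator, parameter ranges, and synthetic dataset are placeholders chosen only for this example; the thesis drives its grid search through its own scripts rather than GridSearchCV.

```python
# Minimal sketch of grid search vs. random search over a parameter space.
# The ranges below are illustrative and are not the grids used in this thesis.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=19, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50, 90], "max_depth": [5, 30, 75]},
    cv=5)
grid.fit(X, y)

rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(10, 100),
                         "max_depth": randint(5, 80)},
    n_iter=9, cv=5, random_state=0)   # same budget of 9 candidate tunings
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)
```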

2.3 Parameter Tuning in Evolutionary Algorithms

Aside from classification algorithms, parameter tuning is also used for other complex algorithms including regression, computer vision, and evolutionary algorithms. The literature on tuning of evolutionary algorithms is relatively vast, possibly due to the large computation costs associated with random or grid search of their parameter spaces. Some of these evolutionary algorithm results are of particular interest here due to their use in a closely related domain: search-based Software Engineering.

Search-based Software Engineering (SBSE) is a different type of machine learning. Where defect prediction data mining is concerned with creating predictive models of software defects, SBSE is concerned with finding optimal solutions to a given model. In other words, a search of the model's input space is conducted in order to find input-values corresponding to a desired output. SBSE often relies on a genetic algorithms approach to search due to the complexity of the models.

In 2011, Arcuri and Fraser explored the effects of parameter tuning of genetic algorithms in a SBSE context. They found that "Different parameter settings cause very large variance in the performance" and that "Default parameter settings perform relatively well, but are far from optimal on individual problem instances." [1] In this thesis, these conclusions are found to be generally extensible to the software defect prediction context with a few exceptions.

In a paper that is not software engineering specific, Smit and Eiben compared three parameter tuning strategies for evolutionary algorithms and found that they all produced parameter values that yielded performance superior to that of manual human-in-the-loop tuning. [33] They also found in a survey of tuning literature that the "tunability" of parameters may vary from algorithm to algorithm and dataset to dataset but that "The efforts [of tuning] are moderate and the gains in performance can be very significant." [11]

Chapter 3

Methods

3.1 Machine Learning Techniques

Various machine learning techniques have been designed to solve the problem of classification. In order to apply a classification technique, the features which have been extracted from the software data must be mapped into the domain required for the learner's inputs. This may require pre-processing that converts numeric values into categorical values such as high/med/low or the assignment of categorical values to integer representations depending on the cardinality of the data and the requirements of the learning scheme.

This thesis will explore the effects of parameter tuning on three types of machine learners: Bayesian classifiers, logistic regression, and the random forest classifier. It is important to understand the basics of each of these machine learning techniques before we can explore their parameter spaces. While each of these techniques has been discussed at length and with great depth in machine learning literature, this section merely aims to provide an overview of each of these methods.

3.1.1 Bayesian Classification

Bayesian classification is one of the oldest, simplest, and most widely used machine learning techniques ever developed. Several variants of Bayesian classification schemes exist, but the premise of all Bayesian classifiers is the same: the application of Bayes' Rule. When expressed in terms of a single datum of evidence E, the probability of a single class outcome C is given as

\[ p(C|E) = \frac{p(C) \cdot p(E|C)}{p(E)} \tag{3.1} \]

where p(C|E) is the probability of a particular classification given the evidence, p(C) is the prior probability of the classification, p(E|C) is the likelihood of observation E within training examples of class C, and p(E) is the prior probability of the evidence. [37]

In a Bayesian classifier, the goal is not to determine the actual probability of a single class, but rather to choose the most likely class from a set of possible classifications. In this case, let \(\hat{i}\) be the index of the most likely class from the set \(\{C_0, C_1, ..., C_n\}\) given evidence E. Under these conditions, the most likely class given the evidence can be expressed as

\[ \hat{i} = \arg\max_{i \in \{0,1,...,n\}} \frac{p(C_i) \cdot p(E|C_i)}{p(E)} \tag{3.2} \]

It should be noted that this expression assumes a single datum of evidence. In order to be useful as a classifier, equation 3.2 must be modified to support a set of evidence such that \(E = \{E_0, E_1, ..., E_k\}\). This presents two challenges. Firstly, a specific combination of evidence may not exist in the training data. This means that p(E) would become zero. In order to prevent this, p(E) can be removed from the equation because it is a constant across all \(C_i\) evaluated within the argmax. This simplification yields:

\[ \hat{i} = \arg\max_{i \in \{0,1,...,n\}} p(C_i) \cdot p(E_0, ..., E_k|C_i) \tag{3.3} \]

The second problem with the expansion from a single evidence datum to a set of evidence is the expansion of \(p(E_0, ..., E_k|C_i)\). Under the chain rule for conditional probabilities:

\[ p(E_0, ..., E_k|C_i) = p(E_0|C_i) \cdot p(E_1|E_0, C_i) \cdot p(E_2|E_1, E_0, C_i) \cdot \,...\, \cdot p(E_k|E_{k-1}, ..., E_0, C_i) \tag{3.4} \]

This is where the "naivety" of a naive Bayes classifier comes into play. In order to simplify equation 3.4, an assumption of independence between the evidence variables is made. If each element of the evidence is assumed to be independent from each other element (which may or may not hold true), then the conditions for each probability in equation 3.4 can be reduced to \(C_i\) such that:

\[ p(E_0, ..., E_k|C_i) = p(E_0|C_i) \cdot p(E_1|C_i) \cdot p(E_2|C_i) \cdot \,...\, \cdot p(E_k|C_i) = \prod_{j=0}^{k} p(E_j|C_i) \tag{3.5} \]

Plugging equation 3.5 into equation 3.3 yields the final simplified equation for the class prediction of a naive Bayes classifier using a discrete set of evidence. [15]

\[ \hat{i} = \arg\max_{i \in \{0,1,...,n\}} p(C_i) \prod_{j=0}^{k} p(E_j|C_i) \tag{3.6} \]
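To make Equation 3.6 concrete, here is a small toy implementation of the discrete naive Bayes decision rule, working in log space so the product of likelihoods becomes a sum. It is only an illustration under simple add-alpha smoothing assumptions; it is not the Scikit-Learn implementation used elsewhere in this thesis.

```python
# Toy illustration of Equation 3.6 for categorical evidence.
import math
from collections import Counter, defaultdict

def train(rows, labels):
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)       # (class, feature index) -> value counts
    for row, c in zip(rows, labels):
        for j, value in enumerate(row):
            feat_counts[(c, j)][value] += 1
    return class_counts, feat_counts

def predict(row, class_counts, feat_counts, alpha=1.0):
    total = sum(class_counts.values())
    best_class, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)        # log p(C_i)
        for j, value in enumerate(row):
            counts = feat_counts[(c, j)]
            num = counts[value] + alpha      # add-alpha smoothing so unseen values
            den = n_c + alpha * (len(counts) + 1)  # do not zero out the product
            score += math.log(num / den)     # log p(E_j | C_i)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

rows = [("high", "yes"), ("low", "no"), ("high", "no"), ("low", "yes")]
labels = ["buggy", "clean", "buggy", "clean"]
model = train(rows, labels)
print(predict(("high", "yes"), *model))
```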

Gaussian Bayes Classifier

One way to modify the Bayesian classifier for continuous numeric features is to assume that the distribution of values for each feature is Gaussian. In this way, the probability of each feature value for a given class can be expressed as:

\[ P(E_j|C_i) = \frac{1}{\sqrt{2\pi\sigma_C^2}} \cdot \exp\left(-\frac{(E_j - \mu_C)^2}{2\sigma_C^2}\right) \tag{3.7} \]

where \(\mu_C\) and \(\sigma_C\) are the mean value and standard deviation of \(E_j\) seen in training instances of class \(C_i\). [35]

Multinomial Bayes Classifier

The Multinomial Bayes classifier addresses the problem of missing values through smoothing. It adds a small value, α, to the count of each feature when computing the probability such that all combinations of features and classes have a non-zero probability. This process is called Laplace smoothing when α = 1 and Lidstone smoothing when α < 1. The smoothed feature values are given as:

\[ \hat{E}_{Cj} = \frac{E_{Cj} + \alpha}{E_C + \alpha n} \tag{3.8} \]

where \(E_{Cj}\) is the sum of all values of feature j given class C seen in training, \(E_C\) is the sum of \(E_{Cj}\) over all features for class C, and n is the number of features. The multinomial Bayes classifier is primarily used in text mining where the features are word counts, which may be sparse. [30]
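In scikit-learn, the smoothing value is exposed as the alpha parameter of MultinomialNB. The following minimal sketch only illustrates how alpha and fit_prior are passed; the count data is randomly generated and has nothing to do with the defect datasets used later.

```python
# Sketch of the alpha smoothing parameter in scikit-learn's MultinomialNB.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X = rng.poisson(lam=2.0, size=(100, 19))   # non-negative counts, as MultinomialNB expects
y = rng.randint(0, 2, size=100)

for alpha in (0.1, 1.0, 4.0):              # Lidstone, Laplace, and heavier smoothing
    clf = MultinomialNB(alpha=alpha, fit_prior=True).fit(X, y)
    print(alpha, clf.predict(X[:3]))
```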

Bernoulli Bayes Classifier

The Bernoulli Bayes classifier assumes the special case that all features are binary-valued. For numeric features, this requires a binarization pre-processing step. [35] For Bernoulli Bayes, the probability of each feature value for a given class can be expressed as:

\[ p(E_j|C_i) = \begin{cases} p(j|C_i) & \text{if } E_j \\ 1 - p(j|C_i) & \text{if } \neg E_j \end{cases} \tag{3.9} \]

This provides a case for explicitly handling missing features, rather than ignoring them, though the smoothing technique described for Multinomial Bayes can also be applied.

When numerically valued data is being used with a Bernoulli Bayes classifier, it must be pre-processed to meet the binary value assumption. This is typically done by normalizing each dataset attribute and then applying a binarization threshold. For this thesis, any data which is being passed to a Bernoulli Bayes classifier is normalized by shifting and linearly scaling such that the minimum and maximum values of each attribute are 0 and 1 respectively. A binarization threshold between 0 and 1 is then used as a parameter of the learner.
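A minimal sketch of that preprocessing using scikit-learn utilities is shown below; the 0.5 threshold mirrors the default binarize value listed later in Table 3.4, and the tiny data matrix is purely illustrative. (Scikit-Learn's BernoulliNB can also apply the threshold itself through its own binarize parameter.)

```python
# Normalize each attribute into [0, 1], then binarize at a threshold.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Binarizer

X = np.array([[1.0, 200.0],
              [3.0,  50.0],
              [2.0, 125.0]])
X01 = MinMaxScaler().fit_transform(X)            # shift and linearly scale each column
Xbin = Binarizer(threshold=0.5).fit_transform(X01)
print(Xbin)
```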

3.1.2 Random Forest Classifier

The random forest classifier is an ensemble method in which a "forest" of several decision tree classifiers is created. At each node of each decision tree, the best feature on which to split the tree is chosen from a random subset of available features. The best feature on which to split the decision tree is typically selected such that a split would either minimize the node's Gini impurity or maximize the node's information gain relative to all other features available for the split. Each decision tree continues to be split until a stopping criterion such as maximum depth or minimum training instances is reached, or the class of all training data for a given node is homogeneous. [6]

In addition to the random subsamples of the feature space at each node of each decision tree, some random forest implementations also reduce overfitting by bootstrap sampling the training data to be used on each tree. In this way, if there are n instances of training data, each tree will be trained on n instances randomly sampled with replacement from the original training data.

After a random forest has been trained, new examples can be classified by traversing each of the individual decision trees until a leaf is reached. The majority class of the leaf reached in each decision tree is that tree’s “vote” and the class with the most votes among all trees in the forest is the classifier’s prediction.
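The sketch below shows a bare-bones use of scikit-learn's RandomForestClassifier with a handful of the parameters examined later in Table 3.6. The synthetic data and the particular parameter values are illustrative only.

```python
# Minimal usage sketch of a random forest: bootstrap sampling per tree,
# random feature subsets per split, and majority voting at prediction time.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=19, random_state=1)
forest = RandomForestClassifier(
    n_estimators=50, criterion="entropy", max_features=4,
    max_depth=30, min_samples_split=2, min_samples_leaf=1,
    bootstrap=True, random_state=1)
forest.fit(X[:300], y[:300])
print(forest.predict(X[300:305]))   # majority vote over the 50 trees
```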

3.1.3 Logistic Regression

Regression-based algorithms have the advantage of natively handling continuously defined numeric inputs. When using a regression for classification, some allowances must be made due to the discrete nature of the dependent variable. One clean way to handle this is by using the logistic function as the basis for regression. The logistic function's range is asymptotically bounded between 0 and 1, which makes it useful for modeling probabilities. The logistic function is given as:

\[ \sigma(t) = \frac{e^t}{e^t + 1} = \frac{1}{1 + e^{-t}} \tag{3.10} \]

The probability of a boolean classification such as the presence of a software bug being "true", p(C|E), can be modeled as a logistic function given that t is a linear combination of evidence and weights such that:

\[ p(C|E) = \sigma(t) \quad \text{where} \quad t = \beta_0 + \beta_1 E_1 + \beta_2 E_2 + ... + \beta_k E_k \tag{3.11} \]

In this way, the classifier can be trained by finding the values of \(\beta_0, \beta_1, ...\) which best fit the probability of C in the training data. [29] In order to do this, we must consider the inverse of the logistic function: the logit function. Applying equation 3.11 to the logit function yields:

\[ g(E) = \ln\frac{p(C|E)}{1 - p(C|E)} = \beta_0 + \beta_1 E_1 + \beta_2 E_2 + ... + \beta_k E_k \tag{3.12} \]

The β coefficients from equation 3.12 can be estimated from the training data using a maximum likelihood estimator and an iterative solving technique such as Newton's method. [23] This can lead to overfitting issues which can be mitigated by using regularization. This is typically done using L1 (lasso) or L2 (ridge) regularization. In both cases an extra term reflecting the complexity of the fit is added to the loss equation for the least-squares fit. In L1 regularization, the extra loss term is the sum of the absolute values of all β coefficients, while in L2 regularization, the extra loss term is based on the Euclidean magnitude of the β vector. [24] This loss term is scaled through multiplication by a parameter of the learner called "regularization strength".
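In scikit-learn, the penalty type and the (inverse) regularization strength are exposed as the penalty and C parameters of LogisticRegression. The short sketch below contrasts L1 and L2 fits on synthetic data; the solver is chosen only so that both penalties are supported and is not meant to reflect the thesis configuration.

```python
# L1- vs. L2-regularized logistic regression; C is the inverse regularization strength.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=19, random_state=2)
l1 = LogisticRegression(penalty="l1", C=0.5, solver="liblinear").fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.5, solver="liblinear").fit(X, y)
print((l1.coef_ != 0).sum(), (l2.coef_ != 0).sum())   # L1 tends to zero out coefficients
```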

It is also possible to solve the dual linear program of the loss minimization function for logistic regression. This transforms the minimization problem into a transposed maximization problem which results in the same optimal coefficients. Solving the dual problem is more computationally efficient when the evidence set is sparse, or where there are more evidence features than there are instances in the training set. [36]

3.2 Evaluating Machine Learning Performance

The general process of evaluating the performance of a machine learner involves two datasets: a training set and a test set. The classifier uses the training set as a pool of examples from which to draw inference. This means that each datum within the training set must have a class label which is known a priori. Once a learner instance has received several examples in the form of a training set, it is considered a trained classifier.

After a classifier is trained, its performance can be tested by attempting to classify examples from the test set. The test set is passed to the learner without class labels, and the learner returns the class to which each example is most likely to belong based on the training examples and the principles governing the classifier's behavior. In a real-world classification situation, the class labels of the test set may not yet be known; it is the classifier's job to determine the label. When evaluating the performance of a classifier, however, the actual class labels of the test set must be known so that the classifier's predicted labels can be compared to the ground-truth labels.

In order to compare the performance of multiple machine learners, meaningful numeric measures of performance must be chosen. In this case, a performance measure is any way of assigning a number to measure how well a classifier's predicted classes for the test set match up with the ground-truth classes. This step is important because some measures, such as accuracy, may not tell the whole story of performance depending on the nature of class distributions within the test set.

Another consideration for establishing the performance of a given learner on a given dataset is how to partition the set into training and testing sets. When performing a single trial with one train/test partition, the learner's performance may be sensitive to the chosen partition. A better way to handle this is the LOO (leave one out) method. In LOO, a separate learning trial is conducted for each individual datum in which the selected datum is used as a test set and the rest of the data constitutes the training set. LOO can be impractically computationally expensive, so a subsampling method called stratified k-fold cross-validation with repeats is employed here instead. Cross-validation will be explained below in section 3.4.2.
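For reference, the sketch below shows one way to run such a 5-fold, 5-repeat, class-stratified scheme with scikit-learn; the learner and the synthetic data are stand-ins, not the setup of Section 3.4.2.

```python
# 5-fold, 5-repeat, class-stratified cross-validation: 25 train/test splits in total.
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB

X, y = make_classification(n_samples=500, n_features=19, random_state=3)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=3)
scores = cross_val_score(BernoulliNB(), X, y, cv=cv)   # one score per split
print(scores.mean(), scores.std())
```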

3.2.1 Confusion Matrices

Before discussing classification performance measures, it is useful to discuss the concept of a confusion matrix. A confusion matrix shows, for each true class, how many items a classifier predicts to belong to each possible class. Each row in a confusion matrix represents a true class while each column represents a predicted class. (See Figure 3.1 for examples.) In the case of 100% accurate prediction, the confusion matrix should be a diagonal matrix, as all predictions match the ground-truth. The number of rows and columns in a confusion matrix is equal to the number of possible classes represented, which may be any whole number two or greater.

For the special case of a two-class problem, in which the only possible outcomes can be represented as positive and negative (or true and false), the confusion matrix consists of only four squares which can be labeled TP (True Positive), FP (False Positive), TN (True Negative), and FN (False Negative). In this category system, the terms “positive” and “negative” refer to the predicted classification and the terms “true” and “false” indicate whether or not the predicted classification was correct. For a visual representation of these labels, see Figure 3.1c.

Figure 3.1: Confusion matrix examples and labels. Rows represent ground-truth and columns represent predicted classes. Shaded squares represent correct predictions. TP, TN, FP, and FN represent True Positive, True Negative, False Positive, and False Negative respectively.

(a) Three-Class CM Example: rows A, B, C (true) vs. columns A, B, C (predicted) = [10, 1, 2; 2, 9, 3; 1, 1, 11]. (b) Two-Class CM Example: rows T, F vs. columns T, F = [11, 4; 0, 15]. (c) Two-Class CM Labels: [TP, FN; FP, TN].

Software bug detection can be represented as a two-class problem in which the presence of a bug is considered “positive” and the absence of a bug is considered “negative”. Although more classes could be used to represent different types of bugs or bugs which were reported or treated in different ways, it is useful to represent the presence of a bug as a two-class problem. Restricting the number of classes to two increases performance by reducing the number of categories between which a classifier must differentiate and also allows the use of performance measures such as “probability of detection” which have little meaning when there are more than two classes.

3.2.2 Measures of Classification Performance

In order to evaluate the efficacy of machine learners, some measurements need to be applied to the confusion matrix yielded from the testing of a trained classifier. In most machine learning circles, the typical performance evaluation metrics are precision and recall. Within the domain of Software Engineering defect prediction, the majority of the recent publications favor pD (probability of detection) and pF (probability of false-alarm). This is largely due to scaling issues with precision that arise when the testing data has positive/negative ratios near 0 or 100%. [20]

It should be noted that for a two-class problem, pD is actually equivalent to recall. pD can be expressed as the number of true positives divided by the total number of positive cases, while pF can be expressed as the number of false positives divided by the total number of negative cases.

\[ pD = recall = \frac{\text{\# of defects correctly predicted}}{\text{\# of actual defects}} = \frac{TP}{TP + FN} \tag{3.13} \]

\[ pF = \frac{\text{\# of false defect predictions}}{\text{\# of non-defective items}} = \frac{FP}{FP + TN} \tag{3.14} \]

This thesis will focus primarily on pD and pF as the chief metrics for comparing machine learning results because they are business-intuitive for defect prediction and because they are in line with recent publications. The goal is to have the highest possible pD and the lowest possible pF, but there is often a trade-off between the two based on the parameters chosen for a given classification scheme.

In order to effectively rank results it is useful to have a single metric which reflects overall performance, but neither pD nor pF is sufficient on its own. As an example, a dummy classifier which always predicts "true" is going to have a pD of 100% but will also have a pF of 100%. Conversely, a dummy classifier which always predicts "false" will have a pF of 0, which is excellent, but will also score 0 on pD, making it useless. Consequently, we can define a "no information" line on a pD, pF plot where pD = pF. A dummy classifier which guesses positive n% of the time and negative 100 − n% of the time is expected to have a performance along this line. For a visual representation of the line of no information, see figure 3.2.

In order to reflect overall performance, a synthetic metric will be used. A typical synthetic metric for combining precision and recall is the balanced F-score, which is the harmonic mean of precision and recall. In this case, as we are focusing on pD and pF, a similar metric, g, will be used, where g is defined as the harmonic mean of pD and 1 − pF.

Figure 3.2: g contours as a function of pD, pF. The solid line represents the line of no information, the expected performance from random guessing.

\[ g = \frac{2}{\frac{1}{pD} + \frac{1}{1 - pF}} = \frac{2\,(pD)(1 - pF)}{1 + pD - pF} \tag{3.15} \]

A contour plot of the values of g based on pD and pF can be seen in figure 3.2.
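The three measures are simple to compute directly from a two-class confusion matrix. The small helper below evaluates Equations 3.13 through 3.15 on the two-class example from Figure 3.1b.

```python
# pD, pF, and g from a two-class confusion matrix (rows = truth, columns = prediction).
def pd_pf_g(tp, fn, fp, tn):
    pd = tp / (tp + fn)                      # probability of detection (recall)
    pf = fp / (fp + tn)                      # probability of false alarm
    g = 2 * pd * (1 - pf) / (1 + pd - pf)    # harmonic mean of pD and 1 - pF
    return pd, pf, g

print(pd_pf_g(tp=11, fn=4, fp=0, tn=15))
```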

3.3 Data

3.3.1 Data Sources

Some companies keep records of completed software projects which may be used to learn lessons for future development. These software companies have little incentive to release their development data to the general public. Most Software Engineering researchers rely on public data from various sources including the PROMISE repository [22], a software engineering project data repository supported by North Carolina State University.

The PROMISE repository includes many Software Engineering defect and effort prediction datasets from previous software projects. The software project data hosted on PROMISE has features which are measured with a variety of different types of software metrics. Many of these software projects have data available on PROMISE from multiple versions.

In the interest of openness and reproducibility of results, only data publicly available in the PROMISE repository will be used in this thesis. In addition to this restriction, the data used in this thesis will also be restricted to software projects for which three or more versions (numbered releases) are available. This will allow for the evaluation of cross-version learning. Another necessary restriction is that the software metrics used be homogeneous to ensure fair comparison. To this end, only software projects using the CKJM object-oriented metrics will be considered, as choosing CKJM metrics allows for the largest plurality of datasets to be included when considering the first two restrictions.

3.3.2 CKJM Metrics

CKJM (Chidamber and Kemerer Java Metrics) is a tool for tabulating the object-oriented software metrics devised by Chidamber and Kemerer [8] for given Java classes. An updated version of CKJM, called CKJM Extended, also tabulates several other object-oriented software metrics on a per-class basis. [17] CKJM Extended includes Henderson-Sellers' LCOM3 metric [13], Martin's afferent and efferent couplings [18], Bansiya and Davis' QMOOD metrics [2], Tang, Kao, and Chen's coupling and complexity metrics [34], and McCabe's cyclomatic complexity measure [19], as well as the original Chidamber and Kemerer metrics and lines of code (LOC). For descriptions of all CKJM Extended metrics, see Table 3.1.

Several open source Java projects were analyzed by Jureczko and Spinellis using CKJM Extended, and a dataset was constructed from each version of each project. [17] These datasets consisted of the CKJM Extended metrics as well as a class variable indicating

Table 3.1: CKJM Extended Metrics [17]

Weighted methods per class (WMC) [C&K [8]]: The value of the WMC is equal to the number of methods in the class (assuming unity weights for all methods).

Depth of Inheritance Tree (DIT) [C&K [8]]: The DIT metric provides for each class a measure of the inheritance levels from the object hierarchy top.

Number of Children (NOC) [C&K [8]]: The NOC metric simply measures the number of immediate descendants of the class.

Coupling between object classes (CBO) [C&K [8]]: The CBO metric represents the number of classes coupled to a given class (efferent couplings and afferent couplings). These couplings can occur through method calls, field accesses, inheritance, method arguments, return types, and exceptions.

Response for a Class (RFC) [C&K [8]]: The RFC metric measures the number of different methods that can be executed when an object of that class receives a message. Ideally, we would want to find, for each method of the class, the methods that class will call, and repeat this for each called method, calculating what is called the transitive closure of the method call graph. This process can however be both expensive and quite inaccurate. Ckjm calculates a rough approximation to the response set by simply inspecting method calls within the class method bodies. The value of RFC is the sum of the number of methods called within the class method bodies and the number of class methods. This simplification was also used in Chidamber and Kemerer's description of the metric.

Lack of cohesion in methods (LCOM) [C&K [8]]: The LCOM metric counts the sets of methods in a class that are not related through the sharing of some of the class fields. The original definition of this metric (which is the one used in ckjm) considers all pairs of class methods. In some of these pairs both methods access at least one common field of the class, while in other pairs the two methods do not share any common field accesses. The lack of cohesion in methods is then calculated by subtracting from the number of method pairs that do not share a field access the number of method pairs that do.

Lack of cohesion in methods (LCOM3) [Henderson-Sellers [13]]: LCOM3 = ((1/a) * sum_{j=1..a} mu(A_j) - m) / (1 - m), where m is the number of procedures (methods) in the class, a is the number of variables (attributes) in the class, and mu(A_j) is the number of methods that access variable A_j.

Afferent couplings (Ca) [Martin [18]]: The Ca metric represents the number of classes that depend upon the measured class.

Efferent couplings (Ce) [Martin [18]]: The Ce metric represents the number of classes upon which the measured class depends.

Number of Public Methods (NPM) [QMOOD [2]]: The NPM metric simply counts all the methods in a class that are declared as public. The metric is also known as Class Interface Size (CIS).

Data Access Metric (DAM) [QMOOD [2]]: This metric is the ratio of the number of private (protected) attributes to the total number of attributes declared in the class.

Measure of Aggregation (MOA) [QMOOD [2]]: This metric measures the extent of the part-whole relationship, realized by using attributes. The metric is a count of the number of class fields whose types are user defined classes.

Measure of Functional Abstraction (MFA) [QMOOD [2]]: This metric is the ratio of the number of methods inherited by a class to the total number of methods accessible by the member methods of the class. The constructors and the java.lang.Object (as parent) are ignored.

Cohesion Among Methods of Class (CAM) [QMOOD [2]]: This metric computes the relatedness among methods of a class based upon the parameter list of the methods. The metric is computed using the summation of the number of different types of method parameters in every method divided by a multiplication of the number of different method parameter types in the whole class and the number of methods.

Inheritance Coupling (IC) [Tang [34]]: This metric provides the number of parent classes to which a given class is coupled. A class is coupled to its parent class if one of its inherited methods is functionally dependent on the new or redefined methods in the class.

Coupling Between Methods (CBM) [Tang [34]]: The metric measures the total number of new/redefined methods to which all the inherited methods are coupled. There is a coupling when at least one of the conditions given in the IC metric definition is held.

Average Method Complexity (AMC) [Tang [34]]: This metric measures the average method size for each class. The size of a method is equal to the number of Java binary codes in the method.

McCabe's cyclomatic complexity (CC) [McCabe [19]]: CC is equal to the number of different paths in a method plus one. Max(CC) is the greatest value of CC among methods of the investigated class; Avg(CC) is the arithmetic mean of the CC values in the investigated class.

Lines of Code (LOC): The LOC metric is based on Java binary code. It is the sum of the number of fields, the number of methods, and the number of instructions in every method of the investigated class.

whether a bug was present in each Java class and if that bug was fixed or not. It should be noted that "class variable" refers to the classification of a bug being present, and may be potentially confused with a Java class. In this case, each Java class in each version of each project has 19 metric features and one class variable indicating the presence or lack thereof of a bug. The datasets that Jureczko and Spinellis constructed in this manner were contributed to the PROMISE repository. Several of these datasets are used in this thesis.

3.3.3 Data Selection

In summary, the criteria for the selection of data which can help answer the research questions are as follows:

1. The data sets must have a ground-truth class which indicates the presence or absence of a bug in each software module.

2. All datasets selected must have homogenous feature types. In other words, the same software metrics must be present in all datasets.

3. For each software project selected, there must be available datasets reflecting the development of multiple versions.

4. All selected data must be available for study and replication

In order to meet requirement 4, data is selected from the PROMISE repository, which is publicly available at http://openscience.us. [22] In order to meet requirement 1, only defect prediction data will be considered. In order to meet requirement 2 while allowing the largest quantity of projects meeting requirement 3, only the CKJM-featured defect data will be considered for selection. A list of these datasets can be seen in Table 3.2

Table 3.2: CKJM datasets of the PROMISE repository

Project | Versions | Selected
Ant | 1.3, 1.4, 1.5, 1.6, 1.7 | Yes
Arc | 1 | No
Berek | 1 | No
Camel | 1.0, 1.2, 1.4, 1.6 | Yes
CKJM | 1 | No
e-Learning | 1 | No
Forrest | 0.6, 0.7, 0.8 | No
Intercafe | 1 | No
Ivy | 1.1, 1.4, 2.0 | Yes
Jedit | 3.2, 4.0, 4.1, 4.2, 4.3 | Yes
Kalkulator | 1 | No
Log4j | 1.0, 1.1, 1.2 | Yes
Lucene | 2.0, 2.2, 2.4 | Yes
Nieruchomosci | 1 | No
Pbeans | 1, 2 | No
Pdftranslator | 1 | No
Poi | 1.5, 2.0, 2.5, 3.0 | No
Prop | 1, 2, 3, 4, 5, 6 | No
Redaktor | 1 | No
Serapion | 1 | No
Synapse | 1.0, 1.1, 1.2 | Yes
Systemdata | 1 | No
Szybkafucha | 1 | No
Termoproject | 1 | No
Tomcat | 1 | No
Velocity | 1.4, 1.5, 1.6 | Yes
workflow | 1 | No
wspomaganiepi | 1 | No
Xalan | 2.4, 2.5, 2.6, 2.7 | Yes
Xerces | 1.0, 1.2, 1.3, 1.4 | Yes
Zuzel | 1 | No

3.4 Experimental Design

In order to experiment with parameter tuning, a set of Python scripts was developed for evaluating the performance of four different classifiers: the Multinomial Bayes, the Bernoulli Bayes, logistic regression, and random forests. In order to reduce the possibility of errors in the implementation of these learners and to facilitate reproducibility, SciKit-Learn's Python implementation of these learners was used. [28]

The Dataset class used in this project to manage the data and dataset attributes was re-used from Vasil Papakroni's masters work on the development of a machine learning algorithm called PEEKING2. [27]

A Python package called "BenTune" was created for the purpose of this thesis which includes, among others, a Learner class and a subclass for each of the selected learners. The Learner classes act as front-ends for the SciKit-Learn learners, accepting training and testing data in the form of Papakroni's Dataset objects and parameter dictionaries in a consistent format. The Learner objects then initialize the SciKit-Learn learners in the appropriate ways for each individual learner with parameters reflecting the Learner's parameter values and with training data coerced from the Dataset object into the proper format for the learner.

After initializing and training their Scikit-Learn learner, the Learner objects then evaluate its performance by coercing each row of the testing Dataset object into a format appropriate for the learner, and soliciting a prediction of that row’s class. These predictions along with the ground-truth value for each row are used to initialize an object of the Results class, which is also implemented as part of BenTune.
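The BenTune code itself is not reproduced here, but the following hypothetical sketch gives a flavor of the kind of front-end wrapper being described. All class and method names are illustrative assumptions, not the actual BenTune or Dataset APIs.

```python
# Hypothetical wrapper sketch: a uniform train/evaluate interface around a
# scikit-learn learner. Names are illustrative, not the thesis's BenTune API.
from sklearn.naive_bayes import BernoulliNB

class LearnerSketch:
    """Front-end that accepts a parameter dictionary in a consistent format."""
    def __init__(self, params):
        self.clf = BernoulliNB(**params)

    def train(self, rows, labels):
        self.clf.fit(rows, labels)

    def results(self, rows, labels):
        # Pair each ground-truth label with the corresponding predicted label,
        # which is the information a Results-style object would be built from.
        return list(zip(labels, self.clf.predict(rows)))

learner = LearnerSketch({"alpha": 1.0, "binarize": 0.5, "fit_prior": True})
learner.train([[0, 1], [1, 0], [1, 1], [0, 0]], [1, 0, 1, 0])
print(learner.results([[0, 1], [1, 0]], [1, 0]))
```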

The Results class contains methods for computing confusion matrices and performance metrics based on the pairs of ground-truth and predicted class values. Results also contains methods for combining with other results to form aggregations. In this way, a Results tree can be formed. Non-leaf nodes in such a tree can be statistically compared to other nodes of the same height using their children samples.

The general experimental design of this thesis is that the classes described above are manipulated in such a way as to conduct a grid-search of the parameter space of each selected learner. Each parameter tuning evaluated is done so using a 5-fold, 5-repeat, class-stratified cross-validation. The results of these cross-validation trials are each individually compared to the default tuning using a test of statistical significance. This is done in two different evaluation schemes, current-version and forward evaluation. Afterwards, the results are analyzed. Each of these individual steps is described in greater detail in the following sections.
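A heavily condensed sketch of that scheme is given below: each candidate tuning is cross-validated and its per-split scores are compared against the default tuning's scores with a Wilcoxon signed-rank test. The grid, the learner, the synthetic data, and the use of recall as a stand-in scorer (g is not a built-in scikit-learn scorer) are all simplifications for illustration; the actual trials are described in the following sections.

```python
# Sketch: grid of tunings, repeated stratified CV, Wilcoxon comparison to defaults.
from itertools import product
from scipy.stats import wilcoxon
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB

X, y = make_classification(n_samples=400, n_features=19, weights=[0.7], random_state=4)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=4)

def scores_for(params):
    return cross_val_score(BernoulliNB(**params), X, y, cv=cv, scoring="recall")

default_scores = scores_for({"alpha": 1.0, "binarize": 0.5})

for alpha, binarize in product([0.1, 1.0, 4.0], [0.25, 0.5, 0.75]):
    if (alpha, binarize) == (1.0, 0.5):
        continue                                  # skip the default tuning itself
    tuned_scores = scores_for({"alpha": alpha, "binarize": binarize})
    if (tuned_scores == default_scores).all():
        continue                                  # identical scores: nothing to test
    stat, p = wilcoxon(tuned_scores, default_scores)
    print(alpha, binarize,
          round(tuned_scores.mean() - default_scores.mean(), 3), round(p, 3))
```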

3.4.1 Parameter Grid Search

For the sake of relative completeness, a grid search method is used for finding parameter tunings which out-perform the default parameters. In a grid search, a list of possible values for each parameter of each learner is constructed, then all combinations of each listed parameter value are exhaustively tested. For discretely-valued parameters with a finite number of valid values, this is a simple matter of listing all valid parameter values. For parameters in a continuous regime, the definition of a list of values to be explored is somewhat arbitrary.

When selecting discrete parameter values from continuous ranges, it is tempting to define a minimum sensible value and a maximum sensible value and explore 100 values between the minimum and maximum spaced at regular intervals. However, as a matter of pragmatism, one must remember that the total computation time for a grid-search of all parameter permutations is an exponential problem. If a learner has three continuously-valued parameters and each is tested at 100 discrete values, the total number of cross-validated trials required to test all parameter permutations is 100^3 = 1 million trials. More practically, if the same three parameters are discretized into ten values each, a much more reasonable 10^3 = 1000 trials are required.
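The arithmetic can be checked with scikit-learn's ParameterGrid, which enumerates exactly these permutations; the parameter names below are placeholders.

```python
# Grid size grows as the product of the number of values per parameter.
from sklearn.model_selection import ParameterGrid

fine = ParameterGrid({"a": list(range(100)), "b": list(range(100)), "c": list(range(100))})
coarse = ParameterGrid({"a": list(range(10)), "b": list(range(10)), "c": list(range(10))})
print(len(fine), len(coarse))   # 1000000 vs. 1000 candidate tunings
```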

26 Table 3.3: Sci-kit Learn Multinomial Naive Bayes Parameters

Parameter | Description | Default | Values Tested | #
alpha | Smoothing coefficient: α = 1 for Laplace smoothing, 0 < α < 1 for Lidstone smoothing, α = 0 for no smoothing | 1.0 | 0, 0.1, ..., 3.9, 4.0 | 41
fit_prior | Determines whether or not class prior probabilities are included in classification. If false, class prior probabilities are assumed to be equal. | True | True, False | 2
Total Permutations: 82

In order to keep the parameter space small enough to be practically explored in a grid search, some decisions must be made about which values to explore for each continuously- valued parameter. These choices are the product of a rational line of reasoning, but can not, by their nature, be defended against all alternate sets of values which might have been explored instead. For instance, if {2, 4, 6, 8} is chosen as a rational set of values to be explored for a given parameter, it can not be argued to be any better or worse of a choice than {2.1, 4.1, 6.1, 8.1} or {1.9, 3.9, 5.9, 7.9} would have been.

For each learner which was tested, the parameter values used and their default values are listed in Tables 3.3 through 3.6. For the most part, default values from Scikit-Learn's documentation were used as defaults for this project. In the case of two Random Forest parameters, the defaults were modified, but not in ways which would greatly affect performance. For more details on this substitution, see Table 3.6.
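For illustration, a minimal sketch of how the modified Random Forest default described in Table 3.6 might be instantiated is shown below (the remaining arguments are the Scikit-Learn defaults reported in that table; this is an assumption-laden illustration, not BenTune's actual code):

from sklearn.ensemble import RandomForestClassifier

# Scikit-Learn's max_features='auto' and max_depth=None are replaced with the
# fixed values 4 and 75, as described in Table 3.6.
rf_default = RandomForestClassifier(n_estimators=10, criterion="gini",
                                    max_features=4, max_depth=75,
                                    min_samples_split=2, min_samples_leaf=1)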

For Logistic Regression, when use of one parameter value precludes certain values of other parameters, the original SciKit-Learn parameters were modified. This was done in two instances. In the first, Logistic Regression originally had a penalty parameter which could be set to 'l1' or 'l2' and a dual parameter which could be set to True or False, but could only be True if penalty was set to 'l2'. To simplify this, the intermediate LogisticRegression class which calls the SciKit-Learn logistic regression learner accepts only the penalty parameter, but with values of 'l1', 'l2', or 'l2 dual', and sets SciKit-Learn's penalty and dual parameters accordingly. For a description of the new penalty parameter, see Table 3.5.

Table 3.4: Scikit-Learn Bernoulli Naive Bayes Parameters

Parameter: alpha
    Description: Smoothing coefficient: α = 1 for Laplace smoothing, 0 < α < 1 for Lidstone smoothing, α = 0 for no smoothing
    Default: 1.0    Values tested: 0, 0.1, ..., 3.9, 4.0 (41 values)

Parameter: fit prior
    Description: Determines whether or not class prior probabilities are included in classification. If false, class prior probabilities are assumed to be equal.
    Default: True    Values tested: True, False (2 values)

Parameter: binarize
    Description: Threshold for binarizing input values. (Input values are normalized between 0 and 1 in preprocessing.)
    Default: 0.50    Values tested: 0, 0.05, ..., 0.95, 1.00 (21 values)

Total permutations: 1722

Table 3.5: Scikit-Learn Logistic Regression Parameters

Parameter: penalty
    Description: Specifies the norm used in regularization: L1 or L2. 'l2 dual' also specifies use of the dual formulation of the regression model, whereas 'l1' and 'l2' use the primal formulation.
    Default: 'l2'    Values tested: 'l1', 'l2', 'l2 dual' (3 values)

Parameter: C
    Description: Inverse regularization strength
    Default: 1.0    Values tested: 0.1, 0.2, ..., 2.9, 3.0 (30 values)

Parameter: intercept scaling
    Description: A constant-value synthetic feature added to the data. This scales β0 from Equation 3.12. If None, removes the constant from the decision function, effectively setting β0 = 0.
    Default: 1.0    Values tested: None, 0.1, 0.2, ..., 2.9, 3.0 (31 values)

Total permutations: 2790

Table 3.6: Scikit-Learn Random Forest Parameters

Parameter: n estimators
    Description: The number of 'trees' in the forest
    Default: 10    Values tested: 10, 20, 30, 50, 90 (5 values)

Parameter: criterion
    Description: Specifies the function used to test the fitness of a split. 'gini' represents Gini impurity, and 'entropy' represents information gain.
    Default: 'gini'    Values tested: 'gini', 'entropy' (2 values)

Parameter: max features
    Description: The number of random features to be considered when finding a split. Note: Scikit-Learn's default is 'auto', which determines a value based on √(n features). As there are 19 features in the CKJM metrics, and int(√19) = 4, 4 has been substituted for 'auto' as the default.
    Default: 4    Values tested: 2, 4, 8, 16 (4 values)

Parameter: max depth
    Description: Maximum tree depth. Note: Scikit-Learn's default is None. In order to prevent exceedingly large trees when min samples split and min samples leaf are small, 75 was substituted as the default.
    Default: 75    Values tested: 5, 15, 30, 50, 75 (5 values)

Parameter: min samples split
    Description: The minimum number of instances that must be in a node in order for that node to be eligible for a split
    Default: 2    Values tested: 1, 2, 4, 6 (4 values)

Parameter: min samples leaf
    Description: The minimum number of instances that must be in each leaf
    Default: 1    Values tested: 1, 2, 4, 8, 16 (5 values)

Total permutations: 4000


The other Logistic Regression parameter which was altered is intercept scaling. SciKit-Learn's Logistic Regression has a parameter called fit intercept which can be set to True or False. If fit intercept is True, then a floating point value must be provided for intercept scaling. In the intermediate LogisticRegression class which calls the SciKit-Learn logistic regression learner, the fit intercept parameter was eliminated, and intercept scaling was made to allow a floating point value or None. If intercept scaling is set to None, then SciKit-Learn's Logistic Regression is called with fit intercept=False. For a description of the new intercept scaling parameter, see Table 3.5.
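A rough sketch of how such an intermediate wrapper might map the simplified parameters onto Scikit-Learn's constructor is shown below. This is an illustration under the assumptions described above, not BenTune's actual class; the liblinear solver is specified because it supports the L1 penalty and the L2 dual formulation.

from sklearn.linear_model import LogisticRegression as SkLogisticRegression

def make_logistic_regression(penalty="l2", C=1.0, intercept_scaling=1.0):
    # 'l2 dual' maps to penalty='l2' with dual=True; 'l1' and 'l2' use the primal form.
    dual = (penalty == "l2 dual")
    sk_penalty = "l2" if dual else penalty
    # intercept_scaling=None maps to fit_intercept=False (no constant term).
    fit_intercept = intercept_scaling is not None
    return SkLogisticRegression(penalty=sk_penalty, dual=dual, C=C,
                                fit_intercept=fit_intercept,
                                intercept_scaling=intercept_scaling if fit_intercept else 1.0,
                                solver="liblinear")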

3.4.2 Cross-Validation Setup

Before we can begin to explore an experimental setup which can answer the research questions at hand, it is important to consider the validation strategy to be used. In this case, because multiple experimental treatments are being given to the same data, the results may be skewed due to sampling error if the data is re-sampled for cross-validation at the beginning of each phase of the experiments. Therefore, it makes the most sense to divide the data sets into cross-validation folds and repeats and save the subsets to disk. Then, rather than loading and re-sampling the data for each treatment of each experiment, each fold and repeat of the cross validations can be loaded from their individual files and the sampling will remain constant across all treatments. In this way, the results can be compared using a paired significance test.

For this thesis, stratified cross-validation is used. In stratified cross-validation, the distribution of classes is preserved while subsampling. In a two-class case, this distribution of classes is equivalent to the positive-to-negative ratio. Stratified cross-validation is implemented by sampling from the set of positive examples independently from the set of negative examples.

In this case, a 5x5 cross-validation study has been chosen, as this size yields a set of 25 evaluation scores for each treatment and version while maintaining a train/test split of 80%/20%. This means that each dataset is divided randomly into five sub-samples or "folds" and this sub-sampling process is repeated five times. The pseudocode for the generation of cross-validation subsets is given in Algorithm 1.

Algorithm 1: Split Data For Cross-Validation
Data: There are P projects, each with V(p) versions;
Helping Functions:
    getDataset(p, v) returns the dataset of version index v of project index p;
    stratified random sample(D, n) splits D into n equal-width bins stratified by class;
    export dataset(D, filename) saves D to disk;

for p ∈ [0 : P] do
    for v ∈ [0 : V(p)] do
        current version ← getDataset(p, v);
        for repeat ∈ [0 : 5] do
            F[0 : 5] ← stratified random sample(current version, 5);
            for fold ∈ [0 : 5] do
                export dataset(F[fold], p v repeat fold.arff);
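For reference, a minimal Python sketch of the same splitting scheme using Scikit-Learn's StratifiedKFold is shown below. The thesis's own splitting code is not reproduced here, and the NumPy file format is a stand-in for the ARFF files described above.

import numpy as np
from sklearn.model_selection import StratifiedKFold

def export_cv_splits(X, y, project, version, n_repeats=5, n_folds=5):
    # Save class-stratified 5x5 cross-validation folds to disk so that every
    # tuning is later evaluated on exactly the same subsamples.
    for repeat in range(n_repeats):
        skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=repeat)
        for fold, (_, fold_idx) in enumerate(skf.split(X, y)):
            np.savez(f"{project}_{version}_{repeat}_{fold}.npz",
                     X=X[fold_idx], y=y[fold_idx])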

3.4.3 Current-Version Vs. Forward-Version Evaluation

Research question one is stated as “Does parameter tuning lead to increased classification performance for defect prediction data when compared to default parameters?” It can be answered by comparing the results yielded by testing learners of various parameter permutations to the results yielded by testing the same learners on the same data but with default parameters.

Research question two is stated as “Is there a difference in the effects of parameter tuning when a trained classifier is tested within a single software version vs. the next version of the same software?” This can be tested by looking at the parameter tunings that perform relatively well when trained and tested within a single version of a software project and comparing their performance when trained on one version and tested on the next. To answer this, each parameter tuning must be compared to default tunings under two different scenarios: a current evaluation and a forward evaluation.

In the current evaluation scenario, a learner is trained and tested on data from the same software version. This is not to say that the training and testing data are the same. Using a five-fold cross-validation, a learner is trained on four of the folds while being tested on the other. This is repeated so that each fold of each repeat of the cross-validation scheme serves as the test set, with all non-testing folds from the same repeat used as the training set.

Similarly, in the forward evaluation scenario, the cross-validation scheme described above is used, but with one difference: the testing data comes from the next software version after the version used for training. Four of five folds from the “current” version are still used to train the classifier, but the corresponding fold and repeat from the next version is substituted as the testing set.
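This fold substitution can be sketched as follows (load_fold and make_learner are hypothetical helpers standing in for BenTune's Dataset and Learner classes, which are not reproduced here):

import numpy as np
from sklearn.metrics import confusion_matrix

def forward_sub_trial(project, version, repeat, fold, tuning):
    # Train on the four non-test folds of the current version...
    train = [load_fold(project, version, repeat, f) for f in range(5) if f != fold]
    X_train = np.vstack([X for X, _ in train])
    y_train = np.concatenate([y for _, y in train])
    # ...but test on the corresponding fold and repeat of the next version.
    X_test, y_test = load_fold(project, version + 1, repeat, fold)
    clf = make_learner(tuning).fit(X_train, y_train)
    return confusion_matrix(y_test, clf.predict(X_test))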

3.4.4 Machine Learning Trials

Due to the large amount of CPU time required to perform the hundreds of thousands of cross-validated machine learning trials, the procedure for performing the trials is implemented separately from the procedure for analyzing and comparing the relative performance of the trials. In order to do this, each project, version, learning scheme, and parameter permutation is assigned a unique index. The process which performs the learning trials saves the confusion matrix resulting from the testing of each cross-validation fold and repeat to a CSV file with a name reflecting the project, version, learner, and parameter indexes. Several of these machine learning trial processes are run in parallel on a multi-core computer, and the confusion matrices are analyzed and compared post hoc by another script.

It should be noted that the trials required for the current evaluation scenario described in the previous section in reference to RQ2 are also sufficient to answer RQ1. Therefore, a combined trials script can be used to generate the performance data necessary to answer both questions. The pseudocode for the RQ1 and RQ2 trials is given in Algorithm 2.

In the actual Python implementation of this pseudocode, the output file is a CSV with 50 rows and four columns. Each pair of rows represents the confusion matrices from a different cross-validation fold and repeat. The first two columns represent confusion matrices from the current evaluation scenario and the third and fourth columns represent confusion matrices from the forward evaluation scenario. It should also be noted that in the actual implementation, the processes corresponding to each combination of project, version, learner, and tuning are parallelized for speed. In other words, everything inside of the fourth for-loop of Algorithm 2 may be executed asynchronously by any one of a handful of worker processes.
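A minimal sketch of how this parallelization might look in Python is given below; worker_kernel is the hypothetical function described in Algorithm 2, and the .versions and .tunings attributes are assumptions for illustration, not BenTune's actual interface.

from multiprocessing import Pool

def run_all_trials(projects, learners, n_workers=2):
    # Enumerate every (project, version, learner, tuning) combination,
    # including the default tuning of each learner.
    work_queue = [(p, v, l, t)
                  for p in projects
                  for v in p.versions
                  for l in learners
                  for t in l.tunings]
    # Each task writes its own CSV of confusion matrices, so nothing needs
    # to be returned to the parent process.
    with Pool(processes=n_workers) as pool:
        pool.starmap(worker_kernel, work_queue)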

In total, 309,528 of these trials were performed, each with 25 current-version cross-validation sub-trials; 223,548 of them also included 25 forward cross-validation sub-trials, for 13,326,900 total sub-trials. These were completed by 2 worker threads on a 3.6 GHz Intel Core 2 Duo machine running Linux Mint 17 x64 in 2 days, 12 hours, and 48 minutes.

3.5 Statistical Methods

3.5.1 Testing Significance with the Wilcoxon Signed Rank Test

After the machine learning trials were conducted, the performance of each parameter tuning was compared to the performance of the default tuning. The first step in doing this is to label the performance of each tuning on each dataset into one of three categories: significantly better than default, significantly worse than default, and not significantly different from default.

Algorithm 2: Learning Trials for RQ 1 and RQ 2
Data:
    Each project has multiple versions associated with it;
    Each version has 25 files for 5x5 cross-validation folds and repeats;
    Each learner has multiple parameter "tunings" associated with it;
Helping Functions:
    load dataset(version, repeat, fold) loads and returns a dataset file;
    join datasets(...) joins all datasets passed to it and returns the superset;
    learner.apply parameters(P) returns a copy of learner with parameters P;
    learner.train(D) trains learner on dataset D;
    learner.test(D) tests learner on D and returns the confusion matrix;
    file(p, v, l, t) returns an open file with name unique to p, v, l, t;
    execute async(fn, args) calls function fn with arguments args in a new process;

def worker kernel(project, version, learner, tuning):
    for repeat ∈ [0 : 5] do
        for fold ∈ [0 : 5] do
            cur version[fold] ← load dataset(version, repeat, fold);
        for fold ∈ [0 : 5] do
            training set ← join datasets(cur version where index ≠ fold);
            cur test ← cur version[fold];
            learner ← learner.apply parameters(tuning);
            learner.train(training set);
            output[repeat][fold][0] ← learner.test(cur test);
            if version + 1 exists then
                nxt test ← load dataset(version + 1, repeat, fold);
                output[repeat][fold][1] ← learner.test(nxt test);
            else
                output[repeat][fold][1] ← null;
    file(project, version, learner, tuning) ← output;

work queue ← [ ];
for project ∈ Projects do
    for version ∈ project do
        for learner ∈ Learners do
            for tuning ∈ learner do  // including the default tuning
                work queue.append((project, version, learner, tuning));

while work queue ≠ [ ] do
    if CPU is free then
        execute async(worker kernel, work queue.pop());

To do this, the set of g metrics (see Section 3.2.2) for each of the 25 cross-validation sub-trials of each tuning is compared to the set of g metrics for the default tuning. This is done on a per-dataset basis. A test of statistical significance is applied using the g metrics of the 25 cross-validation sub-trials for a given dataset as samples.
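As a reminder of what is being compared, a small helper computing g from a confusion matrix is sketched below, assuming the Section 3.2.2 definition of g as the harmonic mean of pD and 1 − pF; the variable names are ours, not BenTune's.

def g_metric(tp, fn, fp, tn):
    pd = tp / (tp + fn) if (tp + fn) else 0.0  # probability of detection (recall)
    pf = fp / (fp + tn) if (fp + tn) else 0.0  # probability of false alarm
    if pd + (1 - pf) == 0:
        return 0.0
    return 2 * pd * (1 - pf) / (pd + (1 - pf))  # harmonic mean of pd and 1 - pf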

In choosing a test of significance, it is important to consider two things. Firstly, the performance data may or may not be normally distributed, so a non-parametric test is most appropriate. Secondly, because each tuning is tested on the same cross-validation folds and repeats as the default tuning, the performance data can be considered paired samples. The only difference between the populations is which tuning “treatment” was applied. Due to the paired nature of this data, a paired test can be applied for greater statistical power. Given these two considerations, the Wilcoxon signed rank test was chosen for determining whether or not a given parameter tuning performed significantly differently from the default tuning on the same data. The specific implementation of the Wilcoxon test used comes from the scipy.stats package [16] with the zero-difference behavior set to “zsplit”. For more information, see the Scipy Wilcoxon documentation at http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html
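A minimal sketch of this per-dataset comparison is shown below; the two arrays are illustrative stand-ins for the 25 paired g values that, in the actual experiment, come from the cross-validation sub-trials.

import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
g_default = rng.uniform(0.4, 0.6, size=25)              # default tuning's g values
g_tuned = g_default + rng.normal(0.05, 0.02, size=25)   # same folds, candidate tuning

# Paired test; zero_method="zsplit" splits zero-differences between the
# positive and negative ranks, matching the configuration described above.
stat, p_value = wilcoxon(g_tuned, g_default, zero_method="zsplit")
print(stat, p_value)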

The Wilcoxon test has three main assumptions. Firstly, the data must be paired samples. As discussed above, this assumption is met because the data and learners remain the same, with the only difference being the parameter tuning used. Secondly, the scale of the values tested must be ordinal or greater. This assumption is met as the g values are numeric. Finally, each pair must be independent from the other pairs. This assumption is satisfied because the g values are a result of the performance when testing on each fold and repeat of the cross-validation partitions. Each of these folds is chosen from stratified random subsampling of the whole dataset and thus constitutes an independent sample.

The scipy.stats.wilcoxon function returns the largest rank sum from the Wilcoxon test as well as a p-value which is based on the assumption of normality of the expected distribution of Wilcoxon signed rank sums. This assumption is rooted in the asymptotic normality of the Wilcoxon rank sum under the null hypothesis and states that the distribution of rank sums may be assumed to be normal for large sample sizes. [31] The Scipy documentation recommends that sample sizes be greater than 20 in order for this assumption to be valid and for the p-values to be trusted. In this case, the sample size for the comparison of cross-validation trials is 25.

3.5.2 Limiting False Discovery for Multiple Hypothesis Testing with the Benjamini-Hochberg Procedure

Typically, when a test of significance is applied, an α value is set prior to testing. The test is considered to indicate significance if the p-value associated with the null hypothesis is less than the chosen α. In this way, the odds of incorrectly rejecting the null hypothesis in a given test are at most α.

When many alternative hypotheses are tested against a null hypothesis in this manner, it is expected that for some of the tests, the null hypothesis will be incorrectly rejected in favor of the alternative hypothesis. If the confidence that the null hypothesis is not falsely rejected for each test is 1 − α, then the overall confidence that there are no false discoveries in all of the tests can be expressed as (1 − α)^m, where m is the total number of tests conducted. For α = 0.05, the overall confidence of one test is 95%, the overall confidence of five tests is 77%, and the overall confidence of fifty tests is 8%.

Consider then our comparison of 4000 random forest parameter tunings to the default tuning. At α = 0.05, our overall confidence in no false discoveries for all of these test results is about 7.8 × 10^−90. To put that in perspective, imagine that the rules of the Daily 3 lottery were re-written so that instead of drawing three balls out of a cage, three atoms were drawn out of a 1,000-gallon tank of water. Under these rules, the odds of winning the lottery would still be more than twice as good as the chance of no false discoveries in all 4,000 tests.
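These family-wise confidence figures are straightforward to verify (a quick check of the arithmetic above, nothing more):

# Chance of zero false discoveries across m independent tests at alpha = 0.05.
alpha = 0.05
for m in (1, 5, 50, 4000):
    print(m, (1 - alpha) ** m)
# -> 0.95, ~0.77, ~0.08, and ~7.8e-90 respectively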

The most conservative and simplest strategy for dealing with this problem of multiple comparisons is the Bonferroni correction. This states that instead of rejecting the null hypothesis when the p-value is less than α, the null hypothesis should only be rejected in cases where the p-value is less than α/m, where m is the total number of tests performed. [25] Under the Bonferroni correction, if 4,000 tests are performed with a desired α of 0.05, the threshold for significance is 0.0000125, which is much too conservative for the effects observed in this thesis.

Clearly, attempting to say with any level of confidence that no false discoveries were made due to the multiple comparisons problem in a study of this size is a fool’s errand. In some studies, the presence of any false discoveries can have dire consequences, but in this study a few false discoveries are tolerable and even to be expected as long as their effects on the conclusions can be quantified and controlled. In order to proceed to a more practical means of limiting false discoveries, consider the confusion matrix for m tests of significance below.

                    Null-Hyp Rejected    Null-Hyp Not Rejected    Total
Null-Hyp False      TP                   FN                       S
Null-Hyp True       FP                   TN                       m − S
Total               R                    m − R                    m

Figure 3.3: Confusion Matrix for m tests of significance

In the context of Figure 3.3, assume m tests are performed at an individual level of significance of α. E(x) will be used to represent the expected value of x and P(x) will be used to represent the probability of x. The per-comparison error rate (PCER) is defined as E(FP)/m, the family-wise error rate (FWER) is defined as P(FP > 0), and the false discovery rate (FDR) is defined as E(FP/R).

Bonferroni correction attempts to hold FP = 0 by adjusting α of the individual tests such that the FWER is equal to a prescribed value, typically 0.05 or less. This typically results in a decrease in TP and an increase in FN when compared to less-conservative methods of handling the multiple comparison problem.

Many methods of FWER control have been proposed which are less conservative than the Bonferroni correction. Several of these were attempted in the course of the present research, but they were all found to be too conservative to preserve some of the weaker signals observed given the number of tests performed. Most notably, Hochberg's 1988 step-up procedure [14] was found to have sufficient power to detect the effects of parameter tuning on the Bernoulli Bayes learner, but was too conservative to yield positive results for parameter tuning on any of the other learners.

If some false positives can be tolerated, then the FWER can be abandoned entirely. If it is sufficient to claim that at least one TP exists, then a very simplistic approach involving PCER can be taken. If R/m > PCER, then at least one TP is expected to exist. If only one test is conducted, then by definition, PCER = α. For multiple tests, PCER ≤ α. [3] Therefore, by syllogism, at least one TP is expected whenever R/m > α. This approach is effective if the objective is only to demonstrate the likely existence of at least one TP and R/m is much greater than α, but it does not make any statement about how many TP results exist. It may also be somewhat dubious when R/m is close to α.

A more robust approach is to prescribe the FDR to an acceptable level. In this way, the expected number of true positives can be stated as E(TP) = R(1 − FDR). Benjamini and Hochberg provide a method of controlling FDR. [3] Their procedure involves sorting all alternative hypotheses, H1, H2, ..., Hm, by their p-values P1, P2, ..., Pm from least to greatest. They then reject the null hypotheses in favor of H1, H2, ..., Hk where

    k is the largest i for which Pi ≤ (i/m) q∗        (3.16)

and q∗ is the target FDR. Benjamini and Hochberg prove that FDR ≤ q∗ without assuming independence of the test statistics P1, P2, ..., Pm. This procedure was ultimately used to determine the significance of the difference in performance between each alternative tuning and the default tuning. The FDR was limited using q∗ = 0.05.
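For concreteness, a minimal sketch of this step-up procedure in Python is given below (illustrative only; the thesis's own implementation is not reproduced):

import numpy as np

def benjamini_hochberg(p_values, q_star=0.05):
    # Returns a boolean mask of hypotheses whose null is rejected at target FDR q_star.
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)                                      # sort p-values ascending
    below = p[order] <= (np.arange(1, m + 1) / m) * q_star     # Pi <= (i/m) * q*
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True                                   # reject the k smallest p-values
    return reject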

3.5.3 Comparing Current and Forward Learning Performance

As discussed in Section 1.2, an increase in current-version performance may be an interesting academic result, but doesn't necessarily mean that parameter tuning is helpful in a real-world scenario. In order to make business sense, parameters which performed well when evaluated on current versions must also perform well when extrapolating from the current versions to future versions. For this, we must analyze the “forward” step results in an attempt to answer RQ 2.

In the same way that a tuning can be labeled as “better”, “worse”, or “not better or worse” than the default for current-version performance, the same labels can be applied to that tuning's forward performance. As with the current-version significance tests, the Wilcoxon test is used to compare a tuning's g metric, when trained on the current version and tested on the next version, to the default tuning's g metric using the same current-version training data and future-version testing data. Significance or nonsignificance is then determined using the Benjamini-Hochberg procedure.

By labeling each parameter tuning with respect to both its current-version performance and its forward performance, it can be counted as one of nine scenarios, as shown in Table 3.7. The scenario that represents the best solution to the tuning problem is the combination of superior performance in current-version learning and superior performance in forward learning. This scenario is the bottom right corner of Table 3.7.

All other scenarios in the bottom row represent cases in which a tuning that would appear to perform well when tested within the current version does not perform as well when tested in the forward version. In a real-world situation, where the forward performance cannot be directly evaluated and must be inferred from the current-version performance, the three scenarios in the bottom row are indistinguishable from one another. If results from all of these tunings were to be amalgamated into one set of predictions, some of the individual tunings may help or harm the value of that set of predictions as compared to the predictions

Table 3.7: Version-Forward Learning Outcomes

                    forward<default     forward≈default     forward>default
current<default
current≈default
current>default     harmful tunings     neutral tunings     helpful tunings

which would be produced by the default tuning. For this reason, the situations represented in the “current>default” row can be considered helpful, harmful, or neutral depending on their forward performance.

The other interesting scenarios in Table 3.7 are the ones in which forward performance was good, but current-version performance is less than or equivalent to the default. These represent “missed” tunings which would achieve positive outcomes in the forward scenario, but whose outcomes would not have been predicted by their current-version performance. Missed tunings would not necessarily harm the accuracy of an amalgamated tuned forward result, but if identified from their current-version performance, they could contribute to it.

3.5.4 Splitting Helpful Hairs with Effect Size

For the case where a tuning’s performance is determined to be significantly better than that of the default tuning, one might wonder if the size of that effect has any bearing on forward performance. In other words, are forward predictions any better from tunings that performed vastly better than defaults in current evaluation compared to tunings that performed marginally better in current evaluation?

Answering this question requires a measure of effect size. For the Wilcoxon test, effect size can be approximated on the same scale as Pearson's correlation coefficient, r, by dividing the Z-value by the square root of the total number of observations. For a paired test, the total number of observations is twice the number of pairs. [26] In this case, cross-validations with a sample size of 25 are being compared, so the total number of observations is 50. The

Table 3.8: Cohen '88 Effect Sizes

Effect Size    "Small"      "Medium"     "Large"
r range        0.1 - 0.3    0.3 - 0.5    0.5+

Z-values in this case are tabulated using the inverse survival function scipy.stats.norm.isf(α) which returns a Z-score based on an assumption of normality and an alpha value.

    r = Z/√N = Z/√(25 × 2) = Z/√50        (3.17)

After the Z-score has been calculated from the α value with the normality assumption and the correlation coefficient, r, has been approximated from the Z-score and the sample size, some meaningful interpretation of r must be applied. Here, we will use the “small”, “medium”, and “large” effect sizes and their respective thresholds as suggested by Cohen '88 [9], which are listed in Table 3.8. It should be noted that the lower limit for a “small” effect is not applicable in this context, as the effect size is only considered for items which are determined to be statistically significant. At α = 0.05, the corresponding effect size is r = 0.23, but due to the application of the Benjamini-Hochberg procedure, an r as low as 0.23 is rather unlikely.
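The r = 0.23 figure quoted above can be reproduced directly from Equation 3.17 (a quick check under the stated normality assumption):

from math import sqrt
from scipy.stats import norm

def wilcoxon_effect_size(alpha, n_pairs=25):
    z = norm.isf(alpha)           # Z-score from the inverse survival function
    return z / sqrt(2 * n_pairs)  # r = Z / sqrt(N), with N = 2 * number of pairs

print(round(wilcoxon_effect_size(0.05), 2))  # -> 0.23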

By applying these labels, the bottom row of Table 3.7 can be split into three rows, shown in Table 3.9. Note that this table only contains results which performed significantly superior to the default parameters in current evaluation. If the effect size of current-version performance matters in predicting forward performance, we would expect the “Large Effect” row to have the largest percentage of “helpful” results and the smallest percentage of “harmful” results of any of the rows.

Table 3.9: Version-Forward Learning Outcomes by Effect Size

                    forward<default     forward≈default     forward>default
Small Effect        harmful             neutral             helpful
Medium Effect       harmful             neutral             helpful
Large Effect        harmful             neutral             helpful

Chapter 4

Results

The results for each machine learning scheme will be presented in this chapter starting with the learner which produces the least evidence for the efficacy of parameter tuning and progressing in order of strength of evidence to the learner which produced the most positive result.

4.1 Multinomial Naive Bayes: a Negative Outcome

By classifying each tuning of Multinomial NB which was explored into “worse than”, “better than”, or “not better or worse than” the default tuning and counting the results, we can see that very few of Multinomial NB's parameter tunings ever performed significantly better than default on the datasets tested. 82 parameter tunings were tested in the current-version evaluation case on 36 datasets, yielding 82 × 36 = 2952 individual current parameter-specific results. Of these 2952 results, only 16 (0.5%) were determined to be significantly better than default. Strong evidence that Multinomial NB benefits from parameter tuning was not found with the datasets selected and the set of parameters tested in this study.

Table 4.1: Version-Forward Learning Outcomes: Multinomial Naive Bayes Total Counts

                    forward<default     forward≈default     forward>default
current>default     0 (0%)              0 (0%)              0 (0%)

If we explore the current-version Multinomial NB results on a per-dataset basis, we find that the 16 positive cases came entirely from two datasets: 13 from xalan-2.7 and 3 from xerces-2.4. These correspond to 16% and 4% of the tunings tested on these sets.

In the forward learning case, parameter tuning of Multinomial NB fared slightly better. Out of the 2132 forward trials performed, 123 of them (6%) were found to be significantly better than default. At first glance, this result seems somewhat positive, but closer examination reveals that none of the parameter tunings which succeeded in the forward case were predicted by success in the current-version case. This is demonstrated by the summed results in Table 4.1. The green cell of the table (bottom right, with a value of 0) represents the number of tunings which succeeded in both the current and forward cases.

An examination of the box plots in Figure 4.1 reveals that for most datasets, there was an extremely narrow range of performance across all the parameter tunings tried in both current and forward evaluation. It appears from these results that parameter selection has little effect on the performance of this implementation of the multinomial naive Bayes classifier.

(a) Multinomial Naive Bayes Current-Version Mean g

(b) Multinomial Naive Bayes Forward Mean g

Figure 4.1: Multinomial Naive Bayes Mean g Box Plots. The g value for each tuning is averaged across all 25 cross-validation sub-trials. The default tuning’s mean performance is indicated by a green diamond.

4.2 Logistic Regression: a Mixed Outcome

For logistic regression, 2790 individual parameter tunings were tested on each of the 36 datasets. Of these 100,440 total current-version trials, 1.3% were found to be significantly better than default. Only 9 of the 36 datasets had at least one tuning that was found to perform better than default. For 15 of the 36 datasets, there were no tunings that were found to perform better or worse than the default. It appears current-version tuning can affect logistic regression's performance on some but not all of the datasets tested.

One might reasonably be inclined to ask “Does the existence of tunings which outperform the defaults indicate that parameter tuning works, or just that the defaults are no good?” as in RQ4. In other words, “Should the default just be changed to the top-performing tuning?” This can be answered by examining a histogram of successful parameter tunings. Figure 4.2 shows the number of datasets for which each tuning is considered to be significantly better than default. It should be noted that even the best performing tuning is only considered to be better than default on 2 of the 36 datasets. Recall that the benefits from tuning were found in a total of 9 datasets. This implies that the parameters which work best for one dataset do not necessarily work best for all datasets. In other words, selecting a “better” default parameter tuning will not lead to a ubiquitous increase in performance.

In the forward learning case, the logistic regression results do not present particularly positive evidence for the efficacy of parameter tuning. While about 5.8% of the 72,540 forward trials were determined to represent performance superior to default tunings, they do not seem to be strongly correlated with superior performance in the current evaluation regime, at least not in aggregate. Table 4.2 demonstrates this. Note that if a given tuning performed significantly better than the default tuning in current evaluation, it only had a 1.7% chance of performing better than defaults in forward evaluation. Conversely, the overwhelming majority (99.6%) of tunings which fared better than default in forward evaluation were not significantly better than default in current evaluation.

Figure 4.2: Histogram of Logistic Regression Current-Version Successful Parameter Tunings. The vertical axis indicates the number of datasets for which a given parameter tuning is determined to be significantly better than the default tuning.

When the logistic regression results are examined on a per-project basis, it can be seen that one software project, Xalan, accounts for all 14 of the cases shown in the bottom right cell of Table 4.2. This cell represents situations in which superior forward performance was predicted by superior current-version performance. Eleven of these cases came from forward learning from xalan-2.5 to xalan-2.6 and three came from forward learning from xalan-2.6 to xalan-2.7.

In the Xalan-only aggregates of Table 4.3, signs of a weak correlation can be seen. The majority of the bottom-row cases, where performance is better than default tunings in current evaluation, still result in mediocre performance in forward evaluation, but this time 46% of these bottom row cases result in superior forward performance. While a 46% overlap between superior current-version performance and superior forward performance in one project out

Table 4.2: Version-Forward Learning Outcomes: Logistic Regression Total Counts

                    forward<default     forward≈default     forward>default
current>default     0 (0%)              819 (98%)           14 (1%)

of 10 is hardly an earth-shattering success, it at least provides a hint that for some datasets, parameter tuning of a logistic regression classifier may be worthwhile. This hint of success on a subset of the projects studied should not overshadow the aggregate results of Table 4.2 which show that on the whole, logistic regression tuning may not necessarily be worthwhile.

Table 4.3: Version-Forward Learning Outcomes: Logistic Regression Xalan counts

                    forward<default     forward≈default     forward>default
current>default     0 (0%)              16 (53%)            14 (46%)

Some sense of why tuning was so ineffective can be gained by examining the boxplots of Figure 4.3. Here we can see that for most datasets, the default parameters (green diamonds) perform quite well relative to all other tunings tested. In a few cases, this is not true, but in most places where there is much room for improvement in the forward learning case, such as camel-1.0, ivy-1.4, velocity-1.4, and xerces-1.3, the default tunings performed about as well in current-version evaluation as the best tunings tested. One notable exception to this is xerces-1.2, where the boxplots would seem to indicate that there was much room for improvement in both current and forward evaluation, but under the Benjamini-Hochberg procedure, none of the forward learning trials were found to be significantly different from the default.

(a) Logistic Regression Current-Version Mean g

(b) Logistic Regression Forward Mean g

Figure 4.3: Logistic Regression Mean g Box Plots. The g value for each tuning is averaged across all 25 cross-validation sub-trials. The default tuning’s mean performance is indicated by a green diamond.

4.3 Random Forest: a Positive Outcome

For the Random Forest learner in the current-version evaluation scheme, there were 16 of the 36 datasets for which none of the 4000 parameter tunings tested were determined to be significantly better than default. This leaves 20 of the 36 datasets for which we can confidently say that some parameters outperformed the default parameters. These current-version successful tunings constitute 12% of all current-version trials and 22% of all trials on the 20 datasets where any tuning was found to be significantly better than default.

As with the logistic regression results, one might ask “Does the existence of tunings which outperform the defaults indicate that parameter tuning works, or just that the defaults are no good? Should the default just be changed to the top-performing tuning?” The answer lies in Figure 4.4, which shows the number of datasets for which each tuning is considered to be significantly better than default. It should be noted that even the best performing tuning is considered to be better than default on 14 of the 36 datasets. This means that there were six datasets for which some tuning performed significantly better than default, but the overall best tuning did not. Furthermore, almost all of the 4000 tunings tested performed better than the default tuning on at least one dataset. These two facts imply that the parameters which work best for one dataset do not necessarily work best for all datasets and that a change of defaults is unlikely to lead to a ubiquitous increase in performance.

When the forward-learning case is considered, the Random Forest results present an even stronger case for the efficacy of parameter tuning. By examining the matrix of current and forward performance categories defined in Table 3.7, there is a clear correlation between the success of tunings in the current learning scenario and their likelihood of success in forward learning when trained on the same dataset. Table 4.4 shows that a tuning which does not perform significantly better or worse than default on a dataset's current-version evaluation only had a 5% chance of performing significantly better than default in forward evaluation. This chance rose to 47% if the tuning performed significantly better than default on evaluations

Figure 4.4: Histogram of Random Forest Current-Version Successful Parameter Tunings. The vertical axis indicates the number of datasets for which a given parameter tuning is determined to be significantly better than the default tuning.

within the current version. This is even more impressive when coupled with the fact that for these tunings, the chance of performing significantly worse than default in forward evaluation is less than 1%.

Table 4.4: Version-Forward Learning Outcomes: Random Forest Total Counts

                    forward<default     forward≈default     forward>default
current>default     32 (0%)             6536 (52%)          5874 (47%)

This pattern becomes even more clear when the “current>default” case in the bottom row of Table 4.4 is split by effect size. All rows in Table 4.5 represent cases for which tunings perform significantly better than defaults. Each row represents a different effect size for the difference between the performance of a given tuning and the performance of the default tuning. Clearly, for all of these cases, performances significantly inferior to defaults in the forward case are unlikely. It is also apparent that larger effect sizes lead to an increased likelihood of significantly superior performance in the forward case. In this case, if a parameter tuning could be shown to significantly outperform the default parameters with a large effect size in evaluation within the current version, it was 95% likely to outperform the defaults in forward evaluation.

Table 4.5: Version-Forward Learning Outcomes by Effect Size: Random Forest

                    forward<default     forward≈default     forward>default
Small Effect        4                   849                 428 (33%)
Medium Effect       28                  5630                4337 (43%)
Large Effect        0                   57                  1109 (95%)

When the aggregate results presented in Table 4.4 are broken down by software project, it becomes apparent that this effect is entirely due to the ant, camel, xalan, and xerces projects. On ivy, lucene, and velocity, no tunings were found to be significantly superior to default in either current or forward evaluation. On jedit, log4j, and synapse, some successful tunings were found in current evaluation, but less than 1% of tunings were found to be successful in forward evaluation.

This project-level success breakdown is also reflected in some ways by the boxplots in Figure 4.5. In ant, camel, xalan, and xerces, we generally see wide margins for improvement in both current and forward performance. In ivy, jedit, log4j, lucene, synapse, and velocity we generally see much narrower margins for improvement in forward evaluation.

In regards to the effect size trend seen in Table 4.5, the pattern of large effect sizes producing more superior forward results than medium effect sizes held true in the project-level aggregations for all applicable individual projects. There were two cases where small effect sizes seemed to produce better results than medium or large effect sizes, but this may be due to the small sample sizes (15 and 5) of the “small effect” category.

(a) Random Forest Current-Version Mean g

(b) Random Forest Forward Mean g

Figure 4.5: Random Forest Mean g Box Plots. The g value for each tuning is averaged across all 25 cross-validation sub-trials. The default tuning’s mean performance is indicated by a green diamond.

4.4 Bernoulli Naive Bayes: a Very Positive Outcome

In current-version evaluation, at least 4.6% of the parameter tunings tried on every dataset were found to be significantly better than the default tunings. Examination of the parameter success histogram in Figure 4.6 reveals that the top-performing parameter was successful on 33 of the 36 datasets. Compared to the other learners studied in this thesis, Bernoulli Bayes has the highest rate of overall current version tuning success and also the highest rate of success of the tested parameters as evidenced by the histogram in Figure 4.6. This seems to imply that parameter tuning is highly effective on this learner, but that the parameters are only somewhat dataset-dependent. Picking a better set of default values would go a long way towards improving performance here.

Figure 4.6: Histogram of Bernoulli Bayes Current-Version Successful Parameter Tunings. The vertical axis indicates the number of datasets for which a given parameter tuning is determined to be significantly better than the default tuning.

The forward learning aggregate result matrix for the Bernoulli Bayes classifier also shows the most positive result of this study. As can be seen in the bottom row of Table 4.6, a tuning that performed significantly better than default in current evaluation had an 85% chance of performing better than default in forward evaluation and only a 2% chance of performing worse! Furthermore, a clear correlation is evidenced by examination of the cells on the diagonal of Table 4.6. In each row, the element which is part of the diagonal holds the row's largest plurality of cases. When broken down on a per-dataset basis, it was found to be true in 24 of 26 cases that the majority of tunings which performed better than default in current evaluation also did so in forward evaluation. The exceptions were velocity-1.4, with 45%, and jedit-4.2, with 6%, of good current-version tunings being helpful.

Table 4.6: Version-Forward Learning Outcomes: Bernoulli Bayes Total Counts

                    forward<default     forward≈default     forward>default
current>default     474 (2%)            2413 (11%)          17323 (85%)

The success of forward parameter tuning of Bernoulli Bayes is further accentuated by the effect-size breakdown in Table 4.7. Tunings with larger effect sizes in current evaluation were more numerous and more likely to be successful in forward evaluation.

Table 4.7: Version-Forward Learning Outcomes by Effect Size: Bernoulli Bayes

                    forward<default     forward≈default     forward>default
Small Effect        16                  136                 381 (71%)
Medium Effect       73                  1017                4213 (79%)
Large Effect        385                 1260                12729 (88%)

The boxplots of individual dataset performance in Figure 4.7 show generally low performance of the default parameters, which is consistent with the histogram in Figure 4.6. They also show a wide range of performance across the parameter space on most datasets, which would imply that performance is highly dependent on choosing the right parameter values. Unfortunately, these boxplots do not convey any clear reason as to why jedit-4.2 and velocity-1.4 performed more poorly than other sets in tuning for forward learning.

(a) Bernoulli Naive Bayes Current-Version Mean g

(b) Bernoulli Naive Bayes Forward Mean g

Figure 4.7: Bernoulli Naive Bayes Mean g Box Plots. The g value for each tuning is averaged across all 25 cross-validation sub-trials. The default tuning’s mean performance is indicated by a green diamond.

Chapter 5

Threats to Validity

5.1 Construct Validity

An assessment of construct validity entails answering the following question: “Does the study accurately represent and evaluate the constructs (aka variables) under investigation?” [10] In this study, the main construct under investigation is the relationship between the performance of default tunings and the performance of many non-default tunings. The performance measures of the non-default tunings are somewhat protected by the law of large numbers. If for some reason the performance of a few non-default tunings somehow becomes skewed, it will likely not have a significant effect on the overall outcome due to the aggregation of results from many tunings. The default tunings, however, form the yardstick against which everything else is measured. If the default tuning scores anomalously low on a particular dataset, all other tunings will seem to perform very well in comparison. Conversely, an anomalously high-scoring default tuning will result in a negative skew to the perceived performance of all other tunings on that dataset.

Some of the factors which could contribute to such a skew are better covered in the following section as internal validity issues. The main concern for the construct validity of this study is the use of the performance measure g. The rationale for the selection of g as the chief measure of performance is covered in Section 3.2.2, but g is not the only valuable performance measure. g is a harmonic mean of pD and 1-pF, but depending on the preferences of a user, a high pD may be more valuable than a low pF or vice versa. If a different synthetic metric were used for scoring, the relative performances of different tunings could change substantially. It seems plausible to assume that if a learner can be tuned for better g scores, it could also be tuned for better scores on other metrics, but that assumption is not tested in this thesis.

5.2 Internal Validity

An assessment of internal validity involves an analysis of “whether a cause-and-effect relationship between variables can be determined.” [10] For the positive cases, we must ask if tuning actually led to better-than-default performance. As mentioned in the previous section, a default which scores anomalously high or low compared to its expected performance would seriously affect the results.

One such threat is that of a quirk of the combination of learner, tuning, and dataset which yields results not reflective of general performance. By subsampling the original dataset, such highly data-dependent quirks are smoothed and the results of a cross-validated trial should in principle reflect the expected performance of a particular tuning on a particular dataset.

Another threat is the possibility of random variations in performance due to the inner workings of the machine learners. For the Bayesian learners and logistic regression, this concern is a non-issue as these algorithms are deterministic. Repeated runs with the same parameters and data will yield identical results. Random forest is not deterministic, in that the inner workings of the algorithm rely heavily on random choices. This randomness could technically be removed by using the same random seed for each trial run, but this would be moot, as one could just as easily argue that setting the random seed to any particular value for all trials introduces bias and that the results may not hold for a different seed value. The effects of this random variation can be controlled through repetition. Due to CPU time constraints, no additional repetition of trials of the random forest learner was conducted, but some degree of repetition is implicit in the cross-validation trials. Recall that comparisons are not being made on the basis of individual trials, but from applying the paired Wilcoxon test to 25 pairs of independent cross-validation sub-trials.

For the negative cases, in which parameter tuning was not found to be effective, the same arguments against validity still apply, but additionally one can ask “Were enough parameter tunings tried?” As discussed in Section 3.4.1, there is no good argument as to the merits of testing parameter values of {2, 4, 6, 8} rather than {2.1, 4.1, 6.1, 8.1}. For numerically valued parameters, there are infinite values which could potentially be tested, but a finite number were selected for testing in this study. It is possible that the most ideal parameters “slipped through the cracks” in our grid search and were missed. This problem is unavoidable in this type of study and can only be mitigated by testing as many discrete values as practical given time restrictions. For lists of the specific parameter values selected for evaluation, see Tables 3.3 through 3.6.

5.3 Conclusion Validity

An assessment of conclusion validity involves evaluation of “the accuracy and strength of the data analyses and statistical conclusions that are drawn from a study.” [10] The main threat to the statistical validity of the conclusions in this study is the strength of the cross-validation scheme used. Leave-one-out validation is the gold standard in machine learning, but is so computationally intensive that it is rarely used in large studies. Cross-validation provides a close proxy, but the strength of conclusions which can be drawn from a cross-validated study is directly linked to the number of individual sub-trials within the cross-validation (sub-trials = folds × repeats). In this case, a 5x5 cross-validation was used with 25 individual sub-trials. This is greater than the minimum sample size of 20 which is generally recommended for use of the Wilcoxon test.

In other studies, the use of cross-validation is often linked with validity threats stemming from the re-sampling of data between runs of each learning scheme. The experimental design of this study specifically circumvents this concern by splitting the data for cross-validation prior to the learning trials and then using the same splits for each trial. Another concern raised in other studies is the skew to the positive/negative ratio that can occur due to the random nature of cross-validation. This concern was circumvented by splitting the data in a class-stratified manner.

5.4 External Validity

An assessment of external validity involves “making a systematic evaluation of the accuracy and strength of the ability to generalize the results beyond the current study.” [10] This study focuses on a narrow slice of software defect prediction. For data, we only consider open-source Java projects which have been analyzed using the CKJM extended metrics suite and incorporated into the PROMISE repository. For learners, we only considered one package's implementation of four common machine learning schemes. While we cannot make any explicit claim to the generalizability of results beyond these particular learners and datasets, we have no reason not to expect similar results if this study were to be replicated in other domains.

Chapter 6

Conclusions and Suggestions for Implementation

The overall conclusion to this line of research is that parameter tuning can make a difference in the practical performance of some machine learners when data mining in a software defect prediction context. Parameter tuning doesn't have a positive effect for all machine learning schemes, so in practice, trials should be conducted with the learner of choice before a “tuned” result is given credence over the result yielded by the learner with default parameters. Furthermore, in cases where tuning works in general, it doesn't necessarily mean that each individual “tuned” result is better than the result yielded by default parameters, so some means of aggregation is suggested to pool the results from the best tunings rather than relying on a single tuning. These caveats and suggestions for practical implementation of parameter tuning are discussed in the following sections.

6.1 Current-Version Tuning

It is clear from the results that parameter tuning within the current version can produce results that are superior to those yielded by the default parameters. This effect is not ubiquitous and is highly dependent on the learning scheme and the dataset. While this effect was not observed for the multinomial Bayes classifier, tunings that performed significantly better than the default tunings were found for logistic regression in 25% of datasets, for random forest in 55% of datasets, and for Bernoulli Bayes in 100% of datasets.

This means that for some learners, the classification performance is highly dependent on the parameters which are used. This could explain conflicting results in the literature, where the same learning scheme can have a wide range of reported performance values on the same data. In particular, differences in parameter tuning practices may help account for Shepperd, Bowes, and Hall's curious and embarrassing meta-analysis result: that the most significant factor affecting machine learning performance in software defect prediction literature is researcher affiliation. [32]

In the least negative interpretation, parameter tuning practices may differ from place to place based on shared wisdom. Some groups of researchers may only use default tunings, while others may apply parameter tuning techniques to all learners and report only the best-case results. In a more damning interpretation, researchers could be reporting the best-case tuning results from their favorite learning schemes and the default tuning results from competing learners.

Regardless of the level or cause of researcher bias present in current literature, this within-version result demonstrates that one must take any reported machine learner performance measure with a grain of salt unless the parameters used and the method of their selection are clearly reported. One suggestion for handling this issue in future publications is to choose the parameter tuning for each learner which produces the best overall result among all datasets in the study. Reporting the results from this tuning on each dataset represents a form of coarse-grained parameter tuning which should be easily replicable, especially if the actual parameter values used are published.

6.2 Forward Tuning

Whereas the current-version tuning results are mostly applicable to academic discussion of the merits of machine learners, the forward tuning problem is rooted in practical applications of machine learning on data which has not yet been assigned a class. In real life, machine learning is only useful as long as you can make predictions about data which has not yet been scored. In the case of software defect prediction, that means predicting bugs in modules that have not yet been fully tested. Similarly, parameter tuning is only valuable if it can be used to improve results without prior knowledge of the class values of the data for which predictions are being made.

For two of the four learners studied, random forest and Bernoulli Bayes, it is clear that forward evaluation performance can be improved through parameter tuning on the most recent version for which class labels are available. This finding is significant to practical applications because it means that parameter tuning trials may be conducted on a recent software version and the best tunings from those trials can be used to improve the classification results for a software version that is currently in development.

The effect size tables (Tables 4.5, 4.7) also indicate that the parameter tunings which perform better than defaults by a large margin in within-version evaluation stand a greater chance of superior performance in forward evaluation. In other words, tunings which do exceptionally well in trials on the latest classified version are much more likely to succeed in forward learning than those that are only marginally better than default.

6.3 Pre-Trials: Does Tuning Work for My Learner?

While the conclusion up until this point has taken a positive tone towards the efficacy of parameter tuning, it is also important to remember that parameter tuning wasn’t found to be effective in all cases. For real-world applications of parameter tuning, it is important to test the efficacy of tuning the target learner before trusting tuned parameters over the default parameters. Put simply, parameter tuning doesn’t always work, so it’s best to run some trials first.

When conducting these pre-trials, it is best to use data that is a close proxy to the data which will be evaluated. For example, a developer who has data from versions 1 through 5 of a software project and is in the process of developing version 6 may want to conduct current and forward tuning trials similar to the ones in this study on versions 1-2, 2-3, 3-4, and 4-5 as an indication of the effect of tuning on forward evaluation. If tunings which perform better than default within the lower version of these trials tend to perform better than default when applied in the forward learning scenario, then it is likely that parameter tuning can produce better-than-default results when applied to versions 5-6. If this is the case, the developer should conduct within-version tuning trials on version 5 and then use the best performing parameters to instantiate a pool of learners trained on version 5 which can make predictions about version 6.

In the case of new software projects where data from several versions of the project's history is not available, some insight may be gleaned by applying the intended metrics suite to several open-source projects and using them as a proxy. It is important to realize, however, that the tunings which perform well on this other data may not correlate with the tunings that perform well on the target project, so at least one historical version should be available for best results. In any case, it is also important to understand that efficacy in pre-trials does not guarantee efficacy in practice. These pre-trials can only serve as an indication of whether or not tuning is likely to be advantageous.

6.4 Tuned Learner Pooling

Even in the best case, not every tuning produces positive results. For the learners with positive forward tuning outcomes (random forest and Bernoulli Bayes), the majority of tunings which performed significantly better in current evaluation performed at least as well as the default tunings in forward evaluation, but some performed significantly worse. For random forest, a slim majority (52%) of the tunings which performed well in current evaluation did not perform significantly differently from the defaults in forward evaluation. Since the practical application of software defect prediction ultimately boils down to a set of yes-or-no predictions about software modules, how does one exploit the advantage of parameter tuning to produce a set of predictions that are more accurate than those yielded by the default parameters?

The answer lies in pooling predictions from multiple learners with multiple tunings. The generalized form of this process is called ensemble learning and is a topic of current research in the machine learning literature. Ensemble learning generally involves taking predictions from several classifiers, assigning each predictor a confidence level, and producing a set of consensus predictions based on the predictions of each classifier and its associated confidence value.

In its simplest form, ensemble learning could be applied here by deciding the consensus predictions by a simple majority vote over the predictions generated by each better-than-default tuning. This would be equivalent to an ensemble learner in which each constituent learner uses a different parameter tuning and carries the same confidence level. A more sophisticated approach would be to apply an advanced ensemble learning technique in which each learner's confidence value is determined by its performance characteristics in current-version trials. Selecting an ensemble method for intelligently pooling results from the selected tunings is an interesting topic that is left to future research.
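
As a minimal sketch of the simple majority-vote pooling described above (not the more advanced weighted schemes), the snippet below combines the 0/1 defect predictions of several tuned learners; the toy prediction arrays are placeholders for real learner output.

# Hypothetical sketch: pool yes/no defect predictions from several tuned
# learners by simple majority vote, each tuning carrying equal confidence.
import numpy as np

def majority_vote(predictions):
    # predictions: list of 0/1 arrays, one per better-than-default tuning.
    votes = np.vstack(predictions)
    return (votes.mean(axis=0) > 0.5).astype(int)

preds = [np.array([1, 0, 1, 1]),   # tuning A
         np.array([1, 0, 0, 1]),   # tuning B
         np.array([0, 0, 1, 1])]   # tuning C
print(majority_vote(preds))        # -> [1 0 1 1]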

6.5 Summary of Conclusions: Research Questions Revisited

1) Can within-version parameter tuning lead to increased classification performance on software defect data when compared to default parameters?

In some cases, yes: for logistic regression, random forest, and Bernoulli Bayes, we can answer this question in the affirmative. It should be noted, however, that this effect may be dataset-dependent. Also note that we found no evidence supporting this for the multinomial Bayes learner.

2) Is parameter tuning effective when the parameters are tuned and the classifier is trained on data from one version of a software project but tested on the next version?

In some cases, yes. Of the three learners for which RQ1 can be answered in the affirmative, two of them, random forest and Bernoulli Bayes, yielded evidence of a correlation between the success of tunings within a software version and their success in forward prediction of the next dataset. It should be noted that good evidence of this correlation was not found for most datasets with the logistic regression classifier.

3) Is the extent to which a parameter tuning outperforms the default tuning on data from one software version indicative of its likelihood of success in future versions?

Yes. In both cases for which there was any indication that forward parameter tuning could be successful (random forest and Bernoulli Bayes), there was also an indication that tunings which outperformed the default in within-version evaluation with a large effect size were more likely to excel in forward evaluation than those with a smaller effect size.

4) Is there a set of parameter tunings that will increase learning performance ubiquitously in the domain of software defect prediction?

No, not ubiquitously; at least none could be found in this study. For one of the four learners studied, performance could have been improved on the majority of the datasets by “changing the defaults”, but not on all datasets. For the other learners, no single tuning performed better than the defaults on more than 40% of the datasets.


Appendix A

Multinomial Naive Bayes Forward Results By Project

SciKit-Learn_Multinomial_NB ant Benjamini : SciKit-Learn_Multinomial_NB jedit Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 41 (12%) | 283 (86%) | 4 (1%) | | cur = def | 0 (0%) | 328 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB camel Benjamini : SciKit-Learn_Multinomial_NB log4j Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 41 (100%) | | cur < def | 0 (0%) | 41 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 205 (100%) | 0 (0%) | | cur = def | 0 (0%) | 86 (69%) | 37 (30%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB ivy Benjamini : SciKit-Learn_Multinomial_NB lucene Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 164 (100%) | 0 (0%) | | cur = def | 0 (0%) | 164 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

71 SciKit-Learn_Multinomial_NB synapse Benjamini : SciKit-Learn_Multinomial_NB xalan Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 2 (100%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 162 (100%) | 0 (0%) | | cur = def | 0 (0%) | 246 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB velocity Benjamini : SciKit-Learn_Multinomial_NB xerces Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 164 (100%) | 0 (0%) | | cur = def | 0 (0%) | 123 (75%) | 41 (25%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

Appendix B

Logistic Regression Forward Results By Project

SciKit-Learn_Logistic_Regression ant Benjamini : SciKit-Learn_Logistic_Regression jedit Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 2156 (67%) | 1054 (32%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 11160 (100%) | 0 (0%) | | cur = def | 222 (2%) | 7428 (94%) | 173 (2%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 127 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 37 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 90 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression camel Benjamini : SciKit-Learn_Logistic_Regression log4j Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 7 (1%) | 404 (98%) | 0 (0%) | | cur < def | 0 (0%) | 627 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 24 (0%) | 7935 (99%) | 0 (0%) | | cur = def | 0 (0%) | 4953 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression ivy Benjamini : SciKit-Learn_Logistic_Regression lucene Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 869 (90%) | 95 (9%) | 0 (0%) | | cur < def | 1623 (86%) | 243 (13%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 3686 (79%) | 930 (20%) | | cur = def | 28 (0%) | 3073 (90%) | 276 (8%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 337 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 61 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 271 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 5 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

73 SciKit-Learn_Logistic_Regression synapse Benjamini : SciKit-Learn_Logistic_Regression xalan Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 991 (97%) | 29 (2%) | 0 (0%) | | cur < def | 1467 (75%) | 400 (20%) | 69 (3%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 27 (0%) | 4523 (99%) | 10 (0%) | | cur = def | 1729 (26%) | 2852 (44%) | 1823 (28%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 16 (53%) | 14 (46%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 3 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 16 (59%) | 11 (40%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression velocity Benjamini : SciKit-Learn_Logistic_Regression xerces Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 31 (3%) | 213 (22%) | 686 (73%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 1612 (34%) | 2762 (59%) | 269 (5%) | | cur = def | 62 (1%) | 5186 (98%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 7 (100%) | 0 (0%) | | cur > def (all) | 0 (0%) | 332 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 7 (100%) | 0 (0%) | | cur > def (M) | 0 (0%) | 332 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

Appendix C

Random Forest Forward Results By Project

SciKit-Learn_Random_Forest ant Benjamini : SciKit-Learn_Random_Forest jedit Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 1792 (46%) | 2056 (53%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 178 (2%) | 7748 (91%) | 539 (6%) | | cur = def | 54 (0%) | 12459 (97%) | 215 (1%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 249 (6%) | 3438 (93%) | | cur > def (all) | 0 (0%) | 3272 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 77 (17%) | 355 (82%) | | cur > def (S) | 0 (0%) | 426 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 172 (7%) | 2202 (92%) | | cur > def (M) | 0 (0%) | 2812 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 881 (100%) | | cur > def (L) | 0 (0%) | 34 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest camel Benjamini : SciKit-Learn_Random_Forest log4j Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 3670 (96%) | 128 (3%) | 0 (0%) | | cur < def | 0 (0%) | 40 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 2918 (41%) | 3662 (52%) | 381 (5%) | | cur = def | 20 (0%) | 6368 (99%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 32 (2%) | 625 (50%) | 584 (47%) | | cur > def (all) | 0 (0%) | 1572 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 4 (2%) | 122 (67%) | 55 (30%) | | cur > def (S) | 0 (0%) | 222 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 28 (2%) | 499 (51%) | 447 (45%) | | cur > def (M) | 0 (0%) | 1349 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 4 (4%) | 82 (95%) | | cur > def (L) | 0 (0%) | 1 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest ivy Benjamini : SciKit-Learn_Random_Forest lucene Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 2836 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 8000 (100%) | 0 (0%) | | cur = def | 0 (0%) | 5164 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

75 SciKit-Learn_Random_Forest synapse Benjamini : SciKit-Learn_Random_Forest xalan Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 1645 (75%) | 540 (24%) | 0 (0%) | | cur < def | 2087 (92%) | 161 (7%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 312 (5%) | 5464 (94%) | 0 (0%) | | cur = def | 996 (13%) | 4229 (57%) | 2117 (28%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 39 (100%) | 0 (0%) | | cur > def (all) | 0 (0%) | 685 (28%) | 1725 (71%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 15 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 39 (100%) | 0 (0%) | | cur > def (M) | 0 (0%) | 667 (29%) | 1564 (70%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 18 (10%) | 146 (89%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest velocity Benjamini : SciKit-Learn_Random_Forest xerces Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 2738 (58%) | 1979 (41%) | 0 (0%) | | cur < def | 2431 (76%) | 734 (23%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 2247 (68%) | 1036 (31%) | 0 (0%) | | cur = def | 943 (20%) | 3330 (72%) | 341 (7%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 94 (42%) | 127 (57%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 2 (40%) | 3 (60%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 92 (42%) | 124 (57%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

Appendix D

Bernoulli Naive Bayes Forward Results By Project

SciKit-Learn_Bernoulli_NB ant Benjamini : SciKit-Learn_Bernoulli_NB jedit Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 2012 (88%) | 219 (9%) | 48 (2%) | | cur < def | 1419 (73%) | 493 (25%) | 8 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 107 (9%) | 663 (60%) | 324 (29%) | | cur = def | 120 (9%) | 749 (59%) | 389 (30%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 203 (5%) | 3312 (94%) | | cur > def (all) | 0 (0%) | 1170 (31%) | 2540 (68%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 45 (48%) | 47 (51%) | | cur > def (S) | 0 (0%) | 27 (29%) | 64 (70%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 112 (12%) | 807 (87%) | | cur > def (M) | 0 (0%) | 427 (51%) | 407 (48%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 46 (1%) | 2458 (98%) | | cur > def (L) | 0 (0%) | 716 (25%) | 2069 (74%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB camel Benjamini : SciKit-Learn_Bernoulli_NB log4j Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 775 (99%) | 4 (0%) | 0 (0%) | | cur < def | 1206 (88%) | 150 (11%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 640 (47%) | 396 (29%) | 313 (23%) | | cur = def | 155 (19%) | 374 (46%) | 278 (34%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 11 (0%) | 3027 (99%) | | cur > def (all) | 0 (0%) | 7 (0%) | 1274 (99%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 7 (100%) | | cur > def (S) | 0 (0%) | 1 (1%) | 57 (98%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 10 (1%) | 576 (98%) | | cur > def (M) | 0 (0%) | 6 (2%) | 239 (97%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 1 (0%) | 2444 (99%) | | cur > def (L) | 0 (0%) | 0 (0%) | 978 (100%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB ivy Benjamini : SciKit-Learn_Bernoulli_NB lucene Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 411 (96%) | 16 (3%) | 0 (0%) | | cur < def | 282 (41%) | 336 (49%) | 56 (8%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 775 (43%) | 584 (32%) | 425 (23%) | | cur = def | 461 (18%) | 1264 (50%) | 784 (31%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 277 (22%) | 956 (77%) | | cur > def (all) | 0 (0%) | 21 (8%) | 240 (91%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 18 (32%) | 38 (67%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 246 (31%) | 527 (68%) | | cur > def (M) | 0 (0%) | 21 (8%) | 240 (91%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 13 (3%) | 391 (96%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

77 SciKit-Learn_Bernoulli_NB synapse Benjamini : SciKit-Learn_Bernoulli_NB xalan Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 1210 (80%) | 202 (13%) | 91 (6%) | | cur < def | 723 (80%) | 103 (11%) | 69 (7%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 41 (7%) | 214 (36%) | 329 (56%) | | cur = def | 27 (2%) | 234 (24%) | 683 (72%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 179 (13%) | 1178 (86%) | | cur > def (all) | 416 (12%) | 292 (8%) | 2619 (78%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 24 (33%) | 48 (66%) | | cur > def (S) | 0 (0%) | 4 (12%) | 29 (87%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 84 (16%) | 429 (83%) | | cur > def (M) | 40 (5%) | 2 (0%) | 675 (94%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 71 (9%) | 701 (90%) | | cur > def (L) | 376 (14%) | 286 (11%) | 1915 (74%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB velocity Benjamini : SciKit-Learn_Bernoulli_NB xerces Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 650 (41%) | 477 (30%) | 444 (28%) | | cur < def | 714 (93%) | 49 (6%) | 1 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 134 (8%) | 1010 (67%) | 359 (23%) | | cur = def | 227 (40%) | 100 (17%) | 235 (41%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 105 (28%) | 265 (71%) | | cur > def (all) | 58 (2%) | 148 (6%) | 1912 (90%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 17 (53%) | 15 (46%) | | cur > def (S) | 16 (17%) | 0 (0%) | 76 (82%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 86 (33%) | 170 (66%) | | cur > def (M) | 33 (16%) | 23 (11%) | 143 (71%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 2 (2%) | 80 (97%) | | cur > def (L) | 9 (0%) | 125 (6%) | 1693 (92%) | +------+------+------+------+ +------+------+------+------+

Appendix E

Multinomial Naive Bayes Forward Results By Dataset

SciKit-Learn_Multinomial_NB ant-1.3 Benjamini : SciKit-Learn_Multinomial_NB ant-1.6 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB ant-1.4 Benjamini : SciKit-Learn_Multinomial_NB camel-1.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 41 (100%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 41 (50%) | 37 (45%) | 4 (4%) | | cur = def | 0 (0%) | 41 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB ant-1.5 Benjamini : SciKit-Learn_Multinomial_NB camel-1.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

79 SciKit-Learn_Multinomial_NB camel-1.4 Benjamini : SciKit-Learn_Multinomial_NB jedit-4.1 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB ivy-1.1 Benjamini : SciKit-Learn_Multinomial_NB jedit-4.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB ivy-1.4 Benjamini : SciKit-Learn_Multinomial_NB log4j-1.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 41 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 41 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB jedit-3.2 Benjamini : SciKit-Learn_Multinomial_NB log4j-1.1 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 45 (54%) | 37 (45%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB jedit-4.0 Benjamini : SciKit-Learn_Multinomial_NB lucene-2.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

80 SciKit-Learn_Multinomial_NB lucene-2.2 Benjamini : SciKit-Learn_Multinomial_NB xalan-2.4 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB synapse-1.0 Benjamini : SciKit-Learn_Multinomial_NB xalan-2.5 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 2 (100%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 80 (100%) | 0 (0%) | | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB synapse-1.1 Benjamini : SciKit-Learn_Multinomial_NB xalan-2.6 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB velocity-1.4 Benjamini : SciKit-Learn_Multinomial_NB xerces-1.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Multinomial_NB velocity-1.5 Benjamini : SciKit-Learn_Multinomial_NB xerces-1.3 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 82 (100%) | 0 (0%) | | cur = def | 0 (0%) | 41 (50%) | 41 (50%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

Appendix F

Logistic Regression Forward Results By Dataset

SciKit-Learn_Logistic_Regression ant-1.3 Benjamini : SciKit-Learn_Logistic_Regression ant-1.6 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 2790 (100%) | 0 (0%) | | cur = def | 0 (0%) | 2790 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression ant-1.4 Benjamini : SciKit-Learn_Logistic_Regression camel-1.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 2790 (100%) | 0 (0%) | | cur = def | 0 (0%) | 2790 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression ant-1.5 Benjamini : SciKit-Learn_Logistic_Regression camel-1.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 404 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 2790 (100%) | 0 (0%) | | cur = def | 0 (0%) | 2386 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

82 SciKit-Learn_Logistic_Regression camel-1.4 Benjamini : SciKit-Learn_Logistic_Regression jedit-4.1 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 7 (100%) | 0 (0%) | 0 (0%) | | cur < def | 940 (98%) | 16 (1%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 24 (0%) | 2759 (99%) | 0 (0%) | | cur = def | 133 (7%) | 1701 (92%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression ivy-1.1 Benjamini : SciKit-Learn_Logistic_Regression jedit-4.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 869 (90%) | 95 (9%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 1826 (100%) | 0 (0%) | | cur = def | 0 (0%) | 2790 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression ivy-1.4 Benjamini : SciKit-Learn_Logistic_Regression log4j-1.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 1860 (66%) | 930 (33%) | | cur = def | 0 (0%) | 2790 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression jedit-3.2 Benjamini : SciKit-Learn_Logistic_Regression log4j-1.1 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 1216 (96%) | 39 (3%) | 0 (0%) | | cur < def | 0 (0%) | 627 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 89 (5%) | 1273 (82%) | 173 (11%) | | cur = def | 0 (0%) | 2163 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression jedit-4.0 Benjamini : SciKit-Learn_Logistic_Regression lucene-2.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 999 (100%) | 0 (0%) | | cur < def | 902 (99%) | 3 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 1664 (100%) | 0 (0%) | | cur = def | 28 (1%) | 1581 (83%) | 276 (14%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 127 (100%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 37 (100%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 90 (100%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

83 SciKit-Learn_Logistic_Regression lucene-2.2 Benjamini : SciKit-Learn_Logistic_Regression xalan-2.4 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 721 (75%) | 240 (24%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 1492 (100%) | 0 (0%) | | cur = def | 1109 (39%) | 984 (35%) | 697 (24%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 337 (100%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 61 (100%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 271 (100%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 5 (100%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression synapse-1.0 Benjamini : SciKit-Learn_Logistic_Regression xalan-2.5 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 976 (98%) | 17 (1%) | 0 (0%) | | cur < def | 895 (88%) | 42 (4%) | 69 (6%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 1787 (99%) | 10 (0%) | | cur = def | 603 (34%) | 834 (47%) | 333 (18%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 3 (21%) | 11 (78%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 3 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 3 (27%) | 8 (72%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression synapse-1.1 Benjamini : SciKit-Learn_Logistic_Regression xalan-2.6 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 15 (55%) | 12 (44%) | 0 (0%) | | cur < def | 572 (61%) | 358 (38%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 27 (0%) | 2736 (99%) | 0 (0%) | | cur = def | 17 (0%) | 1034 (56%) | 793 (43%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 13 (81%) | 3 (18%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 13 (81%) | 3 (18%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression velocity-1.4 Benjamini : SciKit-Learn_Logistic_Regression xerces-1.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 213 (23%) | 686 (76%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 29 (1%) | 1855 (98%) | 0 (0%) | | cur = def | 0 (0%) | 2458 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 7 (100%) | 0 (0%) | | cur > def (all) | 0 (0%) | 332 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 7 (100%) | 0 (0%) | | cur > def (M) | 0 (0%) | 332 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Logistic_Regression velocity-1.5 Benjamini : SciKit-Learn_Logistic_Regression xerces-1.3 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 31 (100%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 1583 (57%) | 907 (32%) | 269 (9%) | | cur = def | 62 (2%) | 2728 (97%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

Appendix G

Random Forest Forward Results By Dataset

SciKit-Learn_Random_Forest ant-1.3 Benjamini : SciKit-Learn_Random_Forest ant-1.6 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 300 (100%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 160 (4%) | 3540 (95%) | 0 (0%) | | cur = def | 0 (0%) | 152 (35%) | 273 (64%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 177 (4%) | 3398 (95%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 65 (15%) | 355 (84%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 112 (4%) | 2162 (95%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 881 (100%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest ant-1.4 Benjamini : SciKit-Learn_Random_Forest camel-1.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 865 (30%) | 1934 (69%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 1157 (97%) | 32 (2%) | | cur = def | 2006 (50%) | 1930 (48%) | 64 (1%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 12 (100%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 12 (100%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest ant-1.5 Benjamini : SciKit-Learn_Random_Forest camel-1.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 627 (83%) | 122 (16%) | 0 (0%) | | cur < def | 1808 (93%) | 128 (6%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 18 (0%) | 2899 (92%) | 234 (7%) | | cur = def | 20 (1%) | 1054 (76%) | 297 (21%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 60 (60%) | 40 (40%) | | cur > def (all) | 0 (0%) | 174 (25%) | 519 (74%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 42 (43%) | 55 (56%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 60 (60%) | 40 (40%) | | cur > def (M) | 0 (0%) | 131 (25%) | 382 (74%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 1 (1%) | 82 (98%) | +------+------+------+------+ +------+------+------+------+

85 SciKit-Learn_Random_Forest camel-1.4 Benjamini : SciKit-Learn_Random_Forest jedit-4.1 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 1862 (100%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 892 (56%) | 678 (42%) | 20 (1%) | | cur = def | 0 (0%) | 4000 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 32 (5%) | 451 (82%) | 65 (11%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 4 (4%) | 80 (95%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 28 (6%) | 368 (79%) | 65 (14%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 3 (100%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest ivy-1.1 Benjamini : SciKit-Learn_Random_Forest jedit-4.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 4000 (100%) | 0 (0%) | | cur = def | 54 (1%) | 3719 (93%) | 215 (5%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 12 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 12 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest ivy-1.4 Benjamini : SciKit-Learn_Random_Forest log4j-1.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 40 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 4000 (100%) | 0 (0%) | | cur = def | 20 (0%) | 2368 (99%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 1572 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 222 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 1349 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 1 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest jedit-3.2 Benjamini : SciKit-Learn_Random_Forest log4j-1.1 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 980 (100%) | 0 (0%) | | cur = def | 0 (0%) | 4000 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 3020 (100%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 426 (100%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 2572 (100%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 22 (100%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest jedit-4.0 Benjamini : SciKit-Learn_Random_Forest lucene-2.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 3760 (100%) | 0 (0%) | | cur = def | 0 (0%) | 4000 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 240 (100%) | 0 (0%) | | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 240 (100%) | 0 (0%) | | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

86 SciKit-Learn_Random_Forest lucene-2.2 Benjamini : SciKit-Learn_Random_Forest xalan-2.4 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 2836 (100%) | 0 (0%) | | cur < def | 2065 (93%) | 153 (6%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 1164 (100%) | 0 (0%) | | cur = def | 384 (21%) | 1236 (70%) | 141 (8%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 2 (9%) | 19 (90%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 15 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 2 (33%) | 4 (66%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest synapse-1.0 Benjamini : SciKit-Learn_Random_Forest xalan-2.5 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 1645 (97%) | 44 (2%) | 0 (0%) | | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 312 (13%) | 1999 (86%) | 0 (0%) | | cur = def | 0 (0%) | 1264 (45%) | 1523 (54%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 388 (31%) | 825 (68%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 371 (32%) | 776 (67%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 17 (25%) | 49 (74%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest synapse-1.1 Benjamini : SciKit-Learn_Random_Forest xalan-2.6 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 496 (100%) | 0 (0%) | | cur < def | 22 (73%) | 8 (26%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 3465 (100%) | 0 (0%) | | cur = def | 612 (21%) | 1729 (61%) | 453 (16%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 39 (100%) | 0 (0%) | | cur > def (all) | 0 (0%) | 295 (25%) | 881 (74%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 39 (100%) | 0 (0%) | | cur > def (M) | 0 (0%) | 294 (27%) | 784 (72%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 1 (1%) | 97 (98%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest velocity-1.4 Benjamini : SciKit-Learn_Random_Forest xerces-1.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 1677 (45%) | 1979 (54%) | 0 (0%) | | cur < def | 1891 (72%) | 730 (27%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 344 (100%) | 0 (0%) | | cur = def | 0 (0%) | 1320 (98%) | 14 (1%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 30 (66%) | 15 (33%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 2 (40%) | 3 (60%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 28 (70%) | 12 (30%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Random_Forest velocity-1.5 Benjamini : SciKit-Learn_Random_Forest xerces-1.3 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 1061 (100%) | 0 (0%) | 0 (0%) | | cur < def | 540 (99%) | 4 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 2247 (76%) | 692 (23%) | 0 (0%) | | cur = def | 943 (28%) | 2010 (61%) | 327 (9%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (all) | 0 (0%) | 64 (36%) | 112 (63%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (M) | 0 (0%) | 64 (36%) | 112 (63%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

Appendix H

Bernoulli Naive Bayes Forward Results By Dataset

SciKit-Learn_Bernoulli_NB ant-1.3 Benjamini : SciKit-Learn_Bernoulli_NB ant-1.6 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 551 (83%) | 108 (16%) | 0 (0%) | | cur < def | 478 (99%) | 2 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 4 (1%) | 281 (79%) | 68 (19%) | | cur = def | 11 (20%) | 41 (75%) | 2 (3%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 44 (6%) | 666 (93%) | | cur > def (all) | 0 (0%) | 0 (0%) | 1188 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 17 (39%) | 26 (60%) | | cur > def (S) | 0 (0%) | 0 (0%) | 5 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 27 (13%) | 175 (86%) | | cur > def (M) | 0 (0%) | 0 (0%) | 107 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 465 (100%) | | cur > def (L) | 0 (0%) | 0 (0%) | 1076 (100%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB ant-1.4 Benjamini : SciKit-Learn_Bernoulli_NB camel-1.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 386 (90%) | 39 (9%) | 0 (0%) | | cur < def | 24 (100%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 66 (17%) | 266 (71%) | 41 (10%) | | cur = def | 540 (54%) | 175 (17%) | 270 (27%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 159 (17%) | 765 (82%) | | cur > def (all) | 0 (0%) | 8 (1%) | 705 (98%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 28 (96%) | 1 (3%) | | cur > def (S) | 0 (0%) | 0 (0%) | 6 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 85 (56%) | 66 (43%) | | cur > def (M) | 0 (0%) | 8 (1%) | 468 (98%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 46 (6%) | 698 (93%) | | cur > def (L) | 0 (0%) | 0 (0%) | 231 (100%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB ant-1.5 Benjamini : SciKit-Learn_Bernoulli_NB camel-1.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 597 (83%) | 70 (9%) | 48 (6%) | | cur < def | 431 (100%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 26 (8%) | 75 (23%) | 213 (67%) | | cur = def | 2 (2%) | 83 (97%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 693 (100%) | | cur > def (all) | 0 (0%) | 3 (0%) | 1203 (99%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 15 (100%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 459 (100%) | | cur > def (M) | 0 (0%) | 2 (16%) | 10 (83%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 219 (100%) | | cur > def (L) | 0 (0%) | 1 (0%) | 1193 (99%) | +------+------+------+------+ +------+------+------+------+

88 SciKit-Learn_Bernoulli_NB camel-1.4 Benjamini : SciKit-Learn_Bernoulli_NB jedit-4.1 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 320 (98%) | 4 (1%) | 0 (0%) | | cur < def | 354 (100%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 98 (35%) | 138 (49%) | 43 (15%) | | cur = def | 104 (35%) | 124 (42%) | 65 (22%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 1119 (100%) | | cur > def (all) | 0 (0%) | 94 (8%) | 981 (91%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 1 (100%) | | cur > def (S) | 0 (0%) | 7 (58%) | 5 (41%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 98 (100%) | | cur > def (M) | 0 (0%) | 61 (34%) | 117 (65%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 1020 (100%) | | cur > def (L) | 0 (0%) | 26 (2%) | 859 (97%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB ivy-1.1 Benjamini : SciKit-Learn_Bernoulli_NB jedit-4.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 411 (96%) | 16 (3%) | 0 (0%) | | cur < def | 167 (27%) | 451 (72%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 348 (37%) | 242 (26%) | 334 (36%) | | cur = def | 0 (0%) | 157 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 104 (28%) | 267 (71%) | | cur > def (all) | 0 (0%) | 889 (93%) | 58 (6%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 11 (27%) | 29 (72%) | | cur > def (S) | 0 (0%) | 16 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 91 (37%) | 152 (62%) | | cur > def (M) | 0 (0%) | 239 (100%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 2 (2%) | 86 (97%) | | cur > def (L) | 0 (0%) | 634 (91%) | 58 (8%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB ivy-1.4 Benjamini : SciKit-Learn_Bernoulli_NB log4j-1.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 0 (0%) | 0 (0%) | 0 (0%) | | cur < def | 674 (83%) | 132 (16%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 427 (49%) | 342 (39%) | 91 (10%) | | cur = def | 10 (3%) | 181 (60%) | 109 (36%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 173 (20%) | 689 (79%) | | cur > def (all) | 0 (0%) | 0 (0%) | 616 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 7 (43%) | 9 (56%) | | cur > def (S) | 0 (0%) | 0 (0%) | 18 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 155 (29%) | 375 (70%) | | cur > def (M) | 0 (0%) | 0 (0%) | 129 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 11 (3%) | 305 (96%) | | cur > def (L) | 0 (0%) | 0 (0%) | 469 (100%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB jedit-3.2 Benjamini : SciKit-Learn_Bernoulli_NB log4j-1.1 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 418 (91%) | 31 (6%) | 8 (1%) | | cur < def | 532 (96%) | 18 (3%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 369 (53%) | 324 (46%) | | cur = def | 145 (28%) | 193 (38%) | 169 (33%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 122 (21%) | 450 (78%) | | cur > def (all) | 0 (0%) | 7 (1%) | 658 (98%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 4 (6%) | 59 (93%) | | cur > def (S) | 0 (0%) | 1 (2%) | 39 (97%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 117 (38%) | 185 (61%) | | cur > def (M) | 0 (0%) | 6 (5%) | 110 (94%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 1 (0%) | 206 (99%) | | cur > def (L) | 0 (0%) | 0 (0%) | 509 (100%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB jedit-4.0 Benjamini : SciKit-Learn_Bernoulli_NB lucene-2.0 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 480 (97%) | 11 (2%) | 0 (0%) | | cur < def | 120 (25%) | 334 (71%) | 16 (3%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 16 (13%) | 99 (86%) | 0 (0%) | | cur = def | 0 (0%) | 904 (84%) | 167 (15%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 65 (5%) | 1051 (94%) | | cur > def (all) | 0 (0%) | 21 (11%) | 160 (88%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 10 (8%) | 105 (91%) | | cur > def (M) | 0 (0%) | 21 (11%) | 160 (88%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 55 (5%) | 946 (94%) | | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+

89 SciKit-Learn_Bernoulli_NB lucene-2.2 Benjamini : SciKit-Learn_Bernoulli_NB xalan-2.4 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 162 (79%) | 2 (0%) | 40 (19%) | | cur < def | 499 (82%) | 38 (6%) | 69 (11%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 461 (32%) | 360 (25%) | 617 (42%) | | cur = def | 9 (14%) | 51 (79%) | 4 (6%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 80 (100%) | | cur > def (all) | 4 (0%) | 4 (0%) | 1044 (99%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (S) | 0 (0%) | 4 (23%) | 13 (76%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 80 (100%) | | cur > def (M) | 0 (0%) | 0 (0%) | 29 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 0 (0%) | | cur > def (L) | 4 (0%) | 0 (0%) | 1002 (99%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB synapse-1.0 Benjamini : SciKit-Learn_Bernoulli_NB xalan-2.5 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 757 (74%) | 165 (16%) | 91 (8%) | | cur < def | 112 (63%) | 65 (36%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 86 (21%) | 320 (78%) | | cur = def | 0 (0%) | 132 (16%) | 664 (83%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 15 (4%) | 288 (95%) | | cur > def (all) | 0 (0%) | 0 (0%) | 749 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 8 (15%) | 45 (84%) | | cur > def (S) | 0 (0%) | 0 (0%) | 14 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 7 (3%) | 194 (96%) | | cur > def (M) | 0 (0%) | 0 (0%) | 646 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 49 (100%) | | cur > def (L) | 0 (0%) | 0 (0%) | 89 (100%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB synapse-1.1 Benjamini : SciKit-Learn_Bernoulli_NB xalan-2.6 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 453 (92%) | 37 (7%) | 0 (0%) | | cur < def | 112 (100%) | 0 (0%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 41 (23%) | 128 (71%) | 9 (5%) | | cur = def | 18 (21%) | 51 (60%) | 15 (17%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 164 (15%) | 890 (84%) | | cur > def (all) | 412 (26%) | 288 (18%) | 826 (54%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 16 (84%) | 3 (15%) | | cur > def (S) | 0 (0%) | 0 (0%) | 2 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 77 (24%) | 235 (75%) | | cur > def (M) | 40 (95%) | 2 (4%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 71 (9%) | 652 (90%) | | cur > def (L) | 372 (25%) | 286 (19%) | 824 (55%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB velocity-1.4 Benjamini : SciKit-Learn_Bernoulli_NB xerces-1.2 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 122 (12%) | 428 (43%) | 444 (44%) | | cur < def | 225 (85%) | 37 (14%) | 0 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 0 (0%) | 363 (67%) | 173 (32%) | | cur = def | 227 (47%) | 87 (18%) | 159 (33%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 105 (54%) | 87 (45%) | | cur > def (all) | 58 (5%) | 148 (14%) | 781 (79%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 17 (58%) | 12 (41%) | | cur > def (S) | 16 (19%) | 0 (0%) | 66 (80%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 86 (53%) | 75 (46%) | | cur > def (M) | 33 (35%) | 23 (25%) | 36 (39%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 2 (100%) | 0 (0%) | | cur > def (L) | 9 (1%) | 125 (15%) | 679 (83%) | +------+------+------+------+ +------+------+------+------+

SciKit-Learn_Bernoulli_NB velocity-1.5 Benjamini : SciKit-Learn_Bernoulli_NB xerces-1.3 Benjamini : +------+------+------+------+ +------+------+------+------+ | | fwd < def | fwd = def | fwd > def | | | fwd < def | fwd = def | fwd > def | +======+======+======+======+ +======+======+======+======+ | cur < def | 528 (91%) | 49 (8%) | 0 (0%) | | cur < def | 489 (97%) | 12 (2%) | 1 (0%) | +------+------+------+------+ +------+------+------+------+ | cur = def | 134 (13%) | 647 (66%) | 186 (19%) | | cur = def | 0 (0%) | 13 (14%) | 76 (85%) | +------+------+------+------+ +------+------+------+------+ | cur > def (all) | 0 (0%) | 0 (0%) | 178 (100%) | | cur > def (all) | 0 (0%) | 0 (0%) | 1131 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (S) | 0 (0%) | 0 (0%) | 3 (100%) | | cur > def (S) | 0 (0%) | 0 (0%) | 10 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (M) | 0 (0%) | 0 (0%) | 95 (100%) | | cur > def (M) | 0 (0%) | 0 (0%) | 107 (100%) | +------+------+------+------+ +------+------+------+------+ | cur > def (L) | 0 (0%) | 0 (0%) | 80 (100%) | | cur > def (L) | 0 (0%) | 0 (0%) | 1014 (100%) | +------+------+------+------+ +------+------+------+------+
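
The tables in Appendices F through H above all share one layout: each non-default tuning is labeled by whether it performed worse than, equal to, or better than the default tuning on the current version (rows) and on the following version (columns), and each cell counts the tunings that fall into that pairing. The following is a minimal illustrative sketch, not the thesis's own analysis code, of how such a cross-tabulation could be assembled; the input records, label values, and gain-size labels (S/M/L) are hypothetical placeholders.

    from collections import Counter

    # Hypothetical input: one record per non-default tuning. "cur" and "fwd"
    # record whether the tuning scored below ('<'), equal to ('='), or above
    # ('>') the default on the current and on the following version; "gain" is
    # a small/medium/large label for tunings that beat the default currently.
    tunings = [
        {"cur": "<", "fwd": "<", "gain": None},
        {"cur": ">", "fwd": ">", "gain": "M"},
        {"cur": "=", "fwd": "=", "gain": None},
    ]

    def cross_tabulate(records):
        """Count tunings in each (current-version row, forward-result column) cell."""
        cols = {"<": "fwd < def", "=": "fwd = def", ">": "fwd > def"}
        rows = ["cur < def", "cur = def", "cur > def (all)",
                "cur > def (S)", "cur > def (M)", "cur > def (L)"]
        counts = Counter()
        for rec in records:
            col = cols[rec["fwd"]]
            if rec["cur"] == "<":
                counts[("cur < def", col)] += 1
            elif rec["cur"] == "=":
                counts[("cur = def", col)] += 1
            else:  # tuning beat the default on the current version
                counts[("cur > def (all)", col)] += 1
                counts[("cur > def (%s)" % rec["gain"], col)] += 1
        return {row: {col: counts[(row, col)] for col in cols.values()} for row in rows}

    print(cross_tabulate(tunings))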

Appendix I

Multinomial Bayes Current Version Results By Dataset

Dataset Name | Cur<Def | Cur=Def | Cur>Def (all) | Cur>Def (S) | Cur>Def (M) | Cur>Def (L)
ant-1.3 | 0 | 82 | 0 | 0 | 0 | 0
ant-1.4 | 0 | 82 | 0 | 0 | 0 | 0
ant-1.5 | 0 | 82 | 0 | 0 | 0 | 0
ant-1.6 | 0 | 82 | 0 | 0 | 0 | 0
ant-1.7 | 0 | 82 | 0 | 0 | 0 | 0
camel-1.0 | 41 | 41 | 0 | 0 | 0 | 0
camel-1.2 | 0 | 82 | 0 | 0 | 0 | 0
camel-1.4 | 0 | 82 | 0 | 0 | 0 | 0
camel-1.6 | 0 | 82 | 0 | 0 | 0 | 0
ivy-1.1 | 0 | 82 | 0 | 0 | 0 | 0
ivy-1.4 | 0 | 82 | 0 | 0 | 0 | 0
ivy-2.0 | 0 | 82 | 0 | 0 | 0 | 0
jedit-3.2 | 0 | 82 | 0 | 0 | 0 | 0
jedit-4.0 | 0 | 82 | 0 | 0 | 0 | 0
jedit-4.1 | 0 | 82 | 0 | 0 | 0 | 0
jedit-4.2 | 0 | 82 | 0 | 0 | 0 | 0
jedit-4.3 | 40 | 42 | 0 | 0 | 0 | 0
log4j-1.0 | 41 | 41 | 0 | 0 | 0 | 0
log4j-1.1 | 0 | 82 | 0 | 0 | 0 | 0
log4j-1.2 | 2 | 80 | 0 | 0 | 0 | 0
lucene-2.0 | 0 | 82 | 0 | 0 | 0 | 0
lucene-2.2 | 0 | 82 | 0 | 0 | 0 | 0
lucene-2.4 | 0 | 82 | 0 | 0 | 0 | 0
synapse-1.0 | 2 | 80 | 0 | 0 | 0 | 0
synapse-1.1 | 0 | 82 | 0 | 0 | 0 | 0
synapse-1.2 | 0 | 82 | 0 | 0 | 0 | 0
velocity-1.4 | 0 | 82 | 0 | 0 | 0 | 0
velocity-1.5 | 0 | 82 | 0 | 0 | 0 | 0
velocity-1.6 | 0 | 82 | 0 | 0 | 0 | 0
xalan-2.4 | 0 | 82 | 0 | 0 | 0 | 0
xalan-2.5 | 0 | 82 | 0 | 0 | 0 | 0
xalan-2.6 | 0 | 82 | 0 | 0 | 0 | 0
xalan-2.7 | 22 | 47 | 13 | 0 | 3 | 10
xerces-1.2 | 0 | 82 | 0 | 0 | 0 | 0
xerces-1.3 | 0 | 82 | 0 | 0 | 0 | 0
xerces-1.4 | 60 | 19 | 3 | 3 | 0 | 0
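
Tables of this form summarize, for each dataset, how many points in the within-version parameter grid performed below, equal to, or above the learner's default configuration. As a minimal sketch only, assuming an illustrative parameter grid, scoring metric, and randomly generated stand-in data rather than the grid and defect data actually used in this thesis, a cross-validated grid search over a scikit-learn learner can be run as follows:

    # Minimal sketch of a cross-validated parameter-space grid search.
    # The parameter grid, scoring metric, and data below are illustrative
    # assumptions, not the configuration or datasets used in this thesis.
    # (In older scikit-learn releases GridSearchCV lives in sklearn.grid_search.)
    import numpy as np
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import GridSearchCV

    rng = np.random.default_rng(0)
    X = rng.random((300, 20))          # stand-in for 20 static code metrics
    y = rng.integers(0, 2, size=300)   # stand-in for defective / non-defective labels

    param_grid = {"alpha": [0.01, 0.1, 0.5, 1.0, 2.0],
                  "fit_prior": [True, False]}

    search = GridSearchCV(MultinomialNB(), param_grid, scoring="f1", cv=5)
    search.fit(X, y)

    # cv_results_ holds a cross-validated score for every tuning in the grid,
    # which is what comparisons against the default configuration are based on.
    print(search.best_params_, search.best_score_)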

Appendix J

Logistic Regression Current Version Results By Dataset

Dataset Name | Cur<Def | Cur=Def | Cur>Def (all) | Cur>Def (S) | Cur>Def (M) | Cur>Def (L)
ant-1.3 | 0 | 2790 | 0 | 0 | 0 | 0
ant-1.4 | 0 | 2790 | 0 | 0 | 0 | 0
ant-1.5 | 0 | 2790 | 0 | 0 | 0 | 0
ant-1.6 | 0 | 2790 | 0 | 0 | 0 | 0
ant-1.7 | 787 | 2003 | 0 | 0 | 0 | 0
camel-1.0 | 0 | 2790 | 0 | 0 | 0 | 0
camel-1.2 | 404 | 2386 | 0 | 0 | 0 | 0
camel-1.4 | 7 | 2783 | 0 | 0 | 0 | 0
camel-1.6 | 0 | 2790 | 0 | 0 | 0 | 0
ivy-1.1 | 964 | 1826 | 0 | 0 | 0 | 0
ivy-1.4 | 0 | 2790 | 0 | 0 | 0 | 0
ivy-2.0 | 0 | 2790 | 0 | 0 | 0 | 0
jedit-3.2 | 1255 | 1535 | 0 | 0 | 0 | 0
jedit-4.0 | 999 | 1664 | 127 | 37 | 90 | 0
jedit-4.1 | 956 | 1834 | 0 | 0 | 0 | 0
jedit-4.2 | 0 | 2790 | 0 | 0 | 0 | 0
jedit-4.3 | 0 | 2790 | 0 | 0 | 0 | 0
log4j-1.0 | 0 | 2790 | 0 | 0 | 0 | 0
log4j-1.1 | 627 | 2163 | 0 | 0 | 0 | 0
log4j-1.2 | 0 | 2790 | 0 | 0 | 0 | 0
lucene-2.0 | 905 | 1885 | 0 | 0 | 0 | 0
lucene-2.2 | 961 | 1492 | 337 | 61 | 271 | 5
lucene-2.4 | 1216 | 1184 | 390 | 108 | 282 | 0
synapse-1.0 | 993 | 1797 | 0 | 0 | 0 | 0
synapse-1.1 | 27 | 2763 | 0 | 0 | 0 | 0
synapse-1.2 | 945 | 1819 | 26 | 4 | 22 | 0
velocity-1.4 | 899 | 1884 | 7 | 0 | 7 | 0
velocity-1.5 | 31 | 2759 | 0 | 0 | 0 | 0
velocity-1.6 | 0 | 2790 | 0 | 0 | 0 | 0
xalan-2.4 | 0 | 2790 | 0 | 0 | 0 | 0
xalan-2.5 | 1006 | 1770 | 14 | 3 | 11 | 0
xalan-2.6 | 930 | 1844 | 16 | 0 | 16 | 0
xalan-2.7 | 841 | 1949 | 0 | 0 | 0 | 0
xerces-1.2 | 0 | 2458 | 332 | 0 | 332 | 0
xerces-1.3 | 0 | 2790 | 0 | 0 | 0 | 0
xerces-1.4 | 930 | 1799 | 61 | 1 | 60 | 0

Appendix K

Random Forest Current Version Results By Dataset

Dataset Name | Cur<Def | Cur=Def | Cur>Def (all) | Cur>Def (S) | Cur>Def (M) | Cur>Def (L)
ant-1.3 | 300 | 3700 | 0 | 0 | 0 | 0
ant-1.4 | 2799 | 1189 | 12 | 12 | 0 | 0
ant-1.5 | 749 | 3151 | 100 | 0 | 100 | 0
ant-1.6 | 0 | 425 | 3575 | 420 | 2274 | 881
ant-1.7 | 0 | 532 | 3468 | 248 | 2657 | 563
camel-1.0 | 0 | 4000 | 0 | 0 | 0 | 0
camel-1.2 | 1936 | 1371 | 693 | 97 | 513 | 83
camel-1.4 | 1862 | 1590 | 548 | 84 | 461 | 3
camel-1.6 | 2324 | 1299 | 377 | 73 | 304 | 0
ivy-1.1 | 0 | 4000 | 0 | 0 | 0 | 0
ivy-1.4 | 0 | 4000 | 0 | 0 | 0 | 0
ivy-2.0 | 264 | 3736 | 0 | 0 | 0 | 0
jedit-3.2 | 0 | 980 | 3020 | 426 | 2572 | 22
jedit-4.0 | 0 | 3760 | 240 | 0 | 240 | 0
jedit-4.1 | 0 | 4000 | 0 | 0 | 0 | 0
jedit-4.2 | 0 | 3988 | 12 | 0 | 0 | 12
jedit-4.3 | 0 | 4000 | 0 | 0 | 0 | 0
log4j-1.0 | 40 | 2388 | 1572 | 222 | 1349 | 1
log4j-1.1 | 0 | 4000 | 0 | 0 | 0 | 0
log4j-1.2 | 3042 | 958 | 0 | 0 | 0 | 0
lucene-2.0 | 0 | 4000 | 0 | 0 | 0 | 0
lucene-2.2 | 2836 | 1164 | 0 | 0 | 0 | 0
lucene-2.4 | 0 | 4000 | 0 | 0 | 0 | 0
synapse-1.0 | 1689 | 2311 | 0 | 0 | 0 | 0
synapse-1.1 | 496 | 3465 | 39 | 0 | 39 | 0
synapse-1.2 | 0 | 4000 | 0 | 0 | 0 | 0
velocity-1.4 | 3656 | 344 | 0 | 0 | 0 | 0
velocity-1.5 | 1061 | 2939 | 0 | 0 | 0 | 0
velocity-1.6 | 0 | 2698 | 1302 | 0 | 1302 | 0
xalan-2.4 | 2218 | 1761 | 21 | 15 | 6 | 0
xalan-2.5 | 0 | 2787 | 1213 | 0 | 1147 | 66
xalan-2.6 | 30 | 2794 | 1176 | 0 | 1078 | 98
xalan-2.7 | 1630 | 2345 | 25 | 17 | 8 | 0
xerces-1.2 | 2621 | 1334 | 45 | 5 | 40 | 0
xerces-1.3 | 544 | 3280 | 176 | 0 | 176 | 0
xerces-1.4 | 124 | 3824 | 52 | 0 | 52 | 0

Appendix L

Bernoulli Bayes Current Version Results By Dataset

Dataset Name | Cur<Def | Cur=Def | Cur>Def (all) | Cur>Def (S) | Cur>Def (M) | Cur>Def (L)
ant-1.3 | 659 | 353 | 710 | 43 | 202 | 465
ant-1.4 | 425 | 373 | 924 | 29 | 151 | 744
ant-1.5 | 715 | 314 | 693 | 15 | 459 | 219
ant-1.6 | 480 | 54 | 1188 | 5 | 107 | 1076
ant-1.7 | 447 | 75 | 1200 | 0 | 21 | 1179
camel-1.0 | 24 | 985 | 713 | 6 | 476 | 231
camel-1.2 | 431 | 85 | 1206 | 0 | 12 | 1194
camel-1.4 | 324 | 279 | 1119 | 1 | 98 | 1020
camel-1.6 | 460 | 90 | 1172 | 35 | 17 | 1120
ivy-1.1 | 427 | 924 | 371 | 40 | 243 | 88
ivy-1.4 | 0 | 860 | 862 | 16 | 530 | 316
ivy-2.0 | 538 | 295 | 889 | 58 | 86 | 745
jedit-3.2 | 457 | 693 | 572 | 63 | 302 | 207
jedit-4.0 | 491 | 115 | 1116 | 0 | 115 | 1001
jedit-4.1 | 354 | 293 | 1075 | 12 | 178 | 885
jedit-4.2 | 618 | 157 | 947 | 16 | 239 | 692
jedit-4.3 | 0 | 1300 | 422 | 0 | 422 | 0
log4j-1.0 | 806 | 300 | 616 | 18 | 129 | 469
log4j-1.1 | 550 | 507 | 665 | 40 | 116 | 509
log4j-1.2 | 175 | 1044 | 503 | 1 | 254 | 248
lucene-2.0 | 470 | 1071 | 181 | 0 | 181 | 0
lucene-2.2 | 204 | 1438 | 80 | 0 | 80 | 0
lucene-2.4 | 432 | 246 | 1044 | 12 | 422 | 610
synapse-1.0 | 1013 | 406 | 303 | 53 | 201 | 49
synapse-1.1 | 490 | 178 | 1054 | 19 | 312 | 723
synapse-1.2 | 486 | 348 | 888 | 35 | 448 | 405
velocity-1.4 | 994 | 536 | 192 | 29 | 161 | 2
velocity-1.5 | 577 | 967 | 178 | 3 | 95 | 80
velocity-1.6 | 402 | 101 | 1219 | 25 | 70 | 1124
xalan-2.4 | 606 | 64 | 1052 | 17 | 29 | 1006
xalan-2.5 | 177 | 796 | 749 | 14 | 646 | 89
xalan-2.6 | 112 | 84 | 1526 | 2 | 42 | 1482
xalan-2.7 | 659 | 363 | 700 | 17 | 190 | 493
xerces-1.2 | 262 | 473 | 987 | 82 | 92 | 813
xerces-1.3 | 502 | 89 | 1131 | 10 | 107 | 1014
xerces-1.4 | 1025 | 495 | 202 | 0 | 82 | 120
