View of Defect Mining Approaches
PRECISION IMPROVEMENT AND COST REDUCTION FOR DEFECT MINING AND TESTING

By BOYA SUN

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Dissertation Advisor: Dr. H. Andy Podgurski

Department of Electrical Engineering and Computer Science
CASE WESTERN RESERVE UNIVERSITY
January, 2012

CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of Boya Sun, candidate for the Doctor of Philosophy degree *.

(signed) Andy Podgurski (chair of the committee)
Gultekin Ozsoyoglu
Soumya Ray
M. Cenk Cavusoglu

(date) 10/20/2011

*We also certify that written approval has been obtained for any proprietary material contained therein.

TABLE OF CONTENTS

Table of Contents
List of Tables
List of Figures
Acknowledgements
Abstract

Chapter One. Introduction
  1.1 Precision improvement and cost reduction for defect mining
    1.1.1 Overview of defect mining approaches
    1.1.2 Costs of defect mining
    1.1.3 Proposed approaches
  1.2 Precision improvement and cost reduction for operational software testing
  1.3 Contributions

Chapter Two. Related work
  2.1 Bug detection by mining frequent code patterns
  2.2 Bug detection by employing revision histories
  2.3 Classifying and ranking static warnings
  2.4 Application and augmentation of static analysis tools
  2.5 Considering cost in software testing and reliability
  2.6 Test case clustering and classification

Chapter Three. Background
  3.1 Program dependence graph and system dependence graph
  3.2 Dependence graph based bug mining
  3.3 Dependence graph based bug fix propagation
  3.4 Cost sensitive active learning
    3.4.1 Active learning
    3.4.2 Cost-sensitive active learning

Chapter Four. Improving precision of dependence graph based defect mining: a machine learning approach
  4.1 Introduction
  4.2 Previously proposed classification and ranking techniques
  4.3 Proposed Solution
    4.3.1 Classifying and Ranking Rules
    4.3.2 Classifying and Ranking Violations
  4.4 Empirical Study
    4.4.1 Methodology
    4.4.2 Summary of the trained Rule/Violation models
    4.4.3 HP-1: Comparing our rule model with the baseline rule models
    4.4.4 HP-2: Comparing our violation model with the baseline violation models
    4.4.5 HP-3: Learning curves

Chapter Five. Extending static analysis by automatically mining project-specific rules
  5.1 Introduction
  5.2 The Rule Mining Tool and Static Analysis Tool Used In This Work
    5.2.1 Mining Frequent Code Patterns
    5.2.2 Static Analysis Tools and Custom Checkers
  5.3 Automatic P2C (Pattern to Checker) Converter
    5.3.1 Rule Extractor
    5.3.2 Checker Generator
  5.4 Empirical Study
    5.4.1 Preparing patterns for analysis
    5.4.2 R-1: Generality of generated checkers
    5.4.3 R-2: Effectiveness of the generated checkers
  5.5 Lessons Learned

Chapter Six. Bug fix propagation with fast subgraph matching
  6.1 Introduction
  6.2 GADDI: index based Fast subgraph matching algorithm
  6.3 Specifics of Our Approach
    6.3.1 Base graph generation
    6.3.2 Generating a query graph from a bug fix: the PatternBuild tool
    6.3.3 Applying the GADDI Algorithm
  6.4 Empirical evaluation
    6.4.1 Study design
    6.4.2 Results
    6.4.3 Threats to Validity

Chapter Seven. CARIAL: Cost-Aware reliability improvement with active learning
  7.1 Introduction
  7.2 Operational distribution and failure rates
  7.3 The CARIAL Framework