View of Defect Mining Approaches


PRECISION IMPROVEMENT AND COST REDUCTION FOR DEFECT MINING AND TESTING

By BOYA SUN

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Dissertation Advisor: Dr. H. Andy Podgurski

Department of Electrical Engineering and Computer Science
CASE WESTERN RESERVE UNIVERSITY
January, 2012

CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of Boya Sun, candidate for the Doctor of Philosophy degree*.

(signed) Andy Podgurski (chair of the committee)
Gultekin Ozsoyoglu
Soumya Ray
M. Cenk Cavusoglu

(date) 10/20/2011

*We also certify that written approval has been obtained for any proprietary material contained therein.

TABLE OF CONTENTS

Table of Contents
List of Tables
List of Figures
Acknowledgements
Abstract
Chapter One. Introduction
  1.1 Precision improvement and cost reduction for defect mining
    1.1.1 Overview of defect mining approaches
    1.1.2 Costs of defect mining
    1.1.3 Proposed approaches
  1.2 Precision improvement and cost reduction for operational software testing
  1.3 Contributions
Chapter Two. Related work
  2.1 Bug detection by mining frequent code patterns
  2.2 Bug detection by employing revision histories
  2.3 Classifying and ranking static warnings
  2.4 Application and augmentation of static analysis tools
  2.5 Considering cost in software testing and reliability
  2.6 Test case clustering and classification
Chapter Three. Background
  3.1 Program dependence graph and system dependence graph
  3.2 Dependence graph based bug mining
  3.3 Dependence graph based bug fix propagation
  3.4 Cost sensitive active learning
    3.4.1 Active learning
    3.4.2 Cost-sensitive active learning
Chapter Four. Improving precision of dependence graph based defect mining: a machine learning approach
  4.1 Introduction
  4.2 Previously proposed classification and ranking techniques
  4.3 Proposed Solution
    4.3.1 Classifying and Ranking Rules
    4.3.2 Classifying and Ranking Violations
  4.4 Empirical Study
    4.4.1 Methodology
    4.4.2 Summary of the trained Rule/Violation models
    4.4.3 HP-1: Comparing our rule model with the baseline rule models
    4.4.4 HP-2: Comparing our violation model with the baseline violation models
    4.4.5 HP-3: Learning curves
Chapter Five. Extending static analysis by automatically mining project-specific rules
  5.1 Introduction
  5.2 The Rule Mining Tool and Static Analysis Tool Used In This Work
    5.2.1 Mining Frequent Code Patterns
    5.2.2 Static Analysis Tools and Custom Checkers
  5.3 Automatic P2C (Pattern to Checker) Converter
    5.3.1 Rule Extractor
    5.3.2 Checker Generator
  5.4 Empirical Study
    5.4.1 Preparing patterns for analysis
    5.4.2 R-1: Generality of generated checkers
    5.4.3 R-2: Effectiveness of the generated checkers
  5.5 Lessons Learned
Chapter Six. Bug fix propagation with fast subgraph matching
  6.1 Introduction
  6.2 GADDI: index based Fast subgraph matching algorithm
  6.3 Specifics of Our Approach
    6.3.1 Base graph generation
    6.3.2 Generating a query graph from a bug fix: the PatternBuild tool
    6.3.3 Applying the GADDI Algorithm
  6.4 Empirical evaluation
    6.4.1 Study design
    6.4.2 Results
    6.4.3 Threats to Validity
Chapter Seven. CARIAL: Cost-Aware reliability improvement with active learning
  7.1 Introduction
  7.2 Operational distribution and failure rates
  7.3 The CARIAL Framework

Recommended publications
  • Building Useful Program Analysis Tools Using an Extensible Java Compiler
    Edward Aftandilian, Raluca Sauciuc (Google, Inc., Mountain View, CA, USA); Siddharth Priya, Sundaresan Krishnan (Google, Inc., Hyderabad, India)
    Abstract: Large software companies need customized tools to manage their source code. These tools are often built in an ad-hoc fashion, using brittle technologies such as regular expressions and home-grown parsers. Changes in the language cause the tools to break. More importantly, these ad-hoc tools often do not support uncommon-but-valid code patterns. We report our experiences building source-code analysis tools at Google on top of a third-party, open-source, extensible compiler. We describe three tools in use on our Java codebase. The first, Strict Java Dependencies, enforces our dependency policy in order to reduce JAR file sizes and testing load. The second, error-prone, adds new error checks to the compilation process and automates repair of those errors at a whole-codebase scale.
    ... a specific task, but they fail for several reasons. First, ad-hoc program analysis tools are often brittle and break on uncommon-but-valid code patterns. Second, simple ad-hoc tools don't provide sufficient information to perform many non-trivial analyses, including refactorings. Type and symbol information is especially useful, but amounts to writing a type-checker. Finally, more sophisticated program analysis tools are expensive to create and maintain, especially as the target language evolves. In this paper, we present our experience building special-purpose tools on top of the piece of software in our ...
  • Parfait – Designing a Scalable Bug Checker
    Cristina Cifuentes (Sun Microsystems Laboratories, Brisbane, Australia); Bernhard Scholz (Sun Microsystems Laboratories, Brisbane, Australia, and The University of Sydney, Sydney, Australia)
    ABSTRACT: We present the design of Parfait, a static layered program analysis framework for bug checking, designed for scalability and precision by improving false positive rates and scale to millions of lines of code. The Parfait framework is inherently parallelizable and makes use of demand driven analyses. In this paper we provide an example of several layers of analyses for buffer overflow, summarize our initial implementation for C, and provide preliminary results. Results are quantified in terms of correctly-reported, false positive and false negative rates against the NIST SAMATE synthetic benchmarks for C code.
    Categories and Subject Descriptors: D.2.4 [Software Engineering]: Software/Program Verification; D.2.8 [Software Engineering]: Metrics; D.3.4 [Programming Languages]: Processors
    ... library code to 6 MLOC for the core of the Solaris OS and the compilers. In talking to these Sun organizations it became clear that a set of requirements had not been addressed for those teams to be using existing off-the-shelf bug checking tools for C/C++ code. Such requirements are:
    • scalability: few tools can run over large (millions of lines of code) code bases in an efficient way. Several tools cannot parse the code or easily integrate with existing build environments, others can but may take too long (> 3 days) to run.
    • rate of false positives: the tools that support millions of lines of code tend to report many bugs that are not bugs, leading to dissatisfaction and lack of use of the tool.
  • Helix QAC and Klocwork: Which One Is Right for You? Perforce Static Code Analyzers Comparison Guide
    Perforce's static code analyzers, Helix QAC and Klocwork, have been trusted for over 30 years to deliver the most accurate and precise results to mission-critical project teams across a variety of industries. However, depending on your project, one of our software development tools may better meet your needs. Here, we break down both tools in order to help you decide which one is right for you.
    Helix QAC: Best For Functional Safety Compliance
    For over 30 years, Helix QAC has been the trusted static code analyzer for C and C++ programming languages. With its depth and accuracy of analysis, Helix QAC has been the preferred static code analyzer in tightly regulated and safety-critical industries that need to meet rigorous compliance requirements. Often, this involves verifying compliance with coding standards, such as MISRA and AUTOSAR, and functional safety standards, such as ISO 26262. Helix QAC is certified for functional safety compliance by SGS-
    Klocwork: Best For Developer Productivity, SAST, and DevOps
    Klocwork SAST and SAQT for C, C++, C#, and Java identifies software security, quality, and reliability issues and ensures compliance to a broad spectrum of recognized standards. Built for enterprise DevOps and DevSecOps, Klocwork scales to projects of any size, integrates with large complex environments, a wide range of developer tools, and provides control, collaboration, and reporting for the entire enterprise. This has made Klocwork the preferred static analyzer that keeps development velocity high while enforcing continuous compliance for security and quality.
  • 4D Ariadne the Static Debugger of Java Programs
    Zalán Szűgyi / István Forgács / Zoltán Porkoláb
    Periodica Polytechnica Electrical Engineering 55/3-4 (2011) 127–132, doi: 10.3311/pp.ee.2011-3-4.05, web: http://www.pp.bme.hu/ee, © Periodica Polytechnica 2011. RESEARCH ARTICLE. Received 2012-07-03
    Abstract: Development environments support the programmer in numerous ways from syntax highlighting to different refactoring and code generating methods. However, there are cases where these tools are limited or not usable, such as getting familiar with large and complex source codes written by a third person; finding the complexities of huge projects or finding semantic errors. In this paper we present our static analyzer tool, called 4D Ariadne, which concentrates on these problems. 4D Ariadne is a static debugger of Object Oriented applications written in Java programming language. It calculates data dependencies of objects being able to compute them both forward and backward.
    1 Introduction: During software development maintaining and refactoring programs or fixing bugs are essential part of this process. Although there are prevalent, good quality tools for the latter, there are only a few really reliable tools to maintain huge projects: finding the complexity of projects, finding the dependencies of different modules, understanding large software codes written by third party programmers or finding semantic errors. In this paper we present 4D Ariadne [21] that helps the programmer or the software architect to deal with maintenance and program comprehension. 4D Ariadne is a static debugger tool of Object Oriented programs written in Java programming language.
  • Klocwork 2019.2 System Requirements
    The following system configurations are required to run the Klocwork tools. To ensure the best experience, use the recommended settings listed below.
    Supported platforms
    The Klocwork Server and Build Tools packages are supported on the following operating systems (except where noted). This means that Klocwork has performed the full test suite on these operating systems with certain hardware and will provide technical support as specified in the Klocwork support policies. Note that for AIX, Mac, and Solaris, the Klocwork Server package is not supported. For more information, see .
    Klocwork Server
    Note: It is not possible to use Klocwork tools with SELinux (Security-Enhanced Linux) enabled.
    Processor: Intel and AMD 32 bit and 64 bit
    Operating system:
    • CentOS 7.5. As of Klocwork 2019.1, includes 7.5 to 7.6.
    • Debian 8.x to 8.11 and 9.x to 9.5. As of Klocwork 2019.1, includes 9.x to 9.8. As of Klocwork 2019.2, includes 9.x to 9.9.
    • Fedora 27 to 29. As of Klocwork 2019.2, includes 27 to 30.
    • OpenSUSE Leap 15, Tumbleweed
    • SUSE Enterprise 12 to 12 SP4 and Leap 15. As of Klocwork 2019.2, includes Enterprise/Leap 15.
    • Red Hat Enterprise Linux 7.5. As of Klocwork 2019.1, includes 7.5 to 7.6.
    • Ubuntu 16.04 to 16.04.4 LTS and 18.04 to 18.04.1 LTS. As of Klocwork 2019.1, includes 16.04 to 16.04.5 LTS and 18.10. As of Klocwork 2019.2, includes 16.04 to 16.04.5 LTS, 18.04 to 18.04.2 LTS, 18.10, 19.04.
  • Improving Code Quality in ROS Packages Using a Temporal Extension of First-Order Logic
    David Come (Universite Paul Sabatier, 31400 Toulouse, France); Julien Brunel (ONERA, 2 avenue Edouard Belin, 31400 Toulouse, France); David Doose (ONERA, 2 avenue Edouard Belin, 31400 Toulouse, France)
    Abstract: Robots are given more and more challenging tasks in domains such as transport and delivery, farming or health. Software is key components for robots, and ROS is a popular open-source middleware for writing robotics applications. Code quality matters a lot because a poorly written software is much more likely to contain bugs and will be harder to maintain over time. Within a code base, finding faulty patterns takes a lot of time and money. We propose a framework to search automatically user-provided faulty code patterns. This framework is based on FO++, a temporal extension of first-order logic, and Pangolin, a verification engine for C++ programs. We formalized with FO++ five faulty patterns related to ROS and embedded systems. We analyzed with Pangolin 25 ROS packages looking for occurrences of these patterns and found a total of 218 defects.
    ... (which are not necessarily bugs) will improve code quality. Finding such faulty patterns can be done manually by peer-review, but this is time and money consuming as it requires to divert one (or several) programmers from their current task to perform the review. Instead, we propose an approach in which (1) each pattern is specified in a formal language, having thus an unambiguous meaning, and (2) the detection of the pattern within the code relies on a formal technique (model checking) which is fully automatic and provides an exhaustive exploration of the code. The specification language for patterns, which we call ...
  • Using Static and Runtime Analysis to Improve Developer Productivity and Product Quality
    Bill Graham and Paul N. Leroux (QNX Software Systems); Todd Landry (Klocwork). April 2008
    Abstract: Static analysis can discover a variety of defects and weaknesses in system source code, even before the code is ready to run. Runtime analysis, on the other hand, looks at running software to detect problems as they occur, usually through sophisticated instrumentation. Some may argue that one form of analysis precludes the other, but developers can combine both techniques to achieve faster development and testing as well as higher product quality.
    The paper begins with static analysis, which prevents problems from entering the main code stream and ensures that any new code is up to standard. Using techniques such as abstract syntax tree (AST) validation and code path analysis, static analysis tools can uncover security vulnerabilities, logic errors, implementation defects, and other problems, both at the developer's desktop and at system build time.
    The paper then explores runtime analysis, which developers can perform during module development and system integration to catch any problems missed by static analysis. Runtime analysis not only detects pointer errors and other violations, but also helps optimize utilization of CPU cycles, RAM, flash memory, and other resources.
    The paper then discusses how developers can combine static and runtime analysis to prevent regressions as a product matures. This two-pronged approach helps to eliminate most problems early in the development cycle, when they cost least to fix.
  • Enterprise Application Security - How to Balance the use of Code Reviews and Web Application Firewalls for PCI compliance
    Ulf Mattsson, CTO Protegrity
    Introduction
    Payment Card Industry (PCI) Requirements
      PCI Requirement 6 - Developing and maintaining secure applications
      PCI Requirement 6.6 mandates the following:
      Complying with Requirement 6.6
      PCI quarterly network scans – too little too late
      Requirement 6.6 Option 1 – Application Code Reviews
      Requirement 6.6 Option 2 – Application Firewalls
    Application Layer Attacks
      Web Application Attacks
      Finding vulnerabilities in applications
      Different types of Firewalls
  • Automatic Coding Rule Conformance Checking Using Logic Programming
    Guillem Marpons (1), Julio Mariño (1), Manuel Carro (1), Ángel Herranz (1), Juan José Moreno-Navarro (1,2), and Lars-Åke Fredlund (1)
    (1) Universidad Politécnica de Madrid; (2) IMDEA Software. {gmarpons,jmarino,mcarro,aherranz,jjmoreno,lfredlund}@fi.upm.es
    Abstract: An extended practice in the realm of Software Engineering and programming in industry is the application of coding rules. Coding rules are customarily used to constrain the use (or abuse) of certain programming language constructions. However, these rules are usually written using natural language, which is intrinsically ambiguous and which may complicate their use and hinder their automatic enforcement. This paper presents some early work aiming at defining a framework to formalise and check for coding rule conformance using logic programming. We show how a certain class of rules – structural rules – can be reformulated as logic programs, which provides both a framework for formal specification and also for automatic conformance checking using a Prolog engine. Some examples of rules belonging to actual, third-party coding rule sets are discussed, along with the corresponding Prolog code. Experimental data regarding the practicality and impact of their application to real-life software projects is presented and discussed.
    Keywords: Coding rule checking, Declarative domain-specific languages and applications, Logic programming, Programming environments.
    1 Introduction: Although there is a trend towards increased use of higher-level languages in the software industry, offering convenient programming constructs such as type-safe execution, automatic garbage collection, etc., it is equally clear that more traditional programming languages like C (which is notorious for fostering dubious practices) remain very popular.
  • Using Static Code Analysis to Find Bugs Before They Become Failures
    Presented by Brian Walker, Senior Software Engineer, Video Product Line, Tektronix, Inc. Pacific Northwest Software Quality Conference, 2010
    PNSQC 2010: Using Static Code Analysis to Find Bugs Before They Become Failures, Brian Walker, 19 October 2010
    In the Beginning, There Was Lint
    • Syntactic analysis found many simple coding mistakes
      – Mismatched arguments
      – Incompatible data type usage
      – Uninitialized variables
      – Printf() format argument mismatch
    • Analyzed one source file at a time
    • Generated lots of warnings
      – Not all of them useful
    • Function is now integrated into most compilers
    Beyond Lint
    • Functional analysis without execution
      – Functional analysis, not just compilation errors
      – Data flow and control flow analysis
      – Typically automated
    • Incorporates some aspects of reverse engineering
      – Analyzes objects and their usage
      – Follows resource allocation, initialization and de-allocation
    • Considers the whole program
      – Analyses entire build, not just individual files
      – Examines all execution paths within functions and between functions
      – Calculates range of possible values for each variable
      – line by line, complete coverage (for known issues)
    • Quality, not quantity
      – Balance aggressiveness with restraint
    Elements of a Static Code Analysis System
    • Library of issues and patterns
      – Identifies problem severity
      – Ability to define new patterns or suppress unwanted patterns
    • Database for tracking issues
      – Manage issue priority and ignore false positives
      – Identify new issues over time and track as source files change
      – Identify fixed issues
    • Accessible issue reporting interface
      – Detailed description of specific issue and location in source code
      – Assist developers to understand, investigate and fix bugs
      – Sometimes integrated into IDE
  • Continuous Static Code Analysis for C, C++, C#, and Java
    PRODUCT BRIEF
    Klocwork is a modern, Agile static code analyzer that automatically scans code for violations based on C, C++, C#, and Java coding rules. It was designed to scale to projects of any size and work effectively within the DevOps cycle. With it, development teams are able to detect defects earlier in development, and ensure that the code is safe, secure, and reliable from the start.
    Benefits of Using Klocwork
    IMPROVE CODE QUALITY
    Klocwork can improve the overall quality of your software. It identifies must-fix defects and provides detailed guidance to help developers fix issues in the source code. What's more, Klocwork finds all of the most critical software issues and provides a low false positive rate.
    ACCELERATE DEVELOPMENT
    Klocwork integrates with build systems and continuous integration environments to accelerate development times by reducing bottlenecks. This enables development teams to identify defects earlier and more frequently, when they're easier and less costly to fix.
    MONITOR CODE QUALITY WITH REPORTS AND METRICS
    Klocwork Quality Standard provides an easy way to monitor, manage, and improve the reliability of your software projects. The built-in quality report classifies software defects into categories, such as suspicious code practices, resource leaks, maintainability, and performance. What's more, the report will show you the trends, new issues, and areas of code with the most issues in these categories.
    PRIORITIZE AND ADDRESS ISSUES FASTER WITH SMARTRANK
    Klocwork uses a sophisticated analysis to measure the complexity of code, detect any coding issues, and identify security vulnerabilities.
  • An Analysis of Software Quality and Maintainability
    AN ANALYSIS OF SOFTWARE QUALITY AND MAINTAINABILITY METRICS WITH AN APPLICATION TO A LONGITUDINAL STUDY OF THE LINUX KERNEL
    By Lawrence Gray Thomas
    Dissertation submitted to the Faculty of the Graduate School of Vanderbilt University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY in Computer Science
    August, 2008, Nashville, Tennessee
    Approved: Dr. Stephen R. Schach, Dr. Larry Dowdy, Dr. Julie Adams, Dr. Richard Alan Peters II, Dr. Ralph M. Butler (MTSU)
    Copyright © 2008 by Lawrence Gray Thomas. All Rights Reserved
    To my beloved wife, Peggy, and to the memory of my mother, Nina Irene Gray, Vanderbilt Class of 1947
    ACKNOWLEDGEMENTS
    This work was funded in part by a grant from Microsoft Corporation. I am also grateful to Klocwork Inc., for generously providing their software tools used in this research. I am especially indebted to Dr. Stephen R. Schach, Associate Professor of Computer Science and Computer Engineering at Vanderbilt University, who has been invaluable in supporting my research, and who has worked actively to provide me with the protected academic time and other necessary resources to pursue my research. Nobody has been more important to me in the pursuit of this goal than my loving wife, Peggy. I am thankful for her unending encouragement, support, inspiration, and patience.
    TABLE OF CONTENTS
    DEDICATION
    ACKNOWLEDGEMENTS