To What Extent Could We Detect Field Defects? an Empirical

View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Institutional Knowledge at Singapore Management University Singapore Management University Institutional Knowledge at Singapore Management University Research Collection School Of Information Systems School of Information Systems 9-2012 To what extent could we detect field defects? An empirical study of false negatives in static bug finding tools Ferdian THUNG Singapore Management University, [email protected] Lucia Lucia Singapore Management University, [email protected] David LO Singapore Management University, [email protected] Lingxiao JIANG Singapore Management University, [email protected] Premkumar DEVANBU University of California, Davis See next page for additional authors DOI: https://doi.org/10.1145/2351676.2351685 Follow this and additional works at: https://ink.library.smu.edu.sg/sis_research Part of the Software Engineering Commons Citation THUNG, Ferdian; Lucia, Lucia; LO, David; JIANG, Lingxiao; DEVANBU, Premkumar; and RAHMAN, Foyzur. To what extent could we detect field defects? An empirical study of false negatives in static bug finding tools. (2012). ASE 2012: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering: September 3-7, 2012, Essen, Germany. 50-59. Research Collection School Of Information Systems. Available at: https://ink.library.smu.edu.sg/sis_research/1591 This Conference Proceeding Article is brought to you for free and open access by the School of Information Systems at Institutional Knowledge at Singapore Management University. It has been accepted for inclusion in Research Collection School Of Information Systems by an authorized administrator of Institutional Knowledge at Singapore Management University. For more information, please email [email protected]. Author Ferdian THUNG, Lucia Lucia, David LO, Lingxiao JIANG, Premkumar DEVANBU, and Foyzur RAHMAN This conference proceeding article is available at Institutional Knowledge at Singapore Management University: https://ink.library.smu.edu.sg/sis_research/1591 To What Extent Could We Detect Field Defects? An Empirical Study of False Negatives in Static Bug Finding Tools Ferdian Thung1, Lucia1,DavidLo1, Lingxiao Jiang1, Foyzur Rahman2, and Premkumar T. Devanbu2, 1Singapore Management University, Singapore 2University of California, Davis {ferdianthung,lucia.2009,davidlo,lxjiang}@smu.edu.sg, {mfrahman,ptdevanbu}@ucdavis.edu ABSTRACT estimated that bugs cost the US economy billions of dollars Software defects can cause much loss. Static bug-finding annually [48]. Bugs are not merely economically harmful; tools are believed to help detect and remove defects. These they can also harm life & property when mission critical sys- tools are designed to find programming errors; but, do they tems malfunction. Clearly, techniques that can detect and in fact help prevent actual defects that occur in the field reduce bugs would be very beneficial. To achieve this goal, and reported by users? If these tools had been used, would many static analysis tools have been proposed to find bugs. they have detected these field defects, and generated warn- Static bug finding tools, such as FindBugs [28], JLint [5], ings that would direct programmers to fix them? To answer and PMD [14], have been shown to be helpful in detecting these questions, we perform an empirical study that investi- many bugs, even in mature software [8]. It is thus reason- gates the effectiveness of state-of-the-art static bug finding able to believe that such tools are a useful adjunct to other tools on hundreds of reported and fixed defects extracted bug finding techniques such as testing and inspection. from three open source programs: Lucene, Rhino, and As- Although static bug finding tools are effective in some set- pectJ. Our study addresses the question: To what extent tings, it is unclear whether the warnings that they generate could field defects be found and detected by state-of-the-art are really useful. Two issues are particularly important to static bug-finding tools? Different from past studies that are be addressed: First, many warnings need to correspond to concerned with the numbers of false positives produced by actual defects that would be experienced and reported by such tools, we address an orthogonal issue on the numbers users. Second, many actual defects should be captured by of false negatives. We find that although many field defects the generated warnings. For the first issue, there have been could be detected by static bug finding tools, a substantial a number of studies showing that the numbers of false warn- proportion of defects could not be flagged. We also analyze ings (or false positives) are too many, and some have pro- the types of tool warnings that are more effective in finding posed techniques to prioritize warnings [23–25, 44]. While field defects and characterize the types of missed defects. the first issue has received much attention, the second issue has received less. Many papers on bug detection tools just report the number of defects that they can detect. It Categories and Subject Descriptors is unclear how many defects are missed by these bug de- D.2.5 [Testing and Debugging]: Debugging aids/Testing tection tools. While the first issue is concerned with false tools positives, the second focuses on false negatives. We argue that both issues deserve equal attention as both have impact General Terms on the quality of software systems. If false positives are not satisfactorily addressed, this would make bug finding tools Experimentation, Measurement, Reliability unusable. If false negatives are not satisfactorily addressed, the impact of these tools on software quality would be min- Keywords imal. On mission critical systems, false negatives may even Static bug-finding tools, field defects, false negatives deserve more attention. Thus, there is a need to investigate the false negative rates of such tools on actual field defects. 1. INTRODUCTION Our study tries to fill this research gap by answering the following research question, and we use the term “bug” and Bugs are prevalent in many software systems. The Na- “defect” interchangeably both of which refer to errors or tional Institute of Standards and Technology (NIST) has flaws in a software: To what extent could state-of-the-art static bug finding tools detect field defects? Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are To investigate this research question, we make use of abun- not made or distributed for profit or commercial advantage and that copies dant data available in bug-tracking systems and software bear this notice and the full citation on the first page. To copy otherwise, to repositories. Bug-tracking systems, such as Bugzilla or JIRA, republish, to post on servers or to redistribute to lists, requires prior specific record descriptions of bugs that are actually experienced and permission and/or a fee. ASE ’12, September 3–7, 2012, Essen, Germany reported by users. Software repositories contain information Copyright 12 ACM 978-1-4503-1204-2/12/09 ...$15.00. on what code elements get changed, removed, or added at different periods of time. Such information can be linked to- pabilities of the tools, such tools may still suffer from false gether to track bugs and when and how they get fixed. JIRA negatives with respect to all kinds of defects. has the capability to link a bug report with the changed code In this study, we analyze several static bug finding tools that fixes the bug. Also, many techniques have been em- that make use of warning patterns for bug detection. These ployed to link bug reports in Bugzilla to their corresponding tools are lightweight and can scale to large programs. On the SVN/CVS code changes [19,51]. These data sources provide downside, these tools do not consider the specifications of a us descriptions of actual field defects and their treatments. system, and may miss defects due to specification violations. Based on the descriptions, we are able to infer root causes Other bug finding tools also use dynamic analysis tech- of defects (i.e., the faulty lines of code) from the bug treat- niques, such as dynamic slicing [50], dynamic instrumenta- ments. To ensure accurate identification of faulty lines of tion [38], directed random testing [12, 21, 45], and invariant code, we perform several iterations of manual inspections detection [11, 20]. Such tools often explore particular parts to identify lines of code that are responsible for the defects. of a program and produce no or few false positives. How- Then, we are able to compare the identified root causes with ever, they seldom cover all parts of a program; they are thus the lines of code flagged by static bug finding tools, and to expected to have false negatives. analyze the proportion of defects that are missed or captured There are also studies on bug prediction with data mining by the tools. and machine learning techniques, which may have both false In this work, we perform an exploratory study with three positives and negatives. For example, Sliwerski et al. [46] an- state-of-the-art static bug finding tools, FindBugs, PMD, alyze code change patterns that may cause defects. Ostrand and Jlint, on three reasonably large open source Java pro- et al. [40] use a regression model to predict defects. Nagap- grams, Lucene, Rhino, and AspectJ. Lucene, Rhino, and pan et al. [36] apply principal component analysis on the AspectJ have 58, 35, and 9 committers respectively1.We code complexity metrics of commercial software to predict use bugs reported in JIRA for Lucene version 2.9, and the failure-prone components. Kim et al. [33] predict potential iBugs dataset provided by Dallmeier and Zimmermann [19] faults from bug reports and fix histories. for Rhino and AspectJ. Our manual analysis identifies 200 real-life defects that we can unambiguously locate faulty 2.2 FindBugs lines of code.

To What Extent Could We Detect Field Defects? an Empirical

The Origins of Severe Software Defects on Evolving Information Systems: a Double Case Study

Marcia Knous: My Name Is Marcia Knous

Mozillamessaging.Com Site Redesign Site Map — Version 3.0 — September 22, 2008

A First Look at Firefox OS Security

Imagem Digitalizada

An Empirical Analysis of the Bug-Fixing Process in Open Source

Eclipse BIRT 2.5 Release Review May 27, 2009

The Bugzilla Guide - 2.22.7 Release

Mozilla Development Roadmap

Reduction of Redundant Rules in Association Rule Mining-Based Bug Assignment 1. Introduction an Essential Aspect of Software

Firefox OS Web Apps for Science

Escuela Técnica Superior De Ingenieros Industriales Y De Telecomunicación