Challenges and Issues of Mining Crash Reports

Challenges and Issues of Mining Crash Reports Le An, Foutse Khomh SWAT, Polytechnique Montreal,´ Quebec,´ Canada fle.an, [email protected] Abstract—Automatic crash reporting tools built in many soft- In the rest of this paper, we describe the structure of a ware systems allow software practitioners to understand the crash report before presenting some analytic techniques to origin of field crashes and help them prioritise field crashes mine implicit information from crash reports. For each of the or bugs, locate erroneous files, and/or predict bugs and crash occurrences in subsequent versions of the software systems. In techniques, we discuss its challenges and issues. After giving this paper, after illustrating the structure of crash reports in an overview of the related work, we conclude the paper with Mozilla, we discuss some techniques for mining information from suggestions for future studies. crash reports, and highlight the challenges and issues of these techniques. Our aim is to raise the awareness of the research II. STRUCTURE OF CRASH REPORTS community about issues that may bias research results obtained In this section, we take crash reports from Mozilla Socorro from crash reports and provide some guidelines to address certain server (Mozilla’s crash collecting database) [4] as an example challenges related to mining crash reports. Index Terms—Crash report, bug report, mining software to describe the typical crash collecting system. repositories. Mozilla delivers its applications with a built-in automatic crash reporting tool: Mozilla Crash Reporter, which sends a crash report to the Mozilla Socorro server, once end users I. INTRODUCTION encounter an unexpected halt of the application. Each crash Nowadays, crash reporting tools are embedded in many report records a stack trace of the falling thread and other software systems to collect information about crashes in the information about the user’s environment. A stack trace is an field (i.e., when a program stops functioning properly in a ordered set of frames where each frame indicates a method user environment). Crashes with the same crashing signature, signature and a link to the corresponding source code. Source the stack trace of the failing thread, will be grouped into code information is not always available in the frames, espe- a crash type. Usually, software quality managers prioritise cially when a frame belongs to a third party binary [1]. crash types by the number of crash occurrences, then file Crash Report – e1c1267874640-94324-32423 Each Crash Report is the top crash types into bug reports. Crash reports provide assigned a unique ID Crash Time - OCT 24, 2010 11:20:53 useful reference to analyse and resolve bugs. They could help Firefox Install Time – SEP 22, 2010 10:20:15 System Uptime – 1125 seconds software practitioners locate erroneous code, understand the Version- 3.6.13 User Environment impact of failures, and prioritise crash-related bug reports. OS – Windows NT 6.1 2600 Information CPU – x86 Leveraging crash reports, previous studies propose new User Comment – ideas to improve the current software maintenance techniques. Stack Trace – Frame Module Signature Source All crash Reports with top Khomh et al. [1] analysed crash reports of Mozilla Firefox 0 @0x654789 signature as 1 User32.dll UserCallWinProcCheckWow “UserCallWinCheckWow” are and proposed an entropy metric that reveals the distribution of 2 User32.dll DispatchMethod grouped together 3 User32.dll DispatchMessage the occurrences of crashes among end users. Wang et al. [2] Not all frames have Source 4 XUI.dll ProcessNextNativeEvent Src/win/nsAppShell.cpp:179 Information studied the crash reports of Firefox and stack traces of Eclipse. 5 XUI.dll nsShell::OnProcess Src/win/nsShell.cpp:77 6 Nspr4.dll mozilla::Pump Ipc/glue/MessagePump.cpp:134 They proposed five rules to automatically identify correlated 7 XUI.dll MessageLoop:Run Ipc/glue/MessageLoop.c:784 crash types. In a recent study, we also analysed crash reports from Mozilla products to identify bugs that affect a large Fig. 1: A sample crash report from Firefox population of users [3]. However, some inherent properties of crash reports may Figure 1 shows a sample crash report from Mozilla Firefox. challenge the research conclusions if researchers and software Among all information provided in a crash report, researchers practitioners do not take care of them when they mine infor- and software practitioners may be interested in the following mation from crash reports. For example, user information is attributes when they study crashes for software maintenance: not always available in crash reports, and the many-to-many • crash signature: top method signature of a crash. Crashes relationship between crashes and their related bugs increases with the same crash signature are grouped into one crash the difficulty of analysis. Furthermore, since there are not type in the Socorro server [1]. enough publicly opened crash collecting databases, researchers • Install age: the time in seconds from the installation until can hardly estimate the latent peril due to some of these the crash occurred. issues (e.g., lack of precise users’ information) or validate the • Crash date: the point of time when the crash was re- generalisability of their findings. ported. /15/$31.00 c 2015 IEEE 17 SWAN 2015, Montréal, Canada • OS name: name of the operating system on which the Algorithm 1: Map different crashed users to a crash type, crash occurred. and map different crash types to a bug • OS version: version of the operating system on which the Input: signature, user, buglist crash occurred. 1 if signature in dictcrash then • CPU type: family model of the CPU of a machine on 2 stackuser dictcrash[signature] which the crash occurred. 3 add user to stackuser 4 else • Bug list: list of bugs related to the crash. 5 stackuser new stack with user • Up time: the amount of seconds since the application 6 dictcrash[signature] stackuser startup. 7 foreach bug in buglist do Software researchers and practitioners can refer to the So- 8 if bug in bug dict then corro documentation [5] for the description of other attributes. 9 setsignature dictbug[bug] 10 add signature to setsignature 11 else III. ANALYSIS OF CRASH REPORTS AND CHALLENGES 12 setsignature new Set with signature In this section, we describe some analytic techniques of 13 dictbug[bug] setsignature crash reports for software maintenance and discuss their challenges and issues. Algorithm 2: Map crashed user occurrences to their bugs A. User Identification Input: dictbug When we study crash reports, an important process is to 1 foreach bug in dictbug do identify unique users who encounter crashes in a software 2 setsignature dictbug[bug] 3 signature in set system. Mapping crashes to different users can help software foreach signature do 4 stackuser dictcrash[signature] practitioners understand the dispersion of these crashes in the 5 concatenate stackuser to occuruser user base and determine the priority of related crash types and bugs. However, user identity is not always available. For example, Mozilla crash reports do not contain personal information indicating unique users reporting the crashes due On the one hand, we can use the bug lists indicated in to privacy concerns. Khomh et al. [1] used a heuristic where crash reports to directly map crashes to bugs. However, some they took installed point of time (subtraction of crash date crashes are reported before the opening of their corresponding from install age) of the studied system to identify different bugs. If we apply the direct mapping only, we may omit the users (noted as install profiles). In our recent work [3], which information about these crashes. consisted in studying the crashing impact of bugs on the user On the other hand, we can also perform an indirect mapping population base, we used a vector of computer configuration from crashes to bugs via crash types. In other words, we (combination of CPU type, OS name, and OS version) to perform a two-step mapping as following: we map each bug represent different “users” (noted as machine profiles) in a to a set of crash types (i.e., a group of crashes with the system. same crash signature), then map each crash type to a set of However, these user identification heuristics may lead to crash reports. Concretely, we build two hash tables, dictbug inconsistent results. For example, when we apply the heuristic and dictcrash. In any item of dictbug, the key is a bug ID by install profile, two users who install an application at the and the value is a set of crash signatures. In any item of same point of time (by second) may encounter quite different dictcrash, the key is a crash signature and the value is a types of crashes. When we apply the heuristic by machine stack1 of crash reports. By associating the two hash tables, profile, users with the same computer configuration may also we can finally map a bug to a set of corresponding crash exhibit different crashing characteristics. In other words, users reports, even including those lacking bug information (i.e., that happened to install a software at the same time or in crashes reported before the creation of the bugs). Similarly, the same model of computer, especially for enterprise users to investigate the crash distribution in the user base, we can whose computers are uniformly configured or software is also link different users (represented by install profiles or simultaneously installed, may behave differently, and should machine profiles) to a bug by analysing the bug’s related crash not be clustered together in terms of crash analysis. reports. Algorithm 1 and 2 show the pseudocode to link a bug to the users suffering its related crashes.2 However, in B. Mapping between Crashes and Bugs a software system, a crash report may bring about several When we study the characteristics of bugs related to field bugs, a bug may be also related to more than one crashes.

Challenges and Issues of Mining Crash Reports

Crash Reporting: Mozilla’S Open Source Solution

An Autopsy of a Web Browser's Leaked Crash Reports

Why Did This Reviewed Code Crash? an Empirical Study of Mozilla Firefox

City Research Online

The Unreasonable Effectiveness of Traditional Information Retrieval in Crash Report Deduplication

Supporting Software Development Based on Crash Report Mining and Analysis

Privacy-Aware Remote Error Analysis on Commodity Software

Statistical Debugging

A Crash Reporting Library for Android

Stephen a Ridley

Look 360: an Android Application Using Google Analytics and Firebase with Mobile Cloud Computing

Ingimp: Introducing Instrumentation to an End-User Open Source Application Michael Terry, Matthew Kay, Brad Van Vugt, Brandon Slack, Terry Park David R