An Autopsy of a Web Browser's Leaked Crash Reports

Crashing Privacy: An Autopsy of a Web Browser’s Leaked Crash Reports

Kiavash Satvat, University of Alabama at Birmingham
Nitesh Saxena, University of Alabama at Birmingham

arXiv:1808.01718v1 [cs.CR] 6 Aug 2018

ABSTRACT

Harm to the privacy of users through data leakage is not an unknown issue; however, it has not been studied in the context of crash reporting systems. Automatic Crash Reporting Systems (ACRS) are used by applications to report information about the errors that occur during a software failure. Although crash reports are valuable for diagnosing errors, they may contain users’ sensitive information. In this paper, we study such privacy leakage vis-a-vis browsers’ crash reporting systems. As a case study, we mine a dataset consisting of crash reports collected over a period of six years. Our analysis shows the presence of more than 20,000 session and token IDs, 600 passwords, 9,000 email addresses, an enormous amount of contact information, and other sensitive data. Our analysis sheds light on an important security and privacy issue in the current state-of-the-art browser crash reporting systems. Further, we propose a hotfix to enhance users’ privacy and security in ACRS by removing sensitive data from the crash report prior to submitting the report to the server. Our proposed hotfix can be easily integrated into the current implementation of ACRS and has no impact on the process of fixing bugs while maintaining the reports’ readability.

1 INTRODUCTION

A few studies have focused on the legal, financial, reputational, and other aspects of information disclosure [1, 10, 30]. Some of these studies analyzed a specific data leakage incident [1, 10]. Similarly, several studies analyzed the security and privacy of web browsers [8, 19, 31, 35]. However, no previous study has examined the risks that the current implementation of browsers’ Automatic Crash Reporting System (ACRS) may pose to users’ privacy.

A crash, an inevitable fact of software, occurs when a program unexpectedly terminates its normal execution. The use of ACRS is a common approach to collecting crash reports from users. ACRS eases troubleshooting across the variety of factors that contribute to a crash, such as the underlying hardware and associated software. Using ACRS, companies collect crash reports from clients to address application bugs and improve users’ experience during web browsing. This information is used by developers to detect the root cause of the crash and rectify the bug. Chrome, Firefox, Safari, and Opera are a few common names that use ACRS to collect field crashes [4, 16, 25, 27]. In web browsers, ACRS is designed to collect the stack trace of the failing thread, the user’s system environment information (e.g., OS and browser version), and any feedback the user provides on the crash.

However, a fact that has been widely disregarded is the content of the crash data collected from users’ systems and the amount of private information that can be found in crash reports, which may pose a threat to users’ privacy.

In this paper, to demonstrate the possible risks that the current implementation of ACRS may pose to users’ privacy, we study a dataset of partially anonymized crash reports released by one of the major browsers¹. Current state-of-the-art browsers collect relatively similar data, including the URL browsed at the time of the crash and runtime information, as these can be required for debugging. To the best of our knowledge, this is the first case study that scrutinizes a database of 2.5 million partially anonymized crash reports containing visited URLs, system runtime information, and users’ descriptions. In our inspection, we extracted a significant amount of private data, including approximately 20,000 session and token IDs, 600 clear-text passwords, 9,000 email addresses, and an enormous amount of contact information, including names, addresses, and phone numbers. The obtained results signify the deficiency of the current approach and show the potential for ACRS to be a compelling target for attackers. Furthermore, to address this deficiency, we propose a hotfix that removes sensitive information from the environmental information, users’ feedback, and visited URLs at the client side before the report is sent to the server.

¹ The name of the browser, the year, and the fields of the dataset are removed or anonymized, as discussed in Section 3.1.

Our Contribution. The detailed contributions of our work are as follows.

• Data Analysis (Section 4). We conduct an empirical study on partially anonymized crash reports released by one of the major web browsers. Through our inspection, we extract and present a significant amount of the personal information and confidential data contained in the reports, delineating the extent to which the current implementation of ACRS can be harmful to users’ security and privacy. Our analysis sheds light on an important security and privacy issue in the current state-of-the-art browser crash reporting systems.
• Deploying A Hotfix (Section 5). Based on the results of our data analysis, we propose a hotfix to enhance users’ privacy and security in ACRS by removing sensitive data from the users’ description field and the URL. Our hotfix can be easily integrated into the current implementation of ACRS and has no impact on the process of fixing bugs while maintaining the reports’ readability. At the same time, it protects users’ private data against unauthorized access in case the server is compromised or the data is leaked.

Paper Organization. The rest of this paper is organized as follows. Section 2 provides background on ACRS and explains the dataset used in this study. Section 3 defines the methodology used to analyze the data and present the reports. In Section 4, we present the results of our data analysis. Section 5 describes insights and countermeasures. Finally, Section 6 concludes our study and suggests future research directions.

2 STUDY PRELIMINARIES AND DATASET

2.1 ACRS Background

The use of ACRS is a common approach for collecting and handling crashes. ACRS consists of two main components. The first, which we refer to as the Collector, is a set of client-side libraries that collects data from the clients and transfers it to the server for further processing. The second, the Processor, is a set of servers and services responsible for analyzing, categorizing, and reporting the collected crashes. On the client side, the Collector, as the interface between the Processor and the clients, runs when the application terminates its normal execution process. The Collector typically asks for the user’s permission to send the crash report to the Processor or to quit without sending the report. When choosing to send the report to the Processor, the user has the option of whether to include the website visited at the time the crash occurred. The user can also provide additional information in the description field to explain the issue that caused the crash. Additionally, there might be an option for the user to provide an email address for future support and contributions. After these steps, the Collector transfers a dump of the crash report and the collected details to the Processor, which is responsible for further processing and for presenting the reports to developers and the general public.

The data collected from clients are referred to as crash reports. Each crash report is composed of two components. The first component is the stack trace of the failed thread, which is generally referred to as the minidump and carries information about the system memory. The second component is the runtime information of the user’s system and any feedback provided by the user. While these details are essential for the developers to detect where the problem lies, they may contain users’ sensitive information. For instance, the minidump, as a sensitive chunk of data, may contain information such as a username, password, or encryption key [7].

• Deleted Fields: Fields like email and login were deleted since they explicitly carry sensitive information.
• Masked Fields: In this category, some effort has been made toward data de-identification. In ip, some records were de-identified by replacing the last two digits of the IP address with zeros.
• Untouched Fields: All fields except email, login and address apparently remained untouched; however, not all of them necessarily convey meaningful information. In this paper, we mostly focus on these attributes to demonstrate the breach of information.

3 METHODOLOGY

In this section, we explain our approach for inspecting and analyzing the dataset. We also describe the methodology we use to present our results in Section 4.

3.1 Ethical Consideration

We conducted our research based on a dataset which was initially published and later removed following a report regarding the presence of sensitive information. During our scrutiny, however, we noticed the presence of similar data which is widely available for public use. Therefore, our study aims to highlight, for researchers, software developers, and society, a potential pitfall associated with the current implementation of ACRS. Our study neither propagates the data nor draws attention to any specific company or individual. Hence, we believe our study causes no further harm to victims. To preserve the privacy of the associated individuals, companies, and third parties, and to avoid working directly with this sensitive information, we used complex queries and avoided performing a manual check. Therefore, all results provided in Section 4 represent a close estimation based on the output of our queries. For the presentation, we removed identifiable information from our results, including but not limited to the names of individuals, websites, applications, and companies, and their contact information, such as phone numbers and addresses. The de-identification process was performed using a Python script replacing this sensitive
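To make the idea of the client-side hotfix more concrete (removing sensitive values from the visited URL and the user’s description before the report leaves the machine), the following is a minimal Python sketch. It is not the authors’ implementation: the parameter list SENSITIVE_PARAMS, the email pattern, the REDACTED placeholder, and the report keys "url" and "description" are all assumptions chosen for illustration.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of query parameters treated as sensitive (an assumption,
# not a list taken from the paper).
SENSITIVE_PARAMS = {"password", "passwd", "pwd", "token", "session",
                    "sessionid", "sid", "auth", "email"}

# Deliberately simple email pattern; real addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
REDACTED = "REDACTED"


def scrub_text(text: str) -> str:
    """Replace anything that looks like an email address in free text."""
    return EMAIL_RE.sub(REDACTED, text)


def scrub_url(url: str) -> str:
    """Redact sensitive query values, drop userinfo and the fragment."""
    parts = urlsplit(url)
    query = [
        (key, REDACTED if key.lower() in SENSITIVE_PARAMS else scrub_text(value))
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # "user:pass@host" -> "host"; a netloc without "@" is returned unchanged.
    netloc = parts.netloc.rpartition("@")[2]
    # The fragment is dropped entirely because it may carry tokens.
    return urlunsplit((parts.scheme, netloc, parts.path, urlencode(query), ""))


def scrub_report(report: dict) -> dict:
    """Return a copy of the report with the URL and description scrubbed.

    The keys "url" and "description" are hypothetical field names.
    """
    cleaned = dict(report)
    if "url" in cleaned:
        cleaned["url"] = scrub_url(cleaned["url"])
    if "description" in cleaned:
        cleaned["description"] = scrub_text(cleaned["description"])
    return cleaned
```

In this sketch the host and path are preserved, so a developer can still see which page was open at the time of the crash, while credentials and identifiers in the query string are replaced by a placeholder. Whether a deny-list of parameter names is sufficient in practice is an open design question; an allow-list would be stricter at the cost of losing more debugging context.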
