Panalyst: Privacy-Aware Remote Error Analysis on Commodity Software

Rui Wang†, XiaoFeng Wang† and Zhuowei Li‡
†Indiana University at Bloomington, ‡Center for Software Excellence, Microsoft
{wang63,xw7}@indiana.edu, [email protected]

Abstract

Remote error analysis aims at timely detection and remedy of software vulnerabilities through analyzing runtime errors that occur on the client. This objective can only be achieved by offering users effective protection of their private information and minimizing the performance impact of the analysis on their systems, without undermining the amount of information the server can access for understanding errors. To this end, we propose in this paper a new technique for privacy-aware remote analysis, called Panalyst. Panalyst includes a client component and a server component. Once a runtime exception happens to an application, Panalyst client sends the server an initial error report that includes only public information regarding the error, such as the length of the packet that triggers the exception. Using an input built from the report, Panalyst server performs a taint analysis and symbolic execution on the application, and adjusts the input by querying the client about the information upon which the execution of the application depends. The client agrees to answer only when the reply does not give away too much user information. In this way, an input that reproduces the error can be gradually built on the server under the client's consent. Our experimental study of this technique demonstrates that it exposes a very small amount of user information, introduces negligible overheads to the client and enables the server to effectively analyze an error.

1 Introduction

Remote analysis of program runtime errors enables timely discovery and patching of software bugs, and has therefore become an important means to improve software security and reliability. As an example, Microsoft is reported to fix 29 percent of all Windows XP bugs within Service Pack 1 through its Windows Error Reporting (WER) utility [20]. Remote error analysis is typically achieved by running an error reporting tool on a client system, which gathers data related to an application's runtime exception (such as a crash) and transmits them to a server for diagnosis of the underlying software flaws. This paradigm has been widely adopted by software manufacturers. For example, Microsoft relies on WER to collect data should a crash happen to an application. Similar tools developed by third parties are also extensively used. An example is BugToaster [27], a free crash analysis tool that queries a central database using the attributes extracted from a crash to seek a potential fix. These tools, once complemented by automatic analysis mechanisms [44, 34] on the server side, will also contribute to quick detection and remedy of critical security flaws that can be exploited to launch a large-scale cyber attack such as a worm epidemic [47, 30].

The primary concern of remote error analysis is its privacy impact. An error report may include private user information such as a user's name and the data she submitted to a website [9]. To reduce information leaks, error reporting systems usually collect only a small amount of information related to an error, for example, a snippet of the memory around a corrupted pointer. This treatment, however, does not sufficiently address the privacy concern, as the snippet may still carry confidential data. Moreover, it can also make an error report less informative for the purpose of rapid detection of the causal bugs, some of which could be security critical. To mitigate this problem, prior research proposes to instrument an application to log its runtime operations and submit the sanitized log once an exception happens [25, 36]. Such approaches affect the performance of an application even when it works normally, and require nontrivial changes to the application's code: for example, Scrash [25] needs to do source-code transformation, which makes it unsuitable for debugging commodity software. In addition, these approaches still cannot ensure that sufficient information is gathered for a quick identification of critical security flaws.

Alternatively, one can analyze a vulnerable program directly on the client [29]. This involves intensive debugging operations such as replaying the input that causes a crash and analyzing an executable at the instruction level [29], which could be too intrusive to the user's normal operations to be acceptable for a practical deployment. Another problem is that such an analysis consumes a large amount of computing resources. For example, instruction-level tracing of program execution usually produces an execution trace of hundreds of megabytes [23]. This can hardly be afforded by a client with limited resources, such as a Pocket PC or PDA.

We believe that a good remote analyzer should help the user effectively control the information to be used in an error diagnosis, and avoid expensive operations on the client side and modification of an application's source or binary code. On the other hand, it is also expected to offer sufficient information for automatic detection and remedy of critical security flaws. To this end, we propose Panalyst, a new technique for privacy-aware remote analysis of the crashes triggered by network inputs. Panalyst aims at automatically generating a new input on the server side to accurately reproduce a crash that happens on the client, using the information disclosed according to the user's privacy policies. This is achieved through collaboration between its client component and server component. When an application crashes, Panalyst client identifies the packet that triggers the exception and generates an initial error report containing nothing but the public attributes of the packet, such as its length. Taking the report as a "taint" source, Panalyst server performs an instruction-level taint analysis of the vulnerable application. During this process, the server may ask the client questions related to the content of the packet, for example, whether a tainted branching condition is true. The client answers the questions only if the amount of information leaked by its answer is permitted by the privacy policies. The client's answers are used by the server to build a new packet that causes the same exception to the application, and to determine the properties of the underlying bug, particularly whether it is security critical.

Panalyst client measures the information leaks associated with individual questions using entropy. Our privacy policies use this measure to define the maximal amount of information that can be revealed for individual fields of an application-level protocol. This treatment enables the user to effectively control her information during error reporting. Panalyst client does not perform any intensive debugging operations and therefore incurs only negligible overheads. It works on commodity applications without modifying their code. These properties make a practical deployment of our technique plausible. In the meantime, our approach can effectively analyze a vulnerable application and capture the bugs that are exploitable by malicious inputs. Panalyst can be used by software manufacturers to demonstrate their "due diligence" in preserving their customers' privacy, and by a third party to facilitate collaborative diagnosis of vulnerable software.

We sketch the contributions of this paper as follows:

• Novel framework for remote error analysis. We propose a new framework for remote error analysis. The framework minimizes the impact of an analysis on the client's performance and resources, lets the user maintain full control of her information, and in the meantime provides her the convenience to contribute to the analysis the maximal amount of information she is willing to reveal. On the server side, our approach interleaves the construction of an accurate input for triggering an error, which is achieved through interactions with the client, and the analysis of the bug that causes the error. This feature allows our analyzer to make full use of the information provided by the client: even if such information is insufficient for reproducing the error, it helps discover part of the input attributes, which can be fed into other debugging mechanisms such as fuzz testing [35] to identify the bug.

• Automatic control of information leaks. We present our design of new privacy policies that define the maximal amount of information that can be leaked for individual fields of an application-level protocol. We also developed a new technique to enforce such policies, which automatically evaluates the information leak caused by responding to a question and then decides whether to submit the answer in accordance with the policies.

• Implementation and evaluations. We implemented a prototype system of Panalyst and evaluated it using real applications. Our experimental study shows that Panalyst can accurately restore the causal input of an error without leaking out too much user information. Moreover, our technique has been demonstrated to introduce nothing but negligible overheads to the client.

The rest of the paper is organized as follows. Section 2 formally models the problem of remote error analysis. Section 3 elaborates the design of Panalyst. Section 4 describes the implementation of our prototype system. Section 5 reports an empirical study of our technique using the prototype. Section 6 discusses the limitations of our current design. Section 7 presents the related prior research, and Section 8 concludes the paper and envisions future research.

2 Problem Description

We formally model the problem of remote error analysis as follows. Let P : S × I → S be a program that maps an initial process state s ∈ S and an input i ∈ I to an end state. A state here describes the data in memory, on disk and in registers that are related to the process of P. A subset of S, E_b, contains all possible states the process can end at after an input exploits a bug b.

Once P terminates at an error state, the client runs an error reporting program G : I → R to generate a report r ∈ R for analyzing P on the server. The report must be created under the constraints of the computing resources the client is able or willing to commit. Specifically, C_t : {G} × I × R → ℝ⁺ measures the delay experienced by the user during report generation, C_s : {G} × I × R → ℝ⁺ measures the storage overhead, and C_w : {G} × I × R → ℝ⁺ measures the bandwidth used for transmitting the report. To produce and submit a report r ∈ R, the computation time, storage consumption and bandwidth usage must be bounded by certain thresholds: formally, (C_t(G, i, r) ≤ Th_t) ∧ (C_s(G, i, r) ≤ Th_s) ∧ (C_w(G, i, r) ≤ Th_w), where Th_t, Th_s and Th_w represent the thresholds for time, storage space and bandwidth respectively. In addition, r is allowed to be submitted only when the amount of information it carries is acceptable to the user. This is enforced using a function L : R × I → ℝ⁺ that quantifies the information leaked out by r, and a threshold Th_l. Formally, we require L(r, i) ≤ Th_l.

The server runs an analyzer D : R → I to diagnose the vulnerable program P. D constructs a new input using r to exploit the same bug that causes the error on the client. Formally, given P(i) ∈ E_b and r = G(i), the analyzer identifies another input i' from r such that P(i') ∈ E_b. This is also subject to resource constraints. Specifically, let C_t : {D} × R × I → ℝ⁺ be a function that measures the computation time for running D and C_s : {D} × R × I → ℝ⁺ one that measures the storage overhead. We have: (C_t(D, r, i') ≤ Th_t) ∧ (C_s(D, r, i') ≤ Th_s), where Th_t and Th_s here are the server's thresholds for time and space respectively.

A solution to the above problem is expected to achieve three objectives:

• Low client overheads. A practical solution should work effectively under very small Th_t, Th_s and Th_w. Remote error analysis aims at timely detection of critical security flaws, which can only be achieved when most clients are willing to collaborate most of the time. However, this will not happen unless the client-side operations are extremely lightweight, as clients may have limited resources and their workloads may vary with time. Actually, customers can be very sensitive to the overheads brought in by error reporting systems. For example, advice has been given to turn off WER on Windows Vista and Windows Mobile to improve their performance [12, 17, 13]. Therefore, it is imaginable that many may stop participating in error analysis in response to even a slight increase in overheads. As a result, the chance to catch dangerous bugs can be significantly reduced.

• Control of information leaks. The user needs to have full control of her information during an error analysis. Otherwise, she may choose not to participate. Indispensable to this objective is a well-constructed function L that offers the user a reasonable measure of the information within an error report. In addition, privacy policies built upon L and a well-designed policy enforcer will automate the information control, thereby offering the user a reliable and convenient way to protect her privacy.

• Usability of error reports. Error reports submitted by the user should contain ample information to allow a new input i' to be generated within a short period of time (small Th_t) and at a reasonable storage overhead (small Th_s). The reports produced by existing systems include little information, for example, a snapshot of the memory around a corrupted pointer. As a result, an analyzer may need to exhaustively explore a vulnerable program's branches to identify the bug that causes the error. This process can be very time-consuming. To improve this situation, it is important to have a report that gives a detailed description of how an exploit happens.

In Section 3, we present an approach that achieves these objectives.

3 Our Approach

In this section, we first present an overview of Panalyst and then elaborate on the designs of its individual components.

3.1 Overview

Panalyst has two components, client and server. Panalyst client logs the packets an application receives, notifies the server of its runtime error, and helps the server analyze the error by responding to its questions as long as the answers are permitted by the user's privacy policies. Panalyst server runs an instruction-level taint analysis on the application's executable using an empty input, and evaluates the execution symbolically [37] in the meantime.

Figure 1: The Design of Panalyst.

Whenever the server encounters a tainted value that affects the choice of execution paths or memory access, it queries the client using the symbolic expression of that value. From the client's answer, the server uses a constraint solver to compute the values of the input bytes that taint the expression. We illustrate the design of our approach in Figure 1.

An example. Here we explain how Panalyst works through an example, the program described in Figure 2. The example is a simplified version of Null-HTTPd [8]. It is written in C for illustration purposes: Panalyst actually is designed to work on binary executables. The program first checks whether a packet is an HTTP POST request. If so, it allocates a buffer with the size computed by adding 1024 to an integer derived from the Content-Length field, and moves the content of the request to that buffer. A problem here is that a buffer overflow can happen if Content-Length is set to be negative, which makes the buffer smaller than expected. When this happens, the program may crash as a result of writing to an illegal address, or it may be terminated by an error detection mechanism such as GLIBC error detection.

Figure 2: An Illustrative Example.

Panalyst client logs the packets recently received by the program. In response to a crash, the client identifies the packet being processed and notifies Panalyst server of the error. The server then starts analyzing the vulnerable program at the instruction level using an empty HTTP request as a taint source. The request is also described by a set of symbols, which the server uses to compute a symbolic expression for the value of every tainted memory location or register. When the execution of the program reaches Line 1 in Figure 2, the values of the first four bytes of the request need to be revealed so as to determine the branch the execution should follow. For this purpose, the server sends the client a question: "B1B2B3B4 = 'POST'?", where B_j represents the jth byte of the request. The client checks its privacy policies, which define the maximal number of bits of information allowed to be leaked for individual HTTP fields. In this case, the client is permitted to reveal the keyword POST, which is deemed nonsensitive. The server then fills the empty request with these letters and moves on to the branch determined by the client's answer. The instruction on Line 2 calls malloc. The function accesses memory using a pointer built upon the content of Content-Length, which is tainted. To enable this memory access, the server sends the symbolic expression of the pointer to the client to query its concrete value. The client's reply allows the server to add more bytes to the request it is working on. Finally, the execution hits Line 3, a loop that moves request content to the buffer allocated through malloc. The loop is identified by the server from its repeated instruction pattern. Then, a question is delivered to the client to query its exit condition: "where is the first byte B_j = '\n'?".
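Figure 2 appears only as an image in the original paper. The C sketch below is our hedged reconstruction of its gist from the walkthrough above; the helper get_content_length() and the exact line numbering are our assumptions, not the paper's code.

```c
/* Hedged reconstruction of the Figure 2 example: a simplified
   Null-HTTPd-style POST handler. */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extracts the Content-Length value, which may be
   negative for a hostile request. */
static int get_content_length(const char *req)
{
    const char *p = strstr(req, "Content-Length:");
    return p ? atoi(p + 15) : 0;
}

void handle_request(const char *req, const char *body)
{
    if (strncmp(req, "POST", 4) != 0)       /* "Line 1": tainted branch */
        return;
    int len = get_content_length(req);      /* derived from a tainted field */
    char *buf = malloc(len + 1024);         /* "Line 2": tainted size; a
                                               negative len yields a buffer far
                                               smaller than the request body */
    if (buf == NULL)
        return;
    for (int i = 0; body[i] != '\n'; i++)   /* "Line 3": tainted loop condition */
        buf[i] = body[i];                   /* heap overflow when len < 0 */
    free(buf);
}
```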

This question concerns request content, a field on which the privacy policies forbid the client to leak out more than a certain amount of information. Suppose that threshold is 5 bytes. To answer the question, only one byte needs to be given away: the position of the byte '\n'. Therefore, the client answers the question, which enables the server to construct a new packet to reproduce the crash.

The performance of an analysis can be improved by sending the server an initial report with all the fields that are deemed nonsensitive according to the user's privacy policies. In the example, these fields include keywords such as 'POST' and the Content-Length field. This treatment reduces the communication overheads during an analysis.

Threat model. We assume that the user trusts the information provided by the server but does not trust her data with the server. The rationale behind this assumption is based upon the following observations. The owners of the server are often software manufacturers, who have little incentive to steal their customers' information. What the user does not trust is the way in which those parties manage her data, as improper management of the data can result in leaks of her private information. Actually, the same issue is also of concern to those owners, as they could be reluctant to take on the liability for protecting user information. Therefore, the client can view the server as a benign though unreliable partner, and take advantage of the information it discovers from the vulnerable program to identify sensitive data, which we elaborate in Section 3.2.

Note that this assumption is not fundamental to Panalyst: more often than not, the client is capable of identifying sensitive data on its own. As an example, the aforementioned analysis of the program in Figure 2 does not rely on any trust in the server. Actually, the assumption only serves an approach for defining fine-grained privacy policies in our research (Section 3.2), and elimination of the assumption, though it may lead to coarser-grained policies under some circumstances, will not invalidate the whole approach.

3.2 Panalyst Client

Panalyst client is designed to work on computing devices with various resource constraints. Therefore, it needs to be extremely lightweight. The client also includes a set of policies for protecting the user's privacy and a mechanism to enforce them. We elaborate such a design as follows.

Packet logging and error reporting. Panalyst client intercepts the packets received by an application, extracts their application-level payloads and saves them to a log file. This can be achieved either through capturing packets at the network layer using a sniffer such as Wireshark [1], or by interposing on the calls for receiving data from the network. We chose the latter for prototyping the client: in our implementation, an application's socket calls are intercepted using ptrace [10] to dump the application-level data to a log. The size of the file is bounded, and therefore only the most recent packets are kept.

When a serious runtime error happens, the process of a vulnerable program may crash, which triggers our error analysis mechanism. Runtime errors can also be detected by mechanisms such as GLIBC error detection, Windows built-in diagnostics [11] or other runtime error detection techniques [28, 21]. Once an error happens to an application, Panalyst client identifies the packets it is working on. This is achieved in our design by looking at all the packets within one TCP connection. Specifically, the client marks the beginning of a connection once it observes an accept call from the application, and the end of the connection when it detects close. After an exception happens, the client concatenates the application-level payloads of all the packets within the current connection to form a message, which it uses to talk to the server. For simplicity, our current design focuses on errors triggered by network input and assumes that all information related to the exploit is present in a single connection. Panalyst can be extended to handle errors caused by other inputs, such as data from a local file, through logging and analyzing these inputs. It could also work on multiple connections with the support of state-of-the-art replay techniques [43, 32] that are capable of replaying a whole application-layer session to the vulnerable application on the server side. When a runtime error occurs, Panalyst client notifies the server of the type of the error, for example, segmentation fault or illegal instruction. Moreover, the client can ship to the server the part of the message responsible for the error, given that such information is deemed nonsensitive according to the user's privacy policies.

After reporting a runtime error to the server, Panalyst client starts listening on a port to wait for questions from the server. Panalyst server may ask two types of questions, related either to a tainted branching condition or to a tainted pointer the vulnerable program uses to access memory. In the first case, the client is supposed to answer "yes" or "no" to a question described by a symbolic inequality: C(B_k[1], ..., B_k[m]) ≤ 0, where B_k[j] (1 ≤ j ≤ m) is the symbol for the k[j]th byte of the causal message. In the second case, the client is queried about the concrete value of a symbolic pointer S(B_k[1], ..., B_k[m]). These questions can be easily addressed by the client using the values of these bytes on the message. However, the answers can be delivered to the server only after they are checked against the user's privacy policies, which we describe below.
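The client-side handling of these two question types can be sketched as follows. This is our illustration only: the types are hypothetical, and the eval callback stands in for the interpretation of the symbolic expression actually shipped by the server.

```c
/* Sketch of the client answering a server question from the logged message. */
typedef enum { Q_BRANCH, Q_POINTER } qtype_t;

typedef struct {
    qtype_t type;
    int     pos[16];   /* k[1..m]: byte positions the symbols refer to */
    int     m;         /* number of symbols in the expression */
    /* computes C(...) for a branch question or S(...) for a pointer one */
    long  (*eval)(const unsigned char *bytes, int m);
} question_t;

/* Returns 1/0 for "C(...) <= 0?" or the concrete pointer value; the result
   must still pass the policy enforcer before being sent to the server. */
long answer_question(const question_t *q, const unsigned char *msg)
{
    unsigned char vals[16];
    for (int i = 0; i < q->m; i++)
        vals[i] = msg[q->pos[i]];       /* concrete bytes from the log */
    long r = q->eval(vals, q->m);
    return (q->type == Q_BRANCH) ? (r <= 0) : r;
}
```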

Privacy policies. Privacy policies here are designed to specify the maximal amount of information that can be given away during an error analysis. Therefore, they must be built upon a proper measure of information. Here, we adopt entropy [48], a classic concept of information theory, as the measure. Entropy quantifies uncertainty as a number of bits. Specifically, suppose that an application field A is equally likely to take one of m different values. The entropy of A is computed as log2 m bits. If the client reveals that A makes a path condition true, which reduces the possible values the field can have to a proportion ρ of m, the exposed information is quantified as: log2 m − log2(ρm) = −log2 ρ bits.

The privacy policies used in Panalyst define the maximal number of bytes of the information within a protocol field that can be leaked out. The number here is called the leakage threshold. Formally, denote the leakage threshold for a field A by τ. Suppose the server can infer from the client's answers that A can take a proportion ρ of all possible values of that field. The privacy policy requires that the following hold: −log2 ρ ≤ τ. For example, a policy can specify that no more than 2 bytes of the URL information within an HTTP request can be revealed to the server. This policy design can achieve fine-grained control of information. As an example, let us consider HTTP requests: protocol keywords such as GET and POST are usually deemed nonsensitive, and therefore can be directly revealed to the server; on the other hand, the URL field and the cookie field can be sensitive, and need to be protected by low leakage thresholds. Panalyst client includes a protocol parser to partition a protocol message into fields. The parser does not need to be precise: if it cannot tell two fields apart, it just treats them as a single field.

A problem here is that applications may use closed protocols such as ICQ and SMB whose specifications are not publicly available. For these protocols, the whole protocol message has to be treated as a single field, which unfortunately greatly reduces the granularity of control privacy policies can have. One solution to this problem is to partition information using the parameters of API functions (such as those of the Linux kernel API, GLIBC or the Windows API) that work on network input. For example, suppose that the GLIBC function fopen builds its parameters upon an input message; we can infer that the part of the message related to file access modes (such as 'read' and 'write') can be less sensitive than that concerning the file name. This approach needs a model of API functions and trust in the information provided by the server. Another solution is to partition an input stream using a set of tokens and common delimiters such as '\n'. Such tokens can be specified by the user. For example, using the token 'secret' and the delimiter '.', we can divide the URL 'www.secretservice.gov' into the following fields: 'www', '.', 'secretservice' and 'gov'. Upon these fields, different leakage thresholds can be defined. These two approaches can work together and can also be applied to specify finer-grained policies within a protocol field when the protocol is public.

To facilitate specification of the privacy policies, Panalyst can provide the user with policy templates set by an expert. Such an expert can be any party who has knowledge about the fields and the amount of information that can be disclosed without endangering the content of a field. For example, people knowledgeable about the HTTP specifications are in the position to label fields like 'www' as nonsensitive and domain names such as 'secretservice.gov' as sensitive. Typically, protocol keywords, delimiters and some API parameters can be treated as public information, while fields such as those including the tokens and other API parameters are deemed sensitive. A default leakage threshold for a sensitive field can be just a few bytes: for example, we can allow one or two bytes to be disclosed from a domain-name field, because they are too general to be used to pinpoint the domain name; as another example, up to four bytes can be exposed from a field that may involve credit-card numbers, because people usually tolerate such information leaks in real life. Note that we may not be able to assign a zero threshold to a sensitive field because this can easily cause an analysis to fail: to proceed with an analysis, the server often needs to know whether the field contains some special byte such as a delimiter, which gives away a small amount of information regarding its content. These policy templates can be adjusted by a user to define her customized policies. A sketch of the resulting policy check appears below.
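As a minimal sketch of the entropy-based check described above (our illustration; the byte-to-bit conversion of τ is our assumption about how the comparison is made):

```c
/* Given the fraction rho of a field's values still consistent with the
   client's answers, the leaked information is -log2(rho) bits; it must
   stay within the field's leakage threshold tau (given in bytes). */
#include <math.h>

int leak_permitted(double rho, double tau_bytes)
{
    double leaked_bits = -log2(rho);        /* log2(m) - log2(rho * m) */
    return leaked_bits <= tau_bytes * 8.0;  /* one byte = 8 bits of entropy */
}
```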

Policy enforcement. To enforce privacy policies, we need to quantify the information leaked by the client's answers. This is straightforward in some cases but less so in others. For example, we know that answering 'yes' to the question "B1B2B3B4 = 'POST'?" in Figure 2 gives away four bytes; however, information leaks can be more difficult to gauge when it comes to questions like "B_j × B_k < 256?", where B_j and B_k indicate the jth and the kth bytes of a message respectively. Without loss of generality, let us consider a set of bytes (B_k[1], ..., B_k[m]) of a protocol message, whose concrete values on the message make a condition "C(B_k[1], ..., B_k[m]) ≤ 0" true. To quantify the information an answer to the question gives away, we need to know ρ, the proportion of all possible values these bytes can take that make the condition true. Finding ρ is nontrivial because the set of values these bytes can have can be very large, which makes it impractical to check them one by one against the inequality. Our solution to the problem is based upon the classic statistical technique for estimating a proportion in a population. Specifically, we randomly pick a set of values for these bytes to verify a branching condition, and repeat the trial n times. From these n trials, we can estimate the proportion ρ as x/n, where x is the number of trials in which the condition is true. The accuracy of this estimate is described by the probability that a range of values contains the true value of ρ. The range here is called the confidence interval and the probability the confidence level. Given a confidence interval and a confidence level, a standard statistical technique can be used to determine the sample size n [2]. For example, suppose the estimate of ρ is 0.3 with a confidence interval of ±0.05 and a confidence level of 0.95, which intuitively means 0.25 < ρ < 0.35 with probability 0.95; in this case, the number of trials we need to run is 323. This approach offers an approximation of information leaks: in the prior example, we know that with 0.95 confidence, the information being leaked will be no more than −log2 0.25 = 2 bits. Using such an estimate and a predetermined leakage threshold, a policy enforcer can decide whether to let the client answer a question.

3.3 Panalyst Server

Panalyst server starts working on a vulnerable application upon receiving an initial error report from the client. The report includes the type of the error and other nonsensitive information such as the corrupted pointer, the lengths of individual packets' application-level payloads and the content of public fields. Based upon it, the server conducts an instruction-level analysis of the application's executable, which we elaborate as follows.

Taint analysis and symbolic execution. Panalyst server performs a dynamic taint analysis on the vulnerable program, using a network input built upon the initial report as a taint source. The input involves a set of packets whose application-layer payloads form a message characterized by the same length as the client's message and the information disclosed by the report. The server monitors the execution of the program instruction by instruction to track tainted data according to a set of taint-propagation rules. These rules are similar to those used in other taint-analysis techniques such as RIFLE [51], TaintCheck [44] and LIFT [45]; examples are presented in Table 1. Along with the dynamic analysis, the server also performs a symbolic execution [37] that statically evaluates the execution of the program through interpreting its instructions, using symbols instead of real values as input. Each symbol used by Panalyst represents one byte of the input message. Analyzing the program in this way, we can not only keep close track of tainted data flows, but also formulate a symbolic expression for every tainted value in memory and registers.

Whenever the execution encounters a conditional branching whose condition is tainted by input symbols, the server sends the condition as a question to the client to seek an answer. With the answer from the client, the server can find hypothetic values for these symbols using a constraint solver. For example, a "no" to the question B_i = '\n' may result in a letter 'a' being assigned to the ith byte of the input. To keep the runtime data consistent with the hypothetic value of symbol B_i, the server updates all the tainted values related to B_i by evaluating their symbolic expressions with the hypothetic value. It is important to note that B_i may appear in multiple branching conditions (C_1 ≤ 0, ..., C_k ≤ 0). Without loss of generality, suppose all of them are true. To find a value for B_i, the constraint solver must solve the constraint (C_1 ≤ 0) ∧ ... ∧ (C_k ≤ 0). The server also needs to "refresh" the tainted values concerning B_i each time a new hypothetic value of the symbol comes up.

The server also queries the client when the program attempts to access memory through a pointer tainted by input symbols (B_k[1], ..., B_k[m]). In this case, the server needs to give the symbolic expression of the pointer S(B_k[1], ..., B_k[m]) to the client to get its value v, and solve the constraint S(B_k[1], ..., B_k[m]) = v to find these symbols' hypothetic values. Querying a tainted pointer is necessary for ensuring the program's correct execution, particularly when a write happens through such a pointer. It is also an important step for reliably reproducing a runtime error, as the server may need to know the value of a pointer, or at least its range, to determine whether an illegal memory access is about to occur. However, this treatment may disclose too much user information, in particular when the pointer involves only one symbol: a "yes" to such a question often exposes the real value of that symbol. Such a problem usually happens in a string-related GLIBC function, where letters of a string are used as offsets to look up a table. Our solution is to accommodate symbolic pointers in our analysis if such a pointer carries only one symbol and is used to read from a memory location. This approach can be explicated through an example. Consider the instruction "MOV EAX, [ESI+CL]", where CL is tainted by an input byte B_j. Instead of directly asking the client for the value of ESI+CL, which reveals the real value of B_j, the server gathers the bytes from the memory locations pointed to by (ESI+0, ESI+1, ..., ESI+255) to form a list. The list is used to prepare a question should EAX get involved in a branching condition such as "CMP EAX, 1". In this case, the server generates a query including [ESI+CL], which is the symbolic expression of EAX, the value of ESI, the list and the condition. In response to the query, the client uses the real value of B_j and the list to verify the condition and answer either "yes" or "no", which enables the server to identify the right branch.

Table 1: Examples of the Taint Rules.

Instruction Category | Taint Propagation | Examples
data movement | (1) taint is propagated to the destination if the source is tainted; (2) the destination operand is not tainted if the source operand is not tainted | mov eax,ebx; push eax; call 0x4080022; lea ebx, ptr [ecx+10]
arithmetic | (1) taint is propagated to the destination if the source is tainted; (2) the EFLAGS register is also regarded as a destination operand | and eax,ebx; inc ecx; shr eax,0x8
address calculation | an address is tainted if any element in the address calculation is tainted | mov ebx, dword ptr [ecx+2*ebx+0x08]
conditional jump | regard EFLAGS as a source operand | jz 0x0746323; jnle 0x878342; jg 0x405687
compare | regard EFLAGS as a destination operand | cmp eax,ebx; test eax,eax
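The rules in Table 1 amount to simple shadow-state updates. A minimal sketch for a two-operand arithmetic instruction (our illustration, with a register-coarse shadow state as an assumption):

```c
/* Shadow state for the Table 1 rules: for an arithmetic instruction such
   as "add eax, ebx", the destination register and EFLAGS become tainted
   iff any source operand is tainted. */
#include <stdbool.h>

typedef struct {
    bool reg[8];     /* one taint bit per general-purpose register */
    bool eflags;     /* EFLAGS treated as an extra destination operand */
} shadow_t;

void taint_arith(shadow_t *s, int dst, int src)
{
    bool tainted = s->reg[dst] || s->reg[src]; /* dst is also a source of add */
    s->reg[dst] = tainted;  /* rule (1): propagate taint to the destination */
    s->eflags   = tainted;  /* rule (2): EFLAGS is a destination operand too */
}
```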

The analysis stops when the execution reaches a state where a runtime error is about to happen. Examples of such a state include a jump to an address outside the process image or to an illegal instruction, and a memory access through an illegal pointer. When this happens, Panalyst server announces that an input reproducing the error has been identified, which can then be used for further analysis of the underlying bug and for generation of signatures [52, 50, 39] or patches [49]. Our analysis also contributes to a preliminary classification of bugs: if the illegal address that causes the error is found to be tainted, we have reason to believe that the underlying bug can be exploited remotely and therefore is security critical.

Reducing communication overhead. A major concern for Panalyst seems to be communication overhead: the server may need to query the client whenever a tainted branching condition or a tainted pointer is encountered. However, in our research, we found that the bandwidth consumed in an analysis is usually quite small, less than a hundred KB over the whole analysis. This is because the number of tainted conditions and pointers can be relatively small in many programs, and both the server's questions and the client's answers are usually short. The need for communication can be further reduced if an initial error report supplies the server with a sufficient amount of public information regarding the error. However, the performance of the server and the client will still be affected when the program operates intensively on tainted data, which in many cases is related to loops.

A typical loop that appears in many network-facing applications is similar to the one in the example (Line 6 of Figure 2). The loop compares individual bytes in a protocol field with a delimiter such as '\n' or ' ' to identify the end of the field. If we simply view the loop as a sequence of conditional branchings, then the server has to query the client for every byte within that field, which can be time-consuming. To mitigate this problem, we designed a technique in our research to first identify such a loop and then let the client proactively scan its message to find the location of the first string that terminates the loop. We describe the technique below.

The server monitors a tainted conditional branching that the execution has repeatedly bumped into. When the number of such encounters exceeds a threshold, we believe that a loop has been identified. The step value of that loop can be approximated by the difference between the indices of the symbols that appear in two consecutive evaluations of the condition. For example, consider the loop in Figure 2. If the first time the execution compares B_j with '\n' and the second time it tries B_{j+1}, we estimate the step as one. The server then sends a question to the client, including the loop condition C(B_k[1], ..., B_k[m]) and step estimates λ_k[1], ..., λ_k[m]. The client starts from the k[i]th byte (1 ≤ i ≤ m) and scans its message every λ_k[i] bytes, until it finds a set of bytes (B_k'[1], ..., B_k'[m]) that makes the condition false. The positions of these bytes are shipped to the server. As a result, the analysis can evaluate the loop condition using such information, without talking to the client iteration by iteration. A sketch of the client-side scan appears below.

The above technique only works on a simple loop characterized by a constant step value. Since such loops frequently appear in network-facing applications, our approach contributes to a significant reduction of communication when analyzing these applications. Development of a more general approach for dealing with loops with varying step sizes is left as future research. Another problem with our technique is that the condition it identifies may not be a real loop condition. However, this does not bring us much trouble in general, as the penalty of such a false positive is small, including nothing but the requirement for the client to scan its message and the disclosure of a few bytes that seem to meet the exit condition. If the client refuses to do so, the analysis can still continue by directly querying the client about branching conditions.
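The client-side scan for such a loop question can be sketched as follows (our illustration; a single-symbol condition is assumed):

```c
/* Given a loop exit condition C and a step estimate lambda, scan the
   logged message from position k and return the position of the first
   byte that makes the condition false (e.g., the first '\n'). */
int scan_for_exit(const unsigned char *msg, int msg_len,
                  int k, int lambda,
                  int (*cond_holds)(unsigned char b))
{
    for (int pos = k; pos < msg_len; pos += lambda)
        if (!cond_holds(msg[pos]))
            return pos;      /* this position is shipped to the server */
    return -1;               /* the condition holds throughout the message */
}
```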

Improving constraint-solving performance. Solving a constraint can be time-consuming, particularly when the constraint is nonlinear, involving operations such as bitwise AND, OR and XOR. To maintain a valid runtime state for the program under analysis, Panalyst server needs to run a constraint solver to update hypothetic symbol values whenever a new branching condition or memory access is encountered. This will impact the server's performance. In our research, we adopted a very simple strategy to mitigate this impact: we check whether the current hypothetic values satisfy a new constraint before solving the constraint. This turns out to be very effective: in many cases, we found that symbol values good for an old constraint also work for a new constraint, which allows us to skip the constraint-solving step.

4 Implementation

We implemented a prototype of Panalyst under Linux, including its server component and client component. The details of our implementation are described in this section.

Message logging. We adopted ptrace to dump the packet payloads an application receives. Specifically, ptrace intercepts the system call socketcall() and parses its parameters to identify the location of an input buffer. The content of the buffer is dumped to a log file. We also label the beginning of a connection when an accept() is observed and the end of the connection when there is a close(). The data between these two calls are used to build a message once a runtime exception happens to the application (a sketch of this interception loop appears at the end of this section).

Estimate of information leaks. To evaluate the information leaks caused by answering a question, our implementation first generates a constraint that is a conjunction of all the constraints the client has received that are directly or transitively related to the question, and then samples values of the constraint using random values of the symbols it contains. We set the number of samples to 400, which achieves a confidence interval of ±0.05 and a confidence level of 0.95. A problem here is that the granularity of the control could be coarse, as 400 samples can only represent the loss of one byte of information. When this happens, our current implementation takes the conservative treatment of assuming that all the bytes in a constraint are revealed. A finer-grained approach could restore the values of the symbols byte by byte to repeatedly check information leaks, until all the bytes are disclosed. An evaluation of such an approach is left as future work.

Error analyzer. We implemented an error analyzer as a Pin tool that works under Pin's Just-In-Time (JIT) mode [40]. The analyzer performs both taint analysis and symbolic execution on a vulnerable application, and builds a new input to reproduce the runtime error that occurred on the client. The analyzer starts from a message that contains nothing but zeros and has the same length as the client's input, and designates a symbol for every byte of that message. During the analysis, the analyzer first checks whether taint will be propagated by an instruction, and only symbolically evaluates those instructions whose operands involve tainted bytes. Since many instructions related to taint propagation use the information in EFLAGS, the analyzer also takes this register as a source operand for these instructions. Once an instruction's source operand is tainted, symbolic expressions are computed for the destination operand(s). For example, consider the instruction add eax, ebx, where ebx is tainted. Our analyzer first computes a symbolic expression B_ebx + v_eax, where B_ebx is an expression for ebx and v_eax is the value of eax, and then generates another expression for EFLAGS because the result of the operation affects the flags OF, SF, ZF, AF, CF and PF.

Whenever a conditional jump is encountered, the server queries the client about EFLAGS. To avoid asking the client to give away too much information, such a query only concerns the specific flag that affects that branching, instead of the whole status of EFLAGS. For example, consider the following branching: cmp eax,ebx and then jz 0x33fd740. In this case, the server's question is limited to the status of ZF, which the branching condition depends on, though the comparison instruction also changes other flags such as SF and CF.

Constraint solver. Our implementation uses Yices [33] to solve constraints so as to find the hypothetic values for individual symbols. These values are important for keeping the application in a state that is consistent with its input. Yices is a powerful constraint solver that can handle many nonlinear constraints. However, there are situations where a constraint is so complicated that its solution cannot be obtained within a reasonable time. When this happens, we adopt a strategy that gradually inquires of the client the values of individual symbols to simplify the constraint, until it becomes solvable by the constraint solver.

Data compression. We implemented two measures to reduce the communication between the client and the server. The first one is for processing questions that include the same constraints except for the input symbols. Our implementation indexes each question the server sends to the client. Whenever the server is about to ask a question that differs from a previous one only in its symbols, it only transmits the index of the old question and these symbols. This strategy is found to be extremely effective when the sizes of the questions become large: in our experiment, a question of 8KB was compressed to 52 bytes. The strategy also complements our technique for processing loops: for a complicated loop with varying steps, which the technique cannot handle, the server needs to query the client iteratively; however, the sizes of these queries can be very small, as they are all about the same constraint with different symbols. The second measure is to use a lightweight real-time compression algorithm to reduce packet sizes. The algorithm we adopted is minilzo [6], which reduced the bandwidth consumption in our experiments to less than 100 KB for an analysis, at a negligible computational overhead (a usage sketch also appears below).
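The interception loop referenced under "Message logging" can be sketched as follows. This is our illustration for 32-bit x86 Linux; the recv() sub-code constant and the omitted attach/dump logic are assumptions rather than the prototype's code.

```c
/* Stop the traced application at every syscall boundary and catch
   socketcall() invocations whose sub-code is recv(); the payload would
   then be read with PTRACE_PEEKDATA and appended to the bounded log. */
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

#define SUBCALL_RECV 10   /* recv() sub-code of socketcall() on Linux/x86 */

void trace_syscalls(pid_t pid)   /* pid is assumed to be traced already */
{
    int status;
    while (waitpid(pid, &status, 0) > 0 && !WIFEXITED(status)) {
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, pid, 0, &regs);
        if (regs.orig_eax == SYS_socketcall && regs.ebx == SUBCALL_RECV) {
            /* regs.ecx points to the argument block (sockfd, buf, len,
               flags); read buf via PTRACE_PEEKDATA and log it (omitted). */
        }
        ptrace(PTRACE_SYSCALL, pid, 0, 0);  /* run to the next syscall stop */
    }
}
```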

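A minimal sketch of the second compression measure, using minilzo as named above (our illustration; the output-buffer sizing follows minilzo's documented worst-case expansion, which we state as an assumption):

```c
/* Compress a question or answer buffer with minilzo before it is sent.
   LZO1X_1_MEM_COMPRESS bytes of work memory are required by the API. */
#include "minilzo.h"

static unsigned char wrkmem[LZO1X_1_MEM_COMPRESS];

/* Returns the compressed length, or 0 on failure. The out buffer is
   assumed to hold at least in_len + in_len / 16 + 67 bytes. */
lzo_uint compress_message(const unsigned char *in, lzo_uint in_len,
                          unsigned char *out)
{
    lzo_uint out_len = 0;
    if (lzo_init() != LZO_E_OK)
        return 0;                   /* library failed to initialize */
    if (lzo1x_1_compress(in, in_len, out, &out_len, wrkmem) != LZO_E_OK)
        return 0;
    return out_len;
}
```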
5 Evaluation

In this section, we describe our experimental study of Panalyst. The objective of this study is to understand the effectiveness of our technique in remote error analysis and protection of the user's privacy, and the overheads it introduces. To this end, we evaluated our prototype using 6 real applications and report the outcomes of these experiments here.

Our experiments were carried out on two Linux workstations, one as the server and the other as the client. Both of them were installed with Redhat Enterprise 4. The server has a 2.40GHz Core 2 Duo processor and 3GB memory. The client has a Pentium 4 1.3GHz processor and 256MB memory.

5.1 Effectiveness

We ran Panalyst to analyze the errors that occurred in 6 real applications, including Newspost [7], OpenVMPS [19], Null-HTTPd (Nullhttpd) [8], Sumus [15], Light HTTPd [5] and ATP-HTTPd [3]. The experimental results are presented in Table 2. These applications contain bugs that are subject to stack-based overflow, format string error and heap-based overflow. The errors were triggered by a single or multiple input packets on the client and analyzed on the server. As a result, new packets were gradually built from an initial error report and interactions with the client to reproduce an error. This was achieved without leaking too much user information. We elaborate our experiments below.

Newspost. Newspost is a Usenet binary autoposter for Unix and Linux. Its version 2.1.1 and earlier have a bug subject to stack-based overflow: specifically, a buffer in the socket_getline() function can be overrun by a long string without a newline character. In our experiment, the application was crashed by a packet of 2KB. After this happened, the client sent the server an initial error report that described the length of the packet and the type of the error. The report was converted into an input for an analysis performed on the application, which included an all-zero string of 2KB. During the analysis, the server identified a loop that iteratively searched for '0xa', the newline symbol, as a termination condition for moving bytes into a buffer, and questioned the client about the position at which the byte first appeared. The byte actually did not exist in the client's packet. As a result, the input string overflowed the buffer and was spilled onto an illegal address, causing a segmentation fault. Therefore, the server's input was shown to be able to reproduce the error. This analysis was also found to disclose very little user information: nothing more than the fact that none of the input bytes were '0xa' was revealed. This was quantified as 0.9 byte.

OpenVMPS. OpenVMPS is an open-source implementation of the Cisco Virtual Membership Policy Server, which dynamically assigns ports to virtual networks according to Ethernet addresses. The application has a format string bug which allows the input to supply a string with format specifiers as a parameter for vfprintf(). This could make vfprintf() write to a memory location. In the experiment, Panalyst server queried the client to get "00 00 0c 02", as illustrated in Figure 4. These four bytes were part of a branching condition, and seem to be a keyword of the protocol. We also found that the string "00 b9" was used as a loop counter. These two bytes were identified by the constraint solver. The string "62637" turned out to be the content that the format specifier "%19$hn" wrote to a memory location through vfprintf(). These bytes were recovered from the client because they were used as part of a pointer to access memory. Our implementation successfully built a new input on the server that reproduced the error, as illustrated in Figure 4. This analysis recovered 39 bytes from the client, all of which were related either to branching conditions or to memory access. An additional 18.4 bytes of information were estimated by the client to be leaked, as a result of the client's answers, which reduced the ranges of the values some symbols could take.

Null-HTTPd. Null-HTTPd is a small web server working on Linux and Windows. Its version 0.5 contains a heap-overflow bug, which can be triggered when the HTTP request is a POST with a negative Content-Length field and a long request content. In our experiment, the client parsed the request using Wireshark and delivered nonsensitive information such as the keyword POST to the server. The server found that the application added 1024 to the value derived from the Content-Length and used the sum as a pointer in the function calloc. This resulted in a query for the value of that field, which the client released. At this point, the server acquired all the information necessary for reproducing the error and generated a new input, illustrated in Figure 5.

Table 2: Effectiveness of Panalyst.

Applications | Vul. Type | New Input Generated? | Size of client's message (bytes) | Info leaks (bytes) | Rate of info leaks
Newspost | Stack Overflow | Yes | 2056 | 0.9 | 0.04%
OpenVMPS | Format String | Yes | 199 | 57.4 | 28.8%
Null-HTTPd | Heap Overflow | Yes | 416 | 29.7 | 7.14%
Sumus | Stack Overflow | Yes | 500 | 7.7 | 1.54%
Light HTTPd | Stack Overflow | Yes | 211 | 17.9 | 8.48%
ATP-HTTPd | Stack Overflow | Yes | 819 | 16.7 | 2.04%

Figure 3: Input Generation for Newspost. Left: the client’s packet; Right: the new packet generated on the server.

The information leaks caused by the analysis include the keyword, the value of Content-Length, HTTP delimiters and the knowledge that some bytes are not special symbols such as delimiters. This was quantified as 29.7 bytes, about 7% of the HTTP message the client received.

Sumus. Sumus is a server for playing the Spanish "mus" game on the Internet. It is known that Sumus 0.2.2 and earlier versions have a vulnerable buffer that can be overflowed remotely [14]. In our experiment, Panalyst server gradually constructed a new input through interactions with the client until the application was found to jump to a tainted address. At this point, the input was shown to be able to reproduce the client's error. The information leaked during the analysis is presented in Figure 6, including the string "GET", which affected a path condition, and four "0x90" bytes, which formed the address the application attempted to access. These 7 bytes were counted as leaked information, along with the fact that the other bytes were not a delimiter.

Light-HTTPd. Light-HTTPd is a free HTTP server. Its version 0.1 has a vulnerable buffer on the stack. Our experiment captured an exception that happened when the application returned from the function vsprintf(), and constructed the new input. The input shared 14 bytes with the client's input, which were essential to determining branching conditions and accessing memory. For example, the keyword "GET" appeared in a conditional jump, and the letter "H" was used as a condition in the GLIBC function strstr. The remaining leaks were caused by the intensive string operations, such as strtok, which frequently used individual bytes for table lookups and comparison operations. Though these operations did not give away the real values of these bytes, they reduced the range of the bytes, which was quantified as another 3.9 bytes.

ATP-HTTPd. ATP-HTTPd 0.4 and 0.4b involve a remotely exploitable buffer in the socket_gets() function. A new input that triggered this bug was built in our experiment, and is presented in Figure 8. For example, the string "EDCB" was an address the application attempted to jump to; this operation actually caused a segmentation fault. Information leaks during this analysis were similar to those of Light-HTTPd, and were quantified as 16.7 bytes.

5.2 Performance

We also evaluated the performance of Panalyst. The client was deliberately run on a computer with a 1 GHz CPU and 256MB memory to understand the performance impact of our technique on a low-end system. The server was on a high-end one, with a 2.40GHz Core 2 Duo CPU and 3GB memory. In our experiments, we measured the delay caused by an analysis, memory use and bandwidth consumption on both the client and the server. The results are presented in Table 3.

The client's delay describes the accumulated time that the client spent to receive packets from the server, compute answers, evaluate information leaks and deliver the responses. In our experiments, we observed that this whole process incurred a latency below 3.2 seconds. Moreover, the memory use on the client side was kept below 5 MB. Given the hardware platform over which this performance was achieved, we have reason to believe that such overhead could be afforded even by a device with limited computing resources, such as a Pocket PC or PDA. Our analysis introduced a maximum of 99,659 bytes of communication overhead. We believe this is still reasonable for the client, because the size of a typical web page exceeds 100 KB and many mobile devices nowadays have the capability of web browsing.

The delay on the server side was measured between the reception of an initial error report and the generation of a new input. An additional 15 seconds for launching our Pin-based analyzer should also be counted. Given this, the server's performance was very good: the maximal latency was found to be under 1 minute.

Figure 4: Input Generation for OpenVMPS. Left: the client's packet; Right: the new packet generated on the server.

Figure 5: Input Generation for Null-HTTPd. Left: the client’s packet; Right: the new packet generated on the server.

However, this was achieved on a very high-end system. Actually, we observed that the latency doubled when moving the server to a computer with a 2.36 GHz CPU and 1 GB memory. More importantly, the server consumed about 100 MB memory during the analysis. This can be easily afforded by a high-end system such as the one used in our experiment, but could be a significant burden to a low-end system such as a mobile device. As an example, most PDAs have less than 100 MB memory. Therefore, we believe that Panalyst server should be kept on a dedicated high-performance system.

6 Discussion

Our research makes the first step towards a fully automated and privacy-aware remote error analysis. However, the current design of Panalyst is still preliminary, leaving much to be desired. For example, the approach does not work well in the presence of probabilistic errors, and our privacy policies could also be better designed. We elaborate limitations and possible solutions in the remainder of this section, and discuss future research for improving our technique in Section 7.

The current design of Panalyst is for analyzing errors triggered by network input alone. However, runtime errors can be caused by other inputs, such as those from a local file or another process. Some of these errors can also be handled by Panalyst. For example, we can record all the data read by a vulnerable program and organize them into multiple messages, each of which corresponds to a particular input to the program; an error analysis can happen on these messages in a fashion similar to that described in Section 3. A weakness of our technique is that it can be less effective in dealing with a probabilistic error, such as one caused by multithreaded interactions. However, it can still help the server build sanitized inputs that drive the vulnerable program down the same execution paths as those followed on the client.

Panalyst may require the client to leak out some information that turns out to be unnecessary for reproducing an error, in particular, the values of some tainted pointers unrelated to the error. A general solution is describing memory addresses as symbolic expressions and taking them into consideration during symbolic execution. This approach, however, can be very expensive, especially when an execution involves a large amount of indirect addressing through tainted pointers. To maintain a moderate overhead during an analysis, our current design only offers limited support for symbolic pointers: we introduce such a pointer only when it includes a single symbol and is used for reading from memory.

The way we treat loops is still preliminary: it only works on loops with constant step sizes and may falsely classify a branching condition as a loop condition. As a result, we may miss some real loops, which increases the communication overhead of an analysis, or require the client to unnecessarily disclose extra information.

Figure 6: Input Generation for Sumus. Left: the client's packet; Right: the new packet generated on the server.

Figure 7: Input Generation for Light HTTPd. Left: the client’s packet; Right: the new packet generated on the server.

However, the client can always refuse to give more information and set a threshold for the maximal number of questions it will answer. Even if this causes the analysis to fail, the server can still acquire some information related to the error and use it to facilitate other error analysis techniques, such as fuzz testing. We plan to study more general techniques for analyzing loops in our future research.

Entropy-based policies may not be sufficient for regulating information leaks. For example, complete disclosure of one byte in a field may have different privacy implications from leakage of the same amount of information distributed among several bytes in the field. In addition, the specification of such policies does not seem to be intuitive, which may affect their usability. More effective privacy policies can be built upon other definitions of privacy, such as k-anonymity [46], l-diversity [41] and t-closeness [38]. These policies will be developed and evaluated in our future work.

Panalyst client can only approximate the amount of information disclosed by its answers using statistical means. It also assumes a uniform distribution over the values a symbol can take. Design of a better alternative for quantifying and controlling information is left as our future research.

Another limitation of our approach is that it cannot handle encoded or encrypted input. This problem can be mitigated by interposing on the API functions (such as those in the OpenSSL library) for decoding or decryption to get their plaintext outputs, as sketched below. Our error analysis will then be conducted over the plaintext.
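A minimal sketch of such interposition on Linux, assuming the application links against OpenSSL dynamically (the hook and its log path are our illustration, not part of Panalyst): an LD_PRELOAD library wraps the real SSL_read entry point and appends the decrypted bytes to the input log, so that the error analysis can later run over plaintext.

    /* Build: cc -shared -fPIC -o libsslhook.so sslhook.c -ldl
     * Run:   LD_PRELOAD=./libsslhook.so ./target_app */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <openssl/ssl.h>

    int SSL_read(SSL *ssl, void *buf, int num)
    {
        static int (*real_read)(SSL *, void *, int);
        if (!real_read) {
            real_read = (int (*)(SSL *, void *, int))
                            dlsym(RTLD_NEXT, "SSL_read");
            if (!real_read)
                return -1;  /* no underlying implementation found */
        }

        int n = real_read(ssl, buf, num);
        if (n > 0) {
            /* Append the plaintext to the input log kept for error
             * analysis; the path is hypothetical, and a real client
             * would buffer and protect this file. */
            FILE *log = fopen("/tmp/panalyst-input.log", "ab");
            if (log) {
                fwrite(buf, 1, (size_t)n, log);
                fclose(log);
            }
        }
        return n;
    }

Decoding functions, such as decompression routines, can be wrapped in the same way.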

Figure 8: Input Generation for ATP HTTPd. Left: the client's packet; Right: the new packet generated on the server.

Table 3: Performance of Panalyst.

Program       Client delay  Client memory  Server delay  Server memory  Total questions  Total answers
              (s)           use (MB)       (s)           use (MB)       (bytes)          (bytes)
Newspost      0.022         4.7            12.14          99.3             527              184
OpenVMPS      1.638         3.9            17            122.3          45,610            6,088
Null-HTTPd    1.517         5.0            13.09         118.1          99,659            3,416
Sumus         0.123         4.8             1.10          85.4           5,968            2,760
Light HTTPd   0.88          4.8             6.59         110.1          14,005            2,808
ATP-HTTPd     3.197         5.0            37.11         145.4          50,615           15,960

7 Related Work

Error reporting techniques have been widely used for helping the user diagnose application runtime errors. Windows Error Reporting [20], a technique built upon Microsoft's Dr. Watson service [18], generates an error report by summarizing a program state, including the contents of registers and the stack. It may also ask the user for extra information, such as input documents, to investigate an error. Such an error report is used to search an expert system for a solution provided by human experts. If the search fails, the client's error will be recorded for future analysis. CrashReporter [16] of Mac OS X and third-party tools such as BugToaster [27] and Bug Buddy [22] work in a similar way. As an example, Bug Buddy for GNOME can generate a stack trace using gdb and let the user post it to the GNOME bug tracking system [4].

Privacy protection in existing error reporting techniques mostly relies on the privacy policies of those who collect reports. This requires the user to trust the collector, and also forces her to either send the whole report or submit nothing at all. In contrast, Panalyst reduces the user's reliance on the collectors to protect her privacy and also allows her to submit the part of the information she is comfortable with. Even if such information is insufficient for reproducing an error, it can make it easier for other techniques to identify the underlying bug. Moreover, Panalyst server can automatically analyze the error caused by an unknown bug, whereas existing techniques depend on humans to analyze new bugs.

Proposals have been made to improve privacy protection during error reporting. Scrash [25] instruments an application's source code to record information related to a crash and generate a "clean" report that does not contain sensitive information. However, it needs source code and therefore does not work on commodity applications without the manufacturer's support. In addition, the technique introduces performance overheads even when the application works properly, and, like other error reporting techniques, uses a remote expert system and therefore does not perform automatic analysis of new errors. Brickell et al. propose a privacy-preserving diagnostic scheme that works on binary executables [24, 36]. The technique aims at searching a knowledge base framed as a decision tree in a privacy-preserving manner. It also needs to profile an application's execution. Panalyst differs from these approaches in that it does not interfere with an application's normal run except for logging inputs, which is very lightweight, and is devised for automatically analyzing an unknown bug.

Techniques for automatic analysis of software vulnerabilities have been intensively studied. Examples include the approach for generating vulnerability-based signatures [26], Vigilante [30], DACODA [31] and EXE [53]. These approaches assume that an input triggering an error is already given, and therefore privacy is no longer a concern. Panalyst addresses the important issue of how to get such an input without infringing too much on the user's privacy. This is achieved while Panalyst server analyzes the vulnerable program. Our technique combines dynamic taint analysis with symbolic execution, which bears some similarity to a recent proposal for exploring multiple execution paths [42]. However, that technique is primarily designed for identifying hidden actions of malware, while Panalyst is for analyzing runtime errors. Therefore, we need to consider issues that are not addressed by the prior approach. A prominent example is the techniques we propose to tackle a tainted pointer, which is essential to reliably reproducing an error.

Similar to Panalyst, a technique has been proposed recently to symbolically analyze a vulnerable executable and generate an error report through constraint solving [29]. That technique also applies entropy to quantify the information loss caused by error reporting. Panalyst differs from that approach fundamentally in that our technique generates a new input remotely, while the prior approach works directly on the causal input on the client. Performing an intensive analysis on the client is exactly what we want to avoid, because it increases the client's burden and thus discourages the user from participating. Although an evaluation of the technique reports a moderate overhead [29], it does not include computation-intensive operations such as instruction-level tracing, which can, in some cases, introduce hundreds of seconds of delay and hundreds of megabytes of execution traces [23].

This can be barely acceptable to a user having such resources, and is hardly affordable to those using weak devices such as a Pocket PC or PDA. Actually, reproducing an error without direct access to the causal input is much more difficult than analyzing the input locally, because it requires a careful coordination between the client and the server to ensure a gradual release of the input information without endangering the user's privacy or failing the analysis. In addition, Panalyst can enforce privacy policies on individual protocol fields and therefore achieves a finer-grained control of information than the prior approach.

8 Conclusion and Future Work

Remote error analysis is essential to the timely discovery of security-critical vulnerabilities in applications and the generation of fixes. Such an analysis works most effectively when it protects users' privacy, incurs the least performance overhead on the client and provides the server with sufficient information for an effective study of the underlying bugs. To this end, we propose Panalyst, a new technique for privacy-aware remote error analysis. Whenever a runtime error occurs, the Panalyst client sends the server an initial error report that includes nothing but the public information about the error. Using an input built from the report, Panalyst server analyzes the propagation of tainted data in the vulnerable application and symbolically evaluates its execution. During the analysis, the server queries the client whenever it does not have sufficient information to determine the execution path. The client responds to a question only when the answer does not leak out too much user information. The answer from the client allows the server to adjust the content of the input through symbolic execution and constraint solving. As a result, a new input will be built that includes the information necessary for reproducing the error on the client. Our experimental study of this technique demonstrates that it exposes a very small amount of user information, introduces negligible overheads to the client and enables the server to effectively analyze an error.

The current design of Panalyst is for analyzing errors triggered by network inputs alone. Future research will extend our approach to handle other types of errors. In addition, we also plan to improve the techniques for estimating information leaks and to reduce the number of queries the client needs to answer.

9 Acknowledgements

We thank our shepherd Anil Somayaji and the anonymous reviewers for their comments on the paper. This work was supported in part by the National Science Foundation Cyber Trust program under Grant No. CNS-0716292.

References

[1] Wireshark. http://www.wireshark.org/.
[2] NIST/SEMATECH e-Handbook of Statistical Methods. http://www.itl.nist.gov/div898/handbook/, 2008.
[3] Athttpd Remote GET Request Buffer Overrun Vulnerability. http://www.securityfocus.com/bid/8709/discuss/, as of 2008.
[4] GNOME bug tracking system. http://bugzilla.gnome.org/, as of 2008.
[5] LIGHT http server and content management system. http://lhttpd.sourceforge.net/, as of 2008.
[6] miniLZO, a lightweight subset of the LZO library. http://www.oberhumer.com/opensource/lzo/#minilzo, as of 2008.
[7] Newspost, a usenet binary autoposter for unix. http://newspost.unixcab.org/, as of 2008.
[8] NullLogic, the Null HTTPD server. http://nullwebmail.sourceforge.net/httpd/, as of 2008.
[9] Privacy Statement for the Microsoft Error Reporting Service. http://oca.microsoft.com/en/dcp20.asp, as of 2008.
[10] Process Tracing Using Ptrace. http://linuxgazette.net/issue81/sandeep.html, as of 2008.
[11] Reducing Support Costs with Windows Vista. http://technet.microsoft.com/en-us/windowsvista/aa905076.aspx, as of 2008.
[12] Speed up Windows Mobile 5 pocket device. http://www.mobiletopsoft.com/board/388/speed-up-windows-mobile-5-pocket-device.html, as of 2008.
[13] Speed Up Windows Vista. http://www.extremetech.com/article2/0,1697,2110598,00.asp, as of 2008.
[14] Sumus Game Server Remote Buffer Overflow Vulnerability. http://www.securityfocus.com/bid/13162, as of 2008.
[15] SUMUS, the mus server. http://sumus.sourceforge.net/, as of 2008.
[16] Technical Note TN2123, CrashReporter. http://developer.apple.com/technotes/tn2004/tn2123.html, as of 2008.
[17] Tip: Disable Error reporting in Windows Mobile 5 to get better performance: msg#00043. http://osdir.com/ml/handhelds.ipaq.ipaqworld/2006-05/msg00043.html, as of 2008.
[18] U.S. Department of Energy Computer Incident Advisory Capability. Office XP Error Reporting May Send Sensitive Documents to Microsoft. http://www.ciac.org/ciac/bulletins/m-005.shtml, as of 2008.
[19] VMPS, VLAN Management Policy Server. http://vmps.sourceforge.net/, as of 2008.
[20] Windows Error Reporting. http://msdn2.microsoft.com/en-us/library/bb513641%28VS.85%29.aspx, as of 2008.
[21] Abadi, M., Budiu, M., Erlingsson, Ú., and Ligatti, J. Control-flow integrity. In ACM Conference on Computer and Communications Security (2005), pp. 340–353.

[22] Berkman, J. Project Info for Bug-Buddy. http://www.advogato.org/proj/bug-buddy/, as of 2008.
[23] Bhansali, S., Chen, W.-K., de Jong, S., Edwards, A., Murray, R., Drinić, M., Mihočka, D., and Chau, J. Framework for instruction-level tracing and analysis of program executions. In VEE '06: Proceedings of the 2nd International Conference on Virtual Execution Environments (2006), pp. 154–163.
[24] Brickell, J., Porter, D. E., Shmatikov, V., and Witchel, E. Privacy-preserving remote diagnostics. In CCS '07: Proceedings of the 14th ACM Conference on Computer and Communications Security (2007), pp. 498–507.
[25] Broadwell, P., Harren, M., and Sastry, N. Scrash: A system for generating secure crash information. In Proceedings of the 12th USENIX Security Symposium (Aug. 2003), USENIX, pp. 273–284.
[26] Brumley, D., Newsome, J., Song, D. X., Wang, H., and Jha, S. Towards automatic generation of vulnerability-based signatures. In S&P (2006), pp. 2–16.
[27] BugToaster. Do Something about computer Crashes. http://www.bugtoaster.com, 2002.
[28] Castro, M., Costa, M., and Harris, T. Securing software by enforcing data-flow integrity. In OSDI (2006), pp. 147–160.
[29] Castro, M., Costa, M., and Martin, J.-P. Better bug reporting with better privacy. In Proceedings of the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'08) (2008).
[30] Costa, M., Crowcroft, J., Castro, M., Rowstron, A. I. T., Zhou, L., Zhang, L., and Barham, P. T. Vigilante: end-to-end containment of internet worms. In Proceedings of SOSP (2005), pp. 133–147.
[31] Crandall, J. R., Su, Z., and Wu, S. F. On deriving unknown vulnerabilities from zero-day polymorphic and metamorphic worm exploits. In CCS '05: Proceedings of the 12th ACM Conference on Computer and Communications Security (New York, NY, USA, 2005), ACM Press, pp. 235–248.
[32] Cui, W., Paxson, V., Weaver, N., and Katz, R. H. Protocol-independent adaptive replay of application dialog. In NDSS (2006).
[33] Dutertre, B., and Moura, L. The YICES SMT Solver. http://yices.csl.sri.com/, as of 2008.
[34] Egele, M., Kruegel, C., Kirda, E., Yin, H., and Song, D. Dynamic spyware analysis. In Proceedings of the 2007 USENIX Annual Technical Conference (Usenix'07) (June 2007).
[35] Godefroid, P., Levin, M. Y., and Molnar, D. Automated whitebox fuzz testing. In NDSS (2008).
[36] Ha, J., Rossbach, C. J., Davis, J. V., Roy, I., Ramadan, H. E., Porter, D. E., Chen, D. L., and Witchel, E. Improved error reporting for software that uses black-box components. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (2007), pp. 101–111.
[37] King, J. C. Symbolic execution and program testing. Commun. ACM 19, 7 (1976), 385–394.
[38] Li, N., Li, T., and Venkatasubramanian, S. t-Closeness: Privacy beyond k-anonymity and l-diversity. In ICDE (2007), IEEE, pp. 106–115.
[39] Liang, Z., and Sekar, R. Fast and automated generation of attack signatures: a basis for building self-protecting servers. In CCS '05: Proceedings of the 12th ACM Conference on Computer and Communications Security (New York, NY, USA, 2005), ACM Press, pp. 213–222.
[40] Luk, C.-K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V. J., and Hazelwood, K. Pin: building customized program analysis tools with dynamic instrumentation. In PLDI '05: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (2005), pp. 190–200.
[41] Machanavajjhala, A., Kifer, D., Gehrke, J., and Venkitasubramaniam, M. l-Diversity: Privacy beyond k-anonymity. TKDD 1, 1 (2007).
[42] Moser, A., Krügel, C., and Kirda, E. Exploring multiple execution paths for malware analysis. In IEEE Symposium on Security and Privacy (2007), pp. 231–245.
[43] Newsome, J., Brumley, D., Franklin, J., and Song, D. X. Replayer: automatic protocol replay by binary analysis. In ACM Conference on Computer and Communications Security (2006), pp. 311–321.
[44] Newsome, J., and Song, D. X. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In NDSS (2005).
[45] Qin, F., Wang, C., Li, Z., Kim, H.-S., Zhou, Y., and Wu, Y. LIFT: A low-overhead practical information flow tracking system for detecting security attacks. In MICRO (2006), pp. 135–148.
[46] Samarati, P., and Sweeney, L. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical Report SRI-CSL-98-04, 1998.
[47] Shannon, C., and Moore, D. The spread of the Witty worm. IEEE Security & Privacy 2, 4 (July/August 2004), 46–50.
[48] Shannon, C. E. A mathematical theory of communication. Bell System Technical Journal 27 (1948).
[49] Sidiroglou, S., and Keromytis, A. D. Countering network worms through automatic patch generation. IEEE Security and Privacy 3, 6 (2005), 41–49.
[50] Singh, S., Estan, C., Varghese, G., and Savage, S. Automated worm fingerprinting. In Proceedings of OSDI (2004), pp. 45–60.
[51] Vachharajani, N., Bridges, M. J., Chang, J., Rangan, R., Ottoni, G., Blome, J. A., Reis, G. A., Vachharajani, M., and August, D. I. RIFLE: An architectural framework for user-centric information-flow security. In MICRO (2004), IEEE Computer Society, pp. 243–254.
[52] Wang, X., Li, Z., Xu, J., Reiter, M. K., Kil, C., and Choi, J. Y. Packet vaccine: black-box exploit detection and signature generation. In ACM Conference on Computer and Communications Security (2006), pp. 37–46.
[53] Yang, J., Sar, C., Twohey, P., Cadar, C., and Engler, D. Automatically generating malicious disks using symbolic execution. In SP '06: Proceedings of the 2006 IEEE Symposium on Security and Privacy (2006), pp. 243–257.
