DESENSITIZATION: Privacy-Aware and Attack-Preserving Crash Report
Ren Ding∗, Hong Hu∗, Wen Xu, Taesoo Kim Georgia Institute of Technology {rding, hhu86, wen.xu, taesoo}@gatech.edu
Abstract—Software vendors collect crash reports from end- To protect users’ privacy while supporting timely bug users to assist in the debugging and testing of their products. fixing, we need to remove users’ sensitive data from crash However, crash reports may contain users’ private information, reports before sending them to developers. We call this process like names and passwords, rendering the user hesitant to share the DESENSITIZATION. Several proposals aim to desensitize the reports with developers. We need a mechanism to protect users’ crash report, but unfortunately, we have not seen any real-world privacy in crash reports on the client side while keeping sufficient deployment. Techniques like Scrash [13] rely on developers information to support server-side debugging and analysis. to manually annotate sensitive data to remove them from In this paper, we propose the DESENSITIZATION technique, crash reports. However, the manual annotation is usually time- which generates privacy-aware and attack-preserving crash re- consuming and error-prone, and does not work on closed- ports from crashed executions. Our tool adopts lightweight source programs. Pattern-based search is a general method methods to identify bug-related and attack-related data from the to identify particular privacy-sensitive data, like URLs and memory, and removes other data to protect users’ privacy. Since email accounts [69], but it cannot handle program-specific data. a large portion of the desensitized memory contains null bytes, we store crash reports in spare files to save the network bandwidth Another set of works aims to desensitize the crashing input, and the server-side storage. We prototype DESENSITIZATION and which should make the program crash in a similar manner as the apply it to a large number of crashes of real-world programs, original [19], [80], [24], [4]. However, these proposals rely on like browsers and the JavaScript engine. The result shows that computation-heavy techniques, such as symbolic execution [15], our DESENSITIZATION technique can eliminate 80.9% of non- [22], [20], [16] or taint analysis [76], [61], [68], causing zero bytes from coredumps, and 49.0% from minidumps. The significant burdens to users or developers. More importantly, desensitized crash report can be 50.5% smaller than the original they require a non-trivial change to the current crash-report one, which significantly saves resources for report submission systems, which mainly use crash files instead of crashing inputs and storage. Our DESENSITIZATION technique is a push-button to classify and prioritize different bugs [2], [5], [36], [58], [79]. solution for the privacy-aware crash report. Given the understanding of the current obstacles, we identify I.INTRODUCTION three desirable properties that a DESENSITIZATION technique should have for wider deployment: Software vendors collect crash reports from their end-users to improve the stability and security of their products [2], [5], • Privacy-aware: The technique removes users’ private [36], [58], [79]. The crash report is either a coredump file information from the crash report as much as possible. that captures the CPU context and the memory content of the • Attack-preserving. The desensitized crash report still crashed program [1], [51], or the program input that makes the contains sufficient information for effective bug analysis. If execution crash [19], [80], [24], [4]. Unfortunately, in either the execution crashes due to a failed attack, the technique case, the crash report may contain privacy-sensitive information should keep the attack residue in the report. of individual users. For example, a recent study on 2.5 million • Widely applicable. The technique should be lightweight crash reports of a popular web browser identifies an enormous for end-users and compatible with current crash-report amount of private user data, including user names, passwords, systems. It should work on closed-source programs, cover session tokens, and personal contact information [69]. A crash various types of sensitive data, and generate minimal crash report with private information is unwelcome to both users reports for resource-limited environments. and developers. While users are unwilling to share the crash In this paper, we propose a bug-oriented and attack-oriented reports due to the concern of leaking their private information, method to desensitize crash reports so that they capture a developers do not want to collect privacy-sensitive reports snapshot of the crashed program with minimal risk of leaking where any storage breach may lead to bad publicity, and even users’ private information. Our observation is that although financial liability. there are many bugs on the client side, only a few general
∗ techniques are used on the server side to diagnose bugs The two lead authors contributed equally to this work. and detect attacks. For example, call-stack-based analysis is widely used to classify and prioritize crash reports [57], [8]. Therefore, DESENSITIZATION keeps only the information for Network and Distributed Systems Security (NDSS) Symposium 2020 commonly used bug-analysis techniques and removes all other 23-26 February 2020, San Diego, CA, USA ISBN 1-891562-61-4 data to protect users’ privacy. We design one general and four https://dx.doi.org/10.14722/ndss.2020.24428 lightweight techniques to identify bug-related and attack-related www.ndss-symposium.org data from the crashed memory snapshot. The general technique 1 #define MAX_LEN 64 stack heap ROP chain gadgets shellcode 2 typedef struct { char *ptr; void (*print)(); } String; pop rsi xor rdx, rdx S 3 void printString() { ... } passwd … 4 void vuln(char *input) { ret G1 add rbx, 0x8 5 char passwd[MAX_LEN]; load_passwd(passwd); … S’ buf ptr … 6 char *buf=( char *) malloc(MAX_LEN); pop rdi 7 String*string= (String*) malloc( sizeof(String)); string print ret mov [rdx], rbx 8 string->ptr= buf; string->print=&printString; … G2 syscall 9 strcpy(buf, input); 10 string->print(); xchg rax,rsp mprotect 11 } ret Gpivot G3
(a) Vulnerable source code. (b) Memory layout before crash.
Fig. 1: A vulnerable code snippet and the memory layout before the execution crashes. The code has a heap-based buffer overflow bug at line 9. Attackers corrupt a function pointer to pivot the stack, launch ROP attacks, and eventually run the shellcode. However, due to the incorrect address, the program crashes and generates a coredump. The stack contains user password — a sensitive private data value.
scans the memory snapshot to identify all pointers, as most Type Semantics Examples crashes and attacks stem from corrupted pointers. For each type Cookies website tokens Geo=(4.3267961,52.096922) of prevalent bug and attack, we design one technique to identify Forms current user-input