Automatic Tracking of Personally Identifiable Information in Windows Meisam Navaki Arefi Geoffrey Alexander Jedidiah R
Total Page:16
File Type:pdf, Size:1020Kb
PIITracker: Automatic Tracking of Personally Identifiable Information in Windows Meisam Navaki Arefi Geoffrey Alexander Jedidiah R. Crandall University of New Mexico University of New Mexico University of New Mexico Albuquerque, New Mexico, USA Albuquerque, New Mexico, USA Albuquerque, New Mexico, USA [email protected] [email protected] [email protected] ABSTRACT the network. PIITracker is based on dynamic information flow Personally Identifiable Information (PII) is information that tracking (DIFT) [29], also known as Dynamic Taint Analysis [22], can be used on its own or with other information to distinguish which is a promising technology for making systems transparent. It or trace an individual’s identity. To investigate an application for works by tagging certain inputs or data with meta information and PII tracking, a reverse engineer has to put considerable effort to then propagating tags as the program or system runs, usually at the reverse engineer an application and discover what an application instruction level, so that the transparency of the flow of information does with PII. To automate this process and save reverse engineers can be achieved. substantial time and effort, we propose PIITracker which is anew The PII that we have investigated in this paper are (1) MAC and novel tool that can track PII automatically and capture if any address, (2) hard drive serial number, (3) hard drive model name, processes are sending PII over the network. This is made possible by (4) volume serial number, (5) host name, (6) computer name, (7) 1) whole-system dynamic information flow tracking 2) monitoring security identifier number (SID), (8) CPU model, and (9) Windows specific function and system calls. We analyzed 15 popular chat version and build. applications and browsers using PIITracker, and determined that Our goal is to develop a tool that can automatically reveal how a 12 of these applications collect some form of PII. Windows application treats PII and determine whether an applica- tion sends any form of PII out using the network. For this end, we CCS CONCEPTS have three requirements. (i) We must be to able to monitor reading of PII by the application. As an input to our system, we need to • Security and privacy → Software security engineering; know when and where PII may be read. (ii) We must be able to KEYWORDS track PII throughout the system. Applications often read PII but use it for other purposes, never sending it our over the network. Privacy, Reverse Engineering, Dynamic Information Flow Tracking (iii) We must be able to automatically analyze an application and ACM Reference Format: return the result back to the analyst. Meisam Navaki Arefi, Geoffrey Alexander, and Jedidiah R. Crandall. 2018. To fulfil these requirements, we have developed PIITracker asa PIITracker: Automatic Tracking of Personally Identifiable Information in plugin for PANDA [15], which is a framework for dynamic analysis. Windows. In EuroSec’18: 11th European Workshop on Systems Security , April We have monitored specific function and system calls to track 23–26, 2018, Porto, Portugal. ACM, New York, NY, USA, 6 pages. https: reading PII, and we have used PANDA’s taint analysis engine to //doi.org/10.1145/3193111.3193114 track PII throughout the system. In summary, this paper presents the following contributions: 1 INTRODUCTION Personally Identifiable Information (PII) is information that canbe • We propose PIITracker, a DIFT-based reverse engineering used on its own or along with other information to distinguish or tool for PII tracking, which can automatically track PII and trace an individual’s identity. Other information that is linkable to capture if a process sends any form of PII over the network. an individual, such as a MAC address or hard drive serial number, • We implement PIITracker as a PANDA plugin supporting also counts as PII. As users face new threats to their privacy and Windows 7 as the guest operating system. We also make anonymity, VPNs and anonymity tools such as Tor [14] are becom- PIITracker open source for the community, which is available ing increasingly popular. However, applications that send PII over for public download. the network still pose a threat to user privacy and anonymity. • We analyze 15 popular Windows applications using PIITracker. In this paper, we propose PIITracker, a reverse engineering tool Our analysis show that the majority of these application col- for automatic tracking of PII. PIITracker can determine whether lect some form of PII and send it over the network. an application is collecting PII, and, if so, does it send it out over The paper is organized as follows. Section 2 provides background Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed information on dynamic information flow tracking systems. Section for profit or commercial advantage and that copies bear this notice and the full citation 3 discusses PIITracker architectural design and implementation on the first page. Copyrights for third-party components of this work must be honored. details. Section 4 presents our experimental results, analysis of false For all other uses, contact the owner/author(s). EuroSec’18, April 23–26, 2018, Porto, Portugal positives and false negatives, as well as performance evaluation. © 2018 Copyright held by the owner/author(s). Section 5 discusses related works and finally section 6 concludes ACM ISBN 978-1-4503-5652-7/18/04. the paper. https://doi.org/10.1145/3193111.3193114 EuroSec’18, April 23–26, 2018, Porto, Portugal M. Navaki Arefi et al. x = 0; char taintedinput; x = y + 1; if (y == 1) char untaintedoutput = 0; (a) Direct flow. x = 1; for (bit = 1; bit < 256; bit <<= 1){ (b) Indirect flow. if (bit & taintedinput) untainedoutput |= bit; Figure 1: Examples of direct and indirect flow } Figure 3: An example of control dependencies in C code const char str1 = "Tainted string"; char str2[80]; char lookuptable[256]; but how to handle them in a purely dynamic environment is still for (i = 0; i < 256; i++) an open area of resaearch. See Espinoza et al. [17] for more details. lookuptable[i] = i; The taint2 PANDA plugin that we utilize in this paper tracks for (j = 0; j < strlen(str1); j++) direct flows in the standard fashion (by copying or combining taint str2[j] = lookuptable[str1[j]]; tags). It also handles one type of indirect flow, namely address de- pendencies. It achieves this by propagating the taint from addresses Figure 2: An example of address dependencies in C code used to load data used in computations or copies. The taint2 plugin does not handle control dependencies. One of the interesting results of our paper is that tracking address dependencies, but not control 2 BACKGROUND - DIFT dependencies, is sufficient to handle common operations applied to Dynamic Information Flow Tracking (DIFT) [29], also called dy- PII such as encryption and hashing. namic taint analysis [22], has been used for over a decade but has still seen limited application in real reverse engineering tasks. DIFT 3 IMPLEMENTATION AND DESIGN works by tagging certain inputs or data and then propagating tags This section describes in details PIITracker architectural design and as the program or system runs. In this way the information flow of implementation details. a program can be determined. There are two main types of flows that DIFT systems can track: 3.1 Architecture direct and indirect [29]. Direct flows are data-flow based, while PIITracker was implemented as a plugin to PANDA [15, 33] which is indirect flows are control-flow based. For example, in Fig. 1a, there an open-source platform for dynamic analysis. PANDA is built upon is a direct flow from y to x. However, in Fig. 1b, the value of x is the QEMU whole-system emulator [5]. PANDA added to QEMU dependent on y, meaning that there is an indirect flow from y to x. three main features: There are two types of direct flows: copy and computation depen- (1) The ability to record and replay executions to enable whole dencies. In a copy dependency, a value is copied from one location system analysis. to another, where a location can be a byte, a word of memory or a (2) An architecture to streamline the addition of other plugins CPU register. The simplest way to track this information flow is to to QEMU. propagate the tag so that the destination location is tainted with the (3) The ability to use LLVM [26] as QEMU’s intermediate lan- same tag as the source location. In computation dependencies, tags guage instead of TCG [4] for analysis. must be combined. For example, after the computation for x = y +z the tag for x should contain the union of the tags for y and z; or, in Figure 4 shows an overview of the PIITracker architecture. PI- single-bit tag DIFT systems, x should be tainted if either y or z is ITracker runs on top of Linux (we used Ubuntu 14.04 for develop- tainted. ment and testing) as the host OS, and supports Windows 7 32-bit as An indirect flow occurs when information dependent on program the guest OS. PIITracker interacts with three other plugins, namely input determines how information flows. There are two types of taint2, syscalls2 and OSI/Win7x86intro which are described below: indirect flows: address and control dependencies. Figure 2 provides • taint2: This plugin tracks information flow throughout the an example of address dependencies. In this figure, if the characters system via whole-system taint analysis. It provides APIs in str1 are tainted then the characters in str2 at the end should and callbacks to tag memory locations and later query a also be tainted, because they carry the exact same information.