Enabling Fine-Grained Permissions in Smartphones

by

Nisarg Raval

Department of Computer Science Duke University

Date: Approved:

Ashwin Machanavajjhala, Supervisor

Landon Cox

Alvin Lebeck

Maria Gorlatova

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University

2019

ABSTRACT

Enabling Fine-Grained Permissions in Smartphones

by

Nisarg Raval

Department of Computer Science Duke University

Date: Approved:

Ashwin Machanavajjhala, Supervisor

Landon Cox

Alvin Lebeck

Maria Gorlatova

An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University

2019

Copyright © 2019 by Nisarg Raval
All rights reserved except the rights granted by the Creative Commons Attribution-NonCommercial License

Abstract

The increasing popularity of smart devices that continuously monitor various aspects of users’ lives, and the prevalence of third-party services that utilize these data feeds, have resulted in a serious threat to users’ privacy. A one-sided focus on the utility of these applications (apps) and the lack of proper access control mechanisms often lead to inadvertent (or deliberate) leaks of sensitive information about users. At the core of protecting user data on smart devices lies the permissions framework, which arbitrates apps’ accesses to resources on the device. The existing permissions frameworks in smartphones are largely coarse-grained, allowing apps to collect more information than is required for their functionality and thereby putting users’ privacy at risk. In this dissertation, we address these privacy concerns by proposing an extensible permissions framework that gives users fine-grained control over the resources accessed by apps. It uses permissions plugins, which are special modules that govern an app’s access to resources on the device. We develop a number of permissions plugins to arbitrate access to key resources including location, contacts, camera, and external storage. Moreover, we show that existing privacy solutions can be easily integrated into our framework via plugins. We also develop two novel privacy frameworks that help users balance privacy-utility tradeoffs and allow them to make an informed decision about sharing their data with apps in order to obtain services in return. We envision a repository of permissions plugins where privacy experts publish plugins that are customized to the needs of users as well as apps, and users simply install the plugins they are interested in to protect their privacy.

To Mom, Dad, and Mansi

Contents

Abstract iv

List of Tables x

List of Figures xi

Acknowledgements xiii

1 Introduction 1

1.1 Primary Contributions ...... 8

1.2 Thesis Organization ...... 9

2 Background 11

2.1 Android Permissions ...... 11

2.2 SE Android ...... 13

2.3 Android Internals ...... 14

2.3.1 Binder IPC ...... 14

2.3.2 Launching apps ...... 15

3 Plugin-driven Permissions Framework for Smartphones 16

3.1 Introduction ...... 17

3.2 Related Work ...... 19

3.3 Overview ...... 22

3.3.1 Trust model ...... 22

3.3.2 Design principles ...... 22

3.3.3 Dalf ...... 26

3.3.4 Malicious plugins ...... 27

3.4 Implementation ...... 29

3.4.1 Plugins ...... 29

3.4.2 Interposers ...... 30

3.5 Permissions Plugins ...... 35

3.5.1 Location plugin ...... 35

3.5.2 Contacts plugin ...... 35

3.5.3 Camera plugin ...... 36

3.5.4 Storage plugin ...... 37

3.6 Evaluation ...... 37

3.6.1 Experimental methodology ...... 37

3.6.2 Performance slowdown ...... 39

3.6.3 Memory overhead ...... 42

3.6.4 Battery usage ...... 42

3.6.5 Scalability ...... 43

3.6.6 Real world apps ...... 44

3.7 Limitations ...... 45

3.7.1 Design ...... 46

3.7.2 Prototype ...... 47

3.8 Conclusion ...... 48

4 Privacy Markers for Protecting Visual Secrets 49

4.1 Introduction ...... 50

4.2 Motivation ...... 51

4.3 Approach Overview ...... 53

4.3.1 Design principles ...... 53

4.3.2 Trust and attacker model ...... 56

4.3.3 Limitations ...... 57

4.4 Implementation ...... 58

4.4.1 Android’s camera subsystem ...... 58

4.4.2 WAVEOFF in Android ...... 60

4.5 Evaluation ...... 64

4.5.1 User study ...... 64

4.5.2 Evaluating real-world scenarios ...... 69

4.5.3 Performance impact on mobile device ...... 73

4.6 Related Work ...... 75

4.7 Conclusion ...... 77

5 Sensor Privacy through Utility Aware Obfuscation 78

5.1 Introduction ...... 78

5.2 OLYMPUS Overview ...... 82

5.2.1 Problem setting ...... 82

5.2.2 Design principles ...... 83

5.2.3 Privacy framework ...... 85

5.3 Utility Aware Obfuscation ...... 88

5.3.1 Problem formulation ...... 88

5.3.2 Learning to obfuscate ...... 92

5.4 Implementation ...... 94

5.4.1 Constructing OLYMPUS ...... 94

5.4.2 Training OLYMPUS ...... 95

5.4.3 Deploying OLYMPUS ...... 97

5.5 Experiments ...... 98

5.5.1 Experimental setup ...... 99

5.5.2 Evaluation on android app ...... 103

5.5.3 Evaluation on benchmark datasets ...... 105

5.5.4 Utility evaluation ...... 107

5.5.5 Privacy evaluation ...... 107

5.5.6 Privacy-utility tradeoff ...... 109

5.5.7 Effect of correlation ...... 111

5.5.8 Obfuscation time ...... 113

5.5.9 Scaling to multiple applications ...... 113

5.5.10 App classifiers ...... 114

5.6 Related Work ...... 115

5.7 Conclusion ...... 117

6 Conclusion 119

6.1 Future Work ...... 121

A Supporting Materials for OLYMPUS 124

A.1 Neural Network Architectures ...... 124

A.1.1 OLYMPUS for images ...... 124

A.1.2 OLYMPUS for motion sensors ...... 129

Bibliography 132

Biography 141

List of Tables

3.1 A summary of how prior work compares to DALF...... 19

3.2 The resource interposers currently supported by the DALF prototype. . . . 30

3.3 Configurations used in evaluating DALF...... 38

4.1 WAVEOFF API for interacting with the camera service...... 61

4.2 Salient features of user study participants...... 66

4.3 Scenarios to evaluate WAVEOFF...... 69

5.1 Summary of benchmark datasets used for evaluating OLYMPUS...... 99

5.2 Evaluation results on DL4Mobile...... 103

5.3 Example images obfuscated by OLYMPUS...... 104

5.4 Accuracy of attackers on obfuscated data...... 107

5.5 Evaluating OLYMPUS using LR as an app classifier...... 114

List of Figures

2.1 Android permissions...... 12

3.1 A high-level overview of DALF’s design...... 25

3.2 The performance slowdown in DALF when accessing various resources. . 39

3.3 The memory overhead and battery usage in DALF under different workloads. 40

3.4 Performance of DALF as we apply plugins to multiple instances of LOCFINDER (top row) and FILEREADER (bottom row)...... 41

3.5 Results of finding nearby restaurants in Grubhub app...... 44

4.1 Hypothetical scenarios...... 52

4.2 An illustration of marking a coffee mug safe via WAVEOFF...... 54

4.3 Android’s camera subsystem...... 59

4.4 Usability results of WAVEOFF...... 67

4.5 WAVEOFF results on (a) plain background, (b) plain background with private object, and (c) cluttered background with private objects use cases...... 70

4.6 (a) Frames per second achieved by WAVEOFF across all the use cases. (b) Accuracy (left) and runtime (right) on multi object video...... 72

4.7 Performance impact on (a) Power consumption, (b) Memory consumption, and (c) CPU load, over 60 seconds usage...... 74

5.1 OLYMPUS framework ...... 85

5.2 OLYMPUS architecture for image data...... 94

5.3 OLYMPUS architecture for motion sensor data...... 95

5.4 An illustration of how OLYMPUS intercepts and obfuscates images requested by the target app Classify...... 97

5.5 Accuracy of App (in blue) and Attacker (in red) networks on obfuscated data while training the Obfuscator...... 105

5.6 Classification accuracy of the target apps on unperturbed and perturbed data ...... 106

5.7 Comparison with existing approaches...... 109

5.8 Learning obfuscation when sensitive and useful properties are correlated. 111

Acknowledgements

The last six years of my life have been a rollercoaster ride with many ups and downs. Nevertheless, it was a joyful and intellectually satisfying ride, thanks to the folks who were instrumental in making my PhD a successful journey! I am extremely grateful to my advisor, Ashwin Machanavajjhala, for recognizing my potential and my yearning for research. His invaluable guidance and an unvarying impetus to push my boundaries have helped me become a better researcher. I am also thankful to Landon Cox for his guidance and feedback that played a major role in shaping this dissertation. I would like to thank all my committee members for their critical feedback. I enjoyed working with many amazing collaborators who helped me nurture my research skills and made the paper deadlines bearable! Special thanks to Ali, Animesh, and Xi; I learned a lot from them. I would like to thank Ye Wang from MERL and Badrul Sarwar from LinkedIn for mentoring me during my internships. I thank my friends from Duke, without whom it would be hard to survive grad school life! Ios, you were there at every step of my PhD, from brainstorming ideas to celebrating the milestones. Thank you for your constant support, encouragement, and drinks. I am thankful to Lesia for being a wonderful office-mate with whom I have enjoyed countless hours of discussions on almost every topic. I have also enjoyed the technical discussions with Ben, Ergys, Maryam, and Yan. I thank Marilyn for tirelessly helping me with all the departmental requirements. Thanks to Abhi, Aditi, Seth, and Shruti for making the after-school hours enjoyable.

I have been blessed with a wonderful family who has always been there for me with their unconditional support and love! Whatever I have achieved in my life would not have been possible without them by my side. This work was supported in part by the National Science Foundation grant 1253327 and by DARPA and SPAWAR under contract N66001-15-C-4067.

1

Introduction

Mobile phones were introduced as a means of communication, allowing people to connect from anywhere in the world. With the introduction of smartphones equipped with sophisticated sensors and the power of the Internet, the mobile phone is no longer a mere instrument of communication. People started developing applications (apps) for smartphones that allow users to perform a wide range of activities such as taking pictures, booking tickets, playing games, and monitoring health, to name a few. Today, almost all smartphones come with a sizable set of pre-installed apps. Moreover, users can easily install new apps of their interest from the millions of apps available on popular platforms like Google Play [18] and the App Store [11]. Many smartphone apps provide services based on the information they collect about users and their surroundings through the resources available on the device, such as the camera, GPS, and accelerometer. Large-scale data collection and analysis is crucial for many applications that provide personalized services. For example, apps providing location-based services (e.g., navigation, weather updates, etc.) need to access the user’s location, fitness apps need to access the accelerometer to monitor the user’s activity, and video chat apps need to access the camera to enable video calling.

Fine-grained collection of user data contains highly sensitive information. For instance, location tracking reveals intimate details of a user’s life such as visits to a speciality clinic [70], a user’s health status could be inferred from motion sensor data [122], and videos captured by cameras on smartphones could include confidential documents [104]. A naive solution to protect users’ privacy is to perform on-device analytics so that user data never leave the device. However, many apps offload user data to a server or the cloud for various analytics. It is hard to avoid this offloading because a) many apps use complex algorithms that are difficult to run on a resource-constrained device, b) app developers often use third-party services that are hosted in the cloud, and c) apps may use proprietary code or data that cannot be shipped to a user’s device. Not only do mobile apps collect sensitive user information, but they also sell the user data to third-party companies who then use it for various purposes such as user profiling, targeted advertising, and surveillance. Consider a recent study conducted by The New York Times on how apps collect and sell users’ location information without their knowledge [38]. According to the study, at least 75 companies receive precise location data from apps that provide location-based services such as weather information or local news. Many of these companies were tracking up to 200 million mobile devices in the United States. By inspecting one such database, they found that the user’s location was recorded as often as every two seconds! Even though the location data was anonymized, the Times successfully identified users from the anonymized location data. In fact, a separate study on the mobility data of 1.5 million individuals showed that only four spatio-temporal points are enough to uniquely identify 95% of the individuals [59]. It also showed that coarsening the spatial and temporal resolution of the trajectories does not significantly decrease the uniqueness of human mobility trajectories. In summary, the user’s privacy in a mobile world is at risk due to the ability to collect large amounts of private user data, and the existence of multiple agents that are incentivized to exploit the user data. This leads us to the following question: how do we give users fine-grained control over the information they share with third-party services (apps)?

All the leading operating systems, such as Android and iOS, protect the user’s privacy through permissions. The OS mediates apps’ accesses to various resources (e.g., camera, GPS, etc.) on the device. Whenever an app wants to access a particular resource, the OS first asks the user for the necessary permission. The app is given access to the requested resource only if the user grants the appropriate permission. Even though the permissions frameworks in mobile operating systems have evolved over the years, they have largely remained coarse-grained and binary. In other words, the user has only two choices when an app requests a particular permission: allow or deny. Moreover, if the user allows the app access to a specific resource, the access is granted forever unless the user explicitly revokes it later. Many apps exploit the coarse-grained nature of the permissions framework to collect more information than is needed for their functionality [65, 101, 124]. As explained before, apps thrive on user data. Thus, simple binary access control (allow/deny) is not sufficient to provide strong privacy without compromising the utility (functionality) of the app, and vice versa. Consider a restaurant-finder app that needs to access the user’s location to display nearby restaurants. Allowing access to the location jeopardizes the user’s privacy, while denying the access breaks the functionality of the app. However, there exist many alternatives that provide better privacy-utility tradeoffs. For example, the user can share her approximate location (e.g., within a city block) with the app, thereby achieving some level of privacy without breaking the app. In fact, differential privacy (DP) [61] is one such mechanism that allows users to share their (obfuscated) data with provable privacy guarantees. It has been shown that users’ privacy preferences are context-dependent, and the decision to grant access to a resource depends on a number of external factors such as the time of day [56, 119], the app’s state [54, 66], and access to other resources [48]. Thus, it is necessary to move beyond binary permissions and redesign the permissions framework so that it seamlessly supports contextual policies and complex protection mechanisms like DP.
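To make the idea of sharing an obfuscated location concrete, the sketch below perturbs a coordinate with planar Laplace noise, the mechanism underlying geo-indistinguishability, a location-specific variant of DP referenced later in this dissertation. The class is purely illustrative (it is not part of Android or of the framework proposed here), and the conversion from meters to degrees is approximate.

    import java.util.Random;

    /** Illustrative planar Laplace (geo-indistinguishability style) location noise. */
    public final class LocationObfuscator {
        private static final double EARTH_RADIUS_M = 6371000.0;
        private final double epsilon;          // privacy parameter, in units of 1/meter
        private final Random rng = new Random();

        public LocationObfuscator(double epsilon) {
            this.epsilon = epsilon;
        }

        /** Returns {latitude, longitude} perturbed with two-dimensional Laplace noise. */
        public double[] perturb(double lat, double lon) {
            // The direction is uniform; the radius follows Gamma(2, 1/epsilon),
            // which is the radial marginal of the planar Laplace density.
            double theta = 2.0 * Math.PI * rng.nextDouble();
            double r = -(Math.log(1.0 - rng.nextDouble())
                    + Math.log(1.0 - rng.nextDouble())) / epsilon;

            // Convert the metric offset into degrees of latitude and longitude.
            double dLat = Math.toDegrees((r * Math.sin(theta)) / EARTH_RADIUS_M);
            double dLon = Math.toDegrees((r * Math.cos(theta))
                    / (EARTH_RADIUS_M * Math.cos(Math.toRadians(lat))));
            return new double[] { lat + dLat, lon + dLon };
        }
    }

With epsilon expressed per meter, smaller values spread the reported location over a wider area; a value around 0.01 yields noise on the order of a couple of hundred meters, roughly the city-block granularity mentioned above.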

Developing such a permissions framework for a smartphone OS is challenging for the following reasons. First, privacy preferences vary across users, so a single fixed framework baked into the OS cannot satisfy all of them. Second, smartphones host a wide range of applications that are constantly being upgraded; it is infeasible for an immutable privacy framework to cater to the ever-changing needs of the apps. Third, data protection mechanisms are rapidly evolving as new attacks are discovered, and it is difficult to rapidly update a permissions framework residing in the OS to incorporate state-of-the-art privacy solutions. Finally, most existing solutions specify access control policies in a predefined domain-specific language, which often fails to express complex contextual policies and protection mechanisms.

In this dissertation, we address the above challenges by proposing an extensible permissions framework, DALF, that enables users to have flexible, fine-grained control over apps’ accesses to resources on the device. DALF is primarily driven by permissions plugins, which are special modules that govern the access to user data or resources on the device. The permissions framework provides a common platform to enforce the various policies specified by the permissions plugins. The key design principle in DALF is that the permissions plugins are apps, which empowers DALF with the following important characteristics:

• Limit trust in permissions plugins: It is crucial to limit the trust extended to permissions plugins in order to minimize the damage caused by malicious or buggy plugins. A malicious plugin can compromise the entire system if it runs in the OS. On the other hand, if it runs in the address space of each app, then it can exfiltrate sensitive data from the app, as we demonstrate later in Section 3.3.2. Thus, we designed permissions plugins as apps so that they are subject to the same isolation mechanisms that Android applies to regular apps.

• Prevent circumvention of permissions plugins: It should not be possible for apps to bypass the permissions plugins; otherwise, they can have unrestricted access to the resources on the device. If a plugin is running in an app’s address space, then it is possible for the app to bypass the plugin by changing its runtime environment. Having plugins as apps ensures that a) other apps cannot interfere with the plugins, and b) plugins can enforce the restriction (e.g., obfuscate the location) before apps access the user data.

• Flexible permissions plugins: Having permissions plugins as apps gives plugins the same capabilities as apps. Thus, plugin developers can specify complex policies, such as obfuscating the location using Geo-Indistinguishability, a variant of DP [43]. Moreover, they do not need to learn any additional language or tools, as they can use the same development tools they use for developing a regular Android app.

At a high level, DALF works as follows. When an app requests a particular resource, DALF invokes the corresponding plugin and provides it the original resource data. In response, the plugin can perform one of the following three operations: a) allow access to the data, b) deny access to the data, or c) return modified (perturbed) data to forward to the app. We envision a repository of permissions plugins where privacy experts can develop plugins that are customized to the needs of users as well as apps, and users can simply install the plugins they are interested in.

We implemented DALF by instrumenting Android 8.1; it mediates access to the following resources: GPS (location), contacts, calendar, camera, and external storage. We developed permissions plugins that enforce different privacy policies for each of these resources. We also demonstrate that existing privacy frameworks can be easily implemented as plugins in DALF. Our evaluation of the prototype with microbenchmarks shows that DALF incurs very low performance overhead.

DALF is a general purpose permissions framework whose power lies in the permissions plugins it uses to arbitrate an app’s access to user data. Since our goal is to move beyond binary permissions (allow/deny), we focus on developing novel permissions plugins that provide strong protection for sensitive data (privacy) while maintaining the functionality of the apps (utility). Developing such a plugin is challenging because an app’s utility is often tightly coupled with the user’s sensitive information. For example, a navigation app may not work properly if it does not receive accurate information about the user’s location. Thus, it becomes difficult to achieve complete privacy without compromising functionality, and vice versa. To this end, we developed the following two permissions plugins that balance the privacy-utility tradeoff and allow users to make an informed decision about sharing their data to obtain services in return.

• WAVEOFF: It is a privacy marker based approach that gives users fine-grained control of what visual information apps can access through a device’s camera. WAVEOFF allows a user to mark an object as safe to release to an app. It has a special user interface (UI) through which users can easily mark objects safe in a live camera feed. When a user activates the plugin, WAVEOFF blocks all unmarked regions in the camera feed so that apps can only access objects that have been explicitly marked safe by the user.

Although WAVEOFF can be used as a plugin in DALF, we developed WAVEOFF as an end-to-end privacy framework that can be used independently. We developed a prototype implementation of WAVEOFF by instrumenting Android 5.1 running on a Nexus 5. Our evaluation on a benchmark of representative videos shows that WAVEOFF blocked almost all unsafe regions (∼99%) while supporting more than 20 frames per second. We also conducted a user study to assess the usability of our prototype. All 26 participants successfully used WAVEOFF to mark public objects containing QR codes. Overall, they reported that marking and scanning the QR code was easy and fast.

• OLYMPUS: It is a utility aware obfuscation mechanism to protect sensitive information in sensor data. A key observation that leads to the design of OLYMPUS is that the personal data gathered by apps are offloaded to the cloud for analytics, and often these analytics are limited to running the data through a machine learning (ML) model. For instance, data from the accelerometer are fed into an activity recognition model to predict a user’s activity. OLYMPUS uses a game theoretic approach to learn an obfuscation mechanism that limits the risk of disclosing private user information (i.e., privacy) while minimally affecting the functionality the data are intended for (i.e., utility). We model the problem of learning the obfuscation mechanism as a minimax optimization and solve it using adversarial networks, where privacy and utility are enforced through carefully designed loss functions; a sketch of such an objective is given after this list.

In OLYMPUS, a user can specify the privacy requirements using a set of labeled examples that contain private information. A user can specify the utility requirements by providing one or more apps (denoted as target apps) whose functionality must be preserved. Given this training data and access to the target apps, OLYMPUS learns an obfuscation function that jointly minimizes both privacy and utility loss. Here, privacy loss captures the failure to protect the private information, while utility loss captures the failure to preserve the functionality of the target apps. By minimizing both, OLYMPUS learns an obfuscation mechanism such that the obfuscated data hides the private information and can still be used by the unmodified target apps. In cases where the private properties are correlated with the objective of the app’s ML model, OLYMPUS allows the user to trade off between privacy and utility. When the target app requests the user data, OLYMPUS uses the learned obfuscation mechanism to return the obfuscated data.

OLYMPUS is a general purpose obfuscation mechanism that can be used as a plugin in DALF as well as an independent mechanism on platforms other than smartphones. Hence, we implemented OLYMPUS as an end-to-end privacy framework that works on various data modalities. Our evaluation on a benchmark dataset shows that OLYMPUS successfully limits the disclosure of private information without significantly affecting the functionality of the apps. We also deployed OLYMPUS on a Nexus 9 tablet and evaluated it using a real-world handwriting recognition app.
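The exact objective and loss functions are developed in Chapter 5. Purely as an illustration, and with notation chosen here rather than drawn from that chapter, an adversarial objective of this general shape can be written as

    \[
      \min_{\theta}\;\max_{\phi}\;
      \mathbb{E}_{(x,\,y,\,s)}\Big[
        \ell_{\mathrm{util}}\big(f(O_{\theta}(x)),\,y\big)
        \;-\;\lambda\,\ell_{\mathrm{priv}}\big(A_{\phi}(O_{\theta}(x)),\,s\big)
      \Big],
    \]

where O_θ is the obfuscator, f is the target app’s fixed model with task label y, A_φ is an adversary trying to recover the sensitive label s, and λ trades privacy off against utility. Minimizing over θ keeps the app’s loss low while driving up the loss of the best adversary; maximizing over φ models the strongest attacker, and in practice both players can be realized as neural networks trained alternately.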

1.1 Primary Contributions

In this section, we summarize the primary contributions of this dissertation.

Extensible plugin-driven permissions framework for Android

The permissions framework is at the core of controlling access to resources on smartphones. We identify the need for developing an extensible permissions framework that adapts to the varying needs of users and apps. We achieve this goal by designing a plugin-driven permissions framework that utilizes user-selected permissions plugins to arbitrate apps’ accesses to resources on the device. In particular, we achieve extensibility by separating policies from enforcement: the privacy policies are specified by the plugins and the enforcement support is provided by the framework. Moreover, we designed permissions plugins as apps, allowing us to a) limit the damage caused by malicious or buggy plugins, b) prevent apps from circumventing the plugins, and c) provide great flexibility to develop complex plugins. We show that previously proposed privacy frameworks can be easily implemented as plugins. We also developed novel permissions plugins to arbitrate access to key resources on the device.

Privacy markers for hiding visual secrets

In many situations, sensitive objects are maliciously (or accidentally) captured in a video frame by third-party applications. We developed a privacy marker based approach to enforce the principle of least privilege in apps’ access to a device’s camera. Our proposed marking technique allows users to easily mark a particular object as safe to release to apps, and only safe objects are delivered to the apps in the camera feed. Performing real-time image processing in a resource-constrained environment was a challenging task. To address this challenge, we use a fusion of object detection and object tracking to efficiently find safe objects in the camera feed in real time. Our optimized pipeline-based implementation achieves high accuracy in blocking unsafe regions without affecting either the app’s performance or the user’s experience.

Utility aware obfuscation

Many apps offload potentially sensitive user information to the cloud and use machine learning to perform various analytics. We employ a game-theoretic approach to design a utility aware obfuscation mechanism wherein inputs to a machine learning model are obfuscated to minimize a) the private information disclosed to the model, and b) the accuracy loss in the model’s output. We modeled the problem as a minimax optimization that allows users to trade off privacy and utility. To the best of our knowledge, we present the first proof-of-concept implementation of a utility aware obfuscation mechanism, deploy it on a smartphone, and evaluate it against a real-world mobile app. We show that the proposed obfuscation mechanism successfully hides private information in the obfuscated data and allows apps to utilize the obfuscated data without any modifications to the apps. Moreover, the proposed approach works across different data modalities and supports multiple apps with a single obfuscation mechanism.

1.2 Thesis Organization

The rest of the dissertation is organized as follows. We provide the necessary background on the Android OS in Chapter 2. Chapter 3 presents DALF, a plugin-driven permissions framework, together with several permissions plugins that arbitrate access to key resources on the device. In Chapter 4, we introduce WAVEOFF, a privacy marker based approach to protect visual secrets, and demonstrate its usability with a user study. We then present OLYMPUS, a utility aware obfuscation mechanism, in Chapter 5, and demonstrate its applicability on images and motion sensor data. Finally, we conclude in Chapter 6 with a summary of our work and future directions.

2

Background

In this chapter, we provide background on the portions of Android salient to this dissertation. Android is a Linux-based OS that is bundled with a set of trusted software components collectively known as the Android framework. The framework implements all of the necessary tooling and APIs required by apps. It is also responsible for several important tasks such as setting up a runtime environment for each app; ensuring the usability of the device by terminating misbehaving apps that use too much memory or are unresponsive to user input; and mediating apps’ accesses to sensitive resources on the device, e.g., the camera or GPS location.

2.1 Android Permissions

The framework assigns unique permissions labels to each system resource and requires apps to declare the resources they need. For example, developers must specify the CAMERA label in a manifest file bundled with the app if it makes use of the camera. Android makes a distinction between normal and dangerous permissions. The resources in the former class, such as Internet access, are considered to pose a low risk to user privacy and are granted automatically [5]. The latter are considered high-risk, sensitive resources. Prior to Android 6.0, a user had to grant all the dangerous permissions an app needs at install time; otherwise, the app was not installed. On Android 6.0 and later, apps do not receive dangerous permissions automatically. At runtime, before an app uses sensitive resources, it is required to check whether it has the necessary permissions. If it does not, it may request them from the user, who is then presented with a dialog that indicates the permission the app needs. If the user grants the request, the app may access the resource for the rest of its lifetime or until the user revokes the permission later. If permissions are denied, the app may either provide a degraded service or refuse to provide it at all until permissions are granted. Android 6.0 also allows users to change their dangerous permissions decisions by toggling them on a per-app/per-resource basis in the system settings. The permissions granted to an app are recorded in a database managed by the PackageManagerService, a trusted service that is part of the framework.

FIGURE 2.1: Android permissions. (a) Permissions dialog; (b) permissions by app; (c) permissions by resource.

An example permissions dialog and the system settings to manipulate the permissions are shown in Figure 2.1. Android enforces most permissions using IPC (interprocess communication). It only allows certain trusted components in the framework to have raw, unmediated accesses to sensitive resources. All other apps have to request them via IPC calls to the framework. When the trusted components receive a request for a resource, they use the identity of the requesting app in conjunction with the permissions database managed by the PackageManagerService to determine if the request should be allowed. If so, the framework sends the resource, or a reference to the resource, in a return IPC call to the app. This approach enables the framework to independently verify whether an app has the necessary permissions to access different resources.
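As a concrete illustration of the app-side flow described above (available from API level 23, i.e., Android 6.0), the sketch below checks for and requests the dangerous CAMERA permission at runtime; the activity name and request code are arbitrary.

    import android.Manifest;
    import android.app.Activity;
    import android.content.pm.PackageManager;
    import android.os.Bundle;

    public class CameraActivity extends Activity {
        private static final int REQ_CAMERA = 1;  // arbitrary request code

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            // The CAMERA permission must be declared with <uses-permission> in
            // AndroidManifest.xml, but the app must also hold the grant at the time of use.
            if (checkSelfPermission(Manifest.permission.CAMERA)
                    != PackageManager.PERMISSION_GRANTED) {
                // Shows the system permission dialog; the user's answer arrives asynchronously.
                requestPermissions(new String[] { Manifest.permission.CAMERA }, REQ_CAMERA);
            } else {
                openCamera();
            }
        }

        @Override
        public void onRequestPermissionsResult(int requestCode, String[] permissions,
                                               int[] grantResults) {
            if (requestCode == REQ_CAMERA && grantResults.length > 0
                    && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                openCamera();
            }
            // Otherwise, degrade gracefully or disable the camera feature, as described above.
        }

        private void openCamera() { /* camera setup omitted */ }
    }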

2.2 SE Android

Android is a Linux-based OS that primarily uses Discretionary Access Control (DAC). DAC manages access to the resources on the device through the concept of ownership: each resource has an owner who controls the access permissions associated with that resource. Although it is simple, DAC is coarse-grained and prone to privilege escalation attacks. For instance, once a process gains root privilege, it has unregulated access to the system. To address this problem, Android employs fine-grained access control through SELinux, which is a Mandatory Access Control (MAC) system. In MAC, access to resources is controlled based on a predefined security policy, which is defined and managed by a system administrator. SELinux acts on the principle of default denial, meaning that unless access to a resource is explicitly stated in the security policy, it is denied. Thus, even processes with root privileges cannot perform unauthorized actions.

Starting from Android 4.4, Android uses SELinux to enforce MAC over all processes, including processes running with root privileges [26]. However, the use of SELinux policies is limited to kernel-provided resources such as syscalls. Access to resources such as GPS and the camera is still arbitrated through the Android framework as described above. There has been an attempt to extend SELinux to enhance Android permissions through Middleware MAC (MMAC) [23]. However, it has not been adopted by Android due to its complexity. Even with MMAC, which is limited to a binary security policy (i.e., allow or deny access to a resource), it is not clear how to enforce complex protection mechanisms such as providing obfuscated resources to the app.

2.3 Android Internals

2.3.1 Binder IPC

IPC in Android is done with Binder, a core system feature, and with the notion of binder objects. The methods of a binder object instance may be invoked by remote processes even if they did not originally create the object. The only requirement is that they must first receive a valid reference to the object from a process that possesses such a reference. A binder object’s methods are assigned unique identifiers and are marked as either one-way or blocking. As their names suggest, a one-way method returns immediately while a blocking method blocks the caller until the callee returns from the method. The callee is always the process that created the binder object while the caller is the process with the object reference.

When a binder object’s method is called, if the caller and the callee are different processes, a binder transaction is initiated using the binder driver in the Linux kernel. The driver copies the data of the IPC message from the caller’s address space to the callee’s. In blocking transactions, it copies the return value from the callee to the caller at the end of the method. If the caller and callee are the same process, the method call takes place within the process and the binder driver is not involved.

The method arguments and the return data between the caller and callee are transferred using instances of the Parcel class, an Android abstraction over byte arrays. The caller serializes a method’s identifier and arguments into a parcel object before invoking the binder object method and passing the parcel. The callee reads the method identifier from the received parcel, deserializes all the arguments for that method from the parcel, and makes a direct call to the method with the deserialized arguments. If the invoked method has a return value, the callee sends it to the caller in a separate reply parcel.
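The sketch below makes this flow concrete by hand-writing the caller and callee sides of a single blocking method, the pattern that AIDL-generated code normally provides. The interface descriptor, transaction code, and method are illustrative; only the android.os.Binder, IBinder, and Parcel APIs used here are real.

    import android.os.Binder;
    import android.os.IBinder;
    import android.os.Parcel;
    import android.os.RemoteException;

    /** Hand-rolled sketch of a blocking binder call (normally generated by AIDL). */
    public class TemperatureService extends Binder {
        static final String DESCRIPTOR = "com.example.ITemperature";   // illustrative
        static final int TRANSACTION_getTemperature = IBinder.FIRST_CALL_TRANSACTION;

        // Callee side: read the arguments from the parcel, run the method, fill the reply.
        @Override
        protected boolean onTransact(int code, Parcel data, Parcel reply, int flags)
                throws RemoteException {
            if (code == TRANSACTION_getTemperature) {
                data.enforceInterface(DESCRIPTOR);
                String city = data.readString();
                reply.writeNoException();
                reply.writeDouble(lookup(city));
                return true;
            }
            return super.onTransact(code, data, reply, flags);
        }

        private double lookup(String city) { return 21.5; /* placeholder */ }

        // Caller side: serialize the method identifier and arguments, block for the reply.
        public static double getTemperature(IBinder remote, String city)
                throws RemoteException {
            Parcel data = Parcel.obtain();
            Parcel reply = Parcel.obtain();
            try {
                data.writeInterfaceToken(DESCRIPTOR);
                data.writeString(city);
                remote.transact(TRANSACTION_getTemperature, data, reply, 0);  // blocking
                reply.readException();
                return reply.readDouble();
            } finally {
                reply.recycle();
                data.recycle();
            }
        }
    }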

2.3.2 Launching apps

Android uses a unique approach to launching new apps. When the OS boots, the framework starts a special zygote process. Zygote loads the application runtime environment in its address space and then listens for requests to launch new apps. When one is received, it uses the fork() syscall to create a new process. The parent process resumes being zygote. On the other hand, the child process initializes the runtime, relinquishes system capabilities not required by the app, and starts executing the app code. In other words, Android does not use the traditional exec() syscall to execute an application binary. This technique reduces both app startup time and memory usage because an app process starts with the runtime already loaded and will share unmodified regions of its address space, such as those that contain code, with the zygote process.

3

Plugin-driven Permissions Framework for Smartphones

The permissions model in Android is coarse; it either allows apps unconditional access to a resource or denies it completely. Numerous prior works have proposed fine-grained permissions architectures that are also extensible via the use of permissions policies or plugins. However, these architectures either restrict the capabilities of the plugins or require limitless trust in them because they run with excessive privileges. In this chapter, we propose DALF, an extensible permissions architecture not subject to either limitation. Our key insight is that the permissions plugins should be apps themselves, which naturally means they have the same capabilities as apps while being subject to identical isolation mechanisms. That is, plugins do not run with elevated privileges in the OS and may not access the address spaces of regular apps. Experiments with microbenchmarks and case studies with real third-party apps show promising results: plugins are easy to develop and generally impose a small amount of overhead when mediating apps’ accesses to resources.

3.1 Introduction

The permissions framework on Android, which mediates apps’ accesses to sensitive resources on the device (e.g., GPS location, camera, SMS, files on the external storage partition, etc.), has been coarse from the beginning. It uses a binary access control model which either allows apps unconditional access to a resource or denies it completely. Despite being the subject of extensive research [48, 56, 66, 96, 112, 117, 118, 119], it has only seen one major change in practice: beginning with Android 6.0, apps may no longer assume they have the permissions to access sensitive resources just by virtue of being installed [5]. They have to request them from users at runtime. However, the model remains coarse.

Prior works have proposed fine-grained permissions through extensible frameworks [48, 56, 112, 118, 119]. They enable users to base access control decisions on the current context, such as the time of day, by running permissions plugins. Permissions plugins are custom modules or policies that govern the access to user data or resources. Some frameworks also allow plugins to modify the data delivered to an app in response to a resource request. The existence and popularity of open-source projects that implement similar capabilities, such as XPrivacyLua [12, 36], suggest that users desire more fine-grained control beyond what Android permits. An extensible Android permissions framework should minimally provide the following properties: (i) limit the trust extended to the permissions plugins, (ii) prevent apps from circumventing the permissions plugins applied on them, and (iii) allow the development of permissions plugins in a language that does not severely limit their capabilities (i.e., not a restricted domain-specific language). While we defer a thorough treatment of prior work to Section 3.2, we note here that the works we surveyed unfortunately provide only a strict subset of these properties. In particular, some of them run the permissions plugins either in the OS or within the address space of an app’s process. As we discuss in Section 3.3, a misbehaving plugin may cause a lot of damage in the former or exfiltrate confidential app information in the latter.

In this work, we present DALF, an extensible permissions framework for Android that provides all three properties. It allows users to control each resource (e.g., location) accessible by an app with a permissions plugin. When the app requests that particular resource, DALF invokes the corresponding plugin and provides it the original resource data. In response, the plugin must provide a return value (e.g., perturbed data) of the same data type to forward to the app. In a practical setting, we expect privacy experts to develop plugins that protect users under different scenarios and for users to install the plugins they are interested in.

The key design decision in DALF is that the plugins are apps themselves. They are hence subject to the same isolation mechanisms Android applies to regular apps, which goes a long way in satisfying the aforementioned properties. The main difference between plugins and regular apps is that plugins are invoked automatically by DALF using IPC (inter-process communication) whenever there is a need to interpose on a regular app’s resource request.

Our prototype implementation of DALF for Android 8.1 supports the arbitration of the following resources: the user’s location; her contacts and calendar entries; image frames from the camera; and accesses to files stored on the external storage partition, which is a large-capacity partition typically used by an Android device for mass storage. Other resource types (e.g., accelerometer, gyroscope, etc.) can be easily supported with the techniques described in this work. Our evaluation of the prototype with microbenchmarks and a case study with real-world apps demonstrates DALF’s practicality. The performance overheads for location, contacts, and the camera are low. In the case of external storage, it imposes an 80x slowdown over unmodified Android. However, this is due to an unoptimized prototype; we discuss in Section 3.7 how such performance issues may be addressed. We make the following contributions in this work:

• We identify the need for designing permissions plugins as apps in an extensible permissions architecture. Our design allows them to provide fine-grained permissions without needing to run with elevated privileges in the OS or in the same address spaces as app processes. Hence, it limits the damage malicious or buggy plugins may cause.

• We demonstrate, using our prototype implementation, that the design of our proposed framework is flexible enough to express prior work as plugins. More specifically, we implemented (i) a plugin to protect a user’s location privacy using geo-indistinguishability [43], and (ii) a plugin to protect visual secrets using the techniques proposed in PrivateEye [104].

The rest of the chapter is organized as follows: we describe prior work in Section 3.2; we discuss their limitations in the context of our design principles and also provide an overview of DALF in Section 3.3; we detail the implementation of our prototype in Section 3.4 and describe the plugins we developed in Section 3.5. Finally, we evaluate DALF in Section 3.6, discuss its limitations in Section 3.7, and conclude in Section 3.8.

3.2 Related Work

Table 3.1: A summary of how prior work compares to DALF.

System | Limits trust in permissions plugins | Prevents circumvention of plugins | Flexible plugins
CoDRA [112], Pegasus [53], SweetDroid [54], SemaDroid [119], ipShield [48], BinderFilter [118], CRePe [56] | Yes | Yes | No
ASF [45] | No | Yes | Yes
Xposed [35], XPrivacyLua [36] | No | No | Yes
DALF | Yes | Yes | Yes

Improving the permissions model on Android is an active area of work in both the academic and open-source communities. Since the default permissions model is coarse, the aim has generally been to provide fine-grained permissions. Some works also focus on improving usability so that users may enjoy privacy protections with minimal manual effort. We describe prior work in detail in this section. We summarize their limitations relative to DALF in Table 3.1 and elaborate further in Section 3.3.2.

Fine-grained permissions

A popular approach to providing fine-grained permissions has been to take into account the context in which a resource request is made and to return custom data in response to a request, instead of merely allowing or denying it. CoDRA [112], ipShield [48], CRePe [56], and SemaDroid [119] allow users to specify policies to control apps’ accesses to resources on the device based on external context (location, time of day, etc.). ipShield and SemaDroid focus on protecting sensor data while CoDRA and CRePe allow control over a wider set of resources, such as access to WiFi. Pegasus [53] and SweetDroid [54] use code contexts to capture the code paths of apps’ sensitive resource requests. This may be used, for example, to prevent an advertisement library bundled within an audio recording app from accessing the microphone except when the user explicitly records audio. INSPIRED [66] uses the UI elements shown on the screen and the relationships between them as the context of a resource request. It checks whether an app’s UI is in agreement with the intent of the permission requested, e.g., an app that requests the SEND_SMS permission should look like a messaging app.

Xposed [35] is an open-source framework for Android that enables interposition on general portions of the Android API. It allows users to execute custom plugins within the address space of each app and hook well-known API methods. This enables plugins to have strong control over an app’s actions. XPrivacyLua [36] builds on top of Xposed to provide a fine-grained permissions system with a set of default actions, such as faking location data. It also lets users write custom policies using Lua.

BinderFilter [118] is a kernel-level firewall for Binder. It allows users to use context-aware policies to filter and modify IPC messages. These policies are executed directly in the kernel. ASF [45] proposes an architecture for security experts to develop security modules that control the data apps receive when they request sensitive resources.

Protections against specific attacks

PrivateEye and WaveOff protect visual secrets from being inadvertently leaked in images and videos captured by smartphone devices [104]. The authors modified the camera subsystem in Android so that only specially marked regions of the physical environment are made visible to apps using the camera; all other regions are blocked and appear black in the frame.

Usability improvements

SmarPer [96] and Wijesekera et al. [117] proposed the use of machine learning techniques to automatically respond to permission requests from apps and involve the user only when absolutely necessary. ipShield lets users rate how concerned they are about an app’s ability to perform inference attacks, which is the use of data from multiple sources to infer sensitive information, e.g., using the accelerometer and gyroscope together to infer keystrokes [91]. Based on their ratings, ipShield automatically recommends the policies that should be applied to the app.

Others

TISSA [125] is an early work that studied the feasibility of filtering data before delivering it to apps. MockDroid [46] is a version of Android that lets users send mock data to apps when they request resources. SAINT [98] allows an app to provide policies to regulate how other apps may interact with it.

3.3 Overview

In this section, we present the design of DALF, an extensible permissions framework for Android, and the principles that guided it. In the rest of this chapter, our discussions take place in the context of the permissions model used in Android 6.0 and onwards.

3.3.1 Trust model

The TCB (Trusted Computing Base) in this work is the Android OS, which includes the Linux kernel, the Android framework, and the device drivers. DALF does not require any trust in the apps or permissions plugins installed by users. DALF allows users to apply plugins on a per-app/per-resource basis and it provides each activated plugin the raw resource data. For example, a location plugin activated for a particular app receives the raw location data whenever the app accesses the location. Hence, we expect users to trust the plugins they install. A plugin may wish to access additional resources (e.g., to infer context). In such cases, it must explicitly request the user’s permissions to do so since it is an app itself.

3.3.2 Design principles

Below, we describe the important principles that guided DALF’s design.

Limit trust in permissions plugins

We believe that a prerequisite for the practical adoption of an extensible permissions architecture is the ability to limit the trust in the permissions plugins. In other words, the damage that a misbehaving (buggy or malicious) plugin can cause should be limited. As the majority of prior work [45, 48, 53, 54, 56, 112, 118, 119] execute their permissions plugins directly in the OS, the plugins run with elevated privileges. In these systems, a misbehaving plugin could in theory cause a lot of damage. However, with the exception of ASF [45], as shown in Table 3.1, many of them do limit the trust extended to plugins. This is a direct consequence of those systems requiring plugins to be written in custom domain-specific languages (DSLs) with restricted capabilities. The limitations of the DSLs are directly responsible for limiting the damage a misbehaving plugin can perform. Unfortunately, this also means that increasing the capabilities of the DSLs increases the damage potential of misbehaving plugins.

In contrast, other prior work (Xposed [35], XPrivacyLua [36], SmarPer [96]) run the plugins in the address space of each app process. In this approach, misbehaving plugins may exfiltrate confidential information from the app. We demonstrate this by developing a custom Xposed plugin that perturbs location data when the app receives it from the Android framework via IPC. It hooks Location.CREATOR.createFromParcel(), an internal Android method that deserializes the location data structure received in a parcel. The plugin obfuscates the location after deserialization completes. It also hooks the constructor of the android.widget.TextView class to exfiltrate the content of all text elements in the UI. At the moment, the plugin prints them to logcat, the system logger in Android. However, it can be easily modified to send them to a remote server over the Internet. We tested this plugin with Telegram [29], a secure messaging app, by manually sharing the current location with a Telegram contact. As expected, the shared location was perturbed. However, the plugin was also able to access the chat messages users send each other since they are rendered on the UI.
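This is not the plugin used in our test, but a minimal sketch of what such an Xposed hook can look like. The module class name and the coarse rounding that stands in for a real obfuscation mechanism are illustrative; the TextView exfiltration hook and the module’s registration (the assets/xposed_init entry) are omitted, and only the Xposed and Android APIs shown are real.

    import android.location.Location;
    import android.os.Parcel;

    import de.robv.android.xposed.IXposedHookLoadPackage;
    import de.robv.android.xposed.XC_MethodHook;
    import de.robv.android.xposed.XposedHelpers;
    import de.robv.android.xposed.callbacks.XC_LoadPackage;

    /** Illustrative Xposed module: perturbs locations after the app has deserialized them. */
    public class LocationHookModule implements IXposedHookLoadPackage {
        @Override
        public void handleLoadPackage(XC_LoadPackage.LoadPackageParam lpparam) {
            // The hook runs inside the target app's process, i.e., after the data
            // has already crossed the IPC boundary into the app.
            XposedHelpers.findAndHookMethod(
                    Location.CREATOR.getClass(), "createFromParcel", Parcel.class,
                    new XC_MethodHook() {
                        @Override
                        protected void afterHookedMethod(MethodHookParam param) {
                            Location loc = (Location) param.getResult();
                            if (loc != null) {
                                // Modified in place before the app's own code uses it;
                                // coarse rounding stands in for a real obfuscation scheme.
                                loc.setLatitude(Math.round(loc.getLatitude() * 100.0) / 100.0);
                                loc.setLongitude(Math.round(loc.getLongitude() * 100.0) / 100.0);
                            }
                        }
                    });
        }
    }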

Prevent circumvention of permissions plugins

It should not be possible for apps to circumvent the plugins applied on them. This requires the plugins to run their arbitration logic before apps receive the requested resource. Consider instead Xposed-based frameworks. Since an Xposed plugin’s hooks run in the same address space as the app, they can only execute after the data is received in the app process. Hence, it is possible for apps that are aware of these plugins to change their runtime environment and access resources before the plugins modify them. To demonstrate this, we added a small number of modifications (less than 100 lines of code) to the free and open-source version of Telegram [30]. We first added a new method to manually deserialize the location from a parcel. Next, we used YAHFA [37] to change the original pointer of the Location.CREATOR.createFromParcel() method to point to the new method during runtime. We used this modified version of Telegram with XPrivacyLua, reran the test described earlier, and found that the shared location was obfuscated. However, recall from Section 2.3.1 that in binder IPC, methods run after the deserialization of objects from the parcel. The new method we added is invoked during the deserialization step. When our method returns, the location object it creates is eventually perturbed by XPrivacyLua. However, it is too late by this time: the app has already observed the true coordinates during deserialization in our method. In other words, XPrivacyLua was only providing an illusion of safety to the user in this test. We also used our custom malicious Xposed plugin and found it to be ineffective as well. Note its doubly harmful effects in this test: it did not provide any protections and it was exfiltrating chat messages.

Flexible permissions plugins

We believe developers should be able to create plugins without unnecessary restrictions to aid the traction of an extensible permissions framework. SemaDroid [119], ipShield [48], and Pegasus [53], among others, use domain-specific languages (DSLs) to write permissions plugins and are limited. For example, they may not be able to maintain program state, run arbitrary computations, or access other resources such as the Internet. These restrictions limit the types of plugins that can be developed. Consider a trusted privacy watchdog organization. They may wish to create a plugin that tracks if certain apps make network requests to suspicious IP ranges. The plugin itself may need to periodically connect to the Internet to update the IP ranges. Such plugins are not possible in prior work that rely on restrictive DSLs.

FIGURE 3.1: A high-level overview of DALF’s design.

Allow support of all resource types

A permissions architecture’s design should not fundamentally limit itself to only certain resource types. In contrast, consider prior work: SemaDroid and ipShield focus on sensor data such as the accelerometer; BinderFilter [118] does not support interposing accesses to files and does not appear to support perturbing the frames in a camera stream; ASF does not support the camera stream either and although it supports interposing accesses to files, it does so by rewriting the app, which may be defeated if apps use direct syscalls (e.g., open()) in native code.

3.3.3 Dalf

DALF is an extensible permissions architecture for Android that complements its existing permissions model. A high-level overview of its design is illustrated in Figure 3.1. It enables users to apply plugins on a per-app/per-resource basis. If an app accesses a permitted resource and there is a plugin applied for that resource, the Android framework invokes the plugin using binder IPC, waits for a reply from the plugin, and then responds to the app’s request based on the reply. It provides the plugin the identity of the app and an unmodified version of the data. The plugin may allow the request by returning the input data, deny the request by completely redacting the data, or return some modified version of the data. In DALF, plugins are essentially apps that implement the DALF plugin API described in Section 3.4. That aside, they have the same capabilities as regular apps. For example, they may have a UI or request permissions from the user to access additional sensitive resources to determine the current context of the device (e.g., time of day, current location, etc.). In practice, we envision the presence of a plugin store similar to app stores. We expect privacy experts to develop and publish different kinds of plugins on the plugin store, and expect users to install the plugins that suit them. We believe that the plugins’ reputation and trustworthiness would depend on the user ratings of each plugin, the reputation of the entities making them, and whether or not the source code for those plugins is available.

Restrictions

We intentionally placed two restrictions on our design. First, a plugin’s accesses to resources are not themselves mediated by other plugins. Second, users may only apply a single plugin for each resource type accessed by an app, i.e., plugins may not be composed for an app-resource pair. From an implementation perspective, it is straightforward to remove both of these restrictions. However, doing so makes it difficult to reason about the protections ultimately applied to apps. We leave a feasibility study of loosening these restrictions to future work.

Satisfying the design goals

Our approach of having plugins as apps meets the first three goals stated above. First, they are subject to the same isolation mechanisms Android places on regular apps. Hence, they have limited trust by default. They do not run with elevated privileges in the OS nor do they have access to apps’ address spaces. Second, because plugins run as separate processes and are invoked before resources are delivered, apps only receive arbitrated data and cannot circumvent plugins. Third, aside from having to implement the DALF plugin API, developers may use the same tools they use to develop Android apps to create plugins and have access to the same capabilities. Finally, DALF’s general design supports all resource types. We demonstrate this with our prototype in Section 3.4.

3.3.4 Malicious plugins

In this section, we discuss what might happen if users inadvertently install malicious plugins while using DALF and potential mitigation measures. As plugins are apps, they are subject to Android’s application sandbox, which isolates apps from each other and from the OS. Thus, a malicious plugin cannot directly interfere with other apps or the OS. However, it can a) track apps’ access patterns to the resources it interposes on, b) access other resources if granted by the user, and c) feed garbage data to the apps in response to their requests. Here, (a) and (b) are issues of privacy and (c) is an issue of integrity.

It is difficult to prevent a plugin from tracking apps’ access patterns because, for each resource request by an app, the plugin must be invoked with the original requested data. For a stateless plugin, one can replicate the plugin such that for each request made by the app, DALF invokes one replica randomly. This way no one plugin (replica) has a complete view of the access pattern of the app. One can also protect user/app data by preventing the plugin from sharing the data with the outside world, e.g., by denying the INTERNET permission or by providing opaque references to the data [76]. For instance, the GEOIND plugin described in Section 3.5.1 uses differential privacy to perturb the location data. It can perturb the user’s location without accessing her true location since the noise added to perturb the location data is independent of the data value. DALF can prevent the plugin from feeding garbage data by performing integrity checks before forwarding the data to the app. That said, it may be difficult to distinguish perturbed data from garbage data.

Since plugins are apps, we may borrow the measures that are employed to reduce the chances that users install malicious apps to also reduce the probability that malicious plugins are installed in the first place. We can establish a vetting mechanism for plugins or plugin developers, similar to the vetting mechanisms for regular apps, e.g., Google Play Protect [19]. Observe that the scope of a plugin is much narrower than that of a regular app: a plugin’s focus is on resource mediation while apps are far richer in functionality. Hence, we argue that in most cases vetting a plugin would be easier than vetting a regular app. For instance, while a normal app may use a lot of external libraries (e.g., for ads), which are a prime source of malicious/buggy behavior [68], we can ensure plugins only use trusted libraries (e.g., OpenCV [47] for computer vision tasks). Alternatively, the vetting process may be stricter for plugin developers and require them to submit the source code of their plugins for malware analysis.

Finally, similar to open-source privacy plugins for web browsers [1, 16, 21, 25, 33], we believe useful DALF plugins will gain popularity and trustworthiness by being open-source. We also envision a repository where developers are encouraged to publish open-source plugins (such as F-Droid [15]). Although this does not prevent a malicious open-source plugin from existing, it decreases the opportunity for plugin developers to hide malware in their code.

Users may still install a malicious plugin despite these safety measures. In such cases, our plugins-as-apps design allows them to uninstall a plugin as soon as they identify it as malicious. This is akin to uninstalling a malicious app. While it does not undo the damage done (e.g., the leaking of resource data), it prevents further damage.

3.4 Implementation

We implemented a prototype of DALF for Android 8.1 (AOSP branch android-8.1.0_r1 [9]). We modified the Android framework to implement a plugin API, to allow users to install plugins, and to provide a companion settings app that lets users select the plugins to apply for each app-resource combination.

3.4.1 Plugins

Each plugin must meet the following requirements. First, it must be able to run in the background using Android's Service class. Next, it has to declare the resources it arbitrates in a plugin manifest file and, for each of them, define a corresponding interposer (explained below). Finally, it must implement DALF's PluginService class and override the appropriate get() methods in the class to return the interposers. Aside from these, plugins have access to the same capabilities as regular apps, e.g., they may use native code, provide a UI, etc.

When the user installs a plugin, the PackageManagerService reads its manifest and populates a new plugin database. This database is used by the settings app to list the available plugins to the user and to record the plugins applied for each app-resource combination. During runtime, the Android framework starts a plugin's background service on demand, when there is a need to arbitrate on a resource for an app. It retrieves the interposer corresponding to that resource from the plugin, invokes its methods using binder IPC, and then responds to the app based on the plugin's reply. Our prototype only starts a single instance of each plugin and shares it across all the apps that require it.
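To make these requirements concrete, the sketch below shows the overall shape of a plugin. Only the PluginService class name comes from the prototype; the LocationInterposer type, the onLocationAccess() callback, and the getLocationInterposer() getter are hypothetical stand-ins for the get() methods and interposers described above.

```java
import android.location.Location;

// A minimal sketch of a DALF plugin. PluginService is the prototype's base
// class; LocationInterposer, onLocationAccess(), and getLocationInterposer()
// are assumed names for illustration. The plugin also declares the resources
// it arbitrates (here, location) in its plugin manifest, which is omitted.
public class ExamplePlugin extends PluginService {

    // The framework retrieves this interposer and invokes it over binder IPC
    // whenever an app that this plugin is applied to requests a location.
    private final LocationInterposer locationInterposer = new LocationInterposer() {
        @Override
        public Location onLocationAccess(String appPackage, Location original) {
            // Allow, redact, or modify; this sketch simply passes the data through.
            return original;
        }
    };

    @Override
    public LocationInterposer getLocationInterposer() {
        return locationInterposer;
    }
}
```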

Table 3.2: The resource interposers currently supported by the DALF prototype.

Interposer       | Data type        | Arbitrates
Location         | Plain object     | Device location.
Contacts         | Content provider | The user's contacts database.
Calendar         | Content provider | The user's calendar database.
Camera           | Streaming        | Frames from a camera stream.
External storage | Kernel-provided  | External storage file access.

3.4.2 Interposers

In DALF, an interposer is a binder object represented as a Java class with several abstract callback methods. These methods must be implemented by the plugin and are invoked by the framework to interpose on a resource. In general, the framework passes the app’s name and the resource data to the interposer and in return, the interposer must respond with how the app’s resource access should be handled. As the details of an interposer differ based on its data type, we discuss them separately below.

Plain objects

It is straightforward to interpose on resources that are simple class objects. For example, location data in Android are represented as instances of the Location Java class. It uses field variables to represent attributes such as latitude and longitude. Our modifications to the Android framework to handle plain objects are summarized as follows:

During serialization, DALF tracks whether a parcel contains plain objects that represent resources that should be interposed on and, if so, their positions within the parcel. Before the parcel is sent in a transaction to an app, the framework loops through each object and invokes the callback of the appropriate plugin interposer using binder IPC. In doing so, the framework provides the plugin the app's name and a copy of the object. In response, the interposer must return an object instance of the same type, but it has the freedom of changing the values of the object's fields. Once all interposers are invoked, the framework creates a copy of the original parcel but replaces the objects that were interposed on with the objects received from the plugins. Finally, the framework resumes the initial binder transaction to the app but sends the new parcel instead of the original parcel.
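As an illustration of this plain-object contract, the following hypothetical interposer returns a Location of the same type with modified fields, in the spirit of the CONSTLOC configuration used later in Section 3.6. The callback name and signature are assumptions; only the requirement to return an object of the same type comes from the text above.

```java
import android.location.Location;

// Hypothetical plain-object interposer: it receives a copy of the Location
// destined for the app and must return a Location of the same type. Here the
// true coordinates are replaced with a fixed (arbitrary) point, as a
// CONSTLOC-style plugin might do.
public class ConstantLocationInterposer {
    private static final double FIXED_LAT = 36.0014;   // arbitrary constant point
    private static final double FIXED_LNG = -78.9382;

    public Location onLocationAccess(String appPackage, Location original) {
        Location replacement = new Location(original);  // keep provider/accuracy fields
        replacement.setLatitude(FIXED_LAT);
        replacement.setLongitude(FIXED_LNG);
        return replacement;                              // forwarded to the app
    }
}
```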

Content providers

Sensitive resources in Android such as contacts, calendar, and call logs are accessed using the content provider design pattern [3]. Apps need to query the framework and provide (i) a URI that identifies the data type, (ii) the specific data columns the app needs, e.g., display name and phone numbers of contacts, and (iii) other arguments that may affect the results of the query, such as conditional operators. A query's results are tabular in nature, much like the results of a regular database query.

The framework maintains a different content provider for each resource URI. When a content provider receives a query it is responsible for, it executes the query and instantiates a CursorWindow object, thereby allocating a chunk of memory. It writes the query results into the allocated memory and sends a Cursor object back to the app so that it may read the results. Internally, the cursor uses a read-only CursorWindow handle to the memory allocated by the content provider. Similar to plain objects, we support such resources by keeping track of whether a parcel contains CursorWindow handles that were created because the app queried for data interposed on by a plugin. If so, the framework invokes the corresponding interposer and provides the app's name and the CursorWindow handle. In response, the interposer must return a CursorWindow handle for the framework to forward to the app. If a plugin needs to customize the data returned, the interposer must create a new instance of the CursorWindow object, fill the newly allocated memory accordingly, and return the handle.

In our current prototype, plugin developers have to write a considerable amount of boilerplate code to perform simple tasks such as removing the contacts whose phone numbers begin with a particular area code. The amount of manual effort they need to exert may be significantly reduced by adding helper methods that implement the necessary boilerplate. We leave such improvements to future work.
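The sketch below gives a flavor of the boilerplate involved: it copies rows from the original CursorWindow into a fresh one while blanking a column. The CursorWindow calls are standard Android APIs; the callback name, the fixed column layout, and the assumption that the window starts at row zero are purely illustrative.

```java
import android.database.CursorWindow;

// Hypothetical content-provider interposer: the framework hands the plugin the
// CursorWindow holding the query results, and the plugin returns a new window
// to forward to the app. Here every row is copied but the email column is
// blanked out. A 3-column layout with the email at index 2 is assumed purely
// for illustration; a real interposer would inspect the query's projection.
public class EmailHidingInterposer {
    private static final int NUM_COLUMNS = 3;   // e.g., name, phone, email
    private static final int EMAIL_COLUMN = 2;

    public CursorWindow onContactsQuery(String appPackage, CursorWindow original) {
        CursorWindow filtered = new CursorWindow("filtered-contacts");
        filtered.setNumColumns(NUM_COLUMNS);

        for (int row = 0; row < original.getNumRows(); row++) {
            filtered.allocRow();
            for (int col = 0; col < NUM_COLUMNS; col++) {
                if (col == EMAIL_COLUMN) {
                    filtered.putString("", row, col);   // hide email addresses
                } else {
                    filtered.putString(original.getString(row, col), row, col);
                }
            }
        }
        return filtered;   // the framework sends this window's handle to the app
    }
}
```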

Streaming data

A typical Android device has several streaming data sources such as the accelerometer, gyroscope, camera, etc. The framework delivers data from these sources to apps using different implementations of the producer-consumer pattern. We support them in DALF by allowing plugins to relay the data flow between the producer at the source and the consumer at the app. We discuss how our prototype supports the camera stream below.

An app has to send a capture request to libcameraservice, a framework component, to access the camera. At a high level, a request contains camera configuration parameters and the surface that will receive the camera output (i.e., the rendering target). Internally, a surface is a BufferQueue object with references to a producer-consumer binder object pair [8], and a capture request contains the surface's producer object. When libcameraservice receives a request, it verifies that the app has permissions to access the camera, creates a new camera stream, configures it according to the parameters of the request, and registers the surface's producer object with the stream. Subsequently, when the stream renders new frames, it uses the producer object to deliver the frame handles to the app. It is left to the app to use the corresponding consumer object to read the frames.

In our prototype, when libcameraservice configures a camera stream for an app, it checks if the app's camera access is arbitrated by a plugin. If so, it invokes the shouldInterpose() method on the camera interposer and passes the app's name and surface producer object. This enables the plugin to decide whether or not to interpose on that particular stream. If it does not, then the stream is configured normally and frames are rendered directly to the app's surface.

Otherwise, the plugin creates a new surface of its own and sends that surface's producer object back to libcameraservice, which will, in turn, register the producer received from the plugin in the camera stream. Now, whenever the camera stream renders a new frame, the plugin's camera interposer's onFrameAvailable() callback will be invoked with a pointer to the raw frame data to let the interposer modify the frame. Once the method returns, the frame is copied and a handle to the copy is sent to the app (using the app surface's producer object received in the earlier invocation of shouldInterpose()). Our prototype introduces a new InterposableSurface object to perform the heavy lifting necessary for the above tasks and to simplify the development of a camera plugin. Note that frames must be copied to prevent data leakage. We observed that the camera stream reuses the memory allocated for past frames to render a new frame. If, instead of performing a copy, we passed the original handle received from the camera stream to the app, it might be able to read new frames directly and circumvent the plugin.

Our prototype does not yet support other sources of streaming data, such as the microphone and sensors. Our initial investigations suggest that an approach similar to the above will work, as they use different implementations of a producer-consumer model (e.g., BitTube [10] in the case of sensors).
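The camera interposer's role can be sketched as follows. The callback names shouldInterpose() and onFrameAvailable() are the ones described above, but the parameter types and the way the plugin's surface is created here are assumptions made only so the sketch is self-contained.

```java
import android.graphics.SurfaceTexture;
import android.view.Surface;
import java.nio.ByteBuffer;

// Hypothetical camera interposer. shouldInterpose() lets the plugin opt in per
// stream and hand back a surface of its own; onFrameAvailable() is then invoked
// for every rendered frame with the raw pixel data, which the plugin may modify
// in place before the framework copies the frame and forwards the copy to the app.
public class BlackoutCameraInterposer {
    // A plugin-owned surface backed by its own buffer queue (simplified here via
    // SurfaceTexture purely so the sketch is self-contained).
    private final Surface pluginSurface = new Surface(new SurfaceTexture(0));

    public Surface shouldInterpose(String appPackage, Surface appSurface) {
        // Return the plugin's surface to interpose on this stream; a real plugin
        // could instead decline to interpose and let frames go straight to the app.
        return pluginSurface;
    }

    public void onFrameAvailable(String appPackage, ByteBuffer frame,
                                 int width, int height) {
        // Simplest possible arbitration: zero every pixel, i.e., deliver black
        // frames to the app. A real plugin would blur or selectively reveal regions.
        for (int i = 0; i < frame.capacity(); i++) {
            frame.put(i, (byte) 0);
        }
    }
}
```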

Kernel-provided resources

Certain resources, such as files and the network, are provided by the OS kernel. The Android framework is only involved in granting an app the capabilities to request those resources. Once granted, apps may access them using the appropriate Linux syscalls. In the following, we detail how our prototype uses ptrace, a syscall tracer in Linux, to interpose on accesses to files on the external storage partition. In Android devices, this partition typically has a large capacity and is used for mass storage.

In general, ptrace allows a tracer process to trace the syscalls made by another tracee process. When the tracee calls a syscall, the kernel traps to the tracer twice: once before the kernel executes the syscall (syscall-enter-stop), and once afterwards (syscall-exit-stop). The former allows the tracer to inspect and modify the input arguments to the syscall, and the latter allows the same for the syscall's return values.

As explained in Section 2.3.2, Android uses the zygote process to launch new apps. We tweaked it to trace syscalls via ptrace. As usual, zygote performs a fork() to start a child process to run the app. We denote this child CA. If the user applied a plugin on the app to arbitrate accesses to external storage, zygote performs a second fork() and the second child, CT, is designated as CA's tracer. We synchronize CA and CT using shared semaphores so that the tracer may set various ptrace options before CA starts executing the app code. For example, since Android apps are inherently multi-threaded, it sets the PTRACE_O_TRACECLONE option to monitor the syscalls made by all threads in CA.

When CT traps because of syscall-enter-stop, it identifies the syscall being called. If it is one that is used to open files, it reads the filename argument to check whether it resides in external storage. If so, it invokes the storage interposer of the plugin and sends the app's name and the filename. The interposer must now return a path to the file that should be opened. It may allow the access, deny it (by returning an empty string), or redirect it to a different file by returning a new path. In the last two cases, the tracer copies the returned string into an unused portion of the app thread's stack (a popular thread-safe trick of placing new data in the app [24]), changes the filename pointer argument to point to the new string, and then resumes the syscall. Many of the ptrace operations, such as identifying the syscall being called, are architecture-dependent. Our prototype currently supports aarch64 (ARMv8-A, 64-bit mode), the architecture used in modern Android devices, and only checks for calls to openat(), the syscall used to open files in aarch64. We leave the support of other architectures to future work. Even though we focus on interposing on the external storage in this work, this ptrace-based approach can also be used to interpose on other resources accessed through syscalls, such as network requests.
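From the plugin's perspective, the storage interposer reduces to a path-to-path mapping, as in the sketch below. The allow/deny/redirect return convention (the same path, an empty string, or a new path) is the one described above; the callback name and the example paths are hypothetical.

```java
// Hypothetical storage interposer callback. The tracer sends the app's name
// and the path passed to openat(); the interposer returns the path that should
// actually be opened: the same path (allow), an empty string (deny), or a
// different path (redirect). The directory names are illustrative only.
public class PhotosOnlyStorageInterposer {
    private static final String ALLOWED_DIR = "/sdcard/Pictures/";

    public String onFileOpen(String appPackage, String requestedPath) {
        if (requestedPath.startsWith(ALLOWED_DIR)) {
            return requestedPath;                 // allow whitelisted directory
        }
        if (requestedPath.endsWith(".pdf")) {
            return "";                            // deny sensitive documents
        }
        return "/sdcard/decoy/empty.txt";         // redirect everything else
    }
}
```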

3.5 Permissions Plugins

We demonstrate DALF’s capabilities by developing four plugins, one for each resource

data type. These plugins demonstrate the extensibility of DALF by addressing diverse privacy requirements of users. In this section, we motivate and describe them.

3.5.1 Location plugin

Use case: Alice has an app on her smartphone to discover nearby points of interest (POI) such as restaurants, bars, or coffee shops. It sends her location to the cloud to retrieve the POI within some distance from her location. As Alice is afraid the app might track her movements, she often avoids using the app even though she likes it.

To address such scenarios, we developed GEOIND, a plugin that perturbs a user's location data with geo-indistinguishability [43], a differentially private location obfuscation technique. It adds a carefully chosen amount of noise to the user's location so that, with high probability, an adversary cannot infer a user's true location even after observing the noisy location. The amount of noise added depends on r, a user-specified parameter that represents the radius within which the user wants her privacy protected. Increasing r adds more noise to the location and improves privacy but reduces the accuracy of the POI results. Alice may install this plugin and choose r as desired to receive reasonably accurate results while protecting her true location.
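For concreteness, the sketch below shows the planar-Laplace mechanism that underlies geo-indistinguishability: it samples a direction uniformly and a distance from the radial density proportional to r·exp(−εr) by numerically inverting the CDF C(r) = 1 − (1 + εr)·exp(−εr). This is an illustrative sketch, not the GEOIND plugin's actual code, and the metres-to-degrees conversion is a rough approximation.

```java
import java.util.Random;

// Sketch of planar-Laplace noise for geo-indistinguishability (Andrés et al.).
// An angle is drawn uniformly and a radius r with density eps^2 * r * exp(-eps*r),
// obtained by inverting C(r) = 1 - (1 + eps*r) * exp(-eps*r) via bisection.
public final class PlanarLaplace {
    private static final Random RNG = new Random();

    /** Returns {noisyLat, noisyLng}; eps is the privacy parameter (per metre). */
    public static double[] perturb(double lat, double lng, double eps) {
        double theta = 2 * Math.PI * RNG.nextDouble();   // uniform direction
        double r = sampleRadius(eps, RNG.nextDouble());  // distance in metres

        // Rough metres-to-degrees conversion (assumption: small displacements).
        double dLat = (r * Math.cos(theta)) / 111320.0;
        double dLng = (r * Math.sin(theta)) / (111320.0 * Math.cos(Math.toRadians(lat)));
        return new double[] {lat + dLat, lng + dLng};
    }

    // Invert C(r) = p by bisection; C is monotonically increasing in r.
    private static double sampleRadius(double eps, double p) {
        double lo = 0, hi = 1;
        while (cdf(eps, hi) < p) hi *= 2;                // bracket the solution
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2;
            if (cdf(eps, mid) < p) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }

    private static double cdf(double eps, double r) {
        return 1 - (1 + eps * r) * Math.exp(-eps * r);
    }
}
```

Because the sampled noise is independent of the true coordinates, such a plugin could in principle decide the offset without ever inspecting the user's actual location, as noted in Section 3.3.4.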

3.5.2 Contacts plugin

Use case: Bob uses a social networking app that allows him to find his friends on the network by giving it access to his address book on the phone. Although he likes this feature, he does not want to share all of his personal contacts.

We developed CONTACTSGUARD, a plugin that uses the contacts interposer to run user-specified policies each time an app queries the contacts database. The plugin inspects the results and removes or perturbs entries based on the policies. Bob may install this plugin and set a policy to filter out contacts he does not want to share with the social networking app. Our prototype implementation of CONTACTSGUARD supports two policies: (i) filtering out contacts based on their phone numbers' area code, and (ii) hiding email addresses.

3.5.3 Camera plugin

Use case: Charlie installed a translation app on his phone when he traveled overseas because it simplified translating foreign-language text into his native language. He only had to point his phone camera at the text and the app would translate it. He wants to use it back home, too, to translate foreign-language documents. However, he is worried that he might inadvertently share images of sensitive information lying around in his house, such as bank statements and family pictures. As discussed in Section 3.2, PrivateEye [104] addresses the above problem with a default-deny approach. Users have to specifically identify the portions of the physical environment considered public using special markers. PrivateEye then uses computer vision techniques to identify and disclose only the portions of the image frame that lie within the markers. As PrivateEye is implemented by modifying the camera subsystem, it affects all camera apps and does not consider user context.

Using DALF, we implemented GEOPRIVATEEYE, a location-aware PrivateEye plugin. It allows users to specify the locations where they are concerned about inadvertently sharing sensitive information and only enables PrivateEye in those locations. Charlie may install this plugin, add his home as a sensitive location, and apply it to the translation app. Consequently, when he launches the app at home, the plugin blocks everything by default. He will have to place markers around the documents he wants to translate. However, once he leaves the house, the plugin will not perturb the images the app receives. GEOPRIVATEEYE demonstrates how a plugin in DALF may access data from one resource (location) to arbitrate another resource (camera).

3.5.4 Storage plugin

Use case: Eve stores photos and sensitive documents on the external storage partition of her phone. She wants to use an image-sharing app to share her photos with friends and family. If she uses one, she has to grant it the permission to read files on the external storage since her photos are stored there. However, in doing so, she will also grant it the permission to access her sensitive documents. Hence, she is afraid of using such an app.

We developed a FILEGUARD plugin that implements a storage interposer to handle these scenarios. Similar to CONTACTSGUARD, it allows user-specified policies to dictate whether accesses to files on the external storage should be allowed. It currently supports the following policy types: (i) allow access to whitelisted files, (ii) deny access to blacklisted files, and (iii) deny access to photos based on where they were taken. To support the last policy, it relies on the location data stored in the photo's EXIF metadata. Eve may apply the FILEGUARD plugin to an image-sharing app and whitelist just the photos directory on her external storage, thereby preventing the app from accessing anything else.
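A hedged sketch of the location-based photo policy is shown below. The ExifInterface calls are standard Android APIs; the policy class, the degree-based radius check, and the deny-on-error behaviour are illustrative assumptions rather than FILEGUARD's actual implementation.

```java
import android.media.ExifInterface;
import java.io.IOException;

// Sketch of a FILEGUARD-style policy: deny access to photos whose EXIF
// metadata places them inside a user-defined sensitive region. The simple
// degree-based radius is an assumption made to keep the sketch short.
public final class PhotoLocationPolicy {
    private final double sensitiveLat;
    private final double sensitiveLng;
    private final double radiusDegrees;

    public PhotoLocationPolicy(double lat, double lng, double radiusDegrees) {
        this.sensitiveLat = lat;
        this.sensitiveLng = lng;
        this.radiusDegrees = radiusDegrees;
    }

    /** Returns true if the photo at this path may be shown to the app. */
    public boolean allow(String photoPath) {
        try {
            ExifInterface exif = new ExifInterface(photoPath);
            float[] latLng = new float[2];
            if (!exif.getLatLong(latLng)) {
                return true;   // no location metadata: nothing to protect here
            }
            double dLat = latLng[0] - sensitiveLat;
            double dLng = latLng[1] - sensitiveLng;
            return Math.sqrt(dLat * dLat + dLng * dLng) > radiusDegrees;
        } catch (IOException e) {
            return false;      // unreadable metadata: be conservative and deny
        }
    }
}
```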

3.6 Evaluation

To evaluate DALF, we ask two questions: (i) Is its design practical? (ii) How do plugins behave with real-world Android apps? To answer the first question, we focused on measuring the different aspects of overhead of our prototype. We answer the second question by manually running real-world apps with our plugins and performing a qualitative investigation. In this section, we report our findings.

3.6.1 Experimental methodology

Table 3.3: Configurations used in evaluating DALF.

Resource | Configuration  | Description
-        | STOCK          | Unmodified Android 8.1 (AOSP branch android-8.1.0_r1 [9]).
-        | DALF           | Android 8.1 instrumented with our permissions plugin framework.
Location | LOCNOOP        | DALF + location plugin that does not perform any operation.
Location | CONSTLOC       | DALF + location plugin that replaces the original location with a constant location.
Location | GEOIND         | DALF + GEOIND plugin.
Contacts | CONTACTSNOOP   | DALF + contacts plugin that does not perform any operation.
Contacts | FILTERPHONE    | DALF + CONTACTSGUARD plugin that filters out private contacts.
Contacts | PERTURBEMAIL   | DALF + CONTACTSGUARD plugin that perturbs emails.
Camera   | CAMERANOOP     | DALF + camera plugin that does not perform any operation.
Camera   | BLUR           | DALF + camera plugin that blurs camera frames.
Camera   | GEOPRIVATEEYE  | DALF + GEOPRIVATEEYE plugin.
Storage  | STORAGENOOP    | DALF + storage plugin that does not perform any operation.
Storage  | WHITELIST      | DALF + FILEGUARD plugin with whitelisting policy.
Storage  | IMAGEGUARD     | DALF + FILEGUARD plugin with location-based access control to images.

In order to perform the overhead experiments, we used two different variants of Android: STOCK, an unmodified version of Android 8.1, and DALF, which is Android 8.1 with our permissions framework implemented. We used STOCK as the baseline and ran different workloads on DALF using four microbenchmark apps, one for each resource type. We used each app with a plugin that interposes on the corresponding resource type. The apps and their workloads are detailed below:

• LOCFINDER retrieves the user's location 100,000 times using the LocationManager.getLastKnownLocation() API.

• CONTACTSLOADER retrieves the display name, phone numbers, and emails of the contacts saved on the phone using the ContentProvider.query() API. We pre-populated the phone with 100 contact entries.

• CAMERAPREVIEW displays the frames from a camera stream (i.e., a camera preview) for 30 seconds using the camera2 API [2].

FIGURE 3.2: The performance slowdown in DALF when accessing various resources. (a) Location retrieval time; (b) contacts load time; (c) camera preview FPS; (d) file load time.

• FILEREADER reads all the files in a specified directory located on the phone's external storage. In our workload, the directory has 1000 files, each of size 1 KB.

In Table 3.3, we list the complete set of experimental configurations we used. Note that we ran all our experiments on a Pixel 2 XL smartphone, which has an octa-core CPU and 4 GB of RAM. We used Python scripts that control the phone with ADB (Android Debug Bridge) to run the experiments systematically.

3.6.2 Performance slowdown

We instrumented our microbenchmark apps to compute the elapsed time between requesting a resource and receiving it. We use this to measure the slowdown plugins impose from an app's perspective. For location, contacts, and storage, this is given in milliseconds.

FIGURE 3.3: The memory overhead and battery usage in DALF under different workloads. (a) System memory usage; (b) app memory usage; (c) plugin memory usage; (d) battery usage.

For the camera, we measured the frames per second (FPS) of the preview. Our results are illustrated in Figure 3.2.

As the graphs illustrate, DALF's impact on apps when plugins are not applied is negligible. When plugins are applied, there is a visible slowdown in the time taken for apps to receive resources. For location and contacts, the slowdown is 3.5x and 1.2x, respectively. We attribute this to the additional IPCs in DALF from the Android framework to the plugins. However, in terms of absolute numbers, the slowdowns are less than 1 ms.

FIGURE 3.4: Performance of DALF as we apply plugins to multiple instances of LOCFINDER (top row) and FILEREADER (bottom row). (a) Location retrieval time; (b) memory usage of plugins; (c) power consumption; (d) file read time; (e) memory usage of plugins; (f) power consumption.

In the case of camera, BLUR and GEOPRIVATEEYE deliver the preview frames at a median rate of 10 FPS and 15 FPS, respectively. GEOPRIVATEEYE is faster than BLUR even though it employs complex detection algorithms because, unlike BLUR, it resizes the input frames (from 4032×3024 to 400×240) before performing its detection. Note that GEOPRIVATEEYE does not achieve the frame rate reported in the original paper (20 FPS) because we did not employ all of its optimization techniques [104]. Nevertheless, our proof-of-concept implementation shows the feasibility of incorporating PrivateEye-like

privacy solutions as plugins in DALF.

Finally, reading a file in DALF with a storage plugin is significantly slower (≈80x) compared to STOCK. We believe this is due to the inefficient implementation of the ptrace-based tracer in our prototype, which intercepts every syscall made by the app as opposed to only those related to file access. Android apps invoke a large number of syscalls over their lifetimes. For example, our microbenchmark app made over 15 K syscalls to perform a variety of tasks in addition to reading the files in our workload, such as invoking the binder driver to send and receive IPC calls. We discuss the inefficiencies of the tracer and means to address them in Section 3.7.

3.6.3 Memory overhead

We used Android’s dumpsys tool [4] to measure the memory usage of apps, plugins, and the system services. We use the PSS (Proportional Set Size) metric to report our results, which proportionally includes the memory shared by an app with other processes. For example, if a process A uses 100 MB of memory in total and 50 MB of it is shared with another process, A’s PSS is 75 MB. The results of our memory experiments are illustrated in Figures 3.3a, 3.3b, and 3.3c.

Figure 3.3a shows that DALF's memory overhead on the system process is up to 12 MB across all configurations. This suggests that the DALF modifications to the system services have minimal impact on memory. This is desirable because the permissions framework code runs all the time, irrespective of the apps and plugins currently applied. Similarly, Figure 3.3b shows that the memory usage of the microbenchmark apps is also consistent across different configurations.

Finally, Figure 3.3c shows the memory used by plugins. It depends on the resource a plugin arbitrates and the arbitration mechanism. In the cases of location, contacts, and storage, the plugins only use a small amount of memory over the NOOP plugin as they perform relatively simple operations (e.g., adding noise to location data in the case of GEOIND). However, in the case of the camera, GEOPRIVATEEYE uses about 45 MB of memory on average because it performs intensive image processing. Note that the authors of PrivateEye also reported a similar memory overhead [104].

3.6.4 Battery usage

We used batterystats [6] to measure the usage of the phone's battery in DALF. To keep the experiments consistent, we fixed the screen brightness, disabled Bluetooth and the phone network, and controlled the phone wirelessly with WiFi-ADB. We report the discharge metric of batterystats, which indicates the amount of battery discharged during the experiment.¹ Figure 3.3d illustrates our results. Unlike the time and memory results, we could not observe a clear impact on power due to DALF and found a lot of variation in the results. We attribute this to the discharge metric, which includes the total battery discharge on the phone, including all the other apps and services, and does not focus only on the plugin or DALF. These results suggest DALF's overall impact on the battery may be low. We leave it to future work to rerun the experiments in a more controlled setting (e.g., forcefully terminating all the apps and services that are not part of the experiment).

3.6.5 Scalability

Thus far, we evaluated DALF with a single app using a single plugin. In this section, we evaluate the performance when multiple apps access a resource mediated by a single plugin. To answer this question, we simultaneously ran multiple instances of the LOCFINDER and FILEREADER microbenchmark apps. We focused on these two resources since, according to our previous results, they are on the opposite ends of the performance slowdown spectrum. As Android limits the number of foreground apps that can run simultaneously, we modified our microbenchmarks so that they run their workloads in a background service and ran multiple instances of the services. Our results are illustrated in Figure 3.4.

In the case of location, DALF's impact on resource consumption remains low as the number of apps increases. However, observe that the absolute time taken to access the location has increased. In the case of STOCK with a single app, it takes about 0.4 ms to access the location compared to 0.05 ms in Figure 3.2a. As we ran the previous experiments with an app running in the foreground, we attribute the increase in time to Android's prioritization scheme that prioritizes foreground apps over background services [7]. The results for the storage experiments exhibit a similar trend with one key exception:

1 batterystats shows the amount of battery discharged since the device was last charged. However, as we reset its stats before each experiment run, the discharge value represents the amount of battery discharged during the experiment.

FIGURE 3.5: Results of finding nearby restaurants in the Grubhub app: (a) STOCK; (b) GEOIND.

the absolute time taken to read files is lower than the results shown earlier in Figure 3.2d. We reran the storage workload experiments both in the foreground and background, and closely examined the timestamps observed by the tracer when the apps try to read files. The results confirmed that the app running as a background service was indeed reading files faster than in the foreground. We suspect that the tracer, which runs as a separate process, interacts with Android's prioritization scheme in a way that favors the background service over the foreground. We aim to investigate this further in the future.

3.6.6 Real world apps

We tested our plugins with real-world apps downloaded from the Google Play Store and F-Droid [15]. Our goal was to understand how well the plugins worked and the issues either plugin developers or users would face. In general, we were encouraged by our observations. Most of the apps we tested worked seamlessly with the plugins. We demonstrate this with Grubhub [20], an app to order food from restaurants. First, we searched for nearby restaurants without any plugins. Next, we repeated the search with the GEOIND² plugin applied to perturb our location. In both cases, we sorted the results by the restaurants' distance to the current location. Figure 3.5a and Figure 3.5b show the screenshots of the app's results page from the two experiments, respectively. First, observe that the lists in the two experiments are different. This is expected as the location given to the app is perturbed when the plugin is applied. Second, the overlap of the top results in the two lists is a result of running the experiment in an area with a relatively small number of restaurants. Interestingly, this suggests that there are cases where the utility of an app may only be minimally affected by the use of plugins that perturb resources.

We also discovered that a major benefit of DALF, compared to approaches like Xposed [35], is that a plugin developer only has to focus on whether an app uses a particular resource such as location. She does not have to reason about hooking the myriad Android APIs that provide location data. Consequently, this significantly reduces developer effort (our GEOIND plugin is written in less than 200 lines of Java).

In the cases where we observed issues between apps and plugins, they were due to bugs in our prototype. To illustrate, the VLC media player app [34] currently has issues with plugins that interpose on external storage. We narrowed it down to an interaction between the prototype's syscall tracer and the app. Testing with more real-world apps and fixing such bugs remain an ongoing effort. Given the promising results, we believe an important next step would be to work on the limitations that prevent DALF from being practically adopted. We discuss them further in the next section.

3.7 Limitations

We now discuss the design and prototype limitations of our work.

2 We set ε, the differential privacy budget, to 0.1.

3.7.1 Design

Reasoning about app semantics: In DALF, it is difficult for a plugin to reason about why an app is requesting a particular resource. Consider the use of GEOIND with Google Maps: since it indiscriminately adds noise to the user's location each time the app requests it, the blue dot in the app that indicates the user's current location keeps moving. Consequently, features of the app such as finding directions from the current location become unusable. One approach to address this limitation is to allow users to apply the plugins on a per-app-feature basis. For example, a user may want to allow Google Maps to access her location when she is using it for navigation, but not when she is merely browsing the maps. Alternatively, DALF may provide plugins additional context surrounding the app's resource request, such as whether the app is running in the foreground, whether an app's resource request was initiated by a user input event, etc. In this latter approach, we need to take into account misbehaving plugins so that they are not provided excessive information about users. We consider this the most important limitation to address as it will increase the granularity of the permissions model in DALF and enable a wide class of plugins.

Collusion between apps: It is possible for apps on Android to collude in order to evade the existing permissions model. For example, an app permitted to access the location may do so and send it to other apps that do not have the location permission. This is known as the confused deputy attack [60], and DALF currently does not have any provisions against it. Techniques proposed in prior work, such as Quire [60] and BinderFilter [118], may be applied in DALF to address this limitation.

3.7.2 Prototype

Unimplemented features: Aside from the aforementioned limitations, the design of DALF is general and there is a vast number of possible actions that a plugin may wish to perform: interpose on accesses to sensor data (e.g., accelerometer); interpose on accesses to other content provider data (e.g., SMS); tweak the parameters of a camera capture request; control the network accesses of an app with an IP whitelist or blacklist; allow a resource request only for the next N minutes; etc. These actions are currently not supported by our prototype because the required machinery has not been implemented. We expect to be able to add these features using the binder and ptrace techniques described earlier. We are currently working on expanding our prototype's capabilities.

Inefficient syscall tracer: The ptrace-based tracer used to control an app's access to the external storage is inefficient. First, ptrace traces all syscalls and not just openat(). Hence, the tracer traps twice for every single syscall called by the threads in an app. Second, for each openat() syscall, the tracer has to perform additional syscalls to identify the syscall called by the tracee and to read the filename argument. If the plugin returns a new file path, the tracer has to use more syscalls to copy the new path into the app thread's stack and modify the syscall argument. Third, a single instance of the tracer runs for all threads in an app. If multiple threads in the app call syscalls in parallel, the tracer will handle each of them in a serial manner, thereby unnecessarily impeding parallelism. These issues are limitations of ptrace, and to address them satisfactorily, the kernel needs to be modified. We aim to investigate these further.

In-band frame processing: Our DALF prototype invokes the onFrameAvailable() callback on a plugin's camera interposer whenever a new frame is received from the camera stream. Since the plugin is required to finish perturbing the frame before the callback method returns, our prototype only supports in-band frame processing at the moment. It thus prevents camera plugins from performing certain kinds of optimizations, such as using multiple threads to process frames in a pipeline. Enabling out-of-band processing is a matter of modifying the prototype so that plugins are required to call a sendFrameToApp() method when they have finished perturbing a particular frame. In this approach, onFrameAvailable() will become a non-blocking call that immediately returns after delivering the newly available frame to the plugin.

3.8 Conclusion

In this chapter, we presented DALF, an extensible permissions framework for Android. It gives users fine-grained control over the resources accessed by apps using permissions plugins. The key idea in DALF is that the plugins are apps themselves, thereby granting them a wide set of capabilities while also (i) limiting the amount of trust extended to the plugins, and (ii) preventing apps from circumventing the plugins applied to them. Our evaluation with a prototype implementation suggests that DALF's design is practical. It generally exhibits low overheads, and in those cases where they are high, they appear to be a result of an unoptimized prototype rather than limitations of the proposed framework.

4

Privacy Markers for Protecting Visual Secrets

The increasing popularity of recording devices that continuously capture video and the prevalence of third-party applications that utilize these feeds have resulted in a serious threat to user privacy. In many situations, sensitive objects are maliciously (or accidentally) captured in a video frame by third-party applications. In this chapter, we investigate an approach to mitigating the risk of such inadvertent leaks called privacy markers. Privacy markers give users fine-grained control over what visual information an application can access through a device's camera. We designed a representative system, WAVEOFF, that helps users mark public regions in a camera's view and only delivers the content within the public regions in subsequent frames. We have integrated WAVEOFF with Android's camera subsystem. Experiments with our prototype show that a Nexus 5 smartphone can deliver near real-time frame rates while protecting secret information, and a 26-person user study elicited positive feedback on our prototype's speed and ease of use.

4.1 Introduction

All smartphones have cameras, as do many wearable gadgets and Internet-of-Things devices. These devices are becoming increasingly interconnected, allowing them to continuously monitor a user's surroundings. In order to provide a personalized experience, many applications (running on such devices) gather sensitive information for user profiling. With Internet-of-Things devices and life-logging cameras, the extent to which such information can be gathered is unimaginable [71, 90]. For example, Word Lens¹, a text translation app on Google Glass, accesses a real-time video feed from Glass, detects text in the input, performs the required translation, and projects the translated video back onto the user's view. Strictly speaking, a translation app only needs the textual content of the video, but instead it accesses the whole video even when no text is present! A malicious translation app can easily gather personal information about a user under the pretense of providing translation services. Even if the app is not malicious, it may inadvertently record personal information about the user. For example, a Glass user may accidentally share an embarrassing picture while playing with it. The always-recording feature of wearable devices can leak sensitive information like personal pictures, enterprise secrets, health information, etc. Although apps provide some level of functional access control, the user does not have fine-grained control over what information is revealed and what information is kept private against these third-party apps.

In this work, we address the problem of inadvertent leaks of sensitive information through smartphone cameras. Unfortunately, preventing leaks by determined attackers is extremely hard. However, one can give individuals and organizations enough control over their devices so that they can reduce the risk of leaking private information. To this end, we present a marker-based system, WAVEOFF, that is based on the principle of least privilege for protecting visual secrets. Users use simple privacy markers to specify which

1 http://questvisual.com

objects in the camera's view can be seen by mobile apps. In particular, WAVEOFF allows marking objects in a camera's view, such as a new prototype of a drone which a user would like to reveal during a video chat with her collaborators. Users can mark a rectangular region bounding a public object on the camera preview. WAVEOFF then builds a model of the object by extracting features from the marked region and uses this model to recognize and reveal only that object in subsequent frames.

In a 26-person user study, participants were able to successfully use WAVEOFF to mark public regions, and found our system easy to use and fast. WAVEOFF reveals enough of the public region to allow unimpeded functioning of apps like a QR code scanner. On a benchmark of representative videos, WAVEOFF blocks at least 99% of non-public regions while supporting at least 20 frames per second on a Nexus 5 smartphone.

The rest of the chapter is organized as follows: Section 4.2 motivates the problem of protecting visual secrets and our approach of using privacy markers. Section 4.3 outlines the principles that inform the design of WAVEOFF. Section 4.4 delineates our prototype implementation of WAVEOFF. Section 4.5 describes the results of a user study and a comprehensive evaluation of the privacy, utility, and performance of our system. Finally, we provide related work in Section 4.6 and conclude in Section 4.7.

4.2 Motivation

Figure 4.1 describes two scenarios that capture how camera-driven apps can put secret information at risk: a video chat with sensitive information visible in the background, and a password-entry app that should only view a small number of objects. A naive solution for protecting visual secrets is coarse-grained blocking, i.e., users can turn off their cameras when sensitive information is in view. However, we cannot apply this solution in the scenarios above, as the secret information and the application-essential information are co-mingled.

Bob is video chatting on his laptop with collaborators. He has a prototype model of a newly designed drone and a bottle of medication on his desk. Bob would like to show the prototype to his collaborators, but would be upset if his collaborators saw his medication. (a) Scenario one: Video chat

Cathy relies on an augmented-reality app on her mobile device to superimpose indicators over her keyboard to help her input a complex password. While she is comfortable allowing the password app to view her keyboard, she does not want the app to capture her computer screen or anything else on her desk. (b) Scenario two: Password entry

FIGURE 4.1: Hypothetical scenarios.

A fine-grained approach would be to apply the principle of least privilege to visual content and provide applications access to only the visual information required for their functionality [74, 76]. In such systems, the application must request access to objects within a camera's view. If the request is granted, the system can use computer vision to ensure that the application is only allowed to access the requested objects in the camera's view. Therefore, these systems must anticipate the classes of objects that applications will want to access, and provide appropriate recognizers to efficiently and accurately identify those objects in the camera frames. Such an approach works well when the objects of interest are known in advance and are limited in number. For example, a study of 87 Kinect applications found that most of the applications only need access to objects identified by four recognizers [74]. However, as illustrated by our use cases, it is difficult to anticipate all classes of objects that mobile apps may need to access. Thus, least-privilege systems that use predefined recognizers are likely to provide strong security, but support fewer applications. For example, in the “video chat” scenario, if the application can only access faces and hands, then it cannot capture more spontaneous moments, such as sharing a prototype drone.

A second fine-grained approach is to block secrets by defining sets of private objects and using computer vision to remove those objects from an application's view [50, 63, 107]. As with least privilege, these systems require designers to anticipate a large universe of objects, which vision algorithms must detect and block. For example, to protect a medicine bottle from a video-chat application, the system would have to construct a recognizer for a medicine bottle.

In light of these challenges, we propose privacy markers, which provide a promising new approach to mitigating camera-based leaks. At a high level, privacy markers consist of two parts: (1) a simple interface for marking objects in the physical environment, and (2) device software for efficiently recognizing marked objects. WAVEOFF is a privacy-marker system that takes a least-privilege approach. When a user puts her device into privacy-mode, apps can view only marked objects through the camera. This approach provides solutions to each scenario described in Figure 4.1.

In the “password” scenario, Cathy can use WAVEOFF on her mobile device to mark any keyboard she uses to enter a password. Thereafter, Cathy can safely allow the password-manager app to view the camera input and be assured that everything other than the marked keyboards will be blocked from its view. Similarly, in the “video chat” scenario, Bob can mark his face and his prototype drone model using WAVEOFF before starting the chat session so that only those objects are visible to his collaborators.

4.3 Approach Overview

In this section we provide an overview of WAVEOFF, including its design principles, attacker model, and limitations.

4.3.1 Design principles

WAVEOFF must be easy to use and efficiently block all non-public regions from a camera’s view at runtime. The following design principles guided our work.

FIGURE 4.2: An illustration of marking a coffee mug safe via WAVEOFF: (a) coffee mug; (b) marked as safe; (c) revealed by WAVEOFF.

Avoid special equipment

Since the user needs to mark public objects, we wanted to build an efficient marking technique that balances both ease of use and efficient detection. Before arriving at our current marking technique, we considered alternative approaches, such as affixing or projecting QR codes onto public objects. However, we rejected these approaches as too inconvenient for users in practice. Ideally, a user should be able to specify public objects without the need for any special equipment.

WAVEOFF is used for sharing objects that are difficult to physically mark, such as three-dimensional objects. To mark these objects, we take advantage of the recording device’s screen by overlaying a bounding box on a live camera feed, as seen in Figure 4.2b. The marking UI is part of an app that is trusted to handle unmodified camera data.

Simplify recognition

A potential drawback of simplifying life for users is that it can create complexity for the vision algorithms needed to identify objects. Hence, we carefully designed markers for WAVEOFF that play to computer vision's strengths. WAVEOFF recognizes objects by extracting features from the bounded area of a camera's view to build a model of the objects within, and stores the model in a database. We would like to note that WAVEOFF recognizes a specific object (e.g., Cathy's Mac keyboard) rather than a class of objects (e.g., any Mac keyboard). This simplifies recognition since the model can use all distinctive keypoints (including those that might not be present in other objects of the same type) without worrying about overfitting.

In order to build the model, WAVEOFF computes descriptors at selected keypoints in the marked region of the camera's view. The keypoints are the distinctive regions (e.g., corners and edges) of the object and the descriptors are the features at those keypoints (e.g., gradient and orientation). We use BRISK [85], a set of binary features, to compute keypoints and the corresponding descriptors. Computing and matching BRISK features is efficient and is ideal for real-time performance on low-power mobile devices. WAVEOFF uses feature matching to detect the presence of a marked object in a given frame. For each frame, it computes the BRISK features and matches them with the features of the models in the database. The feature matching process is independent for every object, allowing WAVEOFF to scale well with the number of objects by parallelizing the feature-matching task. WAVEOFF unblocks a region when at least 20% of its features match a model in the database.
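The recognition step can be sketched with the OpenCV Java bindings as follows (an assumption; the chapter does not specify the prototype's exact vision stack beyond its use of BRISK). The 20% unblocking rule comes from the text above, while the Hamming-distance threshold is illustrative.

```java
import org.opencv.core.DMatch;
import org.opencv.core.Mat;
import org.opencv.core.MatOfDMatch;
import org.opencv.core.MatOfKeyPoint;
import org.opencv.features2d.BRISK;
import org.opencv.features2d.DescriptorMatcher;

// Sketch of BRISK-based recognition, assuming the OpenCV 3.x Java bindings.
// A marked object's model is its BRISK descriptor matrix; a region in a new
// frame is considered present when at least 20% of the model's descriptors
// find a close (Hamming) match among the frame's descriptors.
public final class BriskRecognizer {
    private final BRISK brisk = BRISK.create();
    private final DescriptorMatcher matcher =
            DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE_HAMMING);
    private static final double MAX_HAMMING_DISTANCE = 60;  // illustrative threshold

    /** Builds a model (descriptor matrix) from the user-marked region. */
    public Mat buildModel(Mat markedRegionGray) {
        MatOfKeyPoint keypoints = new MatOfKeyPoint();
        Mat descriptors = new Mat();
        brisk.detectAndCompute(markedRegionGray, new Mat(), keypoints, descriptors);
        return descriptors;
    }

    /** Returns true if enough model features match the current frame. */
    public boolean isObjectPresent(Mat modelDescriptors, Mat frameGray) {
        MatOfKeyPoint frameKeypoints = new MatOfKeyPoint();
        Mat frameDescriptors = new Mat();
        brisk.detectAndCompute(frameGray, new Mat(), frameKeypoints, frameDescriptors);
        if (frameDescriptors.empty()) return false;

        MatOfDMatch matches = new MatOfDMatch();
        matcher.match(modelDescriptors, frameDescriptors, matches);

        int good = 0;
        for (DMatch m : matches.toArray()) {
            if (m.distance < MAX_HAMMING_DISTANCE) good++;
        }
        return good >= 0.2 * modelDescriptors.rows();   // the 20% rule from the text
    }
}
```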

Track to mitigate recognition failures

Recognizing privacy markers on every incoming camera frame is computationally intensive and can cause a low frame rate. Furthermore, recognizers can fail if the camera's viewpoint changes significantly. Building recognizers for multiple viewpoints is costly and inconvenient. We address both issues by tracking features, since tracking is robust to minor changes in viewpoint and faster than object recognition. In the computer vision literature, many approaches have been proposed to use object tracking in conjunction with object detection to improve overall accuracy [80, 92]. Inspired by this idea, WAVEOFF tracks object features across consecutive frames using the Lucas-Kanade optical-flow-in-pyramids technique [121]. While recognition is essential when an object appears in a stream for the first time, it is less helpful when the object is fully visible, because tracking alone is sufficient to identify the object's location in subsequent frames. Hence, we skip recognition altogether when tracking finds enough features (at least 75% of the features stored in the model). However, when there are not enough features, tracking alone can introduce localization errors that propagate across subsequent frames. Hence, we use both matching and tracking to find a larger set of features when tracking alone is insufficient. We estimate an object's location in the camera's view using Consensus-based Matching and Tracking (CMT) [93].
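A sketch of the tracking step using pyramidal Lucas-Kanade optical flow through the OpenCV Java bindings is shown below; the helper's shape is an assumption, and a caller would fall back to BRISK recognition when too few points survive, per the 75% rule above.

```java
import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Mat;
import org.opencv.core.MatOfByte;
import org.opencv.core.MatOfFloat;
import org.opencv.core.MatOfPoint2f;
import org.opencv.core.Point;
import org.opencv.video.Video;

// Sketch of feature tracking with pyramidal Lucas-Kanade optical flow, assuming
// the OpenCV Java bindings. Feature points found in the previous frame are
// propagated to the current frame; only points whose status flag is 1 were
// tracked successfully.
public final class FeatureTracker {

    /** Returns the successfully tracked points in the current frame. */
    public static List<Point> track(Mat prevGray, Mat currGray, MatOfPoint2f prevPoints) {
        MatOfPoint2f currPoints = new MatOfPoint2f();
        MatOfByte status = new MatOfByte();
        MatOfFloat err = new MatOfFloat();

        Video.calcOpticalFlowPyrLK(prevGray, currGray, prevPoints, currPoints, status, err);

        List<Point> tracked = new ArrayList<>();
        byte[] ok = status.toArray();
        Point[] pts = currPoints.toArray();
        for (int i = 0; i < ok.length; i++) {
            if (ok[i] == 1) tracked.add(pts[i]);
        }
        return tracked;   // if too few survive, fall back to BRISK recognition
    }
}
```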

Block under uncertainty

For privacy-sensitive applications, blocking the entire camera view might be acceptable when the system is uncertain about the presence of public regions. In fact, in our user study, we found that initially blocking entire frames for several seconds while scanning a QR code did not impact the utility of the app. Based on this observation, we take a conservative approach and only reveal a region when we are confident that it has been marked. However, blocking the entire frame under uncertainty may create a poor user experience when an object is blocked frequently after it appears. This can happen when a frame becomes blurry due to motion. To address this issue, during times of uncertainty, the system displays the previous frame (in which the object was visible) instead of showing nothing. This approach does not harm security because it does not reveal any new information, but it can help usability. Note that if the view changes significantly (i.e., the object disappears), we quickly detect the change and stop showing the old frame.

4.3.2 Trust and attacker model

Privacy-marker systems assume that recording devices run a combination of trusted and untrusted software. The computer-vision software needed to recognize and track marked regions must be part of the trusted computing base. This software most likely resides in the camera subsystem of the computing platform, either integrated with the camera driver or with a trusted camera service, as in Android.

We are primarily concerned with inadvertent leaks by untrusted, third-party software, such as video-chat and life-logging apps. We assume that untrusted software can only access camera data through well-known APIs defined by the device platform, and that privacy-marker software is properly isolated from third-party software. Our trust model is based on settings such as an enterprise, in which a company may purchase video-chat stations and employee equipment like laptops and smartphones. We cannot prevent a determined attacker from capturing secrets using a malicious recording device, such as an analog video recorder or a digital recording device that ignores users' markers.

4.3.3 Limitations

Though WAVEOFF is robust in many settings, its reliance on standard computer-vision algorithms limits its security guarantees and can impinge on apps' utility.

First, WAVEOFF requires users to mark public regions with a rectangle. As a result, it is possible for sensitive information to inadvertently appear within a marked area and cause

a leak. For example, if a user marks an object with WAVEOFF as public, but a secret object is placed in front of it without completely occluding the public object, then an app may view the secret object. We believe that experience using privacy markers would reduce the likelihood of such incidents, but they could never be eliminated.

Second, WAVEOFF can confuse similar-looking objects and accidentally reveal a private object if a similar-looking object is marked as public. For example, in our ‘password entry’ scenario, after the user marks her Mac keyboard, WAVEOFF may reveal other Mac keyboards as well. Since all these Mac keyboards are almost identical, it is impossible for vision algorithms to distinguish them. We believe that all objects identical to the one marked public by a user are probably not sensitive objects.

Third, a camera could zoom so far onto an object marked by the user that WAVEOFF may not reveal the object due to insufficient detail (e.g., too few features). While this would not be insecure, it would hurt the utility of an app that needs access to the blocked region. This could be particularly problematic for large, marked objects with very small details inside; the only way to take a picture would be from a far enough distance that the object was fully visible, but at this distance little of the content may be legible.

Finally, while WAVEOFF allows users to mark arbitrary objects, it only allows objects to be treated as public or private. Thus, when a device is in privacy-mode, an app can access all public regions. It is easy to imagine scenarios in which a user may want to restrict an app to a narrow subset of private objects, such as our hypothetical password-entry app. To address this problem, one could provide a UI that allows users to specify which marked objects should be revealed to which apps.

4.4 Implementation

We implemented WAVEOFF by modifying the Android Open Source Project (AOSP) version of Android 5.1. Our prototype currently runs on a Nexus 5 smartphone. Before we describe the implementation of WAVEOFF, we first provide some background on Android's camera subsystem. We then present the design of an initial implementation that recorded video at an unacceptable four frames per second (FPS). We then highlight the causes of this poor performance and describe techniques that allow us to record video at a median frame rate of 20 FPS.

4.4.1 Android’s camera subsystem

In Android 5, Google made significant changes to the camera subsystem to give apps more control over the camera feed. Figure 4.3 shows how the subsystem is split between hardware-dependent and hardware-independent software layers. The hardware-dependent layer consists of the camera device-driver that interacts directly with the physical camera, and a hardware abstraction layer (HAL) that implements a common interface for the cam- era service. Because the camera service is hardware independent, it is the ideal place to

FIGURE 4.3: Android's camera subsystem. The application framework (SurfaceTexture, ImageReader, MediaRecorder) submits preview, image-capture, and video capture requests; BufferQueues connect it to the hardware-independent camera service, where WAVEOFF and the per-mode streams reside, which in turn talks to the hardware-dependent HAL and camera device driver.

An app can submit a capture request to the camera service in one or more of the following modes: (a) preview, (b) image capture, and (c) video recording. In addition to a mode, each capture request describes a set of image attributes, such as resolution and pixel format. For instance, an app may request access to the camera in preview mode at a resolution of 1280 × 960, or in image-capture mode, with the image stored in a file at a resolution of 640 × 480. The camera service creates one stream (an internal buffer) for each capture mode. The camera service and HAL can access all streams, and each stream is protected by its own lock. Multiple streams can be active at a time, e.g., a preview stream and a video-recording stream, and the active streams are configured according to their corresponding capture-request attributes. The application framework also creates a BufferQueue for each capture mode so that the camera service can send image data to an app.

When the camera driver delivers a frame, the HAL locks all active streams, copies image data to those streams according to their configuration parameters, and then releases the locks. To forward frames to an app, the camera service acquires all active streams' locks, copies image data from the streams to their corresponding BufferQueues, and releases all locks.
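For illustration only, the following Python sketch mimics the locking discipline described above. The class and function names are invented for this sketch and do not correspond to AOSP identifiers; the real logic lives in the C++ camera service and HAL.

    import threading

    class Stream:
        """An internal per-capture-mode buffer, protected by its own lock."""
        def __init__(self, name, resolution):
            self.name = name
            self.resolution = resolution   # e.g. (1280, 960)
            self.lock = threading.Lock()
            self.frame = None              # last frame copied in by the HAL

    class BufferQueue:
        """Hand-off buffer between the camera service and an app."""
        def __init__(self):
            self.frames = []
        def enqueue(self, frame):
            self.frames.append(frame)

    def configure_for(frame, resolution):
        # Placeholder for per-stream conversion (resolution, pixel format).
        return (resolution, frame)

    def hal_deliver_frame(raw_frame, active_streams):
        # The HAL locks all active streams, copies the frame into each one
        # according to its configuration, then releases the locks.
        for s in active_streams:
            s.lock.acquire()
        try:
            for s in active_streams:
                s.frame = configure_for(raw_frame, s.resolution)
        finally:
            for s in active_streams:
                s.lock.release()

    def camera_service_forward(active_streams, buffer_queues):
        # The camera service acquires every stream lock, copies each stream's
        # frame into the app-facing BufferQueue for that mode, then unlocks.
        for s in active_streams:
            s.lock.acquire()
        try:
            for s in active_streams:
                buffer_queues[s.name].enqueue(s.frame)
        finally:
            for s in active_streams:
                s.lock.release()

    # Example: one preview stream and one video stream sharing a delivered frame.
    streams = [Stream("preview", (1280, 960)), Stream("video", (1280, 960))]
    queues = {s.name: BufferQueue() for s in streams}
    hal_deliver_frame("raw-bytes", streams)
    camera_service_forward(streams, queues)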

4.4.2 WAVEOFF in Android

In this section, we present our Android implementation of WAVEOFF. We start with a naive design that highlights the challenges of achieving real-time performance, and then describe how we overcame these challenges.

A simple implementation

Our initial implementation integrated WAVEOFF into Android as follows: (1) the camera service locked each active stream and sequentially passed each stream's image data to WAVEOFF; (2) WAVEOFF detected any public regions and masked the rest of the image; (3) the camera service copied each modified image to its corresponding BufferQueue and released all locks.

This straightforward implementation resulted in unacceptable video-recording performance. In particular, on our Nexus 5 we could only record video at a median rate of 4 FPS. This was due to two factors. First, the camera service held all active-stream locks while waiting for WAVEOFF to identify public regions within a frame. This blocked the HAL for long stretches and caused it to drop frames. Second, streams with different resolutions and pixel formats still contain mostly the same visual information. Our initial implementation ignored similarities across streams, e.g., between a preview and a video-recording stream, and needlessly analyzed each frame independently. Our current implementation addresses both of these problems.

Table 4.1: WAVEOFF API for interacting with the camera service.

    Method              Description
    processFrame        Sends camera frame data for processing.
    isFrameAvailable    Checks if a frame is available for delivery to the application.
    getLastRects        Returns a list of regions marked public in the last frame.
    addObjectModel      Builds a model of a marked public object and adds it to the model database.
    removeObjectModel   Removes the model of a marked public object from the model database.

Improved implementation

WAVEOFF is implemented as a separate module loaded by the camera service when the camera is turned on. We rely on OpenCV for all computer-vision algorithms. WAVEOFF exposes a small API to the camera service, described in Table 4.1.

WAVEOFF maintains one queue for incoming frames, InQueue, and a second queue for processed frames, OutQueue. The camera-service thread blocks until it receives a frame from HAL, and then passes the frame to WAVEOFF via a processFrame call. We currently set InQueue’s maximum depth to 5, and calls to processFrame will add a new frame to InQueue as long as there is enough room. If there are already 5 frames in the queue, processFrame drops the new frame. All calls to processFrame return as soon as the passed-in frame has been copied onto InQueue or dropped.

To process frames in InQueue, WAVEOFF has a dispatch thread that blocks waiting for frames to be added to InQueue. After dequeueing a frame from InQueue, the dispatch thread hands off the frame for further processing. Processed frames are eventually forwarded back to the camera service by placing them in OutQueue. After calling processFrame, the camera-service thread checks for fully processed frames by calling isFrameAvailable. If a frame is available in OutQueue, isFrameAvailable dequeues the frame and returns it to the camera service. The camera service can then copy the processed frame to the appropriate BufferQueues. Allowing the camera service to accept new frames from the HAL before older ones have been processed prevents the HAL from dropping frames. However, if WAVEOFF cannot analyze frames fast enough, InQueue will fill up and cause the camera service to drop frames.
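As an illustration of this queueing discipline (and not the actual C++ module), the Python sketch below models processFrame and isFrameAvailable with a bounded InQueue that drops frames when full and a dispatch thread that fills OutQueue.

    import queue
    import threading

    IN_QUEUE_DEPTH = 5

    in_queue = queue.Queue(maxsize=IN_QUEUE_DEPTH)   # frames waiting for analysis
    out_queue = queue.Queue()                        # fully processed frames

    def process_frame(frame):
        """Called by the camera-service thread for every frame from the HAL.

        Returns immediately: the frame is either copied onto InQueue or dropped
        when the queue already holds IN_QUEUE_DEPTH frames.
        """
        try:
            in_queue.put_nowait(frame)
        except queue.Full:
            pass  # drop the frame rather than stall the HAL

    def is_frame_available():
        """Called by the camera-service thread after process_frame.

        Returns a processed frame if one is waiting in OutQueue, else None.
        """
        try:
            return out_queue.get_nowait()
        except queue.Empty:
            return None

    def dispatch_loop(analyze):
        """Dispatch thread: block on InQueue, analyze, and emit to OutQueue."""
        while True:
            frame = in_queue.get()            # blocks until a frame arrives
            out_queue.put(analyze(frame))     # mask non-public regions, etc.

    # Example wiring: the dispatch thread runs for the lifetime of a camera session.
    threading.Thread(target=dispatch_loop,
                     args=(lambda f: f,),     # identity stand-in for the analysis
                     daemon=True).start()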

WAVEOFF’s performance depends on the number of object models in its database, and

for each model WAVEOFF spawns one thread. Each thread is responsible for matching and

tracking a specific public object. However, before tasking the object threads, WAVEOFF’s dispatch thread dequeues a frame from InQueue and makes a single pass over the data to compute all image features. It then passes these features to the object threads before they begin their work. The last object thread to complete its analysis aggregates the results of the others to render the final frame.
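The fan-out described above can be pictured with the following Python sketch; the thread pool, the placeholder feature extractor, and the matching routine are stand-ins for the OpenCV-based implementation and are not the code used in WAVEOFF.

    from concurrent.futures import ThreadPoolExecutor

    def compute_features(frame):
        # Single pass over the frame; in WAVEOFF this would be OpenCV keypoint
        # detection and descriptor extraction, done once and shared by all models.
        return {"frame": frame, "keypoints": [], "descriptors": []}

    def match_model(model, features):
        # One worker per stored object model: match and track that object,
        # returning the rectangle of the public region (or None if not found).
        return model.get("last_rect")          # placeholder matching logic

    def render_public_regions(frame, rects):
        # Aggregation step: keep only the matched public regions, mask the rest.
        return {"frame": frame, "public_rects": [r for r in rects if r]}

    def process_with_models(frame, object_models, pool):
        features = compute_features(frame)     # computed once per frame
        rects = list(pool.map(lambda m: match_model(m, features), object_models))
        return render_public_regions(frame, rects)

    # One worker per marked object model, mirroring WAVEOFF's threading scheme.
    models = [{"name": "mug", "last_rect": (100, 120, 300, 340)}]
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        result = process_with_models("frame-bytes", models, pool)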

Video recording

When multiple streams are active, e.g., the preview and video-recording streams, our initial implementation sequentially and independently processed a frame from each source. This effectively doubled the work that WAVEOFF had to perform even though the visual content of each stream was nearly identical. To eliminate redundant analysis, WAVEOFF only processes frames from the preview stream. After processing a preview frame, it adds the modified frame to OutQueue and saves the coordinates of all public regions so that they can be applied to other frames. In particular, for non-preview streams, the camera service uses the getLastRects method to retrieve the coordinates of any public regions that were identified in the preview stream. It can then mask the non-preview frame after adjusting for the stream's settings, such as its resolution.
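As a rough illustration (using Python and NumPy rather than the camera-service code), the sketch below rescales preview-resolution rectangles, such as those returned by getLastRects, to another stream's resolution and masks everything outside them.

    import numpy as np

    def scale_rect(rect, src_res, dst_res):
        """Map an (x, y, w, h) rectangle from the preview resolution to another stream."""
        sx = dst_res[0] / src_res[0]
        sy = dst_res[1] / src_res[1]
        x, y, w, h = rect
        return (int(x * sx), int(y * sy), int(w * sx), int(h * sy))

    def mask_non_public(frame, public_rects):
        """Zero out every pixel outside the public rectangles."""
        masked = np.zeros_like(frame)
        for x, y, w, h in public_rects:
            masked[y:y + h, x:x + w] = frame[y:y + h, x:x + w]
        return masked

    # Example: rects detected on the 1280x960 preview applied to a 640x480 stream.
    preview_res, video_res = (1280, 960), (640, 480)
    last_rects = [(400, 300, 200, 150)]            # from the preview analysis
    video_frame = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
    rects = [scale_rect(r, preview_res, video_res) for r in last_rects]
    safe_frame = mask_non_public(video_frame, rects)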

In addition to reusing processing results across multiple streams, WAVEOFF applies several other optimizations. First, analyzing frames at the Nexus 5's native resolution of 1280 × 960 takes several seconds. Therefore, WAVEOFF resizes each frame to 320 × 240 before performing any compute-intensive operations. Additionally, WAVEOFF operates on grayscale frames to reduce processing time. The camera-preview stream delivers frame data in the full-color YUV format, and OpenCV converts a YUV frame with resolution 1280 × 960 to grayscale in about 30 ms. However, for a 1280 × 960 image, the YUV format uses the first 1280 × 960 bytes to store the image's luminance data, which encodes a grayscale version of the frame. We found that simply copying the luminance data of a YUV frame into a separate buffer creates a grayscale version of the frame in under 1 ms.
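Because the Y plane of a YUV frame is itself a grayscale image, the cheap conversion is essentially a single copy of the first width × height bytes. The NumPy sketch below illustrates the idea; it assumes a planar or semi-planar YUV420-style layout in which the Y plane comes first, which matches common Android preview formats.

    import numpy as np

    def yuv_to_grayscale(yuv_bytes, width, height):
        """Return a grayscale view of a YUV frame by copying only its Y plane.

        Assumes the Y plane occupies the first width*height bytes of the buffer,
        which holds for NV21/YUV420-style layouts used by Android camera previews.
        """
        y_plane = np.frombuffer(yuv_bytes, dtype=np.uint8, count=width * height)
        return y_plane.reshape(height, width).copy()   # one memcpy-sized operation

    # Example with a synthetic 1280x960 frame (Y plane plus subsampled chroma).
    w, h = 1280, 960
    fake_yuv = bytes(w * h) + bytes(w * h // 2)
    gray = yuv_to_grayscale(fake_yuv, w, h)
    assert gray.shape == (h, w)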

Finally, WAVEOFF avoids allocating and de-allocating memory whenever possible. The camera service pre-allocates all circular buffers, queues, and other data structures when it loads our modules. We only allocate new memory when a stream’s configuration, such as its resolution, changes, and we only de-allocate memory when the camera is turned off.

Image capture

Capturing an image with an active preview stream is slightly different from video recording. When a user takes a picture, the HAL returns a JPEG-compressed byte stream rather than a YUV frame. The coordinates returned by getLastRects correspond to raw pixels, and thus the JPEG image must first be decoded before it can be masked. After masking any non-public regions, the camera service converts the result back to a JPEG and forwards it to the appropriate BufferQueue. AOSP provides jpeglib (version 6b) for encoding and decoding JPEG data. However, this version of the library only decodes and encodes JPEG data through the file system. We avoided costly system calls by adding two support functions to perform in-memory encoding and decoding. Finally, to read and write the image-capture JPEG stream in the camera service, we modified the HAL implementation provided by LG, which manufactured the Nexus 5. Each stream has a usage flag that determines how the stream will be accessed by different components in the camera subsystem. We added two flags to the JPEG stream: GRALLOC_USAGE_SW_READ_OFTEN and GRALLOC_USAGE_SW_WRITE_OFTEN. These flags direct all JPEG data to the camera service for reading and writing. These small changes were the only modifications we made to a hardware-dependent component.
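The decode, mask, re-encode step can be illustrated as follows. This sketch uses OpenCV's Python bindings for in-memory JPEG decoding and encoding purely for exposition; the prototype itself uses the in-memory helpers added to AOSP's jpeglib.

    import numpy as np
    import cv2

    def mask_jpeg(jpeg_bytes, public_rects):
        """Decode a JPEG in memory, black out non-public pixels, re-encode to JPEG.

        public_rects are (x, y, w, h) rectangles in raw-pixel coordinates, e.g.,
        the regions reported for the preview stream.
        """
        img = cv2.imdecode(np.frombuffer(jpeg_bytes, dtype=np.uint8), cv2.IMREAD_COLOR)
        masked = np.zeros_like(img)
        for x, y, w, h in public_rects:
            masked[y:y + h, x:x + w] = img[y:y + h, x:x + w]
        ok, out = cv2.imencode(".jpg", masked)
        if not ok:
            raise RuntimeError("JPEG re-encoding failed")
        return out.tobytes()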

4.5 Evaluation

In this section, we discuss the results of an extensive evaluation of WAVEOFF. First, we show the usability of our system based on a controlled user study performed with 26 participants.2 Then, we describe the accuracy and run-time of our system on a set of benchmark videos covering real-world scenarios. Finally, we discuss the energy and performance overhead of our prototype implementation on the Nexus 5.

4.5.1 User study

Here, we describe the user study design followed by the analysis of the results.

Study design

The aim of the user study is to understand whether the experience of using a camera-enabled app is significantly different (in terms of ease of use and speed) on WAVEOFF-enabled smartphones versus stock (unmodified) Android smartphones. We randomized participants into two groups: 1) a Control group, which performed the baseline evaluation on a Nexus 5 running stock AOSP, and 2) a Case group, which performed the evaluation on a Nexus 5 running AOSP instrumented with WAVEOFF.

We use a QR code scanner app to evaluate WAVEOFF for the following reasons: 1) QR code scanning is a popular use case that requires real-time access to the camera, 2) it requires the input image (video feed) to be of high quality for accurate and fast scanning, and 3) it allows us to accurately measure both quantitative (time to scan, accuracy of scanning) and qualitative (user satisfaction) results. To scan a QR code, we use the popular app Barcode Scanner,3 which is available for free on the Google Play Store and has more than 100 million downloads.

2 This study was approved by the university Institutional Review Board (IRB). 3 https://play.google.com/store/apps/details?id=com.google.zxing.client.android&hl=en

First, we ask participants to complete a short pre-study questionnaire about their familiarity with smartphone apps and, in particular, camera-enabled apps. Then, we explain to them the purpose of the study, describe the task they need to perform, and clarify any doubts they have about the study. Each user session involves scanning a QR code affixed to a coffee mug. This is followed by a post-study questionnaire about their experience of the task. In case a participant cannot scan the QR code within one minute, we ask them to repeat the experiment at most 3 times. Based on their group, the scanning task is slightly different, as explained below.

Control group: The participants in the Control group evaluate the baseline for comparison. After explaining the purpose of the study, we show them how to scan a QR code using Barcode Scanner. Specifically, we tell them to start the app and point the camera at the QR code such that the QR code stays in the center of the viewfinder. We also instruct them to adjust the position of the camera (i.e., near or far from the QR code) for a successful scan. After that, they scan a QR code on a coffee mug.

Case group: The scanning task is the same as explained above, with the exception that it involves an additional marking phase before scanning the QR code. Since participants in the Case group are evaluating WAVEOFF, we ask them to mark a public region containing the QR code before scanning, so as to reveal only the marked region to the scanner app. We explain the process of marking a public object. First, we show them how to mark a dummy object (in our case a different cup) using the marking app that we built. We emphasize that the marked object should be properly visible and fully enclosed by the digital marker (in our case a resizable rectangle). Once they understand the process of marking, we ask them to mark the coffee mug using the marking app and scan the QR code affixed on it using Barcode Scanner.

Measurements: Through this user study, we wanted to answer the following questions about WAVEOFF with respect to the baseline:
1. What is the overhead of the marking process?
2. What is the accuracy of scanning the QR code?
3. How easy was it for a user to scan the QR code?
4. How fast was it for a user to scan the QR code?

Table 4.2: Salient features of user study participants.

    Volunteers: Men / Women / Total                                             19 / 7 / 26
    Age: min / max / median                                                     23 / 53 / 27
    Background: CS / Non-CS                                                     20 / 6
    Participants using the camera at least 10 times per week                    7
    Participants using a video chat service at least 2 times per week           8
    Participants using at least 3 different camera-enabled apps per month       14
    Participants who scanned documents using the camera: checks / receipts      16 / 7

To answer the first two questions, we created an Android app that users use to scan the QR code. Through this app, we measure the time it takes for a user to scan the QR code as well as the correctness of the scan. The app has a scan button that launches Barcode Scanner through an Android Intent. Barcode Scanner automatically closes upon a successful scan and returns the scanned content through an Intent. The result of the scan is displayed on the screen for the user's reference. Using this app, we monitor the time taken to scan a QR code as well as whether the scan was successful by checking the scanned content. We also measure the time taken to mark a public region. The marking app logs the time taken to mark an object, which includes the time to capture the image, mark the object, and load the model of the object. For the qualitative evaluation (Questions 3 and 4), we designed a post-study questionnaire consisting of two questions and an optional feedback section. Both questions are presented as statements of the form 'Completing the study tasks was fast' and 'Completing the study tasks was easy'. Users were asked to report their level of agreement with each statement on a scale from 1 (strongly disagree) to 7 (strongly agree).

FIGURE 4.4: Usability results of WAVEOFF: (a) scanning time and (b) user ratings, for AOSP versus WaveOff.

Recruiting participants

We sent out recruitment ads through university mailing lists and social media such as Facebook. We asked interested participants to fill out a demographic questionnaire on our study website. Based on the responses, we selected 26 participants for our study. We randomly divided them into two equal groups for Case and Control. As an incentive, we gave a $10 Amazon gift card to each participant on completion of the study. Before the start of the study, each participant was asked to fill out a pre-study questionnaire. We summarize the demographic information of the participants as well as the answers to the pre-study questionnaire in Table 4.2. The summary shows that we have a diverse group of participants with varying frequency and purpose of camera usage.

Results

Each participant successfully finished the user study within 10 minutes. In all cases, they were able to correctly scan the content, which shows that the frames altered by WAVEOFF do not affect the functionality of the mobile app. Next, we discuss our findings about the time to scan the QR code and ease of performing the task as reported by participants.

Scan time: Figure 4.4a shows the scanning time for WAVEOFF compared against the baseline. The participants took an additional 2.35 seconds to scan the QR code (on the coffee mug) compared to the baseline. This increase in time is mainly due to the fact that when users run the Barcode Scanner app, they see a completely blocked camera feed to start with, and they have to guess the correct location of the public object before the system starts recognizing and revealing the public object in the camera view. However, this delay did not affect the perceived performance, as reflected by the user ratings for the speed of task completion. Figure 4.4b shows the average (mean) user ratings on a scale of 1 to 7, where 7 means the user completely agrees that scanning the QR code was fast. The average user ratings for scanning the QR code on a coffee mug were 5 and 6.69 (with standard deviations of 2.5 and 0.75) for the baseline and WAVEOFF, respectively. This is a surprising result considering the fact that WAVEOFF actually took slightly more time to scan the QR code than the baseline system. This counterintuitive result can be explained as follows: upon starting the camera, the screen is completely black until WAVEOFF recognizes the coffee mug. Not only does this lower the users' expectations of success in scanning, it also creates an illusion of delay in camera startup (and thus attributes the delay in scanning to a delay in camera startup). Furthermore, when WAVEOFF reveals the coffee mug, it is fully visible in the view without any artifacts (such as motion blur). The fact that WAVEOFF is able to recognize the coffee mug means that the frame is of a quality such that the Barcode Scanner app can also recognize the QR code in the same frame. In other words, from a user's point of view the scan completes immediately after the coffee mug appears on the screen, explaining the high ratings. The high variation in baseline ratings for the coffee mug scanning task is due to the fact that 2 participants gave low ratings as they had to retry scanning. We found that, in general, it is hard to scan a QR code on a curved surface if the camera is not positioned properly. Note that this does not affect the WAVEOFF experiment because participants marked an object before scanning and, hence, were more careful in positioning the camera.

Table 4.3: Scenarios to evaluate WAVEOFF.

    Scenario    Description
    S1          User is taking a picture.
    S2          User is recording a video with camera rotation.
    S3          User is recording a video with the public region going in and out of view.

Ease of use: Figure 4.4b shows the average ratings (on a scale of 1 to 7) given by participants about how easy they felt the task was. On average, participants gave ratings of 5.07 and 6.38 (with standard deviations of 2.59 and 1.12) for the baseline and WAVEOFF, respectively. These results show that WAVEOFF does not degrade the usability of the app. In order to understand the ease of the marking process, we monitor the time to mark a public region. The task of marking a public region was a new experience for the participants. Almost all the participants felt that the marking scheme for WAVEOFF is simple. The median time taken by the participants to mark a coffee mug is 11.2 seconds, which is acceptable considering it is a one-time process.

4.5.2 Evaluating real-world scenarios

In this section, we further evaluate WAVEOFF on scenarios that are not tied to any specific app. We recorded several 10-second videos using the default camera app on the Nexus 5 running at a resolution of 1280 × 960. These videos represent several different real-world scenarios under different use cases, which we discuss in the following subsection.

Dataset description

We evaluate WAVEOFF on the following three use cases: (a) a public object in front of a plain background, (b) a public object with other private objects around it, and (c) a public object with a cluttered background. For every use case, we analyze our system's performance under three different scenarios as shown in Table 4.3. Scenario S1 simulates a user taking a picture with a camera-equipped device. In this scenario, the camera is focused on the public object as the user's intent is to take a picture. S2 represents the scenario where a user is taking a video of a public region while the camera is being rotated. S3 represents a scenario where the user is taking a video of the public content along with its surroundings and, hence, the camera moves around. In this scenario, the object is in the view for the first 3 seconds, then the camera pans to the right (and the public content goes out of the view), and then comes back to its original position for the last 3 seconds.

FIGURE 4.5: WAVEOFF results on the (a) plain background, (b) plain background with a private object, and (c) cluttered background with private objects use cases.

Accuracy

We measure the accuracy of our system in terms of standard precision and recall measurements. To collect the statistics, we first turned off the blocking feature and recorded videos under scenarios S1, S2 and S3. The raw video footage helped us to collect the ground truth by explicitly recording the coordinates of public objects in the view. At the same time, we allowed WAVEOFF to log the coordinates of the public regions it detected during the recordings.

To calculate precision and recall, we divide each frame into cells of size 5 × 5. Since our videos are taken at 1280 × 960 resolution, the total number of cells in each frame is 49152, which is a sufficient granularity to measure the accuracy. We mark each cell as public or private based on our ground truth. Later, we analyze the number of cells correctly marked by our system as public or private. A cell is said to be revealed by the system if any part of it is revealed; otherwise we consider it as blocked. Formally, let $pb_t$ be the set of true public cells and $pb_s$ be the set of cells revealed by our system. The precision and recall of revealing public content are calculated as follows:

$$\text{precision} = \frac{|pb_t \cap pb_s|}{|pb_s|} \qquad \text{recall} = \frac{|pb_t \cap pb_s|}{|pb_t|}$$
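As a concrete illustration of this cell-based measurement, the snippet below computes precision and recall from two boolean grids of 5 × 5 cells (192 × 256 cells for a 1280 × 960 frame), one derived from the ground truth and one from the logged WAVEOFF output; constructing the grids from the logged rectangles is omitted.

    import numpy as np

    def cell_precision_recall(truth_public, revealed_public):
        """truth_public, revealed_public: boolean arrays of shape (192, 256),
        one entry per 5x5 cell of a 1280x960 frame (True = public / revealed)."""
        both = np.logical_and(truth_public, revealed_public).sum()
        precision = both / max(revealed_public.sum(), 1)
        recall = both / max(truth_public.sum(), 1)
        return precision, recall

    # Example with random grids (192 x 256 = 49152 cells).
    rng = np.random.default_rng(0)
    truth = rng.random((192, 256)) > 0.8
    revealed = rng.random((192, 256)) > 0.8
    p, r = cell_precision_recall(truth, revealed)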

For each scenario under each use case, we repeated our experiment 3 times and report the median precision and recall. Figure 4.5 shows the accuracy achieved by WAVEOFF in revealing public objects. WAVEOFF achieves an acceptable accuracy, indicating that privacy can be achieved without compromising utility. The worst recall for the public object is 0.78, which happens in the rotation scenario (S2) because the camera is constantly rotating and the tracking algorithm loses some of the boundary cells around the public region, marking them private. WAVEOFF does not rely on a fixed shape or size to recognize a public object. Instead, it relies on the object features, which results in estimation error at the boundary regions. We observed that WAVEOFF tends to reveal boundary cells around the object, resulting in lower precision for the public object, specifically in the case of S1 and S2, where the object stays in the view for the entire duration of the video.

FPS

One of the primary objectives of WAVEOFF is to support camera-enabled apps that require real-time camera feeds, e.g., video chat apps. Therefore, it is extremely important to deliver camera frames at a rate close to 30 FPS. Figure 4.6a shows the FPS achieved by WAVEOFF on the Nexus 5. The median FPS over all the benchmark videos is 22. This shows that our system can be used in real-time applications. Overall, we observed that our system achieves higher FPS for S3 compared to S1 and S2 because, in S3, the public region remains out of the view for a significant number of frames. Hence, WAVEOFF does not need to track any object and quickly delivers the frame. On the other hand, in S2, due to camera rotation, the system needs to continuously track the change in position of the object, resulting in lower FPS.

FIGURE 4.6: (a) Frames per second achieved by WAVEOFF across all the use cases. (b) Accuracy (left) and runtime (right) on the multi-object video.

Scalability

WAVEOFF requires users to mark public objects, which are in turn revealed to apps. Hence, it is important to ask: what is the impact on performance when multiple public objects are marked by users? To this end, we measure the scalability of WAVEOFF on a representative video of 3 different objects. We record a video using the camera app of the Nexus 5 with all three objects in the camera view. We create models of all 3 objects by marking them one by one. Initially, all 3 objects are private. Then, we make them public by pre-loading the corresponding models one by one. This allows us to evaluate WAVEOFF on the recorded video while increasing the number of public objects.

Figure 4.6b (left) shows the accuracy of revealing the public region in terms of precision and recall. The accuracy degrades gracefully with an increasing number of objects. In particular, with 3 public objects marked, WAVEOFF achieves 81% precision and 90% recall. The trend in FPS is similar, as shown in Figure 4.6b (right). Increasing the number of marked objects reduces the FPS since each stored model has to be evaluated on every frame for detecting public objects. With 3 public objects marked, WAVEOFF achieves 16 FPS on the Nexus 5. These results show that WAVEOFF is able to simultaneously process up to 3 objects per frame, which is sufficient for many real-world applications such as QR code scanning. Having more than a few public objects in a frame means that the majority of the frame is public and one could reveal the entire frame without processing it.

Impact of motion on recognition

We observed that our system fails to detect public regions whenever the frame is blurred due to either object or camera motion. Due to motion blur, WAVEOFF cannot find enough features in the frame to match against the stored model features. Furthermore, some of the important features are lost during tracking due to significant changes in appearance. When the public region is not detected, the user sees a completely blocked frame. Once the camera or the view becomes stable, our system starts recognizing and tracking the public region. To measure the impact of motion blur on recognition, we measured the recognition delay under camera motion (S3). The recognition delay is measured as the time difference between the frame where the public object first fully appears in the camera view and the frame where WAVEOFF recognizes the object. We found that the median time to recover from motion blur is 134 ms. Due to the principle of blocking under uncertainty, WAVEOFF does not reveal the object unless it is sufficiently confident, resulting in longer delays in some cases.

4.5.3 Performance impact on mobile device

In this section, we investigate the impact of our modifications on the Android system's overall performance. We performed a simple experiment where we turned on the camera, let it run for one minute in preview mode, and measured CPU load (in percent), memory consumption (in MB), and power consumption (in mW). We took the measurements every 100 ms using Trepn [73], a popular app developed by Qualcomm Technologies Inc. to monitor system performance. We turned off the auto-brightness feature and fixed the brightness level to 10. We also set the smartphone to Airplane mode to avoid variation in measurements because of network-related operations. We repeated this experiment 5 times with a complete system reboot between any two repetitions. We subjected stock AOSP and AOSP with WAVEOFF to this setup. In the case of WAVEOFF, the camera was focused on a public object whose model was loaded in memory.

FIGURE 4.7: Performance impact on (a) power consumption, (b) memory consumption, and (c) CPU load, over 60 seconds of usage, for AOSP versus WaveOff.

Figure 4.7 shows the impact of performing a preview operation for a minute on memory, power, and CPU load. By running WAVEOFF, the system incurs an additional 140 MB of memory because it has to load the object model in memory to perform matching. WAVEOFF increases the CPU load from a median of 11% to 31%. This high CPU load is due to the expensive detection and tracking algorithms applied to every frame during a camera session. The power consumption, expectedly, shows a similar trend, where WAVEOFF introduces a median increase of 1150 mW.

4.6 Related Work

Many solutions have been proposed that follow the least-privilege approach [50, 74, 76, 107] to protect visual secrets by intercepting video frames. Recognizer [74] is one of the pioneering works in this category and provides fine-grained access control in augmented reality systems. The idea behind recognizers is least-privilege access control for third-party applications; in other words, an app should access only what is strictly required for its functionality and nothing more from the sensor input. The authors proposed Kinect-based recognizers for various abstractions like skeleton, person texture, voice command, hand position, etc., and showed that around 90% of Kinect apps use only four recognizers. They also showed performance improvements by sharing recognizers among concurrently running applications. A similar concept is used in the SurroundWeb [114] system, which protects the privacy of an indoor area by exposing a room-skeleton abstraction to 3D web browsers for rendering web content. Enforcing least privilege is difficult when application requirements are ill-specified.

As a solution, WAVEOFF provides a special marking scheme that allows dynamic content specification. Other systems have previously used markers to detect and block private content in the camera feed. Chaudhari et al. [50] used real-time audio distortion and visual blocking to protect the privacy of subjects in a video. The proposed method blocks all faces and is not customized for individuals. A similar problem was solved by Schiff et al. [107] using color-based visual markers to detect and block faces. Their method is limited by the assumption that the face should have some distinctive color-based marker (e.g., a hat whose color differs from the background). Moreover, these systems do not protect a wide range of private objects due to their constrained marker designs (e.g., hats [107] or faces [50]). Enev et al. [63] proposed a novel approach of transforming raw sensor data such that the transformed output minimizes the exposure of user-defined private attributes while maximally exposing application-defined public attributes.

Another approach is to secure the APIs through which third-party applications access camera data. Scanner Darkly [76] operates behind the OpenCV library and protects secrets by applying image transformations before frames are passed to applications. Applications cannot directly use raw image data and instead have to access images through a Darkly interface. Darkly preserves user privacy by providing transformed images (e.g., blurred) or restricted access to raw images (e.g., only computations and no system/network calls). This is similar to our approach, but privacy markers can be applied to arbitrary objects; WAVEOFF does not require system designers to anticipate all sensitive objects in advance.

A different line of work enforces access control policies for camera sensors. Chakraborty et al. [49] modified the Android system to limit access to sensors based on user-defined access control policies. Their system can convey the risks to the user as a list of possible inferences that can be drawn using the sensors accessed by the app. Based on this information users can choose to modify the app's permissions. Roesner et al. [106] present a framework for world-driven access control (WDAC) to mediate continuous visual and audio sensing based on real-world policies. This framework allows the system to detect a policy broadcast by a real-world object and operate at the policy-defined privacy level. Recently, Davies et al. [58] proposed a conceptual system, privacy mediators, that interposes on every sensor stream to dynamically enforce privacy policies.

There also exists a line of work that examines visual content for sensitive information and flags any suspicious content for users to review. For example, Templeman et al. proposed the PlaceAvoider [111] system, which identifies images captured in sensitive locations and flags them for manual review. A similar system is ScreenAvoider [82], which detects computer screens in life-logging images using Convolutional Neural Networks (CNN). To detect private content, these systems use sophisticated machine learning tools that require a large amount of training data and time, which limits their usage to offline applications.

Roesner et al. [105] introduced new security and privacy challenges that need to be addressed in the context of augmented reality technologies. They characterized these challenges along two axes: system scope and functionality. Along the functionality axis, the challenges are classified into output, input, and data access; along the scope axis, they are classified into single application, multiple applications, and multiple interacting systems. The proposed privacy markers protect information captured through camera input that is requested by a single application. Hence, our solution applies to the challenges classified as input and single application along the functionality and system scope axes, respectively.

4.7 Conclusion

This chapter has made the case that privacy markers are a promising new way to protect visual data from inadvertent disclosure by third-party apps running on trusted recording devices. We identified three main challenges to using the privacy-marker approach to limit inadvertent disclosure of visual secrets: (a) privacy markers must be easy for humans to apply, (b) system software must reliably recognize marked regions, and (c) recognizing privacy markers must be efficient enough for real-time applications. To address these challenges, we designed a representative system, WAVEOFF, that helps users mark public regions in a camera's view and only delivers the content within the public regions in subsequent frames. WAVEOFF is implemented in Android's camera service module and can provide privacy-preserving access to the camera in near real-time (at a median rate of 20 FPS) with acceptable accuracy. A user study deemed camera operations with WAVEOFF quick and easy to use.

5

Sensor Privacy through Utility Aware Obfuscation

Personal data garnered from various sensors are often offloaded by applications to the cloud for analytics. This leads to a potential risk of disclosing private user information. We observe that the analytics run on the cloud are often limited to a machine learning model, such as predicting a user's activity using an activity classifier. In this chapter, we present OLYMPUS, a privacy framework that limits the risk of disclosing private user information by obfuscating sensor data while minimally affecting the functionality the data are intended for. OLYMPUS achieves privacy by designing a utility aware obfuscation mechanism, where privacy and utility requirements are modeled as adversarial networks. By a rigorous and comprehensive evaluation on a real-world app and on benchmark datasets, we show that OLYMPUS successfully limits the disclosure of private information without significantly affecting the functionality of the application.

5.1 Introduction

A large number of smart devices such as smartphones and wearables are equipped with sophisticated sensors that enable fine-grained monitoring of a user's environment with high accuracy. In many cases, third-party applications running on such devices leverage this functionality to capture raw sensor data and upload it to the cloud for various analytics. Fine-grained collection of sensor data contains highly sensitive information about users: images and videos captured by cameras on smartphones could inadvertently include sensitive documents [104], and a user's health status could be inferred from motion sensor data [102, 122]. We observe that in many cases, sensor data that are shipped to the cloud are primarily used as input to a machine learning (ML) model (e.g., a classifier) hosted on the cloud. In this work, we investigate whether we can avoid the disclosure of private sensor data to the cloud in such scenarios.

Consider Alice, who uses a driver safety app (e.g., CarSafe [120]) that helps her avoid distracted driving. The app uses a smartphone camera to record Alice, performs activity recognition on the cloud using a deep neural network (DNN), and alerts her when she is distracted. While she is comfortable allowing the app to monitor activities related to distracted driving (such as detecting whether she is drowsy or inattentive), she is not comfortable that the raw camera feed is uploaded to the cloud. She has no guarantee that the app is not monitoring private attributes about her such as her identity, race or gender. Our goal is to develop a mechanism that allows Alice to send as little information to the cloud as possible so as to (a) allow the driver safety app to work unmodified, (b) minimally affect the app's accuracy, while (c) providing her a guarantee that the app cannot monitor other attributes that are private to Alice.

We address the problem of utility aware obfuscation: design an obfuscation mechanism for sensor data that minimizes the leakage of private information while preserving the functionality of existing apps. Building upon prior work [62, 72], we formulate this problem as a game between an obfuscator and an attacker, and propose OLYMPUS, a privacy framework that uses generative adversarial networks (GAN) [67] to solve the problem. Unlike prior work, where utility is formulated using a closed-form mathematical expression, OLYMPUS tunes the obfuscator to preserve the functionality of target apps. This ensures that the obfuscated data work well with existing apps without any modifications to the apps.

In OLYMPUS, a user can specify utility using one or more apps whose functionality must be preserved. A user can specify privacy using a set of labeled examples of data collected from the sensors. Given this training data and access to the target app(s), OLYMPUS learns an obfuscation mechanism that jointly minimizes both the privacy and utility losses. Moreover, if the private properties are correlated with the objective of the app's ML model, OLYMPUS allows the user to trade off between the privacy and utility losses (since they can no longer be simultaneously minimized). At runtime, OLYMPUS uses the learned obfuscation mechanism to interact with the unmodified target apps. Continuing our driver safety example, given access to the app and a few example images of people driving labeled by their identity, OLYMPUS learns to obfuscate the images such that it hides the identity of drivers while allowing the app to detect distracted driving. We make the following contributions in this work:

• We design OLYMPUS, a privacy framework that solves the utility aware obfuscation problem wherein inputs to a machine learning model are obfuscated to minimize the private information disclosed to the model and the accuracy loss in the model’s output.

• To the best of our knowledge, we present the first proof-of-concept implementation of a utility aware obfuscation mechanism, deploy it on a smartphone, and evaluate it against a real-world mobile app. Our evaluation on a real-world handwriting recognition app shows that OLYMPUS allows apps to run unmodified and limits exposure of private information by obfuscating the raw sensor inputs.

• OLYMPUS allows users to trade off privacy and utility when the two requirements are at odds with one another. We empirically demonstrate that OLYMPUS provides better privacy-utility tradeoffs than other competing techniques of image obfuscation.

• OLYMPUS works across different data modalities. On image analysis tasks, we empirically show that OLYMPUS ensures strong privacy: the accuracy of an attacker (simulated as another ML model) trained to learn the private attribute in the obfuscated data was only 5% more than the accuracy of a random classifier (perfect privacy). For example, on a distracted driver detection dataset (StateFarm [28]), the attacker's accuracy of identifying drivers was 100%, 15.3%, and 10% on unperturbed images, images perturbed by OLYMPUS, and images perturbed by an ideal obfuscation mechanism, respectively. On the other hand, OLYMPUS suffers only a small loss in accuracy (<17%) across all image datasets. On motion sensor data, the accuracy of an attacker is slightly better, though no more than 13% higher than the accuracy of a random classifier, with no loss in accuracy of the machine learning task.

• We empirically show that OLYMPUS supports multiple target applications with a single obfuscation mechanism. We also demonstrate that OLYMPUS supports apps with different kinds of classifiers, namely DNN and logistic regression.

The rest of the chapter is organized as follows. Section 5.2 provides an overview of OLYMPUS along with its key design principles. Section 5.3 describes the methodology of learning the utility aware obfuscation. Section 5.4 provides the implementation details of training and deploying OLYMPUS on a smartphone to obfuscate data in real-time. We evaluate OLYMPUS on an Android app as well as on various real-world datasets, and present our results in Section 5.5. Finally, we summarize related work in Section 5.6 and conclude in Section 5.7.

5.2 OLYMPUS Overview

In this section, we begin with a description of the utility aware obfuscation problem. Then, we outline the key design principles of OLYMPUS. Finally, we give an overview of our privacy framework.

5.2.1 Problem setting

Our goal is to design a utility aware obfuscation: given a set of ML models U that take as input x, and a specification of private attributes in the input S, construct an obfuscation mechanism M such that the privacy loss and the accuracy loss are jointly minimized.

We achieve this by developing a privacy framework, OLYMPUS, inspired by the idea of adversarial games [67]. In an adversarial game, two players compete against each other with conflicting goals, until they reach an equilibrium where no player can improve further towards their respective goals. Similarly, OLYMPUS constructs the obfuscation mechanism by competing against an attacker whose goal is to break the obfuscation.

Trust model

We assume third-party apps are honest-but-curious, meaning they follow the protocol but may infer private information from the available data. We also assume that the device platform is trusted and properly isolated from the untrusted third-party apps running on the device. OLYMPUS runs as a part of the trusted platform on the device and intercepts all sensor data for obfuscation. Third-party apps cannot directly access the raw sensor data and have access only to obfuscated data.

Privacy goals

Our privacy goal is to prevent curious third-party apps from inferring private user information from the captured sensor data. To this end, we model the attacker as an ML adversary who has complete access to the obfuscation mechanism. Thus, it can train on the obfuscated data (generated by OLYMPUS) to undo the obfuscation. On the other hand, OLYMPUS learns the obfuscation mechanism by competing against such ML attackers. Rather than considering all-powerful attackers, we limit the ML attacker to be one from a known hypothesis class (e.g., a DNN with a known structure). We believe this is a reasonable assumption since (a) state-of-the-art attacks on obfuscation methods are performed using DNNs [89, 94], and (b) being a universal approximator [110], a DNN is successfully used to learn many complex functions, making it an apt tool to model a strong adversary. In Section 5.3.1, we mathematically model attackers and formalize our privacy goals as an optimization problem. In Section 5.3.2, we describe how OLYMPUS achieves these privacy goals by solving the optimization problem. To verify how well OLYMPUS achieves the proposed privacy goals, we empirically evaluate OLYMPUS against a suite of attackers, namely DNN, logistic regression, random forest, and SVM (Section 5.5.5).

5.2.2 Design principles

A trivial solution to protect user data would be to run the classifier on a user's device, which would not require apps to send raw sensor data to the cloud. There are several issues with this approach. First, the output of the classifier may itself leak information that a user may deem private. Second, it requires app developers to modify the app so that it can run the ML task on a user's device. This may be infeasible due to proprietary reasons or the limited availability of resources on a mobile device. To address the above concerns, we follow the following important principles in designing OLYMPUS.

Compatible with existing apps

Our primary objective is to develop an obfuscation mechanism that lets apps run unmodified. Most previous approaches to data obfuscation do not consider the target apps in designing the mechanism [62, 69, 103]. This could lead to the obfuscated data being incompatible with the app, or cause a significant loss in the app's utility. For instance, we show that standard approaches to protecting visual secrets, such as blurring, perform poorly against image classification. To alleviate this issue, we allow users to specify their utility requirements in the form of apps. The onus is on the obfuscation mechanism to ensure that the specified apps work well with the obfuscated data. Specifying the utility requirements in terms of apps is not only easy and intuitive but also provides a natural way to quantify utility guarantees in terms of the accuracy of the specified apps on the obfuscated data. Moreover, we design OLYMPUS such that a single obfuscation mechanism is sufficient for multiple apps.

Privacy-utility tradeoff

In certain scenarios, the app-essential information is tightly coupled with users' private information. A trivial example: inferring the driver's activity reveals the fact that she is in the car. Thus, it is often impossible to achieve complete privacy without compromising the functionality of the app, and vice versa. We achieve this crucial balance between privacy and utility by modeling the obfuscation problem as a minimax optimization. OLYMPUS allows users to control the privacy-utility tradeoff by specifying a parameter (λ) that governs the relative importance of privacy and utility.

Holistic obfuscation

In many cases, hiding only the private information may not be enough to protect users' privacy. For instance, researchers have shown that blurring faces is not enough to conceal the identity of a person, since identity is correlated with seemingly innocuous attributes such as clothing or environment [89, 94]. Hence, instead of explicitly detecting and removing the private attribute (e.g., the face), we take a holistic approach to obfuscation. By formulating the obfuscation problem as an adversarial game, we aim to minimize the information leakage caused by any such correlation in the data.

FIGURE 5.1: OLYMPUS framework.

Data agnostic framework

Many methods of protecting secrets are specifically designed for certain types of private attributes [41, 79, 82] or data types [42, 75, 77]. We aim to provide a general privacy framework that is agnostic to the private attribute as well as the data type. OLYMPUS comprises multiple DNNs that are responsible for manipulating sensor data. Thus, one can use OLYMPUS for any data type by plugging in an appropriate DNN. By simply changing the underlying networks, we show that OLYMPUS can be used for protecting private information in images and in motion sensor data.

5.2.3 Privacy framework

As outlined in Figure 5.1, OLYMPUS has two phases: an offline phase to learn the obfuscation mechanism, and an online phase where it uses the learned mechanism to obfuscate the sensor data. Before describing these phases, we describe the modules of OLYMPUS.

Modules

OLYMPUS consists of three modules: 1) an App that simulates the target app, 2) an Attacker that attempts to break the obfuscation, and 3) an Obfuscator that learns to obfuscate data to protect private information without affecting the functionality of the target app.

App: The App module simulates the target app to verify that the obfuscated data preserve the functionality of the app, i.e., satisfy the utility requirements. The target app can be any classifier that takes sensor data as input and outputs a classification probability for each class. The inputs to the App module are the obfuscated data and the associated utility labels. The utility labels are the true class labels of the obfuscated data and are used in computing the utility loss. Informally, the utility loss is a classification loss that captures the performance of the target app on the obfuscated data. The App module in the driver safety example uses obfuscated driver images and their activity labels as inputs. It uses the target app to get the classification probability for each driver image and computes the corresponding utility loss using the true activity labels.

Attacker: The Attacker module simulates an adversary that attacks the obfuscation mech- anism learned by the Obfuscator. In other words, it verifies that the obfuscated data hide private attributes, i.e., satisfy the privacy requirements. The inputs to the Attacker are a set of obfuscated images and their privacy labels, where a privacy label specifies the value (class) of the private attribute. Using these inputs, the Attacker trains a DNN to classify obfuscated images into correct privacy labels, and outputs the privacy loss. Intuitively, the privacy loss measures how well the Attacker classifies obfuscated images, i.e., infers the private attributes. In our driver safety scenario, given obfuscated images together with their privacy labels (identity of drivers), the Attacker learns to identify drivers in the obfuscated images.

Obfuscator: The Obfuscator module learns a transformation of the data such that the obfuscated data do not contain any private information but preserve useful information. The corresponding privacy and utility losses are estimated using the Attacker and the App modules, respectively. To satisfy the privacy and utility requirements, the Obfuscator learns a transformation function that minimizes both the privacy and utility losses.

The design of the Obfuscator follows the architecture of an autoencoder [115]. It consists of a sequence of layers, where both the input and output layers are of the same size, while the middle layer is much smaller. The idea is to encode the input into this compact middle layer and then decode it back to the original size in the output layer. This encoding and decoding process is learned by minimizing the privacy and utility losses of the output (obfuscated data). The smaller size of the middle layer forces the Obfuscator to throw away unnecessary information while preserving the information required to reconstruct the output. Essentially, the privacy loss forces the Obfuscator to throw away private information, while the utility loss forces it to preserve the information required by the target app. In the driver safety use case, the Obfuscator obfuscates the input image such that the features required to identify drivers are removed, but the features important for activity detection are preserved.
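To make the bottleneck idea concrete, the following PyTorch sketch shows one possible shape of such an Obfuscator for flattened inputs. The layer sizes, the fully connected encoder/decoder, and the variable names are illustrative assumptions, not the architecture used in OLYMPUS.

    import torch
    import torch.nn as nn

    class Obfuscator(nn.Module):
        """Autoencoder-style obfuscator: same input/output size, small middle layer."""
        def __init__(self, input_dim=784, bottleneck_dim=32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 256), nn.ReLU(),
                nn.Linear(256, bottleneck_dim), nn.ReLU(),   # compact middle layer
            )
            self.decoder = nn.Sequential(
                nn.Linear(bottleneck_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim), nn.Sigmoid(),     # back to original size
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    # M(x): the obfuscated output has the same shape as the input x.
    x = torch.rand(8, 784)
    obfuscated = Obfuscator()(x)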

Offline phase

In the offline phase, OLYMPUS learns an obfuscation mechanism that minimizes information leakage while maintaining the utility of the applications. The user specifies privacy and utility requirements by providing training data that includes examples with their respective privacy and utility labels. It is crucial to protect the training data as it contains private information. Note that the training data is only used during the offline phase and it never leaves the OLYMPUS system. Thus, we require the offline phase to be executed in a secure environment, either on the device or on a trusted server. Moreover, the target app always receives the obfuscated data, even during the training phase. Given the training data, the offline phase learns an obfuscation function M : X → X such that M(·) satisfies the given privacy and utility requirements. Here, X refers to the domain of the data. In the next section, we provide an algorithm to learn M.

Online phase

During the online phase, OLYMPUS simply uses the learned obfuscation M to obfuscate the data. In particular, when an app requests sensor data, OLYMPUS intercepts the request and obfuscates the data using M before sending it to the app.

5.3 Utility Aware Obfuscation

In this section, we mathematically formulate the problem of designing an obfuscation function M that satisfies given privacy and utility requirements, and describe how OLYMPUS learns M using adversarial games.

5.3.1 Problem formulation

Let X be the domain of the data that need to be obfuscated (e.g., images, sensor readings, etc.). Let s denote a private attribute associated with each x ∈ X (e.g., race, gender, identity, etc.). We use the notation x.s = z to denote the fact that the private attribute associated with x takes the value z. Here, z is a privacy label of x and it takes a value from a set $Z_s$, which we denote as the privacy classes. For instance, if s is gender then $Z_s$ = {male, female}, and x.s = female implies that the gender of x is female. Similarly, u denotes a utility attribute associated with each x ∈ X (e.g., activity, expression, etc.). Again, we denote by x.u = y the fact that the utility attribute associated with x takes the value y. Here, y is a utility label of x and it takes a value from a set $Y_u$, which we denote as the utility classes. For example, if u is activity then $Y_u$ = {walking, running, ...}, and x.u = walking implies that the activity associated with x is walking.

Let $D_{us} = \{(x_i, y_i, z_i)\}_{i=1}^{n}$ be a training dataset consisting of n examples drawn i.i.d. from a joint distribution $P_{us}$ over random variables $(X, Y_u, Z_s)$. Here, $y_i \in Y_u$ and $z_i \in Z_s$ are the corresponding utility and privacy labels of $x_i \in X$. Let $M : X \to X$ denote a deterministic obfuscation function and $H$ denote the hypothesis space of all obfuscation functions. Next, we formalize the privacy and utility requirements of M.

Privacy requirements

The privacy requirement is to protect the private attributes in the obfuscated data. As mentioned earlier in Section 5.2.1, the adversary is an ML model that learns to identify private attributes in the obfuscated data. Perfect privacy (i.e., the ideal obfuscation) is achieved when the attacker cannot perform better than randomly guessing the privacy labels, i.e., when the attacker's probability of predicting the correct privacy label is $1/|Z_s|$ for the private attribute s. Thus, we measure privacy loss in terms of how well the attacker performs over random guessing.

Let $C_s : X \to [0, 1]^{|Z_s|}$ be an attacker (classifier) that predicts the privacy label z = x.s given M(x) as input. The output of $C_s$ is a probability distribution $p_s$ over privacy labels $Z_s$, where $p_s(M(x))[z_i]$ is the predicted probability of M(x) having privacy label $z_i$, i.e., $Pr(z = z_i \mid M(x))$. For a given s, the privacy loss of the mechanism M with respect to an attacker $C_s$ can be measured using the cross-entropy loss as follows:

$$L_P(M, C_s) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{z \in Z_s} \frac{1}{|Z_s|} \log\left[p_s(M(x_i))[z]\right] \qquad (5.1)$$

The above privacy loss essentially measures the difference between two probability distributions, namely a uniform distribution (random guessing) and the distribution predicted by the attacker. The privacy loss increases as the probability of predicting the correct privacy label diverges from the uniform distribution. Thus, minimizing the above privacy loss ensures that the adversary cannot perform better than random guessing.

Utility requirements

The utility requirement is to preserve the utility attributes in the obfuscated data that are required by the target apps. Let $C_u : X \to [0, 1]^{|Y_u|}$ represent an app classifier that predicts the utility label y = x.u given x as input. Given an obfuscated input M(x), the output of $C_u$ is a probability distribution $p_u$ over utility labels $Y_u$, where $p_u(M(x))[y_i]$ is the predicted probability of M(x) having utility label $y_i$, i.e., $Pr(y = y_i \mid M(x))$. To measure the impact of the obfuscation on the performance of the classifier, we measure the classification error using the following utility loss function:

$$L_U(M, C_u) = -\frac{1}{n} \sum_{i=1}^{n} \log\left[p_u(M(x_i))[y_i]\right] \qquad (5.2)$$

Here, yi is the true utility label of xi. The above utility loss is the cross entropy loss that is 0 when the classifier correctly predicts the utility labels, and increases as the classification error increases. Thus, minimizing the above utility loss ensures that the functionality of the target app is preserved.
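A matching sketch of the utility loss in Equation (5.2), under the same assumptions as the privacy-loss snippet above; here the cross entropy is taken against the true utility labels rather than a uniform target.

```python
import numpy as np

def utility_loss(p_u: np.ndarray, y: np.ndarray) -> float:
    """Cross entropy of the app classifier's predictions against the true labels.

    p_u: (n, |Y_u|) array of predicted probabilities on the obfuscated samples.
    y:   (n,) array of integer utility labels.
    """
    n = p_u.shape[0]
    # L_U = -(1/n) * sum_i log p_u(M(x_i))[y_i]
    return float(-np.mean(np.log(p_u[np.arange(n), y] + 1e-12)))
```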

Obfuscation mechanism

Our aim is to design an obfuscation mechanism M that 1) hides the private attribute (s) of the data and 2) preserves the utility attribute (u) required by the target app. This can be achieved by designing M that minimizes both the privacy loss (LP) and the utility loss (LU) defined above. Minimizing the utility loss ensures that the obfuscated data works well with the target app’s classifier Cu. However, minimizing the privacy loss only protects the private attribute against a particular attacker Cs. To ensure that the obfuscation mechanism protects against a class of attackers (e.g., DNN, SVM, etc.), we minimize the maximum privacy loss across all attackers belonging to a particular class. Thus, the optimal obfuscation mechanism is a solution to the below minimax optimization problem.

$$\arg\min_{M \in H} \; \lambda\, L_U(M, C_u) + (1-\lambda)\Big[\max_{C_s \in H_s} L_P(M, C_s)\Big] \qquad (5.3)$$

Here, Hs is a hypothesis space of attacker models. For example, in case of an SVM

adversary, Hs is a set of all hyperplanes and Cs ∈ Hs is a particular hyperplane learned

by SVM. Even though the size of Hs is usually very large, we show that the optimal

attacker (one with the maximum privacy loss) can be computed using various optimization techniques such as gradient descent.

We control the privacy-utility tradeoff using a hyper-parameter λ ∈ [0, 1]. If we set λ = 0, then the utility loss is ignored, and the optimal mechanism achieves a uniform conditional distribution, ensuring that no information about the private attribute is leaked. An example of such an obfuscation mechanism is a constant function, i.e., M(x) = c, ∀x ∈ X, where c is some constant. On the other hand, if we set λ = 1, then the privacy loss is completely ignored, and the optimal obfuscation mechanism is the identity function, i.e., M(x) = x, ∀x ∈ X. In Section 5.5.6, we show that by setting an appropriate value of λ, we can achieve the desired tradeoff between privacy and utility.

Next, we extend our formulation to design an obfuscation mechanism that protects multiple private attributes against multiple target apps. Let S denote a set of private attributes that the user wants to protect and U denote a set of utility attributes required by the target apps. To design an obfuscation mechanism that protects all the private attributes in S while preserving all the utility attributes in U, we modify the above optimization function as follows.

$$\arg\min_{M \in H} \; \lambda\Big[\max_{u \in U} L_U(M, C_u)\Big] + (1-\lambda)\Big[\max_{s \in S}\, \max_{C_s \in H_s} L_P(M, C_s)\Big] \qquad (5.4)$$

Here, we measure the max privacy loss across all the private attributes (maxs∈S) to minimize the maximum expected information leakage across all the private attributes. One can easily replace the max with a sum (average) to minimize the total (average) privacy loss across all the private attributes. Similarly, one can also consider the total/average utility loss instead of the max utility loss. Next, we present an iterative learning algorithm to solve the above optimization problem.

Algorithm 1 Obfuscation learning
Require: D, P_u(·, θ_u), P_s(·, θ_s), M(·, θ_M), λ
Ensure: Optimal obfuscation mechanism M*
  Randomly initialize θ_s and θ_M
  for number of training iterations do
    for each minibatch (X, Y, Z) ∈ D do
      ▷ Train Attacker
      Generate obfuscated data M(X, θ_M)
      Update θ_s using equation (5.5)
      ▷ Train Obfuscator
      Update θ_M using equation (5.6)
    end for
  end for
  M* = M(·, θ_M)

5.3.2 Learning to obfuscate

So far, we have formulated the problem of designing an obfuscation mechanism as a minimax optimization. In order to solve this optimization, we use the concept of adversarial nets [67], originally proposed for learning an unknown data distribution using samples from that distribution. The problem of learning the distribution was formalized using a minimax optimization which was solved via training a pair of networks in an adversarial fashion. We adapt this training approach to learn the optimal obfuscation mechanism M*. The method for learning the obfuscation mechanism is outlined in Algorithm 1. For simplicity, we describe the learning algorithm for the case where a user wants to protect a single private attribute against one target app. It can be easily extended to support multiple private attributes with multiple applications.

Let M denote the Obfuscator network that takes as input x ∈ X and outputs the obfuscated data M(x, θ_M), where θ_M is the parameter of the Obfuscator network. Let P_s denote the Attacker network that learns to predict the privacy labels from the obfuscated data. It takes as input the obfuscated data and outputs the probability distribution p_s(M(x, θ_M), θ_s) over privacy labels Z_s, where θ_s is the parameter of the Attacker network. Similarly, P_u denotes the App network that takes as input the obfuscated data and outputs the probability distribution p_u(M(x, θ_M), θ_u) over utility labels Y_u, where θ_u is the parameter of the App network.

OLYMPUS uses an iterative optimization algorithm to jointly train the Obfuscator and the Attacker networks as follows (we do not train the App network, i.e., learn θ_u, since it simulates the pretrained classifier of the target app). On each iteration, we alternate the training between the Attacker and the Obfuscator. First, we generate obfuscated data using the Obfuscator and train the Attacker to classify the obfuscated data into correct private labels. Formally, we train the Attacker network using the following objective function.

$$\arg\max_{\theta_s} \; \frac{1}{n} \sum_{i=1}^{n} \log\big[p_s(M(x_i, \theta_M), \theta_s)[z_i]\big] \qquad (5.5)$$

Here, z_i is the true privacy label of x_i. The above objective captures the attacker's ability to infer private attributes in the obfuscated data. It finds the attacker with the highest success rate in breaking the obfuscation mechanism. Note that maximizing the privacy loss (L_P) defined in Equation (5.1) may yield multiple worst-case attackers. By maximizing Equation (5.5), we pick the worst-case attacker that not only outputs a probability distribution that is far from the uniform distribution, but also correctly predicts the private attributes. Next, we train the Obfuscator to minimize the privacy and utility losses computed using the Attacker and the App networks, respectively. Formally, the Obfuscator is trained via the following objective function.

$$\arg\min_{\theta_M} \; -\lambda \frac{1}{n} \sum_{i=1}^{n} \log\big[p_u(M(x_i, \theta_M), \theta_u)[y_i]\big] \;-\; (1-\lambda) \frac{1}{n} \sum_{i=1}^{n} \sum_{z \in Z_s} \frac{1}{|Z_s|} \log\big[p_s(M(x_i, \theta_M), \theta_s)[z]\big] \qquad (5.6)$$

Here, the first term corresponds to the utility loss (LU) and the second term corresponds to the privacy loss (LP). We train both the networks until they reach an equilibrium, i.e., when the Attacker cannot perform better than random guessing. Since we are alternating the training between the Attacker and the Obfuscator, together equations (5.5) and (5.6)


give a solution to the minimax optimization problem defined in Equation (5.3). Thus, at the equilibrium, the final Obfuscator M(·, θ_M*) gives the optimal obfuscation mechanism M*.

FIGURE 5.2: OLYMPUS architecture for image data. (a) Obfuscator; (b) Attacker.
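The alternating updates of Algorithm 1 can be sketched with tf.keras roughly as follows. This is a simplified illustration, not the dissertation's implementation: `obfuscator`, `attacker`, and `app_net` are assumed to be Keras models that emit class probabilities (the App network stays frozen), and `dataset` is assumed to yield minibatches (x, y, z). Minimizing the attacker's cross entropy below is equivalent to the maximization in Equation (5.5).

```python
import tensorflow as tf

lam = 0.5              # privacy-utility tradeoff (lambda)
num_epochs = 100
opt_attacker = tf.keras.optimizers.Adam(1e-3)
opt_obfuscator = tf.keras.optimizers.Adam(1e-3)

def label_ce(p, labels):
    # cross entropy against the true labels (Eq. 5.5 and the first term of Eq. 5.6)
    labels = tf.cast(labels, tf.int32)
    idx = tf.stack([tf.range(tf.shape(labels)[0]), labels], axis=1)
    return -tf.reduce_mean(tf.math.log(tf.gather_nd(p, idx) + 1e-12))

def uniform_ce(p):
    # cross entropy against a uniform target (the privacy loss of Eq. 5.1)
    k = tf.cast(tf.shape(p)[-1], p.dtype)
    return -tf.reduce_mean(tf.reduce_sum(tf.math.log(p + 1e-12) / k, axis=-1))

for _ in range(num_epochs):
    for x, y, z in dataset:  # minibatches of (data, utility label, privacy label)
        # Train the Attacker to recover privacy labels from obfuscated data (Eq. 5.5)
        with tf.GradientTape() as tape:
            attacker_loss = label_ce(attacker(obfuscator(x)), z)
        grads = tape.gradient(attacker_loss, attacker.trainable_variables)
        opt_attacker.apply_gradients(zip(grads, attacker.trainable_variables))

        # Train the Obfuscator against the frozen App and the current Attacker (Eq. 5.6)
        with tf.GradientTape() as tape:
            x_obf = obfuscator(x)
            obfuscator_loss = (lam * label_ce(app_net(x_obf), y)
                               + (1.0 - lam) * uniform_ce(attacker(x_obf)))
        grads = tape.gradient(obfuscator_loss, obfuscator.trainable_variables)
        opt_obfuscator.apply_gradients(zip(grads, obfuscator.trainable_variables))
```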

5.4 Implementation

We developed a prototype implementation of OLYMPUS, deployed it on a Nexus 9 tablet, and evaluated it on an Android app. The implementation of OLYMPUS consists of three steps – 1) instantiating its underlying modules, 2) training the Obfuscator (offline phase), and 3) deploying the Obfuscator on device (online phase).

5.4.1 Constructing OLYMPUS

Constructing OLYMPUS involves materializing its underlying modules – App, Attacker and Obfuscator. The App module uses the classifier from the target app. OLYMPUS supports any app as long as its classifier (a) outputs probability scores, (b) uses a known loss function that is continuous and differentiable, and (c) uses raw sensor data as inputs. Any app that uses a DNN for classification is supported by OLYMPUS since all of the above assumptions are true for DNNs. The trend in ML is that most classifiers are gravitating towards DNNs because they achieve higher accuracy, and jointly perform feature computation and classification. In the case of traditional classifiers, many of them satisfy assumptions (a) and (b) (e.g. Logistic Regression, which we use in our experiments). Assumption (c) is not necessarily true as some classifiers involve arbitrarily complex feature computation steps that

are hard to reason about. However, again, the trend is to use the last layers of standard DNNs to compute features, which can be readily integrated with OLYMPUS.

Both the Attacker and the Obfuscator are constructed using DNNs. The architecture of these networks depends on the type of the sensor data. For images, we use a convolutional neural network (CNN) [84], while for motion sensor data we use a DNN with fully connected layers. Figures 5.2 and 5.3 outline the architecture of the underlying networks of OLYMPUS for image and motion sensor data, respectively.

FIGURE 5.3: OLYMPUS architecture for motion sensor data. (a) Obfuscator; (b) Attacker.
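For illustration, the following Keras sketches show the flavor of these networks – a small convolutional Obfuscator/Attacker pair for grayscale images and a fully connected Obfuscator for sensor feature vectors. The layer counts and sizes here are placeholders; the exact architectures used in the experiments are listed in Appendix A.1.

```python
from tensorflow import keras
from tensorflow.keras import layers

def image_obfuscator(h, w):
    # Convolutional network that emits an obfuscated image of the same size
    return keras.Sequential([
        keras.Input(shape=(h, w, 1)),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(1, 3, padding="same", activation="sigmoid"),
    ])

def image_attacker(h, w, num_privacy_classes):
    # CNN that predicts privacy labels from obfuscated images
    return keras.Sequential([
        keras.Input(shape=(h, w, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_privacy_classes, activation="softmax"),
    ])

def sensor_obfuscator(dim):
    # Fully connected Obfuscator for motion sensor feature vectors
    return keras.Sequential([
        keras.Input(shape=(dim,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(dim),
    ])
```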

5.4.2 Training OLYMPUS

While training, OLYMPUS requires access to an app's classifier to ensure that the performance of the classifier is not hindered by the obfuscation. The exact process of accessing the classifier varies based on the target app. For instance, many apps use TensorFlow [40] for classification tasks such as activity and speech recognition [22, 27]. Using TensorFlow, an app developer can either train a custom classifier or use a pre-trained classifier from a public repository of models [32]. The trained classifier can either be embedded in the app for on-device classification or it can be hosted on the cloud and accessed via APIs. If the classifier is embedded in the app, it can be easily extracted and used by OLYMPUS during the training phase. On the other hand, for cloud-based classification, OLYMPUS can use the APIs to query the classifier to retrieve class probabilities.

As mentioned before (Section 5.2.1), our attacker model assumes that the third-party

apps are honest-but-curious and they follow the protocol to ensure OLYMPUS learns the optimal obfuscation mechanism. More specifically, the target app must provide correct classification scores for the obfuscated data during the training phase. We argue that it

is in the interest of the app to cooperate with OLYMPUS; otherwise, the functionality of the app may suffer due to an incorrect estimation of the utility loss. Note that privacy is enforced via an independent attacker (not controlled by the app), making it harder for an uncooperative app to affect the privacy guarantees. Learning an optimal obfuscation under an adversarial setting (where the target app provides malicious labels) is an interesting direction for future work.
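For the cloud-based path mentioned above, the class probabilities can be fetched over the app's prediction API during training. The snippet below is purely hypothetical – the endpoint URL, payload format, and response field are placeholders, not an actual DL4Mobile or Google Cloud API.

```python
import numpy as np
import requests

def query_cloud_classifier(obfuscated_batch: np.ndarray,
                           endpoint: str = "https://example.com/classify") -> np.ndarray:
    """Send obfuscated samples to a (hypothetical) cloud classifier and return
    the predicted class probabilities used to compute the utility loss."""
    payload = {"instances": obfuscated_batch.tolist()}
    response = requests.post(endpoint, json=payload, timeout=30)
    response.raise_for_status()
    # Assumed response shape: {"probabilities": [[p_1, ..., p_k], ...]}
    return np.asarray(response.json()["probabilities"])
```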

OLYMPUS also requires appropriate training data to train the Obfuscator. The training data consists of several examples with their utility and privacy labels. For standard requirements (such as protecting identity), one can easily use a publicly available benchmark dataset (e.g. celebA [86]) that fits the criteria. In cases when this is not feasible, the user needs to capture the dataset with appropriate labels. One can think of a data collection app that helps the user collect appropriate data for training OLYMPUS. For instance, we developed a mobile app that allows users to collect training data for training OLYMPUS to hide the identity of the writer while preserving the ability to recognize the handwritten text. More details about the app and the case study of the handwriting recognition app are given in Section 5.5.1.

Given the appropriate training data and access to the app's classifier, we train the Obfuscator using Algorithm 1 as described in Section 5.3.2. In our driver safety use case, the training data consists of several images of drivers with their activities (utility labels) and identities (privacy labels). Using these training data and the app, OLYMPUS learns the obfuscation that hides the driver's identity by minimizing the attacker's ability to identify drivers in the obfuscated images. At the same time, it ensures that the obfuscated images preserve the activity by maximizing the app's accuracy of identifying the activity in the obfuscated images. To improve usability, we envision a repository of pretrained

obfuscators that users can simply use to protect their privacy against various third-party apps.

FIGURE 5.4: An illustration of how OLYMPUS intercepts and obfuscates images requested by the target app Classify.

5.4.3 Deploying OLYMPUS

OLYMPUS intercepts and obfuscates sensor data before they are received by the target app. This is achieved by instrumenting Android OS using Xposed [35], an open source framework that provides means to alter the functionality of an Android app without modifying the app. Xposed provides APIs to intercept method calls, modify method arguments, change the return values, or replace the method with custom code. Using these APIs, we built an Xposed module to intercept an app's request to sensor data and apply obfuscation on-the-fly before the requested data reach the app. Since Android provides standard APIs to access the sensor data, one can easily hook those API calls to apply the obfuscation.

Consider an Android app – Classify – that allows a user to select an image from the Gallery and finds objects in the selected image using a classifier. It launches the Gallery app by invoking the startActivityForResult method for the user to select an image. Upon selection, the Gallery app wraps the selected image in a special class called Intent and returns it to the Classify app via the onActivityResult method. The Classify app then extracts the image from the received Intent and sends it to the classifier to detect objects.

Figure 5.4 demonstrates how OLYMPUS obfuscates images requested by the Classify

app. Since the app gets image data through the onActivityResult method, OLYMPUS hooks this method using the APIs provided by Xposed. The installed hook intercepts every call to the onActivityResult method, runs the Obfuscator on the image selected by the user and returns the resulting obfuscated image to the Classify app. Thus, the Classify app always receives an obfuscated image. Similarly, one can apply appropriate hooks to obfuscate the sensor data based on the target app. Note that this approach does not require any modifications to the app. However, we need to install appropriate hooks for each target app. We can easily avoid this by using

OLYMPUS as a plugin in DALF. Essentially, the OLYMPUS plugin uses the DALF API to intercept the camera frames and returns the frames obfuscated by the learned obfuscation

mechanism. DALF then delivers the obfuscated frames to the target apps.

5.5 Experiments

In this section, we evaluate OLYMPUS and compare it with the existing approaches of protecting secrets. Through rigorous empirical evaluation, we seek answers to the following questions on an Android app (Qns 1-3) as well as on benchmark datasets (Qns 1-6).

1. What is the impact of obfuscation on the functionality of the target app?
2. How well does the obfuscation protect private information?
3. What is the overhead of obfuscating sensor data?
4. How well does the obfuscation mechanism trade off privacy for utility compared to existing approaches?
5. How well does the obfuscation mechanism scale with multiple applications?
6. How well does the obfuscation mechanism perform against different kinds of app classifiers?

Table 5.1: Summary of benchmark datasets used for evaluating OLYMPUS.

| Dataset | Data Type | Target App | Private Information | #Utility Classes | #Privacy Classes | Training time (s) | Obfuscation time (ms) |
|---|---|---|---|---|---|---|---|
| KTH | Image | Action recognition | Identity of people | 6 | 6 | 862 | 0.15 |
| StateFarm | Image | Distracted driving detection | Identity of people | 10 | 10 | 607 | 0.11 |
| CIFAR10 | Image | Object recognition | Face | 10 | 2 | 3384 | 0.04 |
| HAR | Inertial sensors | Action recognition | Identity of people | 6 | 30 | 173 | 0.05 |
| OPPORTUNITY | Motion sensors | Action recognition | Identity of people | 5 | 4 | 7222 | 0.03 |

5.5.1 Experimental setup

In this section, we describe the target app, benchmark datasets, and metrics we used to

evaluate OLYMPUS.

Android app case study: Handwriting Recognition

Consider the following use case: Alice wants to use a mobile app that transcribes text as she writes on the device. However, she does not want to reveal her handwriting style to the app, as that could reveal private attributes like her identity [123] or personality [113].

Hence, she wants to use OLYMPUS to obfuscate the app’s input.

Motivated by this scenario, we evaluate OLYMPUS on a handwritten digit recognition app – DL4Mobile [14] – downloaded from the Google Play Store. DL4Mobile allows users to draw a digit between 0 and 9 and recognizes it using a DNN. Our goal is to learn an obfuscation mechanism that protects the writer's identity while allowing DL4Mobile to correctly classify the written digits. To obtain training data with utility labels (digit) and privacy labels (user identity), we developed an Android app to collect images of handwritten digits. Using this app, we collected data from two users with 30 images per person per digit. We did not collect any personal information about the users.

We consider two variants of DL4Mobile: the first performs on-device classification and the other performs classification on the cloud. The original app uses an embedded DNN to perform on-device classification using the TensorFlow library [31]. We modified the original app to create a variant that uses the same DNN to perform classification on the cloud using the Google Cloud API [17] (the source code of DL4Mobile is available at https://github.com/nalsil/TensorflowSimApp). Since in both variants the target classifier is


the same, we train a single obfuscation mechanism for both. We use the DNN architectures

We evaluate OLYMPUS’s obfuscator on a 10% sample of the examples (held out as a test set) through the DL4Mobile app instrumented using Xposed. As explained in Sec- tion 5.4.3, we install hooks in both variants of the DL4Mobile app to intercept the original digit image and replace it by the corresponding obfuscated image via the Obfuscator.

Benchmark datasets

We also evaluate OLYMPUS on three image datasets and two motion sensor datasets. Next, we describe each dataset in detail.

KTH: KTH [108] is a video database for action recognition. It contains six different actions performed by 25 subjects under four different scenarios. The actions are walking, jogging, running, boxing, hand-waving and hand-clapping. Our goal is to protect the identity of the subjects performing the actions while allowing the target app to recognize the actions correctly. The videos were recorded at 25fps with 160x120 resolution. We uniformly sampled 50 frames from each video of six randomly selected subjects for the evaluation. As a preprocessing step, we scaled all the extracted frames to 80x60 and converted them to grayscale.

StateFarm: StateFarm [28] is an image dataset used in a Kaggle competition for detecting distracted drivers. It has images of drivers performing various activities in the car. In total, there are 10 different activities performed by the driver – safe driving, texting (right), texting (left), talking on phone (right), talking on phone (left), operating radio, drinking, reaching behind, hair/makeup and talking to passenger. Motivated by our driver safety example, our goal is to protect the identity of drivers while allowing the target app to infer driver activities. For our experiments, we use images of 10 randomly selected drivers. Each image was of size 224x224, which was scaled down to 56x56 and converted


to grayscale as a preprocessing step.

CIFAR10: CIFAR10 [83] is a popular object detection dataset. It consists of 32 × 32 color images from 10 categories, each having 6000 images. For the private attribute, we added faces from the ATT face dataset [13]. A random face from the ATT face dataset was added to about half of the randomly selected images from the dataset. Each face is added at a random location in the image such that the entire face is visible. The original face images were of size 92x112 pixels, which we scaled down to 10x10 before adding. Our goal is to obfuscate the image such that the target app can classify obfuscated images into one of the 10 object categories correctly, while the adversary cannot infer whether there exists a face in the image. We preprocess all the images by converting them to grayscale.

HAR: HAR [44] is a human activity recognition dataset containing readings from the accelerometer and gyroscope embedded in a smartphone. The readings involved 30 users performing six activities – walking, walking-upstairs, walking-downstairs, sitting, standing and lying. A 561-dimensional feature vector with time and frequency domain variables is computed using a sliding window method. Each feature vector has an associated subject id and the activity label performed by the subject at that time. Our goal is to obfuscate the feature vector to protect the identity of the subject while allowing the target app to infer activities.

OPPORTUNITY: OPPORTUNITY [51] is also a human activity recognition dataset. It contains 242 attributes from wearable, object, and ambient sensors. The sensor readings were recorded using four subjects performing various activities. In our experiments, we use data pertaining to the following locomotion activities – stand, walk, sit, lie, and null, where null represents a transition between activities or other activities. The dataset contains six independent runs of daily activities per user. We ran our experiment on two randomly sampled runs from each user. As in the case of HAR, our goal is to protect the identity of individuals, while allowing the target app to infer the locomotion activities from the obfuscated sensor data.
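As a rough illustration of the CIFAR10 construction described above, the snippet below pastes a pre-scaled 10x10 grayscale face at a random location so that the whole face stays inside the image; it is a sketch of the idea, not the exact preprocessing script.

```python
import numpy as np

def insert_face(image: np.ndarray, faces: np.ndarray,
                rng: np.random.Generator) -> np.ndarray:
    """Overlay a randomly chosen 10x10 face crop at a random, fully visible spot.

    image: 2-D grayscale CIFAR10 image; faces: array of pre-scaled face crops.
    """
    face = faces[rng.integers(len(faces))]
    fh, fw = face.shape
    h, w = image.shape
    top = rng.integers(0, h - fh + 1)
    left = rng.integers(0, w - fw + 1)
    out = image.copy()
    out[top:top + fh, left:left + fw] = face
    return out
```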

Table 5.1 gives the summary of all the datasets we used to evaluate OLYMPUS along with their privacy and utility requirements. For each dataset, we use OLYMPUS to learn an obfuscation mechanism that protects the respective private information (e.g. faces, identity of people, etc.) while preserving the ability to perform a specified classification task (e.g. object recognition, action recognition, etc.). We construct OLYMPUS based on the type of the dataset as explained in Section 5.4.1. We use the same architecture across datasets except for minor modifications required for handling the data (e.g. input size). The complete specification of all the networks is available in Appendix A.1.

For each dataset, we simulate the target app by training a DNN to classify unperturbed data into corresponding utility labels. We randomly split each dataset into train, validation

and test sets, with 80%, 10% and 10% splits, respectively. We train OLYMPUS for 100 iterations (epochs) using the training set, use the validation set to choose the best obfuscation model, and evaluate it using the test set. All the networks are implemented in Keras [55] and are trained on an NVIDIA Tesla K80 GPU. For training, we used the Adam optimizer [81] with a learning rate of 0.001 and fixed the value of λ to 0.5. All reported results are averaged over 10 independent runs.

Evaluation metrics

We used the following metrics to evaluate OLYMPUS. Privacy: The privacy is measured in terms of the accuracy of an attacker who attempts to infer private information in the obfuscated data. It is a common practice to provide privacy guarantees relative to perfect privacy. In our case, the perfect privacy is achieved when the attacker cannot perform better than random guessing. The accuracy of random

guessing is 1/(# privacy classes), which differs for different applications. So we decided to measure privacy loss in terms of how much better an adversary is compared to random guessing.

Table 5.2: Evaluation results on DL4Mobile.

| | Utility | Privacy | Execution Time On-Device (ms) | Execution Time On-Cloud (ms) |
|---|---|---|---|---|
| Unperturbed Images | 0.94 (±0.009) | 0.32 (±0.04) | 36.2 (±0.9) | 338.3 (±62.14) |
| Perturbed Images | 0.93 (±0.04) | 0.01 (±0.02) | 44.48 (±0.9) | 346.6 (±60.1) |

Given a set of privacy classes Z, we define the attacker’s score as follows.

$$\text{attacker's score} = \frac{\#\ \text{samples correctly classified by the attacker}}{\text{total number of samples}} - \frac{1}{|Z|} \qquad (5.7)$$

The range of the attacker's score is from 0 (perfect privacy) to $(1 - \frac{1}{|Z|})$ (no privacy).

Utility: We measure the utility of our obfuscation mechanism in terms of the classification accuracy of the target app. The utility score is defined as follows.

$$\text{utility score} = \frac{\#\ \text{samples correctly classified by the app}}{\text{total number of samples}} \qquad (5.8)$$

Hence, the utility score ranges from 0 (no utility) to 1 (highest utility). Unlike privacy, a relative utility score (the app's accuracy on unperturbed data minus its accuracy on perturbed data) is not a suitable metric in our case as it may lead to a negative utility score, making it harder to explain the results. Sometimes OLYMPUS learns better features, which results in higher accuracy on the perturbed data than on the unperturbed data. In particular, we observed this phenomenon when the app's classifier is a weaker model (for example, see Table 5.5).

Overhead: We evaluate the efficiency of our mechanism by reporting the time to obfuscate the sensor data.
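Both metrics reduce to a few lines of NumPy; the sketch below assumes integer class predictions and ground-truth labels of equal length.

```python
import numpy as np

def attacker_score(pred: np.ndarray, true_private: np.ndarray,
                   num_privacy_classes: int) -> float:
    # Equation (5.7): attacker accuracy minus the accuracy of random guessing
    return float(np.mean(pred == true_private) - 1.0 / num_privacy_classes)

def utility_score(pred: np.ndarray, true_utility: np.ndarray) -> float:
    # Equation (5.8): plain classification accuracy of the target app
    return float(np.mean(pred == true_utility))
```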

5.5.2 Evaluation on Android app

We evaluate OLYMPUS on DL4Mobile using a Nexus 9 tablet running Android 7.1 (Nougat) and Xposed version 88. We compare our results by running DL4Mobile on an unmodified

Table 5.3: Example images obfuscated by OLYMPUS (unperturbed and perturbed images of the handwritten digits 0–9 for User A and User B).

Android OS. Table 5.2 summarizes the results of evaluating DL4Mobile without OLYMPUS and with OLYMPUS.

Without OLYMPUS, the adversary’s accuracy of correctly predicting user’s identity from the handwritten digits is 82% (64% improvement over random guessing) which shows that handwriting contains unique patterns that can be exploited to identify users.

With OLYMPUS, the adversary’s accuracy of identifying user’s identity drops to 51% which is close to random guessing (50%) as there are only two users in the dataset. On the other hand, OLYMPUS incurs a minor drop (1%) in digit classification accuracy compared to the accuracy on the unperturbed images. These results show that OLYMPUS successfully protects users’ identity without affecting the functionality of DL4Mobile. The high privacy and high utility come at a small cost of obfuscating the image. The average time to train the Obfuscator was 718 seconds on a GPU. The mean time to ob- fuscate an image on Nexus 9 was 8.28 ms resulting in 22.9% and 2.4% overhead when classifying images on-device and on the cloud, respectively. Table 5.3 shows a randomly selected digit sequence from each user with the corre- sponding obfuscation. In many cases, the same digit is drawn differently by each user which makes it easy for the adversary to identify a user based on her digits. However, the obfuscated images of the same digit across different users look very similar, making it harder for the adversary to identify the user. On the other hand, the obfuscated images across different digits are different, allowing the classifier to easily distinguish different

digits. Even though the obfuscated images are visually very different from the unperturbed images, the classifier (trained only on unperturbed images) can still classify them with high accuracy. Note that we do not re-train the DL4Mobile classifier to recognize digits in the obfuscated images. Instead, the Obfuscator learns to obfuscate images such that it preserves features required by the DL4Mobile classifier for recognizing digits. Thus, the target app is able to complete the classification task with high accuracy on the obfuscated image without retraining.

FIGURE 5.5: Accuracy of App (in blue) and Attacker (in red) networks on obfuscated data while training the Obfuscator: (a) KTH, (b) StateFarm, (c) CIFAR10, (d) HAR, (e) OPPORTUNITY.

5.5.3 Evaluation on benchmark datasets

Figure 5.5 shows the accuracy of the App and the Attacker networks while training the Obfuscator. In the initial stage of learning, the output of the Obfuscator is somewhat random due to random initialization of network parameters. Hence, the accuracies of both networks are low. As the training proceeds, the Obfuscator learns to preserve information that is useful for the target app while hiding the private information. Over time, the

accuracy of the App network increases and saturates at some point across all the datasets. On the other hand, the accuracy of the Attacker remains low due to the obfuscation of the private attributes. This shows that the output of the Obfuscator is somewhat private to begin with due to the random initialization. All that the Obfuscator needs to learn is to produce output with the utility attributes that are required by the target application. Of course, the privacy loss ensures that the Obfuscator does not leak any private information in the output while learning to preserve the utility attributes.

In Table 5.1, we report the training time (in seconds) and the obfuscation time (per sample, in ms) averaged over 10 independent runs on a GPU. The training time varies from 2 minutes to a little over 2 hours, while the obfuscation time across all the datasets is always under 1 ms. The high variation in the training time across different datasets is due to the varying size of the training data and the complexity of the underlying networks. We argue that training time on the order of hours is acceptable since it is a one-time process that happens in the offline phase.

FIGURE 5.6: Classification accuracy of the target apps on unperturbed and perturbed data: (a) single target app, (b) multiple target apps.

Table 5.4: Accuracy of attackers on obfuscated data.

| Dataset | Unperturbed (A1) | Perturbed A1 | Perturbed A2 | Perturbed LR | Perturbed RF | Perturbed SVM |
|---|---|---|---|---|---|---|
| KTH | 0.83 | 0.01 | 0.005 | 0.07 | 0.06 | 0.07 |
| StateFarm | 0.9 | 0.05 | 0.01 | 0.1 | 0.1 | 0.12 |
| CIFAR10 | 0.5 | 0.04 | 0.03 | 0.07 | 0.03 | 0.06 |
| HAR | 0.7 | 0.008 | 0.008 | 0.09 | 0.09 | 0.02 |
| OPPORTUNITY | 0.75 | 0.11 | 0.13 | 0.16 | 0.43 | 0.17 |

5.5.4 Utility evaluation

We evaluate the utility of OLYMPUS by comparing the classification accuracy of the target app on the unperturbed data and the obfuscated data. The results are summarized in Figure 5.6a. In the case of the motion sensor data, the classification accuracies on unperturbed data and the obfuscated data are comparable. Thus, the functionality of the target app is preserved even after obfuscation. For image data, we see a slight drop in the accuracy. The maximum drop occurs in the case of the CIFAR10 dataset, which is about 17%. Note that in CIFAR10, we artificially added faces on to the images that may cover parts of the object. Due to this occlusion, it is hard to classify the objects correctly. This is also evident from the low classification accuracy on the unperturbed images of CIFAR10. In summary,

OLYMPUS preserves the functionality of the target app in the case of motion data, while achieving comparable accuracy in the case of image data.

5.5.5 Privacy evaluation

Traditional methods of protecting secrets (especially visual secrets) have focused on protecting secrets from human adversaries. In our case, the obfuscated data is imperceptible to humans and thus is as good as random noise to any human adversary, as evident from the examples given in Table 5.3. However, that does not mean it is secure against machines. Recently, researchers have shown that a DNN can be trained as an adversary to recover hidden information in obfuscated images [89]. Thus, we employ similar attacks by training five ML models to evaluate the privacy of our mechanism.

• The first attacker (A1) is a DNN similar to the Attacker network that is used in training OLYMPUS.

• The second attacker (A2) is also a DNN similar to A1, but with an additional layer. Using A2 we attempt to simulate a more complex (and possibly stronger) adversary than the one used in training OLYMPUS.

• The remaining three attackers are logistic regression (LR), random forest (RF), and support vector machine (SVM).

All attackers are trained on obfuscated data with the corresponding privacy labels, to classify a given obfuscated input into the correct class. Both A1 and A2 are trained using the same parameters (e.g. epochs, optimizer, etc.) that we used to train the Obfuscator. For the remaining three attackers, we set the regularization parameter of LR to 1, the number of estimators of RF to 10, and used an RBF kernel with c=1 for SVM. In the case of image datasets, we train these three attackers on the HOG (Histogram of Oriented Gradients) feature representation computed as described in [57].

Table 5.4 summarizes the results across various attackers and datasets. The standard error is within (±0.09) across all the reported results. On unperturbed data, A1 achieves significant improvement over random guessing. In fact, A1 is able to infer almost all the private information (100% accuracy in classifying private attributes), across all the datasets. Compared to this, OLYMPUS offers a significant improvement in protecting private information against all the attackers across all the datasets. Both A1 and A2 perform poorly (<5% improvement in accuracy over random guessing) in inferring private information from the obfuscated data except in the case of the OPPORTUNITY dataset. Thus, the obfuscation mechanism successfully protects private information against such adversaries, namely DNNs.

On the other hand, the traditional attackers (LR, RF and SVM) perform better than

DNN attackers. This is because the underlying attacker model of OLYMPUS is DNN.

FIGURE 5.7: Comparison with existing approaches: (a) results on KTH (left), StateFarm (center) and CIFAR10 (right) datasets; (b) obfuscation time.

Moreover, in the case of images, the inputs to these attackers are sophisticated HOG features that are specially designed to capture various patterns pertaining to human detection. Even though we do not explicitly train against traditional attackers, OLYMPUS protects against those adversaries to some extent. For all the attackers, the accuracy of inferring private information is <17%, except in the case of RF on the OPPORTUNITY dataset. This is because the OPPORTUNITY dataset contains many different sensors, which makes it harder to obfuscate. In the future, we plan to investigate this further and aim to improve our mechanism for multimodal data.
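The traditional attackers described above can be reproduced with scikit-learn along the following lines; the HOG parameters shown are illustrative defaults rather than the exact settings of [57].

```python
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def hog_features(images: np.ndarray) -> np.ndarray:
    # One HOG descriptor per grayscale image
    return np.stack([hog(img, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                     for img in images])

def train_traditional_attackers(obfuscated_images, privacy_labels):
    feats = hog_features(obfuscated_images)
    attackers = {
        "LR": LogisticRegression(C=1.0, max_iter=1000),   # regularization parameter 1
        "RF": RandomForestClassifier(n_estimators=10),    # 10 estimators
        "SVM": SVC(kernel="rbf", C=1.0),                  # RBF kernel, c = 1
    }
    for clf in attackers.values():
        clf.fit(feats, privacy_labels)
    return attackers
```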

5.5.6 Privacy-utility tradeoff

We use image datasets to analyze how OLYMPUS balances the privacy-utility tradeoff and compare our results with the following existing approaches.

• Blur: Blurring is a popular mechanism to hide private information in images and videos [39]. It is a convolution operation that smooths the image by applying a Gaussian kernel. The size of the kernel determines the amount of blurring. Hence, increasing the kernel size gives more privacy.

• Mosaic: Mosaicing is also an averaging operation that applies a grid over an image and replaces the values of all the pixels overlapping a cell with the average value of all the pixels that fall in that cell. Increasing the cell size results in averaging over a larger region and thus provides more privacy.

• AdvRep: AdvRep refers to a recent work on protecting visual secrets using a GAN [103]. The main idea is to hide the private object by minimizing the reconstruction error and a privacy loss given by an attacker network. It is not trained towards preserving the utility of any particular app. Like OLYMPUS, AdvRep also has a parameter λ that controls the privacy-utility tradeoff. A similar technique has also been used to hide sensitive text in images [62].
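For reference, the Blur and Mosaic baselines are only a few lines each with OpenCV; this is a minimal sketch, with the kernel and cell sizes acting as the privacy knobs described above.

```python
import cv2
import numpy as np

def blur(image: np.ndarray, kernel_size: int = 25) -> np.ndarray:
    # Gaussian blur; larger (odd) kernels give more privacy
    k = kernel_size if kernel_size % 2 == 1 else kernel_size + 1
    return cv2.GaussianBlur(image, (k, k), 0)

def mosaic(image: np.ndarray, cell_size: int = 25) -> np.ndarray:
    # Average each cell by shrinking the image, then scale back up with nearest neighbour
    h, w = image.shape[:2]
    small = cv2.resize(image, (max(1, w // cell_size), max(1, h // cell_size)),
                       interpolation=cv2.INTER_AREA)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
```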

We evaluate the above-mentioned methods as well as OLYMPUS on three image datasets with the goals of protecting the respective private attributes and preserving the functionality of the corresponding target app, i.e., the classifier. We obfuscate images using each method and then evaluate the obfuscation in terms of the privacy and utility metrics defined previously. For the utility measure, we use the classification accuracy of the target app, i.e., the utility score. We measure privacy using the attacker A1, and report the privacy score as (1 − attacker's score) for ease of representation. Thus, higher values mean higher privacy, which makes it easy to compare against the utility score. To generate different degrees of obfuscation, we vary the kernel (cell) size from 3 to 50 in the case of the Blur (Mosaic) method, and vary λ from 0 to 1 in the case of AdvRep and OLYMPUS. The resulting privacy-utility tradeoff graph is shown in Figure 5.7, where

FIGURE 5.8: Learning obfuscation when sensitive and useful properties are correlated.

* indicates the privacy and utility of the unperturbed data. OLYMPUS outperforms other methods in terms of providing the best privacy-utility balance across all the datasets. In particular, the Blur and Mosaic methods provide either high privacy or high utility, but not both. The AdvRep mechanism provides a good tradeoff in the case of CIFAR10 but fails on the other two datasets. This is because it learns to reconstruct the given image while removing the private information from it. This strategy works well when the private information is separate from the useful information, such as faces vs. objects in the case of CIFAR10. But it does not perform well when the private and useful information are blended, as in the case of the other two datasets (identity vs. activity).

An interesting observation is that OLYMPUS strives to achieve high privacy even when λ = 1, i.e., when the privacy loss is removed from the optimization (Equation 5.3). For instance, in the case of the KTH dataset, OLYMPUS provides high privacy irrespective of the value of λ. This is because the features learned by the target app for activity recognition are not very useful for person recognition. Since OLYMPUS only attempts to learn the features used by the target app, the output does not contain enough information to identify people.

5.5.7 Effect of correlation

Handling correlation between private and app-essential (utility) information is a major challenge in designing any obfuscation mechanism. To understand the effect of such correlation, we evaluate OLYMPUS on synthetically generated data where we carefully control

the correlation between private and utility attributes.

Synthetic data generation: We generate a two-dimensional synthetic dataset sampled from a normal distribution with a specific mean and a covariance matrix. We fix the mean to (0,0) and vary the covariance matrix to control the degree of correlation between the two dimensions. One can think of these two dimensions as features fs and fu representing the private and utility attributes, respectively. Both attributes are binary (positive and negative), and their values for a data point are computed from their respective features as follows. For each data point, we say it belongs to the positive privacy (utility) class if the value of fs (fu) is positive, and to the negative class otherwise. We vary the value of the correlation factor between the two features from 0 (no correlation) to 1 (highest correlation). For each correlation factor, we generate 10000 samples with their privacy and utility labels to evaluate OLYMPUS.

Network architecture: Both the App and the Attacker networks comprise a fully connected (FC) layer with 64 nodes followed by another FC layer with 32 nodes. The Obfuscator network consists of three FC layers with 64, 32 and 64 nodes, respectively. We use ReLU activation and dropout (p=0.25) at each layer in all the networks.

Results: We split the data into train, validation and test sets based on 80%, 10% and 10% splits, respectively. We train OLYMPUS using the Adam optimizer with a learning rate of 0.0001 for 100 epochs and report the results averaged over 10 runs. The privacy-utility tradeoff achieved by OLYMPUS with different degrees of correlation is given in Figure 5.8. We can see that when there is little or no correlation between the private and utility attributes,

OLYMPUS achieves high privacy and high utility. As the correlation increases, OLYMPUS degrades one at the expense of the other depending on the specified tradeoff parameter λ.

In summary, the results show that OLYMPUS gracefully handles correlation among private and app-essential information in the data, and allows users to control the privacy-utility tradeoff.
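The synthetic data generation described above amounts to sampling a correlated bivariate Gaussian and thresholding each coordinate; a minimal NumPy sketch:

```python
import numpy as np

def make_synthetic(correlation: float, n: int = 10000, seed: int = 0):
    """Two correlated features f_s, f_u; labels are the signs of the features."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, correlation],
           [correlation, 1.0]]
    x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    privacy_labels = (x[:, 0] > 0).astype(int)   # sign of f_s
    utility_labels = (x[:, 1] > 0).astype(int)   # sign of f_u
    return x, utility_labels, privacy_labels
```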

5.5.8 Obfuscation time

We showed that OLYMPUS outperforms simple obfuscation methods like blurring and mosaicing. However, the gain in accuracy comes at the cost of the overhead of learning the mechanism as well as applying the obfuscation at run time. As mentioned before, the overhead of learning is reasonable as it is a one-time operation. Here, we measure the overhead of applying the obfuscation mechanism at run time. Figure 5.7b shows the time (ms) it takes to obfuscate an image using the above-mentioned methods. For Blur (Mosaic), we fix the kernel (cell) size to 25, and fix λ to 0.5 for AdvRep

and OLYMPUS. Mosaicing is the fastest among all since it only involves a single averaging operation per cell. Blurring is the second fastest method, taking slightly more time due to the application of a Gaussian kernel. Both AdvRep and OLYMPUS take a similar amount of time across all the datasets. Overall, the obfuscation time is proportional to the size of the image across all the methods. OLYMPUS takes about 0.15 ms to obfuscate an image of size 80 x 60 (KTH dataset), which is sufficiently good for real-time processing.

5.5.9 Scaling to multiple applications

To understand how well OLYMPUS scales with multiple apps, we perform the following experiment. For each image dataset, we train a classifier per utility class in one-vs-rest fashion. Each of these classifiers is considered as a target app that is interested in the corresponding utility class. Thus, we have in total 6, 10 and 10 classifiers for the KTH, StateFarm and CIFAR10 datasets, respectively. For each dataset, we train OLYMPUS with the appropriate set of classifiers as target apps. The App module queries each classifier to compute the respective utility loss and averages it over all the classifiers.

From Figure 5.6b, we can see that the classification accuracy on obfuscated data is comparable to the accuracy on unperturbed data across all image datasets. When comparing these results with the results from Figure 5.6a (single target app), we see that the

Table 5.5: Evaluating OLYMPUS using LR as an app classifier.

| Dataset | Utility (LR) Unperturbed | Utility (LR) Perturbed | Privacy (DNN) Unperturbed | Privacy (DNN) Perturbed |
|---|---|---|---|---|
| KTH | 0.51 | 0.49 | 0.20 | 0.08 |
| StateFarm | 0.43 | 0.74 | 0.9 | 0.05 |
| CIFAR10 | 0.46 | 0.45 | 0.17 | 0.06 |
| HAR | 0.93 | 0.95 | 0.5 | 0.02 |
| OPPORTUNITY | 0.75 | 0.86 | 0.75 | 0.18 |

accuracy increases significantly in the case of perturbed as well as unperturbed images. This is not surprising, since each classifier is responsible for classifying only a single utility class and hence its task is much simpler.

To evaluate privacy, we measure the accuracy of attacker (A1) on the obfuscated data

learned by OLYMPUS when trained with multiple apps and compare it with the results we got in the single-app setting. We found a moderate increase in the attacker's accuracy in the multiple-apps case. The attacker's accuracy increased from 0.01, 0.05 and 0.04 to 0.04, 0.06 and 0.05, for the KTH, StateFarm and CIFAR10 datasets, respectively. Note that having multiple target apps only increases the overhead linearly in terms of querying the target apps to compute the utility loss. It does not affect the online phase since we learn a single obfuscation mechanism that works with all the target apps.
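Averaging the utility loss over several one-vs-rest classifiers is a small change to the single-app objective; the sketch below assumes `app_nets` is a list of frozen Keras classifiers, each outputting two-class probabilities, and `y_onevsrest` is the n × K binary label matrix.

```python
import tensorflow as tf

def multi_app_utility_loss(app_nets, x_obf, y_onevsrest, eps=1e-12):
    """Average cross-entropy utility loss over all one-vs-rest target classifiers.

    The result replaces the single-app utility term when training the Obfuscator.
    """
    losses = []
    for k, net in enumerate(app_nets):
        p = net(x_obf)                                   # shape (n, 2): rest vs. class k
        labels = tf.cast(y_onevsrest[:, k], tf.int32)
        idx = tf.stack([tf.range(tf.shape(labels)[0]), labels], axis=1)
        losses.append(-tf.reduce_mean(tf.math.log(tf.gather_nd(p, idx) + eps)))
    return tf.add_n(losses) / float(len(app_nets))
```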

5.5.10 App classifiers

So far, we have evaluated OLYMPUS only using a DNN as the app classifier. To see how well OLYMPUS performs with different kinds of app classifiers, we change the app classifier to a logistic regression (LR) model. Unlike a DNN, LR takes features as inputs instead of raw sensor data. Thus, for image datasets, we use HOG features as inputs to the app classifier. In other words, OLYMPUS learns to obfuscate features instead of raw images. Since the motion sensor datasets already consist of features, we directly use them as inputs to OLYMPUS.

We evaluated OLYMPUS on all the benchmark datasets with LR as the app classifier.

The results are given in Table 5.5. The standard error is within (±0.03) across all the reported results. The utility on the unperturbed data is low compared to the DNN classifiers across all datasets. This is not surprising given that a DNN is a more powerful model than LR. However, what is surprising is that on some datasets the utility improves significantly on the obfuscated data. We believe this is due to the Obfuscator learning better features in an attempt to minimize the utility loss. Since we are using a DNN as the adversary, the privacy achieved by OLYMPUS is comparable to the previous privacy evaluation results.

5.6 Related Work

In this section, we summarize existing approaches of protecting private information in sensor data. Many prior methods of protecting visual secrets primarily rely on computer vision algorithms to classify images or objects within the images as sensitive (non-sensitive) and hide (reveal) their presence [41, 75, 99, 104, 111]. Unlike OLYMPUS, these methods do not provide an efficient way to balance the privacy-utility tradeoff and are prone to leak private information against correlation attacks [89, 94].

Erdogdu [64] proposed a privacy preserving mapping for time-series data using an information-theoretic approach to optimize for the statistical privacy-utility tradeoff. Although this approach provides stronger guarantees, in practice, solving such an optimization problem is hard in many scenarios. Shokri et al. [109] proposed an optimal user-centric data obfuscation mechanism based on a Stackelberg game. The obfuscation mechanism is a solution of a carefully constructed linear program that minimizes utility loss under specified privacy constraints. This approach works well on certain types of data, such as location trajectories, that are easy to discretize. However, it is not clear how we can use such a mechanism on continuous data such as images. In [95], the authors proposed an approach of learning adversarial perturbations based on a user-recognizer game. Although the idea of an adversarial game is common to our work, the user-recognizer game model restricts players to play

from a fixed set of strategies and hence is limited. On the contrary, OLYMPUS does not fix the strategy of obfuscation/attack, allowing the obfuscator/attacker to automatically learn the optimal strategy based on the specified privacy-utility requirements.

SenseGen [42] uses a GAN to generate synthetic sensor data to protect users' privacy. However, it does not provide any privacy guarantees on the generated synthetic data. The approach proposed in [72] is not applicable in our setting as it uses generative models that are tailored to the statistics of the datasets, and applying it to real-life signals (such as images) is an open problem. AttriGuard [78] uses adversarial perturbations to defend against an attacker that attempts to infer private attributes from the data. By defeating a particular attacker, AttriGuard protects against other similar attackers based on the principle of transferability. In our work, we take a different approach of adversarial learning to ensure that the private data is protected against all ML attackers belonging to a specific class.

Replacement AutoEncoder (RAE) [88] learns a transformation that replaces discriminative features that correspond to sensitive inferences by features that are more observed in the non-sensitive inferences. This approach only works when the sensitive and non-sensitive data are clearly separated. On the contrary, OLYMPUS also handles cases when there is a high overlap between sensitive and non-sensitive data. Moreover, RAE does not protect against an adversary who has knowledge of the original gray-listed data, i.e., non-sensitive inferences. On the other hand, OLYMPUS protects against an adversary who has complete access to the training data as well as the obfuscation mechanism.

Our work is closely related to the recently proposed obfuscation techniques that formulate the obfuscation problem as a minimax optimization [52, 62, 87, 97, 100, 103]. Malekzadeh et al. [87] introduce the Guardian-Estimator-Neutralizer (GEN) framework. The Guardian learns to obfuscate the sensor data and the Estimator guides the Guardian by inferring sensitive and non-sensitive information from the obfuscated data. Unlike the

Attacker module in OLYMPUS, the Estimator is pretrained and fixed during the optimization process. Thus, the learned obfuscation defends against a specific attacker, namely the

pretrained Estimator. On the other hand, the Attacker in OLYMPUS continuously evolves with the Obfuscator, resulting in an obfuscation mechanism that defends against a class of attackers. PPRL-VGAN [52] learns a privacy preserving transformation of faces that hides the identity of a person while preserving the facial expression. Given a face image of a person, it synthesizes an output face with a randomly chosen identity from a fixed dataset while preserving the facial expression of the input image. SGAP [97] uses Siamese networks to identify discriminative features related to identity and perturbs them using adversarial networks. A similar idea is proposed in [100], where Siamese networks are used to learn embeddings that are non-discriminatory for sensitive information, making it harder for the adversary to learn sensitive information from the embeddings. Adversarial networks are also used to learn an obfuscation mechanism to hide text [62] and QR-codes [103] in images.

All of these approaches focus on learning a transformation that preserves some property of the data without considering any target apps. On the contrary, OLYMPUS learns an obfuscation mechanism that seamlessly works with the existing apps without modifying them. We also demonstrated the feasibility of OLYMPUS by developing a prototype implementation that runs on a smartphone, and evaluated it against a real-world app.

5.7 Conclusion

We proposed a privacy framework, OLYMPUS, to learn a utility-aware obfuscation that protects private user information in image and motion sensor data. We showed that such a mechanism can be constructed given the training data, and that the obfuscated data works well with the target third-party apps without modifying the apps.

We implemented OLYMPUS by instrumenting Android OS and evaluated the obfuscation mechanism on a handwriting recognition app. We showed that OLYMPUS successfully protects the identity of users without compromising the digit classification accuracy of the app. We also evaluated OLYMPUS on three image datasets and two sensor datasets containing readings from various motion sensors. For each dataset, we showed that OLYMPUS significantly reduced the risk of disclosing private user information. At the same time, it preserved the useful information that enabled the target app to function smoothly even on the obfuscated data. We verified the privacy guarantees using a number of ML adversaries that are trained to defeat the obfuscation. We also compared our approach with existing approaches of protecting visual secrets and demonstrated that OLYMPUS provides better control over the privacy-utility tradeoff.

6 Conclusion

In this dissertation, we proposed and evaluated privacy frameworks that give users fine-grained control over what information they want to share with third-party apps on smartphones.

In Chapter 3, we made the case for a plugin-driven architecture for permissions frameworks in smartphones. First, we argued that the existing permissions frameworks are mainly developed for a specific use case, and it is not viable to design a monolithic permissions framework that caters to the varying needs of users and apps. The existing plugin-based frameworks require plugins to either run in the OS or in the address space of the app. We showed that both of these approaches have drawbacks. A malicious/buggy plugin running in the OS can easily compromise the entire system. Similarly, a malicious/buggy plugin running in the app's address space can exfiltrate the app's sensitive data. To avoid these drawbacks, we proposed permissions plugins as apps that run in their own address space. This way, we not only prevent apps from circumventing the plugins, but also limit the damage a malicious/buggy plugin can cause.

Based on these principles, we developed DALF, an extensible permissions framework for Android. Our prototype implementation runs on Android 8.1 and supports arbitration

119 of app’s access to location, contacts, calendar, camera and external storage. For each of the resource type, we developed permissions plugins that govern the use of the data

with complex policies. We also demonstrated the flexibility of DALF by expressing two existing privacy mechanisms as plugins – a plugin to perturb location data using geo-indistinguishability [43], and a plugin to protect visual secrets on two-dimensional surfaces

using PRIVATEEYE [104]. Through evaluation on microbenchmarks, we demonstrated that DALF exhibits low performance overheads when using plugins to mediate apps' access to resources. Our case study of using DALF with real-world apps has shown promising results for its adoption by mobile operating systems.

privacy framework WAVEOFF that gives users fine-grained control over what visual information apps can access through a device's camera. WAVEOFF allows users to specify safe objects in a camera's view via easy-to-use markers. It then develops a model to efficiently detect safe regions in a video feed and only delivers the content within the safe regions in

subsequent frames. We instrumented Android’s camera service to implement WAVEOFF. Our prototype implementation runs on Nexus 5 and provides privacy preserving access to the camera in near real-time (at a median rate of 20 FPS) with acceptable accuracy.

Our evaluation on a benchmark of representative videos showed that WAVEOFF provides strong privacy without compromising apps’ functionality. We also conducted a user study

to evaluate the usability of WAVEOFF; participants found camera operations with WAVEOFF quick and easy to use.

OLYMPUS, a utility-aware obfuscation mechanism that hides the user-specified private information but preserves the utility-centric information required by the target app. We showed that such a mechanism can be constructed using a game-theoretic approach. We also showed that the obfuscated data works well with the target apps without modifying

the apps. OLYMPUS is implemented by instrumenting Android OS running on a Nexus 9

tablet. We showed that OLYMPUS successfully protects the identity of users without compromising the digit classification accuracy of a real-world handwriting recognition app downloaded from the Google Play store. Our extensive evaluation on three image datasets and two motion sensor datasets shows that OLYMPUS significantly reduced the risk of disclosing private user information while preserving the useful information that enabled the target apps to function smoothly even on the obfuscated data.

6.1 Future Work

Below, we briefly discuss future directions for advancing the ideas presented in this dissertation.

Plugin composition

Consider the following scenario: Alice is video chatting on her phone with collaborators. She would like to use her whiteboard to work out a problem, and hence would like her collaborators to see it. However, she has a confidential product roadmap on her whiteboard that her collaborators should not see. She also has a prototype model of a newly designed drone on her desk, which she would like to show to her collaborators, but she is worried that they may also see a bottle of medication lying on her desk. So she wants to use a PRIVATEEYE plugin to share her whiteboard without revealing the product roadmap, and a WAVEOFF plugin to share the drone prototype without disclosing the medication bottle.

This use case demonstrates the need to apply multiple plugins to an app's request, which leads to an important research question: how do we compose multiple plugins?

One way to add such support in DALF is to compose plugins sequentially. Each plugin is invoked in turn and given the data output by the previously invoked plugin, where the first plugin receives the original data. The output of the last plugin is then given to the app. This strategy works well when the composition is simple. For example, consider a face-blur plugin that blurs people's faces and a license-blur plugin that blurs the license plates of vehicles. One can easily compose these two plugins sequentially, and the final output will have both the faces and the license plates blurred.

However, sequential composition may not yield the correct result in the video chat scenario described above. Assume we apply the PRIVATEEYE plugin followed by the WAVEOFF plugin. The output of the PRIVATEEYE plugin will block everything (including the prototype drone) except the region of the whiteboard explicitly marked by the privacy marker.

Consequently, the WAVEOFF plugin will not be able to detect the drone in the camera frame received from the PRIVATEEYE plugin. Not only that, it will also block the entire whiteboard, including the area revealed by the PRIVATEEYE plugin. Thus, the final output frame will be completely black!

We need a systematic approach to composing multiple plugins based on their properties. This requires answering many questions, such as: What should be the order of composition? How does the order impact the privacy-utility guarantees of each plugin? How do we reason about the protections applied to apps under plugin composition?
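For reference, the naive sequential strategy discussed above can be sketched in a few lines. The sketch below is written in Python purely for brevity; the Plugin type and the compose_sequentially helper are illustrative assumptions, not part of DALF's actual interface.

# Illustrative sketch of the sequential strategy; the Plugin type and the
# compose_sequentially helper are hypothetical, not part of DALF's API.
from typing import Callable, List

# Model a plugin as a function that transforms a resource payload
# (e.g., a camera frame) into a filtered payload of the same kind.
Plugin = Callable[[bytes], bytes]

def compose_sequentially(plugins: List[Plugin], original: bytes) -> bytes:
    """Invoke each plugin in turn on the output of its predecessor."""
    data = original
    for plugin in plugins:
        data = plugin(data)
    return data  # the last plugin's output is what the app receives

With this sketch, compose_sequentially([private_eye, wave_off], frame) exhibits exactly the problem described above: the second plugin only ever sees what the first chose to reveal, so a richer composition operator would likely need access to more than each plugin's final output.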

Contextual permissions

The decision to grant an app access to a particular resource depends heavily on the context in which the access is requested [96, 116]. For example, a user might be comfortable with an app accessing her location while she is using the app, but not when the app is in the background. Here, the decision to grant access to the user's location depends on the state of the app, i.e., foreground vs. background. The current prototype of DALF can be easily extended to provide such simple contextual information to plugins so that they can make an informed decision. However, supporting complex contextual information is a challenging problem. For example, a user might want to provide her true location to an app only when she is using it for navigation; in all other cases, the app should receive an obfuscated location. Here, the contextual information is the use of the app for navigation, which is very difficult to infer. This brings us to the question of understanding app semantics, i.e., reasoning about why an app is requesting a particular resource. Ideas from recent work on code analysis [53, 54] and on understanding users' intentions [66, 96] can be used to infer app semantics.

A downside of providing rich contextual information to plugins is that plugins then have more information about users and their surroundings. A malicious plugin may request sensitive user information under the pretense of monitoring the context. Contextual information such as a user's location or the apps she is using allows the plugin to profile her, which may compromise her privacy. To address this issue, we need to develop novel solutions that balance users' privacy against plugins' requirements for contextual information.

We hope that the ideas presented in this dissertation will help develop a truly extensible permissions framework that could be adopted in future versions of mobile operating systems.

Appendix A

Supporting Materials for OLYMPUS

A.1 Neural Network Architectures

In this section, we provide details of the underlying networks used in the evaluation of OLYMPUS.

A.1.1 OLYMPUS for images

Figure 5.2 shows the architecture of the underlying networks of OLYMPUS for image data. The convolutional neural network (CNN) is a popular architecture for many computer vision tasks, including image classification. The Attacker takes as input an obfuscated image (the output of the Obfuscator) and classifies it into one of the privacy classes. Hence, we construct the Attacker using a CNN architecture. The input is connected to a series of convolutional layers. We apply a ReLU (rectified linear unit) activation followed by a max-pooling operation to the output of each convolutional layer. The final convolutional layer is followed by a few fully connected layers with ReLU activations, except the last layer, which uses a softmax activation to output class probabilities. The number of nodes in the last layer is equal to the number of privacy classes.

The Obfuscator net follows the architecture of a standard autoencoder. In particular, it consists of a series of convolutional layers followed by a fully connected layer (the code layer), which in turn is followed by a series of deconvolutional layers. The input to the Obfuscator is an unperturbed image, and the output is an obfuscated image of the same size. All layers use ReLU activations except the output layer, which uses a sigmoid to keep the output in [0,1]. As a preprocessing step, we normalize images to [0,1], which is standard practice in computer vision.

Below, we provide the complete specifications of the networks used in evaluating OLYMPUS on the image datasets.

KTH App Net

1. Input: 60 x 80 x 1 obfuscated image
2. Conv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
3. MaxPool (size=(2,2))
4. Conv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
5. MaxPool (size=(2,2))
6. Dense (n=32,activation=ReLU)
7. Dense (n=6,activation=softmax)

Attacker Net

1. Input: 60 x 80 x 1 obfuscated image
2. Conv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
3. MaxPool (size=(2,2))
4. Conv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
5. MaxPool (size=(2,2))
6. Dense (n=32,activation=ReLU)
7. Dense (n=6,activation=softmax)

Obfuscator Net

1. Input: 60 x 80 x 1 unperturbed image
2. Conv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
3. MaxPool (size=(2,2))
4. Conv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
5. MaxPool (size=(2,2))
6. Dense (n=8,activation=ReLU)
7. Dense (n=300,activation=ReLU)
8. Reshape (size=(15,20))
9. Deconv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
10. UpSample2D (size=(2,2))
11. Deconv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
12. UpSample2D (size=(2,2))
13. Deconv2D (filters=1,size=(3,3),stride=1,activation=sigmoid)
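As a concrete illustration, the KTH Obfuscator specification above could be realized in Keras roughly as in the following minimal sketch. The 'same' padding mode, the Flatten layer before the code layer, and the trailing channel dimension in the Reshape are assumptions; the listing above does not spell them out.

# A minimal Keras sketch of the KTH Obfuscator specification above.
# Padding mode ('same'), the Flatten layer, and the trailing channel
# dimension in the Reshape are assumptions not stated in the listing.
from tensorflow.keras import layers, models

def build_kth_obfuscator():
    inp = layers.Input(shape=(60, 80, 1))  # unperturbed image
    x = layers.Conv2D(32, (3, 3), strides=1, padding='same', activation='relu')(inp)
    x = layers.MaxPooling2D((2, 2))(x)     # 30 x 40
    x = layers.Conv2D(64, (3, 3), strides=1, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D((2, 2))(x)     # 15 x 20
    x = layers.Flatten()(x)
    x = layers.Dense(8, activation='relu')(x)     # code layer
    x = layers.Dense(300, activation='relu')(x)
    x = layers.Reshape((15, 20, 1))(x)
    x = layers.Conv2DTranspose(64, (3, 3), strides=1, padding='same', activation='relu')(x)
    x = layers.UpSampling2D((2, 2))(x)     # 30 x 40
    x = layers.Conv2DTranspose(32, (3, 3), strides=1, padding='same', activation='relu')(x)
    x = layers.UpSampling2D((2, 2))(x)     # 60 x 80
    out = layers.Conv2DTranspose(1, (3, 3), strides=1, padding='same', activation='sigmoid')(x)
    return models.Model(inp, out, name='kth_obfuscator')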

StateFarm App Net

1. Input: 56 x 56 x 1 obfuscated image
2. Conv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
3. MaxPool (size=(2,2))
4. Conv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
5. MaxPool (size=(2,2))
6. Dense (n=32,activation=ReLU)
7. Dense (n=10,activation=softmax)

Attacker Net

1. Input: 56 x 56 x 1 obfuscated image
2. Conv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
3. MaxPool (size=(2,2))
4. Conv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
5. MaxPool (size=(2,2))
6. Dense (n=32,activation=ReLU)
7. Dense (n=10,activation=softmax)

Obfuscator Net

1. Input: 56 x 56 x 1 unperturbed image
2. Conv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
3. MaxPool (size=(2,2))
4. Conv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
5. MaxPool (size=(2,2))
6. Dense (n=8,activation=ReLU)
7. Dense (n=196,activation=ReLU)
8. Reshape (size=(14,14))
9. Deconv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
10. UpSample2D (size=(2,2))
11. Deconv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
12. UpSample2D (size=(2,2))
13. Deconv2D (filters=1,size=(3,3),stride=1,activation=sigmoid)

CIFAR10 App Net

1. Input: 32 x 32 x 1 obfuscated image
2. Conv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
3. Conv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
4. MaxPool (size=(2,2))
5. Dropout (p=0.25)
6. Conv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
7. Conv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
8. MaxPool (size=(2,2))
9. Dropout (p=0.25)
10. Dense (n=512,activation=ReLU)
11. Dropout (p=0.5)
12. Dense (n=10,activation=softmax)

Attacker Net

1. Input: 32 x 32 x 1 obfuscated image
2. Conv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
3. MaxPool (size=(2,2))
4. Conv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
5. MaxPool (size=(2,2))
6. Dense (n=32,activation=ReLU)
7. Dense (n=2,activation=softmax)

Obfuscator Net

1. Input: 32 x 32 x 1 unperturbed image
2. Conv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
3. MaxPool (size=(2,2))
4. Conv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
5. MaxPool (size=(2,2))
6. Dense (n=8,activation=ReLU)
7. Dense (n=64,activation=ReLU)
8. Reshape (size=(8,8))
9. Deconv2D (filters=64,size=(3,3),stride=1,activation=ReLU)
10. UpSample2D (size=(2,2))
11. Deconv2D (filters=32,size=(3,3),stride=1,activation=ReLU)
12. UpSample2D (size=(2,2))
13. Deconv2D (filters=1,size=(3,3),stride=1,activation=sigmoid)

A.1.2 OLYMPUS for motion sensors

For the motion sensor data, we use a deep neural network (DNN) with fully connected layers, as shown in Figure 5.3. We assume that the motion sensor data is available as a series of vectors, with each vector having corresponding utility (e.g., activity) and privacy (e.g., person identity) labels. Each vector can contain raw sensor readings, or appropriate features computed using a sliding-window approach. We normalize the sensor data to [-1,1] as a preprocessing step. Without loss of generality, we refer to the preprocessed sensor data as features.

The Obfuscator net takes these features as input and outputs the corresponding obfuscated features. The network follows an autoencoder architecture and is constructed using a series of fully connected layers with ReLU activations. We apply a tanh activation to the output of the final layer.

The Attacker network takes an obfuscated feature vector as input. It consists of fully connected layers with ReLU activations, except the final layer, which uses a softmax activation. The size of the final layer of the Attacker network is equal to the number of privacy classes.

Below, we provide the complete specifications of the networks used in evaluating OLYMPUS on the motion sensor datasets.

HAR App Net

1. Input: 561 x 1 obfuscated feature vector
2. Dense (n=128,activation=ReLU)
3. Dense (n=6,activation=softmax)

Attacker Net

1. Input: 561 x 1 obfuscated feature vector
2. Dense (n=128,activation=ReLU)
3. Dense (n=30,activation=softmax)

Obfuscator Net

1. Input: 561 x 1 unperturbed feature vector
2. Dense (n=64,activation=ReLU)
3. Dense (n=8,activation=ReLU)
4. Dense (n=64,activation=ReLU)
5. Dense (n=561,activation=tanh)
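Similarly, the HAR Obfuscator and Attacker listed above might be expressed in Keras as in the sketch below; the use of the Sequential API and the model names are illustrative assumptions rather than a statement of how OLYMPUS was actually implemented.

# A minimal Keras sketch of the HAR Obfuscator and Attacker specifications above;
# the Sequential API and the model names are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

def build_har_obfuscator() -> keras.Model:
    return keras.Sequential([
        keras.Input(shape=(561,)),             # unperturbed feature vector
        layers.Dense(64, activation='relu'),
        layers.Dense(8, activation='relu'),    # bottleneck (code) layer
        layers.Dense(64, activation='relu'),
        layers.Dense(561, activation='tanh'),  # obfuscated features in [-1, 1]
    ], name='har_obfuscator')

def build_har_attacker() -> keras.Model:
    return keras.Sequential([
        keras.Input(shape=(561,)),               # obfuscated feature vector
        layers.Dense(128, activation='relu'),
        layers.Dense(30, activation='softmax'),  # one output per privacy class (subject identity)
    ], name='har_attacker')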

OPPORTUNITY App Net

1. Input: 242 x 1 obfuscated feature vector
2. Dense (n=128,activation=ReLU)
3. Dense (n=5,activation=softmax)

Attacker Net

1. Input: 242 x 1 obfuscated feature vector
2. Dense (n=128,activation=ReLU)
3. Dense (n=4,activation=softmax)

Obfuscator Net

1. Input: 242 x 1 unperturbed feature vector
2. Dense (n=64,activation=ReLU)
3. Dense (n=8,activation=ReLU)
4. Dense (n=64,activation=ReLU)
5. Dense (n=242,activation=tanh)

Bibliography

[1] Adblock Plus. https://adblockplus.org.

[2] Android Developers: android.hardware.camera2. https://developer.android.com/reference/android/hardware/camera2/package-summary.

[3] Android Developers: Content providers. https://developer.android.com/guide/topics/providers/content-providers.

[4] Android Developers: dumpsys. https://developer.android.com/studio/command-line/dumpsys.

[5] Android Developers: Permissions overview. https://developer.android.com/guide/topics/permissions/overview.

[6] Android Developers: Profile battery usage with Batterystats and Battery Historian. https://developer.android.com/studio/profile/battery-historian.

[7] Android Developers: Who lives and who dies? Process priorities on Android. https://medium.com/androiddevelopers/who-lives-and-who-dies-process-priorities-on-android-cb151f39044f.

[8] Android Source: BufferQueue and gralloc. https://source.android.com/devices/graphics/arch-bq-gralloc.

[9] Android Source: Source Code Tags and Builds. https://source.android.com/setup/start/build-numbers#source-code-tags-and-builds.

[10] Android's BitTube source. https://android.googlesource.com/platform/frameworks/native/+/android-8.1.0_r1/libs/gui/BitTube.cpp.

[11] App Store. https://www.apple.com/ios/app-store.

[12] [APP][XPOSED][6.0+] XPrivacyLua - Android privacy manager. https://forum.xda-developers.com/xposed/modules/xprivacylua6-0-android-privacy-manager-t3730663.

[13] AT&T Database of Faces. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

[14] Deep learning for mobile: dl4mobile. https://play.google.com/store/apps/details?id=com.nalsil.tensorflowsimapp&hl=en.

[15] F-Droid: an installable catalogue of FOSS (Free and Open Source Software) apps. https://f-droid.org/en/.

[16] Ghostery Goes Open Source. https://www.ghostery.com/press/ghostery-goes-open-source/.

[17] Google Cloud AI. https://cloud.google.com/products/machine-learning/.

[18] Google Play. https://play.google.com/store?hl=en.

[19] Google Play Protect. https://www.android.com/play-protect.

[20] Grubhub. https://play.google.com/store/apps/details?id=com.grubhub.android.

[21] HTTPS Everywhere. https://www.eff.org/https-everywhere.

[22] Human activity recognition using CNN. https://github.com/aqibsaeed/Human-Activity-Recognition-using-CNN.

[23] Install/Run-time MMAC Policy. http://selinuxproject.org/page/NB_SEforAndroid_1#Install.2FRun-time_MMAC_Policy.

[24] Modifying Arguments With ptrace. https://www.alfonsobeato.net/c/modifying-system-call-arguments-with-ptrace/.

[25] Privacy Badger. https://github.com/EFForg/privacybadger.

[26] Security-Enhanced Linux in Android. https://source.android.com/security/selinux.

[27] Speech recognition tensorflow machine learning. https://play.google.com/store/apps/details?id=machinelearning.tensorflow.speech&hl=en.

[28] State Farm Distracted Driver Detection. https://www.kaggle.com/c/state-farm-distracted-driver-detection/data.

[29] Telegram. https://telegram.org.

[30] Telegram FOSS. https://github.com/Telegram-FOSS-Team/Telegram-FOSS.

[31] TensorFlow Inference API. https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/android.

[32] TensorFlow Models. https://github.com/tensorflow/models.

[33] uBlock Origin. https://github.com/gorhill/uBlock.

[34] VLC for Android. https://play.google.com/store/apps/details?id=org.videolan.vlc.

[35] Xposed. http://repo.xposed.info.

[36] XPrivacyLua. https://lua.xprivacy.eu.

[37] Yet Another Hook Framework for ART. https://github.com/rk700/YAHFA.

[38] Your Apps Know Where You Were Last Night, and They're Not Keeping It Secret. https://www.nytimes.com/interactive/2018/12/10/business/location-data-privacy-apps.html.

[39] YouTube official blog. Blur moving objects in your video with the new custom blurring tool on YouTube. https://youtube-creators.googleblog.com/2016/02/blur-moving-objects-in-your-video-with.html, 2016.

[40] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. Tensorflow: A system for large-scale machine learning. OSDI, 2016.

[41] P. Aditya, R. Sen, P. Druschel, S. Joon Oh, R. Benenson, M. Fritz, B. Schiele, B. Bhattacharjee, and T. T. Wu. I-pic: A platform for privacy-compliant image capture. MobiSys, 2016.

[42] M. Alzantot, S. Chakraborty, and M. B. Srivastava. Sensegen: A deep learning architecture for synthetic sensor data generation. BICA, 2017.

[43] M. E. Andrés, N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi. Geo-indistinguishability: Differential privacy for location-based systems. Proceedings of CCS '13, November 2013.

[44] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. ESANN, 2013.

[45] M. Backes, S. Bugiel, S. Gerling, and P. von Styp-Rekowsky. Android Security Framework: Extensible Multi-Layered Access Control on Android. In Proceedings of ACSAC ’14, December 2014.

[46] A. R. Beresford, A. Rice, N. Skehin, and R. Sohan. MockDroid: trading privacy for application functionality on smartphones. In Proceedings of HotMobile ’11, March 2011.

[47] G. Bradski. Dr. Dobb’s Journal of Software Tools, 2000.

[48] S. Chakraborty, C. Shen, K. R. Raghavan, Y. Shoukry, M. Millar, and M. Srivastava. ipShield: A Framework for Enforcing Context-aware Privacy. In Proceedings of NSDI ’14, April 2014.

[49] S. Chakraborty, C. Shen, K. R. Raghavan, Y. Shoukry, M. Millar, and M. Srivastava. ipshield: A framework for enforcing context-aware privacy. In NSDI, 2014.

[50] J. Chaudhari, S. Cheung, and M. Venkatesh. Privacy protection for life-log video. In SAFE, 2007.

[51] R. Chavarriaga, H. Sagha, A. Calatroni, S. T. Digumarti, G. Tröster, J. D. R. Millán, and D. Roggen. The opportunity challenge: A benchmark database for on-body sensor-based activity recognition. Pattern Recogn. Lett., 2013.

[52] J. Chen, J. Konrad, and P. Ishwar. VGAN-based image representation learning for privacy-preserving facial expression recognition. CVPR Workshops, 2018.

[53] K. Z. Chen, N. Johnson, V. D'Silva, S. Dai, K. MacNamara, T. Magrino, E. Wu, M. Rinard, and D. Song. Contextual Policy Enforcement in Android Applications with Permission Event Graphs. In Proceedings of NDSS '13, February 2013.

[54] X. Chen, H. Huang, S. Zhu, Q. Li, and Q. Guan. SweetDroid: Toward a Context-Sensitive Privacy Policy Enforcement Framework for Android OS. In Proceedings of WPES '17, October 2017.

[55] F. Chollet. keras. https://github.com/fchollet/keras, 2015.

[56] M. Conti, V. T. N. Nguyen, and B. Crispo. CRePE: Context-Related Policy Enforcement for Android. In Proceedings of ISC '10, October 2010.

[57] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005.

[58] N. Davies, N. Taft, M. Satyanarayanan, S. Clinch, and B. Amos. Privacy mediators: Helping iot cross the chasm. In HotMobile, 2016.

[59] Y.-A. de Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel. Unique in the crowd: The privacy bounds of human mobility. In Scientific reports, 2013.

[60] M. Dietz, S. Shekhar, Y. Pisetsky, A. Shu, and D. S. Wallach. Quire: Lightweight Provenance for Smart Phone Operating Systems. August 2011.

[61] C. Dwork. Differential privacy. ICALP, 2006.

[62] H. Edwards and A. J. Storkey. Censoring representations with an adversary. ICLR, 2016.

[63] M. Enev, J. Jung, L. Bo, X. Ren, and T. Kohno. Sensorsift: Balancing sensor data privacy and utility in automated face understanding. ACSAC, 2012.

[64] M. Erdogdu, N. Fawaz, and A. Montanari. Privacy-utility trade-off for time-series with application to smart-meter data. AAAI Workshops, 2015.

[65] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner. Android permissions demystified. CCS, 2011.

[66] H. Fu, Z. Zheng, S. Zhu, and P. Mohapatra. INSPIRED: Intention-based Privacy-preserving Permission Model. (arXiv:1709.06654 [cs.CR]), September 2017.

[67] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. NIPS, 2014.

[68] M. C. Grace, W. Zhou, X. Jiang, and A.-R. Sadeghi. Unsafe exposure analysis of mobile in-app advertisements. WISEC, 2012.

[69] J. Hamm. Minimax filter: Learning to preserve privacy from inference attacks. JMLR, 2017.

[70] X. He, G. Cormode, A. Machanavajjhala, C. M. Procopiuc, and D. Srivastava. Dpt: differentially private trajectory synthesis using hierarchical reference systems. VLDB, 2015.

[71] S. Hodges, L. Williams, E. Berry, S. Izadi, J. Srinivasan, A. Butler, G. Smyth, N. Kapur, and K. Wood. Sensecam: A retrospective memory aid. UbiComp, 2006.

[72] C. Huang, P. Kairouz, X. Chen, L. Sankar, and R. Rajagopal. Context-aware generative adversarial privacy. Entropy, 2017.

[73] Q. T. Inc. Trepn Power Profiler. https://developer.qualcomm.com/software/trepn-power-profiler.

[74] S. Jana, D. Molnar, A. Moshchuk, A. Dunn, B. Livshits, H. J. Wang, and E. Ofek. Enabling Fine-Grained Permissions for Augmented Reality Applications With Recognizers. In USENIX Security, 2013.

[75] S. Jana, D. Molnar, A. Moshchuk, A. Dunn, B. Livshits, H. J. Wang, and E. Ofek. Enabling Fine-Grained Permissions for Augmented Reality Applications With Recognizers. USENIX Security, 2013.

[76] S. Jana, A. Narayanan, and V. Shmatikov. A Scanner Darkly: Protecting User Privacy from Perceptual Applications. In S&P, 2013.

[77] S. Jana, A. Narayanan, and V. Shmatikov. A Scanner Darkly: Protecting User Privacy from Perceptual Applications. S & P, 2013.

[78] J. Jia and N. Z. Gong. Attriguard: A practical defense against attribute inference attacks via adversarial machine learning. USENIX Security, 2018.

[79] J. Jung and M. Philipose. Courteous glass. UbiComp ’14 Adjunct, 2014.

[80] Z. Kalal, K. Mikolajczyk, and J. Matas. Tracking-learning-detection. PAMI, 2012.

[81] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.

[82] M. Korayem, R. Templeman, D. Chen, D. J. Crandall, and A. Kapadia. Screenavoider: Protecting computer screens from ubiquitous cameras. CoRR, 2014.

[83] A. Krizhevsky. Learning Multiple Layers of Features from Tiny Images. Master’s thesis, 2009.

[84] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.

[85] S. Leutenegger, M. Chli, and R. Siegwart. Brisk: Binary robust invariant scalable keypoints. In ICCV, 2011.

[86] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. ICCV, 2015.

[87] M. Malekzadeh, R. G. Clegg, A. Cavallaro, and H. Haddadi. Protecting sensory data against sensitive inferences. EuroSys Workshop, 2018.

[88] M. Malekzadeh, R. G. Clegg, and H. Haddadi. Replacement autoencoder: A privacy-preserving algorithm for sensory data analysis. IoTDI, 2018.

[89] R. McPherson, R. Shokri, and V. Shmatikov. Defeating image obfuscation with deep learning. CoRR, 2016.

[90] R. Metz. More connected homes, more problems. MIT Technology Review, 2013.

[91] E. Miluzzo, A. Varshavsky, S. Balakrishnan, and R. R. Choudhury. TapPrints: Your Finger Taps Have Fingerprints. In Proceedings of MobiSys ’12, June 2012.

[92] G. Nebehay and R. Pflugfelder. Consensus-based matching and tracking of key- points for object tracking. In Applications of Computer Vision (WACV), 2014 IEEE Winter Conference on, pages 862–869, March 2014.

[93] G. Nebehay and R. Pflugfelder. Clustering of Static-Adaptive correspondences for deformable object tracking. In CVPR, 2015.

[94] S. J. Oh, R. Benenson, M. Fritz, and B. Schiele. Faceless person recognition: privacy implications in social media. ECCV, 2016.

[95] S. J. Oh, M. Fritz, and B. Schiele. Adversarial image perturbation for privacy protection - A game theory perspective. ICCV, 2017.

[96] K. Olejnik, I. Dacosta, J. S. Machado, K. Huguenin, M. E. Khan, and J. P. Hubaux. SmarPer: Context-Aware and Automatic Runtime-Permissions for Mobile Devices. In Proceedings of S&P '17, May 2017.

[97] W. Oleszkiewicz, T. Wlodarczyk, K. J. Piczak, T. Trzcinski, P. Kairouz, and R. Rajagopal. Siamese generative adversarial privatizer for biometric data. CVPR Workshops, 2018.

[98] M. Ongtang, S. McLaughlin, W. Enck, and P. McDaniel. Semantically rich application-centric security in Android. In Proceedings of ACSAC ’09, December 2009.

[99] T. Orekondy, M. Fritz, and B. Schiele. Connecting pixels to privacy and utility: Automatic redaction of private information in images. CVPR, 2018.

[100] S. A. Ossia, A. S. Shamsabadi, A. Taheri, H. R. Rabiee, N. D. Lane, and H. Haddadi. A hybrid deep learning architecture for privacy-preserving mobile analytics. CoRR, 2017.

[101] P. Pearce, A. P. Felt, G. Nunez, and D. Wagner. Addroid: Privilege separation for applications and advertisers in android. ASIACCS, 2012.

[102] K. Plarre, A. Raij, S. M. Hossain, A. A. Ali, M. Nakajima, M. Al’absi, E. Ertin, T. Kamarck, S. Kumar, M. Scott, D. Siewiorek, A. Smailagic, and L. E. Wittmers. Continuous inference of psychological stress from sensory measurements collected in the natural environment. IPSN, 2011.

[103] N. Raval, A. Machanavajjhala, and L. P. Cox. Protecting visual secrets using adversarial nets. CVPR Workshops, 2017.

[104] N. Raval, A. Srivastava, A. Razeen, K. Lebeck, A. Machanavajjhala, and L. P. Cox. What You Mark is What Apps See. In Proceedings of MobiSys '16, June 2016.

[105] F. Roesner, T. Kohno, and D. Molnar. Security and privacy for augmented reality systems. Commun. ACM, 2014.

[106] F. Roesner, D. Molnar, A. Moshchuk, T. Kohno, and H. J. Wang. World-driven access control for continuous sensing. Technical Report MSR-TR-2014-67, 2014.

[107] J. Schiff, M. Meingast, D. Mulligan, S. Sastry, and K. Goldberg. Respectful cameras: detecting visual markers in real-time to address privacy concerns. In IROS, 2007.

[108] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. ICPR, 2004.

[109] R. Shokri. Privacy Games: Optimal User-Centric Data Obfuscation. PETS, 2015.

[110] S. Sonoda and N. Murata. Neural network with unbounded activations is universal approximator. ACHA, 2017.

[111] R. Templeman, M. Korayem, D. Crandall, and A. Kapadia. PlaceAvoider: Steering first-person cameras away from sensitive spaces. In NDSS, 2014.

[112] N. K. Thanigaivelan, E. Nigussie, A. Hakkala, S. Virtanen, and J. Isoaho. CoDRA: Context-based dynamically reconfigurable access control system for Android. Journal of Network and Computer Applications, 101:1–17, 2018.

[113] A. Varshney and S. Puri. A survey on human personality identification on the basis of handwriting using ANN. ICISC, 2017.

[114] J. Vilk, D. Molnar, E. Ofek, C. Rossbach, B. Livshits, A. Moshchuk, H. J. Wang, and R. Gal. SurroundWeb: Mitigating privacy concerns in a 3D web browser. Technical Report MSR-TR-2014-147, November 2014.

[115] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol. Extracting and composing robust features with denoising autoencoders. ICML, 2008.

[116] P. Wijesekera, A. Baokar, A. Hosseini, S. Egelman, D. Wagner, and K. Beznosov. Android permissions remystified: A field study on contextual integrity. In USENIX Security, 2015.

[117] P. Wijesekera, A. Baokar, L. Tsai, J. Reardon, S. Egelman, D. Wagner, and K. Beznosov. The Feasibility of Dynamically Granted Permissions: Aligning Mobile Privacy with User Preferences. In Proceedings of S&P '17, May 2017.

[118] D. Wu and S. Bratus. A Context-Aware Kernel IPC Firewall for Android. In Proceedings of ShmooCon '17, January 2017.

[119] Z. Xu and S. Zhu. SemaDroid: A Privacy-Aware Sensor Management Framework for Smartphones. In Proceedings of CODASPY ’15, March 2015.

[120] C.-W. You, N. D. Lane, F. Chen, R. Wang, Z. Chen, T. J. Bao, M. Montes-de Oca, Y. Cheng, M. Lin, L. Torresani, and A. T. Campbell. Carsafe app: Alerting drowsy and distracted drivers using dual cameras on smartphones. MobiSys, 2013.

[121] J.-Y. Bouguet. Pyramidal implementation of the Lucas-Kanade feature tracker. Intel Corporation, Microprocessor Research Labs, 2000.

[122] A. Zhan, M. Chang, Y. Chen, and A. Terzis. Accurate caloric expenditure of bicyclists using cellphones. SenSys, 2012.

[123] X. Y. Zhang, G. S. Xie, C. L. Liu, and Y. Bengio. End-to-end online writer identification with recurrent neural network. THMS, 2017.

[124] Y. Zhou, Z. Wang, W. Zhou, and X. Jiang. Hey, you, get off of my market: Detecting malicious apps in official and alternative android markets. In NDSS, 2012.

[125] Y. Zhou, X. Zhang, X. Jiang, and V. W. Freeh. Taming Information-Stealing Smartphone Applications (on Android). In Proceedings of TRUST '11, June 2011.

Biography

Nisarg Raval earned his BE in Computer Engineering from L. D. College of Engineering, Ahmedabad, India in 2007, and his MS in Computer Science from the International Institute of Information Technology, Hyderabad, India in 2013. He received his PhD in Computer Science from Duke University in 2019. While at Duke, he published in top computer science conferences such as MobiSys and PETS. He also received the best demo award for his work on differential privacy at VLDB 2016.
