ABSTRACT

WU, CHIACHIH. Virtualization-Based Approaches for Mitigation of Malware Threats. (Under the direction of Xuxian Jiang.)

Modern computer systems consist of a number of software layers that provide efficient resource management, secure isolation, and a convenient environment for program development and execution. The hypervisor virtualizes multiple instances of hardware for guest operating systems, or for another layer of hypervisors, with strong isolation between the virtualized hardware instances. Running directly on hardware, physical or virtualized, the operating systems (OSs) provide efficient resource management and a convenient set of interfaces for applications to access hardware resources, including the CPU. In addition, the OS regulates each application with virtual memory, user identification, file permissions, etc., such that a non-privileged application cannot interact with other applications or access privileged files without gaining the corresponding permissions. One level above the OS, the runtime libraries and system daemons help applications communicate with the OS more easily. Each software layer has various software bugs and design flaws through which the secure isolation could be broken. This motivates our research on securing different software layers with a series of virtualization-based approaches.

In this dissertation, we first present an OS-level virtualization system, AirBag, which improves the Android architecture in terms of malware defense and analysis in order to deal with flaws in the Android runtime environment. AirBag allows users to “test” untrusted apps in an isolated Android runtime environment without private data leakage, system file corruption, or more severe damage such as sending SMSs to premium numbers. In addition, users can “profile” untrusted apps in the instrumented isolated Android runtime, which improves the capabilities of dynamic analysis. However, such an OS-level approach is vulnerable to attacks that exploit vulnerabilities inside the OS.
When the OS is compromised, all private data, such as bank account numbers and passwords, could be leaked, and the amount of an online payment could be changed by the attacker. Since building a bug-free OS is impossible, we present a tiny hypervisor, tHype, to provide trusted I/O access to users when they input sensitive data or perform critical operations. Compared to existing hypervisors, tHype stands out for its small code size, since it only virtualizes critical I/O on mobile devices, mainly the touchscreen and framebuffer.

Yet, in general, virtualizing computer systems is complicated, so most existing hypervisors have a large code base, which makes them vulnerable. Even worse, a hosted (or Type-II) hypervisor includes the host OS in its trusted code base (TCB), which gives it a wider attack surface than bare-metal (or Type-I) hypervisors. We present the DeHype system to reduce the TCB of the hosted hypervisor by deprivileging its execution to user mode. With DeHype, the hypervisor is executed in a user-mode context for each guest VM, which prevents a compromised hypervisor from attacking other guests.

© Copyright 2015 by Chiachih Wu

All Rights Reserved

Virtualization-Based Approaches for Mitigation of Malware Threats

by Chiachih Wu

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Computer Science

Raleigh, North Carolina

2015

APPROVED BY:

Douglas Reeves William Enck

Huiyang Zhou Xuxian Jiang (Chair of Advisory Committee)

DEDICATION

To my parents.

BIOGRAPHY

The author was born in a small town ...

ACKNOWLEDGEMENTS

I would like to thank my advisor for his help.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

Chapter 1 INTRODUCTION
    1.1 Problem Overview
    1.2 Our Approach
    1.3 Dissertation Statement and Contributions
    1.4 Dissertation Organization

Chapter 2 RELATED WORK
    2.1 Securing Mobile Apps
    2.2 Securing I/O Access
    2.3 Securing Hypervisors

Chapter 3 Isolating Android Malware with AirBag
    3.1 Introduction
    3.2 Design
        3.2.1 Design Goals and Threat Model
        3.2.2 Enabling Techniques
        3.2.3 Additional Capabilities
    3.3 Implementation
        3.3.1 Namespace/Filesystem Isolation
        3.3.2 Context-Aware Device Virtualization
        3.3.3 Decoupled App Isolation Runtime
        3.3.4 Lessons Learned
    3.4 Evaluation
        3.4.1 Effectiveness
        3.4.2 Performance Impact
        3.4.3 Power Consumption and Memory Usage
    3.5 Discussion
    3.6 Summary

Chapter 4 Securing Critical I/O Operations with tHype
    4.1 Introduction
    4.2 Design
        4.2.1 Critical I/O Virtualization
        4.2.2 Capability-Based Memory Sharing and Alternate Memory View
    4.3 Implementation
        4.3.1 Critical I/O Virtualization
        4.3.2 Capability-Based Memory Sharing and Alternate Memory View
    4.4 Evaluation
        4.4.1 Performance Overhead of Memory View Switching
        4.4.2 Latency of Virtualized IO
    4.5 Discussion
    4.6 Summary

Chapter 5 Deprivileging Hosted Hypervisors with DeHype
    5.1 Introduction
    5.2 Design
        5.2.1 Dependency Decoupling
        5.2.2 Memory Rebasing
        5.2.3 Optimizations
    5.3 Implementation
        5.3.1 Dependency Decoupling
        5.3.2 Memory Rebasing
        5.3.3 Optimizations
        5.3.4 Lessons Learned
    5.4 Evaluation
        5.4.1 Security Benefits
        5.4.2 Other Benefits
        5.4.3 Performance
    5.5 Discussion
    5.6 Summary

Chapter 6 Conclusion and Future Work

BIBLIOGRAPHY

LIST OF TABLES

Table 3.1 Supported Android Hardware Devices in AirBag
Table 3.2 Effectiveness of AirBag in Successfully Blocking 20 Representative Android Malware
Table 3.3 Benchmarks Used in Our Evaluation
Table 4.1 Hypercalls to Support Capability-Based Memory Sharing and Alternate Memory View
Table 5.1 Ten Privileged Services in DeHype
Table 5.2 Cached VMCS Fields
Table 5.3 Software Packages used in Our Evaluation

LIST OF FIGURES

Figure 1.1 An Overview of the Dissertation
Figure 3.1 An Overview of AirBag to Confine Untrusted Apps
Figure 3.2 Framebuffer Virtualization in AirBag (Nexus One)
Figure 3.3 Telephony Virtualization in AirBag
Figure 3.4 Seamless Integration of AirBag
Figure 3.5 GoldDream Analysis
Figure 3.6 DKFBootKit Analysis
Figure 3.7 HippoSMS Analysis
Figure 3.8 Performance Measurement of AirBag on Google Nexus One, Nexus 7, and Samsung Galaxy S III
Figure 3.9 AnTuTu Measurement Results
Figure 4.1 Privileged File Access Example
Figure 4.2 An overview of tHype to secure critical I/O operations
Figure 4.3 Screenshot of Using tHype-powered Trusted I/O
Figure 4.4 Framebuffer Virtualization based on Second Level Address Translation
Figure 4.5 Initial State of the EPT Tree with an Alternate Memory View
Figure 4.6 The EPT Tree with an Alternate GPA/HPA Mapping
Figure 4.7 Performance Overhead of View Switching
Figure 5.1 An overview of DeHype to deprivilege hosted hypervisor execution
Figure 5.2 The memory management in DeHype. The solid lines mark the ways to generate the memory blocks in different address spaces while the dotted lines mark the translation between memory address spaces.
Figure 5.3 An example of constructing pseudo NPTs for the deprivileged hypervisor to traverse
Figure 5.4 A GDB session that debugs KVM code with the environment familiar to most programmers
Figure 5.5 A Valgrind session that checks possible KVM memory leaks
Figure 5.6 Relative Performance of DeHype

CHAPTER 1

INTRODUCTION

1.1 Problem Overview

Securing computer systems has always been a challenging problem due to the complexity of the layered software architecture adopted by modern computer systems. At the software layer that interacts with users, applications access hardware resources through the underlying operating system with the help of runtime libraries and system daemons running alongside the applications. Any software bug or design flaw in the runtime libraries or system daemons can jeopardize the system, for example by allowing a malicious application to attack other applications or damage system files. Running one level below, a modern OS kernel typically has a large code base and exports hundreds of system calls to the upper layers. Critical vulnerabilities could be exploited through this wide attack surface (the system calls) to subvert the whole system, since the attacker would then control the most privileged software layer. Although hypervisors can isolate guest operating systems by providing virtualized hardware layers, the large trusted code base (TCB) of hypervisors still leads to numerous vulnerabilities that allow attackers to break out of the virtualized layer and attack other guest virtual machines (VMs).

As the attack patterns vary from one software layer to another, different defense mechanisms are needed to secure the whole system. First, we start from the defense against malicious applications that attack the runtime libraries and system daemons. The explosive growth of smartphone sales makes mobile applications (“apps”) the most popular computer applications. Inevitably, the rise in the popularity of smartphones also makes them an attractive target for attacks. In light of these threats, current mobile platform providers have developed various server-side vetting processes to block malicious applications. While helpful, they are still far from ideal in achieving their goals. To make matters worse, the presence of alternative (less-regulated) mobile marketplaces also opens up new attack vectors, which necessitate client-side solutions (e.g., mobile anti-virus software) that run on mobile devices. However, existing client-side solutions still exhibit limitations in their capability or deployability. For example, the permission-based sandboxing mechanism in the Android architecture is not sufficient to regulate malicious apps, since they may masquerade as legitimate apps and abuse additional permissions to break out of the sandbox provided by the Android runtime. To this end, we need a solution to better isolate apps even when the Android runtime is compromised.

Being one level below the runtime libraries, the operating system (OS) provides a rich set of functionalities for accessing hardware and managing resources. For example, the operating system enables a wide variety of Android devices with its architecture and plentiful hardware support. However, the extensible architecture of Linux introduces new threats. The disclosure of a series of security advisories [Cod] demonstrates how third-party drivers create vulnerable points in the OS. An experienced attacker can exploit those vulnerabilities to subvert the OS such that users’ private data could be leaked. Even the OS itself may have vulnerabilities that could be easily exploited (e.g., CVE-2013-6282 [Cvee]).
Since we cannot easily control the quality of a complicated software project such as the Linux kernel, eliminating every software bug in the OS is impossible. As a result, we cannot completely prevent the OS from being compromised. However, we can try to contain the damage even when the OS is taken over. Because all private data must be input by users before it can be leaked by attackers, the leakage could be prevented if we had a way to securely process critical hardware input events, such as touch events on the touchscreen. Once we can securely obtain the input data, we may need to store it on the disk, render it on the screen, or even perform a network-based transaction with it. To this end, we need a system that can securely access the touchscreen controller, the disk, the screen, and the network interface controller when the OS is compromised. Since any security mechanism running at the OS level could be broken as soon as the OS is subverted, we need to do it at a more privileged level: the hypervisor level. Specifically, to achieve better isolation between the OS and the secure I/O mechanism, we need a bare-metal hypervisor that is small and simple for virtualizing the critical hardware components.

The virtual machine monitors (VMMs) or hypervisors hosting the operating systems can be used to isolate the compromised OSs and perform introspection on guest VMs to improve security.


Therefore, the security of the hypervisor itself becomes the most important problem that we need to deal with. The increased adoption of hosted hypervisors, one of the two types of hypervisors, in virtualized computer systems motivates our research on securing hosted hypervisors. By non-intrusively extending commodity OSs, hosted hypervisors can effectively take advantage of a variety of mature and stable features as well as the existing broad user base of commodity OSs. However, virtualizing a computer system is still a rather complex task. As a result, existing hosted hypervisors typically have a large code base (e.g., 33.6K SLOC for KVM), which inevitably introduces exploitable software bugs. Unfortunately, any compromised hosted hypervisor can immediately jeopardize the host system and subsequently affect all running guests on the same physical machine. To keep a compromised hosted hypervisor from taking down the whole system, we need a way to reduce the hypervisor code that runs at the privileged level, so that exploiting software bugs in the deprivileged part of the hypervisor cannot subvert the whole system.

To address these challenges, we present a series of virtualization-based approaches for securing apps, operating systems, and hypervisors. Specifically, we present AirBag (Chapter 3) to enhance the Android platform for isolating untrusted apps. For the Android operating system, the tiny bare-metal hypervisor, tHype, is presented to provide secure isolation between the OS and the critical I/O operations (Chapter 4). The increasingly adopted hosted hypervisor, KVM, is secured by DeHype (Chapter 5) by deprivileging most of its execution to user mode.

1.2 Our Approach

To prevent malicious apps from attacking other apps or damaging the system, we present AirBag, a lightweight OS-level virtualization approach that enhances the popular Android platform and boosts our defense capability against mobile malware infection. Assuming a trusted smartphone OS kernel and the fact that untrusted apps will eventually be installed onto users’ phones, AirBag is designed to isolate them and prevent them from infecting the normal system (e.g., corrupting the phone firmware) or stealthily leaking private information. More specifically, by dynamically creating an isolated runtime environment with its own dedicated namespace and virtualized system resources, AirBag not only allows for transparent execution of untrusted apps, but also effectively mediates their access to various system resources or phone functionalities (e.g., SMSs or phone calls). We have implemented a proof-of-concept prototype on three representative mobile devices, i.e., Google Nexus One, Nexus 7, and Samsung Galaxy S III. The evaluation results with a number of untrusted apps, including real-world mobile malware, demonstrate its practicality and effectiveness.

In order to secure critical I/O access in the presence of an untrusted Android OS and software stack, we design and implement a thin hypervisor named tHype. With the paravirtualization interfaces provided by tHype, trusted apps can access critical I/O devices in parallel with the untrusted software, under strong isolation. Our system also enables an efficient way of multiplexing hardware resources by manipulating memory mappings at the hypervisor level. The evaluation results show that the I/O virtualization does not add much overhead.

For securing the hosted hypervisor, we present DeHype, a system that aims to dramatically reduce its exposed attack surface by deprivileging its execution to user mode. In essence, by decoupling the hypervisor code from the host OS and deprivileging its execution, our system demotes the hypervisor mostly to a user-level library, which not only substantially reduces the attack surface (with a much smaller TCB), but also brings additional benefits in allowing for better development and debugging as well as concurrent execution of multiple hypervisors on the same physical machine. To evaluate its effectiveness, we have developed a proof-of-concept prototype that successfully deprivileges 93.2% of the loadable KVM module code base to user mode while only adding a small TCB (∼2.3K SLOC) to the host OS kernel. Additional evaluation results with a number of benchmark programs further demonstrate its practicality and efficiency.
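As a rough sanity check on the DeHype figures above, the following sketch relates the 93.2% deprivileged share to the 33.6K SLOC figure quoted earlier for KVM. It assumes, which the text does not state, that both numbers refer to the same loadable KVM module code base; the function and variable names are illustrative only.

```python
def split_code_base(total_sloc: float, deprivileged_fraction: float) -> tuple[float, float]:
    """Split a code base into (SLOC moved to user mode, SLOC left privileged)."""
    if not 0.0 <= deprivileged_fraction <= 1.0:
        raise ValueError("deprivileged_fraction must be between 0 and 1")
    user_mode = total_sloc * deprivileged_fraction
    privileged = total_sloc - user_mode
    return user_mode, privileged


# Assumption: the 33.6K SLOC quoted for KVM is the loadable module code base
# to which the 93.2% figure applies.
KVM_SLOC = 33_600
DEPRIVILEGED_FRACTION = 0.932

user_mode_sloc, privileged_sloc = split_code_base(KVM_SLOC, DEPRIVILEGED_FRACTION)
print(f"moved to user mode: ~{user_mode_sloc / 1000:.1f}K SLOC")
print(f"left privileged:    ~{privileged_sloc / 1000:.1f}K SLOC")
```

Under that assumption, roughly 31.3K SLOC would run in user mode and about 2.3K SLOC would stay privileged. This is the same order of magnitude as the small TCB the text reports adding to the host OS kernel, although the text does not say that the two quantities are the same.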

1.3 Dissertation Statement and Contributions

The central thesis of this dissertation is that virtualization technology is effective in improving the security of a complex computer system. Specifically, we address the malware threats in different software layers of a computer system and the limitations of existing approaches. Toward this, we present the design, implementation, and evaluation of AirBag, which isolates untrusted apps when the OS is trusted, and tHype, which secures critical I/O access when the OS is untrusted. Further, we reduce the attack surface of a hosted hypervisor, which runs in the most privileged layer to virtualize the machine. In the following, we highlight the main contributions of this dissertation.

• Android malware defense: We designed, implemented, and evaluated AirBag, an OS-level virtualization-based system that improves the Android architecture to provide an isolated Android runtime environment in which apps can be executed without affecting other apps in the native runtime or damaging the system. Under AirBag, an instrumented Android runtime can be executed in parallel with the native runtime, which enables users to either safely test or profile untrusted apps in different running modes.

• Trusted I/O access on mobile platforms: We designed, implemented, and evaluated tHype, a thin bare-metal hypervisor that provides paravirtualization-based I/O access to trusted apps for critical and security-sensitive operations. Under tHype, users can interact with the critical I/O devices with the isolation provided by the hypervisor, such that the leakage of critical private data input by users can be prevented at the source.

• Hosted hypervisor integrity: We designed, implemented, and evaluated DeHype, a system that reduces the attack surface of a hosted hypervisor by deprivileging its execution. Under DeHype, the TCB of the hosted hypervisor is substantially reduced. DeHype also allows concurrent execution of multiple hypervisors on the same machine, which prevents a compromised hypervisor from attacking other guest VMs, a guarantee that original hosted hypervisors cannot provide.

[Figure: the three systems and the layers they target. AirBag (operating system): isolating Android malware with OS-level virtualization. DeHype (Type-II hypervisor): deprivileging hosted hypervisors for reducing the attack surface. tHype (Type-I hypervisor): securing critical I/O access by a tiny hypervisor.]

Figure 1.1 An Overview of the Dissertation

1.4 Dissertation Organization

Figure 1.1 shows an overview of this dissertation. The rest of the dissertation is organized as follows. First, we present closely related work in Chapter 2. Second, we show the detailed design, implementation, and evaluation of AirBag in Chapter 3, followed by the design and implementation of the tHype hypervisor in Chapter 4. Third, we give details about DeHype in Chapter 5. Finally, we conclude this dissertation and propose directions for future work in Chapter 6.

CHAPTER 2

RELATED WORK

2.1 Securing Mobile Apps

In this section, we categorize related work on securing mobile apps into different research areas and compare our AirBag system (Chapter 3) with them.

Server-side protection: The first category of securing mobile apps includes systems that are designed to improve the walled-garden model in detecting and pruning questionable apps (including malicious ones) from centralized mobile marketplaces. For example, Google introduced the Bouncer service in February 2012. Besides smartphone vendors, researchers have also endeavored to develop various systems to expose potential security risks from untrusted apps. PiOS [Ege11] statically analyzes mobile apps to detect possible leaks of sensitive information; Enck et al. [Enc11] study free apps from the official Google Play with the goal of understanding broader security characteristics of existing apps. Our system is different in that it proposes a complementary client-side solution to protect mobile devices from being infected by mobile malware.

Client-side protection: The second category aims to develop mitigation solutions on mobile devices. For example, mobile anti-malware software scans the apps on the devices based on known malware signatures, which limits its capability in detecting zero-day malware. MoCFI [Dav12] provides a CFI enforcement framework to prohibit runtime and control-flow attacks on Apple iOS.


TaintDroid [Enc10] extends the Android framework to monitor the information flow of privacy-sensitive data. MockDroid [Ber11], AppFence [Hor11], Kantola et al. [Kan12], Airmid [Nad11], Apex [Nau10], and CleanOS [Tan12] also rely on extensions to the Android framework to better control apps’ access to potentially sensitive resources. Aurasium [Xu12] takes a different approach by repackaging untrusted apps and then enforcing certain access control policies at runtime. With varying levels of success, they share a common assumption of a trustworthy Android framework, which unfortunately may not hold for advanced attacks (which could directly compromise privileged system daemons such as init or zygote). In contrast, our system assumes that the Android framework inside AirBag could be compromised (by untrusted apps), but the damage is still contained in AirBag to prevent the native runtime environment from being affected.

From another perspective, a number of systems have been proposed to extend the Android permission system. For example, Kirin [Enc09] analyzes apps at install time to block apps with a dangerous combination of permissions. Saint [Ong09] enforces policies at both install time and run time to govern the assignment as well as the usage of permissions. Stowaway [Fel11] identifies apps that request more permissions than necessary. In comparison, our system is different in that it does not directly deal with Android permissions. Instead, we aim to mitigate the risks by proposing a separate runtime that is isolated and enforced through a lightweight OS-level extension.

Virtualization: The third category of securing mobile apps includes recent efforts to develop or adopt various virtualization solutions that can strengthen the security properties of mobile platforms [Vas12].
Starting with the approaches based on Type-I hypervisors (e.g., OKL4 Microvisor [Okl], L4Android [Lan11], and Xen on ARM [Hwa08]), they may have a smaller TCB but require significant effort to support new devices and cannot readily leverage commodity OS kernels to support hardware devices. In a similar vein, researchers have also applied traditional Type-II hypervisor approaches to mobile devices (e.g., VMware’s MVP [Bar10] and KVM/ARM [DN10]). Compared to Type-I hypervisors, Type-II hypervisors can take advantage of commodity OS kernels to support various hardware devices. However, they still need to run multiple instances of guest OS kernels, which inevitably increases memory footprint and power consumption. Also, the world-switching operation causes additional performance degradation, which affects scalability in resource-constrained mobile device environments.

Besides traditional Type-I and Type-II hypervisors, OS-level virtualization approaches are also being applied to mobile devices. For example, Cells [And11] introduces a foreground/background virtual phone usage model and proposes lightweight OS-level virtualization to multiplex phone hardware across multiple virtual phones. Our system differs from Cells in two important aspects. First, as mentioned earlier, Cells aims to embrace the emerging “bring-your-own-device” (BYOD) paradigm by supporting multiple virtual phone instances on one hardware device. Each virtual phone instance is treated equally and the isolation is achieved at the coarse-grained virtual phone boundary. AirBag instead is an app-centric solution that aims to maintain a single-phone usage model and the same user experience while enforcing reliable isolation of untrusted apps. Second, to support multiple virtual phones, Cells needs to maintain an always-on root namespace for their management and hardware device virtualization. In comparison, AirBag is integrated with the native runtime for a seamless user experience without such a root namespace. At the conceptual level, the presence of a root namespace is similar to the management domain in the Type-I Xen hypervisor, which could greatly affect the portability to new phone models. Being a part of the native system, our system can be readily ported to new devices with stock firmware.¹

In addition, researchers have also explored user-level solutions to provide separate mobile runtime environments. For example, TrustDroid [Bug11] enhances the Android framework to provide domain-level isolation that confines unauthorized data access and cross-domain communications. A recent Android release (Jelly Bean 4.2) extends the Android framework to add multi-user support. Such a user-level solution requires a trustworthy framework, which is often the target of advanced attacks. Moreover, these solutions require deep modifications to the Android framework. In comparison, AirBag adds a lightweight OS-level extension to confine cross-namespace communications without affecting the native Android framework, achieving backward and forward compatibility.

¹ Our prototyping experience confirms that AirBag can be readily ported to a new phone model. In fact, the very first prototype on Google Nexus One was ported to Nexus 7 and Samsung Galaxy S III, each within one week.

2.2 Securing I/O Access

In this section, we summarize related work on securing I/O access and compare our tHype system (Chapter 4) with it. Zhou et al. [Zho12] present a hypervisor-based design that enables a trusted path to bypass the untrusted OS, applications, and I/O devices, with a minimal TCB. The trusted-path system is closely related to our tHype system. However, the PC-based solution does not address the problems we deal with on mobile devices, such as framebuffer and touchscreen support as well as ARM’s virtualization support.

Cloud Terminal [Mar12] uses a lightweight secure thin client to access software in the cloud. The tiny terminal creates a secure I/O tunnel between the user and the remote software without relying on the large untrusted software stack. Our system shares the design choice of assuming that the whole Android software stack is untrusted. Cloud Terminal uses the hypervisor to isolate the terminal client, while the trusted client leverages an untrusted helper; both aspects are also similar to our system. Again, Cloud Terminal is a PC-based solution, so framebuffer and touchscreen support on mobile platforms is not addressed. On the other hand, Cloud Terminal requires cloud-side support for the VNC-based UI, which is not a common use case for smartphone users. In comparison, our system provides users with a secure runtime environment for running security-sensitive apps on the same machine.

Gyrus [Jan14] also relies on the hypervisor to ensure “what you see is what you send (WYSIWYS)”. The user input is drawn by dom-0 in a secure overlay on top of the dom-U display window for users to verify, while WYSIWYS is ensured by network traffic monitoring. Our framebuffer virtualization can be tweaked to render the secure overlay on top of the display of the untrusted domain, which is more convenient for users. However, Gyrus is implemented on the full-fledged KVM hypervisor, which introduces a large TCB compared to our system for the purpose of securing user input.

Bumpy [McC09] protects sensitive user input strings by encrypting them before they are sent into the compromised OS or web browser, with the help of a remote web server that decrypts the strings and retrieves the plaintext entered by the user. Our system shares the idea of protecting user input when the OS and the upper software layers are compromised. Since Bumpy requires a Trusted Monitor for receiving the indicators from encryption-capable input devices, our system is more likely to be usable on existing mobile devices, as we only use the ordinary LCD and touchscreen to protect sensitive user input.

CoverPad [Yan13] improves the leakage resilience of password entry by safely delivering hidden messages with the help of users’ gestures. Although our threat model is the “internal attack,” which is not addressed by CoverPad, such a scheme for preventing external attacks can be used in our trusted guest domain to improve the security of our system.

The recent system M-Aegis [Lau14] creates a conceptual layer called “Layer 7.5” between the app and the user to provide true end-to-end encryption of user data.
Since our system deals with an untrusted OS, which M-Aegis considers an “out of scope” threat, M-Aegis is orthogonal to our work. Again, it could be used in the trusted domain to improve security as well.

Based on the advances of the ARM architecture, some recent systems such as VeriUI [LC14] and TrustUI [Li14] secure I/O access with the help of ARM TrustZone [Tru] support. Although the ARM architecture currently dominates mobile phones and tablets, we think virtualization support is a more general feature that most architectures would have. On the other hand, VeriUI is prone to UI spoofing attacks because the attacker can access the shared framebuffer from the Normal World, while TrustUI addresses that problem with an extra LED. Our system isolates the framebuffer access using the hypervisor and hardware virtualization support (e.g., Intel EPT), without extra hardware devices.
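To make the framebuffer isolation idea more concrete, the following is a toy model (not tHype's implementation) of second-level address translation with two memory views: the same guest-physical framebuffer address resolves to a different host-physical page depending on which view the hypervisor has activated. The class names, the addresses, the use of a shadow buffer for the untrusted view, and the flat dictionary standing in for the page-table tree are all assumptions made for illustration; a real EPT is a multi-level hardware structure.

```python
PAGE_SIZE = 4096


class MemoryView:
    """Toy second-level mapping: guest-physical page -> host-physical page."""

    def __init__(self, name):
        self.name = name
        self.mapping = {}  # guest page number -> host page number

    def map_page(self, gpa, hpa):
        self.mapping[gpa // PAGE_SIZE] = hpa // PAGE_SIZE

    def translate(self, gpa):
        page = gpa // PAGE_SIZE
        if page not in self.mapping:
            # Stands in for a translation fault that the hypervisor would trap.
            raise PermissionError(f"{self.name}: GPA {gpa:#x} is not mapped")
        return self.mapping[page] * PAGE_SIZE + gpa % PAGE_SIZE


class ToyHypervisor:
    """Keeps several views and translates through the active one."""

    def __init__(self):
        self.views = {}
        self.active = None

    def add_view(self, view):
        self.views[view.name] = view

    def switch_view(self, name):
        self.active = self.views[name]

    def access(self, gpa):
        return self.active.translate(gpa)


# Hypothetical addresses chosen only for this example.
FB_GPA = 0x8000_0000         # framebuffer address as seen by any guest
REAL_FB_HPA = 0x1000_0000    # page backing the real display
SHADOW_FB_HPA = 0x2000_0000  # shadow page handed to the untrusted view

untrusted = MemoryView("untrusted")
untrusted.map_page(FB_GPA, SHADOW_FB_HPA)
trusted = MemoryView("trusted")
trusted.map_page(FB_GPA, REAL_FB_HPA)

hv = ToyHypervisor()
hv.add_view(untrusted)
hv.add_view(trusted)

hv.switch_view("untrusted")
print(hex(hv.access(FB_GPA + 0x10)))  # resolves into the shadow page
hv.switch_view("trusted")
print(hex(hv.access(FB_GPA + 0x10)))  # resolves into the real framebuffer
```

In this sketch the untrusted view can never name the real framebuffer page, because no guest-physical address in its mapping leads there; switching views only changes which mapping is consulted, not the guest's addresses.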


2.3 Securing Hypervisors

In this section, we categorize related work on securing hypervisors into different research areas and compare our DeHype system (Chapter 5) with them.

Improving hypervisor security: The first area of securing hypervisors consists of recent systems that are developed to improve hypervisor security. For example, seL4 [Kle09] is proposed to formally verify the absence of certain types of software vulnerabilities in a customized small hypervisor. Verve [YH10] mechanically verifies every instruction in the software stack so that hypervisors running over it can also be verified to ensure type and memory safety. HyperSafe [WJ10] instead admits the presence of exploitable software bugs in hypervisors but proposes solutions to protect the runtime (bare-metal) hypervisor integrity. Others re-architect the hypervisor design for a minimized TCB. Specifically, NOVA [SK10] implements a thin bare-metal hypervisor that moves the virtualization support to user level. Xoar [Col11] modifies the original Xen design by breaking the control VM into single-purpose service VMs. Xen disaggregation [Mur08] decomposes Xen by moving the privileged domain builder into a minimal trusted compartment for trusted virtualization. Min-V [Ngu12] disables non-critical virtual devices, minimizing the codebase of the virtualization stack with the so-called delusional boot approach. By using formal verification, MinVisor [Dah11] provides integrity guarantees. Notice that such efforts require a new design of bare-metal hypervisors. Their applicability and effectiveness in protecting hosted hypervisors (e.g., KVM) that run together with a commodity host OS remain to be demonstrated.

From another perspective, NoHype [Sze11] works in a controlled cloud setting by eliminating the bare-metal hypervisor after preparing the virtualization environment.
Specifically, it strictly partitions the hardware resources among guest VMs so that there is no need for a guest VM to interact with the hypervisor during its execution. Due to the close interaction between a hosted hypervisor and the host OS, the NoHype approach cannot be applied to hosted hypervisor protection. In addition, DeHype transparently supports commodity OS kernels (e.g., Linux and Windows), while NoHype still requires minor modifications to the guest OS.

KVM-L4 [Pet09] is a closely related system that enables a modified Linux kernel (i.e., L4Linux with the KVM module loaded) to run in user mode over the customized L4/Fiasco microkernel. With that, in order for QEMU to interact with KVM, it has to go through the IPC mechanism implemented in the L4/Fiasco microkernel. In comparison, as KVM is largely demoted to a user-level library with DeHype, the interaction between QEMU and KVM is simply achieved with a user-mode function call instead of the expensive L4 IPC used in KVM-L4. Also, DeHype naturally supports running multiple KVM instances on the same host, while KVM-L4 requires starting a new L4Linux to host another KVM instance on the same host.


HyperLock [Wan12] is another closely related system that creates a separate address space in the host OS kernel to confine the execution of the loadable KVM module. However, since the module still executes in privileged mode, additional complex techniques are needed to prevent potential misuse of its privileged code (e.g., enforcing instruction alignment rules through the compiler). In comparison, by deprivileging the KVM execution to user mode, DeHype naturally leverages the user-kernel mode separation (or the process boundary) to protect the host system (or other unrelated guest VMs) from a compromised KVM.

User-mode Linux (UML) [Dik00] is a system that runs virtual Linux systems as applications of a normal Linux system. As such, the guest of UML is limited to Linux, while DeHype does not have such a limitation. On the other hand, UML can potentially be leveraged by DeHype for kernel function support, similar to SUD [BWZ10]. However, our prototype shows that a full-fledged Linux is not required, as DeHype relies only on a small number of kernel functions that are simple to recreate in user space.

The Turtles project [BY10] enables nested virtualization support for KVM. Since the deprivileged hypervisor in our system to some extent emulates certain privileged instructions such as VMREAD/VMWRITE (Section 5.3.3), it has a role similar to that of an L1 hypervisor. Therefore, our VMCS caching approach shares the idea of the VMCS shadowing they proposed. The mechanism of Pseudo NPT is also similar to EPT0→2. However, the L0 hypervisor in the Turtles project is a full-fledged hypervisor, while HypeLet has a much smaller privileged code base, which could be used to better secure the lowest-level hypervisor.

Isolating untrusted device drivers

Since hosted hypervisors are similar to device drivers running together with the host OS, the second area of securing hypervisors includes systems that isolate device drivers from the host OS kernel. For example, Gateway [SG11], HUKO [Xio11], and SIM [Sha09] leverage a trustworthy hypervisor to isolate kernel device drivers or security monitors. Zhou et al. [Zho12] build a verifiable trusted path that secures data transfers between devices and user programs by leveraging a small hypervisor. In comparison, our goal here is to deprivilege the hosted hypervisor, which is assumed to be trusted in these systems.

Inside the host OS kernel, Nooks [Swi03] improves OS reliability by isolating device drivers in lightweight protection domains. Because it assumes drivers to be faulty but not malicious, Nooks by design cannot handle malicious or compromised device drivers. From another perspective, researchers have also proposed solutions to isolate device drivers in user space. For example, L3 [Lie91] enables user-level device drivers based on a micro-kernel architecture. SUD [BWZ10] executes existing drivers as untrusted user-level processes to prevent misbehaving drivers from crashing the rest of the system. MicroDrivers [Gan08] splits drivers into a privileged kernel part and an unprivileged user part at the cost of increased performance overhead. RVM [Wil08] executes device drivers with limited privilege in user space, where all interactions between the driver and the device are constrained by a reference monitor built from a customized device safety specification.

When deprivileging the KVM execution, we share a similar motivation with those efforts. However, a hosted hypervisor module is more than a traditional device driver, and its deprivileged execution poses additional challenges. In particular, a hosted hypervisor has a richer set of special privileged instructions to execute than a driver. As a result, earlier approaches, such as the way the IOMMU is employed in SUD [BWZ10], may not be applicable to hypervisors. In addition, the hosted hypervisor differs from traditional device drivers in its unique host-guest world switching operations and its need for hardware-assisted memory virtualization. Supporting them requires new design and implementation considerations (Sections 5.2 and 5.3). Specifically, VMCS caching and memory rebasing are unique to our DeHype system and allow for efficient deprivileged execution without sacrificing security.

CHAPTER 3

ISOLATING ANDROID MALWARE WITH AIRBAG

3.1 Introduction

Smartphone sales have recently experienced explosive growth. Canalys [Can] reports that 2011 marked the first time in history that smartphones outsold personal computers. Their incredible popularity can be partially attributed to their improved functionality and convenience for end users. In particular, they are no longer basic devices for making phone calls and receiving text messages, but powerful platforms, with computing and communication capabilities comparable to commodity PCs, for GPS navigation, web surfing, and even online business. Among competing smartphone platforms, Google’s Android has gained dominance, with more than half of all smartphones shipped to end users running Android [Com].

One key appealing factor of smartphone platforms is the availability of a wide range of feature-rich mobile applications (“apps”). For instance, by September 2012, Google Play [Goo] and Apple App Store [App] were home to more than 650,000 and 700,000 apps, respectively. The centralized model of mobile marketplaces not only greatly helps developers to publish their mobile apps, but also streamlines the process for mobile users to browse, download, and install apps, hence boosting smartphone popularity.

With the increased number of smartphone users, malware authors are also attracted to the opportunity to widely spread mobile malware. As an example, the DroidDream malware infected more than 260,000 devices within 48 hours before Google took action to remove it from the official Android Market (now Google Play) [Dro]. Considering these threats, mobile platform providers have developed server-side vetting processes to detect or remove malicious apps from centralized marketplaces in the first place. With varying levels of success, many malicious apps are identified and removed from marketplaces. However, these processes are far from ideal, as malware authors can still find new ways to penetrate current marketplaces and upload malicious apps.

From another perspective, a number of client-side solutions have been developed. As a mobile platform provider, Google provides the Android security architecture, which sandboxes apps based on their permissions and runs them as separate user identities. However, this is still insufficient, as malicious apps may masquerade as legitimate apps but request (and abuse) additional permissions [Fel11] to access protected smartphone functionality or private information. In the face of these threats, traditional software security vendors have developed corresponding mobile anti-malware software. Given their inherent dependence on known malware signatures, these products are largely ineffective against new malware. To mitigate this, Aurasium [Xu12] is proposed to enforce certain access control policies on untrusted apps. However, it requires repackaging apps to enable the enforcement, and the enforcement is still ineffective against attacks launched from native code. L4Android [Lan11] and Cells [And11] take a virtualization-based approach that allows multiple virtual smartphones to run side-by-side on a single physical device.
However, they are mainly designed to embrace the new “bring-your-own-device” (BYOD) paradigm, and the offered isolation, drawn at the virtual smartphone boundary, is too coarse-grained. For mobile users, it is desirable to have a lightweight solution that can strictly confine untrusted apps (including ones with native code or root exploits) at the app boundary.

In this chapter, we present the design, implementation, and evaluation of AirBag, a new client-side solution that leverages lightweight OS-level virtualization to significantly boost our defense capability against mobile malware infection. Specifically, as a client-side solution, AirBag assumes a trusted smartphone OS kernel and considers that users may unintentionally download and install malicious apps (that somehow manage to penetrate the vetting processes of mobile marketplace curators). To strictly isolate such apps and prevent them from compromising normal phone functionalities such as SMSs or phone calls, AirBag dynamically instantiates an isolated virtual environment to ensure their transparent “normal” execution, and further mediates their access to various system resources or phone functionalities. Therefore, any damage that untrusted apps may inflict will be strictly isolated within the virtualized environment.

To provide a seamless user experience, AirBag is designed to run behind the scenes and transparently support mobile apps when they are downloaded, installed, or executed. Specifically, when a user installs (or sideloads) an app, the app will be automatically isolated within an AirBag environment. Inside AirBag, the app is prohibited from interacting with legitimate apps and system daemons running outside. To accommodate its normal functionality, AirBag provides a (decoupled) App Isolation Runtime (AIR) whose purpose is to separate the app from the native Android runtime, but still allow the isolated app to run as if it were installed normally. Further, users can choose to run AIR in three different modes: (1) “incognito” is the default mode that will completely remove personally-identifying information about the phone (e.g., IMEI) or users (e.g., Gmail accounts) to avoid unnecessary information leakage; (2) “profiling” mode will log detailed execution traces (in terms of invoked Android APIs or functionalities) for subsequent offline analysis; (3) “normal” mode will essentially execute the app without further instrumentation. For other normal phone features (e.g., networking and telephony), the AIR proxies related API calls to the external native Android runtime through an authenticated communication channel.1 This brings us new opportunities to apply fine-grained access control on the isolated app (e.g., prompting users for outgoing SMSs or phone calls) without repackaging the app itself or affecting the native Android runtime. Besides, the default mode (“incognito”) of AirBag allows users to “test” an app in the isolated runtime before running it in the native runtime. Throughout the “test” phase, users can check whether the app has any abnormal or malicious behavior with the fine-grained access control logs provided by AirBag. This prevents end users from installing malicious apps in the first place.
On the other hand, users can also use the “profiling” mode to gather detailed information about the malicious apps identified (in “incognito” mode) for analysis.

To develop a robust AirBag mechanism and strictly confine untrusted apps, the conventional wisdom is to encapsulate their execution in a separate virtual machine (VM) that is isolated from the rest of the system. However, challenges exist in creating a lightweight virtual machine for commodity mobile devices. In particular, current mobile devices are typically resource constrained, with limited CPU, memory, and battery capacity. Moreover, most off-the-shelf mobile devices do not have processors with hardware virtualization support, which makes traditional virtualization approaches less desirable [VH11]. As our solution, AirBag takes a lightweight OS-level virtualization approach but still obtains comparable isolation capability. Specifically, by sharing one single OS kernel instance, our approach scales better than traditional hypervisors and incurs minimal performance overhead. Also, by providing a separate namespace and virtualizing necessary system resources, AirBag still achieves comparable isolation.

We have implemented a proof-of-concept prototype on three mobile devices, Google Nexus One, Nexus 7, and Samsung Galaxy S III, running Linux kernel 2.6.35.7, 3.1.10, and 3.0.8, respectively. To ensure seamless but confined execution of untrusted apps, our prototype builds the app isolation runtime, or AIR, by leveraging the Android Open Source Project (AOSP 4.1.1) to export the same interface while allowing users to choose different running modes. Specifically, the “incognito” mode prevents personally-identifying information from being leaked, while the “profiling” mode logs the untrusted app behavior, which we find helpful to analyze malicious apps (Section 5.4) in a live phone setting. Security analysis as well as the evaluation with more than a dozen real-world mobile malware samples demonstrate that our system is effective and practical. The performance measurement with a number of benchmark programs further shows that our system introduces very low performance overhead.

1 A network connection that relies on authentication protocols to provide secure communication.

3.2 Design

3.2.1 Design Goals and Threat Model

Our system is designed to meet three requirements. First, AirBag should reliably confine untrusted apps such that any damage they may cause is isolated without affecting the native phone environment. The challenge in realizing this goal comes from the fundamentally open design of Android, which implies that any app is allowed to communicate with other apps or system daemons running on the phone (through built-in IPC mechanisms). In other words, once a malicious app is installed, it has a wide attack surface from which to launch an attack. The presence of privilege escalation or capability leak vulnerabilities [Gra12] further complicates the confinement requirement. Second, AirBag should achieve a safe and seamless user experience throughout the lifespan of untrusted apps, from their installation to removal. Specifically, from the user’s perspective, AirBag should avoid placing additional burden on users. Correspondingly, the challenge in meeting this goal is to transparently instantiate AirBag’s app isolation runtime when an untrusted app is being installed and to seamlessly switch between runtime environments when the untrusted app is launched or terminated. Third, because AirBag is deployed on resource-constrained mobile devices, it should remain lightweight and introduce minimal performance overhead. In addition, AirBag should be generically portable to a range of mobile devices without relying on special hardware or features (that may be limited to certain phone models).

Threat Model and System Assumption

We assume the following adversary model while designing AirBag: Users will download and install third-party untrusted apps. These apps may attempt to exploit vulnerabilities, especially those in privileged system daemons such as Zygote.


Figure 3.1 An Overview of AirBag to Confine Untrusted Apps: (a) Current Android Architecture; (b) AirBag-Enhanced Android Architecture

By doing so, they could cause damage by either gaining unauthorized access to various system resources or abusing certain phone functionalities in a way not permitted by, or not known to, the user. Meanwhile, we assume a trusted smartphone OS kernel, including our lightweight OS extension to support isolated namespaces and virtualized system resources. As a client-side solution, AirBag relies on this assumption to establish the necessary trusted computing base (TCB). This assumption is also shared by other OS-level virtualization research efforts [Lan11; And11]. With that, we consider the threat of corrupting OS kernels to fall outside the scope of this work.

3.2.2 Enabling Techniques

In Figure 3.1, we show an overview of how AirBag confines untrusted apps and compare it with the traditional Android architecture. The confinement is mainly achieved through three key techniques: a decoupled app isolation runtime (AIR), namespace/filesystem isolation, and context-aware device virtualization.

3.2.2.1 Decoupled App Isolation Runtime (AIR)

Due to the open design of Android, all apps share the same Android runtime and consequently any app is allowed to communicate with other apps on the phone. As mentioned earlier, from the security perspective, this exposes a wide attack surface. In AirBag, to minimize the attack surface and avoid affecting the original Android runtime, we choose to decouple the untrusted app execution from it. For untrusted app execution, we instantiate a separate app isolation runtime that allows apps to run on it and has (almost) no interaction with the original Android runtime.


There are several benefits behind such a design. First, by providing a consistent Android abstraction layer that will be invoked by third-party Android apps, AIR effectively ensures proper execution of untrusted apps without impacting the original Android runtime. Second, by design, AIR does not need to be trusted, as it might be compromised by untrusted apps. Third, a separate app isolation runtime also allows for customization to support different running modes (Section 3.2.3). This is necessary because AIR mainly consists of essential Android framework classes and other service daemons that are tasked with managing various phone resources (e.g., device ID) or features (e.g., sensors). As a result, they likely access private or sensitive information that could be of concern when exposed to untrusted apps.

3.2.2.2 Namespace/Filesystem Isolation

With a separate Android runtime to host untrusted apps, AirBag also provides a different namespace and filesystem to further restrict and isolate the capabilities of processes running inside. Because of namespace and filesystem isolation, an untrusted app inside AirBag is not able to “see” or interact with other processes (e.g., legitimate apps and system daemons) running outside. In fact, all processes running inside have their own view of running PIDs, which is completely different from that of external processes. In addition, to proactively contain possible damage, AirBag has its own filesystem, separate from that of the normal system. For storage efficiency, we extensively leverage unionfs [Qui06] to compose AirBag’s filesystem and isolate modifications from untrusted apps.

To elaborate, when an Android system is loaded, a number of service processes or daemons (e.g., vold, binder, and servicemanager) are created. Inside AirBag, we similarly launch the same subset of processes but group them in their own cgroup [Cgr]. By doing so, they are prevented from observing and interacting with processes in another group (i.e., processes in the original native Android system). The cgroup concept greatly facilitates AirBag management. Specifically, the set of processes inside AirBag is typically suspended until an untrusted app is installed or launched. The newly installed untrusted app automatically becomes a member of this cgroup. As a result, we can easily suspend the whole cgroup when no untrusted app is active to minimize its footprint and reduce performance overhead and power consumption. Note that cgroup support is provided by the OS kernel and is assumed to be trusted.
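The suspend-and-resume behaviour described above can be approximated with the cgroup-v1 freezer controller. The sketch below is an illustration under stated assumptions rather than AirBag's actual code: the helper name airbag_set_frozen and the cgroup directory passed to it are hypothetical, and the text does not say which mechanism the prototype uses to suspend the group. The freezer.state file and its FROZEN/THAWED keywords are the standard cgroup-v1 freezer interface.

```c
#include <stdio.h>

/* Hypothetical helper: freeze (frozen != 0) or thaw (frozen == 0) every
 * process in the cgroup-v1 freezer group rooted at cgroup_dir.
 * Returns 0 on success and -1 on failure. */
int airbag_set_frozen(const char *cgroup_dir, int frozen)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/freezer.state", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path does not fit the buffer */

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;                      /* group missing or no permission */

    /* The freezer controller accepts these two keywords. */
    int ok = fputs(frozen ? "FROZEN" : "THAWED", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;                         /* the write may fail at close time */
    return ok ? 0 : -1;
}
```

A caller would freeze the group when the last untrusted app loses focus and thaw it before the next one is launched. The directory is whatever path the freezer hierarchy is mounted at on the device, which is not specified in the text.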

3.2.2.3 Context-Aware Device Virtualization

The presence of a separate AIR and namespace in AirBag unavoidably creates contention for underlying system resources, even though AirBag delineates a boundary and by default disallows any interaction from inside to outside and vice versa. To resolve the contention, there is a need to multiplex various system resources. In our design, we develop a lightweight OS-level extension to mediate and multiplex the accesses from the native and AirBag runtimes.

As an example, suppose two apps need to update the screen at the same time. Traditionally, a single service, SurfaceFlinger, is in charge of synthesizing data from different sources (including these two apps) and generating the final output to be rendered on the device screen. However, with AirBag, these two apps run in two different runtimes and they do not share the same SurfaceFlinger service. Instead, AirBag has its own SurfaceFlinger service, which independently updates the screen.

Our solution is to virtualize hardware devices in a context-aware manner. Specifically, our lightweight OS extension adds the necessary multiplexing and demultiplexing mechanisms where the underlying hardware devices are accessed. Also, our extension keeps track of the current “active” Android runtime (or namespace) and always allows the active runtime to access the hardware resources. Note that an Android runtime is active if an app on it holds the focus, i.e., the user is currently interacting with the app. To maintain the same user experience, we disallow a user from simultaneously interacting with two apps in different runtimes. As a result, at any particular moment, there exists at most one active runtime. Meanwhile, to gracefully handle contending accesses from the inactive runtime, we take different strategies based on the nature of the relevant hardware resources. For example, for the touchscreen and buttons, any press/release event is always delivered to the active runtime only.
For screen updates, as the framebuffer device driver performs the actual DMA operations from a memory segment to the LCD controller hardware, we accordingly prepare two separate memory segments such that each environment can independently render different output without interfering with the other. The framebuffer driver can then choose the active memory segment to perform DMA and thus have actual access to the LCD controller hardware.
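The active-runtime bookkeeping described above can be pictured with a small user-space model. The following sketch is only an illustration under stated assumptions: the type and function names (display_mux, mux_scanout_source, and so on) are hypothetical, and plain byte arrays stand in for the framebuffer memory segments that a real driver would hand to the DMA engine.

```c
#include <stdint.h>

/* The two runtimes that contend for the screen and the input devices. */
enum runtime_id { RUNTIME_NATIVE = 0, RUNTIME_AIRBAG = 1, RUNTIME_COUNT = 2 };

struct display_mux {
    uint8_t *fb[RUNTIME_COUNT];   /* one framebuffer segment per runtime */
    enum runtime_id active;       /* runtime whose app holds the focus */
};

/* Each runtime always renders into its own segment, active or not. */
uint8_t *mux_render_target(struct display_mux *m, enum runtime_id who)
{
    return m->fb[who];
}

/* Only the active runtime's segment is chosen as the DMA source. */
const uint8_t *mux_scanout_source(const struct display_mux *m)
{
    return m->fb[m->active];
}

/* Press/release events are delivered to the active runtime only. */
int mux_accepts_input(const struct display_mux *m, enum runtime_id who)
{
    return who == m->active;
}

/* Called when focus moves between runtimes, e.g. when an app stub is
 * launched or the isolated app is terminated. */
void mux_set_active(struct display_mux *m, enum runtime_id who)
{
    m->active = who;
}
```

In this model an inactive runtime can keep drawing into its own segment without its output ever reaching the screen, which is the property the two-segment design is meant to provide.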

3.2.3 Additional Capabilities

Besides the above key techniques, we also developed additional capabilities to facilitate the confinement and improve the user experience.

3.2.3.1 Incognito/Profiling Modes

The decoupled AIR that hosts untrusted apps provides unique opportunities for customization. Specifically, to prevent private information disclosure, we introduce the incognito mode, which essentially instruments the AIR to exclude any sensitive data such as the IMEI number, phone number, and contacts. For example, the device’s IMEI number can normally be retrieved by apps through the services provided by the Android framework. When entering the incognito mode, such services are configured to return a fake IMEI number to the calling app. Therefore, the isolated app transparently proceeds with fake data without additional risks. Also, AirBag prepares a separate root filesystem that allows for a convenient “restore to default” to undo damage from untrusted apps. In addition, we provide the profiling mode, which essentially records the execution trace of untrusted apps. The trace is mainly collected in terms of the Android-specific logcat, which turns out to be very helpful for malware analysis (Section 5.4).
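The identifier substitution performed in incognito mode can be illustrated with a few lines of C. This is a minimal sketch, not the AIR's implementation (which lives in the Android framework services): the names air_identity and air_identity_for_app are hypothetical, and the placeholder values are invented for the example because the text does not state which fake identifiers are returned.

```c
/* Identifiers that an app can normally query from the framework. */
struct air_identity {
    const char *imei;
    const char *phone_number;
};

/* Placeholder values chosen for this sketch only. */
static const struct air_identity AIR_FAKE_IDENTITY = {
    "000000000000000",
    "0000000000"
};

/* Returns the identity that the isolated app is allowed to observe:
 * the fake one in incognito mode, the real one otherwise. */
const struct air_identity *air_identity_for_app(int incognito,
                                                const struct air_identity *real)
{
    return incognito ? &AIR_FAKE_IDENTITY : real;
}
```

Because the substitution happens inside the service that answers the query, the calling app needs no modification and cannot tell from the call itself that the data is fake.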

3.2.3.2 User Confirmation for Sensitive Operations

The decoupled AIR also provides interesting opportunities to further limit the capabilities of isolated apps. For example, a malicious app may attempt to stealthily send SMS text messages to certain premium-rate numbers or record phone conversations. When such an app runs inside AirBag, access to the related phone features (e.g., radio, audio, and camera) immediately prompts the user for approval. In other words, the stealthy behavior of these apps is now brought to the user’s attention, and the user has the option to disallow it. It is interesting to note that the latest Android release, i.e., Jelly Bean 4.2, introduces a built-in security feature called premium SMS confirmation [Pre] to prevent malware from racking up phone bills. While achieving similar goals, AirBag differs in that it restricts access to certain phone features from outside the AIR environment, thus providing stronger robustness than any inside solution (as an internal built-in feature can potentially be compromised and circumvented by untrusted apps).
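The confirmation step can be sketched as a gate that sits on the native side of the proxy channel. The code below is a hypothetical model, not AirBag's code: the enum values, the callback type confirm_fn, and the function air_proxy_allows are assumptions, as is the choice to let plain networking through without a prompt. The text names only radio, audio, and camera as examples of features that trigger approval.

```c
#include <stddef.h>

enum phone_feature { FEATURE_NETWORK, FEATURE_RADIO, FEATURE_AUDIO, FEATURE_CAMERA };

/* Callback that asks the user; returns non-zero if the user approves. */
typedef int (*confirm_fn)(enum phone_feature feature, void *user_data);

static int is_sensitive(enum phone_feature f)
{
    return f == FEATURE_RADIO || f == FEATURE_AUDIO || f == FEATURE_CAMERA;
}

/* Returns 1 if a request from the isolated app may be forwarded to the
 * native runtime and 0 if it must be dropped. */
int air_proxy_allows(enum phone_feature f, confirm_fn ask_user, void *user_data)
{
    if (!is_sensitive(f))
        return 1;
    if (ask_user == NULL)
        return 0;                 /* fail closed when nobody can be asked */
    return ask_user(f, user_data) != 0;
}

/* Two trivial policies, useful for exercising the gate. */
int confirm_always(enum phone_feature f, void *user_data)
{
    (void)f; (void)user_data;
    return 1;
}

int confirm_never(enum phone_feature f, void *user_data)
{
    (void)f; (void)user_data;
    return 0;
}
```

Placing the check outside the isolated runtime reflects the robustness argument above: a compromised AIR cannot skip a decision that it does not make.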

3.2.3.3 Seamless Integration

To achieve a seamless user experience, AirBag introduces minimal user interaction when an app is being installed or launched. Specifically, when an untrusted app is being installed (or sideloaded), AirBag prompts the user with a (default) option to install it inside AirBag. If this option is chosen, AirBag notifies its own PackageInstaller to start the installation.2 Note that for an app downloaded from the Internet, the Android DownloadManager stores it in a specific directory located on the microSD card. In our prototype, we choose to export this directory read-only to AirBag so that its PackageInstaller can access it for installation. For improved user experience, AirBag will be installed as the default PackageInstaller. Inside AirBag, we have a daemon that listens for commands from it to kick off the internal app installation. In other words, the isolated apps are physically installed in AirBag instead of the original Android runtime. Moreover, for any app being installed inside AirBag, AirBag automatically creates an app stub that bears the same icon as the original app. (To indicate that the app is actually inside AirBag, we attach a lock sign to the icon.) When the app stub is invoked, AirBag is notified to seamlessly launch the actual app such that the user feels just like invoking a normal app (without noticing that it is actually running inside AirBag). By doing so, the AIR becomes active and the original Android runtime becomes inactive. Once the user chooses to terminate the app, the original Android runtime is resumed and becomes active again.

2 If not chosen, the normal installation procedure will be triggered without AirBag protection.

3.3 Implementation

We have implemented a proof-of-concept AirBag prototype on three different mobile devices, i.e., Google Nexus One, Nexus 7, and Samsung Galaxy S III, running Linux kernel 2.6.35.7, 3.1.10, and 3.0.8, respectively. Our prototype is portable and does not rely on any specialized hardware support. In the following, we present our prototype in detail. For simplicity, unless explicitly mentioned, we use the Google Nexus One as the reference platform.

3.3.1 Namespace/Filesystem Isolation

Our system confines untrusted apps in a separate namespace and filesystem. In our prototype, we leverage and extend the namespace isolation feature of cgroups [Cgr] in mainstream Linux kernels. At a high level, our prototype instantiates a new namespace and then starts the very first process (i.e., airbag_init) inside AirBag. The airbag_init process then bootstraps the entire AIR. Specifically, the new namespace of AirBag is created by cloning a new process with a few specific flags: CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWIPC, CLONE_NEWUTS, and CLONE_NEWNET. Further, right before switching control to the airbag_init program, we initialize a separate root filesystem for the newly clone’d process (and its descendant processes) by invoking pivot_root in the new root directory that contains essential AIR files. We then prepare and filesystems inside AirBag so that subsequent processes inside AirBag can properly interact with the underlying Linux kernel. After that, we yield control by actually executing the airbag_init program, which then kicks off the entire AIR, including various service daemons (e.g., SurfaceFlinger and system_server). These service daemons as well as essential Android framework classes collectively allow untrusted apps to execute transparently when they are dispatched to the AIR.

With a new AirBag-specific namespace, processes running inside cannot observe or interact with processes running outside. However, some features (mainly for improved user experience) may require inter-namespace communication. Specifically, when installing an untrusted app, our PackageInstaller needs to notify AirBag for seamless installation. To achieve that, we virtualize a network device [Vet] inside AirBag and connect it to a pre-allocated bridge interface on the native Android system. By building such an internal channel for “inter-namespace” communication, we can naturally enable networking and telephony support inside AirBag.

Figure 3.2 Framebuffer Virtualization in AirBag (Nexus One). [Diagram labels: Native Runtime (screen updater, pmem, framebuffer image memory); AirBag Runtime (screen updater’, pmem’, framebuffer image memory’); GPU; framebuffer driver (DMA); time-sharing active/inactive.]

By instantiating two different namespaces on the same kernel, our prototype needs to keep track of the currently active namespace, which is needed to enable context-aware device virtualization (Section 3.3.2). Specifically, we need to export the related namespace information to the corresponding OS components (e.g., framebuffer/GPU drivers) such that they can properly route or handle hardware device accesses from different namespaces. For instance, when a user-level process requests to update the framebuffer, we need to update the respective memory blocks associated with its namespace in the OS kernel. Fortunately, when a process is clone’d with the CLONE_NEWNS flag, an instance of struct nsproxy is allocated in the Linux kernel to store information such as the utsname and filesystem layout of the new namespace. Given that all processes belonging to the same namespace share the same nsproxy data structure, our current prototype simply uses it as the namespace identifier. When a process accesses system resources (e.g., via ), we consult the nsproxy pointer of its task_struct via the current pointer and use it to guide proper access to virtualized system resources. For bookkeeping purposes, we maintain an internal mapping table that records the related nsproxy pointer for each namespace. In our prototype, we find it sufficient to support two namespaces, one for the native Android runtime and another for AirBag. The corresponding entry is dynamically created when the respective first process (i.e., init or airbag_init) is launched.
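The namespace creation step described in this section, cloning a process with the five CLONE_NEW* flags, can be sketched in user space as follows. This is a hedged illustration and not the prototype's code: the function names airbag_clone_flags, airbag_child, and airbag_spawn are hypothetical, the child exits immediately instead of calling pivot_root and executing airbag_init, and the call needs CAP_SYS_ADMIN to succeed. The clone() wrapper and the flag constants are the standard glibc and Linux ones.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes clone() and CLONE_NEW* in glibc */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

#define AIRBAG_STACK_SIZE (1024 * 1024)

/* The five namespace flags named in the text, plus SIGCHLD so that the
 * parent can wait for the child like a normally forked process. */
int airbag_clone_flags(void)
{
    return CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWIPC |
           CLONE_NEWUTS | CLONE_NEWNET | SIGCHLD;
}

/* Hypothetical entry point of the first process in the new namespaces.
 * A real implementation would set up the root filesystem (pivot_root)
 * and then exec airbag_init; this sketch exits immediately. */
static int airbag_child(void *arg)
{
    (void)arg;
    return 0;
}

/* Returns the child's PID as seen by the caller, or -1 on failure
 * (for example EPERM when the caller lacks CAP_SYS_ADMIN). */
pid_t airbag_spawn(void)
{
    char *stack = malloc(AIRBAG_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downwards on ARM and x86, so pass its top. */
    pid_t pid = clone(airbag_child, stack + AIRBAG_STACK_SIZE,
                      airbag_clone_flags(), NULL);

    /* Without CLONE_VM the child runs on its own copy of the address
     * space, so the parent's allocation can be released right away. */
    free(stack);
    return pid;
}
```

A caller is expected to reap the child with waitpid() on the returned PID. Inside the child, getpid() returns 1 because of CLONE_NEWPID, which matches the role of airbag_init as the first process of the isolated runtime.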

3.3.2 Context-Aware Device Virtualization

Our prototype permits contending accesses from the two running namespaces. To accommodate them, AirBag multiplexes their accesses to various system resources in a way that is transparent to user-level apps (so that the normal user experience is not compromised). Table 3.1 lists the virtualized hardware devices supported in AirBag. Due to space limits, we explain six representative hardware devices in more detail.


Table 3.1 Supported Android Hardware Devices in AirBag

Hardware Device   Description
Audio             Audio Playback and Capture
Framebuffer       Display Output
GPU               Graphics Processor
Input             Touchscreen and Buttons
IPC               Binder IPC Framework
Networking        WiFi Network Interface
pmem              Physical Memory Allocator
Power             Power Management (Suspend/Resume)
RTC               Real Time Clock
Sensors           Temperature, Accelerometer, GPS
Telephony         Cellular Radio (GSM, CDMA)

3.3.2.1 Framebuffer/GPU

In AirBag, one of the most important devices to virtualize is the device screen, which involves the respective framebuffer and GPU. Specifically, in Android, all the visual content to be shown by running apps is synthesized by the screen updater (SurfaceFlinger) into the framebuffer memory, which is allocated from the OS kernel but mapped to userspace. Any update triggers the framebuffer driver to issue DMA operations and display the synthesized image on the device screen. Since we have only one device screen and there exist two screen updaters from two different namespaces, we need to regulate which one gains actual access to the screen. For isolation purposes, our prototype allocates a second framebuffer memory exclusively for the AIR so that each updater can update its own framebuffer without affecting the other. The underlying hardware driver, however, delivers only the framebuffer from the active namespace to the screen. In our prototype, since the framebuffer memory is mapped into the GPU’s private page table and the page table can be dynamically updated at runtime, we choose to activate in the GPU only the framebuffer memory from the active runtime.

Our solution works well on all three experimented mobile devices. However, the prototype on Nexus One deserves additional discussion. To efficiently manage and allocate physical memory for the GPU, the Android support on Nexus One has a physical memory allocator called pmem. The user-level screen updater requests physical memory from the /dev/pmem device. In order for the GPU and the upper-layer screen updater to render on the screen, a 32MB contiguous physical memory block has been reserved for /dev/pmem. With two instantiated runtimes, an intuitive solution is to double the memory reservation and dynamically allocate the first half for the original Android runtime and the second half for the AIR. In fact, we did implement this approach but painfully realized that there is also a lot of other meta information associated with /dev/pmem, which would also need to be decoupled for namespace awareness. For portability, we aim to avoid changing the internal logic. We therefore devised another solution that creates a separate /dev/pmem device for each namespace (while still doubling the memory reservation). From the perspective of the upper-layer runtime, it is still accessing the same /dev/pmem device. In our OS extension, however, we dynamically map the device file to /dev/pmem_native or /dev/pmem_airbag, respectively, to maintain transparency and consistency inside the original pmem driver as well as the upper-layer screen updaters. In Figure 3.2, we summarize the interaction between the screen updaters, decoupled pmem devices, GPU, and framebuffer drivers on our Nexus One prototype.
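The two ideas in this subsection, per-namespace framebuffers with only the active one reaching the screen, and the transparent remapping of /dev/pmem, can be sketched in a few lines. This is a simplified user-space model, not the driver code: the function resolve_device, the class Screen, and the namespace labels "native" and "airbag" are illustrative assumptions; only the device paths come from the text.

```python
# Per-namespace backing devices named in the text.
PMEM_ALIASES = {"native": "/dev/pmem_native", "airbag": "/dev/pmem_airbag"}


def resolve_device(path, namespace):
    """Map the shared name /dev/pmem to the caller's own device.

    Every other path is returned unchanged, so the upper-layer runtime
    keeps opening /dev/pmem without knowing about the split.
    """
    if path == "/dev/pmem":
        return PMEM_ALIASES[namespace]
    return path


class Screen:
    """Two framebuffers, one physical screen (hypothetical model)."""

    def __init__(self):
        self.framebuffers = {"native": b"", "airbag": b""}
        self.active = "native"

    def update(self, namespace, image):
        # Each screen updater writes only to its own framebuffer.
        self.framebuffers[namespace] = image

    def scanout(self):
        # Only the active namespace's framebuffer is shown.
        return self.framebuffers[self.active]
```

In this model an update from the inactive namespace still succeeds, but it becomes visible only after that namespace is made active.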

3.3.2.2 Input Devices

After creating a distinct framebuffer for each namespace, our next step is to appropriately deliver events from various input devices (e.g., touchscreen, buttons, and trackball) to the right namespace. Conveniently, the Linux kernel provides a generic layer, i.e., evdev (event device), which connects various input device drivers to upper-layer software components. The presence of such a layer makes our prototype relatively straightforward. Specifically, the Android runtime (or its service daemons) listens to input events (e.g., touchscreen and trackball) by registering itself as a client, represented as an evdev_client in the OS kernel. When the underlying driver is notified of a pending input event from hardware (e.g., a tap on the touchscreen), the event is delivered to all the registered clients. Therefore, upon input event registration, we record the client's namespace in the evdev_client data structure. When an input event occurs, similar to the framebuffer driver, we deliver it only to the registered clients from the active namespace. In other words, clients from the inactive namespace are not notified of the event.
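The filtering rule above is simple enough to state as code. The following is a minimal sketch with hypothetical names (EvdevClient here is a plain Python class standing in for the kernel's evdev_client, and deliver is not a kernel function):

```python
class EvdevClient:
    """Stand-in for a registered input client, tagged with its namespace."""

    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace  # recorded at registration time
        self.events = []


def deliver(event, clients, active_namespace):
    """Deliver an input event only to clients in the active namespace.

    Returns the names of the clients that received the event.
    """
    delivered = []
    for client in clients:
        if client.namespace == active_namespace:
            client.events.append(event)
            delivered.append(client.name)
    return delivered
```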

3.3.2.3 IPC

After handling basic input and (screen) output devices, we find they are still insufficient to properly set up the AIR environment. It turns out that the problem is due to the custom IPC mechanism in Android. Specifically, unlike the traditional Linux IPC that is already isolated by different namespaces (or cgroups), a custom IPC driver named binder is developed in Android. With the binder driver, a special daemon servicemanager will register itself as the binder context manager during the loading process of Android. After that, various service providers will register themselves (via addService) so that other service users can look up and ask for their services (via getService). Note that all these operations are performed by passing IPC messages through /dev/binder.


To virtualize /dev/binder, we create a separate context manager for the AIR so that all subsequent service registrations and lookups are performed independently within AirBag. In our prototype, we have similarly created an array of context managers indexed by namespace. With that, both the native runtime and the AIR have their own servicemanager daemons registered as context managers that handle follow-up addService/getService operations independently, such that all inter-app communications (e.g., intents) are fully supported within AirBag. Also, because binder is the first system resource the Android runtime acquires, we can conveniently treat the moment when the device file /dev/binder is opened as the indication that a new namespace needs to be created.
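As a rough illustration of per-namespace context managers, the sketch below keeps one service registry per namespace, so a lookup never crosses the namespace boundary. All class and method names are hypothetical; the real binder driver works on IPC messages and object handles, which this model reduces to a dictionary.

```python
class ContextManager:
    """Service registry owned by one servicemanager instance."""

    def __init__(self):
        self._services = {}

    def add_service(self, name, handle):
        self._services[name] = handle

    def get_service(self, name):
        return self._services.get(name)  # None if not registered here


class BinderDriver:
    """Holds one context manager per namespace (illustrative only)."""

    def __init__(self):
        self._managers = {}

    def become_context_manager(self, namespace):
        if namespace in self._managers:
            raise RuntimeError("context manager already set for " + namespace)
        self._managers[namespace] = ContextManager()

    def manager_for(self, namespace):
        return self._managers[namespace]
```

A service registered by the native runtime is therefore invisible to a getService call issued inside AirBag, and vice versa.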

3.3.2.4 Telephony

The telephony support in Android largely relies on a service daemon, rild, which loads a vendor-proprietary library (e.g., libhtc_ril.so) for controlling the underlying hardware. In particular, the Java class com.android.internal.telephony.RIL of the Android runtime communicates with rild via a Unix domain socket (created by rild) to proxy various telephony services. To support necessary telephony functions inside the AIR without access to vendor-specific source code, we choose to multiplex the hardware access at the user-level rild. Specifically, in our prototype, we create a TCP socket alongside the normal Unix domain socket in the rild that runs in the native runtime. The new TCP socket is used to accept incoming connections from the com.android.internal.telephony.RIL class inside AirBag (Figure 3.3). Accordingly, the rild inside AirBag is disabled (by adjusting the internal startup script init.rc). By design, our current prototype allows outgoing phone calls from AirBag, but any incoming phone call is automatically answered in the native runtime.³

3.3.2.5 Audio

For the audio device, we find the support on Nexus One straightforward as it exports a device file /dev/q6dsp that allows for concurrent accesses. However, the support on Nexus 7 and Galaxy S III is more complicated. Both devices adopt the standard ALSA-based audio driver [Als] in the OS kernel, which allows only one active audio stream. In other words, if one namespace is currently accessing the device, the other is not able to access it: a process trying to access the audio device is put into a wait queue while the device is in use. In our prototype, we take an approach similar to the one used for the /dev/pmem device. Specifically, we add a separate virtual audio stream for each namespace so that it will maintain exclusive use within

³ If the native runtime is not active when an incoming phone call is received, we automatically activate it to achieve the same level of user experience.


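Figure 3.3 Telephony Virtualization in AirBag. [Diagram: the telephony service in the native runtime connects to the telephony daemon through a Unix domain socket, while the telephony service inside AirBag connects through an Internet socket; the daemon multiplexes both onto the vendor library and the hardware.]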

respective namespace. The virtual audio stream from the active namespace is bound to the hardware audio stream at runtime. For example, in ALSA, an ioctl operation, i.e., SNDRV_PCM_IOCTL_WRITEI_FRAMES, is used to send audio data to the device. Such an ioctl from the inactive runtime silently returns without actually sending data to the hardware. For other ioctls that retrieve or update hardware states, such as SNDRV_PCM_IOCTL_SYNC_PTR, we maintain a per-namespace cache of the latest states, which is applied to the hardware when the namespace becomes active. When an inactive namespace becomes active, it is allowed to preempt the use of the audio device.
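The behavior of the virtual audio streams can be modeled as follows. This is a hedged sketch, not the ALSA driver patch: AudioMux and its methods are hypothetical, the "hardware" is a pair of Python containers, and the choice to report a full write for dropped frames is an assumption made so that the caller sees no error.

```python
class AudioMux:
    """One virtual stream per namespace, one hardware stream (toy model)."""

    def __init__(self, namespaces, active):
        self.cached_state = {ns: {} for ns in namespaces}
        self.active = active
        self.hw_frames = []  # frames that actually reached the hardware
        self.hw_state = {}   # state currently programmed into the hardware

    def write_frames(self, namespace, frames):
        """Model of a frame-writing ioctl such as WRITEI_FRAMES."""
        if namespace == self.active:
            self.hw_frames.extend(frames)
        # Assumption: inactive writers are told the write succeeded.
        return len(frames)

    def set_state(self, namespace, key, value):
        """Model of a state-updating ioctl: always cached, applied if active."""
        self.cached_state[namespace][key] = value
        if namespace == self.active:
            self.hw_state[key] = value

    def activate(self, namespace):
        """Bind a namespace's virtual stream to the hardware stream."""
        self.active = namespace
        self.hw_state = dict(self.cached_state[namespace])
```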

3.3.2.6 Power Management

The presence of two runtimes also complicates power management. For example, when an untrusted game app runs inside AirBag for a while, the native runtime may time out and attempt to perform early suspend on the entire phone, which includes turning off the screen. To avoid causing inconvenience, our current prototype disables all power-related operations from AirBag. In other words, we only allow the native runtime to turn off or dim the screen. To prevent the native runtime from sleeping while AirBag is active, AirBag acquires a wakelock [Wak] in the native runtime before the AIR is activated. The AIR still maintains its own timeout for turning off the screen, but instead of actually turning off the screen, it releases the wakelock. Likewise, when the app inside AirBag terminates, AirBag releases the wakelock and yields control back to the native runtime.
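The wakelock protocol in this subsection can be summarized as a small state machine. The sketch below is an interpretation of the prose, with hypothetical names throughout (PowerPolicy, request_suspend, and so on); it does not use the Android wakelock API.

```python
class PowerPolicy:
    """Toy model: only the native runtime may suspend the device."""

    def __init__(self):
        self.active = "native"
        self.wakelock_held = False

    def activate_airbag(self):
        # Take the wakelock in the native runtime before switching.
        self.wakelock_held = True
        self.active = "airbag"

    def airbag_screen_timeout(self):
        # The AIR does not turn the screen off itself; it only lets go
        # of the wakelock so the native runtime may suspend.
        self.wakelock_held = False

    def airbag_app_exit(self):
        self.wakelock_held = False
        self.active = "native"

    def request_suspend(self, namespace):
        """Return True if the suspend request is honored."""
        if namespace != "native":
            return False  # power operations from AirBag are disabled
        return not self.wakelock_held
```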

3.3.3 Decoupled App Isolation Runtime

With a separate app isolation runtime, we have the opportunity to customize it to better confine untrusted apps without affecting the original native runtime. As mentioned earlier, we build the AIR by customizing the Android Open Source Project (AOSP 4.1.1) to export the same interface while allowing users to choose different running modes. In particular, the AIR’s root directory is relocated with pivot_root (so that any write operation issued in AirBag would not corrupt the original files in the firmware). Specifically, we build a unionfs [Qui06] that applies copy-on-write to all updates, storing them in a file-based ext4 disk image, and uses a squashfs image as the base filesystem for read-only operations. Such an organization enables us to readily provide the “restore to default” feature, which essentially removes the dirty file-based ext4 disk image. Also, our system eliminates all potential personally-identifying information from the AIR for the “incognito” mode. For instance, the Android API TelephonyManager.getDeviceId() has been instrumented to return a faked IMEI number.

The layered design of AOSP also provides the opportunity to profile app behavior. For example, while analyzing a piece of malware, we usually leverage logcat to record the Android API calls we are interested in. We note that the collected log entries are pushed down from the namespace in which the untrusted app runs, which does raise a concern about the trustworthiness of the collected log. However, the actual dumped message is maintained by the kernel-level log driver, which is assumed to be trusted (Section 3.2). Moreover, the profiling mode will turn on the support [Sys] to record syscalls from AirBag (with confined apps) to the external SD card for in-depth analysis. In addition, our system instruments the AIR to prevent untrusted apps from performing stealthy actions (e.g., sending SMSs to premium-rate numbers). In particular, by modifying the Android API in the com.android.internal.telephony.RIL class, an untrusted app running inside AirBag is prevented from performing any stealthy telephony action.

Further, thanks to the cgroup abstraction, we can white-list the devices that AirBag may access. Specifically, before starting the AirBag namespace, we can write each allowed device file name with the corresponding permission to the cgroups virtual filesystem (e.g., /cgroup/airbag/devices.list). After that, all accesses to device files not on the white-list are automatically blocked.

To maintain transparency, our scheme is seamlessly integrated with the native system without breaking the user experience. Specifically, when the system boots up, the AirBag environment is automatically initiated and then suspended. It is resumed in two scenarios: when the user either (1) dispatches an app to it for isolation or (2) launches a previously isolated app. In the first case, our customized PackageInstaller guides the installation procedure by simply adding an “isolate” button (Figure 3.4a). For each isolated app, our system registers an “app stub” in the native Android runtime. In Figure 3.4b, we show the example app stub for an isolated game app (com.creativemobi.DragRacing). For comparison, we also install the same game app inside the native runtime. The difference in their icons is the addition of a lock sign on the icon associated with


Figure 3.4 Seamless Integration of AirBag: (a) Customized Package Installer; (b) App Stub (shown next to the original app)

the isolated app. When the user clicks the app stub, AirBag is activated to execute the isolated app, which transparently marks the native runtime inactive and thus yields the underlying hardware accesses to AirBag. When the app terminates, AirBag makes itself inactive and seamlessly brings the native runtime back to the foreground.
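The copy-on-write filesystem organization described earlier in this section (a read-only base image plus a writable layer that can be discarded for “restore to default”) can be illustrated with an in-memory model. The sketch is deliberately minimal and its names (UnionView, restore_to_default) are hypothetical; it is not unionfs, and it ignores file deletion, directories, and permissions.

```python
class UnionView:
    """Read-only base layer plus a writable upper layer (toy model)."""

    def __init__(self, base):
        self._base = dict(base)  # stands in for the read-only squashfs image
        self._upper = {}         # stands in for the writable ext4 image

    def read(self, path):
        # The upper layer shadows the base layer.
        if path in self._upper:
            return self._upper[path]
        return self._base[path]

    def write(self, path, data):
        # Copy-on-write: the base layer is never modified.
        self._upper[path] = data

    def dirty_paths(self):
        """Paths changed since the last restore, useful for analysis."""
        return sorted(self._upper)

    def restore_to_default(self):
        # Dropping the upper layer undoes every change at once.
        self._upper.clear()
```

Inspecting dirty_paths() before restoring corresponds to analyzing what an untrusted app changed before discarding those changes.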

3.3.4 Lessons Learned

In the process of developing our early prototype on Nexus 7, we encountered an interesting problem: a benchmark program running inside AirBag always scored one fourth of what it scored on the normal system, which indicates that AirBag utilized only one of the four available CPU cores. After further investigation, it turned out that Nexus 7 has a CPU hotplug mechanism that dynamically puts CPU cores online or offline based on the workload of the whole system. However, due to a bug [Cpu] in Linux kernel 3.1.10, the CPU online events were not properly delivered to AirBag, which then failed to scale up its computation power when AirBag was fully loaded but the native runtime was idle. We then


Table 3.2 Effectiveness of AirBag in Successfully Blocking 20 Representative Android Malware

Malicious behavior columns: Retrieve IMEI; Retrieve Phone Number; Send SMS; Intercept SMS; Record Audio; Damage Firmware (w/ root exploits)

Malware Family   Marked Behaviors
BeanBot          √ √ √ √
DKFBootKit       √ √ √
DroidKungFu      √ √ √
DroidLive        √ √ √
Fjcon            √ √
Geinimi          √ √ √ √
GingerMaster     √ √
GoldDream        √ √ √ √
HippoSMS         √ √
NickiBot         √ √ √ √
RogueLemon       √ √
RogueSPPush      √ √
RootSmart        √ √
SMSSpoof         √
SndApps          √
Spitmo           √ √
TGLoader         √ √
YZHCSMS          √ √ √
Zitmo            √
Zsone            √ √

backport the patches from the mainline Linux kernel [Mai] so that AirBag is informed about the status of available CPU cores whenever a CPU core goes online or offline. Another issue we encountered in our prototype is related to the low-memory killer, which is woken up to sacrifice certain processes when the system is under high memory pressure. As our prototype supports two concurrent namespaces, the unaware low-memory killer may pick a process from the active namespace as the victim for termination, which greatly affects the user experience. Therefore, our prototype adjusts the algorithm to favor processes from the inactive runtime as victims in order to maintain a responsive user experience.
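One way to bias victim selection toward the inactive runtime is sketched below. The text does not specify the adjusted algorithm, so this is an assumption: the sketch strictly prefers any process from an inactive namespace and uses a generic "badness" score only to break ties within the same group. The function name and tuple layout are hypothetical.

```python
def pick_victim(processes, active_namespace):
    """Choose a process to kill under memory pressure.

    processes: list of (pid, namespace, badness) tuples.
    Assumed policy: prefer processes outside the active namespace,
    then the highest badness score. Returns the chosen pid.
    """
    def rank(proc):
        pid, namespace, badness = proc
        return (namespace != active_namespace, badness)

    return max(processes, key=rank)[0]
```

With this ranking, a process in the active namespace is chosen only when no process from the inactive namespace remains.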

3.4 Evaluation

In this section, we present the evaluation results by first showing the effectiveness of AirBag with various mobile malware. We then measure the impact on performance as well as power consumption and memory usage.


3.4.1 Effectiveness

To evaluate the effectiveness, we selected 20 Android malware families that provide good coverage of state-of-the-art mobile malware in the wild [And]. Table 3.2 lists them together with their malicious behavior, which we triggered manually. AirBag is able to successfully isolate these malicious apps and prevent them from performing the malicious operations at either the Android framework level or the OS kernel level. For example, AirBag detects and prevents NickiBot from recording audio by hooking the corresponding ioctls (e.g., SNDRV_PCM_IOCTL_READI_FRAMES) of the ALSA-based audio driver [Als] in the OS kernel, while the pivot_root of the AIR’s root directory and the use of unionfs (Section 3.3.3) enable us to prevent firmware damage. We emphasize that AirBag achieves the same results on all three supported mobile devices.⁴ In the following, we present details of three representative experiments to demonstrate the value of the incognito mode, the profiling mode, and flexible user confirmation for sensitive operations, respectively.

3.4.1.1 GoldDream Experiment

This malware [Gol] infects Android systems by hiding in popular game apps. It spies on SMS messages received by users, monitors incoming/outgoing phone calls, and then stealthily uploads them, as well as device information, to a remote server without the user’s awareness. Specifically, by registering a receiver for various system events (e.g., when an SMS message is received), GoldDream launches a background service without the user’s knowledge to monitor and upload private information. With AirBag, this malware is automatically dispatched to run inside the isolated AIR instead of the native runtime. Also, the spying activities are effectively blocked, as system-wide events in the native runtime are by default not propagated to the AIR. In Figure 3.5, we show how the incognito mode helps prevent real phone information from being leaked by a GoldDream-infected game app, com.rainbw.Fish. In this experiment, we capture the incoming/outgoing network traffic of AirBag with tcpdump while the malware runs. From the dumped log, we observed the collected IMEI number and phone number being uploaded in an HTTP message to a remote server. Figure 3.5a shows the recorded malware behavior of retrieving the phone number (faked to be 0123456789 in our prototype). Figure 3.5b highlights the collected (fake) phone number being reported back to a remote server.

⁴ The exception is the Nexus 7, which is a tablet and lacks the necessary telephony support. However, this does not affect AirBag’s effectiveness in isolating these apps.


Figure 3.5 GoldDream Analysis: (a) Faked phone number is being accessed; (b) Faked phone number is being uploaded

3.4.1.2 DKFBootKit Experiment

The previous experiment shows how AirBag blocks malware’s spying behavior and prevents private information from being leaked. In this experiment, we further demonstrate how AirBag can prevent the firmware from being manipulated by malware. In this case, we experimented with DKFBootKit [Dkf], an Android malware that infects the boot sequence of Android (not the bootloader) and replaces a few system utilities such as ifconfig, rm, and mount under the system partition. With AirBag, DKFBootKit is not able to cause any damage to our system. First, the native filesystem is completely isolated from the AIR on which DKFBootKit runs. Second, the changes inflicted by DKFBootKit, while visible inside AirBag, are automatically copy-on-written to a separate file. With that, we can not only conveniently analyze the contamination from the malware (Section 3.3.3), but also apply the “restore to default” feature to undo the changes. Moreover, with the profiling mode, we collected syscalls from AirBag, including those of confined processes, to monitor the detailed infection sequence. From the infection sequence, we notice that DKFBootKit releases at runtime a payload file named a.exe, which when executed copies itself to /system/lib/libd1.so and further replaces a few other files, such as rm and mount (Figure 3.6a). It turns out that the replacement of rm serves to protect various malware files. In Figure 3.6b, we report the internal logic of the replaced rm, which basically checks its arguments and avoids removing infected files. (For other files, the compromised rm proceeds


(a) Payload Execution (a.exe)

    unlink("/system/lib/libd1.so")
    open("/data/buildarm/bin/a.exe", ...) = 3
    open("/system/lib/libd1.so", ...) = 4
    read(3, 0xbe8a82cc, 4096)
    write(4, "\177ELF\1\1\1", 4096)
    ...
    close(3); close(4)
    unlink("/system/bin/rm") = 0
    open("/system/lib/libd1.so", ...) = 3
    open("/system/bin/rm", ...) = 4
    read(3, 0xbe8a82cc, 4096)
    write(4, "\177ELF\1\1\1", 4096)
    ...
    close(3); close(4)

(b) /system/bin/rm Execution

    rm /system/bin/mount (or other compromised files):
        exit without any unlink call
    rm /system/bin/logcat (or other not compromised files):
        Try /system/xbin/rm
        access("/system/xbin/rm", F_OK) = -2 (ENOENT)
        /system/xbin/rm not found, use /system/bin/toolbox
        execve(/system/bin/toolbox rm /system/bin/logcat)
        toolbox removes file: unlink("/system/bin/logcat")
        exit(0)

Figure 3.6 DKFBootKit Analysis


Figure 3.7 HippoSMS Analysis: (a) A screenshot of HippoSMS-infected video browser; (b) A pop-up alert on background SMS behavior

normally by invoking /system/xbin/rm or /system/bin/toolbox.)

3.4.1.3 HippoSMS Experiment

In this experiment, we demonstrate the capability of exposing stealthy malware behavior and show how users can dynamically block it. Specifically, we run an Android malware, HippoSMS [Hip], inside AirBag. As the name indicates, this particular malware sends text messages to a premium-rate number, which incurs additional phone charges. Notice that the only interface for accessing the telephony hardware is the rild daemon running in the native runtime, and any telephony-related operation inside AirBag is tunneled out to the native runtime. The user then has the option to either allow or disallow it. By doing so, we can effectively expose background behavior that often goes unnoticed in a normal system (without AirBag). In Figure 3.7a, we show a screenshot of a HippoSMS-infected video browser that is involved in background SMS behavior. The background SMS-sending behavior is intercepted and reported to the user in a pop-up window (Figure 3.7b). The user then has the option to permit or deny it.
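The confirmation step can be pictured as a gate in front of the tunnel to the native runtime. The following sketch is illustrative only: the function name, the callback parameters, and the returned strings are hypothetical, and the real prototype shows a pop-up window rather than calling a Python function.

```python
def handle_airbag_telephony(request, ask_user, forward):
    """Gate a telephony request that originates inside AirBag.

    ask_user: callable that shows the request to the user and returns
              True (permit) or False (deny).
    forward:  callable that passes the request on to the native
              telephony daemon.
    """
    if ask_user(request):
        forward(request)
        return "forwarded"
    return "blocked"
```

For instance, a background SMS to a premium-rate number reaches the daemon only if the ask_user callback returns True.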


Table 3.3 Benchmarks Used in Our Evaluation

Benchmark Name            Version   Workload Type
AnTuTu Benchmark [Ant]    2.8.3     Combination
BrowserMark [Bro]         2.0       CPU/IO
NenaMark2 [Nen]           2.3       GPU
Neocore [Neo]             1.9.35    GPU
SunSpider [Sun]           0.9.1     CPU/IO

Figure 3.8 Performance Measurement of AirBag on Google Nexus One, Nexus 7, and Samsung Galaxy S III. [Bar charts of normalized results (%) for the Baseline, Busy-NA, Busy-Idle, and Idle-Busy settings. Nexus One: Neocore, SunSpider, BrowserMark. Nexus 7: NenaMark2, SunSpider, BrowserMark. Galaxy S3: NenaMark2, SunSpider, BrowserMark.]

3.4.2 Performance Impact

To evaluate AirBag’s impact on performance, we have performed benchmark-based measurements on the three supported devices, both with and without AirBag. Table 3.3 lists the benchmarks used in our measurement. These benchmark programs are designed to measure various aspects of system performance. For each benchmark program run, we have measured the performance in four different settings: (1) “Baseline” means the results obtained from a stock mobile device without AirBag support; (2) “Busy-NA” means the results from a mobile device with our OS kernel extension for AirBag but without activating AirBag; (3) “Busy-Idle” means the results from an AirBag-enhanced system running the benchmark program in the native runtime while keeping AirBag idle; and (4) “Idle-Busy” means the results from an AirBag-enhanced system running the benchmark program inside AirBag while keeping the native runtime idle. All the performance results are normalized to the “Baseline” system to expose possible overhead introduced by AirBag. Figure 3.8 summarizes the measurement results. Overall, our benchmark experiments show that AirBag incurs minimal impact on system performance (around 2.5% overhead) in both GPU-intensive workloads (Neocore and NenaMark2) and CPU/IO-intensive workloads (SunSpider and BrowserMark). We also ran AnTuTu [Ant], a comprehensive benchmark that reported a similarly small performance overhead (around 2%; see Figure 3.9). We point out that our experiments so far were conducted in the default incognito mode. When we turn the profiling mode on, the evaluation with the Neocore benchmark


Figure 3.9 AnTuTu Measurement Results. [Bar chart of normalized Nexus 7 results (%) for the Baseline, Busy-NA, Busy-Idle, and Idle-Busy settings across the RAM, cpuint, cpufp, 2D, and 3D tests.]

indicates that our system introduces an additional 10% overhead. We do not consider this a concern, as the profiling mode is only turned on when performing a forensics-style investigation of an untrusted app.
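The normalization used in Figures 3.8 and 3.9 can be written down explicitly. The helper names below are hypothetical, the sample numbers in the usage note are made up for illustration and are not measurement results, and the formulas assume a benchmark whose score is higher-is-better (a time-based benchmark would need the ratio inverted).

```python
def normalize(score, baseline):
    """Express a score as a percentage of the Baseline score."""
    return 100.0 * score / baseline


def overhead_percent(score, baseline):
    """Overhead relative to Baseline, in percent (higher-is-better scores)."""
    return 100.0 - normalize(score, baseline)
```

For example, with a hypothetical Baseline score of 1000 and an Idle-Busy score of 975, normalize(975, 1000) yields 97.5 and overhead_percent(975, 1000) yields 2.5.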

3.4.3 Power Consumption and Memory Usage

Besides the performance overhead, we also evaluate the impact of AirBag on battery use. With two concurrent namespaces, our system likely incurs additional battery drain. We perform two sets of experiments. In the first set, we start from a fully-charged Nexus 7 device, wait for 24 hours without running any workload, and then check its battery level. The stock system reports 91% and the AirBag-enhanced system shows 89%, i.e., 2 percentage points more battery use. In the second set, we also start from a fully-charged Nexus 7 device, wait for 24 hours while continuously playing an audio file, and then check its battery level. The stock system reports 66% and the AirBag-enhanced system shows 63%, i.e., 3 percentage points more battery use. Finally, we also measure the memory footprint of AirBag. Specifically, we examine the percentage of in-use memory (by reading /proc/meminfo) on the Nexus 7 by repeating the previous two sets of experiments. Instead of waiting for 24 hours, we collect our measurement results after 4 hours. The results from the first set of experiments indicate that our system increases the percentage of in-use memory from 59.31% to 60.87%, an addition of 1.56 percentage points. In the second set of experiments (with repeated playing of an audio file), the percentage of in-use memory increases from 60.25% to 63.70%, an addition of 3.45 percentage points. The additional memory consumption is due to the reserved memory blocks in the OS kernel (e.g., for the second framebuffer).


3.5 Discussion

In this section, we revisit our system design and implementation for possible improvements.

First, the current usage model of AirBag is to isolate untrusted apps when they are being installed. While it achieves our design goals, it can still be improved with the capability to dynamically migrate apps between the native and AirBag-confined runtime environments. For example, users may want to try the features of a newly released app in AirBag without affecting the native environment, and then “move” it to the native runtime environment once the app is considered safe and stable. Conversely, when an app is reported to have malicious behavior (e.g., sending text messages in the background), users can still use the app by limiting its capabilities within AirBag. Obviously, one solution is to simply uninstall the app in one runtime and then re-install it in the other. However, this loses all internal state accumulated since the previous installation. A better solution might be to live-migrate it from one runtime to the other. This is possible as both runtime environments share the same trusted OS kernel, though in different namespaces. Possible challenges, however, include handling dependent libraries that may be inconsistent across the runtimes, as well as other apps in the previous namespace that are currently interacting with the migrated app.

Second, to confine untrusted app execution, our prototype disallows confined apps from communicating with other legitimate apps and service daemons running on the native runtime, and vice versa. As a result, various system events are isolated at the AirBag boundary. In other words, when there is an incoming SMS or phone call on the native runtime, such an event is not propagated to the AIR, which affects certain functionality of untrusted apps. Also, automatic updates of AirBag-confined apps may break because of the current AirBag confinement.
While an intuitive solution is to allow these events to cross the AirBag boundary, it may break the isolation AirBag is designed to enforce. We are therefore motivated to explore a hybrid approach that selectively white-lists certain events to pass through (so that we can support legitimate feature needs such as automatic updates) without unnecessarily compromising AirBag isolation. On the other hand, if AirBag is configured to deny all permissions, one could argue that our system can be replaced by a customized Android system. However, with our system, users can still run apps normally in the native runtime on the same mobile device, which cannot be achieved by customized Android systems.

Third, our current prototype is still limited to a single AirBag instance, and multiple untrusted apps need to run within the same instance. This leads to problems when all apps are installed as untrusted. In particular, AirBag does not provide inter-app isolation within itself. Naturally, we can improve the scalability of AirBag by dynamically provisioning multiple AirBag instances, one for each untrusted app. This does raise challenging requirements for more efficient and lightweight AIRs. Note that our AirBag filesystem already makes use of copy-on-write to keep all the updates in a separate data file, which should scale to multiple AirBag instances. However, context-aware device virtualization requires additional memory to be reserved (e.g., for smooth framebuffer support; see Section 3.3.2). This remains an interesting challenge, and we plan to explore possible solutions in our future work (e.g., by leveraging hardware virtualization support in the latest ARM processors).

Fourth, as an OS-level kernel extension, our approach requires updating the smartphone OS image for the enhanced protection against mobile malware infection. While this may be an obstacle for its deployment, we argue that our system does not require deep modifications to the smartphone OS kernel. In fact, our kernel patch has less than 2K lines of source code, and most of them are related to generic Linux drivers, not tied to specific hardware devices in different smartphone models. Furthermore, we can improve the portability of our system by implementing a standalone that can be conveniently downloaded and installed.

Fifth, for simplicity, our current prototype does not provide exactly the same runtime environment as the original one. Because of that, a malicious app can possibly detect the existence of AirBag and avoid launching its malicious behavior. In fact, as an OS-level virtualization solution, our system shares with other virtualization approaches [Lan11; And11; FL12; Jia07; Sha09] the limitation of possibly exposing virtualization-specific artifacts or footprints. Note that with the capability of arbitrarily customizing the isolated runtime environment (AIR), we are able to further improve the fidelity of the AirBag runtime and make it harder to fingerprint.
However, this situation could lead to another round of the “arms race.” From another perspective, if mobile malware attempts to avoid launching its attacks in a virtualized environment, our system still achieves its intended purpose by resisting or deterring the infection.

Last but not least, with a decoupled app isolation runtime to transparently support untrusted apps, AirBag opens up new opportunities that were not previously possible. For example, our current profiling mode basically collects logcat output as well as various syscalls from AirBag. However, it need not be limited to basic log collection. For instance, recent developments in virtual machine introspection [FL12; Jia07; DG11; GR03; YY12] can be applied in AirBag to achieve better introspection and monitoring capabilities. Moreover, it also provides better avenues to integrate with current mobile anti-virus software so that it can reliably monitor runtime behavior without being limited to statically scanning untrusted apps.


3.6 Summary

We have presented the design, implementation, and evaluation of AirBag, a client-side solution that significantly boosts the capability of Android-based smartphones to defend against mobile malware. By instantiating a separate app isolation runtime that is decoupled from the native runtime and enforced through lightweight OS-level virtualization, our system not only allows for transparent execution of untrusted apps, but also effectively prevents them from leaking personal information or damaging the native system. We have implemented a proof-of-concept prototype that seamlessly supports three representative mobile devices, i.e., Google Nexus One, Nexus 7, and Samsung Galaxy S III. The evaluation results with 20 representative Android malware samples successfully demonstrate its practicality and effectiveness. Also, the performance measurement with a number of benchmark programs shows that our system incurs low performance overhead.

CHAPTER 4

SECURING CRITICAL I/O OPERATIONS WITH THYPE

4.1 Introduction

Smartphones have dramatically changed our daily lives in the past few years. Among various smartphone platforms, Android has dominated the market since 2011 [Gar11], and its market share grew rapidly to around 80% in 2013 [Gar14]. Meanwhile, Android apps have become the most popular tools for people to access and manage all kinds of data, private or public. However, there are various vulnerabilities (e.g., the [Cod]) inside the Android OS such that any OS-level defense or isolation mechanism (e.g., AirBag) can be bypassed through the relatively unsafe OS kernel. When the OS is compromised, users have no privacy on their phones. The disclosure of third-party driver bugs such as CVE-2013-6123 [Cved] and CVE-2014-4322 [Cvef] shows that the security of the Android OS is threatened by its open architecture and the fragmentation of the Android ecosystem. Although the classic discretionary access control (DAC) design (i.e., Unix permissions) in Linux successfully regulates device file access from third-party apps, attackers can still reach privileged device files and launch attacks on the OS because of wrong file permission settings [Zho14]. The complexity of the Android runtime (AOSP) is another weakness of Android through which the attacker can escalate privilege and break the DAC. For example, CVE-2014-7911 [Cveg] enables a third-party app to behave as a privileged daemon (i.e., system_server).

In this dissertation, we also identify one serious security problem related to the AOSP and the permission settings. At the end of 2012, multiple security vulnerabilities were discovered in the diagnostics (DIAG) driver on many Qualcomm SoC based phones [Cvea; Cveb]. Fortunately, access to the interface of the DIAG driver, /dev/diag, is restricted to users in the radio group on many Qualcomm SoC based phones such that third-party apps cannot exploit the vulnerabilities. However, one built-in content provider breaks the restriction. Specifically, the built-in telephony provider (com.android.providers.telephony) exposes multiple content providers for other apps to access SMS/MMS messages. One of the providers, MmsProvider, implements the openFile() method for opening the file path specified in the records of its database. Since the insert() and update() methods are open for other apps to input arbitrary strings into the database, and the openFile() method does not check the file path before opening it, the telephony provider can be tricked into opening arbitrary files with its own privilege level.

We prepare an HTC Desire S phone running Android 2.3.5 to demonstrate the vulnerability. As shown in Figure 4.1a, the PrivilegedFileAccess app requests android.permission.READ_SMS and android.permission.WRITE_SMS for accessing the MmsProvider. Figure 4.1b shows the permission and ownership of the device file as well as the information of the app in the /proc filesystem. In more detail, the app (pid 2498) runs as a user that does not belong to any group but holds an fd (51) connected to the device file, which can only be accessed by privileged users. Furthermore, we show the strace log in Figure 4.1c to demonstrate how the phone is rooted. We can see that the app performs a setresuid32 call with all parameters set to 0 for rooting the phone and fails due to the permission check inside the Linux kernel. After that, the app sends a command to the DIAG driver with an ioctl() call, which exploits the vulnerabilities to break the permission check inside the sys_setresuid system call handler. As a result, the next setresuid32 call successfully roots the phone such that the PrivilegedFileAccess app can do almost anything on that phone.

Figure 4.1 Privileged File Access Example: (a) Requested Permissions, (b) Attack, (c) Root

Besides the drivers, AOSP, and vendor customizations, the code base of the Android OS, the Linux kernel, also has many vulnerabilities. CVE-2013-2094 [Cvec] is one of the most famous recent Linux vulnerabilities; it had been upstream for many years before being disclosed, and nobody knows for how long it had been exploited. CVE-2013-6282 [Cvee] is another one that is widely used by rooting apps and could be exploited to root almost all Android devices at the time. Although the mandatory access control (MAC) rules of SEAndroid [SC13] successfully isolate many driver bugs, both of the above Linux vulnerabilities can be easily exploited with simple system calls such that no existing protection mechanism can prevent these kinds of vulnerabilities from being exploited. This leads to a pressing need to secure mobile apps when the underlying OS is untrusted and could be compromised.

In this chapter, we focus only on security-sensitive apps such as those for mobile payment and online banking. These apps usually have strong end-to-end authentication and encryption mechanisms while interacting with remote servers to prevent man-in-the-middle attacks. However, the input from the user and the output generated by the apps need to go through the OS kernel in plain text, which makes the input data (e.g., passwords) prone to theft.
For example, a compromised framebuffer/LCD driver can trick the user by generating malicious screen output, and the touchscreen driver could be exploited to collect finger position data so that the attacker can reconstruct the password accordingly.

Some recent research efforts address this problem by using ARM TrustZone [Tru]. VeriUI [LC14] secures login by obtaining OAuth tokens through the Secure World, which consists of a secure OS and a secure browser. Inside the secure OS, the display driver, the touchscreen driver, and a network driver provide direct hardware access for the secure browser. However, malware running in the Normal World can still perform UI spoofing attacks in this case. Since current hardware components such as the LCD screen controller and the touchscreen controller do not have authentication and encryption capabilities, TrustUI [Li14] relies on extra LED indicators to prevent framebuffer overlay attacks and randomizes the software keyboard to prevent the input data from being stolen. We believe this problem could be better solved by a virtualization solution because the special hardware capabilities (e.g., extra LED indicators, encrypted IO access) and the architecture-dependent mechanism (e.g., ARM TrustZone) may not be available on all kinds of Android devices.

Figure 4.2 An overview of tHype to secure critical I/O operations (diagram labels: Android Runtime, Linux OS Kernel, Trusted OS, Critical I/O Drivers, tHype Hypervisor, Hardware)

To this end, we present a thin-hypervisor based design that solves the problem by building trusted IO access between the user and the hardware. Specifically, the bare-metal (or Type-I) hypervisor provides a virtualized framebuffer/LCD and touchscreen to the guest OSs and allows native access to other hardware components. This design gives the hypervisor a smaller code base than existing Type-I hypervisors (e.g., Xen). With the thin hypervisor, security-sensitive apps can be executed in a trusted guest domain that securely accesses the critical hardware (e.g., LCD screen, touchscreen) in parallel with the untrusted Android OS in another guest domain. Inside the trusted guest domain, we do not provide the complete Android OS and runtime environment. Instead, we provision the security-sensitive apps with a tiny software stack for them to securely access the virtualized hardware. Although those apps need to be rewritten to run in the trusted guest domain, the thin hypervisor and the tiny software stack give our solution a much smaller TCB.

We implement the prototype of the thin bare-metal hypervisor, tHype, on the Google/Samsung Nexus 10 tablet, which is equipped with an ARM Cortex-A15 processor that supports hardware virtualization. Since we leverage second-level address translation (SLAT) technology to enable efficient IO virtualization, we build and evaluate the SLAT enhancement prototype on the x86 Xen hypervisor, which is easier for development. The purpose is to demonstrate the feasibility of using SLAT features to optimize virtualization-based approaches. SLAT is a hardware-assisted virtualization technology that greatly reduces the overhead of software-managed shadow page tables. Each processor vendor has its own SLAT implementation, such as Intel’s EPT, AMD’s RVI/NPT, and ARM’s “second stage translation”. In a similar way, we also utilize the characteristics of SLAT to facilitate the memory management of the DeHype prototype in Chapter 5.

4.2 Design

By securing critical IO for mobile systems, we aim to provide a simple hypervisor that offers security-sensitive apps an isolated runtime environment for critical IO access. As shown in Figure 4.2, our system virtualizes the main input and output devices (i.e., the touchscreen and the screen) because users otherwise cannot interact with them in a secure way. For example, an online banking app may show the account number on the screen, which can be easily retrieved by an attacker who is able to control the framebuffer/LCD driver. When the user is using the graphical keyboard to input the password through the touchscreen, a hijacked touchscreen driver could be exploited to tell the attacker which finger position is currently pressed. For these reasons, our system is designed to provide a secure touchscreen from which the untrusted apps or OS cannot retrieve any input data, while the screen output generated by the trusted app is isolated to prevent it from being stolen by the attacker. Besides the touchscreen input and the screen output, we also provide access to disk and network because those are necessary to complete security-sensitive operations such as making an online payment.

With tHype, the critical IO devices are controlled by the hypervisor, with paravirtualization interfaces provided to guest OSs. We leave non-critical hardware devices to the untrusted OS, assuming that security-sensitive operations can be completed without them. Specifically, tHype is a bare-metal (Type-I) hypervisor that virtualizes only a small set of hardware devices, which makes it simple and small. Access to those virtualized devices is regulated by the hypervisor with the help of hardware virtualization support such that the security-sensitive private data managed by the trusted app would not be leaked even when the untrusted OS is subverted.

4.2.1 Critical I/O Virtualization

By virtualizing the input device, we can prevent the leakage of critical private data at the source. In tHype, we virtualize the touchscreen and provide each guest domain with an interface to retrieve the input events separately. To achieve that, we first need to patch the touchscreen driver so that it gets the input events from the specific interface provided by the hypervisor instead of accessing the touchscreen hardware directly. Specifically, whenever the touchscreen hardware senses an input event, it generates an interrupt that is processed by the hypervisor. The hypervisor retrieves the event by reading the hardware registers, queues the event into a memory page for the currently active guest domain, and notifies the guest OS by sending a virtual interrupt signal. Then, the tweaked touchscreen driver in the guest OS reads the input events from memory and reports them to upper layers in the interrupt service routine.

Figure 4.3 Screenshot of Using tHype-powered Trusted I/O (annotations: Key Pressed, Status Bar)

As the touchscreen input events are isolated by the hypervisor, the compromised guest OS cannot get the password, bank account, etc. entered by users. However, in many cases the input leaves a trace on the screen such that attackers can still get the input by capturing the screen. In addition, we need a trusted indicator to let the user know whether the current guest domain is safe. The screen is the most straightforward indicator in this case if we can isolate the screen output of each guest domain. Specifically, we use part of the screen as a status bar that can only be updated by the hypervisor such that the user can easily identify whether it is the safe guest domain in which to input critical private data. Figure 4.3 is a screenshot of using tHype-powered trusted I/O. The green bar at the bottom of the screen is the “status bar” that we use as the indicator. When the status bar is not green, the current active guest domain is not the trusted domain. The screenshot also shows that the trusted guest domain can retrieve the input event from the graphical keyboard and render it on the upper part of the screen. To achieve that, we handle the framebuffer memory and the LCD controller driver so that only the active guest domain can update the framebuffer that is DMA-ed to the LCD controller hardware.
By manipulating the memory mapping, the hypervisor can not only render the content provided by one of the guest domains but also prevent the “status bar” from being corrupted by a malicious guest.

Besides the touchscreen input and screen output, the trusted app running inside the trusted guest domain may also need to access the network and the disk to complete security-sensitive operations such as making an online payment. Since both network and disk operations can be secured by end-to-end authentication and encryption, we redirect those IO requests to the untrusted guest OS, assuming that the end-to-end security would not be broken. This can be done with a request handler in the untrusted guest OS, which has a complete software stack for networking and disk IO. Specifically, the network traffic from the trusted guest is injected into the untrusted guest by the hypervisor and routed to the Internet with the help of the untrusted guest. Similarly, simple disk read/write requests from the trusted guest can be processed with the help of the untrusted guest, which greatly reduces the complexity of the trusted guest OS.
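The touchscreen event path described earlier in this subsection can be illustrated with a small Python model. It is only a sketch of the routing idea, not the tHype code: the class names, the `event_page` queue standing in for the per-domain memory page, and the domain names are assumptions made for illustration.

```python
from collections import deque


class GuestDomain:
    """Hypothetical guest running the patched (paravirtualized) touchscreen driver."""

    def __init__(self, name):
        self.name = name
        self.event_page = deque()  # stands in for the per-domain event memory page
        self.reported = []         # events handed to the guest's upper layers

    def virtual_irq(self):
        # Guest ISR: drain the event page and report the events upward.
        while self.event_page:
            self.reported.append(self.event_page.popleft())


class Hypervisor:
    def __init__(self, domains, active):
        self.domains = {d.name: d for d in domains}
        self.active = active

    def switch_to(self, name):
        self.active = name

    def touch_irq(self, x, y):
        # Physical interrupt: only the hypervisor reads the controller registers,
        # and the event is queued for the currently active domain only.
        domain = self.domains[self.active]
        domain.event_page.append((x, y))
        domain.virtual_irq()


trusted, android = GuestDomain("trusted"), GuestDomain("android")
hv = Hypervisor([trusted, android], active="android")
hv.touch_irq(10, 20)       # delivered to the untrusted Android domain
hv.switch_to("trusted")
hv.touch_irq(300, 400)     # password entry: visible to the trusted domain only
hv.touch_irq(310, 410)
```

In this model the untrusted domain never observes the two events generated while the trusted domain is active, which is the isolation property the design aims for.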

4.2.2 Capability-Based Memory Sharing and Alternate Memory View

As described above, the key to virtualizing the critical I/O hardware is memory sharing between different guest domains and the hypervisor. To share a memory page between two domains, we manipulate the mapping between its host physical address (HPA) and the guest physical addresses (GPAs) in the hypervisor (i.e., the HPA is double-mapped in the two domains). When the page is shared, we also need to control the capability of each domain to prevent the shared data from being corrupted by a malicious guest. For example, when the hypervisor needs to route network traffic from one domain to another, we set up a memory page that is write-only in the source domain but read-only in the destination domain to prevent data corruption.

To smoothly virtualize the screen output, we set up an alternate memory view for each guest domain in the hypervisor, which has a separate set of memory pages for the framebuffer. Specifically, when the trusted domain is taking over the screen output, the untrusted domain may also try to update the screen. We switch the memory view of the untrusted domain to an alternate view that has all framebuffer pages mapped to dummy pages. With that, the content from the untrusted domain would not be rendered on the screen. However, to maintain the alternate memory view, each guest domain needs two second-level address translation (SLAT) tables (e.g., Intel’s EPT). To this end, we propose a copy-on-write mechanism to optimize the alternate memory view support. Our design benefits from the multi-level structure of the EPT such that only the table containing the alternate mappings and all its upper-level tables, including the root table, need to be duplicated. Typically, only four extra pages need to be allocated to support the alternate mapping, which greatly reduces the memory overhead. We leave the details of maintaining the alternate memory view to Section 4.3.
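The capability-based sharing described above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: the `Hypervisor` class, its method names, and the addresses are hypothetical, and the per-domain dictionary merely stands in for a SLAT table that maps a GPA to an HPA together with an access capability.

```python
from enum import Flag, auto


class Access(Flag):
    R = auto()
    W = auto()


class AccessViolation(Exception):
    pass


class Hypervisor:
    """Toy model: one SLAT-like table per domain, GPA -> (HPA, capability)."""

    def __init__(self):
        self.host_pages = {}  # HPA -> page contents
        self.slat = {}        # domain id -> {GPA: (HPA, Access)}

    def map_page(self, domain, gpa, hpa, access):
        self.host_pages.setdefault(hpa, bytearray(4096))
        self.slat.setdefault(domain, {})[gpa] = (hpa, access)

    def _translate(self, domain, gpa, needed):
        hpa, access = self.slat[domain][gpa]
        if needed not in access:
            raise AccessViolation(f"domain {domain}: {needed.name} denied at GPA {gpa:#x}")
        return hpa

    def write(self, domain, gpa, data):
        page = self.host_pages[self._translate(domain, gpa, Access.W)]
        page[:len(data)] = data

    def read(self, domain, gpa, length):
        page = self.host_pages[self._translate(domain, gpa, Access.R)]
        return bytes(page[:length])


hv = Hypervisor()
HPA = 0x80000                           # hypothetical host physical page
hv.map_page(1, 0x1000, HPA, Access.W)   # source domain: write-only
hv.map_page(2, 0x2000, HPA, Access.R)   # destination domain: read-only
hv.write(1, 0x1000, b"packet")
received = hv.read(2, 0x2000, 6)
try:
    hv.write(2, 0x2000, b"evil")        # destination tries to corrupt the shared page
    blocked = False
except AccessViolation:
    blocked = True
```

The same host page is double-mapped at different GPAs in the two domains, and the capability attached to each mapping decides which direction data can flow.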


4.3 Implementation

In this section, we elaborate on the implementation of our system. Our current prototype is implemented in two parts. The IO virtualization is based on the Google Nexus 10 tablet since it is equipped with an ARM Cortex-A15 processor, which has the ARM virtualization extensions. To develop and debug the SLAT support of our hypervisor more easily, we start our implementation of capability-based memory sharing and alternate memory view on the x86 Xen hypervisor. Because the mechanism is conceptually straightforward, only minimal engineering effort is required to port the SLAT enhancement to our ARM-based hypervisor as well as to other SLAT implementations such as AMD’s RVI/NPT.

4.3.1 Critical I/O Virtualization

Our implementation of virtualized IO is based on the Nexus 10 tablet, which has a touchscreen controller connected to the I2C bus for sensing touch events. The controller notifies the CPU with an interrupt, and the touchscreen driver collects all the information about the current input event in the interrupt service routine (ISR) by issuing I2C commands and reports the event to the upper software layer through Linux kernel APIs such as input_report_abs and input_sync. The first thing we need to do is to decouple the driver code associated with the hardware from the guest OS. In the Linux kernel, the decoupling can be done based on the interfaces provided by the input core. Specifically, the input core is the hub for collecting and dispatching all kinds of input events in Linux, so we can replace the original touchscreen driver with a ring-buffer driver that collects the input events from another source instead of directly from the hardware and reports the events to the input core layer. Based on that, we start by moving the whole touchscreen driver into the hypervisor. After that, we replace all function calls for reporting events to the input core with an API that stores the events into a shared page. Since the touchscreen driver needs to access the hardware through I2C, we also need to have the core I2C driver in the hypervisor. Once all the information of an input event is stored in the shared page, the hypervisor notifies the guest with an emulated interrupt. Therefore, the simple ring-buffer driver in the guest OS can handle the input event the same way as if it came from the real hardware.

As we mentioned earlier, the screen output is virtualized by manipulating the memory mappings of the framebuffer memory. Figure 4.4 illustrates our implementation. We start from the framebuffer driver of the Google Nexus 10 by identifying the memory pages that are allocated for DMA-ing the synthesized picture to the LCD.
In the Google Nexus 10, the LCD output consists of four framebuffer windows, each of which is used for updating part of the screen. For example, window 0 is the background window, which basically would not be updated until the next wallpaper change, and window 1 is in charge of the main activity on the screen, which is rendered on top of window 0. We provide the GPAs of those framebuffer pages as the paravirtualization interface to the guest. Specifically, both domains may update the framebuffer memory and try to render something on the LCD screen at any time. Compared to the design of AirBag mentioned in Section 3.3.2, we do not allocate separate framebuffer memory for each domain. Instead, the framebuffer memory is shared in the host physical memory space but only the active domain can access it. The GPA/HPA mapping of framebuffer pages in the inactive domain is redirected to dummy pages by the hypervisor. This way, the framebuffer content generated by the inactive domain would not be rendered on the LCD screen. Although updating the mappings of all framebuffer memory pages in the hypervisor could incur a huge performance overhead, the alternate memory view mentioned earlier is a good solution for virtualizing the framebuffer in the hypervisor. Because our SLAT implementation is not yet enabled in our hypervisor, we emulate it by modifying the page table mappings. Since we do not need complicated screen output in the trusted domain, we leave the GPU to the untrusted domain to keep the hypervisor simple.

Figure 4.4 Framebuffer Virtualization based on Second Level Address Translation (diagram labels: Guest Physical Memory with framebuffer, shown for two guests; Hypervisor; dummy framebuffer; Host Physical Memory; DMA)
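The redirection of inactive-domain framebuffer mappings to dummy pages can be sketched as follows. The model is illustrative only: `FramebufferMux`, the GPA value, and the choice of one private dummy page per domain and framebuffer page are assumptions, since the text only states that inactive mappings point to dummy pages.

```python
PAGE_SIZE = 4096


class FramebufferMux:
    """Toy model of framebuffer GPAs backed by real or dummy host pages."""

    def __init__(self, fb_gpas, domains):
        # Real pages are the ones DMA-ed to the LCD controller.
        self.real = {gpa: bytearray(PAGE_SIZE) for gpa in fb_gpas}
        # Assumption: each domain gets its own dummy backing pages.
        self.dummy = {d: {gpa: bytearray(PAGE_SIZE) for gpa in fb_gpas} for d in domains}
        self.active = domains[0]

    def activate(self, domain):
        if domain not in self.dummy:
            raise KeyError(domain)
        self.active = domain  # stands in for switching the domains' memory views

    def backing_page(self, domain, gpa):
        if domain == self.active:
            return self.real[gpa]
        return self.dummy[domain][gpa]

    def guest_write(self, domain, gpa, data):
        page = self.backing_page(domain, gpa)
        page[:len(data)] = data

    def lcd_scanout(self, gpa, length):
        return bytes(self.real[gpa][:length])


FB_GPA = 0x20000000  # hypothetical framebuffer GPA
mux = FramebufferMux([FB_GPA], ["trusted", "android"])
mux.guest_write("trusted", FB_GPA, b"PIN pad")
mux.guest_write("android", FB_GPA, b"overlay")  # lands in a dummy page
on_screen_trusted = mux.lcd_scanout(FB_GPA, 7)

mux.activate("android")
mux.guest_write("android", FB_GPA, b"desktop")
on_screen_android = mux.lcd_scanout(FB_GPA, 7)
```

Both guests keep writing to the same GPA, yet only the active domain's writes reach the pages that are scanned out. The sketch does not model scrubbing the real pages when the active domain changes.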

4.3.2 Capability-Based Memory Sharing and Alternate Memory View

We implement the prototype of capability-based memory sharing and alternate memory view support as a patch on the Xen x86 hypervisor since x86 hardware makes it easier both to enable the virtualization support and to develop. For capability-based memory sharing, we add a couple of hypercalls for the guest domains to set up shared pages. Specifically, each guest domain needs to export the GPA that it wants to use for accessing the shared page. For example, suppose a host physical page h is mapped in both guest domains for sharing. We need to have the mapping [g1 → h] in the EPT of domain 1 and [g2 → h] in the EPT of domain 2. To achieve that, the hypercall CREATE_SHM(key, g1) invoked by domain 1 creates a shared page, which can be attached by domain 2 with the ATTACH_SHM(key, g2) hypercall. Note that the first parameter key is currently hardcoded in both domains for creating shared pages for networking and disk IO. After the sharing is created, we also allow the trusted domain to set the access permission of each shared page for all domains. For example, SET_ACCESS(2, g2, ACCESS_RO) prevents domain 2 from writing to any address and executing any instruction within the page starting from GPA g2. Since both ATTACH_SHM and SET_ACCESS need to modify a specific EPT entry, we leverage the API in Xen to walk through the EPT tree, find that entry, and update it.

To create the alternate view for each guest domain efficiently, we implement a copy-on-write mechanism for duplicating the EPT tree. In the Xen implementation of Intel VT-x, each guest domain has a physical-to-machine (p2m) software layer to translate GPAs to HPAs according to the EPT tables pointed to by the EPT pointer (EPTP) in the Virtual Machine Control Structure (VMCS). Therefore, if we need to have multiple memory views and switch between them, we need to update the EPTP in the VMCS of each guest. The basic idea is that we can create a new EPT tree for the alternate memory view. However, since most of the GPA to HPA mappings in the alternate view are the same as those in the original view, we can reduce the overhead by duplicating only the tables that contain the alternate mappings on demand, which is the so-called copy-on-write (COW) mechanism. Figure 4.5 illustrates the EPT tree with an alternate memory view in the initial state. In this state, both views have the same mappings such that we only need to clone the original root table a1 to the alternate root table a2. Here, we use dirty bits to indicate whether an EPT entry in an alternate view EPT table differs from the entry in the original view. The dirty bits help us maintain the consistency of both views. For example, when an EPT violation occurs, a couple of EPT entries in the original view may be modified. We can easily follow the dirty bits to synchronize those modifications to the alternate view.

Figure 4.5 Initial State of the EPT Tree with an Alternate Memory View (diagram labels: eptp, alternate eptp, a1, a2, b1, b2, c1, c2, d1; page sizes 1G, 2M, 4K; Dirty Bits)

According to Intel VT-x, an EPT entry can point to a superpage, which describes the mapping for a range of memory larger than a 4K page. For example, the pages b1 and c1 are superpages for translating 1G and 2M memory segments respectively. Those superpages complicate our implementation because we may need to split a superpage to alter the mappings of a couple of 4K pages in the alternate view within the memory range that the superpage represents. Figure 4.6 illustrates the process of modifying, in the alternate view, the blue EPT entry in c2 that originally maps the GPA to page d1. As one entry in c2 is touched, we need to duplicate c2 into a new page c3 and modify that entry to map the original GPA to another page d2. The upper-level EPT table b2 that points to c2 also needs to be cloned into b3 so that its pointer points to the newly allocated c3, which reflects the fact that c2 is copied into c3. This way, by allocating three pages, a2, b3, and c3, we are able to modify all GPA to HPA mappings represented by the EPT entries in c2 in the alternate memory view.

Figure 4.6 The EPT Tree with an Alternate GPA/HPA Mapping (diagram labels as in Figure 4.5, plus b3, c3, and d2)

To maintain the consistency of the original and alternate views, the dirty bits kept along with each alternate view EPT table simplify the synchronization process. For example, when the superpage b1 is split, no synchronization is needed since the dirty bit for the corresponding EPT entry in a2 is 0. This means all modifications within b1 are automatically synchronized because the corresponding EPT entry in a2 points to b1 instead of another newly allocated page. When an EPT entry in b2 is updated, the corresponding entry in b3 needs to be updated as well since the dirty bit for the EPT entry in a2 that points to b3 is 1. So, whenever an EPT entry of EPT table T in the original view is modified, we traverse the alternate view EPT tree, check the dirty bit of the EPT entry that points to T in the parent EPT table of T, and synchronize the modification to the alternate view if necessary.

To export the alternate memory view to trusted guests, two more hypercalls are added: SET_ACCESS_ALT and SET_MFN_ALT. Table 4.1 summarizes the new hypercalls to support capability-based memory sharing and alternate memory view.

Table 4.1 Hypercalls to Support Capability-Based Memory Sharing and Alternate Memory View

Name             Description
CREATE_SHM       Creating a shared page between guest domains
ATTACH_SHM       Attaching a shared page
SET_ACCESS       Setting the access permission of a page in the original memory view
SET_ACCESS_ALT   Setting the access permission of a page in the alternate memory view
SET_MFN_ALT      Setting the GPA/HPA mapping of a page in the alternate memory view

As the trusted domain can manipulate the alternate view of other domains, the way to switch between views is another problem. One easy way is to switch the memory view in the hypervisor by updating the EPTP in the VMCS. This requires all the guests to enter hypervisor mode through either VMEXIT or VMCALL to reload the new EPTP. The Intel VT-x hardware feature VMFUNC comes to help.
VMFUNC is an instruction that allows the hardware designer to extend the VM-related functions, and one of the extensions is switching between up to 512 EPT pointers without trapping into hypervisor mode. Specifically, we can set up 512 EPTPs in a memory page pointed to by the EPTP_LIST field in the VMCS and use the VMFUNC instruction in the guest to switch to one of the EPTPs, indexed by ECX. Therefore, the hypervisor can inject an exception into the guest OS to make the guest switch to another memory view without the cost of a world switch.
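The copy-on-write duplication and the dirty-bit synchronization described earlier in this subsection can be sketched with a small Python model of the three-level tree drawn in the figures. It is an illustration under assumptions rather than the Xen patch: the classes and method names are hypothetical, copies are named with a trailing apostrophe (a1', b2', c2') instead of a2, b3, c3, superpage splitting is not modeled, and the rule that an entry overridden in the alternate view is never overwritten by synchronization is our reading of the text.

```python
class EptTable:
    """One EPT table page: index -> child EptTable, or an int HPA for a leaf or superpage."""

    def __init__(self, name, entries=None):
        self.name = name
        self.entries = dict(entries or {})
        self.dirty = set()  # indices where this alternate-view table differs from the original

    def clone(self, name):
        return EptTable(name, self.entries)  # shallow copy: children stay shared


class AlternateView:
    def __init__(self, orig_root):
        self.orig_root = orig_root
        self.root = orig_root.clone(orig_root.name + "'")  # only the root is copied up front
        self.allocated = [self.root.name]

    def set_mfn_alt(self, path, new_hpa):
        """Remap the leaf reached by `path` in the alternate view only."""
        table = self.root
        for idx in path[:-1]:
            if idx not in table.dirty:  # child table is still shared: copy it on write
                child = table.entries[idx]
                table.entries[idx] = child.clone(child.name + "'")
                table.dirty.add(idx)
                self.allocated.append(table.entries[idx].name)
            table = table.entries[idx]
        table.entries[path[-1]] = new_hpa
        table.dirty.add(path[-1])

    def update_original(self, path, new_value):
        """Change an entry in the original view and synchronize the alternate view."""
        orig, alt = self.orig_root, self.root
        for idx in path[:-1]:
            # Follow the private copy only while the dirty bit says one exists.
            alt = alt.entries[idx] if alt is not None and idx in alt.dirty else None
            orig = orig.entries[idx]
        orig.entries[path[-1]] = new_value
        if alt is not None and path[-1] not in alt.dirty:
            alt.entries[path[-1]] = new_value  # copied table, entry not overridden


def lookup(root, path):
    node = root
    for idx in path:
        node = node.entries[idx]
    return node


# Tree shaped like the figures: a1 -> {b1 (1G superpage), b2}, b2 -> {c1 (2M superpage), c2}, c2 -> d1.
D1, D2 = 0x1000, 0x2000
c2 = EptTable("c2", {0: D1})
b2 = EptTable("b2", {0: 0x200000, 1: c2})
a1 = EptTable("a1", {0: 0x40000000, 1: b2})

view = AlternateView(a1)
view.set_mfn_alt((1, 1, 0), D2)          # alternate view now maps the GPA to d2
view.update_original((1, 0), 0x600000)   # original b2 entry changes; the copy of b2 must follow
view.update_original((1, 1, 0), 0x3000)  # original leaf changes; the alternate override stays
```

After the single remapping, the model has allocated three table pages (the copied root plus one copy per level on the path), matching the count given in the text for the three-level example.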


4.4 Evaluation

In this section, we evaluate the implementation of our system in two aspects. First, we evaluate the performance of view switching in our modified Xen hypervisor on a PC laptop equipped with a 2.4 GHz CPU and 4 GB of RAM. Second, we evaluate the latency caused by our IO virtualization on the Nexus 10 tablet.

4.4.1 Performance Overhead of Memory View Switching

We evaluate the performance overhead of switching between the original and the alternate views by setting up executable-only (XO) pages in the original view and switch to the alternate view whenever a read/write access happens on the XO pages. Specifically, when the CPU reads or writes the an XO page, an exception would be generated for the hypervisor to handle the access violation. In our evaluation, we switch to the alternate view where those XO pages are setup as RW pages in the hypervisor and return to the guest for executing the read/write instruction. Here, we intentionally switch back to the original view right after the read/write instruction by a hypercall such that the next read/write instruction would trigger another view switch as well. We use SunSpider 1.0.2 [Sun] as the benchmark and the XO pages are chosen if they contain the instructions which would be executed by the workload. Our experiment shows that around 7.7% overhead is generated when we have a program reading those XO pages frequently. The performance of view switching could be optimized by using VMFUNC to switch views in guest mode. Our results show that around 5.6% overhead is generated when we switch views in the guest mode without an extra world switching. Figure 4.7 illustrates our evaluation results. The base is the SunSpider result running on the original Xen hypervisor. Besides the results of view switching by VMCALL and VMFUNC, the green bar shows the overhead without the program which keeps triggering view switches. Based on that, we could say that the hypervisor has only 1.5% overhead for maintaining the alternate view for one guest domain. ∼ Although we cannot measure the overhead on the real system of tHype so far, our use case is switching views only when we need to start or finish a trusted app which does not happen frequently such that the performance overhead would be acceptable. 
On the other hand, if we needed to modify all the GPA/HPA mappings of the framebuffer memory without view switching, there would be 8,000 mappings to modify on the Google Nexus 10 (4,000 pages for the background window and about 4,000 pages for the three foreground windows). With view switching, we need only a couple of instructions to update the EPT pointer, or even a single VMFUNC instruction with hardware support. This also demonstrates the benefit of our system.
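The contrast between rewriting individual mappings and switching views can be illustrated with a toy model. The sketch below is not the Xen implementation: the class and field names are hypothetical, plain dictionaries stand in for the GPA/HPA tables, and the physical addresses are made up. It only counts how many writes each strategy needs for the 8,000 framebuffer mappings quoted above.

```python
class ToyGuest:
    """Toy model of a guest with two memory views (GPA -> HPA tables)."""

    def __init__(self, original, alternate):
        self.views = {"original": original, "alternate": alternate}
        self.active = "original"  # stands in for the EPT pointer
        self.updates = 0          # number of table or pointer writes

    def switch_view(self, name):
        # View switching: one pointer update, independent of table size.
        self.active = name
        self.updates += 1

    def remap_in_place(self, new_mappings):
        # Without view switching: every affected entry is rewritten.
        table = self.views[self.active]
        for gpa, hpa in new_mappings.items():
            table[gpa] = hpa
            self.updates += 1

    def translate(self, gpa):
        return self.views[self.active][gpa]


FRAMEBUFFER_PAGES = 8000  # number of mappings quoted in the text

# Hypothetical page numbers for the untrusted and trusted framebuffers.
untrusted = {gpa: 0x10000 + gpa for gpa in range(FRAMEBUFFER_PAGES)}
trusted = {gpa: 0x90000 + gpa for gpa in range(FRAMEBUFFER_PAGES)}

with_switch = ToyGuest(dict(untrusted), dict(trusted))
with_switch.switch_view("alternate")

without_switch = ToyGuest(dict(untrusted), dict(trusted))
without_switch.remap_in_place(trusted)
```

Both guests end up translating the framebuffer pages identically, but the first needs a single update while the second needs one update per page.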


[Figure 4.7: bar chart; y-axis: Overhead (0% to 8%); bars: VMCALL, VMFUNC, zero_view_swtich]

Figure 4.7 Performance Overhead of View Switching

4.4.2 Latency of Virtualized IO

The latency of IO virtualization is evaluated on the Nexus 10 tablet. For the touchscreen input, each event is originally captured by the hardware and reported to the upper software layer in the ISR of the touchscreen driver. With our modifications, the input events are collected into a shared memory page in the hypervisor and processed by the ring-buffer driver in the guest OS, as mentioned in Section 4.3. Therefore, the latency of the virtualized input is the time from when an input event is queued in the ring buffer until it is processed by the driver in the guest OS. We add a 32-bit timestamp to the data structure of each input event for our statistics. In that timestamp field, we store the cycle count read from the PMCCNTR register provided by the ARM architecture. To enable the counter, we also need to set the enable bits in the PMCR and PMCNTENSET registers. In our experiment, we log the CPU cycles spent on thousands of input events and calculate the average latency based on the CPU frequency. The average latency of one input event is approximately 33.238 µs, which is small enough for human users. For the virtualized screen output, since we virtualize it by manipulating the memory mappings, there is no noticeable overhead in our implementation.
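The conversion from logged cycle counts to an average latency can be sketched as follows. This is a minimal illustration rather than our measurement code: the function names are hypothetical, the sample values are invented, and the 1.7 GHz clock is an assumption, since the CPU frequency used in the calculation is not stated here. The sketch also shows why a 32-bit timestamp is sufficient as long as an event is processed within one counter wraparound.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1

# Assumption for illustration only; the actual CPU frequency is not stated.
ASSUMED_CPU_HZ = 1_700_000_000


def elapsed_cycles(start, end):
    """Cycles between two 32-bit counter readings, tolerating one wraparound."""
    return (end - start) & COUNTER_MASK


def average_latency_us(samples, cpu_hz):
    """Average latency in microseconds over (enqueue, dequeue) cycle pairs."""
    total = sum(elapsed_cycles(start, end) for start, end in samples)
    return total / len(samples) / cpu_hz * 1e6


# Invented samples: the second pair wraps around the 32-bit counter.
samples = [
    (1_000, 57_100),          # 56,100 cycles
    (0xFFFFF000, 0x0000CB24)  # 4,096 cycles before the wrap + 52,004 after
]

latency = average_latency_us(samples, ASSUMED_CPU_HZ)
```

Masking the difference with 2^32 - 1 gives the correct cycle count even when the second reading is numerically smaller than the first.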

4.5 Discussion

In this section, we revisit our system design and implementation for possible improvements. First, the use of memory view switching for paravirtualization is new to existing hypervisors. We think the GPA/HPA mapping is suitable for multiplexing hardware resources on memory-mapped I/O platforms such as ARM. Specifically, hardware registers, DMA buffers, etc. can be masked by controlling the SLAT in the hypervisor. When the hypervisor needs to allow a guest to access a certain hardware resource, the lightweight memory view switch can achieve that. Although the current view switching prototype is based on the x86 architecture, the ARM virtualization extension has a similar multi-level table for translating intermediate physical addresses (IPAs) to physical addresses (PAs). We think the concepts of switching between different “root tables”, performing copy-on-write (COW) to save space, and synchronizing multiple tables with “dirty bits” are common to different implementations of SLAT. As we have demonstrated the feasibility of our scheme on the x86 Xen hypervisor, only minimal engineering effort should be required to enable it on other hypervisors.

As mentioned above, the tHype system is suitable for memory-mapped I/O platforms because of its memory view switching design. This enables us to build a framework in the guest OS for virtualizing more devices. For example, instead of using hypercalls to export information to the hypervisor, the framework can provide a set of memory allocator APIs that are bound to the hypervisor, which controls the real hardware resources. This way, all kinds of drivers in the guest OS that need to access the HPA for controlling the hardware could be virtualized by the hypervisor smoothly, which would greatly help the deployment of tHype. We leave this as future work.

Since our current prototype relies on the bootloader to bootstrap the hypervisor as a customized kernel image, the integrity of the bootloader plays an important role in ensuring that the mobile device cannot boot unauthorized images, which could impersonate tHype and trick users into entering private data. This may involve extra hardware for storing the key used to encrypt and decrypt the boot image. As this is beyond the scope of our research, we assume a trusted bootloader whose integrity can be ensured by security mechanisms such as the Secure/Trusted Boot provided by Samsung KNOX [Kno].
While developing the virtualized framebuffer, we found that the framebuffer memory pages can be dynamically allocated in some cases. For example, when the background is set as an extended desktop, the number of memory pages for the background window doubles compared to the static wallpaper setting. When “Live Wallpaper” is set, a new framebuffer window is allocated. For simplicity, we handle only the static wallpaper case in our prototype by pre-allocating framebuffer memory pages. However, it is not complicated to provide a hypercall interface for the guest to dynamically allocate pages or new windows.

To allow the trusted guest domain to access the network and disk, we redirect the I/O requests to helpers running on the untrusted guest OS. This may result in a problem: compromised helpers can drop all requests and stop serving the trusted domain. Although we could port more drivers into the hypervisor and build a complete software stack in the trusted OS (e.g., networking and filesystem) to decouple the trusted domain from the untrusted domain, we choose to implement a timeout mechanism in the trusted OS that warns the user of suspicious behavior before more damage is done.


4.6 Summary

We have presented the design, implementation, and evaluation of tHype, a thin hypervisor that allows users to access critical I/O devices securely. Specifically, by virtualizing the framebuffer/LCD and touchscreen, our system creates a secure tunnel for users to input critical private data, such as passwords and bank accounts, while using security-sensitive apps. We have implemented a tHype prototype on the Google Nexus 10 tablet, while part of the functionality is developed on the x86 Xen hypervisor. The evaluation results show that our system is practical and efficient.

CHAPTER 5

DEPRIVILEGING HOSTED HYPERVISORS WITH DEHYPE

5.1 Introduction

Based on recent advances in hardware virtualization (e.g., Intel VT [Int] and AMD SVM [Amd]), hosted hypervisors non-intrusively extend the underlying host operating systems (OSs) and greatly facilitate the adoption of virtualization. For example, KVM [Kiv07] is implemented as a loadable kernel module that can be conveniently installed and launched on a commodity host system without re-installing the host system. Moreover, hosted hypervisors can readily benefit from a variety of functionalities as well as the latest hardware support implemented in commodity OSs. As a result, hosted hypervisors have been increasingly adopted in today’s virtualization-based computer systems [Net].

Unfortunately, virtualizing a computer system with a hosted hypervisor is still a complex and daunting task. Despite the advances in hardware virtualization and the leverage of various functionality in host OS kernels, a hosted hypervisor remains a privileged driver that has a large code base with a potentially wide attack surface. For instance, the KVM kernel module alone contains 33.6K source lines of code (SLOC) that should be part of the trusted computing base (TCB). Moreover, within the current code base, several components that are inherent to its design and implementation are rather complex. Examples include the convoluted memory virtualization and guest instruction emulation. These components occupy half of its code base and are often home to various exploitable vulnerabilities.

Using the popular hosted hypervisors KVM and VMware Workstation as examples, if we examine the National Vulnerability Database (NVD) [Nvd], there are more than 24 security vulnerabilities reported in KVM and 49 in VMware Workstation in the last three years. Some of these vulnerabilities have been publicly demonstrated to “facilitate” the escape from a confined but potentially subverted (or even malicious) VM to completely compromise the hypervisor and then take over the host OS [Clo; Vir]. Evidently, having a compromised hosted hypervisor is not just a hypothetical possibility, but a serious reality.

Moreover, once a hypervisor is compromised, the attacker can further take over all the guests it hosts, which could lead not only to disruption of hosted services, but also to leakage of potentially confidential data contained within guest VMs. It has been reported that the data confidentiality and auditability problem is a main obstacle to the continued growth and wide adoption of cloud computing [Arm10]. Consequently, there is a pressing need to develop innovative solutions to protect the host system and running guest VMs from a compromised (hosted) hypervisor.

To address the above need, researchers have explored various approaches. For example, systems have been proposed to formally verify small micro-kernels (e.g., seL4 [Kle09]) so that they do not contain certain software vulnerabilities. Others (e.g., HyperSafe [WJ10]) admit the presence of exploitable software bugs in hypervisors, but develop new techniques to protect the runtime hypervisor integrity.
Additional systems have also been developed to revisit (bare-metal) hypervisor design by proposing new architectures so that the hypervisor TCB can be minimized [Mur08; SK10]. However, these systems typically require a new bare-metal hypervisor design, so their applicability to commodity hosted hypervisors remains to be shown. In a different vein, a number of systems have been proposed to isolate buggy or untrusted device drivers [BWZ10; Gan08; Sha09; SG11; Xio11]. However, it is unclear how they can be applied to protect hosted hypervisors. In particular, they do not address host-guest mode switches and hardware-based memory virtualization (e.g., EPT [Int]), which are unique and essential to hosted hypervisors. HyperLock [Wan12] similarly creates a separate address space in the host OS kernel so that the execution of KVM as a loadable module can be isolated. However, it still runs in privileged mode and requires additional complex techniques to avoid possible misuse of privileged code.

In this chapter, we present DeHype, a system that applies the least privilege principle to hosted hypervisors so that the attack surface can be dramatically reduced. Specifically, by deprivileging the execution of (most) hypervisor code to user mode, we can not only reduce the exposed attack surface, but also protect the host system even in the presence of a compromised hypervisor.¹ However, challenges exist in deprivileging hosted hypervisor execution. In particular, hosted hypervisors are typically tightly coupled with the host OSs. Accordingly, we propose a dependency decoupling technique to break the tight dependency of hosted hypervisors on host OSs. In other words, the related kernel interfaces leveraged by hosted hypervisors are abstracted and provided in user space. As a result, the related functionalities such as memory management and signal handling can be re-provisioned to the hypervisor without the help of the host OS. Moreover, to allow for hardware virtualization support (e.g., Intel VT-x [Int]), there are certain instructions that cannot be deprivileged. To accommodate them, we place a minimal subset of privileged hypervisor code into an OS extension called HypeLet. When the (deprivileged) hypervisor needs to issue a privileged instruction, it traps to HypeLet through a system call, and HypeLet executes the related instruction in privileged mode. In addition, as hardware support for memory virtualization such as EPT [Int] requires mapping virtual addresses into physical addresses, when DeHype deprivileges the related memory virtualization functionality to user mode, we propose another technique called memory rebasing for efficient translation in user mode.

We have developed a proof-of-concept prototype to deprivilege the popular hypervisor KVM (version kvm-2.6.32.28). Specifically, our prototype runs 93.2% of the loadable KVM module code base in user mode while adding a small TCB (about 2.3K SLOC) to the host OS kernel. By decoupling the hypervisor code from the host OS and deprivileging its execution, our system essentially demotes the hypervisor to a user-level library (e.g., together with the original companion program, QEMU [Bel05]).
This brings additional benefits for its development, extension, and maintenance. For example, since it runs as a user-mode process, we can use various feature-rich tools (e.g., GDB [Gdb] and Valgrind [Val]) to facilitate its development and debugging. Moreover, the DeHype design naturally supports running multiple (deprivileged) hypervisors independently on the same host and also opens new opportunities for readily applying recent “out-of-VM” monitoring methods or security mechanisms (e.g., VMwatcher [Jia07] and Ether [Din08]). The evaluation with a number of benchmark programs shows that our system is effective and lightweight (with a performance overhead of less than 6%).

5.2 Design

By effectively deprivileging the execution of hosted hypervisors, we aim to significantly reduce the attack surface they expose. To elaborate our design, we use the popular KVM hypervisor as the example. Specifically, KVM is an open-source hosted hypervisor that has been integrated into the mainstream Linux kernel. It is implemented as a loadable kernel module which, once loaded, extends the host OS to make use of hardware virtualization support. Each KVM-based guest has a user-mode companion program called QEMU, which facilitates bootstrapping guest machines and emulating certain hardware devices (e.g., network cards) by directly interacting with KVM via system calls. For instance, the companion QEMU program may issue an ioctl command, say KVM_RUN, to KVM to perform a host-to-guest world switch. By design, each guest VM is paired with an instance of the user-mode QEMU program while sharing the same privileged KVM hypervisor instance with other guest VMs.

¹ Although the hosted hypervisor includes the host OS in its TCB, we greatly narrow down the interface exposed by the host OS to untrusted (guest) code.

[Figure 5.1: two panels. (a) Original KVM Architecture: guest VMs (App, OS) share a single KVM instance inside the host OS kernel. (b) Deprivileged KVM Architecture: each guest VM (App, OS) is paired with its own KVM instance in user mode, with HypeLet in the host OS kernel.]

Figure 5.1 An overview of DeHype to deprivilege hosted hypervisor execution

With DeHype, we decompose KVM into two parts: the deprivileged KVM hypervisor running in user mode and a minimal loadable kernel module called HypeLet running in kernel mode. The deprivileged KVM essentially runs as a user-level library that provides the necessary functionality to interact with HypeLet. In our current design, we naturally integrate the deprivileged KVM into its user-mode companion program QEMU. By doing so, when QEMU issues an ioctl command to KVM, the deprivileged hypervisor receives it as a user-mode function call and then processes it locally. If the processing involves certain privileged code that cannot be deprivileged, it relays the request to HypeLet through a system call. As a result, if a host runs multiple VMs, each VM is paired with its own instance of deprivileged KVM and the original QEMU instance while sharing the same HypeLet OS extension. Figure 5.1 compares the original KVM with the deprivileged KVM. In the rest of this section, we describe our system in detail with a focus on key challenges and related solutions.
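The resulting control flow can be sketched with a toy model. The code below is only an illustration under stated assumptions: the class names are hypothetical, a method call on a shared stub object stands in for a system call into HypeLet, and treating KVM_GET_API_VERSION as a locally handled command is our assumption for the example, not a statement about the prototype.

```python
class HypeLetStub:
    """Stand-in for the kernel-mode HypeLet; counts simulated system calls."""

    def __init__(self):
        self.syscalls = 0

    def guest_run(self, vm_id):
        # Privileged work (the host-to-guest world switch) would happen here.
        self.syscalls += 1
        return f"vm{vm_id}: entered guest mode"


class DeprivilegedKVM:
    """Per-VM user-mode hypervisor library, linked into the QEMU process."""

    API_VERSION = 12  # the value KVM reports for KVM_GET_API_VERSION

    def __init__(self, vm_id, hypelet):
        self.vm_id = vm_id
        self.hypelet = hypelet  # the single HypeLet shared by all VMs

    def ioctl(self, command):
        # What used to be a system call into KVM is now a plain function call.
        if command == "KVM_GET_API_VERSION":
            return self.API_VERSION  # assumed to be handled locally
        if command == "KVM_RUN":
            return self.hypelet.guest_run(self.vm_id)  # relayed to HypeLet
        raise ValueError(f"unsupported command: {command}")


hypelet = HypeLetStub()
kvm_a = DeprivilegedKVM(1, hypelet)
kvm_b = DeprivilegedKVM(2, hypelet)

version = kvm_a.ioctl("KVM_GET_API_VERSION")
calls_after_local_request = hypelet.syscalls
result_a = kvm_a.ioctl("KVM_RUN")
result_b = kvm_b.ioctl("KVM_RUN")
```

Each VM owns its hypervisor instance, so state held by one instance is never shared with another; only the small HypeLet stub is common to both.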


5.2.1 Dependency Decoupling

To deprivilege a hosted hypervisor, our first challenge is to delineate the tight dependency between the hosted hypervisor and the host OS so that the two can be decoupled. In particular, KVM makes heavy use of several key functionalities implemented in the host OS. For example, KVM allocates kernel memory using the default slab allocator [Bon94] provided by the Linux kernel. Also, the cond_resched API is invoked to relinquish the processor, for example when the hypervisor is waiting for certain inputs or events. Accordingly, we need to supply those functionalities to the hypervisor in user space.

Our approach starts with a breakdown of the KVM hypervisor. By decomposing it into multiple components, we gain the necessary insights and take different approaches to deprivilege them. Specifically, there are a few components that involve little or no interaction with the host OS and thus can be moved into user space in a straightforward manner. One such example is the guest instruction emulation component in KVM. Although the component itself is rather complex and is invoked to interpret and execute certain guest instructions, its interaction with the host OS is minimal and it can be largely deprivileged to user mode.

Meanwhile, there also exist components that rely on the host OS for their functionality. A representative example is kernel memory management, which depends on the host OS kernel through known kernel APIs for memory allocation and deallocation. To deprivilege it, we need to provide a user-mode counterpart. In certain cases where a privileged operation is involved, a user-mode replacement may not be sufficient and it becomes necessary to split the functionality into two parts: one in user mode and the other in kernel mode. As the user-mode part is deprivileged, there is a need to minimize the kernel-mode part, which eventually becomes part of HypeLet.
One example is guest memory virtualization, where basic operations on updating guest page tables may be performed in user mode but critical ones that instantiate them or put them into effect should be performed in kernel space only. Last but not least, there also exist certain components in KVM that cannot be demoted to user space. For example, kernel-side event handling and notification as well as hardware virtualization support (e.g., Intel VT-x [Int]) remain inside the host OS kernel and become part of HypeLet.

We highlight that HypeLet should contain only the privileged hypervisor code that simply cannot be executed in user space. As HypeLet is part of the TCB, it is desirable to keep it minimal. In our current prototype, it mainly contains those privileged instructions introduced for hardware virtualization support (e.g., Intel VT-x [Int]). When the deprivileged hypervisor running in user mode needs to issue such a privileged instruction, it traps to HypeLet through a system call, and HypeLet then executes the corresponding instruction in the privileged kernel space. In addition to those privileged instructions, there also exist kernel-side routines in HypeLet to serve inquiries from the deprivileged hypervisor. For example, a MAP_HVA_TO_PFN service is provided to translate a host virtual address to the related physical memory frame, which is needed to deprivilege hardware-assisted memory virtualization (Section 5.2.2).

To further restrict the deprivileged KVM and user-mode QEMU, we also limit the exposed system call interface and available resources with system call interposition. By doing so, we can effectively mediate the runtime interaction of the deprivileged KVM (and QEMU) with HypeLet. As system call interposition is a well-studied topic, we omit the details.

5.2.2 Memory Rebasing

Our next challenge is to efficiently support hardware-assisted memory virtualization such as Intel’s EPT [Int]. Specifically, with hardware-assisted memory virtualization, a hosted hypervisor needs to directly manage memory pages in the physical address space so that the addresses stored in the nested page tables can be accessed by guest VMs. In the original KVM design as a loadable kernel module, it can simply use the feature-rich APIs in the host OS kernel to translate between the virtual and physical address spaces. However, once the hypervisor is deprivileged, two main challenges arise. First, a memory page allocated by a user-level program may be paged out at runtime. Second, a user-level program does not have the mapping information for virtual-to-physical translation.

In our current prototype, we solve these problems by allocating pinned memory blocks in the Linux kernel and mapping them to user space. Specifically, through HypeLet, we pre-allocate a contiguous pinned memory block for each hypervisor. The pre-allocated memory block is then mapped to user space through the mmap system call so that the (deprivileged) hypervisor can access it and use it to build the memory pool for its internal memory management (Section 5.2.1). Given the base address of the pre-allocated memory, the hypervisor, though running in user mode, can still obtain the mapping necessary to translate the host virtual address of a memory chunk allocated from its memory pool into a physical address. Accordingly, we propose a memory rebasing technique that simply calculates the offset within the memory pool in virtual space and adds it to the base of the pre-allocated block in physical space. Since the memory pool is mapped from a pinned memory block allocated in the kernel, we can ensure that any memory page allocated from the pool is always present.
Therefore, the hypervisor can safely assign those memory pages into the nested page tables with the corresponding physical addresses.²

² For simplicity, our current prototype assumes that the hypervisor makes static physical memory allocation in its initialization phase. However, it could be readily extended to support dynamic physical memory allocation (e.g., by maintaining multiple pinned memory blocks and associated base addresses).

In essence, by applying the memory rebasing mechanism, we allow the deprivileged hypervisor to maintain nested page tables (NPTs) in user mode. With that, these NPTs become the interface for guest VMs to access actual physical memory pages. There is a caveat though: if the hypervisor is compromised, despite the fact that it runs in user mode, a guest VM might still be able to access memory beyond the permitted range. In other words, these NPTs may be exploited to subvert the host OS. Fortunately, as NPTs are only used in guest mode, we can postpone all NPT updates (requested by the hypervisor) until the next VM entry occurs. Since every VM entry is handled by the privileged HypeLet, we can apply a sanity check to ensure that only memory pages that belong to the hypervisor or the guest VM are eligible to be mapped (right before HypeLet updates NPTs for actual use). In our prototype, when the user-mode hypervisor is about to update an NPT entry, the entry address and the value to be stored are recorded in a buffer, which is batch-processed later when the hypervisor traps to HypeLet. During the sanity check, if a malicious address is identified in the buffer, HypeLet simply suspends the execution of the affected hypervisor and its guest VM. By doing so, a compromised hypervisor cannot access memory pages that belong to other guest VMs or the host OS.
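The deferred update and sanity check can be illustrated with a small model. This is a sketch under simplifying assumptions, not the HypeLet code: all names are hypothetical, the buffered value is treated as a bare physical page address without permission bits, and the permitted memory is described by a list of half-open physical ranges.

```python
class SuspendHypervisor(Exception):
    """Raised when a buffered NPT update targets memory outside the VM."""


class NptChecker:
    """Toy model of the check performed in the kernel before a VM entry."""

    def __init__(self, allowed_ranges):
        # (start, end) physical ranges owned by this hypervisor or its guest.
        self.allowed_ranges = allowed_ranges
        self.real_npt = {}  # entry address -> stored physical address

    def _permitted(self, phys_addr):
        return any(lo <= phys_addr < hi for lo, hi in self.allowed_ranges)

    def vm_entry(self, pending):
        # Validate the whole batch first so a bad entry applies nothing.
        for _entry_addr, value in pending:
            if not self._permitted(value):
                raise SuspendHypervisor(hex(value))
        for entry_addr, value in pending:
            self.real_npt[entry_addr] = value
        pending.clear()


checker = NptChecker(allowed_ranges=[(0x100000, 0x200000)])

# The user-mode hypervisor only records (entry address, value) pairs.
good_batch = [(0x1000, 0x100000), (0x1008, 0x1FF000)]
checker.vm_entry(good_batch)

# A compromised hypervisor tries to map a page outside its permitted range.
bad_batch = [(0x1010, 0x300000)]
try:
    checker.vm_entry(bad_batch)
    suspended = False
except SuspendHypervisor:
    suspended = True
```

Because the user-mode side never writes the real tables, a forged entry is caught at the only point where it could take effect.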

5.2.3 Optimizations

When compared with the original KVM running in kernel mode, a deprivileged one needs to trap to HypeLet for privileged operations. This naturally introduces system call latency and potentially becomes a source of performance overhead. In our prototype, we monitor the bootstrap process of a guest VM to understand the number of traps (to HypeLet) caused by the privileged instructions executed within each KVM_RUN session. Our results show that thousands of privileged instructions are executed within most KVM_RUN sessions when the guest VM is booting up. As an example, we observe 195,187 privileged instructions executed within one particular KVM_RUN session. If we naively invoked a system call for each privileged instruction, this would translate to 195,187 system calls for that KVM_RUN session.

To minimize the performance overhead, we propose a cache-based batch-processing mechanism to reduce the number of unnecessary system calls. In particular, by profiling the runtime behavior of the deprivileged KVM (another benefit of running it in user mode), we notice that most system calls are triggered by instructions that access various fields in the virtual-machine control structure (VMCS). We also notice that it is not necessary to keep those VMCS fields synchronized at all times. In fact, while running in host mode, as long as these fields are updated before the next host-to-guest world switch, we can ensure the correctness of guest execution. Based on the above observations, we maintain a cached VMCS copy in user mode that the deprivileged hypervisor can access without invoking any system calls. The cached copy is synchronized with the real one (maintained in the kernel) on demand when there is a need to issue a world switch.

Besides the cache-based VMCS optimization, our system also implements an optimization for another frequently invoked privileged service in HypeLet, i.e., MAP_HVA_TO_PFN. This privileged service answers queries to translate a host virtual address into the corresponding physical frame number. Unlike the memory rebasing mechanism described earlier, this service can be used to translate memory pages allocated by QEMU, which are not from the hypervisor’s memory pool. Although these memory pages are not managed by the hypervisor, it still needs their physical addresses to handle related NPT faults. Since the mapping of these memory pages stays consistent throughout the QEMU lifetime, we cache the mappings that have already been queried inside the hypervisor to reduce the number of system call traps into HypeLet.
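The effect of the cached VMCS copy on the number of system calls can be sketched as follows. The model is illustrative only: the class and method names are hypothetical, a single batched call stands in for the synchronization performed before a world switch, and reading back fields that the hardware changes during guest execution is left out.

```python
class KernelVmcs:
    """Stand-in for the real VMCS kept in the kernel; counts system calls."""

    def __init__(self):
        self.fields = {}
        self.syscalls = 0

    def vmwrite_batch(self, updates):
        # One trap into the kernel applies every pending field update.
        self.syscalls += 1
        self.fields.update(updates)


class CachedVmcs:
    """User-mode VMCS copy with write-back before the next world switch."""

    def __init__(self, kernel_vmcs):
        self.kernel_vmcs = kernel_vmcs
        self.cache = {}
        self.dirty = set()

    def write(self, field, value):
        self.cache[field] = value  # no system call
        self.dirty.add(field)

    def read(self, field):
        return self.cache[field]   # no system call

    def sync_before_world_switch(self):
        if self.dirty:
            pending = {field: self.cache[field] for field in self.dirty}
            self.kernel_vmcs.vmwrite_batch(pending)
            self.dirty.clear()


kernel = KernelVmcs()
vmcs = CachedVmcs(kernel)

# Many updates in host mode, e.g. while emulating guest instructions.
for step in range(1000):
    vmcs.write("GUEST_RIP", 0x1000 + 4 * step)
vmcs.write("GUEST_RSP", 0x7FF0)

vmcs.sync_before_world_switch()
```

In this toy run, 1,001 field writes cost one system call instead of 1,001, and the kernel copy holds only the final value of each field.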

5.3 Implementation

We have implemented a proof-of-concept prototype to deprivilege the KVM execution (version 2.6.32.28). Our current prototype is developed on a Dell desktop (with an Intel Core i7 860 CPU and 3 GB of memory) running Ubuntu 11.10 and Linux kernel 2.6.32.28. Next, we present our prototype in more detail.

5.3.1 Dependency Decoupling

To deprivilege the KVM execution, our prototype abstracts the host OS interface used by KVM and provides a similar one in user mode. Specifically, our prototype provides a slab-based memory allocator in user mode to satisfy KVM’s needs for allocating and releasing memory. Unlike the default memory allocator in the Linux kernel, which prepares its memory pool at boot time from the pre-defined kernel heap, our memory allocator can be flexibly configured to set its heap to an arbitrary memory block in user space, which is one key step to enable the memory rebasing mechanism (Section 5.2.2). Our prototype also provides the function routines necessary to emulate the original kernel memory access APIs. For example, virt_to_page is widely used in KVM to translate a virtual address to the corresponding memory frame. As the deprivileged hypervisor allocates memory pages from an internal memory allocator, the original memory accesses cannot be used directly but need to be adjusted to conform to the different layout of the memory heap. Moreover, our prototype leverages the default support in GLIBC [Gli] for a variety of tasks, such as handling signals, performing process scheduling-related operations, and invoking system calls to trigger the privileged HypeLet services. As these library routines are ready to use, we found integrating them with the deprivileged KVM hypervisor to be a rather straightforward process.

As mentioned earlier, there also exist some privileged instructions that cannot be demoted to user space. To accommodate them, our prototype introduces HypeLet to support a minimal set of privileged hypervisor code that can be invoked from the deprivileged KVM. Table 5.1 lists the privileged services supported in HypeLet. In total, there are 10 privileged services. Six of them, i.e., VMREAD, VMWRITE, GUEST_RUN, GUEST_RUN_POST, INVVPID, and INVEPT, execute privileged instructions introduced for hardware virtualization support. INIT_VCPU initializes the essential data structures for a virtualized guest VM, including the vCPU. RDMSR and WRMSR access model-specific registers with privileged instructions. Our profiling results indicate that RDMSR and WRMSR are mainly used in the VM initialization phase and do not occur frequently in normal hypervisor execution. The last service, MAP_HVA_TO_PFN, does not contain any privileged instruction but is included to answer requests (from the deprivileged KVM) about the mapping from a host virtual address to its physical address. Since the hypervisor requires the mapping to handle possible NPT faults, MAP_HVA_TO_PFN is a frequently requested service that should be optimized (Section 5.2.3).

Table 5.1 Ten Privileged Services in DeHype

Name              Function Description
VMREAD            read VMCS fields
VMWRITE           write VMCS fields
GUEST_RUN         perform host-to-guest world switches
GUEST_RUN_POST    perform guest-to-host world switches
RDMSR             read MSR registers
WRMSR             write MSR registers
INVVPID           invalidate TLB mappings based on VPID
INVEPT            invalidate EPT mappings
INIT_VCPU         initialize vCPU
MAP_HVA_TO_PFN    translate host virtual address to physical frame

5.3.2 Memory Rebasing

With deprivileged KVM, the support of hardware-assisted memory virtualization poses unique challenges. Unlike prior software-based approaches that require the hypervisor to frequently update shadow page tables, hardware-assisted memory virtualization enables the guest to maintain guest page tables (GPTs) while the hypervisor maintains nested page tables (NPTs) to regulate the translation from guest physical addresses to host physical addresses. To maintain NPTs, the hypervisor needs to allocate memory pages and store the associated physical addresses into NPTs for proper translation. For the traditional KVM as a loadable kernel module, allocating new memory pages and translating their virtual addresses into physical addresses are relatively straightforward. However, with DeHype, the deprivileged hypervisor runs in user mode and has no knowledge of the physical address space. Moreover, the deprivileged hypervisor cannot prevent the host OS kernel from paging out the memory pages it allocated.

[Figure 5.2: diagram relating the virtual and physical address spaces through four numbered lines: 1. pre-allocating pinned memory in kernel space (k_base); 2. remapping the pinned memory to user space (u_base); 3. u_addr -> k_addr; 4. k_addr -> p_addr.]

Figure 5.2 The memory management in DeHype. The solid lines mark the ways to generate the memory blocks in different address spaces while the dotted lines mark the translation between memory address spaces.

To address these problems, our prototype implements a memory rebasing mechanism that helps the deprivileged hypervisor maintain NPTs correctly. In essence, our solution (shown in Figure 5.2) involves allocating pinned memory pages in kernel space and then remapping them to user space. Specifically, in the initialization phase (line 1), HypeLet pre-allocates a pinned memory block (base address: k_base) for each hypervisor.³ With a simple driver interface implemented in HypeLet, we allow the user-mode hypervisor to remap the pinned memory block to user space. In particular, an mmap call effectively translates k_base to u_base, so that the pinned memory block based at k_base in kernel memory can be accessed through u_base in user space (line 2). After that, the mmap’ed memory block, together with the pair (k_base, u_base), can be used to build the memory pool for the deprivileged hypervisor’s memory allocator in user space. By doing so, we can guarantee that each memory page allocated from the pool can be efficiently translated to the physical address space with our scheme.

3 In the kernel configuration, CONFIG_FORCE_MAX_ZONEORDER can be adjusted for allocating a larger-sized block.


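[Figure 5.3 (diagram). Labels: Time axis; User side with the pseudo NPT (pages R′, A′, B′, C′ linked through entries i, j, k); Kernel side with the real NPT (pages R, A, B, C linked through entries i, j, k); Buffer holding the records "Allocate A; (R[i]=A)", "Allocate B (A[j]=B)", "Allocate C (B[k]=C)"; Privileged Service Request; VM Entry.]

Figure 5.3 An example of constructing pseudo NPTs for the deprivileged hypervisor to traverse.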

As an example, suppose the hypervisor allocates a new NPT table for NPT violation handling. Whenever an NPT violation occurs, a memory page (located at u_addr) is allocated from the memory pool for filling the page table entry. To do that, we need to locate the corresponding physical address, namely p_addr. As the mapping of a user-space address to a physical address cannot be conveniently retrieved, we choose to use the corresponding kernel-space address, namely k_addr, and rely on the virt_to_phys(x) function, which in our x86-32 Linux-based prototype is a simple calculation, i.e., (x) - PAGE_OFFSET, to perform the translation. Further, because u_addr is allocated from the memory pool based at u_base, which has a corresponding kernel-space address k_base, we can simply calculate k_addr as u_addr - u_base + k_base (line 3). With that, we can further calculate p_addr with virt_to_phys (line 4) and use it to update the NPT entry. To securely update NPT entries (Section 5.2.2), each deprivileged hypervisor instance saves the (address, value) pairs to be updated into a local buffer for batch processing. Note that the NPT consists of four levels of page tables. If the hypervisor needs to update an entry in the level-1 table (the lowest level), the parent level-2 table as well as all the ancestor tables (level-3 and level-4) need to be traversed before reaching the level-1 table. Since our hypervisor runs in user mode and is prohibited from performing NPT updates, there are no actual NPTs for traversal from the hypervisor's standpoint. To accommodate that, we choose to construct pseudo NPTs. Specifically, when an NPT violation occurs, the hypervisor allocates two memory pages, page P from the mmap'ed memory pool and page P′ from the process's heap, while a hash table is used for bookkeeping the relationship between them. The hypervisor uses P to update the real NPT and P′ to update the pseudo NPT.
In particular, as illustrated in Figure 5.3, we first initialize a root-level or level-4 pseudo page table R′. The NPT traversals are redirected to the pseudo page table and the updates go to the real root-level table R. When the first NPT violation occurs, all NPTs except the root-level one are empty. We then allocate a page A′ to modify some entry, say i, of R′. At the same time, we also allocate a page A and issue an update to the i-th entry in R. Therefore, a further update on the j-th entry of the level-3 table A′ could be done by (1) finding A′ from R′, (2) allocating two pages B′ and B, (3) bookkeeping the two pages in the hash table, (4) updating the j-th entry of A′ with B′, and (5) adding a record of updating the j-th entry of A with B. For further updates to an existing entry on the pseudo NPT (e.g., flushing page B′), the corresponding log for the page B on the real NPT could be obtained with the help of the hash table. As a result, the hypervisor can traverse the pseudo NPT and generate accurate records for updating the real NPT. Our pseudo NPT design is similar to traditional shadow paging but differs in two aspects. First, pseudo NPT only shadows the NPT tables, while shadow paging needs to mirror a much larger number of guest page tables. Second, our scheme batch-updates the real NPT tables and thus incurs less performance overhead than shadow paging, which must trap the guest's updates to its page tables and synchronize them to the real page tables. Our experiments show that pseudo NPT enables the hypervisor to securely manage NPT with a small performance overhead. Although pseudo NPT introduces additional memory overhead, it is necessary for securing NPT updates, as we assume that the hypervisor is untrusted.
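The pseudo NPT bookkeeping described above can be sketched as follows. This is a simplified model under stated assumptions, not the prototype's implementation: the names PseudoPage, PseudoNPT, map, and flush are hypothetical, real pages are represented only by integer addresses (standing for physical addresses already obtained through the rebasing scheme), entry flags are ignored, and 512 entries per table is an assumption. The hypervisor walks and updates the pseudo tables on its heap, while every change is also appended to a log of (real table, index, value) records for later batch processing.

```python
ENTRIES = 512      # assumption: 4 KB table pages with 8-byte entries
PAGE_SIZE = 0x1000


class PseudoPage:
    """A pseudo table page living on the hypervisor's ordinary heap."""

    def __init__(self):
        self.entries = [None] * ENTRIES


class PseudoNPT:
    def __init__(self, real_root: int, pool_base: int):
        self.root = PseudoPage()                # R': the pseudo root table
        self.real_of = {self.root: real_root}   # hash table: pseudo page -> real page
        self.next_real = pool_base              # next free page in the pinned pool
        self.update_log = []                    # buffered (real table, index, value)

    def _alloc_pair(self) -> PseudoPage:
        """Allocate a pseudo page and its real counterpart, and record the pair."""
        pseudo = PseudoPage()
        self.real_of[pseudo] = self.next_real
        self.next_real += PAGE_SIZE
        return pseudo

    def map(self, indices, value):
        """Install `value` at the leaf reached by `indices` (top level first)."""
        table = self.root
        for idx in indices[:-1]:
            child = table.entries[idx]
            if child is None:
                child = self._alloc_pair()
                table.entries[idx] = child      # update the pseudo NPT directly
                # log the matching update for the real NPT
                self.update_log.append(
                    (self.real_of[table], idx, self.real_of[child]))
            table = child
        leaf = indices[-1]
        table.entries[leaf] = value
        self.update_log.append((self.real_of[table], leaf, value))

    def flush(self):
        """Hand the buffered records over for batch processing."""
        batch, self.update_log = self.update_log, []
        return batch


npt = PseudoNPT(real_root=0x100000, pool_base=0x200000)
npt.map([1, 2, 3, 4], 0xABC000)   # first violation: three new tables plus the leaf
first_batch = npt.flush()
npt.map([1, 2, 3, 5], 0xDEF000)   # same tables already exist: one leaf update
second_batch = npt.flush()
```

In this model the second mapping traverses the existing pseudo tables and produces a single record, which mirrors how the hash table lets the hypervisor find the real page behind each pseudo page without reading the real NPT.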

5.3.3 Optimizations

As elaborated in Section 5.2.3, our system design requires a system call to invoke any privileged service in HypeLet, which could introduce extra performance overhead. To mitigate that, we provide a cache-based batch processing mechanism to reduce the number of unnecessary system calls. In particular, our prototyping experience shows that around 90% of invoked privileged instructions are related to accessing the virtual machine control structure (VMCS). Therefore, our prototype aims to reduce the overhead from the large number of VMCS accesses.

To elaborate on our implementation, we briefly review how VMCS is accessed in a virtualized system. For each guest VM, the hypervisor needs to allocate memory to initialize the corresponding VMCS. Before the guest launches and between each of its guest-mode runs, two privileged instructions, VMREAD and VMWRITE, are executed to access VMCS (for the purpose of either monitoring or controlling the behavior of the guest VM). Throughout the running period in guest mode, the guest VM execution indirectly affects related VMCS fields, which can later be retrieved by the hypervisor when it switches back to host mode (e.g., triggered by a VMEXIT).

In our implementation, we maintain a VMCS copy in user mode so that VMREAD calls can simply be redirected to read up-to-date results from the cache without issuing any system call. To avoid synchronizing the large VMCS structure (over 140 fields inside), we profile the KVM execution to locate the 28 most frequently accessed VMCS fields and save them in the cache. By caching those 28 fields, we found that we can eliminate 99.86% of the extra system calls caused by VMREAD requests. For VMWRITE, we apply a similar caching scheme. By choosing to save the 8 most frequently VMWRITE'ed VMCS fields, we can eliminate 98.28% of the extra system calls caused by VMWRITE requests. In total, our prototype caches 31 VMCS fields (see footnote 4) and achieves a good balance between the synchronization cost and the system call latency. The detailed list of cached fields is shown in Table 5.2.

Table 5.2 Cached VMCS Fields

VMREAD
    GUEST_INTERRUPTIBILITY_INFO     GUEST_CS_BASE
    IDT_VECTORING_INFO_FIELD        GUEST_ES_BASE
    GUEST_PHYSICAL_ADDRESS_HIGH     GUEST_CR3
    VM_EXIT_INTR_INFO               GUEST_RFLAGS
    GUEST_PHYSICAL_ADDRESS          VM_EXIT_REASON
    VM_EXIT_INSTRUCTION_LEN         GUEST_CR4
    EXIT_QUALIFICATION              GUEST_DS_BASE
    CPU_BASED_VM_EXEC_CONTROL       GUEST_RSP
    GUEST_CS_SELECTOR               GUEST_RIP
    GUEST_CS_AR_BYTES               GUEST_CR0
    GUEST_PDPTR0_HIGH               GUEST_PDPTR0
    GUEST_PDPTR1_HIGH               GUEST_PDPTR1
    GUEST_PDPTR2_HIGH               GUEST_PDPTR2
    GUEST_PDPTR3_HIGH               GUEST_PDPTR3

VMWRITE
    GUEST_RFLAGS                    GUEST_RSP
    CPU_BASED_VM_EXEC_CONTROL       GUEST_RIP
    VM_ENTRY_INTR_INFO_FIELD        EPT_POINTER
    EPT_POINTER_HIGH                GUEST_CR3

Further, in order to maintain the same hardware protection scheme of VMCS, we have a dirty bit associated with each cached VMWRITE field. When a VMWRITE is requested by the deprivileged hypervisor in user mode for updating a cached VMWRITE field, the dirty bit is set. On the other hand, if a cached VMWRITE field is written in some other way (e.g., MOV) instead of VMWRITE, the dirty bit is not set and the content is not flushed to the hardware. To avoid potential attacks that overwrite a dirty cached VMWRITE field, we also store the hash value of the legitimate VMWRITE'd value in a separate array. Therefore, we can invalidate the illegal cache fields while performing synchronization.
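To make the caching scheme concrete, the following Python sketch models it under several assumptions that are not taken from the prototype: the classes HypeLetStub and VmcsCache are hypothetical, only small illustrative subsets of the fields in Table 5.2 are cached, SHA-256 stands in for the unspecified hash function, and the validation is performed in the cache's own sync method simply to keep the model short. Cached reads are served locally, cached writes set a dirty bit and record a hash, and one synchronization call flushes the legitimate dirty fields and refreshes the read cache.

```python
import hashlib

# Illustrative subsets of the fields listed in Table 5.2.
CACHED_READ = {"VM_EXIT_REASON", "GUEST_RIP", "GUEST_RSP"}
CACHED_WRITE = {"GUEST_RIP", "GUEST_RSP", "GUEST_CR3"}


def digest(value: int) -> bytes:
    """Hash of a field value (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(value.to_bytes(8, "little")).digest()


class HypeLetStub:
    """Stand-in for the privileged side; every method models one system call."""

    def __init__(self, vmcs):
        self.vmcs = dict(vmcs)
        self.syscalls = 0

    def vmread(self, field):
        self.syscalls += 1
        return self.vmcs.get(field, 0)

    def vmwrite(self, field, value):
        self.syscalls += 1
        self.vmcs[field] = value

    def sync(self, writes):
        """Apply a batch of writes and return fresh values of the cached reads."""
        self.syscalls += 1
        self.vmcs.update(writes)
        return {f: self.vmcs[f] for f in CACHED_READ if f in self.vmcs}


class VmcsCache:
    """User-mode copy of selected VMCS fields."""

    def __init__(self, hypelet):
        self.hypelet = hypelet
        self.values = {}
        self.dirty = set()    # dirty bits of cached VMWRITE fields
        self.hashes = {}      # separate array of hashes of legitimate writes

    def vmread(self, field):
        if field in CACHED_READ and field in self.values:
            return self.values[field]          # no system call
        return self.hypelet.vmread(field)

    def vmwrite(self, field, value):
        if field not in CACHED_WRITE:
            self.hypelet.vmwrite(field, value)
            return
        self.values[field] = value
        self.hashes[field] = digest(value)
        self.dirty.add(field)

    def sync(self):
        """Flush legitimate dirty fields in one call and refresh the read cache."""
        batch = {}
        for field in self.dirty:
            value = self.values[field]
            if digest(value) == self.hashes[field]:
                batch[field] = value
            else:
                del self.values[field]          # invalidate the illegal field
        self.dirty.clear()
        self.values.update(self.hypelet.sync(batch))


hypelet = HypeLetStub({"VM_EXIT_REASON": 48, "GUEST_RIP": 0x1000, "GUEST_RSP": 0x8000})
cache = VmcsCache(hypelet)
cache.sync()                               # one call loads the read cache
reason = cache.vmread("VM_EXIT_REASON")    # served from the cache
cache.vmwrite("GUEST_RIP", 0x1003)         # cached, dirty bit set
cache.values["GUEST_RSP"] = 0xDEAD         # plain store: dirty bit stays clear
cache.vmwrite("GUEST_CR3", 0x5000)
cache.values["GUEST_CR3"] = 0x6000         # tampering with a dirty field
cache.sync()                               # second and last system call
```

In this run only GUEST_RIP reaches the stub's VMCS: the plain store to GUEST_RSP is never flushed because its dirty bit is clear, and the tampered GUEST_CR3 is dropped because its value no longer matches the recorded hash.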

4 There are five VMCS fields common to the cached VMREAD and VMWRITE fields.


5.3.4 Lessons Learned

In this subsection, we share additional experiences and frustrations from implementing the prototype. The first one is about missing interrupt events in our earlier, unsuccessful prototype. In particular, QEMU issues the KVM_RUN ioctl command to enter guest mode. If no event has occurred (e.g., a pending interrupt), the main thread keeps doing VM entry and VM exit in a loop. When QEMU is about to inject an interrupt into the guest, it signals the main thread (by pthread_kill) so that the main thread knows that it needs to exit the loop and return to QEMU for interrupt handling (by checking for a pending signal after each VM exit). The current KVM code base sets the signal mask to ensure that specific signals can be delivered only while the main thread is in its KVM_RUN session in the kernel. More specifically, the kernel API sigprocmask is used at the entry point of the KVM_RUN ioctl to allow only SIG_IPI and SIGBUS to be delivered. Before returning to QEMU, KVM restores the signal mask so that those signals are not delivered while the main thread is running in user space. Our earlier prototype intercepts the KVM_RUN ioctl from the deprivileged hypervisor and handles it in user mode, with real ioctls issued for privileged instructions. If the signal mask is set as in the original KVM, SIG_IPI and SIGBUS are delivered even while KVM_RUN is being handled in user mode. Therefore, after each VM exit, the signal pending condition is not accurate, since some signals have already been delivered in user mode. This is why our earlier prototype missed interrupt events and failed to maintain accurate system time. To solve this problem, we shrink the allowed signal delivery window to the ioctl handler of each VMLAUNCH/VMRESUME instruction. Since KVM checks the signal pending condition after VM exit, sending signals does not affect QEMU; the signals simply stay pending until the next VM entry. This mechanism gives our system an interrupt injection frequency similar to that of the original KVM architecture.

Another implementation detail is related to a privileged instruction, VMPTRLD. This instruction is used to load the guest states before switching to guest mode when the hypervisor is handling the KVM_RUN request. In many cases, especially when the guest is running a CPU-intensive workload, a VMPTRLD could be followed by multiple runs of (VMRESUME, VMEXIT). The reason is that those VM exits do not need to be handled in QEMU. Instead, the hypervisor handles the VM exit and continues the guest's execution with another VMRESUME. However, in some extreme cases, such as running an I/O-intensive workload in the guest, most VM exits need to be handled in QEMU (e.g., I/O instructions). Since VMPTRLD and VMRESUME are executed as separate system calls in our system, handling a single KVM_RUN request requires at least one more system call than in the original KVM. If the time spent in guest mode is extremely short (e.g., the guest is frequently interrupted by I/O accesses), the extra system call latency could introduce significant overhead. Noticing that the guest states are only used in guest mode, we postpone the VMPTRLD instruction until the first VMRESUME instruction to eliminate the extra system call.
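The effect of postponing VMPTRLD can be illustrated by counting system calls in a small model. The sketch below is hypothetical: HypeLetModel, kvm_run_eager, and kvm_run_deferred are illustrative names, and each method call stands for one system call into HypeLet. It assumes that the deferred VMPTRLD is carried out inside the handler of the first VMRESUME of a KVM_RUN request, as the text describes.

```python
class HypeLetModel:
    """Counts system calls; each method models one privileged service request."""

    def __init__(self):
        self.syscalls = 0
        self.loaded_vmcs = None

    def vmptrld(self, vmcs):
        self.syscalls += 1
        self.loaded_vmcs = vmcs

    def vmresume(self, vmcs_to_load=None):
        """One system call; performs a postponed VMPTRLD first if requested."""
        self.syscalls += 1
        if vmcs_to_load is not None:
            self.loaded_vmcs = vmcs_to_load
        if self.loaded_vmcs is None:
            raise RuntimeError("VMRESUME without a loaded VMCS")


def kvm_run_eager(hypelet, vmcs, guest_runs):
    """VMPTRLD and VMRESUME issued as separate system calls."""
    hypelet.vmptrld(vmcs)
    for _ in range(guest_runs):
        hypelet.vmresume()


def kvm_run_deferred(hypelet, vmcs, guest_runs):
    """VMPTRLD postponed until the first VMRESUME of this KVM_RUN request."""
    pending = vmcs
    for _ in range(guest_runs):
        hypelet.vmresume(vmcs_to_load=pending)
        pending = None


# I/O-intensive case: every VM exit returns to QEMU, so one guest run per KVM_RUN.
eager_io, deferred_io = HypeLetModel(), HypeLetModel()
kvm_run_eager(eager_io, "vmcs0", guest_runs=1)
kvm_run_deferred(deferred_io, "vmcs0", guest_runs=1)

# CPU-intensive case: many (VMRESUME, VMEXIT) rounds per KVM_RUN.
eager_cpu, deferred_cpu = HypeLetModel(), HypeLetModel()
kvm_run_eager(eager_cpu, "vmcs0", guest_runs=100)
kvm_run_deferred(deferred_cpu, "vmcs0", guest_runs=100)
```

In the model, the I/O-intensive case drops from two system calls to one per KVM_RUN request, while the CPU-intensive case changes only from 101 to 100, which matches the observation that the extra call matters most when guest-mode runs are short.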

5.4 Evaluation

In this section, we evaluate our system by first analyzing security and other related benefits from DeHype and then measuring the performance overhead of our prototype with several standard benchmarks.

5.4.1 Security Benefits

Reducing the attack surface. In this work, we assume hosted hypervisors, either before or after being deprivileged, contain software vulnerabilities that might be exploited by attackers. Accordingly, the traditional “VM escape” attack from a compromised or malicious VM to the hypervisor can still happen in our system. Fortunately, thanks to the deprivileged execution, the potential damage from such attacks is mostly limited to the hypervisor itself (including the QEMU process). In particular, with DeHype, all the interactions between the hypervisor and the guest VM occur in user space. The host OS kernel is not directly accessible to a compromised hypervisor; it must be accessed through the system call interface exported by HypeLet, which is the only privileged component added by the current hypervisor code base. In our prototype, HypeLet contains 2.3K SLOC and defines 10 system calls in total. To further restrict access to these system calls, our system adopts the known system call interposition technique (Section 5.2.1) to mediate their access and to block the default system call interface of the host OS kernel (which has more than 300 system calls in recent 3.2 Linux kernels). As a result, our system effectively reduces the previously exposed wide attack surface to these 10 system calls. Moreover, the TCB added by KVM is reduced from 33.6K to 2.3K SLOC, a ∼93.2% reduction. It is worth mentioning that in DeHype, each guest is paired with its own deprivileged hypervisor. The hypervisor keeps the guest's states in pre-allocated memory pages mapped exclusively in its address space. Therefore, it can only access its own guest; other guests are strictly isolated in other processes and not accessible by default. This is an additional benefit of DeHype: other, unrelated guest VMs are protected from a compromised hypervisor.
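As a quick check of the TCB figure quoted above, the reduction can be recomputed from the two code sizes. The variable names are illustrative, and the inputs are the rounded SLOC counts from the text, so the result is only as precise as those counts.

```python
kvm_added_tcb_ksloc = 33.6   # TCB added by the original KVM, in thousands of SLOC
hypelet_ksloc = 2.3          # TCB added by HypeLet, in thousands of SLOC

# Fraction of the added TCB that no longer runs in privileged mode.
tcb_reduction = (kvm_added_tcb_ksloc - hypelet_ksloc) / kvm_added_tcb_ksloc
tcb_reduction_percent = round(tcb_reduction * 100, 1)
print(tcb_reduction_percent)
```

The computed value agrees with the approximately 93.2% reduction stated in the text.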
Testing real-world vulnerabilities. To illustrate DeHype's security benefits, we explain how real-world vulnerabilities from NVD [Nvd] could be mitigated by our system. In the following, we elaborate on three of them. The first one we examine is CVE-2009-4031, a vulnerability caused by interpreting wrong-size instructions (with too many bytes) in KVM's guest (x86) instruction emulation. This vulnerability can be exploited by the guest to launch a denial-of-service attack against the host OS kernel. Since DeHype performs instruction emulation in user space, its exploitation, even if successful, is strictly confined within a user-space process. Thus our system effectively mitigates such an attack. The second vulnerability we examine is CVE-2010-0435, which allows the guest kernel to cause a NULL pointer dereference in KVM because some function pointers in its Intel VT support are uninitialized (see footnote 5). Because KVM originally runs in privileged mode, this vulnerability can be exploited to crash the host OS. In DeHype, the vulnerability could still be exploited to crash the hypervisor. However, only the hypervisor that is paired with the malicious guest is affected. With the isolation provided by the process boundary, other hypervisors and the host OS remain unaffected. This test case is a good example of the difference from other related work [Erl06; Mao11; Wan12], which leverages software fault isolation techniques to confine memory corruption bugs. Specifically, DeHype obtains its isolation from hardware (i.e., page tables) rather than from more complex software-based fault isolation techniques. The third vulnerability is CVE-2010-3881, a vulnerability in KVM that leaks kernel data to user space when certain data structures are copied to user land without clearing their padding. A QEMU process could potentially obtain sensitive information from the kernel stack. In DeHype, such a “system call” is intercepted and handled in user space as a function call. Therefore, the leaked information can only come from the stack of the hypervisor paired with that QEMU process, not from the kernel or other guest VMs.

5.4.2 Other Benefits

By moving the hypervisor to user space, DeHype also enables some unique benefits and opportunities. In this section, we elaborate on two of them.

Facilitating hypervisor development. In DeHype, the hypervisor is deprivileged to user space. This makes it possible to develop and debug the hypervisor with tools such as GDB that are convenient and familiar to most programmers. For example, when developing our prototype, we used GDB to debug its pseudo NPT component (Figure 5.3, Section 5.3.2), which is one of the most complicated components in the system. In Figure 5.4, we show one debug session with GDB. In this session, we set a breakpoint at the tdp_page_fault function, the NPT fault handler in KVM. When the KVM execution hits the breakpoint, we can display the stack trace with the where command, or use the step/stepi commands to single-step the code and examine changes in machine registers and memory contents (e.g., the pseudo NPT table) after each step. We can also use the continue command to resume the execution until the next NPT fault to monitor how the pseudo NPT table is built up. During our development, we also used Valgrind [Val], a dynamic instrumentation tool, to detect memory leaks in our prototype (Figure 5.5).

To understand the distribution of modifications in new KVM releases that may be related to DeHype, we manually examined three recent releases of KVM (2.6.32, 2.6.33, and 2.6.34) and attributed each change to either HypeLet or the deprivileged hypervisor. Specifically, we reviewed changes in the arch/x86/kvm and virt/kvm directories of the Linux kernel, which contain the main body of KVM. According to our examination, 71.7% of the changes in KVM-2.6.33 (vs. KVM-2.6.32) and 60.9% of the changes in KVM-2.6.34 (vs. KVM-2.6.33) can be confined to user space. With DeHype, their development can benefit significantly from the abundant user-space debugging tools. While the results still show that 28.3% of the changes in KVM-2.6.33 (or 39.1% in KVM-2.6.34) may impact HypeLet, this is largely because current KVM development freely uses the large body of host OS kernel APIs without restriction. Once the interface between the deprivileged KVM and HypeLet is defined, we found that these changes can be dramatically reduced in HypeLet.

Figure 5.4 A GDB session that debugs KVM code with the environment familiar to most programmers.

5 Note that these function pointers are part of the internal data structure of KVM. The guest kernel may trigger a NULL pointer dereference by tricking KVM into emulating some specific instructions, rather than by crafting the pointers for other purposes (e.g., running shellcode to access privileged KVM system call interfaces).


Figure 5.5 A Valgrind session that checks possible KVM memory leaks.

Running multiple hypervisors. DeHype also naturally allows multiple mutually isolated hypervisors to run concurrently on the same host, each possibly with different security features (e.g., in different versions). To illustrate this, we executed two deprivileged KVM hypervisors on our test machine: one has the secure NPT updating feature enabled, while the other has the feature disabled. A guest is then created for each hypervisor. With both hypervisors sharing the same HypeLet, we successfully check all NPT updates issued by the guest running on the hypervisor with the feature turned on, while the updates of the other guest are handled by the hypervisor itself. This unique capability of DeHype can potentially be leveraged in several different settings. For example, we can apply certain security services such as virtual machine introspection [Sha09; Jia07] to monitor the execution of some guests in a host, while running other guests with the normal hypervisor. Moreover, when a new vulnerability is reported and fixed in the deprivileged hypervisor, we can live-migrate all the guests in a host one by one to the patched hypervisor at runtime. Under the original KVM, we need to migrate all the guests to another machine, patch the hypervisor, and migrate them back again.


Table 5.3 Software Packages used in Our Evaluation

Software Package        Version         Configuration

Benchmarks
    SPEC CPU2006        v1.0.1          reportable int
    Bonnie++            1.03e           bonnie++ -f -n 256
    linux kernel        2.6.39.2        make defconfig

Host/Guest Installation
    Ubuntu Desktop      11.10           default
    Ubuntu Server       10.04.2 LTS     default

5.4.3 Performance

To evaluate the performance overhead introduced by DeHype, we install a number of standard benchmark programs such as SPEC CPU2006 [Spe] and Bonnie++ (a file system benchmark) [Bon]. In addition, we use two application benchmarks that decompress and compile the Linux kernel. We measured the elapsed time in the guest with the time command. Our test platform is a Dell OptiPlex™ 980 desktop with a 2.80 GHz Intel Core™ i7 860 CPU and 3 GB of memory. The host runs a default installation of Ubuntu 11.10 desktop with the 2.6.32.31 Linux kernel. The guest runs Ubuntu 10.04.2 LTS server. Table 5.3 summarizes the software packages and configurations in our experiments. Figure 5.6 shows the relative performance of running the benchmarks. The first 12 groups of bars present the relative performance of DeHype running the integer benchmarks of SPEC CPU2006 compared with the vanilla KVM, while the last three groups present decompressing the Linux kernel (untar_kernel), compiling the Linux kernel (make_kernel), and the sequential output performance of Bonnie++. In each group, there are three different DeHype configurations. The DeHype bar denotes the vanilla DeHype system; DeHype-C reports the optimization benefits from cache-based batch processing of certain VMCS fields (e.g., VMREAD/VMWRITE); and DeHype-CN indicates the additional overhead of performing secure NPT updates (Section 5.3.2). As shown in the figure, the overall overhead introduced by DeHype is less than 6%. This overhead is inevitable since DeHype by design invokes more system calls than the original KVM.

5.5 Discussion

[Figure 5.6 (bar chart). Y-axis: relative performance, 90% to 100%. Legend: DeHype, DeHype-C, DeHype-CN. X-axis groups: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, untar_kernel, make_kernel, bonnie++-write.]

Figure 5.6 Relative Performance of DeHype

In this section, we re-examine our system design and implementation for possible improvements and explore new opportunities enabled by our approach. First, we assume an adversary model where attackers try to compromise the hypervisor from a guest VM. The privileged HypeLet and its host OS kernel are a part of the TCB. Although the total TCB (with the host OS kernel) may not be greatly reduced, our system still provides strong protection against malicious or compromised guests by securely confining the hypervisor in user space. This is particularly true in the cloud environment, where the highly constrained HypeLet is the main attack surface exposed to a guest VM. To improve the security level of our system, our prototype performs the necessary sanity checks on the 10 new system calls introduced by HypeLet to prevent bugs inside the user-level hypervisor from affecting HypeLet (e.g., including explicit checks for NPT updates, Section 5.2.2). Second, our current prototype is still limited in that it pins the guest memory. This limitation can be readily addressed by integrating the Linux MMU notifier [Cor]. Specifically, HypeLet registers a set of callback functions with the kernel's MMU notifier interface, which will notify HypeLet when important memory management events are about to happen. For example, when a “memory swapped out” event takes place, HypeLet will be notified and will further reflect the event to the user-level hypervisor. The user-level hypervisor can decide whether to prevent the page swapping (by marking the page as recently accessed in the age_page MMU notifier) or allow it, according to whether the page is currently in use or not. Other events can be handled similarly. By integrating the MMU notifier, we can avoid pinning the guest memory. Meanwhile, the performance of DeHype might be slightly degraded by the overhead of managing these events. Third, our current prototype does not support all full-fledged KVM features. Notable ones are SMP and para-virtualized I/O (e.g., virtio [Rus08]). To retrofit our prototype with their support, it is necessary to make a few adjustments that mainly involve additional engineering effort.
Specifically, to support SMP, HypeLet needs to be aware of the presence of multiple virtual CPUs in a guest so that it can schedule VCPUs to physical CPUs, and provide a mechanism (e.g., inter-processor interrupt [Int]) for VCPUs to interrupt and synchronize with one another. The SMP support in the original KVM can be leveraged for this purpose, which likely makes the implementation straightforward. To support para-virtualized I/O, we only need to migrate the virtio [Rus08] virtual device in the original KVM from kernel space to user space. This will likely reduce the performance benefit of virtio because kernel functions used by virtio are not directly accessible and must be replaced by system calls. Still, para-virtualized I/O will perform better than emulated I/O (e.g., the virtual Intel e1000 PCI network card in KVM) because it does not involve expensive emulation of I/O memory and I/O registers (see footnote 6). From another perspective, the deprivileged hypervisor architecture demonstrated in DeHype also introduces some unique capabilities and new opportunities. In particular, as the DeHyped KVM runs as a normal user-mode process, the system can be developed and debugged with the help of many existing tools that are familiar to most programmers. For example, we used GDB [Gdb] to debug our prototype by setting breakpoints, inspecting variables, and single-stepping the code. We also used the dynamic instrumentation tool Valgrind [Val] to detect possible memory leaks in KVM. This is a significant improvement over kernel-level debugging, in which irregular control flow (e.g., interrupts, task switching, and asynchronous events) makes debugging highly challenging. In addition, our architecture naturally makes it feasible to run different versions of the KVM hypervisor (as user-level processes) on the same machine. This capability could be useful in several scenarios, for example, to balance performance and security: for virtual machines requiring a higher level of security guarantee, we can use an instrumented hypervisor with dynamic information flow tracking [NS05] to detect attacks against the hypervisor.
At the same time, we can use a normal hypervisor to manage other virtual machines for better performance. By enabling the suspend/resume support of KVM, virtual machines could be live-migrated between these hypervisors, making the performance-security trade-off dynamically configurable. We leave this as future work. Moreover, our architecture can facilitate the design and implementation of a variety of virtualization-based security services (e.g., virtual machine introspection [Jia07]). Some of these services might require modifications to the hypervisor code, which leads to concerns about an increased TCB and new vulnerabilities. In DeHype, such changes will most likely be limited to the unprivileged user-mode hypervisor code. Vulnerabilities will still be confined within the process and mediated with the traditional system call interposition approaches (Section 5.2.1).

6 We also point out that, given the wide adoption of hardware virtualization and for obvious performance reasons [Kvm], we built our prototype on hardware-assisted memory virtualization (i.e., NPT) instead of shadowing-based memory virtualization (i.e., SPT). However, we do not envision any technical challenges in supporting software-based memory shadowing in our prototype.


5.6 Summary

We have presented the design, implementation and evaluation of DeHype, a system to deprivilege hosted hypervisor execution to user mode. Specifically, by decoupling the hypervisor code from the host OS and deprivileging most of its execution, our system not only substantially reduces the attack surface for exploitation, but also brings additional benefits in allowing for better development and debugging as well as concurrent execution of multiple hypervisors in the same physical machine. We have implemented a DeHype prototype for the open source KVM hypervisor. The evaluation results show that our system successfully deprivileged 93.2% of the loadable KVM module code base to user mode while only adding a small TCB (2.3K SLOC) to the host OS kernel. Additional experiments with a number of benchmark programs further demonstrate its practicality and efficiency.

CHAPTER 6

CONCLUSION AND FUTURE WORK

In this dissertation, we have presented a series of virtualization-based approaches for securing hypervisors, critical I/O access, and apps, respectively. Based on lightweight OS-level virtualization, AirBag enables users to confine apps in an isolated runtime environment in order to “test” or “profile” the untrusted apps. With the help of tHype, the critical private data would not go through the vulnerable Android OS, which provides the user a secure runtime environment for running security sensitive apps. By deprivileging the execution, the exported attack surface of a hosted hypervisor is substantially reduced by DeHype, which also enables concurrent execution of multiple hosted hypervisors that prevents a compromised hypervisor from attacking guest VMs hosted by other hypervisors. Based on the insights gained from the three pieces of system security research, we propose the following potential research directions for future work:

• Optimizing Userspace VMM. As described in Chapter 5, we address the attack surface problem of hosted hypervisors and reduce it with DeHype. However, in practice, there is always a trade-off between performance and security. As proposed by Honig [Hon], only a very small performance impact (i.e., < 0.1%) is expected in industry, alongside a target TCB reduction ratio of > 50%. This means we need to re-analyze the hosted hypervisor, identify the performance-critical parts, and keep them in kernel space with efficient interfaces. Specifically, the TCB size is no longer the only target here. Instead, we need to profile each software module in the hosted hypervisor and carefully choose which modules should stay at the privileged level. This results in a bigger privileged part of the hypervisor compared to the HypeLet OS extension (e.g., ∼50% of the original hypervisor). Further, we can use kernel-level isolation approaches such as HyperLock [Wan12] to improve security. Such a hybrid architecture could be a solution to the performance issues.

• Improving OS-Level Virtualization. As shown in our AirBag implementation (Chapter 3), the biggest problem is that we need to port the kernel patch to each mobile device to enable AirBag support. However, the logic of the kernel patches on all supported devices is actually quite similar. If the OS-level virtualization support (e.g., cgroup) could be extended with a framework in which all kinds of drivers can register, AirBag or similar systems could be deployed more easily. Based on our experience of building AirBag and tHype, we think patching the framebuffer/LCD/GPU-related code is the most complicated part. Although there are existing abstraction layers such as FBdev [Fbd], the hardware-specific logic is not abstracted perfectly, so we need to touch the vendor-customized code. To this end, we are seeking a better abstraction layer design that abstracts all kinds of graphics hardware so that we can enable the context-aware virtualization in a portable way.

• Boosting Smartphone Resistance to Rooting Because of the fragmentation of the Android ecosystem, the code quality of the Android OS cannot be ensured to the degree it can for iOS and Windows. One reason is that phone vendors may not be able to push updates to end users in time when critical vulnerabilities are found and fixed. Another is that vendor-customized drivers may not be reviewed as carefully as patches merged by the community. As a result, we need to assume that the Android OS will eventually be compromised, so security mechanisms for protecting data or detecting malicious “rooting” behaviors on the untrusted OS become very important. Further, such a security mechanism needs to be implemented at a more privileged level, such as hypervisor mode or ARM TrustZone mode, to prevent it from being subverted. Although related approaches such as ensuring OS integrity have been studied for decades, the open problem is how to put this research into practice, as the OS running inside the phone in everyone’s pocket is about to be compromised.
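The module-selection step described under "Optimizing Userspace VMM" above can be illustrated with a minimal sketch. It is not part of DeHype. It assumes that each module's code size and its slowdown when deprivileged have already been profiled, and that slowdowns add up across modules, which is a simplification. All module names and numbers are hypothetical, and the greedy rule is only a heuristic, not an optimal partitioning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str          # hypothetical module name, not taken from a real hypervisor
    loc: int           # lines of code in the module
    overhead_ppm: int  # assumed slowdown (parts per million) if run in user mode


def partition(modules, budget_ppm):
    """Split modules into (privileged, deprivileged) lists.

    Every module starts deprivileged. Modules are then moved back into the
    kernel, highest overhead per line of code first, until the summed
    overhead of the remaining user-mode modules fits within the budget.
    Overheads are assumed to be additive (a simplification).
    """
    deprivileged = sorted(
        modules, key=lambda m: m.overhead_ppm / m.loc, reverse=True
    )
    privileged = []
    while deprivileged and sum(m.overhead_ppm for m in deprivileged) > budget_ppm:
        privileged.append(deprivileged.pop(0))
    return privileged, deprivileged


def tcb_reduction(privileged, modules):
    """Fraction of the original code that no longer runs privileged."""
    total = sum(m.loc for m in modules)
    return 1.0 - sum(m.loc for m in privileged) / total


# Hypothetical profile data, for illustration only.
EXAMPLE_MODULES = [
    Module("vm_entry_exit", loc=2000, overhead_ppm=40000),
    Module("mmu_emulation", loc=8000, overhead_ppm=25000),
    Module("instruction_emulator", loc=10000, overhead_ppm=600),
    Module("irq_routing", loc=5000, overhead_ppm=200),
    Module("misc", loc=5000, overhead_ppm=50),
]
BUDGET_PPM = 1000  # the < 0.1% performance target, expressed in ppm
```

Calling partition(EXAMPLE_MODULES, BUDGET_PPM) on this made-up profile keeps only the first two modules privileged, and tcb_reduction can then be compared against the > 50% target. Real profiles would need measured numbers and a model of how overheads interact.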

BIBLIOGRAPHY

[Dro] 260,000 Android users infected with malware. http://www.infosecurity-magazine.com/view/16526/260000-android-users-infected-with-malware/.

[Als] Advanced Linux Sound Architecture (ALSA) project homepage. http://www.alsa-project.org/main/index.php/Main_Page.

[Amd] AMD64 Architecture Programmer’s Manual Volume 2: System Programming. Advanced Micro Devices. 2007.

[Pre] Android 4.2 potential security features unveiled: SELinux, VPN Lockdown and Premium SMS Confirmation. http://www.androidauthority.com/android-4-2-potential-security-features-unveiled-selinux-vpn-lockdown-premium-sms-confirmation-123785/.

[And] Android Malware Genome Project. http://www.malgenomeproject.org/.

[Gol] Android.Golddream Symantec. http://www.symantec.com/security_response/writeup.jsp?docid=2011-070608-4139-99.

[And11] Andrus, J. et al. “Cells: A Virtual Mobile Smartphone Architecture”. Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. 2011.

[Ant] AnTuTu Benchmark. http://www.antutulabs.com.

[App] App Store. http://www.apple.com/iphone/from-the-app-store/.

[Tru] ARM TrustZone. http://www.arm.com/products/processors/technologies/trustzone/index.php.

[Arm10] Armbrust, M. et al. “A View of Cloud Computing”. Commun. ACM 53.4 (2010).

[Bar10] Barr, K. et al. “The VMware mobile virtualization platform: is that a hypervisor in your pocket?” SIGOPS Oper. Syst. Rev. 44.4 (2010).

[Bel05] Bellard, F. “QEMU, a Fast and Portable Dynamic Translator”. USENIX Annual Technical Conference, FREENIX Track. 2005.

[BY10] Ben-Yehuda, M. et al. “The Turtles Project: Design and Implementation of Nested Virtualization”. Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation. Vancouver, BC, Canada, 2010.

[Ber11] Beresford, A. R. et al. “MockDroid: Trading Privacy for Application Functionality on Smartphones”. Proceedings of the 12th International Workshop on Mobile Computing System and Applications. 2011.

[Bon] Bonnie++. http://www.coker.com.au/bonnie++/.

[Bon94] Bonwick, J. “The Slab Allocator: An Object-Caching Kernel Memory Allocator”. Proceedings of the USENIX Summer 1994 Technical Conference - Volume 1. 1994.

[BWZ10] Boyd-Wickizer, S. & Zeldovich, N. “Tolerating Malicious Device Drivers in Linux”. Proceedings of the 2010 USENIX Annual Technical Conference. 2010.

[Bro] BrowserMark. http://browsermark.rightware.com.

[Cpu] Bug 714271. https://bugzilla.redhat.com/show_bug.cgi?id=714271.

[Bug11] Bugiel, S. et al. “Practical and Lightweight Domain Isolation on Android”. Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices. 2011.

[Cgr] CGROUPS. http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt.

[Clo] Cloudburst: A VMware Guest to Host Escape Story. http://www.blackhat.com/presentations/bh-usa-09/KORTCHINSKY/BHUSA09-Kortchinsky-Cloudburst-SLIDES.pdf.

[Cod] CodeAurora Security Advisories. https://www.codeaurora.org/projects/security-advisories.

[Col11] Colp, P. et al. “Breaking Up is Hard to Do: Security and Functionality in a Commodity Hypervisor”. Proceedings of the 23rd ACM Symposium on Operating Systems Principles. 2011.

[Com] comScore Reports December 2012 U.S. Smartphone Subscriber Market Share. http://www.comscore.com/Insights/Press_Releases/2013/2/comScore_Reports_December_2012_U.S._Smartphone_Subscriber_Market_Share.

[Cor] Corbet, J. Memory Management Notifiers. http://lwn.net/Articles/266320/.

[Cvea] CVE-2012-4220. http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2012-4220.

[Cveb] CVE-2012-4221. http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2012-4221.

[Cvec] CVE-2013-2094. http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2013-2094.

[Cved] CVE-2013-6123. http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2013-6123.

[Cvee] CVE-2013-6282. http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2013-6282.

[Cvef] CVE-2014-4322. http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-4322.

[Cveg] CVE-2014-7911. http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-7911.

[Dah11] Dahlin, M. et al. “Toward the Verification of a Simple Hypervisor”. 10th International Workshop on the ACL2 Theorem Prover and its Applications. 2011.

[DN10] Dall, C. & Nieh, J. “KVM for ARM”. Proceedings of the Ottawa Linux Symposium. 2010.

[Dav12] Davi, L. et al. “MoCFI: A Framework to Mitigate Control-Flow Attacks on Smartphones”. Proceedings of the 19th Annual Symposium on Network and Distributed System Security. 2012.

[Dik00] Dike, J. “A user-mode port of the Linux kernel”. Proceedings of the 4th annual Linux Showcase & Conference. Atlanta, Georgia, 2000.

[Din08] Dinaburg, A. et al. “Ether: Malware Analysis via Hardware Virtualization Extensions”. Proceedings of the 15th ACM Conference on Computer and Communications Security. 2008.

[DG11] Dolan-Gavitt, B. et al. “Virtuoso: Narrowing the Semantic Gap in Virtual Machine Intro- spection”. Proceedings of the 2011 IEEE Symposium on Security and Privacy. 2011.

[Ege11] Egele, M. et al. “PiOS: Detecting Privacy Leaks in iOS Applications”. Proceedings of the Network and Distributed System Security Symposium (NDSS). 2011.

[Enc09] Enck, W. et al. “On Lightweight Mobile Phone Application Certification”. CCS (2009).

[Enc10] Enck, W. et al. “TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones”. Proceedings of the 9th USENIX conference on Operating systems design and implementation. 2010.

[Enc11] Enck, W. et al. “A Study of Android Application Security”. Proceedings of the 20th USENIX conference on Security. San Francisco, CA, 2011.

[Erl06] Erlingsson, U. et al. “XFI: Software Guards for System Address Spaces”. Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation. Seattle, WA, 2006.

[Fbd] FBdev: Development of frame buffer drivers. http://linux-fbdev.sourceforge.net/.

[Fel11] Felt, A. P. et al. “Android Permissions Demystified”. Proceedings of the 18th ACM Conference on Computer and Communications Security. 2011.

[FL12] Fu, Y. & Lin, Z. “Space Traveling across VM: Automatically Bridging the Semantic Gap in Virtual Machine Introspection via Online Kernel Data Redirection”. Proceedings of the 2012 IEEE Symposium on Security and Privacy. 2012.

[Gan08] Ganapathy, V. et al. “The Design and Implementation of Microdrivers”. Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems. 2008.

[GR03] Garfinkel, T. & Rosenblum, M. “A Virtual Machine Introspection Based Architecture for Intrusion Detection”. Proceedings of the 10th Network and Distributed System Security Symposium. San Diego, CA, USA, 2003.

[Gar11] Gartner, Inc. Gartner Says Sales of Mobile Devices Grew 5.6 Percent in Third Quarter of 2011; Smartphone Sales Increased 42 Percent. http://www.gartner.com/newsroom/id/1848514. 2011.

[Gar14] Gartner, Inc. Gartner Says Annual Smartphone Sales Surpassed Sales of Feature Phones for the First Time in 2013. http://www.gartner.com/newsroom/id/2665715. 2014.

[Gdb] GDB: The GNU Project Debugger. http://www.gnu.org/s/gdb/.

[Gli] GLIBC. http://www.gnu.org/software/libc/.

[Goo] Google Play. http://play.google.com/.

[Gra12] Grace, M. et al. “Systematic Detection of Capability Leaks in Stock Android Smartphones”. NDSS (2012).

[Hor11] Hornyack, P. et al. “These Aren’t the Droids You’re Looking For: Retrofitting Android to Protect Data from Imperious Applications”. Proceedings of the 18th ACM Conference on Computer and Communications Security. 2011.

[Hwa08] Hwang, J.-Y. et al. “Xen on ARM: System Virtualization Using Xen Hypervisor for ARM-Based Secure Mobile Phones”. Proceedings of the 5th Consumer Communications and Networking Conference. 2008.

[Int] Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3: System Programming Guide. Intel. 2010.

[Jan14] Jang, Y. et al. “Gyrus: A Framework for User-Intent Monitoring of Text-Based Networked Applications”. The 21st Annual Network and Distributed System Security Symposium (NDSS). 2014.

[Jia07] Jiang, X. et al. “Stealthy Malware Detection Through VMM-based “Out-Of-the-Box” Semantic View Reconstruction”. Proceedings of the 14th ACM Conference on Computer and Communications Security. 2007.

[Kan12] Kantola, D. et al. “Reducing Attack Surfaces for Intra-Application Communication in Android”. Proceedings of the second ACM workshop on Security and privacy in smartphones and mobile devices. 2012.

[Kiv07] Kivity, A. et al. “kvm: the Linux Virtual Machine Monitor”. Proceedings of the 2007 Ottawa Linux Symposium. 2007.

[Kle09] Klein, G. et al. “seL4: Formal Verification of an OS Kernel”. Proceedings of the 22nd ACM Symposium on Operating Systems Principles. 2009.

[Kvm] KVM. http://www.linux-kvm.org/.

[Hon] KVM Security Improvements. http://www.linux-kvm.org/wiki/images/f/f6/01x02-KVMHardening.pdf.

[Lan11] Lange, M. et al. “L4Android: A Generic Operating System Framework for Secure Smartphones”. Proceedings of the 1st Workshop on Security and Privacy in Smartphones and Mobile Devices. 2011.

[Lau14] Lau, B. et al. “Mimesis Aegis: A Mimicry Privacy Shield–A System’s Approach to Data Privacy on Public Cloud”. Proceedings of the 23rd USENIX conference on Security Symposium. 2014.

[Li14] Li, W. et al. “Building Trusted Path on Untrusted Device Drivers for Mobile Devices” (2014).

[Lie91] Liedtke, J. et al. “Two Years of Experience with a µ-Kernel Based OS”. Operating Systems Review 25.2 (1991).

[Mai] linux/kernel/git/torvalds/linux.git. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git.

[LC14] Liu, D. & Cox, L. P. “VeriUI: Attested Login for Mobile Devices”. Proceedings of the 15th Workshop on Mobile Computing Systems and Applications. HotMobile ’14. 2014.

[Mao11] Mao, Y. et al. “Software Fault Isolation with API Integrity and Multi-principal Modules”. Proceedings of the 23rd ACM Symposium on Operating Systems Principles. 2011.

[Mar12] Martignoni, L. et al. “Cloud Terminal: Secure Access to Sensitive Applications from Untrusted Systems”. USENIX Annual Technical Conference. 2012.

[McC09] McCune, J. M. et al. “Safe Passage for Passwords and Other Sensitive Data”. Proceedings of the 16th Annual Network and Distributed System Security Symposium. 2009.

[Mur08] Murray, D. G. et al. “Improving Xen Security through Disaggregation”. Proceedings of the 4th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. 2008.

[Nad11] Nadji, Y. et al. “Automated Remote Repair for Mobile Malware”. Proceedings of the 27th Annual Computer Security Applications Conference. 2011.

[Nvd] National Vulnerabilities Database. http://nvd.nist.gov/.

[Nau10] Nauman, M. et al. “Apex: Extending Android Permission Model and Enforcement with User-Defined Runtime Constraints”. Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security. 2010.

[Nen] NenaMark2. http://nena.se/nenamark/view?version=2/.

[Neo] Neocore. https://play.google.com/store/apps/details?id=com.qualcomm.qx.neocore.

[Net] NetworkWorld. Red Hat’s KVM Virtualization Proves Itself in IBM’s Cloud. http://www.networkworld.com/community/blog/red-hats-kvm-virtualization-proves-itself-ibm.

[NS05] Newsome, J. & Song, D. “Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software”. Proceedings of the 12th Annual Network and Distributed System Security Symposium. 2005.

[Ngu12] Nguyen, A. et al. “Delusional Boot: Securing Cloud Hypervisors without Massive Re-engineering”. Proceedings of the 7th ACM SIGOPS/EuroSys European Conference on Computer Systems. 2012.

[Okl] OKL4 Microvisor. http://www.ok-labs.com/products/okl4-microvisor.

[Ong09] Ongtang, M. et al. “Semantically Rich Application-Centric Security in Android”. Proceedings of the 2009 Annual Computer Security Applications Conference. 2009.

[Pet09] Peter, M. et al. “Virtual Machines Jailed: Virtualization in Systems with Small Trusted Computing Bases”. Proceedings of the 1st EuroSys Workshop on Virtualization Technology for Dependable Systems. 2009.

[Wak] PM: Implement autosleep and "wake locks", take 3. http://lwn.net/Articles/493924/.

[Qui06] Quigley, D. P. et al. “UnionFS: User- and Community-oriented Development of a Unification Filesystem”. Proceedings of the 2006 Linux Symposium. 2006.

[Rus08] Russell, R. “Virtio: Towards a De-facto Standard for Virtual I/O Devices”. ACM SIGOPS Operating Systems Review 42.5 (2008).

[Kno] Samsung KNOX. https://www.samsungknox.com/en.

[Dkf] Security Alert: New Android Malware – DKFBootKit – Moves Towards The First Android BootKit. http://www.csc.ncsu.edu/faculty/jiang/DKFBootKit/.

[Hip] Security Alert: New Android Malware – HippoSMS – Found in Alternative Android Markets. http://www.csc.ncsu.edu/faculty/jiang/HippoSMS/.

[Sha09] Sharif, M. et al. “Secure In-VM Monitoring Using Hardware Virtualization”. Proceedings of the 16th ACM Conference on Computer and Communications Security. 2009.

[SC13] Smalley, S. & Craig, R. “Security Enhanced (SE) Android: Bringing Flexible MAC to Android”. NDSS (2013).

[Can] Smart phones overtake client PCs in 2011. http://www.canalys.com/newsroom/smart-phones-overtake-client-pcs-2011.

[Spe] SPEC CPU2006. http://www.spec.org/cpu2006/.

[SG11] Srivastava, A. & Giffin, J. “Efficient Monitoring of Untrusted Kernel-Mode Execution”. Proceedings of the 18th Annual Network and Distributed System Security Symposium. 2011.

[SK10] Steinberg, U. & Kauer, B. “NOVA: a Microhypervisor-based Secure Virtualization Architecture”. Proceedings of the 5th European Conference on Computer systems. 2010.

[Sun] SunSpider JavaScript Benchmark. http://www.webkit.org/perf/sunspider/sunspider.html.

[Swi03] Swift, M. M. et al. “Improving the Reliability of Commodity Operating Systems”. Proceedings of the 19th ACM Symposium on Operating Systems Principles. 2003.

[Sys] SystemTap. http://sourceware.org/systemtap/.

[Sze11] Szefer, J. et al. “Eliminating the Hypervisor Attack Surface for a More Secure Cloud”. Proceedings of the 18th ACM Conference on Computer and Communications Security. 2011.

[Tan12] Tang, Y. et al. “CleanOS: Limiting Mobile Data Exposure with Idle Eviction”. Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation. 2012.

[Val] Valgrind. http://valgrind.org.

[VH11] Varanasi, P. & Heiser, G. “Hardware-Supported Virtualization on ARM”. 2nd Asia-Pacific Workshop on Systems (APSys’11). 2011.

[Vas12] Vasudevan, A. et al. “Trustworthy Execution on Mobile Devices: What security properties can my mobile platform give me?” Proceedings of the 5th international conference on Trust and Trustworthy Computing. 2012.

[Vet] Virtual ethernet device (tunnel). http://lwn.net/Articles/232688/.

[Vir] Virtunoid: Breaking out of KVM. http://nelhage.com/talks/kvm-defcon-2011.pdf.

[WJ10] Wang, Z. & Jiang, X. “HyperSafe: A Lightweight Approach to Provide Lifetime Hypervisor Control-Flow Integrity”. Proceedings of the 31st IEEE Symposium on Security and Privacy. 2010.

[Wan12] Wang, Z. et al. “Isolating Commodity Hosted Hypervisors with HyperLock”. Proceedings of the 7th ACM SIGOPS/EuroSys European Conference on Computer Systems. 2012.

[Wil08] Williams, D. et al. “Device Driver Safety through a Reference Validation Mechanism”. Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation. 2008.

[Xio11] Xiong, X. et al. “Practical Protection of Kernel Integrity for Commodity OS from Untrusted Extensions”. Proceedings of the 18th Annual Network and Distributed System Security Symposium. 2011.

[Xu12] Xu, R. et al. “Aurasium: Practical Policy Enforcement for Android Applications”. Proceedings of the 21st USENIX conference on Security symposium. Bellevue, WA, 2012.

[YY12] Yan, L.-K. & Yin, H. “DroidScope: Seamlessly Reconstructing OS and Semantic Views for Dynamic Android Malware Analysis”. Proceedings of the 21st USENIX Security Symposium. 2012.

[Yan13] Yan, Q. et al. “Designing Leakage-resilient Password Entry on Touchscreen Mobile Devices”. Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security. 2013.

[YH10] Yang, J. & Hawblitzel, C. “Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System”. Proceedings of the 2010 ACM SIGPLAN conference on Programming Language Design and Implementation. 2010.

[Zho14] Zhou, X. et al. “The Peril of Fragmentation: Security Hazards in Android Device Driver Customizations”. IEEE Symposium on Security and Privacy (2014).

[Zho12] Zhou, Z. et al. “Building Verifiable Trusted Path on Commodity x86 Computers”. Proceedings of the IEEE Symposium on Security and Privacy. 2012.
