Rethinking Design: Asymmetric Multiprocessing for Security and Performance

Scott Brookes Stephen Taylor Thayer School of Engineering at Dartmouth Thayer School of Engineering at Dartmouth College College 14 Engineering Dr 14 Engineering Dr Hanover, NH 03755 Hanover, NH 03755 [email protected] [email protected]

ABSTRACT Often, an increase in security implies a decrease in perfor- Developers and academics are constantly seeking to increase mance, and vice versa. Given the highly developed state of the speed and security of operating systems. Unfortunately, modern systems, it is rare for any incremental change to the an increase in either one often comes at the cost of the status quo to increase security without adding work, thereby other. In this paper, we present an operating system de- decreasing performance. In this paper, we challenge one of sign that challenges a long-held tenet of multicore operat- the earliest and most fundamental design choices that led to ing systems in order to produce an alternative architecture the current status quo of multicore operating system design that has the potential to deliver both increased security and in order to arrive at a system that we believe can increase faster performance. In particular, we propose decoupling both security and performance, rather than trading one for the operating system kernel from user processes by running the other. each on completely separate processor cores instead of at We begin by describing the path that led to the current di↵erent privilege levels within shared cores. Without using accepted standard in operating system design, illustrating the hardware’s privilege modes, virtualization and virtual the issues with that design, and introducing an alternative contexts enforce the security policies necessary to design paradigm to address these issues in Section 2. In Sec- maintain isolation and protection. Our new kernel tion 3, we discuss the various components of the alternative design paradigm o↵ers the opportunity to simultaneously in- design in greater detail. Section 4 provides a brief survey crease both performance and security; utilizing the hardware of related and background literature. In particular, it ex- facilities for inter-core communication in place of those for amines previous attempts to mitigate privilege escalation, privilege mode switching o↵ers the opportunity for increased previous work studying asymmetric multiprocessing, and a performance, while the hard separation between selection of prior attempts to redefine traditional operating user processes and the kernel provides several strong security system design. Finally, Section 5 discusses possible next properties. steps, future work, and methods for evaluating the merits of the proposed design. The contribution of this work is the detailed description CCS Concepts of an alternative design paradigm for multicore operating Computer systems organization Multicore archi- systems that o↵ers opportunities for improvement in both tectures;• Security and privacy !Virtualization and performance and security. In particular, the design o↵ers: security; • ! Complete isolation and “sandboxing” • of every user application. 1. INTRODUCTION Device drivers with the security of user-space encap- Multicore systems have become ubiquitous in every cor- • ner of the modern world, providing the foundation for vir- sulation and the performance of kernel modules. tually all modern technology. For many applications, high Hardware-enforced secure contexts available for application- security and high performance are both critical. As a re- • specific use. sult, the performance of computer systems is increasing at breakneck speed and security research is more important Real-time “watchdog” security monitors in the kernel • than ever. However, the two e↵orts are largely conducted for intrusion detection in applications. independently of (and sometimes opposed to) one another. Fine-grained security policies enabled by per-core vir- • tualization and separation of the kernel and the appli- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed cation. for profit or commercial advantage and that copies bear this notice and the full cita- tion on the first page. Copyrights for components of this work owned by others than Decreased system call performance overhead. ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- • publish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. 2. MOTIVATION AND OVERVIEW NSPW ’16, September 26-29, 2016, Granby, CO, USA The fundamental role of an operating system is to provide 2016 ACM. ISBN 978-1-4503-4813-3/16/09. . . $15.00 an appropriate context for process execution, multiplex pro- DOI: http://dx.doi.org/10.1145/3011883.3011886 cesses’ access to the hardware, and protect processes from

68 each another. The current state of the art operating system erating system design onto each core in multicore hardware architecture is the product of many seminal [59, 20, 16, 7, imposed unnecessary limits on the security and performance 65, 60] and hundreds of subsequent research e↵orts and ac- attainable on modern systems. In this paper, we revisit complishes all of these tasks. Current systems provide each that decision and explain how to replace the unicore oper- application with a virtual address space in which the kernel ating system design with one that leverages modern mul- manages the required context. The kernel provides the ap- ticore hardware to address some of the weaknesses found plication with functionality for manipulating hardware by in modern designs. The unique utilization of hardware re- including its own code in the virtual address space of each sources that we present in this paper fundamentally changes process1. In order to protect processes from one another, the how systems provide security to their di↵erent components. kernel does not share virtual address spaces between applica- While redistribute system components to in- tions and it restricts access to the shared kernel functionality crease the security o↵ered by monolithic kernels, they do so using hardware-provided CPU privilege levels. using the exact same hardware abstractions to di↵erentiate Unfortunately, some characteristics of the current kernel user- and kernel-space. In contrast, our paradigm changes design lead to security and performance issues. The popular the underlying hardware abstractions used for coordination return-to-user (ret2usr) style attack [35] is enabled by shar- and translation between di↵erent components of the system. ing the virtual address space between the kernel and the ap- In fact, any current kernel, whether monolithic or micro, plication, despite the CPU privilege levels. Additionally, the could be implemented using our design. Even cutting edge frequent used to change privilege levels between virtualization-based security e↵orts use the same hardware the kernel and application generate substantial performance design patterns that operating systems have been using for overhead. years [57]. As argued in [10] this “turtles all the way down” The current relationship between the operating system approach to security is not sucient. In our design, virtual- and a user application is largely a historical artifact. In the ization is used as a tool for the kernel to protect itself, rather early days of computing, with just one processing unit, only than as an abstraction that replicates the classic kernel de- one program could run at any given time. There needed to sign paradigm in a new software layer. be a way to pass control from one program to another (in this Our design explores the idea of using Asymmetric Mul- case, from the application to the kernel), and some notion tiProcessing (AsMP) on modern hardware. Its main of di↵ering privileges between the two programs. To accom- goal is to divorce the virtual address space of the applica- plish the first task, system designers simply included both tion from that of the kernel. The distinct, decoupled op- “programs” in a single computing context. For the second, erating environments provided on each processor allow for the hardware provided the privilege mode switch triggered the kernel and the application to use completely di↵erent by an (an INT instruction on x86). When the pro- hardware resources, rather than sharing a common proces- cessor reached this instruction, it would save the state of the sor core. Inter-Processor Interrupts (IPIs) will take the role current operating context to a known location, change the of the standard INT/SYSENTER and IRET/SYSEXIT function- CPU’s privilege mode, and direct execution to a predefined ality for system calls2. The application and the kernel will kernel entrance routine. Later, when the kernel wanted to no longer share hardware, so the hardware-provided CPU resume execution of the application, it would issue a special privilege mode switch is not needed, and neither program return (IRET on x86), at which point the hardware would has to relinquish its own resources to message or signal the demote privilege and transfer execution to a kernel-defined other. Additionally, virtualization will be utilized to imple- location in the application’s code. More recently, the x86 ment fine-grained security rules on a per-core basis to further architecture defined SYSENTER and SYSEXIT as alternatives police the software. to INT and IRET for the system call (syscall) interface. This paradigm shift interrupts commonly deployed privi- With the advent of multicore processors, the need to share lege escalation mechanisms by defining privilege by a physi- hardware no longer forced this design onto kernel developers. cal location on-chip rather than traditional processor modes. However, since each core still provided the functionality of As a result, more advanced virtualization-based defenses the unicore processor, it became standard practice to simply can be employed because each core is responsible for ei- take the established formula and replicate it, instantiating ther the kernel or an application - never both. Addition- one instance per core. The only special care needed was ally, the fact that the kernel and the application do not that no two cores would execute the same code at the same share hardware resources means that they can operate si- time. Originally, this was accomplished using a simple “ker- multaneously. This creates opportunities for techniques such nel lock,” a software mechanism that restricted access to the as using kernel to perform watchdog state checks on appli- kernel to only one processor at a time [41]. More recently, cations while they run or applications utilizing truly asyn- many kernels have narrowed the granularity of their locked chronous system calls by signaling the kernel but continuing paths so that two or more fully independent paths through to run while the kernel services its request. The combination the kernel may be used simultaneously by two or more dif- of all of these techniques and their derivative features will ferent cores [64]. Overall, this scheme of employing a single deliver strong separation between the kernel and an appli- mechanism on each available core is known as Symmetric cation, fast IPI-based control transfers and messaging, ad- MultiProcessing (SMP). vanced virtualization-based and application-specific security We suggest that the decision to replicate the unicore op- features, and new opportunities such as asynchronous sys- tem calls, kernel-based watchdog processes, and application- 1The few examples of systems that use “strong” rather than specific hardware-secured subcontexts. “weak” separation between kernel memory and user memory include the 4G/4G split patch [49], 32-bit XNU [36], 2Future work that seeks to implement this design will ex- and certain systems using the hardware facilities provided plore polling as a possible alternative to IPIs as suggested by SPARC V9 hardware [46] by an anonymous NSPW reviewer.

69 2.1 Threat Model create mappings for arbitrary system memory, we will omit This work assumes that an adversary has physical or re- virtual mappings to the application’s page tables from the mote user-level access to the system after it has been initial- application’s virtual address space. In order to modify its ized. He does not have access to the system during initializa- own page tables, the page tables need to contain a virtual tion and may not influence the boot process; issues such as mapping to themselves; without it, the application cannot trusted boot are beyond the scope of our work. The attacker modify the memory used by the unit hopes to escalate his privilege in order to gain access to pro- to define its own virtual address space. This ensures that tected data or code using some privilege escalation mech- the process is permanently and completely isolated in virtual anism, such as return-to-user (ret2usr) attacks [35], rather memory and enables the process’ page tables to deny access than a particular privilege escalation vulnerability, such as to memory-mapped devices. [2, 1, 47]. The attacker may have o✏ine access to the source The virtualization layer also helps to recreate ring 3 pro- and/or binary code of the kernel and any application run- tection by denying inappropriate hardware accesses like read- ning on the system, and arbitrary control of any code/data ing from and writing to processor control registers or exe- to which his privilege allows access. cuting typically privileged instructions. The end result is an application execution context with the same protections 3. ASMP KERNEL DESIGN PARADIGM o↵ered by ring 3, but without the crown jewels of the system hiding behind a CPU privilege level bit. Figure 1 illustrates an overview of our design. Core num- ber, rather than the traditional CPU privilege level (ring), 3.1.2 Hardware Control from Applications di↵erentiates kernel- from user-space. Core-specific security rules in the virtualization layer and virtual memory isola- User-space drivers [26] minimize the amount of potentially tion for ring 0 processes recover the protection o↵ered by third party and frequently buggy [25, 15, 53] code that runs the traditional ring 3 user-space. Interprocessor interrupts with privilege. Unfortunately, this comes at a significant provide a mechanism for implementing system calls across performance cost. Device drivers need to interact with hard- cores. With ring 3 no longer used for isolating user-space, ware, for which they need privilege that they do not have the application can use the hardware protection associated in ring 3. Therefore, they must frequently interact with the with ring 3 to provide its own secure sub-contexts. kernel to accomplish their tasks, significantly degrading per- This section explores the features of our design in detail. formance, increasing the kernel’s code base, and widening The main factors contributing to increased security are di- the interface between user- and kernel-space. vorcing kernel- and user-space, utilizing user-space drivers Although we’ve shown that applications can be run in and deploying fine-grained per-core virtualization-based - ring 0 with privileges less than or equal to those of ring 3, curity policies. The increase in performance will come from we can actually increase security by providing drivers with a faster mechanism for signaling from an application to the privileges greater than those available in ring 3. Setting up kernel, less overhead in driver implementations, and more the proper operating context for a driver (as suggested in ecient parallelization of computation. the virtualization layer on the core being used for a network driver in Figure 1) will allow the driver to be implemented 3.1 Separating Kernel and User Cores in a single context. This is an improvement over traditional The primary characteristic of the proposed design is that “user-space” drivers that need the kernel to do work on their the kernel and the user application do not share a virtual ad- behalf. It keeps the kernel’s code base small and interface dress space and are each executed at ring 0, but on indepen- narrow. dent cores. This arrangement will have several implications on the security of the system. 3.1.3 Application-Specific Secure Contexts By moving user-space from ring 3 on a shared core to 3.1.1 Ring 3 (and more) Protection in Ring 0 ring 0 on a user-specific core, we’ve introduced yet another The weak separation between kernel- and user-space in opportunity to increase security. In particular, ring 3 is no traditional operating system design allows for even the most longer being used by the system. Instead, applications can strictly “sandboxed” processes to be used as a stepping stone (with some kernel support) use ring 3 to provide their own for compromising the whole system because privilege escala- secure contexts. tion compromises the kernel. Since the kernel needs to man- For an example of how this feature might be used, consider age all of system memory, it maintains virtual mappings to a web browser. Although a browser strives to protect web all memory; a process that shares its virtual address space pages from one another and protect itself from web pages, with the kernel can perform arbitrary reads and writes with cross-site scripting [63] is just one example of a common sucient privilege escalation. attack vector in which a single web page can compromise When not shared with the kernel, the application’s vir- an entire web browser. However, if the browser is running tual address space does not need to contain mappings to in ring 0, it could manage individual web pages in ring 3 all of system memory. Consequently, the application can- the same way that traditional operating systems manage not inspect or modify memory unless the kernel has given user processes. When the user opens a new web page, the it access, even with arbitrarily high privilege. Naively, this browser could ask the kernel for a fresh address space that means that the application can safely be run in ring 0. How- contains itself in ring 0 and a clean context for the new ever, additional steps are required to recreate the full pro- web page in ring 3. The kernel needs to handle this task, tections o↵ered by ring 3 while running in ring 0. but the browser could use a local scheduler implementation In ring 0, the process is a supervisor on-core and can read (separate from the main scheduler in the kernel) to cycle from or write to any memory mapped into its address space. through these di↵erent contexts without kernel intervention In order to stop a process from modifying its page tables to during its own scheduled time quantum.

70 Application-Specifc Secure Context

INT/SYSENTER Ring 3

Hardware-Enforced Privilege Mode Boundary

IRET/SYSEXIT

IPIs IPI Ring 0 IPI

Kernel User Process User Process (e.g. Network Driver) (e.g. Web Browser)

Allow HLT instructions? ✓ Allow HLT instructions? ✗ Allow HLT instructions? ✗ Ring -1 Allow device interrupts? ✓ Allow device interrupts? ✓ Allow device interrupts? ✗ (VMM)

Core 0 Core 1 Core N

Figure 1: System design using the proposed paradigm. CPU privilege level (ring) 0, traditionally used for the kernel context, contains the main code for each core: either the kernel or a user processes. The virtualization layer, commonly referred to as ring -1, protects the hardware from user processes. With ring 3 no longer being used by the system, individual applications can use the hardware to provide their own secure sub-contexts.

3.1.4 Watchdog Kernel Processes 3.2 Fine-Grained Virtualization In current operating systems, the kernel is invoked strictly One advantage of the proposed design is the ability to ex- when an application needs work done on its behalf. In other ercise each core’s virtualization hardware independently to words, the kernel is never running without a specific task impose fine-grained security rules on each core based on the to handle. In contrast, with the asymmetric multiprocess- software that that particular core will be executing. Pre- ing solution, the kernel will be running on a dedicated core viously, we discussed why this is necessary to recreate ring whether or not it is servicing a specific process’ request. This 3 protections for user processes in ring 0. Moving beyond creates an opportunity: time during which the kernel is ex- necessity to benefit, this section examines a virtualization- ecuting but has no specific task to complete. One possible based security project and shows how it could be further way to utilize this opportunity is to implement a watchdog strengthened with the new design paradigm. routine [17, 44] in the kernel. This routine could monitor ExOShim [11] is a thin layer of virtualization underneath system and application invariants in real time. For exam- the kernel used to mark all memory associated with ker- ple, it could hash a process’ code to check for code patching, nel code as execute-only using the Extended monitor the stack looking for ROP payloads, or profile sys- (EPT) functionality provided by modern Intel VT-x hard- tem components to detect unfamiliar configurations. In the ware [30]. This mitigates the risk of kernel-level memory event that an anomaly is detected, the application can be disclosures that could facilitate reverse-engineering or the suspended pending further action. construction of kernel ROP or JIT-ROP payloads. Addi- The watchdog routine in this implementation is doing its tionally, ExOShim does not accept any inputs or tolerate work with CPU cycles that would otherwise be wasted on any permissions violations so that it can provide this pro- the kernel core. These cycles could also be used for other tection for the lifetime of the system, even in the face of tasks in the interest of either performance or security. One kernel compromise. example of the former is pre-computing values that are likely Figure 2 illustrates how the physical frames of memory to be requested by applications in future system calls. In corresponding to kernel code are marked execute-only in the the case of the latter, the kernel could use these cycles as EPT, denying kernel-level memory disclosure vulnerabilities. a third line of defense (after virtual memory isolation and However, all other memory is marked with the most liberal virtualization) to restrict user processes being run in ring 0. possible permissions in the EPT: read, write, and execute. For example, if the kernel wants the application to alert it This is because the applications running on the processor in the event of a processor fault, the watchdog process can may need to use this memory. Therefore, the virtualization verify that the process does not modify the fault handlers layer relies on the kernel to maintain the appropriate mem- mapped into its context. ory management and access control for this memory. Note that this is consistent with most hypervisors; only the least

71 Process A Process B Virtual Virtual Memory Memory

Space Space

Kernel Code Kernel

Kernel Code Kernel

Kernel Code Kernel Kernel Code Kernel

Virtual Memory Abstraction

Guest Physical

Memory

Kernel Code Kernel Kernel Code Kernel

ExOShim

Extended X-Only X-Only Page Tables

Real System Physical

Memory

Kernel Code Kernel Kernel Code Kernel

Unused Process A Kernel Process B

Figure 2: Overview of the Protections Provided by ExOShim [11]. All kernel code pages are marked execute-only in the hypervisor’s EPTs. All other memory is marked with the most liberal permissions to allow the kernel to manage memory for individual applications. restrictive security rules can be deployed for the lifetime of a 3.3 Performance Implications virtual machine. This is an unfortunate byproduct of SMP; One possible issue facing the performance of the proposed the hypervisor cannot enforce stricter security rules because architecture is the symmetrical nature of hardware resources it cannot predict which core will need which set of rules. in SMP chips. In particular, caches and or Unfortunately, the permissive rules applied to this mem- peripheral busses are optimized for symmetric use by all ory allows for the possibility of a ret2usr [35] or ret2dir [34] cores. Using the cores asymmetrically may result in subop- style attack. Intuitively, this is because the security rules timal performance of these hardware facilities which could that ExOShim applies to the kernel are not complete. It only be resolved at the hardware development stage. enforces a rule that all kernel code is execute-only.How- The more obvious performance concern is that the pro- ever, it does not enforce the desirable rule that the kernel posed design suggests reserving one processor core for the may execute only kernel code. This rule would eliminate the kernel, decreasing the total number of cores available for si- possibility of the processor executing any non-kernel code multaneous operation of applications by one. Applications with kernel privilege, but the requirement that application that are optimized to make heavy concurrent use of all avail- code must be able to utilize this memory from this core able cores will see a decrease in performance with the fewer denies the possibility of applying such a strong rule in the resources available to them. However, Amdahl’s Law [4] sug- virtualization layer. This rule would be better implemented gests that there is a limit to how many processor cores can with ExOShim than with hardware extensions such as In- decrease the overall time required for an application. There- tel’s SMEP [24]. SMEP can be disabled in the event of fore, in the future (as hardware continues to scale and o↵er kernel compromise, SMEP bypass techniques have already more and more processors) removing any constant number been demonstrated [58, 33], and SMEP cannot mitigate the of cores from those available will not significantly decrease threat of a ret2dir style attack [34] while ExOShim can. the performance of any particular application. In fact, in In the proposed design, however, the kernel runs on its a future implementation of this architecture deployed on a own core; no application code will run on this core. This system with a very high number of available processor cores, means that the ret2usr attack is defeated directly: there the design could reserve one core per independently locked is no user-space to return to in the kernel context. Ad- path in the kernel and still have many cores available for ditionally, the kernel core’s virtualization layer can enforce active processes. even stronger security rules than the ExOShim prototype Additionally, the proposed design o↵ers several opportu- presented in [11]. In addition to marking all memory cor- nities to increase the performance of an application that will responding to kernel code as execute-only, it will mark all be immediately evident in most cases. These performance other memory as non-executable. This will preserve the pro- gains may even help to o↵set the cost of removing a core tection against memory disclosure vulnerabilities while also from specialized highly-concurrent processes. denying ret2dir style attacks by prohibiting the execution of any non-kernel code from the kernel core.

72 Schedule Schedule 3.3.2 IPIs vs. INT/IRET New New Process Process In addition to the possibility for doing additional work per Application alarm unit time by running the kernel and application simultane- 1 2 3 4 5 6 ously, using IPIs to handle control transfer from the applica- Time tion to the kernel is faster than using the SYSENTER/SYSEXIT 1 2 or INT/IRET mechanisms. Using the traditional mechanisms, Kernel the hardware goes through context switching routines in (a) With the Traditional Syscall Mechanism which it saves and restores the appropriate states. In the Schedule Schedule proposed design, this type of context switching is not nec- New New essary. In the case of a regular system call, the operating Process Process context of the user process is maintained on the user’s core; Application alarm not sacrificing the hardware to the kernel means no state 1 2 3 4 5 6 7 8 9 saving is required. Similarly, the kernel does not need its Time state restored because it is operating on its own hardware Kernel 1 2 which maintains its own state. In the case where context switching is required, namely (b) With the Proposed IPI-Based Mechanism during , some additional work will be done to pre- form the required state saving. The method to best imple- Figure 3: Application’s work completed in a given time ment this additional task will be explored in the future im- quantum. With the proposed IPI-based syscall mechanism, plementation of a prototype implementing this design, but the clock cycle following the call to the kernel belongs to the is likely to require some modification to user applications application, not to the kernel. This is not the case with tra- within the system call . ditional system call interfaces; even syscalls that claim to be “asynchronous” suspend the application while it is delivering 3.3.3 Removing Drivers from the Kernel its request to the kernel. In Section 3.1.2 we explained how our design allows us to give device drivers both privileged access to hardware and full user-space encapsulation. We suggested that our design 3.3.1 Simultaneous Application and Kernel Execu- could reveal faster performance than the equivalent driver tion in user-space as traditionally defined because the driver can access hardware from its own context rather than invoking Recall that the proposed design will have the kernel and the kernel. application operating simultaneously on di↵erent processor In addition to the performance increase over traditional cores. This opens the door for new classes of system call in- user-space driver implementations, our method also has a terfaces. In particular, the application does not necessarily possible performance advantage over kernel-level drivers em- need to stop execution while it makes a syscall. Figure 3 ployed in monolithic kernels. Without suciently fine-grained contrasts the work that an application can accomplish with locking, even a kernel-level driver locks other cores out of a traditional syscall and with the proposed IPI-based de- (at least some part of) the kernel while it is servicing inter- sign. It illustrates that when the application requests work rupts. In contrast, drivers loaded with privileged access to from the kernel (in this case, setting an “alarm”) with the the hardware but not into the kernel can service interrupts traditional interface, it must wait for the kernel to perform 3 as quickly as traditional kernel-level drivers, but without the requested action prior to continuing its work . In the blocking other cores from entering the kernel. proposed design, however, it can continue its work immedi- This e↵ect is similar to that reported in [21] when the ately because the application and the kernel are not sharing network driver was isolated to its own virtual machine. In hardware. As a result, the application is able to complete that case, letting a network-specific VM handle all network more work in a given time quantum. interrupts without interrupting the main kernel resulted in Note that this truly asynchronous system call interface a performance increase, despite the added overhead of inter- is only possible if the kernel is not sharing hardware with VM communication. Analogous results were also found in the application. As such, many additional research ques- [8, 51] tions exist before a full implementation of this feature can be realized. For example, if the kernel fails to perform the requested task, it must alert the process; this is nontriv- 4. RELATED WORK ial if the process has changed state since the time of the There is a substantial body of work that aims to thwart syscall. Additionally, the process needs a way to know that privilege escalation using design-, compile-, load-, and/or the kernel is prepared to handle a second syscall after the run-time techniques. Additionally, many e↵orts have ex- first has been dispatched. Despite these unanswered ques- plored the idea of asymmetric multiprocessing for perfor- tions, the possibility for this type of system call mechanism mance purposes. Finally, some previous work suggests a new has promising implications for system performance. paradigm for operating system design. A brief summary of major work in each of these areas follows. 3Note that this is true even in the case of the “asynchronous 4.1 Privilege Escalation Mitigation system calls” [13] used in some current operating systems. These interfaces may provide the capability for the process Often assisted by the weak separation of kernel- and user- to work while the kernel is servicing its request, but the space, all of the most popular kernels have been compro- process must still relinquish its hardware to the kernel while mised by “rootkits” that give the attacker the highest level it is making the request. of privilege (i.e. “root”) [66, 37, 56]. Typically, these attacks

73 ytmt neatwt adae enli otdt use to ported is kernel A hardware. with operating interact an to allow system that instructions independent user. chitecture the by by execution implant and for non-writable code approved. approved is been kernel has code kernel has a user executable the of all that that possibility verifying pages the code mitigates kernel This all WX the on standard enforces shownrules whenever SecVisor as Additionally, switch user-space 5a. and Figure mode in kernel- between processor at- jumps ret2usr a flow CPU control defeats ensuring the This by if protec- level. time. only tacks the privilege resumed changed a detected, execution indeed is at and has rules swapped executable security are user-space of tions and violation a kernel- When of one [34]. only attack style ret2dir been a mitigate of already cannot threat have SMEP the Additionally, techniques 33]. bypass [58, operating demonstrated SMEP threat, by this adopted and mitigate being slowly systems to only aim are [24] extensions SMEP these Intel’s exten- as user hardware such Although to sions user-controlled privileges. of mode kernel-level execution with supervisor the in code from results attack level This privilege mode. chang- CPU user-space without execution the normal user-code of enters ing the path and a in kernel-code creates leaves address branch that an kernel- compromised to The some with set code. associated is target branch controlled Fig- code in user illustrated a attack, 4, this ure In separation. architec- user-space x86 and 64-bit modern on attack this tures. defeat methods existing to summarize used will directly we design attacks, and system at- ret2usr kernel- operating defeats these between proposed of separation our each survey strong in of the user-space a risks Since published the have [12]. mitigate We tacks to used [35]. 62, techniques [42], attacks of [14, implants ret2usr (ROP) code and programming kernel 29], oriented categories: return main kernel-mode three into fall h eueVrulAcietr SA 1]i e far- of set a is [19] (SVA) Architecture Virtual Secure The mark to virtualization memory physical uses [61] SecVisor kernel- weak by enabled directly is attack return-to-user A iue4 eunt-srPiieeEclto Attack Escalation Privilege Return-to-User 4: Figure

Kernel-Space User-Space INT/SYSENTER Hardware-Enforced Privilege Mode Boundary Mode JMPkernel_target ... fn: execve(); ... attacker_target: attacker_target IRET/SYSEXIT 74 []tog h atrsaetp fsse ssml,i is it simple, is system of type master-slave the “[a]lthough Playsta- Sony the in used architecture microprocessor “Cell” yasisiln oechecks. diversifi- code inline code di its compile-time it bypass makes instruc- a that these includes mechanism of also cation each code. kGuard before user-controlled check into tions. inline an execution the places kernel platform, kGuard redirect x86 to the der On shown ret 5b. as time Figure compile at in control-flow kernel’s the on guards escalation privilege primary three the By of site. techniques. each man- partic- call deny KCoFI run-time, a possible to at a ages from branches of mode returns kernel location all all verifying the that of target beginning function and the ular KCoFI code, at Specifically, enter function’s always some integrity. calls function flow that control ensures only previously, o↵ers discussed implementation but SVA the of mechanics target- native to translating interpreter to SVA of code. distributed machine virtualized step dependent final a is the of byte-code performs top The that on executed and . users as such languages to“safe”programming similar compiler control- compile-time, and This at safety integrity flow memory code. provide to new source features advanced kernel any has the to from kernel byte-code a SVA O porting architecture. to hardware similar instructions, these lv oerqet n nerpino otaeo h mas- gen- the on with software core of of interruption servicing its and delayed requests in shared resulted core core kernel slave This single the of processes. a purpose quantity isolate alone; eral to and kernel a↵ord the speed not hard- for the could limited and of had resources attempts face ware Early the multiprocessors. in modern un- persist is assessment to this likely Fortunately, resources.” system total that ine writes quite Enslow usually where [23], master in the summarized on is only proces- running This system between core. operating relationship the master-slave with sors, a explored sors asymmet- relevant. hardware particularly investigates multicore is symmetric that to resources research software plentiful of However, is application asymmetri- processors ric 48]. of of 5, types application iPhone these [50, the the in in examines multiprocessing used that cal chipset Research Fusion A10 Apple 7. the and ser- the tion include valuable Examples provide computation. serial cores parallelization high-cost of high-performance for degrees vice few large and provide a power, and while chip heat, the allow the of lower processors cost to These cores low-performance many performance. heterogeneous have of performance the increase computation. greatly normal spe- can with (GPUs) hardware tasks units software cialized processing specific matching graphics how or demonstrate cards units network Hardware as already systems. such is computing resources within hardware used specific commonly to spe- tasks Allocating software SMP. cific to compared multiprocessing metrical . smercMultiProcessing Asymmetric 4.2 Gad[5 ist eyrturatcsb inserting by attacks ret2usr deny to aims [35] kGuard the leverages [18] (KCoFI) Integrity Flow Control Kernel al oksronigteitouto fmultiproces- of introduction the surrounding work Early that cores with built are multiprocessors many fact, In asym- of merits possible the investigated has research Some ntutosaealvleal obighjce nor- in hijacked being to vulnerable all are instructions in niscnrladuiiaino the of utilization and control its in cient n,a oplrproduces compiler SVA an ✏ine, utfrteatce to attacker the for cult call, and jmp, nesv oko eafo nipratpoesi parallel in process itself. important application an kernel- the of to application-specific behalf do on to work ex- cores intensive [8] specific Corey using Similarly, pro- plores systems. The over operating performance SMP need. improved traditional might demonstrated drivers only described implements those totype that that and (LDK) functionality driver Kernel the device or Device appropriate communication Lightweight the network run a as cores These such I/O. tasks disk specific to cessor in oper- used commodity in be use to systems. traditional cores ating of their processor illustration beyond and proces- capacity thorough IPIs the a a x86 of of provides subset flexibility work own the its Their using each cores. - sor’s CPU system same operating the Linux on the of instances com- independent two pletely running by asymmetrically cores multiprocessor ric se- possible the are realize believe not we that does AsMP. benefits with it performance processor, ker- or multicore uniprocessor concept, curity a a its shoehorn in these onto to similar designed nel for is was work it scans This because that the core. but main from daemon the computation on a appropriate kernel the and requests cores, and applica- records other by made on calls tions system kernel record lightweight provides that mul- it implementations a particular, on In software oper- processor. manage uniprocessor ticore to a implementation allow system easily to ating designed scheme a ment design. issues and proposed these the cost systems, plague low modern to the cores unlikely With processor are of requests. speed core slave high during core ter . rvosyPooe Paradigm-Shifts Proposed Previously 4.3 sMS[1 sin oe fasmercmlioepro- multicore symmetric a of cores assigns [51] AsyMOS symmet- utilizing examined [31] project Twin-Linux The imple- to [32] in used also is paradigm master-slave The oeapoce,lk us tep oo↵ to attempt ours, like approaches, Some

Kernel-Space User-Space INT/SYSENTER Hardware-Enforced Privilege a sn SecVisor Using (a) iue5 eetdTcnqe oDfa e2s rvlg saainAttacks Escalation Privilege ret2usr Defeat to Techniques Selected 5: Figure Mode Boundary Mode JMPkernel_target ... fn: execve(shell); ... attacker_target: attacker_target IRET/SYSEXIT rsmlrsecu- similar er 75 lmntn h eurmn ospotmlil processes multiple support to [43]. space requirement address single the a within Eliminating process single 0. a ring running by from hardware the to the access applications augment native since have actually architecture, would could an design of re- chal- proposed capabilities shared particularly The to is authority access lenging. centralized controlling a library, when without a Even sources in developer. handled application is become the this management, of memory protect responsibility virtual to the provide as can significantly. such kernel development secure processes, a all application that compli- tasks the it the of of because Many ExoKernel job largely the the common-place, overall, cates system become a not for has security Although more necessary resources. o↵ering hardware layer of multiplexing possible Ex- the thinnest manage the to the hardware, with only interact provides to oKernel appli- use the can that developer abstractions cation providing than Rather kernels. entirely. monolithic as performance of levels o↵ same to processes. functionality struggle the level microkernels core user that into means small, migrated this Unfortunately, are drivers the device as keep such to 39]. [6, order techniques verification us- In formal of and possibility analysis the formal for [54]. mini- ing allow vulnerabilities bases code of code likelihood source small the the Additionally, decrease keep to many order to and in aim [52], mal Bear microkernels [28], All QNX several [55], as others. been L4 such [27], have literature the There [3], in presented kernel. microkernels codebase of system small examples operating a the emphasizing by for architecture kernel design. lithic kernel alternative an studies of mi- case possibility particular, provide the In unikernels for and than quo. , rather status crokernels, paradigm existing entire the patching the simply redefining by benefits rity nkrestaeflxblt o euiyadperformance and security for flexibility trade Unikernels is kernel the what redefining suggests [22] ExoKernel The mono- standard the from departs microkernel a of idea The Kernel-Space User-Space INT/SYSENTER Hardware-Enforced Privilege Mode Boundary Mode b sn kGuard Using (b) JMPfault_handler else JMPtarget iftarget ... fn: execve(shell); ... attacker_target: IRET/SYSEXIT attacker_target Є kernel er and/or multiple users simplifies the code base required to expected to be more secure, since the number of vulnera- implement a and reduces the overhead required to bilities in a project is directly related to its size [54]. Secu- complete a single unit of useful work. Several examples have rity is notoriously dicult to measure, but a detailed secu- been deployed alongside virtualization technologies in cloud rity study will consider the prototype’s capabilities against applications [9, 38, 45]. Despite their proven usefulness for known attack vectors and speculate on its possible resiliency providing fast, highly focused applications, unikernels don’t, to previously unknown “zero day” attacks. in isolation, provide protection from most of the attack vec- Evaluating the performance impact of the proposed de- tors discussed in this paper. Additionally, in order to sup- sign is more straight-forward. Bear comes packaged with port the multiple-user multiple-job paradigm that conven- a suite of benchmark tests that perform CPU and mem- tional applications require to operate e↵ectively, they require ory intensive software tasks, including an adaptation of the a hypervisor for scheduling and other process-management malloc benchmark test presented in [40]. Additionally, the type tasks. This is an obvious instance of the often criticized industry-standard AIM9 benchmark suite has been ported “turtles all the way down” approach to system security [10, to Bear. The performance of these test suites will be com- 57]. pared between our prototype and the standard Bear build in order to estimate the overall performance impact of the de- sign. We will also conduct micro-benchmarks to investigate 5. NEXT STEPS specific system tasks instead of overall system throughput. The design paradigm we’ve described has the potential to substantively increase both performance and security. In order to measure its validity, we will develop a prototype kernel that employs it. The platform for our prototype is a research microkernel called Bear [52]. It is very small, but still supports full 64-bit multicore Intel hardware, including the Intel VT-x and VT-d virtualization features. Its small 6. CONCLUSION size and full feature set make it a perfect candidate for the Over the past several decades, few have attempted to type of aggressive kernel redesign necessary to implement a change the traditional operating system design methods. We prototype of our design. believe that symmetric multiprocessing is imposing unnec- Developing the prototype will involve many challenging essary limits on the security and performance of operating research and design questions. Utilizing Intel’s SMP archi- systems. In this paper, we’ve shown an alternative design tecture to implement an unconventional AsMP design in- paradigm based on asymmetric multiprocessing to o↵er pos- volves several challenges including: sibilities for increased security and performance. Our design abandons the traditional weakly-separated ker- Using IPIs in a considerably di↵erent way from their nel and user virtual address spaces in favor of a strongly • standard uses in the bootup process, interrupt propa- separated kernel- and user-space. In particular, this ap- gation, and cache coherency. proach will execute the kernel on one core and applications on other cores, with the two pieces of software never sharing Using the virtualization layer on a given core to stop hardware. System calls and other communication between • malicious or confused privileged user processes from the kernel and the application will be conducted using IPIs inappropriately manipulating hardware. instead of the traditional INT/IRET or SYSENTER/SYSEXIT methods. Finally, the processor’s virtualization layer will Using the virtualization layer and IPIs to force pro- be utilized on a per-core basis to protect the hardware from • cesses to comply with the commands of the kernel. malicious processes. Without the kernel sharing a core, a failure to police We expect an implementation of our design to o↵er sev- delinquent processes might lead to spamming the ker- eral contributions to the current state of the art. Divorc- nel with syscalls or refusal to cooperate when the ker- ing virtual address spaces will enable complete sandboxing nel wants to switch processes on a core. of each user application. Redefining “user-space” will al- low for device drivers with the security of user-space encap- Bootstrapping user applications on cores without a sulation and the performance of kernel modules, and will • loaded kernel. make the ring 3 hardware protection mechanisms available for use within applications. Simultaneous operation of the Merging the virtualization rules required in the pro- kernel and applications introduces the possibility for real- • posed design with existing type-1 or type-2 hypervisors time kernel watchdog security monitors for intrusion de- to support running this operating system in a cloud, or tection in applications. The asymmetric multiprocessing deploying commercial virtualization products on top of paradigm allows for uniquely fine-grained security policy en- this system. forcement enabled by per-core virtualization. Finally, the increased concurrency and faster system call interface sug- Verifying the identity of each core during intercore gest increased overall performance. • communication. We plan to build a prototype kernel that uses the design presented in this paper. In order to measure its success, the We will measure the prototype’s success with an estimate prototype will be evaluated in terms of its complexity, its of its complexity, a detailed security study, and a thorough security, and its performance. We hope that this design and performance evaluation. any future prototype implementation will provide a valu- Executable size and lines of source code provide an esti- able case study in the sparse world of alternative operating of the prototype’s complexity. A small prototype is system designs.

76 Acknowledgments the 2010 Workshop on New Security Paradigms, Many thanks to Jason Dahlstrom, Steve Kuhn, and espe- NSPW’10, pages 51–60, New York, NY, USA, 2010. cially Rob Denz and Martin Osterloh for support during the ACM. development of the ideas described in this paper; and to Ross [11] S. Brookes, R. Denz, M. Osterloh, and S. Taylor. Philipson, Sergey Bratus and Michael Locasto for agreeing ExOShim: Preventing Memory Disclosure using to provide feedback on a draft of the paper. Finally, thanks Execute-Only Kernel Code. In Proceedings of the 11th to the NSPW community including, but not limited to, our International Conference on Cyber Warfare and anonymous reviewers and our shepherd, Matt Bishop. Security, ICCWS’16, pages 56–66, April 2016. This material is based on research sponsored by the De- [12] S. Brookes and S. Taylor. Containing a Confused fense Advanced Research Projects Agency (DARPA) under Deputy on x86: A Survey of Privilege Escalation agreement number FA8750-11-2-0257. The U.S. Govern- Mitigation Techniques. IJACSA, April 2016. ment is authorized to reproduce and distribute reprints for [13] Z. Brown. Asynchronous System Calls. In Proceedings Governmental purposes notwithstanding any copyright no- of the Ottawa Linux Symposium (OLS), pages 81–85, tation thereon. The views and conclusions contained herein 2007. are those of the authors and should not be interpreted as nec- [14] E. Buchanan, R. Roemer, S. Savage, and H. Shacham. essarily representing the ocial policies or endorsements, ei- Return-Oriented Programming: Exploitation without ther expressed or implied, of the Defense Advanced Research Code Injection, 2008. Projects Agency (DARPA) or the U.S. Government. [15] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler. An Empirical Study of Operating Systems Errors. In 7. REFERENCES Proceedings of the Eighteenth ACM Symposium on [1] CVE-2013-2094, May 2013. Operating Systems Principles, SOSP ’01, pages 73–88, [2] CVE-2016-0728, January 2016. New York, NY, USA, 2001. ACM. [3] M. Accetta, R. Baron, W. Bolosky, D. Golub, [16] F. J. Corbat´oand V. A. Vyssotsky. Introduction and R. Rashid, A. Tevanian, and M. Young. Mach: A New Overview of the Multics System. In Proceedings of the Kernel Foundation for Development. pages November 30–December 1, 1965, Fall Joint Computer 93–112, 1986. Conference, Part I, AFIPS ’65 (Fall, part I), pages [4] G. M. Amdahl. Validity of the Single Processor 185–196, New York, NY, USA, 1965. ACM. Approach to Achieving Large Scale Computing [17] P. Crawford. Linux Watchdog Daemon - Overview. Capabilities. In Proceedings of the April 18-20, 1967, http://www.sat.dundee.ac.uk/psc/watchdog/ Spring Joint Computer Conference, AFIPS ’67 watchdog-background.html, January 2016. (Spring), pages 483–485, New York, NY, USA, 1967. [18] J. Criswell, N. Dautenhahn, and V. Adve. KCoFI: ACM. Complete Control-Flow Integrity for Commodity [5] S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai. Operating System Kernels. In Proceedings of the 2014 The Impact of Performance Asymmetry in Emerging IEEE Symposium on Security and Privacy, SP’14, Multicore Architectures. SIGARCH Comput. Archit. pages 292–307, May 2014. News, 33(2):506–517, May 2005. [19] J. Criswell, A. Lenharth, D. Dhurjati, and V. Adve. [6] C. Baumann, B. Beckert, H. Blasum, and T. Bormer. Secure Virtual Architecture: A Safe Execution Formal Verification of a Microkernel Used in Environment for Commodity Operating Systems. In Dependable Software Systems. In B. Buth, G. Rabe, Proceedings of Twenty-first ACM SIGOPS Symposium and T. Seyfarth, editors, Computer Safety, Reliability, on Operating Systems Principles, SOSP ’07, pages and Security, volume 5775 of Lecture Notes in 351–366, New York, NY, USA, 2007. ACM. Computer Science, pages 187–200. Springer Berlin [20] R. C. Daley and J. B. Dennis. Virtual memory, Heidelberg, 2009. processes, and sharing in Multics. Communications of [7] K. J. Biba. Integrity considerations for secure the ACM,11(5):306–312,1968. computer systems. Technical report, DTIC Document, [21] R. Denz. Securing the Cloud with Utility Virtual 1977. Machines. PhD thesis, Thayer School of Engineering [8] S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, at Dartmouth College, July 2016. F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, [22] D. R. Engler, M. F. Kaashoek, and J. O’Toole, Jr. Y. Dai, Y. Zhang, and Z. Zhang. Corey: An Operating Exokernel: An Operating System Architecture for System for Many Cores. In Proceedings of the 8th Application-level Resource Management. In USENIX Conference on Operating Systems Design Proceedings of the Fifteenth ACM Symposium on and Implementation, OSDI’08, pages 43–57, Berkeley, Operating Systems Principles, SOSP ’95, pages CA, USA, 2008. USENIX Association. 251–266, New York, NY, USA, 1995. ACM. [9] A. Bratterud, A.-A. Walla, P. E. Engelstad, [23] P. Enslow, Jr. Multiprocessor Organization – a Survey. K. Begnum, et al. IncludeOS: A minimal, resource ACM Comput. Surv., 9(1):103–129, Mar. 1977. ecient unikernel for cloud services. In 2015 IEEE 7th [24] S. Fischer. Supervisor Mode Execution Protection. International Conference on Cloud Computing NSA Trusted Computing Conference and Exposition, Technology and Science (CloudCom), pages 250–257. 2011. IEEE, 2015. [25] A. Ganapathi, V. Ganapathi, and D. Patterson. [10] S. Bratus, M. E. Locasto, A. Ramaswamy, and S. W. Windows XP Kernel Crash Analysis. In Proceedings of Smith. VM-based Security Overkill: A Lament for the 20th Conference on Large Installation System Applied Systems Security Research. In Proceedings of Administration, LISA ’06, pages 12–12, Berkeley, CA,

77 USA, 2006. USENIX Association. [41] R. Lindsley and D. Hansen. Bkl: One lock to bind [26] J. N. Herder. Towards a true microkernel operating them all. In Ottawa Linux Symposium, page 301, 2002. system. PhD thesis, Vrije Universiteit Amsterdam, [42] A. Lineberry. Malicious Code Injection via/dev/mem. February 2005. 2009. [27] J. N. Herder, H. Bos, B. Gras, P. Homburg, and A. S. [43] A. Madhavapeddy and D. J. Scott. Unikernels: Rise of Tanenbaum. : A Highly Reliable, the virtual library operating system. Queue,11(11):30, Self-repairing Operating System. SIGOPS Oper. Syst. 2013. Rev., 40(3):80–89, July 2006. [44] A. Mahmood and E. J. McCluskey. Concurrent error [28] D. Hildebrand. An Architectural Overview of QNX. In detection using watchdog processors-a survey. IEEE Proceedings of the Workshop on Micro-kernels and Transactions on Computers,37(2):160–174,Feb1988. Other Kernel Architectures, pages 113–126, Berkeley, [45] J. Martins, M. Ahmed, C. Raiciu, and F. Huici. CA, USA, 1992. USENIX Association. Enabling fast, dynamic network processing with [29] R. Hund, T. Holz, and F. C. Freiling. Return-oriented clickos. In Proceedings of the second ACM SIGCOMM Rootkits: Bypassing Kernel Code Integrity Protection workshop on Hot topics in software defined Mechanisms. In Proceedings of the 18th Conference on networking, pages 67–72. ACM, 2013. USENIX Security Symposium, SSYM’09, pages [46] R. McDougall and J. Mauro. Solaris internals: Solaris 383–398, Berkeley, CA, USA, 2009. USENIX 10 and OpenSolaris kernel architecture. Pearson Association. Education, 2006. [30] Intel. Intel 64 and IA-32 Architectures Software [47] metasploit. Chkroot Local Privilege Escalation, Developer’s Manual Combined Volumes: 1, 2A, 2B, November 2015. 2C, 3A, 3B, and 3C, June 2014. [48] J. C. Mogul, J. Mudigonda, N. Binkert, [31] A. Joshi, S. Pimpale, M. Naik, S. Rathi, and P. Ranganathan, and V. Talwar. Using Asymmetric K. Pawar. Twin-Linux: Running independent Linux Single-ISA CMPs to Save Energy on Operating Kernels simultaneously on separate cores of a Systems. IEEE Micro,28(3):26–41,2008. multicore system. In Proceedings of the Ottawa Linux [49] I. Molnar. 4G/4G split on x86, 64 GB RAM (and Symposium,2010. more) support, July 2003. [32] S. Kagstrom, L. Lundberg, and H. Grahn. A novel [50] T. Morad, U. Weiser, A. Kolodny, M. Valero, and method for adding multiprocessor support to a large E. Ayguade. Performance, power eciency and and complex uniprocessor kernel. In Parallel and scalability of asymmetric cluster chip multiprocessors. Distributed Processing Symposium, 2004. Proceedings. Letters,5(1):14–17,Jan2006. 18th International, pages 60–, April 2004. [51] S. Muir and J. Smith. AsyMOS-an asymmetric [33] keegan. Attacking Hardened Linux Systems with multiprocessor operating system. In Open Kernel JIT Spraying, June 2011. Architectures and Network Programming, 1998 IEEE, [34] V. P. Kemerlis, M. Polychronakis, and A. D. pages 25–34, Apr 1998. Keromytis. Ret2Dir: Rethinking Kernel Isolation. In [52] C. Nichols, M. Kanter, and S. Taylor. Bear – A Proceedings of the 23rd USENIX Conference on Resilient Kernel for Tactical Missions. In Military Security Symposium, SEC’14, pages 957–972, Berkeley, Communications Conference, MILCOM 2013 - 2013 CA, USA, 2014. USENIX Association. IEEE, pages 1416–1421, Nov 2013. [35] V. P. Kemerlis, G. Portokalidis, and A. D. Keromytis. [53] Y. Padioleau, J. L. Lawall, and G. Muller. kGuard: Lightweight Kernel Protection Against Understanding Collateral Evolution in Linux Device Return-to-user Attacks. In Proceedings of the 21st Drivers. In Proceedings of the 1st ACM USENIX Conference on Security Symposium, SIGOPS/EuroSys European Conference on Computer Security’12, pages 39–39, Berkeley, CA, USA, 2012. Systems 2006, EuroSys ’06, pages 59–71, New York, USENIX Association. NY, USA, 2006. ACM. [36] D. Keuper. XNU: a security evaluation, December [54] R. K. Pandey and V. Tiwari. Article: Reliability 2012. Issues in Open Source Software. International Journal [37] J.-J. Khalife. MS15-010/CVE-2015-0057 win32k Local of Computer Applications,34(1):34–38,November Privilege Escalation, December 2015. 2011. Full text available. [38] A. Kivity, D. Laor, G. Costa, P. Enberg, N. Har’El, [55] D. Potts, S. Winwood, and G. Heiser. Design and D. Marti, and V. Zolotarov. OSv-optimizing the Implementation of the L4 Microkernel for Alpha operating system for virtual machines. In 2014 Multiprocessors, 2002. annual technical conference (usenix atc 14), pages [56] rebel. issetugid() + rsh + libmalloc osx local root, 61–72, 2014. July 2015. [39] G. Klein, J. Andronick, K. Elphinstone, T. Murray, [57] T. Roscoe, K. Elphinstone, and G. Heiser. Hype and T. Sewell, R. Kolanski, and G. Heiser. Comprehensive Virtue. In Proceedings of the 11th USENIX Workshop Formal Verification of an OS Microkernel. ACM on Hot Topics in Operating Systems, HOTOS’07, Trans. Comput. Syst., 32(1):2:1–2:70, Feb. 2014. pages 4:1–4:6, Berkeley, CA, USA, 2007. USENIX [40] C. Lever and D. Boreham. Malloc() Performance in a Association. Multithreaded Linux Environment. In Proceedings of [58] D. Rosenburg. SMEP: What is it, and How to Beat it the Annual Conference on USENIX Annual Technical on Linux., June 2011. Conference, ATEC ’00, pages 56–56, Berkeley, CA, [59] J. H. Saltzer and M. D. Schroeder. The protection of USA, 2000. USENIX Association.

78 information in computer systems. Proceedings of the on Computer and Communications Security,CCS’07, IEEE,63(9):1278–1308,1975. pages 552–561, New York, NY, USA, 2007. ACM. [60] R. S. Sandhu, E. J. Coyne, H. L. Feinstein, and C. E. [63] K. Spett. Cross-site scripting. Technical report, SPI Youman. Role-based access control models. Computer, Labs, 2005. (2):38–47, 1996. [64] A. Starke. Locking in os kernels for smp systems. [61] A. Seshadri, M. Luk, N. Qu, and A. Perrig. SecVisor: Citeseer, 2006. A Tiny Hypervisor to Provide Lifetime Kernel Code [65] K. Thompson. Reflections on Trusting Trust. Integrity for Commodity OSes. SIGOPS Oper. Syst. Commun. ACM, 27(8):761–763, Aug. 1984. Rev.,41(6):335–350,Oct.2007. [66] K. Way. Lastore-Daemon in Deepin 15 results in [62] H. Shacham. The Geometry of Innocent Flesh on the privilege escalation, February 2016. Bone: Return-into-libc Without Function Calls (on the x86). In Proceedings of the 14th ACM Conference

79