Technical Report UW-CSE-17-08-02

Porting Hyperkernel to the ARM Architecture

Dylan Johnson
University of Washington
[email protected]

Keywords: ARM, AArch64, Operating Systems, Virtualization

Abstract

This work describes the porting of Hyperkernel, an x86 kernel, to the ARMv8-A architecture. Hyperkernel was created to demonstrate various OS design decisions that are amenable to push-button verification. Hyperkernel simplifies reasoning about virtual memory by separating the kernel and user address spaces. In addition, Hyperkernel adopts an exokernel design to minimize code complexity, and thus its required proof burden. Both of Hyperkernel's design choices are accomplished through the use of x86 virtualization support. After developing an x86 prototype, advantageous design differences between x86 and ARM motivated us to port Hyperkernel to the ARMv8-A architecture. We explored these differences and benchmarked aspects of the new interface Hyperkernel provides on ARM to demonstrate that the ARM version of Hyperkernel should be explored further in the future. We also outline the ARMv8-A architecture and the various design challenges overcome to fit Hyperkernel within the ARM programming model.

1. Introduction

Hyperkernel was created to demonstrate techniques for building an OS that can be formally verified using push-button techniques, such as satisfiability modulo theories (SMT). The design of Hyperkernel emphasized a small, statically allocated, and simple exokernel. During development, it became clear that placing Hyperkernel in non-root ring 0 would lead to problems. First, the virtual memory model of modern operating systems cannot be verified in a reasonable amount of time using SMT solvers, as the kernel and user address spaces are shared. Second, an exokernel built using the typical x86 programming model would have trouble exporting hardware control to applications. To solve these issues, Hyperkernel was designed to run within x86's root CPU mode. While traditionally reserved for hypervisors, root mode created a clean separation between the kernel and user address spaces, making it easier to prove properties of isolation. In addition, a kernel running in root mode can be greatly simplified by exporting hardware resource management to user processes in non-root ring 0.

This paper provides a single core contribution. We present an investigation into the ability of the ARM architecture to provide an appealing foundation for push-button verified operating systems. When porting Hyperkernel to ARM we emphasized maintaining the original features of Hyperkernel that allowed it to be easily verified on x86. Additionally, the ARM architecture created the possibility of optimizing parts of the Hyperkernel implementation. For example, ARM provides greater flexibility to developers during transitions into kernel mode, indicating that virtual machine abstractions not maintained in Hyperkernel could lead to faster mode transitions on ARM. We outline these optimizations and benchmark a few of the design decisions made during the porting of Hyperkernel to ARM.

The rest of this paper is organized as follows. §2 discusses work similar to porting Hyperkernel to ARM. §3 provides an overview of the ARM architecture. §4 details the design of Hyperkernel on ARM. §5 describes the implementation of Hyperkernel on ARM and the differences between the x86 and ARM versions. §6 presents our experiments and the performance of ARM hypercalls. §7 concludes.

2. Related work

OS design. Hyperkernel contains several similarities to Dune, where each process is virtualized, allowing processes to have more direct control over hardware features such as exception vectors [3]. Dune uses a small Linux kernel module to mediate several running virtualized processes, and a user-level library aids in managing the newly-exposed resources.

Hyperkernel also draws inspiration from Exokernel [6]. An exokernel is an operating system kernel that provides user-level management of operating system abstractions. In an exokernel, user applications are allowed to manage hardware resources, while the kernel enforces resource policies for the entire system. This design provides a platform for domain-specific optimizations, increased flexibility for application builders, and better optimizations for specific workloads [6]. Previous exokernel implementations such as Aegis [6] and Xok [7] provided low-level abstractions by running user code in the same address space and CPU mode as privileged kernel code. This allowed the sharing of memory, but fault isolation was difficult to guarantee. In Hyperkernel, low-level abstractions are provided by leveraging virtual machine support. These extensions were previously thought to be too slow to be efficient [6], but the recent popularity of hardware virtualization support has slowly improved their performance, and in some cases the costs have been mitigated entirely [4].

Hypervisor design. KVM-ARM is a hypervisor that was built and accepted into the mainline Linux kernel. Similar to Hyperkernel, KVM-ARM utilizes the ARM programming model in an unconventional way to both reduce and simplify the code required. Specifically, it uses a new hypervisor design called split-mode virtualization, where only a small portion of hypervisor execution uses the ARM virtualization extensions. Instead, the KVM-ARM hypervisor transitions back into the host kernel, where it can take full advantage of already-built OS mechanisms [5].

3. ARM Architecture Overview

The ARMv8-A architecture, which we will refer to as simply the ARM architecture from here forward, was designed for full application workloads on the Cortex-A family of cores. It defines a rich programming model for applications, operating systems, and hypervisors. This section summarizes the core pieces of the ARM architecture and how they were utilized in the implementation of Hyperkernel on ARM.

Figure 1. The ARM programming model: EL0 runs applications, EL1 operating systems, EL2 a hypervisor, and EL3 a secure monitor. Moving down the exception levels increases privilege. The secure exception levels are omitted for clarity.

3.1 Exception levels

Figure 1 shows the four possible CPU modes defined by the ARM architecture. Each mode is referred to as an exception level (abbreviated EL0-EL3) that determines the privilege of currently executing software. Transitions into higher exception levels are caused by either synchronous or asynchronous events, called exceptions. An exception can only be taken to the next highest exception level or, when above EL0, to the current exception level. When returning from an exception, the exception level can only decrease by a single level or stay the same [1]. In addition, each exception level supports two execution states, AArch32 and AArch64. AArch32 uses 32-bit widths and is backwards compatible with the ARMv7-A architecture, while AArch64 uses 64-bit widths and is a new feature introduced in ARMv8. Hyperkernel currently supports only AArch64 in order to simplify its implementation.

ARM designed each exception level to provide specific capabilities for certain types of system-level software. For example, the least privileged level, EL0, typically runs user applications. EL0 has limited exposure to system configuration registers and exception handling in order to provide process isolation. EL1 is designed for operating system execution. It exposes exception handling, page table modification, and many other system configuration options. EL2 equips software with hardware support for virtualization, which is typically utilized by a hypervisor. Software in EL2 is given the tools to manage virtual memory, exception handling, timers, and other sensitive system components to provide isolation between virtual machines in EL1. EL2 has the ability to trap virtual machines into EL2, where it can emulate or forward sensitive operations. The last and most privileged level is EL3, which has the ability to switch between a non-secure and a secure execution state. These two states are orthogonal to the exception levels, and secure mode grants access to a separate physical memory. Software running in EL3 typically manages the transitions between the secure and non-secure CPU modes.

3.2 Virtual Memory Model

Figure 2. The ARM virtual memory model. EL0 and EL1 share a 2^64-byte address space across two page table base registers, TTBR0_EL1 (low addresses) and TTBR1_EL1 (high addresses). EL2 and EL3 have separate address spaces.

Figure 2 shows a graphical representation of the ARM virtual memory model. Each exception level above EL0 has control over the configuration of the MMU when executing in its respective level. For page table management, ARM banks page table base registers (called translation table base registers, TTBRs) across each exception level. In contrast, an x86 CPU has a single page table base register for all CPU modes. EL1 and EL0 share an address space across two TTBRs, TTBR0_EL1 and TTBR1_EL1, that are under EL1 control. This shared address space is split in half, with TTBR1_EL1 covering the upper addresses and TTBR0_EL1 the lower addresses. This split provides a convenient mechanism for separating a kernel and user address space.

Memory virtualization is provided by a second stage of address translation, similar to the extended page tables (EPT) on x86. A virtual machine running in EL1 will first translate virtual addresses into intermediate physical addresses (IPAs) using its own page table. Next, the second stage translates IPAs into physical addresses using a page table provided by the hypervisor in EL2. A hypervisor can use the second stage of translation to protect MMIO peripherals from direct control by a VM by forcing data abort exceptions that route to EL2.

3.3 TLB

The ARM virtual memory model provides features for operating system developers to mitigate unnecessary TLB flushes. For example, ARM TLBs can match entries based on the current exception level, removing the need to flush the TLB when taking an exception [1]. Similarly, on x86 a VM exit into root mode will not cause a TLB flush. In addition, ARM defines a unique application specific identifier (ASID) that an OS assigns to each user process. ASIDs are communicated to the hardware through the top sixteen bits of TTBR0_EL1. During an address translation, the hardware matches TLB entries using the ASID, removing the need to flush the TLB during a context switch. On x86, a similar mechanism, the process-context identifier (PCID), is provided.

3.4 Interrupts

Interrupts on ARM are managed and routed by ARM's Generic Interrupt Controller (GIC). The GIC architecture defines mechanisms to control the generation, routing, and priority of interrupts in a system [2]. The third generation of the GIC architecture, GICv3, defines two components of a GIC: the distributor and the redistributors. Each system has a single distributor, which handles global interrupts and interprocessor interrupts, while each CPU has a redistributor, which manages local interrupts (e.g., those caused by local timers). Interaction with the GIC distributor is provided through an MMIO interface. Interaction with the GIC redistributors is provided through a set of configuration registers.

The GICv3 architecture defines a virtual GIC (vGIC) interface for each CPU that provides virtual interrupt management capabilities to a hypervisor. When virtual interrupts are enabled, all physical interrupts are routed to EL2. Before the target VM is scheduled on a physical CPU, the hypervisor signals for a virtual interrupt to be delivered once execution has returned to EL1. If a VM attempts to configure the GIC distributor through the MMIO interface, the second stage of translation will trap execution into EL2, where the hypervisor can emulate the operation. VM accesses to the GIC system registers are routed to a set of virtual system registers, which control virtual interrupt delivery. During a virtual CPU context switch, the hypervisor must save the writable virtual system register state, the state of any pending virtual interrupts, and the VM-specific virtual interrupt configuration.

3.5 Timers

Each ARM CPU contains two counters that indicate the passage of time: a physical counter and a virtual counter. The virtual counter value is calculated by subtracting an offset from the physical counter. These two counters power a set of generic timers accessible from each CPU mode. The physical counter powers an EL1 timer and an EL2 timer, while the virtual counter powers a virtual timer. All three timers support upcounter and downcounter interrupts that are only generated for the timer's assigned CPU. Traditionally, a hypervisor will use the EL1 timer to preempt virtual machine execution, while virtual machines in EL1 use the virtual timer to preempt user processes. To prevent virtual machines from observing the physical passage of time, any accesses to the EL1 timer from EL1 are trapped into EL2.

4. ARM Hyperkernel Design

4.1 Exception Levels

Similar to Hyperkernel on x86, the ARM version of Hyperkernel required that the kernel and user address spaces be clearly separated. In addition, the ARM version of Hyperkernel needed the ability to easily export low-level hardware resources to user space, such as page table and exception management. Placing Hyperkernel within the correct exception level was crucial for providing these features. This section describes in detail the factors considered when placing Hyperkernel within the ARM programming model.

We had three options for the placement of Hyperkernel user processes: EL0, EL1, and EL2. EL0 does not have access to many system configuration registers, making it too restricted for an exokernel to properly provide low-level hardware control. Placing user processes in EL2 also creates various difficulties. First, Hyperkernel would not be able to assign ASIDs to processes running in EL2, which would result in a TLB flush during every context switch. Second, user processes in EL2 would have full control over TTBR0_EL2. Hyperkernel could be protected by secure memory in EL3, but a malicious process could break isolation by manually changing its page tables to expose another process's physical memory. Third, because virtual interrupts can only be routed to EL1, exporting interrupt handling to EL2 would require software to register interrupt handler upcalls to the kernel, which complicates the features already provided by the vGIC. Thus the most beneficial choice was to place user space in EL1.

With user processes in EL1, Hyperkernel was restricted to either EL2 or EL3. Placing Hyperkernel in EL3 provides the same functionality as placing it in EL2, because EL3 is strictly more privileged than EL2, but features such as the security modes would unnecessarily complicate the implementation of Hyperkernel. Ultimately, Hyperkernel was placed in EL2 and user processes were placed in EL1.

4.2 Virtual Memory Model

Initially, Hyperkernel was designed to execute in EL2 with the MMU disabled, simplifying Hyperkernel even further by removing the need for kernel page tables. In addition, when the MMU is disabled, isolation guarantees between the kernel and user space become significantly easier to verify. Unfortunately, disabling the MMU on ARM has the side effect of permanently disabling the data cache while in EL2. Our experiments, detailed in §6, analyze the performance impact of disabling the MMU while in EL2. Consequently, Hyperkernel was adapted to execute within EL2's address space, which is separate from those of EL1, EL0, and EL3.

In Hyperkernel, user space exclusively uses TTBR0_EL1, which provides a maximum of 2^48 virtual addresses. Any access to virtual memory above the TTBR0_EL1 region traps into Hyperkernel. User processes can modify their page tables by using a set of hypercalls provided by Hyperkernel. Any direct write of TTBR0_EL1 is trapped into Hyperkernel to prevent user processes from violating isolation between the kernel and other processes.

Hyperkernel user processes are aware that they do not have access to portions of their address space. In contrast, a virtual machine expects full control over all virtual addresses. Typically a hypervisor provides this abstraction through stage 2 translations. Hyperkernel disables stage 2 address translation altogether, since it controls EL1 virtual memory through page mapping syscalls. Some hypervisor virtualization duties rely on stage 2 translation, such as emulation of the GIC MMIO interface and access to most I/O devices. Hyperkernel restricts access to these regions of memory by refusing to map any virtual pages on top of these MMIO interfaces.

4.3 TLB

Since ARM TLBs support entry matching using the current exception level, a transition from user space into the kernel does not require a TLB flush. In addition, Hyperkernel takes advantage of ASIDs by managing an ASID for every process. Although ASIDs were originally designed to identify user processes in EL0, they can also be used to identify user processes in Hyperkernel because those processes are forced to use TTBR0_EL1 exclusively. When a specific process is scheduled on a CPU, TTBR0_EL1 is updated with the correct ASID.

4.4 Interrupts

Hyperkernel configures the system to route all physical interrupts to EL2. When Hyperkernel is notified of an interrupt, it determines whether the interrupt should be signaled to a user process in EL1. If the final destination is a process in EL1, Hyperkernel adds the interrupt to a list of pending interrupts for that process; otherwise Hyperkernel handles the interrupt directly. Just before a process is scheduled, its pending interrupts are forwarded to the vGIC, which generates virtual interrupts when the processor returns to EL1. Handling a virtual interrupt in a Hyperkernel user process is no different than handling an interrupt in a typical OS.

4.5 Timers

A user process's view of physical time can advance significantly between the execution of two instructions. In contrast, a virtual machine requires timers that only advance when one of its vCPUs is scheduled. Hyperkernel takes advantage of this by removing the virtual timer logic, which saves on the cost of transitions into EL2. In addition, user processes are given full control of the EL1 timer, which removes the timer emulation logic otherwise required in Hyperkernel. The EL2 timer is used by the Hyperkernel scheduler to preempt processes.

5. Implementation

This section outlines the effort of porting Hyperkernel from x86 to ARM. During the port, we focused on reusing many of the mechanisms already present in the original version of Hyperkernel. Fortunately, some aspects of the x86 and ARM architectures share similarities, allowing significant code reuse between versions.

Page table management. The x86 page table management code was significantly reused for the ARM version of Hyperkernel. This is because both versions of Hyperkernel use 4KB pages, 48-bit virtual addresses, and 4 levels of virtual address translation. Functions that walked the page tables, inserted new page table entries, or allocated new pages only needed small changes in order to work on ARM. The largest change to these functions was factoring out the formatting and setting of permissions within page table entries. Fortunately, the permissions for both descriptor formats were similar, and conversion from one format to the other merely required rearranging the permissions.

Process management. Figure 3 summarizes the information required to maintain a process in Hyperkernel on both x86 and ARM. The field that differs substantially between x86 and ARM is the saved CPU state. On x86, the saved CPU state is stored within a virtual machine control structure (VMCS), while on ARM the saved CPU state is simply a struct that holds the information detailed in Figure 4. During a context switch, hypercall, or trap into EL2, software copies the required state into the saved CPU state struct. During a return into EL1, this state is copied back to the CPU.

Figure 3. The information stored for every process in Hyperkernel on both x86 and ARM: process state (zombie, etc.), kernel stack, saved CPU state, process ID, open file descriptors, pending interrupts, and IPC information.

    Action          NR   State Saved
    Context Switch  31   GP Integer Registers (x0-x30)
                     4   EL1 and EL0 PC/SP
                     5   EL1 Exception Information
                     6   EL1 MMU Information
                     1   EL1 System Configuration
                     1   EL1 Timer Configuration
                     4   EL1 and EL0 Identification
                    15   GIC State
    Hypercall       31   GP Integer Registers (x0-x30)
                     4   EL1 and EL0 PC/SP

Figure 4. The state saved during either a context switch or a hypercall in Hyperkernel. The number of register reads/writes in each group is shown in the NR column.

Drivers and components. Many Hyperkernel functionalities required new drivers and components to communicate with ARM hardware. For example, SMP multicore management operations such as enabling a CPU core are achieved through a power management system called the power state coordination interface (PSCI). A system with EL3 implemented exposes a PSCI conduit through synchronous exceptions into EL3. Hyperkernel generates PSCI calls to EL3, which is expected to perform the necessary MMIO communication to enable or disable CPU cores. Other components implemented include a PL011 UART driver for serial communication, a simple timer module for scheduling, and a GICv3 driver for interrupt management.

5.1 Development Effort

The porting of the Hyperkernel core and a basic init process took 5 person-months and added 3197 SLOC to Hyperkernel. Much of the time and effort porting Hyperkernel to ARM was spent studying the x86 and ARM architectures and their various peripheral devices. Currently Hyperkernel on ARM is missing a complete implementation of user space, which is left as future work.

5.2 Hyperkernel on ARM Hardware

During development we utilized QEMU, a full-system hardware emulator, to test the execution of Hyperkernel on a virtual ARM system. A new bootloader and small modifications to Hyperkernel were required in order to run Hyperkernel on real hardware. This section details the process of running Hyperkernel on a HiSilicon HiKey board that contains a Cortex A53 CPU.

QEMU to HiKey. The virtual ARM boards emulated by QEMU provided an accurate representation of real ARM hardware. The only modification to Hyperkernel was a change to the base address of DRAM and where Hyperkernel was loaded into physical memory during bootup. This was required because the QEMU virtual board placed DRAM at a different starting address than the HiKey board.

Bootloader. Hyperkernel spoofs itself as a Linux kernel image in order for QEMU to recognize and load Hyperkernel into memory during bootup. When running Hyperkernel on real hardware, we required a compatible bootloader to perform a similar operation. Unfortunately, the current ARM version of the GRUB bootloader only supports Linux kernels that contain an EFI stub. Instead of adding an EFI stub to Hyperkernel, we decided to create a custom EFI bootloader using the Linaro EFI developer toolchain. This custom bootloader simply copies Hyperkernel into the start of DRAM, loads a device tree blob adjacent to it, and jumps to the beginning of Hyperkernel.

6. Experiments and Results

The current state of Hyperkernel on ARM includes the kernel and a simple implementation of the init process. In order to run a full benchmark comparing the ARM and x86 versions of Hyperkernel, a complete user space implementation is required. Instead of creating a full benchmark that measures the impact of porting Hyperkernel to ARM, we decided to measure the side effects of a few design decisions of Hyperkernel on ARM using a microbenchmark.

A typical ARM operating system interfaces with user processes through system calls generated by an svc instruction. Since Hyperkernel runs in EL2 instead of EL1, a user process must use an hvc instruction (hypercall) to request a more privileged operation from Hyperkernel. This section analyzes the performance implications of using hypercalls in Hyperkernel instead of system calls. In addition, we evaluate the performance impacts of enabling and disabling the MMU, data cache, and instruction cache to determine whether execution in EL2 is viable when the MMU is disabled.

We performed our experiments with a microbenchmark adapted from Dune [3] that measures the number of cycles required to execute a system call (svc/eret pair) and a hypercall (hvc/eret pair). This microbenchmark executed as the init process on an early version of Hyperkernel on the HiKey Cortex-A53 CPU. The results are shown in Figure 5: when all three features are enabled, a hypercall and a system call require identical numbers of cycles. It is plausible that the svc and hvc pathways are reused within the microarchitecture, because exception routing is a common operation. In addition, this result indicates that the performance differences measured between hypercalls and system calls in past work are caused by the saving of extra state only.

    MMU       D Cache   I Cache   syscall   hypercall
    Disabled  Disabled  Disabled     2200         904
    Disabled  Disabled  Enabled       568          48
    Enabled   Disabled  Disabled     1727         658
    Enabled   Disabled  Enabled        40          36
    Enabled   Enabled   Disabled     1695         659
    Enabled   Enabled   Enabled        36          36

Figure 5. Cycle counts of system calls and hypercalls on an ARM Cortex A53. Each result averages 50 million trials.

The first iteration of Hyperkernel on ARM disabled the MMU when in EL2 in an effort to remove the need for in-kernel virtual memory management. Unfortunately, when the ARM MMU is disabled, the data cache is permanently disabled because the cacheability of memory is stored within page table entries. The same is not true of the instruction cache, which can still be enabled when the MMU is disabled. Our results demonstrate that the instruction cache has a significant effect on the performance of both hypercalls and syscalls. The lack of performance impact when disabling the data cache can be explained by the absence of data accesses between the pairs of instructions benchmarked. When data accesses are inserted between the pairs, the number of cycles increases in magnitude similar to when the instruction cache is disabled. After these measurements we redesigned Hyperkernel to enable the MMU after the creation of a simple identity mapping.

Recent work has shown that a hypercall on x86 can take upwards of 800 cycles on recent microarchitectures, much more expensive than a 70-cycle syscall [4]. Our results indicate that a hypercall on ARM could be consistently faster than a hypercall on x86 if enough optimizations are applied to the transitions into EL2. Measuring the cost of a full hypercall in Hyperkernel is planned as future work.

7. Conclusion

This paper presented the porting of Hyperkernel, an x86 kernel, to the ARMv8-A architecture and the various design challenges faced when shaping it for the ARM programming model. We presented core aspects of the ARM architecture and how they were utilized within Hyperkernel. Our experience shows that the ARM version of Hyperkernel opens the possibility of optimizing both the transitions into Hyperkernel and the abstractions it provides to user processes. We believe that ARM provides a solid foundation for developers to create operating systems that are verified using push-button verification.

References

[1] ARM. ARM Architecture Reference Manual ARMv8-A DDI0487A, 2015.
[2] ARM. ARM Generic Interrupt Controller Architecture Specification IHI0069C, 2016.
[3] A. Belay, A. Bittau, A. Mashtizadeh, D. Terei, D. Mazières, and C. Kozyrakis. Dune: Safe user-level access to privileged CPU features. In Proceedings of the 10th Symposium on Operating Systems Design and Implementation (OSDI), pages 335-348, Hollywood, CA, Oct. 2012.
[4] E. Bugnion, J. Nieh, and D. Tsafrir. Hardware and software support for virtualization. Morgan & Claypool, 2017.
[5] C. Dall and J. Nieh. KVM/ARM: The design and implementation of the Linux ARM hypervisor. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 333-347, Salt Lake City, UT, Mar. 2014.
[6] D. R. Engler, M. F. Kaashoek, and J. W. O'Toole. Exokernel: An operating system architecture for application-level resource management. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP), pages 251-266, Copper Mountain, CO, Dec. 1995.
[7] M. F. Kaashoek, D. R. Engler, G. R. Ganger, H. M. Briceño, R. Hunt, D. Mazières, T. Pinckney, R. Grimm, J. Jannotti, and K. Mackenzie. Application performance and flexibility on exokernel systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP), pages 52-65, Saint-Malo, France, Oct. 1997.