Evaluation of dynamic techniques for full system virtualisation on ARMv7-A

Niels Penneman a,∗, Danielius Kudinskas b, Alasdair Rawsthorne b, Bjorn De Sutter a, Koen De Bosschere a

a Computer Systems Lab, Ghent University, Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium
b School of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK

∗ Corresponding author. Tel.: +32 (0)51 303045. Email addresses: [email protected] (Niels Penneman), [email protected] (Danielius Kudinskas), [email protected] (Alasdair Rawsthorne), [email protected] (Bjorn De Sutter), [email protected] (Koen De Bosschere)

Abstract

We present the STAR hypervisor, the first open source software-only hypervisor for the ARMv7-A architecture that offers full system virtualisation using dynamic binary translation (DBT). We analyse techniques for user-space DBT on ARM and propose several solutions to adapt them to full system virtualisation. We evaluate the performance of a naive configuration of our hypervisor on a real embedded hardware platform and propose techniques to reduce DBT-based overhead. We analyse the impact of our optimisations using both micro-benchmarks and real applications. While the naive version of our hypervisor can get several times slower than native, our optimisations bring down the run-time overhead of real applications to at most 16%.

Keywords: Binary translation, Hypervisor, Instruction set architecture, Virtualisation, Virtual machine monitor

1. Introduction

The term virtualisation is used in many different contexts, ranging from storage and network technologies to execution environments. In this paper we discuss system virtualisation—a technology that enables multiple guest operating systems to be executed on the same physical machine simultaneously by isolating the physical machine from the operating systems by means of a hypervisor. While system virtualisation in the server world has already matured, it is still an area of ongoing research for embedded systems [4, 21, 25]. Virtualisation solutions for data centres and desktop computers cannot be readily applied to embedded systems because of differences in requirements, use cases, and computer architecture. ARM is by far the leading architecture in the embedded and mobile market [45], and over the past 8 years multiple virtualisation solutions have been developed for it.

Most virtualisation efforts for ARM started with paravirtualisation. This is not surprising, because the architecture is not classically virtualisable, as shown by our previous work [39]. Paravirtualisation has many drawbacks [14], which can all be solved by full virtualisation. However, full virtualisation requires special hardware support, i.e. classic virtualisability, or special software support, such as dynamically rewriting the code of a guest at run time, a technique known as dynamic binary translation (DBT).

ARM has recently extended its ARMv7-A architecture with hardware support for full virtualisation [3]. While hypervisors built for these extensions easily outperform DBT-based hypervisors, they do not run on older hardware. Furthermore, DBT remains useful in a handful of scenarios, ranging from full-system instrumentation to legacy system software emulation. It is therefore useful to study DBT to virtualise the ARMv7-A architecture. There is, however, little related work for DBT on the ARM architecture in the context of full system virtualisation. Most research focuses on DBT of user-space applications, but not all user-space techniques apply to full system virtualisation. This paper makes the following contributions:

• We present the STAR hypervisor, the first open source software-only hypervisor for the ARMv7-A architecture that offers full virtualisation using DBT.
• We analyse existing techniques for user-space DBT on ARM and propose several solutions to adapt them to full system virtualisation.
• We analyse the performance of a naive configuration of our hypervisor and propose techniques to reduce the overhead of a DBT engine specific to full system virtualisation on ARM.
• We provide an evaluation of all our techniques on a real embedded hardware platform.

The complete source code of the presented hypervisor is available at http://star.elis.ugent.be. The rest of this paper is organised as follows: Section 2 provides background and discusses problems with existing virtualisation solutions for the ARM architecture. It also discusses why existing user-space DBT techniques for the ARM architecture cannot be readily applied to full system virtualisation. Section 3 provides a brief overview of our hypervisor. In Section 4, we propose techniques specific to system-level DBT for dealing with ARM’s exposed program counter (PC). We analyse the performance of a naive version of our hypervisor in Section 5 to classify the run-time overhead of our DBT engine, focusing specifically on issues with full system virtualisation, and we propose several optimisations. In Section 6, we evaluate how our techniques affect the run-time overhead of the DBT engine on an embedded hardware platform.

2. Background and related work

2.1. Hypervisors for the ARM architecture

Many different virtualisation solutions for the ARM architecture have been designed over the past 8 years. Early designs such as ARMvisor [19], B-Labs Codezero Embedded Hypervisor [6], KVM for ARM [15], TRANGO Virtual Processors Hypervisor [49], NEC VIRTUS [32], VirtualLogix VLX [5, 51], VMware MVP [8] and Xen ARM PV [30, 35] all use paravirtualisation, as their development predates the introduction of ARM’s hardware support for full virtualisation. Such hypervisors have many drawbacks [14]: as paravirtualisation hypervisors present their own, custom interface to guests, each guest must be adapted for each specific hypervisor. Since none of the efforts to standardise such interfaces has spread to more than a few operating systems or hypervisors [2, 36, 42], the majority of operating systems do not support such interfaces out of the box. Moreover, the development, maintenance and testing of patches for guests is tedious and costly. Furthermore, licensing may prevent or restrict modifications to source code, and often imposes rules on the distribution of patch sets or patched code.

KVM for ARM and Xen ARM PV were superseded by KVM/ARM and a new Xen port in which paravirtualisation support was dropped in favour of ARM’s hardware support [16, 17, 53]. VirtualLogix VLX was used by Red Bend software as the basis for vLogix Mobile, which now also uses ARM’s hardware support instead of paravirtualisation [41]. Some hypervisors support both paravirtualisation and full virtualisation, such as PikeOS [34, 48], OKL4 [26, 50] and Xvisor [54]. PikeOS and OKL4 are commercial microkernel-based hypervisors specifically designed for real-time systems virtualisation. They rely on paravirtualisation for real-time scheduling decisions.

Hypervisors built for ARM’s hardware extensions easily outperform DBT-based hypervisors, but do not run on older hardware. Furthermore, DBT remains useful by virtue of its versatility [39]. Virtualisation is often considered a solution to address software portability issues across different architectures, but hardware extensions merely recreate this problem at a different level. DBT can transparently enable a multitude of other virtualisation usage scenarios such as legacy system emulation and optimisation [13, 20, 24], full system instrumentation and testing [11], optimisations across the border between operating system kernels and applications [1], and even load balancing in heterogeneous multi-core systems [31, 52].

The ITRI Hypervisor presents an interesting case as it uses a combination of static binary translation (SBT) for CPU virtualisation and paravirtualisation for memory management unit (MMU) virtualisation [44]. SBT can be regarded as paravirtualisation, although it does not require the sources of the guest kernels and hence aims to be a more generic approach. Any benefits of SBT over traditional paravirtualisation are however lost due to MMU paravirtualisation. Furthermore, SBT cannot support self-modifying code, and complicates dynamically loading code at run time, as is often done in operating system kernels for device drivers and other kernel modules. The ITRI Hypervisor therefore only supports non-modular kernels. SBT on ARM is hard, because it is common for data to be embedded within code sections, and code may use a mix of 32-bit ARM and variable-width Thumb-2 instructions in which the encoding of each byte can only be deduced by control flow analysis. In the absence of symbolic debug information, extensive analyses and heuristics are required to distinguish code from data [12, 28], and any uncertainties must be dealt with at run time using a combination of interpretation and DBT. There is however no mention of such techniques in any of the ITRI Hypervisor documents [43, 44]. Relying on the availability of symbolic debug information makes SBT-based virtualisation no better than traditional paravirtualisation: symbolic debug information is often not available for commercial software, and its representation may vary across operating systems, so that a translator must support each of them.

2.2. Classic virtualisability

Popek and Goldberg [40] formally derived sufficient (but not necessary) criteria that determine whether an architecture is suitable for full virtualisation, currently referred to as classic virtualisability. They made an abstract model of a computer architecture with two privilege levels, and a classification of instructions based on their interactions with the state of the architecture. They define privileged instructions as instructions that trap when executed in the unprivileged mode, but not in the privileged mode. They define sensitive instructions as instructions whose behaviour depends on the mode and configuration of resources in the system, or whose behaviour is to alter the mode and/or configuration. They proved that an “efficient” hypervisor can be constructed if all sensitive instructions are also privileged. Such a hypervisor is known as a trap-and-emulate hypervisor, which gets its efficiency from being able to run all original user-space guest software as is, with hypervisor interventions only needed when guest kernel code is executed.

In our previous work [39], we have extended Popek and Goldberg’s architectural model to contemporary architectures, and we have shown that the ARMv7-A architecture is not classically virtualisable. We have used our extended model to analyse the feasibility of a DBT-based hypervisor for the ARMv7-A architecture, and studied how DBT affects the behaviour of instructions that are otherwise innocuous. We have used our results to construct the DBT engine of the hypervisor presented in this paper.

2.3. DBT on ARM

DBT engines translate short sequences of binary instructions as needed during the execution of a program. The translated code is stored so that it can be reused. Figure 1 depicts the operational cycle of a generic DBT engine.


Figure 1: Operational cycle of a generic dynamic binary translation engine

The DBT engine starts by translating the first block of instructions of a program. The translator finalises a block when it encounters a control flow instruction. Such instructions are translated into a trap to the DBT engine. The translated code is then executed until the trap. The trap is handled by emulating the original control flow instruction, after which the next address is known from which translation must continue. Next, the DBT engine checks if the code at the target address has already been translated, and if not, it invokes the translator. It then resumes executing the translated code. A DBT engine used for full system virtualisation will also require sensitive instructions to be replaced and interpreted in addition to control flow instructions.

QEMU [9] and Varmosa [29], a product of a VMware-funded MIT project, are the only hypervisors capable of fully virtualising the ARM architecture by means of DBT. They therefore do not depend on any kind of hardware support for full virtualisation. QEMU is a hosted hypervisor which runs in user space. Its DBT engine translates all instructions to an intermediate representation, and then to the instruction set of the host machine, such that it can perform cross-architecture virtualisation. Because it runs in user space, its design is fundamentally different from the solution presented in this paper: it cannot benefit from privilege separation and it can hence not enforce proper isolation of hypervisor and guest. Its translation techniques therefore do not apply to bare-metal hypervisors. Varmosa targets the dated ARMv4 architecture, and its translation techniques have unfortunately never been published.

A VMware talk and a patent on DBT techniques for ARM, published by members of the team that supported Varmosa, reveal that they used in-place translation [10, 18], a technique also known as patching [46]. The general idea is that traps to the DBT engine can be injected directly into the guest kernel’s original code stream for both sensitive and control flow instructions—the translations therefore overwrite the original code. The technique is flawed by design, however. The MMU on the ARMv7-A architecture does not distinguish between data accesses and instruction fetches in enforcing access permissions on memory mappings. For code to be executable, it must also be readable [3]. Therefore, a guest can observe modifications to its code made by the hypervisor’s DBT engine. This is problematic in a number of scenarios, such as code that verifies itself, code that performs self-modifications, or code that relocates itself in memory, even if the guest correctly uses cache maintenance operations to indicate its code has changed. The hypervisor can at best detect that a guest has corrupted itself, without any way to recover. Although self-modifications of kernel code may be rare in practice, relocations and verification are more common. It is necessary for a hypervisor to support all these cases because one of the core strengths of full virtualisation is the ability to virtualise a priori unknown operating systems.

The problems with in-place translation can be solved by storing the translated code separately from the original code. The translated code is then stored in a software cache, at a different virtual address than the original code. Translation to a software cache makes room for binary code optimisations, as the size of translated blocks is no longer limited to the size of the original blocks. However, as the PC in the ARM architecture is exposed as a “general-purpose” register that is usable in several instructions ranging from arithmetic to memory operations, all instructions whose behaviour depends on their PC value must be translated in addition to control flow and sensitive instructions [23, 38, 39]. On ARM, almost all ALU instructions as well as load and store instructions can use the PC as source or address operand. Such instructions are quite frequent and a DBT engine should therefore avoid having to intervene at run time as much as possible.

Solutions have already been provided by the authors of Pin [23] and Strata-ARM [38]. Instructions that access the PC are translated into equivalent sequences that do not require run-time intervention of the DBT engine. The DBT engine constructs the original address of the code into a substitute register and then translates the original instruction into one that uses the substitute register. This technique works as long as the DBT engine can find a suitable substitute register. For instructions that write their result into a destination register that is not reused as source operand, the destination register can be used as substitute. Otherwise, another register must be freed by saving its value to a spill location. It is then used as substitute for the PC in the translated instruction, and subsequently restored. There are two ways to do this without run-time intervention of the DBT engine:

1. by reusing the application’s stack; or
2. by performing a store to and load from a dedicated spill location using a PC-relative addressing mode that does not involve other registers for addressing.

Method 1 is used in Pin [23]. It complicates the translation of instructions that use the stack pointer, because the spill and restore operations alter the stack pointer. It also assumes that the stack pointer is valid. While this assumption holds for most user applications, it does not hold for kernel code, as kernels manage their own stacks, and the hypervisor cannot rely on those stacks being set up or usable at all. Even if stacks are set up, introducing extra stack accesses may cause unintended side effects. This method therefore does not work for system-level DBT.

Method 2 does not corrupt the stack but can only be used to address a spill location in memory at most 4 kB before or after the current instruction¹. Because on ARM the smallest granularity of memory protection is 4 kB, this requires part of the software cache to be writable by the translated code. While it is possible to interleave read-only pages of code with dedicated read-write spill pages, this would dedicate half of the virtual address space of the software cache to storing the value of a single spilled register. The practical alternative is to make the entire software cache writable to the translated code. This approach is used in Strata-ARM [38]. Making the translated code writable to itself is never a good idea, however. Even if all translated instructions observe their native PC value, they can still access the software cache directly using absolute addresses.
Since such accesses cannot be detected, faulty code can corrupt hypervisor memory, and malicious code can inject specific instructions to escape control of the DBT engine, e.g., by avoiding the trap at the end of a translated block. While this may not be a problem in user-space DBT engines, it is unacceptable in a hypervisor that relies on DBT to isolate guests from one another and from the hardware. In user-space DBT engines such as Strata-ARM, software caches are writable by design, as both DBT engine and translated code run in user space, and hence cannot benefit from hardware-enforced privilege separation. A bare-metal hypervisor can leverage hardware techniques to isolate translated code, but doing so implies that translated code cannot spill and restore registers using the same techniques as found in user-space DBT engines.

¹ This is an instruction set limitation; see LDR (literal) and STR in [3].


Figure 2: Native vs. virtualised privilege levels

3. The STAR hypervisor

The STAR (Software Translation for ARM) hypervisor is a bare-metal hypervisor that fully virtualises the ARMv7-A architecture on top of an ARMv7-A-based platform. It runs unmodified guest operating systems, decoupled from the hardware through DBT, and uses shadow translation tables to virtualise the MMU.

3.1. Architectural overview

The basic ARMv7-A architecture offers only two privilege levels. There are several equally privileged modes, but only one unprivileged mode, user mode. In a traditional operating system, the privileged modes are used for the kernel, and the unprivileged mode is used for user applications. When virtualised, only the hypervisor is privileged and guest operating systems together with their user applications must be executed in the unprivileged mode, as shown in Figure 2. The hypervisor thus gains the additional responsibility of enforcing privilege separation between the guest kernels and their user applications. In order to make this work, the hypervisor has to solve two important problems.

Firstly, as discussed in Section 2, executing a guest operating system kernel in the unprivileged mode causes all sensitive instructions shown in Table 1 to behave differently, because the ARMv7-A architecture is not classically virtualisable [39]. Instructions from guest kernels must therefore be translated. Instructions from user applications behave identically under native and virtualised execution. User applications can therefore be executed without translation and without any kind of supervision by the DBT engine.

Secondly, the MMU must be shared between the hypervisor and the guest: the guest kernel will create memory mappings, but the hypervisor must also be able to make its own mappings and to modify the guest’s mappings. This requires the use of shadow translation tables.

These two problems define the tasks of the two largest components of our hypervisor: the DBT engine and the virtual MMU. In this paper, we mainly focus on the challenges in the construction of our DBT engine for the 32-bit ARM instruction set. We nevertheless provide a brief overview of how shadow translation tables work on ARM, as this has important consequences for the DBT engine.

Our hypervisor aims to be usable for several different use cases, including full-system debugging and instrumentation. The naive version of our hypervisor must therefore avoid techniques that cause guest-observable differences in behaviour. In Section 5, we present optimisations over our naive version, which can be enabled or disabled based on the usage scenario of the hypervisor, as some of these optimisations may, in rare cases, cause observability issues.

3.2. MMU virtualisation

Operating systems use an MMU for various tasks ranging from address translation and memory protection to cache configuration. All these tasks are administered through translation tables. A hypervisor essentially performs the same tasks: it uses address translation for itself and for relocating guests within physical memory. The hypervisor needs to isolate its own memory from guests, and guests from one another. Guest memory may need to be protected to keep the hypervisor’s software caches consistent with the guest. Furthermore, the hypervisor needs complete control over the hardware caches. The hypervisor therefore sets up its own translation tables, which are used together with the guests’ translation tables.

The MMU in the bare ARMv7-A architecture can only use one set of translation tables at a time. Combining the hypervisor’s translation tables with a guest’s translation tables requires a software mechanism that merges the tables into a single set of shadow translation tables [1, 7, 46].

Operating systems enforce privilege separation between kernel and user space by specifying different permissions for memory accesses from privileged and unprivileged modes in their translation tables. As shown in Figure 2, once virtualised, both the guest kernel and its user applications run in the unprivileged mode. The MMU can therefore no longer distinguish between a guest’s kernel and its user applications. To enforce privilege separation in the guest, the hypervisor creates two sets of shadow translation tables: a privileged set, which is set up to enforce access permissions for the guest’s kernel, and an unprivileged set, configured to enforce access permissions for guest user applications. The MMU is configured by the hypervisor with the appropriate set of shadow translation tables depending on the guest’s virtualised privilege level. This approach is known as double shadowing [19].
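As an illustration of double shadowing, the following C sketch shows how a hypervisor could switch between the two shadow translation table sets when the guest’s virtualised privilege level changes. The structure and helper names are assumptions made for this example, not the actual interface of our hypervisor.

#include <stdbool.h>
#include <stdint.h>

struct guest_context {
    uint32_t *shadow_priv;      /* shadow tables enforcing guest kernel permissions */
    uint32_t *shadow_unpriv;    /* shadow tables enforcing guest user permissions   */
    bool      virt_privileged;  /* virtualised privilege level of the guest         */
};

/* Hypothetical helper: installs a first-level translation table in the MMU
 * (write to TTBR0) and performs the required TLB maintenance. */
extern void mmu_install_tables(uint32_t *first_level_table);

static void guest_set_virtual_privilege(struct guest_context *g, bool privileged)
{
    if (g->virt_privileged == privileged)
        return;                                    /* no change, nothing to do */
    g->virt_privileged = privileged;
    mmu_install_tables(privileged ? g->shadow_priv : g->shadow_unpriv);
}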

3.3. Sensitive instructions

Table 1 provides an overview of all 32-bit ARM instructions that require special handling by our DBT engine, based on our previous analysis of the ARM architecture [39]. Table 1 contains both control flow instructions and sensitive instructions. Control flow must be tracked in order to keep control over the execution of a guest kernel. Sensitive instructions must be interpreted or translated into equivalent instruction sequences, because their behaviour changes when they are executed in unprivileged mode (right of Figure 2) instead of the originally intended privileged mode (left of Figure 2). We briefly reiterate our findings from our previous analysis to clarify which instructions are sensitive and why.

Guests must be isolated from the physical hardware platform; instructions that access physical devices such as coprocessors are therefore sensitive. As guests are always executed in the unprivileged mode, the DBT engine becomes responsible for emulating the different modes that exist natively. As a result, all instructions that depend on the current mode or that alter the current mode, including exception return instructions, are also sensitive.

The DBT engine maintains a shadow register file, containing all general-purpose registers, the program status registers, etc. On ARM, some registers are banked, i.e., physically duplicated, for different modes; e.g., almost every mode has its own stack pointer (SP) and link register (LR). Our shadow register file stores all copies of the banked registers separately. Instructions that access banked registers are sensitive and must be translated to access the shadow register file. For example, LDM and STM instructions can be used to access the SP and LR of the unprivileged user mode from a privileged mode. A kernel uses them to back up the state of a user application, such as when switching tasks.

LDRT, STRT and similar instructions perform memory accesses as if they were executed from the unprivileged mode, regardless of their actual privilege level. They are used by a kernel to access memory with the permissions of an application; this is necessary to efficiently enforce privilege separation and to support copy-on-write semantics when copying data to or from user applications within kernel code. Such instructions no longer behave correctly when double shadowing is used, as the permissions for the guest’s unprivileged mode are unknown to the MMU at the time guest kernel code is being executed. Therefore, all such instructions are also sensitive.

Table 1: Control flow and sensitive instructions in the 32-bit ARM instruction set

Instruction                                                               Control flow   Sensitive
ALU∗ branches (e.g., MOV pc, lr)                                               •              ◦
ALU∗ exception returns (e.g., MOVS pc, lr)                                     •              •
B, BL, BLX, BX, BXJ branches                                                   •              ◦
CPS, MSR mode changes                                                          ◦              •
CDP, LDC, MCR, MCRR, MRC, MRRC, STC coprocessor accesses                       ◦              •
LDM, LDR, POP branches                                                         •              ◦
LDM, RFE exception returns                                                     •              •
LDM, STM user mode multiple register restore and save operations              ◦              •
LDRT, LDRBT, LDRSBT, LDRHT, LDRSHT, STRT, STRBT, STRHT
  load and store as unprivileged                                               ◦              •
MRS, MSR, SRS accesses to banked registers, the CPSR and the SPSR              ◦              •
SVC system calls                                                               •              •
WFE, WFI sleep instructions                                                    ◦              •

∗ In the 32-bit ARM instruction set, all ALU instructions that write their result into a register can be used as a branch or exception return. They are ADC, ADD, AND, ASR, BIC, EOR, LSL, LSR, MOV, MVN, ORR, ROR, RRX, RSB, RSC, SBC and SUB.
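For illustration only, a shadow register file as described in Section 3.3 could be laid out as in the C sketch below; the exact layout used by our hypervisor may differ, and the field names are ours. The banked stack pointers, link registers and saved program status registers are stored per mode, and FIQ additionally banks R8 to R12.

#include <stdint.h>

struct banked_regs {
    uint32_t sp, lr, spsr;      /* banked SP, LR and SPSR of one mode */
};

struct shadow_register_file {
    uint32_t r[13];             /* R0-R12 (R8-R12 of the non-FIQ modes)  */
    uint32_t pc;                /* native PC value of the guest          */
    uint32_t cpsr;              /* virtual CPSR, including the guest mode */
    uint32_t sp_usr, lr_usr;    /* user/system mode: no SPSR             */
    struct banked_regs svc, abt, und, irq, fiq;
    uint32_t r8_fiq[5];         /* FIQ additionally banks R8-R12         */
};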

3.4. Operation of the DBT engine

Our DBT engine consists of an instruction decoder and encoder, a translator, an interpreter, a software cache for translated code, called the code cache, and a second software cache to hold metadata about translated code, called the metadata cache. Figure 3 depicts the operational cycle of our DBT engine, adapted from the generic cycle shown earlier in Figure 1. Translated guest kernel code executes from the code cache in unprivileged mode. Two such translated code blocks in the code cache are shown on the left of Figure 3. Guest user applications are not touched by our DBT engine as already explained in Section 3.1. Our interpreter emulates control flow and sensitive instructions in a privileged mode on the guest’s virtual machine state. After interpretation, we check whether or not the guest has switched its execution mode from a kernel mode to user mode. If so, the DBT engine is exited, and we perform a context switch to the guest’s user application. When the guest remains in a kernel mode, the flow is the same as discussed in Section 2.3.

Traps to the interpreter are implemented using the SVC instruction, which is normally used by an operating system to implement system calls. When the hypervisor receives a system call trap, it first checks whether the trap came from the translated code, or from a guest’s user application. In the latter case, the trap is forwarded to the guest’s exception handlers. For each trap to the interpreter, our DBT engine must look up the metadata of the corresponding block to determine which instruction was replaced, and which interpreter function must be invoked. To ensure a fast look-up process with O(1) complexity, we implement our metadata cache as a fixed-size array, and use the indexes of the cache items as operands to the SVC instructions that terminate our translated blocks. Another look-up process is required to find the next translated block in the metadata cache, based on the native program counter (PC) value. We implement this look-up using hashing with open addressing; this means that newly translated blocks are assigned an entry in the metadata cache based on their source address.


Figure 3: The basic operational cycle of our DBT engine

The translator operates on the instructions of the source block one by one. Instructions are decoded to determine whether or not they can be copied to the code cache as is. Most instructions do not alter control flow, and are not sensitive in any way; they are copied as is. For control flow and sensitive instructions, the decoder decides whether they are replaced by traps to the interpreter, or by equivalent sequences of instructions that can be executed without run-time intervention of the DBT engine. For example, instructions that make use of the PC but are otherwise innocuous are translated to make sure they observe the value of the PC as if they were executed natively.
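The sketch below illustrates in C how the two look-up processes described above could be implemented: dispatching a trap through the SVC immediate, and finding an existing translation by native PC using open addressing. All identifiers and sizes are illustrative assumptions; the hypervisor’s actual code differs.

#include <stdint.h>
#include <stddef.h>

#define METADATA_ENTRIES 4096u               /* assumed capacity of the metadata cache */

struct guest_state;                          /* full virtual machine state (elided)    */

struct block_metadata {
    uint32_t source_pc;                      /* native address of the block's first instruction */
    uint32_t replaced_instr;                 /* the control flow/sensitive instruction replaced */
    void   (*emulate)(struct guest_state *, uint32_t instr);
};

static struct block_metadata metadata_cache[METADATA_ENTRIES];

/* Called from the SVC handler when the trap came from the code cache: the SVC
 * immediate encodes the index of the block's metadata entry (always in range,
 * since the translator assigned it when the block was created). */
static void dispatch_block_exit(struct guest_state *vm, uint32_t svc_instr)
{
    struct block_metadata *m = &metadata_cache[svc_instr & 0x00FFFFFFu];
    m->emulate(vm, m->replaced_instr);
}

/* Finds an existing translation by native PC, using open addressing with
 * linear probing; an empty slot means the code has not been translated yet. */
static struct block_metadata *find_translation(uint32_t native_pc)
{
    uint32_t slot = (native_pc >> 2) % METADATA_ENTRIES;
    for (uint32_t probed = 0; probed < METADATA_ENTRIES; probed++) {
        struct block_metadata *m = &metadata_cache[slot];
        if (m->source_pc == native_pc)
            return m;                        /* already translated */
        if (m->source_pc == 0)
            return NULL;                     /* miss: translate a new block */
        slot = (slot + 1) % METADATA_ENTRIES;
    }
    return NULL;                             /* cache full: flush and retranslate */
}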

3.5. Design choices and limitations

Our design of the metadata cache imposes two important constraints on translated blocks: blocks have exactly one entry point and exactly one exit point. The first constraint is a result of our requirement for efficient metadata cache look-ups: by using hashing based on the source address of the first instruction of a translated block, we cannot easily look up blocks using the source address of other instructions in the block. This causes some code fragments (i.e., those reachable through branches as well as a fall-through path) to be translated more than once, but requires less metadata to be maintained for each translated block.

Our DBT engine replaces conditional control flow and conditional sensitive instructions with unconditional traps to the interpreter, resulting in the second constraint of having exactly one exit point. This also simplifies the metadata we store for each translated block, because the metadata stores which instruction is replaced and which interpreter function should be invoked to emulate its behaviour. By letting each unconditional trap replace only one instruction, the metadata structure is kept simple and fixed in size for all translated blocks. Replacing conditional instructions by unconditional traps causes unnecessary traps for instructions whose condition would normally not be met; these traps can be eliminated without altering the structure of the metadata as will be shown in Section 5.

Unlike existing user space DBT engines, the STAR hypervisor features a bounded code cache with a fixed size of 1 MB. When it is full and more code needs to be translated, all previously translated blocks are unconditionally flushed from the metadata cache and the code cache. In theory, it is possible to selectively clean cold blocks from the cache. In practice, however, such techniques cause substantial run-time overhead as they complicate software cache management, and they require more metadata to be stored to enable optimisations that create links between translated blocks, and that will be presented in Section 5.

3.6. Translating PC-sensitive instructions

ARM instructions frequently use the PC as a source operand: as they can only embed small constants of at most 16 bits, larger constants are placed in between code, in so-called literal pools, which are accessed through instructions that use PC-relative addressing. All such instructions should be translated without trapping to the interpreter if they do not alter control flow and if they are not sensitive in the classic sense according to Popek and Goldberg [40]. They are:

• the move and shift instructions MOV, MVN, ASR, LSL, LSR, ROR and RRX, which copy the PC into another register, optionally shifting or complementing its value;
• the generic computation instructions ADC, ADD, AND, BIC, EOR, ORR, RSB, RSC, SBC and SUB, which can be used to calculate addresses based on the PC;
• the test and compare instructions CMN, CMP, TEQ and TST, which compare the value of the PC to a constant value or to the value of another register without storing the result, only updating the condition flags;
• the load and store instructions LDR, LDRB, LDRD, LDRH, LDRSB, LDRSH, STR, STRB, STRD and STRH, which use PC-relative addressing to access literal pools; and
• the store instructions STR and STM, as they can store the value of the PC to memory.

Except for STM, the translations of all these instructions are straightforward; example translations are shown in Table 2. The simplest case is the MOV instruction of example (1): we translate a MOV instruction that copies the PC into another register into an equivalent pair of MOVW and MOVT instructions. Moore et al. [38] instead embed the PC value as a constant in the code cache and use a PC-relative load to retrieve it, because their techniques target ARMv5, which predates the introduction of the MOVW and MOVT instructions.

In general, if an instruction writes to some destination register Rd that is not also used as source register, that Rd can be reused as substitute for the PC, as shown in example (2). When Rd is used as source register, or in the absence of some Rd, an unrelated register must be used instead. We do not perform full liveness analysis on the block to be translated to find such a register, as blocks are small and the opportunities for finding available registers are limited. Instead, we always spill and restore an unrelated register before and after the translation. Since we propose multiple techniques for spilling and restoring in Section 4, the examples in Table 2 use the SPILL and REST pseudo-instructions.

Translations of conditional instructions, such as in example (3), can only be made wholly conditional if the condition flags remain the same throughout the translation. Otherwise, we skip the translation using a conditional branch with a condition code opposite to the one of the original instruction, as shown in example (4).

Moore et al. [38] fold computations on the PC if all operands are known at translation time. For example, in translations of shift instructions, they directly write the shifted value of the PC to the destination register, saving one instruction in the translation. Folding has limited applicability, as it cannot eliminate updates of the condition flags. Furthermore, it is a micro-optimisation as it at most avoids emitting one ALU instruction. It is in other words not essential; our DBT engine does not yet support it.

Translating STM instructions requires an approach that is very different from the other instructions. STM instructions are used to store two or more given registers to memory with a single instruction. Their most common use case is to push registers to the stack, but handcrafted assembly may use them in entirely different ways. We therefore need a generic translation algorithm. In a limited number of cases, the PC can be substituted with another register. If not, the STM must be split. The simplest way is to split the STM into individual stores. However, this may require up to 16 stores, inflating the size of the translated code. We therefore propose an algorithm that splits the original STM into one STM of all registers other than the PC, and an individual store of the PC.

Table 2: Example translations of PC-sensitive instructions

    Example PC-sensitive instruction          Translation
(1) MOV Rd, PC                                MOVW Rd, #(NativePCValue[15:0])
                                              MOVT Rd, #(NativePCValue[31:16])
(2) ADD Rd, PC, #constant                     MOVW Rd, #(NativePCValue[15:0])
                                              MOVT Rd, #(NativePCValue[31:16])
                                              ADD Rd, Rd, #constant
(3) ADD Rd, PC, Rd                            SPILL Rs
                                              MOVW Rs, #(NativePCValue[15:0])
                                              MOVT Rs, #(NativePCValue[31:16])
                                              ADD Rd, Rs, Rd
                                              REST Rs
(4) ADDS Rd, PC, Rd (with c ≠ AL)             B<¬c> skip
                                              SPILL Rs
                                              MOVW Rs, #(NativePCValue[15:0])
                                              MOVT Rs, #(NativePCValue[31:16])
                                              ADDS Rd, Rs, Rd
                                              REST Rs
                                        skip:
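As a concrete illustration of examples (1) and (2), the C sketch below emits the MOVW/MOVT pair that materialises the native PC value in a register, using the ARMv7-A A1 encodings with the AL condition; conditional variants would additionally patch the condition field. The helper name is ours and the code is a sketch, not our hypervisor’s actual emitter.

#include <stdint.h>

/* Writes "MOVW rd, #(value & 0xFFFF)" followed by "MOVT rd, #(value >> 16)"
 * at the code cache position *code and returns the next free position. */
static uint32_t *emit_movw_movt(uint32_t *code, uint32_t rd, uint32_t value)
{
    uint32_t lo = value & 0xFFFFu;
    uint32_t hi = value >> 16;

    /* MOVW (A1): cond 0011 0000 imm4 Rd imm12, here with cond = AL (0xE). */
    *code++ = 0xE3000000u | ((lo & 0xF000u) << 4) | (rd << 12) | (lo & 0x0FFFu);
    /* MOVT (A1): cond 0011 0100 imm4 Rd imm12. */
    *code++ = 0xE3400000u | ((hi & 0xF000u) << 4) | (rd << 12) | (hi & 0x0FFFu);
    return code;
}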

All STM instructions store registers to consecutive words in memory in the order of their index: R0 first, PC (R15) last. There are four different addressing modes: increment after (IA), increment before (IB), decrement after (DA) and decrement before (DB). For incrementing modes, the address to which the first register is stored is the base address, optionally incremented by one word with the “before” mode. In decrementing modes, the last address is the word before or at (“after”) the base address; the first address thus depends on the number of registers being stored. All addressing modes also support “writeback”, i.e., updating the value of the base address register at the end of the operation.

Algorithm 1 shows how we translate STM instructions. In all cases we spill to obtain a scratch register Rs. If there is some Rs different from the base address register, and indexed lower than the PC but higher than all other registers being stored, then we do not split the STM as Rs can substitute the PC (lines 8 and 19). Otherwise, we split the instruction into a pair of STM and STR instructions; the base address must then be corrected to make up for any differences caused by storing one register less in the translated STM. For decrementing addressing modes, the base address must be adjusted upfront (line 13). The addressing mode of the STR instruction can always be chosen such that no further arithmetic is required to reach the final value of the base address register (lines 21 to 30).

Spilling can be avoided if the STM is split, and one of the registers other than the PC is used as scratch register. The scratch register can then be restored from guest memory. However, such optimisations may be observed by the guest. We therefore omitted them from our algorithm. If any of the translated instructions causes an MMU fault, our approach introduces extra overhead in handling this fault compared to splitting the STM into individual stores: depending on the location of the abort, both the base address register and Rs must be restored. When there are few such faults, however, the run-time overhead of our solution is lower compared to using individual stores. In practice, such MMU faults rarely occur in kernel code. This validates our design choice.

Algorithm 1: 32-bit ARM STM instruction translation
Data: OriginalSTM, OriginalPC
Result: an instruction sequence of which the functional behaviour is equivalent to OriginalSTM, and which can be executed from a virtual address different from OriginalPC

 1  C ← get_condition_code(OriginalSTM)
 2  Rn ← get_base_address_register(OriginalSTM)
 3  Registers ← get_registers_to_store(OriginalSTM) \ { 15 }
 4  Rs ← max(Registers ∪ { Rn }) + 1
 5  HaveSubstitute ← Rs < 15
 6  TranslatedSTM ← OriginalSTM
 7  if HaveSubstitute then
 8      set_registers_to_store(TranslatedSTM, Registers ∪ { Rs })
 9  else
10      set_registers_to_store(TranslatedSTM, Registers)
11      Rs ← min({ 0, 1 } \ { Rn })
12      if not is_incrementing(OriginalSTM) then
13          emit(SUB<C> Rn, Rn, #4)
14      emit(TranslatedSTM)
15  emit(SPILL<C> Rs)
16  emit(MOVW<C> Rs, #OriginalPC[15:0])
17  emit(MOVT<C> Rs, #OriginalPC[31:16])
18  if HaveSubstitute then
19      emit(TranslatedSTM)
20  else
21      if is_incrementing(OriginalSTM) = is_write_back(OriginalSTM) then
22          if is_incrementing(OriginalSTM) = is_before(OriginalSTM) then
23              emit(STR<C> Rs, [Rn, #4]!)
24          else
25              emit(STR<C> Rs, [Rn], #4)
26      else
27          Offset ← 4 · |Registers|
28          if is_incrementing(OriginalSTM) = is_before(OriginalSTM) then
29              Offset ← Offset + 4
30          emit(STR<C> Rs, [Rn, #Offset])
31  emit(REST<C> Rs)
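As a worked illustration of Algorithm 1 (our own example, not taken from measured guest code), consider STMDB R6!, {R0, R1, PC}. Here Registers = {0, 1}, Rn = R6 and Rs = R7, so the STM is not split; the translation becomes SPILL R7; MOVW R7, #(NativePCValue[15:0]); MOVT R7, #(NativePCValue[31:16]); STMDB R6!, {R0, R1, R7}; REST R7. Because R7 is indexed higher than R0 and R1 but lower than the PC, it occupies exactly the memory word the PC would have used. For STMIA SP!, {R0-R12, LR, PC} no such substitute exists, so the instruction is split: the algorithm emits STMIA SP!, {R0-R12, LR}, followed by SPILL R0; MOVW R0, ...; MOVT R0, ...; STR R0, [SP], #4; REST R0, which places the PC value in the last memory word and leaves SP with the same final value as the original instruction.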

3.7. Guest exception handling

Handling exceptions in translated code is a problem typical to system-level virtual machines. Exceptions can occur either synchronously or asynchronously. Synchronous exceptions are tied to the execution of a particular instruction; e.g., a faulting memory operation, an undefined instruction, or a system call. Asynchronous exceptions occur due to external influences, such as an interrupt from a peripheral device. Guest exceptions outside translated code, i.e. in guest user space, are handled by forwarding the exception to the guest’s exception handler. Exceptions in translated code are more complex to handle, as the code cache cannot be exposed to the guest, and the guest state must be consistent upon delivery of an exception.

3.7.1. Synchronous exceptions

System calls and undefined instructions in a guest’s kernel can be recognised at translation time, while faulting memory operations must be handled at run time. For exceptions handled at run time, the hypervisor must map the PC value at which the exception occurred to the native PC value of the guest, before delivering the exception to the guest’s kernel. Such mapping can be achieved by maintaining a mapping of code cache addresses to native addresses, or through retranslation. Storing the address mapping for every instruction that can fault consumes a lot of memory. In practice, few instructions cause MMU faults. We have therefore chosen to use the retranslation approach. The native addresses are known on the boundaries of a translated block. In order to map an address in the code cache to a native address, we retranslate the block without writing to our software caches, up to the location of the faulting instruction.

When exceptions are caught in code that spills and restores registers, the hypervisor must first make sure all register values are restored such that the guest is in a consistent state, before delivering the exception to the guest’s kernel, as the guest’s exception handler cannot return within the translated block. Providing the guest with the original PC value of the exception ensures that the guest cannot observe the presence of our hypervisor. More importantly, guest fault handling code may depend on the address at which the exception occurred; one such dependency can be found in the do_page_fault function of the kernel.
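The retranslation-based address mapping can be sketched in C as follows; the helper functions are assumptions introduced for this example, and in reality the translator itself is reused in a mode that emits nothing.

#include <stdint.h>

struct block_metadata;                         /* per-block metadata, as before */

/* Assumed helpers: fetch a guest instruction, compute the size in bytes of its
 * translation without writing to the software caches, and query block metadata. */
extern uint32_t read_guest_instruction(uint32_t native_pc);
extern uint32_t translation_size_of(uint32_t instruction);
extern uint32_t block_native_start(const struct block_metadata *m);
extern uint32_t block_cache_start(const struct block_metadata *m);

/* Maps a faulting address inside a translated block back to the corresponding
 * native guest address by replaying the translation up to the fault. */
static uint32_t map_to_native_pc(const struct block_metadata *m, uint32_t fault_cache_pc)
{
    uint32_t native_pc = block_native_start(m);
    uint32_t cache_pc  = block_cache_start(m);

    while (cache_pc < fault_cache_pc) {
        cache_pc  += translation_size_of(read_guest_instruction(native_pc));
        native_pc += 4;                        /* 32-bit ARM instructions only */
    }
    return native_pc;                          /* native address to report to the guest */
}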

3.7.2. Asynchronous exceptions

Asynchronous exception delivery is postponed until the end of a translated block. This increases interrupt latency but avoids the cost of mapping the guest’s current PC value in the code cache to its native PC value. Furthermore, when an asynchronous exception would be delivered to a guest in the middle of a translated block, the guest would return to an address that has already been translated, but as our translated blocks only have a single entry point, the translation would not be usable. We would therefore have to translate a new block, starting from the return address. Since asynchronous exceptions can arrive on every instruction, the number of new translations that would be created during exception handling is virtually unbounded. Breaking blocks for asynchronous exceptions would therefore nullify all benefits of caching translated code. By postponing the delivery until the end of a block is reached, this is avoided completely.
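A minimal sketch of this postponement, with illustrative names, could look as follows: the physical interrupt handler merely records the pending interrupt, and the guest copy is injected only when a block-ending trap returns control to the DBT engine.

#include <stdbool.h>

struct guest_state { bool irq_pending; /* ... remaining guest state ... */ };

extern void inject_guest_interrupt(struct guest_state *vm);   /* assumed helper */

void on_physical_interrupt(struct guest_state *vm)
{
    vm->irq_pending = true;            /* do not interrupt the current block */
}

void on_block_boundary(struct guest_state *vm)
{
    if (vm->irq_pending) {             /* guest state is consistent here */
        vm->irq_pending = false;
        inject_guest_interrupt(vm);
    }
}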

4. Spilling and restoring registers

Translating instructions that use the PC as source operand requires finding a substitute register to hold the native value of the PC. This may involve spilling and restoring a register. On the one hand, spilling and restoring must avoid a full context switch from the guest to the hypervisor, as that would defeat the purpose of translation. On the other hand, all translations must be protected from guest modification for security and reliability reasons; therefore, we cannot spill to the code cache without a trap. We propose two generic solutions to the spilling problem. In our first solution, we use a lightweight trap, which avoids a full context switch, but at the same time enables translated code to write to otherwise protected locations in memory. Our second solution consists of using user-mode accessible coprocessor registers. When a guest is known to exhibit a certain behaviour, other tricks may be employed (e.g., using the guest’s stack); however, such non-generic solutions are out of the scope of this paper.

13 Listing 1: Lightweight trap handler

1 MRS sp_und, spsr
2 AND sp_und, sp_und, #PSR_MODE
3 TEQ sp_und, #PSR_MODE_USR
4 LDREQ sp_und, [lr_und, #-4]
5 MVNSEQ sp_und, sp_und
6 BXEQ lr_und

4.1. Lightweight traps

Our hypervisor uses the system call mechanism (SVC instruction) for traps to the DBT engine. We implement lightweight traps using an undefined instruction trap, to avoid adding complexity to the system call handler. The sole objective of lightweight traps is to switch the mode in which the translated code is executed to a privileged mode: when this mode switch is performed in the middle of a translated code block, which is normally executed in the unprivileged mode, the subsequent instructions in the block will execute in the privileged UDF mode until a later instruction in the block changes the mode to unprivileged again. While executing in the privileged mode, the translated code can access the otherwise protected locations as necessary for spilling. The undefined instruction trap and the mode switch are initiated by an undefined instruction inserted by the DBT engine in the translated code block. Guests cannot exploit this mechanism because (1) they cannot alter the contents of the software cache, and (2) if that same undefined instruction is encountered in the guest’s instruction stream, the hypervisor will handle it as a sensitive instruction, and translate it into a trap to the DBT engine before it can be executed. Kernels normally do not cause undefined instruction traps. Our lightweight trap mechanism therefore does not introduce any additional overhead in the kernel’s normal operation.

The key aspect of this approach is the design of an undefined instruction trap handler that can (1) classify caught traps, (2) return control to the translated code in some privileged mode when the trap is classified as the hypervisor’s lightweight trap, (3) invoke a generic undefined instruction trap handler otherwise, and (4) do so in as little time as possible to minimise the performance overhead. We have managed to come up with a mechanism and a handler that require only eight instructions to enter the privileged UDF mode from within the code cache. This mechanism relies on a specific instruction encoding that is permanently defined to generate an undefined instruction trap by the ARM architecture [3], encoded as 0xFFFFFFFF. This encoding is used in the code cache to perform the lightweight trap. The trap is caught by the hypervisor’s exception vector, which contains branches to handlers for all kinds of exceptions. We use different exception vectors based on the virtual mode of the guest, so that undefined instruction traps from guest user space cannot reach our lightweight trap handler. Our exception vectors for the guest’s privileged modes contain a simple branch to the lightweight trap handler shown in Listing 1.

In lines 1 to 3 of our handler, we check whether the undefined instruction trap was taken while executing in the unprivileged mode. If so, we know that the trap is caused by guest kernel code executed from the code cache. If not, lines 4 to 6 are not executed. In line 4, we load the instruction that caused the trap. In line 5, the encoding of the instruction is complemented, and the condition flags are updated; they are used in line 6 to jump back to the code cache if the bitwise complement of the instruction equals zero, which only happens if the trap was caused by our lightweight trap instruction with encoding 0xFFFFFFFF. When the branch in line 6 is not executed, control falls through to a generic undefined instruction handler. Our lightweight trap handler uses only banked registers to avoid saving guest registers.
After jumping back to the code cache, both SP and LR are usable, such that spilling to memory can sometimes be avoided. To resume execution in the unprivileged mode, the DBT engine inlines a PC-relative exception return instruction in the translated code. We have implemented two different usage scenarios of our lightweight traps for spilling. In the first scenario, a private spill location is appended as needed at the end of a translated block inside the code cache. As is traditional on many ARM systems, our hardware platform has separate L1 instruction and data caches, and a unified L2 cache. In order to simplify cache maintenance requirements when translating new blocks, the code cache is mapped write-through at the L1 cache level, and write-back write-allocate at the L2 cache level. Therefore, spilling to and restoring from the code cache may cause many L1 cache misses. In a second scenario we spill to a shared spill location outside the code cache, which is mapped write-back and write-allocate at both L1 and L2 cache levels. This trades L1 cache hits for increased TLB pressure; since we aim to avoid guest-readable data structures outside the code cache, reading from the shared spill location may require a second lightweight trap if the translated instruction cannot be executed from a privileged mode. Such a second trap is never required in the first scenario.

4.2. User-mode accessible coprocessor registers

All modern ARM processors up to ARMv7-A support coprocessors. Coprocessors are devices whose registers can be accessed using specific coprocessor instructions, typically with lower latency than memory-mapped device registers. Each ARMv7-A processor comes with at least two coprocessors: the debug coprocessor and the system control coprocessor. The latter is used for CPU identification, MMU configuration, cache maintenance, etc. The system control coprocessor contains a few registers that are accessible from user mode. We therefore researched the feasibility of using such registers as spill location. Suitable registers do not affect the behaviour of the hypervisor or the underlying hardware, are both readable and writable from user mode, and can hold a 32-bit word. At least three coprocessor registers in the ARMv7-A architecture match our criteria: we can use either one of the performance counter registers (PMCCNTR and PMXEVCNTR), or the user-writable software thread ID register (TPIDRURW) [3]. The generic timer extension also offers suitable registers, but using those registers for spilling renders the timers unusable for the hypervisor.

Using hardware coprocessor registers as spill location does not interfere with a guest kernel’s usage of coprocessors, as guest kernels normally do not interact with hardware coprocessor registers directly; our DBT engine ensures that the guest can only access a virtual copy of the coprocessor registers. Since our hypervisor runs user applications unmodified, all user-mode readable registers must be restored upon a guest context switch from kernel to user mode; all user-mode writable registers must additionally be saved upon a guest context switch from user mode to kernel. When such registers are not used for spilling, we can let the kernel update them directly to avoid the maintenance cost on every guest context switch.

Using performance counter registers for spilling comes with the disadvantage that the hypervisor can no longer make full use of the performance counters for itself. Performance counter registers are made accessible to the unprivileged mode through configuration of the system control coprocessor. Spilling to the performance counter registers therefore also requires maintenance on every context switch, to disable and re-enable access accordingly. Unlike the thread ID register, the maintenance for using the performance counters does not require extra memory accesses during guest context switching.
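For illustration, the C functions below show the coprocessor accesses involved when TPIDRURW is used as a spill location; in practice the DBT engine emits the equivalent MCR and MRC instructions directly into the translated code. The GCC-style inline assembly targets ARMv7-A, and the function names are ours.

#include <stdint.h>

static inline void spill_to_tpidrurw(uint32_t value)
{
    /* TPIDRURW: CP15 c13, c0, opcode2 = 2, readable and writable from user mode. */
    __asm__ volatile("mcr p15, 0, %0, c13, c0, 2" : : "r"(value));
}

static inline uint32_t restore_from_tpidrurw(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c13, c0, 2" : "=r"(value));
    return value;
}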

5. Tackling DBT-related overhead

The naive version of our hypervisor translates all control flow and sensitive instructions shown in Table 1 into traps to the interpreter.


Figure 4: Frequency of traps to the interpreter in the naive version of our hypervisor by instruction class

This gives the hypervisor maximum visibility into the guest kernel’s execution, which is useful for tracing, but too slow for any other practical purpose. We have therefore studied which traps occur most frequently, and we propose optimisations to eliminate them.

All DBT-related overhead observed in user applications originates from interactions with the kernel and interrupt handling, because our hypervisor only translates kernel code. For a Linux guest, this means we can analyse the origins of DBT-related overhead by running micro-benchmarks for system calls and signal handling. To do so, we have executed selected benchmarks from the lmbench version 3 suite [37] on a BeagleBoard, which is based on TI’s OMAP 3 system-on-chip and which features a Cortex-A8 processor. While our hypervisor has also been ported to various other platforms with Cortex-A9 and Cortex-A15 processors to ensure its features are compatible with a wider range of hardware, it currently only supports running on a single processor core, and it always presents a minimal OMAP 3-based virtual machine to its guests regardless of the underlying hardware platform. These limitations however do not affect the generality of the DBT techniques presented in this paper. As guest OS, we have used a vanilla Linux 3.4.108 kernel. The benchmarks we use measure the virtualisation cost on frequently-used kernel functionality:²

• lat_pagefault: measures the cost of faulting on pages from a file.
• lat_pipe: uses two processes communicating through a UNIX pipe to measure inter-process communication latencies. The benchmark passes a token back and forth between the two processes, which perform no other work.
• lat_proc: creates processes in different forms, to measure the time it takes to create a new basic thread of control.
  – fork: measures the time it takes to split a process into two nearly identical copies and have one of them exit.
  – exec: measures the time it takes to create a new process and have that new process run a new program, similar to the inner loop of a command interpreter.
• lat_select: measures the time it takes to perform a select system call on a given number of file descriptors.
• lat_sig: measures the time it takes to install and catch signals.
• lat_syscall: measures the time it takes for a simple entry into the operating system kernel.
  – fstat: measures the time it takes to fstat() an open file whose inode is already cached.
  – null: measures the time it takes to perform a getppid system call. This call involves only a minimal and bounded amount of work inside the kernel, and therefore serves as a good test to measure the round-trip time to and from the kernel back to user space.
  – open: measures the time it takes to open and then close a file.
  – read: measures the time it takes to read one byte from /dev/zero.
  – stat: measures the time it takes to stat() a file whose inode is already cached.
  – write: measures the time it takes to write one byte to /dev/null.

² Benchmark descriptions copied and adapted from the lmbench manual pages.

We have restricted our selection of benchmarks to those that are relevant to show the performance of our DBT engine. We therefore exclude benchmarks that are designed to test disk, network and memory bandwidth. We have executed each benchmark once and measured the source and frequency of traps to the interpreter. Internally, we have forced each benchmark to execute its test 100 times in a loop, such that we can easily distinguish the traps specific to the benchmark from the traps caused by the creation and destruction of the benchmark process. Figure 4 shows an overview of the source and frequency of traps for each individual benchmark. As is typical in DBT, most of the overhead is caused by control flow. Control flow optimisations have already been studied extensively, so we have implemented some of the existing optimisations. For the traps not related to control flow, we propose new translations specific to full virtualisation on ARMv7-A.

5.1. Control flow

Our measurements indicate that over all benchmarks, 68% to 96% of our overhead is caused by non-sensitive control flow instructions. As shown in Figure 5, the majority of control flow instructions are direct branches. The traps caused by direct branches can be eliminated through lazy optimisation: after a trap caused by a branch, the block targeted by that branch is translated, and the trap is replaced by a branch within the code cache to the newly translated block. We say that the blocks are linked together. Conditional direct branches are initially translated into two traps: a conditional trap with a condition code opposite to that of the original instruction, which models the fall-through path of the original instruction, and an unconditional trap. Both traps can be replaced lazily by a direct branch within the code cache. In general, an extra conditional trap is added to every translated conditional instruction, which can be linked like a direct branch.
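As an illustration of the lazy linking step, the following C sketch shows how a trap caused by a direct branch might be patched into a code-cache branch once the target block is available. All types and helpers (trap_record_t, lookup_or_translate, flush_icache_line) are hypothetical names introduced here for illustration; this is a minimal sketch of the idea, not the hypervisor's actual implementation.

    #include <stdint.h>

    /* Hypothetical bookkeeping for one pending direct-branch trap. */
    typedef struct {
        uint32_t *patch_site;    /* address of the trap in the code cache         */
        uint32_t  guest_target;  /* guest address targeted by the original branch */
        uint32_t  cond;          /* ARM condition code of the original branch     */
    } trap_record_t;

    /* Assumed to exist elsewhere in the DBT engine (hypothetical names). */
    uint32_t *lookup_or_translate(uint32_t guest_pc);  /* code-cache address of target */
    void flush_icache_line(void *addr);

    /* Encode "B<cond> target" as a 32-bit ARM instruction located at site. */
    static uint32_t encode_arm_branch(uint32_t *site, uint32_t *target, uint32_t cond)
    {
        int32_t offset = (int32_t)(((uint8_t *)target - ((uint8_t *)site + 8)) / 4);
        return (cond << 28) | (0x5u << 25) | ((uint32_t)offset & 0x00FFFFFFu);
    }

    /* Lazy linking: on the first trap, translate the target block if necessary and
     * overwrite the trap with a branch that stays inside the code cache.          */
    void link_direct_branch(trap_record_t *t)
    {
        uint32_t *target = lookup_or_translate(t->guest_target);
        *t->patch_site = encode_arm_branch(t->patch_site, target, t->cond);
        flush_icache_line(t->patch_site);
    }

Because the patched instruction is an ordinary B<cond>, the fall-through and taken paths of a conditional branch can be linked independently while preserving the original condition.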

5.1.1. Indirect branches

Blocks with indirect branches cannot be linked at translation time, as the destination addresses of the branches are not known until they are executed and as they can vary between executions. Several solutions to eliminate traps caused by indirect branches already exist. Moore et al. [38] use an indirect branch target cache, a global hash table of code cache addresses indexed by native PC values [27], which is used by the translated code to look up translations of branch targets. A similar technique, called a sieve, implements the hash table in instructions rather than in data [47]. A limited set of code cache addresses known to be frequently targeted by specific indirect branches can also be inlined in the translation of that indirect branch [23, 27].

Figure 5: Frequency of traps caused by control flow instructions (per benchmark, broken down into direct function calls, indirect function calls, indirect function returns, direct regular branches, and indirect regular branches)

Most techniques treat function returns separately, as the target of a function return can be predicted easily. Function calls can be recognised in branch instructions that automatically write the return address to the LR (BL and BLX). Function returns are harder to identify, as the ARM architecture does not have a dedicated function return instruction, such as RET on x86. We therefore identify function returns based on the patterns recognised by ARM's hardware return address stack; they are:

• an indirect branch to the value in the LR (BX lr);
• a copy of the LR into the PC (MOV pc, lr);
• a load of the PC from the stack using any of the LDR, LDMIA and LDMIB instructions3.

To avoid the traps caused by function returns, our hypervisor implements a software return address stack, also known as a shadow stack, similar to Pin [23, 33]. The patterns we use to identify function returns are not fully accurate: on average they identify 4% more function returns than there are function calls; we suspect the excess is partly caused by handwritten assembly in the kernel. Our shadow stack is nevertheless capable of significantly reducing DBT-based overhead, as we will show in Section 6. Unlike Pin and other user-space DBT engines, our shadow stack implementation does not use the guest's stack for saving and restoring registers during manipulations of the shadow stack; instead, we use the register spilling techniques that we proposed in Section 4. As the number of indirect branches other than function returns is fairly low, our hypervisor currently does not implement a generic indirect branch optimisation algorithm.
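The following C sketch illustrates the data structure behind such a shadow stack. The names, the fixed depth and the C-level interface are hypothetical; in the actual translations the push and pop operations are emitted as inline ARM code that uses the register spilling techniques mentioned above.

    #include <stddef.h>
    #include <stdint.h>

    #define SHADOW_STACK_DEPTH 64   /* hypothetical fixed depth */

    /* One entry pairs a guest return address with the code-cache address of its
     * translation, so a return can stay inside the code cache without a trap.   */
    typedef struct {
        uint32_t  guest_return_pc;
        uint32_t *cache_return_pc;
    } shadow_entry_t;

    typedef struct {
        shadow_entry_t entry[SHADOW_STACK_DEPTH];
        unsigned       top;
    } shadow_stack_t;

    /* Emitted inline at translated call sites (BL/BLX): push the predicted pair. */
    static void shadow_push(shadow_stack_t *s, uint32_t guest_pc, uint32_t *cache_pc)
    {
        s->entry[s->top % SHADOW_STACK_DEPTH] =
            (shadow_entry_t){ guest_pc, cache_pc };
        s->top++;
    }

    /* Emitted inline at translated return sites: if the guest LR matches the
     * prediction, continue at the cached translation; otherwise return NULL,
     * which stands for a trap that lets the interpreter resolve the return.     */
    static uint32_t *shadow_pop(shadow_stack_t *s, uint32_t guest_lr)
    {
        s->top--;
        shadow_entry_t e = s->entry[s->top % SHADOW_STACK_DEPTH];
        return (e.guest_return_pc == guest_lr) ? e.cache_return_pc : NULL;
    }

A mispredicted or overflowed entry simply falls back to the trap path, so the shadow stack only ever affects performance, never correctness.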

3 ABI-compliant code uses descending stacks which are always accessed using the incrementing addressing modes of the LDM instruction.

5.1.2. Effects on asynchronous exception delivery

As discussed in Section 3.7.2, when the guest is executing in kernel space, we postpone the delivery of asynchronous exceptions such as interrupts until the end of a translated block. This is straightforward as long as the blocks end in a trap to the interpreter. When blocks are linked together, however, they can create loops in the code cache that do not contain a single trap. The execution of translated code can then continue in the code cache for quite some time without any trap being executed. Of course, the hypervisor cannot wait until all loops have finished executing. To overcome this problem, the DBT engine partially undoes block linking whenever an exception occurs, by replacing any direct branches at the end of the active translated block by traps to the interpreter. Alternatively, if the block makes use of our shadow stack to perform a function return within the code cache, we corrupt the top entry of the shadow stack to ensure that the next look-up causes a trap.
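The C sketch below illustrates what this partial unlinking could look like. block_t and the helper functions are hypothetical names; the real DBT engine patches the code cache directly.

    #include <stdint.h>

    /* Hypothetical per-block metadata kept by the DBT engine. */
    typedef struct block {
        uint32_t *exit_site[2];      /* up to two direct-branch exits in the code cache */
        int       exit_is_linked[2];
        int       uses_shadow_stack; /* block returns through the shadow stack          */
    } block_t;

    void write_trap(uint32_t *site);         /* re-emit a trap to the interpreter */
    void corrupt_shadow_stack_top(void);     /* force the next look-up to miss    */

    /* Called when an asynchronous exception becomes pending while the guest kernel
     * is running inside the code cache: make sure the active block ends in a trap,
     * so the exception can be delivered at the next block boundary.               */
    void force_exit_of_active_block(block_t *b)
    {
        for (int i = 0; i < 2; i++) {
            if (b->exit_is_linked[i]) {
                write_trap(b->exit_site[i]);  /* undo the link for this exit */
                b->exit_is_linked[i] = 0;
            }
        }
        if (b->uses_shadow_stack)
            corrupt_shadow_stack_top();       /* the return look-up will now trap */
    }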

5.2. Exception returns and other mode changes

Exception returns affect numerous data structures in the hypervisor, and involve hardware reconfiguration; it is therefore hard to avoid a full context switch from guest to hypervisor and back. As shown in Figure 4, exception returns can cause up to 4% of all traps in an optimised DBT engine. Mode changes using CPS and MSR instructions are very similar to exception returns, but do not influence control flow. Figure 4 shows that over 8% of the traps are caused by such mode changes. Some of the mode change instructions merely enable and disable interrupts. Those instructions can be translated to equivalent instruction sequences that update the program status register of the guest in its shadow register file, without requiring a trap to the interpreter.
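A minimal sketch of the translation-time decision, assuming a hypothetical decoded representation of CPS instructions, could look as follows: only instructions that exclusively toggle the A/I/F mask bits are inlined as a shadow CPSR update, everything else still traps. The emit_* helpers are placeholders for the hypervisor's code generator.

    #include <stdint.h>

    /* ARMv7 CPSR mask bits affected by CPSIE/CPSID. */
    #define PSR_A_BIT (1u << 8)
    #define PSR_I_BIT (1u << 7)
    #define PSR_F_BIT (1u << 6)
    #define PSR_MASK_BITS (PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

    /* Hypothetical decoded form of a CPS instruction. */
    typedef struct {
        int      changes_mode;  /* nonzero if the instruction also switches mode */
        int      disable;       /* CPSID (1) or CPSIE (0)                        */
        uint32_t mask_bits;     /* subset of PSR_MASK_BITS                       */
    } cps_insn_t;

    void emit_trap_to_interpreter(void);
    void emit_shadow_cpsr_update(uint32_t clear_bits, uint32_t set_bits);

    /* Translation-time decision: CPS instructions that only toggle the interrupt
     * masks are turned into an inline update of the guest's shadow CPSR; anything
     * that changes the processor mode still traps to the interpreter.            */
    void translate_cps(const cps_insn_t *insn)
    {
        if (insn->changes_mode || (insn->mask_bits & ~PSR_MASK_BITS)) {
            emit_trap_to_interpreter();
            return;
        }
        if (insn->disable)
            emit_shadow_cpsr_update(0, insn->mask_bits);   /* set the mask bits   */
        else
            emit_shadow_cpsr_update(insn->mask_bits, 0);   /* clear the mask bits */
    }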

5.3. Saving and restoring user mode registers

When an operating system performs a context switch from application to kernel or vice versa, it must save and restore all registers of the application. The ARM architecture provides special load and store multiple instructions (LDM and STM) that access the user mode SP and LR rather than the registers of the active mode, such that a kernel does not need to switch modes to access the user mode registers. These instructions are otherwise functionally equivalent to normal load and store multiple instructions. Figure 4 shows that up to 8% of all traps in our DBT engine originate from instructions to save and restore user mode registers. In order to avoid these traps, we split the instructions into a first part that only involves the non-banked registers, and a second part that involves the banked registers. Only the second part requires accesses to the shadow register file to be inlined in the translation. When the instruction loads non-banked registers, the load of the banked registers is reordered to happen before the load of the non-banked registers, as all affected non-banked registers can then be used as scratch registers.

Listing 2 shows how our DBT engine translates LDMDB sp, {r0-r12,sp,lr}^, an instruction taken from the Linux kernel's system call return routine. This instruction restores all registers other than the PC from the kernel's stack. Instead of updating the current SP and LR, however, it writes to their user mode counterparts. We use a lightweight trap to access the guest's shadow register file from the privileged UND mode (line 2). Because the base address resides in the banked SP register, we copy it into a scratch register before switching modes (line 1). We make use of the banked registers in the UND mode to hold the address of the guest's user mode registers in its shadow register file (lines 3–4), and to load values from the guest's stack using LDRT instructions (lines 5 and 7). Using LDRT makes the MMU authorise the loads using the guest's permissions, which is necessary to enforce guest isolation. We then store the loaded values into the guest's shadow registers (lines 6 and 8). LDRT instructions always operate on the address held in the base address register, and update its value with a specified constant after the load operation. We must therefore adjust the base address register upfront (line 1) and in between the loads from the shadow register file (line 5). Afterwards, we return to the unprivileged mode (line 9), and we load the non-banked registers (line 11). As the original LDMDB instruction uses a decrementing mode, its base address must be adjusted to compensate for the removal of the banked registers; to avoid restoring the base address register, we write the adjusted base address into the scratch register (line 10). Store multiple instructions that save user mode registers are translated similarly. Algorithm 1 shows how to compensate for the changes in the base address caused by altering the register list for all addressing modes.

Listing 2: Translation of LDMDB sp, {r0-r12,sp,lr}^

     1  SUB   r0, sp_usr, #8
     2  LWTRAP
     3  MOVW  sp_und, #(AddressOfUSRRegs[15:0])
     4  MOVT  sp_und, #(AddressOfUSRRegs[31:16])
     5  LDRT  lr_und, r0, #4
     6  STR   lr_und, [sp_und]
     7  LDRT  lr_und, r0
     8  STR   lr_und, [sp_und, #4]
     9  SUBS  pc, pc, #4
    10  SUB   r0, r0, #4
    11  LDMDB r0, {r0-r12}
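Algorithm 1 itself is not reproduced here, but the following C sketch captures the kind of compensation involved, under the simplifying assumptions that the removed banked registers (SP and LR) are the highest-numbered registers in the list and that base register writeback is handled separately.

    #include <stdint.h>

    typedef enum { ADDR_IA, ADDR_IB, ADDR_DA, ADDR_DB } ldm_stm_mode_t;

    /* Compensation (in bytes) to add to the base register before performing the
     * load/store multiple of the remaining, non-banked registers only.
     *
     * Because the banked SP and LR are the highest-numbered registers in the
     * list, they occupy the highest transfer addresses.  For the incrementing
     * modes the non-banked part therefore starts at the same address as before
     * and needs no compensation; for the decrementing modes the whole transfer
     * window shifts down by one word per removed register.                      */
    int32_t base_compensation(ldm_stm_mode_t mode, unsigned removed_banked_regs)
    {
        switch (mode) {
        case ADDR_IA:
        case ADDR_IB:
            return 0;
        case ADDR_DA:
        case ADDR_DB:
            return -4 * (int32_t)removed_banked_regs;
        }
        return 0;
    }

For the LDMDB instruction of Listing 2, with two banked registers removed, this yields a compensation of -8 bytes, which matches the adjusted base (sp - 8) that ends up in the scratch register before the LDMDB of the non-banked registers.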

5.4. Unprivileged loads and stores

Instructions that load and store as if executed from the unprivileged mode are used by the Linux kernel to access data from user applications. When such instructions are executed from the unprivileged mode, they behave as normal loads and stores. However, as explained in Section 3, this would cause the loads and stores to be executed with the permissions of the guest's kernel rather than those of its user applications, due to double shadowing. Emulating the behaviour of unprivileged loads and stores requires checking the permissions assigned to user applications, either by a software-based look-up in the guest's translation tables, or by switching the active set of shadow translation tables to the unprivileged set and then performing the check in hardware.

Trapping to the interpreter on every such load and store is costly. Our initial approach to eliminate the trap consisted of inlining a switch to the unprivileged set of shadow translation tables before and after every unprivileged load and store instruction. Switching translation tables must be done from a privileged mode and therefore requires a lightweight trap. The code cache should not be accessible to the guest's user applications, in order to prevent exposing the translated kernel. The translated instruction must therefore be executed from a privileged mode. As the instruction performs memory accesses as if it was executed in the unprivileged mode, these accesses are always authorised with the permissions of the guest. There is a problem, however, when the instruction uses one or more banked registers: the value of the user mode registers must be fetched into the registers of the privileged UND mode, and any updated registers must be restored afterwards. On the basic ARMv7-A architecture, the user mode registers can be retrieved by using LDM and STM instructions, or by switching to the SYS mode using CPS instructions, as the privileged SYS mode uses the user mode register bank. While using CPS instructions leads to longer translated instruction sequences, we found it to perform slightly faster than using LDM and STM instructions. The virtualisation extensions offer a more elegant solution: they add special MRS and MSR instructions that copy values between banked and non-banked registers.

As noted by the authors of the ITRI hypervisor, we can improve our initial approach by eliminating unnecessary translation table switches in between consecutive unprivileged load and store instructions, as they often appear grouped together in the Linux kernel [44]. Notable examples of Linux kernel functions that contain such groups are __copy_from_user and __copy_to_user. Depending on the kernel version, other functions such as restore_sigframe, which is used when returning from a signal handler, may contain sequences that alternately perform an unprivileged load and a normal store; such instruction sequences cannot be optimised any further.
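The sketch below shows, in C, how a translator could apply this grouping while emitting a block. The decoded-instruction structure and the emit_* helpers are hypothetical and merely stand in for the hypervisor's internal interfaces.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical decoded view of the guest instructions in one block. */
    typedef struct {
        uint32_t encoding;
        int      is_unprivileged_ls;  /* LDRT/STRT/LDRBT/STRBT family */
    } guest_insn_t;

    void emit_switch_to_user_shadow_tables(void);    /* lightweight trap */
    void emit_switch_to_kernel_shadow_tables(void);  /* lightweight trap */
    void emit_translated(const guest_insn_t *insn);

    /* Keep the unprivileged shadow translation tables active across a whole run
     * of consecutive unprivileged loads/stores, so the two table switches are
     * paid once per group instead of once per instruction.                      */
    void translate_block_with_grouping(const guest_insn_t *insn, size_t count)
    {
        int user_tables_active = 0;

        for (size_t i = 0; i < count; i++) {
            if (insn[i].is_unprivileged_ls && !user_tables_active) {
                emit_switch_to_user_shadow_tables();
                user_tables_active = 1;
            } else if (!insn[i].is_unprivileged_ls && user_tables_active) {
                emit_switch_to_kernel_shadow_tables();
                user_tables_active = 0;
            }
            emit_translated(&insn[i]);
        }
        if (user_tables_active)
            emit_switch_to_kernel_shadow_tables();
    }

For a sequence that alternates unprivileged and normal accesses, such as the one in restore_sigframe, this scan degenerates to one switch per instruction, which is why such sequences cannot be optimised any further.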

5.5. Coprocessor operations and register updates

A small fraction of the overhead in our benchmarks results from invoking operations on coprocessors and updating coprocessor registers on the system control coprocessor. As shown in Figure 4, this fraction is mostly limited to 5%. While the ARMv7-A architecture offers several instructions to access coprocessors (see Table 1), only MCR and MRC are functional for the system control coprocessor. Any other coprocessor access instruction that operates on the system control coprocessor causes an undefined instruction trap. MRC instructions read from the coprocessor registers without causing any side effects; they are dealt with in the next section. The MCR instruction is used to invoke an operation such as cleaning a cache line, and to update register values such as setting the address of the active translation table. Because the coprocessor register number is encoded as a constant within the MCR instruction, it is known at translation time. We can hence selectively inline equivalent instruction sequences specific to a coprocessor operation or register.

Coprocessor instructions that enable or disable caches and the MMU, or that reconfigure the MMU, always trap to the hypervisor. Some cache and TLB maintenance operations can however be inlined, depending on whether or not they are used by the hypervisor to keep track of a guest's modifications to its code and translation tables. Our hypervisor supports multiple such tracking mechanisms, some of which make guest TLB maintenance redundant; in that case, a guest's TLB maintenance operations are removed by the translator. Some operations cannot be executed as is: e.g., the guest should not be allowed to invalidate the data caches without writing back dirty lines to memory, as this may discard hypervisor state. When cache maintenance operations can be executed unmodified4, the hypervisor uses lightweight traps in their translations. Some coprocessor operations can be executed regardless of the privilege level; this is the case for the deprecated coprocessor barrier instructions. We translate them to the newer equivalent instructions, without any kind of trap.
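A hypothetical sketch of such a translation-time dispatch is shown below. The register numbers follow the ARMv7 CP15 layout (c1/c2/c3 for MMU and cache configuration, c7 for cache maintenance and the deprecated barriers, c8 for TLB maintenance), but the case split is illustrative rather than a faithful reproduction of the hypervisor's tables, and the emit_* helpers are placeholder names.

    #include <stdint.h>

    /* Coprocessor register fields, decoded from the MCR encoding at translation
     * time (all constants, so the decision below is made once per instruction). */
    typedef struct {
        unsigned crn, crm, opc1, opc2;
    } cp15_reg_t;

    void emit_trap_to_interpreter(void);
    void emit_lightweight_trap_and_inline(uint32_t encoding);
    void emit_inline_equivalent(uint32_t encoding);

    /* Select a translation per coprocessor register for MCR p15 instructions. */
    void translate_mcr_p15(uint32_t encoding, cp15_reg_t r, int tracking_covers_tlb)
    {
        if (r.crn == 1 || r.crn == 2 || r.crn == 3) {
            /* SCTLR, TTBR0/1, TTBCR, DACR: MMU/cache reconfiguration -> full trap. */
            emit_trap_to_interpreter();
        } else if (r.crn == 8 && tracking_covers_tlb) {
            /* Guest TLB maintenance made redundant by the tracking mechanism:
             * nothing is emitted, the operation is simply removed.               */
        } else if (r.crn == 7 && r.crm == 10 && (r.opc2 == 4 || r.opc2 == 5)) {
            /* Deprecated CP15 barriers: replaced by the newer DSB/DMB, no trap.  */
            emit_inline_equivalent(encoding);
        } else if (r.crn == 7) {
            /* Cache maintenance that may run unmodified, but from a privileged
             * mode: wrap it in a lightweight trap.                               */
            emit_lightweight_trap_and_inline(encoding);
        } else {
            emit_inline_equivalent(encoding);
        }
    }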

5.6. Special register accesses without side effects

The remainder of the traps to the interpreter consists of accesses to special registers, such as coprocessor register reads using the MRC instruction, banked register reads using the MRS instruction, and writes to the SPSR using the MSR instruction. These accesses account for 1% to 9% of the traps to the interpreter. They do not cause any side effects beyond a simple register copy. We translate all such instructions into an equivalent sequence that accesses the guest's shadow registers using a lightweight trap.

4 One example of such an operation is a full data cache flush to memory. The hypervisor cannot perform a flush of guest memory alone, as the architecture cannot efficiently support such an operation. On ARMv7-A, caches are either cleaned by cache line, or by set and way. Line-based operations have very fine granularity and invoking several such operations consecutively is very expensive, while sets and ways cannot be mapped to specific regions in the virtual address space.

Figure 6: Overhead of register spilling techniques over a fully writable translation store, using at most one register (techniques compared per benchmark: TPIDRURW, PMCCNTR, lightweight trap with spill location inside the code cache, and lightweight trap with spill location outside the code cache)

5.7. Summary

We explained how our hypervisor handles control flow by linking blocks together in the code cache, and how block linking affects exception handling. We showed that while a majority of the overhead is caused by control flow, sensitive instruction traps are not negligible. We therefore proposed optimisations to translate sensitive instructions to equivalent instruction sequences rather than to a trap, for several ARM-specific instructions such as saving and restoring user mode registers, unprivileged loads and stores, and coprocessor accesses. In the next section, we evaluate how these optimisations reduce the DBT overhead.

6. Evaluation

As already mentioned, guest user-space code runs without intervention of the hypervisor, so in our evaluation we again focus on events related to mode switches and kernel operations, and we evaluate all our techniques using the same hardware and benchmarks as in Section 5. We first configure the hypervisor for the different register spilling techniques that we proposed in Section 4, to determine which technique, if any, performs best. To achieve a fair comparison between the different techniques, we enable only those optimisations that are not tied to a particular register spilling technique. We then proceed by demonstrating how our optimisations reduce the DBT-based overhead of the hypervisor, while using the best performing register spilling technique. All lmbench benchmarks perform their own timing measurements. In order to ensure that these measurements are accurate, we grant the guest full and unsupervised access to a minimal set of hardware timers. It is important to note that the micro-benchmarks used here are stress tests that execute high-overhead events at a much higher rate than standard applications. The reported results are therefore by no means representative of real-world applications; they only serve our purpose of identifying the best DBT engine optimisations.

6.1. Register spilling techniques

We compare the results of running the benchmarks using our register spilling techniques to a set of results obtained with a fully writable code cache. To reduce the impact of external influences such as interrupts, we have executed each benchmark 100 times and we report averages.

Figure 6 shows the extra overhead incurred by our spilling techniques on each benchmark, when spilling and restoring at most one register. Spilling using lightweight traps to a location outside the code cache consistently performs worse, except in the lat_select -n 10 and lat_syscall read benchmarks, where the difference with spilling inside the code cache is negligible. The extra overhead when spilling to an external location is caused by the extra instructions required to construct the address of the spill location into a register, and by increased TLB pressure. On the majority of the benchmarks, spilling to the performance counter register PMCCNTR incurs slightly more overhead than using the user-writable software thread ID register (TPIDRURW). We expected the performance counter registers to have higher access latencies than the thread ID register, as only the latter is intended to be updated frequently. As it turns out, the subtle differences between both measurements are only caused by the differences in maintenance requirements upon a guest context switch; the registers are otherwise equivalent. On eight of the benchmarks, spilling inside the code cache performs only slightly worse than spilling to the thread ID register. This may indicate that the latency to access the thread ID register is of the same order of magnitude as the time taken to spill and restore using a lightweight trap. With lightweight traps, however, the majority of the overhead is caused by the trap to enter a privileged mode, rather than by the memory accesses that perform the actual spill and restore operations. Therefore, when spilling multiple registers at once, or when the spill location is accessed more frequently, we expect coprocessor latency to dominate. This also explains the bigger gap in performance between spilling inside the code cache and spilling to the thread ID register for the remaining benchmarks.

Figure 7: Overhead of register spilling techniques over a fully writable translation store, using at most two registers (techniques compared per benchmark: TPIDRURW/TPIDRURO, PMCCNTR/PMXEVCNTR, and lightweight trap with spill location inside the code cache)

To investigate the impact of coprocessor latency, we have performed a second series of tests in which we have enabled the shadow stack optimisation. This optimisation requires spilling two registers, and accesses the spill locations several times. We do not provide measurements for spilling to a shared location using lightweight traps, since it is clear from our first set of measurements that this technique causes too much overhead. In order to support spilling two registers, we have adapted the remaining techniques as follows:

• When spilling to a private location inside the code cache, we resize the spill location to fit two words.
• When spilling to TPIDRURW, we use the user-read-only thread ID register (TPIDRURO) as secondary spill location. We do not require a second user-mode writable register, as our shadow stack implementation only updates the secondary spill location from a privileged mode.
• When spilling to the performance counter registers, we use PMCCNTR as primary and PMXEVCNTR as secondary spill location.

As illustrated in Figure 7, coprocessor spilling incurs more than twice the overhead of spilling inside the code cache when applied to our shadow stack. This overhead is caused by the access latency of the coprocessor registers.

Our results indicate that using a coprocessor register to spill can be useful in translations that only need to spill one register, and with few accesses to the spill location. In such scenarios, using coprocessor operations can be slightly faster and results in smaller translated code size. In other scenarios, however, spilling inside the code cache using lightweight traps should be preferred.

Figure 8: Slowdown over native for the different optimisations of the DBT engine (configurations: direct branch linking only; shadow stack; mode change optimisation; save and restore user mode registers optimisation; unprivileged loads and stores optimisation; coprocessor update optimisation; all optimisations)

6.2. Optimisations to avoid traps to the DBT engine

We analyse the performance improvements gained from the optimisation techniques described in Section 5 on the same set of benchmarks from the lmbench suite. We have run each benchmark on different configurations of the hypervisor; we started with a configuration in which only direct branch linking is enabled, and then enabled our optimisations one by one, in the order they were discussed in Section 5. We have normalised the results based on a set of measurements on a native system, running the same kernel on the same hardware. All configurations unconditionally make use of lightweight traps to spill registers to a spill location inside the code cache.

Figure 8 provides an overview of our results. The unoptimised version of our hypervisor, visualised by the leftmost bar, causes slowdowns ranging from a factor 20 to a factor 76 compared to executing the benchmarks natively. We observe the lowest slowdown on larger benchmarks such as process creation (lat_proc) and lat_select. In the benchmarks for simple system calls, the overhead of guest context switches dominates and causes bigger slowdowns. The optimisations proposed in this paper reduce the slowdown on the process creation benchmarks from a factor 20 to a factor 7 over native. On the simple system call benchmarks, we achieve an overhead reduction from a factor 76 to a factor 15 over native.

When we investigated the major contributions to overhead reduction, we observed that enabling our shadow stack causes a significant speedup in all benchmarks. Mode change optimisations have little impact, as they only affect instructions that toggle interrupt masking; the majority of mode change instructions affect the processor mode and privilege level and must trap to the interpreter. Eliminating traps for instructions that save and restore user registers yields significant speedups for the signal handler benchmarks (lat_sig), and for the system call benchmarks lat_syscall null, read and write, as predicted in Section 5. Optimising unprivileged loads and stores results in a small speedup for the lat_syscall read benchmark. However, as this optimisation inlines translation table switches in translated code, it comes with significant space overhead. When applied to cold code, the optimisation has adverse effects on hardware caches, resulting in a net slowdown in the lat_pipe, lat_select -n 10, lat_syscall null and lat_syscall write benchmarks. A similar issue manifests itself with inlining coprocessor update instructions in the lat_sig catch and lat_syscall write benchmarks. In the last step of our evaluation, we eliminated traps to the interpreter for accesses to coprocessor registers and banked registers. This optimisation causes minor speedups in all benchmarks. Our results confirm that the architecture-specific optimisations presented in Section 5 are useful to reduce the overhead of our hypervisor beyond generic control flow optimisations: on average, we achieve a 45% reduction in slowdown when comparing the results from enabling the shadow stack to the fully optimised case.

Figure 9: Virtualised execution time of selected mibench benchmarks, relative to native execution time (configurations: direct branch optimisation only, and fully optimised; the label under each benchmark gives its system call rate: basicmath (700), bitcount (31), gsm (243), ispell (3119), jpeg (1958), lame (22), mad (1316), rsynth (2), susan (130), typeset (304))

6.3. Perceived slowdown

Micro-benchmarks are useful for studying the worst-case overhead in interactions between a guest's user applications and kernel. Many real applications, however, do not continuously perform system calls; the perceived slowdown while running real applications should therefore be lower than the slowdowns obtained from running micro-benchmarks. In order to evaluate the perceived slowdown caused by our hypervisor, we run a selected number of benchmarks from the mibench version 1 suite [22]. Mibench is a benchmark suite that aims to be representative of commercial software normally run on embedded systems. We use wall time as a measure for perceived performance. We use the large input sets provided with mibench and run all benchmarks three times on the same Linux system: once natively, and, to show the impact of our optimisations, once virtualised without and once virtualised with our optimisations. Similar to the measurements we did on micro-benchmarks, we do not disable block linking, as doing so renders the system unusable.

Figure 9 shows that the perceived slowdown in user-space applications can reach almost a factor of two over native with our naive DBT engine, depending on the rate at which the application performs system calls. With our optimised DBT engine, however, the maximum perceived slowdown is limited to 16%. The labels in Figure 9 also show for each benchmark the rate at which the application performs system calls. bitcount and rsynth (speech synthesis) are small benchmarks that perform few system calls. lame, an MP3 encoder, is a computationally intensive and long-running benchmark. We can see that for bitcount, rsynth and lame, the very low rate of system calls results in almost negligible overhead. The basicmath benchmark is typically used to measure ALU performance. It performs small arithmetic operations and prints the result of every operation; every such print is a write system call. Similarly, the gsm benchmark, an audio codec, mainly consists of read and write system calls. Both basicmath and gsm still have a fairly low rate of system calls, and therefore the overhead of the optimised version of our hypervisor on these benchmarks is also negligible. For the remaining benchmarks, ispell, jpeg, mad, susan and typeset, the rate of system calls is not correlated with the overhead, as they use a wider variety of system calls, some of which are more expensive than others. The remaining overhead in the jpeg benchmark is largely caused by memory management.

7. Conclusions

DBT remains useful in a handful of scenarios. However, all existing virtualisation solutions for the ARMv7-A architecture rely either on paravirtualisation or on hardware extensions. We have therefore built the STAR hypervisor, the first open source software-only hypervisor for the ARMv7-A architecture that uses DBT to perform full system virtualisation.

We analysed existing techniques for user-space DBT on ARM, and argued that they are not directly suitable for full system virtualisation. We proposed new techniques for spilling registers to solve this problem, and we showed that the best technique to use depends on the usage scenario. We measured the sources of DBT-based overhead for typical interactions between virtualised kernels and their user applications. As is typical with DBT, much of the overhead can be attributed to control flow, and eliminating traps caused by control flow yields significant speedups. However, we also showed that the remaining overhead can be further reduced by 45% on average, by using binary optimisations specific to ARMv7-A system virtualisation. We found that a naive configuration of our hypervisor makes real applications run up to five times slower than native. With our optimisations, however, the maximum perceived slowdown is limited to 16%.

Our DBT engine can be further optimised, as the translator mostly operates on single instructions. For specific instruction patterns such as sequences of unprivileged loads and stores, considering multiple instructions at a time can avoid unnecessary and potentially expensive operations. Similar optimisation opportunities exist with other instructions, but we expect their impact on run-time performance to be of little significance.

Acknowledgements

The authors would like to thank ARM for providing the necessary development tools and excellent support. The primary author of this paper was supported by the agency for Innovation by Science and Technology (IWT) under terms of a personal research grant (grant number 093488). Part of the research results presented in this paper were obtained in the context of the EURO-MILS project, which has received funding from the European Union’s Seventh Framework Programme under grant agreement number ICT-318353.

References

[1] Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. In Proceedings of the 12th international conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-XII, pages 2–13, New York, NY, USA, 2006. ACM. ISBN 1-59593-451-0.

[2] Zach Amsden, Daniel Arai, Daniel Hecht, Anne Holler, and Pratap Subrahmanyam. VMI: an interface for paravirtualization. In Proceedings of the Linux Symposium, volume 2 of 2006 Linux Symposium, pages 371–386, July 2006.

[3] ARM Limited. ARM® Architecture Reference Manual: ARMv7-A and ARMv7-R edition. ARM Limited, ARM DDI 0406C.c (ID051414) edition, May 2014.

[4] François Armand and Michel Gien. A practical look at micro-kernels and virtual machine monitors. In 6th IEEE Consumer Communications and Networking Conference, CCNC 2009, pages 1–7, Piscataway, New Jersey, USA, January 2009. IEEE.

[5] François Armand, Gilles Muller, Julia Laetitia Lawall, and Jean Berniolles. Automating the port of Linux to the VirtualLogix hypervisor using semantic patches. In 4th European Congress ERTS Embedded Real Time Software, ERTS 2008, pages 1–7, January 2008.

[6] B-Labs Ltd. Codezero project overview, 2012. URL https://web.archive.org/web/20120101123359/http://www.l4dev.org/codezero_overview.

[7] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. SIGOPS Operating Systems Review, 37:164–177, October 2003. ISSN 0163-5980.

[8] Ken Barr, Prashanth Bungale, Stephen Deasy, Viktor Gyuris, Perry Hung, Craig Newell, Harvey Tuch, and Bruno Zoppis. The VMware mobile virtualization platform: is that a hypervisor in your pocket? SIGOPS Operating Systems Review, 44:124–135, December 2010. ISSN 0163-5980.

[9] Fabrice Bellard. QEMU, a fast and portable dynamic translator. In Proceedings of the USENIX 2005 Annual Technical Conference, FREENIX Track, USENIX ’05, pages 41–46, Berkeley, CA, USA, 2005. USENIX Association.

[10] Prashanth Bungale. ARM virtualization: CPU & MMU issues, December 2010. URL https://labs.vmware.com/download/68.

[11] Prashanth P. Bungale and Chi-Keung Luk. PinOS: a programmable framework for whole-system dynamic instrumentation. In Proceedings of the 3rd international conference on Virtual Execution Environments, VEE ’07, pages 137–147, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-630-1.

[12] Jiunn-Yeu Chen, Bor-Yeh Shen, Quan-Huei Ou, Wuu Yang, and Wei-Chung Hsu. Effective code discovery for arm/thumb mixed isa binaries in a static binary translator. In Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES ’13, pages 19:1–19:10, Piscataway, NJ, USA, 2013. IEEE Press. ISBN 978-1-4799-1400-5.

[13] Cristina Cifuentes and Vishv M. Malhotra. Binary translation: static, dynamic, retargetable? In Proceedings of the 1996 International Conference on Software Maintenance, ICSM ’96, pages 340–349. IEEE, November 1996.

[14] CORDIS. Report on the EC Workshop on Virtualisation, Consultation Workshop “Virtualisation in Computing”, September 2009. URL ftp://ftp.cordis.europa.eu/pub/fp7/ict/docs/computing/report-on-the-ec-workshop-on-virtualisation_en.pdf.

[15] Christoffer Dall and Jason Nieh. KVM for ARM. In Proceedings of the 12th Annual Linux Symposium, pages 45–56, July 2010.

[16] Christoffer Dall and Jason Nieh. KVM/ARM: Experiences building the Linux ARM hypervisor. Technical Report CUCS-010-13, Columbia University, Department of Computer Science, April 2013.

[17] Christoffer Dall and Jason Nieh. KVM/ARM: The design and implementation of the Linux ARM hypervisor. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, pages 333–348, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2305-5.

[18] Scott W. Devine, Lawrence S. Rogel, Prashant P. Bungale, and Gerald A. Fry. Virtualization with in-place translation, December 2009. URL http://www.google.com/patents/US20090300645. US Patent App. 12/466,343; original assignee: VMware Inc.

[19] Jiun-Hung Ding, Chang-Jung Lin, Ping-Hao Chang, Chieh-Hao Tsang, Wei-Chung Hsu, and Yeh-Ching Chung. ARMvisor: System virtualization for ARM. In Proceedings of the Linux Symposium, pages 93–107, July 2012.

[20] Marc Duranton, Sami Yehia, Bjorn De Sutter, Koen De Bosschere, Albert Cohen, Babak Falsafi, Georgi Gaydadjiev, Manolis Katevenis, Jonas Maebe, Harm Munk, Nacho Navarro, Alex Ramirez, Olivier Temam, and Mateo Valero. The HiPEAC vision, 2010. URL http://www.hipeac.net/roadmap.

[21] Daniel R. Ferstay. Fast secure virtualization for the ARM platform. Master’s thesis, The University of British Columbia, Faculty of Graduate Studies (Computer Science), June 2006.

[22] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. Mibench: A free, commercially representative embedded benchmark suite. In Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop, WWC ’01, pages 3–14, Washington, DC, USA, 2001. IEEE Computer Society. ISBN 0-7803-7315-4. doi: 10.1109/WWC.2001.15. URL http://dx.doi.org/10.1109/WWC.2001.15.

[23] Kim Hazelwood and Artur Klauser. A dynamic binary instrumentation engine for the arm architecture. In Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES ’06, pages 261–270, New York, NY, USA, 2006. ACM. ISBN 1-59593-543-6. doi: 10.1145/1176760.1176793. URL http://doi.acm.org/10.1145/1176760.1176793.

[24] Thomas Heinz and Reinhard Wilhelm. Towards device emulation code generation. In Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, LCTES ’09, pages 109–118, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-356-3.

[25] Gernot Heiser. The Motorola Evoke QA4—a case study in mobile virtualization. Technology white paper, Open Kernel Labs, July 2009.

[26] Gernot Heiser and Ben Leslie. The okl4 microvisor: convergence point of microkernels and hypervisors. In Proceedings of the first ACM asia-pacific workshop on Workshop on systems, APSys ’10, pages 19–24, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0195-4.

[27] Jason D. Hiser, Daniel Williams, Wei Hu, Jack W. Davidson, Jason Mars, and Bruce R. Childers. Evaluating indirect branch handling mechanisms in software dynamic translation systems. In Proceedings of the International Symposium on Code Generation and Optimization, CGO ’07, pages 61–73, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-2764-7. doi: 10.1109/CGO.2007.10. URL http://dx.doi.org/10.1109/CGO.2007.10.

[28] R. N. Horspool and N. Marovac. An approach to the problem of detranslation of computer programs. The Computer Journal, 23(3):223–229, 1980.

[29] Perry L. Hung. Varmosa: just-in-time binary translation of operating system kernels. Master's thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.

[30] Joo-Young Hwang, Sang-Bum Suh, Sung-Kwan Heo, Chan-Ju Park, Jae-Min Ryu, Seong-Yeol Park, and Chul-Ryun Kim. Xen on ARM: System virtualization using Xen hypervisor for ARM-based secure mobile phones. In 5th IEEE Consumer Communications and Networking Conference, CCNC 2008, pages 257–261, Piscataway, New Jersey, USA, January 2008. IEEE. ISBN 978-1-4244-1457-4.

[31] IBM. PowerVM Lx86 for x86 Linux applications, July 2011. URL http://www.ibm.com/developerworks/linux/lx86/index.html.

[32] Hiroaki Inoue, Akihisa Ikeno, Masaki Kondo, Junji Sakai, and Masato Edahiro. VIRTUS: a new processor virtualization architecture for security-oriented next-generation mobile terminals. In Proceedings of the 43rd annual Design Automation Conference, DAC ’06, pages 484–489, New York, NY, USA, 2006. ACM. ISBN 1-59593-381-6.

[33] David R. Kaeli and Philip G. Emma. Branch history table prediction of moving target branches due to subroutine returns. In Proceedings of the 18th Annual International Symposium on Computer Architecture, ISCA ’91, pages 34–42, New York, NY, USA, 1991. ACM. ISBN 0-89791-394-9. doi: 10.1145/115952.115957. URL http://doi.acm.org/10.1145/115952.115957.

[34] Robert Kaiser and Stephan Wagner. The PikeOS concept – history and design. Whitepaper, SYSGO AG, January 2008. URL http://www.sysgo.com/nc/news-events/document-center/whitepapers/pikeos-history-and-design-jan-2008/.

[35] Sung-Min Lee, Sang-Bum Suh, and Jong-Deok Choi. Fine-grained I/O access control based on Xen virtualization for 3G/4G mobile devices. In Proceedings of the 47th Design Automation Conference, DAC ’10, pages 108–113, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0002-5.

[36] Joshua LeVasseur, Volkmar Uhlig, Yaowei Yang, Matthew Chapman, Peter Chubb, Ben Leslie, and Gernot Heiser. Pre-virtualization: soft layering for virtual machines. In 13th Asia-Pacific Computer Systems Architecture Conference, ACSAC 2008, pages 1–9, Los Alamitos, California, USA, August 2008. IEEE Computer Society. ISBN 978-1-4244-2682-9.

[37] Larry McVoy and Carl Staelin. Lmbench: Portable tools for performance analysis. In Proceedings of the 1996 Annual Conference on USENIX Annual Technical Conference, ATEC ’96, pages 23–23, Berkeley, CA, USA, 1996. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1268299.1268322.

[38] Ryan W. Moore, José A. Baiocchi, Bruce R. Childers, Jack W. Davidson, and Jason D. Hiser. Addressing the challenges of DBT for the ARM architecture. In Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, LCTES ’09, pages 147–156, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-356-3.

[39] Niels Penneman, Danielius Kudinskas, Alasdair Rawsthorne, Bjorn De Sutter, and Koen De Bosschere. Formal virtualization requirements for the arm architecture. Journal of Systems Architecture, 59(3):144–154, March 2013. ISSN 1383-7621.

[40] Gerald J. Popek and Robert P. Goldberg. Formal requirements for virtualizable third generation architectures. Communications of the ACM, 17:412–421, July 1974. ISSN 0001-0782.

[41] Red Bend Software. vLogix Mobile for mobile virtualization, 2014. URL https://www.redbend.com/en/products-solutions/mobile-virtualization/vlogix-mobile-for-mobile-vitrualization.

[42] Rusty Russell. virtio: towards a de-facto standard for virtual I/O devices. SIGOPS Operating Systems Review, 42: 95–103, July 2008. ISSN 0163-5980.

[43] Alexey Smirnov. KVM-ARM hypervisor on Marvell Armada-XP board. Talk at Cloud Computing Research Center for Mobile Applications (CCMA) Low Power Workshop (Taipei), June 2012. URL ftp://220.135.227.11/Low-Power%20Workshop%202012%20June%204th%20PDF/Low-Power%20Workshop%202012%20June%204th%20PDF/07-CCMA-workshop-taipei.pdf.

[44] Alexey Smirnov, Mikhail Zhidko, Yingshiuan Pan, Po-Jui Tsao, Kuang-Chih Liu, and Tzi-Cker Chiueh. Evaluation of a server-grade software-only arm hypervisor. In Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing, CLOUD ’13, pages 855–862, Washington, DC, USA, 2013. IEEE Computer Society. ISBN 978-0-7695-5028-2.

[45] Brad Smith. ARM and Intel battle over the mobile chip’s future. Computer, 41(5):15–18, May 2008. ISSN 0018-9162.

[46] Jim Smith and Ravi Nair. Virtual Machines: Versatile Platforms for Systems and Processes. The Morgan Kaufmann Series in Computer Architecture and Design. Morgan Kaufmann Publishers Inc., San Francisco, California, USA, 2005. ISBN 1558609105.

[47] Swaroop Sridhar, Jonathan S. Shapiro, Eric Northup, and Prashanth P. Bungale. Hdtrans: An open source, low-level dynamic instrumentation system. In Proceedings of the 2nd International Conference on Virtual Execution Environments, VEE ’06, pages 175–185, New York, NY, USA, 2006. ACM. ISBN 1-59593-332-8. doi: 10.1145/1134760.1220166. URL http://doi.acm.org/10.1145/1134760.1220166.

[48] SYSGO AG. PikeOS comes with full virtualization for Renesas R-CAR H2, June 2014. URL http://www.sysgo.com/news-events/press/press/details/article/pikeos-comes-with-full-virtualization-for-renesas-r-car-h2/.

[49] TRANGO Virtual Processors. Virtualization for mobile, 2009. URL https://web.archive.org/web/20090103050359/http://www.trango-vp.com/markets/mobile_handset/usecases.php.

[50] Prashant Varanasi and Gernot Heiser. Hardware-supported virtualization on ARM. In Proceedings of the 2nd ACM SIGOPS Asia-Pacific Workshop on Systems (Shangai, China), APSys 2011, pages 11:1–11:5, New York, NY, USA, July 2011. ACM.

[51] VirtualLogix. VLX for mobile handsets, 2009. URL https://web.archive.org/web/20090308134358/http://www.virtuallogix.com/products/vlx-for-mobile-handsets.html.

[52] Youfeng Wu, Shiliang Hu, Edson Borin, and Cheng Wang. A HW/SW co-designed heterogeneous multi-core virtual machine for energy-efficient general purpose computing. In 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2011, pages 236–245, Piscataway, New Jersey, USA, April 2011. IEEE Computer Society. ISBN 978-1-61284-355-1.

[53] Xen Project. Xen ARM with virtualization extensions whitepaper, April 2014. URL http://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions_whitepaper.

[54] Xvisor. eXtensible Versatile hypervISOR. URL http://xhypervisor.org.
