BMCArmor: A Hardware Protection Scheme for Bare-metal Clouds

(This is the accepted version of the paper)

Takaaki Fukai∗, Satoru Takekoshi∗, Kohei Azuma†, Takahiro Shinagawa†, and Kazuhiko Kato∗
∗University of Tsukuba, Ibaraki, Japan. Email: {fukai,satorut}@osss.cs.tsukuba.ac.jp, [email protected]
†The University of Tokyo, Tokyo, Japan. Email: [email protected], [email protected]

Abstract—Traditional infrastructure-as-a-service (IaaS) clouds provide virtual machines as servers. However, virtualization incurs a performance overhead and prevents maximum utilization of hardware functions, so several IaaS vendors have started new services called bare-metal clouds that provide physical rather than virtual machines, allowing users to have direct access to physical hardware in the cloud. Unfortunately, exposing physical hardware to users causes a hardware protection issue for cloud vendors. Since physical hardware uses non-volatile memory (NVM) to store firmware code and configuration data, this memory is also exposed to users. If the NVM is modified by malicious users, the hardware could be permanently corrupted or infected by malware without being noticed. This is difficult for cloud vendors to prevent because bare-metal clouds have no virtualization layer to protect their hardware. In this paper, we describe the types of attacks that are possible for bare-metal clouds and propose BMCArmor, a hardware protection scheme for bare-metal clouds. BMCArmor uses a thin hypervisor that does not virtualize the hardware and only prevents write access to NVM. Our experiments show that BMCArmor can successfully protect hardware while incurring little performance overhead.

Index Terms—virtualization, hardware protection, bare-metal cloud, firmware

© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Takaaki Fukai, Satoru Takekoshi, Kohei Azuma, Takahiro Shinagawa, Kazuhiko Kato. BMCArmor: A Hardware Protection Scheme for Bare-metal Clouds. In Proceedings of the 9th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2017), Dec 2017. http://dx.doi.org/10.1109/CloudCom.2017.43

I. INTRODUCTION

Infrastructure-as-a-service (IaaS) clouds are a type of cloud service that provides access to server machines via the Internet. These machines can be used as web servers, database servers, computing nodes, and so on. Previously, IaaS clouds have only provided virtual machines (VMs), because they are easy to manage with software. However, virtualization degrades performance, which could be a critical problem for users who need to handle heavy workloads, such as machine learning and scientific computation tasks. To provide higher performance, a new type of IaaS cloud called a bare-metal cloud has emerged. Bare-metal clouds provide physical machines (called bare-metal instances) rather than VMs. Besides avoiding the overhead of virtualization, bare-metal instances have two additional advantages. The first is performance stability: since bare-metal instances are not shared with other users, there is no fluctuation in performance. The second is functionality: modern hardware supports many advanced functions, such as multi-queue and Single Root I/O Virtualization (SR-IOV), that improve performance [1]–[6]. Bare-metal instances can allow direct access to such functions because they avoid the security risk of sharing hardware [7], [8]. Bare-metal instances are already provided by several cloud vendors, including IBM [9], Internap [10], Oracle [11], and Rackspace [12].

Bare-metal instances allow users to access physical hardware directly. This is not a problem for most hardware components because most hardware states are volatile and can be restored to their original state by a hardware reset. However, the non-volatile memory (NVM) installed in some hardware can be problematic. NVM is used for storing persistent information such as firmware code and configuration data, which is generally used at boot time to set up the machine correctly. However, NVM can be modified from software for firmware updates and configuration changes. If the data in NVM is changed, it is not reverted to its original state even if the machine is rebooted or shut down, meaning that the change remains in place even after the machine has been returned to the cloud vendor. This allows malicious users to potentially attack cloud vendors and other cloud users, for example, by writing incorrect data to NVM. In our experiments, writing incorrect data to the NVM in an onboard network interface card (NIC) resulted in a BIOS boot error. Moreover, if an attacker installs a rootkit in the firmware, the next user might be seriously harmed by information theft or corruption [13]. Firmware rootkits are dangerous because they cannot be removed even by reinstalling the operating system (OS).

NVM for firmware can be protected by hardware-based mechanisms and digital signatures. However, various attack techniques to bypass these protections are known [14]. Several malware detection techniques and software-based verification methods have been proposed [15]–[17], but they cannot prevent the destruction of NVM data. Another simple approach is to restore a clean image to the NVM hardware after the bare-metal instance is returned to the vendor. However, performing such an operation for every instance is costly and slows down instance provision. Moreover, if the machine cannot boot, no verification or restoration can be performed except by using hardware-based ROM writers. Even if the machine does still boot, firmware rootkits are difficult to detect and remove. To prevent such situations, malicious access to NVM should therefore be prevented from the outset.

A hardware protection mechanism for bare-metal clouds should avoid performance overhead to maintain the advantages of bare-metal instances. In addition, the protection mechanism should not depend on the guest OS, because in our threat model malicious users have OS administrator privileges. Firmware is in general difficult for cloud vendors to modify because its source code is not open, so firmware-based protection mechanisms are difficult to deploy.

In this paper, we propose a physical hardware protection scheme for bare-metal clouds. Our scheme, called BMCArmor, protects the hardware using a thin hypervisor that prevents malicious users from writing to the physical hardware's NVM by detecting and discarding the I/O sequences used to write to NVM devices. We enumerate the NVM in the hardware and its access interfaces based on hardware specifications, then design protection logic to prevent all write access to the NVM devices. The hypervisor avoids virtualizing hardware where possible to keep the performance overhead low while continuing to provide useful hardware functions. For example, most I/O accesses to PCI devices, timer devices, and interrupt controllers pass through the hypervisor without being intercepted. This architecture allows the hypervisor to achieve almost bare-metal performance.

As a proof of concept, we implemented a prototype hypervisor. This is based on BitVisor [18] and supports protection for EEPROM in Intel NICs and BIOS ROM. In experiments to confirm the effectiveness of the protection, we used chipsec [19], a hardware security assessment tool, and ethtool [20], a standard tool for writing to the EEPROM in NICs. In addition, we measured the network performance of our system, demonstrating that BMCArmor can successfully protect physical hardware with much less performance overhead than conventional virtual machine monitors (VMMs).

The remainder of this paper is organized as follows. Section II describes the threat model for bare-metal clouds, while Section III reviews related work. Section IV describes the design of our protection scheme and Section V explains our prototype hypervisor implementation. Section VI presents the experimental results. Section VII discusses some issues. Finally, our conclusions are presented in Section VIII.

II. THREAT MODEL

This section describes our threat model for bare-metal clouds. We assume that an attacker is a normal user of the bare-metal cloud who has ordered a bare-metal instance from the cloud provider and is accessing it via the Internet. Since a bare-metal instance is a physical machine, she has full access privileges to the hardware via software and so, for example, can run arbitrary code in kernel mode and issue any I/O access instructions she wishes. We do not consider physical attacks on the hardware, because the hardware is installed inside the cloud provider's data center.

In traditional IaaS clouds, the machines that are leased to users are VMs that are created by software when they are leased. After they are returned to the vendor, they are simply destroyed and never reused. In contrast, the machines that are leased in bare-metal clouds are physical machines. After having been leased to a user and then returned to the vendor, they are reinitialized, such as by clearing the storage and resetting the BIOS configuration, and then leased again to another user. This usage model leads to a situation where a modification made by a malicious user may affect subsequent users if the modification is allowed to persist.

We assume two attack scenarios for bare-metal clouds. The first is a denial-of-service (DoS) attack against the cloud vendor. An attacker may be able to cause hardware to break down by modifying NVM with software. For example, she can break a NIC by writing incorrect data into the NIC's EEPROM [21]–[23]. In addition, since EEPROM is limited to a maximum number of write cycles, simply writing to its NVM sufficiently many times may break the hardware. Moreover, by carefully (or randomly) manipulating data in its NVM, she can cause firmware code using the data to hang, ensuring that the physical machine can never be booted again. To repair the machine, the cloud vendor might need to reprogram its NVM using a special hardware device, such as a ROM writer. This type of DoS attack is serious for the cloud vendor because repairing the machine requires additional cost and time.

The second attack scenario is installing a firmware rootkit. An attacker could exploit an interface provided for firmware updates and infect the firmware with a rootkit. Although this may seem difficult, many such attack methods have actually been reported. For example, Delugre [24] implemented a rootkit that runs on Broadcom NICs, and a security company implemented a malicious UEFI updater which can install a UEFI rootkit in the BIOS ROM of an Intel motherboard [25]. In addition, the NSA implemented a malware application named DEITYBOUNCE that allows periodic arbitrary code execution by installing malicious code in the BIOS of a Dell machine [26], [27]. Zaddach et al. [13] demonstrated an attack where they were able to install a rootkit on the HDD without physical access. Since firmware has the highest privilege level in the software stack, it is difficult to detect or remove firmware rootkits. To make matters worse, a rootkit can access most of the data in the machine, so it could easily steal sensitive data or destroy a user's valuable data.

Note, too, that these attacks can be automated by exploiting the APIs provided by cloud vendors for ordering instances, running provisioning scripts, and returning the instances. Therefore, such attacks could cause an enormous amount of damage in a short period of time.

Some hardware has the ability to write-protect NVM. However, many firmware implementations do not correctly enable these functions.
In fact, we found that these protection functions were not on a virtual machine, BareBox runs malware on a bare- enabled for the bare-metal instances of several real bare-metal metal machine and restores clean OS states after analyzation. clouds. This means that an attacker could easily modify the However, BareBox cannot restore BIOS and NVM states in NVM. Even if the hardware-based protection is enabled, a the hardware. In contrast, BMCArmor can protect BIOS and vulnerability might be found in the function [14], allowing an NVM in the bare-metal environments. Bulygin et al. [35] pro- attacker to still modify the NVM in bare-metal instances. posed DeepWatch, a malware detection system to detect and remove virtualization rootkits and SMM rootkits. DeepWatch III.RELATED WORK exploited a micro controller embedded in the chipset to access This section discusses related work. Lo¨ıc [15] et al. demon- DRAM securely. Although this approach is OS-independent, strated that they could get control of a machine by using a it is heavily dependent on the chipset. On the other hand, NIC vulnerability and proposed a malware detection scheme BMCArmor does not depend on the chipset and only requires called NAVIS, which detects malware in a NIC’s firmware hardware-assisted virtualization. Vasiliadis et al. [36] proposed by checking the firmware’s integrity. Li [16] et al. proposed GPU-assisted malware to evade malware-detection systems. VIPER, a malware detection method that uses a challenge- Since it resides in GPU, detecting it by monitoring CPUs or response protocol between the OS and peripheral devices. main memory is difficult. However, GPU states are volatile In VIPER, the OS sends a challenge message to the device and can be easily erased by a system reset. The target of our and measures the response delay, taking a large delay as system is non-volatile states that cannot be easily restored to an indication of the presence of malware. VIPER can detect safe states. 
all known attacks against peripheral devices, including proxy attacks which could not be detected previously. Although both IV. DESIGN methods are countermeasures against malware in the firmware, This section presents the design of BMCArmor, which can they cannot prevent the installation of malware. In addition, protect the NVM of physical hardware with little virtualization both methods assume that the OS can be trusted, despite the overhead. fact that the OS cannot be trusted in our threat model. They are, therefore, not suitable for protecting bare-metal clouds. A. Overview In contrast, BMCArmor protects firmware from malware by Hardware protection systems for bare-metal clouds should preventing access to NVM without depending on the OS. be OS-independent, lightweight, and small. Cloud vendors Zhang [17] et al. proposed IOCheck, which uses system cannot trust OSs running on bare-metal instances because management mode (SMM), the most privileged mode in users can install arbitrary OSs, so the protection system x86/x64 CPUs, to check peripheral devices and firmware. should be provided in a different layer. At the same time, the Since SMM code runs without depending on the OS, IOCheck protection system should not sacrifice the performance of bare- does not need to trust the OS. To securely boot the BIOS, metal instances, meaning that it should incur as little overhead which sets up the SMM code, static root of trust for mea- as possible. In addition, it should be as small as possible to surement is used. Unfortunately, this method requires BIOS avoid introducing vulnerabilities into the system. modification, which is difficult in most machine environments. To achieve OS-independent and low-overhead protection In contrast, BMCArmor only needs an open-source hypervisor, with a small system, we exploit a thin hypervisor. A hypervisor which can easily be installed on x86/x64 machines. 
runs with a higher privilege level than the OS, so it can Many hypervisor-based security enhancement approaches securely protect the NVM without depending on the OS. have been proposed [28]–[33]. In general, hypervisors hide To reduce virtualization overhead, our hypervisor limits the physical devices and prevent direct access to the NVM, thereby number of guest OSs that can run simultaneously to one and protecting it. However, traditional hypervisors cannot avoid exposes the physical hardware directly to the guest OS where performance overhead and inherent vulnerabilities, because possible. To protect the NVM, the hypervisor only intercepts they have to support many operations and have complex write access to it, thus minimizing virtualization overhead. structures. For example, to support multiple VMs, hypervisors This design also helps to reduce the hypervisor size, and must perform hardware virtualization and resource manage- therefore the size of the TCB. ment, such as scheduling virtual CPUs and managing I/O access from multiple VMs to a single physical device. These B. Hypervisor architecture varied and complex operations incur an inevitable performance In order to design a lightweight and small hypervisor, overhead, as well as requiring complex hypervisor structures, we exploited the parapass-through architecture [18]. In this leading to large trusted computing base (TCB) sizes. Since a architecture, the hypervisor only supports one guest OS and bug in the TCB becomes a vulnerability, large hypervisors are it controls the hardware almost completely. Avoiding the not suitable for security purposes. need to run multiple OSs simultaneously significantly reduces There are several studies on malware analysis, detection, virtualization overhead and hypervisor size. For example, the and its avoidance. Kirat et al. [34] proposed BareBox, a hypervisor does not need to perform complex management malware analysis system to analyze VM-aware malware. 
operations to share CPUs, memory, and physical devices Since VM-aware malware behaves differently when it runs among multiple VMs, such as virtual CPU scheduling, virtual, V. IMPLEMENTATION = Read access = Write access This section describes the BMCArmor implementation. We Guest OS implemented our prototype hypervisor based on BitVisor [18], [37], which is a parapass-through hypervisor. Although our Hypervisor prototype hypervisor only supports Intel CPUs, we believe that Parapass through driver our scheme can also be applied to CPUs from other vendors, Hardware such as AMD or ARM. NVM Other Functions A. Attack surfaces of real hardware

Fig. 1. The BMCArmor architecture with the parapass-through hypervisor The target machine for our implementation consisted of an ASRock X99 Extreme4 motherboard, with an Intel C610/X99 series chipset, an Intel Xeon CPU E5-2603 v4 (1.70GHz) and physical memory management, and arbitration of physical CPU, and an Intel 82574L Gigabit NIC. We set up the machine device access. This means that the hypervisor does not need to boot using BIOS, not UEFI. Our prototype hypervisor to intervene in hardware access requests from the guest OS, protects the BIOS ROM and the Intel NIC. Although it does including I/O accesses, direct memory accesses, and interrupts. not currently support other devices, such as storage devices, To protect the NVM, the hypervisor intercepts write I/O we believe that our scheme can be applied to other devices as access to it. The hypervisor uses a parapass-through driver that well with a cost similar to that of protecting Intel NICs. partially intercepts access to hardware devices and determines To protect NVM, hardware-based protection should be whether to allow or deny access. The driver has knowledge enabled, so we inspected the protection functions of the BIOS of the physical devices that have NVM, such as their register ROM in our machine using chipsec [19]. The results showed layout and the meaning of the registers, but it does not need to that the BIOS left the following functions disabled. fully control them because they are mostly controlled by the • BIOS Write Protect: If this is enabled, CPUs are prohib- guest OS. The size and complexity of the parapass-through ited from writing to BIOS ROM until all the CPUs have driver is therefore much smaller than that of a normal driver. entered SMM. This is an effective way to avoid a race Fig. 1 shows the architecture of BMCArmor. Access to condition vulnerability [14]. 
devices that do not have NVM (“Other Devices” in the figure) • Serial Peripheral Interface (SPI) Range Protection: This is passed through, allowing the guest OS to control the devices can specify BIOS ROM ranges where software is not directly. On the other hand, access to NVM is intercepted by permitted to write and/or read. a parapass-through driver, which examines the type of access • SPI Configuration Lockdown: If this is enabled, the requested. If it is read access, the driver allows pass-through software cannot modify the SPI settings, including SPI access to the device and the guest OS is allowed to read the Range Protection, without rebooting. NVM. If it is write access, on the other hand, the driver blocks To protect the BIOS ROM, our hypervisor enables these func- access so that the NVM is not modified. tions. The protection functions can be enabled in software by setting flags in particular chipset registers [38]. Unfortunately, the Intel NIC had no such protection functions. C. Booting the hypervisor Another thing the hypervisor should do is to block In our scheme, the hypervisor must boot before the guest OS all interfaces to access NVM. According to the hardware for two reasons: the hypervisor must enable the hardware pro- datasheet [38], [39], the interfaces to be blocked are as follows. tection functions before the guest OS boots, and the hypervisor • The SPI interface for the BIOS ROM must protect itself. To protect the hypervisor, the storage where • The register for multi-byte access to the Intel NIC’s NVM the hypervisor image is stored must be protected, which the (flash or EEPROM), called the EEWR register hypervisor does either by hiding the storage itself or denying • A memory-mapped region of the Intel NIC’s NVM write access to that region. 
The hypervisor also protects its • The SPI interface for the Intel NIC’s flash memory memory region from the guest OS and physical devices by • The SPI interface for the Intel NIC’s EEPROM using nested paging and IOMMU. For all these reasons, the Note that the Intel NIC has multiple interfaces for accessing administrator of the physical machine must set up the firmware its NVM for compatibility. Its specification allows network (BIOS or UEFI) to boot the hypervisor first. adapter manufacturers to use either EEPROM or flash memory The hypervisor also enables hardware-based NVM write as NVM, so the NIC has two SPI interfaces, one for EEPROM protection if this is available but has not been set yet. Since and one for flash memory. Our prototype blocks both inter- hardware-based write protection, once set, is difficult to be faces, as shown in the following section. reset, the guest OS would not be able to disable it. To prevent attacks exploiting vulnerabilities, the hypervisor prevents the B. Block write I/O access to NVM access to that function. Hypervisor developers can determine To block write I/O access to the NVM, the hypervisor how to enable this protection from the device’s datasheet. needs to intercept write I/O access requests. To do this, the hypervisor uses Intel VT-x functions to ensure that I/O # chipsec_main access attempts to specific regions from the guest OS causes [...] [!] None of the SPI protected ranges \ control to be transferred to the hypervisor via a transition write-protect BIOS region called VMExit. The hypervisor uses two different mechanisms [...] to cause VMExit for programmed I/O (PIO) and memory- [CHIPSEC] Modules failed 2: [-] FAILED: chipsec.modules.common.bios_wp mapped I/O (MMIO) access. For PIO access, the hypervisor [-] FAILED: chipsec.modules.common.spi_lock uses a VT-x function that causes VMExit when any I/O [...] port specified in a given bitmap is accessed. For MMIO Fig. 2. 
Partial result of executing chipsec_main on bare metal access, the hypervisor uses the extended page table (EPT) to cause VMExit, setting the EPT using identity mapping, # chipsec_main i.e., a mapping where the guest and host physical address [...] are identical. The hypervisor uses permission bits in the EPT [+] PASSED: BIOS is write protected (by SMM and \ entries to ensure write access to specified MMIO pages causes SPI Protected Ranges) [...] VMExit. Since the guest OS can change the MMIO page [CHIPSEC] Modules failed 0: addresses by writing to base address registers (BARs) in PCI [...] configuration space, the hypervisor tracks BAR accesses and [+] PASSED: chipsec.modules.common.bios_wp [+] PASSED: chipsec.modules.common.spi_lock changes the EPT configuration accordingly. [...] Unfortunately, MMIO access can only be intercepted at page granularity. Therefore, if the hypervisor needs to intercept Fig. 3. Partial result of executing chipsec_main on BMCArmor write access to a sensitive register in a MMIO page (access to which would allow NVM write access), access must be and the Intel NIC were 5,897 lines. These were smaller than intercepted for the whole MMIO page. This may incur a those of the device drivers for common OSs and VMMs. performance penalty if a frequently-accessed register is on the same page as a sensitive register. In fact, this situation exists in VI.EXPERIMENTS the Intel NIC, but fortunately, the performance impact caused is smaller than that of common VMMs, as revealed by our A. Setup performance evaluation (see Section VI-C and Section VI-D). We used the physical machine described in Section V-A, We discuss this limitation further in Section VII-A. running Ubuntu 16.04.2 with Linux kernel 4.4.0. The QEMU In our current implementation, the hypervisor simply blocks version used for the control experiments was 2.5.0. all write access to NVM interfaces. In this case, any guest OS that tries to access these interfaces may hang while a software B. 
Protection component waits for NVM write to complete. However, non- We performed experiments to confirm the effectiveness of malicious software does not usually write to NVM, so the BMCArmor. These experiments checked whether or not it hypervisor does not necessarily need to try to avoid such enabled the hardware-based protection functions and whether situation. If necessary, however, the hypervisor would need or not it blocked write access to NVM. In these experiments, to emulate the device’s error behavior by intercepting read we used chipsec 1.3.0 and ethtool 4.5. access to the registers that indicate the access status. Imple- Chipsec has a chipsec_main command that checks menting this would be complex, because read I/O access would whether or not chipset’s protection functions are enabled. sometimes need to be intercepted and sometimes need to be Fig. 2 and Fig. 3 show parts of the results produced by allowed. Although we have implemented this emulation for the the command on bare metal and on BMCArmor, respec- chipset and confirmed that it worked, it was not used here. tively. The command performed three checks, as described in Section V-A. The first check, SPI protected ranges, C. Protecting the hypervisor checks whether or not at least one of the SPI Protected Ranges To protect the hypervisor, the stored hypervisor image must covers the BIOS region in the BIOS ROM. The second one, be write-protected. While it is possible to write-protect storage chipsec.modules.common.bios_wp, checks whether access, we took a simpler approach: using network boot. The or not the chipset’s BIOS protection is enabled. This checks hypervisor image is loaded from a network server, and since the BIOS Write Enable bit and the BIOS Write Protect bit. the network boot protocols do not support write access, the The third one, chipsec.modules.common.spi_lock, hypervisor image is protected. As for the memory image, the checks whether or not the SPI configurations are locked down. 
hypervisor uses the EPT to write-protect its memory region. Fig. 2 and Fig. 3 show that the BIOS did not enable these In addition, the hypervisor sets up the IOMMU not to allow functions, while BMCArmor successfully enabled them. access to the hypervisor memory region from I/O devices. The chipsec_util spi write command in chipsec attempts to write to the BIOS ROM via the chipset’s SPI D. Hypervisor size interface. Fig. 4 and Fig. 5 show the results of executing this The core of the hypervisor was approximately 40,000 lines. command on bare metal and on BMCArmor, respectively. The The code sizes of the parapass-through drivers for the chipset command attempted to write 16 bytes of data from “data.bin” # chipsec_util spi write 0x215270 data.bin # chipsec_util spi disable-wp [...] [...] [CHIPSEC] writing to SPI flash memory at \ [CHIPSEC] trying to disable BIOS write protection.. FLA = 0x215270 from ’data.bin’ [+] BIOS region write protection is disabled in \ [spi] writing 0x10 bytes to SPI at FLA = 0x215270 \ SPI flash (in 4 0x4-byte chunks + 0x0-byte remainder) [CHIPSEC] (spi disable-wp) time elapsed 0.000 [spi] writing chunk 0 of 0x4 bytes to 0x215270 [spi] writing chunk 1 of 0x4 bytes to 0x215274 Fig. 6. The result of executing chipsec_util spi disable-wp on [spi] writing chunk 2 of 0x4 bytes to 0x215278 bare metal [spi] writing chunk 3 of 0x4 bytes to 0x21527C [CHIPSEC] completed SPI flash memory write # chipsec_util spi disable-wp [CHIPSEC] (spi write) time elapsed 0.001 [...] [CHIPSEC] trying to disable BIOS write protection.. chipsec_util spi write Fig. 4. The result of a attempt on bare metal [-] couldn’t disable BIOS region write protection \ in SPI flash # chipsec_util spi write 0x215270 data.bin [CHIPSEC] (spi disable-wp) time elapsed 0.000 [...] [CHIPSEC] writing to SPI flash memory at \ FLA = 0x215270 from ’data.bin’ Fig. 7. 
The result of executing chipsec_util spi disable-wp on BMCArmor

[spi] writing 0x10 bytes to SPI at FLA = 0x215270 \
      (in 4 0x4-byte chunks + 0x0-byte remainder)
[spi] writing chunk 0 of 0x4 bytes to 0x215270
WARNING: SPI cycle not done
ERROR: SPI flash write cycle failed
[spi] writing chunk 1 of 0x4 bytes to 0x215274
WARNING: SPI cycle not done
ERROR: SPI flash write cycle failed
[spi] writing chunk 2 of 0x4 bytes to 0x215278
WARNING: SPI cycle not done
ERROR: SPI flash write cycle failed
[spi] writing chunk 3 of 0x4 bytes to 0x21527C
WARNING: SPI cycle not done
ERROR: SPI flash write cycle failed
WARNING: SPI flash write returned error \
      (turn on VERBOSE)
[CHIPSEC] (spi write) time elapsed 0.772

Fig. 5. The result of a chipsec_util spi write attempt on BMCArmor

to 0x215270 in the BIOS ROM. In our machine, this address stored the path of a UEFI boot entry. The results show that chipsec was able to write to the BIOS ROM on bare metal, but could not write to the ROM on BMCArmor (producing “ERROR: SPI flash write cycle failed”). Note, however, that the command could not write the data correctly even on bare metal: the data in the BIOS ROM was changed but the entry was broken. To correctly write the data, we would need to consider the checksum and signature.

The chipsec_util spi disable-wp command in chipsec attempts to disable hardware-based write protection. Fig. 6 and Fig. 7 show the results of executing this command on bare metal and on BMCArmor, respectively, showing that the write protection was successfully disabled on bare metal, but that the attempt was prevented on BMCArmor.

To confirm the effectiveness of the NVM protection for the Intel NIC, we used ethtool. Ethtool has a function that can write data to the NVM in Ethernet devices, accessing it via the NIC’s EEPROM Write register. Fig. 8 and Fig. 9 show the results for bare metal and BMCArmor, respectively. In both figures, the three commands show the original EEPROM value, the result of changing the first byte to 0x11, and the final value, respectively. The results show that the EEPROM value was changed on bare metal, but was not changed on BMCArmor.

~# ethtool -e enp3s0 offset 0 length 6
Offset          Values
------          ------
0x0000:         00 1b 21 53 84 3f
~# ethtool -E enp3s0 magic 0x10d38086 value 0x11 \
      offset 0x0
~# ethtool -e enp3s0 offset 0 length 6
Offset          Values
------          ------
0x0000:         11 1b 21 53 84 3f
~#

Fig. 8. The result of writing to NVM using ethtool on bare metal

~# ethtool -e enp3s0 offset 0 length 6
Offset          Values
------          ------
0x0000:         00 1b 21 53 84 3f
~# ethtool -E enp3s0 magic 0x10d38086 value 0x11 \
      offset 0x0
Cannot set EEPROM data: Operation not permitted
~# ethtool -e enp3s0 offset 0 length 6
Offset          Values
------          ------
0x0000:         00 1b 21 53 84 3f
~#

Fig. 9. The result of writing to NVM using ethtool on BMCArmor

C. Network benchmark

To demonstrate the performance impact of partially intercepting write I/O access to the NIC, we measured the network performance on bare metal, BMCArmor, and KVM. The benchmark client machine had a Xeon E3 CPU and a 10GbE NIC, and was connected directly to the target machine without using a network switch. The network device of the KVM VM was virtio-net with vhost-net. We used netperf [40] for the benchmark and measured the throughput and latency of the TCP and UDP protocols, measuring the throughput for inbound and outbound workloads. To maximize the network performance, the core running the netperf server process and the core receiving interrupts from the NIC were separated.

Fig. 10 shows the throughput results. The horizontal axis shows the packet size and the vertical axis shows the throughput, normalized by the bare-metal results (higher is better). After five initial warm-up measurements, we measured each throughput and latency ten times, and the graphs show the averages and standard deviations of these ten results.

For the TCP inbound workload, there was no difference between the three setups. This is because the TCP stack in the client machine merged the packets according to Nagle’s algorithm, so the target machine received a small number of large packets in all environments. For the UDP inbound workload, BMCArmor incurred an overhead of less than 1% for any packet size, while KVM incurred an overhead of approximately 65–70% for packets smaller than 256 bytes. The large difference between the bare-metal and KVM results is due to the many interrupts and the non-merged packets.

For the TCP outbound workload, BMCArmor also incurred an overhead of less than 1% for any packet size. On the other hand, KVM incurred an overhead of approximately 16–32% for packets equal to or smaller than 32 bytes. For packets larger than 32 bytes, KVM still incurred an overhead of approximately 3.3–8.0%. The performance difference is larger than for the TCP inbound workload because, although the target machine’s TCP stack also merged the packets according to Nagle’s algorithm, the merged packets were still small in a workload with such small packets. In addition, the OS had to handle an interrupt for each send request.

The results for the UDP outbound workload showed a similar trend to those for the UDP inbound workload. BMCArmor incurred an overhead of less than 5% for any packet size, compared with the larger KVM overhead of approximately 58–86% for packets equal to or smaller than 512 bytes.

Fig. 10. Network throughput results (panels: TCP Inbound workload, UDP Inbound workload, TCP Outbound workload, UDP Outbound workload; horizontal axis: size of a packet, 1 to 1024; vertical axis: normalized throughput (Mbps/sec), 0.0 to 1.0; series: BMCArmor, KVM)

Fig. 11 shows the network latency results. The vertical axis shows the latency in µs (lower is better). In both workloads, BMCArmor incurred an overhead of less than 1%, compared with 24–27% for KVM.
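The measurement procedure described for the throughput graphs (five discarded warm-up runs, ten measured runs, averages and standard deviations, normalization by the bare-metal result) can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function names and the sample values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

WARMUP_RUNS = 5     # initial measurements that are discarded
MEASURED_RUNS = 10  # measurements that are averaged


def summarize(samples):
    """Drop the warm-up runs and return (mean, sample standard deviation)."""
    measured = samples[WARMUP_RUNS:WARMUP_RUNS + MEASURED_RUNS]
    if len(measured) != MEASURED_RUNS:
        raise ValueError("expected at least 15 samples")
    return mean(measured), stdev(measured)


def normalize(value, baseline):
    """Express a throughput value relative to the bare-metal baseline."""
    return value / baseline


def overhead_percent(value, baseline):
    """Throughput lost relative to the bare-metal baseline, in percent."""
    return (1.0 - value / baseline) * 100.0


# Hypothetical throughput samples in Mbps; these are NOT values from the paper.
bare_metal = [900.0] * 5 + [1000.0] * 10
candidate = [850.0] * 5 + [990.0] * 10

base_mean, base_std = summarize(bare_metal)
cand_mean, cand_std = summarize(candidate)

print("normalized throughput:", normalize(cand_mean, base_mean))
print("overhead (%):", overhead_percent(cand_mean, base_mean))
```

With this definition, a normalized throughput just below 1.0 corresponds to an overhead of a few percent or less, which is how the "less than 1%" and "less than 5%" figures above would read on a graph normalized by the bare-metal results.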

Fig. 11. Network latency results (axes: protocol; latency in µs, 0 to 100)

The difference in overhead between BMCArmor and KVM was caused by differing numbers of I/O access and interrupt interceptions. BMCArmor only intercepted write access to a specific MMIO region, while KVM intercepted all I/O access to the virtual NIC, as well as interrupts. The MMIO region on BMCArmor is accessed during network communication because the interrupt mask register in the Intel NIC, a register that is frequently accessed to mask the interrupts on the completion of transmission and reception, lies on the same MMIO page as the sensitive register, as discussed in Section V-B. Nevertheless, the overhead is still much lower than for KVM.

D. Number of interceptions

The number of VMExits indicates how many times access has been intercepted. This number is a measure of overall performance, so we measured the number of VMExits during the UDP latency workload on BMCArmor and KVM. For BMCArmor, we added code to count the number of VMExits, while for KVM we used the kvm_stat command.

Table I shows the number of VMExits per second. After five initial warm-up measurements, we counted the number of VMExits ten times and averaged the results. The number of VMExits was 79% less on BMCArmor than on KVM (28275.7 vs 134170.3). Even if we exclude the VMExits for the PAUSE instruction on KVM, the number of VMExits was still 66% less on BMCArmor than on KVM (28275.7 vs 83422.3). Most of the VMExits on BMCArmor were caused by the EPT violations involved in intercepting write I/O requests, especially for write access to the interrupt mask register. There were no write accesses to the BIOS EEPROM.

On KVM, the main reasons for the VMExits were “External interrupt,” “I/O instruction,” and “Write to MSR,” excluding the “PAUSE instruction.” The “External interrupt” and “I/O instruction” VMExits were for device virtualization. The “Write to MSR” VMExits, on the other hand, occurred on writing to the IA32_TSC_DEADLINE MSR, which is used to specify the timing of the next APIC timer interrupt. These VMExits decreased performance, as shown in Section VI-C.

TABLE I
THE NUMBER OF VMEXITS FOR BMCARMOR VS KVM

Exit Reason               BMCArmor    KVM
PAUSE instruction         -           50748.0
Write to MSR              -           39854.7
External interrupt        -           33094.5
I/O instruction           -           10473.5
EPT Violation             28239.3     -
CPUID instruction         32.0        -
Exception or NMI          2.4         -
VMCALL                    2.0         -
Total                     28275.7     134170.3
Total (excluding PAUSE)   28275.7     83422.3

VII. DISCUSSION

A. Shared MMIO pages

As described in Section V-B, Intel VT-x does not provide a function for fine-grained interception of MMIO registers; hypervisors must specify the regions to be intercepted with page granularity using EPT. This means that if a non-sensitive register is on the same MMIO page as a sensitive register, and the non-sensitive register is accessed frequently, there is a large interception overhead. For PIO and model-specific registers, Intel VT-x supports bitmap-based byte-granularity interception, so to reduce overhead and achieve true bare-metal performance, we hope future CPUs will also support byte-granularity interception of MMIO registers. Another solution would be to redesign the layout of the MMIO registers in devices so that frequently accessed registers do not share an MMIO page with registers for accessing NVM.

B. SR-IOV-based protection

Using SR-IOV can achieve partial NVM protection. In SR-IOV, a hypervisor manages the physical functions and only exposes virtual functions to the guest OS. The guest OS can access the device directly without intervention by the hypervisor, but the virtual functions do not provide interfaces for accessing NVM. Unfortunately, SR-IOV is currently only supported by a limited range of devices, such as NICs, and not by chipsets and other peripheral devices. In addition, if the hypervisor uses SR-IOV, the guest OS cannot. In bare-metal clouds, users may want to use SR-IOV functions, so BMCArmor has an advantage in that it can provide all device functions to users while protecting NVM with minimal overhead.

VIII. CONCLUSION

In this paper, we have proposed BMCArmor, a hardware protection scheme for bare-metal clouds. This scheme uses a thin hypervisor to block write access to NVM, and avoids virtualizing devices to maintain the high performance of bare-metal instances. We have designed and implemented this scheme so that it can protect the BIOS ROM and EEPROM in an Intel NIC. Our experiments confirmed that BMCArmor can prevent write access to NVM, and the performance results show that BMCArmor incurred a much smaller overhead than KVM. In future work, we will also support the protection of additional devices and UEFI firmware. In addition, we will compare BMCArmor with the protection provided by SR-IOV.

ACKNOWLEDGMENT

This work was supported by JSPS KAKENHI Grant Number JP16H02798. The authors would like to thank Enago (www.enago.jp) for the English language review.

REFERENCES

[1] A. Belay, G. Prekas, A. Klimovic, S. Grossman, C. Kozyrakis, and E. Bugnion, “IX: A protected dataplane operating system for high throughput and low latency,” in Proc. 11th USENIX Symp. on Operating Systems Design and Implementation, Oct. 2014, pp. 49–65.
[2] S. Peter, J. Li, I. Zhang, D. R. K. Ports, D. Woos, A. Krishnamurthy, T. Anderson, and T. Roscoe, “Arrakis: The operating system is the control plane,” in Proc. 11th USENIX Symp. on Operating Systems Design and Implementation, 2014, pp. 1–16.
[3] “DPDK.” [Online]. Available: http://dpdk.org/
[4] “Storage Performance Development Kit.” [Online]. Available: http://www.spdk.io/
[5] “Seastar.” [Online]. Available: http://www.seastar-project.org
[6] “ScyllaDB.” [Online]. Available: http://www.scylladb.com/
[7] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, “Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds,” in Proc. 16th ACM Conf. on Computer and Communications Security (CCS 2009), 2009, pp. 199–212.
[8] F. Zhang, J. Chen, H. Chen, and B. Zang, “CloudVisor: Retrofitting protection of virtual machines in multi-tenant cloud with nested virtualization,” in Proc. 23rd ACM Symp. on Operating Systems Principles, 2011, pp. 203–216.
[9] “IBM Bluemix.” [Online]. Available: https://www.ibm.com/cloud-computing/bluemix/
[10] “Internap.” [Online]. Available: http://www.internap.com
[11] “Oracle Cloud.” [Online]. Available: https://cloud.oracle.com/
[12] “Rackspace.” [Online]. Available: http://www.rackspace.com/
[13] J. Zaddach, A. Kurmus, D. Balzarotti, E.-O. Blass, A. Francillon, T. Goodspeed, M. Gupta, and I. Koltsidas, “Implementation and implications of a stealth hard-drive backdoor,” in Proc. 29th Annual Computer Security Applications Conf., 2013, pp. 279–288.
[14] C. Kallenberg and R. Wojtczuk, “Speed racer: Exploiting an Intel flash protection race condition,” January 2015. [Online]. Available: https://bromiumlabs.files.wordpress.com/2015/01/speed_racer_whitepaper.pdf
[15] L. Duflot, Y.-A. Perez, and B. Morin, “What if you can’t trust your network card?” in Proc. 14th International Conf. on Recent Advances in Intrusion Detection, 2011, pp. 378–397.
[16] Y. Li, J. M. McCune, and A. Perrig, “VIPER: Verifying the Integrity of PERipherals’ firmware,” in Proc. 18th ACM Conf. on Computer and Communications Security, 2011, pp. 3–16.
[17] F. Zhang, H. Wang, K. Leach, and A. Stavrou, “A framework to secure peripherals at runtime,” in Proc. 19th European Symp. on Research in Computer Security, 2014, pp. 219–238.
[18] T. Shinagawa, H. Eiraku, K. Tanimoto, K. Omote, S. Hasegawa, T. Horie, M. Hirano, K. Kourai, Y. Oyama, E. Kawai, K. Kono, S. Chiba, Y. Shinjo, and K. Kato, “BitVisor: A thin hypervisor for enforcing I/O device security,” in Proc. 2009 ACM SIGPLAN/SIGOPS International Conf. on Virtual Execution Environments, 2009, pp. 121–130.
[19] “CHIPSEC: Platform security assessment framework.” [Online]. Available: https://github.com/chipsec/chipsec
[20] “ethtool – Utility for controlling network drivers and hardware.” [Online]. Available: https://www.kernel.org/pub/software/network/ethtool/
[21] “Red Hat Bugzilla – Bug 459202 EEPROM/NVM of the e1000e becomes corrupted.” [Online]. Available: https://bugzilla.redhat.com/show_bug.cgi?id=459202
[22] “Serious e1000e driver issue in SLE 11 beta 1 and openSUSE 11.1 beta 1.” [Online]. Available: https://news.opensuse.org/2008/09/22/serious-e1000e-driver-issue-in-sle-11-beta-1-and-opensuse-111-beta-1/
[23] “Status of the e1000e issue.” [Online]. Available: https://news.opensuse.org/2008/10/03/status-of-the-e1000e-issue/
[24] G. Delugré, “How to develop a rootkit for Broadcom NetExtreme network cards,” in Recon, 2011. [Online]. Available: http://esec-lab.sogeti.com/static/publications/11-recon-nicreverse_slides.pdf
[25] Intel Corporation, “Hacking Team’s “Bad BIOS”: A commercial rootkit for UEFI firmware?” Tech. Rep., 2015. [Online]. Available: http://www.intelsecurity.com/advanced-threat-research/ht_uefi_rootkit.html_7142015.html
[26] “DEITYBOUNCE.” [Online]. Available: https://www.eff.org/files/2014/01/06/20131230-appelbaum-nsa_ant_catalog.pdf
[27] “Comment on Der Spiegel article regarding NSA TAO organization.” [Online]. Available: http://en.community.dell.com/dell-blogs/direct2dell/b/direct2dell/archive/2013/12/30/comment-on-der-spiegel-article-regarding-nsa-tao-organization
[28] P. M. Chen and B. D. Noble, “When virtual is better than real,” in Proc. 8th Workshop on Hot Topics in Operating Systems, 2001, pp. 133–138.
[29] K. Kourai and S. Chiba, “HyperSpector: Virtual distributed monitoring environments for secure intrusion detection,” in Proc. 1st ACM/USENIX International Conf. on Virtual Execution Environments, 2005, pp. 197–207.
[30] K. Asrigo, L. Litty, and D. Lie, “Using VMM-based sensors to monitor honeypots,” in Proc. 2nd International Conf. on Virtual Execution Environments, 2006, pp. 13–23.
[31] J. Yang and K. G. Shin, “Using hypervisor to provide data secrecy for user applications on a per-page basis,” in Proc. 4th ACM SIGPLAN/SIGOPS International Conf. on Virtual Execution Environments, 2008, pp. 71–80.
[32] S. T. Jones, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “VMM-based hidden process detection and identification using Lycosid,” in Proc. 4th ACM SIGPLAN/SIGOPS International Conf. on Virtual Execution Environments, 2008, pp. 91–100.
[33] Y. Chubachi, T. Shinagawa, and K. Kato, “Hypervisor-based prevention of persistent rootkits,” in Proc. 2010 ACM Symp. on Applied Computing, 2010, pp. 214–220.
[34] D. Kirat, G. Vigna, and C. Kruegel, “BareBox: Efficient malware analysis on bare-metal,” in Proc. 27th Annual Computer Security Applications Conf., 2011, pp. 403–412.
[35] Y. Bulygin and D. Samyde, “Chipset based approach to detect virtualization malware,” BlackHat Briefings USA, 2008.
[36] G. Vasiliadis, M. Polychronakis, and S. Ioannidis, “GPU-assisted malware,” in Proc. 5th International Conf. on Malicious and Unwanted Software, Oct. 2010, pp. 1–6.
[37] “BitVisor: A single-VM lightweight hypervisor.”
[38] “Intel X99 Chipset Platform Controller Hub Datasheet.” [Online]. Available: https://www.intel.com/content/www/us/en/chipsets/x99-chipset-pch-datasheet.html
[39] “Intel 82574 gigabit ethernet controller family: Datasheet.” [Online]. Available: https://www.intel.com/content/www/us/en/embedded/products/networking/82574l-gbe-controller-datasheet.html
[40] “The netperf homepage.” [Online]. Available: http://www.netperf.org/netperf