CARMA: a Hardware Tamper-Resistant Isolated Execution Environment on Commodity X86 Platforms
Total Page:16
File Type:pdf, Size:1020Kb
CARMA: A Hardware Tamper-Resistant Isolated Execution Environment on Commodity x86 Platforms Amit Vasudevan Jonathan McCune James Newsome Adrian Perrig Leendert van Doorn CyLab / Carnegie Mellon University Advanced Micro Devices famitvasudevan, jonmccune, [email protected] jnewsome, [email protected] Abstract Attack Type Description Level Much effort has been spent to reduce the software Trusted AL0 Remote Malware in application Computing Base (TCB) of modern systems. However, the AL1 Remote Malware in Operating System hardware TCB remains complex and untrustworthy. Com- AL2 Remote Malware in devices’ firmware AL3 Local Physical attachment of malicious peripheral equip- ponents such as memory, peripherals, and system buses may ment (e.g., FireWire, Thunderbolt, PC Card, or PCIe become malicious via firmware compromise, a malicious card) manufacturer, a malicious supply chain, or local physical AL4 Local Low-speed bus (e.g., LPC bus, 33 MHz) tampering, tampering. We seek to reduce the hardware TCB to a min- e.g., interpositioning or TPM reset attacks [8, 20] AL5 Local High-speed bus (e.g., memory or PCI bus, 1 GHz or imal set of hardware components that must be trusted. We more) tampering, e.g., eavesdropping to attack data describe the design and implementation of an isolated execu- secrecy, injection to attack code and data integrity tion environment on commodity x86 platforms that only re- AL6 Local Malicious CPU lies on the CPU, without needing to trust the memory, buses, peripherals, or any other system components. AL0 AL1 AL2 AL3 AL4 AL5 AL6 Operating system primitives Categories and Subject Descriptors D.4.6 [Software]: Op- Information flow controlled OS Y Y Verified Kernel Y Y erating Systems-Security and Protection Programming language primitives Software Fault Isolation (SFI) Y Y General Terms Security, Systems Sandboxes (NACL) Y Y Hardware primitives Keywords TCB reduction, Secure Execution, On-die Exe- DRTM (Flicker [13], TV [12]) Y Y Y Y cution, Cache-as-RAM TRESOR [14] Y Y Specialized processor (Mon- YYYYYY driaan [25], Aegis [22], 1. Introduction Cerium [1]) Much effort has been spent to reduce the software Trusted CARMA (this paper) Y Y Y Y Y Y Cryptographic primitives Computing Base (TCB) of modern systems. However, there Secure Multi-Party Computa- YYYYYYY remains a large and complex hardware TCB, including tion [26] memory, peripherals, and system buses. There are many re- alistic adversary models where this hardware may be ma- Figure 1. Adversary attack levels on commodity x86 plat- licious or compromised. Thus, there is a practical need to forms and comparison of existing approaches for defending determine whether we can achieve secure program execu- against them tion in the presence of not only malicious software, but also malicious hardware. Specifically, we consider the adversary attack levels as shown in Figure 1. These attack levels correspond to an Permission to make digital or hard copies of all or part of this work for personal or increasing level of attack cost and complexity. classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation The current frontier in remote attacks (AL2) is to attack on the first page. To copy otherwise, to republish, to post on servers or to redistribute peripheral devices. Arrigo et al. demonstrate how a com- to lists, requires prior specific permission and/or a fee. ASIACCS ’12, May 2–4, 2012, Seoul, Korea. promised network card can use the GPU via peer-to-peer Copyright c 2012 ACM 978-1-4503-1303-2/12/05. $10.00 PCI communication, creating a botnet node with powerful computation and communication abilities, completely out of System reach of the OS [23]. For local attacks, simply booting from removable media Memory CPU Execute or plugging in a malicious device that obtains DMA access Cache Secure Execution to all of memory (AL3) is feasible without tools and with a Environment SEEL Attestation Loader (SEEL) trivial amount of training. More sophisticated attacks require Setup and Secure opening the computer system and directly attaching to a bus, Channel Setup either by resetting the bus or by interposing on the bus sig- Load/Map Secure nals (AL4). The much faster speed of the memory and PCI Executable buses requires more sophistication in terms of equipment Code (SEC) and attacker skill (AL5). Supply-chain infiltration may be a Secure Executable Code more realistic source of such malicious components [15, 24]. blob While a simple handheld device (e.g., a smartphone) could SEEL Secure be programmed to interface with low-speed buses, special- Return ized equipment is needed for the high-speed buses. Given this framework for increasing attack levels, it is interesting to consider existing approaches for defending Figure 2. CARMA Architecture: A Secure Execution Envi- against these threats. Figure 1 describes categories of cur- ronment, running a Secure Executable Code (SEC) module, rent defense mechanisms, and which attack levels they are is created entirely within CPU cache. An external Trusted effective against. Device forms the initial root-of-trust to verify environment TRESOR [14] is one recent system intended to defend instantiation, SEC load, and SEC execution. The TCB in- againt physical cold-boot attacks where memory chips are cludes only the CPU and the Trusted Device. frozen and extracted from a running system, by storing cryp- tographic keys in debug registers within the CPU. However, TRESOR assumes code integrity (i.e., that the attacker is un- point integrity, and data integrity/secrecy, with a dramat- able to modify the contents of memory while the system is ically reduced TCB compared to previous work, so that still running). We grant the attacker this capability in AL3 only the CPU needs to be trusted. Consequently, our desired and above, and seek to achieve code integrity even against properties will hold even if all other hardware components such stronger adversaries. are malicious, including the TPM, peripherals, memory, etc. Secure co-processor hardware achieves strong secrecy Since the CPU is the primary system component that exe- and integrity properties against even malicious peripherals cutes instructions, it is natural to trust it for execution. and buses (up to AL5), but these solutions represent expen- Figure 2 shows our high-level architecture. Our central sive specialized hardware [1, 22, 25]. idea is to execute entirely inside the CPU enclosure, lever- Secure multi-party communication can address all of our aging a special execution mode called Cache-as-RAM. This proposed attack levels, including malicious CPUs (AL6), allows the CPU to act as a self-contained computing environ- and does not require any trusted hardware. However, its ment, without having to rely on external untrusted RAM. We highly inefficient computation (about six orders of magni- use a simple external trusted device (TD) to act as a root of tude slowdown) prohibits viable deployment today [3]. trust to bootstrap the environment. We employ a SoftWare- Contributions. The challenge that we tackle in this paper Only Root of Trust (SWORT) mechanism to verify that the is how to defend against a sophisticated adversary with phys- intended code is loaded, and to establish a shared secret that ical access to the host; i.e., AL5. Our goal is to achieve ef- we use to create a secure communication channel. ficient execution, code integrity, data integrity and secrecy against a class AL5 attacker on commodity hardware. To- 2.1 Background: Cache-as-RAM (CAR) wards this end we have developed CARMA, a secure exe- Commodity CPUs include a cache subsystem to reduce the cution primitive that removes system buses and peripherals average time to access memory (DRAM). The cache is a from the TCB, requiring us to trust only the CPU and a sim- smaller, faster, on-die memory which stores copies of the ple inexpensive external verification device. We implement data from the most frequently used DRAM locations. Most this basic primitive on an off-the-shelf PC (x3). modern desktop and server CPUs have at least three inde- pendent caches: an instruction cache to speed up executable 2. CARMA Design instruction fetch, a data cache to speed up data fetch and On current x86 hardware platforms, the TCB for trustworthy store, and a translation lookaside buffer (TLB) used to speed execution primitives includes various low-level system com- up virtual-to-physical address translation for both executable ponents (CPU, memory-controller, IOMMU, TPM, buses). instructions and data. These caches are usually organized as The goal of CARMA is to achieve code integrity, launch a hierarchy of one or more cache levels (L1, L2 and L3). During system startup, the BIOS first initializes the CPU, securely attest the loaded code to the TD. Since the CPU the cache subsystem, the memory controller (DRAM) and has no on-board hardware root of trust (e.g., the TPM is then proceeds to initialize other hardware platform elements external, across an untrusted bus), we use a SoftWare-Only (PCI, SMBus, peripherals, etc.) before transferring control Root of Trust [2, 4, 7, 9, 10, 18, 19, 21]. We extend the to the operating system. The BIOS is stored and accessed as Pioneer [17] system, which is an implementation of SWORT read-only memory on current platforms.1 This presents an on x86-class systems. In our setting, the TD is aware of the interesting problem to the early stages of the BIOS execu- expected memory content of the cache and sends a challenge tion (CPU, cache and DRAM initialization) which require to the SEEL. The checksum function in the SEEL includes a read-write data area and stack.2 To alleviate this problem, execution state of the CAR environment in the checksum CPUs allow portions of the cache subsystem to be addressed computation.