A Low-Cost Memory Remapping Scheme for Address Protection

Lan Gao, Jun Yang, Marek Chrobak, Youtao Zhang*, San Nguyen, Hsien-Hsin S. Lee†

Computer Science and Engineering Department, University of California, Riverside, Riverside, CA 92521
{lgao,junyang,marek,snguyen}@cs.ucr.edu

*Computer Science Department, University of Pittsburgh, Pittsburgh, PA 15260
[email protected]

†School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332
[email protected]

ABSTRACT

The address sequence on the processor-memory bus can reveal abundant information about the control flow of a program. This can lead to critical information leakage such as encryption keys or proprietary algorithms. Addresses can be observed by attaching a hardware device on the bus that passively monitors bus transactions. Such side-channel attacks deserve rising attention, especially in a distributed computing environment where remote servers running sensitive programs are not within the physical control of the client.

Two previously proposed hardware techniques tackled this problem by randomizing the address patterns on the bus. One proposal permutes a set of contiguous memory blocks under certain conditions, while the other randomly swaps two blocks when necessary. In this paper, we present an anatomy of these attempts and show that they impose great pressure on both the memory and the disk. This leaves them less scalable in high-performance systems where the bandwidth of the bus and memory are critical resources. We propose a lightweight solution that alleviates the pressure without compromising the security strength. The results show that our technique can reduce the memory traffic by a factor of 10 compared with the prior scheme, while keeping almost the same page fault rate as a baseline system with no security protection.

Categories and Subject Descriptors: C.1 [Processor Architectures]: Miscellaneous; K.6 [Management of Computing and Information Systems]: Security and Protection

General Terms: Design, Performance, Security.

Keywords: Secure Processor, Address Bus Leakage Protection.

1. INTRODUCTION

In typical computer systems, host machines are protected from the exploits of malicious applications. For example, applications downloaded from unauthorized sources may contain malicious code such as viruses and worms. We address the other side of the problem, where the confidentiality of an application should be protected from being revealed by the host machine itself. This is an often overlooked but realistic problem in prevalent computing models such as distributed computing.

In distributed computing, for instance, application tasks are dispatched to geographically distributed machines that are often contributed by volunteers. Even though distributed security protocols authenticate and authorize resources and users, there is no protection mechanism once a job starts executing on a remote node. It is difficult to recognize whether the execution was tampered with, or whether any secret it carries was compromised locally. A host machine, having full access to all the local resources, can launch a background program to analyze the execution of an active job and extract confidential information such as the encryption key [21]. Also, the memory contents could be altered, either through software breaches or by privileged users, so that the computation time of a job is lengthened [12]. This is especially harmful in a commercial environment where resources and services are charged by the hour [2].

There have been a number of proposals that address attacks from a host machine on its guest programs. The attacks can originate from a privileged user [12, 18, 27], a normal user [21], or even physical accesses [12, 18, 27, 31, 33]. A common countermeasure is to encrypt memory contents to prevent secrecy violation, and to authenticate the memory at runtime to prevent misbehavior induced by memory corruption during execution. Both can be provided by a secure processor architecture such as XOM [18] or AEGIS [27].

In addition, the dynamic execution address sequence can be observed locally and analyzed to expose a program's critical information, such as the private key in the RSA algorithm [15]. Losing a key allows an attacker to impersonate a trusted user or to delegate a victim's access right to other malicious users. Exposing the dynamic address sequence can also reveal the program's control flow information, enabling a commercial software developer to reconstruct a core algorithm from a competitor [13, 14, 32, 33]. Hardware tapping devices for extracting addresses from the system bus, or even injecting traffic into the bus, are readily available. For example, an FPGA-based programmable device can be attached to an SMP bus to perform on-line emulations taking real-time bus traffic from real workloads [20]. There are commercially available mod-chips that can be soldered onto the bus of a Sony PlayStation or Microsoft Xbox to misguide the console into playing pirated games [4].

Several techniques have been designed to combat address information leakage. Goldreich et al. proposed three program transformation methods [13, 14] to construct a data-oblivious memory access pattern. Unfortunately, these software-only methods suffer from either a great performance penalty or memory explosion. Recently, two hardware-assisted schemes have been developed to trim down the overhead. The HIDE scheme [33] permutes a memory subspace at certain intervals by reading it on-chip and writing it back after permutation. The Shuffle scheme [32] randomly shuffles the memory content whenever a block is read on-chip, so that it will be written to a randomly different location later on. Both schemes try to randomize the address sequence appearing on the bus to hide easily recognized memory access patterns such as loops.

We have observed that the HIDE scheme increases memory accesses by a factor of 12 to 32 for page sizes between 4KB and 64KB, due to sequential reads/writes and redundant permutations. On the other hand, the Shuffle scheme could induce a large number of page faults in high-performance systems with memory paging. Designs that significantly increase the memory or disk demand do not scale well in future machines incorporating multi-threaded or multi-core processors, in which memory and disk bandwidth are both first-order constraints. It is therefore compelling to take those constraints into consideration while designing a secure architecture.

In this paper, we address the two main problems of the aforementioned security methods: the memory access increase and the page fault increase. These problems are mainly due to the following reasons: (1) excessive memory reads and writes on every permutation, (2) wasteful permutations, and (3) unrestricted block relocation. We propose a lightweight on-chip address permutation that effectively addresses all three problems and achieves the lowest memory demand and page fault occurrence. Our main idea is to permute only on-chip cached blocks, to avoid the memory sweeps in HIDE, and to launch a permutation only for those addresses that have not been remapped since they were read on-chip. Our scheme incurs only a 0.88x increase in memory accesses and a close-to-base page fault rate, without compromising the security strength.

The remainder of this paper is organized as follows. Section 2 describes the motivation of this work. Section 3 gives an overview of the two schemes that we improve, together with an in-depth analysis of them. Section 4 introduces our proposed scheme, followed by its architecture design issues in Section 5. Section 6 presents our experimental results. Section 7 discusses the related work and Section 8 concludes this paper.

2. MOTIVATION

The address sequence recorded from the CPU-memory bus may disclose the control flow information of a program under execution. This is true even in secure processors such as XOM [18] or AEGIS [27], in which the memory contents are all encrypted (the CPU-memory data bus transfers only ciphertext) but the addresses are left in plaintext. Such a plaintext address sequence can lead to critical information leakage and is the main problem we tackle in this paper.

First of all, the sequence can be split into code and data sequences with explicit reads and writes, because code and data reside in separate memory regions. Code regions are read-only and typically accessed sequentially at the start-up of the program execution. The obtained code sequence shows the control transfers only at a coarse granularity, since most instruction reads are serviced by the on-chip caches. This is not difficult to circumvent, as on-chip caching can be disabled by setting proper control register bits [6], or minimized by running a concurrent thread that competes for the shared cache with the victim thread [21, 22].

The sequence obtained hereafter can be used to derive the control flow graph (CFG), i.e., the directed graph that shows the transfers among instruction basic blocks. A sequence of "abc abc abc" clearly shows a loop, with "a" possibly being the loop-starting and "c" the loop-ending instruction, whereas a sequence of "abcd abd abcd abd" indicates a conditional branch after "b" inside a loop containing "a", "b", "c" and "d". Most software nowadays has a high percentage of reused code [19], i.e., code that reuses pre-built libraries from the public domain or a third party. In other words, the reused portions of a piece of software can be identified once their CFGs are constructed. This could ease the identification of the non-reused part of the code, leading to potential intellectual property theft.

More seriously, the timing attacks on Diffie-Hellman, RSA and other security algorithms exploit the actual directions of a branch instruction inside a simple loop to reveal the private key bit by bit [15]. The loop iterates a number of times equal to the bit width of the private key. Once the address sequence of the loop is exposed, the private key can be recovered.

Finally, addresses to the data region can also expose control flow, since some data are only accessed by one path of the program. Therefore, protecting the data addresses should be carried out together with the code addresses. Next, we briefly review two existing algorithms for address sequence protection, followed by a performance evaluation of each.

3. OVERVIEW OF ADDRESS SEQUENCE PROTECTION

The basic idea of address sequence protection is to break the correlation between the address sequence and the CFG of the program. For example, the sequence "a a+4 a+8" should not correspond to sequential instructions in the code, and a sequence of "abc abc" should not indicate a loop structure. The prior schemes (HIDE and Shuffle) incorporated randomization of the memory contents from time to time. Next, we explain the processor-memory model and the hardware support assumed in those schemes.

3.1 Model and premises

We choose the secure processor models proposed recently [16, 18, 27] as our foundation. Both HIDE and the Shuffle scheme fall into this framework. In such a secure model, the processor is physically secure: once data and code are brought onto the chip, they cannot be tampered with. Any data sent off-chip to the memory are always encrypted to ensure their confidentiality. Therefore, attacks can only be mounted on components external to the processor, such as the buses and the memory. Since the crypto operations are always performed between the processor and the memory, we assume a fast crypto mechanism is available on-chip to accelerate them [26, 28, 30].

3.2 Chunk level permutation

The basic idea of the HIDE technique [33] is to permute the address space at suitable intervals to break the correlation between repeated memory addresses. Ideally, before an address recurs, a permutation is initiated to map it to a different location. In reality, it is not practical to remember all the addresses between permutations, nor is it efficient to search the history list to detect a recurrence. Hence, HIDE proposes to augment the L2 cache replacement policy with an additional locking control: a block that is newly read from the memory, or has become dirty since its last permutation, cannot be replaced until a permutation is performed. In both cases its address could otherwise recur on the bus in future reads if the block were allowed to be replaced earlier. Hence, those entries are temporarily locked in the cache, and only unlocked blocks can be freely replaced. Each permutation permutes a chunk (one to several pages) of contiguous blocks, mapping each block to a different address within the chunk, so that recurring addresses on the bus may not indicate the same block. Fig. 1 shows how this scheme works.

We assume a two-way set-associative cache with two sets, and a two-page memory with eight blocks in each page. Each chunk consists of one page, with even-addressed blocks mapped to set 0 and odd-addressed blocks mapped to set 1. Initially the cache is empty and no block is locked. When the CPU reads blocks 0, 1, 2, 3, 8, 0, the actual sequence on the bus is 0, 1, 2, 3, the permutation traffic, 8, pi1(0), where pi1(0) is the permuted address of block 0. The permutation traffic consists of sequential reads and writes of all blocks in page 0, and this permutation is triggered when block 8 replaces the locked block 0. After permuting page 0, all cached blocks of page 0 have their locks cleared. Any block brought on-chip afterwards is locked.

Next, when the CPU writes blocks 1 and 3, both are locked again in the cache. The last read, of block 9, causes a writeback of block 1. Since block 1 is locked, a second permutation of page 0 is initiated, during which the locks on blocks 0 and 3 are cleared.

Figure 1: Example of the HIDE scheme. (a) Initial states of cache and memory; (b) read sequence from CPU: 0, 1, 2, 3, 8, 0 (blocks 0-3 and 8 become locked; permuting page 0 clears the locks); (c) write sequence from CPU: 1, 3 (locks set again); (d) read sequence from CPU: 9 (second permutation of page 0).

By permuting the entire chunk, control transfers within the chunk are invisible on the bus, thus reducing the likelihood of information leakage. However, the strengthened security comes at a high cost in memory accesses, as all the blocks in the chunk are swept through (read on-chip and then written off-chip) on every permutation. In Fig. 2, we categorize the HIDE memory accesses into true memory requests ("true") and permutation-induced accesses ("perm"). All accesses are normalized to the first bar, for a 4KB chunk size. It is surprising to see that the true and useful memory accesses account for only 7.5% and 3% of the total for 4KB and 64KB chunk sizes, respectively. In other words, the HIDE scheme increases the memory demand by a factor of 12 (4KB) or 32 (64KB). Using larger chunk sizes has its own advantages: (1) it further reduces the control flow exposed on the bus, and (2) it reduces the frequency of permutations, since the chances of clearing the locks are higher. However, the dramatic increase in memory traffic with large chunks creates a serious bottleneck in the system, which overrides the security benefit. We break the extra traffic down into two sources.

Figure 2: Memory traffic breakdown for different chunk sizes (4KB to 64KB), normalized to the 4KB chunk size.

Excessive memory accesses on permutation. Fig. 2 shows that most memory accesses in HIDE are useless to the program; they are redundant accesses made simply to hide the traces of the useful memory addresses. Here, "useful blocks" refers to those that have been accessed by the program. We observed that this quantity is fairly low between two permutations, an indication of large redundancy in HIDE. In Fig. 1(b), only 4 blocks in page 0 have been accessed when the permutation is invoked. During the permutation, they are read on-chip again, together with the other half that was never touched; then the whole page is written back, increasing the traffic fourfold. We studied the excessive accesses in a real system where the page size is 4KB and each page consists of 128 32-byte blocks. Fig. 3 is a histogram of pages by the number of blocks accessed, from zero to 128, upon a permutation. That is, when a new permutation of a page is initiated, if that page has had i blocks accessed since its last permutation, we increment the i-th bar of the histogram. The graph is averaged over 11 SPEC2K benchmark programs for a 1.1B-instruction run. We can see that only 17% of pages are fully accessed between two permutations. For those pages, efficiency is good, since no useless memory blocks are read. However, 65% of the pages triggering a permutation have had less than a quarter of their blocks accessed.

Figure 3: Histogram of pages with 0-128 blocks accessed between permutations.

In fact, if we take a global view of block usage in a page over the entire simulation, the percentage of pages that are fully used is quite high (Fig. 4). More than half (57%) of the total pages are fully used, much more than in Fig. 3, because some blocks inside those pages are recalled before the rest are touched. Each recall triggers a scan of the entire page, and the unaccessed portion of the page can be permuted multiple times due to multiple such recalls. Using the example in Fig. 1: if blocks 4-7 are accessed in the future, then page 0 will eventually be fully used; however, before this can happen, the page has already been permuted twice.

Figure 4: Histogram of pages with 0-128 blocks accessed during the entire simulation.

Redundant permutations. Due to the difficulty of tracking whether an address is a repeated access, a permutation is triggered preventively. That is, before a dirty block is written back to the memory, a permutation of the hosting page is triggered, anticipating that the block will be read again in the future. This is why the second permutation in Fig. 1(d) is performed. Such write-lock-induced triggers cause multiple permutations where only one is necessary. The example below illustrates such a scenario.

Let blocks A, B, C belong to page P. They are initially read on-chip and then locked. When B has to be replaced, a permutation of P is triggered, since B holds a read lock. Afterward, A, B, C are all mapped to different addresses; B is written back to a new address while the others remain on-chip. The permutation also clears their read locks. Suppose a later write locks A again. When A is replaced at some point, a second permutation of P is initiated, which is unnecessary because A was already mapped to a new location in the first permutation. Similarly, B and C are both re-mapped in the second permutation, which is also redundant. Note that between permutations, no access to page P is necessary, indicating that a page could be permuted repeatedly without even a single access. This also explains why the 0th bar in Fig. 3 is positive (0.46%).

3.3 Memory shuffle

A simpler scheme was proposed in [32] for embedded systems, since they are more vulnerable to address attacks. The approach is to relocate a block when it is brought on-chip, so that it will be written to and read from a different memory address in the future. The new address is chosen randomly from the program's memory space. To implement this efficiently, a small portion of the memory blocks is stored in an on-chip shuffle buffer. A random block is selected from this buffer to swap with the requested block read on-chip. Effectively, a memory read is always followed by a memory write, where the read is demanded by the processor while the write is a swap randomly picked from the buffer. The block being read still resides in the buffer (as well as in the caches), so the shuffle buffer always holds a fixed number of memory blocks for swapping.
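A minimal model of this swap rule follows. It is our own sketch, not code from [32]: memory[] records which block currently lives at each address, and a read that misses the buffer pairs the demanded read with a writeback of a random buffer resident. Because the victim is drawn at random, the final layout is one of many possible ones.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define BUF_SLOTS  4         /* on-chip shuffle buffer size, as in Fig. 5 */
    #define MEM_BLOCKS 16        /* two 8-block pages, as in Fig. 5 */
    #define EMPTY UINT32_MAX

    static uint32_t buffer[BUF_SLOTS];   /* block numbers held on-chip */
    static int buffered = 0;

    static int find_home(uint32_t b, const uint32_t memory[]) {
        for (int a = 0; a < MEM_BLOCKS; a++)
            if (memory[a] == b) return a;
        return -1;
    }

    /* CPU read of block b: a demanded read is paired with a writeback of
     * a randomly picked buffer resident to b's old slot. */
    static void shuffle_read(uint32_t b, uint32_t memory[]) {
        for (int i = 0; i < buffered; i++)
            if (buffer[i] == b) return;   /* buffer hit: no bus traffic */
        int home = find_home(b, memory);  /* demanded memory read */
        if (buffered < BUF_SLOTS) {       /* warm-up: fill the buffer */
            buffer[buffered++] = b;
            memory[home] = EMPTY;
            return;
        }
        int v = rand() % BUF_SLOTS;       /* random victim */
        memory[home] = buffer[v];         /* write victim to b's old address */
        buffer[v] = b;                    /* b now lives in the buffer */
    }

    /* Program exit: drain the buffer into the first empty memory slots. */
    static void shuffle_finish(uint32_t memory[]) {
        for (int i = 0; i < buffered; i++)
            for (int a = 0; a < MEM_BLOCKS; a++)
                if (memory[a] == EMPTY) { memory[a] = buffer[i]; break; }
    }

    int main(void) {
        uint32_t memory[MEM_BLOCKS];
        for (uint32_t a = 0; a < MEM_BLOCKS; a++) memory[a] = a;
        uint32_t trace[] = {0, 1, 2, 3, 8, 1, 3, 9};   /* accesses of Fig. 5 */
        for (int i = 0; i < 8; i++) shuffle_read(trace[i], memory);
        shuffle_finish(memory);
        for (int a = 0; a < MEM_BLOCKS; a++)
            printf("m[%2d] = %2u\n", a, memory[a]);
        return 0;
    }

A bus observer sees one read plus one write per buffer miss, but cannot tell which buffered block the write carries.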

Fig. 5 shows how this scheme works. We assume the on-chip shuffle buffer stores only 4 blocks. The first 4 accesses fill up the buffer. Starting from the fifth access, a random swap of blocks between memory and the shuffle buffer is performed on every read, for example, the swap of block 8 and block 2. If an access hits in the buffer (e.g., blocks 1 and 3), no swapping is necessary. Finally, all the blocks in the shuffle buffer are written back to the first four empty slots in memory.

    Accesses   Memory (Page 0 | Page 1)
    Start      0 1 2 3 4 5 6 7 | 8 9 ... 15
    0,1,2,3    0 1 2 3 4 5 6 7 | 8 9 ... 15
    8          0 1 8 3 4 5 6 7 | 2 9 ... 15
    1,3        0 1 8 3 4 5 6 7 | 2 9 ... 15
    9          0 9 8 3 4 5 6 7 | 2 1 ... 15
    Finish     0 9 8 3 4 5 6 7 | 2 1 ... 15

Figure 5: Example of the Shuffle scheme (adapted from [31]).

As shown, the memory traffic in the Shuffle scheme (as we will term it in this paper) is only two times that of an unprotected system, because a read is always followed by a write. This overhead is significantly lower than in the HIDE scheme. However, blocks are swapped within the entire program memory space, e.g., the swap of block 8 in page 1 and block 2 in page 0. So this scheme needs to remember the block mapping, i.e., to which address each block is mapped, using the full-width block address, whereas in the HIDE scheme only the offsets within the chunk need to be remembered, as blocks are re-mapped only within a chunk. The storage overhead of the Shuffle scheme is therefore greater than in the HIDE scheme (10% vs. 3.5%, as reported previously).

More importantly, in order to make memory accesses look "random" within its memory space, the Shuffle scheme shuffles blocks across the entire program memory. As a result, it destroys the program's locality, and blocks in hot pages may be mapped to cold pages. Eventually, in the long run, all pages are roughly equally warm. This may increase the page faults if not all pages are in memory. If the Resident Set Size (RSS)² is 100%, i.e., all pages of the program reside in memory, such randomization will not incur extra page faults, and all page faults are cold page faults. However, if the RSS is 50%, i.e., at any time of the program execution only half of the pages reside in memory, any access to a page not in memory incurs a page fault. In this case, the random swapping of blocks will increase the page faults.

²The number of virtual pages resident in RAM.

Fig. 6 plots the trend of page faults versus the RSS for gcc. As we can see, the number of page faults increases by 257x when the RSS drops from 100% to 50% of the total memory pages, whereas the curve is almost flat for the base case. This shows that preserving locality is critical to program performance. Similar patterns can be observed in other programs as well. Here we assumed a 4KB page size and a perfect LRU memory page replacement policy.

Figure 6: Page fault curve for gcc (number of page faults vs. Resident Set Size, in % of memory footprint, for the base case and Shuffle).

However, if the shuffling happens to keep or improve the program's locality (e.g., if page 0 and page 1 in Fig. 5 are always accessed together), the number of page faults could remain the same or even decrease. As we can see for gzip in Fig. 7, the number of page faults in the Shuffle scheme is the same as in the base case. This is because the shuffle buffer of gzip stores mainly those blocks within the current working set, owing to its special memory access pattern.

Figure 7: Page fault curve for gzip.

It is also worth noting that the Shuffle scheme can introduce some extra memory writes, because the shuffle buffer effectively holds the most recently fetched memory blocks. On a new read from an address Addr, for example, a block B is randomly picked from the shuffle buffer and written back to Addr.

However, B itself might be too recently fetched to have been updated by the processor. Later, when B is updated, it needs to be written back again, which means the earlier write was unnecessary. Our measurements show that such redundant writes account for 5% of the total memory traffic on average.

3.4 Summary

The HIDE scheme permutes blocks only within a chunk, practically one page, and hence has little impact on memory paging. However, it incurs an extremely high overhead of memory accesses, which makes it hard to fit into contemporary processors, where memory is still a performance bottleneck and one of the most power-hungry components. The Shuffle scheme, on the other hand, introduces mild extra memory demands, yet its demand on disk accesses limits it to embedded systems, where most applications have small memory footprints. Accessing memory is usually several orders of magnitude faster than accessing even the most technologically advanced hard drives [5]. Hence, in a demand paging system, it is important to keep the page fault rate low. We next introduce our lightweight design, in which both the memory accesses and the page faults are low.

4. PROPOSED INEXPENSIVE ADDRESS PERMUTATION

Our scheme aims at achieving three goals. The first goal is to avoid wasteful memory reads and writes in each permutation. The second goal is to eliminate wasteful permutations so as to reduce the total number of permutations. The third goal is to preserve locality and keep the page fault rate low. The approach we take is to permute selected blocks instead of a whole page, aided by a good decision of when a permutation should happen.

4.1 The permutation mechanism

The idea. We propose to perform permutations only on on-chip blocks. Their addresses have occurred once on the bus, and could recur if they are evicted and read in again in the future; it is therefore necessary to relocate them (i.e., permute them) before they are replaced. However, not every replacement should be preceded by a permutation: if the victim block has already been permuted and mapped to a different address, it is safe to release it off-chip. Only those blocks that were recently read on-chip (RR blocks) but have not participated in any permutation need to be permuted. More precisely, a block B of a page P is an RR block if B was fetched from memory after P's last permutation. Hence, a permutation is started only when an RR block is about to be replaced. Afterward, all the RR blocks involved in the permutation become normal blocks, which can be safely released off-chip, because the permutation has masked the earlier traces of accesses and their addresses have been mapped to other random places.

Since a permutation involves only on-chip blocks, the need to read chunks of blocks from memory and write them back, as in HIDE, is eliminated. This is the main reason why our scheme saves most of the memory traffic. Moreover, a great advantage of doing so is that we do not need to update the memory right away. Since the permuted blocks are on-chip, we simply permute their addresses and remember the mapping. When they need to be replaced some time in the future, they consult the mapping to obtain the new addresses to which they should be written. Note that we write every replaced block back to memory, because it has been relocated to a different address, even if it is not dirty. The core replacement rule is sketched below.
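The following is a sketch of the RR-block rule under stated assumptions: per-line RR bits, a Fisher-Yates draw over the on-chip blocks of the victim's page, and a hypothetical writeback() standing in for the encrypted bus write. The mapping update models what the Block Address Table of Section 5 stores; no block data moves during the permutation itself.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct {
        uint32_t vaddr;   /* current (already remapped) virtual block address */
        bool     rr;      /* fetched after its page's last permutation */
        bool     valid;
    } Line;

    static void writeback(uint32_t a) { (void)a; /* encrypted bus write */ }

    /* Permute only the addresses of on-chip blocks (n <= 128): no memory
     * traffic occurs here; the new mapping is simply remembered on-chip. */
    static void permute_onchip(Line *set[], int n) {
        uint32_t addr[128];
        for (int i = 0; i < n; i++) addr[i] = set[i]->vaddr;
        for (int i = n - 1; i > 0; i--) {          /* Fisher-Yates shuffle  */
            int j = rand() % (i + 1);              /* stand-in for true RNG */
            uint32_t t = addr[i]; addr[i] = addr[j]; addr[j] = t;
        }
        for (int i = 0; i < n; i++) {
            set[i]->vaddr = addr[i];               /* mapping kept on-chip  */
            set[i]->rr    = false;                 /* RR blocks turn normal */
        }
    }

    /* Eviction hook: only an RR victim triggers a permutation, and every
     * victim is written back (even if clean) since it was relocated. */
    static void on_evict(Line *victim, Line *same_page[], int n) {
        if (victim->rr)
            permute_onchip(same_page, n);
        writeback(victim->vaddr);
        victim->valid = false;
    }

    int main(void) {
        Line x = {1, true, true}, y = {2, true, true}, z = {3, true, true};
        Line *page[] = { &x, &y, &z };
        on_evict(&x, page, 3);   /* events (3)-(4) of Fig. 8: x is RR */
        on_evict(&y, page, 3);   /* event (6): y is normal, no permutation */
        return 0;
    }

Note how the second eviction causes only a single writeback, which is exactly the behavior the worked example below contrasts with HIDE.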

An example. We now walk through the example in Fig. 8 to show our permutation mechanism. Assume that initially x, y, z come from the same page in memory, at addresses a1, a2, and a3, respectively. When x, y and z are loaded on-chip, they are marked as RR blocks. If a read miss later occurs in x's set (event 3) and x is to be replaced, a permutation must be performed, because x is an RR block. The permutation generates a random mapping (event 4) for all the on-chip blocks of the same page as x (and some other blocks, as explained later) and then clears the marks of the RR blocks that were permuted. The mapping is kept on-chip. After that, x is written back to the memory at the new address b1. Note that x may not be dirty but is still copied back. y and z are not written back until they are evicted from the cache. A write hit on y (event 5) does not set the RR bit again. Finally, when y is replaced (event 6), it is written back to the new address b2 assigned during the latest permutation.

Figure 8: Comparison of on-chip block permutation and the HIDE scheme. x, y, z are from the same memory page; x is mapped to cache set 1, y and z to set 2. A block marked with * is an RR block in our scheme, and a locked block in HIDE. "m[a] = x" means x is stored at memory address a.

We also illustrate the actions taken by the HIDE scheme in the same figure as a comparison to our scheme; this example is along the same lines as the one in Fig. 9. The main differences are in the type and number of blocks involved in each permutation and in when a permutation is triggered. In event (4), HIDE performs sequential reads and writes of the entire page, so that all the blocks in memory are physically permuted. However, this is wasteful, because y and z are still on-chip to serve future requests from the CPU. Moreover, the write hit to y in event (5) locks y again, which triggers a second permutation upon the replacement of y in event (6). The entire page is again sequentially read and written to memory, although a remapping of all three blocks at this time is unnecessary.

Figure 9: An example showing redundant permutations in HIDE. Reading A, B, C locks each of them; replacing B permutes P; write hits on A and C lock them again, so replacing A and replacing C each permute P once more.

Let us mention a few subtleties to make this example more complete. First, the initial addresses a_i are not the true physical addresses of x, y and z; they were randomized in previous permutations, or in an initial permutation of all memory pages when the program started. This is a requirement in HIDE and Shuffle as well. Therefore, one cannot infer, for example, that "a1 is mapped to b1 because the permutation was triggered by their cache conflict": the addresses that caused the conflict are not really a1 and b1, as both have already been remapped randomly. Second, the permutation at event (4) in HIDE may not happen at the same time as in our scheme, because HIDE modified the LRU replacement to delay such permutations; the permutation can be delayed but not avoided. Third, the mappings created by the two schemes may not be identical, because they use different block sets; we let them be equal in the example only for ease of illustration and comparison.

In brief, our scheme permutes only on-chip blocks, and a permutation takes place only when an RR block is replaced. We save a significant amount of memory traffic compared to HIDE; the overhead we pay is only the writebacks of non-dirty blocks. Further, we eliminate the permutations caused by the write locks in the HIDE scheme. Fig. 10 shows the percentage of such permutations that our scheme saves.

Figure 10: The percentage of saved permutations due to write locks (per benchmark and on average; note the wide spread, from about 0.001% up to 33.63%).

4.2 Permuting a sufficient number of blocks

During each permutation, enough blocks should be involved to ensure the strength of the randomization. For example, in our 4KB page setting, HIDE permutes 128 blocks every time, generating 128! possible mappings, i.e., a probability of 1/128 of guessing one mapped address correctly. If there are only 32 on-chip blocks in a page and we permute only among those blocks, the probability increases to 1/32, much higher than before and thus not preferred.

Therefore, we increase the available blocks by looking at the chunk level instead of the page level. The HIDE scheme cannot truly scale to the chunk level due to its prohibitive increase in memory demand, whereas we do not have this concern: we only look at the on-chip blocks, and thus our scope can be scaled up to include more pages. Our goal here is to permute the same number of blocks per permutation as the HIDE scheme. However, since the number of on-chip blocks is dynamic for every page, we cannot know in advance how many pages must be included to reach 128 blocks in every permutation. Therefore, we developed a simple incrementally growing mechanism, illustrated in Fig. 11, to solve this problem.

Figure 11: The procedure of searching for a sufficient number of blocks to permute. (C_k is the number of on-chip blocks in page P_k; C_i is the number of on-chip blocks in the chunk. While C_k < 128, search page P_k's neighbors until 128 blocks are found; if the whole chunk holds fewer than 128, read the remaining (128 - C_i) blocks from memory; then permute the 128 selected blocks.)

The procedure. When a permutation is triggered by the replacement of an RR block A in page P, we first check whether P has enough blocks on-chip, e.g., 128. If so, we simply permute the addresses of those 128 blocks. Otherwise, we expand the search scope one page at a time in P's neighborhood until we have enough blocks to permute. When there are sufficient blocks, giving priority to RR blocks over normal blocks reduces the total number of permutations, because only an RR block triggers a permutation and it becomes a normal block after the permutation. However, our search space does not grow unbounded; it is limited to the chunk within whose boundary P falls. If we still cannot find enough blocks in the entire chunk, then, to ensure the same level of security, we read extra blocks from memory within the chunk as padding blocks for the permutation input. As we can see, the chunk size should be chosen carefully: too small a chunk leads to a HIDE-like scheme, because we need to read many extra blocks, while too big a chunk leads to a Shuffle-like scheme, because we lose locality among the permuted blocks. Therefore, choosing an appropriate chunk size depends on the trade-off between the memory accesses and the incurred page faults. A sketch of the search itself follows.
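The incremental search of Fig. 11 can be modeled as below. This is a sketch under our own naming (onchip[], collect_blocks), assuming a 16-page chunk and a per-page count of cached blocks; a real implementation would read these counts from the chunk information table of Section 5.

    #include <stdio.h>

    #define TARGET 128        /* blocks per permutation: one 4KB page's worth */
    #define CHUNK_PAGES 16    /* chunk size chosen later in this section */

    /* onchip[p] = number of cached blocks of page p within the chunk. */

    /* Starting from the faulting page, grow the scope one neighboring page
     * at a time; report pages searched and padding blocks to read. */
    static int collect_blocks(const int onchip[], int start_page,
                              int *pages_used, int *padding) {
        int found = onchip[start_page];
        int lo = start_page, hi = start_page;
        while (found < TARGET && (lo > 0 || hi < CHUNK_PAGES - 1)) {
            if (hi < CHUNK_PAGES - 1) found += onchip[++hi];     /* grow right */
            if (found < TARGET && lo > 0) found += onchip[--lo]; /* grow left  */
        }
        *pages_used = hi - lo + 1;
        *padding = (found < TARGET) ? TARGET - found : 0; /* read from memory */
        return (found > TARGET) ? TARGET : found;
    }

    int main(void) {
        int onchip[CHUNK_PAGES] = {40, 12, 70, 8, 3, 0, 22, 5,
                                   0, 0, 64, 9, 1, 0, 30, 2};
        int pages, pad;
        collect_blocks(onchip, 3, &pages, &pad);
        printf("searched %d pages, %d padding blocks\n", pages, pad);
        return 0;
    }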
Typically, enough blocks can be collected within just a couple of pages; the extreme of searching the entire chunk does not happen very often. Fig. 12 shows the average number of pages searched in order to find enough blocks on-chip. We set the chunk size to 16 pages, which will be explained later. The figure is averaged over all 11 SPEC2K benchmark programs, and each bar also shows the weight contributed by each program. In most cases (~58%), up to two pages are enough to provide 128 on-chip blocks. However, searching through the entire chunk does happen more than 15% of the time. This fraction is mainly contributed by the programs ammp and vortex, both of which have poor locality and a low number of blocks per chunk in the cache. Note that when the search stops at 16 pages, padding might be needed; if so, those cases automatically fall into the 16th bar. An anatomy of the number of pages touched per permutation for each program is given in Fig. 13. As expected, one or two pages are sufficient for most programs, which implies that the searching algorithm can terminate quickly.

Figure 12: Average number of pages that supply sufficient on-chip blocks per permutation.

Figure 13: Number of pages involved in each permutation (per benchmark: ammp, art, bzip2, equake, gcc, gzip, mcf, mesa, parser, vortex, vpr).

The benefit. Permuting across multiple on-chip pages helps greatly to reduce the number of permutations. For example, suppose a replacement of an RR block in page P1 triggers a permutation, followed by a replacement of an RR block in P2. In HIDE, both replacements cause permutations, while in our scheme P2 can be permuted together with P1 in the first permutation, which may clear off the RR block in P2 and save the second permutation. This effect is the major contributor to the total permutation reductions shown in Fig. 14. On average, our scheme saves nearly 40% of the permutations in HIDE, including those due to write locks shown earlier in Fig. 10.

Figure 14: Total number of permutations as a percentage of the HIDE permutation count.

Choosing the chunk size. As mentioned earlier, a large chunk size may result in more page faults, as a permutation spans a wider address space; a small chunk size may generate more extra memory accesses, since not all pages are fully cached on-chip. Hence it is important to find a good break-even point between the page faults and the memory accesses. We compare the total number of memory accesses in Fig. 15 for varying chunk sizes, in terms of number of pages. The results are normalized to a base system with no address protection. The Shuffle scheme doubles the memory traffic, while HIDE increases it by a factor of 12. As expected, the larger the chunk size, the lower the traffic. A chunk of 16 pages brings only 88% more traffic on average (even better than Shuffle), while chunks of 8, 4, and 2 pages increase the traffic by factors of 2.46, 3.59, and 5.73, respectively. Our memory traffic is lower than the Shuffle scheme's because, as noted earlier, Shuffle introduces 5% extra writes, which our scheme avoids. The major portion of our 88% extra traffic comes from the writebacks of non-dirty blocks.

Figure 15: Memory traffic comparison among different schemes. Chunk size is varied from 2 to 16 pages.

To show how page faults increase with larger chunk sizes, Fig. 16 plots the page fault curves with gradually decreasing memory RSS, averaged over all the benchmarks. Although a larger chunk size generates slightly more page faults (less than a 2% increase from chunk-2 to chunk-16), all chunk sizes have far fewer page faults than the Shuffle scheme. Overall, a chunk size of 16 pages gives the lowest memory traffic increase with a reasonable page fault increase.

Figure 16: Page fault comparison among different schemes, averaged over all benchmarks.

4.3 Security analysis

To find the address recurrence in our randomized address sequence, an observer must be able to conjecture at least one address mapping correctly between two permutations. Our scheme has many uncertainties that reduce the probability of a successful guess. For example, there is no clear indication, from off-chip observation, of when a permutation has really happened, since most permutations do not involve off-chip accesses and only a subset of off-chip writebacks causes a permutation. Also, even if a permutation is identified, it is hard to know which set of on-chip blocks participated in it, since they are decided internally. Furthermore, it is not clear which blocks are really on-chip soon after some initial execution, since writebacks are remapped, making it hard to know which blocks have been returned to memory.

Nevertheless, let us assume the case in which an observer can make a correct guess with the highest probability; other situations are significantly more complex, and the probabilities will not be higher. Suppose a page P is entirely read on-chip. Later on, a block is written back to an address A in P. Since all the blocks of P were read on-chip and were initially RR blocks, any replacement would invoke a permutation inside P only. Hence, it is some block in P that is mapped and written back to A, and the probability of guessing which one is 1/128, assuming there are 128 blocks per page. If a second writeback, to A' in P, occurs, a block being mapped to A' has a probability of (1 - 1/128) x 1/128. Similarly, a block being mapped to the n-th writeback has a probability of (1 - 1/128)^(n-1) x 1/128. This is a decreasing function of n; hence, it is best for an observer to make a guess on the first writeback after the latest permutation. This analysis is along the same lines as for the Shuffle scheme; both achieve the same probability of guessing one mapping correctly.
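Written out, the observer's success probability for the n-th writeback after a permutation, with N = 128 blocks per page, is the decreasing geometric expression

    \[
      P_{\mathrm{guess}}(n) \;=\; \Bigl(1 - \frac{1}{N}\Bigr)^{\,n-1}\,\frac{1}{N},
      \qquad N = 128,
    \]

so the attacker's best case is the very first writeback, at P_guess(1) = 1/128, or roughly 0.78%, and every later guess is strictly worse.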
Similar to HIDE, what an observer can see on the bus are chunk-level transitions, instead of the fine-grained page-level or even block-level transitions that can reveal control flow easily. The HIDE technique has shown that a chunk size of 64KB, i.e., 16 4KB pages, can cover 95% of a program's control flow (some compiler-assisted layout optimization may be necessary to achieve this number). However, the HIDE scheme cannot afford to operate on a real 64KB chunk, while our scheme performs best at this size.

By permuting mostly on-chip blocks, with occasional off-chip padding blocks, our scheme makes full use of the on-chip cache. The more blocks are stored on-chip, the fewer addresses are transferred on the bus, and the more secure the scheme is. If an attacker maliciously runs a simultaneously executing thread that injects well-crafted cache accesses in order to push some cache blocks off-chip, the program under attack will generate more memory accesses as fewer blocks stay on-chip, leading to a more HIDE-like effect. Essentially, the address protection scheme plays against the cache-centric attack until the program runs at very low speed; at that point it is probably good to raise an alarm indicating a possible attack.

5. IMPLEMENTATION ISSUES AND ARCHITECTURE DESIGN

5.1 Virtual Cache vs. Physical Cache

As with the HIDE address mapping scheme, our permutation should be performed on virtual addresses (VA), because the physical address (PA) space is managed by the OS, which may map a virtual page to different physical pages dynamically. Hence, with the permutation interface, a VA is always first mapped to another VA, which is then translated to a PA by the TLB in the MMU and finally sent to the memory. This process is carried out beneath the L2 (or L3 if there is one), since we only need to protect the addresses that occur on the processor-memory bus. Therefore, the implementation of the permutation is different for a virtual L2 than for a physical L2.

If the L2 is physically indexed, the MMU first translates the VA request sent from the L1 (which is typically virtually indexed for performance) into a PA and uses it to access the cache, as shown by the solid-line path in Fig. 17(a). On a cache hit, the data is returned to the CPU and L1, and no further action is needed. If a miss happens, however, the PA would normally be sent to memory to fetch the data. In a memory-address-protected system such as HIDE and ours, the original VA is remapped to a VA', so the current PA must not be used for the memory access. Instead, we first look up the new mapping of the original VA to find VA' in the Block Address Table (BAT) [32], and then translate VA' to a PA'. Only after this do we have the true physical address with which to access memory. As we can see, the TLB lookup happens twice on every such L2 miss. This procedure is illustrated by the dotted path in Fig. 17(a).

The problem with the second TLB lookup is that VA' is an address remapped within a chunk of pages, in both HIDE and our scheme. Hence, it may belong to a different page than VA, which can result in a TLB miss, because VA and VA' translate to different physical pages. As we know, a TLB miss triggers a memory access for the proper page table entries, which penalizes performance. Therefore, in the HIDE scheme, if the chunk size is larger than one page, the scheme requires that the physical pages be contiguous whenever the virtual pages are, so that only one TLB access is necessary for the entire chunk. Also, the physical pages belonging to the same chunk must be swapped in and out together. As we can see, this places heavy requirements on the OS.

With a virtual L2, in contrast, TLB lookups follow the L2 misses. Hence, with the address permutation interface, a missed VA is first looked up in the BAT for its mapping, and then translated through the TLB to obtain the true physical address. Only one TLB lookup is necessary, as shown in Fig. 17(b). Putting the TLB below a virtual L2 has the further advantage that its coverage of page table entries is better than that of a TLB above a physical L2, since the former holds page entries only for L2 misses [23]; therefore, the TLB for a virtual L2 has lower miss rates. To see this, we measured the TLB misses, shown in Fig. 18, for a virtual L2 ("VL2"), our address permutation on a virtual L2 ("VL2-perm"), and our permutation on a physical L2 ("PL2-perm"), normalized to the TLB misses in a baseline with a physical L2.

Figure 17: Double and single TLB lookups in a physically and a virtually addressed L2. (a) Physically addressed cache: VA -> TLB -> PA -> miss -> BAT (VA -> VA') -> TLB -> PA' -> memory. (b) Virtually addressed cache: VA -> miss -> BAT (VA -> VA') -> TLB -> PA' -> memory.

Figure 18: Comparison of TLB misses for a virtual L2, a virtual L2 with permutation (our scheme), and a physical L2 with permutation. Results are averaged over 11 SPEC2K benchmarks.

The results show that using a virtual L2 instead of a physical L2 can reduce the TLB misses by half on average. When address permutation is added to a virtual L2, the TLB misses increase by only 1% on average, whereas adding it to a physical L2 increases them by 12%. These data show that a virtual L2 cache is especially useful in our scheme. Using a virtual L2 raises issues such as synonyms and exception handling; these have been addressed well in the literature with certain hardware-assisted schemes [23, 24, 25].
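The difference between the two organizations boils down to the order of the lookups on an L2 miss. A schematic of the two miss paths of Fig. 17, with bat_lookup and tlb_translate as our own stand-ins for the real structures:

    #include <stdint.h>

    /* Stand-ins for the BAT and TLB of Fig. 17; both tables are on-chip. */
    static uint32_t bat_lookup(uint32_t va)    { return va ^ 0x40u; }        /* VA -> VA' */
    static uint32_t tlb_translate(uint32_t va) { return va | 0x80000000u; }  /* VA -> PA  */

    /* Physically indexed L2 (Fig. 17(a)): the TLB is consulted once to
     * index the cache and, on a miss, a second time after the remapping. */
    uint32_t miss_path_physical_l2(uint32_t va) {
        uint32_t pa = tlb_translate(va);   /* lookup 1: index the cache */
        (void)pa;                          /* ... L2 miss detected ...  */
        uint32_t va2 = bat_lookup(va);     /* remap VA -> VA'           */
        return tlb_translate(va2);         /* lookup 2: PA' for the bus */
    }

    /* Virtually indexed L2 (Fig. 17(b)): the cache is indexed by VA, so
     * the TLB is consulted only once, after the BAT, on a miss. */
    uint32_t miss_path_virtual_l2(uint32_t va) {
        /* ... L2 miss detected using va directly ... */
        uint32_t va2 = bat_lookup(va);     /* remap VA -> VA'           */
        return tlb_translate(va2);         /* single lookup: PA'        */
    }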
5.2 The permutation architecture

To put everything together, we need the following hardware components to assist the address protection scheme. First of all, we need a permutation unit that can generate a random permutation of 128 blocks. A shuffle algorithm performing 128 swaps suffices, as long as a hardware true random number generator is present. Note that in most cases we only need to permute the addresses of on-chip blocks; occasionally we also need to read some padding blocks from memory to participate in the permutation, so a buffer for temporarily storing them is necessary. Its size is 127 blocks, as there must be at least one block on-chip in the chunk. Also, storing the 128 block offsets for a permutation requires 128 x 7 bits in total.

Next, we need an information table for every chunk that remembers the block address mappings (in the BAT) and the block status. All address mappings are stored in the chunk information table. Each chunk has an entry consisting of two bitvectors and a virtual-address BAT. One bitvector remembers whether each block is on-chip, and the other remembers whether it is an RR block. The BAT stores the virtual-to-virtual address mapping for each block, with each mapping used only once before the block is relocated again. The record size is small compared with the size of a chunk: if we assume the block size is 32 bytes and each page is 4KB, then for a 64KB chunk the storage overhead is only about 5%, since there are only two status bits plus 11 bits for the mapped block offset within the chunk, i.e., 13 bits per 256-bit block. Hence, the overhead is approximately the same as in HIDE with the same chunk size. Compared with the Shuffle scheme, where the BAT carries a 10% storage overhead, we save almost half.

Last, we need an address generation unit that can quickly find 128 on-chip addresses to permute. This can be easily implemented with wide shift registers and an adder. We first AND the two status bitvectors of a page into the shift register, and then shift the register one bit at a time. The adder increments the block offset on each shift and outputs the offset to the permutation unit if the bit is set, i.e., if the block address should be used in the permutation. Once the vector of one page is finished, the unit starts with the next page. It might take two iterations to do so, as we may not have enough RR blocks in one round; in the second round, we shift only the on-chip status bitvectors. The starting point is selected randomly, to inject some randomness into the selection of the on-chip cache blocks. Since the adder operates only on block offsets, it is only 7 bits wide; hence, a conventional 32-bit adder can perform four adds in parallel, producing 4x the throughput.

The architectural design of the hardware components and the datapath is illustrated in Fig. 19. When a cache miss happens, the BAT is searched for the mapped address, followed by a TLB access to obtain the physical address. When a writeback happens, if the victim is not an RR block, the writeback just goes through address mapping and TLB translation, as with a cache miss. If the victim is an RR block, a permutation must be initiated before it can go off-chip. The permutation starts by sending the status bitvectors to the address generation unit, which then emits 128 addresses to the permutation unit. When a new mapping is ready, it is updated into the BAT. The victim can then be sent off-chip with its new mapping.

Figure 19: Architecture of the permutation unit (cache; relocate engine with address generation and permutation units; chunk info table holding the bitvectors and the BAT; TLB; memory).

We assume the entire permutation takes 300 cycles on a GHz processor: 128 for the address generation unit, 128 for the permutation, and the rest for address updates (the BAT can be multi-ported). Since the number of permutations in our scheme is low, and the permutation is done on the non-critical write path, such a latency does not introduce a noticeable performance penalty.
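A bit-level model of the address generation unit is given below. It mirrors the two-round scan just described: round 1 selects RR blocks (the AND of the two bitvectors), round 2 falls back to the remaining on-chip blocks. The function and array names are ours, and the software loop stands in for the shift-register-plus-adder datapath.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BLOCKS 128   /* 4KB page of 32-byte blocks -> 7-bit offsets */

    /* Scan one page's bitvector pair and emit the offsets of set bits.
     * Round 1: on-chip AND RR (priority); round 2: on-chip AND NOT RR. */
    static int emit_offsets(const uint8_t onchip[PAGE_BLOCKS / 8],
                            const uint8_t rr[PAGE_BLOCKS / 8],
                            int round, uint8_t out[], int max_out) {
        int n = 0;
        for (int off = 0; off < PAGE_BLOCKS && n < max_out; off++) {
            int oc = (onchip[off / 8] >> (off % 8)) & 1;
            int r  = (rr[off / 8]     >> (off % 8)) & 1;
            int bit = oc & ((round == 1) ? r : !r);
            if (bit)
                out[n++] = (uint8_t)off;          /* 7-bit block offset */
        }
        return n;
    }

    int main(void) {
        uint8_t onchip[16] = {0xFF, 0x0F, 0, 0, 0xF0, 0, 0, 0,
                              0, 0, 0, 0, 0, 0, 0, 0x03};
        uint8_t rr[16]     = {0x0F, 0x0F, 0, 0, 0, 0, 0, 0,
                              0, 0, 0, 0, 0, 0, 0, 0x01};
        uint8_t sel[PAGE_BLOCKS];
        int n = emit_offsets(onchip, rr, 1, sel, PAGE_BLOCKS);   /* RR first */
        if (n < PAGE_BLOCKS)                                     /* round 2  */
            n += emit_offsets(onchip, rr, 2, sel + n, PAGE_BLOCKS - n);
        printf("selected %d offsets, first = %d\n", n, sel[0]);
        return 0;
    }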
6. EVALUATION

For all the data shown earlier in the paper, we used the SimpleScalar tool set 3.0 [7] to run 11 SPEC2K benchmark programs. All programs were simulated for 1.1 billion instructions. The parameters used are listed in Table 1.

Table 1: Simulation parameters

    Clock freq.                1 GHz
    RUU/LSQ size               128/64
    Fetch queue                32 entries
    Decode/Issue/Commit width  8/8/8
    TLB miss                   30 cycles
    L1 I- and D-cache          8KB, 32B blocks, direct-mapped, 1 cycle
    Unified L2                 1MB, 4-way, 32B blocks, 12 cycles
    Memory bus                 200MHz, 8 bytes wide
    Memory latency             80 (critical), 5 (inter) cycles
    Chunk size                 4KB ~ 64KB

We implemented our chunk-level permutation scheme, the HIDE scheme, and the Shuffle scheme. We assumed perfect auxiliary on-chip storage for the chunk info table in our scheme, the page info record in the HIDE scheme, and the block address table in the Shuffle scheme. We have shown the reductions in memory traffic and page faults earlier, in Fig. 15 and Fig. 16; they are mainly due to the removal of the memory page sweeps during each permutation, and to the reduction in the total number of permutations achieved by chunk-level permutation, shown in Fig. 14.

Memory Energy Consumption. We also compare the memory energy consumption of the different schemes to show the impact of the memory access increase on total energy. Note that the total energy may not be proportional to the total access count, because burst reads and writes consume less energy than sparse accesses; the energy is also a function of internal banking and row buffering. We used a detailed trace-driven DRAM simulator [10, 11] with the latest power model. Due to simulation speed, the traces were generated over 100M instructions after fast-forwarding one billion instructions, and the chunk size was set to one page. We simulated our benchmarks using the DDR2 specification [3] for a 1Gbit memory with 1 rank, each rank having 5 chips (the 5th being ECC), and a 32-bit interface with a total bandwidth of 2.67GB/s, running at 667MHz with a 4GHz processor. Hence, our memory model projects into future processor architectures that will have high bandwidth requirements.
As expected, HIDE and Shuffle consume more energy on average than our scheme. As shown in Fig. 20, on average, HIDE and Shuffle increase the energy consumption level by factors of 4.53 and 1.79, respectively. In contrast, our scheme shows a modest increase in energy consumption of 1.21x the baseline. In the case of the benchmark ammp, our scheme incurs higher memory traffic than Shuffle, as shown in Fig. 15, and thus results in a larger increase of memory energy consumption. In the same benchmark, however, HIDE increases the energy by a factor of 166. This is because the trace of ammp during the collected interval shows a much higher memory demand than the overall results for 1.1 billion instructions (68.85x in Fig. 15).

Figure 20: Memory energy consumption increase normalized to the base (Chunk-1, Shuffle, and HIDE; HIDE reaches 166.2x on ammp).

7. RELATED WORK

Apart from the address sequence randomization techniques [13, 14, 32, 33], there are some other techniques that also aim at protecting the CFG of a program. Earlier attempts took a software obfuscation approach, transforming code into a form that is harder to reverse-engineer [9]. However, it is theoretically uncertain whether a generated transformation can ensure a required level of protection, as studied and proved in [8, 29].
In the world of embedded computing, protecting the program's runtime execution trace has been adopted in commercial products, since many embedded processors are used in financial applications in which secrecy is highly required (e.g., the DS5000 and DS5002FP by Dallas Semiconductor [1]). The protection is done by encrypting the off-chip address and data buses, which unfortunately creates a fixed mapping for addresses that can be easily cracked [17].

8. CONCLUSION

We propose an efficient address permutation scheme for protecting against information leakage from the address bus under physical tampering. Our technique addresses two main issues of the previously proposed HIDE scheme: the excessive memory accesses per permutation and the redundant permutations. We also avoid the large number of page faults incurred by the Shuffle scheme. On average, our scheme reduces the memory traffic of HIDE from 12x to 1.88x, and brings the memory energy consumption from 4.53x down to 1.21x.

9. ACKNOWLEDGEMENT

This work is supported in part by NSF grants CCF-0430021, CCF-0541456, CCF-0447934, CCF-0326396, CNS-0325536, and a DOE Early CAREER Award.

10. REFERENCES

[1] DS5002FP secure chip data sheet.
[2] http://www.buyya.com/ecogrid.
[3] http://www.micron.com.
[4] http://www.modchip.com.
[5] 2.5-inch enterprise disc drives. Technology Paper, Feb 2005.
[6] IA-32 Architecture Software Developer's Manual, Vol. 3A, Part 1. ftp://download.intel.com/design/Pentium4/manuals/25366818.pdf, Jan 2006.
[7] T. M. Austin. The SimpleScalar Toolset. http://www.simplescalar.com, SimpleScalar LLC.
[8] B. Barak, O. Goldreich, R. Impagliazzo, et al. On the (im)possibility of obfuscating programs. In CRYPTO, pages 1-18, 2001.
[9] C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, University of Auckland, 1997.
[10] V. Cuppu, B. Jacob, B. Davis, and T. Mudge. High-performance DRAMs in workstation environments. IEEE Transactions on Computers, 50(11):1133-1153, 2001.
[11] V. Cuppu, B. L. Jacob, B. Davis, and T. N. Mudge. A performance comparison of contemporary DRAM architectures. In the 26th International Symposium on Computer Architecture, pages 222-233, 1999.
[12] B. Gassend, G. E. Suh, D. Clarke, M. van Dijk, and S. Devadas. Caches and hash trees for efficient memory integrity verification. In the 9th International Symposium on High Performance Computer Architecture, pages 295-306, 2003.
[13] O. Goldreich. Towards a theory of software protection and simulation by oblivious RAMs. In the 19th Annual ACM Symposium on Theory of Computing, pages 182-194, 1987.
[14] O. Goldreich and R. Ostrovsky. Software protection and simulation on oblivious RAMs. Journal of the ACM, 43(3):431-473, May 1996.
[15] P. Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In International Cryptology Conference, 1996.
[16] M. G. Kuhn. The TrustNo1 cryptoprocessor concept. Technical Report CS555, Purdue University, 1997.
[17] M. G. Kuhn. Cipher instruction search attack on the bus-encryption security microcontroller DS5002FP. IEEE Transactions on Computers, 47(10):1153-1157, Oct 1998.
[18] D. Lie, C. Thekkath, M. Mitchell, P. Lincoln, D. Boneh, J. Mitchell, and M. Horowitz. Architectural support for copy and tamper resistant software. In the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 168-177, 2000.
[19] C. McClure. Software reuse planning by way of domain analysis. Technical Paper, Extended Intelligence, Inc. http://www.reusability.com.
[20] A. Nanda, K.-K. Mak, K. Sugavanam, R. K. Sahoo, V. Soundararajan, and T. B. Smith. MemorIES: a programmable, real-time hardware emulation tool for multiprocessor server design. In the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 37-48, November 2000.
[21] D. A. Osvik, A. Shamir, and E. Tromer. Cache attacks and countermeasures: the case of AES. In RSA Conference, February 2006.
[22] C. Percival. Cache missing for fun and profit. In BSDCan 2005, 2005.
[23] X. Qiu and M. Dubois. Towards virtually-addressed memory hierarchies. In the 7th International Symposium on High-Performance Computer Architecture, pages 51-62, 2001.
[24] X. Qiu and M. Dubois. Tolerating late memory traps in dynamically scheduled processors. IEEE Transactions on Computers, 53(6):732-743, June 2004.
[25] X. Qiu and M. Dubois. Moving address translation closer to memory in distributed shared-memory multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 16(7):612-623, July 2005.
[26] W. Shi, H.-H. S. Lee, M. Ghosh, C. Lu, and A. Boldyreva. High efficiency counter mode security architecture via prediction and precomputation. In the 32nd International Symposium on Computer Architecture, pages 14-24, 2005.
[27] G. E. Suh, D. Clarke, B. Gassend, M. van Dijk, and S. Devadas. AEGIS: architecture for tamper-evident and tamper-resistant processing. In the 17th International Conference on Supercomputing, pages 160-171, 2003.
[28] G. E. Suh, D. Clarke, B. Gassend, M. van Dijk, and S. Devadas. Efficient memory integrity verification and encryption for secure processors. In the 36th International Symposium on Microarchitecture, pages 339-350, 2003.
[29] C. Wang. A security architecture for survivability mechanisms. PhD thesis, University of Virginia, October 2000.
[30] J. Yang, Y. Zhang, and L. Gao. Fast secure processor for inhibiting software piracy and tampering. In the 36th International Symposium on Microarchitecture, pages 351-360, 2003.
[31] Y. Zhang, L. Gao, J. Yang, X. Zhang, and R. Gupta. SENSS: security enhancement to symmetric shared memory multiprocessors. In the 11th International Symposium on High-Performance Computer Architecture, pages 352-362, 2005.
[32] X. Zhuang, T. Zhang, H.-H. S. Lee, and S. Pande. Hardware assisted control flow obfuscation for embedded processors. In International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pages 292-302, 2004.
[33] X. Zhuang, T. Zhang, and S. Pande. HIDE: an infrastructure for efficiently protecting information leakage on the address bus. In the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 72-84, 2004.