FPGA-Accelerated Key Search for Cold-Boot Attacks Against AES
Total Page:16
File Type:pdf, Size:1020Kb
FPGA-accelerated Key Search for Cold-Boot Attacks against AES Heinrich Riebler, Tobias Kenter, Christoph Sorge, and Christian Plessl Department of Computer Science, University of Paderborn 33098 Paderborn, Germany Email: f heinrich.riebler — kenter — christoph.sorge — christian.plessl [email protected] Abstract—Cold-boot attacks exploit the fact that DRAM have changed their value. The second goal of the attacker— contents are not immediately lost when a PC is powered off. which poses the algorithmic and computational challenge—is Instead the contents decay rather slowly, in particular if the to first identify the key in the memory dump, and then to DRAM chips are cooled to low temperatures. This effect opens an correct the errors. If successful, the full disk encryption can attack vector on cryptographic applications that keep decrypted be circumvented using the recovered key. keys in DRAM. An attacker with access to the target computer can reboot it or remove the RAM modules and quickly copy The remainder of this paper is structured as follows. In the RAM contents to non-volatile memory. By exploiting the Section II we discuss related work, in particular the AES key known cryptographic structure of the cipher and layout of the key schedule and the Maxeler data flow system. In Section III we data in memory, in our application an AES key schedule with present the design of the key search procedure for identifying redundancy, the resulting memory image can be searched for key schedules in decayed memory dumps. In Section IV sections that could correspond to decayed cryptographic keys; then, the attacker can attempt to reconstruct the original key. we compare the performance of our accelerator with a CPU However, the runtime of these algorithms grows rapidly with implementation. Finally, we draw conclusions in Section V. increasing memory image size, error rate and complexity of the bit error model, which limits the practicability of the approach. II. BACKGROUND AND RELATED WORK In this work, we study how the algorithm for key search In this section, we describe related work, the foundations can be accelerated with custom computing machines. We present of cold-boot attacks, and our accelerator system. While FPGAs an FPGA-based architecture on a Maxeler dataflow computing have been used for many cryptographic applications and im- system that outperforms a software implementation up to 205x, plementations of the AES algorithm are available (e. g. [4]), which significantly improves the practicability of cold-attacks this is to our knowledge the first work that uses FPGAs for against AES. accelerating cold-boot attacks. I. INTRODUCTION A. Background on Cold-Boot Attacks Protecting the integrity and confidentiality of information The first step of a cold-boot attack is to retrieve the memory is gaining importance in our society. As a reaction individuals content, e.g. by cooling down the memory chips and installing and companies are increasingly using full disk encryption to them on another machine. The resulting memory image has protect access to sensitive data. The most commonly used full to be searched for keys in the next step. Given the possibility disk encryption tools use the symmetric block cipher AES [1]. of bit errors (due to memory decay), cryptographic keys may During runtime, these tools keep the secret key material in not conform exactly to the expected structure. An error model memory, because the key is required for any encryption and is used to decide whether a key candidate is plausible. Once decryption operation. Keeping the key in memory was assumed a plausible candidate has been identified, the original key can to be secure, because the main memory (DRAM) was expected be reconstructed using the same or a different error model. to quickly change into a default state when removing the In recent years, seminal work for efficiently finding [3] and power supply. This assumption has however been invalidated reconstructing [5] AES keys in memory dumps has been by security experts [2]. They have shown that the memory presented. However, these methods either rely on a heuristic contents decay surprisingly slowly over time. The decay can parameter or an idealized, perfect asymmetric decay model, be slowed further by cooling the DRAM chips, which opens which assumes that memory bits can flip only in one direction. the possibility to attack the secret keys and thus full disk For more realistic decay models and higher error rates, the encryption solutions. runtimes of cold-boot attacks strongly increase on general- purpose CPUs [6]. Cold-boot attacks [3] exploit these observations. The attack requires physical access to a machine with a secret key in main B. AES Key Schedule memory, for example, a computer that is running a screen lock while the user is absent. The first goal of the attacker is to AES is a commonly used encryption algorithm. In par- obtain a copy of the complete main memory, for example by ticular, popular tools for the encryption of file systems, like rebooting the machine from a USB thumb drive that quickly TrueCrypt, use AES as their default block cipher. The AES dumps the DRAM contents to non-volatile Flash memory. If key is a 128, 192, or 256 bit (pseudo) random sequence of the reboot process is quick, only a small fraction of the bits will bits without any structure; a naive brute-force attack on AES protected data is thus impractical. However, for performance MPC-C Plattform reasons, not only the AES key itself, but the complete so-called DFE PCIe key schedule is kept in memory. The key schedule is comprised PCIe of the master key and a (key size dependent) number of round keys. Since the actual encryption operation requires the DRAM Memory complete key schedule, the schedule is usually pre-computed Controller and stored as a contiguous array in the main memory. Each DRAM round consists of a certain number of transformations, which are defined by the AES key expansion function [7]. Figures 1 FPGA FPGA FPGA FPGA and 2 show how the 10 round keys in the AES-128 key x86 CPU card card card card schedule are generated from the master key, which is stored as the first row (round 0) in the key schedule. The round keys 1– PCI Express 10, which form the complete key schedule, are derived using Maxeler Design Flow two different functions. The first word in each round key is Original computationally Application Manager computed by applying a complex operation using word 0 and application intensive components Kernel(s) Configuration word 3 of the preceding round key, as illustrated in Fig. 2. (.c, .f) The remaining words 1–3 of each round key are computed by Compiler/ HW HW build a simple XOR operation between the word of the previous MaxCompiler Linker Accelerator or simulation round key and the previous word of the same round key, see Fig. 1. Taking the structure of the whole key schedule into account increases the available information and creates a Fig. 3. Maxeler MPC-C platform architecture and design flow. testable mathematical relationship between the master key and the round keys, which allows for locating possible AES key Virtex-6 FPGA and 24GB of on-board SDRAM memory. The schedules in decayed memory images and for reconstructing server uses two 6-core (12 threads) X5650 CPUs running at the original schedule by exploiting redundancies. 2.67GHz and provides 48GB RAM. For providing fast I/O the server is equipped with 3 SSDs in RAID0 configuration. w 0 1 2 3 b r 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 For mapping applications to this system, Maxeler provides 0 5A 57 45 49 5F 55 4E 44 5F 56 49 45 52 5A 49 47 MaxCompiler [8] which offers a Java API for specifying 1 E5 6C E5 49 BA 39 AB 0D 2 round 1 data flow engines (DFE) that process data streams. Since the 3 specified data path follows a restricted feed-forward data flow 4 XOR model, MaxCompiler can automatically perform optimizations 5 like buffer size optimization, pipelining and retiming, which re- 6 7 sults in very efficient implementations. Each DFE is comprised 8 round 10 of two parts, one or several kernels that implement the data- 9 90 66 20 B3 path and a manager that orchestrates the flow of data between 10 BF 96 99 15 2F F0 B9 A6 kernels, off-chip memory, and CPUs. MaxCompiler transforms the kernel and manager code into a hardware design which Fig. 1. AES key expansion: simple XOR operation for all but the first word in each round (w: word, b: byte, r: round). is further processed by the FPGA vendor tools to generate a bitfile. Additionally, MaxCompiler generates a dedicated library for controlling and configuring the kernel execution w 0 1 2 3 b r 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 from the host application. 0 5A 57 45 49 5F 55 4E 44 5F 56 49 45 52 5A 49 47 round 1 1 E5 6C E5 49 III. IDENTIFYING AES KEYS 2 RCON 3 In this section we present the approach and implementation 4 XOR XOR 5 of our hardware accelerator for searching AES key schedules, rotate substitute 6 based on the algorithm proposed by Halderman et al. [3]. 7 SBOX 8 A. Approach 9 58 A0 6A AE 90 66 20 B3 round 10 10 5D 17 07 CE The key search procedure iterates over all possible candi- Fig. 2. AES key expansion: complex operation for first word in each round.