Protecting Bare-metal Embedded Systems With Privilege Overlays

Abraham A. Clements∗, Naif Saleh Almakhdhub†, Khaled S. Saab‡, Prashast Srivastava†, Jinkyu Koo†, Saurabh Bagchi†, Mathias Payer† ∗Purdue University and Sandia National Laboratories, [email protected] †Purdue University, {nalmakhd, srivas41, kooj, sbagchi}@purdue.edu, [email protected] ‡Georgia Institute of Technology, [email protected]

Abstract—Embedded systems are ubiquitous in every aspect of modern life. As the Internet of Things expands, our dependence on these systems increases. Many of these interconnected systems are, and will be, low-cost bare-metal systems executing without an operating system. Bare-metal systems rarely employ any security protection mechanisms. Their development assumptions (unrestricted access to all memory and instructions) and their constraints (runtime, energy, and memory) make applying protections challenging. To address these challenges we present EPOXY, an LLVM-based embedded compiler. We apply a novel technique, called privilege overlaying, wherein operations requiring privileged execution are identified and only these operations execute in privileged mode. This provides the foundation on which code integrity, adapted control-flow hijacking defenses, and protections for sensitive IO are applied. We also design fine-grained randomization schemes that work within the constraints of bare-metal systems to provide further protection against control-flow and data corruption attacks. These defenses prevent code injection attacks and prevent ROP attacks from scaling across large sets of devices. We evaluate the performance of our combined defense mechanisms on a suite of 75 benchmarks and 3 real-world IoT applications. Our results for the application case studies show that EPOXY has, on average, a 1.8% increase in execution time and a 0.5% increase in energy usage.

I. INTRODUCTION

Embedded devices are ubiquitous. With more than 9 billion embedded processors in use today, the number of devices has surpassed the number of humans. With the rise of the “Internet of Things”, the number of embedded devices and their connectivity is exploding. These “things” include Dash buttons, utility smart meters, smart locks, and smart TVs. Many of these devices are low cost, with software running directly on the hardware; these are known as “bare-metal systems”. In such systems, the application runs as privileged low-level software with direct access to the processor and peripherals, without going through intervening operating system software layers. These bare-metal systems satisfy strict runtime guarantees on extremely constrained hardware platforms with a few KBs of memory, a few MBs of Flash, and low CPU speeds, in order to minimize power consumption and cost.

With increasing network connectivity, ensuring the security of these systems is critical [21, 51]. In 2016, hijacked smart devices like CCTV cameras and digital video recorders launched the largest distributed denial of service (DDoS) attack to date [39]. The criticality of security for embedded systems extends beyond smart things. Micro-controllers executing bare-metal software have been embedded so deeply into systems that their existence is often overlooked, e.g., in network cards [26], hard drive controllers [57], and SD memory cards [17]. We rely on these systems to provide secure and reliable computation, communication, and data storage. Yet, they are built with security paradigms that have been obsolete for several decades.

Embedded systems largely lack protection against code injection, control-flow hijack, and data corruption attacks. Desktop systems, as surveyed in [53], employ many defenses against these attacks, such as Data Execution Prevention (DEP), stack protections (e.g., stack canaries [22], separate return stacks [31], and SafeStack [40]), diversification [49, 41], ASLR, Control-Flow Integrity [9, 18], and Code-Pointer Integrity (CPI) [40]. Consequently, attacks on desktop-class systems have become harder and often highly program dependent.

Achieving known security properties from desktop systems on embedded systems poses fundamental design challenges. First, a single program is responsible for hardware configuration, inputs, outputs, and application logic. Thus, the program must be allowed to access all hardware resources and to execute all instructions (e.g., those configuring memory permissions). This causes a fundamental tension with best security practices, which require restricting access to some resources. Second, bare-metal systems have strict constraints on runtime, energy usage, and memory usage, so all protections must be lightweight along each of these dimensions. Third, embedded systems are purpose-built devices and, as such, have application-specific security needs. For example, an IO register on one system may actuate a lock, while on a different system it may control an LED used for debugging. Clearly the former is a security-sensitive operation while the latter is not. Such application-specific requirements should be supported in a manner that does not require the developer to make intrusive changes to her application code. Combined, these challenges have meant that protections against code injection, control-flow hijack, and data corruption attacks are simply left out of bare-metal systems.

As an illustrative example, consider the application of DEP to bare-metal systems. DEP, which enforces W ⊕ X on all memory regions, is applied on desktops using a Memory Management Unit (MMU), which is not present on micro-controllers. However, many modern micro-controllers have a peripheral called the Memory Protection Unit (MPU) that can enforce read, write, and execute permissions on regions of the physical memory. At first glance, it may appear that DEP can be achieved in a straightforward manner through the use of the MPU. Unfortunately, we find that this is not the case: the MPU protection can be easily disabled, because there is no isolation of privileges.
Thus, a vulnerability anywhere in the program can write the MPU’s control register to disable it. A testament to the challenge of correctly using an MPU is the struggle existing embedded OSs have in using it for security protection, even for well-known protections such as DEP. FreeRTOS [1], a popular operating system for low-end micro-controllers, leaves its stacks and RAM writable and executable. By FreeRTOS’s own admission, the MPU port is seldom used and is not well maintained [3]. This was evidenced by multiple releases in 2016 where MPU support did not even compile [8, 2].

To address all of these challenges, we developed EPOXY (Embedded Privilege Overlay on X hardware with Y software), a compiler that brings both generic and system-specific protections to bare-metal applications. This compiler adds passes to a traditional LLVM cross-compilation flow, as shown in Figure 1. These passes add protection against code injection, control-flow hijack, and data corruption attacks, as well as against direct manipulation of IO. Central to our design is a lightweight privilege overlay, which resolves the dichotomy between allowing the developer to assume access to all instructions and memory and restricting that access at runtime. To do this, EPOXY reduces the execution privileges of the entire application. Then, using static analysis, only the instructions requiring elevated privileges are added to the privilege overlay, which enables privileges just prior to their execution. EPOXY draws its inputs from a security configuration file, thus decoupling security decisions from application design, and it achieves all of its security protections without any application code modification. Combined, these protections provide bare-metal systems with the kind of application-specific security that is essential on modern computers.

Fig. 1. The compilation work flow for an application using EPOXY. Our modifications are shown in shaded regions.

In adapting fine-grained diversification techniques [41], EPOXY leverages unique aspects of bare-metal systems: all memory is dedicated to a single application, and the maximum memory requirements are determined a priori. This enables the amount of unused memory to be calculated and used to increase diversification entropy. EPOXY then adapts the protection of SafeStack [40], enabling strong stack protection within the constraints of bare-metal systems.

Our prototype implementation of EPOXY supports the ARMv7-M architecture, which includes the popular Cortex-M3, Cortex-M4, and Cortex-M7 micro-controllers. Our techniques are general and should be applicable to any micro-controller that supports at least two modes of execution (privileged and unprivileged) and has an MPU. We evaluate EPOXY on 75 benchmark applications and three representative IoT applications that each stress different subsystems. Our performance results for execution time, power usage, and memory usage show that our techniques work within the constraints of bare-metal applications. Overheads for the benchmarks average 1.6% for runtime and 1.1% for energy. For the IoT applications, the average overhead is 1.8% for runtime and 0.5% for energy. We evaluate the effectiveness of our diversification techniques using a Return Oriented Programming (ROP) compiler [52] that finds ROP-based exploits. For our three IoT applications, using 1,000 different binaries of each, no gadget survives across more than 107 binaries. This implies that an adversary cannot reverse engineer a single binary and build a ROP chain, even one using a single gadget, that scales beyond a small fraction of devices.

In summary, this work: (1) identifies the essential components needed to apply proven security techniques to bare-metal systems; (2) implements them as a transparent runtime privilege overlay, without modifying existing source code; (3) provides state-of-the-art protections (stack protections and diversification of code and data regions) for bare-metal systems within the strict requirements of runtime, memory size, and power usage; and (4) demonstrates that these techniques are effective from a security standpoint on bare-metal systems. Simply put, EPOXY brings bare-metal application security forward several decades and applies protections essential for today’s connected systems.

II. THREAT MODEL AND PLATFORM ASSUMPTIONS

We assume a remote attacker with knowledge of a generic memory corruption vulnerability, i.e., the application running on the device is buggy but not malicious. The goal of the attacker is to either achieve code execution (e.g., injecting her own code, reusing existing code through ROP, or performing Data-Oriented Programming [37]), corrupt specific data, or directly manipulate security-critical outputs of the system by sending data to specific IO pins. We assume the attacker exploits a write-what-where vulnerability, i.e., one that allows the attacker to write any data to any memory location she wants. The attacker may have obtained the vulnerability through a variety of means, e.g., source code analysis, or reverse engineering the binary that runs on a different device and identifying security flaws in it. We also assume that the attacker does not have access to the specific instance of the (diversified) firmware running on the target device. Our defenses provide foundational protections that are complementary to, and assumed by, many modern defenses, such as the memory disclosure prevention work by Braden et al. [15]. We do not protect against attacks that replace the existing firmware with a compromised firmware. Orthogonal techniques such as code signing should be used to prevent this type of attack.

We make the following assumptions about the target system. First, it is running a single bare-metal application, which utilizes a single stack and has no restrictions on the memory addresses, peripherals, or registers that it can access or the instructions that it can execute. This is the standard mode of execution for bare-metal applications; it is the case for every benchmark and IoT application that we use in the evaluation and for every application we surveyed from the vendors of ARM-equipped boards. Second, we require the micro-controller to support at least two execution privilege levels and to have a means to enforce access controls on memory for these privilege levels. These access controls include marking regions of memory as readable, writable, and/or executable. Typically, an MPU provides this capability on a micro-controller. We looked at over 100 Cortex-M3, M4, and M7 series micro-controllers from ARM, and an MPU was present on all but one. Micro-controllers from other vendors, such as the AVR32 family, also have an MPU.

III. ARCHITECTURE BACKGROUND INFORMATION

This section presents the architecture information needed to understand the attack vectors and the defense mechanisms in EPOXY. Bare-metal systems have low-level access to hardware; this enables an attacker with a write-what-where vulnerability to manipulate the system in ways that are unavailable against applications on desktop systems. Defense strategies must consider these attack avenues and the constraints of the hardware available to mitigate threats. For specificity, we focus on the ARMv7-M architecture, which is implemented in ARM Cortex-M(3,4,7) micro-controllers. The general techniques are applicable to other architectures, subject to the assumptions laid out in Section II. We present key details of the ARMv7-M architecture; full details are in the ARMv7-M Architecture Reference Manual [11].

A. Memory Map

In our threat model, the attacker has a write-what-where vulnerability that can be used to write to any memory address; therefore, it is essential to understand the memory layout of the system. Note that these systems use a single, unified memory space. A representative memory map illustrating the different memory regions is shown in Figure 2.

Fig. 2. An example memory map showing the regions of memory commonly available on an ARMv7-M architecture micro-controller. Note the cross-hatched areas have an address but no memory.

At the very bottom of memory is a region of aliased memory. When an access is made to the aliased region, the access is fulfilled by accessing the physical memory that is aliased, which could be in the Internal RAM, Internal Flash, or External Memory. The alias itself is specified through a hardware configuration register. Thus, memory mapped by the aliased region is addressable using two addresses: its default address (e.g., the address of Internal RAM, Internal Flash, or External Memory) and the address of the aliased region. This implies that a defender has to configure identical permissions for the aliased memory region and the actual memory region that it points to. A common peripheral (usually a memory controller) contains a memory-mapped register that sets the physical memory addressed by the aliased region. A defender must protect the register that controls which memory is aliased, in addition to both the physical and the aliased memory locations.

Moving up the address space, we come to Internal Flash, which is Flash memory located inside the micro-controller. On ARMv7-M devices it ranges in size from a couple of KB to a couple of MB. The program code and read-only data are usually stored here. If no permissions are enforced, an attacker may directly manipulate code.¹ Address space layout randomization is not applied in practice, and the same binary is loaded on all devices, which enables code reuse attacks like ROP. Above the Flash is RAM, which holds the heap, stack, and global data (initialized data and uninitialized bss sections). Common sizes range from 1KB to a couple hundred KB, and RAM is usually smaller than Flash. By default this area is readable, writable, and executable, making it vulnerable to code injection attacks. Additionally, the stack employs no protection and thus is vulnerable to stack smashing attacks, which can overwrite return addresses and hijack the control flow of the application.

¹ In Flash, a 1 may be changed to a 0 without erasing an entire block, and parity checks are also common to detect single-bit flips. This restricts the changes that can directly be made to code; however, a wily attacker may still be able to manipulate the code in a malicious way.

Located above the RAM are the peripherals. This area is sparsely populated and consists of fixed addresses which control hardware peripherals. Peripherals include General Purpose Input and Output (GPIO), serial communication (UARTs), Ethernet controllers, cryptography accelerators, and many others. Each peripheral is configured and used by reading and writing specific memory addresses called memory-mapped registers. For example, a smart lock application will use an output pin of the micro-controller to actuate its locking mechanism. In software, this shows up as a write to a fixed address. An adversary can directly open the lock by writing to the GPIO register using a write-what-where vulnerability, bypassing any authentication mechanism in the application.

The second region from the top is reserved for external memory and co-processors. This may include things like external RAM or Flash. However, on many small embedded systems nothing is present in this area. If used, it is sparsely populated, and the opportunities it presents to an attacker are system and program specific.

The final area is the System Control Block (SCB). This is a set of memory-mapped registers defined by ARM and present in every ARMv7-M micro-controller. It controls the MPU configuration, interrupt vector location, system reset, and interrupt priorities. Since the SCB contains the MPU configuration registers, an attacker can disable the MPU simply by writing a 0 to the lowest bit of the MPU_CTRL register located at address 0xE000ED94. Similarly, the location of the interrupt vector table is set by writing the VTOR register at 0xE000ED08. These examples show that the SCB region is critical from a security standpoint.

B. Execution Privilege Modes

Like desktop-class processors, ARMv7-M processors can execute in different privilege modes. However, they support only two: privileged and unprivileged. In the current default mode of operation, the entire application executes in privileged mode, which means that all privileged instructions and all memory accesses are allowed. Thus, we cannot indiscriminately reduce the privilege level of the application, for fear of breaking its functionality. Once privileges are reduced, the only way to elevate them is through an exception. All exceptions execute in privileged mode, and software can invoke an exception by executing an SVC (for “supervisor call”) instruction. This same mechanism is used to create a system call in a traditional OS.

C. Memory Protection Unit

ARMv7-M devices have a Memory Protection Unit (MPU), which can be used to set read, write, or execute permissions on regions of the physical memory. The MPU is similar to an MMU, but it does not provide virtual memory addressing. In effect, the MPU adds an access control layer over the physical memory, but memory is still addressed by its physical addresses. The MPU defines read, write, and execute permissions for both privileged and unprivileged modes. It also enables making regions of memory non-executable (“execute never” in ARM’s terminology). It supports up to 8 regions, numbered from 0 to 7, with the following restrictions: (1) a region’s size can be from 32 Bytes to 4 GBytes, in powers of two; (2) each region must be size-aligned (e.g., if the region is 16KB, it must start on a multiple of 16KB); (3) if there is a conflict of permissions (through overlapping regions), the higher numbered region’s permissions take effect. Figure 3 illustrates how memory permissions are applied.

Fig. 3. Diagram illustrating how the protection regions (R-x) defined in the MPU by EPOXY are applied to memory. The legend shows the permissions and purpose of each region. Note that regions R1-R3 (not shown) are developer defined.

For the remainder of this paper we use the following notation to describe the permissions of a memory region: (P-R?W?, U-R?W?, X|XN), where ? marks a permission letter that may be present or absent. This encodes read and write permissions for privileged mode (P) and unprivileged mode (U), and execute permission (X) or execute never (XN) for both modes. For example, the tuple (P-RW,U-R,X) encodes a region as executable, read-write for privileged mode, and read-only for unprivileged mode. Note that execute permissions are set for privileged and unprivileged mode together. For code to be executed, read access must be granted. Thus, unprivileged code can be prevented from executing a region by removing its read access to that region.

D. Background Summary

Current bare-metal system design exposes a large attack surface: memory corruption, code injection, control-flow hijack attacks, writing to security-critical but system-specific IO, and modification of registers crucial for system operation, such as the SCB and MPU configuration. Execution privilege modes and the MPU provide the hardware foundation for techniques that reduce this vast attack surface. However, the development assumption that all instructions and all memory locations are accessible directly conflicts with the security requirements, as some instructions and memory accesses can exploit the attack surface and need to be restricted. Next we present the design of our solution, EPOXY, which resolves this tension by using privilege overlays, along with various diversification techniques, to remove the attack surface.

IV. DESIGN

EPOXY’s goal is to apply system-specific protections to bare-metal applications. This requires meeting several requirements: (1) protections must be flexible, as protected areas vary from system to system; (2) the compiler must enable the enforcement of policies that protect against malicious code injection, code reuse attacks, global data corruption, and direct manipulation of IO; (3) enforcement of the policies must satisfy the non-functional constraints, i.e., runtime, energy usage, and memory usage should not be significantly higher than in the baseline insecure execution; and (4) the protections should not force application developers to change their development workflow and ideally should involve no application code changes.

EPOXY’s design uses four components to apply protections to bare-metal systems while achieving the above goals: (1) access controls, which limit the use of specific instructions and accesses to sensitive memory locations; (2) our novel privilege overlay, which imposes the access controls on the unmodified application; (3) an adapted SafeStack; and (4) diversification techniques, which utilize all available memory.

A. Access Controls

Access controls are used to protect against code injection attacks and to defend against direct manipulation of IO. Access controls specify the read, write, and execute permissions for each memory region and the instructions that can be executed in a given execution mode. As described in Section III, modern micro-controllers contain an MPU and multiple execution modes. These are designed to enable DEP and to restrict access to specific memory locations. We utilize the MPU and multiple execution modes to enforce access controls in our design. Using this available hardware, rather than a software-only approach, helps minimize the impact on runtime, energy consumption, and memory usage. On our target architecture, IO is also handled through memory-mapped registers, and thus the MPU can be used to restrict access to sensitive IO. The counterargument to using the MPU is that it imposes restrictions: how many memory regions can be configured (8 in our chosen ARM architecture), how large each region needs to be, and how it must be aligned (Section III-C). However, we still choose to use the MPU, and this explains in part the low overhead that EPOXY incurs (Table II). While the MPU and the processor execution modes can enforce access controls at runtime, they must be properly configured to enable robust protection. We first identify the proper access controls and how to enforce them. We then use the compiler to generate the hardware configuration needed to enforce the access controls at runtime. Attempts to access disallowed locations trap to a fault handler. The action the fault handler takes is application specific; for example, halting the system provides the strongest protection, as it prevents repeated attack attempts.

The required access controls and the mechanisms to enforce them can be divided into two parts: architecture dependent and system specific.

Architecture-dependent access controls: All systems using a specific architecture (e.g., ARMv7-M) share a set of required access controls. They must restrict access to instructions and memory-mapped registers that can undermine the security of the system. The instructions that require execution in privileged mode are specified by the processor architecture and are typically those that change special-purpose registers, such as the program status register (the MSR and CPS instructions). Access to these instructions is limited by executing the application in unprivileged mode by default. Memory-mapped registers, such as the MPU configuration registers and the interrupt vector offset register, are common to an architecture and must be protected. In our design, this is done by configuring the MPU to allow access to these regions (registers) only from privileged mode.

System-specific access controls: These consist of setting W ⊕ X on code and data, protecting the alias control register, and protecting any sensitive IO. W ⊕ X should be applied to every system; however, the locations of code and data change from system to system, making the configuration required to enforce it system specific. For example, each micro-controller has different amounts of memory, and a developer may place code and data in different regions, depending on her requirements. The peripheral that controls the aliased memory is also system specific and needs protection; thus, access to it should be restricted to privileged mode. Last, which IO is sensitive varies from system to system, and only the sensitive subset of IO needs to be restricted to privileged mode.

To simplify the implementation of the correct access controls, our compiler generates the necessary system configuration automatically. At the linking stage, our compiler extracts information (location, size, and permissions) for the code region and the data region. In addition, the developer provides, on a per-application basis, the location and size of the alias control register and which IO is sensitive. The compiler then uses this information, along with the architecture-specific access controls, to generate the MPU configuration. The MPU configuration requires writing the correct bits to specific registers to enforce the access controls. Our compiler pass adds code to system startup to configure the MPU (Figure 3 and Table I). The startup code then drops the privileges of the application that is about to execute, causing it to start execution in unprivileged mode.

B. Privilege Overlay

We maintain the developer’s assumption of access to all instructions and memory locations by using a technique that we call privilege overlay. This technique identifies all instructions and memory accesses that are restricted by the access controls (referred to as restricted operations) and elevates just these instructions. Conceptually, this is like overlaying the original program with a mask that elevates just those instructions which require privileged mode. In some ways, privilege overlaying is similar to an application making an operating system call and transitioning from unprivileged to privileged mode. However, instead of a fixed set of calls that operate in the operating system’s context, it creates a minimal set of instructions (loads and stores from and to sensitive locations, and two specific instructions) that execute in their original context (the only context used in a bare-metal application) after being given permission to perform the restricted operation. By elevating through the privilege overlay just those instructions which perform restricted operations, we simplify the development process, and by carefully selecting the restricted operations, we limit the power of a write-what-where vulnerability.

Privilege overlaying requires two mechanisms: one to elevate privileges for just the restricted operations, and one to identify all the restricted operations. Architectures employing multiple execution modes provide a mechanism for requesting the execution of higher-privileged software. On ARM, this is the SVC instruction, which causes an exception handler to be invoked. This handler checks whether the call came from an authorized location and, if so, elevates the execution mode to privileged mode and returns to the original context. If the call was not from an authorized location, the handler passes the request on to the original handler without elevating the privilege, i.e., it denies the request silently. The compiler identifies each restricted operation, prepends it with a call to the SVC handler, and, immediately after the restricted operation, adds instructions that drop the execution privileges. Thus, each restricted operation executes in privileged mode, and execution then immediately returns to unprivileged mode.

The restrictions on how the MPU configuration can be specified create challenges for EPOXY. The MPU can only protect blocks of memory of at least 32 Bytes, and sometimes these blocks include both memory-mapped registers that must be protected to ensure system integrity and registers that must be accessible for correct functionality. For example, the Vector Table Offset Register (VTOR) and the Application Interrupt and Reset Control Register (AIRCR) are immediately adjacent to each other in one 32 Byte region. The VTOR points to the location of the interrupt vector table and is thus a security-critical register, while the AIRCR is used (among other things) by software running on the device to request a system reset (say, to reload a new firmware image) and is thus not security critical. There is no way to set permissions on the VTOR without also applying the same permissions to the AIRCR. EPOXY overcomes this restriction by adding accesses to the AIRCR to the privilege overlay, thus elevating privileges whenever the AIRCR is accessed.

C. Identifying Restricted Operations

To identify restricted operations, we utilize static analysis and, optionally, source code annotations by the developer. Static analysis enables the compiler to identify many of the restricted operations, reducing the burden on the developer. We use two analyses: one to identify restricted instructions and a second to identify restricted memory accesses. Restricted instructions are defined by the Instruction Set Architecture (ISA) and require execution in privileged mode. For the ARMv7-M architecture, these are the CPS and MSR instructions, each of which controls specific flags in the program status register, such as enabling or disabling interrupt processing. These privileged instructions are identified by string matching during the appropriate LLVM pass. Identifying restricted memory accesses, however, is more challenging.

An important observation enables EPOXY to identify most restricted accesses. In our case, the memory addresses being accessed are memory-mapped registers, and in software these accesses are reads and writes to fixed addresses. Typically, a Hardware Abstraction Layer (HAL) is used to make these accesses. Our study of HALs identified three patterns that cover most accesses to these registers. The first pattern uses a macro to directly access a hard-coded address. The second pattern uses a similar macro and a structure to access fixed offsets from a hard-coded address. The last pattern uses a structure pointer set to a hard-coded address. All three use a hard-coded address or a fixed offset from one, which is readily identifiable by static analysis.

Our static analysis uses backward slicing to identify these accesses. A backward slice contains all instructions that affect the operands of a particular instruction, which enables identifying the potential values of operands at a particular location in a program. We limit our slices to a single function and examine only the definitions of the address operand of load and store operations. Accesses to sensitive registers are identified by checking whether the address being accessed is derived from a constant address. This static analysis captures many of the restricted memory accesses; however, not all accesses can be statically identified, and manual annotations (likely by the developer) are required in these cases. This primarily occurs when memory-mapped registers are used as arguments in function calls or when aliasing of memory-mapped registers occurs. Aliasing occurs when the register is not directly referenced but is assigned to a pointer, and multiple copies of that pointer are made so that the register is accessible via many different pointers. Note that we observed few annotations in practice, and most are generic per hardware platform, i.e., they can be provided by the manufacturer. These cases point to two limitations of our current static analysis. First, our backward slicing is limited to a single function; with some bounded engineering effort, it can be extended to perform inter-procedural analysis. Second, handling aliasing requires precise alias analysis, which is undecidable in the general case [50]. However, embedded programs, and specifically accesses to memory-mapped registers, are constrained in their program structures, which reduces the concern of aliasing in this domain.

D. Modified SafeStack

EPOXY defends against control-flow hijacking attacks by employing SafeStack [40], adapted to bare-metal systems. SafeStack is a protection mechanism that uses static analysis to move local variables that may be used in an unsafe manner to a separate unsafestack. A variable is unsafe if it may access memory out of bounds or if it escapes the current function. For example, if a supplied parameter is used as the index of an array access, the array will be placed on the unsafestack. On desktop systems, SafeStack uses virtual addressing to isolate the unsafestack from the rest of the memory. By design, return addresses are always placed on the regular stack, because they have to be protected from illegal accesses; SafeStack ensures that illegal accesses may only happen to items on the unsafestack. In addition to these security properties, SafeStack’s low runtime overhead (generally below 1% [40], §5.2) and deterministic impact on stack sizes make it a good fit for bare-metal systems. The deterministic impact means that, assuming known maximum bounds for recursion, the maximum size of both the regular stack and the unsafestack is fixed and can be determined a priori. Use of recursion without known bounds is bad design for bare-metal systems.

While the low runtime overhead of SafeStack makes it suitable for bare-metal systems, it needs an isolated memory region to be effective. The original technique, deployed on desktop-class architectures, relied on hardware support for isolation (either segmentation or virtual memory) to ensure low overhead. For example, it made the safe region accessible through a dedicated segment register, which is otherwise unused, and configured limits for all other segment registers to make the

TABLE I
THE MPU CONFIGURATION USED FOR EPOXY. FOR OVERLAPPING REGIONS, THE HIGHEST NUMBERED REGION (R) TAKES EFFECT.

R | Permissions | Start Addr | Size | Protects
0 | P-RW,U-RW,XN | 0x00000000 | 4GB | Default
4 | None | Varies | 32B | Unsafestack guard
5 | P-RW,U-R,XN | 0xE000E000 | 4KB | SCB
6 | P-RW,U-R,XN | 0x40013800 | 512B | Alias control register
7 | P-R,U-R,X | 0x00000000 | 256MB | Executable code

Our basis template uses five regions, as shown in Table I. Region 0 encodes the default permissions; using region 0 ensures that all other regions override these permissions. We then start from the highest regions and work down when assigning permissions, to ensure that the appropriate permissions are enforced. Region 7 is used to enforce W ⊕ X on executable memory. This region covers both the executable memory and its aliased addresses starting at address 0.
The three remaining regions (4-6) can be defined region inaccessible through them (on x86). Such hardware in any order and protect the SCB, alias control register, and segment registers and hardware protection are not available the unsafestack guard. in embedded architectures. The alternate pure software mech- The template can be modified to accommodate system anism based on Software Fault Isolation [56] would be too specific requirements, e.g., changing the start address and size expensive for our embedded applications because it requires of a particular region. For example, the two micro-controllers that all memory operations in a program are masked. While on used for evaluation place the alias control register at different some architectures with a large amount of (virtual) memory, physical addresses. Thus, we modified the start address and this instrumentation can be lightweight (e.g., a single and size for each micro-controller. Regions 1-3 are unused and operation if the safe region occupies a linear part of the address can be used to protect sensitive IO that is application specific. space – encoded in a mask, resulting in about 5% overhead), To do this, the start address and size cover the peripheral and here masking is unlikely to work because the safe region permissions are set to (P-RW,U-RW,XN). The addresses for will occupy a smaller and unaligned part of the scarce RAM all peripherals are given in micro-controller documentation memory. provided by the vendor. The use of the template enables system Therefore, to apply the SafeStack principle to bare-metal specific access controls to be placed on the system. It also systems, we place the unsafestack at the top of the RAM, and decouples the development of access control mechanisms and make the stack grow up, as shown in Figure 4a. We then place application logic. 
a guard between the unsafestack and the other regions in RAM, We implemented a pass in LLVM that generates code to shown as the black region in the figure. This follows best configure the MPU based on the template. The code writes practices for embedded systems to always grow a stack away the appropriate values to the MPU configuration registers to from other memory regions. The guard is created as part of enforce the access controls given in the template, and then the MPU configurations generated by the compiler. The guard reduces execution privileges. The code is called at the very region is inaccessible to both privileged and unprivileged beginning of main. Thus all of main and the rest of the code (i.e., privileges are (P-,W-,XN)). Any overflow on the program executes with reduced privileges. unsafestack will cause a fault either by accessing beyond the bounds of memory, or trying to access the guard region. It B. Privilege Overlays also prevents traditional stack smashing attacks because any Privileged overlay mechanisms (i.e., privilege elevation and local variable that can be overflown will be placed on the restricted operation identification) are implemented using an unsafestack while return addresses are placed on the regular LLVM pass. To elevate privileges two components are used. stack. Our design for the first time provides strong stack They are a privilege requester and a request handler. Requests protection on bare-metal embedded systems. are made to the handler by adding code which performs the operations around restricted operations, as shown in Algorithm . IMPLEMENTATION 1. This code saves the execution state and executes a SVC A. Access Controls (SVC FE) to elevate privileges. The selected instructions are We developed a prototype implementation of EPOXY, then executed in privileged mode, followed by a code sequence building on LLVM 3.9 [42]. 
In our implementation, access that drops privileges by setting the zero bit in the control controls are specified using a template. The template consists register. Note that this sequence of instructions can safely be of a set of regions that map to MPU region configurations executed as part of an interrupt handler routine as interrupts (see Section III-C for the configuration details). Due to current execute with privileges and, in that mode, the CPU ignores hardware restrictions, a maximum of 8 regions are supported. both the SVC instruction and the write to the control register. Fig. 4. Diagrams showing how diversification is applied. (a) Shows the RAM layout with SafeStack applied before diversification techniques are applied. (b) Shows RAM the layout after diversification is applied. Note that unused memory (gray) is dispersed throughout RAM, the order of variables within the data section (denoted 1-7) and bss section (greek letters) are randomized. Regions A, B, C, and D are random sizes, and G is the unsafestack guard region. (c) Layout of functions before protection; (d) Layout of functions after trapping and randomizing function order.

Algorithm 1 Procedure used to request elevated privileges
1: procedure REQUEST PRIVILEGED EXECUTION
2: Save Register and Flags State
3: if In Unprivileged Mode then
4: Execute SVC FE (Elevates Privileges)
5: end if
6: Restore Register and Flags
7: Execute Restricted Operation
8: Set Bit 0 of Control Reg (Reduces Privileges)
9: end procedure

Algorithm 2 Request handler for elevating privileges
1: procedure HANDLE PRIVILEGE REQUEST
2: Save Process State
3: if Interrupt Source == SVC FE then
4: Clear bit 0 of Control Reg (Elevates Privileges)
5: Return
6: else
7: Restore State
8: Call Original Interrupt Handler
9: end if
10: end procedure

The request handler intercepts three interrupt service routines and implements the logic shown in Algorithm 2. The handler stores register state (R0-R3 and LR; the remaining registers are not used) and checks that the caller is an SVC FE instruction. Authenticating the call site ensures that only requests from legitimate locations are allowed. Due to W⊕X, no illegal SVC FE instruction can be injected. If the interrupt was caused by something other than the SVC FE instruction, the original interrupt handler is called.

The request handler is injected by the compiler by intercepting three interrupt handlers: the SVC handler, the Hard Fault handler, and the Non-Maskable Interrupt handler. Note that executing an SVC instruction causes an interrupt. When interrupts are disabled, the SVC results in a Hard Fault. Similarly, when the Fault Mask is set, all interrupt handlers except the Non-Maskable Interrupt handler are disabled, and an SVC instruction executed in this state causes a Non-Maskable Interrupt. Enabling and disabling both interrupts and faults are privileged operations; thus all three interrupt sources need to be intercepted by the request handler.

Privilege requests are injected for every identified restricted operation. The static analyses used to identify restricted operations are implemented in the same LLVM pass. The pass adds a privilege elevation request to all CPS instructions, and to all MSR instructions that target a register other than the APSR registers. These instructions require execution in privileged mode. To detect loads and stores from constant addresses, we use LLVM's use-def chains to get the backward slice for each load and store. If the pointer operand can be resolved to a constant address, it is checked against the access controls applied in the MPU. If the MPU's configuration restricts that access, a privilege elevation request is added around the operation. This identifies many of the restricted operations. Annotations can be used to identify additional restricted operations.

C. SafeStack and Diversification

The SafeStack in EPOXY extends and modifies the SafeStack implemented in LLVM 3.9. Our changes enable support for the ARMv7-M architecture, change the stack to grow up, and use a global variable to store the unsafestack pointer. Stack offsets are applied with global data randomization.

Global data randomization is applied using a compiler pass. It takes the amount of unused RAM as a parameter, which is then randomly split into five groups. These groups specify how much memory can be used in each of the following regions: stack offset, data region, bss region, unsafestack offset, and unused. The number of bytes added to each section is a multiple of four, to preserve alignment of variables on word boundaries. The data and bss region diversity is increased by adding dummy variables to each region.

Note that adding dummy variables to the data region increases the Flash used, because the initial values for the data section are stored as an array in the Flash and copied to RAM at reset. However, Flash capacity on a micro-controller is usually several times larger than the RAM capacity, so this is less of a concern. Further, an option can be used to restrict the amount of memory for dummy variables in the data section. Dummy variables in the bss do not increase the amount of Flash used.

Another LLVM pass is used to randomize the function order. This pass takes the amount of memory that can be dispersed throughout the text section. It then disperses this memory between the functions by adding trap functions to the global function list. The global function list is then randomized, and the linker lays out the functions in the shuffled order in the final binary. A trap function is a small function which, if executed, jumps to a fault handler. These traps are never executed in a benign execution and thus incur no runtime overhead, but they detect unexpected execution.

VI. EVALUATION

We evaluate the performance of EPOXY with respect to the design goals, both in terms of security and resource overhead. We first evaluate the impact on runtime and energy using a set of benchmarks. We then use three real-world IoT applications to understand the effects on runtime, energy consumption, and memory usage.

TABLE II
THE RUNTIME AND ENERGY OVERHEADS FOR THE BENCHMARKS EXECUTING OVER 2 MILLION CLOCK CYCLES. COLUMNS ARE SAFESTACK ONLY (SS), PRIVILEGE OVERLAY ONLY (PO), AND ALL PROTECTIONS OF EPOXY APPLIED, AVERAGED ACROSS 20 VARIANTS (ALL), AND THE NUMBER OF CLOCK CYCLES EACH BENCHMARK EXECUTED, IN MILLIONS OF CLOCK CYCLES. AVERAGE IS FOR ALL 75 BENCHMARKS.

Benchmark | Runtime SS (%) | Runtime PO (%) | Runtime All (%) | Energy SS (%) | Energy PO (%) | Energy All (%) | Clk (M)
crc32 | 0.0 | 0.0 | 2.9 | -0.1 | -0.6 | 2.5 | 2.2
sg..insearch | 0.0 | 0.2 | -1.0 | -0.2 | -0.9 | 0.5 | 2.2
ndes | 2.9 | -0.2 | 1.3 | 2.4 | 1.2 | 3.4 | 2.4
levenshtein | 1.5 | 0.0 | 3.0 | 1.7 | 0.8 | 3.8 | 2.6
sg..quicksort | -2.3 | 0.0 | -1.4 | -2.8 | -0.5 | -0.3 | 2.7
slre | -1.5 | -0.3 | 5.3 | -2.0 | -0.3 | 8.1 | 2.9
sgl..htable | -0.6 | 0.0 | 2.0 | -1.0 | -0.7 | 3.4 | 2.9
sgl..dllist | -0.6 | 0.0 | 0.7 | 0.3 | -0.1 | 2.6 | 3.7
edn | 0.0 | -0.1 | 0.8 | 1.9 | 1.5 | 4.2 | 3.8
sg..insertsort | -0.3 | 0.0 | 1.7 | -0.1 | -1.6 | 1.6 | 3.9
Next, we present an evaluation of the effec- sg..heapsort 0.0 0.0 -0.5 -0.1 1.4 1.9 4.0 tiveness of the security mechanisms applied in EPOXY. This sg..queue -7.3 0.0 -7.3 -4.2 -0.9 -3.4 4.6 includes an evaluation of the effectiveness of diversification sg..listsort -0.4 0.0 0.7 -0.1 -0.5 2.4 4.9 to defeat ROP-based code execution attacks and discussion fft 0.0 0.4 0.4 -0.1 0.6 -0.3 5.1 bubblesort 0.0 0.0 1.7 -0.1 1.0 2.6 6.8 of the available entropy. We complete our evaluation by matmult int 0.0 0.0 1.2 -0.1 -0.4 0.7 6.8 comparing our solution to FreeRTOS with respect to the three adpcm 0.0 0.1 -0.4 0.1 2.3 0.6 7.3 IoT applications. sglib rbtree -0.2 -0.1 2.4 0.1 -0.7 3.7 7.4 mat..float 0.0 0.6 0.7 0.0 0.1 1.2 8.6 Several different kinds of binaries are evaluated for each frac 1.6 2.0 1.7 2.4 2.8 4.0 9.9 program using different configurations of EPOXY these are: st 0.0 0.1 0.4 -0.9 -0.3 1.2 19.0 (1) unmodified baseline, (2) privilege overlays (i.e., applies huffbench 1.3 0.0 1.5 7.3 1.2 4.5 20.9 privilege overlaying to allow the access controls to protect fir -1.0 -1.0 1.7 -2.0 1.5 3.1 21.0 cubic -0.2 0.2 0.1 0.0 -0.2 0.6 30.1 system registers and apply W ⊕ X.), (3) SafeStack only, and stb perlin 0.0 -1.3 0.0 0.0 -3.0 0.4 31.6 (4) fully protected variants that apply privileged overlaying, mergesort -0.2 0.5 2.1 -1.0 -0.4 3.1 44.0 SafeStack, and software diversity. We create multiple variants qrduino 0.0 0.0 -1.2 -0.1 -0.7 -0.6 46.0 of a program (20 is the default) by providing EPOXY a unique picojpeg 0.0 -0.4 -2.4 0.0 0.0 0.2 54.3 blowfish -0.4 0.0 -1.3 1.4 -1.3 0.5 56.9 diversification seed. All binaries were compiled using link time dijkstra 0.0 -0.1 -8.7 -0.1 0.0 -7.3 70.5 optimization at the O2 level. 
rijndael -1.1 0.0 0.1 -0.6 -0.4 2.0 94.9 We used two different development boards for our sqrt 0.0 2.1 1.4 0.0 1.8 2.1 116.2 whetstone -0.4 -0.3 0.1 0.8 0.3 1.6 135.5 experiments the STM32F4Discovery board [6] and the nbody 1.1 1.1 0.4 0.9 0.9 2.5 139.0 STM32F479I-Eval [5] board. Power and runtime were mea- fasta 0.0 0.0 0.4 0.1 0.4 1.2 157.1 sured using a logic analyzer sampling execution time at wikisort 0.3 0.9 2.1 0.2 0.1 3.0 179.6 100Mhz. Each application triggers a pin at the beginning and lms 0.0 0.1 0.6 -0.1 0.3 0.2 225.2 sha -3.5 0.0 -3.7 -1.3 -0.2 0.2 392.9 at the end of its execution event. A current sensor with power Average 0.1 0.1 1.1 0.2 -0.2 2.5 26.3 resolution of 0.5 µW was attached in series with the micro- controller’s power supply enabling only the power used by the the runtime and energy consumption for 64 iterations of the micro-controller to be measured. The analog power samples benchmark for each binary. were taken at 125 KHz, and integrated over the execution time Across the 75 benchmarks the average overhead is 1.6% to obtain the energy consumption. for runtime and 1.1% for energy. The largest increase is on cover 14.2% runtime, 17.9% energy and largest decrease on A. Benchmark Performance Evaluation compress (-11.7% runtime, -10.2% energy). ctl stack is the To measure the effects of our techniques on runtime and en- only other benchmark that has a change in runtime (13.1%) or ergy we use the BEEBs benchmarks [47]. The BEEBs’ bench- energy (15.8%) usage that exceeds ±10%. Table II shows the marks are a collection of applications from MiBench [34], runtime and energy overheads for the benchmarks executing WCET [33] and DSPstone [60] benchmarks. They were de- over 2 million clock cycles. The remaining benchmarks are signed and selected to measure execution performance and en- omitted for space. We find runtime is the biggest factor in ergy consumption under a variety of computational loads. 
We energy consumption—the Spearman’s rank correlation coeffi- selected the 75 (out of 86) BEEBs’ benchmarks that execute cient is a high 0.8591. for longer than 50,000 clock cycles, and thus, providing a fair The impact on execution time can be explained by the comparison to real applications. For reference, our shortest IoT application of SafeStack (e.g., sg..queue in Table II) and diver- application executes over 800,000 clock cycles. Each is loaded sification. Modest improvements in execution time were found onto the Discovery board and the logic analyzer captures by the creators of SafeStack ([40] §5.2), the primary cause being improvements in locality. Likewise, our improvements B. Application Performance Evaluation come from moving some variables to the unsafestack. These typically tend to be larger variables like arrays. This increases Benchmarks are useful for determining the impact of our the locality of remaining variables on the regular stack and techniques under controlled conditions. To understand the enables them to be addressed from offsets to the stack pointer, overall effects on realistic applications, we use three represen- rather than storing base addresses in registers and using offsets tative IoT applications. Our first program, PinLock, simulates from these. This frees additional registers to store frequently a simple IoT device like a door lock. It requests a four digit used variables, thus reducing register spilling, and consequent pin be entered over a serial port. Upon reception the pin is writes and reads to the stack, thereby improving execution hashed, using SHA1, and compared to a precomputed hash. time. The impact of the privilege overlay on the running If the hashes match, an LED is turned on, indicating the time is minimal because these benchmarks have few restricted system is unlocked. If an incorrect pin is received the user is operations in them and the setups due to EPOXY (such as prompted to try again. 
In this application the IO is restricted MPU configuration) happen in the startup phase which is not to privileged mode only, thus each time the lock is unlocked, measured for calculating the overhead. privileged execution must first be obtained. This demonstrates EPOXY’s ability to apply application specific access controls. Diversification changes execution time in two ways. The We repeatedly send an incorrect pin followed by the correct first is locality of functions and variables relative to each other. pin and measure time between successful unlocks. The baud Consider separately the case of a control-flow transfer and a rate (115,200 max standard rate) of the UART communications memory load/store. When a control-flow transfer is done (say is the limiting factor in how fast login attempts are made. a branch instruction) and the target is close by, then the target We also use two vendor applications provided with the address is created relative to the PC and control flow is trans- STM32F479I-Eval board. The FatFS-uSD program imple- ferred to that address (1 instruction). On the other hand, if the ments a FAT file system on a micro-SD card. It creates a target address is farther off, then a register is loaded with the file on the SDCard, writes 1KB of data to the file and then address (2 instructions) and control transferred to the content reads back the contents and checks that they are the same. We of the register (1 instruction). Sometimes diversification puts measure the time it takes to write, read and verify the file. The the callee and called function farther apart than in the baseline TCP-Echo application implements a TCP/IP stack and listens in which case the more expensive operation is used. In other for a packet on the Ethernet connection. When it receives a cases the opposite occurs, enabling less expensive (compared packet it echoes it back to the receiver. We measure the time to the baseline) control transfer to be used. 
Similarly, when it takes to send and receive 1,000 packets, with requests being a memory load (or store) is done from a far off location, a sent to the board fast enough to fully saturate the capabilities new register needs to be loaded with the address and then of the STM32F479I-Eval board (i.e., computation on the board the location accessed (3 instructions), while if it were to a is the limiting factor in how fast packets are sent and received). location near an address already in a register, then it can For each of the three applications we create the same set be accessed using an offset from that register as the base of binaries used for the benchmarks: baseline, SafeStack only, address (1 instruction). The dispersed accesses also uses more privilege overlay only, and 20 variants with all protections registers, increasing register pressure. of EPOXY. To obtain runtime and energy consumption we average 10 executions of each binary. Percent increase relative Another effect of diversification is even more subtle and to the baseline binary is taken for each binary. The average architecture specific. In our target ARM architecture, when a runtime overhead is 0.7% for PinLock, 2.4% for FatFS-uSD, caller invokes a function, general-purpose registers R0-R3 are and 2.1% for TCP-Echo. Figure 5a shows the execution time assumed to be used and overwritten by the callee function overheads as a whisker plot. In the worst case among all and therefore the compiler does not need to save the values executions of all applications protected with EPOXY, the of those registers in the callee context. Thus the compiler runtime overhead is 6.5% occurring on TCP-Echo. Again gives preference to using R0-R3 when allocating registers. Due we see energy consumption is closely related to execution to our register randomization this preference is not always time. 
Each application’s average energy overheads are: −2.9% followed, and other general purpose registers (R4-R13) are for PinLock, 2.6% for FatFS-uSD and 1.8% for TCP-Echo. used more often than they are in the baseline case. When R4- Figure 5b shows the energy consumption overheads, with R13 are used they first must be saved to, and restored from a noticeable difference: PinLock has a very tight runtime the stack, decreaseing performance. To partially alleviate this distribution, and a relatively wide energy distribution. This performance hit, EPOXY in its register randomization favors application is IO bound and the application is often waiting to the use of the registers R0-R3 in the callee function through a receive a byte over the serial port, due to the slow serial con- non-uniform stochastic process, but does not deterministically nection, causing the time variation to be hidden. However, the enforce this. Reassuringly, the net effect from all the instances changed instruction mix due to EPOXY still causes variation of the diversification is only a small increase in the runtime— in energy overhead. a worst case of 14.7% and an average of 1.1% across all the Changes in memory usage are shown in Table III. It shows benchmark applications. the averages of increase to code (text section), global data (data eoyuae tas ral eue h udno the on burden the reduces greatly and also consumption non- It the energy usage. within runtime, operate memory of protections constraints EPOXY’s functional that find we TCP-Echo. deepest or the PinLock SafeStack of for beyond needed size memory, not stack additional is the Thus, increases path. is it execution memory and when extra save needed privileges—but only memory—to elevating while additional Privilege state required. 
require restore is also path may only execution stack, overlays deepest single a the has for which memory baseline, the for comparison, In deepest the and for in memory requires bytes stack 128 the the stack and splitting the PinLock because increases SafeStack. in requirements, SafeStack TCP-Echo. applying increase and from the FatFS-uSD come all both size for majority stack accounts The in It variables. increase of the by alignment of preserve caused the to are added used for in data bytes be bytes fit on could (4 still Impacts micro-controller SafeStack protections. would same of EPOXY’s bytes the size with 3,390 thus text Flash, additional baseline 16KB a the has bytes, PinLock which For 11,788 application) code. for applications smallest bytes three additional (the the 3,390 all that than find less we needed to all, as In compiler code. and the varying cause code, emit can new second diversification injects a previously directly manage discussed code overlaying to the privilege instructions increases additional stack, SafeStack requiring size. code by the each size diversifica- affect of all variants and can 20 overlaying, tion the privilege for usage SafeStack, stack application. and sections), bss and only SafeStack binary. the only shows overlay privilege energy diamond and the The shows (a) applications. time star execution IoT the in and three increase binary, the percent showing for plots (b) Box 5. Fig. TCP-Echo FatFs-uSD PinLock App rmtepromneadmmr sg requirements usage memory and performance the From I sum CES NMMR SG O THE FOR USAGE MEMORY IN NCREASE

fteeeuinptswt h eps eua stack regular deepest the with paths execution the of PinLock PLIGALOF ALL APPLYING (a) ,4 8)72(0%) 7.2 Data Global (1%) 18.2 (8%) 3,249 (1%) 14.6 (12%) 2,839 (29%) 3,390 Text

unsafestack FatFS-uSD

TCP-Echo AL III TABLE cosalpsil xcto paths. execution possible all across EPOXY’ unsafestack PROTECTIONS S I

O PinLock T 2 2% 0 (1%) 36 (29%) 128 0 (3%) 128 Over. Priv. (25%) 104 SafeStack PLCTOSFROM APPLICATIONS (b) one) n few a and pointer),

FatFS-uSD Stack .

TCP-Echo taeisdsrbt n nsdmmr ihntedt,bss, data, Let the regions. diversification within text Our memory and memory. unused of any amount distribute availablestrategies the diversity by of constrained amount also is the This Ultimately using registers. variables. programming and global Data-oriented for data, against diversification code, protection the uses provides in EPOXY locations attack, function corruption data and privileged current in the is to execution relative entire mode. the reduction which sharp small in a a state-of-practice is in to results and 5 This surface average each. attack on within executed that are and the instructions small that 7 shows is It overlays IV. privilege Table of in variants shown number 20 are the applications for IoT a the results wrote of and The We application overlays. the privilege overlay. of all the code identifies assembly within the addressing parses which for overlay) verifier, have privilege used many the are how to that and (external each how registers each, occur, for defined in overlays externally executed measure many are how we instructions applications, overlays, many IoT the privilege three into the the insight of gain by To posed system. undermine embedded risk could the privilege of elevated the security part nec- attack, the as reuse and executed code desired is a operation is of restricted this the context if original However, essary. its in and execution, its evaluate we and probabilistic is by one or last coverage. deterministic, the are of two that first design, the While poten- of addresses. guarantees isolating return security by from the protection variables attacks stack effective ROP exploitable and tially provides smashing which stack against SafeStack, incorporates adapted EPOXY an vulnerability. write-what-where a using disable Our or protections. bypass to other for foundational W is EPOXY and First, injection protection? 
developer. For all BEEBS benchmarks, FatFs-uSD, and TCP-Echo applications (77 in all), a total of 10 annotations were made. These annotations were all made for the Hardware Abstraction Library (HAL) in ARM's CMSIS library, a common C-language library for ARM components that is shared across applications. PinLock required an additional 7 annotations to protect its IO. We envision HAL writers could provide pre-annotated libraries, further reducing the burden on developers. The annotations were all required because offsets were passed as arguments to functions and a constant offset was added to a store address. Extending our analysis to be inter-procedural will remove the need for manual annotation and allow the compiler to handle these cases. Our compiler elevated 35 (PinLock), 31 (FatFs-uSD), and 25 (TCP-Echo) operations on the IoT applications.

C. Security Evaluation

EPOXY meets the design goals for usability and performance, but [...]

1) Verifier: [...] Each restricted operation is granted privileged [...] mechanism also protects against attacks which attempt [...] by manipulating system registers [...] W ⊕ X, a proven protection against [...]

2) Diversification: To further mitigate code reuse attacks [...] Let S denote the amount of slack memory and R denote the size of the region (any one of the three above, depending on which kind of diversification we are analyzing). For the text region, S is the amount of unused Flash; for the data and bss regions, S is the amount of unused RAM. The total amount of memory available for diversifying any particular region is then R + S. For example, in the global data region a variable can be placed anywhere within R, and the slack memory S can be split up and any piece “slid” anywhere within the region. Since each region is randomized by adding variables or jump instructions with a size of 4 bytes, the total number of locations for a pointer is (R + S)/4, which gives log2((R + S)/4) bits of entropy.

Let us consider PinLock, our smallest example. It uses 2,376 bytes of RAM and would require a part with 4,096 bytes of RAM, leaving 1,720 bytes of slack. PinLock's data section is 1,160 bytes, so a four-byte pointer can have (1,160 + 1,720)/4 = 720 locations, or over 9 bits of entropy. This exceeds Linux's kernel-level ASLR (9 bits, [29] Section IV), and unlike Linux's ASLR, disclosure of one variable does not reveal the location of all others.

The text region is 11,788 bytes, which means at least 16KB of Flash would be used. Since all Flash can be used except the region used for storing initial values for the data region (a maximum of 1,160 bytes in PinLock, the size of its data section), the text section can be diversified across 16,384 − 1,160 = 15,224 bytes. This enables approximately 3,800 locations for a function to be placed, which translates to entropy of just under 12 bits. Entropy is ultimately constrained by the small size of memory but, similar to kernel ASLR, an attacker cannot repeatedly guess, as the first wrong guess will raise a fault and stop the system.

3) ROP analysis: To understand how diversity impacts code reuse attacks, we used the ROPgadget [52] ROP compiler. This tool disassembles a binary and identifies all the available ROP gadgets. A ROP gadget is a small piece of code ending in a return instruction; it provides the building block for an attacker to perform ROP attacks. ROP attacks are a form of control-flow hijack attack that uses only the code already on the system, thus bypassing code integrity mechanisms. By chaining multiple gadgets together, arbitrary (malicious) execution can be performed. By measuring surviving gadgets across different variants, we gain an understanding of how difficult it is for an attacker to build a ROP attack for a large set of binaries.

For each of the three applications, we identify gadgets individually in each of 1,000 variants. Each variant had all protections applied. To obtain the gadgets, ROPgadget parsed each file and reported all gadgets it found, including duplicates. ROPgadget considers a duplicate to be the same instructions at a different location; by including these, we ensure that gadgets have the best chance of surviving across variants. The number of gadgets located at the same location with the same instructions was then counted across the 1,000 variants. To define the metric “number of gadgets surviving across x variants”, consider a gadget that is found at the same location and with identical instructions in all x variants; counting all such gadgets defines the metric. This is a well-used metric because the adversary can rely on such a gadget to craft a control-flow hijacking attack that works across all x variants. As x goes up, this metric is expected to decrease.

TABLE V
NUMBER OF ROP GADGETS FOR 1,000 VARIANTS OF THE IOT APPLICATIONS. LAST INDICATES THE LARGEST NUMBER OF VARIANTS FOR WHICH ONE GADGET SURVIVES.

                        Num. Surviving
App         Total      2      5     25     50     Last
PinLock     294K       14K    8K    313    0      48
FatFs-uSD   1,009K     39K    9K    39     0      32
TCP-Echo    676K       22K    9K    985    700    107

Table V shows the number of gadgets that survived across a given number of variants. To interpret it, note that the column “2” gives the count of gadgets that survived across 2 or more variants of the program. The last remaining gadget survived across 48 variants of PinLock, only 32 variants of FatFs-uSD, and 107 variants of TCP-Echo. If a ROP attack only needs the single gadget that survives across the maximum number of variants (an already unlikely event), it would work on just over 10% of all variants (107 of 1,000 for TCP-Echo). This shows that our code diversification technique can successfully break the attacker's ability to use the same ROP attack against a large set of binaries.
TABLE IV
RESULTS OF OUR VERIFIER SHOWING THE NUMBER OF PRIVILEGE OVERLAYS (PO), AVERAGE NUMBER OF INSTRUCTIONS IN AN OVERLAY (AVE), MAXIMUM NUMBER OF INSTRUCTIONS IN AN OVERLAY (MAX), AND THE NUMBER OF PRIVILEGE OVERLAYS THAT USE EXTERNALLY DEFINED REGISTERS FOR ADDRESSING (EXT).

App         PO    Ave    Max    Ext
PinLock     40    7.0    53     15
FatFs-uSD   31    5.0    20     0
TCP-Echo    25    5.2    20     0

D. Comparison to FreeRTOS

Porting an application to FreeRTOS-MPU could provide some of the protections EPOXY provides. Compared to EPOXY, FreeRTOS-MPU does not provide W ⊕ X or code reuse defenses. FreeRTOS-MPU provides privilege separation between user tasks and the kernel task. User tasks running in unprivileged mode can access their stack and three user-definable regions, which a task can use to share data with another user-mode task. A kernel task runs in privileged mode and can access the entire memory map. A user task that needs to perform a restricted operation can be started in privileged mode, but then the entire execution of the user task will be in privileged mode. If the privilege level is dropped, it cannot be elevated again for the entire duration of the user task, likely a security feature of FreeRTOS-MPU.

We compare our technique to FreeRTOS-MPU by porting PinLock to FreeRTOS-MPU. The vendor, STMicroelectronics, provided equivalent applications for FatFs-uSD and TCP-Echo that use FreeRTOS; we added MPU support to these applications. This required: 1) changing the linker and startup code of the application to be compatible with the FreeRTOS-MPU memory map; 2) changing the existing source code to use the FreeRTOS-MPU-specific API; and 3) if any part of a task required a privileged operation, running the entire task with full privileges (e.g., the task initializing the TCP stack).

TABLE VI
COMPARISON OF RESOURCE UTILIZATION AND SECURITY PROPERTIES OF FREERTOS-MPU (FREERTOS) VS. EPOXY, SHOWING MEMORY USAGE, TOTAL NUMBER OF INSTRUCTIONS EXECUTED (EXE), AND THE NUMBER OF INSTRUCTIONS THAT ARE PRIVILEGED (PI).

App         Tool       Code    RAM     Exe       PI
PinLock     EPOXY      16KB    2KB     823K      1.4K
            FreeRTOS   44KB    30KB    823K      813K
FatFs-uSD   EPOXY      27KB    12KB    33.3M     3.9K
            FreeRTOS   58KB    14KB    34.1M     33.0M
TCP-Echo    EPOXY      43KB    35KB    310.0M    1.5K
            FreeRTOS   74KB    51KB    321.8M    307.0M

Table VI shows the code size, RAM size, number of instructions executed, and number of privileged instructions for each application using EPOXY and FreeRTOS-MPU. The number of instructions executed (Exe) is the number of instructions the whole application executes to run to completion. Privileged instructions (PI) counts how many of these instructions execute in privileged mode. Both are obtained using the Data Watchpoint and Trace (DWT) unit provided by ARM [11]. The results for EPOXY are averaged over 100 runs (20 variants with 5 runs per variant), and FreeRTOS-MPU's are averaged over 100 runs. The total number of instructions is expected to be comparable, as both run the same applications. However, on average EPOXY executes only 0.06% as many privileged instructions as FreeRTOS-MPU. This is because EPOXY uses a fine-grained approach to specify the privileged instructions, while FreeRTOS-MPU sets the whole task as privileged. A large value for PI is undesirable from a security standpoint because each privileged instruction can be exploited to perform security-critical functions, such as turning off the MPU and thereby disabling all (MPU-based) protections.
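The 0.06% figure can be checked against Table VI with a short Python calculation. This sketch assumes the reported average is the unweighted mean of the three per-application ratios of EPOXY's to FreeRTOS-MPU's privileged-instruction counts; the text does not state how the average was taken, but this reading reproduces the reported value. The names `PI_COUNTS` and `pi_ratio_percent` are illustrative only.

```python
# Privileged-instruction (PI) counts transcribed from Table VI.
# K = 1e3 and M = 1e6 instructions.
PI_COUNTS = {
    # app:        (EPOXY PI, FreeRTOS-MPU PI)
    "PinLock":   (1.4e3, 813e3),
    "FatFs-uSD": (3.9e3, 33.0e6),
    "TCP-Echo":  (1.5e3, 307.0e6),
}


def pi_ratio_percent(epoxy_pi: float, freertos_pi: float) -> float:
    """EPOXY's privileged instructions as a percentage of FreeRTOS-MPU's."""
    return 100.0 * epoxy_pi / freertos_pi


# Assumption: unweighted mean over the three applications.
ratios = {app: pi_ratio_percent(e, f) for app, (e, f) in PI_COUNTS.items()}
average = sum(ratios.values()) / len(ratios)

for app, r in ratios.items():
    print(f"{app:10s} {r:.4f}%")
print(f"average    {average:.2f}%")  # average    0.06%
```

Under this interpretation the mean is dominated by PinLock (about 0.17%); for FatFs-uSD and TCP-Echo the ratios are roughly 0.012% and 0.0005%, because their long-running tasks execute almost entirely in privileged mode under FreeRTOS-MPU.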
VII. RELATED WORK

Our work uses our novel privilege overlays to enable established security policies from the desktop world for bare-metal embedded systems. We also customize several of these protections to the unique constraints of bare-metal systems. Modern desktop operating systems such as Windows, Linux, and Mac OS X protect against code injection and control-flow hijack attacks through a variety of defenses, such as DEP [55], stack canaries [22], Address Space Layout Randomization [49], and multiple levels of execution privileges.

The research community has expended significant effort in developing defenses against control-flow hijacking and data corruption. These works include Artificial Diversity [20, 36, 13, 14, 35, 32, 38, 41, 48, 25], Control-Flow Integrity (CFI) [9, 43, 58, 59, 46, 18], and Code Pointer Integrity (CPI) [40]. Cohen [20] outlines many techniques for creating functionally equivalent but different binaries and how they may impact the ability of attacks to scale across applications. A recent survey [41] performs an in-depth review of the 20+ years of work that has been done in this area. Artificial software diversity is generally grouped by how it is applied: by a compiler [36, 13, 14, 35, 32, 38, 45, 15] or by binary rewriting [48, 25]. With the exceptions of [32, 45, 15], these works target applications supported by an OS and assume a virtual address space to create large entropy. McLaughlin et al. [45] propose a firmware diversification technique for smart meters, using compiler rewriting. They give analytical results on how it would slow attack propagation through smart meters, but no analysis of execution-time overhead or energy consumption. Giuffrida et al. [32] diversify the stack by adding variables to stack frames, creating a non-deterministic stack size, which is not suitable for embedded systems. EPOXY applies compile-time diversification and uses techniques appropriate to the constraints of embedded systems. Braden et al. [15] focus on creating memory-leakage-resistant applications without hardware support. They use an approach based on SFI to prevent disclosure of code that has been randomized using fine-grained diversification techniques. Their approach assumes W ⊕ X and is compatible with MPUs. Our work provides a way to ensure enforcement of W ⊕ X automatically.

CFI uses control-flow information to ensure that all indirect control-flow transfers end up at valid targets. CFI faces two challenges: precision and performance. While the performance overhead has been significantly reduced over time [46, 54], even the most precise CFI mechanism is ineffective if an attacker finds a code location that allows enough gadgets to be reached, e.g., an indirect function call that may call the function desired by the attacker [19, 28]. CFI with custom hardware additions has been implemented on embedded systems [24] with low overhead; our techniques only require the commonly available MPU. CPI [40] enforces strict integrity of code pointers with low overhead but requires runtime support and virtual memory. However, separate memory regions and MMU-based isolation are not available on bare-metal embedded systems. We leverage SafeStack, an independent component of CPI that protects return addresses on the stack, and adapt it to embedded systems without virtual memory support.

Embedded systems security is an important research topic. Cui and Stolfo [23] use binary rewriting to inject runtime integrity checks and divert execution to these checks, diversifying code in the process. Their checks are limited to checking static memory via signatures and assume DEP. Francillon et al. [31] use micro-controller architecture extensions to create a regular stack and a protected return stack. EPOXY also uses a dual stack, but without additional hardware support. Firmware integrity attestation [30, 27, 44, 10] uses either a software or a hardware trust anchor to validate that the firmware and/or its execution matches a known standard. These techniques can be used to enforce our assumption that the firmware is not tampered with at installation. Some frameworks [16, 4, 7, 1] enable the creation of isolated computational environments on embedded systems. mbed OS [4] and FreeRTOS [1] are both embedded operating systems that can utilize the MPU to isolate OS context from application context. TyTan [16] and mbed µVisor [7] enable sandboxing between different tasks of a bare-metal system. These require that an application be developed using the framework's respective API. ARM's TrustZone [12] provides hardware to divide execution between untrusted and trusted execution environments; the ARMv7-M architecture does not contain this feature.

VIII. DISCUSSION

Real-time systems. The diversity techniques we employ introduce some non-determinism between variants. This may make them unsuitable for real-time systems with strict timing requirements. However, the variability is low (a few percent), making our techniques applicable to a wide range of devices, particularly IoT devices, as they generally have soft real-time constraints. Investigating methods to further reduce variability is an area of future work; it involves intrusive changes to the compiler infrastructure to make its actions more deterministic in the face of diversification.

Protecting inputs and outputs. We demonstrated EPOXY's ability to protect the lock actuator on PinLock. Protecting the Ethernet and SD interfaces is conceptually the same: a series of reads and writes to IO registers. However, the HAL for these interfaces makes use of long indirection chains, i.e., passing the addresses of these registers as function parameters. Our current analysis does not detect these accesses, and the complexity of the HAL makes manual annotation a daunting task. Extending our analysis to be inter-procedural will allow us to handle these complex IO patterns.

Use with lightweight OSs. EPOXY can be extended to apply its protections to lightweight OSs, such as FreeRTOS. Our diversity techniques are directly usable, as they do not change any calling conventions. Privilege overlays require the use of a system call, and care must be taken to ensure one is reserved. Currently SVC FE is used, an arbitrary choice that can be changed to a compile-time parameter. This enables the application of W ⊕ X, assuming the OS does not use the MPU, which is typically the case. To apply SafeStack, the only remaining protection, EPOXY needs to know the number of threads created and how to initialize each unsafe stack. This may be obtained by making EPOXY aware of the OS thread-creation functionality, so it can be modified to set up both stacks. The OS's context switch would also need to be changed to save and restore separate unsafe-stack guards for each thread. With these changes, EPOXY could apply its defenses to systems using a lightweight OS.

IX. CONCLUSION

Bare-metal systems typically operate without even basic modern security protections like DEP and control-flow hijack protections. This is caused by the dichotomy inherent in bare-metal system development: all memory is executable and accessible to simplify system development, but security principles dictate restricting some of this access at runtime. We propose EPOXY, which uses a novel technique called privilege overlaying to solve this dichotomy. It applies protections against code injection, control-flow hijack, and data corruption attacks in a system-specific way. A performance evaluation of our prototype implementation shows not only that these defenses are effective, but also that they result in negligible execution and power overheads. The open-source version of EPOXY is available at https://github.com/HexHive/EPOXY.

ACKNOWLEDGMENTS

We thank the anonymous reviewers for their insightful comments. We also thank Brandon Eames for his informative feedback. This material is based in part upon work supported by the National Science Foundation under Grant Numbers CNS-1464155 and CNS-1548114. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. This work is also funded by Sandia National Laboratories. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

REFERENCES

[1] FreeRTOS-MPU. http://www.freertos.org/FreeRTOS-MPU-memory-protection-unit.html
[2] FreeRTOS Support Forum. ARM CM3 MPU does not seem to build in FreeRTOS 9.0.0. https://sourceforge.net/p/freertos/discussion/382005//3743f72c/
[3] FreeRTOS Support Forum. Stack overflow detection on Cortex-m3 with MPU. https://sourceforge.net/p/freertos/discussion/382005/thread/18f8a0ce/#deab
[4] mbed OS. https://www.mbed.com/en/development/mbed-os/
[5] STM32479I-EVAL. http://www.st.com/resource/en/user_manual/dm00219352.pdf
[6] STM32F4-Discovery. http://www.st.com/st-web-ui/static/active/en/resource/technical/document/data_brief/DM00037955.pdf
[7] The mbed OS uVisor. https://www.mbed.com/en/technologies/security/uvisor/
[8] FreeRTOS Support Forum. Mistype in port.c for GCC/ARM CM3 MPU, Jan 2016. https://sourceforge.net/p/freertos/discussion/382005/thread/6a4f7df2/
[9] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, Control-flow integrity, In ACM Conf. on Computer and Communication Security. ACM, 2005, pp. 340–353.
[10] T. Abera, N. Asokan, L. Davi, J. Ekberg, T. Nyman, A. Paverd, A. Sadeghi, and G. Tsudik, C-FLAT: control-flow attestation for embedded systems software, In Symp. on Information, Computer and Communications Security, 2016.
[11] ARM, ARMv7-M Architecture Reference Manual, “E.b” ed., 2014.
[12] ARM, TrustZone, 2015. http://www.arm.com/products/processors/technologies/trustzone/
[13] S. Bhatkar, D. DuVarney, and R. Sekar, Address Obfuscation: An Efficient Approach to Combat a Broad Range of Memory Error Exploits. USENIX Security Symp., 2003.
[14] S. Bhatkar, D. DuVarney, and R. Sekar, Efficient Techniques for Comprehensive Protection from Memory Error Exploits, USENIX Security Symp., 2005.
[15] K. Braden, S. Crane, L. Davi, M. Franz, P. Larsen, C. Liebchen, and A.-R. Sadeghi, Leakage-resilient layout randomization for mobile devices, In Network and Distributed Systems Security Symp. (NDSS), 2016.
[16] F. Brasser, B. El Mahjoub, A.-R. Sadeghi, C. Wachsmann, and P. Koeberl, TyTan: Tiny trust anchor for tiny devices, In Design Automation Conf. ACM/IEEE, 2015, pp. 1–6.
[17] bunnie and Xobs, The exploration and exploitation of an SD memory card, In Chaos Computing Congress, 2013.
[18] N. Burow, S. A. Carr, J. Nash, P. Larsen, M. Franz, S. Brunthaler, and M. Payer, Control-Flow Integrity: Precision, Security, and Performance, ACM Computing Surveys, vol. 50, no. 1, 2018, preprint: https://arxiv.org/abs/1602.04056.
[19] N. Carlini, A. Barresi, M. Payer, D. Wagner, and T. R. Gross, Control-Flow Bending: On the Effectiveness of Control-Flow Integrity, In SEC: USENIX Security Symposium, 2015.
[20] F. B. Cohen, Operating system protection through program evolution, Computers and Security, vol. 12, no. 6, pp. 565–584, Oct. 1993.
[21] A. Costin, J. Zaddach, A. Francillon, and D. Balzarotti, A large-scale analysis of the security of embedded firmwares, In USENIX Security Symp., 2014, pp. 95–110.
[22] C. Cowan, C. Pu, D. Maier, and J. Walpole, StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks. USENIX Security Symp., 1998.
[23] A. Cui and S. J. Stolfo, Defending Embedded Systems with Software Symbiotes, In Intl. Conf. on Recent Advances in Intrusion Detection. Springer, 2011, pp. 358–377.
[24] L. Davi, M. Hanreich, D. Paul, A.-R. Sadeghi, P. Koeberl, D. Sullivan, O. Arias, and Y. Jin, HAFIX: Hardware-assisted flow integrity extension, In Proceedings of the 52nd Annual Design Automation Conference, ser. DAC ’15, 2015, pp. 74:1–74:6.
[25] L. V. Davi, A. Dmitrienko, S. Nürnberger, and A.-R. Sadeghi, Gadge Me If You Can, In Symp. on Information, Computer and Communications Security. ACM Press, 2013, p. 299.
[26] L. Duflot, Y.-A. Perez, G. Valadon, and O. Levillain, Can you still trust your network card, CanSecWest, pp. 24–26, 2010.
[27] K. Eldefrawy, G. Tsudik, A. Francillon, and D. Perito, SMART: Secure and minimal architecture for (establishing dynamic) root of trust. In Network and Distributed System Security Symp., vol. 12, 2012, pp. 1–15.
[28] I. Evans, F. Long, U. Otgonbaatar, H. Shrobe, M. Rinard, H. Okhravi, and S. Sidiroglou-Douskos, Control jujutsu: On the weaknesses of fine-grained control flow integrity, In CCS’15: Conference on Computer and Communications Security, 2015.
[29] D. Evtyushkin, D. Ponomarev, and N. Abu-Ghazaleh, Jump over ASLR: Attacking branch predictors to bypass ASLR, In IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016.
[30] A. Francillon, Q. Nguyen, K. B. Rasmussen, and G. Tsudik, A minimalist approach to remote attestation, In Euro. Design, Automation, and Test. EDAA, 2014, p. 244.
[31] A. Francillon, D. Perito, and C. Castelluccia, Defending embedded systems against control flow attacks, In ACM Conf. on Computer and Communication Security, 2009, pp. 19–26.
[32] C. Giuffrida, A. Kuijsten, and A. Tanenbaum, Enhanced operating system security through efficient and fine-grained address space randomization. USENIX Security Symp., 2012.
[33] J. Gustafsson, A. Betts, A. Ermedahl, and B. Lisper, The Mälardalen WCET benchmarks: Past, present and future, In Open Access Series in Informatics, vol. 15. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2010.
[34] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown, MiBench: A free, commercially representative embedded benchmark suite, In Intl. Work. on Workload Characterization. IEEE, 2001, pp. 3–14.
[35] A. Homescu, S. Neisius, P. Larsen, S. Brunthaler, and M. Franz, Profile-guided automated software diversity, In Intl Symp. on Code Generation and Optimization. IEEE, 2013, pp. 1–11.
[36] A. Homescu, S. Brunthaler, P. Larsen, and M. Franz, Librando: Transparent code randomization for just-in-time compilers, In ACM Conf. on Computer and Communication Security, 2013.
[37] H. Hu, S. Shinde, S. Adrian, Z. L. Chua, P. Saxena, and Z. Liang, Data-oriented programming: On the expressiveness of non-control data attacks, In IEEE Symp. on Security and Privacy. IEEE, 2016, pp. 969–986.
[38] T. Jackson, B. Salamat, A. Homescu, K. Manivannan, G. Wagner, A. Gal, S. Brunthaler, C. Wimmer, and M. Franz, Compiler-generated software diversity, In Moving Target Defense. Springer, 2011, pp. 77–98.
[39] B. Krebs, DDoS on Dyn Impacts Twitter, Spotify, Reddit. https://krebsonsecurity.com/2016/10/ddos-on-dyn-impacts-twitter-spotify-reddit/
[40] V. Kuznetsov, L. Szekeres, M. Payer, G. Candea, R. Sekar, and D. Song, Code Pointer Integrity, USENIX Symp. on Operating Systems Design and Implementation, 2014.
[41] P. Larsen, A. Homescu, S. Brunthaler, and M. Franz, SoK: Automated Software Diversity, IEEE Symp. on Security and Privacy, pp. 276–291, 2014.
[42] C. Lattner and V. Adve, LLVM: A compilation framework for lifelong program analysis and transformation, In Intl. Symp. Code Generation and Optimization. IEEE, 2004, pp. 75–86.
[43] J. Li, Z. Wang, T. Bletsch, D. Srinivasan, M. Grace, and X. Jiang, Comprehensive and efficient protection of kernel control data, IEEE Trans. on Information Forensics and Security, vol. 6, no. 4, pp. 1404–1417, 2011.
[44] Y. Li, J. M. McCune, and A. Perrig, VIPER: Verifying the integrity of peripherals’ firmware, In ACM Conf. on Computer and Communications Security, 2011, pp. 3–16.
[45] S. E. McLaughlin, D. Podkuiko, A. Delozier, S. Miadzvezhanka, and P. McDaniel, Embedded firmware diversity for smart electric meters. In USENIX Work. on Hot Topics in Security, 2010.
[46] B. Niu and G. Tan, Modular control-flow integrity, ACM SIGPLAN Notices, vol. 49, no. 6, pp. 577–587, 2014.
[47] J. Pallister, S. J. Hollis, and J. Bennett, BEEBS: open benchmarks for energy measurements on embedded platforms, CoRR, vol. abs/1308.5174, 2013.
[48] V. Pappas, M. Polychronakis, and A. D. Keromytis, Smashing the gadgets: Hindering return-oriented programming using in-place code randomization, IEEE Symp. on Security and Privacy, pp. 601–615, 2012.
[49] PaX Team, PaX address space layout randomization (ASLR), 2003. http://pax.grsecurity.net/docs/aslr.txt
[50] G. Ramalingam, The undecidability of aliasing, ACM Trans. Program. Lang. Syst., vol. 16, no. 5, Sep. 1994.
[51] A.-R. Sadeghi, C. Wachsmann, and M. Waidner, Security and privacy challenges in industrial internet of things, In Design Automation Conf. ACM/IEEE, 2015, p. 54.
[52] J. Salwan, ROPgadget - Gadgets Finder and Auto-Roper, 2011. http://shell-storm.org/project/ROPgadget/
[53] L. Szekeres, M. Payer, and D. Song, SoK: Eternal War in Memory, In IEEE Symp. on Security and Privacy. IEEE, May 2013, pp. 48–62.
[54] C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, Ú. Erlingsson, L. Lozano, and G. Pike, Enforcing forward-edge control-flow integrity in GCC & LLVM, In USENIX Security Symp., 2014.
[55] A. van de Ven and I. Molnar, Exec Shield, 2004. https://www.redhat.com/f/pdf/rhel/WHP0006US_Execshield.pdf
[56] R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham, Efficient software-based fault isolation, In SOSP’93: Symposium on Operating Systems Principles, 1993.
[57] J. Zaddach, A. Kurmus, D. Balzarotti, E.-O. Blass, A. Francillon, T. Goodspeed, M. Gupta, and I. Koltsidas, Implementation and implications of a stealth hard-drive backdoor, In Annual Computer Security Applications Conf., 2013, pp. 279–288.
[58] C. Zhang, T. Wei, Z. Chen, L. Duan, L. Szekeres, S. McCamant, D. Song, and W. Zou, Practical control flow integrity and randomization for binary executables, In IEEE Symp. on Security and Privacy. IEEE, 2013, pp. 559–573.
[59] M. Zhang and R. Sekar, Control flow integrity for COTS binaries, In USENIX Security Symp., 2013, pp. 337–352.
[60] V. Zivojnovic, J. M. Velarde, C. Schlager, and H. Meyr, DSPstone: A DSP-oriented benchmarking methodology, In Intl. Conf. on Signal Processing Applications and Technology, 1994, pp. 715–720.