Building Hardware Components for Memory Protection of Applications on a Tiny Processor

Total Page:16

File Type:pdf, Size:1020Kb

Building Hardware Components for Memory Protection of Applications on a Tiny Processor Building Hardware Components for Memory Protection of Applications on a Tiny Processor Hyunyoung Oh, Yongje Lee, Junmo Park, Myonghoon Yang and Yunheung Paek Seoul National University {hyoh,yjlee,jmpark,mhyang}@sor.snu.ac.kr,[email protected] ABSTRACT the OS can exercise access control over every page in its memory As we move towards the IoT era, it is clear that many tiny proces- system by specifying security properties for memory protection sors will surround us, constantly communicating and transferring (i.e., memory access permissions and memory region attributes) valuable information with others. Although it is obvious that such in memory-mapped tables known as translation tables [2]. From processors will become appealing targets to attackers, they often the perspective of security, a key benefit of virtualizing address lack sophisticated hardware memory protection mechanism, such space is that it empowers the OS to protect sensitive data from as virtual memory. To enhance memory protection of tiny proces- unauthorized code by selectively binding the data with the code in sors, several studies have tried to design new hardware protection the virtual address space. That is, in the virtual memory system, if mechanisms at low cost. Those mechanisms, however, demand in- any application wants to access data at runtime, its code and the vasive modifications to the CPU internal architecture, and these data must be mapped (or bound) to the same virtual address space. modifications require significant cost and time for CPU redesign Therefore, in order to authorize only a trusted application to access and verification. In this paper, we present the design of hardware sensitive data, the OS simply configures translation tables in a way components for realizing an isolation of critical information on a to map the physical address of the data onto the virtual address tiny processor. Unlike other previous work, in our approach, several space of the authorized application code. hardware components are integrated together to build a system Despite its powerful security capability through code-data bind- with enhanced memory protection. To check the feasibility of our ing, virtual memory is rarely adopted by tiny embedded processors idea, we implement an early prototype where a RISC-V processor today in the market [3], but instead, deployed mostly in more capa- is connected with the proposed hardware components. Empirical ble processors targeting desktop PCs or other computing devices results show that ours achieves the goal with virtually no perfor- with ample resources. One major reason is that maintaining virtual mance and low area overhead. memory inherently requires a considerable amount of computation which is more than what a resource-stringent tiny processor could 1 INTRODUCTION afford. To satisfy the resource constraints, many vendors ofsuch low-end processors have implemented in hardware memory protec- As we move to the Internet of Things (IoT) era, more and more ex- tion mechanisms without virtualization. For this, these processors tremely small embedded devices like implanted medical devices are usually have a hardware memory management module, called the expected to use tiny embedded processors. Moreover, these devices memory protection unit (MPU), that assists the OS to enforce basic mostly would likely be connected to the IoT network and carry access rules for memory protection on the device without virtual sensitive user information like user profiles or credentials, and ac- memory support. MPU assumes that the system has a single physi- cordingly they are becoming attractive targets for cybercriminals. cal memory space regardless of the number of processes running on Therefore, for the proliferation of small devices in the IoT era, it the device. The memory space is divided into a number of regions, seems clear that we must properly address the security challenge and each process is assigned a set of regions associated with specific that is to protect security-critical information on these devices from access permissions given by the OS. At runtime, MPU enforces the various attacks. To enforce the protection of the critical informa- access rules by constantly checking if every process makes data tion, a conventional method is to implement a memory protection accesses complying with their permissions. mechanism that can isolate trusted applications processing critical Being free from the virtualization overhead, MPU can control information from other untrusted ones. data accesses of a process on memory regions with more efficiency. To efficiently support memory protection, the hardware typi- To maximize the efficiency, the access control mechanism ofMPU cally provides a special module for memory management, called the in existing tiny processors has been simplified in a way that each memory management unit (MMU). A main task of MMU is perform- region in the main memory is given the identical access permission ing virtual-to-physical mappings to realize virtual memory so that for every process trying to access it. However, from the perspective Permission to make digital or hard copies of all or part of this work for personal or of security, such efficiency exposes the system to a potential security classroom use is granted without fee provided that copies are not made or distributed breach that even when sensitive data must be processed by a trusted for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM application, the region containing the data can be accessed not only must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, by the trusted code but also by others (possibly from adversaries). to post on servers or to redistribute to lists, requires prior specific permission and/or a For this reason, when users want to set different access permissions fee. Request permissions from [email protected]. CARRV’17, Oct 2017, Boston, MA USA individually for each process, the OS should set different MPU © 2017 Association for Computing Machinery. configurations whenever the context switch occurs. However, this ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00 OS-dependent approach is reportedly no longer secure because https://doi.org/10.1145/nnnnnnn.nnnnnnn many recent cases have reported that even an OS kernel can be patterns of the host. To check the feasibility of our idea, we make a compromised by the high-level attack such as rootkits. In those prototype where the slightly modified RISC-V Rocket is integrated cases, the memory isolation based on MPU is no longer guaranteed. with other hardware components. To overcome the limitation of the OS-dependent memory protec- The rest of the paper is organized as follows. Section 2 describes tion, recent studies [4, 7] endeavor to extend the protection features our assumption and the threat model. After Section 3 shows the by providing a hardware-enforced isolation of software modules overall system architecture for the memory protection, Section 4 targeting tiny embedded processors. In SMART [4], they presented explains the detailed implementation of our hardware modules. a new processor architecture which includes an attestation Read- Then, Section 5 discusses the experimental setup and results. In Only Memory (ROM) and a special storage in which the sensitive Section 6, we cover the previous studies related to ours. data are stored. Also, its MCU access control logic is changed to en- sure that only the specified code in the attestation ROM can access 2 ASSUMPTION AND THREAT MODEL the sensitive data in the special storage. In [7], they tried to en- In this work, we target embedded systems employing tiny pro- hance the protection mechanism of the original MPU architecture cessors available in today's market, which cannot afford MMUs by augmenting hardware logic to link code regions to data regions, for sophisticated virtual memory managements. We assume that thus making the permission check execution-aware. Although they the target system comes in the form of an SoC and therefore any have successfully enhanced the security of tiny processors, all these external hardware can be attached during implementation. previous solutions cannot be applied directly to the devices with As for our adversaries, we assume that untrusted software mod- existing processors because they require intrusive modifications ules can be installed in the target devices. These untrusted modules to the internal architecture of an existing processor including the can lead to compromising an entire program when (1) the modules pipeline registers and datapaths. have vulnerabilities which can be exploited by an attacker or (2) the A line of research [6, 8] has focused on implemental issues such modules themselves are malicious. This is due to the fact that, with- as how security hardware logics can be implemented with keep- out address protection between modules, a software module would ing the underlying CPU intact. Both studies commonly have used be able to corrupt the data used by another module. Additionally, we the hardware debugging interface of the processor to extract the assume that the adversary has full control over the communication branch target addresses for the control flow integrity enforcement. interface with the target system and can deliberately manipulate or By verifying branch target addresses at runtime, they have success- eavesdrop on the messages sent over the interface. We also assume fully detected the illegal control flow with incurring virtually no that the target system is exposed to physical attacks such as on-chip overhead on the host CPU. Motivated by their work, in this paper, bus probing or reverse engineering but these are out of the scope we present the design of hardware components which enable the of this paper. enhanced memory protection features for a tiny embedded proces- sor. Our solution is basically exerting the strategy similar to that of 3 OVERALL SYSTEM ARCHITECTURE earlier work [7].
Recommended publications
  • Virtual Memory
    Chapter 4 Virtual Memory Linux processes execute in a virtual environment that makes it appear as if each process had the entire address space of the CPU available to itself. This virtual address space extends from address 0 all the way to the maximum address. On a 32-bit platform, such as IA-32, the maximum address is 232 − 1or0xffffffff. On a 64-bit platform, such as IA-64, this is 264 − 1or0xffffffffffffffff. While it is obviously convenient for a process to be able to access such a huge ad- dress space, there are really three distinct, but equally important, reasons for using virtual memory. 1. Resource virtualization. On a system with virtual memory, a process does not have to concern itself with the details of how much physical memory is available or which physical memory locations are already in use by some other process. In other words, virtual memory takes a limited physical resource (physical memory) and turns it into an infinite, or at least an abundant, resource (virtual memory). 2. Information isolation. Because each process runs in its own address space, it is not possible for one process to read data that belongs to another process. This improves security because it reduces the risk of one process being able to spy on another pro- cess and, e.g., steal a password. 3. Fault isolation. Processes with their own virtual address spaces cannot overwrite each other’s memory. This greatly reduces the risk of a failure in one process trig- gering a failure in another process. That is, when a process crashes, the problem is generally limited to that process alone and does not cause the entire machine to go down.
    [Show full text]
  • The Case for Compressed Caching in Virtual Memory Systems
    THE ADVANCED COMPUTING SYSTEMS ASSOCIATION The following paper was originally published in the Proceedings of the USENIX Annual Technical Conference Monterey, California, USA, June 6-11, 1999 The Case for Compressed Caching in Virtual Memory Systems _ Paul R. Wilson, Scott F. Kaplan, and Yannis Smaragdakis aUniversity of Texas at Austin © 1999 by The USENIX Association All Rights Reserved Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org The Case for Compressed Caching in Virtual Memory Systems Paul R. Wilson, Scott F. Kaplan, and Yannis Smaragdakis Dept. of Computer Sciences University of Texas at Austin Austin, Texas 78751-1182 g fwilson|sfkaplan|smaragd @cs.utexas.edu http://www.cs.utexas.edu/users/oops/ Abstract In [Wil90, Wil91b] we proposed compressed caching for virtual memory—storing pages in compressed form Compressed caching uses part of the available RAM to in a main memory compression cache to reduce disk pag- hold pages in compressed form, effectively adding a new ing. Appel also promoted this idea [AL91], and it was level to the virtual memory hierarchy. This level attempts evaluated empirically by Douglis [Dou93] and by Russi- to bridge the huge performance gap between normal (un- novich and Cogswell [RC96]. Unfortunately Douglis’s compressed) RAM and disk.
    [Show full text]
  • Memory Protection at Option
    Memory Protection at Option Application-Tailored Memory Safety in Safety-Critical Embedded Systems – Speicherschutz nach Wahl Auf die Anwendung zugeschnittene Speichersicherheit in sicherheitskritischen eingebetteten Systemen Der Technischen Fakultät der Universität Erlangen-Nürnberg zur Erlangung des Grades Doktor-Ingenieur vorgelegt von Michael Stilkerich Erlangen — 2012 Als Dissertation genehmigt von der Technischen Fakultät Universität Erlangen-Nürnberg Tag der Einreichung: 09.07.2012 Tag der Promotion: 30.11.2012 Dekan: Prof. Dr.-Ing. Marion Merklein Berichterstatter: Prof. Dr.-Ing. Wolfgang Schröder-Preikschat Prof. Dr. Michael Philippsen Abstract With the increasing capabilities and resources available on microcontrollers, there is a trend in the embedded industry to integrate multiple software functions on a single system to save cost, size, weight, and power. The integration raises new requirements, thereunder the need for spatial isolation, which is commonly established by using a memory protection unit (MPU) that can constrain access to the physical address space to a fixed set of address regions. MPU-based protection is limited in terms of available hardware, flexibility, granularity and ease of use. Software-based memory protection can provide an alternative or complement MPU-based protection, but has found little attention in the embedded domain. In this thesis, I evaluate qualitative and quantitative advantages and limitations of MPU-based memory protection and software-based protection based on a multi-JVM. I developed a framework composed of the AUTOSAR OS-like operating system CiAO and KESO, a Java implementation for deeply embedded systems. The framework allows choosing from no memory protection, MPU-based protection, software-based protection, and a combination of the two.
    [Show full text]
  • Virtual Memory
    Virtual Memory Copyright ©: University of Illinois CS 241 Staff 1 Address Space Abstraction n Program address space ¡ All memory data ¡ i.e., program code, stack, data segment n Hardware interface (physical reality) ¡ Computer has one small, shared memory How can n Application interface (illusion) we close this gap? ¡ Each process wants private, large memory Copyright ©: University of Illinois CS 241 Staff 2 Address Space Illusions n Address independence ¡ Same address can be used in different address spaces yet remain logically distinct n Protection ¡ One address space cannot access data in another address space n Virtual memory ¡ Address space can be larger than the amount of physical memory on the machine Copyright ©: University of Illinois CS 241 Staff 3 Address Space Illusions: Virtual Memory Illusion Reality Giant (virtual) address space Many processes sharing Protected from others one (physical) address space (Unless you want to share it) Limited memory More whenever you want it Today: An introduction to Virtual Memory: The memory space seen by your programs! Copyright ©: University of Illinois CS 241 Staff 4 Multi-Programming n Multiple processes in memory at the same time n Goals 1. Layout processes in memory as needed 2. Protect each process’s memory from accesses by other processes 3. Minimize performance overheads 4. Maximize memory utilization Copyright ©: University of Illinois CS 241 Staff 5 OS & memory management n Main memory is normally divided into two parts: kernel memory and user memory n In a multiprogramming system, the “user” part of main memory needs to be divided to accommodate multiple processes. n The user memory is dynamically managed by the OS (known as memory management) to satisfy following requirements: ¡ Protection ¡ Relocation ¡ Sharing ¡ Memory organization (main mem.
    [Show full text]
  • Extracting Compressed Pages from the Windows 10 Virtual Store WHITE PAPER | EXTRACTING COMPRESSED PAGES from the WINDOWS 10 VIRTUAL STORE 2
    white paper Extracting Compressed Pages from the Windows 10 Virtual Store WHITE PAPER | EXTRACTING COMPRESSED PAGES FROM THE WINDOWS 10 VIRTUAL STORE 2 Abstract Windows 8.1 introduced memory compression in August 2013. By the end of 2013 Linux 3.11 and OS X Mavericks leveraged compressed memory as well. Disk I/O continues to be orders of magnitude slower than RAM, whereas reading and decompressing data in RAM is fast and highly parallelizable across the system’s CPU cores, yielding a significant performance increase. However, this came at the cost of increased complexity of process memory reconstruction and thus reduced the power of popular tools such as Volatility, Rekall, and Redline. In this document we introduce a method to retrieve compressed pages from the Windows 10 Memory Manager Virtual Store, thus providing forensics and auditing tools with a way to retrieve, examine, and reconstruct memory artifacts regardless of their storage location. Introduction Windows 10 moves pages between physical memory and the hard disk or the Store Manager’s virtual store when memory is constrained. Universal Windows Platform (UWP) applications leverage the Virtual Store any time they are suspended (as is the case when minimized). When a given page is no longer in the process’s working set, the corresponding Page Table Entry (PTE) is used by the OS to specify the storage location as well as additional data that allows it to start the retrieval process. In the case of a page file, the retrieval is straightforward because both the page file index and the location of the page within the page file can be directly retrieved.
    [Show full text]
  • Using an MPU to Enforce Spatial Separation
    HighIntegritySystems Using an MPU to Enforce Spatial Separation Issue 1.3 - November 26, 2020 Copyright WITTENSTEIN aerospace & simulation ltd date as document, all rights reserved. WITTENSTEIN high integrity systems v Americas: +1 408 625 4712 ROTW: +44 1275 395 600 Email: [email protected] Web: www.highintegritysystems.com HighIntegritySystems Contents Contents..........................................................................................................................................2 List Of Figures.................................................................................................................................3 List Of Notation...............................................................................................................................3 CHAPTER 1 Introduction.....................................................................................................4 1.1 Introduction..............................................................................................................................4 1.2 Use Case - An Embedded System...........................................................................................5 CHAPTER 2 System Architecture and its Effect on Spatial Separation......................6 2.1 Spatial Separation with a Multi-Processor System....................................................................6 2.2 Spatial Separation with a Multi-Core System............................................................................6 2.3 Spatial Separation
    [Show full text]
  • Strict Memory Protection for Microcontrollers
    Master Thesis Spring 2019 Strict Memory Protection for Microcontrollers Erlend Sveen Supervisor: Jingyue Li Co-supervisor: Magnus Själander Sunday 17th February, 2019 Abstract Modern desktop systems protect processes from each other with memory protection. Microcontroller systems, which lack support for memory virtualization, typically only uses memory protection for areas that the programmer deems necessary and does not separate processes completely. As a result the application still appears monolithic and only a handful of errors may be detected. This thesis presents a set of solutions for complete separation of processes, unleash- ing the full potential of the memory protection unit embedded in modern ARM-based microcontrollers. The operating system loads multiple programs from disk into indepen- dently protected portions of memory. These programs may then be started, stopped, modified, crashed etc. without affecting other running programs. When loading pro- grams, a new allocation algorithm is used that automatically aligns the memories for use with the protection hardware. A pager is written to satisfy additional run-time demands of applications, and communication primitives for inter-process communication is made available. Since every running process is unable to get access to other running processes, not only reliability but also security is improved. An application may be split so that unsafe or error-prone code is separated from mission-critical code, allowing it to be independently restarted when an error occurs. With executable and writeable memory access rights being mutually exclusive, code injection is made harder to perform. The solution is all transparent to the programmer. All that is required is to split an application into sub-programs that operates largely independently.
    [Show full text]
  • HALO: Post-Link Heap-Layout Optimisation
    HALO: Post-Link Heap-Layout Optimisation Joe Savage Timothy M. Jones University of Cambridge, UK University of Cambridge, UK [email protected] [email protected] Abstract 1 Introduction Today, general-purpose memory allocators dominate the As the gap between memory and processor speeds continues landscape of dynamic memory management. While these so- to widen, efficient cache utilisation is more important than lutions can provide reasonably good behaviour across a wide ever. While compilers have long employed techniques like range of workloads, it is an unfortunate reality that their basic-block reordering, loop fission and tiling, and intelligent behaviour for any particular workload can be highly subop- register allocation to improve the cache behaviour of pro- timal. By catering primarily to average and worst-case usage grams, the layout of dynamically allocated memory remains patterns, these allocators deny programs the advantages of largely beyond the reach of static tools. domain-specific optimisations, and thus may inadvertently Today, when a C++ program calls new, or a C program place data in a manner that hinders performance, generating malloc, its request is satisfied by a general-purpose allocator unnecessary cache misses and load stalls. with no intimate knowledge of what the program does or To help alleviate these issues, we propose HALO: a post- how its data objects are used. Allocations are made through link profile-guided optimisation tool that can improve the fixed, lifeless interfaces, and fulfilled by
    [Show full text]
  • Memory Protection in Embedded Systems Lanfranco Lopriore Dipartimento Di Ingegneria Dell’Informazione, Università Di Pisa, Via G
    CORE Metadata, citation and similar papers at core.ac.uk Provided by Archivio della Ricerca - Università di Pisa Memory protection in embedded systems Lanfranco Lopriore Dipartimento di Ingegneria dell’Informazione, Università di Pisa, via G. Caruso 16, 56126 Pisa, Italy E-mail: [email protected] Abstract — With reference to an embedded system featuring no support for memory manage- ment, we present a model of a protection system based on passwords and keys. At the hardware level, our model takes advantage of a memory protection unit (MPU) interposed between the processor and the complex of the main memory and the input-output devices. The MPU sup- ports both concepts of a protection context and a protection domain. A protection context is a set of access rights for the memory pages; a protection domain is a set of one or more protection contexts. Passwords are associated with protection domains. A process that holds a key match- ing a given password can take advantage of this key to activate the corresponding domain. A small set of protection primitives makes it possible to modify the composition of the domains in a strictly controlled fashion. The proposed protection model is evaluated from a number of salient viewpoints, which include key distribution, review and revocation, the memory requirements for storage of the information concerning protection, and the time necessary for key validation. Keywords: access right; embedded system; protection; revocation. 1. INTRODUCTION We shall refer to a typical embedded system architecture featuring a microprocessor inter- facing both volatile and non-volatile primary memory devices, as well as a variety of input/out- put devices including sensors and actuators.
    [Show full text]
  • Virtual Memory - Paging
    Virtual memory - Paging Johan Montelius KTH 2020 1 / 32 The process code heap (.text) data stack kernel 0x00000000 0xC0000000 0xffffffff Memory layout for a 32-bit Linux process 2 / 32 Segments - a could be solution Processes in virtual space Address translation by MMU (base and bounds) Physical memory 3 / 32 one problem Physical memory External fragmentation: free areas of free space that is hard to utilize. Solution: allocate larger segments ... internal fragmentation. 4 / 32 another problem virtual space used code We’re reserving physical memory that is not used. physical memory not used? 5 / 32 Let’s try again It’s easier to handle fixed size memory blocks. Can we map a process virtual space to a set of equal size blocks? An address is interpreted as a virtual page number (VPN) and an offset. 6 / 32 Remember the segmented MMU MMU exception no virtual addr. offset yes // < within bounds index + physical address segment table 7 / 32 The paging MMU MMU exception virtual addr. offset // VPN available ? + physical address page table 8 / 32 the MMU exception exception virtual address within bounds page available Segmentation Paging linear address physical address 9 / 32 a note on the x86 architecture The x86-32 architecture supports both segmentation and paging. A virtual address is translated to a linear address using a segmentation table. The linear address is then translated to a physical address by paging. Linux and Windows do not use use segmentation to separate code, data nor stack. The x86-64 (the 64-bit version of the x86 architecture) has dropped many features for segmentation.
    [Show full text]
  • Chapter 4 Memory Management and Virtual Memory Asst.Prof.Dr
    Chapter 4 Memory management and Virtual Memory Asst.Prof.Dr. Supakit Nootyaskool IT-KMITL Object • To discuss the principle of memory management. • To understand the reason that memory partitions are importance for system organization. • To describe the concept of Paging and Segmentation. 4.1 Difference in main memory in the system • The uniprogramming system (example in DOS) allows only a program to be present in memory at a time. All resources are provided to a program at a time. Example in a memory has a program and an OS running 1) Operating system (on Kernel space) 2) A running program (on User space) • The multiprogramming system is difference from the mentioned above by allowing programs to be present in memory at a time. All resource must be organize and sharing to programs. Example by two programs are in the memory . 1) Operating system (on Kernel space) 2) Running programs (on User space + running) 3) Programs are waiting signals to execute in CPU (on User space). The multiprogramming system uses memory management to organize location to the programs 4.2 Memory management terms Frame Page Segment 4.2 Memory management terms Frame Page A fixed-lengthSegment block of main memory. 4.2 Memory management terms Frame Page A fixed-length block of data that resides in secondary memory. A page of data may temporarily beSegment copied into a frame of main memory. A variable-lengthMemory management block of data that residesterms in secondary memory. A segment may temporarily be copied into an available region of main memory or the segment may be divided into pages which can be individuallyFrame copied into mainPage memory.
    [Show full text]
  • X86 Memory Protection and Translation
    2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Logical Diagram Binary Memory x86 Memory Protection and Threads Formats Allocators Translation User System Calls Kernel Don Porter RCU File System Networking Sync Memory Device CPU Today’s Management Drivers Scheduler Lecture Hardware Interrupts Disk Net Consistency 1 Today’s Lecture: Focus on Hardware ABI 2 1 2 COMP 790: OS Implementation COMP 790: OS Implementation Lecture Goal Undergrad Review • Understand the hardware tools available on a • What is: modern x86 processor for manipulating and – Virtual memory? protecting memory – Segmentation? • Lab 2: You will program this hardware – Paging? • Apologies: Material can be a bit dry, but important – Plus, slides will be good reference • But, cool tech tricks: – How does thread-local storage (TLS) work? – An actual (and tough) Microsoft interview question 3 4 3 4 COMP 790: OS Implementation COMP 790: OS Implementation Memory Mapping Two System Goals 1) Provide an abstraction of contiguous, isolated virtual Process 1 Process 2 memory to a program Virtual Memory Virtual Memory 2) Prevent illegal operations // Program expects (*x) – Prevent access to other application or OS memory 0x1000 Only one physical 0x1000 address 0x1000!! // to always be at – Detect failures early (e.g., segfault on address 0) // address 0x1000 – More recently, prevent exploits that try to execute int *x = 0x1000; program data 0x1000 Physical Memory 5 6 5 6 1 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Outline x86 Processor Modes • x86
    [Show full text]