<<

Building Hardware Components for of Applications on a Tiny

Hyunyoung Oh, Yongje Lee, Junmo Park, Myonghoon Yang and Yunheung Paek Seoul National University {hyoh,yjlee,jmpark,mhyang}@sor.snu.ac.kr,[email protected]

ABSTRACT the OS can exercise access control over every in its memory As we move towards the IoT era, it is clear that many tiny proces- system by specifying security properties for memory protection sors will surround us, constantly communicating and transferring (i.e., memory access permissions and memory region attributes) valuable information with others. Although it is obvious that such in memory-mapped tables known as translation tables [2]. From processors will become appealing targets to attackers, they often the perspective of security, a key benefit of virtualizing address lack sophisticated hardware memory protection mechanism, such space is that it empowers the OS to protect sensitive data from as . To enhance memory protection of tiny proces- unauthorized code by selectively binding the data with the code in sors, several studies have tried to design new hardware protection the virtual . That is, in the virtual memory system, if mechanisms at low cost. Those mechanisms, however, demand in- any application wants to access data at runtime, its code and the vasive modifications to the CPU internal architecture, and these data must be mapped (or bound) to the same . modifications require significant cost and time for CPU redesign Therefore, in order to authorize only a trusted application to access and verification. In this paper, we present the design of hardware sensitive data, the OS simply configures translation tables in a way components for realizing an isolation of critical information on a to map the of the data onto the virtual address tiny processor. Unlike other previous work, in our approach, several space of the authorized application code. hardware components are integrated together to build a system Despite its powerful security capability through code-data bind- with enhanced memory protection. To check the feasibility of our ing, virtual memory is rarely adopted by tiny embedded processors idea, we implement an early prototype where a RISC-V processor today in the market [3], but instead, deployed mostly in more capa- is connected with the proposed hardware components. Empirical ble processors targeting desktop PCs or other devices results show that ours achieves the goal with virtually no perfor- with ample resources. One major reason is that maintaining virtual mance and low area overhead. memory inherently requires a considerable amount of computation which is more than what a resource-stringent tiny processor could 1 INTRODUCTION afford. To satisfy the resource constraints, many vendors ofsuch low-end processors have implemented in hardware memory protec- As we move to the Internet of Things (IoT) era, more and more ex- tion mechanisms without . For this, these processors tremely small embedded devices like implanted medical devices are usually have a hardware module, called the expected to use tiny embedded processors. Moreover, these devices (MPU), that assists the OS to enforce basic mostly would likely be connected to the IoT network and carry access rules for memory protection on the device without virtual sensitive user information like user profiles or credentials, and ac- memory support. MPU assumes that the system has a single physi- cordingly they are becoming attractive targets for cybercriminals. cal memory space regardless of the number of processes running on Therefore, for the proliferation of small devices in the IoT era, it the device. The memory space is divided into a number of regions, seems clear that we must properly address the security challenge and each is assigned a set of regions associated with specific that is to protect security-critical information on these devices from access permissions given by the OS. At runtime, MPU enforces the various attacks. To enforce the protection of the critical informa- access rules by constantly checking if every process makes data tion, a conventional method is to implement a memory protection accesses complying with their permissions. mechanism that can isolate trusted applications processing critical Being free from the virtualization overhead, MPU can control information from other untrusted ones. data accesses of a process on memory regions with more efficiency. To efficiently support memory protection, the hardware typi- To maximize the efficiency, the access control mechanism ofMPU cally provides a special module for memory management, called the in existing tiny processors has been simplified in a way that each (MMU). A main task of MMU is perform- region in the main memory is given the identical access permission ing virtual-to-physical mappings to realize virtual memory so that for every process trying to access it. However, from the perspective Permission to make digital or hard copies of all or part of this work for personal or of security, such efficiency exposes the system to a potential security classroom use is granted without fee provided that copies are not made or distributed breach that even when sensitive data must be processed by a trusted for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM application, the region containing the data can be accessed not only must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, by the trusted code but also by others (possibly from adversaries). to post on servers or to redistribute to lists, requires prior specific permission and/or a For this reason, when users want to set different access permissions fee. Request permissions from [email protected]. CARRV’17, Oct 2017, Boston, MA USA individually for each process, the OS should set different MPU © 2017 Association for Computing Machinery. configurations whenever the context occurs. However, this ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00 OS-dependent approach is reportedly no longer secure because https://doi.org/10.1145/nnnnnnn.nnnnnnn many recent cases have reported that even an OS kernel can be patterns of the host. To check the feasibility of our idea, we make a compromised by the high-level attack such as . In those prototype where the slightly modified RISC-V Rocket is integrated cases, the memory isolation based on MPU is no longer guaranteed. with other hardware components. To overcome the limitation of the OS-dependent memory protec- The rest of the paper is organized as follows. Section 2 describes tion, recent studies [4, 7] endeavor to extend the protection features our assumption and the threat model. After Section 3 shows the by providing a hardware-enforced isolation of software modules overall system architecture for the memory protection, Section 4 targeting tiny embedded processors. In SMART [4], they presented explains the detailed implementation of our hardware modules. a new processor architecture which includes an attestation Read- Then, Section 5 discusses the experimental setup and results. In Only Memory (ROM) and a special storage in which the sensitive Section 6, we cover the previous studies related to ours. data are stored. Also, its MCU access control logic is changed to en- sure that only the specified code in the attestation ROM can access 2 ASSUMPTION AND THREAT MODEL the sensitive data in the special storage. In [7], they tried to en- In this work, we target embedded systems employing tiny pro- hance the protection mechanism of the original MPU architecture cessors available in today's market, which cannot afford MMUs by augmenting hardware logic to link code regions to data regions, for sophisticated virtual memory managements. We assume that thus making the permission check execution-aware. Although they the target system comes in the form of an SoC and therefore any have successfully enhanced the security of tiny processors, all these external hardware can be attached during implementation. previous solutions cannot be applied directly to the devices with As for our adversaries, we assume that untrusted software mod- existing processors because they require intrusive modifications ules can be installed in the target devices. These untrusted modules to the internal architecture of an existing processor including the can lead to compromising an entire program when (1) the modules pipeline registers and . have vulnerabilities which can be exploited by an attacker or (2) the A line of research [6, 8] has focused on implemental issues such modules themselves are malicious. This is due to the fact that, with- as how security hardware logics can be implemented with keep- out address protection between modules, a software module would ing the underlying CPU intact. Both studies commonly have used be able to corrupt the data used by another module. Additionally, we the hardware debugging interface of the processor to extract the assume that the adversary has full control over the communication branch target addresses for the control flow integrity enforcement. interface with the target system and can deliberately manipulate or By verifying branch target addresses at runtime, they have success- eavesdrop on the messages sent over the interface. We also assume fully detected the illegal control flow with incurring virtually no that the target system is exposed to physical attacks such as on-chip overhead on the host CPU. Motivated by their work, in this paper, probing or reverse engineering but these are out of the scope we present the design of hardware components which enable the of this paper. enhanced memory protection features for a tiny embedded proces- sor. Our solution is basically exerting the strategy similar to that of 3 OVERALL SYSTEM ARCHITECTURE earlier work [7]. However, while their approach requires a perma- nent design change of the host processor, our study conforms to the 3.1 SoC Prototype Overview modular design approach, that several hardware components can be As clearly stated in Section 1, the ultimate goal of our research is assembled together to offer the enhanced memory protection. As to develop the hardware components which can be used to build our first hardware component, we design the memory region protec- a SoC with an external memory protection capability. To this end, tor (MRP) whose main role is to enable a tiny processor to have the we first present the SoC prototype including our proposed hard- code-data binding ability that has been provided by MMU with vir- ware components. Figure 1 shows the overall architecture of our tual memory. MRP is able to isolate code regions individually from SoC platform. In the prototype, the host CPU is the RISC-V Rocket each other according to their permissions, and thereby to bind data CPU in which the security interface is installed. The host CPU and regions to a specific code region based on the pre-defined access our MRP module are connected via the system bus, which is the rules between code and data regions. In order to define individual standard AMBA3 AXI interconnect [10]. MRP supports the security access rules between code and data regions, MRP maintains the function that is to establish the software isolation environment by register files for storing the low- and high addresses of codeand observing the host internal information (i.e., currently executing data regions which are configured at the boot time. Also, We build instruction address or memory access patterns of programs run- the configurable access permission matrix which defines the access ning on the host). In order to obtain the information, MRP has an permission of a code region for each data region. With the access additional connection to the security interface embedded in the permission matrix, MRP determines the legitimacy of each data host. Also, MRP has a connection to the access permission matrix transfer by checking the defined access permission between the that contains access permissions of a code region for each data instruction address generated at the CPU's instruction fetch stage region. By referring the access permission matrix, MRP determines and the data transfer address generated at the execution stage. How- if there is an illegal memory access. The main memory is a storage ever, as our MRP is located outside the processor, those necessary which keeps the code and data necessary for the host's operations. addresses issued by the internal pipeline are not directly visible to The host CPU and the main memory are connected through the our MRP. Thus, we must somehow devise a special mechanism to system bus in a way that the main memory can be accessed by the extract the processor internal information outwardly in a timely host. Our MRP hardware is also attached to the system bus so that manner. For this purpose, we also build a generic security interface the host can configure our MRP at will. In Section 4, more details inside the processor to extract the control flow and the data access of the hardware modules will be explained. 2 RISC-V CPU code region which the data transfer instruction belongs to and the Memory Access Security data region which the instruction's data address lands. Another Region Permission Interface Protector Matrix submodule is the decision unit that makes the final decision about the occurrence of illegal memory accesses when a data transfer occurs. To make the decision if there are illegal memory accesses, AMBA Interconnect (Master/Slave) MRP refers to the access permission matrix. And the other is the MRP controller. During the boot-up sequence, through the AHB slave interface in the MRP controller the code stored in PROM configures Memory Main Controller Memory the registers in CRS and DRS to define protected code and data regions, respectively and the registers in the access permission matrix to assign the allowed permissions. To define the address Figure 1: The overall architecture of our prototype range of each code or data region, two registers are provided to set the low and high addresses of the region. MRP needs the program execution information inside the proces- 3.2 Memory Protection Process sor to perform the aforementioned functions. The essential infor- Figure 2 illustrates the exemplary case for our SoC prototype with mation is categorized into three groups: (1) the instruction address the high-level software and hardware architecture. Inside the SoC of currently executing code (PC) and (2) the data address and (3) boundary, the RISC-V CPU is integrated with a programmable read- the data transfer type of each data transfer instruction. MRP takes only memory (PROM). PROM stores the boot code to define as input the information through the security interface. The infor- the code and data region and initialize the access permission ma- mation for groups (1), (2) and (3) is fed respectively by the security trix and for each task during boot. Since the tiny devices that our interface via the following pins: security solution for the memory protection targets include limited tasks, PROM stores the code in advance. Once the code and data re- (1) data_transfer_inst_addr gion are defined and the access permission matrix is initialized, they (2) data_transfer_data_addr are controlled so that they can not be modified. The host CPU is (3) data_transfer_type also connected with the main memory and peripheral IPs such as a In addition, the flag indicating the execution of data transfers is timer or high-speed communication interfaces. Our MRP monitors sent over signal line, data_transfer_en. all memory accesses issued by the CPU based on the information The data_transfer_data_addr and data_transfer_inst_ad fed by the security interface. These memory accesses include gen- dr are the data and instruction addresses of the data transfer instruc- eral memory access as well as access to memory-mapped IO devices tion, respectively. The instruction address of transmitted data means (here, they are peripherals in Figure 2). the address of the currently executing code. The data_transfer_type In the example given in Figure 2, each task has its own address indicates whether the data transfer instruction is stored or loaded. space to access. During the software execution, our MRP checks (i.e., either memory write or read). The data_transfer_en indi- whether or not any task running on the host CPU tries to access cates a signal to enable the preceding signals. When a data transfer the wrong address space. During the boot-up sequence, the host instruction is executed by the processor, the data_transfer_en CPU first access the boot loader code stored in PROM. Then, the signal is set on. At the same cycle, data_transfer_inst_addr, boot loader programs our MRP to define the accessible code and data_transfer_type and data_transfer_data_addr become va- data regions for each task. lid. Then, the instruction and data address are respectively sent to CRS and DRS by the MRP controller. To find out which code region the executed data transfer instruction belongs to, CRS searches Task A Task B Task C …… through its registers (CodeRegion0-7). Then, the code region num-

layer ber is output via code_region_num. Similarly, using the data ad- Software dress from the MRP controller, DRS looks up the matched code region in its registers (DataRegion0-7) and sends the region num- RISC-V Main Memory ber outwards through data_region_num. When the incoming data MRP CPU layer address belongs to one of defined code region, the region number

Peripherals Hardware is conveyed via the code_region_num_t signal. Then, the selected region numbers are sent to the access permission matrix. Finally, the access permission matrix searches the allowed permission cor- Figure 2: Memory protection process using our prototype responding to the region, and sends back the permission to the decision unit to decide whether or not the executed data transfer is 4 IMPLEMENTATION DETAILS legitimate. When the currently executed data transfer instruction 4.1 Memory Region Protector is found to violate the received access rule, the decision unit sets Figure 3 shows the overall architecture of our MRP hardware. The the illegal_access signal on. MRP hardware is divided into four submodule. The code region Up to now, we have explained how a code region for a trusted selector (CRS) and data region selector (DRS) respectively finds the software module can be defined by the set of memory regions and 3 h cespriso arxhsteacs emsinfrcode for permission access the has matrix permission access The oe takonades o xml,under example, For address. known a at voked oteacs emsinmti oudt t registers. its update to matrix information permission decoded access the the sends to and command the decodes re- module On protocol. the bus command, the the with ceiving accordance in the encoded interconnect, is AHB command the Over executed, command. is a PROM launches in processor stored the code the When matrix. permission cess to allocation. bits permission three of by case represented every is the matrix cover permission The access region, the data in and permissions: data code access each of For types Executable. three Writable, are Readable, There regions. and data region code vs. vs. code code permission relationship: access of types The two 4. maintains Figure access matrix The in eight. depicted is is region matrix each of permission number the and regions, data and Matrix Permission Access 4.2 in target found flow(i.e., be control can the instruction) of branch change of after MRP address PC modules, in software value trusted the the if untrusted of checks one an invoke When to modules. tries software that software other MRP by by executed set be regions can code overall the of MRP addresses code, instruction called trusted registers call optional safely an to interface has To an data. with security-critical users arbitrary access the an to provide at software snippets trusted code the the in reuse address to want might in- attackers only the is software trusted the the to that communication enforce safe to a is for software one thing trusted trusted important by is most latter invoked The the not. is or whether module of software regardless trusted software the con- another to when have case we the this, to sider addition In permissions. access associated security interface security set code regionsset code uigtebo-p h oesoe nPO ntaie h ac- the initializes PROM in stored code the boot-up, the During CodeRegion7 (l/h) CodeRegion7 (l/h) CodeRegion6 (l/h) CodeRegion5 (l/h) CodeRegion4 (l/h) CodeRegion3 (l/h) CodeRegion2 (l/h) CodeRegion1 (l/h) CodeRegion0 Code Region Selector Region Code signals fromsignals CodeEntries iue3 eoyRgo rtco architecture Protector Region Memory 3: Figure data_addr inst_addr, code_region_ code_region_ num_t num

Access Permission Matrix Permission Access AHB Slave Slave AHB Interface MRP Controller H lv interface slave AHB AHB InterconnectAHB Decision Unit Decision load or store

load or store CodeEntries illegal access Memory Region Protector

set access permissions data_region_ data_addr num oerueattacks reuse code nthe in

CodeEntries hc oti the contain which Data Region Selector Data Region R controller MRP DataRegion7 DataRegion7 DataRegion6 DataRegion5 DataRegion4 DataRegion3 DataRegion2 DataRegion1 DataRegion0 set data regionsset data . [ 9 ], 4 AP oetatteepee fifrain eaaye h pipeline the analyzed we information, of pieces these extract To so processor RISC-V the with interacting is interface security The einnme n aargo ubr h cespermission access and the permission number, the reads region matrix data and number region region code between rule access the region code The between MRP. let from First, follows. number as region is procedure code/data the receiving the after h orsodn ntuto/aaadestaserdfo the from transferred address instruction/data part obtains corresponding classi- extraction the address the the instruction/data by perceived part, type the fication access in memory place to on taking thesignals based is Lastly, send operation core. can an what that distinguishing after part operation, MRP classification store a or be load the should on there depending table MRP area since each Second, is creates MRP. the interface to to address security transfer instruction/data the export data occurred, to that ready out data the turns inside it array place andtriggers If data MRP. takes to transfer data actually part which the data transfer a the is that generation flag confirms enable an andinstruction/data First, extraction. classification address generation, flag enable parts: signals the of track core. keeping the by by memory generated the to accessed infor- is which mation determine to status. have execution interface the security our regarding Consequently, signals the observe with to registers interface of control the context designed we the executed, recognize execution being to instruction the sense, the to this In due . sequence differ- or instruction executed optimization is input core the the from by be conducted ently often sequence may the there that However, case need. a we signals the stored extracting and memory for calculated and are execute addresses the instruction/data between where main is stages to the attention stages, paying pipeline are we these part Among execute, back. decode, fetch, write pcgen, and of memory composed pipeline six-stage a has as ters such components patterns. internal and access architecture data or address instruction as such processor MRP the to transfer tion can interface the that Interface Security 4.3 unit. decision the to permission h cespriso arxrasadsnstepermission the sends and reads matrix permission access The o hsproe h euiyitraei opsdo three of composed is interface security the purpose, this For ( Ci rthe or j r ai,teacs emsinmti ed h permission the reads matrix permission access the valid, are , Cj ) hnteacs emsinmti ed h allowed the sends matrix permission access the Then . rga program iue4 cesPriso Matrix Permission Access 4: Figure i n aaregion data and tec tg.TeRS- processor RISC-V The stage. each at AP AP ( Ci ( Ci , i Dj j , and Similarly, . Dj ) hntecd region code the When . ) j eteacs permission access the be nrciigtecode the receiving On . ' nenlinforma- internal s tg oto regis- control stage AP ( Ci , Cj ) means i 195 204 FFs 6894 1082 1481 21.48% 80 36 9229 1066 1182 LUTs 12.81% ] is used to interconnect all the modules in 1 Components Rocket Core Security Interface Region PredictorMemory Access Permission Matrix Total over% Baseline System Our System Table 1: Synthesis result of our hardware components Baseline Based on the parameters for the prototype as described above, Category Hardware Components 5.2 Performance Overhead extracts the internal information without changingof the the critical host path CPU. Usingthe the functional interface, execution our of MRP the runscheck host. in of Hence, parallel MRP the with also access does permission system. not To impact initialize the the performance protected of code/datapermissions the in regions target MRP, and the the overhead access of setting registers is additionally quantified the resources necessary fortermshardwareour of components lookup in tables for logic (LUTs)show and that, FFs. compared The to design the statistics incur baseline the Rocket resource core, overhead our of components 12.81%respectively. and We 21.48% also for estimated FFs and thecomponents LUTs, gate using Synopsys count Design of Compiler. our With a hardware commercial is 16,586(1,712 gates for the securityand interface, 2,046 12,828 gates gates for for the MRP access permission matrix). entered. However, we observe someorder cases may which be the differentfrom processing pending on orderthe the executionof the status.interface input instructions In to de- this align case, elements we within allowters a the from data security an set execute by stage. using Incomprehensively control other determines regis- words, the the current security situation interface andcorresponding selects instruction/data the address executing theoperation load through or the store . Finally, the selected address 5 EXPERIMENTAL RESULTS 5.1 Hardware Area Overhead mented the SoC prototype includingdescribed the in hardware Section components 3. as In our prototype, we use theterized Xilinx by Zynq- FPGA configuration DefaultFPGASmallConfig,host processor. In as the our implemented MRP,data the region are number both of configured code and to be eight. The bus compliant with our prototype system. As the OS kernel, zynq 3.15.0 is used. 45nm process , the total gate-count of the proposed modules 7000 board and use a version 1.7 of RISC-V Rocket core parame- The security interface incurs zero performance overhead because it Table 1 provides the design statistics of our hardware prototype. We To evaluate the area overhead of our approach, we have imple- AMBA AHB2 protocol [ we synthesized our overall SoCwith Design a onto Xilinx the XC7Z020CLG484-1 prototyping FPGA board and 64MB external SDRAM. will be passed to the output port of the security interface. 5 D data_en inst_addr data_type data_addr MUX

MUX Address extender Security Interface Security Existing wire Existing Additional wire Additional WB_pc . WB Stage WB Store Store

Address Tagged Address Arb Core MEM_pc Load Data Array Data MEM Stage MEM Data CacheData Load address extender reg EX_pc EX ctrl EX Stage EX Lastly, the most important factor is, when the enable flag oc- A classification part is used to classify whetherthe currently First of all, to recognize if the memory access operation occurs, addresses depending on theeral, input it instruction is sequence. executed in In the gen- core in the same order as the instruction curs, which instruction address shouldgram counter be at fetched which from stage.tion/data the To accomplish address pro- this extraction work, part the are instruc- used to export appropriate ready/valid signal inside the data cachethe and corresponding the data interface address, obtains theenable flag operation high which makes is determined the which memory access type has. interface refers to control registersthe that instruction contain information being about executed andexecute controls stage. the In operation summary of so the far, if the enable flag occurs by the executed instruction is thecreates load different tables or depending storeit on operation. is the Since necessary to orload MRP recognize store operation, memory that access. this To instruction determine has the what memory type access of type, the security after being re-extended to thethrough data the address received from the core both high for theis load also or passed store from operation,address the is the arbiter a accessed short to address address thecore. based data on Therefore, the array. the data However, simplified address this address sent from sends the to the output port each operation are both highthe can be enable used flag, as the ansignals enable security to flag. interface MRP.From Additionally, when gets theready ready/valid to transfer signal become the the security interface checks the ready/valid signalarray between and the the data arbiter that controls the input signals of the data array. operation, the state when the ready/valid signal corresponding to core and sends them to anFigure output 5. port. These parts are shown in Figure 5: Information extraction through Security Interface At this time, since each ready/valid signal exist in the store/load imposed. For each code or data region, three register writes are SMART. Instead of implementing a special storage for the secret required: two for the high- and low-address of the region and one for key and the access control logic, TrustLite realized a programmable the access permission. However, considering that tiny processors access control logic, called the execution-aware memory protection mainly target small devices where most of their applications are unit (EA-MPU), to link code regions to data regions. The EA-MPU already fixed and thus can be statically allocated, this register- hardware has an access rule table which maintains the access per- setting overhead has little impact on the whole system performance. missions from code regions to data regions. To find out which code is currently running on the host, they modified the pipeline logic of 5.3 Security Considerations a processor to extract the instruction pointer (IP) which points to the address of the executed instructions. Also the read/write addresses Firstly, the main goal of our MRP is to establish the isolation envi- of data transfer instructions are also extracted to monitor the data ronment for the critical code and data. For this purpose, our MRP access pattern. The EA-MPU logic checks the IP and data transfer refers to the access control matrix which ensures that a data region addresses against the access rule table to decide the existence of storing security-critical data is only accessed by an authorized ap- invalid memory accesses. These hardware studies for the memory plication. In addition, the MRP prevents untrusted software from protection problem empirically suggest that capitalizing on hard- indiscriminately accessing (or invoking) the tasks in code regions ware techniques should be an excellent way to overcome the lack by restricting its access to predefined code entries in MRP registers. of sophisticated memory protection features of low-end devices (or As another application, our MRP would be helpful in establishing processors). Unlike our approach, however, their techniques have the root of trust for remote attestation. Remote attestation is the made permanent changes to the internal architecture of the existing process of securely verifying the state of a remote hardware plat- processor to improve memory protection. In our work, we present form to ensure that the known trusted software is correctly running a new design approach to building a security system with enhanced on the platform [4]. To ensure this code integrity, the usual strategy memory protection by integrating several hardware components. of attestation is to compute a hash value of the trust code periodi- As a hardware component that plays a role similar to the MRP cally by reading the code itself and comparing the hash value to a designed in this paper, there are MPU [3] of ARM Cortex series golden value which is pre-computed when the trusted software is processors and PMP [5] which can be optionally enabled in RISC- initially installed. To support this scenario, our MRP is also capable V processors recently. For MPU and PMP, the size of a protected of providing a code region with an exclusive access permission to region can be configured as a power-of-two multiple of 4KB. Com- read another code region to be attested and of providing the secret pared to these components, MRP has the advantage that the upper- key necessary for computing the hash value. and lower-bounds of a region can be configured to any address. Consequently, MRP provides more flexibility in setting up the pro- 6 RELATED WORK tected regions. Moreover, it is meaningful that the security interface As memory protection (e.g., ) is considered to be a provides the system developers with the capability of installing fundamental primitive of most security enhanced systems, much re- various security hardware components which operate based on the search effort has been devoted to developing, in hardware, efficient CPU internal information. isolation mechanism available to various platforms. For high-end devices (e.g., desktop PCs or even smartphones), the problems of 7 CONCLUSION dealing with software isolation are relatively well-understood and We present in this paper a hardware-based memory protection solu- a considerable amount of knowledge has been obtained from re- tion to enhance the memory protection features of a tiny embedded searches conducted for the past decades. In fact, most existing processor. To achieve this goal, we propose several hardware com- high-end processors deploy an MMU hardware to protect the sys- ponents. The core component of the our hardware is MRP that tem by preventing one process from interfering with the critical plays the role of deciding the existence of invalid memory accesses code or data of another process. On the other hand, only a few by observing the source and destination addresses of data transfers research studies have been interested in developing enhanced mem- inside the host. To this end, MRP refers to the access permission ory protection features for low-end resource-impoverished devices. matrix that provides the access permission for each code and data In SMART [4], one of the hardware solutions to enhance memory region. Moreover, to establish a communication channel between protection for low-end devices, they proposed a technique to estab- the our hardware components and the host processor, we design the lish a root of trust for performing remote attestation. In their work, generic security interface by modifying the internal architecture of a special storage in the memory space is added to the processor the RISC-V processor. Being connected to the RISC-V CPU via the architecture to store the secret key which is necessary to perform security interface, MRP can receive the CPU internal information hash functions for attestation. They also implemented a custom at runtime. The experimental results showed that our hardware access logic which ensures that only the code residing in a specified based solution successfully isolated code regions individually from region in ROM can access the key in the special storage. This way, each other and provides enhanced memory protection with low the secret key is only accessible by the verifier code in ROM, and area and virtually no performance overhead. users can get the assurance that the hash value is computed by the authorized verifier code using the correct secret key. TrustLite7 [ ] is another noticeable hardware-assisted technique aiming to enhance REFERENCES [1] ARM 1999. AMBA Specification. ARM. the memory protection of tiny processors with MPU. Basically, [2] ARM 2007. ARM Architecture Reference Manual (ARMv7-A and ARMv7-R edition. TrustLite extended the memory access control model proposed by ARM.

6 [3] ARM 2014. ARM Cortex-M7 Processor. ARM. [4] Karim Eldefrawy, Gene Tsudik, Aurélien Francillon, and Daniele Perito. 2012. SMART: Secure and Minimal Architecture for (Establishing Dynamic) Root of Trust.. In NDSS, Vol. 12. 1–15. [5] Electrical Engineering and Computer Sciences University of California 2016. The RISC-V Instruction Set Manual Volume II: Privileged Architecture. Electrical Engineering and Computer Sciences University of California. [6] Zonglin Guo, Ram Bhakta, and Ian G Harris. 2014. Control-flow checking for intrusion detection via a real-time debug interface. In Smart Computing Workshops (SMARTCOMP Workshops), 2014 International Conference on. IEEE, 87–92. [7] Patrick Koeberl, Steffen Schulz, Ahmad-Reza Sadeghi, and Vijay Varadharajan. 2014. TrustLite: a security architecture for tiny embedded devices. In Proceedings of the Ninth European Conference on Computer Systems. ACM, 10. [8] Yongje Lee, Jinyong Lee, Ingoo Heo, Dongil Hwang, and Yunheung Paek. 2017. Using CoreSight PTM to Integrate CRA Monitoring IPs in an ARM-Based SoC. ACM Trans. Des. Autom. Electron. Syst. 22, 3, Article 52 (April 2017), 25 pages. https://doi.org/10.1145/3035965 [9] Hovav Shacham. 2007. The geometry of innocent flesh on the bone: Return-into- libc without function calls (on the ). In Proceedings of the 14th ACM conference on Computer and communications security. ACM, 552–561. [10] Xilinx 2011. AXI Reference Guide. Xilinx.

7