<<

Zhao YY, Chen MY, Liu YH et al. IMPULP: A hardware approach for in- via user-level partitioning. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 35(2): 418–432 Mar. 2020. DOI 10.1007/s11390-020-9703-2

IMPULP: A Hardware Approach for In-Process Memory Protection via User-Level Partitioning

Yang-Yang Zhao1,2, Student, Member, IEEE, Ming-Yu Chen1,2,3,∗, Member, CCF, ACM, IEEE Yu-Hang Liu1,2,3, Member, CCF, ACM, IEEE, Zong-Hao Yang1,2, Xiao-Jing Zhu1, Zong-Hui Hong2, and Yun-Ge Guo2

1State Key Laboratory of , Institute of Computing Technology, Chinese Academy of Sciences Beijing 100190, China 2University of Chinese Academy of Sciences, Beijing 100049, China 3PengCheng Laboratory, Shenzhen 518055, China E-mail: {zhaoyangyang, cmy, liuyuhang, yangzonghao}@ict.ac.cn; [email protected] E-mail: hongzonghui [email protected]; [email protected]

Received May 9, 2019; revised February 4, 2020.

Abstract In recent years many security attacks occur when malicious codes abuse in-process memory resources. Due to the increasing complexity, an application program may call third-party code which cannot be controlled by programmers but may contain security vulnerabilities. As a result, the users have the risk of suffering information leakage and control flow hijacking. However, current solutions like memory protection extensions (MPX) severely degrade performance, while other approaches like Intel memory protection keys (MPK) lack flexibility in dividing security domains. In this paper, we propose IMPULP, an effective and efficient hardware approach for in-process memory protection. The rationale of IMPULP is user-level partitioning that user code segments are divided into different security domains according to their instruction addresses, and accessible memory spaces are specified dynamically for each domain via a set of boundary registers. Each instruction related to memory access will be checked according to its security domain and the corresponding boundaries, and illegal in-process memory access of untrusted code segments will be prevented. IMPULP can be leveraged to prevent a wide range of in-process memory abuse attacks, such as buffer overflows and memory leakages. For verification, an FPGA prototype based on RISC-V instruction set architecture has been developed. We present eight tests to verify the effectiveness of IMPULP, including five memory protection function tests, a test to defense typical buffer overflow, a test to defense famous memory leakage attack named Heartbleed, and a test for security benchmark. We execute the SPEC CPU2006 benchmark programs to evaluate the efficiency of IMPULP. The performance overhead of IMPULP is less than 0.2% runtime on average, which is negligible. Moreover, the resource overhead is less than 5.5% for hardware modification of IMPULP. Keywords in-process isolation, memory protection, out-of-bounds, user-level partitioning

1 Introduction and components in the same address space. When the third-party codes contain security vulnerabilities, the In conventional memory protections, only mem- adversary abuses the memory resources within a pro- ory accesses requiring inter-process communications are checked and prevented if violations are detected [1]. cess to execute codes illegally in the context of the at- With the increasing complexity of applications, a pro- tacked process, leading to and sen- gram may call third-party codes including functions sitive leakage. For example, in the Heartbleed○1

Regular Paper Special Section of ChinaSys 2019 This work was supported by the National Key Research and Development Plan of China under Grant No. 2016YFB1000200, the National Natural Science Foundation of China under Grant No. 61772497, and the State Key Laboratory of Computer Architecture Foundation under Grant Nos. CARCH4405 and CARCH2601. ∗Corresponding Author ○1 https://gizmodo.com/how-heartbleed-works-the-code-behind-the-internets-se-1561341209, Oct. 2019. ©Institute of Computing Technology, Chinese Academy of Sciences 2020 Yang-Yang Zhao et al.: IMPULP: A Hardware Approach for In-Process Memory Protection 419 incident that nearly affected the entire Internet, the IMPULP can effectively prevent buffer overflows and attacker used a vulnerability in the OpenSSL code to memory leakages. We also test typical applications pass an illegal length to the memcpy function to ob- in SPEC CPU2006 benchmark to evaluate the perfor- tain sensitive data, including user account and pass- mance loss of IMPULP. The results show that the run- word information. Moreover, some malicious libraries time performance overhead is less than 0.2% on average. in mobile Apps [2] steal sensitive data using privileged In addition, the resource overhead is less than 5.5% for APIs (application programming interface). In addition, hardware modification of IMPULP. the attacker leverages third-party plugins [3] to bypass Our contributions in this study can be summarized the protection of current program, resulting in memory- as follows. corruption vulnerabilities. 1) We propose IMPULP, a hardware approach for In the past, -based memory protection like in-process memory protection, dividing user-level se- SFI (software fault isolation) [4–6], was neither complete curity domains by the range of corresponding instruc- nor efficient due to instrumenting every memory-access tion addresses. User code with different instruction ad- instruction for run-time checks. Therefore, recent re- dresses can access different ranges of search explores more practical hardware extensions. space. When untrusted user code segments access out- For example, Intel MPX [7] provides new instructions of-bounds memory, the modified CPU pipeline would of instruction set architecture (ISA) to configure report illegal state and generate exception. The pro- the bounds for a pointer to a buffer. However, Intel cess is halted timely. MPX severely degrades the performance of programs 2) We present a pair of IMPULP APIs to dynam- due to the idea of out-of-bound checking by software ically specify the address range of memory protection. instrumentation for each instruction of a process. Fur- Switching between different security domains is per- thermore, Intel has recently announced control-flow en- formed with only a slight modification on original pro- forcement technology (CET)○2 and MPK○3 . However, gram by means of our proposed APIs. This design re- these technologies either provide limited hardware sup- duces runtime overhead in the following two ways. IM- port or cause unnecessary performance overhead. PULP requires no traps into kernel compared with the To take advantage of hardware to improve perfor- conventional domain isolation method. Besides, IM- mance, we propose IMPULP for efficient in-process PULP diminishes the costs of instrumentation for each memory protection. Instead of instrumenting each instruction, since it merely sets registers when a cross- memory access instruction by software like Intel MPX, domain function call occurs. IMPULP only inserts an API function by software 3) We build an FPGA prototype system with an to restrict the accessible address range before each enhanced RISC-V CPU core, a modified Linux kernel, cross-domain function call. Moreover, the out-of-bound and a extension to support IMPULP. We eva- checking is performed by modified CPU pipeline auto- luate the prototype with both functional tests and per- matically when the program is running. Hence, run- formance benchmarks. The results show that IMPULP time overhead can be greatly reduced in IMPULP com- effectively performs in-process memory protection with pared with Intel MPX. Since the partition of security low runtime overhead and hardware cost. domain is achieved in user level, the performance over- The paper is organized as follows. In Section 2, we head of domain switching in IMPULP is smaller than introduce the state-of-the-art in-process memory pro- that of other domain-based methods, such as Intel MPK tection methods. Section 3 provides a brief introduc- and memory domains on ARM32○4 . tion of adversary model. Section 4 details the design of We develop the IMPULP prototype on an FPGA IMPULP. In Section 5, we present both the hardware platform integrated with RISC-V○5 ISA. We add new and the software implementations. Moreover, we briefly groups of dedicated CPU registers, and modify Linux introduce the usage of IMPULP. Section 6 evaluates the kernel to support register configuration and raise out- effectiveness and efficiency of IMPULP. The hardware of-bounds exception. Experimental results show that cost is also presented in this section. Section 7 discusses

○2 https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf, Oct. 2019. ○3 https://lwn.net/Articles/643617/, Oct. 2019. ○4 http://silver.arm.com/download/ARM and AMBA Architecture/AR150-DA-70000-r0p0-00bet9/DDI0487A h armv8 arm.pdf, Mar. 2019. ○5 https://riscv.org/specifications/, Mar. 2019. 420 J. Comput. Sci. & Technol., Mar. 2020, Vol.35, No.2 the limitations of our IMPULP prototype and Section 8 the control transfer when the program runs; therefore concludes this paper. a program is always limited by the original control flow graph and code pointer. 2 Backgrounds Data-only attacks modify critical data variables such as branching conditions to change program. Data 2.1 In-Process Memory Abuse Attacks and flow integrity (DFI) [22, 23] mitigates data-only attacks Defenses by restricting data access. To give developers an effi- cient way to protect sensitive data like cryptographic In-process is usually caused by keys at the source code level, IMIX [24] added a new improper internal memory access and triggered by instruction called smov to protect metadata. many unreliable factors within a user process, such as third-party libraries or plugins, improper function para- 2.2 State-of-the-Art Memory Protection meters, buffer overflow, out-of-bounds memory access Approaches and speculation . The attackers take advantage of in-process memory security risks mentioned above, From the design point of view, existing memory pro- once they have penetrated into the context of target tection approaches can be divided into the following process, and they can freely access (or abuse) memory categories. content of victim programs. Specific attack methods The first type divides a part of memory from mem- include but are not limited to code-injection [8], code- ory space to store sensitive information, and applies reuse [9–11], and data-only attacks [12–14]. various memory protection techniques to prevent at- Code-injection attackers inject malicious code seg- tacks to the security part of memory. This type is ments as data into the address space of a process, and represented by Intel SGX (software guard extensions), then execute the code to gain control of the process, which aims at strongly isolating sensitive code and data thereby achieving attacks by generating many other from the , hypervisor, BIOS, and other processes or modifying system files. Typical defenses applications○9 . However, Intel SGX requires intensive like DEP (data execution prevention)○6 add an execute effort to decouple code to be run in a security domain permission bit in a table. The operating system (known as enclave), and can be used for memory pro- mitigates most attacks by managing the tection with high overhead of performance for entering bit to set code segment non-modifiable and set data seg- and exiting the enclave [25]. Intel SGX rejects any at- ment modifiable but not executable. Specific schemes tempt to access trusted memory from outside the en- include no-execute page-protection (NX)○7 clave, while IMPULP limits the accessible memory ad- defined by AMD and execute disable bit (XD)○8 pro- dress of codes from untrusted security domains. There cessor defined by Intel. are fundamental differences in the principle of the two Even if code-injection attacks are eliminated, methods. attackers can change the execution order of in- The second type hides the memory address that process code to rewrite sensitive data in in-process stores sensitive information, making it unpredictable memory spaces with code-reuse attacks, such as to an attacker for memory protection purposes. This ROP (return-oriented programming) [15], COP (call- type is represented by ASLR (address space layout ran- oriented programming) [16], and JOP (jump-oriented domization). ASLR is commonly applied to user-space programming) [17]. The defense method includes applications and OS kernels due to its effectiveness and ASLR (address space layout randomization) [18], CFI low overhead. Unfortunately, ASLR suffers from vari- (control flow integrity) [19] and CPI (code-pointer ous side-channel attacks. A combination of a software integrity) [20, 21]. The basic idea of ASLR is to load the bug and the knowledge of data structure addresses can codes into a random location; thus the attacker cannot lead to private code execution. find the exact target code location, increasing the dif- The third type divides a part of memory from mem- ficulty of code-reuse attacks. Both CFI and CPI limit ory space for untrusted code segments. When the

○6 http://support.microsoft.com/kb/875352/EN-US/, Mar. 2019. ○7 https://en.wikipedia.org/wiki/NX bit, Mar. 2019. ○8 http://www-ssl.intel.com/content/www/us/en/processors/architectures/software-developer-manuals.html, Mar. 2019. ○9 https://software.intel.com/en-us/sgx, Mar. 2019. Yang-Yang Zhao et al.: IMPULP: A Hardware Approach for In-Process Memory Protection 421 program is running, the developer checks whether the or returned. When each instruction is executed, the memory access boundary or the access permission is modified pipeline in hardware automatically configures broken, thus preventing possible security attacks. This and checks the boundaries for each instruction. The type is represented by Intel MPX and Intel MPK. Our software does not need to inject more instructions for method of IMPULP also belongs to this type. of each instruction. Therefore, IM- Another type of methods does not divide security PULP has obvious performance advantages over Intel domains for the entire memory space, but explores fine- MPX. grained partitions. For example, they tag a memory Another type of hardware-enforced mechanism is block with several bits in hardware to distinguish the based on Intel VT-x technology. Dune [30] builds a pro- security domains. This type is represented by tagged cess abstraction with a small virtual kernel that initial- memory [26]. Tagged memory adds a tag for memory izes virtual hardware and mediates interactions with or register unit to store the semantic information of the physical kernel. Users access privileged resources data. Permission check and memory isolation can be through the interface provided by Dune and avoid the performed according to the corresponding tag. This trap into kernel. Therefore, Dune significantly improves method lacks real implementation due to the high hard- the performance. To isolate memory, Dune maintains ware cost. both regular and protected EPT (extended ) Many classic software countermeasures against mappings to enforce privilege checking. Since Dune ex- memory abuse, such as Libsafe [27], LibsafeXP [28], and poses kernel modules directly to the user without addi- stack canaries [29], always incur high overhead of per- tional security measures, it introduces serious security formance. Therefore, we only illustrate the features of risks. typical techniques in Table 1 and list the shortcomings Memory domain based methods include Shreds of of these techniques. Intel MPX implements bounds ARM platform and Intel MPK. Shreds [31] uses the checking in hardware. Therefore, it provides new in- memory domain of ARM for access control. A shred structions to configure a lower and upper bound for a can be viewed as a flexibly defined segment of a pointer to a buffer. When the program is running, In- execution. Each shred is associated with a protected tel MPX leverages another instruction to quickly check memory pool with specific domain ID. If the domain whether this address points into the buffers bound- ID is wrong, the memory access is illegal. However, the aries. Since the software instruments bounds check- change of memory domains needs to modify page table ing for every memory access instruction of a process, in kernel, which introduces high overhead. Intel MPX causes high overhead of performance. Com- Intel MPK divides the memory into different do- pared with Intel MPX, the idea of bounds checking mains. Each domain has a specific value called memory for each memory access instruction is the same in IM- protection key. For an x86 system, the key of a page is PULP. However, the specific implementation is clearly marked with four bits in the page table. Once a pro- optimized in IMPULP. The main difference is that the cess accesses the memory, hardware checks whether the software of IMPULP only configures the boundary reg- key of the process matches the key from the memory isters in hardware when an untrusted function is called block. A mismatch will trigger an exception. Compared

Table 1. Features of State-of-the-Art In-Process Memory Isolation Techniques

Method Hardware-Enhanced Mechanism Characteristics Shortcoming Intel MPX Memory bounds check Every instruction is checked to prevent The performance overhead is high out-of-bounds memory access Dune VT-based method Intel VT-x provides users with full ac- The kernel is exposed to users, which cess to protected hardware introduces security risks Shreds Domain-based method ARM holds the access permissions for It offers limited security due to a sup- 16 domains to isolate sensitive data port of specific hardware and recom- piled libraries Intel MPK Domain-based method Intel x86 supports 16 domains of mem- The permission setting register is ac- ory protection key to divide user vir- cessible for user program, which intro- tual space duces security risks IMIX ISA-based method It adds new instruction named smov to A security risk occurs when the adver- access critical metadata sary modifies code and then executes the smov instruction with controlled arguments 422 J. Comput. Sci. & Technol., Mar. 2020, Vol.35, No.2 with Shreds, when an application changes the protec- tion address to distinguish between trusted and un- tion domain, Intel MPK requires no modification of the trusted code segments by the following steps. page tables, and therefore the performance overhead is 1) At compile time, we set the instruction address lower. However, the user program can control read and range of trusted code segments. The corresponding write permissions for each domain by changing the cor- boundary registers are updated by kernel at load time responding registers, which introduce security risks. only once. Therefore, it is impossible for untrusted code Recently, there exist new methods based on modi- segments to modify the instruction address range of fied ISA. IMIX [24] extends the Intel x86 ISA with a new trusted codes to ensure security. 2) At runtime, the instruction smov to protect sensitive metadata. How- hardware obtains the current program (PC) ever, the requirement of immutable codes in the adver- value and compares it with the instruction address sary model of IMIX is hard to achieve, especially when a range of trusted code segments. If the PC is inside the process calls more and more complex third-party codes. range, the current instruction originates from a trusted Once the adversary injects new codes or modifies exist- code segment. Otherwise, it comes from an untrusted ing codes, and then executes the smov instruction with code segment. controlled arguments, a security risk generates. For trusted code segments, the accessible memory address covers the entire address space of the process. 3 Adversary Model For untrusted code segments, IMPULP limits the acces- sible memory address within parts of the address space Throughout our work, we use the following adver- and sets access permissions for safety by the following sary model and assumptions, which are consistent with steps. prior work in this field of research [9–11, 20, 21]. 1) At compile time, we specify the accessible mem- Memory Corruption. We assume the presence of ory space for untrusted code segments by instrument- memory-corruption vulnerability, which the adversary ing three instructions within the trusted code segments. can repeatedly exploit to read and write data according These three instructions set the lower and the upper to the memory access permissions. bound of accessible memory space, and the access per- Kernel Security. This paper divides security do- mission of the memory space. They can be packaged mains in the user level. Therefore, the system kernel is as an API function. The encapsulated pair of APIs hypothesized to be secure, which means that this pa- is as follows: start protect (addr, len, cfg, index) and per will not discuss the attacks and defenses about the end protect (index) (see the details in Subsection 5.2.1). system kernel. We name untrusted code segment as library function. Inter-Process Security. This paper focuses on in- We add the API function right before the library func- process memory protection. Therefore, we assume that tion is invoked in the program. 2) At runtime, the different processes are safely isolated, which means that corresponding registers are updated when the instru- this paper will not discuss the attacks and defenses mented instructions are executed. When the illegal across different processes. command is issued by out-of-bounds memory requests Security Risk of User Code. When applications call from untrusted code segments, the hardware will detect untrusted third-party codes, a process may induce vul- out-of-bounds or illegal accesses when checking the cor- nerabilities and lead to information leakage and control responding registers. Then the modified CPU pipeline flow hijacking. Security risks of user codes include vul- generates an interrupt and records the current interrupt nerable OpenSSL codes, malicious libraries in mobile category information. Apps, third-party plugins, and any untrusted code seg- With the address space restrictions of memory ac- ments which may induce an attack. cess, sensitive data is isolated from out-of-bounds mem- ory access of in-process untrusted instructions. How- 4 Design of IMPULP ever, attackers may execute code-reuse attacks. For ex- ample, if untrusted code segments make use of branch IMPULP divides in-process code segments into or jump or other memory access related instructions to different security domains by the instruction address. reach the address where the accessible memory bound- The first version of IMPULP divides the codes of the aries are stored, attackers can break through the above process into two parts, trusted code segments and un- boundaries by modifying the values in boundary reg- trusted code segments. IMPULP leverages the instruc- isters. Even a jump between library functions can in- Yang-Yang Zhao et al.: IMPULP: A Hardware Approach for In-Process Memory Protection 423 troduce security risks. IMPULP provides protection 5.1 Hardware Modification against ROP attacks with existing conventional CFI From Section 4, we know that the hardware mainly methods [19]. modifies three parts. IMPULP adds groups of reg- Since several new groups of boundary registers are isters to store boundaries corresponding to user code introduced into the process, if these register groups are segments in different security domains. The hardware modified by malicious codes, the security defense is cor- needs to perform boundary checking while the program rupted. Therefore, an automatic hardware checking is running, and therefore IMPULP modifies the struc- mechanism is added in IMPULP. These new registers, ture of CPU pipeline. In addition, in order to protect designed for security enhance, are unwritable by un- the boundary register groups from being illegally ac- trusted code segments. cessed, the hardware extends ISA instructions. Another possible form of security risk occurs when the library function invokes any system call by using the 5.1.1 Adding Register Groups syscall function. A process running in kernel mode can IMPULP adds three types of registers in the CPU execute any instruction in the instruction set and access core. any memory location in the system. Since our security There is one group of primary instruction address model considers the kernel to be secure and does not range (PIAR) registers, consisting of the starting and perform security checks on memory access boundaries, the ending instruction addresses of the primary func- a successful syscall can break through the IMPULP de- tion. If the (PC) value of the current fense. instruction exceeds the range, then it is considered that To eliminate such security risk, we make some the current instruction comes from the library function. changes in the kernel to handle syscall. When a syscall We provide 16 groups of library memory address occurs, the kernel adds a syscall check procedure by the range (LMAR) registers. Each group includes one pair following steps. of the starting and the ending memory address acces- 1) The kernel determines whether the syscall is from sible by a library function. Therefore, we can protect trusted code segments or library functions. If it comes 16 objects of one library function at most, to limit the from trusted code segments, the syscall will run nor- memory boundaries of the object. Whenever CPU exe- mally. If it comes from a library function, step 2 will cutes the load/store instruction of an object, the CPU be executed. 2) The kernel compares the parameters pipeline looks up the corresponding group of registers, passed in of a syscall with the upper and the lower confirming whether the memory access of load/store in- bound of the library function registers. If the parame- struction is out-of-bounds. Once a library function re- ters exceed the boundaries, an error report is triggered. turns to the primary function, all LMAR registers are From the above discussion, the user-level partition- reset, which can be used by the next library function. ing for trusted and untrusted code segments needs mod- Therefore, the configuration and the reset of the LMAR ifications of the kernel. The preset of accessible mem- registers are dynamic. The reset is also achieved by in- ory space for untrusted code segments requires GCC strumenting instructions in trusted code segments. extension and changes in linker script. The hardware To avoid ROP-like attacks, IMPULP adds a return achieving runtime check of illegal memory access needs address register (RAR) in the CPU core, recording the to modify the CPU pipeline and add new types of in- return address of the library function. When the library terrupt. In addition, IMPULP adds new groups of reg- function is called, the return address will be stored into isters, and new instructions in ISA are added to pro- RAR. When the library function returns to the pri- tect these metadata (see Section 5 for details of the mary function, the target address stored in the pro- changes). gram counter will be compared with the return address stored in RAR, and the mismatch will trigger return 5 Implementation of IMPULP address error exception. With the help of RAR, we can assure the control flow integrity of IMPULP. In this section, we will describe the hardware and In order to ensure the security of IMPULP, access software modifications of IMPULP in details. For the to registers is subject to the following rules. convenience of description, we name the trusted code Rule 1. The primary function could access local segment in a process as a primary function, and the variables of the program and all the global variables, untrusted code segment as a library function. while the library function could only access variables 424 J. Comput. Sci. & Technol., Mar. 2020, Vol.35, No.2 within the address ranges that the LMAR registers in- EX (Execution). The module to check out-of- dicate. This rule prevents the library functions access- bounds access includes two parts. According to the ing sensitive data kept by the primary function. results of the ID stage, if the instruction is load/store Rule 2. Only the kernel can modify PIAR. Both the and it belongs to library function, IMPULP will check kernel and the primary function could modify LMAR whether the address is within legal memory bounds set registers. in LMAR registers. Out-of-bounds access will trigger an exception to CPU. The indicator exception l in Fig.1 5.1.2 Modification of CPU Pipeline corresponds to illegal load instruction, while exception s To support the design of IMPULP described in is valid when illegal memory store occurs. Section 4, the hardware modification of CPU pipeline MEM (Memory Access). The module added in this includes four parts, which are distributed in various stage checks ROP-like attacks and reports illegal branch stages. Fig.1 shows the changes in a typical five-stage or jump of control flow. For branch and jump instruc- CPU pipeline architecture. The following paragraphs tions, IMPULP divides the destination address of the will introduce the details of each part. same process into two parts, primary function address IF (Instruction Fetch). An instruction is fetched and library function address. If the instruction jumps from the memory by the program counter. We do not from the primary to the library function, the return make changes in this unit. address will be stored into RAR. When any branch or ID (Instruction Decode). We add a module to divide jump instruction occurs in a library function, IMPULP security domains of instructions. This module contains compares the target address with the return address three parts. First, IMPULP adds a unit to check the stored in the RAR. If they are the same, this operation type field of current instruction. If the instruction is is legal. Otherwise IMPULP will trigger a return ad- judged as load, store, branch or jump, IMPULP triggers dress error exception. The corresponding indicator is the next unit, which is used to check the privilege level exception bj in Fig.1. of the instruction. Both other types of instructions and The above steps provide strong protection against instructions from the kernel follow the conventional pro- the ROP attacks, since any other target address from cessing flow. Then, if the privilege level is user mode, a library function is prohibited except the right return IMPULP will compare the program counter with the address to the primary function. This implementation values stored in PIAR registers. With this unit, the in- will certainly affect the application of IMPULP to com- structions of the same process are divided into different plex scenarios, but for the first version of IMPULP, we security domains effectively. With the modification of do not consider function nesting. ID stage, CPU obtains the information about which WB (Write Back). During this stage, the data type of region the instruction code belongs to. There loaded from memory or calculated by the ALU is writ- are three types of regions, namely, the kernel, the pri- ten to the register file for normal state. IMPULP adds mary function, and the library function. As shown in a group of registers to record the current PC value, Fig.1, when the indicator primary i is valid, the instruc- the memory access address, and the exception reason. tion originates from the primary function of user code. When exception occurs in the EX or MEM stage, CPU The function of other indicators is the same. generates the exception with the registers.

IF ID EX MEM WB (1) (2) (3) (4)

kernel_i Check exception_l Security Check primary_i Out-of - exception_bj Execute Domain ROP Bounds exception_s Exception Division Attack Access library_i

(a) (b) () ()

Fig.1. Processing flow modification of five-stage pipeline. Yang-Yang Zhao et al.: IMPULP: A Hardware Approach for In-Process Memory Protection 425

5.1.3 Adding IMPULP-Exclusive Instructions and the sum of addr and len into the upper one. The former refers to the lower bound of accessible mem- In order to configure IMPULP-exclusive registers, ory access address, while the latter refers to the upper we need to extend dedicated configuration instructions cfg on the RISC-V instruction set. bound. defines the permission of the address set in The processor core of our experimental platform LMAR of the library function. supports the RV64MAFDC instruction set. We take When the programmers write codes of a library one of the operation codes reserved in the RV64MAFDC function, they will traditionally request an accessible instruction set, and then the new instruction can be space for a variable. The starting address and the extended. For the convenience of addressing and ex- size of the space are known to the programmer. These tending PIAR and LMAR registers, we have designed two parameters are passed to the API function as addr four IMPULP-exclusive instructions namely impulpprw, and len. Then, the API function updates the lower impulpprs, impulplrw, and impulplrs. The first six char- bound and the upper bound of a corresponding group acters represent our method, “p” or “l” means the in- of LMAR registers. struction is used for setting PIAR or LMAR registers, During the runtime of programs, IMPULP will in- “rw” or “rs” corresponds to “read and write” or “read quire the LMAR registers to see if the following ope- and set” respectively. With the extended instructions, rations are within the specific memory. In this way, IMPULP ensures the security of configuring registers. malicious accesses to the memory space would never succeed as they will be prevented by IMPULP and raise 5.2 Software Supplement exception. index is needed to indicate specific registers used by the current API. After the invocation of a li- As shown in Fig.2, the compiler supports the new brary function, the end protect function is invoked to IMPULP-exclusive instructions. Then according to the clear a LMAR group specified by index; thus IMPULP symbol table which is generated by the compiler, the will not check the memory access range anymore. kernel delimits the primary function and configures the To prevent library functions modifying LMAR reg- registers. The primary function can call API to set LMAR registers. Finally, the CPU executes the pro- isters, special configuration instructions in start protect gram under specific memory access limitation. and end protect could only be executed in the primary function. 5.2.1 IMPULP API 5.2.2 Compiler Modification The encapsulated pair of APIs is as follows: start protect (addr, len, cfg, index), RISC-V provides the companion compilation tool end protect (index). GCC. The compiler computes the input parameter’s There exists a group of LMAR registers. The effective data length, figures out which part of the pro- start protect function writes addr into the lower one, gram is the primary function, and guides the kernel to

Code Kernel CPU

⊲⊲⊲ GCC Set Registers When Instruction start_protect ↼addr֒ Loading Program Decode ,↽len֒ cfg֒ index Code Linker ⊲⊲⊲ Instrumentation end_protect↼index↽, ⊲⊲⊲ Handle Exception Out-of -Bound Checking

Deployment and Build Runtime Check

Fig.2. Developers can configure the accessible address range of a library function via APIs; the compiler instruments codes and supports new instructions; the linker adds codes to modify the memory layout; the kernel sets registers and handles new exceptions at runtime. 426 J. Comput. Sci. & Technol., Mar. 2020, Vol.35, No.2

fill the corresponding registers. 5.3 Usage of IMPULP We add the definition of the new instruction to GCC, as well as the operation code and the instruc- Users apply IMPULP with several simple steps. tion mask of the new instruction. The modified GCC When the user requires calling an untrusted library generates a corresponding binary code when processing function, IMPULP is set to protect in-process mem- a new instruction. When the program is executed, the ory by the following steps. First, the user adds a pair hardware can set and read the value of the correspond- of APIs’ code in the place where the library function is ing registers with the binary code. called in the program. The parameters of start protect We add some lines of code in linker script to modify are the upper and the lower bound of available mem- the memory layout of a primary function. Before the ory space for untrusted function, and the permissions program runs, the instruction address boundaries of the of this space. The index of end protect equals the same primary function have been written to the correspond- value in start protect to reset the same group of LMAR ing PIAR registers. registers. Then the user opens the compile options that support new IMPULP instructions when GCC compiles 5.2.3 Kernel Modification the code. After generating the executable file, the user We modify the load function of a program in the runs the file with the attribute -impulp. Then the user kernel. When the kernel loads the binary file (ELF file), code is running under the protection of IMPULP. Once it obtains the symbol table of the ELF file, which is al- the untrusted code accesses out-of-bounds memory, the ready generated by the compiler and the linker script. CPU will halt the process and generate an exception. Then, start (the entry point of ELF) and libc csu fini (the end of initialization) in this symbol table are re- 6 Evaluation and Results garded as the start and the end address of the primary ○10 function. Finally, the kernel fills the start and the end RISC-V is a free and open source instruction set address of the primary function into registers. architecture (ISA). We perform our evaluations using We modify the code of the process context in a modified RISC-V CPU with 1 GB RAM running the kernel, adding the operations of saving and restor- Linux kernel 4.6.2 and the basic frequency of CPU is ing IMPULP-exclusive registers to ensure the correct- 62.5 MHz. We use the Virtex-7 FPGA VC707 Evalua- ○11 ness of register information. tion Kit as our testing platform. ○12 We modify the code of the exception execution func- We take the open source project “sifive/freedom” tion in the kernel, adding the processing code of the as the baseline project. The steps to generate the base- IMPULP exception. Then, the corresponding excep- line project are as follows: 1) download the project; tion information is printed according to the type of the 2) compile codes in the file named “Makefile.vc707- exception. u500devkit” in the project; 3) create a vivado project We add a procedure of syscall check in the kernel. and generate a bit file; 4) burn the bit file to the test The primary function uses a separate syscall entrance, platform and run the Linux kernel. while the library function uses the entrance of libc to The steps to generate IMPULP project are simi- execute syscall. After the syscall handler saves the con- lar: 1) modify the codes in the files of the directory text, the syscall check procedure is started. According “/freedom/rocket-chip/src/main/scala/rocket” (the files to the value of epc (exception program counter), IM- are written in scala language); 2) re-compile codes PULP judges whether the syscall is from the primary in the file named “Makefile.vc707-u500devkit” in the function or the library function. The syscall from the project; 3) replace the recompiled verilog file in the vi- primary function continues executing, while the syscall vado project; 4) generate new bit file and test the IM- from the library function performs the following check PULP logic. step. If the parameters passed in of a syscall exceed the This section contains the following contents. First, upper or lower boundaries recorded in LMAR registers, we verify the effectiveness of IMPULP. 1) We test mem- the kernel raises an interrupt. Otherwise, the syscall ory protection functions of IMPULP with five cases. 2) from the library function continues executing. We perform a test to defense typical buffer overflow.

○10 https://riscv.org/specifications/privileged-isa/, Mar. 2019. ○11 https://www.xilinx.com/products/boards-and-kits/ek-v7-vc707-g.html, Mar. 2019. ○12 https://github.com/sifive/freedom, Mar. 2019. Yang-Yang Zhao et al.: IMPULP: A Hardware Approach for In-Process Memory Protection 427

3) We test the programs of MIT Lincoln Laboratories executing out-of-bounds memory in the address space, buffer overflow benchmark. 4) We achieve a test to and prohibit library functions from modifying the boun- defense famous memory leakage attacks named Heart- dary register groups associated with the security de- bleed. Then, we select the popular SPEC CPU2006 fense. In addition, when library function calls syscall, benchmark to prove the efficiency of IMPULP. Finally, the memory is also protected by the procedure of syscall we present the hardware overhead of IMPULP. check. Therefore, we test the above five functions with 6.1 Effectiveness five programs. Functional descriptions and test results are shown in 6.1.1 Functional Test Cases Table 2. The exception numbers of 0xC, 0xD and 0xE According to the description in Section 4, IMPULP are IMPULP-exclusive, while the exception number of can prevent library functions from reading, writing or 0x8 is general.

Table 2. Five Functional Test Cases

Attack Type Test Description Test Result Out-of-bounds read The attack occurs when the load instruction of IMPULP generates an exception, and the ex- a library function accesses illegal memory ad- ception number is 0xC dress Out-of-bounds write The attack occurs when the store instruction of IMPULP generates an exception, and the ex- a library function accesses illegal memory ad- ception number is 0xD dress Control-flow hijack The attack occurs when a library function IMPULP generates an exception, and the ex- jumps to a memory address which exceeds the ception number is 0xE limited boundaries Access security-related registers The attack occurs when a library function tries The write enable signal in hardware is invalid. to modify the registers of PIAR, LMAR and The security-related registers cannot be modi- RAR fied by any library function Syscall abuse The attack occurs when a library function tries IMPULP generates an exception, and the ex- to access out-of-bounds memory with a system ception number is 0x8 call

6.1.2 Defense Case of Buffer Overflow open source network servers, namely Bind, Wuftpd and Sendmail. Typical buffer overflow includes stack overflow and heap overflow. The defense case of stack overflow 1 int main↼int argc, char *argv[] ↽ is shown in Fig.3. Line 3 defines a variable in the 2 { stack. Line 4 and line 6 insert IMPULP APIs to limit 3 char pass♭♯, the memory access boundaries of the library function 4 start _protect↼pass, size of (pass), CFG, 0↽, (strcpy in line 5). When the length of copied string 5 strcpy ↼pass, argv[1]↽, is less than 10, the program runs normally. Once the 6 end _protect↼↽, 7 printf ↼εstrcpy doneε↽, length of the copied string exceeds 10, IMPULP raises 8 return 0; an exception. The test of heap overflow is almost the 9 } same. The only different line is line 3. Line 3 applies for a space in the heap by “char *pass = malloc(10)”. Fig.3. Defense case of stack overflow. IMPULP also raises an exception. Therefore, IMPULP We test 13 programs on our platform integrated effectively defends buffer overflow attacks by setting ac- with RISC-V ISA. Each program is accompanied by cessible memory boundaries for vulnerable parameters a document, which includes the causes of buffer over- with APIs. flow, test methods, and expected results. First we test the programs on the platform without IMPULP, and 6.1.3 MIT Benchmarks then we deploy IMPULP API functions in each test We also test the programs of MIT Lincoln Labora- program when the buffer overflow occurs. Finally, we tories buffer overflow benchmark [32]. It contains model analyze the test results. programs with and without buffer overflow bugs deve- The test results show that four test programs do not loped from reported vulnerabilities in three real-world have buffer overflow problems on our RISC-V platform. 428 J. Comput. Sci. & Technol., Mar. 2020, Vol.35, No.2

One possible reason is that our platform is immune to raises an exception and the process is terminated. Fi- the attacks naturally, since the platform is based on nally, no secret data is leaked out. RISC-V ISA. Moreover, three test programs running without IM- 6.2 Efficiency: SPEC CPU2006 Benchmark PULP generate segmentation errors, indicating that Programs these programs cannot bypass existing kernel security In SPEC CPU2006 benchmark suite, we add mechanisms and require no protection provided by IM- start protect API function before the library function PULP. is called and the end protect API function after the li- Six test programs suffer buffer overflow vulnerabil- brary function returns. We regard GLIBC as unreliable ities without IMPULP, resulting in information leak- library functions, for instance, the functions like printf, age, local variables being overwritten, and program exe- memset, strcpy, memcpy are not trusted. We record the cution producing unexpected results. The ID numbers lines of API codes of each test benchmark. As shown of these test cases are 1283, 1285, 1289, 1291, 1295, in Table 3, more than 1 000 API functions are inserted and 1297. With IMPULP, all the illegal operations are into the whole benchmark. An API function may be prevented. Therefore, these programs can successfully called multiple times during program execution, or be defend against buffer overflows and raise exception. skipped due to branch or jump commands. Therefore, 6.1.4 Defense Case of Memory Leakage we count the total number of API calls for each program and list the results in Table 3. Heartbleed is a in the OpenSSL cryp- We choose bzip2, gcc, mcf, gobmk, hmmer, sjeng, tography library. The vulnerability allows attackers libquantum, omnetpp and astar as our test benchmarks. to remotely read protected memory from an estimated We run train workload three times and use average run- 24%–55% of popular HTTPS sites. We build a socket time for each benchmark. Results show that the IM- server with the API provided by OpenSSL-1.0.1e in C, PULP configuration overhead is almost negligible, the which initializes the socket and SSL library, and waits total overhead of the IMPULP is 0.166% (249/150353) for the connection of clients using SSL accept. We run of the total execution runtime (150 353 s). The pro- a socket client program to communicate with the socket gram like bzip2 calls API functions in a large amount server. The client sends hello request of TLSv1.1 to the (907 446), while the time overhead is only 0.05%. The server, which establishes the connection between them. IMPULP method has a performance advantage in prin- To replay Heartbleed, we send a malformed Heart- ciple, and the test results verify the advantage. First, beat request from a socket client. The expected length IMPULP divides security domains in the user level. IM- of the response packet is 685 bytes. Then the server PULP takes instruction address as comparative subject replies with excess data that may contain sensitive in- to determine whether the current code can access the formation, such as users’ passwords. The actual length object range dynamically set by a pair of APIs. The of the response packet is 61455 bytes. overhead is only the boundary setting instruction added After deploying IMPULP in the OpenSSL source at the entrance of a subroutine. Therefore, it reduces code, that is, protecting the memcpy invocation in the inter-domain switch overheads compared with con- tls1 process heartbeat, the unexpected copy operation ventional methods, which either save complex contexts

Table 3. Execution Time of SPEC CPU2006 Benchmark Programs

Test Program Lines of API Codes Total Number of API Calls Baseline (s) IMPULP (s) Run-Time Cost (%) 401.bzip2 404 907 446 6 167.00 6 170.00 0.050 403.gcc 248 322 406 167.26 167.32 0.030 429.mcf 20 18 2 060.54 2 059.95 −0.030 445.gobmk 58 13 150 22 270.00 22 355.00 0.430 456.hmmer 218 120 14 935.00 14 981.00 0.310 458.sjeng 144 30 21 729.00 21 784.00 0.250 462.libquantum 98 36 14 565.00 14 637.00 0.490 471.omnetpp 8 8 58 322.00 58 292.00 −0.050 473.astar 44 80 10 137.00 10 156.00 0.190 Total 1 242 1 243 294 150 352.80 150 602.27 0.166 Yang-Yang Zhao et al.: IMPULP: A Hardware Approach for In-Process Memory Protection 429 or perform complex encryption and decryption when are verified on our first FPGA prototype based on switching domains. Furthermore, during the execution RISC-V ISA. However, applying IMPULP to more ap- of a subroutine, any data access out of boundaries will plications and platforms also faces the following prob- be detected by hardware automatically, and no instru- lems, which need to be solved in our future work. mentation is needed within a subroutine. Therefore, First, the reason to implement our first prototype of the performance overhead is negligible. IMPULP on an FPGA platform integrated with RISC- Table 3 shows the execution time comparison be- V ISA is that RISC-V ISA is free and open source. The tween the baseline and IMPULP. The execution time implementation of IMPULP needs to modify the CPU of IMPULP is almost the same as the execution time pipeline and the instruction set. The changes are at of the baseline. The time is recorded in seconds. the architecture level. However the existing Intel plat- forms and ARM platforms are difficult to achieve the 6.3 Hardware Cost modifications because they are not open sources and often confront with property rights issues. Related stu- First, we check the utilization of FPGA resources dies like Intel MPX and Intel SGX require hardware and total on-chip power in the vivado project. assistance from the Intel platforms; therefore perfor- As shown in Table 4, the overhead of different re- mance comparison cross platforms is difficult to achieve. sources of IMPULP is less than 5.5% while the power Therefore we only analyze the advantages and disad- overhead is 4.3%. LUT means look-up table. LU- vantages of related studies. TRAM represents the look-up table implemented by Second, the implementation of our first prototype of RAM (random access memory). FF represents flip-flop IMPULP only checks load/store/branch/jump instruc- units. BRAM represents block RAM. tions. The reason is that RISC-V is a reduced instruc- Table 4. FPGA Resource Usages of RISC-V Baseline and IM- tion set, which does not include other instructions to PULP affect memory access. In the future, when IMPULP is Power (W) LUT LUTRAM FF BRAM applied to other platforms, the implementation needs to Baseline 3.747 48 519.0 3 546.0 35 833.0 30 add checks for other instructions of the corresponding IMPULP 3.911 51 138.0 3 550.0 37 771.0 30 platform. Overhead (%) 4.300 5.4 0.1 5.4 0 Third, the current experimental platform only im- plements the simplest ROP defense with a register to Moreover, preparing for future chip design, we also ensure the correctness of IMPULP. Although it is suffi- use the DC (design compiler) tool to compare the mod- cient for testing benchmarks, the usage of IMPULP is ified core with the original core in terms of area, cells limited when applying to more library functions. The (the minimum unit of ), and power. next prototype of IMPULP will include the implemen- Considering the practicality of a chip, we add L2 cache tation of conventional CFI methods, such as return ad- [29, 33] resources and test the overall results. Table 5 gives dress stack and shadow stack . the hardware comparison results of the baseline and Then, nested function calls are not supported in IMPULP. The area overhead of the IMPULP is 0.7% our first prototype, since the specific implementation (16598/2375314) while the power overhead is 0.5% may involve more security domains. However, nested (0.982/192.119). Both are negligible. function calls are very ordinary in practical applica- tions. Therefore, we simply discuss the method to solve Table 5. Hardware Comparison Results of RISC-V Baseline the problem and reserve the implementation for future and IMPULP work. Area (µm2) Cells Power (W) We take the simplest single-level nested call as an Baseline 2 375 314.0 814 561.0 192.119 example. First, the primary function calls a library IMPULP 2 391 912.0 822 024.0 193.101 function. Then, the library function calls another li- Overhead (%) 0.7 0.9 0.500 brary function or system function. To support the nested function call, there exist at least two groups 7 Limitations and Future Work of memory boundary registers and the corresponding selection logic for different library functions. The cor- With the experiments in Section 6, the correctness responding return address register is also extended to and the performance advantage of the IMPULP method a return address stack for nested return address check- 430 J. Comput. Sci. & Technol., Mar. 2020, Vol.35, No.2 ing. When a library function calls a system function, [4] McCamant S, Morrisett G. Evaluating SFI for a CISC ar- one method is to modify the codes to jump back to chitecture. In Proc. the 15th USENIX Security Symposium, the primary function, and then execute the system call. July 2006, Article No. 9. [5] Wahbe R, Lucco S, Anderson T E, Graham S L. Effi- When a library function calls another library function, cient software-based fault isolation. ACM SIGOPS Ope- we switch to a trusted state to get permission to read rating Systems Review, 1993, 27(5): 203-216. and write new registers of new library function. Then, [6] Sehr D, Muth R, Biffle C, Khimenko V, Pasko E, Schimpf we limit the accessible memory space no larger than the K, Yee B, Chen B. Adapting software fault isolation to con- original library function for security. If out-of-bounds temporary CPU architectures. In Proc. the 19th USENIX memory access occurs, an error is reported. To prevent Security Symposium, August 2010, pp.1-12. [7] Otterstad C W. A brief evaluation of Intel®MPX. In Proc. a ROP attack, the return address is compared with the the 2015 Annual IEEE Systems Conference, April 2015, addresses stored in the return address stack. pp.1-7. Finally, IMPULP is a user-level isolation method. [8] One Aleph. Smashing the stack for fun and profit. Phrack For programs in user mode, the address space is only Magazine, 1996, 7(49): Article No. 14. a virtual address space. The conversion from virtual [9] Schuster F, Tendyck T, Liebchen C, Davi L, Sadeghi A R, Holz T. Counterfeit object-oriented programming: On the address to physical address and the protection of phys- difficulty of preventing code reuse attacks in C++ applica- ical address are not covered by the IMPULP method. If tions. In Proc. the 36th IEEE Symposium on Security and necessary, we can extend the IMPULP method to other Privacy, May 2015, pp.745-762. security levels in our future work, or combine other con- [10] Shacham H. The geometry of innocent flesh on the bone: ventional protection methods, such as memory encryp- Return-into-libc without function calls (on the x86). In Proc. the 2007 ACM SIGSAC Conference on Computer tion technology, to realize the protection of physical and Communications Security, October 2007, pp.552-561. address. [11] Snow K Z, Monrose F, Davi L, Dmitrienko A, Liebchen C, Sadeghi A. Just-in-time code reuse: On the effectiveness of 8 Conclusions fine-grained address space layout randomization. In Proc. the 34th IEEE Symposium on Security and Privacy, May The security vulnerability is inevitable without ef- 2013, pp.574-588. fective in-process memory protection methods. In this [12] Chen S, Xu J, Sezer E C. Non-control-data attacks are re- Proc. the 14th USENIX Security Sympo- study, we proposed IMPULP, a hardware approach for alistic threats. In sium, July 2005, Article No. 13. effective and efficient in-process memory protection. [13] Hu H, Chua Z L, Adrian S, Saxena P, Liang Z. Auto- The main idea of the IMPULP method is to divide se- matic generation of data-oriented exploits. In Proc. the 24th curity domains in the user level in order to improve USENIX Security Symposium, August 2015, pp.177-192. performance. For the codes of untrusted security do- [14] Hu H, Shinde S, Adrian S, Chua Z L, Saxena P, Liang Z. mains, IMPULP limits the accessible memory address Data-oriented programming: On the expressiveness of non- control data attacks. In Proc. the 37th IEEE Symposium within parts of the address space and sets access per- on Security and Privacy, May 2016, pp.969-986. missions for safety. The bounds checking is executed [15] Roemer R, Buchanan E, Shacham H, Savage S. Return- by modified pipeline in hardware. Experimental results oriented programming system, languages, and applications. showed that IMPULP can effectively protect in-process ACM Transactions on Information and System Security, memory with negligible runtime overhead. 2012, 15(1): Article No. 2. [16] Sadeghi A A, Niksefat S, Rostamipour M. Pure-call ori- ented programming (PCOP): Chaining the gadgets using References call instructions. Journal of Computer Virology and Hack- ing Techniques, 2018, 14(2): 139-156. [1] Jacomme C, Kremer S, Scerri G. Symbolic models for iso- [17] Bletsch T, Jiang X, Freeh V, Liang Z. Jump oriented pro- Proc. the 2007 IEEE Eu- lated execution environments. In gramming: A new class of code-reuse attack. In Proc. the ropean Symposium on Security and Privacy, April 2017, 6th ACM Symposium on Information, Computer and Com- pp.530-545. munications Security, March 2011, pp.30-40. [2] Chen Y H, Reymondjohnson S, Sun Z, Lu L. Shreds: Fine- [18] Lu K, Song C, Lee B, Chung S P, Lee W. ASLR-guard: grained execution units with private memory. In Proc. the Stopping address space leakage for code reuse attacks. In 2006 IEEE Symposium on Security and Privacy, May 2016, Proc. the 22nd ACM SIGSAC Conference on Computer pp.56-71. and Communications Security, October 2015, pp.280-291. [3] Kudo N, Yamauchi T, Austin T H. Access control for plug- [19] Abadi M, Budiu M, Erlingsson U, Ligatti J. Control-flow ins in Cordova-based hybrid applications. In Proc. the 31st integrity. In Proc. the 12th ACM SIGSAC Conference on IEEE International Conference on Advanced Information Computer and Communications Security, November 2005, Networking and Applications, March 2017, pp.1063-1069. pp.340-353. Yang-Yang Zhao et al.: IMPULP: A Hardware Approach for In-Process Memory Protection 431

[20] Kuznetsov V, Szekeres L, Payer M, Candea G, Sekar R, Yang-Yang Zhao received her M.E. Proc. the 11th USENIX Song D. Code-pointer integrity. In degree in electronic and communication Symposium on Operating Systems Design and Implemen- engineering from Harbin Institute of tation, October 2014, pp.147-163. Technology, Harbin, in 2013. She is a [21] Evans I, Fingeret S, Gonzalez J, Otgonbaatar U, Tang T, Ph.D. candidate in computer architec- Shrobe H, Sidiroglou-Douskos S, Rinard M, Okhravi H. ture of University of Chinese Academy Missing the point(er): On the effectiveness of code pointer of Sciences, Beijing. She is also a integrity. In Proc. the 36th IEEE Symposium on Security student member of IEEE. Her main and Privacy, May 2015, pp.781-796. research interests include architecture, memory system, [22] Akritidis P, Cadar C, Raiciu C, Costa M, Castro M. Pre- and security. venting memory error exploits with WIT. In Proc. the 29th IEEE Symposium on Security and Privacy, May 2008, pp.263-277. [23] Castro M, Costa M, Harris T. Securing software by enforc- Ming-Yu Chen received his Ph.D. ing data-flow integrity. In Proc. the 7th USENIX Sympo- degree in computer architecture from sium on Operating Systems Design and Implementation, Institute of Computing Technology November 2006, pp.147-160. (ICT), Chinese Academy of Sciences [24] Frassetto T, Jauernig P, Liebchen C, Sadeghi A, Darm- (CAS), Beijing, in 2000. He is currently stadt T U. IMIX: In-process memory isolation eXtension. a professor and Ph.D. supervisor of In Proc. the 27th USENIX Security Symposium, August ICT, CAS, Beijing, UCAS, Beijing, 2018, pp.83-97. and PengCheng Laboratory, Shenzhen. [25] Costan V, Devadas S. Intel SGX explained. IACR Cryptol- He is a member of CCF, ACM and IEEE. His main ogy ePrint Archive, 2016, 2016: Article No. 86. research interests include computer architecture, memory [26] Feustel E A. On the advantages of tagged architecture. protection, operating system, and algorithm optimization IEEE Transactions on Computers, 1973, 22(7): 644-656. for high-performance computers and datacenter networks. [27] Tsai T, Singh N. Libsafe: Transparent system-wide pro- tection against buffer overflow attacks. In Proc. the 2002 International Conference on Dependable Systems and Net- Yu-Hang Liu received his Ph.D. works, June 2002, Article No. 541. degree in computer science from Bei- [28] Lin Z, Mao B, Xie L. LibsafeXP: A practical and transpar- hang University, Beijing, in 2013. He ent tool for run-time buffer overflow preventions. In Proc. is an associate professor in Institute of the 7th Annual IEEE Information Assurance Workshop, Computing Technology (ICT), Chinese June 2006, pp.332-339. Academy of Sciences (CAS), Beijing. [29] Dang T H Y, Maniatis P, Wagner D. The performance cost He has been a postdoctoral researcher of shadow stacks and stack canaries. In Proc. the 10th ACM with the Computer Science Department Symposium on Information, Computer and Communica- of Illinois Institute of Technology (IIT), Chicago. He is a tions Security, April 2015, pp.555-566. member of CCF, ACM and IEEE. His research interests [30] Belay A, Bittau A, Mashtizadeh A, Terei D, Mazi`eres D, include high-performance computing, computer architec- Kozyrakis C. Dune: Safe user-level access to privileged CPU ture, data intensive computing, and memory performance features. In Proc. the 10th USENIX Symposium on Ope- optimization. He is currently working on multi-core rating Systems Design and Implementation, October 2012, memory scheduling, partitioning and prefetching area to pp.335-348. improve multi-core memory access bandwidth utilization [31] Chen Y, Reymondjohnson S, Sun Z, Lu L. Shreds: Fine- and minimize access latency. grained execution units with private memory. In Proc. the 2016 IEEE Symposium on Security and Privacy, May 2016, pp.56-71. [32] Zitser M, Lippmann R, Leek T. Testing static analysis tools Zong-Hao Yang received his Bach- using exploitable buffer overflows from open source code. In elor’s degree in software engineering Proc. the 12th ACM SIGSOFT International Symposium from Dalian University of Technology, on Foundations of Software Engineering, October 2004, Dalian, in 2013. He is a Master student pp.97-106. from University of Chinese Academy [33] Carlini N, Barresi A, Payer M, Wagner D, Gross T R. of Sciences, Beijing. His main research Control-flow bending: On the effectiveness of control-flow interests include architecture, operating integrity. In Proc. the 24th USENIX Security Symposium, system, and security. August 2015, pp.161-176. 432 J. Comput. Sci. & Technol., Mar. 2020, Vol.35, No.2

Xiao-Jing Zhu received her Ph.D. Yun-Ge Guo received his M.E. degree in computer architecture from degree in computer technology from University of Science and Technology University of Chinese Academy of of China, Hefei, in 2008. She has been Sciences (CAS), Beijing, in 2018. He a researcher in Institute of Computing has done researches in Institute of Technology (ICT), Chinese Academy Computing Technology (ICT), CAS, of Sciences (CAS), Beijing. Her main Beijing. His main research interests research interests include operating include operating system and system system, and security. software.

Zong-Hui Hong received her M.S. degree in computer architecture from University of Chinese Academy of Sciences (CAS), Beijing, in 2018. She has done researches in Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing. Her main research interests include architecture, memory system, and security.