<<

Technical Report No. CE-J09-001, Dept. of Electrical Engineering, Princeton University, Jan. 30, 2009. INVISIOS: A Lightweight, Minimally Intrusive Secure Execution Environment ∗

Divya Arora Najwa Aaraj Anand Raghunathan Intel Corporation Princeton University Purdue University [email protected] [email protected] [email protected] Niraj K. Jha Princeton University [email protected]

Abstract 1 Introduction Security has long been the subject of extensive research Many information security attacks exploit vulnerabilities in in the context of computing and communication systems. “trusted” and privileged software executing on the system, such as This has led to developments of various “functional” security the operating system (OS). On the other hand, most security mech- measures [1,2] such as cryptographic algorithms and secure anisms provide no immunity to security-critical user applications if communication protocols. However, security attacks over the vulnerabilities are present in the underlying OS. While technologies years have made it abundantly clear that while functional se- have been proposed that facilitate isolation of security-critical soft- curity measures are necessary, they are not sufficient. Many ware, they require either significant computational resources and attacks target weaknesses in a system’s implementation and are hence not applicable to many resource-constrained systems, a system’s security is only as strong as the weakest link in its or necessitate extensive re-design of the underlying processors and implementation. hardware. A recurring theme among many security attacks is that In this work, we propose INVISIOS – a lightweight, minimally “trusted” code is hijacked at run-time – so even if the orig- intrusive hardware-software architecture to make the execution of inal code is not malicious by intent, it can be manipulated security-critical software invisible to the OS, and hence protected by clever attackers, resulting in unwarranted behavior, such from its vulnerabilities. The INVISIOS software architecture encap- as execution of malicious code, leakage of sensitive informa- sulates the security-critical software into a self-contained software tion, and corruption of program variables. While the sub- module, called the secure core. While the secure core is part of the version of a user application can cause a significant amount kernel and is run with kernel-level privileges, its code, data, and of damage, security breaches that exploit vulnerabilities in execution are transparent to and protected from the rest of the ker- the OS are especially dangerous. Modern OSs are huge and nel. The INVISIOS hardware architecture consists of simple add-on complex software, spanning tens of millions of lines of code, hardware components that are responsible for bootstrapping the se- making it extremely difficult to make them vulnerability-free. cure core, ensuring that it is exercised by applications in only per- Moreover, most commodity OSs are monolithic, implying mitted ways, and enforcing the isolation of its code and data from that the entire OS runs at the highest privilege level. Since the rest of the system. We implemented INVISIOS by enhancing the there is no privilege separation within the OS, compromise QEMU full-system emulator and Linux to model the proposed soft- of one part can easily bring down the entire system. ware and hardware enhancements, and applied it to protect a com- While it may not be feasible to secure all applications mercial cryptograhpic library. Our experiments demonstrate that from a compromised OS, it may be possible to achieve a INVISIOS is capable of facilitating secure execution at very small secure execution environment for selected, security-critical overheads, making it suitable for resource-constrained embedded applications (e.g., cryptographic libraries, key management systems and systems-on-chip. software, integrity verification kernels). Some work has been done towards the above objective, which involves isolation of the critical software from the rest of the system. The isola- tion may be based on physical separation, wherein the crit- ∗Acknowledgments: This work was supported by NSF under Grant No. ical software is executed on a physically separate CNS-0720110. and uses a separate memory (e.g., IBM co-processor [3]), or

1 logical separation, in which both the sensitive and untrusted • We implement a prototype to show the viability of our code run on the same processor, but are isolated, either us- approach. We incorporate several modifications in the ing an additional layer of software, such as virtualization, or Linux kernel and perform a comprehensive analysis of through additional hardware support (e.g., Intel TXT [4] and the penalties imposed by our processor architecture. We ARM TrustZone [5]). analyze the security of our architecture by running the In this paper, we present a lightweight, minimally intru- modified Linux kernel on a modified version of the sive secure execution environment based on the principle of QEMU processor emulator [7]. logical separation. Securing critical software through logi- The rest of this paper is organized as follows. We present cal separation is non-trivial since it cannot be achieved by a survey of relevant past work in Section 2. We describe the securing the code alone. Even the knowledge of a few in- details of the hardware and software components of INVI- termediate variables, which may be obtained by reading pro- SIOS in Section 3. We present our experimental methodol- cessor registers during context , can be used by an ogy and results in Section 4. We analyze the security of our attacker to infer the original sensitive data from which they architecture in Section 5 and conclude in Section 6. were computed [6]. We focus on providing a secure exe- cution environment for critical software while making min- 2 Related Work imally intrusive changes to the existing processor architec- ture. The proposed scheme is especially suitable in the con- Secure execution environments can be achieved by using text of SoCs, which are assembled using standard compo- an additional layer of software underneath the OS to virtual- nents (including processors), that are provided in the form ize the hardware platform. Despite significant advances in ef- of intellectual property (IP) cores. In such scenarios, it is ficient virtualization, they remain beyond the reach of many difficult for the SoC designer to make intrusive changes to resource-constrained embedded systems, which are the pri- the processor. Our proposal includes add-on hardware units mary target of our work. Moreover, the complexity of virtual which simply observe the address lines on the main proces- machine monitors (VMMs) has grown significantly to a point sor chip and base their checking on the observed addresses where there is concern about vulnerabilities in the VMM it- and certain pre-configured constants. They do not modify self [8]. the processor pipeline, datapath or other on-chip structures, Next, we review past work in the field of hardware- such as caches and register files. We also do not add any new assisted secure program execution. instructions. The contributions of our work are as follows: Researchers have proposed architectural enhancements to validate the control flow of an executing program at run-time • We propose a set of hardware enhancements to the pro- and flag deviations from the expected control flow as security cessor that can facilitate an execution environment that exceptions. The enhancements can be in the form of modifi- is invisible to, and protected from, the rest of the soft- cations to the processor pipeline [20], dedicated add-on hard- ware that executes on the processor. We refer to the ware checkers [9], and additions to the processor instruction above environment as the secure domain, while the non- set [10] that augment the existing instructions to perform ad- critical software is run in the normal domain. ditional checking. 1. The hardware enhancements consist of (i) a mem- Dynamic information flow tracking [11] is a hardware- ory access checker and a monitor, which monitor supported information flow tracker, which tracks the flow of the processor-to- bus, and the system memory bus spurious information and prevents it from violating a speci- respectively, in order to enforce memory separation be- fied security policy. Secure return address stack [12] is an- tween the secure domain and the normal domain, and other hardware technique which prevents the disruption of (ii) a secure bootstrap controller which authenticates normal control flow affected by a specific type of attack, and verifies the integrity of the sensitive code, and in- namely, stack overflow. Secure program execution frame- stalls it for execution in the secure domain. We assume work (SPEF) [13] is a combination of architectural and com- the critical software is available as a library, which pro- pilation techniques which ensure software integrity and pre- vides services to other applications. vent any code, which has not been explicitly installed, from 2. The software architecture of INVISIOS comprises being executed on the system. (i) a user-level stub that provides the same application The above techniques are primarily designed to protect a programming interface (API) as the secure library, en- benign application from its own vulnerabilities. However, codes the API calls, and makes an appropriate system they cannot protect this application, if the vulnerabilities are present in the OS code which lead to compromising the OS. call, (ii) a kernel-level stub which interfaces between the user stub and the actual library code, and (iii) a se- In light of the above limitations and the fact that modern cure wrapper that provides single entry/exit points into OSs abound in security vulnerabilities, architectures that of- the library, and makes the secure library self-contained fer stronger protection are required for security-critical soft- and invisible to the rest of the system. ware, such as cryptographic libraries, digital rights manage-

2 ment software, etc. Some architectures base their security on privileges, but is protected from the rest of the kernel by the physical separation, isolating the critical software from the INVISIOS hardware. In short, the kernel code is a superset rest of the system and thus protecting it from the weaknesses of the code executing in the secure domain. of the latter. For example, smartcards are often used to exe- 3.1 Threat Model cute small computations [14], which involve the use of sen- sitive code or data. At the higher end of the spectrum are the We assume that the critical code that needs to be secured high-performance secure co-processors which are capable of is available as a software library, secure lib, and only executing general purpose code and include accelerators for this library needs to be protected by INVISIOS. The applica- cryptographic operations [3,15]. tions that use secure lib reside in the non-secure domain It is also possible to create a secure execution environment and can invoke the library functions through a well-defined for critical software based on hardware- and virtualization- API. We further assume that the code of secure lib is supported logical separation. In this case, the critical soft- trusted, benign, and small enough to be thoroughly checked ware is executed on the same processor as the OS and other and made free of security vulnerabilities. untrusted applications, but the support of hardware or vir- secure lib is encapsulated into secure core which tual machine monitors is used to maintain a separation be- is executed in the secure domain. All user applications tween the two. Enhanced processor architectures, such as and the kernel except secure core are untrusted and ex- XOM [16] and AEGIS [17] revamp the to ecute in the normal domain. The objective of INVISIOS protect applications in the presence of untrusted memory and is to ensure that the code and data of secure core are OS. The attack model assumes that the contents of the mem- safeguarded from all entities executing in the normal do- ory can be easily read or modified by an adversary and that a main. For example, it prevents the malicious overwrite of compromise of the OS can lead to breakdown of the separa- the secure core code by, say, a compromised OS. In ad- tion between different processes. dition, the data of secure core, including globals, heap Architectures, such as overshadow [18], aim at protect- and stack variables, and temporary values stored in the pro- ing the privacy and integrity of application data even when cessor registers during execution, cannot be read or modified the OS is compromised. They use the concept of multi- by any instructions that do not belong to secure core. shadowing, which gives a different view of the actual ma- INVISIOS does not protect secure lib against phys- chine addresses for different processes. Different views are ical attacks that are launched after the library has been achieved through the encryption and decryption of mem- loaded. The bootstrap unit (described later) verifies that ory pages. Secret-protected (SP) architecture [19] and ARM secure core has not been corrupted before loading. The Trustzone [5] aim for a more modest target of a single secure remaining hardware protects against run-time software-based compartment in the processor with fewer changes to the ar- attacks. chitecture. INVISIOS has a similar objective to SP and Trust- Note that INVISIOS does not limit what secure core zone in the sense that it also creates a single secure compart- can do. Therefore, if secure lib itself is malicious or ment with a small amount of hardware support and few mod- vulnerable, then INVISIOS cannot preserve the boundary ifications to the kernel. However, unlike these approaches, between the secure and normal domains. The remaining a primary objective of INVISIOS is to make minimally in- sections describe the modifications in the kernel and en- trusive changes to the processor architecture. Therefore, un- hancements to the processor architecture that are required to like the above techniques, it does not introduce any new in- achieve the above objectives. structions in the processor instruction set architecture (ISA) 3.2 INVISIOS Architecture Overview or modify the tags of on-chip caches and translation looka- side buffers (TLBs). The secure compartment in INVISIOS The goal of INVISIOS is to provide a secure, isolated protects code from unwarranted modification and protects the execution environment for security-sensitive software while program data from both unwarranted disclosure and modifi- making minimally intrusive changes to the processor. There- cation. The privacy of program code (which can be achieved fore, no new instructions, cache tags, TLB tags, or modifica- through encryption) is not an objective of INVISIOS. tions to the datapath are introduced. The INVISIOS hard- ware modules base their checking on the addresses of in- struction and data accessed by the processor. The software 3 INVISIOS Architecture architecture of INVISIOS also makes relatively few changes As mentioned previously, INVISIOS segregates the exe- to the core OS and is transparent to applications that do not cution on the processor into two domains: secure and nor- use secure lib. mal. These domains are different from the privilege levels The core idea behind the design of INVISIOS is to create of a program, namely, user and kernel. The software run- a self-contained software module, namely, secure core ning in the secure domain, referred to as secure core is (which encapsulates the code and data to be protected), load incorporated in the OS kernel and executes with kernel-level it at a fixed physical address in memory [non-accessible by

3 direct memory access (DMA)] and use hardware support to cal addresses after the address translation is completed by control access to the memory range reserved for the core. the MMU, resident between the processor and the caches. Essentially, the hardware acts as a gatekeeper, opening and An alternative translation scheme, using virtually-indexed, closing access to the secure memory range based on certain physically-tagged caches, could use the virtual address to conditions and events. index into the cache and the translated physical address to Figure 1 depicts the high-level view of the INVISIOS verify the tag. Even in this scheme, a complete physical ad- hardware and software architecture. On the software side, dress is available before the cache access is considered valid. the kernel, secure core, and the application are loaded Therefore, the above cache access scheme is compatible with into memory. secure core is part of the kernel and is INVISIOS. INVISIOS is not compatible with virtually ad- loaded at a known physical memory address. On the hard- dressed caches which are accessed using virtual addresses ware side, the figure shows the main processor, along with and MMU translation is performed only in case of a cache on-chip instruction and data caches, and the memory man- miss. However, this is not a major limitation since virtually agement unit (MMU). The processor and other system pe- addressed caches are not commonly used as they create ad- ripherals can access the off-chip main memory via the system dress aliasing problems (multiple virtual addresses mapping memory bus. The INVISIOS hardware architecture is shown to same physical address). within the dotted lines. In this section, we describe how a given software library, secure lib, is incorporated into the INVISIOS software Secure Application + library + Kernel architecture. We assume that applications can avail of the Software services of secure lib through a well-defined API ex- Hardware Instruction ported by the library. In INVISIOS, the application runs

I−$ in user space, whereas secure lib runs in kernel space. VA PA Shared memory is used to pass application arguments to li- Processor MMU Data brary functions and to return results from the latter. Usually, System VA PA D−$ memory the sensitive code which needs to be secured is provided in the form of a user-space library. For example, an applica- Disable R/W NMI tion may be partitioned, manually or using data-flow analysis tools, to segregate security-sensitive modules. secure lib Bootstrap unit peripherals needs to be partitioned into two modules – one which runs in

Checker State Bus the normal domain (in user or kernel space), and is referred Monitor normal secure lib INVISIOS hardware architecture to as , and the other which executes in the INVISIOS secure domain (which resides in the kernel) and is called invisios secure lib. The parts of the Figure 1. INVISIOS hardware and software ar- original library, which go into invisios secure lib, chitecture may need to be explicitly ported into the kernel. Note that secure core invisios secure lib The virtual and physical address lines for both instructions is a superset of . and data are pulled out and fed to the INVISIOS hardware As mentioned before, a fixed physical memory ad- which uses them to enforce the requisite access control. Any dress range is reserved for secure core, which is pro- violation detected by the INVISIOS hardware resets its state, tected from user applications and the rest of the kernel disables memory access for the current access and results in by the INVISIOS hardware. secure core comprises a non-maskable interrupt (NMI) to the processor. A benign invisios secure lib along with secure wrapper OS can respond to this interrupt by terminating the offending (described below). Every entry into and exit from application. In case the OS itself has been corrupted, the secure core represent a transition between the two ex- result is indeterminate except that no software (not even the ecution domains and must be monitored and validated by the OS) can access the secure memory range since access to it is hardware. closed by the hardware. Any legitimate software wishing to Figure 2(a) shows the case when an application is linked use secure lib after a violation detection will need to re- with secure lib to produce a regular binary. Calls from initialize the INVISIOS hardware and secure core. The the application to the library are regular function calls and INVISIOS hardware is described in Section 3.4. do not require OS intervention. Control is transferred to Note that Figure 1 shows a processor with separate L1 in- the OS from the application in three cases, as depicted by struction and data caches. L2 cache has been omitted from the block arrows: (a) when the application makes system the figure for ease of illustration and the INVISIOS archi- calls, (b) when an exception is generated by the processor tecture is independent of it. Also, the figure shows phys- (e.g, divide by zero error), and (c) when a hardware inter- ically addressed caches, which are addressed using physi- rupt (e.g, timer interrupt) is delivered to the processor. A

4 corrupted OS can take advantage of any of these control stored return address. In case of a new call, the transfers to read/corrupt security-sensitive data belonging to secure entry code switches the processor stack secure lib. pointer to the beginning of the secure stack region (re- 3.3 INVISIOS Software Architecture served for secure core), and invokes the appropriate function in invisios secure lib. Since the OS is untrusted, any control flow or data assign- secure intr handler is invoked when an inter- ment based on the results of a system call are also unreliable. rupt or exception occurs while secure core is ex- Therefore, invisios secure lib consists only of those ecuting. Its primary functions are to save the context parts of secure lib that do not make anysystem call. Any of secure core, zeroize registers used for holding kernel subroutine invoked by invisios secure lib temporary values, create a fake interrupt context, restore needs to be replicated in the latter. Cases (b) and (c) men- the original interrupt handlers and jump to the original tioned above, in which control is transferred to an untrusted handler corresponding to the current interrupt of excep- OS, are handled by various components of the INVISIOS tion. Essentially, secure intr handler just per- software architecture. forms some cleanup functions before letting the original Figure 2(b) shows the three main components of the IN- handler handle it. The fake interrupt context is created VISIOS software: in such a way that the original interrupt handler believes • user stub: This is a small piece of code which ex- that the interrupt came when the program was executing poses the same API as secure lib and is linked with in kernel stub and returns control to the latter after the application to produce an executable for INVISIOS. completing its function. kernel stub then jumps to It transforms the library API calls into a system call with the single point of entry into secure wrapper, and appropriate parameters. In the simplest case, the list of execution resumes. parameters comprises the name of the API call and the arguments passed to it by the application. Therefore, in- The secure exit code cleans up the state set up by vocation of library calls in the binary results in system secure entry. It resets the saved secure core calls, which transfer control to kernel stub. context, restores the original interrupt handlers and returns to kernel stub, which, in turn, re- • kernel stub: This consists of functionsthat copy ar- turns to user stub. invisios secure lib guments from the user space into kernel space, transfer does not invoke any outside kernel functions, and control to secure wrapper and copy the results re- all interrupts and exceptions occurring during turned by the latter back into user space before returning the execution of secure core are caught by to user stub. secure intr handler before passing control to • secure wrapper: This is the code which encapsu- the original handlers. Thus, the OS is oblivious of lates invisios secure lib and is the main soft- secure core and only knows about kernel stub ware component responsible for making execution in- to which it returns control after the original handler has visible to the OS. secure wrapper has a single point completed its job. for entry and a known, finite number of points of exit. In all the components described above, interrupts are dis- Functions in invisios secure lib can only be abled and re-enabled as necessary so that an interrupt during called by secure wrapper. their operation does not affect the correctness or security of secure wrapper further consists of three sub- the system. secure wrapper needs to be part of the ker- components: secure entry, secure exit, and nel since it changes interrupt handlers, which involves the use secure intr handler. The secure entry of privileged instructions. The three-layered software archi- code saves the caller context (namely, the context of tecture facilitates a smooth transition from user stub run- kernel stub), and changes the address of all the ning with minimal privileges to secure core which runs processor interrupt and exception handlers to point with the highest privileges as part of the kernel, but is segre- to the start of secure intr handler. Conse- gated in memory from the rest of the kernel and is protected quently, any interrupts and exceptions received by from it with the help of the INVISIOS hardware. the OS during the execution of secure core are INVISIOS trades off flexibility for simplicity. The fact redirected to secure intr handler. Finally, the that secure core lies within a single contiguous mem- secure entry code checks if this is a new call into ory range simplifies the hardware checker greatly. However, secure core or whether control has been returned it also imposes several restrictions on the code contained in to it after a prior interrupt. In the latter case, it re- secure core. Since it has to be self-contained and be lim- stores the context of secure core (which was stored ited to a certain memory range, secure core cannot call by secure intr handler when secure core an external dynamic memory allocator. Therefore, we imple- was last interrupted) and continues execution from the ment a simple memory allocator in secure core which al-

5 Application Library API Application Library API secure_lib user_stub

Binary Binary

System calls System calls Library calls

OS OS kernel_stub

secure_core Processor Processor Exceptions Interrupts Exceptions Interrupts

Devices Devices

secure_wrapper Single entry & multiple pre−defined exits invisios_secure_lib (a) (b) Figure 2. (a) Regular software architecture for applications using secure library, and (b) INVISIOS soft- ware architecture

Table 1. State registers in INVISIOS Description

Kpub Public key of the certifying authority permitted to sign software for INVISIOS Rcommand Used to command the bootstrap unit to install a new library Rstart address Start address in memory of the new library to be installed by the bootstrap unit Rinit state Set after the bootstrap unit has authenticated the secure library and initialized other registers Rentry point Address of the single point of entry into secure core Rexit interrupt Address of the instruction at which secure core exits on an interrupt Rexit final Address of the instruction at which secure core exits after completing its operation Rsecure state Set if the processor is executing code in secure core. Access to the secure memory region by the processor is permitted at this time Rhigh address Upper bound of the address range in which secure core is contained Rlow address Lower bound of the address range in which secure core is contained locates memory out of a memory pool which is reserved for non-volatile memory and is used for initial authentication of it and returns freed memory to the pool. secure core in- secure core. Onlya secure core that is signed by the terfaces with the application through shared memory which private part of the key pair, Kpvt, can be installed on the IN- is assumed to lie in the normal domain and can be of un- VISIOS system. Kpvt is known to a certifying authority, to restricted size. Also, some memory pages are statically re- which software vendors desiring to create an INVISIOS core served for the program stack of the core which is managed can send their code for signing. by secure wrapper. We recognize that this is a limita- A description of the state registers is given in Table 1. At tion since it is hard to bound the stack size of an applica- any stage, any of the active functional units of the INVISIOS tion. Work is ongoing to permit contiguous physical memory hardwarecan report a violation. This results in an NMI to the pages to be dynamically added and removed from the mem- processor and resets registers Rinit state and Rsecure state. ory range reserved for the core, in a secure fashion. The main functional units of INVISIOS hardware are illus- 3.4 INVISIOS Hardware Architecture trated in Figure 3 and described in the following subsections. The INVISIOS hardware comprises functional units that 3.4.1 Bootstrap unit perform various kinds of checking and maintain its state in- The bootstrap unit is responsible for installing a new formation. The state is stored in programmable registers secure core (and, therefore, a new secure lib) in the which can be set only by other INVISIOS functional units. protected memory. It authenticates secure core, veri- In addition, the hardware comes pre-programmed with the fies its integrity, and initializes the hardware registers with public part of a public-key pair, Kpub. The key is stored in parameters provided in the core. The OS copies the con-

6 Instr. VA Data VA Violation => NMI System bus Instr. PA Check virtual Bootstrap unit K pub to physical (Authentication translation Hard−wired state & integrity Data PA init Instr. verification) addr. R command R start_address FSM R init_state R entry_point Memory access R secure_state Bus monitor checker R high_address Checking engine R low_address R exit_interrupt Programmable state R exit_final

Figure 3. INVISIOS hardware architecture tents of secure core into physical memory at address • FSM to enforce the secure core interface: The core start addr, writes the same address to register FSM is a small which ensures that the entry Rstart address, and directs the bootstrap unit to begin instal- into and exit from secure core can only happen at lation via Rcommand. The core is not usable until it has been permitted addresses. For example, it checks that the ex- explicitly installed by the bootstrap unit. ecution of secure core can begin only at a fixed ad- secure core is created as described in the previous dress programmed into Rentry point. If a program tries section. Its header contains the size of the image and the to jump into secure core at any other address, the hash of the image signed using Kpvt. The bootstrap unit FSM reports a violation. Once the program has entered computesthe hash of the core, decrypts the signed hash using secure core, the FSM sets Rsecure state indicating Kpub and compares the two to verify the authenticity and in- that the execution is in the secure domain and access to tegrity of loaded secure core. If the above operations go all memory is permitted at this time. Similarly, the FSM through without an error, it asserts the init signal to the finite- ensures that the exit from secure core can only be state machine (FSM, described below), and initializes regis- at one of the two exit points defined above. ters Rentry point, Rexit interrupt, Rexit final, Rhigh address The FSM has three states: (i) START, which is the ini- and Rlow address, with values specified in the core image. tial state of the FSM where secure core is not yet usable The value in Rentry point denotes the address corre- and needs to be verified by bootstrap unit before use, sponding to the single point of entry into secure core. (ii) INIT, which is the state to which the FSM goes Registers Rexit final and Rexit interrupt store the two exit after initialization by the bootstrap unit and after exit points from secure core: the former stores the ad- from secure core, and (iii) SECURE, which is the dress of the last instruction in the core after which it state of the FSM when the secure core is executing. The returns to kernel stub, having completed its opera- proposed scheme has a very simple FSM that imposes tion; the latter stores the address of the instruction in minimal checks. However, it can be easily augmented secure intr handler which transfers control to the with more complex checking mechanisms, such as those original handler, when an interrupt (or exception) occurs proposed in [9,20], to incorporate complete control flow during the operation of secure core. Rhigh address and checking of secure core. Rlow address designate the memory range reserved for the • Virtual to physical address translation checker: In core which is protected by the checker. After initializationof processors, such as XOM and SP, secure versions of the registers, the bus monitor is activated. load/store instructions accompanied by cache and TLB tags are used to ensure that a program is writing to or 3.4.2 Checking engine reading from memory in its own secure compartment. If the wants to share data with another process, The checking engine is the primary hardware module which it writes to a special compartment which is untagged secure core controls access to . There are no separate and unchecked. commands to instruct the checking engine to start or stop checking for access violations. The checking engine is “al- In INVISIOS, there are no special instructions to read ways on” after the initialization phase. It has three major from or write to the protected memory range. Also, components: secure core is permitted to access both regular and

7 protected memory. Most of its writes are to protected 4.1 INVISIOS Software Architecture memory except when it is writing back the results of We implemented the INVISIOS software architecture in computation to the shared memory so that they can be Linux kernel version 2.6 for the architecture. In our im- accessed by the calling application. plementation, kernel stub is a Linux device driver which Consider the scenario when a compromised OS alters interfaces with applications using regular open, close, the virtual to physical address translation, thereby trick- read, write and mmap system calls. kernel stub pre- ing secure core into writing to unprotected physical processes arguments and jumps to the single point of entry memory under the false assumption of security. In order in secure wrapper. secure wrapper performs the to prevent such an attack, a checker is required, which functions describe in Section 3.3 and calls the appropriate verifies the address translation. function in invisios secure lib. secure lib is a commercial cryptographic library In the user mode, a virtual address in the executing pro- CLIB 1 which was ported to the kernel. It supports a wide gram can map to any physical address and these map- range of cryptographic algorithms including symmetric key pings are stored as entries in the page table. Therefore, algorithms (DES, 3DES, AES, RC4), one-way hash func- verifying the address translation of code running in the tions (MD5, SHA-1), public key algorithms (RSA, Diffie- user space would require storing the entire page table Hellman, Elliptic curve), and digital signature algorithms in hardware. However, in INVISIOS, secure core (DSA, RSA, Elliptic curve). runs in kernel mode. In kernel mode, the physical ad- The original library, CLIB, is structured in such a way dress is at a fixed offset from the virtual address. There- that its core runs in a different address space from the call- fore, the hardware can verify the translation by a sim- ing application. The execution follows a client-server model, ple addition rather than comparing it against pre-stored in which the core of the library runs as a server, waiting page table entries. for application requests. The CLIB client consists of the application linked with a stub library which exposes the • Memory access checker: This functional unit prevents CLIB API. Shared memory is used to pass arguments and any code from outside the secure core from accessing results between the client and the server, and semaphores any word lying in the protected memory range. It val- are used for synchronization between the two. In INVI- idates each load and store instruction executing on the SIOS, the server code is migrated to the kernel and forms processor and allows or disallows access based on the invisios secure lib. The calls in the client code state of the INVISIOS hardware. If R is set, secure state which generate computation requests are converted to calls the checker permits the processor to access any mem- to the INVISIOS device driver. The device driver acts as a ory region. However, if R is reset, and the secure state communication link between the client running in user space processor attempts to access memory lying between the and invisios secure lib running in kernel space in address in R and R , the checker low address high address the INVISIOS secure domain. reports a violation and sends an NMI to the processor. As mentioned before, memory space is statically allocated for the data and stack of secure core. Physical mem- 3.4.3 Bus monitor ory pages are reserved for the core and pinned along with the rest of the kernel. We use a simple dynamic memory The bus monitor is a watchdog module which prevents po- allocator to dynamically allocate and free memory from a tentially malicious bus peripherals from reading/writing to fixed pool. Linker scripts were used to generate a kernel the protected memory range. The checking engine provides image which has a contiguous memory range reserved for similar protection, but its protection extends only to code ex- secure core. Kernel symbol table information was ex- ecuting on the processor. The bus monitor extends this pro- tracted to determine the addresses of INVISIOS entry and tection to peripherals which can access memory using DMA. exit points, which were used as configuration parameters for It monitors the traffic on the system memory bus and flags the INVISIOS hardware. accesses to the protected memory by any peripheral as a se- In order to achieve speedy switching between interrupt curity violation. The offending bus transaction is aborted as handlers in secure wrapper, two copies of the interrupt a result of this, and the INVISIOS registers and memory state descriptor table (IDT) were maintained. The address of the are reset. The bus monitor is activated by the bootstrap unit IDT is stored in a called IDTR (interrupt and is also “always on,” like the checking engine. descriptor table register), whose contents are modified by secure wrapper to change interrupt handlers. In case of 4 Experimental Methodology and Results interrupts, the x86 processor pushes the contents of eFlags, CS (code segment) and EIP (instruction pointer) registers In this section, we present details of our experimental 1Name changed to protect vendor. framework for implementing and testing INVISIOS.

8 onto the stack before jumping to the appropriate interrupt 3. Modification of FSM registers. handler. In case of exceptions, such as page fault, stack exception, etc., an additional error code is pushed onto the All malicious accesses to INVISIOS memory space were stack by the processor. secure intr handler imple- caughtand flagged by the hardwarechecker added to QEMU. ments different functionalities for different interrupts and ex- Hardware modeling was done for the purpose of func- ceptions. After zeroizing processor registers and performing tional testing only. Performance estimates for INVISIOS other cleanup operations, it creates a fake interrupt/exception hardware could not be obtained due to lack of availability context as expected by the original handler before jumping of cycle-accurate simulators which can execute an entire OS. to the latter. However, one of the main considerations in the design of The modified kernel was run on a Pentium 3.4GHz laptop. INVISIOS was simplicity. Since the INVISIOS hardware is Test applications were selected from CLIB’s regression test composed of simple checkers, they can perform the checking suite and run on top of the modified kernel. Both the kernel in parallel with program execution without causing notice- and test applications were compiled with “-o2” optimization able performance penalties. We do not expect a performance flag. The test programs exercise various parts of CLIB, in- impact in addition to the impact of the INVISIOS software cluding symmetric key encryption, cryptographic hash com- architecture presented in Section 3.3. putation, and public key operations. Figures 4(a), (b) pertain to hash computation using SHA-1 and MD5 hash algorithms 5 Security Analysis for varying data sizes, Figure 4(c) presents data for RSA key generation with varying sizes for the modulus, and Fig- In this section, we analyze the security of our architec- ures 4(d–f) illustrate the performance of AES CBC (cipher- ture and show how it blocks attacks against the secure library block chaining), and 3DES CBC and ECB (electronic code- that is being protected. In our architecture, the code of the book) encryption and decryption. untrusted application is inherently protected from modifica- The time measurements were made using the Pentium tion. However, an attack can target vulnerabilities in the OS time-stamp , which is a 64-bit register, incremented to access the secure core memory or to cause the INVISIOS on every clock cycle. It maintains an accurate count of the to leak confidential data by storing it in unprotected physi- processor cycles and can be accessed using the rdtsc in- cal memory . Other attacks could use DMA accesses, bus struction. The average performance penalty in the tested peripherals, and page faults to read or write to the protected benchmarks was 1%. memory range. 4.2 INVISIOS Hardware Architecture We first discuss the various forms of attacks aimed at accessing the secure core memory. These attacks work by The INVISIOS hardware was modeled on a full-system (i) maliciously modifying the OS virtual-to-physical address emulator QEMU [7]. QEMU is a multi-processor emulator translation mechanism (Attack 1), (ii) accessing protected (i.e., it supports multiple host and target processors) which memory from code lying outside the secure core (Attack 2), achieves good emulation speed using dynamic binary trans- (iii) maliciously modifying bus peripherals to access secure lation. In our study, we incorporated several modifications in memory through DMA (Attack 3), (iv) exploiting page faults the x86 emulation engine in order to include the INVISIOS to page out memory pages belonging to the secure core (At- hardware, as follows: tack 4), or (v) exploiting vulnerabilities in the secure library 1. Addition of registers Rentry point, Rstart address, to leak information (Attack 5). Rentry point, Rhigh address, and Rlow address. Attack 1 is effectively blocked by the virtual-to-physical 2. Addition of a hardware checker, which implements address translation checker, which, instead of storing the en- virtual to physical address translation checking and tire page table in hardware and updating it upon page faults, memory access checking against the values stored in verifies the translation by a single offset addition. Attack 2 Rlow address and Rhigh address. is also blocked by the memory access checker, which vali- We tested the functional correctness of INVISIOS by run- dates each load and store instruction executing on the pro- ning the modified Linux kernel (described above) on QEMU. cessor and disallows access to the secure memory core (i.e., The INVISIOS hardware implemented within QEMU did not any memory address between the value of Rlow address and report false positives for any benchmark. We also created that of Rhigh address) from outsidethe core if Rsecure state is malicious modifications in the kernel to access INVISIOS set. A more sophisticated type of attack exploits DMA to ac- memory when the processor was not executing in the secure cess secure memory. This attack (Attack 3) is circumvented domain. Malicious modifications of the kernel include: in two ways: (i) through the bus monitor, which flags access 1. Modification of virtual to physical memory mappings in by peripherals to protected memory as a violation, and (ii) the mm/ Linux folder. by specifying that the memory region between Rlow address R 2. Accessing memory addresses lying between the ad- and high address is not accessible throughDMA. Attack 4 is dresses in registers Rlow address and Rhigh address. circumvented by pinning the secure physical memory along

9

300000 300000

¨     

§ ¨ BASE © CASE BASE CASE

¨ ¡      250000 § ¨ © 250000 INVISIOS INVISIOS

200000 200000

150000 150000 # cycles # cycles

100000 100000

50000 50000

0 0

¡ ¢ £ ¤ ¥ ¦      256 512 768 1024 1280 1536 1792 256 512 768 1024 1280 1536 1792 Data size (B) Data size (B)

(a) SHA-1 hash (b) MD5 hash

4000000

BASE CASE - Decryption 3500000

3400000 INVISIOS - Decryption

1

4

2 3 2 5 6

BASE CASE BASE CASE - Encryption

1

4 2 5 7 2 3 3000000 3200000 INVISIOS INVISIOS - Encryption

3000000 2500000

2800000

2000000

0 /

2600000 # cycles

.

,

- ,

+ 1500000 2400000

2200000 1000000

2000000 500000 1800000

0

1600000

  

768 1024 1152 1280 1 256 2 3 512 4 5 1024 6 7 2048 8 9 4096 1011121314 8192 16384

& '() 



% %

! "#$ !*  Data size (B)

(c) RSA key generation (d) AES CBC encryption and decryption

8000000 6000000

BASE CASE - Decryption BASE CASE - Decryption 7000000 INVISIOS - Decryption 5000000 INVISIOS - Decryption BASE CASE - Encryption BASE CASE - Encryption 6000000 INVISIOS - Encryption INVISIOS - Encryption 4000000 5000000

4000000 3000000 # cycles # cycles

3000000 2000000

2000000

1000000 1000000

0 0 1 256 2 3 512 4 5 1024 6 7 2048 8 9 4096 1011121314 8192 16384 1 256 2 3 512 4 5 1024 6 7 2048 8 9 4096 1011121314 8192 16384

Data size (B) Data size (B)

(e) 3DES CBC encryption and decryption (f) 3DES ECB encryption and decryption

Figure 4. Performance results for INVISIOS

10 with the kernel, thus, the memory used by the secure core [8] J. McCune, B. Parno, A. Perrig, M. Reiter, and A. Seshadri, cannot be paged out. If an OS handler tries to swap out a “Minimal TCB code execution,” in Proc. Int. Symp. Security secure core page and replace it by another, the memory and Privacy, May 2007, pp. 267–272. access handler is able to detect it as an illegal access. [9] D. Arora, S. Ravi, A. Raghunathan, and N. K. Jha, “Se- INVISIOS does not aim to protect against Attack 5, since cure embedded processing through hardware-assisted run- it assumes that the code in secure core is trusted and time monitoring,” in Proc. Design, Automation and Test in verified to not leak sensitive information. INVISIOS allows Europe Conf., Mar. 2005, pp. 178–183. secure core to read and write outside the secure memory, [10] M. Budiu, Ulfar´ Erlingsson, and M. Abadi, “Architectural sup- which is necessary for the secure core to communicate with port for software-based protection,” in Proc. Wkshp. Architec- the calling application. However, this is a common limitation tural and System Support for Improving Software Dependabil- of all secure execution environments, including those based ity, Oct. 2006, pp. 42–51. on virtualization or extensive processor enhancements. [11] G. E. Suh, J. Lee, and S. Devadas, “Secure program execution via dynamic information flow tracking,” in Proc. Int. Conf. Architectural Support for Programming Languages and Op- 6 Conclusions erating Systems, July 2004, pp. 85–96. In this paper, we presented INVISIOS, a hardware- [12] J. P. McGregor, D. K. Karig, Z. Shi, and R. B. Lee, “A pro- software architecture for providing a secure execution envi- cessor architecture defense against buffer overflow attacks,” in ronment for security-critical software. INVISIOS makes the Proc. Int. Conf. Information Technology: Research and Edu- execution of critical software invisible to the OS and thus cation, Aug. 2003, pp. 243–250. makes the former immune to vulnerabilities of the latter. The [13] D. Kirovski, M. Drini´c, and M. Potkonjak, “Enabling trusted INVISIOS software architecture requires very few changes software integrity,” in Proc. Int. Conf. Architectural Support in a commodity OS. The hardware enhancements entailed by for Programming Languages and Operating Systems, Oct. INVISIOS are non-intrusive (requiring no new instructions, 2002, pp. 108–120. cache/TLB tags or changes in the datapath) and result in a [14] M. J. Atallah and L. Jiangtao, “Enhanced smart-card based very simple checker design. We have implemented INVI- license management,” in Proc. Int. Conf. E-Commerce, June SIOS by runninga modified Linux kernelon an x86 emulator 2003, pp. 111–119. modified to include INVISIOS hardware. This, along with [15] J. G. Dyer, M. Lindemann, R. Perez, R. Sailer, L. van Doorn, our performance measurements on an actual x86 platform, and S. W. Smith, “Building the IBM 4758 secure coproces- indicates that the proposed architecture provides an effective sor,” IEEE Computer, vol. 34, pp. 57–66, Oct. 2001. way to achieve a secure execution environment on a regular [16] D. Lie, C. A. Thekkath, M. Mitchell, P. Lincoln, D. Boneh, processor, at reasonable overheads. J. C. Mitchell, and M. Horowitz, “Architectural support for copy and tamper resistant software,” in Proc. Int. Conf. Archi- tectural Support for Programming Languages and Operating References Systems, Nov. 2000, pp. 168–177. [17] G. E. Suh, D. Clarke, B. Gassend, M. van Dijk, and S. De- [1] B. Schneier, Applied Cryptography: Protocols, Algorithms vadas, “AEGIS: Architecture for tamper-evident and tamper- and Source Code in C. John Wiley and Sons, 1996. resistant processing,” in Proc. Int. Conf. Supercomputing, June [2] W. Stallings, Cryptography and Network Security: Principles 2003, pp. 160–171. and Practice. Prentice Hall, 1998. [18] X. Chen, T. Garfinkel, E. C. Lewis, P. Subrahmanyam, C. A. [3] S. W. Smith and S. Weingart, “Building a high-performance, Waldspurger, D. Boneh, J. Dwoskin, and D. R. K. Ports, programmable secure ,” Computer Networks, “Overshadow: A virtualization-based approach to retrofitting vol. 31, no. 9, pp. 831–860, Apr. 1999. protection in commodity operating systems,” in Proc. Int. Conf. Architectural Support for Programming Languages and [4] LaGrande technology preliminary architecture specification. Operating Systems, Mar. 2008, pp. 2–13. Intel Publication no. D52212, May 2006. [19] R. B. Lee, P. C. S. Kwan, J. P. McGregor, J. Dwoskin, and [5] ARM TrustZone Technology Overview. http://www. Z. Wang, “Architecture for protecting critical secrets in micro- arm.com/products/CPUs/arch-trustzone. processors,” in Proc. Int. Symp. , June html. 2005, pp. 2–13. [6] N. R. Potlapally, A. Raghunathan, S. Ravi, N. K. Jha, and [20] T. Zhang, X. Zhuang, S. Pande, and W. Lee, “Anomalous path R. B. Lee, “Satisfiability-based framework for enabling side- detection with hardware support,” in Proc. Int. Conf. Compil- channel attacks on cryptographic software,” in Proc. Design, ers, Architectures and Synthesis for Embedded Systems, Oct. Automation and Test in Europe Conf., Mar. 2006, pp. 18–23. 2005, pp. 43–54. [7] “QEMU open source processor emulator.” http://fabrice.bellard.free.fr/qemu/

11