HARES: Hardened Anti-Reverse Engineering System

Technical Whitepaper

Jacob I. Torrey
Assured Information Security, Inc.
@JacobTorrey

ABSTRACT

This paper provides a technical overview of the HARES software protection research effort performed by Assured Information Security. HARES is an anti-reverse-engineering technique that uses on-CPU encryption [6] in conjunction with Intel x86 TLB-splitting [11] to significantly increase the effort required to obtain the clear-text assembly instructions that comprise the target x86 application. Performance and use-cases of the system are presented, and a number of weaknesses and directions for future work are discussed. Related works are compared and contrasted with HARES in order to highlight its improvements over the current state-of-the-art.

Keywords

Security, Obfuscation, Encryption, Split-TLB, AES, TRESOR

1. INTRODUCTION

This paper provides details on the design and implementation of the research effort HARES. HARES provides a proof-of-concept capability to transparently execute fully-encrypted binaries on a standard Intel CPU with minimal performance impact. There are a number of use-cases for such a capability: protecting algorithmic IP, providing resistance to mining for ROP gadgets, and significantly increasing the difficulty of "weaponizing" a crash-case into an RCE. Background is provided, followed by details on the HARES design and an evaluation of the proof-of-concept. Finally, related works and conclusions are presented.

2. PROBLEM STATEMENT

Current state-of-the-art program protections are vulnerable to sufficiently motivated attackers and reverse-engineers. As a result of these weaknesses, sensitive algorithms can be copied and vulnerabilities in software exploited.

2.1 Background

This research builds upon numerous past research efforts and technologies. The following subsections provide the requisite background for a generally technical reader to fully understand the methodology of HARES.

2.1.1 Intel x86 Architecture

HARES supports the modern Intel x86 CPU architecture (Core-i 4000 series and newer), a modified-Harvard CISC instruction set architecture commonly found on today's desktop and laptop computer systems. The following paragraphs present a deep-dive into certain features of the x86 architecture which are leveraged by HARES in order to provide anti-reverse-engineering protections.

Privilege Rings. Initially implemented in the Intel 80286, and extended more fully with the 80386, privilege rings and "protected mode" provide a level of protection from misbehaving applications. Prior to protected mode, the CPU operated in "real mode", where every application had full access to the entire memory space and the ability to communicate directly with hardware. If a real mode application were to overwrite a critical portion of memory storing the operating system (OS), there were no protective mechanisms in place to prevent it from doing so. In protected mode, the OS kernel could load itself into a more privileged "ring" than the applications it was supporting and limit the applications' ability to impact the system as a whole. There are four "official" privilege rings (0 to 3) and two or three other modes of execution that are typically referred to as rings −1, −2 and −3. Modern OSes typically utilize only two of these rings: ring 0 for the privileged "kernel-mode" OS routines, scheduler and interrupt handlers, and ring 3 for the "user-mode" applications, which are limited in their abilities to execute certain instructions, access protected memory regions, interact with the hardware, and generally impact the system as a whole [4].

As the x86 architecture has evolved and the security community's scrutiny of its low-level implementation has increased, two more CPU modes have been informally called rings: hypervisor (VMM) root mode (ring −1) and system management mode (ring −2). Ring −3 is also used when referring to Intel's Active Management Technology (AMT), but is not relevant for this research. Ring −1 is more privileged than the OS kernel, and is used by hypervisors to virtualize multiple OSes without requiring modification to the OS kernel (full virtualization). The specifics of ring −1 are discussed later, as HARES is implemented as a thin hypervisor.

Ring −2, more commonly known as system management mode (SMM), is a parallel and hardware-enforced stealth mode of execution. It was originally developed for BIOS and chip-set manufacturers to run platform-specific routines for power and thermal management without exposing the internals to the OS. When the CPU is executing in SMM, it has a full view of memory; when it is executing in protected mode, the chip-set blocks access to SMM memory regions (SMRAM). To switch from protected (or real) mode into SMM, a system management interrupt (SMI) is issued to the CPU and the hardware saves all CPU context, allowing the SMM code to execute without the OS or hypervisor detecting that it was interrupted. The conditions under which the hardware generates an SMI are defined via a number of chip-set registers that can be locked during SMM initialization [4].

Paging & Memory Management. Modern CPUs can give each process a unique view of memory [4]. This feature, known as virtual memory, eases the OS's task of isolating different applications from one another. When a virtual memory address is accessed by an application, the CPU uses a number of data structures to automatically translate the virtual address into a physical address. This process, outlined in Figure 1, uses the CPU's CR3 register to find a page directory and, optionally, a page table that holds the physical address. In most modern OSes, each process is given its own set of page translation structures that map the flat 4 GiB memory view presented to the process onto the system's physical memory (assuming a 32-bit OS).

Figure 1: Paging Translation on x86¹

¹ Image from http://viralpatel.net/taj/tutorial/image/paging.gif

Intel VT-x, VT-d & EPT/VPID. A growing trend in information technology is the use of virtual machines (VMs) to enable data-center consolidation. A hypervisor, or virtual machine monitor (VMM), allows multiple OSes to run simultaneously on the same physical system, each isolated from the others and provided with the appearance of a normal system environment. While some hypervisors require changes to the guest OS (para-virtualization) to function properly, many leverage newer CPU extensions to allow an unmodified OS to run with only minor interactions from the hypervisor. These extensions, known as virtual machine extensions, or VT-x on Intel, improve performance by empowering the CPU and chip-set to perform more of the isolation and VM memory management in hardware as opposed to software. VT-x allows the hypervisor to set a number of different exit conditions for each guest VM that, when met, trigger a VM Exit, returning control to the hypervisor for processing.

After the initial release of VT-x, it was discovered in [14] that it was possible to use a device's direct memory access (DMA) capabilities to bypass VT-x's protections and compromise the hypervisor. In response to this weakness, Intel released the extensions for directed I/O (VT-d), providing a memory management unit (MMU) for I/O DMA memory accesses and preventing a device from directly accessing main memory outside of its policy-prescribed region(s).

In the latest version of hardware virtualization extensions, Intel and AMD have released the extended page table (EPT) and rapid virtualization indexing (RVI) technologies, respectively. These allow the hypervisor to take even less of a role in the memory management and isolation of each guest by providing another layer of paging structures that translates the addresses the guest OS believes to be physical (guest-physical addresses) into machine-physical addresses. The CPU can automatically translate these in a similar fashion to conventional paging and raise a VM Exit, analogous to a page fault, for the hypervisor to handle. These translations are stored in the TLB and tagged with each guest's virtual processor ID (VPID), so they need not be flushed on a VM context switch.

These aforementioned technologies significantly aid the hypervisor in running multiple VMs in an isolated fashion with relatively minor performance impacts.

TLB. While logically the TLB stores the translations for all accessed addresses in the same region of silicon, the physical implementation splits the TLB (Figure 2) into two: one for instruction addresses (I-TLB) and one for data addresses (D-TLB). This implementation detail is important, as it allows the TLB to point to different addresses for instruction fetches than for data accesses.

Figure 2: Core 2 and Previous CPU Architecture (TLBs Highlighted)³

In earlier processors⁴, the TLB was a single-layer (L1) cache that was split in silicon into a data TLB (D-TLB) and an instruction TLB (I-TLB), for data accesses and instruction fetches respectively. With the release of the Nehalem architecture⁵, a second-layer (L2) cache was introduced: the shared TLB (S-TLB), which provides a larger cache for both the I-TLB and the D-TLB in the event of an eviction from the L1, further reducing TLB "misses" and the resultant performance hit.

The TLB operates in a similar fashion to the regular CPU caches (L1, L2 & L3) in that it typically does "the right thing", requiring minimal manual management. The Intel architecture does provide a small number of instructions for flushing the TLB, either a single entry or multiple entries. The TLB is flushed (excepting for pages marked "global" in

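Returning to the paging translation outlined in Figure 1, the walk from CR3 through the page directory and page table can be sketched in a few lines. This is a simplified model of classic 32-bit, non-PAE paging only. Physical memory is represented as a Python dictionary of 4-byte entries, and the function names `split_virtual_address` and `translate` are hypothetical. The "optional" page table is modelled by the page-size bit in the directory entry, which maps a 4 MiB page directly.

```python
PRESENT = 0x001        # bit 0: entry is valid
PAGE_SIZE_BIT = 0x080  # bit 7 of a directory entry: maps a 4 MiB page directly


def split_virtual_address(vaddr):
    """Split a 32-bit virtual address into directory index, table index and offset."""
    return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF


def translate(cr3, memory, vaddr):
    """Walk the paging structures for vaddr.

    memory is a dict mapping the physical address of each 4-byte entry to
    its value (an assumption of this sketch, standing in for RAM).
    """
    pd_index, pt_index, offset = split_virtual_address(vaddr)

    # CR3 holds the physical base of the page directory.
    pde = memory.get((cr3 & 0xFFFFF000) + 4 * pd_index, 0)
    if not pde & PRESENT:
        raise LookupError("page fault: directory entry not present")

    if pde & PAGE_SIZE_BIT:
        # Large page: no page table is consulted, the low 22 bits are the offset.
        return (pde & 0xFFC00000) | (vaddr & 0x3FFFFF)

    # The directory entry holds the physical base of a page table.
    pte = memory.get((pde & 0xFFFFF000) + 4 * pt_index, 0)
    if not pte & PRESENT:
        raise LookupError("page fault: table entry not present")

    return (pte & 0xFFFFF000) | offset


# Example: directory at 0x1000, one page table at 0x2000, one 4 MiB page.
memory = {
    0x1000 + 4 * 1: 0x2000 | PRESENT,                      # directory entry 1 -> table
    0x2000 + 4 * 2: 0x5000 | PRESENT,                      # table entry 2 -> frame 0x5000
    0x1000 + 4 * 3: 0x08000000 | PAGE_SIZE_BIT | PRESENT,  # directory entry 3 -> 4 MiB page
}
print(hex(translate(0x1000, memory, 0x00402034)))
```

Giving each process its own page directory, and loading its address into CR3 on a context switch, is what gives each process a distinct view of memory in this model.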