EPA-RIMM: A Framework for Dynamic SMM-based Runtime Integrity Measurement
Brian Delgado†, Karen Karavanic
Portland State University
ABSTRACT
Runtime integrity measurements identify unexpected changes in operating systems and hypervisors during operation, enabling early detection of persistent threats. System Management Mode, a privileged x86 CPU mode, has the potential to effectively perform such rootkit detection. Previously proposed SMM-based approaches demonstrated effective detection capabilities, but at a cost of performance degradation and software side effects. In this paper we introduce our solution to these problems, an SMM-based Extensible, Performance Aware Runtime Integrity Measurement Mechanism called EPA-RIMM. The EPA-RIMM architecture features a performance-sensitive design that decomposes large integrity measurements and schedules them to control perturbation and side effects. EPA-RIMM's decomposition of long-running measurements into shorter tasks, extensibility, and use of SMM complicate the efforts of malicious code to detect or avoid the integrity measurements. Using a Minnowboard-based prototype, we demonstrate its detection capabilities and performance impacts. Early results are promising, and suggest that EPA-RIMM will meet production-level performance constraints while continuously monitoring key OS and hypervisor data structures for signs of attack.

KEYWORDS
BIOS; SMM; firmware; rootkit detection; runtime integrity measurement; security; performance

1 INTRODUCTION
Today's complex server platforms include software environments vulnerable to sophisticated malware called rootkits. Rootkits target sensitive low-level kernel or hypervisor data structures - such as interrupt handlers, event handlers, registers, and memory - remaining undetected for extended periods to achieve persistence. One response to this vulnerability has been the development of runtime integrity measurement mechanisms (RIMMs) that aim to detect rootkits before financial, political, or other damage occurs. The runtime integrity measurement approach periodically preempts execution and examines the interrupted state, checking for unexpected changes in low-level resources. Several approaches leveraging a privileged CPU mode called System Management Mode (SMM) to perform the needed monitoring of resources in the runtime environment demonstrated promising rootkit detection capabilities [3][4][5].

SMM is a general purpose, widely available mechanism on x86 CPUs. Firmware on computer systems performs a variety of critical management tasks at runtime using SMM. This can include managing CPU power states, controlling low-level hardware, handling thermal throttling, performing BIOS flash updates [34], and handling memory errors, among other usages [1]. SMM operates at a more privileged layer than host-side kernel or hypervisor code. The entry into SMM is accomplished by a System Management Interrupt (SMI) which typically takes all CPU threads¹ out of the operating system or applications and into the SMI handler. However, the SMI's work is generally processed by a single CPU thread. SMIs can be triggered by Ring 0 software by writing values to the APM_CNT I/O port on Intel platforms [46] and the "SMI Command Port" on AMD platforms [47]. Intel's Itanium processors also feature a Platform Management Interrupt (PMI) that is similar to an SMI [49]. A proposal has been made to bring SMM-like functionality to ARM platforms utilizing TrustZone [48].

Upon entering SMM, the SMRAM Save State Map is populated with CPU register values from the time of interruption. An SMM RIMM can inspect this data to determine if unexpected changes have occurred. SMM code issues an RSM instruction to exit SMM and resume host-side execution at the point of interruption. The Intel BITS tool measures the duration of SMIs on a system and warns if they are larger than "acceptable" limits (e.g. 150 microseconds) [7].

For RIMM purposes, SMM provides critical benefits:
• SMM runs at a higher privilege than host software;
• SMM has broad visibility into operating system and hypervisor (host software) memory and CPU registers²;
• SMM is strongly isolated from host software as the hardware-protected SMRAM can only be read or written from SMM, providing a location to place measurement software;
• When an SMI occurs, the transition into SMM cannot be prevented by malicious code;
• When all CPU threads enter SMM, host software execution is paused, which presents the opportunity to inspect the system while it is temporarily halted; and
• As x86 systems broadly support SMM, the mechanisms can be readily adopted.

¹ As some Intel CPUs feature the HyperThreading feature which creates a set of logical CPUs, we refer to the logical CPUs as "CPU threads."
² Recent open-source UEFI code updates have constrained SMM's memory visibility. We discuss the implications and potential options in Section 5.
† This author was a full-time employee of Intel Corp. when this work was done.

For these reasons, SMM RIMMs present intriguing possibilities for adding new detection capabilities to combat host software rootkits.

Our research demonstrated that an important drawback limits the potential of SMM-RIMMs: interference with system software that may lead to significant perturbation of the system and even software failures [2]. System software assumptions regarding scheduling regularity as well as task durations are challenged by prolonged periods of time in SMM. Early SMM RIMMs exceeded the published SMM guidelines by orders of magnitude. There are other limitations to early SMM-based approaches: the lack of extensibility - the inability to dynamically change the set of monitored resources to maintain effectiveness as new rootkits emerge; and the inability to adjust the frequency of checking as new threat information becomes available.

In this paper, we present our solution to this problem, an SMM-RIMM framework called EPA-RIMM. EPA-RIMM's goal is to provide quick detection of kernel or hypervisor rootkits by identifying unexpected changes in system state snapshots. It accomplishes this by periodically interrupting the running system to inspect sets of presumed static resources to identify changes, any of which would be a strong indicator of compromise.

The key contributions of this work are:
1. A mechanism for decomposing large integrity measurements to remain consistent with expectations regarding SMI latency. This approach removes the significant performance degradation incurred by prolonged SMIs and creates an opportunity for the development of new integrity measurements that were formerly impractical.
2. A mechanism for throttling the rate of integrity checking. EPA-RIMM's scheduler facilitates a varying time budget for measurements, allowing the performance-security tradeoff to be adjusted during runtime based on the threat landscape. This allows EPA-RIMM to reduce its performance impact on the system when necessary but also supports increasing it as threats emerge.
3. An API to specify measurements during operation. This allows the measurements to vary depending on the environment and abstracts the complexity of OS/VMM-specific details from the lowest level SMM-code.
4. An open-source prototype of EPA-RIMM to demonstrate its effectiveness and performance. We plan to release EPA-RIMM as open-source software to facilitate its use as a research and educational tool. EPA-RIMM supports either

(Section 2.4). We discuss the architecture in Section 2.5. A detailed example is provided in Appendix A.

There are three key abstractions: Checks, Tasks, and Bins. A Check is a description of an integrity measurement, including a command and its arguments, a priority, and a decomposition target (to guide decomposition). Checks allow the Administrator to specify particular measurements over sets of memory regions, Control Registers, and Model-Specific Registers (MSRs). Sample Checks include: the "Static Linux Kernel Code Sections" Check that measures the Linux kernel code sections to identify code injections, the "IDTR" Check that verifies that the IDTR register value has not unexpectedly changed, and the "GDT" Check that measures the Global Descriptor Table (GDT) to determine if it has changed. Other Checks measure specific MSRs or CPU control registers (e.g. CR0, CR4), for example, to determine if Supervisor Mode Execution Protection (SMEP) was disabled by malware. At runtime, Checks are decomposed into some number of Tasks, or partial resource measurements, to meet expectations for SMI latency. Tasks are scheduled by filling Bins, which consist of the set of work to be performed in one SMI session. Each Bin's size is defined as the sum of the execution times of the Tasks it contains.

2.1 Diagnosis Manager
The Diagnosis Manager (DM) is the component that orchestrates the runtime integrity measurements. It decides which inspections to run at a given point in time and interprets the measurement results. A single DM may be responsible for one or more nodes in a cluster, and may communicate with other Diagnosis Managers. The DM sends and receives information about attack discoveries from across the EPA-RIMM framework to help guide detection on other monitored nodes. This allows the priority of measurements to be adjusted dynamically to search for detected issues on other nodes. The DM can also interface with an external Security Information and Event Management system (SIEM) to send and receive telemetry on attacks, although a full description of this interface is outside of the scope of this paper.
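
As an illustration of the Check, Task, and Bin abstractions, the following Python sketch decomposes Checks into Tasks and fills Bins under a per-SMI time budget. It is not the EPA-RIMM implementation: the field names, the command names, the linear cost model (`bytes_per_us`), the reading of the decomposition target as a per-Task time in microseconds, the convention that lower priority numbers are scheduled first, and the next-fit fill policy are all assumptions made for the example. The 150 microsecond budget simply echoes the BITS threshold cited in Section 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Check:
    """Description of one integrity measurement (all fields hypothetical)."""
    name: str
    command: str      # e.g. "hash_region"; command names are made up
    length: int       # bytes of state the command covers
    priority: int     # assumption: lower number is scheduled first
    target_us: int    # assumption: decomposition target = time per Task


@dataclass
class Task:
    """A partial measurement covering one slice of a Check's resource."""
    check_name: str
    offset: int
    length: int
    cost_us: int
    priority: int


@dataclass
class Bin:
    """The set of Tasks to run in one SMI session."""
    tasks: List[Task] = field(default_factory=list)

    @property
    def size_us(self) -> int:
        # Bin size is the sum of the execution times of its Tasks.
        return sum(t.cost_us for t in self.tasks)


def decompose(check: Check, bytes_per_us: int) -> List[Task]:
    """Split a Check into Tasks that each take at most target_us."""
    chunk = check.target_us * bytes_per_us
    tasks = []
    for offset in range(0, check.length, chunk):
        length = min(chunk, check.length - offset)
        cost_us = -(-length // bytes_per_us)  # ceiling division
        tasks.append(Task(check.name, offset, length, cost_us, check.priority))
    return tasks


def fill_bins(tasks: List[Task], budget_us: int) -> List[Bin]:
    """Next-fit: take Tasks in priority order, open a new Bin when full."""
    bins: List[Bin] = []
    current = Bin()
    for task in sorted(tasks, key=lambda t: t.priority):  # stable sort
        if task.cost_us > budget_us:
            raise ValueError(f"Task from {task.check_name} exceeds the budget")
        if current.size_us + task.cost_us > budget_us:
            bins.append(current)
            current = Bin()
        current.tasks.append(task)
    if current.tasks:
        bins.append(current)
    return bins


if __name__ == "__main__":
    BYTES_PER_US = 1000  # made-up measurement throughput
    checks = [
        Check("Static Linux Kernel Code Sections", "hash_region", 1_000_000, 2, 50),
        Check("IDTR", "compare_register", 16, 1, 50),
    ]
    tasks = [t for c in checks for t in decompose(c, BYTES_PER_US)]
    for i, b in enumerate(fill_bins(tasks, budget_us=150)):
        print(f"SMI {i}: {len(b.tasks)} Tasks, {b.size_us} us")
```

With these made-up numbers, the kernel code Check becomes 20 Tasks of 50 microseconds each, the IDTR Check becomes a single Task, and the 21 Tasks are spread over 7 SMI sessions, none of which exceeds the 150 microsecond budget. Lowering `budget_us` trades more SMI sessions for shorter ones, which is the kind of performance-security adjustment the scheduler is described as supporting.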