IOTA: Detecting Erroneous I/O Behavior Via I/O Transaction Auditing
Total Page:16
File Type:pdf, Size:1020Kb
Appears in the First Workshop on Compiler and Architectural Techniques for Application Reliability and Security (CATARS) In Conjunction with the 2008 International Conference on Dependable Systems and Networks (DSN 2008) Anchorage, Alaska, June 2008 IOTA: Detecting Erroneous I/O Behavior via I/O Transaction Auditing Albert Meixner and Daniel J. Sorin Duke University [email protected], [email protected] Abstract nosed, the system could suggest that the user or system The correctness of the I/O system—and thus the administrator replace this device. If a driver bug is diag- nosed, the bug can be reported and the driver can be correctness of the computer—can be compromised by restarted (which is often sufficient). If a security breach hardware faults, driver bugs, and security breaches in is diagnosed, the system can shutdown the driver under downloaded device drivers. To detect erroneous I/O suspicion and alert the user. In this work, we focus on behavior, we have developed I/O Transaction Auditing error detection, rather than diagnosis and recovery. (IOTA), which checks the high-level behavior of I/O Our error detection mechanism, I/O Transaction transactions. In an IOTA-protected system, the operat- Auditing (IOTA), uses end-to-end checking [13] of I/O ing system creates a signature of the I/O transactions it transactions. We define a transaction as a high-level, expects to occur and every I/O device computes a signa- semantically atomic I/O operation, such as sending an ture of the transactions it actually performs. By compar- Ethernet frame. A transaction represents the basic I/O ing these signatures, IOTA can discover erroneous operation for higher level OS services such as the file behavior in both the hardware and the software respon- system or network stack. A transaction typically con- sible for performing I/O operations. Our experimental sists of many low-level I/O operations, such as reading results show that IOTA reliably detects a wide range of or writing memory-mapped device registers. For exam- erroneous behavior at little cost in performance, I/O ple, sending an Ethernet frame is a transaction that con- bandwidth, or hardware. sists of multiple I/O accesses from the driver to initiate the transaction at the device and many memory accesses 1. Introduction by the device to obtain the frame payload. Transactions are high-level operations that are independent of low- As computer system dependability has become level implementation details; for example, a disk trans- increasingly important, there has been a significant action is independent of the format of the file system. amount of research in checking the system’s behavior. The basic idea of IOTA is to audit I/O transactions Recent research has primarily focused on checking the at multiple points in the system (OS, buses, devices) to behavior of cores [2, 8] and memory systems [9], while ensure that they are propagated correctly through the I/O I/O systems have received comparatively little attention. software and hardware stack. In an IOTA-protected sys- The I/O system includes both hardware and soft- tem, the OS creates a signature of the I/O transactions it ware, and it comprises a crucial part of a computer sys- expects to occur. IOTA’s auditing hardware—which tem. I/O hardware consists of the actual devices (disks, could be located at an I/O device (e.g., SCSI device) or network cards) and the interconnect (e.g., PCI, AGP, I/O controller (e.g., SCSI controller for multiple SCSI ISA, HyperTransport) that enables communication devices)—computes a signature of the transactions the between the devices, processor, and memory. I/O soft- device actually performs. By comparing these signa- ware includes the device drivers that provide the inter- tures, IOTA can discover many types of erroneous face between the OS and the devices. behavior between the OS and the auditors. Moreover, The I/O system can behave erroneously due to three IOTA can detect many errors not detectable by simpler causes. First, like cores and memory systems, I/O hard- mechanisms such as error detecting codes (signatures ware can suffer from physical faults, both transient and over only the transmitted data) or higher-level mecha- permanent. Second, the I/O system is susceptible to nisms (e.g., timeouts on dropped packets). For example, device driver bugs. Third, I/O may behave erroneously IOTA can detect if a malicious driver tries to send data due to security breaches, particularly in downloaded to an attacker. device drivers. IOTA’s primary contribution is an end-to-end error Our goal is to detect erroneous I/O behavior. Once detection scheme for I/O. It detects many faults, driver detected, the system can take action to diagnose and bugs, and security breaches. However, it has some limi- remedy the situation. If a permanent device fault is diag- tations, including a device model that is not entirely 1 general and excludes some common I/O devices, such showed that 53% of detected bugs in the Linux kernel as graphics accelerators. were in device drivers [4]. In the rest of this paper, we describe and evaluate When a driver bug is excited, it manifests itself as IOTA. First, in Section 2, we describe our error model. an erroneous behavior. Because the space of possible Then, we present IOTA in Section 3, and we evaluate it design bugs is unbounded, we cannot construct a com- in Section 4. We discuss related work and compare it to plete error model. One can always imagine a more dia- IOTA in Section 5, before concluding in Section 6. bolical bug that does not fit into a given error model. 2.3 Security Breaches 2. Error Model Drivers are a natural weak point in a trusted OS We consider errors due to three causes: physical environment. Although their access to memory can be faults, driver bugs, and driver security breaches. These restricted in the same way as for regular applications, errors all occur between the OS and IOTA’s hardware drivers by necessity have access to the devices they are auditors, which enables detection of many of them. controlling and to user data transferred between the 2.1 Physical Faults device and the system. This position can be exploited to maliciously leak sensitive data. Unlike other trusted Our fault model is a single stuck-at fault that is components, such as the OS kernel, drivers are provided either transient or permanent. Due to faults, a variety of by many sources and are frequently updated. Thus, the errors can occur, and we provide examples: often low code quality that manifests itself in numerous •Incorrect Actions: A device sends a network packet bugs also makes drivers suspects for possible intrusions. to the wrong destination. A widely publicized recent breach in an Apple Mac- •Out-of-order Actions: The OS specifies that the disk Book was possible when a downloaded third-party controller should complete the DMA before sending driver was used for its wireless Ethernet card [14]. This an interrupt, but the interrupt occurs first.1 breach was not preventable by use of a personal firewall. •Missing/Duplicated Actions: The network device The damage that a driver can cause depends on the simply does not send a packet. device. Drivers for communication devices like network •Wrong Status: A device’s status is incorrect. This cards can analyze data transferred over the Internet by error is subsumed by the previously listed errors. the user, such as e-mail, and communicate the collected •Deadlock/Hanging: A device does nothing. information to an outside party. Encryption alone is insufficient to counter malicious drivers, because the These erroneous behaviors are not equally likely, driver has full control over incoming and outgoing data even though we assume the underlying faults are equally and can thus execute man-in-the-middle attacks. Mass likely to occur in any hardware component. Although storage drivers (SCSI, IDE, etc.) do not have the same some of these errors are easily detectable with other ability to communicate data to an outside party, but they mechanisms (e.g., an error in writing data to a disk is can help an attacker circumvent the operating system detectable with EDC), others errors are more challeng- file permissions in order to access confidential data. ing (e.g., an error that causes a write to a disk to not be performed at all). 3. I/O Transaction Auditing (IOTA) 2.2 Driver Bugs In this section, we present IOTA’s system model, Device driver bugs are common [3] and can be give an overview of how IOTA fits into the remainder of expected to remain a problem in the future for a variety the system, discuss our particular implementation, and of reasons. Most notably, there exists a vast number of explain IOTA’s costs and limitations. drivers, each of which has to manage a complicated interface between the OS and the device. The device 3.1 System Model and Key Insight driver has to split a semantically atomic operation, such IOTA views each I/O device as a complex func- as writing a block to disk, into a sequence of requests to tional unit that performs a series of self-contained input the device. While doing this, it must also obey device and output transactions. A device performs a transaction idiosyncrasies (such as timing constraints) and handling either as a result of commands issued by the processor asynchronous events (such as interrupts). This requires or due to outside events, e.g., incoming messages. Once sophisticated synchronization and complex logic for a transaction completes, the device notifies the CPU via infrequent special cases (error handling), which makes an interrupt or polling and makes the results accessible thorough testing difficult.