A Tamper-Resistant Framework for Unambiguous Detection of Attacks in User Space Using Process Monitors
Total Page:16
File Type:pdf, Size:1020Kb
A Tamper-Resistant Framework for Unambiguous Detection of Attacks in User Space Using Process Monitors Ramkumar Chinchani and Shambhu Upadhyaya Kevin Kwiat Dept. of Computer Science and Engineering Air Force Research Laboratory University at Buffalo, SUNY 525 Brooks Road Amherst, NY 14260 Rome, NY 13441 Email: rc27, shambhu ¡ @cse.buffalo.edu Email: [email protected] Abstract with large projects [14], [9], [12], [11] suggest this. Maintaining high availability of these services is criti- Replication and redundancy techniques rely on the as- cal for the smooth functioning of any organization relying sumption that a majority of components are always safe and on those resources. These services can fail due to software voting is used to resolve any ambiguities. This assumption faults or attacks by intruders. Occurrence of faults can result may be unreasonable in the context of attacks and intru- in the unpredictable failure of the system. Intrusions are of- sions. An intruder could compromise any number of the ten likened to faults and successful attacks, like faults, leave available copies of a service resulting in a false sense of the system in an inconsistent or unusable state. security. The kernel based approaches have proven to be Fault detection and tolerance techniques have goals such quite effective but they cause performance impacts if any as dependability, reliability, availability, safety and per- code changes are in the critical path. In this paper, we pro- formability which are similar to the goals of intrusion pre- vide an alternate user space mechanism consisting of pro- vention and detection, and other security measures. How- cess monitors by which such user space daemons can be ever, the correspondence is not always one to one. Given unambiguously monitored without causing serious perfor- the gamut of possibilities through which intrusions or ma- mance impacts. A framework that claims to provide such a licious behavior can manifest, the decision-making is very feature must itself be tamper-resistant to attacks. We the- diffuse and hard, unlike detecting atomic faults where the oretically analyze and compare some relevant schemes and problem is less non-deterministic. Also, faults are generally show their fallibility. We propose our own framework that is more random and occur independent of each other, whereas based on some simple principles of graph theory and well- an intruder can target specific parts of a system deliberately founded concepts in topological fault tolerance, and show in order to disable it as a whole. that it can not only unambiguously detect any such attacks on the services but is also very hard to subvert. We also Fault tolerance techniques profess redundancy and repli- present some preliminary results as a proof of concept. cation to maintain high availability with the assumption that even if a small minority of them fail, the majority will con- tinue to render the services required of them. Byzantine agreement [23] and voting protocols [36], [19], [4] allow 1. Introduction for consistent outputs in spite of failures in some replicated components. However, from the security point of view, it is With the advent of computer networks, user space ser- safe to assume that if an intruder can compromise a partic- vices have not only become plausible but also very preva- ular copy of a service, he can compromise other copies of it lent. Both academic and other organization environments as well with relative ease. This voids the assumption that a alike use services of user space daemons such as inetd, sshd, majority of these copies will be safe and that the detection lpd, etc. More complicated services are provided by intru- of any such compromises will be unambiguous. From an sion detection systems, network management systems, etc. intruder's point of view, taking over a service simply means As these services become feature-rich, the code base be- disabling a majority of its copies. Although not trivial, it comes larger and hence these programs are likely to contain can still be done with careful reconnaissance and analysis software bugs some of which may be exploitable. Empir- of weaknesses in the service components. ical case studies in software engineering and experiences Fault tolerance is the attribute that enables a system to continue the correct performance of its specified tasks in the requires program verification that is often not computation- presence of hardware or software faults [31], [21]. It allows ally feasible. Consequently, since it is not possible to elimi- greater availability and reliability of the particular system nate all the faults, a workaround is to achieve a high degree component. This is true even in the context of intrusions. of fault tolerance. But in order to respond to such events, one must provide a Fault tolerant software architectures [5], [16], [22] have mechanism to unambiguously detect these failures. a common theme of redundant and replicated software enti- In this paper, we first discuss the pros and cons of build- ties or components that communicate with each other using ing such detection systems at various levels of the operat- some protocol to arrive at a consistent and correct outcome. ing system, and the related work. Then, we evolve some These failure models assume failures that occur indepen- theoretical machinery to analyze the various relevant tech- dent of some system activity. niques. We propose a framework using some simple con- Networks are often susceptible to outages because fail- cepts in graph theory and show that it has the capability of ure of components or nodes has resulted in some path or detecting failures of services unambiguously and also mak- circuit becoming open or incomplete. The problem of de- ing it very hard for an intruder to subvert it. While known veloping fault-tolerant networks has been well-studied us- solutions popularly implement this functionality inside the ing topologies and graph theory [32], [6] [30]. Although, kernel, the crux of this paper is to provide an alternate so- we use concepts in graph theory to model the problem, our lution by unambiguously detecting these failures in user effort attempts to provide a framework to ensure high avail- space. Although, the focus of this paper is more on failures ability of processes and services (in the context of intrusions due to intrusions and attacks, this solution works equally and attacks) on a single host rather than across nodes on a well with inherent software faults and failures. network. 1.1. Paper Organization 2.2. Intrusion Detection Techniques Section 2 gives a more elaborate description of the re- The various facets central to providing protection to ser- lated work done by peers. Section 3 explains the theoretical vices are prevention, preemption, detection, deterrence and aspects of the framework and compares it with the current mitigation. Being a decision-making problem (often non- techniques. Section 4 analyzes the various configurations. binary), intrusion detection is very hard to solve. Since Section 5 discusses the implementation issues. Preliminary the seminal work by Denning [13], significant efforts have experiments and results are presented in Section 6. The pa- been invested in devising new and effective techniques to per is concluded in Sections 7 and 8 by a discussion and perform intrusion prevention and detection. [34] presents a overview of our future work. collection of about hundred such systems. Even though the trusted computing base (TCB) of a component or process should be as minimal as possible [2], 2. Background and Relevant Work it is rarely the case [25]. A user space process relies on the operating system and if the kernel becomes unstable 2.1. Fault Tolerance Techniques then the correct behavior of the process is highly suspect. However, kernels are typically well tested and secure from tampering making kernel space implementations a natural A software service or component may fail due to differ- choice for achieving tamper-resistance. It then becomes an ent reasons. In order to expedite proper response, it is very engineering tradeoff between safety of the intrusion detec- important to ascertain the fact that the component has in- tion itself and possible performance penalties of kernel level deed failed. Detection of failures is a decision-making prob- implementations. The various pros and cons are evaluated lem with varying degrees of hardness depending on what as follows. failure model is considered. For our work we take into ac- count these models: ¢ Completely in the kernel space ¢ Inherent software failures Many intrusion detection system implementations fall into this category, e.g., [35]. When designing an in- ¢ Failures due to attacks trusion detection system, the safety of the system be- comes a critical question. Since, the processor priv- Software errors and bugs inevitably creep into large ilege levels prevent user space programs to tamper projects due to the human factor. Even though these pro- with anything inside the kernel, it becomes an obvi- grams may be tested extensively, some of these software ous choice to implement the system inside the kernel. bugs may not be detected. Total elimination of these errors Also, if some event immediately forces a user space program to be woken up, then the context switches can 2.3. Summary be very expensive. On the other hand, there are some pitfalls too. A bad implementation inside the kernel is It is clear that protecting services using intrusion detec- equally serious if not more. If the code is added to the tion systems is not adequate since the IDS itself may come critical paths of the kernel, then the performance im- under attack and fail, leaving the system vulnerable. It pact could be very high. The Linux Security Modules therefore becomes a question of “who watches the protec- (LSM) [1] framework provides a diffuse mechanism tor?”, “is it always necessary to deploy a kernel space IDS to perform checks at various places inside the kernel to protect user space services?” and “are fault-tolerant ar- as compared to a more central system call interception chitectures capable of combating attacks?”.