Software Self-Healing Using Error Virtualization
Software Self-healing Using Error Virtualization

Stylianos Sidiroglou

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY
2008

© 2008 Stylianos Sidiroglou
All Rights Reserved

ABSTRACT

Despite considerable efforts in both research and development strategies, software errors and subsequent security vulnerabilities continue to be a significant problem for computer systems. The accepted wisdom is to approach the problem with a multitude of tools such as diligent software development strategies, dynamic bug finders and static analysis tools in an attempt to eliminate as many bugs as possible. Unfortunately, history has shown that it is very hard to achieve bug-free software. The situation is further exacerbated by the exorbitant cost of system down-time which some estimates place at six million dollars per hour.

In the absence of perfect software, retrofitting error toleration and recovery techniques, in systems not designed to deal with failures, becomes a necessary complement to proactive approaches.

Towards this goal, this dissertation introduces and evaluates a set of techniques for recovering program execution in the presence of faults by effectively retrofitting legacy applications with exception handling techniques, Error Virtualization and ASSURE. The main premise of the approach is that there is a mapping between faults that may occur during program execution and a finite set of errors that are explicitly handled by the program’s code. Experimental results are presented to support our hypothesis. The results demonstrate that our techniques can recover program execution in the case of failures in 80 to 90% of the examined cases, based on the technique used.
Furthermore, the results illustrate that the performance overhead induced by the techniques to protect against a specific fault can be minimized to under 10%. This dissertation also describes two deployment mechanisms, Shadow Honeypots and Application Communities, that can reduce the cost of monitoring the application and, in turn, enable efficient deployment strategies for error virtualization systems.

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Background: Exploiting Software Elasticity
  1.2 Contributions
  1.3 Dissertation Roadmap
2 Error Virtualization
  2.1 Error Virtualization
    2.1.1 Execution Transactions
    2.1.2 Recovery: Forcing Error Returns
  2.2 Error Virtualization Using Rescue Points
    2.2.1 Rescue Point Discovery
    2.2.2 Rescue Point Selection
    2.2.3 Rescue Point Creation
    2.2.4 Rescue Point Testing
    2.2.5 Rescue Point (Patch) Deployment
3 Related Work
  3.1 Software Elasticity and Error Recovery
  3.2 Self-healing Systems
  3.3 Protection Mechanisms
  3.4 Transactional Processing
  3.5 Automated Testing
4 DYBOC: DYnamic Buffer Overflow Containment
  4.1 Introduction
    4.1.1 Our Contribution
  4.2 Approach
    4.2.1 Instrumentation
    4.2.2 Recovery: Execution Transactions
    4.2.3 Dynamic Defensive Postures
  4.3 Evaluation: Execution Transactions
    4.3.1 Performance Evaluation
    4.3.2 Effectiveness as a worm containment strategy
5 STEM: Selective Transactional EMulation
  5.1 Introduction
  5.2 Approach
  5.3 System Overview
  5.4 Application Monitors
  5.5 Selective Transactional EMulation (STEM)
  5.6 Recovery: Forcing Error Returns
  5.7 Caveats and Limitations
  5.8 Implementation
  5.9 Evaluation
    5.9.1 Performance
6 ASSURE
  6.1 Introduction
  6.2 ASSURE Operational Overview
    6.2.1 Architectural Components
  6.3 Error Virtualization Using Rescue Points
    6.3.1 Rescue Point Discovery
    6.3.2 Rescue Point Selection
    6.3.3 Rescue Point Creation
    6.3.4 Rescue Point Testing
    6.3.5 Rescue Point (Patch) Deployment
7 Evaluation of Error Virtualization
  7.1 Experimental Evaluation
    7.1.1 Bug Summary
    7.1.2 Overall Functionality Results
    7.1.3 Patch Generation Performance
    7.1.4 Recovery Performance
    7.1.5 Patch Overhead
    7.1.6 ASSURE Component Overhead
  7.2 Evaluation on Injected Faults
    7.2.1 Comprehensive Evaluation on Apache
    7.2.2 Error Virtualization Survivability Results
    7.2.3 Return Value Distribution
8 Deployment Scenarios
  8.1 Shadow Honeypots
  8.2 Introduction
  8.3 Architecture
  8.4 Implementation
    8.4.1 Filtering and anomaly detection
    8.4.2 Shadow Honeypot Creation
  8.5 Experimental Evaluation
    8.5.1 Performance of shadow services
    8.5.2 Filtering and anomaly detection
  8.6 Limitations
  8.7 Application Communities
  8.8 Motivation
  8.9 Application Communities
  8.10 Analysis
    8.10.1 Work Calculation
    8.10.2 Work Distribution
    8.10.3 Overlapping Coverage
    8.10.4 Analytical Results
  8.11 Evaluation
9 Conclusion
  9.1 Summary
10 Future Work
    10.0.1 Long-term Goals
Bibliography
A Analysis on Real Bugs
    A.0.2 Apache (mod_ftp_proxy)
    A.0.3 Apache (mod_rewrite)
    A.0.4 Apache (mod_include)
    A.0.5 openLDAP modrdn
    A.0.6 postgreSQL
    A.0.7 MySQL
    A.0.8 sshd

List of Figures

1.1 Error virtualization: Mapping between possible and handled faults
2.1 Error virtualization: Mapping between possible and handled faults
2.2 Error virtualization (EV) using rescue points
2.3 Creating Rescue Points
4.1 Protecting with pmalloc()
4.2 Saving state for recovery.
4.3 Saving previous recovery context.
4.4 Saving global variable.
4.5 Enabling DYBOC conditionally.
4.6 Micro-benchmark results.
4.7 Apache benchmark results.
4.8 DYBOC’s effects on worm propagation
5.1 Feedback control loop
5.2 STEM Example
5.3 Return from within emulation.
5.4 STEM Performance
5.5 STEM Performance: Main processing loop
6.1 ASSURE System Overview
6.2 Error virtualization (EV) using rescue points
6.3 Creating Rescue Points
7.1 Rescue-point to fault
7.2 Recovery time
7.3 Patch generation time
7.4 Normalized performance
7.5 Checkpoint Times
7.6 Checkpoint Size
7.7 EV Apache
7.8 Rescue Depth Apache
7.9 Error Virtualization Recovery Rate
7.10 Return value distribution: The purpose of this busy graph is to illustrate the distribution of values returned by functions during erroneous input. This range of return values explains why heuristics tend to produce sub-par results for recovering program execution.
8.1 Shadow Honeypots: Accuracy vs Scope
8.2 Shadow Honeypot architecture.
8.3 System workflow.
8.4 High-level diagram of prototype shadow honeypot implementation.
8.5 Example of pmalloc()-based