Damage Management in Database Management Systems
The Pennsylvania State University
The Graduate School
Department of Information Sciences and Technology

Damage Management in Database Management Systems

A Dissertation in Information Sciences and Technology
by Kun Bai

© 2010 Kun Bai

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

May 2010

The dissertation of Kun Bai was reviewed and approved* by the following:

Peng Liu, Associate Professor of Information Sciences and Technology, Dissertation Adviser, Chair of Committee
Chao-Hsien Chu, Professor of Information Sciences and Technology
Thomas La Porta, Distinguished Professor of Computer Science and Engineering
Sencun Zhu, Assistant Professor of Computer Science and Engineering
Frederico Fonseca, Associate Professor of Information Sciences and Technology, Associate Dean, College of Information Sciences and Technology

*Signatures are on file in the Graduate School.

Abstract

In the past two decades there have been many advances in the field of computer security. However, since vulnerabilities cannot be completely removed from a system, successful attacks often occur and cause damage to the system. Despite numerous technological advances in both security software and hardware, many challenging problems still limit the effectiveness and practicality of existing security measures.

As Web applications gain popularity, enabling a Database Management System (DBMS) to survive an attack is becoming even more crucial than before because of the increasingly critical role that the DBMS plays in business/life/mission-critical applications. Although significant progress has been made in protecting the DBMS through existing database security techniques (e.g., access control, integrity constraints, and failure recovery), business/life/mission-critical applications can still be hit by new threats to the back-end DBMS.
For example, in addition to the vulnerabilities exploited by attacks (e.g., SQL injection attacks), databases can be damaged in several other ways, such as fraudulent transactions (e.g., identity theft) launched by malicious outsiders and erroneous transactions issued by insiders by mistake. When the database is under such an attack, rolling back and re-executing the damaged transactions are the most commonly used mechanisms during system recovery. This kind of mechanism either stops (or greatly restricts) the database service during repair, which causes unacceptable loss of data availability or denial of service for mission-critical applications, or may cause serious damage spreading during on-the-fly recovery, where many clean data items are accidentally corrupted by legitimate new transactions. In this study, we address database damage management (DBDM), a very important problem faced today by a large number of mission/life/business-critical applications and information systems that must manage risk, business continuity, and assurance in the presence of severe cyber attacks.

Although a number of research projects have tackled the emerging data corruption threats, existing mechanisms are still limited in meeting four highly desired requirements: near-zero run-time overhead, zero system down time, zero blocking time for read-only transactions, and minimal delay time for read-write transactions. First, to achieve these four requirements, we propose TRACE, a zero-system-down-time database damage tracking, quarantine, and recovery solution with negligible run-time overhead. TRACE consists of a family of new database damage tracking, quarantine, and cleansing techniques. We built TRACE into the kernel of PostgreSQL. Second, motivated by the limitations of the TRACE mechanism, we propose a novel proactive damage management approach called database firewalling. This approach deals with transaction-level attacks.
Pattern mining and Bayesian network techniques are adopted in the firewalling framework to mine frequent damage spreading patterns and to predict data integrity under attack when a certain type of attack occurs repeatedly. This pattern mining and Bayesian inference approach provides a probability-based strategy to estimate data integrity on the fly. With this probabilistic feature, the database firewalling approach is able to enforce a transaction filtering policy that dynamically filters out potential damage spreading transactions.

Table of Contents

List of Tables
List of Figures
Acknowledgments

Chapter 1. Introduction
  1.1 Problems in This Study
    1.1.1 Light Weighted Damage Quarantine and Recovery System
    1.1.2 Preventive Damage Management Approach
  1.2 Contributions of This Work
  1.3 Outline

Chapter 2. Related Works
  2.1 Traditional Database Security Research
  2.2 Failure Handling Research
  2.3 Intrusion Detection System (IDS) Research
  2.4 Intrusion Tolerant Database Research
    2.4.1 Self-Healing Software Systems
    2.4.2 Intrusion Tolerant Database Systems

Chapter 3. The TRACE Damage Management System
  3.1 Introduction
  3.2 Background
    3.2.1 Existing Solutions and Their Limitations
  3.3 Preliminaries and Problem Statement
    3.3.1 The Threat Model
    3.3.2 A Motivating Example Using SQL Injection Attack
    3.3.3 Basic Concepts of the Database System
    3.3.4 Problem Statement
      3.3.4.1 Dependency Relations
      3.3.4.2 Problem Statement
  3.4 Overview of Approach
    3.4.1 Assumptions
    3.4.2 The Standby Mode
    3.4.3 The Cleansing Mode
      3.4.3.1 Damage Quarantine
      3.4.3.2 Damage Assessment
      3.4.3.3 Valid Data De-Quarantine
      3.4.3.4 Repairing On-The-Fly
  3.5 Design and Implementation of TRACE atop PostgreSQL
    3.5.1 Implementing the Marking Scheme
    3.5.2 Maintaining Before Images
    3.5.3 Damage Assessment Module
    3.5.4 Quarantine/De-Quarantine Module
    3.5.5 On-The-Fly Repairing
    3.5.6 Garbage Collection
  3.6 TRACE-FG: A Fine-Grained Damage Management Scheme
    3.6.1 A Fine-grained Version-data-status-identification Method
    3.6.2 Fine-grained De-quarantine and Repair
  3.7 Experimental Results
    3.7.1 Evaluation of TRACE System
      3.7.1.1 System Overhead
      3.7.1.2 Zero System Down Time
      3.7.1.3 TRACE vs. 'point-in-time' recovery
      3.7.1.4 System CPU Time Distribution
      3.7.1.5 Reduced Service Delay Time and Saved Legitimate Transactions
      3.7.1.6 IDS Impacts on TRACE System
    3.7.2 Evaluation of TRACE-FG System
      3.7.2.1 Saved Data Versions
      3.7.2.2 Reduced Delay Time/Saved De-committed Transactions
      3.7.2.3 TRACE-FG vs. TRACE and 'point-in-time' on System Throughput
  3.8 Discussion
  3.9 Appendix
    3.9.1 TPC-C Transaction Read/Write Set Template
    3.9.2 Clinic OLTP Application Transaction Read/Write Template

Chapter 4. Preventive Damage Management through Filtering of Damage Spreading Transactions
  4.1 Introduction
  4.2 Background and Preliminaries
    4.2.1 The Threat Model
    4.2.2 The System Model
  4.3 Overview of Approach
    4.3.1 Rationale
    4.3.2 Database Firewall Architecture
  4.4 Mining Damage Spreading Patterns
    4.4.1 Two Types of Damage Spreading Patterns
    4.4.2 Mining The Damage Spreading Patterns
      4.4.2.1 Basic Definitions
      4.4.2.2 Finding One-Hop Spreading Patterns
      4.4.2.3 Finding Multi-Hop Spreading Patterns
  4.5 Integrity Level Estimator
    4.5.1 Bayesian Network Based Analysis on Data Integrity
    4.5.2 Integrity Estimation Using Bayesian Network
    4.5.3 Algorithm of Integrity Estimation
    4.5.4 Updating the Filtering Rules On-The-Fly
  4.6 Experimental Results
    4.6.1 Experiment Settings
    4.6.2 Mining Algorithms Testing
    4.6.3 System Integrity Analysis
    4.6.4 System Availability Analysis
    4.6.5 System Throughput Analysis

Chapter 5. Conclusion
  5.1 Summary
  5.2 Future Work
    5.2.1 Research Plan
      5.2.1.1 Task 1. Non-Blocking Repair Using a Shadow Database
      5.2.1.2 Task 2. Machine Learning based Preventive Damage Management

Bibliography

List of Tables

3.1 Existing Approaches and Their Limitations
3.2 TPC-C Parameters and Their Values
4.1 An Example of Attack History H_i^A
4.2 An Example of An Attack History H_i^A Grouped By Patient ID and Sorted By Transaction Time
4.3 One-Hop Spreading Pattern Mapping
4.4 An Example of Attack Histories H_i^{A'} After Mapping
4.5 An Example of Using Attack Histories to Mine One-Hop Spreading Patterns
4.6 TPC-C Parameters and Their Values
4.7 Parameters of the synthetic datasets
4.8 Parameter values of the datasets
4.9 Measurement of the Mining Algorithm of One-hop Patterns
4.10 Speed of System Integrity Estimation

List of Figures

3.1 An Example of Damage Spreading
3.2 The TRACE System Workflow
3.3 An Example of the Causality Table Construction
3.4 Identify/Repair Corrupted Data Records Overview
3.5 Example of Repairing Corrupted Data Records
3.6 Data Record Structure with Marking Bits
3.7 TRACE Assessment/Repair with Causality Table in PostgreSQL
3.8 An Example of the relationship between the process window and the missing probability
3.9 Cases of Assessment Procedure Limitations