Efficient and Secure Query Evaluation Over Encrypted XML Databases
Total Page:16
File Type:pdf, Size:1020Kb
Secure Query Answering and Privacy-preserving Data Publishing by Hui Wang B.Sc, Wuhan University, P.R.China, 1998 M.Sc, The University of British Columbia, 2002 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in The Faculty of Graduate Studies (Computer Science) The University Of British Columbia October, 2007 © Hui Wang 2007 ii Abstract The last several decades have witnessed a phenomenal growth in the network• ing infrastructure connecting computers all over the world. The Web has now become an ubiquitous channel for information sharing and dissemination. More and more data is being exchanged and published on the Web. This growth has created a whole new set of research challenges, while giving a new spin to some existing onefs. For example, XML (extensible Markup Language), a self-describing and semi-structured data format, has emerged as the standard for representing and exchanging data between applications across the Web. An important issue of data publishing is the protection of sensitive and private information. However, security/privacy-enhancing techniques bring disadvan• tages: security-enhancing techniques may incur overhead for query answering, while privacy-enhancing techniques may ruin data utility. In this thesis, we study how to overcome such overhead. Specifically, we address the following two problems in this thesis: (a) efficient and secure query evaluation over pub• lished XML databases, and (b) publishing relational databases while protecting privacy and preserving utility. The first part of this thesis focuses on efficiency and security issues of query evaluation over XML databases. To protect sensitive information in the pub: lished database, security policies must be defined and enforced, which will result in unavoidable overhead. Due to the security overhead and the complex struc• ture of XML databases, query evaluation may become inefficient. In this thesis, we study how to securely and efficiently evaluate queries over XML databases. First, we consider the access-controlled database. We focus on a security model by which every XML element either is locally assigned a security level or inher• its the security level from one of its ancestors. Security checks in this model can cause considerable overhead for query evaluation. We investigate how to reduce the security overhead by analyzing the subtle interactions between in• heritance of security levels and the structure of the XML database. We design a sound and complete set of rules and develop efficient, polynomial-time algo• rithms for optimizing security checks on queries. Second, we consider encrypted XML database in a "database-as-service" model, in which the private database is hosted by an untrusted server. Since the untrusted server has no decryption key, its power of query processing is very limited, which results in inefficient query evaluation. We study how to support secure and efficient query evalua• tion in this model. We design the metadata that will be hosted on the server side with the encrypted database. We show that the presence of the metadata not only facilitates query processing but also guarantees data security. We prove Abstract iii that by observing a series of queries from the client and responses by itself, the server's knowledge about the sensitive information in the database is always below a given security threshold. The second part of this thesis studies the problem of preserving both pri• vacy and the utility when publishing relational databases. To preserve utility, the published data will not be perturbed. Instead, the base table in the orig• inal database will be decomposed into several view tables. First, we define a general framework to measure the likelihood of privacy breach of a published view. We propose two attack models, unrestricted and restricted models, and derive formulas to quantify the privacy breach for each model. Second, we take utility into consideration. Specifically, we study the problem of how to design the scheme of published views, so that data privacy is protected while maximum utility is guaranteed. Given a database and its scheme, there are exponentially many candidates for published views that satisfy both privacy and utility con• straints. We prove that finding the globally optimal safe and faithful view, i.e., the view that does not violate any privacy constraints and provides the maxi• mum utility, is NP-hard. We propose the locally optimal safe and faithful view as the heuristic, and show how we can efficiently find a locally optimal safe and faithful view in polynomial time. iv Contents Abstract ii Contents iv List of Figures viii Acknowledgements xi Dedication xiii 1 Introduction 1 1.1 Efficient and Secure Evaluation of Queries over XML Database . 1 1.1.1 Efficient and Secure Evaluation of Queries over Access- Controlled XML Databases 2 1.1.2 Efficient and Secure Evaluation of Queries over Encrypted XML Databases 4 1.2 Privacy-and-utility-preserving Data Publishing 6 2 Background 9 2.1 extensible Markup Language(XML) 9 2.1.1 Document Type Definition (DTD) 11 2.1.2 XML Tree Pattern Queries (TPQ) 13 2.1.3 Structural Join: Efficient Evaluation of XML Queries . 14 2.1.4 XPath Query Containment • • • • 15 2.2 XML Encryption 16 2.3 K-anonymity 16 3 Query Evaluation over Access-controlled XML Databases . 20 3.1 Related Work 22 3.1.1 XML Access-Control Models 22 3.1.2 Efficient Enforcement Mechanisms 24 3.1.2.1 Authorization Views 24 3.1.2.2 Access-control Index 25 3.1.2.3 Static Analysis 26 3.1.2.4 Query Evaluation and Rewriting 26 3.2 Model and Problem 27 3.2.1 Security and Authorization Model 27 Contents v 3.2.2 Multi-level Security and Authorization 28 3.2.3 SC Annotations and Problem Statement 29 3.3 Definitions and Conventions 30 3.4 DAG-structured DTD Graphs 31 3.4.1 Root Rule • 32 3.4.2 Parent-Child(PC) Rule 33 3.4.3 Child-Parent(CP) Rule 34 3.4.4 Cousin-cousin(CC) Rule 34 3.4.4.1 Unconstrained Nodes 35 3.4.4.2 Constrained Nodes 36 3.4.5 Putting It All Together • 40 3.5 DAG-Structured DTDs with Disjunction 43 3.5.1 The Challenges 43 3.5.2 Optimizing against Disjunction DTDs . 45 3.5.3 Modified Rules 46 3.5.4 Algorithm DisjunctionOpt 48 3.5.5 Theoretical Results of Disjunction DTDs 49 3.6 Experiments 49 3.6.1 Experimental Setup 50 3.6.2 Impact of Optimization 50 3.6.3 Optimization Overhead 52 3.7 Summary 53 4 Query Evaluation over Encryption XML Databases 56 4.1 Related Work 59 4.2 Security Issues 60 4.2.1 Encryption Scheme 60 4.2.2 Security Constraints 61 4.2.3 Prior Knowledge and Attack Model 62 4.2.4 Security Definitions 63 4.3 Encryption Scheme 66 4.3.1 Secure Encryption Scheme 66 4.3.2 Optimal Secure Encryption Scheme 69 4.4 Metadata on Server 70 4.4.1 Structural Index 70 4.4.2 Representing DSI index 72 4.4.3 Value Index 76 4.4.3.1 Order-preserving Encryption with Splitting, Scal• ing and Noise 77 4.5 Query Processing • • 84 4.5.1 Query Translation at the Client 84 4.5.2 Query Translation at the Server 85 4.5.3 Security of Query Answering 86 4.5.4 Query Post-Processing 90 4.6 Activities on Client Side 91 4.6.1 Key Management 91 Contents vi 4.6.2 Client's Benefit 92 4.7 Experimental Evaluation 92 4.7.1 Experimental Setup 93 4.7.2 Our Approach vs. Naive Method ; . 94 4.7.3 Division of Work between Client and Server 94 4.7.4 Effect of Query Size 95 4.7.5 Effect of Various Encryption Schemes 95 4.8 Summary 96 5 Probabilistic Analysis of Privacy Leakage of Published Views 100 5.1 Related Work 103 5.2 Security Model 104 5.2.1 Private Association 104 5.2.2 Attack Models 104 5.2.2.1 Unrestricted Attack Model 104 5.2.2.2 Restricted Attack Model 105 5.2.2.3 Probability of Privacy Breach 106 5.3 Connectivity Graph 106 5.3.1 Two-View-Table Case 107 5.3.2 K-View-Table Case 108 5.4 Measurement of Probability of Privacy Breach 108 5.4.1 Two-View-Table Case 109 5.4.2 K-view-table Case 113 5.5 Summary 114 6 Privacy and Utility Preserving Data Publishing by Views . 115 6.1 Related Work 116 6.2 Security Model 117 6.2.1 Private and Public Associations 117 6.2.2 Association Cover, Privacy, and Utility 119 6.3 Faithful View Schemes 121 6.4 Connectivity Graph 122 6.4.1 The Concept 122 6.4.2 Traversing Connectivity Graph 125 6.5 View Lattice 126 6.5.1 The Concept 126 6.5.2 Properties of View Lattice 129 6.5.2.1 Monotonicity 129 6.5.2.2 Equivalence 130 6.5.3 Optimization of Search Strategy 133 6.6 Experiments 135 6.6.1 Experimental Setup 135 6.6.2 Size of Connectivity Graph 136 6.6.3 Locally Optimal Solution vs. Globally Optimal Solution . 137 6.6.4 Performance 137 6.6.5 Optimization 138 Contents vii 6.7 Summary 139 7 Conclusions and Future Work 140 7.1 Future Work 141 7.1.1 Efficient and Secure Query Evaluation 141 7.1.2 Privacy-preserving Data Publishing 141 Bibliography 143 A Proof 152 viii List of Figures 1.1 An Example of Encrypted XML Database 2 1.2 Example XMark-based DAG DTD .