UvA-DARE (Digital Academic Repository)

Scalable Distributed Data Structures for Database Management
Karlsson, S. J.

Publication date: 2000
Document version: Final published version

Citation for published version (APA):
Karlsson, S. J. (2000). Scalable distributed data structures for database management.


Scalable Distributed Data Structures for Database Management

Academic dissertation (Academisch Proefschrift) for the degree of doctor at the Universiteit van Amsterdam, by authority of the Rector Magnificus, prof. dr. J. J. M. Franse, to be defended in public before a committee appointed by the Doctorate Board, in the Aula of the University, on Thursday 14 December 2000 at 11:00, by Jonas S Karlsson, born in Enköping, Sweden.

Promotor: Prof. Dr. M. L. Kersten
Faculteit: Faculteit der Natuurwetenschappen, Wiskunde en Informatica

The research reported in this thesis has been partially carried out at the University of Linköping, Sweden, within the Engineering Database Lab, a research group at the Department of Computer and Information Science at Linköping Institute of Technology.

The research reported in this thesis has been partially carried out at CWI, the Dutch national research laboratory for mathematics and computer science, within the theme Data Mining and Knowledge Discovery, a subdivision of the research cluster Information Systems.

The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Graduate School for Information and Knowledge Systems.
SIKS Dissertation Series No. 2000-11
ISBN 90 6196 498 9


Contents

I  Scalable Distributed Data Structures  19

1  Preliminaries  21
   1.1  Birthground of SDDSs  21
   1.2  SDDSs  22
   1.3  Requirements from SDDSs  22
   1.4  Data Structures - Background  23
        1.4.1  Retrieval Methods  23
        1.4.2  Reasonable Properties  24
        1.4.3  Basic Data Organization  25
        1.4.4  Memory Management/Heaps  26
        1.4.5  Linked Lists  27
        1.4.6  Chained/Closed Hashing  27
        1.4.7  Trees  27
        1.4.8  Signature Files  28
   1.5  Roundup  29
   1.6  LH* (1 dimensional data)  29
        1.6.1  LH* Addressing Scheme  30
        1.6.2  LH* File Expansion  31
        1.6.3  Conclusion  32
   1.7  Orthogonal Aspects  33
        1.7.1  Performance  33
        1.7.2  Dimensions  33
        1.7.3  Overhead  34
        1.7.4  Distribution and Parallelism  34
        1.7.5  Availability  35

2  The LH*LH Algorithm  37
   2.1  Introduction  37
   2.2  The Server  38
        2.2.1  The LH Manager  38
        2.2.2  LH* Partitioning of an LH File  39
        2.2.3  Concurrent Request Processing and Splitting  42
        2.2.4  Shipping  42
   2.3  Notes on LH*LH Communications  43
        2.3.1  Communication Patterns  44
   2.4  LH*LH Implementation  45
        2.4.1  The System Initialization  45
        2.4.2  The Data Client  46
        2.4.3  The Server  50
        2.4.4  Server Mapping  52
        2.4.5  Summary and Future Work  54
        2.4.6  Host for Scientific Data  55
   2.5  Hardware Architecture  56
        2.5.1  Communication  57
        2.5.2  Measure Suite  58
   2.6  Performance Evaluation  58
        2.6.1  Scalability  59
        2.6.2  Efficiency of Concurrent Splitting  64
   2.7  Curiosity  68
   2.8  Conclusion  69

3  SDDS for High-Performance Spatial Access  71
   3.1  Introduction  72
   3.2  hQT* Overview  73
        3.2.1  Records  74
        3.2.2  Pseudokey Construction  74
        3.2.3  Bucket Numbering  75
        3.2.4  Addressing  75
        3.2.5  File Growth  77
   3.3  Distribution in hQT*  77
        3.3.1  Distribution (ForwardBuckets)  78
        3.3.2  Distributed Point queries  78
        3.3.3  Distributed Region Queries  78
        3.3.4  IAM Policies  79
   3.4  Server Splitting  80
        3.4.1  hQT* Splitting  80
        3.4.2  Dissection Splitting Algorithm  80
   3.5  Measurements  81
        3.5.1  Efficiency of IAM Policies  81
        3.5.2  Server Load Distribution  85
        3.5.3  Discussion  86
   3.6  Conclusions  87

4  Ω-storage: Multi-Attribute Storage  89
   4.1  Introduction  90
   4.2  Related Work  90
   4.3  The Ω-storage  92
        4.3.1  Buckets and Branch nodes  92
        4.3.2  An Example  93
        4.3.3  Point Searching  94
        4.3.4  Splitting Strategy  94
   4.4  Performance evaluation and Tuning  95
        4.4.1  Bucket Size vs Pruning  96
        4.4.2  Insert costs  97
        4.4.3  Search cost for a growing data set  98
        4.4.4  Influence of Number of Attributes  100
        4.4.5  Comparison with kd-tree  100
   4.5  Exploration of the Ω-tree design space  101
        4.5.1  Branches  101
        4.5.2  Dynamic hash-function  103
        4.5.3  Explored variants of Ω-trees  103
        4.5.4  Implementation Notes  104
   4.6  Conclusions  105

II  Applications of SDDSs  107

5  Database Systems  111
   5.1  The Need for High Performance Databases  111
   5.2  Conventional Databases  113
   5.3  Distributed Databases  113
   5.4  Federated Databases  114
   5.5  Multidatabases  114
   5.6  Data Servers  115
   5.7  Parallel Data Servers  115
   5.8  Database Machines  117
   5.9  Overview of Some Data Servers  118
   5.10  DB History  120
   5.11  Conclusions  121
   5.12  Properties of Structures for Servers  122
        5.12.1  The Problem  122
        5.12.2  Scalability  122
        5.12.3  Distribution  123
        5.12.4  Availability  124
        5.12.5  Conclusions  124

6  Scalable Distributed Storage Manager  125
   6.1  Seamless SDDS integration in an Extensible DBMS  125
   6.2  Introduction  126
   6.3  Background  127
        6.3.1  Scalable Distributed Data Structures  127
        6.3.2  Monet  128
   6.4  SDDS within an Extensible Database System  128
        6.4.1  SDDS requirements on a DBMS  128
        6.4.2  Resource Management  129
        6.4.3  SDDS Administration  129
        6.4.4  Algebraic Operations  130
   6.5  Implementation and Performance Study  131
        6.5.1  Optimal Size of a Distributed Partition  132
        6.5.2  Overhead added by SDDSs  134
        6.5.3  Performance Scalability  134
        6.5.4  Discussion  136
   6.6  Summary  137

7  Summary & Future Issues  139
   7.1  Summary  139
   7.2  Extensions to this thesis  139
   7.3  Future work  140


List of Figures

1.1  LH* File Expansion Scheme  30
2.1  The Data Server  39
2.2  The LH-structure  40
2.3  Pseudo-key usage by LH and LH*  40
2.4  Partitioning of an LH-file by LH* splitting  41
2.5  One node on the Parsytec machine  56
2.6  Static routing on a 64 nodes machine between two nodes  57
2.7  Allocation of servers and clients  58
2.8  Build time of the file for a varying number of clients  59
2.9  Global insert time measure at one client, varying the number of clients  60
2.10  Actual throughput with varying number of clients  61
2.11  Ideal and actual throughput with respect to the number of clients  62
2.12  Comparison between Static and Dynamic splitting strategy, one client  63
2.13  Comparison between Static and Dynamic splitting, with four clients  63
2.14  Efficiency of individual shipping  64
2.15  Efficiency of bulk shipping  65
2.16  Efficiency of the concurrent splitting  66
2.17  LH*LH client insert time scalability  67
3.1  The Record Structure  74
3.2  An offset space-filling curve, first 3 layers  75
3.3  Navigation in a) Quad-Tree b) hQT*  76
3.4  Left: hQT* file key space partitioning by 4 successive splits. Right: The equivalent quad-tree  77
3.5  Split Dissection Algorithm  81
3.6  a) Forward message count using different policies on servers 1, 2, 3 and 31. b) Only the 3 most efficient strategies  84
3.7  Split distribution over "time"  86
4.1  A bucket of an Ω-tree and its attributes  92
4.2  A "typical" Ω-marshaled tree  93
4.3  Bucket Split Algorithm  95
4.4  Varying a) search times b) insert times  97
4.5  Search time using a) 1, 2, 3, 8 attributes in Ω-tree b) details of 8 attribute  98
4.6  a) Search times b) 8, 16 attributes files c) 8 attribute file e) 16 attribute file  99
4.7  8 attribute file, standard deviation for patterns search a) The kd-tree and Ω-tree 10% compared with the Ω-tree 20% b) KD compared with Ω-tree with acceptance limit of 10% and 20%  101
5.1  Data and application servers  116
6.1  Local memory on one node, varying sizes of data  133
6.2  One node, distributed access, varying sizes  134
6.3  Constant sized relation 8 MBytes, varying number of nodes.