Parallelizing the Data Cube

PARALLELIZING THE DATA CUBE

By
Todd Eavis

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
AT DALHOUSIE UNIVERSITY
HALIFAX, NOVA SCOTIA
JUNE 27, 2003

© Copyright by Todd Eavis, 2003

DALHOUSIE UNIVERSITY
DEPARTMENT OF COMPUTER SCIENCE

The undersigned hereby certify that they have read and recommend to the Faculty of Graduate Studies for acceptance a thesis entitled “Parallelizing the Data Cube” by Todd Eavis in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Dated: June 27, 2003
External Examiner: Virendra Bhavsar
Research Supervisor: Andrew Rau-Chaplin
Examining Committee: Qigang Gao, Evangelos Milios

DALHOUSIE UNIVERSITY

Date: June 27, 2003
Author: Todd Eavis
Title: Parallelizing the Data Cube
Department: Computer Science
Degree: Ph.D.
Convocation: October
Year: 2003

Permission is herewith granted to Dalhousie University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions.

Signature of Author

THE AUTHOR RESERVES OTHER PUBLICATION RIGHTS, AND NEITHER THE THESIS NOR EXTENSIVE EXTRACTS FROM IT MAY BE PRINTED OR OTHERWISE REPRODUCED WITHOUT THE AUTHOR’S WRITTEN PERMISSION.

THE AUTHOR ATTESTS THAT PERMISSION HAS BEEN OBTAINED FOR THE USE OF ANY COPYRIGHTED MATERIAL APPEARING IN THIS THESIS (OTHER THAN BRIEF EXCERPTS REQUIRING ONLY PROPER ACKNOWLEDGEMENT IN SCHOLARLY WRITING) AND THAT ALL SUCH USE IS CLEARLY ACKNOWLEDGED.

To the two women in my life: Amber and Bailey.

Table of Contents

Table of Contents
List of Tables
List of Figures
Abstract
Acknowledgements

1 Introduction
  1.1 Overview of Primary Research
    1.1.1 Parallelizing the Data Cube
    1.1.2 Computing Partial Cubes in Parallel
    1.1.3 Parallel Multi-dimensional Indexing
  1.2 A Look Ahead

2 An Introduction to Parallel Computing
  2.1 Introduction
  2.2 A Taxonomy of Parallel Architectures
  2.3 The Memory Model
    2.3.1 Shared Memory MIMD
    2.3.2 Distributed Memory MIMD
  2.4 The Interconnection Fabric
    2.4.1 Dynamic Interconnection Networks
    2.4.2 Static Interconnection Networks
  2.5 Contemporary Trends
    2.5.1 The Symmetric Multi-Processor
    2.5.2 The Cluster Alternative
      2.5.2.1 Remaining Hurdles
  2.6 Parallel Computing Models
    2.6.1 The PRAM
    2.6.2 Bulk Synchronous Parallel
    2.6.3 LogP
    2.6.4 CGM
  2.7 Performance Measurement
    2.7.1 Non-Optimality
    2.7.2 Scalability of Parallel Algorithms
  2.8 Application Support
    2.8.1 MPI Primitives
    2.8.2 MPI Alternatives
    2.8.3 SMP Support
  2.9 Machine Models Used in This Thesis
  2.10 Workload Partitioning and NP-Completeness
    2.10.1 The Theory of NP-completeness
  2.11 Conclusion

3 An Introduction to OLAP and the Data Cube
  3.1 Introduction
  3.2 Decision Support Systems
    3.2.1 The Historical Context of OLAP
  3.3 Defining OLAP
    3.3.1 OLAP: A Functional Definition
    3.3.2 OLAP: The FASMI Definition
  3.4 The Data Warehouse
    3.4.1 Architecture
    3.4.2 The Star Schema
    3.4.3 MOLAP, ROLAP and Multi-dimensional Data
  3.5 The Data Cube
    3.5.1 The Data Cube Operator
  3.6 Data Cube Algorithms
    3.6.1 Top Down
    3.6.2 Bottom Up
    3.6.3 Array-based
  3.7 Multi-dimensional Indexing Techniques
    3.7.1 The Origin of Indexing
    3.7.2 In-core methods
    3.7.3 Disk-based Methods
      3.7.3.1 Multi-dimensional Hashing
      3.7.3.2 Hierarchical Tree-based Methods
      3.7.3.3 Space-filling Curves
    3.7.4 Comparative Results
  3.8 Conclusion

4 Computing Full Data Cubes in Parallel
  4.1 Introduction
  4.2 Related Work
  4.3 Motivation
  4.4 A New Approach to Parallelizing the Data Cube
    4.4.1 The Target Architecture
    4.4.2 A Sequential Base
    4.4.3 Partitioning for Parallel Computation
    4.4.4 The Parallel PipeSort Algorithm
  4.5 Optimizing Performance
    4.5.1 Optimizing Sorting Operations
    4.5.2 Data Movement
    4.5.3 Aggregation Operations
    4.5.4 Input/Output Patterns
  4.6 The Costing Model
    4.6.1 Cuboid Size Estimation
      4.6.1.1 Cardinality-based Estimation
      4.6.1.2 Sample Scaling
      4.6.1.3 A Probabilistic Method
      4.6.1.4 Our Own Probabilistic Approach
    4.6.2 Pipeline Cost Estimation
      4.6.2.1 Input/Output
      4.6.2.2 Scanning
      4.6.2.3 Sorting
    4.6.3 Putting it all together
  4.7 Implementation
    4.7.1 Generating Data Cube Input
  4.8 Analysis
    4.8.1 The Scheduling Phase
    4.8.2 Workload Partitioning
  4.9 Experimental Evaluation
    4.9.1 Parallel Speedup
    4.9.2 Data Set Size
    4.9.3 Dimension Count
    4.9.4 Over-Sampling Factor
    4.9.5 Record Skew
    4.9.6 Pipeline Performance
  4.10 Review of Research Objectives
  4.11 Conclusions

5 Computing Partial Cubes in Parallel
  5.1 Introduction
  5.2 Related Work
  5.3 Motivation
  5.4 A New Partial Cube Method
    5.4.1 Adding Non-Essential Nodes to the Selected Set
    5.4.2 Building the Complete Schedule Tree
  5.5 Analysis and Optimization
    5.5.1 Complexity
    5.5.2 Reducing the Cost of Building the Essential Tree
      5.5.2.1 Recursive Pipeline Generation
      5.5.2.2 An Aggressive Quadratic Time Algorithm
    5.5.3 Reducing the Cost of Adding Non Essential Views
    5.5.4 Extending the Algorithm into High Dimensions
  5.6 Parallel Partial Data Cubes
  5.7 Experimental Evaluation
    5.7.1 Evaluation of Schedule Tree Generation Algorithms
      5.7.1.1 Quality of Generated Trees
      5.7.1.2 Run Time Performance on the Full Cube
      5.7.1.3 Computing Partial Cubes
      5.7.1.4 Addition of Non-Essential Views
      5.7.1.5 Pruning the Guiding Graph
    5.7.2 Performance of the Parallel Partial Cube Algorithm
  5.8 Review of Research Objectives
  5.9 Conclusions

6 Distributed Data Cube Indexing
  6.1 Introduction
  6.2 Related Work
    6.2.1 Sequential ROLAP Indexing
      6.2.1.1 The R-tree ...

Two-Level Main Memory Co-Design: Multi-Threaded Algorithmic Primitives, Analysis, and Simulation

Minimizing Writes in Parallel External Memory Search

The Parallel Persistent Memory Model

Implementing Operational Intelligence Using In-Memory Computing

Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures

High Performance Computing – Past, Present and Future

Experiments with a Parallel External Memory System

Fundamentals – Parallel Architectures, Models, and Languages

The Efficiency of MapReduce in Parallel External Memory

The Power of In-Memory Computing: from Supercomputing to Stream Processing

On the Complexity of List Ranking in the Parallel External Memory Model

15. Hierarchical Models and Software Tools for Parallel Programming