Parallel Algorithms in External Memory

By

David Alexander Hutchinson

A thesis submitted to

the Faculty of Graduate Studies and Research

in partial fulfilment of

the requirements for the degree of

Doctor of Philosophy

Ottawa-Carleton Institute for Computer Science

School of Computer Science

Carleton University

Ottawa, Ontario

May

Includes revisions as of February

The original version of this thesis is available from University Microfilms International

© Copyright

David Alexander Hutchinson

Abstract

External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. The Parallel Disk Model (PDM) of Vitter and Shriver is widely used to discriminate between external memory algorithms on the basis of input/output (I/O) complexity. Parallel algorithms are designed to efficiently utilize the computing power of multiple processing units interconnected by a communication mechanism. A popular model for developing and analyzing parallel algorithms is the Bulk Synchronous Parallel (BSP) model due to Valiant.

In this work we develop simulation techniques, both randomized and deterministic, which produce efficient EM algorithms from efficient algorithms developed under BSP-like models. Our techniques can accommodate one or multiple processors on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine.

We propose new, more comprehensive models for EM and parallel algorithms which consider the total costs incurred by the algorithm, including computation, I/O, and communication. The new EM-BSP, EM-BSP*, and EM-CGM models combine the features of the BSP and PDM, and thereby answer a challenge posed by the ACM Working Group on Storage I/O for Large-Scale Computing.

We obtain parallel external memory algorithms for a large number of problems, including sorting, permutation, matrix transpose, geometric and GIS problems (including convex hulls and Voronoi diagrams), and various graph problems.

Acknowledgements

I would like to thank all of the people who have helped me to complete the work described in this thesis. Knowing that I cannot thank everyone who has helped me in my studies, and fearing that I would forget someone who has helped a great deal, I have procrastinated as usual in writing these acknowledgements. However, the time has come, so I hope that those who I have not listed will forgive me. I am sure that I will remember all of you tomorrow, after the publication deadline has passed.

I would like to thank my three advisors, Frank Dehne, Anil Maheshwari, and Jörg-Rüdiger Sack, for their patience, advice, and for their confidence that I would eventually finish this work.

In particular, I want to thank my initial advisor, Jörg-Rüdiger Sack, for taking a gamble on me, for the excellent infrastructure and financial support he provided, and for his critical advice, both on technical and non-technical issues.

I would like to thank Frank Dehne for his help and encouragement in writing papers, and for his unflagging enthusiasm for our results and my prospects.

To Anil Maheshwari I offer my gratitude for the many hours we spent discussing Computer Science and other things over the past several years, and for his unfailing willingness to do so on an impromptu basis. Without his day-to-day help, encouragement, and quiet confidence in our work, I would not have been successful.

I would like to express my appreciation to Wolfgang Dittrich for his collaboration on many nice results, and for showing me the power of mathematics in expressing ideas. It has been a real pleasure working with him.

To my external examiner, Abhiram Ranade, and the other members of my examination committee, Thomas Kunz, Ekow Otoo, and Ivan Stojmenovic, thanks are due for their hard work in reading this thesis and for the useful suggestions they made.

I would like to thank my other coauthors, Mark Lanthier, Doron Nussbaum, David Roytenberg, Radu Velicescu, and Norbert Zeh, and also Patrick Morin and Kumanan Yogaratnam, for many fruitful discussions and other cooperative work. I also would like to thank Lyudmil Alexandrov for many interesting discussions and for his suggestions for improving the algorithms in Chapter.

For their encouragement, comments, and questions which led me to new ideas, I would like to acknowledge the help of Lars Arge, Tom Cormen, and Jeff Vitter. To Lars, thanks also for a comprehensive BibTeX file on external memory algorithms. This was a great help to me.

I would like to thank the people of the School of Computer Science for the excellent working environment during my studies. Particular thanks go to Rosemary Carter, Barbara Coleman, Linda Pfeiffer, Maureen Sherman, and Marlene Wilson. Thanks also to the technical staff of SCS, who kept the machines running and put up with my many suggestions.

To all of my faculty, graduate, and undergraduate colleagues who I have worked with as part of the Paradigm Group, I offer my appreciation for your contributions to a great working environment. Particular thanks go to Mark Lanthier for organizing the coke fund, the pizza schedule, and the hockey games, and to Doron Nussbaum and Pat Morin for organizing the pizza talks.

Finally, I must acknowledge the help and support of a group of people that have made the crucial difference between success and failure for me. First, I would like to thank my parents for instilling the desire to stretch for new and wonderful things, and for their unconditional support in all of my endeavours. Lastly, I offer very special thanks to my wife Gail and our children Lauren, Julia, Steven, and Jessica, for their support and encouragement, for their cheerfulness, and for their love and affection. I could not have continued without your help.

David Hutchinson

Ottawa

May

This research was partially supported by Almerco Inc., by the Natural Sciences and Engineering Research Council of Canada, and by pTeran Computing Inc.

Contents

Abstract

Acknowledgements

Introduction

Overview

Motivation

What Makes a Good EM Algorithm

Effectiveness of EM Techniques

Fundamental Assumptions for Implementing Efficient I/O Algorithms

Basic Data Formats on Multiple Disks

Previous EM Research

A Brief Survey

External Memory Paradigms

Batch Filtering

Distribution Sweeping

Data Structuring

Simulation

Lower Bounds on I/O Complexity

Sorting Lower Bound

Permutation Lower Bound

Matrix Transpose Lower Bound

A General Lower Bounding Technique

Realistic Parameter Ranges

The Role of RAIDs

Storage Space

Preliminaries, Notation and Terminology

Asymptotic Notation

BSP Models

A Central Issue: Communication Bottlenecks

Papers Contributing to the Thesis

Claim to Originality or Contribution to Knowledge

Collaborations

Contributions

Organization of the Thesis

Models

Overview

Organization of This Chapter

EM Models

Aggarwal and Vitter Single Disk Model

The Parallel Disk Model (PDM)

Hierarchical Memory Models

Parallel Computation Models

The Parallel Random Access Memory (PRAM) Model

The Bulk Synchronous Parallel (BSP) Model

The Extended BSP (BSP*) Parallel Processing Model

The Coarse Grained Multicomputer (CGM) Model

Other Parallel Computing Models

Combining PDM and BSP-like Models

Objective: A More Comprehensive Model

The EM-BSP Model

System Model

Cost Model

The EM-BSP* Model

The EM-CGM Model

Goodness Criteria for Parallel EM Algorithms

Extending the c-Optimality Criterion

Alternative Goodness Criteria for EM-BSP Algorithms

EM-BSP Algorithms from Simulation

Overview of the Method

Overview and Rationale

Randomized Simulation

Single Processor Target Machine

Multiple Processor Target Machine

Deterministic Simulation

Introduction

Balancing Communication

Single Processor Target Machine

Multiple Processor Target Machine

Summary

Deterministic Simulation with Lower Slackness

Motivation

Overview

The LowSlackRouting

The Parallel Algorithm NumberBlocks

Analysis of LowSlackRouting

The EM Algorithm EMLowSlackRouting

The EM Algorithm EMNumberBlocks

Analysis of EMLowSlackRouting

Single Processor Target Machine

Multiple Processor Target Machine

Problems of Geometrically Changing Size

Introduction

Reducing I/O Overheads

New EM Algorithms

Overview

Apparent Reductions in I/O Complexity

Fundamental Problems

Sorting

Permutation

Matrix Transpose

Summary

The Multisearch Problem

Overview

Multisearch on the EM-CGM Model

EM-BSP Multisearch

EM-BSP Multisearch on a Small Tree

EM-BSP Multisearch on a Large Tree

Summary

Other New External Memory Algorithms

Experiments

Buffer Tree Case Study

Introduction

Description of the Buffer Tree

Implementation of the Buffer Tree

Evaluating the Implementation

Conclusions and Lessons Learned

Simulation Case Study

Introduction and Motivation

An EM Runtime System for Parallel Algorithms

One Real Processor

Multiple Real Processors

A Deterministic EM-CGM Samplesort

Experimental Results

Discussion and Conclusions

Conclusions and Future Work

A Useful Probabilistic Results

A Tail Estimates

A Balancing Lemmas

A Summary of Notation and Index

Bibliography

List of Figures

A Disk Drive

Layout of Disk Blocks on a Single Processor Machine

Layout of Disk Blocks on a Multiple Processor Machine

The Parallel Disk Model with One Processor

The Parallel Disk Model with Multiple Processors

A BSP Computation

A Parallel Machine with External Memory

The memory of a virtual processor

Reorganization in Step of SimulateRouting

Illustration of maximum imbalance in the bin sizes of a processor during superstep A

Illustration of the Messaging Matrix

The division of the |S| processors into v groups

The redistribution of blocks for a given bucket

The tree of processors which numbers the blocks in bucket

The tree for the numbering of blocks in bucket

Imbalance in Item Assignment to Processors

The assignment of messages to local disks on a real destination processor

Imbalance in Data Received by the Real Processors

Overview of New EM Algorithms

Multisearch example

Overview of New EM Algorithms (continued)

Overview of New EM Algorithms (continued)

The buffer tree

Timings for Internal Quicksort and Buffer Tree Sort on Random Inputs

Timings for Buffer Tree Sort and TPIE

Number of Block Pushes for Buffer Tree Sort on Random Inputs

Timings for Buffer Tree Sort with Smaller Fanout

Timings for Buffer Tree Sort with Larger Fanout

Pipelining sort with generation of inputs

Running Times of PDM and Iostream I/O Access Methods

Stevens' measurements on the effects of varying the block size

Organization of EM Library Services

System Design of the External Memory Library

Running times for CGM SampleSort and EM-CGM SampleSort

Chapter

Introduction

Overview

Motivation

In recent years, computer processor speeds have increased dramatically, computer storage device capacities have increased, prices have dropped, and applications have grown in their demands on storage resources. Disk speeds have not kept pace with main memory speeds, however. The bottleneck between main memory and external memory (EM) devices such as disks threatens to limit the size of very large scale applications. While internal memories have also become much more reliable, cheaper, and larger, we see many new and upsized applications that demand more storage than is feasible through internal memories. The area of external memory (EM) algorithm design aims to create algorithms which perform efficiently when the internal memory of the computer is significantly smaller than the storage required for the application data. Such applications require the bulk of their data to reside in external memory, since for practical machines main memory is only a fraction of the required size.

Applications in geographic information systems (GIS), astrophysics, virtual reality, computational biology, VLSI design, weather prediction, computerized medical treatment, simulation and modelling, visualization, and computational geometry fall naturally into this category. Parallel disk I/O and parallel computing have been identified as critical components of a suitable high performance computer for a number of the Grand Challenge problems; see for instance the Scalable I/O Initiative project.

The time required to perform an I/O operation is typically several orders of magnitude larger than the time for an internal CPU operation. In addition, the time to set up the data transfer (disk head movement, rotational delay) is much larger

CHAPTER INTRODUCTION

Input: We are given a list of N items to be permuted, where N is much larger than the number of items M that fit into the internal memory of the computer. The permutation to be applied is given as a list of N values.

Output: The permuted list of N items is to be output.

Assumptions: We assume that N is items (GB), M is items, the average time for a disk access is seconds, and the disk block size B is items.

The inputs, the specified permutation, and the resulting output list must all be stored on disk (external memory). In internal memory the output can be computed in O(N) time. If we apply the same approach to our problem, however, we expend O(N) I/Os in the worst case, or I/O time of c1 · N seconds for some constant c1. However, it has been shown that permutation in external memory requires only O((N/B) log_{M/B}(N/B)) I/Os, giving a time of c2 · (N/B) log_{M/B}(N/B) seconds for some constant c2. So, assuming similar constants c1 and c2, we have the EM algorithm for permutation running about B times faster than the internal memory approach. This illustrates the most important factor in a good EM algorithm, which is taking full advantage of the disk blocking factor B.

Example: Permutation in External Memory
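The I/O counts in the example above can be compared numerically. Since the example's actual parameter values are not reproduced here, the figures below (N, M, B, and the per-access time t) are hypothetical stand-ins, chosen only to illustrate the roughly B-fold advantage of the blocked approach; a minimal sketch:

```python
import math

def naive_ios(N):
    # Worst case for the internal-memory approach run on disk: one I/O per item.
    return N

def em_permute_ios(N, M, B):
    # EM permutation bound from the example: (N/B) * log_{M/B}(N/B) I/Os.
    return (N / B) * math.log(N / B, M / B)

# Hypothetical parameters (the thesis's actual values are not shown here).
N = 10**9      # problem size in items
M = 10**6      # items fitting in internal memory
B = 10**3      # items per disk block
t = 0.01       # assumed seconds per disk access

speedup = naive_ios(N) / em_permute_ios(N, M, B)
print(f"naive I/O time:   {naive_ios(N) * t:.0f} s")
print(f"blocked I/O time: {em_permute_ios(N, M, B) * t:.0f} s")
print(f"speedup: {speedup:.0f}x  (B = {B}, "
      f"log factor = {math.log(N / B, M / B):.1f})")
```

With these numbers the logarithmic factor is small (here 2), so the speedup is about B divided by a small constant, consistent with the "about B times faster" claim in the example.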

than the time to actually transfer the data. Data items are grouped into blocks and accessed blockwise by efficient external memory algorithms, in order to amortize the setup time over a large number of data items. Algorithms developed originally for internal memory models, such as the RAM and PRAM models, are frequently found to be inefficient in the EM environment for this reason. Such algorithms typically have poor locality of reference, causing too many I/O operations to be performed during the execution of an application.
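The amortization argument above can be made concrete. With a setup time s (head movement plus rotational delay) and a per-item transfer time t, one blocked I/O of B items costs s + B·t, so the cost per item is s/B + t. The timings below are hypothetical, chosen only to show how a larger B dilutes the setup cost:

```python
def per_item_cost(s, t, B):
    # One blocked I/O costs s + B*t; amortized over the B items
    # it moves, each item pays s/B + t.
    return (s + B * t) / B

# Hypothetical timings: 10 ms setup, 1 microsecond per item transferred.
s, t = 10e-3, 1e-6
for B in (1, 100, 10_000):
    print(f"B = {B:6d}: {per_item_cost(s, t, B) * 1e6:10.1f} us per item")
```

As B grows, the per-item cost falls from being dominated by the setup time toward the raw transfer time t, which is why EM algorithms strive to make every disk access move a full block of useful data.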

Prior to the evolution of a distinct EM methodology, large problems were often handled via virtual memory techniques, which used demand paging to swap pages between disk and internal memory. Unfortunately, such techniques are heuristic, and the performance can be very poor in difficult cases. As motivation for the study of external memory algorithms, the simple problem of permuting N numbers is offered as an example; see the example above.

As a result, researchers have developed special algorithms optimized for EM computation. In many cases these have been hand-crafted, so the choices for an application developer have been somewhat limited as a result. One objective of the current


work, therefore, is to identify ways of transferring classes of existing algorithms to the EM domain, and thereby take advantage of existing work for this new purpose. Our focus is on algorithms which are provably optimal by some well-defined criterion.

What Makes a Good EM Algorithm

Informally, a good EM algorithm makes full use of the resources at its disposal. This includes consideration of the following factors:

the disk blocking factor B, as described above;

the available internal memory, captured by the parameter M, the number of data items that fit into internal memory;

the ability to perform I/O concurrently to all D available disk drives;

full utilization of all p available processors; and

efficient use of the communication bandwidth available between processors.

Most existing EM algorithms are based on what has become known as the Parallel Disk Model (PDM), proposed originally by Vitter and Shriver. For optimality, this model requires that EM algorithms use asymptotically the minimum number of disk operations required to solve a problem in the worst case. We will refer to this as the I/O-optimality criterion. Of course, the running time of an algorithm is a less abstract and more directly relevant issue. While I/O can be the most costly component of an external memory computation if it is not done efficiently, other factors such as computation and communication time can be equally dominant if their costs are not controlled. We will describe new work towards an alternative criterion which includes other costs, such as computation time and, in the case of parallel processors, the communication time required by an algorithm.

There are interesting similarities between parallel algorithms for certain kinds of parallel models and EM algorithms. We will outline a class of parallel algorithms that also lend themselves to use or simulation as efficient external memory algorithms. Parallel algorithms break a problem into subproblems on which computations can proceed independently for some period of time before synchronization (i.e., communication) is required between the processors. Between synchronization events, references to data by a particular processor are restricted to data items currently residing in the local memory of that processor. This is possible because the subproblems in a given parallel step are independent.

In the case of fine grained parallel algorithms, the processors may proceed independently for only one machine instruction before data is transferred between them.


Such a situation may occur with algorithms based on the various PRAM models, for instance. In contrast, parallel computation models such as the Coarse Grained Multicomputer (CGM) and Extended Bulk Synchronous Parallel (BSP*) provide incentives for coarser granularity in the computation, encouraging a large number of machine instructions to be executed by each processor before interprocessor communication occurs. In this thesis we distinguish between two types of granularity: granularity of the computation, as just described, and granularity of the communication that occurs between processors. We find that the attribute of coarse grained communication in an algorithm allows us to model the communication between processors in the parallel algorithm directly as blocked I/O to and from external memory in a corresponding EM algorithm. The presence of coarse grained computation limits the number of synchronization steps required overall, and therefore limits the number of context swaps we need to do when simulating an algorithm. Coarse grained computation is also known in the parallel computing literature as large slackness. The degree of slackness of an algorithm is the ratio N/p, where N is the problem size and p is the number of processors.

Previous EM research has focussed primarily on the sequential RAM model. Only limited exploration of parallel computing has been done. However, the domain of EM algorithms is tied to large problems, and so the applicability of parallel algorithms is clear. The relevance and timeliness of combining EM and parallel BSP-like models is highlighted by the recent publication of a challenge to this effect. In this position statement, Cormen and Goodrich make note of the similarities between parallel and EM computing, and the desirability of creating a parallel EM model based on a BSP-like model and the Parallel Disk Model (PDM).

Effectiveness of EM Techniques

Few implementation studies have been done to determine the performance of EM algorithms developed over the past years. The few studies that have been done confirm the effectiveness of these techniques in practice. A study by Chiang reported that selected EM algorithms for computational geometry outperformed conventional techniques for large enough problem sizes, and were steady and efficient over the range of problem sizes tested. Cormen and Nicol reported large FFT calculations based on EM techniques outperforming their internal memory counterparts by a factor in practice on a single disk system, and that with disks the EM algorithms outperformed their internal memory counterparts by a factor.


Fundamental Assumptions for Implementing Efficient I/O Algorithms

Algorithms which are designed to control their I/O costs are sometimes said to be I/O-aware. Many conventional operating system services are geared towards optimizing performance for applications which are not I/O-aware. This can be counterproductive when an I/O-aware program is running. The literature on optimal EM algorithms describes many algorithms which control the details of their own I/O operations, and implicitly require that the computer operating system "get out of the way".

A declustering technique is a method by which a file of data is spread over multiple disks, either on a single processor or on a multiple processor system. Two components of such a method are the striping unit (the number of bytes, or perhaps problem items, accessed on each disk by a single parallel I/O operation) and the way in which such units are distributed over the disks. To be effective, I/O-aware programs require the following unusual abilities (Cormen and Kotz):

control over declustering; the optimal algorithms control declustering explicitly;

control over the size of the striping unit; the optimal algorithms assume that the striping unit is one block;

the ability to query the system about the configuration (number of disks, block size, number of processors, amount of physical memory, current declustering method);

control over the offset on each disk, and the ability to bypass parity operations for independent I/O (e.g., RAID; see Section); and

the ability to turn off the operating system's I/O caching and prefetching behaviours; the optimal algorithms do their own caching and prefetching.

Naturally, for benchmarking work to be accurate, it is also necessary to control the resources consumed by system tasks and other users. The implementation work of Chiang illustrated the difficulties in measuring I/O counts when the operating system was allowed to manage memory and I/O as if the program was not I/O-aware. Our own implementation work (see also Chapter) also indicates that these issues can be difficult to address properly. Item in particular proves to be a difficult one.


Figure: A Disk Drive (read/write heads, discs)

Basic Data Formats on Multiple Disks

In this section we briefly discuss the format of data on a disk drive and the access patterns which we will use.

A disk drive can be thought of as a stack of one or more rotating discs, or platters, as illustrated in Figure. On each surface of each disc, an equal number of concentric circular disk tracks are recorded. These are usually broken down further into sectors, which can be thought of as the basic units of data storage, typically bytes in size. Because tracks at the outer edges of a disc are longer than tracks near the centre, the number of sectors per track varies. Operating systems usually aggregate a number of sectors to form a disk storage block. Data is recorded onto or read from a disk drive via a set of moveable read/write heads, one per recordable surface. The heads typically move in unison, and are therefore restricted to accessing the same track on each of the discs. The tracks accessible by the heads at a particular position collectively form a cylinder. The time to access data on a disk drive is made up of rotational delay, or the time for the disk to rotate to the necessary position, plus the seek time, which is the time needed to move the heads to the appropriate cylinder.

Typical numbers for a disk drive at time of writing are exemplified by the Seagate Barracuda STFC Fibre Channel disk. Selected parameters are listed below:

FORMATTED CAPACITY (GB)

AVERAGE SECTORS PER TRACK (rounded down)

ACTUATOR TYPE: ROTARY VOICE COIL

TRACKS

CYLINDERS (user)

(Courtesy of Seagate Technology Inc., http://www.seagate.com/support)


HEADS (PHYSICAL)

DISCS

SPINDLE SPEED (RPM)

AVERAGE LATENCY (msec)

SECTORS PER DRIVE

AVERAGE ACCESS (ms, read/write; drive level, with controller overhead)

SINGLE TRACK SEEK (ms, read/write)

MAX FULL SEEK (ms)

In the example, a sector is about bytes, the time for a complete revolution of the disk is ms, and the average time to access a sector, including both rotational delay and seek time, is ms.

We will use a simplified model of a disk drive. We assume that there is only a single disc with a single recordable surface, and therefore a single head. Our disk storage block (we often simply use the term block) consumes an entire track of a disk, and therefore we think of track offset and block number on a disk as synonymous.

When multiple disks are present on a single processor, we get the situation illustrated in Figure. Data storage is similar to a matrix with three dimensions: (offset within a block, block offset within a disk, disk number). In the example, index one varies most quickly, followed by index three, and finally by index two.

When D disks are present on a single processor, we will use the following terms, illustrated via Figure. A parallel I/O is a read or write operation in which O(D) blocks are accessed simultaneously, one from each disk. A stripe is a set of D blocks, one per disk, stored at the same track offset on each disk. A striped I/O is a parallel I/O which accesses the blocks of a single stripe. An independent I/O is a parallel I/O in which the tracks accessed may differ from disk to disk. A consecutive I/O is a parallel I/O which accesses items with consecutive labels when the data is laid out as in Figure. A staggered I/O is a parallel I/O in which the block offset changes by a fixed amount between each pair of consecutive disks.

We will use the terminology consecutive format to mean that external input or output data is, or can be, accessed using consecutive I/Os. We use the terms striped format, staggered format, and independent format in a similar manner.

When multiple disks are present on multiple processors, we get the situation illustrated in Figure. Data storage is now similar to a matrix with four dimensions: (offset within a block, block offset within a disk, disk number, processor number). However, in all of our algorithms we develop independent subproblems at each processor, and a global ordering of items as in Figure is not typically required, except perhaps as a final step.


Disk number:     1      2      3      4      5      6      7      8
Stripe 0:      1,2    3,4    5,6    7,8   9,10  11,12  13,14  15,16
Stripe 1:    17,18  19,20  21,22  23,24  25,26  27,28  29,30  31,32
Stripe 2:    33,34  35,36  37,38  39,40  41,42  43,44  45,46  47,48
Stripe 3:    49,50  51,52  53,54  55,56  57,58  59,60  61,62  63,64

Figure: Layout of Disk Blocks on a Single Processor Machine

Processor number:           0                            1
Disk number:     0      1      2      3      0      1      2      3
Stripe 0:      1,2    3,4    5,6    7,8   9,10  11,12  13,14  15,16
Stripe 1:    17,18  19,20  21,22  23,24  25,26  27,28  29,30  31,32
Stripe 2:    33,34  35,36  37,38  39,40  41,42  43,44  45,46  47,48
Stripe 3:    49,50  51,52  53,54  55,56  57,58  59,60  61,62  63,64

Figure: Layout of Disk Blocks on a Multiple Processor Machine
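The consecutive layout shown in the single-processor figure (items labelled from 1, blocks of B items, D disks, stripes filled in round-robin order) can be expressed as a mapping from an item's label to its location. The sketch below is illustrative only; it assumes the single-processor layout with B = 2 and D = 8, matching the figure:

```python
def item_location(label, B, D):
    """Map a 1-based item label to (stripe, disk, offset) under the
    consecutive layout: offset within a block varies fastest, then
    disk number, then stripe (block offset within a disk)."""
    block = (label - 1) // B          # global block number, 0-based
    return (block // D,               # stripe: block offset on its disk
            block % D + 1,            # disk number, 1-based as in the figure
            (label - 1) % B)          # offset within the block

# Check a few entries against the single-processor figure (B = 2, D = 8).
assert item_location(1, 2, 8)  == (0, 1, 0)   # stripe 0, disk 1, first slot
assert item_location(16, 2, 8) == (0, 8, 1)   # end of stripe 0, disk 8
assert item_location(17, 2, 8) == (1, 1, 0)   # wraps to stripe 1, disk 1
assert item_location(64, 2, 8) == (3, 8, 1)   # last item in the figure
```

A consecutive I/O then corresponds to reading one full stripe: the D blocks at a common track offset, holding B·D consecutively labelled items.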


Previous EM Research

The development of external memory algorithms can be viewed as extending algorithm design to the case where internal memory is insufficient to contain the entire problem. Since many of the analyses that have been presented for algorithms in computer science have used asymptotic arguments that assume unit cost access for every problem item (i.e., a uniform random access memory model), there are many diverse areas where EM considerations can, and perhaps should, be applied in order to update these analyses. The number of papers dealing with EM issues has therefore been growing rapidly over the past few years, and it is difficult to provide a comprehensive survey. Vitter currently maintains the most definitive survey on EM issues, and updates it periodically as new results are published.

A Brief Survey

We list here selected research in EM with some relevance to the focus of this thesis. Some of the discussion is repeated in subsequent sections which deal with specific external memory topics.

Floyd is credited with some of the earliest work towards optimal EM algorithms. He studied matrix transposition and permutation algorithms in a paged memory with a limited set of basic operations.

Aggarwal and Vitter studied sorting on a model which involved a single disk which could read or write D blocks in one operation. They showed matching upper and lower bounds for the number of I/O operations required for sorting, FFT, matrix transpose, and permutation under this model. Their model did not constrain the locations from which the D blocks were to be accessed. In other words, the blocks could be from consecutive block locations on the disk, or they could be widely separated. In this respect the model was unrealistic, in that no physical disk system allows simultaneous transfer of D blocks under those circumstances.

Vitter and Shriver considered a more realistic model, where each of D parallel disks could simultaneously read or write a single block in a single parallel I/O operation (see Figure). This model has become known as the Parallel Disk Model (PDM). Since the previous model of Aggarwal and Vitter was more permissive, their lower bounds carry over to the new model. Vitter and Shriver concentrated on randomized algorithms relating to sorting and permutation, and introduced the first optimal randomized algorithms for sorting and related problems in EM on the PDM.

They also extended their work from two-level memories to include multilevel memory hierarchies. These papers presented some of the earliest work on parallel processing for EM. They assumed an interconnection network that could sort the records in the internal memories of the p processors in randomized logarithmic time, O((M/P) log M), where M is the total size of the internal memories and P is the number of processors. Examples include PRAM-like interconnection networks, cube-connected cycles, and the hypercube. See Section for a description of the Parallel Disk Model.

A number of models have been proposed for multilayer memories: the Hierarchical Memory Model (HMM), the Block Transfer Model (BTM), and the Uniform Memory Model (UMM). These models have not been heavily used, perhaps due to their complexity.

Cormen, and Cormen, Sundquist, and Wisniewski, followed up on an observation of Aggarwal and Vitter that certain types of permutations in EM can be done faster than the general permutation case. Cormen identified a number of such permutations, including matrix transpose, address bit reversal, hypercube routing, vector reversal, perfect shuffle, inverse perfect shuffle, gray code, and inverse gray code computations. He showed that certain fundamental operations, such as complementing and circular shifting of address bits, were common to these problems, and that they could be done faster than general permutations in EM.

Nodine and Vitter studied deterministic sorting algorithms for EM, in both distribution sort and merge sort flavours. They also reported I/O-optimal deterministic sorting algorithms for parallel memory hierarchies. These algorithms are also simultaneously optimal in terms of internal computation for single processors and for the PRAM. For non-PRAM interconnection networks, such as hypercube and cube-connected cycles, only the randomized algorithms of are simultaneously optimal in both I/O and computation. A summary of randomized and deterministic EM sorting techniques is presented in.

Goodrich, Tsay, Vengroff, and Vitter sketched EM algorithms for a large number of problems in computational geometry, and proposed several useful paradigms, such as distribution sweeping and batch filtering, for designing efficient EM algorithms (see Section for more information). They focussed on the two-layer, single processor version of the PDM, but also included a short discussion of the applicability of multiprocessor and multilayer models to their work. Many of the computational geometry algorithms proposed in are listed in Section, where they are compared to new EM algorithms developed using the methods presented in this thesis.

Chiang, Goodrich, Grove, Tamassia, Vengroff, and Vitter presented EM versions of many useful algorithms related to graphs. New EM paradigms called time-forward processing and PRAM simulation are introduced (see Section for a description of PRAM simulation). A technique for deriving lower bounds for EM problems is also described. A number of the graph algorithms proposed in are listed in Section, where they are compared to new EM algorithms developed using the methods presented in this thesis.

(M is the total size of the internal memories and P is the number of processors.)

Motivated by the goal of constructing I/O-efficient versions of commonly used internal memory data structures, Arge proposed the data structuring paradigm, and in particular the buffer tree. A buffer tree is an external memory search tree based on the (a,b)-tree, which is a generalization of the B-tree. It allows several update operations, such as insert, delete, search, and delete-min, and it enables the transformation of a class of internal-memory algorithms to external memory algorithms by exchanging the data structures used. A large number of external memory algorithms have been proposed using the buffer tree data structure, including sorting, priority queues, range trees, segment trees, and time-forward processing. These in turn are subroutines for many external memory graph algorithms, such as expression tree evaluation, centroid decomposition, least common ancestor, minimum spanning trees, and ear decomposition. There are a number of major advantages of the buffer tree approach. It applies to a large class of problems whose solutions use search trees as the underlying data structure. This enables the use of many normal internal memory algorithms, and hides the I/O-specific parts of the technique in the data structures. Several techniques based on the buffer tree (e.g. time-forward processing) are simpler than competitive EM techniques, and are of the same I/O complexity or better with respect to their counterparts.

The major innovation of the buffer tree is the addition of buffers to each internal node of the tree, together with lazy insert, delete, and query strategies. The basic idea is that requests (query, insert, delete) accumulate in the buffer of an internal node until it is worthwhile to perform the necessary I/O operations to move them to the next level of the tree. See Section for a more detailed description of the buffer tree.

Using the buffer tree, Arge et al. presented I/O-optimal algorithms for a number of problems related to GIS, including map overlay, red-blue line segment intersection, trapezoidal decomposition, triangulation of simple polygons, and planar point location. General techniques presented include an external memory version of fractional cascading using the buffer tree, and an extended external segment tree.

Vengroff and Vitter describe external memory algorithms for three dimensional range searching and query processing.

Chiang studied the performance of algorithms to report the intersections of orthogonal line segments. The study compared the optimal EM algorithm reported in with various versions of a similar internal memory algorithm which used balanced trees. Various memory sizes and distributions of the data features were tried. The conclusion was that the EM algorithm was both efficient and reliable, i.e. it handled all of the cases in an acceptable fashion, and significantly outperformed the other techniques when the internal memory was small relative to the problem size. He used some interesting techniques for generating his test data, and his tests highlighted some of the difficulties with running I/O-aware algorithms on an operating system that is designed for non-I/O-aware applications.

TPIE (Transparent Parallel IO Interface) is a library of I/O-aware functions and services designed for building I/O-efficient applications (see Vengroff, and Vengroff and Vitter). Many of the I/O-optimal algorithms and techniques in the literature are included. At the time of writing, the buffer tree and related applications are exceptions.

Barve, Grove, and Vitter examined the practicality of EM sorting techniques and proposed a randomized sorting technique that is I/O-optimal, yet simpler in practice than the previous EM sorting methods summarized in.

ViC* (Colvin and Cormen) is a language for coding data parallel programs with out-of-core (external) variables. The ViC* compiler provides the necessary input/output calls to allow a user program to manipulate huge data objects without explicitly coding the I/O requests. The compiler makes use of multiple disks and its knowledge of program structure to reduce the I/O costs borne by the program.

Cormen and Hirschl reported the implementation of, and timing results for, external memory radix sort and BMMC permutations on a single processor with multiple disks.

Cormen and Nicol reported on the development of I/O-optimal FFT applications on a single processor system with one or multiple disks, and Cormen, Wegman, and Nicol implemented FFT calculations on multiple processors with multiple disks.

External Memory Paradigms

A number of general techniques, or paradigms, for obtaining efficient EM algorithms have been identified by various researchers in EM. In this section we describe several which are relevant to the research proposed here.

Batch Filtering

Batch filtering is a technique first described by Goodrich et al. Given O(N) queries and a layered decision DAG (directed acyclic graph) of size O(N) stored on disk, batch filtering permits the queries to be answered in an optimal O(N/B) I/O operations. Batch filtering can be viewed as a technique for performing multisearch (see Section) for N queries on a DAG of size O(N) on a single processor.

We assume a single processor machine with internal memory size M, for M ≥ B. A decision DAG with a single source node is stored in external memory, level by level, in consecutive format. The queries are partially sorted in order of the successor node they wish to visit on the current level of the graph. Since there is a single source node of the DAG, the queries can initially be in arbitrary order. The algorithm reads the first block of the current level of the DAG (e.g. a multiway node of size B) and the first block of queries for this node from the disk. The queries destined for this node are processed sequentially, block by block. Each query can be directed in one of B possible ways by the current node, and the algorithm keeps B corresponding block buffers in memory. Each block buffer represents the head of a list of output blocks destined for a given successor node on the next level of the graph. Query blocks are read as necessary until a query requires a different graph node. Full output blocks are written to disks and concatenated to the end of the appropriate list, depending on their destination. When the first query with a different successor is encountered, we are done with the current graph node and can read the next one. This can continue until the level is finished. For the next level, the output lists are concatenated to form the input for the next round.
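The per-level routing step can be sketched as follows. This toy Python model (all names are invented for illustration) ignores blocks and disks entirely, and only shows how queries, already grouped by the node they must visit, are regrouped by successor to form the input for the next round:

```python
# Toy model of one batch-filtering round: the queries for each node of
# the current DAG level are routed to that node's successors, and the
# routed queries are collected in per-successor lists (standing in for
# the B block buffers the real algorithm keeps in memory).

def filter_level(queries_by_node, route):
    """queries_by_node: {node: [query, ...]} for the current level.
    route(node, query) -> successor node on the next level.
    Returns the queries regrouped by successor, i.e. the input for
    the next round."""
    next_level = {}
    for node, queries in queries_by_node.items():  # one node at a time
        for q in queries:                          # its queries, in order
            succ = route(node, q)
            next_level.setdefault(succ, []).append(q)
    return next_level

# A tiny 2-way "decision DAG": the node at depth d sends a query left or
# right according to bit d of the query value.
def route(node, q):
    _, depth = node
    return ('L' if (q >> depth) & 1 == 0 else 'R', depth + 1)

level0 = {('src', 0): [0, 1, 2, 3]}    # single source node: any order works
level1 = filter_level(level0, route)   # {('L', 1): [0, 2], ('R', 1): [1, 3]}
```

In the real algorithm each of these lists lives on disk as a chain of size-B blocks, and only one block per successor (plus the current node and query block) is held in memory; that is what bounds the cost at O(N/B) I/Os per level.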

Distribution Sweeping

The Distribution Sweeping technique was introduced by Goodrich et al. The input for a geometric problem (e.g. orthogonal line intersection) is stored in external memory. The technique first divides the input into O(M/B) slabs, i.e. it is typically sorted along one dimension of the data and divided evenly into O(M/B) buckets. The slabs are scanned (one block from each slab can simultaneously exist in memory) along a second dimension of the data. This typically requires a second sort, to arrange the data elements within each slab in increasing order by the second dimension. The algorithm then solves or processes those subproblems which consist of interactions between slabs. More concretely, this amounts to processing data elements that span a slab. When the scan is finished, all of the inter-slab interactions have been processed. The algorithm then proceeds recursively on each slab.

This technique has been used as the basis for an I/O-optimal EM algorithm to report all intersections of orthogonal line segments. Implementation and performance of the orthogonal line intersection algorithm are reported in.
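The recursion skeleton, with the problem-specific inter-slab step left abstract, might look like the following Python sketch (not drawn from the thesis; `handle_interactions` is a placeholder for, e.g., the orthogonal-intersection sweep):

```python
import math

def distribution_sweep(items, m, handle_interactions):
    """Split `items` (assumed sorted by the first dimension) into m slabs,
    let the problem-specific `handle_interactions` process everything that
    spans slab boundaries at this level, then recurse within each slab."""
    if len(items) <= 1:
        return
    size = math.ceil(len(items) / m)
    slabs = [items[i:i + size] for i in range(0, len(items), size)]
    handle_interactions(slabs)       # inter-slab work at this level
    for slab in slabs:
        distribution_sweep(slab, m, handle_interactions)

calls = []
distribution_sweep(list(range(16)), 4, lambda slabs: calls.append(len(slabs)))
# There are O(log_m n) levels of recursion; each level scans the data
# once, which yields the O((N/B) log_{M/B}(N/B)) I/O bound when the
# fan-out m is chosen as Theta(M/B).
```

With 16 items and m = 4 the recursion has two splitting levels: one call on the full input and one per slab of four.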

Data Structuring

One of the first I/O-optimal data structures in EM was described by Arge. The buffer tree is based on the (a,b)-tree, which is a generalization of the B-tree. The major difference between the buffer tree and the (a,b)-tree is the addition of large buffers to each tree node. The buffers collect requests, which may include insert, delete, and query operations, as they navigate down the tree from the root to the appropriate leaves. A request r in the buffer of node q does not advance to the next level of the tree unless there are sufficient other requests in the buffer to pay for the costs of reading in the children of q and bucket sorting the requests in q's buffer according to q's splitters.

Unlike B-tree based applications, requests against a buffer tree data structure are not satisfied immediately. Processing proceeds in a lazy manner, and results are reported only when, and in the order in which, requests are forced down to the leaf level of the tree. Once a block of requests reaches a leaf, processing of those requests continues to completion. Only filtering of the requests through the buffer part of the tree is done in a lazy manner. As requests can get stuck in the buffers of the buffer tree if there are insufficient other requests to flush them out, it is generally necessary to force-empty the buffers after all requests have been inserted, before all requests are fully processed. This involves scanning through the buffers and generating sufficient dummy requests to make the requests descend to the leaf level of the tree.

The buffer tree is suitable for processing subproblems which tolerate the lazy behaviour of the buffer tree, i.e. they do not require their output in a particular order.
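A minimal Python sketch of this lazy buffering idea follows (illustrative only; real buffer-tree nodes hold on the order of M requests, flush whole blocks, and rebalance like an (a,b)-tree):

```python
import bisect

class BufferNode:
    """One internal node of a toy buffer tree. Requests accumulate in
    `buffer` and are pushed to the children, bucket-sorted by the node's
    splitters, only once there are enough of them to pay for the I/O."""

    def __init__(self, splitters, flush_at):
        self.splitters = splitters                    # sorted routing keys
        self.children = [[] for _ in range(len(splitters) + 1)]
        self.flush_at = flush_at                      # laziness threshold
        self.buffer = []                              # pending requests

    def insert(self, key):
        self.buffer.append(key)                       # lazy: just record it
        if len(self.buffer) >= self.flush_at:
            self.flush()

    def flush(self):
        """Empty the buffer (also used to force-empty at the end)."""
        for key in self.buffer:
            child = bisect.bisect_left(self.splitters, key)
            self.children[child].append(key)          # bucket by splitters
        self.buffer = []

node = BufferNode(splitters=[10, 20], flush_at=4)
for k in [5, 25, 15, 12]:
    node.insert(k)        # nothing moves until the fourth request arrives
```

After the fourth insert the buffer flushes: 5 lands in the first child, 15 and 12 in the middle one, 25 in the last, and the buffer is empty again.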

The buffer tree has been implemented as part of the LEDA-SM project (see Crauser et al.). The performance of the external memory priority queue based on the buffer tree was compared to the performance of an external priority queue derived from the internal memory radix heap data structure. The authors report that the new structure based on radix heaps outperforms the buffer tree for the range of inputs tested, but they note that, unlike radix heaps, the buffer tree permits the priority data type to be a non-integer. They also found their implementation of buffer tree sort to be slower than implementations of other external memory sorting algorithms that they tested.

Simulation

Simulating parallel algorithms in EM is one of the primary ideas in this thesis. There appear to have been three forms in which the simulation paradigm has appeared in the EM literature:

1. Localizing parallel references algorithmically.

A class of parallel algorithms from which good EM algorithms can be derived is illustrated by an algorithm described briefly in. The authors identify a PRAM algorithm for searching the dictionaries in a fractional cascaded binary tree which has the property that the p PRAM processors access data in a compact pattern, spanning O(p/B) blocks at each PRAM step. In general, if a PRAM algorithm has the property that the processors in each step access the data at locations that are grouped into clusters of size B, then a single processor external memory algorithm can be derived in a straightforward way by simulating the activities of the p PRAM processors.

2. Localizing parallel references by sorting.

Chiang et al. describe a simulation of PRAM algorithms based on sorting the PRAM instructions and references at each parallel computation step. The sort ensures that the data references that would be made by the p PRAM processors are localized in a series of consecutive memory locations, and therefore can be accessed efficiently as a series of disk blocks. The sorting technique must also be an efficient EM sort. Using this technique, I/O-optimal EM algorithms can be derived from certain work-inefficient PRAM algorithms for problems with a geometrically decreasing size property.
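The heart of the idea, one read phase of a PRAM step served by sorting, can be sketched in Python as follows (an illustrative fragment only; the full simulation also sorts the computed values back to their destinations, and the sort itself must be an efficient EM sort):

```python
def serve_reads(memory, requests):
    """requests: list of (processor, address) pairs for one PRAM step.
    Sorting by address turns the scattered accesses into a single
    sequential pass over `memory`; in external memory this costs a scan
    (O(N/B) I/Os) instead of one random block read per request."""
    result = {}
    for proc, addr in sorted(requests, key=lambda r: r[1]):
        result[proc] = memory[addr]   # addresses visited in increasing order
    return result

mem = [10, 11, 12, 13]
out = serve_reads(mem, [(0, 3), (1, 0), (2, 2)])   # {0: 13, 1: 10, 2: 12}
```

The sequential pass is what makes the simulation I/O-efficient: each disk block of `memory` is touched at most once per PRAM step, regardless of how scattered the processors' requests are.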

3. Simulating BSP algorithms.

This idea was articulated independently by Sibeyn and Kaufmann, and by Dehne, Dittrich, and Hutchinson. The BSP model of parallel computation was proposed in and is discussed in Chapter.

The authors of propose an approach for simulating optimal BSP algorithms to produce efficient EM algorithms. Their work is presented in the context of a single disk, uniprocessor EM machine, but they suggest that the concept can be extended to multiple disks. They do not explain, however, how to accommodate the blocking factor, which is an intrinsic issue in efficient I/O design, nor do they provide mechanisms for handling multiple disks or multiple physical processors on the target EM machine.

Lower Bounds on I/O Complexity

A number of important results regarding lower bounds for EM algorithms are listed in Sections to. The reader is referred to the survey by Vitter for a more complete treatment.

Sorting Lower Bound

Perhaps the most fundamental is the lower bound shown for sorting by Aggarwal and Vitter. This result states that the I/O complexity of sorting is

    sort(N) = Θ( (N / (BD)) log_{M/B} (N/B) ).

Vitter points out why disk striping is nonoptimal in terms of I/O complexity. With a single disk, the equation becomes sort(N) = Θ( (N/B) log_{M/B} (N/B) ). Striping data across D disks is equivalent to increasing the blocksize to BD, giving

    sort_striping(N) = Θ( (N / (BD)) log_{M/(BD)} (N / (BD)) ),

which is larger than sort(N) by a factor of log_{M/(BD)} (M/B). If D is very large, the base M/(BD) of this logarithm approaches unity, and striping becomes strikingly inferior to using the disks independently for sorting.
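To make the striping penalty concrete, the following small computation evaluates both I/O counts (constant factors dropped) for one arbitrary but plausible parameter setting:

```python
import math

def sort_ios(N, M, B, D):
    """Aggarwal-Vitter sorting bound, up to constant factors:
    (N / (B*D)) * log_{M/B}(N/B) parallel I/O operations."""
    return (N / (B * D)) * math.log(N / B, M / B)

# Independent disks vs. striping: striping D disks behaves like a single
# disk whose blocksize is B*D.
N, M, B, D = 2**30, 2**20, 2**10, 2**8
independent = sort_ios(N, M, B, D)       # (2^12) * 2  =  8192 I/Os
striped = sort_ios(N, M, B * D, 1)       # (2^12) * 6  = 24576 I/Os
```

At these parameters striping pays three times the I/O, and as N grows the gap approaches the factor log_{M/(BD)}(M/B), here log_4(2^10) = 5.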


Permutation Lower Bound

Aggarwal and Vitter also showed that permutation in external memory has I/O complexity

    perm(N) = Θ( min{ N/D, (N / (BD)) log_{M/B} (N/B) } ).

Cormen showed that special cases of permutation corresponding to affine transformations have lower I/O complexity than the general lower bound. These special classes of permutations can be generated by certain binary matrix transformations on the address bits of the permutation elements.

Matrix Transpose Lower Bound

Aggarwal and Vitter showed that the I/O complexity of transposing a p × q matrix (with N = pq elements) is

    Θ( (N / (BD)) log_{M/B} min{ M, p, q, N/B } ).

A General Lower Bounding Technique

Arge et al. showed that, in general, assuming a comparison-based model, any problem requiring Ω(N log N) comparisons requires Ω( (N/B) log_{M/B} (N/B) ) I/O operations in the Parallel Disk Model.

Realistic Parameter Ranges

The general lower bounds of Aggarwal and Vitter and others are extremely useful in designing efficient external memory algorithms, and give an accurate picture of the general case for all possible values of the parameters.

Our simulation techniques produce external memory algorithms whose constraints restrict them to only a subset of the parameter constellations covered by the lower bounds of the preceding sections. However, we will argue that this subset is a useful one and is often encountered in real machines. In this subset we find that the I/O complexity of sorting, for instance, is only O(N / (BD)), because log_{M/B} (N/B) ≤ c for a constant c. We will discuss this in more detail in Section.


The Role of RAIDs

The performance advantages of parallel disk drives are recognized by the growing popularity of RAID arrays (Redundant Array of Inexpensive Disks; see). RAID technology permits an array of R individual disk drives, each with blocksize B, to be accessed as a single logical disk with a blocksize of up to RB, depending on the RAID level used. The RAID levels are designed to provide varying degrees of error detection and correction, based on the observation that the mean time between failures (MTBF) for an array which depends on D disks is 1/D times the MTBF for one of the disks. RAID level 0 is the simplest case, where read and write operations of BD items access a block of application data on each of the R = D disks. The higher RAID levels introduce redundancy of one form or another to cope with the eventuality of disk or operating system failure. Thus read and write operations of BD items access a block of application data on each of D disks, where D ≤ R. Redundancy is used in these RAID levels to permit data to be recovered if a disk failure occurs.

In this thesis we are concerned with algorithms that are capable of efficiently using the large blocksizes that are required for efficient use of disks, including RAIDs, but the reliability issues are beyond the scope of our discussions. For some of our algorithms we also find that the reads and writes are not striped, i.e. the blocks accessed on each disk are on different logical tracks. It is not clear how useful a RAID array is for these algorithms, as we may need to bypass the RAID logic in order to access the disks independently.

Several of the RAID levels calculate checksums for error correcting purposes on a stripe by stripe basis. Non-striped writes may modify a block in up to D different stripes, necessitating the recalculation of their checksums and resulting in large I/O and computational overhead. While non-striped access to disk promises a theoretical advantage in I/O complexity (see Section), striped I/O may in practice be more efficient if RAID technology is used.

In general, in this thesis, we assume that the disks are accessible independently, according to the EM-BSP and PDM models. We note when the access to the disks is striped, but otherwise we pay little attention to the issues surrounding the use of RAIDs for implementing these algorithms.

Storage Space

The amount of disk space required by an external memory algorithm is important. For very large problems, even small constant factors in space consumption can be crucial to the practicality of an algorithm. The algorithms considered in this thesis generally require space which is linear in the problem size. Exceptions include the EM-BSP algorithms derived from the CGM parallel segment tree, and the buffer tree technique whose implementation is considered in Chapter. Both increase space consumption by a logarithmic factor. For the simulation techniques that are considered in Chapters and, the space consumption is linear in the problem size. In this thesis we focus on asymptotic analyses of our techniques. While the precise constants are of interest to a practitioner, they are beyond the scope of our treatment here.

Preliminaries, Notation and Terminology

Asymptotic Notation

We define the asymptotic notation O(f(N)), o(f(N)), Ω(f(N)), and Θ(f(N)) in the usual way. The following definitions are adapted from.

Let f, g be any functions such that f: N → R⁺ ∪ {0} and g: N → R⁺ ∪ {0}.

    O(f(N)) = { t: N → R⁺ ∪ {0} | ∃c ∈ R⁺ ∃n₀ ∈ N ∀N ≥ n₀ : t(N) ≤ c·f(N) }
    o(f(N)) = { t: N → R⁺ ∪ {0} | ∀c ∈ R⁺ ∃n₀ ∈ N ∀N ≥ n₀ : t(N) ≤ c·f(N) }
    Ω(f(N)) = { t: N → R⁺ ∪ {0} | ∃c ∈ R⁺ ∃n₀ ∈ N ∀N ≥ n₀ : t(N) ≥ c·f(N) }
    ω(f(N)) = { t: N → R⁺ ∪ {0} | ∀c ∈ R⁺ ∃n₀ ∈ N ∀N ≥ n₀ : t(N) ≥ c·f(N) }
    Θ(f(N)) = O(f(N)) ∩ Ω(f(N))

We extend the notation to expressions containing asymptotic notations as follows. Let g be any function such that g: N → R⁺ ∪ {0}. Let L₁ and L₂ represent any sets of functions, such as one of O(f(N)), o(f(N)), Ω(f(N)), or ω(f(N)). Let ∘ represent any binary operator. Then

    L₁ ∘ L₂ = { t(N) = t₁(N) ∘ t₂(N) | t₁ ∈ L₁, t₂ ∈ L₂ }
    L₁ ∘ g(N) = { t(N) = t₁(N) ∘ g(N) | t₁ ∈ L₁ }    (g(N) ∘ L₁ is similar)
    L₁ ∘ a = { t(N) = t₁(N) ∘ a | t₁ ∈ L₁ }          (a ∘ L₁ is similar)

Let Φ represent any set of functions, such as those defined above. As is common practice in asymptotic analysis of algorithms, we will also use the notation g(N) = Φ to mean g(N) ∈ Φ.


BSP Models

In Chapter we discuss the BSP (Bulk Synchronous Parallel) parallel computing model and two related models, BSP* (Extended BSP) and CGM (Coarse Grained Multicomputer), which contain extensions or simplifications of the basic BSP ideas. We will refer to these three models collectively as BSP-like models, or simply as BSP models. We may refer to a BSP algorithm with N data and v processors which communicates exclusively via h-relations of size h = N/v as a CGM algorithm, and a BSP algorithm whose message size is at least b, for some positive integer b, may also be referred to as a BSP* algorithm. Please refer to Chapter for more information on the BSP, BSP*, and CGM parallel models.

A Central Issue: Communication Bottlenecks

The idea of decomposing a problem into smaller subproblems is a common approach in computer science; divide and conquer algorithms and parallel computing algorithms are some examples. Communication between subproblems is required in order for such approaches to be successful. Dijkstra pointed out the similarities between concurrent processes, which communicate in space, and sequential processes, which communicate in time. Message passing is a typical means of communicating in space, and disk files are a common method of communicating between processes whose execution is separated in time.

Optimizing both kinds of communication is naturally an important consideration in controlling the running time of practical algorithms. Recognition of this issue is one of the main differences between the BSP and PRAM parallel computing models. It is the main motivation for the development of the BSP* and CGM parallel computing models. It also provides the primary issues in external memory algorithm design.

An important common factor in controlling the cost of these types of communication is aggregation, or blocking, of the communicated data. In many real communication systems (e.g. disk input/output, interprocessor message passing) there is a significant overhead cost in preparing the communication system for a data transfer. This may include determining and reserving a communication path in a network, or waiting until disk hardware has been physically adjusted to the appropriate position.

Blocking associates multiple data items with a common block. When a communication operation is required on one of the data items in the aggregation, it is applied to the block. Provided that the algorithm operates on all, or a significant fraction, of the elements in the block before discarding it, this reduces overall costs by paying the overhead cost only once for the block, instead of once for each of the items actually accessed by the algorithm.
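A back-of-the-envelope cost model shows why blocking pays off. In the sketch below (the numbers are arbitrary), each transfer carries a fixed overhead, e.g. a seek or path set-up, plus a per-item cost:

```python
def transfer_cost(items, block_size, overhead, per_item):
    """Cost of moving `items` data items when every block transfer pays a
    fixed `overhead` (seek, path set-up) plus `per_item` for each item."""
    blocks = -(-items // block_size)          # ceiling division
    return blocks * overhead + items * per_item

# One item per transfer vs. 1000-item blocks, with overhead 100x per-item:
unblocked = transfer_cost(10**6, 1, overhead=100, per_item=1)     # 101,000,000
blocked = transfer_cost(10**6, 1000, overhead=100, per_item=1)    #   1,100,000
```

Paying the overhead once per block rather than once per item buys nearly two orders of magnitude here, provided, as noted above, that the algorithm actually uses most of each block before discarding it.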


Papers Contributing to the Thesis

Some of the results described in this thesis have been published as papers with various coauthors. These papers are described briefly below. However, any errors in the current work are mine alone.

A randomized technique is presented in for simulating BSP algorithms in external memory (see Section for a description of the BSP model of parallel computation). The resulting algorithm is shown to have blocked I/O to one or multiple disks, on a single or multiple processor machine. All I/O is done in parallel to multiple disks, if present, on each processor, and communication between processors is balanced and is also done blockwise. The authors propose new models for evaluating external memory algorithms on multiple processor machines, which address suggestions made by Cormen and Goodrich (see Section for a description of this challenge). This work is covered by Chapters and.

Hutchinson, Maheshwari, Sack, and Velicescu (see also Section) implemented the buffer tree for external sorting and priority queue applications, and suggested modified parameter settings which improved performance in practice. To the authors' knowledge, this was the first full implementation of the buffer tree. Buffer tree sort was found to be well behaved, with the running time increasing linearly over the range of inputs tested. Buffer tree sort easily outperformed the well known quicksort technique using virtual memory.

Dittrich, Hutchinson, and Maheshwari presented external memory, multiple processor algorithms for multisearch, and showed them to be efficient using the models previously proposed in. More details of this work can also be found in Section of this thesis.

Dehne, Dittrich, Hutchinson, and Maheshwari presented a deterministic technique for simulating CGM algorithms in external memory (see Section for a description of the CGM model of parallel computation). A number of existing parallel algorithms are shown to have efficient external memory, multiple processor variants obtained via the simulation technique and analyzed under the models of.

Dehne, Dittrich, Hutchinson, and Maheshwari also explore the transformation of parallel algorithms via this simulation technique as a way to make existing parallel algorithms scalable to larger problem sizes, without using demand paging for handling virtual memory.


Claim to Originality or Contribution to Knowledge

Section summarizes the main contributions of this thesis. The claims to originality are subject to the contributions and assistance of various coauthors, which are summarized in Section.

Collaborations

The randomized simulation technique of Chapter and the EM-BSP, EM-BSP*, and EM-CGM models first appeared in, and also in.

The buffer tree implementation study described in Section appeared earlier as.

External memory multisearch algorithms for the EM-BSP described in Section first appeared in, and later in.

An implementation design and the experiments described in Section are also described in.

A summary of the simulation approach, its application to adding scalability to parallel algorithms, and the preliminary experimental results of Section were described in.

The deterministic simulation technique of Chapter is described in.

Contributions

In general terms, this work adds a number of results on parallel processing in communication-aware parallel models to the existing body of external memory research.

Our new EM-BSP, EM-BSP*, and EM-CGM parallel external memory computing models combine the features of the external memory Parallel Disk Model (PDM) with Bulk Synchronous Parallel (BSP) parallel computing models, answering a challenge of Cormen and Goodrich in ACM Computing Surveys.

We propose a general paradigm for producing parallel external memory algorithms from BSP algorithms. We present detailed analyses of instances of the paradigm, including general derivation techniques. Examples presented include:

(a) The BSP* lower bound on message sizes in the parallel computing domain allows blocked I/O in the external memory domain. Using randomization, we also achieve balanced communication to multiple real processors, and balanced, independent I/O to multiple disks on each processor.

(b) Balanced message sizes in a BSP algorithm allow blocked I/O and balanced communication to multiple processors, and striped I/O to multiple disks on each processor, without randomization.

(c) Adapting a known parallel routing algorithm to our requirements, we show that it converts a BSP algorithm to the requirements of either of techniques (a) or (b) above.

(d) We describe what appears to be a new routing algorithm for parallel algorithms, which routes messages indirectly via a series of rounds and has an adjustable slackness requirement.

(e) Using our new indirect routing technique, we describe a second deterministic approach to obtaining external memory algorithms for an EM-BSP model from existing parallel BSP algorithms. This requires smaller slackness than technique (b) above, and so reduces the internal memory size required by the resulting external memory algorithm.

(f) We provide an optimization technique for our simulations when the problem size varies geometrically over a series of rounds of the original BSP algorithm.

(g) We explain an apparent contradiction between established lower bounds on I/O complexity for various problems and the I/O complexities we derive for several of our new algorithms.

We present a large number of new external memory algorithms obtained from our simulations:

- Sorting, permutation, and matrix transpose: optimal parallel external memory algorithms existed for models which assumed logarithmic or unit-cost communication; the new algorithms work on more general models.

- We present the first parallel EM multisearch algorithms.

- We obtain parallel external memory algorithms for various problems in computational geometry, geographic information systems, and graph theory.

We report on the implementation and performance testing of a buffer tree. We believe this to be the first such implementation of this well known EM data structure. We also describe preliminary implementation experiments of one of our simulation techniques, providing evidence towards its validation as a practical technique.

Organization of the Thesis

In Chapter we survey selected previous work on mo dels for external memory com

putation and for parallel computation We highlightthechallenge presented by Cor

men and Go o drich to combine these mo dels and we present three new combined

mo dels that address many of the issues identied in

In Chapter we describ e the general framework of a new simulation technique

that creates ecient external memory algorithms from ecient BSP algorithms

In Chapter we describ e a randomized approach which implements the general

framework describ ed in Chapter In particular we fo cus on how to eciently

write the messages between pro cessors to parallel disks and eciently retrieve

them in the next comp ound sup erstep We show that if the simulation is p erformed

on a coptimal BSP algorithm the resulting EM algorithm is coptimal with high

probability The resulting EMBSP algorithm in particular EMBSP acquires

unication present in its prop erty of blo cked IO from the attribute of blo cked comm

the original BSP algorithm from which it was derived The techniques describ ed

in Chapter ensure that this prop erty of blo cked communication between tasks is

preserved and that in addition IO is done in parallel if multiple disks are present

In Chapter we describ e a deterministic simulation technique for pro ducing an

EMBSP in particular an EMCGM algorithm from a particular typ e of BSP algo

rithm in particular a CGM algorithm This approach requires that each of the v

virtual pro cessors generate at least vb items of communication data in every sup er

step where the parameter b represents the desired communication blo ck size This

allows us to establish a tight upp er and lower bound on the message size to every

pro cessor and allows a simple ecient deterministic simulation

In Chapter we describ e another deterministic approach which also generates an

EMCGM algorithm from a CGM algorithm It requires less communication volume

than the metho ds of Chapter Each of v virtual pro cessors need only generate v b

items of communication data in every sup erstep where is a constant

The simulation requires a constantnumber or rounds for each communication round

algorithm although this constant is larger than in Chapter of the CGM

In Chapter we describ e an extension to the three simulation techniques which

allows certain problems such as list ranking whose problem size reduces geometrically

N

in each of a series of r rounds to b e simulated in a linear ie O number of IO

B op erations

CHAPTER INTRODUCTION

In Chapter a number of new EMBSP in particular EMBSP and EMCGM

algorithms are presented together with their computation communication and IO

complexities Many of these algorithms are obtained from known BSP and CGM

algorithms via the simulations of Chapters and We also describ e new multisearch

algorithms derived from aknown BSP algorithm

In Chapter we describ e two case studies in implementing EM algorithms We

describ e an implementation of the buer tree technique of Arge and an implemen

tation of the deterministic simulation technique of Chapter applied to a parallel

algorithm for Samplesort

Chapter  concludes with a summary of the contributions of this research and some suggestions for future work.

In order to assist the reader in understanding the arguments of this thesis, a summary of notation and an index of terms can be found beginning on page .

Chapter

Models

Overview

In this chapter we develop three new models of computation on parallel external memory machines. The new models, which we refer to as EM-BSP, EM-BSP*, and EM-CGM, can be viewed as uniting the external memory Parallel Disk Model (PDM) with the BSP, BSP*, and CGM parallel computing models, respectively. The PDM is described in Section , and the BSP, BSP*, and CGM parallel computing models are described in Section . While the new models each have slightly different parameters due to their differing heritage, they share a number of common features:

1. they capture three contributors to the running time of an algorithm, namely computation, communication, and I/O; and

2. the goodness criteria for algorithms evaluated on each of the models are similar and subject to the same issues.

The BSP, BSP*, and CGM models are very similar; algorithms which are efficient on the BSP* and CGM models are also BSP algorithms. Because of this similarity, we may refer at times to BSP* and CGM algorithms as BSP algorithms, and to EM-BSP* and EM-CGM algorithms as EM-BSP algorithms.

Organization of This Chapter

In Section  we describe previous models for external memory computation, in particular the Parallel Disk Model. Section  describes several models of computation for parallel machines, including the BSP, BSP*, and CGM parallel computing models. Section  then introduces the new parallel EM models: EM-BSP, EM-BSP*, and EM-CGM.

In describing each of these models we identify two fundamental components: the system model and the cost model. The system model describes the architecture of the model, and the cost model provides criteria by which competing algorithms on the model can be compared. The cost model consists of a measurement scheme and one or more goodness criteria, which together provide a measure by which two algorithms can be ranked.

EM Models

In this section we examine some of the external memory models that have been proposed to date.

Aggarwal and Vitter: Single Disk Model

This model involves a single processor accessing a single disk. In a single I/O request (or operation), D different blocks can be transferred between disk and memory. There is no restriction on the location of the blocks on the disk, and so they may be in consecutive block locations or widely separated. This is more permissive than real disk systems, and so the model is viewed as somewhat unrealistic compared to the later PDM model of Vitter and Shriver. However, Aggarwal and Vitter showed lower bounds for a number of problems using their permissive model, and these lower bounds carry over to the PDM. The cost models for the two models are identical (see below).

The Parallel Disk Model (PDM)

System Model: The model of Vitter and Shriver has become known simply as the PDM, or Parallel Disk Model (see Figure ). The model allows for D disks to be accessed simultaneously in a single I/O operation. Either striped or independent I/O is allowed. The disks may be attached to a single processor, or individually to different processors which are interconnected by some network. The basic PDM is independent of the number of processors, but both single and multicomputer versions were introduced.

For the randomized algorithms reported there, multicomputer interconnection networks on which sorting could be done in parallel randomized logarithmic time were considered (e.g., cube-connected cycles). The later deterministic algorithms of Nodine and Vitter considered only a PRAM-like interconnection between processors.


Figure : The Parallel Disk Model with One Processor.

The goodness criterion is simultaneous optimality of I/O complexity and computation complexity.

Figure : The Parallel Disk Model with Multiple Processors.


Cost Model: The cost measure used by the PDM is the number of I/O operations required asymptotically by an algorithm A for a given problem P, as the size of P grows to infinity. All other costs, such as computation time, are ignored. Algorithm A is said to be optimal (we use the term I/O-optimal) if the number of I/O operations required by A meets the lower bound for the number of I/O operations required to solve P. The following additional rules apply:

1. for each I/O operation performed, the model assumes that one block of data can be transferred for each of the D disks;

2. for each block transferred, the model assumes that B items of data could have been transferred, where B is the number of items of data that fit into a disk block;

3. throughout the computation, the model assumes that the available internal memory, which is capable of holding M items of data, is fully utilized by the algorithm. For instance, the lower bound for sorting incorporates the fact that internally sorting a memory-load of data is free under the model. The number of passes over the data, and therefore the number of I/O operations, is affected by how many items fit into internal memory.

The PDM has been widely used for developing and analyzing external memory algorithms. Much of the research described in Chapter  is based on this model of external memory computation.
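Under these accounting rules, the PDM cost of sorting is the well-known Θ((N/(DB)) · log_{M/B}(N/B)) parallel I/O operations. The sketch below simply evaluates that expression; the function name and the concrete machine parameters are illustrative, not taken from the thesis.

```python
import math

def pdm_sort_ios(N, M, B, D):
    """Asymptotic PDM sorting cost (constants omitted):
    (N / (D*B)) parallel I/Os per pass, times ceil(log_{M/B}(N/B)) passes.
    Internal sorting of a memory-load is free under the model."""
    passes = max(1, math.ceil(math.log(N / B, M / B)))
    return math.ceil(N / (D * B)) * passes

# Hypothetical machine: 2^20 items, 2^16-item memory, 2^10-item blocks.
print(pdm_sort_ios(N=2**20, M=2**16, B=2**10, D=1))  # 2048 parallel I/Os
print(pdm_sort_ios(N=2**20, M=2**16, B=2**10, D=4))  # 512: 4 disks, 1/4 the I/Os
```

Doubling D halves the I/O count of each pass, which is exactly the incentive the model gives for using all disks in parallel.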

Hierarchical Memory Models

In a modern computer system there are potentially many layers of memory, each with a different access time and data unit (block) size. These layers form a hierarchy, ranging from those with the fastest access time, such as the on-chip registers of the CPU, through various kinds of cache memory and main memory, to local mechanical disk drives, or even remote disks accessed via a network. In concept, there is potential at each interface between layers of the hierarchy to optimize the data transfer algorithmically. A number of models have been developed to capture the relationships between layers of this hierarchy. These include the Hierarchical Memory Model (HMM) of Aggarwal, Alpern, Chandra, and Snir; the Block Transfer Model (BTM) of Aggarwal, Chandra, and Snir; and the Uniform Memory Model (UMM) of Alpern, Carter, and Feig. Due to the added complexity of the multilevel models, relatively few researchers have used them to design algorithms, with one exception.

In this thesis we focus primarily on the disk-memory interface, because the speed difference and blocking requirements of this interface are the most severe, offering the most potential for time savings, and because the hierarchical models are significantly more complex.


Parallel Computation Models

Now we examine some of the parallel computing models that have been proposed. As in the EM models, there are both a system model and a cost model component to each. The system model describes the architecture of the model, and the cost model provides a criterion by which competing algorithms on the model can be compared.

The Parallel Random Access Memory (PRAM) Model

The PRAM model has been used for many years as a model of parallel computation. Its main advantage is its simplicity.

System Model: The PRAM model assumes that p processors are available and that they are attached to a single common memory. The processors operate synchronously; in each PRAM step, p instructions are executed in parallel. Any item in the common memory can be accessed at unit cost by any processor. To resolve the issues associated with multiple processors reading and writing a single item in the common memory concurrently, several submodels have been defined:

Concurrent Read, Concurrent Write (CRCW): an item can be read or written concurrently in this model. To model the effect of concurrent writes, several further submodels have been proposed.

Exclusive Read, Exclusive Write (EREW): neither read nor write concurrency is allowed by the model.

Concurrent Read, Exclusive Write (CREW): only read concurrency is permitted by the model.

Exclusive Read, Concurrent Write (ERCW): only write concurrency is permitted by the model.

Cost Model: The time complexity T_p of a PRAM algorithm A for problem P with p processors is defined as the maximum number of memory accesses performed by any processor during the execution of A. The work complexity W_p is p · T_p. The goodness criteria used are typically:

Parallel Time Optimality: minimize T_p, where T_p = O(T_s / p) and T_s is the time complexity of the best sequential algorithm for P.

Parallel Work Optimality: minimize W_p, where W_p = O(T_s) and T_s is the time complexity of the best sequential algorithm for P.
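As a toy illustration of these measures (not an example from the thesis), consider summing n items on an EREW PRAM: each processor first adds about n/p items locally, and a binary combining tree then finishes in ⌈log2 p⌉ steps.

```python
import math

def parallel_sum_time(n, p):
    """PRAM (EREW) time for summing n items with p processors:
    ceil(n/p) local additions, then ceil(log2 p) combining steps."""
    return math.ceil(n / p) + math.ceil(math.log2(p)) if p > 1 else n

def work(p, T_p):
    """Work complexity W_p = p * T_p."""
    return p * T_p

T = parallel_sum_time(1024, 32)
print(T, work(32, T))  # 37 1184
```

With n = 1024 and p = 32 this gives T_p = 37 and W_p = 1184, which is O(T_s) for the 1023-addition sequential sum, so the algorithm is work-optimal.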


The Bulk Synchronous Parallel (BSP) Model

The BSP model was proposed by Valiant as a bridging model for parallel computation: a standard model on which both hardware and software designs can agree. The stated objective of BSP is to provide a model that is simultaneously useful to hardware designers, algorithm designers, and programmers of parallel computers. It is neither a hardware nor a programming model, but somewhere in between.

System Model: The BSP model has parameters N, v, g, and L. It consists of v processor-memory components (we use the letter v because these will be virtual processors during our simulations; see Chapters  and ), a router that delivers messages in a point-to-point fashion, and a facility to synchronize all processors (prevent processors from proceeding until all have reached a given barrier in their code). Each processor has a unique label in the range 0, ..., v - 1.

A computation proceeds in a succession of supersteps separated by synchronizations, usually divided into communication and computation supersteps (see Figure ). In computation supersteps, processors perform local computations on data that is available locally at the beginning of the superstep, and issue send operations. Between computation supersteps, a communication superstep is performed, where each processor exchanges data with its peers via the router. An h-relation is a superstep where O(h) data are sent and received by every processor.

The model anticipates that the routers on different systems may have different characteristics relative to the computation speed of the processors. For explanation purposes, let N_comp = {the number of computational operations that can be performed by the processors per unit time} and N_comm = {the number of words that can be delivered by the router per unit time}. The parameters of the BSP model are then defined as follows:

N: the number of items in the problem to be solved.

v: the number of processors available.

g: N_comp / N_comm, i.e., the time conversion factor between communication and computation operations. Another way of expressing the meaning of g is as the time required to send a single word of data between two processors, where time is measured in number of CPU operations.

L: the minimum setup time or latency of a superstep, measured in CPU operations. This can involve a number of factors, including the time required for the processors to perform a barrier synchronization.


Figure : A BSP Computation (v independent subproblems; alternating computation supersteps 1, ..., λ and communication supersteps).

Cost Model: The cost of a computation on the BSP model is represented by the sum of three quantities: one for computation time, one for communication time, and one for the time required by the processors to synchronize. Let A be a BSP algorithm operating on input data of size N items. Let π be its parallel computation time, let η be the number of words of data sent between processors during the execution of A, and let λ be the number of supersteps required by A to solve a problem of size N. Then the cost T(A) of the computation is

    T(A) = π + g·η + λ·L.

The c-optimality goodness criterion was proposed for the BSP model.

Definition: Let A be the best sequential algorithm on the RAM for the problem under consideration, and let T(A) be its worst case runtime. Let c be a constant. A c-optimal parallel algorithm A* meets the following criteria:

1. The ratio between the computation time of A* and T(A)/p is in c + o(1).

2. The ratio between the communication time of A* and the computation time T(A)/p is in o(1).

All asymptotic bounds refer to the problem size n, as n → ∞. We say that a BSP algorithm is one-optimal if it fulfils the requirements of the above definition for c = 1.
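The cost expression can be read directly as a function of the three measured quantities. The sketch below evaluates T(A) = π + gη + λL; all numeric values are hypothetical.

```python
def bsp_cost(pi, eta, lam, g, L):
    """BSP running time: computation pi, plus g times the eta words
    communicated, plus lam supersteps each paying the latency L."""
    return pi + g * eta + lam * L

# Hypothetical algorithm: 10^6 operations, 10^4 words sent, 10 supersteps,
# on a machine with g = 4 and L = 100.
print(bsp_cost(10**6, 10**4, 10, g=4, L=100))  # 1041000
```

On this example the g·η and λ·L terms are small relative to π, which is the situation the c-optimality criterion below formalizes.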

The Extended BSP (BSP*) Parallel Processing Model

The BSP* model is a variant of the BSP model in which each message is assessed a minimum cost equivalent to a message of size b items. The BSP* model therefore gives incentives to send messages of at least b in size. An appropriate value for b in a particular case is determined by the characteristics of the particular router.

System Model: The BSP* model has parameters N, v, g*, L, and b. Similarly to BSP, it consists of v processor-memory components, a router that delivers messages in a point-to-point fashion, and a facility to synchronize all processors in a barrier style. Each processor has a unique label in the range 0, ..., v - 1.

As in BSP, a computation proceeds in a succession of supersteps separated by synchronizations, usually divided into communication and computation supersteps. In computation supersteps, processors perform local computations on data that is available locally at the beginning of the superstep, and issue send operations. In communication supersteps, the send operations are implemented, i.e., the exchange of data between the processors is done by the router.

The parameters of the BSP* model are defined as follows:

N: the number of items in the problem to be solved.

v: the number of processors available.

g*: the time (in number of processor operations) the router needs to deliver a packet of size b (in number of machine words) when in continuous use.

L: the minimum setup time or latency of a superstep, measured in CPU operations.

b: the minimum message size required by the router to achieve its rated throughput.

Cost Model: The cost of a computation on the BSP* model is represented by the sum of three quantities: one for computation time, one for communication time, and one for the time required by the processors to synchronize. Let A be a BSP* algorithm operating on input data of size N items. Let π be its parallel computation time, let η be the number of messages sent between processors during the execution of A, and let λ be the number of supersteps required by A to solve a problem of size N.

Let η_ij be the maximum of the number of words sent or received by processor j in the i-th communication superstep. The cost of the i-th communication superstep is then t_i = max_{1≤j≤v} w_ij, where w_ij = ⌈η_ij / b⌉ is the maximum of the number of blocks sent or received by processor j in superstep i. Let η* = Σ_i t_i. Then the cost T(A) of the computation is

    T(A) = π + g*·η* + λ·L.

Note that a message which is shorter than b words incurs the same cost as a message which is b words in length.

The c-optimality goodness criterion is typically used to evaluate an algorithm on the BSP* model; see Section .

We treat the BSP model as a special case of BSP*, differing only in the way the accounting is done for messages shorter than b items.
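The block-based accounting of a single BSP* communication superstep can be sketched as follows; the traffic figures are hypothetical, and the point is that a 1-word message pays the same as a full b-word block.

```python
import math

def bspstar_superstep_comm_cost(words_per_proc, b, g_star):
    """Cost t_i * g* of one BSP* communication superstep: processor j's
    traffic of eta_ij words is charged as w_ij = ceil(eta_ij / b) blocks,
    and the superstep costs g* times the maximum block count."""
    blocks = [math.ceil(w / b) for w in words_per_proc]
    return g_star * max(blocks)

# Three processors sending 1, 64, and 130 words, with b = 64 and g* = 8:
# block counts are 1, 1, 3, so the superstep costs 8 * 3 = 24.
print(bspstar_superstep_comm_cost([1, 64, 130], b=64, g_star=8))
```

Note that the 1-word sender is charged one full block, exactly the short-message penalty the model uses to reward messages of at least b words.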

The Coarse Grained Multicomputer (CGM) Model

The CGM model is another special case of the BSP model. In the spectrum of granularity of parallel computation, the PRAM model is the most fine-grained and the CGM is the most coarse-grained model.

System Model: The CGM model has parameters N and v. The model assumes that a collection of v processors are interconnected via a network. Initially, each processor holds an equal portion of the N data items of the problem to be solved, so each processor contains O(N/v) data items. A CGM algorithm is BSP-like in that it proceeds in an alternating sequence of computation and communication rounds (supersteps). In a single computation round, each processor performs a computation internally on the O(N/v) items in its local memory. Between computation rounds, a communication round is performed, where each processor sends a total of O(N/v) data to its peers and receives O(N/v) data in return. It is this restriction on the communication volume of a superstep that differentiates the CGM model from other BSP models.


Cost Model: A common goodness criterion for CGM algorithms is to simultaneously minimize (i) the parallel computation time and (ii) the number of supersteps performed (O(1) rounds is the ideal). This has the effect of defining an optimal CGM algorithm for a given problem as being an algorithm that has the minimum number of supersteps and performs the same amount of work as an optimal sequential algorithm for the same problem.

In this thesis we will treat CGM as a model which has parameters N, v, g, and L as in BSP, but for which each communication superstep consists of one h-relation for h = N/v. The cost measure of a CGM algorithm in our work will be the sum of the computation, communication, and synchronization times, as in BSP. Let A be a BSP algorithm operating on input data of size N items. Let π be its parallel computation time, and let λ be the number of supersteps required by A to solve a problem of size N. Then the cost T(A) of the computation is

    T(A) = π + λ·(g·(N/v) + L).

A superstep which involves less than N/v data sent or received by each processor is nonetheless charged for an (N/v)-relation.
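A minimal sketch of this cost measure, with hypothetical parameter values; note that λ multiplies the full (N/v)-relation charge regardless of how much data is actually sent.

```python
def cgm_cost(pi, lam, N, v, g, L):
    """CGM running time: every one of the lam communication supersteps is
    charged as a full (N/v)-relation, even if less data is sent."""
    return pi + lam * (g * (N / v) + L)

# Hypothetical algorithm: 10^6 operations, 8 supersteps, N = 10^6 items
# on v = 100 processors, with g = 4 and L = 500.
print(cgm_cost(pi=10**6, lam=8, N=10**6, v=100, g=4, L=500))  # 1324000.0
```

Because λ is the only communication knob, minimizing the number of rounds is the natural optimization target under this measure.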

Other Parallel Computing Models

A large number of other parallel computing models have been proposed. The LogP model is one of the most frequently mentioned. This model differs slightly from BSP in that it captures the communication overhead, which is the time spent by a processor sending or receiving a message. A detailed comparison between the LogP and BSP models has been presented; its authors conclude that BSP and LogP are nearly equivalent in modelling capability. A variant of LogP is the LogGP model, which makes special provisions for large messages. Both LogP and LogGP require detailed schedules of the communication activities.

Combining PDM and BSP-like Models

Objective: A More Comprehensive Model

While the Parallel Disk Model captures computation and I/O costs, it is designed for a specific type of communication network, where a communication operation is expected to take a single unit of time, comparable to a single CPU instruction. BSP and similar parallel models capture communication and computational costs for a more general class of interconnection networks, but do not capture I/O costs.


Figure : A Parallel Machine with External Memory.

The ACM Working Group on Storage I/O for Large-scale Computing presented the challenge of synthesizing a coherent model that combines the best aspects of the Parallel Disk Model and Bulk Synchronous Parallel models, to develop and analyze algorithms that use parallel I/O, computation, and communication.

Our EM-BSP, EM-BSP*, and EM-CGM models, introduced in Sections , , and  respectively, address many of the issues outlined in that report. In particular, the EM-CGM model, introduced in Section , has a small number of parameters, is not difficult to use, and appears to correctly predict, at least qualitatively, the performance of algorithms designed on the model. See Chapter  for preliminary implementation experiments of an EM-CGM Samplesort algorithm.

The EM-BSP Model

We first extend the BSP model to include secondary local memories. The basic idea is illustrated in Figure : each processor has, in addition to its local memory, an external memory in the form of a set of hard disks.

System Model

We apply this idea to extend the BSP model to its external memory version, EM-BSP, by adding the following to the standard BSP parameters:

M is the local memory size of each processor;

D is the number of disk drives of each processor;

B is the transfer block size of a disk drive; and

G is the ratio of local computational capacity (number of local computation operations) divided by local I/O capacity (number of blocks of size B that can be transferred between the local disks and memory) per unit time.

In this thesis we will use the parameter v to refer to the number of processors of a BSP, and the parameter p to refer to the number of processors of an EM-BSP.

In many practical cases, all processors have the same number of disks. We restrict ourselves to that case, although the model does not forbid different numbers of drives and memory sizes for each processor. We denote the disk drives of each processor by D_1, D_2, ..., D_D. Each drive consists of a sequence of tracks, consecutively numbered starting with 0, which can be accessed by direct (random) access using their unique track number. A track stores exactly one block of B records.

Each processor can use all of its D disk drives concurrently, and transfer D · B items between the local disks and its local memory in a single I/O operation, at cost G. In such an operation, we permit only one track per disk to be accessed, without any restriction on which track is accessed on each disk. We assume that a processor can store in its local memory at least one block from each local disk at the same time, i.e., M ≥ DB.

Cost Model

Like a computation on the BSP model, the computation on the EM-BSP model proceeds in a succession of supersteps. We adapt communication and computation supersteps from the BSP model, and allow multiple I/O operations during a single computation superstep. For the EM-BSP model, the computation cost t_comp and communication cost t_comm are the same as for the BSP model. The total cost of each superstep is defined as t_comp + t_comm + t_IO + L. The term t_IO is the additional I/O cost charged for the superstep, where t_IO = max_{1≤j≤p} {w_j^IO} and w_j^IO is the I/O cost incurred by processor j. Recall that each I/O operation costs G time units.

Note that the model gives incentives to access all disk drives using block transfers. For instance, a single-processor EM-BSP with D disks is capable of transferring a block of B items to or from each disk in a single I/O operation; an operation involving fewer elements incurs the same cost.
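The per-superstep accounting can be sketched as follows; the operation counts and parameter values are hypothetical.

```python
def em_bsp_superstep_cost(t_comp, t_comm, io_ops_per_proc, G, L):
    """EM-BSP superstep cost t_comp + t_comm + t_IO + L, where
    t_IO = G * max_j (parallel I/O operations issued by processor j);
    each I/O operation moves up to D*B items (one block per disk)."""
    t_io = G * max(io_ops_per_proc)
    return t_comp + t_comm + t_io + L

# Three processors issuing 5, 7, and 6 I/O operations in one superstep,
# with G = 50 and L = 100: cost is 1000 + 200 + 50*7 + 100 = 1650.
print(em_bsp_superstep_cost(1000, 200, [5, 7, 6], G=50, L=100))
```

Because only the slowest processor's I/O count matters, balancing I/O across processors is rewarded in the same way the BSP term g·η rewards balanced communication.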

The EM-BSP* Model

The EM-BSP* is derived from the BSP* model in a similar way to how the EM-BSP model is formed from BSP. We add to the BSP* model the additional parameters M (local memory size at each processor), D (number of disk drives at each processor), B (transfer block size for a local disk drive), and G (ratio of local computational capacity to local I/O capacity).

In this thesis we will use the parameter v to refer to the number of processors of a BSP*, and the parameter p to refer to the number of processors of an EM-BSP*.

The cost of each EM-BSP* superstep is defined as t_comp + t_comm + t_IO + L, where t_comp and t_comm refer to the computation time and communication time as defined for the BSP* model, and t_IO is defined as indicated above for the EM-BSP model.

We treat the EM-BSP as a special case of EM-BSP*, differing only in its accounting of the costs of short messages.

The EM-CGM Model

The EM-CGM model has the parameters of the CGM, plus the following additional parameters: M (local memory size at each processor), D (number of disk drives at each processor), B (transfer block size for a local disk drive), and G (ratio of local computational capacity to local I/O capacity).

In this thesis we will use the parameter v to refer to the number of processors of a CGM, and the parameter p to refer to the number of processors of an EM-CGM.

As in the CGM, the cost of each communication superstep is identical, and reflects the cost of each processor sending and receiving N/p data items in each communication superstep. We treat the EM-CGM as a special case of EM-BSP*, differing only in this constraint on communication volume. We use the full set of parameters N, p, g, L, M, D, B, G, as used in the EM-BSP*, for analysis purposes of the EM-CGM.

Goodness Criteria for Parallel EM Algorithms

Extending the c-optimality Criterion

Since even small multiplicative constant factors in runtime are important, we characterize the performance of an EM-BSP algorithm by comparing its runtime with an optimal sequential or PRAM algorithm. We measure ratios between runtimes on pairs of models that have the same set of local instructions, and adapt the optimality criteria proposed earlier.

Definition: Let A be the best sequential algorithm on the RAM for a problem P of size N items, and let T(A) be its worst case runtime. Let c be a constant. A c-optimal EM-BSP algorithm A* for p processors meets the following criteria (all asymptotic bounds are with respect to the problem size N):

1. The ratio between the computation time of A* and T(A)/p is in c + o(1).

2. The ratio between the communication time of A* and the computation time T(A)/p is in o(1).

3. The ratio between the I/O time of A* and the computation time T(A)/p is in o(1).

Note that the constraint on I/O differentiates this definition from the corresponding one for non-external-memory parallel algorithms.

We shall say that an EM-BSP algorithm is one-optimal if it fulfils the requirements of the above definition for c = 1.

Alternative Goodness Criteria for EM-BSP Algorithms

The simultaneous satisfaction of the constraints c + o(1), o(1), and o(1) is a stringent requirement to place on an EM-BSP algorithm. It is not always possible to show that the running time of an algorithm is dominated by its computation time, for instance.

A relaxation of this requirement leads to the notion of balancing the costs of computation, communication, and I/O, so that none of them dominates the running time, and the work done in each is asymptotically the same as the work of an optimal sequential algorithm.

We will use the terms work-optimal, communication-efficient, and I/O-efficient to describe an algorithm for which the corresponding ratios above are O(1). An algorithm which is work-optimal, communication-efficient, and I/O-efficient is therefore one whose running time complexity is no worse than the complexity T(A)/p; constant factors are ignored.

For some problems, such as searching a large data structure on disk, the cost of I/O may necessarily dominate the computation and communication time. In this case we require that the algorithm be I/O-optimal, meaning that the number of I/O operations matches the lower bound for the number of I/Os required to solve the problem. We may still be able to show that the computation ratio is c + o(1) and the communication ratio is o(1), but if not, we require that the algorithm be work-optimal and I/O-efficient.
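These criteria can be phrased as simple predicates on measured costs. The sketch below approximates the asymptotic o(1) conditions by a fixed ratio threshold (`slack`), which is an illustrative simplification rather than part of the definitions.

```python
def goodness(comp, comm, io, T_seq, p, c=1.0, slack=0.05):
    """Approximate check of the c-optimality conditions on measured costs:
    computation within c (+ slack) of T_seq/p, and communication and I/O
    each a vanishing (here: < slack) fraction of T_seq/p."""
    base = T_seq / p
    return {
        "c_optimal_comp": comp / base <= c + slack,
        "comm_efficient": comm / base < slack,
        "io_efficient": io / base < slack,
    }

# Hypothetical measurements: sequential time 8e6 on p = 8 processors.
print(goodness(comp=1.02e6, comm=3e4, io=4e4, T_seq=8e6, p=8))
```

Here all three checks pass: computation is within 2% of T_seq/p, and communication and I/O are each well under the 5% threshold.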

Chapter

EM-BSP Algorithms from Simulation

Overview of the Method

The purpose of this chapter is to describe, in general terms, how a suitable BSP algorithm can be executed as an EM-BSP algorithm on a machine with multiple disks. We will call this simulating a parallel algorithm as an external memory algorithm. In essence, the simulation is of the message passing that occurs in the parallel algorithm. This approach reflects Dijkstra's observation that, in an abstract sense, communication between processes in space (e.g., executing on the processors of a BSP computer) is no different than communication in time (in this case, by passing messages between tasks via the disks of an EM-BSP computer). At a less abstract level, however, the simulation of a BSP algorithm for this purpose involves major changes to the way that communication between processes is performed, and certain constraints on the parameters of the problem.

The main ideas of the mapping between a suitable BSP algorithm A and a corresponding external memory algorithm A* are as follows:

1. A communication operation between processors in A is replaced by one or more output operations to disk, and corresponding input operations from disk, in A*.

2. We choose values for the parameters of A that ensure that I/O in A* can be done blockwise.

3. To accommodate multiple disks on the target machine, we choose techniques and parameter values which allow us to use all of the disks in parallel; and

(Recall that we treat BSP* and CGM algorithms as particular types of BSP algorithms.)


Figure : The memory of a virtual processor. The program code is assumed to be of constant size; in many practical cases it is the same for all virtual processors, and therefore a single copy is sufficient. The context is of size μ and consists of the memory portion which is not read-only. The message area is of size γ and is contained in the context.

4. To accommodate multiple processors on the target machine, we ensure that the communication between processors in A is balanced, so that no real processor receives or sends much more data than any other.

The execution of a BSP algorithm proceeds as a series of compound supersteps, and can therefore be simulated by repeated application of the simulation steps for a single compound superstep. Message send/receive operations are modeled by disk read/write operations.

We adopt the following terminology. The v processors of the BSP machine will be called virtual processors. Each communication superstep will be divided into a sending superstep and a receiving superstep. During a sending superstep, messages are generated, and during a receiving superstep they are received. A compound superstep is composed of a receiving, a computation, and a sending superstep. The context of a virtual processor is the local memory it uses, and the context size of a virtual processor is the maximum size of its context used during the computation. The maximum context size over all virtual processors is denoted μ; the maximum size of the data communicated by a virtual processor is represented by γ, and γ = O(μ) (see Figure ).

Outline of the simulation for a compound superstep: A compound superstep for the v virtual processors of a BSP machine is simulated by performing the following steps in a round-robin fashion, for k virtual processors at a time (k ≤ v):

1. Fetching phase: Read the contexts and the messages to be received by the current virtual processors from disk into memory.

2. Computation phase: Perform the computations indicated by the BSP algorithm for these k virtual processors in this compound superstep.

3. Writing phase: Save the current contexts and the messages sent by the current virtual processors on disk.

The Fetching phase (respectively Computation phase, Writing phase) performs the operations necessary to simulate the receiving superstep (respectively computation superstep, sending superstep) of a compound superstep. A compound superstep produces messages which are received in the following compound superstep. The simulation must store the generated messages on disk in such a way that they can be fetched efficiently during the Fetching phase of the next compound superstep. By efficiently we mean, in this context, that both input and output operations are fully blocked to the disk block size B, and that, if parallel disks are present, they are utilized in parallel (i.e., for D parallel disks, input and output operations are performed D blocks at a time). These requirements can easily be met for the contexts, as we know their maximum size and can preallocate a dedicated area for each, spread across the D disks. For the generated message traffic, however, these requirements may be more difficult to meet, as we may not know the communication pattern for a particular compound superstep.

The communication time of a BSP superstep is O(gγ + L), where γ is the maximum size of the data communicated by a virtual processor. Recall that μ is the maximum context size of a virtual processor; hence γ = O(μ). In the following, the context and messages generated by a virtual processor are divided into blocks, and the blocks are spread evenly over the available disks.

Algorithm SeqCompoundSuperstep simulates a single compound superstep of a v-processor BSP on a uniprocessor EM-BSP machine. We simulate k ≤ v virtual processors at a time, and we refer to such a collection of processors as a batch. We also use this term to refer to the messages associated with a batch of processors. Note that μ ≤ M. To maximize the use of available memory, we choose k = ⌊M/μ⌋. The following outlines the basic ideas of the algorithm; the details are presented in subsequent chapters.

Algorithm SeqCompoundSuperstep

Objective: Simulation of a compound superstep of a v-processor BSP on a single-processor EM-BSP with D disks.

Input: For each i, 0 ≤ i ≤ ⌈v/k⌉ − 1, the blocks of the contexts and arriving messages of batch i (i.e., virtual processors ik, ..., (i + 1)k − 1) are spread over the D disks in a parallel format.

Output: (i) The changed contexts of the k simulated processors are spread across the disks in striped format. (ii) The messages generated during the compound superstep are grouped by destination into ⌈v/k⌉ batches, and each batch is stored in a parallel format on the disks.

1. for i = 0 to ⌈v/k⌉ − 1:

(a) Read the contexts V_ik, ..., V_(i+1)k−1 of the k virtual processors of batch i from the disks into memory.

(b) Read the packets received by the k virtual processors of batch i from the disks.

(c) Simulate the local computation of the k virtual processors of batch i.

(d) Write the packets which were sent by the k virtual processors to the D disks.

(e) Write the changed contexts V_ik to V_(i+1)k−1 back to the D disks.

2. Reorganize the blocks containing the generated messages into consecutive format for each batch of k processors, so that they can be accessed in parallel from the disks in the simulation of the next compound superstep.

End of Algorithm

Algorithm SeqCompoundSuperstep simulates a single compound superstep of a BSP algorithm. Steps (a) and (b) correspond to the Fetching Phase, Step (c) is the Computation Phase, and Steps (d) and (e) comprise the Writing Phase.

Details of Steps (a) and (e): Since we know the size of the contexts of the processors, and since the order in which we simulate the virtual processors is static during the simulation, we can distribute the k contexts deterministically. We reserve an area of total size vμ on the disks (⌈vμ/(BD)⌉ blocks on each disk), where we store the contexts. We split the context V_j of virtual processor j into blocks of size B, and store the i-th block of V_j on disk (i + j⌈μ/B⌉) mod D, using track ⌊(i + j⌈μ/B⌉)/D⌋. Since the context of each processor is now in consecutive format on the disks, we can easily read and write the contexts of k consecutive processors using D disks in parallel for every I/O operation.
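The disk/track mapping just described can be expressed as a small helper. This is an illustrative sketch under the stated mapping (the i-th block of context V_j goes to disk (i + j⌈μ/B⌉) mod D); the function name is hypothetical.

```python
def context_block_location(i, j, mu, B, D):
    """Location of the i-th block of context V_j (context size mu, block
    size B) on D disks in consecutive format: the block's position in the
    global sequence of context blocks determines its disk and track."""
    blocks_per_context = -(-mu // B)   # ceil(mu / B) without importing math
    g = j * blocks_per_context + i     # global block index
    return g % D, g // D               # (disk number, track number)
```

For example, with μ = 4, B = 2, D = 3, the two blocks of V_0 land on disks 0 and 1 of track 0, and V_1's blocks continue on disk 2 (track 0) and disk 0 (track 1), so any D consecutive blocks occupy D distinct disks.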

Details of Step (b): The details of this step vary slightly depending on the technique used: the randomized simulation or one of the deterministic simulations described in subsequent chapters. In all cases, however, Step 2 of the previous compound superstep guarantees that the blocks which contain the messages destined for the current processors are stored in a reserved area, evenly distributed over the disks. Therefore we can use a similar technique to fetch the messages as we used to fetch the contexts.

Details of Step (d) and Step 2: Writing the messages to the parallel disks (Step (d)) is more difficult than writing the contexts to the disks. This is because we must ensure that in each I/O operation a single block is written to each disk in parallel, or that, asymptotically, D disks receive a block of data. We use the fact that we know the size of the contexts and that they are all the same size; this makes the swapping of contexts rather straightforward. However, we do not know the size of interprocessor messages in general. They may be of widely differing sizes, from zero items to M items. In the single real processor case, the messages can be delivered by sorting. However, we are ultimately interested in an algorithm that works for a multiprocessor target machine.

In addition to writing the messages to the disks in parallel, we must also arrange that in Step (b) of the next superstep we can read the messages intended for each processor using all of the disks in parallel. This is the responsibility of Step 2. Steps (d) and 2 vary depending on the technique used:

In Chapter , a BSP algorithm is assumed as input. Step (d) involves distributing the communication packets of the BSP randomly among the available disks. Step 2 then requires several passes over the data to prepare it for Step (b) of the next compound superstep.

In Chapter , Step (d) involves deterministically splitting the communication from each batch of k processors into v pieces, which are written into fixed locations on the disks. Step 2 in this case is not required.

In Chapter , another deterministic technique is presented. Step (d) involves ⌈1/ε⌉ routing stages, for 0 < ε < 1. In the first ⌈1/ε⌉ − 1 stages, the processors deterministically split their outgoing message data into v^ε pieces, and the pieces are distributed evenly over the processors of v^ε groups. In the final routing stage, the method of Chapter is used. Step 2 is not required.

We now generalize the simulation to p > 1 processors on the target EM-BSP machine.

We first describe the simulation of a compound superstep of a v-processor BSP with communication time gλ + L, computation time τ + L, and context size μ, on a p-processor EM-BSP, for p > 1. We assume that b ≥ B, where b is the message block size of the BSP virtual machine.

Outline of the parallel simulation: As an initial step, v/p virtual processors, numbered iv/p, ..., (i + 1)v/p − 1, are assigned to each simulating processor i.¹ As before, the simulation of each compound superstep is composed of a series of rounds. During each of v/(pk) rounds, a real processor simulates the steps of k virtual processors. Messages sent between virtual processors on different real machines require real communication by the simulation. If these messages are sent directly to their destinations, the traffic may be unbalanced, causing inefficiencies in communication. We describe the techniques used to deal with this problem below.

¹ This assignment is different for the deterministic simulation described in Chapter . See below.

We maintain v/(pk) batches to store the generated messages. The j-th batch contains the messages destined for the virtual processors simulated in the j-th round of the current compound superstep. Each processor writes its share of the v/(pk) batches to its local disks.
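The batch bookkeeping can be sketched as follows, assuming, per the algorithm's input specification below, that batch j holds the messages for virtual processors jkp, ..., (j + 1)kp − 1. Names are hypothetical.

```python
def batch_of(dest_vp, p, k):
    """Batch index of a destination virtual processor: batch j holds the
    messages for virtual processors j*k*p, ..., (j+1)*k*p - 1."""
    return dest_vp // (k * p)

def group_into_batches(messages, v, p, k):
    """messages: list of (dest_vp, payload) pairs. Returns the v/(p*k)
    batches into which the generated traffic is split for the next
    compound superstep."""
    batches = [[] for _ in range(v // (p * k))]
    for dest, payload in messages:
        batches[batch_of(dest, p, k)].append((dest, payload))
    return batches
```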

Algorithm ParCompoundSuperstep

Objective: Simulation of a compound superstep of a v-processor BSP on a p-processor EM-BSP.

Input: For every batch j, where 0 ≤ j ≤ v/(pk) − 1, the message and context blocks of the virtual processors jkp, ..., (j + 1)kp − 1 are divided among the real processors and their local disks as follows:

- each real processor holds O(kλ/B) blocks of messages and O(kμ/B) blocks of context;

- each local disk contains O(kλ/(BD)) blocks of messages and O(kμ/(BD)) blocks of context.

Output: The changed contexts and generated messages, distributed as required for the next compound superstep.

1. For j = 0 to v/(pk) − 1 do:
   For i = 0 to p − 1 do in parallel:

   (a) Fetching Phase: processor i reads the contexts and message blocks pertaining to batch j from its local disks in parallel.

   (b) Computing Phase: processor i simulates the computation supersteps of its current virtual processors and collects all generated messages in its local memory.

   (c) Writing Phase: processor i writes the updated contexts and message blocks generated by batch j to its local disks.

2. In a series of communication supersteps, the real processors exchange and reorganize the message data generated in the previous computation rounds.

End of Algorithm
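A sequential Python emulation of the parallel loop may clarify the round structure. It is a sketch only: the p "real processors" run in a Python loop rather than in parallel, the Step-2 reorganization is reduced to appending to per-destination lists, and all names are hypothetical.

```python
def par_compound_superstep(contexts, inboxes, compute, p, k):
    """Emulate Algorithm ParCompoundSuperstep sequentially: v/(p*k) rounds;
    in each round every 'real processor' i simulates k of its own virtual
    processors (consecutive-label assignment: i*(v/p) + j*k onwards)."""
    v = len(contexts)
    rounds = v // (p * k)
    outboxes = [[] for _ in range(v)]        # stand-in for batched disk storage
    for j in range(rounds):
        for i in range(p):                   # 'in parallel' over real processors
            base = i * (v // p) + j * k
            for vp in range(base, base + k):
                contexts[vp], sent = compute(contexts[vp], inboxes[vp])
                for dest, msg in sent:
                    outboxes[dest].append(msg)   # later reorganized into batches
    return contexts, outboxes
```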

Many of the details of Algorithm ParCompoundSuperstep are similar to ones previously described for Algorithm SeqCompoundSuperstep, but there are differences between the three techniques presented in Chapters , and . We outline the main differences below.

In the randomized approach of Chapter :

The message blocks of the BSP are distributed randomly among the real processors.

In a separate round of communication, the message blocks are sent to their correct destinations.

This allows us to ensure that the real processors both send and receive the same amount of data, with high probability.

In concept, these two communication rounds can be performed for each batch as part of Step 2 of ParCompoundSuperstep. Instead, however, we distribute the message blocks randomly for each round at the end of Step (c), and send them to their proper destination as the first part of Step (a). See Chapter for details.
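The two communication rounds can be sketched as follows. This is an illustrative model of the idea (random intermediate destination, then forwarding), not the thesis's code; names are hypothetical.

```python
import random

def randomized_delivery(packets, p, rng=random):
    """Two-stage delivery: stage 1 sends each packet to a randomly chosen
    intermediate processor; stage 2 forwards it to its true destination.
    This balances per-processor send/receive volume with high probability."""
    stage1 = [[] for _ in range(p)]
    for dest, payload in packets:
        stage1[rng.randrange(p)].append((dest, payload))  # random intermediate
    stage2 = [[] for _ in range(p)]
    for held in stage1:
        for dest, payload in held:
            stage2[dest].append(payload)                  # forward to destination
    return stage2
```

Whatever the random intermediate choices, every payload still ends up at its intended destination after the second round.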

In the deterministic approach of Chapter , each round of communication between virtual processors is transformed into two rounds, where every message is of size N/v. This ensures that the communication between real processors is balanced; i.e., every real processor sends and receives the same amount of data in each round.

The deterministic approach of Chapter requires less communication volume and lower slackness than the method of Chapter . This permits the internal memory to be reduced relative to the technique of Chapter . Here the virtual processors are assigned to real processors in a different manner than in the previous cases. Whereas we had previously assigned a range of virtual processors with consecutive labels to each real processor, we now assign the virtual processors to real processors in a round-robin fashion. Our routing algorithm computes an ordering over all of the blocks to be sent in each round, and then sends consecutive blocks in the ordering to virtual processors with consecutive labels. We can show that each real processor then receives the same amount of data, within a constant factor, in every communication round.
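The contrast between the two assignments can be made concrete. A sketch with hypothetical names; the balance check illustrates why consecutive blocks in the computed ordering land evenly on real processors under the round-robin assignment.

```python
def block_owner(vp, v, p):
    """Consecutive-label assignment: processor i simulates
    virtual processors i*(v/p), ..., (i+1)*(v/p) - 1."""
    return vp // (v // p)

def round_robin_owner(vp, p):
    """Round-robin assignment: virtual processor labels are dealt out
    cyclically, so vp is simulated by real processor vp mod p."""
    return vp % p
```

Under the round-robin assignment, any run of consecutive virtual-processor labels is spread over the real processors with counts differing by at most one, which is the balance property the routing argument relies on.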

Overview and Rationale

In Chapter we describe a randomized approach which allows us to efficiently write the messages to disk and efficiently retrieve them in the next compound superstep. We show in Theorems and that if the simulation is performed on a c-optimal BSP algorithm, the resulting EM algorithm is c-optimal with high probability. The resulting EM-BSP algorithm (in particular, EM-BSP*) acquires its property of blocked I/O from the attribute of blocked communication present in the original BSP* algorithm from which it was derived. The techniques described in Chapter ensure that this property of blocked communication between tasks is preserved and that, in addition, I/O is done in parallel if multiple disks are present.

In Chapter we describe a deterministic simulation technique for producing an EM-BSP (in particular, an EM-CGM) algorithm from a particular type of BSP algorithm (in particular, a CGM algorithm). This approach requires that each virtual

processor generate at least v·b items of communication data in every superstep, where the parameter b represents the desired communication block size. This allows us to establish tight upper and lower bounds on the message size to every processor, and allows a simple, efficient, deterministic simulation.

In Chapter we describe another deterministic approach, which also generates an EM-CGM algorithm from a CGM algorithm. It requires less communication volume than the method of Section : each virtual processor need only generate v^ε · b items of communication data in every superstep, where ε (0 < ε < 1) is a constant. The simulation requires a constant number of rounds for each communication round of the CGM algorithm, but this is a larger constant than in Section .

In order to assist the reader in understanding the arguments of this paper, a summary of notation and an index of terms can be found beginning on page .

Chapter

Randomized Simulation

Single Processor Target Machine

We will now describe the details of the BSP to EM-BSP simulation. In the following we use the terms consecutive format, striped format, and parallel format, which were discussed in Section . We will also refer to another type of parallel format, linked format, which will be described below.

The communication time of a BSP superstep is O(g⌈λ/b⌉ + L), where λ is the maximum size of the data communicated by a virtual processor. Recall that μ is the maximum context size of a virtual processor; hence λ = O(μ). In the following, the context and messages generated by a virtual processor are divided into blocks, and the blocks are spread in consecutive format over the available disks.

Algorithm SeqCompoundSS simulates a single compound superstep of a v-processor BSP on a uniprocessor EM-BSP machine with internal memory of size M and D disks. We simulate k virtual processors at a time, and we refer to such a collection of processors, or to the messages they generate, as a batch. Of course, kμ ≤ M.

Algorithm SeqCompoundSS

Objective: Simulation of a compound superstep of a v-processor BSP on a single-processor EM-BSP with D disks.

Input: For each i, 0 ≤ i ≤ ⌈v/k⌉ − 1, the blocks of the contexts and arriving messages of group i (i.e., virtual processors ik, ..., (i + 1)k − 1) are spread over the D disks in consecutive format, so that they can be accessed in parallel.

Output: (i) The changed contexts of the k simulated processors are spread across the disks in consecutive format. (ii) The messages generated during the compound superstep are grouped by destination into ⌈v/k⌉ groups, and each group is stored in consecutive format on the disks.

1. for i = 0 to ⌈v/k⌉ − 1:

(a) Read the contexts V_ik, ..., V_(i+1)k−1 of the k virtual processors of group i from the disks into memory.

(b) Read the packets received by the k virtual processors of group i from the disks.

(c) Simulate the local computation of the k virtual processors of group i.

(d) Write the packets which were sent by the k virtual processors to the D disks.

(e) Write the changed contexts V_ik to V_(i+1)k−1 back to the D disks.

2. Perform SimulateRouting: Reorganize the blocks containing the generated messages into consecutive format for each group of k processors, so that they can be accessed in parallel from the disks in the simulation of the next compound superstep.

End of Algorithm

Algorithm SeqCompoundSS simulates a single compound superstep of the BSP algorithm. Steps (a) and (b) correspond to the Fetching Phase, Step (c) is the Computation Phase, and Steps (d) and (e) comprise the Writing Phase.

Details of Steps (a) and (e): Since we know the size of the contexts of the processors, and since the order in which we simulate the virtual processors is static during the simulation, we can distribute the k contexts deterministically. We reserve an area of total size vμ on the disks (⌈vμ/(BD)⌉ blocks on each disk), where we store the contexts. We split the context V_j of virtual processor j into blocks of size B, and store the i-th block of V_j on disk (i + j⌈μ/B⌉) mod D, using track ⌊(i + j⌈μ/B⌉)/D⌋. Since the context of each processor is now in consecutive format on the disks, we can read and write the contexts of k consecutive processors using D disks in parallel for every I/O operation.

Details of Step (b): Step 2 of the previous compound superstep guaranteed that the blocks which contain the messages destined for the current processors are stored in a reserved area, evenly distributed over the disks. Therefore we can use a similar technique to fetch the messages as we used to fetch the contexts.

Details of Step (d): As described in Chapter , managing the generated message traffic may be more difficult than the management of the contexts, as we may not know the communication pattern for a particular compound superstep. After the Computation Phase, all messages sent by the current group of k processors in the current compound superstep have been generated and stored in internal memory. The coarse-grained nature of the BSP algorithm results in large messages, which are as long as or longer than the block size B. We cut the messages into blocks of size B. Each block is labelled with the destination address from its original message. In ⌈kλ/(BD)⌉ rounds, we write the blocks out to the disks. In each round, a group of D blocks b_0, ..., b_{D−1} is written in parallel to the disks by choosing a random permutation π of {0, ..., D − 1} and writing block b_i to disk π(i).
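One write cycle can be sketched as follows, assuming the blocks are already labelled with their destination addresses; the name `write_round` is hypothetical.

```python
import random

def write_round(blocks, D, rng=random):
    """One write cycle of Step (d): write up to D labelled blocks, one per
    disk, by drawing a random permutation of the disk numbers and sending
    block b_i to disk perm[i]."""
    assert len(blocks) <= D
    perm = list(range(D))
    rng.shuffle(perm)                 # random permutation of {0, ..., D-1}
    placement = {}                    # disk number -> block written there
    for i, blk in enumerate(blocks):
        placement[perm[i]] = blk
    return placement
```

Because the disk numbers form a permutation, no disk ever receives more than one block in a cycle, which is exactly the fully parallel write the simulation needs.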

The blocks are partitioned into D buckets on the disks, depending on their destination address. Each bucket contains the blocks destined for v/D consecutive virtual processors.

In order to maintain the buckets, the simulation uses a table of D pointers on each disk. The i-th entry in the table on a disk points to the head of a list of blocks of bucket i that have been written to that disk. Whenever we write a block of bucket i to disk D_j, we allocate a free track on D_j and concatenate it to the list for bucket i. For convenience, we shall refer to the format just described for the blocks in a bucket as linked format.
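The linked format can be modelled with per-disk track lists and a table of bucket heads. This is an in-memory sketch with hypothetical names; real tracks and free-track allocation are simplified to list indices.

```python
class LinkedBuckets:
    """Per-disk table of D bucket heads: appending a block of bucket i on
    disk d allocates the next free track on d and links it onto the head
    of d's list for bucket i."""
    def __init__(self, D):
        self.D = D
        self.tracks = [[] for _ in range(D)]          # disk -> allocated tracks
        self.heads = [[None] * D for _ in range(D)]   # disk -> bucket -> head

    def append(self, disk, bucket, block):
        track = len(self.tracks[disk])                # allocate a free track
        self.tracks[disk].append((bucket, block, self.heads[disk][bucket]))
        self.heads[disk][bucket] = track              # new head of the chain

    def read_bucket(self, bucket):
        """Collect a bucket's blocks, one chain per disk; the D chains
        could be traversed in parallel, one block per disk per I/O."""
        out = []
        for d in range(self.D):
            t = self.heads[d][bucket]
            while t is not None:
                _, block, prev = self.tracks[d][t]
                out.append(block)
                t = prev
        return out
```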

Clearly, we can read all the blocks composing a bucket stored on D disks in the linked format in O(kλ/(BD)) parallel I/O operations, provided each disk contains the same number of blocks. In Lemma A we will show that, with high probability, the blocks of each bucket are uniformly distributed over the disks.

Algorithm SimulateRouting provides the details of Step 2 of SeqCompoundSS.

Algorithm SimulateRouting

Objective: Reorganize the blocks of messages from the previous computation superstep into consecutive format (Step 2 of Algorithm SeqCompoundSS).

Input: The D buckets of messages, stored on the disks in linked format.

Output: The ⌈v/k⌉ groups of messages, stored on the disks in consecutive format.

1. Allocate space for a copy of bucket i on disk i, for 0 ≤ i ≤ D − 1. Within each copy, reserve space for each of its groups. Read the buckets from the disks in parallel and write them back, one bucket per disk. For the j-th parallel read/write we perform the following:

   for d = 0 to D − 1 in parallel do:
   Read block b_d, belonging to bucket d, from disk (d + j) mod D. Write block b_d to disk d, on the next available track for the group to which it is addressed.

2. From each disk in parallel, read a block of each bucket, writing the blocks back to the disks so that each bucket is in consecutive format:

   for j = 0 to ⌈vλ/(BD)⌉ − 1:
   for d = 0 to D − 1 in parallel do:
   read the j-th block from disk d and write it to disk (d + j) mod D, on track d⌈vλ/(D²B)⌉ + ⌊j/D⌋.

End of Algorithm

After Step 1 of Algorithm SimulateRouting, all messages that will be received by a group of k processors are stored in one region on the same disk. After Step 2, the blocks are stored in consecutive format. In fact, they are stored in fixed locations, like the blocks of the context (see Figure ). This is possible because we know that each virtual processor receives and sends messages of total size λ.

Lemma: Steps (a), (b), and (e) of Algorithm SeqCompoundSS have computation time O(vμ), memory kμ + BD, I/O time O((vμ/(DB))G), and disk space O(vμ/(BD)) blocks per disk.

[Figure : Reorganization in Step 2 of SimulateRouting. Before: each disk contains the message blocks for v/D processors, divided into v/(kD) groups. After: the messages for each group are spread over the D disks in consecutive format.]

Proof: We have to read and write ⌈kμ/B⌉ blocks of context (Steps (a) and (e)) for the simulation of each simulated superstep. In Step (b), the arriving messages occupy ⌈kλ/B⌉ blocks. Since in these cases the blocks are arranged on the disks in consecutive format, we can access D tracks in parallel at a time. Hence we need the following number of I/O operations for one simulated superstep:

∑_{j=1}^{⌈v/k⌉} ( 2⌈kμ/(BD)⌉ + ⌈kλ/(BD)⌉ ) = O( vμ/(BD) + vλ/(BD) + v/k ).

Since λ = O(μ), the number of I/O operations is in O(vμ/(BD)), and this implies O(vμ) additional local computation steps.

Since the memory of the simulating processor must hold the context of k virtual processors and a parallel disk track, M ≥ kμ + BD.
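The I/O accounting in the proof can be checked numerically with a small illustrative function (hypothetical name), which assumes fully blocked transfers in consecutive format, D blocks per parallel I/O, and the per-batch cost of two context transfers (read and write) plus one message read.

```python
import math

def ios_per_superstep(v, k, mu, lam, B, D):
    """Parallel I/Os for Steps (a), (b), (e) of one compound superstep:
    each of the ceil(v/k) batches reads and writes ceil(k*mu/B) context
    blocks and reads ceil(k*lam/B) message blocks, D blocks per I/O."""
    context_ios = math.ceil(math.ceil(k * mu / B) / D)
    message_ios = math.ceil(math.ceil(k * lam / B) / D)
    per_batch = 2 * context_ios + message_ios
    return math.ceil(v / k) * per_batch
```

For instance, v = 8, k = 2, μ = λ = 4, B = 2, D = 2 gives 6 parallel I/Os per batch over 4 batches, matching the O(vμ/(BD)) bound up to the additive O(v/k) rounding term.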

The blocks representing simulated message traffic are divided into buckets by the simulation. The contents of a bucket are written to the available disks in a series of write cycles. In each write cycle, at most one block is written to any disk.

Lemma: The computation time of Algorithm SimulateRouting is O(lλv), and its I/O time is O(l(λv/(BD))G), for k ≥ BD and v ≥ D log D, with probability at least 1 − e^(−l·v/D), for constant l.

Proof: For the purposes of the proof, we assume that the maximum amount of communication is required. This can be accomplished by the introduction of dummy blocks, if necessary. Each bucket contains R = vλ/(BD) blocks, since each of the D buckets contains the messages destined for v/D virtual processors.

By Lemma A, a given disk contains more than l·R/D records of a given bucket with probability at most exp(−l·R/D), for constant l. Let X denote the event that any disk contains more than l·R/D blocks of any bucket. There are D drives and D buckets, so with k ≥ BD and v ≥ D log D we have

Pr[X] ≤ D² exp(−l·R/D)
 = D² exp(−l·vλ/(BD²))
 ≤ exp(−l·vλ/(BD²) + 2 log D)
 ≤ exp(−l·v/D).

After D iterations of Step 1 in Algorithm SimulateRouting, D² blocks per bucket have been moved. Each disk contains fewer than l·vλ/(D²B) blocks of each bucket, with high probability. Thus, after l·vλ/(DB) iterations, all blocks have been moved.

In Step 2, ⌈vλ/(BD)⌉ iterations are performed. During each iteration, a parallel read and a parallel write operation are performed.

Thus the total I/O time of Algorithm SimulateRouting is O((lλv/(BD))G), and the total computation time is O(lλv), with high probability.

Lemma: A compound superstep of a v-processor BSP with computation time τ + L, communication time g⌈λ/b⌉ + L, and local memory of size μ can be simulated on a single-processor EM-BSP with D disks and internal memory of size M, in computation time vτ + O(lλv) and I/O time O(l(vμ/(BD))G), with probability at least 1 − exp(−l·v/D), for constant l, v ≥ D log D, M ≥ kμ + BD, B ≤ b, and arbitrary integer k with BD ≤ k ≤ v.

Proof: The local memory of a virtual processor is of size μ, which is sufficient for incoming messages. Allowing, in addition, for a parallel disk track in memory implies M ≥ kμ + BD memory in the EM-BSP machine.

The disk space needed by the simulation is the total context size vμ, which includes space for incoming messages. By Lemma A, the communicated data is evenly distributed over the disks, with high probability. Therefore we need in total O(vμ/(BD)) space on each disk, with high probability.

Step (c) of Algorithm SeqCompoundSS consumes vτ computation time overall. For each group of k virtual processors, ⌈kλ/b⌉ messages are generated. This adds O(vλ) computation time overall.

During Step (d) of Algorithm SeqCompoundSS, a permutation can be generated in O(D) time, so the computation time for each batch is O((kλ/(BD))·D + kλ) and the I/O time O((kλ/(BD))G). Therefore Step (d) contributes computation time O(vλ) and I/O time O((vλ/(BD))G) for the whole superstep.

By Lemma and Lemma , the computation time and I/O time, respectively, for Steps (a), (b), (e) and Step 2 are O(vμ) and O((vμ/(BD))G·l). Since λ = O(μ), these dominate the costs of Step (d), so the overall computation time is vτ + O(lvμ) and the I/O time is O((vμ/(BD))G·l), with probability at least 1 − exp(−l·v/D).

Lemma A allows us to exploit the independence of the random experiments performed during each compound superstep, in order to prove that the success probability for the entire simulation is as large as for the simulation of a single compound superstep.

Theorem: A v-processor BSP algorithm A with communication time g⌈λ/b⌉ + L, computation time τ + L, and local memory μ can be simulated as a single-processor EM-BSP algorithm A′ with computation time vτ + O(lλv) and I/O time O(l(vμ/(BD))G), with probability at least 1 − exp(−l·v/D), for suitable constant l and arbitrary integer k ≤ v, M ≥ kμ + BD, v ≥ D log D, B ≤ b.

In particular, if A is c-optimal and μG/(BD) = o(τ), the simulation results in a c-optimal EM-BSP algorithm.

Proof: Since we assume that the amount of communication each BSP processor performs per superstep is bounded by its memory size, we can conclude that the communication time of each compound superstep is bounded by g⌈μ/b⌉ + L.

From Lemma , a compound superstep of A with communication time g⌈λ/b⌉ + L and computation time τ + L can be simulated on an EM-BSP machine with D disks and local memory kμ, in computation time vτ + O(lλv) and I/O time O(l(vμ/(BD))G), with probability at least 1 − exp(−l·v/D).

The computation time required to simulate the computation steps of A is vτ. The computational overhead is O(lλv), which is asymptotically smaller than vτ for λ = o(τ) and constant l.

Lemma A allows us to exploit the independence of the random experiments performed during each compound superstep, in order to prove that the success probability for the entire simulation is as large as for the simulation of a single compound superstep. Since the worst-case runtime of a compound superstep is at most D times the average runtime, the theorem follows from Lemma A with m = v/D, x = D, and v ≥ D log D.

Multiple Processor Target Machine

In this section we generalize the simulation to p processors on the target EM-BSP machine.

We first describe the simulation of a compound superstep of a v-processor BSP with communication time g⌈λ/b⌉ + L, computation time τ + L, and context size μ, on a p-processor EM-BSP, for p > 1. We assume further that b ≥ B, where b is the message block size of the BSP virtual machine.

Outline of the parallel simulation: As an initial step, v/p virtual processors, numbered iv/p, ..., (i + 1)v/p − 1, are assigned to each simulating processor i. As before, the simulation of each compound superstep is composed of a series of rounds. During the j-th of v/(pk) rounds, processor i simulates the steps of the k virtual processors iv/p + jk, ..., iv/p + (j + 1)k − 1. Messages sent between virtual processors on different real machines require real communication by the simulation. If these messages are sent directly to their destinations by the simulation, the traffic may be unbalanced, causing inefficiencies in communication. Therefore we distribute the messages randomly among the processors after each round. After the last round of a compound superstep, each processor reorganizes the messages it has received, so that during the fetching phase of the j-th round of the next compound superstep it can read the messages destined for the virtual processors jkp, ..., (j + 1)kp − 1 in parallel from disk and direct them to the correct simulating processor.

We maintain v/(pk) batches to store the generated messages. The j-th batch contains the messages destined for the virtual processors simulated in the j-th round of the current compound superstep. Each processor writes its share of the v/(pk) batches to its local disks. The main difficulty is the efficient maintenance of the batches, so that the packets which are needed during the j-th simulation round can be read in parallel from the disks.

Algorithm ParCompoundSS is executed by each real processor i, for 0 ≤ i ≤ p − 1, for each compound superstep.

Algorithm ParCompoundSS

Objective: Simulation of a compound superstep of a v-processor BSP on a p-processor EM-BSP.

Input: For every batch j, where 0 ≤ j ≤ v/(pk) − 1, the message and context blocks of the virtual processors jkp, ..., (j + 1)kp − 1 are divided among the real processors and their local disks, so that each disk contains O(kλ/(BD)) blocks of message data and O(kμ/(BD)) blocks of context.

Output: The changed contexts and generated messages, distributed as required for the next compound superstep.

1. For j = 0 to v/(pk) − 1 do:

   (a) Fetching Phase: Processor i reads the contexts for group j from its local disks in parallel, and then reads any message blocks pertaining to group j from its local disks in parallel and sends them to the appropriate simulating processors.

   (b) Computing Phase: Processor i simulates the computation supersteps of its current virtual processors and collects all generated messages in its local memory.

   (c) Writing Phase: Processor i splits all generated messages into packets of size b (the message size) and sends each packet to a randomly chosen processor. Processor i writes the contexts for group j back to its local disks.

2. For i = 0 to p − 1 do in parallel: processor i reorganizes the received batches, using Algorithm SimulateRouting, so that each batch is evenly distributed over the local disks.

End of Algorithm

Many of the details of Algorithm ParCompoundSS are similar to ones previously described for Algorithm SeqCompoundSS, so we focus on the differences. In each round, a processor receives O(kλ/b) packets, with high probability. In Step (c), each processor cuts the packets it receives into blocks of size B. The blocks are written to the disks using a random permutation of disk numbers, as before. Depending on their destination address, the blocks are maintained in D buckets, which are divided among the disks as in Step (d) of the single-processor simulation. Each bucket contains the blocks for v/(pkD) batches. As before, a table of D pointers is maintained for each disk. As before, we can show that each disk will receive the same number of blocks of every bucket, with high probability.

Lemma: Step 2 of Algorithm ParCompoundSS can be done in O(l·λv/p) computation time and O(l(λv/(pDB))G) I/O time, with probability at least 1 − e^(−l·v/(pD)), for suitable constant l, v ≥ pD log(pD), k ≥ BD, and B ≤ b.

Proof: We begin by considering how the packets belonging to a fixed bucket are distributed among the processors by Step (c). A bucket contains the blocks/packets of v/(pkD) batches, and each batch contains kpλ/b packets. In total, for each bucket, vλ/(Db) packets are distributed randomly among the p processors. Since we have D buckets and p processors, the probability that any processor receives more than R = l·vλ/(pDb) packets for any bucket is at most exp(−l·vλ/(pDb) + ln(pD)), from Lemma A with x = vλ/(pDb) and y = pD. To bound this probability, let

(1)  vλ ≥ pDb log(pD).

From the statement of the lemma we have

(2)  v ≥ pD log(pD).

We now show that (2) implies (1). Since D ≥ 1 and B ≤ b, (2) means that v ≥ (pbB/(BD)) log(pD). Multiplying by BD and substituting, we obtain (1), as required. With constraint (1), therefore, we have probability at least 1 − exp(−l·vλ/(pDb)) that each bucket of every processor contains fewer than l·vλ/(pDb) packets. To bound this probability further, to 1 − exp(−l·v/(pD)), we need a further constraint:

vλ/(pDb) ≥ v/(pD), i.e., λ ≥ b.

Since B ≤ b, this constraint is absorbed by k ≥ BD, given in the lemma.

We now turn to the issue of whether the packets for each bucket are balanced across the local disks of each real processor. We know that each processor holds R = vλ/(pDb) packets for each bucket. For a fixed processor, we have a similar situation to the single simulating processor case.

By Lemma A, we know that a fixed drive contains fewer than l·R/D blocks of a fixed bucket with probability at most exp(−l·R/D). Let X denote the event that any disk on any processor contains more than l·R/D blocks of any bucket. There are p processors, D drives, and D buckets. With k ≥ BD we have, similarly to the proof of Lemma :

Pr[X] ≤ pD² exp(−l·R/D)
 = pD² exp(−l·vλ/(pbD²))
 ≤ exp(−l·vB/(pbD) + 2 log D + log p)
 ≤ exp(−l·vB/(pbD) + 2 log(pD)).

Now, provided that

vB/(pbD) ≥ log(pD), i.e., v ≥ pD(b/B) log(pD),

we have

Pr[X] ≤ exp(−l·vB/(pbD)),

and, for B/b a constant, we have

Pr[X] ≤ exp(−l·v/(pD)).

So, in Step 2, each processor can reorganize its packets as required of the input in Algorithm ParCompoundSS in O(l·λv/p) computation time and O(l(λv/(pDB))G) I/O time, with probability at least 1 − exp(−l·v/(pD)), provided that v ≥ pD log(pD), k ≥ BD, B ≤ b, and l ≥ 1.

Lemma: A compound superstep of a v-processor BSP with computation time τ + L, communication time g⌈λ/b⌉ + L, and local memory size μ can be simulated on a p-processor EM-BSP in computation time (v/p)τ + O(l·λv/p) + (v/p)L, communication time O(l·(λv/(pb))g) + (v/p)L, and I/O time O(l·(vμ/(pBD))G) + (v/p)L, with probability at least 1 − e^(−l·v/(pD)), for M ≥ kμ + BD, v ≥ pD log(pD), B ≤ b, BD ≤ k ≤ v/p, k ≥ b log p, and suitable constant l.

Proof: Each of the v/(pk) batches contains the packets generated by pk simulated processors. We ensure that each batch contains pkλ/b packets, by creating dummy packets if necessary. We first consider the runtime of Step 1 for a fixed round j.

Step (a): Each processor reads the blocks belonging to batch $j$ from its local disks in I/O-time $O(\frac{k\mu}{BD}G)$. The blocks destined for a common processor are combined into packets of size $b$, and all packets are then sent to their destination in a single superstep. Each simulating processor sends $O(\frac{k\mu}{b})$ message packets, since each holds $O(\frac{k\mu}{B})$ blocks of messages. Each simulating processor receives $\frac{k\mu}{b}$ packets. Thus an $O(\frac{k\mu}{b})$-relation is routed, consuming $O(\frac{k\mu}{b}g + L)$ communication time. The processors can reassemble messages from received packets in linear time, since each receives $k\mu$ data per round and sufficient real memory exists to hold all of the received messages. Each processor then reads the contexts of its $k$ currently simulated processors in I/O-time $O(\frac{k\mu}{BD}G)$ from its local disks.

Step (b): The steps of the currently simulated processors are performed, and the contexts are written back to the disks. Since $k\mu = \Omega(BD)$, the I/O-time is $l\,O(\frac{k\mu}{BD}G)$, and the computation time is $k\tau + O(lk\mu) + L$.

Step (c): During Step (b), message packets of total size $O(k\mu)$, each one of size $b$, were generated and stored in the local memories of the simulating processors.


Now each processor sends each of its $O(\frac{k\mu}{b})$ packets to a randomly chosen processor. This corresponds to randomly throwing $p\frac{k\mu}{b}$ balls into $p$ bins. With Lemma A, for $x = p\frac{k\mu}{b}$ and $y = p$, we find that each processor receives more than $l\frac{k\mu}{b}$ packets with probability at most $\exp(-l\frac{k\mu}{b} + \ln p)$.

Since these messages are exchanged in one superstep, an $O(l\frac{k\mu}{b})$-relation is routed, requiring $O(l\frac{k\mu}{b}g + L)$ communication time, with probability $1 - \exp(-l\frac{k\mu}{b} + \ln p)$. Thus each processor receives data of size $O(k\mu)$, which is written to its disks in I/O-time $l\,O(\frac{k\mu}{BD}G)$ for suitable $l$ and $b \leq B$, using randomly generated permutations of the disk numbers.

For a single round, the I/O-time is bounded by $l\,O(\frac{k\mu}{BD}G)$, the communication time is $O(l\frac{k\mu}{b}g + L)$, and the computation time is $k\tau + O(kl\mu) + L$, with failure probability at most $\exp(-l\frac{k\mu}{b} + \ln p)$. For
$$k\mu \geq b\log p$$
the failure probability becomes at most $\exp(-l\frac{k\mu}{b})$.

To bound the failure probability at $\exp(-l\frac{v}{pD})$, we require that
$$\frac{k\mu}{b} \geq \frac{v}{pD}, \qquad\text{i.e.,}\qquad k\mu \geq \frac{vb}{pD}.$$

Now we consider the runtime for all $\frac{v}{pk}$ rounds. The rounds are independent. We introduce $z = \frac{v}{pk}$ independent random variables $X_1, \ldots, X_z$, where $X_i$ represents the cost (communication, computation and I/O-time) of the $i$th round. We can apply Lemma A to bound the total communication cost of all $\frac{v}{pk}$ rounds, using the substitutions $x = p$ and $m = \frac{vb}{pD}$, provided
$$m \geq \log p, \qquad\text{i.e.,}\qquad v \geq pD\log p.$$

In total we have I/O-time $l\,O(\frac{\mu v}{pBD}G)$, communication time $l\,O(\frac{\mu v}{pb}g) + \frac{v}{pk}L$, and computation time $\frac{v}{p}\tau + O(l\frac{v}{p}\mu) + \frac{v}{pk}L$, with probability at least $1 - \exp(-l\frac{v}{pD})$.

Incorporating the costs and constraints for the packet-reorganization step from the previous lemma, its constraint is absorbed by $v \geq pD\log(pD)$. □

The next theorem states that a $c$-optimal BSP algorithm is transformed into a $c$-optimal EM algorithm by the simulation, for the general case of $p \geq 1$ simulating processors. (If each communication superstep were an $\frac{N}{v}$-relation, the constraint $k\mu \geq \frac{vb}{pD}$ would become $N \geq \frac{v^2 b}{pDk}$.)


Theorem: Let $l \geq 1$ be a constant. Let $v \geq pD\log(pD)$, $M \geq k\mu$, $\mu \geq BD$, $B \geq b$, $k \geq BD$, $k\mu \geq b\log p$, $k\mu \geq \frac{vb}{pD}$, $k \leq v$ and $k \leq \frac{v}{p}$.

A $v$-processor BSP algorithm $A$ with communication time $\frac{\mu}{b}g\lambda + \lambda L$, computation time $\tau + \lambda L$ and local memory $\mu$ can be simulated as a $p$-processor EM-BSP algorithm $A'$ with computation time $\frac{v}{p}\tau + O(l\frac{v}{p}\mu\lambda) + \frac{v}{p}\lambda L$, communication time $O(l\frac{\mu v}{pb}g\lambda) + \frac{v}{p}\lambda L$, and I/O time $O(l\frac{\mu v}{pBD}G\lambda) + \frac{v}{p}\lambda L$, with probability at least
$$1 - \exp\Big(-l\frac{v}{pD}\Big).$$

If $A$ is also a CGM algorithm, then the constraints $\mu \geq BD$, $k\mu \geq b\log p$ and $k\mu \geq \frac{vb}{pD}$ can be replaced by $N \geq vBD$, $N \geq vB\log p$ and $N \geq \frac{v^2B}{pD}$, respectively.

If $A$ is $c$-optimal on the BSP for $g = g(n)$, $b = b(n)$, $L = L(n)$ and $v = v(n)$, then $A'$ is a $c$-optimal EM-BSP algorithm for $g = g(n)$, $b = b(n)$, $L = L(n)$ and $G = o\big(\frac{pk}{v}BD\big)$.

Proof: Since we assume that the amount of communication each BSP processor performs per superstep is bounded by its memory size $\mu$, we can conclude that the communication time of each compound superstep is bounded by $\frac{\mu}{b}g + L$.

From the lemma above, each compound superstep of $A$ with computation time $\tau + L$, communication time $\frac{\mu}{b}g + L$ and local memory size $\mu$ can be simulated on a $p$-processor EM-BSP in computation time $O(l\frac{v}{p}\tau) + \frac{v}{p}L$, communication time $O(l\frac{\mu v}{pb}g) + \frac{v}{p}L$, and I/O-time $O(l\frac{\mu v}{pBD}G) + \frac{v}{p}L$, with probability at least $1 - e^{-l\frac{v}{pD}}$, for suitable constant $l$, $M \geq k\mu$, $v \geq pD\log(pD)$, $B \geq b$, $k\mu \geq b\log p$, $k \leq v$ and $k \leq \frac{v}{p}$.

The computation time required to simulate the computation steps of $A$ is $\frac{v}{p}\tau$. The computational overhead is $\lambda$ times the overhead of a single superstep, or $O(l\frac{v}{p}\mu\lambda)$, which is asymptotically smaller than $\frac{v}{p}\tau$ for $l\mu\lambda = o(\tau)$ and constant $l$.

The lemma also showed that the communication between real processors within each communication superstep is $l\,O(\frac{\mu v}{pb})$ packets, and so the communication for $A'$ is $l\,O(\frac{\mu v}{pb}\lambda)$ packets.

The I/O volume increases by a factor of $\lambda$ over that for a single compound superstep, to $l\,O(\frac{\mu v}{pBD}\lambda)$ I/Os.

Lemma A allows us to exploit the independence of the random experiments performed during each compound superstep in order to prove that the success probability for the entire simulation is as large as for the simulation of a single compound superstep. Since the worst case runtime of a compound superstep is at most $D$ times the average runtime, the theorem follows from Lemma A with $m = \frac{v}{pD}$, $x = D$ and $v \geq pD\log(pD)$. □

The slackness $\frac{v}{p}$ required by the simulation is controlled by the number of processors and disks we want to employ, as well as the desired success probability. The simulation increases the number of supersteps by a factor of $\frac{v}{pk}$. However, we inherit


the good communication time from the BSP algorithm, which permits a $c$-optimal multiple-processor EM-BSP algorithm.

Chapter

Deterministic Simulation

Introduction

A randomized simulation of BSP algorithms on an EM-BSP machine was described in the previous chapter. This simulation took a BSP algorithm and produced an EM-BSP algorithm, making use of the knowledge that messages of the BSP are no smaller than a known value $b$. In this chapter we describe a deterministic simulation technique that permits a CGM algorithm to be simulated on an EM-CGM machine.

We first consider the simulation of a single compound superstep, and in particular how the contexts and messages of the virtual processors can be stored on disk and retrieved efficiently in the next superstep. We assume that the size of the contexts of the processors is known and is the same for each virtual processor. We can therefore store the blocks of the contexts on fixed tracks of the disks. The main issue is how to organize the generated messages on the $D$ disks so that they can be accessed using blocked and fully parallel I/O operations. This task is simpler if the messages have a fixed length. Although a CGM algorithm has the property that $O(\frac{N}{v})$ data is deemed to be exchanged by each processor in every superstep, there is no guarantee on the size of individual messages.

The next section describes a technique which replaces each communication superstep of a BSP algorithm (not necessarily a CGM algorithm) by two supersteps in which the messages are of uniform size.

In the sections that follow, we describe how the simulation can be accomplished on target machines with one and multiple processors, respectively.

CHAPTER DETERMINISTIC SIMULATION

Balancing Communication

Algorithm BalancedRouting gives us a technique for acquiring messages of nearly uniform size. We will show that we can derive upper and lower bounds on the message size which allow us to save and retrieve messages on the disks in an efficient and convenient manner.

A total of $n$ data is divided evenly among $v$ processors. Each item is labelled with the identifier of a destination processor, in such a way that no more than $h$ items have the same destination, where $h \geq \frac{n}{v}$. Algorithm BalancedRouting delivers these items to their destination in two rounds of communication, where no message differs in size from any other by more than $v$ items (see the theorem below).

Algorithm BalancedRouting (from Bader et al.)

Input: Each of the $v$ processors has $\frac{n}{v}$ elements, which are divided into $v$ messages, each of arbitrary length. Let $msg_{ij}$ denote the message to be sent from processor $i$ to processor $j$, and let $|msg_{ij}|$ be the length of such a message.

Output: The $v$ messages in each processor are delivered to their final destinations in two balanced rounds of communication, and each processor then contains at most $h$ data.

A. For $i = 0$ to $v-1$ in parallel:
   1. Processor $i$ allocates $v$ local bins, one for each processor.
   2. For $j = 0$ to $v-1$:
        For $\ell = 0$ to $|msg_{ij}| - 1$:
          Processor $i$ allocates the $\ell$th word of $msg_{ij}$ to local bin $(i + j + \ell) \bmod v$.
   3. Processor $i$ sends bin $j$ to processor $j$.

(Superstep Boundary)

B. For $j = 0$ to $v-1$ in parallel:
   1. Processor $j$ reorganizes the messages it received in Superstep A into bins according to each element's final destination.
   2. Processor $j$ routes the contents of bin $k$ to processor $k$, for $k = 0, \ldots, v-1$.

End of Algorithm
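The two supersteps above can be sketched in code. The Python below is illustrative only: the bin index $(i + j + \ell) \bmod v$ follows the reconstruction above (the exact indices were lost in reproduction), and the function name is ours, not the thesis's.

```python
def balanced_routing(outgoing, v):
    """Two-round balanced routing.

    outgoing[i][j] is the list of items processor i must deliver to
    processor j.  Returns received[k] = the items delivered to processor k.
    The round-robin scatter keeps first-round message sizes within v of
    one another, regardless of the original message lengths.
    """
    # Superstep A: processor i scatters every message word-by-word over
    # its v local bins, tagging each word with its final destination,
    # then sends bin j to processor j.
    inbox = [[None] * v for _ in range(v)]   # inbox[j][i] = bin j built at proc i
    for i in range(v):
        bins = [[] for _ in range(v)]
        for j in range(v):
            for l, word in enumerate(outgoing[i][j]):
                bins[(i + j + l) % v].append((j, word))
        for j in range(v):
            inbox[j][i] = bins[j]
    # Superstep B: processor j regroups the received words by final
    # destination and routes bin k to processor k.
    received = [[] for _ in range(v)]
    for j in range(v):
        for part in inbox[j]:
            for dest, word in part:
                received[dest].append(word)
    return received
```

Whatever offset is used in the scatter, correctness only requires that every word keeps its destination tag; the offset affects only how evenly the intermediate bins fill.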

Observation: If $bin_{min}$ is the smallest bin created at a processor in the allocation step of Superstep A, then the other $v-1$ bins can contain at most $1 + 2 + \cdots + (v-1) = \frac{v(v-1)}{2}$ more elements than does $bin_{min}$ (see the figure below).


Theorem: We are given $v$ processors and $n$ data items. Each processor has exactly $\frac{n}{v}$ data to be redistributed among the processors, and no processor is to be the recipient of more than $h$ data. The redistribution can be accomplished in two communication rounds of balanced communication: (A) messages in the first round are at least $\frac{n}{v^2} - \frac{v-1}{2}$ and at most $\frac{n}{v^2} + \frac{v-1}{2}$ in size, and (B) messages in the second round are at least $\frac{h}{v} - \frac{v-1}{2}$ and at most $\frac{h}{v} + \frac{v-1}{2}$ in size.

Proof: The proof of the maximum message sizes is given by Bader et al. The proof of the minimum message sizes relies on the Observation above (see the figure below). In round A each processor initially has $\frac{n}{v}$ data. At the end of Superstep B each processor will have at most $h$ data.

First we consider the minimum message size in Superstep A. Due to the round-robin allocation mechanism, a given bin after the allocation step will contain at most one more element of a message to processor $j$ than does any other bin. Let us fix any processor $i$, and consider the bin sizes after all of the messages have been distributed among the bins by processor $i$ (see the figure below). Clearly, all of the bins will contain at least as many elements as the smallest bin, $bin_{min}$. Let $e_j$ be the number of extra elements (more than this minimum) in bin $j$. The crucial observation is that if $bin_{min}$ is the smallest bin, then the other $v-1$ bins can hold at most $1 + 2 + \cdots + (v-1) = \frac{v(v-1)}{2}$ extra elements. Thus
$$v\,|bin_{min}| + \sum_j e_j \geq \frac{n}{v}.$$
Since $\sum_j e_j \leq \frac{v(v-1)}{2}$, we have
$$v\,|bin_{min}| + \frac{v(v-1)}{2} \geq \frac{n}{v},$$
$$|bin_{min}| \geq \frac{n}{v^2} - \frac{v-1}{2}.$$

We now turn to the message sizes in Superstep B. The elements which arrive at processor $j$ as a result of Superstep A are the contents of the $j$th local bins formed at each of the processors $1$ through $v$. We can think of the $j$th local bin of each of the $v$ processors as a component of a single global superbin, which is the union of the $j$th local bins of all $v$ processors. Consider only the messages destined for a fixed processor $k$ which are held by each processor $i$, $1 \leq i \leq v$, prior to Superstep A. These are allocated among the superbins, starting with superbin $(i + k) \bmod v$, by the round-robin allocation. Superbin $j$ now contains the message which is to be sent from processor $j$ to processor $k$ in Superstep B.


Figure: Illustration of maximum imbalance in the bin sizes of a processor during Superstep A, showing bin numbers $0, 1, \ldots, v-1$ with evenly placed items and extra items. The light circles represent evenly placed elements, and the dark circles are unevenly placed, or extra, ones. The maximum bin size ($bin_{v-1}$ in the diagram) is at most $\frac{v-1}{2}$ more than the average, and the minimum bin size ($bin_0$ in the diagram) is at most $\frac{v-1}{2}$ less than the average.

In a similar manner to the analysis of Superstep A, let $E_j$ be the number of extra elements in superbin $j$ after the allocation step. Let $sbin_{min}$ be the superbin which contains the minimum number of elements after the allocation step; hence $|sbin_{min}|$ represents the minimum message size in Superstep B. When processor $k$ is one of the processors which receives the maximum $h$ data elements, we have $h \leq v\,|sbin_{min}| + \sum_j E_j$, and since $\sum_j E_j \leq \frac{v(v-1)}{2}$, we have
$$|sbin_{min}| \geq \frac{h}{v} - \frac{v-1}{2}. \ \square$$
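The lower bound on the smallest bin can be checked numerically. The Python sketch below is illustrative only: it distributes one processor's messages over $v$ bins round-robin (with an arbitrary choice of message lengths and start offsets, which the bound does not depend on) and verifies $|bin_{min}| \geq \frac{n/v}{v} - \frac{v-1}{2}$.

```python
def min_bin_size(msg_lens, v):
    """Distribute each message round-robin over v bins, as in Superstep A
    at a single processor, and return the size of the smallest bin."""
    bins = [0] * v
    for j, length in enumerate(msg_lens):
        q, r = divmod(length, v)
        for b in range(v):
            bins[b] += q               # every bin gets the even share
        for l in range(r):
            bins[(j + l) % v] += 1     # the r "extra" items, one per bin
    return min(bins)

# Check the bound for one processor holding n/v items split arbitrarily.
v = 8
msg_lens = [5, 0, 17, 3, 9, 1, 12, 4]  # arbitrary message lengths
n_over_v = sum(msg_lens)
assert min_bin_size(msg_lens, v) >= n_over_v / v - (v - 1) / 2
```

Each message contributes at most one extra item to any bin, so over $v$ messages no bin can fall more than $\frac{v-1}{2}$ below the average on balance, which is exactly what the derivation above formalizes.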

The notion of an $h$-relation is often used in the analysis of parallel algorithms based on BSP-like models (e.g. BSP, BSP*, CGM). An $h$-relation is a communication superstep in which each of the $v$ processors sends and receives at most $h$ data items. It is typically used in bounding the communication complexity in an asymptotic analysis. Based on this usage of an $h$-relation, we have:

Corollary: An arbitrary $h$-relation can be replaced by two balanced $h$-relations whose message size is bounded below by $\frac{h}{v} - \frac{v-1}{2}$ and above by $\frac{h}{v} + \frac{v-1}{2}$.

Proof: In the theorem above, clearly $h \geq \frac{n}{v}$. For $h = \frac{n}{v}$ we have a message size of at most $\frac{h}{v} + \frac{v-1}{2}$ for each of the two rounds of communication. We can reproduce the precise conditions of the theorem by adding dummy items, if necessary, in Superstep A, to ensure that $h = \frac{n}{v}$.

We can assume a minimum message size of $\frac{h}{v} - \frac{v-1}{2}$ in the second round because the cost of communication is bounded by the assumption of an $h$-relation. When every processor is the destination of $h$ data it does not affect the worst case complexity of the superstep. We can therefore assume that every processor receives $h$ data, by adding dummy items for the sake of the argument. Hence the minimum message size for any processor in Superstep B becomes $\frac{h}{v} - \frac{v-1}{2}$, without affecting the asymptotic communication cost of the superstep. □

Lemma: An arbitrary minimum message size $b_{min}$ can be assured, provided that
$$N \geq v^2\Big(b_{min} + \frac{v-1}{2}\Big),$$
where $N$ is the total number of problem items, summed over the $v$ processors.

Proof: From the corollary above, we can achieve a minimum message size $b_{min}$ provided that $b_{min} \leq \frac{N}{v^2} - \frac{v-1}{2}$. □

Assurances regarding the minimum message size are particularly relevant to the BSP* model. Later in this chapter we discuss the use of BalancedRouting and the theorem above for creating BSP* algorithms from BSP algorithms.

In the next two sections we describe the simulation of CGM algorithms with the additional properties that BalancedRouting provides. Not every CGM algorithm will require balancing, but the lemma above ensures that we can obtain balanced message sizes when necessary, by increasing the number of supersteps by a factor of two.

Lemma: Let $A$ be a CGM algorithm with $N$ data, $v$ processors and $\lambda$ communication steps, where $h = \frac{N}{v}$ in every step. The communication steps of $A$ can be replaced by $2\lambda$ steps of balanced communication in which the minimum message size is $B$ and the maximum message size is $\frac{N}{v^2} + \frac{v-1}{2}$, provided that $N \geq v^2\big(B + \frac{v-1}{2}\big)$.

Proof: The minimum and maximum message sizes follow from the corollary above, with $h = \frac{N}{v}$. This is true if $N \geq v^2\big(B + \frac{v-1}{2}\big)$, and the remaining constraint is absorbed by the hypothesis of the lemma above. □

In many practical EM situations, $\frac{N}{v^2} + \frac{v-1}{2}$ will be a significant overestimate of the maximum message size, as $v \ll \frac{N}{v}$.


Single Processor Target Machine

We now turn to the actual simulation results, which rely on a message size of at most $c\frac{N}{v'^2}$ for a known constant $c$. As we have seen, this is guaranteed, where $v' = \frac{v}{k}$, by algorithm BalancedRouting (see the lemma above). Not every algorithm will require balancing; see the algorithm CGM-Transpose, presented elsewhere in this thesis, for an example of a CGM algorithm whose messages are already balanced.

For the case of a single EM processor, we simulate a compound superstep of a BSP-like (e.g. CGM) algorithm $A$ using the algorithm SeqCompoundSuperstep shown below. No real communication is required. The next lemma gives the resulting complexities for a single compound superstep, and the theorem that follows summarizes the overall simulation result for a single real processor. For ease of exposition, we assume that $k$ divides $v$.

Lemma: A compound superstep of a $v$-processor CGM algorithm $A$ with computation time $\tau + L$, communication time $g\,O(\frac{N}{v}) + L$, message size $c\frac{N}{v^2}$ for a known constant $c$, and local memory size $\mu$ can be simulated as a single-processor EM-CGM algorithm $A'$ in computation time $v\tau + O(v\mu)$ and I/O time $O(\frac{v\mu}{BD} + \frac{N}{BD})G$, provided that $M = \Omega(k\mu + BD)$, $N \geq v'BD$ and $N \geq v'^2B$, for an arbitrary integer $k \leq v$ and $v' = \frac{v}{k}$.

k

N

in size we can allo cate xed sized slots for Pro of Since messages are at most c

v

them on the disks while preserving an O v disk space requirement A requires that

N

B so N v B The assurance of minimum message size B implies

v

that IO op erations will be blo cked Although a CGM algorithm may o ccasionally

N

use smaller messages than B it is charged for an relation in each sup erstep as if

v

every pro cessor sent and received h data

We simulate a comp ound sup erstep of A using algorithm SeqComp oundSup erstep

shown below The algorithm exp ects the input messages to the virtual pro cessors in

the current sup erstep to be organized by destination in a parallel format on the

disks and it also writes the messages generated in the current sup erstep to the disks

in a parallel format We use the phrase a paral lel format to mean an arrangementof

the data that p ermits fully parallel access to the disks b oth for writing the messages

and for reading them back in a dierent order in the next sup erstep Two instances

of a parallel format are the consecutive and staggered formats A more intuitive

denition app ears in Section

Consecutive format: We say that a disk read/write operation on $D$ blocks is consecutive when the $q$th block ($0 \leq q < D$) is read/written from/to disk $(d + q) \bmod D$ on track $T + \lfloor (d + q)/D \rfloor$, where $T$ is the track used for the first of the $D$ blocks to be read/written, and $d$ is the disk offset from disk $0$ for the first of the $D$ blocks to be read/written.

Staggered format: We say that a disk read/write operation on $D$ blocks, involving $n'$ messages each of size at most $b'$ blocks, is staggered when the $q$th block ($0 \leq q < b'$) of the $j$th message ($0 \leq j < n'$) is read/written from/to disk $(d + jS + q) \bmod D$ on track $T + \lfloor (d + jS + q)/D \rfloor$, where $T$ is the track used for the first blocks of the message, $d$ is the disk offset from disk $0$ for the first of the $D$ blocks to be read/written, and $\lceil S/D \rceil$ is the number of tracks by which consecutive messages are to be staggered (separated).
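Under the reconstruction of the two definitions above, block placement is simple index arithmetic. The Python sketch below is a plausible reading rather than a verbatim transcription: $d$ is the disk offset, $T$ the base track, and $S$ the block stagger between consecutive messages, as in the definitions.

```python
def consecutive_place(q, d, T, D):
    """(disk, track) of the q-th block of a consecutive read/write."""
    return (d + q) % D, T + (d + q) // D

def staggered_place(q, j, d, T, S, D):
    """(disk, track) of the q-th block of the j-th message in a staggered
    read/write; consecutive messages start S block positions apart."""
    off = d + j * S + q
    return off % D, T + off // D

# With D = 4 disks, any D consecutively placed blocks land on D distinct
# disks, so they can be moved in a single parallel I/O operation.
D = 4
disks = [consecutive_place(q, 1, 0, D)[0] for q in range(D)]
assert sorted(disks) == list(range(D))
```

The same arithmetic shows why staggering matters: with a stagger $S$ that is not a multiple of $D$, the first blocks of consecutive messages fall on distinct disks, so one block of each of $D$ different messages can be written in a single parallel I/O.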

Algorithm SeqCompoundSuperstep simulates a compound superstep of a $v$-processor CGM on a single-processor EM-CGM with $D$ disks. It simulates $k$ processors at a time, where $k \leq v$, provided that the simulating processor has $M = \Omega(k\mu)$ memory.

Algorithm SeqCompoundSuperstep

Input: For each $j \in \{0, \ldots, v-1\}$, the blocks of the context of virtual processor $j$ are stored on the disks in consecutive format, and the arriving messages of virtual processor $j$ are spread over the $D$ disks in consecutive format.

Output: (i) The changed contexts of the $v$ simulated processors are spread across the disks in consecutive format. (ii) The generated messages, for each processor in the next superstep, are stored in consecutive format on the disks.

For $j = 0$ to $\frac{v}{k} - 1$:
  (a) Read the contexts of virtual processors $jk$ to $(j+1)k - 1$ from the disks into memory.
  (b) Read the packets received by virtual processors $jk$ to $(j+1)k - 1$ from the disks.
  (c) Simulate the local computation of virtual processors $jk$ to $(j+1)k - 1$.
  (d) Write the packets which were sent by virtual processors $jk$ to $(j+1)k - 1$ to the $D$ disks in the staggered format illustrated in the figure below. (See below for details.)
  (e) Write the changed contexts of virtual processors $jk$ to $(j+1)k - 1$ back to the $D$ disks in consecutive format.

End of Algorithm

Details of Steps (a) and (e): We reserve an area of total size $v\mu$ on the disks ($\frac{v\mu}{BD}$ blocks on each disk) where we store the contexts. We split the context $V_j$ of virtual processor $j$ into $\frac{\mu}{B}$ blocks of size $B$, and store the $i$th block of $V_j$ on disk $(i + j\frac{\mu}{B}) \bmod D$, using track $\lfloor (i + j\frac{\mu}{B})/D \rfloor$. Since the context of each processor is now in striped format on the disks, we can read and write the contexts of $k$ consecutive virtual processors using $D$ disks in parallel for every I/O operation.


Details of Step (b): The previous compound superstep guaranteed that the blocks which contain the messages destined for the current processor are stored in consecutive format on the disks. Therefore we can use a similar technique to fetch the messages as we used to fetch the contexts.

Details of Step (d): After the Computation Phase, all messages sent by the current $k$ virtual processors have been generated and stored in internal memory. The coarse-grained nature of the underlying BSP algorithm results in large messages (see the lemma above), which are as long as or longer than the block size $B$. We cut the messages into blocks of size $B$; each block inherits the destination address from its original message. In $\frac{k\mu}{BD}$ rounds we write the blocks out to the disks, as described in detail below. Recall that $\mu$ is the maximum size of the data sent or received by any virtual processor over all supersteps.

Let $b_{max}$ represent the maximum message size, and let $b'$ represent the maximum number of disk blocks per message; hence $b' = \lceil \frac{b_{max}}{B} \rceil$. Let $msg_{ij}$ represent the message sent from processor $v_i$ to processor $v_j$ in one communication superstep. We will store the messages destined for a fixed processor $j$ in consecutive format, beginning with $msg_{0j}$ and ending with $msg_{(v-1)j}$. We ensure that the first block of $msg_{0j}$ is assigned to disk $(b_{j-1} + b') \bmod D$ for $0 < j < v$, where $b_{j-1}$ is the disk number of the first block for $msg_{0,j-1}$. In other words, the starting block positions for messages to consecutive processors are appropriately staggered on the disks, to ensure that we can write blocks of messages to consecutively numbered processors in a single parallel I/O when $b' \bmod D \neq 0$.

Let $T_j = j\lceil \frac{vb'}{D} \rceil$ be the track offset for $msg_{0j}$, the first such message destined for processor $v_j$. Let $d_j = jb' \bmod D$ be the disk offset from disk $0$ for the first block of $msg_{0j}$. The $q$th block of $msg_{ij}$ is assigned to disk $(d_j + ib' + q) \bmod D$ on track $T_j + \lfloor (d_j + ib' + q)/D \rfloor$. This storage scheme maintains what we will call the messaging matrix across the parallel disks. The messages destined to a particular virtual processor are stored in a band, or stripe, of consecutive parallel tracks. See the figure below.

Outgoing message blocks are placed in a FIFO queue for servicing by procedure DiskWrite. DiskWrite removes at most $D$ blocks from the queue in each write cycle and writes them to the disks in a single write operation. Blocks are serviced strictly in FIFO order. Blocks will be added to the current write cycle and removed from the queue until a block is encountered whose disk number conflicts with that of an earlier block in the current write cycle.

Since the messages destined for any given processor are stored in consecutive format on the disks, we can read the messages received by a virtual processor using $D$ disks in parallel for every I/O operation. Except possibly for the last, each parallel read performed by the simulation of processor $v_j$ will obtain $D$ message blocks. By staggering the first message blocks for consecutive virtual processors across the disks, we can achieve fully parallel writes to the disks.
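Procedure DiskWrite itself is not listed in the text; the following Python sketch (with hypothetical names of our own) implements the stated policy: blocks are served strictly in FIFO order, and a write cycle closes after $D$ blocks or at the first disk-number conflict.

```python
from collections import deque

def disk_write_cycles(queue, D):
    """Group queued (disk, block) pairs into parallel write cycles.

    Blocks are taken strictly in FIFO order; a cycle ends when it already
    holds D blocks, or when the next block's disk number conflicts with a
    disk already used in the cycle.  Returns the list of cycles.
    """
    q = deque(queue)
    cycles = []
    while q:
        cycle, used = [], set()
        while q and len(cycle) < D and q[0][0] not in used:
            disk, block = q.popleft()
            used.add(disk)
            cycle.append((disk, block))
        cycles.append(cycle)
    return cycles
```

A well-staggered queue needs only about $\lceil \text{length}/D \rceil$ cycles, while a queue of blocks all aimed at one disk degenerates to one block per cycle — which is exactly why the messaging matrix staggers starting positions.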


Figure: Illustration of the messaging matrix over disks $0$ through $4$, showing the messages from processor $i$ to processors $j$, $j+1$ and $j+2$. In the example we have $D = 5$ disks and a fixed message size of $b'$ blocks. Messages from processor $i$ to processors $j$, $j+1$ and $j+2$ are shown as shaded rectangles. Messages to consecutively numbered processors are staggered on the disks to permit $D$ blocks to be written in parallel.

The scheme just described requires two copies of the messaging matrix, because the messages generated by virtual processor $i$ in one compound superstep must be stored on disk before virtual processor $i$ has finished processing the messages generated for it in the previous compound superstep.

We can avoid this extra space requirement, however, as follows.

Observation: By alternating between {consecutive reads, staggered writes} and {staggered reads, consecutive writes} in successive compound supersteps, the simulation can achieve fully parallel I/O on all message blocks with a single copy of the messaging matrix.

SeqCompoundSuperstep loads $k$ virtual processors into the real memory at once, requiring that $M = \Omega(k\mu)$. Since the messages sent or received in a superstep by a virtual processor are $h = O(\frac{N}{v})$ in total size, we require that $\frac{kN}{vB} \geq D$ to ensure that our I/O scheme for messages is efficient. This means that $N \geq v'BD$, where $v' = \frac{v}{k}$.


Computation Time: Steps (a) and (e) of algorithm SeqCompoundSuperstep take $O(v\mu)$ computation time overall. In Steps (b) and (d), $O(\frac{N}{v})$ message data is routed for each virtual processor; over all $v$ processors this adds $O(N)$ computation time overall, which can be ignored. Step (c) consumes $v\tau$ computation time.

I/O Time: Steps (a) and (e) consume $O(\frac{v\mu}{BD})G$ time, and Steps (b) and (d) consume $O(\frac{N}{BD})G$ time overall, since $O(N)$ message data is sent in each superstep; as $N \leq v\mu$, we have time $O(\frac{v\mu}{BD})G$ due to I/O overall.

Thus, overall, the computation time is $v\tau + O(v\mu)$ and the I/O-time is $O(\frac{v\mu}{BD})G$. □

Theorem: A $v$-processor CGM algorithm $A$ with $\lambda$ supersteps, local memory size $\mu$, message size $\Theta(\frac{N}{v^2})$ and running time $\tau + g\,O(\frac{N}{v})\lambda + \lambda L$ can be simulated as a single-processor EM-CGM algorithm $A'$ with time $v\tau + O\big(v\mu + \frac{v\mu}{BD}G\big)\lambda$, for $M = \Omega(k\mu + BD)$, $N \geq v'^2B$ and $v' \geq D$, for an arbitrary integer $k \leq v$ and $v' = \frac{v}{k}$.

In particular, algorithm $A'$ is $c$-optimal if $A$ is $c$-optimal and $G = o(BD)$. Furthermore, algorithm $A'$ is work-optimal and I/O-efficient if $A$ is work-optimal and communication-efficient and $G = O(BD)$.

Proof: We use the results of the lemma above. The computation time required to simulate the computation steps of $A$ is $v\tau$. The computational overhead associated with the I/O steps (Steps (a), (b), (d), (e)) is $O(v\mu) + O(N)$. Since $v\mu \geq N$, the total computation time is bounded by $v\tau + O(v\mu)$. When $c$-optimality is required we therefore need $v\mu = o(v\tau)$, or $\mu = o(\tau)$. Note that when $\mu = \Theta(\frac{N}{v})$ we can substitute $\frac{N}{v}$ for $\mu$. For work-optimality we require that $v\tau + O(v\mu) = O(v\tau)$, or $\mu = O(\tau)$.

The number of I/O operations (Steps (a), (b), (d), (e)) is $O(\frac{v\mu}{BD}) + O(\frac{N}{BD})$, which is bounded by $O(\frac{v\mu}{BD})$. For $c$-optimality we require the I/O time to be in $o(v\tau)$, which means that $G = o(BD)$. For I/O-efficiency we require the I/O time to be in $O(v\tau)$, which means that $G = O(BD)$.

The messages sent by a group of $k$ virtual processors to any other group must be large enough in total that they fill a disk block, so $N \geq v'^2B$. Also, the messages sent or received by a group in a single superstep must fill a track of the $D$ parallel disks, meaning that $N \geq v'BD$, where $v' = \frac{v}{k}$. These constraints can be combined into $N \geq v'^2B$ for $v' \geq D$. □


Multiple Processor Target Machine

For the case of $p > 1$ processors on the EM-CGM machine, we simulate a compound superstep of a CGM algorithm $A$ using the algorithm ParCompoundSuperstep shown below. Unlike in the case of a single real processor, we are now forced to perform real communication between the real processors of the target machine.

Each real processor $i$, $0 \leq i \leq p-1$, executes algorithm ParCompoundSuperstep in parallel. For ease of exposition, we assume that $pk$ divides $v$.

Algorithm ParCompoundSuperstep

Objective: Simulation of a compound superstep of a $v$-processor CGM on a $p$-processor EM-CGM.

Input: The message and context blocks of the virtual processors are divided among the real processors and their local disks. Each real processor $i$, $0 \leq i \leq p-1$, holds $O(\frac{N}{pB})$ blocks of messages and $\frac{v\mu}{pB}$ blocks of context, and each local disk contains $O(\frac{N}{pBD})$ blocks of messages and $O(\frac{v\mu}{pBD})$ blocks of context.

Output: The changed contexts and generated messages, distributed as required for the next compound superstep.

For $j = 0$ to $\frac{v}{pk} - 1$ do:
  (a) Read the contexts of the $j$th group of $k$ virtual processors resident on real processor $i$ from the local disks.
  (b) Read any message blocks addressed to these $k$ virtual processors from the local disks.
  (c) Simulate the computation supersteps of these $k$ virtual processors, collecting all generated messages in the local internal memory.
  (d) Send all generated messages to the required real destination processors. Upon arrival, the messages are arranged within the internal memory of the real destination processor and then written to its disks, as in the single processor simulation (see Algorithm SeqCompoundSuperstep).
  (e) Write the contexts of these $k$ virtual processors back to the local disks (see Algorithm SeqCompoundSuperstep).

End of Algorithm

Lemma: A compound superstep of a $v$-processor CGM algorithm $A$ with computation time $\tau + L$, message size $\Theta(\frac{N}{v^2})$, communication time $O(\frac{N}{v})g + L$ and local memory size $\mu$ can be simulated as $\frac{v}{pk}$ compound supersteps of a $p$-processor EM-CGM algorithm $A'$ in parallel computation time $\frac{v}{p}\tau + O(\frac{v}{p}\mu) + \frac{v}{pk}L$, communication time $g\,O(\frac{N}{p}) + \frac{v}{pk}L$, and I/O time $G\,O(\frac{v\mu}{pBD}) + \frac{v}{pk}L$, for $pk \leq v$, arbitrary integer $k \leq \frac{v}{p}$, $N \geq v'^2B$, $v' \geq D$ and $v' = \frac{v}{k}$, provided $M = \Omega(k\mu + BD)$ memory is available on each real processor.

Proof: There are $\frac{v}{p}$ virtual processors resident on each real processor. Each real processor simulates its $\frac{v}{p}$ virtual processors of a round of $A$ in a total of $\frac{v}{pk}$ supersteps, as $k$ virtual processors are simulated on each real processor in each round of $A'$.

Communication time: In Step (d), in each round of $A'$, a real processor receives: $p$ real messages, one from each real processor; $pk\,\frac{v}{p}$ virtual messages, one from each of the $pk$ virtual processors simulated in this round to each of its own $\frac{v}{p}$ resident virtual processors; and $pk\,\frac{v}{p}\,\frac{N}{v^2} = \frac{kN}{v}$ data items, since each virtual message is $\frac{N}{v^2}$ items in size.

It also sends $\frac{kN}{v}$ data in each round of $A'$. We have $\frac{v}{pk}$ such rounds, so we have $\frac{kN}{v}\cdot\frac{v}{pk} = \frac{N}{p}$ data to communicate overall. Therefore the overall communication time of $A'$ is $g\,O(\frac{N}{p}) + \frac{v}{pk}L$.

I/O time: The I/O time is determined by the cost of swapping contexts plus the cost of simulating the original messaging via I/O. Each group of $k$ processors has context of size $O(k\mu)$, which requires $O(\frac{k\mu}{BD})$ I/O operations to swap in or out of memory. Over $\frac{v}{pk}$ supersteps, the swapping of contexts therefore costs $G\,O(\frac{v\mu}{pBD})$. The I/O costs due to messaging between virtual processors are bounded by $O(\frac{kN}{vBD})$ in each superstep, and so the total cost due to virtual messaging over $\frac{v}{pk}$ supersteps is $G\,O(\frac{N}{pBD})$. Since the context size $\mu \geq \frac{N}{v}$, the total I/O cost overall is dominated by $G\,O(\frac{v\mu}{pBD})$.

Computation time: We have $p \leq v$ real processors, so the time to simulate Step (c) is $\frac{v}{p}\tau$. Computational overhead is contributed by $O(\frac{v}{pk}(k\mu + \frac{kN}{v}))$ due to swapping of contexts (Steps (a), (e)) and messaging I/O (Steps (b), (d)). As before, the computational overhead is dominated by the cost of swapping contexts. □

Theorem: A $v$-processor CGM algorithm $A$ with $\lambda$ supersteps, computation time $\tau + \lambda L$, communication time $g\,O(\frac{N}{v})\lambda + \lambda L$, local memory size $\mu$ and message size $b = \Theta(\frac{N}{v^2})$ can be simulated as a $p$-processor EM-CGM algorithm $A'$ with computation time $\frac{v}{p}\tau + O(\frac{v}{p}\mu\lambda) + \frac{v}{pk}\lambda L$, communication time $g\,O(\frac{N}{p}\lambda) + \frac{v}{pk}\lambda L$, and I/O time $G\,O(\frac{v\mu}{pBD}\lambda) + \frac{v}{pk}\lambda L$, for $M = \Omega(k\mu + BD)$, $p \leq v$, $N \geq v'^2B$ and $v' \geq D$, for arbitrary integer $k \leq \frac{v}{p}$ and $v' = \frac{v}{k}$.

Let $g(N)$, $L(N)$ and $v(N)$ be increasing functions of $N$. If $A$ is $c$-optimal on the CGM for $g \leq g(N)$, $L \leq L(N)$ and $v \leq v(N)$, then $A'$ is a $c$-optimal EM-CGM algorithm for $g \leq g(N)$, $G = o(BD)$ and $L \leq \frac{pk}{v}L(N)$. $A'$ is work-optimal, communication-efficient and I/O-efficient if $A$ is work-optimal and communication-efficient, $g \leq g(N)$, $G = O(BD)$ and $L \leq \frac{pk}{v}L(N)$.

Proof: We use the results of the lemma above. The computation time required to simulate the computation steps of $A$ is $\frac{v}{pk}\cdot k\tau = \frac{v}{p}\tau$. The computational overhead associated with the I/O and communication steps (Steps (a), (b), (d), (e)) is $O(\frac{v}{p}\mu + \frac{v}{pk}\cdot\frac{kN}{v})$, from the lemma. Since $\mu \geq \frac{N}{v}$, the total computation time is bounded by $\frac{v}{p}\tau + O(\frac{v}{p}\mu)$. When $c$-optimality is required, we need $\mu = o(\tau)$. Note that in many cases $\mu = \Theta(\frac{N}{v})$. Also, when only work-optimality is required, $\mu = O(\tau)$ suffices.

From the lemma, the communication time of the simulation per superstep of $A$ is $g\,O(\frac{N}{p}) + \frac{v}{pk}L$, giving $g\,O(\frac{N}{p}\lambda) + \frac{v}{pk}\lambda L$ time overall.

The I/O time (Steps (a), (b), (d), (e)) is $G\,O(\frac{v\mu}{pBD} + \frac{N}{pBD})$, which is bounded by $G\,O(\frac{v\mu}{pBD})$. For $c$-optimality we require the I/O time to be in $o(\frac{v}{p}\tau)$, which means that $G = o(BD)$. For I/O-efficiency we need only that $G = O(BD)$. Since the number of supersteps increases by a factor of $\frac{v}{pk}$, we require that $L \leq \frac{pk}{v}L(N)$. □

The next theorem shows that the preceding theorem holds even if $A$ has varying message sizes and $N \geq v'^2B$, provided that, in addition to the constraints of that theorem, $B \geq v' \geq D$.

Theorem: A v-processor CGM algorithm A* with λ supersteps, computation time λ·(N/v) + λ·L, communication time g·λ·(N/v) + λ·L, and local memory size O(N/v) can be simulated as a p-processor EM-CGM algorithm A with computation time O(λ·(v/p)·(N/v) + (v/p)·λ·L), communication time g·O(λ·(v/p)·(N/v)) + (v/p)·λ·L, and I/O time G·O(λ·(v/p)·⌈N/(vBD)⌉ + (v/p)·λ·L), for M ≥ k·BD, p ≤ v, N ≥ v·B, v ≥ D, and B ≤ N/v², for arbitrary integer k ≥ 1 with v ≥ p·k.

Let g(N), L(N), and v(N) be increasing functions of N. If A* is c-optimal on the CGM for g = g(N), L = L(N), and v = v(N), then A is a c-optimal EM-CGM algorithm for g = g(N), G ∈ o(BD·p/v), and L = L(N). A is work-optimal, communication-efficient, and I/O-efficient if A* is work-optimal and communication-efficient, g = g(N), G ∈ O(BD·p/v), and L = L(N).

Proof: We first apply algorithm BalancedRouting to each communication superstep of A*, which ensures message size b ≥ N/v², provided that N ≥ v²·b. If b ≥ B and B ≤ N/v², this condition is preserved by N ≥ v·B and v ≥ D. We can then apply Theorem.
□

Summary

The result of Corollary can be applied to any algorithm which communicates exclusively via h-relations. The concept of an h-relation is relevant primarily with respect to whether it is an assumption in the analysis of the algorithm in question. Typically, a good algorithm has been shown to be asymptotically optimal when the communication volume to and from each processor is bounded by h in each superstep. Using Lemma, we can additionally ensure any desired minimum communication block size of b, at the cost of at most doubling the number of communication rounds, for problems with sufficient slackness. Here we use the term communication to mean either I/O or conventional message passing.

The main results of this chapter are as follows:

1. Using Theorem, algorithms which have both BSP and CGM properties can be converted to EM-BSP/EM-CGM algorithms.

2. Using Corollary, CGM algorithms can be converted to BSP* algorithms with b = N/v².

3. Using Corollary, conforming BSP algorithms can be converted to BSP* algorithms with b = h_min/v, where h_min is the minimum value of h used in any communication superstep.

4. Using Corollary and Theorem, CGM algorithms can be converted to EM-BSP/EM-CGM algorithms with b = N/v².

The term conforming in item 3 above refers to the need for the bounding concept of an h-relation to be a universal assumption in the analysis of the original BSP algorithm, for each of its communication rounds. It is convenient, but not necessary, that the same value of h be used in every round.

Chapter

Deterministic Simulation with

Lower Slackness

Motivation

Algorithm BalancedRouting, described in Section, is a method for delivering the messages generated by the v processors of a parallel algorithm. It does this in twice the original number of communication rounds, while maintaining balanced message sizes, i.e., b = N/v², where N is the problem size.

In the external memory context, this technique permits the messages to be written to and read from the local disks of each real processor in parallel with minimal overhead, and also allows us to keep the communication balanced between real processors. It has the disadvantage that it requires large slackness, i.e., N ≥ v²·b. When we substitute v = N/M into N ≥ v²·b, we obtain the constraint N ≥ (N/M)²·b, so BalancedRouting requires that

M ≥ √(Nb).

This is practical at modest cost on contemporary computers, even for very large problem sizes (at least into the tera-item range), but a smaller value of M would be desirable from a theoretical point of view. In this chapter we will present a more flexible message balancing and distribution algorithm, which we refer to as LowSlackRouting. It works for smaller slackness than BalancedRouting, but involves an increase in the number of rounds. For constant ε > 0, LowSlackRouting requires

N ≥ v^(1+ε)·b.

CHAPTER DETERMINISTIC SIMULATION WITH LOWER SLACKNESS

[Figure: The division of the |S| processors into v^ε groups, Group 1 through Group v^ε, each of |S|/v^ε processors; each processor holds v^ε buckets.]

Figure: The division of the |S| processors into v^ε groups. S is a set of processors which are participating in the current routing subproblem. Initially, S is the set of all v virtual processors. The shaded areas illustrate the components of a fixed bucket on the various processors.

N

Substituting v into we obtain the constraint

M

M N b

When is the same as but it can be made more attractive by

cho osing a smaller value for For example with b and a huge p erhaps

even imp ossible problem size of N BalancedRouting requires internal memory

size M which is currently dicult to achieve However with

LowSlackRouting requires only M for the same conditions and

gives M
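The substitution above can be checked directly; a sketch of the arithmetic, using only the slackness constraint stated above:

```latex
N \ge v^{1+\epsilon} b, \qquad v = \frac{N}{M}
\;\Longrightarrow\;
N \ge \left(\frac{N}{M}\right)^{1+\epsilon} b
\;\Longrightarrow\;
M^{1+\epsilon} \ge N^{\epsilon}\, b
\;\Longrightarrow\;
M \ge N^{\epsilon/(1+\epsilon)}\, b^{1/(1+\epsilon)} .
```

Setting ε = 1 recovers M ≥ √(Nb), the BalancedRouting requirement.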

Overview

We first present a parallel algorithm, LowSlackRouting, which implements an h-relation for h = N/v, while ensuring that the minimum message size is b items. We then adapt the algorithm to an external memory context with p real processors, p ≤ v, where each real processor has D local disks and internal memory O(kN/v).


We show that the algorithms are valid provided that N ≥ v^(1+ε)·b, where ε > 0 is a constant. For ease of exposition, we will assume that the quantities v^ε and v^(1-rε) are positive integers for r = 1, 2, …. Such assumptions do not materially affect the analysis.

Overview of the Parallel Algorithm LowSlackRouting: Algorithm LowSlackRouting, presented in Section, delivers messages generated in a preceding computational superstep to their destinations. Provided that N/v ≥ v^ε·b, it maintains a specified minimum message size b, even though the amount of data held by each processor is too small for it to send such a message to each of the v processors in a given round. Instead, LowSlackRouting constructs v^ε groups of v^(1-rε) processors in the r-th round, for r = 1, …, ⌈1/ε⌉, and messages for processors within a group are combined. LowSlackRouting delivers messages to their eventual destination by routing them indirectly, i.e., through a series of intermediate destinations in a corresponding series of communication rounds. The messages originating from a given processor are divided into v^ε local buckets, depending on their intended destination. The local buckets of the v^(1-rε) processors of a group in the r-th round are treated collectively as v^ε global buckets. Each global bucket corresponds to one of v^ε groups of intermediate destinations (see Figure). By computing a total order on the blocks in each global bucket, we can distribute the blocks evenly among the virtual processors of that group. We will refer to v^ε as the fanout of each round of routing, as a group of v^(1-rε) processors in round r divides its data into v^ε groups of v^(1-(r+1)ε) processors in round r + 1. Message data intended for a particular processor of a group is differentiated and delivered to its correct destination in a later round. For the last round, BalancedRouting is used, as by then its slackness constraint is satisfied.

Lemma deals with the difficult case where a small number of large messages and a large number of small messages are generated in a round of a client algorithm. Lemma allows us to treat the small messages as if they were of size b, without asymptotically affecting the CPU, communication, or I/O costs of algorithm LowSlackRouting.

Since LowSlackRouting performs an (N/v)-relation in each superstep, with a minimum communication block size of b, it can be considered to be both a BSP* and a CGM algorithm. For our purposes, however, the distinction is not material. We are interested primarily in its use as an external memory algorithm.

Overview of External Memory Algorithm EMLowSlackRouting: We now adapt algorithm LowSlackRouting for use in an external memory context, i.e., on an EM-BSP machine. EMLowSlackRouting is essentially a simulation of LowSlackRouting. Few changes are needed, as LowSlackRouting already has coarse, well-behaved communication between its subproblems (virtual processors).


On a single-processor, single-disk machine, a simple approach exists. We know that each group of x processors in each round will eventually receive x·N/v data, and so prior to each round we allocate x·N/v space on the disk for each group. Messages are then concatenated to the end of the list of message data for their respective destination group. As each virtual processor is simulated, it can read as many blocks of message data for its group as is necessary to ensure that it processes N/v data.

For an EM machine with p > 1 processors, however, it is more difficult to ensure that communication is balanced across physical processors and that message data is written to the disks D blocks at a time, where a disk block is B items in size. We will show that, by allocating the v virtual processors of LowSlackRouting to the p real processors of EMLowSlackRouting in round-robin order, we can achieve balanced communication between real processors on the target machine, provided that p = O(v^(1-ε)), where p is the number of real processors. By requiring that the communication block size of LowSlackRouting be sufficiently large, i.e., b ≥ BD (e.g., b = BD), we can fully utilize the D parallel disks on each real processor.

In Section we present the parallel algorithm LowSlackRouting. In Section we present the external memory routing algorithm EMLowSlackRouting, which is derived from LowSlackRouting.

The Parallel Algorithm LowSlackRouting

We are given N problem items and v processors. The routing is performed in ⌈1/ε⌉ rounds of LowSlackRouting. The constant ε > 0 determines the fanout v^ε of each routing stage, and b is the number of problem items in a message block. S indicates the set of processors participating in the routing operation. S is a two-element vector consisting of the first and last labels of the processors participating in the routing operation. Initially, S = (0, v − 1). We use |S| to represent the number of processors in the range indicated by S. Initially, |S| = v, and round r = 1.

Algorithm LowSlackRouting(S)

Input: Each of the |S| processors has N/v ≥ b·v^ε elements, partitioned into |S| messages. Each message may be of arbitrary length (at most N/v).

Output: The O(|S|) messages in each processor are delivered to their final destinations. Each processor receives O(N/v) elements.

1. If |S| ≤ v^ε, the processors in S route their messages to their destination by means of algorithm BalancedRouting.


2. The processors in S organize themselves into v^ε groups, each of size |S|/v^ε (see Figure).

3. Each processor divides its messages into v^ε buckets. The j-th bucket contains those messages destined to the processors in group j. Each message is subdivided into communication blocks of size b.

4. Each processor in S determines a rank for each of its blocks, as follows:

(a) Assign unit weight to each locally-held block.

(b) Create a vector T of the local partial sums of the weights in each bucket, as follows: T[ℓ][j] = the number of blocks in bucket j with size ℓ, for ℓ = 1, …, b and j = 0, …, v^ε − 1.

(c) Call NumberBlocks(S, T) (below). This determines the rank of the first full block and any partial block in each bucket.

(d) Label all of the locally-held blocks with their rank. The full blocks in each bucket receive consecutive ranks.

5. The processors address each block of local bucket j to the (i mod v^(1-rε))-th processor of group j, where i is the rank of the block. Blocks to a common processor are then concatenated to form a single message and sent. Figure illustrates the result of this redistribution process.

6. Superstep boundary; r ← r + 1.

7. Each processor receives at most one message from each of the v^(1-rε) processors which were part of its group in the previous round (Step 5 above).

8. Let S_j be the range of processors in group j, j = 0, …, v^ε − 1. Via v^ε parallel recursive calls, each processor of S_j routes the blocks it received in Step 7 to their destination processor by executing LowSlackRouting(S_j), for all j = 0, …, v^ε − 1.

End of Algorithm
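The bucketing, block formation, ranking, and round-robin addressing of the algorithm above can be illustrated with a small sequential sketch. This is hypothetical Python, not code from the thesis; `v`, `eps`, and `b` are assumed to be chosen so that `v**eps` is integral, and only the first round is modelled:

```python
def split_into_blocks(message, b):
    """Subdivide one message into communication blocks of at most b items."""
    return [message[i:i + b] for i in range(0, len(message), b)]

def first_round_assignment(messages, v, eps, b):
    """Sequential sketch of the bucketing and round-robin addressing steps
    (first round, so each group has v**(1-eps) processors).

    messages: dict (src, dst) -> list of items, with dst in range(v).
    Returns dict (group, member) -> list of blocks for that intermediate
    destination processor.
    """
    fanout = round(v ** eps)      # v^eps buckets / destination groups
    group_size = v // fanout      # v^(1-eps) processors per group
    buckets = {j: [] for j in range(fanout)}
    for (_, dst), msg in messages.items():
        buckets[dst // group_size].extend(split_into_blocks(msg, b))
    assignment = {}
    for j, blocks in buckets.items():
        # Rank full blocks (size b) before partial ones, as NumberBlocks
        # does, then deal the blocks round-robin over the group's members.
        blocks.sort(key=lambda blk: -len(blk))
        for rank, blk in enumerate(blocks):
            assignment.setdefault((j, rank % group_size), []).append(blk)
    return assignment
```

Dealing the ranked blocks round-robin is what keeps the per-processor load within one block of even, which is the property the lemmas below formalize.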

The Parallel Algorithm NumberBlocks

Algorithm NumberBlocks assigns a rank to every block by applying a parallel prefix computation over unit weights. Blocks are numbered independently within each of the v^ε global buckets. The ordering is such that blocks of size b are given ranks lower than any block of size ℓ, for ℓ = 1, …, b − 1. This fact is used by Lemma to show that each processor receives nearly the same amount of data as any other. Figures and illustrate the operation of algorithm NumberBlocks.

Algorithm NumberBlocks(S, T)
(possibly via a fork for each bucket)


[Figure: (a) Before redistribution — the blocks of bucket j are scattered over processors 1, …, |S|. (b) After redistribution — the blocks of bucket j are spread evenly over the processors of group j.]

Figure: The redistribution of blocks for a given bucket. The redistribution step of LowSlackRouting redistributes the blocks in a bucket j evenly among the processors of group j.

Input: Each of the |S| processors has v^ε local buckets, which contain blocks of at most b items. T is a b × v^ε array containing partial sums for the weights of each block size within each bucket.

Output: The blocks in each global bucket are each given a rank which is unique within that bucket.

1. The processors in S organize themselves into |S|/v^ε ordered groups of size v^ε. Let i be the index of the current processor, and let ⌊i/v^ε⌋ be the group for the current processor.

2. Processor i sends the b sums for its j-th local bucket to processor ⌊i/v^ε⌋·v^ε + j. Superstep boundary.

3. Each processor receives b partial sums from each of v^ε children. Each processor is now associated with the computation of ranks for a single bucket; i.e., processor i is associated with the computation of ranks for bucket s(i) = i mod v^ε. Processors 0, v^ε, 2v^ε, … now contain partial sums for bucket 0; processors 1, v^ε + 1, 2v^ε + 1, … now contain partial sums for bucket 1; etc.


4. Each processor computes b local partial sums from the data it received.

5. {partial sums travel from leaves to root}
for r' = 1 to ⌈(1 − rε)/ε⌉ do
If processor i is active at level r' (i.e., i ≡ s(i) (mod v^(r'ε))), then processor i sends its local partial sums to parent processor ⌊i/v^((r'+1)ε)⌋·v^((r'+1)ε) + s(i).
Superstep boundary.
Each processor computes b local partial sums from the data it received.
endfor

6. {partial sums travel from root to leaves}
for r' = ⌈(1 − rε)/ε⌉ downto 1 do
Each processor which was a destination in Step 5 computes the rank of each of the b components received from each of its v^ε children in Step 5.
Each processor sends the b computed ranks to each of its children.
Superstep boundary.
endfor

7. Processor i sends its b partial sums to each of its v^ε children.

End of Algorithm

Description of Algorithm NumberBlocks: We assume a local copy of the loop variable r', not the same one that is used in LowSlackRouting. The blocks of each bucket are labelled with sequential numbers using algorithm NumberBlocks. NumberBlocks orders the blocks according to size, and requires two passes through a tree of processors, each pass requiring ⌈(1 − rε)/ε⌉ + 1 supersteps, for a total of 2(⌈(1 − rε)/ε⌉ + 1) supersteps, where ε > 0 is a constant. The numbering is computed via parallel prefix sum computations, using a v^ε-way tree over the processors for each bucket. Blocks of size b elements are assigned sequence numbers before blocks of size ℓ, for all ℓ = 1, …, b − 1.

In the first superstep, each processor assigns a weight of 1 to each local block and computes a partial sum for the weights of the locally held blocks of each size in each bucket. Next, each processor i first sends its b partial sums for the j-th bucket, for j = 0, …, v^ε − 1, to processor ⌊i/v^ε⌋·v^ε + j. For each bucket, therefore, |S|/v^ε destination processors are selected, or |S| destinations over all buckets. In the remaining steps, each such group of v^ε processors operates independently on a different bucket. In round 1, the processors 0, …, v^ε − 1 at the root of the respective v^ε trees


[Figure: The tree of processors which numbers the blocks in bucket 0 (a v^ε-ary tree of processors).]

will compute the sum of the weights of the blocks of size ℓ, for ℓ = 1, …, b, in buckets 0, …, v^ε − 1, respectively.

The later steps perform a second pass through the trees of processors. This consists of additional rounds during which the results are propagated back down the trees of processors. In each round r', the processors in level r' of the tree return the sequence number of the first block contributing to each of the partial sums received from their children in the first pass.

Figures and illustrate the tree of processors used for calculating the sequence numbers for blocks in buckets 0 and 1, respectively.
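The net effect of the two tree passes can be imitated with a small sequential stand-in (hypothetical code, not from the thesis): given, for one bucket, a table count[i][s − 1] of how many blocks of size s processor i holds, an exclusive prefix sum taken over all (size, processor) pairs — size b first, then sizes b − 1 down to 1 — yields the rank of each processor's first block of each size:

```python
def first_ranks(counts, b):
    """counts[i][s-1] = number of blocks of size s held by processor i
    (one bucket).  Returns ranks[i][s-1] = rank of processor i's first
    block of size s.  Ranks are assigned to full blocks (size b) first,
    then to size b-1, ..., 1, the order NumberBlocks uses."""
    ranks = [[0] * b for _ in counts]
    nxt = 0
    for s in range(b, 0, -1):          # size b first, then smaller sizes
        for i, row in enumerate(counts):
            ranks[i][s - 1] = nxt      # exclusive prefix sum
            nxt += row[s - 1]
    return ranks
```

The parallel algorithm computes exactly these prefix sums, but distributes the additions over a v^ε-ary tree so that no processor ever touches more than v^ε·b values per round.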

Analysis of LowSlackRouting

Lemma: The number of blocks, both full and partially filled, in Step 3 of LowSlackRouting is at most 2N/(vb) per processor.

Proof: Let us assume that partial blocks are padded with dummy items, if necessary, to ensure that the size of every local bucket on every processor is a multiple of b items. Since there are v^ε buckets per processor, each processor adds less than v^ε blocks. Therefore, the total amount of data, both real and dummy, at each processor is at most N/(vb) + v^ε ≤ 2N/(vb) blocks. Note that dummy elements are considered only for purposes of the proof and are not actually sent.
□


[Figure: The tree for the numbering of blocks in bucket 1 (a v^ε-ary tree of processors).]

Lemma: Each virtual processor receives at most 2N/(vb) blocks (some, perhaps, partially full) in Step 7 of LowSlackRouting.

Proof: In the r-th round, r = 1, …, ⌈1/ε⌉, of LowSlackRouting, there are less than v^ε extra blocks generated per bucket as a result of padding partial blocks. There are a minimum of N/(v^(rε)·b) blocks in a bucket in the r-th round, so there are at most N/(v^(rε)·b) + v^ε blocks per bucket. These are spread evenly over the v^(1-rε) processors of the group, so each processor receives at most

(N/(v^(rε)·b) + v^ε) / v^(1-rε) = N/(vb) + v^(ε-1+rε) blocks.

Since v^(1+ε)·b ≤ N, each processor receives at most 2N/(vb) blocks.
□

Lemma: Although some of the blocks it receives may be only partly full, each virtual processor receives at most 2b fewer items than any other processor in any round r of LowSlackRouting, r = 1, …, ⌈1/ε⌉.

Proof: The sequence numbering of the blocks in a bucket by NumberBlocks orders all of the blocks of size b before any block of size ℓ, for ℓ = 1, …, b − 1. The blocks are then allocated in round-robin fashion over the ordered set of processors in


the group. The allocation of data items among the v^(1-rε) processors in the r-th round via partially filled blocks can be modelled by a ⌈(N/(v^(rε)·b))/v^(1-rε)⌉ × v^(1-rε) matrix, as in Figure. The block sizes form a non-increasing sequence b, b, …, b, i₁, i₂, …, with b > i₁ ≥ i₂ ≥ … ≥ 1. The blocks are allocated row by row, from left to right, to the processors (columns). The maximum imbalance in the number of items per processor is the maximum imbalance in column sums of the matrix.

First we consider the case where every processor receives the same number of blocks. In this case, the maximum imbalance in column sums occurs between columns 1 and v^(1-rε). We know that the maximum cell value is b, and so the imbalance reaches its maximum value when one row steps from a cell of size b down to a smaller size; this gives an imbalance of at most b.

In the case where a processor is allocated more blocks than another, the difference in the number of blocks can be at most 1. Let q be the last processor to receive an extra block. Then q may have received at most b items more than processors q + 1, …, v^(1-rε), and, by the previous argument, at most 2b items more than processors 1, …, q − 1.
□

Lemma gives an upper bound on the number of rounds required by NumberBlocks for each round of LowSlackRouting. We generalize the lemma slightly, to account for q·v^(1-rε) participating processors, so that we can use it in analyzing EMNumberBlocks in Lemma. Note that we do not actually use NumberBlocks in the final routing round, as BalancedRouting does not require it.

r

Lemma Let be a constant and let q v Given qv processors

th

participating in the ranking of blocks in v buckets in Step c of the r round of

LowSlac e the rank of each block in each bucket can be kRouting r d

r e communication rounds determined by algorithm Numb erBlo cks in d

Pro of For each bucket NumberBlocks forms a v ary tree of pro cessors over the

r r

qv pro cessors The heightofsuch a tree is dlog qv e d r e since q v

v

The trees for each of the v buckets are traversed simultaneously The parallel prex

ranking op eration involves traversing the tree of pro cessors in parallel from leaves to

r e communication ro ot and then from the ro ot back to the leaves giving d

rounds 


[Figure: Imbalance in Item Assignment to Processors. The columns represent relative processor numbers 1, …, v^(1-rε), and the cell values represent the size of the block in that position; the illustration shows a possible scenario for block size b.]

Lemma: The number of rounds required by LowSlackRouting is at most ⌈1/ε⌉² + ⌈1/ε⌉, for constant ε > 0.

Proof: The ranking of each block in the ranking step of LowSlackRouting in round r, r = 1, …, ⌈1/ε⌉, can be done in 2(⌈(1 − rε)/ε⌉ + 1) rounds, from Lemma. The data sent by processor i to processor j is sent to a group of v^(1-ε) processors containing processor j in the first round, by the send step of LowSlackRouting. In the r-th round, the data sent by processor i to processor j is sent to a group of v^(1-rε) processors containing processor j. Therefore, after ⌈1/ε⌉ rounds, the data sent by processor i to processor j is sent to a group containing only processor j. The routing step itself is performed by the send step of LowSlackRouting, which takes one round per round of routing.

Since there are ⌈1/ε⌉ rounds of LowSlackRouting which require the use of NumberBlocks, the number of rounds overall is

Σ_{r=1}^{⌈1/ε⌉} 2(⌈(1 − rε)/ε⌉ + 1)
≤ Σ_{r=1}^{⌈1/ε⌉} 2(⌈1/ε⌉ − r + 1)
= ⌈1/ε⌉(⌈1/ε⌉ + 1)
= ⌈1/ε⌉² + ⌈1/ε⌉.
□
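As an illustrative numeric check (hypothetical code, assuming the round bounds as reconstructed above), the sum Σ_{r=1}^{⌈1/ε⌉} 2(⌈(1 − rε)/ε⌉ + 1) never exceeds ⌈1/ε⌉(⌈1/ε⌉ + 1); exact rational arithmetic is used so the ceilings are not perturbed by floating-point error:

```python
from fractions import Fraction
import math

def rounds_upper_bound(eps):
    """Evaluate the round-count sum and its closed-form bound.

    eps should be an exact Fraction so that the ceilings are computed
    without floating-point rounding.
    """
    m = math.ceil(1 / eps)                       # number of routing rounds
    total = sum(2 * (math.ceil((1 - r * eps) / eps) + 1)
                for r in range(1, m + 1))        # NumberBlocks rounds
    return total, m * (m + 1)                    # (sum, claimed bound)
```

For instance, eps = 1/4 gives 20 rounds against a bound of 4·5 = 20, so the bound is tight when 1/ε is an integer.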

Lemma: Algorithm NumberBlocks uses O(N/v + L) computation time and O(g·v^ε·b + L) communication time.

Proof: In each round, any processor computes at most b sums from at most v^ε·b ≤ N/v inputs. The maximum number of items received is v^ε·b ≤ N/v. From Lemma, the number of rounds used is a constant.
□

Lemma: LowSlackRouting uses O(α·(N/v + L)) computation time and O(g·(N/v + v^ε·b) + α·L) communication time, where α = ⌈1/ε⌉² + ⌈1/ε⌉.

Proof: From Lemma, we can conclude that each virtual processor has O(N/v) data in each round, for b = O(N/v^(1+ε)). Exclusive of NumberBlocks, therefore, the computation time and communication time of LowSlackRouting are both O(N/v + L) per round. From Lemma, we get the overall computation time, including NumberBlocks, to be O(α·(N/v + L)), and the communication time to be O(g·(N/v + v^ε·b) + α·L), for α = ⌈1/ε⌉² + ⌈1/ε⌉.
□

The EM Algorithm EMLowSlackRouting

In this section we describe algorithm EMLowSlackRouting, which is based on LowSlackRouting. EMLowSlackRouting performs the routing of messages among virtual processors during the simulation of a BSP algorithm on an EM-BSP machine.

In the general case of multiple disks and multiple real processors, a key step is the calculation of a fixed ordering of the communication blocks. This allows us to calculate the destination processor number (both virtual and real) and the disk address to which the block is to be written at its destination. We first describe this general case, and later we examine special cases, such as a single disk and a single real processor, which permit simpler algorithms.

First we describe the simulation of LowSlackRouting as an external memory algorithm, EMLowSlackRouting; then we describe an adaptation of NumberBlocks as external memory algorithm EMNumberBlocks. We do not simulate the parallel algorithm NumberBlocks; instead, we use it directly on the real processors of the EM-BSP machine.

We assume the existence of p real processors on the EM machine. Each real processor i, where 0 ≤ i ≤ p − 1, simulates a fixed set of v/p virtual processors i, p + i, 2p + i, …, ((v/p) − 1)·p + i. In other words, we assign virtual processors to physical processors in a round-robin fashion. As in Chapter, we introduce the concept of a messaging matrix M. Let M be a matrix with v columns and N/(vb) rows; that is, there are v/p columns of M per real processor. Each cell of M, stored in a distributed fashion, contains a block of b items. Each column of M will be used to store the message blocks destined for a particular virtual processor. Each real processor will maintain the columns of M corresponding to the virtual processors which it is responsible for simulating.
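The round-robin assignment described above can be stated in two lines; this is an illustrative sketch (hypothetical code, not from the thesis):

```python
def real_owner(virtual_id, p):
    """Real processor that simulates this virtual processor (round-robin)."""
    return virtual_id % p

def simulated_set(i, v, p):
    """Virtual processors i, p+i, 2p+i, ... simulated by real processor i."""
    return list(range(i, v, p))
```

Because consecutive virtual processor labels land on distinct real processors, any run of up to p consecutive intermediate destinations spreads its traffic evenly over the machine, which is what keeps the simulated communication balanced.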

We have v^ε buckets and, in the first round, v^(1-ε) processors per bucket. We deliver the messages within each bucket to their final destinations by delivering the component blocks of each message to a series of intermediate destinations. The immediate destination virtual processors for a fixed bucket j, for j = 0, …, v^ε − 1, are those with labels j·v^(1-ε), j·v^(1-ε) + 1, …, (j + 1)·v^(1-ε) − 1.

We simulate the routing of communication blocks in external memory by executing algorithm EMLowSlackRouting (below) on each real processor. EMLowSlackRouting is derived from the parallel algorithm LowSlackRouting. As before, S is a two-element vector consisting of the first and last labels of the processors participating in the routing operation. Initially, S = (0, v − 1). We use |S| to represent the number of processors in the range indicated by S. Initially, |S| = v, and round r = 1.

Algorithm EMLowSlackRouting(S)

Input: Each of the |S| virtual processors has N/v ≥ b·v^ε elements, partitioned into |S| messages. Each message may be of arbitrary length (at most N/v). The messages for each virtual processor are stored on the local disks of the real processor which simulates them.

Output: The O(|S|) messages in each processor are delivered to their final destinations. Each processor receives O(N/v) elements.

1. If |S| ≤ v^ε, the processors in S route their messages to their destination by means of algorithm BalancedRouting.

2. The virtual processors in S organize themselves into v^ε groups, each of size |S|/v^ε (see Figure). Each virtual processor divides its messages into v^ε buckets. The j-th bucket contains those messages destined to the processors in group j. Each message is subdivided into communication blocks of size b.

3. Each virtual processor in S determines a rank for each of its blocks, as follows:

(a) Assign unit weight to each locally-held block.

(b) Create a vector T of the local partial sums of the weights in each bucket, as follows: T[ℓ][j] = the number of blocks in bucket j with size ℓ, for ℓ = 1, …, b and j = 0, …, v^ε − 1.

(c) Call EMNumberBlocks(S, T) (below). This determines the rank of the first full block and any partial block in each bucket.

(d) Label all of the locally-held blocks with their rank. The full blocks in each bucket receive consecutive ranks.

4. Call EMSend(S) (below). Each virtual processor addresses the i-th block of local bucket j to the (i mod v^(1-rε))-th processor of group j. Blocks to a common virtual processor are then concatenated to form a single message and sent.

5. Superstep boundary; r ← r + 1.

6. Call EMReceive(S) (below). Each virtual processor receives at most one message from each of the v^(1-rε) processors which were part of its group in the previous round (Step 4 above).

7. Let S_j be the range of virtual processors in group j, j = 0, …, v^ε − 1. Each virtual processor of S_j routes the blocks it received in Step 6 to their destination processor by executing EMLowSlackRouting(S_j), for all j = 0, …, v^ε − 1.

End of Algorithm

Algorithm EMSend(S)

Input: Each communication block held by the current virtual processor is associated with one of v^ε buckets and labelled with a sequence number i, as calculated by NumberBlocks, that is unique within that bucket.

Output: Each communication block held by the current virtual processor is sent to the real processor which is responsible for simulating its immediate destination virtual processor. Each block is labelled with the disk track to which it should be written when it reaches its destination.

1. Let j represent the bucket number of a block, and let i be its sequence number. Calculate for each block a column number and a row number in the messaging matrix, as follows:

(a) Column number C = j·v^(1-rε) + (i mod v^(1-rε)). Note: C is also the number of the virtual processor which is the immediate destination for the current block.

(b) Row number R = ⌊i/v^(1-rε)⌋. Note: R can also be thought of as a sequence number for the current block within virtual processor C.

2. Send block i to its destination real processor, C mod p, for all i.

End of Algorithm

Algorithm EMReceive(S)

Input: The current virtual processor has sent its data to the router and is waiting for data addressed to it to arrive. Each arriving block is labelled with the local disk track to which it should be written. Let i be the sequence number of a block, and let C be the number of the virtual processor to which it is addressed.

Output: The blocks of data addressed to the current virtual processor in the current superstep have been written to the local disks of the appropriate real processor.

1. While messages remain to be received:

(a) Receive the next message from the router.

(b) Write each block of the received message to parallel disk track (C mod (v/p))·(N/(vb)) + ⌊i/v^(1-rε)⌋.

End of Algorithm

Details of Algorithms EMSend and EMReceive: For ease of exposition, we assume that p divides v. We also assume that b ≥ BD (e.g., b = BD), where b is the communication block size, D is the number of local disks on each real processor, and B is the disk block size. We can therefore write each communication block on one or more tracks of the parallel disks of a real processor. Each communication block is labelled with its messaging matrix indices C and R, where column number C = j·v^(1-rε) + (i mod v^(1-rε)) and row number R = ⌊i/v^(1-rε)⌋. The blocks for a fixed virtual destination processor are stored on the real destination processor in a fixed band of tracks on the local disks. The band number is calculated as C mod (v/p). Within a band, R determines the relative track number on which the message is stored. The message block with sequence number i in bucket j is sent to real processor C mod p and written to parallel disk track (C mod (v/p))·(N/(vb)) + R on its local disks. Messages arriving at a real processor can therefore be written to its disks on the correct parallel track, independent of the arrival times of other messages. Figure shows the assignment of messages on the local disks of a real destination processor.
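The addressing arithmetic above can be collected into a small sketch (hypothetical Python, not code from the thesis; `v_group` stands for the group size v^(1-rε) of the current round, and the exact exponents are reconstructions):

```python
def address_block(j, i, v, p, N, b, v_group):
    """Compute where a block lands: (real processor, parallel disk track).

    j: bucket number of the block; i: its rank within the bucket;
    v_group: processors per group this round (v**(1 - r*eps)).
    """
    C = j * v_group + (i % v_group)   # destination virtual processor (column)
    R = i // v_group                  # row: per-destination sequence number
    real = C % p                      # round-robin owner of virtual proc C
    band = C % (v // p)               # band of tracks for virtual proc C
    rows_per_band = N // (v * b)      # rows reserved per virtual processor
    track = band * rows_per_band + R
    return real, track
```

Since (real, track) depends only on (j, i), every sender computes the same address for a given block, which is why arriving messages can be written to disk without any coordination on arrival order.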

The EM Algorithm EMNumberBlocks

It remains to show how NumberBlocks can be executed as an external memory algorithm. NumberBlocks involves two phases: an upward phase, where data propagates from leaves to root, and a downward phase, where data propagates from root to leaves. Each processor which is active in a round of an upward phase sends a block of b items to a parent processor which is specific to a given bucket. (The maximum number of blocks per virtual processor is 2N/(vb), from Lemma.) There are v^ε such


[Figure: The assignment of messages to local disks on a real destination processor: disks 0, 1, …, D − 1, each divided into bands with rows 0, 1, …, one band per simulated virtual processor.]

Figure: The assignment of messages to local disks on a real destination processor. A block with indices (C, R) in the messaging matrix M is assigned to real processor C mod p, and to a band of its local parallel disk tracks which corresponds to the block's destination virtual processor. The parallel disk track within that band is R.

processors sending to each parent, and so communication is fully blocked, and v^ε·b data is received by a parent in any round.

Algorithm NumberBlocks forms v^ε v^ε-ary trees of processors, each one computing an ordering of the blocks for a single bucket. In the parallel external memory version, EMNumberBlocks, the p processors compute the same v^ε prefix sums by forming a v^ε-ary tree over the p processors for each bucket. We therefore do some local processing on each real processor to reduce the number of leaves in each tree from v to p, then apply NumberBlocks to the reduced problem. Finally, each real processor does some local processing to deliver the prefix sum information for each bucket to each of its v/p virtual processors.

Each real processor maintains v/p columns of a v × v^ε matrix M_NB. Each cell of M_NB is large enough to store a message of size b. Each virtual processor i can independently calculate its own rank within the collection of processors sending to a given destination j. Communication from virtual processor i to virtual processor j is implemented by the message from i being written to, and subsequently being read from, M_NB[i][j].


Algorithm EMNumberBlocks(S, T)

Input: Each of the |S| virtual processors has v^ε local buckets, which contain blocks of size ≤ b items. T is a b × v^ε array containing partial sums for the weights of each block size within each bucket. Each of the p real processors has v/p virtual processors which it simulates. Each virtual processor is prepared to execute the algorithm NumberBlocks.

Output: The blocks in each global bucket of the virtual processors are each given a rank which is unique within that bucket.

1. Each real processor sums the values of T held by each of the virtual processors it simulates. It calculates b sums for each bucket, one sum for each block size ≤ b.

2. The p real processors execute the parallel algorithm NumberBlocks(S', T') for S' = the set of all p real processors.

3. Each real processor scans the virtual processors it simulates and delivers its portion of the prefix sum for each of its buckets.

End of Algorithm

Details of EMNumberBlocks: We arrange that the virtual processors are suspended, as in a barrier synchronization operation, when they call NumberBlocks. In the worst case, Steps 1 and 3 of EMNumberBlocks can be implemented by a scan over the contexts of the virtual processors held by each real processor. Step 2 now requires only a parallel algorithm, as no IO is required. The virtual processors simply return from their calls to NumberBlocks with the required results, as calculated by EMNumberBlocks.

Analysis of EMLowSlackRouting

Lemma. Algorithm EMNumberBlocks uses O(N/p + L) computation time, O(g·(N/p) + L) communication time, and O(G·(N/(pBD)) + L) IO time.

Proof. In Step 1 of EMNumberBlocks, each real processor calculates b sums from (v/p)·v^ε·b = O(N/p) items. In each round of NumberBlocks (Step 2 of EMNumberBlocks), any real processor computes at most vb/p sums from at most (v/p)·v^ε·b inputs. The

(In practice, no extra IO operations may be required for Steps 1 and 3. The size of each bucket is contained in the variable T in the stack frame of each virtual processor, and so Step 1 does not require extra IO operations to accomplish. Similarly, when the results of the prefix sum are to be delivered in Step 3, the real processors simply deliver the results to the variable T already in the current stack frame of each process, prior to relinquishing control to each virtual processor.)


maximum number of items received by a real processor is (v/p)·v^ε·b ≤ (v/p)·(N/v) = N/p. From the earlier lemma, the number of rounds used over the p real processors by NumberBlocks is at most r + 1, where r = ⌈1/ε⌉ is the recursion depth in EMLowSlackRouting.

The last round of routing consists of v^(1−ε) subproblems, each involving v^ε processors. The data for each of these processors is spread over v^ε processors and is to be delivered to its final destination in this round.

Since the size of the groups in the final round is only v^ε, we can no longer rely on successive blocks being sent to successive physical processors, and so the round-robin allocation approach can result in large imbalance between physical processors in the data they receive: the largest one can receive O(p·N/(vb)) items. We do, however, have the necessary conditions to apply algorithm BalancedRouting in the final round instead, as the following lemma shows.

Lemma. Let v′ be the number of processors involved, and let N′ be the size of each routing subproblem in the final round of EMLowSlackRouting. Then N′ ≥ (v′)^2·b, and so algorithm BalancedRouting can be used for b ≤ N′/(v′)^2.

Proof. Since for each subproblem there are v^ε processors, each with at least v^ε·b data items, each subproblem in the final round contains N′ ≥ v^(2ε)·b data. Therefore we have a problem of size N′ with v′ = v^ε processors and N′ ≥ (v′)^2·b. From the earlier theorem, therefore, we have the necessary conditions for algorithm BalancedRouting when b ≤ N′/(v′)^2.

We now turn to the issue of whether the communication between real processors in algorithm EMLowSlackRouting is balanced in rounds 1, …, ⌈1/ε⌉ − 1. If communication is unbalanced, not only may an arbitrary processor do more than its share of the work, but it may be asked to receive more data than can fit into its internal memory in a single round.

In general, the kp virtual processors may be participants of, say, z groups in the r-th round.

First we consider the case when k = 1. In general, the first few real processors contain some of the virtual processors of group i, the last few processors contain some of the virtual processors of group i + z − 1, and the processors in between contain all of the virtual processors of groups i + 1 to i + z − 2.

We call the groups i + 1 to i + z − 2, whose virtual processors are completely contained within the p real processors, full groups. A full group produces an imbalance in communication of at most one block between its real processors. The partial groups, defined in the next paragraph, may also have an effect on these real processors, however.

We refer to the (at most two) groups which are only partly contained within the p real processors as partial groups. Within each bucket, every block has a label


between 0 and N/(v^ε·b) − 1, but only part of this range will be represented in such a group. We refer to these buckets as partial buckets.

The following lemmas show that, although we simulate only p virtual processors at a time in each superstep of EMLowSlackRouting, the communication and memory usage of the real processors remains balanced when p ≤ v^ε.

Lemma. Let p ≤ v^ε, and let j be an arbitrary bucket in a fixed round of EMLowSlackRouting. In the routing step of EMLowSlackRouting, each real processor receives at most one more block of bucket j than any other real processor.

Proof. There are at least v^ε processors in any group during the r-th round of EMLowSlackRouting for r ≤ ⌈1/ε⌉ − 1, and during these rounds v^ε ≥ p. We therefore have at least as many virtual processors in a group as there are real processors. Since virtual processors are allocated to real processors in round-robin order, any real processor hosts at most one more virtual processor of group j than any other.

Since the blocks in bucket j are allocated to the virtual processors of group j in round-robin order, each real processor receives at most one more block of bucket j than any other real processor in the routing step of EMLowSlackRouting.
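The round-robin balance property used in this proof is easy to check numerically; the following sketch (illustrative, not from the thesis) counts how many blocks of a single bucket each real processor receives when consecutive blocks go to consecutive virtual processors:

```python
# Round-robin: block t of a bucket goes to virtual processor (start + t) mod v,
# and virtual processor i lives on real processor i mod p.  When p divides v,
# consecutive integers cycle through the residues mod p, so the per-processor
# counts differ by at most one, for any block count and any starting offset.

def real_processor_loads(num_blocks, v, p, start=0):
    loads = [0] * p
    for t in range(num_blocks):
        virtual = (start + t) % v
        loads[virtual % p] += 1
    return loads
```

For example, `real_processor_loads(10, 8, 4, start=3)` spreads 10 blocks so that the largest and smallest per-processor counts differ by at most one.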

Lemma. Let p ≤ v^ε and let k = 1. No real processor receives more than O(N/(vb)) blocks in the routing step of EMLowSlackRouting, in rounds 1, …, ⌈1/ε⌉ − 1.

Proof. Consider the memory of p consecutive virtual processors in the r-th round, for r ≤ ⌈1/ε⌉ − 1, where only a single virtual processor is simulated concurrently in each of the p real processors. Their local memories are divided into v^ε buckets.

We send blocks with consecutive numbers to different virtual processors with consecutive labels. Because of the round-robin allocation of virtual processors to physical processors, blocks with consecutive numbers are guaranteed to be sent to different real processors as well in rounds r = 1, …, ⌈1/ε⌉ − 1. The worrisome issue, however, is whether one real processor receives significantly more data than any other. From the previous lemma, this cannot be as a result of distributing a single bucket. We now examine whether it can still happen due to the fact that we distribute the v^ε different buckets independently.

Since p ≤ v^ε, the p virtual processors may be participants of at most two groups. Let us assume that p_1 virtual processors participate in the first of these groups, p_2 participate in the second, and p_1 + p_2 = p. Let us concentrate on the first of the two groups. In this case the data from p_1 virtual processors is sent to p ≥ p_1 real processors.

 

(In round ⌈1/ε⌉, however, since the groups are only v^ε processors in size, successive blocks would not necessarily be sent to successive real processors if we used the same algorithm.)


The difference in the number of blocks received by a real processor for a single bucket of a single group is at most one, no matter how much data is in that bucket (i.e., partial or not).

The worst case of imbalance occurs when most of the buckets ("thin" buckets) contain only w < p blocks, which get allocated to the same set of w real processors P_1, …, P_w for every thin bucket. The figure below illustrates this case for w = 1. Since we have an (N/v)-relation, each virtual processor distributes O(N/(vb)) blocks in total. Each of the w processors can receive at most v^ε blocks from thin buckets, since we have only v^ε buckets in total.

[Figure: Imbalance in data received by the real processors. The first v^ε − 1 buckets contain only one block each, which coincidentally get routed to the same real processor. The last bucket must then contain p·v^ε − (v^ε − 1) blocks, which get evenly distributed over the p processors.]

Since every virtual processor must send v^ε blocks in total, and there are p of them participating, we will have p·v^ε − w(v^ε − 1) blocks in the last partial bucket (the "fat" bucket).

We know, however, that the data from the fat bucket will be distributed evenly among those real processors which contain the virtual processors of this group. (If, say, two blocks are to be routed to P_1 in a bucket, this would imply that all of the other real processors receive at least one block as well.) Therefore P_1, …, P_w may each receive another (p·v^ε − w(v^ε − 1))/p ≤ v^ε blocks this way, for a total of at most 2v^ε, and this is no more than 2N/(vb), since v^ε ≤ N/(vb) by the slackness constraint.

This process can happen at both ends of the range of p virtual processors. If we assume that some of P_1, …, P_w can receive both sets of extra blocks, we obtain a bound of O(N/(vb)).

vb

We now consider k > 1 processors simulated concurrently by each real processor.

Lemma. Let p ≤ v^ε and let k > 1. No real processor receives more than O(kN/(vb)) blocks in the routing step of EMLowSlackRouting, in rounds 1, …, ⌈1/ε⌉ − 1.

Proof. Each additional p virtual processors added to the memories of the real processors in a simulation round can do no worse than independently produce the scenario of the figure above. Since for k = 1 the maximum communication received by a processor was O(N/(vb)) blocks, the worst case is O(kN/(vb)) blocks for general k.

Lemma. For p = O(v^ε), EMLowSlackRouting uses O(N/p + λL) computation time, O(g·(N/p) + λL) communication time, and O(G·(N/(pBD)) + λL) IO time, where λ = O(⌈1/ε⌉) for constant ε > 0.

Proof. From the preceding lemmas we can conclude that each virtual processor has O(N/v) data in each round, for b = O(N/v^(1+ε)). Exclusive of EMNumberBlocks, therefore, the computation time and communication time of EMLowSlackRouting are both O(N/v + L) per round. From the lemma above we get the overall computation time, including EMNumberBlocks, to be O(N/p + λL), and the communication time to be O(g·(N/p + v^ε·b) + λL), for λ = O(⌈1/ε⌉).

Single Processor Target Machine

In each round of EMLowSlackRouting, the EM processor first allocates v^ε buckets, each of size ⌈b/(BD)⌉·v^(1−rε) disk tracks, where r is the current recursion depth in algorithm LowSlackRouting (the initial call to LowSlackRouting corresponds to r = 1).

The execution of each virtual processor, as it executes algorithm LowSlackRouting, is suspended after the communication superstep. The simulation of this step is performed by simply writing the communication blocks intended for bucket j to the end of a list of blocks stored in the area reserved for bucket j on the disk. When every virtual processor has completed recursion depth r = r′ of LowSlackRouting, the first processor's execution at level r = r′ + 1 begins. For purposes of our EM routing algorithm, we need only ensure that each processor of a given group receives the same number of blocks of the messages sent to that group in the previous superstep. As


it begins execution of a superstep, each virtual processor takes the next ⌈(N/v)/(BD)⌉ undelivered disk tracks for its group.

Both reading and writing of message data is done BD items at a time. As we know beforehand the total number of blocks destined for each group, the sequential simulation for this case need not include the ranking step (NumberBlocks) required by the parallel algorithm LowSlackRouting, and the final routing step need not be done by BalancedRouting, as in the multiple real processor case.
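The bucket lists on disk behave like append-only queues that the next recursion level consumes in track-sized pieces; a minimal in-memory model (ours, for illustration only):

```python
# Single-processor simulation sketch: a communication superstep becomes
# appends to per-bucket lists on "disk"; the next level claims the next
# undelivered tracks of its group, in order.

class DiskBuckets:
    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]
        self.read_pos = [0] * num_buckets  # next undelivered track per bucket

    def write(self, j, block):
        # append a communication block to the area reserved for bucket j
        self.buckets[j].append(block)

    def next_tracks(self, j, count):
        # a virtual processor claims the next `count` undelivered tracks
        start = self.read_pos[j]
        self.read_pos[j] += count
        return self.buckets[j][start:self.read_pos[j]]
```

Because writes only ever append and reads only ever advance, blocks are delivered to the processors of a group in the order they were written, which is all the EM routing argument requires.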

Multiple Processor Target Machine

We now describe a simulation technique for the multiple real processor case. Each real processor i, 0 ≤ i ≤ p − 1, executes algorithm ParCompoundSuperstep in parallel. For ease of exposition, we assume that pk divides v. The v virtual processors are assigned to the p real processors in round-robin order, i.e., real processor i hosts virtual processors i, i + p, i + 2p, …, i + v − p.
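Under this round-robin assignment, the k virtual processors that real processor i simulates in each iteration of the algorithm below are given by simple index arithmetic; a small sketch (illustrative names, assuming pk divides v as stated):

```python
# Round-robin hosting: real processor i hosts virtual processors
# i, i+p, i+2p, ..., i+v-p.  In iteration j it simulates k of them.

def hosted(i, v, p):
    return list(range(i, v, p))

def batch(i, j, v, p, k):
    # the k virtual processors real processor i handles in iteration j
    return [j * k * p + ell * p + i for ell in range(k)]
```

Over all i in 0..p−1 and j in 0..v/(pk)−1, the batches partition the v virtual processors, so every context is read and written exactly once per compound superstep.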

Algorithm ParCompoundSuperstep

Objective: Simulation of a compound superstep of a v-processor CGM on a p-processor EMCGM.

Input: The message and context blocks of the virtual processors are divided among the real processors and their local disks. Each real processor i, 0 ≤ i ≤ p − 1, holds O(N/(pB)) blocks of messages and O(N/(pB)) blocks of context, and each local disk contains O(N/(pBD)) blocks of messages and O(N/(pBD)) blocks of context.

Output: The changed contexts and generated messages, distributed as required for the next compound superstep.

For j = 0 to v/(pk) − 1 do:

(a) Read the contexts of virtual processors jkp + i, (jk + 1)p + i, …, ((j + 1)k − 1)p + i from the local disks.

(b) Read any message blocks addressed to virtual processors jkp + i, (jk + 1)p + i, …, ((j + 1)k − 1)p + i from the local disks.

(c) Simulate the computation supersteps of virtual processors jkp + i, (jk + 1)p + i, …, ((j + 1)k − 1)p + i, collecting all generated messages in the local internal memory.

(d) Send all generated messages to the required real destination processors. Upon arrival, the messages are arranged within the internal memory of the real destination processor and then written to its disks.

(e) Write the contexts for virtual processors jkp + i, (jk + 1)p + i, …, ((j + 1)k − 1)p + i back to the local disks.

End of Algorithm

Details of algorithm ParCompoundSuperstep: Steps (a), (b), (c), and (e) are identical to the corresponding steps in the earlier simulation chapter (see algorithm ParCompoundSuperstep there). Step (d) is accomplished by the routing steps of algorithm EMLowSlackRouting, followed by algorithm BalancedRouting once the routing subproblems are reduced to delivering messages among v^ε processors.

We choose b = BD, and so, for each interim destination during the routing steps of EMLowSlackRouting, every message block can be written to the D local disks in parallel. In the final routing step, involving algorithm BalancedRouting, a distributed messaging matrix is created as in algorithm ParCompoundSuperstep, but for each subproblem of v^ε processors.

We can now restate the simulation theorem with a more general slackness constraint for the case p = O(v^ε).

Theorem. A v-processor CGM algorithm A with λ supersteps, computation time λ·O(N/v) + λL, communication time λ·g·O(N/v) + λL, and local memory size O(N/v) can be simulated as a p-processor EMCGM algorithm A′ with computation time O((v/p)·λ·(N/v) + (v/p)·λL), communication time g·O((v/p)·λ·(N/v) + (v/p)·λL), and IO time G·O(λ·N/(pBD) + (v/p)·λL), for p = O(v^ε), N/v ≥ kBD, and N ≥ v^(1+ε)·BD, for arbitrary integer k ≥ 1.

Let g(N), L(N), and v(N) be increasing functions of N. If A is c-optimal on the CGM for g ≤ g(N), L ≤ L(N), and v ≤ v(N), then A′ is a c-optimal EMCGM algorithm for g ≤ g(N), G = o(BDp/v), and L ≤ L(N). A′ is work-optimal, communication-efficient, and IO-efficient if A is work-optimal and communication-efficient, g ≤ g(N), G = O(BDp/v), and L ≤ L(N).

Proof. The changes between algorithm ParCompoundSuperstep above and its earlier counterpart are restricted to the first ⌈1/ε⌉ routing steps and the constraint that b = BD. EMLowSlackRouting takes a constant number of supersteps for constant ε > 0. Therefore, asymptotically, the computation and communication time, as well as the total number of supersteps, do not change from the earlier theorem.

However, the slackness requirement now becomes N ≥ v^(1+ε)·BD, absorbing a previous requirement that N ≥ vBD, and therefore eliminating the constraint v ≥ D.

Chapter

Problems of Geometrically

Changing Size

Introduction

The techniques described in the previous sections may require asymptotically more IO than is necessary for certain problems, specifically those whose active set of data changes in size by a constant factor in each round. Examples include certain algorithms for parallel prefix computations and list ranking, where during a series of rounds the bridging of nodes eliminates a constant fraction of the input from the computation in each round. Our previous approach is inefficient for this sort of problem because, over the O(log N) rounds of geometrically decreasing problem size, the computation time is perhaps only O(N) (as in the list ranking example), but we would swap a total volume of O(Γ·log N) context, where Γ is the maximum context size. In this section we discuss modifications to the simulation which allow it to take advantage of the knowledge that the problem size is changing in this way. The modifications we will discuss are relevant to the asymptotic complexity of the simulation of a client algorithm with a nonconstant number of rounds of such geometrically decreasing problem sizes, but in practice may also represent a useful saving of elapsed time.

The main idea of this improvement to the simulation is to swap only the active portion of the context of a virtual processor, and in order to do this we require that the client algorithm keep track of which portions of its data space are active and which are not. In practice, we anticipate that the client algorithm will also need to defragment its data space periodically (say, every round, or after a constant number of rounds) to make the active and inactive portions individually contiguous. We will also require the client algorithm to provide the sizes and locations of these portions of its data space to the simulation. Aside from these general implementation remarks, however, we will restrict our attention here to the theoretical issues arising from this modification.

We consider the messaging and the context swapping, but do not specifically consider the executable program code itself. It is not relevant to our asymptotic analysis because it can be considered to be of constant size. Its effects on the analysis are captured in the expressions for the context.

Reducing IO Overheads

We use the term phase to mean a series of consecutive rounds of a BSP-like algorithm. A decreasing phase will refer to a phase in which the active context size is decreasing by a constant factor f > 1 in each round; i.e., if the context size in round i is Γ_i, then the context size in round i + 1 is Γ_i/f. An increasing phase will refer to a phase in which the active context size is increasing by a constant factor f > 1 in each round; i.e., if the context size in round i is Γ_i, then the context size in round i + 1 is f·Γ_i. A monotone phase is a phase which is strictly increasing or decreasing.

Let Γ_min be the minimum size of active context that is swapped during a monotone phase. Clearly, in order to ensure that the IO remains blocked and parallel to all D disks, we require that k·Γ_min ≥ BD, where k is the number of virtual processors simulated in the memory of a real processor concurrently. While the actual active context size may become smaller, we will require that the simulation track the actual active context size only as small as Γ_min. The lemma below shows that if the simulation can track the context for log_f(Γ_max/Γ_min) rounds, the total volume of contexts swapped into internal memory during the phase is only O(Γ_max).

MAX

Lemma. Let Γ_max be the context size at the beginning of a decreasing phase Φ with Δ rounds. For constant f > 1 and Γ_min = Γ_max/Δ, the sum Γ_total of the sizes of the active contexts which must be swapped during Φ is O(Γ_max), and this can be achieved in a fully blocked fashion on a single-processor EMBSP with D disks, provided that k ≥ Δ·BD/Γ_max.

Proof. Γ_total is defined as follows:

Γ_total = Γ_max + Γ_max/f + Γ_max/f^2 + … + Γ_max/f^(Δ−1).

A useful bound on Γ_total for our purposes is

Γ_total ≤ Σ_{i=0}^{⌈log_f(Γ_max/Γ_min)⌉} Γ_max/f^i + (Δ − log_f(Γ_max/Γ_min))·Γ_min.

Now, if we choose Γ_min = Γ_max/Δ, this becomes

Γ_total ≤ Σ_{i=0}^{⌈log_f(Γ_max/Γ_min)⌉} Γ_max/f^i + Γ_max.

Now, using the identity

Σ_{i≥0} x^i = 1/(1 − x), where x < 1,

we obtain, for x = 1/f,

Γ_total ≤ Γ_max·f/(f − 1) + Γ_max = O(Γ_max).

So, for Δ ≥ log_f(Γ_max/Γ_min), we have Γ_total = O(Γ_max).

In order for the IO to be blocked and in parallel to all D disks, we also require that the contexts of k processors fill a parallel disk track; in other words,

k·Γ_min ≥ BD.

Since we chose Γ_min = Γ_max/Δ, we get

k·Γ_max/Δ ≥ BD, i.e., k ≥ Δ·BD/Γ_max.



The case of an increasing phase is symmetric. During the last (resp. first) Δ − ⌈log_f(Γ_max/Γ_min)⌉ rounds of a decreasing (resp. increasing) phase, the actual context size may be less than Γ_min = Γ_max/Δ, but we can transfer Γ_max/Δ data in each of these rounds without exceeding O(Γ_max) data transfer overall for the phase, or violating our blocking requirements for the disks.
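The O(Γ_max) bound can be checked numerically. The sketch below (an illustration consistent with the choice Γ_min = Γ_max/Δ made above, with names of our own) sums the volume swapped over a decreasing phase:

```python
# Total context volume swapped over a decreasing phase: Gamma_max/f^i per
# round while above Gamma_min, and Gamma_min per round thereafter.

def total_swapped(gamma_max, f, rounds):
    gamma_min = gamma_max / rounds  # Gamma_min = Gamma_max / Delta
    total = 0.0
    size = float(gamma_max)
    for _ in range(rounds):
        total += max(size, gamma_min)  # never track below Gamma_min
        size /= f
    return total
```

For f = 2 the geometric part sums to less than 2·Γ_max and the clipped tail adds at most Γ_max, so the total stays below 3·Γ_max however many rounds the phase runs.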

Lemma. A monotone phase Φ with Δ rounds, computation time τ + ΔL, and parameter f can be simulated on a single-processor EMCGM machine with O(k·Γ_max) internal memory, in O(v·τ + v·Γ_max/B) computation operations and O(v·Γ_max/(BD)) IO operations, provided that k ≥ Δ·BD/Γ_max, N_max ≥ Δ·(v/k)·((v/k) + B)·B, and D ≤ v/k, for integer constant k ≥ 1.


Proof. Algorithm SeqCompoundSuperstep can be used for the simulation, with changes discussed below. We discuss the case of a decreasing phase; an increasing phase is symmetric to a decreasing phase, and so the arguments remain the same.

IO Due to Contexts: First we consider how the simulation should deal with the contexts. We can handle the swapping of contexts by allocating a fixed area on the D disks, exactly as in the fixed-context-size case, but with the space for each group of k virtual processors divided into two parts: (i) the active portion of the contexts, and (ii) the inactive portion of the contexts. The inactive portion is not swapped into the real memory of the simulating processor until needed in a future round.

We first consider the case when the active context size is decreasing during a phase. In each round the active portion decreases in size by the factor f. At the end of a round, the simulation concatenates the newly deactivated part of the context to the inactive portion on the disks, and writes the active portion into the remainder of the disk space reserved for the context. By the lemma above, we need only O(Γ_max/(BD)) IO operations per virtual processor for the swapping of contexts during such a phase.

IO Due to Messages: We assume that the total message data volume in each round is also limited by the volume of the active contexts, and that the message volume during a phase varies by the same factor f between rounds as does the context size. As before, we use a fixed-size messaging matrix throughout the simulation. As the message volume drops during a decreasing phase, less of the space allocated to the messages for each virtual processor will be required, but no change in the address calculation is required.

It remains to ensure that the message volume is sufficient for algorithm BalancedRouting to work, and that the IO operations performed by algorithms SeqCompoundSuperstep and ParCompoundSuperstep to simulate this communication remain blocked and in parallel to the D disks.

Let N_min and N_max represent the minimum and maximum message volumes, respectively, of a round during phase Φ. We apply the same analysis to the maximum message volume in a round as we used to derive the bound on Γ_total above; this gives us a minimum data volume of N_min = N_max/Δ over a monotone phase of Δ rounds.

First we examine the constraints required by algorithm BalancedRouting. The relevant lemma requires a lower bound on the problem size N, i.e., N ≥ v·(v + B)·B. If we replace N by the minimum data size N_min = N_max/Δ, this constraint becomes N_max ≥ Δ·v·(v + B)·B. When k virtual processors are simulated concurrently in each real processor, there are only v/k distinct destinations for the messages, and so this constraint becomes N_max ≥ Δ·(v/k)·((v/k) + B)·B.

For algorithms SeqCompoundSuperstep and ParCompoundSuperstep we have the requirement N ≥ vBD, which ensures that the messages sent or received by a group of k virtual processors in one round are enough to justify doing a parallel IO to the D disks. In our current scenario, however, we must adapt this condition to the minimum data size. If we replace N by the minimum data size N_min = N_max/Δ, this constraint becomes N_max ≥ Δ·vBD. This constraint is absorbed by the constraint N_max ≥ Δ·(v/k)·((v/k) + B)·B when D ≤ v/k.

Computation Time: As v virtual processors are simulated on a single real processor, the total time for Step (c) is O(v·τ). IO contributes computational overhead proportional to the number of blocks input or output, or O(v·Γ_max/B).

Lemma. A monotone phase Φ of a CGM with Δ rounds, computation time τ + ΔL, communication time g·h + ΔL, and parameter f can be simulated on a p-processor EMCGM machine in O(v·τ/p + v·Γ_max/(pB)) parallel computation operations, O(v/(pk)) communication operations, and O(v·Γ_max/(pBD)) IO operations, and with synchronization time (v/(pk))·ΔL, provided that k ≥ Δ·BD/Γ_max, N_max ≥ Δ·(v/k)·((v/k) + B)·B, D ≤ v/k, and p ≤ v/k, for integer constant k ≥ 1.

Proof. We use the results of the single-processor lemma above. Each of the p real processors simulates v/p virtual processors, k at a time. The synchronization time is therefore (v/(pk))·ΔL. Because there are now p processors instead of only one, the IO operations decrease from O(v·Γ_max/(BD)) to O(v·Γ_max/(pBD)), giving computational overhead of O(v·Γ_max/(pB)). The basic computation time decreases from O(v·τ) to O(v·τ/p), however. Since real communication is not required between virtual processors simulated on the same real machine, and since k virtual processors are simulated concurrently on each real machine, the number of non-local processor groups is v/(pk), and the number of communication operations is reduced, compared to the CGM, to O(v/(pk)). The constraints carry over from the single-processor case.

Theorem. Let A be a v-processor BSP-like algorithm with λ supersteps and m monotone phases Φ_1, …, Φ_m, where during each phase Φ_i the active problem size changes by a constant factor f_i in each of Δ_i rounds. Let Γ_i be the maximum context size

(When we assume that computational overhead for IO is proportional to the number of blocks read/written, it is equivalent to assuming that the CPU is involved only in posting the request for each block, and that the IO is performed by an IO channel or disk controller directly from/to a buffer in our program's data space. While this is possible, and often true on larger systems, many older versions of the C IO library have performed the IO from/to a separate system buffer. This approach requires that the data be copied to/from the user buffer after/before the input/output operation. The copy operation requires the involvement of the CPU for every item of the buffer, and so this approach implies computational overhead which is B times larger.)


for each phase, and let τ_i + Δ_i·L and g·h_i + Δ_i·L be the computation time and communication time, respectively, of each phase Φ_i of A. Let Γ_max = max_i Γ_i.

A can be simulated on a single-processor EMCGM machine in O((v/(BD))·Σ_i Γ_i) parallel IO operations, each of BD items, provided that k ≥ Δ_i·BD/Γ_i for each i, N_max ≥ Δ_i·(v/k)·((v/k) + B)·B, and D ≤ v/k, for integer constant k ≥ 1.

Proof. We use the single-processor lemma above for each monotone phase, plus the arguments of the earlier simulation theorem.

Theorem. Let A be a v-processor BSP-like algorithm with λ supersteps and m monotone phases Φ_1, …, Φ_m, where during each phase Φ_i the active problem size changes by a constant factor f_i in each of Δ_i rounds. Let Γ_i be the maximum context size, and τ_i + Δ_i·L and g·h_i + Δ_i·L be the computation time and communication time, respectively, of each phase Φ_i of A.

A can be simulated on a p-processor EMCGM machine in O((v/(pBD))·Σ_i Γ_i) parallel IO operations, each of pBD items, provided that k ≥ Δ_i·BD/Γ_i for each i, N_max ≥ Δ_i·(v/k)·((v/k) + B)·B, D ≤ v/k, and p ≤ v/k, for integer constant k ≥ 1.

Proof. We use the p-processor lemma above for each monotone phase, plus the arguments of the earlier simulation theorem.

We therefore can simulate an algorithm containing monotone phases with less IO cost than the simplest approach would suggest. Under the conditions stated in the theorems, it costs only a constant times N/(pBD) parallel IO operations for each of the monotone phases, despite the fact that such phases may involve more than a constant number of supersteps.

Chapter

New EM Algorithms

Overview

A number of basic algorithms with applications in data structuring, computational geometry, and graph theory have been developed for BSP models by various authors. In this section we present adaptations of a selection of these algorithms for the EMBSP models.

In many cases the IO complexities of these EMBSP algorithms appear at first glance to contradict known lower bounds for their problems. We first explain this contradiction by discussing the effect of our constraints on the lower bounds. We then present simple BSP algorithms for the fundamental problems of sorting, permutation, and matrix transpose, and describe their adaptations to our EMBSP models. We also present several new multisearch algorithms for EMBSP and EMCGM, and a number of other new EMCGM algorithms which we obtained from known CGM algorithms.

Apparent Reductions in IO Complexity

Several known lower bounds on IO complexity contain a multiplicative factor of log_{M/B}(N/B). We obtain a number of results where this factor does not appear. This can be explained by the fact that the term log_{M/B}(N/B) is a constant when N ≥ v^(1+ε)·b for ε > 0, as shown by the lemma below.

For instance, this lemma, together with Aggarwal and Vitter's lower bound of Ω((N/(BD))·log_{M/B}(N/B)) IOs for sorting, implies lower bounds of Ω(N/(pBD)) IO operations for a number of problems, such as sorting, general permutation, matrix transpose, computing FFTs, etc., when the slackness constraint N ≥ v^(1+ε)·b is valid.


Lemma. For slackness N ≥ v^(1+ε)·b, where ε > 0 is a constant, the value of log_{M/B}(N/B) is a constant whose value depends on ε but not on N.

Proof. For constant ε > 0,

N ≥ v^(1+ε)·b, i.e., v^(1+ε)·b ≤ N.

Now, choosing c = (1 + ε)/ε, and substituting v ≥ N/M and b ≥ B, this becomes

(N/M)^(1+ε)·B ≤ N,

and rearranging gives

(M/B)^c ≥ N/B, so that log_{M/B}(N/B) ≤ c.

We can conclude that log_{M/B}(N/B) is a constant whose value depends on ε but not on N, when N ≥ v^(1+ε)·b.

So, for constant ε > 0, when N ≥ v^(1+ε)·b, the logarithmic term is a constant of size at most (1 + ε)/ε.
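This behaviour is easy to verify numerically. The sketch below (illustrative only; it chooses v so that the slackness bound N = v^(1+ε)·b is tight, with b = B and M = N/v) evaluates the logarithmic term for growing N:

```python
import math

# With slackness N >= v^(1+eps) * b, taking b = B and M = N/v, the term
# log_{M/B}(N/B) stays at or below (1+eps)/eps regardless of how large N grows.

def log_term(N, M, B):
    return math.log(N / B) / math.log(M / B)

eps = 0.5
bound = (1 + eps) / eps  # = 3 for eps = 0.5
B = 64
for N in [2 ** 20, 2 ** 30, 2 ** 40]:
    v = (N / B) ** (1 / (1 + eps))  # makes N = v^(1+eps) * B exactly
    M = N / v
    assert log_term(N, M, B) <= bound + 1e-9
```

The term equals (1 + ε)/ε exactly at the tight slackness boundary and only shrinks when the slackness is larger, which is why the logarithmic factor disappears from the IO complexities below.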

Fundamental Problems

We first present new EMCGM algorithms, obtained from CGM algorithms via the simulation results above, for the fundamental problems of sorting, permutation, and matrix transpose. In each case the CGM algorithm uses O(1) communication rounds and O(N/v) internal memory per processor. Since our simulation techniques require that N ≥ v^(1+ε)·b, the IO complexities of the resulting parallel external memory algorithms do not exhibit the logarithmic factor known to be present in the general case for these problems.

Sorting

The time complexity of sorting N items is Θ(N lg N) for a comparison-based model of computation. On the PDM, sorting has been shown to have IO complexity Θ((N/(BD))·log_{M/B}(N/B)) for general values of N, M, D, and B.

Goodrich has described a deterministic BSP algorithm for sorting which has a constant number of supersteps for N ≥ v^(1+ε), ε a fixed constant. We can achieve


IO complexity O(N/(pBD)) by simulating this algorithm using the deterministic simulation technique described above, for p = O(v^ε) and N ≥ v^(1+ε)·BD, ε a fixed constant.

Alternatively, we can simulate the algorithm CGMSampleSort, due to Shi and Schaeffer, which is described and analyzed later in this chapter, using the deterministic simulation techniques described above. The next two lemmas give the performance of CGMSampleSort and its external memory version, EMCGMSampleSort, respectively, analyzed under the EMCGM model presented earlier. The proofs of both lemmas are given later in this chapter.

N N N

log L computation time O g L Lemma CGMSampleSort uses O

v v v

N

communication time O local memory per processor and O communication

v

N

supersteps provided that v

v

Lemma CGMSampleSort can be simulated on a p processor EMCGM ma

N log Nv

v N v

L computation time O g L communication time chine in O

p p p p

v N

L IO time provided that p v N v B D B B v and O G

pB D p

N N

Since CGMSamplesort requires v we obtain IO complexityof O

v pB D

for EMCGMSampleSort for N v B v D B v
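The idea behind a CGM sample sort can be sketched in a few lines: each processor sorts locally, a global set of v − 1 splitters is chosen from a regular sample, and a single all-to-all round routes every item to the processor owning its splitter interval. The following simplified, single-round, in-memory sketch is our own illustration, not the analyzed algorithm from the cited work, and makes no load-balance guarantee.

```python
import bisect

def cgm_sample_sort(data, v):
    """Single-round sample sort across v simulated processors (illustrative)."""
    n = len(data)
    # Local sort on each simulated processor's chunk.
    chunks = [sorted(data[i * n // v:(i + 1) * n // v]) for i in range(v)]
    # Each processor contributes a regular sample; pick v - 1 global splitters.
    sample = sorted(s for c in chunks for s in c[::max(1, len(c) // v)])
    splitters = sample[len(sample) // v::max(1, len(sample) // v)][:v - 1]
    # One all-to-all communication round: route items by splitter interval.
    inbox = [[] for _ in range(v)]
    for c in chunks:
        for x in c:
            inbox[bisect.bisect_right(splitters, x)].append(x)
    # Each processor sorts what it received; concatenation is globally sorted.
    return [x for box in inbox for x in sorted(box)]

print(cgm_sample_sort([5, 3, 8, 1, 9, 2, 7, 4], 2))  # [1, 2, 3, 4, 5, 7, 8, 9]
```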

Permutation

Permutation of N items on a RAM has time complexity Θ(N). On the PDM this problem has I/O complexity Θ(min{N/D, (N/(BD))·log_{M/B}(N/B)}) (see the cited work). However, we can achieve I/O complexity O(N/(pBD)) by simulating algorithm CGMPermute for p = O(v), N ≥ v^(1+ε)·BD, and fixed constant ε. The slackness constraint N ≥ v^(1+ε)·BD comes from the routing algorithm LowSlackRouting.

Algorithm CGMPermute

V is an N-element vector containing items to be permuted; P is a corresponding N-element vector containing new indices for each element of V.

Input: Each processor i, 0 ≤ i < v, holds an (N/v)-element vector V_i containing elements i·(N/v)+1 to (i+1)·(N/v) of V, and an (N/v)-element vector P_i containing elements i·(N/v)+1 to (i+1)·(N/v) of P.

Output: Each processor i contains items i·(N/v)+1 to (i+1)·(N/v) of the permuted vector V.

Assumption: v divides N evenly.

(1) Each processor i, 0 ≤ i < v, sends the items of V_i to the processors holding the items indicated by P_i.

(2) Each processor performs the necessary rearrangements in its local memory to complete the calculation of the permutation.

End of Algorithm

CGMPermute performs the indicated permutation in a single communication round consisting of an (N/v)-relation. The internal computation time is O(N/v), and the memory used per processor is O(N/v).
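The two steps of CGMPermute can be sketched directly, simulating the v virtual processors in memory. The function name and data layout below are ours, chosen for illustration; 0-based indices are used throughout.

```python
def cgm_permute(V, P, v):
    """Permute V so that item V[j] ends up at index P[j]; v must divide len(V)."""
    N = len(V)
    assert N % v == 0
    chunk = N // v
    # Step 1: each processor i sends each of its items to the processor
    # owning the item's destination index (a single (N/v)-relation).
    inbox = [[] for _ in range(v)]
    for i in range(v):
        for j in range(i * chunk, (i + 1) * chunk):
            dest = P[j] // chunk
            inbox[dest].append((P[j], V[j]))
    # Step 2: each processor rearranges the received items in local memory.
    out = [None] * N
    for i in range(v):
        for idx, item in inbox[i]:
            out[idx] = item
    return out

# Example: rotate four items one position to the right on v = 2 processors.
print(cgm_permute(['a', 'b', 'c', 'd'], [1, 2, 3, 0], 2))  # ['d', 'a', 'b', 'c']
```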

Matrix Transp ose

Transposing an n × m matrix, where N = n·m, takes Θ(N) time on a RAM. On the Parallel Disk Model this problem has I/O complexity Θ((N/(BD))·(log min{M, n, m, N/B})/(log(M/B))). However, we can achieve I/O complexity O(N/(pBD)) by simulating algorithm CGMTranspose below, for N ≥ v^(1+ε)·BD. For ease of exposition we assume that v divides N.

Algorithm CGMTranspose

Let M be an n × m matrix and let a_{ij} be the element of M in row i and column j. Let M̃ be a one-dimensional array containing the elements of M row by row, i.e. a_{11}, a_{12}, ..., a_{1m}, a_{21}, ..., a_{nm}. Let M^T be the transpose of M, and let a^T_{ji} = a_{ij} be the element of M^T in row j and column i. Let M̃^T be a one-dimensional array containing the elements of M^T row by row, i.e. a_{11}, a_{21}, ..., a_{n1}, a_{12}, ..., a_{nm}.

Input: Processor i, 0 ≤ i < v, holds items i·(N/v)+1 to (i+1)·(N/v) of M̃.

Output: Processor i holds items i·(N/v)+1 to (i+1)·(N/v) of M̃^T.

(1) Each processor determines the destination processor for each of its items and sends them in a single superstep to their destination.

(2) Each processor inserts the received items into the appropriate positions in its memory.

End of Algorithm

CGMTranspose transposes the matrix M in a single communication round consisting of an (N/v)-relation. The internal computation time is O(N/v), and the memory used is O(N/v) per processor.
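The destination computation in step (1) is simple index arithmetic: element a_{ij}, stored at row-major position i·m + j of M̃, belongs at position j·n + i of M̃^T. The following in-memory sketch of the exchange is our own illustration (0-based positions):

```python
def cgm_transpose(Mflat, n, m, v):
    """Transpose an n-by-m matrix stored row-major in Mflat, across v processors."""
    N = n * m
    assert N % v == 0
    chunk = N // v
    inbox = [[] for _ in range(v)]
    # Step 1: route each item to the processor owning its transposed position.
    for pos in range(N):
        i, j = divmod(pos, m)   # a_ij sits at row-major position i*m + j ...
        tpos = j * n + i        # ... and belongs at position j*n + i of M^T
        inbox[tpos // chunk].append((tpos, Mflat[pos]))
    # Step 2: each processor places the received items locally.
    out = [None] * N
    for proc in inbox:
        for tpos, item in proc:
            out[tpos] = item
    return out

# 2 x 3 matrix [[1,2,3],[4,5,6]] -> 3 x 2 matrix [[1,4],[2,5],[3,6]]
print(cgm_transpose([1, 2, 3, 4, 5, 6], 2, 3, 2))  # [1, 4, 2, 5, 3, 6]
```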

Summary

Theorem and Figure summarize the performance of our EM-BSP algorithms for sorting, permutation and matrix transpose.

Theorem: Sorting, permutation and matrix transpose can be performed on a p-processor EM-CGM with M local memory in each processor in O(N/(pBD)) I/O operations, provided N ≥ v^(1+ε)·BD, for p = O(v) and constant ε > 0.

Proof: The proof follows from Theorem and the performance of the CGM algorithms for sorting, permutation and matrix transpose described in the preceding sections.

The Multisearch Problem

Overview

Earlier chapters described two simulation methods, one randomized and one deterministic, for obtaining EM-BSP algorithms from parallel algorithms for BSP models. In Chapter, our new optimality criteria were described. In the current section we describe research into multisearch algorithms for the EM-BSP models, in which we use the results of all three chapters. Thus, the primary purpose of this chapter is as a case study of the use and limitations of the simulation techniques of those chapters. We omit many of the details of the proofs in this chapter, as they can be found in the cited work.

In multisearch problems, a large number of queries are simultaneously processed and answered by navigating through a large data structure on a parallel computer. A closely related application is an Internet search engine. As input for the multisearch problem we consider a set of n queries and a balanced binary search tree T of size m. The search tree is distributed among the processors and stored in their local memories. For clarity and ease of exposition, we restrict our discussion to the case in which each processor has enough internal memory to keep its share of the queries completely in its internal memory. However, with minor changes, larger sets of queries can be accommodated in an I/O-optimal manner. Figure illustrates an example application of multisearch. In the example, a two-dimensional universe is divided into regions by lines in the plane. The queries are points, and the result of a query is the label of the region containing the point. Figure shows a tree built over the line segments which separate one region from another. The dots represent point queries. The queries navigate from the root of the tree to a particular leaf in order to determine a result. Examining the leaf level of the tree with the point queries superimposed, it is clear that sorting is insufficient for this problem (see the queries labelled q1 and q2, for instance).

(The reader is cautioned that our use of the variables m and n is different from the meanings often used in the external memory literature, where n = N/B and m = M/B.)

Figure: Overview of New EM Algorithms

Group A: Fundamental Algorithms. For each problem we list the PDM I/O complexity (notes a, b, c), the BSP model complexity (notes d, e, f), and the EM-BSP I/O complexity.

Sorting (see section):
  PDM I/O: Θ((N/(BD))·log_{M/B}(N/B)).
  BSP: O((N/v)·log N) computation, O(N/v) communication, O(1) rounds, O(N/v) memory.
  EM-BSP I/O: O(N/(pBD)).

Permutation (see section):
  PDM I/O: Θ(min{N/D, (N/(BD))·log_{M/B}(N/B)}).
  BSP: O(N/v) computation, O(N/v) communication, O(1) rounds, O(N/v) memory.
  EM-BSP I/O: O(N/(pBD)).

Matrix transpose (see section; note g):
  PDM I/O: Θ((N/(BD))·(log min{M, k, N/k, N/B})/(log(M/B))).
  BSP: O(N/v) computation, O(N/v) communication, O(1) rounds, O(N/v) memory.
  EM-BSP I/O: O(N/(pBD)).

Notes:
a. PDM I/O complexities as listed apply to general values for N, M, D and B. If the constraints required by our techniques are applied to these results, the term log_{M/B}(N/B) becomes a constant.
b. When the parameter D is quoted in the PDM I/O Complexity column, it refers to the number of disks overall, whereas the parameter D used in the EM-BSP I/O Complexity column refers to the number of disks on each of the p processors of the machine.
c. The PDM I/O complexities listed for this group apply also to multiple processors when the interconnection method is a shared RAM, hypercubic network, or cube-connected cycles.
d. The BSP running time is λ + g·μ + β·L, where λ is the parallel computation time and β is the number of BSP rounds; M is the peak local memory used by a processor of the BSP machine.
e. These results are subject to the conditions N ≥ v^(1+ε)·B and v ≥ D if the techniques of Chapter are used, and N ≥ v^(1+ε)·BD, p ≤ v, if the methods of Chapter are used. Each of the p real processors has O(k·N/p) memory, for integer k, 1 ≤ k ≤ v/p.
f. We use the following notation: μ is the amount of data communicated, λ is the parallel computation time, and β is the number of parallel I/O operations.
g. k and N/k denote the number of columns and rows, respectively, of the matrix, where N is the total number of elements.

Figure: Multisearch example. Point queries (e.g. q1, q2) select a region of a two-dimensional area divided by line segments.

Multisearch has been shown to be useful for solving a number of data structuring problems on a parallel machine, e.g. trapezoidal decomposition, next element search on hypercubes, multiple planar point location, interval trees, hierarchical representations of polyhedra on meshes, dictionaries on trees on an EREW-PRAM, maintaining balanced binary search trees on a BSP, and multiple point location in a class of hierarchical DAGs on a BSP and a BSP*. The BSP* multisearch algorithm due to Bäumker, Dittrich and Meyer auf der Heide, with improvements due to Dittrich, was the first optimal algorithm for multisearch on a BSP model for both small and large trees.

In external memory, multisearch has been studied for balanced binary trees and optimal trees. A major issue in external memory multisearch is deciding how to block the graph; blocking of graphs for external memory searching has been examined in the literature.

The batch filtering technique, described in Section, can be viewed as a multisearch technique that is I/O-optimal on a single processor machine when n ≥ m. For larger trees, however, batch filtering does more I/O than necessary, and it does not accommodate multiple processors.
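In outline, batch filtering pushes the whole batch of queries through the search structure one level at a time; the point of the real technique is that each block of a level then needs to be read from disk only once per batch rather than once per query. The following compact in-memory sketch of the routing logic is our own illustration (a complete binary search tree stored level by level), not the implementation from the cited work.

```python
def batch_filter(keys_by_level, queries):
    """Route all queries root-to-leaf through a complete binary search tree.

    keys_by_level[h] lists the split keys on level h, left to right; the
    return value gives, for each query, the index of the leaf it reaches.
    """
    node = [0] * len(queries)          # every query starts at the root
    for level in keys_by_level:        # advance the whole batch one level
        for q in range(len(queries)):
            # One comparison per level decides which child to descend into.
            go_right = queries[q] >= level[node[q]]
            node[q] = 2 * node[q] + (1 if go_right else 0)
    return node

# Split keys of a 3-level tree whose 8 leaves cover [0,10), [10,20), ...
tree = [[40], [20, 60], [10, 30, 50, 70]]
print(batch_filter(tree, [5, 25, 65]))  # [0, 2, 6]
```

As the text notes, sorting the queries alone is not enough; each query still has to follow its own root-to-leaf path.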

Our study of multisearch was prompted by its importance as a tool for various problems, but also because it highlights some of the issues regarding the optimality of algorithms under our EM-BSP models.

Section discusses work due to Rau-Chaplin and Dehne et al. on multisearch for the CGM model, and sketches our EM-CGM multisearch algorithm obtained by simulating the CGM algorithm.

Simulation of BSP multisearch gives us an efficient multisearch algorithm on the EM-BSP for small trees, discussed in Section.

While we are able to give a multisearch algorithm that is optimal and I/O-efficient for small trees, I/O-efficiency is not possible for large trees: there is a lower bound on the I/O complexity of multisearch on large trees (n < m) which exceeds the computational and communication costs. In Section, an I/O-optimal EM-BSP algorithm for large trees is sketched. While this algorithm is based on the same BSP multisearch algorithm as in Section, it is not derived via simulation but rather by hand-coding the I/O management at the points in the algorithm where it is necessary. The simulation approach involves swapping the contexts of the virtual processors of a BSP algorithm, and the contexts in this case would include space for the large search tree. This approach would therefore require the entire search tree to be read once for each superstep of the BSP algorithm. This is inefficient for large trees, and so the hand-coded approach is necessary for efficiency reasons. Details are available in the cited work.

Multisearch on the EM-CGM Model

Rau-Chaplin and Dehne et al. described the first efficient multisearch algorithm for a BSP model. Theorem describes the performance of their algorithm on a CGM machine for n ≥ m. This CGM multisearch algorithm does not accommodate larger trees efficiently.

Theorem (adapted from the cited work): Let Q be a set of n queries and let T be a search tree of size m. For n ≥ m, multisearch of Q against T can be done in time O((n/v)·log n + g·(n/v) + L) on a v-processor CGM machine with n/v ≥ v.

As mentioned earlier, multisearch is used as a subroutine for a number of computational geometry problems. In such a situation, the tree and the queries are often generated by a previous step of the higher-level algorithm. Such an example illustrates the usefulness of the CGM technique and of our external memory algorithms for the case n ≥ m. Using the results of Chapter, in particular Theorem, we obtain an EM-CGM algorithm with the following performance.

Corollary: Let Q be a set of n queries and let T be a search tree of size m. Multisearch of Q against T can be done in time O((n·log n)/p) + O(g·(n/p)) + O(G·(n/(pBD))) + O((v/p)·L) on a p-processor EM-CGM machine, for n ≥ m and n/v ≥ v, subject to the constraints of our simulation technique (p ≤ v, n ≥ v^(1+ε)·B and v ≥ D for fixed constant ε > 0), where each real processor has O(k·n/p) memory for arbitrary integer k, 1 ≤ k ≤ v/p.

(Dehne et al. use different terminology. For instance, they reverse our meanings of m and n, and they do not use the usual BSP-style expressions for stating time complexity.)

EM-BSP Multisearch

We now turn to multisearch on the EM-BSP model. We present two different kinds of results in the following. The first results (see Section) are applicable to smaller trees, as in the section above, and are obtained by applying the simulation techniques of earlier chapters to the BSP multisearch algorithm. Section then sketches a new multisearch algorithm for large trees, tailored to the EM-BSP; it does not rely on simulation, but is based on the same underlying BSP algorithm.

Theorem summarizes the performance of the BSP multisearch algorithm TreeSearch.

Theorem: Let T be a d-ary balanced search tree of size m, distributed among v processors by a preprocessing step. Let Q be a set of n queries, distributed evenly among the processors. Let ε > 0 and c be constants, and let b be the BSP* block size parameter, with b ≤ n/(v·log n) and d = O((n/(vb))^ε). Multisearch of Q against T using BSP algorithm TreeSearch uses O((n+m)/v) memory per processor and

    o((n/v)·log m) + O(g·(n/(vb))·log_d m + L·log_d v·log_d m)

running time, with probability 1 − n^{−c}. In particular, TreeSearch is optimal for L = o(n/(v·log v)) and g = o(b·log d).

EM-BSP Multisearch on a Small Tree

When the search tree is small, i.e., about the size of the query set, we can obtain a work-optimal and I/O-efficient algorithm by applying the simulation techniques of Chapter (Theorem) to the BSP multisearch algorithm of Theorem.

Theorem: Let ε > 0 be a constant. Given a d-ary tree T of size m = O(n·log n), distributed by suitable preprocessing, with d, b, B and D chosen as in the underlying BSP algorithm, multisearch for n queries against T can be done w.h.p. in time

    o((n·log m)/p) + O(g·(n/(pb))·log_d m + G·(m/(pBD)) + L·D·log(pD))

on a p-processor EM-BSP with internal memory of size O((n+m)/p + BD) per processor and external memory of size O(m/(pD)) per disk.

In particular, for m = O(n·log n), G = O(BD·(n·log m)/m), g = O(b·log n) and L = O((n·log m)/(p·D·log(pD))), the algorithm is work-optimal, communication-efficient and I/O-efficient.

Proof: We combine the two theorems noted above, observing that each virtual processor holds O((n+m)/v) data and so v = O(m).

The expression for running time in Theorem can be decomposed into four components, as follows. (Refined versions of the BSP multisearch algorithms presented in the original work can be found in later descriptions, on which we base the current work.)

1. The synchronization time L·D·log(pD) must be accounted to each of the computation, I/O and communication times. To be work-optimal, communication-efficient and I/O-efficient, we require that L·D·log(pD) = O((n·log m)/p).

2. Exclusive of synchronization time, the computation time is o((n·log m)/p) + O(m/p). For work-optimality we require that m/p = O((n·log m)/p).

3. The communication time is g·(n/(pb))·log_d m. To be communication-efficient we require that g·(n/(pb))·log_d m = O((n·log m)/p).

4. The I/O time is G·(m/(pBD)). To be I/O-efficient we require that G·(m/(pBD)) = O((n·log m)/p).

The overhead of the simulation with regard to work does not affect the asymptotic running time: EM-BSP multisearch is work-optimal for sufficiently small trees, i.e., m = O(n·log n). As in BSP multisearch, the communication time scales according to the number of EM-BSP processors, and is optimal for trees of size n^{O(1)}. The number of supersteps increases slightly compared to BSP multisearch. However, the internal memory can be kept small relative to m + n for a suitably large number of disks, which allows very large problem instances. Moreover, the external storage used per disk is optimal. The constraints on the block size B and the number of disks D are functions of n, which pose no practical limitations for suitably large problem instances.

EM-BSP Multisearch on a Large Tree

For trees of size ω(n·log n), the I/O time of the above algorithms becomes the dominating term of their running time complexity; I/O-efficiency is not possible for large ratios m/n.

For large trees, the binary tree T is partitioned into two parts, which are referred to as the upper part and the lower part, respectively. The upper part contains the nodes of all levels L_i for 0 ≤ i ≤ log n, and the lower part contains the nodes of levels L_i for log n < i ≤ log m. The root is on level L_0 and at height log m, while the leaves are on level L_{log m} and at height 0. Thus the upper part contains O(n) nodes, and the lower part contains the remaining O(m) nodes, in subtrees of size O(m/n) each.
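The node counts are easy to verify: the levels down to the cut hold 2^(log n + 1) − 1 ≈ 2n nodes, and everything below is the lower part. A quick check (with illustrative power-of-two values of our own choosing):

```python
import math

def partition(m, n):
    """Split a complete binary tree of m nodes at level log2(n).

    Returns (upper, lower): node counts of the two parts. Illustrative
    sketch; m + 1 and n are assumed to be powers of two, with n <= m.
    """
    cut = int(math.log2(n))       # last level kept in the upper part
    upper = 2 ** (cut + 1) - 1    # a complete tree of log2(n) + 1 levels
    return upper, m - upper

up, low = partition(m=2**20 - 1, n=2**10)
assert up == 2 * 2**10 - 1        # O(n) nodes in the upper part
assert low == (2**20 - 1) - up    # the remaining O(m) nodes lie below the cut
```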

In a preprocessing phase, a search tree T' with node degree d is constructed from T and is distributed among the processors using the layered block mapping. Within a processor, each supernode is broken into subtrees of size B, which are distributed among the disk drives. The tree is much too large to be stored completely in the internal memory. We assume that each processor has enough internal memory to keep its share of the queries completely in its internal memory; however, with straightforward changes, larger sets of queries can be accommodated in an I/O-optimal manner. Theorem gives the performance of the search phase of EM-BSP multisearch on a large tree (m > n).

Theorem: Let ε > 0 and c be constants. Given n queries and a d-ary tree T of size m, distributed among p processors by a preprocessing step. Let n ≥ D·b·p·log n and B ≤ d. Multisearch for n queries against T can be done on an EM-BSP with p processors and D disks in time

    o((n·log m)/p) + O((n·log(m/n))/(p·log B) + g·(n·log m)/(p·b·log d))
    + O(G·((n·log m)/(p·B·D) + (n·log(m/n))/(p·D·log B)) + L·(log p)/(log(n/(pb))·log d))

using O(n/p) internal memory per processor and O(m/(pD)) external memory per disk, with probability 1 − n^{−c} for constant c.

Proof (Sketch): EM-BSP multisearch is a variant of BSP multisearch, and inherits its properties regarding communication time, computation time and number of supersteps. External memory space consumption is optimal, as it requires O(m/(pD)) space on each of the pD disks. Unfortunately, the I/O costs dominate the overall running time for large trees, so EM-BSP multisearch is not I/O-efficient for large trees. However, the cited work shows a lower bound proving that the I/O time of EM-BSP multisearch is asymptotically optimal; in other words, this lower bound shows that I/O-efficiency is not feasible for large trees. In the light of this lower bound, EM-BSP multisearch is work-optimal, communication-efficient and I/O-optimal for suitable parameter constellations.

Summary

We have derived new parallel EM-BSP multisearch algorithms for static balanced binary trees. Our multisearch algorithms involve a single preprocessing phase, after which multiple search phases can be performed.

For search trees up to size O(n·log n), where n is the number of queries, we obtained EM-BSP and EM-CGM multisearch algorithms which are work-optimal, I/O-efficient and, in the case of multiple processors, communication-efficient. These algorithms are obtained via the simulation techniques described in the earlier chapters.

For larger trees, we obtained an EM-BSP algorithm which is simultaneously work-optimal and I/O-optimal. When multiple processors are present, the algorithm is also communication-efficient.

Batch filtering, described in Section, was introduced as a sequential EM technique for executing a collection of simultaneous queries in search structures modeled as planar layered directed acyclic graphs (DAGs). Batch filtering is therefore an EM multisearch technique. In this chapter we restricted our attention to multisearch in balanced binary trees. While batch filtering is I/O-optimal for sequential versions of these problems when m = O(n·log n), where n is the number of queries and m is the size of the data structure to be queried, we consider data structures larger than O(n·log n), and also parallel processing models of computation. Our EM multisearch techniques can therefore be viewed as extending the EM paradigm of batch filtering for balanced trees from the m = O(n) case to include also m = ω(n), and from the RAM model to BSP-like parallel computation models. We will refer to the extended technique as multifiltering, while recognizing that it may itself invite extensions to other data structures, such as hierarchical DAGs.

We also considered larger data structures than can be accommodated optimally by the earlier techniques. The work of the algorithm remains an important aspect of our complexity measure for external memory algorithms, but achieving c-optimality in I/O may not be possible due to the blocking factor of the disks. Our algorithms for EM-BSP multisearch are optimal in the sense defined earlier; depending on the size of the data structure, they are also either I/O-efficient or I/O-optimal.

We note that the simulation techniques introduced in the earlier chapters can be used to create efficient parallel EM algorithms from a large class of BSP algorithms. In the current chapter we used these results in the application of multisearch, but we also considered larger data structures than can be accommodated optimally by simulation.

The results obtained by the BSP simulation are the first parallel, work-optimal and I/O-efficient external memory algorithms for multisearch. The EM-BSP multisearch algorithm for large trees is the first multisearch algorithm for a realistic parallel external memory model that is efficient for large trees in terms of work, I/O and communication.


Other New External Memory Algorithms

The figures below list a number of other important problems arising in computational geometry, GIS and graph algorithms for which we report EM-CGM algorithms created by our technique, together with their I/O complexity and that of the previously best known algorithm for the problem. In the first of these figures we report the same I/O complexities as previously known, after accounting for our parameter restrictions; the I/O complexities we report in the second are not as competitive with previous results. In each case, however, our algorithm is scalable not only in terms of the number of disks per processor, but also in terms of the number of processors used. Previous algorithms were often not efficient in a multiprocessor environment, and in many cases it is not clear how they could be adapted to parallel disks.

The problems listed in Table are stated briefly below. Please consult the literature for applications and more complete definitions of these problems, and for related previous results on RAM and PRAM computation models.

- Given a simple polygon S, the triangulation problem is to partition the interior of S into a set of triangles by joining vertices of S with non-overlapping straight line segments.

- Let S be a set of line segments in the plane. The trapezoidation problem is to decompose the plane into a set of trapezoids, based on the arrangement of the line segments.

- A segment tree is a data structure used to organize a set of line segments and to support various kinds of queries against the set.

- Let S be a set of n non-intersecting line segments in the plane, and let Q be a set of m query points. The next element search problem is to find, for each query point q_i in Q, the line segment directly above q_i. The endpoint dominance problem is a special case of the next element search problem, in which the query set is composed of the endpoints of the line segments themselves.

- Let G be an embedding of a graph in the plane. The batched planar point location problem is to find, for each of a set of N query points q_i, 1 ≤ i ≤ N, the face of G which contains q_i.

- Given a set S of points in 2-dimensional (3-dimensional) space, the 2D (3D) convex hull problem is to find the convex polygon (polytope) which contains all points in S and whose vertices are points of S.

- Given a set S = {s_1, ..., s_N} of N points in two-dimensional space, the 2D Voronoi diagram problem is to find a partition of the plane into N regions R_1, ..., R_N such that, for each i, 1 ≤ i ≤ N, region R_i contains a single point s_i of S, and all the other points in R_i are closer to s_i than to any other point of S.

- Given a set S of N points in two-dimensional space, the Delaunay triangulation problem is to find a triangulation of the points in S such that, for each such triangle, the circle through its vertices does not contain any other point of S.

- Given a set S of n non-intersecting line segments in the plane, the lower envelope problem consists of computing the set of segment portions visible from below, i.e. from the point (0, −∞). The generalized lower envelope problem is similar, except that segments in S may intersect.

- Given a set R of isothetic rectangles, the measure problem is to compute the area covered by the union of R.

- Consider a set S of n points in 3-space. For a point v, let x(v), y(v) and z(v) denote the x-coordinate, y-coordinate and z-coordinate, respectively, of v. Point v dominates a point w iff x(v) ≥ x(w), y(v) ≥ y(w) and z(v) ≥ z(w). A point is maximal in S if it is not dominated by any other point of S. The 3D-maxima problem consists of determining the set of all maximal points in S.

- Given a set S of n points in the Euclidean plane, the all nearest neighbors problem is to determine, for each point v in S, its nearest neighbor NN_S(v) in S, where NN_S(v) is a point w in S \ {v} such that dist(v, w) ≤ dist(v, u) for all u in S \ {v}.

- Let S be a set of n points in the plane, with some weight w(v) assigned to each v in S. The 2D-weighted dominance counting problem consists of determining, for each v in S, the total weight of all points which are dominated by v.

- Let S be a set of r pairwise disjoint m-vertex polygons. The unidirectional separability problem consists of determining all directions d such that S is separable by a sequence of r translations in direction d, one for each polygon. The multidirectional separability problem consists of determining whether S is separable by a sequence of r translations in different directions.

The problems listed in Table are stated briefly below. Please consult the literature for applications and more complete definitions of these problems, and for related previous results on the RAM and PRAM computation models.

- Given a linked list of N objects, the list ranking problem is to compute, for each object, its distance from the beginning of the list.
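List ranking is the classic warm-up for this family of problems: with pointer jumping, each node's known distance doubles its reach every round, so O(log N) rounds suffice. The minimal in-memory sketch below is our own illustration (the parallel and external memory versions in the literature are more involved); it computes each node's distance to the tail, from which the rank from the head follows by subtraction.

```python
def list_rank(succ):
    """Rank each node of a linked list given as a successor array.

    succ[i] is the node after i, or None at the tail; returns dist[i],
    the number of links from i to the tail, by pointer jumping.
    """
    n = len(succ)
    dist = [0 if succ[i] is None else 1 for i in range(n)]
    nxt = list(succ)
    changed = True
    while changed:                      # O(log n) jumping rounds
        changed = False
        new_d, new_n = list(dist), list(nxt)
        for i in range(n):
            if nxt[i] is not None:      # jump over the current successor
                new_d[i] = dist[i] + dist[nxt[i]]
                new_n[i] = nxt[nxt[i]]
                changed = changed or new_n[i] is not None
        dist, nxt = new_d, new_n
    return dist

# List 3 -> 0 -> 2 -> 1 (tail): distances to the tail.
print(list_rank([2, None, 1, 0]))  # [2, 0, 1, 3]
```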

- Given a connected graph G = (V, E), the Euler tour problem is to find a cycle that traverses each edge of G exactly once.

- Given a rooted tree G = (V, E) and a pair of vertices u, v of G, the lowest common ancestor problem is to find the vertex w of G that is an ancestor of both u and v and is farthest from the root.

- The tree contraction technique is a method for constructing parallel algorithms on trees, working from the bottom up.

- The expression tree evaluation problem is to evaluate the result of an arithmetic expression represented as an expression tree.

- Given an undirected graph G = (V, E), a connected subset of vertices is a subset of vertices in which there is a path in G between each pair of vertices. The connected components problem is to find the maximal connected subsets of vertices in G.

- Given a planar graph G = (V, E), the spanning forest problem is to form a spanning tree for each connected component of G.

- Given a connected undirected graph G = (V, E), the ear decomposition problem is to find an ordered partition of E into r simple paths P_1, ..., P_r such that P_1 is a cycle and, for each i, 2 ≤ i ≤ r, P_i is a simple path whose endpoints belong to P_1 ∪ ... ∪ P_{i−1}, but with none of its internal vertices belonging to P_j for j < i. The open ear decomposition problem is similar, but none of the P_i for i ≥ 2 is a cycle.

- Given a connected undirected graph G = (V, E), the biconnected components problem is to find the maximal connected subsets of vertices of G which remain connected when any single vertex is removed.

Figure: Overview of New EM Algorithms (continued)

Group B: GIS and Computational Geometry Algorithms. For each problem we list the PDM I/O complexity (notes a, b), the BSP model complexity (notes c, d), and the EM-BSP I/O complexity.

Polygon triangulation; trapezoidal decomposition; segment tree construction; next element search on line segments (note e):
  PDM I/O: O((N/B)·log_{M/B}(N/B)).
  BSP: O((N·log N)/v) computation, O(N/v) memory.
  EM-BSP I/O: O((N·log N)/(pBD)).

Batched planar point location:
  PDM I/O: O(((N+K)/B)·log_{M/B}(N/B)).
  BSP: O((N·log N)/v) computation, O(N/v) memory.
  EM-BSP I/O: O((N·log N)/(pBD)).

2D and 3D convex hull; 2D Voronoi diagram; Delaunay triangulation (note f):
  PDM I/O: O((N/B)·log_{M/B}(N/B)).
  BSP: O((N·log N)/v) computation, O(N/v) memory.
  EM-BSP I/O: O((N·log N)/(pBD)).

Lower envelope of non-intersecting line segments:
  BSP: O((N·log N)/v) computation, O(N/v) memory.
  EM-BSP I/O: O((N·log N)/(pBD)).

Generalized lower envelope:
  As for the lower envelope, with an additional α(N) (inverse Ackermann) factor.

Area of the union of rectangles; 3D-maxima; 2D-nearest neighbors:
  PDM I/O: O((N/B)·log_{M/B}(N/B)).
  BSP: O((N·log N)/v) computation, O(N/v) memory.
  EM-BSP I/O: O((N·log N)/(pBD)).

2D-weighted dominance counting; uni-/multidirectional separability:
  BSP: O((N·log N)/v) computation, O(N/v) memory.
  EM-BSP I/O: O((N·log N)/(pBD)).

Notes:
a. PDM I/O complexities as listed apply to general values for N, M, D and B. If the constraints required by our techniques are applied to these results, the term log_{M/B}(N/B) becomes a constant.
b. When the parameter D is quoted in the PDM I/O Complexity column, it refers to the number of disks overall, whereas the parameter D used in the EM-BSP I/O Complexity column refers to the number of disks on each of the p processors of the machine.
c. The BSP running time is λ + g·μ + β·L, where λ is the parallel computation time, μ is the amount of data communicated, and β is the number of BSP rounds; M is the peak local memory used by a processor of the BSP machine.
d. These results are subject to the conditions N ≥ v^(1+ε)·B and v ≥ D if the techniques of Chapter are used, and N ≥ v^(1+ε)·BD, p ≤ v, if the methods of Chapter are used. Each of the p real processors has O(k·N/p) memory, for integer k, 1 ≤ k ≤ v/p.
e. The PDM I/O complexities for this group are documented for the single processor, single disk case. We are not aware of multidisk or multiprocessor extensions.
f. The PDM I/O complexities for these problems apply to a RAM or PRAM.

Figure: Overview of New EM Algorithms (continued)

Group C: Graph Algorithms. For each problem we list the PDM I/O complexity (notes a, b), the BSP model complexity (notes c, d), and the EM-BSP I/O complexity.

List ranking; Euler tour of a tree; lowest common ancestor; tree contraction; expression tree evaluation:
  PDM I/O: O((N/B)·log_{M/B}(N/B)).
  BSP: O((N/v)·log v) computation, O(log v) rounds, O(N/v) memory.
  EM-BSP I/O: O((N·log v)/(pBD)).

Connected components; spanning forest; ear and open ear decomposition; biconnected components (note e):
  PDM I/O: O(max{1, log log(V·B·D/E)}·(E/(DB))·log_{M/B}(V/B)).
  BSP: O(((V+E)/v)·log v) computation, O(log v) rounds, O((V+E)/v) memory.
  EM-BSP I/O: O(((V+E)·log v)/(pBD)).

Notes: a through d are as in the previous figure.
e. For a graph of V vertices and E edges.

Chapter

Experiments

The growing theory of external memory algorithms is less interesting to practitioners than a theoretician might expect. The main reasons lie in the difficulties which surround implementation of these algorithms on real machines. Often it is extremely time consuming to verify whether a theoretically attractive algorithm is also capable of being attractive in practice. A new algorithm may apparently perform poorly for many reasons, which may include:

- resource contention between the program and other tasks running concurrently;
- incorrect implementation of the algorithm;
- large constants in the algorithm's time complexity;
- inappropriate choice of algorithm parameters;
- conflicts between the assumptions of the model under which the algorithm was conceived and the realities of the machine on which it is implemented.

Some of these situations can themselves have a number of cases. For instance, the first item above may be caused by other application programs (which may be avoidable) or by tasks spawned by the operating system. The resources in contention can be internal memory, I/O buffers and devices, or a processing unit. Resolution, or even identification, of such a conflict may require insight into the internal workings of the particular operating system and the particular hardware devices used.

Recently there has been an increasing interest in implementation and experimental research work targeted to I/O efficient computation. Research work in this area includes:

- The TPIE (Transparent Parallel I/O Environment) project of Vengroff and Vitter, which aims to collect implementations of existing algorithms within a common framework, and to make development and validation of new implementations easier.
- Experiments by Chiang with four algorithms for the orthogonal segment intersection problem.
- Cormen et al. have reported on a number of implementation issues and results relating to I/O efficient algorithms, including FFT computations using parallel processors, and FFT permutations and sorting using the Parallel Disk Model.
- LEDA-SM, a project which aims to make the LEDA library of computational geometry algorithms and data structures compatible with external memory applications.

As part of the research for this dissertation, two implementation studies were undertaken. The first study was, to the author's knowledge, the first full implementation of the buffer tree external memory data structure due to Arge. This study is described in Section , and is also presented in .

The second implementation project, discussed in Section , involved implementation of the deterministic simulation described in Chapters  and . The implementation design and the experiments described are joint work with Kumanan Yogaratnam, and are also mentioned in .

Buffer Tree Case Study

Introduction

Motivated by the goal of constructing I/O efficient versions of commonly used internal memory data structures, Arge proposed the data structuring paradigm, and in particular the buffer tree. A buffer tree is an external memory search tree. It supports operations such as insert, delete, search, and deletemin, and it enables the transformation of a class of internal-memory algorithms to external memory algorithms by exchanging the data structures used. A large number of external memory algorithms have been proposed using the buffer tree data structure, including sorting, priority queues, range trees, segment trees, and time forward processing. These in turn are subroutines for many external memory graph algorithms, such as expression tree evaluation, centroid decomposition, least common ancestor, minimum spanning trees, and ear decomposition. There are a number of major advantages to the buffer tree approach. It applies to a large class of problems whose solutions use search trees as the underlying data structure. This enables the use of many normal internal memory algorithms, and hides the I/O specific parts of the technique in the data structures. Several techniques based on the buffer tree (e.g., time forward processing) are simpler than competitive EM techniques, and are of the same I/O complexity or better with respect to their counterparts.

In this section we present an implementation of the buffer tree, and show the flexibility and generality of the structure by implementing EM sorting and an EM priority queue. To test the efficiency of our implementation we use sorting as an example. For data sets larger than the available main memory, our implementation of buffer tree sort outperforms internal memory sort (e.g., qsort) by a large and increasing margin. We use an EM merge sort algorithm from TPIE to provide comparative performance results for the larger problem sizes. We observe that certain parameters suggested in  may not provide the best results in practice. By tuning these parameters we obtained improved results while maintaining the same asymptotic worst case I/O complexity. The buffer tree is a conceptually simple data structure, and it turned out that implementation of these applications based on the buffer tree was straightforward. Therefore we can support the claim that the buffer tree is a generic EM data structure that performs well in theory and practice.

Description of the Buffer Tree

We first describe the buffer tree data structure of Arge, with two update operations, namely insert and delete. Subsequently we will discuss how we can perform sorting and maintain a priority queue using a buffer tree.

Let N be the total number of update operations, M be the size of the internal memory, and B be the block size, and set m = M/B and n = N/B. The buffer tree is an (a,b)-tree, with a = m/4 and b = m, augmented with a buffer in each node of size m/2 blocks. Each node, with the exception of the root, has a fan-out (number of children) between m/4 and m. Each node also contains partitioning elements, or splitters, which delimit the range of keys that will be routed to each child. The number of splitters is one less than the fan-out of the node. The height of the buffer tree is O(log_m n); see Figure . Since the buffer tree is an extension of the (a,b)-tree, the computational complexity analyses of the various (a,b)-tree operations still apply. The buffers are used to defer operations, to allow their execution in a lazy manner, thus achieving the necessary blocking for performing operations efficiently in external memory. A buffer is full if it has more than m/2 blocks.

For any update operation, a request element is created, consisting of the record to be inserted or deleted, a flag denoting the type of the operation, and an automatically generated time stamp. Such request elements are collected in the internal memory until a block of B requests has been formed. The request elements, as a block, are inserted into the buffer of the root using one I/O. If the buffer of the root contains

[Figure : The buffer tree. The root and internal nodes each carry a buffer; the tree has height O(log_m n), with leaves of one disk block each.]

less than m/2 blocks, there is nothing else to be done in this step. Otherwise the buffer is emptied by a buffer-emptying process.
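The request-batching step just described can be sketched as follows. This is a minimal, in-memory illustration of the rule (collect a block of B request elements, then spend one I/O pushing them into the root buffer); the names and types are illustrative, not those of the thesis implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A request element as described above: the record (key plus data), a flag
// for the operation type, and an automatically generated time stamp.
enum class Op : std::uint32_t { Insert, Delete };

struct Request {
    std::int32_t  key;
    std::int32_t  data;
    Op            op;
    std::uint32_t timestamp;  // assigned in arrival order
};

// Requests are collected in internal memory until a block of B of them has
// formed; only then is one (simulated) I/O spent pushing the block into the
// buffer of the root.
class RootBufferWriter {
public:
    explicit RootBufferWriter(std::size_t B) : B_(B) {}

    // Returns true when this request completed a block and triggered a push.
    bool submit(std::int32_t key, std::int32_t data, Op op) {
        pending_.push_back(Request{key, data, op, nextStamp_++});
        if (pending_.size() == B_) {
            rootBuffer_.push_back(pending_);  // stands in for one block write
            pending_.clear();
            return true;
        }
        return false;
    }

    std::size_t blocksPushed() const { return rootBuffer_.size(); }

private:
    std::size_t B_;
    std::uint32_t nextStamp_ = 0;
    std::vector<Request> pending_;
    std::vector<std::vector<Request>> rootBuffer_;
};
```

Only one I/O per B requests reaches the root buffer; all finer-grained work is deferred.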

The buffer-emptying process at an internal node requires O(m) I/Os, since we load the buffer's blocks into the internal memory and distribute the elements among the (up to m) children of that node. A buffer-emptying process at a leaf may require rebalancing the underlying (a,b)-tree. An (a,b)-tree is rebalanced by performing a series of splits, in the case of an insertion, or a series of fuse and share operations, in the case of a delete. Before performing a rebalance operation, we ensure that the buffers for the corresponding nodes are empty. This is achieved by first doing the buffer-emptying process at the node involved. The deletion of a block may involve the initiation of several buffer-emptying processes. By using dummy blocks during the deletion process, a buffer-emptying process can be protected from interference by other processes. See  for details.

The analysis (i.e., the I/O complexity) of operations on the buffer tree is obtained by adapting the amortization arguments for (a,b)-trees. Each update element, on insertion into the root buffer, is given O((log_m n)/B) credits. Each block in the buffer of node v holds O(the height of the tree rooted at v) credits. For an internal node, its buffer is emptied only if it gets full, and moreover this requires O(m) I/Os. Therefore, ignoring the cost of rebalancing, the total cost of all buffer emptying on internal nodes is bounded by O(n log_m n) I/Os. The total number of rebalance operations required in an (a,b)-tree where b ≥ 2a, over K update operations on an initially empty (a,b)-tree, is bounded by O(K/(b/2 − a)). Therefore, for N update operations on an (m/4, m)-tree, the total number of rebalance operations is bounded by O(n/m). Moreover, each rebalance operation may require a buffer-emptying process, as well as updating the partitioning elements, and therefore may require up to O(m) I/Os. Thus the total cost of rebalancing is O(n) I/Os. The cost of emptying leaf nodes is bounded by the sorting operation. We summarize:


Theorem  (Arge) The total cost of an arbitrary sequence of N intermixed insert and delete operations on an initially empty buffer tree is O(n log_m n) I/O operations.
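In symbols, the accounting above combines as follows (a sketch using the bounds as reconstructed in this section, with n = N/B and m = M/B):

```latex
\underbrace{O\!\left(n \log_m n\right)}_{\text{buffer emptyings at internal nodes}}
\;+\;
\underbrace{O\!\left(\tfrac{n}{m}\right) \cdot O(m)}_{\text{rebalance operations } \times \text{ cost of each}}
\;=\; O\!\left(n \log_m n\right) \text{ I/Os.}
```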

Sorting. A buffer tree can be used to sort N items as follows. First insert the N items into the buffer tree, followed by an empty/write operation. This is accomplished by performing a buffer-emptying process on every node, starting at the root, followed by reporting the elements in all the leaves in the sorted order. This can be done within the complexity of computing the buffer tree data structure.

Corollary  (Arge) N elements can be sorted in O(n log_m n) I/O operations using the buffer tree.
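The sorting recipe of the corollary, insert everything and then force-empty, can be expressed against a minimal stand-in interface. The ToyBufferTree below merely illustrates the control flow (lazy inserts, deferred work at force-empty, then a left-to-right leaf scan); it is an in-memory toy, not the thesis implementation.

```cpp
#include <algorithm>
#include <vector>

// A toy in-memory stand-in for the buffer tree interface: insertions are
// merely buffered, "force empty" performs the deferred work (here, a sort),
// and the leaves can then be scanned left to right.
class ToyBufferTree {
public:
    void insert(int key) { pending_.push_back(key); }  // lazy
    void forceEmptyAllBuffers() {                      // deferred work
        std::sort(pending_.begin(), pending_.end());
    }
    const std::vector<int>& scanLeavesLeftToRight() const { return pending_; }
private:
    std::vector<int> pending_;
};

// The sorting recipe: insert all items, force-empty, scan the leaves.
template <typename Tree>
std::vector<int> bufferTreeSort(Tree& t, const std::vector<int>& in) {
    for (int x : in) t.insert(x);
    t.forceEmptyAllBuffers();
    return t.scanLeavesLeftToRight();
}
```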

The PDM compares competing EM algorithms according to the asymptotic number of I/O operations they require to solve a given problem of size N. By this model, the buffer tree sorting algorithm is optimal, as the number of I/O operations matches the lower bound Θ(n log_m n) for the sorting problem. In practice, however, other factors can also affect the running time. Cormen and Hirschl observe that many PDM applications are not I/O bound, which suggests that CPU time is an important factor to be considered. The EM-BSP, EM-BSP* and EM-CGM models incorporate both I/O and CPU time; see Chapter .

Priority Queues. A dynamic search tree can be used as a priority queue, since in general the leftmost leaf of the search tree contains the smallest element. We can use the buffer tree for maintaining a priority queue in external memory by permitting the update operation described previously for insertion into the priority queue, and adding a deletemin operation. It is not necessarily true that the smallest element is in the leftmost leaf in the buffer tree, as it could be in the buffer of any node on the path from the root to the leftmost leaf. In order to extract the minimum element, i.e., execute the deletemin operation, a buffer-emptying process must first be performed on all nodes on the path from the root to the leftmost leaf. After the buffer emptying, the leftmost leaf and the children of the leftmost node in the buffer tree consist of at least the (m/4)·B smallest elements. These elements can be kept in the internal memory, and at least (m/4)·B deletemins can be answered without doing any additional I/O. In order to obtain correct results for future deletemins, any new insertion/deletion must be checked first against these elements in the internal memory. This realization of a priority queue does not support the changing of priorities of elements already in the queue.
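A sketch of the caching rule just described: after a simulated path emptying, the smallest elements sit in an internal-memory heap; new requests at or below the cache ceiling go directly to that heap, and the remainder go to the (here simulated) buffer tree. The class name, the refill size, and the use of a multiset as a stand-in for the on-disk tree are illustrative assumptions, not the thesis code.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <set>
#include <vector>

class CachedMinPQ {
public:
    explicit CachedMinPQ(std::size_t refill) : refill_(refill) {}

    void insert(int key) {
        // New requests are first checked against the cached elements.
        if (!cache_.empty() && key <= ceiling_) cache_.push(key);
        else tree_.insert(key);   // stand-in for insertion into the buffer tree
    }

    int deleteMin() {             // precondition: !empty()
        if (cache_.empty()) refillCache();  // simulates the path emptying
        int k = cache_.top();
        cache_.pop();
        return k;
    }

    bool empty() const { return cache_.empty() && tree_.empty(); }

private:
    void refillCache() {
        for (std::size_t i = 0; i < refill_ && !tree_.empty(); ++i) {
            auto it = tree_.begin();  // smallest remaining key
            ceiling_ = *it;           // last one moved is the largest cached
            cache_.push(*it);
            tree_.erase(it);
        }
    }

    std::size_t refill_;
    int ceiling_ = 0;
    std::priority_queue<int, std::vector<int>, std::greater<int>> cache_;
    std::multiset<int> tree_;
};
```

The invariant is that every cached key is at most the ceiling and every key in the tree is at least the ceiling, so the cache top is always the global minimum.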

For a single disk, n is the number of disk blocks in the problem; m is the number of disk blocks that fit into the memory of size M.

Theorem  (Arge) The total cost of an arbitrary sequence of N insert, delete, and deletemin operations on an initially empty buffer tree is O(n log_m n) I/O operations.

Implementation of the Buffer Tree

The (a,b)-Tree. The buffer tree is an (a,b)-tree with buffers added to each tree node. One source of code for an (a,b)-tree is LEDA. It quickly became clear, however, that this code was designed specifically for internal memory usage, as nodes were linked by many pointers to support a wide range of higher level operations. Converting the various pointers of the (a,b)-tree implementation to external memory representations turned out to be time consuming and ineffective for a data structure that was required to be I/O efficient. Too many I/O operations were required to update a single field in a child or parent of the node in memory to give attractive EM performance.

The Buffers. Each node of the buffer tree has an associated buffer, which may contain between 0 and m/2 blocks of data which have not yet been inserted into the (a,b)-tree part of the buffer tree. The fan-out of an internal node is at most m. Therefore, for a buffer tree consisting of several levels, there may exist a large number of buffers, each up to (m/2)·B bytes in size.

Our implementation currently models each buffer as a Unix file. Due to restrictions on the number of Unix files that could be open at a time, each buffer file is closed after use and reopened when necessary. The time required by file open and close operations, as measured in our tests, was small. However, preliminary experiments suggest that this scheme may limit the effectiveness of asynchronous I/O, since the file close operation must wait for any outstanding I/O to complete. In this thesis we report primarily on our experiences using synchronous I/O.
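The one-file-per-buffer scheme can be sketched as below: each block push opens the node's file, appends one block, and closes it again, so that only one buffer file need be open at a time. The file naming and layout here are illustrative assumptions, not the thesis code.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Each node buffer is modelled as a file that is opened, appended to, and
// closed again on every block push (kept closed because of the limit on the
// number of simultaneously open files).
class FileBuffer {
public:
    explicit FileBuffer(std::string path) : path_(std::move(path)) {
        std::FILE* f = std::fopen(path_.c_str(), "wb");  // start empty
        if (f) std::fclose(f);
    }

    // One block push: open, append one block, close (synchronous I/O).
    bool pushBlock(const std::vector<char>& block) {
        std::FILE* f = std::fopen(path_.c_str(), "ab");
        if (!f) return false;
        std::size_t n = std::fwrite(block.data(), 1, block.size(), f);
        std::fclose(f);
        return n == block.size();
    }

    long sizeBytes() const {
        std::FILE* f = std::fopen(path_.c_str(), "rb");
        if (!f) return -1;
        std::fseek(f, 0, SEEK_END);
        long n = std::ftell(f);
        std::fclose(f);
        return n;
    }

private:
    std::string path_;
};
```

Because the close must wait for outstanding writes, this pattern serializes I/O, which is consistent with the limitation on asynchronous I/O noted above.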

Compatibility, Usability and Accessibility. We wanted our implementation to be compatible with the use of LEDA and with TPIE. The large number of algorithms and data structures available in LEDA forms an attractive context for implementation of internal memory algorithms, which are often components of a larger external memory system. For example, the external memory priority queue uses an internal memory priority queue as a component. The collection of efficient external memory techniques provided by TPIE forms an attractive workbench for building and testing new EM implementations. We chose C++ as our implementation language to preserve potential compatibility with these code libraries. In addition, we adopted the automated documentation tools from LEDA.

(Footnote: This may increase temporarily during a buffer emptying process.)

EM Sorting Using the Buffer Tree. A buffer tree can be made to sort a data set simply by inserting the data into the tree and then force-emptying the buffers. Our implementation allows the leaves to be read sequentially from left to right; thus the implementation of sorting was straightforward.

An EM Priority Queue Using the Buffer Tree. The buffer tree can be modified to construct an I/O-optimal priority queue with insert and deletemin operations in EM. Our implementation is obtained as follows:

- The leftmost leaf node of the buffer tree, together with its associated m/4 to m leaf blocks, is kept in internal memory instead of on disk. The (a,b)-tree nodes on the path from the root to the leftmost leaf are also kept in memory. For typical values of M, N, B, a and b, these (a,b)-tree nodes make a negligible impact on internal memory consumption.
- The data records in memory are organized using an appropriate internal memory priority queue. In our initial implementation this is a conventional heap, originally obtained from LEDA.
- Any requests that would normally be inserted into the root buffer of the buffer tree are first compared to the leftmost splitters of the (a,b)-tree nodes on the path from the root to the leftmost leaf. As argued above, this gives a small constant number of comparisons in practice. If a request would be routed to the cached data blocks, it is inserted directly into the internal heap. Otherwise it is inserted into the buffer tree in the normal fashion.
- The balance of the underlying (a,b)-tree is maintained in the normal fashion. If the leftmost leaf node underflows, sharing or fusing with siblings will occur. If it overflows, splitting may occur, as it would in an (a,b)-tree.

Testing Platform and Parameters. We performed most of our development and experimental work on a network of sixteen  MHz Pentiums, each with  MB of internal memory and a pair of  GB hard disks used exclusively for data storage. A central  GB hard disk is available via NFS for program storage. The processors each run the Linux operating system. Although we did not use it, except for NFS access to the central program storage, the processors are interconnected by a fast ethernet switch. We found that our experimental timings were reproducible between processors, and so we were able to run multiple timing tests independently, simultaneously, and reliably on this platform.

We chose 16 bytes as the record size for our tests. A record (sometimes we will call it an element) consisted of four integer (4-byte) fields: a key, an associated data field, a time stamp, and an operation type (insert/delete/query) field. We chose k_B = 4096 bytes, since this was the system page size. We chose k_M =  KB (m = ), which is small, but allowed us to choose manageable problem sizes from a disk space point of view, yet still apply stress to our algorithms.
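The record layout described above, four 4-byte integer fields, can be pinned down with a static assertion (the field names are illustrative):

```cpp
#include <cstdint>

// The 16-byte test record: four 4-byte integer fields.
struct Record {
    std::int32_t key;        // sort key
    std::int32_t data;       // associated data field
    std::int32_t timestamp;  // automatically generated
    std::int32_t opType;     // insert / delete / query
};

static_assert(sizeof(Record) == 16, "four 4-byte fields pack to 16 bytes");
```

With k_B = 4096 bytes, each disk block then holds 256 such records.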

Evaluating the Implementation

In order to obtain meaningful performance results, we attempted to control the following factors.

Contention with other processes or users of the machine, for machine resources such as memory, CPU, and disks. We performed the testing on dedicated machines, and so the results were not affected by other users. The Unix operating system spontaneously initiates various system tasks to perform routine maintenance and monitoring functions; thus it is not easy to avoid contending with these tasks for system resources. Smaller test runs may vary significantly due to these effects. However, on the larger test runs the influence of system tasks on the run time can generally be ignored.

Virtual memory effects, such as the transparent behaviour of the operating system in swapping parts of the program image between main memory and secondary storage. The swapping of portions of a task to disk, to make room for another activity, can occur without warning or notification. In our tests we attempted to minimize the likelihood of this occurring by choosing k_M, the problem size in bytes, to be much smaller than the physical memory size. For instance, on our Linux machines with  MB of physical memory, we performed the majority of our tests using k_M =  KB. Another tactic is to use the Linux mlockall service call to lock the application into memory. This seems to work reasonably well in some cases, and does allow the application to use a large fraction of the physical memory without fear of being affected by virtual memory effects. However, the requesting program must be running with root privileges for the request to be honoured.
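The mlockall tactic mentioned above looks like the following on Linux (a sketch; the call typically fails with EPERM or ENOMEM when the process lacks root privileges or exceeds its memory-lock limit):

```cpp
#include <cerrno>
#include <sys/mman.h>

// Pin all current and future pages of this process into physical memory, so
// that timing runs are not disturbed by swapping.  Returns 0 on success, or
// the errno value on failure.
int lockAllMemory() {
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) return errno;
    return 0;
}
```

A program using this would check the return value and fall back to the small-k_M tactic when the lock is refused.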

Comparison to Quicksort. We found that the buffer tree easily outperformed the built-in internal memory quicksort technique. A simple quicksort program was written using the built-in C function qsort. Figure  shows the results for a range of problem sizes. For larger input sets the recursive quicksort program ran out of stack space on our system, but by that time the internal sort was slowing due to

[Figure : Timings for Internal Quicksort and Buffer Tree Sort on Random Inputs; time in seconds vs. number of items (millions). For problem sizes larger than  million items the internal quicksort failed due to lack of internal memory.]

virtual memory effects, and the buffer tree was already outperforming it.

Tuning the Buffer Tree. We discovered that the value of b relative to m is important to the performance of buffer tree sort on random data. For buffers of size m/2, and a = m/4, b = m as suggested in , we obtain partial block sizes of approximately B/2 keys pushed to the next level, for each of perhaps m children of a node, whenever the parent's buffer is emptied. Reducing the fan-out while maintaining the buffer size increases the expected number of elements in each block. We found that b = m/8 (with a = b/4) gave the best performance in our tests. Smaller or larger values of b resulted in longer run times. Figure  shows running time curves for Buffer Tree Sort (BTS) and for TPIE Merge Sort (TMS). Results for BTS are shown for several values of b, where a = b/4 in all cases. Both TMS and BTS are running with synchronous I/O and single buffering. The TPIE MMB stream option is used. The buffer tree is using the iostream access method. BTS performance is best for b = m/8, and gets worse if b differs much from this value. BTS with b = m/8 has running times that change nearly linearly with the problem size, for the problem sizes shown.

We caution that our experiments focussed on finding support for the predictions of asymptotic behaviour of Buffer Tree Sort. The actual running times of BTS may be

[Figure : Timings for Buffer Tree Sort (iostream access method, with b = m, m/7, m/8, m/9) and TPIE Merge Sort; time in seconds vs. millions of data elements. TMS ran out of disk space at about  million elements, because it keeps its original data file; BTS does not require that all of the input data be available before it begins, and does not require a separate file for the original data.]

improved by further tuning, and the performance of TPIE Merge Sort may improve with other choices of parameters and options.

We found the increased speed with smaller b intriguing, and so we counted the number of block pushes performed by the algorithm. A block push occurs when data is pushed to a child buffer after the parent's buffer becomes full. It may consist of a partial block, a full block, or more than a block of request elements. Reducing the fan-out increases the expected size of the data in a block push, and therefore may reduce the number required, and the number of I/O operations as a result. Figure  shows the relationship between several fan-out values b and the number of block pushes, over a range of input sizes. The reduction in block pushes seems to be the major reason for the improvement in running time between b = m and b = m/8.
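The effect of the fan-out on push size can be seen with a little arithmetic. Assuming, as reconstructed above, that a buffer holding m/2 blocks (m·B/2 keys) is distributed among its node's children when emptied, the expected number of keys per push is:

```cpp
// Expected keys moved to one child per block push, when a buffer of m/2
// blocks (m*B/2 keys) is split among `fanout` children.  The m/2 occupancy
// figure is our reconstruction of the emptying threshold discussed above.
constexpr double keysPerPush(double m, double B, double fanout) {
    return (m * B / 2.0) / fanout;
}
```

With fan-out m this gives B/2 keys, i.e., half-empty block pushes; reducing the fan-out to m/8 raises the expected push to 4B keys (four full blocks), which is consistent with the drop in the number of block pushes observed.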

Nonlinearities in the Running Time. We observed that, contrary to predictions of the I/O model, for random input data our buffer tree sort implementation tended to have nonlinear run times as the problem size increased. Actually, we expect the running time to increase more than linearly, by a logarithmic factor. However, since the base of this logarithm is large, the predicted increase in running time is close to

[Figure : Number of Block Pushes for Buffer Tree Sort on Random Inputs, for fan-out values b = m, m/3, m/5, m/6, m/7, m/8, m/9; number of block pushes vs. millions of data elements.]

linear for the range of problem sizes considered.

Figure  shows a graph of problem size versus run time, for random input data and b = m. Also shown in this graph are curves for the various activities of the buffer tree, i.e., a breakdown of where this time is spent. The total running time appears to be increasing superlinearly with the problem size. Total Running Time is the sum of the running time for Insertions plus Force Empty All Buffers. Running time for Insertions is composed of the sum of Insertion: Empty Internal Buffers plus Insertion: Empty Leaf Buffers. Both of these seem to be more than linear with the problem size. However, referring to Figure , the number of block pushes is not increasing superlinearly.

Adjusting the parameter b in the buffer tree both reduced the number of I/O operations performed by BTS and apparently removed the nonlinear behaviour in our tests. Figure  shows the same graph as Figure , for b = m/8. In contrast to Figure , the Total Running Time curve is quite linear after about  million elements. The component curves in the figure are equally well behaved.

While the constant represented by the slope of the running time curve is larger for BTS than for TMS, we note that BTS is an online sorting technique, and therefore addresses a different situation than does TMS. See Figure .

Experiments with Parallel Disks. We experimented briefly with storing the buffer tree on multiple disks, by striping the buffers and leaves across two disks. We

[Figure : Breakdown of Buffer Tree activities for b = m: Total Running Time, Insertion, Force Empty Buffers, Insertion: Empty Internal Buffers, Insertion: Empty Leaf Buffers; time in seconds vs. millions of data elements.]

[Figure : Breakdown of Buffer Tree activities for b = m/8: Total Running Time, Insertion, Force Empty Buffers, Insertion: Empty Internal Buffers, Insertion: Empty Leaf Buffers; time in seconds vs. millions of data elements.]

[Figure : Pipelining sort with generation of inputs; Total Running Time, Insertion, Force Empty Buffers, and TPIE Merge Sort; time in seconds vs. millions of data elements. If the generation of the inputs is sufficiently time consuming, Buffer Tree Sort can provide a speed advantage over offline methods, by permitting the insertion time to be hidden by the time to generate its inputs. The time to Force Empty Buffers may then be the only time that remains visible.]

obtained a multiple disk driver (the PDM API) from Tom Cormen at Dartmouth College, and ported it from the DEC Alpha environment to Linux without much difficulty. To manage concurrent disk access, the PDM API requires a Posix threads implementation, which we obtained from Florida State University. Perhaps due to the large data cache maintained by Linux, we found that large data volumes were required before two parallel disks outperformed a single disk for a simple write-as-quick-as-you-can application. For buffer tree sort this would require a single block push to be very large. We tried increasing m to allow this, and did see marginally better performance with two disks for moderate values of n. Unfortunately, as n grew towards a more interesting size, we began to see our performance degrade, apparently due to virtual memory effects. We concluded that we needed more real memory for this sort of experiment.

Figure  shows running times for a single disk under the PDM API and C++ iostream access methods. The PDM API could be expected to be slightly slower, as it introduces some extra computation, such as its use of threads. This seems to be true in the case of b = m/8, but its ability to overlap computation with I/O (asynchronous

[Figure : Running Times of PDM and Iostream I/O Access Methods, for b = m and b = m/8; time in seconds vs. millions of data elements. The PDM uses asynchronous I/O, and this seems to give it an advantage for the larger fanouts.]

I/O) seems to allow it to outperform in the case b = m.

Conclusions and Lessons Learned

In this thesis we describe an implementation of a buffer tree, and two EM algorithms based on the buffer tree: an external memory tree-sort and an external memory priority queue.

While the buffer tree gives us an I/O-optimal sort, our timing studies of the implementation indicate that its performance is sensitive to some nonlinearities in the environment or algorithm. Experimental results show that these nonlinearities are reduced by an optimal choice of parameters.

Our tests on random input sets lead to an experimental determination of parameter values different from those originally suggested in the design of the data structure. Although the running times of our tree-sort implementation (BTS) with parameter b = m clearly show nonlinearities, b = m/8 produced a running time curve which is, for practical purposes, a straight line when the problem size is more than  million elements. The application was also heavily I/O bound. This supports the prediction of the algorithm and the model that the asymptotic running time is Θ(n log_m n) I/Os, and that the number of I/O operations is the dominant issue in the algorithm.

The nonlinear behaviour of BTS with parameter b = m was manifested to various degrees in some of the other fanouts which we tried. While we expected some effect on running time as this parameter was varied, the sensitivity to nonlinearity is troubling, and we do not rule out implementation decisions as a possible cause.

We conclude that (a) the buffer tree as a generic data structure appears to perform well in theory and practice, and (b) measuring I/O efficiency experimentally is an important topic that merits further attention.

We conclude by noting that this implementation work was intended primarily to test the validity of the asymptotic performance predictions of the buffer tree. Naturally, this was possible only to the extent that available disk space would allow. Nevertheless, we believe the results are useful, as we were able to perform experiments where the internal memory size was only a small fraction of the problem size. We believe that the timing results therefore confirm the asymptotic performance predictions for the buffer tree.

In terms of the absolute performance of our implementation, we believe a number of efficiency improvements could be made, including a custom implementation of buffers (using the file system for this likely introduces unnecessary overhead) and improved physical I/O (copying of buffers in the C++ I/O library).

Another area for further investigation arises from our desire to keep the effective internal memory size of the buffer tree small. In order to do this, we restricted the block size to only 4 kilobytes. While the work of Stevens suggests that such a block size achieves most of the efficiencies one expects by blocking I/O to disk (see Figure ), other researchers have suggested that larger block sizes are better. It has been suggested that the sensitivity to nonlinear increases in running time for a range of problem sizes is related to this small block size.

[Figure : Stevens' measurements on the effects of varying the block size; time in seconds vs. block size in bytes (0 to about 4500).]

Simulation Case Study

Introduction and Motivation

In this section we describe implementation issues arising from a case study of the deterministic simulation approach of Section .

The main objectives of this implementation project were as follows:

- explore the implementation issues raised in the simulation technique of Chapter , and determine if, in practice, there are any unforeseen difficulties;
- perform performance comparisons to validate the expectations (Lemma ) of linear I/O and linear running time as the problem size grows;
- investigate the issues involved in an automatic translation of CGM algorithms to EM-CGM algorithms.

In Section  we sketch a design of a runtime system that allows a parallel algorithm implemented in MPI (the Message Passing Interface standard) to be executed on a target machine with one or more processors, and one or more disks per processor. Utilizing the ideas and techniques of Chapter  (or, potentially, Chapters  or ), such a runtime library promises to allow large problems to be executed efficiently on a machine with limited internal memory, by ensuring that the disks are used effectively. Such a capability is interesting from at least two perspectives:

- from the perspective of obtaining good external memory algorithms, the practicality of the simulation techniques should be determined;
- from the perspective of parallel algorithms, the simulation idea promises efficient scalability, permitting larger problems to be executed efficiently on a given machine than would otherwise be the case. This idea is described more fully in .

In Sections  and  we describe some results of implementing an external memory version of a parallel algorithm for sorting.

We use the usual notation: N is the problem size, v is the number of virtual processors in the BSP algorithm, p is the number of real processors in the EM-BSP machine, B is the disk block size, b is the communication message size, each real processor has M = O(kN/v) internal memory (for an integer parameter k), and each real processor has D local disks.

An EM Runtime System for Parallel Algorithms

We rst give an overview of the design of a suitable runtime system to supp ort

execution of BSP algorithms on an EMBSP We describ e the following in terms of

the Linux op erating system although the issues and decisions taken should in most

cases apply equally well to other op erating systems We implement each virtual

as pro cessor as a separate pro cess or task and so we will refer to virtual pro cessors

pro cessestasks and viceversa In this particular context we use the terms virtual

processor and task interchangeably

In many cases, each virtual processor executes the same program code, or at least contains the program code for all roles that a virtual processor may be required to play. It is common practice for the program code to make decisions based on a unique label associated with every virtual processor. In our implementation, we therefore did not attempt to swap the program code of the virtual processors between internal memory and the disks.

One Real Processor

In the following we describe the operation of a single real processor executing the operations required by a number of virtual processors. At various points in the discussion, however, we mention issues that arise when multiple real processors are present. More discussion of the case of multiple real processors is presented in Section . The discussion is broken down into three parts: (1) implementation of virtual processors, (2) implementation of communication between virtual processors, and (3) implementation of multiple disks.


Implementation of virtual processors. We chose to represent the virtual processors of the CGM algorithm as threads, or lightweight tasks. A lightweight task is a task whose memory is shared with other tasks, and therefore its context does not have to be swapped individually by the operating system. We will use the terms thread and lightweight task interchangeably. In our implementation, the lightweight tasks representing virtual processors relinquish control of the CPU to the next virtual processor voluntarily, at predetermined points in their code. A virtual processor task might as well run to completion before any other virtual processor task begins execution. In fact, permitting more than k of them to compete for memory would violate a basic premise of our simulation technique, as it would lead the operating system to attempt to allocate more than O(kN/v) virtual memory, and the resulting swapping could cause fine-grained accesses to disk. Accordingly, we must manually swap the context of each virtual processor in and out of the local internal memory of a real processor.

The virtual processor tasks are represented by sequentially scheduled subtasks of a single preemptively scheduled background thread. By creating k such background threads, each permitting only a single virtual processor to be active at once, k virtual processors can be resident in the memory of a real processor concurrently, for 1 <= k <= v/p.

It is easy to determine when a virtual processor should surrender control of the CPU to the next one. At the end of each computation superstep, each virtual processor issues a communication request (part of the next communication superstep). We discuss the implementation (simulation) of these communication functions below. The yield function provides a means for passing control from one virtual processor to the next.

yield: This function is called by each virtual processor at the end of every computation superstep. In particular, yield is called at the beginning of every communication function. It acts as a barrier synchronization operation. It ensures that several actions occur:

1. The swappable context of the processor is written to the local disks, and then the related internal memory is freed.

2. The memory areas required for the context of the next virtual processor are allocated, and its context is read from the local disks.

3. The messages sent to the new processor in the previous superstep are read from the local disks.

4. Control of the CPU is passed to the new virtual processor.


Thus the round-robin scheduling of virtual processors within a background thread is accomplished via the yield function, which in turn uses the C library setjmp and longjmp functions. Using setjmp, a task can store its program counter and other system context in a specified spot. Using longjmp, the task can transfer control to a task context which was previously saved by setjmp. The stack pointer of a running task points to a stack frame stored on its system stack. A stack frame contains space for each local (dynamic) variable allocated implicitly by the current function of the running task. It also contains a return address to the function that called the current function of the running task. Other stack frames, for each of the functions in which the current function of the running task is nested, are also on the stack. This imposed some constraints on our implementation. The stack of the background thread cannot be swapped out of memory by the simulation, as it must be available to the operating system at all times.

We therefore must minimize the amount of data that is stored on this stack, not permitting dynamic (implicitly allocated) local variables in our virtual processor tasks, for instance. In our implementation, the stack contents cannot be part of the virtual processor context that is managed/swapped by the simulation technique. These considerations require us to maintain a minimal unswappable global context on the stack, plus a larger swappable local context for each virtual processor. The latter are managed according to the requirements of the simulation technique. The swappable context can consist only of that memory which is allocated explicitly by the user (BSP) code at execution time, i.e., via the C malloc call. In order to keep track of the size and components (memory areas) of a virtual processor, we require the code for the virtual processors to use special library calls (emmalloc/emfree) for memory allocation/deallocation requests. This permits us to determine the regions of memory to be swapped at the end of each superstep.
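The control transfer performed by yield can be illustrated with a small sketch. This is not the thesis implementation; it is a minimal, self-contained illustration of the pattern described above, in which each virtual processor runs one computation superstep and then longjmps back to a setjmp point in a round-robin scheduler. A full implementation would also swap contexts and messages to disk at each yield, and would need per-task stacks to resume a task in mid-function; the names run_schedule, em_yield and vp_superstep are invented for the example.

```c
#include <setjmp.h>

#define NUM_VP 3      /* virtual processors per background thread */
#define SUPERSTEPS 2  /* computation supersteps to simulate */

static jmp_buf scheduler_ctx;    /* saved context of the scheduler loop */
static int work_done[NUM_VP];    /* per-virtual-processor progress counter */

/* em_yield: end of a computation superstep -- return control to the
 * scheduler (this is where context swapping to disk would happen). */
static void em_yield(void) { longjmp(scheduler_ctx, 1); }

/* One computation superstep of virtual processor vp, ending in a yield. */
static void vp_superstep(int vp) {
    work_done[vp]++;   /* stand-in for the real computation */
    em_yield();        /* barrier point: give up the CPU */
}

/* Round-robin scheduler: run one superstep of each virtual processor,
 * SUPERSTEPS times; returns the total number of supersteps executed.
 * (volatile: these locals must survive the longjmp back to setjmp.) */
int run_schedule(void) {
    volatile int total = 0;
    for (volatile int s = 0; s < SUPERSTEPS; s++)
        for (volatile int vp = 0; vp < NUM_VP; vp++) {
            if (setjmp(scheduler_ctx) == 0)
                vp_superstep(vp);   /* control comes back via em_yield() */
            else
                total++;            /* the task yielded: superstep done */
        }
    return total;
}
```

Note that longjmp here always jumps up the stack to an active setjmp, which is well-defined C; resuming a suspended task below the scheduler frame is what requires the separately managed per-task contexts described above.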

emmalloc/emfree: In order to control the swapping of the local context of a virtual processor, we require that its local variables be allocated dynamically via a call to the EM library function emmalloc, and deallocated (if necessary) using a call to emfree. The EM implementation keeps track of each allocation and deallocation request in a data structure local to each real processor. It ensures that any such memory contents belonging to a virtual processor is swapped into the real memory prior to its use, and out to the local disks after the respective virtual processor has finished its computation superstep.
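As a sketch of the bookkeeping involved, emmalloc/emfree might be implemented along the following lines. This is a simplified, hypothetical version for a single virtual processor (a fixed-size table, no disk I/O); the real library would keep one such registry per virtual processor and use it to drive the swap-in/swap-out services, and em_swappable_bytes is an invented helper for the example.

```c
#include <stdlib.h>

/* Registry of the memory regions owned by one virtual processor, so
 * that the runtime knows exactly what to swap to disk at a yield. */
#define MAX_REGIONS 1024

typedef struct { void *ptr; size_t size; } Region;

static Region regions[MAX_REGIONS];
static int    num_regions = 0;
static size_t bytes_in_use = 0;

/* emmalloc: allocate memory and record the region for later swapping. */
void *emmalloc(size_t size) {
    if (num_regions == MAX_REGIONS) return NULL;
    void *p = malloc(size);
    if (p != NULL) {
        regions[num_regions].ptr  = p;
        regions[num_regions].size = size;
        num_regions++;
        bytes_in_use += size;
    }
    return p;
}

/* emfree: release the region and forget about it. */
void emfree(void *p) {
    for (int i = 0; i < num_regions; i++)
        if (regions[i].ptr == p) {
            bytes_in_use -= regions[i].size;
            free(p);
            regions[i] = regions[--num_regions];  /* swap-delete */
            return;
        }
}

/* Total data a context swap would move for this virtual processor. */
size_t em_swappable_bytes(void) { return bytes_in_use; }
```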

Figure shows the main services provided by the EM runtime library, organized into layers for purposes of explanation. In the application layer, the program code originally written for execution in each processor of a BSP machine is run as a series of processes on an EM-BSP machine. Requests for communication between processes, and implicit barrier synchronization of processes, are assumed to have been coded as MPI function calls. We also assume that memory allocation and deallocation requests have been coded as calls to our emmalloc and emfree library services.

Figure: Organization of EM library services; services in a layer are available to the layers above. Application layer: virtual processors/tasks 0, 1, 2, 3, ..., v-1. Communication layer: inter-task communication services (bcast, alltoall, gather, allgather, scatter). System services layer: task scheduling (yield); memory management (gmalloc, gfree); disk I/O services (swapin, swapout, readfromdisk, writefromdisk).

Implementation of communication between virtual processors. The exchange of messages between virtual processors of a parallel algorithm is modelled in our implementation by the writing of the message data to disk by the sender, and the reading of message data from disk by the receiver. In general, it may also be necessary to move message data from one real processor to another. We discuss the latter issue in Section . Here we focus on the case of a single real processor, and the movement of message data between internal memory and disk.

We pattern our implementation after the suite of communication services offered by the MPI communication standard. The functions described below do not cover the entire spectrum of services available in MPI, but they address the major functionality requirements, and they are sufficient for our case study. We believe that the missing pieces can be designed without major difficulties. Each EM library function replaces an MPI communication function of the same name. We omit details of the arguments of each function for simplicity.

alltoall: This function sends v different specified portions of a buffer to v different destination processors. It also implies a reception of v such messages in each processor by the beginning of the next computation superstep. On a single real processor, the external memory version of alltoall involves (i) writing the v outgoing buffer portions to the local disks, in parallel, as each virtual processor is simulated in turn; (ii) calling yield to pass control to the next virtual processor in the current superstep; (iii) reading the appropriate data from the disks, in parallel, just prior to the next computation superstep of each destination processor. The methodology we use here is the topic of most of Chapter . On multiple real processors, the EM implementation must use a real communication superstep after each simulation round to deliver messages to the correct real processor, where they must be received and stored on the local disks. This issue arises independently of the particular MPI function used, and is discussed further in Section .
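On a single real processor, steps (i)-(iii) amount to staging each source's outgoing portions in a per-destination message area and reading them back before the destination's next superstep. The toy sketch below models the disk-resident message areas with an in-memory array, assuming equal-sized portions for simplicity; the names em_alltoall_send and em_alltoall_recv are invented, and the actual library writes and reads these areas on the local disks in parallel.

```c
#define V 4           /* virtual processors (small example value) */
#define PORTION 2     /* ints per portion, equal-sized for simplicity */

/* msg_area[dest][src]: stands in for dest's message area on the disks. */
static int msg_area[V][V][PORTION];

/* Superstep t: virtual processor src "writes" its v outgoing portions
 * to the message areas of the destinations. */
void em_alltoall_send(int src, int sendbuf[V][PORTION]) {
    for (int dest = 0; dest < V; dest++)
        for (int i = 0; i < PORTION; i++)
            msg_area[dest][src][i] = sendbuf[dest][i];
}

/* Superstep t+1: virtual processor dest "reads" its incoming portions
 * back just before its next computation superstep. */
void em_alltoall_recv(int dest, int recvbuf[V][PORTION]) {
    for (int src = 0; src < V; src++)
        for (int i = 0; i < PORTION; i++)
            recvbuf[src][i] = msg_area[dest][src][i];
}
```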

bcast: This function broadcasts the contents of a buffer on a designated root node to each of the v processors. On a single real processor, the external memory implementation writes the contents of the buffer to the message areas of the destination processors on the local disks. Since there can be only one broadcasting processor in this superstep, the EM implementation should arrange to simulate only the broadcasting processor, and skip the others until the next superstep. For multiple real processors, the EM implementation should first conscript a virtual processor on each of the real nodes and perform a partial broadcast from the root node to each of these. Then the broadcast can proceed locally, in parallel, according to the technique used for a single real processor.

gather: This function causes each of the v processors to send a local buffer to a common designated destination processor. On a single real processor, the external memory implementation writes the outgoing message from a processor to a designated slot in the incoming message area of the destination processor on the local disks. The BSP algorithm being simulated must respect the fact that, overall, no more than cN/v data should be sent to any single destination, for some known fixed constant c. In the case of multiple real processors, the EM implementation should first conscript a virtual processor as an intermediary on each real node. Simultaneous local gathers should be done to these intermediaries, followed by a single MPI gather involving real communication between the intermediaries and the destination virtual processor.

Implementation of parallel disks. To implement parallel disk access, we also use lightweight tasks, one per physical disk. These tasks, which we refer to as disk drivers, do not require the services of the CPU for a prolonged period at a time. They need prompt service when real events, such as I/O completion or requests for I/O, occur, but the amount of computation required before they become blocked should be small. As in Section , the disk driver tasks were implemented as individual preemptively scheduled threads, reflecting their role as high-priority foreground processes.


This means that their access to the CPU is driven by events, such as the completion of an I/O request or the expiration of a time slice. Unlike the virtual processor tasks, the disk drivers do not relinquish control of the CPU voluntarily, except when they wait for an event to occur, such as arrival of a request to read or write a block, or arrival of a completion event on a previously posted I/O operation. The preemptively scheduled threads (background thread and disk drivers) were implemented by means of a Posix threads package included with the Linux operating system.
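A disk driver of this kind can be sketched with a Posix threads condition variable: the thread sleeps until a request is posted, services it, and signals completion. The version below is a hypothetical single-request driver writing blocks to an in-memory "disk"; the names disk_driver, disk_write and disk_stop are invented, and a real driver would keep a queue of requests and issue actual I/O system calls.

```c
#include <pthread.h>
#include <string.h>

#define BLOCK_SIZE 512

static char disk_image[BLOCK_SIZE * 64];      /* simulated disk */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;

static struct {
    const char *data;   /* block to write, or NULL if no request posted */
    int         block;  /* target block number */
    int         done;   /* set by the driver when the write completes */
    int         stop;   /* ask the driver thread to exit */
} req = { NULL, 0, 0, 0 };

/* The driver thread: blocked except when an event (request) arrives. */
void *disk_driver(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtx);
    for (;;) {
        while (req.data == NULL && !req.stop)   /* wait for an event */
            pthread_cond_wait(&cv, &mtx);
        if (req.stop) break;
        memcpy(disk_image + req.block * BLOCK_SIZE, req.data, BLOCK_SIZE);
        req.data = NULL;
        req.done = 1;
        pthread_cond_broadcast(&cv);            /* report completion */
    }
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Post a write request and block until the driver reports completion. */
void disk_write(int block, const char *data) {
    pthread_mutex_lock(&mtx);
    req.data = data; req.block = block; req.done = 0;
    pthread_cond_broadcast(&cv);
    while (!req.done)
        pthread_cond_wait(&cv, &mtx);
    pthread_mutex_unlock(&mtx);
}

/* Ask the driver thread to terminate. */
void disk_stop(void) {
    pthread_mutex_lock(&mtx);
    req.stop = 1;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mtx);
}
```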

Figure illustrates the relationships between the contexts for the various tasks. In the figure, we differentiate between two types of context: (1) Application context (see item G in the figure): this consists of all of the application's data that is required for it to execute; this is the usual meaning, as used in other chapters of this thesis. (2) System context (see item J in the figure): this is a small subset of the application context, consisting of the contents of the CPU registers of a running task (i.e., program counter, stack pointer, etc.); this is the context saved/restored by the setjmp/longjmp calls.

Multiple Real Processors

When multiple real processors are present, the EM-MPI functions (e.g., alltoall, bcast, etc.) not only must write the indicated data to disk, but must also send data to the necessary real processors, using the corresponding real MPI function. Because each original communication superstep is also broken up into two communication substeps by the BalancedRouting procedure (Section ), we can be assured that in each of these substeps a virtual processor sends the same amount of data to every other virtual processor (Theorem ). Since each real processor simulates the same number of virtual processors, the real processors also receive equal amounts of data from each other.
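The effect of the two balanced substeps can be sketched as follows: the portion from source s to destination d travels via intermediate processor (s + d) mod v, so in each substep every processor sends exactly one portion to every other processor. This is an illustration of the Valiant-style two-phase routing idea only, with one fixed-size portion per pair; the name balanced_route and the data layout are invented for the example, and this is not the thesis's BalancedRouting procedure itself.

```c
#define V 4   /* virtual processors (small example value) */

typedef struct { int src, dest, data; } Portion;

/* phase1[i][s]: portion that source s sends to intermediate i. */
static Portion phase1[V][V];
/* delivered[d][s]: portion from source s as finally received by d. */
static Portion delivered[V][V];

/* Two-phase balanced routing: data[s][d] is the payload from s to d.
 * Phase 1: each source sends exactly one portion to each intermediate.
 * Phase 2: each intermediate holds exactly one portion per destination
 * (the one with s + d congruent to i mod V) and forwards it. */
void balanced_route(int data[V][V]) {
    for (int s = 0; s < V; s++)
        for (int d = 0; d < V; d++) {
            int i = (s + d) % V;
            phase1[i][s] = (Portion){ s, d, data[s][d] };
        }
    for (int i = 0; i < V; i++)
        for (int s = 0; s < V; s++) {
            Portion p = phase1[i][s];
            delivered[p.dest][p.src] = p;
        }
}
```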

The p real processors require one real communication step per computation round, i.e., v/(pk) real communication steps per computation round of the original CGM algorithm. Because of the perfect balance in communication between the real processors, we can perform the corresponding real communication round at the end of each computation round of the real processors, before the current batch of k virtual processors is swapped out.

The steps in executing the EM library version of alltoall, for instance, are as follows (the other EM-MPI communication functions are similar). Recall that alltoall sends v not necessarily equal-sized portions of a buffer, one portion to each virtual processor. Let S represent the current set of k virtual processors in the memory of the current real processor.

1. Execute BalancedRouting to route the v specified buffer portions, using the real MPI alltoall function.

2. Write the v buffer portions received in step 1 to the local disks.

3. Call yield to pass control to the next set of k virtual processors in the current superstep.

4. Read the appropriate data from the disks, in parallel, i.e., just prior to the next computation superstep of the processors in S.

Figure: System design of the external memory library. A: context and code of the EM-BSP program; B: private context and code of the background thread; C: disk drivers' private data (code, stack); D: local disks 0, 1, ...; E: common data structures (request queue, semaphores, etc.); F: BSP algorithm code; G: BSP application context (swapped by EM library services); H: EM library services code and context; I: stack; J: FIFO queue of system contexts for waiting virtual processors.

A Deterministic EM-CGM Samplesort

We now present a CGM sorting algorithm due to Shi and Schaeffer, which we call CGMSampleSort. This algorithm has the following features:

1. It is asymptotically optimal in computation time for N >= v^3.

2. The load balancing between processors is nearly perfect in practice, and within a factor of two in theory.

3. It causes little memory and network contention.

We will use it as a basis for a parallel external memory sort.

Algorithm CGMSampleSort

Input: The N items to be sorted are distributed evenly among the internal memories of the v processors of a CGM machine. Each processor has a unique label between 0 and v - 1.

Output: The N items are sorted in the memories of the CGM processors. The smallest N/v items are located in the memory of processor 0, the next smallest N/v items are located in the memory of processor 1, etc.

1. The v processors each sort the N/v items in their local internal memories.

2. Each processor chooses v equally spaced samples from the items in its possession.

3. The processors each send their v samples to processor 0.

4. Processor 0 sorts the v^2 samples and chooses from them v - 1 equally spaced splitter elements.

5. Processor 0 sends the v - 1 splitters to each of the other processors.

6. Each processor divides its local elements into v buckets, determined by the splitter elements.

7. Each processor sends the contents of bucket i to processor i, for all 0 <= i < v.

8. Each processor merges the v sets of items now in its possession to produce the final sorted order.

End of Algorithm
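Steps 4 and 6 are the heart of the splitter mechanism: processor 0 keeps every v-th of the v^2 sorted samples, and each processor then locates the bucket of an item by binary search over the v - 1 splitters. The following is a minimal sketch of those two steps; the helper names choose_splitters and bucket_of are invented for the example.

```c
/* Choose v-1 equally spaced splitters from v*v sorted samples
 * (processor 0's job in Step 4 of CGMSampleSort). */
void choose_splitters(const int *samples, int v, int *splitters) {
    for (int i = 1; i < v; i++)
        splitters[i - 1] = samples[i * v - 1];   /* every v-th sample */
}

/* Return the bucket (destination processor) of item x, given the v-1
 * sorted splitters, by binary search (Step 6). Items equal to a
 * splitter go to the lower bucket. */
int bucket_of(int x, const int *splitters, int v) {
    int lo = 0, hi = v - 1;          /* bucket index range [0, v-1] */
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (x <= splitters[mid]) hi = mid; else lo = mid + 1;
    }
    return lo;
}
```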

Lemma: No processor receives more than 2N/v items as a result of Step 7 of CGMSampleSort, provided that v^2 <= N/v.

Lemma: CGMSampleSort uses O((N/v) log N + L) computation time, O(g(N/v) + L) communication time, O(N/v) local memory per processor, and O(1) communication supersteps, provided that v^2 <= N/v.

Proof: Steps 2 and 4 require that v^2 <= N/v. The preceding lemma ensures that O(N/v) local memory is used by CGMSampleSort, and there are clearly a constant number of supersteps.

Computation time for Step 1 is O((N/v) log(N/v)). Processor 0 takes O(v^2 log v) time for Step 4. Computation times for Steps 2 and 5 are O(v), and for Steps 6 and 8 the computation times are O((N/v) log v), using the preceding lemma. Since v^2 <= N/v, the computation time overall is O((N/v) log N).

The only communication supersteps are Steps 3, 5 and 7. Communication time is O(g(N/v)) for Step 7, plus O(g v^2) for Steps 3 and 5, which is bounded by O(g(N/v) + L).

Lemma: CGMSampleSort can be simulated on a p-processor EM-CGM machine in O((N log N)/p + (v/p)L) computation time, O(g(N/p) + (v/p)L) communication time, and O(G(N/(pBD)) + (v/p)L) I/O time, provided that p <= v, N >= v^2 B, and B >= v.

Proof: We use the results of the preceding lemma and Theorem . There are O(1) supersteps in CGMSampleSort, so the corresponding superstep parameter in Theorem is O(1). The context size for a CGM algorithm is O(N/v). We assume k = 1. The slackness constraint required by the preceding lemma is v^2 <= N/v, which is implied by N >= v^2 B for B >= v.

Experimental Results

Preliminary implementation experiments with sorting support our expectations of linear running time. Figure shows running times for a CGM sorting algorithm on a single processor, (a) using virtual memory and LAM MPI (see ), and (b) converted to an EM-CGM algorithm by our deterministic simulation. Initially, the simulated algorithm runs more slowly than the LAM MPI algorithm. The main reason is that the simulated algorithm is performing I/O to swap contexts and deliver messages between the virtual processors, while the LAM MPI version does little or no I/O until the processors begin to exceed the available virtual memory. Once the virtual memory images of the virtual processors collectively get large enough, however, swapping begins to dominate the computation of the LAM MPI version. The simulated algorithm continues steadily, however, its running time apparently growing linearly with the problem size.

Figure: Running times for CGMSampleSort and EM-CGM SampleSort (elapsed time in seconds versus problem size). In each case, 8 virtual processors were simulated on a Pentium with one disk; in the CGM case, LAM MPI was used.

The anomalous point on the EM-CGM timing curve of Figure can be eliminated, and was included for purposes of discussion in Section below.

Discussion and Conclusions

The experiments done to date provide some confirmation of the value of our simulation technique in practice, but are clearly incomplete. More experimental work would be useful in this area, especially to determine the performance of the parallel EM cases.

Our results show that, for problems sufficiently larger than the available real memory, the EM-CGM algorithm can outperform the CGM algorithm by a wide and increasing margin. We note that the experiments were performed by implementing algorithm SeqCompoundSuperstep (Chapter ) without the message balancing step of algorithm BalancedRouting, or equivalent. As a result, we were forced to accommodate messages up to M in size. This resulted in the messaging matrix being vN items in size, rather than N items in size. We expect that the inclusion of the balancing step, while increasing the computation time slightly, should reduce the overall execution time by reducing the I/O requirements by nearly an order of magnitude when v = 8. The excessive size of the messaging matrix also limited the size of problem that we were able to accommodate, due to limited disk size, and we expect that experiments with larger problems would be instructive.

There appears to be an anomalous point on the EM-CGM timing curve of Figure . This point is both anomalous and reproducible. When the test runs are performed in a batch, one after the other, this observation can be reproduced. However, when the tests are run individually, separated in time, and with careful attention to the elimination of transient disk and memory congestion, we observe a running time more in line with our expectations. Such phenomena occur regularly in our tests, and can be attributed to the buffer cache management policies of the Linux operating system. As mentioned in Section , Linux uses any available main memory as a buffer cache, caching I/O buffers for potential future use. We also observed that writes to disk were faster than reads in many cases, suggesting that applications are allowed to continue after a write request when data had been written to the buffer cache but had not yet physically been written to disk. Naturally, this complicates the measurement process.

As mentioned above, the first-order effects of our techniques on performance were clear and observable. However, as in Section , we found that in practice it was difficult to quantify the performance effects of many of our attempts at fine-tuning the code, due to the memory management and I/O buffering practices of the Linux operating system. While the Linux buffer cache management is very effective on programs that are not I/O-aware, we believe that the advent of I/O-aware applications may justify a rethinking of the buffer cache management policies of Unix, to enable further performance gains.

Chapter

Conclusions and Future Work

It has been said that "Beauty is in the eye of the beholder". The same can be said about what is interesting and what is not. To me, the relationship between parallel algorithms and external memory algorithms is interesting, and well worth the effort that went into this research. That a body of existing work in parallel algorithms can now be put to use in the field of external memory algorithms is a satisfying effect that has only been partially explored in this thesis.

This work has demonstrated several ways to transform a BSP-like algorithm with appropriate attributes to an external memory algorithm that is efficient according to our model, and has identified a considerable number of example applications where suitable BSP-like algorithms are already known.

We believe that the ability of our simulations to accommodate a variable number k of virtual processors in the memory of a real processor (for 1 <= k <= v/p) allows our techniques to conform to a new memory-adaptive EM model recently proposed by Barve and Vitter. In this model, the available memory changes periodically and unpredictably, perhaps due to the demands of unrelated applications. More investigation of this idea, as it applies to our techniques, seems warranted.

Our new EM-BSP algorithms have the advantage that they are essentially quite simple compared to some previously known external memory algorithms for the same problems, and they accommodate multiple processors on the target machine. They have the limitation that the ratio of internal memory size M to problem size N should not be too small. This constraint comes rather directly from the slackness built into the parallel BSP algorithms from which our EM-BSP algorithms have generally been derived. However, it seems that in many cases this slackness constraint can be reduced at the cost of more communication and I/O rounds. This provides an opportunity for further research.

CHAPTER CONCLUSIONS AND FUTURE WORK

We argue, however, that even for very large problems the slackness constraint is not an impediment in practice, as it seems reasonable that the internal memory size should grow as some function of the external memory size. Even using our deterministic simulation from Chapter , terabyte-sized problems can be solved with an internal memory which is only a small fraction of the external memory size.

The fact that we use the term "simulation" to characterize our techniques raises the obvious question of their performance in practice. It should be emphasized that implementations of parallel algorithms can, in principle, be executed directly, by substituting a different runtime library for the original communications library. This was illustrated in a preliminary way by the design and experiments described in Section . The most interesting issue seems to be to what extent the overheads incurred during the swapping of the contexts of virtual processors can be reduced. In a purely automatic adaptation of existing parallel codes, the prospects for achieving comparable performance to a hand-coded external memory algorithm seem limited. However, there may be possibilities for the programmer of a parallel program to provide additional information to the runtime library, or even the compiler, in order to make such automatic efficient scalability achievable in practice. This is an area for future research.

Another interesting direction is to view parallel algorithms as a series of decompositions of a problem into subproblems which are independent and can be solved in any order, including simultaneously. These are algorithms which, we have argued, can be executed efficiently as external memory algorithms under the appropriate conditions. The resulting algorithms have the distinct advantage that they continue to be efficient when multiple processors are present. While it may turn out that the individual instances of parallel external memory algorithms which we cite in Chapter are efficient only in theory, and not when compared to algorithms hand-coded for the purpose, the paradigm of designing external memory algorithms from a parallel framework is compelling. This is an area which is receiving increasing attention.

The techniques presented here are analyzed only with respect to their asymptotic complexity, and we have not investigated the size of the constant factors. On the other hand, very few benchmark studies have been made to test the true practicality of some of the known external memory algorithms. Even without such experience, it seems clear that many of the existing external memory algorithms have not been designed for multiple processors, and may therefore perform poorly in such an environment. Further benchmark studies are indicated, both on the algorithms we propose here and on many previously known algorithms.

Furthermore, if an existing external memory algorithm for a particular problem can be shown to be intrinsically better than one obtained from a parallel algorithm, then perhaps the external memory algorithm can be used to produce a better parallel algorithm. The cross-fertilization of the two areas seems to me to be one of the most exciting areas for future research. If a better external memory algorithm already exists for a single processor machine, but is not efficient on any available parallel machine, perhaps it is not better: large problems often demand the sort of computing power that is most economically available via parallel processing. Scalability to multiple processors is an intrinsic feature of the external memory algorithms derived from our simulation techniques.

We have concentrated on the interaction between the main memory and the external disk system. Many of the same issues also arise between the cache memory and main memory layers of the memory hierarchy. This is an area which is becoming increasingly important, as increases in processor speed continue to outstrip the improvements in the speed of affordable main memory. The increasing discrepancy in speed seems likely to prompt the addition of more layers in the cache hierarchy. Perhaps, in particular, parallel algorithms can provide an algorithmic basis for using such caches more effectively.

Appendix A

Useful Probabilistic Results

In this Appendix we list a number of results that are used in Chapter .

A.1 Tail Estimates

We use the following tail estimates.

Lemma A.1: If X is a nonnegative random variable and r > 0, we have

    Pr[X >= u] <= E[e^{rX}] / e^{ru}.

Proof: We use the following Markov inequality: if X is any nonnegative random variable, then for all t > 0,

    Pr[X >= t] <= E[X] / t.

For any positive real r,

    Pr[X >= u] = Pr[e^{rX} >= e^{ru}].

Applying the above Markov inequality to the right-hand side, we have

    Pr[X >= u] <= E[e^{rX}] / e^{ru}.



Lemma A.2: Let X_1, ..., X_n be independent random variables with 0 <= X_i <= k. Then, for u >= e^2 and m = E[\sum_{i=1}^{n} X_i],

    Pr[ \sum_{i=1}^{n} X_i >= u m ] <= exp( -(m/(2k)) u \ln u ).

APPENDIX A USEFUL PROBABILISTIC RESULTS

Proof: Hoeffding showed that for this situation we have

    Pr[ \sum_{i=1}^{n} X_i >= u m ] <= ( e^{u-1} / u^u )^{m/k}.

We have

    ( e^{u-1} / u^u )^{m/k} = e^{(m/k)(u - 1 - u \ln u)} <= e^{-(m/(2k)) u \ln u}.

Finally, we have u \ln u - u + 1 >= (1/2) u \ln u if (1/2) \ln u >= 1, or u >= e^2.

Lemma A.3: Given x balls and y bins, if the balls are randomly and independently distributed to the bins, then no bin contains more than l(x/y) balls, with probability at least

    1 - e^{-(x/y) l + \ln y}.

Proof: The probability of the event that a particular bin receives exactly i balls is

    C(x, i) (1/y)^i (1 - 1/y)^{x-i} <= C(x, i) (1/y)^i.

Let X be the event that a particular bin receives more than k balls, and let Y denote the event that at least one of the bins receives more than k = l(x/y) balls. Thus

    Pr[X] <= \sum_{i > k} C(x, i) (1/y)^i.

Using the bound C(x, i) <= (xe/i)^i <= (xe/k)^i for i > k, we can conclude that

    Pr[X] <= (xe/(ky))^k \sum_{i >= 0} (x/(ky))^i.

For k = l(x/y), we note that x/(ky) = 1/l, and so

    \sum_{i >= 0} (x/(ky))^i = \sum_{i >= 0} (1/l)^i = l/(l - 1) <= 2 for l >= 2.

Thus, for k = l(x/y) and l >= e^2,

    Pr[X] <= 2 (xe/(ky))^k = 2 (e/l)^{l x/y} = 2 e^{(x/y)(l - l \ln l)}.

Recall that Y denotes the event that at least one of the y bins receives more than k = l(x/y) balls. Then

    Pr[Y] <= y Pr[X] <= 2 y e^{(x/y)(l - l \ln l)} = 2 e^{(x/y) l - (x/y) l \ln l + \ln y}.

Since l \ln l >= 2l for l >= e^2, the term (x/y) l \ln l absorbs 2(x/y) l together with the constant factor. So, for x and y functions of the problem size, and for constant l >= e^2,

    Pr[Y] <= e^{-(x/y) l + \ln y}.
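A quick Monte Carlo experiment illustrates the lemma: with x balls thrown independently into y bins, the maximum load stays very close to the mean x/y, far below l(x/y) for any moderately large constant l. The generator and the function name max_bin_load are invented for the illustration; a simple 64-bit linear congruential generator keeps the run deterministic.

```c
#include <stdlib.h>

/* Deterministic pseudo-random source for the experiment. */
static unsigned long long rng_state = 12345;
static unsigned long long rng_next(void) {
    rng_state = rng_state * 6364136223846793005ULL
              + 1442695040888963407ULL;
    return rng_state >> 33;   /* use the upper bits */
}

/* Throw x balls into y bins uniformly at random and return the
 * maximum bin load observed. */
int max_bin_load(int x, int y) {
    int *load = calloc((size_t)y, sizeof(int));
    int max = 0;
    for (int i = 0; i < x; i++) {
        int bin = (int)(rng_next() % (unsigned long long)y);
        if (++load[bin] > max) max = load[bin];
    }
    free(load);
    return max;
}
```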


A.2 Balancing Lemmas

Lemma A.4: Let R be the number of blocks in each bucket. Let X_{jk} be a random variable representing the number of tracks of disk k that belong to bucket j (a track belongs to bucket j when it contains a record of bucket j). Then, for any fixed j and k, we have the following:

    Pr[ X_{jk} >= l R/D ] <= e^{-l R/D}.

Proof: The proof is similar to one described in . Let g_t denote the number of disks written to from bucket j during write cycle t, for 1 <= t <= C, where C is the total number of write cycles used. We have

    \sum_{1 <= t <= C} g_t = R.

Let G_t be the number of tracks belonging to bucket j that are assigned to disk k in write cycle t. Since only one track can be written to any disk in a write cycle, G_t is restricted to the values 0 and 1. We have Pr[G_t = 1] = g_t/D and Pr[G_t = 0] = 1 - g_t/D. Let G_{G_t}(z) be the probability generating function for G_t:

    G_{G_t}(z) = Pr[G_t = 0] + Pr[G_t = 1] z = 1 + (g_t/D)(z - 1).

Let G_{X_{jk}}(z) be the probability generating function for X_{jk}. We can bound X_{jk} by the sum of independent random variables X_{jk} <= G_1 + G_2 + ... + G_C. For the purpose of bounding, let us consider that X_{jk} = G_1 + G_2 + ... + G_C. Since the sum of independent random variables corresponds to the product of the corresponding probability generating functions, we have

    G_{X_{jk}}(z) = G_{G_1}(z) G_{G_2}(z) ... G_{G_C}(z) = \prod_{1 <= t <= C} ( 1 + (g_t/D)(z - 1) ).

By the tail estimate Lemma A.1, we have

    Pr[ X_{jk} >= l R/D ] <= E[e^{r X_{jk}}] / exp(r l R/D)

for each r > 0. We can express the numerator, using the definitions of expected value and probability generating function, as

    E[e^{r X_{jk}}] = \sum_t Pr[X_{jk} = t] e^{rt} = G_{X_{jk}}(e^r) = \prod_{1 <= t <= C} ( 1 + (g_t/D)(e^r - 1) ).

By \sum g_t = R and convexity arguments, we can maximize this expression by setting g_t = R/C for each t. Thus

    E[e^{r X_{jk}}] <= ( 1 + (R/(DC))(e^r - 1) )^C.

Substituting this bound, we get

    Pr[ X_{jk} >= l R/D ] <= ( 1 + (R/(DC))(e^r - 1) )^C / exp(r l R/D).

From the bound 1 + a <= e^a for a >= 0, we can approximate the numerator and get, for r = \ln(l/2),

    Pr[ X_{jk} >= l R/D ] <= exp( (R/D)(e^r - 1) - r l R/D )
                          = exp( (R/D)( l/2 - 1 - l \ln(l/2) ) ).

Now, since

    l \ln(l/2) - l/2 + 1 >= l

for (approximately) l >= 9, and since R and D are functions of the problem size,

    Pr[ X_{jk} >= l R/D ] <= e^{-l R/D}

for constant l.

Lemma A allows us to exploit the indep endence of the random exp eriments

p erformed during each comp ound sup erstep in order to prove that the success prob

ability for the entire simulation is as large as for the simulation of a single comp ound

sup erstep

Lemma A Let X X X beindependent random variables such that Pr X

z i

lT exp l log l m and PrX lT exp l log l m where m ln x and

i

l for i z Let T xT be the worstcase size of X for any i that is

wc i

X T for i z Then

i wc

z

X

l log l m

Pr X e l zT e

i

i

log log l

c c

Pro of We identify two cases z xm and z xm for c

log m

P

z

First we consider Case The mean of the quantity X can be bounded

i

i

from ab ove as follows for suitable constant l and m ln x

z z

X X

E X lT PrX lT T PrX lT

i i wc i

i i

zlT exp l log l m zxT exp l log l m

zlT zT exp l log l m lnx

l log l m l log l m

zlT e zxT e

l log l mln x

zlT zT e

l zT A

P

z

We can bound the mean E X from b elow as follows

i

i

z z

X X

E X Pr X lT lT PrX lT T

i i i

i i

l zT A

P

z

Ho eding using k T xT m E X l zT From Lemma A

wc i

i

c

and z xm

z

X

l zT

Pr X e m exp e

i

xT

i

z

X

c

X e l zT exp l m Pr

i

i

APPENDIX A USEFUL PROBABILISTIC RESULTS

So

z

X

l log l m

Pr X e l zT e

i

i

and

z

X

l log l m

Pr X e l zT e

i

i

c

Now we consider Case We rep eat the exp erimentonlyz xm times where

log log l

c Thus we have with m ln x

log m

z

X

z

Pr X zlT exp l log l m

i

i

c

xm exp l log l m

exp l log l m c ln m lnx

l log l m

e

So

z

X

l log l m

X z l T e Pr

i

i 

APPENDIX A USEFUL PROBABILISTIC RESULTS

A Summary of Notation and Index

Symbol Meaning

b the communication blo ck size

B the disk blo ck size

D the number of disk drives on a real pro cessor

g the time required for the router to deliver a packet

of size b see BSP Mo del

g the time required for the router to deliver a packet

of unit size see BSP mo del

G the time required for a pro cessor to transfer D blo cks between

its lo cal disks and its lo cal memory

k the number of virtual pro cessors simulated concurrently by a

single real pro cessor

L the minimum time required for synchronizing the pro cessors

M the memory size of a real pro cessor

N the number of data items in the problem

p the number of real pro cessors

v the number of virtual pro cessors

v

v

k

the maximum communication volume sent or received

by a virtual pro cessor in a single sup erstep

the number of BSP sup ersteps in an algorithm

the maximum size of the context of a virtual pro cessor

Bibliography

P Agarwal L Arge M Murali K Varadara jan and J Vitter IOecient

algorithms for contourline extraction and planar graph blo cking In Proceedings

of the th Annual ACMSIAM Symposium on Discrete Algorithms

A Aggarwal B Alp ern A K Chandra and M Snir A mo del for hierarchical

memory In Proc ACM Symp on Theory of Computation pages

A Aggarwal A K Chandra and M Snir Hierarchical memory with blo ck

transfer In Proc IEEE Symp on Foundations of Comp Sci pages

A Aggarwal and J S Vitter The InputOutput complexity of sorting and

related problems Communications of the ACM

R Ahulja K Mehlhorn J Orlin and R Tarjan Faster algorithms for the

shortest path problem Journal of the ACM

A Alexandrov M F Ionescu K E Schauser andCScheiman BSP vs LogP

In Proc ACM Symp on Paral lel Algorithms and Architectures pages

B Alp ern L Carter E Feig and Selker The uniform memory hierarchymodel

of computation Algorithmica

L Arge The buer tree Anew technique for optimal IOalgorithms In Proc

Workshop on Algorithms and Data Structures LNCS pages

A complete version app ears as BRICS technical rep ort RS University of

Aarhus

L Arge Ecient ExternalMemory Data Structures and Applications PhD

thesis University of Aarhus FebruaryAugust

L Arge M Knudsen and K Larsen A general lower bound on the IO

complexity of comparisonbased algorithms In Proc Workshop on Algorithms

and Data Structures LNCS pages

BIBLIOGRAPHY

L Arge D E Vengro and J S Vitter Externalmemory algorithms for pro

cessing line segments in geographic information systems In Proc Annual Eu

ropean Symposium on Algorithms LNCS pages A complete

version to app ear in sp ecial issue of Algorithmica app ears as BRICS technical

rep ort RS University of Aarhus

M Atallah F Dehne R Miller A RauChaplin and JJ Tsay Multisearch

techniques for implementing data structures on a meshconnected computer

Journal of Paral lel and

D Bader D Helman and J Jaja Practical parallel algorithms for p ersonalized

communication and integer sorting Journal of Experimental Algorithmics

httpwwwjeaacmorg B ader Per sona lize d

R Barve and J S Vitter External memory algorithms with dynamically chang

ing memory allo cation Technical Rep ort CS Duke University June

R D Barve E F Grove and J S Vitter Simple randomized mergesort on

parallel disks Paral lel Computing June

A Baumker and W Dittrich Fully dynamic search trees for an extension of

the bsp mo del In Proc ACM Symp on Paral lel Algorithms and Architectures

pages

A Baumker W Dittrich and F M auf der Heide Truly ecient parallel

algorithms coptimal multisearch for an extension of the bsp mo del In Proc

Annual European Symposium on Algorithms pages

G Bilardi K T Herley A Pietracprina G Pucci and P Spirakis BSP vs

LogP In Proc ACM Symp on Paral lel Algorithms and Architectures pages

G Brassard and P Bratley Algorithmics Theory and Practice Prentice Hall

Englewood Clis New Jersey

E Caceres F Dehne A Ferreira P Flo cchini I Reiping N Santoro and

S Song Ecient parallel graph algorithms for coarse grained multicomputers

and bsp In Proc Int Col loquium Algorithms Languages and Programming

LNCS pages

A Chan F Dehne and A RauChaplin Coarse grained parallel next element

search In Proc International Paral lel Processing Symposium pages

BIBLIOGRAPHY

P M Chen E K Lee G A Gibson R H Katz and D A Patterson

RAID highp erformance reliable secondary storage ACM Computing Surveys

June

YJ Chiang Dynamic and IOEcient Algorithms for Computational Geom

etry and Graph Problems Theoretical and Experimental Results PhD thesis

Brown University August

YJ Chiang Exp eriments on the practical IO eciency of geometric algo

rithms Distribution sweep vs plane sweep In Proc Workshop on Algorithms

and Data Structures LNCS pages

YJ Chiang M T Go o drich E F Grove R Tamassia D E Vengro and

J S Vitter Externalmemory graph algorithms In Proc ACMSIAM Symp

on Discrete Algorithmspages

A Colvin and T H Cormen ViC A Compiler for VirtualMemory C Tech

nical Rep ort PCSTR Dartmouth College Computer Science Hanover

NH November

D Comer The ubiquitous Btree ACM Computing Surveys

T H Cormen Virtual Memory for Data Paral lel Computing PhD thesis De

partment of Electrical Engineering and Computer Science Massachusetts Insti

tute of Technology

T H Cormen Fast p ermuting on disk arrays Journal of Paral lel and Distributed

Computing

T H Cormen and M T Go o drich Position Statement ACM Workshop on

Strategic Directions in Computing Research Working Group on Storage IO for

LargeScale Computing ACM Computing Surveys A Dec

T H Cormen and M Hirschl Early Exp eriences in Evaluating the Parallel Disk

Mo del with the ViC Implementation Paral lel Computing

T H Cormen and D Kotz Integrating theory and practice in parallel le sys

tems In Proceedings of the DAGSPC Symposium pages Hanover

NH June Dartmouth Institute for Advanced Graduate Studies Revised

as Dartmouth PCSTR on

BIBLIOGRAPHY

T H Cormen C E Leiserson and R L Rivest Introduction to Algorithms

The MIT Press Cambridge Mass

T H Cormen and D M Nicol Performing OutofCore FFTs on Parallel Disk

Systems Paral lel Computing

T H Cormen T Sundquist and L F Wisniewski Asymptotically tight b ounds

for p erforming BMMC p ermutations on parallel disk systems SIAM Journal of

Computing

T H Cormen J Wegmann and D M Nicol Multipro cessor OutofCore FFTs

with Distributed Memory and Parallel Disks Technical Rep ort PCSTR

Dartmouth College Computer Science Hanover NH January

A Crauser K Mehlhorn E Althaus K Brengel T Buchheit J Keller

H Krone O Lamb ert R Schulte S Theil M Westphal and R Wirth On the

p erformance of LEDASM Technical Rep ort MPII MaxPlanckInstitut

fur Informatik Nov

D E Culler R M Karp D Patterson A Sahay E E Santos K E Schauser

R Subramonian and T von Eicken Logp A practical mo del of parallel com

putation Communications of the ACM Nov

F Dehne X Deng P Dymond A Fabri and A Khokhar A randomized

parallel d convex hull algorithm for coarse grained multicomputers In Proc

ACM Symp on Paral lel Algorithms and Architectures pages

F Dehne W Dittrich and D Hutchinson Ecient external memory algo

rithms by simulating coarsegrained parallel algorithms In Proc ACM Symp

on Paral lel Algorithms and Architectures pages

F Dehne W Dittrich D Hutchinson and A Maheshwari Parallel virtual

memory In Proc ACMSIAM Symp on Discrete Algorithms pages

F Dehne W Dittrich D Hutchinson and A Maheshwari Reducing IO com

plexity by simulating coarse grained parallel algorithms In Proc International

Symposium pages Paral lel Processing

F Dehne A Fabri and A RauChaplin Scalable parallel computational geom

etry for coarse grained multicomputers International Journal on Computational

Geometry

BIBLIOGRAPHY

F Dehne and A RauChaplin Implementing data structures on a hyp ercub e

multipro cessor and applications in parallel computational geometry Journal of

Paral lel and Distributed Computing

E W Dijkstra Co op erating sequential pro cesses In F Genuys editor Pro

gramming Languages pages Academic Press

W Dittrich Communication and IO Ecient Paral lel Data Structures PhD

thesis University of Paderb orn

W Dittrich D Hutchinson and A Maheshwari Blo cking in parallel multisearch

problems In Proc ACM Symp on Paral lel Algorithms and Architectures pages

H Edelsbrunner and M Overmars Batched dynamic solutions to decomp osable

searching problems Journal of Algorithms

R W Floyd Permuting information in idealized twolevel storage In Complexity

of Computer Calculations pages R Miller and J Thatcher Eds

Plenum New York

S Fortune and J Wyllie Parallelism in random access machines In Proc ACM

Symp on Theory of Computation pages

A Gerb essiotis and C Siniolakis Communication Ecient Data Structures on

the BSP Mo del with Applications in Computational Geometry In Proceedings

of EUROPARAugust

A V Gerb essiotis and L G Valiant Direct BulkSynchronous Parallel Algo

rithms Journal of Paral lel and Distributed Computing

J E Go o dman and J ORourke Handbook of Discrete and Computational

Geometry CRC Press Bo ca Raton FL

M Go o drich Communication ecient parallel sorting In Proc ACM Symp on

Theory of Computation

M T Go o drich JJ Tsay D E Vengro and J S Vitter Externalmemory

computational geometry In Proc IEEE Symp on Foundations of Comp Sci

pages

H Hellwagner Design considerations for scalable parallel le systems The

Computer Journal

BIBLIOGRAPHY

W Ho eding Probability inequalities for sums of b ounded random variables

American Statistical Association Journal pages

S Huddleston and K Mehlhorn A new data structure for representing sorted

lists Acta Informatica

D Hutchinson A Maheshwari JR Sack and R Velicescu Early exp eriences

in implementing the buer tree In Proceedings of the Workshop on Algorithm

Engineering httpwwwdsiuniveit wa e pro ceed ings

D Knuth The Art of Vol Sorting and Searching

AddisonWesley

LAM MPI httpwwwoscedulamhtm l

K Mehlhorn Data Structures and Algorithms Sorting and Searching

SpringerVerlag EATCS Monographs on Theoretical Computer Science

K Mehlhorn and S Naher LEDA A platform for combinatorial and geometric

computing Communications of the ACM

Message Passing Interface Standards web site httpwwwmpiforumorg

K Munagala and A Ranade IO complexity of graph algorithms Proc ACM

SIAM Symp on Discrete Algorithms pages

M H No dine M T Go o drich and J S Vitter Blo cking for external graph

searching Algorithmica

M H No dine and J S Vitter Deterministic distribution sort in shared and dis

tributed memory multipro cessors In Proc ACM Symp on Paral lel Algorithms

and Architectures pages

M H No dine and J S Vitter Paradigms for optimal sorting with multiple

disks In Proc of the th Hawaii Int Conf on Systems Sciences

M H No dine and J S Vitter Greed sort Optimal deterministic sorting on

parallel disks Journal of the ACM

Y N Patt The IO subsystema candidate for impro vement Guest Editors

Introduction in IEEE Computer

D A Patterson G Gibson and R H Katz A case for redundant arrays of

inexp ensive disks raid Proc ACM Conf on Management of Data pages

BIBLIOGRAPHY

W Paul U Vishkin and H Wagener Parallel dictionaries on trees In Proc

Int Col loquium Algorithms Languages and Programming LNCS pages

J T Poole Preliminary survey of IO intensive applications Technical Rep ort

CCSF Scalable IO Initiative Caltech Concurrent Sup ercomputing Facilities

Caltech

A RauChaplin On Paral lel Data Structures and Paral lel Geometric Applica

tions for Multicomputers PhD thesis Carleton University

J H ReifEditor Synthesis of Paral lel Algorithms Morgan Kaufman

C Ruemmler and J Wilkes An intro duction to disk driv e mo deling IEEE

Computer

Scalable IO Initiative Concurrent Sup ercomputing Consortium

httpwwwcacrcaltechedu SI O

H Shi and J Schaeer Parallel sorting by regular sampling Journal of Paral lel

and Distributed Computing

J Sib eyn and M Kaufmann Bsplike externalmemory computation In Proc

rd Italian Conference on Algorithms and Complexity LNCS pages

R W Stevens Advanced Programming in the Unix Environment Addison

Wesley Don Mills Ont Canada

L G Valiant A bridging mo del for parallel computation Communic ations of

the ACM August

D E Vengro A transparent parallel IO environment In Proc DAGS

Symposium on Paral lel Computation

D E Vengro TPIE User Manual and Reference Duke University

Available via WWW at httpwwwcsdukeedu dev

D E Vengro and J S Vitter Supp orting IOecient scientic computation

in TPIE In Proc IEEE Symp on Paral lel and Distributed Computing

e University Dept of Computer Science technical rep ort App ears also as Duk

CS

D E Vengro and J S Vitter Ecient d range searching in external memory

In Proc ACM Symp on Theory of Computation pages

BIBLIOGRAPHY

J S Vitter External memory algorithms Proc ACM Symp Principles of

Database Systems pages

J S Vitter and E A M Shriver Algorithms for parallel memoryI Twolevel

memories Algorithmica

J S Vitter and E A M Shriver Algorithms for parallel memory I I Hierarchical

multilevel memories Algorithmica

K Yogaratnam CGM Algorithms as EMCGM Algorithms Technical rep ort

Carleton University Scho ol of Computer Science Ottawa Ontario Canada

Undergraduate Honours Pro ject

N Zeh An externalmemory data structure for short

est path queries Masters thesis Fakultat fur Mathematik

und Informatik FriedrichSchillerUniversitat Jena Germany

httpwwwscscarletonca nz eh Publ icat ions th esis ps gz Novem b er