Title: Efficient Parallel and Distributed Algorithms for Gis Polygon Overlay Processing
Total Page:16
File Type:pdf, Size:1020Kb
TITLE: EFFICIENT PARALLEL AND DISTRIBUTED ALGORITHMS FOR GIS POLYGON OVERLAY PROCESSING by SATISH PURI Under the Direction of Sushil K. Prasad, PhD ABSTRACT Polygon clipping is one of the complex operations in computational geometry. It is used in Geographic Information Systems (GIS), Computer Graphics, and VLSI CAD. For two polygons with n and m vertices, the number of intersections can be O(nm). In this disser- tation, we present the first output-sensitive CREW PRAM algorithm, which can perform polygon clipping in O(logn) time using O(n + k + k0) processors, where n is the number of vertices, k is the number of intersections, and k0 is the additional temporary vertices introduced due to the partitioning of polygons. The current best algorithm by Karinthi, Srinivas, and Almasi does not handle self-intersecting polygons, is not output-sensitive and must employ O(n2) processors to achieve O(logn) time. The second parallel algorithm is an output-sensitive PRAM algorithm based on Greiner-Hormann algorithm with O(logn) time complexity using O(n + k) processors. This is cost-optimal when compared to the time com- plexity of the best-known sequential plane-sweep based algorithm for polygon clipping. For self-intersecting polygons, the time complexity is O(((n+k)lognloglogn)=p) using p ≤ (n+k) processors. In addition to these parallel algorithms, the other main contributions in this dissertation are 1) multi-core and many-core implementation for clipping a pair of polygons and 2) MPI- GIS and Hadoop Topology Suite for distributed polygon overlay using a cluster of nodes. Nvidia GPU and CUDA are used for the many-core implementation. The MPI based system achieves 44X speedup while processing about 600K polygons in two real-world GIS shapefiles 1) USA Detailed Water Bodies and 2) USA Block Group Boundaries) within 20 seconds on a 32-node (8 cores each) IBM iDataPlex cluster interconnected by InfiniBand technology. INDEX WORDS: Polygon Clipping, Polygon Overlay, Parallel Algorithms, MPI, MapRe- duce, GIS. TITLE: EFFICIENT PARALLEL AND DISTRIBUTED ALGORITHMS FOR GIS POLYGON OVERLAY PROCESSING by SATISH PURI A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the College of Arts and Sciences Georgia State University 2015 Copyright by Satish Puri 2015 TITLE: EFFICIENT PARALLEL AND DISTRIBUTED ALGORITHMS FOR GIS POLYGON OVERLAY PROCESSING by SATISH PURI Committee Chair: Sushil K. Prasad Committee: Ying Zhu Rafal Angryk Shamkant Navathe Electronic Version Approved: Office of Graduate Studies College of Arts and Sciences Georgia State University August 2015 iv DEDICATION This dissertation is dedicated to my loving and supportive wife, Priti Giri, who has been proud and supportive of my work and who has shared the many uncertainties, challenges, and sacrifices for completing this dissertation. I am truly thankful for having you in my life. This dissertation is also dedicated to my always encouraging parents, Kishor Puri and Piu Kumari Puri. Thank you for all the unconditional love, guidance, and support that you have always given me. v ACKNOWLEDGEMENTS This thesis is the result of research carried out over a period of many years. During this period, many people supported my work and helped me to bring it to a successful conclusion, and here I would like to express my gratitude. This dissertation work would not have been possible without the support of many people. I want to express my gratitude to my advisor Dr. Sushil K. Prasad for his support and invaluable guidance throughout my study. His knowledge, perceptiveness, and innovative ideas have guided me throughout my graduate study. I also present my words of gratitude to the other members of my thesis committee, Dr. Ying Zhu, Dr. Rafal Angryk, and Dr. Shamkant Navathe for their advice and their valuable time spent in reviewing the material. I also would like to extend my appreciation to my colleagues in DiMoS research group for all the support and ideas and to everyone who offered me academic advice and moral support throughout my graduate studies. vi TABLE OF CONTENTS ACKNOWLEDGEMENTS ::::::::::::::::: v LIST OF TABLES :::::::::::::::::::: x LIST OF FIGURES :::::::::::::::::::: xi LIST OF ABBREVIATIONS :::::::::::::::: xvi PART 1 INTRODUCTION ::::::::::::::: 1 1.1 Polygon Overlay Processing ....................... 1 1.2 Motivation .................................. 2 1.3 Proposed Research and Development Agenda ............ 3 1.3.1 Goals . 4 1.3.2 Overview of Main Results . 4 PART 2 A SURVEY ON PARALLEL SPATIAL OVERLAY AND JOIN OVER CLOUD AND GPU-ACCELERATED HY- BRID PLATFORMS :::::::::::::: 8 2.1 Definitions .................................. 8 2.2 Historical Background .......................... 8 2.3 Scientific Fundamentals .......................... 10 2.3.1 Cloud computing . 10 2.3.2 Windows Azure Platform . 11 2.3.3 Hadoop MapReduce . 11 2.4 Key Applications .............................. 12 2.4.1 Polygon Overlay Processing using Windows Azure . 12 2.4.2 Spatial Overlay and Join using Hadoop . 13 vii 2.4.3 Spatial Overlay and Join using GPUs . 15 2.4.4 Accelerated Hadoop for Spatial Join . 17 2.5 Future Directions ............................. 17 PART 3 OUTPUT-SENSITIVE PARALLEL ALGORITHM FOR POLY- GON CLIPPING ::::::::::::::: 18 3.1 Introduction ................................. 18 3.2 Background and Literature ....................... 21 3.2.1 The Computational Model . 21 3.2.2 Polygon clipping . 21 3.2.3 Segment Tree . 22 3.3 Algorithm for clipping two polygons . 23 3.3.1 Terminology . 23 3.3.2 Vatti's Polygon Clipping Algorithm . 23 3.3.3 Problem Definition . 25 3.3.4 Multi-Way Divide and Conquer Algorithm . 25 3.3.5 Complexity Analysis . 35 3.4 Multi-threaded Multiway Divide and Conquer Algorithm . 37 3.5 Experimental Setup and Implementation Results . 39 3.5.1 Synthetic Data . 39 3.5.2 Real Data . 42 PART 4 PARALLEL GREINER-HORMANN BASED POLYGON CLIPPING ALGORITHM WITH IMPROVED BOUNDS 44 4.1 Introduction ................................. 44 4.2 Background Concepts and Related Work . 47 4.2.1 NVIDIA Graphics Processing Unit . 47 4.2.2 Sequential Polygon Clipping of Arbitrary Polygons . 48 4.2.3 Differences between GH and Vatti's algorithm . 48 viii 4.2.4 Time Complexity . 49 4.3 Parallel Greiner-Hormann Polygon Clipping Algorithm and Imple- mentations .................................. 49 4.3.1 Terminology . 49 4.3.2 PRAM Polygon Clipping Algorithm . 53 4.3.3 Multi-threaded implementation design . 56 4.4 Experiment and Evaluation ....................... 57 4.4.1 Multi-threaded Parallel Greiner-Hormann Clipping Algorithm Evalu- ation . 58 PART 5 MPI-GIS : GIS POLYGONAL OVERLAY PROCESSING USING MPI ::::::::::::::::: 64 5.1 Introduction ................................. 64 5.2 Background and Literature ....................... 66 5.2.1 Data Types in GIS . 66 5.2.2 R-Tree . 66 5.2.3 Crayons system on Azure cloud . 67 5.2.4 Clipper Library . 67 5.3 Master-Slave architecture based system design . 68 5.3.1 Architecture with dynamic load balancing using R-tree . 68 5.3.2 Architecture with dynamic load balancing using sorting-based algo- rithm . 70 5.3.3 Our system with Static Load Balancing . 71 5.3.4 MPI related Issues . 72 5.3.5 Clipper Library Related Issues . 72 5.3.6 Timing Characteristics and Experiments . 73 5.4 Optimizations based on file and space partitioning . 77 5.4.1 Input Data Processing and Organization . 78 5.4.2 Spatial Partitioning . 79 ix 5.4.3 MPI Communication . 81 5.4.4 MPI-GIS Overlay Processing System Evaluation . 82 5.4.5 Comparison with earlier work . 84 PART 6 MAPREDUCE ALGORITHMS FOR GIS POLYGONAL OVERLAY PROCESSING :::::::::::: 86 6.1 Introduction ................................. 86 6.2 Background and Literature ....................... 88 6.2.1 Map Overlay . 88 6.2.2 R-tree Data Structure . 89 6.2.3 MapReduce Framework . 90 6.2.4 Overlay and Miscellaneous Approaches . 91 6.3 Map Reduce Algorithms ......................... 92 6.3.1 Problem Definition . 92 6.3.2 Algorithm With Map and Reduce Phases . 93 6.3.3 Overlay Algorithm with Chained Map Reduce Phases . 95 6.3.4 Overlay Algorithm with Map Phase Only using DistributedCache 96 6.3.5 Grid-based Overlay Algorithm with a single Map and Reduce phase 99 6.4 Experimental Setup ............................ 100 PART 7 CONCLUSIONS :::::::::::::::: 107 REFERENCES ::::::::::::::::::::: 109 x LIST OF TABLES Table 3.1 Example shows the merging of sublists Al and Ar in an internal node of Cole's mergesort tree in a time-stepped fashion. Al = f5,6,7,9g and Ar = f1,2,3,4g. Also note the inversions marked for reporting by our extended Cole's merging algorithm. 30 Table 3.2 Scanbeam table showing the edges and labeled vertices of output poly- gons after processing intersections, start/end points for all scanbeams. 32 Table 3.3 Description of real-world datasets. 41 Table 4.1 Execution timings in ms for Parallel GH using simulated datasets with varying number of vertices (n) in S. C has 10 vertices (m). There are O(nm) intersections. ......................... 59 Table 4.2 Execution timing in ms for GPC, CUDA-GH and Parallel GH using 8 CPU cores. For Classic+Ocean, to measure GPC time, two threads are used. First thread clips Classic and second thread clips Ocean. 61 Table 5.1 Description of real-world datasets. 82 Table 6.1 Intersection graph represented as an adjacency list. 96 Table 6.2 Description of data sets used in experiments. 102 xi LIST OF FIGURES Figure 2.1 Polygon Overlay in Hadoop Topology Suite (HTS) using a single Map and Reduce phase. .......................... 13 Figure 3.1 Segment Tree ............................. 22 Figure 3.2 Example showing intersection of two self-intersecting concave poly- gons. .................................. 24 Figure 3.3 (I) Different types of partial polygons in a scanbeam with L/R edge labels (L stands for left, R stands for right), (II) Assigning labels to edges in a scanbeam .......................... 26 Figure 3.4 Intersection of 4 edges in a scanbeam.