A Robust Void-Finding Algorithm Using Computational Geometry and Parallelization Techniques
UNIVERSIDAD DE CHILE
FACULTAD DE CIENCIAS FÍSICAS Y MATEMÁTICAS
DEPARTAMENTO DE CIENCIAS DE LA COMPUTACIÓN

A ROBUST VOID-FINDING ALGORITHM USING COMPUTATIONAL GEOMETRY AND PARALLELIZATION TECHNIQUES

THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

DEMIAN ALEY SCHKOLNIK MÜLLER

ADVISORS: BENJAMÍN BUSTOS CÁRDENAS, NANCY HITSCHFELD KAHLER
COMMITTEE MEMBERS: MAURICIO CERDA VILLABLANCA, GONZALO NAVARRO BADINO, MAURICIO MARÍN CAIHUAN

SANTIAGO DE CHILE
2018

Summary

The current and most widely accepted cosmological model of the universe is called Lambda Cold Dark Matter. It is the simplest model that provides a reasonably good explanation of the evidence observed so far. The model suggests the existence of large-scale structures in our universe: nodes, filaments, walls and voids. Voids are of great interest to astrophysicists, since observing them serves as validation of the model. Voids are usually defined as low-density regions of space that contain only a few galaxies.

In this thesis, we present a study of the current state of void-finding algorithms. We show the different techniques and approaches, and attempt to derive the algorithmic complexity and memory usage of each void-finder presented. We then present our new void-finding algorithm, called ORCA. It was built using Delaunay triangulations to find the nearest neighbours of each point. Using these neighbours, we classify the points into three categories: center, border and outliers. Outliers are removed as noise. We then classify the triangles of the triangulation into void triangles and center triangles. This is done by checking a distance criterion and whether the triangles contain outliers. This method allows us to build a fast and robust void-finding algorithm. Additionally, a parallel version of the algorithm is presented.
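As a rough illustration of the three-way point classification (center, border, outlier) mentioned above, the following Python sketch may help. It is not the ORCA implementation. It swaps in a brute-force neighbour search where the thesis uses Delaunay neighbours, and it assumes a DBSCAN-style rule, because the exact criterion is not stated in this front matter. The function names `classify_points` and `remove_outliers`, and the way `k` and `eps` are interpreted here, are hypothetical.

```python
import math


def classify_points(points, k, eps):
    """Label each 2D point as 'center', 'border' or 'outlier'.

    Assumed rule (DBSCAN-style, not taken from the thesis):
      - center:  at least k other points lie within distance eps
      - border:  not a center, but within eps of some center point
      - outlier: everything else
    Neighbours are found by brute force, which is O(n^2).
    """
    n = len(points)
    # neighbours[i] holds the indices of all points within eps of point i.
    neighbours = [
        [j for j in range(n)
         if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_center = [len(nb) >= k for nb in neighbours]

    labels = []
    for i in range(n):
        if is_center[i]:
            labels.append("center")
        elif any(is_center[j] for j in neighbours[i]):
            labels.append("border")
        else:
            labels.append("outlier")
    return labels


def remove_outliers(points, labels):
    """Drop the points labelled as outliers (treated as noise)."""
    return [p for p, label in zip(points, labels) if label != "outlier"]


# Hypothetical toy data: a dense 3x3 grid, one point on its fringe,
# and one isolated point far away from everything else.
pts = [(x, y) for x in range(3) for y in range(3)] + [(3.2, 1.0), (10.0, 10.0)]
labels = classify_points(pts, k=3, eps=1.5)
cleaned = remove_outliers(pts, labels)
```

On this toy input the nine grid points come out as center, the fringe point as border, and the isolated point as an outlier that `remove_outliers` discards. A Delaunay-based variant would replace the brute-force `neighbours` list with neighbours read from the triangulation.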
Abstract

Cosmic voids are generally described as large, underdense regions of the Universe. In recent years there have been many attempts to build algorithms that find voids in the large-scale structure of the Universe. Many different methods exist, but most approaches do not consider robustness. In this thesis, we present an efficient, fast and robust void-finding algorithm built on a series of computational geometry and parallelization techniques. We take advantage of the properties of Delaunay triangulations and of k-nearest-neighbour search algorithms. Additionally, we built a parallel (GPU) version of the algorithm, which is useful for large data sets because it reduces running time.

We successfully built a cosmic void-finding algorithm that is both robust and efficient. We tested the algorithm on randomly generated two-dimensional data sets and found most voids on most sets, with an average retrieval rate above 90%. To test robustness, we inserted random noise into voids; the algorithm proved highly tolerant to it, still detecting a void even with 200 noise points inside it. Regarding running time, the new algorithm is around three times as fast as the algorithm against which it was benchmarked. The parallel version is about twice as fast as the sequential algorithm.

Dedication

Dedicated to my parents, my advisors, and my friends.
Contents

1 Introduction
  1.1 Motivation
  1.2 Research Questions
  1.3 Hypothesis
  1.4 General Objective
  1.5 Specific Objectives
    1.5.1 Development Plan
  1.6 Contributions
    1.6.1 Algorithmic Complexity and running time
    1.6.2 Memory Usage
    1.6.3 Effectiveness
    1.6.4 Robustness
    1.6.5 Parallelization
2 Basic Concepts
  2.1 Review of current Literature
    2.1.1 Adaptive tree (ART)
    2.1.2 Gridding, and cube growing
    2.1.3 Distance Field by gridding, then climbing algorithm
    2.1.4 Statistical analysis
    2.1.5 Delaunay tetrahedra / triangulation
    2.1.6 Distance Field and Watershed techniques
    2.1.7 Voronoi Tessellation and watershed techniques
    2.1.8 Wall Builder and Sphere Growing
    2.1.9 Discussion
  2.2 Computational Techniques
    2.2.1 Delaunay Triangulation
    2.2.2 K-Nearest-Neighbor Search
    2.2.3 KD-Tree
    2.2.4 Parallelization, OpenCL and pyOpenCL
  2.3 Research Methodology
    2.3.1 Performance Indicators
3 Cosmic Void-Finding Algorithm
  3.1 First Approaches
    3.1.1 Brute Force
    3.1.2 Delaunay Triangulation
    3.1.3 KD-Tree
  3.2 Improved Solutions
    3.2.1 KD-Tree with high k + Image Processing
    3.2.2 ORCA: Higher generation Delaunay Triangulation k-NN search for Noise removal + edge removal
  3.3 Discussion and selection of algorithm
  3.4 Parallel Approach
    3.4.1 Parallelization tools and frameworks
    3.4.2 First approach
    3.4.3 Final parallel version
4 Results and Analysis
  4.1 Generated Samples
  4.2 Void-finding algorithm comparison
    4.2.1 Comparison conditions and Data Sets
    4.2.2 DELFIN and ORCA
    4.2.3 Performance Indicators
5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
Bibliography

List of Tables

2.1 Overview of existing void-finders
4.1 Running time comparison between DELFIN, ORCA, and ORCA Parallel
4.2 Memory used by DELFIN and by ORCA

List of Figures

2.1 A Delaunay Triangulation with circumcircles shown
2.2 A representation of 3-NN search [?]
2.3 KD-Tree
3.1 KD-Tree algorithm run with n=8192 points, k=9, ε=100, plotting center points only
3.2 Example of second-gen Triangulation Neighbors
4.1 A randomly generated 4096-point dataset
4.2 Voids found on a randomly generated 4096-point dataset, using 4th generation Delaunay neighbors, ε value of 100 and k value of 15
4.3 A randomly generated 8192-point dataset
4.4 Voids found on a randomly generated 8192-point dataset, using 3rd generation Delaunay neighbors, ε value of 100 and k value of 8
4.5 A randomly generated 32768-point dataset
4.6 Voids found on a randomly generated 32768-point dataset, using 5th generation Delaunay neighbors, ε value of 80 and k value of 20
4.7 Recovery and Error Rates of irregular voids over a 1,000-point set, using values of k=3 and ε=150
4.8 Recovery and Error Rates of irregular voids over a 5,000-point set, using values of k=7 and ε=100
4.9 Recovery and Error Rates of irregular voids over a 10,000-point set, using values of k=12 and ε=70
4.10 Recovery and Error Rates of irregular voids over a 50,000-point set, using values of k=15 and ε=50
4.11 Recovery and Error Rates of regular voids over a 1,000-point set, using values of k=3 and ε=200
4.12 Recovery and Error Rates of regular voids over a 5,000-point set, using values of k=7 and ε=120
4.13 Recovery and Error Rates of regular voids over a 10,000-point set, using values of k=10 and ε=100
4.14 Recovery and Error Rates of regular voids over a 50,000-point set, using values of k=14 and ε=70
4.15 Recovery and Error Rates of regular voids over a 100,000-point set, using values of k=14 and ε=70
4.16 Comparison between ORCA (left)