<<

UNIVERSIDAD DE CHILE
FACULTAD DE CIENCIAS FÍSICAS Y MATEMÁTICAS
DEPARTAMENTO DE CIENCIAS DE LA COMPUTACIÓN

A ROBUST VOID-FINDING ALGORITHM USING COMPUTATIONAL GEOMETRY AND PARALLELIZATION TECHNIQUES

TESIS PARA OPTAR AL GRADO DE MAGÍSTER EN CIENCIAS, MENCIÓN COMPUTACIÓN

DEMIAN ALEY SCHKOLNIK MÜLLER

PROFESOR GUÍA:
BENJAMÍN BUSTOS CÁRDENAS
NANCY HITSCHFELD KAHLER

MIEMBROS DE LA COMISIÓN:
MAURICIO CERDA VILLABLANCA
GONZALO NAVARRO BADINO
MAURICIO MARÍN CAIHUAN

SANTIAGO DE CHILE
2018

Resumen

The current and most widely accepted cosmological model of the universe is called Lambda Cold Dark Matter. It is the simplest model that provides a reasonably good account of the evidence observed so far. The model suggests the existence of large-scale structures present in our universe: nodes, filaments, walls, and voids. Voids are of great interest to astrophysicists, since their observation serves as a validation of the model. Voids are usually defined as low-density regions in space, with only a few galaxies inside them. In this thesis, we present a study of the current state of void-finding algorithms. We show the different techniques and approaches, and we attempt to deduce the algorithmic complexity and memory usage of each void finder presented. We then present our new void-finding algorithm, called ORCA. It was built using Delaunay triangulations to find the nearest neighbors of each point. Using this, we classify points into three categories: center, border, and outliers. Outliers are removed as noise. We classify the triangles of the triangulation into void triangles and central triangles. This is done by checking a distance criterion, and whether the triangles contain outliers. This method allows us to create a fast and robust void-finding algorithm. Additionally, a parallel version of the algorithm is presented.

Abstract

Cosmic voids are generally described as large, underdense regions of the Universe. Over the past years, there have been many attempts to build algorithms to find voids in the large-scale structure of the Universe. There are many different methods, but most approaches do not consider robustness. In this thesis, we present an efficient, fast and robust void-finding algorithm, built using a series of computational geometry and parallelization techniques. We take advantage of the properties of certain structures, such as Delaunay triangulations, and of k-nearest-neighbour search algorithms. Additionally, we made a parallel version of the algorithm (on GPU), a useful feature for large data sets, since it speeds up running time. We successfully built a cosmic void-finding algorithm that is both robust and efficient. We tested the algorithm on randomly generated two-dimensional data sets, and found most voids on most sets, with an average retrieval rate above 90%. In order to test robustness, we inserted random noise into voids, and the algorithm proved to be highly tolerant to it, still detecting a void even with 200 noise points inside it. Regarding running time, the new algorithm is around three times as fast as the algorithm against which it was benchmarked. The parallel version is about twice as fast as the sequential algorithm.

Dedicatoria

Dedicated to my parents, my thesis advisors, and my friends.

Contents

1 Introduction 1

1.1 Motivation...... 1

1.2 Research Questions...... 2

1.3 Hypothesis...... 2

1.4 General Objective...... 2

1.5 Specific Objectives...... 2

1.5.1 Development Plan...... 3

1.6 Contributions...... 3

1.6.1 Algorithmic Complexity and running time...... 3

1.6.2 Memory Usage...... 3

1.6.3 Effectiveness...... 3

1.6.4 Robustness...... 4

1.6.5 Parallelization...... 4

2 Basic Concepts 5

2.1 Review of current Literature...... 5

2.1.1 Adaptive tree (ART)...... 5

2.1.2 Gridding, and cube growing...... 6

2.1.3 Distance Field by gridding, then climbing algorithm...... 7

2.1.4 Statistical analysis...... 8

2.1.5 Delaunay tetrahedra / triangulation...... 8

2.1.6 Distance Field and Watershed techniques...... 10

2.1.7 Voronoi Tessellation and watershed techniques...... 11

2.1.8 Wall Builder and Sphere Growing...... 11

2.1.9 Discussion...... 13

2.2 Computational Techniques...... 15

2.2.1 Delaunay Triangulation...... 15

2.2.2 K-Nearest-Neighbor Search...... 15

2.2.3 KD-Tree...... 16

2.2.4 Parallelization, OpenCL and pyOpenCL...... 17

2.3 Research Methodology...... 17

2.3.1 Performance Indicators...... 18

3 Cosmic Void-Finding Algorithm 19

3.1 First Approaches...... 19

3.1.1 Brute Force...... 19

3.1.2 Delaunay Triangulation...... 22

3.1.3 KD-Tree...... 24

3.2 Improved Solutions...... 25

3.2.1 KD-Tree with high k + Image Processing...... 25

3.2.2 ORCA: Higher generation Delaunay Triangulation k-NN search for Noise removal + edge removal...... 26

3.3 Discussion and selection of algorithm...... 29

3.4 Parallel Approach...... 30

3.4.1 Parallelization tools and frameworks...... 30

3.4.2 First approach...... 31

3.4.3 Final parallel version...... 32

4 Results and Analysis 34

4.1 Generated Samples...... 34

4.2 Void-finding algorithm comparison...... 38

4.2.1 Comparison conditions and Data Sets...... 38

4.2.2 DELFIN and ORCA...... 38

4.2.3 Performance Indicators...... 39

5 Conclusions and Future Work 47

5.1 Conclusions...... 47

5.2 Future Work...... 48

Bibliography 49

List of Tables

2.1 Overview of existing void-finders...... 6

4.1 Running time comparison between DELFIN, ORCA, and ORCA Parallel...... 39

4.2 Memory used by DELFIN and by ORCA...... 40

List of Figures

2.1 A Delaunay Triangulation with circumcircles shown...... 15

2.2 A representation of 3-NN search [?]...... 16

2.3 KD-Tree...... 16

3.1 KD-Tree algorithm run with n=8192 points, k=9, ε=100, plotting center points only..... 25

3.2 Example of second-gen Triangulation Neighbors...... 27

4.1 A randomly generated 4096-point dataset...... 35

4.2 Voids found on a randomly generated 4096-point dataset, using 4th generation Delaunay neighbors, ε value of 100 and k value of 15...... 35

4.3 A randomly generated 8192-point dataset...... 36

4.4 Voids found on a randomly generated 8192-point dataset, using 3rd generation Delaunay neighbors, ε value of 100 and k value of 8...... 36

4.5 A randomly generated 32768-point dataset...... 37

4.6 Voids found on a randomly generated 32768-point dataset, using 5th generation Delaunay neighbors, ε value of 80 and k value of 20...... 37

4.7 Recovery and Error Rates of irregular voids over a 1,000-point set, using values of k=3 and ε=150...... 41

4.8 Recovery and Error Rates of irregular voids over a 5,000-point set, using values of k=7 and ε=100...... 42

4.9 Recovery and Error Rates of irregular voids over a 10,000-point set, using values of k=12 and ε=70...... 42

4.10 Recovery and Error Rates of irregular voids over a 50,000-point set, using values of k=15 and ε=50...... 42

4.11 Recovery and Error Rates of regular voids over a 1,000-point set, using values of k=3 and ε=200...... 43

4.12 Recovery and Error Rates of regular voids over a 5,000-point set, using values of k=7 and ε=120...... 43

4.13 Recovery and Error Rates of regular voids over a 10,000-point set, using values of k=10 and ε=100...... 44

4.14 Recovery and Error Rates of regular voids over a 50,000-point set, using values of k=14 and ε=70...... 44

4.15 Recovery and Error Rates of regular voids over a 100,000-point set, using values of k=14 and ε=70...... 44

4.16 Comparison between ORCA (left) and DELFIN (right), using a set of 8,192 points...... 45

4.17 Comparison between ORCA (left) and DELFIN (right), using a set of 8,192 points, showing in green similar voids found, and in orange voids with different shapes found...... 45

4.18 Comparison between ORCA (left) and DELFIN (right), using a set of 10,000 randomized points with a 200-radius void with 75 random noise points inside...... 46

4.19 Comparison between ORCA (left) and DELFIN (middle and right), using a set of 10,000 randomized points with a 200-radius void with 125 random noise points inside. For DELFIN the parameters were adjusted so that it found many small voids (middle) and on the next iteration of parameters it did not find voids (right)...... 46

Chapter 1

Introduction

The current, and most accepted, cosmological model of the universe is called Lambda Cold Dark Matter (ΛCDM). This is the simplest model that provides a reasonably good account of the observed evidence thus far. The model suggests the existence of large-scale structures present in our universe: nodes, filaments, walls, and voids. Voids are of great interest to astrophysicists since their observation serves as a validation for the model. Voids are usually defined as under-dense regions in space, with only a few galaxies inside them. This is, of course, an oversimplification. Many authors have different definitions of voids, which, in turn, makes the task of building a robust void finder very difficult.

1.1 Motivation

Voids can be defined as large, underdense sections of space, with very few or no galaxies. Voids typically have a diameter of 10 to 100 megaparsecs. They were first discovered in 1978 in a pioneering study by Stephen Gregory and Laird A. Thompson at the Kitt Peak National Observatory [?].

The applications of voids are broad and impressive, ranging from shedding light on the current understanding of dark energy, to refining and constraining cosmological evolution models. Voids act as bubbles in the Universe that are sensitive to background cosmological changes. This means that the evolution of a void's shape is in part the result of the accelerating expansion of the Universe. Since this acceleration is believed to be caused by dark energy, studying the changes of a void's shape over a period of time can further refine the ΛCDM model and provide a more accurate dark energy equation of state [?]. Additionally, the abundance of voids is a promising way to constrain the dark energy equation of state [?].

Broadly speaking, there are two main categories of void-finding algorithms. First, there are algorithms that are based on computing all distances between points, such as cube and sphere growing algorithms (see Section 2.1.2 and Section 2.1.8), and those which use a distance field (see Section 2.1.3 and Section 2.1.6). The algorithmic complexity of these algorithms is, for the most part, quadratic, due to the fact that they compute all distances between all points. This makes them somewhat slow for big data sets, and not really scalable.

The second type of algorithms are those that use computational geometry techniques (see Section 2.1.5 and Section 2.1.7). These algorithms, although faster than the quadratic ones, can be less precise. Finally, most algorithms are susceptible to noise in the data.

In this thesis, we propose a new method, using new approaches to find voids, such as combining Delaunay Triangulations with k-nearest-neighbor search algorithms. We develop a new, fast and robust cosmic void-finding algorithm, ORCA, that outperforms existing ones in terms of running time, memory usage, and effectiveness, as well as robustness to noise in the data. We will benchmark the new algorithm and compare it to an existing one (DELFIN [?]), in terms of running time, memory usage, effectiveness, and robustness, in order to validate it.

1.2 Research Questions

• Are there faster and more robust algorithms for void-finding that are as effective as existing ones?

• How can we implement a parallel Void-Finding Algorithm?

1.3 Hypothesis

• K-nearest neighbor search algorithms combined with Delaunay triangulations can be used to build a robust cosmic Void-Finding algorithm that outperforms existing ones in terms of running time, memory usage, effectiveness, and robustness.

• GPU computing can be used to parallelize void-finding algorithms, speeding up running time.

1.4 General Objective

• Create a robust, fast and effective cosmic Void-finding Algorithm, along with a parallel version of it.

1.5 Specific Objectives

• Design and implement a new Void-Finding Algorithm using Delaunay triangulations and k-nearest neighbor search algorithms.

• Design and implement a parallel Void-Finding Algorithm using GPU computing techniques.

• Benchmark the algorithms and compare them to an existing Void-Finding Algorithm, in terms of running time, memory usage, effectiveness, and robustness.

1.5.1 Development Plan

1. Research of the current state of the art.

2. Design, implement and test the k-nearest neighbor search algorithm (k-NNS) over a two-dimensional data set. Refine both parameters (k and epsilon), and try out relevant dependencies (epsilon depending on k and vice versa).

3. Design, implement and test an algorithm to classify points according to the results of k-NNS.

4. Design, implement and test a hybrid algorithm between k-NNS and Delaunay triangulation or Voronoi tessellation.

5. Design, implement and test an algorithm that is able to run parallel and that takes advantage of GPU computing.

6. Benchmark ORCA in terms of running time, memory usage, effectiveness, and robustness, and compare it to DELFIN [?].

1.6 Contributions

1.6.1 Algorithmic Complexity and running time

Most reviewed Void-Finding Algorithms present quadratic algorithmic complexity if they are implemented without special data structures or algorithms. This means that as the data grows, the running time becomes impractical. In this sense, the new algorithm has an algorithmic complexity lower than that bound, namely O(n log n). Also, the running time of ORCA is similar to or faster than that of DELFIN, and the parallel version of ORCA is faster still.

1.6.2 Memory Usage

In terms of memory, the reviewed algorithms should not perform badly; in fact, most use an amount of memory linear in the size of the data. ORCA also uses a linear amount of memory, that is, only as much memory as the original data set multiplied by some constant factor. Concretely, ORCA uses a base memory of ∼38 MB, plus ∼0.0018 MB per point (so, for example, a 100,000-point data set needs roughly 38 + 180 = 218 MB).

1.6.3 Effectiveness

When detecting cosmic voids, we set out to recover at least 80% of the voids found by existing void finders. When analyzing individual voids, we also set out to achieve at least a 50% overlap between the areas of the voids found by our algorithm and those found by existing ones (DELFIN [?]). Both goals were achieved and surpassed.

1.6.4 Robustness

The new algorithm is very robust. This implies that strange sets of data do not break it. It also implies that large data sets are processed correctly and that voids are found, even in the presence of noise in the data.

1.6.5 Parallelization

The parallel version of ORCA runs on multiple CPU cores and on the GPU at the same time. The most time-consuming part runs in parallel, in order to reduce bottleneck effects. This was an important factor we considered when building the algorithm and choosing which techniques to use.

Chapter 2

Basic Concepts

In the present chapter we will address three points. First, we will talk about the present state of cosmic void-finders, in an extensive review of the current literature (see Section 2.1). Next, we present the computational techniques used throughout this thesis (see Section 2.2). Finally, the research methodology is presented and explained (see Section 2.3).

2.1 Review of current Literature

In this section, a series of related studies on cosmic void finders are shown. The studies have been sorted by algorithmic strategy. Table 2.1 shows a summary of the existing void finders. Since most studies do not present their algorithms directly, we have made an effort to estimate their algorithmic complexity, assuming average cases.

2.1.1 Adaptive tree (ART)

Gottlöber et al. [?] address the problem of voids using high-resolution N-body simulations. All numerical simulations were run using the adaptive tree (ART) N-body code of Kravtsov, Klypin & Khokhlov [?].

The ART has linear running time depending on the number of cells, Nc, i.e. ∼ O(Nc). An adaptive mesh refinement technique is used to achieve high resolution in the regions of interest. The authors start with running a low-resolution simulation and proceed to higher-resolution simulations as they assign velocity and displacement to the particles. Patiri et al. [?] also use this technique in one of their algorithms, which is run over n-body simulations.

The search for voids starts with the construction of the minimal spanning tree of the haloes. The simulation is then searched for the point with the largest distance R1 to the set of haloes. This point is the center of the largest void (with a radius of R1). The search continues with the next point, not already within a void, that has the greatest distance, and so on.

Family | Name
Delaunay / Voronoi | Structure in the 3D Distribution. II. Voids and Watersheds of Local Maxima and Minima
Delaunay / Voronoi | Zipf's law for fractal voids and a new void-finder
Delaunay / Voronoi | On Finding Large Polygonal Voids Using Delaunay Triangulation: The Case of Planar Point Sets
Delaunay / Voronoi | ZOBOV: a parameter-free void-finding algorithm
Distance Field | A Simple Void-Searching Algorithm
Distance Field | Voids in a CDM universe
Distance Field | Statistics of voids in the two-degree Survey
Distance Field | A cosmic watershed: the WVF void detection technique
Growing | Voids in the distribution of galaxies: an assessment of their significance and derivation of a void spectrum
Growing | Void scaling and void profiles in cold dark matter models
Growing | Voids in the PSCz Survey and the Updated Zwicky Catalog
Growing | The Size, Shape and Orientation of Cosmological Voids in the Sloan Digital Sky Survey
Growing | Automated Detection of Voids in Redshift Surveys

Table 2.1: Overview of existing void-finders

As the authors of [?] explain, the search for the minimum spanning tree of haloes can be achieved in ∼O(N). Then comes the complex part, namely the search for the voids themselves. If the simulation has a cube edge length of k, then they have to search for the point with the greatest distance R1 to the haloes. Assuming there are h haloes, this would take ∼O(hk³), just for the first void. This has to be repeated until all voids are found. If there are v voids, the algorithm should run in ∼O(vhk³).

2.1.2 Gridding, and cube growing

Gridding is the process of dividing space into a grid. Gridding algorithms usually start off by dividing space into a cubical grid, and then flagging cubes according to emptiness criteria.

G. Kauffmann and A. P. Fairall [?] defined voids as spherical regions, completely empty of galaxies. They constrain the shape of the void by fitting an ellipsoid or sphere to the galaxies on the boundaries of the voids. They can then calculate the volume of the void by measuring the volume of the sphere or ellipsoid. The other restriction is the size of the void. Their algorithm is called VOIDSEARCH. The data was split into cubes, and empty and non-empty cubes were marked as 'off' and 'on'. The algorithm then searches for cubical voids. To better approximate the spherical or ellipsoidal shape of the void, the program attempted to append single-layered, rectangular groupings of 'off' cubes, each equal to or smaller in area than the face onto which it is added, and covering no less than two-thirds of that face. The algorithm proceeded by finding the largest base voids first, progressing down in size to the smaller voids, adding adjacent faces as explained above. An important parameter is the cube edge length: a smaller gridding length gives a higher resolution, but has an impact on the running time, as we will discuss below.

S. Arbabi-Bidgoli and V. Müller [?] use an adapted version of the void search algorithm proposed by Kauffmann & Fairall [?]. They use a high-resolution density field grid where each galaxy occupies one grid cell. The first void is found, and then smaller and smaller ones are added to the list of voids. In order to find a void, empty cubes are formed with empty cells. Once the biggest cubes have been found and tagged as voids, the faces start checking if they can expand the void a little further, with the condition that the area covered by empty cells is larger than two-thirds of the face area.

Clearly, we can see that there is a direct relationship between cube edge length and running time. A smaller cube edge length leads to more cubes, so if the cube edge length is s, and the total edge length of the 'data cube' to be analyzed is M, we would have a grid consisting of (M/s)³ cubes. Additionally, the algorithm needs to traverse all objects of the survey in order to mark cubes as 'on' or 'off'. We will call the number of objects present in the survey N. We can mark the cubes in linear time, O(N), depending directly on the number of particles in the survey. After this initial step, the program looks for voids. For each cube marked 'off', all neighboring faces have to be checked recursively. The recursive check, for a single cube, takes O(k³/N). But every 'off' cube has to be visited at least once, so that gives us an estimated order of O((k³ − N) · (k³/N)) = O(k⁶/N − k³). In order to save the 'on'/'off' grid in memory, we need k³ boolean variables in the three-dimensional array. The N galaxies have to be saved as triads of floats, or similar. In order to save the voids, we need only the center and radius. However, since there are normally few voids (less than 20), this is negligible in terms of memory.

2.1.3 Distance Field by gridding, then climbing algorithm

This is a mathematical approach to voids. We start off by defining a scalar field D : L³ → R as the distance of a given point x to the nearest galaxy. In this way, we can define D(x) = min_n {|x − X_n|}, where X_n, n = 1, ..., N are the locations of the particles. This is what J. Aikio and P. Mähönen [?] call a Distance Field (DF). Thus, local maxima of the DF are the points in space with the longest distance to the nearest galaxy and are then taken as the "centers" of voids.

Their algorithm first defines a cubical mesh over the survey volume. The L³ cube is divided into k³ elementary cells, where k = L/s and s is called a resolution parameter. Afterward, for each elementary cell center, they calculate the minimum distances to the other points. That results in a discrete DF D(x). From there they find the local maxima.

After the previous steps, they have to divide the elementary cells among the voids. This is done using the ”climber algorithm”. The void to which a certain cell belongs is found by ”climbing” on the DF, towards the center, in other words, one of the local maxima. It is easy to see that as the climbing goes on, every cell along the way belongs to the same void, and is marked as such. If the climbing gets to a cell already belonging to a void, the climbing ends.
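To make the idea concrete, the following is a schematic two-dimensional sketch of the distance-field-and-climber strategy described above. It is only an illustration of the technique, not the authors' code: the grid size, the helper names and the use of SciPy's cKDTree are assumptions made here, and the published algorithm works on a three-dimensional mesh.

import numpy as np
from scipy.spatial import cKDTree

def climber_labels(galaxies, L=1000.0, k=64):
    # Distance field D(x) on a k x k grid of cell centers (2-D illustration only).
    s = L / k                                   # resolution parameter
    xs = (np.arange(k) + 0.5) * s
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    df = cKDTree(galaxies).query(cells)[0].reshape(k, k)   # distance to nearest galaxy

    label = np.full((k, k), -1, dtype=int)      # void id of each cell (-1 = unassigned)
    n_voids = 0
    for start in np.ndindex(k, k):
        path, cur = [], start
        while label[cur] == -1:
            path.append(cur)
            i, j = cur
            neigh = [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if (di or dj) and 0 <= i + di < k and 0 <= j + dj < k]
            nxt = max(neigh, key=lambda c: df[c])
            if df[nxt] <= df[cur]:              # local maximum: center of a new void
                label[cur] = n_voids
                n_voids += 1
                break
            cur = nxt                           # keep climbing towards higher DF values
        for c in path:                          # the whole path joins cur's void
            label[c] = label[cur]
    return df, label

galaxies = np.random.rand(2000, 2) * 1000.0
df, label = climber_labels(galaxies)
print(label.max() + 1, "void regions found")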

For every cell, they have to calculate the distance to every galaxy. In other words, this is O(k³/N), where N is the number of galaxies and k³ the number of cells. So calculating the DF, in total, takes O(k³ · (k³/N)) = O(k⁶/N).

Afterward, they have to take each cell and climb to the nearest void center. Every cell has to climb, so the complexity of this step should be around O(k³). This could present a problem if the resolution parameter s is very small (lower values of s over a given space L increase the value of k³). In order to save the DF in memory, we need k³ floats, to store all distances. The climbing part needs k³ ints, because every cell needs to be flagged as belonging to a void. Voids, as in all strategies, are negligible. This algorithm requires more memory than cube growing (see Section 2.1.2).

Colbert et al. [?] use a variant of the algorithm proposed by Aikio and Mähönen [?]. It is based on the assumption that voids are primordial negative overdensity perturbations that grew gravitationally and have reached shell crossing at the present time. The algorithm is tested over n-body simulations. The authors map all particles to a three-dimensional mesh. Then, the grid is smoothed adaptively. Local minima in the particle distribution are found, and spheres are centered on these minima. Complexity analysis shows that this variant of the algorithm performs the same way as the one described previously.

Patiri et al. [?] present us with a statistical analysis of voids. They use the data from the two-degree Field Galaxy Redshift Survey and define voids as non-overlapping maximal spheres empty of haloes or galaxies. Additional constraints are mass or luminosity above a given value. The algorithm is called CELLS Void Finder and was created in order to search, based on a grid, for all the voids in a galaxy sample. The algorithm calculates the distances between each of the empty grid cells and all the galaxies, keeping the minimum distance. With the list of minimum distances, they search for the local maxima, which correspond to the void centers.

2.1.4 Statistical analysis

Patiri et al. [?] called one of their algorithms HB Void Finder. First, they generate a sample of random trial spheres with a fixed radius R. The code then checks which spheres are empty, and keeps them. Then, for each empty trial sphere, they find the four nearest objects, and the sphere is expanded until they lie on its surface.

As the authors explain, the CPU time of the codes mainly depends on the number of particles and on the number and radius of trial spheres in the case of the HB algorithm, and on the number of cells (i.e. resolution) and the levels of neighboring cell marking for the Cell Void Finder. For the Cell Void Finder, each empty grid cell has to compute the distance to all galaxies, so if there are g galaxies and k³ empty grid cells, the running time of this part should be around O(E + gk³), where E is the number of generated trial spheres. This method requires E spheres to be stored, so E times (x, y, z, r): the coordinates and the radius. These could be stored as ints or floats according to the desired precision.

2.1.5 Delaunay tetrahedra / triangulation

A Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P). Delaunay triangulations maximize the minimum angle of all the angles of the triangles in the triangulation; they tend to avoid sliver triangles.

Way et al. [?] define a new algorithm which they call HOP. HOP is a parameter-free method of assigning groups of galaxies to local density maxima or minima. They also present a void finder that uses Delaunay tetrahedra. The general idea is to assemble a discrete set of entities, called objects, then compute the value of a function f (the so-called HOP function) for each object, and finally analyze the adjacency information among the objects.

In order to correctly identify voids, the algorithm first computes the Delaunay tessellation of the galaxy positions. After that, they identify groups of tetrahedra making up voids using HOP with f = tetrahedral volume. Then, for each Delaunay void containing N_void tetrahedra, all 4·N_void triangular faces of the tetrahedra making up the void are collected. The faces that appear on the list only once are identified, and thus we know that the vertices of such faces are on the surface. Finally, we can identify the internal galaxies that are not on the surface.

Gaite [?] tests whether Zipf's power law holds for voids found on fractal point sets. In order to do this, the author establishes a definition of voids based on discrete stochastic geometry, in particular, Delaunay and Voronoi tessellations. The algorithm is made for two-dimensional point sets, but could easily be extended to three-dimensional ones. The algorithm works as follows. First, a Delaunay and Voronoi tessellation is built for the point set. The Delaunay triangles are then sorted by size. The first (largest) triangle is selected to start building the first void. The void grows by adding adjacent triangles if the overlap criterion is met. This criterion states that the distance between the centers of two overlapping spheres (in this case, the circumscribing spheres of the Delaunay triangles) be less than a given fraction f of the smaller radius. The set of added triangles finally constitutes a void. The next unused biggest triangle is the starting point for the next void, and so on. The first part, the Delaunay triangulation, can be achieved with divide and conquer in O(n log n), as shown by Cignoni et al. [?]. Triangle sorting can be done by any modern sorting algorithm in O(n log n). The Voronoi diagram helps to quickly find the neighboring triangles. Every triangle has to be analyzed, so this step takes at most O(n). In memory, we would need to save N points and N triangles. With a custom data structure, this could take up very little memory.

Hitschfeld et al. [?] define a void as a zone of low point density in a point set, which may have some points in its interior. The main tool used here is the Delaunay triangulation. The idea is to look for the longest-edge of the triangulation that does not belong to any already found polygon. This idea is based upon the fact that when a void is present in a planar point distribution, the edges of the Delaunay triangulation that cross the void itself are local longest-edges in comparison with edges belonging to the neighborhood of the void.

The proposed algorithm begins by reading the Delaunay triangulation and the threshold value. Then, it orders the triangles by their longest edge. It takes the first two triangles from the list (the ones with the longest edges) and labels them as 'used'. A triangle set with those two triangles is constructed. Then, each neighbor of a triangle in the set is added to the set if and only if it shares its longest edge with a triangle from the set. In any case, checked triangles are labeled as used. If at the end of this step the area of the triangles in the triangle set is greater than the threshold value, then this triangle set is added to the list of voids. The process is then repeated for the next unused triangles in the sorted triangle list. As stated before, creating a Delaunay triangulation can be achieved in roughly O(n log n). The ordering of the triangles can be done with any modern sorting algorithm in O(n log n) too. Then, for each of the triangles in this list, we have to start looking for neighbors and for triangles which share its longest edge. However, in this step, if we use a triangle, we will not use it again, so this whole step takes only O(t), where t is the number of Delaunay triangles, which in this case is O(n), where n is the number of points in the set [?].

After this step, there is a void joining step. The subvoids are joined into candidate voids if they fulfill a certain criterion specified by the user (arc criterion, frontier criterion, second longest-edge criterion and frontier-edge criterion).

2.1.6 Distance Field and Watershed techniques

The watershed is a transformation (commonly used in the field of image processing) defined on a grayscale image. The name refers metaphorically to a geological watershed, or drainage divide, which separates adjacent drainage basins. The watershed transformation treats the image it operates upon like a topographic map, with the brightness of each point representing its height, and finds the lines that run along the tops of ridges.

Platen et al. [?] base their algorithms on the watershed transform (WST) of Beucher & Lantuejoul (1979) and Beucher & Meyer (1993). The WST is mainly used in geophysics and serves to segment images into regions and objects. It operates by 'filling' a landscape with water, starting at the lowest points. If two different sources of water touch, they form a ridge (which corresponds to saddle points in the density field). It possesses several desirable qualities for void-finding algorithms: it uses a relatively low number of parameters, it does not restrict the shape of a void, and it normally produces closed contours.

The first part of the algorithm creates a density field from a point distribution. The density field is then gridded and smoothed, using natural neighbor maxmin and median filtering. The next step is to transform the image into a discrete set of density levels and to eliminate pixel noise. Now the algorithm is ready to find the field minima and start the ’flooding’. Finally, once a pixel is reached by two distinct basins it is identified as belonging to their segmentation boundary. By continuing this procedure up to the maximum density level the whole region has been segmented into distinct void patches. The hierarchy is corrected by removing segmentation boundaries whose density is lower than some density threshold.

As seen in Section 2.1.3, the longest part is calculating the distance field (O(k⁶/N)). Then, finding the minima should take O(k³), since every cell has to be visited once and exactly once. The watershed has to 'paint' every cell, exactly once, so here we have again a bound of O(k³). As in the previous distance field approach (see Section 2.1.3), we will need k³ ints or floats, and for the watershed, each cell has to be marked with the void it belongs to, so again we will have to save k³ ints. The memory cost of voids, as usual, can be dismissed.

2.1.7 Voronoi Tessellation and watershed techniques

A Voronoi tessellation is a partitioning of a plane into regions based on distance to points in a specific subset of the plane. That set of points (called seeds) is specified beforehand, and for each seed, there is a corresponding region consisting of all points closer to that seed than to any other. These regions are called Voronoi cells. The Voronoi diagram of a set of points is dual to its Delaunay triangulation. Once the Voronoi tessellation is complete, the aforementioned watershed techniques are used.

Neyrinck [?] creates an algorithm called ZOBOV (ZOnes Bordering On Voidness), which finds depressions in a set of points. One of the advantages of Neyrinck's approach is that ZOBOV does not have any free parameters or assumptions about shape. The algorithm works based on Voronoi tessellations in order to estimate densities and to find voids and subvoids. The methods used are somewhat similar to the one used by Platen et al. [?], since both use tessellation techniques to measure densities, and both use the 'watershed' concept.

The algorithm starts off by estimating the density using a Voronoi tessellation. After tessellating, each particle i is given a density according to the formula 1/V(i), where V(i) is the volume of the Voronoi cell around particle i. The second step is to 'zone'. Each particle is sent to its neighbor with lower density until it arrives at a density minimum (called a zone's core). All Voronoi cells that 'flow' towards the same core are part of a void. However, due to discreteness noise, many zones are spurious, and so it is necessary to join some voids. The final step is the joining of voids. In this part, watershed techniques are used. For each zone z, the water level is set to z's minimum density and then raised gradually. The overflow then shows which voids to connect.

Sutter et al. [?] create a set of tools they call VIDE. At its core, VIDE uses a substantially enhanced version of ZOBOV to calculate a Voronoi tessellation for estimating the density field and performing a watershed transform to construct voids. Additionally, VIDE provides significant functionality for both pre- and post-processing.

Voronoi tessellations can be built in O(n log n) with Fortune's algorithm [?]. Calculating each Voronoi cell volume, and applying it to each particle, should take O(p), where p is the number of particles in the set. The zoning part has to be performed by every particle in the set, so again we have a lower bound of O(p). In memory, saving the Voronoi cells would need N cells. The density field depends on N too, as does the climbing.

2.1.8 Wall Builder and Sphere Growing

Hagai El-Ad and Tsvi Piran developed their algorithm in 1996, based on a model in which the main features of the Large-Scale Structure (LSS) of the Universe are voids and walls [?]. 'Walls' are thin 2D structures with high galaxy density. They define galaxies within walls as 'wall galaxies'. Wall galaxies then constitute boundaries between under-dense regions; the authors define these regions as voids. In this definition, voids are not completely empty. The few scattered galaxies inside voids are called 'field galaxies'. The algorithm defines a void as a continuous volume that does not contain any wall galaxies and is nowhere thinner than a given diameter. The algorithm is divided into two steps: Wall Builder and Void Finder. The algorithm uses three parameters: n, β and ξ.

A wall galaxy is required to have at least n other wall galaxies within a sphere of radius L around it. Every galaxy that does not satisfy this condition is classified as a field galaxy. The algorithm applies these conditions recursively until all the field galaxies are found. Let us say there are N galaxies in the dataset. Each galaxy has to be compared to all others, which is clearly ∼O(N²). This has to be done repeatedly until we have found all field galaxies. The Void Finder searches for spheres that are devoid of any wall galaxies; in other words, the authors keep the wall galaxies and discard the field galaxies. For a void with a maximal sphere of diameter d_max, the authors take only spheres with diameters larger than ξ·d_max, where ξ is the 'thinness parameter'. If the void is composed of more than one sphere, then each sphere must intersect at least one other sphere with a circle wider than the minimal diameter ξ·d_max. The authors approximated the number of voids to be expected via Poisson distributions. The algorithm stopped when the expected number of voids was found.

Later on, Fiona Hoyle and Michael S. Vogeley [?] study voids by using a method based on the one by El-Ad & Piran [?]. They apply their algorithm to the Point Source Catalogue Survey and the Updated Zwicky Catalog. Hoyle and Vogeley use n-body simulations, and also classify galaxies into ”field galaxies” and ”wall galaxies”. They first classify all galaxies in one of those categories. Then, they detect empty cells in the distribution of wall galaxies. Maximal empty spheres are grown, and then unique voids are classified. Finally, they enhance the void volume.

For the first part, they use the same method as El-Ad & Piran [?]. Then, when detecting empty cells, they place the wall galaxies onto a three-dimensional grid and count the number of galaxies in each cell. The authors refer to these empty grid cells as holes. Each hole is considered to be part of a possible void. Each hole starts to grow a sphere until it reaches a wall galaxy. Then, the sphere starts moving in the opposite direction of the wall galaxy found, and continues to grow. When a second wall galaxy is hit, they next find the vector that bisects the line joining the two galaxies and move the hole in this direction until a third galaxy is found, as before. This is repeated a final time.

Once all possible voids are detected, they are sorted by radius length, largest first. They assume the largest one is a void, and then, using a fractional overlap parameter, they calculate the overlaps of all voids and so join them together if they overlap by a significant amount. The final step, enhancement of void volume, goes as follows: They define a certain threshold and then compute the volume of each void by Monte Carlo integration, i.e. they embed it in a box that is larger than the void and generate many random particles within the box and count how many lie within one of the holes that make up the void.

The first part was already discussed previously. To count the number of empty cells, they have to take every galaxy and put it into one of the cells. This can be done in linear time, with simple geometry. If there are n galaxies, then this would take O(n). The growing part is more tricky. Most cells will be empty, so we can assume n holes. Each has to start growing a sphere. We will assume that this takes a certain number of iterations until the first wall galaxy is found. More voids imply fewer sphere-growing iterations, so if there are v voids, we can assume an average of k³/(2v) iterations, where k³ is the number of cubes. So, this step takes a total of O(k³ · k³/(2v)) = O(k⁶/(2v)). Then comes the classification step. This depends largely on the number of voids found in the previous step; let us call that number v₀. For each void, they have to overlap it with all other bigger ones, so that leaves us with something a little better than O(v₀²) (since we are not comparing every possible void to every other, but every void just with the bigger ones). Void enhancement seems to use a classic Monte Carlo algorithm, in which case the runtime would be bounded by O(v), where v is the number of voids found in the previous step. In memory, we will need to save only the wall galaxies, so something less than N. Spheres can be saved as k³ records of 4 coordinates.

Foster and Nelson [?] came up with an algorithm by extending the one from Hoyle & Vogeley [?] described previously. The authors apply the algorithm to the Data Release 5 of the Sloan Digital Sky Survey. A statistical analysis of the distribution of the size, shape, and orientation of voids is performed. The Void Finding algorithm is divided into seven steps. (i) data input; (ii) classification of galaxies as field or wall galaxies; (iii) detection of the empty cells in the distribution of wall galaxies; (iv) growth of the maximal sphere; (v) classification of the unique voids; (vi) enhancement of the void volume; and (vii) calculation of the void properties.

The data input step processes the data and transforms the coordinates. Every point undergoes a mathematical transformation, which takes ∼O(n). To classify the galaxies, the average distance to the third nearest galaxy, as well as the standard deviation of that value, is computed. This has a lower bound of O(n²), since we have to compare each galaxy with every other one in the set. The detection of empty cells will depend entirely upon the resolution used. With k³ cells, and having to traverse every one of them, the running time is O(k³). The sphere-growing part starts off with the position of the nearest galaxy to the center of every empty cell. A first growing vector, pointing from the nearest galaxy to the center of the empty sphere, is computed, and the radius is gradually increased. The algorithm goes through the entire set of galaxies and finds the one which yields the smallest sphere whose center has moved along the vector. Every empty cell has to be traversed, and every time we have to compare it with every galaxy. This leaves us with a lower bound of O(nk³). The final steps are bounded by the classification of unique voids, which has to compare each void to a certain parameter for the radius, so this step is linear in the number of voids found.

2.1.9 Discussion

There seem to be two big algorithm families. On the one hand, we have cube and sphere growing ([?,?,?,?,?]) and the calculation of a distance field ([?,?,?,?]). All these algorithms involve an essentially quadratic computation. However, since we are working in a three-dimensional space, the relevant parameter of the gridding is the cube edge length. By shrinking the cube edge length, the number of cubes grows cubically. So, in the end, the algorithms, by depending on the cube edge length, are actually bounded by a power of six (O(k⁶)). This could present problems with running time when the resolution is too high. To better see why growing spheres or cubes and computing a distance field are algorithmically similar, note that when building a distance field, each cell is searching for its nearest neighbor, and the best way to do this is usually by growing a sphere or cube.

The second group of algorithms are those using Delaunay triangulations and tetrahedra, or Voronoi cells ([?,?,?,?]). These algorithms show running times bounded by the building of the triangulation/tessellation, which is near O(n log n), where n is the number of triangles, a quantity that correlates linearly with the number of initial points. The processing done after this initial step usually involves only traversing the triangles or cells, so normally around O(n). With bigger datasets, these algorithms should perform better than those of the first group.

2.2 Computational Techniques

In this section we present the computational techniques used in the experimental void-finding algorithms we built, as well as in the final version of the algorithm.

2.2.1 Delaunay Triangulation

For a given set P of discrete points in a plane, there is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) (see Fig. 2.1). Delaunay triangulations maximize the minimum angle of all the angles of the triangles in the triangulation; they tend to avoid sliver triangles. The triangulation is named after Boris Delaunay [?].

Figure 2.1: A Delaunay Triangulation with circumcircles shown
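The following minimal sketch shows how such a triangulation can be built and queried in practice, assuming SciPy's Delaunay wrapper (an illustrative choice, not necessarily the library used in the thesis implementation).

import numpy as np
from scipy.spatial import Delaunay

points = np.random.rand(100, 2) * 1000.0   # 100 random points in a 1000 x 1000 box
tri = Delaunay(points)

print(tri.simplices.shape)   # (number of triangles, 3): vertex indices of each triangle
print(tri.neighbors[0])      # triangles adjacent to triangle 0 (-1 means no neighbor)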

2.2.2 K-Nearest-Neighbor Search

K-Nearest Neighbor search (k-NN search) is a variant of the Nearest Neighbor search problem, proposed in 1973 by Donald Knuth as the Post office problem [?].

The Nearest Neighbor problem is defined as follows: given a set S of points in a space M and a query point q ∈ M, find the closest point in S to q. Usually, M is a metric space and dissimilarity is expressed as a distance metric; often, M is taken to be a d-dimensional vector space where dissimilarity is measured using the Euclidean distance. In the case of this work, we will be using a two-dimensional vector space and the Euclidean distance.

The k-NN problem is a direct generalization of the NN problem, where we need to find the k closest points.

Figure 2.2: A representation of 3-NN search [?]
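As an illustration, a brute-force k-NN query in the two-dimensional Euclidean setting used in this work can be written as follows (the function name and the random data are chosen here for demonstration only).

import numpy as np

def knn_brute_force(points, q, k):
    # Indices of the k points closest to the query point q (Euclidean distance).
    d = np.linalg.norm(points - q, axis=1)
    return np.argsort(d)[:k]

points = np.random.rand(1000, 2) * 1000.0
print(knn_brute_force(points, np.array([500.0, 500.0]), k=3))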

2.2.3 KD-Tree

A k-dimensional tree is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a special case of binary space partitioning trees. They were invented by Jon Louis Bentley in 1975 [?].

Every non-leaf node can be thought of as implicitly generating a splitting hyperplane that divides the space into two parts, known as half-spaces. Points to the left of this hyperplane are represented by the left subtree of that node and points right of the hyperplane are represented by the right subtree (see Figure 2.3). The hyperplane direction is chosen in the following way: every node in the tree is associated with one of the k-dimensions, with the hyperplane perpendicular to that dimension’s axis [?].

Figure 2.3: KD-Tree
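A minimal sketch of building a KD-tree and issuing the two kinds of queries used later in this work, assuming SciPy's cKDTree (an illustrative choice, not necessarily the exact structure used in our implementation):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(8192, 2) * 1000.0
tree = cKDTree(points)

# the 10 closest points to the first point (the point itself is included, at distance 0)
dist, idx = tree.query(points[0], k=10)

# all epsilon-neighbors of the first point
eps = 100.0
neigh = tree.query_ball_point(points[0], r=eps)
print(len(neigh) - 1)   # exclude the point itself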

2.2.4 Parallelization, OpenCL and pyOpenCL

By Parallelization we mean the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). Essentially, it acts as a pipeline of parallel processing between one or more GPUs and CPUs that analyzes data as if it were in image or other graphic form. While GPUs operate at lower frequencies, they typically have many times the number of cores. Thus, GPUs can process far more pictures and graphical data per second than a traditional CPU. Migrating data into graphical form and then using the GPU to scan and analyze it can create a large speedup.

OpenCL [?] (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms, consisting for example of CPUs, GPUs, DSPs, and FPGAs. OpenCL specifies a programming language (based on C99) for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task-based and data-based parallelism.

PyOpenCL [?] is a Python wrapper for OpenCL. It has object cleanup tied to the lifetime of objects. This idiom, often called RAII in C++, makes it much easier to write correct, leak- and crash-free code. A big advantage of PyOpenCL is its completeness, giving access to the complete list of OpenCL's features. Additionally, it has automatic error checking, translating all errors automatically into Python exceptions. PyOpenCL's base layer is written in C++, so it runs virtually as fast as the original OpenCL. PyOpenCL is open-source under the MIT license and free for commercial, academic, and private use.
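As a small illustration of the PyOpenCL workflow (context creation, kernel compilation, buffers, and kernel launch), the sketch below computes squared distances from every point to a fixed query point on the GPU. The kernel and the variable names are assumptions made for this example; this is not the thesis kernel.

import numpy as np
import pyopencl as cl

pts = (np.random.rand(8192, 2) * 1000.0).astype(np.float32)
qx, qy = np.float32(500.0), np.float32(500.0)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

src = """
__kernel void sqdist(__global const float2 *pts, float qx, float qy,
                     __global float *out) {
    int i = get_global_id(0);
    float dx = pts[i].x - qx;
    float dy = pts[i].y - qy;
    out[i] = dx * dx + dy * dy;
}
"""
prg = cl.Program(ctx, src).build()

pts_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=pts)
out = np.empty(len(pts), dtype=np.float32)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

prg.sqdist(queue, (len(pts),), None, pts_buf, qx, qy, out_buf)  # one work-item per point
cl.enqueue_copy(queue, out, out_buf)
print(out[:5])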

2.3 Research Methodology

Due to the exploratory nature of the Thesis, conventional development methodologies will have to be adjusted, and a more agile approach will be used. Techniques, algorithms, and combinations of both will be tried out, in order to determine the ones that will actually be used in the final iteration of ORCA. Therefore, the methodology will be action-research [?], [?].

Action-research consists of a series of repeating steps. The first step is the planning phase. Here, the groundwork for the next phase is laid out, and the researcher makes design and practical decisions. The idea is to make quick cycles, in order to arrive at this phase with new insights on the problem.

The next phase, act, is the most time-consuming phase since it consists of doing the actual work. In the case of this Thesis, most of the programming will be done in this phase, as well as future optimization work and tweaks to enhance performance.

In the observing phase, the researcher merely collects information and data. Regarding the algorithms, this is the step where the performance (running time and memory) will be benchmarked, as well as the accuracy of the algorithm and its resistance to noise in the data.

The last phase is reflect, where the researcher draws conclusions based on the observations made earlier. The next planning phase will depend on said conclusions. This means that the algorithm will be refined as the cycles repeat.

2.3.1 Performance Indicators

When testing the algorithm, four distinct performance indicators will be used. First, we will measure the running time of ORCA over the various data sets, and compare it to the running time of DELFIN [?]. The second performance indicator is the total memory usage. The third one is effectiveness, which will be measured as the percentage of overlap between the voids found by our algorithm and the ones found by the DELFIN algorithm [?]. As a fourth and last indicator, we will test ORCA's and DELFIN's robustness, by designing voids with noise inside them.

Chapter 3

Cosmic Void-Finding Algorithm

A new Void-Finding Algorithm has been developed, following a series of smaller trial-and-error iterations. This algorithm is described in Section 3.2.2. In Section 3.3 we discuss the reasoning behind the choice.

The rest of the chapter describes our first attempts at building a void finder. The first one is brute force, which gives us some insights and serves as a baseline for the next ones. We then describe a first approach to finding nearest neighbors using a Delaunay triangulation. The next algorithm uses a KD-tree as its base structure. The second section depicts the improved solutions. The first one uses high k and ε values, creating a sort of dense network, and then fills the voids according to their area, using image processing. The next algorithm uses gen generations of Delaunay triangulation neighbors in order to classify points and eliminate noise, and then proceeds to eliminate the edges of the Delaunay triangulation that are smaller than the ε used. Lastly, we build a parallel version of the algorithm, using PyOpenCL [?]. For more details on the used algorithms see Section 2.2.

3.1 First Approaches

Every algorithm described in this section uses a different method to classify points into three different categories. We will be using k-Nearest Neighbor search (see Section 2.2.2) for this end. Every algorithm uses two parameters: ε and k. We define two points as ε-neighbors if they are within distance ε of each other. A point is considered a center point if it has at least k ε-neighbors. If a point is not a center point, but one of its ε-neighbors is a center point, then the point is considered a border point. If the point is neither a center nor a border point, then it is considered an outlier point. This strategy allows us to remove outlier points as noise, thereby making the algorithm more robust (in this case, resistant to noise).
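A minimal sketch of this classification criterion is shown below. Here a plain distance matrix stands in for the neighbor lookup, which is the part that varies between the algorithms of this section; the function name is illustrative and this is not the thesis code.

import numpy as np

def classify_points(points, eps, k):
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))                   # full distance matrix
    neigh = (dist <= eps) & ~np.eye(len(points), dtype=bool)   # epsilon-neighbor matrix
    center = neigh.sum(axis=1) >= k                            # at least k epsilon-neighbors
    border = ~center & (neigh & center[None, :]).any(axis=1)   # a center among the neighbors
    outlier = ~center & ~border
    return center, border, outlier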

3.1.1 Brute Force

The very first approach used is simply brute force. We build a distance matrix, storing the distances between all pairs of points (N²/2 distances). After that, we simply classify all points using the k-ε-neighbor criterion.

19 Code

Listing 3.1: Brute Force

DECLARE epsilon, k, file
READ file, with n points
DECLARE distance matrix M: shape(n,n)

//Filling distance matrix M
FOREACH point i IN point list:
    FOREACH point j IN point list:
        M[i,j] = M[j,i] = distance(point i, point j)

//Calculate epsilon depending on k:
//epsilon is the mean distance of the k-th neighbor.
DECLARE EMPTY LIST kNearest
FOR i FROM 0 TO n:
    APPEND first k elements of sorted M[i] TO kNearest
DEFINE epsilon AS MEAN OF kNearest

DECLARE EMPTY LISTS center, candidates, outlier, border

//Check for center points.
FOREACH point i IN point list:
    DEFINE nrNeighbors AS 0
    FOREACH point j IN point list:
        IF (M[i,j] <= epsilon) AND i != j:
            INCREMENT nrNeighbors BY 1
    IF nrNeighbors >= k:
        APPEND point i TO center list
    ELSE:
        APPEND point i TO candidates list

//Move candidates to the border list if they fulfill the criterion.
//If not, put them on the outlier list.
FOREACH candidate IN candidates list:
    wasBorder = False
    FOREACH center IN center list:
        IF M[candidate,center] <= epsilon:
            APPEND candidate TO border list
            DEFINE wasBorder AS True
            BREAK
    IF NOT wasBorder:
        APPEND candidate TO outlier list

PLOT center, outlier, border lists

Code Description

First, we define a distance function. This is just the Euclidean distance d = √((x₁ − x₂)² + (y₁ − y₂)²). We then declare the appropriate ε, k, and the file to be read. We parse the data in the file, and save it into a variable, as a set of points. We create the distance matrix (with a shape of n × n) and proceed to fill it with each distance between points. This is one of the slowest parts of the algorithm: it takes O(n²) every time. Optionally, we can calculate a specific ε depending on k: we use the mean distance of the k-th neighbor in this case. We then declare a series of empty arrays, to put the points in (we will classify those points later). Now we loop over every point, and compare its distance to every other. If we find another point whose distance is less than or equal to ε, we count it as a neighbor. Once this loop is finished, if we have at least k neighbors, we add the point to the center point list. If not, we add it to the candidates list. After this step is over, we loop over all candidates, to check if they fulfill the border criterion: to have at least one center point among their ε-neighbors. Finally, we plot the points.
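The optional heuristic for deriving ε from k (the mean distance to each point's k-th nearest neighbor) can be sketched as follows; the helper name is illustrative and the snippet is not the thesis code.

import numpy as np

def epsilon_from_k(points, k):
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # full distance matrix, O(n^2) memory
    dist_sorted = np.sort(dist, axis=1)        # column 0 is each point itself (distance 0)
    return dist_sorted[:, k].mean()            # mean distance of the k-th neighbor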

3.1.2 Delaunay Triangulation

The next approach is calculating the k-ε-neighbors with a Delaunay triangulation (see Section 2.2.1). This saves a lot of computation time, since building a Delaunay triangulation takes O(n log n) time, and neighbors in the triangulation can be found in O(1) time. However, on average, each point has 6 neighbors in the triangulation. This means that for higher k, this method will not work properly, because the Delaunay triangulation is not going to provide more than 6 neighbors per point, on average.

Code

Listing 3.2: Delaunay Triangulation

DECLARE epsilon, k, file
READ file, with n points

//Make the Delaunay triangulation over the points.
triangulation = Delaunay(points)

DECLARE EMPTY point LISTS center, candidates, outlier, border

//Loop over every point.
FOR EACH p IN point list:
    DEFINE nrNeighbors AS 0
    FOR EACH neighbor IN find-neighbors(p, triangulation):
        IF distance(neighbor, p) <= epsilon:
            INCREMENT nrNeighbors BY 1
    IF nrNeighbors >= k:
        APPEND p TO center list
    ELSE:
        APPEND p TO candidates list

//Move candidates to the border list if they fulfill the criterion.
//If not, put them on the outlier list.
FOR EACH candidate IN candidates list:
    wasBorder = False
    FOR EACH center IN center list:
        IF distance(center, candidate) <= epsilon:
            APPEND candidate TO border list
            wasBorder = True
            BREAK
    IF NOT wasBorder:
        APPEND candidate TO outlier list

PLOT center, outlier, border lists

Code Description

As in the previous case, we start by declaring the desired ε, k, and the file with the points to be read. We proceed to read the file and store the points in a local variable. Afterwards, we build the Delaunay triangulation (see Section 2.2.1) and store it in a variable too. We declare the empty lists, to be filled out later. Now comes the classifying part: we loop over each point in the dataset. For each point, we look up its neighbors in the Delaunay triangulation. This is very fast (O(1)), as the triangulation already has the neighbors stored. We use these neighbors to check the ε criterion, adding points with at least k neighbors to the list of center points. The next step is to determine whether the candidates are outliers or border points. Once this step is finished, we proceed to plot the points.

3.1.3 KD-Tree

The last algorithm of this section uses a KD-Tree (see Section 2.2.3). The idea is to take advantage of the data structure, which allows us to check for k-Nearest Neighbors quickly (O(log n) time on average, worst case of O(n) ).
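A short sketch of this idea, assuming scipy's cKDTree and illustrative parameter values (not our actual code):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(1000, 2) * 2000   # illustrative data set
epsilon, k = 100.0, 8

tree = cKDTree(points)
# indices of all epsilon-neighbours of every point (each list includes the point itself)
neighbors = tree.query_ball_point(points, r=epsilon)

center = [i for i, nb in enumerate(neighbors) if len(nb) - 1 >= k]
center_set = set(center)
candidates = [i for i in range(len(points)) if i not in center_set]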

Code

Listing 3.3: KD-Tree

DECLARE epsilon, k, file
READ file, with n points
BUILD kdTree WITH points
DECLARE EMPTY point LISTS center, candidates, outlier, border

//check for center objects.
FOR EACH point:
    GET number of epsilon-neighbors FROM kdTree
    IF number of epsilon-neighbors >= k:
        APPEND point TO center list
    ELSE:
        APPEND point TO candidate list

//Move candidates from outlier to border,
//if they fulfill the criterion.
//If not, put them on the outlier point list.
FOR EACH candidate IN candidates list:
    wasBorder = False
    FOR EACH center IN center list:
        IF distance(candidate, center) <= epsilon:
            APPEND candidate TO border list
            SET wasBorder TO True
            BREAK
    IF NOT wasBorder:
        APPEND candidate TO outlier list

PLOT center, outlier , border lists

Code Description

The algorithm starts by declaring ε, k, and the file to be read. We read the file and store it in memory. We then build the KD-tree (see Section 2.2.3) with the given points. Empty lists are declared, to be filled later. We loop over every point, and for each one we get the number of ε-neighbors from the KD-tree. This is accomplished in O(log n) time on average (or O(n) in the worst case), given the structure of the KD-tree. Since we do this for every point in our set, this gives us an average total running time of O(n log n). As usual, if the k-ε-neighbor criterion is met, the point is put into the center list; otherwise, into the candidates list. Upon completion of this part, we check which candidates belong in the border point list and which in the outlier list.

3.2 Improved Solutions

3.2.1 KD-Tree with high k + Image Processing

If we run the previous KD-tree implementation with high values of k and ε, and we plot only the edges that connect center points, we notice that voids are generated naturally (see Figure 3.1). The general idea is to take these images, and via image processing analyze the voids generated this way. We will need a new parameter: A threshold value used to determine if the area of a void is big enough to classify it as a cosmic void.

Figure 3.1: KD-Tree algorithm run with n=8192 points, k=9, ε=100, plotting center points only

Code

Listing 3.4: KD-tree + Image Processing

DECLARE epsilon, k, threshold, file
READ file, with n points
GENERATE plot with KD-Tree
SAVE plot as plot-File
//remove borders and leave only the image
PROCESS plot-File

//loop over pixels:
FOR EACH pixel p IN plot-file:
    DEFINE toCheck AS empty list
    DEFINE voidPoints AS empty list
    APPEND p TO toCheck
    WHILE toCheck IS NOT EMPTY:

        DEFINE current AS POP FROM toCheck
        APPEND current TO voidPoints
        IF current(x+1) IS inside image AND current(x+1) is white:
            APPEND current(x+1) TO toCheck
        IF current(x-1) IS inside image AND current(x-1) is white:
            APPEND current(x-1) TO toCheck
        IF current(y+1) IS inside image AND current(y+1) is white:
            APPEND current(y+1) TO toCheck
        IF current(y-1) IS inside image AND current(y-1) is white:
            APPEND current(y-1) TO toCheck
    IF length(voidPoints) > threshold:
        FILL(voidPoints)

Code Description

We will first declare our parameters. In this case, we will use ε and k to generate the first plot. We will also need a threshold value, for the second part of the algorithm. We then read the file. Using the saved points, we generate the plot, and save it, using the KD-tree algorithm. The generated plot has big white borders, meaning that we need to crop the image first.

After the cropping (see Figure 3.1), we start to loop over each pixel. An earlier version of this code used a recursive algorithm to count adjacent white pixels, but due to constant stack overflows it was changed to an iterative implementation. The algorithm goes as follows: two empty lists are declared. The first one holds the pixels that are yet to be checked; whenever a white pixel is found, it is added to the second list, the void-pixel list. Once this step is completed, we can count how many pixels make up the void. If the count is above the defined threshold, we count that region as a cosmic void and paint all its pixels a random color. Alternatively, we can return the list of cosmic voids as sets of pixels.
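A compact Python sketch of this iterative flood fill, assuming the cropped plot is already available as a boolean numpy array img where white pixels are True (the function name and the visited bookkeeping are illustrative additions, not our exact code):

from collections import deque
import numpy as np

def find_pixel_voids(img, threshold):
    """Return lists of white-pixel coordinates forming regions larger than threshold."""
    h, w = img.shape
    visited = np.zeros_like(img, dtype=bool)
    voids = []
    for sy in range(h):
        for sx in range(w):
            if not img[sy, sx] or visited[sy, sx]:
                continue
            region, to_check = [], deque([(sy, sx)])
            visited[sy, sx] = True
            while to_check:                      # iterative, avoids stack overflows
                y, x = to_check.popleft()
                region.append((y, x))
                for ny, nx in ((y, x + 1), (y, x - 1), (y + 1, x), (y - 1, x)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        to_check.append((ny, nx))
            if len(region) > threshold:
                voids.append(region)
    return voids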

3.2.2 ORCA: Higher generation Delaunay Triangulation k-NN search for Noise removal + edge removal

The idea behind this algorithm is quite simple, and its low algorithmic complexity comes from that simplicity. As discussed in a previous section (Section 3.1.2), using a Delaunay triangulation to perform a k-NN search has the slight disadvantage that you cannot use values of k much higher than 6. However, there is a nice way to circumvent this limitation. If you use not only the direct neighbors of a point in the triangulation but also the neighbors of those neighbors (called, henceforth, second-generation or second-order neighbors), you can easily expand your possible k values. In fact, just by using second-gen neighbors, we can go from ∼6 up to ∼18 neighbors (∼6 from first-gen, plus ∼12 more from second-gen, as depicted in Figure 3.2, where we show first-gen neighbors in blue and second-gen neighbors in red).
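A small sketch of how such higher-generation neighbours can be gathered, assuming the indptr/indices arrays returned by scipy's vertex_neighbor_vertices shown earlier (the helper name and the gen parameter handling are illustrative):

def higher_gen_neighbors(i, gen, indptr, indices):
    """Neighbours of vertex i up to `gen` generations away in the triangulation."""
    frontier, seen = {i}, {i}
    for _ in range(gen):
        # expand the frontier by one generation, skipping vertices already seen
        frontier = {n for v in frontier
                      for n in indices[indptr[v]:indptr[v + 1]]} - seen
        seen |= frontier
    seen.discard(i)
    return seen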

With this in mind, we will conduct a higher-generation k-NN search in order to classify our points into the three categories (center, border and outlier points), as explained in Section 3.1. This search, because of the

Figure 3.2: Example of second-gen Triangulation Neighbors

properties of the Delaunay Triangulation, has a quick average running time of O(n log n). However, this is a bit slower than the actual creation of the Delaunay Triangulation. Thus, this part is the bottleneck of the algorithm. Once all points are classified, we begin the second step of the algorithm.

We will now classify every triangle in our Delaunay triangulation into two possible categories: center triangles or void triangles. If all three points of a triangle are center points, and all of its edges are smaller than ε, then it is a center triangle. Otherwise, we count it as a void triangle. Coloring all void triangles gives us a clear and quick view of the voids in the data set.
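A sketch of this triangle classification over a scipy Delaunay triangulation, assuming center holds the indices of the center points (names are illustrative, not our implementation):

import numpy as np

def classify_triangles(tri, points, center, epsilon):
    """Split Delaunay triangles into center triangles and void triangles."""
    center = set(center)
    dist = lambda a, b: np.linalg.norm(points[a] - points[b])
    center_tri, void_tri = [], []
    for simplex in tri.simplices:          # each simplex is a triangle (a, b, c)
        a, b, c = simplex
        if (a in center and b in center and c in center
                and dist(a, b) < epsilon and dist(a, c) < epsilon and dist(b, c) < epsilon):
            center_tri.append(simplex)
        else:
            void_tri.append(simplex)
    return center_tri, void_tri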

Code

Listing 3.5: ORCA

DECLARE epsilon, k, gen, file
READ file, with n points

//make the Delaunay triangulation over the points.
triangulation = Delaunay(points)

DECLARE EMPTY point LISTS center, candidates, outlier, border

//loop over every point
FOR EACH p IN point list:
    DEFINE nrNeighbors AS 0
    FOR EACH neighbor IN find-neighbors(p, gen, triangulation):
        IF distance(neighbor, p) <= epsilon:
            INCREMENT nrNeighbors BY 1
    IF nrNeighbors >= k:
        APPEND p TO center list
    ELSE:
        APPEND p TO candidate list

//Move candidates from outlier to border,
//if they fulfill the criterion.
//If not, put them on the outlier point list.
FOR EACH candidate IN candidate list:
    wasBorder = False
    FOR EACH center IN center list:
        IF distance(center, candidate) <= epsilon:
            APPEND candidate TO border list
            wasBorder = True
            BREAK
    IF NOT wasBorder:
        APPEND candidate TO outlier list

DECLARE EMPTY triangle LISTS void-triangle, center-triangle
FOR EACH triangle t IN triangulation:
    DEFINE a, b, c AS points of triangle t

    IF a, b, c IN center list AND distance(a,b), distance(a,c), distance(b,c) < epsilon:
        APPEND t TO center-triangle
    ELSE:
        APPEND t TO void-triangle

Code Description

As in the previous case, we start by declaring the desired ε, k, and the file with the points to be read. Additionally, we declare a gen parameter to specify how many generations of neighbors in the triangulation we want to look up. We proceed to read the file and store the points in a local variable. Afterwards, we build the Delaunay triangulation (see Section 2.2.1) and store it in a variable too. We declare the empty lists, to be filled later. Now comes the classifying part: we loop over each point in the data set. For each point, we look up its neighbors in the Delaunay triangulation. Here lies a significant difference with the previous Delaunay triangulation algorithm: we not only check the direct neighbors in the triangulation, but also look up to gen generations of neighbors. This is not as fast as the previous case, but it is still very fast, and it solves the problem for higher k. We use these neighbors to check the ε criterion, adding points with at least k neighbors to the list of center points. The next step is to determine whether the candidates are outliers or border points.

After the classification of points, we start classifying triangles. Two empty triangle lists are declared, one for center triangles and one for void triangles. For each triangle in the Delaunay triangulation, we define its three points as a, b and c. Then, we check whether a, b and c are all center points, and whether all edges of the triangle are smaller than ε. If so, we add that triangle to the center triangle list; if not, we put it into the void triangle list. This step is quite fast: a Delaunay triangulation of a data set with n points has O(n) triangles, and the lookups required per triangle take no more than O(log n) time, so we arrive at an upper bound of O(n log n) time for this step. After the triangles are classified, we can plot them or, alternatively, return them as output.
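For illustration only, the sketches above could be combined roughly as follows, assuming the helper functions higher_gen_neighbors and classify_triangles defined earlier, a point array points, and chosen values of epsilon, k and gen (the border/outlier step is omitted here):

import numpy as np
from scipy.spatial import Delaunay

# assumes: points (n x 2 array), epsilon, k, gen already defined
tri = Delaunay(points)
indptr, indices = tri.vertex_neighbor_vertices

# classify center points using higher-generation Delaunay neighbours
center = [i for i in range(len(points))
          if sum(np.linalg.norm(points[j] - points[i]) <= epsilon
                 for j in higher_gen_neighbors(i, gen, indptr, indices)) >= k]

center_tri, void_tri = classify_triangles(tri, points, set(center), epsilon)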

3.3 Discussion and selection of algorithm

The first approaches (see Section 3.1) show our initial attempts at building the void finder by using k-Nearest-Neighbor search (see Section 2.2.2). The brute force approach (see Section 3.1.1) uses a memory-heavy matrix to store all distances. Even worse: this step takes O(n²) time. The next step is equally slow: we need to traverse every point in our data set and compare it to the others in order to classify it. We then tried a faster approach, using a Delaunay triangulation (see Section 2.2.1 and Section 3.1.2) to classify points. This is already a huge improvement, since building the Delaunay triangulation takes O(n log n) time. The downside is that every point in the triangulation has 6 neighbors on average, meaning that this method will not work if we want to use higher values of k. This is because if we want to check whether a given point has, say, 10 ε-neighbors, the Delaunay triangulation will provide us with only about 6 of them; thus, we cannot know whether said point has the 10 ε-neighbors or not. The last approach we tried was using a KD-Tree (see Section 2.2.3 and Section 3.1.3). This algorithm is really fast, given the underlying data structure; however, the results obtained show that the classification of points is not very precise.

The next part (see Section 3.2) presents more refined solutions that yielded better results, mostly based upon the first approaches. First, we noted that with the basic KD-Tree version (see Section 3.1.3), if we plotted the data using a high value of k, most non-void parts of the data would end up covered, while voids would end up mostly empty. Using image processing, we could then extract the voids. The downside was the second part: the image processing step proved too slow.

Finally, we arrive at the best solution (see Section 3.2.2). We use a Delaunay triangulation and do not limit ourselves to the direct neighbors, but traverse further generations of neighbors so as not to limit the k parameter. By classifying points, we can remove outliers and clean up our data set. Finally, we classify the triangles in the triangulation into void and center triangles. This is the algorithm we selected for parallelization (Section 3.4) and for comparison purposes (Chapter 4).

3.4 Parallel Approach

As discussed in Section 3.3, we selected ORCA as the best solution. In this section we first discuss the different parallelization tools and frameworks available, and then show two approaches to a parallel version of ORCA.

3.4.1 Parallelization tools and frameworks

There are multiple tools for programming across multiple platforms and hardware such as CPUs, GPUs, etc. In this subsection we will discuss the various tools and frameworks that exist today, and why OpenCL (specifically, pyOpenCL) was chosen to make the parallel version of ORCA.

CUDA

CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs) [?]. It serves as a platform layer giving access to the GPU, and it works by executing kernels. CUDA can be programmed in C, C++, Fortran, Python and MATLAB. NVIDIA also provides a toolkit for developing GPU-accelerated applications. CUDA possesses numerous advantages. It supports scattered reads, meaning that code can read from arbitrary addresses in memory. It also offers unified virtual memory and unified memory, and it provides a fast shared memory region that can be accessed by threads. Additionally, it features fast downloads and read-backs to and from the GPU, and it has full support for integer and bitwise operations, including integer texture lookups.

However, CUDA also has several limitations. All CUDA source code has to be processed according to C++ rules, both for the host computer and for the GPU device. When interacting with OpenGL, the interoperability is one-way: OpenGL has access to registered CUDA memory, but not the other way around. Copying between host and device memory may incur performance hits, due to system bus bandwidth and latency. In order to achieve the best performance, threads have to run in groups of at least 32, and the total number of threads may number in the thousands. Unlike OpenCL, GPUs that can run CUDA code are only available from NVIDIA, and there is no fallback functionality nor an emulator. Run-time information and exception handling are only supported in host code, not in device code.

C++ AMP

C++ Accelerated Massive Parallelism (C++ AMP) is a C++ Library that provides tools for developing programs that compile and execute on data-parallel hardware such as GPUs. It is implemented on DirectX 11 by Microsoft, programming directly in C++. It is portable, and the Microsoft implementation is included in Visual Studio 2012, and it features a debugger and profiler support.

DirectCompute

Microsoft's DirectCompute is an API intended to run compute kernels on CPUs and GPUs on Windows Vista, Windows 7 and later versions. It was released with the DirectX 11 API. The main disadvantage is portability: a Windows machine is required in order to execute DirectCompute kernels. This restriction led us to discard this tool.

OpenCL

OpenCL (Open Computing Language) is a framework developed by the Khronos Group for programming across heterogeneous platforms, consisting mainly of CPUs and GPUs. It specifies programming languages based on C99 and C++11. OpenCL features an API to control the platform and thereby execute programs on the compute devices. It provides a standard interface for parallel computing, supporting not only data-based but also task-based parallelism.

Basically, OpenCL treats a computing system as an array of compute devices. These consist of CPUs, GPUs, etc. There is a C-like programming language used to program kernels. Kernels are series of instructions (functions) executed on OpenCL devices. The idea is that a single kernel execution may run on many of the processing elements in parallel.

One of the key features of OpenCL is its API. It allows programs running on the host to launch kernels on the compute devices, as well as to manage device memory. Third-party bindings exist for other programming languages and platforms, such as Python (pyOpenCL). This last part is crucial for this work, since the bulk of ORCA's software is written in Python.

3.4.2 First approach

The most time-consuming part of ORCA is the computation of distances between points. This is done in two different parts of the code: first, when looking for center points, and afterwards, when looking for border points. Thus, we define two kernels, one for each task.

The first approach was a direct modification of the original ORCA algorithm. In order to give the kernels the proper input, numpy vectors and matrices had to be constructed. Surprisingly, while benchmarking this first version, we saw that the running time was around 2 seconds slower than that of sequential ORCA. Further inspection revealed that the parallel part was extremely fast (usually less than a second), but the construction and preparation of the data took 5 or 6 seconds. Thus, a new version was created that does not pre-process as much data.

Next, we will show the kernel used:

Listing 3.6: First Kernel

__kernel void isCore(__global const float2 *points,
                     __global const int *matriz,
                     __global int *results,
                     int k, float epsilon, int ancho){
    int idx = get_global_id(0);
    int nrVec = 0;
    float2 A = points[idx];
    // walk the idx-th row of the precomputed neighbor matrix
    for(int i = 0; i < ancho; i++){
        int indexB = matriz[getPos(idx, i, ancho)];
        float2 B = points[indexB];
        if(distancia(A, B) <= epsilon){
            nrVec++;
        }
        if(nrVec >= k){
            results[idx] = 1;  // center point
            return;
        }
    }
    results[idx] = 0;
}

__kernel void isBorder(__global const float2 *points,
                       __global const int *matriz,
                       __global int *results,
                       float epsilon, int ancho){
    int idx = get_global_id(0);
    if(results[idx] == 0){
        float2 A = points[idx];
        for(int i = 0; i < ancho; i++){
            int indexB = matriz[getPos(idx, i, ancho)];
            if(results[indexB] == 1){
                float2 B = points[indexB];
                if(distancia(A, B) <= epsilon){
                    results[idx] = 2;  // border point
                    return;
                }
            }
        }
    }
}

3.4.3 Final parallel version

In the new, final version, each kernel receives the list of points, k, ε, and a list of results to be filled out. A single kernel instance takes a single point and calculates distances to the other points. The center-finding kernel checks whether a given point has at least k ε-neighbors, and marks this result as a 0 or a 1 in the results list. Later on, the border-finding kernel is called. Each instance checks whether its point has a center point at distance ε or less, and marks itself in the results list as a 2 if this is the case.

Code:

Listing 3.7: Kernel 2

__kernel void isCoreNxN(__global const float2 *points,
                        __global int *resultados,
                        int k, float epsilon, int ancho){
    int idx = get_global_id(0);
    int nrVec = 0;
    float2 A = points[idx];
    // compare against every other point
    for(int i = 0; i < ancho; i++){
        float2 B = points[i];
        if(i != idx && distancia(A, B) <= epsilon){
            nrVec++;
        }
        if(nrVec >= k){
            resultados[idx] = 1;  // center point
            return;
        }
    }
    resultados[idx] = 0;
}

__kernel void isBorderNxN(__global const float2 *points,
                          __global int *resultados,
                          float epsilon, int ancho){
    int idx = get_global_id(0);
    if(resultados[idx] == 0){
        float2 A = points[idx];
        for(int i = 0; i < ancho; i++){
            if(resultados[i] == 1){
                float2 B = points[i];
                if(distancia(A, B) <= epsilon){
                    resultados[idx] = 2;  // border point
                    return;
                }
            }
        }
    }
}
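A hedged host-side sketch of how such kernels can be launched with pyOpenCL, assuming the kernel source of Listing 3.7 is available as the string kernel_src (buffer handling simplified; names are illustrative, not the thesis code):

import numpy as np
import pyopencl as cl

def classify_parallel(points_np, k, epsilon, kernel_src):
    """points_np: float32 array of shape (n, 2). Returns 0/1/2 labels per point."""
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, kernel_src).build()

    n = len(points_np)
    mf = cl.mem_flags
    pts = np.ascontiguousarray(points_np, dtype=np.float32)   # maps to float2 on device
    res = np.zeros(n, dtype=np.int32)
    pts_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=pts)
    res_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=res)

    # one work-item per point: first mark center points, then border points
    prg.isCoreNxN(queue, (n,), None, pts_buf, res_buf,
                  np.int32(k), np.float32(epsilon), np.int32(n))
    prg.isBorderNxN(queue, (n,), None, pts_buf, res_buf,
                    np.float32(epsilon), np.int32(n))
    cl.enqueue_copy(queue, res, res_buf)
    return res   # 1 = center, 2 = border, 0 = outlier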

See Section 4.2.3 for a comparison of running times between the sequential version and the parallel one.

Chapter 4

Results and Analysis

In this chapter we show the results obtained by ORCA. First, we show some generated samples. Then, we compare ORCA with DELFIN [?].

4.1 Generated Samples

In this section, we show some results we found whilst running ORCA. The points were generated for the study and benchmarking of DELFIN [?].

Important note: it is common practice to test algorithms on two-dimensional artificial data sets before moving to real ones. Furthermore, real data comes in three dimensions, which is beyond the scope of the present thesis.

First, in Fig. 4.1 we can see a data set of 4096 randomly generated points, with a density of 1.024×10⁻³ points/area. After this, carefully placed and shaped irregular voids are carved out of the set.

When running ORCA over the 4096-point irregular data set, we can see that the algorithm detects most big voids. Strangely shaped and small voids, however, are not detected (see Fig. 4.2).

Next, we can see the data set expanded to 8192 random points, with a density of 2.048×10⁻³ points/area (with the same voids carved out of the data, see Fig. 4.3).

The algorithm still finds most voids on this more densely packed data set, and it still struggles to detect the smaller, oddly shaped voids (see Fig. 4.4).

The following figure (4.5) shows a randomly generated data set of 32768 points, with a density of 8.024×10⁻³ points/area. We can clearly see the voids, most of them built in non-traditional shapes in order to test our algorithm.

The point classifier successfully allows ORCA to classify most points as center points when using the appropriate values for ε and k (see Fig. 4.6).

Figure 4.1: A randomly generated 4096-point dataset

Figure 4.2: Voids found on a randomly generated 4096-point dataset, using 4th generation Delaunay neighbors, ε value of 100 and k value of 15

Figure 4.3: A randomly generated 8192-point dataset

Figure 4.4: Voids found on a randomly generated 8192-point dataset, using 3rd generation Delaunay neighbors, ε value of 100 and k value of 8

Figure 4.5: A randomly generated 32768-point dataset

Figure 4.6: Voids found on a randomly generated 32768-point dataset, using 5th generation Delaunay neighbors, ε value of 80 and k value of 20

4.2 Void-finding algorithm comparison

In this section, we aim to compare ORCA with DELFIN [?], in order to validate ORCA and benchmark it according to the objectives stated (see Section 1.5).

4.2.1 Comparison conditions and Data Sets

We will compare the new void finding algorithm, ORCA, with DELFIN [?].

For the first trials we used a randomly generated set of 1 024 two-dimensional points. As the development process continued, we used higher-density data sets, doubling the number of random points each time, up to 262 144 points.

Both algorithms were tested on the following machine:

• MacBook Pro (Retina, mid 2014)

• Processor: 2.6 GHz Intel Core i5

• Memory 8 GB 1600 MHz DDR3

• Graphics Intel Iris 1536 MB

4.2.2 DELFIN and ORCA

The DELFIN algorithm has two main parts [?] (see Section 2.1.5):

1. Subvoid building step: DELFIN builds terminal-edge regions from the Delaunay triangulation of the data set. Small terminal-edge regions are discarded by using area or length values. The remaining regions are considered as subvoids.

2. Joining subvoids: subvoids found in the previous part are joined into candidate voids if they fulfill a criterion specified by the user. Candidate voids are marked as voids if their area is larger than a minimum valid void area parameter.

The new algorithm, ORCA, also has two distinct steps (see Section 3.2.2):

1. A Delaunay triangulation is built. With it, ORCA identifies neighbors and classifies points into three categories: center points, border points and outliers.

2. Outliers are discarded, and border points together with the Delaunay triangulation are used to build the voids.

# of Points    DELFIN (s)    ORCA (s)    ORCA Parallel (s)
1 024          2.58          0.17        0.08
2 048          4.14          0.38        0.21
4 096          5.08          0.60        0.33
8 192          5.40          1.17        0.61
16 384         8.29          2.25        1.25
32 768         14.53         4.74        2.40
65 536         27.90         9.14        4.67
131 072        54.61         18.28       9.42
262 144        107.29        36.97       18.58

Table 4.1: Running time comparison between DELFIN, ORCA, and ORCA Parallel

4.2.3 Performance Indicators

When testing ORCA, four distinct performance indicators were used. First, we measured the running time of ORCA over the chosen datasets and compared it to the running time of the void finder DELFIN [?]. The second performance indicator was total memory usage. The third one was effectiveness, which will be measured as the % of overlap between the voids found by ORCA and the ones found by the DELFIN algorithm, as well as comparing it to the actual areas of the voids. As a fourth indicator, we measured robustness, i.e., the resistance to noise in the data.

Running time

The following table shows the running time, in seconds, of DELFIN and of ORCA. They were tested on the numbers of points shown in Table 4.1.

As we can see, both algorithms scale almost linearly with the number of points (i.e., if we double the points we roughly double the running time). However, ORCA outperforms DELFIN considerably: it is almost three times as fast.

Memory Usage

Table 4.2 shows the memory used by both DELFIN and ORCA.

Due to the structures used, and the way it was built, ORCA uses less memory than DELFIN.

# of Points    DELFIN Memory Used (MB)    ORCA Memory Used (MB)
1 024          42                         37
2 048          47                         39
4 096          58                         43
8 192          80                         52
16 384         126                        66
32 768         212                        98
65 536         391                        161
131 072        728                        287
262 144        1434                       502

Table 4.2: Memory used by DELFIN and by ORCA

Effectiveness

• Recovery and Error Rates:

To measure the performance of the algorithm, we used the same two parameters as used in Ortega [?]: rv, the retrieval rate for void v, and ev, the overdetection rate for void v. These quantities are respectively calculated as the detected fraction of the original void, and the non-void fraction of the detected void. More precisely,

we define the retrieval rate of a void as rv = A∩v / Av, where A∩v is the area of the intersection between the generated void v and the retrieved void v∗, and Av is the area of v. Similarly, we define the overdetection or error rate of a void as ev = 1 − A∩v / Av∗, where Av∗ is the area of the retrieved void v∗. For example, if a generated void of area 100 overlaps the corresponding retrieved void of area 120 over an area of 90, then rv = 0.9 and ev = 1 − 90/120 = 0.25.

In order to test our algorithm, we experimentally determined the best values for k and ε. Generally speaking, the values of k range from 3 (for sparse data sets) to close to 20 (for very dense data sets). In the case of ε, it ranges from 200 (for very sparse data sets) down to 50 (for very dense data sets). If the algorithm finds too few voids, too many points are being classified as center points, meaning that we have to increase k and/or decrease ε. Conversely, if the algorithm is overdetecting voids, too many points are being classified as outliers or border points, which means that we have to decrease k and/or increase ε.

Irregular voids:

Figure 4.7: Recovery and Error Rates of irregular voids over a 1.000 points set, using values of k=3 and ε=150.

As can clearly be seen in Figure 4.7, there were some voids not detected at all by ORCA. Many other voids were not detected accurately. This is due to the distance between points on the 1.000 points set, and the way ORCA works.

While detecting voids on the 5.000 points dataset (see Fig. 4.8), ORCA did considerably better. Just two voids were not detected, and most other voids were detected with only a small overdetection percentage.

On the 10.000 points set (Fig. 4.9), all voids were successfully detected; ORCA only had trouble recovering void 6 accurately.

Finally, on the 50.000 points set (Fig. 4.10), ORCA detected all voids with only small error and overdetection rates.

Figure 4.8: Recovery and Error Rates of irregular voids over a 5.000 points set, using values of k=7 and ε=100.

Figure 4.9: Recovery and Error Rates of irregular voids over a 10.000 points set, using values of k=12 and ε=70.

Figure 4.10: Recovery and Error Rates of irregular voids over a 50.000 points set, using values of k=15 and ε=50.

Regular Voids:

Figure 4.11: Recovery and Error Rates of regular voids over a 1.000 points set, using values of k=3 and ε=200.

Again, while testing ORCA on a 1.000 point set (Fig. 4.11), the algorithm had trouble detecting some of the voids, because of the low density of the data and voids being rather sparse.

Figure 4.12: Recovery and Error Rates of regular voids over a 5.000 points set, using values of k=7 and ε=120.

On the 5.000 points set (Fig. 4.12), ORCA did significantly better. Two voids were not detected, and two more had low recovery and high overdetection rates.

Four distinct voids were not detected on the 10.000 points set (Fig. 4.13). They were either too small or lay on a border.

On the 50.000 points set (Fig. 4.14), just one void was not detected (void 3). This void lay on the border, and so was not accounted for. For most other voids, the error was virtually non-existent.

On the 100.000 point set (Fig. 4.15), most voids that were not detected were voids that merged. ORCA detected them as a single void, and thus, they are not accounted for.

Figure 4.13: Recovery and Error Rates of regular voids over a 10.000 points set, using values of k=10 and ε=100.

Figure 4.14: Recovery and Error Rates of regular voids over a 50.000 points set, using values of k=14 and ε=70.

Figure 4.15: Recovery and Error Rates of regular voids over a 100.000 points set, using values of k=14 and ε=70.

• Comparison with DELFIN

In this section, we aim to compare the effectiveness of DELFIN and ORCA. To this end, we will compare the number and shapes of the found voids, as well as comparing it to the actual voids inserted in the data.

Fig. 4.16 shows a set of 8.192 points, with irregular shapes cut out. On the left side are the voids found by ORCA, and on the right, the ones found by DELFIN. Fig. 4.17 highlights in green the voids that approximately match in size, and in orange the voids that differ in shape or size. Note that every big void has been found.

Figure 4.16: Comparison between ORCA (left) and DELFIN (right), using a set of 8192 points

Figure 4.17: Comparison between ORCA (left) and DELFIN (right), using a set of 8.192 points, showing in green similar voids found, and in orange voids with different shapes found.

Robustness

To test the robustness (i.e., resistance to noise in the data) of the algorithms, a special test was designed. First, 10.000 points were randomly placed on a plane following a uniform distribution, except for the inside of a 200-radius circle centered on the origin. Then, noise was added. Fig. 4.18 shows 75 random noise points added to the void in the middle. Here, ORCA detects most parts of the central void, while DELFIN detects two

small voids. However, as seen in Fig. 4.19, when putting 125 random noise points inside the circle, DELFIN is no longer capable of detecting it as a void (right side). If we adjust DELFIN's minimal area parameter, we get many non-existent voids (central image), while our algorithm still detects the large void in the center.

Figure 4.18: Comparison between ORCA (left) and DELFIN (right), using a set of randomized 10.000 points with a 200-radius void with 75 random noise points inside.

Figure 4.19: Comparison between ORCA (left) and DELFIN (middle and right), using a set of randomized 10.000 points with a 200-radius void with 125 random noise points inside. For DELFIN the parameters were moved so that it found many small voids (center) and on the next iteration of parameters it did not find voids (right)

Chapter 5

Conclusions and Future Work

In this chapter we present the main conclusions drawn from this work. We discuss how the objectives were achieved and answer our research questions. Finally, we explain what future work can be derived from this thesis.

5.1 Conclusions

We first set out to do an extensive review of the current state of the art of void-finding algorithms. This review not only classified the algorithms and techniques found, but also estimated the algorithmic complexity of each algorithm (see Section 2.1). A surprising conclusion was that there are broadly two big categories of void-finding algorithms, and that many existing algorithms perform essentially the same computations, although they seem different.

The main goal was to construct a new void-finding algorithm that could outperform existing ones in terms of running time, memory usage, effectiveness and robustness. This was successfully achieved by using a Delaunay triangulation to look for neighbouring points. With this, a k-nearest-neighbor search was implemented to classify points into three categories. This proved useful not only for clearing noise from the data, but also for classifying triangles into void and non-void triangles.

As was shown in Section 4.2.3, ORCA outperforms DELFIN in running time. This way, the new algorithm can process larger data sets quickly. We tested a data set with up to ∼ 250.000 points, and obtained good results in only half a minute.

In terms of memory usage, our algorithm uses light data structures, which means a lowered memory cost. As shown in Section 4.2.3, when processing data sets with increased number of points, ORCA uses only around a third of DELFIN’s memory. This can also prove extremely useful, since extremely populated data sets could lead to memory shortage, thus forcing memory swaps and slowing the running time immensely.

Regarding effectiveness, ORCA performs better on denser data sets, finding voids more accurately and overdetecting less. In comparison, DELFIN can detect strangely shaped, non-convex voids a little better than ORCA, because of the way DELFIN appends triangles to the large-edge triangles it finds, and the way it joins the resulting voids afterwards. Our algorithm still detects strangely shaped voids, but only their larger parts, missing thin extensions such as tentacle-shaped regions.

Concerning robustness, our algorithm can detect voids even with a high percentage of noise in the data. We tested more than 200 random noise points inside a 200-radius circle, and the algorithm still detected the void. It is worth noting, however, that random points near the outer shell of the circle should no longer be considered noise; they become part of the void's border, and our algorithm treats them as such.

We successfully built a parallel version of ORCA, taking advantage of OpenCL's API, which allowed us to program ORCA in a multi-threaded way, offloading the most time-consuming parts to the GPU. This new, parallel algorithm has all the advantages of the original ORCA, but runs around twice as fast (see Table 4.1 and Section 3.4).

5.2 Future Work

First of all, ORCA can be further refined by reducing the number of parameters it uses. k and ε could be derived from the data, using density of points or desired void diameter.

Next, a new filter could be applied to the final voids, limiting them by area. Also, as is done in DELFIN [?], voids on the border should be discarded: not only can they be incomplete, but we also do not know what lies beyond the data set.

A future version of ORCA could also be extended to more than two dimensions. This new 3D version of ORCA could be tested on real data, as found in the SDSS surveys, the Millennium Simulation Project, or similar. In order to extend it to three dimensions, Delaunay tetrahedra can still be used to find neighbors efficiently. One of the main difficulties of extending the algorithm to more dimensions is the visualization of the data, mainly of the voids the algorithm finds, since it becomes harder to check voids by visual inspection. Another difficulty is that the Delaunay triangulation of a three-dimensional data set contains many more elements, and thus our previous O(log n) lookup time will no longer be easily achievable.

Regarding parallelization, a new version could further extend the idea of preparing and refining the data, thus reducing the running time on the GPU. The triangle-classification step in the last part of ORCA could also be processed on the GPU in a future version.

Bibliography

[1] J Aikio and P Mähönen. "A simple void-searching algorithm". In: The Astrophysical Journal 497.2 (1998), p. 534.
[2] Kory J Allred and Wei Luo. "Data-mining Based Detection of Glaciers: Quantifying the Extent of Alpine Valley Glaciation". In: Geosciences 1.1 (2015), pp. 1–18.
[3] R Alonso et al. "Delaunay based algorithm for finding polygonal voids in planar point sets". In: Astronomy and Computing 22 (2018), pp. 48–62.
[4] S Arbabi-Bidgoli and V Müller. "Void scaling and void profiles in cold dark matter models". In: Monthly Notices of the Royal Astronomical Society 332.1 (2002), pp. 205–214.
[5] Jon Louis Bentley. "Multidimensional binary search trees used for associative searching". In: Communications of the ACM 18.9 (1975), pp. 509–517.
[6] Mark de Berg, Marc van Kreveld, Mark Overmars and Otfried Schwarzkopf. "Computing the Voronoi Diagram". In: Computational Geometry (2nd revised ed.) Springer-Verlag, 2000. Chap. 7.2, pp. 151–160.
[7] Graham Birley and Neil Moreland. A practical guide to academic research. Routledge, 2014.
[8] CUDA. https://developer.nvidia.com/cuda-zone. Accessed: 2018-08-17.
[9] Rupert F Chisholm and Max Elden. "Features of emerging action research". In: Human Relations 46.2 (1993), pp. 275–298.
[10] Paolo Cignoni et al. "DeWall: A fast divide and conquer Delaunay triangulation algorithm in Ed". In: Computer-Aided Design 30.5 (1998), pp. 333–341.
[11] Jörg M Colberg et al. "Voids in a ΛCDM universe". In: Monthly Notices of the Royal Astronomical Society 360.1 (2005), pp. 216–226.
[12] Boris Delaunay. "Sur la sphere vide". In: Izv. Akad. Nauk SSSR, Otdelenie Matematicheskii i Estestvennyka Nauk 7.793-800 (1934), pp. 1–2.
[13] Hagai El-Ad et al. "Automated detection of voids in redshift surveys". In: The Astrophysical Journal Letters 462.1 (1996), p. L13.
[14] Caroline Foster and Lorne A Nelson. "The size, shape, and orientation of cosmological voids in the Sloan Digital Sky Survey". In: The Astrophysical Journal 699.2 (2009), p. 1252.
[15] José Gaite. "Zipf's law for fractal voids and a new void-finder". In: The European Physical Journal B - Condensed Matter and Complex Systems 47.1 (2005), pp. 93–98.
[16] Stephen A Gregory and Laird A Thompson. "The Coma/A1367 and its environs". In: The Astrophysical Journal 222 (1978), pp. 784–799.
[17] G Kauffmann and AP Fairall. "Voids in the distribution of galaxies: an assessment of their significance and derivation of a void spectrum". In: Monthly Notices of the Royal Astronomical Society 248.2 (1991), pp. 313–324.
[18] Andreas Klöckner et al. "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation". In: Parallel Computing 38.3 (2012), pp. 157–174.
[19] Donald E Knuth. Fundamental algorithms: the art of computer programming. Vol. 3. 1973.
[20] Andrey V Kravtsov et al. "Adaptive refinement tree: a new high-resolution N-body code for cosmological simulations". In: The Astrophysical Journal Supplement Series 111.1 (1997), p. 73.

[21] Jounghun Lee and Daeseong Park. "Constraining the dark energy equation of state with cosmic voids". In: The Astrophysical Journal Letters 696.1 (2009), p. L10.
[22] Mark C Neyrinck. "ZOBOV: a parameter-free void-finding algorithm". In: Monthly Notices of the Royal Astronomical Society 386.4 (2008), pp. 2101–2109.
[23] Rodrigo Ignacio Alonso Ortega et al. "A Delaunay Tesselation based Void Finder Algorithm". In: Universidad de Chile (2016).
[24] Santiago G Patiri et al. "Statistics of voids in the two-degree Field Galaxy Redshift Survey". In: Monthly Notices of the Royal Astronomical Society 369.1 (2006), pp. 335–348.
[25] Alice Pisani et al. "Counting voids to probe dark energy". In: Physical Review D 92.8 (2015), p. 083531.
[26] Erwin Platen et al. "A cosmic watershed: the WVF void detection technique". In: Monthly Notices of the Royal Astronomical Society 380.2 (2007), pp. 551–570.
[27] Raimund Seidel. "The upper bound theorem for polytopes: an easy proof of its asymptotic version". In: Computational Geometry 5.2 (1995), pp. 115–116.
[28] Anatoly Klypin, Stefan Gottlöber, Ewa L. Lokas and Yehuda Hoffman. "The dark side of the halo occupation distribution". In: The Astrophysical Journal 609.1 (2003), p. 35.
[29] John E Stone et al. "OpenCL: A parallel programming standard for heterogeneous computing systems". In: Computing in Science & Engineering 12.3 (2010), pp. 66–73.
[30] PM Sutter et al. "VIDE: the Void IDentification and Examination toolkit". In: Astronomy and Computing 9 (2015), pp. 1–9.
[31] FHMS Vogeley. Voids in the PSCz Survey and the Updated Zwicky Catalog. Tech. rep. 2001.
[32] Michael J Way et al. "Structure in the 3D Galaxy Distribution. II. Voids and Watersheds of Local Maxima and Minima". In: The Astrophysical Journal 799.1 (2015), p. 95.
