Fractal Dimension and Space Filling Curve
Approximate Space-filling Curve

Ana Karina Tavares da Moura Gomes

Dissertation submitted to obtain the Master Degree in Information Systems and Computer Engineering

Supervisor: Prof. Dr. Andreas Wichert

Examination Committee
Chairperson: Prof. Dr. Mário Jorge Costa Gaspar da Silva
Supervisor: Prof. Dr. Andreas Miroslaus Wichert
Member of the Committee: Prof. Dr. Pável Pereira Calado

June 2015

Dedicated to my beloved son Enzo. You were my greatest motivation to finish the course and not give up, regardless of the difficulties. I love you so much, Zô!

Acknowledgments

First of all, I would like to thank my mother Jú and my sister Carol for all the support they gave me. Special thanks go to Carol, who so many times listened to me without understanding anything, just to help me think out loud. Thank you for all the love and care. Love you, sis!

I would like to thank my advisor for the patience and the advice, but mainly for helping me grow as a student and future professional by allowing me to think and decide for myself. Thank you!

I would like to thank my mother-in-law Cristina for taking care of my child, allowing me to work in the afternoons.

I would like to thank my closest friends Ana Silva and José Leira for all the support. Thank you for softening the worst days of work. You are one of the greatest treasures I gained from this course. Ana, thank you for not giving up on me and for being an angel! Zé, thank you for being my thesis partner and for listening to all my endless doubts.

Finally, special thanks go to my husband Tiago. Thank you for not judging me, for believing in me (even when I didn't) and for your endless patience. There are no words to describe how grateful I am for having you in my life. Thank you for all the love, support and understanding during all these years. Love you!
Resumo

Space-filling curves are computer-generated fractals that can be used to index low-dimensional spaces. Earlier studies use them as an access method in these scenarios, but they are very conservative and apply them only in spaces of up to four dimensions. In addition, the alternative methods tend to perform worse than a linear search once the space exceeds ten dimensions. In the context of my thesis, I therefore study space-filling curves and their properties, as well as the challenges posed by multidimensional data. I propose using them, specifically the Hilbert curve, as an access method for indexing multidimensional points in up to thirty dimensions. I begin by mapping the points to the Hilbert curve, generating their h-values. I then develop three heuristics that search for approximate nearest neighbors of a given query point, in order to test the performance of the curve. Two heuristics use the curve as a direct access method, and the third uses the curve as a secondary key combined with a B-tree variant. The heuristics result from an iterative process that consists of planning, designing and testing a heuristic; depending on the test results, the heuristic is modified or a new one is created. The experimental results with the three heuristics show that the Hilbert curve can be used as an access method and that it works at least in spaces of up to thirty-six dimensions.

Keywords: Fractals, Hilbert Space-Filling Curve, Low-Dimensional Indexing, Approximate Nearest Neighbor

Abstract

Space-filling curves are computer-generated fractals that can be used to index low-dimensional spaces.
There are previous studies that use them as an access method in these scenarios, but they are very conservative and apply them only in spaces of up to four dimensions. Additionally, the alternative access methods tend to perform worse than a linear search when the space exceeds ten dimensions. Therefore, in the context of my thesis, I study space-filling curves and their properties, as well as the challenges presented by multidimensional data. I propose their use, specifically that of the Hilbert curve, as an access method for indexing multidimensional points in up to thirty dimensions. I start by mapping the points to the Hilbert curve, generating their h-values. Then, I develop three heuristics to search for approximate nearest neighbors of a given query point, with the aim of testing the performance of the curve. Two of the heuristics use the curve as a direct access method, and the other uses the curve for secondary key retrieval combined with a B-tree variant. The heuristics result from an iterative process that consists of planning, designing and testing a heuristic; according to the test results, the heuristic is adjusted or a new one is created. Experimental results with the three heuristics show that the Hilbert curve can be used as an access method, and that it can operate at least in spaces of up to thirty-six dimensions.

Keywords: Fractals, Hilbert Space-Filling Curve, Low-Dimensional Indexing, Approximate Nearest Neighbor

Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
1 Introduction
  1.1 Hypothesis and Methodology
  1.2 Main contributions
  1.3 Document Outline
2 Multidimensional Data
  2.1 Introduction
  2.2 Multidimensional Relations
  2.3 Vector Space
  2.4 Multidimensional Query
    2.4.1 Range Query
    2.4.2 Nearest Neighbor Query
  2.5 Multidimensional Index
    2.5.1 R-tree
    2.5.2 Kd-tree
  2.6 Summary
3 Fractals
  3.1 Introduction
  3.2 The Hausdorff Dimension
  3.3 Space-Filling Curves
    3.3.1 Z-Curve
    3.3.2 Hilbert Curve
  3.4 Higher Dimensions
  3.5 Clustering Property
  3.6 Fractals: An Access Method
    3.6.1 Secondary Key Retrieval
    3.6.2 Direct Access Index
  3.7 Summary
4 Approximate Space-Filling Curve
  4.1 Motivation
  4.2 Methodology
  4.3 Data Description
  4.4 Mapping System
  4.5 Dataset Analysis based on a Linear Search
  4.6 Experiment 1: Hypercube Zoom Out
  4.7 Experiment 2: Enzo Full Space
  4.8 Experiment 3: Enzo Reduced Space
  4.9 Summary
5 Conclusions
  5.1 Contributions
  5.2 Future Work
Bibliography
A Appendix
  A.1 Chapter: Fractals
  A.2 Chapter: Approximate Space-Filling Curve

List of Tables

4.1 Datasets Description
4.2 Neighbor Distance to Query Point per Dimension
4.3 HZO Runtime Results
4.4 HZO Approximate Nearest Neighbors Results
4.5 Enzo FS Runtime Results
4.6 Enzo FS Approximate Nearest Neighbors Results
4.7 Enzo RS Runtime Results
4.8 Enzo RS Relative Error
A.1 Gray Code
A.2 Enzo RS Results

List of Figures

2.1 Examples of Multidimensional Data
2.2 Example of Unit Circles for L1, L2 and L∞ Norms
2.3 R-tree Planar and Directory Representation
2.4 Kd-tree Planar and Directory Representation
3.1 Ratio Similarity Example 1
3.2 Ratio Similarity Example 2
3.3 Hausdorff Dimension Analysis
3.4 Z-curve
3.5 Z-curve Bit Shuffling
3.6 Z-curve Query Example