Dipartimento Di Informatica, Bioingegneria, Robotica Ed Ingegneria Dei Sistemi
Dipartimento di Informatica, Bioingegneria, Robotica ed Ingegneria dei Sistemi

Computational and Theoretical Issues of Multiparameter Persistent Homology for Data Analysis
by Sara Scaramuccia

Theses Series DIBRIS-TH-2018-16
DIBRIS, Università di Genova
Via Opera Pia, 13, I-16145 Genova, Italy
http://www.dibris.unige.it/

Università degli Studi di Genova
Dipartimento di Informatica, Bioingegneria, Robotica ed Ingegneria dei Sistemi
Dottorato di Ricerca in Informatica ed Ingegneria dei Sistemi, Indirizzo Informatica
Ph.D. Thesis in Computer Science and Systems Engineering
Computer Science Curriculum (S.S.D. INF/01)
May, 2018

Submitted by: Sara Scaramuccia, DIBRIS, Univ. di Genova
Date of submission: January 2018
Title: Computational and Theoretical Issues of Multiparameter Persistent Homology for Data Analysis

Advisors:
Leila De Floriani, Dipartimento di Informatica, Bioingegneria, Robotica ed Ingegneria dei Sistemi, Università di Genova, Italy
Claudia Landi, Dipartimento di Scienze e Metodi dell’Ingegneria, Università di Modena e Reggio Emilia, Italy

Ext. Reviewers:
Silvia Biasotti, Istituto di Matematica Applicata e Tecnologie Informatiche "Enrico Magenes", Consiglio Nazionale delle Ricerche, Genova, Italy
Heike Leitte, Fachbereich Informatik, Technische Universität Kaiserslautern, Germany

Abstract

The basic goal of topological data analysis is to apply topology-based descriptors to understand and describe the shape of data.
In this context, homology is one of the most relevant topological descriptors, well appreciated for its discrete nature, computability, and dimension independence. A further development is provided by persistent homology, which makes it possible to track homological features along a one-parameter increasing sequence of spaces. Multiparameter persistent homology, also called multipersistent homology, is an extension of the theory of persistent homology motivated by the need to analyze data naturally described by several parameters, such as vector-valued functions. Multipersistent homology raises several issues concerning the feasibility of computations over real-sized data, as well as theoretical challenges in the evaluation of possible descriptors. The focus of this thesis is on the interplay between persistent homology theory and discrete Morse theory. Discrete Morse theory provides methods for reducing the computational cost of homology and persistent homology by considering the discrete Morse complex generated by the discrete Morse gradient in place of the original complex. This thesis addresses the problem of computing multipersistent homology, in order to make such a tool usable in real application domains. This requires both computational optimizations for applications to real-world data and theoretical insights for finding and interpreting suitable descriptors. Our computational contribution consists in proposing a new Morse-inspired and fully discrete preprocessing algorithm. We show the feasibility of our preprocessing over real datasets and evaluate the impact of the proposed algorithm as a preprocessing step for computing multipersistent homology. A theoretical contribution of this thesis consists in proposing a new notion of optimality for such a preprocessing in the multiparameter context. We show that the proposed notion generalizes an already known optimality notion from the one-parameter case.
Under this definition, we show that the algorithm we propose as a preprocessing is optimal in low-dimensional domains. In the last part of the thesis, we consider preliminary applications of the proposed algorithm in the context of topology-based multivariate visualization by tracking critical features generated by a discrete gradient field compatible with the multiple scalar fields under study. We discuss (dis)similarities of such critical features with the state-of-the-art techniques in topology-based multivariate data visualization.

Contents

List of Figures
List of Tables
Introduction
1 Background notions
1.1 Cell complexes
1.1.1 Simplicial complexes
1.1.2 Cubical and cell complexes
1.2 Homology
1.2.1 Simplicial chain complexes
1.3 Lefschetz complexes
1.3.1 Relative homology and Mayer-Vietoris pairs
1.4 Discrete Morse Theory
1.4.1 Basic notions in discrete Morse Theory
1.4.2 Discrete Morse complex
1.5 One-parameter persistent homology
1.5.1 One-parameter filtrations
1.5.2 One-parameter persistence module
1.6 Multiparameter persistent homology
1.6.1 Multiparameter filtrations
1.6.2 Multiparameter persistence module
1.7 Invariants in persistent homology
1.7.1 Invariants in one-persistent homology
1.7.2 Invariants in multipersistent homology
1.7.3 Algebraic invariants in persistent homology
2 State of the art
2.1 Data structures representing simplicial and cell complexes
2.2 Filtrations
2.3 Computing one-persistent homology
2.3.1 Integrated one-parameter techniques
2.3.2 Preprocessing in one-persistent homology
2.3.3 Tools for one-persistent homology
2.3.4 Discussion
2.4 Computing multipersistent homology
2.4.1 Invariants in multipersistent homology
2.4.2 Computing multipersistent descriptors
2.4.3 Preprocessing in multipersistent homology
2.4.4 Discussion
2.5 Interpreting invariants in persistent homology
2.5.1 Distances between invariants
2.5.2 Stability
2.5.3 Towards Statistics
2.6 Discussion
3 Preliminaries on Morse-based preprocessings for persistence
3.1 Finding a discrete gradient over a single lower star: HomotopyExpansion
3.1.1 Input and output formalization
3.1.2 Description
3.1.3 Complexity remarks
3.2 A local preprocessing algorithm for one-persistent homology: ProcessLowerStars
3.2.1 Input and output formalization
3.2.2 Description
3.2.3 Complexity remarks
3.3 A global preprocessing algorithm for multipersistent homology: Matching
3.3.1 Input and output formalization
3.3.2 Description
3.3.3 Complexity remarks
4 A local Morse-based preprocessing algorithm for multipersistent homology: ComputeDiscreteGradient
4.1 Input and output formalization
4.2 Well-extensible indexing
4.3 Algorithm description
4.3.1 Auxiliary functions
4.4 Complexity
4.4.1 Analysis
4.4.2 Discussion
4.5 Correctness
4.5.1 Corresponding partitions
4.5.2 ComputeDiscreteGradient and Matching are equivalent
4.6 From a discrete gradient to the Morse complex: BoundaryMaps
4.6.1 Input and output formalization
4.6.2 Description in the simplicial case
4.6.3 Description in the cubical case
5 Optimality by relative homology
5.1 One-parameter optimality in terms of relative homology
5.2 Multiparameter optimality
5.3 Filtered Morse complexes preserve relative homology
5.4 Algorithm ComputeDiscreteGradient is optimal
5.4.1 Optimality for 3D regular grids
5.4.2 Optimality for abstract simplicial 2-complexes
6 Experimental analysis and evaluation
6.1 Comparison to the Matching Algorithm
6.1.1 Method
6.1.2 Implementation
6.1.3 Results
6.1.4 Discussion
6.2 Impact on the computation