Introduction to the R Package TDA


Brittany T. Fasy, Jisu Kim, Fabrizio Lecci, Clément Maria, David L. Millman, and Vincent Rouvreau
In collaboration with the CMU TopStat Group

Abstract

We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as the distance function, the distance to a measure, the kNN density estimator, the kernel density estimator, and the kernel distance. The salient topological features of the sublevel sets (or superlevel sets) of these functions can be quantified with persistent homology. We provide an R interface for the efficient algorithms of the C++ libraries GUDHI, Dionysus, and PHAT, including a function for the persistent homology of the Rips filtration, and one for the persistent homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated over a grid of points. The significance of the features in the resulting persistence diagrams can be analyzed with functions that implement the methods discussed in Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh (2014), Chazal, Fasy, Lecci, Rinaldo, and Wasserman (2014c) and Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman (2014a). The R package TDA also includes the implementation of an algorithm for density clustering, which allows us to identify the spatial organization of the probability mass associated to a density function and visualize it by means of a dendrogram, the cluster tree.

Keywords: Topological Data Analysis, Persistent Homology, Density Clustering.

1. Introduction

Topological Data Analysis (TDA) refers to a collection of methods for finding topological structure in data (Carlsson 2009). Recent advances in computational topology have made it possible to actually compute topological invariants from data.
The input of these procedures typically takes the form of a point cloud, regarded as possibly noisy observations from an unknown lower-dimensional set S whose interesting topological features were lost during sampling. The output is a collection of data summaries that are used to estimate the topological features of S.

One approach to TDA is persistent homology (Edelsbrunner and Harer 2010), a method for studying the homology at multiple scales simultaneously. More precisely, it provides a framework and efficient algorithms to quantify the evolution of the topology of a family of nested topological spaces. Given a real-valued function f, such as the ones described in Section 2, persistent homology describes how the topology of the lower level sets $\{x : f(x) \le t\}$ (or superlevel sets $\{x : f(x) \ge t\}$) changes as t increases from $-\infty$ to $\infty$ (or decreases from $\infty$ to $-\infty$). This information is encoded in the persistence diagram, a multiset of points in the plane, each corresponding to the birth and death of a homological feature that existed for some interval of t.

This paper is devoted to the presentation of the R package TDA, which provides a user-friendly interface for the efficient algorithms of the C++ libraries GUDHI (Maria 2014), Dionysus (Morozov 2007), and PHAT (Bauer, Kerber, and Reininghaus 2012). In Section 2, we describe how to compute some widely studied functions that, starting from a point cloud, provide some topological information about the underlying space: the distance function (distFct), the distance to a measure function (dtm), the k Nearest Neighbor density estimator (knnDE), the kernel density estimator (kde), and the kernel distance (kernelDist).
Section 3 is devoted to the computation of persistence diagrams: the function gridDiag can be used to compute persistent homology of sublevel sets (or superlevel sets) of functions evaluated over a grid of points; the function ripsDiag returns the persistence diagram of the Rips filtration built on top of a point cloud.

One of the key challenges in persistent homology is to find a way to isolate the points of the persistence diagram representing the topological noise. Statistical methods for persistent homology provide an alternative to its exact computation. Knowing with high confidence that an approximated persistence diagram is close to the true (computationally infeasible) diagram is often enough for practical purposes. Fasy et al. (2014), Chazal et al. (2014c), and Chazal et al. (2014a) propose several statistical methods to construct confidence sets for persistence diagrams and other summary functions that allow us to separate topological signal from topological noise. The methods are implemented in the TDA package and described in Section 3.

Finally, the TDA package provides the implementation of an algorithm for density clustering. This method allows us to identify and visualize the spatial organization of the data, without specific knowledge about the data generating mechanism and in particular without any a priori information about the number of clusters. In Section 4, we describe the function clusterTree, which, given a density estimator, encodes the hierarchy of the connected components of its superlevel sets into a dendrogram, the cluster tree (Kpotufe and von Luxburg 2011; Kent 2013).

2. Distance Functions and Density Estimators

As a first toy example of using the TDA package, we show how to compute distance functions and density estimators over a grid of points.
The setting is the typical one in TDA: a set of points $X = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ has been sampled from some distribution P and we are interested in recovering the topological features of the underlying space by studying some functions of the data. The following code generates a sample of 400 points from the unit circle and constructs a grid of points over which we will evaluate the functions.

> library("TDA")
> X <- circleUnif(400)
> Xlim <- c(-1.6, 1.6); Ylim <- c(-1.7, 1.7); by <- 0.065
> Xseq <- seq(Xlim[1], Xlim[2], by = by)
> Yseq <- seq(Ylim[1], Ylim[2], by = by)
> Grid <- expand.grid(Xseq, Yseq)

The TDA package provides implementations of the following functions:

• The distance function is defined for each $y \in \mathbb{R}^d$ as $\Delta(y) = \inf_{x \in X} \|x - y\|_2$ and is computed for each point of the Grid with the following code:

  > distance <- distFct(X = X, Grid = Grid)

• Given a probability measure P, the distance to measure (DTM) is defined for each $y \in \mathbb{R}^d$ as

  $$d_{m_0}(y) = \left( \frac{1}{m_0} \int_0^{m_0} \left( G_y^{-1}(u) \right)^r \, du \right)^{1/r},$$

  where $G_y(t) = P(\|X - y\| \le t)$, and $m_0 \in (0, 1)$ and $r \in [1, \infty)$ are tuning parameters. As $m_0$ increases, the DTM function becomes smoother, so $m_0$ can be understood as a smoothing parameter. The parameter r has a smaller effect, but it also changes the DTM function. The default value of r is 2. The DTM can be seen as a smoothed version of the distance function. See (Chazal, Cohen-Steiner, and Mérigot 2011, Definition 3.2) and (Chazal, Massart, and Michel 2015, Equation (2)) for a formal definition of the "distance to measure" function.

  Given $X = \{x_1, \dots, x_n\}$, the empirical version of the DTM is

  $$\hat{d}_{m_0}(y) = \left( \frac{1}{k} \sum_{x_i \in N_k(y)} \|x_i - y\|^r \right)^{1/r},$$

  where $k = \lceil m_0 \cdot n \rceil$ and $N_k(y)$ is the set containing the k nearest neighbors of y among $x_1, \dots, x_n$. For more details, see (Chazal et al. 2011) and (Chazal et al. 2015).
  The DTM is computed for each point of the Grid with the following code:

  > m0 <- 0.1
  > DTM <- dtm(X = X, Grid = Grid, m0 = m0)

• The k Nearest Neighbor density estimator, for each $y \in \mathbb{R}^d$, is defined as

  $$\hat{\delta}_k(y) = \frac{k}{n \, v_d \, r_k^d(y)},$$

  where $v_d$ is the volume of the Euclidean d-dimensional unit ball and $r_k(y)$ is the Euclidean distance from the point y to its kth closest neighbor among the points of X. It is computed for each point of the Grid with the following code:

  > k <- 60
  > kNN <- knnDE(X = X, Grid = Grid, k = k)

• The Gaussian Kernel Density Estimator (KDE), for each $y \in \mathbb{R}^d$, is defined as

  $$\hat{p}_h(y) = \frac{1}{n (\sqrt{2\pi} h)^d} \sum_{i=1}^{n} \exp\left( \frac{-\|y - x_i\|_2^2}{2h^2} \right),$$

  where h is a smoothing parameter. It is computed for each point of the Grid with the following code:

  > h <- 0.3
  > KDE <- kde(X = X, Grid = Grid, h = h)

• The Kernel distance estimator, for each $y \in \mathbb{R}^d$, is defined as

  $$\hat{\kappa}_h(y) = \sqrt{ \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} K_h(x_i, x_j) + K_h(y, y) - 2 \, \frac{1}{n} \sum_{i=1}^{n} K_h(y, x_i) },$$

  where $K_h(x, y) = \exp\left( \frac{-\|x - y\|_2^2}{2h^2} \right)$ is the Gaussian Kernel with smoothing parameter h. The Kernel distance is computed for each point of the Grid with the following code:

  > h <- 0.3
  > Kdist <- kernelDist(X = X, Grid = Grid, h = h)

For this 2-dimensional example, we can visualize the functions using persp from the graphics package. For example, the following code produces the KDE plot in Figure 1:

> persp(Xseq, Yseq,
+   matrix(KDE, ncol = length(Yseq), nrow = length(Xseq)), xlab = "",
+   ylab = "", zlab = "", theta = -20, phi = 35, ltheta = 50,
+   col = 2, border = NA, main = "KDE", d = 0.5, scale = FALSE,
+   expand = 3, shade = 0.9)

[Figure 1 panels: Sample X, Distance Function, DTM, kNN, KDE, Kernel Distance]
Figure 1: distance functions and density estimators evaluated over a grid of points.

2.1. Bootstrap Confidence Bands

We can construct a (1 − α) confidence band for a function using the bootstrap algorithm, which we briefly describe using the kernel density estimator:

1.
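
Returning to the definitions of Section 2, the empirical DTM and the Gaussian KDE can be evaluated directly from their formulas. The following base-R sketch is an illustration only and is not the package's implementation: it calls no TDA functions, and the helper names (sample_circle, dist_to_sample, dtm_by_hand, kde_by_hand) are hypothetical, introduced here for the example. It evaluates both estimators at the centre of the unit circle and at a point on the circle.

```r
# Base-R sketch of two estimators from Section 2; no TDA package calls are used.
# All helper names below are hypothetical and exist only for this illustration.

set.seed(1)

# n points drawn uniformly from the unit circle (a stand-in for circleUnif).
sample_circle <- function(n) {
  theta <- runif(n, 0, 2 * pi)
  cbind(cos(theta), sin(theta))
}

# Euclidean distances from one query point y to every row of X.
dist_to_sample <- function(X, y) {
  sqrt(rowSums(sweep(X, 2, y)^2))
}

# Empirical DTM: ((1/k) * sum over the k nearest neighbours of ||x_i - y||^r)^(1/r),
# with k = ceiling(m0 * n).
dtm_by_hand <- function(X, y, m0, r = 2) {
  k <- ceiling(m0 * nrow(X))
  nearest <- sort(dist_to_sample(X, y))[seq_len(k)]
  mean(nearest^r)^(1 / r)
}

# Gaussian KDE: 1 / (n * (sqrt(2*pi) * h)^d) * sum_i exp(-||y - x_i||^2 / (2 * h^2)).
kde_by_hand <- function(X, y, h) {
  d <- ncol(X)
  sq <- dist_to_sample(X, y)^2
  sum(exp(-sq / (2 * h^2))) / (nrow(X) * (sqrt(2 * pi) * h)^d)
}

X <- sample_circle(400)
on_circle <- c(1, 0)  # a point lying on the circle
center <- c(0, 0)     # the centre, at distance 1 from every sample point

dtm_center <- dtm_by_hand(X, center, m0 = 0.1)
dtm_circle <- dtm_by_hand(X, on_circle, m0 = 0.1)
kde_center <- kde_by_hand(X, center, h = 0.3)
kde_circle <- kde_by_hand(X, on_circle, h = 0.3)
```

Because every sample point is at distance 1 from the centre, the DTM there equals 1 up to rounding, while on the circle it is much smaller; the KDE behaves the opposite way, being larger on the circle than at the centre. This matches the reading of the DTM as a smoothed distance function and of the KDE as a density estimate concentrated near the sampled set.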