<<

THREE EXAMPLES OF APPLIED & COMPUTATIONAL HOMOLOGY

Robert Ghrist 1 as an appetizer, for which the truncated bibliogra- phy serves as a menu for the second course. Algebraic as Applied Mathe- On Homology. matics? Homology is a machine that converts local data is limitless in its dual capacity for about a space into global algebraic structure. In abstraction and incarnation. To a large degree, its simplest form, homology takes as its argument many of the modern revolutions in technology and simple pieces of a X and re- information rest on piers of mathematics that assist, turns a sequence of abelian groups Hk(X), k ∈ inform, or otherwise catalyze progress. It appears N. Homology is a , which in practice that those branches of mathematics which are most means: (1) topologically equivalent spaces (ho- easily understood and communicated are precisely motopic) have algebraically equivalent (isomor- those which find greatest applicability in the mod- phic) homology groups; and (2) topological maps ern world. To conclude from this that deeper or between spaces f : X → Y induce algebraic more difficult fields are inherently less applicable maps (homomorphisms) on homology groups f∗ : would be premature. H∗(X) → H∗(Y ). Numerous homology Consider for example the utility of algebraic exist, fine-tuned for different classes of spaces (sim- topology. Long cloistered behind formal and cat- plicial, cellular, singular, etc.). egorical walls, this branch of mathematics has been Roughly speaking, homology groups count and the source of little in the way of concrete applica- collate holes in a space. The simplest example tions, as compares with its more analytic or com- of a homological is the number of con- binatorial cousins. In this author’s opinion, this is nected components of a space — dim H0 — the not due to a fundamental lack of applicability so type of ‘holes’ that a zero-dimensional instrument much as to (1) the lack of a motivating exposition of can measure. A less trivial example of a homo- the tools for practitioners; and (2) an historical lack logical invariant is the . The of emphasis on computational features of the the- Euler characteristic χ of a triangulated is ory. These two issues are coupled. Advances which the alternating sum of the number of simplices demonstrate the utility of a topological spur — vertices minus edges plus faces — and that the need for good computation. Good algorithms this quantity is a topological invariant of the sur- for computing topological data spur the search for face. For more general (but tame) spaces, χ(X) further applications. can be expressed either as the alternating sum of is the mathematics that arises the number of k-dimensional cells of X, or, as in the attempt to describe the global features of a ∞ k Pk=0(−1) dim Hk(X). This quantity, being based space via local data. That such tools have utility on homology, is an invariant. It is a signal exam- in applied problems concerning large data sets is ple of a homological device, being both computable not difficult to argue. To give a sense of what is and invariant. Our first example of applied alge- possible, we sketch three recent examples of spe- braic topology relies on this invariant. cific applications of homological tools. This list is neither inclusive nor ranked: these examples were Example 1: “How many people are in the chosen for concreteness, simplicity, and timeliness. building?” This brief and woefully incomplete sketch is meant Problem: Target Enumeration. Consider a store 1 Department of Mathematics and Coordinated Science Labo- whose ceiling tiles, walls, and carpet are embed- ratory, University of Illinois, Urbana-Champaign. The author gratefully acknowledges the support of DARPA and the Na- ding with people-counting sensors. How can these tional Science Foundation. local sensors collaborate to determine the number

1 2 of customers in the store? Tool: Euler Characteristic Integration. One of the fundamental difficulties in large-scale sensor networks is data aggregation. A sufficiently dense collection of nodes will sample an environ- ment redundantly. The goal of sensing is to com- press this redundant local data into a global de- scription of the environment. The operation of stitching local information over patches is the fun- damental defining property of a sheaf, a means of assigning an algebraic object to open subsets of a space in such a manner that restrictions and over- laps are respected. As an example, consider the problem of counting a collection of targets. Some fixed but unknown number N of targets lie in a domain D. The do- main is filled with sensors, each of which can deter- mine how many targets are nearby. It matters not how the sensors operate (e.g., via infrared, acoustic, d or optical sensing). Assume simple sensors which merely detect the number of nearby targets, with no information about target identity, distance, or bear- ing. In the limit (where one has a sensor at each point in D), this yields a counting function Figure 1: Integration with respect to Euler charac- h : D → N. The problem is to determine the num- teristic enumerates redundant data over a sensor ber of targets, given only h. network. The solution lies in an elegant integration theory which uses Euler characteristic as a measure. For compact sets A, B, the Euler characteristic satisfies 2. Thanks to a version of the Fubini theorem for χ(A ∪ B) = χ(A) + χ(B) − χ(A ∩ B). Note the dχ, one can count moving targets over time similarity of this to the definition of a measure. In- without the need to embed clocks on the sen- deed, χ is a type of scale-invariant topological vol- sor nodes. ume, as was known going back to Hadwiger and Blaschke at least. It is straightforward to construct 3. Because integration is a local operation, target- a measure dχ against which one can integrate cer- counting can be performed by the network it- tain functions. The type of piecewise-constant or self with a distributed, local computation. constructible function h : D → N that a sensor field returns is integrable in this theory. Moral: “Data aggregation is a topological integra- Recent work of Baryshnikov et al. [1] gives a sim- tion.” ple formula for computing the number of targets as Example 2: “What does the data look RD h dχ, in the setting where each target is detected by sensors on a topologically trivial (e.g., convex) like?” neighborhood. Because this is a topological inte- Problem: High-Dimensional Data Analysis. Given gration theory, there are no geometric restrictions. a large, high-dimensional data set, how can one de- Sensors can, e.g., count the number of vehicles driv- termine its shape and structure? ing over a domain laced with vibration sensors, Tool: . counting subcompacts and SUV’s as equals. Though the subject of topology is often intro- This is the starting point for a broad array of ap- duced in terms of doughnuts, coffee cups, knots, plications which rely on constructible sheaves and or other visual icons, the true strength of topology the sheaf-theoretic properties of dχ. Precisely be- is the ease with which it analyses high-dimensional cause the answer is expressed in terms of an inte- objects. The impact of this strength is perhaps best gration theory, one can do the following: asserted in data-analysis, where the incoming rate 1. For a sparse network of sensor nodes, deter- of large, high-dimensional data sets currently far mining the number of targets becomes the nu- exceeds statisticians’ abilities to analyse and de- merical problem of approximating the topo- scribe the data sets. logical integral via a discrete sampling. Assume for the sake of argument that one is 3

H( ) *

Figure 2: Persistent homology of a simplicial approximation finds hidden structures in large data sets.

given a data set that consists of a sampling (per- set) persist over the range of parameters [ǫ,ǫ′]. haps, though not necessarily random) of a reason- Carlsson et al. use the classification of modules n able subset X ⊂ E of . Nature has over a polynomial (with field coefficients) to trained the human brain to reconstructing shapes compute persistent homology and to correlate it from planar projections, but this works only for cer- with the birth and death of topological features in tain (small!) values of n. the data [7]. This allows a principled and automatic Knowing the homology of X is a good basis for distillation of complex data sets into global features asserting the global features of the ‘true’ model X — a method that does not rely on projections or of the data. Several basic statistical ideas – e.g., clus- heuristics. tering – are readily seen to correspond to some- Specific successes of the method include the fol- thing homological, in this example dim H0. The lowing. natural question presents itself: how can one com- pute H∗(X) from a discrete sampling of points N ⊂ 1. Persistent homology was used [2] to find sig- X? nificant features hidden in a large data set of pixellated natural images compressed onto a The work of Carlsson et al. employs the following 7-dimensional ; most notable is a persis- strategy. Fix a parameter ǫ > 0, and build a sim- tent in H2, which in turn yields plicial complex Rǫ as follows: a k- of Rǫ insights into the structure of the space of natu- is a collection of k +1 data points in N pairwise n ral images. within distance ǫ. Fixing X ⊂ E a , then for ǫ sufficiently small and N sufficiently dense, the 2. Recent work [3] uses persistent homology to complex Rǫ has the same type (and thus find hidden structures in experimental data as- homology) as X. However, one is given a fixed data sociated with the V1 visual cortex of certain set, and further refinement maybe be expensive or primates. impossible. Thus, one is forced to vary ǫ. Which ǫ best captures the true topology of the underlying Moral: “The shape of the data lies not in a single data set? For ǫ too small, Rǫ is a discrete set; for ǫ space, but in a diagram of spaces.” too large, Rǫ is a single simplex. In this context, the golden mean may not exist. Example 3: “It looks chaotic to me!” Algebraic-topology suggests a functional ap- Problem: Experimental Verification of Chaotic Dy- proach. One of the simplest and best insights of namics. An experiment (physical or numerical) the Grothendieck programme is the notion that the yields data that looks chaotic. Is it rigorously topology of a given space is framed in the map- chaotic, or just noisy? pings to or from that space. With this perspec- Tool: Conley Index Theory. tive as guide, one considers the ordered sequence One of the great scientific lessons of the 20th of spaces {Rǫ} for ǫ > 0, stitched together by in- ǫ→ǫ′ ′ century was that when a physical system exhibits clusion maps ι : Rǫ ֒→ Rǫ′ for ǫ<ǫ . The erratic temporal behaviour, it may not be due to ǫ→ǫ′ homology of the family of maps ι is the called randomness or poor measurement — determinis- ǫ→ǫ′ the persistent homology of the data set: ι∗ cap- tic systems can exhibit well-defined chaos. How- tures which homological features (holes in the data ever, it is a persistent challenge to demonstrate that 4 a given system is chaotic. The Lorenz equations — themselves a cartoon model of fluid flow — were only recently shown to be rigorously chaotic, af- ter more than thirty years’ inquiry. Still more in- tractable remain data coming from physical experi- ments, in which system noise and instrument errors conspire to frustrate analysis. There seems to be lit- tle recourse for the experimentalist beyond saying, “It looks chaotic to me.” A prime feature of topological methods is that, being global, they are typically impervious to the noise inherent in physical systems. Such is the case here. Work of Mischaikow et al. [5] uses a homo- CH logical invariant of dynamics combined with a pri- * ori bounds on the noise amplitudes to determine the rigorous dynamics of an experimental system based on noisy time-series data. Figure 3: The Conley index CH∗ of experimental The mathematical tool used is the Conley index, time series data can rigorously verify chaotic dy- an algebraic-topological extension of the Morse in- namics. dex. Consider the flow of rainwater falling on a mountainous terrain D: this flow is that of −∇h, where h : D → R is the height function of the ter- rain. The Morse index of a critical point of h is an cate various stationary solutions. A Conley integer that classifies the type of critical point: min- index computation [6] proves that these solu- ima have index 0, saddle-passes have index 1, and tions exist, with a computational effort of the maxima have index 2. The homological Conley in- same order as a re-run of the numerical solu- dex enriches the Morse index from integers to (ho- tion at a finer resolution. mology types of) spaces: for a Morse function, the Moral: “It’s hardly more expensive to prove the dy- Conley index of a critical point is a sphere of di- namics than to simulate it.” mension the Morse index. The Conley index, unlike the Morse index, applied to non-gradient and non- Looking Forward: smooth vector fields, as well as to discrete-time dy- namics. It is efficacious, even to the point of detect- The three examples here surveyed are all appli- ing chaotic dynamics. This index is computable for cations of homological tools to problems of large realistic systems, thanks to recent progress in com- and often noisy data sets. However, there are nu- putational homology [4]. merous other examples of a different nature un- Work of Mischaikow et al. takes (noisy) time- der the same aegis of applied algebraic topology. series data and represents the dynamics as a multi- Many of these are obstruction-theoretic in nature valued map on a . By adapting the — topological measures of complexity of coordinat- Conley index to this setting and computing the ho- ing robots, synchronizing a network, or performing mological index, it is possible to verify the under- distributed asynchronous computation. lying dynamics, so long as the noise tolerances re- The list of mathematical ideas which were once spect the discretization assumptions. Rigorous re- erroneously derided as useless abstractions (uni- sults about experimental or numerical data include form convergence, algebra, theory, the following: etc.) is sufficiently long and embarrassing so as to suggest patience in the case of applied algebraic 1. For experimentally-generated data on the dy- topology. Given that the (hard) work of generat- namics of a magneto-elastic ribbon in an os- ing good algorithms for computing topological in- cillating magnetic field, a Conley index ap- variants for realistic systems is so recent [4], it can proach proves that the experimental system is be successfully argued that the spate of ad- chaotic (has positive topological entropy) [5]. vances in applied algebraic topology is neither co- The method is robust, and works even when incidental nor terminal. environmental noise alters the appearance of the data significantly. 2. Numerical simulations of the Kuramoto- Sivashinsky partial differential equation indi- Bibliography

[1] Y. Baryshnikov and R. Ghrist, “Target enumeration via Euler characteristic integrals,” preprint (2007). [2] G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian, “On the local behavior of spaces of nat- ural images,” Intl. J. Comput. Vision, 76(1), (2008), 1– 12. [3] G. Carlsson, T. Ishkhanov, F. M´emoli, D. Ringach, G. Sapiro, “Topological analysis of the responses of neu- rons in V1,” preprint (2007). [4] T. Kaczynski, K. Mischaikow, and M. Mrozek, Com- putational Homology, Applied Mathematical Sciences 157, Springer-Verlag, 2004. [5] K. Mischaikow, M. Mrowzek, J. Reiss, and A. Szym- czak, “Construction of Symbolic Dynamics from Ex- perimental Time Series,” Phys. Rev. Lett. 82, (1999) 1144 - 1147. [6] P. Zgliczynski and K. Mischaikow, “Rigorous numer- ics for partial differential equations: the Kuramoto- Sivashinsky equation,” Foundations of Comp. Math., 1, (2001), 255–288. [7] A. Zomorodian and G. Carlsson, “Computing Persis- tent Homology,” Disc. and Comp. Geom., 33, (2005), 249–274.

5