Hypoelliptic Diffusion Maps and Their Applications in Automated Geometric Morphometrics

by

Tingran Gao

Department of Mathematics
Duke University

Date:
Approved:
Ingrid Daubechies, Supervisor
Mauro Maggioni
Mark Stern
Yaron Lipman

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Mathematics in the Graduate School of Duke University
2015

Copyright © 2015 by Tingran Gao
All rights reserved

Abstract

We introduce Hypoelliptic Diffusion Maps (HDM), a novel semi-supervised machine learning framework for the analysis of collections of anatomical surfaces. Triangular meshes obtained from discretizing these surfaces are high-dimensional, noisy, and unorganized, which makes it difficult to consistently extract robust geometric features for the whole collection. Traditionally, biologists put equal numbers of "landmarks" on each mesh, and study the "shape space" with this fixed number of landmarks to understand patterns of shape variation in the collection of surfaces; we propose here a correspondence-based, landmark-free approach that automates this process while maintaining morphological interpretability. Our methodology avoids explicit feature extraction and is thus related to kernel methods, but the equivalent notion of "kernel function" takes values in pairwise correspondences between triangular meshes in the collection.
Under the assumption that the data set is sampled from a fibre bundle, we show that the new graph Laplacian defined in the HDM framework is the discrete counterpart of a class of hypoelliptic partial differential operators. This thesis is organized as follows: Chapter 1 is the introduction; Chapter 2 describes the correspondences between anatomical surfaces used in this research; Chapters 3 and 4 discuss the HDM framework in detail; Chapter 5 illustrates some interesting applications of this framework in geometric morphometrics.

To my parents, Liqin Lou & Bichi Gao

Contents

Abstract
List of Figures
List of Abbreviations and Symbols
Acknowledgements
1 Introduction
  1.1 Shape Spaces
  1.2 Non-Rigid Geometry Processing
  1.3 Correspondence-Based Shape Distances
  1.4 Diffusion Geometry
2 A Rigid-Motion-Invariant Wasserstein Distance
  2.1 Wasserstein Distance but Rigid-Motion-Invariant
  2.2 The Continuous Kantorovich-Procrustes Distance (CKPD)
  2.3 Relation with Some Other Shape Distances
3 Hypoelliptic Diffusion Maps on Tangent Bundles
  3.1 Introduction
  3.2 Motivating The Fibre Bundle Assumption
  3.3 Hypoelliptic Diffusion Maps: The Formulation
    3.3.1 Basic Setup
    3.3.2 Graph Hypoelliptic Laplacians
    3.3.3 Spectral Distances and Embeddings
  3.4 HDM on Tangent and Unit Tangent Bundles
    3.4.1 HDM on Tangent Bundles
    3.4.2 HDM on Unit Tangent Bundles
    3.4.3 Finite Sampling on Unit Tangent Bundles
  3.5 Numerical Experiments and the Riemannian Adiabatic Limits
    3.5.1 Sampling without Noise
    3.5.2 Sampling from Empirical Tangent Spaces
    3.5.3 As-Flat-As-Possible (AFAP) Connections
    3.5.4 An Excursion to the Riemannian Adiabatic Limits
  3.6 Discussion and Future Work
4 Hypoelliptic Diffusion Maps on the Heisenberg Group
  4.1 The Heisenberg Group H3(R)
  4.2 Hypoelliptic Diffusion Maps on H3(R)
    4.2.1 Construction
    4.2.2 Proof of Theorem 26
5 Applications and Future Work
  5.1 Applying HDM to Collections of Anatomical Surfaces
  5.2 Automatic Landmarking for Morphometrics via Sparsity
  5.3 Tree-Based Metric Approximation in Shape Spaces
  5.4 Phylogeny Reconstruction with Distance Methods
A The Geometry of Tangent Bundles
    A.0.1 Coordinate Charts on Tangent Bundles
    A.0.2 Horizontal and Vertical Differential Operators on Tangent Bundles
    A.0.3 The Unit Tangent Bundle as a Subbundle
B Proofs of Theorem 10, 14, 21, and 23
    B.0.4 Proofs of Theorem 10 and Theorem 14
    B.0.5 Proofs of Theorem 21 and Theorem 23
Bibliography
Biography

List of Figures

1.1 Automatic correspondence generated by the continuous Procrustes algorithm, between second mandibular molars of two lemurs
3.1 A diffusion map incorporating local geometric information
3.2 Non-triviality in analyzing a collection of teeth (cf. [25])
3.3 Holonomy on a unit sphere
3.4 Bar plots of the smallest 36 eigenvalues of 3 graph hypoelliptic Laplacians with fixed ε = 0.2 and varying δ (sampling without noise)
3.5 Bar plots of the smallest 36 eigenvalues of 3 graph hypoelliptic Laplacians with fixed ε = 0.2 and varying δ (sampling from empirical tangent spaces)
3.6 The vector field (near ξ_j) on S^2 determined by (3.5.5)
5.1 MDS embeddings of CPD (left) and HBDD (right)
5.2 Globally consistent segmentation by spectral clustering for HDM
5.3 Automatic landmarking using sparse eigenvectors for a group of 4 teeth
5.4 Phylogenies of a collection of 19 lemur teeth from 5 extant genera, constructed by applying UPGMA to shape distance matrices
5.5 Phylogenetic tree estimated with molecular-based methods

List of Abbreviations and Symbols

Symbols

R^n          n-dimensional Euclidean space
M            base manifold
F            fibre
TM           the tangent bundle of the Riemannian manifold M
UTM          the unit tangent bundle of the Riemannian manifold M
T_xM         the tangent space of M at x ∈ M
g_ij         components of the Riemannian metric tensor on M
d_M(·, ·)    geodesic distance on M
dvol_M       the volume form of M
Inj(M)       injectivity radius of M
P_{y,x}      parallel transport from T_xM to T_yM along the geodesic segment connecting x to y (for x, y sufficiently close)
Δ_H / Δ_V    horizontal/vertical Laplacian
K, K_PCA     kernel functions
ε, ε_B       diffusion bandwidth on the base manifold
δ, ε_F       diffusion bandwidth on the fibre

Abbreviations

CPD     Continuous Procrustes Distance
CKPD    Continuous Kantorovich-Procrustes Distance
GPD     Generalized Procrustes Distance
DM      Diffusion Maps
VDM     Vector Diffusion Maps
HDM     Hypoelliptic Diffusion Maps
HDD     Hypoelliptic Diffusion Distance
HBDM    Hypoelliptic Base Diffusion Maps
HBDD    Hypoelliptic Base Diffusion Distance

Acknowledgements

First and foremost, I would like to thank my Ph.D. advisor, Prof. Ingrid Daubechies, for her constant support and advice in the past few years. I sincerely appreciate the many efforts she made, financially and academically, that helped develop my path. I gratefully acknowledge the valuable information provided by Dr. Yaron Lipman and Dr. Roi Poranne that built up my geometry processing skill sets. Special thanks go to Dr.
Yaron Lipman for my two visits to the Weizmann Institute of Science. I am truly and deeply indebted to my mentors in all branches of mathematics: Prof. Robert Bryant, Prof. Mark Stern, Prof. Lenhard Ng, and Dr. Hangjun Xu for differential geometry; Prof. Mauro Maggioni and Dr. Hau-Tieng Wu for diffusion maps; Prof. Jianfeng Lu and Prof. Thomas Beale for analysis. I would also like to thank my first-year mentors, Prof. Jian-Guo Liu and Prof. Thomas Witelski, for the opportunity to pursue my Ph.D. at Duke. I thank Dr. Andrew Schretter for his enormous assistance in computational resources, and Prof. Doug Boyer and Dr. Gabriel Yapuncich for many informative discussions in biology. I want to thank Kangkang Wang, Hangjun Xu, and all my friends at Duke, for the precious memories from the good old days. I reserve a last special appreciation for Rujie Yin, my friend, teacher, and soul companion, the redefinition of my life.

1 Introduction

Phylogeny is the science that studies the evolution of, and the evolutionary patterns among, groups of species or populations. Molecular-based phylogeny, or phylogenetics, is a very successful tool for exploring the evolutionary history of species on earth, in which evolutionary biologists analyze nucleotide sequences or amino acid data, build gene trees from probabilistic substitution models that describe the mutation process of purines (the nucleobases A and G) and pyrimidines (C and T), and eventually estimate species trees from biological phenomena such as horizontal gene transfer, gene duplication/loss, or deep coalescence. The past few decades have witnessed enormous progress in this field.

In contrast, phylogenetics based on morphology, the branch of evolutionary biology founded on classifying geometric and structural features of the organisms of different species, is still in its infancy. Designing reliable methods to construct