Hypoelliptic Diffusion Maps and Their Applications

in Automated Geometric Morphometrics

by

Tingran Gao

Department of Mathematics Duke University

Date: Approved:

Ingrid Daubechies, Supervisor

Mauro Maggioni

Mark Stern

Yaron Lipman

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Mathematics in the Graduate School of Duke University

2015

Copyright © 2015 by Tingran Gao

All rights reserved

Abstract

We introduce Hypoelliptic Diffusion Maps (HDM), a novel semi-supervised machine learning framework for the analysis of collections of anatomical surfaces. Triangular meshes obtained from discretizing these surfaces are high-dimensional, noisy, and unorganized, which makes it difficult to consistently extract robust geometric features for the whole collection. Traditionally, biologists put equal numbers of “landmarks” on each mesh, and study the “shape space” with this fixed number of landmarks to understand patterns of shape variation in the collection of surfaces; we propose here a correspondence-based, landmark-free approach that automates this process while maintaining morphological interpretability. Our methodology avoids explicit feature extraction and is thus related to kernel methods, but the equivalent notion of “kernel function” takes values in pairwise correspondences between triangular meshes in the collection. Under the assumption that the data set is sampled from a fibre bundle, we show that the new graph Laplacian defined in the HDM framework is the discrete counterpart of a class of hypoelliptic partial differential operators.

This thesis is organized as follows: Chapter 1 is the introduction; Chapter 2 describes the correspondences between anatomical surfaces used in this research; Chapters 3 and 4 discuss the HDM framework in detail; Chapter 5 illustrates some interesting applications of this framework in geometric morphometrics.

To my parents, Liqin Lou & Bichi Gao

Contents

Abstract
List of Figures
List of Abbreviations and Symbols
Acknowledgements

1 Introduction
  1.1 Shape Spaces
  1.2 Non-Rigid Geometry Processing
  1.3 Correspondence-Based Shape Distances
  1.4 Diffusion Geometry

2 A Rigid-Motion-Invariant Wasserstein Distance
  2.1 Wasserstein Distance but Rigid-Motion-Invariant
  2.2 The Continuous Kantorovich-Procrustes Distance (CKPD)
  2.3 Relation with Some Other Shape Distances

3 Hypoelliptic Diffusion Maps on Tangent Bundles
  3.1 Introduction
  3.2 Motivating The Fibre Bundle Assumption
  3.3 Hypoelliptic Diffusion Maps: The Formulation
    3.3.1 Basic Setup
    3.3.2 Graph Hypoelliptic Laplacians
    3.3.3 Spectral Distances and Embeddings
  3.4 HDM on Tangent and Unit Tangent Bundles
    3.4.1 HDM on Tangent Bundles
    3.4.2 HDM on Unit Tangent Bundles
    3.4.3 Finite Sampling on Unit Tangent Bundles
  3.5 Numerical Experiments and the Riemannian Adiabatic Limits
    3.5.1 Sampling without Noise
    3.5.2 Sampling from Empirical Tangent Spaces
    3.5.3 As-Flat-As-Possible (AFAP) Connections
    3.5.4 An Excursion to the Riemannian Adiabatic Limits
  3.6 Discussion and Future Work

4 Hypoelliptic Diffusion Maps on the Heisenberg Group
  4.1 The Heisenberg Group $H_3(\mathbb{R})$
  4.2 Hypoelliptic Diffusion Maps on $H_3(\mathbb{R})$
    4.2.1 Construction
    4.2.2 Proof of Theorem 26

5 Applications and Future Work
  5.1 Applying HDM to Collections of Anatomical Surfaces
  5.2 Automatic Landmarking for Morphometrics via Sparsity
  5.3 Tree-Based Metric Approximation in Shape Spaces
  5.4 Phylogeny Reconstruction with Distance Methods

A The Geometry of Tangent Bundles
    A.0.1 Coordinate Charts on Tangent Bundles
    A.0.2 Horizontal and Vertical Differential Operators on Tangent Bundles
    A.0.3 The Unit as a Subbundle

B Proofs of Theorem 10, 14, 21, and 23
    B.0.4 Proofs of Theorem 10 and Theorem 14
    B.0.5 Proofs of Theorem 21 and Theorem 23

Bibliography

Biography

List of Figures

1.1 Automatic correspondence generated by the continuous Procrustes algorithm, between second mandibular molars of two lemurs

3.1 A diffusion map incorporating local geometric information

3.2 Non-triviality in analyzing a collection of teeth (cf. [25])

3.3 Holonomy on a unit sphere

3.4 Bar plots of the smallest 36 eigenvalues of 3 graph hypoelliptic Laplacians with fixed $\epsilon = 0.2$ and varying $\delta$ (sampling without noise)

3.5 Bar plots of the smallest 36 eigenvalues of 3 graph hypoelliptic Laplacians with fixed $\epsilon = 0.2$ and varying $\delta$ (sampling from empirical tangent spaces)

3.6 The vector field (near $\xi_j$) on $S^2$ determined by (3.5.5)

5.1 MDS embeddings of CPD (left) and HBDD (right)

5.2 Globally consistent segmentation by spectral clustering for HDM

5.3 Automatic landmarking using sparse eigenvectors for a group of 4 teeth

5.4 Phylogenies of a collection of 19 lemur teeth from 5 extant genera, constructed by applying UPGMA to shape distance matrices

5.5 Phylogenetic tree estimated with molecular-based methods

List of Abbreviations and Symbols

Symbols

$\mathbb{R}^n$   n-dimensional Euclidean space

$M$   base manifold

$F$   fibre

$TM$   the tangent bundle of $M$

$UTM$   the unit tangent bundle of the Riemannian manifold $M$

$T_xM$   the tangent space of $M$ at $x \in M$

$g_{ij}$   components of the Riemannian metric tensor on $M$

$d_M(\cdot, \cdot)$   distance on $M$

$d\mathrm{vol}_M$   the volume form of $M$

$\mathrm{Inj}(M)$   injectivity radius of $M$

$P_{y,x}$   parallel transport from $T_xM$ to $T_yM$ along the geodesic segment connecting $x$ to $y$ (for $x$, $y$ sufficiently close)

$\Delta_H$ / $\Delta_V$   horizontal/vertical Laplacian

$K$, $K_{\mathrm{PCA}}$   kernel functions

$\epsilon$, $\epsilon_B$   diffusion bandwidth on the base manifold

$\delta$, $\epsilon_F$   diffusion bandwidth on the fibre

Abbreviations

CPD Continuous Procrustes Distance

CKPD Continuous Kantorovich-Procrustes Distance

GPD Generalized Procrustes Distance

DM Diffusion Maps

VDM Vector Diffusion Maps

HDM Hypoelliptic Diffusion Maps

HDD Hypoelliptic Diffusion Distance

HBDM Hypoelliptic Base Diffusion Maps

HBDD Hypoelliptic Base Diffusion Distance

Acknowledgements

First and foremost, I would like to thank my Ph.D. advisor, Prof. Ingrid Daubechies, for her constant support and advice in the past few years. I sincerely appreciate the many efforts she made, financially and academically, that helped develop my path. I gratefully acknowledge the valuable information provided by Dr. Yaron Lipman and Dr. Roi Poranne that built up my geometry processing skill sets. Special thanks go to Dr. Yaron Lipman for my two visits to the Weizmann Institute of Science. I am truly and deeply indebted to my mentors in all branches of mathematics: Prof. Robert Bryant, Prof. Mark Stern, Prof. Lenhard Ng, and Dr. Hangjun Xu for differential geometry; Prof. Mauro Maggioni and Dr. Hau-Tieng Wu for diffusion maps; Prof. Jianfeng Lu and Prof. Thomas Beale for analysis. I would also like to thank my first-year mentors, Prof. Jian-Guo Liu and Prof. Thomas Witelski, for the opportunity to pursue my Ph.D. at Duke. I thank Dr. Andrew Schretter for his enormous assistance in computational resources, and Prof. Doug Boyer and Dr. Gabriel Yapuncich for many informative discussions in biology. I want to thank Kangkang Wang, Hangjun Xu, and all my friends at Duke, for the precious memories from the good old days. I reserve a last special appreciation for Rujie Yin, my friend, teacher, and soul companion, the redefinition of my life.

1

Introduction

Phylogeny is the science that studies the evolution of, and evolutionary patterns among, groups of species or populations. Molecular-based phylogeny, or phylogenetics, is a very successful tool for exploring the evolutionary history of species on earth: evolutionary biologists analyze nucleotide sequences or amino acid data, build gene trees from probabilistic substitution models that describe the mutation process of purines (the nucleobases A and G) and pyrimidines (C and T), and eventually estimate species trees from biological phenomena such as horizontal gene transfer, gene duplication/loss, or deep coalescence. The past few decades have witnessed enormous progress in this field.

In contrast, phylogenetics based on morphology, the branch of evolutionary biology founded on classifying geometric and structural features of the organisms of different species, is still in its infancy. Designing reliable methods to construct phylogenies from morphological data, validated by comparison with molecular-based phylogenetic “ground truth” for extant species, would permit morpho-based phylogenetic tree-building for species in the fossil record that, due to the lack of DNA information, lie far beyond the reach of molecular-based methods.

Not surprisingly, the early exploration in the science of phylogeny, dating back to Carl Linnaeus (1707-1778), relied heavily on morphology. More recently, insights gained from molecular biology have become more dominant. An important factor in the success of molecular-based phylogeny is that it lends itself easily to the quantitative description of the evolutionary process, involving probability models as well as efficient algorithms for computation. Morphology is more complex than the genome, because of its infinite-dimensional variability and the lack of appropriate substitution models; many computational biologists, when building a morphological counterpart of the molecular-based methodology, introduce a coarse discretization of morphology (so as to enable the use of discrete algorithms) that typically renders it more qualitative than truly quantitative.

Morphology, however, is also an evolving branch of evolutionary biology. Adopting modern digital data acquisition/representation techniques and computer graphics/geometry processing algorithms, it strives to provide a quantitative description of the geometric dissimilarity between organisms. For instance, the recent development of microCT-scanning, a three-dimensional scanning technique, has enabled the acquisition of relatively large numbers of digitized anatomical surfaces; these data sets are the focus of the research documented in this thesis.

In this introductory chapter, we briefly overview the ingredients from mathematics, statistics, and computer science that motivate our research, and articulate the methodology of correspondence-based shape analysis.

The remainder of the thesis is structured as follows: in Chapter 2, we provide an analysis of the softassign Procrustes matching algorithm [122], and point out that it can be understood as a rigid-motion-invariant Wasserstein distance; in Chapter 3, we introduce the concept of Hypoelliptic Diffusion Maps (HDM), a framework that generalizes Diffusion Maps to data sets with a fibre bundle structure (such as the collection of shapes in our application), and study in detail its theoretical foundation in the case of tangent and unit tangent bundles of a closed Riemannian manifold; in Chapter 4, we construct HDM on the 3-dimensional Heisenberg group, illustrating the extension of the HDM construction to more general fibre bundles or sub-Riemannian manifolds; some applications are given in Chapter 5.

Figure 1.1: Automatic correspondence generated by the continuous Procrustes algorithm, between second mandibular molars of two lemurs.

1.1 Shape Spaces

Following D. Kendall [78], statisticians view the collection of shapes as an abstract metric space or a Riemannian manifold, with each point representing a specific shape. The original approach is landmark-based, meaning it relies on a discretization of the continuous shape, and derives a distance between two shapes from the distances between corresponding landmarks on the pair. This is highly relevant to landmark- and Procrustes-based geometric morphometrics [102], which typically requires selecting equal numbers of consistently homologous landmark points on each anatomical surface in the first step, and treating the coordinates of the landmark points as the full geometric traits in subsequent analysis. Along this line, standard geometric morphometric software such as morphologika2 [111] has been developed.

The landmark-based shape space analysis is limited by the knowledge of landmark placement. From a mathematical point of view, extracting a finite number of landmarks from a continuous surface inevitably loses geometric information, unless the shapes under consideration are uniquely determined by the landmarks (e.g. polygonal shapes, as considered in [78, 53]), which is rarely the case for geometric morphometricians in biology; from a practical point of view, the requirement that an equal number of landmarks must be chosen on each shape is sometimes unrealistic due to the complex evolutionary and developmental process. Moreover, manually placing landmarks on each shape among a large collection is a tedious task, and the skill to perform it “correctly” typically requires years of professional training; even then the “correctness” can be subject to debate among experts.

In contrast to representing each shape with a finite number of landmarks, we can treat the entire shape as an object, modeled as a Riemannian manifold, and study the geometry of the space of Riemannian manifolds. Since Riemannian manifolds are metric spaces (more specifically, they are intrinsic metric spaces), one way to study them is via the Gromov-Hausdorff distance. Motivated by the work of Y. Shikata [130], Gromov established two precompactness theorems for sequences of closed Riemannian manifolds with prescribed geometric constraints, which laid the foundation of Large Scale Geometry [62, 88, 110]. From a spectral point of view, similar precompactness can be obtained from embedding each Riemannian manifold into $\ell^2$, the space of real-valued, square-summable sequences, using the heat kernel of the Laplace-Beltrami operator on the manifold. Over the past decades, the field of manifold learning has benefited from this spectral embedding scheme [87, 41, 74]; a similar embedding has been recently proposed [156] that uses the heat kernel of the connection Laplacian of a Riemannian manifold. Computationally, methods based on the Gromov-Hausdorff distance have also been developed for shape analysis, see [95, 96, 97] and the references therein.

Amazingly, the idea of studying the set of all smooth submanifolds of a fixed ambient smooth manifold (of finite or infinite dimensionality) can be traced back to B. Riemann, in his Habilitationsschrift [124] (translated by W.K. Clifford [123]):

Es giebt indess auch Mannigfaltigkeiten, in welchen die Ortsbestimmung nicht eine endliche Zahl, sondern entweder eine unendliche Reihe oder eine stetige Mannigfaltigkeit von Grössenbestimmungen erfordert. Solche Mannigfaltigkeiten bilden z. B. die möglichen Bestimmungen einer Function für ein gegebenes Gebiet, die möglichen Gestalten einer räumlichen Figur u. s. w.

There are manifoldnesses in which the determination of position requires not a finite number, but either an endless series or a continuous manifoldness of determinations of quantity. Such manifoldnesses are, for example, the possible determinations of a function for a given region, the possible shapes of a solid figure, etc.

Apparently, Riemann conceived the study of the “Riemannian manifold of Riemannian manifolds”, realizing that the underlying manifold is infinite-dimensional in nature. In their pioneering work [99, 100], Michor and Mumford studied such an infinite-dimensional Riemannian structure on the space of smooth regular curves in $\mathbb{R}^2$, a mathematical model for (the boundary of) planar shapes, which is clearly independent of the choice of landmarks. Based on their explicit computation of geodesics and curvatures in this shape space, PDE and variational methods have been developed [98, 158] and applied to many problems arising from medical imaging and computer vision. Although these approaches are beautiful and mathematically satisfying, they often suffer from high computational cost in practice.

Recent progress in geometry modeling and shape deformation has made the concept of shape spaces a topic of high interest in computer graphics and geometry processing. The work in this direction extends the computational framework from planar shapes to three-dimensional triangular meshes, and is often computationally much more efficient; see [82, 73, 158, 155, 112, 65] for some examples.

1.2 Non-Rigid Geometry Processing

MicroCT-scanning and associated post-processing transform anatomical surfaces into triangular meshes. To extract geometric information from triangular meshes, techniques from computer graphics and geometry processing are essential. Computer graphics provides visual representations of discretized meshes through many existing software tools and libraries [131]; geometry processing deals with a much wider range of problems including feature detection, parametrization, reconstruction, and deformation, to name just a few. The main difficulty in applying these algorithms to geometric morphometrics stems from the fact that the variation of anatomical shapes is hard to characterize using existing tools; it is thus important to adapt techniques from Non-Rigid Geometry Processing [27] to our data sets of anatomical surfaces. For instance, algorithms such as fast marching [85], farthest point sampling [103], conformal parametrization [117], and thin-plate splines [22] have been found crucial in our research.

1.3 Correspondence-Based Shape Distances

From a machine learning/pattern recognition point of view, it is natural to solve the species classification problem in two steps: first extract features from each shape, and then perform classification/clustering algorithms in the feature space. Reliable features on anatomical surfaces are not easy to detect; nevertheless, this approach becomes feasible due to the recent progress in topological data analysis, in particular the invention of the persistent homology transform (PHT) [148], a sufficient statistic for 2-dimensional shapes in $\mathbb{R}^3$. However, in addition to correct and robust species identification, it is often desirable in geometric morphometrics to identify functionally or developmentally equivalent parts on each shape, for the purpose of phylogeny reconstruction; topological data analysis algorithms such as PHT generally lack such interpretability. In contrast, our methodology is based on pairwise correspondences, which can be understood as a natural extension of Procrustes analysis to the context of smooth surfaces. The Conformal Wasserstein Distance [89, 91], Continuous Procrustes Distance [3], and Generalized Procrustes Distance [119, 120] are instantiations of this program, replacing the individual feature extraction with a pairwise (latent) feature matching phase; in a second stage, based on all pairwise correspondences, we detect globally consistent features for the whole collection of shapes through HDM, an extension of the diffusion map framework, to be detailed in Chapter 3 and Chapter 4.

The correspondence-based shape distances considered here share a variational paradigm. Let $S_1$, $S_2$ be two 2-dimensional surfaces in $\mathbb{R}^3$, and let $\mathcal{A}(S_1, S_2)$ be an admissible set of correspondences between $S_1$ and $S_2$. The distance between $S_1$ and $S_2$ is defined as

\[
D(S_1, S_2) = \inf_{f \in \mathcal{A}(S_1, S_2)} F(f; S_1, S_2), \tag{1.3.1}
\]

where $F$ is a positive-valued smooth functional that provides the necessary geometric structures to make $D(\cdot, \cdot)$ a distance. We shall impose additional constraints on $F$, such that solving the minimization problem (1.3.1) not only yields the desired shape distance, but also provides an “optimal correspondence” between $S_1$ and $S_2$.

1.4 Diffusion Geometry

Under the assumption that a collection of anatomical surfaces is sampled i.i.d. from an underlying shape manifold, techniques from manifold learning, in particular diffusion maps [41, 87, 39, 40, 42, 135, 133], provide a natural framework to “knit together” local information (pairwise shape distances and correspondences) into global geometry (shape manifold). While standard diffusion maps only take into consideration the pairwise distances, we extend this framework to incorporate the pairwise correspondences as well; the output of the algorithm thus provides a globally consistent set of “synchronized” pairwise correspondences, which is essentially equivalent to the globally consistent identification of landmark points for geometric morphometricians. Motivated by [90, 132], a similar framework has been developed in [83] as a visualization tool, based on a dissimilarity score computed from local and global shape alignments; we study the framework of hypoelliptic diffusion maps (HDM) from a manifold learning point of view, laying the mathematical foundation for studying the underlying geometric and probabilistic structures.
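For orientation, the standard diffusion map construction that takes only pairwise distances as input can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy is available; it is not the HDM construction of Chapters 3 and 4, it omits the density normalization used in parts of the diffusion map literature, and the names `diffusion_map`, `eps`, `n_coords`, and `t` are hypothetical choices made for this sketch.

```python
import numpy as np

def diffusion_map(D, eps, n_coords=2, t=1):
    """Standard diffusion map from an (N, N) symmetric distance matrix D.

    eps is the kernel bandwidth, n_coords the number of embedding
    coordinates, t the diffusion time (all illustrative parameters).
    """
    K = np.exp(-D ** 2 / eps)            # Gaussian kernel on pairwise distances
    d = K.sum(axis=1)                    # degrees (row sums of the kernel)
    # Symmetric conjugate A = diag(d)^{-1/2} K diag(d)^{-1/2} of the
    # Markov matrix P = diag(d)^{-1} K; A and P share eigenvalues.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)       # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]       # re-sort in descending order
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]     # right eigenvectors of P
    # Skip the trivial eigenpair (eigenvalue 1, constant eigenvector).
    coords = psi[:, 1:n_coords + 1] * vals[1:n_coords + 1] ** t
    return vals, coords

# Toy data: 40 equally spaced points on the unit circle.
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
pts = np.column_stack([np.cos(angles), np.sin(angles)])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
vals, coords = diffusion_map(D, eps=0.5)
```

In the setting of this thesis the matrix `D` would hold pairwise shape distances; the sketch makes visible that the correspondences themselves never enter the standard construction.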

2

A Rigid-Motion-Invariant Wasserstein Distance

The shape distance considered in this chapter is motivated by the Continuous Procrustes Distance [3]. Let $S_1$, $S_2$ be two compact smooth surfaces with (smooth) boundaries embedded in $\mathbb{R}^3$, normalized to have equal surface area. Let $\mathcal{A}(S_1, S_2)$ denote the set of area-preserving diffeomorphisms between $S_1$ and $S_2$. The Continuous Procrustes Distance between $S_1$ and $S_2$ is defined as

\[
D_{cP}(S_1, S_2) = \inf_{C \in \mathcal{A}(S_1, S_2)} \inf_{R \in \mathbb{E}(3)} \left( \int_{S_1} \| R(x) - C(x) \|^2 \, d\mathrm{vol}_{S_1}(x) \right)^{1/2} \tag{2.0.1}
\]

where $\mathbb{E}(3) = \mathbb{R}^3 \rtimes O(3)$ is the Euclidean group in $\mathbb{R}^3$. In practice, searching the space $\mathcal{A}(S_1, S_2)$ is computationally intractable, but can be well approximated by searching the (much smaller) space of all Möbius transforms between $S_1$ and $S_2$, as long as $D_{cP}(S_1, S_2)$ is small (see [3, Theorem 3.10]); the optimal correspondence is thus a Möbius transform (often composed with a thin-plate spline interpolant) instead of an area-preserving map between $S_1$ and $S_2$. In theory, the existence and uniqueness of a minimizer in $\mathcal{A}(S_1, S_2)$ is unclear, which motivated the study of a “relaxation” of the minimization problem (2.0.1), to be detailed in this chapter.

2.1 Wasserstein Distance but Rigid-Motion-Invariant

Let $\mu$, $\nu$ be two Borel probability measures in $\mathbb{R}^n$, and denote by $\Pi(\mu, \nu)$ the set of all couplings between $\mu$ and $\nu$, i.e.,

\[
\Pi(\mu, \nu) := \left\{ \pi \in \mathrm{Prob}(\mathbb{R}^n \times \mathbb{R}^n) \mid \pi(A \times \mathbb{R}^n) = \mu(A), \ \pi(\mathbb{R}^n \times A) = \nu(A), \ A \in \mathcal{B} \right\}, \tag{2.1.1}
\]

where $\mathcal{B}$ stands for the collection of all Borel sets in $\mathbb{R}^n$. The $L^2$-Wasserstein distance between $\mu$ and $\nu$ is defined as

\[
W_2(\mu, \nu) := \inf_{\pi \in \Pi(\mu, \nu)} \left( \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| x - y \|^2 \, d\pi(x, y) \right)^{1/2}, \tag{2.1.2}
\]

see, e.g., [150, 151]. We consider the following variant of (2.1.2), which incorporates the invariance under rigid motions:

\[
W_{2,*}(\mu, \nu) := \inf_{\pi \in \Pi(\mu, \nu)} \inf_{\substack{R \in O(n) \\ t \in \mathbb{R}^n}} \left( \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| Rx + t - y \|^2 \, d\pi(x, y) \right)^{1/2}. \tag{2.1.3}
\]

The minimization problem (2.1.3) is non-convex, since it requires searching over the orthogonal group $O(n)$. However, by a compactness argument we can still prove the existence of an optimal pair $(\pi_*, R_*)$ that achieves the infimum in (2.1.3). For simplicity of notation, let us write

\[
\Lambda(\pi; \mu, \nu) = \inf_{\substack{R \in O(n) \\ t \in \mathbb{R}^n}} \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| Rx + t - y \|^2 \, d\pi(x, y) \tag{2.1.4}
\]

and

\[
E(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \Lambda(\pi; \mu, \nu). \tag{2.1.5}
\]
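To make the definition (2.1.3) concrete, the one-dimensional case can be computed by hand, and the following standard-library Python sketch does so under two stated assumptions that are not part of the text above: both measures are uniform empirical measures with the same number of atoms, so that the optimal coupling in (2.1.2) is the sorted (monotone) matching, and the optimal translation is absorbed by centring both measures at their means. In dimension one the orthogonal group is $O(1) = \{+1, -1\}$, so the search over $R$ reduces to two cases. The function names are hypothetical.

```python
import math

def w2_sorted(xs, ys):
    """W_2 between two uniform empirical measures on the real line with
    the same number of atoms; the sorted matching is optimal there."""
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(xs))

def w2_rigid_invariant_1d(xs, ys):
    """Rigid-motion-invariant variant in dimension one."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    xc = [a - mx for a in xs]   # centring absorbs the optimal translation
    yc = [b - my for b in ys]
    # O(1) has two elements: the identity and the reflection.
    return min(w2_sorted([r * a for a in xc], yc) for r in (1.0, -1.0))

xs = [0.0, 1.0, 3.0]
ys = [10.0, 9.0, 7.0]           # ys = 10 - xs: a reflection followed by a shift
plain = w2_sorted(xs, ys)                   # sensitive to the rigid motion
invariant = w2_rigid_invariant_1d(xs, ys)   # vanishes up to rounding
```

The plain distance here equals $\sqrt{54}$, while the invariant one is zero up to floating-point rounding, which is the behaviour (2.1.3) is designed to have.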

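Proposition 1. For a fixed $\pi \in \Pi(\mu, \nu)$, the infimum in (2.1.4) is attained by a pair $(R_*, t_*) \in O(n) \times \mathbb{R}^n$. Moreover, if we let

\[
\bar{\mu} := \int_{\mathbb{R}^n} x \, d\mu(x), \qquad \bar{\nu} := \int_{\mathbb{R}^n} y \, d\nu(y),
\]

then

\[
t_* = \bar{\nu} - R_* \bar{\mu}, \tag{2.1.6}
\]

\[
R_* = W Q^{\top}, \tag{2.1.7}
\]

where $W$, $Q$ are determined by the following Singular Value Decomposition (SVD)

\[
\iint_{\mathbb{R}^n \times \mathbb{R}^n} (x - \bar{\mu})(y - \bar{\nu})^{\top} \, d\pi(x, y) = Q \Sigma W^{\top}. \tag{2.1.8}
\]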

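Proof. In (2.1.4), take the gradient with respect to $t$:

\begin{align*}
\nabla_t \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| Rx + t - y \|^2 \, d\pi(x, y)
&= \nabla_t \iint_{\mathbb{R}^n \times \mathbb{R}^n} \langle Rx + t - y, Rx + t - y \rangle \, d\pi(x, y) \\
&= \nabla_t \left[ \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| t \|^2 \, d\pi(x, y) + 2 \iint_{\mathbb{R}^n \times \mathbb{R}^n} \langle t, Rx - y \rangle \, d\pi(x, y) + \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| Rx - y \|^2 \, d\pi(x, y) \right] \\
&= 2t + 2 \iint_{\mathbb{R}^n \times \mathbb{R}^n} Rx \, d\pi(x, y) - 2 \iint_{\mathbb{R}^n \times \mathbb{R}^n} y \, d\pi(x, y) \\
&= 2 \left( t + R \int_{\mathbb{R}^n} x \, d\mu(x) - \int_{\mathbb{R}^n} y \, d\nu(y) \right) = 2 \left( t + R \bar{\mu} - \bar{\nu} \right).
\end{align*}

Setting the gradient to zero gives (2.1.6). Using $t = \bar{\nu} - R \bar{\mu}$, we have

\begin{align*}
\iint_{\mathbb{R}^n \times \mathbb{R}^n} \| Rx + t - y \|^2 \, d\pi(x, y)
&= \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| Rx - R\bar{\mu} - y + \bar{\nu} \|^2 \, d\pi(x, y) \\
&= \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| R(x - \bar{\mu}) - (y - \bar{\nu}) \|^2 \, d\pi(x, y) \\
&= \iint_{\mathbb{R}^n \times \mathbb{R}^n} (x - \bar{\mu})^{\top} R^{\top} R (x - \bar{\mu}) \, d\pi(x, y) + \iint_{\mathbb{R}^n \times \mathbb{R}^n} (y - \bar{\nu})^{\top} (y - \bar{\nu}) \, d\pi(x, y) \\
&\quad - 2 \iint_{\mathbb{R}^n \times \mathbb{R}^n} (y - \bar{\nu})^{\top} R (x - \bar{\mu}) \, d\pi(x, y) \\
&= \int_{\mathbb{R}^n} (x - \bar{\mu})^{\top} (x - \bar{\mu}) \, d\mu(x) + \int_{\mathbb{R}^n} (y - \bar{\nu})^{\top} (y - \bar{\nu}) \, d\nu(y) \\
&\quad - 2 \iint_{\mathbb{R}^n \times \mathbb{R}^n} (y - \bar{\nu})^{\top} R (x - \bar{\mu}) \, d\pi(x, y).
\end{align*}

Note that the first two terms on the right hand side do not depend on $R$. In order to minimize the left hand side, it thus suffices to maximize the last double integral on the right hand side. To do this, note that

\begin{align*}
\iint_{\mathbb{R}^n \times \mathbb{R}^n} (y - \bar{\nu})^{\top} R (x - \bar{\mu}) \, d\pi(x, y)
&= \iint_{\mathbb{R}^n \times \mathbb{R}^n} \mathrm{Trace} \left[ (y - \bar{\nu})^{\top} R (x - \bar{\mu}) \right] d\pi(x, y) \\
&= \iint_{\mathbb{R}^n \times \mathbb{R}^n} \mathrm{Trace} \left[ R (x - \bar{\mu}) (y - \bar{\nu})^{\top} \right] d\pi(x, y) \\
&= \mathrm{Trace} \left[ R \iint_{\mathbb{R}^n \times \mathbb{R}^n} (x - \bar{\mu}) (y - \bar{\nu})^{\top} \, d\pi(x, y) \right] \\
&= \mathrm{Trace} \left( R Q \Sigma W^{\top} \right) = \mathrm{Trace} \left( W^{\top} R Q \Sigma \right),
\end{align*}

where $Q$, $\Sigma$, $W$ come from the SVD

\[
\iint_{\mathbb{R}^n \times \mathbb{R}^n} (x - \bar{\mu}) (y - \bar{\nu})^{\top} \, d\pi(x, y) = Q \Sigma W^{\top}.
\]

Since $W^{\top} R Q$ is orthogonal, its diagonal entries are at most 1 in absolute value, while $\Sigma$ is diagonal with nonnegative entries; one therefore has

\[
\max_{R \in O(n)} \mathrm{Trace} \left( W^{\top} R Q \Sigma \right) = \mathrm{Trace}(\Sigma),
\]

with the maximum attained by the $R_*$ satisfying $W^{\top} R_* Q = I \Leftrightarrow R_* = W Q^{\top}$.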
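For a discrete coupling, the closed-form minimizer of Proposition 1 can be evaluated directly. The following is a minimal NumPy sketch under the assumption that $\mu$ and $\nu$ are finitely supported, with atoms stored as the rows of `X` and `Y` and the coupling stored as a nonnegative matrix `P` summing to one; the function name and argument layout are hypothetical and not taken from the thesis.

```python
import numpy as np

def optimal_rigid_motion(X, Y, P):
    """Minimise sum_ij P[i, j] * ||R X[i] + t - Y[j]||^2 over R in O(n), t in R^n.

    X: (m, n) atoms of mu, Y: (k, n) atoms of nu,
    P: (m, k) coupling whose row sums are mu and column sums are nu.
    """
    mu_w = P.sum(axis=1)                  # marginal weights of mu
    nu_w = P.sum(axis=0)                  # marginal weights of nu
    mu_bar = mu_w @ X                     # mean of mu
    nu_bar = nu_w @ Y                     # mean of nu
    # Discrete version of the matrix in (2.1.8).
    C = (X - mu_bar).T @ P @ (Y - nu_bar)
    Q, sigma, Wt = np.linalg.svd(C)       # C = Q diag(sigma) W^T
    R = Wt.T @ Q.T                        # (2.1.7): R_* = W Q^T
    t = nu_bar - R @ mu_bar               # (2.1.6)
    return R, t

# Synthetic check: Y is an exact rigid image of X, coupled point to point.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
theta = 0.7
R0 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,          -1.0]])  # in O(3), determinant -1
t0 = np.array([1.0, -2.0, 0.5])
Y = X @ R0.T + t0
P = np.eye(6) / 6.0
R, t = optimal_rigid_motion(X, Y, P)
```

Because the minimization is over the full orthogonal group, no determinant correction is applied, so reflections such as `R0` above are recovered as well.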

From now on, let us denote

\[
\mathrm{Var}(\mu) := \int_{\mathbb{R}^n} \| x - \bar{\mu} \|^2 \, d\mu(x), \qquad \mathrm{Var}(\nu) := \int_{\mathbb{R}^n} \| y - \bar{\nu} \|^2 \, d\nu(y)
\]

for the centered second moments of $\mu$, $\nu$ respectively, and

\[
\mathrm{Cov}(\pi; \mu, \nu) := \iint_{\mathbb{R}^n \times \mathbb{R}^n} (x - \bar{\mu})(y - \bar{\nu})^{\top} \, d\pi(x, y), \tag{2.1.9}
\]

since the quantity on the right hand side is essentially a “covariance” between $\mu$, $\nu$.

Corollary 2. The right hand side of (2.1.4) can be written as

\[
\Lambda(\pi; \mu, \nu) = \mathrm{Var}(\mu) + \mathrm{Var}(\nu) - 2 \, \| \mathrm{Cov}(\pi; \mu, \nu) \|_* ,
\]

where $\| \cdot \|_*$ stands for the nuclear norm or Schatten 1-norm of a square matrix. The minimization problem (2.1.5) is thus equivalent to the following maximization problem

\[
\tilde{E}(\mu, \nu) = \sup_{\pi \in \Pi(\mu, \nu)} \| \mathrm{Cov}(\pi; \mu, \nu) \|_* . \tag{2.1.10}
\]

Without loss of generality, we may assume $\bar{\mu} = \bar{\nu}$ beyond this point. The existence of a minimizing pair $(\pi_*, R_*)$ can be established from the weak-$*$ compactness of $\Pi(\mu, \nu)$ and the weak-$*$ continuity of the functional maximized in (2.1.10).
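The identity in Corollary 2 is easy to check numerically for finitely supported measures. The sketch below, which assumes NumPy and uses the hypothetical helper name `lambda_two_ways`, evaluates $\Lambda(\pi; \mu, \nu)$ once through the nuclear norm formula and once by plugging the SVD minimizer from (2.1.6)-(2.1.8) into the cost; the random coupling is synthetic test data and its marginals are taken as the definitions of $\mu$ and $\nu$.

```python
import numpy as np

def lambda_two_ways(X, Y, P):
    """Return (closed_form, direct) values of Lambda for a discrete coupling P.

    X: (m, n) atoms of mu, Y: (k, n) atoms of nu, P: (m, k) coupling.
    """
    mu_w, nu_w = P.sum(axis=1), P.sum(axis=0)
    mu_bar, nu_bar = mu_w @ X, nu_w @ Y
    Xc, Yc = X - mu_bar, Y - nu_bar
    cov = Xc.T @ P @ Yc                          # discrete version of (2.1.9)
    Q, sigma, Wt = np.linalg.svd(cov)
    var_mu = mu_w @ (Xc ** 2).sum(axis=1)
    var_nu = nu_w @ (Yc ** 2).sum(axis=1)
    closed = var_mu + var_nu - 2.0 * sigma.sum() # nuclear norm = sum of singular values
    # Direct evaluation of the cost at the minimiser (R_*, t_*).
    R = Wt.T @ Q.T
    t = nu_bar - R @ mu_bar
    diff = (X @ R.T + t)[:, None, :] - Y[None, :, :]   # shape (m, k, n)
    direct = np.sum(P * (diff ** 2).sum(axis=2))
    return float(closed), float(direct)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(4, 3))
P = rng.random((5, 4))
P /= P.sum()                 # a generic coupling of its own marginals
closed, direct = lambda_two_ways(X, Y, P)
```

The two returned values agree up to floating-point error, which is the content of the corollary for this particular coupling.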

Lemma 3 (Existence of a Minimizer). Suppose $\bar{\mu} = \bar{\nu}$. Then there exists a pair $(\pi_*, R_*) \in \Pi(\mu, \nu) \times O(n)$ such that

\[
\left( W_{2,*}(\mu, \nu) \right)^2 = \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| R_* x - y \|^2 \, d\pi_*(x, y).
\]

Proof. It suffices to show that the supremum in (2.1.10) is attained by some $\pi_* \in \Pi(\mu, \nu)$. By the weak-$*$ compactness of $\Pi(\mu, \nu)$ (see [89]) and the Banach-Alaoglu Theorem, it suffices to show the weak-$*$ continuity of the functional

\[
\pi \longmapsto \| \mathrm{Cov}(\pi; \mu, \nu) \|_* .
\]

The following argument establishes this weak-$*$ continuity. Assume $\{\pi_j\}_{j=1}^{\infty}$ is a weakly convergent sequence of measures in $\Pi(\mu, \nu)$, with limit $\pi$. Note that the map from $(x, y)$ to each entry of the matrix $(x - \bar{\mu})(y - \bar{\nu})^{\top}$ is a continuous function, thus by the assumption of the weak-$*$ convergence of $\{\pi_j\}_{j=1}^{\infty}$,

\[
\lim_{j \to \infty} \mathrm{Cov}(\pi_j; \mu, \nu) = \lim_{j \to \infty} \iint_{\mathbb{R}^n \times \mathbb{R}^n} (x - \bar{\mu})(y - \bar{\nu})^{\top} \, d\pi_j(x, y) = \iint_{\mathbb{R}^n \times \mathbb{R}^n} (x - \bar{\mu})(y - \bar{\nu})^{\top} \, d\pi(x, y) = \mathrm{Cov}(\pi; \mu, \nu),
\]

where the limit holds for each entry of the $n \times n$ real matrix. By the continuity of matrix norms,

\[
\lim_{j \to \infty} \| \mathrm{Cov}(\pi_j; \mu, \nu) \|_* = \| \mathrm{Cov}(\pi; \mu, \nu) \|_* .
\]

This completes the proof of the lemma.

Since $W_{2,*}$ is invariant under rigid motions, it does not distinguish measures that are pull-backs of each other by a rigid motion. Denote

\[
\mathrm{Prob}_*(\mathbb{R}^n) := \mathrm{Prob}(\mathbb{R}^n) / \sim, \tag{2.1.11}
\]

where the equivalence relation “$\sim$” for $\mu, \nu \in \mathrm{Prob}(\mathbb{R}^n)$ is defined as

\[
\mu \sim \nu \ \Leftrightarrow \ \int_{\mathbb{R}^n} f(x) \, d\mu(x) = \int_{\mathbb{R}^n} f(Ry + t) \, d\nu(y) \quad \text{for some } R \in O(n), \ t \in \mathbb{R}^n.
\]

Following conventions, points in $\mathrm{Prob}_*(\mathbb{R}^n)$ are denoted by brackets $[\cdot]$. We need to check that $W_{2,*}$ defines a distance on $\mathrm{Prob}_*(\mathbb{R}^n)$. We could have written $W_{2,*}([\mu], [\nu])$ since it is defined over the equivalence classes of $\mathrm{Prob}(\mathbb{R}^n)$, but by abusing notation we write $W_{2,*}(\mu, \nu)$ for simplicity, whenever its meaning is clear from the context. Obviously, for each pair of $[\mu], [\nu] \in \mathrm{Prob}_*(\mathbb{R}^n)$,

\[
W_{2,*}(\mu, \nu) \ge 0
\]

and

\[
W_{2,*}(\mu, \nu) = W_{2,*}(\nu, \mu).
\]

In order to show that $W_{2,*}$ defines a distance, we still have to prove the triangle inequality and the if-and-only-if statement

\[
W_{2,*}(\mu, \nu) = 0 \ \Leftrightarrow \ [\mu] = [\nu].
\]

They are established in the following two propositions.

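Proposition 4 (Triangle Inequality). For any $[\mu], [\nu], [\omega] \in \mathrm{Prob}_*(\mathbb{R}^n)$,

\[
W_{2,*}(\mu, \omega) \le W_{2,*}(\mu, \nu) + W_{2,*}(\nu, \omega).
\]

Proof. Let $\mu, \nu, \omega \in \mathrm{Prob}(\mathbb{R}^n)$ be such that $\bar{\mu} = \bar{\nu} = \bar{\omega} = 0$. According to Lemma 3, there exist $\pi_{12} \in \Pi(\mu, \nu)$, $\pi_{23} \in \Pi(\nu, \omega)$, and $R_{12}, R_{23} \in O(n)$ such that

\[
W_{2,*}(\mu, \nu)^2 = \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| R_{12} x - y \|^2 \, d\pi_{12}(x, y),
\]

\[
W_{2,*}(\nu, \omega)^2 = \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| R_{23} y - z \|^2 \, d\pi_{23}(y, z).
\]

By the Gluing Lemma (see, e.g., [150, Lemma 7.6]), there exists

\[
\pi_{123} \in \mathrm{Prob}(\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n)
\]

such that the marginal of $\pi_{123}$ on the first two factors $\mathbb{R}^n \times \mathbb{R}^n$ is $\pi_{12}$ and the marginal on the last two factors $\mathbb{R}^n \times \mathbb{R}^n$ is $\pi_{23}$. Since the marginal of $\pi_{123}$ on the first and third factors is a coupling of $\mu$ and $\omega$, and $R_{23} R_{12} \in O(n)$, we have

\begin{align*}
W_{2,*}(\mu, \omega) &\le \left( \iiint_{\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n} \| R_{23} R_{12} x - z \|^2 \, d\pi_{123}(x, y, z) \right)^{1/2} \\
&= \left( \iiint_{\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n} \| R_{12} x - R_{23}^{\top} z \|^2 \, d\pi_{123}(x, y, z) \right)^{1/2}.
\end{align*}

Now, for arbitrary $y \in \mathbb{R}^n$,

\[
\| R_{12} x - R_{23}^{\top} z \| \le \| R_{12} x - y \| + \| y - R_{23}^{\top} z \| = \| R_{12} x - y \| + \| R_{23} y - z \|,
\]

thus

\begin{align*}
&\iiint_{\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n} \| R_{12} x - R_{23}^{\top} z \|^2 \, d\pi_{123}(x, y, z) \\
&\quad \le \iiint_{\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n} \left( \| R_{12} x - y \| + \| R_{23} y - z \| \right)^2 d\pi_{123}(x, y, z) \\
&\quad = \iiint \| R_{12} x - y \|^2 \, d\pi_{123}(x, y, z) + \iiint \| R_{23} y - z \|^2 \, d\pi_{123}(x, y, z) \\
&\qquad + 2 \iiint \| R_{12} x - y \| \cdot \| R_{23} y - z \| \, d\pi_{123}(x, y, z) \\
&\quad \overset{(*)}{\le} \iiint \| R_{12} x - y \|^2 \, d\pi_{123}(x, y, z) + \iiint \| R_{23} y - z \|^2 \, d\pi_{123}(x, y, z) \\
&\qquad + 2 \left( \iiint \| R_{12} x - y \|^2 \, d\pi_{123}(x, y, z) \right)^{1/2} \left( \iiint \| R_{23} y - z \|^2 \, d\pi_{123}(x, y, z) \right)^{1/2} \\
&\quad = \left[ \left( \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| R_{12} x - y \|^2 \, d\pi_{12}(x, y) \right)^{1/2} + \left( \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| R_{23} y - z \|^2 \, d\pi_{23}(y, z) \right)^{1/2} \right]^2,
\end{align*}

where all triple integrals are over $\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n$ and $(*)$ follows from the Cauchy-Schwarz Inequality. Consequently,

\begin{align*}
W_{2,*}(\mu, \omega) &\le \left( \iiint_{\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n} \| R_{12} x - R_{23}^{\top} z \|^2 \, d\pi_{123}(x, y, z) \right)^{1/2} \\
&\le \left( \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| R_{12} x - y \|^2 \, d\pi_{12}(x, y) \right)^{1/2} + \left( \iint_{\mathbb{R}^n \times \mathbb{R}^n} \| R_{23} y - z \|^2 \, d\pi_{23}(y, z) \right)^{1/2} \\
&= W_{2,*}(\mu, \nu) + W_{2,*}(\nu, \omega).
\end{align*}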

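Proposition 5. For any $[\mu], [\nu] \in \mathrm{Prob}_*(\mathbb{R}^n)$,

\[
W_{2,*}(\mu, \nu) = 0 \quad \text{if and only if} \quad [\mu] = [\nu].
\]

Proof. By Lemma 3, if $W_{2,*}(\mu, \nu) = 0$, then there exist $\pi_* \in \Pi(\mu, \nu)$ and $R_* \in O(n)$ such that

\[
\iint_{\mathbb{R}^n \times \mathbb{R}^n} \| R_* x - y \|^2 \, d\pi_*(x, y) = 0,
\]

or equivalently

\[
\| R_* x - y \|^2 = 0 \quad \pi_*\text{-a.e.}
\]

Thus

\[
\iint_{\mathbb{R}^n \times \mathbb{R}^n} \| R_* x - y \| \, d\pi_*(x, y) = 0
\]

as well. Since $\| \cdot \|$ is a metric, by [54, Lemma 2.4], this implies

\[
\pi_* = (\mathrm{Id} \times R_*)_{\#} \mu
\]

where $(\cdot)_{\#}$ stands for the push-forward. Since $\pi_* \in \Pi(\mu, \nu)$, this means $\nu = (R_*)_{\#} \mu$, or equivalently $\mu = (R_*)^* \nu$. This proves $[\mu] = [\nu]$.

We summarize the discussion so far into the following theorem.

Theorem 6. $W_{2,*}(\cdot, \cdot)$ is a metric on $\mathrm{Prob}_*(\mathbb{R}^n)$.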

2.2 The Continuous Kantorovich-Procrustes Distance (CKPD)

After establishing the existence of minimizers in Lemma 3, the next natural questions are the uniqueness and regularity of the minimizers. The regularity of optimal transport plans/maps is a difficult issue in general, and we refer interested readers to [30, 29, 31], [150, Chapter 12], and [54] (which is particularly relevant to our setting). In this section, we shall only summarize some partial uniqueness results for the rigid-motion-invariant optimal transport problem.

When $\mu$, $\nu$ are general measures in $\mathrm{Prob}(\mathbb{R}^n)$, the optimal $R \in O(n)$ is completely determined by the SVD (2.1.8) for each fixed $\pi \in \Pi(\mu, \nu)$. However, the SVD may be non-unique, for instance when $\Sigma$ contains two or three identical singular values. This is not surprising: we know the alignment problem has no unique solution in general; even when there is no unique solution, one can still determine the set of all optimal alignments with respect to a fixed $\pi \in \Pi(\mu, \nu)$ (see [119, Observation 3.2.30 and Proposition 3.2.33]). When all singular values in $\Sigma$ are distinct, corresponding columns in $Q$ and $W$ are determined up to a common plus/minus sign, which proves the uniqueness of the optimal $R$.

The uniqueness of $\pi \in \Pi(\mu, \nu)$ with respect to a fixed rigid motion $R \in O(n)$ is far more involved. When $\mu$ or $\nu$ is absolutely continuous with respect to $\mathcal{H}^n$, the $n$-dimensional Hausdorff measure on $\mathbb{R}^n$, Brenier proved the uniqueness of the optimizer to the optimal transport problem with quadratic distance cost [26]; the same conclusions were subsequently extended to the case where $\mu$ or $\nu$ vanishes merely on all Lipschitz hypersurfaces (i.e., surfaces of codimension one) in $\mathbb{R}^n$ [94, 55].

In our analysis of anatomical surfaces, we are interested in applying the metric $W_{2,*}$ to a subclass of $\mathrm{Prob}_*(\mathbb{R}^3)$ that "represents" congruence classes of (smooth) surfaces. Specifically, we identify (up to a surface area normalization) a surface $S$ with its surface area measure, denoted as $d\mathrm{vol}_S$, which is a probability measure absolutely continuous with respect to the 2-dimensional Hausdorff measure $\mathcal{H}^2$ with a uniform density. We denote this subclass as

$$
\mathcal{S} := \left\{ [\mu] \in \mathrm{Prob}_*(\mathbb{R}^3) \;\middle|\; \mu = d\mathrm{vol}_S \text{ for a smooth surface } S \subset \mathbb{R}^3 \right\}. \tag{2.2.1}
$$

Definition 7 (The Continuous Kantorovich-Procrustes Distance). We call the restriction of $W_{2,*}$ on $\mathcal{S}$ the Continuous Kantorovich-Procrustes Distance (CKPD).

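Formally, for surfaces $S_1, S_2$,

$$
D_{cKP}(S_1, S_2) = \inf_{\pi \in \Pi(\mu_1, \mu_2)} \; \inf_{\substack{R \in O(3) \\ t \in \mathbb{R}^3}} \left( \iint_{\mathbb{R}^3 \times \mathbb{R}^3} \|Rx + t - y\|^2 \, d\pi(x,y) \right)^{1/2}, \tag{2.2.2}
$$

where $\mu_1 = d\mathrm{vol}_{S_1}$, $\mu_2 = d\mathrm{vol}_{S_2}$ are the surface measures on $S_1, S_2$, respectively.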

As discussed in Section 1.3, when the shape distance (2.2.2) is computed in practice, we would like to obtain the minimizer $(\pi_*, R_*) \in \Pi(\mu_1, \mu_2) \times O(3)$ as well. Though Lemma 3 guarantees the existence of such a minimizing pair, we do not know if it is uniquely determined. The obstruction to directly deducing the uniqueness from Brenier's theory (see, e.g., [150, Chapter 3]) comes from the fact that the measures in $\mathcal{S}$ do not vanish on all Lipschitz hypersurfaces: they are precisely supported on such hypersurfaces. Fortunately, some work in this direction has been done by Gangbo and McCann [54] and Ahmad [1], though not in the rigid-motion-invariance setting. Based on these results, we are able to say a bit more about $W_{2,*}$ on $\mathcal{S}$ than on the full $\mathrm{Prob}_*(\mathbb{R}^3)$. For simplicity of notation, throughout this section we denote

$$
\mu_1 := d\mathrm{vol}_{S_1}, \qquad \mu_2 := d\mathrm{vol}_{S_2}
$$

for smooth surfaces $S_1, S_2$ in $\mathbb{R}^3$. When $S_1, S_2$ are compact, [54] deduced from the Kantorovich duality that all minimizers of the optimal transport problem

$$
\inf_{\pi \in \Pi(\mu_1, \mu_2)} \iint_{\mathbb{R}^n \times \mathbb{R}^n} \|x - y\|^2 \, d\pi(x,y)
$$

are supported on the subdifferential of a common convex function $\psi : \mathbb{R}^n \to \mathbb{R}$. This conclusion also holds for $W_{2,*}$ on $\mathcal{S}$, since the minimizer $\pi_*$ is no different from the optimal measure that attains the infimum of the optimal transport problem

$$
\inf_{\pi \in \Pi(\mu_1, \mu_2)} \iint_{\mathbb{R}^n \times \mathbb{R}^n} \|R_* x - y\|^2 \, d\pi(x,y), \tag{2.2.3}
$$

where $R_* \in O(n)$ is the rigid motion such that $(\pi_*, R_*) \in \Pi(\mu_1, \mu_2) \times O(n)$ is a minimizer for the rigid-motion-invariant optimal transport problem

$$
\inf_{(\pi, R) \in \Pi(\mu_1, \mu_2) \times O(n)} \iint_{\mathbb{R}^n \times \mathbb{R}^n} \|Rx - y\|^2 \, d\pi(x,y). \tag{2.2.4}
$$

Under the assumption that one of $S_1, S_2$ is the boundary of a strictly convex and bounded domain of $\mathbb{R}^n$, Gangbo and McCann proved in [54] that the minimizer to (2.2.3) is unique. The discussion so far is summarized in the following proposition:

Proposition 8 (Partial Uniqueness). Suppose $S_1, S_2$ are smooth compact surfaces in $\mathbb{R}^3$ centered at the origin. Then

(i) For any $\pi \in \Pi(\mu_1, \mu_2)$, if the three singular values of $\mathrm{Cov}(\pi; \mu_1, \mu_2)$ (see (2.1.9)) are distinct, then there exists a unique $R_* \in O(3)$ such that

$$
\inf_{R \in O(3)} \iint_{\mathbb{R}^3 \times \mathbb{R}^3} \|Rx - y\|^2 \, d\pi(x,y) = \iint_{\mathbb{R}^3 \times \mathbb{R}^3} \|R_* x - y\|^2 \, d\pi(x,y);
$$

(ii) For any $R \in O(3)$, if $S_1$ or $S_2$ is the boundary of a bounded, strictly convex domain of $\mathbb{R}^3$, then there exists a unique $\pi_* \in \Pi(\mu_1, \mu_2)$ such that

$$
\inf_{\pi \in \Pi(\mu_1, \mu_2)} \iint_{\mathbb{R}^3 \times \mathbb{R}^3} \|Rx - y\|^2 \, d\pi(x,y) = \iint_{\mathbb{R}^3 \times \mathbb{R}^3} \|Rx - y\|^2 \, d\pi_*(x,y).
$$

Conditions in Proposition 8(i) and (ii) are necessary but insufficient for the uniqueness of a minimizing pair $(\pi_*, R_*) \in \Pi(\mu_1, \mu_2) \times O(3)$ for (2.2.4). In fact, this "joint uniqueness" requires the uniqueness of the maximizer to the problem

$$
\max_{\pi \in \Pi(\mu_1, \mu_2)} \left\| \mathrm{Cov}(\pi; \mu_1, \mu_2) \right\|_*,
$$

which is a non-convex optimization problem: $\pi \mapsto \mathrm{Cov}(\pi; \mu_1, \mu_2)$ is linear, and $\Pi(\mu_1, \mu_2)$ and the nuclear norm are both convex, so a convex function is being maximized over a convex set. Since $\Pi(\mu_1, \mu_2)$ is convex and (weak-$*$) compact, it follows from the Krein-Milman theorem that the maximum will be attained at an extreme point of $\Pi(\mu_1, \mu_2)$; however, the complete characterization of extremality among doubly stochastic measures, a question first raised by Birkhoff [17, #111] in 1948, remains an open problem (see, e.g., [67] for a necessary and nearly sufficient condition for this extremality). Over the past decades, it has been discovered that the uniqueness of the minimizer to the Monge-Kantorovich problem depends on the Morse structure of the cost function and on the supports of the marginal measures, and some sufficient conditions for the uniqueness have been formulated (see [2, 114, 84, 93], [150, Chapter 12], and the references therein). These methods, however, do not adapt to (2.2.4) directly, since the equivalent notion of "cost function" in the rigid-motion-invariant setting is essentially non-local. In other words, for (2.2.4) the cost to transport a unit amount of mass between two points in $\mathbb{R}^3$ depends on the global geometry of $S_1$ and $S_2$.

2.3 Relation with Some Other Shape Distances

As a distance between surfaces, the Continuous Kantorovich-Procrustes Distance (CKPD) is closely related to existing shape distances. In particular, it provides a unified framework for analyzing the Continuous Procrustes Distance (CPD) [3], Generalized Procrustes Distance (GPD) [119], and Soft-assign Procrustes Matching Algorithm (SPMA) [122].

Since computing CPD involves a minimization problem on the area-preserving maps (2.0.1), it is natural to interpret it as a Monge problem in the sense of optimal transport. Accordingly, CKPD is a relaxation of CPD, in the same sense as the Kantorovich problem relaxes the Monge problem in optimal transport: if the minimizing measure $\pi_*$ for CKPD is supported on the graph of a smooth function, then it gives rise to a minimizing area-preserving map for CPD. We intend to prove the existence and uniqueness of the minimizer of CPD by studying the same problems for CKPD, but the complete resolution of this problem is left for future work.

The link between CKPD and GPD or SPMA stems from discretization. In our application, smooth surfaces $S_1, S_2$ are represented by triangular meshes, typically with different numbers of vertices and faces. Suppose the vertices of $S_1$ are stored in a matrix $X$ of dimension $3 \times m_1$, in which each column $X_j$ stands for the coordinates in $\mathbb{R}^3$ of the $j$-th vertex on the discretized $S_1$, and store vertices of $S_2$ similarly in $Y \in \mathbb{R}^{3 \times m_2}$. Moreover, without loss of generality, we may assume that $S_1, S_2$ are centered at the origin and scaled to have unit surface area. The surface measures $\mu_1, \mu_2$ are also discretized as

$$
\mu_1 = \sum_{i=1}^{m_1} \alpha_i \delta_{X_i}, \qquad \mu_2 = \sum_{j=1}^{m_2} \beta_j \delta_{Y_j},
$$

where $\delta_{(\cdot)}$ stands for the Dirac delta function, and coefficients $\{\alpha_i \mid 1 \le i \le m_1\}$, $\{\beta_j \mid 1 \le j \le m_2\}$ satisfy

$$
\sum_{i=1}^{m_1} \alpha_i = 1, \quad 0 \le \alpha_i \le 1; \qquad \sum_{j=1}^{m_2} \beta_j = 1, \quad 0 \le \beta_j \le 1.
$$
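The text does not specify how the coefficients $\alpha_i$, $\beta_j$ are obtained from a mesh. As an illustration only, the sketch below assumes one common choice: each vertex receives one third of the area of every triangle incident to it, and the weights are then normalized to sum to one. The function name `vertex_area_weights` and the face-array layout are hypothetical and not taken from this thesis.

```python
import numpy as np

def vertex_area_weights(vertices, faces):
    """Barycentric area weights for a triangular mesh (an assumed discretization).

    vertices : 3 x m array, column j holds the coordinates of vertex j
    faces    : k x 3 integer array of vertex indices, one row per triangle
    Returns a length-m vector of non-negative weights that sum to one.
    """
    P = vertices.T                                   # m x 3, one row per vertex
    a, b, c = P[faces[:, 0]], P[faces[:, 1]], P[faces[:, 2]]
    tri_area = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    w = np.zeros(P.shape[0])
    for corner in range(3):
        # np.add.at accumulates correctly when a vertex index repeats
        np.add.at(w, faces[:, corner], tri_area / 3.0)
    return w / w.sum()                               # normalize to a probability vector
```

With weights of this kind, the discretized measure assigns more mass to vertices in coarsely triangulated regions, so that the total mass approximates the normalized surface area measure rather than the vertex count.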

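In this discrete setting, CKPD between $S_1, S_2$ can be computed from

$$
\min_{B \in \Pi(\mu_1, \mu_2)} \; \min_{R \in O(3)} \; \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \|R X_i - Y_j\|^2 B_{ij}, \tag{2.3.1}
$$

where $X_i$ is the $i$-th column of $X$, $Y_j$ is the $j$-th column of $Y$, and $\Pi(\mu_1, \mu_2)$ is the set of $m_1 \times m_2$ matrices $B$ satisfying

$$
\sum_{j=1}^{m_2} B_{ij} = \alpha_i, \qquad \sum_{i=1}^{m_1} B_{ij} = \beta_j, \qquad B_{ij} \ge 0.
$$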

Minimizing (2.3.1) is equivalent to maximizing

$$
\max_{B \in \Pi(\mu_1, \mu_2)} \; \max_{R \in O(3)} \; \mathrm{Trace}\left( R X B Y^{\top} \right) = \max_{B \in \Pi(\mu_1, \mu_2)} \left\| X B Y^{\top} \right\|_*, \tag{2.3.2}
$$

where $\|\cdot\|_*$ is the matrix nuclear norm. Though fast algorithms for minimizing the nuclear norm exist, maximizing the nuclear norm is a non-convex optimization problem and is known to be hard to solve. One way to minimize (2.3.1) is to alternately update one of $B$, $R$ while fixing the other, which typically leads to a local rather than the global minimizer of the functional; this formulation together with the alternating minimization scheme is the backbone of SPMA [122]. In the special case that $X, Y$ are of the same size (i.e., $m_1 = m_2$) and $\mu_1, \mu_2$ are uniformly distributed over the point clouds $X, Y$, the maximizing $B$ is a permutation matrix (up to a scalar multiple, by the Birkhoff-von Neumann theorem), in which case (2.3.1) is equivalent to the generalized Procrustes distance and the globally optimal alignment $R_*$ can be found through searching over all Principal Component Alignments [119, Observation 3.2.27].

Finally, switching focus from smooth maps between surfaces to couplings of surface measures is reminiscent of recent trends in the study of soft maps [141, 140], the goal of which is to resolve local and global geometric ambiguities for non-isometric shape matching. In yet another approach, following a similar idea but replacing the triangular mesh with the space of smooth functions on it, one can study the "optimal transport" between "measures" on function spaces, thus defining functional maps [113]. In that context, function spaces are represented as linear combinations of eigenvectors of the discrete Laplacian on the triangular mesh, in contrast to our representation using Dirac delta functions at each mesh vertex. We note that our relaxation formulation differs significantly from soft and functional maps mathematically, since the infimum over all rigid motions completely breaks the linearity of the functional by making the transport cost function non-local.
In practice, we observed that the extra degrees of freedom in probability couplings provide the information and structure needed to apply HDM (to be detailed in the remainder of this thesis) to analyze collections of anatomical shapes; as we shall see, the impact of the noise introduced by relaxing maps to transport plans can be removed by the diffusion.
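For readers who want to experiment with the discrete problem (2.3.1), the following minimal sketch illustrates the inner step over $R$ for a fixed coupling $B$. Expanding the square gives $\sum_{i,j} B_{ij}\|RX_i - Y_j\|^2 = \sum_i \alpha_i \|X_i\|^2 + \sum_j \beta_j \|Y_j\|^2 - 2\,\mathrm{Trace}(RXBY^{\top})$, so the best $R \in O(3)$ maximizes the trace term, and an SVD of $XBY^{\top}$ yields both a maximizer and the nuclear norm appearing in (2.3.2). The function names are hypothetical; this is one half of an alternating scheme under these assumptions, not the SPMA implementation of [122].

```python
import numpy as np

def transport_cost(X, Y, B, R):
    """Objective of (2.3.1) for a fixed coupling B and orthogonal matrix R.

    X : 3 x m1 array, Y : 3 x m2 array, B : m1 x m2 coupling, R : 3 x 3.
    """
    diff = (R @ X)[:, :, None] - Y[:, None, :]       # 3 x m1 x m2
    sq_dist = np.sum(diff ** 2, axis=0)              # ||R X_i - Y_j||^2
    return float(np.sum(sq_dist * B))

def optimal_orthogonal(X, Y, B):
    """Maximize Trace(R X B Y^T) over R in O(3) via the SVD of M = X B Y^T.

    Returns a maximizer R and the nuclear norm of M (the optimal trace value).
    """
    M = X @ B @ Y.T                                  # 3 x 3 matrix
    U, s, Vt = np.linalg.svd(M)                      # M = U diag(s) Vt
    R = Vt.T @ U.T                                   # Trace(R M) = sum(s)
    return R, float(s.sum())
```

Since the search is over $O(3)$ rather than $SO(3)$, the returned matrix may be a reflection; restricting to rotations would require an additional sign correction that is not shown here.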

3

Hypoelliptic Diffusion Maps on Tangent Bundles

3.1 Introduction

Acquiring complex, massive, and often high-dimensional data sets has become a common practice in many fields of natural and social sciences; while inspiring and stimulating, these data sets can be challenging to analyze or understand efficiently. To gain insight despite the volume and dimension of the data, methods from a wide range of scientific fields have been brought into the picture, rooted in statistical inference, machine learning, and signal processing, to mention just a few. Among the exploding research interests and directions in data science, the relation between the graph Laplacian [37] and the manifold Laplacian [126] has emerged as a useful guiding principle. Specifically, the field of non-linear dimensionality reduction has witnessed the emergence of a variety of Laplacian-based techniques, such as Locally Linear Embedding (LLE) [127], ISOMAP [147], Hessian Eigenmaps [46], Local Tangent Space Alignment (LTSA) [160], Diffusion Maps [41], Orientable Diffusion Maps (ODM) [135], Vector Diffusion Maps (VDM) [133], and Schrödinger Eigenmaps [149]. The general practice of these methods is to treat each object in the data set (these objects could be images, texts, shapes, etc.) as an abstract node or vertex, and form a similarity graph by connecting each pair of similar nodes with an edge, weighted by their similarity score. Built with varying flexibility, these methods provide valuable tools for organizing complex networks and data sets by "learning" the global geometry from the local connectivity of weighted graphs.

The Diffusion Map (DM) framework [41, 87, 39, 40, 42, 135, 133] proposes a probabilistic interpretation for graph-Laplacian-based dimensionality reduction algorithms. Under the assumption that the discrete graph is appropriately sampled from a smooth manifold, it assigns transition probabilities from a vertex to each of its neighbors (vertices connected to it) according to the edge weights, thus defining a graph random walk, the continuous limit of which is a diffusion process [153, 49] over the underlying manifold. The eigenvalues and eigenvectors of the graph Laplacian, which converge to those of the manifold Laplacian under appropriate assumptions [12, 13], then reveal intrinsic information about the smooth manifold. More precisely, [14] proves that these eigenvectors embed the manifold into an infinite dimensional $\ell^2$ space, in such a way that the diffusion distance [41] (rather than the geodesic distance) is preserved. Appropriate truncation of these sequences leads to an embedding of the smooth manifold into a finite dimensional Euclidean space, with small metric distortion.

Under the manifold assumption, [135, 133] recently observed that estimating random walks and diffusion processes on structures associated with the original manifold (as opposed to estimating diffusion on the manifold itself) makes it possible to handle a wider range of tasks, or to obtain improved precision or robustness for tasks considered earlier.
For instance, [135] constructed a random walk on the orientation bundle [24, §I.7] associated with the manifold, and translated the detection of orientability into an eigenvector problem, the solution of which reveals the existence of a global section on the orientation bundle; [133] introduced a random walk on the tangent bundle associated with the manifold, and proposed an algorithm that embeds the manifold into an $\ell^2$ space using eigen-vector-fields instead of eigenvectors (and thus the name Vector Diffusion Maps (VDM)). In [156] the VDM approach is used, analogously to [14], to embed the manifold into a finite dimensional Euclidean space. Although the VDM embedding does not reduce the dimensionality as much as standard diffusion embedding methods, it benefits from improved robustness to noise, as illustrated by the analysis of some notoriously noisy data sets [75, 76].

Both [135] and [133] incorporate additional structures into the graph Laplacian framework: in [133] this is an extra orthogonal transformation (estimated from local tangent planes) attached to each weighted edge in the graph; in [135] the edge weights are overwritten with signs determined by this orthogonal transformation. These methods are successful because they incorporate more local geometry in the path to dimensionality reduction, by estimating tangent planes. In fact, the advantage of utilizing local geometric information from the tangent bundle had been noticed earlier: Figure 3.1 shows a simple example, borrowed from [87, §2.6.1], where the original data set (shown in Figure 3.1(a)) is a Descartes Folium with self-intersection at the origin, parametrized by

$$
x(\theta) = \frac{3 \tan\theta}{1 + \tan^3\theta}, \qquad y(\theta) = \frac{3 \tan^2\theta}{1 + \tan^3\theta}, \qquad \theta \in \left[ -\frac{\pi}{2}, \frac{\pi}{2} \right].
$$

This curve is the projection onto a plane of a helix in $\mathbb{R}^3$. A standard isotropic random walker on the planar curve would get lost at the intersection, even when sober, as shown in Figure 3.1(b), where the embedding completely mixes blue and red tails beyond the crossing point. In contrast, incorporating tangent information into local similarity scores yields a much clearer embedding back to $\mathbb{R}^3$ (see Figure 3.1(c)), which blows up (in the sense of complex algebraic geometry [61, p. 182]) the self-intersecting curve at its singularity and unravels its hidden geometry. Specifically, the similarity measure used in the modified diffusion map between any pair of points

$(x(\theta_1), y(\theta_1))$ and $(x(\theta_2), y(\theta_2))$ on the curve is

$$
d^2\big( (x(\theta_1), y(\theta_1)), (x(\theta_2), y(\theta_2)) \big) = \left\| (x(\theta_1), y(\theta_1)) - (x(\theta_2), y(\theta_2)) \right\|_2^2 + \mu \left\| \frac{(x'(\theta_1), y'(\theta_1))}{\|(x'(\theta_1), y'(\theta_1))\|} - \frac{(x'(\theta_2), y'(\theta_2))}{\|(x'(\theta_2), y'(\theta_2))\|} \right\|_2^2,
$$

where $\mu > 0$ is a parameter that balances the two contributions to the dissimilarity score in consideration. (Two distinct tangent vectors exist at the self-intersection, but they each belong to a distinct point in the parametrization.)

Figure 3.1: A diffusion map incorporating local geometric information

It is possible to use the methodology of ODM and VDM to tackle similar problems in much broader contexts, where the local geometric information can be of a different type than information about tangent planes. Indeed, for many data sets, a single data point has abundant structural details; typically graph-Laplacian-based methods begin by "abstracting away" these details, encoding only pairwise similarities. In some circumstances, the hidden details (pixels in an image, vertices/faces on a triangular mesh, key words and transition sentences in a text, etc.) may themselves be of interest. For example, in the geometry processing problem of analyzing large collections of 3D shapes, it is desirable to enable user exploration of shape variations across the collection. In this case, abstracting each single shape as a graph node completely ignores the spatial

configuration of an individual shape. On the other hand, even when sticking to pairwise similarity scores significantly simplifies the data manipulation, the best way to score similarity is not always clear. In practice, the similarity measure is often dictated by practical heuristics, which may be misguided for incompletely understood data. In addition, there are situations for which it can be proved that no finite-dimensional representation will do justice to the data. (For instance, in topological data analysis of shapes and surfaces, the only known sufficient statistic (other than the data set itself) is the set of all persistence diagrams taken from all directions [148].)

In this paper, we propose the Hypoelliptic Diffusion Map (HDM), a new graph-Laplacian-based framework for analyzing complex data sets. This method focuses on data sets in which pairwise similarity between data points is not sufficiently informative, but each single data point carries sophisticated individual structure. In practice, this type of data set often arises when the data acquired is too noisy, has huge degrees of freedom, or contains unordered features (as opposed to sequential data). An example that has all these characteristics is a data set in which each data point is a two-dimensional surface in $\mathbb{R}^3$, represented either by a triangular mesh or a collection of persistence diagrams. In many cases, computing pairwise similarity within such data sets requires minimizing some functional over the space of admissible pairwise correspondences, and the similarity score between two surfaces is achieved by a certain optimal correspondence map between the surfaces. It is conceivable that the optimal correspondence contains substantial information, missing from the condensed similarity score. The HDM framework is our first attempt at mining this hidden information from correspondences.

Like ODM and VDM, HDM generalizes the DM framework, but it takes an essentially different path.
We are most interested in the scenario in which the individual structures themselves are also manifolds. In order to take them into consideration, we first augment the manifold underlying DM, denoted as M, with extra dimensions.

To each point $x$ on $M$, this augmentation attaches the individual manifold at $x$, denoted as $F_x$; we assume that around each $x \in M$ there exists an open neighborhood $U$ such that on $U$ the augmented structure "looks like" a product of $U$ with a "universal template" manifold $F$. Intuitively, $M$ plays the role of a "parametrization" for all the $F_x$. Of course, the existence of such a universal template makes sense only if the $F_x$, $x \in M$, are compatible in some appropriate sense (each $F_x$ should at least be diffeomorphic to $F$; we shall add more restrictions below); however, such compatibility is not uncommon for many data sets of interest, as we shall see in Section 3.2. This picture of parametrizing a family of manifolds with an underlying manifold is reminiscent of the modern differential geometric concept of a fibre bundle, which played an important role in the development of geometry and topology in the past century. Therefore, we shall refer to this geometric object as the underlying fibre bundle of the data set. Adopting the terminology from differential geometry, we call $M$ the base manifold, the universal template manifold $F$ the fibre, and each $F_x$ a fibre at $x$.

The probabilistic interpretation of HDM is a random walk on the fibre bundle. In one step, the transition occurs either between points on adjacent but distinct fibres, or within the same fibre. Since the fibre bundle is itself a manifold (referred to as the total manifold, denoted as $E$), this looks so far no different from a direct application of DM, only on an augmented geometric object. However, HDM also incorporates the pairwise correspondences of data points in the fibre bundle formulation, by requiring transitions between distinct fibres to satisfy certain directional constraints imposed by the correspondences. The resulting random walk is no longer a direct analogy of its standard counterpart on the total manifold, but rather a "lift" of a random walk on the base manifold $M$. Under mild assumptions, its continuous limit is a diffusion process on the total manifold $E$, infinitesimally generated by a hypoelliptic differential operator [68] (thus the name HDM). We can then embed the whole fibre bundle into a Euclidean space using the eigenvectors of this hypoelliptic differential operator; discretely this corresponds to solving for the eigenvectors of our new graph Laplacian, referred to as a hypoelliptic Laplacian of the graph. It turns out that, by varying a couple of parameters in its construction, the family of graph hypoelliptic Laplacians contains the discrete analogue of several important and informative partial differential operators on the fibre bundle, relating the geometry of the base manifold with that of the total manifold. Our numerical experiments revealed interesting phenomena when embedding the fibre bundle using eigenvectors of these new graph Laplacians.

Though the HDM framework applies to general fibre bundles, the focus of this paper is the study of tangent and unit tangent bundles of Riemannian manifolds; in a sequel paper we shall study more general fibre bundles. Note that even though the fibre bundles in this paper are the same as for VDM, HDM for tangent bundles nevertheless differs from VDM; we shall come back to this below.

This paper is organized as follows: Section 3.2 sets up notations and terminology, and discusses the meaning of the fibre bundle assumption; Section 3.3 describes the formulation of HDM in detail; Section 3.4 characterizes the hypoelliptic graph Laplacians on tangent and unit tangent bundles, and studies their pointwise convergence from finite samples; some numerical experiments are shown in Section 3.5; finally we conclude with a brief discussion and propose potentially interesting directions for future work. In Appendix A we include preliminaries on the geometry of tangent bundles and (as their subbundles) unit tangent bundles.
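As a point of reference for the constructions in this chapter, the standard diffusion map recalled in this introduction, together with the tangent-augmented dissimilarity used in the Folium example, can be sketched as follows. This is a minimal dense-matrix illustration under stated assumptions: a Gaussian kernel with bandwidth `epsilon`, row-normalization into a random walk, and no density correction. The function names, the kernel choice, and the parameters `n_coords` and `t` are hypothetical and are not the HDM construction of Section 3.3.

```python
import numpy as np

def pairwise_sqdist(A):
    """Squared Euclidean distances between the rows of A (N x d)."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def tangent_augmented_sqdist(points, tangents, mu):
    """d^2 = ||p_i - p_j||^2 + mu * ||tau_i - tau_j||^2 with unit tangents tau."""
    unit = tangents / np.linalg.norm(tangents, axis=1, keepdims=True)
    return pairwise_sqdist(points) + mu * pairwise_sqdist(unit)

def diffusion_map(sqdist, epsilon, n_coords=2, t=1):
    """Diffusion map from a matrix of squared dissimilarities.

    Builds W = exp(-d^2 / epsilon) and the random walk P = D^{-1} W, then
    diagonalizes the symmetric conjugate S = D^{-1/2} W D^{-1/2}.
    Returns the eigenvalues (descending), the right eigenvectors of P,
    and the first n_coords non-trivial coordinates scaled by eigenvalue^t.
    """
    W = np.exp(-sqdist / epsilon)
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]                  # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(deg)[:, None]              # right eigenvectors of P
    coords = psi[:, 1:n_coords + 1] * evals[1:n_coords + 1] ** t
    return evals, psi, coords
```

Passing `pairwise_sqdist(points)` gives the isotropic walk, while passing `tangent_augmented_sqdist(points, tangents, mu)` gives the modified walk; comparing the two embeddings on a sampled self-intersecting curve is a simple way to explore the effect illustrated in Figure 3.1.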

3.2 Motivating The Fibre Bundle Assumption

For high-dimensional data generated by some implicit process with relatively fewer degrees of freedom, it is often reasonable to assume that the data lie approximately on a manifold of much lower dimension than the ambient space. In the literature on semi-supervised learning, this is often referred to as the manifold assumption [11, 161]. The goal of semi-supervised learning is to build a classifier based on a partially labeled training set; learning the underlying manifold structure of high-dimensional data is often the first step in this practice, not only because it reduces the dimensionality, but also because it simplifies the data and exposes the structure.

Our fibre bundle assumption is a generalization of the manifold assumption. In differential geometry, a fibre bundle is itself a manifold, structured as a family of related manifolds parametrized by another underlying manifold. Following [142], a fibre bundle consists of the following data¹:

1. the total manifold E;

2. the base manifold M;

3. the bundle projection π, a surjective smooth map from E onto M;

4. the fibre manifold $F$, satisfying

(a) for any $x \in M$, $\pi^{-1}(x)$ is diffeomorphic to $F$;

(b) for any $x \in M$, there exists an open neighborhood $U$ of $x$ in $M$ and a diffeomorphism $\phi_U$ from $\pi^{-1}(U)$ to $U \times F$;

5. the structure group $G$, a topological transformation group that acts effectively² on $F$, satisfying

(a) for any $x \in M$ and two open neighborhoods $U$ and $V$ that both satisfy (4b), the diffeomorphism on $F$, defined as "freezing the first component as $x$", obtained from $\phi_V \circ \phi_U^{-1}$ as

$$
g^x_{UV} := \left[ \phi_V \circ \phi_U^{-1} \right](x, \cdot) : F \to F,
$$

is an element $g^x_{UV}$ in $G$, and this correspondence

$$
x \mapsto g^x_{UV}
$$

is continuous with respect to the topology on $G$;

(b) for any $x \in M$ and three open neighborhoods $U, V, W$ that all satisfy (4b),

$$
g^x_{UU} = \text{the identity element } e \text{ of } G
$$

and

$$
g^x_{UV} \circ g^x_{VW} = g^x_{UW}.
$$

¹ Strictly speaking, the definition given here is that of a coordinate bundle [142, §2.3]; fibre bundles are equivalence classes of coordinate bundles. This distinction is less crucial since in the HDM framework we describe the structure of a fibre bundle using coordinates. This is similar to how manifold learning uses the notion of a manifold.

² $G$ acts effectively on $F$ if $g(f) = f$ for all $f \in F$ implies $g = e$, the identity element of $G$.
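To make the role of the structure group in item 5 more concrete, the sketch below works through one standard example that is not taken from this thesis: the tangent bundle of the unit sphere $S^2$, trivialized over the two stereographic charts (projection from the north pole and from the south pole). Under this assumption the fibre is $\mathbb{R}^2$, the chart change is the inversion $u \mapsto u / \|u\|^2$, and the transition function at a point is the Jacobian of that inversion, an element of $GL(2, \mathbb{R})$. The code checks numerically that going from one chart to the other and back gives the identity, which is the special case $W = U$ of item 5(b). All function names are hypothetical.

```python
import numpy as np

def inversion(u):
    """Change of stereographic coordinates on S^2 minus the poles: u -> u / |u|^2."""
    return u / np.dot(u, u)

def inversion_jacobian(u):
    """Jacobian of the inversion at u: (|u|^2 I - 2 u u^T) / |u|^4.

    Interpreted as the transition function of the tangent bundle of S^2
    between the two stereographic trivializations (an illustrative assumption).
    """
    r2 = np.dot(u, u)
    return (r2 * np.eye(2) - 2.0 * np.outer(u, u)) / r2 ** 2

def round_trip(u):
    """Compose the transition function north -> south with south -> north."""
    v = inversion(u)                                 # same point in the other chart
    return inversion_jacobian(v) @ inversion_jacobian(u)
```

For any `u` away from the origin, `round_trip(u)` should agree with the 2 x 2 identity matrix up to rounding, reflecting that the local product structures are patched together consistently on the overlap of the two charts.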

The diffeomorphisms in (4b) are also known as local trivializations. For each $x$ on the base manifold $M$, it is conventional to denote the fibre over $x$ as $F_x := \pi^{-1}(x)$. The fibre bundle assumption can now be stated as follows:

Assumption 9 (The Fibre Bundle Assumption). The data lie approximately on a fibre bundle, in the sense that each data object is a subset of a fibre over some point on a base manifold.

Note that in the special case where the fibre manifold $F$ is a single point, the fibre bundle is diffeomorphic to its base manifold, and our fibre bundle assumption reduces to the manifold assumption.

The definition of a fibre bundle is technical, especially for the part involving the structure group $G$. The key point is that a fibre bundle is locally a product manifold, and these local pieces are carefully patched together so that the product structures remain consistent when they intersect. Product manifolds are thus fibre bundles by definition, but the concept of a fibre bundle becomes interesting only when the global geometry gets twisted and exposes non-trivial topology. The Möbius band, the Klein bottle, and the Hopf fibration are standard illustrations of this; see e.g. [142, §1].

At first glance, the fibre bundle assumption imposes strong restrictions on the data set structure. However, when understanding the structure of individual data points is as interesting as understanding the structure of the data set in the large, the framework based on the manifold assumption becomes insufficient. For instance, in geometric morphometrics [159], the data sets of interest are collections of shapes, i.e., two-dimensional smooth surfaces in $\mathbb{R}^3$, and the central problem is to infer species and other biological information from shape variations. Under the assumption that these variations are governed by relatively few degrees of freedom, it is possible to learn manifold coordinates for each shape in the collection (for instance, by applying the diffusion map to the shape collection based on some pairwise shape distance such as those in [122, 96, 57, 102, 89, 91, 3]). Yet it is difficult to infer shape variation from such coordinates, since the geometry of each individual shape is "abstracted away", collapsing each shape to a single point. To add interpretability to the manifold learning framework in this circumstance, it is a natural idea to learn different coordinates for distinct points on the same shape, and simultaneously keep similar the coordinates of points belonging to different shapes that are developmentally or functionally equivalent. This geometric intuition is embodied by the fibre bundle assumption. From this point of view, the fibre bundle assumption is but an extra level of indirection (borrowing a term from Andrew Koenig's "fundamental theorem of software engineering") for the manifold assumption.
The example of shape analysis in geometric morphometrics is particularly interesting, because it contains another source of ideas that naturally models the data set as a fibre bundle: the global registration problem. Geometric morphometricians typically select equal numbers of homologous landmarks on each shape in a globally consistent manner, then reduce the analysis to investigation of the shape space [78, 79] of these landmark points. Along these lines, tools like the Generalized Procrustes Analysis (GPA) [59, 48, 80, 60] have been developed in statistical shape analysis [47], and software products [111, 116] made available. (Recent progress in this area [108, 138, 107, 6, 34] relates semidefinite programming with the little Grothendieck problem.) A common basis for these GPA-based methods is that the homology of landmark points depends on human input. Manually placing landmarks on each shape among a large collection is a tedious task, and the skill to perform it correctly typically requires years of professional training. Recently, automated methods have been proposed in this field, based on efficient and robust pairwise surface comparison algorithms [25, 89, 91, 3, 119, 120]. However, biological morphologists typically do not compare surfaces merely pairwise: in practice, an experienced morphologist uses a large database of anatomical structures to improve the consistency and accuracy of visual interpretation of biological features. This consistency cannot be trivially achieved by any geometric algorithm that uses only pairwise comparison information, even when each pairwise comparison is of remarkably high quality. This is shown in Figure 3.2, where a small set of landmarks is propagated from a Microcebus molar to a corresponding Lepilemur molar, along three different paths. Though all surfaces A through E are fairly similar to each other (and hence the algorithm in [3] guarantees high quality pairwise correspondences), direct propagation of landmarks via path A→B gives a different result from A→C→B or A→D→E→B. Using the collection {A, B, C, D, E} leads to a more accurate correspondence between A and B than an isolated A-B comparison would.

Figure 3.2: Non-triviality in analyzing a collection of teeth (cf. [25])

In the fibre bundle framework, the inherent inconsistency for pairwise-comparison-based global registration can be modeled using the concept of the holonomy of connections. In the sense of Ehresmann [50], a connection is a choice of splitting the short exact sequence

$$
0 \to VE \to TE \to \pi^* TM \to 0. \tag{3.2.1}
$$

In this short exact sequence, $TE$ is the tangent bundle of the total manifold $E$; $VE$ is the vertical tangent bundle of $E$, a subbundle of $TE$ spanned by vectors that are tangent not only to $E$ at some point $u \in E$, but also to the fibre $F_{\pi(u)}$ over $\pi(u) \in M$; $\pi^* TM$ is the pullback bundle of $TM$ to $E$. The practical meaning of this definition is as follows: since the fibre $F_x$ over $x \in M$ carries a manifold structure of its own, the notion of vectors that are "tangent to the fibre" is well-defined; they correspond to $VE$. The short exact sequence (3.2.1) tells us that the quotient bundle of $TE$ by $VE$ is isomorphic to $\pi^* TM$, but there is no canonical way to choose a "horizontal tangent bundle" $HE$ for $TE$ such that

$$
HE \oplus VE = TE.
$$

The definition of an Ehresmann connection is just the choice of such a subbundle $HE$. More concretely, a connection specifies for each point $u \in E$ a subspace $H_u E$ of $T_u E$, such that $H_u E$ together with all vertical tangent vectors at $u$ spans the entire tangent space $T_u E$ at $u$. Of course, the choice of subspaces $H_u E$ should depend smoothly on $u$. We shall call vectors in $H_u E$ horizontal, while keeping in mind that this concept builds upon the connection.

As long as a connection is given on a fibre bundle, tangent vectors on the base manifold $M$ can always be canonically lifted to $E$. That is, for any $u \in E$ and any tangent vector $X_{\pi(u)} \in T_{\pi(u)} M$, there exists in $H_u E$ a unique tangent vector $X^L_u \in T_u E$ that projects to $X_{\pi(u)}$. In fact, this follows immediately from the fact that $HE$ is isomorphic to $\pi^* TM$, as implied in the short exact sequence (3.2.1). Moreover, a smooth vector field $X$ on $M$ can be uniquely lifted to $E$, resulting in a vector field $X^L$ on $E$ that is horizontal everywhere. This eventually enables us to lift any smooth curve

γ : R Ñ M on the base manifold to a horizontal curve γ˜ on E, defined by the ODE

L dγ˜ dγ “ . dt dt ˇuptq ˜ ˇπpuptqq¸ ˇ ˇ ˇ ˇ ˇ ˇ Note that the horizontal curve is uniquely determined once its starting point on E is specified. Therefore, given a smooth curve γ : r0, 1s Ñ M that connects γ p0q to

γ p1q on M, there exists a smooth map from Fγp0q to Fγp1q (at least when γ p0q and γ p1q are sufficiently close), defined as

Fγp0q Q s ÞÑ γ˜s p1q P Fγp1q,

whereγ ˜s denotes the horizontal lift of γ with starting point s. Such constructed maps between neighboring fibres, obviously depending on the choice of path γ, is called the parallel transport along γ. Like the concept of horizontal tangent vectors, parallel transport depends on the choice of the connection. We shall denote the

γ parallel transport from fibre Fy to fibre Fx as Pxy : Fy Ñ Fx. When γ is a unique geodesic on M that connects y to x, we drop the super-index γ and simply write

36 Pxy : Fy Ñ Fx. We shall see later that the probabilistic interpretation of HDM (and even VDM) implicitly depends on lifting from the base manifold a path that is continuous but not necessarily smooth. Though this can not be trivially achieved by the ODE based approach, stochastic differential geometry has already prepared the appropriate tools for tackling this technicality (see e.g. [144, §5.1.2]). We now return to modeling the inherent inconsistency for geometric morphomet- rics. Similar to the diffusion map framework, where small distances are considered to approximate geodesic distances on the manifold, we assume, when the pairwise dis- tance between surfaces S1,S2 is relatively small among all pairwise distances within the collection, that the shape distance is approximately equal to the geodesic distance on the base manifold. Moreover, under the fibre bundle assumption, we consider

the pairwise correspondence map from S1 to S2 to approximate PS2,S1 , the parallel transport along the geodesic connection S1 to S2. By routing through different inter- mediates, one obtains different maps from S1 to S2, which is conceptually equivalent to parallel-transporting along different piecewise geodesics. Due to the dependency on the underlying path, the parallel transport typically does not define globally con- sistent maps. The inconsistency shown in Figure 3.2, caused by propagation along three different paths, fits into this geometric picture. The inconsistency of parallel transport, closely related to the curvature of the corresponding connection [5], is characterized by the notion of holonomy [145, 28, 16].

If for all $x, y \in M$ the parallel transport $P^{\gamma}_{xy} : F_y \to F_x$ is independent of the choice of path $\gamma$, then the connection is said to be flat, or to have trivial holonomy; otherwise the connection is non-flat, or the holonomy is non-trivial. Figure 3.3 illustrates the non-trivial holonomy of the Levi-Civita connection on the unit sphere in $\mathbb{R}^3$: if we parallel transport a tangent vector $v \in T_AS^2$, first from $A$ to $C$ along the equator and then from $C$ to $B$ along the meridian, then the result $P_{BC}P_{CA}v$ is generally different from $P_{BA}v$, the result obtained by directly parallel transporting $v$ from $A$ to $B$ along the meridian that connects the two points.

Figure 3.3: Holonomy on a unit sphere
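The holonomy described above can be checked numerically. The following is a minimal sketch, not taken from the text: it assumes the specific points $A = (1,0,0)$ and $C = (0,1,0)$ on the equator and $B = (0,0,1)$ at the north pole, and the helper name `transport_along_geodesic` is hypothetical. It relies on the standard fact that Levi-Civita parallel transport along a great-circle arc of the unit sphere coincides with the rotation of $\mathbb{R}^3$ that carries the starting point to the end point along that arc.

```python
import numpy as np

def transport_along_geodesic(a, b, v):
    """Parallel transport tangent vector v from a to b on the unit sphere S^2.

    a, b are unit vectors (assumed not antipodal); the transport is along the
    minor great-circle arc and equals the rotation about a x b by the angle
    between a and b (Rodrigues' rotation formula).
    """
    axis = np.cross(a, b)
    s = np.linalg.norm(axis)   # sin of the rotation angle
    c = float(np.dot(a, b))    # cos of the rotation angle
    if s < 1e-12:              # a == b: nothing to do
        return v.copy()
    k = axis / s
    return v * c + np.cross(k, v) * s + k * np.dot(k, v) * (1.0 - c)

# Illustrative choice of points (an assumption, not the thesis's figure data).
A = np.array([1.0, 0.0, 0.0])   # on the equator
C = np.array([0.0, 1.0, 0.0])   # on the equator
B = np.array([0.0, 0.0, 1.0])   # north pole
v = np.array([0.0, 1.0, 0.0])   # tangent vector at A

direct = transport_along_geodesic(A, B, v)                                   # P_BA v
via_c = transport_along_geodesic(C, B, transport_along_geodesic(A, C, v))    # P_BC P_CA v
angle = np.arccos(np.clip(np.dot(direct, via_c), -1.0, 1.0))
```

In this configuration the two results differ by a rotation of $\pi/2$ in $T_BS^2$, which equals the area of the geodesic triangle $ACB$ (one octant of the sphere).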

The fibre bundle of interest in Figure 3.3 is an example of a tangent bundle. Generally, the tangent bundle $TM$ of a $d$-dimensional Riemannian manifold $M$ is a fibre bundle with base manifold $M$, fibre $\mathbb{R}^d$, and structure group $O(d)$ (the $d$-dimensional orthogonal group); the fibre over each $x \in M$ is $T_xM$, the tangent space of $M$ at $x$. On this bundle, there uniquely exists a canonical connection, the Levi-Civita connection, that is simultaneously torsion-free and compatible with the Riemannian metric on $M$. The unit tangent bundle $UTM$ is a subbundle of $TM$, with the same base manifold and structure group, but has a different type of fibre $S^{d-1}$, the unit $(d-1)$-dimensional sphere in $\mathbb{R}^d$; the fibre over each $x \in M$ consists of all tangent vectors of $M$ at $x$ with unit length. The Levi-Civita connection carries over to a canonical connection on $UTM$. We focus on analyzing HDM on these two types of fibre bundles in this paper.

Note that the tangent bundle is also of fundamental importance for VDM. However, as we shall see in Section 3.3, HDM aims at a goal different from VDM’s, even on tangent bundles: VDM acts on vector fields on $M$, or equivalently operates on sections of $TM$ (denoted as $\Gamma(M, TM)$); HDM focuses on functions on $TM$, and thus operates on sections of the trivial line bundle $TM \times \mathbb{R}$ (denoted as $\Gamma(TM, \mathbb{R})$). While VDM embeds the base manifold $M$ into a Euclidean space of lower dimension, HDM is more interested in how each fibre of $TM$ corresponds to its neighboring fibres. In short, VDM and HDM extend DM in two different directions.

The use of diffusion maps to solve the global registration problem was proposed earlier in the geometry processing community [132, 83], as was the concept of a “template” for a collection of shapes [152, 139, 109, 71, 70, 36]. These approaches were quite successful, albeit based mostly on heuristics; the fibre bundle framework provides geometric interpretations and insights for many of them. For instance, cycle-consistency-based approaches [109, 70] focus on improving the consistency of composed correspondence maps along 1-, 2-, and 3-cycles, which is implicitly an attempt to recover from condition (5b) the fibre bundle structure that underlies the shape collection; from this point of view, these methods sample only one point from each coordinate patch on the base manifold, and likely suffer from an inaccurate recovery due to the low sampling rate. [83] uses the diffusion map as a visualization tool, based on a dissimilarity score computed from local and global shape alignments. This is similar to the random walk HDM constructs on a fibre bundle; the geometric meaning of the fuzzy correspondence score in [83] is vague from a manifold learning point of view, but then, it was not the main focus of [83] to analyze the new graph Laplacian on the discretized fibre bundle.

From the fibre bundle point of view, the goal of many global registration problems is to learn the fibre bundle structure that underlies the collection of objects. Making an analogy with the terminology manifold learning, we call this type of learning problem fibre learning. A flat connection, or its induced parallel transport, is the key to resolving the problem. However, we remark that the existence of a flat connection on an arbitrary fibre bundle is not guaranteed: the geometry and topology of the fibre bundle may be an obstruction. For a discussion on tangent bundles, see [101, 58].

3.3 Hypoelliptic Diffusion Maps: The Formulation

3.3.1 Basic Setup

The data set considered in the HDM framework is a triplet $(\mathcal{X}, \rho, G)$, where

1. The total data set $\mathcal{X}$ is formed by the union

\[ \mathcal{X} = \bigcup_{j=1}^{n} X_j, \]

where each subset $X_j$ is referred to as the $j$-th fibre of $\mathcal{X}$, containing $\kappa_j$ points

\[ X_j = \left\{ x_{j,1}, x_{j,2}, \cdots, x_{j,\kappa_j} \right\}. \]

We call the collection of fibres the base data set

\[ \mathcal{B} = \left\{ X_1, X_2, \cdots, X_n \right\}, \]

and let $\pi : \mathcal{X} \to \mathcal{B}$ be the canonical projection from $\mathcal{X}$ to $\mathcal{B}$

\[ \pi : \mathcal{X} \longrightarrow \mathcal{B}, \qquad x_{j,k} \longmapsto X_j, \quad 1 \le j \le n, \ 1 \le k \le \kappa_j. \]

We shall denote the total number of points in $\mathcal{X}$ as

\[ \kappa = \kappa_1 + \kappa_2 + \cdots + \kappa_n. \]

2. The similarity measure $\rho$ is a real-valued function on $\mathcal{X} \times \mathcal{X}$, such that for all $\xi, \eta \in \mathcal{X}$

\[ \rho(\xi, \eta) \ge 0, \qquad \rho(\xi, \xi) = 0, \qquad \rho(\xi, \eta) = \rho(\eta, \xi). \]

On the product set $X_i \times X_j$, we denote

\[ \rho_{ij}(s, t) = \rho(x_{i,s}, x_{j,t}); \]

then $\rho_{ij}$ is a real $\kappa_i \times \kappa_j$ matrix, to which we will refer as the similarity matrix between $X_i$ and $X_j$.

3. The affinity graph $G = (V, E)$ has $\kappa$ vertices, with each $v_{i,s}$ corresponding to a point $x_{i,s} \in \mathcal{X}$; without loss of generality, we shall assume $G$ is connected. (In our applications, each $x_{i,s}$ is typically connected to several $x_{j,t}$’s on neighboring fibres.) If there is an edge between $v_{i,s}$ and $v_{j,t}$ in $G$, then $x_{i,s}$ is a neighbor of $x_{j,t}$ and $x_{j,t}$ is a neighbor of $x_{i,s}$. Moreover, we also call $X_i$ a neighbor of $X_j$ (and similarly $X_j$ a neighbor of $X_i$) if there is an edge in $G$ linking one point in $X_i$ with one point in $X_j$; this terminology implicitly defines a graph $G_B = (V_B, E_B)$, where the vertices in $V_B$ are in one-to-one correspondence with the fibres of $\mathcal{X}$, and $E_B$ encodes the neighborhood relations between pairs of fibres. $G_B$ will be called the base affinity graph.

3.3.2 Graph Hypoelliptic Laplacians

Let $W \in \mathbb{R}^{\kappa \times \kappa}$ be the weighted adjacency matrix of the graph $G$, i.e., $W$ is a block matrix in which the $(i,j)$-th block is

\[ W_{ij} = \rho_{ij}. \tag{3.3.1} \]

The $(s,t)$ entry in $W_{ij}$ is thus the edge weight $\rho_{ij}(s,t)$ between $v_{i,s}$ and $v_{j,t}$. Note that $W$ is a symmetric matrix, since $\rho$ is symmetric. Let $D$ be the $\kappa \times \kappa$ diagonal matrix

\[ D := \mathrm{diag}\left\{ \sum_{j=1}^{n}\sum_{t=1}^{\kappa_j} W_{1j}(1,t), \ \cdots, \ \sum_{j=1}^{n}\sum_{t=1}^{\kappa_j} W_{nj}(\kappa_n,t) \right\}; \tag{3.3.2} \]

then the graph hypoelliptic Laplacian for the triplet $(\mathcal{X}, \rho, G)$ is defined as the graph Laplacian of the graph $G$ with edge weights given by $W$, that is,

\[ L^{H} := D - W. \tag{3.3.3} \]

Since $G$ is connected, the diagonal elements of $D$ are all non-zero, and we can define the random-walk and normalized versions of $L^{H}$:

\[ L^{H}_{\mathrm{rw}} := D^{-1}L^{H} = I - D^{-1}W, \tag{3.3.4} \]

\[ L^{H}_{*} := D^{-1/2}L^{H}D^{-1/2} = I - D^{-1/2}WD^{-1/2}. \tag{3.3.5} \]

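Following [41], we can also repeat the constructions above on a renormalized graph of $G$. More precisely, let $Q_\alpha$ be the $\kappa \times \kappa$ diagonal matrix

\[ Q_\alpha := \mathrm{diag}\left\{ \left( \sum_{j=1}^{n}\sum_{t=1}^{\kappa_j} W_{1j}(1,t) \right)^{\alpha}, \ \cdots, \ \left( \sum_{j=1}^{n}\sum_{t=1}^{\kappa_j} W_{nj}(\kappa_n,t) \right)^{\alpha} \right\}, \tag{3.3.6} \]

where $\alpha$ is some constant between 0 and 1, and set

\[ W_\alpha := Q_\alpha^{-1} W Q_\alpha^{-1}. \tag{3.3.7} \]

The graph hypoelliptic Laplacians can then be constructed for $W_\alpha$ instead of $W$, by first forming the $\kappa \times \kappa$ diagonal matrix $D_\alpha$

\[ D_\alpha := \mathrm{diag}\left\{ \sum_{j=1}^{n}\sum_{t=1}^{\kappa_j} (W_\alpha)_{1j}(1,t), \ \cdots, \ \sum_{j=1}^{n}\sum_{t=1}^{\kappa_j} (W_\alpha)_{nj}(\kappa_n,t) \right\}, \tag{3.3.8} \]

and then setting

\[ L^{H}_{\alpha} := D_\alpha - W_\alpha, \tag{3.3.9} \]

\[ L^{H}_{\alpha,\mathrm{rw}} := D_\alpha^{-1}L^{H}_{\alpha} = I - D_\alpha^{-1}W_\alpha, \tag{3.3.10} \]

\[ L^{H}_{\alpha,*} := D_\alpha^{-1/2}L^{H}_{\alpha}D_\alpha^{-1/2} = I - D_\alpha^{-1/2}W_\alpha D_\alpha^{-1/2}. \tag{3.3.11} \]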
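The constructions (3.3.6) through (3.3.11) translate directly into a few lines of linear algebra. The sketch below is only an illustration under stated assumptions: the function name `graph_hypoelliptic_laplacians` and the toy similarity block `rho_12` are hypothetical, and dense NumPy arrays are used for readability, whereas a practical implementation would likely use sparse matrices.

```python
import numpy as np

def graph_hypoelliptic_laplacians(W, alpha=0.0):
    """Return (L, L_rw, L_sym, d) for a symmetric, non-negative weight matrix W.

    W is the kappa x kappa block matrix of (3.3.1); alpha is the
    renormalization exponent of (3.3.6). d is the diagonal of D_alpha.
    """
    W = np.asarray(W, dtype=float)
    q = W.sum(axis=1)                                   # row sums of W
    W_a = W / np.outer(q**alpha, q**alpha)              # (3.3.7)
    d = W_a.sum(axis=1)                                 # diagonal of D_alpha, (3.3.8)
    eye = np.eye(len(d))
    L = np.diag(d) - W_a                                # (3.3.9)
    L_rw = eye - W_a / d[:, None]                       # (3.3.10)
    L_sym = eye - W_a / np.sqrt(np.outer(d, d))         # (3.3.11)
    return L, L_rw, L_sym, d

# Toy data (hypothetical): two fibres with kappa_1 = 2 and kappa_2 = 3 points,
# and no edges inside a fibre.
rho_12 = np.array([[1.0, 0.5, 0.2],
                   [0.3, 0.8, 0.6]])
W = np.block([[np.zeros((2, 2)), rho_12],
              [rho_12.T, np.zeros((3, 3))]])
L, L_rw, L_sym, d = graph_hypoelliptic_laplacians(W, alpha=1.0)
```

Setting `alpha=0.0` recovers the unnormalized construction (3.3.1) through (3.3.5), since $Q_0$ is the identity.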

3.3.3 Spectral Distances and Embeddings

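In order to define spectral distances, we shall use eigen-decompositions. This is the reason to consider the symmetric matrices $L^{H}_{\alpha,*}$; their eigen-decompositions lead to a natural representation for $L^{H}_{\alpha,\mathrm{rw}}$, since $L^{H}_{\alpha,*}$ and $L^{H}_{\alpha,\mathrm{rw}}$ are diagonal-similar:

\[ L^{H}_{\alpha,*} = D_\alpha^{1/2} L^{H}_{\alpha,\mathrm{rw}} D_\alpha^{-1/2}. \]

Let $v \in \mathbb{R}^{\kappa \times 1}$ be an eigenvector of $L^{H}_{\alpha,*}$ corresponding to eigenvalue $\lambda$; $v$ defines a function on the vertices of $G$, or equivalently on the data set $\mathcal{X}$. By the construction of $L^{H}_{\alpha,*}$, $v \in \mathbb{R}^{\kappa \times 1}$ can be written as the concatenation of $n$ segments of length $\kappa_1, \cdots, \kappa_n$,

\[ v = \left( v_{[1]}^{\top}, \cdots, v_{[n]}^{\top} \right)^{\top}, \]

where $v_{[j]} \in \mathbb{R}^{\kappa_j \times 1}$ defines a function on fibre $X_j$. Now let $\lambda_0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{\kappa-1}$ be the $\kappa$ eigenvalues of $L^{H}_{\alpha,*}$ in ascending order, and denote the eigenvector corresponding to eigenvalue $\lambda_l$ as $v_l$. By our connectivity assumption for $G$, we know from spectral graph theory [37] that $\lambda_0 = 0$, $\lambda_0 < \lambda_1$, and $v_0$ is proportional to $D_\alpha^{1/2}\mathbf{1}$, where $\mathbf{1}$ is the vector with all entries equal to 1 (equivalently, the corresponding eigenvector of $L^{H}_{\alpha,\mathrm{rw}}$ is constant); we thus have

\[ 0 = \lambda_0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{\kappa-1}. \]

By the spectral decomposition of $L^{H}_{\alpha,*}$,

\[ L^{H}_{\alpha,*} = \sum_{l=0}^{\kappa-1} \lambda_l v_l v_l^{\top}, \tag{3.3.12} \]

and for any fixed diffusion time $t \in \mathbb{R}^{+}$,

\[ \left( L^{H}_{\alpha,*} \right)^{t} = \sum_{l=0}^{\kappa-1} \lambda_l^{t} v_l v_l^{\top}, \tag{3.3.13} \]

with the $(i,j)$-block taking the form

\[ \left( \left( L^{H}_{\alpha,*} \right)^{t} \right)_{ij} = \sum_{l=0}^{\kappa-1} \lambda_l^{t} v_{l[i]} v_{l[j]}^{\top}. \tag{3.3.14} \]

In general, this block is not square. Its Frobenius norm can be computed as

\[ \begin{aligned} \left\| \left( \left( L^{H}_{\alpha,*} \right)^{t} \right)_{ij} \right\|_F^2 &= \mathrm{Tr}\left[ \left( \left( L^{H}_{\alpha,*} \right)^{t} \right)_{ij} \left( \left( L^{H}_{\alpha,*} \right)^{t} \right)_{ij}^{\top} \right] \\ &= \mathrm{Tr}\left[ \sum_{l,m=0}^{\kappa-1} \lambda_l^{t}\lambda_m^{t} v_{l[i]} v_{l[j]}^{\top} v_{m[j]} v_{m[i]}^{\top} \right] \\ &= \mathrm{Tr}\left[ \sum_{l,m=0}^{\kappa-1} \lambda_l^{t}\lambda_m^{t} v_{m[i]}^{\top} v_{l[i]} v_{l[j]}^{\top} v_{m[j]} \right] \\ &= \sum_{l,m=0}^{\kappa-1} \lambda_l^{t}\lambda_m^{t} v_{m[i]}^{\top} v_{l[i]} v_{l[j]}^{\top} v_{m[j]}. \end{aligned} \tag{3.3.15} \]

Let us define the hypoelliptic base diffusion map

\[ V^{t} : \mathcal{B} \longrightarrow \mathbb{R}^{\kappa^2}, \qquad X_j \longmapsto \left( \lambda_l^{t/2}\lambda_m^{t/2} v_{l[j]}^{\top} v_{m[j]} \right)_{0 \le l,m \le \kappa-1}; \tag{3.3.16} \]

then (denoting the standard Euclidean inner product in $\mathbb{R}^{\kappa^2}$ as $\langle \cdot, \cdot \rangle$)

\[ \left\| \left( \left( L^{H}_{\alpha,*} \right)^{t} \right)_{ij} \right\|_F^2 = \left\langle V^{t}(X_i), V^{t}(X_j) \right\rangle, \tag{3.3.17} \]

with which we can define the hypoelliptic base diffusion distance on $\mathcal{B}$ as

\[ \begin{aligned} d_{\mathrm{HBDM},t}(X_i, X_j) &= \left\| V^{t}(X_i) - V^{t}(X_j) \right\| \\ &= \left\{ \left\langle V^{t}(X_i), V^{t}(X_i) \right\rangle + \left\langle V^{t}(X_j), V^{t}(X_j) \right\rangle - 2\left\langle V^{t}(X_i), V^{t}(X_j) \right\rangle \right\}^{\frac{1}{2}}. \end{aligned} \tag{3.3.18} \]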
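Identity (3.3.17) means the base diffusion distance can be computed from Frobenius norms of the blocks of $(L^{H}_{\alpha,*})^{t}$, without ever forming the $\kappa^2$-dimensional vectors $V^{t}(X_j)$. The following sketch checks this numerically on a toy example; all function names, the random weight matrix, and the fibre sizes are hypothetical choices made for illustration.

```python
import numpy as np

def normalized_laplacian(W):
    """I - D^{-1/2} W D^{-1/2} for a symmetric non-negative weight matrix W."""
    d = W.sum(axis=1)
    return np.eye(len(d)) - W / np.sqrt(np.outer(d, d))

def base_gram_from_blocks(L_sym, sizes, t):
    """G[i, j] = squared Frobenius norm of the (i, j)-block of L_sym^t, as in (3.3.15)."""
    lam, vecs = np.linalg.eigh(L_sym)
    lam = np.clip(lam, 0.0, None)                  # remove tiny negative round-off
    L_t = (vecs * lam**t) @ vecs.T                 # (3.3.13)
    off = np.concatenate(([0], np.cumsum(sizes)))
    n = len(sizes)
    return np.array([[np.sum(L_t[off[i]:off[i + 1], off[j]:off[j + 1]] ** 2)
                      for j in range(n)] for i in range(n)])

def base_map_explicit(L_sym, sizes, t):
    """Rows are the vectors V^t(X_j) of (3.3.16), flattened to length kappa^2."""
    lam, vecs = np.linalg.eigh(L_sym)
    scale = np.clip(lam, 0.0, None) ** (t / 2.0)
    off = np.concatenate(([0], np.cumsum(sizes)))
    rows = []
    for j in range(len(sizes)):
        Vj = vecs[off[j]:off[j + 1]]               # kappa_j x kappa, columns are v_l[j]
        rows.append(((Vj.T @ Vj) * np.outer(scale, scale)).ravel())
    return np.array(rows)

def base_diffusion_distances(G):
    """Pairwise distances (3.3.18) from the Gram matrix G of the vectors V^t(X_j)."""
    g = np.diag(G)
    return np.sqrt(np.clip(g[:, None] + g[None, :] - 2.0 * G, 0.0, None))

# Toy example (hypothetical): three fibres of sizes 2, 3, 2 and random weights.
rng = np.random.default_rng(0)
A = rng.random((7, 7))
W = A + A.T
sizes = [2, 3, 2]
L_sym = normalized_laplacian(W)
G = base_gram_from_blocks(L_sym, sizes, t=1.0)
V = base_map_explicit(L_sym, sizes, t=1.0)
dist = base_diffusion_distances(G)
```

Working with the blocks requires only $n^2$ Frobenius norms, which is usually much cheaper than storing vectors of length $\kappa^2$.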

The hypoelliptic base diffusion map embeds the base data set $\mathcal{B}$ into a Euclidean space using $G_B$, the base affinity graph with edges weighted by entry-wise non-negative matrices. In this sense, it is closely related to the vector diffusion maps [133]: if

\[ \kappa_1 = \kappa_2 = \cdots = \kappa_n = d \]

and (relaxing the constraint $\rho \ge 0$)

\[ (v_i, v_j) \in E_B \ \Leftrightarrow \ \rho_{ij} = w_{ij}O_{ij}, \]

where $w_{ij} \ge 0$ and $O_{ij}$ is $d \times d$ orthogonal, then the weighted adjacency matrix $W$ (as defined in (3.3.1)) coincides with the adjacency matrix $S$ defined in [133, §3]. In this case, the graph hypoelliptic Laplacian of $(\mathcal{X}, \rho, G)$ reduces to the graph connection Laplacian for $(G_B, \{w_{ij}\}, \{O_{ij}\})$. Note that in HDM we assume the non-negativity of the similarity measure $\rho$, which is generally not the case for vector diffusion maps. (The non-negativity of the eigenvalues of $L^{H}_{\alpha,*}$ allows us to consider arbitrary powers of $L^{H}_{\alpha,*}$; in VDM, this is circumvented by considering powers of $S^2$.) From a different point of view, by the Riesz Representation Theorem, smooth vector fields on a manifold $M$ can be identified with linear functions on $TM$, thus VDM can be viewed as HDM restricted to the space of linear functions on $TM$.

In addition to embedding the base data set $\mathcal{B}$, HDM is also capable of embedding the total data set $\mathcal{X}$ into Euclidean spaces. Define for each diffusion time $t \in \mathbb{R}^{+}$ the hypoelliptic diffusion map

\[ H^{t} : \mathcal{X} \longrightarrow \mathbb{R}^{\kappa-1}, \qquad x_{j,s} \longmapsto \left( \lambda_1^{t} v_{1[j]}(s), \ \lambda_2^{t} v_{2[j]}(s), \ \cdots, \ \lambda_{\kappa-1}^{t} v_{(\kappa-1)[j]}(s) \right), \tag{3.3.19} \]

where $v_{l[j]}(s)$ is the $s$-th entry of the $j$-th segment of the $l$-th eigenvector, with $j = 1, \cdots, n$, $s = 1, \cdots, \kappa_j$. We could also have written

\[ v_{l[j]}(s) = v_l(s_j + s), \quad \text{where } s_1 = 0 \text{ and } s_j = \sum_{p=1}^{j-1} \kappa_p \text{ for } j \ge 2. \]

Following a similar argument as in [41], we can define the hypoelliptic diffusion distance on $\mathcal{X}$ as

\[ d_{\mathrm{HDM},t}(x_{i,s}, x_{j,t}) = \left\| H^{t}(x_{i,s}) - H^{t}(x_{j,t}) \right\|. \tag{3.3.20} \]

As a result, $H^{t}$ embeds the total data set $\mathcal{X}$ into a Euclidean space in such a manner that the hypoelliptic diffusion distance on $\mathcal{X}$ is preserved. Moreover, this embedding automatically suggests a global registration for all fibres, according to the similarity measure $\rho$. For simplicity of notation, let us write

\[ H^{t}_j := H^{t}\restriction_{X_j} \]

for the restriction of $H^{t}$ to fibre $X_j$, and call it the $j$-th component of $H^{t}$. Up to scaling, the components of $H^{t}$ bring the fibres of $\mathcal{X}$ to a common “template”, such that points $x_{i,s}$ and $x_{j,t}$ with a high similarity measure $\rho_{ij}(s,t)$ tend to be close to each other in the embedded Euclidean space. Pairwise correspondences between fibres $X_i, X_j$ can then be reconstructed from the hypoelliptic diffusion map. Indeed, assuming each $X_j$ is sampled from some manifold $F_j$, and a template fibre $F \subset \mathbb{R}^{m}$ can be estimated from

\[ H^{t}_1(X_1), \cdots, H^{t}_n(X_n), \]

then one can often extend (by interpolation) $H^{t}_j$ from a discrete correspondence to a continuous bijective map from $F_j$ to $F$, and build correspondence maps between an arbitrary pair $X_i, X_j$ by composing (the interpolated continuous maps) $H^{t}_i$ and $\left( H^{t}_j \right)^{-1}$. A similar construction was implicit in [83]. Sometimes, it is more useful to consider a normalized version of the hypoelliptic diffusion map that takes values on the standard unit sphere in $\mathbb{R}^{\kappa-1}$:

\[ \tilde{H}^{t} : \mathcal{X} \longrightarrow S^{\kappa-2} \subset \mathbb{R}^{\kappa-1}, \qquad x_{j,s} \longmapsto \frac{H^{t}_j(x_{j,s})}{\left\| H^{t}_j(x_{j,s}) \right\|}. \tag{3.3.21} \]

We shall see an example that applies $\tilde{H}^{t}$ to $SO(3)$ in Section 3.5.
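The maps (3.3.19) and (3.3.21) amount to scaling the non-trivial eigenvectors of $L^{H}_{\alpha,*}$ by powers of their eigenvalues and slicing the result by fibre. The sketch below is a minimal illustration assuming a small random weight matrix; the names `hdm_embedding` and `normalized_laplacian` and the fibre sizes are hypothetical, and the eigenvalue powers follow the definitions in the text literally.

```python
import numpy as np

def normalized_laplacian(W):
    """I - D^{-1/2} W D^{-1/2} for a symmetric non-negative weight matrix W."""
    d = W.sum(axis=1)
    return np.eye(len(d)) - W / np.sqrt(np.outer(d, d))

def hdm_embedding(L_sym, sizes, t=1.0):
    """Return (H, components, H_tilde).

    H has one row per data point: row (s_j + s) is H^t(x_{j,s}) of (3.3.19).
    components[j] holds the rows of fibre X_j, i.e. the component H^t_j.
    H_tilde is the row-normalized version of (3.3.21).
    """
    lam, vecs = np.linalg.eigh(L_sym)              # eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)
    H = vecs[:, 1:] * lam[1:] ** t                 # drop the trivial pair (lambda_0, v_0)
    off = np.concatenate(([0], np.cumsum(sizes)))
    components = [H[off[j]:off[j + 1]] for j in range(len(sizes))]
    H_tilde = H / np.linalg.norm(H, axis=1, keepdims=True)
    return H, components, H_tilde

# Toy example (hypothetical): three fibres of sizes 2, 3, 2 and random weights.
rng = np.random.default_rng(0)
A = rng.random((7, 7))
W = A + A.T
sizes = [2, 3, 2]
L_sym = normalized_laplacian(W)
H, components, H_tilde = hdm_embedding(L_sym, sizes, t=1.0)

# Squared hypoelliptic diffusion distance (3.3.20) between points 0 and 4.
d2_embed = np.sum((H[0] - H[4]) ** 2)
L2 = L_sym @ L_sym
d2_matrix = L2[0, 0] + L2[4, 4] - 2.0 * L2[0, 4]
```

For $t = 1$ the squared distance in the embedding agrees with the corresponding entries of $(L^{H}_{\alpha,*})^{2}$, which is what the last two lines compare.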

3.4 HDM on Tangent and Unit Tangent Bundles

The HDM framework is very flexible: if each fibre consists of one single point, the hypoelliptic graph Laplacian reduces to the graph Laplacian that underlies diffusion maps; if all the fibres have the same number of points and all similarity matrices (defined in Section 3.3.1) are orthogonal (up to a multiplicative constant), the hypoelliptic graph Laplacian reduces to the graph connection Laplacian that underlies vector diffusion maps. The goal of this section is to relate HDM to some other partial differential operators of geometric importance on tangent and unit tangent bundles of (compact, closed) Riemannian manifolds. In a follow-up paper, we will extend the geometric setting to more general fibre bundles.

This section builds upon the fibre bundle assumption. Adopting the notation in Section 3.3.1, we assume that $\mathcal{X}$ is sampled from a fibre bundle $E$, and each fibre $X_j$ is sampled from a fibre over some point on the base manifold $M$. Moreover, we shall assume that $E$ is the tangent bundle or unit tangent bundle of $M$, i.e., $E = TM$ or $E = UTM$. For the convenience of the reader, some basic properties about the geometry of these fibre bundles are reviewed in Appendix A.

3.4.1 HDM on Tangent Bundles

Let $K : \mathbb{R}^2 \to \mathbb{R}^{\ge 0}$ be a smooth kernel function supported on the unit square $[0,1] \times [0,1]$. In all that follows, we shall assume that $M$ is a compact manifold without boundary, which, according to standard custom, we shall simply call a closed manifold. Let the closed manifold $M$ be equipped with a Riemannian metric tensor $g$, which induces on $M$ a geodesic distance function $d_M(\cdot, \cdot) : M \times M \to \mathbb{R}^{\ge 0}$; $g$ defines an inner product on each tangent space of $M$, denoted as

\[ \langle u, v \rangle_x = g_{jk}(x)\, u^{j} v^{k}, \quad u, v \in T_xM, \tag{3.4.1} \]

where, and for the remainder of this section unless otherwise specified, we have adopted the Einstein summation convention. The vector norm on $T_xM$ with respect to this inner product shall be denoted as

\[ \|u\|_x = \left( g_{jk}(x)\, u^{j} u^{k} \right)^{\frac{1}{2}}, \quad u \in T_xM. \tag{3.4.2} \]

We denote

\[ P_{y,x} : T_xM \to T_yM \tag{3.4.3} \]

for the parallel transport from $x \in M$ to $y \in M$ with respect to the Levi-Civita connection on $M$, along a geodesic segment that connects $x$ to $y$. It is well known that a tangent vector can be parallel-transported along any smooth curve on $M$; since $M$ is compact, its injectivity radius $\mathrm{Inj}(M)$ is positive, and thus any $x \in M$ lies within a geodesic normal neighborhood in which any point $y$ can be connected to $x$ through a unique geodesic with length smaller than $\mathrm{Inj}(M)$. Therefore, $P_{y,x}$ is well-defined, at least for $x, y \in M$ with $d_M(x,y) < \mathrm{Inj}(M)$. Furthermore, for such $x, y \in M$, $P_{y,x}$ is an orientation-preserving isometry between the domain and target tangent planes [44, Exercise 2.1].

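For bandwidth parameters $\epsilon > 0$, $\delta > 0$, define for all $(x,v), (y,w) \in TM$

\[ K_{\epsilon,\delta}(x,v;y,w) := K\left( \frac{d_M^2(x,y)}{\epsilon}, \ \frac{\|P_{y,x}v - w\|_y^2}{\delta} \right). \tag{3.4.4} \]

Note that the requirement that $\mathrm{supp}(K) \subset [0,1] \times [0,1]$ implies that $K_{\epsilon,\delta}(x,v;y,w) \ne 0$ only if $d_M(x,y) \le \sqrt{\epsilon}$. It follows that $P_{y,x}$, and further $K_{\epsilon,\delta}(x,v;y,w)$, are well-defined when $\epsilon \le \mathrm{Inj}(M)^2$; we shall restrict ourselves to such sufficiently small $\epsilon$.

$K_{\epsilon,\delta}$ is symmetric because $P_{y,x}$ is an isometry between $T_xM$ and $T_yM$:

\[ \begin{aligned} K_{\epsilon,\delta}(y,w;x,v) &= K\left( \frac{d_M^2(y,x)}{\epsilon}, \ \frac{\|P_{x,y}w - v\|_x^2}{\delta} \right) = K\left( \frac{d_M^2(x,y)}{\epsilon}, \ \frac{\|P_{y,x}(P_{x,y}w - v)\|_y^2}{\delta} \right) \\ &= K\left( \frac{d_M^2(x,y)}{\epsilon}, \ \frac{\|w - P_{y,x}v\|_y^2}{\delta} \right) = K_{\epsilon,\delta}(x,v;y,w). \end{aligned} \]

This symmetry is of particular importance for the definition of symmetric diffusion semigroups [143].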
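The kernel (3.4.4) and its symmetry can be illustrated on the unit sphere $S^2 \subset \mathbb{R}^3$, where parallel transport along a geodesic is an explicit rotation. This is only a sketch under several assumptions that are not in the text: the base manifold is $S^2$, the kernel $K$ is taken to be a product of two smooth bump profiles (the helper `bump` is hypothetical), and the sample points and bandwidths are arbitrary.

```python
import numpy as np

def transport(x, y, v):
    """Levi-Civita parallel transport on S^2 from x to y (not antipodal) of v in T_x S^2."""
    axis = np.cross(x, y)
    s, c = np.linalg.norm(axis), float(np.dot(x, y))
    if s < 1e-12:
        return v.copy()
    k = axis / s
    return v * c + np.cross(k, v) * s + k * np.dot(k, v) * (1.0 - c)

def bump(u):
    """Profile that is smooth for u >= 0 and vanishes for u >= 1 (an illustrative choice)."""
    return float(np.exp(-1.0 / (1.0 - u * u))) if u < 1.0 else 0.0

def kernel(x, v, y, w, eps, delta):
    """K_{eps,delta}(x, v; y, w) of (3.4.4) with K(a, b) = bump(a) * bump(b)."""
    d2 = np.arccos(np.clip(np.dot(x, y), -1.0, 1.0)) ** 2   # squared geodesic distance
    fibre2 = np.sum((transport(x, y, v) - w) ** 2)          # ||P_{y,x} v - w||_y^2
    return bump(d2 / eps) * bump(fibre2 / delta)

# Two nearby points on the equator with tangent vectors attached (hypothetical data).
x = np.array([1.0, 0.0, 0.0])
y = np.array([np.cos(0.3), np.sin(0.3), 0.0])
v = np.array([0.0, 0.2, 0.5])                                       # v in T_x S^2
w = 0.1 * np.array([-np.sin(0.3), np.cos(0.3), 0.0]) + np.array([0.0, 0.0, 0.4])  # w in T_y S^2

k_xy = kernel(x, v, y, w, eps=1.0, delta=1.0)
k_yx = kernel(y, w, x, v, eps=1.0, delta=1.0)
```

The two evaluations agree because the rotation implementing $P_{x,y}$ is the inverse of the one implementing $P_{y,x}$, mirroring the computation above.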

Let $p \in C^{\infty}(TM)$ be the probability density function according to which we shall sample. Assume $p$ is bounded from both above and below (away from 0):

\[ 0 < p_m \le p(x,v) \le p_M < \infty, \quad \forall (x,v) \in TM. \tag{3.4.5} \]

Define

\[ p_{\epsilon,\delta}(x,v) := \int_{TM} K_{\epsilon,\delta}(x,v;y,w)\, p(y,w)\, d\mu(y,w), \tag{3.4.6} \]

where $d\mu$ is the standard volume form on $TM$. As in Appendix A, $d\mu$ is a product of $dV_y(w)$ (the standard translation- and rotation-invariant Borel measure on $T_yM$) and $d\mathrm{vol}_M(y)$ (the standard Riemannian volume element on $M$). If we fix $x \in M$ and integrate $p(x,v)$ along $T_xM$, then

\[ \bar{p}(x) := \int_{T_xM} p(x,v)\, dV_x(v) \tag{3.4.7} \]

is a density function on $M$, since

\[ \int_M \bar{p}(x)\, d\mathrm{vol}(x) = \int_M \int_{T_xM} p(x,v)\, dV_x(v)\, d\mathrm{vol}(x) = \int_{TM} p(x,v)\, d\mu(x,v) = 1. \]

We call $\bar{p}$ the projection of $p$ on $M$. Furthermore, dividing $p(x,v)$ by $\bar{p}(x)$ yields a conditional probability density function on $T_xM$

\[ p(v \mid x) = \frac{p(x,v)}{\bar{p}(x)}, \tag{3.4.8} \]

since

\[ \int_{T_xM} p(v \mid x)\, dV_x(v) = \frac{\int_{T_xM} p(x,v)\, dV_x(v)}{\bar{p}(x)} = \frac{\bar{p}(x)}{\bar{p}(x)} = 1. \]

For any $f \in C^{\infty}(TM)$, we can define its average along fibres on $TM$, with respect to the conditional probability density functions, as the following function on $M$:

\[ \bar{f}(x) = \int_{T_xM} f(x,v)\, p(v \mid x)\, dV_x(v), \quad \forall x \in M. \tag{3.4.9} \]

Finally, for any $0 \le \alpha \le 1$, define the $\alpha$-normalized kernel

\[ K^{\alpha}_{\epsilon,\delta}(x,v;y,w) := \frac{K_{\epsilon,\delta}(x,v;y,w)}{p^{\alpha}_{\epsilon,\delta}(x,v)\, p^{\alpha}_{\epsilon,\delta}(y,w)}. \tag{3.4.10} \]

We are now ready to define a family of hypoelliptic diffusion operators on $TM$ as

\[ H^{\alpha}_{\epsilon,\delta} f(x,v) := \frac{\displaystyle\int_{TM} K^{\alpha}_{\epsilon,\delta}(x,v;y,w)\, f(y,w)\, p(y,w)\, d\mu(y,w)}{\displaystyle\int_{TM} K^{\alpha}_{\epsilon,\delta}(x,v;y,w)\, p(y,w)\, d\mu(y,w)} \tag{3.4.11} \]

for any $f \in C^{\infty}(TM)$.

We are interested in the asymptotic behavior of $H^{\alpha}_{\epsilon,\delta}$ in the limit $\epsilon \to 0$, $\delta \to 0$. It turns out that this depends on the relative rate with which $\epsilon$ and $\delta$ approach 0. For simplicity of notation, let us write $\gamma = \delta/\epsilon$.

Theorem 10 (HDM on Tangent Bundles). Let $M$ be a closed Riemannian manifold, and $H^{\alpha}_{\epsilon,\delta} : C^{\infty}(TM) \to C^{\infty}(TM)$ be defined as in (3.4.11). If $\delta = O(\epsilon)$ as $\epsilon \to 0$, or equivalently if $\gamma = \delta/\epsilon$ is asymptotically bounded, then for any $f \in C^{\infty}(TM)$, as $\epsilon \to 0$ (and thus $\delta \to 0$),

\[ \begin{aligned} H^{\alpha}_{\epsilon,\delta} f(x,v) = f(x,v) &+ \epsilon\, \frac{m_{21}}{2m_0} \left[ \frac{\Delta^{H}\left[ f p^{1-\alpha} \right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta^{H} p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \right] \\ &+ \delta\, \frac{m_{22}}{2m_0} \left[ \frac{\Delta^{V}\left[ f p^{1-\alpha} \right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta^{V} p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \right] + O\left( \epsilon^2 + \delta^2 \right), \end{aligned} \tag{3.4.12} \]

where $m_0, m_{21}, m_{22}$ are positive constants depending only on the kernel $K$.

Theorem 10 is the tangent bundle version of Theorem 14 (which applies to unit tangent bundles), and the proofs for these two theorems are essentially identical. We include a proof of Theorem 14 in Appendix B, from which a proof of Theorem 10 can be easily adapted.

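Proposition 11. Let $M$ and $H^{\alpha}_{\epsilon,\delta}$ be as in Theorem 10. If $\delta = \gamma\epsilon$, then for any $f \in C^{\infty}(TM)$ and sufficiently small $\epsilon > 0$,

\[ \lim_{\gamma \to \infty} H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) = \bar{f}(x) + \epsilon\, \frac{m_2'}{2m_0'} \left[ \frac{\Delta_M\left[ \bar{f}\, \bar{p}^{1-\alpha} \right](x)}{\bar{p}^{1-\alpha}(x)} - \bar{f}(x)\, \frac{\Delta_M \bar{p}^{1-\alpha}(x)}{\bar{p}^{1-\alpha}(x)} \right] + O\left( \epsilon^2 \right), \tag{3.4.13} \]

where $\Delta_M$ is the Laplace-Beltrami operator on $M$, $\bar{p}$ is the projected density function on $M$, $\bar{f}(x)$ is the average of $f$ along fibres on $TM$, and $m_0', m_2'$ are constants that only depend on the kernel function $K$.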

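Corollary 12. Let $M$ and $H^{\alpha}_{\epsilon,\delta}$ be as in Theorem 10. If $\delta/\epsilon = \gamma \to \infty$ as $\epsilon \to 0$, $\delta \to 0$, then for any $f \in C^{\infty}(TM)$, in general

\[ \lim_{\gamma \to \infty} \lim_{\delta \to 0} \frac{H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) - f(x,v)}{\epsilon} \ne \lim_{\delta \to 0} \lim_{\gamma \to \infty} \frac{H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) - f(x,v)}{\epsilon}, \]

and thus an asymptotic expansion of $H^{\alpha}_{\epsilon,\delta} f(x,v)$ for small $\epsilon, \delta$ is not well-defined. In fact, for each fixed $\gamma$, as in (3.4.12),

\[ \begin{aligned} H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) = f(x,v) &+ \epsilon\, \frac{m_{21}}{2m_0} \left[ \frac{\Delta^{H}\left[ f p^{1-\alpha} \right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta^{H} p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \right] \\ &+ \gamma\epsilon\, \frac{m_{22}}{2m_0} \left[ \frac{\Delta^{V}\left[ f p^{1-\alpha} \right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta^{V} p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \right] + O\left( \epsilon^2 \right), \end{aligned} \tag{3.4.14} \]

whereas by Proposition 11

\[ \lim_{\gamma \to \infty} H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) = \bar{f}(x) + \epsilon\, \frac{m_2'}{2m_0'} \left[ \frac{\Delta_M\left[ \bar{f}\, \bar{p}^{1-\alpha} \right](x)}{\bar{p}^{1-\alpha}(x)} - \bar{f}(x)\, \frac{\Delta_M \bar{p}^{1-\alpha}(x)}{\bar{p}^{1-\alpha}(x)} \right] + O\left( \epsilon^2 \right). \tag{3.4.15} \]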

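Corollary 13. Under the same assumptions and notation as in Theorem 10, if $\alpha = 1$, then

(i) If $\delta = O(\epsilon)$ as $\epsilon \to 0$, then for any $f \in C^{\infty}(TM)$, as $\epsilon \to 0$ (and thus $\delta \to 0$),

\[ H^{1}_{\epsilon,\delta} f(x,v) = f(x,v) + \epsilon\, \frac{m_{21}}{2m_0}\, \Delta^{H} f(x,v) + \delta\, \frac{m_{22}}{2m_0}\, \Delta^{V} f(x,v) + O\left( \epsilon^2 + \delta^2 \right); \tag{3.4.16} \]

(ii) For any $f \in C^{\infty}(TM)$,

\[ \lim_{\gamma \to \infty} H^{1}_{\epsilon,\gamma\epsilon} f(x,v) = \bar{f}(x) + \epsilon\, \frac{m_2'}{2m_0'}\, \Delta_M \bar{f}(x) + O\left( \epsilon^2 \right). \tag{3.4.17} \]

3.4.2 HDM on Unit Tangent Bundles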

The construction of HDM for unit tangent bundles is very similar to the construction in Section 3.4.1. We only need to replace the volume element $d\mu$ on $TM$ with $d\Theta$, the Liouville measure on $UTM$ (see, e.g., [35, Chapter VII]), and modify the definition of $K_{\epsilon,\delta}$ in (3.4.4) into

\[ K_{\epsilon,\delta}(x,v;y,w) := K\left( \frac{d_M^2(x,y)}{\epsilon}, \ \frac{d_{S_y}^2(P_{y,x}v, w)}{\delta} \right), \tag{3.4.18} \]

where $S_y$ is the unit sphere in $T_yM$, and $d_{S_y}(\cdot, \cdot)$ is the geodesic distance on $S_y$ under the induced metric from $T_yM$. Note that $K_{\epsilon,\delta}$ as defined in (3.4.18) is still symmetric. Abusing notation, we shall not distinguish the $K_{\epsilon,\delta}$ in (3.4.4) from the unit tangent bundle version (3.4.18), whenever the specification can be inferred from context. Similarly, the notation

\[ \bar{p}(x) := \int_{S_x} p(x,v)\, dV_x(v), \tag{3.4.19} \]

\[ p(v \mid x) = \frac{p(x,v)}{\bar{p}(x)}, \tag{3.4.20} \]

and

\[ \bar{f}(x) = \int_{S_x} f(x,v)\, p(v \mid x)\, dV_x(v), \quad \forall f \in C^{\infty}(UTM), \ \forall x \in M \tag{3.4.21} \]

will stay the same as in (3.4.7), (3.4.8), and (3.4.9). As in (3.4.11), we can now define a family of hypoelliptic diffusion operators on $UTM$ for any $0 \le \alpha \le 1$ as

\[ H^{\alpha}_{\epsilon,\delta} f(x,v) := \frac{\displaystyle\int_{UTM} K^{\alpha}_{\epsilon,\delta}(x,v;y,w)\, f(y,w)\, p(y,w)\, d\Theta(y,w)}{\displaystyle\int_{UTM} K^{\alpha}_{\epsilon,\delta}(x,v;y,w)\, p(y,w)\, d\Theta(y,w)} \tag{3.4.22} \]

for any $f \in C^{\infty}(UTM)$.

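Theorem 14 (HDM on Unit Tangent Bundles). Let $M$ be a closed Riemannian manifold, and $H^{\alpha}_{\epsilon,\delta} : C^{\infty}(UTM) \to C^{\infty}(UTM)$ be defined as in (3.4.22). If $\delta = O(\epsilon)$ as $\epsilon \to 0$, or equivalently if $\gamma = \delta/\epsilon$ is asymptotically bounded, then for any $f \in C^{\infty}(UTM)$, as $\epsilon \to 0$ (and thus $\delta \to 0$),

\[ \begin{aligned} H^{\alpha}_{\epsilon,\delta} f(x,v) = f(x,v) &+ \epsilon\, \frac{m_{21}}{2m_0} \left[ \frac{\Delta^{H}_{S}\left[ f p^{1-\alpha} \right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta^{H}_{S} p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \right] \\ &+ \delta\, \frac{m_{22}}{2m_0} \left[ \frac{\Delta^{V}_{S}\left[ f p^{1-\alpha} \right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta^{V}_{S} p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \right] + O\left( \epsilon^2 + \delta^2 \right), \end{aligned} \tag{3.4.23} \]

where $m_0, m_{21}, m_{22}$ are positive constants depending only on the kernel $K$.

We include a proof of Theorem 14 in Appendix B. The proof of Theorem 10 is essentially the same.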

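Proposition 15. Let $M$ and $H^{\alpha}_{\epsilon,\delta}$ be as in Theorem 14. If $\delta = \gamma\epsilon$, then for any $f \in C^{\infty}(UTM)$ and sufficiently small $\epsilon > 0$,

\[ \lim_{\gamma \to \infty} H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) = \bar{f}(x) + \epsilon\, \frac{m_2'}{2m_0'} \left[ \frac{\Delta_M\left[ \bar{f}\, \bar{p}^{1-\alpha} \right](x)}{\bar{p}^{1-\alpha}(x)} - \bar{f}(x)\, \frac{\Delta_M \bar{p}^{1-\alpha}(x)}{\bar{p}^{1-\alpha}(x)} \right] + O\left( \epsilon^2 \right), \tag{3.4.24} \]

where $\Delta_M$ is the Laplace-Beltrami operator on $M$, $\bar{p}$ is the projected density function on $M$, $\bar{f}(x)$ is the average of $f$ along fibres on $UTM$, and $m_0', m_2'$ are constants that only depend on the kernel function $K$.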

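Corollary 16. Let $M$ and $H^{\alpha}_{\epsilon,\delta}$ be as in Theorem 14. If $\delta/\epsilon = \gamma \to \infty$ as $\epsilon \to 0$, $\delta \to 0$, then for any $f \in C^{\infty}(UTM)$, in general

\[ \lim_{\gamma \to \infty} \lim_{\delta \to 0} \frac{H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) - f(x,v)}{\epsilon} \ne \lim_{\delta \to 0} \lim_{\gamma \to \infty} \frac{H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) - f(x,v)}{\epsilon}, \]

and thus an asymptotic expansion of $H^{\alpha}_{\epsilon,\delta} f(x,v)$ for small $\epsilon, \delta$ is not well-defined. In fact, for each fixed $\gamma$, as in (3.4.23),

\[ \begin{aligned} H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) = f(x,v) &+ \epsilon\, \frac{m_{21}}{2m_0} \left[ \frac{\Delta^{H}_{S}\left[ f p^{1-\alpha} \right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta^{H}_{S} p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \right] \\ &+ \gamma\epsilon\, \frac{m_{22}}{2m_0} \left[ \frac{\Delta^{V}_{S}\left[ f p^{1-\alpha} \right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta^{V}_{S} p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \right] + O\left( \epsilon^2 \right), \end{aligned} \tag{3.4.25} \]

whereas by Proposition 15

\[ \lim_{\gamma \to \infty} H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) = \bar{f}(x) + \epsilon\, \frac{m_2'}{2m_0'} \left[ \frac{\Delta_M\left[ \bar{f}\, \bar{p}^{1-\alpha} \right](x)}{\bar{p}^{1-\alpha}(x)} - \bar{f}(x)\, \frac{\Delta_M \bar{p}^{1-\alpha}(x)}{\bar{p}^{1-\alpha}(x)} \right] + O\left( \epsilon^2 \right). \tag{3.4.26} \]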

Corollary 17. Under the same assumptions and notation as in Theorem 14, if $\alpha = 1$, then

(i) If $\delta = O(\epsilon)$ as $\epsilon \to 0$, then for any $f \in C^{\infty}(UTM)$, as $\epsilon \to 0$ (and thus $\delta \to 0$),

\[ H^{1}_{\epsilon,\delta} f(x,v) = f(x,v) + \epsilon\, \frac{m_{21}}{2m_0}\, \Delta^{H}_{S} f(x,v) + \delta\, \frac{m_{22}}{2m_0}\, \Delta^{V}_{S} f(x,v) + O\left( \epsilon^2 + \delta^2 \right); \tag{3.4.27} \]

(ii) For any $f \in C^{\infty}(UTM)$,

\[ \lim_{\gamma \to \infty} H^{1}_{\epsilon,\gamma\epsilon} f(x,v) = \bar{f}(x) + \epsilon\, \frac{m_2'}{2m_0'}\, \Delta_M \bar{f}(x) + O\left( \epsilon^2 \right). \tag{3.4.28} \]

3.4.3 Finite Sampling on Unit Tangent Bundles

Though the theory of hypoelliptic diffusion maps on unit tangent bundles is completely parallel to its counterpart on tangent bundles, in practice it is usually much easier to sample from the unit tangent bundle since it is compact whenever the base manifold is. It thus makes much more sense to study finite sampling on unit tangent bundles. In this section, we first consider sampling without noise, i.e. where we sample exactly on unit tangent bundles; next, we study the case where the tangent spaces are empirically estimated from samples on the base manifold. The latter scenario is a proof-of-concept for applying the hypoelliptic diffusion map framework to much more general fibre bundles in practice, where data representing each fibre are often acquired with noise. The proofs of Theorem 21 and Theorem 23 can be found in Appendix B. In Section 3.5, we shall demonstrate a numerical experiment that addresses the difference between the two sampling strategies.

Sampling without Noise

We begin with some assumptions and definitions. Assumption 18 includes our technical assumptions, and Assumption 19 specifies the noiseless sampling strategy.

Assumption 18.

1. $\iota : M \hookrightarrow \mathbb{R}^{D}$ is an isometric embedding of a $d$-dimensional closed Riemannian manifold into $\mathbb{R}^{D}$, with $D \gg d$.

2. Let the two-variable smooth function $K : \mathbb{R}^2 \to \mathbb{R}^{\ge 0}$ be compactly supported on the unit square $[0,1] \times [0,1]$. The partial derivatives $\partial_1 K$, $\partial_2 K$ are therefore automatically compactly supported on the unit square as well. (In fact, a similar result still holds if $K$ and its first order derivatives decay faster at infinity than any inverse polynomial; we avoid such technicalities and focus on demonstrating the idea, using compactly supported $K$.)

Assumption 19. The $(N_B \times N_F)$ data points

\[ \begin{matrix} x_{1,1}, & x_{1,2}, & \cdots, & x_{1,N_F} \\ x_{2,1}, & x_{2,2}, & \cdots, & x_{2,N_F} \\ \vdots & \vdots & \cdots & \vdots \\ x_{N_B,1}, & x_{N_B,2}, & \cdots, & x_{N_B,N_F} \end{matrix} \]

are sampled from $UTM$ with respect to a probability density function $p(x,v)$ satisfying (3.4.5), following a two-step strategy: (i) sample $N_B$ points $\xi_1, \cdots, \xi_{N_B}$ i.i.d. on $M$ with respect to $\bar{p}$, the projection of $p$ on $M$ (recall (3.4.19)); (ii) sample $N_F$ points $x_{j,1}, \cdots, x_{j,N_F}$ on $S_{\xi_j}$ with respect to $p(\cdot \mid \xi_j)$, the conditional probability density on the fibre (recall (3.4.20)).

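Definition 20.

1. For $\epsilon > 0$, $\delta > 0$ and $1 \le i, j \le N_B$, $1 \le r, s \le N_F$, define

\[ \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s}) = \begin{cases} K\left( \dfrac{\|\xi_i - \xi_j\|^2}{\epsilon}, \ \dfrac{\|P_{\xi_j,\xi_i} x_{i,r} - x_{j,s}\|^2}{\delta} \right), & i \ne j, \\ 0, & i = j, \end{cases} \]

where $P_{\xi_j,\xi_i} : S_{\xi_i} \to S_{\xi_j}$ is the parallel transport from $S_{\xi_i}$ to $S_{\xi_j}$. Note the difference between $\hat{K}_{\epsilon,\delta}$ and $K_{\epsilon,\delta}$ (defined in (3.4.18)): $\hat{K}_{\epsilon,\delta}$ uses Euclidean distance while $K_{\epsilon,\delta}$ uses geodesic distance.

2. For $0 \le \alpha \le 1$, define

\[ \hat{p}_{\epsilon,\delta}(x_{i,r}) = \sum_{j=1}^{N_B} \sum_{s=1}^{N_F} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s}) \]

and the empirical $\alpha$-normalized kernel $\hat{K}^{\alpha}_{\epsilon,\delta}$

\[ \hat{K}^{\alpha}_{\epsilon,\delta}(x_{i,r}, x_{j,s}) = \frac{\hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})}{\hat{p}^{\alpha}_{\epsilon,\delta}(x_{i,r})\, \hat{p}^{\alpha}_{\epsilon,\delta}(x_{j,s})}, \quad 1 \le i, j \le N_B, \ 1 \le r, s \le N_F. \]

3. For $0 \le \alpha \le 1$ and $f \in C^{\infty}(UTM)$, denote the $\alpha$-normalized empirical hypoelliptic diffusion operator by

\[ \hat{H}^{\alpha}_{\epsilon,\delta} f(x_{i,r}) = \frac{\displaystyle\sum_{j=1}^{N_B} \sum_{s=1}^{N_F} \hat{K}^{\alpha}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s})}{\displaystyle\sum_{j=1}^{N_B} \sum_{s=1}^{N_F} \hat{K}^{\alpha}_{\epsilon,\delta}(x_{i,r}, x_{j,s})}. \]

Theorem 21 (Finite Sampling without Noise). Under Assumption 18 and Assumption 19, if

(i) $\delta = O(\epsilon)$ as $\epsilon \to 0$;

(ii) $\displaystyle \lim_{\substack{N_B \to \infty \\ N_F \to \infty}} \frac{N_F}{N_B} = \rho \in (0, \infty)$,

then for any $x_{i,r}$ with $1 \le i \le N_B$ and $1 \le r \le N_F$, as $\epsilon \to 0$ (and thus $\delta \to 0$), with high probability

\[ \begin{aligned} \hat{H}^{\alpha}_{\epsilon,\delta} f(x_{i,r}) = f(x_{i,r}) &+ \epsilon\, \frac{m_{21}}{2m_0} \left[ \frac{\Delta^{H}_{S}\left[ f p^{1-\alpha} \right](x_{i,r})}{p^{1-\alpha}(x_{i,r})} - f(x_{i,r})\, \frac{\Delta^{H}_{S} p^{1-\alpha}(x_{i,r})}{p^{1-\alpha}(x_{i,r})} \right] \\ &+ \delta\, \frac{m_{22}}{2m_0} \left[ \frac{\Delta^{V}_{S}\left[ f p^{1-\alpha} \right](x_{i,r})}{p^{1-\alpha}(x_{i,r})} - f(x_{i,r})\, \frac{\Delta^{V}_{S} p^{1-\alpha}(x_{i,r})}{p^{1-\alpha}(x_{i,r})} \right] \\ &+ O\left( \epsilon^2 + \delta^2 + \theta_{*}^{-1} N_B^{-\frac{1}{2}} \epsilon^{-\frac{d}{4}} \right), \end{aligned} \tag{3.4.29} \]

where

\[ \theta_{*} = 1 - \frac{1}{1 + \epsilon^{\frac{d}{4}} \delta^{\frac{d-1}{4}} \sqrt{\dfrac{N_F}{N_B}}}. \]

We give a proof of Theorem 21 in Appendix B.0.5.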
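To make Definition 20 concrete, the sketch below assembles the matrix of the empirical operator $\hat{H}^{\alpha}_{\epsilon,\delta}$ on a deliberately simple toy case. Everything specific here is an assumption made for illustration and is not taken from the text: the base manifold is the unit circle in $\mathbb{R}^2$ (so each fibre of the unit tangent bundle consists of the two unit tangent vectors $\pm\tau$), the base points are equally spaced rather than random, the kernel is $K(a,b) = \mathrm{bump}(a)\,\mathrm{bump}(b)$ with a hypothetical helper `bump`, and the parallel transports are supplied as explicit rotation matrices.

```python
import numpy as np

def bump(u):
    """Elementwise profile, smooth for u >= 0 and zero for u >= 1 (illustrative kernel)."""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    inside = u < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - u[inside] ** 2))
    return out

def empirical_hdm_operator(base, fibres, transports, eps, delta, alpha=1.0):
    """Matrix of the empirical operator of Definition 20, rows indexed by (i, r).

    base:       (NB, D) base samples xi_i
    fibres:     (NB, NF, D) fibre samples x_{i,r} in ambient coordinates
    transports: (NB, NB, D, D), transports[j, i] represents P_{xi_j, xi_i}
    """
    NB, NF, _ = fibres.shape
    base_d2 = np.sum((base[:, None] - base[None]) ** 2, axis=-1)          # ||xi_i - xi_j||^2
    moved = np.einsum('jiab,irb->jira', transports, fibres)               # P_{ji} x_{i,r}
    diff = moved.transpose(1, 2, 0, 3)[:, :, :, None, :] - fibres[None, None]
    fibre_d2 = np.sum(diff ** 2, axis=-1)                                 # indexed (i, r, j, s)
    K = bump(base_d2[:, None, :, None] / eps) * bump(fibre_d2 / delta)
    K[np.arange(NB), :, np.arange(NB), :] = 0.0                           # kernel vanishes for i == j
    K = K.reshape(NB * NF, NB * NF)
    p_hat = K.sum(axis=1)
    K_alpha = K / np.outer(p_hat ** alpha, p_hat ** alpha)
    return K_alpha / K_alpha.sum(axis=1, keepdims=True)

# Toy unit tangent bundle of the circle: NB = 12 base points, NF = 2 fibre points each.
NB = 12
theta = 2.0 * np.pi * np.arange(NB) / NB
base = np.stack([np.cos(theta), np.sin(theta)], axis=1)
tau = np.stack([-np.sin(theta), np.cos(theta)], axis=1)      # unit tangent at each base point
fibres = np.stack([tau, -tau], axis=1)                       # x_{i,0} = tau_i, x_{i,1} = -tau_i
phi = theta[:, None] - theta[None, :]                        # phi[j, i] = theta_j - theta_i
c, s = np.cos(phi), np.sin(phi)
transports = np.stack([np.stack([c, -s], axis=-1),
                       np.stack([s, c], axis=-1)], axis=-2)  # rotation by theta_j - theta_i

H_hat = empirical_hdm_operator(base, fibres, transports, eps=0.5, delta=1.0, alpha=1.0)
f_const = np.ones(2 * NB)
f_sign = np.tile([1.0, -1.0], NB)                            # +1 on tau_i, -1 on -tau_i
```

With these bandwidths each point is linked only to the transported copies of itself on the two adjacent base points, so both the constant function and the orientation function `f_sign` are left unchanged by `H_hat`.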

Sampling from Empirical Tangent Spaces

In practice, it has been shown in [133] that, under the manifold assumption, a local PCA procedure can be used for estimating tangent spaces from a point cloud; we are using PCA here as a procedure that determines the dimension of a good local linear approximation to the manifold, and also, conveniently, provides a good basis, which can be viewed as a basis for each tangent plane. To sample on these tangent spaces, it suffices to repeatedly sample coordinate coefficients from a fixed standard unit sphere; each sample can be interpreted as giving the coordinates of a point (approximately) on the tangent space. Parallel-transport will take the corresponding point that truly lies on the tangent space at $\xi$ to the tangent space at $\zeta$, another point on the manifold. This new tangent space is, however, again known only approximately; points in this approximate space are characterized by coordinates with respect to the local PCA basis at $\zeta$. We can thus express the whole (approximate) parallel-transport procedure by maps from coordinates with respect to the PCA basis at $\xi$ to coordinates with respect to the PCA basis at $\zeta$; these changes of coordinates incorporate information on the choices of basis at each end as well as on the parallel-transport itself. Let us now describe this in more detail, setting up notation simultaneously.

Throughout this section, Assumption 18 still holds. Let $\{\xi_1, \cdots, \xi_{N_B}\}$ be a collection of i.i.d. samples from $M$; then the local PCA procedure can be summarized as follows: for any $\xi_j$, $1 \le j \le N_B$, let $\xi_{j_1}, \cdots, \xi_{j_k}$ be its $k$ nearest neighboring points. Then

\[
X_j = \left[\xi_{j_1} - \xi_j, \cdots, \xi_{j_k} - \xi_j\right]
\]

is a $D \times k$ matrix. Let $K_{\mathrm{PCA}}$ be a positive monotonic decreasing function supported on the unit interval, e.g., the Epanechnikov kernel

\[
K_{\mathrm{PCA}}(u) = \left(1 - u^2\right)\chi_{[0,1]},
\]

where $\chi$ is the indicator function. Fix a scale parameter $\epsilon_{\mathrm{PCA}} > 0$, let $D_j$ be the $k \times k$ diagonal matrix

\[
D_j = \mathrm{diag}\left(\sqrt{K_{\mathrm{PCA}}\left(\frac{\|\xi_j - \xi_{j_1}\|}{\sqrt{\epsilon_{\mathrm{PCA}}}}\right)}, \cdots, \sqrt{K_{\mathrm{PCA}}\left(\frac{\|\xi_j - \xi_{j_k}\|}{\sqrt{\epsilon_{\mathrm{PCA}}}}\right)}\right),
\]

and carry out the singular value decomposition (SVD) of the matrix $X_j D_j$ as

\[
X_j D_j = U_j \Sigma_j V_j^{\top}.
\]

An estimated basis $B_j$ for the local tangent plane at $\xi_j$ is formed by the first $d$ left singular vectors (corresponding to the $d$ largest singular values in $\Sigma_j$), arranged into a matrix as follows:

\[
B_j = \left[u_j^{(1)}, \cdots, u_j^{(d)}\right] \in \mathbb{R}^{D \times d}.
\]

Note that the intrinsic dimension $d$ is generally not known a priori; [133] proposed estimating the dimension locally from the decay of singular values in $\Sigma_j$, and then taking the median of all local dimensions to estimate $d$; [92] proposed a different approach based on multi-scale singular value decomposition.
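The local PCA step above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `local_pca_basis`, the brute-force neighbor search, and the demo values of `k` and `eps_pca` are choices made here, not taken from the text, and the intrinsic dimension `d` is supplied by hand rather than estimated.

```python
import numpy as np

def local_pca_basis(points, j, k, d, eps_pca):
    """Estimate a D x d orthonormal basis of the tangent plane at points[j].

    points: (N, D) array of samples; k: number of neighbours;
    d: assumed intrinsic dimension; eps_pca: local PCA scale parameter.
    """
    xi = points[j]
    dists = np.linalg.norm(points - xi, axis=1)
    nbrs = np.argsort(dists)[1:k + 1]        # k nearest neighbours, skipping xi itself
    X = (points[nbrs] - xi).T                # D x k matrix X_j
    u = dists[nbrs] / np.sqrt(eps_pca)
    weights = np.where(u <= 1.0, 1.0 - u**2, 0.0)   # Epanechnikov kernel on [0, 1]
    D_j = np.diag(np.sqrt(weights))
    U, _, _ = np.linalg.svd(X @ D_j, full_matrices=False)
    return U[:, :d]                          # first d left singular vectors

# Demo (hypothetical parameters): points on the unit sphere in R^3, so d = 2.
rng = np.random.default_rng(0)
pts = rng.standard_normal((2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
B = local_pca_basis(pts, j=0, k=50, d=2, eps_pca=0.2)
```

On the sphere the true tangent plane at a point is orthogonal to the point itself, so `B.T @ pts[0]` should be close to zero; this gives a quick sanity check of the estimate.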

Once a pair of estimated bases $B_i$, $B_j$ is obtained for neighboring points $\xi_i$, $\xi_j$, one estimates a parallel-transport from $T_{\xi_i}M$ to $T_{\xi_j}M$ as

\[
O_{ji} := \arg\min_{O \in O(d)} \left\| O - B_j^{\top} B_i \right\|_{\mathrm{HS}},
\]

where $\|\cdot\|_{\mathrm{HS}}$ is the Hilbert-Schmidt norm. Though this minimization problem is non-convex, it has an efficient closed-form solution via the SVD of $B_j^{\top} B_i$, namely

\[
O_{ji} = U V^{\top}, \quad \text{where } B_j^{\top} B_i = U \Sigma V^{\top} \text{ is the SVD of } B_j^{\top} B_i.
\]

It is worth noting that $O_{ji}$ depends on the bases; it operates on the coordinates of tangent vectors under $B_i$ and $B_j$, as explained above. $O_{ji}$ approximates the true parallel-transport $P_{\xi_j,\xi_i}$ (composed with the bases-expansions) with an error of $O(\epsilon_{\mathrm{PCA}})$, in the sense of [133, Lemma B.1]. We summarize our sampling strategy for this section (with some new notations) in the following definition.

Definition 22.

1. Let $\{\xi_1, \cdots, \xi_{N_B}\}$ be a collection of samples from the base manifold $M$, i.i.d. with respect to some probability density function $p \in C^{\infty}(M)$. For each $\xi_j$, $1 \le j \le N_B$, sample $N_F$ points uniformly from the $(d-1)$-dimensional standard unit sphere $S^{d-1}$ in $\mathbb{R}^d$, and denote the set of samples as $C_j = \{c_{j,1}, \cdots, c_{j,N_F}\}$, where each $c_{j,s}$ is a $d \times 1$ column vector. Using the basis $B_j$ estimated from the local PCA procedure, each $c_{j,s}$ corresponds to an “approximate tangent vector at $\xi_j$”, denoted as

\[
\tau_{j,s} := B_j c_{j,s}.
\]

We use the notation $S_j$ for the unit sphere in the estimated tangent space (i.e., the column space of $B_j$). Note that the $\tau_{j,1}, \cdots, \tau_{j,N_F}$ are uniformly distributed on $S_j$.

2. By [133, Lemma B.1], for any $B_j$ there exists a $D \times d$ matrix $Q_j$, such that the columns of $Q_j$ constitute an orthonormal basis for $\iota_* T_{\xi_j}M$ and

\[
\|B_j - Q_j\|_{\mathrm{HS}} = O(\epsilon_{\mathrm{PCA}}).
\]

We define the tangent projection from the unit sphere $S_j$ in the estimated tangent plane to $\iota_* S_{\xi_j}$ as

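\[
\tau_{j,s} \mapsto \bar{\tau}_{j,s} = \frac{Q_j Q_j^{\top} \tau_{j,s}}{\left\| Q_j Q_j^{\top} \tau_{j,s} \right\|}.
\]

This map is well-defined for sufficiently small $\epsilon_{\mathrm{PCA}}$, and then it is an isometry. Its inverse is given by

\[
\bar{\tau}_{j,s} \mapsto \tau_{j,s} = \frac{B_j B_j^{\top} \bar{\tau}_{j,s}}{\left\| B_j B_j^{\top} \bar{\tau}_{j,s} \right\|}.
\]

Note that we have

\[
\|\tau_{j,s} - \bar{\tau}_{j,s}\| \le C \epsilon_{\mathrm{PCA}}
\]

for some constant $C > 0$ independent of indices $j, s$. Since we sample each $S_j$ uniformly and the projection map $\tau_{j,s} \mapsto \bar{\tau}_{j,s}$ is an isometry, the points $\{\bar{\tau}_{j,1}, \cdots, \bar{\tau}_{j,N_F}\}$ are also uniformly distributed on $S_{\xi_j}$. The points

\[
\begin{matrix}
\bar{\tau}_{1,1}, & \bar{\tau}_{1,2}, & \cdots, & \bar{\tau}_{1,N_F} \\
\bar{\tau}_{2,1}, & \bar{\tau}_{2,2}, & \cdots, & \bar{\tau}_{2,N_F} \\
\vdots & \vdots & \cdots & \vdots \\
\bar{\tau}_{N_B,1}, & \bar{\tau}_{N_B,2}, & \cdots, & \bar{\tau}_{N_B,N_F}
\end{matrix}
\]

are therefore distributed on $UTM$ according to a joint probability density function $\bar{p}$ on $UTM$ defined as

\[
\bar{p}(x, v) = p(x), \quad \forall (x, v) \in UTM.
\]

As in Assumption 19, we assume $\bar{p}$ satisfies (3.4.5), i.e.,

\[
0 < p_m \le \bar{p}(x, v) = p(x) \le p_M < \infty, \quad \forall (x, v) \in UTM
\]

for positive constants $p_m$, $p_M$.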

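3. For $\epsilon > 0$, $\delta > 0$ and $1 \le i, j \le N_B$, $1 \le r, s \le N_F$, define

\[
K_{\epsilon,\delta}(\bar{\tau}_{i,r}, \bar{\tau}_{j,s}) =
\begin{cases}
K\left(\dfrac{\|\xi_i - \xi_j\|^2}{\epsilon}, \dfrac{\|O_{ji} c_{i,r} - c_{j,s}\|^2}{\delta}\right), & i \ne j, \\
0, & i = j,
\end{cases}
\]

where $O_{ji}$ is the estimated parallel-transport from $T_{\xi_i}M$ to $T_{\xi_j}M$.

4. For $0 \le \alpha \le 1$, define

\[
\hat{q}_{\epsilon,\delta}(\bar{\tau}_{i,r}) = \sum_{j=1}^{N_B}\sum_{s=1}^{N_F} K_{\epsilon,\delta}(\bar{\tau}_{i,r}, \bar{\tau}_{j,s})
\]

and

\[
K^{\alpha}_{\epsilon,\delta}(\bar{\tau}_{i,r}, \bar{\tau}_{j,s}) = \frac{K_{\epsilon,\delta}(\bar{\tau}_{i,r}, \bar{\tau}_{j,s})}{\hat{q}^{\alpha}_{\epsilon,\delta}(\bar{\tau}_{i,r})\, \hat{q}^{\alpha}_{\epsilon,\delta}(\bar{\tau}_{j,s})}, \quad 1 \le i, j \le N_B,\ 1 \le r, s \le N_F.
\]

5. For $0 \le \alpha \le 1$ and $f \in C^{\infty}(UTM)$, denote

\[
H^{\alpha}_{\epsilon,\delta} f(\bar{\tau}_{i,r}) = \frac{\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} K^{\alpha}_{\epsilon,\delta}(\bar{\tau}_{i,r}, \bar{\tau}_{j,s})\, f(\bar{\tau}_{j,s})}{\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} K^{\alpha}_{\epsilon,\delta}(\bar{\tau}_{i,r}, \bar{\tau}_{j,s})}.
\]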
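The ingredients of Definition 22 that depend on the estimated bases can be sketched as follows. The helper names (`estimate_transport`, `sample_unit_sphere`, `kernel_entry`) are hypothetical, and the Gaussian profile used for $K$ is an assumption borrowed from the numerical experiments; the definition itself allows a general kernel. The case $i = j$, where the kernel is set to zero, is left to the caller.

```python
import numpy as np

def estimate_transport(B_i, B_j):
    """Closest orthogonal matrix (in Hilbert-Schmidt norm) to B_j^T B_i."""
    U, _, Vt = np.linalg.svd(B_j.T @ B_i)
    return U @ Vt

def sample_unit_sphere(n_f, d, rng):
    """n_f points drawn uniformly from S^{d-1}; one coefficient vector per row."""
    c = rng.standard_normal((n_f, d))
    return c / np.linalg.norm(c, axis=1, keepdims=True)

def kernel_entry(xi_i, xi_j, c_ir, c_js, O_ji, eps, delta):
    """Gaussian stand-in for K(|xi_i - xi_j|^2 / eps, |O_ji c_ir - c_js|^2 / delta)."""
    base = np.sum((xi_i - xi_j) ** 2) / eps
    fibre = np.sum((O_ji @ c_ir - c_js) ** 2) / delta
    return np.exp(-(base + fibre))

# Demo: two bases of the same plane that differ by an in-plane rotation R.
phi = 0.3
R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
B_i = np.eye(3)[:, :2]
B_j = B_i @ R
O_ji = estimate_transport(B_i, B_j)      # equals R^T in this special case
C = sample_unit_sphere(5, 2, np.random.default_rng(1))
```

In the demo, $B_j^{\top} B_i$ is already orthogonal, so the minimizer coincides with it; for bases of two nearby but distinct tangent planes the product is only approximately orthogonal, and the SVD step replaces it by its orthogonal polar factor.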

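Theorem 23 (Finite Sampling from Empirical Tangent Planes). In addition to Assumption 18, suppose

(i) $\epsilon_{\mathrm{PCA}} = O\left(N_B^{-\frac{2}{d+2}}\right)$ as $N_B \to \infty$;

(ii) as $\epsilon \to 0$, $\delta = O(\epsilon)$ and $\delta \gg \epsilon_{\mathrm{PCA}}^{1/2} + \epsilon^{3/2}$;

(iii) $\lim_{N_B \to \infty,\ N_F \to \infty} N_F / N_B = \rho \in (0, \infty)$.

Then for any $\tau_{i,r}$ with $1 \le i \le N_B$ and $1 \le r \le N_F$, as $\epsilon \to 0$ (and thus $\delta \to 0$), with high probability

\[
\begin{aligned}
H^{\alpha}_{\epsilon,\delta} f(\bar{\tau}_{i,r}) = f(\bar{\tau}_{i,r})
&+ \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta^H_S[f p^{1-\alpha}](\bar{\tau}_{i,r})}{p^{1-\alpha}(\bar{\tau}_{i,r})} - f(\bar{\tau}_{i,r})\,\frac{\Delta^H_S p^{1-\alpha}(\bar{\tau}_{i,r})}{p^{1-\alpha}(\bar{\tau}_{i,r})}\right] \\
&+ \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta^V_S[f p^{1-\alpha}](\bar{\tau}_{i,r})}{p^{1-\alpha}(\bar{\tau}_{i,r})} - f(\bar{\tau}_{i,r})\,\frac{\Delta^V_S p^{1-\alpha}(\bar{\tau}_{i,r})}{p^{1-\alpha}(\bar{\tau}_{i,r})}\right] \\
&+ O\left(\epsilon^2 + \delta^2 + \theta_*^{-1} N_B^{-1/2} \epsilon^{-d/4} + \delta^{-1}\left(\epsilon_{\mathrm{PCA}}^{1/2} + \epsilon^{3/2}\right)\right),
\end{aligned}
\tag{3.4.30}
\]

where

\[
\theta_* = 1 - \frac{1}{1 + \epsilon^{d/4}\,\delta^{(d-1)/4}\,\sqrt{N_F / N_B}}.
\]

We give a proof of Theorem 23 in Appendix B.0.5.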

3.5 Numerical Experiments and the Riemannian Adiabatic Limits

In this section, we consider a numerical experiment on $SO(3)$, the special orthogonal group of dimension 3, realized as the unit tangent bundle of the standard sphere $S^2$ in $\mathbb{R}^3$. We shall compare both sampling strategies covered in Section 3.4.

3.5.1 Sampling without Noise

In the first step, we uniformly sample $N_B = 2{,}000$ points $\{\xi_1, \cdots, \xi_{N_B}\}$ on the unit sphere $S^2$, and find for each sample point the $K_B = 100$ nearest neighbors in the point cloud. Next, we sample $N_F = 50$ vectors of unit length tangent to the unit sphere at each sample point (the fibre in this case is a circle), thus collecting a total of $N_B \times N_F = 100{,}000$ points on $UTS^2 = SO(3)$, denoted as

\[
\{x_{j,s} \mid 1 \le j \le N_B,\ 1 \le s \le N_F\}.
\]

The hypoelliptic diffusion matrix $H$ is then constructed as an $N_B \times N_B$ block matrix with block size $N_F \times N_F$, and $H_{ij}$ (the $(i,j)$-th block of $H$) is non-zero only if the sample points $\xi_i$, $\xi_j$ are each among the $K_B$-nearest neighbors of the other; when $H_{ij}$ is non-zero, its $(r,s)$-entry ($1 \le r, s \le N_F$) is non-zero only if $P_{\xi_j,\xi_i} x_{i,r}$ and $x_{j,s}$ are each among the $K_F = 50$ nearest neighbors of the other, and in that case

\[
\begin{aligned}
H_{ij}(r,s) &= \exp\left(-\frac{\|\xi_i - \xi_j\|^2}{\epsilon}\right) \exp\left(-\frac{\left\|P_{\xi_j,\xi_i} x_{i,r} - x_{j,s}\right\|^2}{\delta}\right) \\
&= \exp\left[-\left(\frac{\|\xi_i - \xi_j\|^2}{\epsilon} + \frac{\left\|P_{\xi_j,\xi_i} x_{i,r} - x_{j,s}\right\|^2}{\delta}\right)\right],
\end{aligned}
\tag{3.5.1}
\]

where the choices of $\epsilon$, $\delta$ will be explained below. Note that for the unit sphere $S^2$ the parallel-transport from $T_{\xi_i}S^2$ to $T_{\xi_j}S^2$ can be explicitly constructed as a rotation about the axis $\xi_i \times \xi_j$. Finally, we form the $\alpha$-normalized hypoelliptic diffusion matrix $H_\alpha$ by

\[
(H_\alpha)_{ij}(r,s) = \frac{H_{ij}(r,s)}{\left(\sum_{l=1}^{N_B}\sum_{m=1}^{N_F} H_{il}(r,m)\right)^{\alpha} \left(\sum_{k=1}^{N_B}\sum_{n=1}^{N_F} H_{jk}(s,n)\right)^{\alpha}},
\tag{3.5.2}
\]

and solve the eigenvalue problem

\[
\left(D^{-1/2} H_\alpha D^{-1/2}\right) U = U \Lambda
\tag{3.5.3}
\]

where $D$ is the $(N_B N_F) \times (N_B N_F)$ diagonal matrix with entry $(k,k)$ equal to the $k$-th row sum of $H_\alpha$ (which equals the $k$-th column sum, since $H_\alpha$ is symmetric):

\[
D(k,k) = \sum_{v=1}^{N_B N_F} H_\alpha(k,v),
\]

and $\Lambda$ is a diagonal matrix of the same dimensions. Throughout this experiment, we fix $\alpha = 1$, $\epsilon = 0.2$ and choose various values of $\delta$ ranging from 0.0005 to 50, and observe the spacing of the eigenvalues stored in $\Lambda$.
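The rotation-based parallel-transport on $S^2$ and a single entry of (3.5.1) can be sketched in plain Python as below. This is an illustrative sketch, not the code used for the experiment: the function names are hypothetical, the rotation is written with Rodrigues' formula, the two base points are assumed to be neither equal nor antipodal, and the nearest-neighbor truncation is omitted.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def parallel_transport_s2(xi_i, xi_j, x):
    """Rotate the tangent vector x at xi_i about the axis xi_i x xi_j onto T_{xi_j} S^2."""
    axis = cross(xi_i, xi_j)
    s = math.sqrt(dot(axis, axis))       # sine of the angle between the base points
    co = dot(xi_i, xi_j)                 # cosine of the same angle
    k = tuple(a / s for a in axis)       # unit rotation axis
    kxv, kv = cross(k, x), dot(k, x)
    # Rodrigues' rotation formula
    return tuple(x[m]*co + kxv[m]*s + k[m]*kv*(1.0 - co) for m in range(3))

def h_entry(xi_i, xi_j, x_ir, x_js, eps, delta):
    """Entry (3.5.1), before any nearest-neighbour truncation."""
    base = sum((a - b) ** 2 for a, b in zip(xi_i, xi_j))
    px = parallel_transport_s2(xi_i, xi_j, x_ir)
    fibre = sum((a - b) ** 2 for a, b in zip(px, x_js))
    return math.exp(-(base / eps + fibre / delta))

# Demo: transport from the north pole to a point on the equator.
north, east = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
moved = parallel_transport_s2(north, east, (1.0, 0.0, 0.0))
fixed = parallel_transport_s2(north, east, (0.0, 1.0, 0.0))
```

In the demo, the vector pointing along the connecting great circle is carried to the tangent of that circle at the endpoint, while the vector along the rotation axis is left unchanged, as expected for parallel-transport along a geodesic of the round sphere.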

The purpose of this experiment is to investigate the influence of the ratio $\gamma = \delta/\epsilon$ on the spectral behavior of graph hypoelliptic Laplacians. As shown in Figure 3.4, the spacing in the spectrum of these graph hypoelliptic Laplacians follows patterns similar to the multiplicities of the eigenvalues of the corresponding Laplacians on $SO(3)$ (governed by the relative size of $\delta$ and $\epsilon$). In Figure 3.4(a), $\delta \ll \epsilon$, hence the graph hypoelliptic Laplacian approximates the horizontal Laplacian on $SO(3)$ (according to Theorem 14), in which the smallest eigenvalues have multiplicities $1, 6, 13, \cdots$; in Figure 3.4(b), $\delta = O(\epsilon)$, hence the graph hypoelliptic Laplacian approximates the total Laplacian on $SO(3)$ (according to Theorem 14), with eigenvalue multiplicities $1, 9, 25, \cdots$; in Figure 3.4(c), $\delta \gg \epsilon$, hence the graph hypoelliptic Laplacian approximates the Laplacian on the base manifold $S^2$ (according to Corollary 16), with eigenvalue multiplicities $1, 3, 5, \cdots$. Note that in Figure 3.4(c) we fixed $\epsilon$ and pushed $\delta$ to $\infty$, which is essentially equivalent to the limit process in (3.4.24) rather than (3.4.25). Moreover, if in each figure we divide the sequence of eigenvalues by the smallest non-zero eigenvalue, the resulting sequence coincides with the eigenvalues of the corresponding manifold Laplacian up to numerical error. For a description of the spectrum of these partial differential operators, see [146, Chapter 2].

Figure 3.4: Bar plots of the smallest 36 eigenvalues of 3 graph hypoelliptic Laplacians with fixed $\epsilon = 0.2$ and varying $\delta$ (sampling without noise). Panels: (a) $\delta = 0.002$; (b) $\delta = 0.015$; (c) $\delta = 20$.
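The three multiplicity patterns quoted above can be reproduced from a simple representation-theoretic model. The sketch below assumes (this is not spelled out in the text) that on the matrix coefficients $D^l_{mn}$ of $SO(3)$, with integers $l \ge 0$ and $-l \le m, n \le l$, the horizontal Laplacian has eigenvalue $l(l+1) - n^2$ and the vertical Laplacian has eigenvalue $n^2$, up to sign and overall scaling; functions on the base $S^2$ correspond to $n = 0$. The function name and the parameter `weight` are hypothetical.

```python
from collections import Counter

def multiplicities(weight, l_max=8, base_only=False):
    """Multiplicities of the smallest eigenvalues, in increasing order of eigenvalue.

    weight scales the vertical part n^2; base_only keeps only the fibre-invariant
    modes (n = 0), i.e. functions pulled back from the base sphere.
    """
    counts = Counter()
    for l in range(l_max + 1):
        for n in ([0] if base_only else range(-l, l + 1)):
            value = l * (l + 1) - n * n + weight * n * n
            counts[value] += 2 * l + 1       # one copy for each index m
    return [counts[v] for v in sorted(counts)]

horizontal = multiplicities(weight=0)[:3]              # delta << epsilon
total = multiplicities(weight=1)[:3]                   # delta comparable to epsilon
base = multiplicities(weight=0, base_only=True)[:3]    # delta >> epsilon
```

Only the leading entries of each list are meaningful, because the truncation at `l_max` leaves the counts of larger eigenvalues incomplete.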

3.5.2 Sampling from Empirical Tangent Spaces

As in sampling without noise, we uniformly sample $N_B$ points $\{\xi_1, \cdots, \xi_{N_B}\}$ on the unit sphere in the first step, then construct the $K_B$-nearest-neighbor graph for the point cloud with $K_B = 100$; the only difference is that here $N_B = 4{,}000$. (This finer discretization is necessary since we know from Theorem 23 and Theorem 21 that sampling from empirically estimated tangent spaces results in a slower convergence rate for HDM on unit tangent bundles. For the same reason we choose a larger $N_F$, see below.) Next, we perform local PCA (with $K_{\mathrm{PCA}}(u) = e^{-5u^2}\chi_{[0,1]}$) in the $K_B$-neighborhood around each sample point $\xi_j$, and solve for $O_{ji}$ from the local PCA bases $B_i$, $B_j$ whenever $\xi_i$, $\xi_j$ are among the $K_B$-nearest neighbors of each other. We then sample $N_F = 100$ points from the standard unit circle $S^1$ in $\mathbb{R}^2$ for each $\xi_j$, and denote them as

\[
\{c_{j,s} \mid 1 \le j \le N_B,\ 1 \le s \le N_F\}.
\]

The block construction of the hypoelliptic diffusion matrix $H$ is similar to the noiseless case, but with non-zero $H_{ij}(r,s)$ replaced with

\[
\begin{aligned}
H_{ij}(r,s) &= \exp\left(-\frac{\|\xi_i - \xi_j\|^2}{\epsilon}\right) \exp\left(-\frac{\|O_{ji} c_{i,r} - c_{j,s}\|^2}{\delta}\right) \\
&= \exp\left[-\left(\frac{\|\xi_i - \xi_j\|^2}{\epsilon} + \frac{\|O_{ji} c_{i,r} - c_{j,s}\|^2}{\delta}\right)\right].
\end{aligned}
\tag{3.5.4}
\]

Finally, we construct $H_\alpha$ as in (3.5.2), set $\alpha = 1$, and solve the same eigenvalue problem (3.5.3) with fixed $\epsilon = 0.2$ and varying $\delta$. As shown in Figure 3.5, the spacing of the spectrum of graph hypoelliptic Laplacians is quite similar to what was obtained in sampling without noise.
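The two normalization steps, (3.5.2) followed by the symmetric normalization in (3.5.3), can be sketched on a small dense matrix as follows. The sketch uses a single flattened index for the pair (block, within-block position), plain Python lists instead of sparse block storage, and hypothetical function names; it illustrates the algebra only and makes no attempt at the scale of the actual experiment.

```python
import math

def alpha_normalize(H, alpha):
    """Divide H[a][b] by (row sum a)^alpha * (row sum b)^alpha, as in (3.5.2)."""
    q = [sum(row) for row in H]
    n = len(H)
    return [[H[a][b] / (q[a] ** alpha * q[b] ** alpha) for b in range(n)]
            for a in range(n)]

def symmetric_normalize(H_alpha):
    """Return D^{-1/2} H_alpha D^{-1/2} together with the diagonal of D."""
    d = [sum(row) for row in H_alpha]
    n = len(H_alpha)
    S = [[H_alpha[a][b] / math.sqrt(d[a] * d[b]) for b in range(n)]
         for a in range(n)]
    return S, d

# Demo on a small symmetric matrix with positive entries.
H = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.4],
     [0.2, 0.4, 1.0]]
S, d = symmetric_normalize(alpha_normalize(H, alpha=1.0))
v = [math.sqrt(x) for x in d]
Sv = [sum(S[a][b] * v[b] for b in range(3)) for a in range(3)]
```

A useful check is that the vector with entries $\sqrt{D(k,k)}$ is an eigenvector of the normalized matrix with eigenvalue 1, which is what `Sv` compared with `v` verifies; the graph Laplacian spectrum discussed in the text is obtained from the remaining eigenvalues.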

3.5.3 As-Flat-As-Possible (AFAP) Connections

The purpose of this experiment is to provide some insights into the embeddings introduced in (3.3.16) and (3.3.19). As mentioned in Section 3.3.3, the embedding resulting from hypoelliptic diffusion maps tends to map “similar points” (where the similarity is specified by the pairwise correspondences) on different fibres to points on a common “template” fibre that are close to each other with respect to the Euclidean space into which the embedding takes place. This is illustrated in Figure 3.6 using the HDM obtained from the unit sphere example in Section 3.5.2, with $N_B = 4{,}000$, $N_F = 100$, $K_B = 100$, $K_F = 50$, $\epsilon = 0.2$, and $\delta = 0.042$.

Figure 3.5: Bar plots of the smallest 36 eigenvalues of 3 graph hypoelliptic Laplacians with fixed $\epsilon = 0.2$ and varying $\delta$ (sampling from empirical tangent spaces). Panels: (a) $\delta = 0.002$; (b) $\delta = 0.042$; (c) $\delta = 20$.

Figure 3.6: The vector field (near $\xi_j$) on $S^2$ determined by (3.5.5).

We pick an arbitrary point $x_{j,s}$ on $S_{\xi_j}$, the $j$-th fibre sampled from the unit tangent bundle, which stands for a unit tangent vector (see the black arrow in Figure 3.6) to the unit sphere at $\xi_j$, the $j$-th sample. On each fibre $S_{\xi_k}$ where $1 \le k \ne j \le N_B$, we then look for

\[
\tilde{P}_{\xi_k,\xi_j} x_{j,s} := \arg\min_{x_{k,r} \in S_{\xi_k}} \left\| \tilde{H}^t_k(x_{k,r}) - \tilde{H}^t_j(x_{j,s}) \right\|
\tag{3.5.5}
\]

where $\tilde{H}^t_k$ is defined in (3.3.21), and we choose $t = 1$. The resulting collection of unit tangent vectors

\[
\tilde{\Gamma} := \{x_{j,s}\} \cup \left\{ \tilde{P}_{\xi_k,\xi_j} x_{j,s} \mid 1 \le k \ne j \le N_B \right\}
\tag{3.5.6}
\]

gives rise to a discretization of a section on the unit tangent bundle $UTS^2$; this discretization can then be extended to the entire $S^2$ by interpolation. Since the connection we used in this construction of HDM is the canonical Levi-Civita connection, the similarity between two points on different fibres is measured according to their deviation from being parallel along geodesic segments to each other; therefore, each $\tilde{P}_{\xi_k,\xi_j} x_{j,s}$ stands for the unit tangent vector in the fibre $S_{\xi_k}$ that is the closest to $P_{\xi_k,\xi_j} x_{j,s}$ among all discrete samples $\{x_{k,l} \mid 1 \le l \le N_F\}$. As shown in Figure 3.6, near $\xi_j$ the vector field $\tilde{\Gamma}$ (extended by interpolation) is approximately constructed from parallel-transporting $x_{j,s}$ to its neighboring fibres along geodesic segments. HDM thus implicitly constructs vector fields on $S^2$ that are locally as close to a parallel vector field as possible, though we know from differential topology that, globally, there is no “truly parallel” unit-norm vector field on the manifold $S^2$. In this particular example, the “as-parallel-as-possible vector fields” produced by HDM can also be interpreted as generated by a connection that is as flat as possible (AFAP), or as close to trivial as possible. A related construction on triangular meshes can be found in [43], which relies heavily on the connectivity information stored in the mesh structure. It is worthwhile to note that our approach using HDM is fundamentally different, in that our computational approach uses only a random neighborhood graph of the point cloud constituted by approximate samples of the manifold, as opposed to a structured triangular mesh.

3.5.4 An Excursion to the Riemannian Adiabatic Limits

The formulae (3.5.1) and (3.5.4) provide an alternative interpretation (other than the one given by the HDM framework developed in this paper) for our numerical experiments in Section 3.5.1 and Section 3.5.2, as follows. If we set $\gamma = \delta/\epsilon$, then

\[
\exp\left[-\left(\frac{\|\xi_i - \xi_j\|^2}{\epsilon} + \frac{\left\|P_{\xi_j,\xi_i} x_{i,r} - x_{j,s}\right\|^2}{\delta}\right)\right]
= \exp\left[-\frac{1}{\epsilon}\left(\|\xi_i - \xi_j\|^2 + \frac{1}{\gamma}\left\|P_{\xi_j,\xi_i} x_{i,r} - x_{j,s}\right\|^2\right)\right],
\]

and our numerical experiments can be understood as applying the standard diffusion map (with bandwidth parameter $\epsilon$) to the total manifold $SO(3)$, except that the total manifold is equipped with a family of metrics different from the canonical bi-invariant one. These metrics all rely on the splitting of the tangent bundle of $S^2$ by the Levi-Civita connection (as defined in Appendix A, see (A.0.5)), and are formed by recombining the horizontal and vertical components of the Sasaki metric tensor using a parameter $\gamma > 0$ that controls the relative weight of the two components. This is in contrast to the interpretation given by the hypoelliptic diffusion map framework: assuming $\gamma > 0$ is fixed (implying $\delta = O(\epsilon)$), recall from Theorem 14 that

\[
\lim_{\epsilon \to 0} \frac{H^{1}_{\epsilon,\gamma\epsilon} f(x) - f(x)}{\epsilon} = \frac{m_{21}}{2m_0}\left(\Delta^H_S + \gamma\,\frac{m_{22}}{m_{21}}\,\Delta^V_S\right) f(x),
\tag{3.5.7}
\]

thus $\gamma$ controls the infinitesimal generator of the diffusion process in consideration, while the metric on the total manifold $SO(3)$ is fixed. (Though we do not fix $\gamma$ in our numerical experiments, the limit in (3.5.7) still provides insights for small values of $\epsilon > 0$.) This duality between metrics and infinitesimal generators, reflected in the change of the relative size of the bandwidth parameters, is a natural consequence from a differential geometric point of view: the Laplace-Beltrami operator on a Riemannian manifold depends on the choice of the Riemannian metric tensor; while the bandwidth parameters are characteristics of the chosen diffusion map, they can equivalently be interpreted as deformations of the underlying metric tensor. We would also like to mention related work that investigated the link between bandwidth and kernel density estimation [23], as well as recent progress in analyzing diffusion kernels with data-dependent bandwidth [15]; their relation with HDM will be explored in more detail in future work.

The decomposition of tangent spaces of the total manifold is not only an essential element in the HDM framework but also the source of the duality relation discussed above. In differential geometry, such a decomposition can be studied in the broader context of Riemannian submersions, the purpose of which is to study the index theory for a family of smooth manifolds (parametrized by a base manifold). It is then important to “blow up” the horizontal component of the metric so as to extract the fibre information; the approach adopted there is formally similar to the metric deformation we utilize in HDM, except that the parameter $\gamma$ multiplies the horizontal component of the metric tensor and is sent to $\infty$ in the limit process (known as the Riemannian Adiabatic Limit [19, 18]). Though there is thus a close relation between that approach and HDM, we emphasize that our main focus here is the spectral geometry of the fibres rather than their topological invariants.

3.6 Discussion and Future Work

This paper introduced hypoelliptic diffusion maps (HDM), a novel semi-supervised learning framework for the analysis and organization of a class of complex data sets, in which individual structures at each data point carry abundant information that cannot be easily extracted by a pairwise similarity measure. We also introduced the fibre bundle assumption, a generalization of the manifold assumption, and proved that under this assumption HDM provides embeddings for both the base and the total manifold; furthermore, the flexibility of the HDM framework enables us to view VDM and the standard diffusion maps (DM) as special cases. The rest of the paper focused on analyzing HDM on the tangent and unit tangent bundles of closed Riemannian manifolds, with convergence rates estimated for finite sampling on unit tangent bundles. These results provide the mathematical foundation for HDM on tangent bundles, and motivate further studies concerning both wider applicability and deeper mathematical understanding of the algorithmic framework. We conclude this paper with a few potential directions for further exploration.

1) HDM on General Fibre Bundles. We are interested in providing a more general mathematical framework for studying HDM on a wider class of fibre bundles. This is necessary and interesting, since data sets of interest to HDM (such as shape collections or persistence diagrams) are naturally modeled on fibre bundles that are more general than tangent and unit tangent bundles. The theory of shape spaces [105] is of particular importance in this direction, since the concepts of horizontal and vertical Laplacians are readily available in the sub-Riemannian literature [104, 125, 10].

2) Spectral Convergence of HDM. The convergence results in this paper are pointwise; as in [13, 136], we believe that it is possible to show the convergence of the eigenvalues and eigenvectors of the graph hypoelliptic Laplacians to the eigenvalues and eigenvectors of the manifold hypoelliptic Laplacians, thus establishing the mathematical foundation for the spectral analysis of HDM. Moreover, the hypoelliptic diffusion maps differ from diffusion maps and vector diffusion maps in that the fibres tend to be registered to a common “template”, which, to our knowledge, is a new phenomenon that is addressed here for the first time.

3) Spectral Clustering and Cheeger-Type Inequalities. An important application of graph Laplacians is spectral clustering (graph partitioning). In a simple case, for a connected graph, the eigenvector corresponding to the smallest positive eigenvalue of the graph Laplacian partitions the graph vertices into two similarly sized subsets, in such a way that the number of edges across the subsets is as small as possible. In spectral graph theory [37], the classical Cheeger's Inequality provides upper and lower bounds for the performance of the partition; recently, [7] established similar results for the graph connection Laplacian, the central object of VDM. We believe that similar inequalities can be established for graph hypoelliptic Laplacians as well, with potentially more interesting behavior of the eigenvectors. For instance, we observed in practice that the eigenvector corresponding to the smallest positive eigenvalue of the graph hypoelliptic Laplacian stably partitions all the fibres in a globally consistent manner.

4) Multiscale Analysis and Hierarchical Coarse-Graining. Multiscale representation of massive, complex data sets based on similarity graphs is an interesting and fruitful application of diffusion operators [86, 42]. Based on HDM, one can build a similar theory for data sets possessing fibre bundle structures, providing a natural framework for coarse-graining that is meaningful (or even possible) only when performed simultaneously on the base and fibre manifolds. Moreover, since the hypoelliptic diffusion matrix is often of high dimensionality, an efficient approach to store and compute its powers will significantly improve the applicability of the HDM algorithm. We thus expect to develop a theory of hypoelliptic diffusion wavelets and investigate their performance on real data sets with underlying fibre bundle structures.

4

Hypoelliptic Diffusion Maps on the Heisenberg Group

Chapter 3 established the theory of hypoelliptic diffusion maps (HDM) on tangent and unit tangent bundles, and pointed out that similar constructions apply to much wider classes of fibre bundles. In this chapter, we present such a construction using the Heisenberg group, a classical object studied in sub-Riemannian geometry and mathematical physics.

4.1 The Heisenberg Group $H_3(\mathbb{R})$

There are many different ways to consider the Heisenberg Group $H_3(\mathbb{R})$. For example, it can be considered as a $C^{\infty}$ structure induced by the central extension of the symplectic vector space $\mathbb{R}^2$ [51]. The point of view we adopt here is completely differential-geometric.

Definition 24 ($\mathbb{R}^3$ as the Heisenberg Group $H_3(\mathbb{R})$). The Heisenberg group $H_3(\mathbb{R})$ is defined as the trivial $\mathbb{R}$-bundle over $\mathbb{R}^2$, equipped with a connection specified by the zero-level set of the differential form

\[
\theta = dz - c\,(x\,dy - y\,dx),
\]

where $\{dx, dy\}$ is a coframe for the base manifold $\mathbb{R}^2$, $dz$ is a coframe for the fibre $\mathbb{R}$, and $c \ne 0$ is a fixed constant.

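As a preparation, we compute the canonical lifts of $\partial/\partial x$, $\partial/\partial y$ from $\mathbb{R}^2$ to $H_3(\mathbb{R})$ through the connection defined by $\theta$. Denote the canonical projection $\pi : H_3(\mathbb{R}) \to \mathbb{R}^2$ by $(x, y, z) \mapsto (x, y)$, and $L(\partial/\partial x)$, $L(\partial/\partial y)$ the lifted images of $\partial/\partial x$, $\partial/\partial y$, respectively. Suppose

\[
L\left(\frac{\partial}{\partial x}\right) = A(x,y,z)\,\frac{\partial}{\partial x} + B(x,y,z)\,\frac{\partial}{\partial y} + C(x,y,z)\,\frac{\partial}{\partial z},
\]

then

\[
\pi\left[L\left(\frac{\partial}{\partial x}\right)\right] = \frac{\partial}{\partial x} \ \Rightarrow\ A(x,y,z) = 1,\ B(x,y,z) = 0
\]

and

\[
\theta\left[L\left(\frac{\partial}{\partial x}\right)\right] = 0 \ \Rightarrow\ C(x,y,z) = c\,\big(x\,B(x,y,z) - y\,A(x,y,z)\big) = -cy,
\]

hence

\[
L\left(\frac{\partial}{\partial x}\right) = \frac{\partial}{\partial x} - cy\,\frac{\partial}{\partial z}.
\]

Similarly,

\[
L\left(\frac{\partial}{\partial y}\right) = \frac{\partial}{\partial y} + cx\,\frac{\partial}{\partial z}.
\]

Therefore, it is straightforward to lift any first-order differential operator on $\mathbb{R}^2$ to $H_3(\mathbb{R})$, by simply replacing $\partial/\partial x$, $\partial/\partial y$ with $L(\partial/\partial x)$, $L(\partial/\partial y)$. One such example is the gradient operator on $\mathbb{R}^2$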

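\[
\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right), \quad f \in C^{\infty}(\mathbb{R}^2),
\tag{4.1.1}
\]

for which we define the horizontal gradient operator of any smooth function on $H_3(\mathbb{R})$ by

\[
\nabla^H f = \left(\frac{\partial f}{\partial x} - cy\,\frac{\partial f}{\partial z},\ \frac{\partial f}{\partial y} + cx\,\frac{\partial f}{\partial z}\right), \quad f \in C^{\infty}(H_3(\mathbb{R})).
\tag{4.1.2}
\]

Furthermore, it is equally immediate to lift any diffusion operator on $\mathbb{R}^2$ to $H_3(\mathbb{R})$, simply by translating the former into its Hörmander form [51] and replacing $\partial/\partial x$, $\partial/\partial y$ with their lifted images $L(\partial/\partial x)$, $L(\partial/\partial y)$, respectively. For example, the Laplace-Beltrami operator on $\mathbb{R}^2$, denoted as

\[
\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} = \frac{\partial}{\partial x}\frac{\partial}{\partial x} + \frac{\partial}{\partial y}\frac{\partial}{\partial y},
\tag{4.1.3}
\]

can be lifted to $\mathbb{R}^3$ as

\[
\begin{aligned}
\Delta^H &= L\left(\frac{\partial}{\partial x}\right) L\left(\frac{\partial}{\partial x}\right) + L\left(\frac{\partial}{\partial y}\right) L\left(\frac{\partial}{\partial y}\right) \\
&= \left(\frac{\partial}{\partial x} - cy\,\frac{\partial}{\partial z}\right)\left(\frac{\partial}{\partial x} - cy\,\frac{\partial}{\partial z}\right) + \left(\frac{\partial}{\partial y} + cx\,\frac{\partial}{\partial z}\right)\left(\frac{\partial}{\partial y} + cx\,\frac{\partial}{\partial z}\right) \\
&= \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + c^2\left(x^2 + y^2\right)\frac{\partial^2}{\partial z^2} - 2cy\,\frac{\partial^2}{\partial x\,\partial z} + 2cx\,\frac{\partial^2}{\partial y\,\partial z}.
\end{aligned}
\tag{4.1.4}
\]

Here the superscript $H$ stands for “horizontal”. Note that $\Delta^H$ is known as a horizontal Laplacian [146] on the Heisenberg group; we refer to $\Delta^H$ as the horizontal Laplace-Beltrami operator on $H_3(\mathbb{R})$. Specifying the “vertical” counterparts of these differential operators is much more straightforward. We define the vertical gradient operator as

\[
\nabla^V f = \left(0, 0, \frac{\partial f}{\partial z}\right), \quad f \in C^{\infty}(H_3(\mathbb{R})),
\tag{4.1.5}
\]

and the vertical Laplace-Beltrami operator

\[
\Delta^V = \frac{\partial^2}{\partial z^2} = \frac{\partial}{\partial z}\frac{\partial}{\partial z}.
\tag{4.1.6}
\]

In the following sections, we construct HDM on the Heisenberg group $\mathbb{R}^3$, and the limit partial differential operators will be linear combinations of $\Delta^H$ and $\Delta^V$.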
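The expansion (4.1.4) can be checked numerically on a test function. The sketch below is an illustration only: the cubic polynomial `f`, the evaluation point, and the value of `c` are arbitrary choices made here. It uses the fact that the coefficients of $L(\partial/\partial x)$ do not change along its own direction (and likewise for $L(\partial/\partial y)$), so applying each lifted field twice is a second derivative along a fixed direction, which a central second difference reproduces exactly for cubic polynomials up to rounding.

```python
def f(x, y, z):
    """Arbitrary cubic test function."""
    return x * z + y * z**2 + x**2 * y

def second_directional(fn, p, u, h=0.1):
    """Central second difference of fn at p along the fixed direction u."""
    plus = fn(*(a + h * b for a, b in zip(p, u)))
    minus = fn(*(a - h * b for a, b in zip(p, u)))
    return (plus - 2.0 * fn(*p) + minus) / h**2

def horizontal_laplacian_fd(fn, p, c):
    """Apply L(d/dx) twice plus L(d/dy) twice via finite differences."""
    x, y, _ = p
    lx = (1.0, 0.0, -c * y)    # L(d/dx) = d/dx - c y d/dz
    ly = (0.0, 1.0, c * x)     # L(d/dy) = d/dy + c x d/dz
    return second_directional(fn, p, lx) + second_directional(fn, p, ly)

def horizontal_laplacian_formula(p, c):
    """Right-hand side of (4.1.4) with the exact second partials of f."""
    x, y, z = p
    f_xx, f_yy, f_zz, f_xz, f_yz = 2 * y, 0.0, 2 * y, 1.0, 2 * z
    return f_xx + f_yy + c**2 * (x**2 + y**2) * f_zz - 2 * c * y * f_xz + 2 * c * x * f_yz

point, c = (0.3, -0.7, 1.1), 0.5
fd_value = horizontal_laplacian_fd(f, point, c)
exact_value = horizontal_laplacian_formula(point, c)
```

The two values agree to rounding error, which confirms the signs of the mixed terms $-2cy\,\partial^2/\partial x\,\partial z$ and $+2cx\,\partial^2/\partial y\,\partial z$ for this test function.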

4.2 Hypoelliptic Diffusion Maps on $H_3(\mathbb{R})$

4.2.1 Construction

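Let the kernel $K : \mathbb{R}^2 \to \mathbb{R}_{\ge 0}$ be any smooth function supported on the closed unit square $[0,1] \times [0,1]$, and $\epsilon > 0$, $\delta > 0$ arbitrary bandwidth parameters. Denote for any pairs of $x, y \in \mathbb{R}^2$ and $v, w \in \mathbb{R}$

\[
K_{\epsilon,\delta}(x, v; y, w) = K\left(\frac{\|x - y\|}{\sqrt{\epsilon}}, \frac{|P_{y,x}v - w|}{\sqrt{\delta}}\right),
\]

where $\|\cdot\|$ is the canonical Euclidean 2-norm on $\mathbb{R}^2$. Intuitively, $K_{\epsilon,\delta}(x, v; y, w)$ is radially symmetrically supported within a tube around the trace of parallel transporting $v$ from the fibre based at $x$ to the fibre based at $y$. Though not symmetric in the variable pair $(v, w)$, $K_{\epsilon,\delta}(x, v; y, w)$ is still a symmetric kernel in the sense that

\[
K_{\epsilon,\delta}(x, v; y, w) = K_{\epsilon,\delta}(y, w; x, v).
\]

Indeed,

\[
\begin{aligned}
K_{\epsilon,\delta}(y, w; x, v) &= K\left(\frac{\|y - x\|}{\sqrt{\epsilon}}, \frac{|w + c\,(y_1 x_2 - y_2 x_1) - v|}{\sqrt{\delta}}\right) \\
&= K\left(\frac{\|x - y\|}{\sqrt{\epsilon}}, \frac{|v - c\,(y_1 x_2 - y_2 x_1) - w|}{\sqrt{\delta}}\right) \\
&= K\left(\frac{\|x - y\|}{\sqrt{\epsilon}}, \frac{|v + c\,(x_1 y_2 - x_2 y_1) - w|}{\sqrt{\delta}}\right) = K_{\epsilon,\delta}(x, v; y, w).
\end{aligned}
\]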

This symmetry is a result of the metric-compatibility of the connection. This point will be exploited in future generalizations of this framework to vector bundles and principal bundles.
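The symmetry can also be checked numerically. The sketch below uses the transport rule $P_{y,x}v = v + c\,(x_1 y_2 - x_2 y_1)$ that appears in the computation above, together with an example profile `K` that is an assumption of this sketch (a truncated Gaussian, chosen for simplicity; its lack of smoothness at the boundary of the unit square is ignored here). All names and numerical values are hypothetical.

```python
import math

def transport(x, y, v, c):
    """Parallel transport of the fibre value v from base point x to base point y."""
    return v + c * (x[0] * y[1] - x[1] * y[0])

def K(a, b):
    """Example profile supported on the unit square (assumed for illustration)."""
    return math.exp(-a * a - b * b) if (a <= 1.0 and b <= 1.0) else 0.0

def kernel(x, v, y, w, eps, delta, c):
    """K_{eps,delta}(x, v; y, w) for the Heisenberg connection."""
    base = math.hypot(x[0] - y[0], x[1] - y[1]) / math.sqrt(eps)
    fibre = abs(transport(x, y, v, c) - w) / math.sqrt(delta)
    return K(base, fibre)

x, v = (0.1, 0.2), 0.5
y, w = (0.3, -0.1), 0.45
forward = kernel(x, v, y, w, eps=0.5, delta=0.5, c=1.0)
backward = kernel(y, w, x, v, eps=0.5, delta=0.5, c=1.0)
```

Here `forward` and `backward` coincide up to rounding, because transporting $v$ forward and comparing with $w$ gives the same absolute difference as transporting $w$ backward and comparing with $v$.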

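As in the setup for horizontal diffusion maps, we have to deal with non-uniform sampling density. Let the density function $p \in C^{\infty}(\mathbb{R}^2 \times \mathbb{R})$ be a positive function on $\mathbb{R}^2 \times \mathbb{R}$ such that

\[
\int_{\mathbb{R}^2}\int_{\mathbb{R}} p(y, w)\,dw\,dy = 1, \quad p(y, w) \ge p_0 > 0 \ \text{ for all } (y, w) \in \mathbb{R}^2 \times \mathbb{R},
\]

where $p_0$ is a positive constant, and define the empirical density function $p_{\epsilon,\delta} \in C^{\infty}(\mathbb{R}^2 \times \mathbb{R})$ by

\[
p_{\epsilon,\delta}(x, v) = \int_{\mathbb{R}^2}\int_{\mathbb{R}} K_{\epsilon,\delta}(x, v; y, w)\, p(y, w)\,dw\,dy.
\]

Furthermore, with a fixed constant $0 \le \alpha \le 1$, define the $\alpha$-normalized kernel

\[
K^{\alpha}_{\epsilon,\delta} : \left(\mathbb{R}^2 \times \mathbb{R}\right) \times \left(\mathbb{R}^2 \times \mathbb{R}\right) \to \mathbb{R}
\]

by

\[
K^{\alpha}_{\epsilon,\delta}(x, v; y, w) = \frac{K_{\epsilon,\delta}(x, v; y, w)}{p^{\alpha}_{\epsilon,\delta}(x, v)\, p^{\alpha}_{\epsilon,\delta}(y, w)}.
\]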

Definition 25 (Bundle Diffusion Operator on $H_3(\mathbb{R})$). For any $\alpha \in [0, 1]$ and $\epsilon > 0$, $\delta > 0$, define the $(\alpha, \epsilon, \delta)$-normalized bundle diffusion operator, or bundle diffusion operator for short, as the integral operator on $C^{\infty}(H_3(\mathbb{R}))$ by

\[
H^{\alpha}_{\epsilon,\delta} f(x, v) = \frac{\int_{\mathbb{R}^2}\int_{\mathbb{R}} K^{\alpha}_{\epsilon,\delta}(x, v; y, w)\, f(y, w)\, p(y, w)\,dw\,dy}{\int_{\mathbb{R}^2}\int_{\mathbb{R}} K^{\alpha}_{\epsilon,\delta}(x, v; y, w)\, p(y, w)\,dw\,dy}, \quad (x, v) \in H_3(\mathbb{R}),
\tag{4.2.1}
\]

where $(x, v)$ stands for a point on the fibre bundle $H_3(\mathbb{R})$, with $x$ and $v$ corresponding to the “base” and “fibre” components, respectively; $P_{y,x}v$ is the parallel transport of $v$ from the fibre based at $x$ to the fibre based at $y$, along the geodesic segment on $\mathbb{R}^2$ (i.e., straight line segment) that connects $x$ to $y$.

For the sake of brevity, from now on we shall refer to the $\mathbb{R}^2$ variable $x$ of any function $f(x, v)$ on $\mathbb{R}^2 \times \mathbb{R}$ as the “base variable”, and the $\mathbb{R}$ variable $v$ as the “fibre variable”. The remainder of this chapter is devoted to proving the following theorem:

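Theorem 26. As $\epsilon \to 0$, $\delta \to 0$, for any $f \in C^{\infty}(H_3(\mathbb{R}))$,

\[
\begin{aligned}
H^{\alpha}_{\epsilon,\delta} f(x, v) = f(x, v)
&+ \epsilon\,\frac{m_1}{2m_0}\left[\frac{\Delta^H[f p^{1-\alpha}](x, v)}{p^{1-\alpha}(x, v)} - f(x, v)\,\frac{\Delta^H p^{1-\alpha}(x, v)}{p^{1-\alpha}(x, v)}\right] \\
&+ \delta\,\frac{m_2}{2m_0}\left[\frac{\Delta^V[f p^{1-\alpha}](x, v)}{p^{1-\alpha}(x, v)} - f(x, v)\,\frac{\Delta^V p^{1-\alpha}(x, v)}{p^{1-\alpha}(x, v)}\right] + O\left(\left(\epsilon^{1/2} + \delta^{1/2}\right)^3\right) \\
= f(x, v)
&+ \epsilon\,\frac{m_1}{2m_0}\left[\Delta^H f(x, v) + 2\,\nabla^H f(x, v) \cdot \frac{\nabla^H p^{1-\alpha}(x, v)}{p^{1-\alpha}(x, v)}\right] \\
&+ \delta\,\frac{m_2}{2m_0}\left[\Delta^V f(x, v) + 2\,\nabla^V f(x, v) \cdot \frac{\nabla^V p^{1-\alpha}(x, v)}{p^{1-\alpha}(x, v)}\right] + O\left(\left(\epsilon^{1/2} + \delta^{1/2}\right)^3\right),
\end{aligned}
\tag{4.2.2}
\]

where $m_0$, $m_1$, $m_2$ are positive constants. In particular, when $\alpha = 1$,

\[
H^{1}_{\epsilon,\delta} f(x, v) = f(x, v) + \epsilon\,\frac{m_1}{2m_0}\,\Delta^H f(x, v) + \delta\,\frac{m_2}{2m_0}\,\Delta^V f(x, v) + O\left(\left(\epsilon^{1/2} + \delta^{1/2}\right)^3\right).
\tag{4.2.3}
\]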
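The passage between the two forms of the expansion in Theorem 26 rests on a product rule for the horizontal and vertical Laplacians. As a brief worked step (the shorthand $X = L(\partial/\partial x)$, $Y = L(\partial/\partial y)$, and $q = p^{1-\alpha}$ is introduced here only for this computation), one can write $\Delta^H = X^2 + Y^2$ and $\nabla^H f = (Xf, Yf)$, so that:

```latex
\begin{aligned}
\Delta^H[fq] &= X\big(X(fq)\big) + Y\big(Y(fq)\big) \\
             &= q\,\Delta^H f + 2\,\big(Xf\,Xq + Yf\,Yq\big) + f\,\Delta^H q \\
             &= q\,\Delta^H f + 2\,\nabla^H f \cdot \nabla^H q + f\,\Delta^H q, \\[4pt]
\frac{\Delta^H[fq]}{q} - f\,\frac{\Delta^H q}{q}
             &= \Delta^H f + 2\,\nabla^H f \cdot \frac{\nabla^H q}{q}.
\end{aligned}
```

The same computation with $\partial/\partial z$ in place of $X$ and $Y$ gives the corresponding identity for $\Delta^V$ and $\nabla^V$. When $\alpha = 1$ we have $q \equiv 1$, the gradient terms vanish, and the expansion reduces to the form stated for that case.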

4.2.2 Proof of Theorem 26

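The first crucial step is to expand $P_{y,x}v$ locally at $x \in \mathbb{R}^2$. Note that as a trivial fibre bundle, $H_3(\mathbb{R})$ splits into $\mathbb{R}^2 \times \mathbb{R}$, where $\mathbb{R}^2$ is the base manifold and $\mathbb{R}$ is the fibre, thus we can view $x$ as a point on $\mathbb{R}^2$ and $v$ as a point on the $\mathbb{R}$-fibre based at $x$. On the base $\mathbb{R}^2$, let $x = (x_1, x_2)$, $y = (y_1, y_2)$ be two distinct points connected by a line segment, parametrized by

\[
\gamma : [0, 1] \to \mathbb{R}^2, \quad t \mapsto ty + (1 - t)\,x = x + t\,(y - x).
\]

Written out in coordinates, $\gamma$ can be denoted as

\[
\gamma(t) = (\gamma_1(t), \gamma_2(t)) = (x_1 + t\,(y_1 - x_1),\ x_2 + t\,(y_2 - x_2)).
\]

Let $\tilde{\gamma} : [0, 1] \to H_3(\mathbb{R})$ denote the unique lifted image of $\gamma$ by the connection. Then

\[
\pi \circ \tilde{\gamma}(t) = \gamma(t), \quad \theta(\tilde{\gamma}'(t)) = 0.
\]

Writing $\tilde{\gamma}$ out in coordinates as $\tilde{\gamma}(t) = (\tilde{\gamma}_1(t), \tilde{\gamma}_2(t), \tilde{\gamma}_3(t))$, the above equations translate into

\[
\tilde{\gamma}_1(t) = \gamma_1(t), \quad \tilde{\gamma}_2(t) = \gamma_2(t), \quad \tilde{\gamma}_3'(t) = c\,\big(\tilde{\gamma}_1(t)\,\tilde{\gamma}_2'(t) - \tilde{\gamma}_2(t)\,\tilde{\gamma}_1'(t)\big).
\]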

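Since

\[
\tilde{\gamma}_1(t) = \gamma_1(t) = x_1 + t\,(y_1 - x_1), \quad \tilde{\gamma}_2(t) = \gamma_2(t) = x_2 + t\,(y_2 - x_2),
\]

we have

\[
\tilde{\gamma}_1'(t) = y_1 - x_1, \quad \tilde{\gamma}_2'(t) = y_2 - x_2,
\]

and

\[
\begin{aligned}
\tilde{\gamma}_3'(t) &= c\,(x_1 + t\,(y_1 - x_1))(y_2 - x_2) - c\,(x_2 + t\,(y_2 - x_2))(y_1 - x_1) \\
&= c x_1 (y_2 - x_2) + ct\,(y_1 - x_1)(y_2 - x_2) - c x_2 (y_1 - x_1) - ct\,(y_2 - x_2)(y_1 - x_1) \\
&= c x_1 (y_2 - x_2) - c x_2 (y_1 - x_1) = c x_1 y_2 - c x_1 x_2 - c x_2 y_1 + c x_2 x_1 \\
&= c\,(x_1 y_2 - x_2 y_1).
\end{aligned}
\]

Since $\tilde{\gamma}_3(0) = v$, this implies

\[
\tilde{\gamma}_3(t) = v + tc\,(x_1 y_2 - x_2 y_1).
\]

By the definition of parallel transport,

\[
P_{y,x}v = \tilde{\gamma}_3(1) = v + c\,(x_1 y_2 - x_2 y_1).
\tag{4.2.4}
\]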
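The closed form (4.2.4) can be compared against a direct numerical integration of the lift equation $\tilde{\gamma}_3'(t) = c\,(\tilde{\gamma}_1 \tilde{\gamma}_2' - \tilde{\gamma}_2 \tilde{\gamma}_1')$ along the segment from $x$ to $y$. The sketch below is a hypothetical illustration; the midpoint rule, the step count, and the sample values are choices made here.

```python
def lift_endpoint(x, y, v, c, steps=1000):
    """Integrate z' = c (g1 g2' - g2 g1') along g(t) = x + t (y - x), with z(0) = v."""
    d1, d2 = y[0] - x[0], y[1] - x[1]
    z, dt = v, 1.0 / steps
    for n in range(steps):
        t = (n + 0.5) * dt                        # midpoint of the current step
        g1, g2 = x[0] + t * d1, x[1] + t * d2
        z += c * (g1 * d2 - g2 * d1) * dt
    return z

def closed_form(x, y, v, c):
    """Right-hand side of (4.2.4)."""
    return v + c * (x[0] * y[1] - x[1] * y[0])

x, y, v, c = (0.4, -0.2), (1.3, 0.9), 0.25, 0.7
numeric = lift_endpoint(x, y, v, c)
exact = closed_form(x, y, v, c)
```

Because the integrand is constant in $t$ along a straight segment, as the derivation above shows, the numerical and closed-form values agree up to rounding for any step count.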

We begin the proof with the following lemma.

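Lemma 27. For any function $g \in C^{\infty}(H_3(\mathbb{R})) = C^{\infty}(\mathbb{R}^2 \times \mathbb{R})$,

\[
\int_{\mathbb{R}^2}\int_{\mathbb{R}} K_{\epsilon,\delta}(x, v; y, w)\, g(y, w)\,dw\,dy
= \epsilon\,\delta^{1/2}\left[m_0\, g(x, v) + \epsilon\,\frac{m_1}{2}\,\Delta^H g(x, v) + \delta\,\frac{m_2}{2}\,\Delta^V g(x, v) + O\left(\left(\epsilon^{1/2} + \delta^{1/2}\right)^3\right)\right],
\tag{4.2.5}
\]

where $m_0$, $m_1$, $m_2$ are positive constants, and $\Delta^H$, $\Delta^V$ are the horizontal and vertical Laplace-Beltrami operators on $H_3(\mathbb{R})$, respectively.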

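Proof. Noting that the kernel $K \in C^{\infty}(\mathbb{R}^2)$ is supported on the closed unit square,

\[
\begin{aligned}
\int_{\mathbb{R}^2}\int_{\mathbb{R}} K_{\epsilon,\delta}(x, v; y, w)\, g(y, w)\,dw\,dy
&= \int_{\mathbb{R}^2}\int_{\mathbb{R}} K\left(\frac{\|x - y\|}{\sqrt{\epsilon}}, \frac{|P_{y,x}v - w|}{\sqrt{\delta}}\right) g(y, w)\,dw\,dy \\
&= \int_{B_{\sqrt{\epsilon}}(x)}\int_{P_{y,x}v - \sqrt{\delta}}^{P_{y,x}v + \sqrt{\delta}} K\left(\frac{\|x - y\|}{\sqrt{\epsilon}}, \frac{|P_{y,x}v - w|}{\sqrt{\delta}}\right) g(y, w)\,dw\,dy \\
&= \int_{B_{\sqrt{\epsilon}}(x)}\int_{P_{y,x}v}^{P_{y,x}v + \sqrt{\delta}} K\left(\frac{\|x - y\|}{\sqrt{\epsilon}}, \frac{|P_{y,x}v - w|}{\sqrt{\delta}}\right) g(y, w)\,dw\,dy \\
&\quad + \int_{B_{\sqrt{\epsilon}}(x)}\int_{P_{y,x}v - \sqrt{\delta}}^{P_{y,x}v} K\left(\frac{\|x - y\|}{\sqrt{\epsilon}}, \frac{|P_{y,x}v - w|}{\sqrt{\delta}}\right) g(y, w)\,dw\,dy \\
&=: \mathrm{I} + \mathrm{II}.
\end{aligned}
\]

In (I), putting variables $x$, $y$ in polar coordinates on $\mathbb{R}^2$

\[
y = x + \sqrt{\epsilon}\, r\theta = \left(x_1 + \epsilon^{1/2} r\theta_1,\ x_2 + \epsilon^{1/2} r\theta_2\right), \quad r \in (0, 1],\ \theta = (\theta_1, \theta_2) \in S^1,
\]

and setting

\[
\rho = \frac{|P_{y,x}v - w|}{\sqrt{\delta}} = \frac{|v + c\,(x_1 y_2 - x_2 y_1) - w|}{\sqrt{\delta}} \in [0, 1]
\]

yields

\[
\begin{aligned}
\mathrm{I} &= \int_{B_{\sqrt{\epsilon}}(x)}\int_{v + c(x_1 y_2 - x_2 y_1)}^{v + c(x_1 y_2 - x_2 y_1) + \sqrt{\delta}} K\left(\frac{\|x - y\|}{\sqrt{\epsilon}}, \frac{|v + c\,(x_1 y_2 - x_2 y_1) - w|}{\sqrt{\delta}}\right) g(y, w)\,dw\,dy \\
&= \epsilon\,\delta^{1/2} \int_0^1\!\int_{S^1}\!\int_0^1 K(r, \rho)\, g\left(x + \epsilon^{1/2} r\theta,\ v + c\,(x_1 y_2 - x_2 y_1) + \delta^{1/2}\rho\right) d\rho\,d\theta\,r\,dr \\
&= \epsilon\,\delta^{1/2} \int_0^1\!\int_{S^1}\!\int_0^1 K(r, \rho)\, g\left(x + \epsilon^{1/2} r\theta,\ v + c\,\big(x_1 (y_2 - x_2) - x_2 (y_1 - x_1)\big) + \delta^{1/2}\rho\right) d\rho\,d\theta\,r\,dr \\
&= \epsilon\,\delta^{1/2} \int_0^1\!\int_{S^1}\!\int_0^1 K(r, \rho)\, g\left(x + \epsilon^{1/2} r\theta,\ v + c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) + \delta^{1/2}\rho\right) d\rho\,d\theta\,r\,dr.
\end{aligned}
\]

Similarly, for (II)

\[
\begin{aligned}
\mathrm{II} &= \int_{B_{\sqrt{\epsilon}}(x)}\int_{v + c(x_1 y_2 - x_2 y_1) - \sqrt{\delta}}^{v + c(x_1 y_2 - x_2 y_1)} K\left(\frac{\|x - y\|}{\sqrt{\epsilon}}, \frac{|v + c\,(x_1 y_2 - x_2 y_1) - w|}{\sqrt{\delta}}\right) g(y, w)\,dw\,dy \\
&= \epsilon\,\delta^{1/2} \int_0^1\!\int_{S^1}\!\int_1^0 K(r, \rho)\, g\left(x + \epsilon^{1/2} r\theta,\ v + c\,(x_1 y_2 - x_2 y_1) - \delta^{1/2}\rho\right) (-d\rho)\,d\theta\,r\,dr \\
&= \epsilon\,\delta^{1/2} \int_0^1\!\int_{S^1}\!\int_0^1 K(r, \rho)\, g\left(x + \epsilon^{1/2} r\theta,\ v + c\,\big(x_1 (y_2 - x_2) - x_2 (y_1 - x_1)\big) - \delta^{1/2}\rho\right) d\rho\,d\theta\,r\,dr \\
&= \epsilon\,\delta^{1/2} \int_0^1\!\int_{S^1}\!\int_0^1 K(r, \rho)\, g\left(x + \epsilon^{1/2} r\theta,\ v + c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) - \delta^{1/2}\rho\right) d\rho\,d\theta\,r\,dr.
\end{aligned}
\]

Combining (I) and (II),

\[
\begin{aligned}
\int_{\mathbb{R}^2}\int_{\mathbb{R}} K_{\epsilon,\delta}(x, v; y, w)\, g(y, w)\,dw\,dy
= \epsilon\,\delta^{1/2} \int_0^1\!\int_{S^1}\!\int_0^1 K(r, \rho) \Big[ & g\left(x + \epsilon^{1/2} r\theta,\ v + c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) + \delta^{1/2}\rho\right) \\
& + g\left(x + \epsilon^{1/2} r\theta,\ v + c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) - \delta^{1/2}\rho\right) \Big]\, d\rho\,d\theta\,r\,dr.
\end{aligned}
\]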

We next Taylor expand $g$ around $(x, v)$. Denote by $\partial_1 g$, $\partial_2 g$, $\partial_3 g$ the partial derivatives of $g$ with respect to the first base variable, second base variable, and the fibre variable, respectively. Writing $\pm$ for the two cases,

\[
\begin{aligned}
g\Big(x + \epsilon^{1/2} r\theta,\ & v + c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) \pm \delta^{1/2}\rho\Big) = g(x, v) + \partial_1 g(x, v) \cdot \epsilon^{1/2} r\theta_1 \\
&+ \partial_2 g(x, v) \cdot \epsilon^{1/2} r\theta_2 + \partial_3 g(x, v)\left(c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) \pm \delta^{1/2}\rho\right) \\
&+ \frac{1}{2}\,\partial_1^2 g(x, v)\,\epsilon r^2\theta_1^2 + \frac{1}{2}\,\partial_2^2 g(x, v)\,\epsilon r^2\theta_2^2 + \frac{1}{2}\,\partial_3^2 g(x, v)\left(c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) \pm \delta^{1/2}\rho\right)^2 \\
&+ \partial_{13}^2 g(x, v)\,\epsilon^{1/2} r\theta_1\left(c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) \pm \delta^{1/2}\rho\right) \\
&+ \partial_{23}^2 g(x, v)\,\epsilon^{1/2} r\theta_2\left(c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) \pm \delta^{1/2}\rho\right) \\
&+ \partial_{12}^2 g(x, v)\,\epsilon r^2\theta_1\theta_2 + O\left(\left(\epsilon^{1/2} r + \delta^{1/2}\rho\right)^3\right).
\end{aligned}
\]

Therefore,

\[
\begin{aligned}
g\Big(x + \epsilon^{1/2} r\theta,\ & v + c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) + \delta^{1/2}\rho\Big) + g\Big(x + \epsilon^{1/2} r\theta,\ v + c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) - \delta^{1/2}\rho\Big) \\
&= 2g(x, v) + 2\,\partial_1 g(x, v)\,\epsilon^{1/2} r\theta_1 + 2\,\partial_2 g(x, v)\,\epsilon^{1/2} r\theta_2 + 2\,\partial_3 g(x, v) \cdot c\,\epsilon^{1/2} r\,(x_1\theta_2 - x_2\theta_1) \\
&\quad + \partial_1^2 g(x, v)\,\epsilon r^2\theta_1^2 + \partial_2^2 g(x, v)\,\epsilon r^2\theta_2^2 + \partial_3^2 g(x, v)\left[c^2 \epsilon r^2 (x_1\theta_2 - x_2\theta_1)^2 + \delta\rho^2\right] \\
&\quad + 2\,\partial_{13}^2 g(x, v) \cdot c\,\epsilon r^2\theta_1 (x_1\theta_2 - x_2\theta_1) + 2\,\partial_{23}^2 g(x, v) \cdot c\,\epsilon r^2\theta_2 (x_1\theta_2 - x_2\theta_1) \\
&\quad + 2\,\partial_{12}^2 g(x, v)\,\epsilon r^2\theta_1\theta_2 + O\left(\left(\epsilon^{1/2} r + \delta^{1/2}\rho\right)^3\right).
\end{aligned}
\]

We shall integrate each of the terms on the right hand side above. By radial symmetry of the kernel $K$ and the domain $S^1$ of the variable $\theta$,

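\[
\begin{aligned}
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,r\theta_1\,d\rho\,d\theta\,r\,dr = 0, \qquad
\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,r\theta_2\,d\rho\,d\theta\,r\,dr = 0, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,r^2\theta_1\theta_2\,d\rho\,d\theta\,r\,dr = 0, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,r^2\theta_1^2\,d\rho\,d\theta\,r\,dr = \int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,r^2\theta_2^2\,d\rho\,d\theta\,r\,dr,
\end{aligned}
\]

and we define the following positive constants for the sake of brevity:

\[
\begin{aligned}
m_0 &= 2\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,d\rho\,d\theta\,r\,dr > 0, \qquad
m_2 = 2\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,\rho^2\,d\rho\,d\theta\,r\,dr > 0, \\
m_1 &= 2\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,r^2\theta_1^2\,d\rho\,d\theta\,r\,dr = 2\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,r^2\theta_2^2\,d\rho\,d\theta\,r\,dr > 0.
\end{aligned}
\]

Now we have (all derivatives of $g$ evaluated at $(x,v)$)

\[
\begin{aligned}
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\cdot 2g\,d\rho\,d\theta\,r\,dr = m_0\,g, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\cdot 2\partial_1 g\,\epsilon^{1/2}r\theta_1\,d\rho\,d\theta\,r\,dr = 0, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\cdot 2\partial_2 g\,\epsilon^{1/2}r\theta_2\,d\rho\,d\theta\,r\,dr = 0, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\cdot 2\partial_3 g\cdot c\,\epsilon^{1/2}r(x_1\theta_2-x_2\theta_1)\,d\rho\,d\theta\,r\,dr = 0, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,\partial_1^2 g\,\epsilon r^2\theta_1^2\,d\rho\,d\theta\,r\,dr = \tfrac{1}{2}\epsilon m_1\,\partial_1^2 g, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,\partial_2^2 g\,\epsilon r^2\theta_2^2\,d\rho\,d\theta\,r\,dr = \tfrac{1}{2}\epsilon m_1\,\partial_2^2 g, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\,\partial_3^2 g\left[c^2\epsilon r^2(x_1\theta_2-x_2\theta_1)^2+\delta\rho^2\right] d\rho\,d\theta\,r\,dr
 = \tfrac{1}{2}c^2\epsilon m_1\left(x_1^2+x_2^2\right)\partial_3^2 g + \tfrac{1}{2}\delta m_2\,\partial_3^2 g, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\cdot 2\partial_{13}^2 g\cdot c\,\epsilon r^2\theta_1(x_1\theta_2-x_2\theta_1)\,d\rho\,d\theta\,r\,dr = -\epsilon m_1 c\,x_2\,\partial_{13}^2 g, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\cdot 2\partial_{23}^2 g\cdot c\,\epsilon r^2\theta_2(x_1\theta_2-x_2\theta_1)\,d\rho\,d\theta\,r\,dr = \epsilon m_1 c\,x_1\,\partial_{23}^2 g, \\
&\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\cdot 2\partial_{12}^2 g\,\epsilon r^2\theta_1\theta_2\,d\rho\,d\theta\,r\,dr = 0,
\end{aligned}
\]

and it follows immediately that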

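\[
\begin{aligned}
&\int_{\mathbb{R}^2}\!\int_{\mathbb{R}} K_{\epsilon,\delta}(x,v;y,w)\,g(y,w)\,dw\,dy \\
&\quad= \epsilon\delta^{1/2}\int_0^1\!\int_{S^1}\!\int_0^1 K(r,\rho)\Big[ g\!\left(x+\epsilon^{1/2}r\theta,\ v+c\,\epsilon^{1/2}r(x_1\theta_2-x_2\theta_1)+\delta^{1/2}\rho\right) \\
&\qquad\qquad\qquad\qquad\qquad\quad + g\!\left(x+\epsilon^{1/2}r\theta,\ v+c\,\epsilon^{1/2}r(x_1\theta_2-x_2\theta_1)-\delta^{1/2}\rho\right)\Big]\,d\rho\,d\theta\,r\,dr \\
&\quad= \epsilon\delta^{1/2}\bigg[ m_0\,g(x,v) + \epsilon\frac{m_1}{2}\Big(\partial_1^2 g(x,v)+\partial_2^2 g(x,v)+c^2\left(x_1^2+x_2^2\right)\partial_3^2 g(x,v) \\
&\qquad\qquad\quad - 2c\,x_2\,\partial_{13}^2 g(x,v) + 2c\,x_1\,\partial_{23}^2 g(x,v)\Big) + \frac{1}{2}\delta m_2\,\partial_3^2 g(x,v) + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\bigg] \\
&\quad= \epsilon\delta^{1/2}\left[ m_0\,g(x,v) + \epsilon\frac{m_1}{2}\Delta^H g(x,v) + \delta\frac{m_2}{2}\Delta^V g(x,v) + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\right],
\end{aligned}
\]

where $\Delta^H$, $\Delta^V$ are the horizontal and vertical Laplace-Beltrami operators on $H_3(\mathbb{R})$ as defined in (A.0.8) and (A.0.12). This completes the proof of the lemma.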
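The expansion in the lemma can be sanity-checked numerically. The sketch below is an illustration only, not part of the proof: it assumes the constant kernel $K \equiv 1$ on $[0,1]^2$ (for which $m_0 = 2\pi$, $m_1 = \pi/2$, $m_2 = 2\pi/3$) and a hypothetical quadratic test function `g`, for which the third-order remainder vanishes, so the midpoint-rule value of (I) + (II) should match the right hand side up to quadrature error. All function names and parameter values are assumptions made for this example.

```python
import math

def combined_integral(g, x1, x2, v, c, eps, delta, n_r=40, n_theta=16, n_rho=40):
    """Midpoint-rule value of (I) + (II) for the constant kernel K = 1."""
    total = 0.0
    for a in range(n_r):
        r = (a + 0.5) / n_r
        for b in range(n_theta):
            angle = 2.0 * math.pi * (b + 0.5) / n_theta
            y1 = x1 + math.sqrt(eps) * r * math.cos(angle)
            y2 = x2 + math.sqrt(eps) * r * math.sin(angle)
            # fibre coordinate v + c (x1 y2 - x2 y1), around which w is integrated
            centre = v + c * (x1 * y2 - x2 * y1)
            for k in range(n_rho):
                offset = math.sqrt(delta) * (k + 0.5) / n_rho
                total += (g(y1, y2, centre + offset) + g(y1, y2, centre - offset)) * r
    cell = (1.0 / n_r) * (2.0 * math.pi / n_theta) * (1.0 / n_rho)
    return eps * math.sqrt(delta) * total * cell

def lemma27_prediction(g0, lap_h, lap_v, eps, delta):
    """Right hand side of the expansion without the error term, for K = 1."""
    m0, m1, m2 = 2.0 * math.pi, math.pi / 2.0, 2.0 * math.pi / 3.0
    return eps * math.sqrt(delta) * (m0 * g0 + eps * m1 / 2.0 * lap_h + delta * m2 / 2.0 * lap_v)

def g(y1, y2, w):
    """Hypothetical quadratic test function; its second derivatives are constant."""
    return 1.0 + y1 + 2.0 * y2 * w + y1 * y1 + 3.0 * w * w + y1 * y2

x1, x2, v, c, eps, delta = 0.3, -0.2, 0.5, 0.5, 0.01, 0.02
# Second derivatives of g: d11 = 2, d22 = 0, d33 = 6, d13 = 0, d23 = 2.
lap_h = 2.0 + 0.0 + c * c * (x1 * x1 + x2 * x2) * 6.0 - 2.0 * c * x2 * 0.0 + 2.0 * c * x1 * 2.0
lap_v = 6.0
numeric = combined_integral(g, x1, x2, v, c, eps, delta)
predicted = lemma27_prediction(g(x1, x2, v), lap_h, lap_v, eps, delta)
leading_only = eps * math.sqrt(delta) * 2.0 * math.pi * g(x1, x2, v)
```

For these values the full prediction should agree with the quadrature far better than the leading term $\epsilon\delta^{1/2} m_0 g(x,v)$ alone, which is what the second-order terms in the lemma account for.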

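Directly applying Lemma 27 to the empirical density function $p_{\epsilon,\delta}$ yields the following asymptotic expansion

\[
\begin{aligned}
p_{\epsilon,\delta}(x,v) &= \int_{\mathbb{R}^2}\!\int_{\mathbb{R}} K_{\epsilon,\delta}(x,v;y,w)\,p(y,w)\,dw\,dy \\
&= \epsilon\delta^{1/2}\left[ m_0\,p(x,v) + \epsilon\frac{m_1}{2}\Delta^H p(x,v) + \delta\frac{m_2}{2}\Delta^V p(x,v) + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\right],
\end{aligned}
\tag{4.2.6}
\]

and raising it to the power $-\alpha$, $0 \le \alpha \le 1$, gives

\[
\begin{aligned}
p_{\epsilon,\delta}^{-\alpha}(x,v)
&= \epsilon^{-\alpha}\delta^{-\frac{\alpha}{2}} m_0^{-\alpha}\,p^{-\alpha}(x,v) \times
\left[ 1 + \epsilon\frac{m_1}{2m_0}\frac{\Delta^H p(x,v)}{p(x,v)} + \delta\frac{m_2}{2m_0}\frac{\Delta^V p(x,v)}{p(x,v)} + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\right]^{-\alpha} \\
&= \epsilon^{-\alpha}\delta^{-\frac{\alpha}{2}} m_0^{-\alpha}\,p^{-\alpha}(x,v) \times
\left[ 1 - \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p(x,v)}{p(x,v)} - \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p(x,v)}{p(x,v)} + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\right].
\end{aligned}
\]

Expanding the denominator of (4.2.1) is now as simple as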

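\[
\begin{aligned}
&\int_{\mathbb{R}^2}\!\int_{\mathbb{R}} K^{\alpha}_{\epsilon,\delta}(x,v;y,w)\,p(y,w)\,dw\,dy \\
&\quad= \int_{\mathbb{R}^2}\!\int_{\mathbb{R}} K_{\epsilon,\delta}(x,v;y,w)\,p_{\epsilon,\delta}^{-\alpha}(x,v)\,p_{\epsilon,\delta}^{-\alpha}(y,w)\,p(y,w)\,dw\,dy \\
&\quad= \epsilon^{-\alpha}\delta^{-\frac{\alpha}{2}} m_0^{-\alpha}\,p_{\epsilon,\delta}^{-\alpha}(x,v) \int_{\mathbb{R}^2}\!\int_{\mathbb{R}} K_{\epsilon,\delta}(x,v;y,w)\,p^{1-\alpha}(y,w) \\
&\qquad\qquad \times\left[ 1 - \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p(y,w)}{p(y,w)} - \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p(y,w)}{p(y,w)} + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\right] dw\,dy \\
&\quad= \epsilon^{1-\alpha}\delta^{\frac{1-\alpha}{2}} m_0^{-\alpha}\,p_{\epsilon,\delta}^{-\alpha}(x,v)\,\bigg\{ m_0\,p^{1-\alpha}(x,v)\left[ 1 - \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p(x,v)}{p(x,v)} - \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p(x,v)}{p(x,v)}\right] \\
&\qquad\qquad + \epsilon\frac{m_1}{2}\Delta^H p^{1-\alpha}(x,v) + \delta\frac{m_2}{2}\Delta^V p^{1-\alpha}(x,v) + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\bigg\} \\
&\quad= \epsilon^{1-\alpha}\delta^{\frac{1-\alpha}{2}} m_0^{1-\alpha}\,p_{\epsilon,\delta}^{-\alpha}(x,v)\,p^{1-\alpha}(x,v)\,\bigg\{ 1 + \epsilon\frac{m_1}{2m_0}\left(\frac{\Delta^H p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} - \alpha\frac{\Delta^H p(x,v)}{p(x,v)}\right) \\
&\qquad\qquad + \delta\frac{m_2}{2m_0}\left(\frac{\Delta^V p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} - \alpha\frac{\Delta^V p(x,v)}{p(x,v)}\right) + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\bigg\}.
\end{aligned}
\]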

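To expand the numerator of (4.2.1), we apply Lemma 27 to $g(y,w) = f(y,w)\,p^{1-\alpha}(y,w)$. Following a similar pattern as in expanding the denominator of (4.2.1),

\[
\begin{aligned}
&\int_{\mathbb{R}^2}\!\int_{\mathbb{R}} K^{\alpha}_{\epsilon,\delta}(x,v;y,w)\,f(y,w)\,p(y,w)\,dw\,dy \\
&\quad= \int_{\mathbb{R}^2}\!\int_{\mathbb{R}} K_{\epsilon,\delta}(x,v;y,w)\,p_{\epsilon,\delta}^{-\alpha}(x,v)\,p_{\epsilon,\delta}^{-\alpha}(y,w)\,f(y,w)\,p(y,w)\,dw\,dy \\
&\quad= \epsilon^{-\alpha}\delta^{-\frac{\alpha}{2}} m_0^{-\alpha}\,p_{\epsilon,\delta}^{-\alpha}(x,v) \int_{\mathbb{R}^2}\!\int_{\mathbb{R}} K_{\epsilon,\delta}(x,v;y,w)\,f(y,w)\,p^{1-\alpha}(y,w) \\
&\qquad\qquad \times\left[ 1 - \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p(y,w)}{p(y,w)} - \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p(y,w)}{p(y,w)} + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\right] dw\,dy \\
&\quad= \epsilon^{1-\alpha}\delta^{\frac{1-\alpha}{2}} m_0^{-\alpha}\,p_{\epsilon,\delta}^{-\alpha}(x,v)\,\bigg\{ m_0\,f(x,v)\,p^{1-\alpha}(x,v)\left[ 1 - \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p(x,v)}{p(x,v)} - \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p(x,v)}{p(x,v)}\right] \\
&\qquad\qquad + \epsilon\frac{m_1}{2}\Delta^H\!\left[fp^{1-\alpha}\right](x,v) + \delta\frac{m_2}{2}\Delta^V\!\left[fp^{1-\alpha}\right](x,v) + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\bigg\} \\
&\quad= \epsilon^{1-\alpha}\delta^{\frac{1-\alpha}{2}} m_0^{1-\alpha}\,p_{\epsilon,\delta}^{-\alpha}(x,v)\,p^{1-\alpha}(x,v)\,\bigg\{ f(x,v) - \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p(x,v)}{p(x,v)}f(x,v) - \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p(x,v)}{p(x,v)}f(x,v) \\
&\qquad\qquad + \epsilon\frac{m_1}{2m_0}\frac{\Delta^H\!\left[fp^{1-\alpha}\right](x,v)}{p^{1-\alpha}(x,v)} + \delta\frac{m_2}{2m_0}\frac{\Delta^V\!\left[fp^{1-\alpha}\right](x,v)}{p^{1-\alpha}(x,v)} + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\bigg\}.
\end{aligned}
\]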

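Combining all previous computations (all functions below are evaluated at $(x,v)$),

\[
\begin{aligned}
H^{\alpha}_{\epsilon,\delta}f(x,v)
&= \frac{\displaystyle\int_{\mathbb{R}^2}\!\int_{\mathbb{R}} K^{\alpha}_{\epsilon,\delta}(x,v;y,w)\,f(y,w)\,p(y,w)\,dw\,dy}{\displaystyle\int_{\mathbb{R}^2}\!\int_{\mathbb{R}} K^{\alpha}_{\epsilon,\delta}(x,v;y,w)\,p(y,w)\,dw\,dy} \\
&= \bigg\{ f - \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p}{p}f - \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p}{p}f
 + \epsilon\frac{m_1}{2m_0}\frac{\Delta^H[fp^{1-\alpha}]}{p^{1-\alpha}} + \delta\frac{m_2}{2m_0}\frac{\Delta^V[fp^{1-\alpha}]}{p^{1-\alpha}} + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\bigg\} \\
&\quad\times \bigg\{ 1 + \epsilon\frac{m_1}{2m_0}\left(\frac{\Delta^H p^{1-\alpha}}{p^{1-\alpha}} - \alpha\frac{\Delta^H p}{p}\right)
 + \delta\frac{m_2}{2m_0}\left(\frac{\Delta^V p^{1-\alpha}}{p^{1-\alpha}} - \alpha\frac{\Delta^V p}{p}\right) + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\bigg\}^{-1} \\
&= \bigg\{ f - \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p}{p}f - \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p}{p}f
 + \epsilon\frac{m_1}{2m_0}\frac{\Delta^H[fp^{1-\alpha}]}{p^{1-\alpha}} + \delta\frac{m_2}{2m_0}\frac{\Delta^V[fp^{1-\alpha}]}{p^{1-\alpha}} + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\bigg\} \\
&\quad\times \bigg\{ 1 - \epsilon\frac{m_1}{2m_0}\left(\frac{\Delta^H p^{1-\alpha}}{p^{1-\alpha}} - \alpha\frac{\Delta^H p}{p}\right)
 - \delta\frac{m_2}{2m_0}\left(\frac{\Delta^V p^{1-\alpha}}{p^{1-\alpha}} - \alpha\frac{\Delta^V p}{p}\right) + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right)\bigg\} \\
&= f - \epsilon\frac{m_1}{2m_0}f\left(\frac{\Delta^H p^{1-\alpha}}{p^{1-\alpha}} - \alpha\frac{\Delta^H p}{p}\right)
 - \delta\frac{m_2}{2m_0}f\left(\frac{\Delta^V p^{1-\alpha}}{p^{1-\alpha}} - \alpha\frac{\Delta^V p}{p}\right) \\
&\quad - \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p}{p}f - \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p}{p}f
 + \epsilon\frac{m_1}{2m_0}\frac{\Delta^H[fp^{1-\alpha}]}{p^{1-\alpha}} + \delta\frac{m_2}{2m_0}\frac{\Delta^V[fp^{1-\alpha}]}{p^{1-\alpha}} + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right) \\
&= f - \epsilon\frac{m_1}{2m_0}\frac{\Delta^H p^{1-\alpha}}{p^{1-\alpha}}f + \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p}{p}f
 - \delta\frac{m_2}{2m_0}\frac{\Delta^V p^{1-\alpha}}{p^{1-\alpha}}f + \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p}{p}f \\
&\quad - \epsilon\frac{\alpha m_1}{2m_0}\frac{\Delta^H p}{p}f - \delta\frac{\alpha m_2}{2m_0}\frac{\Delta^V p}{p}f
 + \epsilon\frac{m_1}{2m_0}\frac{\Delta^H[fp^{1-\alpha}]}{p^{1-\alpha}} + \delta\frac{m_2}{2m_0}\frac{\Delta^V[fp^{1-\alpha}]}{p^{1-\alpha}} + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right) \\
&= f + \epsilon\frac{m_1}{2m_0}\left[\frac{\Delta^H[fp^{1-\alpha}]}{p^{1-\alpha}} - f\,\frac{\Delta^H p^{1-\alpha}}{p^{1-\alpha}}\right]
 + \delta\frac{m_2}{2m_0}\left[\frac{\Delta^V[fp^{1-\alpha}]}{p^{1-\alpha}} - f\,\frac{\Delta^V p^{1-\alpha}}{p^{1-\alpha}}\right] + O\!\left(\big(\epsilon^{1/2}+\delta^{1/2}\big)^3\right),
\end{aligned}
\]

which proves Theorem 26.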

5

Applications and Future Work

In this chapter, we discuss several applications in geometric morphometrics of the techniques developed in this thesis.

5.1 Applying HDM to Collections of Anatomical Surfaces

Though this thesis developed the theoretical foundation of HDM only for tangent bundles, the framework can be used in practice on a wide range of data sets that possess fibre bundle structures. In this section, we apply HDM to a data set consisting of 50 discretized triangular meshes of the second mandibular molar of prosimian primates and nonprimate close relatives¹. The 50 meshes are evenly divided into 5 genera: Alouatta, Ateles, Brachyteles, Callicebus, and Saimiri; each mesh contains about 5000 vertices and 10000 faces. We compute the continuous Procrustes distance (CPD) and the hypoelliptic base diffusion distance (HBDD) between each pair of triangular meshes in this data set, and compare the two resulting distance matrices by embedding them into $\mathbb{R}^3$ via multi-dimensional scaling (MDS).

1 All triangular meshes in this data set can be downloaded from http://morphosource.org/.

The algorithm used here for computing CPD is detailed in [3]. The HBDD is constructed from the distances and smooth maps computed in CPD, as follows. For each pair of triangular meshes $S_i, S_j$ in the data set, denote their CPD as $d_{ij}$, and the smooth map from $S_i$ to $S_j$ as $f_{ij}$. Note that

\[ d_{ij} = d_{ji}, \qquad f_{ij} = f_{ji}^{-1}. \]

In the first step, we discretize each surface area measure $\mu_j = d\mathrm{vol}_{S_j}$ into a linear combination of Dirac delta measures supported on the vertices of $S_j$, where each vertex of $S_j$ is assigned $1/3$ of the surface area of its one-ring neighborhood. We then soften each bijective smooth map $f_{ij}$ into a random walk matrix $w_{ij}$, the $s$-th row of which records the transition probabilities from vertex $x_{i,s}$ of $S_i$ to each vertex of $S_j$; moreover, the specific softening we choose here allows each $x_{i,s}$ to jump (in one step) only to the three vertices of the unique triangular face on $S_j$ that contains $f_{ij}(x_{i,s})$². If $x_{j,r}$ is a vertex on $S_j$ that can be reached from $x_{i,s}$ in one step of the random walk, we set the transition probability between $x_{i,s}$ and $x_{j,r}$ proportional to

\[ \exp\left(-\frac{\left\|f_{ij}(x_{i,s}) - x_{j,r}\right\|^2}{\epsilon_F}\right), \]

where $\epsilon_F$ is some prescribed positive constant that plays the role of the “bandwidth along the fibre” in HDM. In this application, we choose $\epsilon_F = 0.001$, which is the order of magnitude of the average distance between adjacent vertices on each mesh in the data set. Next, we construct the hypoelliptic diffusion matrix $H$ as a $50 \times 50$ block

2 It is conceivable that $f_{ij}(x_{i,s})$ could fall on an edge shared by two triangles in $S_j$, or even on a vertex of $S_j$ shared by more than two triangles. While this rarely happens in practice, our implementation for this application resolves such conflicts by assigning $f_{ij}(x_{i,s})$ randomly to any of the qualified triangles. This choice is harmless: we express $f_{ij}(x_{i,s})$ as a barycentric combination of the vertices of the triangle to which it is assigned, and thus the softening is in fact independent of the specific choice made.

matrix, with block $(i,j)$ given by

\[
\begin{cases}
\exp\left(-\dfrac{d_{ij}^2}{\epsilon_B}\right)\cdot w_{ij} & \text{if } S_j \text{ is within the } N_B\text{-neighborhood of } S_i \text{ under CPD}, \\[2mm]
0 & \text{otherwise.}
\end{cases}
\]

In this application, $N_B = 4$ and $\epsilon_B = 0.03$. These parameters are chosen empirically, where 0.03 is usually the maximum CPD between surfaces that belong to the same species group. We then construct the normalized graph hypoelliptic Laplacian $L^H_{\alpha,*}$ from $H$, as in (3.3.11), and solve for the largest 100 eigenvalues and corresponding eigenvectors of $L^H_{\alpha,*}$. Using these eigenvalues and eigenvectors, we compute the hypoelliptic base diffusion map as in (3.3.16), resulting in an embedding of the data set into $\mathbb{R}^{\binom{100}{2}} = \mathbb{R}^{4950}$. Though this embedding is still high dimensional, it is only about $1/3$ of the original dimensionality (approximately $5000 \times 3 = 15000$). The HBDD between each pair $S_i, S_j$ is then defined as the Euclidean distance between their images in $\mathbb{R}^{4950}$, as in (3.3.18). See Figure 5.1 for the comparison of the MDS embeddings of the two distance matrices. Note that the HBDD distance matrix demonstrates a much clearer pattern of genus clusters.
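The softening step and the block scaling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function names are hypothetical, points are plain coordinate tuples, and whether $S_j$ lies in the $N_B$-neighborhood of $S_i$ is passed in as a precomputed flag.

```python
import math

def soften_to_triangle(image_point, triangle_vertices, eps_f):
    """Transition probabilities from a source vertex to the three vertices of the
    target triangle containing its image, proportional to exp(-|f(x) - y|^2 / eps_f)."""
    sq_dists = [sum((p - q) ** 2 for p, q in zip(image_point, vertex))
                for vertex in triangle_vertices]
    # Subtracting the smallest squared distance leaves the normalized
    # probabilities unchanged and avoids underflow when eps_f is tiny.
    shift = min(sq_dists)
    weights = [math.exp(-(s - shift) / eps_f) for s in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def block_weight(d_ij, in_neighborhood, eps_b):
    """Scalar factor exp(-d_ij^2 / eps_b) multiplying the block w_ij, or 0 outside
    the neighborhood (the neighborhood test itself is assumed to be given)."""
    return math.exp(-d_ij ** 2 / eps_b) if in_neighborhood else 0.0

# Toy data: a right triangle and an image point close to its first vertex.
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
probs = soften_to_triangle((0.1, 0.1, 0.0), triangle, eps_f=0.5)
scale = block_weight(0.03, True, eps_b=0.03)
```

In this toy example the vertex nearest to the image point receives the largest probability and the two equidistant vertices receive equal probabilities; the values of `eps_f` and the coordinates are chosen only for readability.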

Figure 5.1: MDS embeddings of CPD (left) and HBDD (right)

For applications in geometric morphometrics, a major advantage that HDM has over persistence-diagram-based methods is its morphological interpretability. This interpretability amounts to a globally consistent way of identifying corresponding regions on each surface in the collection; this has the potential to be useful for subsequent studies of the evolutionary and developmental history of species. In standard morphologists’ practice, such correspondences are assessed visually and manually; recent progress in techniques for generating and analyzing digital representations has led to major advances [159, 154, 118], but these still require the determination of anatomical landmarks by observers. In contrast, by spectral clustering on the point cloud embedded into $\mathbb{R}^{100}$ by the hypoelliptic diffusion map (see (3.3.19)), we can easily obtain a globally consistent segmentation for each surface in the collection; see Figure 5.2.

Figure 5.2: Globally consistent segmentation by spectral clustering for HDM

5.2 Automatic Landmarking for Morphometrics via Sparsity

Beyond globally consistent segmentation, we further demonstrate the potential of the HDM framework in another application: the automatic generation of globally consistent landmarks. Instead of regions, we now generate points, similar to the practice of morphologists. Figure 5.3 illustrates three near-minimizers (one for each row) of the sparse eigen-problem

\[ \min_{\|x\|_2 = 1}\ \left\|L^H_{\alpha,*}\,x\right\|_2^2 + \lambda\,\|x\|_1 \]

that are orthogonal to each other, obtained by the Matching Pursuit algorithm.
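To make the role of the $\ell_1$ penalty concrete, the sketch below evaluates the objective above on a toy matrix. It is an illustration only: the matrix `L` is a hypothetical graph Laplacian of two disconnected edges standing in for $L^H_{\alpha,*}$, the value of `lam` is arbitrary, and no Matching Pursuit step is implemented.

```python
import math

def sparse_objective(L, x, lam):
    """Value of ||L x||_2^2 + lam * ||x||_1 (L given as a list of rows)."""
    Lx = [sum(a * b for a, b in zip(row, x)) for row in L]
    return sum(t * t for t in Lx) + lam * sum(abs(t) for t in x)

def normalize(x):
    """Rescale x to unit Euclidean norm."""
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x]

# Toy Laplacian of a graph with two connected components {0, 1} and {2, 3}.
L = [[1.0, -1.0, 0.0, 0.0],
     [-1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, -1.0],
     [0.0, 0.0, -1.0, 1.0]]
x_dense = normalize([1.0, 1.0, 1.0, 1.0])    # null vector spread over both components
x_sparse = normalize([1.0, 1.0, 0.0, 0.0])   # null vector supported on one component
obj_dense = sparse_objective(L, x_dense, lam=0.1)
obj_sparse = sparse_objective(L, x_sparse, lam=0.1)
```

Both vectors are annihilated by `L`, so the quadratic term vanishes and the objective reduces to `lam` times the $\ell_1$ norm; the localized vector has the smaller value, which is the mechanism by which the penalty favors sparse near-eigenvectors.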

[Figure 5.3 panels: rows $x_1$, $x_2$, $x_3$; columns $S_1$, $S_2$, $S_3$, $S_4$]

Figure 5.3: Automatic landmarking using sparse eigenvectors for a group of 4 teeth

Although the minimization procedure only promoted the selection of sparse $x$, and in no way biased the vectors to distribute their entries over the four meshes, each of $x_1, x_2, x_3$ automatically has only a single non-zero entry on each triangular mesh; moreover, those entries (indicated by the blue color of their Voronoi regions) point to homologous landmarks on this species group.

5.3 Tree-Based Metric Approximation in Shape Spaces

In practice, inconsistency along cycles of length 3 or larger is inevitable for pairwise-correspondence-based analysis of collections of shapes; see Figure 3.2. In other words, the correspondence between two shapes $S_i, S_j$ depends on the choice of intermediate shapes that form a path connecting $S_i$ and $S_j$. In the framework of generalized Procrustes distance, Puente [119] suggested resolving this ambiguity by confining all paths to a Minimum Spanning Tree (MST) of the complete distance graph defined by all pairwise generalized Procrustes distances. In the context of non-rigid shape recognition and analysis, similar methods have been studied in [72, 69, 33]; from an algorithmic point of view, this idea of approximating a finite metric space with additive metrics (tree metrics/ultrametrics) can be traced back to Karp [77], and has been popular in the study of metric embeddings [4, 81, 8, 121, 9, 32, 52]. The tree-based metric space approximation strategy can be directly adapted to the continuous Procrustes distance (CPD) and the continuous Procrustes-Kantorovich distance (CKPD). Similar to [119], we first construct the complete weighted distance graph in which vertices stand for shapes in the collection, and each edge is assigned a pairwise distance together with a pairwise correspondence. We then have the choice of constructing an MST, a Shortest Path Tree (SPT), or a Light Approximate Shortest-path Tree (LAST) [81] for this distance graph; for each choice, a unique path $\gamma_{ij}$ is then fixed for any pair of shapes $S_i, S_j$. We then update the pairwise correspondence between $S_i, S_j$ with the composition of the correspondences attached to the edges in $\gamma_{ij}$. The tree-construction algorithms often require repeated evaluations of the length of paths under consideration for insertion into the tree, a type of operation for which we again choose between several options. For instance, we can use either the sum of edge weights, or the value of the shape functional (1.3.1) for the correspondence under consideration. All these choices are presently being tested and compared on real data sets that were used in [25]; these results will appear in a future work [56]. We expect that the greater sensitivity shown by our new approaches, reflected in Figure 5.2, may also provide for better correspondence-based path construction in this application.
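The MST variant of this strategy can be sketched as follows. This is a simplified illustration, not the pipeline under test in [56]: correspondences are modelled as hypothetical vertex-index maps between meshes with equally many vertices, the tree is built with Prim's algorithm, and path length is taken to be the sum of edge weights.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a complete distance graph; returns adjacency lists."""
    n = len(dist)
    in_tree = {0}
    adj = {i: [] for i in range(n)}
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist[e[0]][e[1]])
        adj[i].append(j)
        adj[j].append(i)
        in_tree.add(j)
    return adj

def tree_path(adj, src, dst):
    """The unique path from src to dst in a tree, found by depth-first search."""
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None

def compose_along_path(path, maps):
    """Compose index maps maps[(a, b)] : S_a -> S_b along consecutive path edges.
    Returns None for a path with no edges."""
    composed = None
    for a, b in zip(path, path[1:]):
        step = maps[(a, b)]
        composed = list(step) if composed is None else [step[k] for k in composed]
    return composed

# Toy collection of three shapes; the direct edge 0-2 is long, so the tree path detours via 1.
dist = [[0.0, 1.0, 5.0], [1.0, 0.0, 1.0], [5.0, 1.0, 0.0]]
maps = {(0, 1): [1, 2, 0], (1, 2): [2, 0, 1]}
adj = minimum_spanning_tree(dist)
path = tree_path(adj, 0, 2)
updated_map = compose_along_path(path, maps)
```

Replacing `minimum_spanning_tree` by an SPT or LAST construction would change only the tree; the path lookup and the composition step stay the same.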

5.4 Phylogeny Reconstruction with Distance Methods

Distance methods are among the most straightforward approaches in molecular-based phylogeny reconstruction [157]. Typically two steps are involved: calculation of genetic distances between pairs of species and reconstruction of a phylogenetic tree from the distance matrix. Since distance matrices are available in our framework, phylogenies can be reconstructed using these methods as well.

Figure 5.4: Phylogenies of a collection of 19 lemur teeth from 5 extant genera, constructed by applying UPGMA to shape distance matrices. (a) cPComposedLAST; (b) cPComposedLASTmedian.

The Unweighted Pair Group Method with Arithmetic Mean (UPGMA) is one of the simplest distance-based phylogeny reconstruction techniques. Figure 5.4 illustrates two phylogenies reconstructed from distance matrices obtained by two slightly different tree-based shape space approximations. The cPComposedLAST phylogeny seems to give better results than the cPComposedLASTmedian phylogeny, in that the former groups together all the samples from the same genus before linking with samples from a different genus; in cPComposedLASTmedian there are two anomalies in this respect (K07 Tarsius and j14 Cheirogaleus). Note that the cPComposedLAST tree is not perfect, however: comparing with the ground truth³ in Figure 5.5 (validated by molecular-based methods), we see that Figure 5.4(a)

3 http://10ktrees.fas.harvard.edu/

attaches the Tarsius genus in the wrong place. Since the ground truth uses a much larger molecular database obtained from many more individual animals, it is probably not surprising that the evolutionary history is not revealed correctly by the cPComposedLAST tree, given the very small sample from which it is constructed here. In the future, we intend to apply our methods to larger morphological data sets, and also combine the outputs with other phylogeny reconstruction methods.
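For readers unfamiliar with UPGMA, the following is a minimal sketch of the clustering rule applied to a distance matrix. It is a generic textbook version with hypothetical labels and distances, not the code that produced Figure 5.4; the tree is returned as nested tuples `(left, right, height)`.

```python
def upgma(dist, labels):
    """UPGMA: repeatedly merge the two closest clusters, averaging distances
    weighted by cluster size. Returns a nested tuple (left, right, height)."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Size-weighted arithmetic mean of the distances to the merged clusters.
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {key: val for key, val in d.items() if a not in key and b not in key}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2.0), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Toy distance matrix: A and B are close to each other, C is far from both.
labels = ["A", "B", "C"]
dist = [[0.0, 2.0, 6.0],
        [2.0, 0.0, 6.0],
        [6.0, 6.0, 0.0]]
tree = upgma(dist, labels)
```

In this toy example A and B merge first at height 1.0, and C joins the pair at height 3.0.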

Figure 5.5: Phylogenetic tree estimated with molecular-based methods.

Appendix A

The Geometry of Tangent Bundles

In this appendix, we briefly summarize some preliminaries on the geometry of tangent and unit tangent bundles. For the Sasaki metric [128, 129], readers may find useful the expositions in [64, 45, 106], or start from [44, Exercise 3.2]; for the unit tangent bundle, some results are collected in [21, 20] and the references therein. We define horizontal differential operators by directly lifting vector fields from the base manifold to the fibre bundle, a construction that in principle applies to any diffusion operator [51].

A.0.1 Coordinate Charts on Tangent Bundles

Let $M$ be a $d$-dimensional Riemannian manifold, $TM$ its tangent bundle, and $\pi: TM \to M$ the canonical projection from $TM$ to $M$. In a local coordinate chart $\left(U; x^1,\dots,x^d\right)$ of $M$, $\left\{\partial/\partial x^1,\dots,\partial/\partial x^d\right\}$ is a local frame, and we write the basis for $T_xM$ as

\[ \left\{ \frac{\partial}{\partial x^1}\bigg|_x, \dots, \frac{\partial}{\partial x^d}\bigg|_x \right\}. \]

A trivialization for $TM$ on $U$ can be chosen as

\[ (x,v) \mapsto \left(x^1,\dots,x^d,v^1,\dots,v^d\right), \qquad x\in U,\ v\in T_xM, \]

and we write

\[ \left\{ \frac{\partial}{\partial x^1}\bigg|_{(x,v)}, \dots, \frac{\partial}{\partial x^d}\bigg|_{(x,v)}, \frac{\partial}{\partial v^1}\bigg|_{(x,v)}, \dots, \frac{\partial}{\partial v^d}\bigg|_{(x,v)} \right\} \]

for a natural basis for $T_{(x,v)}TM$. It is immediately verifiable that

\[ d\pi_{(x,v)}\left(\frac{\partial}{\partial x^j}\bigg|_{(x,v)}\right) = \frac{\partial}{\partial x^j}\bigg|_x, \qquad d\pi_{(x,v)}\left(\frac{\partial}{\partial v^j}\bigg|_{(x,v)}\right) = 0, \qquad j = 1,\dots,d, \]

where $d\pi_{(x,v)}: T_{(x,v)}TM \to T_xM$ denotes the differential of the canonical projection $\pi$ at $(x,v)\in TM$. Note our usage of “$|_x$” and “$|_{(x,v)}$” to distinguish tangent vectors in $T_xM$ from those in $T_{(x,v)}TM$. Even when a connection is not present, the vertical tangent vectors to $TM$ are well-defined. It suffices to take the subspace spanned by tangent vectors along the $v$-coordinates

\[ VT_{(x,v)}M := \mathrm{span}\left\{ \frac{\partial}{\partial v^j}\bigg|_{(x,v)},\ j = 1,\dots,d \right\} = \mathrm{Ker}\left(d\pi_{(x,v)}\right). \tag{A.0.1} \]

We call the subbundle of $T(TM)$ consisting of all vertical tangent vectors the vertical tangent bundle

\[ VTM := \coprod_{(x,v)\in TM} VT_{(x,v)}M = \mathrm{Ker}\,(d\pi). \tag{A.0.2} \]

This immediately gives

\[ T(TM)/VTM \simeq \pi^{*}TM. \]

Using the Levi-Civita connection on $M$, we can find another subbundle of $T(TM)$, called the horizontal tangent bundle

\[ HTM := \coprod_{(x,v)\in TM} HT_{(x,v)}M, \tag{A.0.3} \]

where

\[ HT_{(x,v)}M := \mathrm{span}\left\{ \frac{\partial}{\partial x^j}\bigg|_{(x,v)} - \Gamma^{\beta}_{\alpha j}(x)\,v^{\alpha}\,\frac{\partial}{\partial v^{\beta}}\bigg|_{(x,v)},\ j = 1,\dots,d \right\}. \tag{A.0.4} \]

The symbols $\Gamma^{\beta}_{\alpha j}$ are the connection coefficients of the Levi-Civita connection, or the Christoffel symbols. The tangent bundle of $TM$ splits into the direct sum of its horizontal and vertical components

\[ T(TM) = HTM \oplus VTM. \tag{A.0.5} \]

The Sasaki metric is a natural metric [64] on $TM$. For two tangent vectors $X, Y \in T_{(x,v)}TM$, choose curves in $TM$

\[ \alpha: t \mapsto (p(t), u(t)), \qquad \beta: s \mapsto (q(s), w(s)), \]

with $\alpha'(0) = X$, $\beta'(0) = Y$, such that

\[ p(0) = q(0) = x, \qquad u(0) = w(0) = v, \]

and define

\[ \langle X, Y\rangle_{(x,v)} = \langle d\pi(X), d\pi(Y)\rangle_x + \left\langle \frac{Du}{dt}(0), \frac{Dw}{ds}(0)\right\rangle_x, \]

where $Du/dt$ and $Dw/ds$ are covariant derivatives. Using the horizontal-vertical splitting of $T(TM)$, this metric can be equivalently defined as

\[
\begin{aligned}
\langle X, Y\rangle_{(x,v)} &= \langle d\pi(X), d\pi(Y)\rangle_x && \text{if } X, Y \in HT_{(x,v)}M, \\
\langle X, Y\rangle_{(x,v)} &= \langle X, Y\rangle_x && \text{if } X, Y \in VT_{(x,v)}M, \\
\langle X, Y\rangle_{(x,v)} &= 0 && \text{if } X \in HT_{(x,v)}M,\ Y \in VT_{(x,v)}M.
\end{aligned}
\tag{A.0.6}
\]

In words, the Sasaki metric imposes orthogonality between horizontal and vertical tangent bundles, and adopts metrics on $HTM$ and $VTM$ induced from the Riemannian metric on $M$.

A.0.2 Horizontal and Vertical Differential Operators on Tangent Bundles

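Let $\Gamma$ denote the Christoffel symbols of the Levi-Civita connection on $M$. Define the horizontal lift operator $L: T_xM \to T_{(x,v)}TM$, from the tangent space of $M$ at $x\in M$ to the tangent space of $TM$ at $(x,v)\in TM$, by

\[ L\left(\frac{\partial}{\partial x^j}\bigg|_x\right) = \frac{\partial}{\partial x^j}\bigg|_{(x,v)} - \Gamma^{\beta}_{\alpha j}(x)\,v^{\alpha}\,\frac{\partial}{\partial v^{\beta}}\bigg|_{(x,v)}. \]

It is straightforward to verify that this definition is independent of coordinates, as for (A.0.4).

$L$ can be used to define “horizontal” first order differential operators on $TM$: just lift vector fields from $M$ to $TM$. For instance, the gradient operator on $M$, denoted as $\nabla: C^{\infty}(M) \to \Gamma(TM)$, is defined by

\[ \langle \nabla f, v\rangle_g = df(v) \quad \text{for all } v \in TM, \]

or in local coordinates

\[
\begin{aligned}
& g_{jk}\,(\nabla f)^j\,v^k = \frac{\partial f}{\partial x^k}\,v^k \\
\Rightarrow\ & g_{jk}\,(\nabla f)^j = \frac{\partial f}{\partial x^k}, \qquad k = 1,\dots,d \\
\Rightarrow\ & (\nabla f)(x) = g^{ik}(x)\,\frac{\partial f}{\partial x^k}\bigg|_x\,\frac{\partial}{\partial x^i}\bigg|_x.
\end{aligned}
\]

Note that here $f$ is a smooth function on $M$. The horizontal gradient operator on $TM$ can be defined using $L$ as follows:

\[
\begin{aligned}
\nabla^H f(x,v) &= g^{ik}(x)\left[L\left(\frac{\partial}{\partial x^k}\bigg|_x\right) f\right] L\left(\frac{\partial}{\partial x^i}\bigg|_x\right) \\
&= g^{ik}(x)\left(\frac{\partial f}{\partial x^k}\bigg|_{(x,v)} - \Gamma^{\beta}_{\alpha k}(x)\,v^{\alpha}\,\frac{\partial f}{\partial v^{\beta}}\bigg|_{(x,v)}\right)\left(\frac{\partial}{\partial x^i}\bigg|_{(x,v)} - \Gamma^{\beta}_{\alpha i}(x)\,v^{\alpha}\,\frac{\partial}{\partial v^{\beta}}\bigg|_{(x,v)}\right).
\end{aligned}
\tag{A.0.7}
\]

Similarly, for the Laplace-Beltrami operator on $M$

\[ \Delta f(x) = \frac{1}{\sqrt{|g(x)|}}\,\frac{\partial}{\partial x^j}\bigg|_x\left(\sqrt{|g(x)|}\,g^{jk}(x)\,\frac{\partial f}{\partial x^k}\bigg|_x\right) \]

(where $|g| = |\det g|$), its horizontal counterpart is

\[
\begin{aligned}
\Delta^H f(x,v) &= \frac{1}{\sqrt{|g(x)|}}\,L\left(\frac{\partial}{\partial x^j}\bigg|_x\right)\left[\sqrt{|g(x)|}\,g^{jk}(x)\,L\left(\frac{\partial}{\partial x^k}\bigg|_x\right) f\right] \\
&= \frac{1}{\sqrt{|g(x)|}}\left(\frac{\partial}{\partial x^j}\bigg|_{(x,v)} - \Gamma^{\beta}_{\alpha j}(x)\,v^{\alpha}\,\frac{\partial}{\partial v^{\beta}}\bigg|_{(x,v)}\right)\left[\sqrt{|g(x)|}\,g^{jk}(x)\left(\frac{\partial f}{\partial x^k}\bigg|_{(x,v)} - \Gamma^{\beta}_{\alpha k}(x)\,v^{\alpha}\,\frac{\partial f}{\partial v^{\beta}}\bigg|_{(x,v)}\right)\right].
\end{aligned}
\tag{A.0.8}
\]

In a geodesic normal neighborhood centered at some fixed $x\in M$, $\Gamma^k_{ij}(x) = 0$ for all indices $1\le i,j,k\le d$. Consequently, (A.0.7) and (A.0.8) simplify as

\[ \nabla^H f(x,v) = \sum_{k=1}^{d} \frac{\partial f}{\partial x^k}\bigg|_{(x,v)}\,\frac{\partial}{\partial x^k}\bigg|_{(x,v)} \tag{A.0.9} \]

and

\[ \Delta^H f(x,v) = \sum_{k=1}^{d} \frac{\partial^2 f}{\partial (x^k)^2}\bigg|_{(x,v)} + \frac{1}{3}R_{\alpha\beta}(x)\,v^{\alpha}\,\frac{\partial f}{\partial v^{\beta}}\bigg|_{(x,v)}. \tag{A.0.10} \]

The definition of vertical differential operators does not depend on the Levi-Civita connection. The vertical gradient operator on $TM$ is simply

\[ \nabla^V f(x,v) = g^{ik}(x)\,\frac{\partial f}{\partial v^k}\bigg|_{(x,v)}\,\frac{\partial}{\partial v^i}\bigg|_{(x,v)}, \tag{A.0.11} \]

and the vertical Laplace-Beltrami operator on $TM$ is

\[
\begin{aligned}
\Delta^V f(x,v) &= \frac{1}{\sqrt{|g(x)|}}\,\frac{\partial}{\partial v^j}\bigg|_{(x,v)}\left(\sqrt{|g(x)|}\,g^{jk}(x)\,\frac{\partial f}{\partial v^k}\bigg|_{(x,v)}\right) \\
&= g^{jk}(x)\,\frac{\partial^2 f}{\partial v^j\,\partial v^k}\bigg|_{(x,v)}.
\end{aligned}
\tag{A.0.12}
\]

The coordinate independence of these vertical differential operators follows from the observation that the $v$-components of the coordinates “behave like” the $x$-components under change of coordinates:

\[
\begin{aligned}
& v = v^j\,\frac{\partial}{\partial x^j}\bigg|_x = v^j\,\frac{\partial\tilde{x}^k}{\partial x^j}\bigg|_x\,\frac{\partial}{\partial\tilde{x}^k}\bigg|_x = \tilde{v}^k\,\frac{\partial}{\partial\tilde{x}^k}\bigg|_x = \tilde{v}^k\,\frac{\partial x^j}{\partial\tilde{x}^k}\bigg|_x\,\frac{\partial}{\partial x^j}\bigg|_x \\
\Rightarrow\ & v^j = \tilde{v}^k\,\frac{\partial x^j}{\partial\tilde{x}^k}\bigg|_x, \quad \tilde{v}^j = v^k\,\frac{\partial\tilde{x}^j}{\partial x^k}\bigg|_x \quad\Rightarrow\quad \frac{\partial v^j}{\partial\tilde{v}^k} = \frac{\partial x^j}{\partial\tilde{x}^k}, \quad \frac{\partial\tilde{v}^j}{\partial v^k} = \frac{\partial\tilde{x}^j}{\partial x^k},
\end{aligned}
\tag{A.0.13}
\]

or equivalently, they have the same Jacobian. Therefore, the coordinate invariance of $\nabla$ and $\Delta$ is equivalent to the coordinate invariance of $\nabla^V$ and $\Delta^V$. For the same reason, the volume form on $T_xM$

\[ dV_x(v) = \sqrt{|g(x)|}\;dv^1\wedge\cdots\wedge dv^d \tag{A.0.14} \]

is also coordinate invariant, since the volume form on $M$

\[ d\mathrm{vol}_M(x) = \sqrt{|g(x)|}\;dx^1\wedge\cdots\wedge dx^d \tag{A.0.15} \]

is well-defined. The volume element on $TM$ with respect to the Sasaki metric is locally the product of the volume forms in (A.0.14) and (A.0.15). According to Liouville’s Theorem, this volume element is invariant under geodesic flows (see, e.g., [44, Exercise 3.14]). Note that the existence of a volume element on a general fibre bundle is not guaranteed; the tangent bundle is special in that its total manifold is always orientable regardless of the orientability of its base manifold (see, e.g., [44, Exercise 0.2]). The volume element on the tangent bundle of $M$ induces a volume element on the unit tangent bundle of $M$, as we shall see below.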

A.0.3 The Unit Tangent Bundle as a Subbundle

The hypoelliptic diffusion map constructed in Section 3.3 works for both compact and non-compact manifolds. In practice, due to the constraint of finite sampling, we prefer to apply HDM to a compact object. Assuming $M$ is compact is not sufficient, since $TM$ is always non-compact. This motivates us to apply HDM to the unit tangent bundle $UTM$, a natural subbundle of $TM$ with compact fibres. Following the notations used in Section A.0.1, a unit tangent bundle over a Riemannian manifold $(M,g)$ is a subbundle of $TM$, with fibre over $x\in M$ consisting of the tangent vectors in $T_xM$ with unit length:

\[ UTM := \coprod_{x\in M} S_x, \qquad S_x := \left\{ v\in T_xM \mid g_x(v,v) = 1 \right\} \subset T_xM, \]

where $g_x(\cdot,\cdot)$ denotes the inner product on $T_xM$, defined by the Riemannian metric on $M$. Note that $UTM$ is a hypersurface of $TM$, and thus inherits a metric from that of $TM$. The volume form on $UTM$ with respect to the induced metric, known as the Liouville measure or the kinematic density [35, Chapter VII], is the only invariant measure on $UTM$ under geodesic flows.

We can define the gradient and the Laplace-Beltrami operator on $S_x$ with respect to the metric on $UTM$, denoted as $\nabla_{S_x}$ and $\Delta_{S_x}$, respectively. The vertical spherical gradient $\nabla^V_S$ and vertical spherical Laplace-Beltrami operator $\Delta^V_S$ can be defined through $\nabla_{S_x}$ and $\Delta_{S_x}$, similar to (A.0.11) and (A.0.12):

\[
\begin{aligned}
\nabla^V_S f(x,v) &:= \nabla_{S_x} f(x,v), \qquad x\in M,\ v\in S_x, \\
\Delta^V_S f(x,v) &:= \Delta_{S_x} f(x,v), \qquad x\in M,\ v\in S_x,
\end{aligned}
\tag{A.0.16}
\]

where we implicitly identified $f$ with its restriction to $S_x$ when $\nabla_{S_x}$ and $\Delta_{S_x}$ are applied to it, as long as no confusion arises. In order to define the horizontal lifts of $\nabla$ and $\Delta$ from $M$ to $UTM$, we take advantage of the fact that $UTM$ is a subbundle of $TM$, and set

\[
\begin{aligned}
\nabla^H_S f(x,v) &:= \nabla^H \hat{f}(x,v), \\
\Delta^H_S f(x,v) &:= \Delta^H \hat{f}(x,v),
\end{aligned}
\tag{A.0.17}
\]

where $\hat{f}$ is an arbitrary extension of $f$ from $C^{\infty}(UTM)$ to $C^{\infty}(TM)$. We can show that the definition in (A.0.17) does not depend on any specific choice of extension. In fact, from (A.0.7) and (A.0.8) it is clear that the value of $\nabla^H\hat{f}$, $\Delta^H\hat{f}$ at $(x,v)\in S_x$ only depends on the data of $\hat{f}$ along the flow generated by the vector fields

\[ \frac{\partial}{\partial x^j}\bigg|_{(x,v)} - \Gamma^{\beta}_{\alpha j}(x)\,v^{\alpha}\,\frac{\partial}{\partial v^{\beta}}\bigg|_{(x,v)}, \qquad j = 1,\dots,d. \tag{A.0.18} \]

Thus it suffices to show that such a flow, if started at a point $(x,v)\in UTM$, will remain on $UTM$ for all time. Consider a curve $\gamma: t\mapsto\gamma(t)\in TM$ starting at $(x,v)\in UTM$ that follows the direction of one of the vector fields in (A.0.18), say the one indexed by $j$. Write $\gamma(t)$ in coordinates as

\[ \gamma(t) = (x(t), v(t)) = \left(x^1(t),\dots,x^d(t),v^1(t),\dots,v^d(t)\right). \]

By construction,

\[
\begin{aligned}
\gamma'(t) &= \sum_{k=1}^{d}\left(\frac{dx^k(t)}{dt}\,\frac{\partial}{\partial x^k}\bigg|_{(x(t),v(t))} + \frac{dv^k(t)}{dt}\,\frac{\partial}{\partial v^k}\bigg|_{(x(t),v(t))}\right) \\
&= \frac{\partial}{\partial x^j}\bigg|_{(x(t),v(t))} - \Gamma^{\beta}_{\alpha j}(x(t))\,v^{\alpha}(t)\,\frac{\partial}{\partial v^{\beta}}\bigg|_{(x(t),v(t))},
\end{aligned}
\]

which implies

\[ \frac{dx^j(t)}{dt} = 1, \qquad \frac{dx^k(t)}{dt} = 0, \quad k\ne j,\ k = 1,\dots,d, \]

and

\[ \frac{dv^{\beta}(t)}{dt} = -\Gamma^{\beta}_{\alpha j}(x(t))\,v^{\alpha}(t), \qquad \beta = 1,\dots,d, \]

by linear independence. In other words, $v(t)$ is indeed the parallel transport of $v(0)$ along the curve $\pi\circ\gamma: t\mapsto x(t)$ on $M$, since $v(t)$ satisfies

\[ \frac{dv^k(t)}{dt} + \Gamma^k_{ij}(x(t))\,v^j(t)\,\frac{dx^i(t)}{dt} = 0, \qquad k = 1,\dots,d, \]

which is the same equation that defines the parallel transport on $M$, starting from $v(0)$ and along the curve $\pi\circ\gamma(t) = x(t)$. Therefore, if $\gamma(0) = (x(0),v(0))\in UTM$, then $\gamma(t) = (x(t),v(t))$ will stay on $UTM$ for all time, since the parallel transport is an isometry between tangent spaces and thus preserves the unit length of $v(0) = v$.
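The norm-preservation argument above can be illustrated numerically. The sketch below is an example under stated assumptions, not part of the appendix: it takes $M$ to be the unit 2-sphere in spherical coordinates $(\theta,\phi)$, with metric $\mathrm{diag}(1,\sin^2\theta)$ and nonzero Christoffel symbols $\Gamma^{\theta}_{\phi\phi} = -\sin\theta\cos\theta$, $\Gamma^{\phi}_{\theta\phi} = \Gamma^{\phi}_{\phi\theta} = \cot\theta$, and integrates the parallel transport equation along a latitude circle with a classical Runge-Kutta scheme. The function names are hypothetical.

```python
import math

def transport_along_latitude(theta0, v, t_end, steps=2000):
    """RK4 integration of dv^k/dt + Gamma^k_{ij} v^j dx^i/dt = 0 on the unit
    2-sphere along the curve (theta, phi) = (theta0, t)."""
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(w):
        # dv^theta/dt = -Gamma^theta_{phi phi} v^phi = sin*cos * v^phi
        # dv^phi/dt   = -Gamma^phi_{phi theta} v^theta = -cot * v^theta
        return (s * c * w[1], -(c / s) * w[0])

    h = t_end / steps
    w = (v[0], v[1])
    for _ in range(steps):
        k1 = rhs(w)
        k2 = rhs((w[0] + 0.5 * h * k1[0], w[1] + 0.5 * h * k1[1]))
        k3 = rhs((w[0] + 0.5 * h * k2[0], w[1] + 0.5 * h * k2[1]))
        k4 = rhs((w[0] + h * k3[0], w[1] + h * k3[1]))
        w = (w[0] + h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0,
             w[1] + h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0)
    return w

def sphere_norm(theta0, v):
    """Length of v = (v^theta, v^phi) in the round metric at colatitude theta0."""
    return math.sqrt(v[0] ** 2 + (math.sin(theta0) * v[1]) ** 2)

theta0 = math.pi / 3.0
v_start = (1.0, 0.0)  # unit vector pointing along the theta direction
v_end = transport_along_latitude(theta0, v_start, 2.0 * math.pi)
```

After one full loop the transported vector still has unit length, although it has rotated relative to the starting vector (by the angle $2\pi\cos\theta_0 = \pi$ for this latitude), so the flow stays on the unit tangent bundle without returning to its starting point.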

Appendix B

Proofs of Theorems 10, 14, 21, and 23

B.0.4 Proofs of Theorem 10 and Theorem 14

We include in this section a proof of Theorem 14. The proof of Theorem 10 can be similarly constructed. The Einstein summation convention is assumed everywhere unless otherwise specified. Our starting point is the following lemma, reminiscent of [41, Lemma 8] and [133, Lemma B.10].

Lemma 28. Let $\Phi: \mathbb{R}\to\mathbb{R}$ be a smooth function compactly supported in $[0,1]$. Assume $M$ is a $d$-dimensional compact Riemannian manifold without boundary, with injectivity radius $\mathrm{Inj}(M) > 0$. For any $\epsilon > 0$, define the kernel function

\[ \Phi_{\epsilon}(x,y) = \Phi\left(\frac{d_M^2(x,y)}{\epsilon}\right) \tag{B.0.1} \]

on $M\times M$, where $d_M(\cdot,\cdot)$ is the geodesic distance on $M$. If the parameter $\epsilon$ is sufficiently small such that $0\le\sqrt{\epsilon}\le\mathrm{Inj}(M)$, then the integral operator associated with the kernel $\Phi_{\epsilon}$

\[ (\Phi_{\epsilon}\,g)(x) := \int_M \Phi_{\epsilon}(x,y)\,g(y)\,d\mathrm{vol}_M(y) \tag{B.0.2} \]

has the following asymptotic expansion as $\epsilon\to 0$:

\[ (\Phi_{\epsilon}\,g)(x) = \epsilon^{\frac{d}{2}}\left[ m_0\,g(x) + \epsilon\,\frac{m_2}{2}\left(\Delta_M g(x) - \frac{1}{3}\mathrm{Scal}(x)\,g(x)\right) + O\left(\epsilon^2\right)\right], \tag{B.0.3} \]

where $m_0$, $m_2$ are constants that depend on the moments of $\Phi$ and the dimension $d$ of the Riemannian manifold $M$, $\Delta_M$ is the Laplace-Beltrami operator on $M$, and $\mathrm{Scal}(x)$ is the scalar curvature of $M$ at $x$.
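A quick numerical sanity check of (B.0.3) is possible on the unit 2-sphere, where geodesic balls have a closed-form area. The sketch below makes several simplifying assumptions that are not part of the lemma: $\Phi$ is replaced by the (non-smooth) indicator of $[0,1]$, $g\equiv 1$, $d = 2$, and $\mathrm{Scal} = 2$; in this case $m_0 = \pi$ is the area of the unit disk and $m_2 = \pi/4$ is the integral of $(s^1)^2$ over it. The function names are hypothetical.

```python
import math

def ball_area_on_sphere(eps):
    """Exact area of a geodesic ball of radius sqrt(eps) on the unit 2-sphere,
    i.e. the left hand side of (B.0.3) for g = 1 and the indicator kernel."""
    return 2.0 * math.pi * (1.0 - math.cos(math.sqrt(eps)))

def lemma28_prediction(eps):
    """Right hand side of (B.0.3) without the O(eps^2) term, for g = 1, d = 2."""
    m0 = math.pi        # integral of 1 over the unit disk
    m2 = math.pi / 4.0  # integral of (s^1)^2 over the unit disk
    scal = 2.0          # scalar curvature of the unit 2-sphere
    # The Laplacian of the constant function g = 1 vanishes.
    return eps * (m0 - eps * (m2 / 2.0) * (scal / 3.0))

errors = {eps: abs(ball_area_on_sphere(eps) - lemma28_prediction(eps))
          for eps in (0.04, 0.02, 0.01)}
```

Including the prefactor $\epsilon^{d/2} = \epsilon$, the remainder in this example scales like $\epsilon^3$, so halving $\epsilon$ should reduce each entry of `errors` by roughly a factor of 8.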

Proof. We put everything in geodesic normal coordinates centered at x P M. If

1 d 1 2 dM px, yq “ r, let y have geodesic normal coordinates s , ¨ ¨ ¨ , s such that ps q `

2 ¨ ¨ ¨ ` sd “ r2. Note that ` ˘ ` ˘ 2 dM px, yq Φ px, yq g pyq dvolM pyq “ Φ g pyq dvolM pyq ?  żM żB pxq ˆ ˙ (B.0.4) r2 “ Φ g˜ psq dvolM psq ?  żB p0q ˆ ˙

where

1 d 1 d g˜ psq “ g˜ s , ¨ ¨ ¨ , s “ g ˝ expx s e1 ` ¨ ¨ ¨ ` s ed ,

` 1 d˘ ` 1 ˘d dvolM psq “ dvolM s , ¨ ¨ ¨ , s “ dvolM expx s e1 ` ¨ ¨ ¨ ` s ed ` ˘ ` ` ˘˘ with te1, ¨ ¨ ¨ , edu being an orthonormal basis for TxM. A further change of variables

s1 sd r s˜1 “ ? , ¨ ¨ ¨ , s˜d “ ? ;r ˜ “ ?   

leads to

2 r 2 ? ? Φ g˜ psq dvolM psq “ Φ r˜ g˜  s˜ dvolM  s˜ ?  żB p0q ˆ ˙ żB1p0q ` ˘ ` ˘ ` ˘ (B.0.5) 2 ? 1 ? d ? 1 ? d “ Φ r˜ g˜  s˜ , ¨ ¨ ¨ ,  s˜ dvolM  s˜ , ¨ ¨ ¨ ,  s˜ . żB1p0q ` ˘ ` ˘ ` ˘ 107 Recall [115] that in geodesic normal coordinates

1 dvol s1, ¨ ¨ ¨ , sd “ 1 ´ R pxq sksl ` O r3 ds1 ¨ ¨ ¨ dsd M 6 kl „  ` ˘ ` ˘ where Rkl is the Ricci curvature tensor

ij Rkl pxq “ g Rkilj pxq .

Thus

? 1 ? d  k l 3 3 d 1 d dvol  s˜ , ¨ ¨ ¨ ,  s˜ “ 1 ´ R pxq s˜ s˜ ` O  2 r˜ ¨  2 ds˜ ¨ ¨ ¨ ds˜ . (B.0.6) M 6 kl ` ˘ ” ´ ¯ı In the meanwhile, the Taylor expansion ofg ˜ s1, ¨ ¨ ¨ , sd near x reads ` ˘ Bg˜ 1 B2g˜ g˜ s1, ¨ ¨ ¨ , sd “ g˜ p0q ` p0q sj ` p0q sksl ` O r3 Bsj 2 BskBsl ` ˘ ` ˘ and thus

2 ? 1 ? d ? Bg˜ j 1 B g˜ k l 3 3 g˜  s˜ , ¨ ¨ ¨ ,  s˜ “ g pxq` ¨ p0q s˜ `¨ p0q s˜ s˜ `O  2 r˜ . (B.0.7) Bsj 2 BskBsl ` ˘ ´ ¯ Combining (B.0.4)–(B.0.7) and noting that

2 ? Bg˜ j d 1 d Φ r˜ ¨  ¨ p0q s˜ ¨  2 ds˜ ¨ ¨ ¨ ds˜ “ 0 Bsj żB1p0q ` ˘ by the symmetry of the kernel and the domain of integration B1 p0q, we have

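\[
\begin{aligned}
\int_M \Phi_\epsilon(x,y)\, g(y)\, d\mathrm{vol}_M(y)
&= \int_{B_1(0)} \Phi(\tilde r^2)\, \tilde g\!\left(\sqrt{\epsilon}\,\tilde s^1,\dots,\sqrt{\epsilon}\,\tilde s^d\right) d\mathrm{vol}_M\!\left(\sqrt{\epsilon}\,\tilde s\right) \\
&= \int_{B_1(0)} \Phi(\tilde r^2) \left[ g(x) + \sqrt{\epsilon}\,\frac{\partial \tilde g}{\partial s^j}(0)\,\tilde s^j + \frac{\epsilon}{2}\frac{\partial^2 \tilde g}{\partial s^k\partial s^l}(0)\,\tilde s^k\tilde s^l + O\!\left(\epsilon^{3/2}\tilde r^3\right)\right] \\
&\qquad\cdot \left[1 - \frac{\epsilon}{6} R_{kl}(x)\,\tilde s^k\tilde s^l + O\!\left(\epsilon^{3/2}\tilde r^3\right)\right] \epsilon^{d/2}\, d\tilde s^1\cdots d\tilde s^d \\
&= \epsilon^{d/2}\left[ g(x)\int_{B_1(0)}\Phi(\tilde r^2)\, d\tilde s + \epsilon\left(\frac{1}{2}\frac{\partial^2\tilde g}{\partial s^k\partial s^l}(0) - \frac{1}{6}\, g(x) R_{kl}(x)\right)\int_{B_1(0)}\Phi(\tilde r^2)\,\tilde s^k\tilde s^l\, d\tilde s + O(\epsilon^2)\right] \\
&= \epsilon^{d/2}\left[ g(x)\int_{B_1(0)}\Phi(\tilde r^2)\, d\tilde s + \epsilon\sum_{k=1}^d\left(\frac{1}{2}\frac{\partial^2\tilde g}{\partial (s^k)^2}(0) - \frac{1}{6}\, g(x) R_{kk}(x)\right)\int_{B_1(0)}\Phi(\tilde r^2)\,(\tilde s^k)^2\, d\tilde s + O(\epsilon^2)\right],
\end{aligned}
\]
where the last equality follows from the observation that

\[
\int_{B_1(0)} \Phi(\tilde r^2)\,\tilde s^k\tilde s^l\, d\tilde s^1\cdots d\tilde s^d = 0 \quad\text{if } k\neq l,
\]

again by the symmetry of the kernel and the domain of integration $B_1(0)$; the $O(\epsilon^{3/2})$ term vanishes due to the symmetry of the kernel (as argued in [134, §2]). The constants $m_0$, $m_2$ can now be explicitly characterized:

\[
m_0 := \int_{B_1(0)}\Phi(\tilde r^2)\, d\tilde s^1\cdots d\tilde s^d = \int_0^1\!\!\int_{S_1(0)} \Phi(\tilde r^2)\,\tilde r^{d-1}\, d\sigma\, d\tilde r = \omega_{d-1}\int_0^1 \Phi(\tilde r^2)\,\tilde r^{d-1}\, d\tilde r,
\]
\[
m_2 := \int_{B_1(0)}\Phi(\tilde r^2)\,(\tilde s^k)^2\, d\tilde s^1\cdots d\tilde s^d \quad\text{independent of } k\in\{1,\dots,d\} \text{ by symmetry.}
\]
Finally, recall that in geodesic normal coordinates

\[
\sum_{k=1}^d \frac{\partial^2\tilde g}{\partial (s^k)^2}(0) = \Delta_M g(x), \qquad \sum_{k=1}^d R_{kk}(x) = \mathrm{Scal}(x),
\]
from which the conclusion follows: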

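\[
\begin{aligned}
\int_M \Phi_\epsilon(x,y)\, g(y)\, d\mathrm{vol}_M(y)
&= \epsilon^{d/2}\left[ m_0\, g(x) + \epsilon\, m_2\left(\frac{1}{2}\Delta_M g(x) - \frac{1}{6}\mathrm{Scal}(x)\, g(x)\right) + O(\epsilon^2)\right] \\
&= \epsilon^{d/2}\left[ m_0\, g(x) + \epsilon\,\frac{m_2}{2}\left(\Delta_M g(x) - \frac{1}{3}\mathrm{Scal}(x)\, g(x)\right) + O(\epsilon^2)\right].
\end{aligned}
\]

In order to study $K_{\epsilon,\delta}$, we need an expansion for the parallel transport term $P_{y,x}v$. This is established in Lemma 29.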
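Before turning to parallel transport, the expansion of Lemma 28 derived above can be sanity-checked numerically in a case where everything is explicit. The sketch below is only an illustration under stated assumptions: it takes $M$ to be the unit sphere $S^2$ (so $d = 2$ and $\mathrm{Scal} \equiv 2$), sets $g \equiv 1$, and uses the profile $\Phi(u) = (1-u)^2$ on $[0,1]$, which is a hypothetical choice that is only $C^1$ at $u = 1$ rather than smooth. The function names and the Simpson quadrature are not from the text.

```python
import math

def phi(u):
    # Illustrative kernel profile supported on [0, 1] (an assumption, not the thesis's kernel).
    return (1.0 - u) ** 2 if 0.0 <= u <= 1.0 else 0.0

def simpson(f, a, b, n=2000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

# Moments for d = 2: m0 = integral of phi(r^2) over the unit disk,
# m2 = integral of phi(r^2) (s^1)^2 over the unit disk (polar coordinates).
M0 = 2.0 * math.pi * simpson(lambda r: phi(r * r) * r, 0.0, 1.0)
M2 = math.pi * simpson(lambda r: phi(r * r) * r ** 3, 0.0, 1.0)

def kernel_integral_on_sphere(eps):
    # Integral of phi(d^2 / eps) over S^2 with g = 1, using geodesic polar
    # coordinates around x, where dvol = sin(r) dr dtheta.
    return 2.0 * math.pi * simpson(
        lambda r: phi(r * r / eps) * math.sin(r), 0.0, math.sqrt(eps))

def lemma28_prediction(eps, scal=2.0):
    # eps^{d/2} [ m0 g + eps (m2 / 2) (Laplacian g - Scal g / 3) ] with d = 2, g = 1.
    return eps * (M0 + eps * (M2 / 2.0) * (-scal / 3.0))

for eps in (0.04, 0.01):
    gap = kernel_integral_on_sphere(eps) - lemma28_prediction(eps)
    print(eps, gap, gap / eps ** 3)
```

In this example the gap between the two sides scales like $\epsilon^3 = \epsilon^{d/2}\cdot\epsilon^2$, which is the size of the remainder stated in the lemma.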

Lemma 29. Let $M$ be a Riemannian manifold, $x\in M$, $v\in T_xM$. In geodesic normal coordinates around $x$, the parallel transport of $v$ along a geodesic $\gamma: t\mapsto \exp_x t\theta$ ($t\in[0,\epsilon]$, $\|\theta\|_{T_xM} = 1$) has the following asymptotic expansion:

\[
\left(P_{\exp_x t\theta,\,x}\, v\right)^j = v^j - \frac{t^2}{6}\,\theta^k\theta^\sigma v^l\left(R_{l\sigma k}{}^{j}(x) + R_{k\sigma l}{}^{j}(x)\right) + O(t^3),
\]
where $P_{y,x}: T_xM\to T_yM$ denotes the parallel transport from $T_xM$ to $T_yM$ along the geodesic segment connecting $x$ and $y$.

Proof. Let $(s^1,\dots,s^d)$ be a geodesic normal coordinate chart centered at $x\in M$. Assume $v\in T_xM$ takes the expression

\[
v = \sum_{j=1}^d v^j \left.\frac{\partial}{\partial s^j}\right|_x
\]
and let $V:[0,\epsilon]\to TM$ be the parallel transported vector field along the given geodesic $\gamma$,

\[
V(t) = \sum_{j=1}^d V^j(t) \left.\frac{\partial}{\partial s^j}\right|_{\gamma(t)}.
\]
Note that $(s^1)^2 + \cdots + (s^d)^2 = t^2$. Recall that $V$ being parallel along $\gamma$ means
\[
0 = \nabla_{\gamma'(t)} V(t) = \sum_{j=1}^d\left(\frac{dV^j}{dt}(t) + \sum_{k,l=1}^d \frac{d\gamma^k}{dt}(t)\, V^l(t)\, \Gamma^j_{kl}(\gamma(t))\right)\left.\frac{\partial}{\partial s^j}\right|_{\gamma(t)},
\]
or equivalently

\[
\frac{dV^j}{dt}(t) + \sum_{k,l=1}^d \frac{d\gamma^k}{dt}(t)\, V^l(t)\, \Gamma^j_{kl}(\gamma(t)) = 0 \quad\text{for all } j = 1,\dots,d. \tag{B.0.8}
\]
Let $(t;\theta^1,\dots,\theta^d)$ be the geodesic polar coordinates corresponding to $(s^1,\dots,s^d)$. By iteratively using (B.0.8),

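\[
V^j(0) = v^j,
\]

\[
\frac{dV^j}{dt}(0) = -\sum_{k,l=1}^d \frac{d\gamma^k}{dt}(0)\, V^l(0)\, \Gamma^j_{kl}(\gamma(0)) = 0,
\]
\[
\begin{aligned}
\frac{d^2V^j}{dt^2}(0) &= -\sum_{k,l=1}^d \left.\frac{d}{dt}\right|_{t=0}\!\left(\frac{d\gamma^k}{dt}(t)\, V^l(t)\right)\Gamma^j_{kl}(\gamma(0)) - \sum_{k,l=1}^d \frac{d\gamma^k}{dt}(0)\, V^l(0) \left.\frac{d}{dt}\right|_{t=0}\Gamma^j_{kl}(\gamma(t)) \\
&= -\sum_{k,l=1}^d\sum_{\sigma=1}^d \theta^k v^l\, \partial_\sigma\Gamma^j_{kl}(\gamma(0))\, \theta^\sigma = -\frac{1}{3}\,\theta^k\theta^\sigma v^l\left(R_{l\sigma k}{}^{j}(x) + R_{k\sigma l}{}^{j}(x)\right),
\end{aligned}
\]
where the last equality for $\frac{d^2V^j}{dt^2}$ follows from a simple calculation of Christoffel symbols, as follows. Recall that in geodesic normal coordinates

\[
g_{ij} = \delta_{ij} + \frac{1}{3} R_{iklj}(x)\, s^k s^l + O(t^3),
\]
hence

\[
\partial_\mu g_{ij} = \frac{1}{3} R_{i\mu lj}(x)\, s^l + \frac{1}{3} R_{ik\mu j}(x)\, s^k + O(t^2) = \frac{1}{3}\, s^k\left(R_{i\mu kj}(x) + R_{ik\mu j}(x)\right) + O(t^2).
\]
Plugging these partial derivatives into the expression of the Christoffel symbols, we obtain

\[
\begin{aligned}
\Gamma^j_{kl} &= \frac{1}{2}\, g^{\nu j}\left(\partial_k g_{l\nu} + \partial_l g_{k\nu} - \partial_\nu g_{kl}\right) \\
&= \frac{1}{2}\left[\delta^{\nu j} + O(t^2)\right]\times\left[\frac{1}{3}\, s^\sigma\left(R_{lk\sigma\nu}(x) + R_{l\sigma k\nu}(x) + R_{kl\sigma\nu}(x) + R_{k\sigma l\nu}(x) - R_{k\nu\sigma l}(x) - R_{k\sigma\nu l}(x)\right) + O(t^2)\right] \\
&= \frac{1}{6}\,\delta^{\nu j} s^\sigma\left[2R_{l\sigma k\nu}(x) + 2R_{k\sigma l\nu}(x)\right] + O(t^2) = \frac{1}{3}\, s^\sigma\left(R_{l\sigma k}{}^{j}(x) + R_{k\sigma l}{}^{j}(x)\right) + O(t^2),
\end{aligned}
\]
therefore
\[
\partial_\sigma\Gamma^j_{kl}(x) = \frac{1}{3}\left(R_{l\sigma k}{}^{j}(x) + R_{k\sigma l}{}^{j}(x)\right) + O(t),
\]
which verifies the last equality for $\frac{d^2V^j}{dt^2}(0)$. Therefore, we have the following Taylor expansion for $V^j(t)$ up to the second order:

\[
V^j(t) = v^j - \frac{t^2}{6}\,\theta^k\theta^\sigma v^l\left(R_{l\sigma k}{}^{j}(x) + R_{k\sigma l}{}^{j}(x)\right) + O(t^3),
\]
which establishes the desired conclusion.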

Remark 30. It is also useful to note that

\[
V^j(t) = v^j - \frac{1}{6}\, s^k s^\sigma v^l\left(R_{l\sigma k}{}^{j}(x) + R_{k\sigma l}{}^{j}(x)\right) + O(t^3),
\]
where $(s^1,\dots,s^d)$ are the geodesic normal coordinates of $y$ in the proof of Lemma 29. Moreover, it is not hard to obtain higher order terms in the asymptotic expansion. For instance, differentiating both sides of (B.0.8) twice, we have

\[
\begin{aligned}
\frac{d^3V^j}{dt^3}(0) &= -\sum_{k,l=1}^d \left.\frac{d^2}{dt^2}\right|_{t=0}\!\left(\frac{d\gamma^k}{dt}(t)\, V^l(t)\, \Gamma^j_{kl}(\gamma(t))\right) \\
&= -\sum_{k,l=1}^d \left.\frac{d^2}{dt^2}\right|_{t=0}\!\left(\frac{d\gamma^k}{dt}(t)\, V^l(t)\right)\Gamma^j_{kl}(\gamma(0)) - \sum_{k,l=1}^d \frac{d\gamma^k}{dt}(0)\, V^l(0)\left.\frac{d^2}{dt^2}\right|_{t=0}\Gamma^j_{kl}(\gamma(t)) \\
&\quad - 2\sum_{k,l=1}^d \left.\frac{d}{dt}\right|_{t=0}\!\left(\frac{d\gamma^k}{dt}(t)\, V^l(t)\right)\left.\frac{d}{dt}\right|_{t=0}\Gamma^j_{kl}(\gamma(t)).
\end{aligned}
\tag{B.0.9}
\]
Since $\Gamma^j_{kl}(\gamma(0)) = 0$ in geodesic normal coordinates, the first term on the right hand side of (B.0.9) vanishes. Meanwhile, since in geodesic normal coordinates the parametrization of the geodesic $\gamma$ is linear, we have

\[
\frac{d^2\gamma^k}{dt^2}(0) = 0, \quad k = 1,\dots,d.
\]
Combining this observation with the computation

\[
\frac{dV^j}{dt}(0) = -\sum_{k,l=1}^d \frac{d\gamma^k}{dt}(0)\, V^l(0)\, \Gamma^j_{kl}(\gamma(0)) = 0,
\]
we conclude that the last term on the right hand side of (B.0.9) also vanishes. Therefore, (B.0.9) is left with

\[
\begin{aligned}
\frac{d^3V^j}{dt^3}(0) &= -\sum_{k,l=1}^d \frac{d\gamma^k}{dt}(0)\, V^l(0)\left.\frac{d^2}{dt^2}\right|_{t=0}\Gamma^j_{kl}(\gamma(t)) \\
&= \theta^k v^l\, \partial_{\sigma\omega}\Gamma^j_{kl}(x)\, \theta^\sigma\theta^\omega = \theta^k\theta^\sigma\theta^\omega v^l\, \partial_{\sigma\omega}\Gamma^j_{kl}(x).
\end{aligned}
\]

We can compute the second order derivative of the Christoffel symbol at $x$ (see, e.g., [63]) and completely determine the $O(t^3)$ term, but this is less crucial for our application. The only point in carrying through the computation of the third order derivative of $V^j(t)$ is that the $O(t^3)$ term in the expansion of Lemma 29 is a third order homogeneous polynomial in the geodesic coordinates $(s^1,\dots,s^d)$:
\[
t^3\,\frac{d^3V^j}{dt^3}(0) = t^3\,\theta^k\theta^\sigma\theta^\omega v^l\, \partial_{\sigma\omega}\Gamma^j_{kl}(x) = s^k s^\sigma s^\omega v^l\, \partial_{\sigma\omega}\Gamma^j_{kl}(x),
\]
an observation that is necessary for improving the higher order error term in Lemma 31 from $O(\epsilon^{3/2})$ to $O(\epsilon^2)$, as we will see below.

Armed with Lemma 28 and Lemma 29, we are ready to take a first step towards analyzing $K_{\epsilon,\delta}$. Lemma 31 starts our investigation of kernel functions that incorporate parallel transport. Strictly speaking it only deals with $K_{\epsilon,\delta}$ as $\delta\to 0$, but we will soon see that it opens the door to understanding much more general cases.
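The role of the homogeneity observation in Remark 30 is that an odd homogeneous polynomial integrates to zero against a radial kernel over the symmetric domain $B_1(0)$, whereas even moments such as $m_2$ survive. The following sketch illustrates this in dimension $d = 2$ with a midpoint rule on a symmetric grid; the profile $\Phi(u) = (1-u)^2$ and the helper `disk_moment` are illustrative assumptions, not objects defined in the text.

```python
import math

def phi(u):
    # Illustrative radial profile supported on [0, 1] (an assumption).
    return (1.0 - u) ** 2 if 0.0 <= u <= 1.0 else 0.0

def disk_moment(a, b, n=400):
    # Midpoint-rule approximation of the integral of phi(r^2) * x^a * y^b
    # over the unit disk, using a grid that is symmetric about the origin.
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            total += phi(x * x + y * y) * x ** a * y ** b
    return total * h * h

odd_moments = [disk_moment(1, 0), disk_moment(3, 0), disk_moment(2, 1)]
second_moment = disk_moment(2, 0)  # plays the role of m2 for this profile
print(odd_moments, second_moment)
```

The odd moments cancel up to rounding error, while the second moment is strictly positive; this is the mechanism by which the $O(\epsilon^{3/2})$ contributions drop out of the expansions.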

Lemma 31. Following Lemma 28, let $P_{y,x}: T_xM\to T_yM$ denote the parallel transport from $T_xM$ to $T_yM$ determined by the Levi-Civita connection on $M$. For any function $f\in C^\infty(TM)$, as $\epsilon\to 0$,

\[
\int_M \Phi_\epsilon(x,y)\, f(y, P_{y,x}v)\, d\mathrm{vol}_M(y) = \epsilon^{d/2}\left\{ m_0\, f(x,v) + \epsilon\,\frac{m_2}{2}\left[\Delta^H f(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, f(x,v)\right] + O(\epsilon^2)\right\}, \tag{B.0.10}
\]
where $\Delta^H$ is the horizontal Laplacian on $TM$ defined in (A.0.8).

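Proof. Consider the geodesic normal neighborhood around $x\in M$, with $\epsilon > 0$ sufficiently small such that a geodesic ball of radius $\sqrt{\epsilon}$ centered at $x$ is contained in this neighborhood. The integral is actually supported only on such a geodesic ball, due to the compact support of $\Phi$:

\[
\begin{aligned}
\int_M \Phi_\epsilon(x,y)\, f(y, P_{y,x}v)\, d\mathrm{vol}_M(y) &= \int_M \Phi\!\left(\frac{d_M^2(x,y)}{\epsilon}\right) f(y, P_{y,x}v)\, d\mathrm{vol}_M(y) \\
&= \int_{B_{\sqrt{\epsilon}}(x)} \Phi\!\left(\frac{d_M^2(x,y)}{\epsilon}\right) f(y, P_{y,x}v)\, d\mathrm{vol}_M(y).
\end{aligned}
\]

Express $y\in B_{\sqrt{\epsilon}}(x)$ in geodesic coordinates $(s^1,\dots,s^d)$, with
\[
(s^1)^2 + \cdots + (s^d)^2 = r^2 = d_M^2(x,y),
\]
and put $y$ into polar coordinates

\[
y = \exp_x r\theta, \quad \theta\in T_xM, \quad \|\theta\|_x = 1.
\]

Recall from Lemma 29 that the $j$-th coordinate component ($j = 1,\dots,d$) of $P_{y,x}v$ is

\[
\begin{aligned}
(P_{y,x}v)^j &= v^j - \frac{r^2}{6}\,\theta^k\theta^\sigma v^l\left(R_{l\sigma k}{}^{j}(x) + R_{k\sigma l}{}^{j}(x)\right) + O(r^3) \\
&= v^j - \frac{1}{6}\, s^k s^\sigma v^l\left(R_{l\sigma k}{}^{j}(x) + R_{k\sigma l}{}^{j}(x)\right) + O(r^3).
\end{aligned}
\]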

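A further change of coordinates

\[
\tilde s^1 = \frac{s^1}{\sqrt{\epsilon}},\ \dots,\ \tilde s^d = \frac{s^d}{\sqrt{\epsilon}};\qquad \tilde r = \frac{r}{\sqrt{\epsilon}}
\]

leads to

\[
\begin{aligned}
&\int_{B_{\sqrt{\epsilon}}(x)} \Phi\!\left(\frac{d_M^2(x,y)}{\epsilon}\right) f(y, P_{y,x}v)\, d\mathrm{vol}_M(y) \\
&\quad= \int_{B_{\sqrt{\epsilon}}(0)} \Phi\!\left(\frac{r^2}{\epsilon}\right) \tilde f\!\left(s^1,\dots,s^d, (P_{y,x}v)^1,\dots,(P_{y,x}v)^d\right) d\mathrm{vol}_M(s^1,\dots,s^d) \\
&\quad= \int_{B_1(0)} \Phi(\tilde r^2)\, \tilde f\!\left(\epsilon^{1/2}\tilde s^1,\dots,\epsilon^{1/2}\tilde s^d, (P_{y,x}v)^1,\dots,(P_{y,x}v)^d\right) d\mathrm{vol}_M(\tilde s^1,\dots,\tilde s^d),
\end{aligned}
\tag{B.0.11}
\]
where $\tilde f$ denotes $f$ in these geodesic normal coordinates,

\[
(P_{y,x}v)^j = v^j - \frac{\epsilon}{6}\,\tilde s^k\tilde s^\sigma v^l\left(R_{l\sigma k}{}^{j}(x) + R_{k\sigma l}{}^{j}(x)\right) + O\!\left(\epsilon^{3/2}\tilde r^3\right),
\]
and
\[
d\mathrm{vol}_M(s^1,\dots,s^d) = \left[1 - \frac{1}{6} R_{kl}(x)\, s^k s^l + O(r^3)\right] ds^1\cdots ds^d,
\]
\[
d\mathrm{vol}_M(\tilde s^1,\dots,\tilde s^d) = \left[1 - \frac{\epsilon}{6} R_{kl}(x)\,\tilde s^k\tilde s^l + O\!\left(\epsilon^{3/2}\tilde r^3\right)\right]\cdot\epsilon^{d/2}\, d\tilde s^1\cdots d\tilde s^d.
\]
Taylor expanding $\tilde f$ at $(0,v)\in T_xM$ in these coordinates, we have

\[
\begin{aligned}
&\tilde f\!\left(\epsilon^{1/2}\tilde s^1,\dots,\epsilon^{1/2}\tilde s^d, (P_{y,x}v)^1,\dots,(P_{y,x}v)^d\right) \\
&\quad= \tilde f(0,v) + \epsilon^{1/2}\tilde s^j\,\frac{\partial\tilde f}{\partial s^j}(0,v) + \left[(P_{y,x}v)^j - v^j\right]\frac{\partial\tilde f}{\partial v^j}(0,v) + \frac{\epsilon}{2}\,\tilde s^k\tilde s^l\,\frac{\partial^2\tilde f}{\partial s^k\partial s^l}(0,v) \\
&\qquad+ \frac{1}{2}\left[(P_{y,x}v)^k - v^k\right]\left[(P_{y,x}v)^l - v^l\right]\frac{\partial^2\tilde f}{\partial v^k\partial v^l}(0,v) \\
&\qquad+ \frac{\epsilon^{1/2}}{2}\left[(P_{y,x}v)^j - v^j\right]\tilde s^m\,\frac{\partial^2\tilde f}{\partial v^j\partial s^m}(0,v) + O\!\left(\epsilon^{3/2}\right).
\end{aligned}
\]
The rest of the proof follows from substituting this Taylor expansion into (B.0.11) and integrating term by term. For simplicity of notation, let us write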

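\[
m_0 := \int_{B_1(0)}\Phi(\tilde r^2)\, d\tilde s^1\cdots d\tilde s^d.
\]
By the symmetry of the domain of integration,

\[
\int_{B_1(0)}\Phi(\tilde r^2)\,\tilde s^j\, d\tilde s^1\cdots d\tilde s^d = 0, \quad j = 1,\dots,d,
\]
\[
\int_{B_1(0)}\Phi(\tilde r^2)\,\tilde s^k\tilde s^l\, d\tilde s^1\cdots d\tilde s^d = 0, \quad k\neq l,\ k,l = 1,\dots,d,
\]
and

\[
m_2 := \int_{B_1(0)}\Phi(\tilde r^2)\,(\tilde s^j)^2\, d\tilde s^1\cdots d\tilde s^d, \quad j = 1,\dots,d,
\]
are constants independent of the super-index $1\le j\le d$. Writing $W_\epsilon(\tilde s) := 1 - \frac{\epsilon}{6}R_{kl}(x)\,\tilde s^k\tilde s^l + O(\epsilon^{3/2}\tilde r^3)$ for the volume density factor and $d\tilde s := d\tilde s^1\cdots d\tilde s^d$, a direct computation gives

\[
\int_{B_1(0)}\Phi(\tilde r^2)\,\tilde f(0,v)\, W_\epsilon(\tilde s)\,\epsilon^{d/2}\, d\tilde s = \epsilon^{d/2}\, f(x,v)\left[m_0 - \epsilon\,\frac{m_2}{6}\,\mathrm{Scal}(x) + O(\epsilon^2)\right],
\]
\[
\int_{B_1(0)}\Phi(\tilde r^2)\,\epsilon^{1/2}\tilde s^j\,\frac{\partial\tilde f}{\partial s^j}(0,v)\, W_\epsilon(\tilde s)\,\epsilon^{d/2}\, d\tilde s = \epsilon^{d/2}\left\{\epsilon^{1/2}\,\frac{\partial\tilde f}{\partial s^j}(0,v)\left[\int_{B_1(0)}\Phi(\tilde r^2)\,\tilde s^j\, d\tilde s\right] + O\!\left(\epsilon^{3/2}\right)\right\} = \epsilon^{d/2}\cdot O(\epsilon^2),
\]
\[
\int_{B_1(0)}\Phi(\tilde r^2)\left[(P_{y,x}v)^j - v^j\right]\frac{\partial\tilde f}{\partial v^j}(0,v)\, W_\epsilon(\tilde s)\,\epsilon^{d/2}\, d\tilde s = \epsilon^{d/2}\left[\epsilon\,\frac{m_2}{6}\, v^l\,\frac{\partial\tilde f}{\partial v^j}(0,v)\, R_{lj}(x) + O(\epsilon^2)\right],
\]
\[
\int_{B_1(0)}\Phi(\tilde r^2)\,\frac{\epsilon}{2}\,\tilde s^k\tilde s^l\,\frac{\partial^2\tilde f}{\partial s^k\partial s^l}(0,v)\, W_\epsilon(\tilde s)\,\epsilon^{d/2}\, d\tilde s = \epsilon^{d/2}\left[\epsilon\,\frac{m_2}{2}\sum_{k=1}^d\frac{\partial^2\tilde f}{\partial (s^k)^2}(0,v) + O(\epsilon^2)\right],
\]
\[
\int_{B_1(0)}\Phi(\tilde r^2)\cdot\frac{1}{2}\left[(P_{y,x}v)^k - v^k\right]\left[(P_{y,x}v)^l - v^l\right]\frac{\partial^2\tilde f}{\partial v^k\partial v^l}(0,v)\, W_\epsilon(\tilde s)\,\epsilon^{d/2}\, d\tilde s = \epsilon^{d/2}\cdot O(\epsilon^2),
\]
\[
\int_{B_1(0)}\Phi(\tilde r^2)\,\frac{\epsilon^{1/2}}{2}\left[(P_{y,x}v)^j - v^j\right]\tilde s^m\,\frac{\partial^2\tilde f}{\partial v^j\partial s^m}(0,v)\, W_\epsilon(\tilde s)\,\epsilon^{d/2}\, d\tilde s = \epsilon^{d/2}\cdot O(\epsilon^2),
\]
where all $O(\epsilon^{3/2})$ terms drop out as in the proof of Lemma 28, thanks to Remark 30. Combining these computations, we have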

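\[
\begin{aligned}
&\int_M \Phi_\epsilon(x,y)\, f(y, P_{y,x}v)\, d\mathrm{vol}_M(y) \\
&\quad= \int_{B_1(0)}\Phi(\tilde r^2)\,\tilde f\!\left(\epsilon^{1/2}\tilde s^1,\dots,\epsilon^{1/2}\tilde s^d, (P_{y,x}v)^1,\dots,(P_{y,x}v)^d\right) d\mathrm{vol}_M(\tilde s^1,\dots,\tilde s^d) \\
&\quad= \epsilon^{d/2}\left\{ f(x,v)\left[m_0 - \epsilon\,\frac{m_2}{6}\,\mathrm{Scal}(x)\right] + \epsilon\,\frac{m_2}{6}\, v^l\,\frac{\partial\tilde f}{\partial v^j}(0,v)\, R_{lj}(x) + \epsilon\,\frac{m_2}{2}\sum_{k=1}^d\frac{\partial^2\tilde f}{\partial (s^k)^2}(0,v) + O(\epsilon^2)\right\} \\
&\quad= \epsilon^{d/2}\left\{ f(x,v)\left[m_0 - \epsilon\,\frac{m_2}{6}\,\mathrm{Scal}(x)\right] + \epsilon\,\frac{m_2}{2}\left[\sum_{k=1}^d\frac{\partial^2\tilde f}{\partial (s^k)^2}(0,v) + \frac{1}{3}\, R_{lj}(x)\, v^l\,\frac{\partial\tilde f}{\partial v^j}(0,v)\right] + O(\epsilon^2)\right\} \\
&\quad= \epsilon^{d/2}\left\{ m_0\, f(x,v) - \epsilon\,\frac{m_2}{6}\, f(x,v)\,\mathrm{Scal}(x) + \epsilon\,\frac{m_2}{2}\,\Delta^H f(x,v) + O(\epsilon^2)\right\} \\
&\quad= \epsilon^{d/2}\left\{ m_0\, f(x,v) + \epsilon\,\frac{m_2}{2}\left[\Delta^H f(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, f(x,v)\right] + O(\epsilon^2)\right\},
\end{aligned}
\]
where in the second to last equality we used the expression of $\Delta^H$ in geodesic normal coordinates from (A.0.10).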

The following Lemma 32 is the unit tangent bundle version of Lemma 31.

Lemma 32. Following Lemma 28, for any function $f\in C^\infty(UTM)$, as $\epsilon\to 0$,

\[
\int_M \Phi_\epsilon(x,y)\, f(y, P_{y,x}v)\, d\mathrm{vol}_M(y) = \epsilon^{d/2}\left\{ m_0\, f(x,v) + \epsilon\,\frac{m_2}{2}\left[\Delta_S^H f(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, f(x,v)\right] + O(\epsilon^2)\right\}, \tag{B.0.12}
\]
where $\Delta_S^H$ is the horizontal Laplacian on $UTM$ as defined in (A.0.17).

Proof. First of all, note that by the metric compatibility of the Levi-Civita connection, $P_{y,x}: T_xM\to T_yM$ is an isometry, which descends naturally to an isometry from $S_x$ to $S_y$. For any function $f\in C^\infty(UTM)$, let $\hat f$ denote its extension to the whole $TM$, as in (A.0.17). By Lemma 31,

\[
\begin{aligned}
\int_M \Phi_\epsilon(x,y)\, f(y, P_{y,x}v)\, d\mathrm{vol}_M(y) &= \int_M \Phi_\epsilon(x,y)\, \hat f(y, P_{y,x}v)\, d\mathrm{vol}_M(y) \\
&= \epsilon^{d/2}\left\{ m_0\, \hat f(x,v) + \epsilon\,\frac{m_2}{2}\left[\Delta^H \hat f(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, \hat f(x,v)\right] + O(\epsilon^2)\right\} \\
&= \epsilon^{d/2}\left\{ m_0\, f(x,v) + \epsilon\,\frac{m_2}{2}\left[\Delta_S^H f(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, f(x,v)\right] + O(\epsilon^2)\right\}.
\end{aligned}
\]

Based on Lemma 32, we have the following Lemma 33, which carries most of the work for proving the first part of Theorem 14.

Lemma 33. Assume $M$ is a $d$-dimensional closed Riemannian manifold with strictly positive injectivity radius $\mathrm{Inj}(M)$. For any function $g\in C^\infty(UTM)$ and sufficiently small $\epsilon\in\left(0, \mathrm{Inj}(M)^2\right)$, $\delta = O(\epsilon)$,

\[
\begin{aligned}
&\int_{UTM} K_{\epsilon,\delta}(x,v;y,w)\, g(y,w)\, d\Theta(y,w) = \int_M\!\int_{S_y} K_{\epsilon,\delta}(x,v;y,w)\, g(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&\quad= \epsilon^{d/2}\delta^{(d-1)/2}\left\{ m_0\, g(x,v) + \epsilon\,\frac{m_{21}}{2}\left(\Delta_S^H g(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, g(x,v)\right)\right. \\
&\qquad\left. + \delta\,\frac{m_{22}}{2}\left(\Delta_S^V g(x,v) - \frac{(d-1)(d-2)}{3}\, g(x,v)\right) + O(\epsilon^2 + \delta^2)\right\},
\end{aligned}
\tag{B.0.13}
\]

where $m_0$, $m_{21}$, $m_{22}$ are positive constants, $d\sigma_y$ is the volume element on $S_y$, and $\Delta_S^H$, $\Delta_S^V$ are the horizontal and vertical spherical Laplace-Beltrami operators on $UTM$ as defined in (A.0.16) and (A.0.17).

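Proof. By definition,

\[
\int_M\!\int_{S_y} K_{\epsilon,\delta}(x,v;y,w)\, g(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) = \int_M\!\int_{S_y} K\!\left(\frac{d_M^2(x,y)}{\epsilon}, \frac{d_{S_y}^2(P_{y,x}v, w)}{\delta}\right) g(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y). \tag{B.0.14}
\]

Since $\delta = O(\epsilon)$, $\delta\to 0$ as $\epsilon\to 0$. Applying Lemma 28 on the fibre $S_y$,

\[
\begin{aligned}
&\int_{S_y} K\!\left(\frac{d_M^2(x,y)}{\epsilon}, \frac{d_{S_y}^2(P_{y,x}v, w)}{\delta}\right) g(y,w)\, d\sigma_y(w) \\
&\quad= \delta^{(d-1)/2}\left\{ M_0\!\left(\frac{d_M^2(x,y)}{\epsilon}\right) g(y, P_{y,x}v)\right. \\
&\qquad\left. + \frac{\delta}{2}\, M_2\!\left(\frac{d_M^2(x,y)}{\epsilon}\right)\left[\Delta_S^V g(y, P_{y,x}v) - \frac{1}{3}\,\mathrm{Scal}^{S_y}(P_{y,x}v)\, g(y, P_{y,x}v)\right] + O(\delta^2)\right\},
\end{aligned}
\]
where $\mathrm{Scal}^{S_y}(\cdot)$ is the scalar curvature of $S_y$, and $M_0(r^2)$, $M_2(r^2)$ are functions of a single variable determined by the following integrals over the unit ball $B_1^{d-1}(0)$ in $\mathbb{R}^{d-1}$:

\[
M_0(r^2) = \int_{B_1^{d-1}(0)} K(r^2, \rho^2)\, d\theta^1\cdots d\theta^{d-1}, \qquad
M_2(r^2) = \int_{B_1^{d-1}(0)} (\theta^1)^2\, K(r^2, \rho^2)\, d\theta^1\cdots d\theta^{d-1},
\]
with
\[
\rho^2 = (\theta^1)^2 + \cdots + (\theta^{d-1})^2.
\]
Moreover, for any $y\in M$, since $(T_yM, \|\cdot\|_y)$ is isometric to the standard $d$-dimensional Euclidean space (this can be seen by simply fixing an orthonormal basis in $T_yM$), $S_y$ equipped with the induced metric from $\|\cdot\|_y$ is isometric to the standard unit $(d-1)$-sphere in $\mathbb{R}^d$. As a result,

\[
\mathrm{Scal}^{S_y}(w) = (d-1)(d-2) \quad\text{for all } y\in M,\ w\in S_y.
\]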

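Thus

\[
\begin{aligned}
&\int_{S_y} K\!\left(\frac{d_M^2(x,y)}{\epsilon}, \frac{d_{S_y}^2(P_{y,x}v, w)}{\delta}\right) g(y,w)\, d\sigma_y(w) \\
&\quad= \delta^{(d-1)/2}\left\{ M_0\!\left(\frac{d_M^2(x,y)}{\epsilon}\right) g(y, P_{y,x}v)\right. \\
&\qquad\left. + \frac{\delta}{2}\, M_2\!\left(\frac{d_M^2(x,y)}{\epsilon}\right)\left[\Delta_S^V g(y, P_{y,x}v) - \frac{(d-1)(d-2)}{3}\, g(y, P_{y,x}v)\right] + O(\delta^2)\right\}.
\end{aligned}
\tag{B.0.15}
\]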

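It remains to apply Lemma 32 multiple times to (B.0.15) to obtain

\[
\begin{aligned}
&\int_M\!\int_{S_y} K_{\epsilon,\delta}(x,v;y,w)\, g(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&\quad= \int_M\left[\int_{S_y} K\!\left(\frac{d_M^2(x,y)}{\epsilon}, \frac{d_{S_y}^2(P_{y,x}v, w)}{\delta}\right) g(y,w)\, d\sigma_y(w)\right] d\mathrm{vol}_M(y) \\
&\quad= \delta^{(d-1)/2}\left\{\int_M M_0\!\left(\frac{d_M^2(x,y)}{\epsilon}\right) g(y, P_{y,x}v)\, d\mathrm{vol}_M(y)\right. \\
&\qquad\left. + \frac{\delta}{2}\int_M M_2\!\left(\frac{d_M^2(x,y)}{\epsilon}\right)\left[\Delta_S^V g(y, P_{y,x}v) - \frac{1}{3}(d-1)(d-2)\, g(y, P_{y,x}v)\right] d\mathrm{vol}_M(y) + O(\delta^2)\right\} \\
&\quad= \delta^{(d-1)/2}\left\{\epsilon^{d/2}\left[m_0\, g(x,v) + \epsilon\,\frac{m_{21}}{2}\left(\Delta_S^H g(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, g(x,v)\right) + O(\epsilon^2)\right]\right. \\
&\qquad\left. + \epsilon^{d/2}\cdot\delta\,\frac{m_{22}}{2}\left(\Delta_S^V g(x,v) - \frac{(d-1)(d-2)}{3}\, g(x,v)\right) + \epsilon^{d/2}\cdot O(\epsilon^2 + \delta^2)\right\} \\
&\quad= \epsilon^{d/2}\delta^{(d-1)/2}\left\{ m_0\, g(x,v) + \epsilon\,\frac{m_{21}}{2}\left(\Delta_S^H g(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, g(x,v)\right)\right. \\
&\qquad\left. + \delta\,\frac{m_{22}}{2}\left(\Delta_S^V g(x,v) - \frac{(d-1)(d-2)}{3}\, g(x,v)\right) + O(\epsilon^2 + \delta^2)\right\},
\end{aligned}
\]
where $m_0$, $m_{21}$, $m_{22}$ are constants determined by the following integrals of $M_0(r^2)$ or $M_2(r^2)$ over the unit ball $B_1^d(0)$ in $\mathbb{R}^d$:

\[
m_0 = \int_{B_1^d(0)} M_0(r^2)\, ds^1\cdots ds^d, \qquad
m_{21} = \int_{B_1^d(0)} M_0(r^2)\,(s^1)^2\, ds^1\cdots ds^d, \qquad
m_{22} = \int_{B_1^d(0)} M_2(r^2)\, ds^1\cdots ds^d,
\]
where
\[
r^2 = (s^1)^2 + \cdots + (s^d)^2.
\]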

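Proof of Theorem 14. As $\delta = O(\epsilon)$, direct application of Lemma 33 gives

\[
\begin{aligned}
p_{\epsilon,\delta}(x,v) &= \int_{UTM} K_{\epsilon,\delta}(x,v;y,w)\, p(y,w)\, d\Theta(y,w) \\
&= \epsilon^{d/2}\delta^{(d-1)/2}\left\{ m_0\, p(x,v) + \epsilon\,\frac{m_{21}}{2}\left(\Delta_S^H p(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, p(x,v)\right)\right. \\
&\qquad\left. + \delta\,\frac{m_{22}}{2}\left(\Delta_S^V p(x,v) - \frac{(d-1)(d-2)}{3}\, p(x,v)\right) + O(\epsilon^2 + \delta^2)\right\}.
\end{aligned}
\tag{B.0.16}
\]
In order to expand the denominator of (3.4.22), note that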

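\[
\begin{aligned}
p_{\epsilon,\delta}^{-\alpha}(x,v) &= \epsilon^{-\frac{\alpha d}{2}}\delta^{-\frac{\alpha(d-1)}{2}}\left\{ m_0\, p(x,v) + \epsilon\,\frac{m_{21}}{2}\left(\Delta_S^H p(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, p(x,v)\right)\right. \\
&\qquad\left. + \delta\,\frac{m_{22}}{2}\left(\Delta_S^V p(x,v) - \frac{(d-1)(d-2)}{3}\, p(x,v)\right) + O(\epsilon^2 + \delta^2)\right\}^{-\alpha} \\
&= \epsilon^{-\frac{\alpha d}{2}}\delta^{-\frac{\alpha(d-1)}{2}}\, m_0^{-\alpha}\, p^{-\alpha}(x,v)\left\{ 1 - \epsilon\,\frac{\alpha m_{21}}{2m_0}\left(\frac{\Delta_S^H p(x,v)}{p(x,v)} - \frac{1}{3}\mathrm{Scal}(x)\right)\right. \\
&\qquad\left. - \delta\,\frac{\alpha m_{22}}{2m_0}\left(\frac{\Delta_S^V p(x,v)}{p(x,v)} - \frac{(d-1)(d-2)}{3}\right) + O(\epsilon^2 + \delta^2)\right\},
\end{aligned}
\]
and hence by Lemma 33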

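\[
\begin{aligned}
&\int_{UTM} K^\alpha_{\epsilon,\delta}(x,v;y,w)\, p(y,w)\, d\Theta(y,w) \\
&\quad= \int_M\!\int_{S_y} K_{\epsilon,\delta}(x,v;y,w)\, p_{\epsilon,\delta}^{-\alpha}(x,v)\, p_{\epsilon,\delta}^{-\alpha}(y,w)\, p(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&\quad= \epsilon^{-\frac{\alpha d}{2}}\delta^{-\frac{\alpha(d-1)}{2}}\, m_0^{-\alpha}\, p_{\epsilon,\delta}^{-\alpha}(x,v)\times\int_M\!\int_{S_y} K_{\epsilon,\delta}(x,v;y,w)\, p^{1-\alpha}(y,w)\left[ 1 - \epsilon\,\frac{\alpha m_{21}}{2m_0}\left(\frac{\Delta_S^H p(y,w)}{p(y,w)} - \frac{1}{3}\mathrm{Scal}(y)\right)\right. \\
&\qquad\left. - \delta\,\frac{\alpha m_{22}}{2m_0}\left(\frac{\Delta_S^V p(y,w)}{p(y,w)} - \frac{(d-1)(d-2)}{3}\right) + O(\epsilon^2 + \delta^2)\right] d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&\quad= \epsilon^{\frac{(1-\alpha)d}{2}}\delta^{\frac{(1-\alpha)(d-1)}{2}}\, m_0^{-\alpha}\, p_{\epsilon,\delta}^{-\alpha}(x,v)\times\left\{ m_0\, p^{1-\alpha}(x,v)\left[ 1 - \epsilon\,\frac{\alpha m_{21}}{2m_0}\left(\frac{\Delta_S^H p(x,v)}{p(x,v)} - \frac{1}{3}\mathrm{Scal}(x)\right) - \delta\,\frac{\alpha m_{22}}{2m_0}\left(\frac{\Delta_S^V p(x,v)}{p(x,v)} - \frac{(d-1)(d-2)}{3}\right)\right]\right. \\
&\qquad + \epsilon\,\frac{m_{21}}{2}\left[\Delta_S^H p^{1-\alpha}(x,v) - \frac{1}{3}\mathrm{Scal}(x)\, p^{1-\alpha}(x,v)\right] \\
&\qquad\left. + \delta\,\frac{m_{22}}{2}\left[\Delta_S^V p^{1-\alpha}(x,v) - \frac{(d-1)(d-2)}{3}\, p^{1-\alpha}(x,v)\right] + O(\epsilon^2 + \delta^2)\right\} \\
&\quad= \epsilon^{\frac{(1-\alpha)d}{2}}\delta^{\frac{(1-\alpha)(d-1)}{2}}\, m_0^{1-\alpha}\, p_{\epsilon,\delta}^{-\alpha}(x,v)\, p^{1-\alpha}(x,v)\times\left\{ 1 - \epsilon\,\frac{\alpha m_{21}}{2m_0}\left(\frac{\Delta_S^H p(x,v)}{p(x,v)} - \frac{1}{3}\mathrm{Scal}(x)\right) - \delta\,\frac{\alpha m_{22}}{2m_0}\left(\frac{\Delta_S^V p(x,v)}{p(x,v)} - \frac{(d-1)(d-2)}{3}\right)\right. \\
&\qquad\left. + \epsilon\,\frac{m_{21}}{2m_0}\left(\frac{\Delta_S^H p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} - \frac{1}{3}\mathrm{Scal}(x)\right) + \delta\,\frac{m_{22}}{2m_0}\left(\frac{\Delta_S^V p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} - \frac{(d-1)(d-2)}{3}\right) + O(\epsilon^2 + \delta^2)\right\} \\
&\quad= \epsilon^{\frac{(1-\alpha)d}{2}}\delta^{\frac{(1-\alpha)(d-1)}{2}}\, m_0^{1-\alpha}\, p_{\epsilon,\delta}^{-\alpha}(x,v)\, p^{1-\alpha}(x,v)\times\left\{ 1 + \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta_S^H p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} - \alpha\,\frac{\Delta_S^H p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{1}{3}\mathrm{Scal}(x)\right]\right. \\
&\qquad\left. + \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta_S^V p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} - \alpha\,\frac{\Delta_S^V p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{(d-1)(d-2)}{3}\right] + O(\epsilon^2 + \delta^2)\right\}.
\end{aligned}
\]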

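A similar computation expands the numerator of (3.4.22):

\[
\begin{aligned}
&\int_{UTM} K^\alpha_{\epsilon,\delta}(x,v;y,w)\, f(y,w)\, p(y,w)\, d\Theta(y,w) \\
&\quad= \int_M\!\int_{S_y} K_{\epsilon,\delta}(x,v;y,w)\, p_{\epsilon,\delta}^{-\alpha}(x,v)\, p_{\epsilon,\delta}^{-\alpha}(y,w)\, f(y,w)\, p(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&\quad= \epsilon^{-\frac{\alpha d}{2}}\delta^{-\frac{\alpha(d-1)}{2}}\, m_0^{-\alpha}\, p_{\epsilon,\delta}^{-\alpha}(x,v)\times\int_M\!\int_{S_y} K_{\epsilon,\delta}(x,v;y,w)\, f(y,w)\, p^{1-\alpha}(y,w)\left[ 1 - \epsilon\,\frac{\alpha m_{21}}{2m_0}\left(\frac{\Delta_S^H p(y,w)}{p(y,w)} - \frac{1}{3}\mathrm{Scal}(y)\right)\right. \\
&\qquad\left. - \delta\,\frac{\alpha m_{22}}{2m_0}\left(\frac{\Delta_S^V p(y,w)}{p(y,w)} - \frac{(d-1)(d-2)}{3}\right) + O(\epsilon^2 + \delta^2)\right] d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&\quad= \epsilon^{\frac{(1-\alpha)d}{2}}\delta^{\frac{(1-\alpha)(d-1)}{2}}\, m_0^{-\alpha}\, p_{\epsilon,\delta}^{-\alpha}(x,v)\times\left\{ m_0\, f(x,v)\, p^{1-\alpha}(x,v)\left[ 1 - \epsilon\,\frac{\alpha m_{21}}{2m_0}\left(\frac{\Delta_S^H p(x,v)}{p(x,v)} - \frac{1}{3}\mathrm{Scal}(x)\right) - \delta\,\frac{\alpha m_{22}}{2m_0}\left(\frac{\Delta_S^V p(x,v)}{p(x,v)} - \frac{(d-1)(d-2)}{3}\right)\right]\right. \\
&\qquad + \epsilon\,\frac{m_{21}}{2}\left[\Delta_S^H\!\left[fp^{1-\alpha}\right](x,v) - \frac{1}{3}\mathrm{Scal}(x)\left[fp^{1-\alpha}\right](x,v)\right] \\
&\qquad\left. + \delta\,\frac{m_{22}}{2}\left[\Delta_S^V\!\left[fp^{1-\alpha}\right](x,v) - \frac{(d-1)(d-2)}{3}\left[fp^{1-\alpha}\right](x,v)\right] + O(\epsilon^2 + \delta^2)\right\} \\
&\quad= \epsilon^{\frac{(1-\alpha)d}{2}}\delta^{\frac{(1-\alpha)(d-1)}{2}}\, m_0^{1-\alpha}\, p_{\epsilon,\delta}^{-\alpha}(x,v)\, p^{1-\alpha}(x,v)\times\left\{ f(x,v) - \epsilon\,\frac{\alpha m_{21}}{2m_0}\, f(x,v)\left(\frac{\Delta_S^H p(x,v)}{p(x,v)} - \frac{1}{3}\mathrm{Scal}(x)\right)\right. \\
&\qquad - \delta\,\frac{\alpha m_{22}}{2m_0}\, f(x,v)\left(\frac{\Delta_S^V p(x,v)}{p(x,v)} - \frac{(d-1)(d-2)}{3}\right) + \epsilon\,\frac{m_{21}}{2m_0}\, f(x,v)\left(\frac{\Delta_S^H\!\left[fp^{1-\alpha}\right](x,v)}{\left[fp^{1-\alpha}\right](x,v)} - \frac{1}{3}\mathrm{Scal}(x)\right) \\
&\qquad\left. + \delta\,\frac{m_{22}}{2m_0}\, f(x,v)\left(\frac{\Delta_S^V\!\left[fp^{1-\alpha}\right](x,v)}{\left[fp^{1-\alpha}\right](x,v)} - \frac{(d-1)(d-2)}{3}\right) + O(\epsilon^2 + \delta^2)\right\} \\
&\quad= \epsilon^{\frac{(1-\alpha)d}{2}}\delta^{\frac{(1-\alpha)(d-1)}{2}}\, m_0^{1-\alpha}\, p_{\epsilon,\delta}^{-\alpha}(x,v)\, p^{1-\alpha}(x,v)\times\left\{ f(x,v) + \epsilon\,\frac{m_{21}}{2m_0}\, f(x,v)\left[\frac{\Delta_S^H\!\left[fp^{1-\alpha}\right](x,v)}{\left[fp^{1-\alpha}\right](x,v)} - \alpha\,\frac{\Delta_S^H p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{1}{3}\mathrm{Scal}(x)\right]\right. \\
&\qquad\left. + \delta\,\frac{m_{22}}{2m_0}\, f(x,v)\left[\frac{\Delta_S^V\!\left[fp^{1-\alpha}\right](x,v)}{\left[fp^{1-\alpha}\right](x,v)} - \alpha\,\frac{\Delta_S^V p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{(d-1)(d-2)}{3}\right] + O(\epsilon^2 + \delta^2)\right\}.
\end{aligned}
\]
Combining expansions for denominator and numerator, we conclude that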

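\[
\begin{aligned}
H^\alpha_{\epsilon,\delta} f(x,v) &= \frac{\int_{UTM} K^\alpha_{\epsilon,\delta}(x,v;y,w)\, f(y,w)\, p(y,w)\, d\Theta(y,w)}{\int_{UTM} K^\alpha_{\epsilon,\delta}(x,v;y,w)\, p(y,w)\, d\Theta(y,w)} \\
&= \left\{ f(x,v) + \epsilon\,\frac{m_{21}}{2m_0}\, f(x,v)\left[\frac{\Delta_S^H\!\left[fp^{1-\alpha}\right](x,v)}{\left[fp^{1-\alpha}\right](x,v)} - \alpha\,\frac{\Delta_S^H p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{1}{3}\mathrm{Scal}(x)\right]\right. \\
&\qquad\left. + \delta\,\frac{m_{22}}{2m_0}\, f(x,v)\left[\frac{\Delta_S^V\!\left[fp^{1-\alpha}\right](x,v)}{\left[fp^{1-\alpha}\right](x,v)} - \alpha\,\frac{\Delta_S^V p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{(d-1)(d-2)}{3}\right] + O(\epsilon^2 + \delta^2)\right\} \\
&\quad\cdot\left\{ 1 + \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta_S^H p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} - \alpha\,\frac{\Delta_S^H p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{1}{3}\mathrm{Scal}(x)\right]\right. \\
&\qquad\left. + \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta_S^V p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} - \alpha\,\frac{\Delta_S^V p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{(d-1)(d-2)}{3}\right] + O(\epsilon^2 + \delta^2)\right\}^{-1} \\
&= f(x,v)\left\{ 1 + \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta_S^H\!\left[fp^{1-\alpha}\right](x,v)}{\left[fp^{1-\alpha}\right](x,v)} - \alpha\,\frac{\Delta_S^H p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{1}{3}\mathrm{Scal}(x)\right]\right. \\
&\qquad + \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta_S^V\!\left[fp^{1-\alpha}\right](x,v)}{\left[fp^{1-\alpha}\right](x,v)} - \alpha\,\frac{\Delta_S^V p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{(d-1)(d-2)}{3}\right] \\
&\qquad - \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta_S^H p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} - \alpha\,\frac{\Delta_S^H p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{1}{3}\mathrm{Scal}(x)\right] \\
&\qquad\left. - \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta_S^V p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} - \alpha\,\frac{\Delta_S^V p(x,v)}{p(x,v)} - (1-\alpha)\,\frac{(d-1)(d-2)}{3}\right]\right\} + O(\epsilon^2 + \delta^2) \\
&= f(x,v)\left\{ 1 + \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta_S^H\!\left[fp^{1-\alpha}\right](x,v)}{\left[fp^{1-\alpha}\right](x,v)} - \frac{\Delta_S^H p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)}\right] + \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta_S^V\!\left[fp^{1-\alpha}\right](x,v)}{\left[fp^{1-\alpha}\right](x,v)} - \frac{\Delta_S^V p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)}\right] + O(\epsilon^2 + \delta^2)\right\} \\
&= f(x,v) + \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta_S^H\!\left[fp^{1-\alpha}\right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\,\frac{\Delta_S^H p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)}\right] \\
&\qquad + \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta_S^V\!\left[fp^{1-\alpha}\right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\,\frac{\Delta_S^V p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)}\right] + O(\epsilon^2 + \delta^2).
\end{aligned}
\]

A proof of Theorem 10 can be composed with a similar direct computation. The only prerequisite is to establish a tangent bundle version of Lemma 33, using Lemma 31 instead of Lemma 32. Similarly, a proof of Proposition 11 can be derived from the following proof of Proposition 15.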

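Proof of Proposition 15. To establish (3.4.24), first note that
\[
\begin{aligned}
\lim_{\gamma\to\infty} p_{\epsilon,\gamma}(x,v) &= \lim_{\gamma\to\infty}\int_M\!\int_{S_y} K_{\epsilon,\gamma}(x,v;y,w)\, p(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&= \lim_{\gamma\to\infty}\int_M\!\int_{S_y} K\!\left(\frac{d_M^2(x,y)}{\epsilon}, \frac{d_{S_y}^2(P_{y,x}v, w)}{\gamma}\right) p(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&= \int_M\!\int_{S_y} K\!\left(\frac{d_M^2(x,y)}{\epsilon}, 0\right) p(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&= \int_M K\!\left(\frac{d_M^2(x,y)}{\epsilon}, 0\right)\left[\int_{S_y} p(y,w)\, d\sigma_y(w)\right] d\mathrm{vol}_M(y),
\end{aligned}
\]

since $P_{y,x}v$ does not depend on $\gamma$. Recall from (3.4.19) that

\[
p(x) = \int_{S_x} p(x,v)\, dV_x(v),
\]
and define
\[
K_\epsilon(x,y) = K\!\left(\frac{d_M^2(x,y)}{\epsilon}\right) = K\!\left(\frac{d_M^2(x,y)}{\epsilon}, 0\right), \qquad
K^\alpha_\epsilon(x,y) = \frac{K_\epsilon(x,y)}{p^\alpha(x)\, p^\alpha(y)}.
\]

Then

\[
\begin{aligned}
\lim_{\gamma\to\infty} H^\alpha_{\epsilon,\gamma} f(x,v) &= \frac{\displaystyle\int_M K^\alpha_\epsilon(x,y)\left[\int_{S_y} f(y,w)\, p(y,w)\, d\sigma_y(w)\right] d\mathrm{vol}_M(y)}{\displaystyle\int_M K^\alpha_\epsilon(x,y)\left[\int_{S_y} p(y,w)\, d\sigma_y(w)\right] d\mathrm{vol}_M(y)} \\
&= \frac{\displaystyle\int_M K^\alpha_\epsilon(x,y)\left[\int_{S_y} f(y,w)\,\frac{p(y,w)}{p(y)}\, d\sigma_y(w)\right] p(y)\, d\mathrm{vol}_M(y)}{\displaystyle\int_M K^\alpha_\epsilon(x,y)\, p(y)\, d\mathrm{vol}_M(y)} \\
&= \frac{\displaystyle\int_M K^\alpha_\epsilon(x,y)\, f(y)\, p(y)\, d\mathrm{vol}_M(y)}{\displaystyle\int_M K^\alpha_\epsilon(x,y)\, p(y)\, d\mathrm{vol}_M(y)}.
\end{aligned}
\tag{B.0.17}
\]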

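By [41, Theorem 2], as $\epsilon\to 0$,

\[
\begin{aligned}
\lim_{\gamma\to\infty} H^\alpha_{\epsilon,\gamma} f(x,v) &= \frac{\displaystyle\int_M K^\alpha_\epsilon(x,y)\, f(y)\, p(y)\, d\mathrm{vol}_M(y)}{\displaystyle\int_M K^\alpha_\epsilon(x,y)\, p(y)\, d\mathrm{vol}_M(y)} \\
&= f(x) + \epsilon\,\frac{m_2'}{2m_0'}\left[\frac{\Delta_M\!\left[fp^{1-\alpha}\right](x)}{p^{1-\alpha}(x)} - f(x)\,\frac{\Delta_M p^{1-\alpha}(x)}{p^{1-\alpha}(x)}\right] + O(\epsilon^2),
\end{aligned}
\tag{B.0.18}
\]
where

\[
m_0' = \int_{B_1^d(0)} K(r^2)\, ds^1\cdots ds^d, \qquad
m_2' = \int_{B_1^d(0)} (s^1)^2\, K(r^2)\, ds^1\cdots ds^d,
\]
and we again improved the higher order error term from $O(\epsilon^{3/2})$ to $O(\epsilon^2)$, as argued in [134, §2].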

B.0.5 Proofs of Theorem 21 and Theorem 23

To prove the two finite sampling theorems, we’ll follow the path paved by [12, 66, 134, 133].

Sampling without Noise

The following lemma builds the bridge between the geodesic distance on the manifold and the Euclidean distance in the ambient space.

Lemma 34. Let $\iota: M\hookrightarrow\mathbb{R}^D$ be an isometric embedding of the smooth $d$-dimensional closed Riemannian manifold $M$ into $\mathbb{R}^D$. For any $x, y\in M$ such that $d_M(x,y) < \mathrm{Inj}(M)$, we have

\[
d_M^2(x,y) = \|\iota(x) - \iota(y)\|^2 + \frac{1}{12}\, d_M^4(x,y)\, \|\Pi(\theta,\theta)\|^2 + O\!\left(d_M^5(x,y)\right), \tag{B.0.19}
\]
where $\theta\in T_xM$, $\|\theta\|_x = 1$ comes from the geodesic polar coordinates of $y$ in a geodesic normal neighborhood of $x$:

\[
y = \exp_x r\theta, \quad r = d_M(x,y).
\]

Proof. See [137, Proposition 6].
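As a quick illustration of (B.0.19), consider the unit circle in $\mathbb{R}^2$, where the second fundamental form has unit norm and the chord between two points at arclength $r$ has squared length $2 - 2\cos r$. The sketch below (the function names are illustrative, and the circle is chosen only because everything is explicit) evaluates the remainder $d_M^2 - \|\iota(x)-\iota(y)\|^2 - \frac{1}{12}d_M^4\|\Pi\|^2$ for a few values of $r$.

```python
import math

def chord_sq(r):
    # Squared Euclidean distance between the points of the unit circle
    # at angles 0 and r (geodesic distance r for 0 <= r <= pi).
    return (math.cos(r) - 1.0) ** 2 + math.sin(r) ** 2

def remainder(r, pi_norm=1.0):
    # d_M^2 - |x - y|^2 - (1/12) d_M^4 |Pi(theta, theta)|^2, with |Pi| = 1 on the unit circle.
    return r ** 2 - chord_sq(r) - (r ** 4 / 12.0) * pi_norm ** 2

for r in (0.4, 0.2, 0.1):
    print(r, remainder(r), remainder(r) / r ** 6)
```

For the circle the remainder is in fact of order $r^6$ (the ratio printed in the last column approaches $-1/360$), which is consistent with the $O(d_M^5)$ bound in the lemma.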

We need Lemma 34 because the hypoelliptic diffusion operator in (3.4.18) is constructed using geodesic distances on the manifolds, whereas in practice only the Euclidean distance in the ambient space is observed. In order to prove Theorem 21, it is convenient to introduce the “Euclidean distance version” of the hypoelliptic diffusion operators. Note that in Definition 20 the hat “ˆ” is used for empirical quantities; for the remainder of this appendix, the tilde “˜” will be used for quantities in hypoelliptic diffusion operators that replace the geodesic distance with the Euclidean distance. These quantities include (see Footnote 1)

\[
\tilde K_{\epsilon,\delta}(x,v;y,w) = K\!\left(\frac{\|x-y\|^2}{\epsilon}, \frac{\|P_{y,x}v - w\|_y^2}{\delta}\right),
\]

\[
\tilde p_{\epsilon,\delta}(x,v) = \int_{UTM}\tilde K_{\epsilon,\delta}(x,v;y,w)\, p(y,w)\, d\Theta(y,w),
\]
\[
\tilde K^\alpha_{\epsilon,\delta}(x,v;y,w) = \frac{\tilde K_{\epsilon,\delta}(x,v;y,w)}{\tilde p^{\,\alpha}_{\epsilon,\delta}(x,v)\,\tilde p^{\,\alpha}_{\epsilon,\delta}(y,w)},
\]
and eventually

\[
\tilde H^\alpha_{\epsilon,\delta} f(x,v) = \frac{\displaystyle\int_{UTM}\tilde K^\alpha_{\epsilon,\delta}(x,v;y,w)\, f(y,w)\, p(y,w)\, d\Theta(y,w)}{\displaystyle\int_{UTM}\tilde K^\alpha_{\epsilon,\delta}(x,v;y,w)\, p(y,w)\, d\Theta(y,w)}.
\]
Footnote 1. Note that in this subsection $\hat K^\alpha_{\epsilon,\delta}$ is not much different from $\tilde K^\alpha_{\epsilon,\delta}$, since they are both constructed from Euclidean distance and exact parallel-transports. They will represent quite different quantities in the next subsection, where $\hat K^\alpha_{\epsilon,\delta}$ is constructed from estimated parallel-transports.

The next step is to establish an asymptotic expansion of type (3.4.23) for $\tilde H^\alpha_{\epsilon,\delta}$. We deduce the following Lemma 35, the “Euclidean distance version” of Lemma 28, from Lemma 34 and Lemma 28 itself.

Lemma 35. Let $\Phi:\mathbb{R}\to\mathbb{R}$ be a smooth function compactly supported in $[0,1]$. Assume $M$ is a $d$-dimensional closed Riemannian manifold isometrically embedded in $\mathbb{R}^D$, with injectivity radius $\mathrm{Inj}(M) > 0$. For any $\epsilon > 0$, define the kernel function

\[
\hat\Phi_\epsilon(x,y) = \Phi\!\left(\frac{\|x-y\|^2}{\epsilon}\right) \tag{B.0.20}
\]

on $M\times M$, where $\|\cdot\|$ is the Euclidean distance on $\mathbb{R}^D$. If the parameter $\epsilon$ is sufficiently small such that $0\le\epsilon\le\sqrt{\mathrm{Inj}(M)}$, then the integral operator associated with the kernel $\hat\Phi_\epsilon$,

\[
\left(\hat\Phi_\epsilon\, g\right)(x) := \int_M \hat\Phi_\epsilon(x,y)\, g(y)\, d\mathrm{vol}_M(y), \tag{B.0.21}
\]
has the following asymptotic expansion as $\epsilon\to 0$:

\[
\left(\hat\Phi_\epsilon\, g\right)(x) = \epsilon^{d/2}\left[ m_0\, g(x) + \epsilon\,\frac{m_2}{2}\left(\Delta_M g(x) + E(x)\, g(x)\right) + O(\epsilon^2)\right], \tag{B.0.22}
\]
with
\[
E(x) = -\frac{1}{3}\mathrm{Scal}(x) + \frac{d(d+2)}{12}\, A(x),
\]
where $m_0$, $m_2$ are constants that depend on the moments of $\Phi$ and the dimension $d$ of the Riemannian manifold $M$, $\Delta_M$ is the Laplace-Beltrami operator on $M$, $\mathrm{Scal}(x)$ is the scalar curvature of $M$ at $x$, and $A(x)$ is a scalar function on $M$ that only depends on the intrinsic dimension $d$ and the second fundamental form of the isometric embedding $\iota: M\hookrightarrow\mathbb{R}^D$.

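Proof. From Lemma 28,

\[
\int_M \Phi\!\left(\frac{d_M^2(x,y)}{\epsilon}\right) g(y)\, d\mathrm{vol}_M(y) = \epsilon^{d/2}\left[ m_0\, g(x) + \epsilon\,\frac{m_2}{2}\left(\Delta_M g(x) - \frac{1}{3}\mathrm{Scal}(x)\, g(x)\right) + O(\epsilon^2)\right],
\]
thus we only need to expand

\[
\int_M\left[\Phi\!\left(\frac{\|x-y\|^2}{\epsilon}\right) - \Phi\!\left(\frac{d_M^2(x,y)}{\epsilon}\right)\right] g(y)\, d\mathrm{vol}_M(y). \tag{B.0.23}
\]

Put $y$ in geodesic polar coordinates in a geodesic normal neighborhood of $x\in M$,

\[
y = \exp_x r\theta, \quad r = d_M(x,y), \quad \theta\in T_xM, \quad \|\theta\|_x = 1,
\]
and denote the geodesic normal coordinates around $x$ as $(s^1,\dots,s^d)$. By Lemma 34,
\[
\|x-y\|^2 - d_M^2(x,y) = -\frac{1}{12}\, d_M^4(x,y)\, \|\Pi(\theta,\theta)\|^2 + O\!\left(d_M^5(x,y)\right),
\]
thus

\[
\begin{aligned}
\Phi\!\left(\frac{\|x-y\|^2}{\epsilon}\right) - \Phi\!\left(\frac{d_M^2(x,y)}{\epsilon}\right)
&= \Phi'\!\left(\frac{d_M^2(x,y)}{\epsilon}\right)\cdot\left[\frac{\|x-y\|^2}{\epsilon} - \frac{d_M^2(x,y)}{\epsilon}\right] + O\!\left(\left[\frac{\|x-y\|^2}{\epsilon} - \frac{d_M^2(x,y)}{\epsilon}\right]^2\right) \\
&= \Phi'\!\left(\frac{d_M^2(x,y)}{\epsilon}\right)\cdot\left(-\frac{1}{12\epsilon}\, d_M^4(x,y)\, \|\Pi(\theta,\theta)\|^2\right) + O\!\left(\frac{d_M^8(x,y)}{\epsilon^2}\right).
\end{aligned}
\tag{B.0.24}
\]
Recall that $\Phi$ is supported on the unit interval, which implies that in (B.0.23) only those $y\in M$ satisfying $\|x-y\|\le\sqrt{\epsilon}$ or $d_M(x,y)\le\sqrt{\epsilon}$ are involved. According to Lemma 34, for sufficiently small $\epsilon > 0$, $\|x-y\|\le\sqrt{\epsilon}$ implies $d_M(x,y) < 2\sqrt{\epsilon}$, which means that the higher order error in (B.0.24) is indeed

\[
O\!\left(\frac{d_M^8(x,y)}{\epsilon^2}\right) = O\!\left(\frac{(\sqrt{\epsilon})^8}{\epsilon^2}\right) = O(\epsilon^2).
\]
Therefore,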

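\[
\begin{aligned}
&\int_M\left[\Phi\!\left(\frac{\|x-y\|^2}{\epsilon}\right) - \Phi\!\left(\frac{d_M^2(x,y)}{\epsilon}\right)\right] g(y)\, d\mathrm{vol}_M(y) \\
&\quad= -\frac{1}{12\epsilon}\int_M \Phi'\!\left(\frac{d_M^2(x,y)}{\epsilon}\right) d_M^4(x,y)\, \|\Pi(\theta,\theta)\|^2\, g(y)\, d\mathrm{vol}_M(y) + \epsilon^{d/2}\cdot O(\epsilon^2) \\
&\quad= -\frac{1}{12\epsilon}\int_M \Phi'\!\left(\frac{r^2}{\epsilon}\right) r^4\, \|\Pi(\theta,\theta)\|^2\, g(y)\, d\mathrm{vol}_M(y) + \epsilon^{d/2}\cdot O(\epsilon^2).
\end{aligned}
\tag{B.0.25}
\]

In geodesic normal coordinates $(s^1,\dots,s^d)$,
\[
\begin{aligned}
&\int_M \Phi'\!\left(\frac{r^2}{\epsilon}\right) r^4\, \|\Pi(\theta,\theta)\|^2\, g(y)\, d\mathrm{vol}_M(y) \\
&\quad= \int_{B_{\sqrt{\epsilon}}(0)} \Phi'\!\left(\frac{r^2}{\epsilon}\right) r^4\, \|\Pi(\theta,\theta)\|^2\, \tilde g(s)\left[1 - \frac{1}{6} R_{kl}(x)\, s^k s^l + O(r^3)\right] ds^1\cdots ds^d.
\end{aligned}
\tag{B.0.26}
\]

As in Lemma 28, we Taylor expand $\tilde g(s)$ around $s = 0$,

\[
\tilde g(s^1,\dots,s^d) = \tilde g(0) + \frac{\partial\tilde g}{\partial s^j}(0)\, s^j + \frac{1}{2}\frac{\partial^2\tilde g}{\partial s^k\partial s^l}(0)\, s^k s^l + O(r^3),
\]
and note that by symmetry

\[
\int_{B_{\sqrt{\epsilon}}(0)} \Phi'\!\left(\frac{r^2}{\epsilon}\right) r^4\, \|\Pi(\theta,\theta)\|^2\, s^j\, ds^1\cdots ds^d = 0, \quad j = 1,\dots,d,
\]

thus (B.0.26) reduces to

\[
\int_{B_{\sqrt{\epsilon}}(0)} \Phi'\!\left(\frac{r^2}{\epsilon}\right) r^4\, \|\Pi(\theta,\theta)\|^2\, g(x)\, ds^1\cdots ds^d + \int_{B_{\sqrt{\epsilon}}(0)} \Phi'\!\left(\frac{r^2}{\epsilon}\right) r^4\, \|\Pi(\theta,\theta)\|^2\, O(r^2)\, ds^1\cdots ds^d,
\]
where

\[
\begin{aligned}
\int_{B_{\sqrt{\epsilon}}(0)} \Phi'\!\left(\frac{r^2}{\epsilon}\right) r^4\, \|\Pi(\theta,\theta)\|^2\, g(x)\, ds^1\cdots ds^d
&= g(x)\int_{B_1(0)} \Phi'(\tilde r^2)\cdot\epsilon^2\tilde r^4\, \|\Pi(\theta,\theta)\|^2\cdot\epsilon^{d/2}\, d\tilde s^1\cdots d\tilde s^d \\
&= \epsilon^{d/2}\cdot\epsilon^2\, g(x)\int_{S_1(0)}\|\Pi(\theta,\theta)\|^2\, d\theta\int_0^1 \Phi'(\tilde r^2)\,\tilde r^{4+(d-1)}\, d\tilde r \\
&= \epsilon^{d/2}\cdot\epsilon^2\, g(x)\int_{S_1(0)}\|\Pi(\theta,\theta)\|^2\, d\theta\int_0^1 \Phi'(\tilde r^2)\,\tilde r^{3+d}\, d\tilde r,
\end{aligned}
\tag{B.0.27}
\]
and

\[
\begin{aligned}
\int_{B_{\sqrt{\epsilon}}(0)} \Phi'\!\left(\frac{r^2}{\epsilon}\right) r^4\, \|\Pi(\theta,\theta)\|^2\, O(r^2)\, ds^1\cdots ds^d
&= \int_{B_1(0)} \Phi'(\tilde r^2)\cdot\epsilon^2\tilde r^4\, \|\Pi(\theta,\theta)\|^2\cdot O(\epsilon\,\tilde r^2)\cdot\epsilon^{d/2}\, d\tilde s^1\cdots d\tilde s^d \\
&= \epsilon^{d/2}\cdot O(\epsilon^3).
\end{aligned}
\tag{B.0.28}
\]
Recall that in the proof of Lemma 31 we adopted the notation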

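\[
m_2 = \int_{B_1(0)} \Phi(\tilde r^2)\,(\tilde s^j)^2\, d\tilde s^1\cdots d\tilde s^d, \quad j = 1,\dots,d,
\]
thus
\[
m_2\, d = \int_{B_1(0)} \Phi(\tilde r^2)\,\tilde r^2\, d\tilde s^1\cdots d\tilde s^d = \omega_{d-1}\int_0^1 \Phi(\tilde r^2)\,\tilde r^{d+1}\, d\tilde r,
\]
where $\omega_{d-1}$ is the volume of the standard unit sphere of dimension $(d-1)$. Denoting

\[
A(x) = \frac{1}{\omega_{d-1}}\int_{S_1(0)}\|\Pi(\theta,\theta)\|^2\, d\theta
\]
as the average of the squared length of the second fundamental form over the standard unit sphere, we can write (B.0.27) as

\[
\begin{aligned}
\epsilon^{d/2}\cdot\epsilon^2\, g(x)\int_{S_1(0)}\|\Pi(\theta,\theta)\|^2\, d\theta\int_0^1 \Phi'(\tilde r^2)\,\tilde r^{3+d}\, d\tilde r
&= \epsilon^{d/2}\cdot\epsilon^2\, g(x)\, A(x)\,\omega_{d-1}\cdot\left(-\frac{m_2}{2\omega_{d-1}}\, d(d+2)\right) \\
&= -\epsilon^{d/2}\cdot\epsilon^2\,\frac{m_2}{2}\, d(d+2)\, g(x)\, A(x),
\end{aligned}
\tag{B.0.29}
\]
where we integrated by parts (using $\Phi(1) = 0$)

\[
\begin{aligned}
\int_0^1 \Phi'(\tilde r^2)\,\tilde r^{3+d}\, d\tilde r
&\overset{\xi = \tilde r^2}{=} \int_0^1 \Phi'(\xi)\,\xi^{\frac{3+d}{2}}\cdot\frac{1}{2}\,\xi^{-\frac{1}{2}}\, d\xi = \frac{1}{2}\int_0^1 \Phi'(\xi)\,\xi^{1+\frac{d}{2}}\, d\xi \\
&= \frac{1}{2}\left[\left.\Phi(\xi)\,\xi^{1+\frac{d}{2}}\right|_{\xi=0}^{\xi=1} - \left(1+\frac{d}{2}\right)\int_0^1 \Phi(\xi)\,\xi^{\frac{d}{2}}\, d\xi\right] \\
&= -\frac{d+2}{2}\int_0^1 \Phi(\tilde r^2)\,\tilde r^{d+1}\, d\tilde r = -\frac{m_2}{2\omega_{d-1}}\, d(d+2).
\end{aligned}
\]
Combining (B.0.27), (B.0.28), (B.0.29) with (B.0.25) and (B.0.26), we conclude that

\[
\begin{aligned}
\int_M\left[\Phi\!\left(\frac{\|x-y\|^2}{\epsilon}\right) - \Phi\!\left(\frac{d_M^2(x,y)}{\epsilon}\right)\right] g(y)\, d\mathrm{vol}_M(y)
&= \epsilon^{d/2}\left[\epsilon\cdot\frac{1}{12}\cdot\frac{m_2}{2}\, d(d+2)\, g(x)\, A(x) + O(\epsilon^2)\right] \\
&= \epsilon^{d/2}\left[\epsilon\,\frac{m_2}{24}\, d(d+2)\, A(x)\, g(x) + O(\epsilon^2)\right],
\end{aligned}
\]

which establishes

\[
\begin{aligned}
\left(\hat\Phi_\epsilon\, g\right)(x) &= \int_M \hat\Phi_\epsilon(x,y)\, g(y)\, d\mathrm{vol}_M(y) \\
&= \int_M \Phi\!\left(\frac{d_M^2(x,y)}{\epsilon}\right) g(y)\, d\mathrm{vol}_M(y) + \int_M\left[\Phi\!\left(\frac{\|x-y\|^2}{\epsilon}\right) - \Phi\!\left(\frac{d_M^2(x,y)}{\epsilon}\right)\right] g(y)\, d\mathrm{vol}_M(y) \\
&= \epsilon^{d/2}\left[ m_0\, g(x) + \epsilon\,\frac{m_2}{2}\left(\Delta_M g(x) - \frac{1}{3}\mathrm{Scal}(x)\, g(x) + \frac{1}{12}\, d(d+2)\, A(x)\, g(x)\right) + O(\epsilon^2)\right] \\
&= \epsilon^{d/2}\left[ m_0\, g(x) + \epsilon\,\frac{m_2}{2}\left(\Delta_M g(x) + E(x)\, g(x)\right) + O(\epsilon^2)\right],
\end{aligned}
\]
with
\[
E(x) := -\frac{1}{3}\mathrm{Scal}(x) + \frac{1}{12}\, d(d+2)\, A(x).
\]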
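Two steps of this proof lend themselves to a quick numerical sanity check. The sketch below is an illustration under stated assumptions, not part of the argument: it uses the profile $\Phi(u) = (1-u)^2$ on $[0,1]$ (a hypothetical choice with $\Phi(1) = 0$, only $C^1$ at $u = 1$), checks the integration-by-parts identity for several $d$, and then evaluates the Euclidean-distance kernel integral on the round unit sphere $S^2\subset\mathbb{R}^3$ with $g\equiv 1$. On that sphere $\mathrm{Scal} = 2$ and $A = 1$, so $E = -\frac{2}{3} + \frac{2\cdot 4}{12} = 0$ and the expansion predicts $\epsilon\, m_0$ up to the remainder.

```python
import math

def phi(u):
    # Illustrative profile on [0, 1] with phi(1) = 0 (an assumption).
    return (1.0 - u) ** 2 if 0.0 <= u <= 1.0 else 0.0

def dphi(u):
    # Derivative of the illustrative profile.
    return -2.0 * (1.0 - u) if 0.0 <= u <= 1.0 else 0.0

def simpson(f, a, b, n=2000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def parts_identity_gap(d):
    # int_0^1 phi'(r^2) r^(3+d) dr  minus  -(d+2)/2 * int_0^1 phi(r^2) r^(d+1) dr.
    lhs = simpson(lambda r: dphi(r * r) * r ** (3 + d), 0.0, 1.0)
    rhs = -0.5 * (d + 2) * simpson(lambda r: phi(r * r) * r ** (d + 1), 0.0, 1.0)
    return lhs - rhs

def correction_E(d, scal, avg_pi_sq):
    # E = -Scal / 3 + d (d + 2) / 12 * A, as in Lemma 35.
    return -scal / 3.0 + d * (d + 2) / 12.0 * avg_pi_sq

def chordal_kernel_integral_on_sphere(eps):
    # Integral over the unit S^2 of phi(|x - y|^2 / eps), where |x - y|^2 = 2 - 2 cos(r)
    # at geodesic distance r, and dvol = sin(r) dr dtheta.
    upper = math.acos(1.0 - eps / 2.0)  # geodesic radius at which |x - y|^2 = eps
    return 2.0 * math.pi * simpson(
        lambda r: phi((2.0 - 2.0 * math.cos(r)) / eps) * math.sin(r), 0.0, upper)

m0 = 2.0 * math.pi * simpson(lambda r: phi(r * r) * r, 0.0, 1.0)  # d = 2
print([parts_identity_gap(d) for d in range(1, 6)])
print(correction_E(2, 2.0, 1.0), chordal_kernel_integral_on_sphere(0.05) / 0.05, m0)
```

For this particular example the agreement with $\epsilon\, m_0$ is exact rather than merely asymptotic, because the substitution $u = (2 - 2\cos r)/\epsilon$ turns the sphere integral into $\pi\epsilon\int_0^1\Phi(u)\,du$.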

Remark 36. The only difference between the conclusions in Lemma 35 and Lemma 28 is that the scalar function $E(x)$ takes the place of the scalar curvature term $-\frac{1}{3}\mathrm{Scal}(x)$; one can check, essentially by going through the proof of Theorem 14, that this change does not affect the conclusion of Theorem 14. Specifically, in that proof the same $\mathrm{Scal}(x)$ terms from the numerator and denominator cancel each other in the asymptotic expansion, and this cancellation still occurs if one replaces them with $E(x)$. In fact, by applying Lemma 35 repeatedly we have the following expansions for $f, g\in C^\infty(UTM)$:

\[
\int_M \hat\Phi_\epsilon(x,y)\, f(y, P_{y,x}v)\, d\mathrm{vol}_M(y) = \epsilon^{d/2}\left\{ m_0\, f(x,v) + \epsilon\,\frac{m_2}{2}\left[\Delta_S^H f(x,v) + E_1(x)\, f(x,v)\right] + O(\epsilon^2)\right\}, \tag{B.0.30}
\]

and

\[
\begin{aligned}
\int_{UTM}\tilde K_{\epsilon,\delta}(x,v;y,w)\, g(y,w)\, d\Theta(y,w)
&= \epsilon^{d/2}\delta^{(d-1)/2}\left\{ m_0\, g(x,v) + \epsilon\,\frac{m_{21}}{2}\left[\Delta_S^H g(x,v) + E_1(x)\, g(x,v)\right]\right. \\
&\qquad\left. + \delta\,\frac{m_{22}}{2}\left[\Delta_S^V g(x,v) + E_2\cdot g(x,v)\right] + O(\epsilon^2 + \delta^2)\right\},
\end{aligned}
\tag{B.0.31}
\]
where
\[
E_1(\xi_i) = -\frac{1}{3}\mathrm{Scal}_M(\xi_i) + \frac{d(d+2)}{12}\cdot\frac{1}{\omega_{d-1}}\int_{S_1(0)}\|\Pi_M(\theta,\theta)\|^2\, d\theta
\]

only depends on the scalar curvature $\mathrm{Scal}_M$ and the second fundamental form $\Pi_M$ of the base manifold $M$ at $\xi_i$, and

\[
E_2 = -\frac{1}{3}\mathrm{Scal}_S + \frac{(d-1)(d+1)}{12}\cdot\frac{1}{\omega_{d-2}}\int_{S_1(0)}\|\Pi_S(\theta,\theta)\|^2\, d\theta
\]
is a constant because

\[
\mathrm{Scal}_S\equiv (d-1)(d-2), \qquad \|\Pi_S(\theta,\theta)\|^2\equiv 1 \quad\text{for any unit tangent vector } \theta.
\]

These expansions are essentially the equivalents of Lemma 32 and Lemma 33 for $\tilde K_{\epsilon,\delta}$. Using (B.0.30) and (B.0.31), and picking $\delta = O(\epsilon)$ as $\epsilon\to 0$, a version of Theorem 14 holds true when $K^\alpha_{\epsilon,\delta}$ is replaced with $\tilde K^\alpha_{\epsilon,\delta}$, i.e., as $\epsilon\to 0$ (and thus $\delta\to 0$),

\[
\begin{aligned}
\tilde H^\alpha_{\epsilon,\delta} f(x,v) &= f(x,v) + \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta_S^H\!\left[fp^{1-\alpha}\right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\,\frac{\Delta_S^H p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)}\right] \\
&\quad + \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta_S^V\!\left[fp^{1-\alpha}\right](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\,\frac{\Delta_S^V p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)}\right] + O(\epsilon^2 + \delta^2).
\end{aligned}
\tag{B.0.32}
\]
As we shall see below, this observation is the key to establishing estimates for the bias error in the proof of Theorem 21.

The last missing piece for the proof of Theorem 21 is a large deviation bound for our two-step sampling strategy. Recall from Assumption 19 that we first sample $N_B$ points $\xi_1,\dots,\xi_{N_B}$ i.i.d. with respect to $p$ on the base manifold $M$, and then sample $N_F$ points on each fibre $S_{\xi_j}$ i.i.d. with respect to $p(\cdot\mid\xi_j)$. The resulting $N_B\times N_F$ points on $UTM$

\[
\begin{array}{cccc}
x_{1,1}, & x_{1,2}, & \cdots, & x_{1,N_F} \\
x_{2,1}, & x_{2,2}, & \cdots, & x_{2,N_F} \\
\vdots & \vdots & \cdots & \vdots \\
x_{N_B,1}, & x_{N_B,2}, & \cdots, & x_{N_B,N_F}
\end{array}
\]
are generally not i.i.d. sampled from $UTM$. This forbids applying the Law of Large Numbers directly to quantities that take the form of an average over the entire unit tangent bundle, such as

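\[
\frac{1}{N_B N_F}\sum_{j=1}^{N_B}\sum_{s=1}^{N_F}\hat K_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s}).
\]
However, due to the conditional i.i.d. fibrewise sampling, it makes sense to apply the Law of Large Numbers to average quantities on a fixed fibre, e.g.,

\[
\frac{1}{N_F}\sum_{s=1}^{N_F}\hat K_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s}) \longrightarrow \mathbb{E}_Z\!\left[\tilde K_{\epsilon,\delta}(x_{i,r}, (\xi_j, Z))\, f(\xi_j, Z)\right],
\]
where $\mathbb{E}_Z$ stands for the expectation with respect to the “fibre component” of the coordinates of the points on $S_{\xi_j}$. Explicitly,

\[
\mathbb{E}_Z\!\left[\tilde K_{\epsilon,\delta}(x_{i,r}, (\xi_j,\cdot))\, f(\xi_j,\cdot)\right] = \int_{S_{\xi_j}}\tilde K_{\epsilon,\delta}(x_{i,r}, (\xi_j, w))\, f(\xi_j, w)\, p(w\mid\xi_j)\, d\sigma_{\xi_j}(w).
\]

Next, note that since $\xi_1,\dots,\xi_{N_B}$ are i.i.d. sampled from the base manifold $M$, the partial expectations

\[
\left\{\mathbb{E}_Z\!\left[\tilde K_{\epsilon,\delta}(x_{i,r}, (\xi_j, Z))\, f(\xi_j, Z)\right]\right\}_{j=1}^{N_B}
\]
are i.i.d. random variables on $M$ with respect to $p$. Thus

\[
\frac{1}{N_B}\sum_{j=1}^{N_B}\mathbb{E}_Z\!\left[\tilde K_{\epsilon,\delta}(x_{i,r}, (\xi_j, Z))\, f(\xi_j, Z)\right] \longrightarrow \mathbb{E}_Y\!\left[\mathbb{E}_Z\!\left[\tilde K_{\epsilon,\delta}(x_{i,r}, (Y, Z))\, f(Y, Z)\right]\right].
\]

Explicitly,

\[
\begin{aligned}
\mathbb{E}_Y\!\left[\mathbb{E}_Z\!\left[\tilde K_{\epsilon,\delta}(x_{i,r}, (Y,Z))\, f(Y,Z)\right]\right]
&= \int_M p(y)\int_{S_y}\tilde K_{\epsilon,\delta}(x_{i,r}, (y,w))\, f(y,w)\, p(w\mid y)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&= \int_M\!\int_{S_y}\tilde K_{\epsilon,\delta}(x_{i,r}, (y,w))\, f(y,w)\left[p(y)\, p(w\mid y)\right] d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&= \int_M\!\int_{S_y}\tilde K_{\epsilon,\delta}(x_{i,r}, (y,w))\, f(y,w)\, p(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y).
\end{aligned}
\]

This observation suggests the following iterated limit process

\[
\lim_{N_B\to\infty}\lim_{N_F\to\infty}\frac{1}{N_B N_F}\sum_{j=1}^{N_B}\sum_{s=1}^{N_F}\hat K_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s}) = \int_M\!\int_{S_y}\tilde K_{\epsilon,\delta}(x_{i,r}, (y,w))\, f(y,w)\, p(y,w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y),
\]
and the two limits on the left hand side generally do not commute. For this reason, it is natural for us to consider iterated partial expectations rather than the total expectation on the entire $UTM$. For simplicity of notation, let us denote $\mathbb{E}_Y$, $\mathbb{E}_Z$ as $\mathbb{E}_1$, $\mathbb{E}_2$ respectively; see the following definition.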

Definition 37. Let $p$ be a probability density function on $UTM$, and

\[
p(x) = \int_{S_x} p(x, w)\, d\sigma_x(w), \qquad p(v \mid x) = \frac{p(x, v)}{p(x)}
\]

as defined in (3.4.7)(3.4.8). For any function $f \in C^{\infty}(M)$, define

\[
\mathbb{E}_1 f := \int_M f(y)\, p(y)\, d\mathrm{vol}_M(y).
\]

For any function $g \in C^{\infty}(S_{\xi})$ for $\xi \in M$, define

\[
\mathbb{E}_2^{\xi} g := \int_{S_{\xi}} g(\xi, w)\, p(w \mid \xi)\, d\sigma_{\xi}(w).
\]

Definition 38. Let $p$ be a probability density function on $UTM$. We call a collection of $N_B \times N_F$ real-valued random functions

\[
\{X_{j,s} \mid 1 \le j \le N_B,\ 1 \le s \le N_F\}
\]

Procrustean with respect to $p$ on $UTM$, if

(i) For each $1 \le j \le N_B$, the subcollection $\{X_{j,s} \mid 1 \le s \le N_F\}$ are i.i.d. on $S_{\xi_j}$ for some $\xi_j \in M$, with respect to the conditional probability density $p(\cdot \mid \xi_j)$;

(ii) The points $\{\xi_j \mid 1 \le j \le N_B\}$ are i.i.d. on $M$ with respect to the projected probability density $p(\cdot)$.

Due to (i), we denote for simplicity of notation

\[
\mathbb{E}_2^{\xi_j} X_j := \mathbb{E}_2^{\xi_j} X_{j,s}, \qquad \mathbb{E}_2^{\xi_j} X_j^2 := \mathbb{E}_2^{\xi_j} X_{j,s}^2,
\]

and for the same purpose, due to (ii),

\[
\mathbb{E}_1\mathbb{E}_2 X := \mathbb{E}_1 \mathbb{E}_2^{\xi_j} X_j, \qquad \mathbb{E}_1(\mathbb{E}_2 X)^2 := \mathbb{E}_1\left(\mathbb{E}_2^{\xi_j} X_j\right)^2.
\]
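The two-step sampling behind Definition 38 can be illustrated with a small simulation. The sketch below is only a toy under stated assumptions: the base "manifold" is the interval $[0,1]$ with the uniform density, the fibre coordinate over a base point $\xi$ is $\xi$ plus a uniform variable, and the helper names `sample_two_step` and `iterated_mean` are hypothetical rather than taken from the text. It averages along each fibre first (the role of $\mathbb{E}_2$) and then across base points (the role of $\mathbb{E}_1$).

```python
import random

def sample_two_step(n_base, n_fibre, rng):
    """Draw n_base base points, then n_fibre fibre points over each base point.

    Toy assumption: base points xi ~ Uniform(0, 1); given xi, the fibre
    coordinate is w = xi + Uniform(0, 1). Points on one fibre are
    conditionally i.i.d., but the pooled points share base coordinates
    and are therefore not i.i.d. on the total space.
    """
    samples = []
    for _ in range(n_base):
        xi = rng.random()
        fibre = [(xi, xi + rng.random()) for _ in range(n_fibre)]
        samples.append(fibre)
    return samples

def iterated_mean(samples, func):
    """Average func over each fibre first, then over the base points."""
    fibre_means = [
        sum(func(xi, w) for xi, w in fibre) / len(fibre)
        for fibre in samples
    ]
    return sum(fibre_means) / len(fibre_means)

rng = random.Random(0)
samples = sample_two_step(n_base=2000, n_fibre=50, rng=rng)
estimate = iterated_mean(samples, lambda xi, w: xi * w)

# Exact iterated expectation for this toy model:
# E[xi * (xi + U)] = E[xi^2] + E[xi] * E[U] = 1/3 + 1/4 = 7/12.
exact = 7.0 / 12.0
```

In this toy model the iterated average settles near the exact iterated expectation once both `n_base` and `n_fibre` are moderately large.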

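Lemma 39. Let $\{X_{j,s} \mid 1 \le j \le N_B,\ 1 \le s \le N_F\}$ be a collection of Procrustean random functions with respect to some density function $p$ on $UTM$. If

\[
|X_{j,s}| \le M_0, \quad \left|\mathbb{E}_2^{\xi_j} X_j\right| \le M_1, \quad |\mathbb{E}_1\mathbb{E}_2 X| \le M_2 \quad \text{a.s. for all } 1 \le j \le N_B,\ 1 \le s \le N_F,
\]

then for any $t > 0$ and $0 < \theta < 1$,

\[
\begin{aligned}
\mathbb{P}\left\{ \frac{1}{N_B N_F}\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} X_{j,s} - \mathbb{E}_1\mathbb{E}_2 X > t \right\}
&\le \sum_{j=1}^{N_B} \exp\left\{ -\frac{\frac{1}{2}(1-\theta)^2 N_F t^2}{\left[\mathbb{E}_2^{\xi_j} X_j^2 - \left(\mathbb{E}_2^{\xi_j} X_j\right)^2\right] + \frac{1}{3}(M_0+M_1)(1-\theta)t} \right\} \\
&\quad + \exp\left\{ -\frac{\frac{1}{2}\theta^2 N_B t^2}{\left[\mathbb{E}_1(\mathbb{E}_2 X)^2 - (\mathbb{E}_1\mathbb{E}_2 X)^2\right] + \frac{1}{3}(M_1+M_2)\theta t} \right\}.
\end{aligned}
\]

Proof. Note that

\[
\begin{aligned}
&\mathbb{P}\left\{ \frac{1}{N_B N_F}\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} X_{j,s} - \mathbb{E}_1\mathbb{E}_2 X > t \right\} \\
&= \mathbb{P}\left\{ \left( \frac{1}{N_B N_F}\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} X_{j,s} - \frac{1}{N_B}\sum_{j=1}^{N_B} \mathbb{E}_2^{\xi_j} X_j \right) + \left( \frac{1}{N_B}\sum_{j=1}^{N_B} \mathbb{E}_2^{\xi_j} X_j - \mathbb{E}_1\mathbb{E}_2 X \right) > t \right\} \\
&\le \mathbb{P}\left( \left\{ \frac{1}{N_B N_F}\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} X_{j,s} - \frac{1}{N_B}\sum_{j=1}^{N_B} \mathbb{E}_2^{\xi_j} X_j > (1-\theta)t \right\} \cup \left\{ \frac{1}{N_B}\sum_{j=1}^{N_B} \mathbb{E}_2^{\xi_j} X_j - \mathbb{E}_1\mathbb{E}_2 X > \theta t \right\} \right) \\
&\le \mathbb{P}\left\{ \frac{1}{N_B N_F}\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} X_{j,s} - \frac{1}{N_B}\sum_{j=1}^{N_B} \mathbb{E}_2^{\xi_j} X_j > (1-\theta)t \right\} + \mathbb{P}\left\{ \frac{1}{N_B}\sum_{j=1}^{N_B} \mathbb{E}_2^{\xi_j} X_j - \mathbb{E}_1\mathbb{E}_2 X > \theta t \right\} \\
&=: (\mathrm{I}) + (\mathrm{II}),
\end{aligned}
\]

where $\theta \in (0,1)$ will be fixed in specific applications. Since

\[
\left| \mathbb{E}_2^{\xi_j} X_j - \mathbb{E}_1\mathbb{E}_2 X \right| \le M_1 + M_2,
\]

by Bernstein’s Inequality [38, §2.2],

\[
\begin{aligned}
(\mathrm{II}) &= \mathbb{P}\left\{ \sum_{j=1}^{N_B} \left( \mathbb{E}_2^{\xi_j} X_j - \mathbb{E}_1\mathbb{E}_2 X \right) > \theta N_B t \right\} \\
&\le \exp\left\{ -\frac{\frac{1}{2}\theta^2 N_B^2 t^2}{\sum_{j=1}^{N_B} \mathbb{E}_1\left[\left(\mathbb{E}_2^{\xi_j} X_j - \mathbb{E}_1\mathbb{E}_2 X\right)^2\right] + \frac{1}{3}(M_1+M_2)\theta N_B t} \right\} \\
&= \exp\left\{ -\frac{\frac{1}{2}\theta^2 N_B^2 t^2}{N_B\left[\mathbb{E}_1(\mathbb{E}_2 X)^2 - (\mathbb{E}_1\mathbb{E}_2 X)^2\right] + \frac{1}{3}(M_1+M_2)\theta N_B t} \right\} \\
&= \exp\left\{ -\frac{\frac{1}{2}\theta^2 N_B t^2}{\left[\mathbb{E}_1(\mathbb{E}_2 X)^2 - (\mathbb{E}_1\mathbb{E}_2 X)^2\right] + \frac{1}{3}(M_1+M_2)\theta t} \right\}.
\end{aligned}
\]

For $(\mathrm{I})$, note that

\[
\begin{aligned}
(\mathrm{I}) &= \mathbb{P}\left\{ \sum_{j=1}^{N_B} \left( \frac{1}{N_F}\sum_{s=1}^{N_F} X_{j,s} - \mathbb{E}_2^{\xi_j} X_j \right) > (1-\theta) N_B t \right\} \\
&\le \sum_{j=1}^{N_B} \mathbb{P}\left\{ \frac{1}{N_F}\sum_{s=1}^{N_F} X_{j,s} - \mathbb{E}_2^{\xi_j} X_j > (1-\theta)t \right\} \\
&= \sum_{j=1}^{N_B} \mathbb{P}\left\{ \sum_{s=1}^{N_F} \left( X_{j,s} - \mathbb{E}_2^{\xi_j} X_j \right) > (1-\theta) N_F t \right\},
\end{aligned}
\]

and applying Bernstein’s Inequality to each individual term in the summation yields

\[
\begin{aligned}
\mathbb{P}\left\{ \sum_{s=1}^{N_F} \left( X_{j,s} - \mathbb{E}_2^{\xi_j} X_j \right) > (1-\theta) N_F t \right\}
&\le \exp\left\{ -\frac{\frac{1}{2}(1-\theta)^2 N_F^2 t^2}{\sum_{s=1}^{N_F} \mathbb{E}_2^{\xi_j}\left[\left(X_{j,s} - \mathbb{E}_2^{\xi_j} X_j\right)^2\right] + \frac{1}{3}(M_0+M_1)(1-\theta) N_F t} \right\} \\
&= \exp\left\{ -\frac{\frac{1}{2}(1-\theta)^2 N_F^2 t^2}{N_F\left[\mathbb{E}_2^{\xi_j} X_j^2 - \left(\mathbb{E}_2^{\xi_j} X_j\right)^2\right] + \frac{1}{3}(M_0+M_1)(1-\theta) N_F t} \right\} \\
&= \exp\left\{ -\frac{\frac{1}{2}(1-\theta)^2 N_F t^2}{\left[\mathbb{E}_2^{\xi_j} X_j^2 - \left(\mathbb{E}_2^{\xi_j} X_j\right)^2\right] + \frac{1}{3}(M_0+M_1)(1-\theta) t} \right\},
\end{aligned}
\]

and thus

\[
(\mathrm{I}) \le \sum_{j=1}^{N_B} \exp\left\{ -\frac{\frac{1}{2}(1-\theta)^2 N_F t^2}{\left[\mathbb{E}_2^{\xi_j} X_j^2 - \left(\mathbb{E}_2^{\xi_j} X_j\right)^2\right] + \frac{1}{3}(M_0+M_1)(1-\theta) t} \right\}.
\]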

Remark 40. Intuitively, the second term in the bound stems from the sampling error on the base manifold, and is thus independent of δ and NF ; the first term in the bound comes from accumulating fibrewise sampling error across all NB fibres.
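To see how the two terms of the bound in Lemma 39 trade off, one can simply evaluate its right-hand side numerically. The following is a minimal sketch, not part of the original argument: the function name `lemma39_bound` and all input values are hypothetical, with `fibre_vars[j]` standing for the fibrewise variance on the $j$-th fibre and `base_var` for the variance of the fibrewise means over the base.

```python
import math

def lemma39_bound(t, theta, n_base, n_fibre, fibre_vars, base_var, m0, m1, m2):
    """Evaluate the two-term Bernstein-type bound stated in Lemma 39.

    fibre_vars[j] plays the role of E_2 X_j^2 - (E_2 X_j)^2 on fibre j,
    base_var the role of E_1 (E_2 X)^2 - (E_1 E_2 X)^2, and m0, m1, m2
    are the almost-sure bounds M_0, M_1, M_2 (all values are user-supplied).
    """
    if t <= 0.0 or not (0.0 < theta < 1.0):
        raise ValueError("need t > 0 and 0 < theta < 1")
    if len(fibre_vars) != n_base:
        raise ValueError("need one fibrewise variance per base point")
    s = (1.0 - theta) * t
    fibre_term = sum(
        math.exp(-0.5 * n_fibre * s * s / (v + (m0 + m1) * s / 3.0))
        for v in fibre_vars
    )
    u = theta * t
    base_term = math.exp(-0.5 * n_base * u * u / (base_var + (m1 + m2) * u / 3.0))
    return fibre_term + base_term

# Many samples per fibre: the accumulated fibrewise term is tiny.
small = lemma39_bound(0.5, 0.5, 200, 400, [0.25] * 200, 0.25, 1.0, 1.0, 1.0)
# Few samples per fibre: the sum over fibres exceeds 1, so the bound is vacuous.
large = lemma39_bound(0.5, 0.5, 200, 4, [0.25] * 200, 0.25, 1.0, 1.0, 1.0)
```

With these illustrative inputs, the first call is dominated by the base term, while the second shows how the fibrewise term, summed over all $N_B$ fibres, can make the bound uninformative unless $N_F$ grows as well.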

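Proof of Theorem 21. We shall first establish the result for $\alpha = 0$. In this case, $\hat{K}^0_{\epsilon,\delta}(\cdot,\cdot) = \hat{K}_{\epsilon,\delta}(\cdot,\cdot)$, and

\[
\hat{H}^0_{\epsilon,\delta} f(x_{i,r})
= \frac{\displaystyle\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s})}{\displaystyle\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})}
= \frac{\displaystyle\frac{1}{N_B N_F}\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} K\left(\frac{\|\xi_i-\xi_j\|^2}{\epsilon}, \frac{\|P_{\xi_j,\xi_i}x_{i,r}-x_{j,s}\|^2}{\delta}\right) f(x_{j,s})}{\displaystyle\frac{1}{N_B N_F}\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} K\left(\frac{\|\xi_i-\xi_j\|^2}{\epsilon}, \frac{\|P_{\xi_j,\xi_i}x_{i,r}-x_{j,s}\|^2}{\delta}\right)}.
\]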
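The normalized double sum above is straightforward to evaluate on sampled data. The sketch below is a toy instance under strong assumptions that are not part of the text: the base is a flat patch of the plane (so parallel transport leaves tangent directions unchanged), unit tangent vectors are stored as angles, and `kernel` is a simple compactly supported product kernel chosen only for illustration. The names `kernel` and `hdm_average` are hypothetical.

```python
import math
import random

def kernel(u, v):
    """Toy compactly supported kernel on [0, 1]^2 (an illustrative choice)."""
    return max(0.0, 1.0 - u) * max(0.0, 1.0 - v)

def hdm_average(i, r, base, angles, values, eps, delta):
    """Normalized kernel average at x_{i,r} for alpha = 0.

    base[j] is a point in the flat plane, angles[j][s] a unit tangent
    direction on fibre j, values[j][s] the sample f(x_{j,s}). On a flat base,
    parallel transport keeps the angle, so the transported x_{i,r} has
    direction angles[i][r].
    """
    xi, yi = base[i]
    ci, si = math.cos(angles[i][r]), math.sin(angles[i][r])
    num = 0.0
    den = 0.0
    for j, (xj, yj) in enumerate(base):
        if j == i:
            continue  # terms on the same fibre are taken to be zero
        h = ((xi - xj) ** 2 + (yi - yj) ** 2) / eps
        if h >= 1.0:
            continue  # outside the horizontal support of the kernel
        for s, a in enumerate(angles[j]):
            v = ((ci - math.cos(a)) ** 2 + (si - math.sin(a)) ** 2) / delta
            w = kernel(h, v)
            num += w * values[j][s]
            den += w
    if den == 0.0:
        raise ValueError("no neighbours inside the kernel support")
    return num / den

rng = random.Random(1)
n_base, n_fibre = 300, 40
base = [(0.5, 0.5)] + [(rng.random(), rng.random()) for _ in range(n_base - 1)]
angles = [[rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_fibre)]
          for _ in range(n_base)]
const_vals = [[3.0] * n_fibre for _ in range(n_base)]
cos_vals = [[math.cos(a) for a in fibre] for fibre in angles]

const_avg = hdm_average(0, 0, base, angles, const_vals, eps=0.05, delta=0.5)
cos_avg = hdm_average(0, 0, base, angles, cos_vals, eps=0.05, delta=0.5)
```

Because the operator is a weighted average with nonnegative weights, it reproduces constant functions and keeps the output within the range of the sampled values, which is what the two calls illustrate.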

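Since $\{x_{j,s}\}_{s=1}^{N_F}$ are i.i.d. with respect to $p(\cdot \mid \xi_j)$, by the law of large numbers, for each fixed $j = 1, \cdots, N_B$, as $N_F \to \infty$,

\[
\begin{aligned}
&\lim_{N_F \to \infty} \frac{1}{N_F}\sum_{s=1}^{N_F} K\left(\frac{\|\xi_i-\xi_j\|^2}{\epsilon}, \frac{\|P_{\xi_j,\xi_i}x_{i,r}-x_{j,s}\|^2}{\delta}\right) f(x_{j,s}) \\
&= \int_{S_{\xi_j}} K\left(\frac{\|\xi_i-\xi_j\|^2}{\epsilon}, \frac{\|P_{\xi_j,\xi_i}x_{i,r}-w\|^2}{\delta}\right) f(\xi_j, w)\, p(w \mid \xi_j)\, d\sigma_{\xi_j}(w).
\end{aligned}
\]

Note that $\{\xi_j\}_{j=1}^{N_B}$ are i.i.d. with respect to $p$; again by the law of large numbers, as $N_B \to \infty$,

\[
\begin{aligned}
&\lim_{N_B \to \infty} \frac{1}{N_B}\sum_{j=1}^{N_B} \lim_{N_F \to \infty} \frac{1}{N_F}\sum_{s=1}^{N_F} K\left(\frac{\|\xi_i-\xi_j\|^2}{\epsilon}, \frac{\|P_{\xi_j,\xi_i}x_{i,r}-x_{j,s}\|^2}{\delta}\right) f(x_{j,s}) \\
&= \int_M p(y)\, \frac{1}{\omega_{d-1}} \int_{S_y} K\left(\frac{\|\xi_i-y\|^2}{\epsilon}, \frac{\|P_{y,\xi_i}x_{i,r}-w\|^2}{\delta}\right) f(y, w)\, p(w \mid y)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&= \int_{UTM} K\left(\frac{\|\xi_i-y\|^2}{\epsilon}, \frac{\|P_{y,\xi_i}x_{i,r}-w\|^2}{\delta}\right) f(y, w)\, p(y, w)\, d\Theta(y, w),
\end{aligned}
\]

where we used $p(y, w) = p(y)\, p(w \mid y)$. Setting $f \equiv 1$,

\[
\begin{aligned}
&\lim_{N_B \to \infty} \frac{1}{N_B}\sum_{j=1}^{N_B} \lim_{N_F \to \infty} \frac{1}{N_F}\sum_{s=1}^{N_F} K\left(\frac{\|\xi_i-\xi_j\|^2}{\epsilon}, \frac{\|P_{\xi_j,\xi_i}x_{i,r}-x_{j,s}\|^2}{\delta}\right) \\
&= \int_{UTM} K\left(\frac{\|\xi_i-y\|^2}{\epsilon}, \frac{\|P_{y,\xi_i}x_{i,r}-w\|^2}{\delta}\right) p(y, w)\, d\Theta(y, w).
\end{aligned}
\]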

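Therefore,

\[
\begin{aligned}
\lim_{N_B \to \infty} \lim_{N_F \to \infty} \hat{H}^0_{\epsilon,\delta} f(x_{i,r})
&= \frac{\displaystyle\int_{UTM} K\left(\frac{\|\xi_i-y\|^2}{\epsilon}, \frac{\|P_{y,\xi_i}x_{i,r}-w\|^2}{\delta}\right) f(y, w)\, p(y, w)\, d\Theta(y, w)}{\displaystyle\int_{UTM} K\left(\frac{\|\xi_i-y\|^2}{\epsilon}, \frac{\|P_{y,\xi_i}x_{i,r}-w\|^2}{\delta}\right) p(y, w)\, d\Theta(y, w)} \\
&= \tilde{H}^0_{\epsilon,\delta} f(x_{i,r}) \\
&= f(x_{i,r}) + \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta_S^H[fp](x_{i,r})}{p(x_{i,r})} - f(x_{i,r})\,\frac{\Delta_S^H p(x_{i,r})}{p(x_{i,r})}\right] \\
&\quad + \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta_S^V[fp](x_{i,r})}{p(x_{i,r})} - f(x_{i,r})\,\frac{\Delta_S^V p(x_{i,r})}{p(x_{i,r})}\right] + O(\epsilon^2+\delta^2),
\end{aligned}
\]

where in the last equality we used the assumption $\delta = O(\epsilon)$ as $\epsilon \to 0$, as well as the observation in Remark 36 that Theorem 14 holds true when $K^{\alpha}_{\epsilon,\delta}$ is replaced with $\tilde{K}^{\alpha}_{\epsilon,\delta}$. The bias error is thus $O(\epsilon^2+\delta^2)$.

It remains to estimate the variance error for the special case $\alpha = 0$. Write for any fixed $x_{i,r} \in UTM$

\[
F_{j,s} = \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s}), \qquad G_{j,s} = \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s}).
\]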

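Note that $F_{i,s} = 0$, $G_{i,s} = 0$ for all $s = 1, \cdots, N_F$, by Definition 20(1). Also, by the compactness of $UTM$ we have some trivial bounds uniform in $j, s$:

\[
|F_{j,s}| \le \|K\|_{\infty} \|f\|_{\infty}, \qquad |G_{j,s}| \le \|K\|_{\infty}.
\]

In these notations,

\[
\lim_{N_B \to \infty} \lim_{N_F \to \infty} \hat{H}^0_{\epsilon,\delta} f(x_{i,r}) = \frac{\mathbb{E}_1\mathbb{E}_2 F}{\mathbb{E}_1\mathbb{E}_2 G},
\]

and we would like to estimate

\[
p(N_B, N_F, \beta) := \mathbb{P}\left\{ \frac{\sum_j \sum_s F_{j,s}}{\sum_j \sum_s G_{j,s}} - \frac{\mathbb{E}_1\mathbb{E}_2 F}{\mathbb{E}_1\mathbb{E}_2 G} > \beta \right\}
\]

for sufficiently small $\beta > 0$. An upper bound for

\[
\mathbb{P}\left\{ \frac{\sum_j \sum_s F_{j,s}}{\sum_j \sum_s G_{j,s}} - \frac{\mathbb{E}_1\mathbb{E}_2 F}{\mathbb{E}_1\mathbb{E}_2 G} < -\beta \right\}
\]

can be obtained in a similar manner.

Since $G_{j,s}$ are all positive,

\[
\begin{aligned}
p(N_B, N_F, \beta)
&= \mathbb{P}\left\{ \frac{\left(\sum_j \sum_s F_{j,s}\right) \mathbb{E}_1\mathbb{E}_2 G - \left(\sum_j \sum_s G_{j,s}\right) \mathbb{E}_1\mathbb{E}_2 F}{\left(\sum_j \sum_s G_{j,s}\right) \mathbb{E}_1\mathbb{E}_2 G} > \beta \right\} \\
&= \mathbb{P}\left\{ \Big(\sum_j \sum_s F_{j,s}\Big) \mathbb{E}_1\mathbb{E}_2 G - \Big(\sum_j \sum_s G_{j,s}\Big) \mathbb{E}_1\mathbb{E}_2 F > \beta \Big(\sum_j \sum_s G_{j,s}\Big) \mathbb{E}_1\mathbb{E}_2 G \right\}.
\end{aligned}
\]

Denote

\[
Y_{j,s} := F_{j,s}\, \mathbb{E}_1\mathbb{E}_2 G - G_{j,s}\, \mathbb{E}_1\mathbb{E}_2 F + \beta\left(\mathbb{E}_1\mathbb{E}_2 G - G_{j,s}\right) \mathbb{E}_1\mathbb{E}_2 G,
\]

then it is easily verifiable that $\mathbb{E}_1\mathbb{E}_2 Y_{j,s} = 0$ for all $1 \le j \le N_B$, $1 \le s \le N_F$, and

\[
p(N_B, N_F, \beta) = \mathbb{P}\left\{ \frac{1}{N_B N_F}\sum_j \sum_s Y_{j,s} > \beta\, (\mathbb{E}_1\mathbb{E}_2 G)^2 \right\}.
\]

By Lemma 39, bounding this quantity reduces to computing various moments. Define

\[
X_j := \mathbb{E}_2 Y_j,
\]

then $X_1, \cdots, X_{N_B}$ are i.i.d. on $M$ with respect to $p$, and $\mathbb{E}_1 X_j = 0$ for $1 \le j \le N_B$. Furthermore, $X_1, \cdots, X_{N_B}$ are uniformly bounded. To find this bound explicitly, note that

\[
\begin{aligned}
|X_j| = |\mathbb{E}_2 Y_j| &= \left| (\mathbb{E}_2 F_j)\, \mathbb{E}_1\mathbb{E}_2 G - (\mathbb{E}_2 G_j)\, \mathbb{E}_1\mathbb{E}_2 F + \beta\left(\mathbb{E}_1\mathbb{E}_2 G - \mathbb{E}_2 G_j\right) \mathbb{E}_1\mathbb{E}_2 G \right| \\
&\le \left| (\mathbb{E}_2 F_j)\, \mathbb{E}_1\mathbb{E}_2 G \right| + \left| (\mathbb{E}_2 G_j)\, \mathbb{E}_1\mathbb{E}_2 F \right| + \beta\, (\mathbb{E}_1\mathbb{E}_2 G)^2 + \beta\, |\mathbb{E}_2 G_j|\, |\mathbb{E}_1\mathbb{E}_2 G|,
\end{aligned}
\]

and recall from Lemma 35 and Remark 36 that

\[
\mathbb{E}_1\mathbb{E}_2 F = O\left(\epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}}\right), \quad \mathbb{E}_1\mathbb{E}_2 G = O\left(\epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}}\right), \quad \mathbb{E}_2 F_j = O\left(\delta^{\frac{d-1}{2}}\right), \quad \mathbb{E}_2 G_j = O\left(\delta^{\frac{d-1}{2}}\right),
\]

thus

\[
|X_j| \le \tilde{C}\left( \epsilon^{\frac{d}{2}} \delta^{d-1} + \beta\left( \epsilon^{d} \delta^{d-1} + \epsilon^{\frac{d}{2}} \delta^{d-1} \right) \right),
\]

where $\tilde{C}$ is some positive constant depending on the pointwise bounds of $K$, $p$, and $f$. Since we will be mostly interested in small $\beta > 0$, let us pick $\beta = O(\epsilon^2+\delta^2)$ and write the upper bound on $|X_j|$ as

\[
|X_j| \le C \epsilon^{\frac{d}{2}} \delta^{d-1}, \qquad C = C\left(\|K\|_{\infty}, \|f\|_{\infty}, p_m, p_M\right) > 0.
\tag{B.0.33}
\]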

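We then need to bound $\mathbb{E}_1 X_j^2$. Note that

\[
\begin{aligned}
\mathbb{E}_1 X_j^2 &= \mathbb{E}_1\left[(\mathbb{E}_2 F_j)^2\right] (\mathbb{E}_1\mathbb{E}_2 G)^2 + \mathbb{E}_1\left[(\mathbb{E}_2 G_j)^2\right] (\mathbb{E}_1\mathbb{E}_2 F)^2 \\
&\quad - 2\,\mathbb{E}_1\left[(\mathbb{E}_2 F_j)(\mathbb{E}_2 G_j)\right] (\mathbb{E}_1\mathbb{E}_2 F)(\mathbb{E}_1\mathbb{E}_2 G) + \beta^2\, \mathbb{E}_1\left[(\mathbb{E}_1\mathbb{E}_2 G - \mathbb{E}_2 G_j)^2\right] (\mathbb{E}_1\mathbb{E}_2 G)^2 \\
&\quad + 2\beta\, (\mathbb{E}_1\mathbb{E}_2 G)\, \mathbb{E}_1\left\{ (\mathbb{E}_1\mathbb{E}_2 G - \mathbb{E}_2 G_j)\left[ (\mathbb{E}_2 F_j)\, \mathbb{E}_1\mathbb{E}_2 G - (\mathbb{E}_2 G_j)\, \mathbb{E}_1\mathbb{E}_2 F \right] \right\} \\
&= \mathbb{E}_1\left[(\mathbb{E}_2 F_j)^2\right] (\mathbb{E}_1\mathbb{E}_2 G)^2 + \mathbb{E}_1\left[(\mathbb{E}_2 G_j)^2\right] (\mathbb{E}_1\mathbb{E}_2 F)^2 \\
&\quad - 2\,\mathbb{E}_1\left[(\mathbb{E}_2 F_j)(\mathbb{E}_2 G_j)\right] (\mathbb{E}_1\mathbb{E}_2 F)(\mathbb{E}_1\mathbb{E}_2 G) \\
&\quad + \beta^2\, (\mathbb{E}_1\mathbb{E}_2 G)^2 \left[ \mathbb{E}_1(\mathbb{E}_2 G)^2 - (\mathbb{E}_1\mathbb{E}_2 G)^2 \right] \\
&\quad + 2\beta\, (\mathbb{E}_1\mathbb{E}_2 G) \left[ \mathbb{E}_1(\mathbb{E}_2 G)^2\, \mathbb{E}_1\mathbb{E}_2 F - (\mathbb{E}_1\mathbb{E}_2 G)\, \mathbb{E}_1(\mathbb{E}_2 F_j\, \mathbb{E}_2 G_j) \right],
\end{aligned}
\]

so it suffices to compute the first and second moments of $\mathbb{E}_2 F_j$, $\mathbb{E}_2 G_j$ for $1 \le j \le N_B$. By (B.0.31),

\[
\begin{aligned}
\mathbb{E}_1\mathbb{E}_2 F = \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \Big\{ m_0\, [fp](x_{i,r}) &+ \epsilon\, \frac{m_{21}}{2}\left( \Delta_S^H [fp](x_{i,r}) + E_1(\xi_i)\, [fp](x_{i,r}) \right) \\
&+ \delta\, \frac{m_{22}}{2}\left( \Delta_S^V [fp](x_{i,r}) + E_2 \cdot [fp](x_{i,r}) \right) + O(\epsilon^2+\delta^2) \Big\},
\end{aligned}
\]

and if we set $f \equiv 1$, then

\[
\begin{aligned}
\mathbb{E}_1\mathbb{E}_2 G = \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \Big\{ m_0\, p(x_{i,r}) &+ \epsilon\, \frac{m_{21}}{2}\left( \Delta_S^H p(x_{i,r}) + E_1(\xi_i)\, p(x_{i,r}) \right) \\
&+ \delta\, \frac{m_{22}}{2}\left( \Delta_S^V p(x_{i,r}) + E_2 \cdot p(x_{i,r}) \right) + O(\epsilon^2+\delta^2) \Big\}.
\end{aligned}
\]

Before we turn to the computation of second moments, let us introduce another notation: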

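write the conditional probability density function on fibre $S_x$ as

\[
p(v \mid x) = \frac{p(x, v)}{p(x)} = \frac{p(x, v)}{p \circ \pi(x, v)} = \left[\frac{p}{p \circ \pi}\right](x, v),
\]

where $\pi : UTM \to M$ is the canonical projection from $UTM$ to $M$. This will help us avoid creating new notations for the base and fibre components of the coordinates of $x_{i,r}$, as needed in $p(v \mid x)$. Applying Lemma 35 once, we have

\[
\begin{aligned}
\mathbb{E}_2 F_j = \delta^{\frac{d-1}{2}} \bigg\{ & M_0\left(\frac{\|\xi_i-\xi_j\|^2}{\epsilon}\right) \left[\frac{fp}{p \circ \pi}\right]\left(P_{\xi_j,\xi_i} x_{i,r}\right) \\
& + \frac{\delta}{2}\, M_2\left(\frac{\|\xi_i-\xi_j\|^2}{\epsilon}\right) \left( \Delta_S^V \left[\frac{fp}{p \circ \pi}\right]\left(P_{\xi_j,\xi_i} x_{i,r}\right) + E_2 \cdot \left[\frac{fp}{p \circ \pi}\right]\left(P_{\xi_j,\xi_i} x_{i,r}\right) \right) + O(\delta^2) \bigg\},
\end{aligned}
\]

where $M_0(\cdot)$, $M_2(\cdot)$ are functions depending only on the kernel $K$, as in the proof of Lemma 33. For simplicity of notation, let us write them as $M_0$, $M_2$ for short. Now we square both sides of the equality above

\[
\begin{aligned}
(\mathbb{E}_2 F_j)^2 = \delta^{d-1} \bigg\{ & M_0^2 \left[\frac{fp}{p \circ \pi}\right]^2\left(P_{\xi_j,\xi_i} x_{i,r}\right) + \delta M_0 M_2 \left[\frac{fp}{p \circ \pi}\right]\left(P_{\xi_j,\xi_i} x_{i,r}\right) \\
& \times \left( \Delta_S^V \left[\frac{fp}{p \circ \pi}\right]\left(P_{\xi_j,\xi_i} x_{i,r}\right) + E_2 \cdot \left[\frac{fp}{p \circ \pi}\right]\left(P_{\xi_j,\xi_i} x_{i,r}\right) \right) + O(\delta^2) \bigg\},
\end{aligned}
\]

and apply Lemma 35 to get

\[
\begin{aligned}
\mathbb{E}_1 (\mathbb{E}_2 F_j)^2 = \epsilon^{\frac{d}{2}} \delta^{d-1} \bigg\{ & m_0' \left[\frac{(fp)^2}{p \circ \pi}\right](x_{i,r}) + \epsilon\, \frac{m_{21}'}{2}\left( \Delta_S^H \left[\frac{(fp)^2}{p \circ \pi}\right](x_{i,r}) + E_1(\xi_i) \left[\frac{(fp)^2}{p \circ \pi}\right](x_{i,r}) \right) \\
& + \delta\, m_{22}'\, [fp](x_{i,r}) \left( \Delta_S^V \left[\frac{fp}{p \circ \pi}\right](x_{i,r}) + E_2 \cdot \left[\frac{fp}{p \circ \pi}\right](x_{i,r}) \right) + O(\epsilon^2+\delta^2) \bigg\},
\end{aligned}
\]

where $m_0'$, $m_{21}'$, $m_{22}'$ are positive constants determined by the kernel function $K$ and dimension $d$:

\[
\begin{aligned}
m_0' &= \int_{B_1^d(0)} M_0^2\left(r^2\right) ds^1 \cdots ds^d, \qquad r^2 = \sum_{k=1}^{d} \left(s^k\right)^2, \\
m_{21}' &= \int_{B_1^d(0)} M_0^2\left(r^2\right) \left(s^1\right)^2 ds^1 \cdots ds^d, \\
m_{22}' &= \int_{B_1^d(0)} M_0\left(r^2\right) M_2\left(r^2\right) ds^1 \cdots ds^d.
\end{aligned}
\]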

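Setting $f \equiv 1$, we obtain a similar expansion for the second moment of $\mathbb{E}_2 G_j$

\[
\begin{aligned}
\mathbb{E}_1 (\mathbb{E}_2 G_j)^2 = \epsilon^{\frac{d}{2}} \delta^{d-1} \bigg\{ & m_0' \left[\frac{p^2}{p \circ \pi}\right](x_{i,r}) + \epsilon\, \frac{m_{21}'}{2}\left( \Delta_S^H \left[\frac{p^2}{p \circ \pi}\right](x_{i,r}) + E_1(\xi_i) \left[\frac{p^2}{p \circ \pi}\right](x_{i,r}) \right) \\
& + \delta\, m_{22}'\, p(x_{i,r}) \left( \Delta_S^V \left[\frac{p}{p \circ \pi}\right](x_{i,r}) + E_2 \cdot \left[\frac{p}{p \circ \pi}\right](x_{i,r}) \right) + O(\epsilon^2+\delta^2) \bigg\}.
\end{aligned}
\]

Similarly,

\[
\begin{aligned}
\mathbb{E}_1 \left[(\mathbb{E}_2 F_j)(\mathbb{E}_2 G_j)\right] = \epsilon^{\frac{d}{2}} \delta^{d-1} \bigg\{ & m_0' \left[\frac{fp^2}{p \circ \pi}\right](x_{i,r}) + \epsilon\, \frac{m_{21}'}{2}\left( \Delta_S^H \left[\frac{fp^2}{p \circ \pi}\right](x_{i,r}) + E_1(\xi_i) \left[\frac{fp^2}{p \circ \pi}\right](x_{i,r}) \right) \\
& + \delta\, \frac{m_{22}'}{2}\bigg( p(x_{i,r})\, \Delta_S^V \left[\frac{fp}{p \circ \pi}\right](x_{i,r}) + [fp](x_{i,r})\, \Delta_S^V \left[\frac{p}{p \circ \pi}\right](x_{i,r}) \\
& \qquad\qquad + 2 E_2 \cdot \left[\frac{fp^2}{p \circ \pi}\right](x_{i,r}) \bigg) + O(\epsilon^2+\delta^2) \bigg\}.
\end{aligned}
\]

It remains to plug all these moment expansions back into $\mathbb{E}_1 X_j^2$. Clearly, we are only interested in scenarios in which $\beta$ is sufficiently small, say $\beta = O(\epsilon^2+\delta^2)$, thus the $O(\beta)$ and $O(\beta^2)$ terms in $\mathbb{E}_1 X_j^2$ can be thrown into $O\left[\epsilon^{\frac{3d}{2}} \delta^{2(d-1)} (\epsilon^2+\delta^2)\right]$. Direct computation yields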

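\[
\begin{aligned}
\mathbb{E}_1 X_j^2 &= \mathbb{E}_1\left[(\mathbb{E}_2 F_j)^2\right] (\mathbb{E}_1\mathbb{E}_2 G)^2 + \mathbb{E}_1\left[(\mathbb{E}_2 G_j)^2\right] (\mathbb{E}_1\mathbb{E}_2 F)^2 \\
&\quad - 2\,\mathbb{E}_1\left[(\mathbb{E}_2 F_j)(\mathbb{E}_2 G_j)\right] (\mathbb{E}_1\mathbb{E}_2 F)(\mathbb{E}_1\mathbb{E}_2 G) + O\left[\epsilon^{\frac{3d}{2}} \delta^{2(d-1)} (\epsilon^2+\delta^2)\right] \\
&= \epsilon^{\frac{3d}{2}} \delta^{2(d-1)} \bigg\{ \epsilon\, \frac{m_0^2 m_{21}'}{2}\, p^2(x_{i,r}) \bigg( \Delta_S^H \left[\frac{(fp)^2}{p \circ \pi}\right](x_{i,r}) + f^2(x_{i,r})\, \Delta_S^H \left[\frac{p^2}{p \circ \pi}\right](x_{i,r}) \\
&\qquad\qquad\qquad - 2 f(x_{i,r})\, \Delta_S^H \left[\frac{fp^2}{p \circ \pi}\right](x_{i,r}) \bigg) + O(\epsilon^2+\delta^2) \bigg\} \\
&\overset{(*)}{=} \epsilon^{\frac{3d}{2}} \delta^{2(d-1)} \bigg\{ \epsilon\, \frac{m_0^2 m_{21}'}{2}\, p^2(x_{i,r}) \cdot 2 \left[\frac{p^2}{p \circ \pi}\right](x_{i,r}) \left\| \nabla_S^H f(x_{i,r}) \right\|^2 + O(\epsilon^2+\delta^2) \bigg\} \\
&= \epsilon^{\frac{3d}{2}} \delta^{2(d-1)} \bigg\{ \epsilon\, m_0^2 m_{21}' \left[\frac{p^4}{p \circ \pi}\right](x_{i,r}) \left\| \nabla_S^H f(x_{i,r}) \right\|^2 + O(\epsilon^2+\delta^2) \bigg\}.
\end{aligned}
\]

Note that at $(*)$ we used (denote $g = p^2/(p \circ \pi)$ for short)

\[
\begin{aligned}
&\Delta_S^H \left(f^2 g\right) + f^2 \Delta_S^H g - 2 f \Delta_S^H (fg) \\
&= \left(\Delta_S^H f^2\right) g + f^2 \Delta_S^H g + 2 \left\langle \nabla_S^H f^2, \nabla_S^H g \right\rangle + f^2 \Delta_S^H g - 2 f^2 \Delta_S^H g - 2 f g \Delta_S^H f - 4 f \left\langle \nabla_S^H f, \nabla_S^H g \right\rangle \\
&= \left[ \Delta_S^H f^2 - 2 f \Delta_S^H f \right] g + 2 \left\langle \nabla_S^H f^2 - 2 f \nabla_S^H f, \nabla_S^H g \right\rangle \\
&= 2 \left\| \nabla_S^H f \right\|^2 g.
\end{aligned}
\]

Therefore, we can bound $\mathbb{E}_1 X_j^2$ uniformly in $j$ as

\[
\mathbb{E}_1 X_j^2 \le \epsilon^{\frac{3d}{2}} \delta^{2(d-1)} \left( C' \epsilon + O(\epsilon^2+\delta^2) \right),
\]

where

\[
C' = \frac{m_0^2 m_{21}'\, p_M^4 \left\| \nabla_S^H f \right\|_{\infty}^2}{\omega_{d-1}\, p_m}
\]

is a positive constant. Interestingly, $O(\delta)$ terms do not show up in this bound. In hindsight, this makes sense because $X_j = \mathbb{E}_2 Y_{j,s}$ is already the expectation along the fibre direction, which intuitively “freezes” the variability controlled by the fibrewise bandwidth $\delta$.

It remains to bound

\[
\mathbb{E}_2^{\xi_j} Y_j^2 - \left(\mathbb{E}_2^{\xi_j} Y_j\right)^2
\]

for each $1 \le j \le N_B$. For $\beta = O(\epsilon^2+\delta^2)$, a bound for $|Y_{j,s}|$ can be found as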

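\[
\begin{aligned}
|Y_{j,s}| &= \left| F_{j,s}\, \mathbb{E}_1\mathbb{E}_2 G - G_{j,s}\, \mathbb{E}_1\mathbb{E}_2 F + \beta\left(\mathbb{E}_1\mathbb{E}_2 G - G_{j,s}\right) \mathbb{E}_1\mathbb{E}_2 G \right| \\
&= O\left(\epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \|K\|_{\infty} \|f\|_{\infty}\right) + O\left(\epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \|K\|_{\infty} \|f\|_{\infty}\right) + \beta \left( O\left(\epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \|K\|_{\infty}\right) + O\left(\epsilon^{d} \delta^{d-1}\right) \right) \\
&\le C \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}},
\end{aligned}
\]

where

\[
C = C\left(\|K\|_{\infty}, \|f\|_{\infty}, p_m, p_M\right)
\]

is a positive constant independent of $j$. Again taking advantage of $\beta = O(\epsilon^2+\delta^2)$, we have

\[
\begin{aligned}
\mathbb{E}_2 Y_j^2 &= \mathbb{E}_2 F_{j,s}^2\, (\mathbb{E}_1\mathbb{E}_2 G)^2 + \mathbb{E}_2 G_{j,s}^2\, (\mathbb{E}_1\mathbb{E}_2 F)^2 - 2\, \mathbb{E}_2 (F_{j,s} G_{j,s})\, (\mathbb{E}_1\mathbb{E}_2 F)(\mathbb{E}_1\mathbb{E}_2 G) \\
&\quad + O\left[\epsilon^{d} \delta^{2(d-1)} (\epsilon^2+\delta^2)\right], \\
(\mathbb{E}_2 Y_j)^2 &= \left[ (\mathbb{E}_2 F_j)\, \mathbb{E}_1\mathbb{E}_2 G - (\mathbb{E}_2 G_j)\, \mathbb{E}_1\mathbb{E}_2 F + \beta\left(\mathbb{E}_1\mathbb{E}_2 G - \mathbb{E}_2 G_j\right) \mathbb{E}_1\mathbb{E}_2 G \right]^2 \\
&= (\mathbb{E}_2 F_j)^2 (\mathbb{E}_1\mathbb{E}_2 G)^2 + (\mathbb{E}_2 G_j)^2 (\mathbb{E}_1\mathbb{E}_2 F)^2 - 2\, (\mathbb{E}_2 F_j)(\mathbb{E}_1\mathbb{E}_2 G)(\mathbb{E}_2 G_j)(\mathbb{E}_1\mathbb{E}_2 F) \\
&\quad + O\left[\epsilon^{d} \delta^{2(d-1)} (\epsilon^2+\delta^2)\right],
\end{aligned}
\]

thus

\[
\begin{aligned}
\mathbb{E}_2 Y_j^2 - (\mathbb{E}_2 Y_j)^2 &= \left[ \mathbb{E}_2 F_{j,s}^2 - (\mathbb{E}_2 F_j)^2 \right] (\mathbb{E}_1\mathbb{E}_2 G)^2 + \left[ \mathbb{E}_2 G_{j,s}^2 - (\mathbb{E}_2 G_j)^2 \right] (\mathbb{E}_1\mathbb{E}_2 F)^2 \\
&\quad + 2 \left[ (\mathbb{E}_2 F_j)(\mathbb{E}_2 G_j) - \mathbb{E}_2 (F_{j,s} G_{j,s}) \right] (\mathbb{E}_1\mathbb{E}_2 F)(\mathbb{E}_1\mathbb{E}_2 G) \\
&\quad + O\left[\epsilon^{d} \delta^{2(d-1)} (\epsilon^2+\delta^2)\right].
\end{aligned}
\]

Observe that

\[
\mathbb{E}_2 F_{j,s}^2 = O\left(\delta^{\frac{d-1}{2}}\right), \quad \mathbb{E}_2 G_{j,s}^2 = O\left(\delta^{\frac{d-1}{2}}\right), \quad \mathbb{E}_2 \left[F_{j,s} G_{j,s}\right] = O\left(\delta^{\frac{d-1}{2}}\right),
\]

while

\[
(\mathbb{E}_2 F_{j,s})^2 = O\left(\delta^{d-1}\right), \quad (\mathbb{E}_2 G_{j,s})^2 = O\left(\delta^{d-1}\right), \quad (\mathbb{E}_2 F_j)(\mathbb{E}_2 G_j) = O\left(\delta^{d-1}\right),
\]

thus the leading order terms in $\mathbb{E}_2 Y_j^2 - (\mathbb{E}_2 Y_j)^2$ are determined by

\[
\left(\mathbb{E}_2 F_{j,s}^2\right) (\mathbb{E}_1\mathbb{E}_2 G)^2 + \left(\mathbb{E}_2 G_{j,s}^2\right) (\mathbb{E}_1\mathbb{E}_2 F)^2 - 2\, \mathbb{E}_2 (F_{j,s} G_{j,s})\, (\mathbb{E}_1\mathbb{E}_2 F)(\mathbb{E}_1\mathbb{E}_2 G).
\]

Note that by Lemma 35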

2 2 2 2 }ξi ´ ξj} Pξj ,ξi xi,r ´ pξj, wq 2 E2Fj,s “ K , f pξj, wq p pw | ξjq dσξj pwq S  δ ξj ˜ › › ¸ ż › › 2 d´1 }ξi ´ ξj} 2 p “ δ 2 M f P x P x 0  ξj ,ξi i,r p ˝ π ξj ,ξi i,r " ˜ ¸ „  ` ˘ ` ˘ Ă δ }ξ ´ ξ }2 f 2p f 2p ` M i j ∆V P x ` E ¨ P x 2 2  S p ˝ π ξj ,ξi i,r 2 p ˝ π ξj ,ξi i,r ˜ ¸ „ „  „   ` ˘ ` ˘ Ă ` O δ2 , * ` ˘ where M0, M2 are constants depending on ξj but uniformly bounded over M:

2 2 Ă Ă }ξi ´ ξj} 2 }ξi ´ ξj} 2 1 d´1 M0 “ K , ρ dθ ¨ ¨ ¨ dθ  d´1  ˇ ˜ ¸ˇ ˇ B1 p0q ˜ ¸ ˇ ˇ ˇ ˇż ˇ ˇ ˇ ˇ ˇ ˇĂ ˇ ˇ 2 d´1 ˇ ˇ ˇ ďˇ }K}8 Vol B1 p0q , ˇ

2 ` ˘ 2 }ξi ´ ξj} 1 2 2 }ξi ´ ξj} 2 1 d´1 M2 “ θ K , ρ dθ ¨ ¨ ¨ dθ  d´1  ˇ ˜ ¸ˇ ˇ B1 p0q ˜ ¸ ˇ ˇ ˇ ˇż ˇ ˇ ˇ ˇ ` ˘ ˇ ˇĂ ˇ ˇ 2 d´1 ˇ ˇ ˇ ďˇ }K}8 Vol B1 p0q . ˇ ` ˘ For simplicity of notation, we shall denote these constants merely as M0, M2, drop- ping the dependency on ξ . The expansion for F 2 is thus j E2 j,s Ă Ă

2 2 d´1 f p δ f p 2 2 V E2Fj,s “ δ M0 Pξj ,ξi xi,r ` M2 ∆S Pξj ,ξi xi,r p ˝ π 2 p ˝ π " „  „ „  ` ˘ ` ˘ Ă Ăf 2p ` E ¨ P x ` O δ2 . 2 p ˝ π ξj ,ξi i,r „   * ` ˘ ` ˘ 151 Similarly,

d´1 p δ p 2 2 V E2Gj,s “ δ M0 Pξj ,ξi xi,r ` M2 ∆S Pξj ,ξi xi,r p ˝ π 2 p ˝ π " „  „ „  ` ˘ ` ˘ Ă Ăp ` E ¨ P x ` O δ2 , 2 p ˝ π ξj ,ξi i,r „   * ` ˘ ` ˘ d´1 fp δ fp 2 V E2 pFj,sGj,sq “ δ M0 Pξj ,ξi xi,r ` M2 ∆S Pξj ,ξi xi,r p ˝ π 2 p ˝ π " „  „ „  ` ˘ ` ˘ Ă Ăfp ` E ¨ P x ` O δ2 . 2 p ˝ π ξj ,ξi i,r „   * ` ˘ ` ˘ A direct computation yields

2 2 2 2 E2Fj,s pE1E2Gq ` E2Gj,s pE1E2F q ´ 2E2 pFj,sGj,sq pE1E2F q pE1E2Gq ` ˘ ` ˘ d 3pd´1q 2 2 p 2 “  δ 2 m M p px q P x f P x ´ f px q 0 0 i,r p ˝ π ξj ,ξi i,r ξj ,ξi i,r i,r " „  ` ˘ “ ` ˘ ‰ Ă p ` m m M p px q P x 0 21 0 i,r p ˝ π ξj ,ξi i,r „  ` ˘ Ă H 2 ˆ ∆S p pxi,rq ` E1 pξq p pxi,rq f Pξj ,ξi xi,r ´ f pxi,rq „ ` ˘ “ ` ˘ ‰ H H ´ ∆S rfps pxi,rq ´ f pxi,rq ∆S p pxi,rq f Pξj ,ξi xi,r ´ f pxi,rq  ` ˘ “ ` ˘ ‰ p 2 ` δm2M p2 px q P x ∇H f P x 0 2 i,r p ˝ π ξj ,ξi i,r S ξj ,ξi i,r „  ` ˘ › ` ˘› Ă p › › ` δm m M p px q P x 0 22 0 i,r p ˝ π ξj ,ξi i,r „  ` ˘ Ă V 2 ˆ ∆S p pxi,rq ` E2 ¨ p pxi,rq f Pξj ,ξi xi,r ´ f pxi,rq „ ` ˘ “ ` ˘ ‰ V V 2 2 ´ ∆S rfps pxi,rq ´ f pxi,rq ∆S p pxi,rq f Pξj ,ξi xi,r ´ f pxi,rq ` O  ` δ .  * ` ˘ “ ` ˘ ‰ ` ˘

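Recall from Lemma 29 that the difference between $P_{\xi_j,\xi_i} x_{i,r}$ and $x_{i,r}$ along the fibre direction is $O\left(d_M^2(\xi_j, \xi_i)\right) = O(\epsilon)$. Therefore, the distance (under the Sasaki metric) between $P_{\xi_j,\xi_i} x_{i,r}$ and $x_{i,r}$ in $UTM$ is bounded by the square root of the square sum of $d_M(\xi_j, \xi_i)$ and the difference between $P_{\xi_j,\xi_i} x_{i,r}$ and $x_{i,r}$ along the fibre direction, which is of order $O\left(\epsilon^{\frac{1}{2}}\right)$. As a result, all terms involving

\[
\left[ f\left(P_{\xi_j,\xi_i} x_{i,r}\right) - f(x_{i,r}) \right]
\]

are of order $O\left(\epsilon^{\frac{1}{2}}\right)$. Thus

\[
\left| \left(\mathbb{E}_2 F_{j,s}^2\right) (\mathbb{E}_1\mathbb{E}_2 G)^2 + \left(\mathbb{E}_2 G_{j,s}^2\right) (\mathbb{E}_1\mathbb{E}_2 F)^2 - 2\, \mathbb{E}_2 (F_{j,s} G_{j,s})\, (\mathbb{E}_1\mathbb{E}_2 F)(\mathbb{E}_1\mathbb{E}_2 G) \right|
\le \epsilon^{d} \delta^{\frac{3(d-1)}{2}} \left( C' \epsilon + C'' \delta \right), \quad C' > 0,\ C'' > 0.
\]

As a result,

\[
\mathbb{E}_2^{\xi_j} Y_j^2 - \left(\mathbb{E}_2^{\xi_j} Y_j\right)^2 = O\left( \epsilon^{d} \delta^{\frac{3(d-1)}{2}} (\epsilon + \delta) \right).
\]

We are now ready to apply Lemma 39 to $Y_{j,s}$. Since

\[
|\mathbb{E}_1\mathbb{E}_2 G| = O\left( \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \right),
\]

we have constants $C_1''$, $C_2''$ such that

\[
C_1'' \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \le |\mathbb{E}_1\mathbb{E}_2 G| \le C_2'' \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}}.
\]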

For any θ P p0, 1q to be fixed later,

1 p pN ,N , βq “ Y ą β p Gq2 B F P N N j,s E1E2 # B F j s + ÿ ÿ

1 2 2 4 NB p1 ´ θq N β p Gq 2 F E1E2 ď exp $´ , 2 d 1 ξj 2 ξj 2 d ´ 2 j“1 ’ Y ´ Y ` ¨ 2C 2 δ 2 p1 ´ θq β p Gq / ÿ & E2 j E2 j 3 E1E2 . „  ’ ´ ¯ / %’ 1 -/ θ2N β2 p Gq4 2 B E1E2 ` exp $´ , 2 2 d d´1 2 ’ 1X ` C 2 δ θβ p 1 2Gq / & E j 3 E E . %’ 153 -/ 1 p1 ´ θq2 N β2 pC2q4 2dδ2pd´1q 2 F 1 ď NB ¨ exp ´ $ 3pd´1q d 1 , d 4 d ´ 2 2 d d´1 ’ C˜ δ 2 p ` δq ` C 2 δ 2 p1 ´ θq β ¨ pC q  δ / & 3 2 .

%’ 1 4 -/ θ2N β2 pC2q 2dδ2pd´1q 2 B 1 ` exp $´ , . 1 2 2 3d 2pd´1q 2 d d´1 2 2 d d´1 ’ C  ` O  ` δ  2 δ ` C 2 δ θβ ¨ pC q  δ / & 3 2 . ` ` ˘˘ %’ -/ Again by restricting ourselves to β “ O p2 ` δ2q, this bound can be rewritten as

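\[
p(N_B, N_F, \beta) \le N_B \exp\left\{ -\frac{(1-\theta)^2 N_F\, \epsilon^{d} \delta^{\frac{d-1}{2}} \beta^2}{C_1 (\epsilon + \delta) + O\left( \epsilon^{\frac{d}{2}} (\epsilon^2+\delta^2) \right)} \right\} + \exp\left\{ -\frac{\theta^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2}{C_2 \epsilon + O(\epsilon^2+\delta^2)} \right\}.
\tag{B.0.34}
\]

As pointed out in Remark 40, the second term in this bound is the sampling error on the base manifold; the noise error resulting from this term is of the order

\[
O\left[ \left( N_B\, \epsilon^{\frac{d}{2}-1} \right)^{-\frac{1}{2}} \right] = O\left( N_B^{-\frac{1}{2}} \epsilon^{\frac{1}{2}-\frac{d}{4}} \right),
\]

which is in accordance with the convergence rate obtained in [134]. The first term in the bound reflects the accumulated fibrewise sampling error and grows linearly with respect to the number of fibres sampled, but can be reduced as one increases $N_F$ accordingly (which has the effect of reducing fibrewise sampling errors). The choice of $\theta$ is important: as $\theta$ increases from 0 to 1, the first term in the bound decreases but the second term increases. One may wish to pick an “optimal” $\theta \in (0,1)$, but this does not make sense unless one chooses $\epsilon$, $\delta$, $N_F$ appropriately so as to make the sum of the two terms smaller than 1. Let us consider $\theta_* \in (0,1)$ satisfying

\[
(1-\theta_*)^2 N_F\, \epsilon^{d} \delta^{\frac{d-1}{2}} = \theta_*^2 N_B\, \epsilon^{\frac{d}{2}},
\tag{B.0.35}
\]

or equivalently

\[
\frac{\theta_*}{1-\theta_*} = \epsilon^{\frac{d}{4}} \delta^{\frac{d-1}{4}} \sqrt{\frac{N_F}{N_B}}
\quad \Leftrightarrow \quad
\theta_* = \frac{\epsilon^{\frac{d}{4}} \delta^{\frac{d-1}{4}} \sqrt{\dfrac{N_F}{N_B}}}{1 + \epsilon^{\frac{d}{4}} \delta^{\frac{d-1}{4}} \sqrt{\dfrac{N_F}{N_B}}}.
\tag{B.0.36}
\]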
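The balancing choice of $\theta_*$ is easy to compute in practice. The sketch below reads (B.0.35) as $(1-\theta_*)^2 N_F\, \epsilon^{d} \delta^{(d-1)/2} = \theta_*^2 N_B\, \epsilon^{d/2}$ and (B.0.36) as its closed-form solution; this reading, the function names `theta_star` and `balance_gap`, and the sample parameter values are assumptions made for illustration.

```python
import math

def theta_star(n_base, n_fibre, eps, delta, d):
    """Closed-form balancing parameter, following the reading of (B.0.36)."""
    ratio = (eps ** (d / 4.0)) * (delta ** ((d - 1) / 4.0)) * math.sqrt(n_fibre / n_base)
    return ratio / (1.0 + ratio)

def balance_gap(theta, n_base, n_fibre, eps, delta, d):
    """Left side minus right side of the balance condition read from (B.0.35)."""
    lhs = (1.0 - theta) ** 2 * n_fibre * eps ** d * delta ** ((d - 1) / 2.0)
    rhs = theta ** 2 * n_base * eps ** (d / 2.0)
    return lhs - rhs

# Illustrative values only: a 2-dimensional base, 1000 fibres, 5000 points per fibre.
th = theta_star(n_base=1000, n_fibre=5000, eps=0.1, delta=0.1, d=2)
gap = balance_gap(th, n_base=1000, n_fibre=5000, eps=0.1, delta=0.1, d=2)

# Keeping n_fibre / n_base fixed keeps theta_star fixed, as in condition (B.0.38).
th_scaled = theta_star(n_base=10000, n_fibre=50000, eps=0.1, delta=0.1, d=2)
```

Since $\theta_*$ depends on $N_B$ and $N_F$ only through the ratio $N_F / N_B$, scaling both sample sizes together leaves it unchanged, which is why a fixed limiting ratio keeps $\theta_*$ away from 0 and 1.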

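Setting $\theta = \theta_*$ in (B.0.34), we have

\[
\begin{aligned}
p(N_B, N_F, \beta) &\le (N_B + 1) \exp\left\{ -\frac{\theta_*^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2}{C (\epsilon + \delta)} \right\} \\
&= \exp\left( -\frac{\theta_*^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2}{C (\epsilon + \delta)} + \log (N_B + 1) \right),
\end{aligned}
\tag{B.0.37}
\]

where $C$ is some positive constant. Since

\[
\lim_{N_B \to \infty} \frac{N_B}{\log N_B} = \infty,
\]

for any fixed $\epsilon$, $\delta$ we have $p(N_B, N_F, \beta) \to 0$ as $N_B \to \infty$, as long as one increases $N_F$ accordingly so as to prevent $\theta_*$ from approaching 0 or 1; for instance, this can be achieved by requiring

\[
\lim_{\substack{N_B \to \infty \\ N_F \to \infty}} \frac{N_F}{N_B} = \rho \in (0, \infty).
\tag{B.0.38}
\]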

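Under this condition, we have the pointwise convergence in probability of $\hat{H}^0_{\epsilon,\delta} f$.

We now turn to the general case $\alpha \ne 0$. Recall that

\[
\hat{H}^{\alpha}_{\epsilon,\delta} f(x_{i,r})
= \frac{\displaystyle\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} \hat{K}^{\alpha}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s})}{\displaystyle\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} \hat{K}^{\alpha}_{\epsilon,\delta}(x_{i,r}, x_{j,s})}
= \frac{\displaystyle\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} \frac{\hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s})}{\hat{p}^{\alpha}_{\epsilon,\delta}(x_{i,r})\, \hat{p}^{\alpha}_{\epsilon,\delta}(x_{j,s})}}{\displaystyle\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} \frac{\hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})}{\hat{p}^{\alpha}_{\epsilon,\delta}(x_{i,r})\, \hat{p}^{\alpha}_{\epsilon,\delta}(x_{j,s})}},
\]

where

\[
\hat{p}_{\epsilon,\delta}(x_{j,s}) = \sum_{k=1}^{N_B}\sum_{t=1}^{N_F} \hat{K}_{\epsilon,\delta}(x_{j,s}, x_{k,t}).
\]

As $N_B \to \infty$, $N_F \to \infty$, by the law of large numbers,

\[
\lim_{N_B \to \infty} \frac{1}{N_B} \lim_{N_F \to \infty} \frac{1}{N_F}\, \hat{p}_{\epsilon,\delta}(x_{j,s}) = \int_{UTM} \tilde{K}_{\epsilon,\delta}(x_{j,s}, \eta)\, p(\eta)\, d\Theta(\eta) = \tilde{p}_{\epsilon,\delta}(x_{j,s}) = \mathbb{E}_1\mathbb{E}_2 \left[ \tilde{K}_{\epsilon,\delta}(x_{j,s}, \cdot) \right].
\]

Therefore, as $N_B \to \infty$, $N_F \to \infty$, we expect $\hat{H}^{\alpha}_{\epsilon,\delta} f(x_{i,r})$ to converge to

\[
\begin{aligned}
\frac{\displaystyle\int_{UTM} \tilde{K}^{\alpha}_{\epsilon,\delta}(x_{i,r}, \eta)\, f(\eta)\, p(\eta)\, d\Theta(\eta)}{\displaystyle\int_{UTM} \tilde{K}^{\alpha}_{\epsilon,\delta}(x_{i,r}, \eta)\, p(\eta)\, d\Theta(\eta)}
&= \tilde{H}^{\alpha}_{\epsilon,\delta} f(x_{i,r}) \\
&= f(x_{i,r}) + \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta_S^H[fp^{1-\alpha}](x_{i,r})}{p^{1-\alpha}(x_{i,r})} - f(x_{i,r})\,\frac{\Delta_S^H p^{1-\alpha}(x_{i,r})}{p^{1-\alpha}(x_{i,r})}\right] \\
&\quad + \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta_S^V[fp^{1-\alpha}](x_{i,r})}{p^{1-\alpha}(x_{i,r})} - f(x_{i,r})\,\frac{\Delta_S^V p^{1-\alpha}(x_{i,r})}{p^{1-\alpha}(x_{i,r})}\right] + O(\epsilon^2+\delta^2),
\end{aligned}
\]

which gives the same bias error $O(\epsilon^2+\delta^2)$ as for the $\alpha = 0$ case.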

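Now it remains to estimate the variance error. Since our notation $\hat{p}_{\epsilon,\delta}$ differs from the standard kernel density estimator by a factor $\epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}}$, we shall compensate for it in the following computation. Write $\sum_{j,s}$ for $\sum_{j=1}^{N_B}\sum_{s=1}^{N_F}$. Then

\[
\hat{H}^{\alpha}_{\epsilon,\delta} f(x_{i,r})
= \frac{\displaystyle\sum_{j,s} \frac{\hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s})}{\hat{p}^{\alpha}_{\epsilon,\delta}(x_{i,r})\, \hat{p}^{\alpha}_{\epsilon,\delta}(x_{j,s})}}{\displaystyle\sum_{j,s} \frac{\hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})}{\hat{p}^{\alpha}_{\epsilon,\delta}(x_{i,r})\, \hat{p}^{\alpha}_{\epsilon,\delta}(x_{j,s})}}
= \frac{\displaystyle\sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, \hat{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s})\, f(x_{j,s})}{\displaystyle\sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, \hat{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s})},
\]

and

\[
\begin{aligned}
&\frac{\sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, \hat{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s})\, f(x_{j,s})}{\sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, \hat{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s})}
- \frac{\sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, \tilde{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s})\, f(x_{j,s})}{\sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, \tilde{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s})} \\
&= \frac{\sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s}) \left[ N_B^{\alpha} N_F^{\alpha}\, \hat{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s}) - \tilde{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s}) \right] f(x_{j,s})}{\sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, N_B^{\alpha} N_F^{\alpha}\, \hat{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s})} \\
&\quad + \Big( \sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, \tilde{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s})\, f(x_{j,s}) \Big) \times \left[ -\frac{\sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s}) \left[ N_B^{\alpha} N_F^{\alpha}\, \hat{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s}) - \tilde{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s}) \right]}{\Big( \sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, N_B^{\alpha} N_F^{\alpha}\, \hat{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s}) \Big) \Big( \sum_{j,s} \hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, \tilde{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s}) \Big)} \right] \\
&=: (A) + (B),
\end{aligned}
\]

thus if we can estimate $(A)$, $(B)$ by controlling the error

\[
\left[ N_B^{\alpha} N_F^{\alpha}\, \hat{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s}) - \tilde{p}^{-\alpha}_{\epsilon,\delta}(x_{j,s}) \right]
\]

then it suffices to estimate the variance error caused by

\[
\frac{\displaystyle\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} \frac{\hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s})}{\tilde{p}^{\alpha}_{\epsilon,\delta}(x_{i,r})\, \tilde{p}^{\alpha}_{\epsilon,\delta}(x_{j,s})}}{\displaystyle\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} \frac{\hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})}{\tilde{p}^{\alpha}_{\epsilon,\delta}(x_{i,r})\, \tilde{p}^{\alpha}_{\epsilon,\delta}(x_{j,s})}}.
\tag{B.0.39}
\]

Our previous proof for the special case $\alpha = 0$ can then be applied to (B.0.39): the only adjustment is to replace the kernel $\hat{K}_{\epsilon,\delta}(x, y)$ in that proof with the $\alpha$-normalized kernel

\[
\frac{\tilde{K}_{\epsilon,\delta}(x, y)}{\tilde{p}^{\alpha}_{\epsilon,\delta}(x)\, \tilde{p}^{\alpha}_{\epsilon,\delta}(y)}.
\]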

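We would like to estimate the tail probability

\[
\mathbb{P}\left\{ \frac{1}{N_B N_F}\, \hat{p}_{\epsilon,\delta}(x_{j,s}) - \tilde{p}_{\epsilon,\delta}(x_{j,s}) > \beta \right\},
\]

but since $\tilde{p}_{\epsilon,\delta}(x_{j,s}) = O\left( \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \right)$, it is not bounded away from 0 as $\epsilon, \delta \to 0$. For this reason, and noting that $(A)$ and $(B)$ are invariant if we replace $\hat{p}_{\epsilon,\delta}$, $\tilde{p}_{\epsilon,\delta}$ with $\epsilon^{-\frac{d}{2}} \delta^{-\frac{d-1}{2}} \hat{p}_{\epsilon,\delta}$, $\epsilon^{-\frac{d}{2}} \delta^{-\frac{d-1}{2}} \tilde{p}_{\epsilon,\delta}$, we estimate instead

\[
\begin{aligned}
q(N_B, N_F, \beta) &:= \mathbb{P}\left\{ \frac{1}{N_B N_F}\, \epsilon^{-\frac{d}{2}} \delta^{-\frac{d-1}{2}} \hat{p}_{\epsilon,\delta}(x_{j,s}) - \epsilon^{-\frac{d}{2}} \delta^{-\frac{d-1}{2}} \tilde{p}_{\epsilon,\delta}(x_{j,s}) > \beta \right\} \\
&= \mathbb{P}\left\{ \frac{1}{N_B N_F}\, \hat{p}_{\epsilon,\delta}(x_{j,s}) - \tilde{p}_{\epsilon,\delta}(x_{j,s}) > \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \beta \right\},
\end{aligned}
\]

where

\[
\hat{p}_{\epsilon,\delta}(x_{j,s}) = \sum_{k=1}^{N_B}\sum_{t=1}^{N_F} \hat{K}_{\epsilon,\delta}(x_{j,s}, x_{k,t}), \qquad
\tilde{p}_{\epsilon,\delta}(x_{j,s}) = \mathbb{E}_1\mathbb{E}_2 \left[ \tilde{K}_{\epsilon,\delta}(x_{j,s}, \cdot) \right].
\]

We would like to apply Lemma 39 again. For this purpose, first note that

\[
\left| \tilde{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s}) \right| \le \|K\|_{\infty}, \quad
\left| \mathbb{E}_2 \left[ \tilde{K}_{\epsilon,\delta}(x_{i,r}, \cdot) \right] \right| \le C \delta^{\frac{d-1}{2}}, \quad
\left| \mathbb{E}_1\mathbb{E}_2 \left[ \tilde{K}_{\epsilon,\delta}(x_{i,r}, \cdot) \right] \right| \le C \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}},
\]

where

\[
C = C\left(\|K\|_{\infty}, p_M, p_m, d\right)
\]

is some positive constant. Moreover, direct computation yields

\[
\mathbb{E}_2^{\xi_j} \left[ \tilde{K}^2_{\epsilon,\delta}(x_{i,r}, \cdot) \right] = O\left( \delta^{\frac{d-1}{2}} \right), \qquad
\mathbb{E}_1 \left[ \left( \mathbb{E}_2 \tilde{K}_{\epsilon,\delta}(x_{i,r}, \cdot) \right)^2 \right] = O\left( \epsilon^{\frac{d}{2}} \delta^{d-1} \right).
\]

Therefore, by Lemma 39, for $\beta = O(\epsilon^2+\delta^2)$,

\[
\begin{aligned}
q(N_B, N_F, \beta) &\le N_B \exp\left\{ -\frac{(1-\theta)^2 N_F\, \epsilon^{d} \delta^{d-1} \beta^2}{2 C_1 \delta^{\frac{d-1}{2}}} \right\} + \exp\left\{ -\frac{\theta^2 N_B\, \epsilon^{d} \delta^{d-1} \beta^2}{2 C_1 \epsilon^{\frac{d}{2}} \delta^{d-1}} \right\} \\
&= N_B \exp\left\{ -\frac{(1-\theta)^2 N_F\, \epsilon^{d} \delta^{\frac{d-1}{2}} \beta^2}{2 C_1} \right\} + \exp\left\{ -\frac{\theta^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2}{2 C_1} \right\},
\end{aligned}
\]

where $C_1 > 0$ is some constant. A simple union bound gives

\[
\begin{aligned}
&\mathbb{P}\left( \bigcup_{j,s} \left\{ \left| \frac{1}{N_B N_F}\, \hat{p}_{\epsilon,\delta}(x_{j,s}) - \tilde{p}_{\epsilon,\delta}(x_{j,s}) \right| > \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \beta \right\} \right) \\
&\le N_B N_F \left[ N_B \exp\left\{ -\frac{(1-\theta)^2 N_F\, \epsilon^{d} \delta^{\frac{d-1}{2}} \beta^2}{2 C_1} \right\} + \exp\left\{ -\frac{\theta^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2}{2 C_1} \right\} \right].
\end{aligned}
\]

If we let $\theta = \theta_*$, where $\theta_*$ is defined in (B.0.35),

\[
(1-\theta_*)^2 N_F = \frac{\theta_*^2 N_B}{\epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}}},
\]

and hence

\[
\mathbb{P}\left( \bigcup_{j,s} \left\{ \left| \frac{1}{N_B N_F}\, \hat{p}_{\epsilon,\delta}(x_{j,s}) - \tilde{p}_{\epsilon,\delta}(x_{j,s}) \right| > \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \beta \right\} \right)
\le N_B (N_B + 1) N_F \exp\left\{ -\frac{\theta_*^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2}{2 C_1} \right\}.
\tag{B.0.40}
\]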

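We are interested in seeing how this bound compares with the bound in (B.0.37). As $N_B, N_F \to \infty$, as long as (B.0.38) holds,

\[
\frac{N_B (N_B + 1) N_F \exp\left\{ -\dfrac{\theta_*^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2}{2 C_1} \right\}}{(N_B + 1) \exp\left\{ -\dfrac{\theta_*^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2}{C (\epsilon + \delta)} \right\}}
= N_B N_F \exp\left\{ -\theta_*^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2 \left[ \frac{1}{2 C_1} - \frac{1}{C (\epsilon + \delta)} \right] \right\} \longrightarrow \infty \quad \text{for small } \epsilon, \delta,
\]

thus the bound in (B.0.37) is asymptotically negligible compared to the bound in (B.0.40). This means that when $\alpha \ne 0$ the density estimation in general slows down the convergence rate by a factor $(\epsilon + \delta)^{\frac{1}{2}}$, which is consistent with the conclusion for standard diffusion maps on manifolds [133, 66], since $\hat{H}^{\alpha}_{\epsilon,\delta}$ is essentially the heat kernel of the diffusion process (instead of the graph hypoelliptic Laplacian itself). As a consequence of this observation, we know that with probability at least

\[
1 - N_B (N_B + 1) N_F \exp\left\{ -\frac{\theta_*^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2}{2 C_1} \right\}
\]

we have

\[
\left| \frac{\displaystyle\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} \frac{\hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})\, f(x_{j,s})}{\tilde{p}^{\alpha}_{\epsilon,\delta}(x_{i,r})\, \tilde{p}^{\alpha}_{\epsilon,\delta}(x_{j,s})}}{\displaystyle\sum_{j=1}^{N_B}\sum_{s=1}^{N_F} \frac{\hat{K}_{\epsilon,\delta}(x_{i,r}, x_{j,s})}{\tilde{p}^{\alpha}_{\epsilon,\delta}(x_{i,r})\, \tilde{p}^{\alpha}_{\epsilon,\delta}(x_{j,s})}} - \tilde{H}^{\alpha}_{\epsilon,\delta} f(x_{i,r}) \right| \le \beta
\]

as well as

\[
\left| \frac{1}{N_B N_F}\, \hat{p}_{\epsilon,\delta}(x_{j,s}) - \tilde{p}_{\epsilon,\delta}(x_{j,s}) \right| \le \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \beta \quad \text{for all } 1 \le j \le N_B,\ 1 \le s \le N_F.
\]

Note that by our assumption

\[
0 < p_m \le p(x, v) \le p_M < \infty \quad \text{for all } (x, v) \in UTM
\]

there exist constants $C_1$, $C_2$ such that

\[
0 < C_1 < \epsilon^{-\frac{d}{2}} \delta^{-\frac{d-1}{2}}\, \tilde{p}_{\epsilon,\delta}(x_{j,s}) < C_2 < \infty.
\]

For sufficiently small $\beta$, these bounds also apply to $N_B^{-1} N_F^{-1} \hat{p}_{\epsilon,\delta}(x_{j,s})$ with high probability:

\[
0 < C_1 < \frac{1}{N_B N_F}\, \epsilon^{-\frac{d}{2}} \delta^{-\frac{d-1}{2}}\, \hat{p}_{\epsilon,\delta}(x_{j,s}) < C_2 < \infty.
\]

More specifically, we have

\[
\begin{aligned}
0 < \frac{1}{C_2\, \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}}} &< N_B N_F\, \hat{p}^{-1}_{\epsilon,\delta}(x_{j,s}) < \frac{1}{C_1\, \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}}} < \infty, \\
0 < \frac{1}{C_2\, \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}}} &< \tilde{p}^{-1}_{\epsilon,\delta}(x_{j,s}) < \frac{1}{C_1\, \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}}} < \infty,
\end{aligned}
\]

and

\[
\left| N_B N_F\, \hat{p}^{-1}_{\epsilon,\delta}(x_{j,s}) - \tilde{p}^{-1}_{\epsilon,\delta}(x_{j,s}) \right| \le \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}} \beta \cdot \frac{1}{C_1^2\, \epsilon^{d} \delta^{d-1}} = \frac{\beta}{C_1^2\, \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}}}.
\]

The errors $(A)$, $(B)$ can thus be bounded as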

α´1 α´1 αd αpd´1q 2 β 2 αC2 }f} α 2 2 8 |pAq| ď C2  δ }f}8 ¨ α d d´1 d d´1 “ 2 β, 2 2 2 2 2 C ˆC2 δ ˙ C1  δ 1 C2ααdδαpd´1q 2 α´1 β 2α´1αCα`1 }f} |pBq| ď 2 }f} ¨ α “ 2 8 β. α αd αpd´1q 8 d d´1 2 d d´1 α`2 2 2 C  2 δ 2 C  2 δ 2 C C1  δ ˆ 2 ˙ 1 1

Since $C_1$, $C_2$ only depend on the kernel function $K$, the dimension $d$, and $p_m$, $p_M$, these bounds ensure that

\[
\left| \hat{H}^{\alpha}_{\epsilon,\delta} f(x_{i,r}) - \tilde{H}^{\alpha}_{\epsilon,\delta} f(x_{i,r}) \right| < C \beta
\]

with probability at least

\[
1 - N_B (N_B + 1) N_F \exp\left\{ -\frac{\theta_*^2 N_B\, \epsilon^{\frac{d}{2}} \beta^2}{2 C_1} \right\},
\]

where the constants $C$, $C_1$ only depend on the kernel function $K$, the dimension $d$, and $p_m$, $p_M$. This establishes the conclusion for all $\alpha \in [0, 1]$.
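The algebraic identity used at $(*)$ in the proof above, $\Delta(f^2 g) + f^2 \Delta g - 2 f \Delta(fg) = 2 \|\nabla f\|^2 g$, relies only on the product rule $\Delta(uv) = u \Delta v + v \Delta u + 2 \langle \nabla u, \nabla v \rangle$. As a sanity check, the sketch below verifies it with exact integer arithmetic for polynomials in one variable, using the ordinary second derivative as a stand-in for $\Delta_S^H$. The stand-in operator, the helper names, and the test polynomials are assumptions made for illustration.

```python
def poly_add(a, b):
    """Add two polynomials given as coefficient lists (lowest degree first)."""
    n = max(len(a), len(b))
    return [(a[k] if k < len(a) else 0) + (b[k] if k < len(b) else 0)
            for k in range(n)]

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_diff(a):
    """First derivative of a polynomial."""
    return [k * a[k] for k in range(1, len(a))] or [0]

def laplacian(a):
    """Second derivative, the one-dimensional flat Laplacian."""
    return poly_diff(poly_diff(a))

def trim(a):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

f = [1, -2, 0, 3]   # f(x) = 1 - 2x + 3x^3
g = [2, 1, 5]       # g(x) = 2 + x + 5x^2
f2 = poly_mul(f, f)

# Left side: Lap(f^2 g) + f^2 Lap(g) - 2 f Lap(f g).
lhs = poly_add(
    poly_add(laplacian(poly_mul(f2, g)), poly_mul(f2, laplacian(g))),
    [-2 * c for c in poly_mul(f, laplacian(poly_mul(f, g)))],
)
# Right side: 2 (f')^2 g.
df = poly_diff(f)
rhs = [2 * c for c in poly_mul(poly_mul(df, df), g)]

identity_holds = trim(lhs) == trim(rhs)
```

The check is exact because all coefficients are integers; it does not depend on the particular choice of `f` and `g`.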

Sampling from Empirical Tangent Spaces

The following two lemmas from [133] provide estimates for the approximation error in the local PCA step. We adapted these lemmas to our notation; note that the statements are more compact than their original form since we assume M is closed.

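Lemma 41. Suppose $K_{\mathrm{PCA}} \in C^2([0, 1])$. If $\epsilon_{\mathrm{PCA}} = O\left( N_B^{-\frac{2}{d+2}} \right)$, then, with high probability, the columns of the $D \times d$ matrix $O_i$ determined by local PCA form an orthonormal basis to a $d$-dimensional subspace of $\mathbb{R}^D$ that deviates from $\iota_* T_{x_i} M$ by $O\left( \epsilon_{\mathrm{PCA}}^{\frac{3}{2}} \right)$, in the following sense:

\[
\min_{O \in O(d)} \left\| O_i^{\top} \Theta_i - O \right\|_{HS} = O\left( \epsilon_{\mathrm{PCA}}^{\frac{3}{2}} \right) = O\left( N_B^{-\frac{3}{d+2}} \right),
\tag{B.0.41}
\]

where $\Theta_i$ is a $D \times d$ matrix whose columns form an orthonormal basis to $\iota_* T_{x_i} M$. Let the minimizer of (B.0.41) be

\[
\hat{O}_i = \operatorname*{arg\,min}_{O \in O(d)} \left\| O_i^{\top} \Theta_i - O \right\|_{F},
\tag{B.0.42}
\]

and denote by $Q_i$ the $D \times d$ matrix

\[
Q_i := \Theta_i \hat{O}_i^{\top}.
\tag{B.0.43}
\]

The columns of $Q_i$ form an orthonormal basis to $\iota_* T_{x_i} M$, and

\[
\left\| O_i - Q_i \right\|_{F} = O\left( \epsilon_{\mathrm{PCA}} \right),
\tag{B.0.44}
\]

where $\|\cdot\|_{F}$ is the matrix Frobenius norm.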

Proof. See [133, Lemma B.1].
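The minimization over $O(d)$ in (B.0.42) is an orthogonal Procrustes problem. For $d = 2$ it has an elementary closed form, which the following sketch implements in plain Python; the input `a` plays the role of the $2 \times 2$ matrix $O_i^{\top} \Theta_i$. The function name `nearest_orthogonal_2x2` and the test matrix are hypothetical, and for general $d$ one would typically use a singular value decomposition instead, which is not shown here.

```python
import math

def nearest_orthogonal_2x2(a):
    """Return the O in O(2) minimizing the Frobenius norm of a - O.

    Minimizing ||a - O||_F over orthogonal O is the same as maximizing
    trace(O^T a). Rotations and reflections are treated separately and the
    better of the two candidates is returned.
    """
    (a11, a12), (a21, a22) = a
    rot_gain = math.hypot(a11 + a22, a21 - a12)  # best trace over rotations
    ref_gain = math.hypot(a11 - a22, a12 + a21)  # best trace over reflections
    if rot_gain >= ref_gain:
        t = math.atan2(a21 - a12, a11 + a22)
        c, s = math.cos(t), math.sin(t)
        return [[c, -s], [s, c]]
    t = math.atan2(a12 + a21, a11 - a22)
    c, s = math.cos(t), math.sin(t)
    return [[c, s], [s, -c]]

def frobenius_distance(a, b):
    """Frobenius norm of a - b for 2x2 matrices stored as nested lists."""
    return math.sqrt(sum((a[i][k] - b[i][k]) ** 2
                         for i in range(2) for k in range(2)))

# A rotation by 0.3 radians, slightly perturbed so that it is no longer orthogonal.
angle = 0.3
a = [[math.cos(angle) + 0.01, -math.sin(angle)],
     [math.sin(angle), math.cos(angle) - 0.02]]
o_hat = nearest_orthogonal_2x2(a)
distance = frobenius_distance(a, o_hat)
```

For a nearly orthogonal input, the returned matrix is the orthogonal matrix closest to it, and `distance` measures the deviation in the sense of (B.0.41).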

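Lemma 42. Consider points $x_i, x_j \in M$ such that the geodesic distance between them is $O\left( \epsilon^{\frac{1}{2}} \right)$. For $\epsilon_{\mathrm{PCA}} = O\left( N_B^{-\frac{2}{d+2}} \right)$, with high probability, $O_{ij}$ approximates $P_{x_i,x_j}$ in the following sense:

\[
O_{ij} \bar{X}_j = \left( \left\langle \iota_* P_{x_i,x_j} X(x_j), u_l(x_i) \right\rangle \right)_{l=1}^{d} + O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right), \quad \text{for all } X \in \Gamma(M, TM),
\tag{B.0.45}
\]

where $\{u_l(x_i)\}_{l=1}^{d}$ is an orthonormal set determined by local PCA, and

\[
\bar{X}_i \equiv \left( \left\langle \iota_* X(x_i), u_l(x_i) \right\rangle \right)_{l=1}^{d} \in \mathbb{R}^d.
\]

Proof. See [133, Theorem B.2].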

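Proof of Theorem 23. By Definition 22(2),

\[
O_{ji}\, c_{i,r} = \frac{O_{ji} B_i^{\top} \tau_{i,r}}{\left\| B_i^{\top} \tau_{i,r} \right\|}.
\]

By Lemma 42,

\[
O_{ji} B_i^{\top} \tau_{i,r} = B_j^{\top} \left( P_{\xi_j,\xi_i} \tau_{i,r} \right) + O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right),
\]

thus

\[
\frac{O_{ji} B_i^{\top} \tau_{i,r}}{\left\| B_i^{\top} \tau_{i,r} \right\|} = \frac{B_j^{\top} \left( P_{\xi_j,\xi_i} \tau_{i,r} \right)}{\left\| B_i^{\top} \tau_{i,r} \right\|} + O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right),
\]

where we used

\[
\left| \left\| B_i^{\top} \tau_{i,r} \right\|_F - 1 \right| = \left| \left\| B_i^{\top} \tau_{i,r} \right\|_F - \left\| Q_i^{\top} \tau_{i,r} \right\|_F \right| \le \left\| B_i^{\top} \tau_{i,r} - Q_i^{\top} \tau_{i,r} \right\|_F \le \left\| B_i^{\top} - Q_i^{\top} \right\|_F = O\left( \epsilon_{\mathrm{PCA}} \right)
\]

and

\[
\left\| B_j^{\top} \left( P_{\xi_j,\xi_i} \tau_{i,r} \right) \right\| \le \left\| P_{\xi_j,\xi_i} \tau_{i,r} \right\| = 1.
\]

Therefore,

\[
\begin{aligned}
O_{ji}\, c_{i,r} - c_{j,s} &= \frac{O_{ji} B_i^{\top} \tau_{i,r}}{\left\| B_i^{\top} \tau_{i,r} \right\|} - \frac{B_j^{\top} \tau_{j,s}}{\left\| B_j^{\top} \tau_{j,s} \right\|} \\
&= \frac{B_j^{\top} \left( P_{\xi_j,\xi_i} \tau_{i,r} \right)}{\left\| B_i^{\top} \tau_{i,r} \right\|} + O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right) - \frac{B_j^{\top} \tau_{j,s}}{\left\| B_j^{\top} \tau_{j,s} \right\|} \\
&= B_j^{\top} \left( P_{\xi_j,\xi_i} \tau_{i,r} \right) - B_j^{\top} \tau_{j,s} + O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right) \\
&= P_{\xi_j,\xi_i} \tau_{i,r} - \tau_{j,s} + O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right),
\end{aligned}
\]

and

\[
\left| \left\| O_{ji}\, c_{i,r} - c_{j,s} \right\|^2 - \left\| P_{\xi_j,\xi_i} \tau_{i,r} - \tau_{j,s} \right\|^2 \right| \le 4 \left\| \left( O_{ji}\, c_{i,r} - c_{j,s} \right) - \left( P_{\xi_j,\xi_i} \tau_{i,r} - \tau_{j,s} \right) \right\| = O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right).
\]

Thus a Taylor expansion for $K$ at the point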

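\[
\left( \frac{\|\xi_i - \xi_j\|^2}{\epsilon}, \frac{\left\| O_{ji}\, c_{i,r} - c_{j,s} \right\|^2}{\delta} \right)
\]

gives

\[
\begin{aligned}
K\left( \frac{\|\xi_i - \xi_j\|^2}{\epsilon}, \frac{\left\| O_{ji}\, c_{i,r} - c_{j,s} \right\|^2}{\delta} \right)
&= K\left( \frac{\|\xi_i - \xi_j\|^2}{\epsilon}, \frac{\left\| P_{\xi_j,\xi_i} \tau_{i,r} - \tau_{j,s} \right\|^2 + O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right)}{\delta} \right) \\
&= K\left( \frac{\|\xi_i - \xi_j\|^2}{\epsilon}, \frac{\left\| P_{\xi_j,\xi_i} \tau_{i,r} - \tau_{j,s} \right\|^2}{\delta} \right) \\
&\quad + \partial_2 K\left( \frac{\|\xi_i - \xi_j\|^2}{\epsilon}, \frac{\left\| P_{\xi_j,\xi_i} \tau_{i,r} - \tau_{j,s} \right\|^2}{\delta} \right) \cdot \frac{O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right)}{\delta}.
\end{aligned}
\]

For any function $g \in C^{\infty}(UTM)$, this leads to

\[
\begin{aligned}
\int_{UTM} K_{\epsilon,\delta}(\tau_{i,r}, \eta)\, g(\eta)\, d\Theta(\eta)
&= \int_{UTM} \tilde{K}_{\epsilon,\delta}(\tau_{i,r}, \eta)\, g(\eta)\, d\Theta(\eta) \\
&\quad + \frac{O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right)}{\delta} \int_M \int_{S_y} \partial_2 K\left( \frac{\|\xi_i - y\|^2}{\epsilon}, \frac{\left\| P_{y,\xi_i} \tau_{i,r} - w \right\|^2}{\delta} \right) g(y, w)\, d\sigma_y(w)\, d\mathrm{vol}_M(y) \\
&= \int_{UTM} \tilde{K}_{\epsilon,\delta}(\tau_{i,r}, \eta)\, g(\eta)\, d\Theta(\eta) + \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}-1}\, O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right).
\end{aligned}
\]

Following the notation used in the proof of Theorem 21, by the law of large numbers

\[
\begin{aligned}
\lim_{N_B \to \infty} \lim_{N_F \to \infty} \frac{1}{N_B N_F}\, \hat{q}_{\epsilon,\delta}(\tau_{i,r}) &= \mathbb{E}_1\mathbb{E}_2 \left[ K_{\epsilon,\delta}(\tau_{i,r}, \cdot) \right] \\
&= \mathbb{E}_1\mathbb{E}_2 \left[ \tilde{K}_{\epsilon,\delta}(\tau_{i,r}, \cdot) \right] + \epsilon^{\frac{d}{2}} \delta^{\frac{d-1}{2}-1}\, O\left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right),
\end{aligned}
\]

and hence we expect $H^{\alpha}_{\epsilon,\delta} f(\tau_{i,r})$ to converge to

\[
\begin{aligned}
&\tilde{H}^{\alpha}_{\epsilon,\delta} f(\tau_{i,r}) + O\left( \delta^{-1} \left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right) \right) \\
&= f(\tau_{i,r}) + \epsilon\,\frac{m_{21}}{2m_0}\left[\frac{\Delta_S^H[fp^{1-\alpha}](\tau_{i,r})}{p^{1-\alpha}(\tau_{i,r})} - f(\tau_{i,r})\,\frac{\Delta_S^H p^{1-\alpha}(\tau_{i,r})}{p^{1-\alpha}(\tau_{i,r})}\right] \\
&\quad + \delta\,\frac{m_{22}}{2m_0}\left[\frac{\Delta_S^V[fp^{1-\alpha}](\tau_{i,r})}{p^{1-\alpha}(\tau_{i,r})} - f(\tau_{i,r})\,\frac{\Delta_S^V p^{1-\alpha}(\tau_{i,r})}{p^{1-\alpha}(\tau_{i,r})}\right] \\
&\quad + O(\epsilon^2+\delta^2) + O\left( \delta^{-1} \left( \epsilon_{\mathrm{PCA}}^{\frac{1}{2}} + \epsilon^{\frac{3}{2}} \right) \right).
\end{aligned}
\]

In fact, noting that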

1 3 2 2 ´1 2 2 ` O  ` δ ` O δ PCA `  . ` ˘ ´ ´ ¯¯ In fact, noting that

1 1 NF NB d d´1 qˆ,δ pτ i,rq “ d d´1 K,δ pτ i,r, τ j,sq  2 δ 2 NBNF  2 δ 2 NBNF j“1 s“1 ÿ ÿ

1 3 ´1 2 1 O δ PCA `  2 “ K pτ , τ q ` d d´1 ,δ i,r j,s ´ ´d d´1 ¯¯  2 δ 2 NBNF  2 δ 2

1 3 ´1 2 1 O δ PCA `  2 “ pˆ pτ , τ q ` , d d´1 ,δ i,r j,s ´ ´d d´1 ¯¯  2 δ 2 NBNF  2 δ 2 we have

αd αpd´1q 2α 2α α  δ NB NF K,δ pτ i,r, τ j,sq

K,δ pτ i,r, τ j,sq “ 1 α 1 α d d´1 qˆ,δ pτ i,rq d d´1 qˆ,δ pτ j,sq ˆ 2 δ 2 NBNF ˙ ˆ 2 δ 2 NBNF ˙

1 3 ´1 2 2 K,δ pτ i,r, τ j,sq ` O δ PCA `  “ α α 1 1 ´ ´ ¯¯ 1 3 ´1 2 2 d d´1 pˆ,δ pτ i,rq d d´1 pˆ,δ pτ j,sq ` O δ PCA `   2 δ 2 NBNF  2 δ 2 NBNF ˆ ˙ ˆ ˙ ´ ´ ¯¯ 1 3 αd αpd´1q 2α 2α α ´1 2 2 “  δ NB NF K,δ pτ i,r, τ j,sq ` O δ PCA `  . ´ ´ ¯¯ 165 α ˆ α Consequently, H,δf pτ i,rq is approximately H,δf pτ i,rq:

NB NF α K,δ pτ i,r, τ j,sq f pτ j,sq α j“1 s“1 H,δf pτ i,rq “ ÿ ÿ NB NF α K,δ pτ i,r, τ j,sq j“1 s“1 ÿ ÿ

NB NF 1 αd αpd´1q 2α 2α α  δ N N K pτ i,r, τ j,sq f pτ j,sq N N B F ,δ B F j“1 s“1 “ ÿ ÿ NB NF 1 αd αpd´1q 2α 2α α  δ N N K pτ i,r, τ j,sq N N B F ,δ B F j“1 s“1 ÿ ÿ

NB NF 1 3 1 αd αpd´1q 2α 2α α ´1 2  δ N N K˜ pτ , τ q f pτ q ` O δ  `  2 N N B F ,δ i,r j,s j,s PCA B F j“1 s“1 “ ÿ ÿ ´ ´ ¯¯ NB NF 1 3 1 αd αpd´1q 2α 2α α ´1 2  δ N N K˜ pτ , τ q ` O δ  `  2 N N B F ,δ i,r j,s PCA B F j“1 s“1 ÿ ÿ ´ ´ ¯¯ 1 NB NF αdδαpd´1qN 2αN 2αK˜ α pτ , τ q f pτ q N N B F ,δ i,r j,s j,s B F j“1 s“1 1 3 ´1 2 2 “ ÿ ÿ ` O δ PCA `  1 NB NF αdδαpd´1qN 2αN 2αK˜ α pτ , τ q ´ ´ ¯¯ N N B F ,δ i,r j,s B F j“1 s“1 ÿ ÿ

NB NF ˜ α K,δ pτ i,r, τ j,sq f pτ j,sq j“1 s“1 1 3 ´1 2 2 “ ÿ ÿ ` O δ PCA `  NB NF ˜ α K,δ pτ i,r, τ j,sq ´ ´ ¯¯ j“1 s“1 ÿ ÿ

1 3 ˆ α ´1 2 2 “ H,δf pτ i,rq ` O δ PCA `  . ´ ´ ¯¯ Therefore, under the assumption that

1 3 ´1 2 2 δ PCA `  ÝÑ 0 as  Ñ 0, ´ ¯ we can apply Theorem 21. This completes the whole proof.
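The two mechanical ingredients of the argument above can be checked numerically. The following is a minimal sketch, not the implementation used in this thesis: it assumes a Gaussian kernel on random points in R^3 as a stand-in for the HDM kernel on the unit tangent bundle, and the function name `alpha_normalized_average`, the bandwidth, and the sample data are all hypothetical. It illustrates (i) that multiplying every kernel entry by a common factor leaves the alpha-normalized averaging operator unchanged, which is why the rescaling factors can be inserted in both numerator and denominator, and (ii) that a small entrywise perturbation of the kernel, playing the role of the local PCA error, changes the output only slightly.

```python
import numpy as np

def alpha_normalized_average(K, f, alpha):
    """Apply the alpha-normalized averaging operator built from a kernel matrix K."""
    q = K.sum(axis=1)                               # q_hat(tau_i) = sum_j K(tau_i, tau_j)
    K_alpha = K / np.outer(q ** alpha, q ** alpha)  # K^alpha(i, j) = K(i, j) / (q_i^alpha q_j^alpha)
    return (K_alpha @ f) / K_alpha.sum(axis=1)      # weighted sum divided by total weight

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3))                         # hypothetical sample points
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dist / 0.5)                          # Gaussian kernel, bandwidth 0.5 (assumed)
f = rng.normal(size=n)                              # hypothetical test function values

Hf = alpha_normalized_average(K, f, alpha=1.0)

# (i) A common rescaling of the kernel cancels in the ratio.
Hf_rescaled = alpha_normalized_average(3.7 * K, f, alpha=1.0)

# (ii) A small entrywise kernel error yields a small error in the output.
K_perturbed = K + 1e-10 * rng.random((n, n))
Hf_perturbed = alpha_normalized_average(K_perturbed, f, alpha=1.0)

print(np.max(np.abs(Hf - Hf_rescaled)))
print(np.max(np.abs(Hf - Hf_perturbed)))
```

Both printed discrepancies should be tiny compared with the size of `f`. The sketch says nothing about the rates in the theorem; it only mirrors the algebraic steps of the derivation.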

Bibliography

[1] Najma Ahmad. The Geometry of Shape Recognition via the Monge-Kantorovich Optimal Transport Problem. PhD thesis, Brown University, 2004.

[2] Najma Ahmad, Hwa Kil Kim, and Robert J McCann. Optimal Transportation, Topology and Uniqueness. Bulletin of Mathematical Sciences, 1(1):13–32, 2011.

[3] Reema Al-Aifari, Ingrid Daubechies, and Yaron Lipman. Continuous Procrustes Distance Between Two Surfaces. Communications on Pure and Applied Mathematics, 66(6):934–964, 2013.

[4] Noga Alon, Richard M Karp, David Peleg, and Douglas West. A Graph-Theoretic Game and its Application to the k-Server Problem. SIAM Journal on Computing, 24(1):78–100, 1995.

[5] W. Ambrose and I. M. Singer. A Theorem on Holonomy. Transactions of the American Mathematical Society, 75(3):428–443, 1953.

[6] Afonso S Bandeira, Christopher Kennedy, and Amit Singer. Approximating the Little Grothendieck Problem over the Orthogonal Group. arXiv preprint arXiv:1308.5207, 2013.

[7] Afonso S. Bandeira, Amit Singer, and Daniel A. Spielman. A Cheeger Inequality for the Graph Connection Laplacian. SIAM Journal on Matrix Analysis and Applications, 34(4):1611–1630, 2013.

[8] Yair Bartal. Probabilistic Approximation of Metric Spaces and its Algorithmic Applications. In Foundations of Computer Science, 1996. Proceedings., 37th Annual Symposium on, pages 184–193. IEEE, 1996.

[9] Yair Bartal. On approximating arbitrary metrices by tree metrics. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 161–168. ACM, 1998.

[10] Fabrice Baudoin. Sub-Laplacians and Hypoelliptic Operators on Totally Geodesic Riemannian Foliations. arXiv preprint arXiv:1410.3268, 2014.

[11] Mikhail Belkin and Partha Niyogi. Semi-Supervised Learning on Riemannian Manifolds. Machine Learning, 56(1-3):209–239, 2004.

[12] Mikhail Belkin and Partha Niyogi. Towards a Theoretical Foundation for Laplacian-Based Manifold Methods. In Learning Theory, pages 486–500. Springer, 2005.

[13] Mikhail Belkin and Partha Niyogi. Convergence of Laplacian Eigenmaps. Advances in Neural Information Processing Systems, 19:129, 2007.

[14] P. Bérard, G. Besson, and S. Gallot. Embedding Riemannian Manifolds by Their Heat Kernel. Geometric & Functional Analysis GAFA, 4(4):373–398, 1994.

[15] Tyrus Berry and John Harlim. Variable Bandwidth Diffusion Kernels. Applied and Computational Harmonic Analysis, ?(?):?–??, 2015.

[16] Arthur L Besse. Einstein manifolds. Springer, 2007.

[17] Garrett Birkhoff. Lattice Theory, volume 25 of Colloquium Publications. American Mathematical Society, revised edition, 1948.

[18] Jean-Michel Bismut. Hypoelliptic Laplacian and Bott-Chern Cohomology: A Theorem of Riemann-Roch-Grothendieck in Complex Geometry, volume 305 of Progress in Mathematics. Birkhäuser Basel, 2013.

[19] Jean-Michel Bismut and Jeff Cheeger. η-Invariants and Their Adiabatic Limits. Journal of the American Mathematical Society, 2(1):33–70, 1989.

[20] Eric Boeckx. A Case for Curvature: the Unit Tangent Bundle. In Complex, Contact and Symmetric Manifolds, pages 15–26. Springer, 2005.

[21] Eric Boeckx and Lieven Vanhecke. Characteristic Reflections on Unit Tangent Sphere Bundles. Houston Journal of Mathematics, 23(3):427–448, 1997.

[22] F. L. Bookstein. Principal Warps: Thin-Plate Splines and the Decomposition of Deformations. IEEE Trans. Pattern Anal. Mach. Intell., 11(6):567–585, June 1989.

[23] Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel Density Estimation via Diffusion. Ann. Statist., 38(5):2916–2957, 10 2010.

[24] Raoul Bott and Loring W Tu. Differential Forms in Algebraic Topology, volume 82 of Graduate Texts in Mathematics. Springer-Verlag New York, 1982.

[25] Doug M. Boyer, Yaron Lipman, Elizabeth St. Clair, Jesus Puente, Biren A. Patel, Thomas Funkhouser, Jukka Jernvall, and Ingrid Daubechies. Algorithms to Automatically Quantify the Geometric Similarity of Anatomical Surfaces. Proceedings of the National Academy of Sciences, 108(45):18221–18226, 2011.

[26] Yann Brenier. Polar Factorization and Monotone Rearrangement of Vector-Valued Functions. Communications on Pure and Applied Mathematics, 44(4):375–417, 1991.

[27] Alexander M Bronstein, Michael M Bronstein, and Ron Kimmel. Numerical Geometry of Non-Rigid Shapes. Springer Science & Business Media, 2008.

[28] Robert L Bryant. Geometry of Manifolds with Special Holonomy: “100 Years of Holonomy”. Contemporary Mathematics, 395:29–38, 2006.

[29] Luis A Caffarelli. Boundary Regularity of Maps with Convex Potentials. Communications on Pure and Applied Mathematics, 45(9):1141–1151, 1992.

[30] Luis A Caffarelli. The Regularity of Mappings with a Convex Potential. Journal of the American Mathematical Society, pages 99–104, 1992.

[31] Luis A Caffarelli. Boundary Regularity of Maps with Convex Potentials–II. Annals of Mathematics, pages 453–496, 1996.

[32] Moses Charikar, Chandra Chekuri, Ashish Goel, Sudipto Guha, and Serge Plotkin. Approximating a Finite Metric by a Small Number of Tree Metrics. In Foundations of Computer Science, 1998. Proceedings. 39th Annual Symposium on, pages 379–388. IEEE, 1998.

[33] Guillaume Charpiat. Learning Shape Metrics based on Deformations and Transport. In Second Workshop on Non-Rigid Shape Analysis and Deformable Image Alignment, Kyoto, Japan, September 2009.

[34] Kunal N Chaudhury, Yuehaw Khoo, Amit Singer, and David Cowburn. Global Registration of Multiple Point Clouds using Semidefinite Programming. arXiv preprint arXiv:1306.5226, 2013.

[35] Isaac Chavel. Riemannian Geometry: a Modern Introduction. Number 98 in Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2nd edition, 2006.

[36] Yuxin Chen, Leonidas Guibas, and Qixing Huang. Near-Optimal Joint Object Matching via Convex Relaxation. In Tony Jebara and Eric P. Xing, editors, Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 100–108. JMLR Workshop and Conference Proceedings, 2014.

[37] Fan R.K. Chung. Spectral Graph Theory. Number 92 in CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.

[38] Fan RK Chung and Linyuan Lu. Complex Graphs and Networks, volume 107. American Mathematical Society Providence, 2006.

[39] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. Geometric Diffusions as a Tool for Harmonic Analysis and Structure Definition of Data: Diffusion Maps. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7426–7431, 2005.

[40] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. Geometric Diffusions as a Tool for Harmonic Analysis and Structure Definition of Data: Multiscale Methods. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7432–7437, 2005.

[41] Ronald R. Coifman and Stéphane Lafon. Diffusion Maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006. Special Issue: Diffusion Maps and Wavelets.

[42] Ronald R. Coifman and Mauro Maggioni. Diffusion Wavelets. Applied and Computational Harmonic Analysis, 21(1):53–94, 2006. Special Issue: Diffusion Maps and Wavelets.

[43] Keenan Crane, Mathieu Desbrun, and Peter Schröder. Trivial Connections on Discrete Surfaces. Computer Graphics Forum (SGP), 29(5):1525–1533, 2010.

[44] Manfredo P Do Carmo. Riemannian Geometry. Springer, 1992.

[45] Peter Dombrowski. On the Geometry of the Tangent Bundle. Journal für die reine und angewandte Mathematik, 210:73–88, 1962.

[46] David L. Donoho and Carrie Grimes. Hessian Eigenmaps: Locally Linear Embedding Techniques for High-Dimensional Data. Proceedings of the National Academy of Sciences, 100(10):5591–5596, 2003.

[47] Ian L Dryden and Kanti V Mardia. Statistical Shape Analysis, volume 4. John Wiley & Sons New York, 1998.

[48] IL Dryden and KV Mardia. Multivariate Shape Analysis. Sankhyā: The Indian Journal of Statistics, Series A, pages 460–480, 1993.

[49] Richard Durrett. Stochastic Calculus: A Practical Introduction, volume 6. CRC press, 1996.

[50] Charles Ehresmann. Les Connexions Infinitésimales dans un Espace Fibré Différentiable. Séminaire Bourbaki, 1:153–168, 1950.

[51] K David Elworthy, Yves Le Jan, and Xue-Mei Li. The Geometry of Filtering. Springer, 2010.

[52] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A Tight Bound on Approximating Arbitrary Metrics by Tree Metrics. In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pages 448–455. ACM, 2003.

[53] David Spotts Fry. Shape Recognition Using Metrics on the Space of Shapes. PhD thesis, Harvard University, Cambridge, MA, USA, 1993. UMI Order No. GAX94-12337.

[54] Wilfrid Gangbo and Robert J. McCann. Shape Recognition via Wasserstein Distance. Quarterly of Applied Mathematics, 58(4):705–737, 2000.

[55] Wilfrid Gangbo and Robert J. McCann. The Geometry of Optimal Transportation. Acta Mathematica, 177(2):113–161, 1996.

[56] Tingran Gao, Gabriel Yapuncich, Doug M. Boyer, Ingrid C Daubechies, Roi Poranne, and Yaron Lipman. Advances and Automated Methods for Morphometrical Comparisons of Teeth. Duke preprint, 2015.

[57] Deboshmita Ghosh, Andrei Sharf, and Nina Amenta. Feature-Driven Deformation for Dense Correspondence. In SPIE Medical Imaging, pages 726136–726136. International Society for Optics and Photonics, 2009.

[58] William M Goldman. Two Papers which Changed My Life: Milnor’s Seminal Work on Flat Manifolds and Flat Bundles. arXiv preprint arXiv:1108.0216, 2011.

[59] Colin Goodall. Procrustes Methods in the Statistical Analysis of Shape. Journal of the Royal Statistical Society. Series B (Methodological), pages 285–339, 1991.

[60] John C Gower and Garmt B Dijksterhuis. Procrustes problems, volume 3 of Oxford Statistical Science Series. Oxford University Press Oxford, 2004.

[61] Phillip Griffiths and Joseph Harris. Principles of Algebraic Geometry, volume 52 of Pure and Applied Mathematics. John Wiley & Sons, 2011.

[62] Mikhaïl Gromov. Geometric Group Theory, Vol. 2: Asymptotic Invariants of Infinite Groups. Bull. Amer. Math. Soc, 33:0273–0979, 1996.

[63] David T Guarrera, Niles G Johnson, and Homer F Wolfe. The Taylor Expansion of a Riemannian Metric. preprint, 2002.

[64] Sigmundur Gudmundsson and Elias Kappos. On the Geometry of Tangent Bundles. Expositiones Mathematicae, 20(1):1–41, 2002.

[65] Behrend Heeren, Martin Rumpf, Peter Schröder, Max Wardetzky, and Benedikt Wirth. Exploring the Geometry of the Space of Shells. Computer Graphics Forum, 33(5), 2014. to appear.

[66] Matthias Hein, Jean-Yves Audibert, and Ulrike Von Luxburg. Graph Laplacians and Their Convergence on Random Neighborhood Graphs. Journal of Machine Learning Research, 8:1325–1368, 2007.

[67] Kevin Hestir and Stanley C Williams. Supports of Doubly Stochastic Measures. Bernoulli, pages 217–243, 1995.

[68] Lars Hörmander. Hypoelliptic Second Order Differential Equations. Acta Mathematica, 119(1):147–171, 1967.

[69] Qi-Xing Huang, Simon Flöry, Natasha Gelfand, Michael Hofer, and Helmut Pottmann. Reassembling Fractured Objects by Geometric Matching. ACM Trans. Graph., 25(3):569–578, July 2006.

[70] Qixing Huang and Leonidas Guibas. Consistent Shape Maps via Semidefinite Programming. Computer Graphics Forum, Proc. Eurographics Symposium on Geometry Processing (SGP), 32(5):177–186, 2013.

[71] Qixing Huang, Guoxin Zhang, Lin Gao, Shimin Hu, Adrian Butscher, and Leonidas Guibas. An Optimization Approach for Extracting and Encoding Consistent Maps in a Shape Collection. ACM Transactions on Graphics, 31:125:1–125:11, 2012.

[72] Daniel Huber. Automatic Three-dimensional Modeling from Reality. PhD thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, December 2002.

[73] Miao Jin, Wei Zeng, Feng Luo, and Xianfeng Gu. Computing Teichmüller Shape Space. IEEE Trans. Vis. Comput. Graph., 15(3):504–517, 2009.

[74] Peter W. Jones, Mauro Maggioni, and Raanan Schul. Manifold Parametrizations by Eigenfunctions of the Laplacian and Heat Kernels. Proceedings of the National Academy of Sciences, 105(6):1803–1808, 2008.

[75] Noureddine El Karoui and Hau-Tieng Wu. Vector Diffusion Maps and Random Matrices with Random Blocks, 2013.

[76] Noureddine El Karoui and Hau-Tieng Wu. Graph Connection Laplacian and Random Matrices with Random Blocks, 2014.

[77] Richard M Karp. A 2k-Competitive Algorithm for the Circle. Karp’s Random Embedding of a Circle into Trees, 1989.

[78] David G Kendall. Shape Manifolds, Procrustean Metrics, and Complex Projective Spaces. Bulletin of the London Mathematical Society, 16(2):81–121, 1984.

[79] David G. Kendall. A Survey of the Statistical Theory of Shape. Statistical Science, 4(2):87–99, 05 1989.

[80] John T Kent. The Complex Bingham Distribution and Shape Analysis. Journal of the Royal Statistical Society. Series B (Methodological), pages 285–299, 1994.

[81] Samir Khuller, Balaji Raghavachari, and Neal Young. Balancing Minimum Spanning Trees and Shortest-Path Trees. Algorithmica, 14(4):305–321, 1995.

[82] Martin Kilian, Niloy J. Mitra, and Helmut Pottmann. Geometric Modeling in Shape Space. ACM Trans. Graph., 26(3), July 2007.

[83] Vladimir G. Kim, Wilmot Li, Niloy J. Mitra, Stephen DiVerdi, and Thomas Funkhouser. Exploring Collections of 3D Models Using Fuzzy Correspondences. ACM Trans. Graph., 31(4):54:1–54:11, July 2012.

[84] YH Kim and RJ McCann. Continuity, Curvature, and the General Covariance of Optimal Transportation. J. Eur. Math. Soc., 432(12):1009–1040, 2010.

[85] Ron Kimmel and James A Sethian. Computing geodesic paths on manifolds. Proceedings of the National Academy of Sciences, 95(15):8431–8435, 1998.

[86] Stephane Lafon and Ann B Lee. Diffusion Maps and Coarse-Graining: A Unified Framework for Dimensionality Reduction, Graph Partitioning, and Data Set Parameterization. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 28(9):1393–1403, 2006.

[87] Stéphane S Lafon. Diffusion Maps and Geometric Harmonics. PhD thesis, Yale University, 2004.

[88] Jacques LaFontaine, M Katz, Mikhail Gromov, Sean Michael Bates, P Pansu, Pierre Pansu, and S Semmes. Metric Structures for Riemannian and Non-Riemannian Spaces. Springer, 2007.

[89] Y. Lipman and I. Daubechies. Conformal Wasserstein Distances: Comparing Surfaces in Polynomial Time. Advances in Mathematics, 227(3):1047–1077, 2011.

[90] Yaron Lipman, Xiaobai Chen, Ingrid Daubechies, and Thomas Funkhouser. Symmetry Factored Embedding and Distance. ACM Transactions on Graphics (SIGGRAPH 2010), July 2010.

[91] Yaron Lipman, J. Puente, and Ingrid Daubechies. Conformal Wasserstein Distance: II. Computational Aspects and Extensions. Math. Comput., 82(281), 2013.

[92] Anna V Little, Mauro Maggioni, and Lorenzo Rosasco. Multiscale geometric methods for estimating intrinsic dimension. Proc. SampTA, 2011.

[93] R. McCann, B. Pass, and M. Warren. Rectifiability of Optimal Transportation Plans. Canad. J. Math, 64:924–934, 2012.

[94] Robert J. McCann. Existence and Uniqueness of Monotone Measure-Preserving Maps. Duke Math. J., 80(2):309–323, 11 1995.

[95] Facundo Mémoli. On the use of Gromov-Hausdorff Distances for Shape Comparison. In M. Botsch, R. Pajarola, B. Chen, and M. Zwicker, editors, Eurographics Symposium on Point-Based Graphics. The Eurographics Association, 2007.

[96] Facundo Mémoli. Gromov-Hausdorff Distances in Euclidean Spaces. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW’08. IEEE Computer Society Conference on, pages 1–8. IEEE, 2008.

[97] Facundo Mémoli and Guillermo Sapiro. A Theoretical and Computational Framework for Isometry Invariant Recognition of Point Cloud Data. Foundations of Computational Mathematics, 5(3):313–347, 2005.

[98] Mario Micheli. The Differential Geometry of Landmark Shape Manifolds: Metrics, Geodesics, and Curvature. PhD thesis, Brown University, Providence, RI, USA, 2008. AAI3335682.

[99] Peter W. Michor and David Mumford. Riemannian Geometries on Spaces of Plane Curves. J. Eur. Math. Soc., 8(1):1–48, 2006.

[100] Peter W Michor and David Mumford. An Overview of the Riemannian Metrics on Spaces of Curves using the Hamiltonian Approach. Applied and Computational Harmonic Analysis, 23(1):74–113, 2007.

[101] John Milnor. On the Existence of a Connection with Curvature Zero. Commentarii Mathematici Helvetici, 32(1):215–223, 1958.

[102] Philipp Mitteroecker and Philipp Gunz. Advances in Geometric Morphometrics. Evolutionary Biology, 36(2):235–247, 2009.

[103] Carsten Moenning and Neil A Dodgson. Fast marching farthest point sampling. In Proc. EUROGRAPHICS 2003, 2003.

[104] Richard Montgomery. A Tour of Subriemannian Geometries, Their Geodesics and Applications. Number 91 in Mathematical Surveys and Monographs. American Mathematical Soc., 2006.

[105] David Mumford. The Geometry and Curvature of Shape Spaces. In Umberto Zannier, editor, Colloquium De Giorgi 2009, volume 3 of Colloquia, pages 43–53. Scuola Normale Superiore, 2012.

[106] E. Musso and F. Tricerri. Riemannian Metrics on Tangent Bundles. Annali di Matematica Pura ed Applicata, 150(1):1–19, 1988.

[107] Assaf Naor, Oded Regev, and Thomas Vidick. Efficient Rounding for the Noncommutative Grothendieck Inequality. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 71–80. ACM, 2013.

[108] Arkadi Nemirovski. Sums of Random Symmetric Matrices and Quadratic Optimization under Orthogonality Constraints. Mathematical Programming, 109(2-3):283–317, 2007.

[109] Andy Nguyen, Mirela Ben-Chen, Katarzyna Welnicka, Yinyu Ye, and Leonidas Guibas. An Optimization Approach to Improving Collections of Shape Maps. Computer Graphics Forum, 30(5):1481–1491, 2011.

[110] P.W. Nowak and G. Yu. Large Scale Geometry. EMS Textbooks in Mathematics. European Mathematical Society, 2012.

[111] Paul O’Higgins and Nicholas Jones. Facial growth in Cercocebus torquatus: an application of three-dimensional geometric morphometric techniques to the study of morphological variation. Journal of Anatomy, 193(2):251–272, 1998.

[112] M. Ovsjanikov, Q. Mérigot, V. Patraucean, and L. Guibas. Shape Matching via Quotient Spaces. In Proc. Eurographics Symposium on Geometry Processing (SGP), 2013.

[113] Maks Ovsjanikov, Mirela Ben-Chen, Justin Solomon, Adrian Butscher, and Leonidas Guibas. Functional Maps: A Flexible Representation of Maps Between Shapes. ACM Transactions on Graphics, 31(4), 2012.

[114] Brendan Pass. On the Local Structure of Optimal Measures in the Multi-Marginal Optimal Transportation Problem. Calculus of Variations and Partial Differential Equations, 43(3-4):529–536, 2012.

[115] Peter Petersen. Riemannian Geometry, volume 171 of Graduate Texts in Mathematics. Springer Science & Business Media, 2006.

[116] Roger Phillips, Paul O’Higgins, Fred Bookstein, Bill Green, Helgi Gunnarson, Youssef Shady, Vincent Dalge, Ramy Gowigati, and Oualid Ben Ali. EVAN (European Virtual Anthropology Network) toolbox, 2010. Media of output: executable file, opensource code.

[117] Ulrich Pinkall and Konrad Polthier. Computing Discrete Minimal Surfaces and Their Conjugates. Experimental mathematics, 2(1):15–36, 1993.

[118] P David Polly and Norman MacLeod. Locomotion in fossil Carnivora: an application of eigensurface analysis for morphometric comparison of 3D surfaces. Palaeontologia Electronica, 11(2):10–13, 2008.

[119] Jesús Puente. Distances and Algorithms to Compare Sets of Shapes for Automated Biological Morphometrics. PhD thesis, Princeton University, 2013.

[120] Jesús Puente, Douglas M Boyer, Justin T Gladman, and Ingrid C Daubechies. Automated Approaches to Geometric Morphometrics. American Journal of Physical Anthropology, 150:226, 2013.

[121] Yuri Rabinovich and Ran Raz. Lower Bounds on the Distortion of Embedding Finite Metric Spaces in Graphs. Discrete & Computational Geometry, 19(1):79–94, 1998.

[122] Anand Rangarajan, Haili Chui, and Fred L Bookstein. The Softassign Procrustes Matching Algorithm. In Information Processing in Medical Imaging, pages 29–42. Springer, 1997.

[123] B. Riemann and William Kingdon Clifford. On the Hypotheses which lie at the Bases of Geometry. Nature, 8:14–17, 36, 37, 1873.

[124] Bernhard Riemann. Über die Hypothesen, welche der Geometrie zu Grunde liegen. In Gaußsche Flächentheorie, Riemannsche Räume und Minkowski-Welt, pages 67–83. Springer, 1984.

[125] Ludovic Rifford. Sub-Riemannian Geometry and Optimal Transport. Springer Briefs in Mathematics. Springer, 2014.

[126] Steven Rosenberg. The Laplacian on a Riemannian Manifold: an introduction to analysis on manifolds. Number 31 in London Mathematical Society Student Texts. Cambridge University Press, 1997.

[127] Sam T. Roweis and Lawrence K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290(5500):2323–2326, 2000.

[128] Shigeo Sasaki. On the Differential Geometry of Tangent Bundles of Riemannian Manifolds. Tohoku Math. J. (2), 10(3):338–354, 1958.

[129] Shigeo Sasaki. On the Differential Geometry of Tangent Bundles of Riemannian Manifolds, II. Tohoku Math. J. (2), 14(2):146–155, 1962.

[130] Yoshihiro Shikata. On a Distance Function Between Differentiable Structures. Nagoya Mathematical Journal, 56:53–60, 1975.

[131] Peter Shirley, Michael Ashikhmin, and Steve Marschner. Fundamentals of computer graphics. CRC Press, 2009.

[132] Oana Sidi, Oliver van Kaick, Yanir Kleiman, Hao Zhang, and Daniel Cohen-Or. Unsupervised Co-segmentation of a Set of Shapes via Descriptor-space Spectral Clustering. ACM Trans. Graph., 30(6):126:1–126:10, December 2011.

[133] A. Singer and H.-T. Wu. Vector Diffusion Maps and the Connection Laplacian. Communications on Pure and Applied Mathematics, 65(8):1067–1144, 2012.

[134] Amit Singer. From Graph to Manifold Laplacian: The Convergence Rate. Applied and Computational Harmonic Analysis, 21(1):128–134, 2006.

[135] Amit Singer and Hau-Tieng Wu. Orientability and Diffusion Maps. Applied and Computational Harmonic Analysis, 31(1):44–58, 2011.

[136] Amit Singer and Hau-Tieng Wu. Spectral Convergence of the Connection Laplacian from Random Samples. arXiv preprint arXiv:1306.1587, 2013.

[137] Oleg G Smolyanov, Heinrich v Weizsäcker, and Olaf Wittich. Chernoff’s Theorem and Discrete Time Approximations of Brownian Motion on Manifolds. Potential Analysis, 26(1):1–29, 2007.

[138] Anthony Man-Cho So. Moment Inequalities for Sums of Random Matrices and Their Applications in Optimization. Mathematical Programming, 130(1):125–151, 2011.

[139] Justin Solomon, Mirela Ben-Chen, Adrian Butscher, and Leonidas Guibas. Discovery of Intrinsic Primitives on Triangle Meshes. In Computer Graphics Forum, volume 30, pages 365–374. Wiley Online Library, 2011.

[140] Justin Solomon, Leonidas Guibas, and Adrian Butscher. Dirichlet Energy for Analysis and Synthesis of Soft Maps. In Proc. Eurographics Symposium on Geometry Processing (SGP), 2013.

[141] Justin Solomon, Andy Nguyen, Adrian Butscher, Mirela Ben-Chen, and Leonidas Guibas. Soft Maps Between Surfaces. In Proc. SGP 2012, 2012.

[142] Norman Earl Steenrod. The Topology of Fibre Bundles, volume 14. Princeton University Press, 1951.

[143] Elias M Stein. Topics in Harmonic Analysis, related to the Littlewood-Paley theory. Number 63 in Annals of Mathematical Studies. Princeton University Press, 1970.

[144] Daniel W Stroock. An Introduction to the Analysis of Paths on a Riemannian Manifold, volume 74 of Mathematical Surveys and Monographs. American Mathematical Soc., 2005.

[145] Clifford Taubes. Differential Geometry: Bundles, Connections, Metrics and Curvature, volume 23. Oxford University Press, 2011.

[146] Michael Eugene Taylor. Noncommutative Harmonic Analysis, volume 22. American Mathematical Soc., 1990.

[147] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290(5500):2319–2323, 2000.

[148] Katharine Turner, Sayan Mukherjee, and Doug M. Boyer. Persistent Homology Transform for Modeling Shapes and Surfaces. Information and Inference, 2014.

[149] Miguel Velez-Reyes and Fred A. Kruse, editors. Schroedinger Eigenmaps with Nondiagonal Potentials for Spatial-Spectral Clustering of Hyperspectral Imagery, volume 9088, 2014.

[150] Cédric Villani. Topics in Optimal Transportation. Graduate Studies in Mathematics. American Mathematical Society, 2003.

[151] Cédric Villani. Optimal Transport: Old and New. Grundlehren der mathematischen Wissenschaften. Springer, 2009 edition, November 2008.

[152] Fan Wang, Qixing Huang, and Leonidas J Guibas. Image Co-Segmentation via Consistent Functional Maps. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 849–856. IEEE, 2013.

[153] Shinzo Watanabe and Nobuyuki Ikeda. Stochastic Differential Equations and Diffusion Processes. Elsevier, 1981.

[154] David F Wiley, Nina Amenta, Dan A Alcantara, Deboshmita Ghosh, Yong Joo Kil, Eric Delson, Will Harcourt-Smith, F James Rohlf, K St John, and Bernd Hamann. Evolutionary Morphing. In Visualization, 2005. VIS 05. IEEE, pages 431–438. IEEE, 2005.

[155] Benedikt Wirth, Leah Bar, Martin Rumpf, and Guillermo Sapiro. A Continuum Mechanical Approach to Geodesics in Shape Space. International Journal of Computer Vision, 93(3):293–318, 2011.

[156] Hau-Tieng Wu. Embedding Riemannian Manifolds by the Heat Kernel of the Connection Laplacian. arXiv preprint arXiv:1305.4232, 2013.

[157] Ziheng Yang. Computational Molecular Evolution, volume 21. Oxford University Press Oxford, 2006.

[158] Laurent Younes. Shapes and Diffeomorphisms, volume 171. Springer, 2010.

[159] M. L. Zelditch, D. L. Swiderski, D. H. Sheets, and W. L. Fink. Geometric Morphometrics for Biologists. San Diego: Elsevier Academic Press, 2004.

[160] Zhenyue Zhang and Hongyuan Zha. Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment. SIAM J. Sci. Comput., 26(1):313–338, January 2005.

[161] Xiaojin Zhu. Semi-Supervised Learning Literature Survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison, 2005.

Biography

Tingran Gao was born in January 1988 in Wenzhou, China. He received a B.S. in Mathematics in July 2010 from Tsinghua University in Beijing, China. In 2010, he arrived at Duke University and became a graduate student in the Department of Mathematics, under the supervision of Professor Ingrid Daubechies. During his studies at Duke, he wrote his Ph.D. thesis at the intersection of high-dimensional data analysis, machine learning, and applied harmonic analysis; the extensive software engineering and algorithm design experience involved in his research also earned him an M.S. degree in Computer Science. He accepted a one-year appointment from Duke University right after his graduation, and will become a Visiting Assistant Professor for the Academic Year 2015-2016.