Geometry of Diffeomorphism Groups and Shape Matching

Imperial College London Department of

Martins Bruveris

May 31, 2012

Supervised by Prof. Darryl D. Holm

Submitted in part fulfilment of the requirements for the degree of Doctor of Philosophy in Mathematics of Imperial College London and the Diploma of Imperial College London

Declaration

I herewith certify that all material in this dissertation which is not my own work has been duly acknowledged. Selected results from this dissertation have been disseminated in scientific publications as detailed in Section 1.4.

Martins Bruveris

Abstract

The large deformation matching (LDM) framework is a method for registration of images and other data structures, used in computational anatomy. We show how to reformulate the large deformation matching framework for registration in a geometric way. The general framework also allows us to generalize the large deformation matching framework to include multiple scales by using the iterated semidirect product of groups.

An important ingredient in the LDM framework is the choice of a suitable Riemannian metric on the space of diffeomorphisms. Since the space in question is infinite-dimensional, not every choice of the metric is suitable. In particular the geodesic distance, which is defined as the infimum over the length of all paths connecting two points, may vanish. For the family of Sobolev-type $H^s$-metrics on the diffeomorphism groups of $\mathbb{R}$ and $S^1$ we establish that the geodesic distance vanishes for metrics of order $0 \le s \le \tfrac{1}{2}$. The geodesic distance also vanishes for the $L^2$-metric on the Virasoro-Bott group, which is a central extension of the diffeomorphism group of the circle. Vanishing of geodesic distance implies that the length functional, which assigns to each curve in the manifold its length, has no global minima when restricted to paths with fixed endpoints. We show that for the $L^2$-metric on the diffeomorphism group of $\mathbb{R}$ and on the Virasoro-Bott group the length functional does not have any local minima either.

The large deformation matching framework is not the only approach to registration and shape comparison. For curves and surfaces it is possible to define a Riemannian metric directly on the space of curves or surfaces and use geodesics with respect to this metric to measure differences in shape. We use the family of Sobolev-type metrics on surfaces from [7]. We show how to discretize the geodesic equations and solve the boundary value problem via a shooting method on the initial velocity. The discrete equations are implemented via the finite element method.

To my family.

Acknowledgments

Over the last three years I had the fortune to meet many people to whom I owe my gratitude and without whom this work would not have been possible. First I want to thank my adviser Darryl Holm, who gave me the opportunity to study in London and guided me on my first steps in the world of research. I also want to thank Peter Michor for his valuable advice, both technical and personal, and for always making me feel welcome in Vienna. It is due to my colleagues in London, David Ellis, Laurent Risser, Sehun Chun, François-Xavier Vialard, David Meier, Chris Cantwell and Christopher Burnett, that I always felt like part of a team and never had to look far for someone to drink coffee with. It was a pleasure to collaborate with my friends in Vienna, Martin Bauer and Philipp Harms. My thanks also go to Colin Cotter, François Gay-Balmaz and Tudor Ratiu for interesting discussions and valuable advice. And finally I want to thank all the members of the ShapeFRG meetings, who made working in this field like being part of a large family.

Contents

Abstract

Acknowledgments

1 Introduction
    1.1 Diffeomorphism Group and Applications
    1.2 Content of this Work
    1.3 Contributions of this Work
    1.4 Publications

2 Multiscale Registration
    2.1 Geometry of Registration
        2.1.1 Motivation
        2.1.2 Abstract Framework
    2.2 Registration using Diff(Ω)
        2.2.1 The Setting
        2.2.2 Matching Problems
        2.2.3 Landmark Matching
        2.2.4 Matching
        2.2.5 Vector Field Matching
    2.3 Multiscale Registration
        2.3.1 Semidirect Products
        2.3.2 Semidirect Products of Diffeomorphism Groups
        2.3.3 Sums of Kernels
        2.3.4 The Order Reversed
        2.3.5 A Continuum of Scales
        2.3.6 Restriction to a Finite Number of Scales

3 Geodesic Distance on Diffeomorphism and Related Groups
    3.1 Overview of the Results
    3.2 Mathematical Background
        3.2.1 Diffeomorphism Groups
        3.2.2 Sobolev Spaces on Manifolds
        3.2.3 Sobolev Metrics on $\mathrm{Diff}_c(M)$
        3.2.4 Virasoro-Bott Group
    3.3 Diffeomorphism Groups
    3.4 Virasoro-Bott Group
    3.5 Local Minima of the Length Functional
    3.6 Outlook

4 Surface Matching
    4.1 Background
    4.2 First Order Sobolev-type Metric
    4.3 Variation of the Metric
    4.4 The Helmholtz Operator and Duality
    4.5 Discretization
        4.5.1 The Geodesic Equation
        4.5.2 Computing the Gradient
    4.6 Numerical Experiments
    4.7 Outlook

Bibliography

1 Introduction

1.1 Diffeomorphism Group and Applications

The diffeomorphism group plays a central role in the field of computational anatomy. Its use in biology was first noted by D'Arcy Thompson in his book "On Growth and Form" [79]. Following the paradigm of pattern theory, introduced by Grenander [31, 33], we assume that the anatomical variety of an object of interest across the population can be explained by choosing a template, upon which a set of deformations acts, thus generating the entire population, potentially up to small-scale noise. In this setting the collection of anatomical objects forms a homogeneous space under the group of deformations [32]. The objects studied in computational anatomy are those that can be obtained by medical imaging procedures. They include volumetric gray-level images from MRI and CT scans, vector and tensor fields from diffusion tensor MRI, surfaces representing the outline of organs, curves in space representing white matter fiber tracts and manually or automatically assigned feature points. The group of deformations is usually the diffeomorphism group of the ambient three-dimensional space.

The aim of the pattern theory approach is to encode the differences between two objects in the deformation that matches one object to the other. This allows one to ignore all the complexity inherent in anatomical structures and only concentrate on the differences between them. Therefore, when given a template object $I_0 \in V$ and a target object $I_{\mathrm{targ}} \in V$ from the population, denoted by $V$ and assumed to be a vector space, as a first step it is necessary to determine the deformation $g \in G$ from the group $G$ of all deformations, mapping $I_0$ to $g.I_0 = I_{\mathrm{targ}}$. This can be done using the large deformation diffeomorphic metric matching (LDM) framework of [40, 10, 61, 91, 93].

The objective of LDM is not just to determine a deformation $g_1 \in G$ such that the action $g_1 I_0$ of $g_1 \in G$ on the template $I_0 \in V$ approximates the target $I_{\mathrm{targ}} \in V$ to within a certain tolerance. Rather, the objective of LDM is to find the optimal path $g_t \in G$, continuously parametrized by time $t \in \mathbb{R}$, that smoothly deforms $I_0$ through $I_t = g_t I_0$ to $g_1 I_0$. The optimal path $g_t \in G$ is defined as the path that costs the least in time-integrated kinetic energy for a given tolerance. Hence, the deformable template method may be formulated as an optimization problem based on a trade-off between the following two properties: (1) the tolerance for inexact matching between the final deformed template $g_1 I_0$ and the target template $I_{\mathrm{targ}}$; and (2) the cost of time-integrated kinetic energy of the rate of deformation along the path $g_t$. The former is defined by assigning a norm $\|\cdot\| : V \to \mathbb{R}$ to measure the mismatch $\|g_t I_0 - I_{\mathrm{targ}}\|$ between the two images. The latter is obtained by choosing a Riemannian metric $|\cdot| : TG \to \mathbb{R}$ that defines the kinetic energy on the tangent space $TG$ of the group $G$.

In applications of LDM to the analysis of features in bio-medical images, the optimal path $g_t$ is naturally chosen from among the diffeomorphic transformations $G = \mathrm{Diff}(\Omega)$ of an open, bounded domain $\Omega$. The domain $\Omega$ will be taken to be the ambient space in which the anatomy is located. It can be shown [64, 89] that the optimal path of deformation satisfies an evolution equation, namely the geodesic equation for the metric that was chosen to measure the kinetic energy. Therefore the whole path is encoded in its initial value at time $t = 0$. These geodesic equations on the diffeomorphism group naturally lead us to the field of hydrodynamics [38].

The importance of the diffeomorphism group was also realized in a different context.
In the seminal work of Arnold [3] it was realized that Euler's equations, which govern the motion of incompressible fluids, can be interpreted as geodesic equations on the group of volume-preserving diffeomorphisms with respect to the right-invariant $L^2$-metric. This interpretation was used by Ebin and Marsden [23] to rewrite Euler's equations, a system of PDEs, as a second order ODE on a suitable Hilbert space and thus to prove the local well-posedness of Euler's equations in three dimensions. Following Arnold, it was shown that other PDEs of hydrodynamic type can be interpreted as geodesic equations on diffeomorphism or related groups with respect to certain Riemannian metrics: examples include Burgers' equation for the diffeomorphism group with respect to the $L^2$-metric, the Camassa-Holm equation with respect to the $H^1$-metric [14, 19, 28] and the KdV equation, which is a geodesic equation for the Virasoro-Bott group with respect to the $L^2$-metric [65, 73]. Closely associated are also the Hunter-Saxton and the modified Constantin-Lax-Majda equations, which arise when the diffeomorphism group is endowed with the homogeneous $\dot{H}^1$- or $\dot{H}^{1/2}$-metrics [43, 88].

The interpretation of a PDE as the geodesic equation on an infinite dimensional manifold opened up a variety of geometrical questions, which may be asked about the manifold. What is the curvature of this manifold? Do geodesics have conjugate points? Do there exist totally geodesic submanifolds? What are the properties of the metric induced by geodesic distance? Some of these questions have been addressed in the past. See e.g. [63] for properties of conjugate points, [57, 56] for properties of geodesic distance and [42, 7, 54] for properties of the curvature tensor.

The problem of matching objects and comparing shape has a history that goes beyond LDM or the field of computational anatomy. Beginning with [41], shape is understood as the properties of objects up to a group action. For example [41] studied the manifold of triangles modulo translations, rotations and scalings and endowed it with a Riemannian structure. Another widely studied shape space is the space of closed unparametrized curves. Various Riemannian metrics have been proposed on this space [57, 58, 90, 74, 78, 21] and used for shape comparison. More recently attention has turned to the space of surfaces embedded in $\mathbb{R}^3$, both parametrized [45, 49] and unparametrized [87, 48, 7]. We will concentrate in particular on the space of parametrized surfaces and the family of metrics for these, which were proposed in [7].
In contrast to the LDM framework, where a group of deformations is assumed to act on the surface and thereby induces a Riemannian metric, in this setting the metric is defined directly on the tangent space of the manifold of surfaces and takes into account the local geometry of the surface. The theoretical properties of these metrics were studied in depth in [7], but robust numerical methods for matching have yet to be developed.

1.2 Content of this Work

This work consists of three parts. In the first part we study the LDM framework, which is used for matching images [10], vector fields [16], landmarks [40] and other data structures. We show how to formulate it abstractly in terms of a Lie group of deformations acting on a vector space of anatomical objects and derive the matching equations for them. All the above mentioned examples can be seen as special cases of this general framework. More importantly, this generalization allows us to consider groups other than the diffeomorphism group for matching.

We use the freedom of changing the group of deformations to replace the diffeomorphism group by a semidirect product of diffeomorphism groups to account for multiple length scales in the images. Other approaches to incorporating multiscale aspects in LDM were proposed in [67, 68, 76]. We show the equivalence of all three methods and in doing so give a geometric interpretation to the sum-of-kernels strategy presented in [68]. All three approaches can also be generalized from a finite, discrete set of scales to the case of a continuum of scales.

The second part deals with properties of the geodesic distance on the diffeomorphism and Virasoro-Bott group. The geodesic distance between two points is defined as the infimum of the path-length over all paths connecting the two points. In finite dimensions, because of the local invertibility of the exponential map, this distance is always positive and the topology of the resulting metric space is the same as the manifold topology. However, in infinite dimensions, this does not always hold: the induced geodesic distance for weak Riemannian metrics on infinite dimensional manifolds may vanish. This surprising fact was first noticed for the $L^2$-metric on the shape space $\mathrm{Imm}(S^1, \mathbb{R}^2)/\mathrm{Diff}(S^1)$ in [57, 3.10]. Here $\mathrm{Imm}(S^1, \mathbb{R}^2)/\mathrm{Diff}(S^1)$ denotes the orbifold of all immersions $\mathrm{Imm}(S^1, \mathbb{R}^2)$ of $S^1$ into $\mathbb{R}^2$ modulo reparametrizations.
In [56] it was shown that this result holds for the general shape space $\mathrm{Imm}(M, N)/\mathrm{Diff}(M)$ for any compact manifold $M$ and Riemannian manifold $N$, and also for the right-invariant $L^2$-metric (or equivalently Sobolev-type metric of order zero) on each full diffeomorphism group with compact support $\mathrm{Diff}_c(N)$. In particular, since Burgers' equation is related to the geodesic equation of the right-invariant $L^2$-metric on $\mathrm{Diff}_c(\mathbb{R}^1)$, it implies that solutions of Burgers' equation are critical points of the length functional, but they are not length-minimizing.

If the metric is too weak and the geodesic distance on the diffeomorphism group vanishes, then the matching problem using this metric will be ill-posed, i.e. the minimum will not be attained. Conversely, it was shown in [56] that for Sobolev-type metrics on the diffeomorphism group of order one or higher the induced geodesic distance is positive. This naturally leads to the question whether one can determine the Sobolev order where this change of behavior occurs. We give a complete answer to this question in the case of $M = S^1$ and a partial answer for $M = \mathbb{R}$. Furthermore we show that the geodesic distance vanishes on the Virasoro-Bott group with respect to the $L^2$-metric and that the path-length functional does not have any local minima.

In the third part we explore alternative approaches to LDM for matching surfaces. The family of Sobolev-type metrics on surfaces, which was studied in [7], is a natural starting point to develop numerical methods for the matching problem of parametrized surfaces. In this work we concentrate on the Sobolev metric of first order and discretize the geodesic and matching equations using finite elements.

1.3 Contributions of this Work

• The LDM matching framework is formulated in an abstract setting, thus exposing its geometric structure and the role played by the momentum map. We show that the standard examples of image, vector field and landmark matching are special cases of this abstract setting.

• We show the equivalence between matching images with a semidirect product of diffeomorphism groups and matching with a single group, but with a different metric. By letting each group in the semidirect product represent one scale, we obtain a method for multiscale matching.

• This equivalence is then generalized from a finite number of discrete scales to the case of a continuum of scales. We show how to extract the information corresponding to each scale and how matching with a finite number of scales can be seen as a special case of the continuum of scales.

• We show that the geodesic distance vanishes on the group $\mathrm{Diff}_c(\mathbb{R})$ of compactly supported diffeomorphisms of $\mathbb{R}$, when equipped with a right-invariant Sobolev metric of order $0 \le s < \tfrac{1}{2}$.

• We show that the geodesic distance also vanishes for $s = \tfrac{1}{2}$ on the group $\mathrm{Diff}(S^1)$ and that it does not vanish for $s > \tfrac{1}{2}$ on either $\mathrm{Diff}(S^1)$ or $\mathrm{Diff}_c(\mathbb{R})$.

• Furthermore it is shown that the geodesic distance vanishes on the Virasoro-Bott group, when equipped with the right-invariant $L^2$-metric.

• In addition to knowing that the geodesic distance vanishes, we also show that the length functional does not even have local minima for either $\mathrm{Diff}(\mathbb{R})$ or the Virasoro-Bott group with the $L^2$-metric.

• A discretization for the $H^1$-metric on the space of parametrized surfaces is proposed and it is shown how to solve the initial and boundary value problem for geodesics. This discretization is implemented and tested on a set of synthetic examples.

1.4 Publications

Several results from this thesis have been disseminated in scientific publications, details of which are given below.

The abstract formulation of the LDM framework and first results on the semidirect product (Chapter 2) were developed in collaboration with Darryl D. Holm, François Gay-Balmaz and Tudor Ratiu and published in [12]. This work was continued with François-Xavier Vialard and Laurent Risser to include a description of a continuum of scales and to provide a variational interpretation of the mixture of kernels, and was published in [13].

The vanishing of the geodesic distance for the $L^2$-metric on the Virasoro-Bott group was joint work with Martin Bauer, Philipp Harms and Peter Michor and was published in [8]. The results on the non-existence of local minima were a continuation of this work and were published in [11]. The results on the Sobolev-type metrics on the diffeomorphism groups were also obtained together with Martin Bauer, Philipp Harms and Peter Michor and were published in [6]. They constitute Chapter 3.

The discretization of the geodesic equation for surfaces and the matching problem was developed together with Martin Bauer and presented at the MICCAI 2011 workshop Mathematical Foundations of Computational Anatomy [5].

2 Multiscale Registration

2.1 Geometry of Registration

2.1.1 Motivation

The optimal solution to a non-rigid template matching problem is defined as the shortest, or least expensive, path of continuous deformations of one geometric object (template) into another one (target). The goal is to find the path of deformations that is the shortest, or costs the least, for a given tolerance in matching the target. The approach focuses its attention on the properties of the action of a Lie group G of transformations on the set of deformable templates. The attribution of a cost to this process is based on metrics defined on the tangent space TG of the group G, following the principles in [31].

Formulation of LDM

In the Large Deformation Diffeomorphic Metric Matching (LDM) framework the template matching procedure for image registration is formulated as follows. Suppose an image, say a medical image, is acquired using MRI, CT, or some other imaging technique. To begin, consider the case that the information in an image can be represented as a function $I : \Omega \to \mathbb{R}$, where $\Omega \subseteq \mathbb{R}^d$ is the domain of the image. The set of images $V = \mathcal{F}(\Omega)$ is the vector space of smooth functions. One usually deals with planar ($d = 2$) or volumetric ($d = 3$) images. Consider the comparison of two images, the template image $I_0$ and the target image $I_1$. The goal is to find a transformation $\varphi : \Omega \to \Omega$ such that the transformed image $I_0 \circ \varphi^{-1}$ matches the target image $I_1$ up to a prescribed accuracy, as measured by, say, the squared $L^2$-norm of their difference,

$$ E_2(I_0, I_1) = \| I_0 \circ \varphi^{-1} - I_1 \|^2_{L^2} . $$

For this purpose, one introduces a time-indexed deformation process that starts at time $t = 0$ with the template (denoted $I_0$) and reaches the target $I_1$ at time $t = 1$. At a given time $t$ during this process, the current object $I_t$ is assumed to be the image of the template $I_0$ obtained through a sequence of deformations. We also want the time-indexed transformation to be regular. To ensure its regularity, we require the transformation to be generated as the flow of a smooth, time-dependent vector field $u : [0, 1] \times \Omega \to \mathbb{R}^d$, i.e. $\varphi = \varphi_1$ with

$$ \partial_t \varphi_t = u_t \circ \varphi_t , \qquad \varphi_0(x) = x . \tag{2.1} $$
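To make the flow equation (2.1) concrete, the following is a minimal sketch, not part of the LDM algorithm of [10], that integrates the flow of a one-dimensional vector field with a classical Runge-Kutta scheme. The function name `flow`, the step count and the linear test field are assumptions chosen for illustration; for $u(t, x) = x$ the exact flow is $\varphi_t(x) = x e^t$.

```python
import math

def flow(u, x0, t0=0.0, t1=1.0, steps=200):
    """Integrate d/dt phi_t(x) = u(t, phi_t(x)) from t0 to t1 with classical RK4.

    u  : callable (t, x) -> velocity, a hypothetical 1D vector field
    x0 : starting point, i.e. the value of the trajectory at time t0
    """
    h = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        k1 = u(t, x)
        k2 = u(t + h / 2, x + h / 2 * k1)
        k3 = u(t + h / 2, x + h / 2 * k2)
        k4 = u(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

# Linear test field u(t, x) = x: the exact flow is phi_t(x) = x * exp(t).
phi1 = flow(lambda t, x: x, 1.0)

# Integrating backwards in time from phi_1(x) recovers x, which illustrates
# that the time-one map of a smooth field is invertible.
back = flow(lambda t, x: x, phi1, t0=1.0, t1=0.0)
```

Running the same integrator with the time interval reversed gives the inverse map, which is the practical reason why generating $\varphi$ as a flow guarantees a diffeomorphism for sufficiently regular $u$.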

We measure the regularity of $u_t$ via a kinetic-energy-like term

$$ E_{\mathrm{kin}}(u_t) = \int_0^1 |u_t|_H^2 \, dt , $$

where $|u_t|_H$ is a norm on the space of vector fields on $\Omega$ defined in terms of a positive self-adjoint differential operator $L$ by

$$ |u_t|_H^2 = \langle u_t, L u_t \rangle_{L^2} . \tag{2.2} $$

A possible choice for $L$ is the Helmholtz operator $Lu = u - \alpha^2 \Delta u$. We denote by $\mathcal{H}$ the space of vector fields for which this norm is finite.

Following [10] we can cast the problem of registering $I_0$ to $I_1$ as a variational problem. Namely, we seek to minimize the cost

$$ E(u_t) = \int_0^1 |u_t|_H^2 \, dt + \frac{1}{2\sigma^2} \| I_0 \circ \varphi_1^{-1} - I_1 \|^2_{L^2} \tag{2.3} $$

over all time-dependent vector fields $u_t$. The transformation $\varphi_1$ is related to the vector field $u_t$ via (2.1). A necessary condition for a vector field $u_t$ to be a minimizer is that the derivative of the cost functional $E$ vanishes at $u_t$, that is $DE(u_t) = 0$. It is shown in [10, Theorem 2.1] and [60, Theorem 4.1] that $DE(u_t) = 0$ is equivalent to

$$ L u_t = \frac{1}{\sigma^2} \, |\det D\varphi_{t,1}^{-1}| \, (J_t^0 - J_t^1) \, \nabla J_t^0 , \tag{2.4} $$

where $\varphi_{t,s} = \varphi_t \circ \varphi_s^{-1}$ and $J_t^0 = I_0 \circ \varphi_{t,0}^{-1}$, $J_t^1 = I_1 \circ \varphi_{t,1}^{-1}$. This condition is then used in [10] to devise a gradient descent algorithm for numerically computing the optimal transformation $\varphi_1$.
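As an illustration of how the two terms in the cost (2.3) can be evaluated on a grid, here is a small sketch under simplifying assumptions: a one-dimensional periodic domain, a second-order finite-difference Laplacian, and a mismatch term that is passed in precomputed. The function names `helmholtz_norm_sq` and `ldm_cost` are hypothetical and do not come from [10].

```python
import math

def helmholtz_norm_sq(u, h, alpha):
    """Discrete <u, Lu>_{L^2} with Lu = u - alpha^2 u'' on a periodic grid of spacing h."""
    n = len(u)
    total = 0.0
    for i in range(n):
        # periodic second difference; u[i - 1] wraps around for i = 0
        lap = (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / h**2
        total += u[i] * (u[i] - alpha**2 * lap) * h
    return total

def ldm_cost(u_path, mismatch_sq, h, alpha, sigma):
    """Riemann-sum version of (2.3): time-integrated kinetic energy plus weighted mismatch.

    u_path      : list of velocity fields u_t sampled at equally spaced times in [0, 1)
    mismatch_sq : precomputed squared L^2 mismatch ||I_0 o phi_1^{-1} - I_1||^2
    """
    dt = 1.0 / len(u_path)
    kinetic = sum(helmholtz_norm_sq(u, h, alpha) for u in u_path) * dt
    return kinetic + mismatch_sq / (2.0 * sigma**2)

n, alpha = 256, 0.5
h = 2.0 * math.pi / n
u = [math.sin(i * h) for i in range(n)]

# For u = sin on [0, 2 pi) the continuum value of the norm is pi * (1 + alpha^2).
norm_sq = helmholtz_norm_sq(u, h, alpha)
cost = ldm_cost([u] * 10, mismatch_sq=0.2, h=h, alpha=alpha, sigma=1.0)
```

The parameter $\sigma$ controls the trade-off: a small $\sigma$ penalizes mismatch heavily, while a large $\sigma$ favours paths with low kinetic energy.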

Geometric reformulation of LDM

Formula (2.4) can be reformulated in a way that emphasizes its geometric nature. As we will show in Section 2.1.2, formula (2.4) is equivalent to

$$ L u_t = -\frac{1}{\sigma^2} \, (\varphi_t . I_0) \diamond \big( \varphi_{t,1} . (\varphi_1 . I_0 - I_1)^\flat \big) . \tag{2.5} $$

This formula can be understood as follows: the first factor $\varphi_t . I_0$ is the action of the transformation $\varphi_t$ on the image $I_0 \in V = \mathcal{F}(\Omega)$. This is defined as the composition of functions, $\varphi_t . I_0 = I_0 \circ \varphi_t^{-1}$. The flat-operator $\flat : V \to V^*$ maps images in $V$ to the objects in $V^*$, which are dual to scalar functions. (These dual objects are the scalar densities.) To describe such an operator, one first needs to choose a convenient space $V^*$ in non-degenerate duality with $V$. We choose to identify $V^*$ with functions in $\mathcal{F}(\Omega)$, by using the $L^2$-pairing

$$ \langle f, I \rangle := \int_\Omega f(x) I(x) \, dx , $$

where $dx$ is a fixed volume element on $\Omega$. With this choice, the flat operator $\flat$ is simply the identity map on functions. The space $V^*$ is the smooth dual of $V$, that is, the subspace of the topological dual (which would be the space of distributions) generated by linear functionals of the form $I \mapsto \int_\Omega f I \, dx$ with $f \in V$. Although we have the flat-operator $\flat$, which allows us to move between the spaces $V$ and $V^*$, it is important that we conceptually distinguish between elements in $V$ and in its dual $V^*$. Indeed, the action of a transformation $\varphi$ on an element in $V^*$ is the dual action, and does not coincide with the action on $V$ in general. In our example, the action on $f \in V^*$ is

$$ \varphi . f = |\det D\varphi^{-1}| \, (f \circ \varphi^{-1}) . \tag{2.6} $$

To see how this action arises, we need the abstract definition of a dual action, which is

$$ \langle \varphi . f, I \rangle = \langle f, \varphi^{-1} . I \rangle . $$

Remark. The inverse in the definition of the dual action is necessary to ensure that we have a left action:

$$ \varphi . (\psi . f) = (\varphi \circ \psi) . f . $$

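Using this definition and the change of variables formula we see that

$$ \langle \varphi . f, I \rangle = \langle f, \varphi^{-1} . I \rangle = \int_\Omega (I \circ \varphi) f \, dx = \int_\Omega I \, (f \circ \varphi^{-1}) \, |\det D\varphi^{-1}| \, dx = \big\langle |\det D\varphi^{-1}| \, f \circ \varphi^{-1} , I \big\rangle . $$

Therefore, in the second factor $\varphi_{t,1} . (\varphi_1 . I_0 - I_1)^\flat$ of equation (2.5), the term $(\varphi_1 . I_0 - I_1)^\flat$ is interpreted as an element in $V^*$. Consequently, the action is the dual action given by

$$ \varphi_{t,1} . (\varphi_1 . I_0 - I_1)^\flat = |\det D\varphi_{t,1}^{-1}| \, (J_t^0 - J_t^1) . $$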
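The defining identity of the dual action, $\langle \varphi . f, I \rangle = \langle f, \varphi^{-1} . I \rangle$ with $\varphi . f$ given by (2.6), can be checked numerically. The sketch below does this in one dimension under assumptions that are ours and not the text's: the domain is the periodic interval $[0, 1)$, and the diffeomorphism, image and density are arbitrary smooth test functions.

```python
import math

def phi(x):
    """A hypothetical diffeomorphism of the circle [0, 1): phi'(x) > 0 everywhere."""
    return x + 0.1 * math.sin(2 * math.pi * x)

def dphi(x):
    return 1.0 + 0.2 * math.pi * math.cos(2 * math.pi * x)

def phi_inv(y, iters=60):
    """Invert the increasing map phi by bisection."""
    lo, hi = y - 0.2, y + 0.2      # |phi(x) - x| <= 0.1, so the root lies in this bracket
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def image(x):                      # I in V, a 1-periodic test image
    return math.cos(2 * math.pi * x) + 0.5 * math.sin(4 * math.pi * x)

def density(x):                    # f in V^*, a 1-periodic test density
    return 1.0 + 0.3 * math.sin(2 * math.pi * x)

def integrate(g, n=2000):
    """Midpoint rule on [0, 1]."""
    return sum(g((i + 0.5) / n) for i in range(n)) / n

def dual_action(y):
    """(phi.f)(y) = |det D(phi^{-1})(y)| f(phi^{-1}(y)), formula (2.6)."""
    x = phi_inv(y)
    return density(x) / dphi(x)    # D(phi^{-1})(y) = 1 / phi'(phi^{-1}(y)) > 0

# Left-hand side <phi.f, I> and right-hand side <f, phi^{-1}.I>, where phi^{-1}.I = I o phi.
lhs = integrate(lambda y: dual_action(y) * image(y))
rhs = integrate(lambda x: density(x) * image(phi(x)))
```

The two quadratures agree up to discretization error, which is the change of variables $y = \varphi(x)$ carried out numerically.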

It remains to explain the last ingredient; namely, the diamond map in equation (2.5),

$$ \diamond : V \times V^* \to \mathcal{H}^* . \tag{2.7} $$

This is the cotangent-lift momentum map associated with the given representation of the Lie group $G$ on the vector space $V$. Such momentum maps are familiar in geometric mechanics; see, e.g., [35] or [50]. The momentum map (2.7) takes elements of $V \times V^*$, regarded as the cotangent bundle $T^*V$ of the space of images, to objects in $\mathcal{H}^*$, the dual to the space $\mathcal{H}$ of vector fields. The map $\diamond$ depends on the choice of $\mathcal{H}^*$. For example, using the $L^2$-pairing with respect to the fixed volume element $dx$, the momentum map (2.7) is defined for images that are scalar functions $I \in V = \mathcal{F}(\Omega)$ and densities $f \in V^* = \mathcal{F}^*(\Omega)$ by the relation

$$ \langle I \diamond f, u \rangle = -\int_\Omega f \, \nabla I \cdot u \, dx , \tag{2.8} $$

so that in this case $I \diamond f = -f \nabla I$.

Remark (Momentum maps).

• In geometric mechanics, momentum maps generalize the notions of linear and angular momenta. For a mechanical system whose configuration space is a manifold $M$, which is acted on by a Lie group $G$, the momentum map $J : T^*M \to \mathfrak{g}^*$ assigns to each element of the phase space $T^*M$ a generalized "momentum" in the dual $\mathfrak{g}^*$ of the Lie algebra $\mathfrak{g}$ of the Lie group $G$. For example, the momentum map for spatial translations is the linear momentum and for rotations it is the angular momentum. The importance of the momentum map in geometric mechanics is due to Noether's theorem. Noether's theorem states that the generalized momentum $J$ is a constant of motion for the system under consideration when its Hamiltonian is invariant under the action of $G$ on $T^*M$. This theorem enables one to turn symmetries of the Hamiltonian into conservation laws.

• [Notation for momentum maps: $J$ versus $\diamond$] For convenience in referring to earlier work, e.g., [37, 39], we distinguish between the notation $J$ for general momentum maps $J : T^*M \to \mathfrak{g}^*$ and the notation $\diamond$ for the particular type of cotangent-lift momentum maps on linear spaces, $\diamond : V \times V^* \to \mathcal{H}^*$, that typically appear in applications of Euler-Poincaré theory, as in equation (2.7).

Remark (Momentum of images). Momentum maps for images have been discussed previously. In particular, the momentum map for the EPDiff equation of [36] produces an isomorphism between landmarks (and outlines) for images and singular soliton solutions of the EPDiff equation. This momentum map was shown in [38] to provide a complete parametrization of the landmarks by their canonical positions and momenta. A related interpretation of momentum for images in computational anatomy was also discussed in [61].

We now explain in which sense expression (2.8) is a momentum map. Even though the cost functional (2.3) is not invariant under the action of the diffeomorphism group, one may still define the momentum map $\diamond : V \times V^* \to \mathcal{H}^*$ via

$$ \langle I \diamond f, u \rangle = \langle f, uI \rangle , $$

as done in geometric mechanics, see [50] and [35]. The action $u.I$ is defined as $u.I := \partial_t|_{t=0} \, \varphi_t . I$ for a curve $\varphi_t$ such that $\varphi_0(x) = x$ and $\partial_t|_{t=0} \, \varphi_t = u$. This is the infinitesimal action corresponding to the action of $\mathrm{Diff}(\Omega)$ on $V$. Although the $\diamond$-map does not provide a conserved quantity of the dynamics, it nevertheless helps our intuition and gives us a way to structure the formulas.

Let us apply this concept to the registration of scalar images $I \in \mathcal{F}(\Omega)$ on the domain $\Omega$. The infinitesimal action is given by

$$ u.I = \partial_t \big|_{t=0} (I \circ \varphi_t^{-1}) = -\nabla I \cdot u $$

and thus the momentum map in this case is

$$ \langle I \diamond f, u \rangle_{\mathcal{H}^* \times \mathcal{H}} = \langle f, -\nabla I \cdot u \rangle_{V^* \times V} = \int_\Omega (-\nabla I \cdot u) f \, dx = \langle -f \nabla I, u \rangle_{\mathcal{H}^* \times \mathcal{H}} , $$

as stated in formula (2.8). The key is to reinterpret the $L^2$-duality between the functions $-\nabla I \cdot u$ and $f$ as the duality between the vector fields $-f \nabla I$ and $u$. Using formulas (2.8) and (2.6) in equation (2.5), we regain the stationarity condition (2.4).

Remark. Writing the gradient of the cost functional (2.4) in the geometric form (2.5) has several advantages. For example, it allows us to generalize an algorithm that matches images as scalar functions, to cope with different data structures, such as densities, vector fields, tensor fields and others. Making this generalization allows one to see the underlying common geometrical framework in which we may unify the treatment of these various data structures. We can also keep the data structure fixed and vary the norm $\|\cdot\|$, and thereby alter our criteria of how we measure the distance between two objects. In addition, the geometrical setting introduced here for image analysis allows us to vary not only the data structure, but also to change the group of transformations. We will explore this possibility and consider image registration using an iterated semidirect product of diffeomorphism groups, thus incorporating multiple scales into the registration framework.
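The infinitesimal action $u.I = -\nabla I \cdot u$ used in this section can be sanity-checked by differentiating $I \circ \varphi_\varepsilon^{-1}$ numerically along a curve with $\varphi_0 = \mathrm{id}$ and $\partial_\varepsilon|_{\varepsilon=0} \varphi_\varepsilon = u$. The sketch below is one-dimensional and uses the particular curve $\varphi_\varepsilon(y) = y + \varepsilon u(y)$; the test image, the velocity field and all function names are assumptions made for illustration.

```python
import math

def image(x):                       # hypothetical smooth test image I
    return math.sin(3.0 * x)

def d_image(x):                     # its derivative, playing the role of grad I
    return 3.0 * math.cos(3.0 * x)

def u(x):                           # hypothetical velocity field
    return math.cos(x)

def phi_eps_inv(x, eps, iters=50):
    """Invert phi_eps(y) = y + eps * u(y) by the fixed-point iteration y = x - eps * u(y)."""
    y = x
    for _ in range(iters):
        y = x - eps * u(y)
    return y

def infinitesimal_action_fd(x, eps=1e-6):
    """Central difference in eps of (phi_eps . I)(x) = I(phi_eps^{-1}(x))."""
    plus = image(phi_eps_inv(x, eps))
    minus = image(phi_eps_inv(x, -eps))
    return (plus - minus) / (2.0 * eps)

x0 = 0.7
fd = infinitesimal_action_fd(x0)
exact = -d_image(x0) * u(x0)        # u.I = -grad I . u
```

The minus sign comes from the inverse in the push-forward action: moving the canvas in the direction $u$ makes the image value at a fixed point $x$ come from the point upstream of $x$.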

2.1.2 Abstract Framework

Diffeomorphic image registration may be formulated abstractly as follows. Consider a vector space $V$ of deformable objects on which an inner product $\langle \cdot , \cdot \rangle$ is defined, which allows us to measure distances between two such objects. We can think of $V$ as containing brain MRI images, an example frequently encountered in computational anatomy, see e.g. [60]. The distance between two objects can be defined as $\|I - J\|^2 = \langle I - J, I - J \rangle$, which in the case of images is the $L^2$-distance

$$ \int_\Omega |I(x) - J(x)|^2 \, dx . $$

The second ingredient is a Lie group $G$ of deformations that acts on the space $V$ of deformable objects from the left,

$$ (g, I) \in G \times V \mapsto gI \in V . $$

In computational anatomy $G$ is usually taken to be the group of diffeomorphisms $\mathrm{Diff}(\Omega)$ or variants of it. A diffeomorphism $\varphi \in \mathrm{Diff}(\Omega)$ acts on images by push-forward; that is, by pull-back with the inverse map,

$$ \varphi . I := \varphi_* I = I \circ \varphi^{-1} \quad \text{or} \quad \varphi . I(x) = I(\varphi^{-1}(x)) . $$

Roughly speaking, this action corresponds to drawing the image I on a rubber canvas, then deforming the canvas by ϕ and watching the image being deformed along with the canvas. It is also the basis for the familiar Lagrangian representation of fluid dynamics as described in [37].

Given a curve $t \mapsto g_t$ of transformations, we define the right-invariant velocity vector $u_t \in \mathfrak{g}$ as

$$ u_t = (\partial_t g_t) \, g_t^{-1} . \tag{2.9} $$

We obtain $u_t$ by taking the tangent vector of $g_t$ and right-translating it back to the tangent space at the identity $T_e G = \mathfrak{g}$, which is the Lie algebra of $G$. Rewriting (2.9) as

$$ \partial_t g_t = u_t g_t \tag{2.10} $$

and specifying initial conditions at some time $t = s$, we obtain an ordinary differential equation (ODE). If we start with velocity vectors $u_t$, we can solve this ODE to reconstruct the curve $g_t$. This corresponds to the construction of diffeomorphisms as flows of vector fields via the equation

$$ \partial_t \varphi_t = u_t \circ \varphi_t , \qquad \varphi_0(x) = x . $$

This idea was first introduced in image matching in [18]. Let us denote by $g^u_{t,s}$ the solution of the ODE (2.10) rewritten as

$$ \partial_t g^u_{t,s} = u_t \, g^u_{t,s} , \qquad g^u_{s,s} = e , $$

with the initial condition that $g^u_{t,s}$ is the identity $e$ at time $t = s$. Since the time $t = 0$ will play a special role, we denote $g^u_t := g^u_{t,0}$. Standard results for differential equations show the following properties

$$ g_{t,s} \, g_{s,r} = g_{t,r} , \qquad g_{t,s} = g_t \, g_s^{-1} , \qquad g_{t,s}^{-1} = g_{s,t} , $$

which we will use in our calculations.

Following the motivation discussed in Section 2.1.1 we define the abstract version of the cost functional (2.3) as

$$ E(u_t) := \int_0^1 \ell(u_t) \, dt + \frac{1}{2\sigma^2} \| g^u_1 I_0 - I_1 \|^2_V , \tag{2.11} $$

where the function $\ell : \mathfrak{g} \to \mathbb{R}$ is a Lagrangian measuring the kinetic energy contained in $u_t$ and $\|\cdot\|$ is the norm on $V$ induced by the inner product $\langle \cdot , \cdot \rangle$. Note that formula (2.11) defines a matching problem for any data structure living in a vector space $V$ and any group of deformations $G$ acting on $V$. Although it was inspired by the concrete problem of diffeomorphically matching scalar-valued images, the cost function (2.11) no longer contains any reference to image matching.

All the results in this section are to be interpreted formally. They are helpful to show the common geometrical ideas in different matching problems, but as we will see in Section 2.2, they cannot be directly applied to the problem of diffeomorphic image registration. Therefore we will now assume that the objects are sufficiently smooth, so all the operations can be carried out.

Next, we want to deduce formula (2.5) for the derivative of the energy in our abstract framework. In order to compute the derivative $DE(u_t)$ we need to know how $g^u_1$ behaves under variations $\delta u_t$ of $u_t$. This is answered by the following lemma, the proof of which is adapted from [84] and [10].

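Lemma 2.1. Let $u : \mathbb{R} \to \mathfrak{g}$, $t \mapsto u(t)$ be a smooth curve in $\mathfrak{g}$ and $\varepsilon \mapsto u_\varepsilon$ a smooth variation of this curve. Then
\[ \delta g^u_{t,s} := \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} g^{u_\varepsilon}_{t,s} = g^u_{t,s} \int_s^t \operatorname{Ad}_{g^u_{s,r}} \delta u(r)\, dr \;\in\; T_{g^u_{t,s}} G . \]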

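Proof. For all $\varepsilon$ we have
\[ \frac{d}{dt} g^{u_\varepsilon}_{t,s} = u_\varepsilon(t)\, g^{u_\varepsilon}_{t,s} , \qquad g^{u_\varepsilon}_{s,s} = e . \]

Taking the $\varepsilon$-derivative of this equality yields the ODE
\[ \frac{d}{dt} \left( \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} g^{u_\varepsilon}_{t,s} \right) = \delta u(t)\, g^u_{t,s} + u(t) \left( \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} g^{u_\varepsilon}_{t,s} \right) , \]
and then, using the notation $\delta g^u_{t,s} := \frac{d}{d\varepsilon}\big|_{\varepsilon=0} g^{u_\varepsilon}_{t,s}$, we compute
\begin{align*}
\frac{d}{dt} \left( (g^u_{t,s})^{-1} \delta g^u_{t,s} \right)
&= -(g^u_{t,s})^{-1} u(t)\, g^u_{t,s} (g^u_{t,s})^{-1} \delta g^u_{t,s} + (g^u_{t,s})^{-1} \left( \delta u(t)\, g^u_{t,s} + u(t)\, \delta g^u_{t,s} \right) \\
&= g^u_{s,t}\, \delta u(t)\, g^u_{t,s} \\
&= \operatorname{Ad}_{g^u_{s,t}} \delta u(t) .
\end{align*}

Now we integrate both sides from $s$ to $t$ and multiply by $g^u_{t,s}$ from the left to get
\[ \delta g^u_{t,s} = g^u_{t,s} \int_s^t \operatorname{Ad}_{g^u_{s,r}} \delta u(r)\, dr , \]
as required.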
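The formula of Lemma 2.1 can be compared against a finite-difference derivative. The sketch below is only an illustration under assumptions not made in the text: $G = SO(3)$ as a matrix group, invented curves `u` and `delta_u` in $\mathfrak{so}(3)$, RK4 time stepping, and the trapezoidal rule for the integral.

```python
import numpy as np

def hat(w):
    """Map a vector in R^3 to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def u(t):
    return hat([np.sin(t), np.cos(2.0 * t), 0.5 * t])   # hypothetical base curve in so(3)

def delta_u(t):
    return hat([t, 1.0, -t ** 2])                        # hypothetical variation direction

def trajectory(field, steps=400):
    """RK4 solution of dg/dt = field(t) g, g(0) = e, returned at t = 0, 1/steps, ..., 1."""
    h = 1.0 / steps
    g = np.eye(3)
    out = [g]
    for i in range(steps):
        t = i * h
        k1 = field(t) @ g
        k2 = field(t + h / 2) @ (g + h / 2 * k1)
        k3 = field(t + h / 2) @ (g + h / 2 * k2)
        k4 = field(t + h) @ (g + h * k3)
        g = g + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(g)
    return out

steps, eps = 400, 1e-6
g = trajectory(u, steps)

# Left-hand side: central finite difference in eps of g^{u + eps * delta_u}_{1,0}.
g_plus = trajectory(lambda t: u(t) + eps * delta_u(t), steps)[-1]
g_minus = trajectory(lambda t: u(t) - eps * delta_u(t), steps)[-1]
lhs = (g_plus - g_minus) / (2.0 * eps)

# Right-hand side: g_{1,0} * integral_0^1 Ad_{g_{0,r}} delta_u(r) dr, with
# g_{0,r} = g_{r,0}^{-1} and Ad_g xi = g xi g^{-1}; trapezoidal rule in r.
times = np.linspace(0.0, 1.0, steps + 1)
integrand = np.array([np.linalg.inv(gr) @ delta_u(r) @ gr for gr, r in zip(g, times)])
weights = np.full(steps + 1, 1.0 / steps)
weights[[0, -1]] *= 0.5
rhs = g[-1] @ np.tensordot(weights, integrand, axes=1)

lemma_error = np.abs(lhs - rhs).max()
```

The value of `lemma_error` is limited by the trapezoidal rule and the finite-difference step, so it is small but not at machine precision.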

Notation and definitions for cotangent lifts. Already knowing from (2.5) how the first derivative $DE(u_t)$ of the cost functional is going to look, we want to establish the necessary notation before we proceed with the rest of the calculation.

• The inner product on $V$ provides a way to identify $V$ with the smooth dual $V^*$, which is a subspace of the topological dual. To $I \in V$ one associates the linear form $I^\flat := \langle I, \cdot \rangle \in V^*$.

• Given an action of $G$ on $V$, we define the cotangent lift action of $G$ on $\pi \in V^*$ via
\[ \langle g\pi, I \rangle = \langle \pi, g^{-1} I \rangle , \quad \text{for all } I \in V . \]
As mentioned earlier in Remark 2.1.1, the inverse in this definition is necessary to make the dual action $G \times V^* \to V^*$ into a left action.

• Finally we define the cotangent-lift momentum map $\diamond : V \times V^* \to \mathfrak{g}^*$ via
\[ \langle I \diamond \pi, u \rangle = \langle \pi, uI \rangle , \]
where $uI$ is the infinitesimal action of $\mathfrak{g}$ on $V$ defined by $uI = \partial_t|_{t=0}\, g_t I$ for a curve $g_t$ with $g_0 = e$ and $\partial_t|_{t=0}\, g_t = u$. The use of the momentum map was motivated in Remark 2.1.1.

Now we are ready to calculate the stationarity condition DE(ut) = 0.

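Theorem 2.2. Given a smooth curve $t \mapsto u_t \in \mathfrak{g}$, we have
\[ DE(u_t) = 0 \iff \frac{\delta \ell}{\delta u}(t) = - g^u_t I_0 \diamond g^u_{t,1} \pi , \tag{2.12} \]
or, equivalently
\[ DE(u_t) = 0 \iff \frac{\delta \ell}{\delta u}(t) = -\frac{1}{\sigma^2}\, J^0_t \diamond g^u_{t,1} \left( J^0_1 - J^1_1 \right)^\flat , \tag{2.13} \]
where the quantities $\pi$, $J^0_t$, and $J^1_t$ are defined as
\[ \pi := \frac{1}{\sigma^2} \left( g^u_1 I_0 - I_1 \right)^\flat \in V^* , \qquad J^0_t = g^u_t I_0 \in V , \qquad J^1_t = g^u_{t,1} I_1 \in V . \]

When $G$ acts by isometries, the stationarity condition simplifies to
\[ DE(u_t) = 0 \iff \frac{\delta \ell}{\delta u}(t) = -\frac{1}{\sigma^2}\, J^0_t \diamond \left( J^0_t - J^1_t \right)^\flat . \]

The quantity $J^0_t$ is the template object moved forward by $g_t$ until time $t$ and $J^1_t$ is the target object moved backward in time from $1$ to $t$.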

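Proof. Using the notation $\pi := \frac{1}{\sigma^2} (g^u_1 I_0 - I_1)^\flat = \frac{1}{\sigma^2} (J^0_1 - J^1_1)^\flat \in V^*$, we may calculate
\begin{align*}
\langle DE(u), \delta u \rangle
&= \delta \left( \int_0^1 \ell(u(t))\, dt + \frac{1}{2\sigma^2} \left\| g^u_1 I_0 - I_1 \right\|_V^2 \right) \\
&= \int_0^1 \left\langle \frac{\delta \ell}{\delta u}(t), \delta u(t) \right\rangle dt + \left\langle \pi, \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} \left( g^{u_\varepsilon}_1 I_0 - I_1 \right) \right\rangle \\
&= \int_0^1 \left\langle \frac{\delta \ell}{\delta u}(t), \delta u(t) \right\rangle dt + \left\langle \pi, \delta g^u_1 I_0 \right\rangle \\
&= \int_0^1 \left\langle \frac{\delta \ell}{\delta u}(t), \delta u(t) \right\rangle dt + \left\langle \pi, g^u_1 \left( \int_0^1 \operatorname{Ad}_{g^u_{0,s}} \delta u(s)\, ds \right) I_0 \right\rangle \\
&= \int_0^1 \left\langle \frac{\delta \ell}{\delta u}(t), \delta u(t) \right\rangle dt + \int_0^1 \left\langle (g^u_1)^{-1} \pi, \left( \operatorname{Ad}_{g^u_{0,t}} \delta u(t) \right) I_0 \right\rangle dt \\
&= \int_0^1 \left\langle \frac{\delta \ell}{\delta u}(t), \delta u(t) \right\rangle + \left\langle I_0 \diamond (g^u_1)^{-1} \pi, \operatorname{Ad}_{g^u_{0,t}} \delta u(t) \right\rangle dt \\
&= \int_0^1 \left\langle \frac{\delta \ell}{\delta u}(t) + \operatorname{Ad}^*_{g^u_{0,t}} \left( I_0 \diamond (g^u_1)^{-1} \pi \right), \delta u(t) \right\rangle dt ,
\end{align*}
which must hold for all variations $\delta u(t)$. Therefore,
\begin{align*}
\frac{\delta \ell}{\delta u}(t)
&= - \operatorname{Ad}^*_{g^u_{0,t}} \left( I_0 \diamond (g^u_1)^{-1} \pi \right) \\
&= - g^u_t I_0 \diamond g^u_{t,1} \pi \\
&= - \frac{1}{\sigma^2}\, J^0_t \diamond g^u_{t,1} \left( J^0_1 - J^1_1 \right)^\flat .
\end{align*}

If $G$ acts by isometries, then the group action commutes with the flat map and we obtain
\[ \frac{\delta \ell}{\delta u}(t) = - \frac{1}{\sigma^2}\, J^0_t \diamond \left( J^0_t - J^1_t \right)^\flat . \]
This concludes the proof.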

This theorem tells us how to compute the gradient of the cost functional for any data structure and any group action. Just like the cost functional (2.11) itself, it is expressed entirely in geometric terms and contains no reference to particular examples such as images. This makes the theorem widely applicable.

Remark. Although the momentum $\frac{\delta \ell}{\delta u}(t)$ at each time depends on $I_0$ and $I_1$, it turns out that $\frac{\delta \ell}{\delta u}(t)$ obeys a dynamical equation that is independent of $I_0$, $I_1$. The equation in question is the Euler-Poincaré equation on $G$. History and applications of the Euler-Poincaré equation can be found in [37], [50] and [51] and its use in computational anatomy in more detail in [89].

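Lemma 2.3. The momentum $\frac{\delta \ell}{\delta u}(t)$ satisfies
\[ \frac{d}{dt} \frac{\delta \ell}{\delta u}(t) = - \operatorname{ad}^*_{u_t} \frac{\delta \ell}{\delta u}(t) . \tag{2.14} \]

This is the Euler-Poincaré equation on the Lie group $G$ with Lagrangian $\ell : TG/G \cong \mathfrak{g} \to \mathbb{R}$.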

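Proof. Because the cotangent-lift momentum map is $\operatorname{Ad}^*$-invariant [50] we obtain from Theorem 2.2
\begin{align*}
\frac{\delta \ell}{\delta u}(t)
&= - g^u_t I_0 \diamond g^u_{t,1} \pi \\
&= - \operatorname{Ad}^*_{(g^u_t)^{-1}} \left( I_0 \diamond (g^u_1)^{-1} \pi \right) .
\end{align*}

Differentiation of $\operatorname{Ad}^*$ follows the rules
\begin{align*}
\partial_t \operatorname{Ad}^*_{g_t} \eta &= \operatorname{Ad}^*_{g_t} \operatorname{ad}^*_{\partial_t g_t\, g_t^{-1}} \eta , \\
\partial_t \operatorname{Ad}^*_{g_t^{-1}} \eta &= - \operatorname{ad}^*_{\partial_t g_t\, g_t^{-1}} \operatorname{Ad}^*_{g_t^{-1}} \eta .
\end{align*}

From this we see that
\begin{align*}
\frac{d}{dt} \frac{\delta \ell}{\delta u}(t)
&= - \frac{d}{dt} \operatorname{Ad}^*_{(g^u_t)^{-1}} \left( I_0 \diamond (g^u_1)^{-1} \pi \right) \\
&= \operatorname{ad}^*_{u_t} \operatorname{Ad}^*_{(g^u_t)^{-1}} \left( I_0 \diamond (g^u_1)^{-1} \pi \right) \\
&= - \operatorname{ad}^*_{u_t} \frac{\delta \ell}{\delta u}(t) ,
\end{align*}
and hence the momentum satisfies the Euler-Poincaré equation.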

Remark (Dependence of $I_0, I_1$ on the initial momentum). It might seem counter-intuitive that the momentum evolves independently of the objects we are trying to match. However, the objects $I_0, I_1$ do influence the momentum $\frac{\delta \ell}{\delta u}(t)$ in a significant way. Solving the Euler-Poincaré equations requires that we know the initial momentum $\frac{\delta \ell}{\delta u}(0)$ and this initial momentum depends on $I_0$, $I_1$ through the formula
\[ \frac{\delta \ell}{\delta u}(0) = - I_0 \diamond (g^u_1)^{-1} \pi . \]

Alternatively, we might look at the problem from the viewpoint of the variational principle. Assume that $\ell(u) = \frac{1}{2} |u|^2$ is half the squared length of a vector for some inner product $\langle \cdot, \cdot \rangle$ on $\mathfrak{g}$. If we have found a vector field $u_t$ and group element $g_1$, which minimize
\[ \frac{1}{2} \int_0^1 |u|^2\, dt + \frac{1}{2\sigma^2} \left\| g_1 I_0 - I_1 \right\|_V^2 , \]
then the vector field $u_t$ must also minimize
\[ \int_0^1 |u|^2\, dt , \]
among all vector fields $\tilde{u}_t$ whose flows $\tilde{g}_t$ coincide with $g_t$ at time $t = 1$, i.e., $\tilde{g}_1 = g_1$. But this means that $u_t$ must be the velocity vector field of a geodesic $g_t$ in $G$. Here we have implicitly endowed $G$ with a right-invariant Riemannian metric induced by the inner product $\langle \cdot, \cdot \rangle$ on $\mathfrak{g}$. The Euler-Poincaré equation (2.14) is just the geodesic equation on the Lie group $G$ with respect to this Riemannian metric.

2.2 Registration using Diff(Ω)

2.2.1 The Setting

In computational anatomy the group of deformations $G$ is usually the group of diffeomorphisms of some domain $\Omega \subset \mathbb{R}^d$. Different types of data used in computational anatomy, such as landmarks, scalar-valued images or vector fields, are deformed by diffeomorphisms via the mathematical operations of pull-back and push-forward. Intuitively this corresponds to embedding your data into the domain $\Omega$, then deforming $\Omega$ by the diffeomorphism and observing how the data is deformed with it. From a numerical point of view an efficient way of constructing diffeomorphisms is as the flow of a vector field. Following [92] we will consider a certain class of vector spaces, called admissible vector spaces.

Definition 2.4. A Hilbert space $\mathcal{H}$, consisting of vector fields on the domain $\Omega$, is called admissible, if it is continuously embedded in $C^1_0(\Omega, \mathbb{R}^d)$, i.e. there exists a constant $C > 0$ such that
\[ |u|_{1,\infty} \le C\, |u|_{\mathcal{H}} . \]

Here $C^1_0(\Omega, \mathbb{R}^d)$ is the space of all $C^1$-vector fields on $\Omega$ that vanish on the boundary $\partial\Omega$ and at infinity with the norm
\[ |u|_{1,\infty} := \sup_{x \in \Omega} \left( |u(x)| + \sum_{i=1}^d |\nabla u^i(x)| \right) . \]

An admissible vector space $\mathcal{H}$ falls into the class of reproducing kernel Hilbert spaces.

Definition 2.5. A Hilbert space $\mathcal{H}$, consisting of functions $u : \Omega \to \mathbb{R}^d$, is called a reproducing kernel Hilbert space (RKHS), if for all $x \in \Omega$ and $a \in \mathbb{R}^d$ the point-evaluation $\mathrm{ev}^a_x : \mathcal{H} \to \mathbb{R}$ defined as $\mathrm{ev}^a_x(u) := a \cdot u(x)$ is a continuous linear functional. In this case the relation
\[ \langle u, K(\cdot, x) a \rangle = a \cdot u(x) , \qquad u \in \mathcal{H},\ a \in \mathbb{R}^d \]
defines a function $K : \Omega \times \Omega \to \mathbb{R}^{d \times d}$, called the kernel of $\mathcal{H}$. If we denote by $L : \mathcal{H} \to \mathcal{H}^*$ the canonical isomorphism between a Hilbert space and its dual, then we have the relation
\[ K(y, x)\, a = L^{-1}(\mathrm{ev}^a_x)(y) . \]

In order for the RKHS to be admissible the kernel $K$ has to satisfy the following properties:

• $K$ is twice continuously differentiable with bounded derivatives, i.e. $K \in C^2(\Omega \times \Omega, \mathbb{R}^{d \times d})$ and $|K|_{2,\infty} < \infty$.

• $K$ vanishes on the boundary of $\Omega \times \Omega$, i.e. $K(x, y) = 0$ whenever $x \in \partial\Omega$ or $y \in \partial\Omega$.

Further exposition of the theory of RKHS can be found, e.g. in [4], [69].
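The reproducing relation $\langle u, K(\cdot, x) a \rangle = a \cdot u(x)$ can be made concrete for vector fields that are finite combinations of kernel functions. The sketch below assumes a scalar Gaussian kernel times the identity matrix on $\Omega = \mathbb{R}^2$; the kernel, the sample points and the helper names are illustrative choices, not taken from the text.

```python
import numpy as np

def k(x, y, scale=1.0):
    """Scalar Gaussian kernel; the matrix-valued kernel is K(x, y) = k(x, y) * Id."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * scale ** 2))

def evaluate(centers, coeffs, x):
    """Evaluate u(x) for the vector field u = sum_i K(., x_i) a_i."""
    return sum(k(x, xi) * ai for xi, ai in zip(centers, coeffs))

def inner(centers_u, coeffs_u, centers_v, coeffs_v):
    """RKHS inner product of two finite kernel expansions: sum_{i,j} a_i . K(x_i, y_j) b_j."""
    return sum(k(xi, yj) * float(ai @ bj)
               for xi, ai in zip(centers_u, coeffs_u)
               for yj, bj in zip(centers_v, coeffs_v))

rng = np.random.default_rng(0)
centers = rng.normal(size=(5, 2))   # points x_i in R^2
coeffs = rng.normal(size=(5, 2))    # vectors a_i in R^2

x = np.array([0.3, -0.2])           # evaluation point
a = np.array([1.0, 2.0])            # evaluation direction

lhs = inner(centers, coeffs, x[None, :], a[None, :])    # <u, K(., x) a>
rhs = float(a @ evaluate(centers, coeffs, x))           # a . u(x)
norm_squared = inner(centers, coeffs, centers, coeffs)  # |u|_H^2
```

For such finite expansions the norm $|u|_{\mathcal{H}}^2$ is the quadratic form of the kernel Gram matrix, which is the quantity that reappears in the landmark matching energy.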

Example. The Sobolev embedding theorem (see e.g. [2, Chapter 6]) states that for $\Omega \subseteq \mathbb{R}^d$ there is an embedding
\[ H^{k+m}(\Omega) \hookrightarrow C^k(\Omega) , \qquad m > \frac{d}{2} \]
of the Sobolev space $H^{k+m}(\Omega)$ into the space of $k$-times continuously differentiable functions $C^k(\Omega)$. Therefore for $m$ big enough, $H^{k+m}(\Omega)$ is an admissible space. The corresponding kernel is the Green's function of the operator $L = \mathrm{Id} + \sum_{j=1}^{k+m} (-1)^j \Delta^j$.

We fix a RKHS $\mathcal{H}$ with kernel $K$ and let $u \in L^2([0, 1], \mathcal{H})$ be a time-dependent vector field. We consider the differential equation
\[ \partial_t \varphi_t = u_t \circ \varphi_t , \qquad \varphi_0(x) = x . \tag{2.15} \]

Results from [92] tell us that this equation has a solution
\[ \varphi \in C^1([0, 1] \times \Omega, \Omega) , \]
defined for all $t \in [0, 1]$ and for each $t$ the map $\varphi_t : \Omega \to \Omega$ is a diffeomorphism of $\Omega$. For our matching purposes we will use the group $G_{\mathcal{H}}$ consisting of all diffeomorphisms obtained in such way.
\[ G_{\mathcal{H}} = \left\{ \varphi_1 : \varphi_t \text{ is a solution of (2.15) for some } u \in L^2([0, 1], \mathcal{H}) \right\} \tag{2.16} \]

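Theorem 2.6 (from [92]). $G_{\mathcal{H}}$ is a group.

Proof. Let $u_t, v_t \in L^2([0, 1], \mathcal{H})$ be two vector fields and $\varphi_t, \psi_t \in G_{\mathcal{H}}$ their flows. To show that $\varphi_1^{-1} \in G_{\mathcal{H}}$ consider the vector field $\tilde{u}_t := -u_{1-t}$ and denote by $\tilde{\varphi}_t$ its flow. Then, since $\tilde{\varphi}_t \circ \varphi_1$ and $\varphi_{1-t}$ are both integral curves of $\tilde{u}_t$, i.e.
\[ \partial_t (\tilde{\varphi}_t \circ \varphi_1) = \tilde{u}_t \circ (\tilde{\varphi}_t \circ \varphi_1) , \qquad \partial_t \varphi_{1-t} = -u_{1-t} \circ \varphi_{1-t} , \]
we have $\tilde{\varphi}_t \circ \varphi_1 = \varphi_{1-t}$ and evaluating at $t = 1$ gives $\tilde{\varphi}_1 = \varphi_1^{-1}$. Hence $\varphi_1^{-1} \in G_{\mathcal{H}}$.

To prove that $\psi_1 \circ \varphi_1 \in G_{\mathcal{H}}$ we define the vector field
\[ (u \star v)_t := \begin{cases} u_t , & \text{if } t \le 1 \\ v_{t-1} , & \text{if } 1 < t \le 2 \end{cases} . \]

If $\eta_t$ is the flow of $(u \star v)$, we see that for $t \le 1$ we have $\eta_t = \varphi_t$ while for $1 < t \le 2$ the flow is given by $\eta_t = \psi_{t-1} \circ \varphi_1$. Thus $\eta_2 = \psi_1 \circ \varphi_1$. Now we can rescale the vector field to fit into the time interval $[0, 1]$ and have thus shown that $\psi_1 \circ \varphi_1 \in G_{\mathcal{H}}$, which completes the proof.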
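The inversion step of this proof is easy to observe numerically. The following sketch assumes a one-dimensional domain, an invented smooth velocity field `u` and an RK4 integrator; it checks only that the flow of $-u_{1-t}$ undoes $\varphi_1$, not the composition argument.

```python
import numpy as np

def u(t, x):
    """Hypothetical smooth, time-dependent velocity field on the real line."""
    return np.sin(x) * np.cos(3.0 * t)

def u_reversed(t, x):
    """The time-reversed field from the proof: -u_{1-t}."""
    return -u(1.0 - t, x)

def flow(field, x0, steps=500):
    """RK4 approximation of phi_1(x0), where d/dt phi_t = field(t, phi_t) and phi_0 = id."""
    h = 1.0 / steps
    x = np.array(x0, dtype=float)
    for i in range(steps):
        t = i * h
        k1 = field(t, x)
        k2 = field(t + h / 2, x + h / 2 * k1)
        k3 = field(t + h / 2, x + h / 2 * k2)
        k4 = field(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

points = np.linspace(-2.0, 2.0, 9)
phi_1 = flow(u, points)                 # phi_1(x)
round_trip = flow(u_reversed, phi_1)    # flow of the reversed field applied to phi_1(x)
inverse_error = np.abs(round_trip - points).max()
```

In one dimension the computed `phi_1` should also preserve the ordering of the sample points, as expected of an orientation-preserving diffeomorphism.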

2.2.2 Matching Problems

Consider a general matching problem that seeks to minimize an energy of the form
\[ E(u) = \frac{1}{2} \int_0^1 |u_t|_{\mathcal{H}}^2\, dt + \frac{1}{2\sigma^2}\, U(\varphi_1) , \]
where $U : G_{\mathcal{H}} \to \mathbb{R}$ is a functional containing the information about the data structure. Examples of such functionals are

• $U(\varphi) = \sum_{i=1}^n |\varphi(x^i) - y^i|^2$ for landmark matching

• $U(\varphi) = \| I_0 \circ \varphi^{-1} - I_1 \|_{L^2}^2$ for image-matching

The energy is to be minimized over the space $u \in L^2([0, 1], \mathcal{H})$ of time dependent vector fields. The existence of a minimizer is shown in the following theorem.

Theorem 2.7 (from [92]). Let $U : G_{\mathcal{H}} \to \mathbb{R}$ be bounded from below and have the following property

If $(\varphi^n)_{n \in \mathbb{N}}$ is a sequence in $G_{\mathcal{H}}$ such that $\varphi^n \to \varphi$ and $(\varphi^n)^{-1} \to \varphi^{-1}$ uniformly on compact sets and $|D\varphi^n|_\infty$ is bounded then $U(\varphi^n) \to U(\varphi)$.

Then there exists a minimizer $\tilde{u} \in L^2([0, 1], \mathcal{H})$ such that
\[ E(\tilde{u}) = \inf_u E(u) . \]

Proof. Let $(u^n)_{n \in \mathbb{N}}$ be a minimizing sequence, i.e.
\[ \lim_{n \to \infty} E(u^n) = \inf_u E(u) . \]

Then since $U(\varphi)$ is bounded from below, $(u^n)_{n \in \mathbb{N}}$ must be a bounded sequence in $L^2([0, 1], \mathcal{H})$. Since bounded sets in Hilbert spaces are weakly compact, we can extract a subsequence, again denoted by $(u^n)_{n \in \mathbb{N}}$, that converges weakly to some $\tilde{u}$. From
\[ \langle u^n, \tilde{u} \rangle \le |u^n|_{L^2}\, |\tilde{u}|_{L^2} \]
we see by passing to the lim inf that $|\tilde{u}|_{L^2} \le \liminf_{n \to \infty} |u^n|_{L^2}$. Concerning $U(\varphi^n_1)$ it is shown in [92] that weak convergence $u^n \to \tilde{u}$ of the vector fields implies the uniform convergence of $\varphi^n_1 \to \tilde{\varphi}_1$ on compact sets. Since $(\varphi^n_t)^{-1}$ is the flow of the vector field $\bar{u}^n_t := -u^n_{1-t}$ we also have the uniform convergence of $(\varphi^n_1)^{-1} \to \tilde{\varphi}_1^{-1}$ on compact sets. A generalized version of Grönwall's lemma (see [92]) gives the estimate
\[ |D\varphi^n|_\infty \le C'\, e^{C'' \int_0^1 |u^n_t|_{1,\infty}\, dt} \]
and since $(u^n)_{n \in \mathbb{N}}$ is $L^2$-bounded, we see from
\[ \int_0^1 |u^n_t|_{1,\infty}\, dt \le C \int_0^1 |u^n_t|_{\mathcal{H}}\, dt = C\, |u^n|_{L^1} \le C\, |u^n|_{L^2} \]
that $|D\varphi^n|_\infty$ is bounded as well. Hence by assumption $U(\varphi^n_1) \to U(\tilde{\varphi}_1)$. Putting all pieces together we get
\begin{align*}
E(\tilde{u}) &= \frac{1}{2} |\tilde{u}|_{L^2}^2 + \frac{1}{2\sigma^2}\, U(\tilde{\varphi}_1) \\
&\le \liminf_{n \to \infty} \frac{1}{2} |u^n|_{L^2}^2 + \lim_{n \to \infty} \frac{1}{2\sigma^2}\, U(\varphi^n_1) = \lim_{n \to \infty} E(u^n) \\
&\le \inf_{u \in L^2} E(u) .
\end{align*}

Hence $\tilde{u}$ is a minimizer for $E(u)$.

In order to apply this theorem we need to check in our examples that the functional $U(\varphi)$ satisfies the required property. For landmark matching even point-wise convergence would be sufficient for the convergence of $U(\varphi^n)$ towards $U(\varphi)$. For image matching the required convergence is proven below.

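Lemma 2.8. If $I_0, I_1 \in L^2(\Omega)$, then $U(\varphi) = \| I_0 \circ \varphi^{-1} - I_1 \|^2$ satisfies the property in Theorem 2.7.

Proof. Let $(\varphi^n)_{n \in \mathbb{N}}$ be a sequence in $G_{\mathcal{H}}$ such that $(\varphi^n)^{-1} \to \varphi^{-1}$ uniformly on compact sets and $|D\varphi^n|_\infty$ is bounded. First note that
\[ \left| \, \| I_0 \circ (\varphi^n)^{-1} - I_1 \|_{L^2} - \| I_0 \circ \varphi^{-1} - I_1 \|_{L^2} \right| \le \| I_0 \circ (\varphi^n)^{-1} - I_0 \circ \varphi^{-1} \|_{L^2} . \]

Next approximate $I_0$ with a smooth function $f \in C^\infty_c(\Omega)$ with compact support, so that $\| I_0 - f \|_{L^2} < \varepsilon$. Then
\begin{align*}
\| I_0 \circ (\varphi^n)^{-1} - I_0 \circ \varphi^{-1} \|_{L^2}
&\le \| I_0 \circ (\varphi^n)^{-1} - f \circ (\varphi^n)^{-1} \|_{L^2} + \| f \circ (\varphi^n)^{-1} - f \circ \varphi^{-1} \|_{L^2} \\
&\quad + \| f \circ \varphi^{-1} - I_0 \circ \varphi^{-1} \|_{L^2} \\
&\le \left\| \sqrt{|\det D\varphi^n|}\, (I_0 - f) \right\|_{L^2} + |f'|_\infty \left( \int_\Omega |(\varphi^n)^{-1}(x) - \varphi^{-1}(x)|^2\, dx \right)^{1/2} \\
&\quad + \left\| \sqrt{|\det D\varphi|}\, (I_0 - f) \right\|_{L^2} \\
&\le C_1 \varepsilon + |f'|_\infty \left( \int_{\mathrm{supp}(f)} |(\varphi^n)^{-1}(x) - \varphi^{-1}(x)|^2\, dx \right)^{1/2} + C_2 \varepsilon .
\end{align*}

Since $(\varphi^n)^{-1} \to \varphi^{-1}$ converges uniformly on compact sets, we see that $|U(\varphi^n) - U(\varphi)| < \varepsilon$, provided $n$ is large enough. This concludes the proof.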

2.2.3 Landmark Matching

The simplest kind of objects used in computational anatomy are landmarks. Landmarks are labeled collections $I = (x^1, \dots, x^n)$ of points $x^i \in \mathbb{R}^d$. Given two sets $(x^1, \dots, x^n)$, $(y^1, \dots, y^n)$ of landmarks, the landmark matching problem consists of minimizing the energy
\[ E(u) = \frac{1}{2} \int_0^1 |u_t|_{\mathcal{H}}^2\, dt + \frac{1}{2\sigma^2} \sum_{i=1}^n |\varphi_1(x^i) - y^i|^2 . \tag{2.17} \]

Our space of deformable objects is $V = (\mathbb{R}^d)^n$ with the usual inner product
\[ \langle I, J \rangle = \sum_{i=1}^n x^i \cdot y^i , \]
for $I = (x^1, \dots, x^n)$, $J = (y^1, \dots, y^n)$. The action of the diffeomorphism group $G_{\mathcal{H}}$ is by push-forward
\[ \varphi.I := \left( \varphi(x^1), \dots, \varphi(x^n) \right) . \]

The corresponding cotangent-lift action on the dual space $(\mathbb{R}^{dn})^* \cong \mathbb{R}^{dn}$ is given by
\[ \varphi.J^\flat = \left( D\varphi(x^1)^{-1,T} y^1, \dots, D\varphi(x^n)^{-1,T} y^n \right) , \]
and the calculation
\begin{align*}
\left\langle I \diamond J^\flat, u \right\rangle_{\mathcal{H}^* \times \mathcal{H}}
&= \left\langle J^\flat, uI \right\rangle
= \left\langle (y^1, \dots, y^n), (u(x^1), \dots, u(x^n)) \right\rangle \\
&= \sum_{i=1}^n y^i \cdot u(x^i)
= \left\langle \sum_{i=1}^n y^i \delta_{x^i}, u \right\rangle_{\mathcal{H}^* \times \mathcal{H}}
\end{align*}
yields the diamond operator
\[ (x^1, \dots, x^n) \diamond (y^1, \dots, y^n)^\flat = \sum_{i=1}^n y^i \delta_{x^i} , \]
where $\delta_x$ is the delta-distribution defined by $\int f(y)\, \delta_x(y)\, dy = f(x)$ for a test function $f(y)$. Note that, because $\mathcal{H}$ is a RKHS, the delta-distribution $y^i \delta_{x^i}$ is an element of the dual space $\mathcal{H}^*$: it is the evaluation functional $y^i \delta_{x^i} = \mathrm{ev}^{y^i}_{x^i}$ at the point $x^i \in \Omega$ in the direction $y^i$.

Applying Theorem 2.7 we see that (2.17) attains a minimum. The condition (2.13) that a minimizing vector field $u_t$ must satisfy is
\[ L u_t = -\frac{1}{\sigma^2} \sum_{i=1}^n D\varphi_{t,1}(\varphi_1(x^i))^{-1,T} \left( \varphi_1(x^i) - y^i \right) \delta_{\varphi_t(x^i)} . \]

Consequently, the momentum $L u_t$ is concentrated only on the points $\varphi_t(x^i)$. By using the Green's function $K(x, y)$ of the differential operator $L$, the minimizing condition above can be rewritten for the velocity $u_t$ as
\[ u_t(x) = -\frac{1}{\sigma^2} \sum_{i=1}^n K(x, \varphi_t(x^i)) \left( D\varphi_{t,1}(\varphi_1(x^i))^{-1,T} \left( \varphi_1(x^i) - y^i \right) \right) . \]
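Because the momentum is concentrated on the moving landmarks, the velocity can be parametrized as $u_t = \sum_i K(\cdot, q_i(t))\, a_i(t)$ with $q_i(t) = \varphi_t(x^i)$, which makes the energy (2.17) computable from finitely many numbers. The sketch below evaluates a time-discretized version of (2.17) under assumptions that are not from the text: a scalar Gaussian kernel times the identity, forward Euler time stepping, and made-up landmark data. It evaluates the energy only and does not solve the minimization problem.

```python
import numpy as np

def gram(q, scale=1.0):
    """Matrix of scalar Gaussian kernel values k(q_i, q_j); K(x, y) = k(x, y) * Id is assumed."""
    diff = q[:, None, :] - q[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * scale ** 2))

def landmark_energy(momenta, x, y, sigma=0.5, scale=1.0):
    """Evaluate a time-discretized version of (2.17).

    momenta has shape (T, n, d); at each step the velocity is u_t = sum_i K(., q_i) a_i(t),
    where q_i is the current position of landmark i. Returns (energy, final positions).
    """
    dt = 1.0 / momenta.shape[0]
    q = np.array(x, dtype=float)
    kinetic = 0.0
    for a in momenta:
        velocity = gram(q, scale) @ a                # u_t(q_i) = sum_j k(q_i, q_j) a_j
        kinetic += 0.5 * np.sum(a * velocity) * dt   # (1/2)|u_t|_H^2 = (1/2) sum_ij a_i . K(q_i, q_j) a_j
        q = q + dt * velocity                        # forward Euler step of d/dt q_i = u_t(q_i)
    mismatch = np.sum((q - y) ** 2) / (2.0 * sigma ** 2)
    return kinetic + mismatch, q

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # template landmarks x^i
y = np.array([[0.2, 0.1], [1.1, 0.3], [0.1, 1.2]])   # target landmarks y^i

# Zero momenta: the landmarks do not move and only the mismatch term remains.
energy_zero, q_zero = landmark_energy(np.zeros((10, 3, 2)), x, y)

# Constant momenta pointing from template to target: a crude guess, not a minimizer.
guess = np.tile(y - x, (10, 1, 1))
energy_guess, q_guess = landmark_energy(guess, x, y)
```

A minimizer could then be sought by any optimizer over `momenta`; which optimizer to use is outside the scope of this sketch.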

2.2.4 Image Matching

The large deformation diffeomorphic matching framework used in [9] and [10] seeks to match two images $I_0, I_1$ by minimizing
\[ E(u) = \frac{1}{2} \int_0^1 |u_t|_{\mathcal{H}}^2\, dt + \frac{1}{2\sigma^2} \left\| I_0 \circ \varphi_1^{-1} - I_1 \right\|_{L^2}^2 . \tag{2.18} \]

This example has already been discussed in Section 2.1.1. We review it here and apply the abstract formalism developed above. In this example the space $V = \mathcal{F}(\Omega)$ of deformable objects consists of real valued functions on $\Omega$. We endow this space with the $L^2$-inner product. The group of deformations is again the group of diffeomorphisms $G_{\mathcal{H}}$, generated by vector fields in $\mathcal{H}$. To avoid analytical difficulties we restrict ourselves to smooth, i.e. $C^\infty$, images with compact support and require the vector fields to be smooth as well.

The action of $G_{\mathcal{H}}$ on $V$ is by push-forward
\[ \varphi.I = \varphi_* I = I \circ \varphi^{-1} \]
for $\varphi \in G_{\mathcal{H}}$ and $I \in V$. As we have seen, the dual action on the smooth dual reads
\[ \varphi.\pi = |\det D\varphi^{-1}|\; \pi \circ \varphi^{-1} , \]
where $|\det D\varphi|$ denotes the absolute value of the determinant of $D\varphi$. The diamond operator in this example is
\[ I \diamond \pi = -\pi \nabla I . \]

It is proven in Theorem 2.7 and Lemma 2.8 that the energy (2.18) does admit a minimizing vector field. In order to apply Theorem 2.2 one needs to assume more regularity of the image: $I_0 \in H^1(\Omega)$ would be enough. With this additional regularity the necessary conditions are
\[ L u_t = \frac{1}{\sigma^2} \left| \det D\varphi_{t,1}^{-1} \right| \left( J^0_t - J^1_t \right) \nabla J^0_t , \tag{2.19} \]
where $J^0_t = I_0 \circ \varphi_{t,0}^{-1}$, $J^1_t = I_1 \circ \varphi_{t,1}^{-1}$, and $\varphi_{t,s}$ is the flow of the vector field $u_t$
\[ \partial_t \varphi_{t,s} = u_t \circ \varphi_{t,s} , \qquad \varphi_{s,s}(x) = x . \]
See [86] for an extension of the LDM framework to functions of bounded variation. Equation (2.19) was used in [10] in devising a gradient descent scheme to computationally find the minimizing vector field.

2.2.5 Vector Field Matching

Diffusion tensor magnetic resonance imaging measures the anisotropic diffusion of water molecules in biological tissues, thus enabling us to quantify the structure of the tissue. The measurement at each voxel is a second order symmetric tensor. It was shown in [66] and [72] that the alignment of the principal eigenvector of this tensor tends to coincide with the fiber orientation in brain and heart. The fiber orientation can be described by a vector field $I : \Omega \to \mathbb{R}^d$ and matching two vector fields can be formulated as minimizing the energy
\[ E(u) = \frac{1}{2} \int_0^1 |u_t|_{\mathcal{H}}^2\, dt + \frac{1}{2\sigma^2} \left\| D\varphi_1 \circ I_0 \circ \varphi_1^{-1} - I_1 \right\|_{L^2}^2 . \tag{2.20} \]

In this example the space of deformable objects $V = \mathfrak{X}(\Omega, \mathbb{R}^d)$ consists of vector fields in $\Omega$, the deformation group is the group of diffeomorphisms $G_{\mathcal{H}}$, generated by vector fields in $\mathcal{H}$, and $G_{\mathcal{H}}$ acts on $V$ by push forward
\[ \varphi.I = \varphi_* I = D\varphi \circ I \circ \varphi^{-1} . \]

Again to avoid analytical issues, we assume that the vector fields are $C^\infty$ with compact support and that the space $\mathcal{H}$ also consists of smooth vector fields. The smooth dual of $V$ with respect to the $L^2$-inner product can be identified with the space of one-forms $V^* = \Omega^1(\Omega)$. Concerning the existence of the minimizing vector field we have the following theorem.

Theorem 2.9. If $\mathcal{H}$ is an admissible vector space and additionally embedded in $C^2_0(\Omega, \mathbb{R}^d)$, then the energy (2.20) admits a minimizing vector field.

Sketch of proof. It is shown in [92, Chapter 12] that given an embedding $\mathcal{H} \hookrightarrow C^1_0(\Omega, \mathbb{R}^d)$ weak convergence $u^n \to u$ of time-dependent vector fields $u^n \in L^2([0, 1], \mathcal{H})$ implies uniform convergence of the flows $\varphi^n_1 \to \varphi_1$ on compact sets. Since the derivative $D\varphi_t$ of the flow $\varphi_t$ satisfies the ODE
\[ \partial_t D\varphi_t(x) = Du_t(\varphi_t(x)).D\varphi_t(x) , \qquad D\varphi_0(x) = \mathrm{Id} , \]
the additional assumption $\mathcal{H} \hookrightarrow C^2_0(\Omega, \mathbb{R}^d)$ allows us to conclude that we also have the uniform convergence of the derivatives $D\varphi^n_1 \to D\varphi_1$ on compact sets. In the same way as in Lemma 2.8 we can show that the matching functional $U(\varphi) = \| D\varphi \circ I_0 \circ \varphi^{-1} - I_1 \|_{L^2}^2$ has the convergence property

$U(\varphi^n) \to U(\varphi)$, whenever we have the convergence of $\varphi^n \to \varphi$ and $(\varphi^n)^{-1} \to \varphi^{-1}$ uniformly on compact sets and also the convergence of the derivatives $D\varphi^n \to D\varphi$ uniformly on compact sets.

Now, proceeding as in Theorem 2.7 shows the existence of a minimum.

The infinitesimal action of $u \in \mathcal{H}$ on $I \in V$ is given by the negative of the Jacobi-Lie bracket whose components are
\[ (uI)^i = I^j \frac{\partial u^i}{\partial x^j} - u^j \frac{\partial I^i}{\partial x^j} = -[u, I]^i . \]

The objects dual to vector fields with respect to the $L^2$-pairing are one-forms $\pi \in V^* = \Omega^1(\Omega)$. The diamond map is given by
\[ I \diamond \pi = -\pounds_I \pi - \mathrm{div}(I)\, \pi , \]
where $\pounds_I \pi$ denotes the Lie derivative of the one-form $\pi$ along the vector field $I$. In coordinates, writing $I = I^i \frac{\partial}{\partial x^i}$ and $\pi = \pi_i\, dx^i$, we can write the diamond map in the form
\[ I \diamond \pi = -\left( \pi_j \frac{\partial I^j}{\partial x^i} + I^j \frac{\partial \pi_i}{\partial x^j} + \pi_i \frac{\partial I^j}{\partial x^j} \right) dx^i . \]

Using these formulas, we can write the necessary condition for a vector field $u_t$ to minimize (2.20) as
\[ L u_t = \left( \pounds_{(\varphi_t)_* I_0} + \mathrm{div}\left( (\varphi_t)_* I_0 \right) \right) \left| \det D\varphi_{t,1}^{-1} \right| (\varphi_{t,1})_* \pi , \]
where $\pi = \frac{1}{\sigma^2} \left( (\varphi_1)_* I_0 - I_1 \right)^\flat \in V^*$. As in the case of images, for these formulas to make sense additional regularity of the vector field $I_0$ has to be assumed. Note that because the $\flat$-map does not commute with pull backs and push forwards, i.e.
\[ \varphi^* (\varphi_* I)^\flat \ne I^\flat , \]
this formula cannot be significantly simplified.

2.3 Multiscale Registration

[Figure 2.1, panel labels: (a) Source I0, Target I1; (b) K1, K10, MK5.]

Figure 2.1: Influence of the smoothing kernel when registering two images containing feature differences at several scales simultaneously in the LDM framework (Images courtesy of Risser, [68]). (a) Source and target images I0 and I1. (b) Registration of the images I0 and I1 using different kernels: (K1 and K10) Gaussian kernels of standard deviations σ = 1 and σ = 10 pixels; (MK5) sum of 5 Gaussian kernels linearly sampled between 10 and 1 pixels. Diffeomorphic transformations of I0 at t = 1 (final deformation) are shown on the top and corresponding deformed homogeneous grids (step = 1 pixel) are on the bottom.

When performing matching within the LDM framework, the choice of a kernel and in particular the length scale for the kernel plays an important role in the practical results. Images usually exhibit differences on several length scales and hence the choice of one single length scale for the matching may not be adequate. In Figure 2.1 the task is to match a square to a translated square with a small slit added to it. The translation represents the large scale motion, while the slit is a small-scale difference between the images. If we choose a large kernel in Figure 2.1, we don’t see the small scale differences between the source and target images at all. By choosing a small kernel, the resulting diffeomorphism exhibits an unnatural behavior. Instead of translating the square, the square is shrunk on one side and expanded on the other. Neither behavior is desirable. In this section we want to propose a way to address this problem.

2.3.1 Semidirect Products

Given a space $V$ of images, we want to explore the idea of matching them with $n$ groups $G_1, \dots, G_n$ of deformations simultaneously. Let us illustrate the idea first for $n = 2$. We imagine $G_2$ to contain “large-scale” deformations and $G_1$ to contain “small-scale” deformations. Since a deformation that captures small structures is also capable of capturing large ones, we will assume that $G_2$ is a subgroup of $G_1$, $G_1 \ge G_2$. We assume that $G_1$ acts on the space of images from the left and this naturally induces an action of $G_2$ on $V$. To capture both scales simultaneously we define a new action that deforms the image first with a large-scale deformation and then with a small-scale one.
\[ ((g_1, g_2), I) \mapsto g_1.(g_2.I) \in V . \]
For this to be an action of $G_1 \times G_2$ on $V$ one has to introduce a group multiplication on $G_1 \times G_2$ such that associativity is preserved. The calculation
\begin{align*}
((g_1, g_2) \cdot (h_1, h_2)).I &= (g_1, g_2).((h_1, h_2).I) \\
&= (g_1, g_2).(h_1.h_2.I) \\
&= g_1.g_2.h_1.h_2.I \\
&= (g_1 g_2 h_1 g_2^{-1}, g_2 h_2).I
\end{align*}
shows that using the semidirect product group multiplication on $G_1 \times G_2$
\[ (g_1, g_2) \cdot (h_1, h_2) := (g_1 g_2 h_1 g_2^{-1}, g_2 h_2) \]
gives us an action of $G_1 \rtimes G_2$ on $V$. In the general case of $n$ groups the group structure is given in the following lemma.
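The two-group multiplication rule and the induced action can be tested on matrices. The sketch below is only an illustration under assumed choices: $G_1 = GL(3)$, $G_2$ the subgroup of rotations about the $z$-axis, and a vector in $\mathbb{R}^3$ standing in for an image.

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis; these matrices form the assumed subgroup G2 of G1 = GL(3)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def multiply(g, h):
    """Semidirect product rule (g1, g2) . (h1, h2) = (g1 g2 h1 g2^{-1}, g2 h2)."""
    g1, g2 = g
    h1, h2 = h
    return (g1 @ g2 @ h1 @ np.linalg.inv(g2), g2 @ h2)

def act(g, image):
    """Action (g1, g2).I = g1.(g2.I): apply the large-scale element g2 first, then g1."""
    g1, g2 = g
    return g1 @ (g2 @ image)

rng = np.random.default_rng(1)

def random_element():
    g1 = np.eye(3) + 0.1 * rng.normal(size=(3, 3))   # invertible matrix close to the identity
    g2 = rot_z(rng.uniform(0.0, 2.0 * np.pi))
    return (g1, g2)

g, h, k = random_element(), random_element(), random_element()
image = rng.normal(size=3)   # a vector in V = R^3 stands in for an image

# ((g . h)).I should equal g.(h.I)
action_error = np.abs(act(multiply(g, h), image) - act(g, act(h, image))).max()

# (g . h) . k should equal g . (h . k)
lhs = multiply(multiply(g, h), k)
rhs = multiply(g, multiply(h, k))
assoc_error = max(np.abs(lhs[0] - rhs[0]).max(), np.abs(lhs[1] - rhs[1]).max())
```

Replacing the rule in `multiply` by the componentwise product `(g1 @ h1, g2 @ h2)` would in general break the action property, which is the reason for the conjugation by $g_2$.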

Lemma 2.10. Let $G_1 \ge G_2 \ge \dots \ge G_n$ be a chain of Lie groups. One can define the $n$-fold semidirect product multiplication on the set $G_1 \times \dots \times G_n$ via
\[ (g_1, \dots, g_n) \cdot (h_1, \dots, h_n) = \left( g_1\, c_{g_2 \cdots g_n} h_1,\; g_2\, c_{g_3 \cdots g_n} h_2,\; \dots,\; g_n h_n \right) \tag{2.21} \]
with $c_g h = g h g^{-1}$ denoting conjugation. Then given the right-trivialized tangent vector $v(t) = (v_1(t), \dots, v_n(t))$ of the curve $g(t) = (g_1(t), \dots, g_n(t))$, the curve can be reconstructed via the ODE
\[ \partial_t g_k(t) = \left( v_k(t) + \left( \mathrm{Id} - \operatorname{Ad}_{g_k(t)} \right) \sum_{i=k+1}^n v_i(t) \right) g_k(t) , \tag{2.22} \]
if $k < n$ and $\partial_t g_n(t) = v_n(t)\, g_n(t)$. We shall denote this semidirect product by
\[ G_1 \rtimes \dots \rtimes G_n \tag{2.23} \]
to emphasise that each sub-product $G_1 \rtimes \dots \rtimes G_k$ is a subgroup of the whole product.

If each group $G_k$ acts on a vector space $V$ from the left, then
\[ (g_1, \dots, g_n).I := g_1.g_2.\cdots.g_n.I \tag{2.24} \]
defines an action of $G_1 \rtimes \dots \rtimes G_n$ on $V$.

Proof. Verifying the axioms for the group multiplication and the associativity of the action on $V$ is a straightforward, if slightly longer, calculation. The inverse is given by
\[ (g_1, \dots, g_n)^{-1} = \left( c_{(g_2 \cdots g_n)^{-1}}\, g_1^{-1},\; \dots,\; c_{g_n^{-1}}\, g_{n-1}^{-1},\; g_n^{-1} \right) . \]

The right hand side of equation (2.22) can be obtained by differentiating the group multiplication at the identity, i.e. computing $\partial_t (h(t) \cdot g)|_{t=0}$ with $g$ fixed, $h(0) = \mathrm{Id}$ and $\partial_t h(t)|_{t=0} = v$. Step-by-step the computation is as follows.
\begin{align*}
\partial_t (h(t) \cdot g)_k \big|_{t=0} &= \partial_t\, h_k(t)\, c_{h_{k+1}(t) \cdots h_n(t)}\, g_k \big|_{t=0} \tag{2.25} \\
&= v_k g_k + \sum_{i=k+1}^n \partial_t\, c_{h_i(t)}\, g_k \big|_{t=0} \tag{2.26} \\
&= v_k g_k + \sum_{i=k+1}^n \partial_t\, h_i(t)\, g_k\, h_i(t)^{-1} \big|_{t=0} \tag{2.27} \\
&= v_k g_k + \sum_{i=k+1}^n \left( v_i g_k - g_k v_i \right) \tag{2.28} \\
&= v_k g_k + \sum_{i=k+1}^n \left( \mathrm{Id} - \operatorname{Ad}_{g_k} \right) v_i\, g_k \tag{2.29}
\end{align*}

This concludes the proof.

The matching problem using a semidirect product group is to minimize the energy

\[ E(u_1, \dots, u_n) = \int_0^1 \ell(u_1, \dots, u_n)\, dt + \frac{1}{2\sigma^2} \left\| g_1(1).\cdots.g_n(1).I_0 - I_1 \right\|^2 , \tag{2.31} \]
given the relationship (2.22) between the velocities $u_1, \dots, u_n$ and the deformations $g_1, \dots, g_n$.

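Theorem 2.11. Given a curve $t \mapsto (u_1(t), \dots, u_n(t)) \in \mathfrak{g}_1 \rtimes \dots \rtimes \mathfrak{g}_n$ the stationarity condition $DE(u_1(t), \dots, u_n(t)) = 0$ for the action (2.31) is equivalent to
\[ \frac{\delta \ell}{\delta u_1}(t) = -g(t) I_0 \diamond_1 g(t, 1) \pi , \qquad \frac{\delta \ell}{\delta u_k}(t) = \frac{\delta \ell}{\delta u_1}(t) \Big|_{\mathfrak{g}_k} , \]
with $\pi = \frac{1}{\sigma^2} (g(1) I_0 - I_1)^\flat$ and $g(t, s) \in G_1$ being the flow of the vector field $u_1 + \dots + u_n \in \mathfrak{g}_1$,
\[ \partial_t g(t, s) = (u_1(t) + \dots + u_n(t))\, g(t, s) , \qquad g(s, s) = e . \]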

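Proof. The momentum map for the action of $G_1 \rtimes \dots \rtimes G_n$ on $V$ is given by
\[ I \diamond_{SDP} \pi = (I \diamond_1 \pi, \dots, I \diamond_n \pi) , \]
where $I \diamond_k \pi$ denotes the cotangent-lift momentum map corresponding to the action of $G_k$ on $V$, as the computation
\begin{align*}
\langle I \diamond_{SDP} \pi, (u_1, \dots, u_n) \rangle &= \langle \pi, (u_1, \dots, u_n).I \rangle \\
&= \langle \pi, u_1.I + \dots + u_n.I \rangle \\
&= \langle I \diamond_1 \pi, u_1 \rangle + \dots + \langle I \diamond_n \pi, u_n \rangle
\end{align*}
shows. The $G_k$-action for $k \ge 2$ is a restriction of the $G_1$-action to a smaller space, therefore the $\diamond_k$-momentum map is also the restriction of the $\diamond_1$-momentum map.
\[ I \diamond_k \pi = (I \diamond_1 \pi) \big|_{\mathfrak{g}_k} . \]

Applying Theorem 2.2 we get the following equations for stationary vector fields $(u_1, \dots, u_n)$
\[ \frac{\delta \ell}{\delta u_k}(t) = -\tilde{g}(t) I_0 \diamond_k \tilde{g}(t, 1) \pi , \]
with $\tilde{g}(t) = (g_1(t), \dots, g_n(t))$ denoting an element of $G_1 \rtimes \dots \rtimes G_n$. The previous remarks about momentum maps imply that $\frac{\delta \ell}{\delta u_k}(t) = \frac{\delta \ell}{\delta u_1}(t) \big|_{\mathfrak{g}_k}$ for $k \ge 2$. Defining $g(t) := g_1 \cdot \ldots \cdot g_n \in G_1$ we see from
\[ \tilde{g}(t) I = (g_1(t), \dots, g_n(t)) I = g_1(t).\cdots.g_n(t) I = g(t) I \]
that the actions of $\tilde{g}(t)$ and $g(t)$ on $V$ are the same. Thus we have $\frac{\delta \ell}{\delta u_1}(t) = -g(t) I_0 \diamond_1 g(t, 1) \pi$ and $\pi = \frac{1}{\sigma^2} (g(1) I_0 - I_1)^\flat$. It remains to show that $g(t)$ is indeed the flow of $u_1 + \dots + u_n$. But this follows from
\begin{align*}
\partial_t g(t) &= \sum_{k=1}^n g_1 \cdots g_{k-1}\, (\partial_t g_k)\, g_{k+1} \cdots g_n \\
&= \sum_{k=1}^n g_1 \cdots g_{k-1} \left( \sum_{i=k}^n u_i - \operatorname{Ad}_{g_k} \sum_{i=k+1}^n u_i \right) g_k \cdots g_n \\
&= \sum_{k=1}^n \left[ g_1 \cdots g_{k-1} \left( \sum_{i=k}^n u_i \right) g_k \cdots g_n - g_1 \cdots g_k \left( \sum_{i=k+1}^n u_i \right) g_{k+1} \cdots g_n \right] \\
&= \left( \sum_{i=1}^n u_i \right) g_1 \cdots g_n = \left( \sum_{i=1}^n u_i(t) \right) g(t)
\end{align*}
and hence everything is proven.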

In the next section we will discuss the case, when the groups involved are diffeomorphism groups corresponding to different kernels.

2.3.2 Semidirect Products of Diffeomorphism Groups

To apply the framework of semidirect products, we need a nested sequence of diffeomorphism groups. We can obtain such a sequence by choosing admissible RKHS $\mathcal{H}_1, \dots, \mathcal{H}_n$ of vector fields on $\Omega$ with kernels $K_1, \dots, K_n$, such that $\mathcal{H}_1 \ge \dots \ge \mathcal{H}_n$. Then the corresponding groups will satisfy $\mathrm{Diff}_1(\Omega) \ge \dots \ge \mathrm{Diff}_n(\Omega)$, where $\mathrm{Diff}_k(\Omega)$ is the group defined by $\mathcal{H}_k$ via (2.16). The matching problem can be expressed in the following way:

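Definition 2.12 (LDM with a Semidirect Product). Registering $I_0$ to $I_1$ is done by finding the minimizing $n$-tuple $(u_1(t), \dots, u_n(t))$ of
\[ E(u_1, \dots, u_n) = \frac{1}{2} \sum_{i=1}^n \int_0^1 |u_i(t)|_{K_i}^2\, dt + \frac{1}{2\sigma^2} \left\| \varphi(1).I_0 - I_1 \right\|^2 , \tag{2.32} \]
where $\varphi(t) = \varphi_1(t) \circ \dots \circ \varphi_n(t)$ and $\varphi_k(t)$ is defined via
\begin{align*}
\partial_t \varphi_k(t) &= \left( \sum_{i=k}^n u_i(t) \right) \circ \varphi_k(t) - D\varphi_k(t).\left( \sum_{i=k+1}^n u_i(t) \right) , \quad k < n \tag{2.33a} \\
\partial_t \varphi_n(t) &= u_n(t) \circ \varphi_n(t) \tag{2.33b}
\end{align*}
with initial conditions $\varphi_k(0) = \mathrm{Id}$.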

Before we can ask whether this problem has a minimizer, we have to show that the diffeomorphisms are well-defined, i.e. that the equations (2.33) have a solution.

Theorem 2.13. Let $\mathcal{H}_1 \ge \dots \ge \mathcal{H}_n$ be a nested sequence of RKHS and $u_k \in L^2([0, 1], \mathcal{H}_k)$ be time-dependent vector fields. Then the equations (2.33) with initial conditions $\varphi_k(0) = \mathrm{Id}$ have unique solutions $\varphi_k : [0, 1] \times \Omega \to \Omega$, that are absolutely continuous in $t$ and continuously differentiable in $x$. For almost all times $t \in [0, 1]$ the solutions $\varphi_k(t)$ are diffeomorphisms of $\Omega$.

Proof. In the case $n = 1$, the proof of existence and uniqueness of the flow for vector fields, which are only $L^2$ in time, can be found in [92, Theorem 12.4]. For $n = 2$ the key idea of the proof is to consider the vector field $u := u_1 + u_2 \in L^2([0, 1], \mathcal{H}_1)$. This vector field admits a flow $\varphi(t)$, which is a solution of
\[ \partial_t \varphi(t) = u(t) \circ \varphi(t) , \qquad \varphi(0) = \mathrm{Id} . \]

The existence and uniqueness of $\varphi_2(t)$ is clear from the case $n = 1$. If we compare the equation for $\varphi(t)$ to (assuming that we already have established the existence of $\varphi_1(t)$)
\begin{align*}
\partial_t (\varphi_1(t) \circ \varphi_2(t)) &= (\partial_t \varphi_1(t)) \circ \varphi_2(t) + D\varphi_1(t).(\partial_t \varphi_2(t)) \\
&= (u_1(t) + u_2(t)) \circ (\varphi_1(t) \circ \varphi_2(t)) - (D\varphi_1(t).u_2(t)) \circ \varphi_2(t) \\
&\quad + (D\varphi_1(t).u_2(t)) \circ \varphi_2(t) \\
&= (u_1(t) + u_2(t)) \circ (\varphi_1(t) \circ \varphi_2(t)) ,
\end{align*}
we see that, since solutions of this equation are unique, necessarily
\[ \varphi(t) = \varphi_1(t) \circ \varphi_2(t) . \]

Hence we define $\varphi_1(t, x) := \varphi(t, \varphi_2^{-1}(t, x))$. Before we can claim that this is indeed a solution of (2.33), we need to check that this map is absolutely continuous in $t$. This follows from Lemma 2.14.

For $n > 2$ we proceed by considering the vector field $u_k + \dots + u_n$ and comparing its flow to $\varphi_k \circ \dots \circ \varphi_n$. The same argument can then be applied.

Lemma 2.14. Let $\varphi$ and $\psi$ be the flows of two time-dependent vector fields $u \in L^2([0, 1], \mathcal{H}_1)$ and $v \in L^2([0, 1], \mathcal{H}_2)$ respectively with $\mathcal{H}_i$ admissible RKHS. Then for each $x \in \Omega$, the map $t \mapsto \varphi(t, \psi(t, x))$ is absolutely continuous in $t$.

Proof. Choose $t_i < s_i$ such that $\sum_{i=1}^n |t_i - s_i| < \delta$. In order to estimate the difference
\[ |\varphi(t_i, \psi(t_i, x)) - \varphi(s_i, \psi(s_i, x))| \]
we will need Grönwall's inequality (see [92, p170, (12.5)]), which allows us to estimate
\[ |\varphi(t, x) - \varphi(t, y)| \le |x - y| \exp\left( \int_0^t |u(r)|_{1,\infty}\, dr \right) . \]

Using this we get
\begin{align*}
|\varphi(t_i, \psi(t_i, x)) - \varphi(s_i, \psi(s_i, x))|
&\le |\varphi(t_i, \psi(t_i)) - \varphi(t_i, \psi(s_i))| + |\varphi(t_i, \psi(s_i)) - \varphi(s_i, \psi(s_i))| \\
&\le |\psi(t_i) - \psi(s_i)| \exp\left( \int_0^{t_i} |u(r)|_{1,\infty}\, dr \right) + \left| \int_{t_i}^{s_i} u(r, \varphi(r, \psi(s_i)))\, dr \right| \\
&\le C\, |\psi(t_i) - \psi(s_i)| + \int_{t_i}^{s_i} |u(r)|_{1,\infty}\, dr \\
&\le C\, |\psi(t_i) - \psi(s_i)| + \sqrt{s_i - t_i}\; C'\, \|u\|_{L^2([0,1], \mathcal{H}_1)} .
\end{align*}

And it is not difficult to see that we can achieve
\[ \sum_{i=1}^n |\varphi(t_i, \psi(t_i, x)) - \varphi(s_i, \psi(s_i, x))| < \varepsilon \]
by choosing $\delta$ small enough, since $\psi$ is absolutely continuous.

Concerning the spaces, in which the diffeomorphisms ϕk(t) lie, we have the following result.

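Theorem 2.15. The diffeomorphisms $\varphi_k(t)$ defined in Theorem 2.13 lie in
\[ \varphi_k(t) \in \mathrm{Diff}_k(\Omega) . \]

Proof. Because the spaces of vector fields form a nested sequence $\mathcal{H}_1 \ge \dots \ge \mathcal{H}_n$, we have $u_k + \dots + u_n \in \mathcal{H}_k$. As seen in the proof of Theorem 2.13, the flow of $u_k + \dots + u_n$ is given by $\varphi_k \circ \dots \circ \varphi_n$ and hence $\varphi_k \circ \dots \circ \varphi_n \in \mathrm{Diff}_k(\Omega)$. The statement now follows from
\[ \varphi_k = (\varphi_k \circ \dots \circ \varphi_n) \circ \varphi_n^{-1} \circ \dots \circ \varphi_{k+1}^{-1} \]
and the corresponding nesting $\mathrm{Diff}_1(\Omega) \ge \dots \ge \mathrm{Diff}_n(\Omega)$ of the groups.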

Concerning the existence of a minimizer for the matching problem 2.12 we have the following result.

Theorem 2.16. Given two images $I_0, I_1 \in L^2(\Omega)$, there exist vector fields $u_k \in L^2([0, 1], \mathcal{H}_k)$, such that the minimum of (2.32) is attained.

Proof. This will be a consequence of theorem 2.18, where we will show that this problem is equivalent to a modified matching problem with a single kernel.

It would appear that the procedure of matching with multiple scales, involving a semidirect product of diffeomorphism groups is computationally more expensive, since the minimization has to be performed over n vector fields. However, we will show in the next section, that there is an equivalence between matching with a semidirect product and single scale matching with a modified kernel.

2.3.3 Sums of Kernels

There are a priori different ways to perform multi-scale image matching within the LDM framework. The approach defined in definition 2.12 with a semidirect product group may be the most geometric one, but it would not be the first choice from a practical point of view. One idea, which was proposed in [67] and [68], was to change the kernel in order to include multi-scale aspects into the registration procedure. Instead of assigning a vector field to each scale, the idea is to replace the kernel by a sum $K = \sum_{i=1}^n K_i$ of $n$ kernels $K_i$, each having a length scale $\alpha_i$. The corresponding matching procedure is given below.

Definition 2.17 (LDM with Sum-of-Kernels). Registering $I_0$ to $I_1$ is done by finding the minimizer $u(t)$ of
\[
E(u) = \frac{1}{2} \int_0^1 |u(t)|_K^2 \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_1\|^2 , \tag{2.34}
\]
where $K = \sum_{i=1}^n K_i$ is a sum of $n$ kernels.

Intuitively, we can explain why this approach responds to the existence of multiple scales in the image as follows. Let us take two kernels $K_1, K_2$ and consider a gradient descent algorithm to find the optimal velocity field $u_t$ as described in [10]. The update rule, using (2.19), is
\[
u^{n+1}(t) = u^n(t) - \varepsilon \left( u^n(t) - K * p(t) \right) = (1 - \varepsilon)\, u^n(t) + \varepsilon\, K * p(t) ,
\]
where $\varepsilon > 0$ is a parameter for the gradient descent and $p(t)$ is the momentum
\[
p(t) = \frac{1}{\sigma^2} \left| \det D\varphi(t, 1)^{-1} \right| \left( J^0(t) - J^1(t) \right) \nabla J^0(t) .
\]

The information coming from the images is encoded in $p(t)$, which is an element $p(t) \in L^2(\Omega)$. We obtain the velocity $u(t)$ by convolving $p(t)$ with the kernel $K$. For a kernel which depends on a length scale $\alpha$, like a Gaussian kernel, the parameter $\alpha$ represents the distance across which particles feel a correlation. If the momentum $p$ was a $\delta$-function, then the corresponding vector field would be significantly non-zero on a neighborhood with a size of order $\alpha$.
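The role of the length scale can be made concrete with a small numerical sketch. It is not part of the matching algorithm itself: we assume a one-dimensional grid, scalar Gaussian kernels with the hypothetical length scales 0.2 and 2.0, and a discrete stand-in for a $\delta$-function momentum at the origin. The names `gaussian_kernel` and `velocity` are illustrative only.

```python
import math

def gaussian_kernel(r, alpha):
    """Scalar Gaussian kernel with length scale alpha (an illustrative choice)."""
    return math.exp(-r * r / (2.0 * alpha * alpha))

def velocity(xs, momentum, alphas):
    """Discrete convolution u = (sum_i K_i) * p on the uniform grid xs."""
    h = xs[1] - xs[0]
    u = []
    for x in xs:
        total = 0.0
        for y, p in zip(xs, momentum):
            k = sum(gaussian_kernel(x - y, a) for a in alphas)
            total += k * p * h
        u.append(total)
    return u

# Uniform grid on [-5, 5] and a discrete delta-like momentum at x = 0.
n = 201
xs = [-5.0 + 10.0 * i / (n - 1) for i in range(n)]
h = xs[1] - xs[0]
p = [0.0] * n
p[n // 2] = 1.0 / h

u_fine = velocity(xs, p, [0.2])         # small-scale kernel only
u_coarse = velocity(xs, p, [2.0])       # large-scale kernel only
u_sum = velocity(xs, p, [0.2, 2.0])     # sum of both kernels

i_one = 120  # grid index of the point x = 1
print(u_fine[i_one], u_coarse[i_one], u_sum[i_one])
```

At distance 1 from the support of the momentum the fine kernel contributes essentially nothing, while the coarse kernel is still of order one, and the velocity obtained from the summed kernel is the sum of the two contributions.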

Let us now write the update formula in the following way
\[
u^{n+1}(t) = (1 - \varepsilon)\, u^n(t) + \varepsilon \left( K_1 * p(t) + K_2 * p(t) \right) .
\]

We see that the updated velocity will have a component $K_2 * p(t)$ from the large-scale kernel, which smooths the image more, and also a component $K_1 * p(t)$ from the small-scale kernel, which sees the more detailed behavior of the image. Another important factor is the choice of relative weights between the kernels $K_1$ and $K_2$, since we are free to multiply each by a constant. This question and the choice of the scales themselves are discussed further in [68]. The following theorem shows that both approaches are equivalent.

Theorem 2.18. Let $K_1, \dots, K_n$ be reproducing kernels such that the corresponding spaces $\mathcal{H}_k$ satisfy $\mathcal{H}_1 \ge \dots \ge \mathcal{H}_n$ and define the kernel $K = K_1 + \dots + K_n$. Let furthermore $I_0, I_1 \in L^2(\Omega)$ be two images. Then any minimum $u \in L^2([0, 1], \mathcal{H}_K)$ of (2.34) can be decomposed into $u = u_1 + \dots + u_n$ with $u_k \in L^2([0, 1], \mathcal{H}_{K_k})$ such that the $n$-tuple $(u_1, \dots, u_n)$ minimizes (2.32). Conversely, for any minimum $(u_1, \dots, u_n)$ of (2.32) the vector field $u = u_1 + \dots + u_n$ is a minimum of (2.34).

A formal argument. Given vector fields $(u_1, \dots, u_n)$ that minimize (2.32), according to theorem 2.11 they satisfy the equations
\[
\frac{\delta \ell}{\delta u_k}(t) = - \varphi(t).I_0 \diamond \varphi(t, 1).\pi ,
\]
which in our case specialize to
\[
u_k(t) = - \frac{1}{\sigma^2} K_k * \left( \left| \det D\varphi(t, 1)^{-1} \right| \nabla J^0(t) \left( J^0(t) - J^1(t) \right) \right) ,
\]
with $\varphi_t$ being the flow of $u_1(t) + \dots + u_n(t)$. But then $u(t) = u_1(t) + \dots + u_n(t)$ satisfies the equation
\[
u(t) = - \frac{1}{\sigma^2} (K_1 + \dots + K_n) * \left( \left| \det D\varphi(t, 1)^{-1} \right| \nabla J^0(t) \left( J^0(t) - J^1(t) \right) \right) ,
\]
which coincides with image matching with the kernel $K = K_1 + \dots + K_n$. Although this doesn't constitute a rigorous proof, it indicates why the semidirect product matching problem and matching with the sum of kernels are related.

Before we proceed with the proof we want to make some remarks about the space $\mathcal{H}_{K_1 + K_2}$ and its norm. It is shown in [4] that
\begin{align*}
\mathcal{H}_{K_1 + K_2} &= \{ u_1 + u_2 : u_1 \in \mathcal{H}_{K_1},\ u_2 \in \mathcal{H}_{K_2} \} , \\
|u|_{K_1 + K_2}^2 &= \inf_{u = u_1 + u_2} \left( |u_1|_{K_1}^2 + |u_2|_{K_2}^2 \right) .
\end{align*}

Another interpretation is that the norm $|\cdot|_{K_1 + K_2}$ is the quotient norm with respect to the projection $\pi : \mathcal{H}_{K_1} \times \mathcal{H}_{K_2} \to \mathcal{H}_{K_1 + K_2}$ given by $\pi(u_1, u_2) = u_1 + u_2$. The following lemma will be useful in the proof by allowing us to decompose a path $u(t) \in \mathcal{H}_{K_1 + K_2}$ into a pair of paths $(v(t), w(t)) \in \mathcal{H}_{K_1} \times \mathcal{H}_{K_2}$.

Lemma 2.19. There exist continuous maps $L_1 : \mathcal{H}_{K_1 + K_2} \to \mathcal{H}_{K_1}$ and $L_2 : \mathcal{H}_{K_1 + K_2} \to \mathcal{H}_{K_2}$ with the property that $|u|_{K_1 + K_2}^2 = |L_1 u|_{K_1}^2 + |L_2 u|_{K_2}^2$.

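Proof. Let $\mathcal{M} = \mathcal{H}_{K_1} \times \mathcal{H}_{K_2}$ be the direct sum of $\mathcal{H}_{K_1}$ and $\mathcal{H}_{K_2}$ and equip it with the inner product
\[
|(v, w)|_{\mathcal{M}}^2 := |v|_{K_1}^2 + |w|_{K_2}^2 .
\]

Define the subspace $\mathcal{M}_0 := \{ (v, -v) : v \in \mathcal{H}_{K_1} \cap \mathcal{H}_{K_2} \}$ consisting of functions that lie in both vector spaces. Then $\mathcal{M}_0 = \ker \pi$, where $\pi : \mathcal{M} \to \mathcal{H}_{K_1 + K_2}$ is the projection defined above. It is shown in [4, I.6.] that $\mathcal{H}_{K_1 + K_2}$ is isometrically isomorphic to $\mathcal{M}_0^\perp$. The required maps are given by
\[
L_i : \mathcal{H}_{K_1 + K_2} \cong \mathcal{M}_0^\perp \hookrightarrow \mathcal{H}_{K_1} \times \mathcal{H}_{K_2} \xrightarrow{\ \pi_i\ } \mathcal{H}_{K_i} ,
\]
where $\pi_i : \mathcal{M} \to \mathcal{H}_{K_i}$ is the canonical projection.

Of course the same results also hold for the sum of more than two kernels.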
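The norm-splitting property can be checked by hand in a finite-dimensional toy setting. The sketch below assumes, purely for illustration, that both kernels are diagonal matrices with positive entries on a three-point set, so that $|u|_{K_i}^2 = \sum_j u_j^2 / k_{i,j}$. In this setting the decomposition with components $k_{i,j} u_j / (k_{1,j} + k_{2,j})$ plays the role of the maps $L_1, L_2$; the kernel values and the vector `u` are hypothetical.

```python
def rkhs_norm_sq(u, k):
    """Squared RKHS norm for a diagonal kernel with positive diagonal k."""
    return sum(ui * ui / ki for ui, ki in zip(u, k))

def decompose(u, k1, k2):
    """Split u = l1 + l2 with l_i = K_i (K_1 + K_2)^{-1} u (diagonal case)."""
    l1 = [a * ui / (a + b) for ui, a, b in zip(u, k1, k2)]
    l2 = [b * ui / (a + b) for ui, a, b in zip(u, k1, k2)]
    return l1, l2

# Hypothetical diagonal kernels and a test vector.
k1 = [1.0, 0.5, 0.1]
k2 = [0.2, 0.5, 2.0]
u = [1.0, -2.0, 0.5]

k_sum = [a + b for a, b in zip(k1, k2)]
l1, l2 = decompose(u, k1, k2)

norm_sum = rkhs_norm_sq(u, k_sum)                            # |u|^2 for K_1 + K_2
norm_split = rkhs_norm_sq(l1, k1) + rkhs_norm_sq(l2, k2)     # optimal split

# Any other split, here the naive one u = u/2 + u/2, cannot do better.
half = [ui / 2.0 for ui in u]
norm_naive = rkhs_norm_sq(half, k1) + rkhs_norm_sq(half, k2)

print(norm_sum, norm_split, norm_naive)
```

The first two printed values agree, which is the identity of Lemma 2.19 in this toy case, and the third is larger, in line with the infimum in the quotient norm.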

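Proof. The idea of the proof can be summarized in the commutative diagram
\[
\begin{array}{ccc}
(u_1(t), \dots, u_n(t)) & \xrightarrow{\ u(t) = \sum_k u_k(t)\ } & u(t) \\[4pt]
\Big\downarrow {\scriptstyle \text{via (2.33)}} & & \Big\downarrow {\scriptstyle \partial_t \varphi(t) = u(t) \circ \varphi(t)} \\[4pt]
(\varphi_1(t), \dots, \varphi_n(t)) & \xrightarrow{\ \varphi(t) = \varphi_1(t) \circ \dots \circ \varphi_n(t)\ } & \varphi(t)
\end{array}
\tag{2.35}
\]
and the remark made above, that the norm on $\mathcal{H}_K$ is the quotient norm with respect to the sum $u = u_1 + \dots + u_n$.

Note first that whenever $u$ and $(u_1, \dots, u_n)$ are related via $u(t) = \sum_i u_i(t)$ we also have $\varphi(t) = \varphi_1(t) \circ \dots \circ \varphi_n(t)$ and therefore $E(u) = E(\sum_i u_i) \le E(u_1, \dots, u_n)$. Now let $\varepsilon > 0$ and choose $u_k^\varepsilon \in L^2([0, 1], \mathcal{H}_k)$ such that
\[
E(u_1^\varepsilon, \dots, u_n^\varepsilon) \le \inf_{(u_1, \dots, u_n)} E(u_1, \dots, u_n) + \varepsilon .
\]

Then
\[
\min_u E(u) \le E\Big( \sum_i u_i^\varepsilon \Big) \le E(u_1^\varepsilon, \dots, u_n^\varepsilon) \le \inf_{(u_1, \dots, u_n)} E(u_1, \dots, u_n) + \varepsilon . \tag{2.36}
\]
Since this holds for all $\varepsilon > 0$ we get the inequality
\[
\min_u E(u) \le \inf_{u_1, \dots, u_n} E(u_1, \dots, u_n) . \tag{2.37}
\]

For the other inequality let $\bar{u} \in L^2([0, 1], \mathcal{H}_K)$ be a minimizer of $E(u)$, the existence of which is guaranteed by theorem 2.7. By lemma 2.19 we get for each time $t \in [0, 1]$ a decomposition $\bar{u}(t) = \sum_i \bar{u}_i(t)$ with the property that $E(\bar{u}) = E(\bar{u}_1, \dots, \bar{u}_n)$. This implies
\[
\inf_{u_1, \dots, u_n} E(u_1, \dots, u_n) \le E(\bar{u}_1, \dots, \bar{u}_n) = E(\bar{u}) = \min_u E(u) . \tag{2.38}
\]

Combining (2.37) and (2.38) shows that $(\bar{u}_1, \dots, \bar{u}_n)$ is a minimizer for $E(u_1, \dots, u_n)$ and that the minima for $E(u)$ and $E(u_1, \dots, u_n)$ coincide,
\[
\min_u E(u) = \inf_{u_1, \dots, u_n} E(u_1, \dots, u_n) = E(\bar{u}_1, \dots, \bar{u}_n) .
\]

Starting with a minimizing $n$-tuple $(\tilde{u}_1, \dots, \tilde{u}_n)$ for $E(u_1, \dots, u_n)$, we see from equation (2.36) that $\tilde{u} = \sum_i \tilde{u}_i$ is a minimizer for $E(u)$. This concludes the proof.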

The first approach introduced $n$ vector fields, which were the right-invariant velocity of the semidirect product group. The second approach used only one vector field and a kernel, which was the sum of $n$ kernels. A third approach to multi-scale image matching, which is a special case of the kernel bundle method proposed in [77] and [76], is to use $n$ kernels $K_1, \dots, K_n$ corresponding to scales $\alpha_1, \dots, \alpha_n$ and have a vector field $u_i$ for each scale $\alpha_i$. The matching procedure is as follows:

Definition 2.20 (Simultaneous Multiscale Registration). Registering the image $I_0$ to $I_1$ is done by finding the $n$-tuple $(u_1(t), \dots, u_n(t))$ that minimizes
\[
\frac{1}{2} \sum_{i=1}^n \int_0^1 |u_i(t)|_{K_i}^2 \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_1\|^2 ,
\]
where $\varphi(t)$ is the flow of the vector field $u(t) = \sum_{i=1}^n u_i(t)$.

There is one diffeomorphism $\varphi(1)$ performing the matching, which is generated as the flow of the sum $\sum_{i=1}^n u_i(t)$ of the vector fields $u_i(t)$ at each scale. The proof of theorem 2.18 also shows the following.

Theorem 2.21. The simultaneous multiscale registration defined in definition 2.20 and matching with a semidirect product from definition 2.12 are equivalent, i.e. they attain the minima at the same points.

2.3.4 The Order Reversed

The action (2.24) of the semidirect product from Lemma 2.10 proceeds by deforming the image with the coarsest scale diffeomorphism first and with the finest scale diffeomorphism last. However, it is also possible to reverse this order and to act with the finest scale diffeomorphisms first. This action also corresponds to a semidirect product and is, up to the group isomorphism of Lemma 2.23, equivalent to the previously presented order of deformation. The reason to expand on this here is that this version is better suited to be generalized to a continuum of scales.

In this section we will assume that the group G1 contains the deformations of the coarsest scale and Gn those of the finest scale. The corresponding semidirect product is described in the following lemma.

Lemma 2.22. Let $G_1 \le G_2 \le \dots \le G_n$ be a chain of Lie groups. One can define the $n$-fold semidirect product multiplication on the set $G_1 \times \dots \times G_n$ via
\[
(g_1, \dots, g_n) \cdot (h_1, \dots, h_n) = \left( g_1 h_1,\ (c_{h_1^{-1}}\, g_2)\, h_2,\ \dots,\ (c_{(h_1 \cdots h_{n-1})^{-1}}\, g_n)\, h_n \right) \tag{2.39}
\]
with $c_h\, g = h g h^{-1}$ denoting conjugation. Then, given the right-trivialized tangent vector $v(t) = (v_1(t), \dots, v_n(t))$ of the curve $g(t) = (g_1(t), \dots, g_n(t))$, the curve can be reconstructed via the ODE
\[
\partial_t g_k(t) = \left( \operatorname{Ad}_{(g_1(t) \cdots g_{k-1}(t))^{-1}} v_k(t) \right) g_k(t) . \tag{2.40}
\]

We shall denote this product by
\[
G_1 \ltimes \dots \ltimes G_n
\]
to emphasize that each subproduct $G_k \ltimes \dots \ltimes G_n$ from the left is a normal subgroup of the whole product.

Proof. This lemma can be proven in the same way as lemma 2.10.
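As a sanity check on the multiplication rule (2.39), the following sketch implements it for a small finite chain of groups and tests associativity on random triples. The setting is an assumption made for illustration only: permutations of three letters stand in for the Lie groups, with the chain $A_3 \le S_3 \le S_3$, and the helper names are hypothetical.

```python
import itertools
import random

def compose(p, q):
    """Composition (p o q)(i) = p[q[i]] of permutations stored as tuples."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def semidirect_multiply(g, h):
    """Product (2.39): entry k is (c_{(h_1...h_{k-1})^{-1}} g_k) h_k, c_a b = a b a^{-1}."""
    prefix = tuple(range(len(g[0])))  # running product h_1 ... h_{k-1}
    result = []
    for gk, hk in zip(g, h):
        conjugated = compose(compose(inverse(prefix), gk), prefix)
        result.append(compose(conjugated, hk))
        prefix = compose(prefix, hk)
    return tuple(result)

# Toy chain of groups G_1 <= G_2 <= G_3 (an assumption for this sketch).
S3 = list(itertools.permutations(range(3)))
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]

def random_element(rng):
    return (rng.choice(A3), rng.choice(S3), rng.choice(S3))

rng = random.Random(0)
triples = [tuple(random_element(rng) for _ in range(3)) for _ in range(200)]
associative = all(
    semidirect_multiply(semidirect_multiply(a, b), c)
    == semidirect_multiply(a, semidirect_multiply(b, c))
    for a, b, c in triples
)
print(associative)
```

The check does not replace the proof, but it catches index or ordering mistakes when transcribing (2.39).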

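The semidirect products defined in lemmas 2.10 and 2.22 are isomorphic, as the following lemma shows.

Lemma 2.23. Let $G_1 \le G_2 \le \dots \le G_n$ be a chain of Lie groups. The map
\[
\Phi : \begin{cases}
G_1 \ltimes \dots \ltimes G_n \to G_n \rtimes \dots \rtimes G_1 \\
(g_1, \dots, g_n) \mapsto \left( g_n,\ \dots,\ c_{(g_{n+2-k} \cdots g_n)^{-1}}\, g_{n+1-k},\ \dots,\ c_{(g_2 \cdots g_n)^{-1}}\, g_1 \right)
\end{cases}
\]
is a group isomorphism between the two semidirect products and its derivative at the identity is given by
\[
T_e\Phi : \begin{cases}
\mathfrak{g}_1 \ltimes \dots \ltimes \mathfrak{g}_n \to \mathfrak{g}_n \rtimes \dots \rtimes \mathfrak{g}_1 \\
(v_1, \dots, v_n) \mapsto (v_n, \dots, v_{n+1-k}, \dots, v_1)
\end{cases}
\]

Proof. Direct computation.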

The map $\Phi$ can be seen as one side of the following commuting triangle
\[
\begin{array}{ccc}
G_1 \ltimes \dots \ltimes G_n & \xrightarrow{\ \Phi\ } & G_n \rtimes \dots \rtimes G_1 \\[4pt]
& {\scriptstyle T_1} \searrow \qquad \swarrow {\scriptstyle T_2} & \\[4pt]
& G_1 \times \dots \times G_n &
\end{array}
\]
with the maps $T_i$ given by
\begin{align*}
T_1(g_1, \dots, g_n) &= (g_1 \cdots g_n,\ g_2 \cdots g_n,\ \dots,\ g_{n-1} g_n,\ g_n) , \\
T_2(g_n, \dots, g_1) &= (g_n \cdots g_1,\ g_n \cdots g_2,\ \dots,\ g_n g_{n-1},\ g_n) .
\end{align*}

They are group homomorphisms from the corresponding semidirect products into the direct product $G_1 \times \dots \times G_n$. They can be regarded as a trivialization of the semidirect product in the special case that the factors form a chain of subgroups.

We will now assume that we are given $n$ kernels $K_1, \dots, K_n$ such that the corresponding spaces satisfy the inclusions $\mathcal{H}_1 \le \dots \le \mathcal{H}_n$, i.e. $K_1$ represents the coarsest scale and $K_n$ the finest one. Note that the inclusions are reversed as compared to Section 2.3.1. The registration problem is now defined as follows.

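Definition 2.24 (LDM with the other Semidirect Product). Registering $I_0$ to $I_1$ is done by finding the minimizing $n$-tuple $(u_1(t), \dots, u_n(t))$ of
\[
\frac{1}{2} \sum_{k=1}^n \int_0^1 |u_k(t)|_k^2 \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_1\|^2 ,
\]
where $\varphi(t) = \varphi_1(t) \circ \dots \circ \varphi_n(t)$ and $\varphi_k(t)$ is defined via
\[
\partial_t \varphi_k(t) = \left( \operatorname{Ad}_{(\varphi_1(t) \circ \dots \circ \varphi_{k-1}(t))^{-1}} u_k(t) \right) \circ \varphi_k(t) .
\]

To see that problems 2.12 and 2.24 are equivalent, note that the following diagram commutes
\[
\begin{array}{ccc}
G_1 \rtimes \dots \rtimes G_n & \xrightarrow{\ \Phi\ } & G_n \ltimes \dots \ltimes G_1 \\[4pt]
& {\scriptstyle \varphi = \varphi_1 \circ \dots \circ \varphi_n} \searrow \qquad \swarrow {\scriptstyle \varphi = \varphi_n \circ \dots \circ \varphi_1} & \\[4pt]
& G_1 &
\end{array}
\]
and that $T_e\Phi$ merely reverses the order of the vector fields $(v_1(t), \dots, v_n(t))$.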

In particular the minimizing vector fields vi(t) are the same (up to order) and the diffeomorphisms ϕ(1) coincide as well. The difference is in the diffeomorphisms ϕi(t) at each scale. We will see in Section 2.3.5 that this version of the semidirect product is better suited to be generalized to a continuum of scales.

2.3.5 A Continuum of Scales

Besides considering a finite sum of kernels or scales, it is also possible to consider a continuum of scales. The theory of RKHS generalizes very well to this situation.

Let us assume that we have a family of admissible RKHS $\mathcal{H}_s$ with kernels $K_s$ and $s \in [0, 1]$ with the following properties:

• The inclusions
\[
\mathcal{H}_s \le \mathcal{H}_t \quad \text{for } s \le t
\]
hold.

• Since $\mathcal{H}_s$ is an admissible RKHS, there exists a constant $C_s$ such that
\[
|u|_{1,\infty} \le C_s\, |u|_s ,
\]
with $|\cdot|_s$ denoting the norm in $\mathcal{H}_s$. The map $s \mapsto C_s$ is assumed to be measurable and
\[
\int_0^1 C_s^2 \, ds < \infty .
\]

• The map $(s, x, y) \mapsto K_s(x, y)$ is assumed to be measurable.

From the first assumption it follows that $\mathcal{H}_0$ is the coarsest scale and $\mathcal{H}_1$ is the finest scale.

Let us denote in this section by $\mathcal{H}$ the collection $\mathcal{H} = (\mathcal{H}_s)_{s \in [0,1]}$ of all spaces and introduce the space
\[
L^2(\mathcal{H}) = \left\{ u \in L^2([0, 1] \times \Omega) \ :\
\begin{array}{l}
(1)\ \forall x \in \Omega,\ s \mapsto u_s(x) \text{ is measurable} \\
(2)\ s \mapsto |u_s|_s \text{ is measurable} \\
(3)\ \int_0^1 |u_s|_s^2 \, ds < +\infty
\end{array}
\right\} .
\]
We can also define the kernel $K(x, y) = \int_0^1 K_s(x, y) \, ds$ and the associated RKHS $\mathcal{H}_K$. The following theorem explains the relationship between the spaces $L^2(\mathcal{H})$ and $\mathcal{H}_K$ and their norms.

Theorem 2.25 (From [69, 71]). The space $\mathcal{H}_K$ consists of vectors of the form
\[
\mathcal{H}_K = \left\{ \int_0^1 u_s \, ds \ :\ u \in L^2(\mathcal{H}) \right\} ,
\]
with the norm given by
\[
|u|_K^2 = \inf_{v \in L^2(\mathcal{H})} \int_0^1 |v_s|_s^2 \, ds ,
\]
for $v$ satisfying the constraint $u = \int_0^1 v_s \, ds$. Furthermore there exists an operator $L : \mathcal{H}_K \to L^2(\mathcal{H})$ such that $|u|_K^2 = \int_0^1 |(Lu)_s|_s^2 \, ds$.

By introducing the projection $\pi : L^2(\mathcal{H}) \to \mathcal{H}_K$, given by $\pi(u) = \int_0^1 u_s \, ds$, we see that the norm on $\mathcal{H}_K$ is the quotient norm with respect to the projection $\pi$.

Remark. The hypotheses on the spaces $\mathcal{H}_s$ imply that $\mathcal{H}_K$ is an admissible RKHS: by the hypothesis on the RKHS $\mathcal{H}_s$, the map $x \mapsto v_s(x)$ is differentiable at any point $x \in \Omega$ for almost all $s \in [0, 1]$ and $|v_s|_{1,\infty} \le C_s |v_s|_s$. By applying the Cauchy-Schwarz inequality we obtain
\[
\left| \int_0^1 v_s \, ds \right|_{1,\infty}^2 \le \left( \int_0^1 C_s\, |v_s|_s \, ds \right)^2 \le \int_0^1 C_s^2 \, ds \int_0^1 |v_s|_s^2 \, ds
\]
and hence, taking the infimum over all $v$ with $u = \int_0^1 v_s \, ds$,
\[
|u|_{1,\infty} \le \sqrt{ \int_0^1 C_s^2 \, ds }\ |u|_K .
\]
In analogy to the discrete sum of kernels we can define the matching problem with a continuum of scales.

Definition 2.26 (LDM with an Integral over Kernels). Registering $I_0$ to $I_1$ is done by finding the minimizer $u(t)$ of
\[
E(u) = \frac{1}{2} \int_0^1 |u(t)|_K^2 \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_1\|^2 , \tag{2.41}
\]
where $K = \int_0^1 K_s \, ds$ is the integral over the scales $K_s$.

The aim of this section is to give a geometric interpretation of this matching procedure. In particular we want to decompose the minimizing flow of diffeomorphisms $\varphi(t)$, such that the effect of each scale becomes visible. In order to do this decomposition we define:
\[
\psi_s(t) \text{ is the flow in } t \text{ of } \int_0^s u_r(t) \, dr . \tag{2.42}
\]

This flow exists, since the vector field $\int_0^s u_r(t) \, dr$ belongs to the RKHS corresponding to the kernel $\int_0^s K_r \, dr$ and this is again an admissible space. The following theorem allows us to, at least formally, interchange time and scale in the flow $\psi_s(t)$. It can be seen as a corollary of Lemma 2.1, in the sense that it computes the variation of the flow $\psi_s(t)$ with respect to the scale variable $s$. The point of view in this proof is that the vector fields for the flow in time and for the flow in scale have to satisfy a compatibility condition to generate the same two-parameter family of diffeomorphisms.

Theorem 2.27. For each fixed $t$, the one-parameter family $s \mapsto \psi_s(t)$ can be regarded as the flow of the vector field
\[
\operatorname{Ad}_{\psi_s(t)} \int_0^t \operatorname{Ad}_{\psi_s(r)^{-1}} u_s(r) \, dr .
\]

To prove this theorem we will use the following lemma.

Lemma 2.28. Let $u(s, t, x)$ and $v(s, t, x)$ be two-parameter families of vector fields which are $C^2$ in the $(s, t)$-variables and $C^1$ in $x$. If they satisfy
\[
\partial_s u(s, t, x) - \partial_t v(s, t, x) = [u(s, t), v(s, t)](x) , \tag{2.43}
\]
where $[u, v] = Dv.u - Du.v$ is the Jacobi-Lie algebra bracket, and if $v(s, 0) \equiv 0$ for all $s$, then the flow of $u(s, .)$ for fixed $s$ coincides with the flow of $v(., t)$ for fixed $t$.

Proof. Denote by $a_s(t)$ the flow of $u(s, .)$ in $t$. Then
\begin{align*}
\partial_t \partial_s a_s(t) &= \partial_s \partial_t a_s(t) = \partial_s \left( u(s, t) \circ a_s(t) \right) \\
&= \partial_s u(s, t) \circ a_s(t) + Du(s, t, a_s(t)).\partial_s a_s(t) \\
&= \partial_t v(s, t) \circ a_s(t) + [u(s, t), v(s, t)] \circ a_s(t) + Du(s, t, a_s(t)).\partial_s a_s(t) \\
&= \partial_t \left( v(s, t) \circ a_s(t) \right) + Du(s, t, a_s(t)).\left( \partial_s a_s(t) - v(s, t) \circ a_s(t) \right) .
\end{align*}

This implies that $b_s(t) := \partial_s a_s(t) - v(s, t) \circ a_s(t)$ is the solution of the ODE
\[
\partial_t b_s(t) = Du(s, t, a_s(t)).b_s(t) . \tag{2.44}
\]

Since for $t = 0$ we have $b_s(0) = \partial_s a_s(0) - v(s, 0) \circ a_s(0) = 0$, it follows that $b_s(t) \equiv 0$ is the unique solution of (2.44). This means that
\[
\partial_s a_s(t) = v(s, t) \circ a_s(t) , \tag{2.45}
\]
i.e. the flows of $u(s, .)$ in $t$ and of $v(., t)$ in $s$ coincide.

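Proof of theorem. We apply Lemma 2.28 to the vector fields
\[
\int_0^s u_r(t) \, dr \quad \text{and} \quad \operatorname{Ad}_{\psi_s(t)} \int_0^t \operatorname{Ad}_{\psi_s(r)^{-1}} u_s(r) \, dr .
\]

We can differentiate $\operatorname{Ad}$ using the following rule
\[
\partial_t \operatorname{Ad}_{g(t)} u = [\partial_t g(t)\, g(t)^{-1}, \operatorname{Ad}_{g(t)} u] . \tag{2.46}
\]

This can be seen by writing
\begin{align*}
\partial_t \operatorname{Ad}_{g(t)} u \,\big|_{t = t_0} &= \partial_t \operatorname{Ad}_{g(t) g(t_0)^{-1}} \operatorname{Ad}_{g(t_0)} u \,\big|_{t = t_0} \\
&= \operatorname{ad}_{\partial_t g(t) g(t_0)^{-1} |_{t = t_0}} \operatorname{Ad}_{g(t_0)} u \\
&= [\partial_t g(t)\, g(t_0)^{-1} \big|_{t = t_0}, \operatorname{Ad}_{g(t_0)} u] .
\end{align*}

Using this we can verify the compatibility condition
\begin{align*}
\partial_s \left( \int_0^s u_r(t) \, dr \right) &- \partial_t \left( \operatorname{Ad}_{\psi_s(t)} \int_0^t \operatorname{Ad}_{\psi_s(r)^{-1}} u_s(r) \, dr \right) \\
&= u_s(t) + \left[ \partial_t \psi_s(t)\, \psi_s(t)^{-1},\ \operatorname{Ad}_{\psi_s(t)} \int_0^t \operatorname{Ad}_{\psi_s(r)^{-1}} u_s(r) \, dr \right] - u_s(t) \\
&= \left[ \int_0^s u_r(t) \, dr,\ \operatorname{Ad}_{\psi_s(t)} \int_0^t \operatorname{Ad}_{\psi_s(r)^{-1}} u_s(r) \, dr \right] .
\end{align*}

The condition $v(s, 0) \equiv 0$ is trivially satisfied, since the integral over $[0, t]$ vanishes for $t = 0$. This concludes the proof.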
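The differentiation rule (2.46) can be tested numerically in the simplest finite-dimensional case. The sketch below assumes a matrix group, where $\operatorname{Ad}_g u = g u g^{-1}$ and the bracket is the matrix commutator; for diffeomorphism groups the sign convention of the bracket has to be adapted. The curve $g(t)$ is a rotation by the angle $t$, so that $\partial_t g(t)\, g(t)^{-1}$ is the constant matrix `J`, and the matrix `u` is a hypothetical test element.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotation(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def adjoint(t, u):
    """Ad_{g(t)} u = g(t) u g(t)^{-1}; the inverse of a rotation is rotation(-t)."""
    return matmul(matmul(rotation(t), u), rotation(-t))

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

J = [[0.0, -1.0], [1.0, 0.0]]   # equals d/dt g(t) g(t)^{-1} for rotations
u = [[1.0, 2.0], [0.5, -1.0]]   # hypothetical Lie algebra element
t0, h = 0.7, 1e-5

# Left-hand side of (2.46) by a central finite difference.
plus, minus = adjoint(t0 + h, u), adjoint(t0 - h, u)
lhs = [[(plus[i][j] - minus[i][j]) / (2.0 * h) for j in range(2)] for i in range(2)]

# Right-hand side of (2.46).
rhs = commutator(J, adjoint(t0, u))

max_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_error)
```

The printed error is at the level of the finite-difference accuracy, which supports the rule in this model case.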

Theorem 2.27 gives us a way to decompose the matching diffeomorphism $\varphi(1)$ into separate scales. As we follow the flow $\eta(s)$, we add more and more scales, starting from the identity, when no scales are taken into account, and finishing with $\varphi(1)$, which includes all scales. In this sense $\eta(s)^{-1} \circ \eta(t)$ contains the scale information for the scales in the interval $[s, t]$.

In the papers [77] and [76] a different approach to registration with a continuum of scales was proposed, called registration with a kernel bundle. The term kernel bundle refers to the one-parameter family $(K_s)_{s \in [0,1]}$ of kernels.

Definition 2.29 (LDM with a Kernel Bundle). Registering $I_0$ to $I_1$ is done by finding the one-parameter family $u_s(t)$ of vector fields, which minimizes
\[
E(u_s) = \frac{1}{2} \int_0^1 \int_0^1 |u_s(t)|_s^2 \, ds \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_1\|^2 , \tag{2.47}
\]
where $\varphi(t)$ is the flow of the vector field $u(t) = \int_0^1 u_s(t) \, ds$.

As in the case of finitely many scales these two approaches are equivalent.

Theorem 2.30. The matching problems 2.26 and 2.29 are equivalent. In particular a minimizer $u_s$ of (2.47) gives rise to a minimizer $\int_0^1 u_s \, ds$ of (2.41) and conversely any minimizer $u$ of (2.41) can be decomposed into a family $u_s$ such that $u = \int_0^1 u_s \, ds$ and $u_s$ minimizes (2.47).

Proof. Using theorem 2.25 the proof proceeds in the same way as the proof of theorem 2.18.

2.3.6 Restriction to a Finite Number of Scales

It is of interest to understand the relationship between a continuum of scales and the case, where we have only a finite number of discrete scales. We will see, that it is possible to see the latter case as a special case of the continuum of scales.

Let us start with a family $K_s$ of kernels with $s \in [0, 1]$, where as before the scales are ordered from the coarsest to the finest, i.e. $\mathcal{H}_s \le \mathcal{H}_t$ for $s \le t$. Divide the interval $[0, 1]$ into $n$ parts $0 = s_0 < \dots < s_n = 1$ and denote the intervals $I_k = [s_{k-1}, s_k]$. To each interval $I_k$ corresponds a kernel $K_k = \int_{I_k} K_s \, ds$. The discrete sampling map
\[
\Psi : \begin{cases}
L^2(\mathcal{H}) \to \mathcal{H}_{K_1} \times \dots \times \mathcal{H}_{K_n} \\
u_s \mapsto \left( \int_{I_1} u_s \, ds,\ \dots,\ \int_{I_n} u_s \, ds \right)
\end{cases}
\tag{2.48}
\]
discretizes $u \in L^2(\mathcal{H})$ into $n$ scales. Formally we can introduce a Lie bracket on the space $L^2(\mathcal{H})$ by defining
\[
[u, v]_s = \left[ u_s, \int_0^s v_r \, dr \right] + \left[ \int_0^s u_r \, dr, v_s \right] . \tag{2.49}
\]

Using this bracket the sampling map $\Psi$ is a Lie algebra homomorphism as shown in the next theorem.

Theorem 2.31. The sampling map $\Psi$ is a Lie algebra homomorphism from the Lie algebra $L^2(\mathcal{H})$ with the bracket defined in (2.49) into the $n$-fold semidirect product with the bracket
\[
[(u_1, \dots, u_n), (v_1, \dots, v_n)] = \left( [u_1, v_1],\ \dots,\ \Big[ u_k, \sum_{i=1}^{k-1} v_i \Big] + \Big[ \sum_{i=1}^{k-1} u_i, v_k \Big] + [u_k, v_k],\ \dots \right) . \tag{2.50}
\]

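Proof. Using the definitions we first compute
\begin{align*}
[\Psi(u), \Psi(v)]_k &= \left[ \int_{s_{k-1}}^{s_k} u_s \, ds, \int_0^{s_{k-1}} v_s \, ds \right] + \left[ \int_0^{s_{k-1}} u_s \, ds, \int_{s_{k-1}}^{s_k} v_s \, ds \right] + \left[ \int_{s_{k-1}}^{s_k} u_s \, ds, \int_{s_{k-1}}^{s_k} v_s \, ds \right] \\
&= \left[ \int_0^{s_k} u_s \, ds, \int_0^{s_k} v_s \, ds \right] - \left[ \int_0^{s_{k-1}} u_s \, ds, \int_0^{s_{k-1}} v_s \, ds \right] ,
\end{align*}
and then write the other side
\[
\Psi([u, v])_k = \int_{s_{k-1}}^{s_k} \left( \left[ u_s, \int_0^s v_r \, dr \right] + \left[ \int_0^s u_r \, dr, v_s \right] \right) ds .
\]
Below we interchange the order of integration in the first summand and merely switch $s$ and $r$ in the second summand to obtain
\begin{align*}
\int_0^{s_k} \left( \left[ u_s, \int_0^s v_r \, dr \right] + \left[ \int_0^s u_r \, dr, v_s \right] \right) ds
&= \int_0^{s_k} \int_0^{s_k} 1_{r \le s}\, [u_s, v_r] \, dr \, ds + \int_0^{s_k} \int_0^{s_k} 1_{s \le r}\, [u_s, v_r] \, ds \, dr \\
&= \int_0^{s_k} \int_0^{s_k} [u_s, v_r] \, ds \, dr \\
&= \left[ \int_0^{s_k} u_s \, ds, \int_0^{s_k} v_s \, ds \right] .
\end{align*}

Decomposing the integral into
\[
\Psi([u, v])_k = \int_0^{s_k} \dots \, ds - \int_0^{s_{k-1}} \dots \, ds
\]
finishes the proof.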

The diffeomorphisms $\varphi_k$ at each scale, which were defined in definition 2.24, are also contained in the continuous setting. If $v_k = \int_{I_k} u_s \, ds$ is the $k$-th component of the sampling map, then $\varphi_1(t) \circ \dots \circ \varphi_k(t)$ is the flow of the vector field $v_1(t) + \dots + v_k(t)$ and we have
\[
v_1(t) + \dots + v_k(t) = \int_0^{s_k} u_s(t) \, ds . \tag{2.51}
\]

Hence we obtain the identity $\varphi_1(t) \circ \dots \circ \varphi_k(t) = \psi_{s_k}(t)$, where $\psi_s$ was defined in (2.42). In particular we retrieve
\[
\varphi_k(t) = \psi_{s_{k-1}}(t)^{-1} \circ \psi_{s_k}(t) , \tag{2.52}
\]
the scale decomposition of the discrete case, as a continuous scale decomposition evaluated at specific points. For $t = 1$ this relation can be written in terms of the scale flow $\eta(s)$ as
\[
\varphi_k(1) = \eta(s_{k-1})^{-1} \circ \eta(s_k) .
\]
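The telescoping structure behind (2.52) can be illustrated with a minimal sketch. As a simplifying assumption, affine maps $x \mapsto ax + b$ of the real line stand in for the diffeomorphisms $\psi_{s_k}(t)$ at a fixed time $t$, and the numerical values are hypothetical. The code forms $\varphi_k = \psi_{s_{k-1}}^{-1} \circ \psi_{s_k}$ and recomposes $\varphi_1 \circ \dots \circ \varphi_k$.

```python
def compose(f, g):
    """Composition f o g of affine maps x -> a*x + b stored as pairs (a, b)."""
    return (f[0] * g[0], f[0] * g[1] + f[1])

def inverse(f):
    """Inverse of the affine map x -> a*x + b with a != 0."""
    return (1.0 / f[0], -f[1] / f[0])

# Hypothetical values of psi_{s_k}(t) for s_0 = 0 < s_1 < s_2 < s_3 = 1.
# psi_{s_0} is the identity, since no scales have been included yet.
psi = [(1.0, 0.0), (1.2, 0.3), (0.9, -0.5), (1.5, 0.1)]

# Scale increments phi_k = psi_{s_{k-1}}^{-1} o psi_{s_k}, as in (2.52).
phi = [compose(inverse(psi[k - 1]), psi[k]) for k in range(1, len(psi))]

def partial_composition(k):
    """phi_1 o ... o phi_k, which should reproduce psi_{s_k}."""
    result = (1.0, 0.0)
    for f in phi[:k]:
        result = compose(result, f)
    return result

for k in range(1, len(psi)):
    print(k, partial_composition(k), psi[k])
```

Each partial composition agrees with the corresponding $\psi_{s_k}$ up to rounding, which is the identity $\varphi_1(t) \circ \dots \circ \varphi_k(t) = \psi_{s_k}(t)$ in this toy model.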

Now we leave the matching problems for a while and look in more detail into some properties, which are governed by the choice of the kernel or equivalently the norm on the space of vector fields and how it influences the geometry of the corresponding group of diffeomorphisms. We will return to matching problems in Chapter 4.

3 Geodesic Distance on Diffeomorphism and Related Groups

3.1 Overview of the Results

The aim of this chapter is to study properties of geodesic distance and the existence of length-minimizing geodesics. The geodesic distance between two points $p_0, p_1 \in M$ of a Riemannian manifold $M$ is defined as the infimum over the length of all paths connecting them,
\[
\operatorname{dist}(p_0, p_1) = \inf_{\substack{p(0) = p_0, \\ p(1) = p_1}} L(p) = \inf_{\substack{p(0) = p_0, \\ p(1) = p_1}} \int_0^1 \sqrt{ G_{p(t)}(\dot{p}(t), \dot{p}(t)) } \, dt .
\]
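To make the length functional concrete, the following sketch evaluates $L(p)$ numerically for two paths with the same endpoints. It assumes the flat Euclidean metric on the plane, which is of course far simpler than the right-invariant metrics studied in this chapter; the function names and the two sample paths are illustrative.

```python
import math

def path_length(path, metric, steps=2000):
    """Approximate L(p), the integral of sqrt(G_p(p', p')), by a midpoint rule."""
    dt = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p0, p1 = path(i * dt), path((i + 1) * dt)
        velocity = [(b - a) / dt for a, b in zip(p0, p1)]
        midpoint = path((i + 0.5) * dt)
        total += math.sqrt(metric(midpoint, velocity)) * dt
    return total

def euclidean_metric(point, v):
    """G_p(v, v) for the flat metric; the base point is not used."""
    return sum(c * c for c in v)

def straight(t):
    """Straight segment from (-1, 0) to (1, 0)."""
    return (-1.0 + 2.0 * t, 0.0)

def arc(t):
    """Upper unit half-circle from (-1, 0) to (1, 0)."""
    return (-math.cos(math.pi * t), math.sin(math.pi * t))

length_straight = path_length(straight, euclidean_metric)
length_arc = path_length(arc, euclidean_metric)
print(length_straight, length_arc)
```

Both paths connect the same endpoints, and the infimum in the definition of the geodesic distance is bounded above by the length of either of them. In this finite-dimensional example the infimum is attained by the straight segment; the results of this chapter show that for weak metrics on diffeomorphism groups the infimum can instead be zero.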

These questions were motivated by [56], where it was shown that the geodesic distance for the right-invariant $L^2$-metric vanishes on the group $\operatorname{Diff}_c(M)$ of compactly supported diffeomorphisms of a Riemannian manifold. It was also shown in the same paper that the geodesic distance is positive for the right-invariant $H^1$-metric. This leads to the following two questions:

• Since it is possible to define Sobolev spaces of fractional order and consider the corresponding metrics, how does the geodesic distance behave for the metrics $H^s$ with order $0 < s < 1$? For which orders does the distance vanish?

• What is the behavior of the metric on the Virasoro-Bott group, which is a central extension of $\operatorname{Diff}(S^1)$? Does the geodesic distance vanish there as well?

We give a partial answer to the first question in one dimension in form of the following theorem.

Theorem 3.1 (Geodesic distance).

1. The geodesic distance for the fractional order Sobolev-type $H^s$-metric vanishes for

• $0 \le s < \frac{1}{2}$ on $\operatorname{Diff}_c(\mathbb{R})$,
• $0 \le s \le \frac{1}{2}$ on $\operatorname{Diff}(S^1)$.

2. The geodesic distance is positive for

• $\frac{1}{2} < s$ on $\operatorname{Diff}_c(\mathbb{R})$ and $\operatorname{Diff}(S^1)$.

This work was published in [6]. Concerning the second question, we provide an answer for the $L^2$-metric on the Virasoro-Bott group, for which the distance vanishes as well.

Theorem 3.2. The geodesic distance on the Virasoro-Bott group $\mathbb{R} \times_c \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ vanishes for the right-invariant $L^2$-metric.

This work was published in [8]. The third question we study concerns the local minima of the length functional. It is shown in [56] that the length functional for the $L^2$-metric on the diffeomorphism group $\operatorname{Diff}_c(M)$, when restricted to paths with fixed endpoints, has no global minima, since for $\varphi_0, \varphi_1 \in \operatorname{Diff}_c(M)$ we have
\[
0 = \operatorname{dist}^{L^2}(\varphi_0, \varphi_1) = \inf_{\substack{\varphi(0) = \varphi_0, \\ \varphi(1) = \varphi_1}} L(\varphi) .
\]

This result however does not rule out local minima. Locality is understood with respect to the $C^\infty$-topology on the space of paths and is made precise in Definition 3.18. We show in Section 3.5 that the length functional for the $L^2$-metric on $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ and the Virasoro-Bott group has no local minima.

Theorem 3.3. Let $\varphi(t, x)$ with $t \in [0, T]$ be a path in $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$. Then in any $C^\infty$-neighborhood of the path there exists a path $\psi(t, x)$ with the same endpoints and smaller length
\[
L(\psi) < L(\varphi) .
\]

The same result holds for the Virasoro-Bott group $\mathbb{R} \times_c \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$.

This implies in particular that all geodesics, which are by definition critical points of the length functional, are saddle points.

Corollary 3.4. The length and energy functionals on $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ and the Virasoro-Bott group, even when restricted to paths with fixed endpoints, have no local minima. All stationary points are therefore saddle points.

It is possible to smoothly deform a geodesic, while keeping the endpoints fixed and obtain paths, which are shorter. However, since the deformed paths won’t in general be geodesics any more, this does not state anything about conjugate points.

3.2 Mathematical Background

3.2.1 Diffeomorphism Groups

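The main objects of this chapter are the groups $\operatorname{Diff}_c(\mathbb{R})$ and $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$, which are the group of compactly supported diffeomorphisms and the group of diffeomorphisms that decay rapidly to the identity. For the circle $S^1$ the group $\operatorname{Diff}_c(S^1)$ consists of all smooth diffeomorphisms, since $S^1$ is compact, and so we will simply write $\operatorname{Diff}(S^1)$. The space $\mathcal{S}(\mathbb{R})$ consists of rapidly decaying functions.

Definition 3.5. The space $\mathcal{S}(\mathbb{R})$ consists of all smooth functions $f : \mathbb{R} \to \mathbb{R}$ for which
\[
\|f\|_{\mathcal{S}_{k,m}} = \left\| (1 + |x|^2)^k\, \partial_x^m f(x) \right\|_\infty , \quad k, m \in \mathbb{N} ,
\]
is finite and we equip it with the topology defined by these seminorms. The space $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ is defined by
\[
\operatorname{Diff}_{\mathcal{S}}(\mathbb{R}) = \{ \varphi \in \operatorname{Diff}(\mathbb{R}) \mid \varphi - \operatorname{Id} \in \mathcal{S}(\mathbb{R}) \} .
\]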
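The seminorms of Definition 3.5 can be estimated numerically for a concrete rapidly decaying function. The sketch below takes the Gaussian $f(x) = e^{-x^2}$ as an assumed example, supplies its first derivative in closed form, and approximates the supremum by a maximum over a finite grid, so the computed values are lower bounds for the true seminorms. The function name `schwartz_seminorm` and the grid parameters are illustrative.

```python
import math

def schwartz_seminorm(derivative, k, radius=10.0, samples=20001):
    """Grid approximation of sup_x (1 + x^2)^k |d^m f(x)|; pass d^m f as `derivative`."""
    best = 0.0
    for i in range(samples):
        x = -radius + 2.0 * radius * i / (samples - 1)
        best = max(best, (1.0 + x * x) ** k * abs(derivative(x)))
    return best

def gaussian(x):
    return math.exp(-x * x)

def gaussian_prime(x):
    return -2.0 * x * math.exp(-x * x)

norm_1_0 = schwartz_seminorm(gaussian, k=1)         # k = 1, m = 0
norm_0_1 = schwartz_seminorm(gaussian_prime, k=0)   # k = 0, m = 1
norm_5_1 = schwartz_seminorm(gaussian_prime, k=5)   # stays finite for large k

print(norm_1_0, norm_0_1, norm_5_1)
```

For this function the first value is 1, attained at the origin, and the second is $\sqrt{2}\, e^{-1/2}$, attained at $x = \pm 1/\sqrt{2}$. A diffeomorphism such as $\varphi(x) = x + \tfrac{1}{2} e^{-x^2}$ then differs from the identity by an element of $\mathcal{S}(\mathbb{R})$.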

We quote the following results from [47] and [55] concerning the manifold structure of the diffeomorphism groups.

Theorem 3.6 (Section 43.1 in [47]). Let $M$ be a smooth manifold. The group $\operatorname{Diff}_c(M)$ of all smooth, compactly supported diffeomorphisms on $M$ is an open submanifold of $C^\infty(M, M)$. Composition and inversion are smooth operations. The Lie algebra of the infinite-dimensional Lie group $\operatorname{Diff}_c(M)$ is $\mathfrak{X}_c(M)$, the space of all compactly supported vector fields with the negative of the usual bracket as Lie bracket.

Theorem 3.7 (Section 6 in [55]). The group $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ of all smooth diffeomorphisms on $\mathbb{R}$, which decay rapidly to the identity, is an open submanifold of the space $\operatorname{Id} + \mathcal{S}(\mathbb{R})$. Composition and inversion are smooth operations. The Lie algebra of the infinite-dimensional Lie group $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ is $\mathfrak{X}_{\mathcal{S}}(\mathbb{R})$, the space of all rapidly decaying vector fields with the negative of the usual bracket as Lie bracket.

3.2.2 Sobolev Spaces on Manifolds

Although Sobolev spaces on $\mathbb{R}$ and $S^1$ could be introduced directly via the Fourier transform on $\mathbb{R}$ and Fourier series on $S^1$, we will treat them in a unified way by regarding $S^1$ as a manifold and working with local charts, which make $S^1$ look like $\mathbb{R}$. As we will argue in the outlook, pending the resolution of a technical difficulty, this will enable us to extend the statement of Theorem 3.1 about vanishing of the geodesic distance to arbitrary manifolds. This should justify the use of additional tools, needed to define Sobolev spaces of fractional order on general Riemannian manifolds of bounded geometry.

For $s > 0$ the Sobolev $H^s$-norm of an $\mathbb{R}$-valued function $f$ on $\mathbb{R}^n$ is defined as
\[
\|f\|_{H^s(\mathbb{R}^n)}^2 = \left\| \mathcal{F}^{-1} (1 + |\xi|^2)^{\frac{s}{2}} \mathcal{F} f \right\|_{L^2(\mathbb{R}^n)}^2 ,
\]
where $\mathcal{F}$ is the Fourier transform
\[
\mathcal{F} f(\xi) = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} e^{-i \langle x, \xi \rangle} f(x) \, dx
\]
and $\xi$ is the independent variable in the frequency domain. Using this norm we obtain the Sobolev space for non-integer values of $s$,
\[
H^s(\mathbb{R}^n) = \{ f \in L^2(\mathbb{R}^n) : \|f\|_{H^s(\mathbb{R}^n)} < \infty \} .
\]
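The effect of the Fourier multiplier $(1 + |\xi|^2)^{s/2}$ is easiest to see in the Fourier series analogue on the circle, where the integral over frequencies becomes a sum. The sketch below is written under that assumption and ignores normalization constants: a function on $S^1$ is represented by a dictionary of hypothetical Fourier coefficients, and the squared norm is taken to be $\sum_k (1 + k^2)^s |c_k|^2$.

```python
def sobolev_norm_sq(coeffs, s):
    """Squared H^s norm sum_k (1 + k^2)^s |c_k|^2 from Fourier coefficients.

    coeffs maps an integer frequency k to the (complex) coefficient c_k.
    Normalization constants are omitted in this sketch.
    """
    return sum((1.0 + k * k) ** s * abs(c) ** 2 for k, c in coeffs.items())

# A single mode of frequency 3 and a function with a slowly decaying tail.
single_mode = {3: 1.0}
slow_tail = {k: 1.0 / k for k in range(1, 2001)}

for s in (0.0, 0.25, 0.5, 1.0):
    print(s, sobolev_norm_sq(single_mode, s), sobolev_norm_sq(slow_tail, s))
```

For the single mode the squared norm equals $10^s$, so the order $s$ controls how strongly high frequencies are penalized. For coefficients decaying like $1/k$ the truncated sums stay bounded as more modes are added when $s < \frac{1}{2}$ and grow without bound when $s \ge \frac{1}{2}$, which gives a first impression of why the order $\frac{1}{2}$ is a natural threshold.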

These spaces are also known under the name Liouville spaces or Bessel potential spaces. To make a connection with other families of function spaces, we note that the spaces $H^s(\mathbb{R}^n)$ coincide with
\[
H^s(\mathbb{R}^n) = B^s_{22}(\mathbb{R}^n) = F^s_{22}(\mathbb{R}^n) ,
\]
the Besov spaces $B^s_{22}(\mathbb{R}^n)$ and spaces of Triebel-Lizorkin type $F^s_{22}(\mathbb{R}^n)$. Definitions of these spaces and an introduction to the general theory of function spaces can be found in [81, Section 1].

Following [81, Section 7.2.1] we will now introduce the spaces $H^s(M)$ on a manifold $M$. If $M$ is not compact we equip $M$ with a Riemannian metric $g$ of bounded geometry, the existence of which is guaranteed by [30]. This means that:

(I) The injectivity radius of $(M, g)$ is positive.

($B_\infty$) Each iterated covariant derivative of the curvature is uniformly $g$-bounded: $\|\nabla^i R\|_g < C_i$ for $i = 0, 1, 2, \dots$.

The following is a compilation of special cases of results collected in [25, Chapter 1], which treats Sobolev spaces only for integer order.

Theorem 3.8 ([46], [75], [24]). If (M, g) satisfies (I) and (B∞) then the following holds:

1. (M, g) is complete.

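2. There exists $\varepsilon_0 > 0$ such that for each $\varepsilon \in (0, \varepsilon_0)$ there is a countable cover of $M$ by geodesic balls $B_\varepsilon(x_\alpha)$ such that the cover of $M$ by the balls $B_{2\varepsilon}(x_\alpha)$ is still uniformly locally finite.

3. Moreover there exists a partition of unity $1 = \sum_\alpha \rho_\alpha$ on $M$ such that $\rho_\alpha \ge 0$, $\rho_\alpha \in C_c^\infty(M)$, $\operatorname{supp}(\rho_\alpha) \subset B_{2\varepsilon}(x_\alpha)$, and $|D_u^\beta \rho_\alpha| < C_\beta$ where $u$ are normal (Riemann exponential) coordinates in $B_{2\varepsilon}(x_\alpha)$.

4. In each $B_{2\varepsilon}(x_\alpha)$, in normal coordinates, we have
\[
|D_u^\beta g_{ij}| < C'_\beta , \quad |D_u^\beta g^{ij}| < C''_\beta , \quad \text{and} \quad |D_u^\beta \Gamma_{ij}^m| < C'''_\beta ,
\]
where all constants are independent of $\alpha$.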

We can define the $H^s$-norm of a function $f$ on $M$ using a partition of unity $(\rho_\alpha)$, normal coordinates in the support of each function $\rho_\alpha$ and the usual $H^s$-norm on the preimage of the coordinate chart,
\begin{align*}
\|f\|_{H^s(M, g)}^2 &= \sum_{\alpha = 0}^\infty \left\| (\rho_\alpha f) \circ \exp_{x_\alpha} \right\|_{H^s(\mathbb{R}^n)}^2 \\
&= \sum_{\alpha = 0}^\infty \left\| \mathcal{F}^{-1} (1 + |\xi|^2)^{\frac{s}{2}} \mathcal{F} \left( (\rho_\alpha f) \circ \exp_{x_\alpha} \right) \right\|_{L^2(\mathbb{R}^n)}^2 .
\end{align*}

If $M$ is compact the sum can be chosen to be finite. Changing the charts or the partition of unity leads to equivalent norms by the theorem above, see [81, Theorem 7.2.3]. For integer $s$ we get norms which are equivalent to the Sobolev norms treated in [25, Chapter 2]. The norms depend on the choice of the Riemannian metric $g$. This dependence is worked out in detail in [25].

For vector fields we use the local trivialization of the tangent bundle that is induced by the coordinate charts and define the norm in each coordinate as above. This leads to a (up to equivalence) well-defined $H^s$-norm on the Lie algebra $\mathfrak{X}_c(M)$.

3.2.3 Sobolev Metrics on Diffc(M)

Given a norm on $\mathfrak{X}_c(M)$ we can use the right-multiplication in the diffeomorphism group $\operatorname{Diff}_c(M)$ to extend this norm to a right-invariant Riemannian metric on $\operatorname{Diff}_c(M)$. In detail, given $\varphi \in \operatorname{Diff}_c(M)$ and $X, Y \in T_\varphi \operatorname{Diff}_c(M)$ we define
\[
G^s_\varphi(X, Y) = \langle X \circ \varphi^{-1}, Y \circ \varphi^{-1} \rangle_{H^s(M)} .
\]
We are interested in questions of vanishing and non-vanishing of geodesic distance. These properties are invariant under changes to equivalent inner products, since equivalent inner products on the Lie algebra
\[
\frac{1}{C} \langle X, X \rangle_1 \le \langle X, X \rangle_2 \le C \langle X, X \rangle_1
\]
imply that the geodesic distances will be equivalent as metrics,
\[
\frac{1}{\sqrt{C}} \operatorname{dist}_1(\varphi, \psi) \le \operatorname{dist}_2(\varphi, \psi) \le \sqrt{C} \operatorname{dist}_1(\varphi, \psi) .
\]

Therefore the ambiguity in the definition of the $H^s$-norm, in particular the dependence on the underlying Riemannian metric and the choice of the partition of unity, is of no concern to us.

3.2.4 Virasoro-Bott Group

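The Virasoro-Bott group is a central extension of $\operatorname{Diff}(S^1)$ (or $\operatorname{Diff}(\mathbb{R})$). To be concrete we shall write all results for the analytically most difficult case $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ with the understanding that mutatis mutandis the results are also true for $\operatorname{Diff}_c(\mathbb{R})$ and $\operatorname{Diff}(S^1)$. Define
\[
c : \operatorname{Diff}_{\mathcal{S}}(\mathbb{R}) \times \operatorname{Diff}_{\mathcal{S}}(\mathbb{R}) \to \mathbb{R},
\qquad
c(\varphi,\psi) := \frac{1}{2}\int \log(\varphi\circ\psi)'\, d\log\psi' = \frac{1}{2}\int \log(\varphi'\circ\psi)\, d\log\psi' .
\]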

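Then $c$ satisfies $c(\varphi,\varphi^{-1}) = 0$, $c(\mathrm{Id},\psi) = 0$, $c(\varphi,\mathrm{Id}) = 0$ and is a smooth group cocycle, called the Bott cocycle:
\[
c(\varphi_2,\varphi_3) - c(\varphi_1\circ\varphi_2,\varphi_3) + c(\varphi_1,\varphi_2\circ\varphi_3) - c(\varphi_1,\varphi_2) = 0 .
\]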
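The cocycle identity can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes a hypothetical family of test diffeomorphisms $x \mapsto x + A e^{-(x-x_0)^2}$ (the names `bump_diffeo`, `compose`, `bott` and `cocycle_defect` are illustrative), uses $d\log\psi' = \psi''/\psi'\,dx$, and truncates the integral to a finite interval on which the integrand has decayed.

```python
import math

def bump_diffeo(amp, center):
    """Return (f, f', f'') for f(x) = x + amp*exp(-(x-center)^2).

    Hypothetical test family; |amp| < 1 keeps f' > 0, so f is a diffeomorphism.
    """
    def f(x):
        return x + amp * math.exp(-(x - center) ** 2)
    def d1(x):
        return 1.0 - 2.0 * amp * (x - center) * math.exp(-(x - center) ** 2)
    def d2(x):
        return amp * (4.0 * (x - center) ** 2 - 2.0) * math.exp(-(x - center) ** 2)
    return f, d1, d2

# The identity map with its first and second derivative.
IDENT = (lambda x: x, lambda x: 1.0, lambda x: 0.0)

def compose(F, G):
    """Chain rule for h = f o g: h' = f'(g) g', h'' = f''(g) g'^2 + f'(g) g''."""
    f, f1, f2 = F
    g, g1, g2 = G
    return (lambda x: f(g(x)),
            lambda x: f1(g(x)) * g1(x),
            lambda x: f2(g(x)) * g1(x) ** 2 + f1(g(x)) * g2(x))

def simpson(h, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (hi - lo) / n
    total = h(lo) + h(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(lo + i * step)
    return total * step / 3.0

def bott(F, G, half_width=12.0):
    """c(phi, psi) = 1/2 * int log(phi'(psi(x))) * psi''(x)/psi'(x) dx on [-L, L]."""
    _, f1, _ = F
    g, g1, g2 = G
    return 0.5 * simpson(lambda x: math.log(f1(g(x))) * g2(x) / g1(x),
                         -half_width, half_width)

def cocycle_defect(P1, P2, P3):
    """Left-hand side of the cocycle identity; should vanish up to quadrature error."""
    return (bott(P2, P3) - bott(compose(P1, P2), P3)
            + bott(P1, compose(P2, P3)) - bott(P1, P2))
```

For three such test maps the defect should be zero up to quadrature error, while the individual values $c(\varphi_i,\varphi_j)$ are in general non-zero.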

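The corresponding central extension group $\mathbb{R} \times_c \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$, called the Virasoro-Bott group, is a trivial $\mathbb{R}$-bundle $\mathbb{R} \times \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ that becomes a regular Lie group relative to the operations
\[
\begin{pmatrix}\varphi\\ \alpha\end{pmatrix}\begin{pmatrix}\psi\\ \beta\end{pmatrix}
= \begin{pmatrix}\varphi\circ\psi\\ \alpha+\beta+c(\varphi,\psi)\end{pmatrix},
\qquad
\begin{pmatrix}\varphi\\ \alpha\end{pmatrix}^{-1} = \begin{pmatrix}\varphi^{-1}\\ -\alpha\end{pmatrix},
\]
with $\varphi,\psi \in \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ and $\alpha,\beta \in \mathbb{R}$. The right-invariant velocity of a curve $\begin{pmatrix}\varphi(t)\\ \alpha(t)\end{pmatrix} \in \mathbb{R} \times_c \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ is given by
\[
\begin{pmatrix}u(t)\\ a(t)\end{pmatrix}
= \begin{pmatrix}\partial_t\varphi(t)\circ\varphi(t)^{-1}\\[4pt] \partial_t\alpha(t) - \displaystyle\int_{\mathbb{R}} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2}\,dx\end{pmatrix} .
\]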

For a detailed treatment of the Virasoro-Bott group one should consult [44], [55], [59] or the book [34].

3.3 Diffeomorphism Groups

The aim of this section is to provide a partial answer to the question: when does the geodesic distance on $\operatorname{Diff}_c(M)$, equipped with a right-invariant Sobolev metric of order $s \ge 0$, vanish? The main statement of the section is the following theorem.

Theorem 3.9 (Geodesic distance).

1. The geodesic distance for the fractional order Sobolev-type $H^s$-metric vanishes for

• $0 \le s < \frac{1}{2}$ on $\operatorname{Diff}_c(\mathbb{R})$,
• $0 \le s \le \frac{1}{2}$ on $\operatorname{Diff}(S^1)$.

2. The geodesic distance is positive for

• $\frac{1}{2} < s$ on $\operatorname{Diff}_c(\mathbb{R})$ and $\operatorname{Diff}(S^1)$.

In order to prove this theorem we first prove three lemmas for the case $\dim(M) = 1$, i.e. $M = \mathbb{R}$ or $M = S^1$. The first of these, Lemma 3.10, treats the most difficult case $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ with the simplest metric, i.e. the $L^2$-metric ($s = 0$). In this case we can construct an explicit path between any two diffeomorphisms. The second lemma, Lemma 3.11, restricts itself to compactly supported diffeomorphisms $\operatorname{Diff}_c(\mathbb{R})$ but allows the Sobolev order to vary in the interval $0 \le s < \frac{1}{2}$. The third result, Lemma 3.13, works only for the compact manifold $S^1$, but allows the order $s = \frac{1}{2}$. This section relies heavily on the ideas from [56].

Lemma 3.10. Any two diffeomorphisms in $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ can be connected by a path with arbitrarily short length for the right-invariant $L^2$-metric.

Proof. We show that any rapidly decreasing diffeomorphism can be connected to the identity by an arbitrarily short path. We will write this diffeomorphism as $\mathrm{Id} + g$, where $g \in \mathcal{S}(\mathbb{R})$ is a rapidly decreasing function with $g' > -1$. For $\lambda = 1 - \varepsilon < 1$ we define

\[
\varphi(t,x) = x + \max(0, \min(t - \lambda x,\, g(x))) - \max(0, \min(t + \lambda x,\, -g(x))) .
\]

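This is a (non-smooth) path defined for $t \in (-\infty, \infty)$ connecting the identity in $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ with the diffeomorphism $\mathrm{Id} + g$.

Figure 3.1: Sketch of the path $\varphi(t, x)$ between $\mathrm{Id}$ and $\mathrm{Id} + g$.

We define
\[
\psi(t,x) = \varphi(\tan(t), x) \star G_\varepsilon(t,x) ,
\]
where $G_\varepsilon(t,x) = \frac{1}{\varepsilon^2} G_1(\frac{t}{\varepsilon}, \frac{x}{\varepsilon})$ is a smoothing kernel with $\operatorname{supp}(G_\varepsilon) \subseteq B_\varepsilon(0)$ and $\iint G_\varepsilon \,dx\,dt = 1$. Thus $\psi$ is a smooth path defined on the finite interval $-\frac{\pi}{2} < t < \frac{\pi}{2}$ connecting the identity in $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ with a diffeomorphism arbitrarily close to $\mathrm{Id} + g$ for $\varepsilon$ small. See Figure 3.1 for an illustration of this path.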
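The behaviour of the unsmoothed path can be explored with a few lines of code. This is a minimal sketch under stated assumptions: the profile `g` is a hypothetical Gaussian (the text only requires a rapidly decreasing $g$ with $g' > -1$), and `slope_in_x` is an illustrative finite-difference helper.

```python
import math

def g(x):
    """Hypothetical rapidly decreasing profile with g' > -1 (not from the text)."""
    return 0.5 * math.exp(-x * x)

def phi(t, x, lam):
    """Non-smooth path phi(t, x) from the proof of Lemma 3.10."""
    return (x + max(0.0, min(t - lam * x, g(x)))
              - max(0.0, min(t + lam * x, -g(x))))

def slope_in_x(t, x, lam, h=1e-4):
    """Forward-difference approximation of the x-derivative of phi."""
    return (phi(t, x + h, lam) - phi(t, x, lam)) / h
```

For very negative $t$ the path is the identity and for very large $t$ it equals $\mathrm{Id} + g$. In the region $0 < t - \lambda x < g(x)$, where a point is actually moving, one has $\varphi_t = 1$ and $\varphi_x = 1 - \lambda = \varepsilon$, which is the mechanism that makes the integrand $\varphi_t^2 \varphi_x$ small.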

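The $L^2$-energy of $\psi$ is

\[
E(\psi) = \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \int_{\mathbb{R}} (\psi_t \circ \psi^{-1})^2 \,dx\,dt
= \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \int_{\mathbb{R}} \psi_t^2\, \psi_x \,dx\,dt .
\]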

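We have
\[
\partial_a \max(0, \min(a,b)) = \mathbb{1}_{0\le a\le b}, \qquad \partial_b \max(0, \min(a,b)) = \mathbb{1}_{0\le b\le a},
\]
and therefore
\[
\begin{aligned}
\psi_x(t,x) &= \varphi_x(\tan(t), x) \star G_\varepsilon \\
&= \bigl(1 - \lambda \mathbb{1}_{0\le \tan(t)-\lambda x\le g(x)} + g'(x)\,\mathbb{1}_{0\le g(x)\le \tan(t)-\lambda x} \\
&\qquad - \lambda \mathbb{1}_{0\le \tan(t)+\lambda x\le -g(x)} + g'(x)\,\mathbb{1}_{0\le -g(x)\le \tan(t)+\lambda x}\bigr) \star G_\varepsilon , \\
\psi_t(t,x) &= \bigl((1+\tan(t)^2)\,\varphi_t(\tan(t), x)\bigr) \star G_\varepsilon \\
&= \bigl((1+\tan(t)^2)\,(\mathbb{1}_{0\le \tan(t)-\lambda x\le g(x)} - \mathbb{1}_{0\le \tan(t)+\lambda x\le -g(x)})\bigr) \star G_\varepsilon .
\end{aligned}
\]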

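Note that these functions have disjoint support when $\varepsilon = 0$, $\lambda = 1 - \varepsilon = 1$.

Claim. The mappings $\varepsilon \mapsto \psi_t$ and $\varepsilon \mapsto (\psi_x - 1)$ are continuous into each $L^p$-space for $p \ge 1$.

To prove the claim we calculate
\[
\begin{aligned}
\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \int_{\mathbb{R}} \bigl|(1+\tan(t)^2)\,\varphi_t(\tan(t),x)\bigr|^p \,dx\,dt
&= \iint_{\mathbb{R}^2} |\varphi_t(t,x)|^p (1+t^2)^{p-1} \,dx\,dt \\
&= \iint_{\mathbb{R}^2} \bigl(\mathbb{1}_{0\le t-\lambda x\le g(x)} + \mathbb{1}_{0\le t+\lambda x\le -g(x)}\bigr)(1+t^2)^{p-1} \,dx\,dt \\
&= \int_{g(x)\ge 0} \int_{\lambda x}^{\lambda x+g(x)} (1+t^2)^{p-1} \,dt\,dx + \int_{g(x)<0} \int_{\lambda x+g(x)}^{\lambda x} (1+t^2)^{p-1} \,dt\,dx \\
&= \int_{\mathbb{R}} \Bigl| F(t)\big|_{t=\lambda x}^{t=\lambda x+g(x)} \Bigr| \,dx = \int_{\mathbb{R}} |F(\lambda x+g(x)) - F(\lambda x)| \,dx ,
\end{aligned}
\]
where $F$ denotes an antiderivative of $(1+t^2)^{p-1}$ and $F(\lambda x + g(x)) - F(\lambda x)$ is a polynomial without constant term in $g(x)$ with coefficients also powers of $\lambda x$. Since $g$ is rapidly decreasing this shows that $\|(1+\tan(t)^2)\varphi_t(\tan(t),x)\|_p$ depends continuously on $\varepsilon$. Furthermore the sequence $(1+\tan(t)^2)\varphi_t(\tan(t),x)$ converges almost everywhere for $\varepsilon \to 0$, thus it also converges in measure. By the theorem of Vitali this implies convergence in $L^p$, see for example [70, Theorem 16.6]. Convolution with $G_\varepsilon$ acts as approximate unit in each $L^p$, which proves the claim for $\psi_t$. For $\psi_x - 1$ the proof is similar.

Finally the energy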

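\[
E(\psi) = \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \int_{\mathbb{R}} \psi_t^2\,\psi_x \,dx\,dt
= \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \int_{\mathbb{R}} \psi_t^2\,(\psi_x - 1) \,dx\,dt + \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \int_{\mathbb{R}} \psi_t^2 \,dx\,dt ,
\]
when viewed as a mapping on $L^4 \times L^2$ (first summand) and on $L^2$ (second summand), is continuous in $\varepsilon$. It also vanishes at $\varepsilon = 0$ since then $\psi_x$ and $\psi_t$ have disjoint support. To pass from the energy to the length we use the Cauchy-Schwarz inequality $\operatorname{Len}(\psi)^2 \le \pi E(\psi)$ and note that all the paths are defined on the interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$, which does not depend on $\varepsilon$.

Finally we note that $\psi(\frac{\pi}{2}) = (\mathrm{Id} + g) \star G_\varepsilon$ is arbitrarily close to $\mathrm{Id} + g$ and hence the affine path $t \mapsto t(\mathrm{Id} + g) + (1 - t)(\mathrm{Id} + g) \star G_\varepsilon$ will have arbitrarily small length for $\varepsilon$ small. This shows that any diffeomorphism $\mathrm{Id} + g$ can be joined to $\mathrm{Id}$ by a path of arbitrarily short $L^2$-length.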

Figure 3.2: Sketch of the vector field $u(t, x)$. The gray area represents the support of $u$ and one integral curve of $u(\cdot, x)$ is shown.

Lemma 3.11. Let $\varphi \in \operatorname{Diff}_c(\mathbb{R})$ be a diffeomorphism satisfying $\varphi(x) \ge x$. For $0 \le s < \frac{1}{2}$ we can connect $\varphi$ to the identity by a path of arbitrarily short length with respect to the right-invariant $H^s$-metric.

Proof. The idea of the proof is as follows. Given the diffeomorphism $\varphi$ with $\varphi(x) \ge x$ we will construct a family of vector fields of the form
\[
u(t,x) = \mathbb{1}_{[g(t),f(t)]} \star G_\varepsilon(x) ,
\]
such that their flows $\varphi(t,x)$ satisfy $\varphi(0,x) = x$ and $\varphi(T,x) = \varphi(x)$. In a second step we will show that when $\|f - g\|_\infty$ is sufficiently small, so is the $H^s$-length of the path $\varphi(t,x)$. Here $G_\varepsilon(x) = \frac{1}{\varepsilon} G_1(\frac{x}{\varepsilon})$ is a smoothing kernel, with $G_1$ a smooth bump function. In the proof of Lemma 3.10 we defined the path directly on the diffeomorphisms. In this proof, we define it on the vector fields, since that makes it easier to estimate the $H^s$-norm of the path. It is however more difficult to control the endpoint of the path. So let us construct the vector field $u(t,x)$. If we could disregard continuity, we could choose an angle $\alpha > \frac{\pi}{4}$ and set
\[
f(t) = t \tan\alpha , \qquad g^{-1}(x) = x - (1 - \cot\alpha)\,\varphi^{-1}(x) .
\]

See Figure 3.2 for an illustration of the vector field $u$ and its flow. The flow $\varphi(t,x)$ of the unsmoothed vector field $u(t,x) = \mathbb{1}_{[g(t),f(t)]}(x)$ satisfies
\[
\varphi(t,x) = x + \int_0^t u(s, \varphi(s,x)) \,ds .
\]
In this case we can write down the explicit solution, which is given by
\[
\varphi(t,x) = \begin{cases}
x, & t < x\cot\alpha , \\
t + (1-\cot\alpha)\,x, & x\cot\alpha \le t \le \varphi(x) - (1-\cot\alpha)\,x , \\
\varphi(x), & t > \varphi(x) - (1-\cot\alpha)\,x ,
\end{cases}
\]
and we see that it satisfies the boundary conditions. We also have the relation
\[
f^{-1}(x) - g^{-1}(x) = -(1-\cot\alpha)\,(x - \varphi^{-1}(x)) ,
\]
which implies that by choosing $\alpha$ sufficiently close to $\frac{\pi}{4}$ we can make $\|f - g\|_\infty$ as small as necessary. By replacing $u$ with the smoothed vector field $\mathbb{1}_{[g(t),f(t)]} \star G_\varepsilon(x)$ we change the endpoint of the flow. However, by changing $g$ suitably we can regain control of the endpoint. The necessary changes will be of order $\varepsilon$ and hence we do not lose control over the difference $\|f - g\|_\infty$, which will be necessary later on.

Now we compute the norm of this vector field. Let $u(t,x)$ have the form $u(t,x) = \mathbb{1}_{[g(t),f(t)]} \star G_\varepsilon(x)$, where $f(t)$ and $g(t)$ are smooth functions which coincide off a bounded interval. To compute the $H^s$-norm of $u$, we first need to compute its Fourier transform

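\[
\begin{aligned}
\mathcal{F}\mathbb{1}_{[g(t),f(t)]}(\xi)
&= \frac{1}{\sqrt{2\pi}} \int_{g(t)}^{f(t)} e^{i\xi x} \,dx
= \frac{1}{\sqrt{2\pi}} \left.\frac{e^{i\xi x}}{i\xi}\right|_{x=g(t)}^{x=f(t)}
= \frac{1}{\sqrt{2\pi}}\, \frac{e^{i\xi f(t)} - e^{i\xi g(t)}}{i\xi} \\
&= \frac{1}{\sqrt{2\pi}}\, \frac{1}{\xi}\, e^{i\xi\frac{f(t)+g(t)}{2}}\, \frac{e^{i\xi\frac{f(t)-g(t)}{2}} - e^{-i\xi\frac{f(t)-g(t)}{2}}}{i}
= \frac{2}{\sqrt{2\pi}\,\xi}\, e^{i\xi\frac{f(t)+g(t)}{2}} \sin\Bigl(\xi\,\frac{f(t)-g(t)}{2}\Bigr) .
\end{aligned}
\]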

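Setting $a = \frac{f(t)-g(t)}{2}$ we can now compute the norm
\[
\begin{aligned}
\bigl\| |\xi|^s \mathcal{F}u \bigr\|^2_{L^2}
&= \bigl\| |\xi|^s\, \mathcal{F}\mathbb{1}_{[g(t),f(t)]}\, \mathcal{F}G_\varepsilon \bigr\|^2_{L^2}
= \frac{2}{\pi} \int_{\mathbb{R}} \frac{1}{|\xi|^{2-2s}} \sin^2(a\xi)\, \bigl(\mathcal{F}G_1(\varepsilon\xi)\bigr)^2 \,d\xi \\
&\le \frac{2}{\pi}\, \|\mathcal{F}G_1\|^2_\infty \int_{\mathbb{R}} \frac{\sin^2(a\xi)}{|\xi|^{2-2s}} \,d\xi
\le \frac{2}{\pi}\, \|\mathcal{F}G_1\|^2_\infty\, a^{1-2s} \int_{\mathbb{R}} \frac{\sin^2(\xi)}{|\xi|^{2-2s}} \,d\xi .
\end{aligned}
\]
We get $\|u\|^2_{L^2}$ by setting $s = 0$ in the above calculation. We see that for $s < \frac{1}{2}$ the $H^s$-norm of $u(t,\cdot)$ is bounded by
\[
\|u(t,\cdot)\|^2_{H^s} \le C_1 |f(t) - g(t)| + C_2 |f(t) - g(t)|^{1-2s} .
\]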
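The scaling in $a$ used in the last step can be checked numerically. The sketch below is an illustration under simplifying assumptions: it drops the smoothing factor $\mathcal{F}G_1(\varepsilon\xi)$, truncates the integral at a finite cutoff, and uses a plain midpoint rule; the name `weighted_sine_integral` is illustrative. For $s = 0$ the value should approach $2a = f - g$, which is the squared $L^2$-norm of the indicator function, in agreement with Plancherel.

```python
import math

def weighted_sine_integral(a, s, cutoff=2000.0, n=200_000):
    """Midpoint approximation of
    (2/pi) * int_{-cutoff}^{cutoff} sin^2(a*xi) / |xi|^(2-2s) d(xi)."""
    step = cutoff / n
    total = 0.0
    for k in range(n):
        xi = (k + 0.5) * step
        total += math.sin(a * xi) ** 2 / xi ** (2.0 - 2.0 * s)
    # The integrand is even in xi, hence the factor 2 for the negative half-line.
    return (2.0 / math.pi) * 2.0 * total * step
```

Substituting $\eta = a\xi$ shows that the truncated integral with parameter $a$ and cutoff $R$ equals $a^{1-2s}$ times the integral with parameter $1$ and cutoff $aR$, so the factor $a^{1-2s}$ tends to $0$ with $a$ exactly when $s < \frac{1}{2}$.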

Now, putting everything together we get

\[
\operatorname{Len}(\varphi)^2 = \Bigl(\int_0^T \|u(t,\cdot)\|_{H^s} \,dt\Bigr)^2 \le T \int_0^T \|u(t,\cdot)\|^2_{H^s} \,dt
\le T^2 \bigl( C_1 \|f - g\|_\infty + C_2 \|f - g\|_\infty^{1-2s} \bigr) .
\]

Since the geodesic length is defined as the infimum over all paths and since we have shown in the first part of the proof that by choosing the angle $\alpha$ and the smoothing factor $\varepsilon$ we can control the norm $\|f - g\|_\infty$, the proof is complete.
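The explicit flow of the unsmoothed indicator field used in this proof can be written out directly. The sketch below is illustrative only: `target` is a hypothetical endpoint map with `target(x) >= x` (a Gaussian bump, so not compactly supported as the lemma requires), and the function names are assumptions.

```python
import math

def target(x):
    """Hypothetical endpoint map with target(x) >= x (a bump, not compactly supported)."""
    return x + 0.4 * math.exp(-(x - 2.0) ** 2)

def flow(t, x, alpha):
    """Explicit flow of the unsmoothed indicator field for pi/4 < alpha < pi/2."""
    cot = 1.0 / math.tan(alpha)
    t_enter = x * cot                       # the point is at rest before this time
    t_leave = target(x) - (1.0 - cot) * x   # and at rest again after this time
    if t < t_enter:
        return x
    if t <= t_leave:
        return t + (1.0 - cot) * x          # unit speed while inside the support
    return target(x)
```

Each point moves with unit speed for a time `target(x) - x`, so the time a point spends inside the support of the vector field equals its displacement.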

For the Sobolev metric of order $s = \frac{1}{2}$ we will restrict ourselves to the circle $S^1$. Using a construction similar to that in Lemma 3.11 we will construct arbitrarily short paths from the identity to the shift $\varphi(x) = x + 1$. The following lemma supplies us with functions that have small $H^{1/2}$-norm and large $L^\infty$-norm at the same time. We cannot use step functions as in the above proof, because they do not lie in $H^{1/2}(S^1)$.

Lemma 3.12. Let $\psi$ be a non-negative, compactly supported $C^\infty$-function on $\mathbb{R}$ and $(b_j)_{j=0}^\infty$ a non-increasing sequence of non-negative numbers, with $\sum_{j=0}^\infty b_j^2 < \infty$. Then the $H^{1/2}$-norm of the function $f(x) := \sum_{j=0}^\infty b_j \psi(2^j x)$ is bounded by
\[
\|f\|^2_{H^{1/2}} \le C \sum_{j=0}^\infty b_j^2 .
\]
Proof. This result is shown in step 4 of the proof of Theorem 13.2 in [82].
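To see why such sums give functions with small $H^{1/2}$-norm but large sup-norm, consider flat coefficients $b_j = \frac{1}{N}$ for $j < N$ and $b_j = 0$ otherwise. The sketch below assumes a hypothetical bump function normalised so that its value at $0$ is $1$; the names `bump`, `dyadic_sum` and `flat_coeffs` are illustrative. It only evaluates the right-hand side $\sum_j b_j^2$ of the lemma and the value $f(0)$, not the $H^{1/2}$-norm itself.

```python
import math

def bump(x):
    """Hypothetical smooth bump supported in [-1, 1], normalised so that bump(0) = 1."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - x * x))

def dyadic_sum(x, coeffs):
    """f(x) = sum_j b_j * psi(2^j x) for a finite coefficient list b_0, ..., b_{N-1}."""
    return sum(b * bump(2.0 ** j * x) for j, b in enumerate(coeffs))

def flat_coeffs(n_terms):
    """b_j = 1/N for j < N: non-increasing, with sum of squares equal to 1/N."""
    return [1.0 / n_terms] * n_terms
```

With these coefficients $\sum_j b_j^2 = \frac{1}{N} \to 0$ as $N \to \infty$, while $f(0) = 1$ for every $N$.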

The main difference between Lemma 3.11 and the following Lemma 3.13 is that on $S^1$ we do not have to worry about the diffeomorphisms having compact support. On $\mathbb{R}$ the diffeomorphism $\varphi(x) = x + 1$ is not an element of $\operatorname{Diff}_c(\mathbb{R})$ and we would have to replace it by $\varphi(x) = x + c(x)$, where $c(x)$ is some function with compact support. This makes working on the circle much easier.

Lemma 3.13. Let $\varphi \in \operatorname{Diff}(S^1)$ be the shift by 1, i.e. $\varphi(x) = x + 1$. Then we can connect $\varphi$ to the identity by a path of arbitrarily short length with respect to the right-invariant $H^{1/2}$-metric.

Proof. We will prove the lemma by constructing a sequence of vector fields with arbitrarily small $H^{1/2}$-norms, whose flows at time $t = T_{\mathrm{end}}$ will be $\varphi(T_{\mathrm{end}}, x) = x + 1$. First we apply Lemma 3.12 with $b_j = \frac{1}{N}$ for $j = 0, \dots, N-1$ and $0$ otherwise. By doing so we obtain $\|f\|^2_{H^{1/2}} \le C\frac{1}{N}$, while $\|f\|_\infty = 1$. For the basic function $\psi$ we choose the bump function $\psi(x) = e^{-1/(1-|x|^2)}$ for $|x| < 1$, extended by zero and rescaled if necessary so that $\psi(0) = 1$. Note that $\operatorname{supp}(\psi) \subseteq [-1, 1]$ and that $\psi$ is concave in a neighborhood around $0$. Since each $f$ is a finite sum of $\psi(2^j x)$, these properties hold also for $f$. We define the vector field

\[
u(t,x) = \lambda f(t - x) \quad\text{with } 0 < \lambda < 1
\]
for $t \in [0, T_{\mathrm{end}}]$, where $T_{\mathrm{end}}$ will be specified later. The energy of this path is bounded by

\[
E(u) = \int_0^{T_{\mathrm{end}}} \|u(t,\cdot)\|^2_{H^{1/2}} \,dt \le C\,T_{\mathrm{end}}\,\frac{1}{N}
\]
and hence can be made as small as necessary and the same holds for the length
\[
\operatorname{Len}(u)^2 \le T_{\mathrm{end}}\,E(u) \le C\,T_{\mathrm{end}}^2\,\frac{1}{N} .
\]

It remains to show that the flow of this vector field at time $t = T_{\mathrm{end}}$ is indeed $\varphi(T_{\mathrm{end}}, x) = x + 1$.

We do this in several steps. First we consider this vector field defined on all of $\mathbb{R}$ with time going from $-\infty$ to $\infty$. The initial condition for the flow is $\varphi(-\infty, x) = x$. Since $u(t,x)$ has compact support in $x$, this does not cause any analytical problems. As long as $\lambda < 1$ each integral curve of $u$ will leave the support of $u$ after finite time. Therefore we can consider $\varphi(\infty, x)$ to be the endpoint of the flow. Next we will establish that $\varphi(\infty, x) = x + S$ is a uniform shift, that is, independent of $x$. Then we show that by appropriately choosing $\lambda$ we can control the amount of shifting; in particular we can always obtain $S = 1$. Then we find bounds for the time each integral curve spends in the support of $u$. By showing that this time depends only on $S$, but not on the specific form of $f$ or $\psi$, we will know that $T_{\mathrm{end}}$ does not grow larger as we let $N \to \infty$. In the last step we go back to the circle, define $T_{\mathrm{end}}$, start the flow at time $t = 0$ and show that the resulting flow is a shift by 1 at time $T_{\mathrm{end}}$. This will conclude the proof.

The flow $\varphi(t,x)$ of $u(t,x)$ is given by the equation
\[
\partial_t \varphi(t,x) = u(t, \varphi(t,x))
\]
with the initial condition $\varphi(-\infty, x) = x$. Define the function $a_x(t) = t - \varphi(t,x)$. Because $\partial_t a_x(t) = 1 - \lambda f(a_x(t)) > 0$ the function $a_x(\cdot)$ is a diffeomorphism in $t$ for each fixed $x$. Since $\operatorname{supp}(f) \subseteq [-1, 1]$, we have $\partial_t \varphi(t,x) \ne 0$ only for $t \in [a_x^{-1}(-1), a_x^{-1}(1)]$. Let us define $T_{\mathrm{shift}} = a_x^{-1}(1) - a_x^{-1}(-1)$ to be the time necessary for the flow to pass through the vector field $u$.

Claim A. $T_{\mathrm{shift}}$ is independent of $x$.

This follows from the following symmetries of the flow $\varphi(t,x)$ and the map $a_x(t)$. We have
\[
\varphi(t,x) = \varphi(t - (x - y), y) + x - y
\]
and
\[
a_x(t) = a_y(t - (x - y)) .
\]
To prove the first identity assume $x > y$ and note that at time $t_0 = y - 1$ we have
\[
\varphi(y - 1, x) = x = x + \varphi(y - 1 - (x - y), y) - y ,
\]
since at time $y - 1 - (x - y)$ the flow $\varphi(t, y)$ still equals $y$. Now differentiate to see that both functions satisfy the same ODE. The second identity is an immediate consequence of the first one. To prove the claim that $T_{\mathrm{shift}}$ is independent of $x$, we will show that
\[
\partial_x a_x^{-1}(t) = 1 .
\]
Start with $a_x(a_x^{-1}(t)) = t$, use the symmetry relation $a_0(a_x^{-1}(t) - x) = t$ and differentiate with respect to $x$ to obtain
\[
\partial_t a_0(a_x^{-1}(t) - x)\,(\partial_x a_x^{-1}(t) - 1) = 0 ,
\]
which concludes the proof of the claim.

For each $x$, the flow of the vector field performs a shift over the time interval $[a_x^{-1}(-1), a_x^{-1}(1)]$ given by

\[
\varphi(\infty, x) = x + \int_{a_x^{-1}(-1)}^{a_x^{-1}(1)} \lambda f(t - \varphi(t,x)) \,dt = x + \int_{-1}^{1} \frac{\lambda f(t)}{1 - \lambda f(t)} \,dt .
\]
Define
\[
I(\lambda) = \int_{-1}^{1} \frac{\lambda f(t)}{1 - \lambda f(t)} \,dt
\]
to be the amount of shifting that is taking place as a function of $\lambda$. We claim that we can always choose $\lambda$ close enough to 1 to obtain any shift necessary.

Claim B. As $\lambda \to 1$ the shift $I(\lambda) \to \infty$.

Each $f$ that we choose in our construction is concave in some small neighborhood around $0$. So choose $a > 0$ such that for $t \in [0, a]$ we have $f(t) \ge \frac{1}{2}$ and $f(t) \ge 1 - ct$ for some constant $c$. Then we can estimate the integral by

\[
I(\lambda) \ge \int_0^a \frac{\lambda f(t)}{1 - \lambda f(t)} \,dt \ge \frac{\lambda}{2} \int_0^a \frac{1}{1 - \lambda + \lambda c t} \,dt
\ge \frac{\lambda}{2} \log(1 - \lambda + \lambda c t)\Big|_{t=0}^{t=a}
\ge \frac{\lambda}{2} \log\Bigl(1 + \frac{\lambda c a}{1 - \lambda}\Bigr) .
\]
From this we can see that $I(\lambda)$ grows towards infinity as $\lambda \to 1$.

Claim C. We have control over the time necessary to induce a shift $I(\lambda)$, via
\[
T_{\mathrm{shift}} = 2 + I(\lambda) .
\]

Figure 3.3: Sketch of the vector field $u(t, x)$. The gray area represents the support of $u$ and the blue curves are integral curves of $u(\cdot, x)$.

We had above $T_{\mathrm{shift}} = a_x^{-1}(1) - a_x^{-1}(-1)$ and also $\partial_t a_x^{-1}(t) = \frac{1}{1 - \lambda f(t)}$. Hence

\[
T_{\mathrm{shift}} = \int_{-1}^{1} \frac{1}{1 - \lambda f(t)} \,dt = \int_{-1}^{1} 1 + \frac{\lambda f(t)}{1 - \lambda f(t)} \,dt = 2 + I(\lambda) .
\]
This concludes the proof of the claim.

Now we choose $\lambda$ such that $I(\lambda) = 1$ and we define the flow $\varphi(t,x)$ on the circle with period $2\pi$ for time from $0$ to $T_{\mathrm{end}} = T_{\mathrm{shift}} + 2\pi - 2$. Note, in particular, that $T_{\mathrm{end}}$ depends only on the amount of shift and not on the choice of $f$ or $N$.

Claim D. The endpoint of the resulting flow is a constant shift
\[
\varphi(T_{\mathrm{end}}, x) = x + 1 .
\]

We have to consider two cases. First take a point $x$ such that $x \notin \operatorname{supp}(f)$. W.l.o.g. we assume that $\operatorname{supp}(f)$ is the interval $[0, 2]$ starting at $0$, which implies that in this case $x > 2$. Then $x$ meets the vector field at time $t_0 = x - 2$ and leaves it again at time $t_1 = x - 2 + T_{\mathrm{shift}}$ after being shifted by one. It would meet the vector field again at time $t_2 = x - 2 + T_{\mathrm{shift}} + 2\pi - 2$, but because $x - 2 > 0$ we have $t_2 > T_{\mathrm{end}}$ and hence the point does not meet the vector field again.

Now take a point $0 \le x \le 2$. This point starts in the vector field, leaves it at some time $t_0 < T_{\mathrm{shift}}$ and meets the vector field again at time $t_0 + 2\pi - 2$. It then stays in the vector field until the end time $T_{\mathrm{end}}$, since $T_{\mathrm{end}} < t_0 + 2\pi - 2 + T_{\mathrm{shift}}$. Altogether the point has spent time $t_0 + T_{\mathrm{end}} - (t_0 + 2\pi - 2) = T_{\mathrm{shift}}$ in the vector field, so it has also been shifted by one. This concludes the proof.
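The quantities $I(\lambda)$ and $T_{\mathrm{shift}}$ from Claims B and C can be explored numerically. The following is a sketch under assumptions: `profile` is a hypothetical choice of $f$ (smooth, supported in $[-1,1]$, with maximum $f(0) = 1$), the integrals are approximated by Simpson's rule, and $\lambda$ with $I(\lambda) = 1$ is found by bisection, which relies on $I$ being increasing in $\lambda$.

```python
import math

def profile(t):
    """Hypothetical f: smooth, supported in [-1, 1], with maximum f(0) = 1."""
    u = 1.0 - t * t
    return math.exp(1.0 - 1.0 / u) if u > 0.0 else 0.0

def simpson(h, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (hi - lo) / n
    total = h(lo) + h(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(lo + i * step)
    return total * step / 3.0

def shift(lam):
    """I(lambda) = int_{-1}^{1} lam*f / (1 - lam*f) dt."""
    return simpson(lambda t: lam * profile(t) / (1.0 - lam * profile(t)), -1.0, 1.0)

def passage_time(lam):
    """T_shift = int_{-1}^{1} 1 / (1 - lam*f) dt."""
    return simpson(lambda t: 1.0 / (1.0 - lam * profile(t)), -1.0, 1.0)

def solve_lambda(target_shift, lo=0.0, hi=0.999, steps=60):
    """Bisection for I(lambda) = target_shift; assumes I(lo) < target_shift < I(hi)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if shift(mid) < target_shift:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With `lam = solve_lambda(1.0)` one obtains a value of $\lambda$ for which the shift equals one, and `passage_time(lam)` agrees with `2 + shift(lam)` up to rounding, independently of how the profile was built.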

Before we proceed with the proof of Theorem 3.9, we cite two general results about Sobolev spaces, which will be needed in the proof. The first result states that pointwise multiplication with a sufficiently smooth function is a well-defined operation.

Lemma 3.14 (See Theorem 4.2.2 in [81]). Let $s > 0$ and $g \in C_c^\infty(\mathbb{R}^n)$, a smooth function with compact support. Then multiplication $f \mapsto gf$ is a bounded map of $H^s(\mathbb{R}^n)$ into itself.

We are also allowed to compose with diffeomorphisms from the right and the resulting map is bounded.

Lemma 3.15 (See Theorem 4.3.2 in [81]). Let $s > 0$ and $\varphi \in \operatorname{Diff}_c(\mathbb{R}^n)$ be a diffeomorphism which equals the identity off some compact set. Then composition $f \mapsto f \circ \varphi$ is an isomorphic mapping of $H^s(\mathbb{R}^n)$ onto itself.

Now we have all the ingredients to prove the main theorem.

Proof of Theorem 3.9. To prove vanishing of the geodesic distance we will follow the idea of [56], where it was proven that the geodesic distance vanishes on $\operatorname{Diff}_c(M)$ for the right-invariant $L^2$-metric. To be precise, we only consider the connected component $\operatorname{Diff}_0(M)$ of $\mathrm{Id}$, i.e. those diffeomorphisms of $\operatorname{Diff}_c(M)$ for which there exists at least one path joining them to the identity. Let us denote by $\operatorname{Diff}_c(M)^{L=0}$ the set of all diffeomorphisms $\varphi$ that can be reached from the identity by curves of arbitrarily short length, i.e., for each $\varepsilon > 0$ there exists a curve from the identity to $\varphi$ with length smaller than $\varepsilon$.

In the following we will show that $\operatorname{Diff}_c(M)^{L=0}$ is a non-trivial normal subgroup of $\operatorname{Diff}_c(M)$ (and $\operatorname{Diff}_0(M)$). It was shown in [26, 80, 52, 53] that $\operatorname{Diff}_0(M)$ is a simple group, i.e. it has no non-trivial normal subgroups. From this it follows that $\operatorname{Diff}_c(M)^{L=0}$ is the whole connected component $\operatorname{Diff}_0(M)$. In other words, every diffeomorphism that can be connected to the identity can be connected via a path of arbitrarily short length.

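Claim A. $\operatorname{Diff}_c(M)^{L=0}$ is a normal subgroup of $\operatorname{Diff}_c(M)$.

Given a diffeomorphism $\psi \in \operatorname{Diff}_c(M)$, we can choose a partition of unity $\tau_j$ such that normal coordinates centered at $x_j \in M$ are defined on $\operatorname{supp}(\tau_j)$ and such that normal coordinates centered at $\psi(x_j)$ are defined on the set $\psi(\operatorname{supp}(\tau_j))$. Then we can define $\psi_j = \exp_{\psi(x_j)}^{-1} \circ\, \psi \circ \exp_{x_j}$. For $\varphi_1 \in \operatorname{Diff}_c(M)^{L=0}$ we choose a curve $t \mapsto \varphi(t, \cdot)$ from the identity to $\varphi_1$ with length less than $\varepsilon$. Let $u = \varphi_t \circ \varphi^{-1}$ and consider the path $t \mapsto \psi^{-1} \circ \varphi(t) \circ \psi$. Its length can be estimated by

\[
\begin{aligned}
\operatorname{Len}(\psi^{-1}\circ\varphi\circ\psi)
&\le C_1(\tau) \int_0^1 \bigl\| (T\psi^{-1}\circ\varphi_t\circ\psi)\circ(\psi^{-1}\circ\varphi^{-1}\circ\psi) \bigr\|_{H^s(M,\tau)} \,dt \\
&= C_1(\tau) \int_0^1 \bigl\| T\psi^{-1}\circ u\circ\psi \bigr\|_{H^s(M,\tau)} \,dt \\
&= C_1(\tau) \int_0^1 \sqrt{\sum_{j=0}^\infty \bigl\| \exp_{x_j}^*(\tau_j \cdot T\psi^{-1}\circ u\circ\psi) \bigr\|^2_{H^s(\mathbb{R}^n)}} \,dt \\
&= C_1(\tau) \int_0^1 \sqrt{\sum_{j=0}^\infty \bigl\| T\psi_j^{-1} \cdot \bigl(\exp_{\psi(x_j)}^*(\tau_j\circ\psi^{-1} \cdot u)\bigr)\circ\psi_j \bigr\|^2_{H^s(\mathbb{R}^n)}} \,dt \\
&\le C_2(\psi,\tau) \int_0^1 \sqrt{\sum_{j=0}^\infty \bigl\| \exp_{\psi(x_j)}^*(\tau_j\circ\psi^{-1} \cdot u) \bigr\|^2_{H^s(\mathbb{R}^n)}} \,dt \\
&= C_2(\psi,\tau) \int_0^1 \|u\|_{H^s(M,\tau\circ\psi^{-1})} \,dt \le C_3(\psi,\tau) \operatorname{Len}(\varphi) .
\end{aligned}
\]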

Here we used that all partitions of unity $\tau$ induce equivalent norms $H^s(M, \tau)$ and that for $h \in C^\infty(M)$ and $\psi \in \operatorname{Diff}_c(M)$ pointwise multiplication $f \mapsto h.f$ and composition $f \mapsto f \circ \psi$ are bounded linear operators on $H^s(M)$, as noted in Lemmas 3.14 and 3.15.

Claim B. $\operatorname{Diff}_c(M)^{L=0}$ is a nontrivial subgroup of $\operatorname{Diff}_c(M)$.

This claim follows from Lemma 3.11 for $M = \mathbb{R}$ and $0 \le s < \frac{1}{2}$ and from Lemma 3.13 for $M = S^1$ and $s = \frac{1}{2}$.

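Non-vanishing. To show that the geodesic distance does not vanish for $\dim(M) = 1$ and $\frac{1}{2} < s$ let $\varphi_0, \varphi_1 \in \operatorname{Diff}_c(\mathbb{R})$ with $\varphi_0(x) \ne \varphi_1(x)$ for some $x \in \mathbb{R}$. For any path $\varphi(t, \cdot)$ with $\varphi(0, \cdot) = \varphi_0$ and $\varphi(1, \cdot) = \varphi_1$ we have
\[
\begin{aligned}
0 \ne |\varphi_1(x) - \varphi_0(x)| &= \Bigl| \int_0^1 \varphi_t(t,x) \,dt \Bigr| = \Bigl| \int_0^1 u(t, \varphi(t,x)) \,dt \Bigr| \\
&\le \int_0^1 |u(t, \varphi(t,x))| \,dt \le \int_0^1 \|u(t,\cdot)\|_\infty \,dt \le \int_0^1 \|u(t,\cdot)\|_{C^{0,s-1/2}} \,dt \\
&\le \int_0^1 \|u(t,\cdot)\|_{H^s} \,dt .
\end{aligned}
\]
In the last step we used the Sobolev embedding theorem, see for example [81]. The case $\dim(M) \ge 2$ follows from [56, Theorem 5.7].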

3.4 Virasoro-Bott Group

In this section we extend the results of Section 3.3 to the Virasoro-Bott group and show that the geodesic distance induced by the right-invariant $L^2$-metric vanishes. The main part of the proof is contained in the following lemma, where we show that elements of the form $\begin{pmatrix}\mathrm{Id}\\ a\end{pmatrix} \in \mathbb{R} \times_c \operatorname{Diff}(\mathbb{R})$ with $a \in \mathbb{R}$ have zero distance from the identity. The proofs given in this section apply equally to the group $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ of rapidly vanishing diffeomorphisms, the group $\operatorname{Diff}_c(\mathbb{R})$ of compactly supported diffeomorphisms or the diffeomorphism group $\operatorname{Diff}(S^1)$ of the circle. We shall write the proofs for the most difficult case $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$, but any of the groups can be substituted.

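Lemma 3.16. For any $a \in \mathbb{R}$ there exists an arbitrarily short path connecting $\begin{pmatrix}\mathrm{Id}\\ 0\end{pmatrix}$ and $\begin{pmatrix}\mathrm{Id}\\ a\end{pmatrix}$, i.e., $\operatorname{dist}^{L^2}_{\mathrm{Vir}}\Bigl(\begin{pmatrix}\mathrm{Id}\\ 0\end{pmatrix}, \begin{pmatrix}\mathrm{Id}\\ a\end{pmatrix}\Bigr) = 0$.

Proof. The aim of the following argument is to construct a family of paths in the diffeomorphism group, parametrized by $\varepsilon$, with the following properties: all paths in the family start and end at the identity and their length in the diffeomorphism group with respect to the $L^2$-metric tends to $0$ as $\varepsilon \to 0$. By letting $\varepsilon$ be time-dependent we are able to control the endpoint $a(T)$ of the horizontal lift for certain paths of diffeomorphisms. We consider the function

\[
\begin{aligned}
f(z, a, \varepsilon) &= \max(0, \min(z, a)) \star G_\varepsilon(z) G_\varepsilon(a) \\
&= \iint \max(0, \min(z - \bar z, a - \bar a))\, G_\varepsilon(\bar z) G_\varepsilon(\bar a) \,d\bar z\,d\bar a \\
&= \iint \max(0, \min(z - \varepsilon\bar z, a - \varepsilon\bar a))\, G_1(\bar z) G_1(\bar a) \,d\bar z\,d\bar a \\
&= \varepsilon f\bigl(\tfrac{z}{\varepsilon}, \tfrac{a}{\varepsilon}, 1\bigr) ,
\end{aligned}
\tag{3.1}
\]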

where $G_\varepsilon(z) = \frac{1}{\varepsilon} G_1(\frac{z}{\varepsilon})$ is a function with $\operatorname{supp}(G_\varepsilon) \subseteq [-\varepsilon, \varepsilon]$ and $\int G_\varepsilon \,dx = 1$. Furthermore, let $g : \mathbb{R} \to [0, 1]$ be a function with compact support contained in $\mathbb{R}_{>0}$ and $g' > -1$, so that $x + g(x)$ is a diffeomorphism.

Figure 3.4: The path $\varphi(t, \cdot)$ defined in (3.2) connecting $\mathrm{Id}$ to $\mathrm{Id} + g$, plotted at $t - \Delta < t < t + \Delta$. Between the dashed lines, $g \equiv 1$ is constant.

For $0 < \lambda < 1$ and $t \in [0, T]$ let

\[
\varphi(t,x) = x + f(t - \lambda x, g(x), \varepsilon(t))
\]
be the path going away from the identity (since $\operatorname{supp}(g) \subset \mathbb{R}_{>0}$, see also Figure 3.4). For given $\varepsilon_0 > 0$, let

\[
\psi(t,x) = x + f(T - t - \lambda x, g(x), \varepsilon_0)
\tag{3.2}
\]
be the path leading back again. The only difference between this construction and [56] or Lemma 3.10 is that the parameter $\varepsilon$ may vary along the path.
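The scaling relation (3.1) for the mollified function $f$ can be illustrated numerically. The sketch below is not the construction from the text: it assumes a hypothetical symmetric bump kernel for $G_1$, replaces the double convolution by a midpoint sum over an $n \times n$ grid with normalised weights, and uses illustrative names (`kernel_nodes`, `smoothed`).

```python
import math

def kernel_nodes(n=80):
    """Midpoint nodes and normalised weights for a hypothetical bump kernel G_1 on [-1, 1]."""
    nodes = [-1.0 + (k + 0.5) * 2.0 / n for k in range(n)]
    raw = [math.exp(-1.0 / (1.0 - w * w)) for w in nodes]
    total = sum(raw)
    return nodes, [r / total for r in raw]

NODES, WEIGHTS = kernel_nodes()

def smoothed(z, a, eps):
    """f(z, a, eps): max(0, min(z, a)) mollified in both arguments at scale eps."""
    acc = 0.0
    for zb, wz in zip(NODES, WEIGHTS):
        for ab, wa in zip(NODES, WEIGHTS):
            acc += max(0.0, min(z - eps * zb, a - eps * ab)) * wz * wa
    return acc
```

Away from the kinks of $\max(0, \min(z, a))$ the mollification leaves the function unchanged, and `smoothed(z, a, eps)` agrees with `eps * smoothed(z / eps, a / eps, 1.0)` up to rounding, which is the discrete counterpart of (3.1).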

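We shall need some derivatives of $\varphi$ and $f$:

\[
\begin{aligned}
\varphi_t(t,x) &= f_z(t - \lambda x, g(x), \varepsilon(t)) + \dot\varepsilon(t)\, f_\varepsilon(t - \lambda x, g(x), \varepsilon(t)) \\
\varphi_x(t,x) &= 1 - \lambda f_z(t - \lambda x, g(x), \varepsilon(t)) + f_a(t - \lambda x, g(x), \varepsilon(t))\, g'(x) \\
f_z(z, a, \varepsilon) &= \int_{-\infty}^{z} \int_{-\infty}^{a-z} G_\varepsilon(w) G_\varepsilon(w + b) \,db\,dw \\
f_a(z, a, \varepsilon) &= \int_{-\infty}^{a} \int_{-\infty}^{z-a} G_\varepsilon(w) G_\varepsilon(w + b) \,db\,dw \\
f_\varepsilon(z, a, \varepsilon) &= \frac{1}{\varepsilon} \bigl( f(z, a, \varepsilon) - z f_z(z, a, \varepsilon) - a f_a(z, a, \varepsilon) \bigr) \\
f_{zz}(z, a, \varepsilon) &= G_\varepsilon(z) \int_{-\infty}^{a} G_\varepsilon(b) \,db - \int_{-\infty}^{z} G_\varepsilon(w) G_\varepsilon(w - (z - a)) \,dw
\end{aligned}
\]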

Claim A. The path $\varphi$ followed by $\psi$ still has arbitrarily small length for the $L^2$-metric.

We are working with a fixed time interval $[0, 2T]$. Thus arbitrarily small length is equivalent to arbitrarily small energy. The energy is given by
\[
\iint \varphi_t^2\, \varphi_x \,dx\,dt = \iint (f_z + \dot\varepsilon f_\varepsilon)^2\, (1 - \lambda f_z + f_a g') \,dx\,dt .
\tag{3.3}
\]

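Looking at the formula for $f_\varepsilon$ we see that $\varepsilon f_\varepsilon$ is bounded on a domain with bounded $a$. Thus $\|\dot\varepsilon f_\varepsilon\|_\infty \to 0$ can be achieved by choosing $\varepsilon$ such that $|\dot\varepsilon| \le C \varepsilon^{3/2}$. We will see later that this is possible. Inspecting $\varphi_t(t,x)$ and looking at the formulas for $f_z$ and $f_\varepsilon$ we see that for $t - \lambda x < -\varepsilon(t)$ and for $t - \lambda x - g(x) > 2\varepsilon(t)$ we have $\varphi_t(t,x) = 0$. Thus the domain of integration is contained in the compact set

\[
[0, T] \times \Bigl[ -\frac{T + \|g\|_\infty + 2\|\varepsilon\|_\infty}{\lambda},\ \frac{T + \|\varepsilon\|_\infty}{\lambda} \Bigr] .
\]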

Therefore it is enough to show that the $L^\infty$-norm of the integrand in (3.3) goes to zero as $\|\varepsilon\|_\infty$ goes to zero. For all terms involving $\dot\varepsilon f_\varepsilon$ this is true by the above assumption since $(1 - \lambda f_z + f_a g')$ and $\varepsilon f_\varepsilon$ are bounded. For the remaining parts $f_z^2 (1 - \lambda f_z)$ and $f_z^2 f_a g'$ we follow the argumentation of [56]. For $t$ fixed and $\lambda$ close to 1, the function $1 - \lambda f_z$, when restricted to the support of $f_z$, is bigger than $\varepsilon(t)$ only on an interval of length $O(\varepsilon(t))$. Hence we have

\[
\int_0^T \int_{\mathbb{R}} f_z^2\, (1 - \lambda f_z) \,dx\,dt \le \|f_z\|_\infty^2 \int_0^T \int_{\mathbb{R}} (1 - \lambda f_z) \,dx\,dt = O(\|\varepsilon\|_\infty) .
\]

For the last part we note that the support of $f_z^2 f_a$ is contained in the set $|g(x) - (t - \lambda x)| \le 2\varepsilon$. Now we define $x_0 < x_1$ by $g(x_0) + \lambda x_0 = T - 2\|\varepsilon\|_\infty$ and $g(x_1) + \lambda x_1 = T + 2\|\varepsilon\|_\infty$. Then
\[
\int_0^T \int_{\mathbb{R}} f_z^2 f_a\, g' \,dx\,dt \le T\, \|f_z\|_\infty^2\, \|f_a\|_\infty \int_{\operatorname{supp}(f_z^2 f_a)} g' \,dx
= T\,(g(x_1) - g(x_0)) \le 4T \|\varepsilon\|_\infty .
\]

The estimate for ψ is similar and easier. This proves claim A.

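Claim B. For every $a \in \mathbb{R}$ and $\delta > 0$ we may choose $\varepsilon(t)$ with $\|\varepsilon\|_\infty < \delta$ such that

\[
\int_0^T \int_{\mathbb{R}} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,dx\,dt + \int_0^T \int_{\mathbb{R}} \frac{\psi_{tx}\psi_{xx}}{\psi_x^2} \,dx\,dt = a .
\]
We will subject $\varepsilon$ and $g$ to several assumptions. First we partition the interval $[0, T]$ equidistantly into $0 < T_A < T_E < T$ and the $(t,x)$-domain into two parts, namely $A_1 = ([0, T_A] \cup [T_E, T]) \times \mathbb{R}$ and $A_2 = [T_A, T_E] \times \mathbb{R}$. We want $g(x) \equiv 1$ on a neighborhood of the interval $[\frac{1}{\lambda}(T_A - 1), \frac{1}{\lambda} T_E]$. We choose $\varepsilon(t)$ to be constant, $\varepsilon(t) \equiv \varepsilon_0$, on $[0, T_A] \cup [T_E, T]$ and to be symmetric in the sense that $\varepsilon(t) = \varepsilon(T - t)$. In addition we want $\varepsilon(t)$ small enough, such that $g(x) \equiv 1$ on $[\frac{1}{\lambda}(T_A - 1 - 2\varepsilon(t)), \frac{1}{\lambda}(T_E + \varepsilon(t))]$.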

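On $A_1$ we have $\varepsilon(t) \equiv \varepsilon_0$. This implies $\psi_{tx}(t,x) = -\varphi_{tx}(T - t, x)$, $\psi_x(t,x) = \varphi_x(T - t, x)$ and $\psi_{xx}(t,x) = \varphi_{xx}(T - t, x)$. Hence
\[
\iint_{A_1} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,dx\,dt + \iint_{A_1} \frac{\psi_{tx}\psi_{xx}}{\psi_x^2} \,dx\,dt = 0 .
\]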

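Let $A_2 = [T_A, T_E] \times \mathbb{R}$ be the region where $\varepsilon(t)$ is not constant. In the interior, where

\[
-\varepsilon(t) < t - \lambda x < g(x) + 2\varepsilon(t) , \quad\text{that is,}\quad t - g(x) - 2\varepsilon(t) < \lambda x < t + \varepsilon(t) ,
\]
we have by assumption $g(x) \equiv 1$. Therefore one has in this region:

\[
\begin{aligned}
\varphi_x(t,x) &= 1 - \lambda f_z(t - \lambda x, 1, \varepsilon(t)) \\
\varphi_{xx}(t,x) &= \lambda^2 f_{zz}(t - \lambda x, 1, \varepsilon(t)) \\
\varphi_{tx}(t,x) &= -\lambda f_{zz}(t - \lambda x, 1, \varepsilon(t)) - \lambda f_{\varepsilon z}(t - \lambda x, 1, \varepsilon(t))\, \dot\varepsilon(t) .
\end{aligned}
\]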

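We divide the integral over $A_2$ into two symmetric parts

\[
\int_{T_A}^{T/2} \int_{\frac{1}{\lambda}(t-1-2\varepsilon(t))}^{\frac{1}{\lambda}(t+\varepsilon(t))} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,dx\,dt
+ \int_{T/2}^{T_E} \int_{\frac{1}{\lambda}(t-1-2\varepsilon(t))}^{\frac{1}{\lambda}(t+\varepsilon(t))} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,dx\,dt
\]
and apply the following variable substitution to the second integral

\[
\tilde t = T - t , \qquad \tilde x = x + \frac{1}{\lambda}(\tilde t - t) .
\]

Thus $\tilde t - \lambda \tilde x = t - \lambda x$. Together with $\varepsilon(t) = \varepsilon(\tilde t)$ this implies

\[
\varphi_x(t,x) = \varphi_x(\tilde t, \tilde x) , \qquad \varphi_{xx}(t,x) = \varphi_{xx}(\tilde t, \tilde x) .
\]

Since $\dot\varepsilon(t) = -\dot\varepsilon(\tilde t)$ changes sign, the term containing $\dot\varepsilon(t)$ cancels out and leaves only

\[
\varphi_{tx}(t,x) + \varphi_{tx}(\tilde t, \tilde x) = -2\lambda f_{zz}(t - \lambda x, 1, \varepsilon(t)) .
\]

A simple calculation shows that the integration limits transform

\[
\int_{T/2}^{T_E} \int_{\frac{1}{\lambda}(t-1-2\varepsilon)}^{\frac{1}{\lambda}(t+\varepsilon)} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,dx\,dt
= \int_{T_A}^{T/2} \int_{\frac{1}{\lambda}(\tilde t-1-2\varepsilon)}^{\frac{1}{\lambda}(\tilde t+\varepsilon)} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,d\tilde x\,d\tilde t
\]
to those of the first integral. Therefore the sum of the integrals gives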

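\[
\iint_{A_2} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,dx\,dt
= -2\lambda^3 \int_{T_A}^{T/2} \int_{\frac{1}{\lambda}(t-1-2\varepsilon(t))}^{\frac{1}{\lambda}(t+\varepsilon(t))} \frac{f_{zz}(t - \lambda x, 1, \varepsilon(t))^2}{\bigl(1 - \lambda f_z(t - \lambda x, 1, \varepsilon(t))\bigr)^2} \,dx\,dt .
\tag{3.4}
\]
From formula (3.1) we see:

\[
f_z(z, a, \varepsilon) = f_z\bigl(\tfrac{z}{\varepsilon}, \tfrac{a}{\varepsilon}, 1\bigr) , \qquad f_{zz}(z, a, \varepsilon) = \frac{1}{\varepsilon} f_{zz}\bigl(\tfrac{z}{\varepsilon}, \tfrac{a}{\varepsilon}, 1\bigr) .
\]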

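We can use this to rewrite the above integral:
\[
\begin{aligned}
\iint_{A_2} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,dx\,dt
&= -2\lambda^3 \int_{T_A}^{T/2} \int_{\frac{1}{\lambda}(t-1-2\varepsilon(t))}^{\frac{1}{\lambda}(t+\varepsilon(t))} \frac{f_{zz}(t - \lambda x, 1, \varepsilon(t))^2}{\bigl(1 - \lambda f_z(t - \lambda x, 1, \varepsilon(t))\bigr)^2} \,dx\,dt \\
&= -2\lambda^2 \int_{T_A}^{T/2} \int_{-\varepsilon(t)}^{2\varepsilon(t)+1} \frac{f_{zz}(z, 1, \varepsilon(t))^2}{\bigl(1 - \lambda f_z(z, 1, \varepsilon(t))\bigr)^2} \,dz\,dt \\
&= -2\lambda^2 \int_{T_A}^{T/2} \int_{-\varepsilon(t)}^{2\varepsilon(t)+1} \frac{1}{\varepsilon(t)^2}\, \frac{f_{zz}\bigl(\tfrac{z}{\varepsilon(t)}, \tfrac{1}{\varepsilon(t)}, 1\bigr)^2}{\bigl(1 - \lambda f_z\bigl(\tfrac{z}{\varepsilon(t)}, \tfrac{1}{\varepsilon(t)}, 1\bigr)\bigr)^2} \,dz\,dt \\
&= -2\lambda^2 \int_{T_A}^{T/2} \int_{-1}^{2+\frac{1}{\varepsilon(t)}} \frac{1}{\varepsilon(t)}\, \frac{f_{zz}\bigl(z, \tfrac{1}{\varepsilon(t)}, 1\bigr)^2}{\bigl(1 - \lambda f_z\bigl(z, \tfrac{1}{\varepsilon(t)}, 1\bigr)\bigr)^2} \,dz\,dt .
\end{aligned}
\]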

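Looking at the formula for $f_{zz}$,

\[
f_{zz}\bigl(z, \tfrac{1}{\varepsilon}, 1\bigr) = G_1(z) - \int_{-\infty}^{z} G_1(w)\, G_1\bigl(w - (z - \tfrac{1}{\varepsilon})\bigr) \,dw ,
\]

we see that $f_{zz}(z, \frac{1}{\varepsilon}, 1)$ is non-zero only on the intervals $|z| < 1$ and $|z - \frac{1}{\varepsilon}| < 2$. For small $\varepsilon$, these are two disjoint regions. Therefore the above integral equals
\[
\begin{aligned}
\iint_{A_2} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,dx\,dt
= {} & -2\lambda^2 \int_{T_A}^{T/2} \frac{1}{\varepsilon(t)} \int_{-1}^{1} \frac{f_{zz}\bigl(z, \tfrac{1}{\varepsilon(t)}, 1\bigr)^2}{\bigl(1 - \lambda f_z\bigl(z, \tfrac{1}{\varepsilon(t)}, 1\bigr)\bigr)^2} \,dz\,dt \\
& -2\lambda^2 \int_{T_A}^{T/2} \frac{1}{\varepsilon(t)} \int_{-2}^{2} \frac{f_{zz}\bigl(z + \tfrac{1}{\varepsilon(t)}, \tfrac{1}{\varepsilon(t)}, 1\bigr)^2}{\bigl(1 - \lambda f_z\bigl(z + \tfrac{1}{\varepsilon(t)}, \tfrac{1}{\varepsilon(t)}, 1\bigr)\bigr)^2} \,dz\,dt .
\end{aligned}
\]
For $z$ bounded and sufficiently small $\varepsilon(t)$, the functions under the integral do not depend on $\varepsilon(t)$ any more, as can be seen from the definitions of $f_z$ and $f_{zz}$. Thus

\[
I = \lambda^2 \int_{-1}^{1} \frac{f_{zz}\bigl(z, \tfrac{1}{\varepsilon(t)}, 1\bigr)^2}{\bigl(1 - \lambda f_z\bigl(z, \tfrac{1}{\varepsilon(t)}, 1\bigr)\bigr)^2} \,dz
+ \lambda^2 \int_{-2}^{2} \frac{f_{zz}\bigl(z + \tfrac{1}{\varepsilon(t)}, \tfrac{1}{\varepsilon(t)}, 1\bigr)^2}{\bigl(1 - \lambda f_z\bigl(z + \tfrac{1}{\varepsilon(t)}, \tfrac{1}{\varepsilon(t)}, 1\bigr)\bigr)^2} \,dz
\]
is independent of $t$ and, using the symmetry $\varepsilon(t) = \varepsilon(T - t)$, we have

\[
\iint_{A_2} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,dx\,dt = -I \int_{T_A}^{T_E} \frac{1}{\varepsilon(t)} \,dt .
\]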

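The same calculations can be repeated for the return path $\psi$, where $\varepsilon \equiv \varepsilon_0$ is constant in time:

\[
\iint_{A_2} \frac{\psi_{tx}\psi_{xx}}{\psi_x^2} \,dx\,dt = I \int_{T_A}^{T_E} \frac{1}{\varepsilon_0} \,dt .
\]

Note that the sign is positive now, which comes from the $t$-derivative. Putting everything together gives us

\[
a = \iint \Bigl( \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} + \frac{\psi_{tx}\psi_{xx}}{\psi_x^2} \Bigr) \,dx\,dt = I \int_{T_A}^{T_E} \Bigl( \frac{1}{\varepsilon_0} - \frac{1}{\varepsilon(t)} \Bigr) \,dt .
\]

Let $\varepsilon(t) = \varepsilon_0 + \varepsilon_1 \varepsilon_0^{3/2} b(t)$ where $b(t)$ is a bump function with height 1 and $\varepsilon_1$ is a small constant. Note that $\varepsilon(t)$ satisfies $|\dot\varepsilon| \le \|\dot b\|_\infty\, \varepsilon_1 \varepsilon_0^{3/2}$. Choosing $\varepsilon_0$ and $\varepsilon_1$ small independently we may produce any $a \in \mathbb{R}$. This concludes the proof.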
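The dependence of $a$ on $\varepsilon_0$ and $\varepsilon_1$ can be made explicit: since $\frac{1}{\varepsilon_0} - \frac{1}{\varepsilon(t)} = \frac{\varepsilon_1 \varepsilon_0^{-1/2} b(t)}{1 + \varepsilon_1 \varepsilon_0^{1/2} b(t)}$, the integral behaves like $\varepsilon_1 \varepsilon_0^{-1/2} \int b\,dt$ for small $\varepsilon_0$. The sketch below evaluates $a / I$ numerically; the bump `bump_b`, the interval $[T_A, T_E] = [1, 2]$ and the function names are assumptions made for illustration, and the constant $I$ is not computed.

```python
import math

def bump_b(t, t_a, t_e):
    """Hypothetical bump of height 1 supported in [t_a, t_e], symmetric about the midpoint."""
    u = (2.0 * t - t_a - t_e) / (t_e - t_a)   # maps [t_a, t_e] onto [-1, 1]
    d = 1.0 - u * u
    return math.exp(1.0 - 1.0 / d) if d > 0.0 else 0.0

def eps_profile(t, eps0, eps1, t_a, t_e):
    """eps(t) = eps0 + eps1 * eps0^(3/2) * b(t)."""
    return eps0 + eps1 * eps0 ** 1.5 * bump_b(t, t_a, t_e)

def lift_integral(eps0, eps1, t_a=1.0, t_e=2.0, n=2000):
    """Midpoint rule for int_{t_a}^{t_e} (1/eps0 - 1/eps(t)) dt, i.e. a / I in the text."""
    step = (t_e - t_a) / n
    total = 0.0
    for k in range(n):
        t = t_a + (k + 0.5) * step
        total += 1.0 / eps0 - 1.0 / eps_profile(t, eps0, eps1, t_a, t_e)
    return total * step
```

For fixed $\varepsilon_1$ the value grows without bound as $\varepsilon_0 \to 0$, it vanishes for $\varepsilon_1 = 0$, and it changes sign with $\varepsilon_1$, which is consistent with the statement that any $a \in \mathbb{R}$ can be produced while keeping $\|\varepsilon\|_\infty$ small.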

Now we use the right-invariance of the metric and Lemma 3.10 to prove that in fact all elements in $\mathbb{R} \times_c \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ have distance zero to the identity.

Theorem 3.17. The geodesic distance on the Virasoro-Bott group $\mathbb{R} \times_c \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ vanishes for the right-invariant $L^2$-metric.

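Proof. Let $(\varphi, a) \in \mathbb{R} \times_c \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$. By Lemma 3.10 we get a smooth family $\varphi(\delta, t, x)$ for $\delta > 0$ and $t \in [0, 1]$ such that $\varphi(\delta, t, \cdot) \in \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$, $\varphi(\delta, 0, \cdot) = \mathrm{Id}_{\mathbb{R}}$, $\varphi(\delta, 1, \cdot) = \varphi$, and such that the length of $t \mapsto \varphi(\delta, t, \cdot)$ is smaller than $\delta$. Consider the sequence

\[
\mathbb{R} \longrightarrow \mathbb{R} \times_c \operatorname{Diff}_{\mathcal{S}}(\mathbb{R}) \overset{p}{\longrightarrow} \operatorname{Diff}_{\mathcal{S}}(\mathbb{R}) .
\]

Then $p$ is a Riemannian submersion for the right-invariant $L^2$-metric on $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$, i.e., $Tp$ is an isometry on the orthogonal complements of the fibers. These complements are not integrable; in fact the curvature of the corresponding principal connection is given by the Gelfand-Fuks cocycle. For the curve $\varphi(\delta, t, \cdot)$ in $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ its horizontal lift is given by
\[
\begin{pmatrix} \varphi(\delta, t, \cdot) \\[4pt] a(\delta, t) = a(\delta, 0) - \displaystyle\int_0^t \int_{\mathbb{R}} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \,dx\,dt \end{pmatrix} ,
\]
since the right translation to $(\mathrm{Id}, 0)$ of its velocity should have zero vertical component. The horizontal lift has the same length and energy as $\varphi$.

Now take the horizontal lift $(\varphi(\delta, t, \cdot), a(\delta, t)) \in \mathbb{R} \times_c \operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ of this family which connects $\begin{pmatrix}\mathrm{Id}\\ 0\end{pmatrix}$ with $\begin{pmatrix}\varphi\\ a(\delta, 1)\end{pmatrix}$ for each $\delta > 0$ and has length $< \delta$. But one can see from the proof of Lemma 3.16 that $a(\delta, 1)$ may become unbounded for $\delta \to 0$.

However, using Lemma 3.16 we can find a horizontal path $t \mapsto \begin{pmatrix}\psi(\delta, t, \cdot)\\ b(\delta, t)\end{pmatrix}$ for $t \in [0, 1]$ in the Virasoro group of length $< \delta$ connecting $\begin{pmatrix}\mathrm{Id}\\ 0\end{pmatrix}$ with $\begin{pmatrix}\mathrm{Id}\\ a - a(\delta, 1)\end{pmatrix}$. Then the curve
\[
t \mapsto \begin{pmatrix}\psi(\delta, t, \cdot)\\ b(\delta, t)\end{pmatrix} . \begin{pmatrix}\varphi\\ a(\delta, 1)\end{pmatrix}
= \begin{pmatrix}\psi(\delta, t) \circ \varphi\\ b(\delta, t) + a(\delta, 1) + c(\psi(\delta, t), \varphi)\end{pmatrix}
\]
connects $\begin{pmatrix}\varphi\\ a(\delta, 1)\end{pmatrix} = \begin{pmatrix}\mathrm{Id}\\ 0\end{pmatrix} . \begin{pmatrix}\varphi\\ a(\delta, 1)\end{pmatrix}$ with $\begin{pmatrix}\varphi\\ a\end{pmatrix} = \begin{pmatrix}\mathrm{Id}\\ a - a(\delta, 1)\end{pmatrix} . \begin{pmatrix}\varphi\\ a(\delta, 1)\end{pmatrix}$ and it has length $< \delta$. Thus we have constructed a path from $\begin{pmatrix}\mathrm{Id}\\ 0\end{pmatrix}$ to $\begin{pmatrix}\varphi\\ a\end{pmatrix}$ of total length less than $2\delta$. This concludes the proof.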

3.5 Local Minima of the Length Functional

We know from the previous sections that the geodesic distance vanishes on $\operatorname{Diff}_c(\mathbb{R})$ and the Virasoro-Bott group $\mathbb{R} \times_c \operatorname{Diff}_c(\mathbb{R})$ when endowed with the $L^2$-metric. This is a global result, which shows the existence of paths of arbitrarily short length somewhere in the group.

For fixed, distinct endpoints $\varphi_0, \varphi_1$ geodesics between $\varphi_0$ and $\varphi_1$ can be defined as critical points of the length functional, when restricted to the space of paths which preserve the endpoints. Since the geodesic distance vanishes, we know that no geodesic between $\varphi_0$ and $\varphi_1$ can be a global minimum of the length functional. But it might be conceivable that (some) geodesics are locally length-minimizing, while the potentially strange paths of arbitrarily small length lie somewhere else. Locally is to be understood with respect to the $C^\infty$-topology on $\mathcal{S}([0, T] \times \mathbb{R})$, which will be defined below.

Definition 3.18. The $C^\infty$-topology on $\mathcal{S}([0, T] \times \mathbb{R})$ is defined by the family of seminorms

\[
\|\varphi\|_{\mathcal{S}^{k,m,n}} = \sum_{\substack{i \le m \\ j \le n}} \bigl\| (1 + |x|^2)^k\, \partial_x^i \partial_t^j \varphi \bigr\|_\infty , \qquad k, m, n \in \mathbb{N} ,
\]
with $\varphi \in \mathcal{S}([0, T] \times \mathbb{R})$. The group $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$ is an open subset of the set $\mathrm{Id} + \mathcal{S}(\mathbb{R})$ and inherits its topology.

In this section we show that geodesics are not locally length-minimizing. The length functionals for the Virasoro-Bott group and for the diffeomorphism group of $\mathbb{R}$ have

no local minima. Given any curve, there is a nearby curve with the same endpoints which is shorter. Hence no geodesic is locally length-minimizing. We prove the result for the energy functional and then use a parametrization argument to show that it also holds for length. First the diffeomorphism group.

Theorem 3.19. Let $\varphi(t, x)$ with $t \in [0, T]$ be a path in $\operatorname{Diff}_{\mathcal{S}}(\mathbb{R})$. Let $U$ be a neighborhood of $\varphi$ in the space $\mathcal{S}([0, T] \times \mathbb{R})$. Then there exists a path $\psi \in U$ with the same endpoints as $\varphi$ and

\[
E(\psi) < E(\varphi) ,
\]
where $E(\cdot)$ is the energy of a path w.r.t. the right-invariant $L^2$-metric.

By Definition 3.18 every neighborhood $U$ of $\varphi$ will contain an $\varepsilon$-neighborhood of one of the seminorms $\|\cdot\|_{\mathcal{S}^{k,m,n}}$. Therefore it is sufficient to show that there exists a path $\psi$ with the same endpoints and

$$\|\psi - \varphi\|_{\mathcal{S}^{k,m,n}} < \varepsilon,$$

which has less energy. This is done in the following lemma.

Lemma 3.20. Let $\varphi(t,x)$ with $t \in [0,T]$ be a path in $\mathrm{Diff}_{\mathcal S}(\mathbb{R})$. Given $k, m, n \in \mathbb{N}$ there exists for small $\varepsilon > 0$ and $a \in \mathbb{N}_{>0}$ a family of paths $\psi(t,x)$ with the same endpoints, close to the original path

$$\|\varphi - \psi\|_{\mathcal{S}^{k,m,n}} < \varepsilon^a,$$

which use less energy. Furthermore there exists a constant $C > 0$ such that we have a lower bound for the energy saved

$$E(\varphi) - E(\psi) \ge C\varepsilon^{m+a}.$$

The power $a$ will be needed when we extend the argument to the Virasoro-Bott group. For now we can simply use $a = 1$. Let us first summarize the idea of the proof. The energy of the path is given by

$$E(\psi) = \int_0^T \int_{\mathbb{R}} \psi_t^2\, \psi_x \, dx\, dt.$$

The paths with small energy, which were constructed in Lemma 3.10, had essentially disjoint supports for $\psi_t$ and $\psi_x$. Given that each point $x \in \mathbb{R}$ has to be moved from $\psi(0,x)$ to $\psi(1,x)$, we can arrange to move the point when the close-by points have a similar value of $\psi(t,\cdot)$, i.e. when $\psi_t(t,x)$ is small. The idea of the proof is to define a time-reparametrization $r(t,x)$, which will depend on the location in space, such that when points are moved the value of $(\psi\circ r)_t(t,x)$ will be smaller than for the original path. Careful balancing is necessary, since such a time-reparametrization will also change the other term in the energy.
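To make the functional concrete, the following is a minimal numerical sketch of how $E(\psi) = \int\!\!\int \psi_t^2 \psi_x \, dx\, dt$ can be approximated on a grid. The function names, the grid sizes and the test path $\psi(t,x) = x + t\,e^{-x^2}$ are illustrative assumptions and do not appear in the construction of the proof. For this particular path the energy can be computed by hand, $E = \int e^{-2x^2} dx = \sqrt{\pi/2}$, because the cross term $\int h^2 h' \, dx$ vanishes for a decaying function $h$.

```python
import numpy as np

def trapezoid_last_axis(y, x):
    """Trapezoidal rule along the last axis of y on the grid x."""
    dx = np.diff(x)
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * dx, axis=-1)

def path_energy(phi, t, x):
    """Approximate E(phi) = int_0^T int_R phi_t^2 phi_x dx dt.

    phi has shape (len(t), len(x)); derivatives are finite differences.
    """
    phi_t = np.gradient(phi, t, axis=0)
    phi_x = np.gradient(phi, x, axis=1)
    inner = trapezoid_last_axis(phi_t**2 * phi_x, x)  # integrate over x
    return trapezoid_last_axis(inner, t)              # integrate over t

# Hypothetical test path (an assumption for illustration only):
# psi(t, x) = x + t * exp(-x^2), which stays a diffeomorphism for t in [0, 1].
t = np.linspace(0.0, 1.0, 101)
x = np.linspace(-8.0, 8.0, 1601)
h = np.exp(-x**2)
psi = x[None, :] + t[:, None] * h[None, :]

E_numeric = path_energy(psi, t, x)
E_exact = np.sqrt(np.pi / 2.0)
```

The same routine could in principle be used to compare a path with a reparametrized path $\varphi(r(t,x),x)$, although resolving the $\varepsilon$-scale of the construction would require a much finer grid.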

Proof. Let $r(t,x) : [0,T]\times\mathbb{R} \to [0,T]$ be a smooth function with the property that for each $x \in \mathbb{R}$ it is a reparametrization of the time interval. Define the new path via

$$\psi(t,x) = \varphi(r(t,x), x).$$

Claim. If $\|r(t,x) - t\|_{C^{m,n}} < \varepsilon$ then $\|\psi - \varphi\|_{\mathcal{S}^{k,m,n}} \le D\varepsilon$ with a constant $D$ that only depends on $\|\varphi\|_{\mathcal{S}^{k,m,n+m+1}}$.

We postpone the proof of the claim to the end of the proof. Now we want to compute $E(\psi)$.

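$$\begin{aligned}
E(\psi) &= \int_0^T\!\!\int_{\mathbb{R}} \psi_t^2\, \psi_x \, dx\, dt \\
&= \int_0^T\!\!\int_{\mathbb{R}} r_t(t,x)^2\, \varphi_t(r(t,x),x)^2 \big( r_x(t,x)\varphi_t(r(t,x),x) + \varphi_x(r(t,x),x) \big) \, dx\, dt \\
&= \int_0^T\!\!\int_{\mathbb{R}} r_t(r^{-1}(t,x),x)\, \varphi_t(t,x)^2 \big( r_x(r^{-1}(t,x),x)\varphi_t(t,x) + \varphi_x(t,x) \big) \, dx\, dt.
\end{aligned}$$

We can express the derivatives of $r$ using the derivatives of $r^{-1}$ via the following rules

$$\begin{aligned}
r(r^{-1}(t,x),x) &= t, \\
r_t(r^{-1}(t,x),x) &= \frac{1}{(r^{-1})_t(t,x)}, \\
r_x(r^{-1}(t,x),x) &= -(r^{-1})_x(t,x)\, r_t(r^{-1}(t,x),x) = -\frac{(r^{-1})_x(t,x)}{(r^{-1})_t(t,x)},
\end{aligned}$$

to obtain

$$\begin{aligned}
E(\psi) &= \iint -(r^{-1})_x(t,x) \frac{\varphi_t(t,x)^3}{(r^{-1})_t(t,x)^2} + \frac{\varphi_t^2(t,x)\varphi_x(t,x)}{(r^{-1})_t(t,x)} \, dx\, dt \\
&= \iint -(r^{-1})_x(t,x) \frac{\varphi_t(t,x)^3}{(r^{-1})_t(t,x)^2} \, dx\, dt + \iint \frac{1 - (r^{-1})_t(t,x)}{(r^{-1})_t(t,x)}\, \varphi_t^2(t,x)\varphi_x(t,x) \, dx\, dt + E(\varphi).
\end{aligned}$$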

Now we have to choose $r^{-1}$ in such a way that the sum of the integrals is always negative. Let's choose a point $(t_0, x_0)$ and $\delta > 0$ such that

$$\varphi_t(t,x) > 0 \quad \text{for } (t,x) \in [t_0 - \delta, t_0 + \delta] \times [x_0 - \delta, x_0 + \delta].$$

Define

$$r^{-1}(t,x) = t + \varepsilon^{m+a} f(t)\, g\!\left( \frac{x - x_0}{\varepsilon} \right).$$

We require that $f \equiv 0$ for $t \notin [t_0 - \delta, t_0 + \delta]$ and $f \ge 0$. For $g$ we require that $g$ be constant for $x \notin [0,1]$ and $g' \ge 0$. The proof also works for $\varphi_t(t,x) < 0$; in this case we would require $g' \le 0$.

First we check that $\|r(t,x) - t\|_{C^{m,n}} \le \tilde{D}\varepsilon^a$ for some constant $\tilde{D} > 0$. We see that

$$\|r^{-1}(t,x) - t\|_{C^{m,n}} \le \varepsilon^{m+a} \|g(\varepsilon^{-1}(x - x_0))\|_{C^m} \|f\|_{C^n} \le \varepsilon^a D,$$

with $D$ depending only on $g$ and $f$. Then we use the rules for differentiating the inverse function to see that we can get a similar estimate for $r(t,x)$ with some other constant $\tilde{D}$. Because $f$ vanishes outside a small neighborhood of $t_0$ we have $r(0,x) = 0$ and $r(T,x) = T$ and thus the new path $\psi$ has the same endpoints as $\varphi$.

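Finally we estimate the energy gained

$$\begin{aligned}
\iint -(r^{-1})_x(t,x) \frac{\varphi_t(t,x)^3}{(r^{-1})_t(t,x)^2} \, dx\, dt
&= -\int_{t_0-\delta}^{t_0+\delta}\!\int_{x_0}^{x_0+\varepsilon} \frac{\varepsilon^{m+a-1} f(t)\, g'(\varepsilon^{-1}(x - x_0))}{\big(1 + \varepsilon^{m+a} f'(t)\, g(\varepsilon^{-1}(x - x_0))\big)^2}\, \varphi_t(t,x)^3 \, dx\, dt \\
&= -\varepsilon^{m+a} \int_{t_0-\delta}^{t_0+\delta}\!\int_0^1 \frac{f(t)\, g'(y)}{\big(1 + \varepsilon^{m+a} f'(t)\, g(y)\big)^2}\, \varphi_t(t, x_0 + \varepsilon y)^3 \, dy\, dt,
\end{aligned}$$

where we used the substitution $\varepsilon y = x - x_0$ in the last step. We know that $\varphi_t(t, x_0 + \varepsilon y) > 0$ stays away from 0 on the domain of integration by our choice of $(t_0, x_0)$ and we can approximate the denominator by

$$\frac{1}{1 + \varepsilon^{m+a} f'(t) g(y)} = 1 + O(\varepsilon^{m+a}),$$

which means that the whole integral has a negative part of order $\varepsilon^{m+a}$ plus some terms of order at least $\varepsilon^{2m+2a}$,

$$\iint -(r^{-1})_x(t,x) \frac{\varphi_t(t,x)^3}{(r^{-1})_t(t,x)^2} \, dx\, dt < -\varepsilon^{m+a} X + \varepsilon^{2m+2a}(\dots) + \dots$$

with $X > 0$. The other integral

$$\begin{aligned}
\iint \frac{1 - (r^{-1})_t(t,x)}{(r^{-1})_t(t,x)}\, \varphi_t^2(t,x)\varphi_x(t,x) \, dx\, dt
&= -\int_{t_0-\delta}^{t_0+\delta}\!\int_{x_0}^{x_0+\varepsilon} \frac{\varepsilon^{m+a} f'(t)\, g(\varepsilon^{-1}(x - x_0))}{1 + \varepsilon^{m+a} f'(t)\, g(\varepsilon^{-1}(x - x_0))}\, \varphi_t(t,x)^2 \varphi_x(t,x) \, dx\, dt \\
&= -\varepsilon^{m+a+1} \int_{t_0-\delta}^{t_0+\delta}\!\int_0^1 \frac{f'(t)\, g(y)}{1 + \varepsilon^{m+a} f'(t)\, g(y)}\, \varphi_t(t, x_0 + \varepsilon y)^2 \varphi_x(t, x_0 + \varepsilon y) \, dy\, dt
\end{aligned}$$

is of order at least $\varepsilon^{m+a+1}$. Therefore their sum has a negative part of order $\varepsilon^{m+a}$ and other terms of higher order. Thus we have for small $\varepsilon$ the result

$$\begin{aligned}
E(\varphi) - E(\psi) &= \iint (r^{-1})_x(t,x) \frac{\varphi_t(t,x)^3}{(r^{-1})_t(t,x)^2} \, dx\, dt - \iint \frac{1 - (r^{-1})_t(t,x)}{(r^{-1})_t(t,x)}\, \varphi_t^2(t,x)\varphi_x(t,x) \, dx\, dt \\
&\ge \varepsilon^{m+a} X + \varepsilon^{m+a+1}(\dots).
\end{aligned}$$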

This completes the proof.

Proof of claim. The claim essentially follows by applying Faà di Bruno's formula, a higher order version of the chain rule. For $i \le m$ and $j \le n$ we have to estimate

$$(1+|x|^2)^k\, \partial_x^i \partial_t^j (\psi - \varphi)(t,x) = (1+|x|^2)^k \big( \partial_x^i \partial_t^j (\varphi(r(t,x),x)) - \partial_x^i \partial_t^j \varphi(t,x) \big).$$

First, since $\varphi(r(t,x),x)$ has two $x$-dependencies, we split them

$$\partial_x^i \partial_t^j \big(\varphi(r(t,x),x)\big) = \partial_t^j \left( (\partial_x^i \varphi)(r(t,x),x) + \sum_{l=1}^{i} \binom{i}{l}\, \partial_x^l (\varphi\circ r)(t,x)\, (\partial_x^{i-l}\varphi)(r(t,x),x) \right).$$

Here we denote by $\partial_x(\varphi\circ r)(t,x)$ the differentiation of $\varphi(r(t,x),x)$ with respect to the first $x$. Each term in the sum will contain some $x$-derivative of $r$, which means that we can estimate

$$(1+|x|^2)^k \left| \partial_t^j \sum_{l=1}^{i} \dots \right| \le D_1 \|\partial_x r\|_{C^{m-1,n}},$$

with the constant depending on $\|\varphi\|_{\mathcal{S}^{k,m,n+m}}$. Next we apply Faà di Bruno's formula for the $t$-derivatives

$$\partial_t^j \big( (\partial_x^i \varphi)(r(t,x),x) \big) = (\partial_t^j \partial_x^i \varphi)(r(t,x),x)\, r_t(t,x)^j + \sum_{j > a > 0} j!\, (\partial_t^a \partial_x^i \varphi)(r(t,x),x) \sum_{\substack{\alpha_1 + \dots + \alpha_a = j \\ \alpha_l > 0}} \prod_{l=1}^{a} \frac{\partial_t^{\alpha_l} r(t,x)}{\alpha_l!}.$$

Since in the sum $a < j$ and each $\alpha_l > 0$, in each decomposition $\alpha_1 + \dots + \alpha_a = j$ we will have at least one $\alpha_l \ge 2$. Therefore we can estimate

$$(1+|x|^2)^k \left| \sum_{j > a > 0} j! \dots \right| \le D_2 \|r_t\|_{C^{0,n-1}},$$

with $D_2$ depending on $\|\varphi\|_{\mathcal{S}^{k,m,n-1}}$. Next we use Taylor expansion for the remaining term

$$(\partial_t^j \partial_x^i \varphi)(r(t,x),x) = \partial_t^j \partial_x^i \varphi(t,x) + (r(t,x) - t)\, \partial_t^{j+1} \partial_x^i \varphi(\xi, x),$$

with $\xi$ lying between $t$ and $r(t,x)$. Finally we note that by assumption $|r_t(t,x) - 1| < \varepsilon$ and thus by putting all the estimates together we obtain

$$(1+|x|^2)^k \left| \partial_x^i \partial_t^j (\varphi(r(t,x),x)) - \partial_x^i \partial_t^j \varphi(t,x) \right| \le D\varepsilon$$

for a constant $D$, depending on $\|\varphi\|_{\mathcal{S}^{k,m,n+m}}$ or, in case $m = 0$, on $\|\varphi\|_{\mathcal{S}^{k,m,n+1}}$. This completes the proof of the claim.

Now we prove the same statement for the Virasoro-Bott group.

Theorem 3.21. Let $(\varphi(t,x), \alpha(t))$ with $t \in [0,T]$ be a path in the Virasoro-Bott group $\mathbb{R}\times_c \mathrm{Diff}_{\mathcal S}(\mathbb{R})$. Let $U$ be a neighborhood of $(\varphi, \alpha)$ in the space $\mathcal{S}([0,T]\times\mathbb{R}) \times C^\infty([0,T])$. Then there exists a path $(\psi, \beta) \in U$ with the same endpoints as $(\varphi, \alpha)$ and

$$E(\psi, \beta) < E(\varphi, \alpha),$$

where $E(\cdot)$ is the energy of a path w.r.t. the right-invariant $L^2$-metric.

As for the diffeomorphism group, using the definition of the C∞-topology from Definition 3.18, it is sufficient to prove the following lemma.

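Lemma 3.22. Let $(\varphi(t,x), \alpha(t))$ be a path in the Virasoro group $\mathbb{R}\times_c \mathrm{Diff}_{\mathcal S}(\mathbb{R})$. Given $\delta > 0$ and $k, m, n \in \mathbb{N}$ with $k \ge 1$, $m \ge 2$ and $n \ge 1$, there exists a path $(\psi(t,x), \beta(t))$ close to the original path

$$\|\psi - \varphi\|_{\mathcal{S}^{k,m,n}} + \|\alpha - \beta\|_{C^n} < \delta$$

with the same endpoints

$$\psi(0,\cdot) = \varphi(0,\cdot), \quad \psi(T,\cdot) = \varphi(T,\cdot), \quad \beta(0) = \alpha(0), \quad \beta(T) = \alpha(T)$$

and less energy

$$E(\psi, \beta) < E(\varphi, \alpha).$$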

Proof. We will first sketch the idea for the proof before going into the details. The energy for the Virasoro group is given by

$$E(\varphi, \alpha) = \iint \varphi_t^2 \varphi_x \, dx\, dt + \int_0^T \left( \alpha_t - \int_{\mathbb{R}} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \, dx \right)^2 dt.$$

It consists of the energy of the diffeomorphism part and a second term that measures how much the path deviates from a horizontal one. We will first apply Lemma 3.20 to obtain a path $\tilde\varphi(t,x)$ such that the energy in the diffeomorphism group is smaller, $E(\tilde\varphi) < E(\varphi)$. We would like to define the extension part via

$$\beta_t - \int_{\mathbb{R}} \frac{\tilde\varphi_{tx}\tilde\varphi_{xx}}{\tilde\varphi_x^2} \, dx = \alpha_t - \int_{\mathbb{R}} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \, dx,$$

and of course $\beta(0) = \alpha(0)$, because this would imply that the second term of the energy remains the same. However, we need to match the endpoints of the curves as well, which means that we require

$$\int_0^T\!\!\int_{\mathbb{R}} \frac{\tilde\varphi_{tx}\tilde\varphi_{xx}}{\tilde\varphi_x^2} \, dx\, dt = \int_0^T\!\!\int_{\mathbb{R}} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \, dx\, dt.$$

This will in general not be the case. Therefore we perturb the path $\tilde\varphi(t,x)$ a little and obtain another path $\psi(t,x)$, such that the energy of the diffeomorphism part is still less than for the original path, $E(\psi) < E(\varphi)$, and the endpoints match. Since the energy depends only on first derivatives, but the endpoint $\alpha(T)$ on second derivatives, we are able to move the endpoint by larger amounts, while keeping the energy close to where we started.

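Now for the implementation of this plan. Let $\varepsilon > 0$ be small and $\tilde\varphi(t,x)$ be a path as given by Lemma 3.20 (with $a = m+1$), such that

$$\|\varphi - \tilde\varphi\|_{\mathcal{S}^{k,m,n}} < \varepsilon^{m+1} \quad \text{and} \quad E(\varphi) - E(\tilde\varphi) \ge C\varepsilon^{2m+1}$$

for some constant $C > 0$, depending only on $\varphi$. We now define the perturbed path via

$$\psi(t,x) = \tilde\varphi(t,x) + \varepsilon^{2m+\frac{3}{2}} \lambda f(t)\, g\big(\varepsilon^{-2}(x - x_0)\big),$$

with some functions $f(t)$, $g(x)$, a point $x_0$ and a constant $\lambda$, which we can choose. We require $g \in \mathcal{S}(\mathbb{R})$ to be rapidly vanishing and $f(t)$ to vanish for $t = 0, T$, such that $\psi$ has the same endpoints as $\tilde\varphi$. It is easy to see that $\|\psi - \tilde\varphi\|_{\mathcal{S}^{k,m,n}} = O(\varepsilon^{\frac{3}{2}})$. As discussed above, we modify the $\mathbb{R}$-component of the curve as follows

$$\begin{aligned}
\partial_t \beta(t) &= \partial_t \alpha(t) + \int \frac{\psi_{tx}\psi_{xx}}{\psi_x^2} \, dx - \int \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \, dx, \\
\beta(0) &= \alpha(0).
\end{aligned}$$

Claim A. We have $\|\beta - \alpha\|_{C^n} \le D \|\psi - \varphi\|_{\mathcal{S}^{1,2,n}}$ with a constant $D$ depending only on $\varphi$.

This claim, whose proof we postpone until later, ensures that the new path is close enough to the old one. Next we estimate the energy of the path

$$\begin{aligned}
E_{\mathrm{Vir}}(\psi, \beta) &= E_{\mathrm{Diff}}(\psi) + \int_0^T \left( \partial_t \beta(t) - \int \frac{\psi_{tx}\psi_{xx}}{\psi_x^2} \, dx \right)^2 dt \\
&= E_{\mathrm{Diff}}(\psi) - E_{\mathrm{Diff}}(\varphi) + E_{\mathrm{Diff}}(\varphi) + \int_0^T \left( \partial_t \alpha(t) - \int \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \, dx \right)^2 dt \\
&\le |E_{\mathrm{Diff}}(\psi) - E_{\mathrm{Diff}}(\tilde\varphi)| + E_{\mathrm{Diff}}(\tilde\varphi) - E_{\mathrm{Diff}}(\varphi) + E_{\mathrm{Vir}}(\varphi, \alpha) \\
&\le |E_{\mathrm{Diff}}(\psi) - E_{\mathrm{Diff}}(\tilde\varphi)| - C\varepsilon^{2m+1} + E_{\mathrm{Vir}}(\varphi, \alpha).
\end{aligned}$$

Claim B. The energy $E(\psi)$ of the perturbed path satisfies the estimate $|E(\psi) - E(\tilde\varphi)| = O(\varepsilon^{2m+\frac{3}{2}})$.

Therefore, for small $\varepsilon$ the difference $|E_{\mathrm{Diff}}(\psi) - E_{\mathrm{Diff}}(\tilde\varphi)| - C\varepsilon^{2m+1}$ will be negative, which means that the new path uses less energy than the original,

$$E_{\mathrm{Vir}}(\psi, \beta) < E_{\mathrm{Vir}}(\varphi, \alpha).$$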

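Finally we need to choose $f$, $g$, $x_0$ and $\lambda$ in such a way that the endpoint $\beta(T) = \alpha(T)$ is preserved. For this we need to study the difference

$$\int_0^T\!\!\int_{\mathbb{R}} \frac{\psi_{tx}\psi_{xx}}{\psi_x^2} \, dx\, dt - \int_0^T\!\!\int_{\mathbb{R}} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \, dx\, dt.$$

If the difference equals 0, we have accomplished our task. The necessary second derivatives of $\psi$ are

$$\begin{aligned}
\psi_{tx}(t,x) &= \tilde\varphi_{tx}(t,x) + \varepsilon^{2m-\frac{1}{2}} \lambda f'(t)\, g'\big(\varepsilon^{-2}(x - x_0)\big), \\
\psi_{xx}(t,x) &= \tilde\varphi_{xx}(t,x) + \varepsilon^{2m-\frac{5}{2}} \lambda f(t)\, g''\big(\varepsilon^{-2}(x - x_0)\big).
\end{aligned}$$

To simplify the formulas, let us introduce the notation

$$C(\varphi) = \iint \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \, dx\, dt.$$

Then we can write the difference as

$$\begin{aligned}
C(\psi) - C(\varphi) &= C(\psi) - C(\tilde\varphi) + C(\tilde\varphi) - C(\varphi) \\
&= \varepsilon^{2m-\frac{5}{2}} \lambda \iint \frac{f(t)\, g''\big(\varepsilon^{-2}(x - x_0)\big)\, \tilde\varphi_{tx}(t,x)}{\tilde\varphi_x(t,x)^2} \, dx\, dt + \varepsilon^{2m-\frac{1}{2}} \iint \dots + C(\tilde\varphi) - C(\varphi).
\end{aligned}$$

Each of the integrals contains $g(\varepsilon^{-2}(x - x_0))$ or some derivative as a factor. Therefore we can perform the substitution $\varepsilon^2 y = x - x_0$ to gain another factor of $\varepsilon^2$. Let us define

$$X(\varepsilon) = \iint \frac{f(t)\, g''(y)\, \tilde\varphi_{tx}(t, \varepsilon^2 y + x_0)}{\tilde\varphi_x(t, \varepsilon^2 y + x_0)^2} \, dy\, dt.$$

Choose $x_0$ such that $\varphi_{tx}(t, x_0) \ne 0$ and $g$ and $f$ such that $X(0) \ne 0$. Note that in this case $X(\varepsilon)$ is bounded away from 0 for small $\varepsilon$. Then

$$C(\psi) - C(\varphi) = \varepsilon^{2m-\frac{1}{2}} \lambda X(\varepsilon) + \varepsilon^{2m+\frac{3}{2}} \iint \dots + C(\tilde\varphi) - C(\varphi).$$

The difference $C(\tilde\varphi) - C(\varphi)$ is of order $O(\varepsilon^{2m})$, since $\|\tilde\varphi - \varphi\|_{\mathcal{S}^{k,2,1}} = O(\varepsilon^{2m-1})$, and therefore can be written as

$$C(\tilde\varphi) - C(\varphi) = \varepsilon^{2m} Y(\varepsilon)$$

with $Y(\varepsilon)$ bounded near 0. With everything put together we get

$$C(\psi) - C(\varphi) = \varepsilon^{2m-\frac{1}{2}} \big( \lambda X(\varepsilon) + \varepsilon^{\frac{1}{2}} Y(\varepsilon) + (\cdots) \big).$$

This shows that by suitably choosing $\lambda$ we can achieve our goal of $C(\psi) - C(\varphi) = 0$. All that remains is to verify the claims made above and the proof will be complete.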

Proof of Claim A. For $i \le n$ we have to estimate

$$\partial_t^i \beta - \partial_t^i \alpha = \partial_t^{i-1} \int \frac{\psi_{tx}\psi_{xx}}{\psi_x^2} \, dx - \partial_t^{i-1} \int \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \, dx.$$

The denominator doesn't present difficulties, since $\psi$ and $\varphi$ are diffeomorphisms that decay to the identity and hence $\psi_x$ and $\varphi_x$ are bounded away from 0. It is shown in [55, 6.4] that multiplication of rapidly decaying functions is a continuous bilinear map and from

$$\int \psi \, dx \le \int \frac{dx}{1 + x^2}\, \|(1 + x^2)\psi\|_\infty$$

we see that integration is bounded as well. Hence we get the required estimate

$$\|\beta - \alpha\|_{C^n} \le D \|\psi - \varphi\|_{\mathcal{S}^{1,2,n}}.$$

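Proof of Claim B. The derivatives of $\psi$ are

$$\begin{aligned}
\psi_t(t,x) &= \tilde\varphi_t(t,x) + \varepsilon^{2m+\frac{3}{2}} \lambda f'(t)\, g\big(\varepsilon^{-2}(x - x_0)\big), \\
\psi_x(t,x) &= \tilde\varphi_x(t,x) + \varepsilon^{2m-\frac{1}{2}} \lambda f(t)\, g'\big(\varepsilon^{-2}(x - x_0)\big).
\end{aligned}$$

With this we estimate the energy

$$\begin{aligned}
E(\psi) &= \iint \psi_t^2 \psi_x \, dx\, dt \\
&= \iint \tilde\varphi_t^2 \tilde\varphi_x + \varepsilon^{2m-\frac{1}{2}} \lambda f(t)\, g'\big(\varepsilon^{-2}(x - x_0)\big)\, \psi_t^2(t,x) + \varepsilon^{2m+\frac{3}{2}} (\dots) \, dx\, dt.
\end{aligned}$$

Now we substitute $\varepsilon^2 y = x - x_0$ in the middle term to gain another factor of $\varepsilon^2$ and so

$$E(\psi) = E(\tilde\varphi) + \varepsilon^{2m+\frac{3}{2}} \lambda \iint f(t)\, g'(y)\, \psi_t^2(t, \varepsilon^2 y + x_0) \, dy\, dt + O(\varepsilon^{2m+\frac{3}{2}}).$$

We see that

$$|E(\psi) - E(\tilde\varphi)| = O(\varepsilon^{2m+\frac{3}{2}}),$$

which proves the second claim. This completes the proof.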

Now we apply a reparametrization argument to show that these results also hold for the length functional.

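Corollary 3.23. Let $\varphi(t,x)$ with $t \in [0,T]$ be a path in $\mathrm{Diff}_{\mathcal S}(\mathbb{R})$. Given $k, m, n \in \mathbb{N}$ there exists for small $\varepsilon > 0$ a path $\psi(t,x)$ with the same endpoints, close to the original path

$$\|\varphi - \psi\|_{\mathcal{S}^{k,m,n}} < \varepsilon,$$

with smaller length

$$L(\psi) < L(\varphi).$$

The same result holds for the Virasoro-Bott group $\mathbb{R}\times_c \mathrm{Diff}_{\mathcal S}(\mathbb{R})$.

Proof. Given a path $\varphi(t)$, let $\tilde\varphi(t) = \varphi\circ f(t)$ be a reparametrization with constant speed. For $\tilde\varphi(t)$ we find a path $\tilde\psi(t)$ close to it, with less energy $E(\tilde\psi) < E(\tilde\varphi)$. Then let $g(t)$ be a reparametrization such that $\tilde\psi\circ g(t)$ has constant speed and finally define $\psi(t) = \tilde\psi\circ g\circ f^{-1}(t)$.

Estimating the length is simple. A path of constant speed on $[0,T]$ satisfies $L^2 = T E$, and passing to constant speed does not increase the energy, so

$$\begin{aligned}
L(\psi)^2 &= L(\tilde\psi\circ g\circ f^{-1})^2 = L(\tilde\psi\circ g)^2 \\
&= T \int_0^T |\partial_t(\tilde\psi\circ g)(t)|^2_{\tilde\psi\circ g(t)} \, dt = T\, E(\tilde\psi\circ g) \\
&\le T\, E(\tilde\psi) < T\, E(\varphi\circ f) = T \int_0^T |\partial_t(\varphi\circ f)(t)|^2_{\varphi\circ f(t)} \, dt \\
&= L(\varphi\circ f)^2 = L(\varphi)^2.
\end{aligned}$$

Finally one has to show that $\tilde\psi$ being $\varepsilon$-close to $\tilde\varphi$ also implies that $\psi = \tilde\psi\circ g\circ f^{-1}$ is $\varepsilon$-close to $\varphi = \tilde\varphi\circ f^{-1}$. The concatenation with $f^{-1}$ doesn't present difficulties, since we can use the chain rule to express derivatives of $\tilde\varphi\circ f^{-1}$ using derivatives of $\tilde\varphi$, and $f$ only depends on the initial path $\varphi$. We defined $g$ to be the reparametrization such that $\tilde\psi\circ g$ has constant speed, and $\tilde\psi$ is $\varepsilon$-close to the path $\tilde\varphi$, which already has constant speed. As long as we assume that $n \ge 1$, which means that the $t$-derivatives of paths have to be $\varepsilon$-close to those of the original path, it follows that $g$ has to be close to the identity reparametrization.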
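The relation between length and energy used in this reparametrization argument can be checked numerically. The sketch below is only an illustration under simplifying assumptions: it uses a curve in the Euclidean plane instead of a path in $\mathrm{Diff}_{\mathcal S}(\mathbb{R})$, and the helper names `length_and_energy` and `constant_speed` are hypothetical. It shows that $L^2 \le T E$ in general and that equality holds after passing to the constant speed parametrization, which leaves the length unchanged.

```python
import numpy as np

def length_and_energy(curve, t):
    """Length int |c'| dt and energy int |c'|^2 dt of a sampled curve."""
    velocity = np.gradient(curve, t, axis=0)
    speed = np.linalg.norm(velocity, axis=1)
    dt = np.diff(t)
    length = np.sum(0.5 * (speed[1:] + speed[:-1]) * dt)
    energy = np.sum(0.5 * (speed[1:] ** 2 + speed[:-1] ** 2) * dt)
    return length, energy

def constant_speed(curve, t):
    """Resample the curve so that arc length grows linearly in t."""
    segments = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(segments)])     # arc length at samples
    s_target = s[-1] * (t - t[0]) / (t[-1] - t[0])       # linear in t
    return np.column_stack(
        [np.interp(s_target, s, curve[:, k]) for k in range(curve.shape[1])]
    )

# Quarter circle traversed with non-constant speed (illustrative choice).
T = 2.0
t = np.linspace(0.0, T, 2001)
theta = 0.5 * np.pi * (t / T) ** 2
curve = np.column_stack([np.cos(theta), np.sin(theta)])

L, E = length_and_energy(curve, t)
L_c, E_c = length_and_energy(constant_speed(curve, t), t)
```

Here `T * E` exceeds `L**2` for the original parametrization, while `T * E_c` and `L_c**2` agree up to discretization error.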

We conclude by stating the following corollary.

Corollary 3.24. The length and energy functionals on $\mathrm{Diff}_{\mathcal S}(\mathbb{R})$ and the Virasoro-Bott group, even when restricted to paths with fixed endpoints, have no local minima. All stationary points are therefore saddle-points.

3.6 Outlook

The results presented here paint a more complete picture on questions of geodesic distance than was available before, but some loose ends still remain.

For $s = \frac{1}{2}$ we could only settle the case $\mathrm{Diff}(S^1)$. The difficulty with $\mathrm{Diff}_c(\mathbb{R})$, which is technical in nature, is as follows: for $s < \frac{1}{2}$ the vector field $u(t,x)$, which was used to reach a diffeomorphism $\varphi$, had, as we decreased the energy, decreasing support in $x$, i.e.

$$\text{as } E(u) \to 0 \text{ also } \sup_{t \in [0,1]} |\operatorname{supp} u(t,\cdot)| \to 0.$$

However for $s = \frac{1}{2}$ the support of the functions from Lemma 3.12 stays constant, even as $E(u) \to 0$. This makes the control of the flow $\varphi(t,x)$ at time $t = 1$ difficult. Still, we conjecture

Conjecture 3.25. The geodesic distance vanishes on $\mathrm{Diff}_c(\mathbb{R})$ for $s = \frac{1}{2}$.

The more interesting question is about the behaviour for manifolds of higher dimension. For $0 \le s \le \frac{1}{2}$ we believe that it is possible to adapt the method of [56] to show that the distance vanishes. What is necessary is to construct one diffeomorphism which can be connected to the identity by paths of arbitrarily short length. All the other parts of the proof would remain exactly the same.

Conjecture 3.26. The geodesic distance vanishes on the space $\mathrm{Diff}_c(M)$ for $0 \le s \le \frac{1}{2}$.

The remaining open case is the behavior for $\frac{1}{2} < s < 1$ and $\dim(M) \ge 2$. The methods used here can't be applied to this case. In one dimension the $H^s$-norm admits functions with small norm that are large at one point if and only if $s \le \frac{1}{2}$. In more dimensions the $H^s$-norm admits functions with small norm that are large on a codimension one set also only for $s \le \frac{1}{2}$. In one dimension the idea of the proof was to compress the space to a point and only move around this one point. The corresponding generalization to more dimensions would be to compress the space to a codimension one set and move this set around. Intuitively it shouldn't be possible to compress it to a set of higher codimension, since we only have one time-dimension in which to do the compression. Therefore short paths are only possible for orders $s$ for which the $H^s$-norm admits functions with small norm that are large on a codimension one set. This is the case only for $s \le \frac{1}{2}$. Therefore we conjecture that

Conjecture 3.27. The geodesic distance doesn't vanish on the diffeomorphism group $\mathrm{Diff}_c(M)$ for $s > \frac{1}{2}$ and $\dim(M) > 1$.

To show that the distance vanishes, one would have to find a genuinely two-dimensional diffeomorphism that can be reached by arbitrarily short paths, since there are no longer any such paths in $\mathrm{Diff}(S^1)$ that could be embedded.

The generalization of the results for the $H^s$-metric to the Virasoro-Bott group would require a very careful analysis of the flows generated by the vector fields which were used in the proof. For the $L^2$-metric we were able to work directly with diffeomorphisms and estimate the length of the path, since

$$|\varphi_t \circ \varphi^{-1}|^2_{L^2} = \int_{\mathbb{R}} \varphi_t^2 \varphi_x \, dx.$$

Unfortunately there is no such simple expression for $|\varphi_t \circ \varphi^{-1}|^2_{H^s}$. It would be necessary to combine the construction of $\varphi$ in terms of the velocity $u(t,x)$ with estimates of the term $\int_{\mathbb{R}} \frac{\varphi_{tx}\varphi_{xx}}{\varphi_x^2} \, dx$, arising from the central extension.

These questions leave enough room for further work in this area.

4 Surface Matching

4.1 Background

The LDM framework presented in Chapter 2 can be applied to a variety of data structures. In the past it has been used to match landmarks [40], curves [29, 20], surfaces [83], volumetric images [10], vector fields [16] and tensor fields [15]. In all cases the object of interest is deformed by a diffeomorphism of the ambient space via a suitable action and the deformation energy is given as the kinetic energy of the curve of diffeomorphisms.

When matching surfaces it is not always desirable to use a model which assumes a deformation of the whole ambient space, since the objects are two-dimensional submanifolds and don't include information about the rest of the space. In [87] the authors consider only the interior of the shape, i.e. the domain bounded by the shape, and measure the deformation energy as the dissipation of a fluid occupying the interior.

The method presented in [49], which is a generalization of [62] from curves to surfaces, defines a Riemannian metric directly on discrete triangular meshes by measuring the amount of local stretching and bending. A different Riemannian metric defined on discrete meshes was presented in [45]. A drawback of these methods is that they assume an a-priori point-to-point correspondence of the template and target meshes to be matched. This assumption may be justified in some applications, but will be an obstacle in others.

This drawback is overcome in [48], where matching between two surfaces also involves an optimization over the reparametrization group to find the correct point-to-point correspondences. The authors choose a different shape representation by multiplying each (parametrized) surface with the local area density and imposing the flat $L^2$-metric on the resulting space. Via pullback this leads to a Riemannian metric on the space of surfaces. This metric is no longer flat and is invariant under reparametrizations. However there are theoretical questions which are left unanswered in their paper. Most importantly it is not clear that the metric defined in this way is non-degenerate.

In this chapter we will use a metric from the family of Sobolev-type metrics studied in [7]. This metric has the properties that it is defined intrinsically on the surface and takes into account the local geometry of the surface. The metric is also invariant under reparametrizations. We extend the framework to include surfaces with boundaries and show how to devise an efficient implementation of the geodesic equations and how to solve the matching problem.

4.2 First Order Sobolev-type Metric

The problem we are dealing with is the registration of parametrized surfaces. Such a surface is given by a smooth function $q : M \to \mathbb{R}^3$ from a model surface $M$ into the Euclidean space. We will consider different choices for the model surface $M$, such as the plane sheet $M = [0,1]\times[0,1]$, the cylinder $M = S^1\times[0,1]$ and the torus $M = S^1\times S^1$. Another important choice would be the sphere $M = S^2$, the treatment of which is however more difficult, since the sphere cannot be covered by a single coordinate chart and therefore requires different techniques than those used in this work. Note however that the metric can be defined in exactly the same way as for the other model surfaces; only the numerical treatment is more challenging. Let us denote by

$$S = \mathrm{Imm}(M, \mathbb{R}^3) = \left\{ q : M \to \mathbb{R}^3 \,:\, \frac{\partial q}{\partial x^1}, \frac{\partial q}{\partial x^2} \text{ linearly independent} \right\}$$

the set of regular parametrized surfaces. The tangent space $T_qS$ at a surface $q$ consists of all vector fields along $q$ or equivalently of all sections of the pullback bundle $q^*T\mathbb{R}^3$,

$$T_qS = \Gamma(q^*T\mathbb{R}^3) \cong C^\infty(M, \mathbb{R}^3),$$

and for a tangent vector $u \in C^\infty(M, \mathbb{R}^3)$ at the surface $q$ it is understood that $u(x) \in T_{q(x)}\mathbb{R}^3$ is an element of the tangent space of $\mathbb{R}^3$ at the point $q(x)$. Since $q$ is a surface in $\mathbb{R}^3$, it inherits a Riemannian metric from the standard Euclidean inner product on $\mathbb{R}^3$. We shall denote the Euclidean inner product by $\bar{g}(u,v) = u\cdot v$ and the inner product induced on the tangent planes to the surface by $g$,

$$g(X,Y) = \bar{g}(Tq(x).X, Tq(x).Y).$$

The induced volume form on the surface is

$$d\mathrm{vol}(g) = \sqrt{\det g_{ij}}\, dx^1 \wedge dx^2$$

and the volume density with respect to a basis of the tangent space is

$$\mathrm{vol}(g) = \sqrt{\det g_{ij}}.$$

We will always choose the basis $\frac{\partial q}{\partial x^1}, \frac{\partial q}{\partial x^2}$ induced by the given parametrization $q$. In this basis the metric has the form

$$g_{ij} = \sum_{k=1}^{3} \frac{\partial q^k}{\partial x^i} \frac{\partial q^k}{\partial x^j}.$$

The metric we will consider on $T_qS$ is

$$\begin{aligned}
G_q(u,v) &:= \sum_{k=1}^{3} \int_M \Big( u^k v^k + \alpha^2 g\big(\operatorname{grad}^g(u^k), \operatorname{grad}^g(v^k)\big) \Big) \mathrm{vol}(g) \, dx && (4.1) \\
&= \sum_{k=1}^{3} \int_M \Big( u^k v^k + \alpha^2 g^{ij} \frac{\partial u^k}{\partial x^i} \frac{\partial v^k}{\partial x^j} \Big) \mathrm{vol}(g) \, dx. && (4.2)
\end{aligned}$$

It is the $H^1$-norm for vector-valued functions on the manifold $(M, g)$.
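As an illustration of the quantities just defined, the following sketch computes the induced metric $g_{ij}$ and the volume density $\mathrm{vol}(g)$ for a sampled parametrized surface using finite differences. It is a hedged example and not the finite element discretization used later in this chapter; the function name `first_fundamental_form` and the choice of the unit cylinder as test surface are assumptions made for illustration. For the cylinder $q(x^1, x^2) = (\cos 2\pi x^1, \sin 2\pi x^1, x^2)$ one expects $g = \mathrm{diag}(4\pi^2, 1)$ and $\mathrm{vol}(g) = 2\pi$.

```python
import numpy as np

def first_fundamental_form(q, x1, x2):
    """Induced metric and volume density of a sampled surface.

    q has shape (len(x1), len(x2), 3). Returns g with shape
    (len(x1), len(x2), 2, 2) and vol = sqrt(det g).
    """
    # Partial derivatives dq/dx^1 and dq/dx^2, stacked to shape (n1, n2, 2, 3).
    dq = np.stack(
        [np.gradient(q, x1, axis=0), np.gradient(q, x2, axis=1)], axis=2
    )
    # g_ij = sum_k (dq^k/dx^i)(dq^k/dx^j)
    g = np.einsum("abik,abjk->abij", dq, dq)
    vol = np.sqrt(np.linalg.det(g))
    return g, vol

# Test surface: unit cylinder over the parameter domain [0,1] x [0,1].
x1 = np.linspace(0.0, 1.0, 201)
x2 = np.linspace(0.0, 1.0, 51)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
q = np.stack([np.cos(2 * np.pi * X1), np.sin(2 * np.pi * X1), X2], axis=-1)

g, vol = first_fundamental_form(q, x1, x2)
```

With $g$ and $\mathrm{vol}(g)$ available on the grid, the integrand of (4.2) can be assembled pointwise from the partial derivatives of $u$ and $v$ and the inverse matrix $g^{ij}$.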

4.3 Variation of the Metric

In order to compute the geodesic equation we will need to compute the derivative of the energy

$$E(u_t) = \frac{1}{2} \int_0^1 G_{q_t}(u_t, u_t) \, dt = \int_0^1 \ell(u_t, u_t; q_t) \, dt$$

with the Lagrangian

$$\ell(u, v; q) = \frac{1}{2} \sum_{j=1}^{3} \int_M u^j v^j + \alpha^2 g^{-1}(du^j, dv^j) \, d\mathrm{vol},$$

which represents the kinetic energy of the curve at each time. To compute the variation $D\ell(u,v;q).\delta q$ with respect to the basepoint we will need the derivatives of the functionals

$$\begin{aligned}
H_{F,u,v}(q) &= \int_M g^{-1}(du, dv) F \, dx, \\
G_F(q) &= \int_M F \, d\mathrm{vol}.
\end{aligned}$$

Then we will be able to write

$$D\ell(u, v; q).\delta q = DG_{F_1}(q).\delta q + \sum_{j=1}^{3} DH_{F_2, u^j, v^j}(q).\delta q \qquad (4.3)$$

with

$$\begin{aligned}
F_1 &= \frac{1}{2} \sum_{j=1}^{3} u^j v^j + \alpha^2 g^{-1}(du^j, dv^j), \\
F_2 &= \frac{1}{2} \mathrm{vol}.
\end{aligned}$$

The variations of the functionals $H_{F,u,v}$ and $G_F$ are given in the following lemmas.

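Lemma 4.1. For $q \in S$ a surface, $F, u, v \in C^\infty(M, \mathbb{R})$ functions on the surface and $\delta q \in T_qS$ a variation of $q$ we have

$$\begin{aligned}
DH_{F,u,v}(q).\delta q &= -\int_M \bar{g}\big( \nabla_{\operatorname{grad}^g u} \delta q,\, Tq.\operatorname{grad}^g v \big) F \, dx + (u \leftrightarrow v) \\
&= -\sum_{j=1}^{3} \int_M g^{-1}(du, d(\delta q^j))\, g^{-1}(dv, dq^j) F \, dx + (u \leftrightarrow v).
\end{aligned}$$

The connection $\nabla$ is the connection on the Euclidean space. In coordinates the derivative is

$$DH_{F,u,v}(q).\delta q = -\int_M F g^{ij} g^{kl} \left( \frac{\partial u}{\partial x^i} \frac{\partial v}{\partial x^k} + \frac{\partial u}{\partial x^k} \frac{\partial v}{\partial x^i} \right) \frac{\partial q}{\partial x^j} \cdot \frac{\partial (\delta q)}{\partial x^l} \, dx.$$

Proof. The rule for differentiating the inverse of a matrix,

$$\partial_t A^{-1} = -A^{-1} (\partial_t A) A^{-1},$$

has the following analogue for symmetric tensors:

$$\begin{aligned}
D\big( g^{-1}(du, dv) \big).\delta q &= \big( D g^{-1}.\delta q \big)(du, dv) = -\big( g^{-1} (Dg.\delta q)\, g^{-1} \big)(du, dv) \\
&= -D\big( g(\operatorname{grad}^g u, \operatorname{grad}^g v) \big).\delta q.
\end{aligned}$$

Introducing the notation $X = \operatorname{grad}^g u$ and $Y = \operatorname{grad}^g v$ and choosing a one-parameter family of variations $q_t$ such that $\partial_t q_t|_{t=0} = \delta q$, we obtain

$$\begin{aligned}
DH_{F,u,v}(q).\delta q &= -\int_M Dg(X,Y).\delta q \, F \, dx \\
&= -\int_M \partial_t \big( q_t^* \bar{g} \big)(X,Y) F \, dx \\
&= -\int_M \partial_t\, \bar{g}(Tq_t.X, Tq_t.Y) F \, dx \\
&= -\int_M \bar{g}(\partial_t Tq_t.X, Tq.Y) F \, dx + (u \leftrightarrow v) \\
&= -\int_M \bar{g}(\nabla_X \delta q, Tq.Y) F \, dx + (u \leftrightarrow v).
\end{aligned}$$

To obtain the expression in coordinates we use the formulas

$$\begin{aligned}
X &= g^{ij} \frac{\partial u}{\partial x^i} \frac{\partial}{\partial x^j}, & \nabla_X \delta q &= X^j \frac{\partial (\delta q)}{\partial x^j}, \\
Y &= g^{kl} \frac{\partial v}{\partial x^k} \frac{\partial}{\partial x^l}, & Tq.Y &= Y^l \frac{\partial q}{\partial x^l},
\end{aligned}$$

from which the result follows.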

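Lemma 4.2. For $q \in S$ a surface, $F \in C^\infty(M, \mathbb{R})$ a function on the surface and $\delta q \in T_qS$ a variation of $q$ we have

$$DG_F(q).\delta q = \sum_{j=1}^{3} \int_M g^{-1}(dq^j, d(\delta q^j)) F \, d\mathrm{vol}.$$

Written in coordinates the result is

$$DG_F(q).\delta q = \int_M F g^{ij} \frac{\partial q}{\partial x^i} \cdot \frac{\partial \delta q}{\partial x^j} \, d\mathrm{vol}.$$

Proof. Since the volume form is defined using the determinant of the metric,

$$d\mathrm{vol}(g) = \sqrt{\det(g)}\, dx^1 \wedge dx^2,$$

we'll need to differentiate the determinant. To do so, we'll use some notation from linear algebra. Let $\hat{g}_{ij}$ denote the co-factors of the matrix $g$. That is

$$\hat{g}_{ij} = (-1)^{i+j} \det(g \text{ without } i\text{-th row and } j\text{-th column}).$$

Using the co-factors we can write the inverse of $g$ as

$$g^{ij} = \frac{1}{\det(g)} \hat{g}_{ji}$$

and of course $\hat{g}_{ij} = \hat{g}_{ji}$, since $g$ is symmetric. The differential of the determinant is given by

$$D(\det(g)).\delta g = \sum_{i,j} \hat{g}_{ij}\, \delta g_{ij}$$

and from $g_{ij} = \sum_k \frac{\partial q^k}{\partial x^i} \frac{\partial q^k}{\partial x^j}$ we get

$$\begin{aligned}
D(\det(g)).\delta q &= \sum_{i,j,k} \hat{g}_{ij} \left( \frac{\partial (\delta q^k)}{\partial x^i} \frac{\partial q^k}{\partial x^j} + \frac{\partial q^k}{\partial x^i} \frac{\partial (\delta q^k)}{\partial x^j} \right) \\
&= 2 \sum_{i,j,k} \hat{g}_{ij} \frac{\partial q^k}{\partial x^j} \frac{\partial (\delta q^k)}{\partial x^i} \\
&= 2 \det(g) \sum_{i,j,k} g^{ij} \frac{\partial q^k}{\partial x^j} \frac{\partial (\delta q^k)}{\partial x^i} \\
&= 2 \det(g) \sum_k g^{-1}(dq^k, d(\delta q^k)).
\end{aligned}$$

Using this it is straightforward to compute the derivative of $G_F(q)$:

$$\begin{aligned}
DG_F(q).\delta q &= \int_M F\, D(\mathrm{vol}(g)).\delta q \, dx \\
&= \int_M F\, D\big(\sqrt{\det(g)}\big).\delta q \, dx \\
&= \int_M F \frac{1}{2\sqrt{\det(g)}} D(\det(g)).\delta q \, dx \\
&= \sum_j \int_M F g^{-1}(dq^j, d(\delta q^j)) \sqrt{\det(g)} \, dx \\
&= \sum_j \int_M F g^{-1}(dq^j, d(\delta q^j)) \, d\mathrm{vol}.
\end{aligned}$$

The coordinate expression can be derived via the formula for the inverse metric

$$g^{-1}(du, dv) = g^{ij} \frac{\partial u}{\partial x^i} \frac{\partial v}{\partial x^j}.$$

This concludes the proof.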
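The two linear algebra facts used in this proof, the co-factor expression for the inverse and the differential of the determinant, can be verified numerically. The following is a small sketch; the function name `cofactor_matrix` and the sample matrices `g` and `dg` are illustrative assumptions.

```python
import numpy as np

def cofactor_matrix(g):
    """Co-factors (-1)^(i+j) det(g without i-th row and j-th column)."""
    n = g.shape[0]
    cof = np.empty((n, n), dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(g, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof

# A symmetric positive definite sample metric and a symmetric variation.
g = np.array([[2.0, 0.5], [0.5, 1.0]])
dg = np.array([[0.3, -0.1], [-0.1, 0.2]])

cof = cofactor_matrix(g)

# D(det g).dg = sum_ij cof_ij dg_ij, compared with a central difference.
analytic = np.sum(cof * dg)
h = 1e-4
numeric = (np.linalg.det(g + h * dg) - np.linalg.det(g - h * dg)) / (2 * h)

# Inverse via co-factors: g^{ij} = cof_ji / det(g).
g_inv = cof.T / np.linalg.det(g)
```

For a $2\times 2$ matrix the determinant is quadratic in the entries, so the central difference reproduces the analytic derivative up to rounding.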

We will also need the second derivatives of these functionals, which are computed in the following lemmas.

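Lemma 4.3. For $q \in S$ a surface, $F, u, v \in C^\infty(M, \mathbb{R})$ functions on the surface and $\delta q_1, \delta q_2 \in T_qS$ variations of $q$ we have

$$\begin{aligned}
D^2 H_{F,u,v}.(\delta q_1, \delta q_2) = &-\sum_{j=1}^{3} DH_{F g^{-1}(du, d\delta q_1^j),\, q^j,\, v}.\delta q_2 + DH_{F g^{-1}(dv, dq^j),\, \delta q_1^j,\, u}.\delta q_2 \\
&-\sum_{j=1}^{3} \int_M g^{-1}(du, d\delta q_1^j)\, g^{-1}(dv, d\delta q_2^j) F \, dx + (u \leftrightarrow v).
\end{aligned}$$

Proof. Differentiate the expression in Lemma 4.1 and reuse the functional $H_{F,u,v}$.

Lemma 4.4. For $q \in S$ a surface, $F \in C^\infty(M, \mathbb{R})$ a function on the surface and $\delta q_1, \delta q_2 \in T_qS$ variations of $q$ we have

$$\begin{aligned}
D^2 G_F.(\delta q_1, \delta q_2) = &\sum_{j=1}^{3} DG_{F g^{-1}(dq^j, d\delta q_1^j)}(q).\delta q_2 + DH_{F \mathrm{vol},\, q^j,\, \delta q_1^j}(q).\delta q_2 \\
&+ \sum_{j=1}^{3} \int_M g^{-1}(d\delta q_1^j, d\delta q_2^j) F \, d\mathrm{vol}.
\end{aligned}$$

Proof. Differentiate the expression in Lemma 4.2.

Lemma 4.5. Let $q \in S$ be a surface and $\delta q_1, \delta q_2 \in T_qS$ variations of $q$. The second derivative of $\ell(u, u; q)$ is given by

$$\begin{aligned}
D^2 \ell(u, u; q).(\delta q_1, \delta q_2) = \; &D^2 G_{F_1}(q).(\delta q_1, \delta q_2) + \sum_{j=1}^{3} D^2 H_{F_2, u^j, v^j}(q).(\delta q_1, \delta q_2) \\
&+ DG_{F_3}(q).\delta q_2 + \sum_{j=1}^{3} DH_{F_4, u^j, v^j}(q).\delta q_2
\end{aligned}$$

with

$$\begin{aligned}
F_1 &= \frac{1}{2} \sum_{j=1}^{3} u^j v^j + \alpha^2 g^{-1}(du^j, dv^j), \\
F_2 &= \frac{1}{2} \mathrm{vol}, \\
F_3 &= \frac{1}{2} \sum_{j,k=1}^{3} g^{-1}(du^j, d\delta q_1^k)\, g^{-1}(dv^j, dq^k), \\
F_4 &= \frac{1}{2} \sum_{k=1}^{3} g^{-1}(dq^k, d\delta q_1^k) \, \mathrm{vol}.
\end{aligned}$$

Proof. Differentiate (4.3).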

4.4 The Helmholtz Operator and Duality

In the theoretical and numerical treatment of LDM [85] it is common to write the inner product on the space of vector fields, which is given in terms of a smoothing kernel $K$ and its inverse, as an $L^2$-pairing between a velocity and a momentum, i.e.

$$\langle u, v \rangle_K = \int_\Omega K^{-1} u \cdot v \, dx = \int_\Omega p \cdot v \, dx = \langle p, v \rangle_{L^2(\Omega)}.$$

Here $p = K^{-1}u$ is the momentum associated to the velocity field $u$. Such a representation is of interest, because it can be shown that the momentum which minimizes the matching energy for LDM has the same complexity as the data structure. This is an important reduction of complexity, especially when matching landmarks, curves or surfaces. Such a representation depends on the invertibility of the kernel $K$.

In this section we will show how to define a momentum for the $H^1$-metric on $S$ and how to deal with the case when $M$ has nonempty boundary, $\partial M \ne \emptyset$.

In the case that $M$ has no boundary, we can proceed as for LDM. Write the expression for the metric

$$G_q(u, v) = \sum_{j=1}^{3} \int_M u^j v^j + \alpha^2 g(\operatorname{grad}^g u^j, \operatorname{grad}^g v^j) \, d\mathrm{vol}$$

and integrate by parts to obtain

$$G_q(u, v) = \sum_{j=1}^{3} \int_M (u^j - \alpha^2 \Delta^g u^j) v^j \, d\mathrm{vol},$$

where $\Delta^g u = \operatorname{div}^g \operatorname{grad}^g u$ is the Laplace operator on $M$. Hence

$$p = u - \alpha^2 \Delta^g u = (\mathrm{Id} - \alpha^2 \Delta^g) u$$

is the momentum corresponding to $u$, which allows us to write

$$G_q(u, v) = \langle p, v \rangle_{L^2(d\mathrm{vol})}.$$

The operator $L = \mathrm{Id} - \alpha^2 \Delta^g$ is called the Helmholtz operator. Note that the integration by parts was only possible because $M$ has no boundary. Now let $M$ have nonempty boundary. In this case the integration by parts introduces a boundary term.

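Lemma 4.6. For $u, v \in C^\infty(M, \mathbb{R})$ we have

$$\int_M g^{-1}(du, dv) \, d\mathrm{vol} = -\int_M (\Delta^g u) v \, d\mathrm{vol} + \int_{\partial M} \frac{\partial u}{\partial \nu} v \, dS,$$

where $\nu$ denotes the outward pointing normal vector to the boundary and $dS = d\mathrm{vol}_{\partial M}$ is the volume form on $\partial M$.

Proof. If $X \in \mathfrak{X}(M)$ is a vector field on $M$, Green's identity states that

$$\int_M \operatorname{div}^g(X) \, d\mathrm{vol} = \int_{\partial M} g(X, \nu) \, dS.$$

We apply this identity with $X = v \operatorname{grad}^g u$, for which the divergence is

$$\operatorname{div}^g(v \operatorname{grad}^g u) = dv(\operatorname{grad}^g u) + (\Delta^g u) v = g^{-1}(du, dv) + (\Delta^g u) v.$$

Similarly the boundary term is

$$g(v \operatorname{grad}^g u, \nu) = g(\operatorname{grad}^g u, \nu) v = \frac{\partial u}{\partial \nu} v,$$

which concludes the proof.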

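Using this lemma we can rewrite the expression for the inner product as

$$\begin{aligned}
G_q(u, v) &= \sum_{j=1}^{3} \int_M u^j v^j + \alpha^2 g(\operatorname{grad}^g u^j, \operatorname{grad}^g v^j) \, d\mathrm{vol} \\
&= \sum_{j=1}^{3} \int_M (u^j - \alpha^2 \Delta^g u^j) v^j \, d\mathrm{vol} + \int_{\partial M} \frac{\partial u^j}{\partial \nu} v^j \, dS.
\end{aligned}$$

Hence the momentum $p$ associated to the tangent vector $u$ has to consist of two components, $p = (p^M, p^\partial)$, one of which corresponds to the classical momentum and the other, which lives on the boundary,

$$p^M = (\mathrm{Id} - \alpha^2 \Delta^g) u = Lu, \qquad p^\partial = \frac{\partial u}{\partial \nu} = Bu.$$

The passage from velocity to momentum can be written as

$$L \oplus B : \begin{cases} C^\infty(M, \mathbb{R}^3) \to C^\infty(M, \mathbb{R}^3) \oplus C^\infty(\partial M, \mathbb{R}^3) \\ u \mapsto \left( (\mathrm{Id} - \alpha^2 \Delta^g) u, \frac{\partial u}{\partial \nu} \right) \end{cases}$$

The theory of elliptic partial differential equations with Neumann boundary conditions on Riemannian manifolds [17] tells us that the operator $L \oplus B$ is invertible and therefore we have a well-defined notion of momentum in the case of a surface with boundary, which allows us to write

$$G_q(u, v) = \langle p^M, v \rangle_{L^2(d\mathrm{vol})} + \langle p^\partial, v \rangle_{L^2(dS)}.$$

It also shows that it is necessary to have a component of the momentum on the boundary. Using the momentum it is now possible to write the geodesic equations in the form

$$\begin{aligned}
\partial_t q_t &= u_t, & \partial_t p_t^M &= \left( \frac{\delta \ell}{\delta q} \right)^M, \\
(p_t^M, p_t^\partial) &= (Lu_t, Bu_t), & \partial_t p_t^\partial &= \left( \frac{\delta \ell}{\delta q} \right)^\partial,
\end{aligned}$$

where $\frac{\delta \ell}{\delta q}$ is the functional derivative of $\ell$, defined via

$$D\ell(u, v; q).\delta q = \left\langle \left( \frac{\delta \ell}{\delta q} \right)^M, \delta q \right\rangle_{L^2(d\mathrm{vol})} + \left\langle \left( \frac{\delta \ell}{\delta q} \right)^\partial, \delta q \right\rangle_{L^2(dS)}.$$

However, to compute the time-evolution of the momentum it is necessary to evaluate the functional derivative $\frac{\delta \ell}{\delta q}$, which involves evaluating derivatives of $u_t$ and $q_t$ on the boundary of the surface, an operation which is numerically unstable. For this reason we will discretize the geodesic equation in a way that avoids computing the momentum altogether.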

4.5 Discretization

In this section we will describe how to discretize the variational problem

$$E(u_t) = \int_0^1 \ell(u_t, u_t; q_t) \, dt + \frac{1}{2\sigma^2} \|q_1 - q_{\mathrm{targ}}\|^2_{L^2}$$

and compute the optimal path between two surfaces. Starting with an initial guess for the velocity $u_0^1$, the goal is to use a gradient descent scheme

$$u_0^{i+1} = u_0^i - \varepsilon \nabla_{u_0} E(u_0^i)$$

to converge towards the initial velocity of the optimal geodesic. The discretization thus consists of two parts:

• compute the geodesic, given the initial velocity, to evaluate $E(u_0^i)$
• compute the gradient $\nabla_{u_0} E(u_0^i)$ to update the initial velocity.

We show how to discretize the geodesic equation in Section 4.5.1 and how to compute the gradient in Section 4.5.2.
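The structure of this shooting scheme can be sketched on a toy problem. The example below is an assumption-laden illustration and not the surface matching implementation: it replaces the space of surfaces by $\mathbb{R}^d$ with the flat metric, so that the geodesic with initial velocity $u_0$ is simply $q_1 = q_0 + u_0$ and the kinetic term is $\frac{1}{2}|u_0|^2$. All function names are hypothetical. In this setting the minimizer is known in closed form, $u_0 = (q_{\mathrm{targ}} - q_0)/(1 + \sigma^2)$, which gives a simple check of the iteration.

```python
import numpy as np

def shoot(q0, u0):
    """Toy geodesic shooting: for a flat metric, q(1) = q0 + u0."""
    return q0 + u0

def matching_energy(q0, u0, q_targ, sigma):
    """Kinetic energy of the geodesic plus the mismatch penalty at t = 1."""
    q1 = shoot(q0, u0)
    return 0.5 * np.sum(u0**2) + np.sum((q1 - q_targ) ** 2) / (2 * sigma**2)

def energy_gradient(q0, u0, q_targ, sigma):
    """Gradient of matching_energy with respect to the initial velocity."""
    return u0 + (shoot(q0, u0) - q_targ) / sigma**2

def gradient_descent(q0, q_targ, sigma, step, n_iter):
    """Iterate u0 <- u0 - step * grad E(u0), starting from u0 = 0."""
    u0 = np.zeros_like(q0)
    for _ in range(n_iter):
        u0 = u0 - step * energy_gradient(q0, u0, q_targ, sigma)
    return u0

q0 = np.array([0.0, 0.0, 0.0])
q_targ = np.array([1.0, -2.0, 0.5])
sigma = 0.5

u_opt = gradient_descent(q0, q_targ, sigma, step=0.1, n_iter=200)
u_exact = (q_targ - q0) / (1.0 + sigma**2)
```

In the surface case `shoot` is replaced by the discrete geodesic equation and `energy_gradient` by the gradient computation described in the two subsections that follow; the outer loop keeps the same form.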

4.5.1 The Geodesic Equation

We discretize the time-evolution of the surface q(t) using the explicit Euler method

$$q_{i+1} = q_i + \Delta t\, u_i , \tag{4.4}$$

where $q_i = q(i \Delta t)$ is the discretized version of the curve and $\Delta t = 1/N$ is the time step, if we divide the interval $[0, 1]$ into $N$ parts. To compute

$u_i$ we note that a geodesic is a critical point of the time-discretized energy $E(u) = \frac{\Delta t}{2} \sum_{i=0}^{N-1} G_{q_i}(u_i, u_i)$, i.e. $\nabla_{u_i} E(u) = 0$. Following [22] we introduce the Lagrangian multiplier $p_i$ in the discrete variational principle

$$E(u_0, \dots, u_{N-1}) = \sum_{i=0}^{N-1} \frac{\Delta t}{2} G_{q_i}(u_i, u_i) + \langle p_i, q_{i+1} - q_i - \Delta t\, u_i \rangle_{L^2} \tag{4.5}$$

and take variations. As seen in the discussion in Section 4.4, the Lagrangian multiplier $p_i$ is a momentum and therefore consists of two components $p_i = (p_i^M, p_i^\partial)$. However, since we will not need to compute it directly, we will write with a slight abuse of notation

$$\langle p, u \rangle_{L^2} = \langle p^M, u \rangle_{L^2(\mathrm{dvol})} + \langle p^\partial, u \rangle_{L^2(dS)}$$

for the pairing between momentum and velocity. Taking variations gives

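$$\begin{aligned}
\sum_{i=0}^{N-1} & D_{u_i} E(u_0, \dots, u_{N-1}).\delta u_i = \\
&= \sum_{i=0}^{N-1} \Big( \Delta t\, G_{q_i}(u_i, \delta u_i) + \Delta t \Big\langle \frac{\delta \ell}{\delta q}(u_i, u_i; q_i), \delta q_i \Big\rangle_{L^2} + \langle p_i, \delta q_{i+1} - \delta q_i - \Delta t\, \delta u_i \rangle_{L^2} \Big) \\
&= \Delta t \sum_{i=0}^{N-1} \Big( G_{q_i}(u_i, \delta u_i) - \langle p_i, \delta u_i \rangle_{L^2} \Big) \\
&\quad + \sum_{i=0}^{N-1} \Big( \Delta t \Big\langle \frac{\delta \ell}{\delta q}(u_i, u_i; q_i), \delta q_i \Big\rangle_{L^2} + \langle p_i, \delta q_{i+1} \rangle_{L^2} - \langle p_i, \delta q_i \rangle_{L^2} \Big) \\
&= \Delta t \sum_{i=0}^{N-1} \Big( G_{q_i}(u_i, \delta u_i) - \langle p_i, \delta u_i \rangle_{L^2} \Big) \\
&\quad + \sum_{i=1}^{N-1} \Big( \Delta t \Big\langle \frac{\delta \ell}{\delta q}(u_i, u_i; q_i), \delta q_i \Big\rangle_{L^2} + \langle p_{i-1}, \delta q_i \rangle_{L^2} - \langle p_i, \delta q_i \rangle_{L^2} \Big) \\
&\quad + \Delta t \Big\langle \frac{\delta \ell}{\delta q}(u_0, u_0; q_0), \delta q_0 \Big\rangle_{L^2} + \langle p_{N-1}, \delta q_N \rangle_{L^2} - \langle p_0, \delta q_0 \rangle_{L^2} .
\end{aligned}$$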

From this we obtain the equations

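$$\langle p_i, \delta q_i \rangle_{L^2} = \langle p_{i-1}, \delta q_i \rangle_{L^2} + \Delta t \left\langle \frac{\delta \ell}{\delta q}(u_i, u_i; q_i), \delta q_i \right\rangle_{L^2}$$

$$\langle p_i, \delta u_i \rangle_{L^2} = G_{q_i}(u_i, \delta u_i) ,$$

which, when combined, give us a weak form of the evolution equation for $u$,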

$$G_{q_i}(u_i, \delta u_i) = G_{q_{i-1}}(u_{i-1}, \delta u_i) + \Delta t\, D\ell(u_i, u_i; q_i).\delta q_i .$$

This equation is an implicit time-step for ui, since ui appears on the right hand side in a quadratic term. To make computations easier and avoid having to solve a nonlinear equation, we changed the right hand side to an explicit Euler time step

$$G_{q_i}(u_i, \delta q_i) = G_{q_{i-1}}(u_{i-1}, \delta q_i) + \Delta t\, D\ell(u_{i-1}, u_{i-1}; q_{i-1}).\delta q_i . \tag{4.6}$$

The derivative $D\ell$ was computed in (4.3) and its evaluation in direction $\delta q_i$ only involves first derivatives of $u_i$, $q_i$ and $\delta q_i$ and integration over the whole surface $M$. No second derivatives or boundary terms are needed. The computation of a geodesic is summarized in the following pseudocode. Given an initial surface $q$ and initial velocity $u$, it computes the value of the geodesic at time $T_{\mathrm{End}}$, discretized into $N_T$ time-steps.

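function Exp($q$, $u$, $T_{\mathrm{End}}$, $N_T$)
    $q_0, u_0 \leftarrow q, u$
    $\Delta t \leftarrow T_{\mathrm{End}} / N_T$
    for $i \leftarrow 1, N_T$ do
        $q_i \leftarrow q_{i-1} + \Delta t\, u_{i-1}$
        Find $u_i$, such that for arbitrary $\delta q$
            $G_{q_i}(u_i, \delta q) = G_{q_{i-1}}(u_{i-1}, \delta q) + \Delta t\, D\ell(u_{i-1}, u_{i-1}; q_{i-1}).\delta q$
    end for
    return $q_{N_T}, u_{N_T}$
end function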
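A finite-dimensional sketch may help to see how the time-stepping in the pseudocode works. It assumes a toy conformal metric $G_q(u, v) = c(q) \langle u, v \rangle$ on $\mathbb{R}^n$ with $c(q) = 1 + |q|^2$, which is not the Sobolev-type metric of this chapter. In this setting the weak equation (4.6) reduces to a division by $c(q_i)$ instead of a finite element solve. The names `exp_map`, `conformal_factor` and `d_ell` are illustrative.

```python
import numpy as np

def conformal_factor(q):
    """Assumed toy metric G_q(u, v) = c(q) <u, v> with c(q) = 1 + |q|^2."""
    return 1.0 + np.dot(q, q)

def d_ell(u, q):
    """Derivative of l(u, u; q) = G_q(u, u) / 2 with respect to q, as a vector."""
    return np.dot(u, u) * q

def exp_map(q, u, t_end, n_steps, c=conformal_factor, dl=d_ell):
    """Explicit time-stepping following (4.4) and (4.6) for the toy metric."""
    dt = t_end / n_steps
    for _ in range(n_steps):
        q_new = q + dt * u                # position update (4.4)
        rhs = c(q) * u + dt * dl(u, q)    # right-hand side of (4.6) as a vector
        u = rhs / c(q_new)                # solve G_{q_i}(u_i, .) = rhs
        q = q_new
    return q, u
```

For the flat metric $c \equiv 1$ the velocity stays constant and the scheme returns $q + T_{\mathrm{End}}\, u$. For the conformal metric the kinetic energy $\tfrac{1}{2} G_q(u, u)$ is only approximately conserved, with a drift that shrinks as the number of steps grows.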

For spatial discretization we used the finite element method. This choice was natural, since the time-update of the velocity (4.6) is given in a weak form. The model manifold was taken to be the square $[0, 1] \times [0, 1]$. The square was divided into $N \times N$ smaller squares of equal size, which were then split along the diagonal to give a triangular mesh. On this triangular mesh with $N^2$ vertices we used Lagrange finite elements of order 1. To test the convergence of the algorithm, we looked at how the endpoint of the geodesic changes if we refine the time- and space-discretizations.

Figure 4.1: The distance $\| q_1(N, N_T) - q_{1,\mathrm{ex}} \|^2_{L^2}$ decreases with increasing number of time-steps $N_T$. (Horizontal axis: number of time-steps; vertical axis: $L^2$-distance of the endpoint, logarithmic scale.)

We chose an initial surface $q_0$ and an initial velocity $u_0$ and computed the endpoint of the geodesic $q_1(N, N_T)$. We used $q_{1,\mathrm{ex}} \approx q_1(192, 100)$ as an approximation to the exact endpoint of the geodesic and looked at the $L^2$-distance $\| q_1(N, N_T) - q_{1,\mathrm{ex}} \|^2_{L^2}$ as we increase the number $N$ of points in the spatial discretization and the number $N_T$ of time-steps. We see in Figures 4.1 and 4.2 that in both cases the endpoint of the geodesic seems to converge.
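The kind of refinement study described above, where a fine run serves as a stand-in for the exact endpoint, can be sketched on a scalar toy problem. The sketch below assumes the ordinary differential equation $y' = -y$ in place of the geodesic equation and estimates the observed order of convergence from errors at successively doubled step counts. The function names are illustrative.

```python
import math

def euler_endpoint(n_steps, y0=1.0, t_end=1.0):
    """Explicit Euler for the toy problem y' = -y (stand-in for the geodesic flow)."""
    dt = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y = y + dt * (-y)
    return y

def observed_orders(step_counts, n_ref):
    """Errors against a fine reference run and the resulting convergence orders.

    step_counts is assumed to double from one entry to the next.
    """
    y_ref = euler_endpoint(n_ref)
    errors = [abs(euler_endpoint(n) - y_ref) for n in step_counts]
    orders = [math.log2(errors[k] / errors[k + 1]) for k in range(len(errors) - 1)]
    return errors, orders
```

Calling `observed_orders([10, 20, 40, 80], 6400)` should give orders close to 1, as expected for an explicit Euler scheme.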

4.5.2 Computing the Gradient

Given (4.4), (4.6) for the evolution of a geodesic, we again use the method of adjoint equations to compute the gradient of the energy with respect to the initial velocity. We introduce the Lagrangian multipliers $\hat{u}_i$, $\hat{p}_i$ and write the energy

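$$\begin{aligned}
E(u_0) = {} & \frac{1}{2} G_{q_0}(u_0, u_0) + \frac{1}{2\sigma^2} \| q_N - q_{\mathrm{targ}} \|^2_{L^2} \\
& + \sum_{i=0}^{N-1} \Big( \langle \hat{p}_{i+1}, q_{i+1} - q_i - \Delta t\, u_i \rangle_{L^2} \\
& \qquad + G_{q_{i+1}}(\hat{u}_{i+1}, u_{i+1}) - G_{q_i}(\hat{u}_{i+1}, u_i) - \Delta t\, D\ell(u_i, u_i; q_i).\hat{u}_{i+1} \Big) .
\end{aligned}$$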

Figure 4.2: The distance $\| q_1(N, N_T) - q_{1,\mathrm{ex}} \|^2_{L^2}$ decreases with increasing number of points $N$ in the spatial discretization. (Horizontal axis: spatial subdivisions in each dimension; vertical axis: $L^2$-distance of the endpoint, logarithmic scale.)

Taking variations gives

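$$\begin{aligned}
DE(u_0).\delta u_0 = {} & G_{q_0}(u_0, \delta u_0) + D\ell(u_0, u_0; q_0).\delta q_0 + \frac{1}{\sigma^2} \langle q_N - q_{\mathrm{targ}}, \delta q_N \rangle_{L^2} \\
& + \sum_{i=0}^{N-1} \Big( \langle \hat{p}_{i+1}, \delta q_{i+1} - \delta q_i - \Delta t\, \delta u_i \rangle_{L^2} \\
& \qquad + G_{q_{i+1}}(\hat{u}_{i+1}, \delta u_{i+1}) + 2 D\ell(u_{i+1}, \hat{u}_{i+1}; q_{i+1}).\delta q_{i+1} \\
& \qquad - G_{q_i}(\hat{u}_{i+1}, \delta u_i) - 2 D\ell(u_i, \hat{u}_{i+1}; q_i).\delta q_i \\
& \qquad - 2 \Delta t\, D\ell(u_i, \delta u_i; q_i).\hat{u}_{i+1} - \Delta t\, D^2\ell(u_i, u_i; q_i).(\hat{u}_{i+1}, \delta q_i) \Big) .
\end{aligned}$$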

Collecting terms with the same variations, and noting that $\delta q_0 = 0$ since we have fixed the initial value of the geodesic, leads to

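$$\begin{aligned}
DE(u_0).\delta u_0 = {} & G_{q_0}(u_0, \delta u_0) + \frac{1}{\sigma^2} \langle q_N - q_{\mathrm{targ}}, \delta q_N \rangle_{L^2} \\
& + \sum_{i=1}^{N-1} \Big( \langle \hat{p}_i, \delta q_i \rangle_{L^2} - \langle \hat{p}_{i+1}, \delta q_i \rangle_{L^2} + 2 D\ell(u_i, \hat{u}_i; q_i).\delta q_i \\
& \qquad - 2 D\ell(u_i, \hat{u}_{i+1}; q_i).\delta q_i - \Delta t\, D^2\ell(u_i, u_i; q_i).(\hat{u}_{i+1}, \delta q_i) \Big) \\
& + \sum_{i=1}^{N-1} \Big( G_{q_i}(\hat{u}_i, \delta u_i) - G_{q_i}(\hat{u}_{i+1}, \delta u_i) - \Delta t \langle \hat{p}_{i+1}, \delta u_i \rangle_{L^2} \\
& \qquad - 2 \Delta t\, D\ell(u_i, \delta u_i; q_i).\hat{u}_{i+1} \Big) \\
& + \langle \hat{p}_N, \delta q_N \rangle_{L^2} + G_{q_N}(\hat{u}_N, \delta u_N) + 2 D\ell(u_N, \hat{u}_N; q_N).\delta q_N \\
& - \Delta t \langle \hat{p}_1, \delta u_0 \rangle_{L^2} - G_{q_0}(\hat{u}_1, \delta u_0) - 2 \Delta t\, D\ell(u_0, \delta u_0; q_0).\hat{u}_1 .
\end{aligned}$$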

The resulting equations for $\hat{p}_i$, $\hat{u}_i$ are

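$$\begin{aligned}
G_{q_i}(\hat{u}_i, \delta u_i) &= G_{q_i}(\hat{u}_{i+1}, \delta u_i) + \Delta t \langle \hat{p}_{i+1}, \delta u_i \rangle_{L^2} + 2 \Delta t\, D\ell(u_i, \delta u_i; q_i).\hat{u}_{i+1} \\
\langle \hat{p}_i, \delta q_i \rangle_{L^2} &= \langle \hat{p}_{i+1}, \delta q_i \rangle_{L^2} + 2 D\ell(\hat{u}_{i+1} - \hat{u}_i, u_i; q_i).\delta q_i + \Delta t\, D^2\ell(u_i, u_i; q_i).(\hat{u}_{i+1}, \delta q_i) ,
\end{aligned}$$

which are integrated backwards in time. The initial condition for $\hat{u}_i$ is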

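$$G_{q_N}(\hat{u}_N, \delta u_N) = 0$$

and hence $\hat{u}_N = 0$. This then simplifies the initial condition for $\hat{p}_i$, which is

$$\begin{aligned}
\langle \hat{p}_N, \delta q_N \rangle_{L^2} &= -\frac{1}{\sigma^2} \langle q_N - q_{\mathrm{targ}}, \delta q_N \rangle_{L^2} - 2 D\ell(u_N, \hat{u}_N; q_N).\delta q_N \\
&= -\frac{1}{\sigma^2} \langle q_N - q_{\mathrm{targ}}, \delta q_N \rangle_{L^2} .
\end{aligned}$$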

The derivative of the energy is given by

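$$\begin{aligned}
DE(u_0).\delta u_0 &= G_{q_0}(u_0, \delta u_0) - G_{q_0}(\hat{u}_1, \delta u_0) - \Delta t \langle \hat{p}_1, \delta u_0 \rangle_{L^2} - 2 \Delta t\, D\ell(u_0, \delta u_0; q_0).\hat{u}_1 \\
&= G_{q_0}(u_0, \delta u_0) - G_{q_0}(\hat{u}_0, \delta u_0) .
\end{aligned}$$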

If we define the gradient $\nabla_{u_0} E(u_0)$ via

$$DE(u_0).\delta u_0 = G_{q_0}(\nabla_{u_0} E(u_0), \delta u_0) ,$$

then the gradient is

$$\nabla_{u_0} E(u_0) = u_0 - \hat{u}_0 . \tag{4.7}$$

Finally, we introduce a new variable $\hat{v}_i$, dual to $\hat{p}_i$,

$$\langle \hat{p}_i, \delta q_i \rangle_{L^2} = G_{q_i}(\hat{v}_i, \delta q_i) ,$$

so we don't have to compute the momentum itself. With this the adjoint equations are

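$$\begin{aligned}
G_{q_i}(\hat{u}_i, \delta u_i) &= G_{q_i}(\hat{u}_{i+1}, \delta u_i) + \Delta t\, G_{q_{i+1}}(\hat{v}_{i+1}, \delta u_i) + 2 \Delta t\, D\ell(u_i, \delta u_i; q_i).\hat{u}_{i+1} \\
G_{q_i}(\hat{v}_i, \delta q_i) &= G_{q_{i+1}}(\hat{v}_{i+1}, \delta q_i) + 2 D\ell(\hat{u}_{i+1} - \hat{u}_i, u_i; q_i).\delta q_i + \Delta t\, D^2\ell(u_i, u_i; q_i).(\hat{u}_{i+1}, \delta q_i) ,
\end{aligned}$$

with the initial conditions

$$\hat{u}_N = 0 , \qquad G_{q_N}(\hat{v}_N, \delta q_N) = -\frac{1}{\sigma^2} \langle q_N - q_{\mathrm{targ}}, \delta q_N \rangle_{L^2}$$

at time $t = 1$. The gradient is then given by (4.7).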
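The pattern of the adjoint computation (run the forward scheme, set terminal conditions from the matching term, integrate the adjoint variables backwards, read off the gradient at time 0) can be sketched for a small system of ordinary differential equations. The forward scheme below is an assumed toy system with a made-up force term, not the geodesic equation (4.6), and the adjoint variables `lam_q`, `lam_u` use the opposite sign convention to $\hat{v}_i$, $\hat{u}_i$, so the gradient reads `u0 + lam_u` rather than $u_0 - \hat{u}_0$.

```python
import numpy as np

def force(q):
    """Assumed toy acceleration -grad V(q) with V(q) = |q|^4 / 4."""
    return -np.dot(q, q) * q

def force_jacobian(q):
    """Jacobian of force with respect to q (a symmetric matrix)."""
    return -(np.dot(q, q) * np.eye(q.size) + 2.0 * np.outer(q, q))

def forward(q0, u0, n_steps):
    """Explicit Euler: q_{i+1} = q_i + dt u_i, u_{i+1} = u_i + dt force(q_i)."""
    dt = 1.0 / n_steps
    qs, us = [q0], [u0]
    for _ in range(n_steps):
        q, u = qs[-1], us[-1]
        qs.append(q + dt * u)
        us.append(u + dt * force(q))
    return qs, us

def energy(q0, u0, q_targ, sigma, n_steps):
    qs, _ = forward(q0, u0, n_steps)
    return 0.5 * np.dot(u0, u0) + np.sum((qs[-1] - q_targ) ** 2) / (2 * sigma**2)

def gradient(q0, u0, q_targ, sigma, n_steps):
    """Gradient of the energy with respect to u0 via the discrete adjoint."""
    dt = 1.0 / n_steps
    qs, _ = forward(q0, u0, n_steps)
    lam_q = (qs[-1] - q_targ) / sigma**2   # terminal condition from matching term
    lam_u = np.zeros_like(u0)              # terminal condition for velocity adjoint
    for i in range(n_steps - 1, -1, -1):   # integrate backwards in time
        new_lam_q = lam_q + dt * force_jacobian(qs[i]).T @ lam_u
        new_lam_u = lam_u + dt * lam_q
        lam_q, lam_u = new_lam_q, new_lam_u
    return u0 + lam_u
```

Because the adjoint is derived from the discrete scheme itself, the result is the exact gradient of the discretized energy, up to rounding, which is what makes a comparison with difference quotients meaningful.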

The following pseudocode summarizes how to compute the gradient of the energy

$$E(u_0) = \frac{1}{2} G_{q_0}(u_0, u_0) + \frac{1}{2\sigma^2} \| q_N - q_{\mathrm{targ}} \|^2_{L^2}$$

with respect to $u_0$.

function Grad($q$, $u$, $T_{\mathrm{End}}$, $N$)
    $q_0, u_0 \leftarrow q, u$
    $\Delta t \leftarrow T_{\mathrm{End}} / N$
    for $i \leftarrow 1, N$ do    ▷ Solve geodesic equation
        $q_i, u_i \leftarrow$ Exp($q_{i-1}$, $u_{i-1}$, $\Delta t$, 1)
    end for
    $\hat{u}_N \leftarrow 0$    ▷ Initialize adjoint variables
    Find $\hat{v}_N$ such that for arbitrary $\delta q$
        $G_{q_N}(\hat{v}_N, \delta q) = -\frac{1}{\sigma^2} \langle q_N - q_{\mathrm{targ}}, \delta q \rangle_{L^2}$
    for $i \leftarrow N-1, 0$ do    ▷ Solve adjoint equation
        Find $\hat{u}_i$ such that for arbitrary $\delta u$
            $G_{q_i}(\hat{u}_i, \delta u) = G_{q_i}(\hat{u}_{i+1}, \delta u) + \Delta t\, G_{q_{i+1}}(\hat{v}_{i+1}, \delta u) + 2 \Delta t\, D\ell(u_i, \delta u; q_i).\hat{u}_{i+1}$
        Find $\hat{v}_i$ such that for arbitrary $\delta q$
            $G_{q_i}(\hat{v}_i, \delta q) = G_{q_{i+1}}(\hat{v}_{i+1}, \delta q) + 2 D\ell(\hat{u}_{i+1} - \hat{u}_i, u_i; q_i).\delta q + \Delta t\, D^2\ell(u_i, u_i; q_i).(\hat{u}_{i+1}, \delta q)$
    end for
    return $u_0 - \hat{u}_0$
end function

Figure 4.3: The difference between the gradient $\nabla_u E$ in the direction $v$, i.e. $G_q(\nabla_u E, v)$, and the approximation via difference quotient for decreasing step-size in the difference quotient. (Horizontal axis: step size in the difference quotient; vertical axis: difference to computed gradient; both logarithmic.)

To test the computation of the gradient, we picked a surface $q$ and two velocities $u, v \in T_q S$ and compared the scalar product $G_q(\nabla_u E, v)$ with the difference quotient $\frac{E(u + tv) - E(u)}{t}$ as $t$ approaches 0. Since

$$G_q(\nabla_u E, v) = \lim_{t \to 0} \frac{E(u + tv) - E(u)}{t}$$

we expect the difference $G_q(\nabla_u E, v) - \frac{E(u + tv) - E(u)}{t}$ to approach 0. We used 10 time-steps and a mesh with $30 \times 30$ vertices for this computation. As we can see in Figure 4.3, the difference does approach 0.
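The difference-quotient test described above can be written as a small generic routine. The sketch below applies it to an assumed smooth toy energy with a known gradient and the Euclidean inner product in place of $G_q$; the functions `energy`, `gradient` and `gradient_check` are illustrative names, not part of the implementation discussed here.

```python
import numpy as np

def energy(u):
    """Assumed smooth toy energy; not the matching energy of the text."""
    return 0.5 * np.dot(u, u) + np.sum(np.sin(u))

def gradient(u):
    """Exact gradient of the toy energy."""
    return u + np.cos(u)

def gradient_check(energy, gradient, u, v, step_sizes):
    """Difference between <grad E(u), v> and the forward difference quotient."""
    exact = np.dot(gradient(u), v)
    errors = []
    for t in step_sizes:
        quotient = (energy(u + t * v) - energy(u)) / t
        errors.append(abs(exact - quotient))
    return errors
```

For a forward difference quotient the error is expected to shrink roughly linearly in the step size until rounding errors take over at very small steps.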

4.6 Numerical Experiments

The basis for the numerical experiments described below is the ability to compute the initial velocity $u_0$ of a geodesic, given its endpoints $q_0$ and $q_{\mathrm{targ}}$. We use a gradient descent algorithm on the initial velocity $u_0$ for the energy

$$E(u_0) = \frac{1}{2} G_{q_0}(u_0, u_0) + \frac{1}{2\sigma^2} \| q_N - q_{\mathrm{targ}} \|^2_{L^2}$$

to compute the initial velocity. Here $q_N$ is the endpoint of the geodesic starting at $q_0$ and having initial velocity $u_0$.

Figure 4.4: Progress of the gradient descent computing the geodesic between two shapes. As the norm of the gradient $\nabla_{u^i} E$ decreases, the endpoint $q_N$ of the geodesic approaches the specified target surface $q_{\mathrm{targ}}$ as measured by the $L^2$-distance $\frac{1}{2\sigma^2} \| q_N - q_{\mathrm{targ}} \|^2_{L^2}$. (Plotted against the number of iterations: the energy $E(u^i)$, its two parts $G_{q_0}(u_0, u_0)/2$ and $\| q_N - q_{\mathrm{targ}} \|^2_{L^2} / (2\sigma^2)$, and the norm of the gradient $G_{q_0}(\nabla_{u^i} E, \nabla_{u^i} E)$; both vertical axes logarithmic.)

Starting with an initial guess $u_0^0$ we obtain iteratively new velocities

$$u_0^{i+1} = u_0^i - \varepsilon \nabla_{u_0^i} E .$$

We see in Figure 4.4 how the norm of the gradient decreases as the $L^2$-distance $\| q_N - q_{\mathrm{targ}} \|^2_{L^2}$ approaches 0.

We implemented the geodesic equations (4.4), (4.6) and the adjoint equations of Section 4.5.2 in Python using the finite element library FEniCS [1]. The choice of the finite element method was natural, since the equations were given in a weak form. This form of the equations contains only first derivatives and no terms that are defined only on the boundary $\partial M$ of $M$. All model manifolds ($[0, 1] \times [0, 1]$, $S^1 \times [0, 1]$, $S^1 \times S^1$) were modelled on the rectangle $[0, 1] \times [0, 1]$ with periodic boundary conditions prescribed where necessary.

Figure 4.5: Samples are shown from a geodesic in the space of surfaces between a straight cylinder and a bent cylinder with ripples, at time points t = 0, 0.3, 0.5, 0.8, 1. The color encodes the Euclidean length of the deformation vector field at each point of the surface.

The domain was subdivided into a regular triangular mesh, on which Lagrangian finite elements of order 1 were defined.

In the first example we apply our method to compute the geodesic path between two shapes, which includes both large and small deformations. The template shape is a straight cylinder of height 1 and radius 0.25, which is discretized using a regular triangular mesh of $2 \times 30 \times 30$ elements. The target shape is a cylinder, which is bent by 90° and has 5 small ripples added to it along the vertical axis. Compared to the bending, the ripples constitute a small and local deformation of the shape. The target shape is discretized in the same way as the template. We use $\alpha = 0.6$ as the length scale parameter and 10 time steps for the time integration. The gradient descent takes 80 steps to converge to an $L^2$-error of 0.008. We can see in Figure 4.5 that both the large and the small deformations are captured by the geodesic.

In the second example we want to illustrate the curved nature of shape space. To do so we pick three asymmetric tori, lying in different positions in space. Any two of the tori differ by a composition of two rigid rotations. We compute the geodesics between each pair of tori to measure the angles and side lengths of the triangle with the tori as vertices and the geodesics as edges. By comparing the sum of the angles with $\pi$ one can estimate whether the curvature of shape space along the plane containing the triangle is positive or negative. We measured $\alpha = 33.766°$, $\beta = 34.802°$ and $\gamma = 34.675°$. The sum $\alpha + \beta + \gamma = 103.243°$ is smaller than 180°, which indicates that the space is negatively curved in this area (cf. [57, Sec. 5.4]). In negatively curved spaces geodesics tend to be attracted towards a common point. In this example the geodesics are attracted towards the surface which is degenerated to a point. We can see in Figure 4.6 that the midpoints of the geodesics between the vertices are slightly shrunk. This is another indication of the negatively curved nature of the space.

Figure 4.6: This figure shows a geodesic triangle in the space of surfaces with asymmetric tori as vertices. The tori along the edges are the middle points of the geodesics connecting the vertices. One can see that the shapes tend to shrink along the geodesics before expanding again towards the ends. This effect implies negative curvature in this region of shape space.

In the third example we show that our framework is indeed capable of doing nonlinear statistics on shape space. We generate five sample shapes and compute the mean shape between them. The five shapes are cylindrical vases with an open top and bottom, discretized again using a triangular mesh of $2 \times 30 \times 30$ elements. As the initial guess for the mean we use a straight cylinder. First we register this initial shape to the five target shapes, compute the average of the initial velocities and then shoot with this average velocity to obtain a next guess for the mean shape. We iterate this procedure until the average velocity is close to zero. This method of computing the Karcher mean was proposed in [27]. After four iterations we obtained an average velocity with norm 0.006. As can be seen in Figure 4.7, the average shape indeed combines the characteristics of the five shapes.

Figure 4.7: In this figure we show the Karcher mean of five vase-shaped objects. The mean shape, which is displayed in the center of the figure, is computed using an iterated shooting method. The colored regions on the averaged shapes encode the Euclidean length of the initial velocity of the geodesic which connects each shape to the mean. The color of the mean was chosen for artistic purposes only.
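The iterated shooting procedure for the Karcher mean described in the third example can be sketched on a manifold where registration and shooting are available in closed form. The sketch below assumes the unit sphere in $\mathbb{R}^3$ as a stand-in for shape space: `sphere_log` plays the role of registration (it returns the initial velocity of the geodesic to a target) and `sphere_exp` plays the role of shooting. All function names are illustrative.

```python
import numpy as np

def sphere_exp(p, v):
    """Shoot from p on the unit sphere with tangent velocity v for unit time."""
    n = np.linalg.norm(v)
    if n < 1e-15:
        return p
    return np.cos(n) * p + np.sin(n) * v / n

def sphere_log(p, q):
    """Initial velocity at p of the shortest geodesic from p to q."""
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(c)          # geodesic distance between p and q
    w = q - c * p                 # component of q tangent to the sphere at p
    nw = np.linalg.norm(w)
    if nw < 1e-15:
        return np.zeros_like(p)
    return theta * w / nw

def karcher_mean(points, guess, tol=1e-10, max_iter=100):
    """Average the initial velocities to all samples, shoot, and repeat."""
    mean = guess / np.linalg.norm(guess)
    for _ in range(max_iter):
        v = np.mean([sphere_log(mean, q) for q in points], axis=0)
        if np.linalg.norm(v) < tol:   # average velocity close to zero
            break
        mean = sphere_exp(mean, v)
    return mean
```

For five points placed symmetrically around the north pole the procedure should return the north pole, independently of which sample is used as the initial guess.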

4.7 Outlook

The work presented in this chapter represents a first step towards a discretization of the family of Sobolev-type metrics on the space of surfaces. Several issues need to be addressed.

• To implement the discretized geodesic and adjoint equations we used the formulas provided by Lemmas 4.1 and 4.2, in particular their coordinate versions. We assume that the surface is given via a parametrization, i.e. as a map $q : M \to \mathbb{R}^3$ from a flat model manifold to the ambient space. This gives us a global coordinate chart as well as a basis for the tangent space to the surface.

The continuous equations can be written without resorting to charts and coordinates, using only intrinsically defined operations from differential geometry as shown in the first parts of Lemmas 4.1 and 4.2.

In applications the surfaces are usually given in the form of a triangulated mesh, not via an explicit parametrization. Furthermore, for surfaces of genus 0, i.e. sphere-shaped ones, there doesn't even exist a global parametrization. Examples of such surfaces are the hippocampus or the cortical surface.

It would therefore be of great interest to implement the discrete matching equations for general meshes. Doing so would require a finite element library that allows one to work with finite elements on curved domains. Most finite element libraries, among them FEniCS, which was chosen for its ease of use, require the domain to be an open subset of Euclidean space, i.e. the domain of the coordinate chart of the surface.

• It is also important to address the question of point-to-point correspondences. The Riemannian metric on $S$ is invariant under the action of the reparametrization group $\mathrm{Diff}(M)$. This means that instead of finding the geodesic distance $\operatorname{dist}(q_0, q_1)$ between two parametrized surfaces, we could find

$$\inf_{\varphi, \psi \in \mathrm{Diff}(M)} \operatorname{dist}(q_0 \circ \varphi, q_1 \circ \psi) ,$$

the distance between the $\mathrm{Diff}(M)$-orbits of $q_0$ and $q_1$. From a geometrical point of view this corresponds to finding geodesics on the quotient space $S / \mathrm{Diff}(M)$ with respect to the induced Riemannian metric. This corresponds to the geometrical idea of matching (unparametrized) surfaces, where only the image $q(M)$ of the parametrization $q : M \to \mathbb{R}^3$ is considered.

There are several ways to achieve this. One possibility would be to change the matching term from $\| q_1 - q_{\mathrm{targ}} \|^2_{L^2}$ to $\| q_1 - q_{\mathrm{targ}} \circ \varphi \|^2_{L^2}$ and include a minimization over $\varphi \in \mathrm{Diff}(M)$, as is done in [21].

Another possibility is to replace the $L^2$-norm in the matching term with a current-norm [83]. This matching term remains unchanged if we replace $q_0$ by $q_0 \circ \varphi$ and allows us to find minimal distances in the quotient space $S / \mathrm{Diff}(M)$ without having to compute the optimal reparametrization $\varphi$.

• To gain flexibility in the choice of the metric, Sobolev-metrics of higher order should be studied from a numerical point of view. The Sobolev-metric of order one is only the first in a family of related metrics, which were studied from a theoretical point of view in [7]. One would expect that, when varying the Sobolev-order as well as the length-scale parameters in the metrics, the metrics will exhibit different behaviour in terms of geodesic distance and grouping of shapes.

At the same time as we gain flexibility by introducing higher derivatives, we also make the numerical treatment more difficult. We expect that surfaces without boundary will be an easier case, since repeated partial integration would introduce more and more complicated boundary terms.

These points and the comparison of these metrics with the LDM framework remain the topic of further work.
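The idea from the second point above, replacing $\| q_1 - q_{\mathrm{targ}} \|_{L^2}$ by a minimum over reparametrizations, can be illustrated in a very reduced setting. The sketch below assumes closed curves sampled at $n$ equally spaced parameter values and restricts the reparametrization group to cyclic shifts of the sample index, which is only a crude finite stand-in for $\mathrm{Diff}(S^1)$; it also uses the plain $L^2$ distance rather than a geodesic distance. Since this distance does not change when both curves are shifted by the same amount, minimizing over shifts of one curve is enough. The function names are illustrative.

```python
import numpy as np

def l2_distance(c0, c1):
    """Discrete L2 distance between two sampled closed curves of shape (n, d)."""
    return np.sqrt(np.mean(np.sum((c0 - c1) ** 2, axis=1)))

def quotient_distance(c0, c1):
    """Minimize the L2 distance over cyclic shifts of the parametrization of c1.

    Returns the minimal distance and the shift that attains it.
    """
    n = c0.shape[0]
    dists = [l2_distance(c0, np.roll(c1, k, axis=0)) for k in range(n)]
    best = int(np.argmin(dists))
    return dists[best], best
```

Two samplings of the same ellipse that differ only by the starting point have a positive $L^2$ distance as parametrized curves, but distance zero once the shift is optimized away.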

Bibliography

[1] A. Logg, K.-A. Mardal, and G. N. Wells. Automated Solution of Differential Equations by the Finite Element Method. Springer, 2011.

[2] Robert A. Adams. Sobolev Spaces. Academic Press, 1975.

[3] V. I. Arnold. Sur un principe variationnel pour les écoulements stationnaires des liquides parfaits et ses applications aux problèmes de stabilité non linéaires. J. Mécanique, 5:29–43, 1966.

[4] N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337–404, 1950.

[5] M. Bauer and M. Bruveris. A new Riemannian setting for surface registration. Available at http://arxiv.org/abs/1106.0620, 2011.

[6] M. Bauer, M. Bruveris, P. Harms, and P. W. Michor. Geodesic distance for right invariant Sobolev metrics of fractional order on the diffeomorphism group. Preprint available at http://arxiv.org/abs/1105.0327, 2011.

[7] M. Bauer, P. Harms, and P. W. Michor. Sobolev metrics on shape space of surfaces. ArXiv e-prints, 2010.

[8] Martin Bauer, Martins Bruveris, Philipp Harms, and Peter W. Michor. Vanishing geodesic distance for the Riemannian metric with geodesic equation the KdV-equation. arXiv:1102.0236, 2011. To appear in Ann. Global Anal. Geometry.

[9] M. F. Beg. Variational and Computational Methods for Flows of Diffeomorphisms in Image Matching and Growth in Computational Anatomy. Ph.D. Thesis, Johns Hopkins University, 2003.

[10] M. F. Beg, M. I. Miller, A. Trouvé, and L. Younes. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int. J. Comput. Vision, 61(2):139–157, 2005.

[11] M. Bruveris. The energy functional on the Virasoro-Bott group with the $L^2$-metric has no local minima. Submitted for publication. Preprint available at http://arxiv.org/abs/1106.4326, 2011.

[12] M. Bruveris, F. Gay-Balmaz, D. D. Holm, and T. Ratiu. The momentum map representation of images. J. Nonlinear Sci., 21:115–150, 2011.

[13] M. Bruveris, F.-X. Vialard, and L. Risser. Mixture of kernels and iterated semidirect product of diffeomorphism groups. Submitted for publication. Preprint available at http://arxiv.org/abs/1108.2472, 2011.

[14] R. Camassa and D. D. Holm. An integrable shallow water equation with peaked solitons. Phys. Rev. Lett., 71(11):1661–1664, 1993.

[15] Y. Cao, M. I. Miller, S. Mori, R. L. Winslow, and L. Younes. Diffeomorphic matching of diffusion tensor images. In Computer Vision and Pattern Recognition Workshop, 2006 Conference on, page 67, 2006.

[16] Y. Cao, M. I. Miller, R. L. Winslow, and L. Younes. Large deformation diffeomorphic metric mapping of vector fields. IEEE Trans. Med. Imag., 24(9):1216–1230, 2005.

[17] Pascal Cherrier. Problèmes de Neumann non linéaires sur les variétés riemanniennes. Journal of Functional Analysis, 57(2):154–206, 1984.

[18] G.E. Christensen, R.D. Rabbitt, and M.I. Miller. Deformable templates using large deformation kinematics. Image Processing, IEEE Transactions on, 5(10):1435–1447, 1996.

[19] Adrian Constantin and Boris Kolev. Geodesic flow on the diffeomorphism group of the circle. Comment. Math. Helv., 78(4):787–804, 2003.

[20] C. J. Cotter. The variational particle-mesh method for matching curves. Journal of Physics A: Mathematical and Theoretical, 41(34):344003, 2008.

[21] C. J. Cotter and D. D. Holm. Geodesic boundary value problems with symmetry. Preprint available at http://arxiv.org/abs/0911.2205, 2009.

[22] F. X. Le Dimet, H. E. Ngodock, and I. M. Navon. Sensitivity analysis in variational data assimilation. J. Meteorol. Soc. Japan, pages 145–155, 1997.

[23] David G. Ebin and Jerrold Marsden. Groups of diffeomorphisms and the motion of an incompressible fluid. Ann. of Math. (2), 92:102–163, 1970.

[24] Jürgen Eichhorn. The boundedness of connection coefficients and their derivatives. Math. Nachr., 152:145–158, 1991.

[25] Jürgen Eichhorn. Global Analysis on Open Manifolds. Nova Science Publishers Inc., New York, 2007.

[26] D. B. A. Epstein. The simplicity of certain groups of homeomorphisms. Compositio Math., 22:165–173, 1970.

[27] P.T. Fletcher, S. Venkatasubramanian, and S. Joshi. Robust statistics on Riemannian manifolds via the geometric median. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8, June 2008.

[28] François Gay-Balmaz. Well-posedness of higher dimensional Camassa-Holm equations. Bull. Transilv. Univ. Braşov Ser. III, 2(51):55–58, 2009.

[29] Joan Glaunès, Anqi Qiu, Michael Miller, and Laurent Younes. Large deformation diffeomorphic metric curve mapping. International Journal of Computer Vision, 80:317–336, 2008.

[30] R. E. Greene. Complete metrics of bounded curvature on noncompact manifolds. Arch. Math., 31(1):89–95, 1978.

[31] U. Grenander. General Pattern Theory. Oxford University Press, 1994.

[32] U. Grenander and M. I. Miller. Computational anatomy: An emerging discipline. Quart. Appl. Math., 56:617–694, 1998.

[33] Ulf Grenander and Michael I. Miller. Pattern theory: from representation to inference. Oxford University Press, Oxford, 2007.

[34] Laurent Guieu and Claude Roger. L'algèbre et le groupe de Virasoro. Les Publications CRM, Montreal, QC, 2007. Aspects géométriques et algébriques, généralisations. [Geometric and algebraic aspects, generalizations], With an appendix by Vlad Sergiescu.

[35] D. D. Holm. Geometric Mechanics Part II: Rotating, Translating and Rolling. Imperial College Press, London, 2008.

[36] D. D. Holm and J. E. Marsden. The Breadth of Symplectic and Poisson Geometry, volume 232 of Progress in Mathematics, chapter Momentum Maps and Measure-valued Solutions (Peakons, Filaments, and Sheets) for the EPDiff Equation, pages 203–235. Birkhäuser, 2005.

[37] D. D. Holm, J. E. Marsden, and T. S. Ratiu. The Euler-Poincaré equations and semidirect products with applications to continuum theories. Adv. Math., 137:1–81, 1998.

[38] D. D. Holm, J. T. Ratnanather, A. Trouvé, and L. Younes. Soliton dynamics in computational anatomy. NeuroImage, 23:170–178, 2004.

[39] D. D. Holm, A. Trouvé, and L. Younes. The Euler-Poincaré theory of metamorphosis. Quart. Appl. Math, pages 1–25, 2009. Electronically published on September 2, 2009.

[40] S.C. Joshi and M.I. Miller. Landmark matching via large deformation diffeomorphisms. Image Processing, IEEE Transactions on, 9(8):1357–1370, 2000.

[41] David G. Kendall. Shape manifolds, procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16(2):81–121, 1984.

[42] B. Khesin, J. Lenells, G. Misiolek, and S. C. Preston. Curvatures of Sobolev metrics on diffeomorphism groups. ArXiv e-prints, September 2011.

[43] Boris Khesin and Gerard Misiolek. Euler equations on homogeneous spaces and Virasoro orbits. Advances in Mathematics, 176(1):116–144, 2003.

[44] Boris Khesin and Robert Wendt. The Geometry of Infinite-Dimensional Groups. Springer, 2009.

[45] Martin Kilian, Niloy J. Mitra, and Helmut Pottmann. Geometric modeling in shape space. ACM Trans. Graph., 26, 2007.

[46] Yu. A. Kordyukov. $L^p$-theory of elliptic differential operators on manifolds of bounded geometry. Acta Appl. Math., 23(3):223–260, 1991.

[47] Andreas Kriegl and Peter W. Michor. The Convenient Setting of Global Analysis, volume 53 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 1997.

[48] S. Kurtek, E. Klassen, J. Gore, Z. Ding, and A. Srivastava. Elastic geodesic paths in shape space of parametrized surfaces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, PP(99):1, 2011.

[49] Xiuwen Liu, Yonggang Shi, Ivo Dinov, and Washington Mio. A computational model of multidimensional shape. International Journal of Computer Vision, 89:69–83, 2010.

[50] J. E. Marsden and T. S. Ratiu. Introduction to Mechanics and Symmetry, volume 17 of Texts in Applied Mathematics. Springer-Verlag, second edition, 1999.

[51] J. E. Marsden and J. Scheurle. The reduced Euler-Lagrange equations. Fields Inst. Commun., 1:139–164, 1983.

[52] John N. Mather. Commutators of diffeomorphisms. Comment. Math. Helv., 49:512–528, 1974.

[53] John N. Mather. Commutators of diffeomorphisms. II. Comment. Math. Helv., 50:33–40, 1975.

[54] M. Micheli, P. W. Michor, and D. Mumford. Sectional curvature in terms of the cometric, with applications to the Riemannian manifolds of landmarks, 2010.

[55] Peter W. Michor. Some geometric evolution equations arising as geodesic equations on groups of diffeomorphisms including the Hamiltonian approach. In Phase Space Analysis of Partial Differential Equations, volume 69 of Progr. Nonlinear Differential Equations Appl., pages 133–215. Birkhäuser Boston, 2006.

[56] Peter W. Michor and David Mumford. Vanishing geodesic distance on spaces of submanifolds and diffeomorphisms. Doc. Math., 10:217–245, 2005.

[57] Peter W. Michor and David Mumford. Riemannian geometries on spaces of plane curves. J. Eur. Math. Soc. (JEMS), 8:1–48, 2006.

[58] Peter W. Michor and David Mumford. An overview of the Riemannian metrics on spaces of curves using the Hamiltonian approach. Appl. Comput. Harmon. Anal., 23(1):74–113, 2007.

[59] P.W. Michor and T. Ratiu. Geometry of the Virasoro-Bott group. J. Lie Theory, 8:293–309, 1998.

[60] M. I. Miller, A. Trouvé, and L. Younes. On the metrics and Euler-Lagrange equations of computational anatomy. Ann. Rev. Biomed. Eng., 4:375–405, 2002.

[61] M. I. Miller, A. Trouvé, and L. Younes. Geodesic shooting for computational anatomy. J. Math. Imaging Vis., 244:209–228, 2006.

[62] Washington Mio, Anuj Srivastava, and Shantanu Joshi. On shape of plane elastic curves. International Journal of Computer Vision, 73:307–324, 2007.

[63] G. Misiolek. Conjugate points in the Bott-Virasoro group and the KdV equation. Proc. Amer. Math. Soc., 125:935–940, 1997.

[64] David Mumford. Pattern theory and vision. In Questions Mathématiques en Traitement du Signal et de l'Image, chapter 3, pages 7–13. Institut Henri Poincaré, Paris, 1998.

[65] V. Y. Ovsienko and B. A. Khesin. Korteweg–de Vries superequations as an Euler equation. Funct. Anal. Appl., 21:329–331, 1987.

[66] C. Pierpaoli, P. Jezzard, P. J. Basser, A. Barnett, and G. D. Chiro. Diffusion tensor MR imaging of the human brain. Radiology, 201(3):637–648, 1996.

[67] L. Risser, F.-X. Vialard, R. Wolz, D. D. Holm, and D. Rueckert. Simultaneous fine and coarse diffeomorphic registration: Application to the atrophy measurement in Alzheimer's disease. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2010, volume 6362 of Lecture Notes in Computer Science, pages 610–617, Berlin, 2010. Springer.

[68] L. Risser, F.-X. Vialard, R. Wolz, M. Murgasova, D. D. Holm, and D. Rueckert. Simultaneous multiscale registration using large deformation diffeomorphic metric mapping. IEEE Trans. Med. Imaging, 2011.

[69] S. Saitoh. Theory of Reproducing Kernels and its Applications. Pitman Research Notes in Mathematics, 1988.

[70] René Schilling. Measures, Integrals and Martingales. Cambridge University Press, New York, 2005.

[71] L. Schwartz. Sous-espaces hilbertiens d'espaces vectoriels topologiques et noyaux associés (noyaux reproduisants). J. Anal. Math., 13:115–256, 1964.

[72] D. F. Scollan, A. Holmes, R. L. Winslow, and J. Forder. Histological validation of myocardial microstructure obtained from diffusion tensor magnetic resonance imaging. Am. J. Physiol. (Heart Circulatory Physiol.), 275:2308–2318, 1998.

[73] Graeme Segal. The geometry of the KdV equation. Internat. J. Modern Phys. A, 6(16):2859–2869, 1991. Topological methods in quantum field theory (Trieste, 1990).

[74] Jayant Shah. $H^0$-type Riemannian metrics on the space of planar curves. Quart. Appl. Math., 66(1):123–137, 2008.

[75] M. A. Shubin. Spectral theory of elliptic operators on noncompact manifolds. Astérisque, 207(5):35–108, 1992. Méthodes semi-classiques, Vol. 1 (Nantes, 1991).

[76] S. Sommer, F. Lauze, M. Nielsen, and X. Pennec. Kernel bundle EPDiff: Evolution equations for multi-scale diffeomorphic image registration. In Scale Space and Variational Methods in Computer Vision, Lecture Notes in Computer Science. Springer, 2011.

[77] S. Sommer, M. Nielsen, F. Lauze, and X. Pennec. A multi-scale kernel bundle for LDDMM: Towards sparse deformation description across space and scales. In Proceedings of IPMI 2011, Lecture Notes in Computer Science. Springer, 2011.

[78] A. Srivastava, E. Klassen, S.H. Joshi, and I.H. Jermyn. Shape analysis of elastic curves in Euclidean spaces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(7):1415–1428, 2011.

[79] D. W. Thompson. On Growth and Form. Dover, 1992. Reprint of 1942 2nd ed. (1st ed. 1917).

[80] William Thurston. Foliations and groups of diffeomorphisms. Bull. Amer. Math. Soc., 80:304–307, 1974.

[81] Hans Triebel. Theory of Function Spaces. II, volume 84 of Monographs in Mathematics. Birkhäuser Verlag, Basel, 1992.

[82] Hans Triebel. The Structure of Functions, volume 97 of Monographs in Mathematics. Birkhäuser Verlag, Basel, 2001.

[83] M. Vaillant and J. Glaunès. Surface matching via currents. In G. Christensen and M. Sonka, editors, IPMI, volume 3565 of Lecture Notes in Computer Science, pages 381–392. Springer, 2005.

[84] F.-X. Vialard. Hamiltonian Approach to Shape Spaces in a Diffeomorphic Framework: From the Discontinuous Image Matching Problem to a Stochastic Growth Model. Ph.D. Thesis, École Normale Supérieure de Cachan, 2009.

[85] François-Xavier Vialard, Laurent Risser, Daniel Rueckert, and Colin Cotter. Diffeomorphic 3d image registration via geodesic shooting using an efficient adjoint calculation. International Journal of Computer Vision, pages 1–13, 2011. 10.1007/s11263-011-0481-8.

[86] François-Xavier Vialard and Filippo Santambrogio. Extension to BV functions of the large deformation diffeomorphisms matching approach. Comptes Rendus Mathematique, 347(1-2):27–32, 2009.

[87] Benedikt Wirth, Leah Bar, Martin Rumpf, and Guillermo Sapiro. A continuum mechanical approach to geodesics in shape space. IJCV, 93(3):293–318, 2011.

[88] Marcus Wunsch. On the geodesic flow on the group of diffeomorphisms of the circle with a fractional Sobolev right-invariant metric. J. Nonlinear Math. Phys., 17(1):7–11, 2010.

[89] L. Younes, F. Arrate, and M. I. Miller. Evolution equations in computational anatomy. NeuroImage, 45:40–50, 2009.

[90] L. Younes, P. W. Michor, J. Shah, and D. Mumford. A metric on shape space with explicit geodesics. Atti Accad. Naz. Lincei Cl. Sci. Fis. Mat. Natur. Rend. Lincei (9) Mat. Appl., 19(1):25–57, 2008.

[91] Laurent Younes. Jacobi fields in groups of diffeomorphisms and appli- cations. Quart. Appl. Math., 65(1):113–134, 2007.

[92] Laurent Younes. Shapes and Diffeomorphisms. Springer, 2010.

[93] Laurent Younes, Anqi Qiu, Raimond Winslow, and Michael Miller. Transport of relational structures in groups of diffeomorphisms. Journal of Mathematical Imaging and Vision, 32:41–56, 2008.
