Deformation analysis for shape and image processing

Laurent Younes

Contents

General notation

Chapter 1. Elements of Hilbert space theory
  1. Definition and notation
  2. Examples
  3. Orthogonal spaces and projection
  4. Orthonormal sequences
  5. The Riesz representation theorem
  6. Embeddings and the duality paradox
  7. Weak convergence in a Hilbert space
  8. Spline interpolation
  9. Building V and its kernel

Chapter 2. Ordinary differential equations and groups of diffeomorphisms
  1. Introduction
  2. A class of ordinary differential equations
  3. Groups of diffeomorphisms

Chapter 3. Matching deformable objects
  1. General principles
  2. Matching terms and greedy descent procedures
  3. Regularized matching

Chapter 4. Groups, manifolds and invariant metrics
  1. Introduction
  2. Differential manifolds
  3. Submanifolds
  4. Lie Groups
  5. Group action
  6. Riemannian manifolds
  7. Gradient descent on Riemannian manifolds

Bibliography


General notation

The differential of a function $\varphi : \Omega \to \mathbb{R}^d$, $\Omega$ being an open subset of $\mathbb{R}^k$, is denoted $D\varphi : x \mapsto D\varphi(x)$. It takes values in the space of linear operators from $\mathbb{R}^k$ to $\mathbb{R}^d$, so that $D\varphi(x)$ can be identified with a $d \times k$ matrix (containing the partial derivatives of the coordinates of $\varphi$).

$\mathrm{Id}_k$ is the identity matrix in $\mathbb{R}^k$.

$C_K^\infty(\Omega)$ is the set of infinitely differentiable functions with compact support in $\Omega$, an open subset of $\mathbb{R}^d$.


CHAPTER 1

Elements of Hilbert space theory

1. Definition and notation

A set $H$ is a (real) Hilbert space if:

i) $H$ is a vector space over $\mathbb{R}$ (i.e., addition and scalar multiplication are defined on $H$).

ii) $H$ has an inner product, denoted $(h, h') \mapsto \langle h, h'\rangle_H$ for $h, h' \in H$. This inner product is a bilinear form which is positive definite, and the associated norm is $\|h\|_H = \sqrt{\langle h, h\rangle_H}$.

iii) $H$ is a complete space for the topology associated to the norm.

Here are some explanations of iii). Converging sequences for the norm topology are sequences $(h_n)$ for which there exists $h \in H$ such that $\|h - h_n\|_H \to 0$. Property iii) means that if a sequence $(h_n, n \ge 0)$ in $H$ is a Cauchy sequence, i.e., it collapses in the sense that, for every positive $\varepsilon$ there exists $n_0 > 0$ such that $\|h_n - h_{n_0}\|_H \le \varepsilon$ for $n \ge n_0$, then it necessarily has a limit: there exists $h \in H$ such that $\|h_n - h\|_H \to 0$ when $n$ tends to infinity.

If condition ii) is weakened to requiring only that $\|\cdot\|_H$ be a norm (not necessarily induced by an inner product), one says that $H$ is a Banach space. If $H$ satisfies i) and ii), it is called a pre-Hilbert space. On pre-Hilbert spaces, the Schwarz inequality holds:

Proposition 1 (Schwarz inequality). If $H$ is pre-Hilbert and $h, h' \in H$, then
$$\langle h, h'\rangle_H \le \|h\|_H \|h'\|_H.$$

The first consequence of this property is:

Proposition 2. The inner product on $H$ is continuous for the norm topology.

Proof. The inner product is a function $H \times H \to \mathbb{R}$. Letting $\varphi(h, h') = \langle h, h'\rangle_H$, we have, by the Schwarz inequality, and introducing a sequence $(h_n)$ which converges to $h$,
$$|\varphi(h, h') - \varphi(h_n, h')| \le \|h - h_n\|_H \|h'\|_H \to 0,$$
which proves the continuity with respect to the first coordinate, and also with respect to the second coordinate by symmetry. □

Working with a complete normed vector space is essential when dealing with infinite sums of elements: if $(h_n)$ is a sequence in $H$, and if $\|h_{n+1} + \cdots + h_{n+k}\|_H$ can be made arbitrarily small for large $n$ and any $k > 0$, then completeness implies that the series $\sum_{n=0}^\infty h_n$ has a limit in $H$. In particular, absolutely converging series in $H$ converge:
$$(1) \qquad \sum_{n\ge 0} \|h_n\|_H < \infty \;\Rightarrow\; \sum_{n\ge 0} h_n \text{ converges.}$$


We shall add a fourth condition to our definition of a Hilbert space:

iv) $H$ is separable for the norm topology: there exists a countable subset $S$ of $H$ such that, for any $h \in H$ and any $\varepsilon > 0$, there exists $h_0 \in S$ such that $\|h - h_0\| \le \varepsilon$.

In the following, Hilbert spaces will always be separable (this is not always assumed in the literature).

A Hilbert space isometry between two Hilbert spaces $H$ and $H'$ is an invertible linear map $F : H \to H'$ such that, for all $h, h' \in H$, $\langle Fh, Fh'\rangle_{H'} = \langle h, h'\rangle_H$.

A sub-Hilbert space of $H$ is a subspace $H'$ of $H$ (i.e., a non-empty subset closed under linear combinations) which is closed for the norm topology: if $(h_n)$ is a sequence in $H'$ and $\|h_n - h\|_H \to 0$ for some $h \in H$, then $h \in H'$. Topological closedness implies that $H'$ is itself a Hilbert space, since Cauchy sequences in $H'$ are also Cauchy sequences in $H$, hence converge in $H$, and their limits belong to $H'$ since $H'$ is closed. The next proposition shows that every finite dimensional subspace of a Hilbert space is closed, and hence a sub-Hilbert space.

Proposition 3. If $K$ is a finite dimensional subspace of $H$, then $K$ is closed in $H$.

Proof. Let $e_1, \ldots, e_p$ be a basis of $K$. Let $(h_n)$ be a sequence in $K$ which converges to some $h \in H$: then $h_n$ may be written $h_n = \sum_{k=1}^p a_{kn} e_k$ and, for all $l = 1, \ldots, p$:
$$\langle h_n, e_l\rangle_H = \sum_{k=1}^p \langle e_k, e_l\rangle_H\, a_{kn}.$$
Let $a_n$ be the vector in $\mathbb{R}^p$ with coordinates $(a_{kn}, k = 1, \ldots, p)$ and $u_n \in \mathbb{R}^p$ the vector with coordinates $(\langle h_n, e_k\rangle_H, k = 1, \ldots, p)$. Let also $S$ be the matrix with coefficients $s_{kl} = \langle e_k, e_l\rangle_H$, so that the previous system may be written $u_n = S a_n$. The matrix $S$ is invertible: if $b$ belongs to the null space of $S$, a quick computation shows that
$$\Big\| \sum_{i=1}^p b_i e_i \Big\|_H^2 = b^T S b = 0,$$
which is only possible when $b = 0$, since $(e_1, \ldots, e_p)$ is a basis. We therefore have $a_n = S^{-1} u_n$. Since $u_n$ converges to $u$ with coordinates $(\langle h, e_k\rangle_H, k = 1, \ldots, p)$ (by continuity of the inner product), we obtain that $a_n$ converges to $a = S^{-1} u$. But this implies that $\sum_{k=1}^p a_{kn} e_k \to \sum_{k=1}^p a_k e_k$ and, since the limit is unique, $h = \sum_{k=1}^p a_k e_k \in K$. □
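The computation in this proof is easy to reproduce numerically. The following is a minimal sketch (assuming numpy; the basis and the element $h$ are arbitrary choices): the coordinates of $h$ in a non-orthonormal basis are recovered by solving $u = Sa$ with the Gram matrix $S$.

```python
import numpy as np

# A non-orthogonal basis e_1, e_2 of a 2-dimensional subspace K of R^3.
E = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])          # rows are e_1, e_2

h = 2.0 * E[0] - 3.0 * E[1]              # an element of K

S = E @ E.T                              # Gram matrix s_kl = <e_k, e_l>
u = E @ h                                # u_k = <h, e_k>
a = np.linalg.solve(S, u)                # coordinates a = S^{-1} u

print(a)                                 # [ 2. -3.]
```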

2. Examples

2.1. Finite dimensional Euclidean spaces. $H = \mathbb{R}^n$ with $\langle h, h'\rangle_{\mathbb{R}^n} = \sum_{i=1}^n h_i h'_i$ is the standard example of a finite dimensional Hilbert space.

2.2. $\ell^2$ space of real sequences. Let $H$ be the set of real sequences $h = (h_1, h_2, \ldots)$ such that $\sum_{i=1}^\infty h_i^2 < \infty$. Then $H$ is a Hilbert space.

2.3. $L^2$ spaces of functions. Let $k$ and $d$ be two integers. Let $\Omega$ be an open subset of $\mathbb{R}^k$. We define $L^2(\Omega, \mathbb{R}^d)$ as the set of all square integrable functions $h : \Omega \to \mathbb{R}^d$, with inner product
$$\langle h, h'\rangle_{L^2} = \int_\Omega h(x)^T h'(x)\, dx.$$
Integrals are taken with respect to the Lebesgue measure on $\Omega$, and two functions which coincide everywhere except on a set of null Lebesgue measure are considered as equal. The fact that $L^2(\Omega, \mathbb{R}^d)$ is a Hilbert space is a standard result in integration theory.

3. Orthogonal spaces and projection

Let $O$ be a subset of $H$. The orthogonal space to $O$ is defined by
$$O^\perp = \{h \in H : \forall o \in O,\ \langle h, o\rangle_H = 0\}.$$

Theorem 1. $O^\perp$ is a sub-Hilbert space of $H$.

Proof. Stability under linear combinations is obvious, and closedness is a consequence of the continuity of the inner product. □

When $K$ is a sub-Hilbert space of $H$ (a closed subspace) and $h \in H$, one can define the variational problem
$$(P_K(h)) : \text{find } k \in K \text{ such that } \|k - h\|_H = \min\{\|h - k'\|_H : k' \in K\}.$$
The following theorem is fundamental.

Theorem 2. If $K$ is a closed subspace of $H$ and $h \in H$, then $(P_K(h))$ has a unique solution, characterized by the properties $k \in K$ and $h - k \in K^\perp$.

Definition 1. The solution of problem $(P_K(h))$ in the previous theorem is called the orthogonal projection of $h$ on $K$ and will be denoted $\pi_K(h)$.

Proposition 4. $\pi_K : H \to K$ is a linear, continuous map, and $\pi_{K^\perp} = \mathrm{id} - \pi_K$.

Proof. Let $d = \inf\{\|h - k'\|_H : k' \in K\}$. The proof relies on the following construction: if $k, k' \in K$, a direct computation shows that
$$\Big\| h - \frac{k + k'}{2} \Big\|_H^2 + \|k - k'\|_H^2/4 = \big(\|h - k\|_H^2 + \|h - k'\|_H^2\big)/2,$$
and the fact that $(k + k')/2 \in K$ implies $\big\| h - \frac{k+k'}{2} \big\|_H \ge d$, so that
$$\|k - k'\|_H^2/4 \le \big(\|h - k\|_H^2 + \|h - k'\|_H^2\big)/2 - d^2.$$

Now, from the definition of the infimum, one can find a sequence $(k_n)$ in $K$ such that $\|k_n - h\|_H^2 \le d^2 + 2^{-n}$ for each $n$. The previous inequality implies that
$$\|k_n - k_m\|_H^2 \le 2(2^{-n} + 2^{-m}),$$
so that $(k_n)$ is a Cauchy sequence, and therefore converges to a limit $k$ which belongs to $K$ with $\|k - h\|_H = d$. If the minimum is attained at another $k'$ in $K$, we have, by the same inequality,
$$\|k - k'\|_H^2/4 \le (d^2 + d^2)/2 - d^2 = 0,$$
so that $k = k'$; uniqueness is proved, and $\pi_K$ is well defined.

Let $k = \pi_K(h)$, let $k' \in K$, and consider the function $f(t) = \|h - k - tk'\|_H^2$, which is by construction minimal at $t = 0$. We have $f(t) = \|h - k\|_H^2 - 2t\langle h - k, k'\rangle_H + t^2\|k'\|_H^2$, and this can be minimal at $0$ only if $\langle h - k, k'\rangle_H = 0$. Since this has to be true for every $k' \in K$, we obtain that $h - k \in K^\perp$. Conversely, if $k \in K$ and $h - k \in K^\perp$, we have, for any $k' \in K$,
$$\|h - k'\|_H^2 = \|h - k\|_H^2 + \|k - k'\|_H^2 \ge \|h - k\|_H^2,$$
so that $k = \pi_K(h)$.

We now prove the proposition. Let $h, h' \in H$ and $\alpha, \alpha' \in \mathbb{R}$. Let $k = \pi_K(h)$, $k' = \pi_K(h')$; we want to show that $\pi_K(\alpha h + \alpha' h') = \alpha k + \alpha' k'$, for which it suffices to prove (since $\alpha k + \alpha' k' \in K$) that $\alpha h + \alpha' h' - \alpha k - \alpha' k' \in K^\perp$. But this is true since $\alpha h + \alpha' h' - \alpha k - \alpha' k' = \alpha(h - k) + \alpha'(h' - k')$, $h - k \in K^\perp$, $h' - k' \in K^\perp$, and $K^\perp$ is a vector space. Continuity comes from
$$\|h\|_H^2 = \|h - \pi_K(h)\|_H^2 + \|\pi_K(h)\|_H^2,$$
so that $\|\pi_K(h)\|_H \le \|h\|_H$.

Finally, if $h \in H$ and $k = \pi_K(h)$, then $k' = \pi_{K^\perp}(h)$ is characterized by $k' \in K^\perp$ and $h - k' \in (K^\perp)^\perp$. The first property is true for $h - k$, and for the second, we need $K \subset (K^\perp)^\perp$, which is a direct consequence of the definition of the orthogonal, and true in fact for any set $O$. □

We have the following interesting property.

Corollary 1. $K$ is a sub-Hilbert space of $H$ if and only if $(K^\perp)^\perp = K$.

Proof. The $\Leftarrow$ implication is a consequence of Theorem 1. The inclusion $K \subset (K^\perp)^\perp$ is obviously true for any subset $K \subset H$, so that it suffices to show that every element of $(K^\perp)^\perp$ belongs to $K$. Assume that $h \in (K^\perp)^\perp$: this implies that $\pi_{K^\perp}(h) = 0$, but since $\pi_{K^\perp}(h) = h - \pi_K(h)$, this implies that $h = \pi_K(h) \in K$. □
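The characterizations of Theorem 2 and Proposition 4 are easy to check numerically in finite dimension. The following is a minimal sketch (assuming numpy; the subspace and the vector are random illustrative choices): it computes $\pi_K(h)$ for $K$ spanned by the columns of a matrix and verifies that $h - \pi_K(h) \in K^\perp$ and $\|\pi_K(h)\|\le\|h\|$.

```python
import numpy as np

rng = np.random.default_rng(0)

# K = span of the columns of B, a 2-dimensional subspace of R^5.
B = rng.standard_normal((5, 2))
h = rng.standard_normal(5)

# pi_K(h) = B a, where a solves the normal equations (B^T B) a = B^T h.
a = np.linalg.solve(B.T @ B, B.T @ h)
pk = B @ a

print(B.T @ (h - pk))                            # ~ [0, 0]: h - pi_K(h) in K-perp
print(np.linalg.norm(pk) <= np.linalg.norm(h))   # True: projections contract
```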

4. Orthonormal sequences

A sequence $(e_1, e_2, \ldots)$ in a Hilbert space $H$ is orthonormal if and only if $\langle e_i, e_j\rangle_H = 1$ if $i = j$ and $0$ otherwise. In such a case, if $\alpha = (\alpha_1, \alpha_2, \ldots) \in \ell^2$, the series $\sum_{i=1}^\infty \alpha_i e_i$ converges in $H$ (its partial sums form a Cauchy sequence) and, if $h$ is the limit, one has $\alpha_i = \langle h, e_i\rangle_H$.

Conversely, if $h \in H$ then the sequence $(\langle h, e_1\rangle_H, \langle h, e_2\rangle_H, \ldots)$ belongs to $\ell^2$. Indeed, letting $h_n = \sum_{i=1}^n \langle h, e_i\rangle_H e_i$, one has $\langle h_n, h\rangle_H = \sum_{i=1}^n \langle h, e_i\rangle_H^2 = \|h_n\|_H^2$. On the other hand, by the Schwarz inequality, $\langle h_n, h\rangle_H \le \|h_n\|_H \|h\|_H$, which implies that $\|h_n\|_H \le \|h\|_H$: therefore
$$\sum_{i=1}^\infty \langle h, e_i\rangle_H^2 < \infty.$$

Denoting by $K = \mathrm{Hilb}(e_1, e_2, \ldots)$ the smallest Hilbert subspace of $H$ containing this sequence, one has the identity
$$(2) \qquad K = \Big\{ \sum_{k=1}^\infty \alpha_k e_k : (\alpha_1, \alpha_2, \ldots) \in \ell^2 \Big\}.$$

This is left as an exercise. As a consequence, we see that $h \mapsto (\langle h, e_1\rangle_H, \langle h, e_2\rangle_H, \ldots)$ is an isometry between $\mathrm{Hilb}(e_1, e_2, \ldots)$ and $\ell^2$. Moreover, we have, for $h \in H$,
$$\pi_K(h) = \sum_{i=1}^\infty \langle h, e_i\rangle_H\, e_i.$$

(To prove it, check that $h - \pi_K(h)$ is orthogonal to every $e_i$.)

An orthonormal sequence $(e_1, e_2, \ldots)$ is said to be complete in $H$ if $H = \mathrm{Hilb}(e_1, e_2, \ldots)$. In this case, we see that $H$ is itself isometric to $\ell^2$, and the interesting point is that this is always true:

Theorem 3. Every (separable) Hilbert space has a complete orthonormal sequence.

A complete orthonormal sequence in $H$ is also called an orthonormal basis of $H$.

Proof. The proof relies on the important Schmidt orthonormalization procedure. Let $f_1, f_2, \ldots$ be a dense sequence in $H$. We let $e_1 = f_{k_1}/\|f_{k_1}\|_H$, where $f_{k_1}$ is the first non-vanishing element of the sequence. Assume that an orthonormal sequence $e_1, \ldots, e_n$ has been constructed, with indices $k_1, \ldots, k_n$ such that $e_i \in V_i = \mathrm{vect}(f_1, \ldots, f_{k_i})$ for each $i$. First assume that $f_k \in \mathrm{vect}(e_1, \ldots, e_n)$ for all $k > k_n$: then $H = V_n$ is finite dimensional. Indeed, $H$ is equal to the closure of $(f_1, f_2, \ldots)$, which is included in $\mathrm{vect}(e_1, \ldots, e_n)$, which is closed as a finite dimensional vector subspace of $H$.

Assume now that there exists $k_{n+1}$ such that $f_{k_{n+1}} \notin V_n$. Then we may set
$$e_{n+1} = \lambda\big(f_{k_{n+1}} - \pi_{V_n}(f_{k_{n+1}})\big),$$
where $\lambda$ is selected so that $e_{n+1}$ has unit norm, which is always possible. So there are two cases: either the construction stops at some point, in which case $H$ is finite dimensional and the theorem is true, or the process carries on indefinitely, yielding an orthonormal sequence $(e_1, e_2, \ldots)$ and an increasing sequence of integers $(k_1, k_2, \ldots)$. But $\mathrm{Hilb}(e_1, e_2, \ldots)$ contains $(f_n)$, which is dense in $H$, and is therefore equal to $H$, so that the orthonormal sequence is complete. □
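In finite dimension, the orthonormalization step of this proof is a few lines of code. Here is a minimal sketch (assuming numpy; the input family is an arbitrary choice), which skips numerically dependent vectors, mirroring the test $f_{k_{n+1}} \notin V_n$:

```python
import numpy as np

def orthonormalize(F, tol=1e-10):
    """Schmidt orthonormalization of the rows of F (a minimal sketch)."""
    basis = []
    for f in F:
        # Subtract the projection on V_n = span(e_1, ..., e_n).
        r = f - sum(np.dot(f, e) * e for e in basis)
        if np.linalg.norm(r) > tol:          # f does not belong to V_n
            basis.append(r / np.linalg.norm(r))
    return np.array(basis)

F = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],   # dependent on the first row: skipped
              [1.0, 0.0, 1.0]])
E = orthonormalize(F)
print(np.round(E @ E.T, 6))      # identity matrix: the family is orthonormal
```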

5. The Riesz representation theorem

The dual space of a normed vector space $H$ is the space of all continuous linear functionals $\varphi : H \to \mathbb{R}$. It is denoted $H^*$, and we will often use the notation, for $\varphi \in H^*$ and $h \in H$:
$$(3) \qquad \varphi(h) = (\varphi \,|\, h).$$
Thus, parentheses indicate linear forms, angles indicate inner products. $H$ being a normed space, $H^*$ also has a normed space structure, defined by:

$$\|\varphi\|_{H^*} = \max\{(\varphi \,|\, h) : h \in H,\ \|h\|_H = 1\}.$$
Continuity of $\varphi$ is in fact equivalent to the finiteness of this norm.

When $H$ is a Hilbert space, the function $\varphi_h : h' \mapsto \langle h, h'\rangle_H$ belongs to $H^*$, and by the Schwarz inequality $\|\varphi_h\|_{H^*} = \|h\|_H$. The Riesz representation theorem states that there is no other continuous linear form on $H$.

Theorem 4 (Riesz). Let $H$ be a Hilbert space. If $\varphi \in H^*$, there exists a unique $h \in H$ such that $\varphi = \varphi_h$.

Proof. Uniqueness comes from the fact that if $\langle h, h'\rangle_H = 0$ for all $h' \in H$, then taking $h' = h$ yields $h = 0$. To prove existence we use, as in finite dimension, the orthogonal of the null space of $\varphi$. So, let $K = \{h' \in H : (\varphi \,|\, h') = 0\}$; $K$ is a closed linear subspace of $H$ (because $\varphi$ is linear and continuous). If $K^\perp = \{0\}$, then $K = (K^\perp)^\perp = H$ and $\varphi = 0 = \varphi_0$, which proves the theorem in this case. So, we assume that $K^\perp \ne \{0\}$.

Now, let $h_1$ and $h_2$ be two non-zero elements of $K^\perp$. We have $(\varphi \,|\, \alpha_1 h_1 + \alpha_2 h_2) = \alpha_1(\varphi \,|\, h_1) + \alpha_2(\varphi \,|\, h_2)$, so that it is always possible to find non-vanishing $\alpha_1$ and $\alpha_2$ such that $(\varphi \,|\, \alpha_1 h_1 + \alpha_2 h_2) = 0$, i.e., $\alpha_1 h_1 + \alpha_2 h_2 \in K$. Since $K^\perp$ is a vector space, $\alpha_1 h_1 + \alpha_2 h_2$ also belongs to $K^\perp$, which is only possible when $\alpha_1 h_1 + \alpha_2 h_2 = 0$. Thus, for any $h_1, h_2 \in K^\perp$, there exists $(\alpha_1, \alpha_2) \ne (0, 0)$ such that $\alpha_1 h_1 + \alpha_2 h_2 = 0$, so that $K^\perp$ has dimension 1. Fix $h_1 \in K^\perp$ with, say, $\|h_1\|_H = 1$. Using orthogonal projections, and the fact that $K$ is closed, any vector in $H$ can be written $h = \langle h, h_1\rangle h_1 + k$ with $k \in K$. This implies
$$(\varphi \,|\, h) = \langle h, h_1\rangle\, (\varphi \,|\, h_1),$$
so that $\varphi = \varphi_{(\varphi | h_1) h_1}$. □

6. Embeddings and the duality paradox

6.1. Definition. Assume that $H$ and $H_0$ are two Banach spaces. An embedding of $H$ in $H_0$ is a continuous, one-to-one, linear map from $H$ to $H_0$, i.e., a one-to-one map $\iota : H \to H_0$ such that, for all $h \in H$,
$$(4) \qquad \|\iota(h)\|_{H_0} \le C\, \|h\|_H.$$

The embedding is compact when the set $\{\iota(h) : \|h\|_H \le 1\}$ has compact closure in $H_0$. In the separable case (to which we restrict ourselves), this means that for any bounded sequence $(h_n, n > 0)$ in $H$, there exists a subsequence of $(\iota(h_n))$ which converges in $H_0$.

In all the applications we will be interested in, $H$ and $H_0$ will be function spaces, and we will have a set inclusion $H \subset H_0$. For example, $H$ may be a set of smooth functions and $H_0$ a set of less smooth functions (see the examples of embeddings below). Then one says that $H$ is embedded (resp. compactly embedded) in $H_0$ if the canonical inclusion map $\iota : H \to H_0$ is continuous (resp. compact).

If $\varphi$ is a continuous linear form on $H_0$ and $\iota : H \to H_0$ is an embedding, then one can define the form $\iota^*\varphi$ on $H$ by $(\iota^*\varphi \,|\, h) = (\varphi \,|\, \iota(h))$. Let us verify that $\iota^*\varphi$ is continuous on $H$. We have, for all $h \in H$:
$$|(\iota^*\varphi \,|\, h)| = |(\varphi \,|\, \iota(h))| \le \|\varphi\|_{H_0^*}\, \|\iota(h)\|_{H_0} \le C\, \|\varphi\|_{H_0^*}\, \|h\|_H,$$
where the first inequality comes from the continuity of $\varphi$ and the last one from (4). This proves the continuity of $\iota^*\varphi$ as a linear form on $H$, together with the inequality
$$\|\iota^*\varphi\|_{H^*} \le C\, \|\varphi\|_{H_0^*}.$$
We have the following theorem:

Theorem 5. If $\iota : H \to H_0$ is a Banach space embedding such that $\iota(H)$ is dense in $H_0$, then the map $\iota^* : H_0^* \to H^*$ is also an embedding.

Proof. It only remains to prove that $\iota^*$ is one-to-one, and this relies on $\iota(H)$ being dense in $H_0$. Assume that $\iota^*\varphi = \iota^*\varphi'$ for $\varphi, \varphi' \in H_0^*$. This implies that, for all $h \in H$, $(\varphi \,|\, \iota h) = (\varphi' \,|\, \iota h)$, i.e., that $(\varphi - \varphi' \,|\, h_0) = 0$ for any $h_0 \in \iota(H)$. This latter set being dense in $H_0$, and $\varphi$ and $\varphi'$ being continuous, this is only possible when $\varphi = \varphi'$. This proves that $\iota^*$ is one-to-one. □

6.2. Examples.

6.2.1. Banach spaces of continuous functions. Let $\Omega$ be an open subset of $\mathbb{R}^d$. The space of continuous functions on $\Omega$ with at least $p$ continuous derivatives will be denoted $C^p(\Omega, \mathbb{R})$. If $\Omega$ is bounded, so that $\bar\Omega$ is compact, $C^p(\bar\Omega, \mathbb{R})$ is the set of functions on $\Omega$ which are $p$ times differentiable on $\Omega$, each partial derivative being extended to a continuous function on $\bar\Omega$; $C^p(\bar\Omega, \mathbb{R})$ has a Banach space structure when equipped with the norm
$$\|f\|_{p,\infty} = \max_h \|h\|_\infty,$$
where $h$ varies in the set of partial derivatives of $f$ of order lower than or equal to $p$.

We let $C_0^p(\Omega, \mathbb{R})$ be the set of functions $f$ in $C^p(\bar\Omega, \mathbb{R})$ with vanishing derivatives up to order $p$ on $\partial\Omega$ or at infinity. It is a closed subspace of $C^p(\bar\Omega, \mathbb{R})$ and therefore also a Banach space.

We obviously have, almost by definition, that $C^q(\bar\Omega, \mathbb{R})$ is embedded in $C^p(\bar\Omega, \mathbb{R})$ as soon as $p \le q$. In fact, the embedding is compact when $p < q$. If $\Omega$ is bounded, compact sets in $C^0(\bar\Omega, \mathbb{R})$ are exactly described by Ascoli's theorem: they are the closed, bounded subsets $M \subset C^0(\bar\Omega, \mathbb{R})$ (for the supremum norm) which are uniformly equicontinuous, meaning that, for any $x \in \bar\Omega$ and any $\varepsilon > 0$, there exists $\eta > 0$ such that
$$\sup_{y \in \bar\Omega,\, |x - y| < \eta}\ \sup_{h \in M}\ |h(x) - h(y)| < \varepsilon.$$
Compact sets in $C^p(\bar\Omega, \mathbb{R})$ are the closed, bounded subsets of $C^p(\bar\Omega, \mathbb{R})$ over which all the $p$th partial derivatives are uniformly equicontinuous.

For unbounded $\Omega$, one defines the norms
$$\|f\|_{M,p,\infty} = \max_{h\ \text{partial derivative of}\ f,\ \text{order} \le p}\ \max_{|x| \le M} |h(x)|.$$
One says that $f_n \to f$ in $C^p$ on compact sets if $\|f_n - f\|_{M,p,\infty} \to 0$ for all $M$.

6.2.2. Hilbert Sobolev spaces. Let $\Omega \subset \mathbb{R}^d$ be open. We now define the space $H^1(\Omega, \mathbb{R})$ of functions with square integrable weak derivatives. A function $u$ belongs to this set if and only if $u \in L^2(\Omega, \mathbb{R})$ and, for each $i = 1, \ldots, d$, there exists a function $u_i \in L^2(\Omega, \mathbb{R})$ such that, for any $C^\infty$ function $\varphi$ with compact support in $\Omega$, one has
$$\int_\Omega u(x)\, \frac{\partial \varphi}{\partial x_i}(x)\, dx = -\int_\Omega u_i(x)\, \varphi(x)\, dx.$$
The function $u_i$ is called the (weak) partial derivative of $u$ and is denoted $\frac{\partial u}{\partial x_i}$. The integration by parts formula shows that this coincides with the standard partial derivative when the latter exists. $H^1(\Omega, \mathbb{R})$ has a Hilbert space structure with the inner product
$$\langle u, v\rangle_{H^1} = \langle u, v\rangle_{L^2} + \sum_{i=1}^d \Big\langle \frac{\partial u}{\partial x_i}, \frac{\partial v}{\partial x_i} \Big\rangle_{L^2}.$$
The space $H^m(\Omega, \mathbb{R})$ can now be defined by induction as the set of functions $f \in H^1(\Omega, \mathbb{R})$ with all partial derivatives belonging to $H^{m-1}(\Omega, \mathbb{R})$. Partial derivatives of increasing order are defined by induction, and the inner product on $H^m$ is the sum of the $L^2$ inner products of all partial derivatives up to order $m$.

There are theorems which ensure embeddings in classical spaces of functions. We will be interested in a special case of Morrey's theorem, stated below.

Theorem 6. Let $\Omega \subset \mathbb{R}^d$ be open, and assume that $m - d/2 > 0$. Then, for any $j \ge 0$, $H^{j+m}(\Omega, \mathbb{R})$ is embedded in $C^j(\Omega, \mathbb{R})$. If $\Omega$ is bounded, the embedding is compact. Moreover, if $\theta \in\, ]0, m - d/2]$ and $u \in H^{j+m}(\Omega, \mathbb{R})$, then every partial derivative $h$ of $u$ of order $j$ has Hölder regularity $\theta$: for all $x, y \in \Omega$,
$$|h(x) - h(y)| \le C\, \|u\|_{H^{m+j}}\, |x - y|^\theta.$$

As a final definition, we let $H_0^m(\Omega, \mathbb{R})$ be the completion in $H^m(\Omega, \mathbb{R})$ of the set of $C^\infty$ functions with compact support in $\Omega$: $u$ belongs to $H_0^m(\Omega, \mathbb{R})$ if and only if $u \in H^m(\Omega, \mathbb{R})$ and there exists a sequence $(u_n)$ of $C^\infty$ functions with compact support in $\Omega$ such that $\|u - u_n\|_{H^m}$ tends to 0. A direct application of Morrey's theorem shows that, if $m - d/2 > 0$, then, for any $j \ge 0$, $H_0^{j+m}(\Omega, \mathbb{R})$ is embedded in $C_0^j(\Omega, \mathbb{R})$.

6.3. The duality paradox. The Riesz representation theorem identifies a Hilbert space $H$ with its dual $H^*$. However, when $H$ is (densely) embedded in another Hilbert space $H_0$, every continuous linear form on $H_0$ is also continuous on $H$, and $H_0^*$ is embedded in $H^*$. We therefore have the sequence of embeddings
$$H \to H_0 \simeq H_0^* \to H^*,$$
but this sequence loops, since $H^* \simeq H$. This indicates that $H_0$ is also embedded in $H$. This is indeed a strange result: for example, let $H = H^1(\Omega, \mathbb{R})$ and $H_0 = L^2(\Omega, \mathbb{R})$. The embedding of $H$ in $H_0$ is clear from their definitions, but the converse does not seem natural, since there are more constraints in belonging to $H$ than to $H_0$.

To understand this reversed embedding, we must think in terms of linear forms. If $u \in L^2(\Omega, \mathbb{R})$, we may consider the linear form $\varphi_u$ defined by
$$(\varphi_u \,|\, v) = \langle u, v\rangle_{L^2} = \int_\Omega u(x)\, v(x)\, dx.$$
When $v \in H^1(\Omega, \mathbb{R})$, we have
$$(\varphi_u \,|\, v) \le \|u\|_{L^2} \|v\|_{L^2} \le \|u\|_{L^2} \|v\|_{H^1},$$
so that $\varphi_u$ is continuous when seen as a linear form on $H^1(\Omega, \mathbb{R})$. The Riesz representation theorem implies that there exists $\tilde u \in H^1(\Omega, \mathbb{R})$ such that, for all $v \in H^1(\Omega, \mathbb{R})$, $\langle u, v\rangle_{L^2} = \langle \tilde u, v\rangle_{H^1}$: the relation $u \mapsto \tilde u$ provides the embedding of $L^2(\Omega, \mathbb{R})$ into $H^1(\Omega, \mathbb{R})$.

Let us be more specific and take $\Omega = \,]0, 1[$. The relation states that, for any $v \in H^1(\Omega, \mathbb{R})$,
$$\int_0^1 \tilde u'(t)\, v'(t)\, dt + \int_0^1 \tilde u(t)\, v(t)\, dt = \int_0^1 u(t)\, v(t)\, dt.$$
Let us make the simplifying assumption that $\tilde u$ has two derivatives, in order to integrate the first integral by parts and obtain
$$(5) \qquad \tilde u'(1)v(1) - \tilde u'(0)v(0) - \int_0^1 \tilde u''(t)\, v(t)\, dt + \int_0^1 \tilde u(t)\, v(t)\, dt = \int_0^1 u(t)\, v(t)\, dt.$$
Such an identity can be true for every $v \in H^1(\Omega, \mathbb{R})$ if and only if $\tilde u'(0) = \tilde u'(1) = 0$ and $-\tilde u'' + \tilde u = u$: $\tilde u$ is thus the solution of a second order differential equation with first order boundary conditions, and the embedding of $L^2(]0,1[, \mathbb{R})$ into $H^1(]0,1[, \mathbb{R})$ just expresses that a unique solution exists, at least in the generalized sense of equation (5).

As seen in this example, even if, from an abstract point of view, we have an identification between the two Hilbert spaces, the associated embeddings are of a very different nature: the first one corresponds to a set inclusion (it is canonical), the second to the solution of a differential equation in one dimension, and in fact of a partial differential equation in the general case. Another striking application of the Riesz representation theorem is presented in Section 8, which will be our first incursion into the theory of deformations.
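The one-dimensional boundary value problem above is easy to approximate. The following is a minimal finite-difference sketch (assumptions: numpy, a uniform grid on $]0,1[$, a first-order discretization of the Neumann conditions, and an arbitrary discontinuous choice of $u$) of the map $u \mapsto \tilde u$:

```python
import numpy as np

# Solve -u~'' + u~ = u with u~'(0) = u~'(1) = 0 by finite differences,
# i.e. compute the Riesz representative in H^1 of the L^2 form v -> <u, v>.
n = 200
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

u = np.sign(x - 0.5)                 # a discontinuous element of L^2

A = np.zeros((n, n))                 # matrix of -d^2/dx^2 ...
for i in range(1, n - 1):
    A[i, i - 1 : i + 2] = [-1.0, 2.0, -1.0]
A[0, :2] = [1.0, -1.0]               # Neumann condition at 0 (one-sided)
A[-1, -2:] = [-1.0, 1.0]             # Neumann condition at 1 (one-sided)
A /= dx ** 2
A += np.eye(n)                       # ... + identity

u_tilde = np.linalg.solve(A, u)      # u~ is smooth even though u jumps
print(u_tilde[:3], u_tilde[-3:])
```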

7. Weak convergence in a Hilbert space

Let us start with the definition:

Definition 2. When $V$ is a Banach space, a sequence $(v_n)$ in $V$ is said to converge weakly to some $v \in V$ if and only if, for every continuous linear form $\alpha : V \to \mathbb{R}$, one has $\alpha(v_n) \to \alpha(v)$ when $n$ tends to infinity.

The following simple proposition will be useful later:

Proposition 5. Assume that $V$ and $W$ are Banach spaces and that $W$ is (continuously) embedded in $V$. If a sequence $(w_n)$ in $W$ converges weakly (in $W$) to some $w \in W$, then $(w_n)$ also converges weakly to $w$ in $V$.

This just says that if $\alpha(w_n) \to \alpha(w)$ for all continuous linear functionals on $W$, then the convergence holds for all continuous linear functionals on $V$, which is in fact obvious because the restriction to $W$ of any continuous linear functional on $V$ is a continuous linear functional on $W$. In the case of a Hilbert space, the Riesz representation theorem immediately provides the following proposition.

Proposition 6. Let $V$ be a Hilbert space. A sequence $(v_n)$ in $V$ converges weakly to an element $v \in V$ if and only if, for all $w \in V$,
$$\lim_{n\to\infty} \langle w, v_n\rangle_V = \langle w, v\rangle_V.$$
Moreover, if $(v_n)$ converges weakly to $v$, then $\|v\|_V \le \liminf \|v_n\|_V$.

The last statement comes from the inequality $\langle v_n, v\rangle_V \le \|v_n\|_V \|v\|_V$, which provides at the limit
$$\|v\|_V^2 \le \|v\|_V \liminf_{n\to\infty} \|v_n\|_V.$$
If $\|v\|_V = 0$, there is nothing to prove; in the other case, one can divide both sides of the inequality by $\|v\|_V$ to obtain the result. Note that the inequality can be strict: an orthonormal sequence $(e_n)$ converges weakly to 0 (since $\sum_n \langle w, e_n\rangle_V^2 < \infty$ implies $\langle w, e_n\rangle_V \to 0$ for every $w$), although $\|e_n\|_V = 1$ for all $n$.

Finally, the following result is essential for us ([28]).

Theorem 7. If $V$ is a Hilbert space and $(v_n)$ is a bounded sequence in $V$ (there exists a constant $C$ such that $\|v_n\|_V \le C$ for all $n$), then one can extract a subsequence of $(v_n)$ which converges weakly to some $v \in V$.

8. Spline interpolation

8.1. The scalar case. Let $\Omega \subset \mathbb{R}^d$ be open, and consider a Hilbert space $V$ included in $L^2(\Omega, \mathbb{R})$. We assume that the elements of $V$ are smooth enough, and require the inclusion and the canonical embedding of $V$ in $C^0(\Omega, \mathbb{R})$. For example, it suffices (from Morrey's theorem) that $V \subset H^m(\Omega, \mathbb{R})$ for $m > d/2$. This implies that there exists a constant $C$ such that, for all $v \in V$,

$$\|v\|_\infty \le C\, \|v\|_V.$$
We make another assumption on $V$: a relation of the kind $\sum_{i=1}^N \alpha_i v(x_i) = 0$ cannot be true for every $v \in V$ unless $\alpha_1 = \cdots = \alpha_N = 0$, $(x_1, \ldots, x_N)$ being an arbitrary family of distinct points in $\Omega$. The next section will investigate some practical methods to define such a space $V$.

Each $x$ in $\Omega$ specifies a linear form $\delta_x$ defined by $(\delta_x \,|\, v) = v(x)$ for $v \in V$. We have

$$|(\delta_x \,|\, v)| \le \|v\|_\infty \le C\, \|v\|_V,$$
so that $\delta_x$ is continuous on $V$. But this implies, by Riesz's theorem, that there exists an element $K_x$ in $V$ such that, for every $v \in V$, one has

$$(6) \qquad v(x) = \langle K_x, v\rangle_V.$$

Since $K_x$ belongs to $V$, it is a continuous function $y \mapsto K_x(y)$. This also defines a function $K : \Omega \times \Omega \to \mathbb{R}$ by $K(y, x) = K_x(y)$. This function $K$ has several interesting properties. First, applying equation (6) to $v = K_y$ yields

$$K(x, y) = K_y(x) = \langle K_x, K_y\rangle_V.$$
Since the last term is symmetric, we have $K(x, y) = K(y, x)$, and because of the obtained identity, $K$ is called the reproducing kernel of $V$.

A second property is the fact that $K$ is positive definite, in the sense that, for any family of distinct points $x_1, \ldots, x_N \in \Omega$ and any scalars $\alpha_1, \ldots, \alpha_N \in \mathbb{R}$, the double sum
$$\sum_{i,j=1}^N \alpha_i \alpha_j K(x_i, x_j)$$
is always nonnegative, and vanishes if and only if all the $\alpha_i$ equal 0. Indeed, by the reproducing property, this sum may be written $\big\| \sum_{i=1}^N \alpha_i K_{x_i} \big\|_V^2$, which is nonnegative. If it vanishes, then $\sum_{i=1}^N \alpha_i K_{x_i} = 0$, which implies, by equation (6), that, for every $v \in V$, one has $\sum_{i=1}^N \alpha_i v(x_i) = 0$, and our original assumption on $V$ implies that $\alpha_1 = \cdots = \alpha_N = 0$.

This kernel will help us solve the following interpolation problem:

(S) Fix a family of distinct points $x_1, \ldots, x_N$ in $\Omega$. We want to determine a function $v \in V$ of minimal norm, subject to the constraints $v(x_i) = \lambda_i$, for some prescribed values $\lambda_1, \ldots, \lambda_N \in \mathbb{R}$.

To solve this problem, define $V_0$ to be the set of $v$ for which the constraints vanish:

$$V_0 = \{v \in V : v(x_i) = 0,\ i = 1, \ldots, N\}.$$
Using the kernel $K$, we may write

$$V_0 = \{v \in V : \langle K_{x_i}, v\rangle_V = 0,\ i = 1, \ldots, N\},$$
so that
$$V_0^\perp = \mathrm{vect}\{K_{x_1}, \ldots, K_{x_N}\},$$
the orthogonal being taken for the $V$ inner product. We have the first result:

Lemma 1. If there exists a solution $\hat v$ of problem (S), then $\hat v \in V_0^\perp = \mathrm{vect}\{K_{x_1}, \ldots, K_{x_N}\}$. Moreover, if $\hat v \in V_0^\perp$ is a solution of (S) restricted to this set, then it is a solution of (S) on $V$.

Proof. Let $\hat v$ be a solution, and let $v^*$ be its orthogonal projection on $\mathrm{vect}\{K_{x_1}, \ldots, K_{x_N}\}$. From the properties of orthogonal projections, we have $\hat v - v^* \in V_0$, which implies, by the definition of $V_0$, that $\hat v(x_i) = v^*(x_i)$ for $i = 1, \ldots, N$. But, since $\|v^*\|_V \le \|\hat v\|_V$ (by the variational definition of the projection) and $\|\hat v\|_V \le \|v^*\|_V$ (by assumption), both norms are equal, which is only possible when $\hat v = v^*$. Therefore $\hat v \in V_0^\perp$, and the proof of the first assertion is complete.

Now, if $\hat v$ is a solution of (S) in which $V$ is replaced by $V_0^\perp$, and if $v$ is any function in $V$ which satisfies the constraints, then $v - \hat v \in V_0$ and $\|v\|_V^2 = \|\hat v\|_V^2 + \|v - \hat v\|_V^2$, which shows that $\hat v$ is a solution of the initial problem. □

This lemma allows us to restrict the search for a solution of (S) to the set of linear combinations of $K_{x_1}, \ldots, K_{x_N}$, which places us in a very comfortable finite dimensional situation. We look for $\hat v$ of the form $\hat v = \sum_{i=1}^N \alpha_i K_{x_i}$, which may also be written
$$\hat v(x) = \sum_{i=1}^N \alpha_i K(x, x_i),$$
and we introduce the $N \times N$ matrix $S$ with entries $s_{ij} = K(x_i, x_j)$. The whole problem may now be reformulated in terms of the vector $\alpha = (\alpha_1, \ldots, \alpha_N)^T$ (a column vector) and of the matrix $S$. Indeed, by the self-reproducing property of $K$, we have
$$(7) \qquad \|\hat v\|_V^2 = \sum_{i,j=1}^N \alpha_i \alpha_j K(x_i, x_j) = \alpha^T S \alpha,$$
and each constraint may be written $\lambda_i = \hat v(x_i) = \sum_{j=1}^N \alpha_j K(x_i, x_j)$, so that, letting $\lambda = (\lambda_1, \ldots, \lambda_N)^T$, the whole system of constraints may be expressed as $S\alpha = \lambda$. But our hypotheses imply that $S$ is invertible: indeed, if $S\alpha = 0$, then $\alpha^T S \alpha = 0$, which, by equation (7) and the positive definiteness of $K$, is only possible when $\alpha = 0$ (we assume that the $x_i$ are distinct). Therefore, there is only one $\hat v$ in $V_0^\perp$ which satisfies the constraints, and it corresponds to $\alpha = S^{-1}\lambda$. These results are summarized in the next theorem.

Theorem 8. Problem (S) has a unique solution in $V$, given by
$$\hat v(x) = \sum_{i=1}^N K(x, x_i)\, \alpha_i$$
with
$$\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & & \vdots \\ K(x_N, x_1) & \cdots & K(x_N, x_N) \end{pmatrix}^{-1} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \end{pmatrix}.$$
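Numerically, Theorem 8 amounts to a single linear solve. The following is a minimal sketch (assuming numpy; the Gaussian kernel $e^{-|x-y|^2/\sigma^2}$, introduced in Section 9.2, and the landmark data are arbitrary illustrative choices):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Scalar Gaussian kernel K(x, y) = exp(-|x - y|^2 / sigma^2)."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / sigma ** 2)

def interpolating_spline(x, lam, sigma=1.0):
    """Solve S alpha = lambda (Theorem 8); return v-hat as a function."""
    S = gaussian_kernel(x, x, sigma)
    alpha = np.linalg.solve(S, lam)
    return lambda y: gaussian_kernel(y, x, sigma) @ alpha

# Interpolate scalar values attached to 5 landmarks in the plane.
x = np.random.default_rng(0).uniform(size=(5, 2))
lam = np.array([0.0, 1.0, -1.0, 2.0, 0.5])
v = interpolating_spline(x, lam, sigma=0.5)
print(np.round(v(x) - lam, 8))     # ~ 0: the constraints are satisfied
```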

Another important variant of the same problem arises when the hard constraints $v(x_i) = \lambda_i$ are replaced by soft constraints, in the form of a penalty function added to the minimized norm. This may be expressed as the minimization of a function of the form
$$E(v) = \|v\|_V^2 + C \sum_{i=1}^N \varphi(|v(x_i) - \lambda_i|)$$
for some increasing, convex function $\varphi$ on $[0, +\infty[$ and $C > 0$. Since the second term of $E$ does not depend on the projection of $v$ on $V_0$, Lemma 1 remains valid, reducing the problem again to finding $v$ of the form
$$v(x) = \sum_{i=1}^N K(x, x_i)\, \alpha_i,$$
for which
$$E(v) = \sum_{i,j=1}^N \alpha_i \alpha_j K(x_i, x_j) + C \sum_{i=1}^N \varphi\Big( \Big| \sum_{j=1}^N K(x_i, x_j)\alpha_j - \lambda_i \Big| \Big).$$
Assume, to simplify, that $\varphi$ is differentiable and $\varphi'(0) = 0$. We have, letting $\psi(x) = \mathrm{sign}(x)\, \varphi'(|x|)$,
$$\frac{\partial E}{\partial \alpha_j} = 2\sum_{i=1}^N \alpha_i K(x_i, x_j) + C \sum_{i=1}^N K(x_i, x_j)\, \psi\Big( \sum_{l=1}^N K(x_i, x_l)\alpha_l - \lambda_i \Big).$$
Assuming, still, that the $x_i$ are distinct, we can apply $S^{-1}$ to the system $\frac{\partial E}{\partial \alpha_j} = 0$, $j = 1, \ldots, N$, which characterizes the minimum, yielding
$$(8) \qquad 2\alpha_i + C\, \psi\Big( \sum_{j=1}^N K(x_i, x_j)\alpha_j - \lambda_i \Big) = 0.$$
This suggests an algorithm (which is a certain kind of gradient descent) to minimize $E$, described as follows.

Algorithm 1 (General spline smoothing).

Step 0: Start with an initial guess $\alpha^0$ for $\alpha$.
Step t.1: Compute, for $i = 1, \ldots, N$,
$$\alpha_i^{t+1} = (1 - \gamma)\alpha_i^t - \frac{\gamma C}{2}\, \psi\Big( \sum_{j=1}^N K(x_i, x_j)\alpha_j^t - \lambda_i \Big),$$
where $\gamma$ is a fixed, small enough, positive step size.
Step t.2: Apply a convergence test: if positive, stop; otherwise increment $t$ and return to step t.1.

This algorithm converges if $\gamma$ is small enough. However, the particular case $\varphi(x) = x^2$ is much simpler to solve, since in this case $\psi(x) = 2x$ and equation (8) becomes
$$2\alpha_i + 2C\Big( \sum_{j=1}^N K(x_i, x_j)\alpha_j - \lambda_i \Big) = 0.$$

The solution of this equation is $\alpha = (S + I/C)^{-1}\lambda$, yielding a result very similar to Theorem 8:

Theorem 9. The minimum over $V$ of
$$\|v\|_V^2 + C \sum_{i=1}^N |v(x_i) - \lambda_i|^2$$
is attained at
$$\hat v(x) = \sum_{i=1}^N K(x, x_i)\, \alpha_i$$
with
$$\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} = (S + I/C)^{-1} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \end{pmatrix}
\quad\text{and}\quad
S = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & & \vdots \\ K(x_N, x_1) & \cdots & K(x_N, x_N) \end{pmatrix}.$$

8.2. The vector case. In the previous section, elements of $V$ were functions from $\Omega$ to $\mathbb{R}$. When working with deformations, which is our goal here, the functions of interest describe displacements of points of $\Omega$ and therefore must be vector valued. This leads us to the problem of spline approximation for vector fields on $\Omega$, which, as will be seen, is handled very similarly to the scalar case.

So, in this section, $V$ is a Hilbert space, canonically embedded in $L^2(\Omega, \mathbb{R}^d)$ and in $C^0(\Omega, \mathbb{R}^d)$. Fixing $x \in \Omega$, the evaluation map $v \mapsto v(x)$ is a continuous linear map from $V$ to $\mathbb{R}^d$. This implies that, for any $a \in \mathbb{R}^d$, the function $v \mapsto a^T v(x)$ is a continuous linear functional on $V$. We will denote this linear form by $a \otimes \delta_x$, so that
$$(9) \qquad (a \otimes \delta_x \,|\, v) = a^T v(x).$$
From the Riesz theorem, there exists a unique element, denoted $K_x^a$, in $V$ such that, for any $v \in V$,
$$(10) \qquad \langle K_x^a, v\rangle_V = a^T v(x).$$
The map $a \mapsto K_x^a$ is linear from $\mathbb{R}^d$ to $V$ (this is because $a \mapsto a^T v(x)$ is linear, and because of the uniqueness of the Riesz representation), which implies that, for $y \in \Omega$, the map $a \mapsto K_x^a(y)$ is linear from $\mathbb{R}^d$ to $\mathbb{R}^d$. Consequently, there exists a matrix, which we will denote $K(y, x)$, such that, for $a \in \mathbb{R}^d$ and $x, y \in \Omega$, $K_x^a(y) = K(y, x)a$. In the following, we will preferably use the notation $K_x^a = K(\cdot, x)a$, to emphasize the matricial nature of $K$.

The kernel $K$ in the case of vector fields being matrix valued, the self-reproducing property becomes
$$\langle K(\cdot, x)a, K(\cdot, y)b\rangle_V = a^T K_y^b(x) = a^T K(x, y)b.$$
By symmetry of the first term, we obtain that, for all $a, b \in \mathbb{R}^d$, $a^T K(x, y)b = b^T K(y, x)a$, which implies that $K(y, x) = K(x, y)^T$. Concerning the positivity of $K$, we make an assumption similar to the scalar case:

Assumption 1. If $x_1, \ldots, x_N \in \Omega$ and $\alpha_1, \ldots, \alpha_N \in \mathbb{R}^d$ are such that, for all $v \in V$, $\alpha_1^T v(x_1) + \cdots + \alpha_N^T v(x_N) = 0$, then $\alpha_1 = \cdots = \alpha_N = 0$.

Under this assumption, it is easy to prove that, for all $\alpha_1, \ldots, \alpha_N \in \mathbb{R}^d$,
$$\sum_{i,j=1}^N \alpha_i^T K(x_i, x_j)\, \alpha_j \ge 0,$$
with equality if and only if all the $\alpha_i$ vanish.

The interpolation problem in the vector case becomes:

(S_d) Given $x_1, \ldots, x_N$ in $\Omega$ and $\lambda_1, \ldots, \lambda_N$ in $\mathbb{R}^d$, find $v \in V$ with minimum norm such that $v(x_i) = \lambda_i$.

As before, we let
$$V_0 = \{v \in V : v(x_i) = 0,\ i = 1, \ldots, N\}.$$
Then Lemma 1 remains valid:

Lemma 2. If there exists a solution $\hat v$ of problem (S_d), then $\hat v \in V_0^\perp$. Moreover, if $\hat v \in V_0^\perp$ is a solution of (S_d) restricted to this set, then it is a solution of (S_d) on $V$.

We add to this the characterization of $V_0^\perp$, which is slightly less straightforward than in the scalar case:

Lemma 3.
$$V_0^\perp = \Big\{ v = \sum_{i=1}^N K(\cdot, x_i)\alpha_i :\ \alpha_1, \ldots, \alpha_N \in \mathbb{R}^d \Big\}.$$
Thus, a vector field $v$ belongs to $V_0^\perp$ if and only if there exist $\alpha_1, \ldots, \alpha_N$ in $\mathbb{R}^d$ such that, for all $x \in \Omega$,
$$v(x) = \sum_{i=1}^N K(x, x_i)\, \alpha_i.$$
This expression is formally similar to the scalar case, with the difference that $K(x, x_i)$ is a matrix and $\alpha_i$ is a vector.

Proof. It is clear that $w \in V_0$ is equivalent to the fact that, for any $\alpha_1, \ldots, \alpha_N$, one has
$$\sum_{i=1}^N \alpha_i^T w(x_i) = 0,$$
so that, by (10), $w \in V_0$ is equivalent to $\langle v, w\rangle_V = 0$ for all $v$ of the form $v = \sum_{i=1}^N K(\cdot, x_i)\alpha_i$. Thus
$$V_0 = \Big\{ v = \sum_{i=1}^N K(\cdot, x_i)\alpha_i :\ \alpha_1, \ldots, \alpha_N \in \mathbb{R}^d \Big\}^\perp,$$
and since $\big\{ v = \sum_{i=1}^N K(\cdot, x_i)\alpha_i :\ \alpha_1, \ldots, \alpha_N \in \mathbb{R}^d \big\}$ is finite dimensional, hence closed, one has
$$V_0^\perp = \Big\{ v = \sum_{i=1}^N K(\cdot, x_i)\alpha_i :\ \alpha_1, \ldots, \alpha_N \in \mathbb{R}^d \Big\}. \qquad \square$$

When $v = \sum_{j=1}^N K(\cdot, x_j)\alpha_j \in V_0^\perp$, the constraint $v(x_i) = \lambda_i$ yields
$$\sum_{j=1}^N K(x_i, x_j)\, \alpha_j = \lambda_i.$$
Since we also have, in this case,
$$\|v\|_V^2 = \sum_{i,j=1}^N \alpha_i^T K(x_i, x_j)\, \alpha_j,$$
the whole problem can be rewritten quite concisely in matricial form, introducing the notation
$$(11) \qquad S = S(x_1, \ldots, x_N) = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & & \vdots \\ K(x_N, x_1) & \cdots & K(x_N, x_N) \end{pmatrix},$$
which is now a block matrix of size $Nd \times Nd$, and
$$\alpha = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix}, \qquad \lambda = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \end{pmatrix},$$
each $\alpha_i, \lambda_i$ being considered as a $d$-dimensional column vector. The whole set of constraints now becomes $S\alpha = \lambda$, and $\|v\|_V^2 = \alpha^T S \alpha$. Thus, replacing scalars by blocks, the problem has exactly the same structure as in the previous case, and we can repeat the results we have obtained:

Theorem 10 (Interpolating splines). Problem (S_d) has a unique solution in $V$, given by
$$\hat v(x) = \sum_{i=1}^N K(x, x_i)\, \alpha_i$$
with
$$\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & & \vdots \\ K(x_N, x_1) & \cdots & K(x_N, x_N) \end{pmatrix}^{-1} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \end{pmatrix}.$$

Theorem 11 (Smoothing splines). The minimum over $V$ of
$$\|v\|_V^2 + C \sum_{i=1}^N |v(x_i) - \lambda_i|^2$$
is attained at
$$\hat v(x) = \sum_{i=1}^N K(x, x_i)\, \alpha_i$$
with
$$\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} = (S + I/C)^{-1} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \end{pmatrix}$$
and $S = S(x_1, \ldots, x_N)$ given by equation (11).

Finally, algorithm 1 remains the same, replacing scalars by vectors.
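As a minimal numerical sketch of Theorem 11 (assuming numpy; the diagonal Gaussian kernel $K(x, y) = e^{-|x-y|^2/\sigma^2}\,\mathrm{Id}_d$ and the values of $C$ and $\sigma$ are arbitrary illustrative choices), one can assemble the block matrix (11) and solve $(S + I/C)\alpha = \lambda$:

```python
import numpy as np

def block_kernel_matrix(x, sigma=1.0):
    """S(x_1, ..., x_N) of equation (11) for the diagonal Gaussian kernel."""
    N, d = x.shape
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    scal = np.exp(-d2 / sigma ** 2)        # N x N scalar kernel matrix
    return np.kron(scal, np.eye(d))        # Nd x Nd block matrix

def smoothing_spline(x, lam, C=10.0, sigma=1.0):
    """Vector smoothing spline (Theorem 11): alpha = (S + I/C)^{-1} lambda."""
    N, d = x.shape
    S = block_kernel_matrix(x, sigma)
    alpha = np.linalg.solve(S + np.eye(N * d) / C, lam.ravel())
    return alpha.reshape(N, d)

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])      # landmarks
lam = np.array([[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1]])   # target displacements
print(smoothing_spline(x, lam))
```

Setting $C$ large recovers (numerically) the interpolating spline of Theorem 10.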

9. Building V and its kernel

9.1. Building V from operators. One way to define a Hilbert space $V$ of smooth functions or vector fields is to use inner products associated to operators. In this framework, an inner product, defined from the action of an operator, is first defined on a subset of $L^2(\Omega, \mathbb{R})$ (within which everything can be handled easily, i.e., in a classical sense when speaking of derivatives), but which is not complete for the induced norm. This subspace is then extended to a larger subspace of $L^2(\Omega, \mathbb{R})$, which will be our Hilbert space $V$. This is called the Friedrichs extension of an operator. Since the construction is not restricted to subspaces of $L^2(\Omega, \mathbb{R})$, we work in the following with an arbitrary Hilbert space $H$.

To start, we need a subspace $D$, included in $H$ and dense in this space, and an operator (i.e., a linear map) $L : D \to H$. Our typical application will be with $D = C_K^\infty(\Omega, \mathbb{R}^d)$ (the set of $C^\infty$ functions with compact support in $\Omega$) and $H = L^2(\Omega, \mathbb{R}^d)$. In such a case, $L$ may be chosen as a differential operator of any degree, since derivatives of $C^\infty$ functions with compact support obviously belong to $L^2$. However, $L$ will be assumed to satisfy an additional monotonicity constraint:

Assumption 2. The operator $L$ is symmetric and strongly monotone on $D$, which means that there exists a constant $c > 0$ such that, for all $u \in D$,

$$(12) \qquad \langle u, Lu\rangle_H \ge c\, \langle u, u\rangle_H,$$
and, for all $u, v \in D$,

$$(13) \qquad \langle u, Lv\rangle_H = \langle Lu, v\rangle_H.$$
An example of a strongly monotone operator on $C_K^\infty(\Omega, \mathbb{R})$ is given by $Lu = -\Delta u + \lambda u$ with $\lambda > 0$, where $\Delta$ is the Laplacian: $\Delta u = \sum_{i=1}^d \frac{\partial^2 u}{\partial x_i^2}$. Indeed, in this case, and when $u$ has compact support, an integration by parts yields

$$-\int_\Omega \Delta u(x)\, u(x)\, dx = \sum_{i=1}^d \int_\Omega \Big( \frac{\partial u}{\partial x_i} \Big)^2 dx \ge 0,$$
so that $\langle u, Lu\rangle_{L^2} \ge \lambda \langle u, u\rangle_{L^2}$.

Returning to the general case, the operator $L$ induces an inner product on $D$, defined by

$$\langle u, v\rangle_L = \langle u, Lv\rangle_H.$$
Assumption 2 ensures the symmetry of this product and its positive definiteness.

But $D$ is not complete for $\|\cdot\|_L$, and we need to enlarge it (and simultaneously extend $L$) to obtain a Hilbert space.

Theorem 12 (Friedrichs extension). There exists a subspace $V \subset H$ such that $D$ is a dense subspace of $V$, and $L$ can be extended to an operator $\hat L : V \to V^*$ such that:
• $V$ is embedded in $H$;
• if $u, v \in D$, then $\langle u, v\rangle_V = \langle Lu, v\rangle_H = (\hat L u \,|\, v)$.

The fact that $\hat L$ is an extension of $L$ comes modulo the identification $H = H^*$. Indeed, we have $V \subset H = H^* \subset V^*$ (by the "duality paradox"), so that $L$, defined on $D \subset V$, can be seen as an operator with values in $V^*$.

Definition 3. The operator $\hat L$ defined in Theorem 12 is called the energetic extension of $L$. Its restriction to the space
$$D_L = \{u \in V : \hat L u \in H = H^*\}$$
is called the Friedrichs extension of $L$.

The Friedrichs extension of $L$ will still be denoted $L$ in the following. We will not prove this theorem; the interested reader may refer to [28]. The Friedrichs extension has other interesting properties:

Theorem 13. $L : D_L \to H$ is bijective and self-adjoint ($\langle Lu, v\rangle_H = \langle u, Lv\rangle_H$ for all $u, v \in D_L$). Its inverse $L^{-1} : H \to H$ is continuous and self-adjoint. If the embedding $V \subset H$ is compact, then $L^{-1} : H \to H$ is compact.

In the following, we will mainly be interested in embeddings stronger than the $L^2$ embedding implied by the monotonicity assumption. It is important that such embeddings are conserved by the extension whenever they are true on the initial space $D$. This is stated in the next proposition.

Proposition 7. Let $D$, $V$ and $H$ be as in Theorem 12, and let $B$ be a Banach space such that $D \subset B \subset H$ and $B$ is canonically embedded in $H$ (there exists $c_1 > 0$ such that $\|u\|_B \ge c_1 \|u\|_H$). Assume that there exists a constant $c_2$ such that, for all $u \in D$, $\sqrt{\langle Lu, u\rangle_H} \ge c_2 \|u\|_B$. Then $V \subset B$ and $\|u\|_V \ge c_2 \|u\|_B$ for all $u \in V$. In particular, if $B$ is compactly embedded in $H$, then $L^{-1} : H \to H$ is compact.

Proof. Let $u \in V$. Since $D$ is dense in $V$, there exists a sequence $(u_n)$ in $D$ such that $\|u_n - u\|_V \to 0$. Thus $(u_n)$ is a Cauchy sequence in $V$ and, by our assumption, it is also a Cauchy sequence in $B$, so that there exists $u' \in B$ such that $\|u_n - u'\|_B \to 0$. But since $V$ and $B$ are both embedded in $H$, we have $\|u_n - u\|_H \to 0$ and $\|u_n - u'\|_H \to 0$, which implies that $u = u'$. Thus $u$ belongs to $B$, and since $\|u_n\|_V$ and $\|u_n\|_B$ respectively converge to $\|u\|_V$ and $\|u\|_B$, passing to the limit in the inequality $\|u_n\|_V \ge c_2 \|u_n\|_B$ ends the proof of Proposition 7. □

We now investigate the relation between this theory and the previous analysis using kernels. From now on, we assume $D = C_K^\infty(\Omega)$ and $H = L^2(\Omega, \mathbb{R})$. The Hilbert space $V$ will have the same role in both cases, but we need the continuity of the evaluation of functions in $V$, which here leads to assuming that there exists a constant $c > 0$ such that, for all $u \in D$,
$$(14) \qquad \langle Lu, u\rangle_{L^2} \ge c\, \|u\|_\infty^2,$$
since, as we have seen, this implies that $V$ is continuously embedded in $C^0(\Omega, \mathbb{R})$.

Thus, the kernel $K$ is well defined on $V$, with the property that $\langle K_x, v\rangle_V = v(x) = (\delta_x \,|\, v)$. The linear form $\delta_x$ belongs to $V^*$ and, by Theorem 12, which implies that $\langle K_x, v\rangle_V = (\hat L K_x \,|\, v)$, we have $\hat L K_x = \delta_x$, or
$$K_x = \hat L^{-1} \delta_x.$$

This exhibits a direct relationship between the self-reproducing kernel on $V$ and the energetic extension of $L$. For this reason, we will denote (with some abuse of notation) $K = \hat L^{-1} : V^* \to V$ and write $K_x = K\delta_x$. In the vector case, we will make the identification $K(\cdot, x)a = K(a \otimes \delta_x)$.

There is another consequence, which stems from the fact that $C^0(\Omega, \mathbb{R})$ is compactly embedded in $L^2(\Omega, \mathbb{R})$, so that Theorem 13 implies that $L^{-1}$ is a compact, self-adjoint operator. Such operators have the important property of admitting an orthonormal sequence of eigenvectors: more precisely, there exists a decreasing sequence $(\rho_n)$ of positive numbers, which is either finite or tends to 0, and an orthonormal sequence $(\varphi_n)$ in $L^2(\Omega, \mathbb{R})$, such that, for $u \in L^2(\Omega, \mathbb{R})$,
$$L^{-1} u = \sum_{n=1}^\infty \rho_n \langle u, \varphi_n\rangle_{L^2}\, \varphi_n.$$

This directly characterizes $D_L$ as the set
$$D_L = \Big\{ u \in L^2(\Omega, \mathbb{R}) : \sum_{n=1}^\infty \frac{\langle u, \varphi_n\rangle_{L^2}^2}{\rho_n^2} < \infty \Big\},$$
and for $u \in D_L$ we have
$$Lu = \sum_{n=1}^\infty \rho_n^{-1} \langle u, \varphi_n\rangle_{L^2}\, \varphi_n,$$
so that, for $u, v \in D_L$,
$$\langle u, v\rangle_V = \langle Lu, v\rangle_{L^2} = \sum_{n=1}^\infty \rho_n^{-1} \langle u, \varphi_n\rangle_{L^2} \langle v, \varphi_n\rangle_{L^2}.$$
This indicates that $V$ should be given by
$$V = \Big\{ u \in L^2(\Omega, \mathbb{R}) : \sum_{n=1}^\infty \rho_n^{-1} \langle u, \varphi_n\rangle_{L^2}^2 < \infty \Big\}.$$
This is indeed the case, and $D_L$ is dense in this set: if $u \in V$, then $u_N = \sum_{n=1}^N \langle u, \varphi_n\rangle \varphi_n$ belongs to $D_L$ and $\|u_N - u\|_V \to 0$. We summarize what we have just obtained in the following theorem.

Theorem 14. Assume that $D = C_K^\infty(\Omega)$ and $H = L^2(\Omega, \mathbb{R})$, and that $L : D \to H$ is symmetric and satisfies
$$\langle Lu, u\rangle_{L^2} \ge c\, \|u\|_\infty^2$$
for some constant $c > 0$. Then the energetic space $V$ of $L$ is continuously embedded in $C^0(\Omega, \mathbb{R})$ and its self-reproducing kernel is $K_x = \hat L^{-1}\delta_x$. Moreover, there exist an orthonormal basis $(\varphi_n)$ of $L^2(\Omega, \mathbb{R})$ and a decreasing sequence of positive numbers $(\rho_n)$, which tends to 0, such that
$$V = \Big\{ u \in L^2(\Omega, \mathbb{R}) : \sum_{n=1}^\infty \rho_n^{-1} \langle u, \varphi_n\rangle_{L^2}^2 < \infty \Big\}.$$
Moreover,
$$Lu = \sum_{n=1}^\infty \rho_n^{-1} \langle u, \varphi_n\rangle_{L^2}\, \varphi_n$$
whenever
$$\sum_{n=1}^\infty \frac{\langle u, \varphi_n\rangle_{L^2}^2}{\rho_n^2} < \infty.$$

9.2. Invariance of the inner product.

9.2.1. The operator side. We have not discussed so far according to which criteria the inner product (or the operator $L$) should be selected. One important criterion, in particular for shape recognition, is equivariance or invariance with respect to rigid transformations. We here consider the situation $\Omega = \mathbb{R}^d$.

To understand the requirement, return to the interpolation problem for vector fields. Let $x_1, \ldots, x_N \in \Omega$ and vectors $v_1, \ldots, v_N$ be given. We want the optimal interpolation $h$ to be equivariant under an orthogonal change of coordinates. This means that, if $A$ is a rotation and $b \in \mathbb{R}^d$, and if we consider the problem associated to the points $Ax_1 + b, \ldots, Ax_N + b$ and the constraints $Av_1, \ldots, Av_N$, the obtained function $h^{A,b}$ must satisfy $h^{A,b}(Ax + b) = Ah(x)$, which can also be written
$$h^{A,b}(x) = A\, h(A^{-1}x - A^{-1}b) =: ((A, b) \star h)(x).$$
The transformation $h \mapsto (A, b) \star h$ is an action of the group of rotations and translations on functions defined on $\Omega$. To ensure that $h^{A,b}$ always satisfies this condition, it is sufficient to enforce that the norm is invariant under this transformation: $\|(A, b) \star h\|_V = \|h\|_V$. So, we need the action of translations and rotations to be isometric on $V$.

Assume that $\|h\|_V^2 = (Lh \,|\, h)$ for some operator $L$, and define the operator $(A, b) \star L$ by the identity, valid for all $h, \tilde h \in V$:

$$\big( L((A, b) \star h) \,\big|\, (A, b) \star \tilde h \big) =: \big( ((A, b) \star L)h \,\big|\, \tilde h \big),$$
so that the invariance condition becomes, expressed in terms of $L$: $(A, b) \star L = L$.

Denoting by $h_k$ the $k$th coordinate of $h$, and by $(Lh)_k$ the $k$th coordinate of $Lh$, we can write the matrix operator $L$ in the form

$$(Lh)_k = \sum_{l=1}^d L_{kl}\, h_l,$$

where $L_{kl}$ is a scalar operator. We consider the case when each $L_{kl}$ is a pseudo-differential operator, in the sense that, for any $C^\infty$ scalar function $u$, the Fourier transform of $L_{kl}u$ takes the form, for $\xi \in \mathbb{R}^d$:

$$\mathcal{F}(L_{kl} u)(\xi) = f_{kl}(\xi)\, \hat u(\xi)$$

for some scalar function $f_{kl}$. Here and in the following, the Fourier transform of a function $u$ is denoted either $\hat u$ or $\mathcal{F}(u)$. An interesting case is when $f_{kl}$ is a polynomial, which corresponds, from standard properties of the Fourier transform, to $L_{kl}$ being a differential operator. Also, by the isometric property of Fourier transforms,

$$(Lh \,|\, h) = \sum_{k=1}^d ((Lh)_k \,|\, h_k) = \sum_{k,l=1}^d (L_{kl} h_l \,|\, h_k) = \sum_{k,l=1}^d \big(\mathcal{F}(L_{kl} h_l) \,\big|\, \mathcal{F}(h_k)\big) = \sum_{k,l=1}^d \big(f_{kl} \hat h_l \,\big|\, \hat h_k\big).$$
Note that, since we want $L$ to be a positive operator, we need the matrix $f(\xi) = (f_{kl}(\xi))$ to be a complex, positive, Hermitian matrix. Moreover, since we want $(Lh \,|\, h) \ge c\, (h \,|\, h)$ for some $c > 0$, it is natural to require that the smallest eigenvalue of $f(\xi)$ be uniformly bounded from below (by $c$). To use this expression, we need to compute $\mathcal{F}((A, b) \star h)$ in terms of $\hat h$. We have

$$\mathcal{F}((A, b) \star h)(\xi) = \int_{\mathbb{R}^d} e^{-i\xi^T x}\, (A, b) \star h(x)\, dx = \int_{\mathbb{R}^d} e^{-i\xi^T x}\, A h(A^{-1}x - A^{-1}b)\, dx = \int_{\mathbb{R}^d} e^{-i\xi^T (Ay + b)}\, A h(y)\, dy = e^{-i\xi^T b}\, A\, \hat h(A^T \xi),$$
where we have used the fact that $\det A = 1$ ($A$ being a rotation matrix) in the change of variables. We can therefore write, denoting by $\varphi_b$ the function $\xi \mapsto \exp(-i\xi^T b)$,

$$\big( L((A,b) \star h) \,\big|\, (A,b) \star h \big) = \sum_{k,l=1}^d \sum_{k',l'=1}^d \big( \varphi_b\, f_{kl}\, a_{ll'}\, \hat h_{l'} \circ A^T \,\big|\, \varphi_b\, a_{kk'}\, \hat h_{k'} \circ A^T \big) = \sum_{k,l=1}^d \sum_{k',l'=1}^d \big( (f_{kl} \circ A)\, a_{ll'}\, \hat h_{l'} \,\big|\, a_{kk'}\, \hat h_{k'} \big).$$

We have used the change of variables $\xi \mapsto A\xi$, together with $\det A = 1$ and $A^T = A^{-1}$. The function $\varphi_b$ cancels in the Hermitian product
$$(u \,|\, v) = \int_{\mathbb{R}^d} u\, \bar v\, d\xi,$$
where $\bar v$ is the complex conjugate of $v$. Reordering the last equation, we obtain

$$\big( L((A,b) \star h) \,\big|\, (A,b) \star h \big) = \sum_{k',l'=1}^d \Big( \sum_{k,l=1}^d a_{kk'}\, (f_{kl} \circ A)\, a_{ll'}\, \hat h_{l'} \,\Big|\, \hat h_{k'} \Big),$$
which yields our invariance condition in terms of the $f_{kl}$: for all $k, l$,

$$f_{kl} = \sum_{k',l'=1}^d a_{k'k}\, (f_{k'l'} \circ A)\, a_{l'l},$$
or, representing $f$ as a complex matrix,
$$A^T f(A\xi) A = f(\xi)$$
for all $\xi$ and all rotation matrices $A$.

We now investigate the consequences of this constraint. First consider the case $\xi = 0$. The identity $A^T f(0) A = f(0)$ for all rotations $A$ implies, by a standard linear algebra argument, that $f(0) = \lambda \mathrm{Id}$ for some $\lambda \in \mathbb{R}$. Now fix $\xi \ne 0$ and let $\eta_1 = \xi/|\xi|$. Extend $\eta_1$ into an orthonormal basis $(\eta_1, \ldots, \eta_d)$, and let $(e_1, \ldots, e_d)$ be the canonical basis of $\mathbb{R}^d$. Define $A_0$ by $A_0 \eta_i = e_i$; this yields $A_0 \xi = |\xi| e_1$ and $f(\xi) = A_0^T f(|\xi| e_1) A_0 =: M_0$. Now, for any rotation $A$ such that $A\eta_1 = \eta_1$, we also have $f(\xi) = A^T A_0^T f(|\xi| e_1) A_0 A$, since $A_0 A \eta_1 = e_1$. This yields the fact that, for any $A$ such that $A\eta_1 = \eta_1$, we have $A^T M_0 A = M_0$.

This property implies, in particular, that $A^T M_0 \eta_1 = M_0 \eta_1$ for all $A$ such that $A\eta_1 = \eta_1$. Decomposing $M_0 \eta_1$ on $\eta_1$ and $\eta_1^\perp$, it is easy to see that this implies $M_0 \eta_1 = \alpha \eta_1$ for some $\alpha$. So $\eta_1$ is an eigenvector of $M_0$, which implies that $\eta_1^\perp$ is closed under the action of $M_0$. But the fact that $A^T M_0 A = M_0$ for any rotation $A$ of $\eta_1^\perp$ implies that $M_0$ restricted to $\eta_1^\perp$ is a homothety, i.e., that there exists $\lambda$ such that $M_0 \eta_i = \lambda \eta_i$ for $i \ge 2$. Using the decomposition $u = \langle u, \eta_1\rangle \eta_1 + (u - \langle u, \eta_1\rangle \eta_1)$, we can write

$$M_0 u = \alpha \langle u, \eta_1\rangle \eta_1 + \lambda\,(u - \langle u, \eta_1\rangle \eta_1) = \big( (\alpha - \lambda)\, \eta_1 \eta_1^T + \lambda\, \mathrm{Id} \big) u,$$
which finally yields (letting $\mu = \alpha - \lambda$, and using the fact that the eigenvalues of $M_0 = A_0^T f(|\xi| e_1) A_0$ are the same as those of $f(|\xi| e_1)$ and therefore only depend on $|\xi|$):
$$f(\xi) = \mu(|\xi|)\, \frac{\xi \xi^T}{|\xi|^2} + \lambda(|\xi|)\, \mathrm{Id}.$$
Note that, since we know that $f(0) = \lambda \mathrm{Id}$, we must have $\mu(0) = 0$. We summarize this in the following theorem.

Theorem 15. Let $L$ be a matricial operator such that there exists a matrix valued function $f$ with, for any $u \in C_K^\infty(\mathbb{R}^d, \mathbb{R}^d)$,
$$\mathcal{F}(Lu)_k(\xi) = \sum_{l=1}^d f_{kl}(\xi)\, \hat u_l(\xi).$$
Then $L$ is rotation invariant if and only if $f$ takes the form
$$f(\xi) = \mu(|\xi|)\, \frac{\xi \xi^T}{|\xi|^2} + \lambda(|\xi|)\, \mathrm{Id}$$
for some functions $\lambda$ and $\mu$ such that $\mu(0) = 0$.

Let us first consider the case $\mu = 0$: then $f_{kl} = 0$ if $k \ne l$ and $f_{kk}(\xi) = \lambda(|\xi|)$ for all $k$, which implies that $L_{kl} = 0$ for $k \ne l$ and that all the $L_{kk}$ are equal to a single scalar operator $L_0$, i.e.,
$$Lh = (L_0 h_1, \ldots, L_0 h_d).$$

This is the type of operator which is mostly used in practice. If one wants $L_0$ to be a differential operator, $\lambda(|\xi|)$ must be a polynomial in the coefficients of $\xi$, which is only possible if it takes the form
$$\lambda(|\xi|) = \sum_{p=0}^q \lambda_p\, |\xi|^{2p}.$$
In this case, the corresponding operator is $L_0 u = \sum_{p=0}^q (-1)^p \lambda_p\, \Delta^p u$. For example, if $\lambda(|\xi|) = (\alpha + |\xi|^2)^2$, we get $L_0 = (\alpha\, \mathrm{Id} - \Delta)^2$.

Let us consider an example of a valid differential operator with $\mu \ne 0$, taking $\mu(|\xi|) = \alpha|\xi|^2$ and $\lambda$ as above. This yields the operator
$$L_{kl} u = -\alpha\, \frac{\partial^2 u}{\partial x_k \partial x_l} + \delta_{kl} \sum_{p=0}^q (-1)^p \lambda_p\, \Delta^p u,$$
so that
$$Lh = -\alpha\, \nabla\, \mathrm{div}\, h + \Big( \sum_{p=0}^q (-1)^p \lambda_p\, \Delta^p \Big) h.$$

9.2.2. The kernel side. The kernel (or Green function) that corresponds to an operator associated to a given $f$ can be computed simply in the Fourier domain. It is indeed given by

$$\mathcal{F}(K_{kl} u)(\xi) = g_{kl}(\xi)\, \hat u(\xi),$$
with $g = f^{-1}$ (the inverse matrix), as can be checked by a simple computation. Note that, for the spline application, we need $K(a \otimes \delta_x)$ to be well defined. The $k$th component of $a \otimes \delta_x$ being $a_k \delta_x$, we can write $\mathcal{F}(a \otimes \delta_x)_k(\xi) = a_k \exp(-i\xi^T x)$, so that
$$\mathcal{F}(K(a \otimes \delta_x))_k = \sum_{l=1}^d g_{kl}(\xi)\, a_l \exp(-i\xi^T x).$$
A sufficient condition for this to provide a well-defined function after Fourier inversion is that all the $g_{kl}$ are integrable. Assuming this, we have
$$\big( K(a \otimes \delta_x)(y) \big)_k = \sum_{l=1}^d \tilde g_{kl}(y - x)\, a_l,$$
where $\tilde g_{kl}$ is the inverse Fourier transform of $g_{kl}$.

In the case $f = \lambda\, \mathrm{Id} + \mu\, \eta\eta^T$ (with $\eta = \xi/|\xi|$), we can check that
$$f^{-1} = \frac{1}{\lambda} \Big( \mathrm{Id} - \frac{\mu}{\lambda + \mu}\, \eta\eta^T \Big).$$
Let $\rho(\xi) = 1/\lambda(|\xi|)$ and $\psi(\xi) = \mu(|\xi|)/\big(\lambda(|\xi|)\, |\xi|^2\, (\lambda(|\xi|) + \mu(|\xi|))\big)$, so that
$$f^{-1}(\xi) = \rho(|\xi|)\, \mathrm{Id} - \psi(|\xi|)\, |\xi|^2\, \eta\eta^T.$$
The eigenvalues of $f^{-1}$ are therefore $\rho(\xi)$ and $\rho(\xi) - \psi(\xi)|\xi|^2$. Since the eigenvalues of $f$ must be bounded from below by $c$, we obtain the following constraints on $\rho$ and $\psi$:
$$|\xi|^2 \max(\psi, 0) \le \rho \le 1/c.$$
We can moreover obtain the expression of the kernel in terms of the inverse Fourier transforms $\tilde\rho$ and $\tilde\psi$ of $\rho$ and $\psi$:
$$K(x, y)_{kl} = \tilde g_{kl}(x - y) = \delta_{kl}\, \tilde\rho(x - y) + \frac{\partial^2 \tilde\psi}{\partial x_k \partial x_l}(x - y).$$
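This Fourier-domain recipe is easy to test numerically. The following is a sketch in dimension 1 (assumptions: numpy, the arbitrary choice $\lambda(|\xi|) = (1 + |\xi|^2)^2$, i.e. $L_0 = (\mathrm{Id} - d^2/dx^2)^2$, and inversion by direct quadrature of the even integrand), compared against the known closed-form Green function of this operator:

```python
import numpy as np

lam = lambda xi: (1.0 + xi ** 2) ** 2      # symbol of (Id - d^2/dx^2)^2
rho = lambda xi: 1.0 / lam(xi)             # g = 1/f in the scalar case

# Inverse Fourier transform by quadrature: rho is even and real, so
# rho~(x) = (1 / 2 pi) \int rho(xi) cos(xi x) d xi.
xi = np.linspace(-200.0, 200.0, 400001)
dxi = xi[1] - xi[0]

def kernel(x):
    return np.sum(rho(xi) * np.cos(xi * x)) * dxi / (2.0 * np.pi)

# Closed form for this operator: rho~(x) = (1 + |x|) exp(-|x|) / 4.
for x in [0.0, 0.5, 1.0, 2.0]:
    print(kernel(x), (1.0 + abs(x)) * np.exp(-abs(x)) / 4.0)
```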

We can make the additional remark that the inverse Fourier transform of a radial function (i.e., a function $u(\xi)$ that only depends on $|\xi|$) is also a radial function. We have in fact

$$\tilde\rho(x) = \int_{\mathbb{R}^d} \rho(|\xi|)\, e^{i\xi^T x}\, d\xi = \int_0^\infty \rho(t)\, t^{d-1} \int_{S^{d-1}} e^{i t\, \eta^T x}\, ds(\eta)\, dt = \int_0^\infty \rho(t)\, t^{d-1} \int_{S^{d-1}} e^{i t\, \eta_1 |x|}\, ds(\eta)\, dt,$$
where $S^{d-1}$ is the unit sphere in $\mathbb{R}^d$, and the last identity comes from the fact that the integral is invariant under rotations of $x$, which allows us to replace $x$ by $|x| e_1$. The last integral can in turn be expressed in terms of the Bessel function $J_{d/2}$ ([21]), yielding an expression which will not be used here.

There therefore exist two functions $\gamma_1$ and $\gamma_2$ such that $\tilde\rho(x) = \gamma_1(|x|^2)$ and $\tilde\psi(x) = \gamma_2(|x|^2)$, and the expression for $K$ becomes
$$K(x, y) = \big( \gamma_1(|x - y|^2) + 2\gamma_2'(|x - y|^2) \big)\, \mathrm{Id} + 4\gamma_2''(|x - y|^2)\, (x - y)(x - y)^T.$$

Again, the choice γ2 = 0 is commonly made in practice, yielding diagonal radial kernels [11]. There is a nice dimension independent characterization of scalar radial kernels:

Proposition 8. Assume that there exists a positive function $f$ on $]0, +\infty[$ such that
$$\gamma(t) = \int_0^{+\infty} e^{-t^2 u}\, f(u)\, du.$$
Then, whatever the dimension $d$, the scalar kernel $K(x, y) = \gamma(|x - y|)$ is positive definite.

The interesting fact is that this sufficient condition is almost necessary. The necessary and sufficient condition, which requires Lebesgue integration, is that there exists a positive measure $\mu$ on $[0, +\infty[$ such that

$$\gamma(t) = \int_0^{+\infty} e^{-t^2 u}\, d\mu(u).$$

An important application is when $\mu$ is a Dirac measure, $\mu = \delta_{\sigma^{-2}}$, which provides $\gamma(t) = e^{-t^2/\sigma^2}$. The kernel
$$K(x, y) = e^{-|x - y|^2/\sigma^2}$$
is called the Gaussian kernel on $\mathbb{R}^d$ and is probably the kernel most commonly used for spline smoothing. But Proposition 8 can be used to generate other kernels; for example, $f(u) = e^{-u}$ provides the Cauchy kernel
$$K(x, y) = \frac{1}{1 + |x - y|^2}.$$
Translation invariant kernels (not necessarily radial), of the kind $K(x, y) = \Gamma(x - y)$, can be characterized in a similar way.
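Positive definiteness of these kernels can be observed numerically on the eigenvalues of a Gram matrix; diagonalizing such a matrix is also a crude, discrete analogue of the Mercer decomposition of the next section. A minimal sketch (assuming numpy; point set and bandwidth are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=(50, 3))                       # 50 points in R^3
d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)

gaussian = np.exp(-d2 / 0.5 ** 2)                   # sigma = 0.5
cauchy = 1.0 / (1.0 + d2)

# Positive definiteness: all eigenvalues of the Gram matrix are positive
# (up to round-off for the smallest ones).
for name, S in [("gaussian", gaussian), ("cauchy", cauchy)]:
    print(name, np.linalg.eigvalsh(S).min())
```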

Proposition 9. Assume that there exists a positive, even function $F$ on $\mathbb{R}^d$ such that
$$\Gamma(x) = \int_{\mathbb{R}^d} e^{-i x^T u}\, F(u)\, du.$$
Then the kernel $K(x, y) = \Gamma(x - y)$ is positive definite.

9.3. Mercer's theorem. The interest of the above discussion is that it is possible to reconstruct the Hilbert space $V$ used in the spline representation from a positive definite kernel. We gave a description using Fourier transforms, but another way to achieve this is by using Mercer's theorem [23].

Theorem 16. Take $\Omega = \mathbb{R}^d$. Let $K : \Omega \times \Omega \to \mathbb{R}$ be a continuous, positive definite kernel such that
$$\int_{\Omega\times\Omega} K(x, y)^2\, dx\, dy < \infty.$$
Then there exist an orthonormal sequence of functions $\varphi_1, \varphi_2, \ldots$ in $L^2(\Omega, \mathbb{R})$ and a decreasing sequence $(\rho_n)$, which tends to 0 when $n$ tends to $\infty$, such that
$$K(x, y) = \sum_{n=1}^\infty \rho_n\, \varphi_n(x)\, \varphi_n(y).$$

Let the conditions of Mercer's theorem hold, and define the Hilbert space $V$ by
$$V = \Big\{ v \in L^2(\Omega, \mathbb{R}) : \sum_{n=1}^\infty \rho_n^{-1} \langle v, \varphi_n\rangle_{L^2}^2 < \infty \Big\}.$$
Define, for $v, w \in V$:
$$\langle v, w\rangle_V = \sum_{n=1}^\infty \rho_n^{-1} \langle v, \varphi_n\rangle_{L^2} \langle w, \varphi_n\rangle_{L^2}.$$
Note that, for $v \in V$, the series
$$v(x) = \sum_{n=1}^\infty \langle v, \varphi_n\rangle_{L^2}\, \varphi_n(x)$$
converges pointwise, since
$$\Big( \sum_{n=p}^m \langle v, \varphi_n\rangle_{L^2}\, \varphi_n(x) \Big)^2 \le \sum_{n=p}^m \rho_n^{-1} \langle v, \varphi_n\rangle_{L^2}^2\ \sum_{n=p}^m \rho_n\, \varphi_n(x)^2,$$
and both terms in the upper bound can be made arbitrarily small. Similarly,
$$v(x) - v(y) = \sum_{n=1}^\infty \langle v, \varphi_n\rangle_{L^2}\, (\varphi_n(x) - \varphi_n(y)),$$
so that
$$(v(x) - v(y))^2 \le \sum_{n=1}^\infty \rho_n^{-1} \langle v, \varphi_n\rangle_{L^2}^2\ \sum_{n=1}^\infty \rho_n\, (\varphi_n(x) - \varphi_n(y))^2 = \|v\|_V^2\, \big( K(x, x) - 2K(x, y) + K(y, y) \big),$$
so that $v$ is continuous.

Then, letting $K_x = K(\cdot, x)$, one has
$$\langle \varphi_m, K_x\rangle_{L^2} = \sum_{n=1}^\infty \rho_n\, \varphi_n(x)\, \langle \varphi_m, \varphi_n\rangle_{L^2} = \rho_m\, \varphi_m(x),$$
so that
$$\sum_{n=1}^\infty \rho_n^{-1} \langle \varphi_n, K_x\rangle_{L^2}^2 = \sum_{n=1}^\infty \rho_n\, \varphi_n(x)^2 = K(x, x) < \infty,$$
so that $K_x \in V$; a similar computation shows that $\langle K_x, K_y\rangle_V = K(x, y)$, so that $K$ is self-reproducing. Finally, if $v \in V$,
$$\langle v, K_x\rangle_V = \sum_{n=1}^\infty \rho_n^{-1} \langle v, \varphi_n\rangle_{L^2} \langle \varphi_n, K_x\rangle_{L^2} = \sum_{n=1}^\infty \langle v, \varphi_n\rangle_{L^2}\, \varphi_n(x) = v(x),$$
so that $K_x$ is the Riesz representation of the evaluation functional on $V$. Thus, finding a continuous, positive definite, square integrable kernel is enough to make the whole theory work.

9.4. Thin plate interpolation. Thin plate theory corresponds to the situation in which the operator $L$ is some power of the Laplacian. As a tool for shape analysis, it was originally described by Bookstein [5]. Consider the following bilinear form:
$$(15) \qquad \langle f, g\rangle_L = \int_{\mathbb{R}^2} \Delta f\, \Delta g\, dx = \int_{\mathbb{R}^2} f\, \Delta^2 g\, dx,$$
which corresponds to the operator $L = \Delta^2$. We need to define it on a Hilbert space which is somewhat unusual.

We consider the space $H_1$ of all functions on $\mathbb{R}^d$ with square integrable second derivatives which have a bounded gradient at infinity. On this space, $\|f\|_L = 0$ is equivalent to the fact that $f$ is affine, i.e., $f(x) = a^T x + b$ for some $a \in \mathbb{R}^d$ and $b \in \mathbb{R}$. The Hilbert space we consider is the space of $f \in H_1$ modulo the affine functions, i.e., the set of equivalence classes
$$[f] = \{g : g(x) = f(x) + a^T x + b,\ a \in \mathbb{R}^d,\ b \in \mathbb{R}\}$$
for $f \in H_1$. Obviously, the norm associated to (15) is constant over the set $[f]$, and $\|f\|_L = 0$ if and only if $[f] = [0]$.

This space also has a kernel, although the analysis has to be different from what we have done previously, since the evaluation functional is not defined on this space (functions are only known up to the addition of an affine term). However, the following is true [19]. Let $U(r) = (1/8\pi)\, r^2 \log r$ if the dimension $d$ is 2, and $U(r) = (1/16\pi)\, r^3$ for $d = 3$. Then, for all $u \in H_1$, there exist $a_u \in \mathbb{R}^d$ and $b_u \in \mathbb{R}$ such that
$$u(x) = \int_{\mathbb{R}^d} U(|x - y|)\, \Delta^2 u(y)\, dy + a_u^T x + b_u.$$
We will denote by $U(\cdot, x)$ the function $y \mapsto U(|x - y|)$.

Fix, as in Section 8.1, landmarks $x_1, \ldots, x_N$ and scalar constraints $(c_1, \ldots, c_N)$. Again, the constraint $h(x_i) = c_i$ has no meaning here, but the constraint $\langle U(\cdot, x_i), h\rangle_H = c_i$ does, and means that there exist $a_h$ and $b_h$ such that $h(x_i) + a_h^T x_i + b_h = c_i$, i.e., $h$ satisfies the constraints up to an affine term.

Denote, as before, $S_{ij} = U(|x_i - x_j|)$. The function $h$ which is optimal under the constraints $\langle U(\cdot, x_i), h\rangle_H = c_i$, $i = 1, \ldots, N$, must therefore take the form
$$h(x) = \sum_{i=1}^N \alpha_i\, U(|x - x_i|) + a^T x + b.$$
The corresponding interpolation problem is: minimize $\|h\|_H$ under the constraint that there exist $a, b$ such that $h(x_i) + a^T x_i + b = c_i$, $i = 1, \ldots, N$. The inexact matching problem simply consists in minimizing

$$\|h\|_H^2 + \lambda \sum_{i=1}^N (h(x_i) + a^T x_i + b - c_i)^2$$
with respect to $h$ and $a, b$. Replacing $h$ by its expression in terms of the $\alpha$'s, $a$ and $b$ yields the finite dimensional problem: minimize
$$\alpha^T S \alpha$$
under the constraint $S\alpha + Q\gamma = c$, with $\gamma = (a_1, \ldots, a_d, b)^T$ (of size $(d+1) \times 1$) and $Q$, of size $N \times (d+1)$, given by (letting $x_i = (x_i^1, \ldots, x_i^d)$):
$$Q = \begin{pmatrix} x_1^1 & \cdots & x_1^d & 1 \\ \vdots & & \vdots & \vdots \\ x_N^1 & \cdots & x_N^d & 1 \end{pmatrix}.$$
The optimal $(\alpha, \gamma)$ can be computed by setting the gradient to 0. One obtains

\[ \hat\gamma = \big(Q^TS^{-1}Q\big)^{-1}Q^TS^{-1}c \quad\text{and}\quad \hat\alpha = S^{-1}(c - Q\hat\gamma). \]
For the inexact matching problem, one minimizes

\[ \alpha^TS\alpha + \lambda\,(S\alpha + Q\gamma - c)^T(S\alpha + Q\gamma - c), \]
and the solution is provided by the same formulae, simply replacing S by S_λ = S + (1/λ) Id.

When the function h (and the c_i) take d-dimensional values (e.g., correspond to displacements), the above computation has to be applied to each coordinate, which simply corresponds to using a diagonal operator, with each component equal to Δ², in the definition of the dot product. This is equivalent to using the diagonal scalar kernel associated to U.
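As an illustration, the following Python sketch solves the exact thin plate interpolation problem through the formulae above, in dimension 2; the landmarks and target values are arbitrary choices made here for the example.

# A minimal sketch of 2D thin-plate interpolation using the formulae above.
import numpy as np

def U(r):
    # Thin-plate radial function for d = 2: U(r) = (1/8 pi) r^2 log r, U(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (r ** 2) * np.log(r) / (8.0 * np.pi)
    return np.nan_to_num(out)

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
c = np.array([0.0, 0.1, -0.2, 0.3, 0.0])                 # scalar constraints c_i
N = len(x)

S = U(np.linalg.norm(x[:, None] - x[None, :], axis=-1))  # S_ij = U(|x_i - x_j|)
Q = np.hstack([x, np.ones((N, 1))])                      # N x (d+1) affine block

# Exact interpolation: gamma = (Q^T S^-1 Q)^-1 Q^T S^-1 c, alpha = S^-1 (c - Q gamma).
Sinv_c, Sinv_Q = np.linalg.solve(S, c), np.linalg.solve(S, Q)
gamma = np.linalg.solve(Q.T @ Sinv_Q, Q.T @ Sinv_c)
alpha = np.linalg.solve(S, c - Q @ gamma)

def h(y):
    # h(y) = sum_i alpha_i U(|y - x_i|) + a^T y + b with (a, b) = gamma.
    return U(np.linalg.norm(y - x, axis=-1)) @ alpha + np.append(y, 1.0) @ gamma

print([h(xi) for xi in x])   # reproduces c up to numerical error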

9.5. Kernels that are affine at infinity. We now consider the case of vector fields and discuss how affine components can be combined with any kernel. We assume here that Ω = ℝ^d. We would like to consider spaces V that contain vector fields with an affine behavior at infinity. Note that the spaces V we have considered so far, whether obtained by completion of C^∞ compactly supported functions or by using kernels defined through Fourier transforms, only contain functions that vanish at infinity. We recall the definition of a function that vanishes at infinity:

Definition 4. A function f : Ω → ℝ^k is said to vanish at infinity if and only if, for all ε > 0, there exists A > 0 such that |f(x)| < ε whenever x ∈ Ω and |x| > A.

Here, we let V be a Hilbert space of vector fields that vanish at infinity and define
\[ V_{\mathrm{aff}} = \big\{ w : \exists\, w_0\in V,\ A\in M_d(\mathbb R)\ \text{and}\ b\in\mathbb R^d\ \text{with}\ w(x) = w_0(x) + Ax + b \big\}. \]
We have the following important fact:

Proposition 10. If V is a Hilbert space of continuous vector fields that vanish at infinity, then the decomposition w(x) = w₀(x) + Ax + b for w ∈ V_aff is unique.

Proof. Using differences, it suffices to prove this for w = 0. So assume that w₀, A and b are such that w₀(x) + Ax + b = 0 for all x; then, for any fixed x and t > 0, w₀(tx) + tAx + b = 0, so that Ax = −(w₀(tx) + b)/t. Since the right-hand term tends to 0 when t tends to infinity, we get Ax = 0 for all x, so that A = 0. Now, for all t, x, we get b = −w₀(tx), which (since w₀ vanishes at infinity) implies b = 0, and therefore also w₀ = 0. □

So we can speak of the affine part of an element of V_aff. Given the inner product in V, we can define
\[ \langle w, \tilde w\rangle_{V_{\mathrm{aff}}} = \langle w_0, \tilde w_0\rangle_V + \langle A, \tilde A\rangle + \langle b, \tilde b\rangle, \]
where ⟨A, Ã⟩ is some inner product between matrices (for example trace(AᵀÃ)) and ⟨b, b̃⟩ some inner product between vectors (for example bᵀb̃). With this product, V_aff is obviously a Hilbert space, and we want to compute its kernel K_aff in terms of the kernel K of V (assuming that V is reproducing). Given x ∈ ℝ^d and a ∈ ℝ^d, we need to express aᵀw(x) in the form ⟨K_aff(·, x)a, w⟩_{V_aff}. Using the decomposition, we have
\[ a^Tw(x) = a^Tw_0(x) + a^TAx + a^Tb = \langle K(\cdot,x)a, w_0\rangle_V + \langle (ax^T)^*, A\rangle + \langle a^*, b\rangle, \]
where, for a matrix M, we define M* by the identity ⟨M*, A⟩ = trace(MᵀA) for all A, and, for a vector z, z* is such that ⟨z*, b⟩ = zᵀb for all b. From the definition of the extended inner product, we can identify
\[ K_{\mathrm{aff}}(y,x)a = K(y,x)a + (ax^T)^*y + a^*. \]
In particular, when ⟨A, Ã⟩ = λ trace(AᵀÃ) and ⟨b, b̃⟩ = μ bᵀb̃, we get

\[ K_{\mathrm{aff}}(y,x)a = K(y,x)a + (x^Ty/\lambda)\,a + a/\mu = \big(K(y,x) + (x^Ty/\lambda + 1/\mu)\,\mathrm{Id}_d\big)a. \]
This provides an immediate extension of the spline interpolation of vector fields to include affine transformations, by just replacing K by K_aff. For example, exact interpolation with constraints v(x_i) = c_i is obtained by letting
\[ v(x) = \sum_{j=1}^N K_{\mathrm{aff}}(x, x_j)\alpha_j, \]
the vectors α_j being obtained by solving the system
\[ (16)\quad \sum_{j=1}^N \big(K(x_i,x_j)\alpha_j + (x_i^Tx_j/\lambda + 1/\mu)\alpha_j\big) = c_i, \quad i = 1,\dots,N. \]
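The following Python sketch illustrates system (16); the Gaussian kernel (acting diagonally on ℝ² vector fields) and the data are illustrative choices, not taken from the text.

# A sketch of exact interpolation with the affine-augmented kernel of (16).
import numpy as np

def K(x, y, sigma=1.0):
    # Scalar Gaussian kernel acting diagonally on vector fields.
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

lam, mu = 1.0, 1.0
x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])    # landmarks x_i
c = np.array([[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1]])   # target vectors c_i

# System (16): sum_j (K(x_i, x_j) + x_i^T x_j / lam + 1 / mu) alpha_j = c_i.
# With a scalar kernel the blocks are multiples of Id, so one N x N solve
# per coordinate suffices.
G = np.array([[K(xi, xj) + xi @ xj / lam + 1 / mu for xj in x] for xi in x])
alpha = np.linalg.solve(G, c)

def v(y):
    # v(y) = sum_j K_aff(y, x_j) alpha_j.
    k = np.array([K(y, xj) + y @ xj / lam + 1 / mu for xj in x])
    return k @ alpha

print(v(x[0]), "vs", c[0])   # reproduces c_0 up to numerical error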

The case λ = μ → 0 is particularly interesting, since it corresponds to relaxing the penalty on the affine displacement, and one obtains in this way an affine invariance similar to thin plates. In fact, the solution of (16) has a limit v* which is a solution of the affine invariant interpolation problem: minimize ‖v‖_V² under the constraint that there exist A and b with v(x_i) = c_i − Ax_i − b for i = 1, ..., N. Indeed, (16) provides the unique solution, v^λ, A^λ and b^λ, of the problem: minimize, with respect to v, A, b,
\[ \|v\|_V^2 + \lambda\big(\|A\|^2 + \|b\|^2\big) \]
under the constraints v(x_i) = c_i − Ax_i − b for i = 1, ..., N. Because ‖v^λ‖_V is always smaller than the norm of the optimal v for the problem without affine component, we know that it is bounded. Thus, to prove that v^λ converges to v*, it suffices to prove that any weakly converging subsequence of v^λ has v* as a limit. So take such a sequence, v^{λ_n}, and let v⁰ be its weak limit. Because v ↦ v(x) is a continuous linear form, we have v^{λ_n}(x_i) → v⁰(x_i) for all i. Because of this, and because there exists a solution A^λ, b^λ to the overconstrained system A^λx_i + b^λ = c_i − v^λ(x_i), we can conclude that there exist A⁰ and b⁰ such that A⁰x_i + b⁰ = c_i − v⁰(x_i) (because the condition for the existence of solutions passes to the limit). So v⁰ satisfies the constraints of the limit problem. Moreover, for any λ, we have
\[ \|v^*\|_V^2 + \lambda\big(\|A^*\|^2 + \|b^*\|^2\big) \ge \|v^\lambda\|_V^2 + \lambda\big(\|A^\lambda\|^2 + \|b^\lambda\|^2\big) \ge \|v^\lambda\|_V^2. \]
Since the liminf of the last term is larger than ‖v⁰‖_V² and the limit of the left-hand term is ‖v*‖_V², we can conclude that v⁰ = v*, since v* is the unique solution of the limit problem.

CHAPTER 2

Ordinary differential equations and groups of diffeomorphisms

1. Introduction

In this chapter, we review a few results from the theory of ordinary differential equations, which will provide a very efficient way to generate deformations. It may be time to specify what is meant by a deformation in this course. The definition is quite natural: we fix an open subset Ω of ℝ^d. A deformation is a function ϕ which assigns to each point x ∈ Ω a displaced position y = ϕ(x) ∈ Ω. There are two undesired behaviors that we would like to forbid:
• The deformation should not create holes: every point y ∈ Ω should be the image of some point x ∈ Ω, i.e., ϕ should be onto.
• Folds are also prohibited: two distinct points x and x′ in Ω should not be mapped to the same point y ∈ Ω, i.e., ϕ must be one to one.
Thus deformations must be bijections of Ω. In addition, we require some smoothness for ϕ. The next definition introduces some standard vocabulary:

Definition 5. A homeomorphism of Ω is a continuous bijection ϕ : Ω → Ω such that its inverse, ϕ⁻¹, is continuous. A diffeomorphism of Ω is a continuously differentiable homeomorphism ϕ : Ω → Ω such that ϕ⁻¹ is continuously differentiable.

One can show, in fact, that a continuously differentiable homeomorphism whose differential is everywhere invertible is a diffeomorphism. From now on, most of the deformations we shall consider will be diffeomorphisms of some open set Ω ⊂ ℝ^d. If there is information (like grey level, color, shape boundary) carried at position x, it will be moved according to the deformation. We shall say that deformations act on structures carried by Ω. Consider as an example an "image", i.e., a function I : Ω → ℝ, and a diffeomorphism ϕ of Ω. The deformation will create a new image I′ on Ω by letting I′(y) be the value of I at the position x which moved to y through the deformation, i.e., I′(y) = I(ϕ⁻¹(y)), or I′ = I ∘ ϕ⁻¹. This describes the action of diffeomorphisms on functions.

However, we shall not be directly concerned with the direct problem of computing the action of diffeomorphisms, but with the inverse problem of estimating the best diffeomorphism from the output of its action. For example, the image matching problem consists in finding an algorithm which, given two functions I and I′ on Ω, is able to recover a plausible diffeomorphism ϕ such that I′ = I ∘ ϕ⁻¹. We therefore must face the problem of building diffeomorphisms. This is not an easy matter, because of the nonlinear character of the problem: linear combinations of diffeomorphisms are not necessarily diffeomorphisms.


There is a direct, but limited, way of building diffeomorphisms, by small perturbations of the identity. It is provided by the next proposition. We do not assume here that Ω is bounded.

Proposition 11. Let u ∈ C¹(Ω, ℝ^d), and assume that:
(i) u(x) and Du(x) tend to 0 when x tends to infinity;
(ii) for some δ > 0, one has u(x) = 0 for any x ∈ Ω such that there exists y ∉ Ω with |x − y| < δ (this condition is void if Ω = ℝ^d).
Then, for small enough ε, ϕ : x ↦ x + εu(x) is a diffeomorphism of Ω.

Proof. Indeed, ϕ is obviously continuously differentiable, and takes values in Ω as soon as ε‖u‖_∞ < δ. Since Du is continuous and tends to 0 at infinity, it is bounded, and there exists a constant C such that |u(x) − u(x′)| ≤ C|x − x′|. If ϕ(x) = ϕ(x′) we have
\[ |x - x'| = \varepsilon\,|u(x) - u(x')| \le C\varepsilon\,|x - x'|, \]
which implies x = x′ as soon as ε < 1/C, so that ϕ is one to one in this case.

Showing that ϕ is onto is a little bit harder. The first remark to be made is that if B(y, δ), the ball centered at y with radius δ, is not included in Ω, then, by assumption, ϕ(y) = y, so that y ∈ ϕ(Ω). Thus assume that B(y, δ) ⊂ Ω, and consider the function ψ_y, defined on B(0, δ) by ψ_y(η) = −εu(y + η). If ε‖u‖_∞ < δ, we have ψ_y(η) ∈ B(0, δ), and we have the inequality
\[ |\psi_y(\eta) - \psi_y(\eta')| \le \varepsilon C\,|\eta - \eta'|. \]

If εC < 1, ψy is a contraction, and the fixed point theorem (stated below) implies that there exists η ∈ B(0, δ) such that ψy(η) = η. But in this case,

\[ \varphi(y+\eta) = y + \eta + \varepsilon u(y+\eta) = y + \eta - \psi_y(\eta) = y, \]
so that y ∈ ϕ(Ω) and ϕ is onto. It remains to prove that ϕ⁻¹ is continuous. Note that we can extend u to ℝ^d by u = 0 on Ω^c, which extends ϕ (and ϕ⁻¹) to ℝ^d also. It therefore suffices to prove the continuity when Ω = ℝ^d. So take y, y′ ∈ ℝ^d and let η_y and η_{y′} be the fixed points of ψ_y and ψ_{y′} in B(0, δ). We have
\[ |\varphi^{-1}(y) - \varphi^{-1}(y')| = |\eta_y + y - \eta_{y'} - y'| \le |\eta_y - \eta_{y'}| + |y - y'|, \]
so that it suffices to prove that η_y is close to η_{y′} when y is close to y′. We have

\[ |\eta_y - \eta_{y'}| = |\psi_y(\eta_y) - \psi_{y'}(\eta_{y'})| = \varepsilon\,|u(y+\eta_y) - u(y'+\eta_{y'})| \le C\varepsilon\big(|y-y'| + |\eta_y - \eta_{y'}|\big), \]
so that, if Cε < 1,

\[ |\eta_y - \eta_{y'}| \le \frac{C\varepsilon}{1 - C\varepsilon}\,|y - y'|, \]
which proves the continuity of ϕ⁻¹. □
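The proof is constructive: the inverse of x ↦ x + εu(x) can be computed by iterating the contraction ψ_y. A minimal Python illustration follows; the bump field u and the value of ε are arbitrary choices made so that the contraction condition holds.

# Inverting the small deformation phi(x) = x + eps u(x) by fixed-point
# iteration, following the contraction psi_y(eta) = -eps u(y + eta) above.
import numpy as np

eps = 0.2

def u(x):
    # Smooth bump vector field on R^2 vanishing at infinity (illustrative).
    return np.array([1.0, -0.5]) * np.exp(-np.sum(x ** 2))

def phi(x):
    return x + eps * u(x)

def phi_inv(y, n_iter=50):
    # Fixed point of psi_y: eta = -eps u(y + eta); then phi^-1(y) = y + eta.
    eta = np.zeros_like(y)
    for _ in range(n_iter):
        eta = -eps * u(y + eta)
    return y + eta

x = np.array([0.3, -0.4])
print(phi_inv(phi(x)), "vs", x)   # recovers x up to numerical error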

We now recall the standard fixed point theorem, which has been used in the previous proof and will be used again later:

Theorem 17 (Banach fixed point theorem). Let B be a Banach space, U ⊂ B and Φ be a contraction of U, i.e., a map Φ: U → U such that there exists a constant c ∈ [0, 1[ such that, for all x, y ∈ U,

\[ \|\Phi(x) - \Phi(y)\|_B \le c\,\|x - y\|_B. \]

Then, Φ has a unique fixed point in U, i.e., there exists a unique x0 ∈ U such that Φ(x0) = x0.

We therefore know how to build small deformations. Of course, we cannot be satisfied with this, since they correspond to a very limited class of diffeomorphisms. However, we can use them to generate large deformations, because diffeomorphisms can be combined using composition: if ϕ and ϕ′ are diffeomorphisms, then ϕ ∘ ϕ′ is a diffeomorphism. In fact, the diffeomorphisms of Ω form a group for the composition of functions, often denoted Diff(Ω). Thus, let ε₀ > 0 and u₁, ..., u_n, ... be vector fields on Ω such that, for ε < ε₀, id + εu_i is a diffeomorphism of Ω. Consider ϕ_n = (id + εu_n) ∘ ··· ∘ (id + εu₁). We have

\[ \varphi_{n+1} = (\mathrm{id} + \varepsilon u_{n+1})\circ\varphi_n = \varphi_n + \varepsilon\, u_{n+1}\circ\varphi_n, \]
which can also be written (ϕ_{n+1} − ϕ_n)/ε = u_{n+1} ∘ ϕ_n. Fixing x ∈ Ω and letting x₀ = x, x_n = ϕ_n(x), we have the relation (x_{n+1} − x_n)/ε = u_{n+1}(x_n). This looks like a discretized version of a differential equation of the form (introducing a continuous time variable t)
\[ \frac{dx_t}{dt} = u_t(x_t). \]
This motivates the rest of this chapter, which will be devoted to ordinary differential equations and how they generate flows of diffeomorphisms.
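This discrete construction is easy to implement: composing the small deformations id + εu_n is exactly an Euler discretization of the flow of dx/dt = u_t(x). A Python sketch, with an illustrative time-dependent field chosen here:

# Composing small deformations = Euler discretization of the flow.
import numpy as np

def u(t, x):
    # Illustrative time-dependent vector field on R^2.
    c, s = np.cos(t), np.sin(t)
    return np.array([-s * x[1], c * x[0]])

def flow(x, T=1.0, n_steps=100):
    # x_{n+1} = x_n + eps u_{t_n}(x_n) with eps = T / n_steps.
    eps = T / n_steps
    for n in range(n_steps):
        x = x + eps * u(n * eps, x)
    return x

print(flow(np.array([1.0, 0.0])))   # endpoint of the trajectory from (1, 0)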

2. A class of ordinary differential equations

2.1. Definitions. We let Ω ⊂ ℝ^d be open. We have denoted by C¹₀(Ω, ℝ^d) the Banach space of continuously differentiable vector fields v on Ω such that v and Dv vanish on ∂Ω and at infinity. Elements v ∈ C¹₀(Ω, ℝ^d) can be considered as defined on ℝ^d by setting v(x) = 0 if x ∉ Ω.

We define the set 𝒳¹(T, Ω) of integrable functions from [0, T] to C¹₀(Ω, ℝ^d). An element of 𝒳¹(T, Ω) is a time-dependent vector field (v_t, t ∈ [0, T]) such that, for each t, v_t ∈ C¹₀(Ω, ℝ^d) and
\[ \|v\|_{\mathcal X^1,T} := \int_0^T \|v_t\|_{1,\infty}\,dt < \infty. \]
For v ∈ 𝒳¹(T, Ω), we consider the ordinary differential equation, which will be formally denoted dy/dt = v_t(y). A function t ↦ y_t is called a solution of the equation on [0, τ] (τ ≤ T) with initial condition x if

• t ↦ y_t is continuous on [0, τ], with y₀ = x;
• for all t ≤ τ,
\[ y_t = x + \int_0^t v_s(y_s)\,ds. \]

2.2. Main results.

2.2.1. Existence and uniqueness.

Theorem 18. Let v ∈ 𝒳¹(T, Ω). For all x ∈ Ω and t ∈ [0, T], there exists a unique solution on [0, T] of the ordinary differential equation dy/dt = v_t(y) with initial condition y_t = x. This solution is such that y_s ∈ Ω for all s ∈ [0, T].

Proof. The standard proof usually assumes uniform boundedness of ‖v_t‖_{1,∞} (or that v_t is Lipschitz in x, uniformly in t). We here have an integral condition instead, but the proof is very similar, and we provide it for completeness [25, 10].

We first prove the result for Ω = ℝ^d. Fix x ∈ ℝ^d, t ∈ [0, T] and δ > 0. Let I = I(t, δ) denote the interval [0, T] ∩ [t − δ, t + δ]. If ϕ is a continuous function from I to ℝ^d such that ϕ(t) = x, we define the function Γ(ϕ) : I → ℝ^d by
\[ \Gamma(\varphi)(s) = x + \int_t^s v_u(\varphi(u))\,du. \]
The function s ↦ Γ(ϕ)(s) is continuous and such that Γ(ϕ)(t) = x. The set of continuous functions from the compact interval I to ℝ^d, equipped with the supremum norm, is a Banach space, and we show that, for δ small enough, Γ satisfies
\[ \|\Gamma(\varphi) - \Gamma(\varphi')\|_\infty \le \gamma\,\|\varphi - \varphi'\|_\infty \]
with γ < 1. The fixed point theorem will then imply that there is a unique function ϕ such that Γ(ϕ) = ϕ, which is the definition of a solution of the ordinary differential equation on I. Since Γ(ϕ)(s) − Γ(ϕ′)(s) = ∫_t^s (v_u(ϕ(u)) − v_u(ϕ′(u)))du, we have
\[ \|\Gamma(\varphi) - \Gamma(\varphi')\|_\infty \le \|\varphi - \varphi'\|_\infty \int_I \|v_u\|_{1,\infty}\,du, \]
but ∫_I ‖v_u‖_{1,∞} du can be made arbitrarily small by reducing δ, so that existence and uniqueness over I is proved. Now, we can make the additional remark that δ can be taken independent of t. This is because the function α : s ↦ ∫₀^s ‖v_u‖_{1,∞} du is continuous, hence uniformly continuous on the interval [0, T], so that there exists a constant η > 0 such that |s − s′| < η implies |α(s) − α(s′)| < 1/2, and it suffices to take δ < η/2. From this remark, we can conclude that a unique solution of the ordinary differential equation exists over all of [0, T], because it is now possible, starting from the interval I(t, δ), to extend the solution from both sides, by steps of δ/2 at least, until the boundaries are reached.

This proves the result for Ω = ℝ^d, and we now consider arbitrary open sets Ω. By extending v_t by 0 on Ω^c, the value of ‖v_t‖_{1,∞} remains unchanged, and we can apply the result over ℝ^d to ensure existence and uniqueness of the solution with a given initial condition. So it only remains to show that solutions such that y_t ∈ Ω for some t belong to Ω at all times. This is true because, if there exists s such that y_s = x′ ∉ Ω, then the function ỹ_u = x′ for all u is a solution of the equation, since v_u(x′) = 0 for all u. Uniqueness implies ỹ = y at all times, which is impossible. □

Definition 6. Let v ∈ 𝒳¹(T, Ω). We denote by ϕ^v_{st}(x) the solution at time t of the equation dy/dt = v_t(y) with initial condition y_s = x.

The function (t, x) ↦ ϕ^v_{st}(x) is called the flow associated to v starting at s. It is defined on [0, T] × Ω.

From the definition, we have the following property.

Proposition 12. If v ∈ 𝒳¹(T, Ω) and s, r, t ∈ [0, T], then
\[ \varphi^v_{st} = \varphi^v_{rt}\circ\varphi^v_{sr}. \]
In particular, ϕ^v_{st} ∘ ϕ^v_{ts} = id, and ϕ^v_{st} is invertible for all s and t.

Proof. If x ∈ Ω, ϕ^v_{st}(x) is the value at time t of the unique solution of dy/dt = v_t(y) which is equal to x at time s. It is equal to x′ = ϕ^v_{sr}(x) at time r, and thus also equal to ϕ^v_{rt}(x′), which is the statement of the proposition. □

2.2.2. Properties of the flow. The next theorem shows that flows really are interesting objects when diffeomorphisms are needed.

Theorem 19. Let v ∈ 𝒳¹(T, Ω). The associated flow, ϕ^v_{st}, is at all times a diffeomorphism of Ω.

The proof of this result relies on Gronwall's lemma, which we first state and prove.

Theorem 20 (Gronwall's lemma). Consider two positive functions α_s, u_s, defined for s ∈ I, where I is an interval in ℝ containing 0. We assume that u is bounded and that, for some constant c and all t ∈ I,
\[ (17)\quad u_t \le c + \Big|\int_0^t \alpha_s u_s\,ds\Big|. \]
Then, for all t ∈ I,
\[ u_t \le c\,e^{|\int_0^t \alpha_s\,ds|}. \]
Proof. To treat simultaneously the cases t > 0 and t < 0, we let ε = 1 in the first case and ε = −1 in the second case. Inequality (17) now becomes
\[ u_t \le c + \varepsilon\int_0^t \alpha_s u_s\,ds. \]
Iterating this inequality once yields

\[ u_t \le c + c\varepsilon\int_0^t \alpha_s\,ds + \varepsilon^2\int_0^t\!\!\int_0^{s_1} \alpha_{s_1}\alpha_{s_2}u_{s_2}\,ds_2\,ds_1, \]
and it may be checked by induction that, repeating this process,
\[ u_t \le c + c\varepsilon\int_0^t \alpha_s\,ds + \dots + c\,\varepsilon^n\!\!\int_{0\le s_1\le\dots\le s_n\le t}\!\! \alpha_{s_1}\cdots\alpha_{s_n}\,ds_1\cdots ds_n + \varepsilon^{n+1}\!\!\int_{0\le s_1\le\dots\le s_{n+1}\le t}\!\! \alpha_{s_1}\cdots\alpha_{s_{n+1}}u_{s_{n+1}}\,ds_1\cdots ds_{n+1}. \]
Consider the integral

\[ I_n = \int_{0\le s_1\le\dots\le s_n\le t} \alpha_{s_1}\cdots\alpha_{s_n}\,ds_1\cdots ds_n. \]

Let σ be a permutation of {1, ..., n}: making the change of variables s_i → s_{σ_i} in I_n yields

\[ I_n = \int_{0\le s_{\sigma_1}\le\dots\le s_{\sigma_n}\le t} \alpha_{s_1}\cdots\alpha_{s_n}\,ds_1\cdots ds_n. \]

Obviously,
\[ \sum_\sigma \mathbf 1_{\{0\le s_{\sigma_1}\le\dots\le s_{\sigma_n}\le t\}} = 1 \]
whenever s_i ≠ s_j for i ≠ j. The n-tuples (s₁, ..., s_n) for which s_i = s_j for some i ≠ j form a set of dimension n − 1, which has no influence on the integral. Thus, summing over σ yields

\[ n!\,I_n = \int_{[0,t]^n} \alpha_{s_1}\cdots\alpha_{s_n}\,ds_1\cdots ds_n = \Big(\int_0^t \alpha_s\,ds\Big)^{\!n}. \]

Therefore, using the fact that u is bounded, we have
\[ u_t \le c\sum_{k=0}^n \frac{\varepsilon^k}{k!}\Big(\int_0^t \alpha_s\,ds\Big)^{\!k} + \frac{\varepsilon^{n+1}\sup(u)}{(n+1)!}\Big(\int_0^t \alpha_s\,ds\Big)^{\!n+1}, \]
and passing to the limit yields the result. □

We now pass to the proof of Theorem 19, in which Gronwall's lemma will be used several times.

Proof of Theorem 19. We first show that ϕ^v_{st} is a homeomorphism. Take x, y ∈ Ω. We have
\[ |\varphi^v_{st}(x) - \varphi^v_{st}(y)| = \Big| x - y + \int_s^t \big(v_r(\varphi^v_{sr}(x)) - v_r(\varphi^v_{sr}(y))\big)\,dr \Big| \le |x-y| + \Big|\int_s^t \|v_r\|_{1,\infty}\,|\varphi^v_{sr}(x) - \varphi^v_{sr}(y)|\,dr\Big|. \]
We apply Gronwall's lemma with c = |x − y|, α_r = ‖v_r‖_{1,∞} and u_r = |ϕ^v_{sr}(x) − ϕ^v_{sr}(y)|, which is bounded since ϕ^v_{sr} is continuous in r. This yields
\[ (18)\quad |\varphi^v_{st}(x) - \varphi^v_{st}(y)| \le |x-y|\,\exp\Big(\Big|\int_s^t \|v_r\|_{1,\infty}\,dr\Big|\Big), \]
which shows that ϕ^v_{st} is continuous on Ω, and even Lipschitz, with a Lipschitz constant smaller than exp(‖v‖_{𝒳¹,T}). Since (ϕ^v_{st})⁻¹ = ϕ^v_{ts}, the inverse is also continuous, so that ϕ^v_{st} is a homeomorphism of Ω.

We now prove that we indeed have a diffeomorphism. To see what we aim at, we assume first that this is true and formally differentiate, at ε = 0, the equation
\[ \frac{\partial \varphi^v_{st}}{\partial t}(x+\varepsilon\delta) = v_t\big(\varphi^v_{st}(x+\varepsilon\delta)\big) \]
to obtain
\[ \frac{\partial}{\partial t}\,D\varphi^v_{st}(x)\delta = Dv_t\big(\varphi^v_{st}(x)\big)\,D\varphi^v_{st}(x)\delta. \]
This suggests introducing the linear differential equation
\[ \frac{\partial W_t}{\partial t} = Dv_t\big(\varphi^v_{st}(x)\big)\,W_t \]
with initial condition W_s = δ. The same argument we have used for the original equation shows that this equation also admits a unique solution on [0, T]. The previous computation indicates that this solution should provide the differential Dϕ^v_{st}(x), and we now prove this result. Define
\[ a_\varepsilon(t) = \big(\varphi^v_{st}(x+\varepsilon\delta) - \varphi^v_{st}(x)\big)/\varepsilon - W_t. \]

We need to show that aε(t) → 0 when ε → 0. For α > 0, define

\[ \mu_t(\alpha) = \max\big\{ |Dv_t(x) - Dv_t(y)| : x, y\in\Omega,\ |x-y|\le\alpha \big\}. \]

The function x ↦ Dv_t(x) is continuous and tends to 0 at infinity (or on ∂Ω), and is therefore uniformly continuous, which is equivalent to the fact that μ_t(α) → 0 when α → 0. We can write
\[ \begin{aligned} a_\varepsilon(t) &= \frac1\varepsilon\int_s^t \big(v_r(\varphi^v_{sr}(x+\varepsilon\delta)) - v_r(\varphi^v_{sr}(x))\big)\,dr - \int_s^t Dv_r(\varphi^v_{sr}(x))\,W_r\,dr\\ &= \int_s^t Dv_r(\varphi^v_{sr}(x))\,a_\varepsilon(r)\,dr + \frac1\varepsilon\int_s^t \Big(v_r(\varphi^v_{sr}(x+\varepsilon\delta)) - v_r(\varphi^v_{sr}(x)) - Dv_r(\varphi^v_{sr}(x))\big(\varphi^v_{sr}(x+\varepsilon\delta) - \varphi^v_{sr}(x)\big)\Big)\,dr. \end{aligned} \]
We have, for all x, y ∈ Ω:

\[ |v_t(y) - v_t(x) - Dv_t(x)(y-x)| \le \mu_t(|x-y|)\,|x-y|. \]
This inequality, combined with equation (18), yields
\[ |a_\varepsilon(t)| \le \Big|\int_s^t \|v_r\|_{1,\infty}\,|a_\varepsilon(r)|\,dr\Big| + C(v)\,|\delta|\int_0^T \mu_r\big(\varepsilon C(v)|\delta|\big)\,dr \]
for a constant C(v) which only depends on v. To conclude the proof using Gronwall's lemma, we need the fact that
\[ \lim_{\alpha\to0}\int_0^T \mu_r(\alpha)\,dr = 0. \]

This is a consequence of the fact that μ_r(α) → 0 for each r when α → 0, and of the upper bound μ_r(α) ≤ 2‖v_r‖_{1,∞}, which allows us to apply the Lebesgue dominated convergence theorem. □

We have incidentally shown the following important fact:

Proposition 13. Let v ∈ 𝒳¹(T, Ω). Then, for fixed x ∈ Ω, Dϕ^v_{st}(x) is the solution of the linear differential equation
\[ (19)\quad \frac{dW_t}{dt} = Dv_t\big(\varphi^v_{st}(x)\big)\,W_t \]
with initial condition W_s = Id_k.

In fact, this result can be extended to p derivatives (we state this without proof, see [24]).

Theorem 21. If p ≥ 1 and ∫₀^T ‖v_t‖_{p,∞} dt < ∞, then ϕ^v_{st} is p times differentiable and, for all q ≤ p,
\[ (20)\quad \frac{\partial}{\partial t}\,D^q\varphi^v_{st} = D^q\big(v_t\circ\varphi^v_{st}\big). \]
Moreover, there exist constants C, C′ (independent of v) such that
\[ (21)\quad \sup_{s\in[0,1]} \|\varphi^v_{st}\|_{p,\infty} \le C\, e^{C'\int_0^1 \|v_t\|_{p,\infty}\,dt}. \]

Expanding the q-th derivative of v_t ∘ ϕ^v_{0t} in the right-hand term of equation (20) (the general formula is rather cumbersome, but one can do it at least for small values of q) provides an expression with leading term (Dv_t ∘ ϕ^v_{0t}) D^qϕ^v_{0t}, followed by lower-order terms in ϕ. This implies that D^qϕ^v_{0t}(x) is the solution of a non-homogeneous linear differential equation with vanishing initial conditions.

2.2.3. Variations with respect to the vector field. It will be quite important, in the following, to be able to characterize the effect that a variation of the time-dependent vector field v has on the induced flow. For this purpose, we fix v ∈ 𝒳¹(T, Ω) and h ∈ 𝒳¹(T, Ω), and proceed to the computation of (d/dε)ϕ^{v+εh}_{st} at ε = 0. The argument is similar to that of the previous section: first make a guess by formal differentiation of the original ODE, then proceed to a rigorous argument showing that the guessed result is correct. So, consider the equation
\[ \frac{\partial\varphi^{v+\varepsilon h}_{st}}{\partial t} = v_t\circ\varphi^{v+\varepsilon h}_{st} + \varepsilon\, h_t\circ\varphi^{v+\varepsilon h}_{st} \]
and formally compute its derivative with respect to ε. This yields
\[ \frac{\partial}{\partial t}\frac{\partial}{\partial\varepsilon}\varphi^{v+\varepsilon h}_{st} = h_t\circ\varphi^{v+\varepsilon h}_{st} + D(v_t+\varepsilon h_t)\circ\varphi^{v+\varepsilon h}_{st}\ \frac{d}{d\varepsilon}\varphi^{v+\varepsilon h}_{st} \]
and, letting ε = 0,

\[ \frac{\partial}{\partial t}\frac{\partial}{\partial\varepsilon}\varphi^{v+\varepsilon h}_{st}\Big|_{\varepsilon=0} = h_t\circ\varphi^v_{st} + Dv_t\circ\varphi^v_{st}\ \frac{d}{d\varepsilon}\varphi^{v+\varepsilon h}_{st}\Big|_{\varepsilon=0}, \]
which naturally leads to introducing the solution of the differential equation
\[ (22)\quad \frac{\partial}{\partial t}W_t = h_t\circ\varphi^v_{st} + Dv_t\circ\varphi^v_{st}\,W_t \]
with initial condition W_s = 0. We now set
\[ a_\varepsilon(t) = \big(\varphi^{v+\varepsilon h}_{st}(x) - \varphi^v_{st}(x)\big)/\varepsilon - W_t \]
and express it in the form
\[ \begin{aligned} a_\varepsilon(t) ={}& \int_s^t Dv_u\circ\varphi^v_{su}\ a_\varepsilon(u)\,du + \int_s^t \big(h_u(\varphi^{v+\varepsilon h}_{su}(x)) - h_u(\varphi^v_{su}(x))\big)\,du\\ &+ \frac1\varepsilon\int_s^t\Big(v_u(\varphi^{v+\varepsilon h}_{su}(x)) - v_u(\varphi^v_{su}(x)) - Dv_u(\varphi^v_{su}(x))\big(\varphi^{v+\varepsilon h}_{su}(x)-\varphi^v_{su}(x)\big)\Big)\,du. \end{aligned} \]
The proof can proceed exactly as in Theorem 19, provided it has been shown that ϕ^{v+εh}_{su}(x) − ϕ^v_{su}(x) = O(ε), which is again a direct consequence of Gronwall's lemma and of the inequality
\[ \frac1\varepsilon\big|\varphi^{v+\varepsilon h}_{st}(x)-\varphi^v_{st}(x)\big| \le \int_s^t \|v_u\|_{1,\infty}\ \frac1\varepsilon\big|\varphi^{v+\varepsilon h}_{su}(x)-\varphi^v_{su}(x)\big|\,du + \int_s^t \|h_u\|_\infty\,du. \]
Equation (22) is the same as equation (19), with the additional term h_t ∘ ϕ^v_{st}. This implies that the solution of (22) may be expressed in terms of the solution of (19) by variation of the constant, i.e., it may be written in the form W_t = Dϕ^v_{st}(x)A_t with A_s = 0, and A_t may be identified by writing
\[ h_t\circ\varphi^v_{st} + Dv_t\circ\varphi^v_{st}\,W_t = \frac{dW_t}{dt} = D\varphi^v_{st}(x)\,\frac{dA_t}{dt} + Dv_t\circ\varphi^v_{st}\,W_t, \]
so that
\[ \frac{dA_t}{dt} = \big(D\varphi^v_{st}(x)\big)^{-1} h_t\circ\varphi^v_{st}(x) = D\varphi^v_{ts}\big(\varphi^v_{st}(x)\big)\,h_t\circ\varphi^v_{st}(x). \]
This implies that
\[ A_t = \int_s^t D\varphi^v_{us}\big(\varphi^v_{su}(x)\big)\,h_u\circ\varphi^v_{su}(x)\,du \]
and
\[ W_t = D\varphi^v_{st}(x)\int_s^t D\varphi^v_{us}\big(\varphi^v_{su}(x)\big)\,h_u\circ\varphi^v_{su}(x)\,du, \]
which, by the chain rule, can be written
\[ W_t = \int_s^t D\varphi^v_{ut}\big(\varphi^v_{su}(x)\big)\,h_u\circ\varphi^v_{su}(x)\,du. \]
We summarize this discussion in the following theorem.

Theorem 22. Let v, h ∈ 𝒳¹(T, Ω). Then, for x ∈ Ω,
\[ (23)\quad \frac{d}{d\varepsilon}\varphi^{v+\varepsilon h}_{st}(x)\Big|_{\varepsilon=0} = \int_s^t D\varphi^v_{ut}\big(\varphi^v_{su}(x)\big)\,h_u\circ\varphi^v_{su}(x)\,du. \]
A trivial consequence of this theorem is the fact that ϕ^v_{st} depends continuously on v. But one can be more specific: we have the inequalities
\[ \begin{aligned} \big|\varphi^{v'}_{st}(x) - \varphi^v_{st}(x)\big| &\le \int_s^t \big|v'_u(\varphi^{v'}_{su}(x)) - v_u(\varphi^v_{su}(x))\big|\,du\\ &\le \int_s^t \big|v'_u(\varphi^v_{su}(x)) - v_u(\varphi^v_{su}(x))\big|\,du + \int_s^t \big|v'_u(\varphi^{v'}_{su}(x)) - v'_u(\varphi^v_{su}(x))\big|\,du\\ &\le \int_s^t \big|(v'_u - v_u)(\varphi^v_{su}(x))\big|\,du + \int_s^t \|v'_u\|_{1,\infty}\,\big|\varphi^{v'}_{su}(x)-\varphi^v_{su}(x)\big|\,du. \end{aligned} \]
We shall apply to this inequality a more general version of Theorem 20, whose proof is similar and is omitted here.

Theorem 23. Consider three positive functions c_s, α_s, u_s defined on [0, T], such that u is bounded and
\[ (24)\quad u_t \le c_t + \int_0^t \alpha_s u_s\,ds. \]
Then
\[ u_t \le c_t + \int_0^t c_s\alpha_s\, e^{\int_s^t \alpha_r\,dr}\,ds. \]

This theorem can be applied in our case by letting u_τ = |ϕ^{v′}_{s,s+τ}(x) − ϕ^v_{s,s+τ}(x)|, α_τ = ‖v′_{s+τ}‖_{1,∞} and
\[ c_\tau = \int_s^{s+\tau} \big|(v'_u - v_u)(\varphi^v_{su}(x))\big|\,du, \]
yielding the inequality
\[ (25)\quad \big|\varphi^{v'}_{st}(x) - \varphi^v_{st}(x)\big| \le c_{t-s} + \int_s^t c_{u-s}\,\|v'_u\|_{1,\infty}\,\exp\Big(\int_u^t \|v'_{u'}\|_{1,\infty}\,du'\Big)du. \]

Consider the function v′ ↦ ∫_s^t v′_u ∘ ϕ^v_{su}(x) du. It is a linear form on 𝒳¹(T, Ω) which is obviously continuous, since
\[ \Big|\int_s^t v'_u\circ\varphi^v_{su}(x)\,du\Big| \le \int_s^t \|v'_u\|_\infty\,du \le \|v'\|_{\mathcal X^1,T}. \]
Thus, consider a sequence vⁿ ∈ 𝒳¹(T, Ω) which is bounded in 𝒳¹(T, Ω) and weakly converges to v. If we let cⁿ_τ be equal to c_τ with vⁿ in place of v′, weak continuity implies that cⁿ_τ → 0 when n tends to infinity. We have cⁿ_τ ≤ ‖vⁿ‖_{𝒳¹,T} + ‖v‖_{𝒳¹,T}, so that it is uniformly bounded, and the dominated convergence theorem can be applied to equation (25) to obtain that, for all s, t ∈ [0, T] and all x ∈ Ω,
\[ \varphi^{v^n}_{st}(x) \to \varphi^v_{st}(x). \]
When Ω is bounded, the convergence is in fact uniform in x. Indeed, we have shown that the Lipschitz constant of ϕ^{vⁿ}_{st} can be expressed in terms of ‖vⁿ‖_{𝒳¹,T}, so that there exists a constant C such that, for all n > 0 and x, y ∈ Ω,

\[ \big|\varphi^{v^n}_{st}(x) - \varphi^{v^n}_{st}(y)\big| \le C\,|x-y|. \]

This implies that the family (ϕ^{vⁿ}_{st}) is equicontinuous, and a similar argument shows that it is bounded, so that Ascoli's theorem shows that (ϕ^{vⁿ}_{st}, n ≥ 0) is relatively compact for uniform convergence. But the limit of any uniformly converging subsequence must be ϕ^v_{st}, since it is already the pointwise limit of the whole sequence. This implies that the uniform limit exists and is equal to ϕ^v_{st}. If Ω is not bounded, the same argument implies that the limit is uniform on compact sets. Thus, we have just proved the following theorem.

Theorem 24. If v ∈ 𝒳¹(T, Ω) and (vⁿ) is a bounded sequence in 𝒳¹(T, Ω) which weakly converges to v, then, for all s, t ∈ [0, T] and every compact subset Q ⊂ Ω,

\[ \lim_{n\to\infty}\ \max_{x\in Q}\ \big|\varphi^{v^n}_{st}(x) - \varphi^v_{st}(x)\big| = 0. \]

3. Groups of diffeomorphisms

3.1. Admissible Banach spaces.

Definition 7. A Banach space V ⊂ C¹₀(Ω, ℝ^d) is admissible if it is (canonically) embedded in C¹₀(Ω, ℝ^d), i.e., if there exists a constant C such that, for all v ∈ V,

\[ (26)\quad \|v\|_V \ge C\,\|v\|_{1,\infty}. \]
If V is admissible, we denote by 𝒳¹_V(Ω) the set of time-dependent vector fields (v_t, t ∈ [0, 1]) such that, for each t, v_t ∈ V and
\[ \|v\|_{\mathcal X^1_V} := \int_0^1 \|v_t\|_V\,dt < \infty. \]
If the interval [0, 1] is replaced by [0, T], we will use the notation 𝒳¹_V(T, Ω) and ‖v‖_{𝒳¹_V,T}.

3.2. Induced group of diffeomorphisms.

Definition 8. If V ⊂ C¹₀(Ω, ℝ^d) is admissible, we denote by
\[ G_V = \big\{ \varphi^v_{01} : v\in\mathcal X^1_V(\Omega) \big\} \]
the set of diffeomorphisms provided by the flow associated to elements v ∈ 𝒳¹_V(Ω) at time 1.

Theorem 25. G_V is a group for the composition of functions.

Proof. The identity function belongs to G_V: it corresponds, for example, to ϕ^v_{01} with v = 0. If ψ = ϕ^v_{01} and ψ′ = ϕ^{v′}_{01}, with v, v′ ∈ 𝒳¹_V, then ψ ∘ ψ′ = ϕ^w_{01} with w_t = 2v′_{2t} for t ∈ [0, 1/2] and w_t = 2v_{2t−1} for t ∈ ]1/2, 1] (details are left to the reader), and w belongs to 𝒳¹_V(Ω). Similarly, if ψ = ϕ^v_{01}, then ψ⁻¹ = ϕ^w_{01} with w_t = −v_{1−t}. Indeed, we have
\[ \varphi^w_{0,1-t}(y) = y - \int_0^{1-t} v_{1-s}\circ\varphi^w_{0s}(y)\,ds = y + \int_1^t v_u\circ\varphi^w_{0,1-u}(y)\,du, \]
which implies (by the uniqueness theorem) that ϕ^w_{0,1−t}(y) = ϕ^v_{1t}(y), and in particular ϕ^w_{01} = ϕ^v_{10}. This proves that G_V is a group. □

Thus, by selecting a certain Banach space V, we can in turn specify a group of diffeomorphisms. In particular, elements of G_V inherit the smoothness properties of V, by Theorem 21.

3.3. A distance on G_V. Let V be an admissible Banach space. For ψ and ψ′ in G_V, we let
\[ (27)\quad d_V(\psi,\psi') = \inf_{v\in\mathcal X^1_V(\Omega)} \big\{ \|v\|_{\mathcal X^1_V} : \psi' = \psi\circ\varphi^v_{01} \big\}. \]
We have the following theorem:

Theorem 26 (Trouvé). The function d_V is a distance on G_V, and (G_V, d_V) is a complete metric space.

Recall that d_V is a distance if it is symmetric, satisfies the triangle inequality d_V(ψ, ψ′) ≤ d_V(ψ, ψ″) + d_V(ψ″, ψ′), and is such that d_V(ψ, ψ′) = 0 if and only if ψ = ψ′.

Proof. Note that the set over which the infimum is computed is not empty: if ψ, ψ′ ∈ G_V, then ψ⁻¹ ∘ ψ′ ∈ G_V (since G_V is a group) and therefore can be written in the form ϕ^v_{01} for some v ∈ 𝒳¹_V(Ω).

Let us start with the symmetry: fix ε > 0 and v such that ‖v‖_{𝒳¹_V} ≤ d_V(ψ, ψ′) + ε and ψ′ = ψ ∘ ϕ^v_{01}. This implies that ψ = ψ′ ∘ ϕ^v_{10}, but we know (from the proof of Theorem 25) that ϕ^v_{10} = ϕ^w_{01} with w_t = −v_{1−t}. Since ‖w‖_{𝒳¹_V} = ‖v‖_{𝒳¹_V}, we have, from the definition of d_V:
\[ d_V(\psi',\psi) \le \|w\|_{\mathcal X^1_V} \le d_V(\psi,\psi') + \varepsilon, \]
and since this is true for every ε, we have d_V(ψ′, ψ) ≤ d_V(ψ, ψ′). Interchanging the roles of ψ and ψ′ yields d_V(ψ′, ψ) = d_V(ψ, ψ′). For the triangle inequality, let v and v′ be such that ‖v‖_{𝒳¹_V} ≤ d_V(ψ, ψ′) + ε, ‖v′‖_{𝒳¹_V} ≤ d_V(ψ′, ψ″) + ε, ψ′ = ψ ∘ ϕ^v_{01} and ψ″ = ψ′ ∘ ϕ^{v′}_{01}. We thus have ψ″ = ψ ∘ ϕ^v_{01} ∘ ϕ^{v′}_{01}, and we know, still from the proof of Theorem 25, that ϕ^v_{01} ∘ ϕ^{v′}_{01} = ϕ^w_{01} with w_t = 2v′_{2t} for t ∈ [0, 1/2] and w_t = 2v_{2t−1} for t ∈ ]1/2, 1]. But in this case ‖w‖_{𝒳¹_V} = ‖v‖_{𝒳¹_V} + ‖v′‖_{𝒳¹_V}, so that
\[ d_V(\psi,\psi'') \le \|w\|_{\mathcal X^1_V} \le d_V(\psi,\psi') + d_V(\psi',\psi'') + 2\varepsilon, \]
which implies the triangle inequality, since this is true for every ε > 0.

We obviously have d_V(ψ, ψ) = 0 since ϕ^0_{01} = id. Assume that d_V(ψ, ψ′) = 0. This implies that there exists a sequence (v_n) such that ‖v_n‖_{𝒳¹_V} → 0 and ψ⁻¹ ∘ ψ′ = ϕ^{v_n}_{01}. The continuity of v ↦ ϕ^v_{st} implies that ϕ^{v_n}_{01} → ϕ^0_{01} = id, so that ψ = ψ′.

Let us now check that we indeed have a complete metric space. Let (ψⁿ) be a Cauchy sequence for d_V, so that, for any ε > 0, there exists n₀ such that, for any n ≥ n₀, d_V(ψⁿ, ψ^{n₀}) ≤ ε. Taking recursively ε = 2⁻ⁿ, it is possible to extract a subsequence (ψ^{n_k}) of (ψⁿ) such that

\[ \sum_{k=0}^\infty d_V\big(\psi^{n_k}, \psi^{n_{k+1}}\big) < \infty. \]
Since a Cauchy sequence converges whenever one of its subsequences does, it is sufficient to show that (ψ^{n_k}) has a limit. From the definition of d_V, there exists, for every k ≥ 0, an element v^k of 𝒳¹_V(Ω) such that ψ^{n_{k+1}} = ϕ^{v^k}_{01} ∘ ψ^{n_k} and

\[ \|v^k\|_{\mathcal X^1_V} \le d_V\big(\psi^{n_k}, \psi^{n_{k+1}}\big) + 2^{-k-1}. \]

Let us define a time-dependent vector field v by v_t = 2v⁰_{2t} for t ∈ [0, 1/2[, v_t = 4v¹_{4t−2} for t ∈ [1/2, 3/4[, and so on: to define the general term, introduce the dyadic sequence of times t₀ = 0 and t_{k+1} = t_k + 2^{−k−1}, and let

\[ v_t = 2^{k+1}\, v^k_{2^{k+1}(t-t_k)} \]
for t ∈ [t_k, t_{k+1}[. Since t_k tends to 1 when k → ∞, this defines v_t on [0, 1[, and we set v₁ = 0. We have

\[ \|v\|_{\mathcal X^1_V} = \sum_{k=0}^\infty \int_{t_k}^{t_{k+1}} 2^{k+1}\big\|v^k_{2^{k+1}(t-t_k)}\big\|_V\,dt = \sum_{k=0}^\infty \int_0^1 \|v^k_t\|_V\,dt \le 1 + \sum_{k=0}^\infty d_V\big(\psi^{n_k}, \psi^{n_{k+1}}\big), \]

so that v ∈ 𝒳¹_V(Ω). Now consider the associated flow ϕ^v_{0t}: it is obtained by first integrating 2v⁰_{2t} on [0, 1/2[, which yields ϕ^v_{0,1/2} = ϕ^{v⁰}_{01}. Iterating this argument, we get

\[ \varphi^v_{0t_{k+1}} = \varphi^{v^k}_{01}\circ\dots\circ\varphi^{v^0}_{01}, \]
so that
\[ \psi^{n_{k+1}} = \varphi^v_{0t_{k+1}}\circ\psi^{n_0}. \]

Let ψ^∞ = ϕ^v_{01} ∘ ψ^{n₀}. We have ψ^∞ = ϕ^v_{t_k,1} ∘ ψ^{n_k}. Since ϕ^v_{t_k,1} = ϕ^w_{01} with w_t = (1 − t_k) v_{t_k + (1−t_k)t}, and
\[ \|w\|_{\mathcal X^1_V} = \int_{t_k}^1 \|v_t\|_V\,dt, \]

we obtain the fact that d_V(ψ^{n_k}, ψ^∞) → 0, which completes the proof of Theorem 26. □

3.4. Properties of the distance. We first introduce the set of time-dependent vector fields which are square integrable in time:

Definition 9. Let V be an admissible Banach space. We define 𝒳²_V(Ω) as the set of time-dependent vector fields v = (v_t, t ∈ [0, 1]) such that, for each t, v_t ∈ V, and
\[ \int_0^1 \|v_t\|_V^2\,dt < \infty. \]
And we state without proof the important result:

Proposition 14. 𝒳²_V(Ω) is a Banach space with norm
\[ \|v\|_{\mathcal X^2_V} = \Big(\int_0^1 \|v_t\|_V^2\,dt\Big)^{1/2}. \]
Moreover, if V is a Hilbert space, then 𝒳²_V(Ω) is a Hilbert space too, with
\[ \langle v, w\rangle_{\mathcal X^2_V} = \int_0^1 \langle v_t, w_t\rangle_V\,dt. \]
Because (∫₀¹ ‖v_t‖_V dt)² ≤ ∫₀¹ ‖v_t‖_V² dt, we have 𝒳²_V(Ω) ⊂ 𝒳¹_V(Ω), and, if v ∈ 𝒳²_V(Ω), ‖v‖_{𝒳¹_V} ≤ ‖v‖_{𝒳²_V}. The computation of d_V can be reduced to a minimization over 𝒳²_V:

Theorem 27. If V is admissible and ψ, ψ′ ∈ G_V, we have

\[ (28)\quad d_V(\psi,\psi') = \inf_{v\in\mathcal X^2_V(\Omega)} \big\{ \|v\|_{\mathcal X^2_V} : \psi' = \psi\circ\varphi^v_{01} \big\}. \]
Proof. Let

\[ \delta_V(\psi,\psi') = \inf_{v\in\mathcal X^2_V(\Omega)} \big\{ \|v\|_{\mathcal X^2_V} : \psi' = \psi\circ\varphi^v_{01} \big\} \]
and d_V be given by (27). Since d_V is the infimum over a larger set than δ_V, and minimizes a quantity which is always smaller, we have d_V(ψ, ψ′) ≤ δ_V(ψ, ψ′), and we now proceed to the reverse inequality. For this, consider v ∈ 𝒳¹_V(Ω) such that ψ′ = ψ ∘ ϕ^v_{01}. It suffices to prove that, for any ε > 0, there exists w ∈ 𝒳²_V(Ω) such that ψ′ = ψ ∘ ϕ^w_{01} and ‖w‖_{𝒳²_V} ≤ ‖v‖_{𝒳¹_V} + ε. The important remark for this purpose is that, if α is a differentiable increasing function from [0, 1] onto [0, 1] (which implies α₀ = 0 and α₁ = 1), then, letting α̇_s = dα_s/ds,

\[ \varphi^v_{0\alpha_t}(x) = x + \int_0^{\alpha_t} v_u\big(\varphi^v_{0u}(x)\big)\,du = x + \int_0^t \dot\alpha_s\, v_{\alpha_s}\big(\varphi^v_{0\alpha_s}(x)\big)\,ds, \]
so that the flow generated by w_s = α̇_s v_{α_s} is ϕ^v_{0α_t}, and therefore coincides with ϕ^v_{01} at t = 1. We have

\[ \|w\|_{\mathcal X^1_V} = \int_0^1 \dot\alpha_s\,\|v_{\alpha_s}\|_V\,ds = \int_0^1 \|v_t\|_V\,dt = \|v\|_{\mathcal X^1_V}, \]
so that this time change does not help for the minimization in (27). However, denoting by β_t the inverse of α_t, we have
\[ \|w\|_{\mathcal X^2_V}^2 = \int_0^1 \dot\alpha_s^2\,\|v_{\alpha_s}\|_V^2\,ds = \int_0^1 \dot\alpha_{\beta_t}\,\|v_t\|_V^2\,dt, \]
so that this transformation can be used to reduce ‖v‖_{𝒳²_V}. If ‖v_t‖_V never vanishes, we can choose α̇_{β_t} = c/‖v_t‖_V, or equivalently β̇_t = ‖v_t‖_V/c, with c chosen so that
\[ \int_0^1 \dot\beta_t\,dt = 1, \]
which yields c = ‖v‖_{𝒳¹_V}. This gives
\[ \|w\|_{\mathcal X^2_V}^2 = c\int_0^1 \|v_t\|_V\,dt = \|v\|_{\mathcal X^1_V}^2, \]
which is exactly the kind of transform we are looking for. In the general case, we let, for some η > 0, α̇_{β_t} = c/(η + ‖v_t‖_V), which yields β̇_t = (η + ‖v_t‖_V)/c and c = η + ‖v‖_{𝒳¹_V}. This gives
\[ \|w\|_{\mathcal X^2_V}^2 = c\int_0^1 \frac{\|v_t\|_V^2}{\eta + \|v_t\|_V}\,dt \le c\,\|v\|_{\mathcal X^1_V} = \big(\|v\|_{\mathcal X^1_V} + \eta\big)\,\|v\|_{\mathcal X^1_V}. \]

By choosing η small enough, we can always arrange that ‖w‖_{𝒳²_V} ≤ ‖v‖_{𝒳¹_V} + ε, which is what we wanted to prove. □

An important consequence of this result is the following fact:

Corollary 2. If the infimum in (28) is attained at some v ∈ 𝒳²_V(Ω), then ‖v_t‖_V is constant.

Indeed, let v achieve the minimum in (28): we have
\[ d_V(\psi,\psi') = \|v\|_{\mathcal X^2_V} \ge \|v\|_{\mathcal X^1_V}, \]
but ‖v‖_{𝒳¹_V} ≥ d_V(ψ, ψ′) by definition. Thus we must have ‖v‖_{𝒳²_V} = ‖v‖_{𝒳¹_V}, which corresponds to the equality case in the Schwartz inequality, which can only be achieved by (almost everywhere) constant functions.

Corollary 2 is usefully completed by the following theorem:

Theorem 28. If V is Hilbert and admissible, and ψ, ψ′ ∈ G_V, there exists v ∈ 𝒳²_V(Ω) such that
\[ d_V(\psi,\psi') = \|v\|_{\mathcal X^2_V} \]
and ψ′ = ϕ^v_{01} ∘ ψ.

Proof. By Proposition 14, 𝒳²_V(Ω) is a Hilbert space. Let us fix a minimizing sequence for d_V(ψ, ψ′), i.e., a sequence vⁿ ∈ 𝒳²_V(Ω) such that ‖vⁿ‖_{𝒳²_V} → d_V(ψ, ψ′) and ψ′ = ϕ^{vⁿ}_{01} ∘ ψ. This implies that (‖vⁿ‖_{𝒳²_V}) is bounded, and, by Theorem 7, one can extract a subsequence of (vⁿ) (that we still denote by vⁿ) which weakly converges to some v ∈ 𝒳²_V(Ω), such that
\[ \|v\|_{\mathcal X^2_V} \le \liminf \|v^n\|_{\mathcal X^2_V} = d_V(\psi,\psi'). \]
But Theorem 24 implies that ϕ^{vⁿ} converges to ϕ^v, so that ψ′ = ϕ^v_{01} ∘ ψ remains true: this proves Theorem 28. □

CHAPTER 3

Matching deformable objects

1. General principles

The methods we shall investigate in this chapter use variational formulations to find optimal matchings between two different structures. They all obey a very simple general form. Let Ω be an open subset of ℝ^d and G a group of diffeomorphisms on Ω. Consider a set ℐ of structures of interest, on which G acts: for every I in ℐ and every ϕ ∈ G, the result of the action of ϕ on I is denoted ϕ.I and is a new element of ℐ. Natural conditions require that id.I = I and ϕ.(ψ.I) = (ϕ ∘ ψ).I (we will discuss group actions more rigorously in the next chapter). Then, given two elements I and I′ in ℐ, the optimal matching ϕ between I and I′ is computed by minimizing a functional E_{I,I′}(ϕ) which measures how adequate ϕ is as a matching function between I and I′. Most of the time, E will be of the form
\[ (29)\quad E_{I,I'}(\varphi) = \delta(\varphi) + D(\varphi.I, I'), \]
where δ is a function which penalizes unwanted behavior, typically ensuring that ϕ is smooth and not too far from the identity, and D measures the difference between ϕ.I and I′.

Before reviewing examples and algorithms in this framework, we may list some properties that we would like to see satisfied. An obvious one is that, if I = I′, then the preferred ϕ should be the identity. If E is given by (29), this will be true when δ is minimal at the identity and D(I, I′) is minimal when I = I′. Another important property is consistency: can one find a minimizer for E_{I,I′} in G (or maybe in some larger space than G on which E_{I,I′} could still be defined)? Obviously, the problem would otherwise be meaningless. Beside consistency is feasibility: can one find an algorithm which is able to solve the problem? Is the algorithm stable enough? Symmetry may also be a requirement: if ϕ is found to provide an optimal correspondence from I to I′, it is natural to ask that ϕ⁻¹ have the same property from I′ to I.

2. Matching terms and greedy descent procedures

2.1. Eulerian gradients. We consider energies of the form (29). For such energies, the matching term largely depends on the problem under consideration. Some examples are reviewed in this section. There are two important issues which will be addressed in each case. The first one is the continuity of the matching term. Assume that I and I′ are given: what assumptions must be made on a converging sequence ϕ_n → ϕ to ensure that D(ϕ_n.I, I′) → D(ϕ.I, I′)? Often, pointwise convergence will be enough (ϕ_n(x) → ϕ(x) for all x), but sometimes

some additional requirements will be necessary (like assumptions on the derivatives of ϕ_n). The second important issue concerns the differentiability of the matching term. We will denote the differential of a function U defined on G by
\[ (30)\quad \Big(\frac{\delta U}{\delta\varphi}(\varphi)\Big|\, h\Big) = \frac{d}{d\varepsilon}\, U(\varphi+\varepsilon h)\Big|_{\varepsilon=0}. \]
As indicated by its definition, δU/δϕ is a linear form acting on variations of diffeomorphisms (i.e., on vector fields). This linear form does not need to be a function. For example, if U(ϕ) = |ϕ(x)|² for some fixed x ∈ Ω, then
\[ \Big(\frac{\delta U}{\delta\varphi}(\varphi)\Big|\, h\Big) = 2\varphi(x)^Th(x), \]
so that δU/δϕ = 2ϕ(x) ⊗ δ_x.

In the following, it will be useful to also consider other forms of variations than ϕ ↦ ϕ + εh, and to introduce what can be called an Eulerian gradient for the function U. This can be defined as follows. Let V be a Hilbert space included in L²(Ω, ℝ^d) such that G_V ⊂ G. This implies that, if U is a function defined on G, for v ∈ 𝒳¹_V(Ω) and ϕ ∈ G, the function t ↦ U(ϕ^v_{0t} ∘ ϕ) is well-defined.

Definition 10. We say that U has an Eulerian gradient with respect to V if and only if, for all ϕ ∈ G, there exists an element denoted ∇^V U(ϕ) ∈ V such that

\[ (31)\quad \frac{d}{dt}\, U\big(\varphi^v_{0t}\circ\varphi\big)\Big|_{t=0} = \big\langle \nabla^V U(\varphi),\, v(0)\big\rangle_V, \]
in which we identify v with the constant element v_t ≡ v of 𝒳¹_V(Ω). We will also call the Eulerian differential the linear form defined by
\[ \big(\bar\partial_V U(\varphi)\big|\, v\big) = \big\langle \nabla^V U(\varphi),\, v\big\rangle_V, \]
i.e., ∂̄_V U(ϕ) = K⁻¹∇^V U(ϕ), where K is the kernel of V. The Eulerian gradient is directly related to the differential via the relation, holding for v ∈ V:

\[ (32)\quad \big\langle \nabla^V U(\varphi),\, v\big\rangle_V = \big(\bar\partial_V U(\varphi)\big|\, v\big) = \Big(\frac{\delta U}{\delta\varphi}(\varphi)\Big|\, v\circ\varphi\Big). \]
Moreover, by definition of the kernel operator K on V, we have
\[ (33)\quad \nabla^V U(\varphi) = K\big(\bar\partial_V U(\varphi)\big). \]
The interest of this notion is to define the "gradient descent" process which generates a time-dependent element of G by setting

\[ (34)\quad \frac{d\varphi_t(x)}{dt} = -\nabla^V U(\varphi_t)\circ\varphi_t(x). \]
As long as ∫₀^t ‖∇^V U(ϕ_s)‖_V ds is finite, this generates a time-dependent element of G_V. This is therefore an evolution within the group of diffeomorphisms, an important property. Under some assumptions on the continuity of ϕ ↦ U(ϕ) and ϕ ↦ ∇^V U(ϕ), it can be shown that
\[ \frac{dU(\varphi_t)}{dt} = -\big\|\nabla^V U(\varphi_t)\big\|_V^2, \]
so that U(ϕ_t) decreases with time. We now discuss different versions of (34) adapted to different matching problems.

There is another interesting relation between δU/δϕ and the Eulerian differential in the following particular case. Assume that U takes the form U(ϕ) = Z(ϕ.I) for some function Z, where I is a fixed deformable object and I ↦ ϕ.I possesses the properties discussed before, with in particular (ϕ ∘ ψ).I = ϕ.(ψ.I). Then we can write
\[ \big(\bar\partial U(\varphi)\big|\, h\big) = \frac{d}{d\varepsilon}\, Z\big(((\mathrm{id}+\varepsilon h)\circ\varphi).I\big)\Big|_{\varepsilon=0} = \frac{d}{d\varepsilon}\, Z\big((\mathrm{id}+\varepsilon h).(\varphi.I)\big)\Big|_{\varepsilon=0} = \Big(\frac{\delta U_\varphi}{\delta\psi}(\mathrm{id})\Big|\, h\Big), \]
where U_ϕ(ψ) := Z(ψ.(ϕ.I)). From a practical point of view, the computation of ∂̄U(ϕ) can thus be done by first computing δZ(ψ.J)/δψ at the identity for an arbitrary J, and then taking J = ϕ.I in the resulting formula.

2.2. Point matching. Assume that objects are collections of N points x1, . . . , xN ∈ Ω. Diffeomorphisms act on such objects by:

\[ (35)\quad \varphi.(x_1,\dots,x_N) = \big(\varphi(x_1),\dots,\varphi(x_N)\big). \]
Fix two objects I = (x₁, ..., x_N) and I′ = (x′₁, ..., x′_N) and consider the matching functional
\[ (36)\quad U(\varphi) = \sum_{i=1}^N |x'_i - \varphi(x_i)|^2. \]
U is obviously continuous for pointwise convergence. The computation of δU/δϕ is straightforward and provides

\[ (37)\quad \Big(\frac{\delta U}{\delta\varphi}\Big|\, h\Big) = 2\sum_{i=1}^N \big(\varphi(x_i) - x'_i\big)^Th(x_i). \]
This can be written
\[ \frac{\delta U}{\delta\varphi} = 2\sum_{i=1}^N \big(\varphi(x_i) - x'_i\big)\otimes\delta_{x_i}. \]
The Eulerian differential comes immediately from (32), yielding

\[ (38)\quad \big(\bar\partial_V U(\varphi)\big|\, h\big) = 2\sum_{i=1}^N \big(\varphi(x_i) - x'_i\big)^Th\circ\varphi(x_i), \]
or
\[ \bar\partial_V U(\varphi) = 2\sum_{i=1}^N \big(\varphi(x_i) - x'_i\big)\otimes\delta_{\varphi(x_i)}. \]
Finally, from (33), we have

\[ (39)\quad \nabla^V U(\varphi) = 2\sum_{i=1}^N K\big(\cdot,\varphi(x_i)\big)\big(\varphi(x_i) - x'_i\big). \]

The gradient descent algorithm is
\[ (40)\quad \frac{d\varphi_t(x)}{dt} = -2\sum_{i=1}^N K\big(\varphi_t(x), \varphi_t(x_i)\big)\big(\varphi_t(x_i) - x'_i\big). \]
This system can be solved in two steps. Denote y^i_t = ϕ_t(x_i). Applying (40) at x = x_j yields
\[ \frac{dy^j_t}{dt} = -2\sum_{i=1}^N K\big(y^j_t, y^i_t\big)\big(y^i_t - x'_i\big). \]
This is a differential system in y¹_t, ..., y^N_t which must be solved with initial conditions y^j_0 = ϕ₀(x_j). Once this is done, the value of ϕ_t(x) for a general x is the solution of the differential equation
\[ \frac{dy}{dt} = -2\sum_{i=1}^N K\big(y, y^i_t\big)\big(y^i_t - x'_i\big). \]

We now summarize this algorithm, assuming that ϕ₀ = id:

Algorithm 2 (greedy spline minimization). Solve the differential system
\[ \frac{dy^j_t}{dt} = -2\sum_{i=1}^N K\big(y^j_t, y^i_t\big)\big(y^i_t - x'_i\big) \]
with initial conditions y^j_0 = x_j. Then let ϕ_t be the flow associated to the ordinary differential equation
\[ \frac{dy}{dt} = -2\sum_{i=1}^N K\big(y, y^i_t\big)\big(y^i_t - x'_i\big). \]
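A direct numerical transcription of Algorithm 2, using explicit Euler steps and a Gaussian kernel (both illustrative choices, as are the point configurations), can look as follows:

# A sketch of Algorithm 2 with a Gaussian kernel and explicit Euler steps.
import numpy as np

def K(a, b, sigma=0.5):
    # Gaussian kernel matrix (K(a_i, b_j))_ij between two point sets.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])          # points x_i
x_tgt = np.array([[0.2, 0.1], [1.1, 0.3], [-0.1, 1.2]])     # targets x'_i
xi = np.array([[0.5, 0.5]])       # an extra point carried along by the flow

y, dt = x.copy(), 0.05            # y_i(t) = phi_t(x_i), with phi_0 = id
for _ in range(200):
    force = y - x_tgt             # the (y_i - x'_i) terms
    xi = xi - 2 * dt * K(xi, y) @ force   # d xi/dt = -2 sum_i K(xi, y_i)(y_i - x'_i)
    y = y - 2 * dt * K(y, y) @ force      # dy_j/dt = -2 sum_i K(y_j, y_i)(y_i - x'_i)

print(y)   # y is now close to x_tgt; xi followed the same flow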

2.3. Image matching. Now consider objects that are functions I defined on Ω with values in ℝ. Diffeomorphisms act on functions by (ϕ.I)(x) = I(ϕ⁻¹(x)) for x ∈ Ω. Fixing two such functions I and I′, the simplest matching functional which can be considered is
\[ (41)\quad U(\varphi) = \int_\Omega \big(I\circ\varphi^{-1}(x) - I'(x)\big)^2\,dx. \]
We assume that I is differentiable and first compute δU/δϕ. For this, we must first notice that
\[ (42)\quad \frac{d}{d\varepsilon}(\varphi+\varepsilon h)^{-1}\Big|_{\varepsilon=0} = -D(\varphi^{-1})\,h\circ\varphi^{-1}. \]
This can be proved by computing the derivative of the identity (ϕ + εh)⁻¹ ∘ (ϕ + εh) = id at ε = 0. Using this, we can compute
\[ \Big(\frac{\delta U}{\delta\varphi}\Big|\, h\Big) = -2\int_\Omega \big(I\circ\varphi^{-1}(x) - I'(x)\big)\,\nabla I^T\!\circ\varphi^{-1}\,D(\varphi^{-1})\,h\circ\varphi^{-1}\,dx \]
and
\[ \big(\bar\partial_V U(\varphi)\big|\, h\big) = -2\int_\Omega \big(I\circ\varphi^{-1}(x) - I'(x)\big)\,\nabla I^T\!\circ\varphi^{-1}\,D(\varphi^{-1})\,h\,dx, \]
so that
\[ (43)\quad \bar\partial_V U(\varphi) = -2\,\big(I\circ\varphi^{-1} - I'\big)\,\nabla\big(I\circ\varphi^{-1}\big)\otimes dx \]
(using the fact that ∇(I ∘ ϕ⁻¹)ᵀ = ∇Iᵀ ∘ ϕ⁻¹ D(ϕ⁻¹)).

To compute the Eulerian gradient of U, we use the following lemma:

Lemma 4. If V is a Hilbert space of smooth vector fields on Ω with kernel K, μ is a measure on Ω, and z is a μ-measurable function from Ω to ℝ^d (a vector density), then, for all x ∈ Ω,
\[ K(z\otimes\mu)(x) = \int_\Omega K(x,y)\,z(y)\,d\mu(y). \]
Proof. Recall that z ⊗ μ is the linear form on vector fields defined by
\[ (z\otimes\mu|\, h) = \int_\Omega z^Th\,d\mu. \]
From the definition of the kernel, we have, for any a ∈ ℝ^d:
\[ a^TK(z\otimes\mu)(x) = \big(a\otimes\delta_x\big|\, K(z\otimes\mu)\big) \]

\[ = \big(z\otimes\mu\big|\, K(a\otimes\delta_x)\big) = \big(z\otimes\mu\big|\, K(\cdot,x)a\big) = \int_\Omega z(y)^TK(y,x)a\,d\mu(y) = a^T\!\int_\Omega K(x,y)\,z(y)\,d\mu(y). \qquad\square \]
The expression of the Eulerian gradient of U is now straightforward given Lemma 4:
\[ (44)\quad \nabla^V U(\varphi) = -2\int_\Omega \big(I\circ\varphi^{-1}(y) - I'(y)\big)\,K(\cdot,y)\,\nabla\big(I\circ\varphi^{-1}\big)(y)\,dy. \]
This provides the following greedy gradient descent algorithm, the original version being given in [7] (see also [26]).

Algorithm 3 (Greedy image matching). Start with ϕ₀ = id and solve the evolution equation
\[ \frac{d}{dt}\varphi_t(y) = 2\int_\Omega \big(I_t(x) - I'(x)\big)\,K\big(\varphi_t(y), x\big)\,\nabla I_t(x)\,dx \]
with I_t = I ∘ ϕ_t⁻¹.
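A crude one-dimensional illustration of Algorithm 3 follows. Rather than tracking ϕ_t, the sketch evolves I_t = I ∘ ϕ_t⁻¹ through the equivalent advection equation ∂_tI_t + v_t ∂_xI_t = 0, with v_t given by the kernel integral above; the grid, kernel width, step size and the two images are arbitrary choices made here.

# A sketch of greedy image matching in 1D via advection of I_t.
import numpy as np

n = 200
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
I = np.exp(-((x - 0.4) ** 2) / 0.01)         # moving image I
I_prime = np.exp(-((x - 0.6) ** 2) / 0.01)   # target image I'
Kmat = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05 ** 2))

It, dt = I.copy(), 2e-4
for _ in range(1500):
    dIdx = np.gradient(It, h)
    # v(z) = 2 sum_x (I_t(x) - I'(x)) K(z, x) dI_t/dx(x) h  (discrete integral)
    v = 2 * Kmat @ ((It - I_prime) * dIdx) * h
    # Upwind derivative for the transport step (crude but stable).
    fwd = np.append(np.diff(It), 0.0) / h
    bwd = np.append(0.0, np.diff(It)) / h
    It = It - dt * np.where(v > 0, v * bwd, v * fwd)

print(np.sum((It - I_prime) ** 2) * h)   # the matching error has decreased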

Other, more complex, comparison criteria for images can be based on image histograms, or grey-level distributions. For example, define, for λ ∈ ℝ, the level line of an image I at λ by
\[ I_\lambda = \{ x\in\Omega : I(x) = \lambda \}. \]
In order to compare images which have been obtained under different lighting conditions, or by different imaging devices, in which case the grey-level values are not reliable anymore, it is necessary to use matching criteria which are invariant under one-to-one transformations of the grey levels. A group of methods which have been used in this context is based on joint local histograms of the grey levels. Such local histograms are computed from a pair of images I, I′, and are functions of the kind H_x(λ, λ′) which count the frequency of simultaneous occurrence of grey levels λ in I and λ′ in I′ at the same location in a small window around x. One computationally feasible way to define them is to use the "kernel estimator"
\[ H_x(\lambda,\lambda') = \int_\Omega f\big(|I(y)-\lambda|\big)\,f\big(|I'(y)-\lambda'|\big)\,g(x,y)\,dy, \]
where f is positive, ∫f(t)dt = 1, and f vanishes when t is far from 0, and g is such that, for all x, ∫_Ω g(x,y)dy = 1 and g(x,y) vanishes when y is far from x. For each x, H_x is a two-dimensional probability density, and there exist several ways of measuring the degree of dependence between its two components. The simplest one, which is probably sufficient for most applications, is the correlation ratio, given by (reintroducing I and I′ in the notation)
\[ C_x(I,I') = 1 - \frac{\int \lambda\lambda'\, H_x(\lambda,\lambda')\,d\lambda\,d\lambda'}{\sqrt{\int \lambda^2 H_x(\lambda,\lambda')\,d\lambda\,d\lambda'\ \int (\lambda')^2 H_x(\lambda,\lambda')\,d\lambda\,d\lambda'}}. \]
It is then possible to define the matching function by
\[ U(\varphi) = \int_\Omega C_x\big(I\circ\varphi^{-1}, I'\big)\,dx. \]
Here again, the computation of Eulerian gradients is possible. Since it is quite lengthy, we do not detail it, but the important fact to remember is that it leads to an algorithm similar to Algorithm 3 for what is often called multimodal image matching.

2.4. Measure matching. We now consider the case when the compared objects are measures on ℝ^d. The motivation for considering measures is the matching of unordered point sets (x₁, ..., x_N) in ℝ^d. In contrast with the point matching method discussed in section 2.2, there is no information on which element of the second point set corresponds to a given element of the first one. In fact, the sizes of the two point sets do not even need to be the same. In such a framework, it is natural to represent (x₁, ..., x_N) as a sum of Dirac measures
\[ \mu_x = \sum_{i=1}^N \delta_{x_i}, \]
which induces the measure matching problem. The only fact we will need here concerning measures is that they are linear forms acting on functions on ℝ^d via
\[ (\mu|\, f) = \int_{\mathbb R^d} f\,d\mu. \]
In particular, if μ_x is as above,
\[ (\mu_x|\, f) = \sum_{i=1}^N f(x_i). \]
If ϕ is a diffeomorphism of Ω and μ a measure, we define a new measure ϕ.μ by
\[ (\varphi.\mu|\, f) = (\mu|\, f\circ\varphi). \]

If μ = μ_x, this gives
\[ (\varphi.\mu_x|\, f) = \sum_{i=1}^N f\circ\varphi(x_i) = \big(\mu_{\varphi(x)}\big|\, f\big), \]
so that the transform of the measure associated to a point set x is the measure associated to the transformed point set, which is reasonable.

Now that we have defined ϕ.μ, we need to define the function D(ϕ.μ, μ′) in order to run the greedy matching algorithm. We can use the norm of the difference between the two measures, provided we have chosen a suitable norm on measures. Since measures are linear forms on functions, one possible choice is the dual of a function norm, i.e.,
\[ (45)\quad \|\mu\| = \sup\{ (\mu|\, f) : \|f\| = 1 \}, \]
which displaces the question to which function norm to use. One choice leads to a particularly simple and elegant formulation, in association, again, with the theory of reproducing kernels.

Indeed, let W be a reproducing kernel Hilbert space of functions, so that we have an operator k : W* → W with kδ_x = k(·, x) and with the identity (μ|f) = ⟨kμ, f⟩_W for μ ∈ W*, f ∈ W. With this choice, (45) becomes

\[ \|\mu\|_{W^*} = \sup\{ (\mu|\, f) : \|f\|_W = 1 \} = \sup\{ \langle k\mu, f\rangle_W : \|f\|_W = 1 \} = \|k\mu\|_W. \]
This implies that
\[ \|\mu\|_{W^*}^2 = \langle k\mu, k\mu\rangle_W = (\mu|\, k\mu). \]
If μ is a measure, this expression is very simple and given by
\[ \|\mu\|_{W^*}^2 = \int k(x,y)\,d\mu(x)\,d\mu(y). \]
This is because kμ(x) = (δ_x| kμ) = (μ| kδ_x) = ∫k(y,x)dμ(y). So we can take
\[ (46)\quad D(\varphi.\mu, \mu') = \|\varphi.\mu - \mu'\|_{W^*}^2. \]

Let us make this expression more explicit. We have
\[ \begin{aligned} D(\varphi.\mu,\mu') &= \langle \varphi.\mu, \varphi.\mu\rangle_{W^*} - 2\langle \varphi.\mu, \mu'\rangle_{W^*} + \langle \mu',\mu'\rangle_{W^*}\\ &= (\varphi.\mu|\, k(\varphi.\mu)) - 2(\varphi.\mu|\, k\mu') + (\mu'|\, k\mu')\\ &= \int\! k(\varphi(x),\varphi(y))\,d\mu(x)d\mu(y) - 2\int\! k(\varphi(x), z)\,d\mu(x)d\mu'(z) + \int\! k(z,z')\,d\mu'(z)d\mu'(z'). \end{aligned} \]
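This discrepancy is straightforward to evaluate on discrete measures. A minimal Python sketch for two point sets of possibly different sizes, with an illustrative Gaussian kernel k:

# The squared RKHS-dual norm ||mu_x - mu_y||^2_{W*} between two point sets.
import numpy as np

def k(a, b, sigma=0.5):
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def D(x, y):
    # sum_ij k(x_i, x_j) - 2 sum_ih k(x_i, y_h) + sum_hl k(y_h, y_l)
    return k(x, x).sum() - 2 * k(x, y).sum() + k(y, y).sum()

x = np.array([[0.0, 0.0], [1.0, 0.0]])               # point set (any size)
y = np.array([[0.1, 0.0], [0.9, 0.1], [0.5, 0.5]])   # point set (other size)
print(D(x, y))   # nonnegative; 0 only when the two measures coincide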

It is now easy to compute δU/δϕ. It is given by (letting ∇₁k be the gradient of k with respect to its first variable, and using the fact that k is symmetric)
\[ \Big(\frac{\delta U}{\delta\varphi}\Big|\, h\Big) = 2\int \nabla_1k(\varphi(x),\varphi(y))^Th(x)\,d\mu(x)d\mu(y) - 2\int \nabla_1k(\varphi(x),z)^Th(x)\,d\mu(x)d\mu'(z), \]
which can be written as
\[ (47)\quad \frac{\delta U}{\delta\varphi} = 2\Big(\int \nabla_1k(\varphi(\cdot),\varphi(y))\,d\mu(y) - \int \nabla_1k(\varphi(\cdot),z)\,d\mu'(z)\Big)\otimes\mu. \]

The Eulerian differential now is
\[ \big(\bar\partial_V U(\varphi)\big|\, h\big) = 2\int \nabla_1k(\varphi(x),\varphi(y))^Th\circ\varphi(x)\,d\mu(x)d\mu(y) - 2\int \nabla_1k(\varphi(x),z)^Th\circ\varphi(x)\,d\mu(x)d\mu'(z), \]
which yields the expression of the Eulerian gradient:
\[ (48)\quad \nabla^V U(\varphi)(\cdot) = 2\int K(\cdot,\varphi(x))\Big(\int \nabla_1k(\varphi(x),\varphi(y))\,d\mu(y) - \int \nabla_1k(\varphi(x),z)\,d\mu'(z)\Big)d\mu(x). \]

The gradient descent algorithm is
\[ \frac{d\varphi}{dt}(\xi) = -2\int K(\varphi(\xi),\varphi(x))\Big(\int \nabla_1k(\varphi(x),\varphi(y))\,d\mu(y) - \int \nabla_1k(\varphi(x),z)\,d\mu'(z)\Big)d\mu(x). \]
Let us now consider our case of interest, for which μ = μ_x and μ′ = μ_y for point sets x = (x₁, ..., x_N) and y = (y₁, ..., y_M). The algorithm becomes
\[ (49)\quad \frac{d\varphi}{dt}(\xi) = -2\sum_{i=1}^N K(\varphi(\xi),\varphi(x_i))\Big(\sum_{j=1}^N \nabla_1k(\varphi(x_i),\varphi(x_j)) - \sum_{h=1}^M \nabla_1k(\varphi(x_i),y_h)\Big). \]
As in the case of labelled point sets, this equation may be solved in two steps: letting x̃_i(t) = ϕ_t(x_i), solve the system

\[ \frac{d\tilde x_q}{dt} = -2\sum_{i=1}^N K(\tilde x_q, \tilde x_i)\Big(\sum_{j=1}^N \nabla_1k(\tilde x_i,\tilde x_j) - \sum_{h=1}^M \nabla_1k(\tilde x_i, y_h)\Big). \]
Once this is done, the trajectory of an arbitrary point ξ̃(t) = ϕ_t(ξ) is given by

\[ \frac{d\tilde\xi}{dt} = -2\sum_{i=1}^N K(\tilde\xi, \tilde x_i)\Big(\sum_{j=1}^N \nabla_1k(\tilde x_i,\tilde x_j) - \sum_{h=1}^M \nabla_1k(\tilde x_i, y_h)\Big). \]

2.5. Matching lines and surfaces.

2.5.1. Curve matching with measures. The previous approach holding for any measure, it is not restricted to discrete point sets but also applies to continuous ones. It can, for example, be used to match plane or 3D curves, or surfaces. In this situation, however, it has the important drawback of not being geometric. To see why, consider the simplest example of plane curves. If γ is a curve parameterized over an interval [a, b], one defines the line measure μ_γ by
\[ (\mu_\gamma|\, f) = \int_\gamma f\,dl = \int_a^b f(\gamma(u))\,|\dot\gamma(u)|\,du. \]
This definition does not depend on how γ has been parameterized. Now, if ϕ is a diffeomorphism, we have, by definition of the action of diffeomorphisms on measures,
\[ (\varphi.\mu_\gamma|\, f) = \int_\gamma f\circ\varphi\,dl = \int_a^b f(\varphi(\gamma(u)))\,|\dot\gamma(u)|\,du, \]
whereas
\[ (\mu_{\varphi\gamma}|\, f) = \int_{\varphi(\gamma)} f\,dl = \int_a^b f(\varphi(\gamma(u)))\,|D\varphi(\gamma(u))\dot\gamma(u)|\,du. \]

So, in contrast with point sets, for which ϕ.μ_x = μ_{ϕ(x)}, the image of the measure associated to a curve is not the measure associated to the image of the curve. Since the initial goal is to compare curves, and not measures, it is more natural to use the second definition, μ_{ϕγ}, rather than the first one. We still have, using the notation of the previous section,

\[ \begin{aligned} D(\mu_{\varphi\gamma}, \mu_{\tilde\gamma}) &= \langle\mu_{\varphi\gamma},\mu_{\varphi\gamma}\rangle_{W^*} - 2\langle\mu_{\varphi\gamma},\mu_{\tilde\gamma}\rangle_{W^*} + \langle\mu_{\tilde\gamma},\mu_{\tilde\gamma}\rangle_{W^*}\\ &= (\mu_{\varphi\gamma}|\, k\mu_{\varphi\gamma}) - 2(\mu_{\varphi\gamma}|\, k\mu_{\tilde\gamma}) + (\mu_{\tilde\gamma}|\, k\mu_{\tilde\gamma})\\ &= \int_a^b\!\!\int_a^b k(\varphi(\gamma(u)),\varphi(\gamma(v)))\,|D\varphi(\gamma(u))\dot\gamma(u)|\,|D\varphi(\gamma(v))\dot\gamma(v)|\,du\,dv\\ &\quad - 2\int_a^b\!\!\int_{\tilde a}^{\tilde b} k(\varphi(\gamma(u)),\tilde\gamma(v))\,|D\varphi(\gamma(u))\dot\gamma(u)|\,|\dot{\tilde\gamma}(v)|\,du\,dv\\ &\quad + \int_{\tilde a}^{\tilde b}\!\!\int_{\tilde a}^{\tilde b} k(\tilde\gamma(u),\tilde\gamma(v))\,|\dot{\tilde\gamma}(u)|\,|\dot{\tilde\gamma}(v)|\,du\,dv. \end{aligned} \]

The fact that derivatives of ϕ appear in the integral makes the computation of the gradient slightly more complex. We shall only describe here how this computation may be achieved, without providing all details. The new term compared to the previous computation is |Dϕ(γ(u))γ̇(u)|, and we focus on its variation. Using the fact that (d/dε)|z + εζ| = zᵀζ/|z| at ε = 0, we get
\[ \frac{d}{d\varepsilon}\,|D(\varphi+\varepsilon h)(\gamma(u))\dot\gamma(u)|\Big|_{\varepsilon=0} = \frac{(Dh(\gamma(u))\dot\gamma(u))^T\, D\varphi(\gamma(u))\dot\gamma(u)}{|D\varphi(\gamma(u))\dot\gamma(u)|}. \]
This expression is still linear in h, but now involves its derivatives. In the computation of the Eulerian gradient, this requires using differentials of the kernel, as the following computation indicates. We can indeed formally write, for h ∈ V and a ∈ ℝ^d,
\[ a^T\frac{\partial h}{\partial x_i}(x) = \frac{\partial}{\partial x_i}\big\langle h, K(\cdot,x)a\big\rangle_V = \Big\langle h, \frac{\partial}{\partial x_i}K(\cdot,x)a\Big\rangle_V, \]
which involves the derivative of the kernel with respect to its second variable. This implies that
\[ (50)\quad a^TDh(x)b = \big\langle h,\, b^TD_2K(\cdot,x)a\big\rangle_V, \]
where D₂K is the differential of K with respect to its second coordinate. The rest of the computation is handled as for the general measure case; this yields, using the notation γ_ϕ(u) = ϕ ∘ γ(u) (details are left to the reader),

\[ \begin{aligned} \nabla^V U(\varphi)(\cdot) ={}& 2\int_a^b\!\!\int_a^b K(\cdot,\gamma_\varphi(u))\,\nabla_1k(\gamma_\varphi(u),\gamma_\varphi(v))\,|\dot\gamma_\varphi(u)|\,|\dot\gamma_\varphi(v)|\,du\,dv\\ &+ 2\int_a^b\!\!\int_a^b \eta(u)^T D_2K(\cdot,\gamma_\varphi(u))\,\eta(u)\ k(\gamma_\varphi(u),\gamma_\varphi(v))\,|\dot\gamma_\varphi(u)|\,|\dot\gamma_\varphi(v)|\,du\,dv\\ &- 2\int_a^b\!\!\int_{\tilde a}^{\tilde b} K(\cdot,\gamma_\varphi(u))\,\nabla_1k(\gamma_\varphi(u),\tilde\gamma(v))\,|\dot\gamma_\varphi(u)|\,|\dot{\tilde\gamma}(v)|\,du\,dv\\ &- 2\int_a^b\!\!\int_{\tilde a}^{\tilde b} \eta(u)^T D_2K(\cdot,\gamma_\varphi(u))\,\eta(u)\ k(\gamma_\varphi(u),\tilde\gamma(v))\,|\dot\gamma_\varphi(u)|\,|\dot{\tilde\gamma}(v)|\,du\,dv, \end{aligned} \]
with η(u) = γ̇_ϕ(u)/|γ̇_ϕ(u)|.

The computation is simpler, however, if we directly optimize a discretized version of D(μ_{ϕγ}, μ_{γ̃}) (which needs to be discretized anyway). If a curve γ is discretized into points x₀, ..., x_N (with x_N = x₀ if the curve is closed), one can define the discretized measure, still denoted μ_γ,
\[ (\mu_\gamma|\, f) = \sum_{i=1}^N f(c_i)\,|\tau_i|, \]
with c_i = (x_i + x_{i−1})/2 and τ_i = x_i − x_{i−1}. The measure μ_{ϕ.γ} takes the same form, with c^ϕ_i = (ϕ(x_i) + ϕ(x_{i−1}))/2 and τ^ϕ_i = ϕ(x_i) − ϕ(x_{i−1}). This time, letting γ̃ be discretized into x̃₁, ..., x̃_M, we have
\[ D(\mu_{\varphi\gamma},\mu_{\tilde\gamma}) = \sum_{i,j=1}^N k(c^\varphi_i, c^\varphi_j)\,|\tau^\varphi_i|\,|\tau^\varphi_j| - 2\sum_{i=1}^N\sum_{j=1}^M k(c^\varphi_i, \tilde c_j)\,|\tau^\varphi_i|\,|\tilde\tau_j| + \sum_{i,j=1}^M k(\tilde c_i,\tilde c_j)\,|\tilde\tau_i|\,|\tilde\tau_j|. \]
The differential of this expression is computed as with point measures, and provides (assuming closed curves)

\[ \begin{aligned} \frac{\delta U}{\delta\varphi} ={}& \sum_{i=1}^N\Big(\sum_{j=1}^N \big(\nabla_1k(c^\varphi_i,c^\varphi_j)\,|\tau^\varphi_i| + \nabla_1k(c^\varphi_{i+1},c^\varphi_j)\,|\tau^\varphi_{i+1}|\big)\,|\tau^\varphi_j|\Big)\otimes\delta_{x_i}\\ &+ 2\sum_{i=1}^N\Big(\sum_{j=1}^N \Big(k(c^\varphi_i,c^\varphi_j)\,\frac{\tau^\varphi_i}{|\tau^\varphi_i|} - k(c^\varphi_{i+1},c^\varphi_j)\,\frac{\tau^\varphi_{i+1}}{|\tau^\varphi_{i+1}|}\Big)\,|\tau^\varphi_j|\Big)\otimes\delta_{x_i} \end{aligned} \]

\[ \begin{aligned} &- \sum_{i=1}^N\Big(\sum_{j=1}^M \big(\nabla_1k(c^\varphi_i,\tilde c_j)\,|\tau^\varphi_i| + \nabla_1k(c^\varphi_{i+1},\tilde c_j)\,|\tau^\varphi_{i+1}|\big)\,|\tilde\tau_j|\Big)\otimes\delta_{x_i}\\ &- 2\sum_{i=1}^N\Big(\sum_{j=1}^M \Big(k(c^\varphi_i,\tilde c_j)\,\frac{\tau^\varphi_i}{|\tau^\varphi_i|} - k(c^\varphi_{i+1},\tilde c_j)\,\frac{\tau^\varphi_{i+1}}{|\tau^\varphi_{i+1}|}\Big)\,|\tilde\tau_j|\Big)\otimes\delta_{x_i}, \end{aligned} \]
from which the Eulerian differential and gradient can be computed.
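The discretized curve measure and the resulting matching term are easily computed. A Python sketch, with toy closed polygons and an illustrative Gaussian kernel:

# Discretized curve-measure discrepancy: segment midpoints c_i weighted by
# segment lengths |tau_i|.
import numpy as np

def k(a, b, sigma=0.3):
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def curve_measure(pts):
    # Closed polygon: midpoints and lengths of segments (x_{i-1}, x_i).
    nxt = np.roll(pts, -1, axis=0)
    c = (pts + nxt) / 2
    w = np.linalg.norm(nxt - pts, axis=1)   # |tau_i|
    return c, w

def D(curve_a, curve_b):
    ca, wa = curve_measure(curve_a)
    cb, wb = curve_measure(curve_b)
    return wa @ k(ca, ca) @ wa - 2 * wa @ k(ca, cb) @ wb + wb @ k(cb, cb) @ wb

t = np.linspace(0, 2 * np.pi, 30, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
ellipse = np.stack([1.2 * np.cos(t), 0.8 * np.sin(t)], axis=1)
print(D(circle, circle), D(circle, ellipse))   # 0 versus a positive value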

2.5.2. Curve matching with vector measures. Instead of describing a curve by a measure, which is a linear form on functions, it is possible to represent it by a vector measure, which is a linear form on vector fields. To each parameterized curve γ : [a, b] → ℝ^d, we associate a vector measure ν_γ, which maps each vector field f on ℝ^d to the number
\[ (\nu_\gamma|\, f) = \int_a^b \dot\gamma(u)^T f\circ\gamma(u)\,du. \]
It can be checked that this definition is geometric: it is invariant under a change of parameterization. It is in fact given by τ_γ ⊗ dl, where τ_γ is the unit tangent to γ and dl the line measure along γ (note that this depends on the orientation of γ). If ϕ is a diffeomorphism, we then have
\[ (\nu_{\varphi\gamma}|\, f) = \int_a^b \big(D\varphi(\gamma(u))\dot\gamma(u)\big)^T f\circ\varphi(\gamma(u))\,du. \]
Like for scalar measures, we can use a dual norm for the comparison of two vector measures, defined by

\[ \|\nu\|_{W^*} = \sup\{ (\nu|\, f) : \|f\|_W = 1 \}, \]
where W is now a reproducing kernel Hilbert space, of vector fields this time, and we still have ‖ν‖²_{W*} = (ν| kν), where k is the kernel of W. In particular, we have
\[ \|\nu_\gamma\|_{W^*}^2 = \int_a^b\!\!\int_a^b \dot\gamma(u)^T k(\gamma(u),\gamma(v))\,\dot\gamma(v)\,du\,dv \]
and
\[ \|\nu_{\varphi\gamma}\|_{W^*}^2 = \int_a^b\!\!\int_a^b \dot\gamma(u)^T D\varphi(\gamma(u))^T k(\varphi(\gamma(u)),\varphi(\gamma(v)))\,D\varphi(\gamma(v))\dot\gamma(v)\,du\,dv. \]

Defining D(ν_{ϕγ}, ν_{γ̃}) = ‖ν_{ϕγ} − ν_{γ̃}‖²_{W*}, a computation similar to the previous one can be made to obtain the Eulerian gradient. We have in particular
\[ \begin{aligned} \Big(\frac{\delta U}{\delta\varphi}\Big|\, h\Big) ={}& 2\int_a^b\!\!\int_a^b \dot\gamma(u)^TD\varphi(\gamma(u))^T\big(D_1k(\varphi(\gamma(u)),\varphi(\gamma(v)))h(\gamma(u))\big)\,D\varphi(\gamma(v))\dot\gamma(v)\,du\,dv\\ &- 2\int_a^b\!\!\int_{\tilde a}^{\tilde b} \dot\gamma(u)^TD\varphi(\gamma(u))^T\big(D_1k(\varphi(\gamma(u)),\tilde\gamma(v))h(\gamma(u))\big)\,\dot{\tilde\gamma}(v)\,du\,dv\\ &+ 2\int_a^b\!\!\int_a^b \dot\gamma(u)^TDh(\gamma(u))^T k(\varphi(\gamma(u)),\varphi(\gamma(v)))\,D\varphi(\gamma(v))\dot\gamma(v)\,du\,dv\\ &- 2\int_a^b\!\!\int_{\tilde a}^{\tilde b} \dot\gamma(u)^TDh(\gamma(u))^T k(\varphi(\gamma(u)),\tilde\gamma(v))\,\dot{\tilde\gamma}(v)\,du\,dv. \end{aligned} \]
Dealing with the derivatives of h can be done as with scalar measures, yielding derivatives of the V kernel in the Eulerian gradient. Here, however, we can also get rid of the derivative of h, at least when the curve γ is smooth enough. Denote, to simplify, γ_ϕ = ϕ.γ = ϕ ∘ γ. Since Dh(γ(u))γ̇(u) = (d/du) h ∘ γ(u), we can use an integration by parts in the last two integrals, yielding

\[ \int_a^b\!\!\int_a^b \dot\gamma(u)^T Dh(\gamma(u))^T k(\gamma_\varphi(u),\gamma_\varphi(v))\,\dot\gamma_\varphi(v)\,du\,dv = \]
\[ \int_a^b h(\gamma(b))^T k(\gamma_\varphi(b),\gamma_\varphi(v))\,\dot\gamma_\varphi(v)\,dv - \int_a^b h(\gamma(a))^T k(\gamma_\varphi(a),\gamma_\varphi(v))\,\dot\gamma_\varphi(v)\,dv \]
\[ \qquad - \int_a^b\!\!\int_a^b h\circ\gamma(u)^T \big(D_1k(\gamma_\varphi(u),\gamma_\varphi(v))\dot\gamma_\varphi(u)\big)\,\dot\gamma_\varphi(v)\,dv\,du \]
and
\[ \int_a^b\!\!\int_{\tilde a}^{\tilde b} \dot\gamma(u)^T Dh(\gamma(u))^T k(\gamma_\varphi(u),\tilde\gamma(v))\,\dot{\tilde\gamma}(v)\,du\,dv = \]
\[ \int_{\tilde a}^{\tilde b} h(\gamma(b))^T k(\gamma_\varphi(b),\tilde\gamma(v))\,\dot{\tilde\gamma}(v)\,dv - \int_{\tilde a}^{\tilde b} h(\gamma(a))^T k(\gamma_\varphi(a),\tilde\gamma(v))\,\dot{\tilde\gamma}(v)\,dv \]
\[ \qquad - \int_a^b\!\!\int_{\tilde a}^{\tilde b} h(\gamma(u))^T \big(D_1k(\gamma_\varphi(u),\tilde\gamma(v))\dot\gamma_\varphi(u)\big)\,\dot{\tilde\gamma}(v)\,dv\,du. \]

Replacing h by h ◦ ϕ, we obtain what we need to compute the Eulerian differential

\[ \big(\bar\partial U(\varphi)\,|\,h\big) = 2\int_a^b\!\!\int_a^b \dot\gamma_\varphi(u)^T\big(D_1k(\gamma_\varphi(u),\gamma_\varphi(v))h(\gamma_\varphi(u))\big)\,\dot\gamma_\varphi(v)\,du\,dv \]
\[ \qquad - 2\int_a^b\!\!\int_{\tilde a}^{\tilde b} \dot\gamma_\varphi(u)^T\big(D_1k(\gamma_\varphi(u),\tilde\gamma(v))h(\gamma_\varphi(u))\big)\,\dot{\tilde\gamma}(v)\,du\,dv \]
\[ \qquad + 2\,h(\gamma_\varphi(b))^T\Big(\int_a^b k(\gamma_\varphi(b),\gamma_\varphi(v))\,\dot\gamma_\varphi(v)\,dv - \int_{\tilde a}^{\tilde b} k(\gamma_\varphi(b),\tilde\gamma(v))\,\dot{\tilde\gamma}(v)\,dv\Big) \]
\[ \qquad - 2\,h(\gamma_\varphi(a))^T\Big(\int_a^b k(\gamma_\varphi(a),\gamma_\varphi(v))\,\dot\gamma_\varphi(v)\,dv - \int_{\tilde a}^{\tilde b} k(\gamma_\varphi(a),\tilde\gamma(v))\,\dot{\tilde\gamma}(v)\,dv\Big) \]
\[ \qquad - 2\int_a^b h(\gamma_\varphi(u))^T\Big(\int_a^b \big(D_1k(\gamma_\varphi(u),\gamma_\varphi(v))\dot\gamma_\varphi(u)\big)\,\dot\gamma_\varphi(v)\,dv - \int_{\tilde a}^{\tilde b} \big(D_1k(\gamma_\varphi(u),\tilde\gamma(v))\dot\gamma_\varphi(u)\big)\,\dot{\tilde\gamma}(v)\,dv\Big)du. \]

To simplify this expression, let $\tau_\varphi$ be the unit tangent to $\gamma_\varphi$ (and similarly $\tilde\tau$ for $\tilde\gamma$). Let $k^{ij}$ be the components of the matrix kernel k. Introduce the quantities ρ and $\tilde\rho$ such that, for a point x on $\gamma_\varphi$,
\[ \rho_q(x) = \sum_{i,j}\int_{\gamma_\varphi} \tau_\varphi^i(x)\,\frac{\partial k^{ij}}{\partial x_q}(x,y)\,\tau_\varphi^j(y)\,dy, \qquad
\tilde\rho_q(x) = \sum_{i,j}\int_{\tilde\gamma} \tau_\varphi^i(x)\,\frac{\partial k^{ij}}{\partial x_q}(x,\tilde y)\,\tilde\tau^j(\tilde y)\,d\tilde y \]
for q = 1, …, d,
\[ \zeta(x) = \int_{\gamma_\varphi} \big(D_1k(x,y)\tau_\varphi(x)\big)\,\tau_\varphi(y)\,dy, \qquad
\tilde\zeta(x) = \int_{\tilde\gamma} \big(D_1k(x,\tilde y)\tau_\varphi(x)\big)\,\tilde\tau(\tilde y)\,d\tilde y, \]
and
\[ \eta(x) = \int_{\gamma_\varphi} k(x,y)\,\tau_\varphi(y)\,dy, \qquad
\tilde\eta(x) = \int_{\tilde\gamma} k(x,\tilde y)\,\tilde\tau(\tilde y)\,d\tilde y, \]
so that
\[ \big(\bar\partial U(\varphi)\,|\,h\big) = 2\int_{\gamma_\varphi} h^T(\rho+\zeta-\tilde\rho-\tilde\zeta) + 2\,h(x_1)^T\big(\eta(x_1)-\tilde\eta(x_1)\big) - 2\,h(x_0)^T\big(\eta(x_0)-\tilde\eta(x_0)\big), \]
$x_0$ and $x_1$ being the extremities of $\gamma_\varphi$. This implies that
\[ (51)\qquad \bar\partial U(\varphi) = 2(\rho+\zeta-\tilde\rho-\tilde\zeta)\otimes\mu_{\varphi.\gamma} + 2(\eta-\tilde\eta)\otimes(\delta_{x_1}-\delta_{x_0}). \]
(The last term vanishes if γ is a closed curve.)
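The text does not spell out a discretization of this vector-measure energy, but the same midpoint rule as for scalar measures gives a natural sketch for a scalar kernel (reusing `scalar_kernel` from the earlier sketch; the chords $\tau_i$ play the role of unit tangent times length element):

```python
def discrete_current_energy(x, xt, sigma=1.0):
    # Sketch of ||nu_x - nu_xt||_{W*}^2 with k = scalar_kernel * Id, for
    # polygonal curves x (N+1, d), xt (M+1, d), via the midpoint rule.
    c, tau = 0.5 * (x[1:] + x[:-1]), x[1:] - x[:-1]
    ct, taut = 0.5 * (xt[1:] + xt[:-1]), xt[1:] - xt[:-1]
    return (np.sum(tau * (scalar_kernel(c, c, sigma) @ tau))
            - 2 * np.sum(tau * (scalar_kernel(c, ct, sigma) @ taut))
            + np.sum(taut * (scalar_kernel(ct, ct, sigma) @ taut)))
```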

2.5.3. Surfaces. If S is a surface in $\mathbb{R}^3$, one can define a measure $\mu_S$ and a vector measure $\nu_S$ by
\[ (\mu_S|f) = \int_S f(x)\,d\sigma_S(x) \quad\text{for a scalar } f \]
and
\[ (\nu_S|f) = \int_S f(x)^T N(x)\,d\sigma_S(x) \quad\text{for a vector field } f, \]
where $d\sigma_S$ is the area form on S and N is the unit normal (S being assumed to be oriented in the second case). If W is a reproducing kernel Hilbert space of scalar functions or vector fields, we can still compare two surfaces by using the norm of the difference of their associated measures on $W^*$. We state without proof the following result.

Theorem 29. If S is a surface and ϕ a diffeomorphism of $\mathbb{R}^3$ that preserves orientation (i.e., with positive Jacobian), we have
\[ \big(\mu_{\varphi(S)}|f\big) = \int_S f\circ\varphi(x)\,\big|D\varphi(x)^{-T}N\big|\,\det D\varphi(x)\,d\sigma_S(x) \]
for a scalar f, and
\[ \big(\nu_{\varphi(S)}|f\big) = \int_S f\circ\varphi(x)^T D\varphi(x)^{-T}N\,\det D\varphi(x)\,d\sigma_S(x). \]

If $e_1(x), e_2(x)$ is a basis of the tangent plane to S at x, we have
\[ (52)\qquad D\varphi(x)^{-T}N\,\det D\varphi(x) = \big(D\varphi(x)e_1\times D\varphi(x)e_2\big)/|e_1\times e_2|. \]
The last formula implies in particular that, if S is locally parameterized by (u, v), then (since $N = (e_1\times e_2)/|e_1\times e_2|$ and $d\sigma_S = |e_1\times e_2|\,du\,dv$)
\[ (\nu_S|f) = \int f(x)^T(e_1\times e_2)\,du\,dv = \int \det(e_1, e_2, f)\,du\,dv \]
and
\[ \big(\nu_{\varphi(S)}|f\big) = \int \det(D\varphi\,e_1, D\varphi\,e_2, f\circ\varphi)\,du\,dv. \]

Theorem 29 provides the necessary formulae to compute the Eulerian gradient in both cases. This is similar to the case of curves and is left to the reader. Consider now the discrete case and let S be a triangulated surface. Let $T_1,\dots,T_N$ be the triangles that compose S, $c_i$ the center of $T_i$, $N_i$ its oriented unit normal and $a_i$ its area. We define the discrete versions of the previous measures by

\[ (\mu_S|f) = \sum_{i=1}^N f(c_i)\,a_i \quad\text{for a scalar } f \]
and
\[ (\nu_S|f) = \sum_{i=1}^N f(c_i)^T N_i\,a_i \quad\text{for a vector field } f. \]
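In code, the centers, normals and areas entering these sums are computed directly from a vertex/face representation of the mesh; a minimal sketch (the names and conventions are ours), anticipating the vertex-based formulas below:

```python
import numpy as np

def triangle_data(V, F):
    # V: (n, 3) vertex coordinates; F: (N, 3) vertex indices of the triangles,
    # assumed consistently oriented. Returns centers c_i, oriented unit
    # normals N_i and areas a_i, so that (mu_S|f) ~ sum_i f(c_i) a_i and
    # (nu_S|f) ~ sum_i f(c_i).N_i a_i.
    x1, x2, x3 = V[F[:, 0]], V[F[:, 1]], V[F[:, 2]]
    cross = np.cross(x2 - x1, x3 - x1)        # |cross| = 2 * triangle area
    area = 0.5 * np.linalg.norm(cross, axis=1)
    normal = cross / (2 * area[:, None])
    center = (x1 + x2 + x3) / 3.0
    return center, normal, area
```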

Moreover, letting $x_{i1}$, $x_{i2}$ and $x_{i3}$ be the vertices of $T_i$, the previous formulae can be written
\[ (\mu_S|f) = \frac12\sum_{i=1}^N f\Big(\frac{x_{i1}+x_{i2}+x_{i3}}{3}\Big)\,\big|(x_{i2}-x_{i1})\times(x_{i3}-x_{i1})\big| \]
and
\[ (\nu_S|f) = \frac12\sum_{i=1}^N f\Big(\frac{x_{i1}+x_{i2}+x_{i3}}{3}\Big)^T (x_{i2}-x_{i1})\times(x_{i3}-x_{i1}), \]
where the last formula assumes that the vertices of the triangles are ordered consistently with the orientation (the factor 1/2 accounts for the triangle area $a_i = |(x_{i2}-x_{i1})\times(x_{i3}-x_{i1})|/2$). The transformed surfaces are now represented by the same expressions with $x_{ik}$ replaced by $\varphi(x_{ik})$. The explicit formulae for the Eulerian gradient of $D(\mu_{\varphi.S},\mu_{\tilde S})$ can be obtained after a routine (but lengthy) computation, similar to the cases of point sets or discretized curves [27].

2.5.4. Induced actions and currents. This section assumes some preliminary knowledge of differential geometry. It can be skipped without harming the understanding of the rest of these notes.

We have designed the action of diffeomorphisms on measures by $(\varphi\mu|f) = (\mu|f\circ\varphi)$. Recall that we have the usual action of diffeomorphisms on functions defined by $\varphi.f = f\circ\varphi^{-1}$, so that we can write $(\varphi\mu|f) = (\mu|\varphi^{-1}.f)$. In the case of curves, we have seen that this action on the induced measure does not correspond to the image of the curve by a diffeomorphism, in the sense that $\mu_{\varphi\gamma}\neq\varphi\mu_\gamma$. Here, we discuss whether the transformations $\mu_\gamma\to\mu_{\varphi\gamma}$ or $\nu_\gamma\to\nu_{\varphi\gamma}$ (and the equivalent transformations for surfaces) can be explained by a similar operation, i.e., whether one can write $(\varphi\mu|f) = (\mu|\varphi^{-1}\star f)$, where ⋆ would represent another action of diffeomorphisms on functions. In fact, for $\mu_\gamma$, the answer is no. Letting $\tau(\gamma(u))$ be the unit tangent to γ, we have

\[ (\mu_{\varphi\gamma}|f) = \int_a^b f(\varphi(\gamma(u)))\,|D\varphi(\gamma(u))\dot\gamma(u)|\,du = \int_a^b f(\varphi(\gamma(u)))\,|D\varphi(\gamma(u))\tau(u)|\,|\dot\gamma(u)|\,du, \]

so that $(\mu_{\varphi\gamma}|f) = \big(\mu_\gamma\,\big|\,(f\circ\varphi)\,|D\varphi\,\tau|\big)$, with some abuse of notation in the last formula, since τ is only defined along γ. The important fact here is that the function f is transformed according to a rule which depends not only on the diffeomorphism ϕ, but also on the curve γ; therefore the result cannot be put under the form $\varphi^{-1}\star f$. The situation is different for vector measures. Indeed, we have

\[ (\nu_{\varphi\gamma}|f) = \int_a^b \big(D\varphi(\gamma(u))\dot\gamma(u)\big)^T f\circ\varphi(\gamma(u))\,du = \big(\nu_\gamma\,\big|\,D\varphi^T f\circ\varphi\big). \]

So, if we define $\varphi\star f = D(\varphi^{-1})^T f\circ\varphi^{-1}$, we have $(\nu_{\varphi\gamma}|f) = \big(\nu_\gamma\,\big|\,\varphi^{-1}\star f\big)$. The transformation $(\varphi,f)\mapsto\varphi\star f$ is a valid action of diffeomorphisms on vector fields, since $\mathrm{id}\star f = f$ and $\varphi\star(\psi\star f) = (\varphi\circ\psi)\star f$, as can be easily checked.

The same analysis can be made for surfaces; the scalar measures do not transform in accordance with an action, but the vector measures do. Let us check this last point by considering the formula in a local chart, where
\[ \big(\nu_{\varphi(S)}|f\big) = \int \det(D\varphi\,e_1, D\varphi\,e_2, f\circ\varphi)\,du\,dv = \int \det(D\varphi)\,\det\big(e_1, e_2, (D\varphi)^{-1}f\circ\varphi\big)\,du\,dv \]

\[ = \big(\nu_S\,\big|\,\det(D\varphi)\,(D\varphi)^{-1}f\circ\varphi\big). \]
So we here need to define
\[ \varphi\star f = \det\big(D(\varphi^{-1})\big)\,(D\varphi^{-1})^{-1}f\circ\varphi^{-1} = \big(D\varphi\circ\varphi^{-1}\big)\,f\circ\varphi^{-1}\big/\det\big(D\varphi\circ\varphi^{-1}\big). \]
Here again, a direct computation shows that this is an action. We have therefore discovered that the vector measures are transformed by a diffeomorphism ϕ according to a rule $(\varphi\mu|f) = (\mu|\varphi^{-1}\star f)$, the action ⋆ being different for curves and surfaces. In fact, all these actions (including the scalar one) can be placed within a single framework if one replaces vector fields by p-forms and measures by currents. This requires a few definitions.

A p-linear functional on $\mathbb{R}^d$ is a function $q:(\mathbb{R}^d)^p\to\mathbb{R}$ which is linear with respect to each of its variables. We will use the notation
\[ q(e_1,\dots,e_p) = (q|e_1,\dots,e_p), \]
which is consistent with our notation for p = 1. A p-linear functional is skew-symmetric if $(q|e_1,\dots,e_p) = 0$ whenever $e_i = e_j$ for some i ≠ j. They form a vector space denoted $\Lambda_p$ (or $\Lambda_p^d$ when the dimension of the underlying space needs to be specified). For p = d there is, up to a multiplicative constant, only one skew-symmetric d-linear functional, the determinant: $q(e_1,\dots,e_d) = \det(e_1,\dots,e_d)$. For p < d, one can associate to any family $e_1,\dots,e_{d-p}$ of vectors the skew-symmetric p-linear functional

\[ q_{e_1,\dots,e_{d-p}}(f_1,\dots,f_p) = \det(e_1,\dots,e_{d-p},f_1,\dots,f_p). \]

This functional is usually denoted $e_1\wedge\cdots\wedge e_{d-p}\in\Lambda_p^d$. If $e_1,\dots,e_d$ is a basis of $\mathbb{R}^d$, one can show that the family $\big(e_{j_1}\wedge\cdots\wedge e_{j_{d-p}},\ 1\le j_1<\cdots<j_{d-p}\le d\big)$ is a basis of $\Lambda_p^d$, which is therefore $\binom{d}{p}$-dimensional. One uniquely defines an inner product on $\Lambda_p^d$ by first selecting an orthonormal basis $(e_1,\dots,e_d)$ of $\mathbb{R}^d$ and deciding that

$\big(e_{j_1}\wedge\cdots\wedge e_{j_{d-p}},\ 1\le j_1<\cdots<j_{d-p}\le d\big)$ is an orthonormal family in $\Lambda_p$. This dot product will be denoted $\langle\cdot,\cdot\rangle_{\Lambda_p}$.

A p-form on $\mathbb{R}^d$ is a function $x\mapsto q(x)$ such that, for all x, q(x) is a skew-symmetric p-linear form, i.e., q is a function from $\mathbb{R}^d$ to $\Lambda_p$. For example, 0-forms are ordinary functions. The space of p-forms is denoted $\Omega_p$. Since p-linear forms form a normed space, continuity and differentiability of p-forms are well-defined notions. We can consider spaces of smooth p-forms, and in particular reproducing Hilbert spaces of such forms: a space $W\subset\Omega_p$ is a reproducing Hilbert space if, for every $x\in\mathbb{R}^d$ and $e_1,\dots,e_p\in\mathbb{R}^d$, the evaluation functional $q\mapsto(q(x)|e_1,\dots,e_p)$ is continuous on W, which implies that there exists an element, denoted $k_x(e_1,\dots,e_p)\in W$, such that
\[ (q(x)|e_1,\dots,e_p) = \langle q, k_x(e_1,\dots,e_p)\rangle_W. \]

Note that, $k_x(e_1,\dots,e_p)$ being a p-form, we can compute
\[ \big(k_x(e_1,\dots,e_p)(y)\,\big|\,f_1,\dots,f_p\big), \]
which will be denoted $(k(x,y)|e_1,\dots,e_p;f_1,\dots,f_p)$. So k(x, y) is 2p-linear, and skew-symmetric with respect to its first p and its last p variables. Moreover, by construction,
\[ \langle k_x(e_1,\dots,e_p), k_y(f_1,\dots,f_p)\rangle_W = (k(x,y)|e_1,\dots,e_p;f_1,\dots,f_p). \]
As for vector fields, kernels for p-forms can be derived from scalar kernels by letting
\[ (53)\qquad (k(x,y)|e_1,\dots,e_p;f_1,\dots,f_p) = k(x,y)\,\langle e_1\wedge\cdots\wedge e_p, f_1\wedge\cdots\wedge f_p\rangle_{\Lambda_{d-p}}. \]
When such a space W is given, we can associate to any oriented p-dimensional submanifold M of $\mathbb{R}^d$ an element $\eta_M$ of $W^*$ via the formula (for q ∈ W)
\[ (\eta_M|q) = \int_M \big(q(x)\,\big|\,e_1(x),\dots,e_p(x)\big)\,d\sigma_M(x), \]
where $e_1,\dots,e_p$ is, for all x, a positively oriented orthonormal basis of the tangent space to M at x. As for measures, we can compute the dual norm of $\eta_M$, which is
\[ \|\eta_M\|_{W^*}^2 = \int_M\!\!\int_M \big(k(x,y)\,\big|\,e_1(x),\dots,e_p(x);e_1(y),\dots,e_p(y)\big)\,d\sigma_M(x)\,d\sigma_M(y), \]
or, for a scalar kernel,
\[ \|\eta_M\|_{W^*}^2 = \int_M\!\!\int_M k(x,y)\,\langle e_1(x)\wedge\cdots\wedge e_p(x), e_1(y)\wedge\cdots\wedge e_p(y)\rangle_{\Lambda_{d-p}}\,d\sigma_M(x)\,d\sigma_M(y). \]
The form $\eta_M$ is a particular case of what is called a p-current, i.e., a linear form on smooth p-forms. The expressions of $\eta_M$ and its norm in a local chart of M are quite simple. Indeed, if $(u_1,\dots,u_p)$ is the parameterization and $(\partial u_1,\dots,\partial u_p)$ the associated tangent vectors (assumed to be positively oriented), we have, for a p-form q:

\[ (q|\partial u_1,\dots,\partial u_p) = (q|e_1,\dots,e_p)\,\det(\partial u_1,\dots,\partial u_p), \]
which, if $(u_1,\dots,u_p)$ is positively oriented, immediately yields
\[ (q|\partial u_1,\dots,\partial u_p)\,du_1\cdots du_p = (q|e_1,\dots,e_p)\,d\sigma_M. \]
We therefore have, in the chart,
\[ (\eta_M|q) = \int (q|\partial u_1,\dots,\partial u_p)\,du_1\cdots du_p, \]
and similar formulae for the norm. Now consider the action of diffeomorphisms. If M becomes ϕ(M), the formula in the chart yields
\[ \big(\eta_{\varphi(M)}|q\big) = \int \big(q\circ\varphi\,\big|\,D\varphi\,\partial u_1,\dots,D\varphi\,\partial u_p\big)\,du_1\cdots du_p, \]
so that $\big(\eta_{\varphi(M)}|q\big) = (\eta_M|\tilde q)$ with $(\tilde q(x)|f_1,\dots,f_p) = (q(\varphi(x))|D\varphi f_1,\dots,D\varphi f_p)$. As for vector measures, we can introduce the left action on p-forms (also called the pull-back of the p-form):
\[ (\varphi\star q|f_1,\dots,f_p) = \big(q\circ\varphi^{-1}\,\big|\,D(\varphi^{-1})f_1,\dots,D(\varphi^{-1})f_p\big) \]
and the resulting action on p-currents
\[ (54)\qquad (\varphi.\eta|q) = \big(\eta\,\big|\,\varphi^{-1}\star q\big), \]
so that we can write $\eta_{\varphi(M)} = \varphi\eta_M$.

This is reminiscent of what we obtained for measures, and for vector measures for curves and surfaces. We now check that these three examples are particular cases of the previous discussion. Measures are linear forms on functions, which are also 0-forms. The definition $(\varphi\mu|f) = (\mu|f\circ\varphi)$ is exactly the same as in (54). Consider now the case of curves, which are one-dimensional submanifolds, so that p = 1. If γ is a curve, τ its unit tangent and s its arc-length,
\[ (\eta_\gamma|q) = \int_\gamma \big(q(\gamma(s))\,\big|\,\tau(s)\big)\,ds = \int_a^b \big(q(\gamma(u))\,\big|\,\dot\gamma(u)\big)\,du. \]

To a vector field f on $\mathbb{R}^d$, we can associate the 1-form $q_f$ defined by $(q_f(x)|v) = f(x)^Tv$. (In fact all 1-forms can be expressed as $q_f$ for some vector field f.) Using this identification and noting that $(\nu_\gamma|f) = (\eta_\gamma|q_f)$, we can see that the vector measure for curve matching is a special case of the currents considered here. For surfaces in 3D we need to take p = 2, and if S is a surface, we have
\[ (\eta_S|q) = \int_S \big(q(x)\,\big|\,e_1(x),e_2(x)\big)\,d\sigma_S(x). \]
Again, a vector field f on $\mathbb{R}^3$ induces a 2-form $q_f$, defined by $(q_f|v_1,v_2) = \det(f,v_1,v_2) = f^T(v_1\times v_2)$, and every 2-form can be obtained this way. Using the fact that, if $(e_1,e_2)$ is a positively oriented basis of the tangent space to the surface, then $e_1\times e_2 = N$, we retrieve $(\nu_S|f) = (\eta_S|q_f)$.

2.6. Matching vector fields. In this section, we compare two vector fields f and $\tilde f$, i.e., two functions from Ω to $\mathbb{R}^d$. We will define the function $D(\varphi f,\tilde f)$ as the squared $L^2$ norm between ϕ.f and $\tilde f$, and focus our discussion on the definition of the action of diffeomorphisms on vector fields.

Several choices can be made for this action. The first one considers a vector field as a velocity: assume that each point in Ω moves along a trajectory x(t) and let f(x) = dx/dt, say at time t = 0. Now assume we make the transformation $\tilde x = \varphi(x)$, and let $\tilde f$ be the transformed vector field, such that $d\tilde x/dt = \tilde f(\tilde x)$. Replacing $\tilde x$ by ϕ(x) yields $D\varphi(x)\,dx/dt = \tilde f\circ\varphi(x)$, so that $\tilde f = (D\varphi\,f)\circ\varphi^{-1}$, i.e., $\varphi.f = (D\varphi\,f)\circ\varphi^{-1}$.

We can also consider vector fields that are gradients of a function I, f = ∇I. If I becomes $\varphi.I = I\circ\varphi^{-1}$, then f becomes $D(\varphi^{-1})^T\nabla I\circ\varphi^{-1}$. By construction, the set of vector fields f such that $\mathrm{curl}\,f = 0$ is closed under this action.

Sometimes it is important that the norms of the vector fields at each point be left invariant by the transformation. This can be achieved in both cases by normalizing the result, using $f\mapsto\big(|f|\,D\varphi f/|D\varphi f|\big)\circ\varphi^{-1}$ or $f\mapsto\big(|f|\,D\varphi^{-T}f/|D\varphi^{-T}f|\big)\circ\varphi^{-1}$.

We now evaluate the differential of $U(\varphi) = \|\varphi.f-\tilde f\|_2^2$ at ϕ = id (which will provide the Eulerian differential after replacing f by ϕ.f). We therefore need to compute the derivative at ε = 0 of (id + εh).f. This gives, for the first action,

\[ \frac{d}{d\varepsilon}\Big[(\mathrm{id}+\varepsilon h).f\Big]_{\varepsilon=0} = \frac{d}{d\varepsilon}\Big[\big(D(\mathrm{id}+\varepsilon h)f\big)\circ(\mathrm{id}+\varepsilon h)^{-1}\Big]_{\varepsilon=0} = Dh\,f + Df\,\frac{d}{d\varepsilon}\Big[(\mathrm{id}+\varepsilon h)^{-1}\Big]_{\varepsilon=0} = Dh\,f - Df\,h \]
by Equation (42) applied to ϕ = id. This immediately implies, for the first action:
\[ \Big(\frac{\delta U}{\delta\varphi}(\mathrm{id})\Big|\,h\Big) = 2\int_\Omega (f-\tilde f)^T(Dh\,f - Df\,h)\,dx. \]
The Eulerian differential now comes by replacing f by ϕ.f, so that
\[ \big(\bar\partial^V U(\varphi)\,\big|\,h\big) = 2\int_\Omega (\varphi.f-\tilde f)^T\big(Dh\,(\varphi.f) - D(\varphi.f)\,h\big)\,dx. \]
This expression can be combined with (50) to obtain the Eulerian gradient of U, namely
\[ \nabla^V U(\varphi) = 2\int_\Omega \big((\varphi.f)^T D_2K(\cdot,x) - K(\cdot,x)\,D(\varphi.f)^T\big)(\varphi.f-\tilde f)\,dx. \]
The Eulerian derivative can be rewritten in another form to avoid the intervention of the differential of h. A consequence of the divergence theorem is the following lemma.

Lemma 5. If Ω is a bounded open domain of $\mathbb{R}^d$ and v, w, h are smooth vector fields on $\mathbb{R}^d$, then
\[ (55)\qquad \int_\Omega v^T Dh\,w\,dx = \int_{\partial\Omega} (v^Th)(w^TN)\,d\ell - \int_\Omega \big(w^TDv^Th + \mathrm{div}\,w\;v^Th\big)\,dx. \]
Under the assumption that h vanishes on ∂Ω, this lemma implies that
\[ \big(\bar\partial^V U(\varphi)\,\big|\,h\big) = -2\int_\Omega \Big((\varphi.f)^TD(\varphi.f-\tilde f)^T + \mathrm{div}(\varphi.f)\,(\varphi.f-\tilde f)^T + (\varphi.f-\tilde f)^TD(\varphi.f)\Big)h\,dx, \]
so that
\[ \bar\partial^V U(\varphi) = -2\Big(D(\varphi.f-\tilde f)(\varphi.f) + \mathrm{div}(\varphi.f)(\varphi.f-\tilde f) + D(\varphi.f)^T(\varphi.f-\tilde f)\Big)\otimes dx, \]
which implies that

\[ \nabla^V U(\varphi) = -2\int_\Omega K(\cdot,x)\Big(D(\varphi.f-\tilde f)(\varphi.f) + \mathrm{div}(\varphi.f)(\varphi.f-\tilde f) + D(\varphi.f)^T(\varphi.f-\tilde f)\Big)\,dx. \]
Let us now consider the normalized version of this action,
\[ \varphi.f = \Big(|f|\,\frac{D\varphi\,f}{|D\varphi\,f|}\Big)\circ\varphi^{-1}. \]
Using the fact that the derivative of z/|z| in a direction z′ is $z'/|z| - zz^Tz'/|z|^3$, we obtain (skipping the details)
\[ (56)\qquad \Big(\frac{\delta U}{\delta\varphi}(\mathrm{id})\Big|\,h\Big) = 2\int_\Omega (f-\tilde f)^T\Big(\big(\mathrm{Id}_d - ff^T/|f|^2\big)Dh\,f - |f|\,D(f/|f|)\,h\Big)\,dx. \]
After replacing f by ϕ.f to compute the Eulerian differential, using Lemma 5 again yields
\[ \bar\partial^V U(\varphi) = -\Big(Dw\,(\varphi.f) + \mathrm{div}(\varphi.f)\,w + D(\varphi.f/|f|)^T(\varphi.f-\tilde f)\Big)\otimes dx \]
with $w = \big(\mathrm{Id}_d - (\varphi.f)(\varphi.f)^T/|f|^2\big)(\varphi.f-\tilde f)$. The Eulerian gradient is directly obtained from this by applying the kernel.

The computations for $\varphi.f = (D\varphi^{-T}f)\circ\varphi^{-1}$ and its normalized version are very similar. We find, for the unnormalized action,
\[ \frac{d}{d\varepsilon}\Big[(\mathrm{id}+\varepsilon h).f\Big]_{\varepsilon=0} = -Dh^Tf - Df\,h, \]
yielding
\[ \Big(\frac{\delta U}{\delta\varphi}(\mathrm{id})\Big|\,h\Big) = -2\int_\Omega (f-\tilde f)^T(Dh^Tf + Df\,h)\,dx \]
and
\[ \big(\bar\partial^V U(\varphi)\,\big|\,h\big) = -2\int_\Omega (\varphi.f-\tilde f)^T\big(Dh^T(\varphi.f) + D(\varphi.f)\,h\big)\,dx, \]
which can be written, using Lemma 5,
\[ \bar\partial^V U(\varphi) = 2\Big(D(\varphi.f)(\varphi.f-\tilde f) + \mathrm{div}(\varphi.f-\tilde f)(\varphi.f) - D(\varphi.f)^T(\varphi.f-\tilde f)\Big)\otimes dx. \]

For the normalized action, $\varphi.f = \big(D\varphi^{-T}f/|D\varphi^{-T}f|\big)\circ\varphi^{-1}$, we get
\[ \bar\partial^V U(\varphi) = 2\Big(Dw\,(\varphi.f-\tilde f) + \mathrm{div}(\varphi.f-\tilde f)\,w - D(\varphi.f/|f|)^T(\varphi.f-\tilde f)\Big)\otimes dx \]
with $w = \big(\mathrm{Id}_d - (\varphi.f)(\varphi.f)^T/|f|^2\big)\,\varphi.f$; the Eulerian gradient again follows by applying the kernel.

2.7. Matching frames. We here consider the issue of matching 3D orthonormal frame fields, i.e., functions $x\mapsto f(x) = (f_1(x), f_2(x), f_3(x))$ that assign to each x ∈ Ω an orthonormal basis of $\mathbb{R}^3$. We want to define a suitable action of diffeomorphisms on such structures.

The action will be built on the following interpretation (which is based on applications to specific modalities of medical images). The first vector $f_1(x)$ represents the fiber orientation of a tissue at x. The plane generated by the first two vectors is the direction of a 2D layer around x, $f_3$ being therefore a normal to this plane.

The fiber at x is a 3D oriented curve, γ, that passes through x with a unit tangent given by $f_1$. After the action of a diffeomorphism ϕ, γ becomes ϕ∘γ and its tangent at y = ϕ(x) is $D\varphi(x)f_1(x)/|D\varphi(x)f_1(x)|$. This provides the transformation of $f_1$, namely $f_1\mapsto f_1' = \big(D\varphi f_1/|D\varphi f_1|\big)\circ\varphi^{-1}$.

Similarly, the layer at x is a surface with tangent plane generated by $f_1(x)$ and $f_2(x)$. The tangent plane to this surface at y = ϕ(x) is generated by $D\varphi f_1(x)$ and $D\varphi f_2(x)$. The unit normal to this tangent plane is $\big(D\varphi f_1(x)\times D\varphi f_2(x)\big)/|D\varphi f_1(x)\times D\varphi f_2(x)|$, which therefore provides the image of $f_3$. Note that, from the properties of the cross product, we have, for any $z\in\mathbb{R}^3$,
\[ (D\varphi f_1\times D\varphi f_2)^Tz = \det(D\varphi f_1, D\varphi f_2, z) = \det(D\varphi)\,\det(f_1, f_2, D\varphi^{-1}z) = \det(D\varphi)\,(f_1\times f_2)^TD\varphi^{-1}z, \]
which yields $D\varphi f_1\times D\varphi f_2 = \det(D\varphi)\,(D\varphi)^{-T}f_3$. This implies that
\[ D\varphi f_1\times D\varphi f_2/|D\varphi f_1\times D\varphi f_2| = (D\varphi)^{-T}f_3/|(D\varphi)^{-T}f_3|, \]
so that $f_3$ is transformed by
\[ f_3\mapsto f_3' = \big((D\varphi)^{-T}f_3/|(D\varphi)^{-T}f_3|\big)\circ\varphi^{-1}. \]

Given $f_1'$ and $f_3'$, $f_2'$ is uniquely specified by the constraint that $(f_1', f_2', f_3')$ is orthonormal and positively oriented, namely $f_2' = f_3'\times f_1'$. This provides our action $\varphi.f = f'$.

From this discussion, we see that the deformation applied to a frame requires the introduction of the two normalized actions on vector fields that we discussed in the previous section. This is good news, since it implies that we have already made most of the computations that will be needed in this section. However, we must decide which matching function $D(\varphi.f,\tilde f)$ is most suitable for comparing frames.

Let us fix some notation for the two normalized actions on vector fields. We will let $\varphi*f = \big(D\varphi f/|D\varphi f|\big)\circ\varphi^{-1}$ and $\varphi\star f = \big(D\varphi^{-T}f/|D\varphi^{-T}f|\big)\circ\varphi^{-1}$, so that the action on frames is

\[ (57)\qquad \varphi.(f_1, f_2, f_3) = \big(\varphi*f_1,\ (\varphi\star f_3)\times(\varphi*f_1),\ \varphi\star f_3\big). \]
From the previous section, we can write
\[ \frac{d}{d\varepsilon}\Big[(\mathrm{id}+\varepsilon h).(f_1,f_2,f_3)\Big]_{\varepsilon=0} = (w_1, w_2, w_3) \]
with
\[ (58)\qquad w_1 = (\mathrm{Id}_d - f_1f_1^T)(Dh\,f_1 - Df_1\,h), \]
\[ (59)\qquad w_3 = -(\mathrm{Id}_d - f_3f_3^T)(Dh^Tf_3 + Df_3\,h), \]

\[ (60)\qquad w_2 = w_3\times f_1 + f_3\times w_1. \]
We now discuss possible choices for the distance D between 3D frames. It is natural to consider distances related to the rotation angle θ from frame f to f′, which can be computed by (identifying frames with 3 by 3 orthogonal matrices)
\[ (61)\qquad \mathrm{trace}(f^Tf') = 1 + 2\cos\theta, \]
which can also be written $\mathrm{trace}(\mathrm{Id}_3 - f^Tf') = 2(1-\cos\theta)$. Since the latter quantity is always nonnegative and vanishes only when f = f′, it is a possible candidate for our function D, yielding
\[ U(\varphi) = D(\varphi.f, f') := \int_\Omega \mathrm{trace}\big(\mathrm{Id}_3 - (\varphi.f)^Tf'\big)\,dx. \]
The expression of the Eulerian differential is
\[ \big(\bar\partial^V U(\varphi)\,\big|\,h\big) = -\int_\Omega \mathrm{trace}(w^Tf')\,dx, \]
where $w = (w_1, w_2, w_3)$ is given by Equations (58)–(60) with f replaced by ϕ.f. The differential of h can be removed using the divergence theorem, as for vector field matching. We leave the (lengthy) computation to the reader.
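For reference, here is a direct pointwise transcription of (58)–(60); a minimal sketch in which all arguments are the values of the fields and their differentials at a single point (the function name is ours):

```python
import numpy as np

def frame_variation(f1, f3, h, Dh, Df1, Df3):
    # (58): w1 = (Id - f1 f1^T)(Dh f1 - Df1 h)
    # (59): w3 = -(Id - f3 f3^T)(Dh^T f3 + Df3 h)
    # (60): w2 = w3 x f1 + f3 x w1
    w1 = (np.eye(3) - np.outer(f1, f1)) @ (Dh @ f1 - Df1 @ h)
    w3 = -(np.eye(3) - np.outer(f3, f3)) @ (Dh.T @ f3 + Df3 @ h)
    w2 = np.cross(w3, f1) + np.cross(f3, w1)
    return w1, w2, w3
```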

One important application of frame matching is Diffusion Tensor Imaging, for which the frames arise from eigen-decompositions of 3 by 3 matrices. Here, the initial data is a field of 3 by 3 symmetric matrices, x ↦ S(x). The matrices can be decomposed under the form
\[ S = \lambda_1 f_1f_1^T + \lambda_2 f_2f_2^T + \lambda_3 f_3f_3^T, \]
where $\lambda_1\ge\lambda_2\ge\lambda_3$ are the eigenvalues of S and $f_1, f_2, f_3$ its eigenvectors, all depending on the variable x. Letting $f = (f_1, f_2, f_3)$ and $\varphi.f = (\tilde f_1, \tilde f_2, \tilde f_3)$ defined by (57), we can define
\[ (62)\qquad \varphi.S = \lambda_1\circ\varphi^{-1}\,\tilde f_1\tilde f_1^T + \lambda_2\circ\varphi^{-1}\,\tilde f_2\tilde f_2^T + \lambda_3\circ\varphi^{-1}\,\tilde f_3\tilde f_3^T. \]
One important feature of this action is that, although the eigen-decomposition of S is not unique (in particular when two or three eigenvalues coincide), the transformation S ↦ ϕ.S is defined without ambiguity. In fact, if we denote by $R_f(\varphi)$ the 3D rotation $\big((\varphi.f)\circ\varphi\big)f^T$, we have
\[ \varphi.S = \big(R_f(\varphi)\,S\,R_f(\varphi)^T\big)\circ\varphi^{-1}. \]

The variation $(d/d\varepsilon)R_f(\mathrm{id}+\varepsilon h)$ at ε = 0 is a skew-symmetric matrix (like any infinitesimal variation of a rotation), $\omega_f(h)$, given by
\[ \omega_f(h) = w_1f_1^T + w_2f_2^T + w_3f_3^T, \]
where $(w_1, w_2, w_3)$ come from (58)–(60). With this notation, we can compute the derivative of (id + εh).S at ε = 0, which is
\[ \frac{d}{d\varepsilon}\Big[(\mathrm{id}+\varepsilon h).S\Big]_{\varepsilon=0} = \omega_f(h)S - S\omega_f(h) - DS.h. \]
(Here DS.h is the matrix with coefficients $\nabla S_{ij}^Th$.)

This computation can be used to compute the Eulerian differential of a matching functional $U(\varphi) = D(\varphi.S, S')$. Considering the simplest choice,
\[ U(\varphi) = \int_\Omega \mathrm{trace}\big((\varphi.S - S')^2\big)\,dx, \]
which is the sum of the squared coefficients of the difference between the matrices, we get
\[ \big(\bar\partial U(\varphi)\,\big|\,h\big) = \int_\Omega \mathrm{trace}\Big((\varphi.S - S')\big(\omega_{\varphi.f}(h)(\varphi.S) - (\varphi.S)\omega_{\varphi.f}(h) - D(\varphi.S).h\big)\Big)\,dx. \]

Here again, the derivatives of h that are involved in $\omega_{\varphi.f}(h)$ can be eliminated using the divergence theorem.

2.8. Limit behavior of greedy algorithms. Greedy algorithms, as we presented them here, have a series of very nice features. First, they operate by integrating a differential equation forward in time, which leads to fast and reasonably simple computer programs. Second, because of the action of the reproducing kernel, they provide smooth solutions at all times, which, combined with the fact that they travel along descent lines for some objective function (U), generally guarantees numerical stability, at least in finite time. Maybe the most important feature is that their output is the flow of a differential equation, hence a diffeomorphism at all times. As we will see later, other algorithms designed to provide diffeomorphisms may be much more computationally intensive.

However, the infinite-time behavior of such algorithms is less obvious. One thing we have remarked is that $U(\varphi_t)$ decreases with time, so that, if U is bounded from below (which is the least one can ask from a minimization problem), $U(\varphi_t)$ has a limit when t tends to infinity. One can therefore expect that its derivative, which is the norm (in V) of the left-hand term of the differential equation, vanishes for infinite t, so that variations of $\varphi_t$ become infinitesimally small. This certainly results in a freezing of any numerical integration procedure as time goes by.

But this does not guarantee convergence, since slowly varying sequences may still diverge at infinity. In fact (note that this discussion is placed at a highly heuristic level), the limit behavior should strongly depend on the context in which the procedure is used. For point matching, or discretized curves or surfaces, because the problem is finite dimensional, one can expect that the greedy gradient descent will converge to some solution in $G_V$ which achieves exact matching (thus a null value for the cost function), although this still requires a rigorous proof. However, there is an infinite number of solutions to the exact matching problem within $G_V$, and it is hard to guess which one is selected by this algorithm. The image matching and higher dimensional cases are more complex, because it is not even guaranteed that an exact matching solution exists within $G_V$. The asymptotic behavior of the algorithm is an open problem. However, this algorithm (with an adequate stopping rule) generally provides good and smooth matching results, as illustrated in [7, 26, 4].

3. Regularized matching

3.1. Small deformations. A standard way to obtain a (hopefully) unique and smooth solution of a matching problem is to add a penalty term to the matching functional. (This term was denoted δ(ϕ) in (29).) A large variety of methods have been designed, in approximation theory, statistics or signal processing, for solving ill-posed problems. The simplest (and typical) form of penalty function is

\[ \rho(\varphi) = \|\varphi - \mathrm{id}\|_H^2 \]
for some Hilbert (or Banach) space of functions H. Some more complex functions of ϕ − id may also be devised, related to energies of non-linear elasticity (see, among others, [2, 3, 1, 12, 9, 22, 15]). Such methods may be called “small deformation” methods because they work on the deviation u = ϕ − id, and controlling the size or smoothness of u alone is not enough to guarantee that ϕ is a diffeomorphism (unless u is small, as we have seen in section 1 of chapter 2). For example, there is, in general, no way of proving the existence of a solution of the minimization problem within some group of diffeomorphisms G, unless some restrictive assumptions are made on the objects to be matched. Since we want to focus this course on diffeomorphic matching, we shall not detail these methods much here. However, it is interesting to compute the Eulerian gradients of such functionals, and we shall do this in a simple situation, since the computations rapidly become very heavy.

So, consider the function $\rho(\varphi) = \int_\Omega \|D\varphi(x)\|^2\,dx$, where the matrix norm is
\[ \|A\|^2 = \mathrm{trace}(A^TA) = \sum_{i,j} a_{ij}^2 \]

(Hilbert-Schmidt norm). We have

\[ \Big(\frac{\delta\rho}{\delta\varphi}(\varphi)\Big|\,h\Big) = 2\int_\Omega \mathrm{trace}(D\varphi^TDh)\,dx \]
and
\[ \big(\bar\partial\rho(\varphi)\,\big|\,h\big) = 2\int_\Omega \mathrm{trace}\big(D\varphi^TD(h\circ\varphi)\big)\,dx = 2\int_\Omega \mathrm{trace}\big(D\varphi^T(Dh\circ\varphi)\,D\varphi\big)\,dx. \]
Let $h^{(\alpha)}$ be the αth component of the $\mathbb{R}^d$-valued vector field h, α = 1, …, d, and let $e_\alpha$ be the αth element of the canonical basis of $\mathbb{R}^d$, so that
\[ (63)\qquad h^{(\alpha)}(x) = e_\alpha^Th(x) = \langle K_xe_\alpha, h\rangle_V = \big\langle K_x^{(\alpha)}, h\big\rangle_V, \]
where K is the reproducing kernel of V and $K_x^{(\alpha)}$ is the αth column of $K_x$. Recalling that $K_x(\cdot) = K(\cdot,x)$ is a d × d matrix, we define $\partial_\beta K(\cdot,x)$ as the partial derivative of K(·, x) with respect to $x_\beta$. A formal differentiation of (63) yields
\[ \frac{\partial h^{(\alpha)}}{\partial x_\beta}(x) = \big\langle \partial_\beta K^{(\alpha)}(\cdot,x), h\big\rangle_V \]
(this computation can be made rigorous under sufficient smoothness conditions on K). Now,

\[ \mathrm{trace}\big(D\varphi(x)^TDh(\varphi(x))D\varphi(x)\big) = \sum_{i,\alpha,\beta=1}^d \frac{\partial\varphi^{(\alpha)}}{\partial x_i}\frac{\partial\varphi^{(\beta)}}{\partial x_i}\frac{\partial h^{(\alpha)}}{\partial x_\beta}(\varphi(x)) = \sum_{i,\alpha,\beta=1}^d \frac{\partial\varphi^{(\alpha)}}{\partial x_i}\frac{\partial\varphi^{(\beta)}}{\partial x_i}\big\langle \partial_\beta K^{(\alpha)}(\cdot,\varphi(x)), h\big\rangle_V. \]
By linearity, this yields the Eulerian gradient of ρ on V:

\[ (64)\qquad \nabla^V_\varphi\rho = 2\int_\Omega \sum_{i,\alpha,\beta=1}^d \frac{\partial\varphi^{(\alpha)}}{\partial x_i}\frac{\partial\varphi^{(\beta)}}{\partial x_i}\,\partial_\beta K^{(\alpha)}(\cdot,\varphi(x))\,dx. \]
Another, equivalent, point of view is to rewrite the differential using Lemma 5. This can be used, for example, to yield a new greedy image matching algorithm which includes a regularization term (a similar algorithm may easily be written for point matching).

Algorithm 4. The following procedure is a Eulerian gradient descent, on V, for the energy
\[ U(\varphi) = \int_\Omega \|D\varphi(x)\|^2\,dx + \frac{1}{\sigma^2}\int_\Omega \big(I\circ\varphi^{-1}(x) - I'(x)\big)^2\,dx: \]

Start with an initial $\varphi_0 = \mathrm{id}$ and integrate the differential equation
\[ (65)\qquad \frac{d}{dt}\varphi_t(y) = -2\int_\Omega \sum_{i,\alpha,\beta=1}^d \frac{\partial\varphi_t^{(\alpha)}}{\partial x_i}\frac{\partial\varphi_t^{(\beta)}}{\partial x_i}\,\partial_\beta K^{(\alpha)}(\varphi_t(y),\varphi_t(x))\,dx + \frac{2}{\sigma^2}\int_\Omega \big(I_t(x) - I'(x)\big)\,K(\varphi_t(y),x)\,\nabla I_t(x)\,dx \]
with $I_t = I\circ\varphi_t^{-1}$.
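The regularization term entering Algorithm 4 is straightforward to discretize on a grid; here is a minimal finite-difference sketch of $\rho(\varphi) = \int_\Omega\|D\varphi(x)\|^2\,dx$ in two dimensions (the discretization choices and names are ours):

```python
import numpy as np

def dirichlet_penalty(phi, h=1.0):
    # phi: (n, m, 2) values of a planar map on a regular grid with spacing h.
    # Approximates int ||D phi||^2 dx with central differences.
    total = 0.0
    for a in range(2):                  # the two components of phi
        d0, d1 = np.gradient(phi[..., a], h)
        total += np.sum(d0 ** 2 + d1 ** 2)
    return total * h ** 2               # cell area
```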

This algorithm, which, like the previous greedy procedures, has the fundamental interest of providing a smooth flow of diffeomorphisms to minimize the matching functional, suffers from the same limitations as its predecessors concerning its limit behavior. These limitations are essentially due to the fact that the variational problem itself is not well-posed; minimizers may not exist, and when they exist they do not necessarily correspond to diffeomorphisms. The methods we present now, because they directly work with flows of diffeomorphisms, will not have this disadvantage, although this comes at the cost of a significant increase of the computational burden.

3.2. Optimizing over flows. Instead of using a functional norm on the difference between ϕ and the identity mapping, we here consider, as a regularizing term, the distance $d_V$ which was defined in section 3.3 of chapter 2. More precisely, we set
\[ \rho(\varphi) = d_V(\mathrm{id},\varphi)^2 \]
and henceforth restrict the matching to diffeomorphisms belonging to $G_V$. In this context, we have the important theorem:

Theorem 30. Let V be a Hilbert space which is embedded in $C_0^1(\Omega,\mathbb{R}^d)$. Assume that the functional $U : G_V\to\mathbb{R}$ is bounded from below and continuous with respect to uniform convergence on compact sets. Then there always exists a minimizer of
\[ (66)\qquad E(\varphi) = d_V(\mathrm{id},\varphi)^2 + U(\varphi) \]
over $G_V$.

Continuity for uniform convergence on compact sets means that, if $\varphi_n\to\varphi$ uniformly on compact subsets of Ω, then $U(\varphi_n)\to U(\varphi)$. It is a weaker condition than continuity for pointwise convergence, which is satisfied for most of the matching terms we have considered. However, we have seen some cases in which U was also a function of the derivatives of ϕ, which would require, in order that $U(\varphi_n)\to U(\varphi)$, not only that $\varphi_n\to\varphi$ but also $D\varphi_n\to D\varphi$, and maybe convergence of higher derivatives. The theorem can be modified to handle this situation, simply by assuming that V is embedded in $C_0^p(\Omega,\mathbb{R}^d)$ for p strictly larger than the number of derivatives for which convergence is required. All the examples we have considered so far are covered by this theorem, sometimes with the need of adding one derivative to the embedding of V.

Proof. E has an infimum u over GV , since it is bounded from below. We first use the following lemma:

Lemma 6. Minimizing $E(\varphi) = d(\mathrm{id},\varphi)^2 + U(\varphi)$ over $G_V$ is equivalent to minimizing the function
\[ \tilde E(v) = \int_0^1 \|v_t\|_V^2\,dt + U(\varphi_{01}^v) \]
over $\mathcal{X}_V^1(\Omega)$.

Let us prove this lemma. For $v\in\mathcal{X}_V^1(\Omega)$, we have, by definition of the distance,
\[ d(\mathrm{id},\varphi_{01}^v)^2 \le \int_0^1 \|v_t\|_V^2\,dt, \]

which implies $E(\varphi_{01}^v)\le\tilde E(v)$, and hence $\inf_{G_V}E\le\tilde E(v)$; since this is true for all $v\in\mathcal{X}_V^1(\Omega)$, we have $\inf E\le\inf\tilde E$. Now assume that ϕ is such that $E(\varphi)\le\inf E+\varepsilon$. Then, by definition of the distance, there exists v such that
\[ \int_0^1 \|v_t\|_V^2\,dt \le d(\mathrm{id},\varphi)^2 + \varepsilon, \]
which implies that $\tilde E(v)\le E(\varphi)+\varepsilon\le\inf E+2\varepsilon$. Since the left-hand term is larger than $\inf\tilde E$, we have $\inf\tilde E\le\inf E+2\varepsilon$ for all ε > 0, which implies that $\inf\tilde E\le\inf E$. We therefore have $\inf E = \inf\tilde E$.

Moreover, if there exists v such that $\tilde E(v) = \min\tilde E = \inf E$, then, since we know that $E(\varphi_{01}^v)\le\tilde E(v)$, we must have $E(\varphi_{01}^v) = \min E$. Conversely, if $E(\varphi) = \min E$, then, by Theorem 28, $E(\varphi) = \tilde E(v)$ for some v with $\varphi = \varphi_{01}^v$, and this v must achieve the infimum of $\tilde E$, which proves the lemma.

This lemma shows that it suffices to study the minima of $\tilde E$. Now, as in Theorem 28, one can find, by taking a subsequence of a minimizing sequence, a sequence $v^n$ in $\mathcal{X}_V^1(\Omega)$ which converges weakly to some $v\in\mathcal{X}_V^1(\Omega)$, with $\tilde E(v^n)$ tending to u. Since
\[ \liminf \int_0^1 \|v_t^n\|_V^2\,dt \ge \int_0^1 \|v_t\|_V^2\,dt, \]
and since weak convergence in $\mathcal{X}_V^1(\Omega)$ implies convergence of the flow on compact sets (Theorem 24), we also have $U(\varphi_{01}^{v^n})\to U(\varphi_{01}^v)$, so that $\tilde E(v) = u$ and v is a minimizer. ∎

3.3. Euler–Lagrange equations and gradient. We now detail the computation of the gradient for energies like (66). This computation is the starting point of all optimization algorithms. We refer, for example, to [20] for more details on how to build efficient optimization algorithms once this issue is solved.

As remarked in the proof of Theorem 30, the variational problem which has to be solved is preferably expressed as a problem over $\mathcal{X}_V^1(\Omega)$. The function which is minimized over this space takes the form
\[ E(v) = \int_0^1 \|v_t\|_V^2\,dt + U(\varphi_{01}^v). \]
What we will be looking for is the expression of the gradient for the Hilbert structure of $\mathcal{X}_V^1(\Omega)$, which is the natural inner-product structure of the problem. It is a function, denoted $\nabla^VE : v\mapsto\nabla_v^VE\in\mathcal{X}_V^1(\Omega)$, defined by the formula, valid for v, h in $\mathcal{X}_V^1(\Omega)$,
\[ \frac{d}{d\varepsilon}E(v+\varepsilon h)\Big|_{\varepsilon=0} = \big\langle\nabla_v^VE, h\big\rangle_{\mathcal{X}_V^1} = \int_0^1 \big\langle(\nabla_v^VE)_t, h_t\big\rangle_V\,dt. \]
Since the set V is fixed in this section, we will drop the exponent from the notation and simply refer to the gradient ∇E(v). Note that this is different from the Eulerian gradient we have dealt with before, the present definition being closer to the usual definition of the gradient. One important difference is that the gradient we define here is an element of $\mathcal{X}_V^1(\Omega)$, hence a time-dependent vector field, whereas the Eulerian gradient simply was an element of V (a vector field on Ω). We nonetheless have the following theorem that relates the two. For this, we need to introduce the following operation of diffeomorphisms on vector fields.

Definition 11. Let ϕ be a diffeomorphism of Ω and v a vector field on Ω. We denote by $\mathrm{Ad}_\varphi v$ the vector field on Ω defined by
\[ (67)\qquad \mathrm{Ad}_\varphi v(x) = \big(D\varphi\circ\varphi^{-1}\big)\,v\circ\varphi^{-1}(x). \]

$\mathrm{Ad}_\varphi v$ is called the Adjoint of v by ϕ. If v ∈ V, we will also define $\mathrm{Ad}_\varphi^Tv$ by the formula
\[ \langle v, \mathrm{Ad}_\varphi w\rangle_V = \big\langle\mathrm{Ad}_\varphi^Tv, w\big\rangle_V \]
for all w ∈ V (or at least for w in a dense subset of V). With this notation, we have the following theorem.

Theorem 31. Let U be a function defined on $G_V$. Then the Eulerian gradient of U and its $\mathcal{X}_V^1$ gradient are related by the formula
\[ (68)\qquad \nabla U(v)(t,x) = \mathrm{Ad}^T_{\varphi^v_{t1}}\nabla U(\varphi^v_{01})(x). \]
Proof. We have, using Theorem 22 in Chapter 2,
\[ \frac{d}{d\varepsilon}\varphi^{v+\varepsilon h}_{01}(x)\Big|_{\varepsilon=0} = \int_0^1 D\varphi^v_{u1}(\varphi^v_{0u}(x))\,h_u\circ\varphi^v_{0u}(x)\,du = \int_0^1 \big(D\varphi^v_{u1}\circ\varphi^v_{1u}\;h_u\circ\varphi^v_{1u}\big)\circ\varphi^v_{01}(x)\,du = \int_0^1 \big(\mathrm{Ad}_{\varphi^v_{u1}}h_u\big)\circ\varphi^v_{01}(x)\,du. \]
Letting $w = \int_0^1 \mathrm{Ad}_{\varphi^v_{u1}}h_u\,du$, we have
\[ \frac{d}{d\varepsilon}U(\varphi^{v+\varepsilon h}_{01})\Big|_{\varepsilon=0} = \frac{d}{d\varepsilon}U\big((\mathrm{id}+\varepsilon w)\circ\varphi^v_{01} + o(\varepsilon)\big)\Big|_{\varepsilon=0} = \big\langle\nabla U(\varphi^v_{01}), w\big\rangle_V = \int_0^1 \big\langle\nabla U(\varphi^v_{01}), \mathrm{Ad}_{\varphi^v_{u1}}h_u\big\rangle_V du = \int_0^1 \big\langle\mathrm{Ad}^T_{\varphi^v_{u1}}\nabla U(\varphi^v_{01}), h_u\big\rangle_V du, \]
which proves (68). ∎

This is a very important result, which has the following simple consequences.

Proposition 15. If $v\in\mathcal{X}_V^1(\Omega)$ is a minimizer of
\[ \tilde E(v) = \int_0^1 \|v_t\|_V^2\,dt + U(\varphi^v_{01}), \]
then, for all t,
\[ (69)\qquad v_t = \mathrm{Ad}^T_{\varphi^v_{t1}}v_1 \]
with $v_1 = -\frac12\nabla U(\varphi^v_{01})$.

Corollary 3. If $v\in\mathcal{X}_V^1(\Omega)$ is a minimizer of
\[ \tilde E(v) = \int_0^1 \|v_t\|_V^2\,dt + U(\varphi^v_{01}), \]
then, for all t,
\[ (70)\qquad v_t = \mathrm{Ad}^T_{\varphi^v_{t0}}v_0. \]
Proposition 15 is a direct consequence of Theorem 31. For the corollary, we need to use the fact that $\mathrm{Ad}_\varphi\mathrm{Ad}_\psi = \mathrm{Ad}_{\varphi\circ\psi}$, which can be checked by direct computation, and write
\[ v_t = \mathrm{Ad}^T_{\varphi^v_{t1}}v_1 = \mathrm{Ad}^T_{\varphi^v_{t1}}\mathrm{Ad}^T_{\varphi^v_{10}}v_0 = \big(\mathrm{Ad}_{\varphi^v_{10}}\mathrm{Ad}_{\varphi^v_{t1}}\big)^Tv_0 = \mathrm{Ad}^T_{\varphi^v_{t0}}v_0. \]
Proposition 15 is an expression of what is called the Pontryagin principle in optimal control. The equations $v_t = \mathrm{Ad}^T_{\varphi^v_{t0}}v_0$ and $v_1 = -\frac12\nabla U(\varphi^v_{01})$ are equivalent to the Euler–Lagrange equations for $\tilde E$ and will lead to interesting numerical procedures. Equation (70) is a cornerstone of the theory. It is the expression of a general mechanical property called the conservation of momentum, to which we will return later in these notes.

The transpose of the Adjoint can be put into a more computational form.

Theorem 32. Let v = Km for $m\in V^*$. Let $(e_1,\dots,e_d)$ be an orthonormal basis of $\mathbb{R}^d$. Then, for y ∈ Ω, we have
\[ (71)\qquad \mathrm{Ad}_\varphi^Tv(y) = \sum_{i=1}^d \big(m\,\big|\,\mathrm{Ad}_\varphi K(\cdot,y)e_i\big)\,e_i. \]

Proof. Take v ∈ V and $m = K^{-1}v\in V^*$. For $b\in\mathbb{R}^d$, we have
\[ b^T\mathrm{Ad}_\varphi^Tv(y) = \big\langle\mathrm{Ad}_\varphi^Tv, K(\cdot,y)b\big\rangle_V = \langle v, \mathrm{Ad}_\varphi K(\cdot,y)b\rangle_V = \big(m\,\big|\,\mathrm{Ad}_\varphi K(\cdot,y)b\big). \]
Theorem 32 is now a consequence of the decomposition
\[ \mathrm{Ad}_\varphi^Tv(y) = \sum_{i=1}^d e_i^T\mathrm{Ad}_\varphi^Tv(y)\,e_i. \qquad\blacksquare \]

Recall that K(·,·) is a matrix, so that $K(\cdot,y)e_i$ is the ith column of K(·, y), which will be denoted $K^i$. Equation (71) states that the ith coordinate of $\mathrm{Ad}_\varphi^Tv$ is $\big(m\,\big|\,\mathrm{Ad}_\varphi K^i(\cdot,y)\big)$. Using Proposition 15 and Theorem 32, we obtain the expression of the V-gradient of E:

Corollary 4. The V gradient of
\[ E(v) = \int_0^1 \|v(t)\|^2\,dt + U(\varphi^v_{01}) \]
is equal to
\[ (72)\qquad 2v_t + \sum_{i=1}^d \big(m_1\,\big|\,D\varphi_{t1}^{-1}K^i(\varphi_{t1}(\cdot),y)\big)\,e_i \]
with $m_1 = \bar\partial U(\varphi^v_{01})$.

3.4. Conservation of momentum.
3.4.1. Evolution of the derivatives of the diffeomorphism. We now explore a few consequences of Corollary 3. Combining Equation (71) with the fact that $d\varphi^v_{0t}/dt = v_t\circ\varphi^v_{0t}$, we get, for the optimal v (letting $v_0 = Km_0$),
\[ \frac{d\varphi^v_{0t}}{dt}(y) = \sum_{i=1}^d \big(m_0\,\big|\,(D\varphi^v_{0t})^{-1}K^i(\varphi^v_{0t}(\cdot),\varphi^v_{0t}(y))\big)\,e_i. \]
This implies that the evolution of the diffeomorphism at any point depends on the diffeomorphism and its derivative everywhere (this is a global equation). Since it depends on the differential of $\varphi^v_{0t}$, it is natural to combine it with the evolution of $D\varphi^v_{0t}$, which here takes a particularly simple form. Indeed, formally taking the differential with respect to y, we get, for $a\in\mathbb{R}^d$,
\[ \frac{d}{dt}D\varphi^v_{0t}(y)a = \sum_{i=1}^d \big(m_0\,\big|\,(D\varphi^v_{0t})^{-1}D_2K^i(\varphi^v_{0t}(\cdot),\varphi^v_{0t}(y))(D\varphi^v_{0t}(y)a)\big)\,e_i. \]

Differentiating “under” $m_0$ requires some care, however. As we have seen, the expression $(m_0|u)$ may depend on derivatives of u and will typically satisfy a continuity bound
\[ (m_0|u) \le C\|u\|_{p,\infty} \]
for some p. (The examples we have seen so far corresponded to either p = 0 or p = 1.) Let u depend on an additional parameter, say θ. We want to discuss the validity of the equation
\[ \frac{d}{d\theta}(m_0|u_\theta) = \Big(m_0\,\Big|\,\frac{\partial u_\theta}{\partial\theta}\Big), \]
which corresponds to
\[ \lim_{\varepsilon\to 0}\big(m_0\,\big|\,(u_{\theta+\varepsilon}-u_\theta)/\varepsilon - \partial u_\theta/\partial\theta\big) = 0. \]
Since we can only ensure that
\[ \big(m_0\,\big|\,(u_{\theta+\varepsilon}-u_\theta)/\varepsilon - \partial u_\theta/\partial\theta\big) \le C\,\big\|(u_{\theta+\varepsilon}-u_\theta)/\varepsilon - \partial u_\theta/\partial\theta\big\|_{p,\infty}, \]
we see that we need to ensure that all derivatives up to order p of u (with respect to the variable x) are differentiable in the parameter θ, in general with uniform convergence of the increment rate in θ.

In the case we consider, $D\varphi^v_{0t}(y)a$ is the derivative of $\varphi^v_{0t}(y+\theta a)$, so that the function $u_\theta$ is given by (fixing the coordinate i)
\[ u_\theta(x) = (D\varphi^v_{0t}(x))^{-1}K^i(\varphi^v_{0t}(x),\varphi^v_{0t}(y+\theta a)). \]

We see that the existence of the pth derivative of $u_\theta$ requires the existence of the pth derivative of K and of the (p+1)st derivative of ϕ. From Theorem 21, we see that it suffices to assume that V is continuously embedded in $C_0^{p+1}(\Omega,\mathbb{R}^d)$. Moreover, once this is assumed, we can iterate the argument to higher derivatives (up to p+1) of $\varphi^v_{st}$. This is summarized in the following result.

Theorem 33. Assume that v satisfies Equation (70) with v0 = Km0. Assume that

\[ (m_0|u) \le C\|u\|_{p-1,\infty} \]
for some constant C, and that V is continuously embedded in $C_0^p$. Then the diffeomorphism and its first p differentials evolve according to the following differential system (denoting $K_z^i : y\mapsto K^i(z,y)$):
\[ (73)\qquad \begin{cases} \dfrac{d\varphi^v_{0t}}{dt}(y) = \displaystyle\sum_{i=1}^d \big(m_0\,\big|\,(D\varphi^v_{0t})^{-1}K^i(\varphi^v_{0t}(\cdot),\varphi^v_{0t}(y))\big)\,e_i \\[3mm] \dfrac{d}{dt}D\varphi^v_{0t}(y)a = \displaystyle\sum_{i=1}^d \big(m_0\,\big|\,(D\varphi^v_{0t})^{-1}D_2K^i(\varphi^v_{0t}(\cdot),\varphi^v_{0t}(y))(D\varphi^v_{0t}(y)a)\big)\,e_i \\[2mm] \qquad\vdots \\[2mm] \dfrac{d}{dt}D^p\varphi^v_{0t}(y)(a_1,\dots,a_p) = \displaystyle\sum_{i=1}^d \Big(m_0\,\Big|\,(D\varphi^v_{0t})^{-1}D^p\big(K^i_{\varphi^v_{0t}(\cdot)}\big)(\varphi^v_{0t}(y))(a_1,\dots,a_p)\Big)\,e_i \end{cases} \]

The interest of this theorem is that it expresses the family $(\varphi_{0t}, D\varphi_{0t},\dots,D^p\varphi_{0t})$ as the solution of a functional ordinary differential equation, $(d/dt)(\varphi_{0t}, M^{(1)}_{0t},\dots,M^{(p)}_{0t}) = F(\varphi_{0t}, M^{(1)}_{0t},\dots,M^{(p)}_{0t})$. We now provide an alternate form of this result.

Introduce the linear form $(\mu_t|w) = \big(m_0\,\big|\,D\varphi_{0t}^{-1}w\big)$ and denote $M_t = (D\varphi_{0t})^{-1}$. Using $M_tD\varphi_{0t} = \mathrm{Id}_d$, we have
\[ \frac{dM_t}{dt} = -M_t\Big(\frac{d}{dt}D\varphi_{0t}\Big)M_t. \]
Using the second equation in (73), this yields
\[ \frac{dM_t}{dt}(y)a = -\sum_{i=1}^d \big(m_0\,\big|\,M(\cdot)\,D_2K^i(\varphi(\cdot),\varphi(y))(M(y)a)\big)\,M(y)e_i. \]
This implies that, for any w ∈ V,
\[ \Big(\frac{d\mu_t}{dt}\Big|\,w\Big) = -\Big(m_0\,\Big|\,\frac{dM_t}{dt}w\Big) = \sum_{i=1}^d \Big(m_0\,\Big|\,\big(m_0\,\big|\,M(x)D_2K^i(\varphi(x),\varphi(y))w(y)\big)_x\,M(y)e_i\Big)_y = -\sum_{i=1}^d \Big(\mu_t\,\Big|\,\big(\mu_t\,\big|\,D_2K^i(\varphi(x),\varphi(y))w(y)\big)_x\,e_i\Big)_y, \]
where the subscripts x and y refer to the variable on which the linear form is applied. We therefore have the system
\[ (74)\qquad \frac{d\varphi_t}{dt}(y) = \sum_{i=1}^d \big(\mu_t\,\big|\,K^i(\varphi_t(\cdot),\varphi_t(y))\big)\,e_i, \qquad \Big(\frac{d\mu_t}{dt}\Big|\,w\Big) = -\sum_{i=1}^d \Big(\mu_t\,\Big|\,\big(\mu_t\,\big|\,D_2K^i(\varphi_t(x),\varphi_t(y))w(y)\big)_x\,e_i\Big)_y. \]
When $(m_0|w)$ does not rely on the derivatives of w (more precisely, $m_0\in C_0^0(\Omega)^*$), this provides an ordinary differential equation in the variables (ϕ, µ) (of the form (d/dt)(ϕ, µ) = F(ϕ, µ)). The initial conditions are $\varphi_0 = \mathrm{id}$ and $\mu_0 = m_0$.

3.4.2. Case of measure momenta. Another interesting feature of this expression is that it can easily be reduced to a smaller number of dimensions when $m_0$ takes specific forms. As a typical example, we make the equations explicit in the case

\[ (75)\qquad m_0 = \sum_{k=1}^N z_k(0)\otimes\rho_k, \]
where $\rho_k$ is an arbitrary measure on Ω and $z_k(0)$ a vector field. (We recall the notation $(z\otimes\rho|w) = \int z(x)^Tw(x)\,\rho(dx)$.) The resulting evolution must take the form $\mu_t = \sum_{k=1}^N z_k(t)\otimes\rho_k$ (where $z_k(t) = D\varphi_{0t}^{-1}z_k(0)$). The first equation in (74) is
\[ \frac{d\varphi}{dt}(y) = \sum_{i=1}^d\sum_{k=1}^N \int_\Omega z_k(x)^TK^i(\varphi(x),\varphi(y))\,e_i\,d\rho_k(x). \]
For any matrix A with column vectors $A^i$, and any vector z, $z^TA^i$ is the ith coordinate of $A^Tz$, so that
\[ (76)\qquad \frac{d\varphi}{dt}(y) = \sum_{k=1}^N \int_\Omega K(\varphi(y),\varphi(x))\,z_k(x)\,d\rho_k(x), \]
where we have used the fact that $K(\varphi(x),\varphi(y))^T = K(\varphi(y),\varphi(x))$. The second equation in (74) becomes
\[ \Big(\frac{d\mu}{dt}\Big|\,w\Big) = -\sum_{i=1}^d \Big(\mu\,\Big|\,\big(\mu\,\big|\,D_2K^i(\varphi(x),\varphi(y))w(y)\big)_x\,e_i\Big)_y \]
\[ = -\sum_{k,l=1}^N \int_\Omega\!\!\int_\Omega \sum_{i=1}^d z_l(x)^TD_2K^i(\varphi(x),\varphi(y))w(y)\;z_k(y)^Te_i\,d\rho_l(x)\,d\rho_k(y) \]
\[ = -\sum_{k=1}^N \int_\Omega \Big(\sum_{l=1}^N\int_\Omega \sum_{i=1}^d z_k(y)^Te_i\;z_l(x)^TD_2K^i(\varphi(x),\varphi(y))\,d\rho_l(x)\Big)w(y)\,d\rho_k(y). \]
This allows us to identify $dz_k/dt$, which is given by
\[ (77)\qquad \frac{dz_k(y)}{dt} = -\sum_{l=1}^N\int_\Omega \sum_{i=1}^d z_k(y)^Te_i\,\big(D_2K^i(\varphi(x),\varphi(y))\big)^Tz_l(x)\,d\rho_l(x). \]
The equation is somewhat simpler when K is a scalar kernel, i.e., $K^i(x,y) = \Gamma(x,y)e_i$, where Γ takes real values. In this case $D_2K^i = e_i\nabla_2\Gamma^T$ (the derivative always being taken with respect to the second coordinate) and
\[ \frac{dz_k(y)}{dt} = -\sum_{l=1}^N\int_\Omega \sum_{i=1}^d z_k(y)^Te_i\,\nabla_2\Gamma(\varphi(x),\varphi(y))\,e_i^Tz_l(x)\,d\rho_l(x) = -\sum_{l=1}^N\int_\Omega \nabla_2\Gamma(\varphi(x),\varphi(y))\,z_k(y)^Tz_l(x)\,d\rho_l(x). \]
This implies that the evolution of µ can be completely described using the evolution of $z_1,\dots,z_N$. In the particular case when the $z_k$'s are constant vectors (which, as we will see, corresponds to most point matching problems), this therefore provides a finite dimensional system for the µ part.
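Specializing (76)–(77) to Dirac measures $\rho_k = \delta_{x_k}$ and a Gaussian scalar kernel $\Gamma(x,y) = \gamma(|x-y|^2)$ (the point matching setting treated below) gives a finite-dimensional system; here is a minimal numerical sketch (the kernel width `sigma` and the names are our choices):

```python
import numpy as np

def gamma(u, sigma=1.0):
    # Radial profile of a Gaussian scalar kernel Gamma(x, y) = gamma(|x-y|^2).
    return np.exp(-u / (2 * sigma ** 2))

def gamma_prime(u, sigma=1.0):
    # Derivative of gamma with respect to its (squared-distance) argument.
    return -gamma(u, sigma) / (2 * sigma ** 2)

def landmark_rhs(x, z, sigma=1.0):
    # Right-hand side of (76)-(77) for rho_k = Dirac at x_k:
    #   dx_k/dt = sum_l Gamma(x_k, x_l) z_l
    #   dz_k/dt = -2 sum_l gamma'(|x_k - x_l|^2) (z_k . z_l) (x_k - x_l)
    # x, z: arrays of shape (N, d).
    diff = x[:, None, :] - x[None, :, :]        # x_k - x_l, shape (N, N, d)
    u = np.sum(diff ** 2, axis=-1)              # squared distances
    dx = gamma(u, sigma) @ z
    inner = z @ z.T                             # z_k . z_l
    dz = -2 * np.sum((gamma_prime(u, sigma) * inner)[:, :, None] * diff, axis=1)
    return dx, dz
```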

3.5. General optimization procedures. In this section, we review a series of options for the minimization of E. Each of them is then illustrated with the point matching example, for which the energy takes the form

\[ (78)\qquad E(\varphi) = \rho(\varphi) + F(\varphi(x_1),\dots,\varphi(x_N)), \]
in which $x_1,\dots,x_N$ are fixed points in Ω.

3.5.1. Gradient descent in V. The previous computations immediately provide gradient descent equations for minimizing E with respect to the trajectory $(v_t, t\in[0,1])$, namely
\[ (79)\qquad \frac{dv_t}{d\tau} = -2v_t - \sum_{i=1}^d \big(m_1\,\big|\,D\varphi_{t1}^{-1}K^i(\varphi_{t1}(\cdot),y)\big)\,e_i \]
with $m_1 = \bar\partial U(\varphi^v_{01})$.

Application to point matching. Consider now the point matching energy. In this case, letting $U(\varphi) = F(\varphi(x_1),\dots,\varphi(x_N))$, we have
\[ m_1 = \bar\partial U(\varphi_{01}) = \sum_{i=1}^N F_i\otimes\delta_{\varphi_{01}(x_i)}, \]
where $F_i = \partial F/\partial z_i$ is the partial derivative of F with respect to its ith variable, computed at $(\varphi_{01}(x_1),\dots,\varphi_{01}(x_N))$. We therefore have, by Corollary 3,
\[ \nabla_VU(v)(t,y) = \sum_{i=1}^d \big(m_1\,\big|\,D\varphi^v_{t1}(\varphi^v_{1t}(\cdot))K(y,\varphi^v_{1t}(\cdot))e_i\big)\,e_i = \sum_{i=1}^d\sum_{j=1}^N F_j^TD\varphi^v_{t1}(\varphi^v_{0t}(x_j))K(y,\varphi^v_{0t}(x_j))e_i\,e_i = \sum_{j=1}^N K(\varphi^v_{0t}(x_j),y)\,D\varphi^v_{t1}(\varphi^v_{0t}(x_j))^TF_j, \]
so that
\[ (80)\qquad \nabla E(v)(t,y) = 2v(t,y) + 2\sum_{i=1}^N K(\varphi^v_{0t}(x_i),y)\,D\varphi^v_{t1}(\varphi^v_{0t}(x_i))^TF_i. \]
It is certainly possible to use this expression to implement a gradient descent over v, using
\[ (81)\qquad \frac{dv^\tau}{d\tau}(t,y) = -\gamma\Big(2v^\tau(t,y) + 2\sum_{i=1}^N K(\varphi^{v^\tau}_{0t}(x_i),y)\,D\varphi^{v^\tau}_{t1}(\varphi^{v^\tau}_{0t}(x_i))^TF_i\Big), \]
but one can be more efficient by using the finite dimensional nature of the problem. Indeed, we have the following result.

Proposition 16. Assume that U(ϕ) only depends on the values $\varphi(x_i)$, $x_1,\dots,x_N$ being a fixed set of points in Ω. Let
\[ \tilde E(v) = \int_0^1 \|v_t\|_V^2\,dt + U(\varphi^v_{01}) \]
for $v\in\mathcal{X}_V^1(\Omega)$. Then, for all v,
\[ \tilde E(v) \ge \tilde E(\hat v), \]
where, at each time t, $\hat v(t,\cdot)$ is the solution of the spline interpolation problem $\inf_w\|w\|_V$ with the constraints $w(\varphi^v_{0t}(x_i)) = v(t,\varphi^v_{0t}(x_i))$. In particular, $\hat v(t,\cdot)$ assumes the form
\[ (82)\qquad \hat v(t,y) = \sum_{i=1}^N K(y,\varphi^v_{0t}(x_i))\,\alpha_i(t). \]
The explicit solution of the spline problem is provided in Theorem 10. The proof of Proposition 16 is straightforward once it is noticed that $\hat v(t,\varphi^v_{0t}(x_i)) = v(t,\varphi^v_{0t}(x_i))$ for all t implies that $\varphi^{\hat v}_{0t}(x_i) = \varphi^v_{0t}(x_i)$, since both are solutions of dx/dt = v(t, x) with initial condition $x(0) = x_i$.

The issue with algorithm (81) is that the optimal form (82) is not conserved by (81). Using it directly forces one to compute the vector field $v^\tau$ at every point, and therefore to lose the property that it is uniquely specified by its values at $\varphi^v_{0t}(x_i)$. It therefore should be corrected, with the help of the projection described in Theorem 10. To reduce notation, we define $x_i^v(t) = \varphi^v_{0t}(x_i)$. The resulting algorithm is simpler to describe with discrete time steps. We assume that at time τ we have a time-dependent vector field $v^\tau$ which takes the form
\[ (83)\qquad v^\tau(t,y) = \sum_{i=1}^N K(y,x_i^{v^\tau}(t))\,\alpha_i^\tau(t). \]
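Evaluating a field of the form (82)–(83) is a direct kernel sum; a self-contained sketch for the scalar Gaussian kernel used in the other sketches (names and `sigma` are ours):

```python
import numpy as np

def spline_field(y, centers, alphas, sigma=1.0):
    # v(y) = sum_i K(y, x_i) alpha_i with K = gamma(|.|^2) Id_d.
    # y: (M, d) query points; centers: (N, d); alphas: (N, d).
    d2 = np.sum((y[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ alphas
```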

Using (81), we define

\[ \tilde v(t,y) = v^\tau(t,y) - \delta\tau\Big(2v^\tau(t,y) + 2\sum_{i=1}^N K(x_i^{v^\tau}(t),y)\,D\varphi^{v^\tau}_{t1}(x_i^{v^\tau}(t))^TF_i^\tau\Big). \]

Since this is a gradient step, we have $\tilde E(\tilde v)\le\tilde E(v^\tau)$, at least for small enough δτ. The values of $\tilde v(t,\cdot)$ are in fact only needed at the points $x_i^{\tilde v}(t) = \varphi^{\tilde v}_{0t}(x_i)$. These points are obtained by solving the differential equation

\[ (84)\qquad \frac{dx}{dt} = v^\tau(t,x) - \delta\tau\Big(2v^\tau(t,x) + 2\sum_{i=1}^N K(x_i^{v^\tau}(t),x)\,D\varphi^{v^\tau}_{t1}(x_i^{v^\tau}(t))^TF_i^\tau\Big) \]

with $x(0) = x_i$. Solving this equation simultaneously provides $x_i^{\tilde v}(t)$ and $\tilde v(x_i^{\tilde v}(t))$. Once this is done, define $v^{\tau+\delta\tau}(t,\cdot)$ to be the solution of the approximation problem $\inf_w\|w\|_V$ with $w(x_i^{\tilde v}(t)) = \tilde v(x_i^{\tilde v}(t))$, which will therefore take the form
\[ v^{\tau+\delta\tau}(t,y) = \sum_{i=1}^N K(y,x_i^{v^{\tau+\delta\tau}}(t))\,\alpha_i^{\tau+\delta\tau}(t). \]

Solving (84) requires evaluating the expression of $v^\tau$, which can be done exactly thanks to (83). It also requires computing the expression of $D\varphi^{v^\tau}_{t1}(x_i^{v^\tau}(t))$. This requires solving an auxiliary differential equation which is derived as follows. We have
\[ D\varphi^v_{t1}(x_i^v(t)) = D\varphi^v_{t1}\circ\varphi^v_{0t}(x_i) = D\varphi^v_{01}(x_i)\,D\varphi^v_{0t}(x_i)^{-1}. \]

Now, from $(d/dt)D\varphi^v_{0t} = Dv\circ\varphi^v_{0t}\,D\varphi^v_{0t}$, we deduce $(d/dt)(D\varphi^v_{0t})^{-1} = -(D\varphi^v_{0t})^{-1}Dv\circ\varphi^v_{0t}$, so that
\[ \frac{d}{dt}D\varphi^v_{t1}(x_i^v(t)) = -D\varphi^v_{01}(x_i)\,D\varphi^v_{0t}(x_i)^{-1}\,Dv(x_i^v(t)) = -D\varphi^v_{t1}(x_i^v(t))\,Dv(x_i^v(t)), \]

so that $D\varphi^v_{t1}(x_i^v(t))$ is the solution of $(d/dt)M = -M\,Dv(x_i^v(t))$ with initial condition M = Id. The matrix $Dv(x_i^v(t))$ can be computed explicitly as a function of the point trajectories $x_j^v(t)$, j = 1, …, N, using (82).

3.5.2. Gradient with respect to µ. Since we have, from (74),

\[ \frac{d\varphi_t}{dt}(y) = \sum_{i=1}^d \big(\mu_t\,\big|\,K^i(\varphi_t(\cdot),\varphi_t(y))\big)\,e_i, \]

the evolving diffeomorphism is uniquely specified if we are given the linear form $\mu_t$ at all times. Moreover, since we have
\[ v_t(y) = \sum_{i=1}^d \big(\mu_t\,\big|\,K^i(\varphi_t(\cdot),y)\big)\,e_i, \]

we can write, using the fact that $(m_t|w) = (\mu_t|w\circ\varphi)$,
\[ \|v_t\|_V^2 = (m_t|v_t) = (\mu_t|v_t\circ\varphi) = \sum_{i=1}^d \Big(\mu_t\,\Big|\,\big(\mu_t\,\big|\,K^i(\varphi_t(x),\varphi_t(y))\big)_x\,e_i\Big)_y. \]

This implies that we can formulate the minimization problem in terms of µ as: minimize

\[ E(\mu) = \sum_{i=1}^d \int_0^1 \Big(\mu_t\,\Big|\,\big(\mu_t\,\big|\,K^i(\varphi_t(x),\varphi_t(y))\big)_x\,e_i\Big)_y\,dt + U(\varphi_1) \]

under the constraint
\[ \frac{d\varphi_t}{dt}(y) = \sum_{i=1}^d \big(\mu_t\,\big|\,K^i(\varphi_t(\cdot),\varphi_t(y))\big)\,e_i. \]

Although this is apparently more complex than the original problem in terms of v, using µ can take advantage of the fact that this linear form is often simpler (lower dimensional) than the vector field. We now discuss the computation of the gradient of E with respect to µ. Consider a variation µ ↦ µ + δµ, which induces a variation ϕ ↦ ϕ + δϕ. The variation of the energy is

\[ \delta E = 2\sum_{i=1}^d\int_0^1 \Big(\delta\mu_t\,\Big|\,\big(\mu_t\,\big|\,K^i(\varphi_t(x),\varphi_t(y))\big)_x\,e_i\Big)_y\,dt + \sum_{i=1}^d\int_0^1 \Big(\mu_t\,\Big|\,\big(\mu_t\,\big|\,D_1K^i(\varphi_t(x),\varphi_t(y))\delta\varphi_t(x)\big)_x\,e_i\Big)_y\,dt \]
\[ \qquad + \sum_{i=1}^d\int_0^1 \Big(\mu_t\,\Big|\,\big(\mu_t\,\big|\,D_2K^i(\varphi_t(x),\varphi_t(y))\delta\varphi_t(y)\big)_x\,e_i\Big)_y\,dt + \Big(\frac{\delta U}{\delta\varphi}\Big|\,\delta\varphi_1\Big). \]
This takes the form
\[ (85)\qquad \delta E = \int_0^1 (\delta\mu_t|A_t)\,dt + \int_0^1 (\beta_t|\delta\varphi_t)\,dt + \Big(\frac{\delta U}{\delta\varphi}\Big|\,\delta\varphi_1\Big) \]
for some vector field $A_t$ and linear form $\beta_t$. The usual approach would then require writing
\[ (\beta_t|\delta\varphi_t) = \Big(\beta_t\,\Big|\,\frac{\delta\varphi_t}{\delta\mu}\delta\mu\Big) = \Big(\delta\mu\,\Big|\,\Big(\frac{\delta\varphi_t}{\delta\mu}\Big)^*\beta_t\Big) \]
to obtain an expression that only depends on δµ. The variation of ϕ, however, cannot be directly computed as a function of δµ, but only as the solution of the differential equation
\[ \frac{d\,\delta\varphi_t}{dt}(y) = \sum_{i=1}^d \big(\delta\mu_t\,\big|\,K^i(\varphi_t(x),\varphi_t(y))\big)_x\,e_i + \sum_{i=1}^d \big(\mu_t\,\big|\,D_1K^i(\varphi_t(x),\varphi_t(y))\delta\varphi_t(x)\big)_x\,e_i + \sum_{i=1}^d \big(\mu_t\,\big|\,D_2K^i(\varphi_t(x),\varphi_t(y))\delta\varphi_t(y)\big)_x\,e_i. \]
This takes the form
\[ (86)\qquad \frac{d\,\delta\varphi_t}{dt} = H_t\delta\mu_t + J_t\delta\varphi_t, \]
the initial condition being $\delta\varphi_0 = 0$. The following lemma provides the necessary tool to incorporate the dependency of $\delta\varphi_t$ on δµ in (85).

Lemma 7. Let B be a Banach space and consider a time-dependent linear operator $J_t : B\to B$, t ∈ [0,1]. We assume that $\sup_t\|J_t\|$ is bounded, so that the linear equation
\[ (87)\qquad \frac{dx}{dt} = J_tx \]
has a unique solution over [0,1] for any initial condition. Denote by $M_{st}h$ the solution at time t of this equation with condition $x_s = h$ at time s. Then, for any t and any $\eta\in B^*$, $M_{0t}^*\eta$ is given by the solution at time 0 of the equation
\[ (88)\qquad \frac{d\xi}{ds} = -J_s^*\xi \]
with condition $\xi_t = \eta$.

Thus $M_{0t}^*\eta$ is computed by solving (88) backwards in time. (Recall that if A : B → B is an operator, its conjugate is defined by $(A^*\eta|w) = (\eta|Aw)$.)

Proof. We first remark that, from the uniqueness of the solution, we have $M_{st}h = M_{0t}M_{0s}^{-1}h$, which implies
\[ \frac{d}{ds}M_{st}h = -M_{0t}M_{0s}^{-1}\frac{dM_{0s}}{ds}M_{0s}^{-1}h = -M_{st}J_sh. \]
Therefore, for $\eta\in B^*$,
\[ \frac{d}{ds}M_{st}^*\eta = -J_s^*M_{st}^*\eta, \]
so that $\xi_s = M_{st}^*\eta$ satisfies (88). ∎

This lemma can be applied to (86) as follows. First note that the general solution of (86) is given by
\[ \delta\varphi_t = \int_0^t M_{st}H_s\delta\mu_s\,ds. \]
This is obtained by variation of the constant on the homogeneous equation (87), and can be directly checked to satisfy (86). We can therefore rewrite (85) under the form
\[ \delta E = \int_0^1 (\delta\mu_t|A_t)\,dt + \int_0^1\!\!\int_0^t (\beta_t|M_{st}H_s\delta\mu_s)\,ds\,dt + \int_0^1 \Big(\frac{\delta U}{\delta\varphi}\Big|\,M_{t1}H_t\delta\mu_t\Big)\,dt \]
\[ = \int_0^1 (\delta\mu_t|A_t)\,dt + \int_0^1\!\!\int_s^1 (\delta\mu_s|H_s^*M_{st}^*\beta_t)\,dt\,ds + \int_0^1 \Big(\delta\mu_t\,\Big|\,H_t^*M_{t1}^*\frac{\delta U}{\delta\varphi}\Big)\,dt = \int_0^1 (\delta\mu_t|z_t)\,dt \]
with
\[ z_t = A_t + \int_t^1 H_t^*M_{ts}^*\beta_s\,ds + H_t^*M_{t1}^*\frac{\delta U}{\delta\varphi}. \]
Let $\zeta_t = \int_t^1 M_{ts}^*\beta_s\,ds + M_{t1}^*\frac{\delta U}{\delta\varphi}$. Using Lemma 7, we have
\[ \frac{d\zeta_t}{dt} = -J_t^*\zeta_t - \beta_t. \]
To summarize, the computation of δE/δµ requires the following steps:

• Compute, for the specific problem at hand, the expressions of $A_t$, $\beta_t$, $J_t$ and $H_t$.
• Compute the conjugate maps $J_t^*$ and $H_t^*$.

• Solve the following ordinary differential equation backwards in time:

\[ \frac{d\zeta_t}{dt} = -J_t^*\zeta_t - \beta_t \]

with condition $\zeta_1 = (\delta U/\delta\varphi)(\varphi_1)$ at t = 1.
• Let
\[ z_t = A_t + H_t^*\zeta_t, \]
so that $\delta E = \int_0^1 (\delta\mu_t|z_t)\,dt$.
• Update $\mu_t \to \mu_t - \varepsilon K^{-1}z_t$.
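In the finite-dimensional case ($J_t$ a matrix, $\beta_t$ and ζ vectors), the backward sweep above is a short Euler loop; a minimal sketch with hypothetical callables `J_of_t` and `beta_of_t` standing in for the problem-specific expressions:

```python
import numpy as np

def solve_adjoint(J_of_t, beta_of_t, zeta1, steps=100):
    # Integrate d(zeta)/dt = -J(t)^T zeta - beta(t) backwards from t = 1
    # (condition zeta(1) = zeta1) to t = 0, with explicit Euler steps.
    dt, zeta = 1.0 / steps, zeta1.copy()
    for n in reversed(range(steps)):
        t = (n + 1) * dt
        zeta = zeta + dt * (J_of_t(t).T @ zeta + beta_of_t(t))
    return zeta
```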

Note that the procedure can in fact be simpler due to the specific form of $\mu_t$.

Application to point matching. The minimization in terms of $t\mapsto\mu_t$ directly captures the finite dimensional nature of the problem. We must indeed have

\[ \mu_t = \sum_{k=1}^N \alpha_k(t)\otimes\delta_{x_k} \]
at all times, for some coefficients $\alpha_1,\dots,\alpha_N$. In this case, the evaluation of E(µ) gives
\[ E(\alpha) = \sum_{k,l=1}^N\int_0^1 \alpha_k(t)^TK(\varphi_t(x_k),\varphi_t(x_l))\,\alpha_l(t)\,dt + U(\varphi_1), \]
so that, letting $z_k(t) = \varphi_t(x_k)$,
\[ \delta E = 2\sum_{k,l=1}^N\int_0^1 \delta\alpha_k(t)^TK(z_k(t),z_l(t))\,\alpha_l(t)\,dt + \sum_{k,l=1}^N\int_0^1 \alpha_k(t)^T\big(D_1K(z_k,z_l)\delta z_k\big)\,\alpha_l(t)\,dt \]
\[ \qquad + \sum_{k,l=1}^N\int_0^1 \alpha_k(t)^T\big(D_2K(z_k,z_l)\delta z_l\big)\,\alpha_l(t)\,dt + \nabla U(z)^T\delta z(1). \]

This takes the form

\[ \delta E = \int_0^1 A(t)^T\delta\alpha(t)\,dt + \int_0^1 \beta(t)^T\delta z(t)\,dt + \Big(\frac{\delta U}{\delta z}\Big|\,\delta z(1)\Big) \]
with
\[ A_k(t) = \sum_{l=1}^N K(z_k(t),z_l(t))\,\alpha_l(t) \]
and
\[ \beta_k(t) = \sum_{l=1}^N \nabla_1\big(\alpha_k(t)^TK(z_k,z_l)\alpha_l(t)\big) + \nabla_2\big(\alpha_l(t)^TK(z_l,z_k)\alpha_k(t)\big). \]

Moreover, we can write
\[ \frac{d\,\delta z_k}{dt} = 2\sum_{l=1}^N K(z_k(t),z_l(t))\,\delta\alpha_l(t) + \sum_{l=1}^N D_1\big(K(z_k(t),z_l(t))\alpha_l(t)\big)\delta z_k(t) + \sum_{l=1}^N D_2\big(K(z_k(t),z_l(t))\alpha_l(t)\big)\delta z_l(t). \]
This takes the form
\[ \frac{d\,\delta z}{dt} = H_t\delta\alpha(t) + J_t\delta z(t), \]
$H_t$ being the block matrix with (k, l) block given by $2K(z_k(t),z_l(t))$ and $J_t$ being the block matrix with (k, l) block given by
\[ \sum_{l'=1}^N D_1\big(K(z_k(t),z_{l'}(t))\alpha_{l'}(t)\big)\,\delta_{kl} + D_2\big(K(z_k(t),z_l(t))\alpha_l(t)\big). \]
The general method can now be applied to this problem, namely: solve backwards in time
\[ \frac{d\zeta}{dt} = -J_t^T\zeta(t) - \beta_t \]
with condition $\zeta_1 = \delta U/\delta z(z(1))$ at t = 1. Then let
\[ h(t) = A_t + H_t^T\zeta(t), \]
so that $\delta E = \int_0^1 \delta\alpha(t)^Th(t)\,dt$. Finally, update $\alpha(t)\to\alpha(t)-\varepsilon h(t)$.

3.5.3. Gradient in the initial momentum. We now use the whole information we have on the optimal solution, since we know that the optimal $v_t$ is uniquely determined by its value at t = 0, from Equation (70). It is therefore possible to optimize with respect to $v_0$, or equivalently with respect to $\mu_0 = m_0$. This requires finding $m_0$ such that
\[ \int_0^1 \|v_t\|^2\,dt + \lambda U(\varphi^v_{01}) \]
is minimal under the constraints $d\varphi_{0t}/dt = v_t\circ\varphi_{0t}$, with $v_t = \sum_{i=1}^d\big(m_0\,\big|\,D\varphi^{-1}K^{(i)}(\cdot,y)\big)\,e_i$. The following result helps simplify this expression.

Proposition 17. If $m_0\in V^*$ and $v_t = \mathrm{Ad}^T_{\varphi^v_{0t}}Km_0$, then $\|v_t\|_V^2 = (m_0|Km_0)$. (It is in particular independent of time.)

Proof. We have $\langle v_t, v_t\rangle_V = \big(m_0\,\big|\,(D\varphi^v_{0t})^{-1}v_t\circ\varphi^v_{0t}\big)$. Using
\[ \frac{d}{dt}(D\varphi^v_{0t})^{-1} = -(D\varphi^v_{0t})^{-1}Dv_t\circ\varphi^v_{0t}, \]
we have
\[ \frac{d}{dt}\Big[(D\varphi^v_{0t})^{-1}v_t\circ\varphi^v_{0t}\Big] = -(D\varphi^v_{0t})^{-1}Dv_t\circ\varphi^v_{0t}\;v_t\circ\varphi^v_{0t} + (D\varphi^v_{0t})^{-1}Dv_t\circ\varphi^v_{0t}\;v_t\circ\varphi^v_{0t} = 0, \]
so that $(D\varphi^v_{0t})^{-1}v_t\circ\varphi^v_{0t}$ is equal at all times to its value at t = 0, which is $v_0 = Km_0$. ∎
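Proposition 17 is easy to check numerically on the landmark system: integrating (76)–(77) with `landmark_rhs` from the earlier sketch, the quantity $(m_0|Km_0) = \sum_{k,l}\Gamma(x_k,x_l)\,z_k^Tz_l$ evaluated along the trajectory should stay constant, up to discretization error. A minimal sketch:

```python
import numpy as np

def shoot(x0, z0, sigma=1.0, steps=100):
    # Forward Euler integration of the landmark system on [0, 1], using
    # landmark_rhs from the sketch above; crude but simple.
    x, z, dt = x0.copy(), z0.copy(), 1.0 / steps
    traj = [(x, z)]
    for _ in range(steps):
        dx, dz = landmark_rhs(x, z, sigma)
        x, z = x + dt * dx, z + dt * dz
        traj.append((x, z))
    return traj

def kinetic_norm(x, z, sigma=1.0):
    # ||v_t||_V^2 = sum_{k,l} Gamma(x_k, x_l) z_k . z_l (scalar kernel case).
    u = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.sum(z * (gamma(u, sigma) @ z))

# Check on random data: the values in `norms` should be nearly identical.
rng = np.random.default_rng(0)
x0, z0 = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
norms = [kinetic_norm(x, z) for (x, z) in shoot(x0, z0)]
```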

From this result we get $\int_0^1\|v_t\|^2\,dt = (m_0|Km_0)$, and the minimization problem therefore is to find $m_0$ such that
\[ E(m_0) = (m_0|Km_0) + \lambda U(\varphi_{01}) \]
is minimal, with
\[ \frac{d}{dt}\varphi_{0t}(y) = \sum_{i=1}^d \big(m_0\,\big|\,D\varphi_{0t}^{-1}K^{(i)}(\cdot,\varphi_{0t}(y))\big)\,e_i. \]

This is still a delicate problem, since the relation between $m_0$ and $\varphi_{01}$ comes from the solution of a differential equation of which $m_0$ is a parameter. To solve the optimization problem, we need as a first step to understand the relation between a small variation $m_0\to m_0+\delta m_0$ and the resulting variation $\varphi_{0t}\to\varphi_{0t}+\delta\varphi_{0t}$. Let us use for this the alternative form
\[ \frac{d\varphi}{dt}(y) = \sum_{i=1}^d \big(\mu\,\big|\,K^i(\varphi(\cdot),\varphi(y))\big)\,e_i, \qquad \Big(\frac{d\mu}{dt}\Big|\,w\Big) = -\sum_{i=1}^d \Big(\mu\,\Big|\,\big(\mu\,\big|\,D_2K^i(\varphi(x),\varphi(y))w(y)\big)_x\,e_i\Big)_y, \]
with $\mu_0 = m_0$. Computing a variation of this expression yields
\[ (89)\qquad \frac{d\,\delta\varphi}{dt}(y) = \sum_{i=1}^d \big(\delta\mu\,\big|\,K^i(\varphi(x),\varphi(y))\big)_x\,e_i + \sum_{i=1}^d \big(\mu\,\big|\,D_1K^i(\varphi(x),\varphi(y))\delta\varphi(x)\big)_x\,e_i + \sum_{i=1}^d \big(\mu\,\big|\,D_2K^i(\varphi(x),\varphi(y))\delta\varphi(y)\big)_x\,e_i, \]
\[ (90)\qquad \Big(\frac{d\,\delta\mu}{dt}\Big|\,w\Big) = -\sum_{i=1}^d \Big(\delta\mu\,\Big|\,\big(\mu\,\big|\,D_2K^i(\varphi(x),\varphi(y))w(y)\big)_x\,e_i\Big)_y - \sum_{i=1}^d \Big(\mu\,\Big|\,\big(\delta\mu\,\big|\,D_2K^i(\varphi(x),\varphi(y))w(y)\big)_x\,e_i\Big)_y \]
\[ \qquad\qquad - \sum_{i=1}^d \Big(\mu\,\Big|\,\big(\mu\,\big|\,D_1\big(D_2K^i(\varphi(x),\varphi(y))w(y)\big)\delta\varphi(x)\big)_x\,e_i\Big)_y - \sum_{i=1}^d \Big(\mu\,\Big|\,\big(\mu\,\big|\,D_2\big(D_2K^i(\varphi(x),\varphi(y))w(y)\big)\delta\varphi(y)\big)_x\,e_i\Big)_y. \]

This provides the evolution of $\delta\varphi_{0t}$ and $\delta\mu_t$, the initial conditions being $\delta\varphi_{00} = 0$ and $\delta\mu_0 = \delta m_0$. Note that if we make $m_0\to m_0+\delta m_0$ and $\varphi_{01}\to\varphi_{01}+\delta\varphi_{01}$, the variation δE can formally be written as
\[ \delta E = 2(\delta m_0|Km_0) + \lambda\Big(\frac{\delta U}{\delta\varphi_{01}}\Big|\,\frac{\delta\varphi_{01}}{\delta m_0}\delta m_0\Big), \]
where $(\delta\varphi_{01}/\delta m_0)\delta m_0$ is the operation that consists in solving the system that we have just computed with initial condition $\delta m_0$. To be able to write a gradient descent algorithm explicitly, we need to move $(\delta\varphi_{01}/\delta m_0)$ to the other side of the (·|·) pairing, in order to isolate $\delta m_0$ (in other terms, we need to compute the conjugate of $(\delta\varphi_{01}/\delta m_0)$). In finite dimensions, this can be done by representing $(\delta\varphi_{01}/\delta m_0)$ as a matrix and computing its transpose. Note that the finite dimensional situation does apply for point set problems. However, even in finite dimensions, and a fortiori in infinite dimensions, one can use Lemma 7 to compute the quantity ζ such that
\[ \Big(\frac{\delta U}{\delta\varphi_{01}}\Big|\,\frac{\delta\varphi_{01}}{\delta m_0}\delta m_0\Big) = (\delta m_0|\zeta) \]
without having to explicitly compute $(\delta\varphi_{01}/\delta m_0)$. Equations (89) and (90) indeed provide a linear system in (δϕ, δµ), of the form $(d/dt)(\delta\varphi,\delta\mu) = J_t(\delta\varphi,\delta\mu)$. Lemma 7 can therefore be applied to it to obtain a general optimization algorithm in terms of the initial momentum $m_0$. The general procedure to update a current value of $m_0$ works as follows:

1. Compute the variation $\delta U/\delta\varphi_{01}$.
2. Apply Lemma 7 to (89) and (90) to obtain ζ such that
\[ \Big(\frac{\delta U}{\delta\varphi_{01}}\Big|\,\frac{\delta\varphi_{01}}{\delta m_0}\delta m_0\Big) = (\delta m_0|\zeta). \]
3. Replace $m_0$ by $m_0 - \gamma(2Km_0 + \lambda\zeta)$.

Application to point matching. We now apply this approach to point matching problems. Since $m_0$ takes the form
\[ m_0 = \sum_{k=1}^N \alpha_k\otimes\delta_{x_k(0)}, \]
we are in the situation studied for (75) and the evolution equations are (assuming, to simplify, a scalar kernel $K(x,y) = \Gamma(x,y)\,\mathrm{Id}_d$)
\[ \frac{dx_k}{dt} = \sum_{l=1}^N \Gamma(x_k,x_l)\,\alpha_l \]
and
\[ \frac{d\alpha_k}{dt} = -\sum_{l=1}^N \nabla_2\Gamma(x_l,x_k)\,\alpha_k^T\alpha_l. \]
Using as before $\Gamma(x,y) = \gamma(|x-y|^2)$, the last equation can be rewritten
\[ \frac{d\alpha_k}{dt} = -2\sum_{l=1}^N \gamma'(|x_l-x_k|^2)\,\alpha_k^T\alpha_l\,(x_k-x_l). \]
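With `shoot` and `kinetic_norm` from the sketches above, the objective $E(m_0)$ of this section can be evaluated by forward shooting; for small point sets, a finite-difference gradient is a convenient (if crude) stand-in for the adjoint procedure of steps 1–3, e.g. to validate an implementation. Both functions below are our own illustrative helpers, assuming the endpoint cost $U(\varphi) = \sum_i|\varphi(x_i)-y_i|^2$:

```python
import numpy as np

def initial_momentum_energy(z0, x0, targets, lam=1.0, sigma=1.0):
    # E(m0) = (m0 | K m0) + lambda * U(phi_01) for point momenta z0 at x0.
    x1, _ = shoot(x0, z0, sigma)[-1]
    return kinetic_norm(x0, z0, sigma) + lam * np.sum((x1 - targets) ** 2)

def fd_gradient(z0, x0, targets, eps=1e-6, **kw):
    # Central finite differences in the initial momentum, one entry at a time.
    g = np.zeros_like(z0)
    for idx in np.ndindex(*z0.shape):
        dz = np.zeros_like(z0)
        dz[idx] = eps
        g[idx] = (initial_momentum_energy(z0 + dz, x0, targets, **kw)
                  - initial_momentum_energy(z0 - dz, x0, targets, **kw)) / (2 * eps)
    return g
```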

Let us consider the effect of a variation $\alpha_k(0)\to\alpha_k(0)+\delta\alpha_k(0)$. Denote $\gamma_{kl} = \gamma(|x_k-x_l|^2)$, and similarly for the derivatives. We have
\[ \frac{d\,\delta x_k}{dt} = \sum_{l=1}^N \gamma_{kl}\,\delta\alpha_l + 2\sum_{l=1}^N \gamma'_{kl}\,(x_k-x_l)^T(\delta x_k-\delta x_l)\,\alpha_l \]
and
\[ \frac{d\,\delta\alpha_k}{dt} = -2\sum_{l=1}^N \gamma'_{kl}\big(\delta\alpha_k^T\alpha_l + \delta\alpha_l^T\alpha_k\big)(x_k-x_l) - 2\sum_{l=1}^N \gamma'_{kl}\,\alpha_k^T\alpha_l\,(\delta x_k-\delta x_l) - 2\sum_{l=1}^N \gamma''_{kl}\,\alpha_k^T\alpha_l\,(x_k-x_l)(x_k-x_l)^T(\delta x_k-\delta x_l). \]

N X 0 T Jxx(k, l) = 2 γkqαl(xk − xq) δkl − 2αl(δxk − δxl) q=1

Jxα(k, l) = γklIdd N X 0 T T T T Jαx(k, l) = −2( γkqαk αq(Idd + (xk − xq)(xk − xq) )δkl − 2αk αl(Idd + (xk − xl)(xk − xl) ) q=1 N X 0 T 0 T Jαα(k, l) = 2( γkq(xk − xq)αq )Idd + 2γkl(xk − xl)αk q=1 so that, J being the block matrix formed with Jxx, Jxα, Jαx and Jαα d δx δx = J . dt δα δα The gradient descent methods can therefore be implemented by iterating the following steps: PN 2 (1) Compute δU/δϕ. For example, if U(ϕ) = i=1 |ϕ(xi)−yi| , take δU/δϕ = PN 2 i=1(ϕ(xi) − yi). (2) Solve backwards in time the equation d η  η  x = J T x . dt ηα ηα

(3) Replace α by α − ε(Kα + ηα(0)).

3.5.4. Solving the Euler equation. The Euler–Lagrange equations for our problem are m_1 = -\frac12\,\bar\partial U(\varphi_{01}), with (m_1\,|\,w\circ\varphi_{10}) = (\mu_1\,|\,w) and \mu_t given by the previous evolution. Note that

\big(\bar\partial U(\varphi)\,\big|\,w\circ\varphi^{-1}\big) = \Big(\frac{\delta U}{\delta\varphi}\,\Big|\,w\Big),

so that m_1 = -\frac12\,\bar\partial U(\varphi_{01}) is equivalent to

(91) \mu_1 = -\frac12\,\frac{\delta U}{\delta\varphi}(\varphi_{01}).

Since ϕ01 and µ1 can be considered as functions of µ0 = m0, one can try to solve (91) directly as a function of this initial value. The principle of the algorithm is very simple: given a current value of m0, find a variation m0 → m0 + δm0 such that (91) is true at first order, i.e.,

(92) \mu_1 + \frac{\delta\mu_1}{\delta m_0}\,\delta m_0 = -\frac12\,\frac{\delta U}{\delta\varphi}(\varphi_{01}) - \frac12\,\frac{\delta^2 U}{\delta\varphi^2}(\varphi_{01})\,\frac{\delta\varphi_{01}}{\delta m_0}\,\delta m_0.

This linear equation in δm0 needs to be solved to update the current m0. We have already described the variations (δµ1/δm0)δm0 and (δϕ01/δm0)δm0 as solutions of the system (89) and (90). However, inverting this system cannot be done as easily as computing its conjugate, as we did with Lemma 7. In fact, this Newton-type method can only really be applied to systems of small (finite!) dimension, which, in our context, will be the case for some point matching problems.

Application to point matching. Assuming that U is a function U(z_1, \ldots, z_N) with z_j = \varphi(x_j), j = 1, \ldots, N, the Euler equation is \alpha(1) + \nabla U(x(1)) = 0. For a variation \alpha(0) \to \alpha(0) + \delta\alpha(0), we have

\begin{pmatrix}\delta x(1)\\ \delta\alpha(1)\end{pmatrix} = H(1)\begin{pmatrix}0\\ \delta\alpha(0)\end{pmatrix},

where H(1) is obtained by solving the differential equation dH/dt = JH, with H(0) = Id (using the matrix J computed in the previous section). Denoting by H_{xx}(1), H_{x\alpha}(1), H_{\alpha x}(1) and H_{\alpha\alpha}(1) the four dN × dN blocks of H(1), the variation of the Euler equation yields, for the current \alpha(1) and x(1),

\alpha(1) + H_{\alpha\alpha}(1)\,\delta\alpha(0) + \nabla U(x(1)) + D^2U(x(1))\,H_{x\alpha}(1)\,\delta\alpha(0) = 0,

which yields the increment

\delta\alpha(0) = -\big(H_{\alpha\alpha}(1) + D^2U(x(1))\,H_{x\alpha}(1)\big)^{-1}\big(\alpha(1) + \nabla U(x(1))\big).

3.6. Another point matching algorithm. We can also consider the point matching problem in terms of the evolving points x(t). Let S(x(t)) = S(x_1(t), \ldots, x_N(t)) be the block matrix with (i, j) block given by K(x_i(t), x_j(t)). The constraints are dx(t)/dt = S(x(t))\alpha(t), so that \alpha(t) = S(x(t))^{-1}dx(t)/dt, and the minimization reduces to

E(x) = \int_0^1 \frac{dx(t)}{dt}^T S(x(t))^{-1}\,\frac{dx(t)}{dt}\,dt + U(x(1)).

Minimizing this function with respect to x by gradient descent is possible, and has been described in [16, 17] for labelled landmark matching. The basic computation is as follows: writing s^r_{pq} = ds_{pq}/dx_r, we have

\frac{d}{d\varepsilon}E(x + \varepsilon\xi)\Big|_{\varepsilon=0} = 2\int_0^1 \frac{dx(t)}{dt}^T S(x(t))^{-1}\,\frac{d\xi(t)}{dt}\,dt - \int_0^1 \sum_{p,q,r}\alpha_p(t)\,\alpha_r(t)\,s^r_{pq}(x(t))\,\xi_q(t)\,dt + \nabla U(x(1))^T\xi(1).

After an integration by parts in the first integral, we obtain a gradient given by

-2\,\frac{d}{dt}\Big(S(x(t))^{-1}\frac{dx(t)}{dt}\Big) - z(t) + \Big(2\,S(x(1))^{-1}\frac{dx}{dt}(1) + \nabla U(x(1))\Big)\,\delta_1(t),

where z_q(t) = \sum_{p,r}\alpha_p(t)\,\alpha_r(t)\,s^r_{pq}(x(t)) and δ1 is the Dirac measure at t = 1 (this “gradient” is in fact a generalized function).

This singular part in the gradient can be removed by computing the gradient in a Hilbert space for which the evaluation function x(·) ↦ x(1) is a continuous linear form. This method has been introduced, in particular, in [13]. Consider the space of all landmark trajectories x : t ↦ x(t) = (x_1(t), \ldots, x_N(t)) with fixed starting point x(0), free end-point x(1) and square integrable time derivative. This is a space of the form x(0) + H, where H is the Hilbert space of time dependent functions t ↦ ξ(t), considered as column vectors of size Nd, with

\langle\xi, \xi'\rangle_H = \int_0^1 \frac{d\xi}{dt}^T\frac{d\xi'}{dt}\,dt + \xi(1)^T\xi'(1).

What we need to do now, in order to obtain the gradient for this inner product, is to express \frac{d}{d\varepsilon}E(x+\varepsilon\xi)\big|_{\varepsilon=0} in the form \langle\nabla^H E_x, \xi\rangle_H. We will make the assumption that

\int_0^1 \Big|S(x(t))^{-1}\frac{dx(t)}{dt}\Big|^2 dt < \infty,

which implies that

\xi \mapsto \int_0^1 \frac{dx(t)}{dt}^T S(x(t))^{-1}\,\frac{d\xi(t)}{dt}\,dt

is continuous in ξ, since, by the Schwarz inequality,

\int_0^1 \frac{dx(t)}{dt}^T S(x(t))^{-1}\frac{d\xi(t)}{dt}\,dt \le \sqrt{\int_0^1 \Big|S(x(t))^{-1}\frac{dx(t)}{dt}\Big|^2 dt}\;\sqrt{\int_0^1 \Big|\frac{d\xi(t)}{dt}\Big|^2 dt}.

Similarly, the linear form \xi \mapsto \nabla U(x(1))^T\xi(1) is continuous, since

\nabla U(x(1))^T\xi(1) \le |\nabla U(x(1))|\,|\xi(1)| \le |\nabla U(x(1))|\,\|\xi\|_H,

and so is \xi \mapsto \int_0^1 z(t)^T\xi(t)\,dt, since, letting

\eta(t) = \int_0^t z(s)\,ds

and assuming that it is square integrable over [0, 1], we have

\int_0^1 z(t)^T\xi(t)\,dt = \eta(1)^T\xi(1) - \int_0^1 \eta(t)^T\frac{d\xi(t)}{dt}\,dt.

Thus,

\xi \mapsto \frac{d}{d\varepsilon}E(x+\varepsilon\xi)\Big|_{\varepsilon=0}

is continuous over H, and the Riesz representation theorem implies that \nabla^H_x E exists as an element of H. We now proceed to its computation. Letting

\mu(t) = 2\int_0^t S(x(s))^{-1}\,\frac{dx(s)}{ds}\,ds

and a = \nabla U(x(1)), the problem is to find \zeta \in H such that, for all \xi \in H,

\langle\zeta, \xi\rangle_H = \int_0^1 \frac{d\mu}{dt}^T\frac{d\xi}{dt}\,dt + \int_0^1 z(t)^T\xi(t)\,dt + a^T\xi(1).

The latter expression can also be written (using ξ(0) = 0)

\int_0^1 \Big(\frac{d\zeta}{dt} + \zeta(1)\Big)^T\frac{d\xi}{dt}\,dt = \int_0^1 \Big(\frac{d\mu}{dt} + \eta(1) - \eta(t) + a\Big)^T\frac{d\xi}{dt}\,dt.

This suggests selecting ζ such that ζ(0) = 0 and

\frac{d\zeta}{dt} + \zeta(1) = \frac{d\mu}{dt} + \eta(1) - \eta(t) + a,

which implies

\zeta(t) + t\,\zeta(1) = \mu(t) - \int_0^t \eta(s)\,ds + t\,(\eta(1) + a).

At t = 1, this yields

2\,\zeta(1) = \mu(1) - \int_0^1 \eta(s)\,ds + \eta(1) + a,

and we finally obtain

\zeta(t) = \mu(t) - \int_0^t \eta(s)\,ds + \frac{t}{2}\Big(\int_0^1 \eta(s)\,ds - \mu(1) + \eta(1) + a\Big).

We summarize this in an algorithm, in which τ is again the computation time.

Algorithm 5 (Gradient descent algorithm for landmark matching). Start with initial landmark trajectories x^0(t) = (x^0_1(t), \ldots, x^0_N(t)). Solve

\frac{dx^\tau(t)}{d\tau} = -\gamma\Big(\mu^\tau(t) - \int_0^t \eta^\tau(s)\,ds + \frac{t}{2}\Big(\int_0^1 \eta^\tau(s)\,ds - \mu^\tau(1) + \eta^\tau(1) + a^\tau\Big)\Big)

with a^\tau = \nabla U(x^\tau(1)), \mu^\tau(t) = 2\int_0^t \alpha^\tau(s)\,ds, \eta^\tau(t) = \int_0^t z^\tau(s)\,ds, and

\alpha^\tau(t) = S(x^\tau(t))^{-1}\,\frac{dx^\tau(t)}{dt},

z^\tau_q(t) = \sum_{p,r} \alpha^\tau_p(t)\,\alpha^\tau_r(t)\,s^r_{pq}(x^\tau(t)).
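As an illustration, here is a sketch of Algorithm 5 for a scalar Gaussian kernel and U(x(1)) = λ Σ_i |x_i(1) − y_i|², reusing gamma and gamma_prime from the earlier sketches. The closed form used for z is the one this kernel gives for z_q = Σ_{p,r} α_p α_r s^r_{pq}; the discretization and step sizes are illustrative choices, not from the text.

import numpy as np

def landmark_matching_descent(x_init, y_target, n_t=50, n_iter=200,
                              step=0.05, sigma=1.0, lam=1.0):
    # Gradient descent on trajectories t -> x(t), following Algorithm 5.
    # x_init, y_target: (N, d) arrays. Trajectories are stored as an
    # (n_t + 1, N, d) array; x[0] stays fixed since zeta(0) = 0.
    N, d = x_init.shape
    dt = 1.0 / n_t
    t_grid = np.linspace(0.0, 1.0, n_t + 1)[:, None, None]
    x = np.repeat(x_init[None, :, :], n_t + 1, axis=0)  # constant initial path
    for _ in range(n_iter):
        xdot = np.diff(x, axis=0) / dt                  # (n_t, N, d)
        alpha = np.zeros((n_t, N, d))
        z = np.zeros((n_t, N, d))
        for t in range(n_t):
            diff = x[t][:, None, :] - x[t][None, :, :]  # x_q - x_p
            r2 = np.sum(diff**2, axis=-1)
            alpha[t] = np.linalg.solve(gamma(r2, sigma), xdot[t])  # S^{-1} dx/dt
            # For this kernel, z_q = 4 sum_p gamma'_{qp} (alpha_p . alpha_q)(x_q - x_p)
            coef = gamma_prime(r2, sigma) * (alpha[t] @ alpha[t].T)
            z[t] = 4.0 * np.sum(coef[:, :, None] * diff, axis=1)
        a = 2.0 * lam * (x[-1] - y_target)              # a = grad U(x(1))
        pad = np.zeros((1, N, d))
        mu = np.concatenate([pad, 2.0 * dt * np.cumsum(alpha, axis=0)])
        eta = np.concatenate([pad, dt * np.cumsum(z, axis=0)])
        int_eta = dt * np.cumsum(eta, axis=0)           # int_0^t eta(s) ds
        zeta = mu - int_eta + 0.5 * t_grid * (int_eta[-1] - mu[-1] + eta[-1] + a)
        x = x - step * zeta                             # descent step in tau
    return x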

3.7. Image matching. We now take an infinite dimensional example to illustrate the previously discussed methods and focus on the image matching problem. We therefore consider

U(\varphi) = \int_\Omega (I\circ\varphi^{-1} - \tilde I)^2\,dx.
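As a side remark, this matching term discretizes easily: on a grid, I ∘ ϕ⁻¹ is a warp of I, which can be evaluated by interpolation. A minimal 2D sketch (the scipy interpolation routine is an implementation choice, not part of the text):

import numpy as np
from scipy.ndimage import map_coordinates

def matching_energy(I, I_target, phi_inv, dx=1.0):
    # Discretization of U(phi) = int (I o phi^{-1} - I_tilde)^2 dx.
    # phi_inv: array of shape (2, H, W) with the grid coordinates of phi^{-1}.
    warped = map_coordinates(I, phi_inv, order=1, mode='nearest')
    return np.sum((warped - I_target) ** 2) * dx**2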

The Eulerian differential of U is given by (44):

\bar\partial U(\varphi) = -2\,(I\circ\varphi^{-1} - \tilde I)\,\nabla(I\circ\varphi^{-1})\otimes dx.

So, according to (71),

\nabla U(v)(t, y) = \sum_{i=1}^d \big(m\,\big|\,D\varphi^v_{t1}(\varphi^v_{1t}(\cdot))\,K(y, \varphi^v_{1t}(\cdot))\,e_i\big)\,e_i

(with m = \bar\partial U(\varphi^v_{01}))

= -2\sum_{i=1}^d e_i\int_\Omega (I\circ\varphi^v_{10}(x) - \tilde I(x))\,\nabla(I\circ\varphi^v_{10})(x)^T\,D\varphi^v_{t1}(\varphi^v_{1t}(x))\,K(y, \varphi^v_{1t}(x))\,e_i\,dx

= -2\sum_{i=1}^d e_i\int_\Omega (I\circ\varphi^v_{10}(x) - \tilde I(x))\,(\nabla I\circ\varphi^v_{10})(x)^T\,D\varphi^v_{10}(x)\,D\varphi^v_{t1}(\varphi^v_{1t}(x))\,K(y, \varphi^v_{1t}(x))\,e_i\,dx

= -2\sum_{i=1}^d e_i\int_\Omega (I\circ\varphi^v_{10}(x) - \tilde I(x))\,(\nabla I\circ\varphi^v_{10})(x)^T\,D\varphi^v_{t0}(\varphi^v_{1t}(x))\,K(y, \varphi^v_{1t}(x))\,e_i\,dx

= -2\sum_{i=1}^d e_i\int_\Omega (I\circ\varphi^v_{10}(x) - \tilde I(x))\,\nabla(I\circ\varphi^v_{t0})(\varphi^v_{1t}(x))^T\,K(y, \varphi^v_{1t}(x))\,e_i\,dx

= -2\sum_{i=1}^d e_i\int_\Omega (I\circ\varphi^v_{10}(x) - \tilde I(x))\,e_i^T\,K(\varphi^v_{1t}(x), y)\,\nabla(I\circ\varphi^v_{t0})(\varphi^v_{1t}(x))\,dx

= -2\int_\Omega (I\circ\varphi^v_{10}(x) - \tilde I(x))\,K(\varphi^v_{1t}(x), y)\,\nabla(I\circ\varphi^v_{t0})(\varphi^v_{1t}(x))\,dx,

where we have used D\varphi^v_{t0}(\varphi^v_{1t}(x)) = D\varphi^v_{10}(x)\,D\varphi^v_{t1}(\varphi^v_{1t}(x)) and the chain rule. This provides the expression of the V-gradient of \tilde E for image matching, namely

(93) (\nabla^V\tilde E(v))(t, y) = 2v_t(y) - 2\int_\Omega (I\circ\varphi^v_{10}(x) - \tilde I(x))\,K(\varphi^v_{1t}(x), y)\,\nabla(I\circ\varphi^v_{t0})(\varphi^v_{1t}(x))\,dx.

Using a change of variables, the integral may also be written

(94) (\nabla^V\tilde E(v))(t, y) = 2v_t(y) - 2\int_\Omega (I\circ\varphi^v_{t0}(x) - \tilde I\circ\varphi^v_{t1}(x))\,K(x, y)\,\nabla(I\circ\varphi^v_{t0})(x)\,\det(D\varphi^v_{t1}(x))\,dx.

Note that the transpose of the kernel appears in this formula (K(x, y) = K(y, x)^T), the two being the same in the case of a scalar kernel. We have thus obtained the following algorithm, in which the running time is now denoted by τ.

Algorithm 6 (Flow based gradient descent for image matching). Start with an initial guess v^0 \in X^1_V(\Omega) (for example v^0 = 0). Let v evolve according to the equation

\frac{dv^\tau_t}{d\tau}(y) = -\gamma\Big(v^\tau_t(y) - \int_\Omega K(x, y)\,\alpha^\tau_t(x)\,dx\Big)

with \alpha^\tau_t(x) = (I\circ\varphi^{v^\tau}_{t0}(x) - \tilde I\circ\varphi^{v^\tau}_{t1}(x))\,\nabla(I\circ\varphi^{v^\tau}_{t0})(x)\,\det(D\varphi^{v^\tau}_{t1}(x)).
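For concreteness, here is a minimal 2D sketch of one update of Algorithm 6 at a single time slice, under the assumption that the flow maps ϕ_{t0} and ϕ_{t1} have already been computed from the current v by integrating the flow equation; the helper names and the kernel convolution operator K_conv are hypothetical placeholders, not from the text.

import numpy as np
from scipy.ndimage import map_coordinates

def image_alpha(I, I_target, phi_t0, phi_t1, spacing=1.0):
    # alpha_t(x) of Algorithm 6 on a 2D grid (one time slice).
    # phi_t0, phi_t1: (2, H, W) coordinate arrays for phi_t0 and phi_t1.
    I_t0 = map_coordinates(I, phi_t0, order=1, mode='nearest')          # I o phi_t0
    It_t1 = map_coordinates(I_target, phi_t1, order=1, mode='nearest')  # I~ o phi_t1
    grad = np.stack(np.gradient(I_t0, spacing))                         # grad(I o phi_t0)
    g = [np.gradient(phi_t1[i], spacing) for i in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]                         # det D(phi_t1)
    return (I_t0 - It_t1) * det * grad                                  # shape (2, H, W)

def update_v(v_t, alpha_t, K_conv, step=0.1):
    # One step of dv/dtau = -gamma (v - K * alpha); K_conv is a user-supplied
    # smoothing operator realizing the kernel convolution.
    return v_t - step * (v_t - K_conv(alpha_t))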

Let’s now consider what an optimization with respect to the initial m0 would provide. First note that m_1 = (I\circ\varphi_{10} - \tilde I)\,\nabla(I\circ\varphi_{10})\otimes dx, so that

(m_1\,|\,w) = \int_\Omega (I\circ\varphi_{10} - \tilde I)\,\nabla(I\circ\varphi_{10})^T\,w(x)\,dx.

We have, letting ϕ = ϕ01 and ψ = ϕ10

(m_0\,|\,w) = \big(m_1\,\big|\,(D\psi)^{-1}\,w\circ\psi\big)

= \int_\Omega (I\circ\psi - \tilde I)\,\nabla(I\circ\psi)^T\,(D\psi)^{-1}\,w\circ\psi\,dx

= \int_\Omega (I\circ\psi - \tilde I)\,(\nabla I)^T\!\circ\psi\;\,w\circ\psi\,dx

= \int_\Omega (I - \tilde I\circ\varphi)\,(\nabla I)^T\,w\,\det(D\varphi)\,dx,

which takes the form m_0 = Z(0)\otimes dx for a vector valued function Z(0). We can therefore use the evolution equations (76) and (77) with N = 1 and \rho_1 = dx, which yields (assuming a scalar kernel K(x, y) = \Gamma(x, y)\,\mathrm{Id}_d to simplify the expressions)

\frac{d\varphi_{0t}}{dt}(y) = \int_\Omega \Gamma(\varphi_{0t}(y), \varphi_{0t}(x))\,Z(x)\,dx

and

\frac{dZ(y)}{dt} = -\int_\Omega Z(y)^T Z(x)\,\nabla_2\Gamma(\varphi(x), \varphi(y))\,dx.

Still, to avoid overly complicated expressions, let us assume that Γ is radial, of the form Γ(x, y) = γ(|x − y|²). This implies \nabla_2\Gamma(x, y) = 2\,\gamma'(|x - y|^2)\,(y - x) and

D_{22}\Gamma(x, y) = 4\,\gamma''(|x - y|^2)\,(y - x)(y - x)^T + 2\,\gamma'(|x - y|^2)\,\mathrm{Id}_d = -D_{21}\Gamma(x, y).
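These derivative formulas translate directly into code; a short sketch for the Gaussian profile, reusing gamma_prime and gamma_second from the earlier sketches (a finite-difference comparison is an easy way to validate them):

import numpy as np

def grad2_Gamma(x, y, sigma=1.0):
    # nabla_2 Gamma(x, y) = 2 gamma'(|x - y|^2) (y - x).
    r2 = np.sum((x - y) ** 2)
    return 2.0 * gamma_prime(r2, sigma) * (y - x)

def D22_Gamma(x, y, sigma=1.0):
    # D_22 Gamma = 4 gamma''(r2)(y - x)(y - x)^T + 2 gamma'(r2) Id = -D_21 Gamma.
    r2 = np.sum((x - y) ** 2)
    u = (y - x)[:, None]
    return (4.0 * gamma_second(r2, sigma) * (u @ u.T)
            + 2.0 * gamma_prime(r2, sigma) * np.eye(len(x)))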

A variation m0 → m0 + δm0 yields variations ϕ → ϕ + δϕ and Z → Z + δZ that satisfy the following equations, starting with δϕ:

\frac{d\,\delta\varphi}{dt}(y) = \int_\Omega \gamma(|\varphi(y)-\varphi(x)|^2)\,\delta Z(x)\,dx + 2\int_\Omega \gamma'(|\varphi(y)-\varphi(x)|^2)\,(\varphi(y)-\varphi(x))^T(\delta\varphi(y)-\delta\varphi(x))\,Z(x)\,dx.

For δZ, we have

\frac{d\,\delta Z}{dt}(y) = -2\int_\Omega (\delta Z(y)^T Z(x) + \delta Z(x)^T Z(y))\,\gamma'(|\varphi(x)-\varphi(y)|^2)\,(\varphi(y)-\varphi(x))\,dx - 2\int_\Omega Z(y)^T Z(x)\,\gamma'(|\varphi(x)-\varphi(y)|^2)\,(\delta\varphi(y)-\delta\varphi(x))\,dx - 4\int_\Omega Z(y)^T Z(x)\,\gamma''(|\varphi(x)-\varphi(y)|^2)\,(\varphi(y)-\varphi(x))(\varphi(y)-\varphi(x))^T(\delta\varphi(y)-\delta\varphi(x))\,dx.

This provides an expression of the form (d/dt)(δϕ, δZ) = J·(δϕ, δZ), and we now compute its transpose, i.e., letting J_ϕ and J_Z be the two components of J, we solve

\int_\Omega (\eta_\varphi^T\,\delta\varphi + \eta_Z^T\,\delta Z)\,dx = \int_\Omega (w_\varphi^T\,J_\varphi(\delta\varphi, \delta Z) + w_Z^T\,J_Z(\delta\varphi, \delta Z))\,dy

to obtain η_ϕ and η_Z as functions of w_ϕ and w_Z. Let's detail this computation. To shorten the formulas, we will write γ for γ(|ϕ(x) − ϕ(y)|²), and similarly for γ' and

γ'', remembering that they are symmetric functions of x and y. First, we have

\int_\Omega w_\varphi^T J_\varphi(\delta\varphi, \delta Z)\,dy = \int_\Omega w_\varphi(y)^T\Big(\int_\Omega \gamma\,\delta Z(x)\,dx\Big)dy + 2\int_\Omega w_\varphi(y)^T\Big(\int_\Omega \gamma'\,(\varphi(y)-\varphi(x))^T(\delta\varphi(y)-\delta\varphi(x))\,Z(x)\,dx\Big)dy

= \int_\Omega \delta Z(x)^T\Big(\int_\Omega \gamma\,w_\varphi(y)\,dy\Big)dx + 2\int_\Omega \delta\varphi(x)^T\Big(\int_\Omega \gamma'\,(\varphi(x)-\varphi(y))\,(w_\varphi(y)^T Z(x) + w_\varphi(x)^T Z(y))\,dy\Big)dx.

For the second term, we write

\int_\Omega w_Z^T J_Z(\delta\varphi, \delta Z)\,dy = -2\int_\Omega\int_\Omega w_Z(y)^T\,(\delta Z(y)^T Z(x) + \delta Z(x)^T Z(y))\,\gamma'\,(\varphi(y)-\varphi(x))\,dx\,dy - 2\int_\Omega\int_\Omega w_Z(y)^T\,Z(y)^T Z(x)\,\gamma'\,(\delta\varphi(y)-\delta\varphi(x))\,dx\,dy - 4\int_\Omega\int_\Omega w_Z(y)^T\,Z(y)^T Z(x)\,\gamma''\,(\varphi(y)-\varphi(x))(\varphi(y)-\varphi(x))^T(\delta\varphi(y)-\delta\varphi(x))\,dx\,dy

= -2\int_\Omega\int_\Omega \delta Z(x)^T Z(y)\,\gamma'\,(w_Z(y)-w_Z(x))^T(\varphi(y)-\varphi(x))\,dy\,dx - 2\int_\Omega\int_\Omega \delta\varphi(x)^T\,\gamma'\,Z(y)^T Z(x)\,(w_Z(x)-w_Z(y))\,dy\,dx - 4\int_\Omega\int_\Omega \delta\varphi(x)^T\,\gamma''\,Z(y)^T Z(x)\,(\varphi(y)-\varphi(x))(\varphi(y)-\varphi(x))^T(w_Z(x)-w_Z(y))\,dy\,dx.

This yields

\eta_\varphi(x) = 2\int_\Omega \gamma'\,(\varphi(x)-\varphi(y))\,(w_\varphi(y)^T Z(x) + w_\varphi(x)^T Z(y))\,dy - 2\int_\Omega \gamma'\,Z(y)^T Z(x)\,(w_Z(x)-w_Z(y))\,dy - 4\int_\Omega \gamma''\,Z(y)^T Z(x)\,(\varphi(y)-\varphi(x))(\varphi(y)-\varphi(x))^T(w_Z(x)-w_Z(y))\,dy

and

\eta_Z(x) = \int_\Omega \gamma\,w_\varphi(y)\,dy - 2\int_\Omega \gamma'\,Z(y)\,(w_Z(y)-w_Z(x))^T(\varphi(y)-\varphi(x))\,dy.

These computations now provide an image matching algorithm optimizing with respect to the variable z0 such that m_0 = z_0\,\nabla I\otimes dx. The variation in the gradient direction, \delta m_0 = \delta z_0\,\nabla I\otimes dx, is computed as follows (a numerical sketch of the transposition formulas is given after these steps):

(1) Compute \partial U/\partial\varphi = -2\,(I - \tilde I\circ\varphi)\,\det(D\varphi)\,D\varphi^{-T}\,\nabla I.
(2) Solve backwards in time, until time t = 0, the equations dw_\varphi/dt = \eta_\varphi and dw_Z/dt = \eta_Z, where \eta_\varphi and \eta_Z are computed from the current (w_\varphi, w_Z) as above, with end condition w_\varphi(1) = \partial U/\partial\varphi and w_Z(1) = 0.
(3) Set \delta z_0 = \nabla I^T\big(Km_0 + \lambda\,w_Z(0)\big).
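The transposition formulas for η_ϕ and η_Z discretize directly once Ω is replaced by quadrature points. The sketch below does this with plain weighted sums, reusing gamma, gamma_prime and gamma_second from the earlier sketches; the sampling itself, and all names, are illustrative.

import numpy as np

def eta_fields(phi, Z, w_phi, w_Z, weights, sigma=1.0):
    # Discretized eta_phi(x) and eta_Z(x) from the formulas above.
    # phi, Z, w_phi, w_Z: (M, d) samples on quadrature points of Omega;
    # weights: (M,) quadrature weights standing in for dy.
    diff = phi[:, None, :] - phi[None, :, :]        # phi(x) - phi(y)
    r2 = np.sum(diff**2, axis=-1)
    G, Gp, Gpp = gamma(r2, sigma), gamma_prime(r2, sigma), gamma_second(r2, sigma)
    ZZ = Z @ Z.T                                    # Z(x)^T Z(y), symmetric
    wphiZ = w_phi @ Z.T                             # [x, y] = w_phi(x)^T Z(y)
    dwZ = w_Z[:, None, :] - w_Z[None, :, :]         # w_Z(x) - w_Z(y)
    proj = np.sum(diff * dwZ, axis=-1)              # (phi(x)-phi(y)).(w_Z(x)-w_Z(y))
    w = weights[None, :, None]
    eta_phi = (2.0 * np.sum((Gp * (wphiZ.T + wphiZ))[:, :, None] * diff * w, axis=1)
               - 2.0 * np.sum((Gp * ZZ)[:, :, None] * dwZ * w, axis=1)
               - 4.0 * np.sum((Gpp * ZZ * proj)[:, :, None] * diff * w, axis=1))
    eta_Z = (np.sum(G[:, :, None] * w_phi[None, :, :] * w, axis=1)
             - 2.0 * np.sum((Gp * proj)[:, :, None] * Z[None, :, :] * w, axis=1))
    return eta_phi, eta_Z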