Deformation analysis for shape and image processing
Laurent Younes
Contents
General notation

Chapter 1. Elements of Hilbert space theory
 1. Definition and notation
 2. Examples
 3. Orthogonal spaces and projection
 4. Orthonormal sequences
 5. The Riesz representation theorem
 6. Embeddings and the duality paradox
 7. Weak convergence in a Hilbert space
 8. Spline interpolation
 9. Building V and its kernel

Chapter 2. Ordinary differential equations and groups of diffeomorphisms
 1. Introduction
 2. A class of ordinary differential equations
 3. Groups of diffeomorphisms

Chapter 3. Matching deformable objects
 1. General principles
 2. Matching terms and greedy descent procedures
 3. Regularized matching

Chapter 4. Groups, manifolds and invariant metrics
 1. Introduction
 2. Differential manifolds
 3. Submanifolds
 4. Lie Groups
 5. Group action
 6. Riemannian manifolds
 7. Gradient descent on Riemannian manifolds

Bibliography
General notation
The differential of a function $\varphi : \Omega \to \mathbb{R}^d$, $\Omega$ being an open subset of $\mathbb{R}^d$, is denoted $D\varphi : x \mapsto D\varphi(x)$. It takes values in the space of linear operators from $\mathbb{R}^d$ to $\mathbb{R}^d$, so that $D\varphi(x)$ can be identified with a $d \times d$ matrix (containing the partial derivatives of the coordinates of $\varphi$). $\mathrm{Id}_d$ is the identity matrix in $\mathbb{R}^d$. $C^\infty(\Omega)$ is the set of infinitely differentiable functions on $\Omega$, an open subset of $\mathbb{R}^d$.
CHAPTER 1
Elements of Hilbert space theory
1. Definition and notation

A set $H$ is a (real) Hilbert space if:
i) $H$ is a vector space over $\mathbb{R}$ (i.e., addition and scalar multiplication are defined on $H$).
ii) $H$ has an inner product, denoted $(h, h') \mapsto \langle h, h'\rangle_H$ for $h, h' \in H$. This inner product is a bilinear form which is positive definite, and the associated norm is $\|h\|_H = \sqrt{\langle h, h\rangle_H}$.
iii) $H$ is a complete space for the topology associated to the norm.

Here are some explanations of iii). Converging sequences for the norm topology are sequences $(h_n)$ for which there exists $h \in H$ such that $\|h - h_n\|_H \to 0$. Property iii) means that if a sequence $(h_n, n \ge 0)$ in $H$ is a Cauchy sequence, i.e., it collapses in the sense that, for every positive $\varepsilon$, there exists $n_0 > 0$ such that $\|h_n - h_{n_0}\|_H \le \varepsilon$ for $n \ge n_0$, then it necessarily has a limit: there exists $h \in H$ such that $\|h_n - h\|_H \to 0$ when $n$ tends to infinity.
If condition ii) is weakened to the requirement that $\|\cdot\|_H$ be a norm (not necessarily induced by an inner product), one says that $H$ is a Banach space. If $H$ satisfies only i) and ii), it is called a pre-Hilbert space. On pre-Hilbert spaces, the Schwarz inequality holds:

Proposition 1 (Schwarz inequality). If $H$ is pre-Hilbert, and $h, h' \in H$, then
\[ \langle h, h'\rangle_H \le \|h\|_H \|h'\|_H. \]

The first consequence of this property is:

Proposition 2. The inner product on $H$ is continuous for the norm topology.

Proof. The inner product is a function $H \times H \to \mathbb{R}$. Letting $\varphi(h, h') = \langle h, h'\rangle_H$, we have, by the Schwarz inequality, and introducing a sequence $(h_n)$ which converges to $h$,
\[ |\varphi(h, h') - \varphi(h_n, h')| \le \|h - h_n\|_H \|h'\|_H \to 0, \]
which proves the continuity with respect to the first coordinate, and also with respect to the second coordinate by symmetry. □

Working with a complete normed vector space is essential when dealing with infinite sums of elements: if $(h_n)$ is a sequence in $H$, and if $\|h_{n+1} + \cdots + h_{n+k}\|_H$ can be made arbitrarily small for large $n$ and any $k > 0$, then completeness implies that the series $\sum_{n=0}^\infty h_n$ has a limit in $H$. In particular, absolutely converging series in $H$ converge:
\[ (1) \qquad \sum_{n \ge 0} \|h_n\|_H < \infty \;\Rightarrow\; \sum_{n \ge 0} h_n \text{ converges.} \]
We shall add a fourth condition to our definition of a Hilbert space:
iv) $H$ is separable for the norm topology: there exists a countable subset $S$ of $H$ such that, for any $h \in H$ and any $\varepsilon > 0$, there exists $h_0 \in S$ such that $\|h - h_0\|_H \le \varepsilon$.

In the following, Hilbert spaces will always be separable (this is not always assumed in the literature).

A Hilbert space isometry between two Hilbert spaces $H$ and $H'$ is an invertible linear map $F : H \to H'$ such that, for all $h, h' \in H$, $\langle Fh, Fh'\rangle_{H'} = \langle h, h'\rangle_H$.

A sub-Hilbert space of $H$ is a subspace $H'$ of $H$ (i.e., a non-empty subset closed under linear combinations) which is closed for the norm topology: if $(h_n)$ is a sequence in $H'$ and $\|h_n - h\|_H \to 0$ for some $h \in H$, then $h \in H'$. Topological closedness implies that $H'$ is itself a Hilbert space, since Cauchy sequences in $H'$ are also Cauchy sequences in $H$, hence converge in $H$, hence in $H'$ since $H'$ is closed. The next proposition shows that every finite-dimensional subspace of a Hilbert space is closed, and hence a sub-Hilbert space.
Proposition 3. If $K$ is a finite-dimensional subspace of $H$, then $K$ is closed in $H$.
Proof. Let $e_1, \ldots, e_p$ be a basis of $K$. Let $(h_n)$ be a sequence in $K$ which converges to some $h \in H$: then $h_n$ may be written $h_n = \sum_{k=1}^p a_{kn} e_k$, and for all $l = 1, \ldots, p$:
\[ \langle h_n, e_l\rangle_H = \sum_{k=1}^p \langle e_k, e_l\rangle_H \, a_{kn}. \]
Let $a_n$ be the vector in $\mathbb{R}^p$ with coordinates $(a_{kn}, k = 1, \ldots, p)$ and $u_n \in \mathbb{R}^p$ the vector with coordinates $(\langle h_n, e_k\rangle_H, k = 1, \ldots, p)$. Let also $S$ be the matrix with coefficients $s_{kl} = \langle e_k, e_l\rangle_H$, so that the previous system may be written $u_n = S a_n$. The matrix $S$ is invertible: if $b$ belongs to the null space of $S$, a quick computation shows that
\[ \Big\| \sum_{i=1}^p b_i e_i \Big\|_H^2 = b^T S b = 0, \]
which is only possible when $b = 0$, since $(e_1, \ldots, e_p)$ is a basis. We therefore have $a_n = S^{-1} u_n$. Since $u_n$ converges to the vector $u$ with coordinates $(\langle h, e_k\rangle_H, k = 1, \ldots, p)$ (by continuity of the inner product), we obtain the fact that $a_n$ converges to $a = S^{-1} u$. But this implies that $\sum_{k=1}^p a_{kn} e_k \to \sum_{k=1}^p a_k e_k$, and since the limit is unique, we have $h = \sum_{k=1}^p a_k e_k \in K$. □
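The computation carried out in this proof is easy to reproduce numerically. The sketch below (a hypothetical numpy example in $H = \mathbb{R}^5$, not part of the text) forms the Gram matrix $S$, solves $Sa = u$, and checks that the resulting combination is the orthogonal projection onto $K$:

```python
import numpy as np

# Hypothetical example: H = R^5, K spanned by three independent vectors.
rng = np.random.default_rng(0)
E = rng.standard_normal((5, 3))        # columns e_1, e_2, e_3 span K
h = rng.standard_normal(5)

S = E.T @ E                            # Gram matrix s_kl = <e_k, e_l>, invertible
u = E.T @ h                            # coordinates u_l = <h, e_l>
a = np.linalg.solve(S, u)              # a = S^{-1} u, as in the proof
p = E @ a                              # candidate projection of h onto K

# h - p must be orthogonal to every basis vector of K
print(np.allclose(E.T @ (h - p), 0))   # True
```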
2. Examples

2.1. Finite dimensional Euclidean spaces. $H = \mathbb{R}^n$ with $\langle h, h'\rangle_{\mathbb{R}^n} = \sum_{i=1}^n h_i h'_i$ is the standard example of a finite-dimensional Hilbert space.
2.2. $\ell^2$ space of real sequences. Let $H$ be the set of real sequences $h = (h_1, h_2, \ldots)$ such that $\sum_{i=1}^\infty h_i^2 < \infty$. Then $H$ is a Hilbert space.
2.3. $L^2$ space of functions. Let $d$ be an integer and $\Omega$ an open subset of $\mathbb{R}^d$. We define $L^2(\Omega, \mathbb{R}^d)$ as the set of all square integrable functions $h : \Omega \to \mathbb{R}^d$, with inner product
\[ \langle h, h'\rangle_{L^2} = \int_\Omega h(x)^T h'(x)\, dx. \]
Integrals are taken with respect to the Lebesgue measure on $\Omega$, and two functions which coincide everywhere except on a set of null Lebesgue measure are considered as equal. The fact that $L^2(\Omega, \mathbb{R}^d)$ is a Hilbert space is a standard result in integration theory.
3. Orthogonal spaces and projection

Let $O$ be a subset of $H$. The orthogonal space to $O$ is defined by
\[ O^\perp = \{h \in H : \langle h, o\rangle_H = 0 \text{ for all } o \in O\}. \]

Theorem 1. $O^\perp$ is a sub-Hilbert space of $H$.

Proof. Stability under linear combinations is obvious, and closedness is a consequence of the continuity of the inner product. □

When $K$ is a sub-Hilbert space of $H$ (a closed subspace) and $h \in H$, one can consider the variational problem:
\[ (P_K(h)): \text{ find } k \in K \text{ such that } \|k - h\|_H = \min\{\|h - k'\|_H : k' \in K\}. \]
The following theorem is fundamental.
Theorem 2. If $K$ is a closed subspace of $H$ and $h \in H$, then $(P_K(h))$ has a unique solution, characterized by the properties $k \in K$ and $h - k \in K^\perp$.
Definition 1. The solution of problem $(P_K(h))$ in the previous theorem is called the orthogonal projection of $h$ on $K$, and will be denoted $\pi_K(h)$.
Proposition 4. $\pi_K : H \to K$ is a linear, continuous function, and $\pi_{K^\perp} = \mathrm{id} - \pi_K$.

Proof. Let $d = \inf\{\|h - k'\|_H : k' \in K\}$. The proof relies on the following construction: if $k, k' \in K$, a direct computation shows that
\[ \Big\| h - \frac{k + k'}{2} \Big\|_H^2 + \frac{\|k - k'\|_H^2}{4} = \frac{\|h - k\|_H^2 + \|h - k'\|_H^2}{2}, \]
and the fact that $(k + k')/2 \in K$ implies $\|h - (k + k')/2\|_H \ge d$, so that
\[ \|k - k'\|_H^2 \le 2\|h - k\|_H^2 + 2\|h - k'\|_H^2 - 4d^2. \]
Now, from the definition of the infimum, one can find a sequence $(k_n)$ in $K$ such that $\|k_n - h\|_H^2 \le d^2 + 2^{-n}$ for each $n$. The previous inequality implies that
\[ \|k_n - k_m\|_H^2 \le 2(2^{-n} + 2^{-m}), \]
which implies that $(k_n)$ is a Cauchy sequence, and therefore converges to a limit $k$ which belongs to $K$ with $\|k - h\|_H = d$. If the minimum is attained at another $k'$ in $K$, we have, by the same inequality,
\[ \|k - k'\|_H^2 \le 2d^2 + 2d^2 - 4d^2 = 0, \]
so that $k = k'$ and uniqueness is proved; $\pi_K$ is therefore well defined.

Let $k = \pi_K(h)$, let $k' \in K$, and consider the function $f(t) = \|h - k - tk'\|_H^2$, which is by construction minimal for $t = 0$. We have $f(t) = \|h - k\|_H^2 - 2t\langle h - k, k'\rangle_H + t^2\|k'\|_H^2$, and this can be minimal at $0$ only if $\langle h - k, k'\rangle_H = 0$. Since this has to be true for every $k' \in K$, we obtain the fact that $h - k \in K^\perp$. Conversely, if $k \in K$ and $h - k \in K^\perp$, we have, for any $k' \in K$,
\[ \|h - k'\|_H^2 = \|h - k\|_H^2 + \|k - k'\|_H^2 \ge \|h - k\|_H^2, \]
so that $k = \pi_K(h)$.

We now prove the proposition. Let $h, h' \in H$ and $\alpha, \alpha' \in \mathbb{R}$. Let $k = \pi_K(h)$, $k' = \pi_K(h')$; we want to show that $\pi_K(\alpha h + \alpha' h') = \alpha k + \alpha' k'$, for which it suffices to prove (since $\alpha k + \alpha' k' \in K$) that $\alpha h + \alpha' h' - \alpha k - \alpha' k' \in K^\perp$. But this is true since $\alpha h + \alpha' h' - \alpha k - \alpha' k' = \alpha(h - k) + \alpha'(h' - k')$, and $h - k \in K^\perp$, $h' - k' \in K^\perp$, and $K^\perp$ is a vector space. Continuity comes from
\[ \|h\|_H^2 = \|h - \pi_K(h)\|_H^2 + \|\pi_K(h)\|_H^2, \]
so that $\|\pi_K(h)\|_H \le \|h\|_H$.

Finally, if $h \in H$ and $k = \pi_K(h)$, then $k' = \pi_{K^\perp}(h)$ is characterized by $k' \in K^\perp$ and $h - k' \in (K^\perp)^\perp$. The first property is true for $h - k$, and for the second, we need to show that $K \subset (K^\perp)^\perp$, which is a direct consequence of the definition of the orthogonal space, and in fact true for any set $O$. □
We have the interesting property:

Corollary 1. $K$ is a sub-Hilbert space of $H$ if and only if $(K^\perp)^\perp = K$.

Proof. The $\Leftarrow$ implication is a consequence of Theorem 1. The fact that $K \subset (K^\perp)^\perp$ is obviously true for any subset $K \subset H$, so that it suffices to show that every element of $(K^\perp)^\perp$ belongs to $K$. Assume that $h \in (K^\perp)^\perp$: this implies that $\pi_{K^\perp}(h) = 0$, but since $\pi_{K^\perp}(h) = h - \pi_K(h)$, this implies that $h = \pi_K(h) \in K$. □
4. Orthonormal sequences
A sequence $(e_1, e_2, \ldots)$ in a Hilbert space $H$ is orthonormal if and only if $\langle e_i, e_j\rangle_H = 1$ if $i = j$ and $0$ otherwise. In such a case, if $\alpha = (\alpha_1, \alpha_2, \ldots) \in \ell^2$, the series $\sum_{i=1}^\infty \alpha_i e_i$ converges in $H$ (its partial sums form a Cauchy sequence), and if $h$ is the limit, one has $\alpha_i = \langle h, e_i\rangle_H$.

Conversely, if $h \in H$, then the sequence $(\langle h, e_1\rangle_H, \langle h, e_2\rangle_H, \ldots)$ belongs to $\ell^2$. Indeed, letting $h_n = \sum_{i=1}^n \langle h, e_i\rangle_H e_i$, one has $\langle h_n, h\rangle_H = \sum_{i=1}^n \langle h, e_i\rangle_H^2 = \|h_n\|_H^2$. On the other hand, one has, by the Schwarz inequality, $\langle h_n, h\rangle_H \le \|h_n\|_H \|h\|_H$, which implies that $\|h_n\|_H \le \|h\|_H$; therefore
\[ \sum_{i=1}^\infty \langle h, e_i\rangle_H^2 < \infty. \]
Denoting by $K = \mathrm{Hilb}(e_1, e_2, \ldots)$ the smallest Hilbert subspace of $H$ containing this sequence, one has the identity
\[ (2) \qquad K = \Big\{ \sum_{k=1}^\infty \alpha_k e_k : (\alpha_1, \alpha_2, \ldots) \in \ell^2 \Big\}. \]
This is left as an exercise. As a consequence of this, we see that $h \mapsto (\langle h, e_1\rangle_H, \langle h, e_2\rangle_H, \ldots)$ is an isometry between $\mathrm{Hilb}(e_1, e_2, \ldots)$ and $\ell^2$. Moreover, we have, for $h \in H$,
\[ \pi_K(h) = \sum_{i=1}^\infty \langle h, e_i\rangle_H e_i. \]
(To prove it, check that $h - \pi_K(h)$ is orthogonal to every $e_i$.)

An orthonormal sequence $(e_1, e_2, \ldots)$ is said to be complete in $H$ if $H = \mathrm{Hilb}(e_1, e_2, \ldots)$. In this case, we see that $H$ is itself isometric to $\ell^2$, and the interesting point is that this is always true:

Theorem 3. Every (separable) Hilbert space has a complete orthonormal sequence.

A complete orthonormal sequence in $H$ is also called an orthonormal basis of $H$.

Proof. The proof relies on the important Schmidt orthonormalization procedure. Let $f_1, f_2, \ldots$ be a dense sequence in $H$. We let $e_1 = f_{k_1}/\|f_{k_1}\|_H$, where $f_{k_1}$ is the first non-vanishing element in the sequence. Assume that an orthonormal sequence $e_1, \ldots, e_n$ has been constructed, together with indices $k_1, \ldots, k_n$, such that $e_i \in \mathrm{vect}(f_1, \ldots, f_{k_i})$ for each $i$, and set $V_n = \mathrm{vect}(f_1, \ldots, f_{k_n})$. First assume that $f_k \in \mathrm{vect}(e_1, \ldots, e_n)$ for all $k > k_n$: then $H = V_n$ is finite dimensional. Indeed, $H$ is equal to the closure of $(f_1, f_2, \ldots)$, which is included in $\mathrm{vect}(e_1, \ldots, e_n)$, which is closed, being a finite-dimensional subspace of $H$.

Assume now that there exists $k_{n+1}$ such that $f_{k_{n+1}} \not\in V_n$. Then we may set
\[ e_{n+1} = \lambda \big( f_{k_{n+1}} - \pi_{V_n}(f_{k_{n+1}}) \big), \]
where $\lambda$ is selected so that $e_{n+1}$ has unit norm, which is always possible. So there are two cases: either the previous construction stops at some point, and $H$ is finite dimensional and the theorem is true, or the process carries on indefinitely, yielding an orthonormal sequence $(e_1, e_2, \ldots)$ and an increasing sequence of integers $(k_1, k_2, \ldots)$. But $\mathrm{Hilb}(e_1, e_2, \ldots)$ contains $(f_n)$, which is dense in $H$, and therefore is equal to $H$, so that the orthonormal sequence is complete. □
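In finite dimensions, the orthonormalization procedure of this proof can be sketched in a few lines (a numpy illustration with a hypothetical sequence $f_1, f_2, \ldots$ in $\mathbb{R}^4$, not part of the text); vectors already in the current span are skipped, exactly as in the $f_k \in \mathrm{vect}(e_1, \ldots, e_n)$ case above:

```python
import numpy as np

# Scan the sequence; subtract the projection onto the current span and
# normalize the remainder, skipping vectors already in the span.
def orthonormalize(fs, tol=1e-10):
    es = []
    for f in fs:
        r = f - sum(np.dot(e, f) * e for e in es)  # f - pi_{V_n}(f)
        n = np.linalg.norm(r)
        if n > tol:                                # f was not in vect(e_1,...,e_n)
            es.append(r / n)
    return es

fs = [np.array([1., 1., 0., 0.]),
      np.array([2., 2., 0., 0.]),   # dependent on f_1: skipped
      np.array([1., 0., 1., 0.]),
      np.array([0., 0., 0., 3.])]
es = orthonormalize(fs)
G = np.array([[np.dot(a, b) for b in es] for a in es])
print(len(es), np.allclose(G, np.eye(3)))  # 3 True
```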
5. The Riesz representation theorem

The dual space of a normed vector space $H$ is the space containing all continuous linear functionals $\varphi : H \to \mathbb{R}$. It is denoted $H^*$, and we will often use the notation, for $\varphi \in H^*$ and $h \in H$:
\[ (3) \qquad \varphi(h) = (\varphi \mid h). \]
Thus, parentheses indicate linear forms, angles indicate inner products. $H$ being a normed space, $H^*$ also has a normed space structure, defined by:
\[ \|\varphi\|_{H^*} = \sup \{(\varphi \mid h) : h \in H, \|h\|_H = 1\}. \]
Continuity of $\varphi$ is in fact equivalent to the finiteness of this norm.

When $H$ is a Hilbert space, the function $\varphi_h : h' \mapsto \langle h, h'\rangle_H$ belongs to $H^*$, and by the Schwarz inequality $\|\varphi_h\|_{H^*} = \|h\|_H$. The Riesz representation theorem states that there exists no other continuous linear form on $H$:

Theorem 4 (Riesz). Let $H$ be a Hilbert space. If $\varphi \in H^*$, there exists a unique $h \in H$ such that $\varphi = \varphi_h$.
Proof. Uniqueness comes from the fact that if $\langle h, h'\rangle_H = 0$ for all $h' \in H$, then taking $h' = h$ yields $h = 0$. To prove existence, we use, as in finite dimension, the orthogonal of the null space of $\varphi$. So, let $K = \{h' \in H : (\varphi \mid h') = 0\}$; $K$ is a closed linear subspace of $H$ (because $\varphi$ is linear and continuous). If $K^\perp = \{0\}$, then $K = (K^\perp)^\perp = H$ and $\varphi = 0 = \varphi_0$, which proves the theorem in this case. So, we assume that $K^\perp \ne \{0\}$.

Now, let $h_1$ and $h_2$ be two non-zero elements in $K^\perp$. We have $(\varphi \mid \alpha_1 h_1 + \alpha_2 h_2) = \alpha_1 (\varphi \mid h_1) + \alpha_2 (\varphi \mid h_2)$, so that it is always possible to find non-vanishing $\alpha_1$ and $\alpha_2$ such that $(\varphi \mid \alpha_1 h_1 + \alpha_2 h_2) = 0$. But this implies that $\alpha_1 h_1 + \alpha_2 h_2$ belongs to $K$, and, since $K^\perp$ is a vector space, also to $K^\perp$, which is only possible when $\alpha_1 h_1 + \alpha_2 h_2 = 0$. Thus, for any $h_1, h_2 \in K^\perp$, there exists $(\alpha_1, \alpha_2) \ne (0, 0)$ such that $\alpha_1 h_1 + \alpha_2 h_2 = 0$, so that $K^\perp$ has dimension 1. Fix $h_1 \in K^\perp$ with, say, $\|h_1\|_H = 1$. Using orthogonal projections, and the fact that $K$ is closed, any vector in $H$ can be written $h = \langle h, h_1\rangle_H h_1 + k$ with $k \in K$. This implies
\[ (\varphi \mid h) = \langle h, h_1\rangle_H (\varphi \mid h_1), \]
so that $\varphi = \varphi_{(\varphi \mid h_1) h_1}$. □
6. Embeddings and the duality paradox
6.1. Definition. Assume that $H$ and $H_0$ are two Banach spaces. An embedding of $H$ in $H_0$ is a continuous, one-to-one, linear map from $H$ to $H_0$, i.e., an injective linear map $\iota : H \to H_0$ for which there exists a constant $C$ such that, for all $h \in H$,
\[ (4) \qquad \|\iota(h)\|_{H_0} \le C \|h\|_H. \]
The embedding is compact when the set $\{\iota(h) : \|h\|_H \le 1\}$ is relatively compact in $H_0$. In the separable case (to which we restrict ourselves), this means that for any bounded sequence $(h_n, n > 0)$ in $H$, there exists a subsequence of $(\iota(h_n))$ which converges in $H_0$.

In all the applications we will be interested in, $H$ and $H_0$ will be function spaces, and we will have a set inclusion $H \subset H_0$. For example, $H$ may be a set of smooth functions and $H_0$ a set of less smooth functions (see the examples of embeddings below). Then, one says that $H$ is embedded (resp. compactly embedded) in $H_0$ if the canonical inclusion map $\iota : H \to H_0$ is continuous (resp. compact).

If $\varphi$ is a continuous linear form on $H_0$ and $\iota : H \to H_0$ is an embedding, then one can define the form $\iota^* \varphi$ on $H$ by $(\iota^* \varphi \mid h) = (\varphi \mid \iota(h))$. Let us verify that $\iota^* \varphi$ is continuous on $H$. We have, for all $h \in H$:
\[ |(\iota^* \varphi \mid h)| = |(\varphi \mid \iota(h))| \le \|\varphi\|_{H_0^*} \|\iota(h)\|_{H_0} \le C \|\varphi\|_{H_0^*} \|h\|_H, \]
where the first inequality comes from the continuity of $\varphi$ and the last one from (4). This proves the continuity of $\iota^* \varphi$ as a linear form on $H$, together with the inequality
\[ \|\iota^* \varphi\|_{H^*} \le C \|\varphi\|_{H_0^*}. \]
We have the following theorem:
Theorem 5. If $\iota : H \to H_0$ is a Banach space embedding such that $\iota(H)$ is dense in $H_0$, then the map $\iota^* : H_0^* \to H^*$ is also an embedding.

Proof. It only remains to prove that $\iota^*$ is one-to-one, and this relies on $\iota(H)$ being dense in $H_0$. Assume that $\iota^* \varphi = \iota^* \varphi'$ for $\varphi, \varphi' \in H_0^*$. This implies that, for all $h \in H$, $(\varphi \mid \iota h) = (\varphi' \mid \iota h)$, i.e., that $(\varphi - \varphi' \mid h_0) = 0$ for any $h_0 \in \iota(H)$. This latter set being dense in $H_0$, and $\varphi$ and $\varphi'$ being continuous, this is only possible when $\varphi = \varphi'$. This proves that $\iota^*$ is one-to-one. □

6.2. Examples.

6.2.1. Banach spaces of continuous functions. Let $\Omega$ be an open subset of $\mathbb{R}^d$. The space of continuous functions on $\Omega$ with at least $p$ continuous derivatives will be denoted $C^p(\Omega, \mathbb{R})$. If $\Omega$ is bounded, so that its closure $\bar\Omega$ is compact, $C^p(\bar\Omega, \mathbb{R})$ is the set of functions which are $p$ times differentiable on $\Omega$, each partial derivative being extended to a continuous function on $\bar\Omega$; $C^p(\bar\Omega, \mathbb{R})$ has a Banach space structure when equipped with the norm
\[ \|f\|_{p,\infty} = \max_h \|h\|_\infty, \]
where $h$ varies in the set of partial derivatives of $f$ of order lower than or equal to $p$.

We let $C_0^p(\Omega, \mathbb{R})$ be the set of functions $f$ in $C^p(\Omega, \mathbb{R})$ with vanishing derivatives up to order $p$ on $\partial\Omega$ or at infinity. It is a closed subspace of $C^p(\Omega, \mathbb{R})$ and therefore also a Banach space.

We obviously have, almost by definition, the fact that $C^p(\Omega, \mathbb{R})$ is embedded in $C^q(\Omega, \mathbb{R})$ as soon as $q \le p$. In fact, for bounded $\Omega$, the embedding is compact when $q < p$. If $\Omega$ is bounded, relatively compact sets in $C^0(\bar\Omega, \mathbb{R})$ are exactly described by Ascoli's theorem: they are bounded subsets $M \subset C^0(\bar\Omega, \mathbb{R})$ (for the supremum norm) which are equicontinuous, meaning that, for any $x \in \bar\Omega$ and any $\varepsilon > 0$, there exists $\eta > 0$ such that
\[ \sup_{y \in \bar\Omega, |x - y| < \eta} \; \sup_{h \in M} |h(x) - h(y)| < \varepsilon. \]
Relatively compact sets in $C^p(\bar\Omega, \mathbb{R})$ are bounded subsets of $C^p(\bar\Omega, \mathbb{R})$ over which all the $p$th partial derivatives are equicontinuous. For unbounded $\Omega$, one defines the norm
\[ \|f\|_{M,p,\infty} = \max_{h \text{ partial derivative of } f, \ |x| \le M} |h(x)|. \]
One says that $f_n \to f$ in $C^p$ on compact sets if $\|f_n - f\|_{M,p,\infty} \to 0$ for all $M$.

6.2.2. Hilbert Sobolev spaces. Let $\Omega \subset \mathbb{R}^d$ be open. We now define the space $H^1(\Omega, \mathbb{R})$ of functions with square integrable weak derivatives. A function $u$ belongs to this set if and only if $u \in L^2(\Omega, \mathbb{R})$ and, for each $i = 1, \ldots, d$, there exists a function $u_i \in L^2(\Omega, \mathbb{R})$ such that, for any $C^\infty$ function $\varphi$ with compact support in $\Omega$, one has
\[ \int_\Omega u(x) \frac{\partial \varphi}{\partial x_i}(x)\, dx = - \int_\Omega u_i(x) \varphi(x)\, dx. \]
The function $u_i$ is called the (weak) partial derivative of $u$, and will be denoted $\frac{\partial u}{\partial x_i}$. The integration by parts formula shows that this coincides with the standard partial derivative when it exists. $H^1(\Omega, \mathbb{R})$ has a Hilbert space structure with the inner product
\[ \langle u, v\rangle_{H^1} = \langle u, v\rangle_{L^2} + \sum_{i=1}^d \Big\langle \frac{\partial u}{\partial x_i}, \frac{\partial v}{\partial x_i} \Big\rangle_{L^2}. \]
The space $H^m(\Omega, \mathbb{R})$ can now be defined by induction as the set of functions $f \in H^1(\Omega, \mathbb{R})$ with all partial derivatives belonging to $H^{m-1}(\Omega, \mathbb{R})$. Partial derivatives of increasing order are defined by induction, and the inner product on $H^m$ is the sum of the $L^2$ inner products of all partial derivatives up to order $m$.
There are theorems which ensure embeddings in classical spaces of functions. We will be interested in a special case of Morrey's theorem, stated below.
Theorem 6. Let $\Omega \subset \mathbb{R}^d$ be open, and assume that $m - d/2 > 0$. Then, for any $j \ge 0$, $H^{j+m}(\Omega, \mathbb{R})$ is embedded in $C^j(\Omega, \mathbb{R})$. If $\Omega$ is bounded, the embedding is compact. Moreover, if $\theta \in \,]0, m - d/2]$ and $u \in H^{j+m}(\Omega, \mathbb{R})$, then every partial derivative $h$ of $u$ of order $j$ has Hölder regularity $\theta$: for all $x, y \in \Omega$,
\[ |h(x) - h(y)| \le C \|u\|_{H^{m+j}} |x - y|^\theta. \]

As a final definition, we let $H_0^m(\Omega, \mathbb{R})$ be the completion in $H^m(\Omega, \mathbb{R})$ of the set of $C^\infty$ functions with compact support in $\Omega$: $u$ belongs to $H_0^m(\Omega, \mathbb{R})$ if and only if $u \in H^m(\Omega, \mathbb{R})$ and there exists a sequence of $C^\infty$ functions $u_n$ with compact support in $\Omega$ such that $\|u - u_n\|_{H^m}$ tends to 0. A direct application of Morrey's theorem shows that, if $m - d/2 > 0$, then, for any $j \ge 0$, $H_0^{j+m}(\Omega, \mathbb{R})$ is embedded in $C_0^j(\Omega, \mathbb{R})$.

6.3. The duality paradox. The Riesz representation theorem identifies a Hilbert space $H$ with its dual $H^*$. However, when $H$ is (densely) embedded in another Hilbert space $H_0$, every continuous linear form on $H_0$ is also continuous on $H$, and $H_0^*$ is embedded in $H^*$. We therefore have the sequence of embeddings
\[ H \to H_0 \simeq H_0^* \to H^*, \]
but this sequence loops, since $H^* \simeq H$. This indicates that $H_0$ is also embedded in $H$. This is indeed a strange result: for example, let $H = H^1(\Omega, \mathbb{R})$ and $H_0 = L^2(\Omega, \mathbb{R})$. The embedding of $H$ in $H_0$ is clear from their definitions, but the converse does not seem natural, since there are more constraints in belonging to $H$ than to $H_0$.

To understand this reversed embedding, we must think in terms of linear forms. If $u \in L^2(\Omega, \mathbb{R})$, we may consider the linear form $\varphi_u$ defined by
\[ (\varphi_u \mid v) = \langle u, v\rangle_{L^2} = \int_\Omega u(x) v(x)\, dx. \]
When $v \in H^1(\Omega, \mathbb{R})$, we have
\[ (\varphi_u \mid v) \le \|u\|_{L^2} \|v\|_{L^2} \le \|u\|_{L^2} \|v\|_{H^1}, \]
so that $\varphi_u$ is continuous when seen as a linear form on $H^1(\Omega, \mathbb{R})$. The Riesz representation theorem implies that there exists a $\tilde u \in H^1(\Omega, \mathbb{R})$ such that, for all $v \in H^1(\Omega, \mathbb{R})$, $\langle u, v\rangle_{L^2} = \langle \tilde u, v\rangle_{H^1}$: the relation $u \mapsto \tilde u$ provides the embedding of $L^2(\Omega, \mathbb{R})$ into $H^1(\Omega, \mathbb{R})$.

Let us be more specific and take $\Omega = \,]0, 1[$. The relation states that, for any $v \in H^1(\Omega, \mathbb{R})$,
\[ \int_0^1 \tilde u'(t) v'(t)\, dt + \int_0^1 \tilde u(t) v(t)\, dt = \int_0^1 u(t) v(t)\, dt. \]
Let us make the simplifying assumption that $\tilde u$ has two derivatives, in order to integrate the first integral by parts and obtain
\[ (5) \qquad \tilde u'(1) v(1) - \tilde u'(0) v(0) - \int_0^1 \tilde u''(t) v(t)\, dt + \int_0^1 \tilde u(t) v(t)\, dt = \int_0^1 u(t) v(t)\, dt. \]
Such an identity can be true for every $v$ in $H^1(\Omega, \mathbb{R})$ if and only if $\tilde u'(0) = \tilde u'(1) = 0$ and $-\tilde u'' + \tilde u = u$: $\tilde u$ is thus the solution of a second order differential equation with first order boundary conditions, and the embedding of $L^2(]0,1[, \mathbb{R})$ into $H^1(]0,1[, \mathbb{R})$ just shows that a unique solution exists, at least in the generalized sense of equation (5).

As seen in these examples, even if, from an abstract point of view, we have an identification between the two Hilbert spaces, the associated embeddings are of very different natures: the first one corresponds to a set inclusion (it is canonical), the second to the solution of a differential equation in one dimension, and in fact of a partial differential equation in the general case.

Another striking application of the Riesz representation theorem is presented in Section 8, which will be our first incursion into the theory of deformations.
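To make the reversed embedding concrete, one can solve the boundary value problem $-\tilde u'' + \tilde u = u$, $\tilde u'(0) = \tilde u'(1) = 0$ numerically. The following finite-difference sketch (the grid size and the right-hand side are arbitrary choices, not from the text) compares the discrete solution with the exact one for $u(t) = \cos(\pi t)$, for which $\tilde u = u/(1 + \pi^2)$:

```python
import numpy as np

# Second-order finite differences for -u~'' + u~ = u on [0, 1], with
# ghost-point treatment of the Neumann conditions u~'(0) = u~'(1) = 0.
n = 200
t = np.linspace(0., 1., n)
h = t[1] - t[0]
u = np.cos(np.pi * t)                 # sample right-hand side in L^2

A = np.zeros((n, n))
for i in range(n):
    A[i, i] = 2. / h**2 + 1.          # diagonal of -(d^2/dt^2) + Id
for i in range(1, n - 1):
    A[i, i - 1] = A[i, i + 1] = -1. / h**2
A[0, 1] = -2. / h**2                  # ghost point: u~_{-1} = u~_1
A[-1, -2] = -2. / h**2                # ghost point: u~_{n} = u~_{n-2}
ut = np.linalg.solve(A, u)

# Exact solution for this u: u~ = cos(pi t) / (1 + pi^2)
err = np.max(np.abs(ut - u / (1. + np.pi**2)))
print(err < 1e-3)  # True
```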
7. Weak convergence in a Hilbert space

Let us start with the definition:
Definition 2. When $V$ is a Banach space, a sequence $(v_n)$ in $V$ is said to converge weakly to some $v \in V$ if and only if, for every continuous linear form $\alpha : V \to \mathbb{R}$, one has $\alpha(v_n) \to \alpha(v)$ when $n$ tends to infinity.

The following simple proposition will be useful later:
Proposition 5. Assume that $V$ and $W$ are Banach spaces and that $W$ is (continuously) embedded in $V$. If a sequence $(w_n)$ in $W$ weakly converges (in $W$) to some $w \in W$, then $w_n$ also weakly converges to $w$ in $V$.

This just says that if $\alpha(w_n) \to \alpha(w)$ for all continuous linear functionals $\alpha$ on $W$, then the convergence holds for all continuous linear functionals on $V$, which is in fact obvious, because the restriction to $W$ of any continuous linear functional on $V$ is a continuous linear functional on $W$.

In the case of a Hilbert space, the Riesz representation theorem immediately provides the following proposition:
Proposition 6. Let $V$ be a Hilbert space. A sequence $(v_n)$ in $V$ weakly converges to an element $v \in V$ if and only if, for all $w \in V$,
\[ \lim_{n \to \infty} \langle w, v_n\rangle_V = \langle w, v\rangle_V. \]
Moreover, if $v_n$ weakly converges to $v$, then $\|v\|_V \le \liminf \|v_n\|_V$.

The last statement comes from the inequality $\langle v_n, v\rangle_V \le \|v_n\|_V \|v\|_V$, which provides, at the limit,
\[ \|v\|_V^2 \le \|v\|_V \liminf_{n \to \infty} \|v_n\|_V. \]
If $\|v\|_V = 0$, there is nothing to prove; in the other case, one can divide both sides of the inequality by $\|v\|_V$ to obtain the result.

Finally, the following result is essential for us ([28]):
Theorem 7. If $V$ is a Hilbert space and $(v_n)$ is a bounded sequence in $V$ (there exists a constant $C$ such that $\|v_n\|_V \le C$ for all $n$), then one can extract a subsequence from $(v_n)$ which weakly converges to some $v \in V$.
8. Spline interpolation
8.1. The scalar case. Let $\Omega \subset \mathbb{R}^d$ be open, and consider a Hilbert space $V$ included in $L^2(\Omega, \mathbb{R})$. We assume that the elements of $V$ are smooth enough, and require the inclusion, and the canonical embedding, of $V$ in $C^0(\Omega, \mathbb{R})$. For example, it suffices (from Morrey's theorem) that $V \subset H^m(\Omega, \mathbb{R})$ for $m > d/2$. This implies that there exists a constant $C$ such that, for all $v \in V$,
\[ \|v\|_\infty \le C \|v\|_V. \]
We make another assumption on $V$: a relation of the kind $\sum_{i=1}^N \alpha_i v(x_i) = 0$ cannot be true for every $v \in V$ unless $\alpha_1 = \cdots = \alpha_N = 0$, $(x_1, \ldots, x_N)$ being an arbitrary family of distinct points in $\Omega$. The next section will investigate some practical methods to define such a set $V$.

Each $x$ in $\Omega$ specifies a linear form $\delta_x$ defined by $(\delta_x \mid v) = v(x)$ for $v \in V$. We have
\[ |(\delta_x \mid v)| \le \|v\|_\infty \le C \|v\|_V, \]
so that $\delta_x$ is continuous on $V$. But this implies, by Riesz's theorem, that there exists an element $K_x$ in $V$ such that, for every $v \in V$, one has
\[ (6) \qquad v(x) = \langle K_x, v\rangle_V. \]
Since $K_x$ belongs to $V$, it is a continuous function $y \mapsto K_x(y)$. This also defines a function, denoted $K : \Omega \times \Omega \to \mathbb{R}$, by $K(y, x) = K_x(y)$. This function $K$ has several interesting properties. First, applying equation (6) to $v = K_y$ yields
\[ K(x, y) = K_y(x) = \langle K_x, K_y\rangle_V. \]
Since the last term is symmetric, we have $K(x, y) = K(y, x)$, and because of the obtained identity, $K$ is called the reproducing kernel of $V$.

A second property is the fact that $K$ is positive definite, in the sense that, for any family of distinct points $x_1, \ldots, x_N \in \Omega$ and any $\alpha_1, \ldots, \alpha_N$ in $\mathbb{R}$, the double sum
\[ \sum_{i,j=1}^N \alpha_i \alpha_j K(x_i, x_j) \]
is always nonnegative, and vanishes if and only if all the $\alpha_i$ equal 0. Indeed, by the reproducing property, this sum may be written $\big\| \sum_{i=1}^N \alpha_i K_{x_i} \big\|_V^2$, and this is nonnegative. If it vanishes, then $\sum_{i=1}^N \alpha_i K_{x_i} = 0$, which implies, by equation (6), that, for every $v \in V$, one has $\sum_{i=1}^N \alpha_i v(x_i) = 0$, and our original assumption on $V$ implies that $\alpha_1 = \cdots = \alpha_N = 0$.

This kernel will help us solve the following interpolation problem:

(S) Fix a family of distinct points $x_1, \ldots, x_N$ in $\Omega$. We want to determine a function $v \in V$ of minimal norm, subject to the constraints $v(x_i) = \lambda_i$, for some prescribed values $\lambda_1, \ldots, \lambda_N \in \mathbb{R}$.

To solve this problem, define $V_0$ to be the set of $v$ for which the constraints vanish:
\[ V_0 = \{v \in V : v(x_i) = 0,\ i = 1, \ldots, N\}. \]
Using the kernel $K$, we may write
\[ V_0 = \{v \in V : \langle K_{x_i}, v\rangle_V = 0,\ i = 1, \ldots, N\}, \]
so that
\[ V_0^\perp = \mathrm{vect}\{K_{x_1}, \ldots, K_{x_N}\}, \]
the orthogonal being taken for the $V$ inner product. We have the first result:
Lemma 1. If there exists a solution $\hat v$ of problem (S), then $\hat v \in V_0^\perp = \mathrm{vect}\{K_{x_1}, \ldots, K_{x_N}\}$. Moreover, if $\hat v \in V_0^\perp$ is a solution of (S) restricted to this set, then it is a solution of (S) on $V$.

Proof. Let $\hat v$ be this solution, and let $v^*$ be its orthogonal projection on $\mathrm{vect}\{K_{x_1}, \ldots, K_{x_N}\}$. From the properties of orthogonal projections, we have $\hat v - v^* \in V_0$, which implies, by the definition of $V_0$, that $\hat v(x_i) = v^*(x_i)$ for $i = 1, \ldots, N$. But since $\|v^*\|_V \le \|\hat v\|_V$ (by the variational definition of the projection) and $\|\hat v\|_V \le \|v^*\|_V$ by assumption, both norms are equal, which is only possible when $\hat v = v^*$. Therefore $\hat v \in V_0^\perp$, and the proof of the first assertion is complete.

Now, if $\hat v$ is a solution of (S) in which $V$ is replaced by $V_0^\perp$, and if $v$ is any function in $V$ which satisfies the constraints, then $v - \hat v \in V_0$ and $\|v\|_V^2 = \|\hat v\|_V^2 + \|v - \hat v\|_V^2$, which shows that $\hat v$ is a solution of the initial problem. □

This lemma allows us to restrict the search for a solution of (S) to the set of linear combinations of $K_{x_1}, \ldots, K_{x_N}$, which places us in a very comfortable finite-dimensional situation. We look for $\hat v$ in the form $\hat v = \sum_{i=1}^N \alpha_i K_{x_i}$, which may also be written
\[ \hat v(x) = \sum_{i=1}^N \alpha_i K(x, x_i), \]
and we introduce the $N \times N$ matrix $S$ with entries $s_{ij} = K(x_i, x_j)$. The whole problem may now be reformulated in terms of the vector $\alpha = (\alpha_1, \ldots, \alpha_N)^T$ (a column vector) and of the matrix $S$. Indeed, by the reproducing property of $K$, we have
\[ (7) \qquad \|\hat v\|_V^2 = \sum_{i,j=1}^N \alpha_i \alpha_j K(x_i, x_j) = \alpha^T S \alpha, \]
and each constraint may be written $\lambda_i = \hat v(x_i) = \sum_{j=1}^N \alpha_j K(x_i, x_j)$, so that, letting $\lambda = (\lambda_1, \ldots, \lambda_N)^T$, the whole system of constraints may be expressed as $S\alpha = \lambda$. But our hypotheses imply that $S$ is invertible: indeed, if $S\alpha = 0$, then $\alpha^T S \alpha = 0$, which, by equation (7) and the positive definiteness of $K$, is only possible when $\alpha = 0$ (the $x_i$ being distinct). Therefore, there is only one $\hat v$ in $V_0^\perp$ which satisfies the constraints, and it corresponds to $\alpha = S^{-1}\lambda$. □
These results are summarized in the next theorem.

Theorem 8. Problem (S) has a unique solution in $V$, given by
\[ \hat v(x) = \sum_{i=1}^N K(x, x_i) \alpha_i \]
with
\[ \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} = \begin{pmatrix} K(x_1, x_1) & \ldots & K(x_1, x_N) \\ \vdots & & \vdots \\ K(x_N, x_1) & \ldots & K(x_N, x_N) \end{pmatrix}^{-1} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \end{pmatrix}. \]
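Theorem 8 translates directly into a few lines of numerical code. The sketch below assumes a Gaussian kernel $K(x, y) = e^{-(x-y)^2/2\sigma^2}$ (a hypothetical choice; the text does not fix a kernel here) and arbitrary data $x_i$, $\lambda_i$:

```python
import numpy as np

# Kernel spline interpolation: solve S alpha = lambda, with s_ij = K(x_i, x_j),
# then evaluate vhat(x) = sum_i K(x, x_i) alpha_i.
def K(x, y, sigma=0.5):
    return np.exp(-(x - y)**2 / (2. * sigma**2))   # assumed Gaussian kernel

x = np.array([0.0, 0.3, 0.6, 1.0])     # distinct interpolation points
lam = np.array([1.0, -1.0, 0.5, 2.0])  # prescribed values lambda_i

S = K(x[:, None], x[None, :])          # s_ij = K(x_i, x_j), positive definite
alpha = np.linalg.solve(S, lam)        # alpha = S^{-1} lambda

def vhat(t):
    return K(t, x) @ alpha             # vhat(t) = sum_i K(t, x_i) alpha_i

print(np.allclose([vhat(xi) for xi in x], lam))  # True: constraints hold
```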
Another important variant of the same problem arises when the hard constraints $v(x_i) = \lambda_i$ are replaced by soft constraints, in the form of a penalty function added to the minimized norm. This may be expressed as the minimization of a function of the form
\[ E(v) = \|v\|_V^2 + C \sum_{i=1}^N \varphi(|v(x_i) - \lambda_i|) \]
for some increasing, convex function $\varphi$ on $[0, +\infty[$ and $C > 0$. Since the second term of $E$ does not depend on the projection of $v$ on $V_0$, Lemma 1 remains valid, reducing again the problem to finding $v$ of the form
\[ v(x) = \sum_{i=1}^N K(x, x_i) \alpha_i, \]
for which
\[ E(v) = \sum_{i,j=1}^N \alpha_i \alpha_j K(x_i, x_j) + C \sum_{i=1}^N \varphi\Big( \Big| \sum_{j=1}^N K(x_i, x_j) \alpha_j - \lambda_i \Big| \Big). \]
Assume, to simplify, that $\varphi$ is differentiable and $\varphi'(0) = 0$. We have, letting $\psi(x) = \mathrm{sign}(x)\,\varphi'(|x|)$,
\[ \frac{\partial E}{\partial \alpha_j} = 2 \sum_{i=1}^N \alpha_i K(x_i, x_j) + C \sum_{i=1}^N K(x_i, x_j)\, \psi\Big( \sum_{l=1}^N K(x_i, x_l) \alpha_l - \lambda_i \Big). \]
Assuming, still, that the $x_i$ are distinct, we can apply $S^{-1}$ to the system $\frac{\partial E}{\partial \alpha_j} = 0$, $j = 1, \ldots, N$, which characterizes the minimum, yielding
\[ (8) \qquad 2\alpha_i + C\, \psi\Big( \sum_{j=1}^N K(x_i, x_j) \alpha_j - \lambda_i \Big) = 0. \]
This suggests an algorithm (a certain kind of gradient descent) to minimize $E$, which can be described as follows:

Algorithm 1 (General spline smoothing).
Step 0. Start with an initial guess $\alpha^0$ for $\alpha$.
Step t.1. Compute, for $i = 1, \ldots, N$,
\[ \alpha_i^{t+1} = (1 - \gamma)\alpha_i^t - \frac{\gamma C}{2}\, \psi\Big( \sum_{j=1}^N K(x_i, x_j) \alpha_j^t - \lambda_i \Big), \]
where $\gamma$ is a fixed, small enough, real number.
Step t.2. Use a convergence test: if positive, stop; otherwise increment $t$ and restart step t.

This algorithm will converge if $\gamma$ is small enough. However, the particular case of $\varphi(x) = x^2$ is much simpler to solve, since in this case $\psi(x) = 2x$ and equation (8) becomes
\[ 2\alpha_i + 2C \Big( \sum_{j=1}^N K(x_i, x_j) \alpha_j - \lambda_i \Big) = 0. \]
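Algorithm 1 can be sketched as follows (numpy, with an assumed Gaussian kernel and arbitrary data, neither of which is prescribed by the text). Taking $\varphi(x) = x^2$, so that $\psi(x) = 2x$, lets us check the fixed point against the closed-form solution $\alpha = (S + I/C)^{-1}\lambda$ of the last equation:

```python
import numpy as np

def psi(x):
    return 2. * x                      # psi = sign(x) phi'(|x|) for phi(x) = x^2

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
lam = np.array([0.0, 1.0, 0.0, -1.0, 0.0])
S = np.exp(-(x[:, None] - x[None, :])**2 / 0.5)   # s_ij = K(x_i, x_j), assumed Gaussian
C, gamma = 10.0, 0.02                  # penalty weight and small step size

alpha = np.zeros_like(lam)             # step 0: initial guess
for _ in range(2000):                  # step t.1: damped fixed-point update
    alpha = (1. - gamma) * alpha - gamma * C / 2. * psi(S @ alpha - lam)

closed = np.linalg.solve(S + np.eye(len(x)) / C, lam)
print(np.allclose(alpha, closed, atol=1e-6))  # True
```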
The solution of this equation is $\alpha = (S + I/C)^{-1}\lambda$, yielding a result very similar to Theorem 8:

Theorem 9. The minimum over $V$ of
\[ \|v\|_V^2 + C \sum_{i=1}^N |v(x_i) - \lambda_i|^2 \]
is attained at
\[ \hat v(x) = \sum_{i=1}^N K(x, x_i) \alpha_i \]
with
\[ \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} = (S + I/C)^{-1} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \end{pmatrix} \quad \text{and} \quad S = \begin{pmatrix} K(x_1, x_1) & \ldots & K(x_1, x_N) \\ \vdots & & \vdots \\ K(x_N, x_1) & \ldots & K(x_N, x_N) \end{pmatrix}. \]

8.2. The vector case. In the previous section, elements of $V$ were functions from $\Omega$ to $\mathbb{R}$. When working with deformations, which is our goal here, the functions of interest describe displacements of points in $\Omega$ and therefore must be vector valued. This leads us to address the problem of spline approximation for vector fields on $\Omega$, which, as will be seen, is handled very similarly to the scalar case.

So, in this section, $V$ is a Hilbert space, canonically embedded in $L^2(\Omega, \mathbb{R}^d)$ and in $C^0(\Omega, \mathbb{R}^d)$. Fixing $x \in \Omega$, the evaluation function $v \mapsto v(x)$ is a continuous linear map from $V$ to $\mathbb{R}^d$. This implies that, for any $a \in \mathbb{R}^d$, the function $v \mapsto a^T v(x)$ is a continuous linear functional on $V$. We will denote this linear form by $a \otimes \delta_x$, so that
\[ (9) \qquad (a \otimes \delta_x \mid v) = a^T v(x). \]
From the Riesz theorem, there exists a unique element, denoted $K_x^a$, in $V$ such that, for any $v \in V$,
\[ (10) \qquad \langle K_x^a, v\rangle_V = a^T v(x). \]
The map $a \mapsto K_x^a$ is linear from $\mathbb{R}^d$ to $V$ (this is because $a \mapsto a^T v(x)$ is linear and because of the uniqueness of the Riesz representation), which implies that, for $y \in \Omega$, the map $a \mapsto K_x^a(y)$ is linear from $\mathbb{R}^d$ to $\mathbb{R}^d$. This implies that there exists a matrix, which we will denote $K(y, x)$, such that, for $a \in \mathbb{R}^d$ and $x, y \in \Omega$, $K_x^a(y) = K(y, x)a$. In the following, we will preferably use the notation $K_x^a = K(\cdot, x)a$, to emphasize the matricial nature of $K$.

The kernel $K$ being, in the vector case, matrix valued, the reproducing property here reads
\[ \langle K(\cdot, x)a, K(\cdot, y)b\rangle_V = a^T K_y^b(x) = a^T K(x, y) b. \]
By symmetry of the first term, we obtain the fact that, for all $a, b \in \mathbb{R}^d$, $a^T K(x, y) b = b^T K(y, x) a$, which implies that $K(y, x) = K(x, y)^T$.
Concerning the positivity of $K$, we make an assumption similar to the scalar case:
Assumption 1. If $x_1,\dots,x_N \in \Omega$ and $\alpha_1,\dots,\alpha_N \in \mathbb{R}^d$ are such that, for all $v \in V$, $\alpha_1^T v(x_1) + \dots + \alpha_N^T v(x_N) = 0$, then $\alpha_1 = \dots = \alpha_N = 0$.

Under this assumption, it is easy to prove that, for all $\alpha_1,\dots,\alpha_N \in \mathbb{R}^d$,
$$\sum_{i,j=1}^N \alpha_i^T K(x_i, x_j)\alpha_j \ge 0$$
with equality if and only if all $\alpha_i$ vanish.

The interpolation problem in the vector case becomes

($S_d$) Given $x_1,\dots,x_N$ in $\Omega$ and $\lambda_1,\dots,\lambda_N$ in $\mathbb{R}^d$, find $v$ in $V$, with minimum norm, such that $v(x_i) = \lambda_i$.

As before, we let
$$V_0 = \{v \in V : v(x_i) = 0,\ i = 1,\dots,N\}.$$
Then, Lemma 1 remains valid:
Lemma 2. If there exists a solution $\hat v$ of problem ($S_d$), then $\hat v \in V_0^\perp$. Moreover, if $\hat v \in V_0^\perp$ is a solution of ($S_d$) restricted to this set, then it is a solution of ($S_d$) on $V$.

We add to this the characterization of $V_0^\perp$, which is slightly less straightforward than in the scalar case:

Lemma 3.
$$V_0^\perp = \Big\{ v = \sum_{i=1}^N K(\cdot, x_i)\alpha_i,\ \alpha_1,\dots,\alpha_N \in \mathbb{R}^d \Big\}$$

Thus, a vector field $v$ belongs to $V_0^\perp$ if and only if there exist $\alpha_1,\dots,\alpha_N$ in $\mathbb{R}^d$ such that, for all $x \in \Omega$,
$$v(x) = \sum_{i=1}^N K(x, x_i)\alpha_i.$$
This expression is formally similar to the scalar case, with the difference that $K(x, x_i)$ is a matrix and $\alpha_i$ is a vector.
Proof. It is clear that $w \in V_0$ is equivalent to the fact that, for any $\alpha_1,\dots,\alpha_N$, one has
$$\sum_{i=1}^N \alpha_i^T w(x_i) = 0,$$
so that $w \in V_0$ is equivalent to $\langle v, w\rangle_V = 0$ for all $v$ of the form $v = \sum_{i=1}^N K(\cdot, x_i)\alpha_i$. Thus
$$V_0 = \Big\{ v = \sum_{i=1}^N K(\cdot, x_i)\alpha_i,\ \alpha_1,\dots,\alpha_N \in \mathbb{R}^d \Big\}^\perp$$
and since $\{ v = \sum_{i=1}^N K(\cdot, x_i)\alpha_i,\ \alpha_1,\dots,\alpha_N \in \mathbb{R}^d \}$ is finite dimensional, hence closed, one has
$$V_0^\perp = \Big\{ v = \sum_{i=1}^N K(\cdot, x_i)\alpha_i,\ \alpha_1,\dots,\alpha_N \in \mathbb{R}^d \Big\}. \qquad\square$$
When $v = \sum_{j=1}^N K(\cdot, x_j)\alpha_j \in V_0^\perp$, the constraint $v(x_i) = \lambda_i$ yields
$$\sum_{j=1}^N K(x_i, x_j)\alpha_j = \lambda_i.$$
Since we also have, in this case,
$$\|v\|_V^2 = \sum_{i,j=1}^N \alpha_i^T K(x_i, x_j)\alpha_j,$$
the whole problem can be rewritten quite concisely in matricial form, introducing the notation

(11) $S = S(x_1,\dots,x_N) = \begin{pmatrix} K(x_1,x_1) & \dots & K(x_1,x_N)\\ \vdots & & \vdots\\ K(x_N,x_1) & \dots & K(x_N,x_N)\end{pmatrix}$

which is now a block matrix of size $Nd \times Nd$, and
$$\alpha = \begin{pmatrix}\alpha_1\\ \vdots\\ \alpha_N\end{pmatrix}, \qquad \lambda = \begin{pmatrix}\lambda_1\\ \vdots\\ \lambda_N\end{pmatrix},$$
each $\alpha_i$, $\lambda_i$ being considered as a $d$-dimensional column vector. The whole set of constraints now becomes $S\alpha = \lambda$, and $\|v\|_V^2 = \alpha^T S\alpha$. Thus, replacing scalars by blocks, the problem has exactly the same structure as in the previous case, and we can repeat the results we have obtained:
Theorem 10 (Interpolating splines). Problem ($S_d$) has a unique solution in $V$, given by
$$\hat v(x) = \sum_{i=1}^N K(x, x_i)\alpha_i$$
with
$$\begin{pmatrix}\alpha_1\\ \vdots\\ \alpha_N\end{pmatrix} = \begin{pmatrix} K(x_1,x_1) & \dots & K(x_1,x_N)\\ \vdots & & \vdots\\ K(x_N,x_1) & \dots & K(x_N,x_N)\end{pmatrix}^{-1}\begin{pmatrix}\lambda_1\\ \vdots\\ \lambda_N\end{pmatrix}.$$

Theorem 11 (Smoothing splines). The minimum over $V$ of
$$\|v\|_V^2 + C\sum_{i=1}^N |v(x_i) - \lambda_i|^2$$
is attained at
$$\hat v(x) = \sum_{i=1}^N K(x, x_i)\alpha_i$$
with
$$\begin{pmatrix}\alpha_1\\ \vdots\\ \alpha_N\end{pmatrix} = (S + I/C)^{-1}\begin{pmatrix}\lambda_1\\ \vdots\\ \lambda_N\end{pmatrix}$$
and $S = S(x_1,\dots,x_N)$ given in equation (11).
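A minimal sketch of Theorem 10 for vector fields, using the block matrix $S$ of equation (11) with the diagonal Gaussian kernel $K(x,y) = e^{-|x-y|^2/\sigma^2}\,\mathrm{Id}$ (an illustrative choice; the theorem holds for any admissible matrix kernel):

```python
import numpy as np

def K_mat(x, y, sigma=1.0):
    """Matrix-valued kernel K(x, y) = exp(-|x - y|^2 / sigma^2) * Id (diagonal choice)."""
    d = len(x)
    return np.exp(-np.sum((x - y)**2) / sigma**2) * np.eye(d)

def block_S(pts, sigma=1.0):
    """Assemble the Nd x Nd block matrix S of equation (11)."""
    N, d = pts.shape
    S = np.zeros((N * d, N * d))
    for i in range(N):
        for j in range(N):
            S[i*d:(i+1)*d, j*d:(j+1)*d] = K_mat(pts[i], pts[j], sigma)
    return S

def interpolate(pts, lam, sigma=1.0):
    """Solve S alpha = lambda and return the interpolating vector field."""
    N, d = pts.shape
    alpha = np.linalg.solve(block_S(pts, sigma), lam.ravel()).reshape(N, d)
    def v(x):
        return sum(K_mat(x, pts[i], sigma) @ alpha[i] for i in range(N))
    return v

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
lam = np.array([[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1]])
v = interpolate(pts, lam)   # v(x_i) = lambda_i exactly
```

Since the Gaussian kernel is positive definite and the points are distinct, $S$ is invertible and the constraints are satisfied exactly.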
Finally, algorithm 1 remains the same, replacing scalars by vectors.
9. Building V and its kernel

9.1. Building V from operators. One way to define a Hilbert space $V$ of smooth functions or vector fields is to use inner products associated with operators. In this framework, an inner product, defined from the action of an operator, is first defined on a subset of $L^2(\Omega,\mathbb{R})$ (within which everything can be defined easily, i.e., in a classical sense when speaking of derivatives), but which is not complete for the induced norm. This subspace is then extended to a larger subspace of $L^2(\Omega,\mathbb{R})$, which will be our Hilbert space $V$. This is called the Friedrichs extension of an operator. Since the construction is not restricted to subspaces of $L^2(\Omega,\mathbb{R})$, we work in the following with an arbitrary Hilbert space $H$.

To start, we need a subspace $D$, included in $H$ and dense in this space, and an operator (i.e., a linear map) $L : D \to H$. Our typical application will be with $D = C_K^\infty(\Omega,\mathbb{R}^d)$ (the set of $C^\infty$ functions with compact support in $\Omega$) and $H = L^2(\Omega,\mathbb{R}^d)$. In such a case, $L$ may be chosen as a differential operator of any degree, since derivatives of $C^\infty$ functions with compact support obviously belong to $L^2$. However, $L$ will be assumed to satisfy an additional monotonicity constraint:

Assumption 2. The operator $L$ is symmetric and strongly monotone on $D$, which means that there exists a constant $c > 0$ such that, for all $u \in D$,
(12) $\langle u, Lu\rangle_H \ge c\,\langle u, u\rangle_H$

and, for all $u, v \in D$,
(13) $\langle u, Lv\rangle_H = \langle Lu, v\rangle_H$.

An example of a strongly monotone operator on $C_K^\infty(\Omega,\mathbb{R})$ is given by $Lu = -\Delta u + \lambda u$ (with $\lambda > 0$), where $\Delta$ is the Laplacian: $\Delta u = \sum_{i=1}^d \partial^2 u/\partial x_i^2$. Indeed, in this case, when $u$ has compact support, an integration by parts yields
$$-\int_\Omega \Delta u(x)\,u(x)\,dx = \sum_{i=1}^d \int_\Omega \Big(\frac{\partial u}{\partial x_i}\Big)^2 dx \ge 0$$
so that $\langle u, Lu\rangle_{L^2} \ge \lambda\,\langle u, u\rangle_{L^2}$.

Returning to the general case, the operator $L$ induces an inner product on $D$, defined by
$$\langle u, v\rangle_L = \langle u, Lv\rangle_H.$$
Assumption 2 ensures the symmetry of this product and its positive definiteness.
But $D$ is not complete for $\|\cdot\|_L$, and we need to enlarge it (and simultaneously extend $L$) to obtain a Hilbert space.

Theorem 12 (Friedrichs extension). There exists a subspace $V \subset H$ such that $D$ is a dense subspace of $V$, and $L$ can be extended to an operator $\hat L : V \to V^*$ such that:
• $V$ is embedded in $H$;
• if $u, v \in D$, then $\langle u, v\rangle_V = \langle Lu, v\rangle_H = (\hat L u \mid v)$.
The fact that $\hat L$ is an extension of $L$ comes modulo the identification $H = H^*$. Indeed, we have $V \subset H = H^* \subset V^*$ (by the "duality paradox"), so that $L$, defined on $D \subset V$, can be seen as an operator with values in $H^*$.

Definition 3. The operator $\hat L$ defined in Theorem 12 is called the energetic extension of $L$. Its restriction to the space
$$D_L = \{ u \in V : \hat L u \in H = H^* \}$$
is called the Friedrichs extension of $L$. The Friedrichs extension of $L$ will still be denoted $L$ in the following.

We will not prove this theorem, but the interested reader may refer to [28]. The Friedrichs extension has other interesting properties:
Theorem 13. $L : D_L \to H$ is bijective and self-adjoint ($\langle Lu, v\rangle_H = \langle u, Lv\rangle_H$ for all $u, v \in D_L$). Its inverse, $L^{-1} : H \to H$, is continuous and self-adjoint. If the embedding $V \subset H$ is compact, then $L^{-1} : H \to H$ is compact.

In the following, we will mainly be interested in embeddings stronger than the $L^2$ embedding implied by the monotonicity assumption. It is important that such embeddings are preserved by the extension whenever they are true on the initial space $D$. This is stated in the next proposition.

Proposition 7. Let $D$, $V$ and $H$ be as in Theorem 12, and let $B$ be a Banach space such that $D \subset B \subset H$ and $B$ is canonically embedded in $H$ (there exists $c_1 > 0$ such that $\|u\|_B \ge c_1\|u\|_H$). Assume that there exists a constant $c_2$ such that, for all $u \in D$, $\langle Lu, u\rangle_H \ge c_2^2\,\|u\|_B^2$. Then $V \subset B$ and $\|u\|_V \ge c_2\|u\|_B$ for all $u \in V$. In particular, if $B$ is compactly embedded in $H$, then $L^{-1} : H \to H$ is compact.
Proof. Let $u \in V$. Since $D$ is dense in $V$, there exists a sequence $u_n \in D$ such that $\|u_n - u\|_V \to 0$. Thus $u_n$ is a Cauchy sequence in $V$ and, by our assumption, it is also a Cauchy sequence in $B$, so that there exists $u' \in B$ such that $\|u_n - u'\|_B$ tends to 0. But since $V$ and $B$ are both embedded in $H$, we have $\|u_n - u\|_H \to 0$ and $\|u_n - u'\|_H \to 0$, which implies that $u = u'$. Thus $u$ belongs to $B$, and since $\|u_n\|_V$ and $\|u_n\|_B$ respectively converge to $\|u\|_V$ and $\|u\|_B$, passing to the limit in the inequality $\|u_n\|_V \ge c_2\|u_n\|_B$ ends the proof of Proposition 7. $\square$

We now investigate the relation between this theory and the previous analyses using kernels. From now on, we assume $D = C_K^\infty(\Omega)$ and $H = L^2(\Omega,\mathbb{R})$. The Hilbert space $V$ will have the same role in both cases, but we need the continuity of the evaluation of functions in $V$, which here leads to assuming that there exists a constant $c > 0$ such that, for all $u \in D$,

(14) $\langle Lu, u\rangle_{L^2} \ge c\,\|u\|_\infty^2$

since, as we have seen, this implies that $V$ will be continuously embedded in $C^0(\Omega,\mathbb{R})$. Thus, the kernel $K$ is well defined on $V$, with the property that $\langle K_x, v\rangle_V = v(x) = (\delta_x \mid v)$. The linear form $\delta_x$ belongs to $V^*$ and, by Theorem 12, which implies that $\langle K_x, v\rangle_V = (\hat L K_x \mid v)$, we have $\hat L K_x = \delta_x$, or
$$K_x = \hat L^{-1}\delta_x.$$
This exhibits a direct relationship between the self-reproducing kernel on $V$ and the energetic extension of $L$. For this reason, we will denote (with some abuse of notation) $K = \hat L^{-1} : V^* \to V$ and write $K_x = K\delta_x$. In the vector case, we will make the identification $K(\cdot, x)a = K(a \otimes \delta_x)$.

There is another consequence, which stems from the fact that $C^0(\Omega,\mathbb{R})$ is compactly embedded in $L^2(\Omega,\mathbb{R})$, so that Theorem 13 implies that $L^{-1}$ is a compact, self-adjoint operator. Such operators have the important property of admitting an orthonormal sequence of eigenvectors: more precisely, there exists a decreasing sequence $(\rho_n)$ of positive numbers, which is either finite or tends to 0, and an orthonormal sequence $(\varphi_n)$ in $L^2(\Omega,\mathbb{R})$, such that, for $u \in L^2(\Omega,\mathbb{R})$,
$$L^{-1}u = \sum_{n=1}^\infty \rho_n\langle u, \varphi_n\rangle_{L^2}\,\varphi_n.$$
This directly characterizes $D_L$ as the set
$$D_L = \Big\{ u \in L^2(\Omega,\mathbb{R}) : \sum_{n=1}^\infty \frac{\langle u, \varphi_n\rangle_{L^2}^2}{\rho_n^2} < \infty \Big\}$$
and, for $u \in D_L$, we have
$$Lu = \sum_{n=1}^\infty \rho_n^{-1}\langle u, \varphi_n\rangle_{L^2}\,\varphi_n$$
so that, for $u, v \in D_L$,
$$\langle u, v\rangle_V = \langle Lu, v\rangle_{L^2} = \sum_{n=1}^\infty \rho_n^{-1}\langle u, \varphi_n\rangle_{L^2}\langle v, \varphi_n\rangle_{L^2}.$$
This indicates that $V$ should be given by
$$V = \Big\{ u \in L^2(\Omega,\mathbb{R}) : \sum_{n=1}^\infty \rho_n^{-1}\langle u, \varphi_n\rangle_{L^2}^2 < \infty \Big\}.$$
This is indeed the case; $D_L$ is dense in this set: if $u \in V$, then $u_N = \sum_{n=1}^N \langle u, \varphi_n\rangle\varphi_n$ belongs to $D_L$ and $\|u_N - u\|_V \to 0$. We summarize what we have just obtained in the following theorem.
Theorem 14. Assume that $D = C_K^\infty(\Omega)$ and $H = L^2(\Omega,\mathbb{R})$, and that $L : D \to H$ is symmetric and satisfies
$$\langle Lu, u\rangle_{L^2} \ge c\,\|u\|_\infty^2$$
for some constant $c > 0$. Then the energetic space $V$ of $L$ is continuously embedded in $C^0(\Omega,\mathbb{R})$ and its self-reproducing kernel is $K_x = \hat L^{-1}\delta_x$. Moreover, there exists an orthonormal basis $(\varphi_n)$ of $L^2(\Omega,\mathbb{R})$ and a decreasing sequence of positive numbers $(\rho_n)$, which tends to 0, such that
$$V = \Big\{ u \in L^2(\Omega,\mathbb{R}) : \sum_{n=1}^\infty \rho_n^{-1}\langle u, \varphi_n\rangle_{L^2}^2 < \infty \Big\}.$$
Moreover,
$$Lu = \sum_{n=1}^\infty \rho_n^{-1}\langle u, \varphi_n\rangle_{L^2}\,\varphi_n$$
whenever
$$\sum_{n=1}^\infty \frac{\langle u, \varphi_n\rangle_{L^2}^2}{\rho_n^2} < \infty.$$
9.2. Invariance of the inner product.

9.2.1. The operator side. We haven't discussed so far under which criteria the inner product (or the operator $L$) should be selected. One important criterion, in particular for shape recognition, is equivariance or invariance with respect to rigid transformations. We here consider the situation $\Omega = \mathbb{R}^d$.

To understand the requirement, return to the interpolation problem for vector fields. Let $x_1,\dots,x_N \in \Omega$ and vectors $v_1,\dots,v_N$ be given. We want the optimal interpolation $h$ to be invariant under an orthogonal change of coordinates. This means that, if $A$ is a rotation and $b \in \mathbb{R}^d$, and if we consider the problem associated with the points $Ax_1 + b, \dots, Ax_N + b$ and the constraints $Av_1,\dots,Av_N$, the obtained function $h^{A,b}$ must satisfy $h^{A,b}(Ax + b) = Ah(x)$, which can also be written
$$h^{A,b}(x) = Ah(A^{-1}x - A^{-1}b) =: ((A,b)\star h)(x).$$
The transformation $h \mapsto (A,b)\star h$ is an action of the group of rotations and translations on functions defined on $\Omega$. To ensure that $h^{A,b}$ always satisfies this condition, it is sufficient to enforce that the norm is invariant under this transformation: $\|(A,b)\star h\|_V = \|h\|_V$. So, we need the action of translations and rotations to be isometric on $V$.

Assume that $\|h\|_V^2 = (Lh \mid h)$ for some operator $L$, and define the operator $(A,b)\star L$ by requiring that, for all $h, \tilde h \in V$,
$$\big(L((A,b)\star h) \,\big|\, (A,b)\star\tilde h\big) =: \big(((A,b)\star L)h \,\big|\, \tilde h\big)$$
so that the invariance condition, expressed in terms of $L$, becomes $(A,b)\star L = L$.

Denoting by $h_k$ the $k$th coordinate of $h$, and by $(Lh)_k$ the $k$th coordinate of $Lh$, we can write the matrix operator $L$ in the form
$$(Lh)_k = \sum_{l=1}^d L_{kl}h_l$$
where Lkl is a scalar operator. We consider the case when each Lkl is a pseudo- differential operator in the sense that, for any C∞ scalar function u, the Fourier d transform of Lklu takes the form, for ξ ∈ R :
$$F(L_{kl}u)(\xi) = f_{kl}(\xi)\,\hat u(\xi)$$
for some scalar function $f_{kl}$. Here and in the following, the Fourier transform of a function $u$ is denoted either $\hat u$ or $F(u)$. An interesting case is when $f_{kl}$ is a polynomial, which corresponds, by standard properties of the Fourier transform, to $L_{kl}$ being a differential operator. Also, by the isometric property of Fourier transforms,
$$(Lh \mid h) = \sum_{k=1}^d \big((Lh)_k \,\big|\, h_k\big) = \sum_{k,l=1}^d \big(L_{kl}h_l \,\big|\, h_k\big) = \sum_{k,l=1}^d \big(F(L_{kl}h_l) \,\big|\, F(h_k)\big) = \sum_{k,l=1}^d \big(f_{kl}\hat h_l \,\big|\, \hat h_k\big).$$
Note that, since we want $L$ to be a positive operator, we need the matrix $f(\xi) = (f_{kl}(\xi))$ to be a complex, positive, Hermitian matrix. Moreover, since we want $(Lh \mid h) \ge c\,(h \mid h)$ for some $c > 0$, it is natural to enforce that the smallest eigenvalue of $f(\xi)$ is uniformly bounded from below (by $c$).

To use this expression, we need to compute $F((A,b)\star h)$ as a function of $\hat h$. We have
$$F((A,b)\star h)(\xi) = \int_{\mathbb{R}^d} e^{-i\xi^T x}\,((A,b)\star h)(x)\,dx = \int_{\mathbb{R}^d} e^{-i\xi^T x}\,Ah(A^{-1}x - A^{-1}b)\,dx = \int_{\mathbb{R}^d} e^{-i\xi^T(Ay+b)}\,Ah(y)\,dy = e^{-i\xi^T b}A\hat h(A^T\xi),$$
where we have used the fact that $\det A = 1$ (since $A$ is a rotation matrix) in the change of variables. We can therefore write, denoting by $\varphi_b$ the function $\xi \mapsto \exp(-i\xi^T b)$,
$$\big(L((A,b)\star h) \,\big|\, (A,b)\star h\big) = \sum_{k,l=1}^d \sum_{k',l'=1}^d \big(\varphi_b\,f_{kl}\,a_{ll'}\,\hat h_{l'}\circ A^T \,\big|\, \varphi_b\,a_{kk'}\,\hat h_{k'}\circ A^T\big) = \sum_{k,l=1}^d \sum_{k',l'=1}^d \big(f_{kl}\circ A\;a_{ll'}\,\hat h_{l'} \,\big|\, a_{kk'}\,\hat h_{k'}\big).$$
We have used the fact that $\det A = 1$ and that $A^T = A^{-1}$. The function $\varphi_b$ cancels in the Hermitian product
$$(u \mid v) = \int_{\mathbb{R}^d} u\,\bar v\,d\xi,$$
where $\bar v$ is the complex conjugate of $v$. Reordering the last equation, we obtain
$$\big(L((A,b)\star h) \,\big|\, (A,b)\star h\big) = \sum_{k',l'=1}^d \Big(\sum_{k,l=1}^d a_{kk'}\,f_{kl}\circ A\;a_{ll'}\,\hat h_{l'} \,\Big|\, \hat h_{k'}\Big)$$
which yields our invariance condition in terms of the $f_{kl}$: for all $k, l$,
$$f_{kl} = \sum_{k',l'=1}^d a_{k'k}\,f_{k'l'}\circ A\;a_{l'l}$$
or, representing $f$ as a complex matrix, $A^T f(A\xi)A = f(\xi)$ for all $\xi$ and all rotation matrices $A$.

We now investigate the consequences of this constraint. First consider the case $\xi = 0$. The identity $A^T f(0)A = f(0)$ for all rotations $A$ implies, by a standard linear algebra theorem, that $f(0) = \lambda\,\mathrm{Id}$ for some $\lambda \in \mathbb{R}$. Now fix $\xi \ne 0$, and let $\eta_1 = \xi/|\xi|$. Extend $\eta_1$ into an orthonormal basis $(\eta_1,\dots,\eta_d)$ and let $(e_1,\dots,e_d)$ be the canonical basis of $\mathbb{R}^d$. Define $A_0$ by $A_0\eta_i = e_i$; this yields $A_0\xi = |\xi|e_1$ and $f(\xi) = A_0^T f(|\xi|e_1)A_0 =: M_0$. Now, for any rotation $A$ that leaves $\eta_1$ invariant, we also have $f(\xi) = A^T A_0^T f(|\xi|e_1)A_0 A$, since $A_0 A\eta_1 = e_1$. This yields the fact that, for any $A$ such that $A\eta_1 = \eta_1$, we have $A^T M_0 A = M_0$.

This property implies, in particular, that $A^T M_0\eta_1 = M_0\eta_1$ for all $A$ such that $A\eta_1 = \eta_1$. Taking the decomposition of $M_0\eta_1$ on $\eta_1$ and $\eta_1^\perp$, it is easy to see that this implies $M_0\eta_1 = \alpha\eta_1$ for some $\alpha$. So $\eta_1$ is an eigenvector of $M_0$, which implies that $\eta_1^\perp$ is closed under the action of $M_0$. But the fact that $A^T M_0 A = M_0$ for any rotation $A$ of $\eta_1^\perp$ implies that $M_0$ restricted to $\eta_1^\perp$ is a homothety, i.e., there exists $\lambda$ such that $M_0\eta_i = \lambda\eta_i$ for $i \ge 2$. Using the decomposition $u = \langle u, \eta_1\rangle\eta_1 + (u - \langle u, \eta_1\rangle\eta_1)$, we can write
$$M_0 u = \alpha\langle u, \eta_1\rangle\eta_1 + \lambda(u - \langle u, \eta_1\rangle\eta_1) = \big((\alpha - \lambda)\eta_1\eta_1^T + \lambda\,\mathrm{Id}\big)u,$$
which finally yields (letting $\mu = \alpha - \lambda$, and using the fact that the eigenvalues of $M_0 = A_0^T f(|\xi|e_1)A_0$ are the same as those of $f(|\xi|e_1)$ and therefore only depend on $|\xi|$)
$$f(\xi) = \mu(|\xi|)\frac{\xi\xi^T}{|\xi|^2} + \lambda(|\xi|)\,\mathrm{Id}.$$
Note that, since we know that $f(0) = \lambda\,\mathrm{Id}$, we must have $\mu(0) = 0$. We summarize this in the following theorem.
Theorem 15. Let $L$ be a matricial operator such that there exists a matrix-valued function $f$ such that, for any $u \in C_K^\infty(\mathbb{R}^d,\mathbb{R}^d)$, we have
$$F\big((Lu)_k\big)(\xi) = \sum_{l=1}^d f_{kl}(\xi)\,\hat u_l(\xi).$$
Then $L$ is rotation invariant if and only if $f$ takes the form
$$f(\xi) = \mu(|\xi|)\frac{\xi\xi^T}{|\xi|^2} + \lambda(|\xi|)\,\mathrm{Id}$$
for some functions $\lambda$ and $\mu$ such that $\mu(0) = 0$.
Let’s first consider the case µ = 0: in this case, we have fkl = 0 if k 6= l and fkk(ξ) = λ(|ξ|) for all k, which implies that Lkl = 0 for k 6= l and all Lkk are equal to a single scalar operator L0, i.e.,
$$Lh = (L_0 h_1, \dots, L_0 h_d).$$
This is the type of operator that is mostly used in practice. If one wants $L_0$ to be a differential operator, $\lambda(|\xi|)$ must be a polynomial in the coefficients of $\xi$, which is only possible if it takes the form
$$\lambda(|\xi|) = \sum_{i=0}^p \lambda_i|\xi|^{2i}.$$
In this case, the corresponding operator is $L_0 u = \sum_{i=0}^p (-1)^i\lambda_i\Delta^i u$. For example, if $\lambda(|\xi|) = (\alpha + |\xi|^2)^2$, we get $L_0 = (\alpha\,\mathrm{Id} - \Delta)^2$.

Let's consider an example of a valid differential operator with $\mu \ne 0$, taking $\mu(|\xi|) = \alpha|\xi|^2$ and $\lambda$ as above. This yields the operator
$$L_{kl}u = -\alpha\frac{\partial^2 u}{\partial x_k\partial x_l} + \delta_{kl}\sum_{i=0}^p (-1)^i\lambda_i\Delta^i u$$
so that
$$Lh = -\alpha\,\nabla\,\mathrm{div}\,h + \Big(\sum_{i=0}^p (-1)^i\lambda_i\Delta^i\Big)h.$$
F (Kklu)(ξ) = gkl(ξ)ˆu(ξ) with g = f −1 (the inverse matrix), as can be checked by a simple computation. Note that, for the spline application, we need K(a⊗δx) to be well defined. The kth T component of a ⊗ δx being akδx, we can write F ((a ⊗ δx))k (ξ) = ak exp(−iξ x) so that d X T F (K(a ⊗ δx))k = gkl(ξ)al exp(−iξ x). l=1 A sufficient condition for this to provide a well-defined function after Fourier inver- sion is that all gkl are integrable. Assuming this, we have d X (K(a ⊗ δx)(y))k = g˜kl(y − x)al l=1 whereg ˜kl is the inverse Fourier transform of gkl. In the case f = λId + µηηT , we can check that 1 µ f −1 = (Id − ηηT ). λ λ + µ Let ρ(ξ) = 1/λ(|ξ|) and ψ(ξ) = µ(ξ)/(λ(ξ)|ξ|2(λ(ξ) + µ(ξ))), so that f −1(ξ) = ρ(|ξ|)Id − ψ(|ξ|)|ξ|2ηηT . The eigenvalues of f −1 therefore are ρ(ξ) and ρ(ξ) − ψ(ξ)|ξ|2. Since the eigenvalues of f must be bounded from below by c, we obtain the following constraints on ρ and ψ: |ξ|2 max(ψ, 0) ≤ ρ ≤ c. We can moreover obtain the expression of the kernel in terms of the inverse Fourier transforms of ρ and ψ, which is: ∂2ψ˜ K(x, y)kl =g ˜kl(x − y) = δklρ˜(x − y) + (x − y). ∂xk∂xl 9. BUILDING V AND ITS KERNEL 29
We can make the additional remark that the inverse Fourier transform of a radial function (i.e., a function $u(\xi)$ that only depends on $|\xi|$) is also a radial function. We have in fact
$$\tilde\rho(x) = \int_{\mathbb{R}^d} \rho(|\xi|)\,e^{i\xi^T x}\,d\xi = \int_0^\infty \rho(t)\,t^{d-1}\int_{S^{d-1}} e^{it\,\eta^T x}\,ds(\eta)\,dt = \int_0^\infty \rho(t)\,t^{d-1}\int_{S^{d-1}} e^{i\eta_1 t|x|}\,ds(\eta)\,dt,$$
where $S^{d-1}$ is the unit sphere in $\mathbb{R}^d$, and the last identity comes from the fact that the inner integral is invariant under rotations of $x$, which allows us to replace $x$ by $|x|e_1$. The last integral can in turn be expressed in terms of the Bessel function $J_{d/2}$ ([21]), yielding an expression which will not be used here.

There therefore exist two functions $\gamma_1$ and $\gamma_2$ such that $\tilde\rho(x) = \gamma_1(|x|^2)$ and $\tilde\psi(x) = \gamma_2(|x|^2)$, and the expression for $K$ becomes
$$K(x, y) = \big(\gamma_1(|x - y|^2) + 2\gamma_2'(|x - y|^2)\big)\,\mathrm{Id} + 4\gamma_2''(|x - y|^2)\,(x - y)(x - y)^T.$$
Again, the choice γ2 = 0 is commonly made in practice, yielding diagonal radial kernels [11]. There is a nice dimension independent characterization of scalar radial kernels:
Proposition 8. Assume that there exists a positive function $f$ on $]0, +\infty[$ such that
$$\gamma(t) = \int_0^{+\infty} e^{-t^2 u}f(u)\,du.$$
Then, whatever the dimension $d$, the scalar kernel $K(x, y) = \gamma(|x - y|)$ is positive definite.

The interesting fact is that this sufficient condition is almost necessary. The necessary and sufficient condition, which requires Lebesgue integration, is that there exists a positive measure $\mu$ on $[0, +\infty[$ such that
$$\gamma(t) = \int_0^{+\infty} e^{-t^2 u}\,d\mu(u).$$
An important application is when $\mu$ is a Dirac measure, $\mu = \delta_{\sigma^{-2}}$, which provides $\gamma(t) = e^{-t^2/\sigma^2}$. The kernel
$$K(x, y) = e^{-|x - y|^2/\sigma^2}$$
is called the Gaussian kernel on $\mathbb{R}^d$ and is probably the most commonly used for spline smoothing. But Proposition 8 can be used to generate other kernels; for example, $f(u) = e^{-u}$ provides the Cauchy kernel
$$K(x, y) = \frac{1}{1 + |x - y|^2}.$$
Translation-invariant kernels (not necessarily radial), of the form $K(x, y) = \Gamma(x - y)$, can be characterized in a similar way.
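Positive definiteness of the Gaussian and Cauchy kernels can be checked numerically on a finite point set: the Gram matrix $(K(x_i, x_j))_{ij}$ should have only positive eigenvalues. The point configuration below is an assumed example.

```python
import numpy as np

# Deterministic 1-D sample points x_i = 0, 1, ..., 8.
x = np.arange(9.0)
r2 = (x[:, None] - x[None, :])**2       # squared pairwise distances

K_gauss = np.exp(-r2)                   # Gaussian kernel, sigma = 1
K_cauchy = 1.0 / (1.0 + r2)             # Cauchy kernel (f(u) = e^{-u})

# Both Gram matrices are symmetric; Proposition 8 predicts positive eigenvalues.
eig_g = np.linalg.eigvalsh(K_gauss)
eig_c = np.linalg.eigvalsh(K_cauchy)
```

The same check with, say, $f(u) = u e^{-u}$ would produce yet another valid radial kernel via the integral of Proposition 8.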
Proposition 9. Assume that there exists a positive, even function $F$ on $\mathbb{R}^d$ such that
$$\Gamma(x) = \int_{\mathbb{R}^d} e^{-ix^T u}F(u)\,du.$$
Then the kernel $\chi(x, y) = \Gamma(x - y)$ is positive definite.

9.3. Mercer's theorem. The interest of the above discussion is that it is possible to reconstruct the Hilbert space $V$ used in the spline representation from a positive definite kernel. We gave a description using Fourier transforms, but another way to achieve this is by using Mercer's theorem [23].
Theorem 16. We take $\Omega = \mathbb{R}^d$. Let $K : \Omega\times\Omega \to \mathbb{R}$ be a continuous, positive definite kernel such that
$$\int_{\Omega\times\Omega} K(x, y)^2\,dx\,dy < \infty.$$
Then there exist an orthonormal sequence of functions $\varphi_1, \varphi_2, \dots$ in $L^2(\Omega,\mathbb{R})$ and a decreasing sequence $(\rho_n)$, which tends to 0 when $n$ tends to $\infty$, such that
$$K(x, y) = \sum_{n=1}^\infty \rho_n\varphi_n(x)\varphi_n(y).$$

Let the conditions of Mercer's theorem hold, and define the Hilbert space $V$ by
$$V = \Big\{ v \in L^2(\Omega,\mathbb{R}) : \sum_{n=1}^\infty \rho_n^{-1}\langle v, \varphi_n\rangle_{L^2}^2 < \infty \Big\}.$$
Define, for $v, w \in V$:
$$\langle v, w\rangle_V = \sum_{n=1}^\infty \rho_n^{-1}\langle v, \varphi_n\rangle_{L^2}\langle w, \varphi_n\rangle_{L^2}.$$
Note that, for $v \in V$, the series
$$v(x) = \sum_{n=1}^\infty \langle v, \varphi_n\rangle_{L^2}\,\varphi_n(x)$$
converges pointwise, since
$$\Big(\sum_{n=p}^m \langle v, \varphi_n\rangle_{L^2}\varphi_n(x)\Big)^2 \le \sum_{n=p}^m \rho_n^{-1}\langle v, \varphi_n\rangle_{L^2}^2\ \sum_{n=p}^m \rho_n\varphi_n(x)^2$$
and both factors in the upper bound can be made arbitrarily small. Similarly,
$$v(x) - v(y) = \sum_{n=1}^\infty \langle v, \varphi_n\rangle_{L^2}\,(\varphi_n(x) - \varphi_n(y)),$$
so that
$$(v(x) - v(y))^2 \le \sum_{n=1}^\infty \rho_n^{-1}\langle v, \varphi_n\rangle_{L^2}^2\ \sum_{n=1}^\infty \rho_n(\varphi_n(x) - \varphi_n(y))^2 = \|v\|_V^2\,\big(K(x,x) - 2K(x,y) + K(y,y)\big),$$
so that $v$ is continuous.
Then, letting $K_x = K(\cdot, x)$, one has
$$\langle\varphi_m, K_x\rangle_{L^2} = \sum_{n=1}^\infty \rho_n\varphi_n(x)\langle\varphi_m, \varphi_n\rangle_{L^2} = \rho_m\varphi_m(x),$$
so that
$$\sum_{n=1}^\infty \rho_n^{-1}\langle\varphi_n, K_x\rangle_{L^2}^2 = \sum_{n=1}^\infty \rho_n\varphi_n(x)^2 = K(x, x) < \infty.$$
Hence $K_x \in V$, and a similar computation shows that $\langle K_x, K_y\rangle_V = K(x, y)$, so that $K$ is self-reproducing. Finally, if $v \in V$,
$$\langle v, K_x\rangle_V = \sum_{n=1}^\infty \rho_n^{-1}\langle v, \varphi_n\rangle_{L^2}\langle\varphi_n, K_x\rangle_{L^2} = \sum_{n=1}^\infty \langle v, \varphi_n\rangle_{L^2}\varphi_n(x) = v(x),$$
so that $K_x$ corresponds to the Riesz representation of the evaluation functional on $V$. Thus, finding a continuous, positive definite, square integrable kernel is enough to make the whole theory work.

9.4. Thin plate interpolation. Thin plate theory corresponds to the situation in which the operator $L$ is some power of the Laplacian. As a tool for shape analysis, it was originally described by Bookstein [5]. Consider the following bilinear form

(15) $\displaystyle \langle f, g\rangle_L = \int_{\mathbb{R}^2} \Delta f\,\Delta g\,dx = \int_{\mathbb{R}^2} f\,\Delta^2 g\,dx$

which corresponds to the operator $L = \Delta^2$. We need to define it on a Hilbert space which is somewhat unusual. We consider the space $H_1$ of all functions on $\mathbb{R}^d$ with square integrable second derivatives which have a bounded gradient at infinity. On this space, $\|f\|_L = 0$ is equivalent to the fact that $f$ is affine, i.e., $f(x) = a^T x + b$ for some $a \in \mathbb{R}^d$ and $b \in \mathbb{R}$. The Hilbert space we consider is the space of $f \in H_1$ modulo the affine functions, i.e., the set of equivalence classes
$$[f] = \{ g : g(x) = f(x) + a^T x + b,\ a \in \mathbb{R}^d,\ b \in \mathbb{R} \}$$
for $f \in H_1$. Obviously the norm associated with (15) is constant over the set $[f]$, and $\|f\| = 0$ if and only if $[f] = [0]$.

This space also has a kernel, although the analysis has to be different from what we have done previously, since the evaluation functional is not defined on $H_1$ (functions are only known up to the addition of an affine term). However, the following is true [19]. Let $U(r) = (1/8\pi)r^2\log r$ if the dimension $d$ is 2, and $U(r) = (1/16\pi)r^3$ for $d = 3$. Then, for all $u \in H_1$, there exist $a_u \in \mathbb{R}^d$ and $b_u \in \mathbb{R}$ such that
$$u(x) = \int_{\mathbb{R}^2} U(|x - y|)\,\Delta^2 u(y)\,dy + a_u^T x + b_u.$$
We will denote by $U(\cdot, x)$ the function $y \mapsto U(|x - y|)$. Fix, as in section 8.1, landmarks $x_1,\dots,x_N$ and scalar constraints $(c_1,\dots,c_N)$. Again, the constraint $h(x_i) = c_i$ has no meaning in $H_1$, but the constraint $\langle U(\cdot, x_i), h\rangle_H = c_i$ does, and means that there exist $a_h$ and $b_h$ such that $h(x_i) + a_h^T x_i + b_h = c_i$, i.e., $h$ satisfies the constraints up to an affine term.
Denote, as before, $S_{ij} = U(|x_i - x_j|)$. The function $h$ which is optimal under the constraints $\langle U(\cdot, x_i), h\rangle_H = c_i$, $i = 1,\dots,N$, must therefore take the form
$$h(x) = \sum_{i=1}^N \alpha_i U(|x - x_i|) + a^T x + b.$$
The corresponding interpolation problem is: minimize $\|h\|$ under the constraint that there exist $a, b$ such that $h(x_i) + a^T x_i + b = c_i$, $i = 1,\dots,N$. The inexact matching problem simply consists in minimizing
$$\|h\|_H^2 + \lambda\sum_{i=1}^N \big(h(x_i) + a^T x_i + b - c_i\big)^2$$
with respect to $h$, $a$ and $b$. Replacing $h$ by its expression in terms of the $\alpha$'s, $a$ and $b$ yields the finite dimensional problem: minimize
$$\alpha^T S\alpha$$
under the constraint $S\alpha + Q\gamma = c$, with $\gamma = (a_1,\dots,a_d,b)^T$ (of size $(d+1)\times 1$) and $Q$, of size $N\times(d+1)$, given by (letting $x_i = (x_i^1,\dots,x_i^d)$):
$$Q = \begin{pmatrix} x_1^1 & \cdots & x_1^d & 1\\ \vdots & & \vdots & \vdots\\ x_N^1 & \cdots & x_N^d & 1 \end{pmatrix}.$$
The optimal $(\alpha, \gamma)$ can be computed by setting the gradient to 0. One obtains
$$\hat\gamma = \big(Q^T S^{-1}Q\big)^{-1}Q^T S^{-1}c \qquad\text{and}\qquad \hat\alpha = S^{-1}(c - Q\hat\gamma).$$
For the inexact matching problem, one minimizes
$$\alpha^T S\alpha + \lambda\,(S\alpha + Q\gamma - c)^T(S\alpha + Q\gamma - c),$$
and the solution is provided by the same formulae, simply replacing $S$ by $S_\lambda = S + (1/\lambda)\mathrm{Id}$.

When the function $h$ (and the $c_i$) take $d$-dimensional values (e.g., correspond to displacements), the above computation has to be applied to each coordinate, which simply corresponds to using a diagonal operator, with each component equal to $\Delta^2$, in the definition of the inner product. This is equivalent to using the diagonal scalar kernel associated with $U$.
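The exact thin plate solve above can be sketched numerically in $d = 2$. The landmark positions and target values below are assumed examples; the formulas for $\hat\gamma$ and $\hat\alpha$ are the ones just derived.

```python
import numpy as np

def U(r):
    """Thin plate radial function U(r) = r^2 log(r) / (8 pi) for d = 2 (U(0) = 0)."""
    out = np.zeros_like(r)
    m = r > 0
    out[m] = r[m]**2 * np.log(r[m]) / (8 * np.pi)
    return out

# Assumed landmarks (in general position) and scalar targets.
pts = np.array([[0.0, 0.0], [2.0, 0.3], [0.7, 2.1], [1.8, 1.9], [1.0, 0.8]])
c = np.array([1.0, 0.0, 2.0, 1.0, 0.5])
N = len(pts)

r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
S = U(r)                                   # S_ij = U(|x_i - x_j|)
Q = np.hstack([pts, np.ones((N, 1))])      # rows (x_i^1, x_i^2, 1)

# gamma_hat = (Q^T S^{-1} Q)^{-1} Q^T S^{-1} c,  alpha_hat = S^{-1}(c - Q gamma_hat)
Sinv_c = np.linalg.solve(S, c)
Sinv_Q = np.linalg.solve(S, Q)
gamma_hat = np.linalg.solve(Q.T @ Sinv_Q, Q.T @ Sinv_c)   # affine part (a, b)
alpha_hat = np.linalg.solve(S, c - Q @ gamma_hat)         # spline coefficients
```

By construction the constraints $S\hat\alpha + Q\hat\gamma = c$ hold exactly, and the coefficients satisfy the side condition $Q^T\hat\alpha = 0$, which is what makes the interpolant well defined modulo affine maps.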
9.5. Kernels that are affine at infinity. We now consider the case of vector fields, and discuss how affine components can be combined with any kernel. We assume here that $\Omega = \mathbb{R}^d$. We would like to consider spaces $V$ that contain vector fields with an affine behavior at infinity. Note that the spaces $V$ that we have considered so far, obtained either by completion of $C^\infty$ compactly supported functions or by using kernels defined through Fourier transforms, only contain functions that vanish at infinity. We recall the definition of a function that vanishes at infinity:
Definition 4. A function $f : \Omega \to \mathbb{R}^k$ is said to vanish at infinity if and only if, for all $\varepsilon > 0$, there exists $A > 0$ such that $|f(x)| < \varepsilon$ whenever $x \in \Omega$ and $|x| > A$.
Here, we let $V$ be a Hilbert space of vector fields that vanish at infinity and define
$$V_{\mathrm{aff}} = \big\{ w : \exists\, w_0 \in V,\ A \in M_d(\mathbb{R}),\ b \in \mathbb{R}^d \text{ with } w(x) = w_0(x) + Ax + b \big\}.$$
We have the following important fact:

Proposition 10. If $V$ is a Hilbert space of continuous vector fields that vanish at infinity, then the decomposition $w(x) = w_0(x) + Ax + b$ for $w \in V_{\mathrm{aff}}$ is unique.
Proof. Using differences, it suffices to prove this for $w = 0$. So assume $w_0$, $A$ and $b$ are such that $w_0(x) + Ax + b = 0$ for all $x$. Then, for any fixed $x$ and $t > 0$, $w_0(tx) + tAx + b = 0$, so that $Ax = -(w_0(tx) + b)/t$. Since the right-hand side tends to 0 when $t$ tends to infinity, we get $Ax = 0$ for all $x$, so that $A = 0$. Now, for all $t, x$, we get $b = -w_0(tx)$, which (since $w_0$ vanishes at infinity) implies $b = 0$, and therefore also $w_0 = 0$. $\square$
So we can speak of the affine part of an element of $V_{\mathrm{aff}}$. Given the inner product on $V$, we can define
$$\langle w, \tilde w\rangle_{V_{\mathrm{aff}}} = \langle w_0, \tilde w_0\rangle_V + \langle A, \tilde A\rangle + \langle b, \tilde b\rangle,$$
where $\langle A, \tilde A\rangle$ is some inner product between matrices (for example $\mathrm{trace}(A^T\tilde A)$) and $\langle b, \tilde b\rangle$ some inner product between vectors (for example $b^T\tilde b$). With this product, $V_{\mathrm{aff}}$ is obviously a Hilbert space, and we want to compute its kernel $K_{\mathrm{aff}}$ as a function of the kernel $K$ of $V$ (assuming that $V$ is reproducing). Given $x \in \mathbb{R}^d$ and $a \in \mathbb{R}^d$, we need to express $a^T w(x)$ in the form $\langle K_{\mathrm{aff}}(\cdot, x)a, w\rangle_{V_{\mathrm{aff}}}$. Using the decomposition, we have
$$a^T w(x) = a^T w_0(x) + a^T Ax + a^T b = \langle K(\cdot, x)a, w_0\rangle_V + \langle (ax^T)^*, A\rangle + \langle a^*, b\rangle,$$
where, for a matrix $M$, we define $M^*$ by the identity $\langle M^*, A\rangle = \mathrm{trace}(M^T A)$ for all $A$, and, for a vector $z$, $z^*$ is such that $\langle z^*, b\rangle = z^T b$ for all $b$. From the definition of the extended inner product, we can identify
$$K_{\mathrm{aff}}(y, x)a = K(y, x)a + (ax^T)^* y + a^*.$$
In particular, when $\langle A, \tilde A\rangle = \lambda\,\mathrm{trace}(A^T\tilde A)$ and $\langle b, \tilde b\rangle = \mu\,b^T\tilde b$, we get
$$K_{\mathrm{aff}}(y, x)a = K(y, x)a + \frac{1}{\lambda}(ax^T)y + \frac{1}{\mu}a = \Big(K(y, x) + \big(x^T y/\lambda + 1/\mu\big)\mathrm{Id}_d\Big)a.$$
This provides an immediate extension of the spline interpolation of vector fields to include affine transformations, by just replacing $K$ by $K_{\mathrm{aff}}$. For example, exact interpolation with constraints $v(x_i) = c_i$ is obtained by letting
$$v(x) = \sum_{j=1}^N K_{\mathrm{aff}}(x, x_j)\alpha_j,$$
the vectors $\alpha_j$ being obtained by solving the system

(16) $\displaystyle \sum_{j=1}^N \big(K(x_i, x_j)\alpha_j + (x_i^T x_j/\lambda + 1/\mu)\alpha_j\big) = c_i, \qquad i = 1,\dots,N.$
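System (16) can be sketched as follows, taking the diagonal Gaussian kernel for $K$ and $\lambda = \mu = 1$ (both are assumed values for the illustration):

```python
import numpy as np

lam_a, mu_a, sigma = 1.0, 1.0, 1.0   # assumed penalties and kernel width

def K_aff(y, x):
    """Affine-augmented matrix kernel: K(y, x) + (x^T y / lam + 1/mu) * Id,
    with K the diagonal Gaussian kernel (an illustrative choice)."""
    d = len(x)
    k = np.exp(-np.sum((y - x)**2) / sigma**2)
    return (k + x @ y / lam_a + 1.0 / mu_a) * np.eye(d)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
c = np.array([[0.2, 0.0], [1.1, 0.1], [0.0, 1.2], [1.3, 1.1]])
N, d = pts.shape

# Assemble and solve the block system (16): sum_j K_aff(x_i, x_j) alpha_j = c_i.
S = np.zeros((N * d, N * d))
for i in range(N):
    for j in range(N):
        S[i*d:(i+1)*d, j*d:(j+1)*d] = K_aff(pts[i], pts[j])
alpha = np.linalg.solve(S, c.ravel()).reshape(N, d)

def v(x):
    return sum(K_aff(x, pts[j]) @ alpha[j] for j in range(N))
```

The block matrix is the sum of the positive definite Gaussian Gram matrix and the positive semidefinite Gram matrix of the features $(x_i, 1)$, so it is invertible and the interpolation constraints hold exactly.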
The case $\lambda = \mu \to 0$ is particularly interesting, since it corresponds to relaxing the penalty on the affine displacement, obtaining in this way an affine invariance similar to thin plates. In fact, the solution of (16) has a limit $v^*$ which is a solution of the affine invariant interpolation problem: minimize $\|v\|_V^2$ under the constraint that there exist $A$ and $b$ with $v(x_i) = c_i - Ax_i - b$ for $i = 1,\dots,N$.

Indeed, (16) provides the unique solutions $v^\lambda$, $A^\lambda$ and $b^\lambda$ of the problem: minimize, with respect to $v, A, b$,
$$\|v\|_V^2 + \lambda\big(\|A\|^2 + \|b\|^2\big)$$
under the constraints $v(x_i) = c_i - Ax_i - b$ for $i = 1,\dots,N$. Because $\|v^\lambda\|_V$ is always smaller than the norm of the optimal $v$ for the problem without affine component, we know that it is bounded. Thus, to prove that $v^\lambda$ converges to $v^*$, it suffices to prove that any weakly converging subsequence of $v^\lambda$ has $v^*$ as a limit. So take such a sequence, $v^{\lambda_n}$, and let $v^0$ be its weak limit. Because $v \mapsto v(x)$ is a continuous linear form, we have $v^{\lambda_n}(x_i) \to v^0(x_i)$ for all $i$. Because of this, and because there exists a solution $A^\lambda, b^\lambda$ of the overconstrained system $A^\lambda x_i + b^\lambda = c_i - v^\lambda(x_i)$, we can conclude that there exist $A^0$ and $b^0$ such that $A^0 x_i + b^0 = c_i - v^0(x_i)$ (because the condition for the existence of solutions passes to the limit). So $v^0$ satisfies the constraints of the limit problem. Moreover, for any $\lambda$, we have
$$\|v^*\|_V^2 + \lambda\big(\|A^*\|^2 + \|b^*\|^2\big) \ge \|v^\lambda\|_V^2 + \lambda\big(\|A^\lambda\|^2 + \|b^\lambda\|^2\big) \ge \|v^\lambda\|_V^2.$$
Since the liminf of the last term is larger than $\|v^0\|_V^2$, and the limit of the left-hand term is $\|v^*\|_V^2$, we can conclude that $v^0 = v^*$, since $v^*$ is the unique solution of the limit problem.

CHAPTER 2
Ordinary differential equations and groups of diffeomorphisms
1. Introduction

In this chapter, we review a few results from the theory of ordinary differential equations, which will provide a very efficient way to generate deformations. It may be time to specify what is meant by a deformation in this course. The definition is quite natural: we fix an open subset $\Omega$ of $\mathbb{R}^d$. A deformation is a function $\varphi$ which assigns to each point $x \in \Omega$ a displaced position $y = \varphi(x) \in \Omega$. There are two undesired behaviors that we would like to forbid:
• The deformation should not create holes: every point $y \in \Omega$ should be the image of some point $x \in \Omega$, i.e., $\varphi$ should be onto.
• Folds are also prohibited: two distinct points $x$ and $x'$ in $\Omega$ should not be sent to the same point $y \in \Omega$, i.e., $\varphi$ must be one to one.
Thus deformations must be bijections of $\Omega$. In addition, we require some smoothness for $\varphi$. The next definition introduces some standard vocabulary:
Definition 5. A homeomorphism of Ω is a continuous bijection ϕ :Ω → Ω such that its inverse, ϕ−1 is continuous. A diffeomorphism of Ω is a continuously differentiable homeomorphism ϕ :Ω → Ω such that ϕ−1 is continuously differentiable.
One can show, in fact, that a continuously differentiable homeomorphism is a diffeomorphism. From now on, most of the deformations we shall consider will be diffeomorphisms of some open set $\Omega \subset \mathbb{R}^d$.

If there is information (like grey level, color, or a shape boundary) carried at position $x$, it will be moved according to the deformation. We shall say that deformations act on structures carried by $\Omega$. Consider as an example an "image", i.e., a function $I : \Omega \to \mathbb{R}$, and a diffeomorphism $\varphi$ of $\Omega$. The deformation will create a new image $I'$ on $\Omega$ by letting $I'(y)$ be the value of $I$ at the position $x$ which moved to $y$ through the deformation, i.e., $I'(y) = I(\varphi^{-1}(y))$, or $I' = I \circ \varphi^{-1}$. This describes the action of diffeomorphisms on functions.

However, we shall not be directly concerned with the direct problem of computing the action of diffeomorphisms, but with the inverse problem of estimating the best diffeomorphism from the output of its action. For example, the image matching problem consists in finding an algorithm which, given two functions $I$ and $I'$ on $\Omega$, is able to recover a plausible diffeomorphism $\varphi$ such that $I' = I \circ \varphi^{-1}$. We therefore must face the problem of building diffeomorphisms. This is not an easy matter, because of the nonlinear character of the problem: linear combinations of diffeomorphisms are not necessarily diffeomorphisms.
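The action $I' = I \circ \varphi^{-1}$ can be sketched with simple one-dimensional affine diffeomorphisms, whose inverses are explicit. The particular functions below are assumed examples; the sketch also checks the group-action property $(\varphi\circ\psi)\cdot I = \varphi\cdot(\psi\cdot I)$.

```python
import numpy as np

def act(phi_inv, I):
    """Action of a diffeomorphism on an image: (phi . I)(y) = I(phi^{-1}(y))."""
    return lambda y: I(phi_inv(y))

I = lambda x: np.exp(-x**2)               # a sample "image" on R
phi_inv = lambda y: (y - 1.0) / 2.0        # inverse of phi(x) = 2x + 1
psi_inv = lambda y: 2.0 * (y + 1.0)        # inverse of psi(x) = x/2 - 1

I1 = act(phi_inv, I)                       # phi . I

# Group action property: (phi o psi)^{-1} = psi^{-1} o phi^{-1}, hence
# acting by phi o psi equals acting by psi first, then by phi.
comp_inv = lambda y: psi_inv(phi_inv(y))
I2a = act(comp_inv, I)                     # (phi o psi) . I
I2b = act(phi_inv, act(psi_inv, I))        # phi . (psi . I)
```

The contravariant composition of the inverses is exactly why the action is well behaved under composition of deformations.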
There is a direct, but limited, way of building diffeomorphisms: small perturbations of the identity. It is provided by the next proposition. We assume that Ω is bounded.
Proposition 11. Let u ∈ C1(Ω, Rd), and assume that:
(i) u(x) and Du(x) tend to 0 when x tends to infinity;
(ii) for some δ > 0, one has u(x) = 0 for any x ∈ Ω such that there exists y ∉ Ω with |x − y| < δ (this condition is void if Ω = Rd).
Then, for small enough ε, ϕ : x ↦ x + εu(x) is a diffeomorphism of Ω.

Proof. Indeed, ϕ is obviously continuously differentiable, and takes values in Ω as soon as ε‖u‖∞ < δ. Since Du is continuous and tends to 0 at infinity, it is bounded, and there exists a constant C such that |u(x) − u(x′)| ≤ C|x − x′|. If ϕ(x) = ϕ(x′), we have
$$|x - x'| = \varepsilon\, |u(x) - u(x')| \le C\varepsilon\, |x - x'|,$$
which implies x = x′ as soon as ε < 1/C, so that ϕ is one to one in this case.
Showing that ϕ is onto is a little harder. The first remark to be made is that, if B(y, δ), the ball centered at y with radius δ, is not included in Ω, then, by assumption, ϕ(y) = y, so that y ∈ ϕ(Ω). Thus assume that B(y, δ) ⊂ Ω, and consider the function ψ_y, defined on B(0, δ) by ψ_y(η) = −εu(y + η). If ε‖u‖∞ < δ, we have ψ_y(η) ∈ B(0, δ), and we have the inequality
$$|\psi_y(\eta) - \psi_y(\eta')| \le \varepsilon C\, |\eta - \eta'|.$$
If εC < 1, ψy is a contraction, and the fixed point theorem (stated below) implies that there exists η ∈ B(0, δ) such that ψy(η) = η. But in this case,
$$\varphi(y + \eta) = y + \eta + \varepsilon u(y + \eta) = y + \eta - \psi_y(\eta) = y,$$
so that y ∈ ϕ(Ω) and ϕ is onto.
It remains to prove that ϕ⁻¹ is continuous. Note that we can extend u to Rd by u = 0 on Ωᶜ, which extends ϕ (and ϕ⁻¹) to Rd as well. It therefore suffices to prove the continuity when Ω = Rd. So take y, y′ ∈ Rd and let η_y and η_{y′} be the fixed points of ψ_y and ψ_{y′} in B(0, δ). We have
$$|\varphi^{-1}(y) - \varphi^{-1}(y')| = |\eta_y + y - \eta_{y'} - y'| \le |\eta_y - \eta_{y'}| + |y - y'|,$$
so that it suffices to prove that η_y is close to η_{y′} when y is close to y′. We have
$$|\eta_y - \eta_{y'}| = |\psi_y(\eta_y) - \psi_{y'}(\eta_{y'})| = \varepsilon\, |u(y + \eta_y) - u(y' + \eta_{y'})| \le C\varepsilon\, (|y - y'| + |\eta_y - \eta_{y'}|),$$
so that, if Cε < 1,
$$|\eta_y - \eta_{y'}| \le \frac{C\varepsilon}{1 - C\varepsilon}\, |y - y'|,$$
which proves the continuity of ϕ⁻¹. □
We here recall the standard fixed point theorem, which has been used in the previous proof and will be used later:
Theorem 17 (Banach fixed point theorem). Let B be a Banach space, U a closed subset of B, and Φ a contraction of U, i.e., a map Φ : U → U such that there exists a constant c ∈ [0, 1[ such that, for all x, y ∈ U,
kΦ(x) − Φ(y)kB ≤ c kx − ykB .
Then, Φ has a unique fixed point in U, i.e., there exists a unique x0 ∈ U such that Φ(x0) = x0.
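The role of the contraction in the proof of Proposition 11 can be illustrated numerically: to invert ϕ = id + εu at a point y, one iterates the contraction x ↦ y − εu(x), whose unique fixed point is ϕ⁻¹(y). The displacement field u and the value of ε below are illustrative choices (not from the text) satisfying ε · Lip(u) < 1.

```python
import math

# An illustrative smooth, bounded displacement field u on R^2.
def u(x):
    w = math.exp(-(x[0] ** 2 + x[1] ** 2))
    return (0.5 * w, -0.3 * w)

EPS = 0.2  # small enough that EPS * Lip(u) < 1

def phi(x):
    # the small deformation phi = id + EPS * u
    ux = u(x)
    return (x[0] + EPS * ux[0], x[1] + EPS * ux[1])

def phi_inv(y, iters=60):
    # Banach fixed-point iteration for x = y - EPS*u(x): this map is a
    # contraction, so the iterates converge to the unique fixed point phi^{-1}(y).
    x = y
    for _ in range(iters):
        ux = u(x)
        x = (y[0] - EPS * ux[0], y[1] - EPS * ux[1])
    return x

p = (0.3, -0.1)
q = phi_inv(phi(p))  # recovers p
```

The same iteration, applied to ψ_y as in the proof, produces the preimage showing that ϕ is onto.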
We therefore know how to build small deformations. Of course, we cannot be satisfied with this, since they correspond to a very limited class of diffeomorphisms. However, we can use them to generate large deformations, because diffeomorphisms can be combined using composition: if ϕ and ϕ′ are diffeomorphisms, then ϕ ∘ ϕ′ is a diffeomorphism. In fact, the diffeomorphisms of Ω form a group for the composition of functions, often denoted Diff(Ω). Thus, let ε₀ > 0 and u₁, . . . , u_n, . . . be vector fields on Ω such that, for ε < ε₀, id + εu_i is a diffeomorphism of Ω. Consider ϕ_n = (id + εu_n) ∘ · · · ∘ (id + εu₁). We have
$$\varphi_{n+1} = (\mathrm{id} + \varepsilon u_{n+1}) \circ \varphi_n = \varphi_n + \varepsilon\, u_{n+1} \circ \varphi_n,$$
which can also be written (ϕ_{n+1} − ϕ_n)/ε = u_{n+1} ∘ ϕ_n. Fixing x ∈ Ω and letting x₀ = x, x_n = ϕ_n(x), we have the relation (x_{n+1} − x_n)/ε = u_{n+1}(x_n). This looks like a discretized version of a differential equation of the form (introducing a continuous time variable t)
$$\frac{dx_t}{dt} = u_t(x_t).$$
This motivates the rest of this chapter, which will be devoted to ordinary differential equations and how they generate flows of diffeomorphisms.
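The limit suggested by this discretization can be sketched numerically with the illustrative linear field u(x) = Ax, A being a rotation generator (this field is not one of the compactly supported fields above; it only illustrates the limiting behavior): composing n copies of id + (t/n)u approximates the flow at time t, here the rotation by angle t.

```python
import math

def step(x, eps):
    # one small deformation id + eps*u, with u(x) = (-x2, x1)
    return (x[0] - eps * x[1], x[1] + eps * x[0])

def compose(x, t, n):
    # phi_n = (id + eps*u) o ... o (id + eps*u), eps = t/n
    eps = t / n
    for _ in range(n):
        x = step(x, eps)
    return x

t = math.pi / 2  # the limiting flow is the rotation by angle t
x = compose((1.0, 0.0), t, 200000)
# x approaches (cos t, sin t) = (0, 1) as n grows
```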
2. A class of ordinary differential equations
2.1. Definitions. We let Ω ⊂ Rd be open. We denote by C¹₀(Ω, Rd) the Banach space of continuously differentiable vector fields v on Ω such that v and Dv vanish on ∂Ω and at infinity. Elements v ∈ C¹₀(Ω, Rd) can be considered as defined on all of Rd by setting v(x) = 0 if x ∉ Ω.
We define X¹(T, Ω) as the set of integrable functions from [0, T] to C¹₀(Ω, Rd). An element of X¹(T, Ω) is a time-dependent vector field (v_t, t ∈ [0, T]) such that, for each t, v_t ∈ C¹₀(Ω, Rd), and
$$\|v\|_{\mathcal{X}^1,T} := \int_0^T \|v_t\|_{1,\infty}\, dt < \infty.$$
For v ∈ X¹(T, Ω), we consider the ordinary differential equation formally denoted dy/dt = v_t(y). A function t ↦ y_t is called a solution of the equation on [0, τ] (τ ≤ T) with initial condition x if
• t ↦ y_t is continuous on [0, τ], with y₀ = x;
• for all t ≤ τ,
$$y_t = x + \int_0^t v_s(y_s)\, ds.$$

2.2. Main results.

2.2.1. Existence and uniqueness.

Theorem 18. Let v ∈ X¹(T, Ω). For all x ∈ Ω and s ∈ [0, T], there exists a unique solution on [0, T] of the ordinary differential equation dy/dt = v_t(y) with initial condition y_s = x. This solution is such that y_t ∈ Ω for all t ∈ [0, T].

Proof. The standard proof usually assumes uniform boundedness of ‖v_t‖_{1,∞} (or that v_t is Lipschitz in x, uniformly in t). We have an integral condition here instead, but the proof is very similar, and we provide it for completeness [25, 10].
We first prove the result for Ω = Rd. Fix x ∈ Rd, t ∈ [0, T] and δ > 0. Let I = I(t, δ) denote the interval [0, T] ∩ [t − δ, t + δ]. If ϕ is a continuous function from I to Rd such that ϕ(t) = x, we define the function Γ(ϕ) : I → Rd by
$$\Gamma(\varphi)(s) = x + \int_t^s v_u(\varphi(u))\, du.$$
The function s ↦ Γ(ϕ)(s) is continuous and such that Γ(ϕ)(t) = x. The set of continuous functions from the compact interval I to Rd, equipped with the supremum norm, is a Banach space, and we show that, for δ small enough, Γ satisfies
$$\|\Gamma(\varphi) - \Gamma(\varphi')\|_\infty \le \gamma\, \|\varphi - \varphi'\|_\infty$$
with γ < 1. The fixed point theorem then implies that there is a unique function ϕ such that Γ(ϕ) = ϕ, and this is precisely the definition of a solution of the ordinary differential equation on I.
Since Γ(ϕ)(s) − Γ(ϕ′)(s) = ∫_t^s (v_u(ϕ(u)) − v_u(ϕ′(u))) du, we have
$$\|\Gamma(\varphi) - \Gamma(\varphi')\|_\infty \le \|\varphi - \varphi'\|_\infty \int_I \|v_u\|_{1,\infty}\, du,$$
and ∫_I ‖v_u‖_{1,∞} du can be made arbitrarily small by reducing δ, so that existence and uniqueness over I are proved. We can make the additional remark that δ can be taken independent of t. This is because the function α : s ↦ ∫_0^s ‖v_u‖_{1,∞} du is continuous, hence uniformly continuous on [0, T], so that there exists η > 0 such that |s − s′| < η implies |α(s) − α(s′)| < 1/2, and it suffices to take δ < η/2. From this remark, we conclude that a unique solution of the ordinary differential equation exists over all of [0, T], because it is now possible, starting from the interval I(t, δ), to extend the solution on both sides, by jumps of at least δ/2, until the boundaries are reached. This proves the result for Ω = Rd, and we now consider an arbitrary open set Ω.
By extending v_t by 0 on Ωᶜ, the value of ‖v_t‖_{1,∞} remains unchanged, and we can apply the result over Rd to ensure existence and uniqueness of the solution with a given initial condition. It only remains to show that solutions which belong to Ω at some time belong to Ω at all times. This is true because, if there exists s such that y_s = x′ ∉ Ω, then the function ỹ_u = x′ for all u is a solution of the equation, since v_u(x′) = 0 for all u; uniqueness then implies ỹ = y at all times, which is impossible. □

Definition 6. Let v ∈ X¹(T, Ω). We denote by ϕ^v_{st}(x) the solution at time t of the equation dy/dt = v_t(y) with initial condition y_s = x.
The function (t, x) ↦ ϕ^v_{st}(x) is called the flow associated to v starting at s. It is defined on [0, T] × Ω.
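Definition 6 can be made concrete on a one-dimensional example where the flow is available in closed form. For the scalar equation dy/dt = t·y (an illustrative field, not compactly supported), the flow is ϕ^v_{st}(x) = x·exp((t² − s²)/2), and one can check numerically the relation ϕ^v_{st} = ϕ^v_{rt} ∘ ϕ^v_{sr} stated in Proposition 12 below, as well as ϕ^v_{ts} ∘ ϕ^v_{st} = id.

```python
import math

# Closed-form flow of the scalar ODE dy/dt = t*y (illustrative example):
# phi_st(x) = x * exp((t^2 - s^2)/2).
def phi(s, t, x):
    return x * math.exp((t * t - s * s) / 2.0)

s, r, t, x = 0.1, 0.4, 0.9, 2.0
lhs = phi(s, t, x)               # phi_st(x)
rhs = phi(r, t, phi(s, r, x))    # (phi_rt o phi_sr)(x), equals lhs
inv = phi(t, s, phi(s, t, x))    # (phi_ts o phi_st)(x), returns x
```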
From the definition, we have the following property.

Proposition 12. If v ∈ X¹(T, Ω) and s, r, t ∈ [0, T], then
$$\varphi^v_{st} = \varphi^v_{rt} \circ \varphi^v_{sr}.$$
In particular, ϕ^v_{st} ∘ ϕ^v_{ts} = id, and ϕ^v_{st} is invertible for all s and t.

Proof. If x ∈ Ω, ϕ^v_{st}(x) is the value at time t of the unique solution of dy/dt = v_t(y) which is equal to x at time s. This solution is equal to x′ = ϕ^v_{sr}(x) at time r, and thus also to ϕ^v_{rt}(x′) at time t, which is the statement of the proposition. □

2.2.2. Properties of the flow. The next theorem shows that flows really are interesting objects when diffeomorphisms are needed.
Theorem 19. Let v ∈ X¹(T, Ω). The associated flow ϕ^v_{st} is at all times a diffeomorphism of Ω.

The proof of this result depends on Gronwall’s lemma, which we first state and prove.
Theorem 20 (Gronwall’s lemma). Consider two positive functions α_s, u_s, defined for s ∈ I, where I is an interval in R containing 0. We assume that u is bounded and that, for some constant c and all t ∈ I,
$$(17)\qquad u_t \le c + \left| \int_0^t \alpha_s u_s\, ds \right|.$$
Then, for all t ∈ I,
$$u_t \le c\, e^{\left| \int_0^t \alpha_s\, ds \right|}.$$

Proof. To treat simultaneously the cases t > 0 and t < 0, we let ε = 1 in the first case and ε = −1 in the second. Inequality (17) now becomes
$$u_t \le c + \varepsilon \int_0^t \alpha_s u_s\, ds.$$
Iterating this inequality once yields
$$u_t \le c + c\varepsilon \int_0^t \alpha_s\, ds + \varepsilon^2 \int_0^t \int_0^{s_1} \alpha_{s_1} \alpha_{s_2} u_{s_2}\, ds_2\, ds_1,$$
and it may be checked by induction that, repeating this process,
$$u_t \le c + c\varepsilon \int_0^t \alpha_s\, ds + \cdots + c\,\varepsilon^n \int_{0\le s_1\le\cdots\le s_n\le t} \alpha_{s_1}\cdots\alpha_{s_n}\, ds_1\cdots ds_n + \varepsilon^{n+1} \int_{0\le s_1\le\cdots\le s_{n+1}\le t} \alpha_{s_1}\cdots\alpha_{s_{n+1}}\, u_{s_{n+1}}\, ds_1\cdots ds_{n+1}.$$
Consider the integral
$$I_n = \int_{0\le s_1\le\cdots\le s_n\le t} \alpha_{s_1}\cdots\alpha_{s_n}\, ds_1\cdots ds_n.$$
Let σ be a permutation of {1, . . . , n}: making the change of variables s_i → s_{σ_i} in I_n yields
$$I_n = \int_{0\le s_{\sigma_1}\le\cdots\le s_{\sigma_n}\le t} \alpha_{s_1}\cdots\alpha_{s_n}\, ds_1\cdots ds_n.$$
Obviously,
$$\sum_{\sigma} \mathbf{1}_{\{0\le s_{\sigma_1}\le\cdots\le s_{\sigma_n}\le t\}} = 1$$
whenever s_i ≠ s_j for i ≠ j. The n-tuples (s_1, . . . , s_n) for which s_i = s_j for some i ≠ j form a set of dimension n − 1, which has no influence on the integrals. Thus, summing over σ yields
$$n!\, I_n = \int_{[0,t]^n} \alpha_{s_1}\cdots\alpha_{s_n}\, ds_1\cdots ds_n = \left( \int_0^t \alpha_s\, ds \right)^n.$$
Therefore, using the fact that u is bounded, we have
$$u_t \le c \sum_{k=0}^n \frac{\varepsilon^k}{k!} \left( \int_0^t \alpha_s\, ds \right)^k + \frac{\varepsilon^{n+1} \sup(u)}{(n+1)!} \left( \int_0^t \alpha_s\, ds \right)^{n+1},$$
and letting n → ∞ yields the result. □

We now pass to the proof of Theorem 19, in which Gronwall’s lemma will be used several times.
Proof of Theorem 19. We first show that ϕ^v_{st} is a homeomorphism. Take x, y ∈ Ω. We have
$$|\varphi^v_{st}(x) - \varphi^v_{st}(y)| = \left| x - y + \int_s^t \left( v_r(\varphi^v_{sr}(x)) - v_r(\varphi^v_{sr}(y)) \right) dr \right| \le |x - y| + \int_s^t \|v_r\|_{1,\infty}\, |\varphi^v_{sr}(x) - \varphi^v_{sr}(y)|\, dr.$$
We apply Gronwall’s lemma with c = |x − y|, α_r = ‖v_r‖_{1,∞} and u_r = |ϕ^v_{sr}(x) − ϕ^v_{sr}(y)|, which is bounded since ϕ^v_{sr} is continuous in r. This yields
$$(18)\qquad |\varphi^v_{st}(x) - \varphi^v_{st}(y)| \le |x - y| \exp\left( \int_s^t \|v_r\|_{1,\infty}\, dr \right),$$
which shows that ϕ^v_{st} is continuous on Ω, and even Lipschitz, with a Lipschitz constant smaller than exp(‖v‖_{X¹,T}). Since (ϕ^v_{st})⁻¹ = ϕ^v_{ts}, the inverse is also continuous, so that ϕ^v_{st} is a homeomorphism of Ω.
We now prove that we indeed have a diffeomorphism. To see what we are aiming at, assume first that this is true, and formally differentiate, at ε = 0, the equation
$$\frac{\partial \varphi^v_{st}}{\partial t}(x + \varepsilon\delta) = v_t(\varphi^v_{st}(x + \varepsilon\delta))$$
to obtain
$$\frac{\partial}{\partial t}\, D\varphi^v_{st}(x).\delta = Dv_t(\varphi^v_{st}(x))\, D\varphi^v_{st}(x).\delta.$$
This suggests introducing the linear differential equation
$$\frac{\partial W_t}{\partial t} = Dv_t(\varphi^v_{st}(x))\, W_t$$
with initial condition W_s = δ. The same argument as for the original equation shows that this equation also admits a unique solution on [0, T]. The previous computation indicates that this solution should provide the differential Dϕ^v_{st}(x), and we now prove this. Define
$$a_\varepsilon(t) = \left( \varphi^v_{st}(x + \varepsilon\delta) - \varphi^v_{st}(x) \right)/\varepsilon - W_t.$$
We need to show that a_ε(t) → 0 when ε → 0. For α > 0, define
$$\mu_t(\alpha) = \max\left\{ |Dv_t(x) - Dv_t(y)| :\ x, y \in \Omega,\ |x - y| \le \alpha \right\}.$$
The function x ↦ Dv_t(x) is continuous and tends to 0 at infinity (or on ∂Ω), and is therefore uniformly continuous, which is equivalent to the fact that μ_t(α) → 0 when α → 0. We can write
$$a_\varepsilon(t) = \frac{1}{\varepsilon} \int_s^t \left( v_r(\varphi^v_{sr}(x + \varepsilon\delta)) - v_r(\varphi^v_{sr}(x)) \right) dr - \int_s^t Dv_r(\varphi^v_{sr}(x))\, W_r\, dr$$
$$= \int_s^t Dv_r(\varphi^v_{sr}(x))\, a_\varepsilon(r)\, dr + \frac{1}{\varepsilon} \int_s^t \left( v_r(\varphi^v_{sr}(x + \varepsilon\delta)) - v_r(\varphi^v_{sr}(x)) - Dv_r(\varphi^v_{sr}(x))\left( \varphi^v_{sr}(x + \varepsilon\delta) - \varphi^v_{sr}(x) \right) \right) dr.$$
We have, for all x, y ∈ Ω,
$$|v_t(y) - v_t(x) - Dv_t(x)(y - x)| \le \mu_t(|x - y|)\, |x - y|.$$
This inequality, combined with equation (18), yields
$$|a_\varepsilon(t)| \le \int_s^t \|v_r\|_{1,\infty}\, |a_\varepsilon(r)|\, dr + C(v)\, |\delta| \int_0^T \mu_r(\varepsilon C(v)|\delta|)\, dr$$
for a constant C(v) which only depends on v. To conclude the proof using Gronwall’s lemma, we need the fact that
$$\lim_{\alpha\to 0} \int_0^T \mu_r(\alpha)\, dr = 0.$$
This is a consequence of the fact that μ_r(α) → 0 for each r when α → 0, together with the upper bound μ_r(α) ≤ 2‖v_r‖_{1,∞}, which allows us to apply the Lebesgue dominated convergence theorem. □
We have incidentally shown the following important fact:
Proposition 13. Let v ∈ X¹(T, Ω). Then, for fixed x ∈ Ω, Dϕ^v_{st}(x) is the solution of the linear differential equation
$$(19)\qquad \frac{dW_t}{dt} = Dv_t(\varphi^v_{st}(x))\, W_t$$
with initial condition W_s = Id.

In fact, this result can be extended to p derivatives (we state this without proof, see [24]).
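Proposition 13 suggests a practical way to compute Dϕ^v_{st}(x): integrate the variational equation (19) alongside the flow itself. A one-dimensional sketch, using the illustrative autonomous field v(x) = sin x (which ignores the boundary/support hypotheses of the text) and an explicit Euler scheme; the Jacobian obtained this way matches a finite-difference approximation of the flow.

```python
import math

def flow_and_jacobian(x, t=1.0, n=20000):
    # Euler-integrate dy/dt = v(y) with v(y) = sin(y), together with the
    # variational equation dW/dt = Dv(y) W = cos(y) W, W(0) = 1 (equation (19)).
    dt = t / n
    y, w = x, 1.0
    for _ in range(n):
        y, w = y + dt * math.sin(y), w + dt * math.cos(y) * w
    return y, w

x, h = 0.7, 1e-5
y1, w = flow_and_jacobian(x)
y2, _ = flow_and_jacobian(x + h)
fd = (y2 - y1) / h  # finite-difference approximation of D phi(x), close to w
```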
Theorem 21. If p ≥ 1 and ∫_0^T ‖v_t‖_{p,∞} dt < ∞, then ϕ^v_{st} is p times differentiable, and for all q ≤ p,
$$(20)\qquad \frac{\partial}{\partial t}\, D^q \varphi^v_{st} = D^q (v_t \circ \varphi^v_{st}).$$
Moreover, there exist constants C, C′ (independent of v) such that
$$(21)\qquad \sup_{s\in[0,1]} \|\varphi^v_{st}\|_{p,\infty} \le C\, e^{C' \int_0^1 \|v_t\|_{p,\infty}\, dt}.$$
Expanding the q-th derivative of v_t ∘ ϕ^v_{0t} in the right-hand side of equation (20) (the general formula is rather cumbersome, but one can carry it out at least for small values of q) provides an expression with leading term (Dv_t ∘ ϕ^v_{0t}) D^q ϕ^v_{0t}, followed by lower-order terms in ϕ. This implies that D^q ϕ^v_{0t}(x) is the solution of a non-homogeneous linear differential equation with vanishing initial conditions.

2.2.3. Variations with respect to the vector field. It will be quite important, in the following, to be able to characterize the effect that a variation of the time-dependent vector field v has on the induced flow. For this purpose, we fix v ∈ X¹(T, Ω) and h ∈ X¹(T, Ω), and proceed to the computation of (d/dε)ϕ^{v+εh}_{st} at ε = 0. The argument is similar to the previous section: first, make a guess by formal differentiation of the original ODE, then proceed to a rigorous argument showing that the guessed result is correct. So, consider the equation
$$\frac{\partial \varphi^{v+\varepsilon h}_{st}}{\partial t} = v_t \circ \varphi^{v+\varepsilon h}_{st} + \varepsilon\, h_t \circ \varphi^{v+\varepsilon h}_{st}$$
and formally compute its derivative with respect to ε. This yields
$$\frac{\partial}{\partial t} \frac{\partial}{\partial \varepsilon} \varphi^{v+\varepsilon h}_{st} = h_t \circ \varphi^{v+\varepsilon h}_{st} + D(v_t + \varepsilon h_t) \circ \varphi^{v+\varepsilon h}_{st}\; \frac{d}{d\varepsilon} \varphi^{v+\varepsilon h}_{st},$$
and, letting ε = 0,
$$\frac{\partial}{\partial t}\, \frac{\partial}{\partial \varepsilon} \varphi^{v+\varepsilon h}_{st\,|\varepsilon=0} = h_t \circ \varphi^v_{st} + Dv_t \circ \varphi^v_{st}\; \frac{d}{d\varepsilon} \varphi^{v+\varepsilon h}_{st\,|\varepsilon=0},$$
which naturally leads to introducing the solution of the differential equation
$$(22)\qquad \frac{\partial W_t}{\partial t} = h_t \circ \varphi^v_{st} + Dv_t \circ \varphi^v_{st}\, W_t$$
with initial condition W_s = 0. We now set
$$a_\varepsilon(t) = \left( \varphi^{v+\varepsilon h}_{st}(x) - \varphi^v_{st}(x) \right)/\varepsilon - W_t$$
and express it in the form
$$a_\varepsilon(t) = \int_s^t Dv_u \circ \varphi^v_{su}\, a_\varepsilon(u)\, du + \int_s^t \left( h_u(\varphi^{v+\varepsilon h}_{su}(x)) - h_u(\varphi^v_{su}(x)) \right) du + \frac{1}{\varepsilon} \int_s^t \left( v_u(\varphi^{v+\varepsilon h}_{su}(x)) - v_u(\varphi^v_{su}(x)) - Dv_u(\varphi^v_{su}(x)) \left( \varphi^{v+\varepsilon h}_{su}(x) - \varphi^v_{su}(x) \right) \right) du.$$
The proof can then proceed exactly as in Theorem 19, once it has been shown that ϕ^{v+εh}_{su}(x) − ϕ^v_{su}(x) = O(ε), which is again a direct consequence of Gronwall’s lemma and of the inequality
$$\frac{1}{\varepsilon} \left| \varphi^{v+\varepsilon h}_{st}(x) - \varphi^v_{st}(x) \right| \le \int_s^t \|v_u\|_{1,\infty}\, \frac{1}{\varepsilon} \left| \varphi^{v+\varepsilon h}_{su}(x) - \varphi^v_{su}(x) \right| du + \int_s^t \|h_u\|_\infty\, du.$$
Equation (22) is the same as equation (19), with the additional term h_t ∘ ϕ^v_{st}. This implies that the solution of (22) may be expressed in terms of the solution of (19) by variation of the constant, i.e., it may be written in the form W_t = Dϕ^v_{st}(x) A_t with A_s = 0, and A_t may be identified by writing
$$h_t \circ \varphi^v_{st} + Dv_t \circ \varphi^v_{st}\, W_t = \frac{dW_t}{dt} = D\varphi^v_{st}(x)\, \frac{dA_t}{dt} + Dv_t \circ \varphi^v_{st}\, W_t,$$
so that
$$\frac{dA_t}{dt} = (D\varphi^v_{st}(x))^{-1}\, h_t \circ \varphi^v_{st}(x) = D\varphi^v_{ts}(\varphi^v_{st}(x))\, h_t \circ \varphi^v_{st}(x).$$
This implies that
$$A_t = \int_s^t D\varphi^v_{us}(\varphi^v_{su}(x))\, h_u \circ \varphi^v_{su}(x)\, du$$
and
$$W_t = D\varphi^v_{st}(x) \int_s^t D\varphi^v_{us}(\varphi^v_{su}(x))\, h_u \circ \varphi^v_{su}(x)\, du,$$
which, by the chain rule, can be written
$$W_t = \int_s^t D\varphi^v_{ut}(\varphi^v_{su}(x))\, h_u \circ \varphi^v_{su}(x)\, du.$$
We summarize this discussion in the following theorem.

Theorem 22. Let v, h ∈ X¹(T, Ω). Then, for x ∈ Ω,
$$(23)\qquad \frac{d}{d\varepsilon} \varphi^{v+\varepsilon h}_{st}(x)_{|\varepsilon=0} = \int_s^t D\varphi^v_{ut}(\varphi^v_{su}(x))\, h_u \circ \varphi^v_{su}(x)\, du.$$
A direct consequence of this theorem is that ϕ^v_{st} depends continuously on v.
But one can be more specific: we have the inequalities
$$|\varphi^{v'}_{st}(x) - \varphi^v_{st}(x)| \le \int_s^t \left| v'_u(\varphi^{v'}_{su}(x)) - v_u(\varphi^v_{su}(x)) \right| du$$
$$\le \left| \int_s^t \left( v'_u(\varphi^v_{su}(x)) - v_u(\varphi^v_{su}(x)) \right) du \right| + \int_s^t \left| v'_u(\varphi^{v'}_{su}(x)) - v'_u(\varphi^v_{su}(x)) \right| du$$
$$\le \left| \int_s^t \left( v'_u(\varphi^v_{su}(x)) - v_u(\varphi^v_{su}(x)) \right) du \right| + \int_s^t \|v'_u\|_{1,\infty}\, |\varphi^{v'}_{su}(x) - \varphi^v_{su}(x)|\, du.$$
We shall apply to this inequality a more general version of Theorem 20. Its proof is similar to that of Theorem 20, and we omit it here.

Theorem 23. Consider three positive functions c_s, α_s, u_s defined on [0, T], such that u is bounded and
$$(24)\qquad u_t \le c_t + \int_0^t \alpha_s u_s\, ds.$$
Then
$$u_t \le c_t + \int_0^t c_s \alpha_s\, e^{\int_s^t \alpha_r\, dr}\, ds.$$
This theorem can be applied to our case by letting u_τ = |ϕ^{v′}_{s,s+τ}(x) − ϕ^v_{s,s+τ}(x)|, α_τ = ‖v′_{s+τ}‖_{1,∞} and
$$c_\tau = \left| \int_s^{s+\tau} \left( v_u(\varphi^v_{su}(x)) - v'_u(\varphi^v_{su}(x)) \right) du \right|,$$
yielding the inequality
$$(25)\qquad |\varphi^{v'}_{st}(x) - \varphi^v_{st}(x)| \le c_{t-s} + \int_s^t c_{u-s}\, \|v'_u\|_{1,\infty} \exp\left( \int_u^t \|v'_{u'}\|_{1,\infty}\, du' \right) du.$$
Consider the linear form v′ ↦ ∫_s^t v′_u ∘ ϕ^v_{su}(x) du on X¹(T, Ω). It is obviously continuous, since
$$\left| \int_s^t v'_u \circ \varphi^v_{su}(x)\, du \right| \le \int_s^t \|v'_u\|_\infty\, du \le \|v'\|_{\mathcal{X}^1,T}.$$
Thus, consider a sequence vⁿ ∈ X¹(T, Ω) which is bounded in X¹(T, Ω) and converges weakly to v. If we let cⁿ_τ be equal to c_τ with vⁿ in place of v′, weak convergence implies that cⁿ_τ → 0 when n tends to infinity. We have cⁿ_τ ≤ ‖vⁿ‖_{X¹,T} + ‖v‖_{X¹,T}, so that cⁿ is uniformly bounded, and the dominated convergence theorem can be applied to equation (25) to obtain that, for all s, t ∈ [0, T] and all x ∈ Ω,
$$\varphi^{v^n}_{st}(x) \to \varphi^v_{st}(x).$$
When Ω is bounded, the convergence is in fact uniform in x. Indeed, we have shown that the Lipschitz constant of ϕ^{vⁿ}_{st} can be expressed in terms of ‖vⁿ‖_{X¹,T}, so that there exists a constant C such that, for all n > 0 and x, y ∈ Ω,
$$|\varphi^{v^n}_{st}(x) - \varphi^{v^n}_{st}(y)| \le C\, |x - y|.$$
This implies that the family (ϕ^{vⁿ}_{st}) is equicontinuous, and a similar argument shows that it is bounded, so that the Ascoli theorem shows that (ϕ^{vⁿ}_{st}, n ≥ 0) is relatively compact for uniform convergence. But the limit of any uniformly converging subsequence must be ϕ^v_{st}, since it is already the pointwise limit of the whole sequence. This implies that the uniform limit exists and is equal to ϕ^v_{st}. If Ω is not bounded, the same argument implies that the limit is uniform on compact sets. Thus, we have just proved the following theorem.

Theorem 24. If v ∈ X¹(T, Ω) and (vⁿ) is a bounded sequence in X¹(T, Ω) which converges weakly to v, then, for all s, t ∈ [0, T] and every compact subset Q ⊂ Ω,
$$\lim_{n\to\infty} \max_{x\in Q} |\varphi^{v^n}_{st}(x) - \varphi^v_{st}(x)| = 0.$$
3. Groups of diffeomorphisms

3.1. Admissible Banach spaces.

Definition 7. A Banach space V ⊂ C¹₀(Ω, Rd) is admissible if it is (canonically) embedded in C¹₀(Ω, Rd), i.e., there exists a constant C such that, for all v ∈ V,
$$(26)\qquad \|v\|_V \ge C\, \|v\|_{1,\infty}.$$
If V is admissible, we denote by X¹_V(Ω) the set of time-dependent vector fields (v_t, t ∈ [0, 1]) such that, for each t, v_t ∈ V, and
$$\|v\|_{\mathcal{X}^1_V} := \int_0^1 \|v_t\|_V\, dt < \infty.$$
If the interval [0, 1] is replaced by [0, T], we will use the notation X¹_V(T, Ω) and ‖v‖_{X¹_V,T}.

3.2. Induced group of diffeomorphisms.

Definition 8. If V ⊂ C¹₀(Ω, Rd) is admissible, we denote by
$$G_V = \left\{ \varphi^v_{01},\ v \in \mathcal{X}^1_V(\Omega) \right\}$$
the set of diffeomorphisms provided by the flows associated to elements v ∈ X¹_V(Ω) at time 1.
Theorem 25. G_V is a group for the composition of functions.

Proof. The identity function belongs to G_V: it corresponds, for example, to ϕ^v_{01} with v = 0. If ψ = ϕ^v_{01} and ψ′ = ϕ^{v′}_{01}, with v, v′ ∈ X¹_V(Ω), then ψ ∘ ψ′ = ϕ^w_{01} with
$$w_t = 2 v'_{2t} \ \text{ for } t \in [0, 1/2] \qquad\text{and}\qquad w_t = 2 v_{2t-1} \ \text{ for } t \in\, ]1/2, 1]$$
(details are left to the reader), and w belongs to X¹_V(Ω). Similarly, if ψ = ϕ^v_{01}, then ψ⁻¹ = ϕ^w_{01} with w_t = −v_{1−t}. Indeed, we have
$$\varphi^w_{0,1-t}(y) = y - \int_0^{1-t} v_{1-s} \circ \varphi^w_{0s}(y)\, ds = y + \int_1^t v_s \circ \varphi^w_{0,1-s}(y)\, ds,$$
which implies (by the uniqueness theorem) that ϕ^w_{0,1−t}(y) = ϕ^v_{1t}(y), and in particular ϕ^w_{01} = ϕ^v_{10}. This proves that G_V is a group. □

Thus, by selecting a Banach space V, we can in turn specify a group of diffeomorphisms. In particular, elements of G_V inherit the smoothness properties of V, by Theorem 21.
3.3. A distance on G_V. Let V be an admissible Banach space. For ψ and ψ′ in G_V, we let
$$(27)\qquad d_V(\psi, \psi') = \inf_{v\in\mathcal{X}^1_V(\Omega)} \left\{ \|v\|_{\mathcal{X}^1_V} :\ \psi' = \psi \circ \varphi^v_{01} \right\}.$$
We have the following theorem.

Theorem 26 (Trouvé). The function d_V is a distance on G_V, and (G_V, d_V) is a complete metric space.

Recall that d_V is a distance if it is symmetric, satisfies the triangle inequality d_V(ψ, ψ″) ≤ d_V(ψ, ψ′) + d_V(ψ′, ψ″), and is such that d_V(ψ, ψ′) = 0 if and only if ψ = ψ′.

Proof. Note that the set over which the infimum is computed is not empty: if ψ, ψ′ ∈ G_V, then ψ⁻¹ ∘ ψ′ ∈ G_V (since G_V is a group) and therefore can be written in the form ϕ^v_{01} for some v ∈ X¹_V(Ω).
Let us start with symmetry: fix ε > 0 and v such that ‖v‖_{X¹_V} ≤ d_V(ψ, ψ′) + ε and ψ′ = ψ ∘ ϕ^v_{01}. This implies that ψ = ψ′ ∘ ϕ^v_{10}, but we know (from the proof of Theorem 25) that ϕ^v_{10} = ϕ^w_{01} with w_t = −v_{1−t}. Since ‖w‖_{X¹_V} = ‖v‖_{X¹_V}, we have, from the definition of d_V,
$$d_V(\psi', \psi) \le \|w\|_{\mathcal{X}^1_V} \le d_V(\psi, \psi') + \varepsilon,$$
and since this is true for every ε, we have d_V(ψ′, ψ) ≤ d_V(ψ, ψ′). Exchanging the roles of ψ and ψ′ yields d_V(ψ′, ψ) = d_V(ψ, ψ′).
For the triangle inequality, let v and v′ be such that ‖v‖_{X¹_V} ≤ d_V(ψ, ψ′) + ε, ‖v′‖_{X¹_V} ≤ d_V(ψ′, ψ″) + ε, ψ′ = ψ ∘ ϕ^v_{01} and ψ″ = ψ′ ∘ ϕ^{v′}_{01}. We thus have ψ″ = ψ ∘ ϕ^v_{01} ∘ ϕ^{v′}_{01}, and we know, still from the proof of Theorem 25, that ϕ^v_{01} ∘ ϕ^{v′}_{01} = ϕ^w_{01} with w_t = 2v′_{2t} for t ∈ [0, 1/2] and w_t = 2v_{2t−1} for t ∈ ]1/2, 1]. But, in this case, ‖w‖_{X¹_V} = ‖v‖_{X¹_V} + ‖v′‖_{X¹_V}, so that
$$d_V(\psi, \psi'') \le \|w\|_{\mathcal{X}^1_V} \le d_V(\psi, \psi') + d_V(\psi', \psi'') + 2\varepsilon,$$
which implies the triangle inequality, since this is true for every ε > 0.
We obviously have d_V(ψ, ψ) = 0, since ϕ^0_{01} = id. Assume that d_V(ψ, ψ′) = 0. This implies that there exists a sequence (vⁿ) such that ‖vⁿ‖_{X¹_V} → 0 and ψ⁻¹ ∘ ψ′ = ϕ^{vⁿ}_{01}. The continuity of v ↦ ϕ^v_{st} implies that ϕ^{vⁿ}_{01} → ϕ^0_{01} = id, so that ψ = ψ′.
Let us now check that we indeed have a complete metric space. Let (ψⁿ) be a Cauchy sequence for d_V: for any ε > 0, there exists n₀ such that, for any n ≥ n₀, d_V(ψⁿ, ψ^{n₀}) ≤ ε. Taking recursively ε = 2⁻ⁿ, it is possible to extract a subsequence (ψ^{n_k}) of (ψⁿ) such that
$$\sum_{k=0}^\infty d_V(\psi^{n_k}, \psi^{n_{k+1}}) < \infty.$$
Since a Cauchy sequence converges whenever one of its subsequences does, it is sufficient to show that (ψ^{n_k}) has a limit. From the definition of d_V, there exists, for every k ≥ 0, an element v^k of X¹_V(Ω) such that ψ^{n_{k+1}} = ϕ^{v^k}_{01} ∘ ψ^{n_k} and
$$\|v^k\|_{\mathcal{X}^1_V} \le d_V(\psi^{n_k}, \psi^{n_{k+1}}) + 2^{-k-1}.$$
Let us define a time-dependent vector field v by v_t = 2v⁰_{2t} for t ∈ [0, 1/2[, v_t = 4v¹_{4t−2} for t ∈ [1/2, 3/4[, and so on: to define the general term, introduce the dyadic sequence of times t₀ = 0, t_{k+1} = t_k + 2^{−k−1}, and let
$$v_t = 2^{k+1}\, v^k_{2^{k+1}(t - t_k)}$$
for t ∈ [t_k, t_{k+1}[. Since t_k tends to 1 when k → ∞, this defines v_t on [0, 1[, and we set v₁ = 0. We have
$$\|v\|_{\mathcal{X}^1_V} = \sum_{k=0}^\infty \int_{t_k}^{t_{k+1}} 2^{k+1} \left\| v^k_{2^{k+1}(t - t_k)} \right\|_V dt = \sum_{k=0}^\infty \int_0^1 \|v^k_t\|_V\, dt \le 1 + \sum_{k=0}^\infty d_V(\psi^{n_k}, \psi^{n_{k+1}}),$$
so that v ∈ X¹_V(Ω). Now, consider the associated flow ϕ^v_{0t}: it is obtained by first integrating 2v⁰_{2t} over [0, 1/2[, which yields ϕ^v_{0,1/2} = ϕ^{v⁰}_{01}. Iterating this argument, we get
$$\varphi^v_{0 t_{k+1}} = \varphi^{v^k}_{01} \circ \cdots \circ \varphi^{v^0}_{01},$$
so that
$$\psi^{n_{k+1}} = \varphi^v_{0 t_{k+1}} \circ \psi^{n_0}.$$
Let ψ^∞ = ϕ^v_{01} ∘ ψ^{n₀}. We have ψ^∞ = ϕ^v_{t_k 1} ∘ ψ^{n_k}. Since ϕ^v_{t_k 1} = ϕ^w_{01} with w_t = (1 − t_k)\, v_{t_k + (1 − t_k)t}, and
$$\|w\|_{\mathcal{X}^1_V} = \int_{t_k}^1 \|v_t\|_V\, dt,$$
we obtain that d_V(ψ^{n_k}, ψ^∞) → 0, which completes the proof of Theorem 26. □
3.4. Properties of the distance. We first introduce the set of square-integrable (in time) time-dependent vector fields.

Definition 9. Let V be an admissible Banach space. We define X²_V(Ω) as the set of time-dependent vector fields v = (v_t, t ∈ [0, 1]) such that, for each t, v_t ∈ V, and
$$\int_0^1 \|v_t\|_V^2\, dt < \infty.$$
We state without proof the following important result.

Proposition 14. X²_V(Ω) is a Banach space with norm
$$\|v\|_{\mathcal{X}^2_V} = \left( \int_0^1 \|v_t\|_V^2\, dt \right)^{1/2}.$$
Moreover, if V is a Hilbert space, then X²_V(Ω) is a Hilbert space too, with inner product
$$\langle v\,,\, w\rangle_{\mathcal{X}^2_V} = \int_0^1 \langle v_t\,,\, w_t\rangle_V\, dt.$$
Because (∫₀¹ ‖v_t‖_V dt)² ≤ ∫₀¹ ‖v_t‖²_V dt, we have X²_V(Ω) ⊂ X¹_V(Ω), and if v ∈ X²_V(Ω), then ‖v‖_{X¹_V} ≤ ‖v‖_{X²_V}. The computation of d_V can be reduced to a minimization over X²_V(Ω) by the following theorem.

Theorem 27. If V is admissible and ψ, ψ′ ∈ G_V, we have
$$(28)\qquad d_V(\psi, \psi') = \inf_{v\in\mathcal{X}^2_V(\Omega)} \left\{ \|v\|_{\mathcal{X}^2_V} :\ \psi' = \psi \circ \varphi^v_{01} \right\}.$$

Proof. Let
$$\delta_V(\psi, \psi') = \inf_{v\in\mathcal{X}^2_V(\Omega)} \left\{ \|v\|_{\mathcal{X}^2_V} :\ \psi' = \psi \circ \varphi^v_{01} \right\}$$
and let d_V be given by (27). Since d_V is the infimum, over a larger set than the one defining δ_V, of a quantity which is always smaller, we have d_V(ψ, ψ′) ≤ δ_V(ψ, ψ′), and we now proceed to the reverse inequality. For this, consider v ∈ X¹_V(Ω) such that ψ′ = ψ ∘ ϕ^v_{01}. It suffices to prove that, for any ε > 0, there exists w ∈ X²_V(Ω) such that ψ′ = ψ ∘ ϕ^w_{01} and ‖w‖_{X²_V} ≤ ‖v‖_{X¹_V} + ε. The important remark for this purpose is that, if α is a differentiable increasing function from [0, 1] onto [0, 1] (which implies α₀ = 0 and α₁ = 1), then, letting α̇_s = dα_s/ds,
$$\varphi^v_{0\alpha_t}(x) = x + \int_0^{\alpha_t} v_u(\varphi^v_{0u}(x))\, du = x + \int_0^t \dot\alpha_s\, v_{\alpha_s}(\varphi^v_{0\alpha_s}(x))\, ds,$$
so that the flow generated by w_s = α̇_s v_{α_s} is ϕ^v_{0α_t}, and therefore coincides with ϕ^v_{01} at t = 1. We have
$$\|w\|_{\mathcal{X}^1_V} = \int_0^1 \dot\alpha_s\, \|v_{\alpha_s}\|_V\, ds = \int_0^1 \|v_t\|_V\, dt = \|v\|_{\mathcal{X}^1_V},$$
so that this time change does not help for the minimization in (27). However, denoting by β_t the inverse of α_t, we have
$$\|w\|_{\mathcal{X}^2_V}^2 = \int_0^1 \dot\alpha_s^2\, \|v_{\alpha_s}\|_V^2\, ds = \int_0^1 \dot\alpha_{\beta_t}\, \|v_t\|_V^2\, dt,$$
so that this transformation can be used to reduce ‖v‖_{X²_V}. If ‖v_t‖_V never vanishes, we can choose α̇_{β_t} = c/‖v_t‖_V, or equivalently β̇_t = ‖v_t‖_V/c, with c chosen so that
$$\int_0^1 \dot\beta_t\, dt = 1,$$
which yields c = ‖v‖_{X¹_V}. This gives
$$\|w\|_{\mathcal{X}^2_V}^2 = c \int_0^1 \|v_t\|_V\, dt = \|v\|_{\mathcal{X}^1_V}^2,$$
which is exactly the kind of transform we are looking for. In the general case, we let, for some η > 0, α̇_{β_t} = c/(η + ‖v_t‖_V), which yields β̇_t = (η + ‖v_t‖_V)/c and c = η + ‖v‖_{X¹_V}. This gives
$$\|w\|_{\mathcal{X}^2_V}^2 = c \int_0^1 \frac{\|v_t\|_V^2}{\eta + \|v_t\|_V}\, dt \le c\, \|v\|_{\mathcal{X}^1_V} = (\|v\|_{\mathcal{X}^1_V} + \eta)\, \|v\|_{\mathcal{X}^1_V}.$$
By choosing η small enough, we can always ensure that ‖w‖_{X²_V} ≤ ‖v‖_{X¹_V} + ε, which is what we wanted to prove. □

An important consequence of this result is the following fact.

Corollary 2. If the infimum in (28) is attained at some v ∈ X²_V(Ω), then ‖v_t‖_V is constant (in t).

Indeed, let v achieve the minimum in (28): we have
$$d_V(\psi, \psi') = \|v\|_{\mathcal{X}^2_V} \ge \|v\|_{\mathcal{X}^1_V},$$
but ‖v‖_{X¹_V} ≥ d_V(ψ, ψ′) by definition. Thus, we must have ‖v‖_{X²_V} = ‖v‖_{X¹_V}, which corresponds to the equality case in the Cauchy–Schwarz inequality, and this can only be achieved when t ↦ ‖v_t‖_V is (almost everywhere) constant.

Corollary 2 is usefully completed by the following theorem.

Theorem 28. If V is Hilbert and admissible, and ψ, ψ′ ∈ G_V, there exists v ∈ X²_V(Ω) such that
$$d_V(\psi, \psi') = \|v\|_{\mathcal{X}^2_V}$$
and ψ′ = ϕ^v_{01} ∘ ψ.

Proof. By Proposition 14, X²_V(Ω) is a Hilbert space. Let us fix a minimizing sequence for d_V(ψ, ψ′), i.e., a sequence vⁿ ∈ X²_V(Ω) such that ‖vⁿ‖_{X²_V} → d_V(ψ, ψ′) and ψ′ = ϕ^{vⁿ}_{01} ∘ ψ. This implies that (‖vⁿ‖_{X²_V}) is bounded, and, by Theorem 7, one can extract a subsequence of (vⁿ) (still denoted (vⁿ)) which converges weakly to some v ∈ X²_V(Ω), with
$$\|v\|_{\mathcal{X}^2_V} \le \liminf_n \|v^n\|_{\mathcal{X}^2_V} = d_V(\psi, \psi').$$
But Theorem 24 implies that ϕ^{vⁿ} converges to ϕ^v, so that ψ′ = ϕ^v_{01} ∘ ψ remains true. This proves Theorem 28. □

CHAPTER 3
Matching deformable objects
1. General principles

The methods we shall investigate in this chapter use variational formulations to find optimal matchings between two different structures. They all obey a very simple general form. Let Ω be an open subset of Rd and G a group of diffeomorphisms on Ω. Consider a set I of structures of interest, on which G acts: for every I ∈ I and every ϕ ∈ G, the result of the action of ϕ on I is denoted ϕ.I and is a new element of I. Natural conditions require that id.I = I and ϕ.(ψ.I) = (ϕ ∘ ψ).I (we will discuss group actions more rigorously in the next chapter). Then, given two elements I and I′ in I, the optimal matching ϕ between I and I′ is computed by minimizing a functional E_{I,I′}(ϕ) which measures how adequate ϕ is as a matching function between I and I′. Most of the time, E will be of the form
$$(29)\qquad E_{I,I'}(\varphi) = \delta(\varphi) + D(\varphi.I, I'),$$
where δ is a function which penalizes unwanted behavior, typically ensuring that ϕ is smooth and not too far from the identity, and D measures the difference between ϕ.I and I′.
Before reviewing examples and algorithms in this framework, we may list some properties that we would like to see satisfied. An obvious one is that, if I = I′, then the preferred ϕ should be the identity. If E is given by (29), this will be true when δ is minimal at the identity and D(I, I′) is minimal when I = I′. Another important property is consistency: can one find a minimizer for E_{I,I′} in G (or maybe in some larger space than G on which E_{I,I′} can still be defined)? Obviously, the problem would otherwise be meaningless. Besides consistency is feasibility: can one find an algorithm which is able to solve the problem? Is the algorithm stable enough? Symmetry may also be a requirement: if ϕ is found to provide an optimal correspondence from I to I′, it is natural that ϕ⁻¹ should have the same property from I′ to I.
2. Matching terms and greedy descent procedures 2.1. Eulerian gradients. We consider energies of the form (29). For such energies, the matching term largely depends on the problem which is considered. Some of them are reviewed in this section. There are two important issues which will be addressed in each case. The first one is the continuity of the matching function. Assume that I and I0 are given: what assumptions must be made on 0 0 a converging sequence ϕn → ϕ to ensure that D(ϕn.I, I ) → D(ϕ.I, I )? Often, pointwise convergence will be enough (ϕn(x) → ϕ(x) for all x), but sometimes,
49 50 3. MATCHING DEFORMABLE OBJECTS some additional requirements will be necessary (like assumptions on the derivatives of ϕn). The second important issue concerns the differentiability of the matching term. We will denote the differential of a function U defined on G by δU d (30) (ϕ)| h = U(ϕ + εh) . δϕ dε |ε=0 As indicated by its definition, δU/δϕ is a linear form acting of variations of diffeo- morphisms (i.e., on vector fields). This linear form does not need to be a function. For example, if U(ϕ) = |ϕ(x)|2 for some fixed x ∈ Ω, then δU (ϕ)| h = 2ϕ(x)T h(x), δϕ so that δU/δϕ = 2ϕ(x) ⊗ δx. In the following, it will be useful to also consider other forms of variations than ϕ 7→ ϕ+εh, and introduce what can be called an Eulerian gradient for the function U. This can be defined as follows. Let V be a Hilbert space included in L2(Ω, Rd) 1 such that GV ⊂ G. This implies that, if U is a function defined on G, for v ∈ XV (Ω) v and ϕ ∈ G, the function t 7→ U(ϕ0t ◦ ϕ) is well-defined. Definition 10. We say that U has a Eulerian gradient with respect to V if V and only if, for all ϕ ∈ G, there exists an element denoted ∇ U(ϕ) ∈ V such that
d v D V E (31) U(ϕ0t ◦ ϕ)|t=0 = ∇ U(ϕ) , v(0) dt V 1 in which we identify v with the element vt ≡ v ∈ XV (Ω). We will also call the Eulerian differential the linear form defined by D V E ∂¯V U(ϕ)| v = ∇ U(ϕ) , v V V i.e., ∂¯V U(ϕ) = K−1∇ U(ϕ) where K is the kernel of V . The Eulerian gradient is directly related to the differential via the relation, holding for v ∈ V :
D V E δU (32) ∇ U(ϕ) , v = ∂¯V U(ϕ)| v = (ϕ)| v ◦ ϕ . V δϕ Moreover, by definition of the kernel operator, K, on V , we have V (33) ∇ U(ϕ) = K(∂¯V U(ϕ)). The interest of this notion is to define the “gradient descent” process which generates a time dependent element of G by setting
(34)  dϕ_t(x)/dt = −∇^V U(ϕ_t) ∘ ϕ_t(x).

As long as ∫_0^t ‖∇^V U(ϕ_s)‖_V ds is finite, this generates a time-dependent element of G_V. This is therefore an evolution within the group of diffeomorphisms, an important property. Under some assumptions on the continuity of ϕ ↦ U(ϕ) and ϕ ↦ ∇^V U(ϕ), it can be shown that

dU(ϕ_t)/dt = −‖∇^V U(ϕ_t)‖_V²,

so that U(ϕ_t) decreases with time. We now discuss different versions of (34) adapted to different matching problems.

There is another interesting relation between δU/δϕ and the Eulerian differential in the following particular case. Assume that U takes the form U(ϕ) = Z(ϕ.I) for some function Z, where I is a fixed deformable object and I ↦ ϕ.I possesses the properties we have discussed before, with in particular (ϕ ∘ ψ).I = ϕ.(ψ.I). Then we can write

∂̄U(ϕ)| h = (d/dε) Z(((id + εh) ∘ ϕ).I)|_{ε=0}
          = (d/dε) Z((id + εh).(ϕ.I))|_{ε=0}
          = (δU_ϕ/δψ)(id)| h,

where U_ϕ(ψ) := Z(ψ.(ϕ.I)). From a practical point of view, the computation of ∂̄U(ϕ) can thus be done by first computing δZ(ψ.J)/δψ at the identity for an arbitrary J, and then taking J = ϕ.I in the formula.
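Relation (33) — passing from the Eulerian differential (a sum of vector-valued Diracs, in the discrete examples below) to the Eulerian gradient by applying the kernel of V — can be sketched numerically as follows. This is a minimal illustration, not part of the notes: a scalar Gaussian kernel times the identity is assumed for K, and the function names are ours.

```python
import numpy as np

def gauss_kernel(x, y, sigma=1.0):
    """Scalar Gaussian kernel exp(-|x - y|^2 / (2 sigma^2)),
    acting as K(x, y) Id on vectors (an assumed isotropic matrix kernel)."""
    return np.exp(-np.sum((x - y) ** 2, axis=-1) / (2.0 * sigma ** 2))

def apply_kernel(points, vectors, query, sigma=1.0):
    """Evaluate K(sum_i z_i (x) delta_{x_i}) at a query point:
    the smooth vector field  q |-> sum_i K(q, x_i) z_i  of (33)."""
    w = gauss_kernel(query[None, :], points, sigma)  # K(query, x_i), shape (N,)
    return w @ vectors                               # sum_i K(query, x_i) z_i
```

The output is a smooth vector field even though the input linear form was singular; this is exactly the smoothing role played by the kernel of V in (33).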
2.2. Point matching. Assume that objects are collections of N points x1, . . . , xN ∈ Ω. Diffeomorphisms act on such objects by:
(35)  ϕ.(x_1, . . . , x_N) = (ϕ(x_1), . . . , ϕ(x_N)).

Fix two objects I = (x_1, . . . , x_N) and I′ = (x_1′, . . . , x_N′) and consider the matching functional

(36)  U(ϕ) = Σ_{i=1}^N |x_i′ − ϕ(x_i)|².

U is obviously continuous for pointwise convergence. The computation of δU/δϕ is straightforward and provides
(37)  (δU/δϕ)| h = 2 Σ_{i=1}^N (ϕ(x_i) − x_i′)^T h(x_i).

This can be written

δU/δϕ = 2 Σ_{i=1}^N (ϕ(x_i) − x_i′) ⊗ δ_{x_i}.

The Eulerian differential comes immediately from (32), yielding
(38)  ∂̄^V U(ϕ)| h = 2 Σ_{i=1}^N (ϕ(x_i) − x_i′)^T h ∘ ϕ(x_i),

or

∂̄^V U(ϕ) = 2 Σ_{i=1}^N (ϕ(x_i) − x_i′) ⊗ δ_{ϕ(x_i)}.

Finally, from (33), we have
(39)  ∇^V U(ϕ) = 2 Σ_{i=1}^N K(., ϕ(x_i))(ϕ(x_i) − x_i′).
The gradient descent algorithm is

(40)  dϕ_t(x)/dt = −2 Σ_{i=1}^N K(ϕ_t(x), ϕ_t(x_i))(ϕ_t(x_i) − x_i′).

This system can be solved in two steps. Denote y_t^i = ϕ_t(x_i). Applying (40) at x = x_j yields

dy_t^j/dt = −2 Σ_{i=1}^N K(y_t^j, y_t^i)(y_t^i − x_i′).

This is a differential system in y_t^1, . . . , y_t^N which must be solved with initial conditions y_0^j = ϕ_0(x_j). Once this is done, the value of ϕ_t(x) for a general x is the solution of the differential equation

dy/dt = −2 Σ_{i=1}^N K(y, y_t^i)(y_t^i − x_i′).
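The first of the two steps (integrating the trajectory system for the landmark points y_t^j) can be sketched with a forward-Euler discretization. This is an illustration under assumptions not made in the notes: a Gaussian kernel for K, a fixed step size, and our own function names.

```python
import numpy as np

def gauss_kernel_matrix(x, y, sigma=1.0):
    """Matrix of scalar Gaussian kernel values K(x_i, y_j)."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def greedy_point_match(x, x_target, sigma=1.0, dt=0.05, steps=200):
    """Forward-Euler integration of the landmark system (from (40)):
    dy_j/dt = -2 sum_i K(y_j, y_i)(y_i - x'_i), with y_0^j = x_j."""
    y = x.copy()
    for _ in range(steps):
        K = gauss_kernel_matrix(y, y, sigma)   # (N, N)
        y = y - 2 * dt * K @ (y - x_target)    # Euler step of the ODE
    return y
```

Since K is positive definite, the flow drives y toward the targets x′, and the matching energy (36) decreases along the iterations.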
We now summarize this algorithm, assuming that ϕ_0 = id.

Algorithm 2 (greedy spline minimization). Solve the differential system

dy_t^j/dt = −2 Σ_{i=1}^N K(y_t^j, y_t^i)(y_t^i − x_i′)

with initial conditions y_0^j = x_j. Then let ϕ_t be the flow associated to the ordinary differential equation

dy/dt = −2 Σ_{i=1}^N K(y, y_t^i)(y_t^i − x_i′).

2.3. Image matching. Now consider that objects are functions I defined on Ω with values in R. Diffeomorphisms act on functions by (ϕ.I)(x) = I(ϕ^{-1}(x)) for x ∈ Ω. Fixing two such functions I and I′, the simplest matching functional which can be considered is

(41)  U(ϕ) = ∫_Ω (I ∘ ϕ^{-1}(x) − I′(x))² dx.

We assume that I is differentiable and first compute δU/δϕ. For this, we must first notice that

(42)  (d/dε)(ϕ + εh)^{-1}|_{ε=0} = −D(ϕ^{-1}) h ∘ ϕ^{-1}.

This can be proved by computing the derivative of the equation (ϕ + εh)^{-1} ∘ (ϕ + εh) = id at ε = 0. Using this, we can compute

(δU/δϕ)| h = −2 ∫_Ω (I ∘ ϕ^{-1}(x) − I′(x)) ∇I^T ∘ ϕ^{-1}(x) D(ϕ^{-1})(x) h ∘ ϕ^{-1}(x) dx

and

∂̄^V U(ϕ)| h = −2 ∫_Ω (I ∘ ϕ^{-1}(x) − I′(x)) ∇I^T ∘ ϕ^{-1}(x) D(ϕ^{-1})(x) h(x) dx,

so that

(43)  ∂̄^V U(ϕ) = −2 (I ∘ ϕ^{-1} − I′) ∇(I ∘ ϕ^{-1}) ⊗ dx.
(Using the fact that ∇(I ∘ ϕ^{-1})^T = ∇I^T ∘ ϕ^{-1} D(ϕ^{-1}).) To compute the Eulerian gradient of U, we use the following lemma.

Lemma 4. If V is a Hilbert space of smooth vector fields on Ω with kernel K, µ is a measure on Ω, and z is a µ-measurable function from Ω to R^d (a vector density), then, for all x ∈ Ω,

K(z ⊗ µ)(x) = ∫_Ω K(x, y) z(y) dµ(y).

Proof. Recall that z ⊗ µ is the linear form on vector fields defined by

(z ⊗ µ| h) = ∫_Ω z^T h dµ.

From the definition of the kernel, we have, for any a ∈ R^d:

a^T K(z ⊗ µ)(x) = (a ⊗ δ_x| K(z ⊗ µ))
= (z ⊗ µ| K(a ⊗ δ_x))
= (z ⊗ µ| K(., x)a)
= ∫_Ω z^T(y) K(y, x) a dµ(y)
= a^T ∫_Ω K(x, y) z(y) dµ(y).  □

The expression of the Eulerian gradient of U is now straightforward given Lemma 4:

(44)  ∇^V U(ϕ) = −2 ∫_Ω (I ∘ ϕ^{-1}(y) − I′(y)) K(., y) ∇(I ∘ ϕ^{-1})(y) dy.

This provides the following greedy gradient descent algorithm, the original version being given in [7] (see also [26]).
Algorithm 3 (Greedy image matching). Start with ϕ_0 = id and solve the evolution equation

(d/dt) ϕ_t(y) = 2 ∫_Ω (I_t(x) − I′(x)) K(ϕ_t(y), x) ∇I_t(x) dx

with I_t = I ∘ ϕ_t^{-1}.
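One evaluation of the right-hand side of the evolution equation in Algorithm 3 can be sketched on a discrete grid. This is a minimal illustration under assumptions not made in the notes: a Gaussian kernel, a rectangular grid quadrature for the integral, central-difference gradients, and the current deformed image I_t taken as given; all names are ours.

```python
import numpy as np

def matching_velocity(I, I_target, h=1.0, sigma=2.0):
    """Discrete evaluation of v(y) = 2 sum_x (I_t(x) - I'(x)) K(y, x) grad I_t(x) h^2,
    a grid quadrature of the integral in Algorithm 3 (I plays the role of I_t)."""
    ny, nx = I.shape
    gy, gx = np.gradient(I, h)                      # grad I_t, per axis
    xs = np.stack(np.meshgrid(np.arange(ny), np.arange(nx),
                              indexing="ij"), axis=-1).reshape(-1, 2) * h
    resid = (I - I_target).ravel()                  # I_t - I'
    grads = np.stack([gy.ravel(), gx.ravel()], axis=-1)
    d2 = np.sum((xs[:, None, :] - xs[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2 * sigma ** 2))              # Gaussian K(y, x)
    v = 2 * (K * resid[None, :]) @ grads * h ** 2   # velocity at every grid point
    return v.reshape(ny, nx, 2)
```

A full implementation would additionally transport ϕ_t (and hence I_t) along this velocity field at each iteration; the sketch only shows how the kernel smooths the residual-weighted image gradient into a velocity field.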
Other, more complex, comparison criteria for images can be based on image histograms, or grey-level distributions. For example, define, for λ ∈ R, the level line of an image I at λ by

I_λ = {x ∈ Ω : I(x) = λ}.

In order to compare images which have been obtained under different lighting conditions, or by different imaging devices, in which case the grey-level values are not reliable anymore, it is necessary to use matching criteria which are invariant to one-to-one transformations of the grey levels. A group of methods which have been used in this context are based on joint local histograms of the grey levels. Such local histograms are computed from a pair of images I, I′, and are functions of the kind H_x(λ, λ′) which count the frequency of simultaneous occurrence of grey levels λ in I and λ′ in I′ at the same location in a small window around x. One computationally feasible way to define them is to use the "kernel estimator"

H_x(λ, λ′) = ∫_Ω f(|I(y) − λ|) f(|I′(y) − λ′|) g(x, y) dy,

where f is positive, ∫_R f(t)dt = 1, and f vanishes when t is far from 0; g is such that, for all x, ∫_Ω g(x, y)dy = 1 and g(x, y) vanishes when y is far from x. For each x, H_x is a bidimensional probability function, and there exist several ways of measuring the degree of dependence between its components. The simplest one, which is probably sufficient for most applications, is the correlation ratio, given by (reintroducing I and I′ in the notation)

C_x(I, I′) = 1 − (∫ λλ′ H_x(λ, λ′) dλ dλ′) / √(∫ λ² H_x(λ, λ′) dλ dλ′ · ∫ (λ′)² H_x(λ, λ′) dλ dλ′).

It is then possible to define the matching function by

U(ϕ) = ∫_Ω C_x(I ∘ ϕ^{-1}, I′) dx.

Here again, the computation of Eulerian gradients is possible. Since it is quite lengthy, we do not detail it, but the important fact to remember is that it leads to an algorithm similar to Algorithm 3 for what is often called multimodal image matching.

2.4. Measure matching.
We now consider the case when the compared objects are measures on R^d. The motivation for considering measures is the matching of unordered point sets (x_1, . . . , x_N) in R^d. Unlike the point matching method discussed in section 2.2, there is no information on which element of the second point set corresponds to a given element of the first one. In fact, the sizes of the two point sets do not even need to be the same. In such a framework, it is natural to represent (x_1, . . . , x_N) as a sum of Dirac measures

µ_x = Σ_{i=1}^N δ_{x_i},

which induces the measure matching problem. The only fact we will need here concerning measures is that they are linear forms acting on functions on R^d via

(µ| f) = ∫_{R^d} f dµ.

In particular, if µ_x is as above,

(µ_x| f) = Σ_{i=1}^N f(x_i).

If ϕ is a diffeomorphism of Ω and µ a measure, we define a new measure ϕ.µ as follows:

(ϕ.µ| f) = (µ| f ∘ ϕ).
If µ = µ_x this gives

(ϕ.µ_x| f) = Σ_{i=1}^N f ∘ ϕ(x_i) = (µ_{ϕ(x)}| f),

so that the transformation of the measure associated to a point set x is the measure associated to the transformed point set, which is reasonable.

Now that we have defined ϕ.µ, we need to define our function D(ϕ.µ, µ′) in order to run the greedy matching algorithm. We can use the norm of the difference between the two measures, provided we have chosen a suitable norm on measures. Since measures are linear forms on functions, one possible choice is to use the dual of a function norm, i.e., to define

(45)  ‖µ‖ = sup {(µ| f) : ‖f‖ = 1},

which displaces the question to which function norm to use. One choice leads to a particularly simple and elegant formulation, in association –again– with the theory of reproducing kernels. Indeed, let W be a reproducing kernel Hilbert space of functions, so that we have an operator k : W* → W with kδ_x = k(., x) and with the identity (µ| f) = ⟨kµ, f⟩_W for µ ∈ W*, f ∈ W. With this choice, (45) becomes
‖µ‖_{W*} = sup {(µ| f) : ‖f‖_W = 1}
         = sup {⟨kµ, f⟩_W : ‖f‖_W = 1}
         = ‖kµ‖_W.

This implies that

‖µ‖²_{W*} = ⟨kµ, kµ⟩_W = (µ| kµ).

If µ is a measure, this expression is very simple, and given by

‖µ‖²_{W*} = ∫∫ k(x, y) dµ(x) dµ(y).

This is because kµ(x) = (δ_x| kµ) = (µ| kδ_x) = ∫ k(y, x) dµ(y). So we can take

(46)  D(ϕ.µ, µ′) = ‖ϕ.µ − µ′‖²_{W*}.

Let us make this expression more explicit. We have
D(ϕ.µ, µ′) = ⟨ϕ.µ, ϕ.µ⟩_{W*} − 2⟨ϕ.µ, µ′⟩_{W*} + ⟨µ′, µ′⟩_{W*}
           = (ϕ.µ| k(ϕ.µ)) − 2(ϕ.µ| kµ′) + (µ′| kµ′)
           = ∫∫ k(ϕ(x), ϕ(y)) dµ(x) dµ(y) − 2 ∫∫ k(ϕ(x), y) dµ(x) dµ′(y) + ∫∫ k(x, y) dµ′(x) dµ′(y).
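For two sums of Diracs, this expansion of the squared dual norm reduces to three finite kernel sums. A minimal sketch, assuming a Gaussian kernel for k (names are ours):

```python
import numpy as np

def gauss(x, y, sigma=1.0):
    """Matrix of Gaussian kernel values k(x_i, y_j)."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def measure_distance(px, py, sigma=1.0):
    """||mu_x - mu_y||^2_{W*} for sums of Diracs:
    sum_ij k(x_i, x_j) - 2 sum_ij k(x_i, y_j) + sum_ij k(y_i, y_j)."""
    return (gauss(px, px, sigma).sum()
            - 2 * gauss(px, py, sigma).sum()
            + gauss(py, py, sigma).sum())
```

Note that the two point sets may have different sizes, and no correspondence between them is needed: only the kernel sums enter the distance.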
It is now easy to compute δU/δϕ. It is given by (letting ∇₁k be the gradient of k with respect to its first variable, and using the fact that k is symmetric)

(δU/δϕ)| h = 2 ∫∫ ∇₁k(ϕ(x), ϕ(y))^T h(x) dµ(x) dµ(y) − 2 ∫∫ ∇₁k(ϕ(x), z)^T h(x) dµ(x) dµ′(z),

which can be written as

(47)  δU/δϕ = 2 (∫ ∇₁k(ϕ(.), ϕ(y)) dµ(y) − ∫ ∇₁k(ϕ(.), z) dµ′(z)) ⊗ µ.
The Eulerian differential now is

∂̄^V U(ϕ)| h = 2 ∫∫ ∇₁k(ϕ(x), ϕ(y))^T h ∘ ϕ(x) dµ(x) dµ(y) − 2 ∫∫ ∇₁k(ϕ(x), z)^T h ∘ ϕ(x) dµ(x) dµ′(z),

which yields the expression of the Eulerian gradient:

(48)  ∇^V U(ϕ)(.) = 2 ∫ K(., ϕ(x)) (∫ ∇₁k(ϕ(x), ϕ(y)) dµ(y) − ∫ ∇₁k(ϕ(x), z) dµ′(z)) dµ(x).
The gradient descent algorithm is

(dϕ/dt)(ξ) = −2 ∫ K(ϕ(ξ), ϕ(x)) (∫ ∇₁k(ϕ(x), ϕ(y)) dµ(y) − ∫ ∇₁k(ϕ(x), z) dµ′(z)) dµ(x).

Let us now consider our case of interest, for which µ = µ_x and µ′ = µ_y for point sets x = (x_1, . . . , x_N) and y = (y_1, . . . , y_M). The algorithm becomes

(49)  (dϕ/dt)(ξ) = −2 Σ_{i=1}^N K(ϕ(ξ), ϕ(x_i)) (Σ_{j=1}^N ∇₁k(ϕ(x_i), ϕ(x_j)) − Σ_{h=1}^M ∇₁k(ϕ(x_i), y_h)).

As in the case of labelled point sets, this equation may be solved in two steps: letting x̃_i(t) = ϕ_t(x_i), solve the system
dx̃_q/dt = −2 Σ_{i=1}^N K(x̃_q, x̃_i) (Σ_{j=1}^N ∇₁k(x̃_i, x̃_j) − Σ_{h=1}^M ∇₁k(x̃_i, y_h)).

Once this is done, the trajectory ξ̃(t) = ϕ_t(ξ) of an arbitrary point ξ is given by
dξ̃/dt = −2 Σ_{i=1}^N K(ξ̃, x̃_i) (Σ_{j=1}^N ∇₁k(x̃_i, x̃_j) − Σ_{h=1}^M ∇₁k(x̃_i, y_h)).

2.5. Matching lines and surfaces.
2.5.1. Curve matching with measures. Since the previous approach holds for any measure, it is not restricted to discrete point sets but also applies to continuous ones. It can, for example, be used to match plane or 3D curves, or surfaces. In this situation, however, it has the important drawback of not being geometric. To see why, consider the simplest example of plane curves. If γ is a curve, parameterized over an interval [a, b], one defines the line measure µ_γ by

(µ_γ| f) = ∫_γ f dl = ∫_a^b f(γ(u)) |γ̇(u)| du.

This definition does not depend on how γ has been parameterized. Now, if ϕ is a diffeomorphism, we have, by definition of the action of diffeomorphisms on measures,

(ϕ.µ_γ| f) = ∫_γ f ∘ ϕ dl = ∫_a^b f(ϕ(γ(u))) |γ̇(u)| du,

whereas

(µ_{ϕγ}| f) = ∫_{ϕ(γ)} f dl = ∫_a^b f(ϕ(γ(u))) |Dϕ(γ(u)) γ̇(u)| du.
So, unlike point sets, for which ϕ.µ_x = µ_{ϕ(x)}, the image of the measure associated to a curve is not the measure associated to the image of the curve. Since the initial goal is to compare curves, and not measures, it is more natural to use the second definition, µ_{ϕγ}, rather than the first one. We still have, using the notation of the previous section,
D(µ_{ϕγ}, µ_γ̃) = ⟨µ_{ϕγ}, µ_{ϕγ}⟩_{W*} − 2⟨µ_{ϕγ}, µ_γ̃⟩_{W*} + ⟨µ_γ̃, µ_γ̃⟩_{W*}
  = (µ_{ϕγ}| k(µ_{ϕγ})) − 2(µ_{ϕγ}| kµ_γ̃) + (µ_γ̃| kµ_γ̃)
  = ∫_a^b ∫_a^b k(ϕ(γ(u)), ϕ(γ(v))) |Dϕ(γ(u)) γ̇(u)| |Dϕ(γ(v)) γ̇(v)| du dv
  − 2 ∫_a^b ∫_ã^b̃ k(ϕ(γ(u)), γ̃(v)) |Dϕ(γ(u)) γ̇(u)| |γ̃˙(v)| du dv
  + ∫_ã^b̃ ∫_ã^b̃ k(γ̃(u), γ̃(v)) |γ̃˙(u)| |γ̃˙(v)| du dv.
The fact that derivatives of ϕ appear in the integral makes the computation of the gradient slightly more complex. We shall only describe here how this computation may be achieved, without providing all details. The new term compared to the previous computation is |Dϕ(γ(u)) γ̇(u)|, and we focus on its variation. Using the fact that (d/dε)|z + εζ| = z^T ζ/|z| at ε = 0, we get

(d/dε)|D(ϕ + εh)(γ(u)) γ̇(u)| |_{ε=0} = (Dh(γ(u)) γ̇(u))^T Dϕ(γ(u)) γ̇(u) / |Dϕ(γ(u)) γ̇(u)|.

This expression is still linear in h, but now involves its derivatives. In the computation of the Eulerian gradient, this requires using differentials of the kernel, as the following computation indicates. We can indeed formally write, for h ∈ V and a ∈ R^d,

a^T (∂h/∂x_i)(x) = (∂/∂x_i) ⟨h, K(., x)a⟩_V = ⟨h, (∂/∂x_i) K(., x)a⟩_V,

which therefore involves the derivative of the kernel with respect to its second variable. This implies that

(50)  a^T Dh(x) b = ⟨h, b^T D₂K(., x)a⟩_V,

where D₂K is the differential of K with respect to its second coordinate. The rest of the computation is handled like in the general measure case; this yields, using the notation γ_ϕ(u) = ϕ ∘ γ(u) (details are left to the reader),
∇^V U(ϕ)(.) = 2 ∫_a^b ∫_a^b K(., γ_ϕ(u)) ∇₁k(γ_ϕ(u), γ_ϕ(v)) |γ̇_ϕ(u)| |γ̇_ϕ(v)| du dv
 + 2 ∫_a^b ∫_a^b η(u)^T D₂K(., γ_ϕ(u)) η(u) k(γ_ϕ(u), γ_ϕ(v)) |γ̇_ϕ(u)| |γ̇_ϕ(v)| du dv
 − 2 ∫_a^b ∫_ã^b̃ K(., γ_ϕ(u)) ∇₁k(γ_ϕ(u), γ̃(v)) |γ̇_ϕ(u)| |γ̃˙(v)| du dv
 − 2 ∫_a^b ∫_ã^b̃ η(u)^T D₂K(., γ_ϕ(u)) η(u) k(γ_ϕ(u), γ̃(v)) |γ̇_ϕ(u)| |γ̃˙(v)| du dv

with η(u) = γ̇_ϕ(u)/|γ̇_ϕ(u)|.

The computation is simpler, however, if we directly optimize a discretized version of D(µ_{ϕγ}, µ_γ̃) (which needs to be discretized anyway). If a curve γ is discretized into points x_0, . . . , x_N (with x_N = x_0 if the curve is closed), one can define the discretized measure, still denoted µ_γ,

(µ_γ| f) = Σ_{i=1}^N f(c_i) |τ_i|

with c_i = (x_i + x_{i−1})/2 and τ_i = x_i − x_{i−1}. The measure µ_{ϕ.γ} takes the same form with c_i^ϕ = (ϕ(x_i) + ϕ(x_{i−1}))/2 and τ_i^ϕ = ϕ(x_i) − ϕ(x_{i−1}). This time, letting γ̃ be discretized in x̃_1, . . . , x̃_M, we have

D(µ_{ϕγ}, µ_γ̃) = Σ_{i,j=1}^N k(c_i^ϕ, c_j^ϕ) |τ_i^ϕ| |τ_j^ϕ| − 2 Σ_{i=1}^N Σ_{j=1}^M k(c_i^ϕ, c̃_j) |τ_i^ϕ| |τ̃_j| + Σ_{i,j=1}^M k(c̃_i, c̃_j) |τ̃_i| |τ̃_j|.

The differential of this expression is computed as with point measures and provides (assuming closed curves)
δU/δϕ = Σ_{i=1}^N Σ_{j=1}^N (∇₁k(c_i^ϕ, c_j^ϕ) |τ_i^ϕ| + ∇₁k(c_{i+1}^ϕ, c_j^ϕ) |τ_{i+1}^ϕ|) |τ_j^ϕ| ⊗ δ_{x_i}
 + 2 Σ_{i=1}^N Σ_{j=1}^N (k(c_i^ϕ, c_j^ϕ) τ_i^ϕ/|τ_i^ϕ| − k(c_{i+1}^ϕ, c_j^ϕ) τ_{i+1}^ϕ/|τ_{i+1}^ϕ|) |τ_j^ϕ| ⊗ δ_{x_i}
 − Σ_{i=1}^N Σ_{j=1}^M (∇₁k(c_i^ϕ, c̃_j) |τ_i^ϕ| + ∇₁k(c_{i+1}^ϕ, c̃_j) |τ_{i+1}^ϕ|) |τ̃_j| ⊗ δ_{x_i}
 − 2 Σ_{i=1}^N Σ_{j=1}^M (k(c_i^ϕ, c̃_j) τ_i^ϕ/|τ_i^ϕ| − k(c_{i+1}^ϕ, c̃_j) τ_{i+1}^ϕ/|τ_{i+1}^ϕ|) |τ̃_j| ⊗ δ_{x_i},

from which the Eulerian differential and gradient can be computed.

2.5.2. Curve matching with vector measures. Instead of describing a curve by a measure, which is a linear form on functions, it is possible to represent it by a vector measure, which is a linear form on vector fields. For each parameterized curve γ : [a, b] → R^d, we define a vector measure ν_γ, which associates to each vector field f on R^d the number

(ν_γ| f) = ∫_a^b γ̇(u)^T f ∘ γ(u) du.

It can be checked that this definition is geometric: it is invariant by a change of parameterization. It is in fact given by τ_γ ⊗ dl, where τ_γ is the unit tangent to γ and dl the line measure along γ (note that this depends on the orientation of γ). If ϕ is a diffeomorphism, we then have

(ν_{ϕγ}| f) = ∫_a^b (Dϕ(γ(u)) γ̇(u))^T f ∘ ϕ(γ(u)) du.
Like for scalar measures, we can use a dual norm for the comparison of two vector measures, defined by
‖ν‖_{W*} = sup {(ν| f) : ‖f‖_W = 1},

where W now is a reproducing kernel Hilbert space, of vector fields this time, and we still have ‖ν‖²_{W*} = (ν| kν), where k is the kernel of W. In particular, we have

‖ν_γ‖²_{W*} = ∫_a^b ∫_a^b γ̇(u)^T k(γ(u), γ(v)) γ̇(v) du dv

and

‖ν_{ϕγ}‖²_{W*} = ∫_a^b ∫_a^b γ̇(u)^T Dϕ(γ(u))^T k(ϕ(γ(u)), ϕ(γ(v))) Dϕ(γ(v)) γ̇(v) du dv.
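For polygonal curves, ν_γ discretizes into edge vectors attached to edge midpoints, ν_γ ≈ Σ_i τ_i ⊗ δ_{c_i}, and the squared dual norm of a difference of two such vector measures becomes a finite double sum. A minimal sketch, assuming a scalar Gaussian kernel times the identity for k (names are ours):

```python
import numpy as np

def edges(poly):
    """Midpoints c_i and edge vectors tau_i of a polygonal curve (vertex array)."""
    c = 0.5 * (poly[1:] + poly[:-1])
    tau = poly[1:] - poly[:-1]
    return c, tau

def cross_term(c1, t1, c2, t2, sigma=1.0):
    """sum_ij tau_i^T k(c_i, c'_j) tau'_j for a Gaussian scalar kernel."""
    d2 = np.sum((c1[:, None, :] - c2[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    return np.einsum("ia,ij,ja->", t1, K, t2)

def vector_measure_distance(poly1, poly2, sigma=1.0):
    """||nu_1 - nu_2||^2_{W*} for two discretized curves."""
    c1, t1 = edges(poly1); c2, t2 = edges(poly2)
    return (cross_term(c1, t1, c1, t1, sigma)
            - 2 * cross_term(c1, t1, c2, t2, sigma)
            + cross_term(c2, t2, c2, t2, sigma))
```

Because the edge vectors carry both length and orientation, this discrete distance is sensitive to the orientation of the curves, as noted above for ν_γ.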
Defining D(ν_{ϕγ}, ν_γ̃) = ‖ν_{ϕγ} − ν_γ̃‖²_{W*}, a computation similar to the previous one can be made to obtain the Eulerian gradient. We have in particular

(δU/δϕ)| h = 2 ∫_a^b ∫_a^b γ̇(u)^T Dϕ(γ(u))^T (D₁k(ϕ(γ(u)), ϕ(γ(v))) h(γ(u))) Dϕ(γ(v)) γ̇(v) du dv
 − 2 ∫_a^b ∫_ã^b̃ γ̇(u)^T Dϕ(γ(u))^T (D₁k(ϕ(γ(u)), γ̃(v)) h(γ(u))) γ̃˙(v) du dv
 + 2 ∫_a^b ∫_a^b γ̇(u)^T Dh(γ(u))^T k(ϕ(γ(u)), ϕ(γ(v))) Dϕ(γ(v)) γ̇(v) du dv
 − 2 ∫_a^b ∫_ã^b̃ γ̇(u)^T Dh(γ(u))^T k(ϕ(γ(u)), γ̃(v)) γ̃˙(v) du dv.

Dealing with the derivatives of h can be done as with measures, yielding derivatives of the V-kernel in the Eulerian gradient. Here, however, we can also get rid of the derivative of h, at least when the curve γ is smooth enough. Denote, to simplify, γ_ϕ = ϕ.γ = ϕ ∘ γ. Since Dh(γ(u)) γ̇(u) = (d/du) h ∘ γ(u), we can use an integration by parts in the last two integrals, yielding
∫_a^b ∫_a^b γ̇(u)^T Dh(γ(u))^T k(γ_ϕ(u), γ_ϕ(v)) γ̇_ϕ(v) du dv =
 h(γ(b))^T ∫_a^b k(γ_ϕ(b), γ_ϕ(v)) γ̇_ϕ(v) dv − h(γ(a))^T ∫_a^b k(γ_ϕ(a), γ_ϕ(v)) γ̇_ϕ(v) dv
 − ∫_a^b ∫_a^b (h ∘ γ(u))^T (D₁k(γ_ϕ(u), γ_ϕ(v)) γ̇_ϕ(u)) γ̇_ϕ(v) du dv
and

∫_a^b ∫_ã^b̃ γ̇(u)^T Dh(γ(u))^T k(γ_ϕ(u), γ̃(v)) γ̃˙(v) du dv =
 h(γ(b))^T ∫_ã^b̃ k(γ_ϕ(b), γ̃(v)) γ̃˙(v) dv − h(γ(a))^T ∫_ã^b̃ k(γ_ϕ(a), γ̃(v)) γ̃˙(v) dv
 − ∫_a^b ∫_ã^b̃ h(γ(u))^T (D₁k(γ_ϕ(u), γ̃(v)) γ̇_ϕ(u)) γ̃˙(v) du dv.
Replacing h by h ◦ ϕ, we obtain what we need to compute the Eulerian differential
∂̄U(ϕ)| h = 2 ∫_a^b ∫_a^b γ̇_ϕ(u)^T (D₁k(γ_ϕ(u), γ_ϕ(v)) h(γ_ϕ(u))) γ̇_ϕ(v) du dv
 − 2 ∫_a^b ∫_ã^b̃ γ̇_ϕ(u)^T (D₁k(γ_ϕ(u), γ̃(v)) h(γ_ϕ(u))) γ̃˙(v) du dv
 + 2 h(γ_ϕ(b))^T (∫_a^b k(γ_ϕ(b), γ_ϕ(v)) γ̇_ϕ(v) dv − ∫_ã^b̃ k(γ_ϕ(b), γ̃(v)) γ̃˙(v) dv)
 − 2 h(γ_ϕ(a))^T (∫_a^b k(γ_ϕ(a), γ_ϕ(v)) γ̇_ϕ(v) dv − ∫_ã^b̃ k(γ_ϕ(a), γ̃(v)) γ̃˙(v) dv)
 − 2 ∫_a^b h(γ_ϕ(u))^T (∫_a^b (D₁k(γ_ϕ(u), γ_ϕ(v)) γ̇_ϕ(u)) γ̇_ϕ(v) dv − ∫_ã^b̃ (D₁k(γ_ϕ(u), γ̃(v)) γ̇_ϕ(u)) γ̃˙(v) dv) du.
To simplify this expression, let τ_ϕ be the unit tangent to γ_ϕ, and similarly for τ̃. Let k_{ij} be the components of the matrix kernel k. Introduce the quantities ρ and ρ̃ such that, for a point x on γ_ϕ,

ρ_q(x) = Σ_{i,j} τ_{ϕ,i}(x) ∫_{γ_ϕ} (∂k_{ij}/∂x_q)(x, y) τ_{ϕ,j}(y) dy,
ρ̃_q(x) = Σ_{i,j} τ_{ϕ,i}(x) ∫_γ̃ (∂k_{ij}/∂x_q)(x, ỹ) τ̃_j(ỹ) dỹ

for q = 1, . . . , d,

ζ(x) = ∫_{γ_ϕ} (D₁k(x, y) τ_ϕ(x)) τ_ϕ(y) dy,
ζ̃(x) = ∫_γ̃ (D₁k(x, ỹ) τ_ϕ(x)) τ̃(ỹ) dỹ,

and

η(x) = ∫_{γ_ϕ} k(x, y) τ_ϕ(y) dy,
η̃(x) = ∫_γ̃ k(x, ỹ) τ̃(ỹ) dỹ,

so that

∂̄U(ϕ)| h = 2 ∫_{γ_ϕ} h^T (ρ + ζ − ρ̃ − ζ̃) dl + 2 h(x₁)^T (η(x₁) − η̃(x₁)) − 2 h(x₀)^T (η(x₀) − η̃(x₀)),

x₀ and x₁ being the extremities of γ_ϕ. This implies that

(51)  ∂̄U(ϕ) = 2 (ρ + ζ − ρ̃ − ζ̃) ⊗ µ_{ϕ.γ} + 2 (η − η̃) ⊗ (δ_{x₁} − δ_{x₀}).

(The last term vanishes if γ is a closed curve.)
2.5.3. Surfaces. If S is a surface in R³, one can define a measure µ_S and a vector measure ν_S by

(µ_S| f) = ∫_S f(x) dσ_S(x)  for a scalar f

and

(ν_S| f) = ∫_S f(x)^T N(x) dσ_S(x)  for a vector field f,

where dσ_S is the area form on S and N is the unit normal (S being assumed to be oriented in the second case). If W is a reproducing kernel Hilbert space of scalar functions or vector fields, we can still compare two surfaces by using the W*-norm of the difference of their associated measures. We state without proof the following result.
Theorem 29. If S is a surface and ϕ a diffeomorphism of R³ that preserves the orientation (i.e., with positive Jacobian), we have

(µ_{ϕ(S)}| f) = ∫_S f ∘ ϕ(x) |Dϕ(x)^{-T} N| det Dϕ(x) dσ_S(x)

for a scalar f and

(ν_{ϕ(S)}| f) = ∫_S (f ∘ ϕ(x))^T Dϕ(x)^{-T} N det Dϕ(x) dσ_S(x).
If (e₁(x), e₂(x)) is a basis of the tangent plane to S at x, we have

(52)  Dϕ(x)^{-T} N det Dϕ(x) = (Dϕ(x)e₁ × Dϕ(x)e₂)/|e₁ × e₂|.

The last formula implies in particular that if S is locally parameterized by (u, v), then (since N = (e₁ × e₂)/|e₁ × e₂| and dσ_S = |e₁ × e₂| du dv)

(ν_S| f) = ∫ f(x)^T (e₁ × e₂) du dv = ∫ det(e₁, e₂, f) du dv

and

(ν_{ϕ(S)}| f) = ∫ det(Dϕ e₁, Dϕ e₂, f ∘ ϕ) du dv.
Theorem 29 provides the necessary formulae to compute the Eulerian gradient in both cases. This is similar to the case of curves and left to the reader.

Consider now the discrete case and let S be a triangulated surface. Let T₁, . . . , T_N be the triangles that compose S, c_i be the center of T_i, N_i its oriented unit normal and a_i its area. We define the discrete versions of the previous measures by

(µ_S| f) = Σ_{i=1}^N f(c_i) a_i  for a scalar f

and

(ν_S| f) = Σ_{i=1}^N f(c_i)^T N_i a_i  for a vector field f.
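As in the curve case, the squared W*-distance between two such discrete vector measures reduces to kernel sums over triangle centers, with area-weighted normals N_i a_i as the attached vectors. A minimal sketch, assuming a scalar Gaussian kernel times the identity (names are ours):

```python
import numpy as np

def triangle_data(verts, faces):
    """Centers c_i and area-weighted normals N_i a_i of a triangulated surface."""
    v1, v2, v3 = (verts[faces[:, k]] for k in range(3))
    centers = (v1 + v2 + v3) / 3.0
    normals = 0.5 * np.cross(v2 - v1, v3 - v1)   # N_i a_i (consistent ordering assumed)
    return centers, normals

def surface_distance(va, fa, vb, fb, sigma=1.0):
    """||nu_Sa - nu_Sb||^2_{W*} for two triangulated surfaces."""
    ca, na = triangle_data(va, fa)
    cb, nb = triangle_data(vb, fb)
    def term(c1, n1, c2, n2):
        d2 = np.sum((c1[:, None, :] - c2[None, :, :]) ** 2, axis=-1)
        K = np.exp(-d2 / (2 * sigma ** 2))
        return np.einsum("ia,ij,ja->", n1, K, n2)
    return term(ca, na, ca, na) - 2 * term(ca, na, cb, nb) + term(cb, nb, cb, nb)
```

The two triangulations need not be in correspondence, nor have the same number of triangles, which is the main practical appeal of this family of matching terms.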
Moreover, letting x_{i1}, x_{i2} and x_{i3} be the vertices of T_i, the previous formulae can be written
(µ_S| f) = Σ_{i=1}^N f((x_{i1} + x_{i2} + x_{i3})/3) (1/2)|(x_{i2} − x_{i1}) × (x_{i3} − x_{i1})|

and

(ν_S| f) = Σ_{i=1}^N f((x_{i1} + x_{i2} + x_{i3})/3)^T (1/2)(x_{i2} − x_{i1}) × (x_{i3} − x_{i1}),

where the last formula assumes that the vertices of the triangles are ordered consistently with the orientation. The transformed surfaces are now represented by the same expressions with x_{ik} replaced by ϕ(x_{ik}). The explicit formulae for the Eulerian gradient of D(µ_{ϕ.S}, µ_S̃) can be obtained after a routine (but lengthy) computation similar to the cases of point sets or discretized curves [27].

2.5.4. Induced actions and currents. This section assumes some preliminary knowledge of differential geometry. It can be skipped without harming the understanding of the rest of these notes.

We have defined the action of diffeomorphisms on measures by (ϕ.µ| f) = (µ| f ∘ ϕ). Recall that we have the usual action of diffeomorphisms on functions defined by ϕ.f = f ∘ ϕ^{-1}, so that we can write (ϕ.µ| f) = (µ| ϕ^{-1}.f). In the case of curves, we have seen that this action on the induced measure does not correspond to the image of the curve by a diffeomorphism, in the sense that µ_{ϕγ} ≠ ϕ.µ_γ. Here, we discuss whether the transformations µ_γ → µ_{ϕγ} or ν_γ → ν_{ϕγ} (and the equivalent transformations for surfaces) can be explained by a similar operation, i.e., whether one can write (ϕ ⋆ µ| f) = (µ| ϕ^{-1} ⋆ f), where ⋆ would represent another action of diffeomorphisms on functions. In fact, for µ_γ, the answer is no. Letting τ(γ(u)) be the unit tangent to γ, we have
(µ_{ϕγ}| f) = ∫_a^b f(ϕ(γ(u))) |Dϕ(γ(u)) γ̇(u)| du = ∫_a^b f(ϕ(γ(u))) |Dϕ(γ(u)) τ(γ(u))| |γ̇(u)| du.