Note on Mathematics of Imaging

Haocheng Dai


Definition 1. Vector space is a set of elements called vectors together with two operations: addition and scalar multiplication, which are assumed to satisfy the following axioms:

1. x + y = y + x

2. (x + y) + z = x + (y + z)

3. There is a null vector θ ∈ X such that x + θ = x for every x ∈ X

4. α(x + y) = αx + αy;(α + β)x = αx + βx

5. (αβ)x = α(βx)

6. 0x = θ; 1x = x

Definition 2. Inner product [1] ⟨·, ·⟩ : X × X → R is a mapping that satisfies the following axioms:

1. ⟨x, y⟩ = ⟨y, x⟩

2. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩

3. ⟨λx, y⟩ = λ⟨x, y⟩

4. ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = θ, where X is a vector space.

Example 1. For A, B ∈ GL(n), the inner product is ⟨A, B⟩ = Tr(AᵀB) and the associated norm is ‖A‖₂ = Tr(AᵀA)^{1/2}.
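A quick numerical check of this example (a minimal NumPy sketch; the matrices are arbitrary): the trace inner product coincides with the sum of elementwise products, and the norm follows directly.

```python
import numpy as np

# Two arbitrary invertible matrices (hypothetical example data).
A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, -1.0], [4.0, 2.0]])

inner = np.trace(A.T @ B)              # <A, B> = Tr(A^T B)
norm_A = np.sqrt(np.trace(A.T @ A))    # ||A||_2 = Tr(A^T A)^{1/2}

# Tr(A^T B) equals the sum of elementwise products (Frobenius inner product).
assert np.isclose(inner, np.sum(A * B))
print(inner, norm_A)
```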

Definition 3. Norm ‖·‖ : X → R is a mapping that satisfies the following axioms:

1. ‖x‖ ≥ 0 for all x ∈ X, and ‖x‖ = 0 if and only if x = θ

2. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for each x, y ∈ X (triangle inequality)

3. ‖αx‖ = |α| · ‖x‖ for all scalars α and each x ∈ X, where X is a vector space.

Remark 1. Norm and inner product are two independent concepts: a norm is not necessarily induced by an inner product. But when a Banach space's norm is induced by an inner product, the space is called a Hilbert space.

Example 2. The norm is clearly an abstraction of our usual concept of length. In the continuous case, the supremum norm ‖f‖_∞ is the supremum (least upper bound) of |f| over its domain. In the discrete case, the sup norm equals the maximum of the absolute values of the components, namely ‖f‖_∞ = max_i |f_i|.

Remark 2. If f : R^n → R, f(x) = ‖x‖_p, p ≥ 1, then f is convex.

Figure 1: Image of 2D norm

Definition 4. The l^p space consists of all sequences of scalars {ξ₁, ξ₂, ···} for which

∑_{i=1}^∞ |ξ_i|^p < ∞, where 1 ≤ p < ∞.

The norm of an element x = {ξ_i} in l^p is defined as

‖x‖_p = ( ∑_{i=1}^∞ |ξ_i|^p )^{1/p}

Definition 5. The L^p[a, b] space (Lebesgue space) consists of all functions f(u) for which

∫_a^b |f(u)|^p du < ∞, where 1 ≤ p < ∞.

The norm of an element f(u) in L^p is defined as

‖f‖_p = ( ∫_a^b |f(u)|^p du )^{1/p}

The L^p-functions are the functions for which this integral converges. For p ≠ 2, the space of L^p-functions is a Banach space which is not a Hilbert space.
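A small numerical illustration (a sketch with NumPy; the vector and function are made-up examples) of the discrete l^p norm and of the L^p norm approximated on a grid:

```python
import numpy as np

def lp_norm(x, p):
    """Discrete l^p norm of a finite sequence x."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 1.0])
print(lp_norm(x, 1), lp_norm(x, 2), np.max(np.abs(x)))   # l1, l2, sup norm

# L^p[0, 1] norm of f(u) = u, approximated by a Riemann sum:
u = np.linspace(0.0, 1.0, 20001)
p = 3
Lp = np.trapz(np.abs(u) ** p, u) ** (1.0 / p)
print(Lp, (1.0 / (p + 1)) ** (1.0 / p))                  # exact value (1/(p+1))^{1/p}
```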

Remark 3. Always remember the absolute value sign in norm calculation.

Remark 4. Lp space is a space of measurable functions for which the p-th power of the absolute value is Lebesgue integrable.

Figure 2: Visualization of ‖x‖_p = 1, which are the cross sections of Figure 1.

Definition 6. Lebesgue integral of a function f over a measure space X is written

∫_X f dµ

to emphasize that the integral is taken with respect to the measure µ.

Remark 5. In mathematics, the integral of a non-negative function of a single variable can be regarded, in the simplest case, as the area between the graph of that function and the x-axis. The Lebesgue integral extends the integral to a larger class of functions.

Figure 3: Riemann-Darboux’s integration (in blue) and Lebesgue integration (in red).

Definition 7. Normed linear vector space is a vector space X on which there is defined a real-valued function which maps each element x in X into a real number ‖x‖.

Definition 8. Pre-Hilbert space is a linear vector space X together with an inner product defined on X × X.

Definition 9. Cauchy sequence is a sequence {xn} in a normed space such that kxn − xmk → 0 as n, m → ∞.

Remark 6. In a normed space, every convergent sequence is a Cauchy sequence, however, a Cauchy sequence may not be convergent.

Figure 4: Example of a Cauchy sequence

Definition 10. A normed linear vector space X is complete if every Cauchy sequence from X has a limit in X. The limit is also a vector.

Definition 11. Banach space is a complete normed linear vector space.

Definition 12. Hilbert space is a complete pre-Hilbert space or a Banach space whose norm is defined by inner product.

Remark 7. Actually, the hypothesis of completeness is weak: it is always possible to complete a space X endowed with an inner product. The "completed" norm is then associated with the inner product.

Remark 8. Hilbert spaces are complete infinite-dimensional spaces in which distances and angles can be measured. These spaces have a major impact in analysis and topology and will provide a convenient and proper setting for the functional analysis of partial differential equations.

Theorem 1. Hölder Inequality. If p and q are positive numbers, 1 ≤ p ≤ ∞, 1 ≤ q ≤ ∞, such that 1/p + 1/q = 1, and if x = {ξ₁, ξ₂, ···} ∈ l^p, y = {η₁, η₂, ···} ∈ l^q, then

∑_{i=1}^∞ |ξ_i η_i| ≤ ‖x‖_p · ‖y‖_q

Equality holds if and only if (|ξ_i|/‖x‖_p)^{1/q} = (|η_i|/‖y‖_q)^{1/p} for each i.

Theorem 2. Cauchy–Schwarz Inequality. If p = 2 and q = 2 and if x = {ξ₁, ξ₂, ···} ∈ l², y = {η₁, η₂, ···} ∈ l², then

∑_{i=1}^∞ |ξ_i η_i| ≤ ‖x‖₂ · ‖y‖₂

Theorem 3. Minkowski Inequality. If x and y are in l^p, 1 ≤ p ≤ ∞, then

‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p

Equality holds if and only if k₁x = k₂y for some positive constants k₁ and k₂.
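These three inequalities are easy to check numerically on finite sequences; a minimal NumPy sketch with arbitrary random vectors and an arbitrary conjugate pair p, q:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=50), rng.normal(size=50)

def norm(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

p, q = 3.0, 1.5                       # conjugate exponents: 1/3 + 1/1.5 = 1
assert abs(1 / p + 1 / q - 1.0) < 1e-12

# Hoelder, Cauchy-Schwarz (p = q = 2), and Minkowski:
assert np.sum(np.abs(x * y)) <= norm(x, p) * norm(y, q) + 1e-12
assert np.sum(np.abs(x * y)) <= norm(x, 2) * norm(y, 2) + 1e-12
assert norm(x + y, p) <= norm(x, p) + norm(y, p) + 1e-12
print("all three inequalities hold on this sample")
```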

Theorem 4. Divergence Theorem. Let ϕ be a C¹ vector field defined on Ω, a region in the plane with boundary ∂Ω. Then

∫_Ω div ϕ dx = ∫_{∂Ω} ⟨ϕ, N⟩ dl,

where N is the outward normal to Ω and div(ϕ) = trace(Dϕ).

Definition 13. Sobolev space W^{k,p}(R) for 1 ≤ p ≤ ∞ in the one-dimensional case is defined as the subset of functions f in L^p(R) such that f and its weak derivatives¹ up to order k have a finite L^p norm,

‖f‖_{k,p} = ( ∑_{i=0}^k ∫ |f^{(i)}(t)|^p dt )^{1/p}.

Remark 9. In the one-dimensional problem it is enough to assume that the (k−1)-th derivative f (k−1) is differentiable almost everywhere and is equal almost everywhere to the Lebesgue integral of its derivative (this excludes irrelevant examples such as Cantor’s function).

Example 3. Sobolev spaces with p = 2 are especially important because of their connection with Fourier series and because they form a Hilbert space. A special notation has arisen to cover this case, since the space is a Hilbert space: H^k = W^{k,2}.

Thereby, the frequently occurring H¹ denotes the Sobolev space constituted by the functions f such that f and its first derivative have a finite L² norm.

Example 4. The space Hk can be defined naturally in terms of Fourier series whose coefficients decay sufficiently rapidly, namely,

H^k(T) = { f ∈ L²(T) : ∑_{n=−∞}^∞ (1 + n² + n⁴ + ··· + n^{2k}) |f̂(n)|² < ∞ }

where f̂ is the Fourier series of f, and T denotes the 1-torus. As above, one can use the equivalent norm

‖f‖²_{k,2} = ∑_{n=−∞}^∞ (1 + |n|^{2k}) |f̂(n)|².

Remark 10. Overview of several spaces:

Spaces       | Elements                                          | Operations                                            | Equivalents
Vector       | vectors                                           | x + y, αx                                             | —
Pre-Hilbert  | vectors                                           | x + y, αx, ⟨·, ·⟩                                     | vector space + ⟨·, ·⟩
Banach       | vectors                                           | x + y, αx, ‖·‖                                        | vector space (complete) + ‖·‖
Hilbert      | vectors                                           | x + y, αx, ⟨·, ·⟩, ‖·‖ (‖·‖ defined by ⟨·, ·⟩)        | vector space (complete) + ⟨·, ·⟩ + ‖·‖
Lebesgue     | functions (vectors) s.t. ‖f‖_p < ∞, p ∈ [1, ∞)    | x + y, αx, ‖·‖                                        | Banach space
Sobolev (H¹) | functions (vectors) s.t. f′ has a finite L² norm  | x + y, αx, ‖·‖                                        | Banach space

¹A weak derivative is a generalization of the concept of the derivative of a function (the strong derivative) to functions that are not assumed differentiable, but only assumed to lie in an L^p space.

Definition 14. Euler–Lagrange equation is defined as

∂F/∂y − d/dx ( ∂F/∂y′ ) = 0,

which is used to find a y = f(x) making the integral

L(y) = ∫_{x₁}^{x₂} F(x, y, y′) dx

stationary.

Example 5. Suppose A and B are two points in a Euclidean space. We want to find the geodesic between A and B.

Solution. We would like to minimize

L = ∫_A^B 1 ds,   where ds = √((dx)² + (dy)²) = √(1 + (y′)²) dx,

which can be written in another form

L = ∫_A^B √(1 + (y′)²) dx.

We need to find a y(x) which minimizes L, where

F = √(1 + (y′)²).

Substituting it into the Euler–Lagrange equation, we have

∂F/∂y − d/dx ( ∂F/∂y′ ) = 0
− d/dx ( y′ / √(1 + (y′)²) ) = 0
y′ / √(1 + (y′)²) = c
(y′)² = c² / (1 − c²)
y′ = c₁

y = c₁x + c₂

Theorem 5. The property of a Green’s function can be exploited to solve differential equations of the form Lu(x) = f(x), where L and f(x) are given. If the kernel of L is non-trivial, then the Green’s function is not unique. A Green’s function, G(x, s) of a linear differential operator L = L(x) at point s, is any solution of

LG(x, s) = δ(s − x),

where δ is the Dirac delta function. If such a function G can be found for the operator L, then we obtain

∫ LG(x, s)f(s)ds = ∫ δ(s − x)f(s)ds = f(x)

Because the operator L = L(x) is linear and acts only on the variable x, one may take the operator L outside of the integration, yielding

L ( ∫ G(x, s)f(s)ds ) = f(x),

which means that

u(x) = ∫ G(x, s)f(s)ds

is the solution to Lu(x) = f(x).
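As a concrete illustration (a sketch, not taken from the notes: it assumes the specific operator L = −d²/dx² on [0, 1] with zero boundary conditions, whose Green's function is the standard G(x, s) = min(x, s)(1 − max(x, s))), one can verify numerically that u(x) = ∫ G(x, s) f(s) ds solves Lu = f:

```python
import numpy as np

# Assumed example: L = -d^2/dx^2 on [0, 1] with u(0) = u(1) = 0.
# Its Green's function is G(x, s) = min(x, s) * (1 - max(x, s)).
n = 2001
s = np.linspace(0.0, 1.0, n)
f = np.ones_like(s)                        # right-hand side f(x) = 1

def G(x, s):
    return np.minimum(x, s) * (1.0 - np.maximum(x, s))

# u(x) = int G(x, s) f(s) ds, evaluated on the grid.
u = np.array([np.trapz(G(x, s) * f, s) for x in s])

# For f = 1, the exact solution of -u'' = 1, u(0) = u(1) = 0 is x(1-x)/2.
assert np.allclose(u, s * (1.0 - s) / 2.0, atol=1e-6)
print("u matches x(1-x)/2")
```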

Definition 15. Linear operators L are the operators such that for every pair of functions f and g and scalar t, it has

L(f + g) = L(f) + L(g),

L(tf) = tL(f).

Example 6. The following operators are all linear:

Operator     | L(f + g) = L(f) + L(g)            | L(tf) = tL(f)
Differential | d(f + g)/dx = df/dx + dg/dx       | d(tf)/dx = t df/dx
Integral     | ∫(f + g)dx = ∫f dx + ∫g dx        | ∫(tf)dx = t ∫f dx
Gradient     | ∇(f + g) = ∇f + ∇g                | ∇(tf) = t∇f
Fourier      | F(f + g) = Ff + Fg                | F(tf) = tFf
Laplacian    | ∆(f + g) = ∆f + ∆g                | ∆(tf) = t∆f
Expectation  | E(f + g) = E(f) + E(g)            | E(tf) = tE(f)

Definition 16. [13] Kernel k : X × X → R is a function, where H is a Hilbert space and X is a non-empty set, such that there exists a function φ : X → H (the feature map) satisfying, for any x, x′ ∈ X,

k(x, x′) = ⟨φ(x), φ(x′)⟩_H

Remark 11. We imposed almost no conditions on X : we don’t even require there to be an inner product defined on the elements of X . Defining inner product on H is enough. For example, let x, x0 represent two different books, we can’t take an inner product between books, but we can take an inner product between the feature maps φ(x), φ(x0) corresponding to x, x0.

Remark 12. Kernel gives a way to compute inner products in some feature space without even knowing what this space is and what is φ. In most cases, we care more about the inner product than the feature mapping itself. We never need the coordinates of the data in the feature space. One example is the Gaussian kernel k(x, y) = exp(−γkx−yk2). If we Taylor-expand this function, we’ll see that it corresponds to an infinite-dimensional codomain of φ.
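A small sketch (NumPy, with made-up 2-D points and an arbitrary bandwidth γ) showing that the Gaussian kernel matrix behaves like a Gram matrix of implicit feature vectors: it is symmetric positive semi-definite even though φ is never written down.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))     # 20 made-up points in R^2
gamma = 0.5

# k(x, y) = exp(-gamma * ||x - y||^2), evaluated for all pairs.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-gamma * sq_dists)

eigvals = np.linalg.eigvalsh(K)
print(eigvals.min())             # >= 0 up to round-off: K is positive semi-definite
assert eigvals.min() > -1e-10
```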

Figure 5: φ(x) = [x₁, x₂, x₁x₂]ᵀ example of a kernel: on the left, the points are plotted in the original space; on the right, the points are mapped into a higher-dimensional feature space by φ.

Definition 17. Reproducing kernel k : X × X → R is a function, where H is a Hilbert space and X is a non-empty set, if k satisfies

1. ∀x ∈ X, k(·, x) ∈ H (the feature map of every point is in the feature space)

2. ∀x ∈ X, ∀f ∈ H, f(x) = ⟨f, k(·, x)⟩_H (reproducing property)

3. k(x, y) = ⟨k(·, x), k(·, y)⟩_H = ⟨φ(x), φ(y)⟩_H

Remark 13. The feature map is not unique, only the kernel is. RKHS functions can be written as linear combinations of the feature maps k(·, x), which we can regard as "basis functions":

f(·) = ∑_{i=1}^m α_i φ_i(·) = ∑_{i=1}^m α_i k(·, x_i)

f(x) = ⟨f(·), k(·, x)⟩_H
     = ⟨ ∑_{i=1}^m α_i k(·, x_i), k(·, x) ⟩_H
     = ∑_{i=1}^m α_i ⟨k(·, x_i), k(·, x)⟩_H
     = ∑_{i=1}^m α_i k(x, x_i)

Example 7. We define a feature map φ : R² → R³

φ(x) = [x₁, x₂, x₁x₂]ᵀ

For the reproducing property, we define an RKHS function f : R² → R

f(x) = ∑_{l=1}^∞ f_l φ_l(x)    (cf. Remark 13)
     = a x₁ + b x₂ + c x₁x₂

f(·) = [a, b, c]ᵀ

where f(·) or f stands for the function, while f(x) means the value of the function f at x. With this, we can write

f(x) = f(·)ᵀ φ(x)
     = ⟨f(·), φ(x)⟩_H

The reproducing property tells us that the evaluation of f at x can be written as an inner product in feature space.
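A direct numerical check of Example 7 (a minimal sketch; the coefficients a, b, c and the point x are arbitrary): evaluating f at x equals taking the inner product of the coefficient vector with φ(x).

```python
import numpy as np

def phi(x):
    """Feature map phi(x) = [x1, x2, x1*x2]^T from Example 7."""
    return np.array([x[0], x[1], x[0] * x[1]])

a, b, c = 2.0, -1.0, 0.5        # arbitrary coefficients defining f
f_vec = np.array([a, b, c])     # f represented as a vector in feature space

x = np.array([1.5, 3.0])
f_of_x = a * x[0] + b * x[1] + c * x[0] * x[1]   # f evaluated directly

# Reproducing property: f(x) = <f, phi(x)>
assert np.isclose(f_of_x, f_vec @ phi(x))
print(f_of_x)
```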

Definition 18. Primal-dual method. Assume the primal problem is as below:

maximize   z(x)
subject to G(x) ≤ θ, x ∈ Ω

which is equivalent to

minimize   w(y) = sup_{x∈Ω} { z(x) + ⟨G(x), y⟩ }
subject to y ≥ θ

More specifically,

Primal

maximize   z = ∑_{j=1}^n c_j x_j
subject to ∑_{j=1}^n a_{ij} x_j ≤ b_i   (i = 1, 2, ···, m)

x_j ≥ 0   (j = 1, 2, ···, n)

Dual

minimize   w = ∑_{i=1}^m b_i y_i
subject to ∑_{i=1}^m a_{ij} y_i ≥ c_j   (j = 1, 2, ···, n)

y_i ≥ 0   (i = 1, 2, ···, m)

Remark 14. The difference between supremum (resp. infimum) and maximum (resp. minimum) is that for bounded, infinite sets, the maximum (resp. minimum) may not exist, but the supremum (resp. infimum) always does.

Figure 6: Supremum and Infimum

Example 8.

Primal

max z = 30x1 + 100x2

s.t. x1 + x2 ≤ 7

4x1 + 10x2 ≤ 40

x1 ≥ 3

x1 ≥ 0

x2 ≥ 0

Multiply constraint i by a factor y_i. Choose the sign of y_i such that all inequalities are ≤ after multiplication:

max z = 30x1 + 100x2

s.t. x1 + x2 ≤ 7 ×y1

4x1 + 10x2 ≤ 40 ×y2

x1 ≥ 3 ×(−y3)

x1 ≥ 0 ×(−y4)

x2 ≥ 0 ×(−y5)

Add up all the obtained inequalities into a resultant inequality:

(y1 + 4y2 − y3 − y4)x1 + (y1 + 10y2 − y5)x2 ≤ 7y1 + 40y2 − 3y3

Make the coefficients of the resultant constraint match the objective function. Then, the right hand side of the resultant constraints is an upper bound of z∗:

Dual

min w = 7y1 + 40y2 − 3y3

s.t. y1 + 4y2 − y3 − y4 = 30

y1 + 10y2 − y5 = 100

Remark 15. Solving the max problem is equivalent to finding the minimum of the upper bound; that is why, in the example above, we must choose the sign of each factor so that all inequalities are ≤ after multiplication. Likewise, solving the min problem is equivalent to finding the maximum of the lower bound.
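A numerical check of Example 8 (a sketch using scipy.optimize.linprog; the slack variables y₄, y₅ from the derivation are folded back into inequality constraints): the primal and the dual attain the same optimal value, illustrating strong duality.

```python
import numpy as np
from scipy.optimize import linprog

# Primal: max 30*x1 + 100*x2  s.t.  x1 + x2 <= 7, 4*x1 + 10*x2 <= 40, x1 >= 3, x >= 0.
# linprog minimizes, so the objective is negated.
primal = linprog(c=[-30, -100],
                 A_ub=[[1, 1], [4, 10], [-1, 0]],
                 b_ub=[7, 40, -3],
                 bounds=[(0, None), (0, None)])

# Dual: min 7*y1 + 40*y2 - 3*y3  s.t.  y1 + 4*y2 - y3 >= 30, y1 + 10*y2 >= 100, y >= 0.
dual = linprog(c=[7, 40, -3],
               A_ub=[[-1, -4, 1], [-1, -10, 0]],
               b_ub=[-30, -100],
               bounds=[(0, None)] * 3)

print(primal.x, -primal.fun)    # optimal x = (3, 2.8), z* = 370
print(dual.x, dual.fun)         # w* = 370 as well
assert np.isclose(-primal.fun, dual.fun)
```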

Definition 19. Thin plate splines are a spline-based technique for data interpolation and smoothing, which has the natural representation in terms of radial basis functions

f(x) = ∑_{i=1}^K w_i ϕ(‖x − c_i‖),

where {w_i} are the mapping coefficients, {c_i} are the control points, and the corresponding ϕ for TPS is ϕ(r) = r² log(r). For the 2D case, the energy function is defined as

E_{TPS,smooth}(f) = ∑_{i=1}^K ‖y_i − f(x_i)‖² + λ ∬ [ (∂²f/∂x₁²)² + 2(∂²f/∂x₁∂x₂)² + (∂²f/∂x₂²)² ] dx₁dx₂,

where the tuning parameter λ controls the rigidity of the deformation, balancing the aforementioned criterion with the measure of goodness of fit. If the interpolant passes through the data points exactly, then the first term of the energy function above is zero. For this variational problem, it can be shown that there exists a unique minimizer f.

2 Differential Geometry for Images

Overview. A Riemannian manifold is a space that locally resembles Euclidean space and is equipped with a Riemannian metric, which defines an inner product on the tangent space at each point and takes the form of a metric tensor. The tangent space is a vector (Euclidean) space associated with each point on the manifold. The shortest path between two points on a Riemannian manifold is called a geodesic, and its length gives the distance between the points; this distance makes the manifold a metric space.

Definition 20. Isomorphism [2, 3, 4] is a function between two structures of the same type that can be reversed by an inverse function.

Example 9. In various areas of mathematics, isomorphisms have received specialized names, depending on the type of structure under consideration.

• An isometry (Def. 52) is an isomorphism of metric spaces.

• A homeomorphism (Def. 21) is an isomorphism of topological spaces.

• A diffeomorphism (Def. 23) is an isomorphism of spaces equipped with a differential structure, typically differentiable manifolds.

Definition 21. Homeomorphism f : M → N is a bijective function between two topological spaces M,N, such that f and f −1 are both continuous function.

Definition 22. Manifold is a Hausdorff space M with a countable basis such that for each point p ∈ M there is a neighborhood u of p that is homeomorphic to Rn for some integer n. In other words, locally, a manifold is like a Euclidean space.

Definition 23. Diffeomorphism f : M → N is a bijective function between two smooth manifolds M,N, such that f and f −1 are both smooth functions.

Remark 16. For two manifolds M and N, a smooth mapping φ : M → N induces a linear mapping of the tangent spaces φ∗ : TpM → Tp◦φN called the differential of φ.

Definition 24. Tangent space TpM is a vector space attached to each point on a manifold M, which is equivalent to the Euclidean space. Intuitively, it’s thought of as the linear space that best approximates M in a neighborhood of point p. Vectors in this space are called tangent vectors.

Remark 17. Tangent space means for each and every point p in Rn, we introduce a new coordinate system where all the vectors originated at p will reside in.

Example 10. The special orthogonal group SO(3) is presented as

SO(3) = {R ∈ R^{3×3} | RᵀR = I, det(R) = 1}

In order to derive the form of elements in its Lie algebra, so(3), take a generic curve R(t) through the identity in SO(3) with derivative X ∈ so(3) at t = 0 and consider the derivative of the constraint at t = 0. The product rule yields

d/dt [ R(t)ᵀR(t) ] |_{t=0} = Xᵀ + X = 0

This implies that any element of so(3) is a skew-symmetric matrix.

Remark 18. The cross product of vectors a and b can be written as below:

a × b = [a]_× b = [  0   −a₃   a₂ ] [ b₁ ]
                  [  a₃    0  −a₁ ] [ b₂ ] ,
                  [ −a₂   a₁    0 ] [ b₃ ]

namely a skew-symmetric matrix times the vector b.
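A quick NumPy check (a sketch with arbitrary vectors) that the skew-symmetric matrix [a]_× reproduces the cross product and indeed lies in so(3):

```python
import numpy as np

def skew(a):
    """Skew-symmetric matrix [a]_x such that [a]_x @ b = a x b."""
    return np.array([[0.0,  -a[2],  a[1]],
                     [a[2],   0.0, -a[0]],
                     [-a[1],  a[0],  0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([-2.0, 0.5, 4.0])

assert np.allclose(skew(a) @ b, np.cross(a, b))
assert np.allclose(skew(a), -skew(a).T)    # skew-symmetric, i.e. an element of so(3)
```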

Example 11. The group of symmetric positive definite matrices is presented as

SPD(n) = {X ∈ R^{n×n} | X = Xᵀ, X ≻ 0}.

The tangent space TASPD(n) at A, is the space of symmetric matrices [6].

Definition 25. Tangent bundle TM consists of the tangent space TpM at all points p in M.

TM = {(p, v)|p ∈ M, v ∈ TpM}

Since a tangent space TpM is the set of all tangent vectors to M at p, the tangent bundle is the collection of all tangent vectors, along with the information of the point to which they are tangent.

Definition 26. Hessian matrix of a differentiable, multivariable function f : R^n → R at p is defined by

Hf_p = [ ∂₁²f_p     ∂₁∂₂f_p    ···  ∂₁∂_nf_p ]
       [ ∂₂∂₁f_p    ∂₂²f_p     ···  ∂₂∂_nf_p ]
       [    ⋮          ⋮        ⋱      ⋮     ]
       [ ∂_n∂₁f_p   ∂_n∂₂f_p   ···  ∂_n²f_p  ]

Remark 19. The Hessian matrix of a function f is the Jacobian matrix of the gradient of the function f; that is, H(f) = D(∇f).

Proof.

∇f(x¹, ···, xⁿ) = [ ∂₁f(x¹, ···, xⁿ), ···, ∂_nf(x¹, ···, xⁿ) ]ᵀ

D(∇f) = [ ∂₁∂₁f    ∂₂∂₁f    ···  ∂_n∂₁f  ]
        [ ∂₁∂₂f    ∂₂∂₂f    ···  ∂_n∂₂f  ]  = Hf
        [   ⋮        ⋮       ⋱      ⋮    ]
        [ ∂₁∂_nf   ∂₂∂_nf   ···  ∂_n∂_nf ]

Remark 20. If a multivariable function f : R^n → R has ∇f(x) = 0 and Hf(x) positive (resp. negative) definite, then x is an isolated local minimum (resp. maximum).

Remark 21. Relationship between convexity and positive-definiteness:

f is convex ⇔ ∀p, Hfp is positive semi–definite

f is strictly convex ⇔ ∀p, Hfp is positive definite

Definition 27. Laplacian of a differentiable, multivariable function f : Rn → R at p is defined by

∆f_p = tr(Hf_p) = ∑_{i=1}^n ∂_i²f_p.

Definition 28. Jacobian matrix (differential) of a differentiable, vector-valued function f : R^n → R^m at p is defined by [9]

Df_p = [ ∂₁f¹_p    ∂₂f¹_p    ···  ∂_nf¹_p  ]   [ (∇f¹_p)ᵀ  ]
       [ ∂₁f²_p    ∂₂f²_p    ···  ∂_nf²_p  ] = [ (∇f²_p)ᵀ  ]
       [    ⋮         ⋮       ⋱      ⋮     ]   [     ⋮     ]
       [ ∂₁f^m_p   ∂₂f^m_p   ···  ∂_nf^m_p ]   [ (∇f^m_p)ᵀ ]

where f(x¹, ···, xⁿ) = [ f¹(x¹, ···, xⁿ), ···, f^m(x¹, ···, xⁿ) ]ᵀ.

Definition 29. Divergence of a differentiable, vector-valued function f : Rn → Rm at p is defined by

∇ · f_p = tr(Df_p) = ∑_{i=1}^n ∂_i f^i_p.

Remark 22. The divergence of ∇f, where f : R^n → R, is the Laplacian of f.
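These relations (Hessian = Jacobian of the gradient, Laplacian = trace of the Hessian, divergence = trace of the Jacobian) can be verified symbolically; a minimal SymPy sketch with arbitrary test functions:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * sp.sin(y) + x * y            # arbitrary scalar function f : R^2 -> R

grad_f = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])
H = sp.hessian(f, (x, y))               # Hessian matrix
J_of_grad = grad_f.jacobian([x, y])     # Jacobian of the gradient

assert sp.simplify(H - J_of_grad) == sp.zeros(2, 2)                      # H(f) = D(grad f)
assert sp.simplify(H.trace() - (sp.diff(f, x, 2) + sp.diff(f, y, 2))) == 0  # Laplacian = tr(H)

# Divergence of a vector field as the trace of its Jacobian:
F = sp.Matrix([x**2 * y, sp.cos(x) + y])
div_F = F.jacobian([x, y]).trace()
assert sp.simplify(div_F - (sp.diff(F[0], x) + sp.diff(F[1], y))) == 0
```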

Remark 23. As we all know, a matrix can be thought of as a linear transformation. Here as well, we can think of Df_p as a linear function Df_p : T_pR^n → T_{f(p)}R^m. In other words, Df_p maps a vector in the tangent space at the source point p to a vector in the tangent space at the target point f(p). More formally, we have

d/dt f(γ(t)) |_{t=0} = Df_p · v_p,   where v_p ∈ T_pR^n

Figure 7: Jacobian

Remark 24. The determinant of the Jacobian at a given point gives important information about the behavior of f near that point.

• If the Jacobian determinant at p is non-zero, then f is invertible near a point p ∈ Rn.

• If the Jacobian determinant at p is positive (resp. negative), then f preserves (resp. reverses) orientation near p.

• The absolute value of the Jacobian determinant at p gives us the factor by which the function f expands or shrinks volumes near p.

Example 12. For a vector field f : R² → R²:

Df = [ ∂f¹/∂x₁   ∂f¹/∂x₂ ]
     [ ∂f²/∂x₁   ∂f²/∂x₂ ]

Definition 30. Integration by substitution (change of variables) is a method for evaluating integrals. For the one-dimensional case:

∫_b^a f(ϕ(x)) ϕ′(x) dx = ∫_{ϕ(b)}^{ϕ(a)} f(ϕ(x)) dϕ(x),   with dy = y′(x) dx = (dy/dx) dx.

For the n-dimensional case:

dy₁dy₂ ··· dy_n = det(Dφ)(x₁, x₂, ···, x_n) · dx₁dx₂ ··· dx_n

                = det [ dy₁/dx₁    dy₁/dx₂    ···  dy₁/dx_n  ]
                      [ dy₂/dx₁    dy₂/dx₂    ···  dy₂/dx_n  ]  dx₁dx₂ ··· dx_n
                      [    ⋮          ⋮        ⋱       ⋮     ]
                      [ dy_n/dx₁   dy_n/dx₂   ···  dy_n/dx_n ]

Example 13. Integrate ∫_b^a (2x³ + 1)⁷ x² dx.

Solution. Making

y = ϕ(x) = 2x³ + 1,   dy/dx = ϕ′(x) = 6x²

Therefore we have

∫_b^a (2x³ + 1)⁷ x² dx = (1/6) ∫_b^a f(ϕ(x)) ϕ′(x) dx
                        = (1/6) ∫_{ϕ(b)}^{ϕ(a)} f(ϕ(x)) dϕ(x)
                        = (1/6) ∫_{ϕ(b)}^{ϕ(a)} f(y) dy
                        = (1/6) ∫_{ϕ(b)}^{ϕ(a)} y⁷ dy
                        = (1/48) [y⁸]_{ϕ(b)}^{ϕ(a)}
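A quick numerical confirmation (a sketch using scipy; concrete bounds a = 1, b = 0 are chosen arbitrarily since the example leaves them symbolic):

```python
from scipy.integrate import quad

a, b = 1.0, 0.0                            # arbitrary concrete bounds for the check
phi = lambda x: 2 * x**3 + 1

# Left-hand side: integral of (2x^3 + 1)^7 * x^2 from b to a.
lhs, _ = quad(lambda x: phi(x)**7 * x**2, b, a)

# Right-hand side from the substitution: (1/48) * [y^8] from phi(b) to phi(a).
rhs = (phi(a)**8 - phi(b)**8) / 48.0

print(lhs, rhs)                            # both equal (3^8 - 1)/48 = 136.666...
assert abs(lhs - rhs) < 1e-6
```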

Definition 31. Vector field is a function on a manifold M that smoothly assigns to each point p ∈ M a tangent vector v ∈ T_pM.

Figure 8: Tangent Space

Definition 32. Riemannian metric on a differential manifold M is a smooth function that assigns to each point p of M an inner product h·, ·i on the tangent space TpM.

Remark 25. Assume A is the metric at point p ∈ M, and v, λ are a unit eigenvector of A and its corresponding eigenvalue.

Av = λv

‖v‖² = vᵀAv = vᵀλv = λvᵀv = λ

The derivation above tells us that the squared length, under the metric A, of the unit vector in the direction of an eigenvector equals the corresponding eigenvalue. A unit vector pointing in another direction may be assigned a different length.

Definition 33. Metric tensor [17] is a function which tells how to compute the distance between any two points in a given space.

Example 14. Typically, we calculate the arc length as below:

arc length = ∫ ‖γ̇(t)‖ dt

Figure 9: Base vectors on tangent space

By introducing the position vector R⃗ and expanding it, we have

‖dR⃗/dλ‖² = dR⃗/dλ · dR⃗/dλ
 = ( ∂R⃗/∂u · du/dλ + ∂R⃗/∂v · dv/dλ ) · ( ∂R⃗/∂u · du/dλ + ∂R⃗/∂v · dv/dλ )
 = (du/dλ)² ( ∂R⃗/∂u · ∂R⃗/∂u ) + (du/dλ)(dv/dλ) ( ∂R⃗/∂u · ∂R⃗/∂v )
 + (dv/dλ)(du/dλ) ( ∂R⃗/∂v · ∂R⃗/∂u ) + (dv/dλ)² ( ∂R⃗/∂v · ∂R⃗/∂v )
 = [du/dλ  dv/dλ] [ ∂R⃗/∂u·∂R⃗/∂u   ∂R⃗/∂u·∂R⃗/∂v ] [du/dλ]      (e⃗_u = ∂R⃗/∂u, e⃗_v = ∂R⃗/∂v)
                  [ ∂R⃗/∂v·∂R⃗/∂u   ∂R⃗/∂v·∂R⃗/∂v ] [dv/dλ]
 = [du/dλ  dv/dλ] [ e⃗_u·e⃗_u   e⃗_u·e⃗_v ] [du/dλ]
                  [ e⃗_v·e⃗_u   e⃗_v·e⃗_v ] [dv/dλ]

where the 2 × 2 matrix is the metric tensor.

Remark 26. Actually, without seeing the manifold extrinsically, it is hard to derive the metric, as we do not know the position vector. Provided that the metric is already given, we can calculate what we want intrinsically.

The parametric equation of a sphere is shown as below:

R~ = [X,Y,Z]T

where X = cos(v) sin(u) = cos(λ) sin(λ)

Y = sin(v) sin(u) = sin(λ) sin(λ)

Z = cos(u) = cos(λ)

when u = λ, v = λ.

After expanding the base vectors extrinsically, we can express them as below:

e⃗_u = dR⃗/du = ∂R⃗/∂X · ∂X/∂u + ∂R⃗/∂Y · ∂Y/∂u + ∂R⃗/∂Z · ∂Z/∂u
     = cos(v) cos(u) ∂R⃗/∂X + sin(v) cos(u) ∂R⃗/∂Y − sin(u) ∂R⃗/∂Z
     = cos(v) cos(u) e⃗_X + sin(v) cos(u) e⃗_Y − sin(u) e⃗_Z

e⃗_v = dR⃗/dv = ∂R⃗/∂X · ∂X/∂v + ∂R⃗/∂Y · ∂Y/∂v + ∂R⃗/∂Z · ∂Z/∂v
     = −sin(v) sin(u) ∂R⃗/∂X + cos(v) sin(u) ∂R⃗/∂Y
     = −sin(v) sin(u) e⃗_X + cos(v) sin(u) e⃗_Y

Since e⃗_X, e⃗_Y, e⃗_Z are perpendicular to each other, the metric tensor is

[ e⃗_u·e⃗_u   e⃗_u·e⃗_v ] = [ 1      0     ]
[ e⃗_v·e⃗_u   e⃗_v·e⃗_v ]   [ 0   sin²(u) ]

Substituting the metric tensor back into the expression for the norm of the velocity, we get

‖dR⃗/dλ‖² = [du/dλ  dv/dλ] [ 1      0     ] [du/dλ] = (du/dλ)² + sin²(u) (dv/dλ)²
                           [ 0   sin²(u) ] [dv/dλ]

• For u = π/4, v = λ:

‖dR⃗/dλ‖² = (du/dλ)² + sin²(u) (dv/dλ)² = 0² + sin²(π/4) · 1² = 1/2

The functional of arc length is

arc length = ∫ ‖dR⃗/dλ‖ dλ = ∫ (√2/2) dλ = (√2/2) λ

Figure 10: u = π/4, v = λ

• For u = π/2, v = λ:

‖dR⃗/dλ‖² = (du/dλ)² + sin²(u) (dv/dλ)² = 0² + sin²(π/2) · 1² = 1

The functional of arc length is

arc length = ∫ ‖dR⃗/dλ‖ dλ = ∫ 1 dλ = λ

Figure 11: u = π/2, v = λ
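Both arc-length computations can be reproduced numerically without using the metric, by differentiating the embedded curves directly (a NumPy sketch; the parameter range λ ∈ [0, π/2] follows the figures):

```python
import numpy as np

def circle_of_latitude(u0, lam):
    """Embedded curve on the unit sphere with u = u0 fixed and v = lambda."""
    return np.stack([np.cos(lam) * np.sin(u0),
                     np.sin(lam) * np.sin(u0),
                     np.cos(u0) * np.ones_like(lam)], axis=-1)

lam = np.linspace(0.0, np.pi / 2, 20001)
for u0, speed in [(np.pi / 4, np.sqrt(2) / 2), (np.pi / 2, 1.0)]:
    pts = circle_of_latitude(u0, lam)
    seglen = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # polygonal approximation
    print(u0, seglen.sum(), speed * np.pi / 2)               # length = speed * (pi/2)
```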

Remark 27. So the figure below is a good illustration of the role metric tensor plays:

Figure 12: What we usually see in expression convenience vs. What actually it is

The left sub-figure is what we usually see in practice, which gives us the illusion that this is a Euclidean space, only distorted. However, the actual shape of the manifold is the right sub-figure, which in most cases can be more arbitrary than this sphere. So, if we want to measure the distance between two points, what we need is simply the metric tensor at each point. With the metric tensor, we can compute the inner product of the velocity vector; integrating the norm of the velocity over t, we obtain the distance we want. In other words, the metric tensor is the tool that describes the shape of a manifold.

Remark 28. As Figure 13 shows, a longer axis stands for a higher time cost, while a shorter axis represents a lower time cost, namely a shorter distance. And Figure 14 illustrates this property well: the closer to the poles, the lower the time cost.

Figure 13: Visualization of metric tensor

Figure 14: Visualization of sphere metric field

Definition 34. Riemannian manifold is a differentiable(smooth) manifold equipped with a Rieman- nian metric.

Example 15. The Riemannian metric can be equated with a smoothly varying positive-definite symmetric matrix g, called the metric tensor, defined at each point. For two vectors v, w ∈ TpM, given local coordinates (x1, x2, ··· , xn) in a neighborhood of p, the entry in g (n × n matrix) can be expressed like below

g_{ij} = ⟨E_i, E_j⟩,

where E_i = ∂/∂x_i are the coordinate basis vectors at p. With this definition, we can compute the inner product ⟨v, w⟩ as vᵀgw. Also, for a vector v, we can compute the length of the vector as ⟨v, v⟩^{1/2}, which is the L² norm. Sometimes, people utilize the inverse of the diffusion tensor, D⁻¹, to define a local cost function ⟨v, w⟩ = vᵀD⁻¹w, where v, w ∈ T_pM. In this case, since the inverses of the diffusion tensors are symmetric positive-definite, they also form a Riemannian metric, and a DTI is thus wrapped into a Riemannian manifold.

Definition 35. Runge–Kutta method (RK4). Let an initial value problem be specified as follows. The only things we know are the slope of the tangent to the curve y at any point,

dy/dt = f(t, y),

and the initial condition (t₀, y₀), namely

y(t₀) = y₀.

We want to approximate the original function y, which equals ∫ f(t, y) dt. For the (n+1)-th iteration, we have the approximation

y_{n+1} = y_n + (h/6)(k₁ + 2k₂ + 2k₃ + k₄),

t_{n+1} = t_n + h, where h is the step size and

k₁ = f(t_n, y_n)                      (the slope at the beginning of the interval)
k₂ = f(t_n + h/2, y_n + h k₁/2)       (the slope at the midpoint of the interval, using y_n and k₁)
k₃ = f(t_n + h/2, y_n + h k₂/2)       (the slope at the midpoint of the interval, using y_n and k₂)
k₄ = f(t_n + h, y_n + h k₃)           (the slope at the end of the interval, using y_n and k₃)

Remark 29. When k₄ = k₃ = k₂ = k₁ = f(t_n, y_n), it is the simplest Runge–Kutta method, namely the Euler method.

Remark 30. In an ideal world, we can integrate a function by integration rules, like ∫ eˣ dx = eˣ + c. However, in practice we can only use a numerical procedure to integrate an arbitrary function. That is why we call the result of the RK4 or Euler method an integral curve: it is simply the result of the integration.
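A minimal RK4 implementation following the update rule above (a sketch; the test problem dy/dt = y with y(0) = 1 is an arbitrary choice whose exact solution is e^t):

```python
import numpy as np

def rk4_step(f, t, y, h):
    """One classical Runge-Kutta (RK4) step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

f = lambda t, y: y            # test ODE: dy/dt = y, so y(t) = e^t
t, y, h = 0.0, 1.0, 0.1
while t < 1.0 - 1e-12:
    y = rk4_step(f, t, y, h)
    t += h

print(y, np.exp(1.0))         # RK4 result vs. exact value e
assert abs(y - np.exp(1.0)) < 1e-5
```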

Figure 15: RK4

Definition 36. Integral curve is a parametric curve that represents a specific solution to an ordinary differential equation or system of equations. If the differential equation is represented as a vector field or slope field, then the corresponding integral curves are tangent to the field at each point.

Figure 16: Integral curve

Definition 37. Geodesic between two points p, q ∈ M can be defined by the minimization of the energy functional

E(γ) = ∫₀¹ ‖γ̇(t)‖² dt

where γ : [0, 1] → M is a curve with fixed endpoints γ(0) = p, γ(1) = q. The inner product between two tangent vectors v, w ∈ T_xM is given by ⟨v, w⟩ = vᵀg(x)w, where g(x) is the Riemannian metric at point x.

Remark 31. [10] Intrinsic distance is measured as ‘an ant would walk along the surface’. Extrinsic distance is defined as the L2 norm between two points, basically ‘how a surface looks from the outside’. From the beginning and through the middle of the 18th century, differential geometry was studied from the extrinsic point of view: curves and surfaces were considered as lying in a Euclidean space of higher dimension. Starting with the work of Riemann, the intrinsic point of view was developed, in which one cannot speak of moving “outside” the geometric object because it is considered to be given in a free-standing way.

Example 16. [7] Let A and B be any two elements of Pn. Then there exists a unique geodesic [A, B] joining A and B. This geodesic has a parametrization

γ(t) = A^{1/2} ( A^{−1/2} B A^{−1/2} )^t A^{1/2},   t ∈ [0, 1].

Figure 17: Intrinsic vs. Extrinsic Distance
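A numerical sketch of the geodesic in Example 16 (assuming the exponent in the parametrization is the curve parameter t, and using arbitrary SPD matrices): the curve starts at A, ends at B, and stays symmetric positive-definite in between.

```python
import numpy as np

def spd_power(M, t):
    """M^t for a symmetric positive-definite matrix, via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**t) @ V.T

def spd_geodesic(A, B, t):
    """gamma(t) = A^(1/2) (A^(-1/2) B A^(-1/2))^t A^(1/2)."""
    A_half, A_nhalf = spd_power(A, 0.5), spd_power(A, -0.5)
    return A_half @ spd_power(A_nhalf @ B @ A_nhalf, t) @ A_half

A = np.array([[2.0, 0.3], [0.3, 1.0]])     # arbitrary SPD matrices
B = np.array([[1.0, -0.2], [-0.2, 3.0]])

assert np.allclose(spd_geodesic(A, B, 0.0), A)
assert np.allclose(spd_geodesic(A, B, 1.0), B)
mid = spd_geodesic(A, B, 0.5)
assert np.all(np.linalg.eigvalsh((mid + mid.T) / 2) > 0)   # the midpoint is SPD
print(mid)
```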

Definition 38. Geodesic Equation guarantees that the acceleration vector is normal to the surface:

d²u^k/dt² + Γ^k_{ij} · du^i/dt · du^j/dt = 0

Proof. In this section, all the computations are conducted in 2D situation.

Velocity vector:      dR⃗/dt = ∂R⃗/∂u · du/dt + ∂R⃗/∂v · dv/dt

Acceleration vector:  d²R⃗/dt² = d/dt ( ∂R⃗/∂u · du/dt + ∂R⃗/∂v · dv/dt )

Expanding the expression of the acceleration vector, we have

d²R⃗/dt² = d/dt ( ∂R⃗/∂u · du/dt + ∂R⃗/∂v · dv/dt )
 = ∂R⃗/∂u · d²u/dt² + du/dt · d/dt(∂R⃗/∂u) + ∂R⃗/∂v · d²v/dt² + dv/dt · d/dt(∂R⃗/∂v)
 = ∂R⃗/∂u · d²u/dt² + du/dt · ∂/∂u(dR⃗/dt) + ∂R⃗/∂v · d²v/dt² + dv/dt · ∂/∂v(dR⃗/dt)
 = ∂R⃗/∂u · d²u/dt² + du/dt · ∂/∂u( ∂R⃗/∂u · du/dt + ∂R⃗/∂v · dv/dt )
 + ∂R⃗/∂v · d²v/dt² + dv/dt · ∂/∂v( ∂R⃗/∂u · du/dt + ∂R⃗/∂v · dv/dt )
 = ∂R⃗/∂u · d²u/dt² + ∂²R⃗/∂u² · (du/dt)² + 2 ∂²R⃗/∂u∂v · (du/dt)(dv/dt)
 + ∂R⃗/∂v · d²v/dt² + ∂²R⃗/∂v² · (dv/dt)²

By using Einstein Notation, and making u1 = u, u2 = v, we can denote the acceleration vector as

d²R⃗/dt² = d²u^i/dt² · ∂R⃗/∂u^i + du^i/dt · du^j/dt · ∂²R⃗/∂u^i∂u^j    (1)

Assuming that ∂²R⃗/∂u^i∂u^j consists of three components, we can express it as

∂²R⃗/∂u^i∂u^j = Γ¹_{ij} ∂R⃗/∂u¹ + Γ²_{ij} ∂R⃗/∂u² + L_{ij} n⃗

where the Christoffel symbol Γ^k_{ij} gives us the tangential component of ∂²R⃗/∂u^i∂u^j and the second fundamental form L_{ij} gives us the normal component of ∂²R⃗/∂u^i∂u^j. By using Einstein notation, we can write this in a more concise form:

∂²R⃗/∂u^i∂u^j = Γ^k_{ij} ∂R⃗/∂u^k + L_{ij} n⃗    (2)

Figure 18: Meaning of L_{ij} and Γ^k_{ij}

Finally, by substituting Eq. (2) into Eq. (1), we obtain the acceleration vector as below:

d²R⃗/dt² = d²u^i/dt² · ∂R⃗/∂u^i + du^i/dt · du^j/dt · ∂²R⃗/∂u^i∂u^j
 = d²u^i/dt² · ∂R⃗/∂u^i + du^i/dt · du^j/dt · ( Γ^k_{ij} ∂R⃗/∂u^k + L_{ij} n⃗ )
 = ( d²u^k/dt² + Γ^k_{ij} · du^i/dt · du^j/dt ) ∂R⃗/∂u^k   [tangential part]
 + ( L_{ij} · du^i/dt · du^j/dt ) n⃗                        [normal part]

Requiring the acceleration vector to be normal to the surface gives

d²u^k/dt² + Γ^k_{ij} · du^i/dt · du^j/dt = 0,

which is the geodesic equation.

Derivation of Γ^k_{ij} and L_{ij}. As n⃗ is perpendicular to the tangent vectors, multiplying both sides of the equation above by ∂R⃗/∂u^l yields

∂²R⃗/∂u^i∂u^j · ∂R⃗/∂u^l = ( Γ^k_{ij} ∂R⃗/∂u^k + L_{ij} n⃗ ) · ∂R⃗/∂u^l = Γ^k_{ij} ∂R⃗/∂u^k · ∂R⃗/∂u^l    (3)

Since ∂R⃗/∂u^k · ∂R⃗/∂u^l = e⃗_k · e⃗_l = g_{kl}, we get

∂²R⃗/∂u^i∂u^j · ∂R⃗/∂u^l = Γ^k_{ij} g_{kl}.

By substituting the metric

[ g₁₁  g₁₂ ] = [ ∂R⃗/∂u¹·∂R⃗/∂u¹   ∂R⃗/∂u¹·∂R⃗/∂u² ]
[ g₂₁  g₂₂ ]   [ ∂R⃗/∂u²·∂R⃗/∂u¹   ∂R⃗/∂u²·∂R⃗/∂u² ]

into Eq. (3) and using the Kronecker delta cancellation rule g_{kl} · g^{lm} = δ^m_k, we have

Γ^k_{ij} g_{kl} g^{lm} = ∂²R⃗/∂u^i∂u^j · ∂R⃗/∂u^l · g^{lm}
Γ^k_{ij} δ^m_k = ∂²R⃗/∂u^i∂u^j · ∂R⃗/∂u^l · g^{lm}
Γ^m_{ij} = ( ∂e⃗_j/∂u^i · e⃗_l ) g^{lm}    (4)

Likewise, by multiplying both sides of Eq. (2) by n⃗, we obtain the extrinsic expression of the second fundamental form:

L_{ij} = ∂²R⃗/∂u^i∂u^j · n⃗

Definition 39. Given a vector space V and a functional f : V → R, with x, h ∈ V and α ∈ R, if the limit

δf(x) = lim_{α→0} (1/α) [ f(x + αh) − f(x) ]

exists, it is called the Gâteaux derivative of f at x with increment h. If the limit exists for all h ∈ V, the functional f is said to be Gâteaux differentiable at x.

Example 17. Given x ∈ R^n and f : R^n → R with continuous partial derivatives with respect to each component of x, the Gâteaux derivative of f is

δf(x) = ∑_{i=1}^n (∂f/∂x_i) h_i = ⟨∇f, h⟩

Definition 40. Directional derivative of a multivariate differentiable function along a given vector h at a given point x intuitively represents the instantaneous rate of change of the function, moving through x with a velocity specified by h.

∇_h f(x) = Df(x)(h) = ⟨∇f, h⟩

Remark 32. Relationship between partial derivative(scalar), directional derivative(scalar) and gradient(vector):

• The vector consists of partial derivatives is the gradient.

• The linear combination of partial derivatives is directional derivative.

• Partial derivative is a special directional derivative, which is along the axis.

Remark 33. Relationship between Gˆateaux derivative(scalar), directional derivative(scalar) and covariant derivative(vector):

• Gˆateaux derivative(differential) is a generalization of the concept of directional derivative in differ- ential calculus.

• Covariant derivative is a generalization of the directional derivative from vector calculus. The covariant derivative of a (scalar) function coincides with its directional derivative.

• Gˆateaux and directional derivative are applicable to functional f : Rn → R, so their output are both scalars. While the covariant derivative is for vector field v : Rn → Rn, so its output is still a vector.

∇_h f(x) = h^i ∇_{∂/∂x^i} f = h^i ∂f/∂x^i

∇_h v = h^i ∇_{∂/∂u^i} (v^j e⃗_j)
      = h^i ( (∂v^j/∂u^i) e⃗_j + v^j ∇_{∂/∂u^i} e⃗_j )
      = h^i ( (∂v^j/∂u^i) e⃗_j + v^j ∂e⃗_j/∂u^i )
      = h^i ( (∂v^j/∂u^i) e⃗_j + v^j Γ^k_{ij} e⃗_k )      (∂e⃗_j/∂u^i = Γ^k_{ij} e⃗_k)
      = h^i ( (∂v^k/∂u^i) e⃗_k + v^j Γ^k_{ij} e⃗_k )
      = h^i ( ∂v^k/∂u^i + v^j Γ^k_{ij} ) e⃗_k

Definition 41. Covariant derivative ∇_w⃗ v⃗, generally referring to the Levi-Civita connection,

• is the ordinary derivative for Euclidean space.

• is the rate of change vector at ~v of a vector field in a direction ~w with the normal component subtracted, extrinsically.

Levi-Civita connection has following properties:

1. ∇a ~w+b~t~v = a∇ ~w~v + b∇~t~v

2. ∇ ~w(~v + ~u) = ∇ ~w~v + ∇ ~w~u . Distributive Property

3. ∇ ~w(~v · ~u) = (∇ ~w~v) · ~u + ~v · (∇ ~w~u) . Product Rule

4. ∇ ~w(a~v) = (∇ ~wa)~v + a(∇ ~w~v)

∂a 5. ∇∂i (a) = ∂ui

6. ∇_w⃗ v⃗ − ∇_v⃗ w⃗ = [w⃗, v⃗]    (symmetry / torsion-free property)

Remark 34. • In Euclidean space, the covariant derivative is simply the change of the vector fields that take changing basis vectors into account.

• Parallel transport provides a way to compare a vector in one tangent plane to a vector in another, by moving the vector along a curve without changing it.   m m 1 im ∂gij ∂gki ∂gjk • Different expressions of Γkj give us different ways of “parallel transport”. If Γkj = 2 g ∂uk + ∂uj − ∂ui , m then it’s Levi-Civita connection. If Γkj = 0, it’s another connection.

m Figure 19: Different definitions of Γkj give us different kinds of “parallel transport”.

• Covariant derivative helps us find parallel transported vector fields. ∇ ~w~v = ~0 means the vector ~v is parallel transported in the direction ~w at ~v’s position.

• [19] Covariant derivative ∇ ~w~v is the difference between a vector field v and its parallel transport in the direction w.

27 Figure 20: Difference between a vector and the parallel transported one

• Covariant derivative provides a connection between tangent spaces in a curved space.

• In curved space, a geodesic has zero tangential acceleration when we travel along it at constant speed. To compute geodesic curves, we need to find curves where the acceleration vector is normal

to the space, namely ∇γ˙ (t)γ˙ (t) = ~0 holds along the curve γ.

• In other words, geodesic is a curve resulting from parallel transporting a vector along itself.

Example 18. [14] In a 2D Euclidean space, we represent a vector as below:

1 2 X i i ~v = v ~e1 + v ~e2 = v ~ei = v ~ei, i where v1, v2 are constant.

Figure 21: Constant vector space

28 The covariant derivative defined in Euclidean space is just intuitive:

∂ ∇ ∂ ~v = 1 ~v ∂u1 ∂u ∂ = (v1 ~e + v2 ~e ) ∂u1 1 2 ∂ ∂ = (v1 ~e ) + (v2 ~e ) ∂u1 1 ∂u1 2 ∂v1 ∂ ~e ∂v2 ∂ ~e = ~e + v1 1 + ~e + v2 2 ∂u1 1 ∂u1 ∂u1 2 ∂u1 ∂v1 ∂v2 = ~e + ~e ∂u1 1 ∂u1 2 = ~0 ∂ ∇ ∂ ~v = 2 ~v ∂u2 ∂u ∂v1 ∂v2 = ~e + ~e ∂u2 1 ∂u2 2 = ~0

In conclusion, in Euclidean space, the covariant derivative of a vector field is just the ordinary deriva- tive. We need to make sure to differentiate both the vector components and the basis vectors.

∂ ∂ ~v = vj ~e ∂ui ∂ui j ∂vj ∂ ~e = ~e + j vj ∂ui j ∂ui | {z } | {z } components basis vectors

Example 19. [15] In extrinsic case, the covariant derivative is defined as below

∂~v ∇ ∂ ~v = i − ~n ∂ui ∂u ∂ = (v1 ~e + v2 ~e ) − ~n ∂ui 1 2 ∂ = vj ~e − ~n ∂ui j ∂vj ∂ ~e = ~e + j vj − ~n ∂ui j ∂ui ∂vj ∂ ~e = ~e + (Γk ~e + L nˆ)vj − ~n . j = Γ1 ~e + Γ2 ~e + L nˆ ∂ui j ij k ij ∂ui ij 1 ij 2 ij ∂vj = ~e + Γk ~e vj ∂ui j ij k ∂vk = ~e + Γk ~e vj ∂ui k ij k ∂vk  = + Γk vj ~e (5) ∂ui ij k

k where Lij is the second fundamental form and Γij is in form of Eq.(4).

29 Parameterize the space with tangent space basis

R~ = [X,Y,Z]T

where X = cos(u2) sin(u1)

Y = sin(u2) sin(u1)

Z = cos(u1), where R~ is the position vector and u1, u2 represents the latitude and longitude, respectively.

Figure 22: Parametric equations for sphere

By using the chain rule, we have

∂R~ ∂R~ ∂R~ ∂R~ ~e = = + cos(u2) cos(u1) + sin(u2) cos(u1) − sin(u1) (6) 1 ∂u1 ∂X ∂Y ∂Z 2 1 2 1 1 = + cos(u ) cos(u )~eX + sin(u ) cos(u )~eY − sin(u )~eZ ∂R~ ∂R~ ∂R~ ~e = = − sin(u2) sin(u1) + cos(u2) sin(u1) (7) 2 ∂u2 ∂X ∂Y 2 1 2 1 = − sin(u ) sin(u )~eX + cos(u ) sin(u )~eY

With Eq.(6,7), we can yield the metric as below       ∂R~ ∂R~ ∂R~ ∂R~ ~e1 · ~e1 ~e1 · ~e2 ∂u1 · ∂u1 ∂u1 · ∂u2 1 0 gij =   =   =   ∂R~ ∂R~ ∂R~ ∂R~ 2 1 ~e2 · ~e1 ~e2 · ~e2 ∂u2 · ∂u1 ∂u2 · ∂u2 0 sin (u )  −1  −1   ∂R~ ∂R~ ∂R~ ∂R~ ij ~e1 · ~e1 ~e1 · ~e2 ∂u1 · ∂u1 ∂u1 · ∂u2 1 0 g =   =   =   ∂R~ ∂R~ ∂R~ ∂R~ 1 ~e2 · ~e1 ~e2 · ~e2 ∂u2 · ∂u1 ∂u2 · ∂u2 0 sin2(u1)

30 Substituting Eq.(6,7) into the second derivative of position vector, we get ! ∂ ~e ∂ ∂R~ 1 = ∂u1 ∂u1 ∂u1 ∂R~ ∂R~ ∂R~ = − cos(u2) cos(u1) − sin(u2) sin(u1) − cos(u1) ∂X ∂Y ∂Z 2 1 2 1 1 = − cos(u ) cos(u )~eX − sin(u ) sin(u )~eY − cos(u )~eZ (8) ! ∂ ~e ∂ ∂R~ 2 = ∂u2 ∂u2 ∂u2 ∂R~ ∂R~ = − cos(u2) sin(u1) − sin(u2) sin(u1) ∂X ∂Y 2 1 2 1 = − cos(u ) sin(u )~eX − sin(u ) sin(u )~eY (9) ! ∂ ~e ∂ ∂R~ 2 = ∂u1 ∂u1 ∂u2 ∂R~ ∂R~ = − sin(u2) cos(u1) + cos(u2) cos(u1) ∂X ∂Y 2 1 2 1 = − sin(u ) cos(u )~eX + cos(u ) cos(u )~eY (10)

Substituting gij and Eq.(6,7,8,9,10) into Eq.(4), we can yield the Christoffel symbols as below

Γ¹₁₁ = 0,   Γ¹₁₂ = 0,   Γ¹₂₁ = 0,   Γ¹₂₂ = −(1/2) sin(2u¹)
Γ²₁₁ = 0,   Γ²₁₂ = cot(u¹),   Γ²₂₁ = cot(u¹),   Γ²₂₂ = 0

Substituting the Christoffel symbols into Eq.(5), we finally get the extrinsic expression of covariant derivative on sphere as below

 ∂v2  ∇ ~v = + v2 cot(u1) ~e ~e1 ∂u1 2  ∂v1 1   ∂v2  ∇ ~v = − sin(2u1)v2 ~e + + v1 cot(u1) ~e (11) ~e2 ∂u2 2 1 ∂u2 2

We initialize two different vector field along the equator to see what does covariant derivative exactly mean:

• The first vector field along the equator is

π π ~v = cos(u2) ~e + sin(u2) ~e where u1 = , u2 = λ ∈ [0, ] 1 2 2 2

Figure 23: Exponential and log map

31 Substitute the u1, u2 into Eq.(11), we have

 ∂v1 1   ∂v2  ∇ ~v = − sin(2u1)v2 ~e + + v1 cot(u1) ~e ~e2 ∂u2 2 1 ∂u2 2

2 2 = − sin(u ) ~e1 + cos(u ) ~e2,

~ since ∇ ~e2~v 6= 0, which means the rate of change is not completely normal to the tangent space.

• The second vector field along the equator is

π π ~v = 0 ~e + 1 ~e where u1 = , u2 = λ ∈ [0, ] 1 2 2 2

Figure 24: Exponential and log map

Since the vector field has nothing to do with u1, u2, we have

 ∂v1 1   ∂v2  ∇ ~v = − sin(2u1)v2 ~e + + v1 cot(u1) ~e ~e2 ∂u2 2 1 ∂u2 2

= (0 − 0) ~e1 + (0 + 0)~e2 = ~0,

which means the rate of change doesn’t exist in the tangent space. And this is exactly the geodesic which is resulted from parallel transporting a vector along itself.
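The same Christoffel symbols can be plugged into the geodesic equation of Definition 38 and integrated numerically (a sketch using scipy's solve_ivp; the starting point and velocity are arbitrary). The resulting curve stays on a single great circle: its embedded position remains orthogonal to the normal of the plane spanned by the initial position and velocity.

```python
import numpy as np
from scipy.integrate import solve_ivp

def embed(u1, u2):
    """Position on the unit sphere for coordinates (u1, u2)."""
    return np.array([np.cos(u2) * np.sin(u1), np.sin(u2) * np.sin(u1), np.cos(u1)])

def geodesic_rhs(t, s):
    """Geodesic equation with Gamma^1_22 = -0.5*sin(2*u1), Gamma^2_12 = Gamma^2_21 = cot(u1)."""
    u1, u2, du1, du2 = s
    ddu1 = 0.5 * np.sin(2 * u1) * du2**2
    ddu2 = -2.0 / np.tan(u1) * du1 * du2
    return [du1, du2, ddu1, ddu2]

# Arbitrary start on the equator with a tilted initial velocity.
u1_0, u2_0, du1_0, du2_0 = np.pi / 2, 0.0, 0.7, 0.7
sol = solve_ivp(geodesic_rhs, (0.0, 3.0), [u1_0, u2_0, du1_0, du2_0],
                dense_output=True, rtol=1e-9, atol=1e-9)

# Great-circle check: the curve should stay in the plane with normal r(0) x r'(0).
e1 = np.array([np.cos(u2_0) * np.cos(u1_0), np.sin(u2_0) * np.cos(u1_0), -np.sin(u1_0)])
e2 = np.array([-np.sin(u2_0) * np.sin(u1_0), np.cos(u2_0) * np.sin(u1_0), 0.0])
normal = np.cross(embed(u1_0, u2_0), du1_0 * e1 + du2_0 * e2)

ts = np.linspace(0.0, 3.0, 100)
pts = np.array([embed(*sol.sol(t)[:2]) for t in ts])
print(np.max(np.abs(pts @ normal)))    # ~0: the curve lies in one plane, i.e. a great circle
```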

Example 20. [16] In extrinsic case, we have to subtract the normal component, however, in intrinsic case, such normal component doesn’t exist, so we have

∂~v ∇ ∂ ~v = i ∂ui ∂u ∂ = (vj ~e ) ∂ui j ∂vj ∂ ~e = ~e + vj j ∂ui j ∂ui ∂vk ∂ ~e = ~e + vjΓk ~e . j = Γk ~e ∂ui k ij k ∂ui ij k ∂vk  = + vjΓk ~e ∂ui ij k

The only difference between the extrinsic and intrinsic cases lies in the calculation of Christoffel symbol. Previously, we derived the christoffel symbol in Eq.(4), by inner product between Eq.(6,7,8,9,10),

32 given the position vector. However, in intrinsic case, there’s no longer a position vector, so we have to find another way to derive the Christoffel symbol. And it turns out using the metric: ∂ ∂ ∂ ∂ g = (~e · ~e ) . g = · = ~e · ~e ∂uk ij ∂uk i j ij ∂ui ∂uj i j ∂~e ∂ ~e = i · ~e + ~e · j ∂uk j i ∂uk l l = (Γik ~el) · ~ej + ~ei · (Γjk ~el)

l l = Γik(~el · ~ej) + Γjk(~ei · ~el)

l l = Γikglj + Γjkgil

Similarly, we can yield other two expressions: ∂g ij = Γl g + Γl g ∂uk ik jl jk il ∂g ki = Γl g + Γl g ∂uj kj il ij kl ∂g jk = Γl g + Γl g ∂ui ji kl ki jl Add the two of them up and subtract the left one: ∂g ∂g ∂g ij + ki − jk = Γl g l + Γl g + Γl g + Γl g − Γl g − Γl g ∂uk ∂uj ∂ui ik j jk il kj il ij kl ji kl ki jl l l = Γjkgil + Γkjgil

l = 2Γkjgil

Times gim on both sides, we can finally get the intrinsic expression of the Christoffel symbol: ∂g ∂g ∂g  2Γl g gim = gim ij + ki − jk kj il ∂uk ∂uj ∂ui 1 ∂g ∂g ∂g  Γl δm = gim ij + ki − jk kj l 2 ∂uk ∂uj ∂ui 1 ∂g ∂g ∂g  Γm = gim ij + ki − jk kj 2 ∂uk ∂uj ∂ui The derivation below illustrates the extrinsic and intrinsic expressions of the Christoffel symbols are actually the same: ∂g ∂g ∂g ∂~e ∂ ~e ∂g ∂~e ∂ ~e ij + ki − jk = i · ~e + ~e · j . ij = i · ~e + ~e · j ∂uk ∂uj ∂ui ∂uk j i ∂uk ∂uk ∂uk j i ∂uk ∂ ~e ∂ ~e + k · ~e + ~e · j ∂uj i k ∂ui ∂ ~e ∂~e − j · ~e − ~e · i ∂ui k j ∂uk ∂ ~e ~e ~e = 2~e · k . j = i i ∂uj ui uj

1 ∂g ∂g ∂g  Γm = gim ij + ki − jk kj 2 ∂uk ∂uj ∂ui 1 ∂ ~e = gim · 2~e · j 2 i ∂uj  ∂ ~e  = ~e · k gmi i ∂uj

33 Definition 42. First fundamental form on manifold M is the field which assigns to each p ∈ M the bilinear map

gp(v, w): TpM × TpM → R

gp(v, w) = hv, wi where v, w ∈ TpM.

Definition 43. Second fundamental form on manifold M is defined by

⊥ hp(v, w): TpM × TpM → TpM

hp(v, w) = (dΠ(p)v)w = (dΠ(p)w)v where p ∈ M and v, w ∈ TpM.

Definition 44. Riemannian exponential map takes the position p = γ(0) ∈ M and velocity v =

γ˙ (0) ∈ TpM as input and returns the point at time 1 along the geodesic with these initial conditions. When γ is defined over the interval [0, 1], the Riemannian exponential map at p is defined as

Expp(v): TpM → M

Expp(v) = Exp(p, v) = γ(1)

Example 21. For a Lie group with bi-invariant metric, the Lie group exponential map is the same with the Riemannian exponential map at the identity, that is, for any tangent vector X ∈ g, we have

exp(X) = Expe(X).

For matrix groups, the Lie group exponential map of a matrix X ∈ gl(n) is computed by the formula

exp(X) = ∑_{k=0}^∞ (1/k!) X^k.

This series converges absolutely for all X ∈ gl(n).
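A quick check (a sketch with scipy; the skew-symmetric generator is an arbitrary element of so(3)) that the truncated series matches scipy.linalg.expm and that the result is a rotation matrix:

```python
import numpy as np
from scipy.linalg import expm
from math import factorial

X = np.array([[0.0, -0.3,  0.2],
              [0.3,  0.0, -0.1],
              [-0.2, 0.1,  0.0]])          # arbitrary element of so(3)

series = sum(np.linalg.matrix_power(X, k) / factorial(k) for k in range(20))
R = expm(X)

assert np.allclose(series, R)
assert np.allclose(R.T @ R, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)  # R in SO(3)
print(R)
```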

Definition 45. Riemannian log map is the inverse of Riemannian exponential map, defined in the neighborhood Expp(v)

Logp : M → TpM

Logp(γ(1)) = v

Remark 35. The matrix logarithm of M is defined as

log(M) = X−1 log(D)X, where M ∈ Rn×n is a diagonalizable matrix, X ∈ Rn×n and D ∈ Rn×n is a diagonal matrix. log(D) ∈ Rn×n is also a diagonal matrix with diagonal elements equals to the logarithm of the corresponding diagonal elements of D.

34 Remark 36. According to the property above, we can also have

tr(log(M)) = tr(X−1 log(D)X)

= tr(XX−1 log(D))

= tr(log(D))

Figure 25: Exponential and log map

Definition 46. Group G is a set of elements with a binary operation ·, such that

1. ∀x, y ∈ G, x · y ∈ G

2. ∀x, y ∈ G, (x · y) · z = x · (y · z)

3. ∃e ∈ G, ∀x ∈ G satisfy e · x = x · e = x, where e is unique

4. ∀x ∈ G, ∃y ∈ G such that y · x = x · y = e

Definition 47. Lie group is a smooth manifold equipped with group structures, where the two group operations

(x, y) → x · y : G × G → G Multiplication

x → x−1 : G → G Inverse are both smooth mappings. In other words, a Lie group adds a smooth manifold structure to a group.

Definition 48. Orbit of a point p ∈ M is defined as

G(p) = {g · p : g ∈ G}

Example 22. If G = SO(2), the orbit of a point p is a circle.

Definition 49. Lie algebra is a vector space g together with an operation called Lie bracket [·, ·], a alternating bilinear map g × g → g. ∀x, y, z ∈ g and a, b ∈ R, the following axioms are satisfied:

35 1. Linearity: [ax + by, z] = a[x, z] + b[y, z]

2. Anticommutativity: [x, y] = −[y, x] = x · y − y · x

3. Jacobi identity: [x, [y, z]] + [z, [x, y]] + [y, [z, x]] = 0

Definition 50. Lie derivative of vector fields [X,Y ] is the derivative of Y along the flow generated by X, and is sometimes denoted LX Y (“Lie derivative of Y along X”). This generalizes to the Lie derivative of any tensor field along the flow generated by X.

[X,Y ](f) = X(Y (f)) − Y (X(f)) for all f ∈ C∞(M).

Remark 37. Coordinate lines are just flow curves along the basis vector. Coordinate flow curves always close, which means Lie bracket of basis vectors always has to be zero vector.

Figure 26: Coordinate lines

Lie bracket(commutator) measures how much vector field flow curves fail to close.

Example 23. The example below shows how to calculate the flow curve, given the flow field.

Figure 27: Flow field

Vector arrows tell you velocity at each point:

~w = 1 ~ex + x ~ey.

To express the vector in the way of position vector R~, we have

dR~ dx ∂R~ dy ∂R~ ~w = = + dλ dλ ∂x dλ ∂y dx dy = ~e + ~e dλ x dλ y

36 Associating the two expressions above, we can yield the respective expressions of x, y.

dx = 1 → x = λ + c dλ 1 dy 1 = x = λ + c → y = λ2 + c λ + c dλ 1 2 1 2

Assuming c1 = c2 = 0, this can be a possible flow curve

1 x(λ) = λ, y(λ) = λ2, 2 which is tangent to all vectors in a vector field.

Example 24. The Lie bracket of the vector field is defined below:

[u⃗, v⃗] = u⃗(v⃗) − v⃗(u⃗)

where u⃗(v⃗) denotes the derivative of v⃗ in the direction of u⃗. We separate the flow field in the example above into two fields:

Figure 28: Two vector fields

Derivative of ~u in the direction of ~v is shown below

i j ~v(~u) = v ~ei(u ~ej)

i j = v ∂i(u ∂j)

i j j = v [(∂iu )∂j + u (∂i∂j)] . product rule

i j i j = v (∂iu )∂j + v u (∂i∂j)

y x y x = v (∂yu )∂x + v u (∂y∂x)

= x(∂y1)∂x + x · 1(∂y∂x)

= x(∂y∂x)

= x(∂y ~ex) ~e = x x ∂y = ~0

37 Derivative of ~v in the direction of ~u is shown below

i j ~u(~v) = u ~ei(v ~ej)

i j = u ∂i(v ∂j)

i j j = u [(∂iv )∂j + v (∂i∂j)] . product rule

i j i j = u (∂iv )∂j + u v (∂i∂j)

x y x y = u (∂yv )∂y + u v (∂x∂y)

= 1(∂xx)∂y + 1 · x(∂x∂y)

= ∂y + x(∂y∂x)

= ∂y + x(∂y ~ex)

~ex = ∂y + x ∂y

= ∂y

= ~ey

[u⃗, v⃗] = u⃗(v⃗) − v⃗(u⃗) = e⃗_y − 0⃗ = e⃗_y

These four flow curves do not give us a closed loop. The Lie bracket is defined like this to compute the difference between these two derivatives; e⃗_y is the separation vector.
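The same computation, in components [u, v]^j = u^i ∂_i v^j − v^i ∂_i u^j, can be done symbolically (a SymPy sketch for the two fields above, u = ∂_x and v = x ∂_y):

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = (x, y)
u = sp.Matrix([1, 0])        # u = e_x
v = sp.Matrix([0, x])        # v = x * e_y

def lie_bracket(u, v, coords):
    """[u, v]^j = u^i d_i v^j - v^i d_i u^j (component formula for vector fields)."""
    n = len(coords)
    return sp.Matrix([
        sum(u[i] * sp.diff(v[j], coords[i]) - v[i] * sp.diff(u[j], coords[i])
            for i in range(n))
        for j in range(n)
    ])

print(lie_bracket(u, v, coords))   # Matrix([[0], [1]]), i.e. e_y, the separation vector
```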

Figure 29: Separation vector

Definition 51. Lie derivative of tensor fields T w.r.t. smooth vector field X is defined by

∗ d ∗ φt (Tφt(p)) − Tp (LX T )p = (φt T )p = lim , t→0 dt t=0 t where X is on smooth manifold M, T is the covariant tensor field on M and let φt(p) = φ(t, p) be a diffeomorphism parameterized by point p and “time” t. X is induced by φ

Remark 38. Intuitively, if you have a tensor field T and a vector field X, then LX T is the infinites- imal change you would see when you flow T using the vector field −X, which is the same thing as the infinitesimal change you would see in T if you flowed along the vector field X.

38 Definition 52. Isometry φ : M → N is a function which preserves distance between manifold M and N. ∀x ∈ M, it has

1. The derivative of φ at x is an isomorphism of tangent space Dφx : TxM → Tφx N

2. ∀v, w ∈ TxM, the Riemannian metric preserves as hv, wi = hDφx · v, Dφx · wi

Example 25. [11] If S and S0 are surfaces with metric g and g0, then the surfaces are isometric if there 0 exists φ : S → S such that for all tangent vector Xp,Yp ∈ TpS and all p ∈ S, we have

hXp,Ypig = hDφ · Xp, Dφ · Ypig0

0 Notice that Dφ is the Jacobian matrix pushes forward tangent vectors from TpS to Tφ(p)S . We can understand an isometry as preserving the intrinsic geometry at corresponding points.

Definition 53. Isometry group G is group of M such that ∀p, q ∈ M, g ∈ G, d(p, q) = d(g · p, g · q) holds.

Definition 54. Conformality φ : S1 → S2 is a function that for all X,Y ∈ TpM, there exists a function u : M → R such that

2u(p) e hX,Y ig1 = hDφp · X, Dφp · Y ig2 , where g1, g2 are the metrics of S1,S2 at points p, φ(p).

Remark 39. [12] Compared to isometries that preserve both lengths and angles, conformality is a weaker condition that preserves only angles. Conformality is very flexible, in fact, all surfaces are locally confor- mal to the Euclidean metric.

Definition 55. Isotropy subgroup of p is defined as Gp = {g ∈ G|g · p = p}. In other words, Gp is the subgroup of G which leaves p fixed.

Definition 56. Symmetric space is a connected Riemannian manifold M such that ∀p ∈ M, there is an involutive isometry φp : M → M that has p as an isolated fixed point. A point x ∈ X is called a fixed point of φ if φ(x) = x.

Definition 57. Automorphism group [5] Ψ : G → G is defined as

−1 Ψg(h) = ghg given ∀g, h ∈ G.

Definition 58. Inner automorphism group Inn(G) is the collection of all inner automorphisms of the form Ψg, ∀g ∈ G. Inn(G) is a Lie group and commutative.

Definition 59. Dual pairing (m, v), where m ∈ V ∗, the dual space to V , and v ∈ V

39 Definition 60. Adjoint action Adg is the derivative of Ψg(h) with respect to h at the identity, which is

Adg : G × g → g

Adg = d(Ψg)e

• For matrix group, Adg is derived as

∂ Ad w = Ψ (w) g ∂ξ g ∂ = Ψ (h | ) ∂ξ g ξ ξ=0 ∂ = g(h | )g−1 ∂ξ ξ ξ=0  ∂  = g h | g−1 ∂ξ ξ ξ=0 = gwg−1

∂ −1 where hξ denotes the variation of h by ξ such that h0 = e and ∂ξ hξ|ξ=0 = w for Ψg(hξ) = ghξg .

−1 • For conjugation of operators under dual pairing, using Adgw = gwg , we have

−1 (m, Adgw) = (m, gwg )

= (gT m, wg−1)

= (gT mg−T , w)

∗ T −T Adgm = g mg ,

as if A is a linear operator from V to V , its conjugate A∗ : V ∗ → V ∗ is defined by (A∗m, v) = (m, Av). For detail, see [5].

• For Diff(Ω), Adφ is derived as

∂ Ad w = (Ψ h )| φ ∂ξ φ ξ ξ=0 ∂ = (φ ◦ h ◦ φ−1)| ∂ξ ξ ξ=0

−1 −1 = Dφ|hξ◦φ w|φ = (Dφ ◦ φ−1)w ◦ φ−1

∂ where hξ denotes the variation of h by ξ such that h0 = Id and ∂ξ hξ|ξ=0 = w for Ψφ(hξ) = −1 φ ◦ hξ ◦ φ .

Example 26. Adjoint of the gradient is the negative divergence:

h∇f, gi = −hf, divgi

40 Definition 61. Infinitesimal adjoint action ad is the derivative of the adjoint map Ad with respect to g at identity, which is

ad : g × g → g

ad = d(Adg)e

• For matrix group, adg is derived as ∂ ad w = Ad w| v ∂ξ gξ ξ=0 ∂ = (g wg−1)| ∂ξ ξ ξ ξ=0     ∂ −1 ∂ −1 = gξwgξ + gξw gξ ∂ξ ξ=0 ∂ξ ξ=0   −1 ∂ −1 = vw − gξwgξ gξgξ ∂ξ ξ=0 = vw − wv

∂ −1 where gξ is the variation of g by ξ with g0 = e and ∂ξ gξ|ξ=0 = v for Adgξ w = gξwgξ .

• For conjugation of operators under dual pairing, using advw = vw − wv, we have

(m, advw) = (m, vw − wv)

= (m, vw) − (m, wv)

= (vT m, w) − (mvT , w)

= (vT m − mvT , w)

∗ T T advm = v m − mv ,

as if A is a linear operator from V to V , its conjugate A∗ : V ∗ → V ∗ is defined by (A∗m, v) = (m, Av). For detail, see [5].

• For Diff(Ω), adv is derived as ∂ ad w = ((Dφ ◦ φ−1)w ◦ φ−1)| v ∂ξ ξ ξ ξ ξ=0 ∂ ∂ = ((Dφ ◦ φ−1))| w + DId (w ◦ φ−1)| ∂ξ ξ ξ ξ=0 ∂ξ ξ ξ=0 ∂φ−1 ! ∂ −1 −1 −1 ξ = ( Dφξ|φ + DDφξ|ξ=0Dφ )w ◦ φ + Dv| −1 ξ ξ ξ φξ ∂ξ ∂ξ ξ=0 = (Dv + 0)w − Dwv

= Dvw − Dwv

∂ −1 −1 where φξ is the variation of φ by ξ with φ0 = e and ∂ξ φξ|ξ=0 = v for Adφw = (Dφ ◦ φ )w ◦ φ .

Definition 62. Left/Right multiplication is a diffeomorphism such that

Ly : x → y · x Left Multiplication

Ry : x → x · y Right Multiplication where y ∈ G, G is a Lie group.

41 Definition 63. Left/Right-invariant means ∀y ∈ G, we have Ly∗X = X or Ry∗X = X.

I Example 27. The metric G has the property of right-invariance: if U, V ∈ TφDiff(M) then

I I Gφ(U, V ) = Gφ◦ψ(U ◦ ψ, V ◦ ψ) ∀ψ ∈ Diff(M)

Definition 64. Inertia operator L : g → g∗ is defined by

hv, wi = (Lv, w), ∀v, w ∈ g

L must be invertible and (Lv, w) = (Lw, v), ∀v, w ∈ g in order to satisfy the properties of a well-formed Riemannian metric.

Definition 65. A linear operator f : g → g is transposed with respect to the inner product defined by L, using the formula hf †v, wi = hv, fwi, ∀v, w ∈ g

† We use this to define the adjoint- action Ad : G × g → g via the transpose of Adg and the † infinitesimal adjoint-transpose ad : g × g → g via the transpose of adv

Definition 66. Information metric GI is defined by

Z k Z Z  I X Gφ(U, V ) = − h∆u, vigvol + λ hu, ξiigvol · hv, ξiigvol , M i=1 M M where U = u◦φ, V = v ◦φ, λ > 0, ∆ is the Laplace-de Rham operator lifted to vector fields, and ξ1, ··· , ξk is an orthonormal basis of the harmonic 1-form on M.

Definition 67. Fisher–Rao metric is the Riemannian metric on Dens(M) given by

G^F_µ(α, β) = (1/4) ∫_M (α/µ)(β/µ) µ,

for tangent vectors α, β ∈ T_µDens(M). It can be interpreted as the Hessian of relative entropy, or information divergence.

Definition 68. The background metric g on manifold M is called compatible with µ if volg = µ, for µ ∈ Dens(M).

Definition 69. Set of vector space isomorphisms Liso is defined by

m m Liso(R ,V ) = {e : R → V |e is a vector space isomorphism}.

Definition 70. Frame of m-dimension real vector space V is the basis e1, ··· , em of V .

Definition 71. Frame bundle F of a smooth m-dimensional submanifold M is defined by

F(M) = {(p, e)|p ∈ M, e ∈ F(M)p},

m where F(M)p = Liso(R ,TpM) is the space of frames of tangent space at p.

42 Definition 72. A smooth curve β : R → F(M) is called a lift of smooth curve γ : R → M if [8]

π ◦ β = γ.

Example 28. The general linear group GL(m, R) acts on this space by composition on the right via

m m ∗ GL(m) × Liso(R ,V ) → Liso(R ,V ):(a, e) → a e = e ◦ a

Definition 73. A curve β(t) = (γ(t), e(t)) ∈ F(M) is called horizontal lift of γ if the vector field

X(t) = e(t)ξ along γ is parallel for every ξ ∈ Rm. Thus a horizontal lift of γ has the form

β(t) = (γ(t), Φγ (t, 0)e)

m for some e ∈ Liso(R ,Tγ(0)M).

Definition 74. Suppose φ : M → N is a differential map from M to N, γ :(−, ) → M is a curve on 0 M, γ(0) = p, γ (0) = v ∈ TpM, then φ ◦ γ is a curve on N, φ ◦ γ(0) = φ(p), we define the tangent vector

0 φ∗(v) = (φ ◦ γ) (0) ∈ Tφ(p)N, as the pushforward tangent vector of v induced by φ.

Definition 75. Pullback and pushforward are defined as below

Pullback(right action): ϕ∗ρ = |Dϕ|ρ(ϕ(·))

= |Dϕ|ρ ◦ ϕ

−1 ∗ Pushforward(left action): ϕ∗ρ = (ϕ ) ρ

= |Dϕ−1|ρ(ϕ−1(·))

= |Dϕ−1|ρ ◦ ϕ−1 where (ϕ, ρ) ∈ Diff(M) × Dens(M).

Remark 40. Given a φ, there can be two effects or two directions of the φ. The pullback is the one from distorted to checkered, while the pushforward is the one from checkered to distorted, which is more intuitive.

Remark 41. The scaling factor of |Dφ| in pullback or pushforward actually tells us the intensity change after the diffeomorphism action. For density matching, if the volume is squeezed, the intensity would increase, which is reflected in the value of Jacobian determinant.

43 Figure 30: Pullback

Figure 31: Pushforward

−1 Remark 42. Relationship between I0,I1, φ, φ : Intuitively, we may think that φ demonstrate the right way to distort I0 : V to I1 : J, like distorting the “painted flat tablecloth”. In a sense, that’s right, if we write it as I1 = φ∗I0. However, we always use compose operation to distort the density map, which −1 −1 −1 is |Dφ |I0 ◦ φ , so when we are using the composing, always remember that we are composing φ , instead of the more intuitively-correct φ.

44 Figure 32: Pushforward

Remark 43. “Blank Space” You may have the confusion that if we deform the image using the diffeomorphism ϕ in Fig.(31), namely φ∗I, what to fill in the blank triangle area at the bottom, since I analogize the diffeomorphism to distorting the “painted flat tablecloth”. How about we think this way, by implementing the algorithm through Python discretely, we can have

I ∘ ϕ⁻¹ : Z² → R² → R. The space Z² consists of all integer tuples in (0 : M, 0 : N), where M, N are the height and width of I. The space R² can contain arbitrary real-valued tuples, but still within the range (0 : M, 0 : N), as I is not defined beyond this range. In short, I ∘ ϕ⁻¹ : Z² → R is still a function defined over all of (0 : M, 0 : N), so no blank space arises. The essential cause of the above phenomenon is that, due to the limitations of the computer, we typically store the diffeomorphism as an array, which means φ : Z² → R², where Z² consists of all the possible tuples in (0 : M, 0 : N); that is how we index the entries of the array. So we should never worry about the blank space, as how we plot the diffeomorphism is different from how we express it in the algorithm.
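A discrete illustration of this point (a sketch assuming scipy.ndimage.map_coordinates; the image and the deformation are synthetic, not from the notes): the warped image is obtained by sampling I at φ⁻¹ of every grid location, so every output pixel is defined and no "blank space" appears.

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Synthetic image I : Z^2 -> R (a simple gradient pattern).
M, N = 64, 64
yy, xx = np.mgrid[0:M, 0:N]
I = np.sin(xx / 6.0) + np.cos(yy / 9.0)

# Synthetic inverse deformation phi^{-1} : Z^2 -> R^2 (a smooth sinusoidal displacement).
phi_inv_y = yy + 2.0 * np.sin(2 * np.pi * xx / N)
phi_inv_x = xx + 3.0 * np.sin(2 * np.pi * yy / M)

# Deformed image (I o phi^{-1}): sample I at the real-valued coordinates phi^{-1}(i, j).
warped = map_coordinates(I, [phi_inv_y, phi_inv_x], order=1, mode='nearest')

print(warped.shape)       # (64, 64): defined on the full grid, no blank region
```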

Figure 33: φ

3 Statistics for Images

Definition 76. Principal geodesic analysis (PGA) seeks a sequence of geodesic submanifolds that maximize the variance of the data. These submanifolds are called the principal geodesic submanifolds.

• Definition 1. The principal geodesic submanifolds are defined by first constructing an orthonormal basis e_1, ··· , e_d of T_µM. These vectors are then used to form a sequence of nested subspaces V_k = span({e_1, ··· , e_k}) ∩ U, where U ⊂ T_µM is a neighbourhood of 0 such that projection is well-defined for all geodesic submanifolds of Exp_µ(U).

The principal geodesic submanifolds are given by

Hk = Expµ(Vk)

The first principal direction is now chosen to maximize the projected variance along the corresponding geodesic:

e_1 = arg max_{||e||=1} Σ_{i=1}^n || Log_µ(π_H(x_i)) ||², where H = Exp_µ(span({e}) ∩ U)

e_k = arg max_{||e||=1} Σ_{i=1}^n || Log_µ(π_H(x_i)) ||², where H = Exp_µ(span({e_1, ··· , e_{k-1}, e}) ∩ U)

where we define the projection operator π_H : M → H as

π_H(x) = arg min_{y∈H} d(x, y)²

       = arg min_{y∈H} || Log_x(y) ||²

       ≈ arg min_{y∈H} || Log_µ(x) − Log_µ(y) ||²

• Definition 2. The intent of principal geodesic analysis is to find an orthonormal basis {e_1, ··· , e_k} of a set of points {x_1, ··· , x_n} ∈ R^d, which satisfies the recursive relationship

e_1 = arg min_{||e||=1} Σ_{i=1}^n || x_i − ⟨e, x_i⟩e ||²

e_2 = arg min_{||e||=1} Σ_{i=1}^n || x_i − ⟨e_1, x_i⟩e_1 − ⟨e, x_i⟩e ||²

...

e_k = arg min_{||e||=1} Σ_{i=1}^n || x_i − ⟨e_1, x_i⟩e_1 − ··· − ⟨e_{k-1}, x_i⟩e_{k-1} − ⟨e, x_i⟩e ||²

In other words, the subspace Vk = span({e1, ··· , ek}) is the k-dimensional subspace that minimizes the sum-of-squared distances to the data. The principal geodesic submanifolds are given by

Hk = Expµ(Vk)

The first principal direction is now chosen to minimize the sum-of-squared distances of the data to the corresponding geodesic:

e_1 = arg min_{||e||=1} Σ_{i=1}^n || Log_{x_i}(π_H(x_i)) ||², where H = Exp_µ(span({e}) ∩ U)

e_k = arg min_{||e||=1} Σ_{i=1}^n || Log_{x_i}(π_H(x_i)) ||², where H = Exp_µ(span({e_1, ··· , e_{k-1}, e}) ∩ U)

Example 29.

Algorithm 1 Principal Geodesic Analysis

Inputs: x1, ··· , xn ∈ M

µ ← intrinsic mean of {xi} . Remark 49

u_i ← Log_µ(x_i) . Map x_i ∈ M to T_µM as u_i

Σ ← (1/n) Σ_{i=1}^n u_i u_i^T

{ek, λk} ← eigenvectors and eigenvalues of Σ

return {ek, λk}
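Below is a minimal sketch of Algorithm 1 specialized to data on the unit sphere S², where the Exp and Log maps have closed forms; the function names, the step size τ = 1, and the synthetic data are my own illustrative choices. The intrinsic mean uses the gradient-descent update of Remark 49.

```python
import numpy as np

def sphere_log(p, x):
    """Log map on the unit sphere: tangent vector at p pointing toward x."""
    cos_t = np.clip(np.dot(p, x), -1.0, 1.0)
    theta = np.arccos(cos_t)
    v = x - cos_t * p
    n = np.linalg.norm(v)
    return np.zeros_like(p) if n < 1e-12 else theta * v / n

def sphere_exp(p, v):
    """Exp map on the unit sphere: follow the geodesic from p with initial velocity v."""
    n = np.linalg.norm(v)
    return p.copy() if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def intrinsic_mean(points, tau=1.0, iters=50):
    """Gradient descent of Remark 49: average the log-mapped points, exp-map back."""
    mu = points[0].copy()
    for _ in range(iters):
        step = np.mean([sphere_log(mu, x) for x in points], axis=0)
        mu = sphere_exp(mu, tau * step)
    return mu

def pga(points):
    """Algorithm 1: PCA in the tangent space at the intrinsic mean."""
    mu = intrinsic_mean(points)
    U = np.stack([sphere_log(mu, x) for x in points])   # u_i = Log_mu(x_i)
    Sigma = U.T @ U / len(points)                        # sample covariance in T_mu M
    lam, E = np.linalg.eigh(Sigma)                       # eigenvalues, ascending
    return mu, E[:, ::-1], lam[::-1]                     # principal directions and variances

# Usage: noisy samples around the north pole of S^2.
rng = np.random.default_rng(0)
pts = rng.normal(loc=[0.0, 0.0, 1.0], scale=0.1, size=(200, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
mu, directions, variances = pga(pts)
```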

Remark 44. • The sample variance of the data is the expected value of the squared Riemannian distance from the mean.

• For data in Rn the two definitions are equivalent since PGA reduces to PCA in the linear case.

Definition 77. Atlas is the image of the average anatomy of a collection of anatomical images.

Remark 45. Motivation of atlas building:

• Map population into common coordinate space;

• Learn about variability of brain anatomy;

• Describe difference from normal;

• Use as normative atlas for segmentation.

Example 30. Atlas-based segmentation:

1. Build atlas with segmented structures;

2. Transform the atlas to the case;

3. Make a voxel-wise decision according to the atlas.

Remark 46. The space of diffeomorphisms is not a vector space, so the linear average µ = (1/N) Σ_{i=1}^N x_i does not apply; we need a more general notion of average, and this is where the Fréchet mean comes in.

Definition 78. Fréchet mean and variance are defined as

p* = arg min_p φ(p),  (Fréchet mean)

φ(p) = Σ_{i=1}^N d²(p, x_i) w_i,  (Fréchet variance)

where (M, d) is a complete metric space, x_1, x_2, ··· , x_n are the random points in M and p ∈ M.

Remark 47. The Karcher means are the points p ∈ M that locally minimize φ, while the Fréchet means are the points that globally minimize φ.

Remark 48. Actually, the Fréchet mean is the generalization of the arithmetic mean, median, geometric mean, and harmonic mean, by using different distance functions.

• For real numbers, the arithmetic mean is a Fréchet mean, by using the usual Euclidean distance as the distance function.

• For positive real numbers, the geometric mean is a Fréchet mean, by using the (hyperbolic) distance function d(x, y) = |log(x) − log(y)|.

• For positive real numbers, the harmonic mean is a Fréchet mean, by using the distance function d(x, y) = |1/x − 1/y|.

Remark 49. How to compute the intrinsic mean of manifold data: According to 4.1.2 in [21], Karcher [20] shows that the gradient of φ above is

∇φ(p) = −2 Σ_{i=1}^N Log_p(x_i) w_i.

If w_i = 1/(2N), then we have

φ(p) = (1/(2N)) Σ_{i=1}^N d²(p, x_i),

∇φ(p) = −(1/N) Σ_{i=1}^N Log_p(x_i).

The gradient descent algorithm takes successive steps in the negative gradient direction. Given a current estimate µ_j (say µ_1 = x_1) of the intrinsic mean, the update obtained by taking a step in the negative gradient direction is

µ_{j+1} = Exp_{µ_j}( (τ/N) Σ_{i=1}^N Log_{µ_j}(x_i) ),

where τ is the step size. This update is easy to understand: Log-map all the x_i ∈ M to the tangent space at µ_j; after computing the mean in the tangent space, Exp-map this mean back to M, which eventually yields the intrinsic mean we want.

Definition 79. Fréchet median is defined as

p* = arg min_p ψ(p),  (Fréchet median)

ψ(p) = Σ_{i=1}^N d(p, x_i),

where x_1, x_2, ··· , x_n are the random points in M and p ∈ M.

Definition 80. Weighted geometric median is defined as

p* = arg min_p ψ(p),

ψ(p) = Σ_{i=1}^N d(p, x_i) w_i,

where x_1, x_2, ··· , x_n are the random points in M and p ∈ M.

Definition 81. Fréchet distance between A and B is defined as the infimum over all reparameterizations α and β of the maximum over all t ∈ [0, 1] of the distance between A(α(t)) and B(β(t)):

F(A, B) = inf_{α,β} max_{t∈[0,1]} { d( A(α(t)), B(β(t)) ) }

where d is the distance function, e.g. the Euclidean distance.

Example 31. Informally, we can think of the parameter t as “time”. In the discrete situation, we can think of α(t), β(t) as two sequences of the same size. Say the integer index is t ∈ [0, 100), so the sequences

α(·) : Z¹ → Z¹, β(·) : Z¹ → Z¹ can be presented as follows:

α(·) = [α(0), α(1), α(2), ··· , α(99)],

β(·) = [β(0), β(1), β(2), ··· , β(99)],

where α(0) ≤ α(1) ≤ α(2) ≤ ··· ≤ α(99) and β(0) ≤ β(1) ≤ β(2) ≤ ··· ≤ β(99). We can view α(·), β(·) as two displacement-time graphs written as sequences, which tell us in what manner the moving points move forward.

A(·) : Z¹ → R^n, B(·) : Z¹ → R^n are also two sequences of the same size (not necessarily the same size as α, β), which can be exhibited as follows:

A(·) = [A(0), A(1), A(2), ··· ],

B(·) = [B(0), B(1), B(2), ··· ],

where A(i) and B(i) are coordinates in the metric space. So, to calculate the Fréchet distance between the curves, the key is to enumerate all possible reparameterizations α, β. For each combination of α, β we find the maximum distance across the whole “time period”, and then take the infimum of these maximum distances over all combinations.

Remark 50. In a nutshell, the Fréchet distance between two given fixed paths can be found in this way: we look for moving patterns of the two points on the two paths such that the maximum distance during the travel is minimized. The optimal moving patterns usually make the two points move forward “simultaneously”. A minimal sketch of the discrete version follows.
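Below is a minimal sketch of the standard discrete Fréchet distance recursion (dynamic programming over couplings of the two point sequences); the function name and the test curves are illustrative only.

```python
import numpy as np

def discrete_frechet(A, B):
    """Discrete Fréchet distance between two polygonal curves.

    A : (m, d) array of points on the first curve.
    B : (n, d) array of points on the second curve.
    """
    m, n = len(A), len(B)
    ca = np.full((m, n), -1.0)
    dist = lambda i, j: np.linalg.norm(A[i] - B[j])

    for i in range(m):
        for j in range(n):
            d = dist(i, j)
            if i == 0 and j == 0:
                ca[i, j] = d
            elif i == 0:
                ca[i, j] = max(ca[i, j - 1], d)
            elif j == 0:
                ca[i, j] = max(ca[i - 1, j], d)
            else:
                # either point (or both) advances one step; take the best predecessor
                ca[i, j] = max(min(ca[i - 1, j], ca[i, j - 1], ca[i - 1, j - 1]), d)
    return ca[m - 1, n - 1]

# Two translated copies of the same curve: the distance is the translation length (cf. Figure 34).
t = np.linspace(0, 1, 100)
A = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
B = A + np.array([0.0, 0.5])
print(discrete_frechet(A, B))   # 0.5
```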

Figure 34: The Fréchet distance between two curves of the same shape is the norm of the translation vector.

Definition 82. Wasserstein distance is defined as

W_p(µ, ν) = inf_{γ∈Γ(µ,ν)} ( ∫_{M×M} dist(x, y)^p dγ(x, y) )^{1/p}

          = inf_{γ∈Γ(µ,ν)} ( ∫_{M×M} dist(x, y)^p γ(x, y) dx dy )^{1/p},

W_2(µ, ν) = inf_{γ∈Γ(µ,ν)} ( ∫_{M×M} dist(x, y)² γ(x, y) dx dy )^{1/2},

where Γ(µ, ν) denotes the set of all couplings of the marginal distributions µ and ν, and x, y indicate positions in the respective distributions. In the discrete case, the distance reads as

W_p(µ, ν) = min_{T∈Γ(µ,ν)} ⟨T, M_{µν}⟩ = min_{T∈Γ(µ,ν)} tr(T^T M_{µν}),

where M_{µν} = [dist(x_i, y_j)^p]_{ij} ∈ R^{m×n} and Γ(µ, ν) = {T ∈ R_+^{m×n} | T 1_n = a, T^T 1_m = b}, namely the transport plan matrix T is constrained so that its row sums and column sums equal the entries of a and b, respectively. The two matrices correspond to dist(x, y) and γ(x, y) in the continuous form.

Figure 35: Wasserstein distance, credit to Wikipedia

Remark 51. The Wasserstein distance is closely related to the optimal transport problem. That is, for two distributions of mass µ(x), ν(y) in a space S, where x, y ∈ S, we wish to transport the mass at the lowest cost. The problem only makes sense when the total masses of the two distributions are identical; fortunately µ(x), ν(y) are probability distributions, i.e. their total mass equals 1, so this premise is satisfied. Assume there is also a cost function c(x, y) : S × S → [0, +∞) which gives the cost of transporting a unit mass from point x to point y. The function γ(x, y) describes a transport plan, giving the amount of mass moved from point x to point y. Therefore the cost of the whole transport plan equals

∫∫ c(x, y) γ(x, y) dx dy,

and the Wasserstein distance is exactly the cost of the optimal transport plan.
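As a sketch of the discrete problem min_T ⟨T, M_{µν}⟩ above, the following sets up the row- and column-sum constraints as a linear program and solves it with scipy.optimize.linprog; the histograms a, b and the squared-distance cost matrix are made-up illustrations.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_lp(a, b, M):
    """Solve min_T <T, M> s.t. T 1 = a, T^T 1 = b, T >= 0 (discrete optimal transport)."""
    m, n = M.shape
    # Row-sum constraints: sum_j T_ij = a_i.
    A_rows = np.zeros((m, m * n))
    for i in range(m):
        A_rows[i, i * n:(i + 1) * n] = 1.0
    # Column-sum constraints: sum_i T_ij = b_j.
    A_cols = np.zeros((n, m * n))
    for j in range(n):
        A_cols[j, j::n] = 1.0
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([a, b])
    res = linprog(M.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun, res.x.reshape(m, n)

# Illustrative example: two histograms on the points 0..4 with squared-distance cost.
x = np.arange(5.0)
a = np.array([0.4, 0.3, 0.2, 0.1, 0.0])
b = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
M = (x[:, None] - x[None, :]) ** 2           # cost matrix dist(x_i, y_j)^2
cost, T = wasserstein_lp(a, b, M)
print(cost, np.sqrt(cost))                   # <T*, M> and the corresponding W_2
```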

Lemma 1. The p-Wasserstein distance between the two probability measures µ and ν on R1 has the following closed-form expression:

W_p(µ, ν) = ( ∫_{−∞}^{+∞} |U(s) − V(s)|^p ds )^{1/p},  (quantity × unit distance)

          = ( ∫_0^1 |U^{-1}(t) − V^{-1}(t)|^p dt )^{1/p},  (distance × unit quantity)

where U and V are the CDFs of µ and ν respectively.

Proof. Assume {(x_1, y_1), (x_2, y_2)} ⊂ supp(γ*), x_1 < x_2, where γ* denotes the optimal transport plan and supp denotes the support (loosely, the closure of the set of possible values of a random variable with that distribution).

Given this assumption, we claim y_1 ≤ y_2. Suppose this is not the case, namely y_1 > y_2, which yields

|x_1 − y_2|^p + |x_2 − y_1|^p < |x_1 − y_1|^p + |x_2 − y_2|^p.

(For a more detailed derivation of this inequality in the case p > 1, refer to Appendix A in https://arxiv.org/pdf/1509.02237.pdf.) This inequality suggests that {(x_1, y_2), (x_2, y_1)} ⊂ supp(γ*) rather than {(x_1, y_1), (x_2, y_2)} ⊂ supp(γ*), which contradicts the optimality of γ*, as it indicates that γ* is not cyclically monotone.

Now, for x ∈ supp(µ), y ∈ supp(ν), we claim that (x, y) ∈ supp(γ*) if and only if U(x) = V(y). To see this, note that from the monotonicity property we just built, we deduce that (x, y) ∈ supp(γ*) if and only if

γ*(R, (−∞, y]) = γ*((−∞, x], (−∞, y]) = γ*((−∞, x], R).

In turn, the fact that γ* ∈ Γ(µ, ν) implies that γ*((−∞, x], R) = U(x) and γ*(R, (−∞, y]) = V(y). From the previous relation, we conclude that

W_p(µ, ν) = inf_{γ∈Γ(µ,ν)} ( ∫_{M×M} dist(x, y)^p γ(x, y) dx dy )^{1/p} = ( ∫_0^1 |U^{-1}(t) − V^{-1}(t)|^p dt )^{1/p}.

Example 32. For the one-dimensional discrete case, to transport ν to µ,

52 1. 4 extra squares would be moved from 0 to 1;

2. 3 extra squares would be moved from 1 to 2;

3. 2 extra squares would be moved from 2 to 3;

4. 1 extra square would be moved from 3 to 4.

The “earth” that needs to be moved is exactly the difference between the two CDFs at each location. Therefore the p-Wasserstein distance equals (Σ_s |U(s) − V(s)|^p Δs)^{1/p} = (4^p × 1 + 3^p × 1 + 2^p × 1 + 1^p × 1)^{1/p}.
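As a quick numeric illustration of the one-dimensional case, the sketch below evaluates the quantile (inverse-CDF) form of Lemma 1 for two equal-size empirical samples with uniform weights, in which case it reduces to matching sorted samples, and compares against scipy.stats.wasserstein_distance, which computes the p = 1 case; the sample values are arbitrary.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wasserstein_1d(samples_u, samples_v, p=1):
    """W_p between two equal-size empirical measures on R via sorted samples,
    i.e. the quantile (inverse-CDF) form of Lemma 1 evaluated exactly."""
    uq = np.sort(samples_u)    # values of U^{-1} on the grid (i + 1/2)/n
    vq = np.sort(samples_v)    # values of V^{-1} on the same grid
    return np.mean(np.abs(uq - vq) ** p) ** (1 / p)

u = np.array([0.0, 0.0, 1.0, 2.0])
v = np.array([1.0, 2.0, 3.0, 4.0])
print(wasserstein_1d(u, v, p=1))    # 1.75, from the quantile form
print(wasserstein_distance(u, v))   # 1.75, SciPy's W_1 for comparison
```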

Figure 36: Two distributions µ and ν.

Figure 37: Optimal transport measures the minimal effort required for filling µ_1 with µ_0, i.e. transporting one distribution to another.

Remark 52. Relationship between KL divergence, JS divergence and Wasserstein distance:

• Intuitively, the KL divergence looks like a distance between two distributions; however, D_KL(p, q) ≠ D_KL(q, p), i.e. it is asymmetric. Hence the JS divergence.

• When the two distributions are far apart, the KL divergence cannot reflect the distance between them, while the JS divergence is constant, which is deadly for backpropagation in a neural network. The Wasserstein distance tackles this drawback of the KL/JS divergence, as the optimal transport plan between two distant distributions is still well-defined and varies with the distributions.

Remark 53. Advantages of Wasserstein distance:

• By leveraging the Wasserstein distance, we can get a better average/summary image of two distributions.

Figure 38: Top: some random circles. Bottom left: Euclidean average of the circles. Bottom right: Wasserstein barycenter.

• When we create a geodesic between two distributions P_0, P_1, with P_t interpolating between them, the Wasserstein distance preserves the basic structure of the distribution.

Figure 39: Top row: geodesic path from P_0 to P_1. Bottom row: Euclidean path (P_t = (1 − t)P_0 + tP_1) from P_0 to P_1.

• Wasserstein distance is insensitive to small wiggles.

Definition 83. Wasserstein barycenter is defined as

µ* = arg min_µ Σ_i W_2²(µ, µ_i)

where W_2 is the L² Wasserstein distance and the µ_i are the sample distributions. For computation in the discrete situation, please refer to https://arxiv.org/pdf/1310.4375.pdf.
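In one dimension the W_2 barycenter has a simple closed form: its quantile function is the weighted average of the input quantile functions. The sketch below uses this fact for equal-size empirical samples by averaging sorted values; it is an illustrative special case, not the general discrete algorithm referenced above.

```python
import numpy as np

def barycenter_1d(sample_list, weights=None):
    """W_2 barycenter of 1D empirical measures, each given by n samples:
    average the sorted samples, i.e. average the quantile functions."""
    sorted_samples = np.stack([np.sort(s) for s in sample_list])   # shape (k, n)
    if weights is None:
        weights = np.full(len(sample_list), 1.0 / len(sample_list))
    return weights @ sorted_samples    # barycenter samples, one per quantile level

# Two shifted copies of the same point cloud: the barycenter sits halfway,
# keeping the shape instead of smearing into two modes (cf. Figure 40).
rng = np.random.default_rng(1)
base = rng.normal(0.0, 1.0, size=500)
bary = barycenter_1d([base - 3.0, base + 3.0])
print(bary.mean(), bary.std())   # mean ~0, spread ~ that of the inputs
```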

Remark 54. Euclidean averaging does not contain geometric information. Barycenters under the Wasserstein distance are more intuitive.

Figures in Remark 53 credit to https://www.stat.cmu.edu/~larry/=sml/Opt.pdf. For Remark 54, see https://people.csail.mit.edu/eddchien/presentations/Stochastic_Wasserstein_Barycenters.pdf

Figure 40: Euclidean mean vs. Wasserstein barycenter of distributions.

Corollary 1. Given ∆f = g, f(x, y) = F^{-1}[ G(u, v) / (2 cos(2πu/M) + 2 cos(2πv/N) − 4) ].

Proof.

f(x + 1, y) + f(x − 1, y) + f(x, y + 1) + f(x, y − 1) − 4f(x, y) = g(x, y)

F[f(x + 1, y) + f(x − 1, y) + f(x, y + 1) + f(x, y − 1) − 4f(x, y)] = F[g(x, y)]

e^{−2πju/M} F(u, v) + e^{2πju/M} F(u, v) + e^{−2πjv/N} F(u, v) + e^{2πjv/N} F(u, v) − 4F(u, v) = G(u, v)

(time shifting [18]: F[f(x + a, y + b)] = e^{−2πjau/M − 2πjbv/N} F(u, v))

After factoring out F(u, v), we have

F(u, v) = G(u, v) / ( e^{−2πju/M} + e^{2πju/M} + e^{−2πjv/N} + e^{2πjv/N} − 4 )

f(x, y) = F^{-1}[ G(u, v) / ( e^{−2πju/M} + e^{2πju/M} + e^{−2πjv/N} + e^{2πjv/N} − 4 ) ]

As e^{ju} = cos(u) + j sin(u), we can write f(x, y) as

f(x, y) = F^{-1}[ G(u, v) / ( 2 cos(2πu/M) + 2 cos(2πv/N) − 4 ) ]

Likewise, the 3D version is

f(x, y, z) = F^{-1}[ G(u, v, w) / ( 2 cos(2πu/M) + 2 cos(2πv/N) + 2 cos(2πw/O) − 6 ) ]
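A minimal sketch of this DFT-based Poisson solve in 2D, assuming periodic boundary conditions (implicit in the DFT) and zeroing the Fourier coefficient at (u, v) = (0, 0), where the denominator vanishes and the mean of f is undetermined:

```python
import numpy as np

def solve_poisson_dft(g):
    """Solve the 5-point discrete Poisson equation ∆f = g with periodic BCs via the DFT."""
    M, N = g.shape
    u = np.arange(M).reshape(-1, 1)
    v = np.arange(N).reshape(1, -1)
    denom = 2 * np.cos(2 * np.pi * u / M) + 2 * np.cos(2 * np.pi * v / N) - 4
    G = np.fft.fft2(g)
    F = np.zeros_like(G)
    mask = denom != 0                 # denom = 0 at (0, 0): the mean of f is undetermined
    F[mask] = G[mask] / denom[mask]
    return np.real(np.fft.ifft2(F))

# Verification: build g from a known f with the same stencil and recover f up to a constant.
rng = np.random.default_rng(0)
f_true = rng.standard_normal((32, 32))
g = (np.roll(f_true, 1, 0) + np.roll(f_true, -1, 0)
     + np.roll(f_true, 1, 1) + np.roll(f_true, -1, 1) - 4 * f_true)
f_rec = solve_poisson_dft(g)
print(np.max(np.abs((f_rec - f_rec.mean()) - (f_true - f_true.mean()))))   # ~1e-15
```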

Remark 55. The formula below defines MATLAB’s discrete Fourier transform G of an m × n matrix g:

G(u, v) = Σ_{x=1}^m Σ_{y=1}^n w_m^{(x−1)(u−1)} w_n^{(y−1)(v−1)} g(x, y),

where w_m = e^{−2πi/m}, w_n = e^{−2πi/n}, and x, u ∈ [1, m], y, v ∈ [1, n]. More explicitly, we have

G(u, v) = Σ_{x=1}^m Σ_{y=1}^n e^{−2πi( (x−1)(u−1)/m + (y−1)(v−1)/n )} g(x, y)
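The explicit double sum above is what np.fft.fft2 computes once MATLAB's 1-based indices x, y, u, v are shifted to 0-based ones; the sketch below checks the two against each other on a small random matrix.

```python
import numpy as np

def dft2_explicit(g):
    """Evaluate the double sum of Remark 55 directly (0-based indices x-1, u-1, etc.)."""
    m, n = g.shape
    G = np.zeros((m, n), dtype=complex)
    for u in range(m):
        for v in range(n):
            for x in range(m):
                for y in range(n):
                    G[u, v] += np.exp(-2j * np.pi * (x * u / m + y * v / n)) * g[x, y]
    return G

g = np.random.rand(4, 5)
print(np.allclose(dft2_explicit(g), np.fft.fft2(g)))   # True
```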

4 Linear Algebra for Images

4.1 Understanding a matrix

Example 33. We have a matrix as below and multiply it with two basis vectors:

[[3, 1], [1, 2]] · [1, 0]^T = [3, 1]^T

[[3, 1], [1, 2]] · [0, 1]^T = [1, 2]^T

Therefore we can build intuition for a matrix by viewing it column by column:

[1, 0]^T → [3, 1]^T,  [0, 1]^T → [1, 2]^T

which tells us how much the basis vectors are rotated and scaled after applying this matrix.

4.2 Multiplication

Definition 84. Matrix multiplication of U and V can be presented in two ways:

UV = [ ⟨u_1, v^1⟩  ⟨u_1, v^2⟩  ···  ⟨u_1, v^n⟩ ;
       ⟨u_2, v^1⟩  ⟨u_2, v^2⟩  ···  ⟨u_2, v^n⟩ ;
       ···
       ⟨u_n, v^1⟩  ⟨u_n, v^2⟩  ···  ⟨u_n, v^n⟩ ]   (inner product)

   = Σ_{i=1}^n u^i ⊗ v_i   (outer product)

where the superscript indexes the columns and the subscript indexes the rows.
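A small numeric check of the two views (assuming square matrices for simplicity): entry (i, j) of UV is the inner product of row i of U with column j of V, and the whole product is the sum of outer products of column i of U with row i of V.

```python
import numpy as np

U = np.random.rand(3, 3)
V = np.random.rand(3, 3)

# Inner-product view: entry (i, j) = <row i of U, column j of V>.
inner_view = np.array([[U[i, :] @ V[:, j] for j in range(3)] for i in range(3)])

# Outer-product view: sum over i of (column i of U) ⊗ (row i of V).
outer_view = sum(np.outer(U[:, i], V[i, :]) for i in range(3))

print(np.allclose(inner_view, U @ V), np.allclose(outer_view, U @ V))   # True True
```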

4.3 Eigen

Definition 85. If v, λ satisfy Av = λv, then λ is an eigenvalue and v an eigenvector of A.

Av = λv

Av − λv = 0

(A − λI)v = 0

|A − λI| = 0

Figure 41: In this shear mapping the red arrow changes direction, but the blue arrow does not. The blue arrow is an eigenvector of this shear mapping because it does not change direction, and since its length is unchanged, its eigenvalue is 1.

Remark 56.

∏_i λ_i = det(A)

Σ_i λ_i = tr(A)

Remark 57. Let A be a square n × n matrix with n linearly independent eigenvectors v_i. Then A can be factorized as A = QΛQ^{-1}, where Q is the square n × n matrix whose i-th column is the eigenvector v_i of A, and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues, Λ_ii = λ_i. Note that only diagonalizable matrices can be factorized in this way; for example, a defective matrix cannot be diagonalized.

4.4 Inverse

• (AB)−1 = B−1A−1

• (ABC...)−1 = ...C−1B−1A−1

• (A−1)T = (AT )−1

• (rA)−1 = r−1A−1

4.5 Transpose

• (AB)T = BT AT

• (ABC...)T = ...CT BT AT

• (A + B)T = AT + BT

57 • (A−1)T = (AT )−1

• (rA)T = rAT

4.6 Trace

• tr(AB) = tr(BA)

• tr(ABC) = tr(BCA) = tr(CAB)

• tr(A + B) = tr(A) + tr(B)

• tr(AT ) = tr(A)

• tr(rA) = r tr(A)

• tr(P −1AP ) = tr(A)

4.7 Determinant

• det(AB) = det(A) det(B)

• det(A−1) = det(A)−1, if A is invertible

• det(AT ) = det(A)

• det(rA) = r^n det(A)

• det(A) = ∏_i λ_i

• det(A) = ∏_i a_ii if A is triangular or diagonal; the diagonal entries a_ii are then the eigenvalues of A

• det(A)A−1 = adj(A)

A is invertible ⇔ det(A) ≠ 0 ⇔ full rank ⇔ nonsingular ⇔ columns are linearly independent

A is non-invertible ⇔ det(A) = 0 ⇔ rank deficient ⇔ singular ⇔ columns are linearly dependent

4.8 Logarithm

• log(AB) = log A + log B

4.9 Matrix exponential and logarithm

Definition 86. Real exponential exp : R → R is defined by

exp(x) = Σ_{k=0}^∞ x^k / k! = x^0/0! + x^1/1! + x^2/2! + x^3/3! + ···

Definition 87. Matrix exponential exp : R^{n×n} → R^{n×n} is likewise defined by

exp(X) = Σ_{k=0}^∞ X^k / k! = X^0/0! + X^1/1! + X^2/2! + X^3/3! + ···

where X^k is the k-fold matrix product of X.

Proof. Given the ODE

d/dt f(t) = x f(t),

where t, x ∈ R and f(·) : R → R, we know that

f(t) = c · e^{xt}.

When it comes to a matrix X ∈ R^{2×2} and f(·) : R → R², we still expect the solution to have the same form, namely

d/dt f(t) = X f(t),    (12)

f(t) = e^{Xt} c⃗.    (13)

Substituting the definition of the matrix exponential into the equation above, we have

d/dt f(t) = d/dt ( e^{Xt} c⃗ )
          = d/dt Σ_{k=0}^∞ (X^k / k!) t^k c⃗
          = Σ_{k=1}^∞ (X^k / k!) k t^{k−1} c⃗
          = Σ_{k=1}^∞ ( X · X^{k−1} / (k−1)! ) t^{k−1} c⃗
          = X · Σ_{k=1}^∞ ( X^{k−1} / (k−1)! ) t^{k−1} c⃗
          = X e^{Xt} c⃗
          = X f(t),

which shows that the definition of the matrix exponential is consistent with the scalar exponential.

Definition 88. Real logarithm ln : R → R is defined by

ln(x) = Σ_{k=1}^∞ (−1)^{k+1} (x − 1)^k / k = (x − 1) − (x − 1)²/2 + (x − 1)³/3 − (x − 1)⁴/4 + ···

Definition 89. Matrix logarithm log : R^{n×n} → R^{n×n} is likewise defined by

log(X) = Σ_{k=1}^∞ (−1)^{k+1} (X − I)^k / k = (X − I) − (X − I)²/2 + (X − I)³/3 − (X − I)⁴/4 + ···

where (X − I)^k is the k-fold matrix product of X − I.
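A minimal sketch comparing the truncated series above with SciPy's matrix functions; the test matrix is chosen close to the identity so that the logarithm series converges, and the truncation lengths are arbitrary.

```python
import numpy as np
from scipy.linalg import expm, logm

def exp_series(X, terms=30):
    """Truncated series Σ X^k / k! for the matrix exponential."""
    out, term = np.zeros_like(X, dtype=float), np.eye(X.shape[0])
    for k in range(terms):
        out += term
        term = term @ X / (k + 1)
    return out

def log_series(X, terms=60):
    """Truncated series Σ (-1)^{k+1} (X - I)^k / k (converges when X is near I)."""
    D = X - np.eye(X.shape[0])
    out, power = np.zeros_like(X, dtype=float), np.eye(X.shape[0])
    for k in range(1, terms + 1):
        power = power @ D
        out += ((-1) ** (k + 1)) * power / k
    return out

X = np.eye(2) + 0.1 * np.array([[0.0, 1.0], [-1.0, 0.5]])
print(np.allclose(exp_series(X), expm(X)))    # True
print(np.allclose(log_series(X), logm(X)))    # True
```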

4.10 Decomposition

Definition 90. Orthogonal matrix. A matrix A ∈ R^{n×n} is called orthogonal if

A^{-1} = A^T

Remark 58. A is orthogonal means the rows or columns of A are orthonormal (orthogonal and of magnitude 1) in R^n, |det(A)| = 1, and ||Ax|| = ||x||. When A is orthogonal and det(A) = 1, it is called a rotation matrix.

Definition 91. Cholesky decomposition. Let A ∈ R^{n×n} be symmetric and positive definite. Then there exists a unique lower triangular matrix L ∈ R^{n×n} with strictly positive diagonal entries such that

A = LLT

Remark 59. Using the Cholesky decomposition, we solve LLT x = b instead of Ax = b by

1. Solving Lz = b for z using forward substitution

2. Solving LT x = z for x using backward substitution
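A minimal sketch of the two triangular solves of Remark 59, using NumPy's Cholesky factor and SciPy's triangular solver; the test system is random and made symmetric positive definite for illustration.

```python
import numpy as np
from scipy.linalg import solve_triangular

def cholesky_solve(A, b):
    """Solve Ax = b for symmetric positive definite A via A = L L^T."""
    L = np.linalg.cholesky(A)                      # lower triangular factor
    z = solve_triangular(L, b, lower=True)         # forward substitution:  L z = b
    x = solve_triangular(L.T, z, lower=False)      # backward substitution: L^T x = z
    return x

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)                        # symmetric positive definite
b = rng.standard_normal(5)
print(np.allclose(cholesky_solve(A, b), np.linalg.solve(A, b)))   # True
```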

Definition 92. LU decomposition. Let A ∈ R^{n×n} be invertible and diagonally dominant, namely ∀i, |a_ii| ≥ Σ_{j≠i} |a_ij|. Then there exist a lower triangular matrix L ∈ R^{n×n} and an upper triangular matrix U ∈ R^{n×n} such that A = LU.

Remark 60. Using the LU decomposition, we solve LUx = b instead of Ax = b by

1. Solving Lz = b for z using forward substitution

2. Solving Ux = z for x using backward substitution

Definition 93. Eigendecomposition. Let A be a square n × n matrix with n linearly independent eigenvectors vi. Then A can be factorized as

A = QΛQ−1,

where Q is the square n × n matrix whose i-th column is the eigenvector vi of A, and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues, Λii = λi.

Remark 61. When A is an n × n real symmetric matrix, it can be decomposed as

A = QΛQ^T,

since the eigenvalues are real and the eigenvectors can be chosen orthonormal, so Q is an orthogonal matrix.

60 Remark 62. A square n×n matrix A is called diagonalizable or nondefective if there exists an invertible matrix P such that P −1AP is a diagonal matrix. Only diagonalizable matrices can be eigendecomposed.

Definition 94. Singular value decomposition (SVD). Let A ∈ R^{n×n}. Then there exist orthogonal matrices U, V ∈ R^{n×n}, UU^T = I, VV^T = I, and a diagonal matrix Σ = diag(σ_1, ··· , σ_n), ∀σ_i ≥ 0, such that A = UΣV^T, where the σ_i are called singular values.

Figure 42: Illustration of the singular value decomposition UΣV^T of a real 2 × 2 matrix M. Bottom left: the action of V^T is a rotation. Bottom right: the action of Σ is a scaling by the singular values σ_1 horizontally and σ_2 vertically. Top right: the action of U is also a rotation.

Remark 63. The SVD can be thought of as decomposing a matrix into a weighted, ordered sum of separable matrices. By separable, we mean a matrix that can be written as an outer product of two vectors, A = uv^T. Specifically, the matrix A can be decomposed as

A = Σ_{i=1}^n σ_i · u_i · v_i^T,

where u_i and v_i are the i-th columns of U and V, and the σ_i are the ordered singular values. As a result, the SVD is widely used in PCA.

Remark 64. For a real positive definite symmetric matrix:

• The columns of U are eigenvectors of AAT .

• The columns of V are eigenvectors of AT A.

• The non-zero singular values from Σ are the square roots of the non-zero eigenvalues of AAT or AT A.

Remark 65. Using the SVD decomposition, we solve Ax = b by

x = V Σ−1U T b

Example 34. SVD in image compression. The figure below is of size 450 × 333, so we can store this image as a matrix A ∈ R^{450×333}.

Figure 43: Original image

Now we can decompose A by SVD as below

A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ··· + σ_n u_n v_n^T,

where n = min{450, 333} and the u_i v_i^T are matrices of rank 1, assuming σ_1 ≥ σ_2 ≥ ··· ≥ σ_n ≥ 0. The figure below shows the effect of adding more terms back into the image.

Figure 44: From left to right: only keep the first term; keep the first five terms; keep the first twenty terms; keep the first fifty terms

Without any compression techniques, we need to store 450 × 333 = 149,850 values to represent the image. With the SVD, if we keep the first fifty terms, only (1 + 450 + 333) × 50 = 39,200 values need to be saved, which is about 26% of the original size.
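A minimal sketch of this rank-k truncation with NumPy; the matrix here is a random stand-in for the photograph in Figure 43.

```python
import numpy as np

def svd_compress(A, k):
    """Keep the k largest singular values: A ≈ Σ_{i<k} σ_i u_i v_i^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s is sorted descending
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.rand(450, 333)          # stand-in for the 450 x 333 image
A50 = svd_compress(A, 50)
stored = (1 + 450 + 333) * 50         # sigma_i, u_i, v_i for the first 50 terms
print(stored, stored / (450 * 333))   # 39200, ~0.26
```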

Definition 95. Diagonalization. Let A ∈ R^{n×n}. If there exists an invertible matrix P ∈ R^{n×n} such that

P^{-1}AP is diagonal, then A is diagonalizable.

4.11 Geometric Transformations

Transformation | Matrix | #DoF | Preserves
Translation | [1, 0, t1; 0, 1, t2; 0, 0, 1] | 2 | orientation
Scaling | [s1, 0, 0; 0, s2, 0; 0, 0, 1] | 2 | orientation
Shearing | [1, s1, 0; s2, 1, 0; 0, 0, 1] | 2 | orientation
Rotation | [cos θ, −sin θ, 0; sin θ, cos θ, 0; 0, 0, 1] | 1 | lengths
Affine | [a11, a12, a13; a21, a22, a23; 0, 0, 1] | 6 | parallelism

Remark 66. Transformations are combined by matrix multiplication, and the order matters, especially when a translation is involved. For example,

[1, 0, t1; 0, 1, t2; 0, 0, 1] · [cos θ, −sin θ, 0; sin θ, cos θ, 0; 0, 0, 1] · [s1, 0, 0; 0, s2, 0; 0, 0, 1]
= [cos θ, −sin θ, t1; sin θ, cos θ, t2; 0, 0, 1] · [s1, 0, 0; 0, s2, 0; 0, 0, 1]
= [s1 cos θ, −s2 sin θ, t1; s1 sin θ, s2 cos θ, t2; 0, 0, 1],

whereas

[cos θ, −sin θ, 0; sin θ, cos θ, 0; 0, 0, 1] · [s1, 0, 0; 0, s2, 0; 0, 0, 1] · [1, 0, t1; 0, 1, t2; 0, 0, 1]
= [s1 cos θ, −s2 sin θ, 0; s1 sin θ, s2 cos θ, 0; 0, 0, 1] · [1, 0, t1; 0, 1, t2; 0, 0, 1]
= [s1 cos θ, −s2 sin θ, s1 t1 cos θ − s2 t2 sin θ; s1 sin θ, s2 cos θ, s1 t1 sin θ + s2 t2 cos θ; 0, 0, 1].
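A quick numeric confirmation that the two orderings above differ; the angle, scales, and translations are arbitrary illustrative values.

```python
import numpy as np

theta, s1, s2, t1, t2 = 0.3, 2.0, 0.5, 1.0, -1.0
c, s = np.cos(theta), np.sin(theta)

T = np.array([[1, 0, t1], [0, 1, t2], [0, 0, 1.0]])
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])
S = np.diag([s1, s2, 1.0])

print(T @ R @ S)        # translation applied after rotation and scaling
print(R @ S @ T)        # translation applied first, then rotated and scaled
print(np.allclose(T @ R @ S, R @ S @ T))   # False: the order matters
```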

Definition 96. Affine transformation is a geometric transformation of a Euclidean space that preserves lines and parallelism (but not necessarily distances and angles).

Remark 67. Affine transformations include scaling, rotation, translation, shear mapping, reflection, or compositions of them in any combination and sequence.

Definition 97. Rigid transformation is a geometric transformation of a Euclidean space that preserves the Euclidean distance between every pair of points.

Remark 68. Rigid transformations include rotation, translation, reflection, or compositions of them in any combination and sequence.

4.12 Matrix Derivative

• For f(x) : R^n → R, with derivative f'(x) ∈ R^n, we have:

∂/∂x (b^T x) = ∂/∂x (x^T b) = b

∂/∂x (x^T x) = 2x

∂/∂x (x^T A x) = (A + A^T)x  (= 2Ax when A is symmetric)

∂/∂x ||Ax − b||²_2 = 2A^T(Ax − b)

∂/∂x ||Ax − b||_2 = A^T(Ax − b) / ||Ax − b||_2

• For f(X) : R^{n×m} → R, with derivative f'(X) ∈ R^{n×m}, we have:

∂/∂X (a^T X b) = ab^T

∂/∂X (a^T X^T b) = ba^T

∂/∂X tr(A^T X B) = AB^T

∂/∂X tr(A^T X^T B) = BA^T

∂/∂X |X| = |X| (X^{-1})^T
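A finite-difference sanity check of one identity from each block above; the random matrices and the step size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((5, n))
b = rng.standard_normal(5)
x = rng.standard_normal(n)
eps = 1e-6

# Check d/dx ||Ax - b||^2 = 2 A^T (Ax - b) by central differences.
f = lambda x: np.sum((A @ x - b) ** 2)
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(num_grad, 2 * A.T @ (A @ x - b), atol=1e-5))   # True

# Check d/dX (a^T X b) = a b^T the same way.
a2, b2, X = rng.standard_normal(n), rng.standard_normal(m), rng.standard_normal((n, m))
g = lambda X: a2 @ X @ b2
num = np.zeros((n, m))
for i in range(n):
    for j in range(m):
        E = np.zeros((n, m)); E[i, j] = eps
        num[i, j] = (g(X + E) - g(X - E)) / (2 * eps)
print(np.allclose(num, np.outer(a2, b2), atol=1e-5))   # True
```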

References

[1] Luenberger, D.G., 1997. Optimization by vector space methods. John Wiley & Sons.

[2] Fletcher, P.T., Pizer, S.M. and Joshi, S., 2004. Statistical variability in nonlinear spaces: Application to shape analysis and DT-MRI (pp. 6469-6469). University of North Carolina at Chapel Hill.

[3] Hao, X., 2014. Improved segmentation and analysis of white matter tracts based on adaptive geodesic tracking. The University of Utah.

[4] Zhang, M., 2016. Bayesian models on manifolds for image registration and statistical shape analysis. The University of Utah.

[5] Singh, N.P., 2013. Multivariate regression of shapes via deformation momenta: Application to quan- tifying brain atrophy in aging and dementia. The University of Utah.

[6] Yger, F., Berar, M. and Lotte, F., 2016. Riemannian approaches in brain-computer interfaces: a review. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(10), pp.1753-1762.

[7] Bhatia, R., 2009. Positive definite matrices (Vol. 24). Princeton university press.

[8] Robbin, J.W. and Salamon, D.A., 2011. Introduction to differential geometry. ETH, Lecture Notes, preliminary version.

[9] Lecture 4 Note: Surface Theory I. PDF file. Spring, 2013. https://graphics.stanford.edu/ courses/cs468-13-spring/assets/lecture4-suteparuk.pdf

[10] Lecture 10: Computing Geodesics. PDF file. Spring, 2013. https://graphics.stanford.edu/ courses/cs468-13-spring/assets/lecture10-lukaczyk.pdf

[11] Lecture 15: Isometries, Rigidity and Curvature. PDF file. Spring, 2013. https://graphics. stanford.edu/courses/cs468-13-spring/assets/lecture15-lindsey.pdf

[12] Lecture 19: Conformal Geometry. PDF file. Spring, 2013. https://graphics.stanford.edu/ courses/cs468-13-spring/assets/lecture19-buganza-tepole.pdf

[13] Introduction to RKHS, and some simple kernel algorithms. PDF file. October 16, 2019. http:// www.gatsby.ucl.ac.uk/~gretton/coursefiles/lecture4_introToRKHS.pdf

[14] Tensor Calculus 17: The Covariant Derivative (flat space). YouTube. September 25, 2018. https: //www.youtube.com/watch?v=U5iMpOn5IHw&list=LLVT8QysnCZqdisAyo0CI0-g&index=2

[15] Tensor Calculus 18: Covariant Derivative (extrinsic) and Parallel Transport. YouTube. October 16, 2018. https://www.youtube.com/watch?v=Af9JUiQtV1k&list=LLVT8QysnCZqdisAyo0CI0-g& index=2&t=1482s

[16] Tensor Calculus 19: Covariant Derivative (Intrinsic) and Geodesics. YouTube. October 27, 2018. https://www.youtube.com/watch?v=EFKBp52LtDM&t=146s

66 [17] Tensors for Beginners 9: The Metric Tensor. YouTube. January 27, 2018. https://www.youtube. com/watch?v=C76lWSOTqnc

[18] PROPERTIES OF THE DFT. PDF file. http://eeweb.poly.edu/iselesni/EL713/zoom/ dftprop.pdf

[19] Riemannian Geometry: Definitions, Pictures, and Results. PDF file. https://arxiv.org/pdf/ 1412.2393.pdf

[20] Karcher, H., 1977. Riemannian center of mass and mollifier smoothing. Communications on pure and applied mathematics, 30(5), pp.509-541.

[21] Fletcher, P.T., Statistical Variability in Nonlinear Spaces: Application to Shape Analysis and DT- MRI, Ph.D. Thesis, Department of Computer Science, University of North Carolina, August 2004.
