
J Math Imaging Vis (2009) 33: 39–65 DOI 10.1007/s10851-008-0104-3

Probabilistic Models for Shapes as Continuous Curves

Jeong-Gyoo Kim · J. Alison Noble · J. Michael Brady

Published online: 23 July 2008 © Springer Science+Business Media, LLC 2008

Abstract  We develop new shape models by defining a standard shape from which we can explain shape deformation and variability. Currently, planar shapes are modelled using a space which is applied to data extracted from images. We regard a shape as a continuous curve and identify it on the Wiener space, whereas previous methods have primarily used sparse sets of landmarks expressed in a Euclidean space. The average of a sample set of shapes is defined using measurable functions which treat the Wiener measure as varying Gaussians. Various types of invariance of our formulation of an average are examined with regard to practical applications of it. The average is examined in relation to a Fréchet mean in order to establish its validity. In contrast to a Fréchet mean, however, the average always exists and is unique in the Wiener space. We show that the average lies within the range of deformations present in the sample set. In addition, a measurement, which we call a quasi-score, is defined in order to evaluate "averages" computed by different shape methods, and to measure the overall deformation in a sample set of shapes. We show that the average defined within our model has the least spread compared with methods based on eigenstructure. We also derive a model to compactly express shape variation which comprises the average generated from our model. Some examples of average shape and deformation are presented using well-known datasets and we compare our model to previous work.

Keywords  Shape space · Average over a function space · Wiener measure space · Fréchet mean

J.-G. Kim (✉) · J.A. Noble · J.M. Brady
Dept. Engineering Science, Oxford University, Oxford OX1 3PJ, UK
e-mail: [email protected]

1 Introduction

Our work contributes to shape analysis, particularly in medical image analysis, by defining a standard representation of shape. Specifically, we develop new mathematical models of planar shapes. Our aim is to develop a standard representation of anatomical or biological shapes that can be used to explain shape variations, whether in normal subjects or in abnormalities due, for example, to disease. In addition, we propose a quasi-score that measures global deformation and provides a generic way to compare statistical methods of shape analysis.

As D'Arcy Thompson pointed out in [50], there is an important relationship between the shape of a biological structure and its function. To understand the identity, structure and function of anatomical objects non-invasively, one must study their shape as it appears in images. One of the principal problems in medical image analysis is the description of shapes in a way that represents biological variability. This is often termed shape variation or deformation. Evidently, deformation has to be explained in terms of correspondences between shapes.

There has been considerable progress in shape analysis during the last two decades. Shape has often been defined by a set of landmarks, significant points on a shape, which are usually selected manually. For this reason, sparse sets of landmarks tend to be used. We define a shape space as an underlying set of shape representations. The representation chosen for a shape space is important, because manipulations of shape are determined in large part by how shape information is represented. Several authors have provided precise definitions of shape and shape space. These include differential manifolds [4, 12, 13, 20, 27, 33], eigenstructures [6, 7], and function spaces [3, 22].
Kendall has defined shape as "what is left when the differences which can be attributed to translations, rotations, and dilations have been quotiented out" [20]. Kendall's approach, using a Procrustean metric, has been a cornerstone for a great deal of shape related research, and has been adopted in many applications, both theoretically and practically.

In Kendall's approach, shapes characterised by landmarks form a Riemannian manifold with the Procrustean metric; a shape is represented by a point on a sphere. The Procrustes analysis that Kendall [20] employed on a Riemannian manifold has been used for Euclidean space in numerous methods. On the other hand, Bookstein [4] considers triangular shapes (characterised by three points) which form a differential manifold, a sphere but with a different metric from Kendall's: the Poincaré plane. Both shape spaces have been developed and statistics of the resulting spaces have been investigated by many researchers, sometimes separately [24, 26, 27], sometimes comparatively [10, 28, 29, 41]. Pennec et al. [33, 34] regard shapes as a combination of a feature (such as a point or curve) and a transformation (rigid-body). The feature set and the transformation set constitute differential manifolds, respectively, with invariant metrics. Very recently and independently of our own work, Pennec [32] studies statistics on Riemannian manifolds, focusing on Fréchet means.

Cootes et al. [6, 7] use generalised Procrustes analysis [16] and propose a shape model in terms of an eigenspace by using Principal Component Analysis (PCA). In this approach, a shape is represented by the eigenvectors of the covariance matrix of the locations of landmarks, with weights as parameters. The weights express shape variations under the assumption that these are independent and follow a Gaussian distribution. This approach results in a huge reduction of the dimensionality of their shape space. For this reason, the method is computationally efficient and easy to apply and test. It has been widely applied, especially in medical image analysis [2, 8, 9, 53].

Fletcher et al. [12, 13] represent shapes in terms of medial axis descriptions and developed a model on a differential manifold of the descriptors. They use eigenstructure on the differential manifolds to capture shape variability. On the other hand, Klassen et al. [22] built another type of differential manifold on L2. In their work, a single parameter, such as a direction function or the curvature of contours, is described on a Hilbert manifold. They used a mean in terms of Karcher [18], in fact a Fréchet mean [14]. In their shape space, analytic expressions for geodesics are not presented, so their model is difficult to apply in practice. This is a common limitation from which differential geometric approaches suffer.

Sparr [42–44] formulated both a shape representation and a basis for the shape space for a finite number of points, a polyhedron. The shape space is designed to be invariant with respect to affine transformations, and so is named the Affine Shape. Berthilsson and Åström [3] extended Sparr's idea for shapes represented by finitely many points to shapes represented by continuous curves.

We contend that methods using sparse sets of landmarks are suitable only for shapes that can be characterised by a small number of landmarks. Some geometric shapes have evident landmarks that are sufficient to characterise them. For example, the planar shape of 3 pyramids in an aerial view in Fig. 1 (left) is perfectly captured by the four landmarks shown on each of them. However, this approach is of questionable relevance for anatomical shapes, such as that shown in Fig. 1 (right), which tend to have very few distinctive landmarks. It is precisely this kind of shape that is the focus of this paper.

There are widely recognised fundamental problems in methods that depend on sparse sets of landmarks. First, locating landmarks on images is not only time-consuming but also, especially for noisy medical images, often requires expert knowledge. Second, landmarks are not only used to characterise a particular shape but are also used to match corresponding points over the whole sample set of shapes. In methods that use a sparse set of landmarks, mis-represented shapes can lead to insufficient or false analysis. For this reason, recent work has tended to use denser sets of landmarks.

Fig. 1 Landmarking annotated by ∗: planar shapes of 3 pyramids (left) and a femoral head (right)

Fig. 2 Femurs: (left) each femoral shape consists of 31 landmarks; (right) every 5th point is marked by + and their average by ∗; the average of the other points by ·

However, the fundamental problems persist, since the same techniques are applied. Also, to overcome the first of these difficulties, there have been attempts to automate landmarking [8, 9, 23, 49], but to date these techniques are limited in accuracy and the correspondence problem remains.

Methods of shape analysis increasingly tend to be developed within a statistical framework. In such methods, landmarks are labelled points (locations) as shown in Fig. 1. After being filtered by Procrustes analysis, corresponding landmarks are grouped according to a set of labels. That is, a set of corresponding landmarks across all shapes of a sample set is formed for each label. For each label, an average location is estimated, usually assuming that the sample is uniformly distributed. Then the shape configured by the collection of these average locations corresponding to each label is called the "average" shape. Indeed, most methods employing an average shape assume a uniform distribution¹ over shapes, regardless of the structure of their shape spaces. Such methods are known to lead to average shapes which are not representative of actual shapes (see for example Fig. 2), or which deviate from the majority of the population of a sample set [7]. This is particularly the case for shapes having no, or only a few, evident landmarks. We have observed that the average with a uniform distribution does not adequately reflect the extent of a deformation found in a sample set, particularly when the deformation is large.

Other methods, not necessarily using a distribution, adopt the approach of a Fréchet mean: a minimiser of a variance-like functional defined on a metric space [14]. A fundamental problem with the use of a Fréchet mean is that its existence and uniqueness are not guaranteed. There has been a study of its existence [18] under quite restrictive assumptions, and of its uniqueness with even more assumptions [26]; but both are limited to a narrow set of structures in Riemannian manifolds. Due to the lack of existence and uniqueness of a Fréchet mean, one needs more than one term for a mean and has to distinguish them, as in [28] (a mean of shapes, the shape of a mean) and [33] (a mathematical mean, an empirical mean).

1.1 The proposed model

Our model developed in Sect. 4 mainly focuses on solving the problems outlined above. First, we contend that the mathematical precision of the definition of shape, and the numerical measurement of shape variability, remain inadequate. Since our primary interest is in shapes which tend to have few evident landmarks, we prefer to regard a planar shape as a continuous curve, and this leads to the construction of a shape space that is totally different from previous methods.

Very recently, researchers have recognised that a shape should be defined as a continuous function, and there are methods where shape is viewed in terms of continuous contours [3, 22], but no one has suggested a concrete structure on continuous functions to explain deformation. For example, Berthilsson and Åström [3] present a precise definition of the shape space but concentrate on making it affine invariant; Klassen et al. [22] use a Fréchet mean on a Hilbert manifold. We are not aware of methods for generating a standard representation of shapes expressed in terms of continuous curves.

¹ An average shape of n shapes having m landmarks is the collection of points (x̄¹, ..., x̄ᵐ), where x̄ⁱ = (1/n) Σ_{j=1}^n xⱼⁱ and xⱼⁱ is the i-th landmark of shape j.
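For concreteness, the conventional landmark-wise average of footnote 1 can be computed in a few lines. The sketch below is not part of the original paper; it assumes a hypothetical array layout with shapes already aligned (e.g. by Procrustes analysis), but it is the construction that the comparisons in Fig. 2 and Sect. 7 refer to as the "average with a uniform distribution".

```python
import numpy as np

def landmarkwise_average(shapes):
    """Conventional 'average' shape of footnote 1.

    shapes : array of shape (n, m, 2) -- n shapes, each given by m
             corresponding (x, y) landmarks, assumed already aligned.
    Returns the (m, 2) array of per-label mean locations.
    """
    shapes = np.asarray(shapes, dtype=float)
    return shapes.mean(axis=0)   # x̄^i = (1/n) Σ_j x_j^i for each label i

# toy usage: three aligned triangles
sample = np.array([
    [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]],
    [[0.1, 0.0], [1.1, 0.1], [0.5, 0.9]],
    [[-0.1, 0.1], [0.9, 0.0], [0.6, 1.1]],
])
print(landmarkwise_average(sample))
```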
Conventional methods of building a shape space of continuous functions mix it up with the structure of a sparse set of landmarks, probably due to the difficulty of manipulating a complicated infinite-dimensional space, as pointed out in [31], and of finding a distribution and an average. Hence, they lack structural consistency in our view.

In the following, we concentrate on how to formalise normal shape, and how to describe and to measure shape changes due to disease or growth over time. Apart from this primary consideration, we try to develop our model so that it does not suffer from correspondence problems, but to date we have found it difficult to judge this objectively.

We are using a well-known function space for the shape space of the proposed model. The shape space for our model, C0[a,b], where [a,b] is a closed bounded interval of real numbers, is chosen for the following reasons:

• It is a metric space with the L²-norm, which is sensible for practical use; there is no difficulty in defining a metric, as there is in shape models built on differential manifolds.
• We define a shape by its continuous boundary; the contour is represented by a continuous function, a member of the shape space C0[a,b].

2 Mathematical Background

We regard univariate functions as paths in Brownian motion on the Euclidean space R2. Primarily to establish notation, we briefly introduce the Wiener measure space, measurable functions on it, and metrics on the Wiener space. We mostly follow the notation in [17], where a compact description of the Wiener measure space may be found.

2.1 The Wiener measure space

The Wiener measure space originates from Brownian motion, that is, the motion of small particles suspended in a fluid. Einstein used kinetic theory to derive the diffusion equation ∂ρ/∂t = DΔρ for such motions in terms of fundamental parameters of the particles and the liquid, where D is a diffusion constant. The normalised one-dimensional version of Einstein's probabilistic formula is

Prob({α < x(t) ≤ β}) = (1/√(2πt)) ∫_α^β e^{−u²/(2t)} du.

A function of the form f(x(t1), ..., x(tn)) depends only on the values of the function x at n points. This does not cause any problem in the application of our model since, in practice, shapes in medical images are represented by discrete sets of data as determined by the image sampling.

Fig. 3 Membership of a cylinder set E = {x ∈ C0[a,b] : α1 < x(t1) ≤ β1}: the three curves depicted by solid lines are members of E, while the curve depicted by a dashed line is not

Let us denote the collection of cylinder sets by I. With the Carathéodory extension process, the set function m turns out to be a countably additive measure on σ(I), the σ-algebra generated by I. In fact, σ(I) = B(C0[a,b]), where B(C0[a,b]) is the Borel class of C0[a,b]. The completion (C0[a,b], W, m) of (C0[a,b], σ(I), m) is called the Wiener measure space. A member of W is called a Wiener measurable set.

Of the many properties of the Wiener measure space, we state only Wiener's integration formula, since it will be used in subsequent sections. Let us for now fix t = (t1, ..., tn) such that a < t1 < ··· < tn ≤ b.

2.2 Measurable Functions on the Wiener Space

We use a system of measurable functions defined on the Wiener space. Let T be a linearly ordered subset of [a,b] and let us define Xt on the Wiener space:

Xt : C0[a,b] → R,   x → Xt(x) := x(t),   (2.5)

where t ∈ T. Then Xt is Wiener measurable on C0[a,b] [54]. Obviously, Xa(x) = 0 as x(a) = 0 for all x ∈ C0[a,b]. Xt is defined on a function space and assigns each function x to a scalar x(t). Figure 4 shows the function Xt, whose value at x is depicted by a dot on x itself.

Fig. 4 Illustration of the function Xt. Xt maps the continuous curve x to a scalar x(t), the value of x at t, which is depicted by a dot on the curve
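As an illustration of the objects just introduced (not from the original paper): under the Wiener measure the value Xt(x) = x(t) is a centred Gaussian with variance t − a, as the normalised Einstein formula above and the density appearing later in (4.4) indicate, so the measure of a one-parameter cylinder set is a difference of Gaussian distribution functions. A minimal numerical sketch:

```python
import numpy as np
from scipy.stats import norm

def cylinder_probability(alpha, beta, t, a=0.0):
    """Wiener measure of the cylinder set {x in C0[a,b] : alpha < x(t) <= beta}.

    Under the Wiener measure, X_t(x) = x(t) is Gaussian with mean 0 and
    variance t - a, so the probability is a difference of normal CDFs.
    """
    sigma = np.sqrt(t - a)
    return norm.cdf(beta, scale=sigma) - norm.cdf(alpha, scale=sigma)

# the whole space has measure 1; a symmetric band around 0 at t = 0.5
print(cylinder_probability(-np.inf, np.inf, t=0.5))   # 1.0
print(cylinder_probability(-1.0, 1.0, t=0.5))         # about 0.8427
```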

3 Fréchet Mean over the Wiener Measure Space

We noted earlier that some researchers have approximated an "average" shape using Fréchet's concept of mean. Such methods do not provide an explicit definition of an average. Moreover, the non-existence and non-uniqueness of a Fréchet mean are major concerns [26–28, 32, 33]. A number of requirements have been sought under which a Fréchet mean exists and is unique [18, 24–26, 28] for the metrics employed in models. The conditions suggested by Karcher [18] and Le [24–26, 28] are, however, limited to narrow sets in the structure of Riemannian manifolds. For other structures, its existence and uniqueness are still unknown.

In contrast, the concept of an average function introduced in this paper provides not only an explicit definition but also guarantees its uniqueness without imposing additional conditions. In order to distinguish clearly these two concepts, we reserve the term average for the one defined with respect to the Wiener measure and mean for that defined in the Fréchet sense. In this section, the population average of the Wiener space is examined in connection with a Fréchet mean.

Remark 3.1 In Fréchet's claim, the existence of the first notion of a mean is required to use the property that the two notions coincide. From his claim, one however does not yet know whether or not the first notion MX exists, or even how to calculate it.

Although Fréchet only provided global meanings for the two notions of a mean for a function space, the second notion can in fact be denoted by

F(y) := ∫_M d²(x, y) dμ(x),   (3.1)

where d is a metric and μ is a measure on the space M, respectively. Then a Fréchet mean can be denoted by ŷ := arg min F(y). We refer to F in (3.1) as a Fréchet functional. In our shape space in the next subsections, the Wiener measure will be used for μ in (3.1).

3.2 Population Average of the Wiener Space

In our shape space, we noted in Remark 2.1 that for each t ∈ (a, b], the average value of {x(t) : x ∈ C0[a,b]} is zero.

For t = a, the average value of {x(a) : x ∈ C0[a,b]} is zero since x(a) = 0 for every x ∈ C0[a,b]. Hence, the function consisting of the average value at each t is a zero function. That is, the population average over C0[a,b] is the zero function x ≡ 0.

3.3 Relationship Between the Population Average and a Fréchet Mean

The theorem and corollary in this section show that the average function x ≡ 0 with respect to the Wiener measure is a Fréchet mean with the metric ‖·‖₂ on C0[a,b].

Fréchet's hypotheses in Sect. 3.1 about the conditions under which the two notions of a mean coincide do not precisely describe the concept of measurability of a random variable. Also, the existence (as a finite real number) of his first notion of a mean is taken as one of the hypotheses in his claim, as noted in Remark 3.1. Without the hypothesis being satisfied, one cannot use the property that the two notions of a mean coincide. Moreover, the uniqueness of the second notion does not appear to have been discussed. For these reasons, we now present the proof of the existence of his second notion of mean, as well as the coincidence of the two notions, for the case of the Wiener measure. First, we need to recall a fact about the σ-algebra of a product space.

Proposition 3.1 ([17]) If X and Y are separable metric spaces, then the Borel class of X × Y, i.e., B(X × Y), equals the product σ-algebra B(X) ⊗ B(Y), where

B(X) ⊗ B(Y) := σ(B₁ × B₂ : B₁ ∈ B(X), B₂ ∈ B(Y)).

Theorem 3.1 Let Gʸ : C0[a,b] → R be defined by Gʸ(x) := ‖x − y‖₂², where y is a fixed element of C0[a,b] and ‖·‖₂ is the L²-norm. Then Gʸ is Wiener integrable and

F(y) := ∫_{C0[a,b]} Gʸ(x) dm(x) = (b − a)²/2 + ‖y‖₂².   (3.2)

Proof From the definitions of Gʸ and ‖·‖₂, we have

F(y) = ∫_{C0[a,b]} Gʸ(x) dm(x) = ∫_{C0[a,b]} ‖x − y‖₂² dm(x) = ∫_{C0[a,b]} ∫_a^b [x(t) − y(t)]² dt dm(x).   (3.3)

We now wish to change the order of integrals in (3.3), and to this end we check whether the Tonelli theorem can be applied to the last of these integrals. The integrand is [x(t) − y(t)]² and its integral domain is C0[a,b] × [a,b], in the form required by the Tonelli theorem.

(1) C0[a,b] and [a,b] are finite measure spaces, with respectively the Wiener and Lebesgue measures. A finite measure space is obviously σ-finite.

(2) Measurability of the integrand needs to be checked. For simplicity, we decompose the integrand into three functions G₁, G₂ and G₃, defined by:

G₁(x, t) := x(t),
G₂(x, t) := x(t) − y(t), where y ∈ C0[a,b] is a fixed element,
G₃(x, t) := {x(t) − y(t)}².

Evidently, G₁ is a continuous function on the product space C0[a,b] × [a,b] [17]. For any fixed element y, G₂ is also continuous on the product space. It follows that the function G₃ is also continuous on the product space C0[a,b] × [a,b], since it is a composition of G₂ and a polynomial of order 2. That is, the function G₃, which is the integrand of (3.3), is continuous on C0[a,b] × [a,b] and so it is B(C0[a,b] × [a,b])-measurable, since all continuous functions are measurable. However, C0[a,b] and [a,b] are both separable metric spaces [54], and so B(C0[a,b] × [a,b]) = B(C0[a,b]) ⊗ B([a,b]) by Proposition 3.1. Hence, G₃ is B(C0[a,b]) ⊗ B([a,b])-measurable and so W ⊗ Leb.([a,b])-measurable, where W is the completion of B(C0[a,b]) in Definition 2.1.

The claims in (1) and (2) imply that the integrand fulfills the assumptions of the Tonelli theorem. This means that we can change the order of the integrals in (3.3) and have

F(y) = ∫_a^b ∫_{C0[a,b]} [x(t) − y(t)]² dm(x) dt
     = ∫_a^b ∫_{C0[a,b]} [x(t)² + y(t)² − 2x(t)y(t)] dm(x) dt
     = ∫_a^b [ ∫_{C0[a,b]} x(t)² dm(x) + ∫_{C0[a,b]} y(t)² dm(x) − 2y(t) ∫_{C0[a,b]} x(t) dm(x) ] dt
     = ∫_a^b {(t − a) + y(t)²} dt
     = (b − a)²/2 + ‖y‖₂².   (3.4)

By Proposition 2.2 and Corollary 2.1, (3.4) follows and we have the desired result. □

A similar but simpler argument may be found in Chap. 3 of [17]. Note that Gʸ is a continuous function of x and non-negative, so it is clear that ∫_{C0[a,b]} Gʸ(x) dm(x) in (3.2) is defined.
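Theorem 3.1 can also be checked numerically by Monte Carlo simulation of discretised Brownian paths. The sketch below is not from the paper; the grid size, path count and test function y are arbitrary choices. It compares the sample mean of ‖x − y‖₂² with the closed form (b − a)²/2 + ‖y‖₂² of (3.2).

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 0.0, 1.0, 500                 # interval [a, b], grid size
t = np.linspace(a, b, n + 1)
dt = (b - a) / n

def brownian_paths(n_paths):
    """Discretised Brownian paths x with x(a) = 0 (rows are paths)."""
    steps = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
    return np.hstack([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)])

def l2_sq(f):
    """Squared L2-norm on [a, b] by the trapezoidal rule."""
    return np.trapz(f ** 2, t, axis=-1)

y = np.sin(2 * np.pi * t)               # any fixed continuous y with y(a) = 0
paths = brownian_paths(20000)
monte_carlo = l2_sq(paths - y).mean()   # estimate of F(y)
theorem_31 = (b - a) ** 2 / 2 + l2_sq(y)
print(monte_carlo, theorem_31)          # both close to 1.0 for this y
```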

Corollary 3.1 The average function, x ≡ 0, over C0[a,b] with respect to the Wiener measure is a Fréchet mean with the metric ‖·‖₂, and so a Fréchet mean over C0[a,b] exists. Furthermore, it is unique in C0[a,b].

Proof Let F be the functional defined in (3.2). Then it is well-defined and F(y) = (b − a)²/2 + ‖y‖₂² by Theorem 3.1. A Fréchet mean over C0[a,b] with respect to the Wiener measure can be represented by arg min F(y).

Clearly, the minimum of F(y) is achieved when ‖y‖₂ equals zero. This implies that y(t) = 0 almost everywhere in the Lebesgue measure. However, the only continuous function that is equal to zero almost everywhere is the one identical to zero everywhere, i.e. a zero function in C0[a,b]. Hence, the average function x ≡ 0 over C0[a,b] achieves the minimum of F(y) uniquely and is a Fréchet mean. □

In conclusion, the trajectory of local average values at each t composes the global minimiser of the functional F in (3.1) by which a Fréchet mean is defined on the Wiener space. The average shape built in the next section will be examined with this idea in Sect. 5.

4 Modelling of a Sample Average

In this section we rigorously develop an average of a set of shapes represented by continuous contours. Section 4.1 presents the concept of an average, Sect. 4.2 how to describe deformations of a shape present in a sample set, and Sect. 4.3 how to formalise an average shape. For practical applications of our formula, various types of its invariance are examined in Sect. 4.4.

4.1 Shape Space and the Concept of an Average

The shape space of our model is the set of continuous functions, the Wiener space C0[a,b]. The construction of an average function of a set of continuous functions is not a trivial problem. Here is the main idea in brief.

We are given a set of continuous functions, a sample set. We want to evaluate an average of this sample set. This in turn is expected to be a function on the interval, i.e., an average is a newly defined function. To this end, we need to assign a value for each point on the interval; t → x̄(t) for each t ∈ [a,b]. The problem can thus be interpreted as how to evaluate the values at each t which constitute an average function. It is desirable that the function defined in this way is continuous, i.e. a member of the underlying set, hence of our shape space.

We introduce the formula for an average defined in a measure space which we use in our model. We refer to [40] for the definition. Let (M, M, μ) be a measure space with μ(M) < ∞. By an average of a measurable function f on a measurable set E in M we mean

f̄_E := (∫_E f dμ) / μ(E),   (4.1)

where μ(E) > 0 [40]. We will use this formula for the system of measurable functions {Xt} in (2.5) and on the Wiener measure space (C0[a,b], W, m).

4.2 Cylinder Set

Let us now assume that we are given a sample set C ⊆ C0[a,b]. The Wiener space is constructed on the basis of cylinder sets, characterised by finitely many parameters t ∈ [a,b]. If C is already given in the form of a cylinder set in I (see Definition 2.1), then that would be ideal and we could start with it. If not, we can construct cylinder sets representing the set C as follows. Also, for an average to be defined as a function, we need to extend all aspects of its construction to any t in [a,b].

We assume in addition that C is a bounded and uniformly equicontinuous collection³. This condition, uniform equicontinuity, appears to be strong, though it is necessary for the generation of the cylinders of the model to be valid for any t. Fortunately, if we have a sample set of a finite number of curves, this condition always holds. Moreover, curves that are the shapes of anatomical objects always obey the condition.

³ A set C of functions on [a,b] to R is said to be uniformly equicontinuous on [a,b] if, for each real number ε > 0, there is a number δ(ε) > 0 such that if t, s ∈ [a,b] and |t − s| < δ(ε) and x is a function in C, then |x(t) − x(s)| < ε [1]. Here δ depends only on ε, not on the points t, s or the function x. In other words, the collection keeps a uniform degree of closeness of x(t) to x(s) as t → s, regardless of the points on [a,b] and of which function x.

Let t ∈ (a, b] be fixed, and define a cylinder set of the set A = {x(t) : x ∈ C} at t by determining the range of the set A. Here, A is dependent on t and is illustrated as dots on a vertical line in Fig. 5. The simplest way to determine the range of A is to take its infimum and supremum, say α and β, respectively.

Definition 4.1 Let C be a subset of C0[a,b] with a positive measure. Define the range of a cylinder set at t ∈ (a, b] by

α(t) := inf_{x∈C} {x(t)} − ε,   β(t) := sup_{x∈C} {x(t)}

for a sufficiently small ε > 0.

Remark 4.1 (1) The values α and β obviously depend on t. For the nature of a cylinder set characterised by a half open interval (α, β], a sufficiently small ε > 0 is used for α.

(2) One suggestion for ε is ε(t) := 10⁻⁶ (sup_{x∈C}{x(t)} − inf_{x∈C}{x(t)}). Here, the constant factor 10⁻⁶ can be substituted by any other positive number. Note that inf_{x∈C}{x(t)} and sup_{x∈C}{x(t)} are both continuous functions of t since the set C is a uniformly equicontinuous collection [1]. Hence, this ε makes α and β in Definition 4.1 continuous and is used for the rigorous development of our model (in particular, Theorem 4.3).

(3) In practical application, we can simply take a uniform value ε > 0 for all t.

According to Definition 4.1, a cylinder set is created at each t, say Et, and may be depicted as in Fig. 5. Without loss of generality, for t > a, we can define cylinder sets of the form:

Et = {x ∈ C0[a,b] : α(t) < x(t) ≤ β(t)}.   (4.2)

Fig. 5 A cylinder set Et at t: all curves going through the gap marked by α and β, which contains the vertically lined dots

The cylinder set Et in (4.2) describes the local deformation or shape variability present in the sample set C. Note that Et also includes all potential samples within the range of shape variability of the sample set C, in fact infinitely many x. Every x in C belongs to Et for each t ∈ (a, b], and so C ⊆ Et. The set Et interprets all functions in C in terms of t.

4.3 Formulation of an Average Function

In the previous subsection, the given set of continuous functions was interpreted as cylinder sets Et. We now evaluate the average values of {x(t) : x ∈ Et} for each t by using the formula⁴ in (4.1) in Sect. 4.1.

In order to evaluate the average values, we employ the system {Xt : t ∈ (a, b]} of measurable functions on the Wiener space, where Xt is defined in (2.5). The measurable functions play a pivotal role in achieving our goal. When the system is considered with a finitely ordered discrete parameter, T = {t1, ..., tn}, the system {Xt : t ∈ T} is called a random vector and will be used in practical applications in Sect. 7. The system in our model is the generalisation of a random vector to a continuous parameter. We then evaluate the average value of Xt over all x in the cylinder set Et for each t and denote it by X̄t if it exists. The average value X̄t of a measurable function Xt over a Wiener measurable set Et is given as follows:

X̄t := (∫_{Et} Xt(x) dm(x)) / (∫_{Et} dm(x)).   (4.3)

The Wiener integrals on the right hand side of (4.3) are well-defined and exist unless m(Et) = 0, where m(Et) is the Wiener measure of the set Et. That m(Et) > 0 is pointed out in Remark 4.2 below. The Wiener integrals in (4.3) are transformed to Lebesgue integrals using Proposition 2.1. When the integral domain is a cylinder set like Et, a Lebesgue integral can be regarded as a Riemann integral [38]. Hence, the Wiener integral on a cylinder set is calculable and the value X̄t in (4.3) becomes

X̄t = (∫_α^β (1/√(2π(t−a))) u e^{−u²/(2(t−a))} du) / (∫_α^β (1/√(2π(t−a))) e^{−u²/(2(t−a))} du).   (4.4)

Remark 4.2 The denominator of (4.4) is always positive since its integrand is positive and α < β, as noted in Remark 4.1. The value on the right-hand side of (4.4) always exists for all t ∈ (a, b] as its denominator is non-zero and its numerator is finite.

As noted in Remark 4.2, the function t → X̄t is well-defined on the interval (a, b]. The value denoted by X̄t in (4.4) is the average value of all x over the cylinder set Et and is illustrated as in Fig. 6. The value X̄t assigned for t is always located within the range described by the cylinder set Et for all t ∈ (a, b]. Then the mapping t → X̄t is a function defined on (a, b], and the function of t, X̄(·), is continuous on the interval. These observations are proved in the theorems and corollaries below.

Theorem 4.1 Let {Xt, t ∈ (a, b]} be a system of random variables defined as in formula (2.5). If the average of Xt

formula for each parameter t by employing a system of measurable 4It is from [40]. It does not tell the average function of a set of functions functions and construct an average function; in other words, defining but just an average of a single measurable function. We here use the x(t)¯ of the average function x¯ at each t. 48 J Math Imaging Vis (2009) 33: 39–65 ¯ ¯ As Xt and Xt are defined by integrals, the statement re- quires the interchange of a limit and an integral. In Re- mark 4.1, it is noted that both α and β are continuous functions of t. To simplify notation, we denote α = α(t), β = β(t), α = α(t) and β = β(t). Hence, after cancelling the constant factor in both the numerator and the denomina- tor, we need to show

lim_{t′→t} X̄_{t′} = lim_{t′→t} ( ∫_{α′}^{β′} u e^{−u²/(2(t′−a))} du ) / ( ∫_{α′}^{β′} e^{−u²/(2(t′−a))} du )

 −u2 β − limt→t  ue2(t a) du = α (4.6)  −u2 β   2(t −a) limt →t α e du 2 β −u ue2(t−a) du ¯ × = α Fig. 6 An average value Xt (plotted by ) overlaid on values of 2 (4.7) β −u curves (dots) in the cylinder set defined by (α, β] at each t 2(t−a) α e du = ¯ ¯ Xt . over the set Et is defined by Xt for each t, where Et is in ¯ formula (4.2) and Xt is in (4.4), then for every t, All of the integrals in these four equations are Riemann in- ¯ ≤ tegrals. The first and last equations are trivial from the de- α(t) < Xt β(t). (4.5) ¯ finition of Xt in (4.4). We give the proof of (4.6) and (4.7) in detail. The limit of a quotient function can be evaluated Proof Let t be fixed in (a,√ b]. We simplify (4.5) by can- celling the positive factor 1/ 2π(t − a) in (4.4); its denom- by the quotient of limits if the limit of the denominator is ¯ non-zero. The denominator is discussed in Remark 4.2 that inator is still positive. Then, the inequality α 0 β α H1(t,α,β):= K(t,u)du; since both of the linear function u and the constant function α α of u are integrable on the bounded interval (α, β].The H2(t) := (α(t), β(t)); inequality follows as required since αasince a is the left end point of the interval and so ¯ Let us consider the case t = a, i.e., Xa.Asx(a) = 0 for all we consider right continuity at t = a only. ∈ C = ≡ x , Xa(x) 0 for all x. Hence, we have Xa 0 and its Suppose on the contrary that X(t)¯ is not continuous at C ¯ ≡ ¯ · average value over is zero; Xa 0. We denote X( ) as the t = a. Then there exists a positive real number >0 such ¯ ∈[ ] extension of X(·) to all t a,b in the next definition. that for t with a0. Note that X(a)¯ = 0. We first check part of the inequality, Definition 4.2 Let C ⊆ C [a,b] and let it be a uniformly 0 X(t)¯ ≥ . If so, we have β(t) ≥ X(t)¯ ≥ by Theorem 4.1. equicontinuous collection. Then the curve defined by + ≥ This means that limt→a β(t) >0 which contradicts to X¯ : t ∈ (a, b] the first relationship lim → + β(t) = 0 in Lemma 4.1.The X(t)¯ := t (4.9) t a 0 : t = a other part of the inequality X(t)¯ ≤− implies that α(t) ≤ ¯ X(t) ≤− and so lim → + α(t) ≤− <0. This leads to a ¯ t a is the average curve of all x in C, where Xt is in for- contradiction to the second relationship limt→a+ α(t) = 0in mula (4.4). Lemma 4.1. Hence, X¯ is continuous at t = a.  The statement of Theorem 4.2 is valid to the extension ¯ [ ] Definition 4.2 with Theorems 4.1, 4.2 and 4.3 are key to X: continuous and bounded on a,b . This is stated below in ¯ Theorem 4.3 which requires an additional explanation of t = the model proposed in this paper; the average function X ex- a since the statement is already true for all t ∈ (a, b]. In Defi- ists and is unique in the Wiener space. Moreover, it becomes = { }− = { } a member of E for t ∈ (a, b]. In terms of shapes for practi- nition 4.1, α(t) infx∈C x(t) ε and β(t) supx∈C x(t) t −6 for t>a. We suggested ε = ε(t) = 10 (sup ∈C{x(t)}− cal application, Theorems 4.1, 4.2 and 4.3 in this subsection x ¯ infx∈C{x(t)}) in Remark 4.1. By the fact that for a uniformly imply that the average shape defined by X is always within C { } { } equicontinuous collection ,infx∈C x(t) and supx∈C x(t) the range of shape variability (or deformation) present in the are both continuous functions of t [1], so that α and β are sample set C of shapes. The validity and invariance of the continuous on (a, b]. We need the following lemma which average function developed in this section will be discussed aids the explanation of t = a. in Sects. 5.1 and 4.4, respectively.
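In practice, the average value in (4.4) is exactly the mean of a centred Gaussian with variance t − a truncated to (α(t), β(t)], so it can be evaluated with a standard truncated-normal routine. The following sketch uses an assumed discretisation and hypothetical function names, not the authors' code; it computes X̄t for one t and assembles the average curve of Definition 4.2 for a discretised sample set, using the ε suggested in Remark 4.1(2).

```python
import numpy as np
from scipy.stats import truncnorm

def average_value(alpha, beta, t, a=0.0):
    """X̄_t of Eq. (4.4): mean of N(0, t - a) truncated to (alpha, beta]."""
    sigma = np.sqrt(t - a)
    # truncnorm is parameterised by standardised bounds (alpha/sigma, beta/sigma)
    return truncnorm.mean(alpha / sigma, beta / sigma, loc=0.0, scale=sigma)

def average_curve(sample, t, a=0.0, eps_factor=1e-6):
    """Average curve of Definition 4.2 for a discretised sample set.

    sample : (n_curves, n_points) values x_j(t_i) with x_j(a) = 0,
    t      : (n_points,) parameters in (a, b].
    """
    lo = sample.min(axis=0)
    hi = sample.max(axis=0)
    # ε of Remark 4.1(2), with a tiny floor so alpha < beta always holds
    alpha = lo - np.maximum(eps_factor * (hi - lo), 1e-12)
    return np.array([average_value(al, be, ti, a)
                     for al, be, ti in zip(alpha, hi, t)])
```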

Lemma 4.1 lim → + β(t) = 0; lim → + α(t) = 0. t a t a 4.4 Invariance of the Average Formula Proof We first show the relationship for β. Suppose, to the contrary that limt→a+ β(t) = 0. Then there exists a positive As discussed in Sect. 4.3, the process of computing the av- 0 such that for all δ>0, if a0, tation. In addition to the theoretical validity of our formula- β(t) ≥ x(t) ≥ 0 for a0, if a

Fig. 7 For a parameter domain [0, 2π], 5 different values of t0 are compared, giving 5 different shape spaces C0[t0, 2π]. For each t0, the average values of {x(1 : 128)} at the fixed 128 t's are compared (left) and the non-overlapping part near t = 0 is zoomed in (right)

4.4.1 Invariance w.r.t. a Parameter Domain For open curves such as femurs in Fig. 9 (will be used in Sect. 7.2), there is no difference between outputs from [ 1 2 ] For shapes represented by closed curves, assuming that we shape spaces built on parameter domains [0,1] or 3 , 3 or use polar coordinates5 for points on the curves, we need to [0,4000]. We also varied the number of parameters, such as take cylinder sets between t = 0 and t = 2π. In such a case, 100, 128 or 200, within a parameter domain and found no the parameter domain should be bigger than [0, 2π] since all difference. curves x must satisfy x(t0) = 0, so that t0 < 0. Then cylin- These are expected behaviour since the resulting average der sets are defined from t = 0 or slightly bigger than 0, say is continuous as theoretically supported in Theorem 4.3.For −6 10 ,to2π. We compare various values of t0 for the ex- these reasons, we can assert with confidence that our for- ample of dissimilar shapes in Sect. 7.1 where the curves of mula for the average is invariant with respect to the choice the example are discretised by {x(1 : 128)}. The average val- of a domain [a,b] and parameters on it. ues of the fixed cylinder sets t1,t2 ...,t128 for five different −4 −2 −1 values of t0 (−10 , −10 , −10 , −1, −π), hence mod- 4.4.2 Invariance w.r.t. an Initial Point: Closed Curves elled for five different shape spaces C0[t0, 2π], are calcu- lated and plotted in a single figure (Fig. 7). The average formula is invariant of the choice of an ini- Though the figure is plotted like an open curve, it is for tial point when the correspondence in a sample set is estab- aclosedcurveasshowninFig.9 (it is from the example lished. For shapes represented by closed curves, the averages of dissimilar shapes to be used in Sect. 7.1). We stretch the of the set {x(1 : end)} and {x(middle + 1 : end, 1 : middle)} parameter domain into the Cartesian coordinates in Fig. 7, are compared. If the middle point is located at the antipodal to look into how the various choices of t0 behave. We can point, the rotation of the average of the latter by π should see slight differences in the first five cylinders in Fig. 7 but be coincide with the average of the former. This is shown in the differences zoomed in the right figure are at a very small Fig. 8—the average curve will appear with its sample set in scale and negligible; we observed that this difference did Sect. 7.1. not appear and completely overlap in an example of similar shapes. 4.4.3 Invariance w.r.t. Orientation: Open Curves This example (and other similar tests) reassure us that the choice of t0 for the domain of [t0, 2π] does not affect the For open curves, there are two orientations to be taken in resulting average value at each t though for a different t0 a parameterisation. One end of an open curve can be taken as different variance t − t0 in the Gaussian is applied at each t. an initial point, say t = 0, and the other as a terminal point, say t = 1, if a parameter domain is [0, 1]. If the orientation 5For a closed curve, the polar coordinates enable us to represent it as a of the parameterisation is clockwise (c.w.) as in Fig. 9 (left), single-variable function, which is more efficient than in Cartesian coor- the other choice of orientation is counterclockwise (c.c.w) dinates where the representation requires two single-variable functions. where the initial point and terminal point are switched. We J Math Imaging Vis (2009) 33: 39–65 51 Fig. 8 Invariance w.r.t. an initial point. 
The average of {x(1 : 128)} in Fig. 7 is plotted in polar coordinates, and the average of {x(65 : 128, 1 : 64)} is overlaid (left). The latter is rotated by π and overlaid (right)
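A simple way to reproduce the test illustrated in Fig. 8 is to recompute the average after circularly shifting the starting sample of every closed contour by half a turn, and then to rotate the result back by π; for radial values on an equally spaced angular grid, that rotation is again a circular shift. The sketch below is a hypothetical harness (the averaging routine is passed in as a callable) rather than the authors' implementation.

```python
import numpy as np

def initial_point_invariance(sample_r, average_fn):
    """Compare averages computed from two initial points on closed curves.

    sample_r   : (n_curves, n_points) radial values x(t_1..t_n) over one full turn,
                 with the curves already in correspondence on the same grid,
    average_fn : callable mapping such an array to an average curve
                 (e.g. a closure over the average_curve sketch above).
    Returns the largest pointwise discrepancy between the two averages.
    """
    n = sample_r.shape[1]
    half = n // 2
    avg_original = average_fn(sample_r)
    avg_shifted = average_fn(np.roll(sample_r, -half, axis=1))
    # rotating the shifted average back by π is again a circular shift
    return np.max(np.abs(avg_original - np.roll(avg_shifted, half)))
```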

Fig. 9 Invariance w.r.t. orientation. Left: an initial point for each of 32 femurs is marked by a red ◦ (c.w.). Right: the average curve of the left figure (c.w.) is plotted by a blue solid line and the average curve of the c.c.w. curves by a red dashed line

compare the averages of the curves in Fig. 9 (left) with the We have introduced the Fréchet functional FC in (3.1) two orientations in Fig. 9 (right); the averages for the two and want to show that the average X¯ defined in Sect. 4 is the orientations are coincide. minimiser of FC, F ¯ := − ¯ 2 5 Variance and Score From Average Shape C(X) x X 2 dm(x). C In order to evaluate the average defined in Sect. 4, we need some kind of measurement for dispersion or spread. This Theorem 5.1 verifies this and deals with a minimisation section provides such a measurement: the first subsection problem. We approach it as a variational problem of the discusses the concept of variance and the second subsection functional FC. The variational problem could be solved us- a quasi-score. ing the Euler equation; but we suggest a simpler way, using convexity of the functional, since that way does not require 5.1 Variances strong differentiability. We first introduce a necessary proposition for convexity Definitions of variance are proposed in this subsection for dispersion or spread of a sample set of continuous functions in [52] to solve a variational problem. The Gâteaux variation 2 below is a kind of directional derivative and used for the from its average, For this, we employ the L -norm · 2 for distance between functions as discussed in Sect. 2.3. proposition. 52 J Math Imaging Vis (2009) 33: 39–65 Y = b 2 = Gâteaux variation: Let be a linear space and J J(y) a with equality if and only if a v (t)dt 0 which implies functional on a subset of the linear space. For y,v ∈ Y, v2(t) = 0 almost everywhere on [a,b] in the Lebesgue mea- sure. However, only v2 ≡ 0 is a possible function with con- J(y+ v) − J(y) δJ(y; v) := lim , (5.1) tinuity, i.e., v ≡ 0. Thus J is strictly convex and by Proposi- →0 tion 5.1, each y ∈ C which makes assuming that this limit exists, is called the Gâteaux b variation of J at y in the direction v. δJ(y; v) = 2 {y(t) − γ(t)}v(t)dt = 0 Convexity: A real valued function J defined on a set D in a a linear space Y is said to be [strictly]6 convex on D pro- for all v, minimises J on C uniquely. Then any y with the vided that when y and y + v ∈ D then Gâteaux variation property that y(t) = γ(t)almost everywhere is the solution. δJ(y; v) is defined and Clearly, y ≡ γ accomplishes this since y should be a con- tinuous function. Hence, the function y ≡ γ is the unique + − ≥ ; J(y v) J(y) δJ(y v) (5.2) minimiser of J .  [with equality if and only if v = O], where O is a zero Theorem 5.1 Let a set C ⊆ C [a,b] be Wiener measurable. element of the linear space Y. 0 If yˆ is a minimiser to the Fréchet functional Proposition 5.1 Let Y be a linear space and J = ([52]) F := − 2 C(y) x y 2 dm(x), (5.3) J(y) a functional on a subset of the linear space. If J is C [strictly] convex on D then each y0 ∈ D for which Gâteaux then the minimiser yˆ is of the form variation δJ(y0; v) = 0, for all y0 + v ∈ D minimises J on D [uniquely]. 1 y(t)ˆ = x(t)dm(x), (5.4) m(C) C Lemma 5.1 Let C ⊆ C0[a,b] and J be a functional defined on C by where m(C) is the Wiener measure of C as defined in (2.2). b F J(y):= [{y(t) − γ(t)}2 + ζ(t)] dt, Proof The Fréchet functional C in (5.3) is expressed by a a double integral on the left hand side of (5.5) since the L2-norm is also an integral. By the Tonelli theorem, (5.5) where γ ∈ C and ζ ∈ C [a,b] are fixed. Then the functional 0 below follows. Equation (5.6) is obtained by coupling the J is convex and minimised at y ≡ γ . Furthermore, J is integrand to be a squared quantity. 
Hence, we have the fol- strictly convex and the minimiser is unique. lowing series of equations. + − Proof By (5.2), convexity of J requires that J(y v) FC(y) J(v) − δJ(y; v) ≥ 0 for all v for which y + v ∈ C.The ; b Gâteux variation δJ(y v) of J defined in (5.1) is, for each = [y(t)2 − 2y(t)x(t) + x(t)2] dt dm(x) y ∈ C, C a b b = [ 2 − + 2] ; = { − } y(t) 2y(t)x(t) x(t) dm(x) dt (5.5) δJ(y v) 2 y(t) γ(t) v(t)dt. a C a b Hence, = y(t)2m(C) − 2y(t) x(t)dm(x) a C J(y+ v) − J(v)− δJ(y; v) 2 + x(t) dm(x) dt b C = [{y(t) + v(t) − γ(t)}2 + ζ(t)] b 2 a 1 = m(C) y(t) − x(t)dm(x) C −[{y(t) − γ(t)}2 + ζ(t)]−[ (y(t) − γ (t))v(t)] dt a m( ) C 2 b + η(t) dt, (5.6) = v2(t) dt ≥ 0 a where η(t) is 6This is to be considered two assertions; the first is made by deleting 1 2 the bracketed expressions throughout, while the second requires their η(t) := x(t)2 dm(x) − x(t)dm(x) (5.7) presence. C m(C) C J Math Imaging Vis (2009) 33: 39–65 53 C C b and independent on y, and m( ) is the Wiener measure of . = 1 2 ˆ x(t) dm(x) Finding a minimiser y of (5.6) can be considered as a vari- m(C) a C ational problem, as proved in Lemma 5.1. By Lemma 5.1, 1 2 the integral in (5.6) achieves its minimum when y(t) = − x(t)dm(x) dt. 1 C = m(C) C m(C) C x(t)dm(x) if m( ) 0, and so does the functional FC in (5.3). (If we exclude the positive constant m(C) of The variance in Definition 5.1 is a normalised value of the FC in (5.6), then in the expression of J in Lemma 5.1 1 ¯ Fréchet functional, C FC(X). The formula is based on the γ(t)= 1 x(t)dm(x) and ζ(t)= 1 η(t).) Hence, m( ) m(C) m(C) fact that a variance is usually described by an average dis- tance between a sample set and their mean when they are 1 y(t)ˆ = x(t)dm(x) numeric variables. We again follow a similar type to the av- C m( ) C erage formula in (4.1) and so we normalise it by m(C). and the minimiser yˆ is unique; this is the desired (5.4).  Remark 5.2 When the sample set C is the whole set C0[a,b] [ ] Theorem 5.1 provides the existence of a minimiser and in Definition 5.1, C0 a,b is the integral domain instead of a C = − states what it is. Consequently, we can obtain the minimum sample set and then by Corollary 5.1 η(t) t a. Hence, value and it is described in the following corollary. the variance from the population average, a zero function, is − 2 2 (b a) Corollary 5.1 Let FC be the Fréchet functional defined in σ := (5.8) 2 (5.3) in Theorem 5.1. Then its minimum value is In fact, we have already derived the value σ 2 in Sect. 3. The- b orem 3.1 says that min FC(y) = η(t)dt, a − 2 F = − 2 = (b a) + 2 where η(t) is in (5.7); (y) x y 2 dm(x) y 2 C0[a,b] 2 2 (b−a)2 2 1 F [ η(t) = x(t) dm(x) − x(t)dm(x) . and the minimum value of (y) is 2 since m(C0 a, C m(C) C b]) = 1 and the population average is a zero function, i.e. y ≡ 0. Proof By Theorem 5.1, FC in (5.3) is minimised at yˆ, where yˆ is in (5.4). Hence, the minimum value of FC is FC(y)ˆ and The practical problem of how to calculate the formula F ˆ = b{ C · + } from (5.6) C(y) a m( ) 0 η(t) dt. Therefore, the in Definition 5.1 remains. Trivially, if a sample set is large result follows.  enough, then the sample set can be regarded as the whole set, ¯ C0[a,b], and the variance of C from the average X could be Remark 5.1 By Corollary 5.1, we see that our construction defined by the population variance itself in (5.8). However, of an average formula in Sect. 4.3 is sensible. 
If the sample this hardly ever happens in practice since C0[a,b] is an un- set C is an individual cylinder set Et defined in (4.2), which countable set and moreover, is of high cardinality. makes up the sample set C, then Corollary 5.1 tells us that If the set C is composed of cylinder sets such as the one ¯ ¯ yˆ = Xt ; the average value Xt in (4.3) is a minimiser of a in (4.2), characterised by a finite number of boundaries α’s F Fréchet functional Et on the cylinder set Et for each t.It and β’s in Definition 4.1, then the value in Definition 5.1 is ¯ ≤ can then be said that Xt for a

Here, the denominator M can be replaced by M − 1fora C0[a,b] in our model. However, in any shape space and any small sample set. This formula may be useful for practical method built on a metric space, one can analogously define applications. In a real world application, a sample set is usu- a quasi-score ally a finite set of continuous functions and relatively small, ¯ [ ] x − X in contrast to the whole set C0 a,b . z := The concept of variance defined in this subsection can be σX¯ used for evaluation between sample sets within a population with an appropriate metric · employed on a shape space set. It is further developed for more general use in the next ¯ and a sample set having X and σ ¯ . subsection. X With the concept of a quasi-score, the function variables 5.2 Score x are transformed to numerical variables z. Then, the degree ¯ In the previous subsection, we have derived the mathemati- of spread of x’s from the average function X is compactly cal variance and proposed an alternative formula for practi- described by the variance of the numerical variables z.In cal use. The variance simplifies important characteristic of a terms of shapes, the quasi-score measures the global defor- sample set into single measurement of central tendency (av- mation of a shape x from the average shape with a numerical erage). The analysis of variance deals with differences be- value. tween sample averages within a population. However, for Note that, unlike a score in traditional numeric variables, 2 comparative evaluation between different distributions or a quasi-score is nonnegative since we take the L -norm as a [ ] methods, the variance may not provide an adequate mea- metric on the shape space C0 a,b . When the mathematical surement of spread or dispersion. Sometimes, for example, variance in formula in Definition 5.1 is not calculable, we ˆ we need to compare averages from different distributions or can again use σ as an alternative formula to σX¯ in (5.10); to analyse shape models based on different types of shape we use (5.9) instead. spaces. This requires a universal measurement for analysis of shape spaces regardless of how shapes are represented, what are shape spaces, and what kind of distributions are 6 Model for Shape Variation and Fitting employed. The concept that satisfies the requirements is pro- This section discusses how to fit a model to an instance in a vided in this subsection. shape space and to express shape variation. We make use of To meet these requirements, we suggest an adaptation of the average developed in Sect. 4. The problem of model fit- the statistical score (sometimes called z-score). A score in ting in a shape space is investigated as an algebraic system: statistics is a value of a standardised (numeric) random vari- a system of linear equations. The system is overdetermined able which is useful for comparing different distributions. and we suggest a strategy to solve the overdetermined sys- It has a mean of zero and a variance of 1, which accounts tem. for the name standardised. Most commonly, normal distri- If a shape space can be reduced to a lower dimension, an butions are transformed to the standard normal distribution instance of shape in the shape space should be modelled by by a score when variables are numeric. this reduced representation. Let {φ1,φ2,...φk} be k repre- Definition 5.2 Let x be an element of a sample set C and sentatives of a shape space. 
The most common φ’s are mem- ¯ C bers of a basis for the shape space if it takes the form of a X the average over and σX¯ be the positive square root of variance σ 2 in Definition 5.1. We define an associated vector space. The model is usually expressed in the form X¯ + +···+ standardised random variable given by of a linear combination of them, c1φ1 c2φ2 ckφk, with suitable coefficients c1,...,ck. For the shape space of x − X¯ our model discussed in Sect. 4, any finite dimensional basis z := 2 (5.10) for the Hilbert space (L2[a,b], · ) can be the φ’s since σX¯ 2 C0[a,b] is a subspace of the Hilbert space (see Appendix B). and call it a quasi-score.7 If a shape space is built based on an eigenstructure, eigen- vectors can be the φ’s. 2 Remark 5.3 (1) We employ the L -norm · 2 in Defini- For effective representation of shape variation, we then tion 5.2 since we have taken it as a metric on the shape space express any shape in terms of an average shape x¯.Anin- stance model x is formulated by 7 A similar concept in multivariate analysis can be Mahalanobis dis- =¯+ tance, which takes into account the correlations of a data set. When the x x c. (6.1) covariance matrix used in Mahalanobis distance is the identity matrix, the Mahalanobis distance becomes to the standard normalised distrib- Let y be a shape to be fitted to the model in (6.1), then ution when dealing with finitely many random variables. the difference between y and the model is formulated to J Math Imaging Vis (2009) 33: 39–65 55 z(= y −¯x) = c. The linear system z = c is overdeter- we apply global affine transformations to a data set. In or- mined and ill-posed problem in our applications since we der to find an optimal transformation, several iterative ap- try to represent a shape (a dense, say n point set) as compact proaches, such as generalised Procrustes analysis for a group (with just small number, say k of basis) as possible, result- alignment [6, 16], have been suggested. However, through- ing in , n × k matrix. We want to make the residual vector out this paper we use an analytic method (see Appendix A) z − c as small as possible with suitable values of c since to find an optimal transformation for a pair-wise alignment it cannot be made equal to zero in general (see [21] for de- for our model. tails). We choose the Euclidean norm in the discussion of Often, shapes acquired from medical images of an inter- this Least Square problem. nal organ tend to be quite similar. For this reason, we first A strategy to solve this system z = c is first to trans- devise a set of synthetic data to illustrate how the model form the system into a well-posed problem; n×k matrix into can explain the variety of shapes in the case of highly dis- n × n matrix. Then we apply the Conjugate Gradient (CG) similar shapes, see Sect. 7.1. Then similar shape data sets method to the transformed system which is well-behaved in of anatomical or biological objects are tested in Sect. 7.2. convergence. We accomplish the transformation by multi- Section 7.3 evaluates the average shapes generated in those plying the matrix T on the left to both sides of equation sections in terms of quasi-scores. Section 7.4 presents the c = z so that T c = T z. The matrix A := T  is application of the model for variation and fitting developed in Sect. 6. then symmetric and positive definite. Then CG iteration ap- plied to A converges monotonically with a norm defined by 7.1 Application to Dissimilar Shapes the matrix A [51]. 
With the new notations A = T  and b := T z, the system is reparaphrased by Ac= b. An opti- ∗ We present the application of our shape method to a set of mal estimate c is used for our original equation and so for deliberately dissimilar shapes, see Fig. 10 (top). We para- ≈ =¯+ ∗ = −1 (6.1); y x x c . One can use c Aps b for solving meterise each of the contours by an arc-length comprising −1 the equation, where Aps is a pseudo-inverse (such as a ma- 128 points in [0, 1]. Then global affine transformations are trix left division in MATLAB). We have observed from our employed so that the curves are centered on the origin, taken data sets that the CG algorithm produced smaller relative er- as their geometric centre, and depicted in Fig. 10 (bottom) rors (order of 10−3 to 10−4) than the matrix left division in by solid lines. For convenience in manipulating these closed practice. contours, we use polar coordinates8 for their representation. They are denoted by x1,...,x7 and regarded as continu- ous functions of a radial angle, denoted by t, in our model; 7 Applications of the Models the notation (x, t) will be used throughout this section for conveniently referring to formulae in the modelling section, instead of (r, θ) in standard representation of polar coordi- This section presents several applications of the models pro- nates. We then reparameterise the aligned contours by re- posed in Sects. 4, 5 and 6. In practice data representing a sampling onto equally spaced t’s. shape is a discrete set; in our approach for a continuous Let us denote the reparameterisation of the seven func- curve, a very dense collection of points. A data set corre- ={ } = = (j−1)2π tions by T t1,...,t128 , where t1 0, and tj 128 sponding to a shape should have a form that enables efficient for j = 2,...,128, which are radial angles covering the manipulation, evaluation and analysis. When a data set is plane at the origin and likely into [0, 2π]. Recall that for not well-formed, we can employ some parameterisation on every x ∈ C0[a,b], x(a) = 0. The sample curves presented a data set. There have been various parametric approaches to in Fig. 10 (right) are obviously non-zero at t = 0, so we shape representation; Superquadrics [48]; Fourier harmon- extend the parameter domain slightly bigger than [0, 2π]. icsof2-Dcurves[30, 45, 47]; Spherical harmonics in 3-D Assuming that t0 =−δ and x(t0) = 0, the domain of t then objects [5, 19]; a medial description [13, 35, 36]; a mixture becomes [−δ,2π], where δ is a positive real number. An ar- of Spherical harmonics and medial description [15, 46]. We bitrary choice of a small positive number δ would be accept- use an arc-length parameterisation and a Hilbert basis para- able. The contours in the sample set are now members of the meterisation [3] in this paper which are appropriate for arbi- function space C0[−δ,2π], the Wiener space; so that we trary continuous curves. The parameterisation methods used can deal with x(0) = x(2π) = 0 keeping x(t0) = 0, where in our model show good performance in affine alignment t0 =−δ

We define a cylinder set for each $t_j \in T$. The smallest value $\alpha_j$ and the biggest value $\beta_j$ of $x(t_j)$ over all x in the sample are evaluated at each $t_j$, for example $\beta_j = \max\{x^1(t_j),\ldots,x^7(t_j)\}$. In practice, the values $\alpha_j$ and $\beta_j$ are determined on a neighbourhood of $t_j$ to accommodate all deformations existing in the contours. The cylinder sets are the sets of all continuous contours within the range depicted by the radial bars in Fig. 10 (right).

Then, for each $j = 1,\ldots,128$, the average value $\bar{X}_j$ of each component of the random vector $(X_1,\ldots,X_{128})$ is evaluated using (4.4). In practice, the average contour in our model is defined as the contour represented by the vector of the average values $(\bar{X}_1,\ldots,\bar{X}_{128})$. The average contour is shown as dots in Fig. 10 (bottom); Fig. 11 compares it with averages computed using a uniform distribution.

Fig. 10 Top: Synthetically generated 7 geometric curves. Bottom: Affine transformed curves (solid lines), cylinder sets (radial bars), an average curve (dots)
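The per-parameter bounds of the cylinder sets can be computed directly from the sampled contours, as in the sketch below; the Wiener-measure average of (4.4) itself is not reproduced here, and the `margin` argument is only a stand-in for the neighbourhood of $t_j$ mentioned above.

```python
import numpy as np

def cylinder_bounds(curves, margin=0.0):
    """Lower/upper bounds (alpha_j, beta_j) of the cylinder set at each t_j.

    `curves` is an (m, n) array: m sample contours evaluated at the n common
    parameters t_1, ..., t_n (here m = 7 and n = 128).  `margin` widens each
    interval slightly, mimicking the neighbourhood used to accommodate all
    deformations present in the sample."""
    curves = np.asarray(curves)
    alpha = curves.min(axis=0) - margin
    beta = curves.max(axis=0) + margin
    return alpha, beta
```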

7.2 Application to Similar Shapes

7.2.1 Femurs

We next test our model on a data set of femoral shapes. This consists of 32 contours, each of which has already been sampled, with various lengths from 89 to 217 points (their average length is 131.7). In this example, we use Cartesian coordinates of the curves representing the femoral shapes.

We mainly use a parameterisation method introduced in [3], whose shape space is invariant to affine transformations. Its algorithm is presented in Appendix C. We call it a Hilbert basis parameterisation, since the parameterisation is performed by finding a finite dimensional Hilbert basis for $L^2[0,1]$. Note that $C_0[0,1] \subset L^2[0,1]$. In this example, we use the parameters $T = \{t_j, j = 1,\ldots,129\} \subset [0,1]$. The number 129 is chosen for comparison with the MDL method, which requires $2^{\ell} + 1$ samples for open curves, and to be close to the average length (131) of the data set.

As a result of the parameterisation, each curve is approximated by a Hilbert basis. We chose a 4-D basis, which is sufficiently good according to the theorems in Appendix B.⁹ The parameterised curves are then affine aligned to one of the curves in the data set. To check whether the approximation $(\tilde{x}^j, \tilde{y}^j)$ by the basis is sufficient, the relative error¹⁰ is measured by $\|(\tilde{x}^j, \tilde{y}^j) - (x^j, y^j)\|_2 / \|(\tilde{x}^j, \tilde{y}^j)\|_2$.

⁹ This is the same way as one commonly takes the first one or two terms of a Taylor expansion, as in [6, 32], which is in fact equivalent to taking a one or two dimensional subspace of an infinite-dimensional space of differentiable functions, or the first few terms of a harmonic expansion (sinusoidal basis functions) of a function to approximate it. Obviously, the bigger the dimension, the better the approximation.

¹⁰ The methods compared here are of different scales; one method uses normalisation but the other does not. For a fair comparison of such methods of different scales, a relative error is necessary.
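A sketch of this relative-error check is given below. The orthonormal basis here is only a stand-in built by QR factorisation of simple polynomial samples on the grid; the actual 4-D basis in the paper comes from the Hilbert basis parameterisation of [3] (Appendix C).

```python
import numpy as np

def relative_basis_error(f, basis):
    """Relative L2 error of approximating the sampled function f by the span
    of the columns of `basis` (assumed orthonormal on the sampling grid)."""
    approx = basis @ (basis.T @ f)     # finite-basis approximation of f
    return np.linalg.norm(approx - f) / np.linalg.norm(approx)

# Stand-in 4-D orthonormal basis on 129 grid points of [0, 1].
t = np.linspace(0.0, 1.0, 129)
basis, _ = np.linalg.qr(np.vstack([np.ones_like(t), t, t**2, t**3]).T)
```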

Fig. 11 Geometric shapes $x^1,\ldots,x^7$ (solid lines) and their average overlaid as a dotted curve: arc-length parameterisation and the Wiener measure (left); arc-length parameterisation and a uniform distribution (middle); a uniform distribution by generalised Procrustes alignment, which requires normalising the raw data (right)

Fig. 12 Affine aligned femurs and their averages, a Hilbert basis parameterisation

Fig. 13 Affine aligned femurs and their average, an MDL parameterisation

The mean of the relative errors of the 32 femoral curves resulting from this approximation is 0.002592. This measurement of error is independent of the location at which the data set is aligned; as is well known, an affine transformation is not affected by location. We call the affine aligned curves a sample set and denote it by $A = \{(x^1,y^1),\ldots,(x^{32},y^{32})\}$; we reuse the notation x and y for the sample set although the curves have been approximated by the Hilbert basis. They are shown as solid lines in Fig. 12.

For the x- and y-coordinates of the curves, cylinder sets are defined for the parameters T. For the average, we use Definition 4.2 for both the x- and the y-coordinate. For each $t \in T$, two systems of measurable functions $X_t(x) := x(t)$ and $Y_t(x) := y(t)$ are used to calculate an average value. In fact, we use two random vectors $(X_{t_1},\ldots,X_{t_n})$ and $(Y_{t_1},\ldots,Y_{t_n})$, where n = 129. Then the average value $\bar{X}_{t_j}$ of each component of $(X_{t_1},\ldots,X_{t_n})$ is calculated using (4.4), and in parallel $\bar{Y}_{t_j}$. Finally, the set of pairs $\{(\bar{X}_{t_j}, \bar{Y}_{t_j}) : j = 1,\ldots,n\}$ is taken as the average curve of the sample set A.

Figure 12 shows the average curve $\{(\bar{X}_{t_j}, \bar{Y}_{t_j}) : j = 1,\ldots,n\}$ overlaid on the affine transformed curves in Cartesian coordinates (dots). The average with a uniform distribution is also overlaid (crosses) for comparison; it is generated from the same parameterisation and alignment. The average generated from the proposed model is close to the average calculated from a uniform distribution where small deformations occur, but our average also copes with large deformations. We observed that the resulting average of our model is not affected by noise, which changes the range of the cylinder sets only very slightly, as expected from our theorems. An average with a uniform distribution and MDL parameterisation (software adapted from [49]) is given in Fig. 13. Note that, in this section, we plot every other point of t, i.e. (1:2:129), in all figures of cylinders and averages, to visualise them clearly.
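The assembly of the average curve from per-coordinate averages can be sketched as follows. The per-component average in the paper is the Wiener-measure average of (4.4), which is not reproduced here; the default `np.mean` below therefore gives only the uniform-distribution baseline used for comparison in Figs. 12 and 13.

```python
import numpy as np

def average_curve(xs, ys, coordinate_average=np.mean):
    """Average contour {(X̄_{t_j}, Ȳ_{t_j})} of a sample of aligned curves.

    `xs` and `ys` are (m, n) arrays holding the x- and y-coordinates of m
    curves at the n common parameters t_1, ..., t_n.  `coordinate_average`
    is applied component-wise; replace it to plug in the model's average."""
    x_bar = coordinate_average(np.asarray(xs), axis=0)
    y_bar = coordinate_average(np.asarray(ys), axis=0)
    return x_bar, y_bar
```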

Fig. 14 Affine aligned metacarpals and their average, a Hilbert basis parameterisation

Fig. 15 Affine aligned metacarpals and their average, an MDL parameterisation

7.2.2 Metacarpals

The set of 24 contours from metacarpals is used in this subsection; the raw data set is sampled by 281 points. We parameterise the contours of the data set by a Hilbert basis into 128 points. The approximation by a Hilbert basis is good enough to use; the mean of the relative errors of the approximation is 0.000760.

The averages generated by three methods are presented in Figs. 14 and 15: the Wiener measure with a Hilbert parameterisation and a uniform distribution with the same parameterisation (Fig. 14), and an MDL parameterisation (Fig. 15).

Table 1 Variance of quasi-scores of the examples in Sects. 7.1 and 7.2 for three methods

  Distribution (parameterisation)   Wiener (Hilbert/arc)   Uniform (Hilbert/arc)   Uniform (MDL)
  32 femurs                         0.107254               0.307687                0.143926
  24 metacarpals                    0.084478               0.306329                0.147042
  7 geometric curves                0.000074               0.051944                0.240408

7.3 Evaluation with a Quasi-Score

In the previous subsections, we have tested our shape model on a range of examples: dissimilar shapes and similar shapes, represented both by closed curves and open curves. In this section, the average shapes generated in Sects. 7.1 and 7.2 are evaluated quantitatively in terms of the quasi-score defined in (5.10).

The quasi-score proposed in (5.10) is of considerable use in comparing distributions or methods; it measures the spread of a sample set about an average. Figure 16 shows the quasi-scores $z_j = \|x_j - \bar{X}\|_2 / \sigma_{\bar{X}}$ of the geometric contours $x_j$, $j = 1,\ldots,7$, computed with their variance for the four methods. The quasi-scores computed using the Wiener measure (according to our proposed model) have low variability, whereas the quasi-scores from the more commonly used methods, which use the uniform distribution, have substantially greater variability.

The quasi-scores of the femoral curves with the Wiener measure are depicted in Fig. 17 and compared to those with a uniform distribution of the same parameterisation and an MDL parameterisation. The quasi-scores of the three methods for the metacarpals are also given in Fig. 18. In both figures, the quasi-scores of the Wiener measure show the least variability.

The quasi-scores for all examples are summarised in Table 1. The average with the Wiener measure has the smallest variance of quasi-scores. A smaller variance of quasi-scores implies that the sample is less spread about the average. The gap between the variances of quasi-scores of the different methods tends to be bigger for dissimilar shapes or shapes with large deformations. As expected from the theorems and corollaries in Sects. 3 and 5, global analysis with a quasi-score tells us that the average with the Wiener measure produces a better representative of the sample set than the one with a uniform distribution.

7.4 Implementation of Model Fitting

The model for shape variation and fitting (Sect. 6) is tested in this subsection. In our experiments, the data to be fitted is excluded from the sample set that is used for building the model and the average. We adopt a leave-one-out strategy: given a set of M shapes, we remove one of them and form a sample set with the remaining M − 1 shapes. From the sample set of size M − 1, we build a model $x = \bar{x} + \Phi c$. The held-out shape is then fitted to see how well the model fits it.
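A sketch of the quasi-score evaluation of Sect. 7.3 is given below; $\sigma_{\bar{X}}$ is taken as an input, since Definition 5.1 is not reproduced here, and the variance of the returned scores corresponds to the entries of Table 1. Both helper names are illustrative.

```python
import numpy as np

def quasi_scores(curves, average, sigma):
    """Quasi-scores z_j = ||x_j - X̄||_2 / σ_X̄ of (5.10) for discretised curves.

    `curves` is an (m, n) array of m sampled curves, `average` the sampled
    average curve X̄, and `sigma` the positive square root of the variance of
    Definition 5.1, supplied by the caller."""
    diffs = np.asarray(curves) - np.asarray(average)
    return np.linalg.norm(diffs, axis=1) / sigma

def quasi_score_spread(curves, average, sigma):
    """Variance of the quasi-scores, the quantity reported in Table 1."""
    return float(np.var(quasi_scores(curves, average, sigma)))
```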

Fig. 17 Quasi-scores of 32 femoral curves in Figs. 12 and 13

Fig. 16 Quasi-scores of 7 geometric curves. (Top) Wiener measure only; (Bottom) four methods including the Wiener measure: the three presented in Fig. 11 and a uniform distribution with MDL parameterisation

Fig. 18 Quasi-scores of 24 metacarpal curves in Figs. 14 and 15

We demonstrate the fitting of data to two types of Φ as representatives of a shape space. For clarity, we give names to the two models described and tested in this section, although they are both expressed by the same notation in (6.1).

• By the Wiener-Hilbert model, we mean a model $x = \bar{x} + \Phi c$ with the average x̄ generated within the structure of the Wiener space in Sect. 4. The matrix Φ consists of a finite dimensional basis for the Hilbert space $L^2[0,1]$.
• By a PCA model, we mean a model $x = \bar{x} + \Phi c$ with the average x̄ computed with a uniform distribution and the matrix Φ consisting of a certain number of eigenvectors of the covariance matrix of a sample set.

For a PCA model, an algorithm for model fitting has been suggested in [6], using a projection onto a tangent plane at the average shape. Its rate of convergence is similar to that of the algorithm introduced in the previous subsection. However, we choose CG iteration for our algorithm for both types of Φ, as it is more accurate in our experiments.

The fitting of data y to a model x is visualised in one figure to investigate how well the model fits the data. Since visual comparison may be subjective, we also evaluate the fitting error. In practice, each model has a different scale. Hence, for a measurement of error, we use the relative norm $\|y - (\bar{x} + \Phi c)\|/\|y\| = \|z - \Phi c\|/\|y\|$, where ‖·‖ is the Euclidean norm. In the Wiener-Hilbert model, the Euclidean norm is regarded as equivalent to the L²-norm.

For a fair comparison between the two models, the example set should be parameterised consistently. For the Wiener-Hilbert model, the Hilbert parameterisation is applied as in Sect. 7.2. For the PCA model, MDL parameterisation is applied, and we use software adapted from the MDL software of [49]. Note that, unlike the Wiener-Hilbert model, all vectors are normalised in a PCA model.
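The leave-one-out evaluation and the relative fitting error can be sketched as follows; `build_model` is a hypothetical stand-in for whichever pipeline (Wiener-Hilbert or PCA) produces the pair (x̄, Φ), and a plain least-squares solve replaces the CG iteration for brevity.

```python
import numpy as np

def relative_fit_error(y, x_bar, Phi):
    """Fit y to the model x = x̄ + Φc and return ||y - (x̄ + Φc)|| / ||y||."""
    c, *_ = np.linalg.lstsq(Phi, y - x_bar, rcond=None)
    return np.linalg.norm(y - (x_bar + Phi @ c)) / np.linalg.norm(y)

def leave_one_out_errors(shapes, build_model):
    """Leave each shape out in turn, build the model from the rest and record
    the relative error of fitting the left-out shape."""
    errors = []
    for i in range(len(shapes)):
        training = [s for j, s in enumerate(shapes) if j != i]
        x_bar, Phi = build_model(training)
        errors.append(relative_fit_error(shapes[i], x_bar, Phi))
    return errors
```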

Fig. 19 Wiener-Hilbert model (4-D basis): The relative error of fitting is 0.006197

Fig. 20 PCA model (5 eigenvectors): The same instance as in Fig. 19. The error is 0.014058

In a PCA model we take a certain number of the biggest eigenvalues so as to cover 95% of the shape variation, which is approximately equivalent to the range of 2σ, where σ is the variance of a model assumed to follow a Gaussian distribution [7]. (One can choose any portion of the eigenvalues.) We then take the eigenvectors corresponding to these eigenvalues. Hence, the number of eigenvectors used for constructing Φ varies example by example in a PCA model. In contrast, a fixed number of basis components is used for the Wiener-Hilbert model: a 4-D basis for all examples.

In one experiment, 18 hand contours¹¹ were used; one contour was selected as the data to be fitted to the model and the remaining 17 contours were used for the model (and as the sample set for the average). We used the raw data set of hand contours, which have been sampled by 72 points per contour. The contours are parameterised with a different method for each model, but the shapes to be compared should have the same number of points. Again, owing to the nature of the MDL software, where the number of points representing a shape must be of the form $2^{\ell} + 1$, we choose 65 points as the closest number of this form for both models.

¹¹ They are obtainable from the web pages of T. Cootes: http://www.isbe.man.ac.uk/~bim/

Figures 19 and 20 show the performance of model fitting for the two different models, with their different parameterisations. Figure 19 demonstrates an instance of a hand contour fitted to the Wiener-Hilbert model. The red contour is the hand data to be fitted and the blue contour is the optimised model from our algorithm. The relative error norm resulting from this fitting is 0.006197.

A PCA model for the same hand data was also tested and the fitting result is given in Fig. 20. In the PCA model, 5 eigenvectors are used to generate this model. Again, the red contour is the data to be fitted and the blue contour is the optimised model from the algorithm. The relative norm between the instance and its PCA model is 0.014057, which is bigger than that of the Wiener-Hilbert model.

In fitting to the Wiener-Hilbert model, mismatch is present only at the index finger and the middle finger (on one side of each); otherwise the fit is excellent. In fitting to a PCA model, mismatch occurs at many scattered parts. Some of the other contours were tested as data and they produced slightly smaller, but very similar, errors to these results. The errors of a model with a Hilbert basis vary little, of the order of 0.006 (< 0.007), while the errors of the PCA model vary more, of the order of 0.01 (> 0.01).

The femur example of Sect. 7.2 was also tested and the fitting is presented in Figs. 21 and 22. The femoral shape is smoother than the hand shape, so the errors in fitting are smaller than those of the hand examples for both models. The fitting result of the Wiener-Hilbert model is presented in Fig. 21; the error from the fitting of this model is 0.003364. The result of fitting to a PCA model with MDL parameterisation is given in Fig. 22; 12 eigenvectors are used to build Φ, and the error from the PCA model is 0.012056.

In summary, the relative errors produced by the two models are compared for two sets of examples: a hand shape and a femur shape. The speeds of convergence of the two models are similar; both converged in one or two iterations. Overall, the performance of model fitting of both methods is good, in particular for the hand example considering the small number of parameters. We have also tested denser sets with 129 parameters, and they produced slightly smaller errors.
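A sketch of the eigenvector selection for the PCA model is given below; it uses an SVD of the centred data matrix, the 0.95 threshold mirrors the 95% of shape variation mentioned above, and the function name is illustrative.

```python
import numpy as np

def pca_basis(shapes, variance_fraction=0.95):
    """Columns of Φ for a PCA model: eigenvectors of the sample covariance
    retaining `variance_fraction` of the total variance.

    `shapes` is an (m, n) array whose rows are the sample shape vectors."""
    X = np.asarray(shapes, dtype=float)
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centred data are the covariance eigenvectors.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(explained), variance_fraction)) + 1
    return Vt[:k].T          # n x k matrix of retained eigenvectors
```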
Table 2 Relative errors of model fitting, evaluated with the Euclidean norm ‖·‖

  Model fitting        Wiener-Hilbert model   PCA model
  hand (65 points)     0.006197               0.014058
  femur (129 points)   0.003364               0.012056

Fig. 21 Wiener-Hilbert model (4-D basis): The relative error of fitting is 0.003364

Fig. 22 PCA model (12 eigenvectors): The same instance as in Fig. 21. The error is 0.012056

The resulting errors, evaluated by $\|y - (\bar{x} + \Phi c)\|/\|y\|$, are summarised in Table 2: the model proposed in this paper in the second column, a PCA model in the third column. The errors from model fitting with the Wiener-Hilbert model are smaller than those of a PCA model. The mismatch in fitting to the Wiener-Hilbert model is confined to local areas where a large deformation occurs in the sample; the mismatch in a PCA model, on the other hand, is globally present.

8 Conclusions and Discussions

We have developed models for shape analysis, with particular relevance to medical images. We aimed to develop a standard way to model the shape of anatomical objects which can be used to differentiate variations in normal subjects from abnormalities due to disease.

We have focused on shapes that are not characterised by a small number of landmarks, and assumed that planar shapes are expressed by continuous curves as their boundaries. For these reasons, the shape space we propose is the set of continuous functions defined on a bounded interval. The shape space of our models is the well-known space C₀, and we use the structure of the Wiener space. As noted in Sect. 1.1, with this structure our shape space is equipped with all the properties required for a statistical framework and is not as complicated as differential manifolds. A shape represented by a curve is identified with a continuous function in the Wiener measure space; deformation is described by cylinder sets in the space; an average shape is generated with the Wiener measure; and employing measurable functions on the Wiener space enables us to extract variable Gaussians over shapes. The standard shape is formulated explicitly in the form of an average in our model. The average generated with Gaussians is quantitatively analysed using a quasi-score, which provides a useful tool for the evaluation of results from different methods. Our work is therefore different from previous methods in a number of ways. We are aware that the background knowledge of the Wiener space is beyond the undergraduate level, and we provided its essentials for our model in Sect. 2.

The models have been applied to a set of synthetic shapes and to sets of anatomical shapes in Sect. 7, and we deliberately chose both dissimilar and similar shapes. The results of applying the model show that the average generated by our model was a better representative of a sample set than the one produced by the common approaches which use a uniform distribution. This has been demonstrated not only visually but also quantitatively in terms of quasi-scores. The differences between our method and previous methods are most marked in cases where there are large natural deformations. This is the most important case in medical image analysis. For example, some tumours grow to double or triple their initial size, and to a different shape, during a series of medical examinations. Conventional methods are limited in explaining such cases. We have sought to overcome this limitation in developing our models. In particular, the model expresses the deformation embedded in samples regardless of whether the deformation is small or large.

We have defined two new concepts in our models: an average shape and a quasi-score.
The validity of the definitions is supported by theorems and corollaries, and in practical use we can avoid the problems of a Fréchet mean discussed in Sect. 3. It has been proved in Sect. 4 that the average generated with Gaussians exists, is unique, and is always within the range of deformation present in a sample set. Moreover, it is also a Fréchet mean in our shape space and so it yields the least spread. This implies that we can expect the outcome of other examples to be similar to the results of the comparisons demonstrated in Sect. 7. In addition, we have provided various types of invariance of our formulation that arise in practical applications. Therefore, we can say that our models are both valid and robust.

Our proposed concept, the quasi-score, involves the variance of the average generated in the model. Owing to its definition on a function space, the formula for the variance may not be directly computable. In this case, we suggested an alternative formula of the kind most commonly used for small sample sets; this alternative formula may be less accurate.

We also developed the model for shape variation and fitting in a low dimension by using the structure of a space larger than our shape space $C_0[a,b]$, namely $L^2[a,b]$. A basis for the Hilbert space enables us to reduce the infinite dimensionality to a finite dimension, as in Sect. 7.

The proposed model could be developed into a 3D model when an appropriate parameterisation method is available. The new measurement of global deformation, the quasi-score, can be applied to the assessment of abnormality: deformations due to abnormal growth such as a tumour. In conjunction with some local properties of shape, it can enable clinicians to estimate quantitatively the deformation corresponding to abnormal growth, for instance in staging colorectal cancer, which should be rigorous and for which quantitative measurement is necessary. Clinicians may be helped by using this measurement to make more precise observations of changes induced by treatment. These topics are the subject of ongoing work.

Acknowledgements The authors thank Mr. Vincent Calvez of the CMB at Oxford University for his kind help in translating [14], and Dr. Hans H. Thodberg of the Technical University of Denmark (DTU) for kindly providing data sets and software, which we adapted for comparison with our methods in Sect. 7.

Appendix A: Alignment

The analytic method used in Sect. 7 to find an optimal affine transformation is described here for readers who are not familiar with it.

A general 2-D affine transformation is modelled by a 3 × 3 matrix, a system with six degrees of freedom,

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{pmatrix}.$$

A planar curve is represented by a finite set of points $C_1 = \{(x_1,y_1),\ldots,(x_n,y_n)\}$ obtained by sampling or parameterisation. The transformed curve $C_2 = \{(x'_1,y'_1),\ldots,(x'_n,y'_n)\}$ is then calculated by the following relationship:

$$\begin{pmatrix} x'_1 & x'_2 & \cdots & x'_n \\ y'_1 & y'_2 & \cdots & y'_n \\ 1 & 1 & \cdots & 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \\ 1 & 1 & \cdots & 1 \end{pmatrix}. \qquad (A.1)$$

If we intend to align $C_1 = \{(x_1,y_1),\ldots,(x_n,y_n)\}$ to $C_2 = \{(x'_1,y'_1),\ldots,(x'_n,y'_n)\}$, we need to find the best affine transformation, i.e. the best values of $a_{11}, a_{12}, a_{13}, a_{21}, a_{22}$ and $a_{23}$ satisfying (A.1). If we assume that a correspondence between the samplings of the two curves $C_1$ and $C_2$ has been made, then the best affine transformation is the one giving the shortest distance between the two curves. The distance $d(C_1,C_2)$ is formulated by

$$d(C_1,C_2)^2 = \sum_{i=1}^{n} \bigl\{ (x'_i - x_i)^2 + (y'_i - y_i)^2 \bigr\}.$$

If the two curves are geometrically similar and a perfect correspondence sampling has been made, the distance is zero. Otherwise, the distance yields a polynomial of degree 2 with respect to each of $a_{11}, a_{12}, a_{13}, a_{21}, a_{22}$ and $a_{23}$:

$$D = d(C_1,C_2)^2 = c_1 a_{11}^2 + c_2 a_{12}^2 + c_3 a_{13}^2 + c_4 a_{21}^2 + c_5 a_{22}^2 + c_6 a_{23}^2 + \cdots.$$

That is, $D = D(a_{11}, a_{12}, \ldots, a_{23})$ is a quadratic polynomial in each of the six variables and its leading coefficients $c_1,\ldots,c_6$ are all positive. Hence, the polynomial D has a unique relative extremum, at which the partial derivatives $\partial D/\partial a_k$ for $k = 1,\ldots,6$ all vanish. Setting these partial derivatives to zero reduces to two linear systems represented by 3 × 3 matrices. Moreover, the extremum is the global minimiser of the polynomial D, since all leading coefficients of D are positive; each of the leading coefficients is a sum of squared values. Note that $d(C_1,C_2)$ is minimised if $D = d(C_1,C_2)^2$ is minimised.
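A minimal NumPy sketch of this pair-wise alignment is given below; it assumes the rows of the two (n, 2) point arrays are already in correspondence and solves the two 3-unknown least-squares problems directly, and the function name is illustrative.

```python
import numpy as np

def best_affine(C1, C2):
    """Least-squares estimate of the 2-D affine map taking the points of C1
    onto those of C2, as in (A.1)."""
    ones = np.ones((C1.shape[0], 1))
    M = np.hstack([C1, ones])                       # rows (x_i, y_i, 1)
    # Two independent 3-unknown least-squares problems, one per output coordinate.
    row1, *_ = np.linalg.lstsq(M, C2[:, 0], rcond=None)
    row2, *_ = np.linalg.lstsq(M, C2[:, 1], rcond=None)
    return np.vstack([row1, row2, [0.0, 0.0, 1.0]])  # the 3 x 3 affine matrix
```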

Appendix B: Some Basic Properties of the Hilbert Space

For convenient access to fundamental facts about the Hilbert space $L^2[a,b]$, we collect here the properties used in our parameterisation. They can be found in commonly used textbooks of functional analysis.

Proposition B.1 ([37, 40]) Let p ≥ 1.
(1) $L^p[a,b]$ is complete.
(2) $C[a,b]$ is dense in $L^p[a,b]$ with $\|\cdot\|_p$, i.e., $L^p[a,b]$ is the completion of $C[a,b]$.

Proposition B.2 ([37]) $L^2[a,b]$ is a Hilbert space with the inner product $\langle f,g\rangle = \int_a^b \overline{f(x)}\,g(x)\,dx$, where $\bar f$ is the complex conjugate of f.

Proposition B.3 ([37]) Every Hilbert space has an orthonormal basis.

Proposition B.4 ([37]) Let H be a Hilbert space and $S = \{x_\alpha\}_{\alpha\in A}$ an orthonormal basis. Then for each $y \in H$,
(i) $y = \sum_{\alpha\in A} \langle x_\alpha, y\rangle x_\alpha$;
(ii) $\|y\|^2 = \sum_{\alpha\in A} |\langle x_\alpha, y\rangle|^2$.

The equality in (i) means that the sum on the right-hand side converges (independently of order) to y in H. Conversely, if $\sum_{\alpha\in A} |c_\alpha|^2 < \infty$, $c_\alpha \in \mathbb{C}$, then $\sum_{\alpha\in A} c_\alpha x_\alpha$ converges to an element of H. The norm ‖·‖ in (ii) is the norm induced by the inner product ⟨·,·⟩ in H.

By these propositions, any element y of the Hilbert space, and so of $C_0[a,b]$, can be approximated by a finite sum $\sum_{j=1}^{n} \langle x_j, y\rangle x_j$ over an orthonormal basis $S = \{x_1, x_2, \ldots\}$ of the Hilbert space.

Appendix C: Algorithm for a Hilbert Basis Parameterisation

Let φ be a single-valued function representing a shape, desirably square integrable; then the set $s(\varphi) = \{f \in L^2[a,b] \mid \langle f,\varphi\rangle = 0,\ \langle f,1\rangle = 0\}$ is a linear subspace of $L^2[a,b]$, called the Affine Shape of φ in [3]; the name follows [43], where a shape is represented by sparse sets of landmarks (polygons). Then its orthogonal complement, say d(φ), forms a basis for $L^2[a,b]$ [3]. In fact, we can see that s(φ) is an annihilator of d(φ) and so $L^2[a,b] = d(\varphi) \oplus s(\varphi)$, where ⊕ is the direct sum of the Hilbert space.

When we have a finite dimensional, orthonormal basis $\varphi = (\varphi_1,\ldots,\varphi_n)$, every element f of $L^2[a,b]$ can be approximated by φ: $Q_\varphi(f) = \sum_{k=1}^{n} \langle \varphi_k, f\rangle \varphi_k$, where n is the dimension of d(φ). In this approximation, our concern is obviously the error incurred in the approximation: $P_\varphi(f) = 1 - Q_\varphi(f)$, where 1 is the identity functional ($1(f) = f$). Theoretically, the operator $P_\varphi$ annihilates every function in d(φ), so that $P_\varphi Q_\varphi \equiv 0$. In practice, however, this relationship does not hold exactly; we denote its Hilbert–Schmidt norm by μ. In the practical application of this method in Sect. 7.2, we try to minimise the norm μ. The algorithm is mainly adapted from [3] and presented below; its application can be found in [11].

1. Initialisation: Choose a parameterisation for each curve, say the i-th curve, $\psi_i(t) = (\psi_{i1}(t), \psi_{i2}(t), \psi_{i3}(t))$, such as the arc-length parameterisation onto the unit interval I = [0, 1]. Then initialise $d(\psi_i) = \mathrm{span}(\{\psi_{i1},\psi_{i2},\psi_{i3}\})$, $i = 0,\ldots,m-1$, for m shapes.
2. Update d(φ): Compute d(φ) keeping $d(\psi_i)$ fixed for all i, $d(\varphi) = \mathrm{span}(\{\psi_{ik} \mid i = 0,\ldots,m-1,\ k = 1,2,3\})$. Find $P_\varphi$ that minimises μ.
3. Update parameterisation: Reparameterise the curves keeping d(φ) and $d(\psi_i)$ fixed, by finding a continuous bijection $\gamma_i : I \to I$ such that $d(\psi_i \circ \gamma_i)$ minimises μ. Set $d(\psi_i) := d(\psi_i \circ \gamma_i)$, $i = 0,\ldots,m-1$, and go to step 2.

Appendix D: C0 Assumption of the Shape Space

This section is for those who prefer differentiability for a shape space rather than the C0 assumption (linked from Sect. 1.1).

There has been a great deal of interesting mathematics developed for curves or surfaces, e.g., Koenderink (shape index), Giblin and Cipolla, and there has been work showing how scale space (Mokhtarian) can be applied to images. The majority of other methods also assume differentiability, so that they are built on a Cⁿ space, where n ≥ 2. For the shapes of smooth curves or surfaces, we can of course obtain extensive results by utilising the essential mathematical tools. Obviously, the stronger the assumptions are, the easier they are to work with.

What about the practical side? When we deal with a discretised version of a real-world problem in computer vision, quite often we find striking differences between the continuous world of mathematics and the discretised world of computing. For a simple example, a curve that appears smooth perceptually is actually jagged in machine vision. In such a case, the calculated curvature of the curve does not reflect the perceived curvature; the location of the highest curvature computed by machine vision does not agree with the location where we perceive the highest curvature. One common way of avoiding this type of problem is smoothing, as in methods using a scale space. However, smoothing is not always helpful in medical image analysis. Some shapes, such as star-shaped tumours, usually lose their intrinsic nature as a result of smoothing. In our model, we want to allow and cope with this type of shape as well as shapes that have been explained well by conventional methods assuming differentiability. Therefore, we took the minimal assumption C0 for our model.
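The identity $P_\varphi Q_\varphi \equiv 0$ from Appendix C can be illustrated numerically for a discretised orthonormal basis, as in the sketch below; in the paper's setting the basis is only approximately orthonormal, so μ is minimised rather than being exactly zero. The grid size, basis dimension and variable names are illustrative.

```python
import numpy as np

# Build a random 4-dimensional orthonormal basis on a 129-point grid.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((129, 4)))

Q = basis @ basis.T            # Q_phi: projection onto d(phi)
P = np.eye(129) - Q            # P_phi = 1 - Q_phi, annihilates d(phi)

# Hilbert-Schmidt (Frobenius) norm of P_phi Q_phi: ~1e-15, zero up to rounding.
mu = np.linalg.norm(P @ Q, 'fro')
print(mu)
```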

References

1. Bartle, R.G.: The Elements of Real Analysis. Wiley, New York (1976)
2. Beichel, R., Bischof, H., Leberl, F., Sonka, M.: Robust active appearance models and their application to medical image analysis. IEEE Trans. Med. Imaging 24(9), 1151–1169 (2005)
3. Berthilsson, R., Åström, K.: Extension of affine shape. J. Math. Imaging Vis. 11, 119–136 (1999)
4. Bookstein, F.: Size and shape spaces for landmark data in two dimensions. Stat. Sci. 1(2), 181–221 (1986)
5. Brechbühler, C., Gerig, G., Kübler, O.: Parameterisation of closed surfaces for 3-D shape description. CVGIP: Image Underst. 61, 154–170 (1995)
6. Cootes, T.F., Taylor, C.J.: Statistical models of appearance for computer vision. Technical Report, University of Manchester, Manchester M13 9PT, UK (2004)
7. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models—their training and application. Comput. Vis. Image Underst. 61(1), 38–59 (1995)
8. Davies, R.H., Cootes, T.F., Taylor, C.J.: A minimum description length approach to statistical shape modelling. In: IPMI 2001. LNCS, vol. 2082, pp. 50–63. Springer, Berlin (2001)
9. Davies, R.H., Twining, C., Cootes, T.F., Taylor, C.J.: A minimum description length approach to statistical shape modelling. IEEE Trans. Med. Imaging 21, 525–537 (2002)
10. Dryden, I.L., Mardia, K.V.: General shape distributions in a plane. Adv. Appl. Probab. (SGSA) 23, 259–276 (1991)
11. Ericsson, A., Åström, K.: An affine invariant deformable shape representation for general curves. In: ICCV 2003 (2003)
12. Fletcher, P.T., Joshi, S., Lu, C., Pizer, S.M.: Gaussian distributions on Lie groups and their application to statistical shape analysis. In: Taylor, C.J., Noble, J.A. (eds.) IPMI 2003. LNCS, vol. 2732. Springer, Berlin (2003)
13. Fletcher, P.T., Lu, C., Pizer, S.M., Joshi, S.: Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans. Med. Imaging 23(8), 995–1005 (2004)
14. Fréchet, M.: Les éléments aléatoires de nature quelconque dans un espace distancié. Ann. Inst. Henri Poincaré 10, 215–310 (1948)
15. Gerig, G., Styner, M.: Shape vs size: improved understanding of the morphology of brain structures. In: MICCAI 2001. LNCS, vol. 2208. Springer, Berlin (2001)
16. Gower, J.C.: Generalised Procrustes analysis. Psychometrika 40(1), 33–51 (1975)
17. Johnson, G.W., Lapidus, M.L.: The Feynman Integral and Feynman's Operational Calculus. Oxford University Press, London (2000)
18. Karcher, H.: Riemannian center of mass and mollifier smoothing. Commun. Pure Appl. Math. 30, 509–541 (1977)
19. Kelemen, A., Székely, G., Gerig, G.: Elastic model-based segmentation of 3-D neuroradiological data sets. IEEE Trans. Med. Imaging 18(10), 828–839 (1999)
20. Kendall, D.G.: Shape-manifolds, procrustes metrics, and complex projective spaces. Bull. Lond. Math. Soc. 16, 81–121 (1984)
21. Kim, J.-G.: Probabilistic shape models: application to medical images. Ph.D. thesis, Oxford University (2005)
22. Klassen, E., Srivastava, A., Mio, W., Joshi, S.H.: Analysis of planar shapes using geodesic paths on shape spaces. IEEE Trans. PAMI 26(3), 372–383 (2004)
23. Kotcheff, A.C.W., Taylor, C.J.: Automatic construction of eigenshape models by direct optimization. Med. Image Anal. 2(4), 303–314 (1988)
24. Kume, A., Le, H.: Estimating Fréchet means in Bookstein's shape space. Adv. Appl. Probab. (SGSA) 32, 663–674 (2000)
25. Le, H.: Mean size-and-shapes and mean shapes: a geometric point of view. Adv. Appl. Probab. (SGSA) 27, 44–55 (1995)
26. Le, H.: On the consistency of Procrustean mean shape. Adv. Appl. Probab. (SGSA) 30, 53–63 (1998)
27. Le, H.: Locating Fréchet means with application to shape spaces. Adv. Appl. Probab. (SGSA) 33, 324–338 (2001)
28. Le, H., Kume, A.: The Fréchet mean shape and the shape of the means. Adv. Appl. Probab. (SGSA) 32, 101–113 (2000)
29. Mardia, K.V., Dryden, I.L.: Shape distributions for landmark data. Adv. Appl. Probab. (SGSA) 21, 742–755 (1989)
30. Meier, D., Fisher, E.: Parameter space warping: shape-based correspondence between morphologically different objects. IEEE Trans. Med. Imaging 21(1), 31–47 (2002)
31. Mumford, D.: The problem of robust shape descriptors. In: 1st ICCV 1987, pp. 602–606. IEEE (1987)
32. Pennec, X.: Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements. J. Math. Imaging Vis. 25(1), 127–154 (2006)
33. Pennec, X., Ayache, N.: Uniform distribution, distance and expectation problems for geometric features processing. J. Math. Imaging Vis. 9, 46–67 (1998)
34. Pennec, X., Thirion, J.-P.: A framework for uncertainty and validation of 3D registration methods based on points and frames. Int. J. Comput. Vis. 25(3), 203–229 (1997)
35. Pizer, S.M., Eberly, D., Fritsch, D.S.: Zoom-invariant vision of figural shape: the mathematics of cores. Comput. Vis. Image Underst. 69(1), 55–71 (1998)
36. Pizer, S.M., Fritsch, D.S., Yushkevich, P.A., Johnson, V.E., Chaney, E.L.: Segmentation, registration and measurement of shape variation via image object shape. IEEE Trans. Med. Imaging 18(10), 851–865 (1999)
37. Reed, M., Simon, B.: Functional Analysis. Methods of Modern Mathematical Physics, vol. 1. Academic Press, New York (1980)
38. Royden, H.L.: Real Analysis, 3rd edn. Macmillan, New York (1988)
39. Rudin, W.: Principles of Mathematical Analysis. McGraw-Hill, New York (1976)
40. Rudin, W.: Real and Complex Analysis, 3rd edn. McGraw-Hill, New York (1987)
41. Small, C.G.: The Statistical Theory of Shape. Springer, Berlin (1996)
42. Sparr, G.: Depth computations from polyhedral images. Image Vis. Comput. 10(10), 683–688 (1992)
43. Sparr, G.: A common framework for kinetic depth, reconstruction and motion for deformable objects. In: European Conf. on Computer Vision. LNCS, vol. 801, pp. 471–482. Springer, Berlin (1994)
44. Sparr, G.: Euclidean and affine structure/motion for uncalibrated cameras from affine shape and subsidiary information. In: SMILE Workshop (1998)
45. Staib, L.H., Duncan, J.S.: Boundary finding with parametrically deformable models. IEEE Trans. PAMI 14(11), 1061–1075 (1992)
46. Styner, M., Gerig, G.: Medial models incorporating object variability for 3D shape analysis. In: IPMI 2001. LNCS, vol. 2082. Springer, Berlin (2001)
47. Székely, G., Kelemen, A., Brechbühler, C., Gerig, G.: Segmentation of 2D and 3D objects from MRI volume data using constrained elastic deformations of flexible Fourier contour and surface models. Med. Image Anal. 1(1), 19–34 (1996)
48. Terzopoulos, D., Metaxas, D.: Dynamic 3D models with local and global deformations: deformable superquadrics. IEEE Trans. PAMI 13(7), 703–714 (1991)
49. Thodberg, H.H.: Minimum description length shape and appearance models. In: IPMI 2003. LNCS, vol. 2732, pp. 51–62. Springer, Berlin (2003)

50. Thompson, D.: On Growth and Form. Cambridge University Press, Cambridge (1961)
51. Trefethen, L.N., Bau III, D.: Numerical Linear Algebra. SIAM (1997)
52. Troutman, J.L.: Variational Calculus and Optimal Control, 2nd edn. Springer, New York (1996)
53. Twining, C., Marsland, S.: Constructing diffeomorphic representations of non-rigid registrations of medical images. In: Taylor, C.J., Noble, J.A. (eds.) IPMI 2003. LNCS, vol. 2732, pp. 413–425. Springer, Berlin (2003)
54. Yeh, J.: Stochastic Processes and the Wiener Integral. Pure and Applied Mathematics, vol. 13. Marcel Dekker, New York (1973)

Jeong-Gyoo Kim received the DPhil degree in Engineering in 2006 from the University of Oxford, UK, and did her post-doc in the Department of Engineering Science. She has worked on computer vision, mainly shape analysis and medical image analysis, since 2001. She will be with the Dept. of Mathematics, Yonsei University, Korea. Her research interests include mathematical modelling, shape analysis, segmentation, and registration. Prior to the University of Oxford, she worked on pure mathematics at Yonsei University, Korea, and the University of Nebraska-Lincoln, USA. Since her first doctoral degree in mathematics in 1987 from Yonsei University, she has published a number of journal articles on Measure and Integration and Functional Analysis.

J. Alison Noble is a Professor of Engineering Science in the Department of Engineering Science at the University of Oxford, UK. She heads a biomedical image analysis laboratory of around 25 researchers working in functional cardiovascular imaging, oncological image analysis, women's health and in vivo soft tissue mechanics that has recently moved to the new Oxford Institute of Biomedical Engineering. Professor Noble has 22 years of research experience in image analysis and its application in manufacturing and medicine and has published over 180 peer-reviewed papers. She is on the Board of the MICCAI Society, a Senior Member of the IEEE, AE for IEEE Transactions on Medical Imaging, and a regular reviewer for other leading journals and conferences in her field.