SHAPE ANALYSIS: Issues and Problems

M.B. Rao, University of Cincinnati

A seminar presented at the Division of Biostatistics and Bioinformatics, University of Cincinnati, January 30, 2015

Outline

1. Exordium
2. Consulting problems
3. What is shape?
4. How to bring shapes onto a single platform?
   a. Procrustes analysis
   b. Bookstein coordinates
   c. Helmert transformation à la Kendall
5. Distribution theory
6. Excursus

1. Exordium

Introduction and Examples

Take any object depicted in two or three dimensions. The focus is on the outline (shape) of the object. What is shape? The outline can be described by a mathematical function, but finding such a function is hard. Another way is to identify some landmarks on the shape and record the coordinates of the landmarks in a coordinate system. Such data on a shape are called landmark data. We will be working with landmark data on several objects. From a statistical point of view, a shape is characterized by a collection of points listed in some order, together with their coordinates. The points are put together in the form of a matrix, with rows representing points (landmarks) and columns representing coordinates.

The subject matter comes under the names ‘Statistical Shape Analysis’ and ‘Morphometrics.’ Hundreds of papers have been published in this area.

Books

Ian Dryden and Kanti Mardia – Statistical Shape Analysis, Wiley, 1998
Julien Claude – Morphometrics with R, Springer, 2008
Fred Bookstein – Morphometric Tools for Landmark Data, Cambridge University Press, 1991

Examples

Install and load the R package ‘shapes’. It contains several shape data sets. Load the data set ‘digit3.dat’:

> install.packages("shapes")
> library(shapes)
> data(digit3.dat)

What are the dimensions of the data? It is an array.

> dim(digit3.dat)
[1] 13  2 30

It is an array of 30 shapes in two dimensions, each with 13 landmarks. The first number gives the number of rows, the second the number of columns, and the third the number of slices. Plot the first two shapes:

> plotshapes(digit3.dat[, , 1], joinline = c(1:13))
> plotshapes(digit3.dat[, , 2], joinline = c(1:13))

Plot all shapes:

> plotshapes(digit3.dat, joinline = c(1:13))

Data behind Shape 1:

> digit3.dat[, , 1]
      [,1] [,2]
 [1,]    9  -27
 [2,]   12  -31
 [3,]   17  -36
 [4,]   26  -39
 [5,]   34  -37
 [6,]   36  -33
 [7,]   38  -27
 [8,]   35  -19
 [9,]   30  -15
[10,]   21  -14
[11,]   21   -8
[12,]   16   -6
[13,]    8   -5

[Figure: Example 1]

[Figure: Example 2]

[Figure: Plot all shapes of 3]

Questions

Basic: How does one define shape?

1. How to define distance between two shapes?

2. How to define mean shape?

3. How to define median shape?

4. How to measure variation present in the shapes?

5. What is Shape space? How to introduce a distribution on the shape space?

2. Consulting problems

A. Antarctica was teeming with life, both flora and fauna, 60,000 years ago. A geology professor on a summer excavation expedition in Antarctica brought back 60 seeds. Each seed, laid out on tracing paper, was examined under a microscope, and its outline was drawn on the paper. He brought these 60 sheets to my office and asked me to do a cluster analysis of the seeds. How?

B. A physics researcher in the medical school here came to me with a modeling problem. The treatment regimen for breast cancer is a six-week program with one treatment a week:
a. Identify the tumor in the breast.
b. Locate the center of the tumor.
c. Apply intense radiation at the center for a certain length of time.
The woman comes back the following week:
a. Identify the tumor in the breast (the tumor seems to have shrunk).
b. Locate the center of the tumor (the center has shifted; the tumor itself has shifted too!).
c. Apply radiation.
Data:
Week 1: Center (0, 0, 0)
Week 2: Center (x1, y1, z1)
… …
Week 6: Center (x5, y5, z5)
plus the outlines of the shapes of the tumors and some covariate information (age, parity, breast density, race, etc.). There are data on about 30 women.
Questions: Model how the center shifts from week to week; model the variation in the shapes.

C. Donna’s Morphometrics Lab in the Children’s Hospital

D. Tessier facial cleft

3. What is shape?

Suppose Y and W are two shapes in m-dimensional space, each with k landmarks. This means that Y and W are matrices of order k×m (k rows and m columns). Say that the shapes Y and W are the same if, after shifting the location of one shape, it is identical to the other shape. What does a location shift mean? Suppose Y = (…) and W = (…), with landmark 1 joined to landmark 2 and landmark 2 joined to landmark 3. Then W = Y + (…): every landmark of Y is moved by the same vector.

[Figure: the shapes Y and W]
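To make the location-shift idea concrete, here is a small Python sketch (plain Python, not the R package ‘shapes’ used elsewhere in this seminar). A location shift moves every landmark by the same vector a, so W = Y + 1_k a^T; the hypothetical helpers `shift` and `same_up_to_location` build such a shift and test for one, using a made-up 3-landmark configuration rather than the matrices from the slide.

```python
# Sketch: do two configurations differ only by a location shift?
# A configuration is a list of k landmarks; each landmark is a tuple of m coordinates.


def shift(Y, a):
    """Return Y + 1_k a^T, i.e. every landmark of Y moved by the vector a."""
    return [tuple(c + s for c, s in zip(row, a)) for row in Y]


def same_up_to_location(Y, W, tol=1e-9):
    """True if W = Y + 1_k a^T for some vector a (all rows of W - Y are equal)."""
    if len(Y) != len(W):
        return False
    # Candidate shift vector: the difference at the first landmark.
    a = [w - y for w, y in zip(W[0], Y[0])]
    return all(
        abs((w - y) - s) <= tol
        for row_y, row_w in zip(Y, W)
        for w, y, s in zip(row_w, row_y, a)
    )


# Hypothetical configuration with k = 3 landmarks in m = 2 dimensions.
Y = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
W = shift(Y, (2.0, 1.0))
print(same_up_to_location(Y, W))  # True
print(same_up_to_location(Y, [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]))  # False: a rescaling
```

The same pattern extends to any m, since only the per-coordinate differences are compared.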

Say two shapes Y and W are the same if, after shrinking or enlarging (scaling) one of the shapes, it is identical to the other shape. Consider the following example: Y = (…) and W = (…).

[Figure: the shapes Y and W]

Shrink the shape W by 50%: 0.5 W = 0.5 (…) = (…).

Say two shapes Y and W are the same if, after rotating one of the shapes by some angle, it coincides with the other shape. Look at the following example: Y = (…) and W = (…). Plot the shapes.

[Figure: the shapes Y and W]

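Rotate the shape W anti-clockwise by an angle of 90°. What does it mean? Post-multiply W by a 2×2 matrix:

$$W \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = (\ldots)$$

The 2×2 matrix above is an example of a rotation matrix.

One could rotate a shape by any angle. In general, rotating a point or a shape anti-clockwise by an angle θ is tantamount to post-multiplying the point (a row vector) or the shape matrix by the matrix

$$R(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.$$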
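A minimal Python sketch of the rotation rule, assuming landmarks are stored as (x, y) rows as in the R output earlier: post-multiplying a row (x, y) by R(θ) = [[cos θ, sin θ], [−sin θ, cos θ]] gives (x cos θ − y sin θ, x sin θ + y cos θ), which is (x, y) turned anti-clockwise by θ about the origin. The function names are hypothetical.

```python
import math


def rotation_matrix(theta):
    """R(theta) for row vectors: (x, y) R(theta) is (x, y) turned anti-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s],
            [-s, c]]


def post_multiply(X, R):
    """Return the k x 2 configuration X R (X is a list of (x, y) rows)."""
    return [(x * R[0][0] + y * R[1][0], x * R[0][1] + y * R[1][1]) for x, y in X]


def rotate_shape(X, degrees):
    """Rotate every landmark of X anti-clockwise about the origin by `degrees`."""
    return post_multiply(X, rotation_matrix(math.radians(degrees)))


# (1, 0) goes to approximately (0, 1) and (0, 1) to approximately (-1, 0) under 90 degrees.
print(rotate_shape([(1.0, 0.0), (0.0, 1.0)], 90))
```

The printed values carry tiny floating-point residue (cos 90° is not exactly 0 in floating point), so comparisons should use a tolerance.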

Definition: Shape is all the geometrical information that remains when location, scale, and rotational effects are filtered out from an object.

In summary: Take any shape and shift it to another location; we do not get a new shape. Take any shape and scale it; we do not get a new shape. Take any shape and rotate it; we do not get a new shape. Say two shapes are identical if, after shifting, scaling, and rotating one of them, it matches the other.

Goals:

1. We have landmark data on several objects. Bring them together onto the same platform by location shift, scaling, and rotation. Obtain summary statistics of the shapes after that. How to do that?

2. Develop distribution theory on shapes.

3. Fit a shape distribution to the landmark data.

4. Pursue statistical inference.

5. Non-parametric inference

6. Pattern recognition

7. Etc.

Example:

Consider the landmark data on the first digit 3 in the data set ‘digit3.dat.’ Plot it.

> data(digit3.dat)
> digit3.dat[, , 1]
      [,1] [,2]
 [1,]    9  -27
 [2,]   12  -31
 [3,]   17  -36
 [4,]   26  -39
 [5,]   34  -37
 [6,]   36  -33
 [7,]   38  -27
 [8,]   35  -19
 [9,]   30  -15
[10,]   21  -14
[11,]   21   -8
[12,]   16   -6
[13,]    8   -5
> plotshapes(digit3.dat[, , 1], joinline = c(1:13))

[Figure: Shape 1 of digit3.dat]

Rotate it by 90°: post-multiply its landmark data by the appropriate rotation matrix.

[Figure: Shape 1 before and after the 90° rotation]

1. If the shape is located in the first quadrant, the 90° rotation will place it in the second quadrant.

2. If the shape is located in the second quadrant, the 90° rotation will place it in the third quadrant.

3. If the shape is located in the third quadrant, the 90° rotation will place it in the fourth quadrant.

4. If the shape is located in the fourth quadrant, the 90° rotation will place it in the first quadrant.

5. Our digit 3 is located in the fourth quadrant, so the 90° rotation places it in the first quadrant.
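The quadrant rules can be checked numerically on Shape 1 of digit3.dat, whose 13 landmarks are copied from the R output earlier. This is a hedged Python sketch; `rotate_shape` and `quadrant` are illustrative helpers, not functions of the ‘shapes’ package.

```python
import math

# Landmarks of Shape 1 of digit3.dat (x, y), copied from the R output earlier.
SHAPE1 = [(9, -27), (12, -31), (17, -36), (26, -39), (34, -37), (36, -33),
          (38, -27), (35, -19), (30, -15), (21, -14), (21, -8), (16, -6), (8, -5)]


def rotate_shape(X, degrees):
    """Post-multiply each row (x, y) by R = [[cos t, sin t], [-sin t, cos t]] (anti-clockwise)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in X]


def quadrant(X):
    """Quadrant (1-4) containing every landmark of X, or None if X straddles an axis."""
    signs = {1: (1, 1), 2: (-1, 1), 3: (-1, -1), 4: (1, -1)}
    for q, (sx, sy) in signs.items():
        if all(x * sx > 0 and y * sy > 0 for x, y in X):
            return q
    return None


print(quadrant(SHAPE1))                    # 4
print(quadrant(rotate_shape(SHAPE1, 90)))  # 1
```

Under a 90° anti-clockwise turn each (x, y) becomes (−y, x), which is why every fourth-quadrant point (x > 0, y < 0) lands in the first quadrant.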

4. How to bring all shapes onto a single platform?

a. What is Procrustes analysis?

Consider two planar shapes y and w. Mathematically, each of y and w is an ordered sequence of 2-tuples giving the landmarks of the shape. Let us look at the first two shapes in the ‘digit3.dat’ data. They can be brought into the same frame. Shape 1 seems to be taller and wider than Shape 2. The simplest first step is to make the centroid of each shape (0, 0), that is, for each shape, make the mean of each coordinate zero.

Goal:

1. Rotate Shape 2 by an angle θ.

2. Rescale it by a factor β > 0 (expand if β > 1, shrink if β < 1).

3. Shift it to a different position by (a, b).

4. Choose θ, β, and (a, b) so that the resultant modified Shape 2 is closest to Shape 1 in the Euclidean sense.

This is, in essence, a Procrustes transformation of Shape 2 to Shape 1. The objective can be formulated mathematically if we view each 2-tuple as a complex number.

Procrustes is a figure from Greek mythology. He kept a single bed at his roadside inn for travelers who wanted to rest there for the night, so he could offer it to only one traveler. The dimensions of the bed were fixed and immutable. If the traveler was shorter than the bed, Procrustes stretched his legs to fit him snugly into the bed. If the traveler was taller than the bed, Procrustes chopped off his legs so that he fit snugly into the bed.

Procrustes transformation

Let $y = (y_1, y_2, \ldots, y_k)^T$ and $w = (w_1, w_2, \ldots, w_k)^T$ be two centered shapes, with each 2-tuple written as a complex number, so each $y_i$ and $w_i$ is a complex number. Write the complex linear model

$$y_i = u + \beta e^{i\theta} w_i + \varepsilon_i, \qquad i = 1, 2, \ldots, k.$$

In matrix notation,

$$y = (a + ib)\, 1_k + \beta e^{i\theta} w + \varepsilon,$$

where $1_k$ is a column vector of k ones. We rotate the shape w by an angle θ (i.e., form $e^{i\theta} w$) and then scale it by β. Whatever shape we get, we shift it by a + ib.

Parameters

1. u = a + i b is the shift.

2. θ is the degree of rotation.

3. β > 0 is the scale.

ε is the error.

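Estimate the parameters so that y and $(a + ib)1_k + \beta e^{i\theta} w$ are as close as possible: minimize the sum of squared moduli of the errors,

$$\sum_{i=1}^{k} \left| y_i - u - \beta e^{i\theta} w_i \right|^2 = (y - u 1_k - \beta e^{i\theta} w)^* (y - u 1_k - \beta e^{i\theta} w),$$

with respect to u, θ, and β (a complex least squares problem!). Here the superscript * denotes the complex conjugate transpose (complex conjugation followed by transposition). The solution is explicit:

$$\hat{\theta} = \arg(w^* y) = -\arg(y^* w), \qquad \hat{\beta} = \frac{|y^* w|}{w^* w}, \qquad \hat{u} = 0,$$

where $\hat{u} = 0$ because both shapes are centered.

The package ‘shapes’ has a command that carries out Procrustes analysis. Plot Shapes 1 and 2 in the same frame.

[Figure: Shapes 1 and 2 of digit3.dat in the same frame]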
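The explicit solution can be written in a few lines of Python using built-in complex numbers. This is a sketch under the model’s assumptions (landmarks coded as x + iy, both shapes centered so that û = 0); `procrustes_fit` is a hypothetical name and is not the ‘shapes’ package’s own Procrustes command. The check uses a made-up square whose rotated, rescaled, shifted copy should give β̂ = 2 and θ̂ = 0.5.

```python
import cmath


def center(z):
    """Subtract the centroid so that the configuration has mean 0."""
    m = sum(z) / len(z)
    return [zi - m for zi in z]


def procrustes_fit(y, w):
    """Complex least squares fit of y_i = u + beta * exp(i*theta) * w_i + eps_i.

    y and w are lists of complex landmarks (x + iy). Both are centered first,
    which makes u_hat = 0. Returns (beta_hat, theta_hat, fitted), where
    fitted = beta_hat * exp(i * theta_hat) * (centered w).
    """
    y, w = center(y), center(w)
    wy = sum(wi.conjugate() * yi for wi, yi in zip(w, y))  # w* y
    ww = sum(abs(wi) ** 2 for wi in w)                      # w* w
    theta = cmath.phase(wy)                                 # theta_hat = arg(w* y)
    beta = abs(wy) / ww                                     # beta_hat = |y* w| / (w* w)
    fitted = [beta * cmath.exp(1j * theta) * wi for wi in w]
    return beta, theta, fitted


# Hypothetical check: y is w rotated by 0.5 rad, scaled by 2 and shifted by 3 + 2i.
w = [0, 1, 1 + 1j, 1j]
y = [(3 + 2j) + 2 * cmath.exp(0.5j) * wi for wi in w]
beta, theta, fitted = procrustes_fit(y, w)
print(round(beta, 6), round(theta, 6))  # 2.0 0.5
```

Because the shift is removed by centering, only the rotation and scale need to be estimated, which is why the fit reduces to the single complex number w*y.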

Numerical example

Both shapes are centered at the origin (0, 0).

[Figure: Two Shapes of Three]

Comments: The shapes seem to be similar. The blue shape (Shape 2) is bigger. If we can shrink it (scale β < 1) and rotate it, we should be able to bring it closer to the red shape (Shape 1). Use the linear model theory for complex data to bring Shape 2 as close as possible to Shape 1.

[Figure: Shape 1 & Modified Shape 2]

Centering Digit 3 data and Procrustes

We center all the digit 3 shapes and apply the Procrustes transformation to Shapes 2 to 30 to bring each of them as close as possible to Shape 1. We then find the mean shape and the variance of the shapes. Let us plot the centered shapes.

[Figure: Centered Digit 3s]
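Centering is just subtracting the mean of each coordinate. A minimal Python sketch on Shape 1 of digit3.dat (landmarks copied from the R output earlier; the helper names are hypothetical):

```python
# Landmarks of Shape 1 of digit3.dat (x, y), copied from the R output earlier.
SHAPE1 = [(9, -27), (12, -31), (17, -36), (26, -39), (34, -37), (36, -33),
          (38, -27), (35, -19), (30, -15), (21, -14), (21, -8), (16, -6), (8, -5)]


def centroid(X):
    """Mean of each coordinate over the k landmarks."""
    k = len(X)
    return (sum(x for x, _ in X) / k, sum(y for _, y in X) / k)


def center(X):
    """Translate X so that its centroid is (0, 0)."""
    cx, cy = centroid(X)
    return [(x - cx, y - cy) for x, y in X]


cx, cy = centroid(SHAPE1)
print(round(cx, 2), round(cy, 2))  # 23.31 -22.85
centered = center(SHAPE1)
print(centroid(centered))          # approximately (0.0, 0.0)
```

The centroid of Shape 1 is (303/13, −297/13); after subtracting it, both coordinate means are zero up to floating-point error.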

Let us apply the Procrustes transformation to Shapes 2 to 30 to bring them all closest to Shape 1.

[Figure: Procrustes of Shapes 2:30 into Shape 1]

Let us find the mean shape.

[Figure: Procrustes of Shapes 2:30 into Shape 1 + Mean Shape]

Criticism?

Let us measure the variation present in the shapes. From each Procrustes-fitted shape, subtract the mean shape, square the differences, add them up, and then divide by 13.

Standard deviation of the shapes:
[1] 3.743057

Can we build a 95% confidence interval for the population mean shape?
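The steps on the last few slides (center every shape, fit Shapes 2 to 30 onto Shape 1, average, measure spread) can be sketched in Python as follows. This is an illustration, not the author’s code: landmarks are complex numbers x + iy, the fit uses the explicit complex least squares coefficient βe^{iθ} = (w*y)/(w*w) for centered shapes, and because the slide does not spell out exactly which sums are divided by 13, the divisor in `spread` is left as a parameter (an assumption). The example uses three made-up configurations that differ only by translation, rotation and scaling, so after alignment their spread should be zero.

```python
import cmath
import math


def center(z):
    """Subtract the centroid (mean landmark) from a list of complex landmarks."""
    m = sum(z) / len(z)
    return [zi - m for zi in z]


def align_to(ref, w):
    """Fit w onto the centered reference ref by complex least squares.

    For centered w the optimal beta * exp(i * theta) equals (w* ref) / (w* w).
    """
    w = center(w)
    coef = sum(wi.conjugate() * ri for wi, ri in zip(w, ref)) / sum(abs(wi) ** 2 for wi in w)
    return [coef * wi for wi in w]


def align_all(shapes):
    """Center every shape and fit Shapes 2, ..., n onto (centered) Shape 1."""
    ref = center(shapes[0])
    return [ref] + [align_to(ref, s) for s in shapes[1:]]


def mean_shape(aligned):
    """Landmark-by-landmark average of the aligned configurations."""
    n = len(aligned)
    return [sum(s[j] for s in aligned) / n for j in range(len(aligned[0]))]


def spread(aligned, mean, divisor):
    """Square root of (sum of |landmark - mean landmark|^2 over all shapes) / divisor."""
    ss = sum(abs(s[j] - mean[j]) ** 2 for s in aligned for j in range(len(mean)))
    return math.sqrt(ss / divisor)


# Made-up configurations: a square and two rotated, rescaled, shifted copies of it.
base = [0, 1, 1 + 1j, 1j]
shapes = [base,
          [3 * cmath.exp(0.3j) * b + (5 + 5j) for b in base],
          [0.5 * cmath.exp(-1j) * b - 2 for b in base]]
aligned = align_all(shapes)
mu = mean_shape(aligned)
print(round(spread(aligned, mu, divisor=len(base)), 10))  # 0.0
```

One consequence of this design is visible in the code: the result depends on which shape is chosen as the reference, since every fit targets Shape 1.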

b. Bookstein’s coordinates and Bookstein’s mean shape

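Let (x1, y1), (x2, y2), … , (xk, yk) be the landmarks of a shape in two dimensions. Translate, rotate, and rescale the shape so that Landmark 1 becomes (−1/2, 0) and Landmark 2 becomes (1/2, 0). The new landmarks are (−1/2, 0), (1/2, 0), (u3, v3), (u4, v4), … , (uk, vk). The whole operation can be summarized as follows. For any j ≥ 1,

$$\begin{pmatrix} u_j \\ v_j \end{pmatrix} = c A \begin{pmatrix} x_j \\ y_j \end{pmatrix} + \begin{pmatrix} t_1 \\ t_2 \end{pmatrix},$$

where c > 0 is the scaling factor, $A = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$ is the rotation by an angle θ clockwise, and $(t_1, t_2)^T$ is the translation. There are four unknowns: c, θ, t1, and t2. We need to find them. Look at our goals for Landmarks 1 and 2. Set

$$\begin{pmatrix} -1/2 \\ 0 \end{pmatrix} = c A \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}, \qquad \begin{pmatrix} 1/2 \\ 0 \end{pmatrix} = c A \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} + \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}.$$

We will have four equations in four unknowns. The solution is given by

$$\frac{1}{c^2} = (x_1 - x_2)^2 + (y_1 - y_2)^2 = D_{12}^2, \qquad \cos\theta = \frac{x_2 - x_1}{D_{12}}, \qquad \sin\theta = \frac{y_2 - y_1}{D_{12}},$$

where $D_{12}$ is the distance between Landmark 1 and Landmark 2, and the translation moves the midpoint of Landmarks 1 and 2 to the origin: $(t_1, t_2)^T = -cA\,\left((x_1 + x_2)/2,\ (y_1 + y_2)/2\right)^T$. With this solution, the new landmarks are

$$u_j = \frac{(x_2 - x_1)(x_j - x_1) + (y_2 - y_1)(y_j - y_1)}{D_{12}^2} - \frac{1}{2}, \qquad v_j = \frac{(x_2 - x_1)(y_j - y_1) - (y_2 - y_1)(x_j - x_1)}{D_{12}^2}, \qquad j = 3, 4, \ldots, k.$$

Example with k = 3: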

(x1, y1) = (2, 1)

(x2, y2) = (1, 2)

(x3, y3) = (2, 2)

New landmarks: (−0.5, 0), (0.5, 0), (0.0, −0.5).

Aim: We have several shapes, each with k landmarks. For each shape, compute the Bookstein coordinates. Then we can calculate the mean shape by averaging the new coordinates (Bookstein mean shape).

Example: Let us look at the female gorillas. Only the first two are shown.

[Figure: the first two female gorilla shapes]

Do Bookstein.

[Figure: the female gorilla shapes after the Bookstein transformation]
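A Python sketch of the Bookstein registration, using the closed forms u_j = [(x2 − x1)(xj − x1) + (y2 − y1)(yj − y1)]/D12² − 1/2 and v_j = [(x2 − x1)(yj − y1) − (y2 − y1)(xj − x1)]/D12², where D12² is the squared distance between Landmarks 1 and 2, together with the Bookstein mean shape as a landmark-by-landmark average. The function names are hypothetical and the code is not the ‘shapes’ package’s implementation; the printed check reproduces the k = 3 example.

```python
def bookstein(X):
    """Bookstein coordinates of a planar configuration X = [(x1, y1), ..., (xk, yk)].

    Landmark 1 is sent to (-1/2, 0) and Landmark 2 to (1/2, 0).
    """
    (x1, y1), (x2, y2) = X[0], X[1]
    dx, dy = x2 - x1, y2 - y1
    d2 = dx * dx + dy * dy  # squared baseline length D12^2
    if d2 == 0:
        raise ValueError("Landmarks 1 and 2 coincide; Bookstein coordinates are undefined")
    return [((dx * (xj - x1) + dy * (yj - y1)) / d2 - 0.5,
             (dx * (yj - y1) - dy * (xj - x1)) / d2)
            for xj, yj in X]


def bookstein_mean(shapes):
    """Bookstein mean shape: average the Bookstein coordinates landmark by landmark."""
    coords = [bookstein(s) for s in shapes]
    n, k = len(coords), len(coords[0])
    return [(sum(c[j][0] for c in coords) / n, sum(c[j][1] for c in coords) / n)
            for j in range(k)]


# The k = 3 example: (2, 1), (1, 2), (2, 2).
print(bookstein([(2, 1), (1, 2), (2, 2)]))  # [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5)]
```

Since the coordinates are invariant to translation, rotation and scaling, a rescaled and shifted copy of a shape contributes exactly the same Bookstein coordinates to the mean.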

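5. Distributions on the pre-shape space

Pre-shape space

Consider a shape X (configuration matrix) in m dimensions described by k points. This means that X is a matrix of order k×m. Let us consider the Helmert matrix $H_F$ of order k×k. Its first row is $(1/\sqrt{k}, \ldots, 1/\sqrt{k})$, and for j = 1, …, k − 1 its (j + 1)th row is

$$\left( -\frac{1}{\sqrt{j(j+1)}}, \ldots, -\frac{1}{\sqrt{j(j+1)}}, \frac{j}{\sqrt{j(j+1)}}, 0, \ldots, 0 \right),$$

in which the first entry is repeated j times and the row ends with k − j − 1 zeros.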

Properties:

1. It is an orthogonal matrix, i.e., $H_F H_F^T = H_F^T H_F = I_k$, the identity matrix of order k×k.

2. With the exception of the first row, every row sum is zero.

Consider the sub-matrix H of order (k−1)×k obtained from $H_F$ by deleting its first row. H consists of rows 2 through k of $H_F$, so each row of H sums to zero and $HH^T = I_{k-1}$.

Definition: Let U = (u_ij) be a matrix of order p×q. Its norm is defined by $\|U\| = \sqrt{\sum_{i=1}^{p}\sum_{j=1}^{q} u_{ij}^2}$. It can be shown that

$$\|U\|^2 = \sum_{i=1}^{p}\sum_{j=1}^{q} u_{ij}^2 = \operatorname{tr}(U^T U) = \operatorname{tr}(U U^T),$$

where the trace of any square matrix is the sum of its diagonal elements.

Definition: The pre-shape of a configuration matrix X is defined by

$$Z = \frac{HX}{\|HX\|}.$$

The matrix Z is of order (k−1)×m. Let Z = (z_ij). Note that

$$\sum_{i=1}^{k-1}\sum_{j=1}^{m} z_{ij}^2 = \|Z\|^2 = \frac{\|HX\|^2}{\|HX\|^2} = 1.$$

This is not always true. Why? Definition: A shape X is coincident if all rows of X are identical, i.e., all k points in m-dimensional Euclidean space are the same. If the shape X is coincident, then HX = 0. Why? Consequently, ‖HX‖ = 0, and the pre-shape of X does not make sense.

Definition: A shape X is non-coincident if it is not coincident.

Properties of pre-shape

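1. Shift the shape X to some other spot in m-dimensional space. The new shape is of the form

$$X_1 = X + 1_k a^T = X + A,$$

where $a \in \mathbb{R}^m$ is the shift and $1_k$ is a column vector of k ones. Its pre-shape is exactly the same as that of X: since every row of H sums to zero, $HA = (H 1_k) a^T = 0$, so $HX_1 = H(X + A) = HX + HA = HX$ and

$$Z_1 = \frac{HX_1}{\|HX_1\|} = \frac{HX}{\|HX\|} = Z.$$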

2. Rescale the shape X by a factor β > 0. The pre-shape of βX is exactly the same as that of X, because $H(\beta X)/\|H(\beta X)\| = \beta HX/(\beta\|HX\|) = Z$.

Definition: The pre-shape space Σ is the collection of all pre-shapes of non-coincident shapes. Mathematically,

$$\Sigma = \left\{ Z = \frac{HX}{\|HX\|} : X \text{ non-coincident} \right\}.$$

One could have introduced a shape space as the collection of all shapes X. This space is wild and chaotic: a shape X and all its translations and rescalings prowl the space as distinct objects, and we cannot introduce distributions on such a wild entity. On the other hand, the pre-shape space is orderly. A shape and all its translations and rescalings are one and the same in the world of pre-shapes, so it is easy to introduce distributions on it. Further, the term ‘pre-shape’ signifies that we are one step away from shape: rotation still has to be removed.

Specialize to the case m = 2

We now focus on shapes in two dimensions. The pre-shape Z of a shape X is of dimensions (k−1)×2. Let us write it explicitly:

$$Z = \frac{HX}{\|HX\|} = \begin{pmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \\ \vdots & \vdots \\ z_{k-1,1} & z_{k-1,2} \end{pmatrix}, \qquad \text{with } \sum_{i=1}^{k-1} \left(z_{i1}^2 + z_{i2}^2\right) = 1.$$

Obtain the polar coordinates of each point (row) in Z.

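$$z_{11} = r_1\cos\theta_1 \text{ and } z_{12} = r_1\sin\theta_1,$$
$$z_{21} = r_2\cos\theta_2 \text{ and } z_{22} = r_2\sin\theta_2,$$
$$\vdots$$
$$z_{k-1,1} = r_{k-1}\cos\theta_{k-1} \text{ and } z_{k-1,2} = r_{k-1}\sin\theta_{k-1}.$$

Note that $r_1^2 + r_2^2 + \cdots + r_{k-1}^2 = 1$. Why? Let $s_i = r_i^2$, i = 1, 2, …, k − 1. Each angle $\theta_i \in [0, 2\pi)$. Let us summarize the polar information in the pre-shape Z:

$$P = (s_1, s_2, \ldots, s_{k-2}, \theta_1, \theta_2, \ldots, \theta_{k-1}).$$

(The last value $s_{k-1} = 1 - s_1 - \cdots - s_{k-2}$ is determined by the others, so it is omitted.)

Properties of P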

1. Each si > 0.

2. s1 + s2 + … + sk-2 ≤ 1.

3. Each θi ∈ [0, 2π).
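A Python sketch of the chain X → Z → P for planar shapes. It assumes one standard form of the Helmert matrix (first row 1/√k everywhere; for j = 1, …, k − 1, row j + 1 has j entries −1/√(j(j+1)), then j/√(j(j+1)), then zeros), uses only the standard library, and runs on Shape 1 of digit3.dat (landmarks copied from the R output earlier). Helper names are hypothetical.

```python
import math

SHAPE1 = [(9, -27), (12, -31), (17, -36), (26, -39), (34, -37), (36, -33),
          (38, -27), (35, -19), (30, -15), (21, -14), (21, -8), (16, -6), (8, -5)]


def helmert_full(k):
    """k x k Helmert matrix H_F under the convention stated in the lead-in."""
    rows = [[1 / math.sqrt(k)] * k]
    for j in range(1, k):
        d = math.sqrt(j * (j + 1))
        rows.append([-1 / d] * j + [j / d] + [0.0] * (k - j - 1))
    return rows


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def preshape(X):
    """Z = HX / ||HX||, where H is H_F without its first row."""
    HX = matmul(helmert_full(len(X))[1:], X)
    norm = math.sqrt(sum(v * v for row in HX for v in row))
    if norm == 0:
        raise ValueError("coincident configuration: the pre-shape is undefined")
    return [[v / norm for v in row] for row in HX]


def polar_vector(Z):
    """P = (s_1..s_{k-2}, theta_1..theta_{k-1}) for a planar pre-shape Z."""
    s = [z1 * z1 + z2 * z2 for z1, z2 in Z]
    theta = []
    for z1, z2 in Z:
        t = math.atan2(z2, z1) % (2 * math.pi)
        theta.append(0.0 if t >= 2 * math.pi else t)  # guard against rounding up to 2*pi
    return s[:-1], theta


Z = preshape(SHAPE1)
s, theta = polar_vector(Z)
print(len(Z), len(s), len(theta))                       # 12 11 12
print(round(sum(v * v for row in Z for v in row), 12))  # 1.0
```

Translating or rescaling SHAPE1 before calling `preshape` leaves Z unchanged up to rounding, which is a quick numerical check of the two pre-shape properties.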

Knowing a non-coincident shape X implies knowing its pre-shape Z, and knowing the pre-shape Z implies knowing its polar vector P. Conversely, given any P with the properties listed above, there is a pre-shape Z of some shape X. Take P = (s1, …, sk-2, θ1, …, θk-1) with the properties

1. Each si > 0.

2. s1 + s2 + … + sk-2 ≤ 1.

3. Each θi ∈ [0, 2π).

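Calculate

$$z_{11} = r_1\cos\theta_1 \text{ and } z_{12} = r_1\sin\theta_1,$$
$$z_{21} = r_2\cos\theta_2 \text{ and } z_{22} = r_2\sin\theta_2,$$
$$\vdots$$
$$z_{k-1,1} = r_{k-1}\cos\theta_{k-1} \text{ and } z_{k-1,2} = r_{k-1}\sin\theta_{k-1},$$

where $r_i = \sqrt{s_i}$ for i = 1, …, k − 2 and $r_{k-1} = \sqrt{1 - s_1 - \cdots - s_{k-2}}$. Define Z = (z_ij), a matrix of order (k−1)×2, and define $X = H^T Z$. Note that the order of the matrix X is k×2. The pre-shape of X is precisely Z: since $HH^T = I_{k-1}$, we have $HX = HH^T Z = Z$ and $\|HX\| = \|Z\| = 1$.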
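The reverse direction, P → Z → X = HᵀZ, as a Python sketch under an assumed Helmert convention (row j of H has j entries −1/√(j(j+1)), then j/√(j(j+1)), then zeros, i.e. rows 2 to k of H_F). The input P is made up; the check confirms that the pre-shape of the constructed X is Z and that X is centered. Helper names are hypothetical.

```python
import math


def helmert_sub(k):
    """H: rows 2..k of the k x k Helmert matrix (the first row, 1/sqrt(k) everywhere, is dropped)."""
    rows = []
    for j in range(1, k):
        d = math.sqrt(j * (j + 1))
        rows.append([-1 / d] * j + [j / d] + [0.0] * (k - j - 1))
    return rows


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def preshape(X):
    """Z = HX / ||HX|| for a non-coincident configuration X."""
    HX = matmul(helmert_sub(len(X)), X)
    norm = math.sqrt(sum(v * v for row in HX for v in row))
    return [[v / norm for v in row] for row in HX]


def from_polar(s, theta):
    """Map P = (s_1..s_{k-2}, theta_1..theta_{k-1}) to the pre-shape Z and X = H^T Z."""
    s_last = max(1.0 - sum(s), 0.0)          # s_{k-1} = 1 - s_1 - ... - s_{k-2}
    r = [math.sqrt(si) for si in s] + [math.sqrt(s_last)]
    Z = [[ri * math.cos(t), ri * math.sin(t)] for ri, t in zip(r, theta)]
    H = helmert_sub(len(Z) + 1)
    HT = [list(col) for col in zip(*H)]     # H^T, a k x (k-1) matrix
    return Z, matmul(HT, Z)


# Made-up polar vector for k = 4 landmarks.
Z, X = from_polar([0.2, 0.3], [0.1, 1.0, 2.0])
print(len(X), len(X[0]))  # 4 2
```

Because every row of H sums to zero, each column of X = HᵀZ also sums to zero, so the constructed configuration is automatically centered.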

Now the strategy is clear. If one wants to introduce a distribution on the pre-shape space Σ, it suffices to introduce a distribution on the polar space

$$\Omega = \{P = (s_1, \ldots, s_{k-2}, \theta_1, \ldots, \theta_{k-1}) : s_i > 0 \text{ for all } i;\ s_1 + \cdots + s_{k-2} \le 1;\ \theta_i \in [0, 2\pi) \text{ for all } i\}.$$

Uniform pre-shape distribution on Ω

The point (s1, s2, …, sk-2) lies in the (k−2)-dimensional simplex, a solid whose volume is 1/(k−2)! (a result from geometry). Put a uniform distribution on the simplex.

Put a uniform distribution for θ1 on the interval [0, 2π).

Put a uniform distribution for θ2 on the interval [0, 2π). …

Put a uniform distribution for θk-1 on the interval [0, 2π). String them together independently. This is the uniform pre-shape distribution. The uniform distribution on the simplex is a special case of the Dirichlet distribution.

Definition

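The random vector (S1, S2, … , Sk-2) with each Si ≥ 0 and S1 + S2 + … + Sk-2 ≤ 1 is said to have a Dirichlet distribution with parameters p1 > 0, p2 > 0, … , pk-2 > 0, pk-1 > 0 if its joint probability density function is

$$f(s_1, \ldots, s_{k-2}) = \frac{\Gamma(p_1 + \cdots + p_{k-1})}{\Gamma(p_1)\cdots\Gamma(p_{k-1})}\, s_1^{p_1 - 1} \cdots s_{k-2}^{p_{k-2} - 1} \left(1 - s_1 - \cdots - s_{k-2}\right)^{p_{k-1} - 1}$$

for s1 ≥ 0, … , sk-2 ≥ 0 and s1 + s2 + … + sk-2 ≤ 1.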

This is a multivariate version of the Beta distribution. If p1 = p2 = … = pk-1 = 1, the distribution is uniform on the simplex. This definition is introduced to point out that we can entertain other distributions on the simplex.
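Sampling from these distributions is straightforward. The Python sketch draws the simplex part with the standard gamma construction of the Dirichlet distribution (independent Gamma(p_i, 1) variables divided by their sum) and each angle uniformly on [0, 2π); with all p_i = 1 this is the uniform pre-shape distribution. The function names and the seed are arbitrary choices for illustration.

```python
import math
import random


def dirichlet(p, rng):
    """Draw (S_1, ..., S_{k-2}) from a Dirichlet(p_1, ..., p_{k-1}) distribution.

    G_i ~ Gamma(p_i, 1) independently and S_i = G_i / (G_1 + ... + G_{k-1});
    the last coordinate is dropped because it equals 1 - S_1 - ... - S_{k-2}.
    """
    g = [rng.gammavariate(pi, 1.0) for pi in p]
    total = sum(g)
    return [gi / total for gi in g[:-1]]


def sample_polar(k, rng, p=None):
    """Draw P = (s, theta) on Omega: Dirichlet s and independent uniform angles.

    p=None means p_1 = ... = p_{k-1} = 1, i.e. the uniform pre-shape distribution.
    """
    if p is None:
        p = [1.0] * (k - 1)
    s = dirichlet(p, rng)
    theta = [2 * math.pi * rng.random() for _ in range(k - 1)]  # uniform on [0, 2*pi)
    return s, theta


rng = random.Random(2015)
s, theta = sample_polar(8, rng)   # k = 8 landmarks, as for the gorilla skulls
print(len(s), len(theta))         # 6 7
```

Other choices of p give non-uniform Dirichlet distributions on the same simplex, and the uniform angles can be swapped for any circular distribution without changing the rest of the construction.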

General: Put a distribution on the simplex; put a distribution on each angle. String them together independently. We then have a distribution on the pre-shape space Ω. A Dirichlet distribution on the simplex is a natural choice. There is a plethora of distributions on the angle space [0, 2π) (circular distributions, for example the von Mises distribution).

Data: Landmark data on the shape of the skulls of 30 male gorillas and 30 female gorillas, with 8 landmarks each. Identify the differences. How?

Parametric approach
a. Put a Dirichlet distribution on the six-dimensional simplex. Estimate its parameters using the male gorillas’ data, and separately using the female gorillas’ data. Interpret the parameters.
b. Put a von Mises distribution on each angle space. Estimate the parameters of the distributions separately for males and females. Interpret the parameters.
c. Examine the differences.

6. Excursus

Challenges
a. Introducing distributions on the pre-shape space
b. Fitting distributions
c. Goodness-of-fit
d. Cluster analysis of shapes
e. Shape space in the context of Bookstein coordinates
f. Spline model approach to shape analysis
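For step b of the parametric approach, one possible starting point is the usual estimate of the von Mises parameters for a single angle: the mean direction μ̂ = atan2(S̄, C̄), and a concentration κ̂ that solves A1(κ) = R̄, where R̄ is the mean resultant length and A1(κ) = I1(κ)/I0(κ). The Python sketch replaces the exact solution with a commonly used piecewise approximation to the inverse of A1; that approximation, the function names and the test angles are assumptions for illustration, not part of the seminar.

```python
import math


def a1inv(r):
    """Approximate kappa solving I1(kappa)/I0(kappa) = r (common piecewise approximation)."""
    if r < 0.53:
        return 2 * r + r ** 3 + 5 * r ** 5 / 6
    if r < 0.85:
        return -0.4 + 1.39 * r + 0.43 / (1 - r)
    if r >= 1:
        return math.inf  # all angles identical: no dispersion
    return 1 / (r ** 3 - 4 * r ** 2 + 3 * r)


def fit_von_mises(angles):
    """Return (mu_hat, kappa_hat) for a list of angles in radians.

    mu_hat is the sample mean direction in [0, 2*pi); kappa_hat approximately
    solves A1(kappa) = R-bar, the likelihood equation for the concentration.
    """
    n = len(angles)
    c = sum(math.cos(t) for t in angles) / n
    s = sum(math.sin(t) for t in angles) / n
    mu = math.atan2(s, c) % (2 * math.pi)
    rbar = math.hypot(c, s)  # mean resultant length R-bar
    return mu, a1inv(rbar)


# Made-up angles clustered around 0, then angles spread evenly around the circle.
print(fit_von_mises([0.1, -0.1, 0.2, -0.2]))
print(fit_von_mises([0.0, math.pi / 2, math.pi, 3 * math.pi / 2]))
```

Tightly clustered angles give a large κ̂, while evenly spread angles give R̄ ≈ 0 and κ̂ ≈ 0, which is the uniform distribution on the circle. Applying this to each of the seven angles, separately for males and females, is one way to carry out step b.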