Geometric Median Shapes
Total Page:16
File Type:pdf, Size:1020Kb
GEOMETRIC MEDIAN SHAPES Alexandre Cunha Center for Advanced Methods in Biological Image Analysis Center for Data Driven Discovery California Institute of Technology, Pasadena, CA, USA ABSTRACT an efficient linear program in practice. They observed that smooth We present an algorithm to compute the geometric median of shapes do not necessarily lead to smooth medians, an aspect we also shapes which is based on the extension of median to high dimen- verified in some of our experiments. sions. The median finding problem is formulated as an optimization We propose a method to quickly build median shapes from over distances and it is solved directly using the watershed method closed planar contours representing silhouettes of shapes discretized as an optimizer. We show that the geometric median shape faithfully in images. Our median shape is computed using the notion of represents the true central tendency of the data, contaminated or not. geometric median [12], also known as the spatial median, Fermat- It is superior to the mean shape which can be negatively affected by Weber point, and L1 median (see [13] for the history and survey the presence of outliers. Our approach can be applied to manifold of multidimensional medians), which is an extension of the median and non manifold shapes, with single or multiple connected com- of numbers to points in higher dimensional spaces. The geometric ponents. The application of distance transform and watershed algo- median has the property, in any dimension, that it minimizes the n rithm, two well established constructs of image processing, lead to sum of its Euclidean distances to given anchor points xj 2 R , an algorithm that can be quickly implemented to generate fast solu- ∗ X x = arg min kx − xj k : (1) tions with linear storage requirement. We demonstrate our methods x in synthetic and natural shapes and compare median and mean re- j sults under increasing outlier contamination. It has a breakdown point of 0.5, i.e. the data can be corrupted with up Index Terms— Shape Analysis, Geometric Median, Median to 50% and it still gives a good prediction representing the sample Shape, Average Shape, Segmentation Fusion [14] (and [15] for the functional median). Unlike the mean (centroid) of points, which can be found directly using a analytical expression, 1. INTRODUCTION the geometric median can only be approximated by numerical algo- rithms. Due to convexity of eq.(1) the median point always exist and The computation of an average shape from a collection of similar it is unique provided the anchor points are not collinear [16]. The ge- shapes is an important tool when classifying, comparing, and search- ometric median lies within the convex hull of the given points (see ing for objects in a database or in an image. It is helpful, for example, e.g. [17]), a property we will exploit in our approach. when morphology is key in the identification of healthy and diseased Researchers have developed fast numerical solutions for this cells, when differentiating wild from mutated cells, in the construc- convex problem. Weiszfeld developed in 1937, at the age of 16 tion of anatomical atlases, and when combining different candidate years old, a gradient descent algorithm which is still quite com- segmentations for the same image. The average usually takes the petitive (see translated reprint [18], and [19] for a review). It was form of the mean or the median, the latter being preferred due to rediscovered many times until recognition on early 60’s. New de- its robustness to automatically filter out outliers. Shape analysis is velopments claim faster algorithms that can handle large data sets a vast field of research and many solutions have been proposed to [20, 21]. We do not rely on these methods as our median shape is determine geometrical and statistical mean shapes (see e.g. [1, 2]), generated by finding paths of minimum cost in the discrete image with applications ranging from cell biology to space exploration and domain. arXiv:1810.12445v3 [cs.CV] 15 Apr 2019 methods in the fields of elastic shape analysis [3, 4] and most re- The emphasis of our presentation is to introduce a novel al- cently in deep learning for shape analysis [5, 6, 7]. gorithm capable of generating median shapes in a fast manner us- Comparatively, very few have proposed methods for computing ing tools already familiar to the image processing community. We medians of shapes. Fletcher et al. [8] developed a method to com- will leave proofs for the mathematical models and formulations pre- pute the geometric median on Riemannian shape manifolds showing sented here for future work. robustness to outliers on the Kendall shape space. In [9] the authors formulate the median finding problem as a variational problem tak- 2. METHODS ing into account the area of the symmetric difference of shapes, a model investigated much earlier in [10]. More recently, the work in 2.1. A distance for planar shapes [11] represents shapes as integral currents and consider median find- ing using a variational formulation which the authors claim lead to We define here the expression to measure the distance between closed planar shapes represented by their contours. It is this metric Funding supporting this work was provided by the Beckman Insti- that we will be optimizing when computing the geometric median tute at Caltech, a research center endowed with funds from the Arnold and Mabel Beckman Foundation, to the Center for Advanced Methods of a set of shapes. We assume contours are rectifiable curves, but in Biological Image Analsysis. Correspondence should be addressed to do not restrict their topology to planar manifolds, crossings are al- [email protected]. lowed. Other popular metrics to measure distances between shapes, 2.2. Geometric median for shapes We formulate the problem of finding the geometric median of shapes as an optimization problem over distances, similar to the high dimen- sional geometric median of points: find the optimal closed contour Γ∗ that minimizes the sum of distances to a given set of closed con- n tours fΓigi=1, n ∗ X Γ = arg min d(Γ; Γi) (5) Γ i=1 Fig. 1: Illustration of planar shape distances. Arrows in the left figure where d(·; ·) is given by eq.(3) and the solution lies in the convex represent the shortest distances dΓ (x) and dΨ(y) from points x 2 Ψ and ∗ y 2 Γ, respectively, to closed contours Γ and Ψ. The figure on the right hull of fΓig, Γ ⊂ ch(fΓig). When each curve Γi degenerates to illustrates a case where the horizontal black curve Γ is equally distant to a point, with jΓij = 1, then the problem is reduced to finding the three different other curves, d(Γ; Γ1) = d(Γ; Γ2) = d(Γ; Γ3) = 63. The geometric median of points. From eq.(5) we then have symmetric distance d captures our two-way expectations: d (Γ; Γ ) = g g 1 n Z 145 > dg(Γ; Γ2) = 129 > dg(Γ; Γ3) = 116. Curves Γi only differ at ∗ X 1 Γ = arg min dΓi (x)ds ; x 2 Γ the bulgy part. Slanted arrows on the right figure are directed to the farthest Γ jΓj i=1 Γ points on Γi from Γ. To compute dg, curves were 8-connected discretized in n ! 429x295 images. 1 Z X = arg min dΓi (x) ds (6) Γ jΓj e.g. Hausdorff and Frechet´ distances, are not suitable for our median Γ i=1 formulation as they do not report distances along the entire length of n ! 1 Z X candidate curves. = arg min kx − yik ds; yi 2 Γi 2 2 Γ jΓj Γ Let dΓ : R ! R give the distance of a point x 2 R to a curve i=1 Γ ⊂ R2, where R and P commute due to Tonelli’s theorem for non-negative dΓ (x) = inf kx − yk > 0 (2) y2Γ functions [22] (indeed, dΓi (·) > 0) and yi is the point in Γi closest to x 2 Γ, as in Fig.1. It should be clear that the location of each y where the infimum is attained for one or more points y 2 Γ closest i depends on the position of x in Γ, thus y := y (x). to x, and the L norm k · k gives the Euclidean distance between i 2 Let’s for a moment ignore jΓj on eq.(6) and consider an approx- points (see Fig.1). Note that d (x) = 0 () x 2 Γ. We then define Γ imation of the unknown Γ by an ordered set of points (x ; : : : ; x ). the distance from a rectifiable curve Γ to another curve Ψ using a 1 m At the solution Γ∗ := (x∗; : : : ; x∗ ), the term in parentheses on the line integral over all points x := x(s) 2 Γ, 1 m last expression in eq.(6) is optimized for each xj which implies that ∗ n 1 Z any point in Γ is in fact the geometric median of points fyigi=1, d(Γ; Ψ) = dΨ (x)ds (3) each from a single Γi, jΓj Γ n ∗ X ∗ ∗ x = arg min kx − yik; yi 2 Γi; 8x 2 Γ (7) where dΨ (·) is given by eq(2), ds is the arc length differential, and x jΓj > 0 is the length of curve Γ. By integrating distances for all i=1 ∗ points along the curve and dividing by its length we have a measure Since we know x 2 ch(fyig) ⊂ ch(fΓig), see e.g. [17], we can ∗ that tells us how far, in average, is the source curve Γ to the target then show by contradiction that Γ ⊂ ch(fΓig), as stated above.