Group-Wise Point-Set Registration based on Rényi's Second Order Entropy

Luis G. Sanchez Giraldo (University of Miami, FL, USA); Erion Hasanbelliu, Murali Rao, Jose C. Principe (University of Florida, FL, USA)
[email protected]; [email protected], [email protected], [email protected]

Abstract

In this paper, we describe a set of robust algorithms for group-wise registration using both rigid and non-rigid transformations of multiple unlabelled point-sets, with no bias toward a given set. These methods mitigate the need to establish a correspondence among the point-sets by representing them as probability density functions, so that registration is treated as a multiple distribution alignment. Hölder's and Jensen's inequalities provide a notion of similarity/distance among point-sets, and Rényi's second order entropy yields a closed-form solution to the cost function and update equations. We also show that the methods can be improved by normalizing the entropy with a scale factor. These provide simple, fast and accurate algorithms to compute the spatial transformation function needed to register multiple point-sets. The algorithms are compared against two well-known methods for group-wise point-set registration. The results show an improvement in both accuracy and computational complexity.

1. Introduction

Point-set registration is a common problem in computer vision, medical imaging, and many other fields. Tasks such as face recognition, object tracking, or 3D object fusion all require registration of features/point-sets. Features representing an object's contour or other distinguishing characteristics are extracted from an image and compared against those of other objects or templates: first, a correspondence between the point-sets is established, and then the spatial transformation that aligns them is retrieved. A similarity measure is used to compare the point-sets, where the choice depends on the object features and the problem.

The research community has developed many techniques to solve point-set registration. The iterative closest point (ICP) algorithm [1] is the most popular method; it uses the nearest-neighbour relationship to assign a binary correspondence and then determines the transformation relating the point-sets. The method is very simple, but it exhibits local convergence due to a non-differentiable cost function. In addition, it suffers in the presence of outliers and is not suitable for non-rigid transformations. Robust point matching (RPM) [4] improves upon ICP by employing a global-to-local search and soft assignment for the correspondence. The method jointly determines the correspondence and the transformation between the two point-sets via deterministic annealing and soft-assignment. However, the method is not stable in the presence of outliers, and due to the soft-assignment approach, its complexity is high.

To eliminate the point correspondence requirement, a global approach is taken where the point-sets are represented as probability density functions (PDFs). Tsin and Kanade [21] were the first to propose such a method; they modelled the two point-sets as kernel density functions and evaluated their similarity and transformation updates using the kernel correlation between the two density estimates. Glaunes et al. [8] matched the two point-sets by representing them as weighted sums of Dirac delta functions, where Gaussian functions were used to 'soften' the Dirac deltas and diffeomorphic transformations were used to minimize the distance between the two distributions. Jian and Vemuri [12], [13] extended this approach by representing the densities as Gaussian mixture models (GMMs). They derive a closed-form expression to compute the L2 distance between the two Gaussian mixtures and the update method to align the two point-sets.

All these methods are limited to registering only a pair of point-sets. In certain applications, such as medical image registration, there is a need to simultaneously register a group of point-sets. None of these methods is directly extendible to group-wise alignment of multiple point-sets. In addition, these methods are all biased (unidirectional update): one point-set acts as the target and the other is transformed to align with it. In applications where group-wise registration is required, there may not be an actual template to match against (e.g. medical images). Therefore, there is a need to determine a middle, common orientation/position to align the point-sets. Estimating a meaningful average shape from a set of unlabelled shapes is a key challenge in deformable shape modelling.

Chui et al. [6] presented a joint clustering and matching algorithm that finds a mean shape from multiple shapes represented by unlabelled point-sets. Their process follows a similar approach to their previous work in [5], where explicit correspondence needs to be determined first. In addition, the method is not robust to outliers, so stability is not always guaranteed.

Wang et al. [22] proposed a method for group-wise registration where the point-sets are represented as density functions. Based on the same principles as the PDF registration methods above, their algorithm simultaneously registers the point-sets and determines the mean set without solving for correspondences or selecting any specific point-set as a reference. Their approach minimizes the Jensen-Shannon divergence among cumulative distribution functions (CDFs). They shifted from PDFs because CDFs are more immune to noise and are also well defined, since the CDF is an integral measure. However, the CDF estimation is computationally very expensive and there are no closed-form solutions to their updates.

Chen et al. [3] developed another group-wise registration method based on the Havrda-Charvát divergence for CDFs. Similar to [22], they use the cumulative residual entropy to represent the CDFs. Their method, CDF-HC, generalizes CDF-JS but is much simpler to implement and computationally more efficient.

In this paper, we introduce a set of methods for group-wise registration based on Rényi's quadratic entropy. The major improvement of our methods is that they provide a closed-form solution for the updates, which makes them much simpler and faster to compute than both CDF-based methods above, with no loss in accuracy. We compare our methods against CDF-JS and CDF-HC on various data sets.

2. Background

In [9] and [10], a density-based point-set registration method is introduced. The similarity between the two PDFs is measured using an information theoretic measure, the Cauchy-Schwarz (CS) divergence [15], derived from the Cauchy-Schwarz inequality [18]:

\int f(x) g(x)\, dx \le \sqrt{\int f^2(x)\, dx \int g^2(x)\, dx},   (1)

where, in the case of PDFs, equality holds if and only if f(x) = C g(x) with C = 1. The Cauchy-Schwarz divergence is then defined as:

\mathcal{D}_{CS}(f \| g) = -\log \frac{\left( \int f(x) g(x)\, dx \right)^2}{\int f^2(x)\, dx \int g^2(x)\, dx}.   (2)

The divergence directly compares PDF similarity and is expressed in terms of inner products of the two PDFs, which essentially estimate a normalized Rényi quadratic cross-entropy. Suppose we have two d-dimensional point-sets X_f = \{x_1^{(f)}, \ldots, x_{M_f}^{(f)}\} and X_g = \{x_1^{(g)}, \ldots, x_{M_g}^{(g)}\}. It is assumed that the x_i^{(f)} \in X_f are drawn from f and the x_j^{(g)} \in X_g are drawn from g. The CS divergence between the point-sets X_f and X_g is computed by plugging the Parzen density estimates \hat{f} and \hat{g} of the PDFs f and g into (2). Namely, the Parzen density estimate for f is given by [14]:

\hat{f}(x; X_f) = \frac{1}{M_f} \sum_{i=1}^{M_f} \kappa\left( \frac{x - x_i^{(f)}}{\sigma} \right),   (3)

where \kappa(\cdot) is a valid kernel (window) function and \sigma its bandwidth parameter. The estimate \hat{g} of g is obtained in a similar way. The Gaussian kernel

G_\sigma(x, x_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\|x - x_i\|^2}{2\sigma^2} \right)   (4)

is the kernel of choice for its properties: symmetry, positive definiteness, and exponential decay controlled via the kernel bandwidth. By using the Gaussian kernel, the plug-in estimator of the Cauchy-Schwarz divergence has a closed-form expression in terms of convolution operators, making it simple to estimate. An important term in (2) is the cross-information potential [15, 11], which corresponds to the inner product between f and g. Its empirical estimator has the following form:

CIP(X_f, X_g) = \int \hat{f}(x) \hat{g}(x)\, dx = \frac{1}{M_f M_g} \sum_{i=1}^{M_f} \sum_{j=1}^{M_g} G_{\sqrt{2}\sigma}\left( x_i^{(f)}, x_j^{(g)} \right).   (5)

This term is crucial in determining the similarity between the two point-sets. When data points are interpreted as particles, the information potential measures the interaction of the field created by one set of particles at the locations specified by the other set. The Cauchy-Schwarz information potential field exerts information forces on the samples of the second point-set, forcing them to move along a path that will provide the most similarity between the two PDFs.

The CS divergence is restricted to pairwise comparisons. To account for group-wise comparisons, we propose two extensions of the above idea that allow the comparison of more than two density functions simultaneously. The first proposed method uses Hölder's divergence as a direct extension of the Cauchy-Schwarz divergence. The second extension, which is the main focus of this work, is obtained from Jensen's inequality applied to the information potential.
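To make the estimators in (2)-(5) concrete, the following is a minimal NumPy sketch of the plug-in quantities; it is an illustration rather than the authors' implementation. The function names are ours, and the kernel uses the d-dimensional Gaussian normalization (eq. (4) writes the constant in its one-dimensional form).

```python
import numpy as np

def gaussian_gram(X, Y, sigma):
    """All pairwise Gaussian kernel evaluations G_sigma(x_i, y_j) for d-dim rows."""
    d = X.shape[1]
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2) ** (d / 2.0)

def cip(Xf, Xg, sigma):
    """Cross-information potential, eq. (5): mean kernel value at width sqrt(2)*sigma."""
    return gaussian_gram(Xf, Xg, np.sqrt(2.0) * sigma).mean()

def cs_divergence(Xf, Xg, sigma):
    """Plug-in Cauchy-Schwarz divergence, eq. (2), from Parzen estimates."""
    return -np.log(cip(Xf, Xg, sigma) ** 2 /
                   (cip(Xf, Xf, sigma) * cip(Xg, Xg, sigma)))
```

Note that `cip(X, X, sigma)` is the information potential of a single point-set, a fact used repeatedly in Sections 4 and 5.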

3. Group-wise registration based on Hölder's divergence

A generalized version of Hölder's inequality for N functions \{f_k\} is given by:

\left( \int \prod_{k=1}^{N} f_k(x)\, dx \right)^{N} \le \prod_{k=1}^{N} \int f_k^N(x)\, dx.   (6)

Based on (6), Hölder's divergence is thus defined as:

\mathcal{D}_H(\{f_k\}) = -\log \frac{\left( \int \prod_{k=1}^{N} f_k(x)\, dx \right)^{N}}{\prod_{k=1}^{N} \int f_k^N(x)\, dx},   (7)

which can be written as a difference of logarithms,

\mathcal{D}_H(\{f_k\}) = -N \log \int \prod_{k=1}^{N} f_k(x)\, dx + \sum_{k=1}^{N} \log \int f_k^N(x)\, dx.   (8)

Suppose we have N point-sets X_k, k \in 1, \ldots, N. Each X_k = \{x_1^{(k)}, \ldots, x_{M_k}^{(k)}\}, where x_i^{(k)} \in \mathbb{R}^d. Similar to (5), an empirical estimator of Hölder's divergence, with closed-form solution, can be obtained using the Parzen density approximation with a Gaussian kernel:

\hat{\mathcal{D}}_H(\{f_k\}) = -N \log \sum_{i_1=1}^{M_1} \cdots \sum_{i_N=1}^{M_N} \prod_{p=1}^{N-1} \prod_{q=p+1}^{N} G_{p,q} + \sum_{k=1}^{N} \log \sum_{i_{1,k}=1}^{M_k} \cdots \sum_{i_{N,k}=1}^{M_k} \prod_{p=1}^{N-1} \prod_{q=p+1}^{N} G_{k,k},   (9)

where \gamma = \sqrt{N}\sigma, G_{p,q} = G_\gamma(x_{i_p}^{(p)} - x_{i_q}^{(q)}), and G_{k,k} = G_\gamma(x_{i_{p,k}}^{(k)} - x_{i_{q,k}}^{(k)}). Although having a closed-form solution for Hölder's divergence estimator is appealing at first glance, it also reveals its complexity. A direct implementation of (9) would require O(M_1 \cdots M_N N^2 d) operations, which becomes prohibitively large as the number of shapes and data points increase. Clever manipulation of the terms in (9) can reduce the number of operations to O\left( (\max_i \{M_i\})^2 N^2 d \right). In the following, we propose an alternative objective function with O\left( (\max_i \{M_i\})^2 N^2 d \right) complexity.

4. Group-wise registration based on Rényi's second order entropy

To reach a closed-form but also computationally efficient solution, we propose an objective function based on Jensen's inequality. However, unlike [22], where a group-wise registration method was proposed using the Jensen-Shannon (JS) divergence, our method is based on the direct comparison of information potentials derived from Rényi's second order entropy.

4.1. Rényi's second order entropy and the information potential

Rényi's entropy is a generalization of Shannon's entropy [19] in which the logarithm operation lies outside the expectation operator. Rényi's entropy of a continuous random variable X taking values in \mathcal{X}, with probability density function f, is defined as:

H_\alpha(X) = \frac{1}{1 - \alpha} \log \int_{\mathcal{X}} f^\alpha(x)\, dx.   (10)

In the limit \alpha \to 1, (10) approaches Shannon's differential entropy [17]. For \alpha = 2, an empirical estimator \hat{H}_2 of H_2, which has a closed-form solution and is also differentiable, can be obtained using Parzen windows. Let X_p be an i.i.d. sample of size M drawn from f. The expression

\hat{H}_2(X_p) = -\log \frac{1}{M^2} \sum_{i=1}^{M} \sum_{j=1}^{M} G_{\sqrt{2}\sigma}(x_j - x_i)   (11)

is an estimator of Rényi's second order entropy. The quantity inside the logarithm in (11) is known as the information potential, which, as we will see below, allows the formulation of a well-behaved cost function for group-wise registration.
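In code, the estimator (11) is a one-line extension of the Gram-matrix helper from the Section 2 sketch; again, this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def renyi_h2(X, sigma):
    """Parzen plug-in estimator of Renyi's second order entropy, eq. (11).
    The argument of the logarithm is the information potential."""
    ip = gaussian_gram(X, X, np.sqrt(2.0) * sigma).mean()  # from the Section 2 sketch
    return -np.log(ip)
```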

4.2. Group-wise registration based on information potentials

Let X = \{X_1, \ldots, X_N\} be a collection of N shapes to be aligned. Each X_k is an array of M_k points representing a shape. Here, a shape is considered to be a set of i.i.d. samples drawn from a common random variable that have undergone an unknown transformation T_k. Our goal is to find a set of transformations S_k = T_k^{-1} that map the collection of shapes X to a collection \tilde{X} = \{\tilde{X}_k = T_k^{-1}(X_k)\} regarded as i.i.d. samples drawn from the same underlying random variable \tilde{X}. Since the underlying random variable that the shapes are assumed to be sampled from is unknown, it is necessary to impose a set of regularity conditions on the transformation mappings S_k and on the estimated matching distributions from each \tilde{X}_k.

The information potential IP(f) = \int f^2(x)\, dx is a convex function, since it corresponds to the squared L2 norm of a density function (here, we restrict ourselves to the space of probability density functions with bounded L2 norm).

From the convexity of the information potential, the following inequality holds:

IP\left( \sum_{k=1}^{N} \Pi_k \hat{f}_k \right) \le \sum_{k=1}^{N} \Pi_k\, IP(\hat{f}_k),   (12)

where \hat{f}_k is the Parzen density estimate for the k-th point-set, and

\Pi_k = \frac{M_k}{\sum_{k=1}^{N} M_k}.   (13)

This indicates that the weighted sum of the information potentials of each of the point-sets \tilde{X}_k is greater than the information potential of their union \tilde{X}, and equality is valid if and only if all the point-sets are the same. From this observation, we propose as a cost function the difference between the two terms in inequality (12). The proposed cost function, expressed in terms of the point-sets \tilde{X}_k, can be written as:

\mathcal{J} = \sum_{k=1}^{N} \Pi_k\, IP(\tilde{X}_k) - IP\left( \bigcup_{k=1}^{N} \tilde{X}_k \right).   (14)

Multi-shape alignment can be achieved by minimizing the above difference. As we will show below, transforming the samples X_k to \tilde{X}_k, so that their weighted average information potential equals the information potential of their union \tilde{X}, yields the desired alignment.
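The cost (14) is straightforward to evaluate; the following hedged sketch reuses `cip` from the Section 2 sketch. By Jensen's inequality (12), the returned value is non-negative and reaches zero only when all Parzen estimates coincide.

```python
import numpy as np

def groupwise_ip_cost(shapes, sigma):
    """Cost J of eq. (14): weighted per-shape information potentials minus the
    information potential of the pooled point-set (weights Pi_k of eq. (13))."""
    M = np.array([len(S) for S in shapes], dtype=float)
    Pi = M / M.sum()
    ip_k = np.array([cip(S, S, sigma) for S in shapes])  # IP = CIP(X, X), eq. (25)
    union = np.vstack(shapes)                            # pooled samples
    return float(Pi @ ip_k - cip(union, union, sigma))
```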

5. Point-set registration

As mentioned above, a collection of N point-sets X_k, assumed to be transformed versions of samples drawn from a common random variable \tilde{X}, can be aligned to a common shape by finding a set of transformations S_k such that the estimated distributions of S_k(X_k) minimize the objective function (14). Nevertheless, this is an ill-posed problem because the distribution of \tilde{X} is unknown. To select a particular solution, we impose constraints on the set of estimated transformations as well as on the target distribution. For each X_k, the desired transformation can be broken into two parts: 1) an affine transformation, and 2) a non-rigid transformation. The affine transformation has a global effect on the shape and is mainly composed of linear transformations such as rotation, scaling, shear, and a translation. The non-rigid transformation accounts for local deformations that cannot be expressed by the affine transformation. In addition, we assume that the function is smooth in the sense that two similar inputs correspond to two similar outputs. This assumption is crucial when dealing with the non-rigid part of the transformation. Smoothness of the solution is enforced by introducing a regularization term into our problem.

5.1. Affine transformation

Let X_k denote a matrix of size M_k \times d, where each row vector is a point in \mathbb{R}^d. An affine transformation is obtained by a linear transformation A^T followed by the addition of a vector t \in \mathbb{R}^d. For a point x \in \mathbb{R}^d, the affine transformation S_{aff} is expressed in terms of A and t as follows:

S_{aff}(x \mid A, t) = A^T x + t.   (15)

A more compact form of (15) can be obtained by considering homogeneous coordinates, where the original vector points are extended to (d+1) dimensions with the last dimension equal to 1:

S_{aff}(x \mid A, t) = \left[ A^T \mid t \right] \begin{bmatrix} x \\ 1 \end{bmatrix}.   (16)

From here on, it is assumed that the points x_i^{(k)} have been extended to deal with translations implicitly, as shown in (16).

5.2. Non-rigid transformation

Non-rigid transformations model local deformations, which is a non-linear operation on the coordinate points of the shape. In this work, we employ radial basis function (RBF) expansions to compute the non-rigid transformation. In the RBF expansion, the transformation is defined as a linear combination of M basis functions \phi_i(x):

S_{nr}(x \mid W, \phi) = \sum_{i=1}^{M} w_i \phi_i(x) = \sum_{i=1}^{M} w_i \phi(\|x - x_i\|).   (17)

Each of the basis elements \phi_i is a nonlinear function of the Euclidean distance between the point x, at which the function is evaluated, and the corresponding centre point x_i. The most common RBFs used for non-rigid transformations are thin plate splines (TPS) [2] and Gaussian RBFs. The key advantage of both of these functions is that their approximation errors approach zero asymptotically. However, the non-parametric nature of the RBF expansion, where all data points are also control points, can have a significant impact on the computational complexity if not handled properly. Although this is an important problem in its own right, it is out of the scope of this paper and will not be addressed any further.

The transformation S_k is the combination of the affine and non-rigid components, as follows:

S_k(x \mid A_k, t_k, W_k, \phi_k) = S_{aff}(x \mid A_k, t_k) + S_{nr}(x \mid W_k, \phi_k) = \left[ A_k^T \mid t_k \right] \begin{bmatrix} x \\ 1 \end{bmatrix} + W_k^T \phi_k(x).   (18)

The mapping \phi_k is defined by the set of basis functions centred at the points of X_k, which yields W_k of size M_k \times d.
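The composite map (18) is easy to state in matrix form. The sketch below uses a Gaussian RBF for \phi (TPS is the other choice named above); the names and the bandwidth parameter `beta` (the RBF width mentioned in Section 6) are our own choices for illustration.

```python
import numpy as np

def gaussian_rbf(X, centers, beta):
    """phi(||x - c||) = exp(-||x - c||^2 / (2 beta^2)), one basis per centre."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * beta ** 2))

def transform(X, A, W, beta):
    """S_k of eqs. (18)-(20): affine part on homogeneous coordinates plus the
    RBF expansion centred at the data points themselves.
    X is M x d, A is (d+1) x d, W is M x d."""
    Xh = np.hstack([X, np.ones((len(X), 1))])  # homogeneous coordinates, eq. (16)
    Phi = gaussian_rbf(X, X, beta)             # (Phi)_ij = phi(||x_i - x_j||)
    return Xh @ A + Phi @ W                    # eq. (20): [X | 1] A + Phi W
```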

To simplify notation, we will denote the affine mapping by A_k; it should be understood that x is the augmented vector and that t_k is contained in A_k. Equation (18) is then written as:

S_k(x \mid A_k, W_k, \phi_k) = A_k^T x + W_k^T \phi_k(x).   (19)

For each shape X_k, the transformed shape \tilde{X}_k can be compactly written in matrix form,

\tilde{X}_k = [X_k \mid \mathbf{1}] A_k + \Phi_k W_k,   (20)

where \Phi_k is the matrix of all pairwise evaluations of \phi, that is, (\Phi_k)_{ij} = \phi(\|x_i^{(k)} - x_j^{(k)}\|). However, since the columns of the augmented matrix [X_k \mid \mathbf{1}] could also belong to the span of the columns of \Phi_k, part of the affine transformation could be implicitly contained in the second term on the r.h.s. of (20). It is possible to decouple the affine and non-rigid components by projecting \Phi_k onto the kernel of the span of the columns of X_k. The projection can be obtained from the QR factorization of [X_k \mid \mathbf{1}]. Let Q_k R_k = [X_k \mid \mathbf{1}], where Q_k = [Q_{X_k} \mid Q_k^{\perp}]. A decoupled version of (20) can be obtained by replacing \Phi_k with

\bar{\Phi}_k = Q_k^{\perp} Q_k^{\perp T} \Phi_k Q_k^{\perp} Q_k^{\perp T}.   (21)

Out-of-sample extensions of the solution are given by:

S_k(x) = A_k^T x + W_k^T \bar{\Phi}_k \Phi_k^{\dagger} \phi_k(x),   (22)

where \Phi_k^{\dagger} denotes the Moore-Penrose pseudo-inverse of \Phi_k. From this point on, to ease notation, \Phi_k will be understood as the decoupled version obtained in (21), and X_k as the augmented matrix that accounts for the shift parameter in the affine transformation.

5.3. Regularization

Incorporating a stabilizer into the solution is an important part of shape registration because it enforces smoothness on the transformation function. Since we are dealing with a finite set of points, there can be an infinite number of transformations that match the corresponding points but have very different behaviour on unseen portions of the shape. Choosing the smoothest solution through a regularization term provides a unique, tractable solution.

Regularization theory originates from the work of Tikhonov [20], where the existing optimization problem is augmented with a regularization term. In our case, we have:

\underset{\{S_k\}}{\text{minimize}} \;\; \mathcal{J}(\{S_k\}) + \lambda\, \Omega(\{S_k\}),   (23)

where \Omega(\{S_k\}) is the regularization term that constrains the smoothness of the set of functions \{S_k\}, and \lambda is the free parameter that trades off the alignment objective \mathcal{J}(\{S_k\}) against the smoothness of the solution. For the non-rigid transformation, the regularization term is given by the pseudo-norm

\Omega(\{S_k\}) = \sum_{k} \operatorname{tr}\left( W_k^T \Phi_k W_k \right).   (24)

5.4. Solving for transformation matrices A and W

The cost function \mathcal{J} can be expressed in terms of the cross-information potential (5). First, we have that for each transformed shape \tilde{X}_k with M_k points,

IP(\tilde{X}_k) = CIP(\tilde{X}_k, \tilde{X}_k).   (25)

The information potential of the collection of transformed shapes \{\tilde{X}_k\} is given by:

IP\left( \bigcup_{k=1}^{N} \tilde{X}_k \right) = \frac{1}{\left( \sum_{k=1}^{N} M_k \right)^2} \sum_{k,\ell=1}^{N} M_k M_\ell\, CIP(\tilde{X}_k, \tilde{X}_\ell).   (26)

The cross-information potential can be expressed in compact form using Gram matrices. For a given kernel, in particular the Gaussian kernel of fixed width, let the matrix G(\tilde{X}_k, \tilde{X}_\ell) contain all pairwise evaluations of points in \tilde{X}_k and \tilde{X}_\ell. For short, we will denote G(\tilde{X}_k, \tilde{X}_\ell) by G_{k\ell} and its entries by

G_{ij}^{(k\ell)} = G_{\sqrt{2}\sigma}\left( \tilde{x}_i^{(k)}, \tilde{x}_j^{(\ell)} \right).   (27)

Then, the cross-information potential can be written as:

CIP(\tilde{X}_k, \tilde{X}_\ell) = \mathbf{1}_{M_k}^T G_{k\ell} \mathbf{1}_{M_\ell},   (28)

where \mathbf{1}_M denotes a vector of length M with equal entries 1/M. The partial derivatives of the cost function can be easily computed from the partial derivatives of the cross-information potential, shown below:

\frac{\partial\, CIP(\tilde{X}_k, \tilde{X}_\ell)}{\partial A_k} \propto X_k^T G_{k\ell} \tilde{X}_\ell - X_k^T \operatorname{dg}\left( M_\ell\, G_{k\ell} \mathbf{1}_{M_\ell} \right) \tilde{X}_k,
\frac{\partial\, CIP(\tilde{X}_k, \tilde{X}_\ell)}{\partial W_k} \propto \Phi_k^T G_{k\ell} \tilde{X}_\ell - \Phi_k^T \operatorname{dg}\left( M_\ell\, G_{k\ell} \mathbf{1}_{M_\ell} \right) \tilde{X}_k,   (29)

where \operatorname{dg}(v) denotes a diagonal matrix with the values of v on its main diagonal.

5.5. Normalized information potential

Like differential entropy, the information potential is not scale invariant, so using the cost function as presented above to align the point-sets can end up at a saddle point on the performance surface. A global minimum could be reached by simply collapsing all the point-sets into a single point. To ensure that such a collapse does not occur, we introduce a normalized form of the cost function that confines the solution space. Here, we rely on the fact that the entropy of a random variable depends on its standard deviation. To prevent unwanted, scaled-down solutions, we modify the cost function in (14) by dividing the information potentials by the square root of the total variance, which corresponds to the trace of the covariance matrix. The cost function becomes the following difference of normalized information potentials:

\mathcal{J} = \sum_{k=1}^{N} \Pi_k \frac{IP(\tilde{X}_k)}{\sqrt{\operatorname{tr}(\tilde{C}_k)}} - \frac{IP\left( \bigcup_k \tilde{X}_k \right)}{\sqrt{\operatorname{tr}(\tilde{C}_\cup)}},   (30)

where the \tilde{C}_k are the covariance matrices of each transformed shape \tilde{X}_k, and \tilde{C}_\cup is the covariance of the union of all transformed shapes.
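The following sketch puts eqs. (21), (26)-(28), and (30) together: the QR-based decoupling of the basis matrix and the normalized cost evaluated through Gram matrices. It reuses `cip` from the Section 2 sketch; the names are illustrative, not the authors' code.

```python
import numpy as np

def decoupled_basis(Xh, Phi):
    """Eq. (21): project Phi onto the orthogonal complement of span([X | 1]),
    so that the non-rigid term cannot reproduce any affine motion."""
    Q, _ = np.linalg.qr(Xh, mode='complete')
    Qp = Q[:, Xh.shape[1]:]          # Q_perp: columns orthogonal to span([X | 1])
    P = Qp @ Qp.T
    return P @ Phi @ P

def normalized_ip_cost(shapes, sigma):
    """Eq. (30): the cost (14) with each information potential divided by the
    square root of the total variance (trace of the covariance matrix)."""
    M = np.array([len(S) for S in shapes], dtype=float)
    Pi = M / M.sum()
    ip_k = np.array([cip(S, S, sigma) for S in shapes])
    tr_k = np.array([np.trace(np.cov(S.T)) for S in shapes])
    union = np.vstack(shapes)        # cip(union, union) realizes eq. (26)
    return float((Pi * ip_k / np.sqrt(tr_k)).sum()
                 - cip(union, union, sigma) / np.sqrt(np.trace(np.cov(union.T))))
```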

6. Experimental results

To illustrate the group-wise registration capabilities of the proposed method, Fig. 1 shows an example of multiple shapes of the same object rotated, translated, and scaled to different sizes. Fig. 1(a) shows the initial position of the objects. The rest of the subfigures show steps of the alignment process. Note that there are no jumps during the registration process; this is not the case for the two CDF-based methods we compare against. The resulting alignment is an average of the rotation and scaling of all the individual shapes. It does not coincide with any of the original shapes, but it reflects a compromise in orientation and size of the different point-sets.

6.1. Group-wise registration for atlas construction

The following example shows group-wise registration using both affine and non-rigid transformations on a dataset borrowed from [3]. The dataset contains points extracted from the outer contours of the corpus callosum (CC) of seven subjects. In this experiment, we demonstrate the ability of our algorithm to perform unbiased 2D atlas construction. In addition to the information potential and normalized information potential differences, we also provide results based on Hölder's divergence. Fig. 2 shows the registration results for the two algorithms and also for CDF-HC [3]. In our experiments, we replaced the optimization routine used for CDF-HC with the minimize routine from [16], which uses conjugate gradients and approximate line searches based on polynomial interpolation with Wolfe-Powell conditions. The first seven images, 2(a)-2(g), show the deformation of each point-set to the atlas generated by each method. The colour scheme in these images is as follows: the initial point-set is denoted by blue '+'; the deformed point-sets are denoted by circles in green, black, red, and magenta, corresponding to the CDF-HC, Hölder's, IP, and normalized IP algorithms, respectively. Image 2(h) shows the superimposed point-sets before registration. Images 2(i)-2(l) show the superimposed point-sets after registration for each method. Notice that CDF-HC follows point-set 7, Figure 2(g), more closely, and its final registration is smaller than the rest of the point-sets. Hölder's final registration aligns closely with the initial positions of each individual point-set and is quite similar to the unconstrained IP solution; however, both get scaled down along the y-axis. This is due to the scaling problem previously discussed. Image 2(l) shows the results of the normalized IP with the constraint term in (30), demonstrating the importance of the constraint when minimizing the cost. The results show that our methods not only provide a final registration that resembles the average shape more closely, but also provide a better fit of the final point-sets.

To compare the registration capabilities of the methods against outlier noise, we add a few outlier points to one of the point-sets, namely point-set 7. Fig. 3(a) shows the point-sets before registration plus the outlier samples, which are shown as dark stars. Sub-figures 3(b), 3(c), and 3(e) show the final registration for CDF-HC, Hölder's, and the normalized information potential, respectively. In addition to the seven point-sets after registration, each sub-figure also contains point-set 7 from the registration without the outlier samples, to show the deviations that may have occurred in the final registration due to the outliers. The final registration shows that our method is more robust to outliers than CDF-HC. When compared against point-set 7 of the registration without outliers, our method shows very little deviation, whereas CDF-HC shows a more pronounced convergence of the point-sets toward the outliers.

We point out that even though our methods perform very well, we have two free parameters when TPS are used: the kernel size σ required to estimate the information potential, and the regularization parameter λ that controls the 'smoothness' of the function. In addition, when Gaussian RBFs are employed, a kernel bandwidth β that determines the locality of influence of the control points must be defined as well. Determining these parameters and establishing an optimal annealing rate are problem dependent. In our examples above using the corpus callosum data, we use TPS as basis functions, λ = 0.1 for the information potential and normalized information potential algorithms, and λ = 0.01 for Hölder's algorithm. For all three proposed algorithms, we initialize σ = 0.1 and apply a 98% annealing rate. To observe the effect of different regularization parameter values, Fig. 5 shows four different values of λ, with σ initialized to 0.2 and a 98% annealing rate.

Figure 5 depicts the behaviour of the alignment when λ is changed by one order of magnitude. It can be seen how lower values of λ allow the points to move freely and provide an almost perfect overlap. Nevertheless, applying the transformation learned from the noisy subsampled shapes to the original shapes exposes the overfitting that can result when the non-rigid transformation is not properly regularized.
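Schematically, the optimization follows the deterministic-annealing loop below, with the settings reported for the CC data (σ initialized at 0.1, 98% annealing rate, 300 epochs; see also Table 2). `gradient_step` is a hypothetical placeholder for one update of each (A_k, W_k) using the derivatives in (29); it is not part of the paper.

```python
def register(shapes, params, gradient_step, sigma0=0.1, rate=0.98, epochs=300):
    """Schematic annealing loop; gradient_step is a hypothetical callable that
    applies one gradient update of every (A_k, W_k) following eq. (29)."""
    sigma = sigma0
    for _ in range(epochs):
        params = gradient_step(shapes, params, sigma)
        sigma *= rate                # anneal the kernel size by 2% per epoch
    return params
```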

[Figure 1 (plots omitted): Example of multiple shapes aligned using the information potential cost function. Panels: (a) initial position, (b) step 5, (c) step 20, (d) step 50, (e) step 80, (f) final alignment.]

[Figure 2 (plots omitted): Example of unbiased group-wise non-rigid registration on real CC data sets; performance comparison of the CDF-HC, Hölder's, IP, and normalized IP methods. Panels: (a)-(g) point-sets 1-7, (h) before registration, (i) CDF-HC, (j) Hölder's, (k) IP difference, (l) normalized IP difference.]

[Figure 3 (plots omitted): Example of unbiased group-wise non-rigid registration under outlier noise, comparing CDF-HC, Hölder's, IP difference, and normalized IP difference. Panels: (a) before registration, (b) CDF-HC, (c) Hölder's, (d) IP difference, (e) normalized IP difference.]

6.2. Group-wise registration of biased datasets

To demonstrate accuracy and robustness against noise, we borrowed two datasets from [3]: the olympic logo and the noisy fish. These datasets were synthetically generated to be biased: the first point-set is the original set, and the other six were generated by passing the point-set through various non-rigid transformations using thin-plate splines. For the second dataset, the fish, ten randomly generated jitter points were also inserted into each point-set in addition to the transformation. Notice that this is a different kind of perturbation from the one employed in the regularization experiments (Fig. 5), where the point locations were altered by adding Gaussian noise. The initial point-set positions and the final registration results for CDF-HC, Hölder's, IP, and normalized IP are shown in Fig. 4. For the olympic and fish data, λ = 0.01 and σ = 0.5 with a 99% annealing rate for all three algorithms.

To analytically measure the accuracy of the registration, i.e. the similarity between the final point-sets, we compute the Kolmogorov-Smirnov (KS) statistic [7] between the ground truth point-set (the first set) and the final registered point-sets. The average KS-statistic results for the corpus callosum (CC7), CC7 with outliers (CC7(+out)), CC7 with outliers registered but with only the original points considered for the KS statistic (CC7(-out)), the olympic logo, and the noisy fish datasets are shown in Table 1. We also include the KS-statistic results of CDF-JS [22] for the olympic and fish datasets, as listed in [3]. The results show that our methods perform better than CDF-JS and CDF-HC. It is important to highlight the difference in the KS statistic on the CC7-with-outliers dataset when the outliers are removed from the set: when the outliers are removed after registration to compute the KS statistic, the normalized IP algorithm not only exhibits the best performance but also the largest decrease in KS statistic, in agreement with the visual inspection of Figure 3.

The normalized information potential algorithm performs well for any transformation type and noise level. In addition, in comparison with CDF-HC, the information potential algorithm is faster to compute. Table 2 shows the average computation time of CDF-HC, Hölder's, information potential, and normalized information potential on three different sets of data, where all methods are set to run for 300 epochs. The results show that the proposed methods, Hölder's, information potential, and normalized information potential, are computationally less demanding; they run much faster than CDF-HC and CDF-JS.
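For reference, a simple way to compute the KS comparison described above is sketched below. The paper does not spell out its multivariate treatment, so applying the two-sample KS statistic per coordinate and averaging is an assumption made here for illustration only.

```python
import numpy as np
from scipy.stats import ks_2samp

def avg_ks_statistic(registered, reference):
    """Average per-coordinate two-sample KS statistic between a registered
    point-set and the reference (ground truth) set. The per-dimension
    averaging is our assumption, not the paper's stated procedure."""
    d = registered.shape[1]
    stats = [ks_2samp(registered[:, j], reference[:, j]).statistic
             for j in range(d)]
    return float(np.mean(stats))
```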

[Figure 4 (plots omitted): Example of biased group-wise non-rigid registration on the olympic logo (top row) and noisy fish (bottom row) datasets, comparing Hölder's, information potential, normalized information potential, and CDF-HC. Panels: (a)/(f) initial, (b)/(g) CDF-HC, (c)/(h) Hölder's, (d)/(i) IP, (e)/(j) normalized IP.]

Table 1. KS statistic

            CDF-JS   CDF-HC   Holder   IP       Norm-IP
CC7         N/A      0.0661   0.0555   0.0503   0.0317
CC7(+out)   N/A      0.0577   0.0635   0.0539   0.0405
CC7(-out)   N/A      0.0556   0.0556   0.0529   0.0317
olympic     0.1103   0.0295   0.0206   0.0177   0.0177
fish(out)   0.1314   0.0462   0.0387   0.0383   0.0383

Table 2. Run time

                CDF-HC   Holder   IP     Norm-IP
CC7             252s     28s      30s    31s
CC7(outliers)   270s     29s      30s    31s
olympic         746s     74s      57s    60s
fish(outliers)  1946s    131s     111s   117s

[Figure 5 (plots omitted): Registration results for different regularization levels λ. The first row displays the resulting registration for λ = 0.1 using (a) the original set of shapes, (b) a sub-sampled set of points corrupted with zero-mean, 0.05-standard-deviation i.i.d. Gaussian noise, and (c) the original set of shapes transformed using the parameters learned from the sub-sampled noisy shapes. Similarly, row 2 (d)-(f) corresponds to λ = 0.01, and rows 3 (g)-(i) and 4 (j)-(l) to λ = 0.001 and λ = 0.0001, respectively.]

7. Conclusions

In this paper, we presented a robust algorithm to simultaneously register multiple unlabelled point-sets represented as density functions. We used the argument of the logarithm in Rényi's second order entropy and Jensen's inequality to derive a cost function with a closed-form solution that allows gradient-based parameter updates. We observed that alignment based on the proposed cost function focuses on high-density regions, making it robust to the presence of outliers. Furthermore, the proposed normalized cost function avoids the need to impose constraints on the rigid transformation, which can be difficult to set, and the regularized non-rigid transformation can handle cases where point-sets are noisy. These properties arise from the averaging behaviour of the group-wise registration, where no point-set is employed as a reference.

References

[1] Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239-256 (1992)
[2] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11(6), 567-585 (1989)
[3] Chen, T., Vemuri, B.C., Rangarajan, A., Eisenschenk, S.J.: Group-wise point-set registration using a novel CDF-based Havrda-Charvát divergence. Int. J. Comput. Vision 86(1), 111-124 (2010)
[4] Chui, H., Rangarajan, A.: A new algorithm for non-rigid point matching. In: Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, vol. 2, pp. 44-51 (2000)
[5] Chui, H., Rangarajan, A.: A new point matching algorithm for non-rigid registration. Comput. Vis. Image Underst. 89(2-3), 114-141 (2003)
[6] Chui, H., Rangarajan, A., Zhang, J., Leonard, C.: Unsupervised learning of an atlas from unlabeled point-sets. IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 160-172 (2004)
[7] Feller, W.: On the Kolmogorov-Smirnov limit theorems for empirical distributions. The Annals of Mathematical Statistics 19(2), 177-189 (1948)
[8] Glaunes, J., Trouvé, A., Younes, L.: Diffeomorphic matching of distributions: A new approach for unlabelled point-sets and sub-manifolds matching. In: Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 2 (2004)
[9] Hasanbelliu, E., Sanchez Giraldo, L., Principe, J.: A robust point matching algorithm for non-rigid registration using the Cauchy-Schwarz divergence. In: Machine Learning for Signal Processing (MLSP), 2011 IEEE International Workshop on (2011)
[10] Hasanbelliu, E., Sanchez Giraldo, L., Principe, J.: Information theoretic shape matching. IEEE Trans. Pattern Anal. Mach. Intell. 36(12), 2436-2451 (2014)
[11] Jenssen, R., Principe, J.C., Erdogmus, D., Eltoft, T.: The Cauchy-Schwarz divergence and Parzen windowing: Connections to graph theory and Mercer kernels. Journal of the Franklin Institute 343(6), 614-629 (2006)
[12] Jian, B., Vemuri, B.: A robust algorithm for point set registration using mixture of Gaussians. In: Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, vol. 2, pp. 1246-1251 (2005)
[13] Jian, B., Vemuri, B.C.: Robust point set registration using Gaussian mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1633-1645 (2011)
[14] Parzen, E.: On estimation of a probability density function and mode. The Annals of Mathematical Statistics 33(3), 1065-1076 (1962)
[15] Principe, J.C.: Information Theoretic Learning: Rényi's Entropy and Kernel Perspectives. Springer, New York, NY (2010)
[16] Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Massachusetts (2006)
[17] Rényi, A.: Selected Papers of Alfréd Rényi. Akadémiai Kiadó, Budapest (1976)
[18] Rudin, W.: Principles of Mathematical Analysis. McGraw-Hill, New York (1976)
[19] Shannon, C., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, Urbana (1949)
[20] Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-Posed Problems. Winston and Sons, Washington (1977)
[21] Tsin, Y., Kanade, T.: A correlation-based approach to robust point set registration. In: Computer Vision, 2004. ECCV 2004. Eighth European Conference on, vol. 3023, pp. 558-569 (2004)
[22] Wang, F., Vemuri, B., Rangarajan, A.: Groupwise point pattern registration using a novel CDF-based Jensen-Shannon divergence. In: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on (2006)