An Algebraic Approach to Affine Registration of Point Sets

Jeffrey Ho, Department of CISE, University of Florida, Gainesville, FL, USA
Adrian Peter, Northrop Grumman, Melbourne, FL, USA
Anand Rangarajan, Department of CISE, University of Florida, Gainesville, FL, USA
Ming-Hsuan Yang, EECS, University of California, Merced, CA, USA

Abstract

This paper proposes a new affine registration algorithm for matching two point sets in IR^2 or IR^3. The input point sets are represented as probability density functions, using either Gaussian mixture models or discrete density models, and the problem of registering the point sets is treated as aligning the two distributions. Since polynomials transform as symmetric tensors under an affine transformation, the distributions' moments, which are the expected values of polynomials, also transform accordingly. Therefore, instead of solving the harder problem of aligning the two distributions directly, we solve the softer problem of matching the distributions' moments. By formulating a least-squares problem for matching moments of the two distributions up to degree three, the resulting cost function is a polynomial that can be efficiently optimized using techniques originating from algebraic geometry: the global minimum of this polynomial can be determined by solving a system of polynomial equations. The algorithm is robust in the presence of noise and outliers, and we validate the proposed algorithm on a variety of point sets with varying degrees of deformation and noise.

1. Introduction

This paper proposes a new affine registration algorithm for two point sets in IR^2 and IR^3 based on solving a system of polynomial equations. Let S = {s_1, ..., s_m} and T = {t_1, ..., t_n} denote the two point sets for which we seek an affine transform A that best matches them. As in several recent point registration algorithms (e.g., [20, 11]), the point sets S, T are assumed to be data points sampled independently according to some probability density functions (PDFs) P_S, P_T, respectively. The main idea of this paper is to solve the registration problem by matching the corresponding moments of the two distributions. Since the moments are the expected values of polynomials, the matching cost function will also be a (multivariate) polynomial, for which there now exist efficient and powerful optimization techniques for computing the global minimum (e.g., [5, 14]).

More specifically, any affine transform A in IR^k induces a linear transform A* on the (linear) space of multivariate polynomials on IR^k by the familiar rule: if p is a polynomial, its transformation A*(p) under A* is the polynomial whose value at a point x in IR^k is the value of p at the point A(x),

    A*(p)(x) = p(A(x)).

If the two density functions are related exactly by an affine transform A (for example, in IR^2 this means that the two random variables x, y are subject to the affine transform x → ax + by + e, y → cx + dy + f), we have the relation

    |det(A)| P_T(A(x)) = P_S(x),                                        (1)

where |det(A)| is the absolute value of the determinant of A, which is also the determinant of the Jacobian matrix associated to the transform A. Note that the presence of the determinant is required to ensure that the integral of the density function P_T over IR^k is indeed one. For example, a zero-mean, unit-variance normal distribution is transformed under a linear transform A into a zero-mean normal distribution with covariance AA^T, and the determinant |det(A)| in Equation 1 comes from the normalization constant of the transformed density function [1].

It then follows directly from the equation above that the expected values of a polynomial f and of its transformation A*(f), taken with respect to their corresponding distributions, are the same:

    ∫_{IR^k} A*(f) P_S dx = ∫_{IR^k} f(A(x)) P_T(A(x)) |det(A)| dx,

where dx is the volume element in IR^k, and a simple change of variables shows that

    ∫_{IR^k} A*(f) P_S dx = ∫_{IR^k} f(y) P_T(y) dy.
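To make the change-of-variables identity above concrete, here is a small numerical sketch (our own illustration, not part of the paper's implementation): for a Gaussian P_S = N(0, I), the push-forward density under a linear map A is P_T = N(0, AA^T), and the expectation of the monomial f(x, y) = xy under P_T agrees with the expectation of the transformed polynomial A*(f) under P_S. The matrix and sample sizes below are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.5, 0.3],
              [-0.4, 0.9]])   # an arbitrary non-singular linear map

# P_S = N(0, I); its push-forward under A is P_T = N(0, A A^T).
xs = rng.standard_normal((500_000, 2))                        # samples from P_S
ys = rng.multivariate_normal(np.zeros(2), A @ A.T, 500_000)   # samples from P_T

f = lambda p: p[:, 0] * p[:, 1]   # the quadratic monomial f(x, y) = x*y

lhs = f(xs @ A.T).mean()          # E_{P_S}[ A*(f) ] = E_{P_S}[ f(A x) ]
rhs = f(ys).mean()                # E_{P_T}[ f ]
exact = (A @ A.T)[0, 1]           # closed form for this Gaussian example: ac + bd

print(lhs, rhs, exact)            # all three agree up to Monte Carlo error (about -0.33)
```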

If A*(f) is a linear combination of some basis polynomials g_1, ..., g_l,

    A*(f) = a_1 g_1 + ... + a_l g_l,

we have

    ∫_{IR^k} f P_T dx = Σ_{i=1}^{l} a_i ∫_{IR^k} g_i P_S dx.             (2)

If the basis polynomials g_i, 1 ≤ i ≤ l, are the monomials, their integrals above are the moments [1] of the distribution P_S. Furthermore, if f and the g_i are appropriately chosen, the coefficients a_i, 1 ≤ i ≤ l, will be polynomials in the entries of the matrix M_A representing the affine transform A, and the equation above provides one polynomial constraint on M_A. With a collection of h linearly independent polynomials {f_1, ..., f_h}, we have h polynomial constraints, which can be used to formulate the matching cost function

    E(A) = Σ_{i=1}^{h} ω_i MM_{f_i}^2,                                    (3)

where

    MM_{f_i}(A) = ∫_{IR^k} f_i P_T dx − Σ_{j=1}^{l} a_j^i ∫_{IR^k} g_j P_S dx,

and ω_i ≥ 0 are the weights. That is, instead of matching points directly, we match the moments, and the affine transform A is computed as the global minimum of the cost function E, which is a polynomial.

We do not claim that the idea of point registration by matching moments is new. In fact, this idea was already implicit in the early work of [16, 17, 2]. In these seminal works on rigid registration published in the early 90s, the rigid registration is accomplished by matching principal directions computed from the point sets. This can easily be shown to be equivalent to matching the quadratic moments of the discrete density functions associated to the point sets,

    P_S = (1/m) Σ_{i=1}^{m} δ_{s_i},      P_T = (1/n) Σ_{i=1}^{n} δ_{t_i}.

Algebraically, matching quadratic moments was tractable back in the early 90s because completing squares is relatively easy and global minimization is straightforward. For moments of degree greater than two, the coefficients a_j^i, and hence the cost function E, become cumbersome to compute by hand, and the optimization requires more elaborate techniques. Two developments in the interim have made it possible to explore matching higher-degree moments for point registration. First, software packages such as MAPLE that do symbolic computations have matured considerably and become widely available. This makes computing the cost function E almost effortless. Second, and more importantly, there are now efficient and robust techniques available for computing the global minima of polynomials [5, 14]. This is very important because it is difficult to guarantee the quality of the solution if it is only a local minimum of the cost function.

Needless to say, affine registration has been studied quite extensively in the vision literature. Most of the published algorithms require some kind of optimization, either directly on IR^k or on some relevant Lie group such as SO(k), for which gradient descent has always been the method of choice. While many of these algorithms perform well most of the time, it is virtually impossible to guarantee convergence of these algorithms to the global minimum of their cost functions. By formulating a polynomial cost function, however, we can indeed guarantee that our solution is the global minimum of the cost function. Furthermore, our approach makes good intuitive sense because distributions can be characterized by their moments, and a good affine transform between the two distributions should match the moments reasonably well. In particular, compared to moments of higher degree, the first few (low-degree) moments are much more important because they encode certain global geometric properties of the distributions that matter for perception: the linear moments are related to the centroid, the quadratic moments to the variance, and the cubic moments to the skewness [1]. Therefore, even if the affine transform associated to the global minimum of our cost function is not the true affine transform, we can still expect the true affine transform to be near our solution, because it will most likely match the moments almost as well as our solution does.
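To fix ideas before the formal development, the following short sketch (ours; the function name and point set are hypothetical, not from the paper) computes the raw moments of the discrete density (1/m) Σ_i δ_{s_i} attached to a 2D point set, up to degree three, i.e., the averages of the monomials x^i y^j with 1 ≤ i + j ≤ 3 over the points.

```python
import numpy as np

def discrete_moments(points, max_degree=3):
    """Raw moments E[x^i y^j], 1 <= i + j <= max_degree, of the discrete
    density (1/m) sum_k delta_{s_k} of a 2D point set of shape (m, 2)."""
    x, y = points[:, 0], points[:, 1]
    return {(i, d - i): float(np.mean(x**i * y**(d - i)))
            for d in range(1, max_degree + 1)
            for i in range(d, -1, -1)}   # monomial order x^d, x^{d-1} y, ..., y^d

# Example: after centering, the two degree-one (linear) moments vanish.
pts = np.random.default_rng(1).normal(size=(500, 2))
pts = pts - pts.mean(axis=0)
print(discrete_moments(pts))
```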
2. Affine Registration Algorithm

In this section, we detail the proposed affine registration algorithm. We first describe how polynomials transform under a general affine transformation. This is then followed by a discussion of how to cast the registration problem into a form that can be solved using polynomial optimization.

Let S and T as above denote two point sets in IR^k (k = 2, 3 are the cases of interest), and our problem is to estimate an affine transformation A that best approximates T by the image of S under A. In the following discussion, we consider only point sets in IR^2, since the IR^3 case is almost exactly the same except that the algebra is messier because of the larger number of variables. For the moment, we will ignore the translational component of A and focus on its linear component

    [ X ]   [ a  b ] [ x ]
    [ Y ] = [ c  d ] [ y ] .                                              (4)

2.1. Induced Linear Transformations

As a linear transform of IR^2, A induces a family of linear transformations on the polynomials. Specifically, for each integer d ≥ 1, A induces a linear transformation A*_d on the space of polynomials of degree d in x and y. This can be seen immediately using abstract linear algebra, because homogeneous polynomials of degree d are symmetric tensors in the d-fold tensor product space V^{⊗d} ≡ V ⊗ V ⊗ ... ⊗ V (these are the rank-d tensors), where V is the vector space IR^2. Any linear transformation of the base vector space V induces a linear transformation of each V^{⊗d} that maps symmetric tensors to symmetric tensors, i.e., a linear transformation of homogeneous polynomials. The important point is that once a matrix representation for A is given, all the linear transformations A*_d can be written down in matrix form such that the entries of the matrix for A*_d are degree d polynomials in the entries of the matrix for A.

Instead of citing standard references (e.g., [10]) to illustrate the above, we will work out one example in detail. Consider the affine transformation A in Equation 4. Under A, the linear monomials x, y are transformed into the following two linear monomials:

    x → ax + by,      y → cx + dy.

Since these two monomials x, y form a basis for the space of linear polynomials on IR^2, the above two transformations define a linear transformation A*_1 on the space of linear polynomials, and the matrix representation for A*_1 using this basis is the transpose of A in Equation 4. This result can be extended easily to define transformations for monomials of higher degree using polynomial multiplication. For instance, when d = 2, we have the transformations for the following three monomials:

    x^2 → (ax + by)^2 = a^2 x^2 + 2ab xy + b^2 y^2,
    xy  → (ax + by)(cx + dy) = ac x^2 + (ad + bc) xy + bd y^2,
    y^2 → (cx + dy)^2 = c^2 x^2 + 2cd xy + d^2 y^2.

Since any quadratic polynomial is a linear combination of the above three basis monomials, these transformations allow us to define a linear transformation A*_2 on the space of quadratic polynomials as above, and its matrix representation (using this basis) is

              [ a^2   ac        c^2 ]
    A*_2  ≡   [ 2ab   ad + bc   2cd ] .
              [ b^2   bd        d^2 ]

For degree three polynomials, the transformations for the four basis monomials are

    x^3   → a^3 x^3 + 3a^2 b x^2 y + 3ab^2 xy^2 + b^3 y^3,
    x^2 y → a^2 c x^3 + (a^2 d + 2abc) x^2 y + (2abd + cb^2) xy^2 + b^2 d y^3,
    xy^2  → ac^2 x^3 + (c^2 b + 2acd) x^2 y + (2bcd + ad^2) xy^2 + bd^2 y^3,
    y^3   → c^3 x^3 + 3c^2 d x^2 y + 3cd^2 xy^2 + d^3 y^3,

and the matrix for A*_3 is

              [ a^3     a^2 c          ac^2           c^3    ]
    A*_3  ≡   [ 3a^2 b  a^2 d + 2abc   c^2 b + 2acd   3c^2 d ] .
              [ 3ab^2   2abd + cb^2    2bcd + ad^2    3cd^2  ]
              [ b^3     b^2 d          bd^2           d^3    ]
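The matrices A*_2 and A*_3 above can be generated mechanically for any degree with a computer algebra system. The sketch below is our sympy version of that bookkeeping (the paper itself mentions MAPLE): it substitutes x → ax + by, y → cx + dy into each basis monomial and collects coefficients, with column j of the result holding the expansion of the j-th transformed basis monomial, so the output matches the matrices displayed above.

```python
import sympy as sp

a, b, c, d, x, y = sp.symbols('a b c d x y')

def induced_matrix(deg):
    """Matrix of the induced map on degree-`deg` homogeneous polynomials,
    in the monomial basis x^deg, x^(deg-1)*y, ..., y^deg.  Column j holds
    the coefficients of the image of the j-th basis monomial."""
    basis = [x**i * y**(deg - i) for i in range(deg, -1, -1)]
    columns = []
    for mono in basis:
        image = sp.expand(mono.subs({x: a*x + b*y, y: c*x + d*y}, simultaneous=True))
        poly = sp.Poly(image, x, y)
        columns.append([poly.coeff_monomial(m) for m in basis])
    return sp.Matrix(columns).T   # transpose so that columns hold the images

sp.pprint(induced_matrix(2))   # reproduces the 3x3 matrix A*_2 above
sp.pprint(induced_matrix(3))   # reproduces the 4x4 matrix A*_3 above
```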
For any degree d ≥ 1, the space of homogeneous polynomials of degree d has dimension d + 1, as there are d + 1 linearly independent degree d monomials x^d, x^{d−1} y, ..., xy^{d−1}, y^d. We denote by g_d^i, 1 ≤ i ≤ d + 1, these d + 1 basis monomials of degree d, and the above discussion can be summarized succinctly by the following equation:

    A*_d(g_d^i) = Σ_{j=1}^{d+1} A_{dj}^i g_d^j,                           (5)

where the A_{dj}^i denote the entries of the matrix for A*_d. Note that each A_{dj}^i is indeed a degree d homogeneous polynomial in the entries of the matrix A (a, b, c and d).

2.2. Matching Moments

Applying Equations 2 and 5, we have

    ∫_{IR^k} g_d^i P_T dx = Σ_{j=1}^{d+1} A_{dj}^i ∫_{IR^k} g_d^j P_S dx.  (6)

Each integral in the equation above defines one particular moment of degree d for its corresponding density function [1]. Therefore, if we assume that all moments ∫_{IR^k} g_d^i P_S dx, ∫_{IR^k} g_d^i P_T dx of a given degree exist, the above equation provides us with one constraint on A. In particular, since we assume that the distributions P_S, P_T are either discrete densities or Gaussian mixtures, all of their moments exist and are finite: for the former, a moment is just the average of the values of the polynomial evaluated at the points, and for the latter, the moments can be computed directly using the moment generating function of a multivariate normal distribution N(μ; Σ) with mean μ and covariance matrix Σ [1]:

    M(t) = exp( μ^T t + (1/2) t^T Σ t ),

where t is the column vector [x y]^T. The specific formula we need is that the moments of N(μ; Σ) can be computed by

    ∫_{IR^k} x^i y^j N(μ; Σ) dx = ( ∂^{i+j} M / ∂x^i ∂y^j )(0),

which allows us to quickly compute the moments of a Gaussian mixture by taking the weighted sum of the moments of each Gaussian component.
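The moment formula above is straightforward to evaluate symbolically. The following sketch (our illustration, with hypothetical parameter values) differentiates the moment generating function of a 2D Gaussian N(μ; Σ) with sympy to obtain a raw moment E[x^i y^j]; a Gaussian mixture's moment is then the weighted sum of such terms over its components.

```python
import sympy as sp

t1, t2 = sp.symbols('t1 t2')

def gaussian_raw_moment(mu, Sigma, i, j):
    """E[x^i y^j] for N(mu, Sigma), obtained by differentiating the moment
    generating function M(t) = exp(mu^T t + t^T Sigma t / 2) i times in t1
    and j times in t2, then evaluating at t = 0."""
    t = sp.Matrix([t1, t2])
    mu, Sigma = sp.Matrix(mu), sp.Matrix(Sigma)
    M = sp.exp((mu.T * t)[0, 0] + sp.Rational(1, 2) * (t.T * Sigma * t)[0, 0])
    return sp.simplify(sp.diff(M, t1, i, t2, j).subs({t1: 0, t2: 0}))

mu, Sigma = [1, -2], [[2, 1], [1, 3]]
print(gaussian_raw_moment(mu, Sigma, 1, 1))   # E[x y] = Sigma_xy + mu_x mu_y = -1
print(gaussian_raw_moment(mu, Sigma, 2, 1))   # one of the degree-three raw moments

# A mixture's moment is the weighted sum over its components:
#   sum_k  w_k * gaussian_raw_moment(mu_k, Sigma_k, i, j)
```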
Finally, we denote by I_d^S, I_d^T the vectors of degree d moments of P_S, P_T, respectively. These vectors are obtained by vertically stacking the d + 1 moments formed by integrating the d + 1 basis monomials g_d^i; accordingly, I_d^S and I_d^T are vectors of length d + 1. Equation 6 can now be put into a very succinct form,

    I_d^T = A*_d I_d^S,                                                   (7)

for all d > 0. Therefore, for a point registration problem, once the PDFs P_S, P_T have been computed, we can try to minimize the following cost function:

    E(A) = Σ_{d=1}^{D} ω_d || I_d^T − A*_d I_d^S ||^2,                     (8)

where ω_d > 0 are the weights and D gives the upper bound on the degree of the moments used in the matching. Note that all the vectors I_d^S, I_d^T are known and their dimensions depend on the degree d. For a fixed D, Equation 8 is a polynomial of degree 2D in the entries of A.

We remark that in general we can assume that all the linear moments vanish; this corresponds to centering each point set with respect to its own centroid. In IR^k, there are C(k + 1, 2) = k(k + 1)/2 quadratic basis moments. This number is smaller than the dimension of GL(k), which is k^2. In particular, if we match only up to quadratic moments in Equation 8, the resulting optimization problem will not have a finite number of solutions but a continuous family of solutions. In terms of linear algebra, this corresponds to the fact that given two covariance (symmetric and positive-definite) matrices, we can always match the two matrices using affine transformations in GL(k). This shows that it is not enough to match only quadratic moments. In the experiments below, we match the moments up to degree three, and a simple dimension count shows that for a general pair of point sets, the solutions (global minima of E) are finite in number.

2.3. Polynomial Optimization

Given a multivariate polynomial P(x_1, ..., x_n), a direct method for computing its global minimum is to enumerate all critical points of P; the global minimum is among these critical points. The critical points of P are the points at which the gradient of P vanishes,

    ∇P = ( ∂P/∂x_1, ..., ∂P/∂x_n ) = 0.

This provides us with a system of n polynomial equations of degree one less than the degree of P. This system of equations can be solved by computing a Gröbner basis for the polynomial ideal generated by the n polynomials ∂P/∂x_1, ..., ∂P/∂x_n [5]. In order to compute the Gröbner basis efficiently, we need to define a monomial order carefully (details omitted), and there exist several software packages (e.g., MAPLE) that can efficiently compute the Gröbner basis given the polynomial inputs and the monomial order. The same method for solving polynomial equations has been applied previously to solve other vision problems such as triangulation and camera calibration (e.g., [19, 13]). However, we remark that our problem is significantly easier than theirs because there are no "denominators" in our cost function, and the application of the Gröbner basis here is much more straightforward.
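As a toy instance of this recipe (our own example, deliberately much smaller than the degree-six registration cost so that it runs instantly), the sketch below minimizes a bivariate polynomial by forming the gradient ideal, printing a lexicographic Gröbner basis of it, enumerating the critical points exactly with sympy, and keeping the one with the smallest value.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)

# A small polynomial with several critical points (a stand-in for the
# degree-six cost E; the recipe is identical, only the system is smaller).
P = (x1**2 - 1)**2 + (x1 - x2)**2

# The critical points are the solutions of the polynomial system grad P = 0.
grad = [sp.diff(P, v) for v in (x1, x2)]

# A lexicographic Groebner basis of the gradient ideal puts the system in
# triangular form, from which the finitely many solutions can be read off.
print(sp.groebner(grad, x1, x2, order='lex'))

# sympy can also enumerate the critical points directly (all real here);
# the global minimum is the critical point with the smallest value of P.
crit = sp.solve(grad, [x1, x2], dict=True)
best = min(crit, key=lambda s: P.subs(s))
print(crit)
print(best, P.subs(best))   # (x1, x2) = (1, 1) or (-1, -1), with P = 0
```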
3. Related Work

Using probability distributions in point set registration has become quite popular in recent years, e.g., [4, 20, 11, 8]. Replacing the point sets with their associated distributions provides a more robust and principled way of dealing with outliers and noise. While these cited papers deal with the more general non-rigid registration problem, affine registration is nevertheless an important component of these algorithms, as it is typically the crucial first step in aligning the point sets. In [20], the matching cost function is the KL-divergence between the two distributions, and in [11] it is the L2-distance between the distributions. Affine registration is handled using gradient descent in these papers; in [11], a closed-form formula for the gradient is known, which makes the optimization more efficient compared with the case (e.g., [20]) where the gradients have to be computed numerically.

There are registration algorithms that do not require minimization using gradient descent. As mentioned in the introduction, the methods proposed in [16, 17, 2] are basically equivalent to matching quadratic moments. To the best of our knowledge, there does not seem to have been any significant follow-up of these works that tackles the point set registration problem using higher-degree moments, except perhaps [9]. However, the algorithm proposed in [9] requires complex multiplications, which limits its applicability to point sets in IR^2. Furthermore, the invariants matched there are generated by evaluating elementary symmetric functions on the points, which, although similar, cannot be interpreted as (discrete) moments. Iterative closest point (ICP) (e.g., [15, 18, 7]) is another class of registration algorithms that does not require continuous optimization (the optimization can be solved quickly using linear algebra once the correspondences are known). While this algorithm has been adopted widely and in many cases performs brilliantly, it is difficult to prove the algorithm's convergence, and at the same time little can be said about the quality of its solution in general.

Finally, the polynomial optimization algorithms used in this paper have already made their appearance in several important vision papers published in the past few years, e.g., [12, 19, 13]. Polynomial cost functions abound in multi-view geometry problems, ranging from three-view triangulation and 2D homography estimation to camera calibration (with radial distortion). We note, however, that our context is different from theirs in that the point correspondences are known in those works, and their algorithms output the optimal transformations according to the known correspondences. In our case, the correspondences are not known on the point level (we are trying to compute the correspondences) but on the moment level. That is, the moment vectors I_d^T, I_d^S of the same degree correspond under the induced linear transform A*_d (Equation 7).

4. Experiments

For lack of space, we report only two sets of experimental results; more (including 3D) results are available online at http://www.cise.ufl.edu/~jho/AffineRegistration/. Our goal in these two experiments is to demonstrate that the proposed algorithm is indeed robust against noise and outliers. In the first set of experiments, we randomly generate point sets and affine transformations in IR^2, add varying amounts of noise to the point sets, and study the behavior of the algorithm under the different noise settings. Second, we apply the algorithm to 2D point sets extracted from the MPEG shape database and show several affine registration results between pairs of similar (but not identical) shapes. We have implemented the algorithm using MATLAB and MAPLE without any optimization. The sizes of the point sets in the experiments range from 100 to more than 4000, and both experiments were run on a DELL desktop with a single 3.1GHz processor.
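Before turning to the results, here is an end-to-end sketch of the synthetic protocol in our own words (all names, sizes, and noise levels are illustrative, not the paper's exact settings): two centered point sets related by a random linear map are reduced to their discrete moments up to degree three, the degree-six cost of Equation 8 is formed with unit weights, and a minimizer is sought. The paper finds the global minimum exactly with Gröbner bases; as a lightweight stand-in, this sketch uses a multistart quasi-Newton solver.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Exponents (i, j) of the basis monomials x^i y^j with 1 <= i + j <= 3.
EXPONENTS = [(i, d - i) for d in (1, 2, 3) for i in range(d, -1, -1)]

def moments(pts):
    """Vector of discrete raw moments of a 2D point set."""
    x, y = pts[:, 0], pts[:, 1]
    return np.array([np.mean(x**i * y**j) for i, j in EXPONENTS])

# Synthetic data: T is the image of S under a random linear map, plus noise.
S = rng.normal(size=(500, 2)) ** 3             # a non-Gaussian source cloud
A_true = np.eye(2) + rng.uniform(-0.5, 0.5, (2, 2))
T = S @ A_true.T + 0.01 * rng.normal(size=S.shape)
S, T = S - S.mean(0), T - T.mean(0)            # centering removes the translation

target = moments(T)

def cost(params):
    """Moment-matching cost E(a, b, c, d) of Equation 8 with unit weights;
    for discrete densities, the model-side moments are averages of the
    monomials evaluated on the transformed source points."""
    A = params.reshape(2, 2)
    return np.sum((moments(S @ A.T) - target) ** 2)

starts = [np.eye(2).ravel()] + [rng.uniform(-2, 2, 4) for _ in range(20)]
best = min((minimize(cost, s0, method='BFGS') for s0 in starts), key=lambda r: r.fun)
print(best.x.reshape(2, 2))   # usually close to A_true; the exact Groebner-basis
print(A_true)                 # solution of the paper avoids local minima altogether
```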
4.1. Affine Registration in IR^2

In this set of experiments, our aim is to give a qualitative as well as quantitative analysis of the accuracy and robustness of the proposed method. We report our experimental results on synthetic data using point sets of different sizes under various noise settings. Table 1 summarizes the experimental results. The algorithm is tested with four different point set sizes, 100, 200, 500 and 1000, and five different noise settings, 0%, 1%, 2%, 5% and 10%. For each pair of point set size and noise setting, we ran 100 trials, each with a randomly generated non-singular matrix A and a point set of the given size. In trials with an x% noise setting, we add uniform random noise (±x%) to each coordinate of every point independently. Let A' denote the estimated matrix. A point s ∈ S is matched to the point t ∈ T if t = argmin_{t_i ∈ T} dist(A's, t_i). For each trial, we report the percentage of mismatched points and the relative error of the estimated matrix A', ||A' − A||_F / ||A||_F, using the Frobenius norm.

As the point sets are randomly generated in this experiment, Gaussian mixtures are clearly not suited for modelling them, and we use the discrete densities instead. We use moments up to degree three, so the matching cost function E(A) is a degree six polynomial in four variables. The Gröbner basis method is used to solve for the affine transformation.
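For completeness, here is our reading of the two error measures reported in Table 1, written out as code (the function names are ours, and we assume the source and target sets are in one-to-one correspondence by index, as they are in this synthetic setting).

```python
import numpy as np

def relative_matrix_error(A_est, A_true):
    """Relative Frobenius error || A_est - A_true ||_F / || A_true ||_F."""
    return np.linalg.norm(A_est - A_true) / np.linalg.norm(A_true)

def mismatch_percentage(S, T, A_est):
    """Percentage of points s_k whose nearest neighbour in T under the
    estimated map A_est is not the corresponding point t_k."""
    mapped = S @ A_est.T                                      # A_est applied to each s_k
    d2 = ((mapped[:, None, :] - T[None, :, :]) ** 2).sum(-1)  # all squared distances
    nearest = d2.argmin(axis=1)
    return 100.0 * np.mean(nearest != np.arange(len(S)))
```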
Table 1. Experimental results. For each pair of point set size and noise setting, one hundred trials, each with a randomly generated matrix A and a point set, are performed. Each entry shows the averaged relative error of the estimated matrix, with the percentage of mismatched points in parentheses.

    Noise \ Point Set Size     100          200          500            1000
    0%                         0.0 (0)      0.0 (0)      0.01 (0)       0.01 (0)
    1%                         0.0 (0)      0.0 (0)      0.02 (0)       0.02 (0)
    2%                         0.0 (0)      0.0 (0)      0.003 (0.1)    0.02 (0.1)
    5%                         0.01 (1)     0.01 (1)     0.01 (1)       0.03 (1)
    10%                        0.037 (2)    0.038 (2)    0.041 (2)      0.042 (3)

4.2. Matching Shapes in IR^2

In the second set of experiments, we work with shapes from the MPEG shape database, which contains seventy different shape categories with twenty shapes in each category. We apply the proposed algorithm to affinely register shapes within the same category. In this experiment, the shapes are represented as point sets with sizes ranging from 2000 to about 5000 points. A Gaussian mixture model is estimated for each shape using the Expectation-Maximization algorithm [6, 3], with the number of mixture components ranging from 15 to 33. The matching cost function E is a polynomial of degree six (using all the moments up to degree three), and the Gröbner basis method is used to solve for the affine registration. Figure 1 displays several examples of successful affine registrations using the proposed algorithm.

Figure 1. Matching 2D shapes. Results of matching shapes from five different shape categories in the MPEG shape database. The source shapes are displayed in red in the first row. The source and target shapes (in blue) are displayed together in the second row. The matching results are shown in the last row.

5. Conclusion

We have proposed a new affine registration algorithm for two point sets in IR^2 or IR^3 based on matching the moments of two probability density functions. The matching cost function turns out to be a polynomial, and its global minimum can be efficiently determined by solving a system of polynomial equations. Experimental results have demonstrated that this new approach is indeed viable, and that the algorithm is capable of producing good solutions even in the presence of significant amounts of noise and outliers.

References

[1] T. W. Anderson. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, 1984.
[2] P. J. Besl and N. D. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, 1992.
[3] C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[4] H. Chui and A. Rangarajan. A new algorithm for non-rigid point matching. In Proc. IEEE Conf. on Comp. Vision and Patt. Recog., volume 2, pages 44–51, 2000.
[5] D. Cox, J. Little, and D. O'Shea. Using Algebraic Geometry. Springer, 2005.
[6] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
[7] A. W. Fitzgibbon. Robust registration of 2D and 3D point sets. Image and Vision Computing, 21(13):1145–1153, 2003.
[8] J. Goldberger, S. Gordon, and H. Greenspan. An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures. In Proc. Int. Conf. on Computer Vision, pages 487–493, 2003.
[9] J. Ho, M.-H. Yang, A. Rangarajan, and B. Vemuri. A new affine registration algorithm for 2D point sets. In WACV, 2007.
[10] M. Itskov. Tensor Algebra and Tensor Analysis for Engineers. Springer, 2009.
[11] B. Jian and B. Vemuri. A robust algorithm for point set registration using mixture of Gaussians. In Proc. Int. Conf. on Computer Vision, pages 1246–1251, 2005.
[12] F. Kahl and D. Henrion. Globally optimal estimates for geometric reconstruction problems. In Proc. Int. Conf. on Computer Vision, pages 978–985, 2005.
[13] Z. Kukelova and T. Pajdla. A minimal solution to the autocalibration of radial distortion. In Proc. IEEE Conf. on Comp. Vision and Patt. Recog., 2007.
[14] J. Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11:796–817, 2001.
[15] S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In Proc. Third International Conference on 3D Digital Imaging and Modeling (3DIM), pages 145–152, 2001.
[16] G. Scott and C. Longuet-Higgins. An algorithm for associating the features of two images. Proceedings of the Royal Society of London B, 244:21–26, 1991.
[17] L. Shapiro and J. Brady. Feature-based correspondence: an eigenvector approach. Image and Vision Computing, pages 283–288, 1992.
[18] G. C. Sharp, S. W. Lee, and D. K. Wehe. ICP registration using invariant features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1):90–102, 2002.
[19] H. Stewenius, F. Schaffalitzky, and D. Nister. How hard is three-view triangulation really? In Proc. Int. Conf. on Computer Vision, volume 1, pages 686–693, 2005.
[20] Y. Tsin and T. Kanade. A correlation-based approach to robust point set registration. In Proc. European Conf. on Computer Vision, pages 558–569, 2004.