Robust Estimation of Nonrigid Transformation for Point Set Registration

Jiayi Ma1,2, Ji Zhao3, Jinwen Tian1, Zhuowen Tu4, and Alan L. Yuille2 1Huazhong University of Science and Technology, 2Department of Statistics, UCLA 3The Institute, CMU, 4Lab of Neuro Imaging, UCLA jyma2010, zhaoji84, zhuowen.tu @gmail.com, [email protected], [email protected] { }

Abstract widely studied [5,4,7, 19, 14]. By contrast, nonrigid reg- istration is more difficult because the underlying nonrigid We present a new point matching algorithm for robust transformations are often unknown, complex, and hard to nonrigid registration. The method iteratively recovers the model [6]. But nonrigid registration is very important be- point correspondence and estimates the transformation be- cause it is required for many real world tasks including tween two point sets. In the first step of the iteration, fea- hand-written character recognition, shape recognition, de- ture descriptors such as shape context are used to establish formable motion tracking and medical . rough correspondence. In the second step, we estimate the In this paper, we focus on the nonrigid case and present transformation using a robust estimator called L2E. This is a robust algorithm for nonrigid point set registration. There the main novelty of our approach and it enables us to deal are two unknown variables we have to solve for in this prob- with the noise and which arise in the correspon- lem: the correspondence and the transformation. Although dence step. The transformation is specified in a functional solving for either variable without information regarding space, more specifically a reproducing kernel Hilbert space. the other is difficult, an iterated estimation framework can We apply our method to nonrigid sparse image feature cor- be used [4,3,6]. In this iterative process, the estimate of the respondence on 2D images and 3D surfaces. Our results correspondence is used to refine the estimate of the trans- quantitatively show that our approach outperforms state-of- formation, and vice versa. But a problem arises if there the-art methods, particularly when there are a large num- are errors in the correspondence which occurs in many ap- ber of outliers. Moreover, our method of robustly estimating plications particularly if the transformation is large and/or transformations from correspondences is general and has there are outliers in the data (e.g., data points that are not many other applications. undergoing the non-rigid transformation). In this situation, the estimate of the transformation will degrade badly un- less it is performed robustly. The main contribution of our 1. Introduction approach is to robustly estimate the transformations from the correspondences using a robust estimator named the L - Point set registration is a fundamental problem which 2 Minimizing Estimate (L E)[20,2]. frequently arises in , medical image analy- 2 More precisely, our approach iteratively recovers the sis, and [5,4,6]. Many tasks in these point correspondences and estimates the transformation be- fields – such as stereo matching, shape matching, image tween two point sets. In the first step of the iteration, fea- registration and content-based image retrieval – can be for- ture descriptors such as shape context are used to estab- mulated as a point matching problems because point repre- lish correspondence. In the second step, we estimate the sentations are general and easy to extract [5]. The points transformation using the robust estimator L E. This esti- in these tasks are typically the locations of interest points 2 mator enable us to deal with the noise and outliers in the extracted from an image, or the edge points sampled from correspondences. The nonrigid transformation is modeled a shape contour. The registration problem then reduces to in a functional space, called the reproducing kernel Hilbert determining the correct correspondence and to find the un- space (RKHS) [1], in which the transformation function has derlying spatial transformation between two point sets ex- an explicit kernel representation. tracted from the input data. The registration problem can be categorized into rigid or 1.1. Related Work nonrigid registration depending on the application and the form of the data. Rigid registration, which only involves a The iterated closest point (ICP) algorithm [4] is one small number of parameters, is relatively easy and has been of the best known point registration approaches. It uses

1 nearest-neighbor relationships to assign a binary correspon- 2.1. Problem Formulation Using L2E: Robust Esti- dence, and then uses estimated correspondence to refine the mation transformation. Belongie et al.[3] introduced a method for Parametric estimation is typically done using maximum registration based on the shape context descriptor, which likelihood estimation (MLE). It can be shown that MLE incorporates the neighborhood structure of the point set is the optimal estimator if it is applied to the correct proba- and thus helps establish correspondence between the point bility model for the data (or to a good approximation). But sets. But these methods ignore robustness when they re- MLE can be badly biased if the model is not a sufficiently cover the transformation from the correspondence. In re- good approximation or, in particular, there are a significant lated work, Chui and Rangarajan [6] established a general fraction of outliers. In many point matching problems, it framework for estimating correspondence and transforma- is desirable to have a robust estimator of the transformation tions for nonrigid point matching. They modeled the trans- f because the point correspondence set S usually contains formation as a thin-plate spline and did robust point match- outliers. There are two choices: (i) to build a more complex ing by an algorithm (TRS-RPM) which involved determin- model that includes the outliers – which is complex since it istic annealing and soft-assignment. Alternatively, the co- involves modeling the process using extra (hidden) herence point drift (CPD) algorithm [17] uses Gaussian ra- variables which enable us to identify and reject outliers, dial basis functions instead of thin-plate splines. Another or (ii) to use an estimator which is different from MLE interesting point matching approach is the kernel correla- but less sensitive to outliers, as described in Huber’s robust tion (KC) based method [22], which was later improved statistics [8]. In this paper, we use the second method and in [9]. Zheng and Doermann [27] introduced the notion adopt the L E estimator [20,2], a robust estimator which of a neighborhood structure for the general point matching 2 minimizes the L distance between densities, and is par- problem, and proposed a matching method, the robust point 2 ticularly appropriate for analyzing massive data sets where matching-preserving local neighborhood structures (RPM- data cleaning (to remove outliers) is impractical. More for- LNS) algorithm. Other related work includes the relaxation mally, L E estimator for model f(x θ) recommends esti- labeling method, generalized in [11], and the graph match- 2 | mating the parameter θ by minimizing the criterion: ing approach for establishing feature correspondences [21]. n The main contributions of our work include: (i) we pro- Z 2 X L E(θ) = f(x θ)2dx f(x θ). (1) pose a new robust algorithm to estimate a spatial transfor- 2 | − n i| mation/mapping from correspondences with noise and out- i=1 liers; (ii) we apply the robust algorithm to nonrigid point To get some intuition for why L2E is robust, observe that its set registration and also to sparse image feature correspon- penalty for a low probability point x is f(x θ) which is i − i| dence. much less than the penalty of log f(x θ) given by MLE − i| (which becomes infinite as f(x θ) tends to 0). Hence i| 2. Estimating Transformation from Corre- MLE is reluctant to assign low probability to any points, spondences by L2E including outliers, and hence tends to be biased by out- liers. By contrast, L E can assign low probabilities to Given a set of point correspondences S = (x , y ) n , 2 { i i }i=1 many points, hopefully to the outliers, without paying too which are typically perturbed by noise and by outlier points high a penalty. To demonstrate the robustness of L E, we which undergo different transformations, the goal is to esti- 2 present a line-fitting example which contrasts the behavior mate a transformation f : y = f(x ) and fit the inliers. i i of MLE and L E, see Fig.1. The goal is to fit a linear re- In this paper we make the assumption that the noise on 2 gression model, y = αx + , with residual  N(0, 1), the inliers is Gaussian on each component with zero mean ∼ by estimating α using MLE and L2E. This gives, re- and uniform standard deviation σ (our approach can be di- Pn spectively, αˆMLE = arg maxα log φ(yi αxi 0, 1) rectly applied to other noise models). More precisely, an h i=1 − | i √1 2 Pn inlier point correspondence (x , y ) satisfies y f(x ) and αˆL2E = arg minα 2 π n i=1 φ(yi αxi 0, 1) , i i i − i ∼ − − | N(0, σ2I), where I is an identity matrix of size d d, with d where φ(x µ, Σ) denotes the normal density. × | being the dimension of the point. The data y f(x ) n As shown in Fig.1, L E is very resistant when we con- { i − i }i=1 2 can then be thought of as a sample set from a multivariate taminate the data by outliers, but MLE does not show this 2 normal density N(0, σ I) which is contaminated by out- desirable property. L2E always has a global minimum at liers. The main idea of our approach is then, to find the approximately 0.5 (the correct value for α) but MLE’s es- largest portion of the sample set (e.g., the underlying inlier timates become steadily worse as the amount of outliers in- set) that “matches” the normal density model, and hence creases. Observe, in the bottom right figure, that L2E also estimate the transformation f for the inlier set. Next, we in- has a local minimum near α = 2, which becomes deeper as troduce a robust estimator named L2-minimizing estimate the number n of outliers increases so that the two minima (L2E) which we use to estimate the transformation f. become approximately equal when n = 200. This is ap-

2 25 50 our formulation. 20 40 We define an RKHS by a positive definite matrix- 15 d Hd d×d 30 valued kernel Γ : IR IR IR . The optimal trans- y y 10 × → 20 formation f which minimizes the L2E functional (2) then 5 Pn 10 takes the form f(x) = i=1 Γ(x, xi)ci [16, 26], where 0 0 the coefficient ci is a d 1 dimensional vector (to be de- −5 0 2 4 6 8 10 0 2 4 6 8 10 × x x termined). Hence, the minimization over the infinite di- 4 3 x 10 x 10 1 mensional Hilbert space reduces to finding a finite set of 0 100 100 120 120 n coefficients c . But in point correspondence problem the −1 140 −1 140 i 160 160 180 200 −2 180 point set typically contains hundreds or thousands of points, −3 200 −3 which causes significant complexity problems (in time and −5 log−likelihood −4 log−likelihood space). Consequently, we adopt a sparse approximation,

−7 −5 and randomly pick only a subset of size m input points m −6 −9 x˜ 0 1 2 3 4 5 0 0.5 1 1.5 2 2.5 i i=1 to have nonzero coefficients in the expansion of the α α { } 0.4 0.4 solution. This follows [18] who found that this approxima- α = 0.5 tion works well and that simply selecting a random subset 0 0 of the input points in this manner, performs no worse than

−0.4 −0.4 200 more sophisticated and time-consuming methods. There- L2E L2E 200 180 180 160 160 fore, we seek a solution of form 140 140 −0.8 −0.8 120 120 m

100 100 X −1.2 −1.2 0 1 2 3 4 5 0 0.5 1 1.5 2 2.5 f(x) = Γ(x, x˜i)ci. (3) α α i=1 m Figure 1. Comparison between L2E and MLE for linear re- The chosen point set x˜i are somewhat analogous to { }i=1 gression as the number of outliers varies. Top row: data sam- “control points” [5]. By including a regularization term for ples, where the inliers are shown by cyan pluses, and the out- imposing smooth constraint on the transformation, the L2E liers by magenta circles. The goal is to estimate the slope α of functional (2) becomes: the line model y = αx. We vary the number of data samples n n = 100, 120, , 200, by always adding 20 new outliers (re- 2 1 2 X 1 ··· L2E(f, σ ) = taining the previous samples). In the second column the outliers 2d(πσ)d/2 − n (2πσ2)d/2 i=1 are generated from another line model y = 2x. Middle and bot- Pm 2 yi− j=1 Γ(xi,x˜j )cj tom rows: the curves of MLE and L2E respectively. The MLE − 2 e k 2σ2 k + λ f , (4) estimates are correct for n = 100 but rapidly degrade as we add k kΓ outliers, see how the peak of the log-likelihood changes in the sec- where λ > 0 controls the strength of regularization, and ond row. By contrast, (see third row) L2E estimates α correctly 2 the stabilizer f Γ is defined by an inner product, e.g., even when half the data is outliers and also develops a local mini- 2 k k f Γ = f, f Γ. By choosing a diagonal decomposable mum to fit the outliers when appropriate (third row, right column). k k h i 2 Γ(x , x ) = e−βkxi−xj k I β Best viewed in color. kernel [26]: i j with determining the width of the range of interaction between samples (i.e. neighborhood size), the L2E functional (4) may be conve- propriate because, in this case, the contaminated data also niently expressed in the following matrix form: comes from the same linear parametric model with slope n 1 2 X 1 α = 2, e.g., y = 2x. L E(C, σ2) = 2 2d(πσ)d/2 − n (2πσ2)d/2 We now apply the L2E formulation in (1) to the point i=1 matching problem, assuming that the noise of the inliers T 2 yi −Ui,·C − T is given by a , and obtain the following e k 2σ2 k + λ tr(C ΓC), (5) functional criterion: where kernel matrix Γ IRm×m is called the Gram matrix n 2 ∈ −βkx˜i−x˜j k n×m 2 1 2 X 2  with Γij = Γ(˜xi, x˜j) = e , U IR with L2E(f, σ ) = φ yi f(xi) 0, σ I . 2 ∈ 2d(πσ)d/2 − n − | −βkxi−x˜j k i=1 Uij = Γ(xi, x˜j) = e , Ui,· denotes the i-th row (2) of the matrix U, C = (c , , c )T is the coefficient ma- 1 ··· m We model the nonrigid transformation f by requiring it to trix of size m d, and tr( ) denotes the trace. × · lie within a specific functional space, namely a reproducing 2.2. Estimation of the Transformation kernel Hilbert space (RKHS) [1, 24, 16]. Note that other pa- rameterized transformation models, for example, thin-plate Estimating the transformation requires taking the deriva- splines (TPS) [23, 15], can also be easily incorporated into tive of the L2E cost function, see equation (5), with respect

3 Algorithm 1: Estimation of Transformation from Cor- enables our method to be applied to large scale data. respondences n 2.3. Implementation Details Input: Correspondence set S = (xi, yi) , { }i=1 parameters γ, β, λ The performance of point matching algorithms depends, Output: Optimal transformation f typically, on the coordinate system in which points are ex- 1 Construct Gram matrix Γ and matrix U; pressed. We use data normalization to control for this. More 2 2 Initialize parameter σ and C; specifically, we perform a linear re-scaling of the correspon- 3 Deterministic annealing: dences so that the points in the two sets both have zero mean 4 Using the gradient (6), optimize the objective and unit variance. function (5) by a numerical technique (e.g., the We define the transformation f as the initial position plus quasi-Newton algorithm with C as the old value); a displacement function v: f(x) = x+v(x) [17], and solve 2 5 Update the parameter C arg minC L2E(C, σ ); for v instead of f. This can be achieved simply by setting 2 2 ← 6 Anneal σ = γσ ; the output y to be y x . i i − i 7 The transformation f is determined by equation (3). Parameter settings. There are three main parameters in this algorithm: γ, β and λ. The parameter γ controls the an- nealing rate. The parameters β and λ control the influence to the coefficient matrix C, which is given by: of the smoothness constraint on the transformation f. In general, we found our method was very robust to parameter T ∂L2E 2U [V (W 11×d)] changes. We set γ = 0.5, β = 0.8 and λ = 0.1 through- = ◦ ⊗ + 2λΓC, (6) 2 ∂C nσ2(2πσ2)d/2 out this paper. Finally, the parameter σ and C in line 2 of algorithm1 were initialized to 0.05 and 0 respectively. where V = UC Y and Y = (y , , y )T are matrices − 1 ··· n of size n d, W = exp diag(VV T)/2σ2 is an n 1 3. Nonrigid Point Set Registration × { } × dimensional vector, diag( ) is the diagonal of a matrix, 1 × · 1 d n is an 1 d dimensional row vector of all ones, denotes the Point set registration aims to align two point sets xi i=1 × ◦ l { } Hadamard product, and denotes the Kronecker product. (the model point set) and yj j=1 (the target point set). ⊗ { } By using the derivative in equation (6), we can employ Typically, in the nonrigid case, it requires estimating a non- efficient gradient-based numerical optimization techniques rigid transformation f which warps the model point set to such as the quasi-Newton method and the nonlinear con- the target point set. We have shown above that once we jugate gradient method to solve the optimization problem. have established the correspondence between the two point But the cost function (5) is convex only in the neighborhood sets even with noise and outliers, we are able to estimate the of the optimal solution. Hence to improve convergence we underlying transformation between them. Next, we discuss use a coarse-to-fine strategy by applying deterministic an- how to find correspondences between two point sets. nealing on the inlier noise parameter σ2. This starts with 3.1. Establishment of Point Correspondence a large initial value for σ2 which is gradually reduced by σ2 γσ2, where γ is the annealing rate. Our algorithm is Recall that our method described above does not jointly 7→ outlined in algorithm1. solve the transformation and point correspondence. In order Computational complexity. By examining equations (5) to use algorithm1 to solve the transformation between two and (6), we see that the costs of updating the objective func- point sets, we need initial correspondences. tion and gradient are both O(dm2 + dmn). For the nu- In general, if the two point sets have similar shapes, the merical optimization method, we choose the Matlab Opti- corresponding points have similar neighborhood structures mization toolbox, which implicitly uses the BFGS Quasi- which could be incorporated into a feature descriptor. Thus Newton method with a mixed quadratic and cubic line finding correspondences between two point sets is equiv- search procedure. Thus the total complexity is approxi- alent to finding for each point in one point set (e.g., the mately O(dm2 + dm3 + dmn). In our implementation, model) the point on the other point set (e.g., the target) that the number m of the control points required to construct the has the most similar feature descriptor. Fortunately, the ini- transformation f in equation (3) is in general not large, and tial correspondences need not be very accurate, since our so use m = 15 for all the results in this paper (increasing method is robust to noise and outliers. Inspired by these m only gave small changes to the results). The dimension d facts, we use shape context [3] as the feature descriptor in of the data in feature point matching for vision applications the 2D case, using the Hungarian method for matching with is typically 2 or 3. Therefore, the complexity of our method the χ2 test statistic as the cost measure. In the 3D case, the can be simply expressed as O(n), which is about linear in spin image [10] can be used as a feature descriptor, where the number of correspondences. This is important since it the local similarity is measured by an improved correlation

4 Algorithm 2: Nonrigid Point Set Registration without iteration, since we focus on determining the right n l correspondences which does not need precise recovery of Input: Two point sets xi i=1, yj j=1 { } { } n the underlying transformation, and our approach then plays Output: Aligned model point set xˆi { }i=1 a role of rejecting outliers. 1 Compute feature descriptors for the target point set y l ; { j}j=1 4. Experimental Results 2 repeat 3 Compute feature descriptors for the model point In order to evaluate the performance of our algorithm, set x n ; we conducted two types of experiments: i) nonrigid point { i}i=1 4 Estimate the initial correspondences based on the set registration for 2D shapes; ii) sparse image feature cor- feature descriptors of two point sets; respondence on 2D images and 3D surfaces. 5 f Solve the transformation warping the model 4.1. Results on Nonrigid Point Set Registration point set to the target point set using algorithm1; n n 6 Update model point set x f(x ) ; We tested our method on the same synthesized data as in { i}i=1 ← { i }i=1 7 until reach the maximum iteration number; [6] and [27]. The data consists of two different shape mod- n els: a fish and a Chinese character. For each model, there 8 The aligned model point set xˆi i=1 is given by n { } are five sets of data designed to measure the robustness of f(xi) i=1 in the last iteration. { } registration algorithms under deformation, occlusion, rota- tion, noise and outliers. In each test, one of the above distor- tions is applied to a model set to create a target set, and 100 coefficient. Then the matching is performed by a method samples are generated for each degradation level. We use which encourages geometrically consistent groups. the shape context as the feature descriptor to establish initial The two steps of estimating correspondences and trans- correspondences. It is easy to make shape context transla- formations are iterated to obtain a reliable result. In this tion and scale invariant, and in some applications, rotation paper, we use a fixed number of iterations, typically 10 but invariance is also required. We use the rotation invariant more when the noise is big or when there are a large per- shape context as in [27]. centage of outliers contained in the original point sets. We Fig.2 shows the registration results of our method on summarize our point set registration method in algorithm2. solving different degrees of deformations and occlusions. 3.2. Application to Image Feature Correspondence As shown in the figure, we see that for both datasets with moderate degradation, our method is able to produce an The image feature correspondence task aims to find vi- almost perfect alignment. Moreover, the matching perfor- sual correspondences between two sets of sparse feature mance degrades gradually and gracefully as the degree of points x n and y l with corresponding feature de- { i}i=1 { j}j=1 degradation in the data increases. Consider the results on scriptors extracted from two input images. In this paper, the occlusion test in the fifth column, it is interesting that we assume that the underlying relationship between the in- even when the occlusion ratio is 50 percent our method can put images is nonrigid. Our method for this task is to esti- still achieve a satisfactory registration result. Therefore our mate correspondences by matching feature descriptors us- method can be used to provide a good initial alignment for ing a smooth spatial mapping f. More specifically, we first more complicated problem-specific registration algorithms. estimate the initial correspondences based on the feature de- To provide a quantitative comparison, we report the re- scriptors, and then use the correspondences to learn a spatial sults of four state-of-the-art algorithms such as shape con- mapping f fitting the inliers by algorithm1. text [3], TPS-RPM [6], RPM-LNS [27], and CPD [17] Once we have obtained the spatial mapping f, we then which are implemented using publicly available codes. The have to establish accurate correspondences. We prede- registration error on a pair of shapes is quantified as the fine a threshold τ and judge a correspondence (xi, yj) to average Euclidean distance between a point in the warped be an inlier provided it satisfies the following condition: model and the corresponding point in the target. Then the 2 2 e−kyj −f(xi)k /2σ > τ. We set τ = 0.5 in this paper. registration performance of each algorithm is compared by Note that the feature descriptors in the point set reg- the mean and standard deviation of the registration error of istration problem are calculated based on the point sets all the 100 samples in each distortion level. The statistical themselves, and are recalculated in each iteration. How- results, error means, and standard deviations for each set- ever, the descriptors of the feature points here are fixed ting are summarized in the last column of Fig.2. In the and calculated from images in advance. Hence the iterative deformation test results (e.g., 1st and 3rd rows), five algo- technique for recovering correspondences, estimating spa- rithms achieve similar registration performance in both fish tial mapping, and re-estimating correspondences can not be and Chinese character at low deformation levels, and our used here. In practice, we find that our method works well method generally gives better performance as the degree of

5 CVPR CVPR #*** #*** CVPR 2013 Submission #***. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.

756 810 757 CVPR 811 CVPR 758 #*** 812 #*** 759 CVPR 2013 Submission #***. CONFIDENTIALCVPR REVIEW COPY. DO NOT DISTRIBUTE. 813 CVPR #*** #*** 760 CVPR 2013 Submission #***. CONFIDENTIAL REVIEW COPY.814 DO NOT DISTRIBUTE. 761 815

762 864 756 0.08 816 918 810 757 SC 811 763 865 TPS−RPM 817 919 0.06 866 758 RPM−LNS 920 812 764 759 CPD 818 813 765 867 760 0.04 Ours 819 921 814 761 815 766 868 0.02 820 922

762 Error 816 869 763 923 817 767 0 821 764 818 768 870 822 924 765 −0.02 819 769 871 766 823 925 820 872 767 −0.04 926 821 770 0.02 0.035 0.05 0.065 8240.08 768 Degree of Deformation 822 771 873 769 825 927 823 770 824 772 874 SC 826 928 771 0.12 TPS−RPM 825 773 875 772 RPM−LNS 827 929 826 876 773 CPD 930 827 774 0.08 Ours 828 877 774 931 828 775 775 829 829

776 878 776 Error 0.04 830 932 830 777 879 777 831 933 831 778 832 0 778 880 779 832 934 833 779 881 780 833 935 834 781 −0.04 835 0.1 0.2 0.3 0.4 0.5 780 882 782 Occlusion Ratio 834 936 836 783 837 781 883 0.08 835 937 784 SC 838 782 884 785 TPS−RPM 836 938 839 0.06 786 RPM−LNS 840 783 885 CPD 837 939 787 0.04 Ours 841 784 886 788 838 940 842 785 887 789 0.02 839 941 843 Error 786 888 790 840 942 844 791 0 845 787 889 792 841 943 846 −0.02 788 890 793 842 944 847 794 848 −0.04 789 891 795 0.02 0.035 0.05 0.065 8430.08 945 849 790 892 796 Degree of Deformation 844 946 850 797 851 791 893 SC 845 947 798 0.16 852 799 TPS−RPM 853 792 894 RPM−LNS 846 948 800 CPD 854 793 0.12 847 895 801 Ours 949 855 794 896 802 848 950 856 803 0.08 857 795 Error 849 897 804 951 858 796 898 805 0.04 850 952 859 797 899 806 851 953 860 807 861 0 798 900 808 852 954 862

809 0.1 0.2 0.3 0.4 0.5 863 799 901 Occlusion Ratio 853 955 800 902 854 956 Figure8 5. . 801 Figure 4.903 Point matchingFigure 2. results Point on set dataset registration of Chinese results character of our method[4]. From on the topfish to(top)Figure bottom, and 6. experimentsChinese . character on deformation,(bottom) shapes rotation, [6 noise,, 27], with deformation855 and 957 802 occlusion904 and outliersocclusion presented presented in every in twoevery rows. two rows.For each The group goal ofis to experiment, align the model the upper point figure set (blue is the pluses) model ontoand target the target point point sets, setand (red circles).856 For each 958 803 the lower905 figure isgroup the registration of experiments, result. From the upper left to figure right, is increasing the model degree and target of degradation. point sets, and the lower figure is the registration result. From857 left to right, 959 804 906 increasing[17] L. Yang, degree P. of Meer, degradation. and D. The J. Foran. rightmost Unsupervised figures are comparisons Seg- of the2984, registration 2011.2 performance of our method with shape858 context 960 805 907 (SC) [3mentation], TPS-RPM Based [6], RPM-LNS on Robust [27 Estimation] and CPD and [17] Color on the Ac- corresponding[20] datasets.Y. Zheng The and error D. bars Doermann. indicate the Robust registration Point859error Matching means 961 806 908 and standardtive Contour deviations Models. over 100 trials.IEEE Trans. on Information for Nonrigid Shapes by Preserving Local860 Neighbor- 962 807 909 Technology in Biomedicine, 9(3):475–486, 2005.3 hood Structures. TPAMI, 28(4):643–649, 2006.861 5,6 963 808 862 910 deformation[18] A. Zaharescu, increases. E. Boyer, In the K. occlusion Varanasi, test and results R. Horaud. (e.g., Note that our method is not affected by rotation which is 964 809 863 911 2nd andSurface4th rows), Feature we observe Detection that and our Description method shows with much Ap- not surprising because we use the rotation invariant shape 965 912 more robustnessplications comparedto Mesh Matching. with the other In CVPR four, algorithms. pages 373– context as the feature descriptor. We also performed exper- 966 913 More380, experiments 2009.4 on rotation, noise and outliers8 are also iments on 3D data and got similar results. 967 914 performed on the two shape models, as shown in Fig.3. In conclusion, our method is efficient for most non-rigid 968 915 From[19] theJ. Zhao, results, J. Ma, weagain J. Tian, see J. that Ma, our and method D. Zhang. is able A Ro- to point set registration problems with moderate, and in some 969 916 bust Method for Vector Field Learning with Applica- 970 generate good alignment when the degradation is moderate, cases severe, distortions. It can also be used to provide 917 tion to Mismatch Removing. In CVPR, pages 2977– 971 and the registration performance degrades gradually and a good initial alignment for more complicated problem- is still acceptable as the amount of degradation increases. specific registration algorithms. 9 6 CVPR CVPR #688 #688 CVPR 2013 Submission #688. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.

864 918 865 919 866 920 867 921 868 922 869 923 870 924 871 925 872 926 873 927 874 928 875 929 876 930 877 931 878 932 879 933 880 934 881 935 882 936 883 937 884 938 885 939 886 940 887 941 888 942 889 943 890 944 891 945

CVPR 892 CVPR CVPR CVPR 946 #*** #*** #*** #*** CVPR893 2013 Submission #***. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.CVPR 2013 Submission #***. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE. 947 894 948 756 Figure 4.Figure Results 6.756 Results of image on 3D feature surfaces correspondence of deformable810 objects on 2D (INRIA imageDance-1 pairs ofsequence). deformable Top:810 resultsobjects. on frames From525 leftand to right,527; bottom: increasing frames degree of 757 895 757 811 811 949 758 CVPR 530 and 758550. For each group, the left pair denotes812 the identified suspectCVPR inliers, and the812 right pair denotes the removed suspect outliers. deformation. The759 inlier percentages in the initial correspondences are 79.61%, 56.57%813 , 51.84% and 45.71% respectively, and the corre- 759 #*** 813 #*** 760 896 760 814 814 950 CVPR 2013 Submission #***. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE. 761 sponding precision-recall761 pairs are (100.00%, 99.81573%), (99.06%, 99.53%), (99.09%815, 99.35%) and (100.00%, 98.96%) respectively. The 762 897 762 816 816 951 763 lines indicate matching763 results (blue = true positive,817 green = false negative, red = false817 positive). Best viewed in color. 648 702 764 898 764 818 818 952 765 649 765 819 703 819 766 650 899 766 820 704 820 953 767 651 900 767 821 705 821 954 768 652 768 822 706Table 1. Performance822 comparison on the image pairs in Fig.4. The 769 653 769 823 707 823 770 901 770 824 824 955 654 708values in the first row are the inlier percentages (%), and the pairs 771 771 825 825 655 709 772 902 772 826 826 956 773 656 773 827 710are the precision-recall827 pairs (%). 774 657 903 774 828 711 828 957 775 658 775 829 712 829 776 659 904 776 830 713Inlier830 79.61 56.57 51.84 45.71 958 777 660 905 777 831 714 831 959 778 661 778 832 715 832 779 779 833 ICF [12] (96.05,833 98.38) (83.95, 86.73) (80.43, 95.48) (75.42, 92.71) 662 716 780 906 780 834 834 960 663 717VFC [26] (100.00, 97.04) (98.59, 99.53) (98.09, 99.35) (98.94, 97.91) 781 781 835 835 782 664 907 782 836 718Ours (100.00,836 99.73) (99.06, 99.53) (98.09, 99.35) (100.00, 98.96)961 783 665 783 837 719 837 784 666 908 784 838 720 838 962 785 667 785 839 721 839 786 668 909 786 840 722 840 963 787 787 841 841 669 723 788 910 788 842 842 964 670 724 789 789 843 843 790 671 911 790 844 725 844 965 791 672 791 845 726 845 792 673 912 792 846 727(100.00%,84699.73%). On the rightmost pair, the deforma-966 793 674 793 847 728 847 794 675 913 794 848 729tion is relatively848 large and the inlier percentage in the initial967 795 676 795 849 730 849 796 Figure 4. Point matching resultsFigure914 on dataset 3. Point of Chinese matching character results.[4 From796]. From top topFigure to bottom,to bottom, 4. Point experimental experiments matching results on results deformation, on on dataset rotation, ofrotation,Chinese noise, noise, character occlusion[4850]. and From outliers top to presented bottom, experiments in every two on deformation, rotation, noise, 850 45.71% 968 677 occlusion and outliers presented in every two rows. For each group of experiment, the upper figure is the model and target731correspondences point sets, and is only about . In this case, our 797 occlusion and outliers presentedrows. in every For eachtwo rows. group For of each experiment, group of797 experiment, the upper figure the upper is the figure model is the and model target and point target sets, point and sets, the and lower figure851 is the registration result. From 851 the lower figure is the678 registration result. From left to right, increasing degree ofthe degradation. lower figure is the registration result. From left to right, increasing degree of degradation. 732 798 left915Figure to right, increasing 3. From degree oftop degradation.798 to bottom, results on rotation, noise852 and outliers 852 969 799 679 799 853 733method still853 obtains a good precision-recall pair (100.00%, 800 680 916presented in every800 two rows. For each group of experiments,854 the 734 854 970 801 681 801 855 73598.96%).855 Note that there are still a few false positives and 802 682 917upper figure is802 the data, and the lower figure is856 the registration 736 856 971 803 683 803 857 737false negatives857 in the results since we could not precisely 804 804 858 858 684 738 805 result. From left805 to right, increasing degree of degradation.859 859 685 739estimate the true warp functions between the image pairs 806 806 860 860 807 686 807 861 9 740 861 808 687 808 862 741in this framework.862 The average run time of our method on 809 688 809 863 742 863 689 743these image pairs is about 0.5 seconds on an Intel Pentium 690 4.2. Results on8 Image Feature Correspondence8 744 691 7452.0 GHz PC with Matlab code. 692 746 693 In this section, we perform experiments on real images, 747 In addition, we also compared our method to two state- 694 748 695 and test the performance of our method for sparse image 749of-the-art methods, such as identifying point correspon- 696 Figure 4. Point matching results on dataset of Chinese character [4]. From top to bottom, experiments on deformation, rotation, noise, 750 697 occlusionfeature and outliers correspondence. presented in every two rows. For each These group of experiment, images the upper contain figure is the model deformable and target point sets, and 751dences by correspondence function (ICF) [12] and vector 698 the lower figure is the registration result. From left to right, increasing degree of degradation. 752 699 objects and consequently the underlying relationships be- 753field consensus (VFC) [26]. The ICF uses support vector 700 754 701 tween the images are nonrigid. 755regression to learn a correspondence function pair which

Fig.4 contains a newspaper7 with different amounts of maps points in one image to their corresponding points in spatial warps. We aim to establish correspondences be- another, and then reject outliers by the estimated correspon- tween sparse image features in each image pair. In our eval- dence functions. While the VFC converts the outlier re- uation, we first extract SIFT [13] feature points in each in- jection problem into a robust vector field learning problem, put image, and estimate the initial correspondences based and learns a smooth field to fit the potential inliers as well on the corresponding SIFT descriptors. Our goal is then to as estimates a consensus inlier set. The results are shown in reject the outliers contained in the initial correspondences Table1. We see that all the three algorithms work well when and, at the same time, to keep as many inliers as possible. the deformation contained in the image pair is relatively Performance is characterized by precision and recall. slight. As the amount of deformation increases, the perfor- The results of our method are presented in Fig.4. For the mance of ICF degenerates rapidly. But VFC and our method leftmost pair, the deformation of the newspaper is relatively seem to be relatively unaffected even when the number of slight. There are 466 initial correspondences with 95 out- outliers exceeds the number of inliers. Still, our method liers, and the inlier percentage is about 79.61%. After using gains slightly better results compared to VFC. our method to establish accurate correspondences, 370 out Our next experiment involves feature point matching on of the 371 inliers are preserved, and simultaneously all the 3D surfaces. We adopt MeshDOG and MeshHOG [25] as 95 outliers are rejected. The precision-recall pair is about the feature point detector and descriptor to determine the

7 CVPR CVPR #*** #*** CVPR 2013 Submission #***. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.

108 162 109 163 110 164 China Scholarship Council (No. 201206160008). Z. Tu 111 165 112 166 113 167 is supported by NSF CAREER award IIS-0844566, NSF 114 168 115 169 award IIS-1216528, and NIH R01 MH094343. 116 170 117 171 118 172 119 173 120 174 References 121 175 122 176 123 177 [1] N. Aronszajn. Theory of Reproducing Kernels. Trans. of the Ameri- 124 178 125 179 can Mathematical Society, 68(3):337–404, 1950. 126 180 127 181 [2] A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones. Robust and 128 182 129 183 Efficient Estimation by Minimising A Density Power Divergence. 130 184 131 185 Biometrika, 85(3):549–559, 1998. 132 186 133 187 [3] S. Belongie, J. Malik, and J. Puzicha. Shape Matching and Object 134 188 135 189 Recognition Using Shape Contexts. TPAMI, 24(24):509–522, 2002. 136 190 137 191 [4] P. J. Besl and N. D. McKay. A Method for Registration of 3-D 138 192 139 193 Shapes. TPAMI, 14(2):239–256, 1992. 140 194 141 Figure 1. Results on 3D surfaces of deformable objects (INRIA Dance-1 sequence, frames 525 and 527). There are 191 initial correspon- 195 [5] L. G. Brown. A Survey of Image Registration Techniques. ACM 142 dences, and 156 of which are preserved by our method. The left pair denotes the identified suspect inliers, and the right pair denotes the 196 143 removed suspect outliers. 197 Computing Surveys, 24(4):325–376, 1992. Figure144 5. Results on 3D surfaces of deformable objects (INRIA198 145 2. Estimating Spatial Mapping from Corre- the largest portion of the sample set (which corresponds to 199 [6] H. Chui and A. Rangarajan. A new point matching algorithm for Dance-1146 sequence). Top: results on frames 525 and 527; bottom:200 spondences by L2E the underlying inlier set) that “matches” the normal den- 147 sity model, and consequently estimate the spatial mapping 201 non-rigid registration. CVIU, 89:114–141, 2003. 148 202 framesGiven530 a setand of putative550 point. correspondences For eachS = group,f based on the estimatedthe left inlier set. pair Next, we introducedenotes a ro- the 149 n 203 [7] A. W. Fitzgibbon. Robust Registration of 2D and 3D Point Sets. (x , y ) which may be perturbed by noise and outliers, bust estimator named L2-minimizing estimate (L2E), and 150 { i i }i=1 204 the goal is to learn a spatial mapping f : y = f(x ) to fit use it to estimate the spatial mapping f. identified151 suspect inliers,i andi the right pair denotes the removed205 the inliers. Image and Vision Computing, 21:1145–1153, 2003. 152 206 2.1. Problem Formulation Using L2E suspect153 Without outliers. loss of generality, we make the assumption that 207 [8] P. J. Huber. Robust Statistics. John Wiley & Sons, New York, 1981. 154 the noise on the inliers is Gaussian on each component with Parametric estimation algorithms typically rely on max- 208 155 zero mean and uniform standard deviation σ, i.e., for an in- imum likelihood estimation (MLE). It is known that the 209 [9] B. Jian and B. C. Vemuri. Robust Point Set Registration Using Gaus- 156 lier point correspondence (x , y ) it satisfies y f(x ) maximum likelihood is well suited for the estimation prob- 210 i i i − i ∼ 157 N(0, σ2I), where I is an identity matrix of size d d, and d lem in which the model is a good descriptor for the data and 211 × sian Mixture Models. TPAMI, 33(8):1633–1645, 2011. 158 is the dimension of the point. The data set y f(x ) n all the data are indeed coming from the model. However, 212 { i − i }i=1 initial159 can correspondences.then be seen as a sample set from a multivariate For nor- thethe MLE dataset,estimates can be we badly biased use if the the model INRIAis not 213 [10] A. E. Johnson and M. Hebert. Using Spin-Images for Efficient Object 160 mal density N(0, σ2I) with some unknown contamination good enough or there are a small fraction of outliers. To 214 Dance-1161 outliers. Thesequence main idea of our technique [25 is], then, in to find whichestimate the each spatial mapping surfacef, a robust estimator is from is desir- the215 Recognition in Cluttered 3-D Scenes. TPAMI, 21(5):433–449, 1999.

2 [11] J.-H. Lee and C.-H. Won. Topology Preserving Relaxation Labeling same moving person. Note that it is hard to give a quan- for Nonrigid Point Matching. TPAMI, 33(2):427–432, 2011. titative performance comparison since the correctness of a [12] X. Li and Z. Hu. Rejecting Mismatches by Correspondence Func- correspondence is hard to decide. So we just schematically tion. IJCV, 89(1):1–17, 2010. show our results in Fig.5. In the upper group, the two [13] D. Lowe. Distinctive Image Features from Scale-Invariant Key- points. IJCV, 60(2):91–110, 2004. frames are nearby, and the level of deformation is relatively [14] B. Luo and E. R. Hancock. Structural Graph Matching Using slight. There are 191 initial correspondences, of which 156 the EM Algorithm and Singular Value Decomposition. TPAMI, are preserved after using our method to establish accurate 23(10):1120–1136, 2001. correspondences. In the lower group, the two frames are [15] J. Ma, J. Zhao, Y.Zhou, and J. Tian. Mismatch Removal via Coherent Spatial Mapping. In ICIP, 2012. far apart, and the level of deformation is relatively large, [16] C. A. Micchelli and M. Pontil. On Learning Vector-Valued Func- leading to less good initial correspondences. There are 23 tions. Neural Computation, 17(1):177–204, 2005. initial correspondences, and 18 of which are preserved by [17] A. Myronenko and X. Song. Point Set Registration: Coherent Point our method. Drift. TPAMI, 32(12):2262–2275, 2010. [18] R. Rifkin, G. Yeo, and T. Poggio. Regularized Least-Squares Clas- sification. In Advances in Learning Theory: Methods, Model and 5. Conclusion Applications, 2003. [19] S. Rusinkiewicz and M. Levoy. Efficient Variants of the ICP Algo- In this paper, we have presented a new approach for rithm. In 3DIM, pages 145–152, 2001. nonrigid point set registration. A key characteristic of our [20] D. W. Scott. Parametric Statistical Modeling by Minimum Integrated approach is the estimation of transformation from corre- Square Error. Technometrics, 43(3):274–285, 2001. spondences based on a robust estimator named L E. The [21] L. Torresani, V. Kolmogorov, and C. Rother. Feature Correspondence 2 via Graph Matching: Models and Global Optimization. In ECCV, computational complexity of estimation of transformation pages 596–609, 2008. is linear in the scale of correspondences. We applied our [22] Y. Tsin and T. Kanade. A Correlation-Based Approach to Robust method to sparse image feature correspondence, where the Point Set Registration. In ECCV, pages 558–569, 2004. [23] G. Wahba. Spline Models for Observational Data. SIAM, Philadel- underlying relationship between images is nonrigid. Exper- phia, PA, 1990. iments on a public dataset for nonrigid point registration, [24] A. L. Yuille and N. M. Grzywacz. A Mathematical Analysis of the 2D and 3D real images for sparse image feature correspon- Motion Coherence Theory. IJCV, 3(2):155–175, 1989. dence demonstrate that our approach yields results superior [25] A. Zaharescu, E. Boyer, K. Varanasi, and R. Horaud. Surface Feature Detection and Description with Applications to Mesh Matching. In to those of state-of-the-art methods when there is significant CVPR, pages 373–380, 2009. noise and/or outliers in the data. [26] J. Zhao, J. Ma, J. Tian, J. Ma, and D. Zhang. A Robust Method for Vector Field Learning with Application to Mismatch Removing. In Acknowledgment CVPR, pages 2977–2984, 2011. [27] Y. Zheng and D. Doermann. Robust Point Matching for Non- The authors gratefully acknowledge the financial support rigid Shapes by Preserving Local Neighborhood Structures. TPAMI, from the National Science Foundation (No. 0917141) and 28(4):643–649, 2006.

8