arXiv:2104.08854v1 [cs.CV] 18 Apr 2021

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, NOVEMBER 2020

An Improved Discriminative Optimization for 3D Rigid Point Cloud Registration

Jia Wang, Ping Wang, Biao Li, Ruigang Fu, and Junzheng Wu

Abstract—The Discriminative Optimization (DO) algorithm has proved very successful in 3D point cloud registration. In the original DO, the feature (descriptor) of the two point clouds is defined as a histogram, and each element of the histogram indicates the weight of the scene points on the "front" or "back" side of a model point. In this paper, we extend the histogram, which indicates the sides of the scene points, from "front-back" to "front-back", "up-down", and "clockwise-anticlockwise". In addition, we reweight the extended histogram according to the model points' distribution. We evaluated the proposed Improved DO on the Stanford Bunny and Oxford SensatUrban datasets and compared it with six classical and State-Of-The-Art point cloud registration algorithms. The experimental results demonstrate that our algorithm achieves comparable performance in registration accuracy and root-mean-square error.

Index Terms—Discriminative Optimization, Linear Regression, Point Cloud Registration, Supervised Sequential Mapping.

Manuscript received XXX XX, 2021; revised XXX XX, 2021 and XXX XX, 2021; accepted XXX XX, 2021. (Corresponding author: Ping Wang.)

J. Wang, P. Wang, B. Li, R. Fu and J. Wu are with the Key Laboratory of ATR, College of Electronic Science and Technology, National University of Defense Technology, Changsha, 410073, China.

I. INTRODUCTION

THE mathematical optimization plays a key role in most computer vision tasks, such as point cloud registration. An important step of the optimization is to select a proper penalty function (loss function), which is used as the judgement of distortion. But sometimes we cannot ideally know the distribution of the dataset and its optimum, so it is difficult to select the optimal penalty function manually. Thus, learning-based optimization is preferable to manual selection in optimization algorithms.

The Supervised Sequential Updating (SSU) method has been studied in the past decades. In the typical SSU algorithms, the parameters to be estimated are updated iteratively through a sequence of maps trained in a supervised learning manner. The Cascaded Pose Regression (CPR) [1] is a representative SSU method based on the Least Squares method, in which the parameter is initialized and then progressively refined to the optimum by each fine-tuning step. The Discriminative Optimization algorithm proposed by Jayakorn [2] is also a typical SSU method, in which the update is regressed by a linear map; it achieved outstanding performance in 3D rigid point cloud registration. But there remain some shortcomings in the original DO.
Firstly, for each model point, the scene point cloud was only separated into the "front" or "back" side of the model point. Therefore, the original DO omitted the 3D rotational relationships between the pairwise points. Secondly, the feature (descriptor) is a Gaussian-weighted histogram, which would be a sparse vector when the model point cloud is far away from the scene point cloud, so the original DO fails in this case. To improve the performance of the original DO, we propose an improved DO algorithm, which introduces two extra relative sides, "up-down" and "clockwise-anticlockwise", between the two point clouds, and the histograms are re-weighted according to the model points' distribution, which is a known prior. After these two steps, we improved the original DO's performance in 3D point cloud registration.

II. PREVIOUS WORK

During the last decades, a mass of point registration algorithms have been proposed. These algorithms can be simply classified into two categories: 1) traditional approaches, which pre-match handcrafted features to obtain a correspondence and then search the optimal registration parameters through a gradient-based (first-order, second-order or higher-order) or derivation-free (heuristic) algorithm; 2) in contrast, learning-based registration methods, which predict the parameters and the correspondence simultaneously through trained maps [1], [3] or neural networks [4].

A. Traditional Methods

The representative traditional approach is the Iterative Closest Point (ICP) algorithm [5], which solves the point correspondence and the transformation parameters alternatively until convergence. The drawback of ICP is that it is sensitive to the initialized parameters, so it is often used in the fine-tuning step. To avoid the influence of the initialization, the global interaction between the two point clouds must be used. [6], [7] are variants of ICP. The Coherent Point Drift (CPD) [8] treats the scene point cloud as samples generated from a mixed-Gaussian distribution, where the mixed-Gaussian is established from the model point cloud. The CPD method registers the point clouds by maximizing the posterior probability of the scene points. More recently, a Bayesian Coherent Point Drift (BCPD) algorithm was proposed; it describes the coherent drift in Bayesian probability theory and registers the point clouds through variational Bayesian inference. The experimental results demonstrated that the BCPD outperforms the original CPD in all cases. The Kernel Correlation (KC) method [9] aligns the densities of two point clouds through a "kernel trick"; it registers the point pairs by maximizing their correlation. Robust Point Matching (RPM) [10] adopted a soft assignment rather than a direct assignment, so that it was more robust to degradations than the strict-correlation KC method. Iteratively Reweighted Least Squares (IRLS) [11] applied various loss functions to overcome degradations and large rotation angles. Gaussian Mixture Model Registration [12] reformulated the alignment as a statistical discrepancy minimization, and the L2 distance was adopted since it is differentiable and efficient. [13] constructed the point cloud by a learned one-class support vector machine, then regressed the parameters by minimizing the L2 error between the support vectors. [14] treated the point cloud as particles with universal gravitation, then registered them through a simulation of movement in a universal gravitation field. Although the classical methods perform well, there remain some drawbacks, for example: 1) the traditional methods usually lack generalization capability; 2) for a complicated transformation, the loss function may be non-convex and indifferentiable.

B. Learning-based Methods

The learning-based methods can predict the optimal parameters in a sequential or direct manner. They can be classified into two categories: 1) Supervised Sequential Update (SSU) based methods and 2) Deep Neural Network (DNN) based methods.

1) Supervised Sequential Update: In the SSU architecture, the registration parameters are updated through a sequence of regressions. The regressors are learned from the training samples, which are the point cloud pairs and the optimal registration parameters. The trained regressor maps the point cloud to an updating vector that points in the direction of the global optimum. The first SSU method is a single regressor proposed by Cootes et al. [15] for facial image alignment. They introduced an active appearance model which is learned from the shape variation and texture variation of the image, and the image's residual was minimized through an iterative regression. Jurie and Dhome [16] utilized boosted logistic regression to estimate the current displacement of image features to register images. Cristinacce and Cootes [17] introduced a GentleBoost technology to iteratively update the registration parameters until the termination condition is met. Bayro-Corrochano and Ortegon-Aguilar [18], [19] applied Lie algebra theory to describe the image projection and affine transformation, so that the projection and affine expressions can be described in a linear vector space. Thus the derivation of the loss function with respect to the rotation matrix becomes simple. Simultaneously, Tuzel et al. [20] combined Lie algebra with the HOG feature to learn regressors. The methods mentioned above only have one regressor, which restricts the algorithm's performance. So, approaches with sequential maps were proposed. Saragih et al. [21] present the Iterative Error Bound Minimisation (IEBM) method; they utilized v-support vector regression [22] to learn a sequence of regressors, and it performed well on the non-rigid registration task. Dollar et al. [1] present Cascaded Pose Regression to solve the object pose estimation, and the learned maps can be applied not only in summation updating rules, but also in other invertible composition rules. Cao et al. [23] also learned a sequence of regressors to minimize the residual of the parameters with the boosting technology. Sun et al. [24] proposed neural networks to learn the feature extraction and parameter regression, which is also an SSU method. Xiong et al. [3], [25] present the Supervised Descent Method, which learns the descent directions that minimize a non-linear least-squares function in parameter space without calculating the Jacobian matrix. Rather than proposing a fixed sequential procedure for the optimization, Zimmermann et al. selected the regressors from pre-trained regressors and concatenated them into a new sequence [26]. The DO algorithm [2] is also inspired by the Supervised Descent Method; specially, it reuses the last learned regressor to infer the parameters until the termination condition occurs.

2) DNN-based Methods: The renaissance of Neural Networks attracted much attention from computer vision researchers, and a great deal of DNN-based point cloud registration approaches have been proposed. Because point clouds are unordered while deep neural networks require regular input data, deep networks were firstly applied to point classification and segmentation. The representative architectures are PointNet [27] and PointNet++ [28], which simply consist of Multi-Layer Perceptrons, a Feature Transform Block, and Max Pooling Layers. After that, PointNet was adopted for point cloud registration. Yasuhiro Aoki [29] combined the Lucas & Kanade (LK) algorithm with PointNet for point cloud registration. In the original PointNet, the "T-net" block is computed at each iteration. To improve the computational efficiency, the authors replaced it with the Lucas & Kanade algorithm, and the differential execution was reduced to only once. X. Huang et al. [30] proposed a Feature Metric Registration network; they trained an encoder to extract features and a decoder to compute the metric, then minimized the feature metric projection error to register the point clouds. Compared with the traditional geometrical error, the feature metric projection error is robust to point degradation. Sheng Ao et al. proposed the conceptually simple but sufficiently informative SpinNet [31]. By projecting the point cloud into a cylindrical space, they designed cylindrical convolutional neural layers to extract a rotation-invariant local feature. The experimental results show that SpinNet achieved the best generalization ability across different benchmarks. Choy et al. present deep global registration [32], which consists of three differentiable modules: 1) a UNet-based network to estimate an inlier likelihood as the weight for each correspondence; 2) a weighted Procrustes [33] algorithm to predict the registration parameters; 3) a robust gradient-based SE(3) optimizer to refine the parameters. Gil Elbaz et al. [34] present a super-points descriptor extracted from local clusters of points, then predict the coarse-to-fine correspondence between the descriptors. The experimental results reveal that the super-points descriptor aligns large-scale and close-proximity point clouds successfully. Gojcic et al. [35] also present a coarse-to-fine method for multiview 3D point cloud registration. They

proposed a novel Overlap Pooling Layer as the similarity metric estimation block to predict the transformation parameters, and utilized the Iteratively Reweighted Least Squares (IRLS) to refine the point correspondence and the transformation parameters alternatively. Lu et al. [36] present DeepVCP, which consists of a deep feature extraction layer, a weighting layer, a deep feature embedding layer, and a corresponding point generation layer. It is a typical End-to-End point cloud registration network.

III. PROPOSED APPROACH

A. Original Discriminative Optimization

The mission of a point cloud registration optimization algorithm is to find the optimal parameters as rapidly and precisely as possible. The point registration can be described as:

x_* = argmin_x { ϕ(T(M, x), S) + λ‖x‖ }   (1)

where S = (s_1, ..., s_S)^T ∈ R^{S×3} is the scene point cloud, and M = (m_1, ..., m_M)^T ∈ R^{M×3} is the model point cloud. ϕ: R^{M×3} × R^{S×3} → R is a similarity metric function, and λ is a parameter that controls the tradeoff. A first-order optimizing step is:

x_{k+1} = x_k − μ ∂ϕ(x)/∂x |_{x_k}   (2)

here, the parameter is updated in the opposite direction of the gradient. In practice, the derivative of ϕ is usually computationally expensive or undifferentiable, and sometimes ϕ is non-convex, so numerical optimization methods may be inefficient; especially in a high-dimensional search space, a handcrafted optimization algorithm may be trapped in a local minimum.

The Discriminative Optimization (DO) algorithm predicts the registration parameters in a sequential updating manner, which has proved to be an excellent method. Here, we recall the original discriminative optimization briefly. In DO, the current estimate of the parameters is refined by the trained map D_{k+1}, as expressed in Eq.(3):

x_{k+1} = x_k − D_{k+1} h(x_k)   (3)

where x_k ∈ R^p is the k-th estimate of the parameters, and h(x_k) extracts a feature from the input data with the k-th estimated parameters x_k. Exactly, the full expression of h is:

h(x_k, M, S^{(i)}): R^{3×N_M} × R^{3×N_{S^{(i)}}} × R^p → R^f   (4)

but we abbreviate it as h(x_k), where M ∈ R^{3×N_M} is the fixed model point cloud and S^{(i)} ∈ R^{3×N_{S^{(i)}}} is the i-th scene point cloud. N_M and N_{S^{(i)}} are the numbers of model points and scene points, respectively. D_{k+1} ∈ R^{d×f} is one of the learned maps (D_{k+1}: R^f → R^d), which regress the feature h(x_k) to an updating vector Δx. Therefore, DO is a linear regression method, and the updating continues until the termination condition is met.

Given a training dataset {(x_0^{(i)}, x_*^{(i)}, h^{(i)})}_{i=1}^N, where i denotes the i-th instance, which includes the point cloud feature h^{(i)}, the initialization x_0^{(i)}, and the ground truth x_*^{(i)}, the map D_{k+1} is trained on all of the training samples through the following formulation:

D_{k+1} = argmin_{D~} (1/N) Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)} + D~ h^{(i)}(x_k^{(i)})‖₂² + λ‖D~‖²_F   (5)

D_{k+1} can be solved as follows:

D_{k+1} = (1/N) Σ_{i=1}^N [ (x_k^{(i)} − x_*^{(i)}) · (h^{(i)}(x_k^{(i)}))^T ] / [ λ + (h^{(i)}(x_k^{(i)}))^T · h^{(i)}(x_k^{(i)}) ]   (6)

For all i, x_k^{(i)} is updated to x_{k+1}^{(i)} through Eq.(7):

x_{k+1}^{(i)} = x_k^{(i)} − D_{k+1} h^{(i)}(x_k^{(i)})   (7)

To register M and S, the discriminative optimization algorithm can be summarized in the following three steps.

• Firstly, for every model point m_a, fit a local plane at m_a upon its six nearest neighbor points.
• Secondly, for every model point m_a, separate the scene points S into the "front" or "back" side of the model point.
• Thirdly, for every model point m_a, summarize (with Gaussian weighting) the two separated parts of scene points, which are on the "front" or "back" side of m_a, into the histogram entries h_a and h_{a+N_M} respectively; here h ∈ R^{2N_M}, a = 1, ..., N_M.

For model point m_a, h_a and h_{a+N_M} are the corresponding Gaussian-weighted voting results that indicate the scene points on the front and back side of m_a. The histogram h, which is a descriptor, is formulated as follows, where n_a is the normal of the local plane fitted at m_a:

S_a^+ = { s_b | n_a^T (T(s_b, x) − m_a) > 0 }   (8)

S_a^− = { s_b | n_a^T (T(s_b, x) − m_a) ≤ 0 }   (9)

[h(x; S)]_a = (1/z) Σ_{s_b ∈ S_a^+} exp(−(1/σ²) ‖T(s_b; x) − m_a‖₂²)   (10)

[h(x; S)]_{a+N_M} = (1/z) Σ_{s_b ∈ S_a^−} exp(−(1/σ²) ‖T(s_b; x) − m_a‖₂²)   (11)

B. Improved Discriminative Optimization

The original Discriminative Optimization method does not embed enough information about the model points in the descriptor, which restricts its performance on tough degradation datasets. To utilize the relationships between the model points and the scene points (shown in Fig.1), we extend the descriptor h from the "front (S_a^+)- back (S_a^−)" sides, which we term "single-binary", to the "front (S_a^+)- back (S_a^−)", "up (S_a^↑)- down (S_a^↓)" and "clockwise (S_a^→)- anticlockwise (S_a^←)" sides, which we term "triple-binary". In Fig.1, a model point M_a is depicted as a yellow dot, and we paint those scene points pale green which are on the "front" or "up" or "clockwise" side of M_a, while those on the opposite sides are painted dark green. The "up-down" and "clockwise-anticlockwise" sides are described in Eq.(12) to Eq.(15).

S_a^↑ = { s_b | θ_{T(s_b, x)} > θ_{m_a} }   (12)
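The training rule (5)-(6) and the update rule (7) can be sketched numerically. The following toy example is a minimal illustration and not the paper's registration setup: it assumes a scalar parameter, a hand-made feature h(x) = [x, 1], and a ground truth x_* = 0, and uses the averaged per-sample closed form in the spirit of Eq.(6).

```python
# Minimal sketch of the Supervised Sequential Update used by DO (Eqs. 5-7).
# Assumptions (not from the paper): scalar parameter, toy feature, x* = 0.

def h(x):
    # stand-in feature extractor (the paper's h is the point-cloud histogram)
    return [x, 1.0]

def train_map(xs, x_star, lam):
    # averaged per-sample regularized closed form in the spirit of Eq.(6)
    f = len(h(xs[0]))
    D = [0.0] * f
    for xk in xs:
        hk = h(xk)
        denom = lam + sum(v * v for v in hk)
        for c in range(f):
            D[c] += (xk - x_star) * hk[c] / (denom * len(xs))
    return D

def train_sequence(xs, x_star, K=10, lam=0.01):
    maps = []
    xs = list(xs)
    for _ in range(K):
        D = train_map(xs, x_star, lam)
        maps.append(D)
        # Eq.(7): x_{k+1} = x_k - D_{k+1} h(x_k)
        xs = [xk - sum(d * v for d, v in zip(D, h(xk))) for xk in xs]
    return maps, xs

start = [-1.0, 0.5, 2.0]
maps, final = train_sequence(start, x_star=0.0)
mse0 = sum(x * x for x in start) / len(start)
mseK = sum(x * x for x in final) / len(final)
print(mseK < mse0)  # training error decreases across the sequence
```

Each map pulls the current estimates toward the ground truth, so the mean squared training error shrinks from iteration to iteration, which is the behavior Theorem 1 (Sec. III-D) formalizes.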

S_a^↓ = { s_b | θ_{T(s_b, x)} ≤ θ_{m_a} }   (13)

S_a^→ = { s_b | ω_{T(s_b, x)} > ω_{m_a} }   (14)

S_a^← = { s_b | ω_{T(s_b, x)} ≤ ω_{m_a} }   (15)

where θ and ω are the elevation angle and the azimuth angle of a 3D point, respectively. Here, we rewrite the whole histogram as follows:

[h(x; S)]_a = (α_a / z) Σ_{s_b ∈ S_a^+} exp(−(1/σ²) ‖T(s_b; x) − m_a‖₂²)   (16)

[h(x; S)]_{a+N_M} = ((1 − α_a) / z) Σ_{s_b ∈ S_a^−} exp(−(1/σ²) ‖T(s_b; x) − m_a‖₂²)   (17)

[h(x; S)]_{a+2N_M} = (β_a / z′) Σ_{s_b ∈ S_a^↑} 1(θ_{T(s_b; x)} − θ_{m_a} > 0)   (18)

[h(x; S)]_{a+3N_M} = ((1 − β_a) / z′) Σ_{s_b ∈ S_a^↓} 1(θ_{T(s_b; x)} − θ_{m_a} < 0)   (19)

[h(x; S)]_{a+4N_M} = (γ_a / z″) Σ_{s_b ∈ S_a^→} 1(ω_{T(s_b; x)} − ω_{m_a} > 0)   (20)

[h(x; S)]_{a+5N_M} = ((1 − γ_a) / z″) Σ_{s_b ∈ S_a^←} 1(ω_{T(s_b; x)} − ω_{m_a} < 0)   (21)

Here, z, z′, z″ are the normalization terms, and α_a, β_a, γ_a are weights determined by the model point cloud, as in the following equations:

α_a = N_a^+ / N_M   (22)

β_a = N_a^↑ / N_M   (23)

γ_a = N_a^→ / N_M   (24)

Fig. 1. The "triple-binary" diagram: (a) front-back side; (b) up-down side; (c) clockwise-anticlockwise side.

where N_a^+ represents the number of model points whose positions are on the "front" side of the model point m_a. Similarly, N_a^↑ represents the number of model points whose elevation angles are larger than θ_{m_a}, and N_a^→ is the number of model points whose azimuth angles are larger than ω_{m_a}.

Since we have encoded the elevation angle and the azimuth angle into the re-designed histogram, the information in the descriptor has been enriched, which improves the training efficiency and the testing accuracy. Exactly, h is a descriptor which encodes the distribution of the scene point cloud in the field of the model points. It is noteworthy that we only add two "binary" sides to divide the scene point cloud even though the points lie in a 3D space. Considering an arbitrary rotation in the 3D space, the rotated points can be restored by two steps as follows: 1) rotate by a proper elevation angle, which aligns the deviated axis Z′ with the original axis Z; 2) rotate by a proper azimuth angle, which aligns the deviated X′-Y′ axes with the original X-Y axes. As a consequence, we separate the additional two "binary" sides according to the elevation angle and the azimuth angle.

C. Lie Groups and Lie Algebras Application

The rigid transformation contains a rotation matrix R and a translation vector t. The Discriminative Optimization method is a linear regressor, but the rigid transformation T = [R, t] described in Euclidean space is a nonlinear transformation model, so optimizing the rigid transformation matrix directly would be complicated. Thanks to the Lie group and Lie algebra theory [37], the rotation and translation operations can be merged into a linear vector space. Here we briefly introduce the Lie theory and utilize it to parametrize the rigid transformation. In Lie group theory, the rotation matrices R ∈ R^{3×3} form the special orthogonal group SO(3), while the rigid transformation matrices T form the special Euclidean group SE(3):

SO(3) = { R ∈ R^{3×3} | R R^T = I, det(R) = 1 }   (25)

SE(3) = { T = [ R  t ; 0^T  1 ] ∈ R^{4×4} | R ∈ SO(3), t ∈ R^3 }   (26)
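To make the side definitions concrete, here is a minimal pure-Python sketch of the "triple-binary" partition (Eqs. 8-9 and 12-15) for a single model point. The tiny hand-made point sets, the unit plane normal, and the angle conventions (math.asin for elevation, math.atan2 for azimuth) are illustrative assumptions, not the paper's implementation.

```python
import math

# Sketch: partition scene points by plane normal (front/back), elevation
# (up/down), and azimuth (clockwise/anticlockwise) relative to one model
# point. Points and the normal below are illustrative assumptions.

def angles(p):
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.asin(z / r)        # elevation angle
    omega = math.atan2(y, x)        # azimuth angle
    return theta, omega

def split_sides(m, n, scene):
    th_m, om_m = angles(m)
    sides = {"front": [], "back": [], "up": [], "down": [],
             "cw": [], "acw": []}
    for s in scene:
        d = [s[i] - m[i] for i in range(3)]
        proj = sum(n[i] * d[i] for i in range(3))   # Eqs. (8)-(9)
        sides["front" if proj > 0 else "back"].append(s)
        th_s, om_s = angles(s)
        sides["up" if th_s > th_m else "down"].append(s)    # Eqs. (12)-(13)
        sides["cw" if om_s > om_m else "acw"].append(s)     # Eqs. (14)-(15)
    return sides

m = (1.0, 0.0, 0.0)                  # model point
n = (1.0, 0.0, 0.0)                  # assumed local plane normal at m
scene = [(2.0, 0.0, 1.0), (0.5, 1.0, -1.0), (1.5, -1.0, 0.5)]
sides = split_sides(m, n, scene)
print({k: len(v) for k, v in sides.items()})
# -> {'front': 2, 'back': 1, 'up': 2, 'down': 1, 'cw': 1, 'acw': 2}
```

In the full descriptor, the front/back bins would further be Gaussian-weighted by distance (Eqs. 16-17) and all bins scaled by α_a, β_a, γ_a.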

The Lie algebra so(3), which corresponds to the Lie group SO(3), is defined as:

so(3) = { φ = θn ∈ R³, φ^∧ ∈ R^{3×3} }   (27)

where n is the unit vector of the rotation direction and θ is the rotation angle, and

φ^∧ = Φ = [ 0  −φ₃  φ₂ ; φ₃  0  −φ₁ ; −φ₂  φ₁  0 ],   Φ^∨ = φ   (28)

Then the relationship between R ∈ SO(3) and φ ∈ so(3) is:

R = exp(φ^∧) = exp(θ n^∧)   (29)

φ = ln(R)^∨   (30)

A Lie algebra vector is converted to the Lie group rotation matrix through the famous Rodrigues formula, Eq.(31):

R = exp(θ n^∧) = cos θ I + (1 − cos θ) n n^T + sin θ n^∧   (31)

and the inverse is:

θ = arccos((tr(R) − 1) / 2),   R n = n   (32)

The Lie algebra se(3) is defined as Eq.(33):

se(3) = { ξ = [ ρ ; φ ] ∈ R⁶, ρ ∈ R³, φ ∈ so(3) }   (33)

and the conversions between T ∈ SE(3) and ξ ∈ se(3) are:

T = exp(ξ^∆)   (34)

ξ = ln(T)^∇   (35)

where

ξ^∆ = Ξ = [ φ^∧  ρ ; 0^T  0 ],   Ξ^∇ = ξ   (36)

The vector in se(3) is converted to the Lie group matrix through Eq.(37):

T = exp(ξ^∆) = [ exp(φ^∧)  Jρ ; 0^T  1 ]   (37)

where J is solved as follows:

J = (sin θ / θ) I + (1 − sin θ / θ) n n^T + ((1 − cos θ) / θ) n^∧   (38)

and the conversion formula from SE(3) to se(3) is:

θ = arccos((tr(R) − 1) / 2),   R n = n,   t = Jρ   (39)

Therefore, the rigid transformation can be parametrized from T ∈ R^{4×4} to ξ ∈ R⁶, and the parameter regression and updating becomes ξ′ = ξ + ∆ξ, rather than a complicated formula T′ = f(T, ∆T).
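The so(3) ↔ SO(3) conversions in Eqs. (29)-(32) can be sketched directly. The following is a minimal pure-Python illustration with hand-rolled 3×3 matrices, not the paper's code; the axis and angle below are arbitrary example values.

```python
import math

# Sketch of the so(3) <-> SO(3) conversion used to linearize rotation:
# the Rodrigues formula (Eq. 31) maps an axis-angle vector theta*n to a
# rotation matrix, and the trace formula (Eq. 32) recovers theta.

def hat(n):
    # Eq.(28): the skew-symmetric matrix n^ of a 3-vector
    return [[0.0, -n[2], n[1]],
            [n[2], 0.0, -n[0]],
            [-n[1], n[0], 0.0]]

def rodrigues(n, theta):
    # Eq.(31): R = cos(t) I + (1 - cos(t)) n n^T + sin(t) n^
    c, s = math.cos(theta), math.sin(theta)
    nx = hat(n)
    R = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            R[i][j] = (c * (1.0 if i == j else 0.0)
                       + (1.0 - c) * n[i] * n[j]
                       + s * nx[i][j])
    return R

def angle_of(R):
    # Eq.(32): theta = arccos((tr(R) - 1) / 2)
    tr = R[0][0] + R[1][1] + R[2][2]
    return math.acos(max(-1.0, min(1.0, (tr - 1.0) / 2.0)))

n = (0.0, 0.0, 1.0)          # example rotation axis (unit vector)
theta = math.pi / 3          # example rotation angle
R = rodrigues(n, theta)
print(abs(angle_of(R) - theta) < 1e-9)  # round trip recovers the angle
```

Because the regression now lives in the vector space of (θn, ρ) rather than on the matrix manifold, the linear maps D_k can update the pose by simple addition, which is the point of Sec. III-C.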

D. Proof of Convergence

A major question is: does the sequential updating method converge to the optimum? Here, we provide a detailed proof.

Theorem 1 (Strict decrease in training error under a sequence of update maps): Given a training dataset

{ (x_0^{(i)}, x_*^{(i)}, h^{(i)}) }_{i=1}^N   (40)

if there exists a linear map D̂ ∈ R^{p×f} such that, for each i, D̂h^{(i)} is strictly monotone at x_*^{(i)}, and if ∃i: x_k^{(i)} ≠ x_*^{(i)}, then the update rule

x_{k+1}^{(i)} = x_k^{(i)} − D_{k+1} h^{(i)}(x_k^{(i)})   (41)

with D_{k+1} ∈ R^{p×f} obtained from

D_{k+1} = argmin_{D~} (1/N) Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)} + D~ h^{(i)}(x_k^{(i)})‖₂²   (42)

guarantees that the training error strictly decreases in each iteration:

Σ_{i=1}^N ‖x_*^{(i)} − x_{k+1}^{(i)}‖₂² < Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)}‖₂²   (43)

Proof: We assume that not all x_*^{(i)} = x_k^{(i)}, otherwise all x^{(i)} are already at their optimal points. Thus, there exists an i such that:

(x_k^{(i)} − x_*^{(i)})^T D̂h^{(i)}(x_k^{(i)}) > 0   (44)

We need to prove that:

Σ_{i=1}^N ‖x_*^{(i)} − x_{k+1}^{(i)}‖₂² < Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)}‖₂²   (45)

This can be shown by letting:

D̄ = αD̂   (46)

where:

α = β / γ   (47)

β = Σ_{i=1}^N (x_k^{(i)} − x_*^{(i)})^T D̂h^{(i)}(x_k^{(i)})   (48)

γ = Σ_{i=1}^N ‖D̂h^{(i)}(x_k^{(i)})‖₂²   (49)

Since there exists an i such that (x_k^{(i)} − x_*^{(i)})^T D̂h^{(i)}(x_k^{(i)}) > 0, both β and γ are positive, so α is also positive. The training error decreases in each iteration as follows:

Σ_{i=1}^N ‖x_*^{(i)} − x_{k+1}^{(i)}‖₂² = Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)} + D_{k+1} h^{(i)}(x_k^{(i)})‖₂²
  ≤ Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)} + D̄ h^{(i)}(x_k^{(i)})‖₂²   (50)

The inequality holds because D_{k+1} is the optimal matrix that minimizes the squared error:

D_{k+1} = argmin_{D~} (1/N) Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)} + D~ h^{(i)}(x_k^{(i)})‖₂²   (51)

so that:

Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)} + D_{k+1} h^{(i)}(x_k^{(i)})‖₂² ≤ Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)} + D̄ h^{(i)}(x_k^{(i)})‖₂²   (52)

By recalling the formula D̄ = αD̂, we only need to prove that α = β/γ > 1, where

α = β / γ = [ Σ_{i=1}^N (x_k^{(i)} − x_*^{(i)})^T D̂h^{(i)}(x_k^{(i)}) ] / [ Σ_{i=1}^N ‖D̂h^{(i)}(x_k^{(i)})‖₂² ]   (53)

For the sake of proving α > 1, we calculate the squared L2 norm of the numerator and denominator respectively:

Σ_{i=1}^N ‖(x_k^{(i)} − x_*^{(i)}) + D̂h^{(i)}(x_k^{(i)})‖₂²
  = Σ_{i=1}^N ‖x_k^{(i)} − x_*^{(i)}‖₂² + Σ_{i=1}^N ‖D̂h^{(i)}(x_k^{(i)})‖₂² + 2 Σ_{i=1}^N (x_k^{(i)} − x_*^{(i)})^T D̂h^{(i)}(x_k^{(i)})
  ≥ Σ_{i=1}^N ‖D̂h^{(i)}(x_k^{(i)})‖₂²   (54)

where the dropped terms are positive, the last one by (44). From Eq.(54), we proved that α > 1, and Theorem 1 can be proved completely as follows:

Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)} + D̄ h^{(i)}(x_k^{(i)})‖₂²
  = Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)}‖₂² + Σ_{i=1}^N ‖D̄ h^{(i)}(x_k^{(i)})‖₂² + 2 Σ_{i=1}^N (x_*^{(i)} − x_k^{(i)})^T D̄ h^{(i)}(x_k^{(i)})
  = Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)}‖₂² + α² Σ_{i=1}^N ‖D̂h^{(i)}(x_k^{(i)})‖₂² + 2α Σ_{i=1}^N (x_*^{(i)} − x_k^{(i)})^T D̂h^{(i)}(x_k^{(i)})
  = Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)}‖₂² + α²γ − 2αβ
  = Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)}‖₂² + β²/γ − 2β²/γ
  = Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)}‖₂² − β²/γ
  < Σ_{i=1}^N ‖x_*^{(i)} − x_k^{(i)}‖₂²   (55)

where we used α² Σ ‖D̂h‖₂² = α²γ, Σ (x_* − x_k)^T D̂h = −β, and β²/γ > 0. Combining (50) and (55) yields (43). ∎

It is noteworthy that the learned function must be strictly monotone at x_*; fortunately, this is possible in point registration. Theorem 1 does not guarantee that the error of each sample is reduced in each iteration, but it guarantees the reduction of the average error. The algorithms are summarized in Alg.1 and Alg.2. Here, ε is the termination threshold: the updating stops when the norm of the parameter update is less than it, and maxIter terminates the otherwise infinite inference.

Algorithm 1 Training phase
Require: {(x_0^{(i)}, x_*^{(i)}, h^{(i)})}_{i=1}^N, K, λ
Ensure: {D_k}_{k=1}^K
1: for k = 0 to K − 1 do
2:   Compute D_{k+1} with (5).
3:   for i = 1 to N do
4:     Update x_{k+1}^{(i)} := x_k^{(i)} − D_{k+1} h^{(i)}(x_k^{(i)}).
5:   end for
6: end for

Algorithm 2 Testing phase
Require: x_0, h, {D_k}_{k=1}^K, maxIter, ε
Ensure: x
1: Set x := x_0
2: for k = 1 to K do
3:   Update x := x − D_k h(x)
4: end for
5: Set iter := K + 1.
6: while ‖D_K h(x)‖ ≥ ε and iter ≤ maxIter do
7:   Update x := x − D_K h(x)
8:   Update iter := iter + 1
9: end while
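As a rough illustration of the testing phase in Alg.2, the sketch below applies the K maps in order and then re-uses the last map until the update norm is small. The scalar parameter, the toy feature h(x) = [x, 1], and the hand-set contraction maps are illustrative assumptions, not learned regressors.

```python
# Sketch of Alg. 2 (testing phase): run the K trained maps, then keep
# re-applying the last map D_K until the step norm drops below eps or
# maxIter is reached. Maps below are hand-set toys, not learned.

def h(x):
    # stand-in feature (the paper's h is the point-cloud histogram)
    return [x, 1.0]

def infer(x0, maps, eps=1e-6, max_iter=100):
    x = x0
    for D in maps:                       # lines 2-4: apply D_1 ... D_K
        x -= sum(d * v for d, v in zip(D, h(x)))
    it = len(maps) + 1                   # line 5
    while it <= max_iter:                # lines 6-9: reuse D_K
        step = sum(d * v for d, v in zip(maps[-1], h(x)))
        if abs(step) < eps:
            break
        x -= step
        it += 1
    return x

# toy maps: each update removes half of the residual toward x* = 0
maps = [[0.5, 0.0]] * 3
x = infer(2.0, maps)
print(abs(x) < 1e-5)  # prints True
```

Reusing the final map is the trait DO inherits from the Supervised Descent Method: once training stops, the last regressor keeps contracting the residual at test time.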

IV. EXPERIMENTS AND DISCUSSION

In this section, we evaluate our improved DO by comparing it with six other State-Of-The-Art (SOTA) algorithms: Coherent Point Drift (CPD) [8], Gaussian Mixture Model Registration (GMMReg) [12], Iterative Closest Point (ICP) [5], Kernel Correlation with Correspondence Estimation (KCCE) [38], Bayesian Coherent Point Drift (BCPD) [39], which is the latest method, and the original Discriminative Optimization (DO) [2]. To provide a quantitative comparison, we list the six algorithms' experimental results on two evaluation criteria. All of the compared algorithms are run with the source codes available from their websites. We tested all algorithms on publicly available datasets. The experiments are executed on a desktop equipped with an Intel Xeon 2.5GHz CPU, an Nvidia P5000 GPU, and 16G RAM.

A. Experimental Dataset Preparation

We tested our improved DO on two datasets (shown in Fig.2): i) a 3D synthetic dataset [40]; ii) a 3D real urban dataset [41].

Fig. 2. Experimental Point Cloud Datasets: (a) Synthetic Dataset; (b) Real Dataset (classes: ground, vegetation, building, wall, bridge, parking, rail, car, footpath, bike, water, traffic road, street furniture).

1) Synthetic dataset: Similar to the original DO paper, the synthetic experiments use the Stanford Bunny dataset, which consists of 35947 points. We utilized the off-the-shelf pcdownsample function in MATLAB to evenly sample 514 points as the model point cloud M. For the sake of highlighting the competitiveness of our improved DO, we increased the difficulty level of all testing samples. We designed six types of perturbations on the dataset: noise, point number, outliers, incompleteness, rotation, and translation. Especially, the translation perturbation is a basic but indispensable case, which is absent in the DO paper. For each perturbation, one of the following settings is applied to the model set to create a scene point cloud. When we perturb one type, the rest of the types are set to our default values, given in the curly braces, rather than the DO's, given in the square brackets. Each perturbation level generated 100 testing samples.

1) Noise Standard Deviation (NoiseSTD): the noise standard deviation of all scene points ranges between 0 to 0.1; {our default = 0.05}, [DO's default = 0];
2) Scene Point Number (SceneNumber): the number of all scene points ranges from 100 to 4000; {our default = 400}, [DO's default = 200 to 600];
3) Outlier Number (Outlier): the outlier number ranges from 0 to 600; {our default = 300}, [DO's default = 0];
4) Incomplete Ratio (Incomplete): the incomplete ratio ranges from 0 to 0.7; {our default = 0.3}, [DO's default = 0];
5) Rotation Angle (Rotation): the initial rotation angle ranges from 0 to 180 degrees (rotated along a random 3D vector); {our default = 60}, [DO's default = 0];
6) Translation (Translation): the initial translation ranges from 0 to 1.0; {our default = 0.3}, [DO's default = 0].

2) Real dataset: We tested our algorithm on the latest LIDAR dataset [41], which is an urban-scale photogrammetric 3D point cloud dataset scanned from Birmingham, Cambridge, and York. This dataset includes three billion annotated 3D points that cover 7.6 km² of the city landscape. We use the Birmingham 3D LIDAR point cloud as our experimental dataset. In contrast to the synthetic dataset experiment, which added sparse outliers to the testing samples, here we added structured outliers to the real dataset instead of the sparse outliers, and the rest of the perturbations remain the same. We cropped part of the LIDAR point cloud and evenly sampled a point cloud as the model points. We also individually generated 100 testing samples for each perturbation level.

B. Training and Evaluation Criteria

In the training phase, we generated 30000 training samples; each training sample is generated with the following perturbation settings: 1) Random Noise Standard Deviation: 0 to 0.05, 2) Random Scene Point Number: 400 to 800, 3) Random Outliers Number: 0 to 300, 4) Random Incomplete Ratio: 0 to 0.3, 5) Random Rotation Angle: 0 to 90 (degree), and 6) Random Translation: 0 to 0.3. Compared with the perturbations of the testing samples, the perturbation level of the training samples is more slight; this is an advantage of the DO-based algorithms over the deep neural network based algorithms. We trained 30 maps: D1, D2, ..., D30. The trade-off parameter λ is set to 0.0002, and we set the Gaussian deviation σ² to 0.03.

We compared the training error curves of our improved DO with those of the original DO. Both the mean error and the standard deviation on the 30000 training samples are depicted in Fig.3.(a) and Fig.3.(b). We can see that the training loss of our algorithm decreased faster than the original DO's, and achieved a higher training precision. The decrease of the training error's standard deviation in all training epochs indicates that our improved DO is superior to the DO in learning. We randomly selected 600 training samples and display their parameter regression errors (depicted in Fig.3.(c) and Fig.3.(e)), while the training errors on the same samples regressed by the DO are depicted in Fig.3.(d) and Fig.3.(f). The figures demonstrate that most of the parameters are predicted precisely by our improved DO but only roughly by the original DO algorithm.

Fig. 3. Training Curves: (a) Training Error (OURS); (b) Training Error (DO); (c) Rotation Error (OURS); (d) Rotation Error (DO); (e) Translation Error (OURS); (f) Translation Error (DO).
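The synthetic-sample generation described above can be sketched as follows. The z-axis-only rotation, the helix-shaped toy model cloud, and the outlier range are simplifying assumptions for illustration (the paper rotates about a random 3D axis and uses the downsampled Bunny); the default levels follow the text (noise std 0.05, rotation 60 degrees, translation 0.3, 300 outliers).

```python
import math
import random

# Sketch of the perturbation pipeline (Sec. IV-A): rotate, translate, add
# Gaussian noise, and append uniform sparse outliers to form a scene cloud.
# Rotation about z only and the toy model cloud are simplifying assumptions.

def perturb(model, noise_std=0.05, angle_deg=60.0, trans=0.3, n_outliers=300):
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    scene = []
    for x, y, z in model:
        xr, yr = c * x - s * y, s * x + c * y          # rotate about z
        scene.append((xr + trans + random.gauss(0, noise_std),
                      yr + trans + random.gauss(0, noise_std),
                      z + trans + random.gauss(0, noise_std)))
    for _ in range(n_outliers):                         # sparse outliers
        scene.append(tuple(random.uniform(-1, 1) for _ in range(3)))
    return scene

random.seed(0)
model = [(math.cos(t), math.sin(t), 0.1 * t)
         for t in [i * 0.1 for i in range(100)]]        # toy model cloud
scene = perturb(model)
print(len(scene))  # prints 400: 100 transformed points + 300 outliers
```

Incompleteness and scene-point-number perturbations would additionally drop or resample points before the outliers are appended.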

Similarly, our trained maps plotted in Fig.4(b,d,f) are not sparse matrices, which means the extended “up-down” and “clockwise-anticlockwise” sides are counted in the linear regression. Thus, by multiplying with the extended histogram, our maps collect more information and predict the registration parameters more precisely and efficiently than the original DO.

Fig. 4. Comparison of the Learned Maps: (a,c,e) DO's maps D1, D15, and D30; (b,d,f) ours.

To illustrate the evolution of both methods' histograms during the inference steps, the histograms of DO and of our improved DO are plotted in the two columns of Fig.5; the first column is DO's histogram and the second column is ours. To compare them against a baseline, we generate a reference line, i.e., the histogram extracted from two identical model point clouds; it is painted as a blue dotted line in Fig.5. Every testing scene point cloud's histogram will approach the reference line, and none of them will be identical to it because of the perturbation, but we can utilize it as an auxiliary criterion in registration. In the first column of Fig.5, the histograms painted in red represent the “front-back” sides; as the inference continues, they approach the reference line. In the second column of Fig.5, each figure contains three parts of the histogram, representing the “front-back”, “up-down”, and “clockwise-anticlockwise” sides respectively. Because the “up-down” and “clockwise-anticlockwise” histograms are global feature descriptors (i.e., statistical features), their changes are less apparent than those of the local feature descriptor, the “front-back” histogram. The point cloud pair we tested is coarsely registered at iteration 10 by our improved DO; accordingly, the histograms from Fig.5(b) to Fig.5(d) clearly show the “up-down” histogram approaching the reference line, while Fig.5(d), Fig.5(f), and Fig.5(h) show the fine-tuning progress.

The evaluation criteria we utilized are: 1) mean Point Registration Accuracy (PointAcc), and 2) Point Registration Root-Mean-Square-Error (PointRMSE). The PointAcc can be calculated as

Acc = num(|T(S, x̂) − S*| < ε) / num(S),

where T(S, x̂) denotes the scene points S transformed by the estimated parameters x̂, S* is the ground truth, and ε is the success tolerance. We computed the mean PointAcc and mean PointRMSE at each perturbation level and plotted them in Fig.7 and Fig.8. Every sub-figure demonstrates all algorithms' performance under one distortion; the distortions include NoiseSTD, Outliers, PointNum, Incomplete, Rotation, and Translation.
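These two criteria can be computed as below; the tolerance `eps` is an assumed value, not a constant taken from the paper:

```python
import numpy as np

def point_rmse(registered, ground_truth):
    """Root-mean-square error between corresponding points (N x 3)."""
    d2 = np.sum((registered - ground_truth) ** 2, axis=1)
    return np.sqrt(d2.mean())

def point_acc(registered, ground_truth, eps=0.05):
    """Fraction of registered points within distance eps of ground truth."""
    d = np.linalg.norm(registered - ground_truth, axis=1)
    return np.mean(d < eps)
```

For a cloud of 4 points with one point off by a unit offset, `point_acc` returns 0.75 and `point_rmse` returns 0.5.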

Fig. 5. Plots of Features: feature histograms h at successive inference iterations, each plotted against the reference line; (a,c,e,g) DO at iterations 1, 20, 40, and 109 (final); (b,d,f,h) OURS at iterations 1, 10, 30, and 71 (final).

Fig. 6. Experimental Result on Synthetic Dataset: (a,b) initialization (DO, OURS); (c) DO at iteration 30; (d) OURS at iteration 10; (e) DO's final registration at iteration 121; (f) OURS' final registration at iteration 71.

TABLE I
AVERAGE POINTACC ON SYNTHETIC DATASET

              BCPD   CPD    GMMReg  ICP    KCCE   DO     OURS
NoiseStd      0.650  0.333  0.100   0.000  0.000  0.863  0.990
Outliers      0.683  0.326  0.129   0.000  0.000  0.889  0.994
PointNum      0.700  0.494  0.104   0.000  0.000  0.866  0.944
Incomplete    0.643  0.295  0.093   0.001  0.000  0.855  0.955
Rotation      0.380  0.176  0.093   0.001  0.123  0.487  0.621
Translation   0.662  0.242  0.102   0.000  0.000  0.545  0.755

Fig.7 shows that our improved DO achieved the best performance in almost all cases, except under translation degradation. When model and scene points are far apart, the Gaussian weights become infinitesimal, so the histogram becomes a sparse vector and the regressor fails. When the initial translation exceeds 0.7, DO's performance declines dramatically; our improved DO still does better than DO because of the extra histograms. Fig.8 summarizes the statistical results of mean PointRMSE. As can be seen from the figure, ICP, GMMReg, and KCCE performed poorly, BCPD still performed better than CPD in all cases, and DO better than the preceding algorithms.

For each perturbation experiment, we calculated the average PointAcc and average PointRMSE of each algorithm and listed them in Table I and Table II, with the best result highlighted in bold. The results demonstrate that our proposed algorithm outperforms the other algorithms.

We also tested our improved Discriminative Optimization algorithm on the 3D LIDAR Oxford SensatUrban dataset; the experimental results of all algorithms are plotted in Fig.9 and Fig.10, respectively. Unlike the Stanford Bunny dataset, which forms a closed surface, the Oxford SensatUrban

Fig. 7. PointAcc on Dataset-1: (a) NoiseStd, (b) Outliers, (c) PointNum, (d) Incomplete, (e) Rotation, (f) Translation; each panel compares BCPD, CPD, GMMReg, ICP, KCCE, DObase, and OURS over the perturbation levels.

Fig. 8. PointRMSE on Synthetic Dataset: same panels and algorithms as Fig. 7.

TABLE II
AVERAGE POINTRMSE ON SYNTHETIC DATASET

              BCPD   CPD    GMMReg  ICP    KCCE   DO     OURS
NoiseStd      0.266  0.454  0.607   1.086  0.609  0.098  0.026
Outliers      0.242  0.482  0.590   1.102  0.614  0.087  0.024
PointNum      0.253  0.383  0.599   1.125  0.607  0.100  0.062
Incomplete    0.301  0.528  0.612   1.071  0.631  0.105  0.047
Rotation      0.676  0.807  0.802   1.118  0.802  0.576  0.464
Translation   0.291  0.609  0.751   1.270  0.708  0.405  0.273

TABLE IV
AVERAGE POINTRMSE ON REAL DATASET

              BCPD   CPD    GMMReg  ICP    KCCE   DO     OURS
NoiseStd      0.217  0.400  0.641   1.122  0.494  0.132  0.024
Outliers      0.177  0.414  0.627   1.128  0.495  0.132  0.020
PointNum      0.237  0.343  0.619   1.144  0.497  0.130  0.045
Incomplete    0.242  0.420  0.637   1.128  0.515  0.169  0.049
Rotation      0.635  0.754  0.793   1.154  0.757  0.626  0.430
Translation   0.291  0.609  0.751   1.270  0.708  0.405  0.273

dataset is an urban-scale LIDAR dataset with an unclosed surface. The original DO performs well on point clouds of closed surfaces, but on this unclosed-surface dataset its performance becomes close to that of BCPD in the NoiseSTD, Outliers, PointNum, and Occlusion perturbation experiments. In the Rotation and Translation perturbation experiments, BCPD's performance is better than DO's, which means the original DO is sensitive to the shape of the points. ICP, GMMReg, KCCE, and CPD still show low point registration accuracy and high point registration RMSE in most cases: the structured outliers produce a Gaussian density mixture that severely disturbs the Gaussian-based approaches, so the performance of CPD, GMMReg, and KCCE declines quickly. Our improved DO is robust to the shape of the points, and it still achieved the best performance in almost all testing cases. Table III and Table IV list the experimental results of average PointAcc and average PointRMSE on the real dataset, which demonstrate that our algorithm outperformed the other algorithms.
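The sensitivity analyses above trace back to the Gaussian weighting inside the feature histogram: once model and scene points are far apart, every weight exp(−d²/2σ²) is near zero, so the histogram carries almost no information for the regressor. A small numeric illustration of our own (σ = 0.1 is an assumed value, not the paper's):

```python
import numpy as np

def gaussian_weights(model, scene, sigma=0.1):
    """Gaussian weight of each scene point w.r.t. its nearest model point
    (brute-force pairwise distances, for clarity)."""
    d = np.linalg.norm(scene[:, None, :] - model[None, :, :], axis=2)
    nearest = d.min(axis=1)
    return np.exp(-nearest ** 2 / (2.0 * sigma ** 2))

model = np.random.default_rng(0).uniform(size=(200, 3))
near = gaussian_weights(model, model + 0.05)  # small offset: weights survive
far = gaussian_weights(model, model + 0.8)    # large offset: weights vanish
```

With the small offset the average weight stays large, while the large offset drives the weight vector toward all zeros, which is exactly the failure mode seen for large initial translations.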

TABLE III
AVERAGE POINTACC ON REAL DATASET

              BCPD   CPD    GMMReg  ICP    KCCE   DO     OURS
NoiseStd      0.743  0.383  0.020   0.000  0.003  0.777  0.993
Outliers      0.789  0.369  0.049   0.000  0.011  0.783  0.991
PointNum      0.728  0.516  0.052   0.000  0.004  0.808  0.952
Incomplete    0.718  0.328  0.055   0.000  0.005  0.743  0.953
Rotation      0.417  0.176  0.084   0.000  0.175  0.404  0.648
Translation   0.662  0.242  0.102   0.000  0.000  0.545  0.755

V. CONCLUSION

In this paper, we proposed an improved Discriminative Optimization against the original DO. By extending the relationship between point clouds from “front-back” to “front-back”, “up-down”, and “clockwise-anticlockwise”, the information in the histogram is enriched, which benefits the linear regression in the improved discriminative optimization algorithm. We compared our improved DO with

other classical point cloud registration algorithms on two LIDAR datasets. The experimental results demonstrate that our improved DO performs well against state-of-the-art methods; in particular, it outperforms the original DO in all cases.

Fig. 9. Mean Accuracy Experimental Result on Dataset-2: (a) NoiseStd, (b) Outliers, (c) PointNum, (d) Occlusion, (e) Rotation, (f) Translation; each panel compares BCPD, CPD, GMMReg, ICP, KCCE, DObase, and OURS.

Fig. 10. Mean RMSE Experimental Result on Dataset-2: same panels and algorithms as Fig. 9.

REFERENCES

[1] P. Dollár, P. Welinder, and P. Perona, “Cascaded pose regression,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[2] J. Vongkulbhisal, F. De la Torre, and J. P. Costeira, “Discriminative optimization: Theory and applications to point cloud registration,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4104–4112.
[3] X. Xiong and F. De la Torre, “Supervised descent method and its applications to face alignment,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 532–539.
[4] Y. Guo, H. Wang, Q. Hu, H. Liu, and M. Bennamoun, “Deep learning for 3d point clouds: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, no. 99, pp. 1–1, 2020.
[5] P. J. Besl and H. McKay, “A method for registration of 3-d shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, 1992.
[6] S. Rusinkiewicz and M. Levoy, “Efficient variants of the ICP algorithm,” in Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling, 2001.
[7] A. W. Fitzgibbon, “Robust registration of 2d and 3d point sets,” Image and Vision Computing, vol. 21, no. 13-14, pp. 1145–1153, 2003.
[8] A. Myronenko and X. Song, “Point set registration: Coherent point drift,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 12, pp. 2262–2275, 2010.
[9] Y. Tsin and T. Kanade, “A correlation-based approach to robust point set registration,” in European Conference on Computer Vision, 2004.
[10] S. Gold, A. Rangarajan, C. P. Lu, S. Pappu, and E. Mjolsness, “New algorithms for 2d and 3d point matching: pose estimation and correspondence,” Pattern Recognition, vol. 31, no. 8, pp. 1019–1031, 1998.
[11] P. Bergström and O. Edlund, “Robust registration of point sets using iteratively reweighted least squares,” Computational Optimization and Applications, vol. 58, no. 3, pp. 543–561, 2014.
[12] B. Jian and B. C. Vemuri, “Robust point set registration using gaussian mixture models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1633–1645, 2010.
[13] D. Campbell and L. Petersson, “An adaptive data representation for robust point-set registration and merging,” in IEEE International Conference on Computer Vision, 2016.
[14] V. Golyanik, S. Aziz, and D. Stricker, “Gravitational approach for point set registration,” in International Conference on Computer Vision and Pattern Recognition, 2016.
[15] T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” in Proceedings of the 5th European Conference on Computer Vision, 1998.
[16] F. Jurie and M. Dhome, “Boosted regression active shape models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 996–1000, 2002.
[17] D. Cristinacce and T. F. Cootes, “Active appearance models,” in Proceedings of the British Machine Vision Conference, 2007.
[18] E. Bayro-Corrochano and J. Ortegon-Aguilar, “Lie algebra template tracking,” in International Conference on Pattern Recognition, 2004.
[19] E. Bayro-Corrochano and J. Ortegon-Aguilar, “Lie algebra approach for tracking and 3d motion estimation using monocular vision,” Image and Vision Computing, vol. 25, no. 6, pp. 907–921, 2007.
[20] O. Tuzel, F. Porikli, and P. Meer, “Learning on lie groups for invariant detection and tracking,” in International Conference on Computer Vision and Pattern Recognition, 2008.
[21] J. Saragih and R. Goecke, “Iterative error bound minimisation for aam alignment,” in International Conference on Pattern Recognition, vol. 2. IEEE, 2006, pp. 1192–1195.
[22] C. C. Chang and C. J. Lin, “Training ν-support vector regression: Theory and algorithms,” Neural Computation, vol. 14, no. 8, pp. 1959–1977, 2002.
[23] X. Cao, Y. Wei, F. Wen, and J. Sun, “Face alignment by explicit shape regression,” International Journal of Computer Vision, vol. 107, no. 2, pp. 177–190, 2014.

[24] S. Yi, X. Wang, and X. Tang, “Deep convolutional network cascade for facial point detection,” in International Conference on Computer Vision and Pattern Recognition. IEEE, 2013.
[25] X. Xiong and F. De la Torre, “Global supervised descent method,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2664–2673.
[26] K. Zimmermann, J. Matas, and T. Svoboda, “Tracking by an optimal sequence of linear predictors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 4, pp. 677–692, 2009.
[27] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in Conference on Computer Vision and Pattern Recognition, 2017, pp. 652–660.
[28] ——, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” in Conference on Neural Information Processing Systems, 2017, pp. 5105–5114.
[29] Y. Aoki, H. Goforth, R. A. Srivatsan, and S. Lucey, “Pointnetlk: Robust and efficient point cloud registration using pointnet,” in Conference on Computer Vision and Pattern Recognition, 2019, pp. 7163–7172.
[30] X. Huang, G. Mei, and J. Zhang, “Feature-metric registration: A fast semi-supervised approach for robust point cloud registration without correspondences,” in Conference on Computer Vision and Pattern Recognition, 2020, pp. 11366–11374.
[31] S. Ao, Q. Hu, B. Yang, A. Markham, and Y. Guo, “Spinnet: Learning a general surface descriptor for 3d point cloud registration,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[32] C. Choy, W. Dong, and V. Koltun, “Deep global registration,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2514–2523.
[33] J. C. Gower, “Generalized procrustes analysis,” Psychometrika, vol. 40, no. 1, pp. 33–51, 1975.
[34] G. Elbaz, T. Avraham, and A. Fischer, “3d point cloud registration for localization using a deep neural network auto-encoder,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017, pp. 4631–4640.
[35] Z. Gojcic, C. Zhou, J. D. Wegner, L. J. Guibas, and T. Birdal, “Learning multiview 3d point cloud registration,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1759–1769.
[36] W. Lu, G. Wan, Y. Zhou, X. Fu, P. Yuan, and S. Song, “Deepvcp: An end-to-end deep neural network for point cloud registration,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 12–21.
[37] B. Hall, “Lie groups, lie algebras, and representations: an elementary introduction,” Springer, vol. 222, 2015.
[38] P. Chen, “A novel kernel correlation model with the correspondence estimation,” Journal of Mathematical Imaging and Vision, vol. 39, no. 82, pp. 100–120, 2011.
[39] O. Hirose, “A bayesian formulation of coherent point drift,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[40] Stanford University Computer Graphics Laboratory, “The stanford 3d scanning repository,” https://graphics.stanford.edu/data/3Dscanrep/.
[41] Q. Hu, B. Yang, and S. Khalid et al., “Towards semantic segmentation of urban-scale 3d point clouds: A dataset, benchmarks and challenges,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.